Speech input and voice recognition have been long-standing areas of research. While progress is being made, it has been slower than optimists such as IBM (developers of the early "Shoebox" device) and members of the health-care domain originally predicted, and further work remains in this field. Although the goal of continuous speech recognition remains difficult to master, isolated-word speech recognition, while unnatural, is appropriate for some tasks and can even feel natural when communicating with a computer rather than another human.
Speech recognition has lately found its best uses in telephony and in other domains such as computer gaming. Improvements in mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for issuing pre-defined or custom speech commands (Wiki, July 2010). Research is needed not only in the speech recognition technology itself but also in how to use speech in an interface (Kamel, 1990).
The ideal that a perfect computer is one that behaves and communicates just like a personal assistant is a naive one: people should expect computers to behave like the tools they are, not like other people; furthermore, the computer-as-person approach ultimately limits the usefulness of the computer to that of the person being mimicked. The main obstacle to improving the usefulness of interactive systems such as speech recognition software largely lies in communicating requests and results between the system and its user. The best hope for progress in this area now lies at the user interface rather than in the system's interior.
Faster, more natural, and more convenient means for users and computers to exchange information are needed. Is speech recognition the answer? On the user's side, interactive system technology is constrained by the nature of human communication organs (brain, lips, tongue, etc.) and abilities; on the computer's side, it is constrained only by the input/output devices and methods we can invent. The challenge is to design new devices, software, and types of dialogue that better fit and take advantage of the communication-relevant characteristics of humans.
So where does that leave us as we look forward to bigger and better ways of utilizing speech recognition (SR)? What is the future of SR? DARPA has three teams of researchers working on Global Autonomous Language Exploitation (GALE), a program that will take in streams of information from foreign news broadcasts and newspapers and translate them. It hopes to create software that can instantly translate between two languages with at least 90 percent accuracy (Grabianowski, July 2010). At some point in the future, speech recognition may become speech understanding.
Computers could potentially not only translate what was said and annotate it, but actually grasp the meaning behind the words. The staggering amount of computing power such a feat would require, however, makes it hard to believe we are close to it at this time. Accuracy of speech recognition stopped improving in 2001, well before reaching human levels, and funders halted many projects. In the early 1990s, the newly minted Microsoft Research organization developed a system called MindNet, which traced out a network in a dictionary from each word to its every mention in the definitions of other words.
MindNet was shelved in 2005. The Defense Advanced Research Projects Agency (DARPA) financed investigations into conversational speech recognition but shifted priorities and money after accuracy plateaued. Attention has now moved from speech recognition to research aimed at "understanding and emulating relevant human capabilities" and at understanding how the brain processes language. This fundamental shift in direction acknowledges that speech recognition alone is not the answer (Baker, Deng, Glass, Khudanpur, Lee, Morgan, O'Shaughnessy, May 2009).