Speech Recognition: An Overview
Speech Recognition
Kimberlee A. Kemble
Program Manager, Voice Systems Middleware Education
IBM Corporation
Presenter: Sajana.A S2-ELT
Agenda
• What is speech recognition?
• A closer look
• Terms & concepts
• Components
• How it works
• Pros & cons
• Applications
What is speech recognition?
Speech recognition (SR) is the ability to convert dictation or spoken words into text.
It is also known as “automatic speech recognition” (ASR), “computer speech recognition”, or “speech to text” (STT).
A Closer Look
• Speech recognition engine
1. Command and control application: the application can interpret the result of the recognition as a command.
2. Dictation application: the application handles the recognized text simply as text.
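The two application styles differ only in what they do with the engine's result. Here is a minimal sketch of that difference; the `handle_result` function and the `COMMANDS` table are hypothetical names invented for illustration and do not come from any real speech API:

```python
# Hypothetical command table: recognized phrase -> action name (illustrative only).
COMMANDS = {
    "open file": "OPEN_FILE",
    "call home": "DIAL_HOME",
}


def handle_result(recognized_text, mode):
    """Route recognized text according to the application style."""
    if mode == "command":
        # Command and control: interpret the result as a command.
        key = recognized_text.strip().lower()
        return ("command", COMMANDS.get(key))  # None if the phrase is not a known command
    if mode == "dictation":
        # Dictation: handle the recognized text simply as text.
        return ("text", recognized_text)
    raise ValueError(f"unknown mode: {mode}")


print(handle_result("Call home", "command"))
print(handle_result("Call home", "dictation"))
```

The same recognized phrase is turned into an action in one mode and passed through unchanged in the other.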
Terms & Concepts
• Utterances
1. An utterance is any stream of speech between two periods of silence.
2. Silence delineates the start and end of an utterance.
3. An utterance can be a single word, or it can contain multiple words (a phrase or a sentence).
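The idea that silence delineates utterances can be sketched with a simple energy threshold. This is an illustrative simplification, not how any particular engine does it: the per-frame energies, the `threshold`, and the `min_silence` frame count are all assumed values.

```python
def split_utterances(energies, threshold=0.1, min_silence=3):
    """Return (start, end) frame index pairs (end exclusive), one per utterance.

    A frame counts as speech when its energy is >= threshold. An utterance
    ends once `min_silence` consecutive silent frames have been seen, so a
    short pause inside a phrase does not split it.
    """
    utterances = []
    start = None      # index of the first speech frame of the current utterance
    silent_run = 0    # consecutive silent frames seen since the last speech frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended before enough silence was seen; trim trailing silent frames.
        utterances.append((start, len(energies) - silent_run))
    return utterances


frames = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
print(split_utterances(frames))  # [(2, 6), (9, 11)]
```

The one-frame pause at index 4 stays inside the first utterance because it is shorter than `min_silence`.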
Terms & Concepts (continued)
• Pronunciations
– A pronunciation represents what the speech engine thinks a word should sound like.
• Grammars
– A grammar uses a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine.
– Grammars define the domain, or context, within which the recognition engine works.
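A grammar, in the sense described above, can be pictured as a small rule set that enumerates the phrases the engine is allowed to recognize. The sketch below uses a made-up list-of-alternatives format; it is not the syntax of any real grammar standard, and `GRAMMAR`, `expand`, and `in_grammar` are hypothetical names.

```python
from itertools import product

# Hypothetical grammar: a sequence of slots, each listing its allowed alternatives.
GRAMMAR = [
    ["call", "dial"],
    ["home", "the office"],
]


def expand(grammar):
    """Enumerate every phrase the grammar allows."""
    return {" ".join(words) for words in product(*grammar)}


def in_grammar(phrase, grammar=GRAMMAR):
    """Check whether a phrase falls inside the grammar's domain."""
    normalized = " ".join(phrase.lower().split())
    return normalized in expand(grammar)


print(sorted(expand(GRAMMAR)))
print(in_grammar("Call home"))
print(in_grammar("call mom"))
```

Two slots with two alternatives each yield four recognizable phrases; anything else is outside the domain.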
Terms & Concepts (continued)
• Speaker-dependent systems
– Require “training” to “teach” the system an individual speaker's voice
– More robust
– But less convenient
– And obviously less portable
• Speaker-independent systems
– Language coverage is reduced to compensate for the need to be flexible in phoneme identification
– A clever compromise is to learn on the fly
Components
• Audio input
• Grammar
• Speech recognition engine
• Acoustic model
• Recognized text
TheMicrophoneStore.com, KnowBrainer.com
How it works?
[Diagram: audio input feeds the speech recognition engine, which uses the grammar and the acoustic model to produce recognized text.]
Process
Here’s another look at how SRS works...
Source: Automatic Speech Recognition: A Review, Preeti Saini, Parneet Kaur
Acceptance and Rejection
• An accepted utterance is one for which the engine returns recognized text.
• The engine returns a confidence score along with the text to indicate the likelihood that the returned text is correct.
• Not all utterances that are processed by the speech engine are accepted.
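One way an application might act on the confidence score is a simple threshold. This is a sketch under stated assumptions: the 0-to-1 score range, the `threshold` of 0.5, and the function name `classify_result` are illustrative choices, not values taken from any particular engine.

```python
def classify_result(text, confidence, threshold=0.5):
    """Return the recognized text if the utterance is accepted, else None.

    Assumes `confidence` is a score between 0 and 1; the threshold is an
    illustrative choice that a real application would tune.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if text and confidence >= threshold:
        return text   # accepted: pass the text on to the application
    return None       # rejected: the application could re-prompt the speaker


print(classify_result("call home", 0.92))
print(classify_result("call hum", 0.31))
```

Raising the threshold trades more rejections (and re-prompts) for fewer wrongly accepted results.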
What’s hard about that?
• Digitization
– Converting analogue signal into digital representation.
• Signal processing
– Separating speech from background noise.
• Phonetics
– Variability in human speech.
• Phonology
– Recognizing individual sound distinctions (similar phonemes).
• Lexicology and syntax
– Disambiguating homophones.
– Features of continuous speech.
• Syntax and pragmatics
– Interpreting features.
– Filtering of performance errors (disfluencies).
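The first step in the list above, digitization, can be illustrated by sampling a continuous signal at fixed intervals and quantizing each sample to an integer. The 8000 Hz sample rate, 16-bit depth, and 440 Hz test tone below are assumed values picked for the example, and `digitize` is a hypothetical helper.

```python
import math


def digitize(signal, duration_s, sample_rate=8000, bits=16):
    """Sample `signal` (a function of time returning values in [-1, 1])
    and quantize each sample to a signed integer of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    samples = []
    for k in range(n_samples):
        t = k / sample_rate                   # time of the k-th sample in seconds
        value = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        samples.append(round(value * max_level))
    return samples


def tone(t):
    """A 440 Hz sine wave standing in for the analogue input."""
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

Ten milliseconds at 8000 samples per second gives 80 integer samples, which is the kind of digital representation later stages work on.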
The Uses
• Individuals With Disabilities – Assists those who have visual impairment, hand immobility, dyslexia, etc.
• Medical Transcription – Reduces delays in writing out medical transcriptions.
• Dictation – Converts words to text in emails or other word documents (also helpful for English Language Learners).
• Access Menu Commands – Opens files using voice commands.
Applications of Speech Recognition
• Speech recognition applications include:
– Voice dialling (e.g., "Call home")
– Call routing (e.g., "I would like to make a collect call")
– Simple data entry (e.g., entering a credit card number)
– Preparation of structured documents (e.g., a radiology report)
– Speech-to-text processing (e.g., word processors or emails)
– Aircraft cockpits (usually termed Direct Voice Input)
Applications
• Medical Transcription
• Military
• Telephony and other domains
• Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics
Pros of Speech Recognition
• Faster than “hand-writing”.
• Allows for better spelling, whether it be in text or documents.
• Helpful for people with a mental or physical disability.
• Hands-free capability.
Cons of Speech Recognition
• No program is 100% perfect
• Factors that affect the accuracy of speech recognition are: slang, homonyms, signal-to-noise ratio, and overlapping speech
• Can be expensive depending on the program
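Of the accuracy factors listed above, signal-to-noise ratio is the one that is straightforward to quantify. A common definition is the ratio of average signal power to average noise power, expressed in decibels. The sketch below assumes the speech and noise samples are available separately, which is a simplification; `snr_db` is a hypothetical helper name.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise),
    where each power is the mean of the squared samples."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


speech = [1.0, -1.0] * 4    # toy "speech" samples with amplitude 1.0
hiss = [0.1, -0.1] * 4      # toy background noise with amplitude 0.1
print(round(snr_db(speech, hiss), 2))
```

A tenfold difference in amplitude is a hundredfold difference in power, or about 20 dB; a lower SNR means the engine has a harder time separating speech from the background.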
Programs
Now let’s take a look at some of the many SRS programs...
• Dragon
• Siri
• Indigo
Using Dragon Mobile
ftp://public.dhe.ibm.com/software/pervasive/info/products/Introduction_to_Speech_Recognition.pdf
Different Home Appliances Control Scenarios
http://en.wikipedia.org/wiki/VoiceXML
The Future of Assistive Technology in Schools
• Students who need assistance in their writing skills because they have stronger oral skills.
• Students who were absent for a class, have poor memory, or need assistance hearing the lesson.
• Students who need assistance during Guided Reading.
• Students who are English Language Learners.
• Students with visual/hearing impairments and learning disabilities regarding reading/spelling/writing.
Conclusion
• Revolutionize the way people conduct business over the Web, and differentiate world-class e-businesses.
• VoiceXML ties speech recognition and telephony together.
• Voice-enabled Web solutions today!
Thank you!