seminar speech
8/7/2019 Seminar Speech
1/14
-: SPEECH RECOGNITION :-
Introduction
One doesn't have to be a scientist to know that the computer of the future will
talk, listen and understand. One such computer is the Apple Macintosh of today.
Apple's Speech Recognition and Speech Synthesis technologies now give
speech-savvy applications the power to carry out your voice commands and
even speak back to you in plain English.
Apple Speech Recognition lets the system (Macintosh) understand what you
say, giving you a new dimension for interacting with and controlling your
computer by voice. You don't even have to train it to understand your voice,
because it already understands you from your very first word. You can
speak naturally, without pausing or stopping. Apple's leadership in speech
recognition technology makes this possible by bringing a whole new dimension
to the user interface: speech. Combined with Voice-Over, speech synthesis
will help turn the graphical user interface into a vocal user interface.
Speech recognition (in many contexts also known as automatic speech
recognition or computer speech recognition, or erroneously as voice
recognition) is the process of converting a speech signal to a sequence of
words, by means of an algorithm implemented as a computer program.
Speech recognition applications that have emerged over the last years
include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to
make a collect call"), simple data entry (e.g., entering a credit card number),
and preparation of structured documents (e.g., a radiology report).
Voice Verification or speaker recognition is a related process that attempts to
identify the person speaking, as opposed to what is being said.
Speech Technology Development at IBM:
The overall view, with emphasis on Via-Scribe and Accessibility
Speech technologies development, deployments
Technology                          Applications
Large Vocabulary Speech             Broadcast news transcription, content spotting and
Recognition                         indexing, Via-Scribe, MALACH, DARPA projects
Telephony Speech Recognition        Mutual funds transactions, contact center call
(+ natural language understanding)  routing, contact center analytics
Embedded Speech Recognition         Embedded speech in telematics (e.g., vehicles),
(+ multimodal input)                devices (e.g., cell phones, PDAs, etc.) and other
                                    consumer appliances (e.g., set-top boxes, DVD players)
Audio Visual Speech Recognition     Improved ASR on trading floor
Conversational Biometrics           Speaker identification, speaker verification
Text to Speech Synthesis            Home Page Reader, ViaVoice
Machine Translation                 MASTOR, DARPA projects, WebSphere
Speech Analytics: Automated Quality Assurance Application
Monitor 100% of calls
Download recorded calls daily from across North America
Answer questions and assign default ratings
Provide a ranked list to human monitors to focus on bad calls
Speech recognition is the process of converting an acoustic signal, captured by
a microphone or a telephone, to a set of words. The recognized words can be the
final results, as for applications such as commands & control, data entry, and
document preparation. They can also serve as the input to further linguistic
processing in order to achieve speech understanding.
An isolated-word speech recognition system requires that the speaker pause
briefly between words, whereas a continuous speech recognition system does not.
Spontaneous, or extemporaneously generated, speech contains disfluencies, and is
much more difficult to recognize than speech read from a script. Some systems
require speaker enrollment: a user must provide samples of his or her speech
before using them. Other systems are said to be speaker-independent, in
that no enrollment is necessary. Some of the other parameters depend on the
specific task. Recognition is generally more difficult when vocabularies are large
or have many similar-sounding words. When speech is produced in a sequence of
words, language models or artificial grammars are used to restrict the combination
of words.
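The idea of an artificial grammar restricting word combinations can be illustrated with a small sketch. The vocabulary and the successor table below are hypothetical and chosen only for illustration; a real recognizer would typically use a much larger grammar or a statistical language model.

```python
# Hypothetical artificial grammar: each word maps to the set of words
# that are allowed to follow it. "<start>" and "<end>" mark the boundaries.
ALLOWED_NEXT = {
    "<start>": {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home", "office"},
    "home": {"<end>"},
    "office": {"<end>"},
}


def is_allowed(words):
    """Return True if every adjacent word pair is permitted by the grammar."""
    sequence = ["<start>"] + list(words) + ["<end>"]
    return all(
        nxt in ALLOWED_NEXT.get(cur, set())
        for cur, nxt in zip(sequence, sequence[1:])
    )


print(is_allowed(["call", "home"]))  # True
print(is_allowed(["home", "call"]))  # False
```

A recognizer using such a grammar only has to choose among the permitted successors at each step, which shrinks the set of similar-sounding candidates it must tell apart.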
Speech recognition is a technology that is constantly evolving. It is
experiencing tremendous growth in the commercial market, apart from its
original niche as an assistive technology product. There are presently three major
companies with speech recognition products: Dragon Systems, Lernout & Hauspie
(L&H), and IBM. Stiff competition between these companies, and more demand
from consumer and business markets, has led to a tremendous drop in prices over
the last few years. Competition has also fueled the development of a plethora of
new products. Each company has several products available, ranging in price,
features, and the applications that they support. This paper seeks to make sense of
the overwhelming array of products so that persons who are shopping for speech
recognition will have a better understanding of their choices.
What are the Types of Speech Recognition?
*Discrete
Slower dictation process - better for persons with difficulty in language
processing or in fluid speech
Word-by-word style, rather than phrases, reflects the way beginning writers
form sentences
*Continuous
Processes speech by phrase
Takes context into account
Is less accurate if phrases are interrupted
Advantages: Speed and accuracy (for most users)
Who Can Benefit from Speech Recognition?
Persons with mobility impairments or injuries that prevent keyboard access
Persons who have or who are seeking to prevent repetitive stress injuries
Persons with writing difficulties
Any person who wants hands-free access to the computer
Any person who wants to increase their text input speed
(reportedly up to 160 wpm)
What is Required to Use Speech Recognition?
A powerful computer
Consistent Speech (not necessarily intelligible)
Fluid speech (i.e., not pausing between words) desirable for use of
continuous speech products
Patience
Basic knowledge of computers
Fairly high cognitive ability
Applications of speech recognition
Command recognition - Voice user interface with the computer
Dictation
Interactive Voice Response
Automotive speech recognition
Medical Transcription
Pronunciation Teaching in computer-aided language learning applications
Automatic Translation
Hands-free computing
Speech Analysis
Speech analysis/input deals with the following research areas:
[Figure: Speech Analysis, divided into Who? (Verification, Identification), What? (Recognition, Understanding) and How?]
Human speech has certain characteristics determined by the speaker. Hence,
speech analysis can serve to analyze who is speaking, i.e., to recognize a
speaker for identification and verification. The computer identifies and
verifies the speaker using an acoustic fingerprint. An acoustic fingerprint is a
digitally stored speech sample of a person. For example, a company may use
speech analysis for identification and verification of its employees: the
employee has to say a certain sentence into a microphone, and the computer
system captures the speaker's voice, identifies it and verifies the spoken statement.
Another main task of speech analysis is to analyze what has been said, i.e.,
to recognize and understand the speech signal itself. Based on the speech
sequence, the corresponding text is generated. This can lead to a
speech-controlled typewriter, a translation system or part of a workplace for the
physically challenged.
Another area of speech analysis researches speech patterns with respect to
how a certain statement was said. For example, a spoken sentence sounds
different if a person is angry or calm. An application of this research
could be a lie detector.
The primary goal of speech analysis is to determine individual words correctly,
with a probability as close to 1 as possible. In practice, a word is recognized
only with a certain probability.
Environmental noise, room acoustics and a speaker's physical and
psychological condition play an important role.
For example, let's assume an extremely poor individual word recognition
probability of 0.95. This means that 5% of the words are incorrectly
recognized. If we have a sentence with three words, the probability of
recognizing the sentence correctly is 0.95 × 0.95 × 0.95 ≈ 0.857.
This small example should emphasize the fact that a speech analysis system
should have a very high individual word recognition probability.
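The arithmetic above can be reproduced with a short sketch. It assumes that word errors are independent of each other, which the example implies but does not state; the function name is chosen here for illustration only.

```python
def sentence_probability(word_probability, word_count):
    """Probability that every word in a sentence is recognized correctly,
    assuming each word is recognized independently of the others."""
    return word_probability ** word_count


# The three-word case from the text, plus two longer sentences.
for word_count in (3, 10, 20):
    print(word_count, round(sentence_probability(0.95, word_count), 3))
```

Under this assumption the chance of a fully correct sentence falls quickly with sentence length, which is why a high per-word probability matters.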
Speech recognition system
The speech recognition system is divided into system components according
to a basic principle: data reduction through property extraction.
First, speech analysis occurs, where properties must be determined.
[Figure: Speech Recognition System. Speech enters Speech Analysis (parameters, response, property extraction; special chip), followed by Pattern Recognition (comparison with reference, decision; main program), which outputs recognized speech. A Reference Storage holds the properties of learned material.]
Properties are extracted by comparing the characteristics of individual speech
elements with a sequence of speech element characteristics given in advance.
The characteristics are quantified where the given speech elements are present.
Second, the speech elements are compared with existing references to
determine the mapping to one of the existing speech elements. The
identified speech can be stored, transmitted or processed as a
parameterized sequence of speech elements.
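The two steps, property extraction followed by comparison with stored references, can be sketched as follows. This is a toy illustration under stated assumptions, not the method of any particular product: the chosen properties (frame energy and zero-crossing count), the Euclidean distance, the frame size and the two reference signals are all hypothetical.

```python
import math


def extract_properties(samples, frame_size=4):
    """Data reduction: turn raw samples into one (energy, zero-crossing
    count) pair per frame. Trailing samples that do not fill a frame are ignored."""
    properties = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        properties.append((energy, crossings))
    return properties


def distance(props_a, props_b):
    """Euclidean distance between two equally long property sequences."""
    return math.sqrt(sum(
        (ea - eb) ** 2 + (ca - cb) ** 2
        for (ea, ca), (eb, cb) in zip(props_a, props_b)
    ))


def recognize(samples, reference_store):
    """Compare the input with every stored reference and decide on the closest."""
    props = extract_properties(samples)
    return min(
        reference_store,
        key=lambda label: distance(props, reference_store[label]),
    )


# Hypothetical "learned material": two toy signals stored as property sequences.
reference_store = {
    "low_tone": extract_properties([1, 1, 1, 1, 1, 1, 1, 1]),
    "buzz": extract_properties([1, -1, 1, -1, 1, -1, 1, -1]),
}

print(recognize([0.9, 1.1, 1.0, 0.8, 1.0, 1.2, 0.9, 1.0], reference_store))
```

The reference store holds only the reduced properties, not the raw samples, which mirrors the data reduction idea described above.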
Usually the comparison and decision are executed by the main
system processor. The computer's secondary storage contains the
letter-to-phone rules, a dictionary of exceptions and the reference
characteristics. The concrete methods differ in the definition of the
characteristics. The principle of data reduction through property
extraction can be applied several times to different characteristics. A
system which provides recognition and understanding of a speech signal
applies this principle several times:
[Figure: Components of speech recognition and understanding. Stages: acoustical and phonetic analysis (sound pattern, word model), syntactical analysis (syntax), semantic analysis (semantics). Labels: speech (input), recognized speech, understood speech.]
Speech recognition systems are divided into speaker-independent
recognition systems and speaker-dependent recognition systems. With the same
reliability, a speaker-independent system can recognize considerably fewer
words than a speaker-dependent system, because the latter is trained in
advance. Training in advance means that there exists a training phase for
the speech recognition system, which takes half an hour. A speaker-dependent
recognition system can recognize around 25,000 words; a speaker-independent
recognition system can recognize around 500 words, but with a
worse recognition rate. These should be understood as rough guidelines.
Speech Transmission
The area of speech transmission deals with efficient coding to
transmit the speech/sound signal correctly and efficiently over networks,
such that the receiver gets the same quality of speech/sound as was sent.
Some principles are:
Signal form coding
Here no speech-specific properties and parameters are needed.
The goal is to achieve the most efficient coding of the audio signal. The
data rate of a PCM-coded stereo audio signal with CD-quality
requirements is 1,411,200 bits/s.
Telephone quality, in comparison to CD quality, needs only 64
kbit/s. Using DPCM, the data rate can be lowered to 56 kbit/s
without loss of quality.
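The two PCM data rates quoted above follow from sampling rate × bits per sample × number of channels. The sketch below assumes the usual parameters, which the text does not state: 44,100 Hz, 16 bits and two channels for CD quality, and 8,000 Hz, 8 bits and one channel for telephone quality.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed parameters (not given in the text): standard CD audio and
# standard telephone-quality PCM.
cd_rate = pcm_bit_rate(44_100, 16, 2)
phone_rate = pcm_bit_rate(8_000, 8, 1)

print(cd_rate)     # 1411200
print(phone_rate)  # 64000
```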
Recognition/synthesis Methods
There have been attempts to reduce the transmission rate using pure
recognition/synthesis methods. Speech analysis (recognition)
takes place on the sender side of a speech transmission system and
speech synthesis (generation) takes place on the receiver side.
[Figure: Analog speech signal -> Speech recognition -> Coded speech signal -> Speech synthesis -> Analog speech signal]
Conclusion
The major players in the speech recognition market are
Dragon Systems, Lernout & Hauspie (L&H), and IBM. Each
company offers several products, ranging in price and features. Because of
the variety of products available, shopping for a speech recognition system
can be an overwhelming experience.
Dragon's original product, Dragon Dictate, is currently the only product
that uses the discrete speech model. Discrete speech is the best solution for
persons with difficulty in language processing or in fluid speech, or who
form sentences one word at a time, rather than in phrases. The latest version,
3.0 Classic, offers fully functional voice control across all applications. It is
the only current speech recognition product that supports Windows 3.x.
Because it uses discrete speech, it is better than current continuous speech
products at recognizing the speech patterns of persons who naturally pause
between words, and seems to be better at learning to recognize persons with
unique speech patterns. Unfortunately, Dragon Systems has discontinued
development on this product, as the company's focus is now on continuous
speech products, which are more viable in the larger commercial market.
Dragon's current continuous speech product line, known as Dragon
NaturallySpeaking, includes a Standard, Preferred, and Professional edition,
listed in order from low end to high end. The Preferred edition includes
dictation playback and text-to-speech, features that distinguish it from the
Standard edition. The Preferred edition also supports input from an external
recording device, although no recording device is provided. A special
version of the Preferred edition, Dragon NaturallySpeaking Mobile, does
include a digital recording device at additional cost. On the high end of
Dragon's NaturallySpeaking product line, the Professional edition is
distinguished by its expanded macro and scripting capabilities, which allow
users to dictate long sections of text or complex computer operations with
simple commands. The Professional edition also comes in Legal and Medical
versions, which feature custom vocabularies for these disciplines.
L&H products are based on speech recognition technology
developed by Kurzweil, a major pioneer in speech recognition.
The current L&H product line, called VoiceXpress, includes a Standard,
Advanced, and Professional edition. The differences between these editions are
fairly straightforward. In the Standard edition, VoiceXpress's natural
language command interface works only in L&H's own word processing
application, called XpressPad. The Advanced edition extends natural
language support to include Microsoft Word. The Professional edition
further extends natural language support to encompass the entire Microsoft
Office suite, plus Internet Explorer. The Professional edition also provides
support for recorded dictation, and includes a bundled digital recorder.
IBM has been a major player in speech recognition for many
years. Its discrete speech product, IBM VoiceType, was a
major competitor of Dragon Dictate. However, IBM has discontinued this
product and is now focusing all its efforts on developing continuous speech
products. Its current product line, IBM ViaVoice Millennium, includes a
Standard, Web and Professional edition. The Web edition features natural
language commands for Internet Explorer, Netscape Communicator and
America Online. The Web edition also features a specialized vocabulary for
on-line chats. The Professional edition provides most of the features of the
Web edition, but also provides natural language commands for the entire
Microsoft Office suite, and specialized business, finance, and computer
vocabularies.
Although speech recognition got its start as an assistive technology product,
the commercial market has fueled its rapid development in recent years, and
the primary target market of each of the companies described above is now
the general public, rather than persons with disabilities.
A person who has a disability, or who works with persons with disabilities,
should come out of this presentation with a clearer picture of which
speech recognition products will work best for them. There is a lot of
confusion today about speech recognition products. The main focus of this
presentation is to clarify speech recognition technology.