seminar speech

8/7/2019 Seminar Speech

-: SPEECH RECOGNITION :-

Introduction

One does not have to be a scientist to know that the computer of the future
will talk, listen and understand. One such computer is the Apple Macintosh of
today. Apple's Speech Recognition and Speech Synthesis technologies now give
speech-savvy applications the power to carry out your voice commands and even
speak back to you in plain English.

Apple Speech Recognition lets the system (Macintosh) understand what you say,
giving you a new dimension for interacting with and controlling your computer
by voice. You don't even have to train it to understand your voice, because it
already understands you from your very first word. You can speak naturally,
without pausing or stopping. Apple's leadership in speech recognition
technology makes this possible by bringing a whole new dimension to the user
interface: speech. Combined with VoiceOver, speech synthesis will help turn
the graphical user interface into a vocal user interface.

Speech recognition (in many contexts also known as 'automatic speech
recognition', computer speech recognition or, erroneously, as voice
recognition) is the process of converting a speech signal to a sequence of
words by means of an algorithm implemented as a computer program.

Speech recognition applications that have emerged over the last years include
voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a
collect call"), simple data entry (e.g., entering a credit card number), and
preparation of structured documents (e.g., a radiology report).

Voice verification or speaker recognition is a related process that attempts
to identify the person speaking, as opposed to what is being said.

Speech Technology Development at IBM:

The overall view, with emphasis on Via-Scribe and Accessibility

Speech technologies development, deployments

Technology                        Applications

Large Vocabulary Speech           Broadcast news transcription, content
Recognition                       spotting and indexing, Via-Scribe, MALACH,
                                  DARPA projects

Telephony Speech Recognition      Mutual funds transactions, contact center
(+ natural language               call routing, contact center analytics
understanding)

Embedded Speech Recognition       Embedded speech in telematics (e.g.,
(+ multimodal input)              vehicles), devices (e.g., cell phones, PDAs,
                                  etc.) and other consumer appliances (e.g.,
                                  set top boxes, DVD players)

Audio Visual Speech               Improved ASR on trading floor
Recognition

Conversational Biometrics         Speaker identification, speaker verification

Text to Speech Synthesis          Home Page Reader, ViaVoice

Machine Translation               MASTOR, DARPA projects, WebSphere

Speech Analytics: Automated Quality Assurance Application

Monitor 100% of calls

Download recorded calls daily from across North America

Answer questions and assign default ratings

Provide a ranked list to human monitors to focus on bad calls

Speech recognition is the process of converting an acoustic signal, captured by

a microphone or a telephone, to a set of words. The recognized words can be the

final results, as for applications such as commands & control, data entry, and

document preparation. They can also serve as the input to further linguistic

processing in order to achieve speech understanding.

An isolated-word speech recognition system requires that the speaker pause

briefly between words, whereas a continuous speech recognition system does not.

Spontaneous, or extemporaneously generated, speech contains disfluencies, and is

much more difficult to recognize than speech read from script. Some systems

require speaker enrollment, in which a user must provide samples of his or her speech

before using them, whereas other systems are said to be speaker-independent, in

that no enrollment is necessary. Some of the other parameters depend on the

specific task. Recognition is generally more difficult when vocabularies are large

or have many similar-sounding words. When speech is produced in a sequence of

words, language models or artificial grammars are used to restrict the combination

of words.
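The idea of restricting word combinations with an artificial grammar can be sketched as follows. This is only a minimal illustration, assuming a toy grammar that records which word may follow which; the function names and the example phrases are hypothetical and are not taken from any product described here.

```python
from collections import defaultdict


def build_bigram_grammar(sentences):
    """Collect which word may follow which, from example sentences."""
    allowed = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            allowed[prev].add(nxt)
    return allowed


def is_allowed(grammar, sentence):
    """Return True if every adjacent word pair appears in the grammar."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return all(nxt in grammar[prev] for prev, nxt in zip(words, words[1:]))


def pick_hypothesis(grammar, hypotheses):
    """Keep the first recognizer hypothesis that the grammar permits."""
    for hypothesis in hypotheses:
        if is_allowed(grammar, hypothesis):
            return hypothesis
    return None


# Hypothetical command phrases for a voice-dialing task.
grammar = build_bigram_grammar(["call home", "call the office", "dial the office"])

# "call foam" sounds similar to "call home" but is not a permitted sequence.
best = pick_hypothesis(grammar, ["call foam", "call home"])
print(best)  # call home
```

A real language model would usually assign probabilities to word sequences rather than a yes/no decision, but the restricting effect on similar-sounding words is the same.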

Speech recognition is a technology that is constantly evolving. It is
experiencing tremendous growth in the commercial market, apart from its
original niche as an assistive technology product. There are presently three
major companies with speech recognition products: Dragon Systems, Lernout &
Hauspie (L&H), and IBM. Stiff competition between these companies, and more
demand from consumer and business markets, have led to a tremendous drop in
prices over the last few years. Competition has also fueled the development of
a plethora of new products. Each company has several products available,
ranging in price, features, and the applications that they support. This paper
seeks to make sense of the overwhelming array of products so that persons who
are shopping for speech recognition will have a better understanding of their
choices.

What are the Types of Speech Recognition?

*Discrete

Slower dictation process - better for persons with difficulty in language

processing or in fluid speech

Word-by-word style, rather than phrases, reflects the way beginning writers

form sentences

*Continuous

Processes speech by phrase

Takes context into account

Is less accurate if phrases are interrupted

Advantages: Speed and accuracy (for most users)

Who Can Benefit from Speech Recognition?

Persons with mobility impairments or injuries that prevent keyboard access

Persons who have or who are seeking to prevent repetitive stress injuries

Persons with writing difficulties

Any person who wants hands-free access to the computer

Any person who wants to increase their typing speed (reportedly up to 160 wpm)

What is Required to Use Speech Recognition?

A Powerful Computer

Consistent Speech (not necessarily intelligible)

Fluid speech (i.e., not pausing between words) desirable for use of

continuous speech products

Patience

Basic knowledge of computers

Fairly high cognitive ability

Applications of speech recognition

Command recognition - Voice user interface with the computer

Dictation

Interactive Voice Response

Automotive speech recognition

Medical Transcription

Pronunciation Teaching in computer-aided language learning applications

Automatic Translation

Hands-free computing

Speech Analysis

Speech analysis/input deals with the following research areas:

[Figure: Speech analysis divides into three questions: Who? (verification,
identification), What? (recognition, understanding) and How?]

Human speech has certain characteristics determined by the speaker. Hence,
speech analysis can serve to analyze who is speaking, i.e., to recognize a
speaker for his or her identification and verification. The computer
identifies and verifies the speaker using an acoustic fingerprint. An acoustic
fingerprint is a digitally stored speech sample of a person. For example, a
company may use speech analysis for identification and verification of its
employees: the employee has to say a certain sentence into a microphone, and
the computer system captures the speaker's voice, identifies it and verifies
the spoken statement.

Another main task of speech analysis is to analyze what has been said, i.e.,
to recognize and understand the speech signal itself. Based on the speech
sequence, the corresponding text is generated. This can lead to a
speech-controlled typewriter, a translation system or part of a workplace for
the physically challenged.
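The difference between verifying and identifying a speaker can be sketched with a small example. It assumes that each acoustic fingerprint has already been reduced to a short, non-zero feature vector, and it uses cosine similarity with an arbitrary threshold as the comparison; the text does not say how fingerprints are actually compared, so the vectors, names and threshold below are purely hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two non-zero feature vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(sample, enrolled_print, threshold=0.95):
    """Verification: does the sample match ONE claimed identity?"""
    return cosine_similarity(sample, enrolled_print) >= threshold


def identify_speaker(sample, enrolled_prints):
    """Identification: which of ALL enrolled speakers matches best?"""
    return max(
        enrolled_prints,
        key=lambda name: cosine_similarity(sample, enrolled_prints[name]),
    )


# Hypothetical stored fingerprints of two employees.
enrolled = {"alice": [1.0, 0.2, 0.1], "bob": [0.1, 0.9, 0.4]}
sample = [0.9, 0.25, 0.1]  # features of the sentence just spoken

print(verify_speaker(sample, enrolled["alice"]))  # claim "I am Alice"
print(verify_speaker(sample, enrolled["bob"]))    # claim "I am Bob"
print(identify_speaker(sample, enrolled))
```

Verification answers a yes/no question about one claimed identity, while identification searches the whole set of stored fingerprints.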

Another area of speech analysis researches speech patterns with respect to how
a certain statement was said. For example, a spoken sentence sounds different
if a person is angry or calm. One application of this research could be a lie
detector.

The primary goal of speech analysis is to correctly determine individual
words, ideally with probability 1. In practice, a word is recognized only with
a certain probability. Environmental noise, room acoustics and a speaker's
physical and psychological condition play an important role.

For example, let us assume an extremely bad individual word recognition
probability of 0.95. This means that 5% of the words are incorrectly
recognized. If we have a sentence with three words, the probability of
recognizing the whole sentence correctly is 0.95 × 0.95 × 0.95 ≈ 0.857.

This small example should emphasize the fact that a speech analysis system
should have a very high individual word recognition probability.
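The arithmetic above can be checked with a few lines of code. The sketch assumes, as the example implicitly does, that recognition errors on different words are independent of each other; the function names are illustrative only.

```python
def sentence_accuracy(word_accuracy, num_words):
    """Probability that every word of a sentence is recognized correctly,
    assuming independent errors per word."""
    return word_accuracy ** num_words


def required_word_accuracy(target_sentence_accuracy, num_words):
    """Per-word accuracy needed to reach a target sentence accuracy."""
    return target_sentence_accuracy ** (1.0 / num_words)


# The three-word example from the text.
print(round(sentence_accuracy(0.95, 3), 3))  # 0.857

# Longer sentences degrade quickly at the same per-word accuracy.
for length in (3, 10, 20):
    print(length, round(sentence_accuracy(0.95, length), 3))
```

The second function shows the same relation from the other direction: the longer the sentence, the closer the per-word probability has to be to 1 to keep the sentence-level result acceptable.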

Speech recognition system

The speech recognition system is divided into system components according to a
basic principle: data reduction through property extraction. First, speech
analysis occurs, where properties must be determined.

[Figure: Speech Recognition System. Speech enters speech analysis (parameters,
response, property extraction; performed on a special chip), followed by
pattern recognition (comparison with reference, decision; performed by the
main program), which yields the recognized speech. A reference storage holds
the properties of learned material.]

Properties are extracted by comparing the characteristics of individual speech
elements with a sequence of speech element characteristics given in advance.

Second, the speech elements are compared with existing references to determine
the mapping to one of the existing speech elements. The identified speech can
be stored, transmitted or processed as a parameterized sequence of speech
elements.

Usually the comparison and decision are executed by the main system processor.
The computer's secondary storage contains the letter-to-phone rules, a
dictionary of exceptions and the reference characteristics. The concrete
methods differ in the definition of the characteristics. The principle of data
reduction through property extraction can be applied several times to
different characteristics. A system that provides recognition and
understanding of a speech signal applies this principle several times:

[Figure: Components of speech recognition and understanding. Speech passes
through acoustical and phonetic analysis (sound pattern, word model),
syntactical analysis (syntax) and semantic analysis (semantics); the outputs
are recognized speech and understood speech.]
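The "comparison with reference, decision" step described above can be sketched as a nearest-reference search. This is a minimal illustration under strong assumptions: each speech element has already been reduced to a small feature vector, Euclidean distance is used as the comparison, and the reference values are invented for the example. None of these choices come from the text.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, references, max_distance=None):
    """Compare extracted features with every stored reference and decide.

    Returns (label, distance). The label is None when even the closest
    reference is farther away than max_distance.
    """
    best_label, best_distance = None, float("inf")
    for label, reference in references.items():
        distance = euclidean(features, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if max_distance is not None and best_distance > max_distance:
        return None, best_distance
    return best_label, best_distance


# Reference storage: properties of learned material (hypothetical values).
references = {"yes": [0.9, 0.1, 0.3], "no": [0.2, 0.8, 0.5]}

# Properties extracted from the incoming speech element.
label, distance = recognize([0.85, 0.15, 0.25], references)
print(label)
```

Reducing the signal to a few properties before comparing is what keeps the reference storage and the comparison step small, which is the point of the data reduction principle.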


Speech recognition systems are divided into speaker-independent recognition
systems and speaker-dependent recognition systems. At the same reliability, a
speaker-independent system can recognize substantially fewer words than a
speaker-dependent system, because the latter is trained in advance. Training
in advance means that there is a training phase for the speech recognition
system, which takes half an hour. A speaker-dependent recognition system can
recognize around 25,000 words; a speaker-independent recognition system can
recognize around 500 words, but with a worse recognition rate. These figures
should be understood as rough guidelines.

Speech Transmission

The area of speech transmission deals with efficient coding, so that the
speech/sound signal is transmitted correctly and efficiently over networks and
the same quality of speech/sound is preserved at the receiver. Some principles
are:

Signal form coding

Here no speech-specific properties and parameters are needed. The goal is to
achieve the most efficient coding of the audio signal. The data rate of a
PCM-coded stereo audio signal with CD-quality requirements is 1,411,200
bits/s.

Telephony quality, in comparison to CD quality, needs only 64 kbits/s. Using
DPCM, the data rate can be lowered to 56 kbits/s without loss of quality.
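The data rates quoted above follow directly from sample rate, sample width and channel count. The sketch below uses the standard CD parameters (44.1 kHz, 16 bits, two channels) and the usual telephone PCM parameters (8 kHz, 8 bits, one channel). The 7-bit sample width for DPCM is an assumption made here to show one way of arriving at 56 kbits/s; the text does not state it.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)       # CD-quality stereo
telephone_rate = pcm_bit_rate(8_000, 8, 1)  # telephone-quality mono

# Assumption: DPCM stores 7-bit differences instead of 8-bit samples.
dpcm_rate = pcm_bit_rate(8_000, 7, 1)

print(cd_rate)         # 1411200
print(telephone_rate)  # 64000
print(dpcm_rate)       # 56000
```

Coding the difference between neighboring samples usually needs fewer bits than coding each sample on its own, which is where the saving comes from.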

Recognition/synthesis Methods

There have been attempts to reduce the transmission rate using pure
recognition/synthesis methods. Speech analysis (recognition) takes place on
the sender side of a speech transmission system and speech synthesis
(generation) takes place on the receiver side.

[Figure: Analog speech signal -> speech recognition -> coded speech signal ->
speech synthesis -> analog speech signal]
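A rough estimate shows why a recognition/synthesis scheme is attractive for transmission. The sketch assumes that the coded signal is simply the recognized text, and it uses invented round numbers for speaking rate and word length; these figures are assumptions for illustration, not measurements from the text.

```python
def text_bit_rate(words_per_minute, chars_per_word, bits_per_char=8):
    """Bits per second needed to send recognized speech as plain text."""
    chars_per_second = words_per_minute * chars_per_word / 60
    return chars_per_second * bits_per_char


# Assumed: 150 words per minute, 6 characters per word including the space.
coded_rate = text_bit_rate(150, 6)

# Telephone-quality signal form coding, as quoted in the text.
telephone_rate = 64_000

print(coded_rate)                   # 120.0
print(telephone_rate / coded_rate)  # how many times smaller the coded stream is
```

The price of this reduction is that everything not captured by the recognizer, such as the speaker's voice and intonation, has to be regenerated by the synthesizer on the receiver side.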

Conclusion

The major players in the speech recognition market are Dragon Systems, Lernout
& Hauspie (L&H), and IBM. Each company offers several products, ranging in
price and features. Because of the variety of products available, shopping for
a speech recognition system can be an overwhelming experience.

Dragon's original product, Dragon Dictate, is currently the only product that
uses the discrete speech model. Discrete speech is the best solution for
persons with difficulty in language processing or in fluid speech, or who form
sentences one word at a time rather than in phrases. The latest version, 3.0
Classic, offers fully functional voice control across all applications. It is
the only current speech recognition product that supports Windows 3.x. Because
it uses discrete speech, it is better than current continuous speech products
at recognizing the speech patterns of persons who naturally pause between
words, and seems to be better at learning to recognize persons with unique
speech patterns. Unfortunately, Dragon Systems has discontinued development of
this product, as the company's focus is now on continuous speech products,
which are more viable in the larger commercial market.

Dragon's current continuous speech product line, known as Dragon

NaturallySpeaking, includes a Standard, Preferred, and Professional edition,

listed in order from low end to high end. The Preferred edition includes

dictation playback and text-to-speech, features that distinguish it from the

Standard edition. The Preferred edition also supports input from an external

recording device, although no recording device is provided. A special

version of the Preferred edition, Dragon NaturallySpeaking Mobile, does

include a digital recording device for additional cost. On the high end of

Dragon's NaturallySpeaking product line, the Professional edition is

distinguished by its expanded macro and scripting capabilities, which allow

users to dictate long sections of text or complex computer operations with

simple commands. The Professional edition also comes in Legal and Medical

versions, which feature custom vocabularies for these disciplines.

L&H products are based on speech recognition technology

developed by Kurzweil, a major pioneer in speech recognition.

The current L&H product line, called VoiceXpress, includes a Standard,

Advanced, and Professional edition. The differences in these editions are

fairly straightforward. In the Standard edition, VoiceXpress's natural

language command interface works only in L&H's own word processing

application, called XpressPad. The Advanced edition extends natural

language support to include Microsoft Word. The Professional edition

further extends natural language support to encompass the entire Microsoft

Office suite, plus Internet Explorer. The Professional edition also provides

support for recorded dictation, and includes a bundled digital recorder.

IBM has been a major player in speech recognition for many

years. Its discrete speech product, IBM VoiceType, was a

major competitor of Dragon Dictate. However, IBM has discontinued this

product and is now focusing all its efforts on developing continuous speech

products. Its current product line, IBM ViaVoice Millennium, includes a

Standard, Web and Professional edition. The Web edition features natural

language commands for Internet Explorer, Netscape Communicator and

America Online. The Web edition also features a specialized vocabulary for

on-line chats. The Professional edition provides most of the features of the

Web edition, but also provides natural language commands for the entire

Microsoft Office suite, and specialized business, finance, and computer

vocabularies.

Although speech recognition got its start as an assistive technology product,

the commercial market has fueled its rapid development in recent years, and

the primary target market of each of the companies described above is now

the general public, rather than persons with disabilities.

A person who has a disability, or who works with persons with disabilities,
should come away from this presentation with a more accurate picture of which
speech recognition products will work best for them. There is a lot of
confusion today about speech recognition products. The main focus of this
presentation is to clarify speech recognition technology.

