speech recognition system seminar
![Page 1: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/1.jpg)
SPEECH RECOGNITION SYSTEMS
TWINKLE SAHU CSE 6TH SEM
![Page 2: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/2.jpg)
INTRODUCTION
• Speech recognition is the process by which a computer takes a speech signal (recorded using a microphone) and converts it into words, often in real time. The process follows a series of steps, and the software responsible for it is known as a ‘Speech Recognition System’.
• SR systems are commonly deployed as dictation software and as intelligent assistants on personal computers, smartphones, web browsers and many other devices.
![Page 3: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/3.jpg)
CHALLENGES IN THE DESIGN OF SR SYSTEMS
SR systems have to deal with a large number of challenges, such as:
• The speaker’s voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
• A speaker may use any of a large number of different words, and all of them have to be recognized accurately.
• Accents vary from person to person, which is a major challenge.
• A speaker may speak very quickly, and every word spoken still has to be recognized accurately.
![Page 4: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/4.jpg)
TYPES OF SR SYSTEMS
• Speaker Dependent SR systems : Work by learning the unique characteristics of a single person’s voice and depend on the speaker for training.
• Speaker Independent SR systems: Designed to recognize anyone’s voice, so no training by the individual user is required.
![Page 5: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/5.jpg)
BASIC PRINCIPLES OF SPEECH RECOGNITION
• The smallest unit of sound that distinguishes one word from another is known as a phoneme.
• The English language contains approximately 44 phonemes representing all the vowels and consonants that we use for speech.
• For example, a typical word such as "moon" can be broken down into three phonemes: m, ue, n.
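A recognizer relates words to phoneme strings through a pronunciation dictionary (lexicon). The sketch below is a minimal illustration, assuming a tiny hypothetical lexicon written in the slides' own notation ("ue" for the vowel in "moon", "ae" for the vowel in "pain"); the names `LEXICON` and `to_phonemes` are invented for this example.

```python
# Hypothetical mini-lexicon using the slides' phoneme notation.
LEXICON: dict[str, list[str]] = {
    "moon": ["m", "ue", "n"],
    "pain": ["p", "ae", "n"],
}

def to_phonemes(word: str) -> list[str]:
    """Look up the phoneme sequence for a word (case-insensitive)."""
    try:
        return LEXICON[word.lower()]
    except KeyError:
        raise KeyError(f"'{word}' is not in the lexicon") from None

print(to_phonemes("moon"))       # ['m', 'ue', 'n']
print(len(to_phonemes("Moon")))  # 3
```

Real lexicons hold tens of thousands of entries and often list several pronunciations per word.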
![Page 6: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/6.jpg)
• To interpret speech, we must have a way of identifying the components of spoken words, and phonemes act as identifying markers within speech.
• An algorithm is then needed to interpret the speech further. The Hidden Markov Model (HMM) is a commonly used mathematical model for this.
• To build a speech recognition engine, a model is created for each phoneme, and these models are stored in a large database.
• During recognition, each spoken phoneme is compared against the stored models, the most likely match is chosen, and the result is passed on for further processing (such as word and sentence recognition).
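The "most likely match" step above can be sketched as picking the stored phoneme model that assigns the highest likelihood to an observed feature vector. This is a simplification under stated assumptions: each phoneme is modelled here by a single diagonal Gaussian over a made-up 2-D feature vector, whereas an HMM-based engine uses several states per phoneme. `PHONEME_MODELS`, its numbers and the sample `frames` are all hypothetical.

```python
import math

# Hypothetical "database" of phoneme models: (means, variances) of one
# diagonal Gaussian per phoneme over an invented 2-D acoustic feature.
PHONEME_MODELS = {
    "p":  ([1.0, 0.2], [0.1, 0.1]),
    "ae": ([0.3, 1.5], [0.2, 0.2]),
    "n":  ([0.8, 0.9], [0.1, 0.3]),
}

def log_likelihood(x, mean, var):
    """Log density of feature vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def best_match(x):
    """Return the phoneme whose stored model gives x the highest likelihood."""
    return max(PHONEME_MODELS, key=lambda ph: log_likelihood(x, *PHONEME_MODELS[ph]))

frames = [[0.95, 0.25], [0.35, 1.4], [0.8, 0.85]]
print([best_match(f) for f in frames])  # ['p', 'ae', 'n']
```

Working in log-likelihoods keeps the comparison numerically stable when many small probabilities are combined.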
![Page 7: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/7.jpg)
COMPONENTS OF SPEECH RECOGNITION
• Corpus Collection: a database of speech data built from multiple speech samples.
![Page 8: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/8.jpg)
• Corpus construction for a speaker-dependent SR system:
![Page 9: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/9.jpg)
• Corpus construction for a speaker-independent SR system:
![Page 10: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/10.jpg)
• Signal Analyzer: analyses the speech signal and removes background noise, so that further processing focuses only on the speaker’s speech.
• Acoustic Model: identifies phonemes from the speech sample using a probability-based mathematical model.
(Figure: acoustic model)
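The signal analyzer's noise removal can be illustrated with one of the simplest possible techniques, an energy-based noise gate. This is a minimal sketch, not the method of any particular SR system (practical systems typically use spectral techniques such as spectral subtraction); the function name `noise_gate`, the frame length, the threshold ratio and the assumption that the quietest frame contains only noise are all hypothetical choices.

```python
import math

def noise_gate(samples, frame_len=4, ratio=2.0):
    """Zero out frames whose RMS energy is not above `ratio` times the noise floor.

    The noise floor is estimated as the RMS of the quietest frame, which
    assumes (crudely) that at least one frame contains only background noise.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    threshold = ratio * min(rms)
    out = []
    for f, energy in zip(frames, rms):
        # Keep frames that stand out from the noise floor; silence the rest.
        out.extend(f if energy > threshold else [0.0] * len(f))
    return out

# Hypothetical signal: low-level noise, a burst of "speech", then noise again.
signal = [0.01, -0.01, 0.01, -0.01, 0.5, -0.4, 0.6, -0.5, 0.02, -0.01, 0.01, -0.02]
cleaned = noise_gate(signal)
print(cleaned)
```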
![Page 11: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/11.jpg)
• Language Model: identifies the words, and thus the sentences, uttered by the speaker from the phonemes, using a dictionary file and a grammar file.
(Figures: dictionary file and grammar file)
![Page 12: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/12.jpg)
PROCESS OF SPEECH RECOGNITION
(Figure: the spoken word "pain" is passed to the speech analyzer.)
![Page 13: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/13.jpg)
(Figure: speech analyzer output, /p/--/ae/--/n/.)
![Page 14: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/14.jpg)
(Figure: acoustic model; the segments /p/--/ae/--/n/ are checked against a trained hidden Markov model and confirmed as correct.)
![Page 15: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/15.jpg)
(Figure: language model; /p/--/ae/--/n/ is looked up in the dictionary file and the grammar file, producing the text output "pain".)
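The language-model step of the pipeline (phonemes /p/--/ae/--/n/ to the word "pain") can be sketched as a dictionary lookup followed by a grammar check. In this minimal sketch the "dictionary file" and "grammar file" are hypothetical in-memory stand-ins, and the grammar is simplified to a set of allowed words; real grammar files, for example in the JSpeech Grammar Format (JSGF), describe whole permitted word sequences.

```python
# Hypothetical dictionary file contents: word -> phoneme sequence (slides' notation).
DICTIONARY = {
    "pain": ("p", "ae", "n"),
    "moon": ("m", "ue", "n"),
}
# Hypothetical grammar: the set of words this application accepts.
GRAMMAR = {"pain"}

# Invert the dictionary so a phoneme sequence can be looked up directly.
PHONEMES_TO_WORD = {phones: word for word, phones in DICTIONARY.items()}

def decode(phonemes):
    """Map a recognized phoneme sequence to a word allowed by the grammar."""
    word = PHONEMES_TO_WORD.get(tuple(phonemes))
    if word is None or word not in GRAMMAR:
        return None  # no valid word: reject rather than guess
    return word

print(decode(["p", "ae", "n"]))  # pain
print(decode(["m", "ue", "n"]))  # None (in the dictionary but not in the grammar)
```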
![Page 16: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/16.jpg)
The Grammar File
![Page 17: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/17.jpg)
HIDDEN MARKOV MODEL
• Markov models are an excellent way of abstracting simple concepts into a form that is relatively easy to compute.
• They are used in applications ranging from data compression to sound recognition.
From this graph we can create sequences such as:
• N1 N2 N3
• N1 N2 N2 N2 N3 N3 N3 N3 N3
• N1 N1 N2 N2 N3
![Page 18: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/18.jpg)
N1 N2 N3 = 0.4 × 0.8 × 0.5 = 0.16
N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 × 0.2 × 0.2 × 0.8 × 0.5 × 0.5 × 0.5 × 0.5 = 0.0008
N1 N1 N2 N2 N3 = 0.6 × 0.4 × 0.2 × 0.8 × 0.5 = 0.0192
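The probability of a state sequence in a Markov chain is the product of the transition probabilities along its path. The sketch below assumes transition values read off the worked products above (N1→N1 0.6, N1→N2 0.4, N2→N2 0.2, N2→N3 0.8, N3→N3 0.5) and treats the final 0.5 as a hypothetical exit probability from N3 (the "END" transition); the graph itself is only in the slide image, so these are assumptions. Under this reading the number of 0.5 factors depends on the length of the N3 run, so the long second sequence depends on graph details that only the image shows and is not reproduced here.

```python
from math import prod

# Assumed transition probabilities; ("N3", "END") is a hypothetical exit transition.
TRANSITIONS = {
    ("N1", "N1"): 0.6, ("N1", "N2"): 0.4,
    ("N2", "N2"): 0.2, ("N2", "N3"): 0.8,
    ("N3", "N3"): 0.5, ("N3", "END"): 0.5,
}

def sequence_probability(states):
    """Probability of a path that starts in states[0] (assumed certain) and then ends."""
    path = list(states) + ["END"]
    # Multiply the probability of every consecutive (from, to) transition.
    return prod(TRANSITIONS.get(pair, 0.0) for pair in zip(path, path[1:]))

print(round(sequence_probability("N1 N2 N3".split()), 4))        # 0.16
print(round(sequence_probability("N1 N1 N2 N2 N3".split()), 4))  # 0.0192
```

Transitions missing from the table get probability 0, so impossible paths (such as N3 back to N1) score 0.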
![Page 19: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/19.jpg)
This accommodates pronunciations such as:
• t ow m aa t ow - British English
• t ah m ey t ow - American English
• t ah m ey t a - possible pronunciation when speaking quickly
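A pronunciation network handles such variants compactly: each position lists the phonemes allowed there, and every path through the network is an accepted pronunciation. This is a minimal sketch assuming a hypothetical unweighted network for "tomato" built from the three variants above; a real network would attach probabilities to each branch, as in the Markov model.

```python
from itertools import product

# Hypothetical pronunciation network for "tomato": one list of alternatives per slot.
TOMATO_NETWORK = [["t"], ["ow", "ah"], ["m"], ["aa", "ey"], ["t"], ["ow", "a"]]

def pronunciations(network):
    """Enumerate every path through the network as a space-separated string."""
    return [" ".join(path) for path in product(*network)]

variants = pronunciations(TOMATO_NETWORK)
print(len(variants))                 # 8
print("t ah m ey t ow" in variants)  # True
```

Note that the unweighted network also admits combinations not listed above (for example "t ow m ey t a"), which is one reason practical systems weight the branches.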
![Page 20: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/20.jpg)
With sentences such as:
• I like apple juice - very probable
• I like tomato juice - very improbable!
• I hate apple juice - relatively improbable
• I hate tomato juice - relatively probable
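One common way to assign such sentence probabilities is a bigram language model, where each word's probability depends only on the word before it. The slides do not say which model was used, so this is a swapped-in illustration: the `BIGRAMS` values are invented purely to reproduce the ranking above, and unseen bigrams simply score 0 (real models apply smoothing instead).

```python
from math import prod

# Hypothetical bigram probabilities P(next word | previous word).
BIGRAMS = {
    ("i", "like"): 0.5, ("i", "hate"): 0.2,
    ("like", "apple"): 0.6, ("like", "tomato"): 0.05,
    ("hate", "apple"): 0.1, ("hate", "tomato"): 0.4,
    ("apple", "juice"): 0.5, ("tomato", "juice"): 0.3,
}

def sentence_score(sentence):
    """Product of bigram probabilities; unseen bigrams get probability 0."""
    words = sentence.lower().split()
    return prod(BIGRAMS.get(pair, 0.0) for pair in zip(words, words[1:]))

sentences = ["I like apple juice", "I like tomato juice",
             "I hate apple juice", "I hate tomato juice"]
for s in sorted(sentences, key=sentence_score, reverse=True):
    print(f"{s}: {sentence_score(s):.4f}")
```

With these numbers the ranking is "I like apple juice" first and "I like tomato juice" last, matching the slide's qualitative labels.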
![Page 21: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/21.jpg)
• The Markov model makes speech recognition systems more intelligent, i.e. it helps them differentiate between similar-sounding phrases, as in the case of:
"James's school..." vs. "James is cool"
• In a simple Markov model (a Markov chain), the state is directly visible to the observer.
• In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible.
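Because the states of a hidden Markov model are not visible, a recognizer has to infer the most likely hidden-state sequence from the visible outputs; the standard technique for this is the Viterbi algorithm. This is a minimal sketch on a toy two-state HMM (hidden "silence"/"speech" states, visible "quiet"/"loud" outputs); all names and probabilities are hypothetical, and it multiplies plain probabilities for readability, whereas practical implementations use log probabilities to avoid underflow.

```python
# Toy HMM: hidden states are never observed directly, only a loudness label is.
STATES = ["silence", "speech"]
START = {"silence": 0.6, "speech": 0.4}
TRANS = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.3, "speech": 0.7},
}
EMIT = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

def viterbi(observations):
    """Return the most likely hidden-state path and its probability."""
    # best[s]: probability of the best path ending in s; paths[s]: that path.
    best = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    paths = {s: [s] for s in STATES}
    for obs in observations[1:]:
        new_best, new_paths = {}, {}
        for s in STATES:
            # Choose the predecessor that maximizes the path probability into s.
            prev = max(STATES, key=lambda p: best[p] * TRANS[p][s])
            new_best[s] = best[prev] * TRANS[prev][s] * EMIT[s][obs]
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    last = max(STATES, key=best.get)
    return paths[last], best[last]

path, p = viterbi(["quiet", "loud", "loud", "quiet"])
print(path)  # ['silence', 'speech', 'speech', 'silence']
```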
![Page 22: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/22.jpg)
PERFORMANCE OF A SR SYSTEM
• Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor (RTF).
• Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).
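WER is conventionally computed as the word-level edit distance between the reference transcript and the recognizer's output (substitutions + deletions + insertions), divided by the number of reference words; RTF is processing time divided by audio duration. The sketch below implements both under those standard definitions; the function names are illustrative, and it assumes a non-empty reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means the system runs faster than real time."""
    return processing_seconds / audio_seconds

print(word_error_rate("i like apple juice", "i like tomato juice"))  # 0.25
print(real_time_factor(1.5, 3.0))  # 0.5
```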
![Page 23: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/23.jpg)
Factors affecting the accuracy of an SR system:
• Vocabulary size and confusability
• Speaker dependence vs. independence
• Isolated, discontinuous, or continuous speech
• Task and language constraints
• Read vs. spontaneous speech
• Adverse conditions
![Page 24: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/24.jpg)
APPLICATIONS
• Health care
• Military - high-performance aircraft, air traffic control systems
• Telephony - smartphones, customer helpline services
• Personal computers
![Page 25: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/25.jpg)
SIRI AND GOOGLE NOW
Siri is an intelligent personal assistant developed by Apple.
Google Now is an intelligent personal assistant developed by Google.
Both use a combination of speaker-dependent and speaker-independent SR systems.
![Page 26: Speech recognition system seminar](https://reader033.vdocuments.us/reader033/viewer/2022061221/54bde6f94a79594b0e8b45f7/html5/thumbnails/26.jpg)
CONCLUSION
• Speech recognition systems are an indispensable part of the ever-advancing field of human-computer interaction.
• Further research is needed to tackle the challenges these systems still face.
Thank You!