7- Speech Recognition (Cont’d)
DESCRIPTION
7- Speech Recognition (Cont’d). Topics: HMM calculating approaches, neural components, the three basic HMM problems, the Viterbi algorithm, state duration modeling, and training in HMMs. Speech recognition concepts: speech recognition is the inverse of speech synthesis.
TRANSCRIPT
7- Speech Recognition (Cont’d)
HMM Calculating Approaches
Neural Components
Three Basic HMM Problems
Viterbi Algorithm
State Duration Modeling
Training In HMM
Speech Recognition Concepts
Speech recognition is the inverse of speech synthesis:
Speech Synthesis: Text → Phone Sequence → Speech (NLP, then speech processing)
Speech Recognition / Speech Understanding: Speech → Text (speech processing, then NLP)
Speech Recognition Approaches
Bottom-Up Approach
Top-Down Approach
Blackboard Approach
Bottom-Up Approach
[Diagram: Signal Processing → Feature Extraction → Segmentation → Sound Classification Rules → Phonotactic Rules → Lexical Access → Language Model → Recognized Utterance, with knowledge sources such as Voiced/Unvoiced/Silence feeding the segmentation stages]
Top-Down Approach
[Diagram: Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance; knowledge sources: inventory of speech recognition units, word dictionary, grammar, task model]
Blackboard Approach
[Diagram: Environmental, Acoustic, Lexical, Syntactic, and Semantic processes, all communicating through a shared Blackboard]
An overall view of a speech recognition system (bottom-up and top-down). From Ladefoged 2001.
Recognition Theories
Articulatory Based Recognition – uses the articulatory system for recognition; to date, this theory has been the most successful
Auditory Based Recognition – uses the auditory system for recognition
Hybrid Based Recognition – a hybrid of the above theories
Motor Theory – models the intended gestures of the speaker
Recognition Problem
We have the sequence of acoustic symbols and we want to find the words expressed by the speaker.
Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem
A : Acoustic Symbols
W : Word Sequence
We should find $\hat{w}$ so that $P(\hat{w}\mid A) = \max_w P(w\mid A)$
Bayes Rule
$P(x\mid y)\,P(y) = P(x,y)$
$P(x\mid y) = \frac{P(y\mid x)\,P(x)}{P(y)}$
$P(w\mid A) = \frac{P(A\mid w)\,P(w)}{P(A)}$
Bayes Rule (Cont’d)
$P(\hat{w}\mid A) = \max_w P(w\mid A) = \max_w \frac{P(A\mid w)\,P(w)}{P(A)}$
Since $P(A)$ does not depend on $w$:
$\hat{w} = \arg\max_w P(w\mid A) = \arg\max_w P(A\mid w)\,P(w)$
Simple Language Model
$w = w_1 w_2 w_3 \dots w_n$
$P(w) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \dots w_{i-1})$
Computing this probability directly is very difficult and would need a very large database, so trigram and bigram models are used instead.
Simple Language Model (Cont’d)
Trigram : $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1}, w_{i-2})$
Bigram : $P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$
Monogram : $P(w) = \prod_{i=1}^{n} P(w_i)$
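The bigram formula above can be demonstrated by counting over a corpus. A minimal sketch, using a tiny invented corpus and ignoring sentence-boundary handling and smoothing:

```python
# Bigram model P(w) = prod_i P(w_i | w_{i-1}), with probabilities
# estimated as relative frequencies over a toy corpus.
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(sentence):
    """P(w) under the bigram model (no smoothing, no boundary symbols)."""
    words = sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(bigram_prob("the cat sat"))  # P(cat|the) * P(sat|cat) = (2/3) * (1/2)
```

A real system would add start/end symbols and smoothing so that unseen bigrams do not zero out the whole product.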
Simple Language Model (Cont’d)
Computing method: $P(w_3 \mid w_1 w_2) = \frac{\text{number of occurrences of } w_1 w_2 w_3}{\text{total number of occurrences of } w_1 w_2}$
Ad hoc method (linear interpolation): $P(w_3 \mid w_1 w_2) = \lambda_1 f(w_3 \mid w_1 w_2) + \lambda_2 f(w_3 \mid w_2) + \lambda_3 f(w_3)$
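The interpolated estimate can be sketched as follows. The corpus and the interpolation weights are invented for illustration; in practice the weights are tuned on held-out data.

```python
# Interpolated trigram estimate:
#   P(w3|w1,w2) = l1*f(w3|w1,w2) + l2*f(w3|w2) + l3*f(w3)
# where f(.) are relative frequencies and l1 + l2 + l3 = 1.
from collections import Counter

corpus = "a b c a b d a b c b c".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def interp_trigram(w1, w2, w3, l1=0.6, l2=0.3, l3=0.1):
    # Relative frequencies; fall back to 0 when the history was never seen.
    f3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    f2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    f1 = uni[w3] / total
    return l1 * f3 + l2 * f2 + l3 * f1

p = interp_trigram("a", "b", "c")
print(p)  # 0.6*(2/3) + 0.3*(3/4) + 0.1*(3/11)
```

Even when the trigram count is zero, the bigram and unigram terms keep the estimate nonzero, which is the point of the interpolation.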
7- Speech Recognition
Speech Recognition Concepts
Speech Recognition Approaches
Recognition Theories
Bayes Rule
Simple Language Model
P(A|W) Network Types
From Ladefoged 2001
P(A|W) Computing Approaches
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems
Dynamic Time Warping (DTW) Method
To obtain a global distance between two speech patterns, a time alignment must be performed.
Example: a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”.
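The alignment in the example above can be computed with the standard DTW recurrence. A minimal sketch on symbol sequences, using a 0/1 local distance for match/mismatch; a real recognizer would compare acoustic feature vectors with a continuous distance instead.

```python
# Dynamic time warping between two symbol sequences.
# d[i][j] = local cost at (i, j) + min of the three predecessor cells.

def dtw_distance(template, observed):
    """Global DTW distance with (diagonal, vertical, horizontal) steps."""
    n, m = len(template), len(observed)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if template[i - 1] == observed[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j - 1],  # diagonal
                                 d[i - 1][j],      # vertical
                                 d[i][j - 1])      # horizontal
    return d[n][m]

print(dtw_distance("SPEECH", "SsPEEhH"))  # 2.0: "s" and "h" are the mismatches
```

The slide's example yields a global distance of 2.0, since the warping path absorbs the two noisy symbols and aligns everything else exactly.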
Recognition Tasks
Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)
Speaker Dependent and Speaker Independent
Vocabulary Size
– Small: <20
– Medium: >100, <1000
– Large: >1000, <10000
– Very Large: >10000
Error Production FactorError Production Factor
Prosody (Recognition should be Prosody (Recognition should be Prosody Independent)Prosody Independent)Noise (Noise should be prevented)Noise (Noise should be prevented)
Spontaneous SpeechSpontaneous Speech
Artificial Neural Network
Simple computation element of a neural network: inputs $x_0, x_1, \dots, x_{N-1}$ with weights $w_0, w_1, \dots, w_{N-1}$ produce the output
$y = f\left(\sum_{i=0}^{N-1} w_i x_i\right)$
Artificial Neural Network (Cont’d)
Neural Network Types
– Perceptron
– Time Delay
– Time Delay Neural Network (TDNN) Computational Element
Artificial Neural Network (Cont’d)
[Diagram: single layer perceptron with inputs $x_0, \dots, x_{N-1}$ and outputs $y_0, \dots, y_{M-1}$]
Artificial Neural Network (Cont’d)
[Diagram: three layer perceptron]
Hybrid Methods
Hybrid neural network and matched filter for recognition
[Diagram: speech → acoustic features → delays → pattern classifier → output units]
Neural Network Properties
The system is simple, but heavily iterative
Does not determine a specific structure
Regardless of its simplicity, the results are good
The training set is large, so training should be offline
Accuracy is relatively good
Hidden Markov Model
Observations : $O = O_1, O_2, \dots, O_t$
States in time : $q = q_1, q_2, \dots, q_t$
All states : $s_1, s_2, \dots$
[Diagram: transition from state $S_i$ to state $S_j$ with probability $a_{ij}$]
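The notation above can be made concrete with a tiny example. A minimal sketch, with a two-state transition matrix whose entries are invented for illustration:

```python
# HMM transition notation: a[i][j] = P(q_{t+1} = s_j | q_t = s_i).
# The state names and probabilities below are made up.

states = ["s1", "s2"]
a = [[0.7, 0.3],   # transitions out of s1
     [0.4, 0.6]]   # transitions out of s2

# Each row must be a probability distribution over the next state.
for row in a:
    assert abs(sum(row) - 1.0) < 1e-9

# Probability of the state path s1 -> s1 -> s2:
p_path = a[0][0] * a[0][1]
print(p_path)  # 0.7 * 0.3
```

The observation sequence $O_1, \dots, O_t$ is what we see; the state path $q_1, \dots, q_t$ is hidden, which is what the three basic HMM problems on the next slides are about.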