7- Speech Recognition
Outline:
- Speech Recognition Concepts
- Speech Recognition Approaches
- Recognition Theories
- Bayes Rule
- Simple Language Model
- P(A|W) Network Types
7- Speech Recognition (Cont’d)
- HMM Calculating Approaches
- Neural Components
- Three Basic HMM Problems
- Viterbi Algorithm
- State Duration Modeling
- Training in HMM
Recognition Tasks
- Isolated Word Recognition (IWR)
- Connected Word (CW) and Continuous Speech Recognition (CSR)
- Speaker Dependent, Multiple Speaker, and Speaker Independent
- Vocabulary Size
  - Small: <20
  - Medium: >100, <1000
  - Large: >1000, <10000
  - Very Large: >10000
Speech Recognition Concepts

[Diagram: Speech Synthesis maps Text to Speech through NLP and Speech Processing; Speech Recognition maps Speech back to a Phone Sequence and Text through Speech Processing, Speech Understanding, and NLP.]

Speech recognition is the inverse of speech synthesis.
Speech Recognition Approaches
- Bottom-Up Approach
- Top-Down Approach
- Blackboard Approach
Bottom-Up Approach

[Diagram: a pipeline of Signal Processing, Feature Extraction, and Segmentation stages, with knowledge sources (Voiced/Unvoiced/Silence, Sound Classification Rules, Phonotactic Rules, Lexical Access, Language Model) applied along the way, ending in the Recognized Utterance.]
Top-Down Approach

[Diagram: Feature Analysis feeds a Unit Matching System backed by an inventory of speech recognition units; Lexical Hypothesis (Word Dictionary), Syntactic Hypothesis (Grammar), and Semantic Hypothesis (Task Model) follow, and an Utterance Verifier/Matcher outputs the Recognized Utterance.]
Blackboard Approach

[Diagram: Environmental, Acoustic, Lexical, Syntactic, and Semantic processes all communicate through a shared Blackboard.]
Recognition Theories
- Articulatory-Based Recognition
  - Uses the articulatory system for recognition
  - This theory has been the most successful so far
- Auditory-Based Recognition
  - Uses the auditory system for recognition
- Hybrid-Based Recognition
  - A hybrid of the above theories
- Motor Theory
  - Models the intended gesture of the speaker
Recognition Problem
- We have a sequence of acoustic symbols and want to find the words expressed by the speaker.
- Solution: find the most probable word sequence given the acoustic symbols.
Recognition Problem (Cont’d)

A : acoustic symbols
W : word sequence

We should find $\hat{W}$ so that

$P(\hat{W}|A) = \max_W P(W|A)$
Bayes Rule

$P(x|y)\,P(y) = P(x,y)$

$P(x|y) = \dfrac{P(y|x)\,P(x)}{P(y)}$

$P(W|A) = \dfrac{P(A|W)\,P(W)}{P(A)}$
Bayes Rule (Cont’d)

$P(\hat{W}|A) = \max_W P(W|A) = \max_W \dfrac{P(A|W)\,P(W)}{P(A)}$

$\hat{W} = \arg\max_W P(W|A) = \arg\max_W P(A|W)\,P(W)$
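As a toy illustration, the decision rule above can be sketched in code; the candidate words and all probability values below are invented for illustration, not taken from any real acoustic or language model:

```python
# Bayes decision rule: pick the word maximizing P(A|W) * P(W).
# P(A) is the same for every candidate, so it can be ignored.

def best_word(candidates):
    """candidates maps each word to its acoustic likelihood P(A|W) and prior P(W)."""
    return max(candidates, key=lambda w: candidates[w]["p_a_given_w"] * candidates[w]["p_w"])

# Invented numbers for illustration only.
candidates = {
    "speech": {"p_a_given_w": 0.30, "p_w": 0.010},
    "speed":  {"p_a_given_w": 0.25, "p_w": 0.020},
    "peach":  {"p_a_given_w": 0.35, "p_w": 0.001},
}

print(best_word(candidates))  # "speed": 0.25 * 0.020 = 0.005 is the largest product
```

Note that a higher prior can outweigh a lower acoustic likelihood, which is exactly the role of the language model P(W).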
Simple Language Model

$W = w_1 w_2 w_3 \dots w_n$

By the chain rule:

$P(W) = P(w_1, w_2, \dots, w_n) = P(w_1)\,P(w_2|w_1)\,P(w_3|w_1,w_2)\,P(w_4|w_1,w_2,w_3) \cdots P(w_n|w_1,w_2,\dots,w_{n-1})$

Computing this probability is very difficult and needs a very large database, so we use trigram and bigram models instead.
Simple Language Model (Cont’d)

Trigram: $P(W) = \prod_{i=1}^{n} P(w_i \,|\, w_{i-1}, w_{i-2})$

Bigram: $P(W) = \prod_{i=1}^{n} P(w_i \,|\, w_{i-1})$

Monogram: $P(W) = \prod_{i=1}^{n} P(w_i)$
Simple Language Model (Cont’d)

Computing method:

$P(w_3|w_1,w_2) = \dfrac{\text{number of occurrences of } w_1 w_2 w_3}{\text{number of occurrences of } w_1 w_2}$

Ad hoc (interpolation) method:

$P(w_3|w_1,w_2) = p_1\, f(w_3|w_1,w_2) + p_2\, f(w_3|w_2) + p_3\, f(w_3)$
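A minimal sketch of the counting method above for bigrams, using an invented three-sentence corpus; trigrams follow the same pattern with word pairs as the history:

```python
# Bigram estimation: P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}).
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split()            # sentence-start marker
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def p_bigram(w_prev, w, unigrams, bigrams):
    return bigrams[(w_prev, w)] / unigrams[w_prev]

corpus = ["the cat sat", "the cat ran", "the dog sat"]   # invented toy corpus
uni, bi = train_bigram(corpus)
print(p_bigram("the", "cat", uni, bi))  # 2/3: "the" occurs 3 times, "the cat" twice
```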
Error Production Factors
- Prosody (recognition should be prosody independent)
- Noise (noise should be prevented)
- Spontaneous speech
P(A|W) Computing Approaches
- Dynamic Time Warping (DTW)
- Hidden Markov Model (HMM)
- Artificial Neural Network (ANN)
- Hybrid Systems
Dynamic Time Warping (DTW)
- To obtain a global distance between two speech patterns, a time alignment must be performed.
- Example: a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”.
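A minimal DTW sketch over symbol sequences; the local cost here (0 for a match, 1 otherwise) is a simplifying assumption, since real systems compare frame-level feature vectors:

```python
# Dynamic time warping: global distance between two patterns via an
# optimal monotonic alignment path, filled in by dynamic programming.

def dtw_distance(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0   # local distance
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]

print(dtw_distance("SPEECH", "SsPEEhH"))  # 2.0: only "s" and "h" fail to match
```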
Artificial Neural Network

Simple computation element of a neural network:

$y = \varphi\left(\sum_{i=0}^{N-1} w_i x_i\right)$

with inputs $x_0, x_1, \dots, x_{N-1}$ and weights $w_0, w_1, \dots, w_{N-1}$.
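The computation element above in code; the step activation used for φ is an arbitrary choice here, since the slide leaves φ unspecified:

```python
# One neuron: y = phi(sum_{i=0}^{N-1} w_i * x_i).

def neuron(weights, inputs, phi=lambda s: 1.0 if s >= 0 else 0.0):
    s = sum(w * x for w, x in zip(weights, inputs))   # weighted sum
    return phi(s)                                     # activation

print(neuron([0.5, -1.0, 0.25], [1.0, 0.5, 2.0]))  # sum = 0.5, so output 1.0
```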
Artificial Neural Network (Cont’d)
- Neural Network Types
  - Perceptron
  - Time Delay
  - Time Delay Neural Network (TDNN) Computational Element
Artificial Neural Network (Cont’d)

[Diagram: Single Layer Perceptron with inputs $x_0, \dots, x_{N-1}$ and outputs $y_0, \dots, y_{M-1}$.]
Artificial Neural Network (Cont’d)

[Diagram: Three Layer Perceptron.]
Hybrid Methods
- Hybrid neural network and matched filter for recognition

[Diagram: speech acoustic features pass through delays into a pattern classifier that produces the output units.]
Neural Network Properties
- The system is simple, but too many iterations are needed for training
- Doesn’t determine a specific structure
- Regardless of its simplicity, the results are good
- The training set is large, so training should be offline
- Accuracy is relatively good
Hidden Markov Model
- Observations: $O_1, O_2, \dots, O_t$
- States in time: $q_1, q_2, \dots, q_t$
- All states: $s_1, s_2, \dots, s_N$

[Diagram: states $S_i$ and $S_j$ linked by transition probabilities $a_{ij}$ and $a_{ji}$.]
Hidden Markov Model (Cont’d)

Discrete Markov Model:

$P(q_t = s_j \,|\, q_{t-1} = s_i, q_{t-2} = s_k, \dots) = P(q_t = s_j \,|\, q_{t-1} = s_i)$

Degree 1 (first-order) Markov model.
Hidden Markov Model (Cont’d)

$a_{ij} = P(q_t = s_j \,|\, q_{t-1} = s_i)$

$a_{ij}$ : transition probability from $S_i$ to $S_j$, $\quad 1 \le i, j \le N$
Discrete Markov Model Example

S1 : the weather is rainy
S2 : the weather is cloudy
S3 : the weather is sunny

$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$

(rows and columns ordered rainy, cloudy, sunny)
Hidden Markov Model Example (Cont’d)

Question 1: What is the probability of the sequence Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy?

$O = (q_1, q_2, \dots, q_8) = (s_3, s_3, s_3, s_1, s_1, s_3, s_2, s_2)$

$P(O|\text{Model}) = \pi_3\, a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{22} = 1 \times 0.8 \times 0.8 \times 0.1 \times 0.4 \times 0.3 \times 0.1 \times 0.6 \approx 4.6 \times 10^{-4}$

(taking $\pi_3 = 1$, i.e. the chain starts in the sunny state)
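As a check, the chain probability above can be computed directly. This assumes the transition matrix of the weather example, read as rows and columns ordered (rainy, cloudy, sunny), and a start in the sunny state with probability 1:

```python
# Probability of a state sequence in a degree-1 Markov chain:
# P(q1 ... qT) = P(q1) * product over t of a[q_t][q_{t+1}].

A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

RAINY, CLOUDY, SUNNY = 0, 1, 2

def sequence_probability(states, A, p_start=1.0):
    p = p_start
    for i, j in zip(states, states[1:]):
        p *= A[i][j]
    return p

seq = [SUNNY, SUNNY, SUNNY, RAINY, RAINY, SUNNY, CLOUDY, CLOUDY]
print(sequence_probability(seq, A))  # 0.8*0.8*0.1*0.4*0.3*0.1*0.6 = 4.608e-4
```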
Hidden Markov Model Example (Cont’d)

Question 2: What is the probability of staying in state $S_i$ for exactly $d$ days?

$\pi_i = P(q_1 = s_i), \; 1 \le i \le N$ : the probability of being in state $i$ at time $t = 1$

$P(\underbrace{s_i, s_i, \dots, s_i}_{d \text{ days}}, s_{j \ne i}) = (a_{ii})^{d-1}(1 - a_{ii}) = P_i(d)$
Discrete Density HMM Components
- N : number of states
- M : number of outputs (observation symbols)
- A (N×N) : state transition probability matrix
- B (N×M) : output occurrence probability in each state
- π (1×N) : initial state probability
- λ = (A, B, π) : set of HMM parameters
Three Basic HMM Problems
1. Given an HMM $\lambda$ and a sequence of observations $O$, what is the probability $P(O|\lambda)$?
2. Given a model $\lambda$ and a sequence of observations $O$, what is the most likely state sequence in the model that produced the observations?
3. Given a model $\lambda$ and a sequence of observations $O$, how should we adjust the model parameters in order to maximize $P(O|\lambda)$?
First Problem Solution

$P(O \,|\, q, \lambda) = \prod_{t=1}^{T} P(o_t \,|\, q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t)$

$P(q \,|\, \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$

We know that:

$P(x, y) = P(x|y)\,P(y)$

and

$P(x, y \,|\, z) = P(x \,|\, y, z)\,P(y \,|\, z)$
First Problem Solution (Cont’d)

$P(O, q \,|\, \lambda) = P(O \,|\, q, \lambda)\,P(q \,|\, \lambda) = \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$

$P(O \,|\, \lambda) = \sum_q P(O, q \,|\, \lambda) = \sum_{q_1, q_2, \dots, q_T} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$

Computation order: $O(2T \cdot N^T)$
3636
Forward Backward ApproachForward Backward Approach
)|,,,,()( 21 iqoooPi ttt
Niobi ii 1),()( 11
Computing )(it
1) Initialization
Forward-Backward Approach (Cont’d)

2) Induction:

$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$

3) Termination:

$P(O \,|\, \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

Computation order: $O(N^2 T)$
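The forward recursion above, implemented directly on a small invented two-state discrete model; the result is checked against the brute-force sum over all $N^T$ state sequences:

```python
# Forward algorithm for P(O|lambda), plus a brute-force check.
from itertools import product

A  = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij (invented)
B  = [[0.9, 0.1], [0.2, 0.8]]   # b_j(k): output probabilities per state (invented)
pi = [0.6, 0.4]                 # initial state probabilities (invented)
O  = [0, 1, 1, 0]               # observation sequence (symbol indices)

def forward(O, A, B, pi):
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]            # initialization
    for t in range(1, len(O)):                                # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                 for j in range(N)]
    return sum(alpha)                                         # termination

def brute_force(O, A, B, pi):
    N, T = len(pi), len(O)
    total = 0.0
    for q in product(range(N), repeat=T):   # enumerate every state sequence
        p = pi[q[0]] * B[q[0]][O[0]]
        for t in range(1, T):
            p *= A[q[t - 1]][q[t]] * B[q[t]][O[t]]
        total += p
    return total

print(abs(forward(O, A, B, pi) - brute_force(O, A, B, pi)) < 1e-12)  # True
```

The forward pass needs $O(N^2 T)$ operations where the brute force needs $O(2T \cdot N^T)$, which is the whole point of the recursion.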
Backward Variable

$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \,|\, q_t = i, \lambda)$

1) Initialization:

$\beta_T(i) = 1, \quad 1 \le i \le N$

2) Induction:

$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1, \quad 1 \le i \le N$
3939
Second Problem SolutionSecond Problem Solution
Finding the most likely state sequenceFinding the most likely state sequence
N
itt
ttN
it
t
ttt
ii
ii
iqoP
iqoP
oP
iqoPoiqPi
11
)()(
)()(
)|,(
)|,(
)|(
)|,(),|()(
Individually most likely state :Ttiq t
it 1)],([maxarg*
Viterbi Algorithm

Define:

$\delta_t(i) = \max_{q_1, q_2, \dots, q_{t-1}} P(q_1, q_2, \dots, q_{t-1}, q_t = i, o_1, o_2, \dots, o_t \,|\, \lambda), \quad 1 \le i \le N$

$\delta_t(i)$ is the probability of the most likely state sequence that ends in state $i$ at time $t$, given the observations $o_1, \dots, o_t$.
Viterbi Algorithm (Cont’d)

$\delta_t(j) = \max_i [\delta_{t-1}(i)\, a_{ij}] \cdot b_j(o_t)$

1) Initialization:

$\delta_1(i) = \pi_i b_i(o_1), \quad 1 \le i \le N$

$\psi_1(i) = 0$

$\psi_t(i)$ is the most likely state at time $t-1$ on the best path ending in state $i$ at time $t$.
Viterbi Algorithm (Cont’d)

2) Recursion:

$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}] \cdot b_j(o_t)$

$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$

$2 \le t \le T, \quad 1 \le j \le N$
Viterbi Algorithm (Cont’d)

3) Termination:

$p^* = \max_{1 \le i \le N} [\delta_T(i)]$

$q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]$

4) Backtracking:

$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \dots, 1$
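The four Viterbi steps above in code, on a small invented two-state model; $\delta$ is kept as a list over states and $\psi$ as a table used for backtracking:

```python
# Viterbi algorithm: best state path and its probability.

def viterbi(O, A, B, pi):
    N, T = len(pi), len(O)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]   # 1) initialization
    psi = [[0] * N]
    for t in range(1, T):                            # 2) recursion
        new_delta, new_psi = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            new_psi.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][O[t]])
        delta, psi = new_delta, psi + [new_psi]
    p_star = max(delta)                              # 3) termination
    q = [max(range(N), key=lambda i: delta[i])]
    for t in range(T - 1, 0, -1):                    # 4) backtracking
        q.append(psi[t][q[-1]])
    return p_star, q[::-1]

A  = [[0.7, 0.3], [0.4, 0.6]]   # invented model parameters
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
p_star, path = viterbi([0, 1, 1, 0], A, B, pi)
print(p_star, path)
```

Unlike the individually-most-likely-state rule of the second problem, Viterbi returns a single globally consistent path.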
Third Problem Solution

Parameter estimation using the Baum-Welch, or Expectation Maximization (EM), approach.

Define:

$\xi_t(i,j) = P(q_t = i, q_{t+1} = j \,|\, O, \lambda) = \dfrac{P(O, q_t = i, q_{t+1} = j \,|\, \lambda)}{P(O \,|\, \lambda)} = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$
Third Problem Solution (Cont’d)

$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$

$\sum_{t=1}^{T-1} \gamma_t(i)$ : expected number of jumps from state $i$

$\sum_{t=1}^{T-1} \xi_t(i,j)$ : expected number of jumps from state $i$ to state $j$
Third Problem Solution (Cont’d)

$\bar{\pi}_i = \gamma_1(i)$

$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$

$\bar{b}_j(k) = \dfrac{\sum_{t=1,\; o_t = V_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
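One Baum-Welch reestimation step implementing the $\xi$, $\gamma$, and update formulas above, on a small invented two-state model. By construction the new $\pi$, the rows of $A$, and the rows of $B$ stay normalized, and by EM theory the likelihood $P(O|\lambda)$ cannot decrease:

```python
# One iteration of Baum-Welch reestimation for a discrete HMM.

def forward_backward(O, A, B, pi):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    beta  = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
        beta[T - 1][i] = 1.0
    for t in range(1, T):                       # forward pass
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    return alpha, beta

def reestimate(O, A, B, pi):
    N, T, M = len(pi), len(O), len(B[0])
    alpha, beta = forward_backward(O, A, B, pi)
    p_obs = sum(alpha[T-1][i] for i in range(N))          # P(O|lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1)) for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B

A  = [[0.7, 0.3], [0.4, 0.6]]   # invented starting parameters
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 1, 1, 0]
new_pi, new_A, new_B = reestimate(O, A, B, pi)
print(new_pi, new_A, new_B)
```

In practice the step is repeated until the likelihood stops improving, and log-space or scaled values are used to avoid underflow on long sequences.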
Baum Auxiliary Function

$Q(\lambda \,|\, \lambda') = \sum_q P(O, q \,|\, \lambda') \log P(O, q \,|\, \lambda)$

If $Q(\lambda \,|\, \lambda') \ge Q(\lambda' \,|\, \lambda')$ then $P(O \,|\, \lambda) \ge P(O \,|\, \lambda')$

With this approach we reach a local optimum.
Restrictions of Reestimation Formulas

$\sum_{i=1}^{N} \pi_i = 1$

$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$

$\sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N$
Continuous Observation Density

We have a PDF instead of $b_j(k) = P(o_t = V_k \,|\, q_t = j)$

We have:

$b_j(o) = \sum_{k=1}^{M} C_{jk}\, \mathcal{N}(o, \mu_{jk}, \Sigma_{jk}), \quad \int b_j(o)\, do = 1$

$C_{jk}$ : mixture coefficients, $\mu_{jk}$ : means, $\Sigma_{jk}$ : covariances
Continuous Observation Density (Cont’d)

Mixture in HMM. Dominant mixture:

$b_j(o) \approx \max_k C_{jk}\, \mathcal{N}(o, \mu_{jk}, \Sigma_{jk})$

[Diagram: states $S_1$, $S_2$, $S_3$, each with mixtures $M_1$ through $M_4$.]
Continuous Observation Density (Cont’d)

Model parameters:

$\lambda = (\pi, A, C, \mu, \Sigma)$

of sizes $1 \times N$, $N \times N$, $N \times M$, $N \times M \times K$, and $N \times M \times K \times K$ respectively.

N : number of states
M : number of mixtures in each state
K : dimension of the observation vector
Continuous Observation Density (Cont’d)

$\bar{C}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j,k)}$

$\bar{\mu}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$
Continuous Observation Density (Cont’d)

$\bar{\Sigma}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j,k)\,(o_t - \mu_{jk})(o_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j,k)}$

$\gamma_t(j,k)$ : probability of being in the $j$’th state with the $k$’th mixture at time $t$
5454
State Duration ModelingState Duration Modeling
Si Sj
Probability of staying d times in state i :
)1()( 1ii
diii aadP
jia
ija
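A quick numerical check of this implicit (geometric) duration model; with $a_{ii} = 0.8$ the distribution sums to 1 and the mean duration is $1/(1 - a_{ii}) = 5$ time steps:

```python
# Implicit state-duration distribution of a standard HMM self-loop:
# P_i(d) = a_ii^(d-1) * (1 - a_ii), a geometric distribution.

def p_duration(d, a_ii):
    return a_ii ** (d - 1) * (1 - a_ii)

a_ii = 0.8
total = sum(p_duration(d, a_ii) for d in range(1, 200))        # ~ 1.0
mean  = sum(d * p_duration(d, a_ii) for d in range(1, 200))    # ~ 1/(1 - 0.8) = 5
print(round(total, 6), round(mean, 6))
```

This monotonically decaying shape is often a poor fit for real phone durations, which motivates the explicit duration models on the next slides.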
State Duration Modeling (Cont’d)

[Diagram: HMM with explicit duration; states $S_i$ and $S_j$ carry duration distributions $P_i(d)$ and $P_j(d)$ and are linked by transitions $a_{ij}$ and $a_{ji}$, with no self-loops.]
State Duration Modeling (Cont’d)

An HMM with state duration generates a sequence as follows:
- Select the first state $q_1 = i$ using the initial probabilities $\pi_i$
- Select the duration $d_1$ using $P_{q_1}(d_1)$
- Select the observation sequence $o_1, o_2, \dots, o_{d_1}$ using $b_{q_1}(o_1, o_2, \dots, o_{d_1})$; in practice we assume the following independence:

$b_{q_1}(o_1, o_2, \dots, o_{d_1}) = \prod_{t=1}^{d_1} b_{q_1}(o_t)$

- Select the next state $q_2 = j$ using the transition probabilities $a_{q_1 q_2}$. We also have an additional constraint: $a_{q_1 q_1} = 0$ (no self-transitions).
Training In HMM
- Maximum Likelihood (ML)
- Maximum Mutual Information (MMI)
- Minimum Discrimination Information (MDI)
Training In HMM (Cont’d)

Maximum Likelihood (ML):

For an observation sequence $O$, compute $P(O \,|\, \lambda_1), P(O \,|\, \lambda_2), \dots, P(O \,|\, \lambda_V)$ and select

$P^* = \max_{1 \le r \le V} [P(O \,|\, \lambda_r)]$
Training In HMM (Cont’d)

Maximum Mutual Information (MMI):

Mutual information:

$I(O, \lambda) = \log \dfrac{P(O, \lambda)}{P(O)\,P(\lambda)}$

$I(O, \lambda_v) = \log P(O \,|\, \lambda_v) - \log \left[\sum_{w=1}^{V} P(O \,|\, \lambda_w)\, P(w)\right], \quad \lambda = \{\lambda_v\}$
Training In HMM (Cont’d)

Minimum Discrimination Information (MDI):

Observation: $O = (O_1, O_2, \dots, O_T)$

Autocorrelation: $R = (R_1, R_2, \dots, R_t)$

$I(Q : P_\lambda) = \int q(o) \log \dfrac{q(o)}{P(o \,|\, \lambda)}\, do$

$\nu(R, P_\lambda) = \inf_{Q \in Q(R)} I(Q : P_\lambda)$