Duration modeling for speech recognition
Duration modeling for speech recognition. Presented for BBN by Dr. Andrey Nikiforov, Department of Applied Mathematics and Statistics, State University of New York at Stony Brook. Additional topics: computational and modeling issues in improving the performance of speech recognition algorithms.
Duration modeling for speech recognition
Presented for BBN
Dr. Andrey Nikiforov
Department of Applied Mathematics and Statistics
State University of New York at Stony Brook
Additional topics
Computational and modeling issues in improving the performance of speech recognition algorithms:
• Partial classification techniques
• Tree-dependence covariance models in HMM
• Fast search and computations for codebooks
• Interpolation for acoustic space
State duration in HMM
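In a conventional HMM, the only duration control a state has is its self-loop probability a_ii. The number of frames D spent in the state therefore follows a geometric distribution, P(D = d) = a_ii^(d-1) (1 - a_ii), with mean 1/(1 - a_ii) frames. This distribution always peaks at d = 1 and decays monotonically, which is generally regarded as a poor fit for state durations in speech. The short sketch below tabulates it; the function names and the value a_ii = 0.8 are illustrative choices, not taken from the presentation.

```python
def geometric_duration_pmf(a_ii: float, max_d: int) -> list[float]:
    """P(D = d) for d = 1..max_d when a state has self-loop probability a_ii."""
    if not 0.0 <= a_ii < 1.0:
        raise ValueError("a_ii must be in [0, 1)")
    # Stay in the state d - 1 times, then leave once.
    return [a_ii ** (d - 1) * (1.0 - a_ii) for d in range(1, max_d + 1)]


def expected_duration(a_ii: float) -> float:
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - a_ii)


if __name__ == "__main__":
    pmf = geometric_duration_pmf(0.8, 10)  # a_ii = 0.8 is an illustrative value
    print([round(p, 4) for p in pmf])
    print(expected_duration(0.8))
```

Whatever value a_ii takes, the most likely duration is a single frame; only the rate of decay changes.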
Duration distributions
[Figure: duration probability density functions over time for the exponential, Rayleigh, Weibull and normal distributions]
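For reference, these are the standard densities of the four families compared on the slide. Parameter names and the example values are illustrative assumptions, not the settings used in the presentation. Two identities tie the families together: a Weibull with shape 1 is an exponential, and a Weibull with shape 2 and scale sigma·√2 is a Rayleigh. The exponential is the continuous counterpart of the geometric duration implied by an HMM self-loop.

```python
import math

# Standard continuous densities for a duration t (e.g. in seconds or frames).
# Parameter names (rate, sigma, shape, scale, mu) are illustrative choices.


def exponential_pdf(t: float, rate: float) -> float:
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def rayleigh_pdf(t: float, sigma: float) -> float:
    if t < 0:
        return 0.0
    return (t / sigma**2) * math.exp(-(t**2) / (2 * sigma**2))


def weibull_pdf(t: float, shape: float, scale: float) -> float:
    if t < 0:
        return 0.0
    if t == 0:
        # Density at 0 depends on the shape: infinite (< 1), 1/scale (= 1), zero (> 1).
        if shape < 1:
            return math.inf
        return 1.0 / scale if shape == 1 else 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z**shape))


def normal_pdf(t: float, mu: float, sigma: float) -> float:
    # Unlike the other three, the normal density also puts mass on t < 0.
    return math.exp(-((t - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, exponential_pdf(t, 0.5), rayleigh_pdf(t, 1.5),
              weibull_pdf(t, 2.5, 2.0), normal_pdf(t, 2.0, 0.7))
```

Only the exponential has a density that is largest at t = 0 for every parameter value; the Rayleigh, the Weibull with shape above 1, and a normal centered away from zero all place their peak at a positive duration.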
From …
… to
Progressive model
Time calculation
[Figure: states A and B at times t and t+1]
Time calculation (continued)
[Figure: states A and B at times t and t+1]
Probability calculations: from …
…to
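One way to carry a duration model into the probability calculation is to augment each state s with a counter d of frames already spent in it. The constant exit probability 1 - a_ii is then replaced by the hazard h_s(d), the probability of leaving s right after its d-th frame given that it has lasted that long. The sketch below is a minimal forward pass under stated assumptions: a left-to-right chain of states, precomputed per-frame emission likelihoods b[s][t], a maximum modeled duration max_d at which exit is forced, and the requirement that the last state is exited at the final frame. The function name and argument layout are hypothetical, and this is not necessarily the exact scheme used in the presentation.

```python
def forward_with_hazard(
    b: list[list[float]], hazard: list[list[float]], max_d: int
) -> float:
    """Likelihood of a frame sequence under a left-to-right chain whose state
    durations are governed by per-state hazard tables.

    b[s][t]          -- emission likelihood of frame t in state s
    hazard[s][d - 1] -- P(leave s right after frame d | s has lasted d frames)
    """
    n_states, n_frames = len(b), len(b[0])

    def exit_prob(s: int, d: int) -> float:
        # Exit is forced once the maximum modeled duration is reached.
        return 1.0 if d >= max_d else hazard[s][d - 1]

    # alpha[s][d]: P(frames 0..t, in state s at frame t, d frames spent in s so far)
    alpha = [[0.0] * (max_d + 1) for _ in range(n_states)]
    alpha[0][1] = b[0][0]

    for t in range(1, n_frames):
        new = [[0.0] * (max_d + 1) for _ in range(n_states)]
        for s in range(n_states):
            for d in range(1, max_d + 1):
                a = alpha[s][d]
                if a == 0.0:
                    continue
                leave = exit_prob(s, d)
                if d < max_d:  # stay: the duration counter advances
                    new[s][d + 1] += a * (1.0 - leave) * b[s][t]
                if s + 1 < n_states:  # leave: enter the next state with d = 1
                    new[s + 1][1] += a * leave * b[s + 1][t]
        alpha = new

    last = n_states - 1
    return sum(alpha[last][d] * exit_prob(last, d) for d in range(1, max_d + 1))
```

The duration counter multiplies the effective state space by max_d, so this pass costs O(T · S · max_d) rather than the O(T · S) of a plain left-to-right forward pass. With a constant hazard the result coincides with a conventional HMM.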
Hazard function
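In discrete time (frames), the hazard function is h(d) = P(D = d) / P(D >= d): the probability that a state is left right after its d-th frame, given that it has lasted d frames. It determines the duration distribution completely through P(D = d) = h(d) · ∏_{k<d} (1 - h(k)), and a constant hazard h(d) = 1 - a_ii recovers the geometric duration of a conventional HMM. Below is a minimal sketch of the two conversions; the function names are hypothetical.

```python
def pmf_to_hazard(pmf: list[float]) -> list[float]:
    """pmf[i] = P(D = i + 1)  ->  hazard[i] = P(D = i + 1 | D >= i + 1)."""
    hazard = []
    survival = 1.0  # P(D >= d) for the current d
    for p in pmf:
        # Cap at 1 to absorb floating-point drift in the running survival value.
        hazard.append(min(1.0, p / survival) if survival > 0 else 1.0)
        survival -= p
    return hazard


def hazard_to_pmf(hazard: list[float]) -> list[float]:
    """Inverse map: P(D = d) = h(d) * prod_{k<d} (1 - h(k))."""
    pmf = []
    survival = 1.0
    for h in hazard:
        pmf.append(survival * h)
        survival *= 1.0 - h
    return pmf
```

With a hazard table, the exit probability at each frame is a single lookup indexed by the time already spent in the state.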
Hazard function estimation
“Nonparametric estimate”
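A standard nonparametric estimate of a discrete hazard is the life-table ratio ĥ(d) = n_d / r_d. Here n_d is the number of observed segments lasting exactly d frames, and r_d is the number lasting at least d frames. Whether the presentation used exactly this estimator is an assumption. The sketch also assumes durations arrive as positive integer frame counts (for example from a forced alignment, which is hypothetical here) and forces an exit once no data remain.

```python
from collections import Counter


def empirical_hazard(durations: list[int], max_d: int) -> list[float]:
    """h_hat[d - 1] = #(D == d) / #(D >= d) for d = 1..max_d."""
    counts = Counter(durations)
    at_risk = len(durations)  # segments still "alive" at duration d
    hazard = []
    for d in range(1, max_d + 1):
        if at_risk == 0:
            hazard.append(1.0)  # no data left: force exit (assumption)
            continue
        n_d = counts.get(d, 0)
        hazard.append(n_d / at_risk)
        at_risk -= n_d
    return hazard
```

For large d the at-risk count r_d becomes small, so raw estimates there are noisy and are usually smoothed or tied to a parametric tail before use.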
“Trajectories”
State duration correction
(Fant et al., 1991)
Word duration
[Figure: word duration distribution, count versus word length in frames]
State duration correction
[Figure: four panels of state duration distributions (count), two for state C4 and two for state C5, each pair shown on different duration scales]
State duration correction (continued)
[Figure: four panels of state duration distributions (count), two for state C6 and two for state C7, each pair shown on different duration scales]
Conclusions
• Representing the duration distribution via the hazard function is simple, effective and convenient to program
• Speech recognition errors dropped by 20-25% across different tasks
• The raw time spent in Viterbi search or full probability calculation increased by 20% on average compared with the conventional HMM (almost completely offset by the reduction in computation due to more adequate modeling)
Partial classification techniques for speech recognition
• Helps to create structure in speech HMMs
• Useful in codebook(s) estimation
• Initial estimates for HMMs and codebooks
• More accurate estimates