Spectral envelope analysis of TIMIT corpus
![Page 1: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/1.jpg)
Spectral envelope analysis of TIMIT corpus using LP, WLSP, and MVDR
Steve Vest
Matlab implementation of methods by Tien-Hsiang Lo
![Page 2: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/2.jpg)
Overview
• Methods
  • WLSP
  • MVDR
• TIMIT corpus
• Measurements
![Page 3: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/3.jpg)
Analysis methods
• LP: Linear Prediction using the autocorrelation method
• WLSP: Weighted-sum Line Spectrum Pairs
• MVDR: Minimum Variance Distortionless Response
• MVDR of WLSP: MVDR applied to WLSP coefficients
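The autocorrelation method named in the list above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Matlab code the slides refer to; the function names (`hamming`, `autocorr`, `levinson`, `lp_envelope_db`) and the choice of a biased autocorrelation estimate are assumptions made for the example.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, err) where a[0] == 1, A(z) = sum_k a[k] z^-k is the
    prediction-error filter, and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err

def lp_envelope_db(a, err, n_bins=256):
    """All-pole envelope 10*log10(err / |A(e^jw)|^2) on n_bins points in [0, pi)."""
    env = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        re = sum(a[k] * math.cos(w * k) for k in range(len(a)))
        im = -sum(a[k] * math.sin(w * k) for k in range(len(a)))
        env.append(10.0 * math.log10(err / (re * re + im * im)))
    return env
```

For a speech frame `x`, one would multiply it by `hamming(len(x))`, call `autocorr` up to the desired order, and pass the result to `levinson`.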
![Page 4: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/4.jpg)
WLSP
• Purpose: increase spectral dynamics between peaks and valleys in the spectral envelope
  • Maximizes the difference between peak and valley amplitudes
  • Uses autocorrelation values beyond N to obtain better accuracy
• When applied to speech coding
  • Improves quality of decoded speech
  • Attenuates quantization noise level in the valleys
![Page 5: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/5.jpg)
WLSP Algorithm
1. Apply Hamming window to signal
2. Calculate N-1 order LP coefficients
3. Using LP coefficients calculate LSP polynomials
$$p = \hat{a} + \hat{a}^R$$
$$q = \hat{a} - \hat{a}^R$$
where $p$ and $q$ are the symmetric and antisymmetric LSP polynomials, $\hat{a}$ is the zero-extended vector of LP coefficients, and $\hat{a}^R$ is the reversal of $\hat{a}$.
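The construction of the two LSP polynomials is short enough to write out directly. The sketch below assumes real LP coefficients stored as a list `a = [1, a1, ..., a_{N-1}]`; the function name `lsp_polynomials` is made up for illustration.

```python
def lsp_polynomials(a):
    """Return (p, q) for LP coefficients a = [1, a1, ..., a_{N-1}].

    a_hat is a extended by one trailing zero; a_rev is a_hat reversed.
    p = a_hat + a_rev is symmetric, q = a_hat - a_rev is antisymmetric.
    """
    a_hat = list(a) + [0.0]
    a_rev = a_hat[::-1]
    p = [x + y for x, y in zip(a_hat, a_rev)]
    q = [x - y for x, y in zip(a_hat, a_rev)]
    return p, q
```

Averaging the two polynomials, `(p + q) / 2`, gives back the zero-extended LP vector, which is a quick sanity check on any implementation.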
![Page 6: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/6.jpg)
WLSP Algorithm
4. Calculate WLSP polynomial

$$d = \lambda p + (1 - \lambda) q$$

with $\lambda$ between 0 and 1.

5. λ is the weighting parameter, chosen to minimize the error between the autocorrelations of the speech and of the WLSP all-pole filter impulse response
  • Autocorrelations match for n=1:N
  • Minimize SSE for n=N+1:N+1+L
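One way to picture the search for λ is a plain grid search. The sketch below is an assumption-laden illustration, not the search method used in the slides: it assumes real, minimum-phase LP coefficients (as produced by the autocorrelation method), compares autocorrelations normalized by their lag-0 value, truncates the impulse response to `ir_length` samples, and uses hypothetical names (`wlsp_grid_search`, `extra_lags`, `steps`).

```python
def all_pole_impulse_response(d, length):
    """First `length` samples of the impulse response of 1 / D(z), with d[0] == 1."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, len(d) - 1) + 1):
            acc -= d[k] * h[n - k]
        h.append(acc)
    return h

def normalized_autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag, divided by the lag-0 value."""
    r0 = sum(v * v for v in x)
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / r0 for k in range(max_lag + 1)]

def wlsp_grid_search(a, r, extra_lags, steps=99, ir_length=512):
    """Pick lambda on a uniform grid inside (0, 1).

    a: LP coefficients [1, a1, ..., a_{N-1}]
    r: signal autocorrelation, at least lags 0..N+1+extra_lags
    Returns (best_lambda, d, sse) where d = lambda*p + (1-lambda)*q.
    """
    a_hat = list(a) + [0.0]
    a_rev = a_hat[::-1]
    p = [x + y for x, y in zip(a_hat, a_rev)]
    q = [x - y for x, y in zip(a_hat, a_rev)]
    order = len(a_hat) - 1                      # N, the order of d
    lags = range(order + 1, order + 1 + extra_lags + 1)
    target = [r[k] / r[0] for k in lags]
    best = None
    for s in range(1, steps + 1):
        lam = s / (steps + 1)
        d = [lam * pv + (1.0 - lam) * qv for pv, qv in zip(p, q)]
        h = all_pole_impulse_response(d, ir_length)
        rh = normalized_autocorr(h, lags[-1])
        sse = sum((rh[k] - t) ** 2 for k, t in zip(lags, target))
        if best is None or sse < best[0]:
            best = (sse, lam, d)
    return best[1], best[2], best[0]
```

With λ = 0.5 the polynomial `d` reduces to the zero-extended LP vector, so a signal whose autocorrelation is already matched exactly by the LP model should drive the search to 0.5. A finer or adaptive search would be needed for speed on a full corpus.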
![Page 7: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/7.jpg)
WLSP vs. LP
![Page 8: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/8.jpg)
MVDR
• Estimates the power at each frequency by applying a special FIR filter
• Distortionless constraint
  • FIR filter minimizes the total output power while preserving unity gain at the frequency being estimated
  • Solving for the distortionless filter is a constrained optimization problem
• More robust modeling method than LP, but can be computed from the LP coefficients
![Page 9: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/9.jpg)
MVDR Algorithm
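1. Calculate LP coefficients $a_k$
2. Calculate MVDR coefficients $\mu_k$

$$\mu_k = \begin{cases} \dfrac{1}{P_e} \displaystyle\sum_{i=0}^{N-k} (N + 1 - k - 2i)\, a_i\, a_{i+k}^{*}, & \text{for } k = 0:N \\[2ex] \mu_{-k}^{*}, & \text{for } k = -N:-1 \end{cases}$$

where $P_e$ is the prediction error power of the LP model.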
Note that MVDR coefficients are symmetric and have order 2N+1
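The two steps above translate into a short routine. This sketch assumes real-valued LP coefficients (so the conjugates drop out) and evaluates the spectrum as the reciprocal of the cosine series in the MVDR coefficients, which is the form given in the Murthi paper listed under Sources as I understand it; the function names are invented for the example.

```python
import math

def mvdr_coefficients(a, pe):
    """MVDR coefficients mu_{-N..N} from real LP coefficients a[0..N], a[0] == 1.

    pe is the LP prediction error power. In the returned list, index k + N
    holds mu_k, so the list has 2N + 1 entries and is symmetric.
    """
    n = len(a) - 1
    half = []
    for k in range(n + 1):
        s = sum((n + 1 - k - 2 * i) * a[i] * a[i + k] for i in range(n - k + 1))
        half.append(s / pe)
    # Real-valued case: mu_{-k} equals mu_k, so mirror the non-negative half.
    return half[:0:-1] + half

def mvdr_spectrum_db(mu, n_bins=256):
    """MVDR envelope in dB on n_bins points in [0, pi), i.e. 1 / sum_k mu_k e^{-jwk}."""
    n = (len(mu) - 1) // 2
    out = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        denom = mu[n] + 2.0 * sum(mu[n + k] * math.cos(w * k) for k in range(1, n + 1))
        out.append(-10.0 * math.log10(denom))
    return out
```

For a first-order model `a = [1, a1]` the routine gives `mu_0 = 2 / pe` and `mu_1 = a1 / pe`, which can be checked by hand against the formula.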
![Page 10: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/10.jpg)
MVDR vs. LP
![Page 11: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/11.jpg)
MVDR of WLSP
• Just an exercise out of curiosity
  • Performs WLSP
  • Performs MVDR using coefficients from WLSP instead of LP
• Resulting conclusion
  • It’s a bad idea…
![Page 12: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/12.jpg)
MVDR of WLSP vs. MVDR
![Page 13: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/13.jpg)
TIMIT corpus
• “The TIMIT corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.”
• Large collection of speech samples from 8 regions of the USA
• Samples are phonetically labeled
![Page 14: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/14.jpg)
TIMIT regions
• Region 1: New England
• Region 2: Northern
• Region 3: North Midland
• Region 4: South Midland
• Region 5: Southern
• Region 6: New York City
• Region 7: Western
• Region 8: Army Brat (moved around)
![Page 15: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/15.jpg)
Analyzed Vowels
• iy: beet
• ih: bit
• eh: bet
• ey: bait
• ae: bat
• aa: bott
• aw: bout
• ay: bite
• ah: but
• ao: bought
• oy: boy
• ow: boat
• uh: book
• uw: boot
• ux: toot
• er: bird
• ax: about
• ix: debit
• axr: butter
• ax-h: suspect
![Page 16: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/16.jpg)
Collected Data
• First three formants
  • Frequency [Hz]
  • Amplitude [dB]
• Valleys after formants
  • Frequency [Hz]
  • Delta [dB]: difference between formant amplitude and valley amplitude
• Collected from entire training data set in TIMIT corpus
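The slides do not say how formants and valleys were located on the envelope. As one plausible reading, the sketch below treats the first three local maxima of a dB envelope as formants and the next local minimum after each as its valley; the function name, the dictionary keys, and the simple peak-picking rule are all assumptions.

```python
def formants_and_valleys(env_db, fs, max_formants=3):
    """Pick peaks and the valley following each peak on a sampled envelope.

    env_db: envelope in dB on len(env_db) equally spaced bins covering [0, fs/2)
    fs: sampling rate in Hz
    Returns a list of dicts with formant frequency/amplitude, valley
    frequency, and delta (formant amplitude minus valley amplitude).
    """
    n = len(env_db)
    hz_per_bin = fs / 2.0 / n
    results = []
    i = 1
    while i < n - 1 and len(results) < max_formants:
        if env_db[i - 1] < env_db[i] >= env_db[i + 1]:
            # Walk downhill from the peak to the next local minimum.
            j = i + 1
            while j < n - 1 and env_db[j + 1] < env_db[j]:
                j += 1
            results.append({
                "formant_hz": i * hz_per_bin,
                "formant_db": env_db[i],
                "valley_hz": j * hz_per_bin,
                "delta_db": env_db[i] - env_db[j],
            })
            i = j
        else:
            i += 1
    return results
```

A real analysis would likely need smoothing or minimum-prominence rules so that small ripples are not counted as formants.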
![Page 17: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/17.jpg)
Collected Data
• Data organized by:
  • Vowel
  • Region
  • Sex
  • Spectral approximation method
  • Trineme (phonemes preceding and following the vowel)
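The grouping keys listed above can all be recovered from TIMIT's own files. The sketch below assumes the usual TIMIT layout, where a path such as `TRAIN/DR1/FCJF0/SA1.PHN` encodes the dialect region in the `DR` directory and the speaker's sex in the first letter of the speaker ID, and where each `.PHN` line reads `start_sample end_sample phone`. The function names and the sample label text in the usage note are illustrative only.

```python
VOWELS = {
    "iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao",
    "oy", "ow", "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h",
}

def parse_phn(text):
    """Parse .PHN label text into (start_sample, end_sample, phone) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments

def trinemes(segments):
    """Return one record per vowel that has both a preceding and a following phone."""
    records = []
    for i in range(1, len(segments) - 1):
        start, end, phone = segments[i]
        if phone in VOWELS:
            records.append({
                "vowel": phone,
                "trineme": (segments[i - 1][2], phone, segments[i + 1][2]),
                "start": start,
                "end": end,
            })
    return records

def region_and_sex(path):
    """Extract (region number, sex letter) from a path like TRAIN/DR1/FCJF0/SA1.PHN."""
    parts = path.replace("\\", "/").split("/")
    region = int(parts[-3][2:])   # "DR1" -> 1
    sex = parts[-2][0]            # "FCJF0" -> "F"
    return region, sex
```

For example, label text containing the phones `h# sh ix hv eh dcl` in sequence yields the trinemes `(sh, ix, hv)` and `(hv, eh, dcl)`. A tuple of (vowel, region, sex, method, trineme) then serves as a dictionary key for accumulating measurements.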
![Page 18: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/18.jpg)
Collected Data
• Filter orders N=22
  • LP: N → 22
  • WLSP: M=N+1=23
  • MVDR: M=2(2N)+1=89
  • MVDR of WLSP: M=2(2N)+1=89
• WLSP data is erroneous
  • Hamming window was not applied, which has a noticeable impact on results
• MVDR of WLSP needs to be excluded
• MVDR order is too high
![Page 19: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/19.jpg)
General Observations
• Formant locations vary greatly
  • Between different speakers
  • Between different trinemes
  • 100-200 Hz for F1
  • 300-600 Hz for F2
  • 600-1000 Hz for F3
![Page 20: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/20.jpg)
Work still to be done
• Optimize methods
  • e.g. WLSP search method for λ
  • Analysis of data took over 5 hrs
• Determine best filter orders for each method
• Reorganize data storage for easier analysis
  • Very difficult to sort through 100,000 sets of data averages
• Determine exact statistics to be taken
• Perform analysis of TIMIT data again
![Page 21: Spectral envelope analysis of TIMIT corpus](https://reader034.vdocuments.us/reader034/viewer/2022051215/5681492b550346895db664b5/html5/thumbnails/21.jpg)
Sources
• Murthi, Manohar N. “All-Pole Modeling of Speech Based on the Minimum Variance Distortionless Response Spectrum”. IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 3, May 2000
• Backstrom, Tom. “All-Pole Modeling Technique Based on Weighted Sum of LSP Polynomials”. IEEE Signal Processing Letters, Vol. 10, No. 6, June 2003