1.130 - Wavelets, Filter Banks and Applications
Wavelet-Based Feature Extraction for Phoneme Recognition and Classification
Ghinwa Choueiter
Outline
• Introduction: What are wavelets/phonemes
• Problem specification
• Motivation
• Experimental Setup
• Wavelet-based feature extractor architecture
• Results
• Conclusions
• References
What are Wavelets
• A wavelet is a function that is well localized in both the time and frequency domains
• Proposed as an alternative to the STFT, to overcome its fixed-resolution problem when analyzing nonstationary signals
• Uses a constant-Q analysis to represent the signal in a time-scale plane
• Has shown potential in speech-processing applications such as speech analysis, pitch detection, and speech compression
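To make the constant-Q point concrete, here is a minimal sketch of the detail bands of a dyadic DWT at a 16 kHz sampling rate. It assumes ideal half-band filters, which real wavelet filters only approximate; the function name `dwt_octave_bands` is illustrative.

```python
def dwt_octave_bands(fs, levels):
    """Ideal detail-band edges (Hz) of a dyadic DWT for sampling rate fs.

    Assumes ideal half-band filters; real wavelet filters overlap.
    """
    bands = []
    for j in range(1, levels + 1):
        low = fs / 2 ** (j + 1)   # lower edge of the level-j detail band
        high = fs / 2 ** j        # upper edge of the level-j detail band
        bands.append((j, low, high))
    return bands


for j, low, high in dwt_octave_bands(fs=16000, levels=5):
    centre = (low + high) / 2
    q = centre / (high - low)     # Q = centre frequency / bandwidth
    print(f"level {j}: {low:7.1f}-{high:7.1f} Hz, Q = {q:.2f}")
```

Each band is half as wide as the one above it, so the ratio of centre frequency to bandwidth stays the same at every level. The STFT, by contrast, uses the same bandwidth at all frequencies.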
What are Wavelets (2)
Daubechies 4-tap filter
Scaling equation: $\phi(t) = 2 \sum_k h(k)\, \phi(2t - k)$
Wavelet equation: $\psi(t) = 2 \sum_k g(k)\, \phi(2t - k)$
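As a small worked example, the Daubechies 4-tap low-pass coefficients can be written in closed form and checked numerically. The sketch below uses the orthonormal normalization (coefficients sum to √2), which is an assumption about convention; dividing every coefficient by √2 gives the sum-to-one convention that pairs with a factor of 2 in the two-scale equations. The high-pass filter is obtained by the usual alternating flip.

```python
import math

s3 = math.sqrt(3.0)

# Daubechies 4-tap low-pass filter h(k), orthonormal normalization (sum = sqrt(2)).
h = [c / (4.0 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]

# High-pass filter g(k) by alternating flip: g(k) = (-1)^k h(3 - k).
g = [(-1) ** k * h[3 - k] for k in range(4)]

print("sum h          =", sum(h))                         # sqrt(2)
print("sum h^2        =", sum(c * c for c in h))          # unit energy
print("double shift   =", h[0] * h[2] + h[1] * h[3])      # orthogonal to its shift by 2
print("sum g          =", sum(g))                         # zero mean (high-pass)
print("<h, g>         =", sum(a * b for a, b in zip(h, g)))
```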
What are Wavelets (3)
• P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.
Discrete time Wavelet transforms and magnitude responses of wavelet filters at 3 different scales
What are Phonemes
• Phonemes are the smallest units in the sound system of a language that allow distinguishing between the meanings of words
• Phoneme categories:
1. Vowels are produced with periodic excitation and are thus characterized by resonance frequencies (200 Hz-3500 Hz)
2. Fricatives are generated by turbulence at a narrow constriction and are characterized by a noisy, broad spectrum
3. Plosives are produced by a complete closure of the vocal tract followed by its sudden release; their spectral content is usually weak in energy
Problem Specification
• Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition
Speech Waveform → Power Spectrum Computation → Mel Spectrum Computation → Natural Logarithm → DCT → MFCC
• The mel-scaled filterbank is a series of triangular band-pass filters (BPFs) designed to simulate the human auditory system
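The MFCC chain above can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the Hamming window, the 512-point FFT, the 26 filters, and the common mel formula 2595·log10(1 + f/700) are choices made here, not details from the slides. Only the 16 kHz sampling rate and the 13 output coefficients come from the experimental setup.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular band-pass filters with centres equally spaced on the mel scale."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct2(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = x.size
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def mfcc_frame(frame, fs=16000, n_fft=512, n_filters=26, n_ceps=13):
    windowed = frame * np.hamming(frame.size)             # assumed windowing step
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2     # power spectrum
    mel_energy = mel_filterbank(n_filters, n_fft, fs) @ power
    return dct2(np.log(mel_energy + 1e-12), n_ceps)       # log, then DCT


# Usage on a synthetic 25 ms frame (400 samples at 16 kHz).
frame = np.sin(2 * np.pi * 440.0 * np.arange(400) / 16000.0)
print(mfcc_frame(frame).shape)
```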
Problem Specification (2)
• In this work, we extract features based on a wavelet analysis. Wavelets let us trade time resolution against frequency resolution, and we use that flexibility to design a classifier suited to each type of signal.
Problem Specification (3)
• Perform phoneme recognition among three classes: vowels, fricatives, and plosives (stops)
• Perform phoneme recognition within each category:
1. Vowels: ‘ae’/bat, ‘aa’/Bob, ‘iy’/beat, ‘uw’/boot
2. Fricatives: ‘sh’/she, ‘v’/vowel, ‘s’/see, ‘dh’/thee
3. Plosives (stops): ‘b’/bob, ‘p’/poop, ‘d’/dot, ‘k’/cot
Motivation
Sample vowel spectrograms: low-frequency formants
Motivation (2)
Sample fricative spectrograms: strong high-frequency content
Motivation (3)
Sample plosive spectrograms: weak overall frequency content
The Experimental Setup
• TIMIT speech database
• Speech signals sampled at 16 kHz
• Phonemes extracted from 200 training utterances and 150 test utterances
Phoneme class Training Data Test Data
Vowels 322 299
Stops 370 381
Fricatives 396 347
The Experimental Setup (2)
Vowels Training Data Test Data
‘ae’ 100 86
‘iy’ 100 100
‘aa’ 100 93
‘uw’ 22 20
Fricatives Training Data Test Data
‘sh’ 96 52
‘v’ 100 97
‘s’ 100 100
‘dh’ 100 98
Stops Training Data Test Data
‘b’ 70 81
‘p’ 100 100
‘d’ 100 100
‘k’ 100 100
The Experimental Setup (3)
• Features Extracted:
1. 13-dimensional MFCC vectors
2. Wavelet-DCT vectors whose dimension depends on the phoneme class
• Maximum-likelihood (ML) and maximum a posteriori (MAP) classifiers, using Gaussian mixture models with 4 mixture components
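The difference between the two decision rules can be sketched as follows. ML picks the class whose GMM gives the highest likelihood, while MAP adds the log prior of each class before comparing. The sketch assumes diagonal covariances and estimates priors from training counts; neither is stated in the slides, and the GMM parameters in the demo are toy values, not trained models.

```python
import numpy as np


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    peak = log_comp.max()                      # log-sum-exp for numerical stability
    return peak + np.log(np.sum(np.exp(log_comp - peak)))


def classify(x, models, priors=None):
    """ML decision if priors is None, MAP decision otherwise."""
    scores = {}
    for label, (w, mu, var) in models.items():
        scores[label] = gmm_loglik(x, w, mu, var)
        if priors is not None:
            scores[label] += np.log(priors[label])
    return max(scores, key=scores.get)


# Toy 1-D models with 4 mixture components each (illustrative values only).
w = np.full(4, 0.25)
models = {
    "iy": (w, np.array([[-0.3], [-0.1], [0.1], [0.3]]), np.ones((4, 1))),
    "uw": (w, np.array([[0.7], [0.9], [1.1], [1.3]]), np.ones((4, 1))),
}
# Priors from the vowel training counts in the setup tables ('iy' 100, 'uw' 22).
priors = {"iy": 100 / 122, "uw": 22 / 122}

x = [0.6]
print("ML :", classify(x, models))
print("MAP:", classify(x, models, priors))
```

With unbalanced training sets such as ‘uw’ (22 tokens against 100 for the other vowels), the prior term pushes MAP decisions away from the rare class.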
Previous Work
• Mel wavelet cepstral coefficients
• Applying wavelet analysis to speech segmentation and classification
• Mel-scaled discrete wavelet coefficients
• Applying sampled continuous wavelet transform in phoneme recognition
• Symmetric octave filter bank
A Basic Feature Extractor Architecture
Speech Waveform → Hamming Window → DWT → Fractional Moment Computation → Natural Logarithm → DCT → Wavelet-DCT Coeff
Provides us with three degrees of freedom:
• The wavelet type
• The fractional moment degree k
• The decomposition type
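A minimal sketch of such an extractor is given below, under several assumptions that are not in the slides. The wavelet is fixed to Haar and implemented by hand. The decomposition is a plain octave-band tree. The fractional moment of a subband is taken to be the mean of |c|^k over its coefficients. The function names are illustrative.

```python
import numpy as np


def haar_dwt_octave(x, levels):
    """Octave-band Haar DWT; returns [d1, d2, ..., dL, aL]."""
    approx = np.asarray(x, dtype=float)
    subbands = []
    for _ in range(levels):
        if approx.size % 2:                            # pad odd lengths by repetition
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        subbands.append((even - odd) / np.sqrt(2.0))   # detail (high-pass) branch
        approx = (even + odd) / np.sqrt(2.0)           # approximation (low-pass) branch
    subbands.append(approx)
    return subbands


def fractional_moment(coeffs, k):
    # Assumed definition: mean of |c|^k over the subband coefficients.
    return np.mean(np.abs(coeffs) ** k)


def dct2(x):
    """Unnormalized DCT-II of a 1-D vector."""
    n = x.size
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def wavelet_dct_features(frame, levels=5, k=1.0):
    windowed = frame * np.hamming(frame.size)
    bands = haar_dwt_octave(windowed, levels)
    moments = np.array([fractional_moment(b, k) for b in bands])
    return dct2(np.log(moments + 1e-12))


# Usage on a synthetic frame, with k = 0.85 as listed for the plosive extractor.
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
print(wavelet_dct_features(frame, levels=5, k=0.85).shape)
```

The feature dimension equals the number of subbands, which is why it varies with the decomposition chosen for each phoneme class.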
The Vowel-Fricative/Stop Feature Extraction
• The wavelet type: ‘sym4’
• k = 1
• The decomposition: [decomposition tree figure]
The Plosive-Fricative Feature Extractor
• The wavelet type: ‘haar’
• k = 1
• The decomposition: [decomposition tree figure]
The Vowel Feature Extractor
• The wavelet type: ‘sym4’
• k = 1
• The decomposition: [decomposition tree figure]
The Fricative Feature Extractor
• The wavelet type: ‘sym6’
• k = 1
• The decomposition: [decomposition tree figure]
The Plosive Feature Extractor
• The wavelet type: ‘haar’
• k = 0.85
• The decomposition: [decomposition tree figure]
The Complete Classifier Architecture
Category decision: Vowel, Stop, or Fricative?
• Speech Waveform → Vowel-Fric Classifier and Vowel-Stop Classifier → Vowel or Stop/Fricative?
• Vowel → Vowel Classifier
• Fric/Stop → Stop-Fric Classifier → Fric → Fricative Classifier; Stop → Stop Classifier
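The cascade can be sketched as a simple routing function. All callables here are hypothetical stand-ins for trained classifiers. The first stage is collapsed into a single `is_vowel` decision, because the slides show both a vowel-fricative and a vowel-stop classifier at that stage without stating how their outputs are combined.

```python
def classify_phoneme(x, is_vowel, is_stop, vowel_clf, stop_clf, fric_clf):
    """Route a feature vector through the category stage, then the
    within-category classifier. All callables are hypothetical stand-ins."""
    if is_vowel(x):                       # stage 1: vowel vs. stop/fricative
        return "vowel", vowel_clf(x)
    if is_stop(x):                        # stage 2: stop vs. fricative
        return "stop", stop_clf(x)
    return "fricative", fric_clf(x)


# Usage with trivial stub classifiers (for illustration only).
result = classify_phoneme(
    x=0.2,
    is_vowel=lambda v: v > 0.5,
    is_stop=lambda v: v < 0.3,
    vowel_clf=lambda v: "iy",
    stop_clf=lambda v: "p",
    fric_clf=lambda v: "s",
)
print(result)
```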
Preliminaries: Consistency
Preliminaries: Discrimination
Preliminaries: Behavior
Results for Vowels
Wavelet-DCT, Maximum Likelihood: 86.622 %
     ae  iy  aa  uw
ae   70   4  11   1
iy    1  96   0   3
aa   11   0  78   4
uw    1   2   2  15
Wavelet-DCT, Maximum A Posteriori: 86.288 %
     ae  iy  aa  uw
ae   73   5   8   0
iy    7  92   1   0
aa   10   0  82   1
uw    2   2   5  11
MFCC, Maximum Likelihood: 85.953 %
     ae  iy  aa  uw
ae   68   7   7   4
iy    9  94   0   2
aa    7   0  82   4
uw    1   1   5  13
MFCC, Maximum A Posteriori: 82.943 %
     ae  iy  aa  uw
ae   67  10   9   0
iy    5  93   1   1
aa   13   0  77   3
uw    1   1   7  11
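The percentage under each matrix is the sum of the diagonal divided by the number of test tokens. The short check below uses the first vowel matrix. Its rows sum to the vowel test counts (86, 100, 93, 20), which suggests that rows are the true phoneme and columns the assigned label; that reading is an inference, not something the slides state.

```python
labels = ["ae", "iy", "aa", "uw"]
confusion = [
    [70, 4, 11, 1],
    [1, 96, 0, 3],
    [11, 0, 78, 4],
    [1, 2, 2, 15],
]


def accuracy(matrix):
    """Overall accuracy in percent: correct decisions over all test tokens."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def per_class_rate(matrix):
    """Percent of each row's tokens that fall on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


print(f"overall: {accuracy(confusion):.3f} %")   # overall: 86.622 %
for label, rate in zip(labels, per_class_rate(confusion)):
    print(f"{label}: {rate:.1f} %")
```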
Results for Fricatives
Wavelet-DCT, Maximum Likelihood: 79.827 %
     sh   v   s  dh
sh   42   1   9   0
v     0  75   1  21
s    20   2  78   0
dh    0  16   0  82
Wavelet-DCT, Maximum A Posteriori: 76.081 %
     sh   v   s  dh
sh   42   0   9   1
v     0  64   2  31
s    21   1  77   1
dh    0  16   1  81
MFCC, Maximum Likelihood: 72.911 %
     sh   v   s  dh
sh   42   0  10   0
v     0  68   0  29
s    20   0  80   0
dh    0  34   1  63
MFCC, Maximum A Posteriori: 78.963 %
     sh   v   s  dh
sh   44   1   7   0
v     0  77   0  20
s    17   0  82   1
dh    1  26   0  71
Results for Plosives
Wavelet-DCT, Maximum Likelihood: 54.331 %
      b   p   d   k
b    50  10   8  13
p    11  54  12  23
d    23   7  52  18
k     5  27  17  51
Wavelet-DCT, Maximum A Posteriori: 54.068 %
      b   p   d   k
b    50  13   9   9
p    10  57   9  24
d    19  11  53  17
k     3  37  14  46
MFCC, Maximum Likelihood: 55.906 %
      b   p   d   k
b    46  15  18   2
p    11  64   8  17
d    23   9  52  16
k     4  28  17  51
MFCC, Maximum A Posteriori: 58.793 %
      b   p   d   k
b    48  17  11   5
p     6  68  10  16
d    23  11  53  13
k     2  27  16  55
Results of Category Classification
• Wavelets perform considerably better than MFCC in discriminating between vowels on one side and fricatives (95% vs. 90%) or plosives (98% vs. 95%) on the other.
• For classifying between fricatives and plosives, wavelets fall only marginally behind MFCC (90% vs. 91%).
Conclusions and Future Work
• The results obtained from the wavelet-based feature extraction are quite promising.
• Designing specific wavelets that would be optimized for the task at hand.
• Developing an algorithm that would select the optimum decomposition for a family of signals.
• Incorporating confidence scoring.
• Further investigation into the fractional moments.
References
• G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.
• P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.
• K. Kim, D. H. Youn and C. Lee, “Evaluation of wavelet filters for speech recognition”. IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 2891-2894, 2000.
• Z. Tufekci and J. N. Gowdy, “Feature extraction using discrete wavelet transform for speech recognition”. Proceedings of the IEEE Southeastcon 2000, pp. 116-123, 2000.
• B. T. Tan, M. Fu and A. Spray, “The use of wavelet transforms in phoneme recognition”. Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96, vol. 4, pp. 2431-2434, Oct 3-6, 1996.
• B. T. Tan, R. Lang, H. Schroder, A. Spray, and P. Dermody, “Applying wavelet analysis to speech segmentation and classification”. Wavelet Applications, Harold H. Szu, Editor, Proc. SPIE 2242, pp. 750-761, 1994.