Dual-domain Hierarchical Classification of Phonetic Time Series
Hossein Hamooni, Abdullah Mueen
University of New Mexico, Department of Computer Science
What Is a Phoneme?
Phonemes are very small units of intelligible sound (usually less than 200 ms long).
Phonetic spelling is the sequence of phonemes that a word comprises.
Examples:
Coat: [kōt] /K OW T/
From: [frəm] /F R AH M/
Impressive: [imˈpresiv] /IH M P R EH S IH V/
Phoneme Classification
What is phoneme classification?
Input: a short segment of an audio signal.
Output: the phoneme that the segment contains.
Phoneme classification is a complex task:
More than 100 classes (based on the International Phonetic Alphabet)
Variation in speakers, dialects, accents, environmental noise, etc.
Phoneme classification can be used in:
Robust speech recognition
Accent/dialect detection
Speech quality scoring
Related Work
Different methods have been used for phoneme classification in the literature:
Hidden Markov models [Lee, 1989]
Neural networks [Schwarz, 2009]
Deep belief networks [Mohamed, 2012]
Support vector machines [Salomon, 2001]
Hierarchical methods [Dekel, 2005]
Boltzmann machines [Mohamed, 2010]
Although the data mining community has shown that k-NN classifiers work well on time series data, they have not yet been tried on phonemes.
[C. Lopes, F. Perdigao, 2011]
Our Dual-domain Approach
Time domain: k-NN with Dynamic Time Warping (DTW)
Expensive; sped up with lower-bounding techniques
Frequency domain: k-NN with Euclidean distance between Mel-frequency cepstral coefficients (MFCC)
Fast
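A dependency-free sketch of the frequency-domain path, under stated assumptions: MFCCs are computed with a textbook pipeline (Hamming window, power spectrum, mel filterbank, log, DCT-II) using illustrative settings (8 kHz audio, 25 ms frames, 10 ms hop, 20 filters, 13 coefficients), and each variable-length phoneme is reduced to one fixed-length vector by averaging its frame MFCCs. The slides do not say which MFCC settings the authors use or how they summarize a phoneme; the averaging step and every function name here are hypothetical. A real system would use an FFT library instead of the naive DFT below.

```python
import math
from collections import Counter


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale, 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for k in range(1, n_filters + 1):
        left, centre, right = bins[k - 1], bins[k], bins[k + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for b in range(left, centre):
            row[b] = (b - left) / (centre - left)  # rising edge
        for b in range(centre, right):
            row[b] = (right - b) / (right - centre)  # falling edge
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2; a naive O(n_fft^2) DFT keeps this dependency-free."""
    x = (list(frame) + [0.0] * n_fft)[:n_fft]  # zero-pad (or truncate) to n_fft samples
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n_fft) for t, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * t / n_fft) for t, v in enumerate(x))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mfcc(signal, sample_rate, frame_len=200, hop=80, n_fft=256, n_filters=20, n_coeffs=13):
    """Frame-level MFCCs: Hamming window -> power spectrum -> mel energies -> log -> DCT-II."""
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    frames = []
    for start in range(0, max(len(signal) - frame_len, 0) + 1, hop):
        frame = [s * h for s, h in zip(signal[start:start + frame_len], hamming)]
        spec = power_spectrum(frame, n_fft)
        log_e = [math.log(sum(f * p for f, p in zip(row, spec)) + 1e-10) for row in bank]
        # Unnormalised DCT-II of the log filterbank energies; keep the first n_coeffs.
        frames.append([sum(e * math.cos(math.pi * q * (j + 0.5) / n_filters)
                           for j, e in enumerate(log_e)) for q in range(n_coeffs)])
    return frames


def phoneme_vector(signal, sample_rate):
    """Assumption: summarise a variable-length phoneme by the mean of its frame MFCCs."""
    frames = mfcc(signal, sample_rate)
    return [sum(column) / len(frames) for column in zip(*frames)]


def knn_classify(query, train, k=1):
    """Majority label among the k training vectors closest to `query` in Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def tone(freq, sample_rate=8000, n=400):
    """Synthetic 50 ms sine used as a stand-in for a phoneme recording."""
    return [0.5 * math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    train = [(phoneme_vector(tone(300), 8000), "low"), (phoneme_vector(tone(2500), 8000), "high")]
    print(knn_classify(phoneme_vector(tone(320), 8000), train))
```

Because every phoneme becomes a vector of the same length, the Euclidean distance needs no alignment step, which is why this path is much cheaper than DTW.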
Real Example
Challenge
DTW is expensive: its time and space complexity are quadratic in the signal length.
We need a speed-up technique. Solution: lower-bounding techniques.
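The quadratic cost follows from the standard DTW recurrence, written here with a squared-difference cost (an assumption; the slides do not state the exact variant). For signals x of length n and y of length m, the table D has (n+1) × (m+1) cells, each filled once in constant time, so the full computation takes O(nm) time and space. Restricting the alignment to a window of width w (a Sakoe-Chiba band, |i - j| ≤ w) leaves only O(n · w) cells to fill.

```latex
\begin{aligned}
D(0,0) &= 0, \qquad D(i,0) = D(0,j) = \infty \quad (i, j \ge 1),\\
D(i,j) &= (x_i - y_j)^2 + \min\{D(i-1,j),\; D(i,j-1),\; D(i-1,j-1)\} \quad \text{if } |i-j| \le w, \text{ else } \infty,\\
\mathrm{DTW}_w(x,y) &= \sqrt{D(n,m)}.
\end{aligned}
```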
DTW Lower bounding
Resampling the signals to equal length doesn’t always work!
DTW Lower bounding
We use the prefix of the longer signal (Prefixed LB_Keogh).
We show that Prefixed LB_Keogh is a lower bound if w > the difference between the lengths of the two signals.
We set w = c * (length of the longer signal).
We ignore all pairs of signals that do not satisfy this condition.
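One reading of Prefixed LB_Keogh, sketched in Python as an assumption rather than the authors' code: build the LB_Keogh envelope (running max and min over ±w) around the shorter signal and compare it with the first n samples of the longer signal, where n is the shorter length. Every sample of that prefix must be matched to some point of the shorter signal inside the window, so the summed envelope violations cannot exceed the DTW cost, provided w is at least the length difference (otherwise no warping path fits in the band at all). The helper names, the integer truncation of c * length, and the default c = 0.3 (taken from the slide's c = 30% annotation) are illustrative choices. The bound is then used to skip full DTW computations in a 1-NN search.

```python
import math
import random


def dtw(x, y, w):
    """DTW with squared-difference cost, restricted to the band |i - j| <= w.

    Keeps two rows of the cost table (O(len(y)) memory, O(len(x) * w) time).
    Returns inf when no warping path fits in the band (|len(x) - len(y)| > w).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def prefixed_lb_keogh(short, long_, w):
    """Envelope of the shorter signal vs. the first len(short) samples of the longer one."""
    n = len(short)
    total = 0.0
    for j in range(n):
        window = short[max(0, j - w):min(n, j + w + 1)]
        upper, lower = max(window), min(window)
        v = long_[j]
        if v > upper:
            total += (v - upper) ** 2
        elif v < lower:
            total += (lower - v) ** 2
    return math.sqrt(total)


def window_for(a, b, c=0.3):
    """Hypothetical helper: w = c * (length of the longer signal), truncated to an int."""
    return int(c * max(len(a), len(b)))


def nn_search(query, train, c=0.3):
    """1-NN under windowed DTW; skips DTW when the lower bound cannot beat the best so far."""
    best_dist, best_label, pruned = float("inf"), None, 0
    for cand, label in train:
        w = window_for(query, cand, c)
        if w <= abs(len(query) - len(cand)):
            continue  # pair ignored: the bound needs w > length difference
        short, long_ = (query, cand) if len(query) <= len(cand) else (cand, query)
        if prefixed_lb_keogh(short, long_, w) >= best_dist:
            pruned += 1
            continue
        d = dtw(short, long_, w)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label, best_dist, pruned


if __name__ == "__main__":
    random.seed(1)
    train = [([random.gauss(0, 1) for _ in range(random.randint(20, 30))], f"class{i % 3}")
             for i in range(50)]
    query = [random.gauss(0, 1) for _ in range(25)]
    label, dist, pruned = nn_search(query, train)
    print(label, round(dist, 3), "pruned:", pruned)
```

The pruning test uses `>=` because a candidate whose lower bound already equals the best distance cannot strictly improve on it, so the search returns the same neighbor as a brute-force scan.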
[Figure: speedup vs. training set size (×10^4); accuracy (%) vs. window size c (%), annotated at c = 30%]
Data Collection
370,000 phonemes are segmented from:
The data is publicly available.
[Figure: number of samples per phoneme (y-axis 0 to 45,000); phonemes shown include AH, T, S, IH, IY, M, EH, AE, AA, F, OW, V, AO, UW, W, HH, CH, AW, OY, ZH]
Phoneme Segmentation
The Penn Phonetics Lab Forced Aligner (p2fa)
Takes a speech signal and its transcript
Produces time segmentations at the word level and the phoneme level
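A minimal sketch of turning forced-alignment output into phoneme time series, assuming the phone tier has already been parsed into (label, start_seconds, end_seconds) tuples (p2fa writes Praat TextGrid files; parsing them is not shown). The silence labels `sp`/`sil` and the removal of a trailing stress digit (OW1 to OW, CMU-dictionary style) are assumptions about the label set, and `cut_phonemes` is a hypothetical name. The example reuses the word "coat" (/K OW T/) with made-up timings.

```python
import re

# Assumed names for non-phoneme intervals in the aligner output.
NON_PHONEMES = {"sp", "sil", ""}


def cut_phonemes(signal, sample_rate, intervals):
    """Hypothetical helper: slice one utterance into (phoneme, samples) pairs.

    intervals: iterable of (label, start_seconds, end_seconds) from a forced aligner.
    """
    segments = []
    for label, start_s, end_s in intervals:
        if label.strip().lower() in NON_PHONEMES:
            continue  # skip pauses and silence
        phoneme = re.sub(r"[0-2]$", "", label.strip().upper())  # strip stress digit: OW1 -> OW
        start = int(round(start_s * sample_rate))
        end = int(round(end_s * sample_rate))
        if end > start:
            segments.append((phoneme, signal[start:end]))
    return segments


if __name__ == "__main__":
    sr = 16000
    utterance = [0.0] * sr  # one second of placeholder audio
    coat = [("sil", 0.0, 0.1), ("K", 0.1, 0.15), ("OW1", 0.15, 0.3), ("T", 0.3, 0.35), ("sp", 0.35, 0.4)]
    for phoneme, samples in cut_phonemes(utterance, sr, coat):
        print(phoneme, len(samples))
```

Each returned segment is one variable-length phoneme time series, which is the input both the DTW and the MFCC classifiers work on.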
Accuracy (All layers)
10-fold cross-validation
100 random phonemes in each fold
Accented Phoneme Classification
[Figure: accuracy vs. training set size (×10^4) for the MFCC and DTW classifiers]
British vs. American accent
Using the Oxford test set
2-class classification problem
No hierarchy
Conclusion
We present a dual-domain hierarchical method for phoneme classification.
We generate a novel dataset of 370,000 phonemes.
We achieve up to 73% accuracy on 39 classes.
Our lower-bounding technique gives up to a 3x speedup.
Thank You
Data and code are available at: http://cs.unm.edu/~hamooni/papers/Dual_2014