thesis defense 2007
TRANSCRIPT
Student: Chao-Hong Meng
Advisor: Lin-Shan Lee
June 15, 2009
[Figure: MFCC/PLP features are mapped to a phone sequence (e.g., sh iy hv ae) by a Structural SVM or an HMM]
Motivation
Baseline Model
Structural SVM
Experiments
Conclusion
Speech Recognition:
◦ Directly modeling the relationship between the input x and the output y is hard
Traditional approach:
◦ Acoustic Model
◦ Language Model
Structural SVM:
◦ A model that can handle structured output
◦ Formulated by Joachims
◦ Directly models the relationship between x and y
As a preliminary study:
◦ x is MFCC/PLP/Posterior
◦ y is the phone sequence
Baseline Model:
◦ Monophone HMM
◦ Tandem
The difference is in their inputs:
◦ HMM: MFCC/PLP
◦ Tandem: Posterior
Structural SVM
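Define a compatibility function $F(x, y; w)$:
◦ x is a feature vector sequence
◦ y is the correct phone sequence of x
The goal is to find an F such that, when all candidate phone sequences for x are ranked by F from high to low, the correct y comes first:
$$\hat{y} = \arg\max_{y} F(x, y; w)$$
We assume that
$$F(x, y; w) = \langle w, \Psi(x, y) \rangle$$
◦ $\langle \cdot, \cdot \rangle$ is the inner product
◦ $\Psi(x, y)$ is the combined feature function that encodes the relationship of x and y
In this research, the transition count and emission count are considered.
The output of $\Psi(x, y)$:
◦ (transition count, emission count)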
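Settings:
◦ 3 different phone labels: A, B, C
◦ 2-dim feature vectors: (z1, z2)
A training sample: a sequence of 2-dim feature vectors x with its phone labels y
Calculate:
◦ Transition count matrix A: 3 × 3, rows and columns indexed by the phone labels A, B, C
◦ Emission count matrix B: 3 × 2, rows indexed by the phone labels A, B, C and columns by the feature dimensions z1, z2
Concatenate the rows of A and B to obtain the output of the combined feature function.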
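The toy calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training sample below is hypothetical (the one on the slide is not recoverable), the function names are made up for this sketch, and the emission "count" for real-valued features is assumed to be the per-label sum of feature vectors.

```python
import numpy as np

LABELS = ["A", "B", "C"]  # the 3 phone labels from the settings
LABEL_INDEX = {label: k for k, label in enumerate(LABELS)}


def transition_counts(y):
    """3 x 3 matrix: entry (i, j) counts how often label i is followed by label j."""
    counts = np.zeros((len(LABELS), len(LABELS)))
    for prev, curr in zip(y[:-1], y[1:]):
        counts[LABEL_INDEX[prev], LABEL_INDEX[curr]] += 1
    return counts


def emission_counts(x, y):
    """3 x 2 matrix: row k is the sum of the feature vectors labelled k (assumption)."""
    sums = np.zeros((len(LABELS), x.shape[1]))
    for frame, label in zip(x, y):
        sums[LABEL_INDEX[label]] += frame
    return sums


def joint_feature(x, y):
    """Concatenate the rows of the two matrices into one vector."""
    return np.concatenate([transition_counts(y).ravel(), emission_counts(x, y).ravel()])


# Hypothetical training sample: 4 frames of (z1, z2) with one label per frame.
x = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 2.0]])
y = ["A", "A", "B", "C"]

A = transition_counts(y)   # transitions A->A, A->B, B->C
B = emission_counts(x, y)  # row A = [1.5, 2.5], row B = [3, 1], row C = [2, 2]
psi = joint_feature(x, y)  # length 3*3 + 3*2 = 15
```

With 3 labels and 2-dim features the resulting vector has 3 × 3 + 3 × 2 = 15 entries, which is the toy-sized counterpart of the dimension used in the experiments.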
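The output of the combined feature function $\Psi(x, y)$ consists of:
◦ transition count
◦ emission count
Output label set (phone set) with size K:
$$\Sigma = \{l_1, l_2, \ldots, l_K\}$$
Kronecker delta:
$$\delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}$$
Define the label indicator vector:
$$\Lambda(y_t) = \big(\delta(l_1, y_t), \delta(l_2, y_t), \ldots, \delta(l_K, y_t)\big)^{\top}$$
◦ Only one element is not zero in $\Lambda(y_t)$
Tensor product:
◦ $a \otimes b$ is the vector of all pairwise products $a_i b_j$
Define $\Psi(x, y)$ for a sequence of length T as follows:
$$\Psi(x, y) = \begin{pmatrix} \sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t) \\ \sum_{t=1}^{T} \Lambda(y_t) \otimes x_t \end{pmatrix}$$
◦ The first block is the transition count
◦ The second block is the emission count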
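The construction from Kronecker deltas and tensor products can be sketched as follows. This is a minimal sketch, not the thesis implementation: labels are assumed to be encoded as integers 0..K-1, `np.kron` stands in for the tensor product of two vectors, and the sample data and function names are hypothetical.

```python
import numpy as np

K = 3  # size of the label set; labels are encoded as integers 0..K-1 (assumption)


def indicator(label):
    """One-hot vector built from the Kronecker delta: 1 at the label's index, 0 elsewhere."""
    return np.array([1.0 if k == label else 0.0 for k in range(K)])


def joint_feature(x, y):
    """Sum tensor products over the sequence and stack (transition block, emission block)."""
    dim = x.shape[1]
    transition = np.zeros(K * K)
    emission = np.zeros(K * dim)
    for t in range(len(y)):
        if t > 0:
            # indicator(previous label) (x) indicator(current label): one transition
            transition += np.kron(indicator(y[t - 1]), indicator(y[t]))
        # indicator(current label) (x) frame: adds the frame to that label's row
        emission += np.kron(indicator(y[t]), x[t])
    return np.concatenate([transition, emission])


x = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 2.0]])  # hypothetical frames
y = [0, 0, 1, 2]                                                # hypothetical labels
psi = joint_feature(x, y)
```

Because each indicator vector has a single non-zero element, every tensor product touches exactly one transition cell or one emission row, so the sums are equivalent to counting transitions and accumulating frames per label.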
For different pairs of (x, y), the combined feature function $\Psi(x, y)$ takes a different value.
Recall that we define the compatibility function as follows:
$$F(x, y; w) = \langle w, \Psi(x, y) \rangle$$
◦ w is to be estimated
◦ $\Psi(x, y)$ is obtained from the training data and contains the information about transition and emission
So, for a given x, F gives a different preference to each y.
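To make the idea of a preference over y concrete, the sketch below scores every candidate label sequence for a short input and sorts them from high to low. It is an illustration only: the weights are random rather than trained, the input is hypothetical, the function names are made up, and exhaustive enumeration is feasible only at toy sizes (a real decoder would use dynamic programming such as Viterbi).

```python
import itertools
import numpy as np

K, DIM = 3, 2  # toy sizes: 3 labels, 2-dim features


def joint_feature(x, y):
    """Transition counts and per-label feature sums, flattened into one vector."""
    transition = np.zeros((K, K))
    emission = np.zeros((K, DIM))
    for t, label in enumerate(y):
        if t > 0:
            transition[y[t - 1], label] += 1
        emission[label] += x[t]
    return np.concatenate([transition.ravel(), emission.ravel()])


def compatibility(w, x, y):
    """Compatibility score: inner product of w with the joint feature vector."""
    return float(np.dot(w, joint_feature(x, y)))


def rank_candidates(w, x):
    """Score all K**T label sequences and sort them from high to low."""
    candidates = itertools.product(range(K), repeat=len(x))
    scored = [(compatibility(w, x, y), y) for y in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


rng = np.random.default_rng(0)
w = rng.normal(size=K * K + K * DIM)  # hypothetical weights, not trained
x = rng.normal(size=(3, DIM))         # hypothetical 3-frame input
ranking = rank_candidates(w, x)
best_score, best_y = ranking[0]
```

Training then amounts to choosing w so that, for each training input, the correct sequence ends up at the top of this ranking.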
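Purpose:
◦ Given a training sample $(x_i, y_i)$, we want the pair with the correct answer $y_i$ to be ranked first.
◦ The gap between the first and the others should be as large as possible.
Margin:
$$\gamma_i = F(x_i, y_i; w) - \max_{y \neq y_i} F(x_i, y; w)$$
[Figure: margins of training samples 1, 2, and 3; the objective is to maximise the margin]
The primal form of Structural SVM:
◦ $\xi_i$ are slack variables, which allow the margin to be negative.
◦ Each constraint requires the margin of a training sample to be at least a target value reduced by its slack variable.
◦ C is the error tolerance weight.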
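The primal equation itself did not survive on the slide. One common way to write a structural SVM primal is given below as a sketch. It assumes a fixed target margin of 1, one slack variable per training sample, n training samples, and a C/n scaling of the slack term; the form actually used in the thesis may differ, for example by replacing the 1 with a loss term between $y_i$ and $y$.

```latex
\begin{aligned}
\min_{w,\ \xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & \langle w, \Psi(x_i, y_i)\rangle - \langle w, \Psi(x_i, y)\rangle \ge 1 - \xi_i
  \qquad \forall i,\ \forall y \ne y_i \\
& \xi_i \ge 0 \qquad \forall i
\end{aligned}
```

Under this form, a larger C penalises slack more heavily, so fewer training samples are allowed to violate the margin.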
Experiments
Corpus: TIMIT
◦ Continuous English
◦ Phone set: 48 phones
◦ # of sentences: 3696 in the training set, 192 in the testing set
Frontend:
◦ MFCC/PLP + delta + delta-delta (39 dim in total)
◦ Processed by CMS (cepstral mean subtraction)
HMM:
◦ 3-state HMM for each phone
◦ 32 Gaussian mixture components in each state
Tandem:
◦ 1000 hidden nodes
◦ Input context: the 4 previous frames, the current frame, and the 4 next frames
◦ Reduced to 37 dimensions with PCA
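The Tandem input context (4 previous frames, the current frame, and 4 next frames) can be sketched as a frame-stacking step. This is an illustration under assumptions the slides do not state: that the context frames are concatenated into one input vector, that the frames are the 39-dim frontend features, and that utterance edges are padded by repeating the first and last frame. The function name and data are hypothetical.

```python
import numpy as np


def stack_context(frames, left=4, right=4):
    """Stack each frame with its `left` previous and `right` next frames."""
    num_frames = frames.shape[0]
    # Assumption: edges are padded by repeating the first/last frame.
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # Window `offset` holds, for every frame t, the frame at t + offset - left.
    windows = [padded[offset:offset + num_frames] for offset in range(left + right + 1)]
    return np.concatenate(windows, axis=1)


frames = np.random.default_rng(0).normal(size=(100, 39))  # hypothetical utterance
stacked = stack_context(frames)  # 9 frames x 39 dims = 351 dims per time step
```

Under these assumptions each time step yields a 9 × 39 = 351-dimensional input vector.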
Structural SVM:
◦ Define the combined feature function $\Psi(x, y)$ as previously stated
◦ The output dimension of $\Psi(x, y)$: 48 × 48 + 48 × 39 = 4176
◦ Transition matrix: 48 × 48 (assumes a first-order hypothesis; could be second order)
◦ Emission matrix: 48 × 39
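The dimension arithmetic can be checked with a short helper. The first-order case reproduces the number on the slide. The second-order case is an assumption added for illustration: it supposes that a second-order transition block counts label triples, which the slides mention only as a possibility. The function name is hypothetical.

```python
def psi_dimension(num_labels, feature_dim, order=1):
    """Size of the joint feature vector: transition block plus emission block."""
    # Assumption: an order-n transition block has one entry per (n + 1)-tuple of labels.
    transition_size = num_labels ** (order + 1)
    emission_size = num_labels * feature_dim
    return transition_size + emission_size


first_order = psi_dimension(48, 39)            # 48*48 + 48*39 = 4176
second_order = psi_dimension(48, 39, order=2)  # 48**3 + 48*39 = 112464
```

The jump from 4176 to 112464 parameters shows why a second-order model is considerably more expensive to train on the same data.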
[Figure: accuracy of the baseline models, HMM and Tandem; higher is better. Best: 70.42%]
[Figure: accuracy of Structural SVM when solving the primal form vs. solving the dual form, both with C=10]
In theory, they should be the same.
[Figure: accuracy of Structural SVM vs. slack variable weight C (C=1, 10, 100, 1000), for Posterior input and for MFCC/PLP input]
Accuracy increases with C.
Best: 71.75%
[Figure: accuracy vs. C (C=1, 10, 100, 1000) for the inputs MFCC, PLP, MLP-MFCC, MLP-PLP, PCA-37-MLP-MFCC, and PCA-37-MLP-PLP]
Input without dimension reduction performs better than input with dimension reduction.
[Figure: accuracy of HMM vs. SVM-struct; annotations: "Getting Worse" and "absolute 1.33% improvement" (71.75% vs. 70.42%)]
[Figure: accuracy of first-order vs. second-order Structural SVM for C=1 and C=10; both comparisons are annotated "Better"]
Conclusion
Structural SVM performs poorly when the input is MFCC/PLP.
However, Structural SVM achieves an absolute improvement of about 1% (1.33%) over Tandem when the input is posterior probabilities.