thesis defense 2007

Student: Chao-Hong Meng Advisor: Lin-Shan Lee June 15, 2009


TRANSCRIPT

Page 1: Thesis Defense 2007

Student: Chao-Hong Meng

Advisor: Lin-Shan Lee

June 15, 2009

Page 2: Thesis Defense 2007

[Figure: MFCC/PLP features are mapped to a phone sequence (e.g., sh iy hv ae) by either an HMM or a Structural SVM.]

Page 3: Thesis Defense 2007

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 4: Thesis Defense 2007

Speech Recognition:
◦ Modeling the phone sequence given the acoustic features directly is hard

Traditional approach:
◦ Acoustic Model
◦ Language Model
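The formula on this slide did not survive extraction. As a sketch only, the traditional decomposition into an acoustic model and a language model is usually written with Bayes' rule as below; the symbols $x$ (feature sequence) and $y$ (output sequence) are borrowed from later slides and are an assumption about the notation used here.

```latex
% Assumed notation: x = acoustic feature sequence, y = phone (or word) sequence
\hat{y} = \arg\max_{y} P(y \mid x)
        = \arg\max_{y} \underbrace{P(x \mid y)}_{\text{acoustic model}} \;
                       \underbrace{P(y)}_{\text{language model}}
```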

Page 5: Thesis Defense 2007

Structural SVM:
◦ A model that can handle structured output
◦ Formulated by Joachims
◦ Directly models the mapping from input x to output y

As preliminary research:
◦ x is MFCC/PLP/Posterior
◦ y is the phone sequence

Page 6: Thesis Defense 2007

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 7: Thesis Defense 2007

Baseline Model:
◦ Monophone HMM
◦ Tandem

The difference is in their inputs:
◦ HMM: MFCC/PLP
◦ Tandem: Posterior

Page 8: Thesis Defense 2007

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 9: Thesis Defense 2007

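Define a compatibility function $F(x, y)$
◦ x is the feature vector sequence
◦ y is the correct phone sequence of x

The goal is to find an $F$ such that, when all candidate pairs are ranked from high to low by $F$, the correct pair $(x, y)$ comes first: $F(x, y) > F(x, y')$ for every $y' \neq y$.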

Page 10: Thesis Defense 2007

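We assume that $F(x, y) = \langle w, \Psi(x, y) \rangle$
◦ $\langle \cdot, \cdot \rangle$ is the inner product
◦ $\Psi(x, y)$ is the combined feature function, which encodes the relationship between x and y

In this research, the transition count and the emission count are considered.

The output of $\Psi(x, y)$:
◦ (transition count, emission count)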

Page 11: Thesis Defense 2007

Settings:
◦ 3 different phone labels
◦ 2-dim feature vectors

A training sample:

Calculate
◦ Transition count matrix A: 3 × 3, rows and columns indexed by the phone labels A, B, C
◦ Emission count matrix B: indexed by phone label (A, B, C) and feature dimension (z1, z2)

Page 12: Thesis Defense 2007

Concatenate the rows of A and B

The output of the combined feature function $\Psi(x, y)$ consists of
◦ transition count
◦ emission count
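The toy setting above (3 phone labels, 2-dim feature vectors) can be sketched in a few lines. This is an illustration under assumptions, not the author's code: the training sample on the slide was lost, so the sample below is made up, the names `LABELS` and `joint_feature` are hypothetical, and the "emission count" for real-valued features is taken to be the per-label sum of feature vectors.

```python
# Toy joint feature map: transition counts plus per-label feature sums.
# LABELS, joint_feature and the sample data are illustrative assumptions.
LABELS = ["A", "B", "C"]


def joint_feature(x, y, labels=LABELS):
    """Return (transition matrix, emission matrix, row-wise concatenation)."""
    k = len(labels)
    dim = len(x[0])
    index = {label: i for i, label in enumerate(labels)}

    # transition[i][j] counts how often label i is directly followed by label j
    transition = [[0] * k for _ in range(k)]
    for prev, cur in zip(y, y[1:]):
        transition[index[prev]][index[cur]] += 1

    # emission[i][d] sums feature dimension d over the frames labelled i
    emission = [[0.0] * dim for _ in range(k)]
    for frame, label in zip(x, y):
        for d, value in enumerate(frame):
            emission[index[label]][d] += value

    # concatenate the rows of both matrices into one flat vector
    flat = [v for row in transition for v in row]
    flat += [v for row in emission for v in row]
    return transition, emission, flat


# Made-up sample: four frames of 2-dim features, one label per frame
x = [(1.0, 2.0), (0.5, 0.5), (3.0, 1.0), (2.0, 2.0)]
y = ["A", "A", "B", "C"]
transition, emission, psi = joint_feature(x, y)
print(len(psi))  # 3*3 transition entries + 3*2 emission entries
```

With 3 labels and 2 feature dimensions the flattened vector has 3 × 3 + 3 × 2 = 15 entries, which mirrors the 48 × 48 + 48 × 39 count used in the experiments.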

Page 13: Thesis Defense 2007

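Output label set (phone set) with size K

Kronecker delta: equals 1 when its two arguments are equal and 0 otherwise

Define an indicator vector of length K for each label
◦ Only one element of this vector is nonzero

Tensor product: combines two vectors into one vector holding all pairwise products of their elements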
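The formulas on this slide were lost in extraction. A plausible reconstruction, following the notation commonly used for structural SVMs with sequence output, is sketched below; the symbols $C_k$, $\delta$, $\Lambda$ and the index convention of $\otimes$ are assumptions, not recovered from the slide.

```latex
% Assumed notation; only K, "Kronecker delta" and "tensor product" appear in the source
\{C_1, \dots, C_K\} \quad \text{(output label set)}

\delta(a, b) =
  \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}

\Lambda(y_t) = \bigl(\delta(C_1, y_t), \dots, \delta(C_K, y_t)\bigr)^{\top} \in \{0, 1\}^{K}

(a \otimes b)_{i + (j-1)D} = a_i \, b_j,
  \qquad a \in \mathbb{R}^{D},\; b \in \mathbb{R}^{K},\; a \otimes b \in \mathbb{R}^{D K}
```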

Page 14: Thesis Defense 2007

Define the combined feature function $\Psi(x, y)$ from two parts:
◦ Emission count
◦ Transition count
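The defining equation is missing from the transcript. Assuming an indicator vector $\Lambda(y_t)$ per label and a tensor product $\otimes$ as introduced on the previous slide, and the ordering "(transition count, emission count)" stated earlier, one consistent way to write it is the following sketch; $T$ (the number of frames) and $x_t$ (the feature vector at frame $t$) are assumed names.

```latex
% Sketch under assumed notation: T frames, x_t the t-th feature vector, y_t its label
\Psi(x, y) =
  \begin{pmatrix}
    \displaystyle \sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t) \\[2ex]
    \displaystyle \sum_{t=1}^{T} x_t \otimes \Lambda(y_t)
  \end{pmatrix}
  \quad
  \begin{matrix} \text{(transition count)} \\[2ex] \text{(emission count)} \end{matrix}
```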

Page 15: Thesis Defense 2007

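For different pairs (x, y), the combined feature function $\Psi(x, y)$ takes different values.

Recall that we define the compatibility function as $F(x, y) = \langle w, \Psi(x, y) \rangle$.

So, for a given x, different y receive different scores, which gives a preference over y.

w is to be estimated from training data.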
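To make the "preference over y" concrete, the sketch below scores every label sequence for a tiny input and picks the highest-scoring one. It is illustrative only: the weights are invented, labels are plain integers, the names `psi`, `score` and `predict` are hypothetical, and the exhaustive search stands in for whatever decoding the thesis actually used.

```python
from itertools import product


def psi(x, y, k):
    """Flattened transition counts followed by per-label feature sums.

    Labels are integers 0..k-1; this layout is an assumption for illustration.
    """
    dim = len(x[0])
    trans = [0.0] * (k * k)
    emit = [0.0] * (k * dim)
    for prev, cur in zip(y, y[1:]):
        trans[prev * k + cur] += 1.0
    for frame, label in zip(x, y):
        for d, value in enumerate(frame):
            emit[label * dim + d] += value
    return trans + emit


def score(w, x, y, k):
    """Compatibility F(x, y) = <w, psi(x, y)>."""
    return sum(wi * fi for wi, fi in zip(w, psi(x, y, k)))


def predict(w, x, k):
    """Exhaustive argmax over all label sequences; feasible only for tiny inputs."""
    candidates = product(range(k), repeat=len(x))
    return max(candidates, key=lambda y: score(w, x, y, k))


k = 2
x = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
# Invented weights: 4 transition entries, then 2 x 2 emission entries
w = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, -1.0, 1.0]
best = predict(w, x, k)
print(best, score(w, x, best, k))
```

Here the emission weights favour label 0 for frames that look like (1, 0) and label 1 for frames that look like (0, 1), so the preferred sequence is (0, 1, 1).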

Page 16: Thesis Defense 2007

$\Psi(x, y)$ contains the information about transitions and emissions.

[Figure: diagram relating X, Y, and w.]

Page 17: Thesis Defense 2007

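Purpose:
◦ Given a training sample $(x_i, y_i)$, we want the pair with the correct answer to rank first among all pairs $(x_i, y)$.
◦ The gap between the first and the others should be as large as possible.

Margin: the gap in compatibility score between the correct pair and the highest-scoring other pair.

[Figure: margins for training samples 1, 2, and 3; the margin is to be maximised.]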

Page 18: Thesis Defense 2007

The primal form of Structural SVM:
◦ $\xi_i$ are slack variables, which allow the margin to be negative.
◦ Each constraint sets a lower bound on the margin of a training sample.
◦ C: error tolerance weight
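The optimisation problem itself is missing from the transcript. For orientation, a commonly used primal form of the structural SVM (the margin-rescaling variant) is sketched below. Whether the slide used this variant, a fixed margin of 1, or a different scaling of $C$ cannot be recovered, so the loss $\Delta$, the number of training samples $n$, and the $C/n$ factor are assumptions.

```latex
% Sketch of a standard n-slack primal; Delta, n and the C/n scaling are assumptions
\min_{w,\; \xi \ge 0} \quad \frac{1}{2} \lVert w \rVert^{2} + \frac{C}{n} \sum_{i=1}^{n} \xi_i

\text{s.t.} \quad \forall i,\; \forall y \ne y_i : \quad
  \langle w, \Psi(x_i, y_i) \rangle - \langle w, \Psi(x_i, y) \rangle
  \;\ge\; \Delta(y_i, y) - \xi_i
```

Setting $\Delta(y_i, y) = 1$ for all $y \ne y_i$ gives the fixed-margin version, in which each margin must be at least $1 - \xi_i$.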

Page 19: Thesis Defense 2007

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 20: Thesis Defense 2007

Corpus: TIMIT
◦ Continuous English
◦ Phone set: 48 phones

Frontend:
◦ MFCC/PLP + delta + delta-delta (39 dimensions in total)
◦ Processed by CMS

HMM
◦ 3-state HMM for each phone
◦ 32 Gaussian mixtures in each state

Tandem
◦ 1000 hidden nodes
◦ Input window: the current frame plus the 4 previous and 4 following frames
◦ Reduced to 37 dimensions with PCA

                  Training set   Testing set
# of Sentences    3696           192

Page 21: Thesis Defense 2007

Structural SVM
◦ Define $\Psi(x, y)$ as previously stated
◦ The output dimension of $\Psi(x, y)$: 48 × 48 (transition matrix) + 48 × 39 (emission matrix) = 4176
◦ The transition matrix assumes a first-order hypothesis; it could be second order
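The dimension count can be checked with a short helper. This is a sketch: the function name `psi_dimension` is hypothetical, and the second-order case assumes the transition block is indexed by label triples (48³ entries), which the slides do not spell out.

```python
def psi_dimension(num_labels, feature_dim, order=1):
    """Size of the joint feature vector: transition block plus emission block.

    Assumption: an order-n transition block has num_labels ** (n + 1) entries.
    """
    transition = num_labels ** (order + 1)
    emission = num_labels * feature_dim
    return transition + emission


first_order = psi_dimension(48, 39)             # 48*48 + 48*39
second_order = psi_dimension(48, 39, order=2)   # 48**3 + 48*39 (assumed layout)
print(first_order, second_order)
```

Under that assumption the second-order feature vector is roughly 27 times larger than the first-order one, which is worth keeping in mind when comparing the two settings.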

Page 22: Thesis Defense 2007

[Figure: baseline results for HMM and Tandem (y-axis 58% to 72%). Annotations: "Best: 70.42%", "Better".]

Page 23: Thesis Defense 2007

[Figure: Structural SVM solved in primal form versus dual form, both with C=10 (y-axis 40% to 70%).]

In theory, the two should be the same.

Page 24: Thesis Defense 2007

[Figure: accuracy versus slack variable weight C (C=1, 10, 100, 1000; y-axis 35% to 70%), for Input: Posterior and Input: MFCC/PLP. Best: 71.75%.]

Accuracy increases with C.

Page 25: Thesis Defense 2007

[Figure: results at C=1, 10, 100, 1000 (y-axis 35% to 70%) for the inputs MFCC, PLP, MLP-MFCC, MLP-PLP, PCA-37-MLP-MFCC, and PCA-37-MLP-PLP.]

Inputs without dimension reduction perform better than inputs with dimension reduction.

Page 26: Thesis Defense 2007

[Figure: HMM versus SVM-struct (y-axis 45% to 70%). Annotations: 71.75% (SVM-struct) versus 70.42% (baseline), an absolute improvement of 1.33%; "Getting Worse".]

Page 27: Thesis Defense 2007

[Figure: first versus second order at C = 1 and C = 10 (y-axis 35% to 70%). Legend: First, C=1; Second, C=1; First, C=10; Second, C=10. Annotation: "Better" at both C = 1 and C = 10.]

Page 28: Thesis Defense 2007

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 29: Thesis Defense 2007

Structural SVM performs badly when the input is MFCC/PLP.

With posterior probabilities as input, however, Structural SVM improves on Tandem by 1.33% absolute (71.75% versus 70.42%).

Page 30: Thesis Defense 2007