Boosting Training Scheme for Acoustic Modeling
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute, School of Computer Science, Carnegie Mellon University


Page 1:

Boosting Training Scheme for Acoustic Modeling

Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,

School of Computer Science

Carnegie Mellon University

Page 2:

Reference Papers

• [ICASSP 03] Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models

• [EUROSPEECH 03] Comparative Study of Boosting and Non-Boosting Training for Constructing Ensembles of Acoustic Models

• [ICSLP 04] A Frame Level Boosting Training Scheme for Acoustic Modeling

• [ICSLP 04] Optimizing Boosting with Discriminative Criteria

• [ICSLP 04] Apply N-Best List Re-Ranking to Acoustic Model Combinations of Boosting Training

• [EUROSPEECH 05] Investigations on Ensemble Based Semi-Supervised Acoustic Model Training

• [ICSLP 06] Investigations of Issues for Using Multiple Acoustic Models to Improve Continuous Speech Recognition

Page 3:

Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models

ICASSP 2003

Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,

School of Computer Science

Carnegie Mellon University

Page 4:

Introduction

• An ensemble of classifiers is a collection of single classifiers that selects a hypothesis by voting among its components. Common voting schemes are plurality voting, majority voting, and weighted voting (see the sketch after this list).

• Bagging and Boosting are the two most successful algorithms for constructing ensembles.

• This paper describes work on applying ensembles of acoustic models to the problem of LVCSR.
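To make the three voting schemes concrete, here is a minimal sketch (ours, not from the paper), assuming each single classifier emits one hypothesis string and that weighted voting uses some per-model weight, such as the boosting importance discussed later:

```python
from collections import Counter

def plurality_vote(hypotheses):
    """Pick the hypothesis proposed by the largest number of classifiers."""
    return Counter(hypotheses).most_common(1)[0][0]

def majority_vote(hypotheses):
    """Pick a hypothesis only if more than half of the classifiers agree."""
    winner, count = Counter(hypotheses).most_common(1)[0]
    return winner if count > len(hypotheses) / 2 else None

def weighted_vote(hypotheses, weights):
    """Sum per-model weights for each hypothesis and pick the heaviest."""
    scores = {}
    for hyp, w in zip(hypotheses, weights):
        scores[hyp] = scores.get(hyp, 0.0) + w
    return max(scores, key=scores.get)

# Example: three models decode the same utterance.
hyps = ["show me flights", "show me flights", "show flights"]
print(plurality_vote(hyps))                   # "show me flights"
print(majority_vote(hyps))                    # "show me flights" (2 of 3)
print(weighted_vote(hyps, [0.2, 0.3, 0.9]))   # "show flights" wins on weight
```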

Page 5:

Bagging vs. Boosting

• Bagging
– In each round, bagging randomly selects a number of examples from the original training set and produces a new single classifier based on the selected subset (a minimal sketch follows after this list).
– The final classifier is built by choosing the hypothesis best agreed on by the single classifiers.

• Boosting
– In boosting, the single classifiers are iteratively trained in such a fashion that hard-to-classify examples are given increasing emphasis.
– A parameter that measures each classifier's importance is determined with respect to its classification accuracy.
– The final hypothesis is the weighted majority vote of the single classifiers.
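A minimal sketch of one bagging round under the description above; the bootstrap size (equal to the original set) and the `train_model` callback are our assumptions, since the slide only says "a number of examples":

```python
import random

def bagging_round(training_set, train_model, rng=random):
    """One bagging round: sample utterances with replacement from the
    original training set, then fit a new single classifier on the subset."""
    subset = [rng.choice(training_set) for _ in range(len(training_set))]
    return train_model(subset)
```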

Page 6:

Algorithms

• The first algorithm is based on the intuition that an incorrectly recognized utterance should receive more attention in training.

• For example, if the weight of an utterance is 2.6, we first add two copies of the utterance to the new training set, and then add a third copy with probability 0.6 (see the sketch below).
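A minimal sketch of this duplication rule (helper name ours). With weight 2.6 it adds two guaranteed copies and a third with probability 0.6, matching the example above:

```python
import math
import random

def resample_by_weight(utterance, weight, rng=random):
    """Duplicate an utterance floor(weight) times, plus one extra copy
    with probability equal to the fractional part of the weight."""
    copies = [utterance] * int(math.floor(weight))
    if rng.random() < weight - math.floor(weight):
        copies.append(utterance)
    return copies

# Weight 2.6 -> two copies always, a third copy with probability 0.6.
new_training_set = []
for utt, w in [("utt_001", 2.6), ("utt_002", 0.4)]:
    new_training_set.extend(resample_by_weight(utt, w))
```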

Page 7:

Algorithms

• The exponential increase in the size of the training set is a severe problem for Algorithm 1.

• Algorithm 2 is proposed to address this problem.

Page 8:

Algorithms

• In Algorithms 1 and 2, there is no mechanism for measuring how important a model is relative to the others; a good model should play a more important role than a bad one.

• Define the error indicator for model $t$ on utterance $x_i$:

$$e_{t,i} = \begin{cases} 1 & \text{if model } t \text{ misrecognizes } x_i \\ 0 & \text{otherwise} \end{cases}$$

• The objective function is the exponential loss

$$L = \sum_{i=1}^{N} \exp\!\Big(\sum_{t=1}^{T} c_t\, e_{t,i}\Big)$$

where $c_t$ measures the importance of model $t$. Factoring out the last model,

$$L = \sum_{i=1}^{N} w_i^{T} \exp\big(c_T\, e_{T,i}\big), \qquad w_i^{T} = \exp\!\Big(\sum_{t=1}^{T-1} c_t\, e_{t,i}\Big)$$

so each utterance carries a weight $w_i^{T}$ accumulated from the errors of the previous models.
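Under this reconstruction, a sketch of the weight bookkeeping; the closed form for $c_t$ follows the standard AdaBoost choice and is our assumption, as the slides do not preserve it:

```python
import math

def model_importance(errors, weights):
    """c_t from the weighted error rate of model t (standard AdaBoost
    form, assumed here); errors[i] = e_{t,i} in {0, 1}."""
    eps = sum(w * e for w, e in zip(weights, errors)) / sum(weights)
    return math.log((1.0 - eps) / eps)  # assumes 0 < eps < 0.5

def update_weights(weights, errors, c_t):
    """w_i <- w_i * exp(c_t * e_{t,i}): misrecognized utterances get
    heavier, so the next model concentrates on them."""
    return [w * math.exp(c_t * e) for w, e in zip(weights, errors)]
```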

Page 9:

Experiments

• Corpus: CMU Communicator system

• Experimental results: [results table not preserved in the transcript]

Page 10:

A Frame Level Boosting Training Scheme for Acoustic Modeling

ICSLP 2004

Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,

School of Computer Science

Carnegie Mellon University

Page 11:

Introduction

• In the current Boosting algorithm, the utterance is the basic unit used for acoustic model training.

• Our analysis shows two notable weaknesses in this setting:
– First, the objective function of the current Boosting algorithm is designed to minimize utterance error instead of word error.
– Second, in the current algorithm, an utterance is treated as a single unit for resampling.

• This paper proposes a frame level Boosting training scheme for acoustic modeling to address these two problems.

Page 12:

Frame Level Boosting Training Scheme

• The metric we use in Boosting training is the frame level conditional probability of a word $a$ at frame $t$ (word level), computed from the N-best list:

$$P(a_t = a \mid \mathbf{x}) = \frac{\sum_{h \in NBest,\, h_t = a} P(h \mid \mathbf{x})}{\sum_{h \in NBest} P(h \mid \mathbf{x})}$$

• Objective function:

$$L = \sum_{i=1}^{N} \sum_{t=1}^{T_i} \exp\!\Big(\sum_{a \neq a_{i,t}} P(a \mid \mathbf{x}_i) - P(a_{i,t} \mid \mathbf{x}_i)\Big)$$

where $a_{i,t}$ is the label (correct word) for frame $t$ of utterance $i$. The summand

$$\ell_{i,t} = \exp\!\Big(\sum_{a \neq a_{i,t}} P(a \mid \mathbf{x}_i) - P(a_{i,t} \mid \mathbf{x}_i)\Big)$$

is the pseudo loss for frame $t$, which describes the degree of confusion of this frame for recognition.
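A sketch of the frame level posterior defined above; the `(posterior, alignment)` representation of the N-best list is our assumption:

```python
def frame_posterior(nbest, t, a):
    """P(a_t = a | x): posterior mass of N-best hypotheses whose aligned
    word at frame t is a, normalized by the total N-best mass.

    nbest: list of (posterior, alignment) pairs, where alignment[t]
    is the word the hypothesis assigns to frame t.
    """
    total = sum(p for p, _ in nbest)
    matched = sum(p for p, ali in nbest if ali[t] == a)
    return matched / total
```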

Page 13:

Frame Level Boosting Training Scheme

• Training scheme: how to resample the frame level training data?
– Duplicate each frame $x_{i,t}$ according to its weight $w_{i,t}$ and concatenate the copies into a new utterance for acoustic model training (see the sketch below).
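A minimal sketch of this resampling step, assuming frame weights are rounded to integer copy counts (the rounding rule is our assumption):

```python
def resample_frames(frames, frame_weights):
    """Duplicate each frame x_{i,t} round(w_{i,t}) times and concatenate
    the copies into a new pseudo-utterance for acoustic model training."""
    new_utterance = []
    for frame, w in zip(frames, frame_weights):
        new_utterance.extend([frame] * int(round(w)))
    return new_utterance
```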

Page 14:

Experiments

• Corpus: CMU Communicator system

• Experimental results: [results table not preserved in the transcript]

Page 15:

Discussions

• Some speculations on this outcome:
– 1. The mismatch between the training criterion and the target in principle still exists in the new scheme.
– 2. The frame based resampling method depends exclusively on the weights calculated during the training process.
– 3. Forced alignment is used to determine the correct word for each frame.
– 4. A new training scheme that considers both utterance and frame level recognition errors may be more suitable for accurate modeling.