dynamic data selection in search (1/4)

A Dynamic In-Search Data Selection Method With Its Application to Acoustic Modeling and Utterance Verification (about training)

Upload: gema

Post on 19-Jan-2016



TRANSCRIPT

Page 1: Dynamic data selection in search (1/4)

A Dynamic In-Search Data Selection Method With Its Application to Acoustic Modeling and Utterance Verification (about training)

Page 2: Dynamic data selection in search (1/4)

Dynamic data selection in search (1/4)

• In continuous speech recognition, it becomes much more difficult to define the competing-token (CT) or true-token (TT) set because the unit boundaries are unknown.

• As a result, any possible segmentation of an utterance could potentially become a competing token. However, an exhaustive search is too expensive to be affordable.

Page 3: Dynamic data selection in search (1/4)

Dynamic data selection in search (2/4)

• In this paper, every utterance is recognized with the Viterbi beam search algorithm.

• All partial paths that survive the beam search have relatively large likelihood values and can potentially compete with the true path.

Page 4: Dynamic data selection in search (1/4)

Dynamic data selection in search (3/4)

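• Let $W$ be an active word-ending path at time $t$, made up of hypothesized phone segments:

$$W = \{\, p_{t_1}^{t_2}(a_1),\ p_{t_2}^{t_3}(a_2),\ \ldots,\ p_{t_{M-1}}^{t_M}(a_{M-1}) \,\}$$

• The reference phone segmentation is obtained by forced alignment.

• For each hypothesized segment $p_{t_m}^{t_{m+1}}(a_m)$, we calculate the maximum overlap rate over all reference segments with the same phone identity as the hypothesized phone. For a hypothesized segment $(t_s, t_e)$ and a reference segment $(t_s', t_e')$, the overlap rate is

$$\frac{\min(t_e, t_e') - \max(t_s, t_s') + 1}{\big((t_e - t_s + 1) + (t_e' - t_s' + 1)\big)/2}$$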
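The overlap-rate computation on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: segments are given as inclusive `(start_frame, end_frame)` pairs, the function and variable names are hypothetical, and clamping a non-overlapping pair to 0 is a choice made here, not something the slide states.

```python
def overlap_rate(hyp, ref):
    """Overlap rate of two segments given as inclusive (start, end) frame pairs."""
    (ts, te), (rs, re_) = hyp, ref
    shared = min(te, re_) - max(ts, rs) + 1          # frames common to both segments
    if shared <= 0:                                  # assumption: no overlap counts as 0
        return 0.0
    mean_len = ((te - ts + 1) + (re_ - rs + 1)) / 2  # average of the two durations
    return shared / mean_len


def max_overlap(hyp_seg, hyp_phone, reference):
    """Max overlap over reference segments (phone, start, end) with the same phone."""
    rates = [overlap_rate(hyp_seg, (s, e))
             for phone, s, e in reference if phone == hyp_phone]
    return max(rates, default=0.0)


reference = [("a", 0, 9), ("b", 10, 19), ("a", 15, 24)]
print(max_overlap((10, 19), "a", reference))
```

A hypothesized segment that coincides exactly with a same-phone reference segment gets rate 1; a segment with no same-phone reference segment gets 0 under the clamping assumption.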

Page 5: Dynamic data selection in search (1/4)

Dynamic data selection in search (4/4)

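A hypothesized segment $p_{t_m}^{t_{m+1}}(a_m)$ is labeled as:

• a true token of $a_m$ if its maximum overlap rate exceeds a first threshold;

• a competing token of $a_m$ if its maximum overlap rate is below a second threshold, and the average per-frame likelihood of $p_{t_m}^{t_{m+1}}(a_m)$ is high relative to the average likelihood of the same segment under the forced-alignment procedure.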
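The labeling rule can be written as a small decision function. This is a hedged sketch, not the authors' implementation: the threshold values are placeholders, the function name is hypothetical, and the likelihood test is assumed here to be a strict "greater than" comparison against the forced-alignment likelihood, which the slide text does not preserve.

```python
def classify_token(max_overlap, hyp_avg_loglik, ref_avg_loglik,
                   tt_threshold=0.8, ct_threshold=0.3):
    """Label a hypothesized phone segment as 'true', 'competing', or None.

    tt_threshold / ct_threshold are illustrative values only.
    """
    if max_overlap > tt_threshold:
        return "true"
    # Assumption: "competes" means the hypothesis scores above the forced alignment.
    if max_overlap < ct_threshold and hyp_avg_loglik > ref_avg_loglik:
        return "competing"
    return None  # neither set: the segment is not used for training


print(classify_token(0.9, -5.0, -4.0))
print(classify_token(0.1, -3.0, -4.0))
```

Segments that fall between the two thresholds are simply left out of both sets in this sketch.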

Page 6: Dynamic data selection in search (1/4)

Application I : acoustic modeling (1/20)

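• Given the true tokens of a phone model => MAP

• MAP:

$$\lambda_{MAP} = \arg\max_{\lambda} p(\lambda \mid X) = \arg\max_{\lambda} \frac{p(X \mid \lambda)\, p(\lambda)}{p(X)} = \arg\max_{\lambda} p(X \mid \lambda)\, p(\lambda)$$

compared with ML estimation:

$$\lambda_{ML} = \arg\max_{\lambda} p(X \mid \lambda)$$

Here $X$ denotes the training observation sequences and $p(\lambda)$ carries the prior information.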
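To make the difference between the two estimators concrete, here is a minimal sketch for the simplest possible case: the mean of a one-dimensional Gaussian with known precision and a conjugate Gaussian prior on the mean. The function names and the prior weight `tau` are assumptions made for the illustration; the slides themselves treat full HMM parameters.

```python
def ml_mean(xs):
    """ML estimate of a Gaussian mean: the sample average."""
    return sum(xs) / len(xs)


def map_mean_known_precision(xs, prior_mean, tau):
    """MAP estimate of the mean with a Gaussian prior centered at prior_mean.

    tau acts like a count of pseudo-observations located at prior_mean.
    """
    return (tau * prior_mean + sum(xs)) / (tau + len(xs))


xs = [1.0, 2.0, 3.0]
print(ml_mean(xs))                             # data only
print(map_mean_known_precision(xs, 0.0, 1.0))  # pulled toward the prior mean
```

With `tau = 0` the MAP estimate reduces to the ML estimate, and as more data arrive the influence of the prior fades.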

Page 7: Dynamic data selection in search (1/4)

Application I : acoustic modeling (2/20)

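E-step:

$$R(\lambda \mid \bar\lambda) = Q(\lambda \mid \bar\lambda) + \log g(\lambda)$$

$$= E\big[\log p(X, S, L \mid \lambda) \,\big|\, X, \bar\lambda\big] + \log g(\lambda)$$

$$= \sum_{S}\sum_{L} p(S, L \mid X, \bar\lambda)\, \log p(X, S, L \mid \lambda) + \log g(\lambda)$$

Where

$$p(X, S, L \mid \lambda) = \pi_{s_1} w_{s_1 l_1} N(x_1 \mid m_{s_1 l_1}, r_{s_1 l_1}) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, w_{s_t l_t}\, N(x_t \mid m_{s_t l_t}, r_{s_t l_t})$$

and $\bar\lambda$ denotes the current parameter estimate.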

Page 8: Dynamic data selection in search (1/4)

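A simpler example (DHMM):

$$Q(\lambda \mid \bar\lambda) = E\big[\log P(X, S \mid \lambda) \,\big|\, X, \bar\lambda\big] = \sum_{S} P(S \mid X, \bar\lambda)\, \log P(X, S \mid \lambda) = \sum_{S} \frac{P(X, S \mid \bar\lambda)}{P(X \mid \bar\lambda)}\, \log P(X, S \mid \lambda)$$

$$P(X, S \mid \lambda) = \pi_{s_1} b_{s_1}(x_1) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, b_{s_t}(x_t)$$

$$\log P(X, S \mid \lambda) = \log \pi_{s_1} + \sum_{t=2}^{T} \log a_{s_{t-1} s_t} + \sum_{t=1}^{T} \log b_{s_t}(x_t)$$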

Page 9: Dynamic data selection in search (1/4)

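A simpler example (DHMM): two states, observation sequence $X = (v_4, v_7, v_4)$. Each of the eight state paths $S$ contributes one term $p(X, S \mid \lambda)$; $\log p(X, S \mid \lambda)$ is the sum of the logarithms of the same factors.

Path   S          p(X, S | λ)
1      (1,1,1)    $\pi_1\, b_{1,4}\, a_{11}\, b_{1,7}\, a_{11}\, b_{1,4}$
2      (1,1,2)    $\pi_1\, b_{1,4}\, a_{11}\, b_{1,7}\, a_{12}\, b_{2,4}$
3      (1,2,1)    $\pi_1\, b_{1,4}\, a_{12}\, b_{2,7}\, a_{21}\, b_{1,4}$
4      (1,2,2)    $\pi_1\, b_{1,4}\, a_{12}\, b_{2,7}\, a_{22}\, b_{2,4}$
5      (2,1,1)    $\pi_2\, b_{2,4}\, a_{21}\, b_{1,7}\, a_{11}\, b_{1,4}$
6      (2,1,2)    $\pi_2\, b_{2,4}\, a_{21}\, b_{1,7}\, a_{12}\, b_{2,4}$
7      (2,2,1)    $\pi_2\, b_{2,4}\, a_{22}\, b_{2,7}\, a_{21}\, b_{1,4}$
8      (2,2,2)    $\pi_2\, b_{2,4}\, a_{22}\, b_{2,7}\, a_{22}\, b_{2,4}$

[Figure: trellis of the two-state DHMM over three time steps, with start node, transitions $a_{12}$, $a_{21}$, $a_{22}$, and observations $v_4$, $v_7$, $v_4$.]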

Page 10: Dynamic data selection in search (1/4)

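A simpler example (DHMM): let $P_n = p(X, S_n \mid \bar\lambda)$ for path $n$, and let $P_{all} = P_1 + P_2 + \cdots + P_8$ be the sum over all paths. Collecting the terms of the Q function by parameter gives:

Initial-state terms:

$$\frac{P_1 + P_2 + P_3 + P_4}{P_{all}} \log \pi_1 + \frac{P_5 + P_6 + P_7 + P_8}{P_{all}} \log \pi_2$$

Transition terms (rows: $i$, $j$; columns: $t = 1$, $t = 2$):

Term               t = 1                            t = 2
$\log a_{11}$      $(P_1 + P_2)/P_{all}$            $(P_1 + P_5)/P_{all}$
$\log a_{12}$      $(P_3 + P_4)/P_{all}$            $(P_2 + P_6)/P_{all}$
$\log a_{21}$      $(P_5 + P_6)/P_{all}$            $(P_3 + P_7)/P_{all}$
$\log a_{22}$      $(P_7 + P_8)/P_{all}$            $(P_4 + P_8)/P_{all}$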
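The path bookkeeping of this example can be checked by brute force. The sketch below enumerates all eight state paths of a two-state DHMM for the observation sequence $(v_4, v_7, v_4)$ and accumulates the posterior weights attached to each $\log \pi_i$ and $\log a_{ij}$ term. All numeric parameter values are made up for the illustration, and the variable names are hypothetical.

```python
from itertools import product

# Illustrative parameters (assumed values, not from the slides).
pi = {1: 0.6, 2: 0.4}
a = {(1, 1): 0.7, (1, 2): 0.3, (2, 1): 0.4, (2, 2): 0.6}
b = {1: {4: 0.5, 7: 0.5}, 2: {4: 0.2, 7: 0.8}}  # b[state][symbol index]
obs = (4, 7, 4)


def path_prob(states):
    """p(X, S | lambda) for one state path."""
    p = pi[states[0]] * b[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= a[(prev, cur)] * b[cur][o]
    return p


# product() yields (1,1,1), (1,1,2), ..., (2,2,2): paths 1..8 in slide order.
paths = list(product((1, 2), repeat=len(obs)))
probs = [path_prob(s) for s in paths]
p_all = sum(probs)

# Weight on log pi_i: posterior probability of starting in state i.
gamma1 = {i: sum(p for s, p in zip(paths, probs) if s[0] == i) / p_all
          for i in (1, 2)}

# Weight on log a_ij: posterior transition counts, summed over t = 1, 2.
xi = {}
for i in (1, 2):
    for j in (1, 2):
        total = 0.0
        for s, p in zip(paths, probs):
            for t in range(len(obs) - 1):
                if (s[t], s[t + 1]) == (i, j):
                    total += p
        xi[(i, j)] = total / p_all

print(gamma1)
print(xi)
```

The weight on $\log a_{11}$ collects paths 1 and 2 at the first transition and paths 1 and 5 at the second, which is the grouping shown on the slide. Forward-backward computes the same quantities without enumerating paths.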

Page 11: Dynamic data selection in search (1/4)

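A simpler example (DHMM):

$$Q(\lambda \mid \bar\lambda) = \sum_{S} \frac{P(X, S \mid \bar\lambda)}{P(X \mid \bar\lambda)}\, \log P(X, S \mid \lambda)$$

$$= \sum_{i=1}^{N} \Pr(s_1 = i \mid X, \bar\lambda)\, \log \pi_i + \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1} \Pr(s_t = i, s_{t+1} = j \mid X, \bar\lambda)\, \log a_{ij} + \sum_{j=1}^{N}\sum_{k=1}^{K}\sum_{t:\, x_t \sim v_k} \Pr(s_t = j, x_t \sim v_k \mid X, \bar\lambda)\, \log b_{jk}$$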

Page 12: Dynamic data selection in search (1/4)

Application I : acoustic modeling (3/20)

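The Q function can be decomposed into:

$$Q(\lambda \mid \bar\lambda) = \sum_{i=1}^{N} \gamma_1(i) \log \pi_i + \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T-1} \xi_t(i,j) \log a_{ij} + \sum_{i=1}^{N}\sum_{k=1}^{K}\sum_{t=1}^{T} \zeta_t(i,k) \log w_{ik} + \sum_{i=1}^{N}\sum_{k=1}^{K}\sum_{t=1}^{T} \zeta_t(i,k) \log N(x_t \mid m_{ik}, r_{ik})$$

where

$$\xi_t(i,j) = \Pr(s_t = i, s_{t+1} = j \mid X, \bar\lambda), \qquad \gamma_t(i) = \Pr(s_t = i \mid X, \bar\lambda)$$

$$\zeta_t(i,k) = \Pr(s_t = i, l_t = k \mid X, \bar\lambda)$$

and

$$\zeta_t(i,k) = \gamma_t(i)\, \frac{w_{ik}\, N(x_t \mid m_{ik}, r_{ik})}{\sum_{k'=1}^{K} w_{ik'}\, N(x_t \mid m_{ik'}, r_{ik'})}$$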

Page 13: Dynamic data selection in search (1/4)

Application I : acoustic modeling (4/20)

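The prior density for $\lambda$ is assumed to be:

$$g(\lambda) = K_c \prod_{i=1}^{N} \left[ \pi_i^{\eta_i - 1} \prod_{j=1}^{N} a_{ij}^{\eta_{ij} - 1} \prod_{k=1}^{K} w_{ik}^{\nu_{ik} - 1}\, g(m_{ik}, r_{ik}) \right]$$

If $r_{ik}$ is a diagonal precision matrix, then $g(m_{ik}, r_{ik})$ is assumed to be a product of normal-gamma densities:

$$g(m_{ik}, r_{ik}) = \prod_{d=1}^{D} r_{ikd}^{\alpha_{ikd} - 1/2}\, e^{-\frac{\tau_{ikd} r_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2}\, e^{-\beta_{ikd} r_{ikd}}, \quad \text{where } \tau_{ikd}, \alpha_{ikd}, \beta_{ikd} > 0$$

$$\log g(\lambda) = \sum_{i=1}^{N} (\eta_i - 1) \log \pi_i + \sum_{i=1}^{N}\sum_{j=1}^{N} (\eta_{ij} - 1) \log a_{ij} + \sum_{i=1}^{N}\sum_{k=1}^{K} (\nu_{ik} - 1) \log w_{ik} + \sum_{i=1}^{N}\sum_{k=1}^{K} \log g(m_{ik}, r_{ik}) + \text{Constant}$$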

Page 14: Dynamic data selection in search (1/4)

Application I : acoustic modeling (5/20)

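Take $\dfrac{\partial R(\lambda \mid \bar\lambda)}{\partial m_{ikd}} = 0$.

With a diagonal precision matrix,

$$N(x_t \mid m_{ik}, r_{ik}) \propto \prod_{d=1}^{D} r_{ikd}^{1/2}\, e^{-\frac{1}{2} r_{ikd} (x_{td} - m_{ikd})^2}$$

so

$$\frac{\partial \log N(x_t \mid m_{ik}, r_{ik})}{\partial m_{ikd}} = \frac{1}{N(x_t \mid m_{ik}, r_{ik})} \frac{\partial N(x_t \mid m_{ik}, r_{ik})}{\partial m_{ikd}} = \left(-\tfrac{1}{2}\right) r_{ikd} \cdot 2\,(x_{td} - m_{ikd})(-1) = r_{ikd}\,(x_{td} - m_{ikd})$$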

Page 15: Dynamic data selection in search (1/4)

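For the prior term, with $g(m_{ik}, r_{ik}) = \prod_{d=1}^{D} r_{ikd}^{\alpha_{ikd} - 1/2}\, e^{-\frac{\tau_{ikd} r_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2}\, e^{-\beta_{ikd} r_{ikd}}$:

$$\frac{\partial \log g(m_{ik}, r_{ik})}{\partial m_{ikd}} = \frac{1}{g(m_{ik}, r_{ik})} \frac{\partial g(m_{ik}, r_{ik})}{\partial m_{ikd}} = -\frac{\tau_{ikd} r_{ikd}}{2} \cdot 2\,(m_{ikd} - \mu_{ikd}) = -\tau_{ikd}\, r_{ikd}\,(m_{ikd} - \mu_{ikd})$$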

Page 16: Dynamic data selection in search (1/4)

Application I : acoustic modeling (6/20)

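Setting the derivative to zero, with $\zeta_t(i,k) = \Pr(s_t = i, l_t = k \mid X, \bar\lambda)$:

$$\sum_{t=1}^{T} \zeta_t(i,k)\, r_{ikd}\,(x_{td} - m_{ikd}) - \tau_{ikd}\, r_{ikd}\,(m_{ikd} - \mu_{ikd}) = 0$$

$$\Rightarrow\quad m_{ikd} \Big( \tau_{ikd} + \sum_{t=1}^{T} \zeta_t(i,k) \Big) = \tau_{ikd}\, \mu_{ikd} + \sum_{t=1}^{T} \zeta_t(i,k)\, x_{td}$$

$$\Rightarrow\quad m_{ikd} = \frac{\tau_{ikd}\, \mu_{ikd} + \sum_{t=1}^{T} \zeta_t(i,k)\, x_{td}}{\tau_{ikd} + \sum_{t=1}^{T} \zeta_t(i,k)}$$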
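The MAP mean update is a weighted average of the prior mean and the occupancy-weighted data. A minimal sketch for a single dimension $d$ of one mixture component follows; the function name and the list-based inputs are assumptions for illustration, and `zeta` stands for the per-frame occupancy $\Pr(s_t = i, l_t = k \mid X, \bar\lambda)$, which would normally come from forward-backward.

```python
def map_mean_update(x, zeta, mu, tau):
    """MAP re-estimate of one Gaussian mean component.

    x    : per-frame observations x_td (one dimension)
    zeta : per-frame occupancy of this state/mixture pair
    mu   : prior mean, tau : prior weight (tau > 0)
    """
    weighted_sum = sum(z * xt for z, xt in zip(zeta, x))
    occupancy = sum(zeta)
    return (tau * mu + weighted_sum) / (tau + occupancy)


print(map_mean_update([1.0, 3.0], [1.0, 1.0], mu=0.0, tau=2.0))
```

With little data (small total occupancy) the estimate stays near `mu`; with a lot of data it approaches the usual occupancy-weighted sample mean.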

Page 17: Dynamic data selection in search (1/4)

Application I : acoustic modeling (7/20)

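Take $\dfrac{\partial R(\lambda \mid \bar\lambda)}{\partial r_{ikd}} = 0$.

With

$$N(x_t \mid m_{ik}, r_{ik}) \propto \prod_{d=1}^{D} r_{ikd}^{1/2}\, e^{-\frac{1}{2} r_{ikd} (x_{td} - m_{ikd})^2}$$

we have

$$\frac{\partial \log N(x_t \mid m_{ik}, r_{ik})}{\partial r_{ikd}} = \frac{\partial}{\partial r_{ikd}} \left[ \tfrac{1}{2} \log(r_{ikd}) - \tfrac{1}{2}\, r_{ikd}\,(x_{td} - m_{ikd})^2 \right] = \frac{1}{2 r_{ikd}} - \frac{1}{2}(x_{td} - m_{ikd})^2$$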

Page 18: Dynamic data selection in search (1/4)

Application I : acoustic modeling (8/20)

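For the prior term, write $g(m_{ik}, r_{ik})$ per dimension as the product of three factors:

$$\underbrace{r_{ikd}^{\alpha_{ikd} - 1/2}}_{(1)}\ \underbrace{e^{-\frac{\tau_{ikd} r_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2}}_{(2)}\ \underbrace{e^{-\beta_{ikd} r_{ikd}}}_{(3)}$$

Differentiating each factor in turn while holding the other two fixed:

$$\frac{\partial \log g(m_{ik}, r_{ik})}{\partial r_{ikd}} = \frac{1}{g(m_{ik}, r_{ik})} \frac{\partial g(m_{ik}, r_{ik})}{\partial r_{ikd}} = \frac{\alpha_{ikd} - 1/2}{r_{ikd}} - \frac{\tau_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2 - \beta_{ikd}$$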

Page 19: Dynamic data selection in search (1/4)

Application I : acoustic modeling (9/20)

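Setting the derivative to zero, with $\zeta_t(i,k) = \Pr(s_t = i, l_t = k \mid X, \bar\lambda)$:

$$\sum_{t=1}^{T} \zeta_t(i,k) \left[ \frac{1}{2 r_{ikd}} - \frac{1}{2}(x_{td} - m_{ikd})^2 \right] + \frac{\alpha_{ikd} - 1/2}{r_{ikd}} - \frac{\tau_{ikd}}{2}(m_{ikd} - \mu_{ikd})^2 - \beta_{ikd} = 0$$

$$\Rightarrow\quad \frac{(2\alpha_{ikd} - 1) + \sum_{t=1}^{T} \zeta_t(i,k)}{r_{ikd}} = 2\beta_{ikd} + \tau_{ikd}(m_{ikd} - \mu_{ikd})^2 + \sum_{t=1}^{T} \zeta_t(i,k)\,(x_{td} - m_{ikd})^2$$

$$\Rightarrow\quad r_{ikd} = \frac{(2\alpha_{ikd} - 1) + \sum_{t=1}^{T} \zeta_t(i,k)}{2\beta_{ikd} + \tau_{ikd}(m_{ikd} - \mu_{ikd})^2 + \sum_{t=1}^{T} \zeta_t(i,k)\,(x_{td} - m_{ikd})^2}$$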
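The precision update can be sketched the same way as the mean update. Again this is an illustration for one dimension of one mixture component: the function name is hypothetical, `zeta` is the per-frame occupancy, and `m` is assumed to be the already re-estimated mean.

```python
def map_precision_update(x, zeta, m, mu, tau, alpha, beta):
    """MAP re-estimate of one diagonal precision component r_ikd.

    x, zeta : per-frame observations and occupancies
    m       : re-estimated mean for this dimension
    mu, tau, alpha, beta : normal-gamma prior hyperparameters
    """
    numerator = (2.0 * alpha - 1.0) + sum(zeta)
    scatter = sum(z * (xt - m) ** 2 for z, xt in zip(zeta, x))
    denominator = 2.0 * beta + tau * (m - mu) ** 2 + scatter
    return numerator / denominator


print(map_precision_update([1.0, 3.0], [1.0, 1.0],
                           m=2.0, mu=2.0, tau=1.0, alpha=1.0, beta=1.0))
```

The `2 * beta` term keeps the denominator positive even when the weighted scatter is zero, so the estimate stays finite when a component sees a single frame.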

Page 20: Dynamic data selection in search (1/4)

Application I : acoustic modeling (10/20)

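The initial estimate can be chosen as the mode of the prior PDF:

$$m_{ikd} = \mu_{ikd}, \qquad r_{ikd} = \frac{\alpha_{ikd} - 1/2}{\beta_{ikd}}$$

It can also be chosen as the mean of the prior PDF:

$$m_{ikd} = \mu_{ikd}, \qquad r_{ikd} = \frac{\alpha_{ikd}}{\beta_{ikd}}$$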

Page 21: Dynamic data selection in search (1/4)

Application I : acoustic modeling (11/20)

• Given the true tokens of a phone model => MCE

– Imposter word: the hypothesized word is wrong, and its likelihood exceeds the likelihood given its reference model:

$$P(X \mid W_h) > P(X \mid W_{ref}), \qquad W_h \neq W_{ref}$$

– If we can minimize the total number of imposter words, we can reduce the WER.

Page 22: Dynamic data selection in search (1/4)

Application I : acoustic modeling (12/20)

MCE

• The misclassification distance measure for the wrong word $W$ is defined as

$$d_W = -\,l\big(X_{t_1}^{t_M} \mid \mathrm{ref}(X_{t_1}^{t_M})\big) + l\big(X_{t_1}^{t_M} \mid W\big)$$

• A general form of the “smoothed” count of imposter words is then defined as

$$\ell(d_W) = \frac{1}{1 + \exp(-d_W)}$$
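The smoothed count replaces the hard 0/1 imposter decision with a sigmoid of the misclassification distance. A minimal sketch follows, assuming the distance is the hypothesized-word log-likelihood minus the reference log-likelihood and that no extra slope parameter is used; both function names are hypothetical.

```python
import math


def misclassification_distance(loglik_hyp, loglik_ref):
    """d_W: positive when the wrong word outscores the reference."""
    return loglik_hyp - loglik_ref


def smoothed_imposter_count(d):
    """Sigmoid of d, written to avoid overflow for large |d|."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


d = misclassification_distance(-120.0, -123.5)
print(d, smoothed_imposter_count(d))
```

A word that clearly beats its reference counts as nearly one imposter, a word that clearly loses counts as nearly zero, and the smooth transition in between is what makes the count differentiable.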

Page 23: Dynamic data selection in search (1/4)

Application I : acoustic modeling (13/20)

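MCE

• The so-called GPD algorithm is adopted to minimize the “smoothed” count of imposter words:

$$\lambda_{t+1} = \lambda_t - \epsilon\, \nabla L(\lambda)\big|_{\lambda = \lambda_t}, \quad \text{where } L(\lambda) = \sum_{W} \ell(d_W)$$

with

$$d_W = -\,l\big(X_{t_1}^{t_M} \mid \mathrm{ref}(X_{t_1}^{t_M})\big) + l\big(X_{t_1}^{t_M} \mid W\big)$$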

Page 24: Dynamic data selection in search (1/4)

Application I : acoustic modeling (14/20)

MCE

Given a competing token $X(a) = \{x_1, x_2, \ldots, x_T\}$ in the CT set of phone model $a$. We assume $S = \{s_1, s_2, \ldots, s_T\}$ is its corresponding optimal Viterbi path, and $L = \{l_1, l_2, \ldots, l_T\}$ is its optimal mixture component label sequence.

The HMM parameters have to satisfy certain constraints, such as the positive definiteness of the precision, $r_{ikd} > 0$. The gradient is therefore taken with respect to the transformed parameters, where the transformations are

$$r_{ikd} \rightarrow \tilde r_{ikd}: \quad \tilde r_{ikd} = \log r_{ikd}$$
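The point of the log transform is that a gradient step taken on $\tilde r = \log r$ can never produce a non-positive precision. A small sketch of one such step is shown below; the function name and step size are hypothetical, and the gradient is assumed to be supplied with respect to $\log r$.

```python
import math


def update_precision(r, grad_wrt_log_r, step_size):
    """One descent step on log(r), mapped back to r; the result is always > 0."""
    r_tilde = math.log(r)                     # transform to the unconstrained domain
    r_tilde -= step_size * grad_wrt_log_r     # plain gradient step
    return math.exp(r_tilde)                  # map back: exp() is strictly positive


print(update_precision(2.0, grad_wrt_log_r=100.0, step_size=0.5))
```

Even with a very large gradient, the returned precision only gets very small, whereas the same step applied directly to `r` would have made it negative.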

Page 25: Dynamic data selection in search (1/4)

Application I : acoustic modeling (15/20)

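For a parameter $\theta_{ik}$ of mixture $k$ in state $i$, the chain rule gives

$$\frac{\partial \ell(d_w(X))}{\partial \theta_{ik}} = \frac{\partial \ell(d_w)}{\partial d_w} \cdot \frac{\partial d_w(X)}{\partial \theta_{ik}}$$

with

$$\frac{\partial \ell(d_w)}{\partial d_w} = \frac{\exp(-d_w)}{(1 + \exp(-d_w))^2} = \frac{1}{1 + \exp(-d_w)} \cdot \frac{\exp(-d_w)}{1 + \exp(-d_w)} = \ell(d_w)\,\big(1 - \ell(d_w)\big)$$

Along the Viterbi paths $q$ (hypothesized model) and $q^r$ (reference model):

$$\frac{\partial \ell(d_w(X))}{\partial \theta_{ik}} = \ell(d_w)\,\big(1 - \ell(d_w)\big)\, \frac{\partial}{\partial \theta_{ik}} \sum_{t=1}^{T} \Big[ \log\big(a_{q_{t-1} q_t}\, b_{q_t}(x_t)\big) - \log\big(a^{r}_{q^r_{t-1} q^r_t}\, b^{r}_{q^r_t}(x_t)\big) \Big]$$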

Page 26: Dynamic data selection in search (1/4)

Application I : acoustic modeling (16/20)

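Reference model, gradient with respect to the mean $m_{ikd}$:

$$\frac{\partial \ell(d_w(X))}{\partial m_{ikd}} = -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{\partial \log b_{s_t}(x_t)}{\partial m_{ikd}}\, \delta(s_t - i)$$

$$= -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{c_{ik}\, N(x_t \mid m_{ik}, r_{ik})}{b_i(x_t)}\, r_{ikd}\,(x_{td} - m_{ikd})\, \delta(s_t - i)$$

$$\approx -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} r_{ikd}\,(x_{td} - m_{ikd})\, \delta(s_t - i)\, \delta(l_t - k)$$

where $c_{ik}$ is the mixture weight, $N(x_t \mid m_{ik}, r_{ik}) \propto \prod_{d=1}^{D} r_{ikd}^{1/2} e^{-\frac{1}{2} r_{ikd}(x_{td} - m_{ikd})^2}$, and the last step keeps only the optimal mixture label $l_t$.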

Page 27: Dynamic data selection in search (1/4)

Application I : acoustic modeling (17/20)

Reference model, gradient with respect to the transformed precision $\tilde r_{ikd} = \log r_{ikd}$:

$$\frac{\partial \ell(d_w(X))}{\partial \tilde r_{ikd}} = -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{\partial \log b_{s_t}(x_t)}{\partial \log r_{ikd}}\, \delta(s_t - i)$$

$$= -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{c_{ik}\, N(x_t \mid m_{ik}, r_{ik})}{b_i(x_t)}\, r_{ikd} \left[ \frac{1}{2 r_{ikd}} - \frac{1}{2}(x_{td} - m_{ikd})^2 \right] \delta(s_t - i)$$

$$\approx -\,\ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{1}{2} \Big[ 1 - r_{ikd}\,(x_{td} - m_{ikd})^2 \Big]\, \delta(s_t - i)\, \delta(l_t - k)$$

using $\dfrac{\partial}{\partial \log r_{ikd}} = r_{ikd} \dfrac{\partial}{\partial r_{ikd}}$.

Page 28: Dynamic data selection in search (1/4)

Application I : acoustic modeling (18/20)

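For model $a$:

$$\frac{\partial \ell(d_w(X))}{\partial \tilde r_{ikd}} = \ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} \frac{1}{2} \Big[ 1 - r_{ikd}\,(x_{td} - m_{ikd})^2 \Big]\, \delta(s_t - i)\, \delta(l_t - k), \qquad \tilde r_{ikd} = \log r_{ikd}$$

$$\frac{\partial \ell(d_w(X))}{\partial m_{ikd}} = \ell(d_w)\,\big(1 - \ell(d_w)\big) \sum_{t=1}^{T} r_{ikd}\,(x_{td} - m_{ikd})\, \delta(s_t - i)\, \delta(l_t - k)$$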
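The inner sums of these gradients are the derivatives of a Gaussian segment log-likelihood with respect to the mean and the log-precision. The sketch below checks the two closed forms against central finite differences for a one-dimensional, single-Gaussian case; it leaves out the $\ell(d_w)(1 - \ell(d_w))$ factor and the state and mixture indicators, and all names and test values are made up for the illustration.

```python
import math


def segment_loglik(xs, m, log_r):
    """Sum of log N(x | m, r) over frames, parameterized by log precision."""
    r = math.exp(log_r)
    return sum(0.5 * math.log(r / (2.0 * math.pi)) - 0.5 * r * (x - m) ** 2
               for x in xs)


def analytic_grads(xs, m, log_r):
    """Closed forms: sum r (x - m) and sum 0.5 (1 - r (x - m)^2)."""
    r = math.exp(log_r)
    grad_m = sum(r * (x - m) for x in xs)
    grad_log_r = sum(0.5 * (1.0 - r * (x - m) ** 2) for x in xs)
    return grad_m, grad_log_r


def numeric_grads(xs, m, log_r, h=1e-6):
    """Central finite differences of the same log-likelihood."""
    grad_m = (segment_loglik(xs, m + h, log_r)
              - segment_loglik(xs, m - h, log_r)) / (2.0 * h)
    grad_log_r = (segment_loglik(xs, m, log_r + h)
                  - segment_loglik(xs, m, log_r - h)) / (2.0 * h)
    return grad_m, grad_log_r


xs = [0.2, 1.1, -0.4]
print(analytic_grads(xs, m=0.3, log_r=0.5))
print(numeric_grads(xs, m=0.3, log_r=0.5))
```

The two printed pairs should agree to several decimal places, which is a quick sanity check before such gradients are plugged into a GPD update.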