
Page 1: Confidence Measure using Word Graphs

Sridhar Raghavan

[Figure: a word graph for the utterance "This is a test sentence". The nodes are word boundary times and the links are word hypotheses (Sil, This/this, is, a/the, test/guest/quest, sentence/sense, Sil), each labeled with a likelihood between 1/6 and 5/6.]

Page 2: Abstract

Confidence measure using word posteriors:

• There is a strong need to determine the confidence of each word hypothesis in an LVCSR system: conventional Viterbi decoding generates only the overall one-best sequence, while the performance of a speech recognition system is measured by word error rate, not sentence error rate.

• The posterior probability of a word in a hypothesis is a good estimate of its confidence.

• The word posteriors can be computed from a word graph in which the links correspond to words.

• A forward-backward type algorithm is used to compute the link posteriors.

Page 3: What is a word posterior?

A word posterior is a probability computed from a word's acoustic score, its language model score, and its presence in particular paths through the word graph.

An example of a word graph is given below. Note that the nodes mark word start/end times and the links are words. The goal is to determine the link posterior probabilities. Every link carries an acoustic score and a language model probability.

[Figure: the word graph for "This is a test sentence" from page 1, with a likelihood on every link.]
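To make this structure concrete, here is a minimal sketch of one possible representation in Python. The Link class and its field names are our own illustration, not part of the original system; the four links listed are just the fragment of the graph used in the hand calculation on page 7, and the word labels on them are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Link:
    """One edge of the word graph: a word hypothesis between two nodes."""
    start: int         # start node (a word start time)
    end: int           # end node (a word end time)
    word: str          # the hypothesized word
    likelihood: float  # acoustic likelihood on the link, e.g. 3/6
    lm_prob: float     # language model probability, e.g. 1/100 for a 100-word loop grammar

# The fragment of the example graph used in the hand calculation on
# page 7 (word labels here are illustrative):
links = [
    Link(1, 2, "Sil",  3/6, 0.01),
    Link(1, 3, "this", 3/6, 0.01),
    Link(2, 3, "This", 3/6, 0.01),
    Link(3, 4, "is",   2/6, 0.01),
]
```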

Page 4: Example

Let us consider an example as shown below:

[Figure: the same word graph as on page 3, repeated.]

The values on the links are the likelihoods.

Page 5: Forward-backward algorithm

We use a forward-backward algorithm to determine the link probabilities.

The equations used to compute the alphas and betas are as follows:

Computing alphas:

Step 1: Initialization: In a conventional HMM forward-backward algorithm we would perform the following –

$\alpha_1(i) = \pi_i \, b_i(X_1), \qquad 1 \le i \le N$

where $\pi_i$ is the initial probability of state $i$, and $b_i(X_1)$ is the emission probability of the observation $X_1$ given we are in state $i$.

We need a slightly modified version of the above equation for processing a word graph. The emission probability becomes the language model probability, and the initial probability is taken here as 0.01 (assuming we have 100 words in a loop grammar, so all words are equally probable with probability 1/100).

Page 6: Forward-backward algorithm continued

The α for the first node in the word graph is computed as follows:

$\alpha_1 = 0.01 \times 0.01 = 1 \times 10^{-4}$

Step 2: Induction

$\alpha_t(j) = \Big[ \sum_{i=1}^{N} \alpha_{t-1}(i) \, a_{ij} \Big] b_j(X_t), \qquad 2 \le t \le T, \; 1 \le j \le N$

where $a_{ij}$ is the transition probability (for word graphs, the likelihood, i.e. the acoustic score) and $b_j(X_t)$ is the emission probability of the observation $X_t$ (for word graphs, the language model probability).

This step is the main reason we use the forward-backward algorithm for computing such probabilities. The alpha values computed in the previous step are used to compute the alphas of the succeeding nodes.

Note: unlike in HMMs, where we move from left to right at fixed time intervals, here we move from one word's start time to the next closest word's start time.
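A minimal sketch of this modified forward pass, assuming the Link representation from page 3 and that node numbers follow topological (start-time) order:

```python
from collections import defaultdict

def forward(links, start_node=1, p_init=0.01, p_lm=0.01):
    """Forward pass over a word graph.

    alpha(j) = sum over links (i -> j) of alpha(i) * likelihood * lm_prob,
    i.e. the transition probability is the acoustic likelihood and the
    emission probability is the language model probability.
    """
    incoming = defaultdict(list)
    for link in links:
        incoming[link.end].append(link)

    # Initialization (above): alpha_1 = p_init * p_lm = 0.01 * 0.01 = 1e-4
    alpha = {start_node: p_init * p_lm}
    for node in sorted(incoming):  # topological order by node number
        alpha[node] = sum(alpha[link.start] * link.likelihood * link.lm_prob
                          for link in incoming[node])
    return alpha
```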

Page 7: Forward-backward algorithm continued

Let us see the computation of the alphas from node 2 onward; the alpha for node 1 was computed in the previous step during initialization.

Node 2:

$\alpha_2 = 1 \times 10^{-4} \times (3/6) \times 0.01 = 5 \times 10^{-7}$

Node 3:

$\alpha_3 = \big(1 \times 10^{-4} \times (3/6) \times 0.01\big) + \big(5 \times 10^{-7} \times (3/6) \times 0.01\big) = 5.025 \times 10^{-7}$

Node 4:

$\alpha_4 = 5.025 \times 10^{-7} \times (2/6) \times 0.01 = 1.675 \times 10^{-9}$

The alpha calculation continues in this manner for all the remaining nodes.
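As a quick check, the forward() sketch from the previous page reproduces these values on the four-link fragment from page 3:

```python
alpha = forward(links)   # the example Link list from page 3
print(alpha[2])          # 5e-07
print(alpha[3])          # 5.025e-07
print(alpha[4])          # 1.675e-09 (within floating-point rounding)
```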

Page 8: Forward-backward algorithm continued

Once we have computed the alphas using the forward algorithm, we begin the beta computation using the backward algorithm.

The backward algorithm is similar to the forward algorithm, but we start from the last node and proceed from right to left.

Step 1: Initialization

$\beta_T(i) = 1/N, \qquad 1 \le i \le N$

The $N$ at the final instant is usually 1, hence the initial beta value at the final node is 1. This is the same initial value used in both of our ASR systems.

Step 2: Induction

$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(X_{t+1}) \, \beta_{t+1}(j), \qquad t = T-1, \dots, 1, \; 1 \le i \le N$

where $a_{ij}$ is the likelihood (acoustic) score, $b_j(X_{t+1})$ is the language model probability, and $\beta_{t+1}(j)$ are the beta values of the nodes just processed, i.e. those to the right of the current node.
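A matching sketch of the backward pass, under the same assumptions as the forward() sketch on page 6:

```python
from collections import defaultdict

def backward(links, final_node):
    """Backward pass: beta(final) = 1, then, moving right to left,
    beta(i) = sum over links (i -> j) of likelihood * lm_prob * beta(j)."""
    outgoing = defaultdict(list)
    for link in links:
        outgoing[link.start].append(link)

    beta = {final_node: 1.0}
    for node in sorted(outgoing, reverse=True):  # reverse topological order
        beta[node] = sum(link.likelihood * link.lm_prob * beta[link.end]
                         for link in outgoing[node])
    return beta
```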

Page 9: Forward-backward algorithm continued

Let us see the computation of the beta values from node 14 backwards.

Node 14:

$\beta_{14} = (1/6) \times 0.01 \times 1 = 1.66 \times 10^{-3}$

Node 13:

$\beta_{13} = (5/6) \times 0.01 \times 1 = 8.33 \times 10^{-3}$

Node 12:

$\beta_{12} = (4/6) \times 0.01 \times 8.33 \times 10^{-3} = 5.555 \times 10^{-5}$

Page 10: Forward-backward algorithm continued

Node 11:

$\beta_{11} = \big((1/6) \times 0.01 \times 1.667 \times 10^{-3}\big) + \big((1/6) \times 0.01 \times 8.33 \times 10^{-3}\big) = 1.666 \times 10^{-5}$

In a similar manner we obtain the beta values for all the nodes down to node 1.

We can now compute the probability of each link (between two nodes) as follows. Let us call this link probability Γ. Then Γ(t-1, t) is computed as the product α(t-1) · β(t). These values give the un-normalized posterior probability of the word on the link, taking into account all paths through the link.
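With the forward() and backward() sketches above, Γ is then one line per link:

```python
def link_posteriors(links, alpha, beta):
    """Un-normalized link posterior: Gamma(u, v) = alpha(u) * beta(v),
    the mass of all paths through the link, as defined above."""
    return {(link.start, link.end, link.word): alpha[link.start] * beta[link.end]
            for link in links}
```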

Page 11: Word graph showing the computed alphas and betas

[Figure: the word graph (nodes 1-15) with each node annotated with its alpha and beta values. The annotated pairs are:
α = 1E-04, β = 2.8843E-16; α = 5E-07, β = 2.87E-16; α = 5.025E-07, β = 5.740E-14; α = 1.117E-11, β = 2.514E-9; α = 1.675E-09, β = 1.5422E-13; α = 3.35E-9, β = 8.534E-12; α = 1.675E-11, β = 4.626E-11; α = 2.79E-14, β = 2.776E-8; α = 1.861E-14, β = 2.776E-8; α = 7.446E-14, β = 3.703E-7; α = 7.75E-17, β = 1.666E-5; α = 4.964E-16, β = 5.555E-5; α = 3.438E-18, β = 8.33E-3; α = 1.2923E-19, β = 1.667E-3; α = 2.886E-20, β = 1.]

The assumption here is that the probability of occurrence of any word is 0.01, i.e., we have 100 words in a loop grammar.

This is the word graph with every node annotated with its corresponding alpha and beta values.

Page 12: Link probabilities calculated from the alphas and betas

The following word graph shows the links with their corresponding link posterior probabilities (not yet normalized).

[Figure: the word graph with each link annotated with its un-normalized posterior. The annotated values are: Γ = 4.649E-19, 5.74E-18, 2.87E-20, 4.288E-18, 7.749E-20, 7.749E-20, 1.549E-19, 8.421E-18, 4.649E-19, 3.1E-19, 4.136E-18, 3.1E-19, 4.136E-18, 4.136E-18, 6.46E-19, 1.292E-19, 1.292E-19, 3.438E-18, 2.87E-20.]

By choosing the links with the maximum posterior probability, we ensure that the most probable words are included in the final sequence.
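A small sketch of that selection, assuming the link_posteriors() output above and the simplification that competing words share the same start and end nodes (in a real lattice, competing links may only overlap in time):

```python
def best_links(gamma):
    """For each (start, end) node pair, keep the competing word
    with the largest un-normalized posterior."""
    best = {}
    for (start, end, word), g in gamma.items():
        if (start, end) not in best or g > best[(start, end)][1]:
            best[(start, end)] = (word, g)
    return best
```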

Page 13: Using the algorithm in a real application

* We need to perform word spotting without a language model, i.e., we can only use a loop grammar.

* To spot the word of interest, we construct a loop grammar with just this one word.

* The final one-best hypothesis will then consist of a sequence of the same word repeated N times. The challenge is to determine which of these N words actually corresponds to the word of interest.

* This is achieved by computing the link posterior probabilities and selecting the one with the maximum value.

Page 14: 1-best output from the word spotter

The recognizer produces the following output:

0000 0023 !SENT_START -1433.434204

0023 0081 BIG -4029.476440

0081 0176 BIG -6402.677246

0176 0237 BIG -4080.437500

0237 0266 !SENT_END -1861.777344

We have to determine which of the three instances of the word is a true occurrence.
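Each output line is start frame, end frame, word, and acoustic log-likelihood. A small parsing sketch under that assumption:

```python
output = """\
0000 0023 !SENT_START -1433.434204
0023 0081 BIG -4029.476440
0081 0176 BIG -6402.677246
0176 0237 BIG -4080.437500
0237 0266 !SENT_END -1861.777344"""

hyps = []
for line in output.splitlines():
    start, end, word, loglik = line.split()
    hyps.append((int(start), int(end), word, float(loglik)))

candidates = [h for h in hyps if h[2] == "BIG"]  # the three instances to rank
```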

Page 15: Lattice from one of the utterances

[Figure: the output lattice, nodes 0-8, from sent_start to sent_end. Every word link is "BIG"; the links carry acoustic log-likelihoods -1433, -1095, -1888, -2875, -4029, -912, -1070, -1232, -6402, -4056, and -1861.]

For this example we have to spot the word "BIG" in an utterance that consists of three words ("BIG TIED GOD"). All the links in the output lattice contain the word "BIG". The values on the links are acoustic likelihoods in the log domain, so the forward-backward computation just involves adding these numbers in a systematic manner.
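A log-domain sketch of the forward pass for such a lattice: products become sums, and where alternative links merge at a node they are combined with a log-add (here numpy's logaddexp); with a single incoming link this reduces to the plain addition described above. The (start, end, log-likelihood) tuple format is our own assumption.

```python
import numpy as np
from collections import defaultdict

def forward_log(log_links, start_node=0):
    """Forward pass in the log domain.

    log_links: list of (start_node, end_node, log_likelihood) tuples.
    The loop-grammar LM probability is 1, so its log (0) is omitted,
    and the initial log probability is taken as 0 (i.e. probability 1).
    """
    incoming = defaultdict(list)
    for u, v, loglik in log_links:
        incoming[v].append((u, loglik))

    log_alpha = {start_node: 0.0}
    for node in sorted(incoming):
        terms = [log_alpha[u] + loglik for u, loglik in incoming[node]]
        log_alpha[node] = np.logaddexp.reduce(terms)
    return log_alpha
```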

Page 16: Alphas and betas for the lattice

[Figure: the same lattice, with each node annotated with its alpha and beta (log domain), listed here as a table:]

Node   α        β
0      0        -67344
1      -1433    -65911
2      -2528    -15533
3      -6761    -14621
4      -12139   -13551
5      -18833   -12319
6      -25235   -5917
7      -29291   -1861
8      -31152   0

Let the initial probability at both terminal nodes in this case be 1, so its logarithmic value is 0. The initial value can be any constant, as it will not change the net result. The language model probability of the word is also 1, since it is the only word in the loop grammar.

Page 17: Link posterior calculation

[Figure: the lattice with each link annotated with its (un-normalized) log link posterior. The annotated values are: Γ = -67344, -18061, -18061, -17942, -17859, -17781, -21382, -25690, -31152, -31152, -31152.]

It is observed that we obtain greater discrimination in confidence levels if, besides the corresponding alphas and betas, we also multiply the final probability by the likelihood of the link itself. In this example we add the likelihood, since we are in the log domain.
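In the log domain this variant is a three-term sum per link. A sketch, checked against the values from the previous two pages:

```python
def link_posterior_log(log_alpha, log_beta, u, v, loglik):
    """Log link posterior including the link's own log-likelihood:
    Gamma(u, v) = alpha(u) + loglik(u, v) + beta(v)."""
    return log_alpha[u] + loglik + log_beta[v]

# Example from the figures: the link from node 1 to node 2, with
# alpha(1) = -1433, link log-likelihood -1095, beta(2) = -15533.
print(link_posterior_log({1: -1433}, {2: -15533}, 1, 2, -1095))  # -18061
```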

Page 18: Inference from the link posteriors

The link from node 1 to node 5 corresponds to the first word instance, while the links from 5 to 6 and from 6 to 7 correspond to the second and third word instances respectively. It is very clear from the link posterior values that the first instance of the word "BIG" has a much higher probability than the other two.

Note: the part that is missing in this presentation is the normalization of these probabilities, which is needed to compare link posteriors with one another.
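One standard way to perform this normalization (a sketch of the general idea, not necessarily what was done in the original work) is to divide every Γ by the total probability of the lattice, which in the log domain means subtracting the α of the final node; the posteriors of the links crossing any given time point then sum to one.

```python
def normalize_log(log_gammas, log_total):
    """Normalize log link posteriors by the total log probability of
    the lattice (the log alpha of the final node)."""
    return {link: g - log_total for link, g in log_gammas.items()}
```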

Page 19: References

• F. Wessel, R. Schlüter, K. Macherey, and H. Ney, "Confidence Measures for Large Vocabulary Continuous Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 288-298, March 2001.

• F. Wessel, K. Macherey, and R. Schlüter, "Using Word Probabilities as Confidence Measures," in Proc. ICASSP 1998.

• G. Evermann and P. C. Woodland, "Large Vocabulary Decoding and Confidence Estimation using Word Posterior Probabilities," in Proc. ICASSP 2000, pp. 2366-2369, Istanbul.