[Paper introduction] Efficient Lattice Rescoring using Recurrent Neural Network Language Models
TRANSCRIPT
Efficient Lattice Rescoring using Recurrent Neural Network Language Models. X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland. Proc. of ICASSP 2014.
Introduced by Makoto Morishita, 2016/02/25, MT Study Group
What is a Language Model
• Language models assign a probability to each sentence.
W1 = speech recognition system
W2 = speech cognition system
W3 = speck podcast histamine
P(W1) = 4.021 × 10^-3 ← Best!
P(W2) = 8.932 × 10^-4
P(W3) = 2.432 × 10^-7
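(For reference, not on the slides: a language model factorizes the sentence probability with the chain rule.)

    P(W) = P(w_1) · P(w_2 | w_1) · … · P(w_N | w_1, …, w_{N-1})

e.g. P(speech recognition system) = P(speech) · P(recognition | speech) · P(system | speech, recognition)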
In this paper…
• The authors propose two new methods to efficiently re-score speech recognition lattices.
[Figure: an example speech recognition lattice; nodes 0–9 with competing word arcs such as "hi / high / hy", "this", "is", "my", "mobile", "phone / phones".]
Language Models
n-gram back-off model
• Use the previous n-1 words to estimate the next word's probability.
[Figure: the sentence "This is my mobile phone" with word positions 1–5 marked; if the model is a bi-gram, only the word directly before the predicted one is used.]
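(For reference, the standard Katz-style back-off recursion for a bi-gram model; P* is a discounted estimate and α(w_{i-1}) the back-off weight. This notation is mine, not from the slides.)

    P(w_i | w_{i-1}) = P*(w_i | w_{i-1})      if count(w_{i-1}, w_i) > 0
                     = α(w_{i-1}) · P(w_i)    otherwise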
Feedforward neural network language model
• Use the previous n-1 words as input to a feedforward neural network. [Y. Bengio et al., 2003]
[Figure: feedforward NNLM architecture; image from http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html]
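(A minimal runnable sketch of the feedforward NNLM forward pass; the sizes V, d, h and all variable names are my assumptions, not from the slides.)

    import numpy as np

    V, d, h = 10000, 100, 200                   # vocab, embedding, hidden sizes (assumed)
    n = 3                                       # tri-gram: predict from the previous 2 words
    C = np.random.randn(V, d) * 0.01            # word embedding table
    W = np.random.randn((n - 1) * d, h) * 0.01  # embeddings -> hidden
    U = np.random.randn(h, V) * 0.01            # hidden -> output

    def ffnn_lm_step(context_ids):
        """P(w_i | previous n-1 words) with a feedforward NNLM."""
        x = np.concatenate([C[i] for i in context_ids])  # concatenated embeddings
        hidden = np.tanh(x @ W)                          # non-linear hidden layer
        logits = hidden @ U
        e = np.exp(logits - logits.max())                # stable softmax over the vocab
        return e / e.sum()

    p = ffnn_lm_step([42, 7])   # distribution P(w_i | w_{i-2}=42, w_{i-1}=7)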
Recurrent neural network language model
• Use the full history context via a recurrent neural network. [T. Mikolov et al., 2010]
[Figure: RNNLM architecture. The current word w_{i-1} (1-of-k coded, e.g. 0…010…0) and the previous hidden state s_{i-2} feed a sigmoid hidden layer that produces the new state s_{i-1}; a softmax output layer gives P(w_i | w_{i-1}, s_{i-2}).]
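(A minimal sketch of one RNNLM step matching the figure; weight names and sizes are assumptions, not the paper's code.)

    import numpy as np

    V, h = 10000, 200                       # vocab and hidden sizes (assumed)
    W_in = np.random.randn(V, h) * 0.01     # input word -> hidden
    W_rec = np.random.randn(h, h) * 0.01    # previous state -> hidden
    W_out = np.random.randn(h, V) * 0.01    # hidden -> output

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def rnnlm_step(word_id, s_prev):
        """One step: build s_{i-1} from w_{i-1} and s_{i-2}; return P(w_i | w_{i-1}, s_{i-2}) and s_{i-1}."""
        s = sigmoid(W_in[word_id] + s_prev @ W_rec)   # sigmoid hidden layer
        logits = s @ W_out
        e = np.exp(logits - logits.max())             # softmax output layer
        return e / e.sum(), s

    p, s = rnnlm_step(42, np.zeros(h))   # first word after the sentence start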
Language Model States
LM states
• To use an LM for the re-scoring task, we need to store LM states so that hypotheses can be scored efficiently.
bi-gram
[Figure: an SR (speech recognition) lattice with nodes 0–3 and word arcs a–e, and its bi-gram LM-state expansion: 0<s>; 1a, 1b; 2c, 2d; 3e. With a bi-gram LM, each state only needs to remember the last word.]
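(A sketch of how lattice nodes expand into LM states keyed by the last n-1 words, using the lattice from these slides; helper names are mine.)

    def expand_lm_states(arcs, n):
        """Expand lattice nodes into LM states keyed by the last n-1 words.
        arcs: (from_node, to_node, word) triples, topologically sorted; node 0 is the start."""
        states = {(0, ('<s>',))}                          # (lattice node, truncated history)
        for src, dst, word in arcs:
            for node, hist in list(states):
                if node == src:
                    new_hist = (hist + (word,))[-(n - 1):]   # keep only the last n-1 words
                    states.add((dst, new_hist))
        return states

    # The lattice from the slides: nodes 0-3, arcs a-e.
    arcs = [(0, 1, 'a'), (0, 1, 'b'), (1, 2, 'c'), (1, 2, 'd'), (2, 3, 'e')]
    print(sorted(expand_lm_states(arcs, 2)))   # bi-gram:  one state per (node, last word)
    print(sorted(expand_lm_states(arcs, 3)))   # tri-gram: node 2 splits into four states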
tri-gram
[Figure: the same SR lattice expanded with tri-gram LM states; each state must now remember the last two words (<s>,a; <s>,b; a,c; a,d; b,c; b,d; c,e; d,e), so nodes split into more states.]
States become larger!
Difference
• n-gram back-off model & feedforward NNLM: use only a fixed n-gram context.
• Recurrent NNLM: uses the whole word history, so LM states grow rapidly and the computational cost becomes high.
We want to reduce the number of recurrent NNLM states.
Hypothesis
Context information gradually diminishes
• We don’t have to distinguish all of the histories.
• e.g. "I am presenting the paper about RNNLM." ≒ "We are presenting the paper about RNNLM."
Similar histories make similar vectors
• We don’t have to distinguish all of the histories.
• e.g. "I am presenting the paper about RNNLM." ≒ "I am introducing the paper about RNNLM."
Proposed Method
n-gram based history clustering
• "I am presenting the paper about RNNLM." ≒ "We are presenting the paper about RNNLM."
• If the last n-gram of two histories is the same, we use the same history vector (see the sketch below).
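(A minimal sketch of the idea with my own hypothetical helper names, not the paper's code: paths whose truncated histories match share the vector cached for the first such path.)

    def shared_history_vector(full_history, s_full, cache, n=3):
        """n-gram based history clustering: paths whose last n-1 words match
        reuse the history vector cached for the first of them."""
        key = tuple(full_history[-(n - 1):])   # truncate the history to n-1 words
        if key not in cache:
            cache[key] = s_full                # first full-history vector is reused later
        return cache[key]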
History vector based clustering
• "I am presenting the paper about RNNLM." ≒ "I am introducing the paper about RNNLM."
• If the history vector is similar enough to an existing vector, we use that existing history vector (see the sketch below).
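(Again a sketch with assumed names; the Euclidean distance and the beam value are illustrative, not the paper's exact settings.)

    import numpy as np

    def shared_cluster_vector(s_new, clusters, beam=0.1):
        """History vector based clustering: if the new history vector lies within
        'beam' of an existing one, reuse that vector instead of adding a state."""
        for s in clusters:
            if np.linalg.norm(s_new - s) < beam:   # similar history vectors merge
                return s
        clusters.append(s_new)                     # otherwise keep it as a new state
        return s_new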
Experiments
Experimental results
[Table shown as a figure: Baseline 4-gram back-off LM, Feedforward NNLM, RNNLM 10k-best re-ranking, RNNLM n-gram based history clustering, RNNLM history vector based clustering. Highlighted findings across the result slides:]
• Comparable WER and a 70% reduction in lattice size.
• Same WER and a 45% reduction in lattice size.
• Same WER and a 7% reduction in lattice size.
• Comparable WER and a 72.4% reduction in lattice size.
Conclusion
• The proposed methods achieve WER comparable to 10k-best re-ranking, with over 70% compression in lattice size.
• A smaller lattice makes the computational cost smaller!
References
• "This is also Deep Learning in a sense: the Recurrent Neural Network Language Model" [MLAC2013, Day 9] http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
Prefix tree structuring