
Page 1: Sequence Learning with CTC technique

Substitute military service member Wang Chun-hao (王俊豪), [email protected]

Sequence Learning: from a character-based acoustic model to an end-to-end ASR system

Page 2: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch2 p9

A stream of input data = a sequence of acoustic features = a list of MFCC vectors (what we use here).

Diagram: input stream → Extract Feature → Learning Algorithm → output text ("The Sound of ..."), with an error signal fed back to the learner.

Sequence Learning
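
For concreteness, here is a minimal sketch of the feature-extraction step in Python, assuming the librosa library and a hypothetical file utterance.wav; the slides only say that MFCCs are used, not how they are computed.

    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)      # waveform samples
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
    features = mfcc.T                                    # one 13-dim feature vector per frame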

Page 3: Sequence Learning with CTC technique

Outline
1. Feedforward Neural Network
2. Recurrent Neural Network
3. Bidirectional Recurrent Neural Network
4. Long Short-Term Memory
5. Connectionist Temporal Classification
6. Build an End-to-End System
7. Experiment on a Low-resource Language

Page 4: Sequence Learning with CTC technique

Diagram: each unit computes a weighted sum of its inputs followed by an activation function.

Feedforward Neural Network
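
A minimal numpy sketch of one feedforward layer as in the diagram: a weighted sum followed by an activation function. The names (W, b), the layer sizes, and the choice of tanh are illustrative, not taken from the slides.

    import numpy as np

    def feedforward_layer(x, W, b):
        a = W @ x + b          # weighted sum of the inputs
        return np.tanh(a)      # activation function (tanh chosen for illustration)

    x = np.random.randn(13)            # e.g. one MFCC frame
    W = 0.1 * np.random.randn(64, 13)  # weights
    b = np.zeros(64)                   # biases
    h = feedforward_layer(x, W, b)     # hidden representation for this frame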

Page 5: Sequence Learning with CTC technique
Page 6: Sequence Learning with CTC technique

Diagram: the same weighted sum and activation function, now with a recurrent connection from the previous timestep.

Recurrent Neural Network
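
The recurrent version, sketched the same way: the hidden state at time t is a weighted sum of the current input and the previous hidden state, passed through the activation function. The parameter names and tanh are again assumptions for illustration.

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, b):
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)   # recurrent weighted sum + activation

    def rnn_forward(xs, W_xh, W_hh, b):
        h = np.zeros(W_hh.shape[0])
        states = []
        for x_t in xs:                    # one step per acoustic frame
            h = rnn_step(x_t, h, W_xh, W_hh, b)
            states.append(h)
        return np.stack(states)           # shape (T, hidden_dim)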

Page 7: Sequence Learning with CTC technique
Page 8: Sequence Learning with CTC technique

Diagram: weighted sums and activation functions in two recurrent layers, one running forward and one backward over the sequence.

Bidirectional RNN
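
A bidirectional layer, sketched under the same assumptions, runs one recurrent pass left-to-right and another right-to-left and concatenates the two hidden states at every timestep (this reuses rnn_forward from the previous sketch).

    import numpy as np

    def birnn_forward(xs, fwd_params, bwd_params):
        h_fwd = rnn_forward(xs, *fwd_params)               # forward in time
        h_bwd = rnn_forward(xs[::-1], *bwd_params)[::-1]   # backward in time, re-aligned
        return np.concatenate([h_fwd, h_bwd], axis=1)      # shape (T, 2 * hidden_dim)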

Page 9: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch3 p24

Page 10: Sequence Learning with CTC technique

figure from A. Graves, Supervised sequence labelling, ch3 p23

Page 11: Sequence Learning with CTC technique

From RNNs to LSTM

Diagram: an LSTM memory cell replaces the simple RNN unit. Legend: G = gate activation function, W = weighted connection, M = multiplier; the cell has an Input Gate, a Forget Gate, and an Output Gate.

Page 12: Sequence Learning with CTC technique

LSTM Forward Pass, Previous Layer Input


Page 13: Sequence Learning with CTC technique

LSTM Forward Pass, Input Gate


Page 14: Sequence Learning with CTC technique

LSTM Forward Pass, Forget Gate


Page 15: Sequence Learning with CTC technique

LSTM Forward Pass, Cell


Page 16: Sequence Learning with CTC technique

LSTM Forward Pass, Output Gate


Page 17: Sequence Learning with CTC technique

LSTM Forward Pass, Output

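
Putting the steps of the preceding slides together, here is a hedged numpy sketch of one LSTM forward step: input gate, forget gate, cell update, output gate, and block output. The stacked parameter layout (W, U, b) and the omission of peephole connections are simplifications, not the exact formulation in the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        # W, U, b stack the parameters of the input gate (i), forget gate (f),
        # output gate (o) and candidate cell input (g)
        z = W @ x_t + U @ h_prev + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # the three gates (G in the diagram)
        g = np.tanh(g)                                 # squashed cell input
        c = f * c_prev + i * g                         # cell state update (M = multiply)
        h = o * np.tanh(c)                             # gated block output
        return h, c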

Page 18: Sequence Learning with CTC technique


LSTM Backward Pass, Output Gate

Page 19: Sequence Learning with CTC technique

RNN

LSTM

figure from A. Graves, Supervised sequence labelling, p33,35

Page 20: Sequence Learning with CTC technique

Connectionist Temporal Classification

An additional CTC output layer lets the network
(1) predict a label at any timestep, and
(2) compute the probability of a whole sentence.

Page 21: Sequence Learning with CTC technique

Predict a label at every timestep: framewise acoustic features → labels

Label set: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z, {space}, {blank}, and others such as ' " .

Page 22: Sequence Learning with CTC technique

Probability of a Sentence

path:                            "Thh..e S..outtd ooof"
Step 1, remove repeated labels:  "Th.e S.outd of"
Step 2, remove all blanks:       "The Soutd of"  (the predicted labelling, compared against the target)

Many paths map to the same labelling. Now we can do maximum-likelihood training.
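
The two collapsing steps above are easy to write down; a small sketch, using '.' for the blank symbol as on the slide:

    def collapse(path, blank="."):
        out = []
        prev = None
        for c in path:
            if c != prev:                # step 1: remove repeated labels
                out.append(c)
            prev = c
        return "".join(c for c in out if c != blank)   # step 2: remove all blanks

    print(collapse("Thh..e S..outtd ooof"))   # -> "The Soutd of"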

Page 23: Sequence Learning with CTC technique

Output Decoding

Choose the labelling: best-path decoding assumes the most probable path corresponds to the most probable labelling.

Something can go wrong here: several weaker paths may share one labelling whose total probability outweighs the best path. Prefix search decoding would be better.
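
Best-path decoding itself is just an argmax per timestep followed by the collapse above. A sketch (probs and labels are illustrative names, and this reuses the collapse function from the previous sketch; prefix search, which sums over paths per labelling, is not shown):

    import numpy as np

    def best_path_decode(probs, labels, blank="."):
        # probs: (T, num_labels) softmax outputs; labels: the symbol for each output unit
        best = np.argmax(probs, axis=1)          # most probable label at each timestep
        path = "".join(labels[i] for i in best)
        return collapse(path, blank=blank)       # most probable path -> labelling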

Page 24: Sequence Learning with CTC technique

How do we connect the target labelling to the RNN outputs?

Page 25: Sequence Learning with CTC technique

Sum over path probabilities: forward

Page 26: Sequence Learning with CTC technique

Example with the extended target l' = .h.e.l.l.o. (forward direction):

Case l'_s = '.' (blank): the prefix at s is .h.e.l., at s-1 it is .h.e.l, at s-2 it is .h.e. ; a skip from s-2 to s would be a blank-to-blank transition.
Case l'_s = 'l' (a repeated label): the prefix at s is .h.e.l.l, at s-1 it is .h.e.l., at s-2 it is .h.e.l ; a skip from s-2 to s would connect the same non-blank label.

Transitions are only allowed (1) between a blank and a non-blank label, or (2) between distinct non-blank labels, so in both cases above the skip from s-2 is forbidden (the recursion is written out below).
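
Written out, this gives one standard formulation of the CTC forward recursion over the extended label l', following Graves; the exact notation on the slide may differ:

\alpha_t(s) = y^t_{l'_s} \left( \alpha_{t-1}(s) + \alpha_{t-1}(s-1) \right) \quad \text{if } l'_s = \mathrm{blank} \text{ or } l'_s = l'_{s-2}

\alpha_t(s) = y^t_{l'_s} \left( \alpha_{t-1}(s) + \alpha_{t-1}(s-1) + \alpha_{t-1}(s-2) \right) \quad \text{otherwise}

with initialisation \alpha_1(1) = y^1_{\mathrm{blank}}, \alpha_1(2) = y^1_{l_1}, and \alpha_1(s) = 0 for s > 2.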

Page 27: Sequence Learning with CTC technique

Sum over path probabilities: backward

Page 28: Sequence Learning with CTC technique

The same example in the backward direction:

Case l'_s = '.' (blank): the suffix from s is .l.l.o., from s+1 it is l.l.o., from s+2 it is .l.o. ; a skip to s+2 would be a blank-to-blank transition.
Case l'_s = 'l' (a repeated label): the suffix from s is l.l.o., from s+1 it is .l.o., from s+2 it is l.o. ; a skip to s+2 would connect the same non-blank label.

As before, transitions are only allowed (1) between a blank and a non-blank label, or (2) between distinct non-blank labels, so the skip to s+2 is forbidden in both cases (the recursion is written out below).
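
The matching backward recursion, with the same caveat about notation; here \beta is defined without the y^t factor, as in Graves' book:

\beta_t(s) = \beta_{t+1}(s)\, y^{t+1}_{l'_s} + \beta_{t+1}(s+1)\, y^{t+1}_{l'_{s+1}} \quad \text{if } l'_s = \mathrm{blank} \text{ or } l'_s = l'_{s+2}

\beta_t(s) = \beta_{t+1}(s)\, y^{t+1}_{l'_s} + \beta_{t+1}(s+1)\, y^{t+1}_{l'_{s+1}} + \beta_{t+1}(s+2)\, y^{t+1}_{l'_{s+2}} \quad \text{otherwise}

with \beta_T(|l'|) = \beta_T(|l'|-1) = 1 and \beta_T(s) = 0 for s < |l'|-1.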

Page 29: Sequence Learning with CTC technique

Sum over path probabilities

Target: dog → extended with blanks to .d.o.g. (states s = 1..7)

Diagram: the CTC lattice over the extended label, one column per timestep t = 0..T; the states s = 1..7 correspond to the prefixes ., .d, .d., .d.o, .d.o., .d.o.g, .d.o.g. of the extended label.

Page 30: Sequence Learning with CTC technique

Sum over path probabilities

Backward-forward: the probability of all paths corresponding to l that go through a given symbol at time t (formula below).

Diagram: the same lattice for "dog", with the paths through one symbol at a particular time t' highlighted.
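
In symbols, with the conventions used in the recursions above:

p(l \mid x) = \sum_{s=1}^{|l'|} \alpha_t(s)\,\beta_t(s) \quad \text{for any timestep } t,

where each term \alpha_t(s)\,\beta_t(s) is the total probability of the paths in \mathcal{B}^{-1}(l) that pass through position s of the extended label at time t.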

Page 31: Sequence Learning with CTC technique

Finally, we differentiate with respect to the unnormalised outputs to obtain the error signal (sketched below).
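
For reference, the resulting error signal in Graves' formulation, under the conventions used above; a_k^t are the unnormalised outputs before the softmax, y_k^t the normalised ones, and lab(l, k) the set of positions where label k occurs in the extended label l':

\frac{\partial O^{\mathrm{ML}}}{\partial a_k^t} = y_k^t - \frac{1}{p(l \mid x)} \sum_{s \in \mathrm{lab}(l, k)} \alpha_t(s)\,\beta_t(s)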

Page 32: Sequence Learning with CTC technique

A. Graves. Sequence Transduction with Recurrent Neural Networks

Transcription network + Prediction network

The transcription network sees the feature at time t; the prediction network sees the previous label; combining them gives the label prediction at time t and the probability of the next label.

Page 33: Sequence Learning with CTC technique

A. Graves. Towards End-to-end Speech Recognition with Recurrent Neural Networks ( Google Deepmind )

Additional CTC Output Layer

Additional WER Output Layer

Minimise Objective Function -> Minimise Loss Function

Spectrogram feature

Page 34: Sequence Learning with CTC technique

Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition ( Baidu Research)

Additional CTC Output Layer

Additional WER Output Layer

Spectrogram feature

Language Model Transcription

RNN output: "arther n tickets for the game"
Decoded with the language model (target): "are there any tickets for the game"

Q(L) = log P(L | x) + a * log P_LM(L) + b * word_count(L)
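
A small sketch of how the decoding objective Q(L) above can be scored for one candidate transcription; ctc_logprob, lm_logprob and the weights a, b are placeholder names and values, not taken from the slide or the paper. The candidate with the highest Q(L) among the decoder's hypotheses is kept.

    def rescore(candidate, ctc_logprob, lm_logprob, a=1.0, b=1.0):
        # Q(L) = log P(L|x) + a * log P_LM(L) + b * word_count(L)
        return ctc_logprob + a * lm_logprob + b * len(candidate.split())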

Page 35: Sequence Learning with CTC technique

Low-resource language experiment

Target: ? ka didto sigi ra og tindog
Output: h bai to giy ngtndog

test 0.44, test 0.71

Page 36: Sequence Learning with CTC technique

Reference

1. Bourlard and Morgan (1994). Connectionist Speech Recognition: A Hybrid Approach.
2. A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks.
3. A. Graves. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
4. A. Graves. Sequence Transduction with Recurrent Neural Networks.
5. A. Graves. Towards End-to-End Speech Recognition with Recurrent Neural Networks (Google DeepMind).
6. Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition (Baidu Research).
7. Stanford CTC: https://github.com/amaas/stanford-ctc
8. R. Pascanu. On the difficulty of training recurrent neural networks.
9. L. Besacier. Automatic Speech Recognition for Under-Resourced Languages: A Survey.
10. A. Gibiansky. http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/
11. LSTM mathematical formalism (Mandarin): http://blog.csdn.net/u010754290/article/details/47167979
12. Assumptions of the Markov model: http://jedlik.phy.bme.hu/~gerjanos/HMM/node5.html
13. Story of the ANN-HMM hybrid (Mandarin): http://www.taodocs.com/p-5260781.html
14. Generative vs. discriminative models: http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm

Page 37: Sequence Learning with CTC technique

Memory Networks: A. Graves, Neural Turing Machines

Appendix B. Future Work

figure from http://www.slideshare.net/yuzurukato/neural-turing-machines-43179669

Page 38: Sequence Learning with CTC technique

Appendix A. Generative vs. Discriminative

A generative model learns the joint probability p(x,y). A discriminative model learns the conditional probability p(y|x).

data input : (1,0), (1,0), (2,0), (2,1)

Generative (joint p(x, y)):
        y=0    y=1
x=1     1/2    0
x=2     1/4    1/4

Discriminative (conditional p(y | x)):
        y=0    y=1
x=1     1      0
x=2     1/2    1/2

According to Andrew Y. Ng, On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes: the overall gist is that discriminative models generally outperform generative models in classification tasks.