
Page 1: A short introduction to Neural Machine Translation (nlpcourse.europe.naverlabs.com/slides/06_machine_translation.pdf)

Machine Translation

Marc Dymetman

Centrale-Supélec NLP Course: Lecture 6

26 February 2018

Page 2

Outline

• Machine Translation: The Problem

• Symbolic MT: Rule-Based MT (RBMT)

• Statistical MT: Phrase-Based MT (PBMT)

• MT Evaluation

• Neural MT (NMT)

• Language Modelling with RNNs and LSTMs

• Seq2Seq Models for NMT

• Attention Models

• Advanced NMT models

• Other uses of Seq2Seq models

• Toolkits and Learning Resources

Page 3

Machine Translation: a difficult problem

Page 4

Lexical differences between languages

[Credit: Jurafsky and Martin 2000]

Page 5

Different specificities

[Credit: Jurafsky and Martin 2000]

Page 6

Different specificities

The student was thinking

L’étudiant réfléchissait
L’étudiante réfléchissait

In many cases, the source text is not enough. Only access to the situation helps.

Page 7

Different specificities

The student was thinking

L’étudiant réfléchissait
L’étudiante réfléchissait

QUIZ: In fact, even the second translation is (probably) wrong. Can you spot why?

Page 8

Translation and mental representations

[Credit: Dymetman 1994]

Page 9

Human Translation

[Credit: Dymetman 1994]

Translation by human

Mostly easy for us

English → French
leg → jambe / patte
map / plane → plan
they → ils / elles
his / her → son (also sa)
student → étudiant / étudiante
he walked out → il est sorti
she sailed across the Atlantic → Elle a traversé l’Atlantique à la voile
47 miles per gallon → 6 litres aux cents
… → …

Page 10

Translation and mental representations

[Credit: Dymetman 1994]

Translation by machine

Sometimes difficult or even impossible for machines

(Same English–French examples as on Page 9.)

Page 11

MT progress: some problems are now solved

Quiz: Can you guess the English source?

English → French
? → La rose du taux de chômage
? → Vieillissez par le sexe

Early rule-based MT

Page 12

MT progress: some problems are now solved

English → French
The unemployment rate rose → La rose du taux de chômage
Age by sex → Vieillissez par le sexe

Early rule-based MT

Page 13

MT progress: some problems are now solved

English → French
The unemployment rate rose → La rose du taux de chômage
Age by sex → Vieillissez par le sexe

Early rule-based MT

GT (Google Translate) 2018


Page 15

MT progress: some problems are being solved

GT 2018

DeepL 2018

Page 16

MT progress: some problems will probably be solved “soon”

GT 2018

Promise vs. persuade: a well-known syntactic rule (*)

Cultural conventions about units

(*) Pierre Isabelle et al. 2017: A Challenge Set Approach to Evaluating Machine Translation

Page 17

MT progress: some problems will (probably) take a long time to solve

GT 2018

[Winograd Schemas, see: https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WSCollection.html]

Commonsense reasoning


Page 19

Brief History of Machine Translation

Page 20

[Credit: Chris Manning 2016]

Page 21

[Credit: Ken Heafield 2017]

Page 22

Symbolic MT

aka: Rule-Based MT (RBMT)

Page 23

Rule-Based MT: Syntax plays a large role

[Credit: P. Koehn, Statistical Machine Translation (book), 2010]

Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

Phrase-Structure Grammar

Dependency-Structure Grammar

Page 24

Vauquois Triangle

Page 25

RBMT (Rule-Based MT)

Tom misses Paris → Paris manque à Tom


Page 27

RBMT (Rule-Based MT)

Tom misses Paris → Paris manque à Tom

miss(Tom, Paris)

Page 28

RBMT (Rule-Based MT)

Tom misses Paris → Paris manque à Tom

miss(Tom, Paris) → manquer_à(Paris, Tom)


Page 30

RBMT (Rule-Based MT)

Tom misses Paris → Paris manque à Tom

miss(Tom, Paris) → manquer_à(Paris, Tom)

Pred: wish_for
Agent: Tom
Object: Paris
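The transfer step above can be sketched in a few lines of Python. This is a toy illustration, not the course's system: the tables `TRANSFER_RULES` and `GENERATION_TEMPLATES`, and the idea of encoding a transfer rule as an argument permutation, are hypothetical choices made only to reproduce miss(Tom, Paris) → manquer_à(Paris, Tom) → “Paris manque à Tom”.

```python
from typing import Dict, Tuple

# Hypothetical transfer lexicon: English predicate -> (French predicate,
# permutation of argument positions). miss(X, Y) becomes manquer_à(Y, X).
TRANSFER_RULES: Dict[str, Tuple[str, Tuple[int, ...]]] = {
    "miss": ("manquer_à", (1, 0)),
}

# Hypothetical generation templates for French predicates (3rd person singular).
GENERATION_TEMPLATES: Dict[str, str] = {
    "manquer_à": "{0} manque à {1}",
}


def transfer(pred: str, args: Tuple[str, ...]) -> Tuple[str, Tuple[str, ...]]:
    """Map a source predicate-argument structure to a target one."""
    target_pred, order = TRANSFER_RULES[pred]
    return target_pred, tuple(args[i] for i in order)


def generate(pred: str, args: Tuple[str, ...]) -> str:
    """Realize a target predicate-argument structure as a French sentence."""
    return GENERATION_TEMPLATES[pred].format(*args)


if __name__ == "__main__":
    fr_pred, fr_args = transfer("miss", ("Tom", "Paris"))
    print(f"{fr_pred}({', '.join(fr_args)})")  # manquer_à(Paris, Tom)
    print(generate(fr_pred, fr_args))           # Paris manque à Tom
```

The argument swap is the whole point: the English subject becomes the French indirect object, which a word-for-word substitution cannot produce.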

Page 31

Statistical MT (SMT)

Page 32

Phrase-Based MT (PBMT)

Page 33

Phrase-based SMT

[Credit: Koehn 2006]

Page 34

Bilingual corpora

[Credit: P. Koehn, Statistical Machine Translation (book), 2010]

Page 35

Learning bi-phrases

[Credit: Koehn]

• Word-alignment phase

• Based on EM (Expectation Maximization)

• Tools: GIZA++, …
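GIZA++ implements the IBM word-alignment models, which are trained with EM. The simplest of them, IBM Model 1, already shows how EM learns word-translation probabilities t(f | e) from sentence pairs alone. The sketch below is a simplified Model 1 (no NULL source word, uniform initialization) on a hypothetical three-sentence corpus `TOY_CORPUS`; it illustrates the idea and does not reproduce GIZA++.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy parallel corpus: (English words, French words).
TOY_CORPUS: List[Tuple[List[str], List[str]]] = [
    ("the house".split(), "la maison".split()),
    ("the blue house".split(), "la maison bleue".split()),
    ("the flower".split(), "la fleur".split()),
]


def ibm_model1(corpus: List[Tuple[List[str], List[str]]],
               iterations: int = 20) -> Dict[Tuple[str, str], float]:
    """Estimate t(f | e) with EM (simplified IBM Model 1: no NULL word)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(f, e)
        total: Dict[str, float] = defaultdict(float)               # expected c(e)
        for es, fs in corpus:
            for f in fs:
                # E-step: share the unit count of f among the English words of its sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    t = ibm_model1(TOY_CORPUS)
    for e in ["the", "house", "blue", "flower"]:
        prob, best_f = max((p, f) for (f, e2), p in t.items() if e2 == e)
        print(e, "->", best_f, round(prob, 3))
```

Because “la” co-occurs with “the” in every pair, EM progressively attributes it to “the”, which in turn frees “maison”, “fleur” and “bleue” to align with the remaining English words.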

Page 36

Learning bi-phrases

• Bi-phrase extraction based on word alignments

• Tools: Moses, …

[Credit: Koehn]
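Bi-phrase extraction keeps every pair of spans that is *consistent* with the word alignment: the pair contains at least one alignment link, and no link connects a word inside one span to a word outside the other. The brute-force sketch below applies that definition directly to a hypothetical aligned fragment (`SRC`, `TGT`, `ALIGN` are made up for illustration); production extractors enumerate spans more efficiently.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source position, target position) links

# Hypothetical toy example: a word-aligned fragment of the running sentence.
SRC = "election results in Chicago".split()
TGT = "résultats des élections à Chicago".split()
ALIGN: Alignment = {(0, 2), (1, 0), (2, 3), (3, 4)}  # "des" is left unaligned


def consistent(a: Alignment, s1: int, s2: int, t1: int, t2: int) -> bool:
    """True if spans [s1, s2] and [t1, t2] share at least one link and no link
    connects a word inside one span to a word outside the other span."""
    has_link = False
    for i, j in a:
        in_src, in_tgt = s1 <= i <= s2, t1 <= j <= t2
        if in_src and in_tgt:
            has_link = True
        elif in_src or in_tgt:
            return False
    return has_link


def extract_biphrases(src: List[str], tgt: List[str], a: Alignment,
                      max_len: int = 4) -> List[Tuple[str, str]]:
    """Enumerate all bi-phrases (up to max_len words per side) consistent with a."""
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(len(tgt), t1 + max_len)):
                    if consistent(a, s1, s2, t1, t2):
                        pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


if __name__ == "__main__":
    for pair in extract_biphrases(SRC, TGT, ALIGN):
        print(pair)
```

Note how the unaligned word “des” can attach to either neighbour, yielding both (“election”, “élections”) and (“election”, “des élections”), while (“election”, “résultats des élections”) is rejected because “résultats” is aligned to “results”, outside the source span.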

Page 37

Bi-phrase table

[Credit: P. Koehn, Statistical Machine Translation (book), 2010]

Page 38

Decoding

Obama was waiting for election results in Chicago

Page 39

Decoding

Obama was waiting for election results in Chicago

Obama was waiting for election results in Chicago

Page 40

Decoding with bi-phrases

Obama was waiting for election results in Chicago

Obama

1a

Obama was waiting for election results in Chicago

Page 41

Decoding

Obama was waiting for election results in Chicago

Obama attendait

1a 2a

Obama was waiting for election results in Chicago

Page 42

Decoding with bi-phrases

Obama was waiting for election results in Chicago

Obama attendait les résultats des élections

1a 2a 3a

Obama was waiting for election results in Chicago

Page 43

Decoding

Obama was waiting for election results in Chicago

Obama attendait les résultats des élections à Chicago

1a 2a 3a 4a

Obama was waiting for election results in Chicago

Page 44

Decoding with bi-phrases

Obama was waiting for election results in Chicago

Obama attendait les résultats des élections à Chicago

1a 2a 3a 4a; score = 5.21

Obama was waiting for election results in Chicago

Page 45

Decoding

Obama was waiting for election results in Chicago

Obama attendait résultats élection dans Chicago

1a 2a 3b 4b 5b; score = 2.04

Obama was waiting for election results in Chicago

Page 46

Decoding with bi-phrases

Obama was waiting for election results in Chicago

Obama attendait les résultats des élections à Chicago

1a 2a 3a 4a; score = 5.21

Obama was waiting for election results in Chicago

In practice, several candidates are compared in parallel using beam search.
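Beam search can be sketched for the simplest case: monotone decoding (no reordering), where `stacks[i]` holds partial translations covering the first i source words and only the `beam_size` best of each stack are expanded. The bi-phrase table `PHRASE_TABLE` and its log-scores are hypothetical stand-ins for a full model score; real phrase-based decoders such as Moses also handle reordering, future-cost estimation and hypothesis recombination, all omitted here.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical bi-phrase table: source phrase -> [(target phrase, log-score)].
PHRASE_TABLE: Dict[str, List[Tuple[str, float]]] = {
    "Obama": [("Obama", -0.1)],
    "was waiting": [("attendait", -0.9), ("était en attente", -1.2)],
    "was waiting for": [("attendait", -0.4)],
    "for": [("pour", -0.7)],
    "election": [("élection", -0.5)],
    "results": [("résultats", -0.5)],
    "election results": [("les résultats des élections", -0.6), ("résultats élection", -1.5)],
    "in": [("dans", -0.6)],
    "Chicago": [("Chicago", -0.1)],
    "in Chicago": [("à Chicago", -0.3), ("dans Chicago", -1.4)],
}

Hypothesis = Tuple[float, List[str]]  # (score so far, target phrases so far)


def beam_decode(source: str, table: Dict[str, List[Tuple[str, float]]],
                beam_size: int = 3, max_phrase_len: int = 3) -> Tuple[float, str]:
    """Monotone phrase-based decoding with one beam (stack) per number of covered words."""
    words = source.split()
    n = len(words)
    stacks: List[List[Hypothesis]] = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, []))
    for i in range(n):
        # Prune: only the beam_size best hypotheses of this stack are expanded.
        stacks[i] = heapq.nlargest(beam_size, stacks[i], key=lambda hyp: hyp[0])
        for score, output in stacks[i]:
            for j in range(i + 1, min(n, i + max_phrase_len) + 1):
                for target, logp in table.get(" ".join(words[i:j]), []):
                    stacks[j].append((score + logp, output + [target]))
    if not stacks[n]:
        raise ValueError("no complete translation covers the source sentence")
    best_score, best_output = max(stacks[n], key=lambda hyp: hyp[0])
    return best_score, " ".join(best_output)


if __name__ == "__main__":
    score, translation = beam_decode("Obama was waiting for election results in Chicago", PHRASE_TABLE)
    print(translation, round(score, 2))  # Obama attendait les résultats des élections à Chicago -1.4
```

With these made-up scores, the segmentation 1a 2a 3a 4a wins, while the weaker “Obama attendait résultats élection dans Chicago” survives in the final stack with a lower score, mirroring the two candidates above.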

Page 47

Scoring

Log-linear model:

$$p_{\boldsymbol{\lambda}}(t \mid s) \propto \exp\Big(\sum_i \lambda_i\, h_i(s, t)\Big)$$

with source sentence s, target sentence t, parameters λ_i and features h_i(s, t); the weighted sum Σ_i λ_i h_i(s, t) acts as the score of t.

Features:

• h_lm: language model feature

• h_bs1, h_bs2, …: features related to the plausibilities of the bi-phrases applied

• Other features (distortion, word penalty, etc.)

Parameters:

• Optimized towards minimizing the distance to reference translations (e.g. by maximizing BLEU)
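The log-linear formula can be made concrete with a small sketch that normalizes exp(Σ_i λ_i h_i(s, t)) over a list of candidate translations. The feature names, feature values and weights in `CANDIDATES` and `WEIGHTS` are hypothetical illustrative numbers; they do not reproduce the scores 5.21 and 2.04 shown on the decoding slides.

```python
import math
from typing import Dict

# Hypothetical feature values h_i(s, t) for two candidate translations of
# "Obama was waiting for election results in Chicago" (illustrative numbers only).
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Obama attendait les résultats des élections à Chicago":
        {"lm": -12.0, "biphrases": -4.0, "word_count": 8.0},
    "Obama attendait résultats élection dans Chicago":
        {"lm": -20.0, "biphrases": -5.5, "word_count": 6.0},
}
# Hypothetical weights lambda_i (in practice tuned towards BLEU on reference translations).
WEIGHTS: Dict[str, float] = {"lm": 0.5, "biphrases": 1.0, "word_count": 0.2}


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Unnormalized log-score: sum_i lambda_i * h_i(s, t)."""
    return sum(weights[name] * features[name] for name in weights)


def loglinear_distribution(cands: Dict[str, Dict[str, float]],
                           weights: Dict[str, float]) -> Dict[str, float]:
    """p(t | s) = exp(score(t)) / sum over t' of exp(score(t')), over the candidate list."""
    scores = {t: score(h, weights) for t, h in cands.items()}
    m = max(scores.values())  # subtract the max to avoid overflow in exp
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


if __name__ == "__main__":
    for t, p in loglinear_distribution(CANDIDATES, WEIGHTS).items():
        print(round(p, 4), t)
```

Since the normalizer is the same for every t, a decoder looking for the best translation only needs to compare the unnormalized scores.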

Page 48

MT Evaluation

Page 49

MT evaluation: no simple notion of a correct translation

[Credit: Koehn 2010]

Page 50

Manual Evaluation

[Credit: Bojar 2017]

Page 51

Manual Evaluation: Adequacy and Fluency

Page 52

Problems of Manual Evaluation

[Credit: Bojar 2017]

Page 53

Automatic Evaluation

[Credit: Bojar 2017]

Page 54

BLEU

Reference (human) translation:
The US island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself Osama Bin Laden and threatening a biological/chemical attack against the airport.

Machine translation:
The American International airport and its the office a receives one calls self the sand Arab rich business and so on electronic mail, which sends out; The threat will be able after the maintenance at the airport.

• N-gram precision (score between 0 & 1)
  – What % of MT n-grams (sequences of words) can be found in the reference translation?

• Brevity Penalty
  – Can’t just type out the single word “the” (precision 1.0!)

[Credit: A. Way]

Page 55

BLEU

• Reference Translation: The gunman was shot to death by the police .

• The gunman was shot kill .

• Wounded police jaya of

• The gunman was shot dead by the police .

• The gunman arrested by police kill .

• The gunmen were killed .

• The gunman was shot to death by the police .

• The ringer is killed by the police .

• Police killed the gunman .

• Green = 4-gram match (good!) Red = unmatched word (bad!)

[Credit: A. Way]

Page 56

BLEU Metrics

• Proposed by IBM’s SMT group (Papineni et al., ACL 2002)

• Widely used in MT evaluations

• BLEU Metric:

– p_n: modified n-gram precision

– Geometric mean of p_1, p_2, …, p_N

– BP: brevity penalty (c = length of MT hypothesis, r = length of reference)

– Usually, N = 4 and w_n = 1/N.

$$\mathrm{BLEU} = BP \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big), \qquad BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}$$

[Credit: A. Way]

Page 57

An Example

• MT Hypothesis: The gunman was shot dead by police .

– Ref 1: The gunman was shot to death by the police .

– Ref 2: The gunman was shot to death by the police .

– Ref 3: Police killed the gunman .

– Ref 4: The gunman was shot dead by the police .

• Precision: p_1 = 1.0 (8/8), p_2 = 0.86 (6/7), p_3 = 0.67 (4/6), p_4 = 0.6 (3/5)

• Brevity Penalty: c = 8, r = 9, BP = 0.8825

• Final Score:

[Credit: A. Way]
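The “Final Score” of this example follows from the BLEU formula of the previous slide. The sketch below is a sentence-level BLEU with clipped n-gram counts, uniform weights w_n = 1/N, the closest reference length as r, and no smoothing; BLEU as proposed by Papineni et al. pools counts over a whole test corpus, so treat this single-sentence version as an illustration. With the exact fractions 8/8, 6/7, 4/6 and 3/5 it gives a final score of about 0.675.

```python
import math
from collections import Counter
from typing import List, Sequence

HYP = "The gunman was shot dead by police .".split()
REFS = [r.split() for r in [
    "The gunman was shot to death by the police .",
    "The gunman was shot to death by the police .",
    "Police killed the gunman .",
    "The gunman was shot dead by the police .",
]]


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp: List[str], refs: List[List[str]], n: int) -> float:
    """p_n: each hypothesis n-gram counts at most as often as in the single
    reference where it is most frequent (clipping)."""
    max_ref: Counter = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    hyp_counts = ngrams(hyp, n)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / max(1, sum(hyp_counts.values()))


def bleu(hyp: List[str], refs: List[List[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU with w_n = 1/N and no smoothing."""
    ps = [modified_precision(hyp, refs, n) for n in range(1, max_n + 1)]
    if min(ps) == 0.0:
        return 0.0  # log 0 is undefined: unsmoothed BLEU is 0
    c = len(hyp)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]  # closest reference length
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in ps) / max_n)


if __name__ == "__main__":
    print([round(modified_precision(HYP, REFS, n), 2) for n in range(1, 5)])
    print(round(bleu(HYP, REFS), 4))
```

The degenerate hypothesis “the” from Page 54 shows why precision alone is not enough: its unigram precision is 1.0, but it has no bigrams at all, so its BLEU is 0 (and the brevity penalty would punish it as well).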

Page 58

Correlation of BLEU with human judgments

[Credit: P. Koehn, Statistical Machine Translation (book), 2010]

Page 59

Neural MT

[Credit: hlalitech.org]

Page 60

NMT Techniques

Page 61

Recurrent Neural Networks: Refresher

Page 62

A Feedforward Neural Network

[Figure: input x → hidden state → output o]

Page 63

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t, next to the feedforward network with input x and output o]

Page 64

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]

$$h_t = f_W(h_{t-1}, x_t)$$

Page 65

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]

$$h_t = f_W(h_{t-1}, x_t)$$

$$o_t = g_W(h_t)$$

Page 66

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]

$$h_t = f_W(h_{t-1}, x_t) = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

$$o_t = g_W(h_t) = \mathrm{softmax}(W_{ho}\, h_t)$$

$$\mathrm{softmax}(\mathbf{y})_i = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

Page 67

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]

$$h_t = f_W(h_{t-1}, x_t) = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

$$o_t = g_W(h_t) = \mathrm{softmax}(W_{ho}\, h_t)$$

$$\mathrm{softmax}(\mathbf{y})_i = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

$W = (W_{hh}, W_{xh}, W_{ho})$: shared between all time steps

Page 68

A Recurrent Neural Network

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]

$$h_t = f_W(h_{t-1}, x_t) = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

$$o_t = g_W(h_t) = \mathrm{softmax}(W_{ho}\, h_t)$$

$$\mathrm{softmax}(\mathbf{y})_i = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

$W = (W_{hh}, W_{xh}, W_{ho})$: shared between all time steps

Note: bias terms omitted

Page 69

A Recurrent Neural Network

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

[Figure: recurrent network with input x_t, hidden state h_t and output o_t]
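The two RNN equations translate directly into code. The sketch below uses plain Python lists (matrices as lists of rows) so that it needs no external library; the toy sizes `V`, `H` and the random initialization are hypothetical, and, as on the slides, bias terms are omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def matvec(W: Matrix, v: Vector) -> Vector:
    """Matrix-vector product W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(y: Vector) -> Vector:
    """softmax(y)_i = exp(y_i) / sum_j exp(y_j); the max is subtracted for stability."""
    m = max(y)
    e = [math.exp(a - m) for a in y]
    z = sum(e)
    return [a / z for a in e]


def rnn_step(h_prev: Vector, x: Vector, W_hh: Matrix, W_xh: Matrix,
             W_ho: Matrix) -> Tuple[Vector, Vector]:
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t) and o_t = softmax(W_ho h_t); no bias terms."""
    a = [p + q for p, q in zip(matvec(W_hh, h_prev), matvec(W_xh, x))]
    h = [math.tanh(v) for v in a]
    return h, softmax(matvec(W_ho, h))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def one_hot(k: int, size: int) -> Vector:
    return [1.0 if i == k else 0.0 for i in range(size)]


if __name__ == "__main__":
    V, H = 5, 3  # hypothetical toy vocabulary and hidden sizes
    rng = random.Random(0)
    W_hh, W_xh, W_ho = random_matrix(H, H, rng), random_matrix(H, V, rng), random_matrix(V, H, rng)
    h = [0.0] * H  # h_0
    for k in [1, 3, 2]:  # the same W is reused at every time step
        h, o = rnn_step(h, one_hot(k, V), W_hh, W_xh, W_ho)
        print([round(p, 3) for p in o])
```

Each o_t is a probability distribution over the V vocabulary words, and the loop makes the weight sharing explicit: `W_hh`, `W_xh` and `W_ho` are created once and used at every step.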

Page 70

Language modelling with RNNs

Il était une fois un roi et une reine si fâchés de n’avoir point d’enfants
(“Once upon a time there were a king and a queen so upset at having no children”)

Page 71

Language modelling with RNNs

Il était une fois un roi et une reine si fâchés de n’avoir point d’enfants

Let’s first assume that we have already trained some “good” RNN: $W = (W_{hh}, W_{xh}, W_{ho})$ …

Page 72

Language modelling with RNNs

Let’s first assume that we have already trained some “good” RNN: $W = (W_{hh}, W_{xh}, W_{ho})$ …

… and see how this RNN can be used for predicting new texts

Il était une fois un roi et une reine si fâchés de n’avoir point d’enfants

Page 73

Decoding with the trained RNN

[Figure: hidden-state initialization h_0]

Page 74

Decoding with the trained RNN

[Figure: h_0 → o_0]

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

Page 75

Decoding with the trained RNN

[Figure: h_0 → o_0, where o_0 is a probability vector over the vocabulary: (.001 .7 … .002 … .1 .1 … .005)]

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$


Page 77

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

[Figure: h_0 → o_0 → “il”]

Page 78

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

[Figure: h_0 → o_0 → “il”, which becomes the next input x_1]

Page 79

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

[Figure: h_0 → o_0 → “il”, which becomes the next input x_1 = (0 1 … 0 … 0 0 … 0)]

Page 80

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

[Figure: h_0 → o_0 → “il”, which becomes the next input x_1 = (0 1 … 0 … 0 0 … 0)]

1-hot encoding

Page 81

Decoding with the trained RNN

[Figure: h_0 → o_0 → “il” = x_1 (1-hot: 0 1 … 0 … 0 0 … 0) → h_1]

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

Page 82

Decoding with the trained RNN

[Figure: h_0 → o_0 → “il” = x_1 (1-hot: 0 1 … 0 … 0 0 … 0) → h_1 → o_1 = (.001 .01 … .002 … .5 .1 … .005)]

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

Page 83

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0 → “il” = x_1 → h_1 → o_1 = (.001 .01 … .002 … .5 .1 … .005)]

Page 84

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0 → “il” = x_1 → h_1 → o_1 → “était”, which becomes the next input x_2]

Page 85

Decoding with the trained RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0 → “il” = x_1 → h_1 → o_1 → “était” = x_2 → h_2 → o_2 → “une” = x_3 → h_3 → o_3 → “fois” = x_4 → h_4 → o_4 → …]
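The decoding loop above (compute o_t, pick a word, feed its 1-hot vector back as the next input) can be sketched in plain Python. To keep the example deterministic and self-contained, the weights are hand-built rather than trained: with hidden size equal to the vocabulary size, `W_xh` copies the 1-hot input into the hidden state and `W_ho` predicts word k+1 from word k. The vocabulary `VOCAB`, the end marker `"</s>"`, the scale `C` and the choice of h_0 are all hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]

VOCAB = ["<s>", "il", "était", "une", "fois", "</s>"]  # hypothetical toy vocabulary
V = len(VOCAB)
C = 5.0  # scale of the hand-built weights


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(y: Vector) -> Vector:
    m = max(y)
    e = [math.exp(a - m) for a in y]
    z = sum(e)
    return [a / z for a in e]


def one_hot(k: int, size: int) -> Vector:
    return [1.0 if i == k else 0.0 for i in range(size)]


# Hand-built (NOT trained) weights, hidden size = V.
W_hh: Matrix = [[0.0] * V for _ in range(V)]
W_xh: Matrix = [[C if i == j else 0.0 for j in range(V)] for i in range(V)]
W_ho: Matrix = [[C if j == i + 1 else 0.0 for i in range(V)] for j in range(V)]


def decode(h0: Vector, rng: Optional[random.Random] = None, max_len: int = 20) -> List[str]:
    """o_t = softmax(W_ho h_t); pick a word; feed its 1-hot vector back as x_{t+1};
    h_{t+1} = tanh(W_hh h_t + W_xh x_{t+1}). Greedy by default, sampling if rng is given."""
    h, words = h0, []
    for _ in range(max_len):
        o = softmax(matvec(W_ho, h))
        if rng is None:
            k = max(range(V), key=lambda i: o[i])  # greedy: most probable word
        else:
            k = rng.choices(range(V), weights=o)[0]  # sample from o_t
        if VOCAB[k] == "</s>":
            break
        words.append(VOCAB[k])
        x = one_hot(k, V)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_hh, h), matvec(W_xh, x))]
    return words


if __name__ == "__main__":
    h0 = one_hot(0, V)  # hidden-state initialization (here: the "<s>" pattern)
    print(" ".join(decode(h0)))  # il était une fois
```

Greedy decoding always takes the most probable word; passing an `rng` samples from o_t instead, which produces varied texts from the same model.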

Page 86

Training an RNN

Training set:
il était une fois …
il était une fois un …

Page 87

Training an RNN

Training set:
il était une fois …
il était une fois un …

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

Initial values of parameters

Page 88

Training an RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: hidden-state initialization h_0; training set: il était une fois … / il était une fois un …]

Page 89

Training an RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0 = (.5 .01 … .002 … .1 .1 … .005); training text: il était une fois un …]


Page 91

Training an RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0 = (.5 .01 … .002 … .1 .1 … .005); training text: il était une fois un …]

Cross-entropy loss:

$$-\log p(\text{il}) = -\log 0.01$$

Page 92

Training an RNN

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: h_0 → o_0; x_1 = “il” → h_1 → o_1 = (.1 .05 … .07 … .3 .1 … .3); training text: il était une fois un]

Cross-entropy loss:

$$-\log p(\text{était}) = -\log 0.3$$
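The per-step cross-entropy losses above can be checked numerically. In this sketch only the two probabilities given on the slides are filled in (p(il) = 0.01 at the first step, p(était) = 0.3 at the second); the dictionaries `O_0` and `O_1` omit the rest of each distribution, and the helper names are hypothetical. The training loss of a sequence is the sum of its per-step losses, which is the quantity that back-propagation through time differentiates.

```python
import math
from typing import Dict, List

# Probabilities read off the slides; the rest of each distribution is omitted.
O_0: Dict[str, float] = {"il": 0.01}    # first step: probability of "il"
O_1: Dict[str, float] = {"était": 0.3}  # second step: probability of "était"


def cross_entropy(dist: Dict[str, float], target: str) -> float:
    """Loss at one time step: -log of the probability given to the observed word."""
    return -math.log(dist[target])


def sequence_loss(dists: List[Dict[str, float]], targets: List[str]) -> float:
    """Training loss of a sequence: per-step losses summed over time."""
    return sum(cross_entropy(d, w) for d, w in zip(dists, targets))


if __name__ == "__main__":
    print(round(cross_entropy(O_0, "il"), 3))                    # 4.605
    print(round(cross_entropy(O_1, "était"), 3))                 # 1.204
    print(round(sequence_loss([O_0, O_1], ["il", "était"]), 3))  # 5.809
```

The poorly predicted first word (probability 0.01) dominates the loss; training adjusts W so that the observed words receive higher probability and each term moves toward 0.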

Page 93

Backpropagation through time

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: the RNN unrolled over the training text “il était une fois un …”: h_0 → o_0; x_1 = “il” → h_1 → o_1; x_2 = “était” → h_2 → o_2; x_3 = “une” → h_3 → o_3; x_4 = “fois” → h_4 → o_4; all outputs contribute to the LOSS]

Page 94

Backpropagation through time

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: the RNN unrolled over the training text “il était une fois un …”: h_0 → o_0; x_1 = “il” → h_1 → o_1; x_2 = “était” → h_2 → o_2; x_3 = “une” → h_3 → o_3; x_4 = “fois” → h_4 → o_4; the same W_hh links each pair of consecutive hidden states, the same W_ho produces every output, and all outputs contribute to the LOSS]

Page 95

Backpropagation through time

$$o_t = \mathrm{softmax}(W_{ho}\, h_t)$$

$$h_t = \tanh(W_{hh}\, h_{t-1} + W_{xh}\, x_t)$$

[Figure: the RNN unrolled over the training text “il était une fois un …”: h_0 → o_0; x_1 = “il” → h_1 → o_1; x_2 = “était” → h_2 → o_2; x_3 = “une” → h_3 → o_3; x_4 = “fois” → h_4 → o_4; the same W_hh links each pair of consecutive hidden states, the same W_ho produces every output, and all outputs contribute to the LOSS]

BPTT: Back-Propagation Through Time
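BPTT is ordinary back-propagation applied to the unrolled network. Because W is shared, the gradient of each weight matrix is a sum of contributions from every time step. The plain-Python sketch below follows the slides' scheme (o_0 computed from h_0 alone, then the observed word at step t−1 fed as x_t) with one-hot inputs represented by word indices, so that W_xh x_t is just a column of W_xh. The function names, toy sizes and learning rate are hypothetical; the accompanying check compares the analytic gradients with finite differences.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W: Matrix, v: Vector) -> Vector:
    """Transposed product W^T v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]


def softmax(y: Vector) -> Vector:
    m = max(y)
    e = [math.exp(a - m) for a in y]
    z = sum(e)
    return [a / z for a in e]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def loss_and_grads(W_hh: Matrix, W_xh: Matrix, W_ho: Matrix, h0: Vector,
                   inputs: List[int], targets: List[int]) -> Tuple[float, Tuple[Matrix, Matrix, Matrix]]:
    """Summed cross-entropy of one sequence and its gradients by BPTT.
    len(targets) == len(inputs) + 1: o_0 depends on h_0 only."""
    H, V = len(W_hh), len(W_ho)
    hs = [h0]
    for x in inputs:  # forward pass; W_xh x_t with 1-hot x_t = column x of W_xh
        a = [s + W_xh[i][x] for i, s in enumerate(matvec(W_hh, hs[-1]))]
        hs.append([math.tanh(v) for v in a])
    os_ = [softmax(matvec(W_ho, h)) for h in hs]
    loss = -sum(math.log(o[y]) for o, y in zip(os_, targets))

    dW_hh, dW_xh, dW_ho = zeros(H, H), zeros(H, V), zeros(V, H)
    dh_next = [0.0] * H
    for t in reversed(range(len(hs))):  # backward pass, from the last step to the first
        dy = list(os_[t])
        dy[targets[t]] -= 1.0  # dLoss/dlogits = o_t - onehot(target_t)
        for k in range(V):
            for i in range(H):
                dW_ho[k][i] += dy[k] * hs[t][i]
        if t == 0:
            break  # h_0 is a fixed initialization, not computed from W
        dh = [p + q for p, q in zip(matTvec(W_ho, dy), dh_next)]
        da = [g * (1.0 - h * h) for g, h in zip(dh, hs[t])]  # tanh' = 1 - tanh^2
        for i in range(H):
            dW_xh[i][inputs[t - 1]] += da[i]
            for j in range(H):
                dW_hh[i][j] += da[i] * hs[t - 1][j]
        dh_next = matTvec(W_hh, da)  # gradient flowing back into h_{t-1}
    return loss, (dW_hh, dW_xh, dW_ho)


if __name__ == "__main__":
    H, V = 4, 5  # hypothetical vocabulary: 1 = il, 2 = était, 3 = une, 4 = fois
    rng = random.Random(0)
    W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
    W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(H)]
    W_ho = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(V)]
    h0, targets = [0.5] * H, [1, 2, 3, 4]  # "il était une fois"
    inputs = targets[:-1]                    # observed previous words fed as x_t
    for step in range(201):
        loss, grads = loss_and_grads(W_hh, W_xh, W_ho, h0, inputs, targets)
        for W, dW in zip((W_hh, W_xh, W_ho), grads):
            for row, drow in zip(W, dW):
                for j in range(len(row)):
                    row[j] -= 0.5 * drow[j]  # plain gradient descent step
        if step % 50 == 0:
            print(step, round(loss, 4))
```

The line `dh_next = matTvec(W_hh, da)` is where long-range problems come from: the gradient reaching early steps is repeatedly multiplied by W_hh^T and by tanh' ≤ 1, so it tends to vanish over many steps.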

Page 96

Recap: word-level language modeling

[Credit: Jozefowicz]

Page 97

Vanilla RNNs have issues with long-term interactions

[Credit: Alex Graves]

Page 98

LSTMs can maintain long-term memories

Long Short-Term Memory networks: Hochreiter & Schmidhuber 1997

[Credit: Alex Graves]

Page 99: A short introduction to Neural Machine Translationnlpcourse.europe.naverlabs.com/slides/06_machine_translation.pdf · •Neural MT (NMT) • Language ... •Toolkits and Learning

RNNs with longer-term memory: LSTMs and GRUs

[Figure: LSTM and GRU cells]

http://colah.github.io/posts/2015-08-Understanding-LSTMs

These variants of RNNs alleviate the “vanishing gradient problem” and allow the network to model long-distance effects.
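A minimal sketch of one LSTM step, assuming the now-standard formulation with a forget gate (added to the LSTM after the 1997 paper) and gates computed from the concatenation $[h_{t-1}; x_t]$; the parameter layout and names are illustrative choices. The usage example shows the property claimed above: with the forget gate held open and the input gate shut, a stored value survives 100 steps almost unchanged, because the cell state is updated additively instead of being squashed through a tanh at every step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]

def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step; params maps each gate name to (W, b) acting on [h_prev; x]."""
    hx = h_prev + x                              # list concatenation [h_{t-1}; x_t]
    def gate(name, act):
        W, b = params[name]
        return [act(z) for z in affine(W, b, hx)]
    f = gate("f", sigmoid)        # forget gate: how much of the old cell to keep
    i = gate("i", sigmoid)        # input gate: how much new content to write
    o = gate("o", sigmoid)        # output gate: how much of the cell to expose
    g = gate("g", math.tanh)      # candidate content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]   # additive update
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Usage: a cell whose forget gate is saturated open and input gate saturated shut.
H, X = 2, 1
def constant_gate(bias):
    return ([[0.0] * (H + X) for _ in range(H)], [bias] * H)   # zero weights, fixed bias
params = {"f": constant_gate(10.0), "i": constant_gate(-10.0),
          "o": constant_gate(10.0), "g": constant_gate(0.0)}
h, c = [0.0] * H, [1.0] * H          # store the value 1.0 in each memory cell
for _ in range(100):
    h, c = lstm_step(params, [0.5], h, c)
print(c)                             # stays close to 1.0
```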


If time permits: Hinton’s explanation of LSTMs

[Seven slides of figures: Hinton’s explanation of LSTMs. Credit: Hinton 2013]

End of Hinton’s explanation of LSTMs


Seq2Seq models for NMT


Seq2Seq for Machine Translation

[Cho’s blog on NMT]

Seq2Seq RNN

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

[Sutskever et al., 2014. Sequence to Sequence Learning with Neural Networks]

[Figure label: source/target interface]

Vanilla RNN-based NMT

https://github.com/tensorflow/nmt
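The vanilla encoder-decoder can be sketched in a few lines: the encoder RNN reads the source and hands its final state to the decoder, which emits target words one at a time and feeds each prediction back in as its next input. This sketch assumes plain tanh RNNs, greedy (argmax) decoding rather than beam search, reserved ids 0 = `<bos>` and 1 = `<eos>`, and random untrained weights; all names are hypothetical and are not the API of the tensorflow/nmt code linked above.

```python
import math
import random

def rnn_step(W_hh, W_xh, h, token):
    """h' = tanh(W_hh h + W_xh onehot(token))."""
    return [math.tanh(sum(w * hj for w, hj in zip(row, h)) + W_xh[i][token])
            for i, row in enumerate(W_hh)]

def encode(enc, src):
    h = [0.0] * len(enc["W_hh"])
    for tok in src:
        h = rnn_step(enc["W_hh"], enc["W_xh"], h, tok)
    return h        # one fixed-size vector: all the decoder ever sees of the source

def greedy_decode(dec, h, bos, eos, max_len):
    out, y = [], bos
    for _ in range(max_len):
        h = rnn_step(dec["W_hh"], dec["W_xh"], h, y)        # previous target word as input
        scores = [sum(w * hj for w, hj in zip(row, h)) for row in dec["W_ho"]]
        y = max(range(len(scores)), key=scores.__getitem__)  # argmax of softmax = argmax of scores
        if y == eos:
            break
        out.append(y)
    return out

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(1)
H, V_SRC, V_TGT = 4, 6, 7          # toy sizes
enc = {"W_hh": rand_matrix(H, H), "W_xh": rand_matrix(H, V_SRC)}
dec = {"W_hh": rand_matrix(H, H), "W_xh": rand_matrix(H, V_TGT), "W_ho": rand_matrix(V_TGT, H)}
translation = greedy_decode(dec, encode(enc, [2, 3, 4, 5]), bos=0, eos=1, max_len=10)
print(translation)   # untrained weights: the ids are arbitrary, only the control flow matters
```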


Attention in NMT


Introducing attention: vanilla seq2seq and the information bottleneck

[Adapted from http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf]

Problem: a fixed-dimensional interface between encoder and decoder


Attention mechanism

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Solution: “Random Access Memory” (kind of)


Attention mechanism

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Simplified version of (Bahdanau et al. 2015)


Attention mechanism: scoring

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Compare target and source hidden states

Attention mechanism: normalization

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Convert into alignment weights

Attention mechanism: context

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Build context vector: weighted average

Attention mechanism: next hidden state

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

Compute the next hidden state
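The scoring, normalization and context steps can be written out directly. This sketch assumes plain dot-product scores between the decoder state and each encoder state (a common simplification; Bahdanau et al. score with a small feed-forward network instead) and stops at the context vector, since how the context is combined into the next hidden state varies between models. The function name and toy vectors are illustrative.

```python
import math

def attention(decoder_state, encoder_states):
    # 1. scoring: compare the target hidden state with every source hidden state
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    # 2. normalization: softmax turns the scores into alignment weights summing to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # 3. context: weighted average of the source hidden states
    dim = len(encoder_states[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy source hidden states
weights, context = attention([2.0, 0.0], encoder_states)
```

Here the decoder state points along the first dimension, so the first and third source states (which share that component) receive equal, high weight and the second receives little.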


Attention

http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture10.pdf

NMT with attention

https://github.com/tensorflow/nmt

NMT with attention

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp

The “canonical” NMT architecture: RNN seq2seq with attention and bidirectional encoding

https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-3/ (Cho’s Blog, 2015)

[Figure label: bidirectional encoding]
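Bidirectional encoding runs one RNN left-to-right and another right-to-left over the source and concatenates their states, so each source position's annotation reflects both its left and its right context. A minimal sketch, assuming plain tanh RNNs with separate (here random) weights for each direction; `run_rnn` and `bidirectional_encode` are hypothetical names.

```python
import math
import random

def run_rnn(W_hh, W_xh, tokens):
    """Left-to-right tanh RNN; returns the hidden state at every position."""
    h, states = [0.0] * len(W_hh), []
    for tok in tokens:
        h = [math.tanh(sum(w * hj for w, hj in zip(row, h)) + W_xh[i][tok])
             for i, row in enumerate(W_hh)]
        states.append(h)
    return states

def bidirectional_encode(fwd, bwd, tokens):
    forward = run_rnn(fwd["W_hh"], fwd["W_xh"], tokens)
    backward = run_rnn(bwd["W_hh"], bwd["W_xh"], tokens[::-1])[::-1]   # read right-to-left, re-align
    return [f + b for f, b in zip(forward, backward)]                    # [h_fwd; h_bwd] per position

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(2)
H, V = 3, 6                          # toy sizes
fwd = {"W_hh": rand_matrix(H, H), "W_xh": rand_matrix(H, V)}
bwd = {"W_hh": rand_matrix(H, H), "W_xh": rand_matrix(H, V)}
annotations = bidirectional_encode(fwd, bwd, [1, 4, 2, 5])
```

The annotation of the first word already contains a backward state that has read the whole sentence, which is what lets the attention mechanism find useful context at any source position.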


Advanced NMT Models


Alternative Architectures

• Convolutional approaches

• “Attention is all you need”


Convolutional models

• Illustration from:

[Kalchbrenner et al. 2016, Neural Machine Translation in Linear Time] https://arxiv.org/abs/1610.10099 (ByteNet)

• Shorter “maximum path length”

• More parallelizable
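The "maximum path length" is the number of sequential steps or stacked layers a signal must traverse between two positions. The sketch counts it for a distance $d$ under simplifying assumptions: an RNN needs $d$ steps; a stack of plain width-$k$ convolutions widens its view by $(k-1)/2$ positions per side per layer; a dilated stack (width 3, dilation doubling every layer, roughly the scheme used in ByteNet) covers radius $2^L - 1$ after $L$ layers; self-attention connects any two positions in one layer. The function is hypothetical.

```python
def layers_to_connect(distance, model, kernel_width=3):
    """Sequential steps (RNN) or stacked layers needed before position j can
    influence position i, with distance = |i - j| >= 1."""
    if model == "rnn":
        return distance                          # one recurrent step per intervening position
    if model == "conv":
        radius = (kernel_width - 1) // 2         # view grows by this much per side per layer
        return -(-distance // radius)            # integer ceil(distance / radius)
    if model == "dilated_conv":
        # assumption: width-3 kernels with dilation 1, 2, 4, ... so that after
        # L layers the receptive radius is 2**L - 1 (kernel_width is ignored here)
        layers = 0
        while 2 ** layers - 1 < distance:
            layers += 1
        return layers
    if model == "self_attention":
        return 1                                 # every position attends to every other directly
    raise ValueError(f"unknown model: {model}")

for model in ("rnn", "conv", "dilated_conv", "self_attention"):
    print(model, layers_to_connect(1000, model))   # rnn 1000, conv 1000, dilated_conv 10, self_attention 1
```

Plain width-3 convolutions need as many layers as the RNN needs steps; their advantage there is that all positions of a layer are computed in parallel, whereas the RNN's steps are strictly sequential.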

Convolutional models

[Figure: convolutional model compared with an RNN]


Convolutional models

• Also: convolution + attention:

[Gehring et al. 2017, Convolutional Sequence to Sequence Learning] https://arxiv.org/abs/1705.03122 (ConvS2S, fairseq)

• Illustration from:

[Kalchbrenner et al. 2016, Neural Machine Translation in Linear Time] https://arxiv.org/abs/1610.10099 (ByteNet)


From convolution to self-attention

[Figure: positions $s_1, \dots, s_8$, shown once for convolution and once for self-attention]

Convolution

• The next level for $s_3$ gets input from a fixed number of (immediate) neighbors

Self-attention

• The next level for $s_3$ gets input from a variable number of (perhaps distant) neighbors, according to attention weights relative to $s_3$
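The contrast between convolution and self-attention can be checked mechanically: changing a distant position ($s_8$) cannot affect a one-layer convolution at $s_3$, but it does change the self-attention output at $s_3$. A minimal sketch, assuming 2-dimensional toy vectors, a fixed width-3 kernel, and dot-product attention without the learned query/key/value projections a real model would use; the function names are hypothetical, and positions are 0-indexed in the code (so $s_3$ is index 2).

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def conv_level(s, i, kernel):
    """Next level at position i from a fixed window of immediate neighbours.
    kernel has odd length; positions outside the sequence count as zero."""
    half = len(kernel) // 2
    out = [0.0] * len(s[0])
    for offset, weight in zip(range(-half, half + 1), kernel):
        j = i + offset
        if 0 <= j < len(s):
            out = [o + weight * v for o, v in zip(out, s[j])]
    return out

def self_attention_level(s, i):
    """Next level at position i from all positions, weighted by dot-product
    attention relative to s[i]; returns (output, attention weights)."""
    weights = softmax([sum(a * b for a, b in zip(s[i], sj)) for sj in s])
    out = [sum(w * sj[k] for w, sj in zip(weights, s)) for k in range(len(s[0]))]
    return out, weights

s = [[float(j % 3), 1.0] for j in range(8)]      # toy s_1..s_8
s_changed = [row[:] for row in s]
s_changed[7] = [5.0, -1.0]                       # perturb the distant position s_8
kernel = [0.25, 0.5, 0.25]
print(conv_level(s, 2, kernel) == conv_level(s_changed, 2, kernel))             # True: unaffected
print(self_attention_level(s, 2)[0] == self_attention_level(s_changed, 2)[0])   # False: affected
```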


“Attention is all you need”

• [Vaswani et al. (2017). Attention Is All You Need] http://arxiv.org/abs/1706.03762 (Transformer)

• Attention applied to:
  • Encoding of the source (self-attention)
  • Decoding of the next target word:
    • Attention to source-word encodings
    • Attention to previous target words
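All three uses of attention listed for the Transformer can be expressed with one function, scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$ (Vaswani et al., 2017): source self-attention uses the source encodings as queries, keys and values; attention to the source uses target states as queries and source encodings as keys and values; attention to previous target words adds a causal mask so that position $i$ only sees positions $\le i$. This single-head sketch omits the learned projections, multiple heads and the rest of a Transformer layer; the toy vectors are made up.

```python
import math

def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors (single head).
    With causal=True, query i may only attend to key positions j <= i."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            scores = [sc if j <= i else float("-inf") for j, sc in enumerate(scores)]
        m = max(scores)
        weights = [math.exp(sc - m) for sc in scores]   # exp(-inf) == 0.0: masked out
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[k] for w, v in zip(weights, V)) for k in range(len(V[0]))])
    return out

source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # toy source-word encodings
target = [[0.5, 0.5], [1.0, -1.0]]                # toy decoder states
enc_self = scaled_dot_product_attention(source, source, source)               # encoding the source
enc_dec = scaled_dot_product_attention(target, source, source)                # attention to the source
dec_self = scaled_dot_product_attention(target, target, target, causal=True)  # previous target words
```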

Speed

https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html

Coreference resolution (Winograd schemas)

[Figure label: self-attention]

https://research.googleblog.com/2017/08/transformer-novel-neural-network.html

Multilingual NMT

[Slide cited by: Marta Costa-jussà, 2017]

[Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (2016), O. Firat et al.]

[Google Multilingual NMT system: Enabling zero-shot translation (2017), Melvin Johnson et al.]


Google Multilingual NMT

https://www.youtube.com/watch?v=nR74lBO5M3s

[Melvin Johnson et al., 2017. Google Multilingual NMT system: Enabling zero-shot translation]


Google Multilingual NMT: Enabling zero-shot translation

https://arxiv.org/pdf/1611.04558.pdf
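Johnson et al. (2017) train one shared model on many language pairs by adding an artificial token at the start of each source sentence that names the desired target language; at test time the same token can request a pair never seen in training (zero-shot). The sketch shows only that data-preparation step; the `<2xx>` token spelling follows the examples in that paper, while the function names and toy sentences are hypothetical.

```python
def tag_for_target(source_sentence, target_lang):
    """Prepend an artificial token naming the desired target language."""
    return f"<2{target_lang}> {source_sentence}"

def build_training_data(corpora):
    """corpora maps (source_lang, target_lang) -> list of (source, target) pairs;
    all pairs are merged into one training set for a single shared model."""
    data = []
    for (_, target_lang), pairs in corpora.items():
        for source, target in pairs:
            data.append((tag_for_target(source, target_lang), target))
    return data

# Hypothetical toy corpora: Portuguese->English and English->Spanish only.
corpora = {
    ("pt", "en"): [("Olá mundo", "Hello world")],
    ("en", "es"): [("Hello world", "Hola mundo")],
}
training_data = build_training_data(corpora)
# Zero-shot request: Portuguese->Spanish never appears in training_data.
zero_shot_input = tag_for_target("Olá mundo", "es")
```

Nothing in the model architecture changes; the shared encoder, decoder and attention simply learn to condition on the token.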


The Generality of Seq2Seq Models


Image Captioning

[Show and Tell: A Neural Image Caption Generator, Vinyals et al., 2014]

Natural Language Generation

[Credit: Agarwal et al., 2017]

MR: Meaning Representation (dialog act)

RF: Reference (test set)

Pred: seq2seq prediction

The structure is encoded as a character sequence; char2char model with attention.
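On this slide the structured input (the MR) is flattened into a character sequence for a char2char model. A minimal sketch of that preprocessing, assuming an attribute[value] textual format and made-up attribute names and values; the functions are hypothetical and not taken from Agarwal et al.

```python
def linearize(mr):
    """Flatten a meaning representation (attribute -> value) into one string."""
    return ", ".join(f"{attr}[{value}]" for attr, value in mr.items())

def to_char_ids(text, vocab):
    """Map every character to an id, growing the vocabulary as needed."""
    return [vocab.setdefault(ch, len(vocab)) for ch in text]

vocab = {}
mr = {"name": "The Eagle", "eatType": "coffee shop", "area": "riverside"}   # hypothetical MR
source = linearize(mr)
source_ids = to_char_ids(source, vocab)
```

Working at the character level keeps the vocabulary tiny and lets the model copy rare names such as "The Eagle" character by character instead of treating them as unknown words.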


Seq2Seq Applications

• text → text
  • Machine Translation
  • Summarization
  • Dialogue
• (nothing) → text
  • Language Modeling
• other → text
  • Image Captions
  • Natural Language Generation
  • Speech Recognition
  • Handwriting Recognition
• text → other
  • Semantic Parsing
  • Code Generation
  • Handwriting Generation
  • Speech Synthesis
• other → other
  • Image Generation
  • etc.


Tools and Resources


A few open-source NMT toolkits

Extensive list at: https://github.com/jonsafari/nmt-list

Name                 Model type                  Main framework  Who                         Comments
tf-seq2seq           RNN                         TensorFlow      Denny Britz (Google Brain)
Nematus              RNN                         Theano          Edinburgh U.
Marian-NMT           RNN                         C++             Poznan U. and Edinburgh U.  Compatible with Nematus
OpenNMT-py           RNN                         PyTorch         Harvard, Systran            Based on OpenNMT (Torch)
Fairseq              CNN (ConvS2S)               Torch           Facebook
Tensor2Tensor (T2T)  Transformer + other models  TensorFlow      Google Brain                “Attention is all you need”

References: some overviews

An introduction to machine translation (1992), Hutchins and Somers, Academic Press. [web]

Statistical Machine Translation (2010), P. Koehn, Cambridge University Press.

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (2017), G. Neubig. [pdf]

Neural Machine Translation (chapter draft) (2017), P. Koehn. [pdf]

CS224d, Deep Learning for Natural Language Processing, Lecture 10 (Machine Translation), Manning et al., Stanford University. [web]

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation (2017), Gatt et al. [pdf]

References: a few papers

Long short-term memory (1997), S. Hochreiter and J. Schmidhuber. [pdf]

Generating sequences with recurrent neural networks (2013), A. Graves. [pdf]

Sequence to sequence learning with neural networks (2014), I. Sutskever et al. [pdf]

Neural machine translation by jointly learning to align and translate (2014), D. Bahdanau et al. [pdf]

Google's neural machine translation system: Bridging the gap between human and machine translation (2016), Y. Wu et al. [pdf]

A Convolutional Encoder Model for Neural Machine Translation (2017), J. Gehring et al. [pdf]

Attention Is All You Need (2017), A. Vaswani et al. [pdf]

Concluding remarks

• Many aspects I did not discuss:
  • Detailed implementation techniques (batching, dropout, ensembling, …)
  • Pros/cons of NMT relative to PBMT (Philipp Koehn)
  • Sub-word units, byte-pair encoding
  • Use of monolingual data
  • Fine-grained linguistic evaluation techniques (Pierre Isabelle’s challenge dataset)
  • Prior linguistic knowledge

Seq2Seq in NMT, text generation, etc.: an active research field with exciting applications.