neural machine transla#on

Post on 02-Jan-2017

220 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

NeuralMachineTransla/on

ThangLuongLecture@CS224D

Spring2016

(SpecialthankstoChrisManningforfeedback!)

7billionpeople,7000languages

5/19/16 ThangLuong-NeuralMachineTranslaHon 2

Auniversaltranslator

(TheBabelFishfrom“theHitchhiker'sGuidetotheGalaxy”)

5/19/16 ThangLuong-NeuralMachineTranslaHon 3

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 4

•  GrammaHcallyincorrect.

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 5

•  Badwordchoices.

Machinevs.HumanTransla/on

“Nevertheless,withinthedisciplineofmachinelearning,aspecializa8oncalleddeeplearninghasbeendeveloped.Itsdis8nguishingfeatureisthatitisinspiredbyneurobiology.Deeplearningdealswithcomputa8onalelementswhichallowanetworkofar8ficialneuronstolearnamodelofthehumanbrain.”

Faithfultransla/on

5/19/16 ThangLuong-NeuralMachineTranslaHon 6

•  Badsentencestructures.

Machinevs.HumanTransla/on

“However,inmachinelearningaspecializa8oncalleddeeplearninghasemerged.Itcanberecognizedbyitsdis8nc8veneurobiologicalinfluence.Deeplearningiscenteredaroundnetworksofar8ficialneuronswhichcanlearnmodelsofthehumanbrain.”

Fluenttransla/on

Abiggap!5/19/16 ThangLuong-NeuralMachineTranslaHon 7

HowhasMTevolved?

5/19/16 ThangLuong-NeuralMachineTranslaHon 8

Phrase-basedMT

Sheloves

Elleaime

cute

leschats

cats

mignons(Brownetal.,1993;Koehnetal.,2003;Och&Ney,2004)

•  Breaksentencesintochunks.•  Transla8onmodel:lookupphrasetranslaHons.•  Languagemodel:Hephrasestogether.

Translatelocally LMusesonlytargetwords5/19/16 ThangLuong-NeuralMachineTranslaHon 9

JointNeuralLanguageModel

•  CondiHonedonsourcewords(Devlinetal.,2014)•  SHlltranslatelocally.

…alléspourunepromenadelelongdela

WewentforastrollalongtheSouthBank

rive

walk along bank(river)

MTsystemsbecomemorecomplex!5/19/16 ThangLuong-NeuralMachineTranslaHon 10

NeuralMachineTransla/ontotherescue!

Let’sfindout!

•  Sequence-to-sequence:translateglobally.•  End-to-end:simple&generalizable.

(Sutskeveretal.,2014;Choetal.,2014)

NMT

Iamastudent Jesuisétudiant

5/19/16 ThangLuong-NeuralMachineTranslaHon 11

Outline

•  BasicNMT– RNNRecap.– Encoder-Decoder.– Training.– TesHng.

•  AdvancedNMT

5/19/16 ThangLuong-NeuralMachineTranslaHon 12

RecurrentNeuralNetworks(RNNs)

I am a studentinput:

5/19/16 ThangLuong-NeuralMachineTranslaHon 13

(PictureadaptedfromAndrejKarparthy)

RecurrentNeuralNetworks(RNNs)

I am a studentinput:

RNNstorepresentsequences!

ht-1 ht

xt

(PictureadaptedfromAndrejKarparthy)5/19/16 ThangLuong-NeuralMachineTranslaHon 14

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 15

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 16

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 17

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 18

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 19

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 20

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 21

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Boundarymarker

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 22

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder

5/19/16 ThangLuong-NeuralMachineTranslaHon 23

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

NeuralMachineTransla/on(NMT)

am a student _ Je suis étudiant

Je suis étudiant _

I

Encoder Decoder

•  RecurrentNeuralNetworks:– ModelP(target|source)directly.– Canbetrainedend-to-end.

Boundarymarker

5/19/16 ThangLuong-NeuralMachineTranslaHon 24

WordEmbeddings

am a student _ Je suis étudiant

Je suis étudiant _

I

•  Oneforeachlanguage:canlearnfromscratch.

Sourceembeddings

Targetembeddings

5/19/16 ThangLuong-NeuralMachineTranslaHon 25

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/onsIniHalstates

•  Olensetto0.5/19/16 ThangLuong-NeuralMachineTranslaHon 26

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Encoder1stlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 27

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Encoder2ndlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 28

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

Decoder1stlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 29

am a student _ Je suis étudiant

Je suis étudiant _

I

RecurrentConnec/ons

Decoder2ndlayer

5/19/16 ThangLuong-NeuralMachineTranslaHon 30

•  Different:{1stlayer,2ndlayer}x{encoder,decoder}.

RecurrentUnits

•  Vanilla:

•  LSTM:

5/19/16 ThangLuong-NeuralMachineTranslaHon 31

RNN

LSTM

C’mon,it’sbeenaroundfor20years!

Vanishinggradientproblem!

SoTmax:vectors↦categories

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I

Solmaxparameters

5/19/16 ThangLuong-NeuralMachineTranslaHon 32

am a student _ Je suis étudiant

Je suis étudiant _

I

Targethiddenstate

|V|

SoTmax:vectors↦categories

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I

5/19/16 ThangLuong-NeuralMachineTranslaHon 33

am a student _ Je suis étudiant

Je suis étudiant _

I

|V|

•  Hiddenstates↦scores↦probabiliHes.

Scores

=

Probs

P(Je|…)exp&normalize

TrainingLoss

•  MaximizeP(target|source)

am a student _ Je suis étudiantI

Je suis étudiant _

5/19/16 ThangLuong-NeuralMachineTranslaHon 34

TrainingLoss

am a student _ Je suis étudiant

-log P(Je)

I

-log P(suis)

•  Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 35

TrainingLoss

am a student _ Je suis étudiant

-log P(Je)

I

-log P(suis)

•  Sumofallindividuallosses5/19/16 ThangLuong-NeuralMachineTranslaHon 36

TrainingLoss

•  Sumofallindividuallosses

am a student _ Je suis étudiantI

-log P(étudiant)

5/19/16 ThangLuong-NeuralMachineTranslaHon 37

TrainingLoss

•  Sumofallindividuallosses

am a student _ Je suis étudiantI

-log P(_)

5/19/16 ThangLuong-NeuralMachineTranslaHon 38

Backpropaga/onThroughTime

am a student _ Je suis étudiantI

-log P(_)

Initto0

5/19/16 ThangLuong-NeuralMachineTranslaHon 39

am a student _ Je suis étudiantI

-log P(étudiant)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 40

am a student _ Je suis étudiantI

-log P(étudiant)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 41

am a student _ Je suis étudiantI

-log P(suis)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 42

am a student _ Je suis étudiantI

-log P(suis)

Backpropaga/onThroughTime

5/19/16 ThangLuong-NeuralMachineTranslaHon 43

am a student _ Je suis étudiantI

Backpropaga/onThroughTime

RNNgradientsareaccumulated.5/19/16 ThangLuong-NeuralMachineTranslaHon 44

Trainingvs.Tes/ng

•  Training– CorrecttranslaHonsareavailable.

•  Tes8ng– Onlysourcesentencesaregiven.

am a student _ Je suis étudiant

Je suis étudiant _

I

am a student _ Je suis étudiant

Je suis étudiant _

I5/19/16 ThangLuong-NeuralMachineTranslaHon 45

Tes/ng

•  Feedthemostlikelyword5/19/16 ThangLuong-NeuralMachineTranslaHon 46

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 47

•  Feedthemostlikelyword

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 48

•  Feedthemostlikelyword

Tes/ng

5/19/16 ThangLuong-NeuralMachineTranslaHon 49

•  Feedthemostlikelyword

Tes/ng

Simplebeam-searchdecoders!5/19/16 ThangLuong-NeuralMachineTranslaHon 50

37

33.3

34.8

36.5

31

32

33

34

35

36

37

38

BLEU

SOTASMT(Durrani+,2014)

AvgSMT(Schwenk,2014)

NMT(Sutskever+,2014)

SMT+NMTrescore(Sutskever+,2014)

English-FrenchWMT’14results

5/19/16 ThangLuong-NeuralMachineTranslaHon 51

English-FrenchWMT’14results

37

33.3

34.8

36.5

31

32

33

34

35

36

37

38

BLEU

SOTASMT(Durrani+,2014)

AvgSMT(Schwenk,2014)

NMT(Sutskever+,2014)

SMT+NMTrescore(Sutskever+,2014)

5/19/16 ThangLuong-NeuralMachineTranslaHon 52

2decadesofresearch

1-2yearsofresearch

Encoder-decoderVariants

Encoder Decoder

(Sutskeveretal.,2014)MyNMTmodels DeepLSTM DeepLSTM

(Choetal.,2014)(Bahdanauetal.,2015)

(Jeanetal.,2015)

(BidirecHonal)GRU GRU

(Kalchbrenner&Blunsom,2013) CNN (InverseCNN)

RNN

Next,advancedNMT!5/19/16 ThangLuong-NeuralMachineTranslaHon 53

Deepfriedbaby Meatmusclestupidbeansprouts

Break/me:whenMTfails…

Saleofchickenmurder Gobacktowardyourbehind

5/19/16 ThangLuong-NeuralMachineTranslaHon 54

Limita/ons

•  #1:thevocabularysizeproblem– Goal:extendthevocabularycoverage.

•  #2:thesentencelengthproblem– Goal:translatelongsentencesbever.

•  #3:thelanguagecomplexityproblem– Goal:handlemorelanguagevariaHons.

5/19/16 ThangLuong-NeuralMachineTranslaHon 55

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 56

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

CVvs.NLP

ComputerVision

cat

1Kcategories 1Mcategories

NLP

mat

Thecatsatona

5/19/16 ThangLuong-NeuralMachineTranslaHon 57

#1TheVocabularySizeProblem

•  WordgeneraHonproblem– Vocabsaremodest:50K.– Simplesolmax:GPUfriendliness.

am a student _ Je suis étudiant

Je suis étudiant _

I

The<unk>porHcoin<unk>Le<unk><unk>de<unk>

TheecotaxporHcoinPont-de-BuisLeporHqueécotaxedePont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 58

•  Propose“copy”mechanismsfor<unk>.

•  Simple&effecHve– TreatanyNMTasablackbox.– Annotatetrainingdata.– Post-processtranslaHons.

ThangLuong*,IlyaSutskever*,QuocLe*,OriolVinyals,andWojciechZaremba.AddressingtheRareWordProbleminNeuralMachineTransla>on.ACL2015.

SOTAforEnglish-FrenchtranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 59

Ourapproach–trainingannota>on

•  AddrelaHveposiHons.

“AvenHon”forrarewords

TheecotaxporHcoinPont-de-Buis

LeporHqueécotaxedePont-de-Buis

•  Learnalignments.

The<unk>porHcoin<unk>

Leunk1unk-1deunk0

5/19/16 ThangLuong-NeuralMachineTranslaHon 60

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 61

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

Post-editTransla8on

Dic/onarytranslaHon

LeporHqueécotaxedePont-de-Buis

5/19/16 ThangLuong-NeuralMachineTranslaHon 62

Ourapproach–post-process

Testsentence

Transla8on

The<unk>porHcoin<unk>

LeporHqueunk-1deunk0

ecotax Pont-de-Buis

LeporHqueécotaxedePont-de-Buis

Iden/tycopy

Post-editTransla8on

Orthogonaltolarge-vocabtechniques5/19/16 ThangLuong-NeuralMachineTranslaHon 63

EffectsofTransla/ngRareWords

25

30

35

40

BLEU

Sentencesorderedbyaveragefrequencyrank

Durranietal.(37.0)

Sutskeveretal.(34.8)

Thiswork(37.5)

5/19/16 ThangLuong-NeuralMachineTranslaHon 64

FirstSOTANMTsystem!

Sampletransla/ons

•  Predictwelllong-distancealignments.– Correct:cataractvs.cataracte.

source AnaddiHonal2600operaHonsincludingorthopedicandcataractsurgerywillhelpclearabacklog.

human 2600opéraHonssupplémentaires,notammentdansledomainedelachirurgieorthopédiqueetdelacataracte,aiderontàravraperleretard.

trans Enoutre,unk1opéraHonssupplémentaires,dontlachirurgieunk5etlaunk6,permevrontderésorberl'arriéré.

trans+unk

Enoutre,2600opéraHonssupplémentaires,dontlachirurgieorthopédiquesetlacataracte,permevrontderésorberl'arriéré.

5/19/16 ThangLuong-NeuralMachineTranslaHon 65

Sampletransla/ons

•  Translatewelllongsentences.– Correct:JPMorganvs.JPMorgan.

sourceThistrader,RichardUsher,lelRBSin2010andisunderstandtohavebegivenleavefromhiscurrentposiHonasEuropeanheadofforexspottradingatJPMorgan.

humanCetrader,RichardUsher,aquivéRBSen2010etauraitétémissuspendudesonpostederesponsableeuropéendutradingaucomptantpourlesdeviseschezJPMorgan.

transCeunk0,Richardunk0,aquivéunk1en2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauunk5.

trans+unk

Cenégociateur,RichardUsher,aquivéRBSen2010etacomprisqu'ilestautoriséàquiversonposteactuelentantqueleadereuropéendumarchédespointsdeventeauJPMorgan.

5/19/16 ThangLuong-NeuralMachineTranslaHon 66

Sampletransla/ons

•  IncorrectalignmentpredicHon:was–étaitvs.abandonnait.

source ButconcernshavegrownalerMrMazangawasquotedassayingRenamowasabandoningthe1992peaceaccord.

human Maisl'inquiétudeagrandiaprèsqueM.MazangaadéclaréquelaRenamoabandonnaitl'accorddepaixde1992.

trans MaislesinquiétudessesontaccruesaprèsqueM.unkpos3adéclaréquelaunk3unk3l'accorddepaixde1992.

trans+unk

MaislesinquiétudessesontaccruesaprèsqueM.MazangaadéclaréquelaRenamoétaitl'accorddepaixde1992.

5/19/16 ThangLuong-NeuralMachineTranslaHon 67

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 68

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

#2TheSentenceLengthProblem

Problem:sentencemeaningisrepresentedbyafixed-dimensionalvector.

am a student _ Je suis étudiant

Je suis étudiant _

I

•  TranslaHonqualitydegradeswithlongsentences.

5/19/16 ThangLuong-NeuralMachineTranslaHon 69

Youcan'tcramthemeaningofawhole%&!$#sentenceintoasingle$&!#*vector!

5/19/16 ThangLuong-NeuralMachineTranslaHon 70

(AdaptedfromKyungHuynCho’talk)

Amen/onMechanism

•  SoluHon:randomaccessmemory– Retrieveasneeded.

am a student _ Je suis étudiant

Je suis étudiant _

I

Poolofsourcestates

5/19/16 ThangLuong-NeuralMachineTranslaHon 71

5/19/16 ThangLuong-NeuralMachineTranslaHon 72

DzmitryBahdanau,KyungHuynCho,andYoshuaBengio.NeuralMachineTransla>onbyJointlyLearningtoTranslateandAlign.ICLR2015.

WithavenHonWithoutavenHon

Amen/onMechanism

am a student _ Je

suis

I

Attention Layer

Context vector

?

Asimplifiedversionof(Bahdanauetal.,2015)5/19/16 ThangLuong-NeuralMachineTranslaHon 73

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

3

5/19/16 ThangLuong-NeuralMachineTranslaHon 74

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

53

5/19/16 ThangLuong-NeuralMachineTranslaHon 75

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

13 5

5/19/16 ThangLuong-NeuralMachineTranslaHon 76

•  Comparetargetandsourcehiddenstates.

Amen/onMechanism–Scoring

am a student _ Je

suis

I

Attention Layer

Context vector

?

13 5 1

5/19/16 ThangLuong-NeuralMachineTranslaHon 77

•  Convertintoalignmentweights.

Amen/onMechanism–Normaliza>on

am a student _ Je

suis

I

Attention Layer

Context vector

?

0.10.3 0.5 0.1

5/19/16 ThangLuong-NeuralMachineTranslaHon 78

am a student _ Je

suis

I

Context vector

•  Buildcontextvector:weightedaverage.

Amen/onMechanism–Contextvector

?

5/19/16 ThangLuong-NeuralMachineTranslaHon 79

am a student _ Je

suis

I

Context vector

•  Computethenexthiddenstate.

Amen/onMechanism–Hiddenstate

5/19/16 ThangLuong-NeuralMachineTranslaHon 80

am a student _ Je

suis

I

Context vector

•  Predictthenextword.

Amen/onMechanism–Predict

5/19/16 ThangLuong-NeuralMachineTranslaHon 81

•  ExaminevariousavenHonmechanisms:

ThangLuong,HieuPham,andChrisManning.Effec>veApproachestoAGen>on-basedNeuralMachineTransla>on.EMNLP2015.

Global:allsourcestates. Local:subsetofsourcestates.

SOTAforEnglish-GermantranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 82

TranslateLongSentences

10 20 30 40 50 60 7010

15

20

25

Sent Lengths

BLEU

����

ours, no attn (BLEU 13.9)ours, local−p attn (BLEU 20.9)ours, best system (BLEU 23.0)WMT’14 best (BLEU 20.7)Jeans et al., 2015 (BLEU 21.6)

NoAvenHon

AvenHon

5/19/16 ThangLuong-NeuralMachineTranslaHon 83

SampleEnglish-Germantransla/ons

•  Translatenamescorrectly.

source OrlandoBloomandMirandaKerrsHllloveeachother

human OrlandoBloomundMirandaKerrliebensichnochimmer

bestOrlandoBloomundMirandaKerrliebeneinandernochimmer.

baseOrlandoBloomundLucasMirandaliebeneinandernochimmer.

5/19/16 ThangLuong-NeuralMachineTranslaHon 84

SampleEnglish-Germantransla/ons

•  Translateadoubly-negatedphrasecorrectly•  Failtotranslate“passengerexperience”.

sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.

humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.

best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.

baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.

5/19/16 ThangLuong-NeuralMachineTranslaHon 85

sourceWeʹrepleasedtheFAArecognizesthatanenjoyablepassengerexperienceisnotincompa>blewithsafetyandsecurity,saidRogerDow,CEOoftheU.S.TravelAssociaHon.

humanWirfreuenuns,dassdieFAAerkennt,dasseinangenehmesPassagiererlebnisnichtimWider-spruchzurSicherheitsteht,sagteRogerDow,CEOderU.S.TravelAssociaHon.

best Wirfreuenuns,dassdieFAAanerkennt,dasseinangenehmesistnichtmitSicherheitundSicherheitunvereinbarist,sagteRogerDow,CEOderUS-die.

baseWirfreuenunsuberdie<unk>,dassein<unk><unk>mitSicherheitnichtvereinbaristmitSicherheitundSicherheit,sagteRogerCameron,CEOderUS-<unk>.

SampleEnglish-Germantransla/ons

•  Translateadoubly-negatedphrasecorrectly•  Failtotranslate“passengerexperience”.5/19/16 ThangLuong-NeuralMachineTranslaHon 86

TEDtalk,English-German

30.85

26.18 26.02 24.9622.51

20.08

0

5

10

15

20

25

30

35

BLEU

16.16

21.84 22.67 23.42

28.18

0

5

10

15

20

25

30

Stanford EdinburghKarlsruheHeidelberg PJAIT

HUMANTER

26%

ThangLuongandChrisManning.StanfordNeuralMachineTransla>onSystemsforSpokenLanguageDomain.IWSLT2015.

Winning

5/19/16 ThangLuong-NeuralMachineTranslaHon 87

AdvancingNMT

•  #1:thevocabularysizeproblem– Sol:“copy”mechanism.

•  #2:thesentencelengthproblem– Sol:avenHonmechanism.

•  #3:thelanguagecomplexityproblem– Sol:character-leveltranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 88

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

#3Therarewordproblem

•  “Copying”mechanismsarenotsufficient.– Differentalphabets:Christopher↦Kryštof– MulH-wordalignment:Solarsystem↦Sonnensystem

•  Needtohandlelarge,openvocabulary– Richmorphology:nejneobhospodařovávatelnějšímu

(“totheworstfarmableone”)–  Informalspelling:gooooooodmorning!!!!!

Beabletogenerateatthecharacterlevel.5/19/16 ThangLuong-NeuralMachineTranslaHon 89

Recentcharacter-levelNMT

•  UnsaHsfactoryperformance–  (WangLing,IsabelTrancoso,ChrisDyer,AlanBlack,arXiv2015)

•  IncompletesoluHon– Decoderonly(JunyoungChung,KyunghyunCho,YoshuaBengio.arXiv2016).

5/19/16 ThangLuong-NeuralMachineTranslaHon 90

•  Abest-of-both-worldsarchitecture:– Translatemostlyatthewordlevel– Onlygothecharacterlevelwhenneeded.

•  AddiHonal+2.1↦+11.4BLEUimprovement.

ThangLuongandChrisManning.AchievingOpenVocabularyNeuralMachineTransla>onwithHybridWord-CharacterModels.Insubmission,ACL2016.

SOTAforEnglish-CzechtranslaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 91

HybridNMT

Word-level(4layers)

End-to-endtraining8-stackingLSTMlayers.

5/19/16 ThangLuong-NeuralMachineTranslaHon 92

SourceRepresenta/ons

•  On-the-flyembeddings.– Zero-init,batchcomputaHon.

5/19/16 ThangLuong-NeuralMachineTranslaHon 93

TargetGenera/on Initwithword

hiddenstates.

Purposely,use<unk>tomakedecodingeasier.

5/19/16 ThangLuong-NeuralMachineTranslaHon 94

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3Largevocab+unkreplace

30xdata3systems

5/19/16 ThangLuong-NeuralMachineTranslaHon 95

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

Largevocab+unkreplace

30xdata3systems

•  Purelycharacter-based:slowbutpromising!

5/19/16 ThangLuong-NeuralMachineTranslaHon 96

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

OurhybridNMT

Singlemodel 19.6

Largevocab+unkreplace

30xdata3systems

NewSOTA!

5/19/16 ThangLuong-NeuralMachineTranslaHon 97

English-CzechWMT’15Results

Systems BLEUWinningentry(Bojar&Tamchyna,2015) 18.8

Exis8ngword-levelNMT(Jeanetal.,2015)

Singlemodel 15.7

Ensemble4models 18.3

Ourcharacter-basedNMT

Singlemodel(600-stepbackprop) 15.9

OurhybridNMT

Singlemodel 19.6

Ensemble4models 20.7

Largevocab+unkreplace

30xdata3systems

BemerSOTA!

5/19/16 ThangLuong-NeuralMachineTranslaHon 98

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

5/19/16 ThangLuong-NeuralMachineTranslaHon 99

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

AddiHonalgainsof+2.1↦+11.4BLEU

+11.4

+4.5+3.5

+2.1

5/19/16 ThangLuong-NeuralMachineTranslaHon 100

EffectsofVocabularySizes

02468101214161820

1K 10K 20K 50K

BLEU

VocabularySize

Word Word+unkreplace Hybrid

Small-vocabhybrid=Large-vocabword5/19/16 ThangLuong-NeuralMachineTranslaHon 101

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 102

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 103

RareWordEmbeddings

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

acceptable

acknowledgement

admissionadmitadmittanceadmitting

advance

antagonistchoose

chooses

connect

decide

developdevelopments

evidentlyexplicit

founder

governance

immobileimmoveable

impossible

insensitiveinsufficiency

linkmanagement

necessary

nominated

noticeable

obvious

perceptible

possible

practice

satisfactory

sponsor

unacceptable

unaffected

uncomfortableunsatisfactory

unsuitable

antagonize

cofounderscompanionships

disrespectful

heartlesslyheartlessness

illiberal

impossibilitiesinabilities

loveless

narrowïmindednarrowïmindedness nonconscious

regretful

spiritless

unattainableness

unconcern

uncontroversial

unfeatheredunfledged

ungraceful

unrealizable

unsighted

untrustworthy

wholeheartedness

•  Word&character-basedembeddings.5/19/16 ThangLuong-NeuralMachineTranslaHon 104

SampleEnglish-Czechtransla/ons

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

PerfecttranslaHon!

5/19/16 ThangLuong-NeuralMachineTranslaHon 105

SampleEnglish-Czechtransla/ons

•  Char-based:wrongnametranslaHon.•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 106

SampleEnglish-Czechtransla/ons

•  Word-based:incorrectalignment•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 107

SampleEnglish-Czechtransla/ons

•  Char-based&hybrid:correcttranslaHonofdiagnóze.•  Char-based:wrongnametranslaHon.

source TheauthorStephenJayGoulddied20yearsalerdiagnosis.human AutorStephenJayGouldzemřel20letpodiagnóze.

char AutorStepherStepherzemřel20letpodiagnóze.

wordAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpopo.

hybridAutorStephenJay<unk>zemřel20letpo<unk>.

AutorStephenJayGouldzemřel20letpodiagnóze.

5/19/16 ThangLuong-NeuralMachineTranslaHon 108

SampleEnglish-Czechtransla/ons

source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:

wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:

hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

Correctreordering!

5/19/16 ThangLuong-NeuralMachineTranslaHon 109

source AstheReverendMar>nLutherKingJr.saidfiTyyearsago:human Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

char JakoreverendMar/nLutherkrálříkalpředpadesá>lety:

wordJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

JakřeklreverendMar/nLutherKingřeklpředpadesá>lety:

hybridJakřeklreverendMarHn<unk>King<unk>předpadesáHlety:

Jakpředpadesá/letyřeklreverendMar/nLutherKingJr.:

SampleEnglish-Czechtransla/ons

•  Char-based:“král”means“king”.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 110

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

Generatecomplexwords!

5/19/16 ThangLuong-NeuralMachineTranslaHon 111

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

•  Word-based:idenHtycopyfails.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 112

SampleEnglish-Czechtransla/ons

source Her11-year-olddaughter,ShaniBart,saiditfeltalivlebitweird

human Jejíjedenác/letádceraShaniBartováprozradila,žejetotrochuzvláštní

char Jejíjedenác/letádcera,ShaniBartová,říkala,žecí�trochudivně

wordJejí<unk>dcera<unk><unk>řekla,žejetotrochudivné

Její11-year-olddceraShani,řekla,žejetotrochudivné

hybridJejí<unk>dcera,<unk><unk>,řekla,žejeto<unk><unk>

Jejíjedenác/letádcera,GrahamBart,řekla,žecí�trochudivný

•  Hybrid:translatenamesincorrectly.•  Char-based:wrongnametranslaHon.5/19/16 ThangLuong-NeuralMachineTranslaHon 113

WehaveadvancedNMT

•  #1:thevocabularysizeproblem–  Sol:“copy”mechanism.–  SOTAEnglish-French

•  #2:thesentencelengthproblem–  Sol:avenHonmechanism.–  SOTAEnglish-German

•  #3:thelanguagecomplexityproblem–  Sol:character-leveltranslaHon.–  SOTAEnglish-Czech

5/19/16 ThangLuong-NeuralMachineTranslaHon 114

The<unk>porHcoin<unk>

Le<unk><unk>de<unk>

ecotax Pont-de-Buis

por8que écotaxe Pont-de-Buis

am a student _ Je suis étudiant

Je suis étudiant _

I

NMT&beyond

•  UnsupervisedlearningforNMT– UHlizemonolingualdata.

•  Long-contextNMT–  TranslaHnganarHcle/abook.–  SmarteravenHon,longersequences.

•  MulH-modalLanguageUnderstandingSystem– MulH-lingualtranslaHon+speechrecogniHon+more– MulH-tasklearning

Thankyou!

5/19/16 ThangLuong-NeuralMachineTranslaHon 115

I’mdone.Butifyouarecurious,readon!

5/19/16 ThangLuong-NeuralMachineTranslaHon 116

•  CanweuHlizeallsequence-to-sequencedata?

•  CanwecompressNMTformobiledevices?

#4ForthefutureofNMT

MachinetranslaHon ConsHtuentparsing

5/19/16 ThangLuong-NeuralMachineTranslaHon 117

Ourwork

•  MulH-tasklearning:– MachinetranslaHon –ConsHtuentparsing–  ImagecapHongeneraHon–Unsupervisedlearning

•  TranslaHonimprovement:upto+1.5BLEU.

•  State-of-the-artinconsHtuentparsing.

ThangLuong,QuocLe,IlyaSutskever,OriolVinyals,andLukaszKaiser.Mul>-tasksequencetosequencelearning.ICLR2016.

5/19/16 ThangLuong-NeuralMachineTranslaHon 118

Many-to-one:shareddecoder

English (unsupervised)

Image (captioning) English

German (translation)

1616.517

17.518

18.519

German-EnglishTransla/on(BLEU)

Big(transla>on)+Medium(cap>on)

(Luongetal.,2015)

Single

+capHon(0.1x)

+capHon(0.05x)

+capHon(0.01x)

+0.7BLEU

5/19/16 ThangLuong-NeuralMachineTranslaHon 119

1313.514

14.515

15.516

English-GermanTransla/on(BLEU)

Big(transla>on)+Small(PTBparsing)

(Luongetal.,2015)

Single

+parsing(1x)

+parsing(0.1x)

+parsing(0.01x)

+1.5BLEU

English (unsupervised)

German (translation)

Tags (parsing)English

One-to-many:sharedencoder

MixingraHo5/19/16 ThangLuong-NeuralMachineTranslaHon 120

Ourwork

AbigailSee*,ThangLuong*,andChrisManning.CompressionofNeuralMachineTransla>onModelsviaPruning.Insubmission.

•  CompressNMTviapruning&retraining:

0 10 20 30 40 50 60 70 80 90

0

5

10

15

20

25

percentage pruned

BLEU

score

pruned

pruned and retrained

Figure 1: Performance of pruned models, immediately after pruning and after

retraining.

1

Originalmodel

Prunesmallestweights

Prune+retrain

5/19/16 ThangLuong-NeuralMachineTranslaHon 121

Ourwork•  CompressNMTviapruning&retraining:

0 10 20 30 40 50 60 70 80 90

0

5

10

15

20

25

percentage pruned

BLEU

score

pruned

pruned and retrained

Figure 1: Performance of pruned models, immediately after pruning and after

retraining.

1

Originalmodel

Prunesmallestweights

Prune+retrain

Prune80%withoutlossofperformance.5/19/16 ThangLuong-NeuralMachineTranslaHon 122

NMTRedundancy–Embeddings

•  Frequentwordshavelargerweights– white:large.– black:small.

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

5/19/16 ThangLuong-NeuralMachineTranslaHon 123

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

5/19/16 ThangLuong-NeuralMachineTranslaHon 124

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

Inputsignalisimportant!

5/19/16 ThangLuong-NeuralMachineTranslaHon 125

target embedding weights

source embedding weights

least common wordmost common word

source layer 1 weights

recurrentfeed-forward

input gate

forget gate

output gate

input

source layer 2 weights source layer 3 weights source layer 4 weights

target layer 1 weights target layer 2 weights target layer 3 weights target layer 4 weights

NMTRedundancy–LSTMLayer1 Layer2 Layer3 Layer4

Encoder

Decoder

Inputgate

Forgetgate

Outputgate

Inputsignal

Feed-forward Recurrent

Forgetgate:small–layer1large–layer4

5/19/16 ThangLuong-NeuralMachineTranslaHon 126

FutureChallenges

Shesawanelephantinherdress.Theelephantmusthaveagood

senseoffashion!Needstounderstand

commonsense&largercontext.

Shesawanelephantinherdress.

5/19/16 ThangLuong-NeuralMachineTranslaHon 127

Thankyou!

top related