
Final Class

Announcements
• Take a few minutes for the evaluation at the beginning of class
• Grades: expect to post final grades by 12/23
• Several funded GRA positions open to MS students next spring – if interested, email me.
• HW3 grades returned today
• Final exam will be in two locations. You will receive an email about where.
• DO NOT ANSWER POLL EVERYWHERE AHEAD OF TIME

HW4
• Instructions for submitting: The directory structure on the Bitbucket repo should be exactly the same as the hw4.zip provided to you (with the exception of the data directory. Do not upload it). To push the code to the remote repo, use the same instructions as given in HW0. Double-check your remote repo for the correct directory structure. We won't consider any regrade requests based on a wrong-directory-structure penalty. Again, do not upload data to your Bitbucket repo.
• HW4: Use log of weights in attention to get a slightly better visualization. See the post by Dheeraj.
• Apoorv has office hours TODAY 4-6 for those with questions.
• Written part: Not just talking about small extensions to the neural net. Think big.

Projects in NLP
• Search and summarization over low-resource languages
  • How do we summarize in the source? How do we summarize when translation is bad? How do we summarize speech?
• Identifying aggression and loss in posts from gang-involved youth
  • Can we identify patterns in posts over time? Can we use the social network? Can we identify references to triggering events?
• Identifying hate speech
  • Bullying, threats against journalists, how does culture affect interpretation?
• Joint use of visual and textual cues
  • To identify sentiment towards targets, to identify events

Reflections

Today
• Poetry Generation
• Review for final

Poetry Generation
• "Generating Topical Poetry" (Ghazvininejad et al., 2016), https://aclweb.org/anthology/D16-1126
• Hafez – generates sonnets on a user-provided topic
  • Iambic pentameter
  • Every other line rhymes
• Rough overview of methods plus output

System Overview
• Select a large vocabulary and compute stress patterns
• Select words related to the user-supplied topic
• Select pairs of rhyming words to end lines
• Build an FSA with a path for every conceivable sequence of vocabulary words that obeys the formal rhythm constraints, with rhyme words in place
• Select a fluent path through the FSA using an RNN for scoring

(Ghazvininejad et al., 2016)

Vocabulary
• Iambic pentameter: ten syllables alternating between stressed and unstressed
  • Attending on his golden pilgrimage
    0101010101
• Use the CMU pronunciation dictionary
  • Remove words whose stress pattern does not match the iambic pattern
• Remove ambiguous words (record as a noun: 10; record as a verb: 01)
  • Avoids "to", "it", "in", "is"
• Final vocabulary: 14,368 words
  • 4,833 monosyllabic
  • 9,535 multisyllabic

Selecting topically related words
• User supplies a topic: colonel
• Output: colonel, lieutenant_colonel, brigadier_general, commander, army
• Use word2vec with a window size of 40
• Word embedding vector for the topic word or phrase
• Word embeddings for each vocabulary word
• How would similarity be computed? (see the sketch below)
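The usual answer to the question above is cosine similarity between the topic embedding and each vocabulary word embedding. A minimal sketch (my own illustration, not the authors' code); `topic_vec`, `vocab_vecs` and `vocab_words` are assumed to come from a trained word2vec model:

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def related_words(topic_vec, vocab_vecs, vocab_words, k=5):
    # Score every vocabulary word against the topic and return the top k.
    scores = [(cosine(topic_vec, v), w) for w, v in zip(vocab_words, vocab_vecs)]
    return [w for _, w in sorted(scores, reverse=True)[:k]]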

Rhyme words
• Shakespearean sonnet: ABAB CDCD EFEF GG
• Strict rhyme: the sounds of two words must match from the last stressed vowel onwards
  • Masculine rhyme: the last syllable is stressed
  • Feminine rhyme: the penultimate syllable is stressed
  • Pre-compute strict rhyme classes for words and hash the vocabulary into those classes (CMU pronunciation dictionary)
• Slant rhymes: viking/fighting, snoopy/spooky, baby/crazy and comic/ironic

Rhyme word selection
• Hash all related words/phrases into rhyme classes
• Each collision generates a candidate rhyme pair (s1, s2)
• Score the pair with max: cosine(s1, topic); cosine(s2, topic)
• Choose rhyme pairs randomly with probability proportional to their score (sketch below)
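A minimal sketch of the scoring-and-sampling step (my illustration, not the paper's code); `cosine` is assumed from the previous sketch and `vecs` is a hypothetical word-to-embedding lookup:

import random

def choose_rhyme_pairs(pairs, topic_vec, vecs, n_pairs=7):
    # pairs: list of (word1, word2) drawn from the same rhyme class
    scored = [(max(cosine(topic_vec, vecs[a]), cosine(topic_vec, vecs[b])), (a, b))
              for a, b in pairs]
    weights = [s for s, _ in scored]
    candidates = [p for _, p in scored]
    # random.choices samples proportionally to the weights (with replacement;
    # deduplication logic is omitted for brevity).
    return random.choices(candidates, weights=weights, k=n_pairs)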

FSA Construction
• Create a large FSA that encodes all word sequences that use the selected rhyme pairs and obey the formal sonnet constraints
  • Contains 14 lines
  • Lines are in iambic pentameter with stress pattern (01)^5 or (01)^5 0 (feminine)
  • Each line ends with a chosen rhyme word/phrase
  • Each line is punctuated with a comma or period, except for the 4th, 8th, 12th and 14th, which have a period

FSA Output
• Topic: natural language
• Contains ~10^229 paths
• Randomly selected path:
  • Of pocket solace ammunition grammar.
  • An tile pretenders spreading logical.
  • An stories Jackie gallon posing banner.
  • An corpses Kato biological …

Path extraction through FSA with RNN
• Need a scoring function and a search procedure
  • RNN "generation model"
• Two-layer LSTM with beam search guided by the FSA
  • Beam search state: (h, s, word, score)
  • h: the hidden state of the LSTM at step t in the ith state
  • s: the FSA state at step t in the ith state
  • Generates one word at each step
• Trained on song lyrics -> repeating words (never ever ever ever ever)
  • Apply a penalty to words already generated
• A beam of 50 often results in not being able to generate the final rhyming word in the FSA
  • Generate the whole sonnet in reverse (see the sketch below)
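A highly simplified sketch of beam search constrained by an FSA (my own illustration, not the authors' implementation). The FSA is assumed to be a dict mapping (state, word) to the next state, and `lm_logprob(history, word)` stands in for the LSTM's score of the next word:

import heapq

def fsa_beam_search(fsa, start, finals, lm_logprob, max_len=10, beam_size=50):
    # Each beam item: (score, fsa_state, words_so_far)
    beam = [(0.0, start, [])]
    for _ in range(max_len):
        candidates = []
        for score, state, words in beam:
            # Only words allowed by the FSA from this state are considered,
            # so meter and rhyme constraints can never be violated.
            for (s, w), nxt in fsa.items():
                if s == state:
                    candidates.append((score + lm_logprob(words, w), nxt, words + [w]))
        if not candidates:
            break
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished = [b for b in beam if b[1] in finals]
    return max(finished, key=lambda c: c[0])[2] if finished else None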

Translation model
• Use an encoder-decoder LSTM
  • Assemble rhyming words in reverse order (encoder side)
  • Paired with the entire reversed lyric (decoder side)
  • At generation time: put all selected rhyme words on the source side and let the model generate the poem conditioned on those rhyme words
• When generating the last line, it already knows all 14 rhyme words

Results
• Translation model better than generation model
• Encouraging topic words better than not
• Does the system plagiarize?
  • Average copying of 1.2 5-grams per sonnet
  • If the repeated-word penalty is relaxed -> 7.9 copied 5-grams
  • If iambic meter is relaxed -> 10.6 copied 5-grams

Bipolar Disorder
• Existence enters your entire nation.
  A twisted mind reveals becoming manic,
  An endless modern ending medication,
  Another rotten soul becomes dynamic.
• Or under pressure on genetic tests.
  Surrounded by controlling my depression,
  And only human torture never rests,
  Or maybe you expect an easy lesson.
• Or something from the cancer heart disease,
  And I consider you a friend of mine.
  Without a little sign of judgement please,
  Deliver me across the borderline.
• An altered state of manic episodes,
  A journey through the long and winding roads.

Other topics
• Love at First Sight
  An early morning on a rainy night,
  Relax and make the other people happy,
  Or maybe get a little out of sight,
  And wander down the streets of Cincinnati.

Girlfriend
• Another party started getting heavy.
  And never had a little bit of Bobby,
  Or something going by the name of Eddie,
  And got a finger on the trigger sloppy.

Noodles
• The people wanna drink spaghetti alla,
  And maybe eat a lot of the other crackers,
  Or sit around and talk about the salsa,
  A little bit of nothing really matters.

Final Review
• Final exam will be in two locations. You will receive an email about where.
• Today: after-midterm material only. Look at the midterm review to review earlier topics.
• Final will be cumulative
  • Some emphasis towards the last half of the class
  • Some emphasis towards topics not tested in homeworks
  • Anything covered in class is a potential topic for the exam
• Calculator allowed in the final. No other electronics. No notes or books.
• Three review sessions will be offered by TAs
  • Bhavana, December 13th, Neural Net basics, HW3
  • Elsbeth, December 18th, evening
  • Fei-Tzin, December 19th, evening
• All office hours will be held between now and the final

Today
• Semantics
• RNN, LSTM, Attention
• Summarization
• Machine Translation
• Questions

Abstract Meaning Representation
• Given a sentence, propose a representation
• Given a representation, provide the sentence
• Understand the parsing framework
• Discuss pros and cons

AMR characteristics
• Rooted, labeled graphs
• Abstract away from syntactic differences
  • He described her as a genius
  • His description of her: genius
  • She was a genius according to his description
• Use PropBank framesets
  • "bond investor": invest-01
• Heavily biased towards English

AMR relations
• ~100 relations
• Frame arguments
  • arg0, arg1, arg2, arg3, arg4, arg5 (PropBank)
• General semantic relations
  • :accompanier, :age, :beneficiary, :cause, :compared-to, :concession, :condition, :consist-of, :degree, :destination, :direction, :domain, :duration, :employed-by, :example, :extent, :frequency, :instrument, :li, :location, :manner, :medium, :mod, :mode, :name, :part, :path, :polarity, :poss, :purpose, :source, :subevent, :subset, :time, :topic, :value
• Relations for quantity
  • :quant, :unit, :scale
• Relations for date entities
  • :day, :month, :year, :weekday, :time, :timezone, :quarter, :dayperiod, :season, :year2, :decade, :century, :calendar, :era
• Relations for lists
  • :op1, :op2, …, :op10
• Plus inverses (e.g., :arg0-of, :location-of)


NOT NECESSARY TO MEMORIZE – WOULD BE PROVIDED

Framesets
• Examples of using framesets to abstract away from English syntax
• (d / describe-01
    :arg0 (m / man)
    :arg1 (m2 / mission)
    :arg2 (d2 / disaster))
• :arg0 the describer, :arg1 the thing described, :arg2 what it is described as
• The man described the mission as a disaster. As the man described it, the mission was a disaster.

Questions
• amr-unknown to indicate wh-questions
• (f / find-01
    :arg0 (g / girl)
    :arg1 (a / amr-unknown))
  What did the girl find?

Compositionality
• The meaning of the whole is equal to the sum of the meanings of its parts
• How is AMR compositional?
  (d / describe-01
    :arg0 (m / man)
    :arg1 (m2 / mission)
    :arg2 (d2 / disaster))
• (s / spy
    :arg0-of (a / attract-01))
• What is the AMR for "the attractive spy described the mission as a disaster"?

Learning to Search (L2S)
• Family of approaches that solves structured prediction problems
  • Decomposes the production of the structured output in terms of an explicit search space
  • Learns hypotheses that control a policy that takes actions in the search space
• AMR is a structured semantic representation
• Model learning of concepts and relations in a unified setting.

AMR parsing task decomposed
• Predicting concepts
• Predicting the root
• Predicting relations between predicted concepts

Search space
• States = {x1, x2, ..., xn, y1, y2, ..., yi-1}, where the input {x1, x2, ..., xn} is the n words of the sentence
• Concept prediction: labels y1, y2, ..., yi-1 are the concepts predicted up to i-1.
  • Next action: yi is the concept for word xi, chosen from a k-best list of concepts
• Relation prediction: labels are relations for predicted pairs of concepts
• Root prediction: a multi-task classifier selects the root concept from all predicted concepts

Example

Word Embeddings, Distributional Semantics, Word Disambiguation, Text Similarity

Topics to know
• How to do word disambiguation
• Distributed vs distributional representations
• How to compute text similarity
• What word embeddings capture

Main Idea of word2vec
• Predict between every word and its context
• Two algorithms
  • Skip-gram (SG): predict context words given the target (position independent)
  • Continuous Bag of Words (CBOW): predict the target word from its bag-of-words context

Slide adapted from Chris Manning

Training Methods
• Two (moderately efficient) training methods
  • Hierarchical softmax
  • Negative sampling
• Today: naïve softmax

Slide adapted from Chris Manning

[Figure: two example sentences, each showing a center word with a context window of two words on each side – "Instead, a bank can hold the investments in a custodial account" and "But as agriculture burgeons on the east bank, the river will shrink", both centered on "bank".]

Objective Function
• Maximize the probability of context words given the center word

$$J'(\Theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} P(w_{t+j} \mid w_t; \Theta)$$

• Negative log likelihood:

$$J(\Theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t)$$

where Θ represents all variables to be optimized.

Slide adapted from Chris Manning

• What are the parameters in the objective function? What are we learning?

Softmax using word c to obtain the probability of word o
• Convert P(w_{t+j} | w_t):

$$P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}$$

(exponentiate to make positive, then normalize), where o is the outside (or output) word index, c is the center word index, and v_c and u_o are the center and outside vectors for indices c and o. A small numeric sketch follows.
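A minimal numpy sketch of the skip-gram softmax above (illustration only; the center vectors V and outside vectors U are hypothetical toy matrices):

import numpy as np

np.random.seed(0)
vocab_size, dim = 5, 3
V = np.random.randn(vocab_size, dim)   # center-word vectors v_w
U = np.random.randn(vocab_size, dim)   # outside-word vectors u_w

def p_outside_given_center(o, c):
    scores = U @ V[c]                  # u_w^T v_c for every w
    scores -= scores.max()             # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_outside_given_center(o=2, c=0))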

Slide adapted from Chris Manning

Softmax
Slide from Dragomir Radev
Slide from Kapil Thadani
Slide from Chris Manning

Question
• What are we learning?
• How is the loss computed?

Neural Nets
• Basic architecture of a feedforward neural network
• Loss and gradient descent
• Softmax
• Backpropagation
• Determining dimensions of parameters, input and output (basically, all HW3 questions)
• Recurrent Neural Network
• LSTM

Review Session – Bhavana
• HW3 answers plus the basics of neural net architectures

RNN – I had in mind your facts, buddy, not hers.
[Diagram: an RNN unrolled over the input "I had in mind …"; at each step the word xt is multiplied by the weight matrix wx, combined with the previous hidden state ht-1 through the weight matrix U (later written wh), and passed through σ to give ht; the final hidden state feeds a sigmoid output y3.]


• W are the weights: the word embedding matrix; multiplication with xt yields the embedding for xt. U is another weight matrix. h0 is often not specified. h is the hidden layer.


$h_t = \sigma(W_x x_t + U h_{t-1})$ (numpy sketch below)
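A minimal numpy sketch of the RNN update above (illustration only; the shapes and random initializations are arbitrary toy choices, not the course's HW code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

dim_x, dim_h = 4, 3
np.random.seed(1)
Wx = np.random.randn(dim_h, dim_x)   # maps the input word vector x_t
U  = np.random.randn(dim_h, dim_h)   # maps the previous hidden state h_{t-1}

def rnn_step(x_t, h_prev):
    # h_t = sigma(Wx x_t + U h_{t-1})
    return sigmoid(Wx @ x_t + U @ h_prev)

h = np.zeros(dim_h)                      # h0, often just initialized to zeros
for x_t in np.random.randn(5, dim_x):    # five toy "word embeddings"
    h = rnn_step(x_t, h)
print(h)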


• y = positive? y = negative? The final embedding is run through the sigmoid function -> [0, 1]; 1 = positive, 0 = negative. Often the final h is used as the embedding for the sentence.

Questions
• How is h computed?
• What parameters are learned?
• How is y predicted?
• What are the problems with an RNN?

Updating Parameters of an RNN

[Diagram: the same unrolled RNN, now with an output weight matrix wy producing y3 through a sigmoid, and a cost computed from y3.]

Backpropagation through time: gold label = 0 (negative); adjust weights using the gradient; repeat many times with all examples.
Slide from Radev

Question
Question 30: Suppose you are given the following step function:

def step(x_t, h_tm1, c_tm1):
    u_t = T.nnet.sigmoid(T.dot(params["Wx"], x_t) + T.dot(params["Wh"], h_tm1))
    # Calculate the input gate
    i = T.nnet.sigmoid(T.dot(params["Wxi"], x_t) + T.dot(params["Whi"], h_tm1))
    # Calculate the forget gate
    f = T.nnet.sigmoid(T.dot(params["Wxf"], x_t) + T.dot(params["Whf"], h_tm1))
    # Calculate the output gate
    o = T.nnet.sigmoid(T.dot(params["Wxo"], x_t) + T.dot(params["Who"], h_tm1))
    # Find the memory cell value for the current time step
    c_t = f * c_tm1 + i * u_t
    # Find the hidden value for the current time step
    h_t = o * T.tanh(c_t)
    return h_t, c_t

Assume that T.nnet.sigmoid applies the sigmoid function, T.tanh applies the tanh function, and T.dot(A, B) computes the dot product of A and B. The * operator performs elementwise multiplication when applied to two vectors. params is a dictionary of parameters that have already been initialized.

Which deep learning architecture does this function belong to?
a. Recursive Neural Network
b. Gated Recurrent Unit
c. Long Short Term Memory Network
d. Convolutional Neural Network

Gated Architectures
• RNN: at each state of the architecture, the entire memory state (h) is read and written
• Gate = binary vector g ∈ {0, 1}^n
  • Controls access to the n-dimensional vector x: x ⊙ g
• Consider s' ← g ⊙ x + (1 − g) ⊙ s
  • Reads entries from x specified by g
  • Copies remaining entries from s (or h, as we've been labeling the hidden state)
• Example: the gate copies from positions 2 and 5 in the input; the remaining elements are copied from memory (see the sketch below)
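A tiny numpy illustration (my own example) of the gate described above: g selects positions 2 and 5 from the input x, and everything else is copied from the memory state s.

import numpy as np

x = np.array([10, 20, 30, 40, 50, 60])   # new input
s = np.array([ 1,  2,  3,  4,  5,  6])   # current memory state
g = np.array([ 0,  1,  0,  0,  1,  0])   # binary gate (positions 2 and 5, 1-indexed)

s_new = g * x + (1 - g) * s
print(s_new)   # -> [ 1 20  3  4 50  6]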

LSTM Solution
§ Use a memory cell to store information at each time step.
§ Use "gates" to control the flow of information through the network.
  § Input gate: protect the current step from irrelevant inputs
  § Output gate: prevent the current step from passing irrelevant outputs to later steps
  § Forget gate: limit information passed from one cell to the next
[slides from Catherine Finegan-Dollak]

Transforming RNN to LSTM

$u_t = \sigma(W_h h_{t-1} + W_x x_t)$

[Diagram: the RNN update relabeled: wx and wh combine x1 and h0 through σ to produce the candidate u1; a cell state c0 is introduced alongside.]
[slides from Catherine Finegan-Dollak]


Transforming RNN to LSTM

$c_t = f_t \odot c_{t-1} + i_t \odot u_t$

[Diagram: the candidate u1 is gated by the input gate i1, the previous cell state c0 is gated by the forget gate f1, and their sum gives the new cell state c1.]
[slides from Catherine Finegan-Dollak]



Summarization
• Extractive vs abstractive summarization
• Indicative vs informative summary
• Single-document vs multi-document
• Generic vs user-focused

Extraction methods
• Topic signature words
• Graph-based methods

Topic Signature Words
• Uses the log-likelihood ratio test to find words that are highly descriptive of the input
• The log-likelihood ratio test provides a way of setting a threshold to divide all words in the input into descriptive or not:
  • H1: the probability of a word in the input is the same as in the background
  • H2: the word has a different, higher probability in the input than in the background
• A binomial distribution is used to compute the ratio of the two likelihoods
• The sentences containing the highest proportion of topic signatures are extracted.

Log likelihood ratio
[Formula shown on slide.] Counts with subscript i occur in the input corpus and those with subscript B occur in the background corpus; the probability (p) is of w occurring k times in N Bernoulli trials. The statistic −2 log λ has a known statistical distribution: chi-squared. (A sketch of the standard formulation is below.)
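The formula itself did not survive the transcript; the following is a sketch of the standard log-likelihood ratio formulation (following Dunning, 1993, as used for topic signatures by Lin and Hovy, 2000), consistent with the caption above:

$$\lambda = \frac{b(k_i;\, n_i,\, p)\; b(k_B;\, n_B,\, p)}{b(k_i;\, n_i,\, p_i)\; b(k_B;\, n_B,\, p_B)}, \qquad b(k;\, n,\, p) = \binom{n}{k} p^k (1-p)^{n-k},$$

with $p = \frac{k_i + k_B}{n_i + n_B}$, $p_i = \frac{k_i}{n_i}$, $p_B = \frac{k_B}{n_B}$, and $-2\log\lambda \sim \chi^2$.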

Graph-based methods
• Sentence similarity is measured as a function of word overlap
  • Frequently occurring words link many sentences
  • Similar sentences give support for each other's importance
• Input represented as a highly connected graph
  • Vertices represent sentences
  • Edges between sentences weighted by the similarity between the two sentences
• Cosine similarity with TF*IDF weights for words

Sentence Selection
• Vertex importance (centrality) computed using graph algorithms
  • Edge weights normalized to form a probability distribution -> Markov chain
  • Compute the probability of being in each vertex of the graph at time t while making consecutive transitions from one vertex to the next
  • As more transitions are made, the probability of each vertex converges -> stationary distribution
• Vertices with higher probability = more important sentences (see the sketch below)
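A minimal sketch (illustration only, not a full LexRank implementation): normalize a sentence-similarity matrix into a Markov chain and power-iterate to its stationary distribution; higher-probability vertices correspond to more important sentences. The toy similarity matrix is made up.

import numpy as np

def stationary_distribution(sim, n_iters=100):
    # sim: symmetric matrix of pairwise sentence similarities (e.g., TF*IDF cosine)
    P = sim / sim.sum(axis=1, keepdims=True)   # row-normalize -> transition matrix
    p = np.full(len(sim), 1.0 / len(sim))      # start from the uniform distribution
    for _ in range(n_iters):
        p = p @ P                              # one Markov-chain transition
    return p

toy_sim = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
print(stationary_distribution(toy_sim))        # sentences 0 and 1 outrank sentence 2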

Abstractive summarization
• What is compression?
• What is fusion?
• What traditional method might I use for a supervised compression system?

Dataset for compression (~3,000 sentence pairs)
Clarke & Lapata (2008)
Input
• Italian air force fighters scrambled to intercept a Libyan airliner flying towards Europe yesterday as the United Nations imposed sanctions on Libya for the first time in Col Muammar Gaddafi's turbulent 22 years in power.
Compression
• Italian air force fighters scrambled to intercept a Libyan airliner as the United Nations imposed sanctions on Libya.

Text-to-Text Generation Model
Text transformation as a structured prediction problem
• Input: one or more sentences with parses
• Output: a single sentence + parse
Joint inference over:
• word choice,
• n-gram ordering,
• dependency structure
Thadani & McKeown, CoNLL 2013

Slides from Thadani

Compression
• Input: single sentence
• Output: sentence with salient information
• Dataset + baseline from Clarke & Lapata (2008)

What about compression using neural nets?
• Dataset: Daily Mail highlights

Neural Summarization Architecture
• Hierarchical document reader
  • Derive a meaning representation of the document from its constituent sentences
• Attention-based hierarchical content extractor
• Encoder-decoder architecture

Document Reader
• CNN sentence encoder
  • Useful for sentence classification
  • Easy to train
• LSTM document encoder
  • Avoids vanishing gradients

Cheng and Lapata (2016)

Types of summarization evaluation
• Automated: ROUGE scores
• Manual: Pyramid scores
  • What are they?
• Task-based evaluation
  • Does a summary help you perform a research task better?

Machine Translation
• Challenges for multilingual translation
• What is the MT pyramid?
• What are the different trained models used in the IBM model?
• What is phrase-based MT?

Statistical MT: IBM Model (Word-based Model)
http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf

IBM's EM-trained models (1-5)
• Word translation
• Local alignment
• Fertilities
• Class-based alignment
• Re-ordering
All are separate models to train!

Model 1: $p(f, a \mid e) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} p(f_j \mid e_{a_j})$, where m is the length of the foreign sentence and l the length of the English sentence. (A toy EM sketch for Model 1 follows.)
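A toy sketch of EM training for Model 1's word-translation probabilities t(f | e) (my own illustration; no NULL word, no smoothing, and the sentence pairs are made up):

from collections import defaultdict

pairs = [("the house".split(), "das haus".split()),
         ("the book".split(), "das buch".split()),
         ("a book".split(),  "ein buch".split())]

# Initialize t(f|e) uniformly over the foreign vocabulary
f_vocab = {f for _, fs in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts c(e)
    for es, fs in pairs:
        for f in fs:                     # E-step: posterior over alignments of f
            z = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / z
                count[(f, e)] += delta
                total[e] += delta
    for (f, e), c in count.items():      # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3), round(t[("buch", "book")], 3))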

Phrase-Based Statistical MT
• Foreign input segmented into phrases
  – "phrase" is any sequence of words
• Each phrase is probabilistically translated into English
  – P(to the conference | zur Konferenz)
  – P(into the meeting | zur Konferenz)
• Phrases are probabilistically re-ordered
See [Koehn et al., 2003] for an intro. This was state-of-the-art before neural MT.

Morgen fliege ich nach Kanada zur Konferenz

Tomorrow I will fly to the conference in Canada

Slide courtesy of Kevin Knight
http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the) (bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) … (Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)

Word Alignment Induced Phrases
Slide courtesy of Kevin Knight
http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

How is MT evaluation done?
• Automated metrics: BLEU, METEOR
• Human judgments:
  • Adequacy (accuracy)
  • Fluency
• How were human judgments done in WMT 2017?
• What were some approaches to quality control for crowdsourcing? (A simplified BLEU-style sketch follows.)
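A simplified sketch of BLEU-style scoring (illustration only): modified n-gram precision for n = 1..4 with a brevity penalty. Real BLEU is corpus-level and uses clipping against multiple references plus smoothing; this toy version scores a single sentence pair.

import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())  # clipped counts
        total = max(sum(c_ngrams.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)

print(sentence_bleu("the cat sat on the mat", "the cat is on the mat"))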

When did Neural MT surpass statistical methods (phrase-based and syntax)?
• WMT 2016
• When did companies first release NMT systems?
  • 2016

Neural MT
• Encoder-decoder approach
• What is the problem with a basic RNN?
• How is attention used?
• How else has the RNN memory problem been addressed?

What other approaches?
• Train stacked RNNs using multiple layers
• Use a bidirectional encoder
  • This can help in remembering the early part of the source input sentence
• Train the input sequence in reverse order
• Deeper networks: decoder depth of 8
• Data: parallel, back-translated, duplicated monolingual

Questions?

Thank you!
• It was great getting to know you! Good luck on the exam!