Machine Translation Prof. Sameer Singh CS 295: STATISTICAL NLP WINTER 2017 February 28, 2017 Based on slides from Jason Eisenstein, Chris Dyer, Alan Ritter, Yejin Choi, and everyone else they copied from.



Upcoming…

Homework
• Homework 4 is due on March 13
• Write-up and data releasing soon.

Project
• Status report due in 1 week: March 7, 2017
• Instructions coming today!
• Almost final report, only 5 pages

Summaries
• Paper summaries: February 28, March 14
• Summary 1 graded

CS 295: STATISTICAL NLP (WINTER 2017)

Outline

• Machine Translation
• Introduction to Statistical MT
• IBM Translation Models


Machine Translation

I have always imagined Paradise as a kind of library.

Yo, que me figuraba el Paraíso / Bajo la especie de una biblioteca.

Challenges: Word Order

Even for SVO
English: I will buy it
French: Je vais l’acheter (I will it buy)

English: I bought it
French: Je l’ai acheté (I it have bought)

SVO vs SOV
English: IBM bought Lotus
Japanese: IBM Lotus bought

Challenges: Lexical Ambiguity

bill
• pico (a bird's bill)
• cuenta (a bill to pay)

Challenges: Pronouns

Dropping Pronouns
• In Spanish, you can recover the pronoun from verb inflection: Vivimos en Atlanta → We live in Atlanta
• Again, discourse context is often crucial: Vive en Atlanta → She/he/it lives in Atlanta

Different Pronouns
• English possessive pronouns take the gender of the owner: Marie rides her bike
• French possessive pronouns take the gender of the object: Marie monte sur son vélo

Challenges: Tenses

• The preterite tense is for events with a definite time, e.g. I biked to work this morning
• The imperfect is for events with indefinite times, e.g. I biked to work all last summer

To translate English to Spanish, we must pick the right tense.

Challenges: Idioms

• Why in the world
• Kick the bucket
• Lend me your ears
• Dead as a doornail
• As cool as a cucumber
• Hold your horses
• Storm in a teacup
• Bob's your uncle
• Blue in the face
• Head in the clouds

Rules for Machine Translation

Rules for translating much or many into Russian:

if preceding word is how return skol’ko
else if preceding word is as return stol’ko zhe
else if word is much
    if preceding word is very return nil
    else if following word is a noun return mnogo
else (word is many)
    if preceding word is a preposition and following word is noun return mnogii
    else return mnogo

Panov (1960)
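The rule table above can be sketched as a small function. This is only an illustration: the function name and the two part-of-speech flags are hypothetical inputs (assumed to come from some upstream tagger), and the case the rules leave open (much, not after very, not before a noun) is returned as None here by assumption.

```python
from typing import Optional


def translate_much_many(
    word: str,
    prev_word: Optional[str],
    prev_is_preposition: bool,
    next_is_noun: bool,
) -> Optional[str]:
    """Return a Russian rendering of 'much'/'many', or None for 'nil'.

    The part-of-speech flags are hypothetical inputs, not part of the
    original rule table.
    """
    prev = (prev_word or "").lower()
    if prev == "how":
        return "skol'ko"
    if prev == "as":
        return "stol'ko zhe"
    if word.lower() == "much":
        if prev == "very":
            return None  # 'nil': no separate word is emitted
        if next_is_noun:
            return "mnogo"
        return None  # assumption: the rules leave this case unspecified
    # Otherwise the word is 'many'.
    if prev_is_preposition and next_is_noun:
        return "mnogii"
    return "mnogo"


print(translate_much_many("much", "how", False, False))
print(translate_much_many("many", "of", True, True))
```

Even this tiny table needs part-of-speech information and context on both sides of the word, which hints at how quickly hand-written rules grow.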

The Vauquois Triangle

Outline

• Machine Translation
• Introduction to Statistical MT
• IBM Translation Models

Statistical Machine Translation

Parallel Corpus: Examples

The Rosetta Stone

Warren Weaver (1949)

Parallel Corpus: Examples

Noisy Channel Model

“Noisy Channel” Decoder

Example: Noisy Channel
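A minimal sketch of noisy-channel decoding, assuming the standard decomposition in which the decoder picks the English sentence e that maximizes p(e) · p(f | e) for a foreign sentence f. The candidate list, the function name, and every probability below are made-up toy values for illustration; f is the French "Je vais l'acheter" from the word-order examples.

```python
import math

# Toy language model p(e): made-up probabilities for illustration only.
LANGUAGE_MODEL = {
    "I will buy it": 0.020,
    "I will it buy": 0.0001,
    "I buy it will": 0.00005,
}

# Toy translation model p(f | e) for f = "Je vais l'acheter" (also made up).
# The word-for-word candidate gets the highest channel probability.
TRANSLATION_MODEL = {
    "I will buy it": 0.30,
    "I will it buy": 0.40,
    "I buy it will": 0.30,
}


def noisy_channel_decode(candidates):
    """Return the candidate e maximizing log p(e) + log p(f | e)."""
    def score(e):
        return math.log(LANGUAGE_MODEL[e]) + math.log(TRANSLATION_MODEL[e])
    return max(candidates, key=score)


best = noisy_channel_decode(list(LANGUAGE_MODEL))
print(best)
```

With these toy numbers the translation model alone would prefer the literal "I will it buy", while the language model pulls the decision toward the fluent word order. A real decoder searches a much larger space than a fixed candidate list.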

Components of an MT system

• Language Model
• Translation Model
• Decoding Algo

Evaluating MT

Human Evaluation

• Fluency
• Adequacy

A: furious nAgA on wednesday , the tribal minimum pur of ten schools also was burnt

B: furious nAgA on wednesday the tribal pur mini ten schools of them was also burnt

Automated Evaluation

• Fluency
• Adequacy

BLEU Score

BLEU Score: Example

‘extension of isi in uttar pradesh’
‘isi ’s expansion in uttar pradesh’
‘the spread of isi in uttar pradesh’
‘isi spreading in uttar pradesh’
the spread of isi in uttar pradesh
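A rough sketch of sentence-level BLEU on these sentences: clipped n-gram precisions for n = 1..4, combined by a geometric mean and a brevity penalty. Two things here are assumptions made for illustration: that the first quoted sentence is the system output and the next three are references, and that no smoothing is applied. BLEU is normally computed over a whole corpus.

```python
import math
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped precision: each candidate n-gram counts at most as often
    as it occurs in the reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return Fraction(clipped, total) if total else Fraction(0)


def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    c = len(candidate)
    # Effective reference length: the reference closest in length.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_avg = sum(math.log(float(p)) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_avg)


# Assumed roles: first sentence as candidate, the rest as references.
candidate = "extension of isi in uttar pradesh".split()
references = [
    "isi 's expansion in uttar pradesh".split(),
    "the spread of isi in uttar pradesh".split(),
    "isi spreading in uttar pradesh".split(),
]
precisions = [modified_precision(candidate, references, n) for n in (1, 2, 3, 4)]
score = bleu(candidate, references)
print(precisions, round(score, 3))
```

Under these assumptions the clipped precisions are 5/6, 4/5, 3/4 and 2/3, and the brevity penalty is 1 because one reference has the same length as the candidate, giving a score of about 0.76.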

BLEU’s not bad…

G. Doddington, NIST

Outline

• Machine Translation
• Introduction to Statistical MT
• IBM Translation Models

Statistical Translation Model

And the program was implemented

La programmation a été mise en application

Word Alignment: Direct

Word Alignment: 1-to-Many

Word Alignment: Reordering

Word Alignment: Inserting

Word Alignment: Dropping

Translating with Alignments

Example: Translation Prob

IBM Models

• Model 1
• Model 2
• Model 3/4/5

Word Alignment Algorithm
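As one concrete instance of a word alignment algorithm, here is a minimal sketch of EM training for IBM Model 1, which learns word translation probabilities t(f | e) from sentence pairs without any alignment annotations. The function names, the three-sentence French-English corpus, and the iteration count are all illustrative assumptions, not material from the lecture.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM from (foreign_tokens, english_tokens) pairs.

    A NULL token is prepended to each English sentence so that foreign
    words can align to nothing.
    """
    corpus = [(f, ["NULL"] + e) for f, e in corpus]
    f_vocab = {w for f, _ in corpus for w in f}
    # Uniform initialisation of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        for f_sent, e_sent in corpus:
            for f in f_sent:
                # E-step: posterior over which English word generated f.
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per English word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def align(f_sent, e_sent, t):
    """Most likely English position (0 = NULL) for each foreign word."""
    e_sent = ["NULL"] + e_sent
    return [
        max(range(len(e_sent)), key=lambda i: t[(f, e_sent[i])])
        for f in f_sent
    ]


# Toy parallel corpus (made up for illustration).
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "fleur"], ["a", "flower"]),
]
t = train_ibm_model1(corpus, iterations=20)
print(align(["la", "maison"], ["the", "house"], t))
```

Because "la" co-occurs with "the" in two sentences, EM gradually shifts probability mass toward t(la | the), which in turn frees "house" to explain "maison". Model 1 ignores word positions entirely; the later models add position and fertility parameters on top of this.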