Statistical Machine Translation
Part V – Phrase-based SMT

Alexander Fraser
ICL, U. Heidelberg / CIS, LMU München
2014.06.05
Where we have been
• We defined the overall problem and talked about evaluation
• We have now covered word alignment
  – IBM Model 1, true Expectation Maximization
  – IBM Model 4, approximate Expectation Maximization
  – Symmetrization heuristics (such as Grow)
    • Applied to two Model 4 alignments
    • Results in the final word alignment
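As a concrete illustration, symmetrizing the two directional Model 4 alignments can be sketched as below. This is a simplified grow-style heuristic under assumed conventions (alignments as sets of index pairs, diagonal neighbors allowed as in grow-diag); the function name and the exact growing rule are illustrative, not the lecture's precise definition.

```python
def grow_symmetrize(e2f, f2e):
    # Symmetrize two directional word alignments, each a set of
    # (english_index, foreign_index) pairs: start from their
    # intersection, then repeatedly add union points that touch an
    # existing link (including diagonally, as in grow-diag).
    # Simplified sketch of the family of heuristics on the slide.
    alignment = set(e2f & f2e)
    union = e2f | f2e
    changed = True
    while changed:
        changed = False
        for e, f in sorted(union - alignment):
            # All points adjacent to (e, f), diagonals included
            neighbors = {(e + de, f + df)
                         for de in (-1, 0, 1) for df in (-1, 0, 1)}
            if neighbors & alignment:
                alignment.add((e, f))
                changed = True
    return alignment
```

Starting from the (high-precision) intersection and growing toward the (high-recall) union trades off the two directional alignments' errors.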
Where we are going
• We will define a high-performance translation model
• We will show how to solve the search problem for this model
Outline
• Phrase-based translation
  – Model
  – Estimating parameters
• Decoding
• We could use IBM Model 4 in the direction p(f|e), together with a language model, p(e), to translate
argmax_e P(e | f) = argmax_e P(f | e) P(e)
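This is the standard noisy-channel decomposition; one intermediate Bayes step (not shown on the slide) makes the equality explicit:

```latex
\hat{e} = \operatorname*{argmax}_{e} P(e \mid f)
        = \operatorname*{argmax}_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \operatorname*{argmax}_{e} P(f \mid e)\, P(e)
```

The denominator P(f) is constant for a fixed input sentence f, so it can be dropped from the argmax.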
• However, decoding using Model 4 doesn’t work well in practice
  – One strong reason is the bad 1-to-N assumption
  – Another problem is defining the search algorithm
    • If we add additional operations to allow the English words to vary, this becomes very expensive
  – Despite these problems, Model 4 decoding was briefly state of the art
• We will now define a better model…
Slide from Koehn 2008
Language Model

• Often a trigram language model is used for p(e)
  – P(the man went home) = p(the | START) p(man | START the) p(went | the man) p(home | man went)
• Language models work well for comparing the grammaticality of strings of the same length
  – However, when comparing short strings with long strings, they favor short strings
  – For this reason, an important component of the language model is the length bonus
    • This is a constant > 1 multiplied in once for each English word in the hypothesis
    • It makes longer strings competitive with shorter strings
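A minimal sketch of trigram scoring with a length bonus, under assumed conventions (two `<s>` padding tokens rather than the slide's single START, a tiny floor probability for unseen trigrams, and an illustrative `trigram_prob` lookup table):

```python
import math

def trigram_score(sentence, trigram_prob, length_bonus=1.1):
    # Score a sentence with a trigram LM plus a per-word length bonus.
    # `trigram_prob` maps (w1, w2, w3) -> probability; `length_bonus` > 1
    # is multiplied in once per word, so longer hypotheses stay
    # competitive with shorter ones. Illustrative sketch only; real
    # systems use smoothing rather than a fixed floor for unseen events.
    words = ["<s>", "<s>"] + sentence.split()
    logprob = 0.0
    for i in range(2, len(words)):
        p = trigram_prob.get(tuple(words[i - 2:i + 1]), 1e-6)
        logprob += math.log(p) + math.log(length_bonus)
    return logprob
```

Because each word contributes one factor of `length_bonus`, a hypothesis with n words gains n · log(length_bonus) in log space, partially offsetting the penalty that each extra n-gram probability (< 1) imposes on longer strings.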
Modified from Koehn 2008
Outline
• Phrase-based translation model
• Decoding
  – Basic phrase-based decoding
  – Dealing with complexity
    • Recombination
    • Pruning
    • Future cost estimation
Decoding
• Goal: find the best target translation of a source sentence
• Involves search
  – Find the maximum-probability path in a dynamically generated search graph
• Generate the English string, from left to right, by covering parts of the foreign string
  – Generating the English string left to right allows scoring with the n-gram language model
• Here is an example of one path
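One such left-to-right path can be sketched as below. This is a toy, monotone, greedy walk over an assumed phrase-table structure (source-phrase tuple → list of (english, probability) options); a real decoder explores many paths over arbitrary coverage sets, using the beams, recombination, pruning, and future-cost estimation listed in the outline.

```python
import math

def greedy_decode(src_words, phrase_table, max_phrase_len=3):
    # Greedy monotone phrase-based decoding sketch: walk the source left
    # to right; at each position pick the highest-probability phrase
    # pair covering the next span and append its English side. Unknown
    # words are copied through. Returns (output string, log probability).
    i, output, logprob = 0, [], 0.0
    while i < len(src_words):
        best = None
        for j in range(i + 1, min(i + max_phrase_len, len(src_words)) + 1):
            for eng, p in phrase_table.get(tuple(src_words[i:j]), []):
                if best is None or p > best[2]:
                    best = (j, eng, p)
        if best is None:  # unknown word: pass it through unchanged
            output.append(src_words[i])
            i += 1
            continue
        i, eng, p = best
        output.append(eng)
        logprob += math.log(p)
    return " ".join(output), logprob
```

Each loop iteration extends the hypothesis by one phrase, covering a new span of the source and growing the English output strictly left to right, which is exactly what makes incremental n-gram LM scoring possible.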