The Dark Secrets of MT Revealed
Machine Translation
I256: Applied Natural Language Processing
John DeNero
Some slides on loan from Dan Klein & others
Thursday, November 5, 2009
Data-Driven Machine Translation
Sentence-aligned parallel corpus:
Yo lo haré mañana | I will do it tomorrow
Hasta pronto | See you soon
Hasta pronto | See you around
Target language corpus:
I will get to it soon
See you later
He will do it
Machine translation system:
Model of translation
Novel sentence: Yo lo haré pronto
Output: I will do it soon
Uses of Translation
• Assimilation
  • Gist of a document is helpful
• Dissemination
  • High quality expected; may be closed domain
• Communication
  • Wide range of quality requirements
Machine translation is much lower cost, much faster, and much easier to access than conventional translation. However, it’s worse.
A Brief and Biased History
’47: MT is the “first” non-numerical compute task
’58: Berkeley’s first MT grant
’66: ALPAC report deems MT bad
’90’s: Statistical data-driven approach introduced
’00’s: Statistical MT thrives
Warren Weaver: “When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
Warren Weaver: “Thus it may be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication, the real but as yet undiscovered universal language, and then re-emerge by whatever particular route is convenient.”
John Pierce: “‘Machine Translation’ presumably means going by algorithm from machine-readable source text to useful target text... In this context, there has been no machine translation...”
The Problem with Dictionary Look-ups
顶部 /top/roof/
顶端 /summit/peak/top/apex/
顶头 /coming directly towards one/top/end/
盖 /lid/top/cover/canopy/build/Gai/
盖帽 /surpass/top/
极 /extremely/pole/utmost/top/collect/receive/
尖峰 /peak/top/
面 /fade/side/surface/aspect/top/face/flour/
摘心 /top/topping/
carrot, class, pile, condition, drawer, speed, bikini, lungs, “top dog”, “top brass”, “top of the line”, “big top”, “over the top”, “pop top”, “top off”, “off the top of my head”, “take it from the top”, “I’m on top of it”, ...
Example from Douglas Hofstadter
Levels of Language Transfer
[Figure: Source text → Analysis → Transfer → Generation → Target text]
[Transfer levels, lowest to highest: Morphology, Syntax, Semantics, Interlingua]
Translating with Tree Transducers
Input: lo haré de muy buen grado .
Grammar:
ADV → 〈 de muy buen grado ; gladly 〉
S → 〈 lo haré ADV . ; I will do it ADV . 〉
Output: I will do it gladly .
[Figure: trees over the input and output; node labels include S, VP, NP, MD, VB, PRP, ADV]
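The two synchronous rules on this slide can be applied mechanically. The following is a minimal sketch, not the lecture's implementation: the `RULES` table, `translate`, and `match` are hypothetical names, the matcher is a brute-force search rather than a real tree transducer, and it assumes each nonterminal appears at most once on a rule's source side.

```python
# Toy encoding of the slide's rules: nonterminal -> list of (source side, target side).
# The data layout and function names are illustrative assumptions.
RULES = {
    "ADV": [("de muy buen grado".split(), "gladly".split())],
    "S": [("lo haré ADV .".split(), "I will do it ADV .".split())],
}


def match(pattern, tokens):
    """Match a rule's source side against tokens.

    Returns {nonterminal: translated tokens} on success, or None on failure.
    """
    if not pattern:
        return {} if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in RULES:
        # Nonterminal: try every non-empty prefix it could cover.
        for end in range(1, len(tokens) + 1):
            sub = translate(head, tokens[:end])
            if sub is None:
                continue
            tail = match(rest, tokens[end:])
            if tail is not None:
                return {head: sub, **tail}
        return None
    if tokens and tokens[0] == head:
        # Terminal: must equal the next input token.
        return match(rest, tokens[1:])
    return None


def translate(symbol, tokens):
    """Return the target-side tokens if `symbol` derives `tokens`, else None."""
    for source_side, target_side in RULES[symbol]:
        bindings = match(source_side, tokens)
        if bindings is not None:
            output = []
            for item in target_side:
                # Substitute translated sub-derivations for nonterminals.
                output.extend(bindings[item] if item in bindings else [item])
            return output
    return None


if __name__ == "__main__":
    print(" ".join(translate("S", "lo haré de muy buen grado .".split())))
```

Because source and target sides are rewritten together, choosing a rule for the input fixes the corresponding piece of the output at the same time.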
A Statistical Translation Model
Synchronous Derivation: lo haré de muy buen grado . ↔ I will do it gladly .
Grammar:
ADV → 〈 de muy buen grado ; gladly 〉
S → 〈 lo haré ADV . ; I will do it ADV . 〉
Product of Experts Model
Models that factor over rules (How good is this rule?):
$\prod_{r} P(e_r \mid f_r)^{\lambda_2} \, P(f_r \mid e_r)^{\lambda_3} \cdots$
Language model factors over n-grams (How good is this target sentence?):
$\prod_{i=1}^{I} P(e_i \mid e_{i-1}, \ldots, e_1)^{\lambda_1}$
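As a rough illustration of the product-of-experts form, a derivation's score can be computed in log space, where each exponent λ becomes a multiplicative weight. This is a sketch under stated assumptions: the function name, the weight keys, and all probabilities below are made up for illustration and do not come from the lecture.

```python
import math


def derivation_log_score(rule_probs, lm_probs, weights):
    """Weighted log-score of one derivation.

    rule_probs: list of (P(e_r | f_r), P(f_r | e_r)) pairs, one per rule used.
    lm_probs:   list of P(e_i | e_{i-1}, ..., e_1), one per target word.
    weights:    dict with keys "lm", "e_given_f", "f_given_e"
                (standing in for lambda_1, lambda_2, lambda_3).
    """
    # Language model expert: how good is this target sentence?
    score = weights["lm"] * sum(math.log(p) for p in lm_probs)
    # Rule experts: how good is each rule in the derivation?
    for p_e_given_f, p_f_given_e in rule_probs:
        score += weights["e_given_f"] * math.log(p_e_given_f)
        score += weights["f_given_e"] * math.log(p_f_given_e)
    return score


if __name__ == "__main__":
    # Invented numbers for the two-rule derivation of "I will do it gladly ."
    rules = [(0.6, 0.3), (0.4, 0.5)]
    lm = [0.2, 0.3, 0.4, 0.5, 0.1, 0.9]
    lambdas = {"lm": 1.0, "e_given_f": 0.5, "f_given_e": 0.5}
    print(derivation_log_score(rules, lm, lambdas))
```

Working with sums of logs avoids numeric underflow when many small probabilities are multiplied, and it makes the role of each weight explicit.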
Learning to Translate
Example from Adam Lopez
Unsupervised Word Alignment
• Input: A large bitext of sentences and their translations
• Approach: Using what we know about the problem and corpus statistics, align words of translations automatically
• Exciting fact: Unsupervised methods perform well enough that very few systems use supervised word alignment
Properties of Cross-Lingual Alignments
English: I declare resumed the session of the european parliament adjourned on Friday 17 December 1999 , ...
Spanish: Declaro reanudado el periodo de sesiones del parlamento europeo interrumpido el Viernes 17 de Diciembre pasado , ...
• Often one-to-one or many-to-one (usually over contiguous phrases)
• Occasionally many-to-many, driven by non-literal translations
Heuristic Estimation
• Two words that co-occur regularly are likely translations of each other
• Normalize by the word frequencies
• Enforcing competition across words (e.g., finding a one-to-one or many-to-one mapping) is a good idea
Dice coefficient: $\frac{2 \cdot c(e, f)}{c(e) + c(f)}$
• $c(e, f)$: the number of times e and f appear together
• $c(e)$: count of word e
• $c(f)$: count of word f
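The Dice coefficient is simple to compute from a bitext. The sketch below assumes counts are taken per sentence pair (a word counts once per sentence it appears in), which the slide does not specify; `dice_scores` is a hypothetical helper name, and the bitext is the three-sentence toy corpus from the opening slides.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every co-occurring (english, foreign) word pair.

    bitext: list of (english_tokens, foreign_tokens) sentence pairs.
    Assumption: each word is counted once per sentence pair it occurs in.
    """
    c_e, c_f, c_ef = Counter(), Counter(), Counter()
    for e_sent, f_sent in bitext:
        e_types, f_types = set(e_sent), set(f_sent)
        c_e.update(e_types)
        c_f.update(f_types)
        c_ef.update(product(e_types, f_types))
    return {(e, f): 2 * n / (c_e[e] + c_f[f]) for (e, f), n in c_ef.items()}


if __name__ == "__main__":
    bitext = [
        ("I will do it tomorrow".split(), "Yo lo haré mañana".split()),
        ("See you soon".split(), "Hasta pronto".split()),
        ("See you around".split(), "Hasta pronto".split()),
    ]
    scores = dice_scores(bitext)
    for pair in [("See", "Hasta"), ("you", "Hasta"), ("soon", "pronto")]:
        print(pair, round(scores[pair], 3))
```

On this corpus "See" and "you" score identically against "Hasta", because they always appear together. Co-occurrence counts alone cannot separate them, which is one reason enforcing competition across words helps.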
IBM Model 1 (Brown et al, ’93)
• Probabilistic models naturally impose competition
• Assume that foreign words are generated independently
• Assume a hidden alignment vector a encoding which English word generates each foreign word
I declare resumed the session
Declaro reanudado el periodo de sesiones
a6=5
Thursday, November 5, 2009
![Page 54: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/54.jpg)
IBM Model 1 (Brown et al, ’93)
• Probabilistic models naturally impose competition
• Assume that foreign words are generated independently
• Assume a hidden alignment vector a encoding which English word generates each foreign word
I declare resumed the session
Declaro reanudado el periodo de sesiones
P (f, a|e) =J!
j=1
P (aj = i|I, J)P (fj |ei)
=1
I + 1P (fj |ei)
a6=5
Thursday, November 5, 2009
![Page 55: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/55.jpg)
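IBM Model 1 (Brown et al., ’93)
• Probabilistic models naturally impose competition
• Assume that foreign words are generated independently
• Assume a hidden alignment vector a encoding which English word generates each foreign word
E: I declare resumed the session
F: Declaro reanudado el periodo de sesiones
Example: a_6 = 5 (the sixth foreign word, "sesiones", is generated by the fifth English word, "session")
$$P(f, a \mid e) = \prod_{j=1}^{J} P(a_j = i \mid I, J)\, P(f_j \mid e_i) = \prod_{j=1}^{J} \frac{1}{I + 1}\, P(f_j \mid e_i)$$
Here I and J are the lengths of the English and foreign sentences, and i = a_j.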
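To make the factorization concrete, here is a minimal Python sketch that evaluates P(f, a | e) for one fixed alignment. The function name `model1_joint_prob`, the 0-based alignment indices, and the translation probabilities in `t` are all hypothetical values chosen for illustration.

```python
import math

def model1_joint_prob(f_words, e_words, alignment, t):
    """P(f, a | e) = prod_j 1/(I+1) * t[(f_j, e_{a_j})] under IBM Model 1.

    alignment[j] is a 0-based index into e_words; I = len(e_words).
    """
    uniform = 1.0 / (len(e_words) + 1)  # uniform alignment prior from the slide
    return math.prod(uniform * t[(f, e_words[i])] for f, i in zip(f_words, alignment))

e = ["I", "declare", "resumed", "the", "session"]
f = ["Declaro", "reanudado"]
# Hypothetical translation probabilities, for illustration only.
t = {("Declaro", "declare"): 0.5, ("reanudado", "resumed"): 0.4}
p = model1_joint_prob(f, e, [1, 2], t)
```

Because the alignment prior is uniform, word order has no influence on the score: only the translation probabilities distinguish one alignment from another.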
![Page 62: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/62.jpg)
Estimating Model 1 Parameters
• Free parameters in the model: P(f | e)
• Goal is to maximize the data likelihood
• E-step computes expected alignments (posteriors):
$$P(a_j = i \mid e, f) = \frac{\frac{1}{I+1} P(f_j \mid e_i)}{\sum_{i'} \frac{1}{I+1} P(f_j \mid e_{i'})}$$
• M-step computes ratios of expected counts:
$$P(f \mid e) = \frac{\text{sum of posteriors for } f \text{ aligned to } e}{\text{sum of posteriors of any } f' \text{ aligned to } e}$$
• Repeat the E- and M-steps several times (around 5 or 10 iterations)
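The E-step and M-step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the name `train_model1`, the `<null>` token, the uniform initialization, and the toy corpus are all assumptions made for the example.

```python
from collections import defaultdict

NULL = "<null>"  # assumed placeholder for the empty English word

def train_model1(pairs, iterations=10):
    """EM for IBM Model 1; returns t[(f, e)] approximating P(f | e)."""
    pairs = [([NULL] + e.split(), f.split()) for e, f in pairs]
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of any f aligned to e
        for es, fs in pairs:
            for f in fs:
                # E-step: posterior over which e generated f. The 1/(I+1)
                # factor cancels between numerator and denominator.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    post = t[(f, e)] / z
                    count[(f, e)] += post
                    total[e] += post
        # M-step: ratio of expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy corpus, for illustration only.
toy_corpus = [("the house", "la casa"), ("the book", "el libro"), ("a book", "un libro")]
t = train_model1(toy_corpus)
```

In this toy corpus "libro" co-occurs with "book" in two sentence pairs, so the competition among explanations pushes P(libro | book) above P(el | book) as the iterations proceed.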
![Page 63: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/63.jpg)
Aligning Words Under the Model
• Viterbi: for every j, select the i that maximizes $P(a_j = i \mid e, f)$. Gives competition among explanations.
• Posterior: align every (i, j) that has $P(a_j = i \mid e, f) > \tau$ for a chosen threshold $\tau$. Gives control over how many alignment links to posit.
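The two decoding rules differ only in how they read off links from the same posteriors. The Python sketch below contrasts them; the function names, the 0-based (i, j) link convention, and the translation probabilities are hypothetical, and the posterior is computed as in Model 1.

```python
def posteriors(e_words, f_words, t):
    """post[j][i] = P(a_j = i | e, f) under Model 1 (the uniform prior cancels)."""
    post = []
    for f in f_words:
        scores = [t.get((f, e), 0.0) for e in e_words]
        z = sum(scores)
        post.append([s / z for s in scores])
    return post

def viterbi_links(post):
    """Exactly one link per foreign word j: the i with the highest posterior."""
    return {(max(range(len(row)), key=row.__getitem__), j) for j, row in enumerate(post)}

def posterior_links(post, threshold):
    """Every (i, j) whose posterior exceeds the threshold."""
    return {(i, j) for j, row in enumerate(post) for i, p in enumerate(row) if p > threshold}

e = ["will", "do", "it"]
f = ["lo", "haré"]
# Hypothetical translation probabilities, for illustration only.
t = {("lo", "will"): 0.1, ("lo", "do"): 0.1, ("lo", "it"): 0.6,
     ("haré", "will"): 0.4, ("haré", "do"): 0.5, ("haré", "it"): 0.1}
post = posteriors(e, f, t)
hard = viterbi_links(post)
soft = posterior_links(post, threshold=0.3)
```

With these numbers Viterbi links "haré" only to "do", while the posterior rule with a threshold of 0.3 links it to both "will" and "do". Raising or lowering the threshold trades precision against recall.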
![Page 64: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/64.jpg)
Evaluation: Alignment Error Rate
[Figure legend: sure alignments, possible alignments, predicted alignments]
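For reference, the sketch below computes precision, recall, and AER from sets of links. It uses the standard definition from Och and Ney (2003), which is assumed here to be the one the slide intends: with sure links S, possible links P (where S is a subset of P), and predicted links A, precision is |A ∩ P| / |A|, recall is |A ∩ S| / |S|, and AER is 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The function name and the example links are hypothetical.

```python
def alignment_scores(sure, possible, predicted):
    """Return (precision, recall, AER) for sets of (i, j) links."""
    possible = possible | sure  # every sure link also counts as possible
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted)
    recall = hit_sure / len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / (len(predicted) + len(sure))
    return precision, recall, aer

# Hypothetical gold and predicted links, for illustration only.
sure = {(0, 0), (1, 1), (2, 2)}
possible = {(2, 3)}
predicted = {(0, 0), (1, 1), (2, 3), (3, 3)}
precision, recall, aer = alignment_scores(sure, possible, predicted)
```

Predicting a possible link is not penalized in precision, and missing one is not penalized in recall, so only sure links must be found and only links outside the possible set count as errors.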
![Page 67: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/67.jpg)
Problems with IBM Model 1
• Too many alignments to rare words (garbage collection)
• Alignments jump around all over the sentence
![Page 71: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/71.jpg)
Intersected IBM Model 1
• Train Model 1 in both directions, align with each, then intersect the output (Och and Ney, ’03)
• Result is one-to-one with Viterbi alignments
• Second model filters the first, eliminating mistakes
Model, P/R (precision/recall), AER:
Model 1 E→F 82/58 30.6
Model 1 F→E 85/58 28.7
Model 1 AND 96/46 34.8
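Intersection itself is a plain set operation once both directional alignments use the same (i, j) convention. The sketch below is a minimal illustration; the function names and the example links are hypothetical.

```python
def intersect(e2f_links, f2e_links):
    """Keep only the (i, j) links proposed by both directional aligners."""
    return e2f_links & f2e_links

def is_one_to_one(links):
    """True if no English position i and no foreign position j is linked twice."""
    return (len({i for i, _ in links}) == len(links)
            and len({j for _, j in links}) == len(links))

# Hypothetical Viterbi outputs: E->F picks one English i for every foreign j,
# while F->E picks one foreign j for every English i.
e2f = {(0, 0), (2, 1), (1, 2)}
f2e = {(0, 0), (1, 2), (2, 2)}
both = intersect(e2f, f2e)
```

Each direction can only produce many-to-one links on its own side, so a link surviving in both sets is necessarily one-to-one. The table shows the cost of this filtering: precision rises to 96 while recall drops to 46.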
![Page 72: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/72.jpg)
Joint Training for IBM Model 1
• We can intersect model predictions during training as well
• Modified alignment posterior:
$$P_{e \to f}(a_j = i \mid e, f) \cdot P_{f \to e}(a_i = j \mid e, f)$$
• Models are forced to agree as they select parameters
• Same precision benefits, but higher recall from more agreement
Model, P/R (precision/recall), AER:
Model 1 E→F 82/58 30.6
Model 1 F→E 85/58 28.7
Model 1 AND 96/46 34.8
Model 1 INT 93/69 19.5
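The modified posterior can be sketched as an elementwise product of the two directional posteriors. The function name, the nested-list layout, and the numbers below are assumptions for illustration; the slide does not say whether the product is renormalized, so this sketch leaves it as is.

```python
def agreement_posterior(post_e2f, post_f2e):
    """q[j][i] = P_{e->f}(a_j = i | e, f) * P_{f->e}(a_i = j | e, f).

    post_e2f[j][i] is indexed by foreign position j, then English position i.
    post_f2e[i][j] is indexed by English position i, then foreign position j.
    """
    I, J = len(post_f2e), len(post_e2f)
    return [[post_e2f[j][i] * post_f2e[i][j] for i in range(I)] for j in range(J)]

# Hypothetical posteriors for a sentence pair with two words on each side.
post_e2f = [[0.9, 0.1], [0.5, 0.5]]
post_f2e = [[0.8, 0.2], [0.1, 0.9]]
q = agreement_posterior(post_e2f, post_f2e)
```

A link keeps a large value only when both models support it, so using these products as expected counts in the M-step pushes the two models toward agreement.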
![Page 73: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/73.jpg)
IBM Model 2
• Words at the beginning of sentences should align
• Words at the end of sentences should align
• Alignment probability depends on position, e.g.
$$P(f, a \mid e) = \prod_{j=1}^{J} P(a_j = i \mid I, J) \cdot P(f_j \mid e_i) \propto \prod_{j=1}^{J} \exp\left(-\alpha \left| a_j - j \frac{I}{J} \right|\right) \cdot P(f_j \mid e_i)$$
with scale parameter $\alpha$.
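A position-dependent alignment prior of this exponential form can be sketched as follows. The function name, the default `alpha=1.0`, the 1-based positions, and the omission of the null word are assumptions made for this example.

```python
import math

def position_prior(j, I, J, alpha=1.0):
    """P(a_j = i | I, J) for i = 1..I, proportional to exp(-alpha * |i - j*I/J|).

    j is a 1-based foreign position; the returned list is indexed by i - 1.
    """
    weights = [math.exp(-alpha * abs(i - j * I / J)) for i in range(1, I + 1)]
    z = sum(weights)
    return [w / z for w in weights]

first = position_prior(j=1, I=5, J=5)  # prior for the first foreign word
last = position_prior(j=5, I=5, J=5)   # prior for the last foreign word
```

The prior for the first foreign word peaks at the first English word, and the prior for the last foreign word peaks at the last English word, which is the diagonal preference described in the bullets.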
![Page 74: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/74.jpg)
Phrase Movement
Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
On Tuesday Nov. 4, earthquakes rocked Japan once again
Absolute position distortion isn’t quite right
![Page 79: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/79.jpg)
IBM Models 1/2
E: Thank you , I shall do so gladly .
E positions: 1 2 3 4 5 6 7 8 9
F: Gracias , lo haré de muy buen grado .
A: 1 3 7 6 8 8 8 8 9
Model Parameters
• Transitions: P( A2 = 3 | I, J )
• Emissions: P( F1 = Gracias | E_A1 = Thank )
![Page 86: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/86.jpg)
The HMM Model
E: Thank you , I shall do so gladly .
E positions: 1 2 3 4 5 6 7 8 9
F: Gracias , lo haré de muy buen grado .
A: 1 3 7 6 8 8 8 8 9
Model Parameters
• Transitions: P( A2 = 3 | A1 = 1 )
• Emissions: P( F1 = Gracias | E_A1 = Thank )
![Page 89: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/89.jpg)
The HMM Model
• Model 2 preferred global monotonicity
• We want local monotonicity (small jumps)
• HMM model (Vogel et al., ’96)
• Re-estimate using the forward-backward algorithm
• Handling nulls requires some care
[Figure: plot with horizontal axis -2, -1, 0, 1, 2, 3]
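As a rough sketch of how jump-based transitions enter decoding, the Python below runs Viterbi over English positions with a transition score that depends only on the jump size. Everything specific here is an assumption for illustration: the function names, the `small_jumps` weighting, the uniform start, the 1e-9 floor for unseen word pairs, and the emission table. Null alignments are ignored, and decoding is shown instead of the forward-backward re-estimation mentioned in the bullets.

```python
import math

def hmm_viterbi(e_words, f_words, emit, jump_weight):
    """Most likely a_1..a_J when transitions depend on the jump a_j - a_{j-1}.

    emit[(f, e)] is a hypothetical emission probability P(f | e).
    Returns 0-based English positions, one per foreign word.
    """
    I = len(e_words)

    def log_trans(prev, cur):
        # Normalize jump weights over all English positions reachable from prev.
        z = sum(jump_weight(k - prev) for k in range(I))
        return math.log(jump_weight(cur - prev) / z)

    def log_emit(f, e):
        return math.log(emit.get((f, e), 1e-9))  # assumed floor for unseen pairs

    best = [log_emit(f_words[0], e) - math.log(I) for e in e_words]  # uniform start
    back = []
    for f in f_words[1:]:
        step, ptr = [], []
        for i, e in enumerate(e_words):
            prev = max(range(I), key=lambda k: best[k] + log_trans(k, i))
            step.append(best[prev] + log_trans(prev, i) + log_emit(f, e))
            ptr.append(prev)
        best = step
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(range(I), key=best.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def small_jumps(d):
    """Hypothetical jump preference that favors moving one word forward."""
    return math.exp(-abs(d - 1))

e = "Thank you , I shall do so gladly .".split()
f = "Gracias , lo haré de muy buen grado .".split()
# Hypothetical emission probabilities, for illustration only.
emit = {("Gracias", "Thank"): 0.9, (",", ","): 0.9, ("lo", "so"): 0.5,
        ("haré", "do"): 0.5, ("de", "gladly"): 0.2, ("muy", "gladly"): 0.2,
        ("buen", "gladly"): 0.2, ("grado", "gladly"): 0.5, (".", "."): 0.9}
path = hmm_viterbi(e, f, emit, small_jumps)
```

With these peaked emissions the decoded path, shifted to 1-based positions, is the alignment 1 3 7 6 8 8 8 8 9 shown on the HMM slide. The transition term matters most when emissions are ambiguous, where it favors staying close to the previous position.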
![Page 90: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/90.jpg)
HMM Examples
![Page 91: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/91.jpg)
AER for HMMs
Model AER
Model 1 INT 19.5
HMM E→F 11.4
HMM F→E 10.8
HMM AND 7.1
HMM INT 4.7
GIZA M4 AND 6.9
![Page 99: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/99.jpg)
Aligning Larger Structures
In 1990, we aligned words
Yo lo haré mañana
I will do it tomorrow
English (E) P( E | mañana )
tomorrow 0.7
morning 0.3
In 1999, we aligned phrases
Yo lo haré mañana
I will do it tomorrow
English (E) P( E | lo haré )
will do it 0.8
will do so 0.2
In 2004, we aligned trees
Yo lo haré mañana
I will do it tomorrow
[Figure: tree fragments (node labels VP, NP, PRN, VB, MD) pairing "will do it" with "lo haré NP"; P( fragment pair ) = 0.8]
![Page 105: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/105.jpg)
Aligning Structural Components
In 2009, we still align words
Yo lo haré mañana
I will do it tomorrow
1. Align words with a probabilistic model
2. Infer presence of larger structures from this alignment
3. Translate with the larger structures
Fragment-level correspondence is derived from word alignments
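Step 2 can be illustrated with the usual consistency criterion for phrase pairs: a pair of spans is kept when no alignment link connects a word inside the pair to a word outside it. The sketch below is a simplified version that does not extend spans over unaligned words; the function name, the `max_len` limit, and the word alignment itself are assumptions for this example.

```python
def extract_phrase_pairs(e_words, f_words, links, max_len=4):
    """Return (foreign phrase, English phrase) pairs consistent with the links.

    links is a set of 0-based (i, j) pairs: English position i, foreign position j.
    """
    pairs = set()
    for j1 in range(len(f_words)):
        for j2 in range(j1, min(j1 + max_len, len(f_words))):
            e_pos = [i for (i, j) in links if j1 <= j <= j2]
            if not e_pos:
                continue  # the foreign span has no aligned English words
            i1, i2 = min(e_pos), max(e_pos)
            if i2 - i1 >= max_len:
                continue  # the English side would be too long
            # Consistent only if every link touching the English span
            # lands inside the foreign span.
            if all(j1 <= j <= j2 for (i, j) in links if i1 <= i <= i2):
                pairs.add((" ".join(f_words[j1:j2 + 1]), " ".join(e_words[i1:i2 + 1])))
    return pairs

e = "I will do it tomorrow".split()
f = "Yo lo haré mañana".split()
# Hypothetical word alignment: I-Yo, it-lo, will-haré, do-haré, tomorrow-mañana.
links = {(0, 0), (3, 1), (1, 2), (2, 2), (4, 3)}
pairs = extract_phrase_pairs(e, f, links)
```

Under this alignment the extracted pairs include "lo haré" with "will do it" and "haré" with "will do", while "Yo lo" is rejected because its English span would also cover words aligned to "haré".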
![Page 106: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/106.jpg)
Estimating Rule Parameters from Words
Thank you , I will do it gladly .
Gracias,loharédemuybuengrado.
Grammar RulesWord Aligned Sentence Pair
Thursday, November 5, 2009
![Page 107: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/107.jpg)
Estimating Rule Parameters from Words
Thank you , I will do it gladly .
Gracias,loharédemuybuengrado.
Grammar RulesWord Aligned Sentence Pair
Thursday, November 5, 2009
![Page 108: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/108.jpg)
Estimating Rule Parameters from Words
Thank you , I will do it gladly .
Gracias,loharédemuybuengrado.
〈haré ;
will do〉
Grammar RulesWord Aligned Sentence Pair
Thursday, November 5, 2009
![Page 109: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/109.jpg)
Estimating Rule Parameters from Words
Thank you , I will do it gladly .
Gracias,loharédemuybuengrado.
〈haré ;
will do〉
Grammar RulesWord Aligned Sentence Pair
Thursday, November 5, 2009
![Page 110: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/110.jpg)
Estimating Rule Parameters from Words
Word Aligned Sentence Pair
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
Grammar Rules
〈haré ; will do〉
〈lo X de ... grado ; X it gladly〉
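A hierarchical rule like 〈lo X de ... grado ; X it gladly〉 can be read as a larger phrase pair with a smaller phrase pair cut out and replaced by the variable X. The sketch below shows that substitution only; the larger phrase pair is an assumption (it is not listed on the slide), and `make_hierarchical_rule` is a hypothetical helper name.

```python
def make_hierarchical_rule(phrase_pair, sub_pair, symbol="X"):
    """Replace one occurrence of sub_pair inside phrase_pair with a variable."""
    def replace_once(tokens, sub):
        n = len(sub)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == sub:
                return tokens[:i] + [symbol] + tokens[i + n:]
        return None  # the sub-phrase does not occur

    src, tgt = phrase_pair
    sub_src, sub_tgt = sub_pair
    new_src = replace_once(src.split(), sub_src.split())
    new_tgt = replace_once(tgt.split(), sub_tgt.split())
    if new_src is None or new_tgt is None:
        return None
    return " ".join(new_src), " ".join(new_tgt)


# Assumed larger phrase pair; the smaller pair is the rule shown on the slide.
rule = make_hierarchical_rule(
    ("lo haré de muy buen grado", "will do it gladly"),
    ("haré", "will do"),
)
print(rule)
```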
![Page 111: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/111.jpg)
Estimating Rule Parameters from Words
Word Aligned Sentence Pair
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
Grammar Rules
〈haré ; will do〉
〈lo X de ... grado ; X it gladly〉
Model Parameters
Relative frequency counts
$$P(es \mid en) = \frac{c(\text{lo } X \text{ de muy buen grado} \;;\; X \text{ it gladly})}{c(\ast \;;\; X \text{ it gladly})}$$
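The relative-frequency estimate divides the count of a rule by the count of all rules sharing its English side. A minimal sketch follows; the counts and the second Spanish rule are invented for illustration and do not come from the lecture.

```python
from collections import Counter


def relative_frequency(rule_counts):
    """Estimate P(es | en) = c(es ; en) / c(* ; en) for every observed rule."""
    en_totals = Counter()
    for (es_side, en_side), count in rule_counts.items():
        en_totals[en_side] += count
    return {
        (es_side, en_side): count / en_totals[en_side]
        for (es_side, en_side), count in rule_counts.items()
    }


# Hypothetical rule counts, for illustration only.
rule_counts = Counter({
    ("lo X de muy buen grado", "X it gladly"): 3,
    ("lo X con gusto", "X it gladly"): 1,
    ("haré", "will do"): 5,
})

probs = relative_frequency(rule_counts)
print(probs[("lo X de muy buen grado", "X it gladly")])
```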
![Page 112: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/112.jpg)
Learning Grammars for Translation
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
[Figure: parse tree with node labels S, VP, NP, PRP, MD, VB, ADV]
Grammar Rules
![Page 114: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/114.jpg)
Learning Grammars for Translation
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
[Figure: parse tree with node labels S, VP, NP, PRP, MD, VB, ADV; additional label: ADV]
Grammar Rules
![Page 116: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/116.jpg)
Learning Grammars for Translation
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
[Figure: parse tree with node labels S, VP, NP, PRP, MD, VB, ADV; additional label: ADV]
Grammar Rules
VP → 〈lo haré ADV ; will do it ADV〉
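A syntactic rule such as VP → 〈lo haré ADV ; will do it ADV〉 rewrites both languages at once: whatever fills ADV on the Spanish side must be paired with its translation on the English side. The sketch below expands the rule with one filler; the ADV pair 〈de muy buen grado ; gladly〉 is an assumption based on the example sentence, and `apply_rule` is a hypothetical helper.

```python
def apply_rule(rule, fillers):
    """Expand nonterminals on both sides of a synchronous rule in lockstep.

    fillers maps a nonterminal label to a (source text, target text) pair.
    """
    src_side, tgt_side = rule

    def expand(side, index):
        return " ".join(
            fillers[tok][index] if tok in fillers else tok
            for tok in side.split()
        )

    return expand(src_side, 0), expand(tgt_side, 1)


vp_rule = ("lo haré ADV", "will do it ADV")
# Assumed ADV rule, not shown on the slide.
fillers = {"ADV": ("de muy buen grado", "gladly")}

spanish, english = apply_rule(vp_rule, fillers)
print(spanish, "|", english)
```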
![Page 117: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/117.jpg)
What Happens in Practice
Je vois un chat
Machine translation system:
Model of translation
![Page 118: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/118.jpg)
What Happens in Practice
Je vois un chat → I see a spade
Machine translation system:
Model of translation
![Page 119: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/119.jpg)
What Happens in Practice
Je vois un chat → I see a spade
Machine translation system:
Model of translation
Sentence-aligned parallel corpus:
... appelez un chat un chat ...
... call a spade a spade ...
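The "spade" error follows directly from counting: if the only training evidence for "chat" is the idiom, a frequency-based model has no reason to prefer "cat". The toy below assumes a monotone one-to-one word alignment and adds one invented sentence pair so that every input word is covered; it is a deliberately naive word-for-word model, not the system from the lecture.

```python
from collections import Counter, defaultdict

corpus = [
    ("appelez un chat un chat", "call a spade a spade"),  # idiom from the slide
    ("Je vois un chien", "I see a dog"),                  # invented for coverage
]

# Count aligned word pairs, assuming word i aligns to word i.
counts = defaultdict(Counter)
for fr, en in corpus:
    for f, e in zip(fr.split(), en.split()):
        counts[f][e] += 1


def translate(sentence):
    """Translate word by word, picking the most frequent aligned English word."""
    return " ".join(counts[f].most_common(1)[0][0] for f in sentence.split())


output = translate("Je vois un chat")
print(output)
```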
![Page 120: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/120.jpg)
What Happens in Practice
Gracias , lo haré de muy buen grado .
Gloss: Thanks , that do [first; future] of very good degree .
Thank you , I shall do so gladly .
A real word alignment (GIZA++ Model 4 with grow-diag-final combination)
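The grow-diag-final combination merges two directional word alignments: it starts from their intersection, grows toward neighboring links found in their union, and finally adds remaining directional links that touch an unaligned word. The following is a sketch of that heuristic as it is commonly described, not the GIZA++ or Moses code; the input alignments are invented.

```python
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final(e2f, f2e):
    """Symmetrize two directional alignments given as sets of (e, f) index pairs."""
    union = e2f | f2e
    alignment = set(e2f & f2e)

    def e_aligned(e):
        return any(pe == e for pe, _ in alignment)

    def f_aligned(f):
        return any(pf == f for _, pf in alignment)

    def addable(cand):
        # A link may be added if at least one of its two words is still unaligned.
        return cand not in alignment and (
            not e_aligned(cand[0]) or not f_aligned(cand[1])
        )

    # GROW-DIAG: repeatedly add union links adjacent to existing links.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in NEIGHBORS:
                cand = (e + de, f + df)
                if cand in union and addable(cand):
                    alignment.add(cand)
                    added = True

    # FINAL: add leftover directional links that cover an unaligned word.
    for directional in (e2f, f2e):
        for cand in sorted(directional):
            if addable(cand):
                alignment.add(cand)
    return alignment


# Invented directional alignments over a three-word sentence pair.
e2f = {(0, 0), (1, 1), (2, 1)}
f2e = {(0, 0), (1, 1), (1, 2)}
combined = grow_diag_final(e2f, f2e)
print(sorted(combined))
```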
![Page 122: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/122.jpg)
What Happens in Practice
Gracias , lo haré de muy buen grado .
Thank you , I shall do so gladly .
A real word alignment (GIZA++ Model 4 with grow-diag-final combination)
A sampled phrase alignment (our system)
![Page 124: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/124.jpg)
Example Machine Translation Pipeline
![Page 125: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/125.jpg)
A Machine Translation Pipeline
Phrase Model Training (Moses)
Example from CMU INCA System (Vogel et al.)
![Page 126: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/126.jpg)
Example Syntax-Based Translation
New Arabic v5.1 base system - sentence 211 Generated by Jens-S. Vöckler 2008-04-10 21:29
[ara-tune4600:211] 1-best PoS-Tree
[Figure: parse tree of the 1-best output. Tagged leaves: al/NNP @-@/HYPH baz/NNP declined/VBD to/TO give/VB any/DT statements/NNS upon/IN his/PRP$ arrival/NN in/IN the/DT province/NN ./. Internal node labels include NML, NPB, NP-C, PP, VP, VP-C, SG-C, S-BAR, S, GLUE, TOP.]
New Arabic v5.1 base system - sentence 211
foreign:
tac-lang: urfD albaz aladla’ baá tSryHat fur uSulh alá almqaT‘e .
bckwltr: wrfD AlbAz AlAdlA’ bAY tSryHAt fwr wSwlh AlY AlmqATEp .
Tune.nw.0: al @-@ baz declined to make any statements upon his arrival in the province .
Tune.nw.1: al @-@ baz refused to give any statements on arriving at al @-@ muqataah .
Tune.nw.2: immediately upon his arrival in the area , al @-@ baz declined to give any statements .
Tune.nw.3: al @-@ baz refused to make any statement upon his arrival at the moqata’ah .
1-best: al @-@ baz declined to give any statements upon his arrival in the province .
[ara-tune4600:211] 1-best Dot Product

| feature | weight | value | product |
| --- | --- | --- | --- |
| derivation-size | 0.41 | 8 | 3.30 |
| glue-rule | 3.89 | 2 | 7.78 |
| green | -0.08 | 0 | 0 |
| gt_prob | 0.40 | 36.18 | 14.43 |
| identity | -9.97 | 0 | 0 |
| is_lexicalized | -0.65 | 6 | -3.91 |
| lex_pef | 1.02 | 5.47 | 5.60 |
| lex_pfe | 0.31 | 4.44 | 1.39 |
| lm1 | 1 | 22.76 | 22.76 |
| lm1-unk | 30.08 | 0 | 0 |
| lm2 | 0.74 | 26.66 | 19.79 |
| lm2-unk | -39.18 | 0 | 0 |
| missingWord | -1.29 | 0 | 0 |
| model1inv | 1.02 | 10.60 | 10.81 |
| model1nrm | 1.35 | 11.29 | 15.22 |
| nonmonotone | 4.17 | 0 | 0 |
| olive | 1.95 | 0 | 0 |
| psm1n | 0.50 | 24.65 | 12.30 |
| text-length | -3.87 | 15 | -58.05 |
| trivial_cond_prob | 0.41 | 3.34 | 1.38 |
| unk-rule | 19.28 | 0 | 0 |

reported totalcost: 52.82
v · w: 52.82
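The "1-best Dot Product" listing above scores the derivation as a weighted sum of feature values. The sketch below recomputes that sum from the weights and values printed on the slide. Because the printed weights are rounded to two decimals, the result only approximates the reported total cost of 52.82.

```python
# (weight, value) per feature, copied from the slide's feature listing.
features = {
    "derivation-size": (0.41, 8),
    "glue-rule": (3.89, 2),
    "green": (-0.08, 0),
    "gt_prob": (0.40, 36.18),
    "identity": (-9.97, 0),
    "is_lexicalized": (-0.65, 6),
    "lex_pef": (1.02, 5.47),
    "lex_pfe": (0.31, 4.44),
    "lm1": (1, 22.76),
    "lm1-unk": (30.08, 0),
    "lm2": (0.74, 26.66),
    "lm2-unk": (-39.18, 0),
    "missingWord": (-1.29, 0),
    "model1inv": (1.02, 10.60),
    "model1nrm": (1.35, 11.29),
    "nonmonotone": (4.17, 0),
    "olive": (1.95, 0),
    "psm1n": (0.50, 24.65),
    "text-length": (-3.87, 15),
    "trivial_cond_prob": (0.41, 3.34),
    "unk-rule": (19.28, 0),
}

# Model score: dot product of the weight vector and the feature-value vector.
total_cost = sum(weight * value for weight, value in features.values())
print(round(total_cost, 2))
```

The single largest term is the text-length feature, whose negative weight offsets most of the positive language-model and translation-model contributions.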
![Page 127: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/127.jpg)
Automatic Translation Evaluation
• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants — most common is BLEU
Reference: Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement
![Page 128: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/128.jpg)
Automatic Translation Evaluation
• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants — most common is BLEU
Reference: Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement
4-gram precision: 2/5
![Page 129: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/129.jpg)
Automatic Translation Evaluation
• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants — most common is BLEU
Reference: Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement
4-gram precision: 2/5
3-gram precision: 3/6
![Page 130: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/130.jpg)
Automatic Translation Evaluation
• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants — most common is BLEU
Reference: Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement
4-gram precision: 2/5
3-gram precision: 3/6
2-gram precision: 5/7
1-gram precision: 7/8
![Page 131: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/131.jpg)
Automatic Translation Evaluation
• Scores how similar an automatically generated hypothesis is to human-generated references
• Dozens of variants — most common is BLEU
Reference: Al - baz declined to make any statement
Hypothesis: Al - baz declined to give any statement
4-gram precision: 2/5
3-gram precision: 3/6
2-gram precision: 5/7
1-gram precision: 7/8
Systems are trained to optimize this metric
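The four fractions on this slide are the n-gram precisions of the hypothesis against the reference for n = 4, 3, 2 and 1. The sketch below recomputes them and combines them in the usual BLEU fashion (geometric mean times a brevity penalty) for a single sentence and a single reference. Real BLEU is computed over a whole test set, so the final number here is illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(hypothesis, reference, n):
    """Return (matched, total) hypothesis n-grams, clipping by reference counts."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched, sum(hyp.values())


reference = "Al - baz declined to make any statement".split()
hypothesis = "Al - baz declined to give any statement".split()

precisions = {n: ngram_precision(hypothesis, reference, n) for n in range(1, 5)}

# Brevity penalty: 1 when the hypothesis is at least as long as the reference.
c, r = len(hypothesis), len(reference)
brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)

# Geometric mean of the four precisions (all are nonzero in this example).
log_mean = sum(math.log(m / t) for m, t in precisions.values()) / 4
bleu = brevity_penalty * math.exp(log_mean)

for n, (matched, total) in precisions.items():
    print(f"{n}-gram precision: {matched}/{total}")
print(f"sentence-level BLEU: {bleu:.4f}")
```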
![Page 132: Machine Translation - Coursescourses.ischool.berkeley.edu/i256/f09/lectures/anlp...Machine translation is much lower cost, much faster, and much easier to access than convetional translation](https://reader030.vdocuments.us/reader030/viewer/2022041119/5f30bae0f93ef749087058a6/html5/thumbnails/132.jpg)
Integrating MT into Other Systems
• Speech-to-speech translation
• Cross-lingual information retrieval
• Translated optical character recognition
• Mobile device integration
• Text-oriented web services of all kinds