Large-Context Models for Large-Scale Machine Translation
John DeNero
Dissertation Talk
Statistical Language Systems are Working
“People use [Google Translate] hundreds of millions of times a week.” [2]
Google spent $5.6 billion on infrastructure in the last 3 years [1]
How?
Data!
[1] Google.com annual report of capital expenditure, “the majority of which was related to IT infrastructure investments.”
[2] “Google’s Computing Power Refines Translation Tool,” New York Times, 9 March 2010, Technology Section.
The Many Use(r)s of Machine Translation
Assimilation, Dissemination, Communication
• Document translation
• Broadcast monitoring
• Intelligence gathering
73%: Most Internet users can’t read the English Web
machine translated
John: I’m in Berkeley
Juan: Estoy en Berkeley
• Emergency room triage
• Military deployments
• Multilingual education
• 9-1-1 Response
• Commerce with tourists
婆婆 [Grandma-in-law]
John
Data-Driven Machine Translation
Parallel corpus gives translation examples
Yo lo haré de muy buen grado
I will do it gladly
Después lo verás
You will see later
Target language corpus gives examples of well-formed sentences
I will get to it later
See you later
He will do it
Machine translation system: Model of translation
Novel sentence (source language): Yo lo haré después
Target language: I will do it later
Stitching Together Fragments
Parallel corpus gives translation examples
Yo lo haré de muy buen grado
I will do it gladly
Después lo verás
You will see later
[Figure: two parse trees with node labels S, NP, VP, MD, VB, PRP, ADV; fragment pairs labeled S/S and ADV/ADV]
Machine translation system: Model of translation
Yo lo haré después
I will do it later
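As a rough illustration of stitching fragments together, the sketch below pairs a source fragment with a target fragment under a shared label and expands both sides in parallel. The VP and ADV pairs follow fragments shown in the talk; the S rule pairing "Yo" with "I", the one-rule-per-label simplification, and all Python names are assumptions made for this example, not the system described in the talk.

```python
# Each label maps to one (source fragment, target fragment) pair.
# Symbols that are themselves labels are gaps filled by another rule.
RULES = {
    "S": (["Yo", "VP"], ["I", "VP"]),                        # assumed for the example
    "VP": (["lo", "haré", "ADV"], ["will", "do", "it", "ADV"]),
    "ADV": (["después"], ["later"]),
}


def expand(symbol, side):
    """Expand a symbol into words on one side (0 = source, 1 = target)."""
    if symbol not in RULES:
        return [symbol]  # a plain word, not a label
    words = []
    for child in RULES[symbol][side]:
        words.extend(expand(child, side))
    return words


source = " ".join(expand("S", 0))
target = " ".join(expand("S", 1))
print(source, "->", target)
```

A real system would hold many competing rules per label and would have to search over them; this sketch fixes a single derivation to show how paired fragments compose.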
An Example Syntax-Based Translation
New Arabic v5.1 base system - sentence 211 (generated by Jens-S. Vöckler, 2008-04-10 21:29)

Arabic source sentence:
foreign:
tac-lang: urfD albaz aladla’ baá tSryHat fur uSulh alá almqaT‘e .
bckwltr: wrfD AlbAz AlAdlA’ bAY tSryHAt fwr wSwlh AlY AlmqATEp .

Reference translation from a human translator:
Al-baz declined to make any statements upon his arrival in the province

Tune.nw.0: al @-@ baz declined to make any statements upon his arrival in the province .
Tune.nw.1: al @-@ baz refused to give any statements on arriving at al @-@ muqataah .
Tune.nw.2: immediately upon his arrival in the area , al @-@ baz declined to give any statements .
Tune.nw.3: al @-@ baz refused to make any statement upon his arrival at the moqata’ah .
1-best: al @-@ baz declined to give any statements upon his arrival in the province .

[ara-tune4600:211] 1-best PoS-Tree
al/NNP @-@/HYPH baz/NNP declined/VBD to/TO give/VB any/DT statements/NNS upon/IN his/PRP$ arrival/NN in/IN the/DT province/NN ./.
[Figure: parse tree over the 1-best output; internal node labels include NML, NPB, NP-C, PP, VP-C, VP, SG-C, S-BAR, S, GLUE, TOP]

Features:
[ara-tune4600:211] 1-best Dot Product
feature             weight   value   product
derivation-size       0.41       8      3.30
glue-rule             3.89       2      7.78
green                -0.08       0      0
gt_prob               0.40   36.18     14.43
identity             -9.97       0      0
is_lexicalized       -0.65       6     -3.91
lex_pef               1.02    5.47      5.60
lex_pfe               0.31    4.44      1.39
lm1                   1      22.76     22.76
lm1-unk              30.08       0      0
lm2                   0.74   26.66     19.79
lm2-unk             -39.18       0      0
missingWord          -1.29       0      0
model1inv             1.02   10.60     10.81
model1nrm             1.35   11.29     15.22
nonmonotone           4.17       0      0
olive                 1.95       0      0
psm1n                 0.50   24.65     12.30
text-length          -3.87      15    -58.05
trivial_cond_prob     0.41    3.34      1.38
unk-rule             19.28       0      0
reported totalcost 52.82; v · w 52.82
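The feature table on this slide scores the 1-best translation as a dot product v · w of feature values and weights. The sketch below recomputes that sum from the weight and value columns as printed. The function and variable names are assumptions for the example. Because the printed weights are rounded, the recomputed total lands close to, but not exactly on, the reported 52.82.

```python
# (feature, weight, value) rows copied from the slide's feature table.
FEATURES = [
    ("derivation-size", 0.41, 8),
    ("glue-rule", 3.89, 2),
    ("green", -0.08, 0),
    ("gt_prob", 0.40, 36.18),
    ("identity", -9.97, 0),
    ("is_lexicalized", -0.65, 6),
    ("lex_pef", 1.02, 5.47),
    ("lex_pfe", 0.31, 4.44),
    ("lm1", 1, 22.76),
    ("lm1-unk", 30.08, 0),
    ("lm2", 0.74, 26.66),
    ("lm2-unk", -39.18, 0),
    ("missingWord", -1.29, 0),
    ("model1inv", 1.02, 10.60),
    ("model1nrm", 1.35, 11.29),
    ("nonmonotone", 4.17, 0),
    ("olive", 1.95, 0),
    ("psm1n", 0.50, 24.65),
    ("text-length", -3.87, 15),
    ("trivial_cond_prob", 0.41, 3.34),
    ("unk-rule", 19.28, 0),
]


def model_score(features):
    """Return the dot product v · w of feature values and weights."""
    return sum(weight * value for _, weight, value in features)


total = model_score(FEATURES)
print(f"recomputed v · w = {total:.2f}; slide reports 52.82")
```

Note how the single large negative term (text-length) offsets the language-model and translation-model terms; every feature with value 0 contributes nothing regardless of its weight.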
The Steps in a Modern Translation System
Learn a model
Proceedings
United Nations
VP: will do it ADV
lo haré ADV
Apply the model
Novel sentence: Yo lo haré después
Later do it I will
I will later do it
That I’ll do later
Later that I’ll do
I will do it later
...
Choose a translation
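To make the "choose a translation" step concrete, here is a minimal sketch that ranks the candidate outputs listed on this slide by how many of their word bigrams are attested in the target-language examples from the Data-Driven Machine Translation slide. Counting attested bigrams is a crude stand-in for a language model, chosen only for illustration; the talk does not say the system scores candidates this way, and all names below are assumptions.

```python
# Target-language examples quoted earlier in the talk.
TARGET_CORPUS = [
    "I will get to it later",
    "See you later",
    "He will do it",
]

# Candidate translations of "Yo lo haré después" listed on the slide.
CANDIDATES = [
    "Later do it I will",
    "I will later do it",
    "That I'll do later",
    "Later that I'll do",
    "I will do it later",
]


def bigrams(sentence):
    """Return the lowercased adjacent word pairs of a sentence."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))


# Every bigram observed anywhere in the target-language corpus.
SEEN = {bg for sentence in TARGET_CORPUS for bg in bigrams(sentence)}


def fluency(candidate):
    """Count the candidate's bigrams that are attested in the corpus."""
    return sum(bg in SEEN for bg in bigrams(candidate))


best = max(CANDIDATES, key=fluency)
print(best, fluency(best))
```

In a full system this fluency signal would be one weighted feature among many rather than the sole criterion.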
The Alignment Problem in Translation
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
About the task:
• A lot can be inferred from lexical statistics
• Correct alignments are not one-to-one
• Some cases are tricky, even for people
About solutions:
• Word-to-word links
• Learning driven by conditional word distributions
P(gracias|you)
[Figure: alignment grid highlighting Thank you / Gracias and Thank you / muy buen]
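One standard way to learn conditional word distributions such as P(gracias|you) from sentence pairs is expectation maximization in the style of IBM Model 1. The slide does not name a specific model, so treat the sketch below as an assumed instance of the idea rather than the talk's method. The three-sentence corpus is invented for illustration, there is no NULL word, and all names are hypothetical.

```python
from collections import defaultdict

# Toy parallel corpus (foreign words, English words); invented for illustration.
CORPUS = [
    (["gracias"], ["thank", "you"]),
    (["gracias", "juan"], ["thank", "you", "john"]),
    (["juan"], ["john"]),
]


def train_word_model(corpus, iterations=20):
    """Estimate t[(f, e)] = P(f | e) with EM over word-to-word links."""
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform distribution over foreign words.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected link counts for (f, e)
        total = defaultdict(float)  # expected link counts for e
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm  # E-step: P(f links to e)
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


t = train_word_model(CORPUS)
print(round(t[("juan", "john")], 3))
print(round(t[("gracias", "thank")], 3), round(t[("gracias", "you")], 3))
```

Because "thank" and "you" always occur together in this toy corpus, the word-level model gives "gracias" the same probability under both of them. That is a small instance of the slide's point that correct alignments are not one-to-one.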
Large-Context Alignment Challenges
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
Goal: Model multi-word structures during alignment
Challenge 1
• Jointly infer phrase boundaries and alignments
• Boundaries depend on both languages
P(gracias|you)
P(gracias, Thank you)
Challenge 2
• Capture context
• Compose phrases
(lo hare, I will do it)
![Page 47: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/47.jpg)
Modeling Phrasal Correspondence1
Train a generative model that explains observed translations via latent structure
Paradigm:
![Page 48: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/48.jpg)
Modeling Phrasal Correspondence
Paradigm: Train a generative model that explains observed translations via latent structure
Process: Phrase pairs are generated independently
![Page 49: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/49.jpg)
Modeling Phrasal Correspondence
Paradigm: Train a generative model that explains observed translations via latent structure
Process: Phrase pairs are generated independently
Thank you , I will do it gladly
Gracias , lo haré de muy buen grado
![Page 55: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/55.jpg)
Modeling Phrasal Correspondence
Paradigm: Train a generative model that explains observed translations via latent structure
Process: Phrase pairs are generated independently
Optimization: Explain all translations with shared parameters
Thank you , I will do it gladly
Gracias , lo haré de muy buen grado
![Page 56: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/56.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
![Page 57: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/57.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *
* Terms omitted: Phrase pair count and phrase permutation
![Page 58: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/58.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *
$L(\theta) = \prod_{d \in D} \left[ \sum_{a \in A(d)} P(A = a) \right]$
* Terms omitted: Phrase pair count and phrase permutation
![Page 59: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/59.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *
$L(\theta) = \prod_{d \in D} \left[ \sum_{a \in A(d)} P(A = a) \right]$
(the product runs over each sentence pair)
* Terms omitted: Phrase pair count and phrase permutation
![Page 60: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/60.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *
$L(\theta) = \prod_{d \in D} \left[ \sum_{a \in A(d)} P(A = a) \right]$
(the product runs over each sentence pair; the sum runs over each alignment)
* Terms omitted: Phrase pair count and phrase permutation
![Page 61: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/61.jpg)
Modeling Phrasal Correspondence
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$
We learn $\theta$, a multinomial distribution over phrase pairs
$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *
$L(\theta) = \prod_{d \in D} \left[ \sum_{a \in A(d)} P(A = a) \right]$
(the product runs over each sentence pair; the sum runs over each alignment)
Maximizing likelihood gives a degenerate solution: huge phrases!
* Terms omitted: Phrase pair count and phrase permutation
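The degenerate maximum-likelihood solution can be illustrated with a toy sketch. The two candidate alignments and every parameter value below are invented for illustration, and, as on the slide, the phrase pair count and permutation terms are omitted.

```python
from math import prod

def alignment_prob(alignment, theta):
    """P(A = a): product of theta(e, s) over the phrase pairs in a."""
    return prod(theta.get(pair, 0.0) for pair in alignment)

def likelihood(corpus, theta):
    """L(theta): product over sentence pairs of the sum over their alignments."""
    return prod(sum(alignment_prob(a, theta) for a in alignments)
                for alignments in corpus)

# Hypothetical toy corpus: one sentence pair with two candidate alignments.
fine = (("Thank you", "Gracias"), ("gladly", "de muy buen grado"))
whole = (("Thank you gladly", "Gracias de muy buen grado"),)
corpus = [[fine, whole]]

# Probability mass spread over two short, reusable phrase pairs.
theta_short = {fine[0]: 0.5, fine[1]: 0.5}
# All probability mass on a single sentence-sized phrase pair.
theta_huge = {whole[0]: 1.0}

print(likelihood(corpus, theta_short))  # 0.25
print(likelihood(corpus, theta_huge))   # 1.0
```

In this toy setting the sentence-sized phrase pair reaches the highest possible likelihood, even though it would never match a new sentence.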
![Page 63: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/63.jpg)
Guiding Phrasal Correspondence Models
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$\theta \sim \mathrm{DP}(\theta_0, \alpha)$
Base distribution $\theta_0$: Prefers short phrases
Dirichlet process $\mathrm{DP}(\,\cdot\,, \alpha)$: Non-parametric cache model
Phrase Pair Cache (c): English-Spanish phrase pair, Count
(Thank you, Gracias)
(Thanks, Gracias)
(Thank you, Muchas gracias)
...
![Page 67: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/67.jpg)
Guiding Phrasal Correspondence Models
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$\theta \sim \mathrm{DP}(\theta_0, \alpha)$
Base distribution $\theta_0$: Prefers short phrases
Dirichlet process $\mathrm{DP}(\,\cdot\,, \alpha)$: Non-parametric cache model
Phrase Pair Cache (c): English-Spanish phrase pair, Count
(Thank you, Gracias)
(Thanks, Gracias)
(Thank you, Muchas gracias)
...
$P(z \mid c) = \dfrac{c(z) + \alpha \cdot \theta_0(z)}{|c| + \alpha}$
![Page 70: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/70.jpg)
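Guiding Phrasal Correspondence Models
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$\theta \sim \mathrm{DP}(\theta_0, \alpha)$
Base distribution $\theta_0$: Prefers short phrases
Dirichlet process $\mathrm{DP}(\,\cdot\,, \alpha)$: Non-parametric cache model
Phrase Pair Cache (c): English-Spanish phrase pair, Count
(Thank you, Gracias)
(Thanks, Gracias)
(Thank you, Muchas gracias)
...
$P(z \mid c) = \dfrac{c(z) + \alpha \cdot \theta_0(z)}{|c| + \alpha}$
(annotations on the formula: the cache counts grow; the prior term $\alpha \cdot \theta_0(z)$ is fixed)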
![Page 71: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/71.jpg)
Guiding Phrasal Correspondence Models
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
$\theta \sim \mathrm{DP}(\theta_0, \alpha)$
Base distribution $\theta_0$: Prefers short phrases
Dirichlet process $\mathrm{DP}(\,\cdot\,, \alpha)$: Non-parametric cache model
Phrase Pair Cache (c): English-Spanish phrase pair, Count
(Thank you, Gracias)
(Thanks, Gracias)
(Thank you, Muchas gracias)
...
$P(z \mid c) = \dfrac{c(z) + \alpha \cdot \theta_0(z)}{|c| + \alpha}$
(annotations on the formula: the cache counts grow; the prior term $\alpha \cdot \theta_0(z)$ is fixed)
Iterative realignment of all the data by sampling
Consistent, efficient estimation
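The cache formula can be tried out with a small sketch. The cache counts, the value of alpha, and the length-based `base_prob` function are all assumptions made for illustration. In particular, `base_prob` is an unnormalized stand-in that only mimics a preference for short phrases and is not the base distribution used in the talk.

```python
from collections import Counter

def base_prob(pair, decay=0.5):
    """Hypothetical stand-in for the base distribution: shrinks with phrase length."""
    english, spanish = pair
    return decay ** (len(english.split()) + len(spanish.split()))

def cache_prob(pair, cache, alpha):
    """P(z | c) = (c(z) + alpha * base(z)) / (|c| + alpha)."""
    total = sum(cache.values())  # |c|: number of phrase pairs in the cache
    return (cache[pair] + alpha * base_prob(pair)) / (total + alpha)

# Invented counts; the counts on the slide did not survive in the transcript.
cache = Counter({("Thank you", "Gracias"): 3, ("Thanks", "Gracias"): 1})

seen = cache_prob(("Thank you", "Gracias"), cache, alpha=1.0)
unseen = cache_prob(("gladly", "de muy buen grado"), cache, alpha=1.0)
print(seen, unseen)
```

A pair already in the cache is scored mostly by its count, while an unseen pair falls back on the base term, so long unseen phrases receive little probability.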
![Page 72: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/72.jpg)
What Happens in Practice
Thank you , I shall do so gladly .
Gracias , lo haré de muy buen grado .
[Figure: two alignment grids for this sentence pair, one showing a state-of-the-art word-level alignment and one showing a sampled phrase alignment from our system]
![Page 74: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/74.jpg)
Performance Results
Translation performance in a phrase-based system (Moses) for Spanish-to-English parliamentary proceedings (Europarl)

| Model | Translation quality (BLEU), higher is better | Millions of phrase pairs, lower is cheaper |
| --- | --- | --- |
| Word-level baseline | 29.8 | 4.4 |
| Phrase-level model [DeNero et al. EMNLP ’08]* | 30.1 | 3.1 |

* John DeNero, Alex Bouchard-Côté, and Dan Klein. Sampling Alignment Structure under a Bayesian Translation Model, EMNLP 2008.
![Page 75: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/75.jpg)
Subsequent Work
We described a non-parametric Bayesian prior and a consistent sampling procedure (EMNLP 2008)
• Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. A Gibbs sampler for phrasal synchronous grammar induction, ACL 2009.
• Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Monte Carlo inference and maximization for phrase-based translation, CoNLL 2009.
• Ding Liu and Daniel Gildea. Bayesian Learning of Phrasal Tree-to-String Templates, EMNLP 2009.
• Matt Post and Daniel Gildea. Bayesian Learning of a Tree Substitution Grammar, ACL 2009.
• Phil Blunsom and Trevor Cohn. Inducing Synchronous Grammars with Slice Sampling, NAACL 2010.
• Trevor Cohn and Phil Blunsom. A Bayesian Model of Syntax-Directed Tree to String Grammar Induction, EMNLP 2009.
![Page 76: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/76.jpg)
A Model of Composed Phrases
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
![Page 80: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/80.jpg)
A Model of Composed Phrases
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
A model can predict the whole analysis above, including minimal links & composed phrase pairs.
![Page 81: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/81.jpg)
A Model of Composed Phrases
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
A model can predict the whole analysis above, including minimal links & composed phrase pairs.
Features on word links: < P(in|中) = 0.8, InDictionary = 1.0, ... >
![Page 82: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/82.jpg)
A Model of Composed Phrases
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
A model can predict the whole analysis above, including minimal links & composed phrase pairs.
Features on word links: < P(in|中) = 0.8, InDictionary = 1.0, ... >
Features on phrase pairs: < Count(the past two, 过去 两) = 7, Size(3,2) = 1, ... >
![Page 83: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/83.jpg)
Learning from Supervised Data
过去 [past]
两 [two]
年 [year]
中 [in]
Guess: Model Prediction
In the past two years
![Page 84: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/84.jpg)
Learning from Supervised Data
过去 [past]
两 [two]
年 [year]
中 [in]
Guess: Model Prediction
In the past two years
Gold: Human Annotation
In the past two years
![Page 87: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/87.jpg)
Learning from Supervised Data
过去 [past]
两 [two]
年 [year]
中 [in]
Guess: Model Prediction
In the past two years
Gold: Human Annotation
In the past two years
Loss function: Number of differing rounded rectangles
![Page 88: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/88.jpg)
Learning from Supervised Data
过去 [past]
两 [two]
年 [year]
中 [in]
Guess: Model Prediction
In the past two years
Gold: Human Annotation
In the past two years
Loss function: Number of differing rounded rectangles
Online learning (MIRA) adjusts model parameters to prefer the gold over the guess by a margin of the loss
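A margin update of this kind can be sketched as follows. This is the common single-constraint form of MIRA, which may differ from the variant used in the talk. The feature names, the `max_step` cap, and the representation of each rounded rectangle as a hashable item are assumptions made for illustration.

```python
def structure_loss(gold, guess):
    """Count the items (links or phrase pairs) found in exactly one analysis."""
    return len(set(gold) ^ set(guess))

def mira_update(weights, gold_feats, guess_feats, loss, max_step=1.0):
    """Return weights moved just enough for gold to beat guess by the loss."""
    keys = set(gold_feats) | set(guess_feats)
    diff = {k: gold_feats.get(k, 0.0) - guess_feats.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(weights)  # identical feature vectors: nothing to adjust
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())
    # Step size: zero if the margin already covers the loss, capped at max_step.
    step = min(max_step, max(0.0, loss - margin) / norm_sq)
    updated = dict(weights)
    for k, v in diff.items():
        updated[k] = updated.get(k, 0.0) + step * v
    return updated

# Hypothetical analyses, each a set of (English, Chinese) links.
gold = {("in", "中"), ("past", "过去")}
guess = {("in", "中"), ("the", "过去")}
loss = structure_loss(gold, guess)  # two items differ

new_weights = mira_update({}, {"gold_only": 1.0}, {"guess_only": 1.0}, loss)
print(loss, new_weights)
```

After the update in this toy case, the gold analysis outscores the guess by exactly the loss.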
![Page 89: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/89.jpg)
Finding the Optimal Correspondence
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
$\arg\max_{y \in \mathrm{ITG}(x)} \; \theta \cdot [\, \phi_{\text{word}}(x, y) + \phi_{\text{phrase}}(x, y) \,]$
![Page 90: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/90.jpg)
Finding the Optimal Correspondence
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
Hierarchical decomposition
$\arg\max_{y \in \mathrm{ITG}(x)} \; \theta \cdot [\, \phi_{\text{word}}(x, y) + \phi_{\text{phrase}}(x, y) \,]$
![Page 99: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/99.jpg)
Finding the Optimal Correspondence
In the past two years
过去 [past]
两 [two]
年 [year]
中 [in]
Hierarchical decomposition
ITG parser with a state space that tracks peripheral alignments for each region
$\arg\max_{y \in \mathrm{ITG}(x)} \; \theta \cdot [\, \phi_{\text{word}}(x, y) + \phi_{\text{phrase}}(x, y) \,]$
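The objective scores each candidate analysis with a linear model over word-link and phrase-pair features. The sketch below scores a small explicit list of candidates by brute force, which is a simplification: it does not implement the ITG parser the slide describes. The weights and the candidate feature vectors are invented, and only the feature names echo the earlier slide.

```python
def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def score(weights, candidate):
    """Weights applied to word-link features plus phrase-pair features."""
    return dot(weights, candidate["word"]) + dot(weights, candidate["phrase"])

def best_analysis(weights, candidates):
    """Brute-force argmax over an explicit list (stands in for the ITG search)."""
    return max(candidates, key=lambda c: score(weights, c))

# Hypothetical weights and candidates.
weights = {"InDictionary": 1.0, "Size(3,2)": -0.5}
candidates = [
    {"name": "A", "word": {"InDictionary": 1.0}, "phrase": {"Size(3,2)": 1.0}},
    {"name": "B", "word": {}, "phrase": {"Size(3,2)": 1.0}},
]

print(best_analysis(weights, candidates)["name"])  # A
```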
![Page 100: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/100.jpg)
Experimental Results

| Model | Phrase Pair F1 | BLEU |
| --- | --- | --- |
| Unsupervised word model baseline | 64.1 | 34.5 |
| Supervised word model [Haghighi, Blitzer, DeNero, and Klein. ACL ’09]* | 68.4 | 34.7 |
| Composed Phrase Pair Model [DeNero and Klein. In submission]** | 71.6 | 35.9 |

Phrase Pair F1: Alignment quality relative to human-annotated data
BLEU: Translation quality for Chinese-to-English
* Aria Haghighi, John Blitzer, John DeNero, and Dan Klein. Better Word Alignments with Supervised ITG Models, ACL 2009.
** John DeNero and Dan Klein. Supervised Modeling of Extraction Sets for Machine Translation, in submission.
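An F1 score over phrase pairs is typically computed as sketched below, assuming the predicted and gold phrase pairs are available as sets. The example pairs are invented, and the exact evaluation protocol behind the reported numbers is not given on the slide.

```python
def phrase_pair_f1(predicted, gold):
    """Harmonic mean of precision and recall over two sets of phrase pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical phrase pairs: two of the three predictions match the gold set.
predicted = {("past", "过去"), ("two", "两"), ("in the", "中")}
gold = {("past", "过去"), ("two", "两"), ("in", "中")}

print(phrase_pair_f1(predicted, gold))
```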
![Page 101: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/101.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
![Page 102: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/102.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
Large data sets provide statistics for larger structures
![Page 103: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/103.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
Non-parametric models scale with the data
Large data sets provide statistics for larger structures
![Page 104: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/104.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
Non-parametric models scale with the data
Large data sets provide statistics for larger structures
The more context we incorporate, the better we do
![Page 105: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/105.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
![Page 112: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/112.jpg)
Extracting Translation Rules
Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .
[Figure: parse tree over the English sentence, with node labels S, NP, VP, PRP, MD, VB, ADV]
Extracted rule: VP → ⟨will do it ADV, lo haré ADV⟩
Frequency statistics on these rules guide translation
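As a rough illustration of how frequency statistics over extracted rules can become scores, the sketch below estimates rule probabilities by relative frequency. Only the rule pairing "lo haré ADV" with "will do it ADV" comes from the slide. The other rules, all counts, and the choice of relative-frequency estimation are assumptions made for illustration, not a description of the system in the talk.

```python
from collections import Counter

# Hypothetical extracted rules as (lhs, spanish_side, english_side).
# Only the first rule appears on the slide; the rest are invented.
extracted_rules = [
    ("VP", "lo haré ADV", "will do it ADV"),
    ("VP", "lo haré ADV", "will do it ADV"),
    ("VP", "lo haré ADV", "shall do it ADV"),
    ("NP", "yo", "I"),
]


def relative_frequencies(rules):
    """Estimate P(english_side | lhs, spanish_side) by relative frequency."""
    rule_counts = Counter(rules)
    source_counts = Counter((lhs, src) for lhs, src, _ in rules)
    return {
        rule: count / source_counts[(rule[0], rule[1])]
        for rule, count in rule_counts.items()
    }


probs = relative_frequencies(extracted_rules)
for (lhs, src, tgt), p in sorted(probs.items()):
    print(f"{lhs} -> <{src} ; {tgt}>  p = {p:.2f}")
```

With these invented counts, the rule seen twice gets probability 2/3 and its competitor 1/3.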
![Page 123: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/123.jpg)
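Synchronous Context-Free Grammars
Source: Mi dormitorio nuevo no es ni grande ni pequeño
Grammar:
NN → ⟨dormitorio, bedroom⟩
JJ → ⟨nuevo, new⟩
JJ → ⟨grande, big⟩
JJ → ⟨pequeño, small⟩
NP → ⟨Mi NN JJ, My JJ NN⟩
S → ⟨NP no es ni JJ ni JJ, NP is neither JJ nor JJ⟩
Derivation:
[Figure: synchronous derivation. Source side: S over "NP no es ni JJ ni JJ", with NP over "Mi NN JJ". Target side: S over "NP is neither JJ nor JJ", with NP over "My JJ NN" and leaves new, bedroom, big, small.]
Translation: My new bedroom is neither big nor small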
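The derivation on this slide can be replayed in a few lines. This is a minimal sketch that expands a hand-written derivation into its source and target strings. It is not a parser or decoder. The rule names (`r_np`, `r_s`, and so on) and the use of integers to link non-terminals across the two sides are assumptions of this sketch.

```python
# Each rule: (lhs, source_side, target_side). Integers are child indices,
# so the same child can appear in a different position on each side.
RULES = {
    "r_nn": ("NN", ["dormitorio"], ["bedroom"]),
    "r_new": ("JJ", ["nuevo"], ["new"]),
    "r_big": ("JJ", ["grande"], ["big"]),
    "r_small": ("JJ", ["pequeño"], ["small"]),
    # children: 0 = NN, 1 = JJ; the target side swaps them
    "r_np": ("NP", ["Mi", 0, 1], ["My", 1, 0]),
    # children: 0 = NP, 1 = first JJ, 2 = second JJ
    "r_s": ("S", [0, "no", "es", "ni", 1, "ni", 2],
            [0, "is", "neither", 1, "nor", 2]),
}


def yields(derivation):
    """Return (source_tokens, target_tokens) for a derivation tree."""
    name, children = derivation
    _, source_side, target_side = RULES[name]
    child_yields = [yields(child) for child in children]

    def expand(side, which):
        tokens = []
        for symbol in side:
            if isinstance(symbol, int):
                tokens.extend(child_yields[symbol][which])
            else:
                tokens.append(symbol)
        return tokens

    return expand(source_side, 0), expand(target_side, 1)


derivation = ("r_s", [
    ("r_np", [("r_nn", []), ("r_new", [])]),
    ("r_big", []),
    ("r_small", []),
])
source, target = yields(derivation)
print(" ".join(source))
print(" ".join(target))
```

The reordering of NN and JJ lives entirely in the NP rule, which is why a single derivation produces both word orders.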
![Page 125: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/125.jpg)
The Size of the Grammar
A grammar learned from 220 million words of Arabic-to-English example translations:
[Figure: histogram of rule count (y-axis, 0 to 90,000) by size of the source-side yield, terminals + non-terminals (x-axis, 1 to 10+)]
332,000 rules match a 30-word sentence to be translated
Example rule: S → ⟨NP no es ni JJ ni JJ ., NP is neither JJ nor JJ .⟩
John DeNero, Adam Pauls, Mohit Bansal, and Dan Klein. Efficient Parsing for Transducer Grammars, NAACL 2009.
![Page 130: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/130.jpg)
The Structure of the Grammar
Mi dormitorio nuevo no es ni grande ni pequeño
[Figure: the sentence with NP, JJ, JJ marked beneath it and the rules below drawn over it]
S → NP no es ni JJ ni JJ
ADJP → ni JJ ni JJ
VP → es ADJP
S → NP no VP
![Page 137: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/137.jpg)
Coarse-to-Fine Translation
1 Apply a subset of the grammar with only small rules
2 Prune away unlikely portions of the search space
3 Apply the full translation grammar to the pruned space
Mi dormitorio nuevo no es ni grande ni pequeño
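The three steps can be sketched on the example sentence, under heavy simplifying assumptions. The sketch works on the source side only and is unweighted. "Unlikely" is approximated by "the coarse pass built no such constituent over that span", where a real system would threshold on scores. The function names `matches` and `chart_pass` are hypothetical, and this is not the algorithm of the NAACL 2009 system. The rules are the ones shown on the slides.

```python
NONTERMINALS = {"NN", "JJ", "NP", "ADJP", "VP", "S"}

# Small rules (source side only), taken from the slides.
SMALL_RULES = [
    ("NN", ("dormitorio",)),
    ("JJ", ("nuevo",)),
    ("JJ", ("grande",)),
    ("JJ", ("pequeño",)),
    ("NP", ("Mi", "NN", "JJ")),
    ("ADJP", ("ni", "JJ", "ni", "JJ")),
    ("VP", ("es", "ADJP")),
    ("S", ("NP", "no", "VP")),
]
# One large rule from the slides; the full grammar is small + large rules.
LARGE_RULES = [("S", ("NP", "no", "es", "ni", "JJ", "ni", "JJ"))]


def matches(rhs, i, j, words, items):
    """Return True if the symbols in rhs can cover exactly words[i:j]."""
    if not rhs:
        return i == j
    first, rest = rhs[0], rhs[1:]
    if first not in NONTERMINALS:  # terminal: must equal the next word
        return i < j and words[i] == first and matches(rest, i + 1, j, words, items)
    # non-terminal: try every end point k for which the item is in the chart
    return any(
        (first, i, k) in items and matches(rest, k, j, words, items)
        for k in range(i + 1, j + 1)
    )


def chart_pass(words, rules, allowed=None):
    """Bottom-up pass; if `allowed` is given, skip items outside it."""
    n = len(words)
    items, applications, attempts = set(), [], 0
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for lhs, rhs in rules:
                if allowed is not None and (lhs, i, j) not in allowed:
                    continue  # pruned: never try this rule here
                attempts += 1
                if matches(rhs, i, j, words, items):
                    items.add((lhs, i, j))
                    applications.append((lhs, rhs, i, j))
    return items, applications, attempts


words = "Mi dormitorio nuevo no es ni grande ni pequeño".split()
full_grammar = SMALL_RULES + LARGE_RULES

# 1. Coarse pass with small rules only.
coarse_items, _, _ = chart_pass(words, SMALL_RULES)
# 2. Prune: keep only the (symbol, start, end) items the coarse pass built.
# 3. Fine pass: full grammar, restricted to the surviving items.
_, fine_apps, pruned_attempts = chart_pass(words, full_grammar, allowed=coarse_items)
_, _, unpruned_attempts = chart_pass(words, full_grammar)

print(pruned_attempts, "rule attempts with pruning vs", unpruned_attempts, "without")
```

The fine pass still finds the large S rule over the whole sentence, but it only tries large rules where the coarse pass left an S item, which is where the savings come from.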
![Page 138: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/138.jpg)
Experimental Results
5x speed-up with the largest translation grammars in use today (ISI Syntax-Based MT System) [DeNero et al. NAACL ’09]*
[Figure: bar chart of minutes required to analyze a 300-sentence test set (axis 0 to 300), with bars for Baseline, +Optimization, +Transformation, and +Coarse-to-Fine. Bar values from slowest to fastest: 264, 181, 104, 50.]
* John DeNero, Adam Pauls, Mohit Bansal, and Dan Klein. Efficient Parsing for Transducer Grammars, NAACL 2009.
![Page 139: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/139.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
![Page 141: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/141.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
Fully exploiting large data sets requires searching over very large spaces
Coarse-to-fine inference is a powerful technique for doing so
![Page 142: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/142.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
![Page 149: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/149.jpg)
Even the Best Models are Wrong
NOVEL SENTENCE: Yo lo haré después
Model output:
I will later do it
Later do it I will
That I’ll do later
Later that I’ll do
...
[Figure: scatter plot of translation quality (BLEU, y-axis, 50.0 to 50.8) against total model score for 1000 sentences (x-axis, 511660 to 514830). Legend: samples from output space, samples near maximum, highest scoring translation.]
![Page 158: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/158.jpg)
Consensus by Averaging Over Sentences
Intuition: “Happy families are all alike; every unhappy family is unhappy in its own way.” [Tolstoy. 1877]*
Idea: Average over sentences to find the phrases that are alike. [DeNero et al. ACL ’09]**
NOVEL SENTENCE: Yo lo haré después
Model output, with an indicator for each phrase and the model probability of each translation:

| Translation | “Later” | “do” | “do it” | “I’ll” | “do later” | “do it I will” | ... | Probability |
|---|---|---|---|---|---|---|---|---|
| Later do it I will | 1 | 1 | 1 | 0 | 0 | 1 | ... | 0.12 |
| I will later do it | 1 | 1 | 1 | 0 | 0 | 0 | ... | 0.10 |
| That I’ll do later | 1 | 1 | 0 | 1 | 1 | 0 | ... | 0.07 |
| Later that I’ll do | 1 | 1 | 0 | 1 | 0 | 0 | ... | 0.07 |
| ... | | | | | | | | ... |
| Expected output | 0.97 | 0.98 | 0.54 | 0.41 | 0.34 | 0.12 | ... | 1.00 |

* Leo Tolstoy. Анна Каренина. 1877.
** John DeNero, David Chiang, and Kevin Knight. Fast Consensus Decoding over Translation Forests, ACL 2009.
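The averaging step can be tried on the four candidates shown here. This is a minimal sketch under stated assumptions. The slide's list is truncated, so the probabilities are renormalized over the four listed candidates and the expectations differ from the slide's. Each probability is paired with the sentence whose phrases match its indicator row. Phrase matching is case-insensitive with a straight apostrophe. The consensus choice is the candidate closest to the expected phrase vector in squared distance, which stands in for the similarity measure of the cited paper.

```python
PHRASES = ["later", "do", "do it", "i'll", "do later", "do it i will"]

# Candidates and model probabilities from the slide (truncated list there).
# Sentence/probability pairing is inferred from the phrase indicator rows.
candidates = [
    ("Later do it I will", 0.12),
    ("I will later do it", 0.10),
    ("That I'll do later", 0.07),
    ("Later that I'll do", 0.07),
]


def contains(tokens, phrase):
    """True if the phrase occurs as a contiguous token sequence."""
    p = phrase.split()
    return any(tokens[i:i + len(p)] == p for i in range(len(tokens) - len(p) + 1))


def features(sentence):
    """Phrase indicator vector for one candidate translation."""
    tokens = sentence.lower().split()
    return [1.0 if contains(tokens, phrase) else 0.0 for phrase in PHRASES]


def expected_features(cands):
    """Probability-weighted average of the indicator vectors."""
    total = sum(p for _, p in cands)  # renormalize over the listed candidates
    expected = [0.0] * len(PHRASES)
    for sentence, p in cands:
        for k, value in enumerate(features(sentence)):
            expected[k] += (p / total) * value
    return expected


def consensus(cands):
    """Pick the candidate whose indicators are closest to the expectation."""
    expected = expected_features(cands)

    def distance(sentence):
        return sum((f - e) ** 2 for f, e in zip(features(sentence), expected))

    return min((s for s, _ in cands), key=distance)


expected = expected_features(candidates)
best = consensus(candidates)
print(dict(zip(PHRASES, (round(e, 2) for e in expected))))
print("consensus:", best)
```

With these numbers the sketch selects "I will later do it", not the highest-probability candidate, because its phrases are the ones most widely shared across the list.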
![Page 167: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/167.jpg)
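Phrase Expectations from Forests
Yo lo haré después
Forest rules (source side | target side) and the English each node yields:
NP → Yo | I, yielding “I”
VP → lo haré | shall do it, yielding “shall do it”
PRP → lo | it, yielding “it”
VP → PRP haré | will do PRP, yielding “will do it”
S → NP VP | NP VP, yielding “I {will, shall} do it”
S → S después | S later, yielding “I {will, shall} do it later”
Phrases: “I will”, “I shall”, “do it”, “it later”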
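To make the idea of phrase expectations concrete, here is a minimal Python sketch over a toy forest shaped like the example above. The rule weights (0.4 for "shall", 0.6 for "will"), the `TOP` node name, and the function names are all hypothetical; the slides give none of them. The sketch enumerates every derivation, which is only feasible for tiny forests. It shows what an expected bigram count is, not how it would be computed efficiently over a packed forest.

```python
from collections import Counter
from itertools import product

# Each node maps to a list of (weight, target-side template) alternatives.
# Symbols that are keys of FOREST are child nodes; everything else is a word.
# Weights are hypothetical and chosen so the two VP alternatives sum to 1.
FOREST = {
    "NP":  [(1.0, ["I"])],
    "PRP": [(1.0, ["it"])],
    "VP":  [(0.4, ["shall", "do", "it"]),
            (0.6, ["will", "do", "PRP"])],
    "S":   [(1.0, ["NP", "VP"])],
    "TOP": [(1.0, ["S", "later"])],
}


def derivations(node):
    """Yield (weight, words) for every derivation rooted at `node`."""
    for weight, template in FOREST[node]:
        # Child nodes expand recursively; plain words stand for themselves.
        options = [
            list(derivations(sym)) if sym in FOREST else [(1.0, [sym])]
            for sym in template
        ]
        for combo in product(*options):
            total_weight = weight
            words = []
            for child_weight, child_words in combo:
                total_weight *= child_weight
                words.extend(child_words)
            yield total_weight, words


def expected_bigram_counts(root="TOP"):
    """Expected count of each bigram under the normalized derivation weights."""
    derivs = list(derivations(root))
    normalizer = sum(weight for weight, _ in derivs)
    counts = Counter()
    for weight, words in derivs:
        for left, right in zip(words, words[1:]):
            counts[f"{left} {right}"] += weight / normalizer
    return counts


counts = expected_bigram_counts()
for phrase in ("I will", "I shall", "do it", "it later"):
    print(phrase, round(counts[phrase], 3))
```

Phrases shared by both derivations ("do it", "it later") receive the full probability mass, while "I will" and "I shall" split it according to the assumed rule weights.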
![Page 168: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/168.jpg)
Single System Translation Results
Translation quality in ISI’s Full-Scale Arabic-to-English Hierarchical Translation System
Translation Quality (BLEU):
Model-Only Baseline: 52.0
Consensus from a List [DeNero et al. ACL ’09]*: 52.2
Consensus from a Forest [DeNero et al. ACL ’09]*: 53.0
* John DeNero, David Chiang, and Kevin Knight. Fast Consensus Decoding over Translation Forests, ACL 2009.
![Page 172: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/172.jpg)
Translating Using Multiple Systems
Novel sentence: Yo lo haré después
Model 1, Model 2, Model 3
0.96 0.99 0.57 0.44 0.30 0.00
0.94 0.92 0.81 0.67 0.11 0.09
0.97 0.98 0.54 0.41 0.34 0.12
I will do it later
![Page 177: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/177.jpg)
Consensus Modeling FAQ
Q: How do we combine different models?
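A: Train a linear consensus model scoring a derivation d:
$$\sum_{i=1}^{I} \left[ w_i^{(\alpha)} \, \alpha_i(d) + \sum_{n=1}^{4} w_i^{(n)} \, v_i^{(n)}(d) \right] + w^{(b)} \cdot b(d) + w^{(\ell)} \cdot \ell(d)$$
Annotated terms, in order: Models (the sum over $i = 1, \dots, I$); Which model? ($\alpha_i$); Phrase lengths ($n = 1, \dots, 4$); Expected counts ($v_i^{(n)}$); Model score ($b$); Length ($\ell$).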
Q: What output sentences are considered?
A: The union of output spaces of models: $R = R_{pb} \cup R_h$
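As a rough illustration of the linear consensus model above, the following Python sketch scores one candidate output. Everything concrete here is an assumption made for illustration: the function and argument names, the weight values, the expected counts, and the choice to compute the expected-count feature by summing each system's expected count over the n-grams in the candidate. The slides do not specify any of these.

```python
def ngrams(words, n):
    """All contiguous n-grams of `words`, as tuples."""
    return [tuple(words[k:k + n]) for k in range(len(words) - n + 1)]


def consensus_score(words, from_system, base_score, expected, w):
    """Linear consensus score of one derivation d (illustrative sketch).

    words       -- output tokens of d
    from_system -- index of the system whose output space contains d
    base_score  -- b(d), the score d received from its own model
    expected    -- expected[i][n] maps an n-gram tuple to its expected
                   count under system i (hypothetical data layout)
    w           -- weights: w["alpha"][i], w["v"][i][n], w["b"], w["len"]
    """
    score = 0.0
    for i, counts_by_order in enumerate(expected):
        # "Which model?" indicator: 1 for the system that produced d.
        alpha = 1.0 if i == from_system else 0.0
        score += w["alpha"][i] * alpha
        # Expected-count features for phrase lengths 1 through 4.
        for n in range(1, 5):
            table = counts_by_order.get(n, {})
            v = sum(table.get(g, 0.0) for g in ngrams(words, n))
            score += w["v"][i][n] * v
    # Model score and length terms.
    score += w["b"] * base_score + w["len"] * len(words)
    return score


# Hypothetical inputs: two systems with unigram and bigram expectations.
expected = [
    {1: {("I",): 1.0, ("will",): 0.6}, 2: {("I", "will"): 0.6}},
    {1: {("I",): 1.0, ("will",): 0.9}, 2: {("I", "will"): 0.9}},
]
weights = {
    "alpha": [0.5, 0.25],
    "v": [{n: 1.0 for n in range(1, 5)} for _ in expected],
    "b": 2.0,
    "len": -0.1,
}
score = consensus_score(["I", "will"], from_system=0, base_score=-1.0,
                        expected=expected, w=weights)
print(score)
```

Choosing a translation would then amount to taking the highest-scoring candidate from the union of the systems' output spaces; in practice the weights would be trained rather than set by hand.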
![Page 178: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/178.jpg)
Multi-System Translation Results
Google’s Full-Scale Research Translation System for Arabic-to-English
Translation quality (BLEU):
Best Single-System Model-Only Baseline: 43.9
Multi-System Forest-Based Consensus [DeNero et al. NAACL ’10]*: 45.3
* John DeNero, Shankar Kumar, Ciprian Chelba, and Franz Och. Model Combination for Machine Translation, NAACL 2010.
![Page 182: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/182.jpg)
The Steps in a Modern Translation System
Learn a model
Apply the model
Choose a translation
Statistical models provide distributions over outputs
Leveraging those distributions improves performance
Compact representations can enable large-scale computation
![Page 190: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/190.jpg)
Summary of Translation Research
Learn a model: Non-parametric models, Large-context models [DeNero et al. EMNLP ’08] [DeNero & Klein. ACL ’10]
Apply the model: Coarse-to-fine [DeNero et al. NAACL ’09]
Choose a translation: Compact encodings, Full distributions [DeNero et al. ACL ’09] [DeNero et al. NAACL ’10]
Are we done yet?
• Morphology in alignment modeling
• Unsupervised composed phrase learning
• Adding information to consensus models
![Page 191: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9](https://reader034.vdocuments.us/reader034/viewer/2022050223/5f6906ce904330273b20a022/html5/thumbnails/191.jpg)
Acknowledgements
Thank you!
and many thanks to my excellent coauthors on this work:
Berkeley: Mohit Bansal, John Blitzer, Alex Bouchard-Côté, Aria Haghighi, Dan Klein, and Adam Pauls
Information Sciences Institute: David Chiang and Kevin Knight
Google: Ciprian Chelba, Shankar Kumar, and Franz Och
John
Juan Gracias!