Variational Decoding for Statistical Machine Translation
Zhifei Li, Jason Eisner, and Sanjeev Khudanpur
Center for Language and Speech Processing
Computer Science Department
Johns Hopkins University
Monday, August 17, 2009
Spurious Ambiguity
• Statistical models in MT exhibit spurious ambiguity
  • Many different derivations (e.g., trees or segmentations) generate the same translation string
• Regular phrase-based MT systems
  • phrase segmentation ambiguity
• Tree-based MT systems
  • derivation tree ambiguity
Spurious Ambiguity in Phrase Segmentations
Source: 机器 翻译 软件
• Segmentation 1: [机器 翻译] [软件] → [machine translation] [software]
• Segmentation 2: [机器] [翻译 软件] → [machine] [translation software]
• Segmentation 3: [机器] [翻译] [软件] → [machine] [translation] [software]
• Same output: “machine translation software”
• Three different phrase segmentations
• Also shown on the slide, a different output string: “machine transfer software”
Spurious Ambiguity in Derivation Trees
Source: 机器 翻译 软件
• Trees 1 and 2: the lexical rules S->(机器, machine), S->(翻译, translation), S->(软件, software), combined by two applications of S->(S0 S1, S0 S1); the two trees differ only in bracketing
• Tree 3: the lexical rules S->(机器, machine) and S->(软件, software), combined by S->(S0 翻译 S1, S0 translation S1)
• Same output: “machine translation software”
• Three different derivation trees
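Maximum A Posteriori (MAP) Decoding
• Exact MAP decoding
$$y^* = \arg\max_{y \in \mathrm{Trans}(x)} p(y \mid x) = \arg\max_{y \in \mathrm{Trans}(x)} \sum_{d \in D(x,y)} p(y, d \mid x)$$
• x: Foreign sentence
• y: English sentence
• d: derivation
Example (columns: translation string, derivation, probability): eight derivations with probabilities 0.16, 0.14, 0.14, 0.13, 0.12, 0.11, 0.10, 0.10, each yielding one of three translation strings. Summing over the derivations of each string gives its MAP score:
• red translation: 0.28
• blue translation: 0.28
• green translation: 0.44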
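The sum over derivations can be checked on the toy example with a short Python sketch. The slide text gives only the eight derivation probabilities and the three per-string totals; the assignment of each derivation to a string in `DERIVATIONS` is an assumption chosen so that the totals come out as 0.28, 0.28, and 0.44, and the function names are illustrative.

```python
from collections import defaultdict

# Toy p(y, d | x): one (translation string, probability) pair per derivation.
# ASSUMPTION: this grouping of derivations under strings is inferred so that
# the per-string sums match the slide (red 0.28, blue 0.28, green 0.44).
DERIVATIONS = [
    ("red", 0.16), ("blue", 0.14), ("blue", 0.14), ("green", 0.13),
    ("red", 0.12), ("green", 0.11), ("green", 0.10), ("green", 0.10),
]

def string_posteriors(derivations):
    """Marginalize out derivations: p(y | x) = sum over d of p(y, d | x)."""
    totals = defaultdict(float)
    for y, p in derivations:
        totals[y] += p
    return dict(totals)

def map_decode(derivations):
    """Return the string y with the largest summed probability."""
    totals = string_posteriors(derivations)
    return max(totals, key=totals.get)

posteriors = string_posteriors(DERIVATIONS)
best = map_decode(DERIVATIONS)
print(best, round(posteriors[best], 2))  # prints: green 0.44
```

Enumerating all derivations is only feasible for a toy list like this one; it is meant to show what the exact objective computes, not how a decoder would compute it.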
Hypergraph as a search space
Source sentence (with word positions): dianzi0 shang1 de2 mao3
Nodes:
• S 0,4
• X 0,4: the · · · cat
• X 0,4: a · · · mat
• X 0,2: the · · · mat
• X 3,4: a · · · cat
Hyperedges (rules):
• X → ⟨mao, a cat⟩
• X → ⟨dianzi shang, the mat⟩
• X → ⟨X0 de X1, X0 X1⟩
• X → ⟨X0 de X1, X0 ’s X1⟩
• X → ⟨X0 de X1, X1 on X0⟩
• X → ⟨X0 de X1, X1 of X0⟩
• S → ⟨X0, X0⟩ (two hyperedges)
A hypergraph is a compact structure to encode exponentially many trees.
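To make the "compact structure" point concrete, here is a minimal Python sketch that counts the derivation trees encoded by the example hypergraph without enumerating them. The dictionary encoding, the node names, and the wiring of hyperedges to nodes are assumptions: the transcript lists nodes and rules but not which rule connects which nodes, so the wiring below is inferred from the boundary words of each node.

```python
from math import prod

# ASSUMED encoding: node -> list of incoming hyperedges, each hyperedge being
# (rule label, list of antecedent nodes). Wiring inferred, not from the slide.
LEFT, RIGHT = "X[0,2] the...mat", "X[3,4] a...cat"
HYPERGRAPH = {
    LEFT: [("X -> <dianzi shang, the mat>", [])],
    RIGHT: [("X -> <mao, a cat>", [])],
    "X[0,4] the...cat": [
        ("X -> <X0 de X1, X0 X1>", [LEFT, RIGHT]),
        ("X -> <X0 de X1, X0 's X1>", [LEFT, RIGHT]),
    ],
    "X[0,4] a...mat": [
        ("X -> <X0 de X1, X1 on X0>", [LEFT, RIGHT]),
        ("X -> <X0 de X1, X1 of X0>", [LEFT, RIGHT]),
    ],
    "S[0,4]": [
        ("S -> <X0, X0>", ["X[0,4] the...cat"]),
        ("S -> <X0, X0>", ["X[0,4] a...mat"]),
    ],
}

def count_derivations(node, graph, memo=None):
    """Count the trees rooted at `node`: sum over incoming hyperedges of the
    product of the counts of their antecedent nodes (memoized)."""
    if memo is None:
        memo = {}
    if node not in memo:
        memo[node] = sum(
            prod(count_derivations(tail, graph, memo) for tail in tails)
            for _rule, tails in graph[node]
        )
    return memo[node]

print(count_derivations("S[0,4]", HYPERGRAPH))  # prints: 4
```

In this tiny graph the saving is negligible, but the same recursion visits each node once however many trees share it, which is why the number of trees can grow exponentially while the graph stays small.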
Probabilistic Hypergraph
The hypergraph defines a probability distribution over derivation trees, i.e., p(y, d | x), and also an (implicit) distribution over strings, i.e., p(y | x).
• Exact MAP decoding
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} p(y \mid x) = \arg\max_{y \in \mathrm{HG}(x)} \sum_{d \in D(x,y)} p(y, d \mid x)$$
• Slide annotations on this formula: NP-hard (Sima’an 1996); exponential size
Decoding with spurious ambiguity?
• Maximum a posteriori (MAP) decoding
• Viterbi approximation
• N-best approximation (crunching) (May and Knight 2006)
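Viterbi Approximation
• Viterbi approximation
$$y^* = \arg\max_{y \in \mathrm{Trans}(x)} \max_{d \in D(x,y)} p(y, d \mid x) = Y\Big(\arg\max_{d \in D(x)} p(y, d \mid x)\Big)$$
Example (columns: translation string, derivation, probability): derivation probabilities 0.16, 0.14, 0.14, 0.13, 0.12, 0.11, 0.10, 0.10. Scores per translation string:
• red translation: MAP 0.28, Viterbi 0.16
• blue translation: MAP 0.28, Viterbi 0.14
• green translation: MAP 0.44, Viterbi 0.13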
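A small Python sketch of the Viterbi approximation on the toy example follows. It scores each string by its single best derivation instead of the sum. The assignment of derivations to strings in `DERIVATIONS` is an assumption inferred from the MAP and Viterbi columns on the slide, and the function names are illustrative.

```python
# Toy p(y, d | x): one (translation string, probability) pair per derivation.
# ASSUMPTION: the string each derivation yields is inferred from the slide's
# per-string MAP totals (0.28, 0.28, 0.44) and Viterbi scores (0.16, 0.14, 0.13).
DERIVATIONS = [
    ("red", 0.16), ("blue", 0.14), ("blue", 0.14), ("green", 0.13),
    ("red", 0.12), ("green", 0.11), ("green", 0.10), ("green", 0.10),
]

def viterbi_scores(derivations):
    """Score each string by its best derivation: max over d of p(y, d | x)."""
    scores = {}
    for y, p in derivations:
        scores[y] = max(p, scores.get(y, 0.0))
    return scores

def viterbi_decode(derivations):
    """Return the string yielded by the single most probable derivation."""
    y, _p = max(derivations, key=lambda pair: pair[1])
    return y

scores = viterbi_scores(DERIVATIONS)
print(viterbi_decode(DERIVATIONS), scores)
# prints: red {'red': 0.16, 'blue': 0.14, 'green': 0.13}
```

Under these assumed data the Viterbi choice is the red translation, although the green translation has the largest summed probability (0.44), which is the mismatch the slide illustrates.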
N-best Approximation
[Figure: the same eight derivations (0.16, 0.14, 0.14, 0.13, 0.12, 0.11, 0.10, 0.10) and three translation strings (red, blue, green). MAP totals per string: 0.28, 0.28, 0.44. Viterbi scores: 0.16, 0.14, 0.13. 4-best crunching scores: 0.16, 0.28, 0.13.]
• N-best approximation (crunching) (May and Knight 2006)
$$y^* = \arg\max_{y \in \mathrm{Trans}(x)} \sum_{d \in D(x,y) \cap \mathrm{ND}(x)} p(y, d \mid x)$$
where ND(x) is the set of the N best derivations of x.
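A small Python sketch of crunching on the same toy figure: keep only the N most probable derivations, then sum per string. As before, the derivation-to-string assignment in `DERIVATIONS` is inferred from the slide's totals, and `crunch` is a hypothetical name.

```python
from collections import defaultdict

# Toy p(y, d | x); the string assignment is inferred from the slide's totals.
DERIVATIONS = [
    ("red", 0.16), ("blue", 0.14), ("blue", 0.14), ("green", 0.13),
    ("red", 0.12), ("green", 0.11), ("green", 0.10), ("green", 0.10),
]

def crunch(derivations, n):
    """N-best crunching: sum probabilities over the n best derivations only."""
    n_best = sorted(derivations, key=lambda pair: pair[1], reverse=True)[:n]
    totals = defaultdict(float)
    for string, prob in n_best:
        totals[string] += prob
    return max(totals, key=totals.get), dict(totals)
```

With `n=1` this reduces to Viterbi, with `n=4` it picks "blue", and only with all eight derivations does it recover the MAP answer "green".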
![Page 65: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/65.jpg)
MAP vs. Approximations
[Figure: scores per translation string (red, blue, green). MAP: 0.28, 0.28, 0.44. Viterbi: 0.16, 0.14, 0.13. 4-best crunching: 0.16, 0.28, 0.13.]
• Exact MAP decoding under spurious ambiguity is intractable
• Viterbi and crunching are efficient, but ignore most derivations
• Our goal: develop an approximation that considers all the derivations but still allows tractable decoding
![Page 68: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/68.jpg)
Variational Decoding
Decoding using variational approximation
Decoding using a sentence-specific approximate distribution
![Page 78: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/78.jpg)
Variational Decoding for MT: an Overview
Sentence-specific decoding
Three steps:
1 Generate a hypergraph
Foreign sentence x → SMT → hypergraph encoding p(y, d | x)
[Figure: hypergraph for the source sentence "dianzi0 shang1 de2 mao3". Nodes: S 0,4; X 0,4 the · · · cat; X 0,4 a · · · mat; X 0,2 the · · · mat; X 3,4 a · · · cat. Hyperedge rules:]
X → ⟨mao, a cat⟩
X → ⟨dianzi shang, the mat⟩
X → ⟨X0 de X1, X0 X1⟩
X → ⟨X0 de X1, X1 on X0⟩
X → ⟨X0 de X1, X1 of X0⟩
X → ⟨X0 de X1, X0 's X1⟩
S → ⟨X0, X0⟩ (twice)
MAP decoding under p(y | x) is intractable
![Page 87: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/87.jpg)
1 Generate a hypergraph
p(y, d | x)
2 Estimate a model from the hypergraph
q*(y | x) ≈ ∑_{d∈D(x,y)} p(y, d | x)
q* is an n-gram model over output strings.
3 Decode using q* on the hypergraph
[Figure: the hypergraph for "dianzi0 shang1 de2 mao3", repeated for each step.]
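The three steps can be sketched in Python on a tiny, explicitly enumerated derivation list instead of a real hypergraph. Everything here is an assumption made for illustration: the probabilities in `TOY_DERIVATIONS` are made up, q* is taken to be a bigram model whose conditional probabilities are ratios of expected counts under p, and decoding simply rescores the candidate strings. A real system would compute these quantities by dynamic programming over the hypergraph.

```python
from collections import defaultdict

BOS, EOS = "<s>", "</s>"

# Step 1 (stand-in for the hypergraph): hypothetical p(y, d | x),
# one (string, probability) pair per derivation.
TOY_DERIVATIONS = [
    ("the mat a cat", 0.35),
    ("a cat on the mat", 0.25),
    ("a cat on the mat", 0.25),
    ("a cat of the mat", 0.15),
]

def bigrams(string):
    """Return the (history, word) bigrams of a padded string."""
    words = [BOS] + string.split() + [EOS]
    return list(zip(words, words[1:]))

def estimate_q(derivations):
    """Step 2: bigram model from expected bigram counts under p."""
    pair_count = defaultdict(float)
    history_count = defaultdict(float)
    for string, prob in derivations:
        for history, word in bigrams(string):
            pair_count[(history, word)] += prob
            history_count[history] += prob
    return {pair: count / history_count[pair[0]]
            for pair, count in pair_count.items()}

def q_score(q, string):
    """Probability of a string under the bigram model q."""
    score = 1.0
    for pair in bigrams(string):
        score *= q.get(pair, 0.0)
    return score

def decode(q, derivations):
    """Step 3: choose the candidate string with the highest q score."""
    candidates = {string for string, _ in derivations}
    return max(candidates, key=lambda s: q_score(q, s))
```

In this toy case the single best derivation yields "the mat a cat", but the bigram model pools evidence from all derivations and prefers "a cat on the mat".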
![Page 99: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/99.jpg)
Variational Inference
• We want to do inference under p, but it is intractable
$$y^* = \arg\max_{y} p(y \mid x)$$
• Instead, we derive a simpler distribution q*
$$q^* = \arg\min_{q \in Q} \mathrm{KL}(p \,\|\, q)$$
• Then, we will use q* as a surrogate for p in inference
$$y^* = \arg\max_{y} q^*(y \mid x)$$
[Figure: p is a point in the space P; Q is a region of P, and q* is a point in Q.]
![Page 108: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/108.jpg)
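Variational Approximation
• q*: an approximation having minimum distance to p
$$\begin{aligned}
q^* &= \arg\min_{q \in Q} \mathrm{KL}(p \,\|\, q) && \text{($Q$: a family of distributions)} \\
&= \arg\min_{q \in Q} \sum_{y \in \mathrm{Trans}(x)} p \log \frac{p}{q} \\
&= \arg\min_{q \in Q} \sum_{y \in \mathrm{Trans}(x)} (p \log p - p \log q) && \text{($p \log p$ is constant)} \\
&= \arg\max_{q \in Q} \sum_{y \in \mathrm{Trans}(x)} p \log q
\end{aligned}$$
• Three questions
• how to parameterize q?
• how to estimate q*?
• how to use q* for decoding?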
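A quick numeric check of the derivation, as a Python sketch: over any candidate family, minimizing KL(p‖q) picks the same q as maximizing Σ p log q, because the two differ only by the constant Σ p log p. The distribution `P` reuses the MAP totals from the toy figure; the three candidates in `FAMILY` are made up for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) over the support of p."""
    return sum(p[y] * math.log(p[y] / q[y]) for y in p if p[y] > 0)

def expected_log_q(p, q):
    """Sum over y of p(y) log q(y), the quantity maximized on the slide."""
    return sum(p[y] * math.log(q[y]) for y in p if p[y] > 0)

P = {"red": 0.28, "blue": 0.28, "green": 0.44}

# Hypothetical candidate family Q.
FAMILY = {
    "uniform": {"red": 1 / 3, "blue": 1 / 3, "green": 1 / 3},
    "green_heavy": {"red": 0.25, "blue": 0.25, "green": 0.5},
    "red_heavy": {"red": 0.6, "blue": 0.2, "green": 0.2},
}

best_by_kl = min(FAMILY, key=lambda name: kl(P, FAMILY[name]))
best_by_log_q = max(FAMILY, key=lambda name: expected_log_q(P, FAMILY[name]))
```

Both criteria select the same candidate, as the last line of the derivation predicts.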
![Page 115: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/115.jpg)
Parameterization of q∈Q
• Naturally, we parameterize q as an n-gram model
• The probability of a string is a product of the probabilities of those n-grams appearing in that string
• Example with a 3-gram model, y: a b c d e f
$q(y) = q(a) \cdot q(b \mid a) \cdot q(c \mid ab) \cdot q(d \mid bc) \cdot q(e \mid cd) \cdot q(f \mid de)$
• Other parameterizations are possible!
Monday, August 17, 2009
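The 3-gram factorization on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `ngram_factors` and `format_factor` are hypothetical, and, as on the slide, no sentence-boundary symbols are added.

```python
def ngram_factors(words, n):
    """Return the (word, history) pairs whose probabilities q(word | history)
    are multiplied to give q(y) under an n-gram model.

    The history holds at most n-1 previous words, so the first words of the
    string get a shorter history, matching q(a) and q(b|a) on the slide.
    """
    factors = []
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - (n - 1)):i])
        factors.append((w, history))
    return factors


def format_factor(word, history):
    # Render one factor the way the slide writes it, e.g. q(c|ab).
    return f"q({word}|{''.join(history)})" if history else f"q({word})"


factors = ngram_factors("a b c d e f".split(), n=3)
product = " · ".join(format_factor(w, h) for w, h in factors)
print("q(y) =", product)
```

Calling `ngram_factors` with `n=2` gives the bigram factorization used in the estimation slides.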
Parameterization of q∈Q
• Naturally, we parameterize q as an n-gram model
• The probability of a string is a product of the probabilities of those n-grams appearing in that string
• Example with a 3-gram model, y: a b c d e f
$q(y) = q(a) \cdot q(b \mid a) \cdot q(c \mid ab) \cdot q(d \mid bc) \cdot q(e \mid cd) \cdot q(f \mid de)$
• How to estimate these n-gram probabilities?
Monday, August 17, 2009
Estimation of q*∈Q
• Variational approximation
$q^* = \arg\max_{q \in Q} \sum_{y \in \mathrm{Trans}(x)} p \log q$
• q* is a maximum likelihood estimate (MLE) where p is the empirical distribution
But in our case, p is defined not by a corpus, but by a hypergraph for a given test sentence!
[Figure: hypergraph for the source sentence "dianzi0 shang1 de2 mao3". Nodes: S[0,4]; X[0,4] "the · · · cat"; X[0,4] "a · · · mat"; X[0,2] "the · · · mat"; X[3,4] "a · · · cat". Hyperedge rules: X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X1 of X0⟩; S → ⟨X0, X0⟩ (two edges). An arrow labeled "estimate" leads from the hypergraph to a bi-gram model.]
estimate:
• brute force
• dynamic programming
Monday, August 17, 2009
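Because q is an n-gram model, the maximization on this slide has a closed form, just as in ordinary n-gram MLE, but with expected counts under p in place of corpus counts. The notation below is introduced here for illustration and does not appear on the slide: $c_w(y)$ is the number of times the n-gram $w$ occurs in $y$, $\bar{c}(w)$ is its expected count, $h$ is an (n-1)-word history and $r$ is the next word.

```latex
\bar{c}(w) = \sum_{y \in \mathrm{Trans}(x)} p(y \mid x)\, c_w(y),
\qquad
q^*(r \mid h) = \frac{\bar{c}(hr)}{\sum_{r'} \bar{c}(hr')}
```

The expected counts are the soft counts that the brute-force and dynamic-programming procedures accumulate; the denominator is the normalization step.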
Estimating q* from a hypergraph: brute force
Bi-gram estimation:
‣ unpack the hypergraph
[Figure: the hypergraph for "dianzi0 shang1 de2 mao3" (nodes S[0,4], X[0,4] "the · · · cat", X[0,4] "a · · · mat", X[0,2] "the · · · mat", X[3,4] "a · · · cat") unpacked into four derivation trees. Every tree uses X → ⟨dianzi shang, the mat⟩, X → ⟨mao, a cat⟩ and S → ⟨X0, X0⟩, plus one of X → ⟨X0 de X1, X0 X1⟩, X → ⟨X0 de X1, X0 's X1⟩, X → ⟨X0 de X1, X1 on X0⟩, X → ⟨X0 de X1, X1 of X0⟩.]
Monday, August 17, 2009
Estimating q* from a hypergraph: brute force
Bi-gram estimation:
‣ unpack the hypergraph
[Figure: the hypergraph for "dianzi0 shang1 de2 mao3" and the four derivation trees unpacked from it. The trees yield the translations "the mat a cat", "a cat on the mat", "a cat of the mat" and "the mat 's a cat"; probability labels on the slide: p=2/8, p=1/8, p=3/8, p=2/8.]
Monday, August 17, 2009
Estimating q* from a hypergraph: brute force
Bi-gram estimation:
‣ unpack the hypergraph
‣ accumulate the soft-count of each bigram
‣ normalize the counts
[Figure: the four unpacked translations "the mat a cat", "a cat on the mat", "a cat of the mat" and "the mat 's a cat"; probability labels on the slide: p=2/8, p=1/8, p=3/8, p=2/8.]
Pr(on | cat)=1/8
Pr(of | cat)=2/8
Pr(</s> | cat)=5/8
Monday, August 17, 2009
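The brute-force recipe (unpack, accumulate soft counts, normalize) can be sketched as follows. The transcript does not say which translation carries which probability, so the pairing below is an assumption, chosen so that the three conditional probabilities shown on the slide come out; the function name `estimate_bigram_model` is likewise hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Unpacked translations with their probabilities under p.
# ASSUMPTION: the string-to-probability pairing is not recoverable from the
# transcript; this one reproduces Pr(on|cat), Pr(of|cat) and Pr(</s>|cat).
translations = {
    "the mat a cat": Fraction(2, 8),
    "a cat on the mat": Fraction(1, 8),
    "a cat of the mat": Fraction(2, 8),
    "the mat 's a cat": Fraction(3, 8),
}


def estimate_bigram_model(dist):
    """Accumulate probability-weighted bigram counts, then normalize per history."""
    counts = defaultdict(Fraction)          # soft count of each bigram (h, w)
    history_totals = defaultdict(Fraction)  # soft count of each history h
    for sentence, prob in dist.items():
        words = ["<s>"] + sentence.split() + ["</s>"]
        for h, w in zip(words, words[1:]):
            counts[(h, w)] += prob
            history_totals[h] += prob
    return {(h, w): c / history_totals[h] for (h, w), c in counts.items()}


q = estimate_bigram_model(translations)
for w in ("on", "of", "</s>"):
    # Fractions print in lowest terms, so 2/8 appears as 1/4.
    print(f"Pr({w} | cat) = {q[('cat', w)]}")
```

Enumerating every translation is only feasible for a toy example like this one, which is what motivates the dynamic-programming alternative.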
Estimating q* from a hypergraph: dynamic programming
Bi-gram estimation:
‣ run inside-outside on the hypergraph
‣ accumulate the soft-count of each bigram at each hyperedge
‣ normalize the counts
[Figure: hypergraph for the source sentence "dianzi0 shang1 de2 mao3". Nodes: S[0,4]; X[0,4] "the · · · cat"; X[0,4] "a · · · mat"; X[0,2] "the · · · mat"; X[3,4] "a · · · cat". Hyperedge rules: X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X1 of X0⟩; S → ⟨X0, X0⟩ (two edges).]
Monday, August 17, 2009
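A minimal sketch of the three dynamic-programming steps on a toy version of the slide's hypergraph is given below. Several things are assumptions made for illustration: the node names, the hyperedge weights (2, 3, 1, 2, chosen so that the four derivations have probabilities 2/8, 3/8, 1/8, 2/8), and the hand-written list of bigrams newly formed at each hyperedge. A real system would have to derive those bigrams itself, for example from boundary words stored on each node.

```python
from collections import defaultdict
from fractions import Fraction

# Each hyperedge: (head, tails, weight, bigrams newly formed at this edge).
# ASSUMPTION: weights and bigram lists are written by hand for this toy case.
EDGES = [
    ("X02", [], Fraction(1), [("the", "mat")]),
    ("X34", [], Fraction(1), [("a", "cat")]),
    ("Xa", ["X02", "X34"], Fraction(2), [("mat", "a")]),                  # X0 X1
    ("Xa", ["X02", "X34"], Fraction(3), [("mat", "'s"), ("'s", "a")]),    # X0 's X1
    ("Xb", ["X02", "X34"], Fraction(1), [("cat", "on"), ("on", "the")]),  # X1 on X0
    ("Xb", ["X02", "X34"], Fraction(2), [("cat", "of"), ("of", "the")]),  # X1 of X0
    ("S", ["Xa"], Fraction(1), [("<s>", "the"), ("cat", "</s>")]),
    ("S", ["Xb"], Fraction(1), [("<s>", "a"), ("mat", "</s>")]),
]
NODES = ["X02", "X34", "Xa", "Xb", "S"]  # bottom-up order; the last node is the root


def product(values):
    result = Fraction(1)
    for v in values:
        result *= v
    return result


def inside_outside(edges, nodes):
    inside = {}
    for v in nodes:  # bottom-up pass
        inside[v] = sum(
            (w * product(inside[t] for t in tails)
             for head, tails, w, _ in edges if head == v),
            Fraction(0),
        )
    outside = {v: Fraction(0) for v in nodes}
    outside[nodes[-1]] = Fraction(1)
    for v in reversed(nodes):  # top-down pass
        for head, tails, w, _ in edges:
            if head != v:
                continue
            for i, t in enumerate(tails):
                siblings = product(inside[u] for j, u in enumerate(tails) if j != i)
                outside[t] += outside[v] * w * siblings
    return inside, outside


def bigram_model(edges, nodes):
    inside, outside = inside_outside(edges, nodes)
    z = inside[nodes[-1]]  # total weight of all derivations
    counts = defaultdict(Fraction)
    totals = defaultdict(Fraction)
    for head, tails, w, bigrams in edges:
        # Posterior probability that a derivation uses this hyperedge.
        posterior = outside[head] * w * product(inside[t] for t in tails) / z
        for h, word in bigrams:
            counts[(h, word)] += posterior
            totals[h] += posterior
    return {(h, word): c / totals[h] for (h, word), c in counts.items()}


q = bigram_model(EDGES, NODES)
print(q[("cat", "on")], q[("cat", "of")], q[("cat", "</s>")])
```

Under these assumptions the sketch never enumerates the four derivations, yet it arrives at the same conditional probabilities for the history "cat" as the brute-force slide.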
Decoding using q*∈Q
• Rescore the hypergraph HG(x)
$y^* = \arg\max_{y \in \mathrm{HG}(x)} q^*(y \mid x)$
q* is an n-gram model.
• have efficient dynamic programming algorithms
• score the hypergraph using an n-gram model
John already told you how to do this ☺
Monday, August 17, 2009
KL divergences under different variational models
$q^* = \arg\min_{q \in Q} \mathrm{KL}(p \,\|\, q)$, where $\mathrm{KL}(p \,\|\, q) = H(p, q) - H(p)$

| Measure (bits/word) | H(p) | KL(p‖q*_1) | KL(p‖q*_2) | KL(p‖q*_3) | KL(p‖q*_4) |
|---|---|---|---|---|---|
| MT'04 | 1.36 | 0.97 | 0.32 | 0.21 | 0.17 |
| MT'05 | 1.37 | 0.94 | 0.32 | 0.21 | 0.17 |

• The larger the order n is, the smaller the KL divergence is!
• The reduction of KL divergence happens mostly when switching from unigram to bigram
Monday, August 17, 2009
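The link between this KL objective and the earlier arg max of the sum of p log q can be spelled out in one step. The expansion below uses the standard definitions of entropy and cross-entropy, with sums over $y \in \mathrm{Trans}(x)$; the slide itself states only the final identity.

```latex
\mathrm{KL}(p \,\|\, q)
  = \sum_y p(y \mid x) \log \frac{p(y \mid x)}{q(y \mid x)}
  = \underbrace{-\sum_y p(y \mid x) \log q(y \mid x)}_{H(p,\,q)}
    \;-\; \underbrace{\Bigl(-\sum_y p(y \mid x) \log p(y \mid x)\Bigr)}_{H(p)}
```

Since $H(p)$ does not depend on $q$, minimizing $\mathrm{KL}(p \,\|\, q)$ over $q \in Q$ is the same as minimizing the cross-entropy $H(p, q)$, that is, maximizing $\sum_y p \log q$.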
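KL divergences under different variational models
$q^* = \arg\min_{q \in Q} \mathrm{KL}(p \,\|\, q)$, where $\mathrm{KL}(p \,\|\, q) = H(p, q) - H(p)$

| Measure (bits/word) | H(p) | KL(p‖q*_1) | KL(p‖q*_2) | KL(p‖q*_3) | KL(p‖q*_4) |
|---|---|---|---|---|---|
| MT'04 | 1.36 | 0.97 | 0.32 | 0.21 | 0.17 |
| MT'05 | 1.37 | 0.94 | 0.32 | 0.21 | 0.17 |

How to compute them on a hypergraph? See (Li and Eisner, EMNLP'09)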
Monday, August 17, 2009
BLEU scores when using a single variational n-gram model

| Decoding scheme | MT'04 | MT'05 |
|---|---|---|
| Viterbi | 35.4 | 32.6 |
| 1gram | 25.9 | 24.5 |
| 2gram | 36.1 | 33.4 |
| 3gram | 36.0 | 33.1 |
| 4gram | 35.8 | 32.9 |

• unigram performs very badly
• bigram achieves best BLEU scores ???
modeling error in p
Monday, August 17, 2009
BLEU cares about both low- and high-order n-gram matches
• Interpolating variational n-gram models for different n
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} \sum_n \theta_n \cdot \log q_n^*(y \mid x)$$
Viterbi and variational are different ways of approximating p
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} \Big[ \sum_n \theta_n \cdot \log q_n^*(y \mid x) + \theta_v \cdot \log p_{\mathrm{Viterbi}}(y \mid x) \Big]$$
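The interpolated decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate strings, their per-order log-scores, the Viterbi log-probabilities, and the weights `theta` and `theta_v` are invented toy values (not numbers from the talk), and a plain dictionary of candidates stands in for the hypergraph HG(x).

```python
import math

def interpolated_score(log_q_by_order, theta, log_p_viterbi, theta_v):
    """sum_n theta_n * log q_n(y|x) + theta_v * log p_Viterbi(y|x)."""
    variational = sum(theta[n] * lq for n, lq in log_q_by_order.items())
    return variational + theta_v * log_p_viterbi

# Hypothetical candidates: (log q_n for n = 1, 2; Viterbi log-probability).
candidates = {
    "the cat on the mat": ({1: math.log(0.20), 2: math.log(0.30)}, math.log(0.10)),
    "a cat on the mat": ({1: math.log(0.25), 2: math.log(0.15)}, math.log(0.30)),
}
theta = {1: 0.5, 2: 1.0}  # assumed interpolation weights

def decode(theta_v):
    """arg max over the candidate list of the interpolated score."""
    def score(y):
        log_q_by_order, log_p_viterbi = candidates[y]
        return interpolated_score(log_q_by_order, theta, log_p_viterbi, theta_v)
    return max(candidates, key=score)

best_variational = decode(theta_v=0.0)   # n-gram terms only
best_with_viterbi = decode(theta_v=2.0)  # Viterbi term added
print(best_variational, "|", best_with_viterbi)
```

With these toy numbers the two rules pick different strings, which is the point of treating the Viterbi score as one more feature rather than as a replacement.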
![Page 166: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/166.jpg)
Minimum Bayes Risk (MBR) decoding?
(Tromble et al. 2008)
(DeNero et al. 2009)
![Page 167: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/167.jpg)
Minimum Risk Decoding
• Minimum risk decoding
• find the consensus translation string
$$y^* = \arg\min_{y \in \mathrm{HG}(x)} \mathrm{Risk}(y), \qquad \mathrm{Risk}(y) = \sum_{y'} L(y, y')\, p(y' \mid x)$$
• Maximum A Posteriori (MAP) decoding
• find the most probable translation string
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} p(y \mid x)$$
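The difference between the two decision rules shows up even on a three-string toy posterior. The sketch below is illustrative only: the strings, probabilities, and the position-wise `word_mismatch_loss` are assumptions made for the example (a BLEU-based loss would normally play the role of L), and the posterior is enumerated explicitly rather than read off a hypergraph.

```python
def map_decode(posterior):
    """arg max_y p(y|x): the single most probable string."""
    return max(posterior, key=posterior.get)

def risk(y, posterior, loss):
    """Risk(y) = sum_{y'} L(y, y') p(y'|x)."""
    return sum(loss(y, y_prime) * p for y_prime, p in posterior.items())

def min_risk_decode(posterior, loss):
    """arg min_y Risk(y): the consensus string."""
    return min(posterior, key=lambda y: risk(y, posterior, loss))

def word_mismatch_loss(y, y_prime):
    """Toy loss: fraction of word positions at which the two strings differ."""
    a, b = y.split(), y_prime.split()
    n = max(len(a), len(b))
    differing = sum(1 for i in range(n)
                    if i >= len(a) or i >= len(b) or a[i] != b[i])
    return differing / n

# Hypothetical posterior p(y|x) over three candidate translations.
posterior = {"the mat cat": 0.40, "a cat sat": 0.35, "a cat sits": 0.25}

map_string = map_decode(posterior)
consensus_string = min_risk_decode(posterior, word_mismatch_loss)
print(map_string, "|", consensus_string)
```

Here the most probable string stands alone, while the other two candidates overlap and pool their probability mass, so the minimum-risk choice differs from the MAP choice.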
![Page 174: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/174.jpg)
Variational Decoding (VD) vs. MBR (Tromble et al. 2008)
[Figure: diagram with the labels "spurious ambiguity" and "consensus", on which VD, MBR, and Interpolated VD are placed]
Both the BLEU metric and our variational distributions happen to use n-gram dependencies.
![Page 178: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/178.jpg)
• Variational decoding with interpolation
n-gram probability: $$q(r(w) \mid h(w), x) = \frac{\sum_{y'} c_w(y')\, p(y' \mid x)}{\sum_{y'} c_{h(w)}(y')\, p(y' \mid x)}$$
n-gram model: $$q_n(y \mid x) = \prod_{w \in W_n} q(r(w) \mid h(w), x)^{c_w(y)}$$
decision rule: $$y^* = \arg\max_{y \in \mathrm{HG}(x)} \sum_n \theta_n \cdot \log q_n^*(y \mid x)$$
• Minimum risk decoding (Tromble et al. 2008)
n-gram model: $$g_n(y \mid x) = \sum_{w \in W_n} g(w \mid x)\, c_w(y)$$
n-gram probability: $$g(w \mid x) = \sum_{y'} \delta_w(y')\, p(y' \mid x)$$
decision rule: $$y^* = \arg\max_{y \in \mathrm{HG}(x)} \sum_n \theta_n \cdot g_n(y \mid x)$$
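The variational n-gram model can be sketched by brute force when the posterior is small enough to enumerate. The code below is a minimal illustration, not the talk's implementation: the expected counts would be gathered on the hypergraph in the actual method, whereas here an explicit dictionary `posterior` (invented toy values) stands in for p(y | x), and the `<s>`/`</s>` padding convention and helper names are assumptions.

```python
import math
from collections import Counter

def ngrams(words, n):
    """All n-grams (as tuples) of a word list padded with <s> and </s>."""
    padded = ["<s>"] * (n - 1) + words + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def estimate_q(posterior, n):
    """q(r(w) | h(w), x): expected count of w over expected count of h(w)."""
    num, den = Counter(), Counter()
    for y_prime, p in posterior.items():
        for w in ngrams(y_prime.split(), n):
            num[w] += p        # sum_{y'} c_w(y') p(y'|x)
            den[w[:-1]] += p   # sum_{y'} c_{h(w)}(y') p(y'|x)
    return {w: num[w] / den[w[:-1]] for w in num}

def log_q(y, q, n):
    """log q_n(y|x) = sum_w c_w(y) log q(r(w) | h(w), x)."""
    return sum(math.log(q[w]) for w in ngrams(y.split(), n))

# Hypothetical posterior over translation strings.
posterior = {"a cat": 0.3, "the cat": 0.3, "the mat": 0.4}

q1 = estimate_q(posterior, 1)
q2 = estimate_q(posterior, 2)
best_unigram = max(posterior, key=lambda y: log_q(y, q1, 1))
best_bigram = max(posterior, key=lambda y: log_q(y, q2, 2))
print(best_unigram, "|", best_bigram)
```

On this toy input the bigram model reproduces the posterior exactly and agrees with MAP, while the unigram model prefers the string whose words are individually most expected. `log_q` raises `KeyError` for an n-gram never seen in the posterior, which cannot happen for candidates drawn from the same set.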
![Page 180: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/180.jpg)
• Variational decoding with interpolation
$$q(r(w) \mid h(w), x) = \frac{\sum_{y'} c_w(y')\, p(y' \mid x)}{\sum_{y'} c_{h(w)}(y')\, p(y' \mid x)}$$
$$q_n(y \mid x) = \prod_{w \in W_n} q(r(w) \mid h(w), x)^{c_w(y)}$$
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} \sum_n \theta_n \cdot \log q_n^*(y \mid x)$$
• Minimum risk decoding (Tromble et al. 2008)
$$g_n(y \mid x) = \sum_{w \in W_n} g(w \mid x)\, c_w(y)$$
$$g(w \mid x) = \sum_{y'} \delta_w(y')\, p(y' \mid x)$$
non-probabilistic
very expensive to compute
$$y^* = \arg\max_{y \in \mathrm{HG}(x)} \sum_n \theta_n \cdot g_n(y \mid x)$$
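For contrast, the minimum-risk quantities on this slide can be sketched the same way. Again this is a toy illustration under assumptions: the posterior is an invented three-string dictionary enumerated explicitly, the weights `theta` are made up, and only the terms shown on the slide are implemented. One way to read the "non-probabilistic" annotation is that the g(w | x) values need not sum to one, which the example checks.

```python
from collections import Counter

def ngram_counts(y, n):
    """c_w(y): how often each n-gram w (as a tuple) occurs in y."""
    words = y.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def ngram_posteriors(posterior, n):
    """g(w|x) = sum_{y'} delta_w(y') p(y'|x): mass of strings containing w."""
    g = Counter()
    for y_prime, p in posterior.items():
        for w in ngram_counts(y_prime, n):  # each distinct n-gram once per string
            g[w] += p
    return g

def linear_score(y, posterior, theta):
    """sum_n theta_n * g_n(y|x), with g_n(y|x) = sum_w g(w|x) c_w(y)."""
    total = 0.0
    for n, theta_n in theta.items():
        g = ngram_posteriors(posterior, n)
        total += theta_n * sum(g[w] * c for w, c in ngram_counts(y, n).items())
    return total

# Hypothetical posterior and weights.
posterior = {"a cat": 0.3, "the cat": 0.3, "the mat": 0.4}
theta = {1: 1.0, 2: 1.0}

g1 = ngram_posteriors(posterior, 1)
best = max(posterior, key=lambda y: linear_score(y, posterior, theta))
print(dict(g1), "|", best)
```

The score is linear in the n-gram counts of y, whereas the variational rule sums log-probabilities of a normalized n-gram model.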
![Page 182: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/182.jpg)
BLEU Results on Chinese-English NIST MT Tasks
• variational decoding improves over Viterbi, MBR, and crunching

| Decoding scheme | MT'04 | MT'05 |
|---|---|---|
| Viterbi | 35.4 | 32.6 |
| MBR (K=1000) | 35.8 | 32.7 |
| Crunching (N=10000) | 35.7 | 32.8 |
| Crunching+MBR (N=10000) | 35.8 | 32.7 |
| Variational (1to4gram+wp+vt) | 36.6 | 33.5 |
![Page 183: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/183.jpg)
Conclusions
• Exact MAP decoding with spurious ambiguity is intractable
• Viterbi or N-best approximations are efficient, but ignore most derivations
• We developed a variational approximation, which considers all derivations but still allows tractable decoding
• Our variational decoding improves a state-of-the-art baseline
![Page 184: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/184.jpg)
Future directions
• The MT pipeline is full of intractable problems
• variational approximation is a principled way to tackle these problems
• Decoding with spurious ambiguity is a common problem in many other NLP applications
• Models with latent variables
• Data oriented parsing (DOP)
• Hidden Markov Models (HMM)
• ......
![Page 185: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/185.jpg)
Thank you! 谢谢!
![Page 186: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/186.jpg)
Joshua
![Page 187: Variational Decoding for Statistical Machine Translationjason/papers/li+al.acl09.slides-anim.pdf · Spurious Ambiguity • Statistical models in MT exhibit spurious ambiguity •](https://reader035.vdocuments.us/reader035/viewer/2022081516/6023740a1d79f027642ced37/html5/thumbnails/187.jpg)
1. Generate a hypergraph: p(y, d | x)
2. Estimate a model from the hypergraph: $q^*(y \mid x) \approx \sum_{d \in D(x,y)} p(y, d \mid x)$
• q* is an n-gram model over output strings.
3. Decode using q* on the hypergraph
[Figure, shown once per step: hypergraph for the source sentence "dianzi0 shang1 de2 mao3" with nodes S 0,4; X 0,4 the ··· cat; X 0,4 a ··· mat; X 0,2 the ··· mat; X 3,4 a ··· cat, and rules S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X1 of X0⟩]
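The sum over derivations that q* approximates can be computed exactly only when the derivations are few enough to list. The sketch below does that for a toy case to show why the Viterbi derivation and the most probable string can disagree. The joint probabilities and derivation ids are hypothetical values chosen for illustration; a real hypergraph packs far too many derivations to enumerate this way.

```python
from collections import defaultdict

# Hypothetical p(y, d | x): (translation string, derivation id) -> probability.
joint = {
    ("the mat a cat", "d1"): 0.30,
    ("a cat on the mat", "d2"): 0.25,
    ("a cat on the mat", "d3"): 0.25,
    ("a cat of the mat", "d4"): 0.20,
}

def viterbi_string(joint):
    """String of the single most probable derivation."""
    (y, _d), _p = max(joint.items(), key=lambda item: item[1])
    return y

def marginal(joint):
    """p(y|x) = sum over derivations d in D(x, y) of p(y, d | x)."""
    p_y = defaultdict(float)
    for (y, _d), p in joint.items():
        p_y[y] += p
    return dict(p_y)

p_y = marginal(joint)
viterbi = viterbi_string(joint)
map_string = max(p_y, key=p_y.get)
print(viterbi, "|", map_string)
```

Two derivations of the same string split its probability mass, so the best single derivation belongs to a different string than the one with the highest total probability.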