Bayesian Word Alignment for Statistical Machine Translation
Authors: Coskun Mermer, Murat Saraclar
Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group
Paper info
• Bayesian Word Alignment for Statistical Machine Translation
• ACL 2011 Short Paper
• Source code available in Perl (379 lines)
• Authors
  – Coskun Mermer
  – Murat Saraclar
Core Idea
• Propose a Gibbs Sampler for Fully Bayesian Inference in IBM Model 1
• Results
  – Outperforms classical EM by up to 2.99 BLEU points
  – Effectively addresses the rare word problem
  – Yields a much smaller phrase table than EM
Mathematics
• (E, F): parallel corpus
• $e_i$, $f_j$: the i-th (j-th) source (target) word in sentence e (f), which contains I (J) words and belongs to corpus E (F)
• $e_0$: the "null" word added to every source sentence in E
• $V_E$ ($V_F$): size of the source (target) vocabulary
• a (A): alignment for a sentence (for the corpus)
• $a_j$: $f_j$ is aligned to the source word $e_{a_j}$
• T: parameter table of size $V_E \times V_F$
• $t_{e,f} = P(f \mid e)$: word translation probability
IBM Model 1
T as a random variable
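As a sketch, the standard textbook form of the IBM Model 1 likelihood can be written in the notation defined under Mathematics as shown below. The sentence-length constant is omitted, and the hyperparameter symbol $\Theta$ for the prior on T is an assumption introduced here for illustration. The second line is just the chain rule once T is treated as a random variable with a prior.

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T)
  = \frac{1}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j},\, f_j}

P(F, A, T \mid E; \Theta)
  = P(T; \Theta)\, P(F, A \mid E, T)
```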
Dirichlet Distribution
• T = {$t_{e,f}$} parameterizes an exponential-family distribution
• Specifically, a multinomial distribution
• We choose its conjugate prior
• For the multinomial this is the Dirichlet distribution, chosen for computational convenience
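The convenience of the conjugate prior can be illustrated with a minimal Python sketch: with a Dirichlet prior and multinomial counts, the posterior is again a Dirichlet whose parameters are the prior parameters plus the counts. The toy prior values, counts, and function names below are assumptions made for illustration and are not taken from the paper or from its Perl code.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_params(prior, counts):
    """Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior."""
    return [p + c for p, c in zip(prior, counts)]


def posterior_mean(params):
    """Mean of a Dirichlet distribution with the given parameters."""
    total = sum(params)
    return [p / total for p in params]


# Hypothetical toy setup: one source word and a target vocabulary of 4 types.
prior = [0.1, 0.1, 0.1, 0.1]   # symmetric prior, values assumed
counts = [5, 1, 0, 0]          # assumed counts of target words aligned to it
post = posterior_params(prior, counts)
mean = posterior_mean(post)
t_e = sample_dirichlet(post, random.Random(0))  # one draw of the row t_e
```

No numerical integration is needed: updating the belief about a row of T is a simple addition of counts.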
Dirichlet Distribution
For each source word type e, $t_e$ is a distribution over the target vocabulary and is given a Dirichlet prior
The prior helps avoid rare words acting as "garbage collectors"
Dirichlet Distribution
Gibbs sampling: sample the unknowns A and T in turn
¬j denotes the exclusion of the current value of $a_j$.
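One common way to realize such a sampler, assuming T is integrated out analytically thanks to conjugacy (a collapsed sampler), leads to the standard Dirichlet-multinomial conditional below. The count symbol $N$ and the hyperparameters $\theta_{e,f}$ are notation assumed here: $N^{\neg j}_{e,f}$ is the number of times f is aligned to e in the corpus, not counting the current link of $f_j$.

```latex
P(a_j = i \mid E, F, A^{\neg j}; \Theta)
  \;\propto\;
  \frac{N^{\neg j}_{e_i, f_j} + \theta_{e_i, f_j}}
       {\sum_{f=1}^{V_F} \left( N^{\neg j}_{e_i, f} + \theta_{e_i, f} \right)}
```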
Algorithm
The initial alignment A can be arbitrary, but starting from the output of standard EM works better
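A minimal Python sketch of such a sampler is given here under stated assumptions: it is a collapsed variant (T integrated out), uses a single symmetric hyperparameter `theta`, starts from a random alignment rather than EM output, and returns only the last sample. The function name `gibbs_align`, the `NULL` token, and the toy corpus are hypothetical; this is not a port of bayesalign.pl.

```python
import random
from collections import defaultdict

NULL = "<null>"  # hypothetical token standing in for the null word e_0


def gibbs_align(corpus, theta=0.01, iterations=50, seed=0):
    """Collapsed Gibbs sampling for IBM Model 1 with a symmetric Dirichlet prior.

    corpus: list of (source_words, target_words) pairs.
    Returns one alignment list per sentence; a[j] indexes into [NULL] + source.
    """
    rng = random.Random(seed)
    v_f = len({f for _, fs in corpus for f in fs})  # target vocabulary size
    sents = [([NULL] + list(es), list(fs)) for es, fs in corpus]

    pair_count = defaultdict(int)  # N[e, f]: times f is aligned to e
    src_count = defaultdict(int)   # sum over f of N[e, f]

    # Arbitrary (uniform random) initialization of A.
    alignments = []
    for es, fs in sents:
        a = [rng.randrange(len(es)) for _ in fs]
        for j, f in enumerate(fs):
            pair_count[es[a[j]], f] += 1
            src_count[es[a[j]]] += 1
        alignments.append(a)

    for _ in range(iterations):
        for (es, fs), a in zip(sents, alignments):
            for j, f in enumerate(fs):
                # Remove the current link so the counts exclude a_j.
                old = es[a[j]]
                pair_count[old, f] -= 1
                src_count[old] -= 1
                # Unnormalized conditional for each candidate source position.
                weights = [
                    (pair_count[e, f] + theta) / (src_count[e] + theta * v_f)
                    for e in es
                ]
                i = rng.choices(range(len(es)), weights=weights)[0]
                # Record the new link and restore the counts.
                a[j] = i
                pair_count[es[i], f] += 1
                src_count[es[i]] += 1
    return alignments


# Hypothetical toy corpus for illustration only.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
alignments = gibbs_align(corpus)
```

Each returned index refers to a position in the source sentence with the null word prepended, so index 0 means "aligned to null". A fuller implementation would typically discard burn-in iterations and combine several samples rather than keep only the last one.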
Results
Code View
bayesalign.pl
Conclusions
• Outperforms classical EM by up to 2.99 BLEU points
• Effectively addresses the rare word problem
• Yields a much smaller phrase table than EM
• Shortcomings
  – Too slow: 100 sentence pairs take 18 minutes
  – Could possibly be sped up with parallel computing