
Bayesian Word Alignment for Statistical Machine Translation. Authors: Coskun Mermer, Murat Saraclar. Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group


Bayesian Word Alignment for Statistical Machine Translation

Authors: Coskun Mermer, Murat Saraclar

Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group


Paper info

• Bayesian Word Alignment for Statistical Machine Translation

• ACL 2011 Short Paper

• Source code released in Perl (379 lines)

• Authors
  – Coskun Mermer
  – Murat Saraclar


Core Idea

• Proposes a Gibbs sampler for fully Bayesian inference in IBM Model 1

• Results
  – Outperforms classical EM by up to 2.99 BLEU points
  – Effectively addresses the rare-word problem
  – Produces a much smaller phrase table than EM


Mathematics

• $(E, F)$: parallel corpus

• $e_i$, $f_j$: the $i$-th ($j$-th) source (target) word in sentence $e$ ($f$), which contains $I$ ($J$) words; $e$ ($f$) is a sentence of corpus $E$ ($F$)

• $e_0$: the “null” word, included in every source sentence

• $V_E$ ($V_F$): size of the source (target) vocabulary

• $a$ ($A$): alignment of a sentence (of the whole corpus)

• $a_j$: target word $f_j$ is aligned to source word $e_{a_j}$

• $T$: parameter table of size $V_E \times V_F$

• $t_{e,f} = P(f \mid e)$: word translation probability
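The notation above maps onto simple data structures. The sketch below is a hypothetical Python representation; the `<null>` token and the function names are assumptions, not taken from the paper. It prepends the null word $e_0$ to each source sentence. It then counts how often each source word type $e$ is linked to each target word type $f$ under an alignment $A$. For IBM Model 1, these link counts are all that the likelihood needs from $A$.

```python
from collections import Counter

NULL = "<null>"  # hypothetical token standing in for the null word e_0


def add_null(source_words):
    """Return the source sentence with e_0 at index 0, so a_j = 0 means 'aligned to null'."""
    return [NULL] + list(source_words)


def link_counts(corpus, alignments):
    """Count N[(e, f)]: how often target word f is aligned to source word e in A.

    corpus: list of (e_words, f_words) pairs, with e_words already starting with e_0.
    alignments: alignments[s][j] is a_j for sentence pair s.
    """
    counts = Counter()
    for (e_words, f_words), a in zip(corpus, alignments):
        for j, f in enumerate(f_words):
            counts[(e_words[a[j]], f)] += 1
    return counts


corpus = [(add_null(["das", "haus"]), ["the", "house"]),
          (add_null(["das", "buch"]), ["the", "book"])]
A = [[1, 2], [1, 2]]  # a_j for each target position of each sentence pair
print(link_counts(corpus, A)[("das", "the")])  # 2
```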


IBM Model 1

$T$ is treated as a random variable
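For reference, the standard IBM Model 1 likelihood can be written in the notation of the previous slide, ignoring the sentence-length probability. Here $N_{e,f}$ (notation assumed for this sketch) counts how often target word type $f$ is aligned to source word type $e$ in $A$. The first product runs over sentence pairs and the second over word types. The uniform factor $1/(I+1)^J$ does not depend on $T$. As a function of $T$, the likelihood therefore reduces to a product of powers of $t_{e,f}$. Treating $T$ as a random variable then means placing a prior $p(T)$ on it and integrating it out.

```latex
P(F, A \mid E; T) = \prod_{(e, f) \in (E, F)} \frac{1}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j}, f_j}
  \;\propto\; \prod_{e} \prod_{f} t_{e,f}^{\,N_{e,f}},
\qquad
P(F, A \mid E) = \int P(F, A \mid E; T)\, p(T)\, dT
```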


Dirichlet Distribution

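• Each row $t_e$ of $T = \{t_{e,f}\}$ is a distribution from the exponential family

• Specifically, a multinomial distribution over the target vocabulary

• We choose its conjugate prior

• For the multinomial, the conjugate prior is the Dirichlet distribution, chosen for computational convenience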
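Conjugacy is what makes this choice convenient. Combine a Dirichlet prior on one row $t_e$ with the multinomial link counts for $e$, and the posterior is again Dirichlet. Its parameters are simply the prior parameters plus the counts. The following is a minimal Python sketch; the dictionary representation and function names are illustrative assumptions.

```python
def dirichlet_posterior(alpha, counts):
    """Dir(alpha) prior on t_e + multinomial link counts N_{e,f} -> Dir(alpha + N_e)."""
    return {f: a + counts.get(f, 0) for f, a in alpha.items()}


def dirichlet_mean(params):
    """Expected value of each t_{e,f} under Dir(params): params[f] / sum(params)."""
    total = sum(params.values())
    return {f: p / total for f, p in params.items()}


# Hypothetical prior for one source word e over a 3-word target vocabulary,
# updated with the observation that e was linked to "house" three times.
alpha = {"the": 0.5, "house": 0.5, "book": 0.5}
post = dirichlet_posterior(alpha, {"house": 3})
print(post)                           # {'the': 0.5, 'house': 3.5, 'book': 0.5}
print(dirichlet_mean(post)["house"])  # about 0.778
```

No iterative re-estimation is needed to absorb new evidence: updating the belief about $t_e$ is a single addition per entry.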


Dirichlet Distribution

For each source word type $e$, $t_e$ is a distribution over the target vocabulary and is given a Dirichlet prior

The prior keeps rare source words from acting as “garbage collectors”, i.e. aligning to many unrelated target words
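A row $t_e$ can be drawn from its Dirichlet prior with the standard Gamma-normalization construction, shown below using only Python's `random` module. The function name and the hyperparameter value 0.1 are illustrative assumptions. When all concentration parameters are well below 1, draws tend to put most of their mass on a few target words. That is the kind of peaked translation distribution that works against a rare word spreading its probability thinly. In an explicit (uncollapsed) sampler, the same routine called with prior parameters plus link counts would draw $t_e$ given $A$.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dir(alphas).

    Standard construction: draw g_k ~ Gamma(alpha_k, 1) independently,
    then normalize so the components sum to 1.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]


rng = random.Random(0)
# Hypothetical symmetric prior over a 10-word target vocabulary for one source word e.
t_e = sample_dirichlet([0.1] * 10, rng)
print(round(sum(t_e), 6), max(t_e))
```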


Dirichlet Distribution

Sample the unknowns $A$ and $T$ in turn

$\neg j$ denotes the exclusion of the current value of $a_j$.
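This sketch makes two assumptions: $T$ is integrated out (a collapsed sampler), and every entry of the Dirichlet prior shares one hypothetical value $\theta$. Under them, the conditional for a single link takes a count-ratio form:

$P(a_j = i \mid E, F, A^{\neg j}) \propto (N^{\neg j}_{e_i, f_j} + \theta) / (\sum_{f} N^{\neg j}_{e_i, f} + V_F\,\theta)$

Here $N^{\neg j}_{e,f}$ counts links of $f$ to $e$ in $A$ with the current $a_j$ removed. The Python sketch below evaluates this conditional. The function name and dictionary inputs are illustrative assumptions, not code from the paper.

```python
def alignment_conditional(j, e_words, f_words, counts, totals, theta, vf):
    """Normalized P(a_j = i | E, F, A^{¬j}) for i = 0..I (collapsed, symmetric prior).

    counts[(e, f)]: link counts N^{¬j}_{e,f}, already EXCLUDING the current link of f_j.
    totals[e]: sum over f of counts[(e, f)], also excluding that link.
    theta: symmetric Dirichlet hyperparameter; vf: target vocabulary size V_F.
    """
    f = f_words[j]
    weights = [(counts.get((e, f), 0) + theta) / (totals.get(e, 0) + vf * theta)
               for e in e_words]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical example: "haus" already has two links to "house" elsewhere in the corpus,
# so position 2 ("haus") receives the largest probability.
probs = alignment_conditional(
    0, ["<null>", "das", "haus"], ["house"],
    counts={("haus", "house"): 2}, totals={"haus": 2}, theta=0.5, vf=2)
```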


Algorithm

The initial alignment $A$ can be arbitrary, but initializing from standard EM output works better
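A full sweep visits every target position in turn. At each one, it removes the current link from the counts and samples a new $a_j$. The weights come from the Dirichlet-multinomial ratio $(N_{e_i,f_j} + \theta)/(N_{e_i} + V_F\theta)$. The new link is then added back to the counts.

The sketch below is a hypothetical Python rendering of that loop. It assumes collapsed sampling and a single symmetric $\theta$. It is not a transcription of the authors' bayesalign.pl. It returns only the last sample, whereas a real system might collect several samples. As the slide recommends, `init_alignments` can come from EM, though any valid alignment works.

```python
import random
from collections import Counter


def gibbs_align(corpus, init_alignments, theta, vf, iterations, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 alignments (illustrative sketch).

    corpus: list of (e_words, f_words) pairs; e_words[0] is the null word e_0.
    init_alignments: starting A (arbitrary, or e.g. alignments produced by EM).
    theta: symmetric Dirichlet hyperparameter; vf: target vocabulary size V_F.
    Returns the alignment A after the final sweep.
    """
    rng = random.Random(seed)
    A = [list(a) for a in init_alignments]  # copy so the caller's A is untouched

    # Link counts N[(e, f)] and per-source-word totals N[e] for the initial A.
    counts, totals = Counter(), Counter()
    for (e_words, f_words), a in zip(corpus, A):
        for j, f in enumerate(f_words):
            counts[(e_words[a[j]], f)] += 1
            totals[e_words[a[j]]] += 1

    for _ in range(iterations):
        for (e_words, f_words), a in zip(corpus, A):
            for j, f in enumerate(f_words):
                # Remove the current link of f_j, giving the "¬j" counts.
                old = e_words[a[j]]
                counts[(old, f)] -= 1
                totals[old] -= 1
                # Unnormalized P(a_j = i | rest) for every source position i.
                weights = [(counts[(e, f)] + theta) / (totals[e] + theta * vf)
                           for e in e_words]
                a[j] = rng.choices(range(len(e_words)), weights=weights)[0]
                # Add the newly sampled link back into the counts.
                new = e_words[a[j]]
                counts[(new, f)] += 1
                totals[new] += 1
    return A


corpus = [(["<null>", "das", "haus"], ["the", "house"]),
          (["<null>", "das", "buch"], ["the", "book"])]
A = gibbs_align(corpus, [[1, 2], [1, 2]], theta=0.1, vf=3, iterations=100)
```

Every link update touches only a few counters, so a sweep costs roughly one pass over all (target word, source position) pairs. Many sweeps over a large corpus still add up quickly.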


Results


Code View

bayesalign.pl


Conclusions

• Outperforms classical EM by up to 2.99 BLEU points

• Effectively addresses the rare-word problem

• Produces a much smaller phrase table than EM

• Shortcomings
  – Too slow: processing 100 sentence pairs takes 18 minutes
  – Could possibly be sped up with parallel computing
