exploration of system combination in statistical machine translation
Different paradigms and approaches in Machine Translation (MT) result in different MT systems, each with its own strengths and weaknesses. System combination can exploit the complementary strengths of multiple MT systems. This research work aims to examine the effect of system combination on MT via empirical experiments and, more importantly, to utilize system combination to improve a state-of-the-art Chinese-to-English statistical machine translation (SMT) system. Extensive experiments were carried out on gold-standard MT datasets, in particular the evaluation sets of the NIST Open Machine Translation (OpenMT) evaluation series and the Workshops on Statistical Machine Translation (WMT). We not only evaluate the effect of system combination on translation performance but also examine different ways of selecting component systems. Finally, we exploit different Chinese word segmentation (CWS) standards as a way to produce diverse translation outputs for system combination. This approach yields a significant average gain of 0.5-0.8 BLEU points over strong baseline systems.
Exploration of system combination in statistical machine translation
Le Truong Vinh Phu
Supervisor: Prof. Ng Hwee Tou
Master of Computing dissertation
School of Computing
27th May 2014
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research
Outline
2
• Introduction
♦ Machine translation (MT)
♦ Statistical machine translation (SMT)
♦ Machine translation system combination
♦ Problem description & objective
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research
Outline
3
• the use of computers to automate translation
• difficulty: translation divergences
• real-world benefits
• different paradigms and approaches
♦ dictionary-based
♦ rule-based
♦ statistical
Machine translation (MT)
4
• enabled by the availability of large corpora (monolingual and bilingual)
• relying on probability models
♦ faithfulness
♦ fluency
• P(F|E): translation model (faithfulness); P(E): language model (fluency)
• Phrase-based SMT (Koehn et al., 2003)
Statistical machine translation (SMT)
5
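The two models above combine through the standard noisy-channel decision rule. Writing it out, with F the source (foreign) sentence and E a candidate English translation as in the slide's notation, shows why P(F|E) accounts for faithfulness and P(E) for fluency; P(F) can be dropped because it does not depend on E:

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
        = \arg\max_{E} P(F \mid E)\, P(E)
```

Phrase-based systems such as Moses generalize this product into a weighted log-linear combination of features, in which the translation model and the language model are two of the features.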
• Language model:
♦ conditional probability of a word given the previous words
♦ requires a monolingual corpus
• Alignment
Statistical machine translation (SMT)
6
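To make the language-model bullet concrete, here is a minimal sketch of the chain rule under a bigram approximation, estimated by unsmoothed maximum likelihood from a toy monolingual corpus. The function names and the corpus are hypothetical, and the model is deliberately much simpler than the interpolated 5-gram models used in the experiments later in the deck.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram_lm(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count bigrams and their histories, padding each sentence with <s> and </s>."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        padded = [BOS] + words + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts


def sentence_probability(words: List[str], history_counts: Counter, bigram_counts: Counter) -> float:
    """Chain rule with a bigram approximation: P(w_1..w_n) ~ prod_i P(w_i | w_{i-1}).

    Unsmoothed maximum-likelihood estimates, so any unseen bigram yields probability 0.
    """
    padded = [BOS] + words + [EOS]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        if history_counts[prev] == 0:
            return 0.0
        prob *= bigram_counts[(prev, cur)] / history_counts[prev]
    return prob


history, bigrams = train_bigram_lm([["the", "cat", "sat"], ["the", "cat", "ran"]])
print(sentence_probability(["the", "cat", "sat"], history, bigrams))  # 0.5
```

Real systems replace the raw relative frequencies with smoothed, interpolated estimates so that unseen n-grams do not zero out a whole sentence.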
• Reordering model:
♦ penalties for long-distance reordering
♦ distance-based (Koehn et al., 2005), phrase-based and hierarchical reordering (Galley & Manning, 2008)
• Automatic evaluation:
♦ BLEU (Papineni et al., 2002)
Statistical machine translation (SMT)
7
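The distance-based reordering penalty can be sketched as follows, assuming the common formulation in which a phrase whose source span starts at start_i, emitted right after a phrase whose source span ends at end_{i-1}, costs |start_i - end_{i-1} - 1|, so monotone translation costs nothing. The helper names and the alpha value are hypothetical, and the lexicalized phrase-based and hierarchical models are not covered by this sketch.

```python
from typing import List, Tuple


def distortion_cost(source_spans: List[Tuple[int, int]]) -> int:
    """Total distance-based reordering cost of a phrase segmentation.

    source_spans lists the source-side span (start, end), 0-based and inclusive,
    of each phrase in the order the phrases are emitted on the target side.
    """
    cost = 0
    prev_end = -1  # virtual phrase ending just before the first source word
    for start, end in source_spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


def distortion_penalty(source_spans: List[Tuple[int, int]], alpha: float = 0.5) -> float:
    """Exponential form alpha ** cost; alpha in (0, 1) is a hypothetical value."""
    return alpha ** distortion_cost(source_spans)


monotone = [(0, 1), (2, 3), (4, 4)]
swapped = [(2, 3), (0, 1), (4, 4)]  # first two phrases translated in reverse order
print(distortion_cost(monotone), distortion_cost(swapped))  # 0 8
```

Swapping two adjacent two-word phrases already costs 8 here, which is why a pure distance penalty discourages even linguistically motivated long jumps; the lexicalized models condition the penalty on the phrases themselves instead.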
• different MT systems => different strengths and weaknesses
• synthesizing a consensus translation
• main aspects:
♦ combination method
♦ selection of good component systems to combine
MT system combination
8
• Problem description
♦ in which situations and settings does system combination work well?
• Objective:
♦ evaluating system combination via empirical experiments
Ø available datasets: NIST OpenMT, WMT
♦ utilizing system combination to improve a Chinese-to-English phrase-based system
Problem description & objective
9
• Introduction
• Literature Review
♦ System combination
♦ Confusion network decoding
♦ Other approaches
♦ Diverse hypothesis generation
• Multi-Engine Machine Translation (MEMT)
• Experiments
• Conclusion and Future Research
Outline
10
• successfully applied in speech recognition (Fiscus, 1997; Mangu et al., 2000)
• crucial steps: aligning hypotheses, controlling word order
• variety of approaches:
♦ hypothesis re-ranking (Hildebrand & Vogel, 2008)
♦ confusion networks (Rosti et al., 2007a, 2007b)
♦ collaborative decoding (Li et al., 2009)
System combination
11
• current mainstream approach
• Bangalore et al. (2001), Matusov et al. (2006), Rosti et al. (2007a, 2007b), Sim et al. (2007), He et al. (2008)
• Rosti et al. (2007a): combination at three levels
♦ Sentence level
♦ Phrase level
♦ Word level
Confusion network decoding
12
Confusion network decoding
• example hypotheses: “cat sat the mat”, “cat sitting on the mat”, “hat on a mat”
13
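The three example hypotheses are a convenient toy input for confusion-network decoding. The sketch below is an illustration under simplifying assumptions, not Rosti et al.'s implementation: it aligns every hypothesis to a chosen backbone with plain Levenshtein alignment (TER, which is commonly used for this step, also allows block shifts), builds one column per backbone word plus columns for inserted words, and picks the entry with the highest total system weight in each column. The system weights are hypothetical (in practice they are tuned), and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

EPS = ""  # epsilon arc: this hypothesis contributes no word to the column


def align_to_backbone(backbone: List[str], hyp: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Levenshtein alignment of hyp to the backbone (no block shifts, unlike TER).

    Returns (columns, insertions): columns[i] is the hyp word aligned to backbone[i]
    (EPS if none); insertions[i] holds hyp words inserted before backbone[i].
    """
    n, m = len(backbone), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if backbone[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    columns = [EPS] * n
    insertions: List[List[str]] = [[] for _ in range(n + 1)]
    i, j = n, m
    while i > 0 or j > 0:  # backtrace; tie order: match, deletion, substitution, insertion
        diag_ok = i > 0 and j > 0
        if diag_ok and backbone[i - 1] == hyp[j - 1] and d[i][j] == d[i - 1][j - 1]:
            columns[i - 1] = hyp[j - 1]  # exact match
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1  # backbone word missing from hyp: column keeps EPS
        elif diag_ok and d[i][j] == d[i - 1][j - 1] + 1:
            columns[i - 1] = hyp[j - 1]  # substitution
            i, j = i - 1, j - 1
        else:
            insertions[i].insert(0, hyp[j - 1])  # extra hyp word before backbone[i]
            j -= 1
    return columns, insertions


def build_confusion_network(hyps: List[List[str]], backbone_index: int) -> List[List[str]]:
    """Each column lists one entry per system, in system order."""
    backbone = hyps[backbone_index]
    aligned = [align_to_backbone(backbone, h) for h in hyps]
    network: List[List[str]] = []
    for k in range(len(backbone) + 1):
        inserted = [" ".join(ins[k]) if ins[k] else EPS for _, ins in aligned]
        if any(entry != EPS for entry in inserted):
            network.append(inserted)
        if k < len(backbone):
            network.append([cols[k] for cols, _ in aligned])
    return network


def vote(network: List[List[str]], weights: List[float]) -> List[str]:
    """Pick the entry with the highest total system weight per column; drop EPS."""
    output: List[str] = []
    for column in network:
        totals: Dict[str, float] = {}
        for entry, weight in zip(column, weights):
            totals[entry] = totals.get(entry, 0.0) + weight
        best = max(totals, key=lambda e: totals[e])  # ties go to the entry seen first
        if best != EPS:
            output.extend(best.split())
    return output


hyps = [s.split() for s in ["cat sat the mat", "cat sitting on the mat", "hat on a mat"]]
network = build_confusion_network(hyps, backbone_index=1)
print(" ".join(vote(network, weights=[0.4, 0.25, 0.35])))  # cat sat on the mat
```

With these weights the consensus "cat sat on the mat" matches none of the inputs exactly, which is the point of word-level combination; the choice of backbone and weights determines how the network is shaped and which path wins.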
• Collaborative decoding (Li et al., 2009)
♦ avoid early pruning of potentially good translations
♦ leverage agreement information of n-grams
• Multi-Engine Machine Translation (MEMT)
♦ METEOR alignment (Banerjee & Lavie, 2005)
♦ no fixed backbone
Other approaches
14
• Not a trivial problem (Siohan et al., 2005)
• Key point: complementary error patterns
• Approaches:
♦ selecting systems from different paradigms
♦ diversifying one baseline system
Ø introducing randomness (Siohan et al., 2005)
Ø different morphological decompositions of the source language (de Gispert et al., 2009)
Ø varying alignment algorithms (Xu & Rosti, 2010)
Ø controlling target “trait” values (Devlin and Matsoukas, 2012)
Diverse hypothesis generation
15
• Exploiting multiple Chinese word segmentation standards: Zhang et al. (2008), Dyer et al. (2008), Xu et al. (2005)
• Zhang et al. (2008):
♦ Exploiting four SIGHAN standards: AS, CITYU, MSR, PKU
Diverse hypothesis generation
16
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
♦ Overview
♦ Description
• Experiments
• Conclusion and Future Research
Outline
17
• Open-source toolkit: http://kheafield.com/code/memt/
• WMT system names: cmu-combo (2009), cmu-heafield-combo (2010, 2011)
• Superior performance in WMT 2011
• Easy to use, robust and efficient
Overview
18
• Combining 1-best outputs of component systems
♦ Pairwise alignment (METEOR)
♦ Beam search
♦ Z-MERT tuning (Zaidan, 2009)
• Features:
♦ length
♦ language model
♦ backoff
♦ match
Description
19
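As a rough illustration of how such features can be combined, the sketch below scores a candidate with a weighted linear sum of a length feature and per-system n-gram match counts. It assumes plain surface n-gram overlap in place of MEMT's alignment-based matching, omits the language-model and backoff features (they would need a trained LM), and uses hypothetical weights where MEMT would use values tuned by Z-MERT. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def match_features(hyp: List[str], systems: List[List[str]], max_n: int = 2) -> Dict[str, float]:
    """For each system s and order n, count hypothesis n-grams that also occur in system s.

    Surface n-gram overlap is an assumed stand-in for MEMT's alignment-based matching.
    """
    feats: Dict[str, float] = {}
    for s, sys_words in enumerate(systems):
        for n in range(1, max_n + 1):
            sys_ngrams = set(ngrams(sys_words, n))
            feats[f"match_s{s}_n{n}"] = float(sum(g in sys_ngrams for g in ngrams(hyp, n)))
    return feats


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


systems = [s.split() for s in ["cat sat the mat", "cat sitting on the mat", "hat on a mat"]]
hyp = "cat sat on the mat".split()
features = {"length": float(len(hyp)), **match_features(hyp, systems)}
# Hypothetical weights; in MEMT the feature weights are tuned with Z-MERT.
weights = {"length": -0.1, "match_s0_n1": 0.3, "match_s1_n2": 0.5}
score = linear_score(features, weights)
```

During beam search, a score of this form ranks partial hypotheses; tuning then adjusts the weights so that the top-scoring outputs maximize BLEU on the tuning set.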
• METEOR alignment:
♦ exact matches
♦ identical stems (Porter, 2001)
♦ WordNet synonyms (Miller, 1995)
♦ TERp unigram paraphrases (Snover et al., 2009)
Description
20
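The staged matching idea can be sketched as a greedy aligner in which each stage only links words left unaligned by the earlier stages. This is a simplification under stated assumptions: crude_stem is a hypothetical suffix stripper standing in for the Porter stemmer, TOY_SYNONYMS is a hand-written table standing in for WordNet, the paraphrase stage is omitted, and METEOR's preference for alignments with fewer crossing links is not modeled.

```python
from typing import Callable, Dict, List, Set, Tuple


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for the Porter stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical stand-in for WordNet synonym sets.
TOY_SYNONYMS: Dict[str, Set[str]] = {"sofa": {"couch"}, "couch": {"sofa"}}


def staged_align(a: List[str], b: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy one-to-one unigram alignment in METEOR-like stages, returned as (i, j, stage)."""
    stages: List[Tuple[str, Callable[[str, str], bool]]] = [
        ("exact", lambda x, y: x == y),
        ("stem", lambda x, y: crude_stem(x) == crude_stem(y)),
        ("synonym", lambda x, y: y in TOY_SYNONYMS.get(x, set())),
    ]
    links: List[Tuple[int, int, str]] = []
    used_a: Set[int] = set()
    used_b: Set[int] = set()
    for name, matches in stages:
        for i, x in enumerate(a):
            if i in used_a:
                continue  # already aligned by an earlier stage
            for j, y in enumerate(b):
                if j not in used_b and matches(x, y):
                    links.append((i, j, name))
                    used_a.add(i)
                    used_b.add(j)
                    break
    return sorted(links)


hyp_a = "the cats sat on the sofa".split()
hyp_b = "the cat sitting on a couch".split()
print(staged_align(hyp_a, hyp_b))
# [(0, 0, 'exact'), (1, 1, 'stem'), (3, 3, 'exact'), (5, 5, 'synonym')]
```

"sat" and "sitting" stay unaligned in this toy run: stemming alone cannot relate irregular forms such as these.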
• Search space:
♦ picking one word at a time, from left to right
♦ maintaining two sets of “captured” and “uncaptured” words
♦ no duplication, fluency across switches
♦ no fixed backbone
Description
21
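The search space described above can be illustrated with a simplified reading of the slide, which is not MEMT's exact captured/uncaptured bookkeeping: each system keeps a pointer to its first unused word; a hypothesis grows by appending the first unused word of any system; and the chosen word's aligned counterparts in other systems are marked as used by pushing those systems' pointers past them, so an aligned word is never emitted twice. The alignment links are supplied by hand here (MEMT obtains them from METEOR), the function name is hypothetical, and the sketch enumerates every hypothesis exhaustively where MEMT runs a feature-scored beam search.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[int, int]  # (system index, word index)


def enumerate_hypotheses(systems: List[List[str]], links: List[Tuple[Position, Position]]) -> Set[str]:
    """All outputs reachable by the simplified left-to-right combination search."""
    aligned: Dict[Position, List[Position]] = {}
    for a, b in links:
        aligned.setdefault(a, []).append(b)
        aligned.setdefault(b, []).append(a)
    results: Set[str] = set()

    def extend(pointers: Tuple[int, ...], words: Tuple[str, ...]) -> None:
        if all(p == len(out) for p, out in zip(pointers, systems)):
            results.add(" ".join(words))  # every word of every system is used
            return
        for s, p in enumerate(pointers):
            if p == len(systems[s]):
                continue  # system s is exhausted
            nxt = list(pointers)
            nxt[s] = p + 1
            # Mark aligned counterparts as used, skipping any words before them.
            for t, j in aligned.get((s, p), []):
                nxt[t] = max(nxt[t], j + 1)
            extend(tuple(nxt), words + (systems[s][p],))

    extend(tuple(0 for _ in systems), ())
    return results


systems = [s.split() for s in ["cat sat the mat", "cat sitting on the mat"]]
links = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 2), (1, 3)), ((0, 3), (1, 4))]
print(sorted(enumerate_hypotheses(systems, links)))
# ['cat sat on the mat', 'cat sat the mat', 'cat sitting on the mat', 'cat sitting the mat']
```

"cat sat on the mat" switches from the first system to the second after "sat", which shows how a final hypothesis can weave together parts of different component outputs without a fixed backbone.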
• final hypothesis weaves together parts of component outputs
Description
22
• Introduction
• Literature Review
• Multi-Engine Machine Translation (MEMT)
• Experiments
♦ MEMT on WMT11
♦ MEMT on NIST MT08
♦ Diversifying Chinese-English phrase-based SMT
♦ Exploiting multiple CWS standards
• Conclusion and Future Research
Outline
23
• http://www.statmt.org/wmt11
• two language pairs: French-English and Spanish-English
• Ranking participating systems by BLEU on the test set
• Selecting different component systems for system combination
MEMT on WMT11
24
• French-English
[Figure: system combination gain]
MEMT on WMT11
25
• Spanish-English
[Figure: system combination gain]
MEMT on WMT11
26
• Spanish-English
♦ why does E1 (combining all systems) score lower than E2 (excluding the bottom two systems)?
MEMT on WMT11
27
• LDC catalog nos. LDC2010T21 and LDC2010T01
• No accompanying system papers
• Challenging: mix of newswire and web text
• Chinese-English and Arabic-English
♦ datasets split into a tuning set and a test set
MEMT on NIST MT08
28
• Chinese-English:
♦ Tuning set: 524 sentences, test set: 788 sentences
♦ Combining the top 5 systems out of 23 systems
♦ similar to Ma and McKeown (2012)
• Arabic-English:
♦ Tuning set: 509 sentences, test set: 803 sentences
♦ Combining the top 7 systems out of 14 systems
MEMT on NIST MT08
29
• Chinese-English, gain = 3.76
MEMT on NIST MT08
30
• Arabic-English, gain = 3.47
MEMT on NIST MT08
31
• Varying different steps of the training pipeline
• Tune on MTC1+MTC3 datasets (LDC2002T01 and LDC2004T07), test on NIST02-NIST08 evaluation sets
• Varying decoding algorithm: Maximum A Posteriori (MAP), Minimum Bayes Risk (MBR), Lattice Minimum Bayes Risk (LMBR)
• Varying reordering model: word-based (wbe), phrase-based (phrase), hierarchical (hier), combined reordering (phrase-hier)
Diversifying Chinese-English SMT
32
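The difference between MAP and MBR decoding can be sketched over an n-best list: MAP returns the single highest-scoring hypothesis, while MBR returns the hypothesis with the highest expected gain against all hypotheses, weighted by their posterior probabilities. MBR for MT is typically run with a BLEU-based gain; this sketch swaps in clipped unigram F1 to stay short, derives posteriors from a softmax of hypothetical log-domain model scores, and does not cover lattice MBR (LMBR). Function names, scores, and the scale factor are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

NBest = List[Tuple[List[str], float]]  # (tokens, log-domain model score)


def unigram_f1(a: List[str], b: List[str]) -> float:
    """Clipped unigram overlap F1; a cheap stand-in for a sentence-level BLEU gain."""
    overlap = sum((Counter(a) & Counter(b)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(a), overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def map_decode(nbest: NBest) -> List[str]:
    """MAP: the single highest-scoring hypothesis."""
    return max(nbest, key=lambda item: item[1])[0]


def mbr_decode(nbest: NBest, scale: float = 1.0) -> List[str]:
    """MBR: maximize expected gain under the posterior induced by the model scores."""
    top = max(score for _, score in nbest)
    unnorm = [math.exp(scale * (score - top)) for _, score in nbest]  # stable softmax
    z = sum(unnorm)
    posteriors = [w / z for w in unnorm]

    def expected_gain(hyp: List[str]) -> float:
        return sum(p * unigram_f1(hyp, other) for (other, _), p in zip(nbest, posteriors))

    return max((hyp for hyp, _ in nbest), key=expected_gain)


nbest = [
    ("a dog sat on a mat".split(), -1.0),
    ("the cat sat on the mat".split(), -1.1),
    ("the cat sat on a mat".split(), -1.2),
]
print(" ".join(map_decode(nbest)))  # a dog sat on a mat
print(" ".join(mbr_decode(nbest)))  # the cat sat on a mat
```

MBR prefers the hypothesis that agrees most with the rest of the list, even though the model ranks it last, so MAP and MBR systems can produce different outputs.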
• Varying decoding algorithm, gain=-0.17
Diversifying Chinese-English SMT
33
• Varying reordering model, gain=0.19
Diversifying Chinese-English SMT
34
• Chinese Word Segmentation
♦ Correlates weakly with MT quality
♦ Potential source of diversity
• SIGHAN Bakeoff evaluation campaign
♦ Academia Sinica (AS)
♦ City University of Hong Kong (CITYU)
♦ Penn Chinese Treebank (CTB)
♦ Microsoft Research (MSR)
♦ Peking University (PKU)
Exploiting multiple CWS standards
35
• Chinese Word Segmentation
Exploiting multiple CWS standards
36
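To illustrate why segmentation standards are a source of diversity, the sketch below segments the same sentence with greedy forward maximum matching under two hypothetical toy lexicons, one coarse-grained and one fine-grained. Forward maximum matching is a simple dictionary-based stand-in chosen for brevity; it is not the segmenter used to produce the five standards, and the lexicons are not the actual SIGHAN standards.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str]) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when nothing in the lexicon matches."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicons standing in for a coarse and a fine segmentation standard.
COARSE_LEXICON = {"中华人民共和国", "成立"}
FINE_LEXICON = {"中华", "人民", "共和国", "成立"}

sentence = "中华人民共和国成立了"
print(forward_max_match(sentence, COARSE_LEXICON))  # ['中华人民共和国', '成立', '了']
print(forward_max_match(sentence, FINE_LEXICON))    # ['中华', '人民', '共和国', '成立', '了']
```

The two token sequences lead to different source vocabularies, word alignments and phrase tables downstream, which is what makes SMT systems trained on differently segmented data behave differently.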
• Baseline System
♦ Chinese-English phrase-based SMT systems trained with Moses
♦ Segmenting and training five different systems corresponding to five CWS standards
♦ Training bi-text: 8,290,649 sentence pairs
♦ Interpolated language model of order 5
♦ Tuning set MTC1+MTC3: 1928 sentences, 4 references each
♦ GIZA++ alignment, combined reordering scheme, MBR decoding
Exploiting multiple CWS standards
37
• System combination experiments
♦ Same tuning set MTC1+MTC3
♦ Z-MERT and PRO tuning
♦ Test sets: NIST 2002 to 2006, 2008
♦ Evaluation: case-insensitive BLEU, computed with mteval-v11b
Exploiting multiple CWS standards
38
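For reference, corpus-level BLEU with multiple references can be sketched as below: clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty, with lowercasing for case-insensitive scoring. This is an illustrative reimplementation, not mteval-v11b: tokenization is plain whitespace splitting, and the brevity penalty uses the closest reference length, a convention on which scorers differ. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(words: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous n-grams."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[List[str]], max_n: int = 4, lowercase: bool = True) -> float:
    """Corpus BLEU (Papineni et al., 2002); refs[k] holds all references for sentence k."""
    normalize = str.lower if lowercase else (lambda s: s)
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, sent_refs in zip(hyps, refs):
        h = normalize(hyp).split()
        rs = [normalize(r).split() for r in sent_refs]
        hyp_len += len(h)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(h)), len(r)) for r in rs)[1]
        for n in range(1, max_n + 1):
            h_counts = ngram_counts(h, n)
            max_ref: Counter = Counter()
            for r in rs:
                max_ref |= ngram_counts(r, n)  # per-n-gram maximum over references
            matches[n - 1] += sum((h_counts & max_ref).values())  # clipped counts
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyps = ["The cat sat on the mat"]
refs = [["the cat sat on the mat", "a cat sat on the mat"]]
print(corpus_bleu(hyps, refs))  # 1.0
```

Reported scores are usually this value times 100, so the average gains on the following slides are in BLEU points on that scale. Scoring the same example with `lowercase=False` drops the result below 1.0, because "The" no longer matches "the".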
• Results – component systems
Exploiting multiple CWS standards
39
• Results – combining 5 systems
♦ Average BLEU gain: 0.52 (Z-MERT) and 0.82 (PRO)
Exploiting multiple CWS standards
40
• Results – combining the top 3 systems
♦ Average BLEU gain: 0.35 (Z-MERT) and 0.64 (PRO)
♦ Lower than when combining all 5 systems
Exploiting multiple CWS standards
41
• Discussion
♦ CWS is a good source for generating diverse SMT systems
♦ Benefits:
Ø Reducing segmentation errors
Ø Reducing out-of-vocabulary words
Ø Providing diverse translations
Exploiting multiple CWS standards
42
• Component system outputs
Exploiting multiple CWS standards
43
• Combined system output
Exploiting multiple CWS standards
44
Conclusion and future research
• Conclusion
♦ System combination does benefit MT
♦ Exceptions:
Ø Combining very few systems
Ø Including component systems with exceptionally poor performance
Ø Combining very similar (non-complementary) systems
♦ Achieved the goal of improving Chinese-English SMT system
45
Conclusion and future research
• Future research
♦ Evaluating different combination algorithms
Ø Collaborative decoding (Li et al., 2009)
♦ Trait-based approach as a way to generate diverse inputs (Devlin and Matsoukas, 2012)
46
Summary
• Empirical experiments
♦ MEMT as the system combination module
♦ WMT and NIST evaluation sets
• System combination does benefit MT quality
♦ given comparable, complementary input systems
• Exploiting multiple CWS standards as a way to diversify SMT systems
♦ improves a strong Chinese-English phrase-based system
♦ average gain of 0.5-0.8 BLEU on NIST02-06 and NIST08
47
Thank You
48