
Exploration of system combination in statistical machine translation

Le Truong Vinh Phu

Supervisor: Prof. Ng Hwee Tou

Master of Computing dissertation

School of Computing

27th May 2014

•  Introduction •  Literature Review •  Multi-Engine Machine Translation (MEMT)

•  Experiments •  Conclusion and Future Research

Outline


•  Introduction ♦  Machine translation (MT)

♦  Statistical machine translation (SMT)

♦  Machine translation system combination

♦  Problem description & objective

•  Literature Review

•  Multi-Engine Machine Translation (MEMT) •  Experiments

•  Conclusion and Future Research

Outline


•  the use of computers to automate translation •  difficulty: translation divergences •  real-world benefits

•  different paradigms and approaches ♦  dictionary-based

♦  rule-based

♦  statistical

Machine translation (MT)


•  enabled by the availability of large corpora (monolingual and bilingual)

•  relying on probability models ♦  faithfulness

♦  fluency

•  P(F|E): translation model, P(E): language model

•  Phrase-based SMT (Koehn et al., 2003)

Statistical machine translation (SMT)
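The two models on this slide fit together through the standard noisy-channel decomposition. As a worked equation, with F the source sentence and E a candidate translation as in the slide's notation (P(F) is constant for a given input, so it drops out of the maximization):

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
        = \arg\max_{E}
          \underbrace{P(F \mid E)}_{\text{translation model (faithfulness)}}
          \;
          \underbrace{P(E)}_{\text{language model (fluency)}}
```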


•  Language model: ♦  conditional probability of a word given previous words

♦  requires monolingual corpus

•  Alignment
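To illustrate the language model bullet, here is a minimal sketch of a bigram model estimated by maximum likelihood from a monolingual corpus. The toy corpus and the function name `train_bigram_lm` are assumptions made for illustration; real systems use higher-order models with smoothing.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function prob(word, prev) = P(word | prev), estimated by MLE."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])            # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs

    def prob(word, prev):
        if history_counts[prev] == 0:
            return 0.0  # unseen history; a real model would back off or smooth
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


# Toy monolingual corpus (hypothetical data).
corpus = ["the cat sat on the mat", "the cat sat", "the hat"]
p = train_bigram_lm(corpus)
print(p("cat", "the"))  # 0.5: "the" is followed by "cat" in 2 of its 4 occurrences
```

Unsmoothed estimates like these assign zero probability to any unseen word pair, which is why practical language models rely on smoothing and interpolation.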



•  Reordering model: ♦  penalties for long distance reordering

♦  distance-based (Koehn et al., 2005), phrase-based and hierarchical reordering (Galley & Manning, 2008)

•  Automatic evaluation: ♦  BLEU (Papineni et al., 2002)
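As a rough illustration of the BLEU metric named above, the sketch below computes corpus-level BLEU from clipped n-gram precisions and a brevity penalty, following the definition in Papineni et al. (2002). It is a simplified stand-in written for this example, not the official NIST scoring script: it expects pre-tokenized input and applies no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses: list of token lists; references: list of lists of token lists."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the hypothesis.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)       # element-wise maximum over references
            clipped = hyp_counts & max_ref_counts    # element-wise minimum (clipping)
            matched[n - 1] += sum(clipped.values())
            total[n - 1] += sum(hyp_counts.values())
    if min(matched) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyp = "the cat sat on the mat".split()
print(corpus_bleu([hyp], [[hyp]]))  # 1.0 for an exact match
```

Clipping keeps a hypothesis from being rewarded for repeating a reference word more often than any reference contains it.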



•  different MT systems => different strengths and weaknesses

•  synthesizing a consensus translation

•  main aspects: ♦  combination method

♦  selection of good component systems to combine

MT system combination


•  Problem description ♦  In which situations and settings does system combination work well?

•  Objective:

♦  evaluating system combination via empirical experiments Ø  available datasets: NIST OpenMT, WMT

♦  utilizing system combination to improve a Chinese-to-English phrase-based system

Problem description & objective


•  Introduction •  Literature Review ♦  System combination

♦  Confusion network decoding

♦  Other approaches

♦  Diverse hypotheses generation

•  Multi-Engine Machine Translation (MEMT) •  Experiments •  Conclusion and Future Research

Outline


•  successfully applied in speech recognition (Fiscus, 1997; Mangu et al., 2000)

•  crucial steps: aligning hypotheses, controlling word order

•  variety of approaches: ♦  hypothesis re-ranking (Hildebrand & Vogel, 2008)

♦  confusion networks (Rosti et al., 2007a, 2007b)

♦  collaborative decoding (Li et al., 2009)

System combination


•  current mainstream

•  Bangalore et al. (2001), Matusov et al. (2006), Rosti et al. (2007a, 2007b), Sim et al. (2007), He et al. (2008)

•  Rosti et al. (2007a) ♦  Sentence level

♦  Phrase level

♦  Word level

Confusion network decoding


Confusion network decoding

•  Example hypotheses to combine: “cat sat the mat”, “cat sitting on the mat”, and “hat on a mat”
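Word-level confusion network decoding can be sketched on the three example hypotheses. The sketch assumes the hypotheses have already been aligned slot by slot (the alignment below was done by hand, with an empty string marking a skipped slot) and that each system carries a weight; both the alignment and the weights are hypothetical. Real decoders also add language model and other feature scores rather than relying on votes alone.

```python
from collections import defaultdict

EPS = ""  # empty arc: the system contributes no word in this slot


def decode_confusion_network(aligned_hyps, weights):
    """Pick the highest-weighted word in every slot and drop empty arcs."""
    n_slots = len(aligned_hyps[0])
    assert all(len(h) == n_slots for h in aligned_hyps), "hypotheses must be aligned"
    output = []
    for slot in range(n_slots):
        votes = defaultdict(float)
        for hyp, weight in zip(aligned_hyps, weights):
            votes[hyp[slot]] += weight
        best = max(votes, key=votes.get)
        if best != EPS:
            output.append(best)
    return " ".join(output)


# Hand-made alignment of the three hypotheses (an assumption for this sketch).
aligned = [
    ["cat", "sat",     EPS,  "the", "mat"],
    ["cat", "sitting", "on", "the", "mat"],
    ["hat", EPS,       "on", "a",   "mat"],
]
system_weights = [0.40, 0.35, 0.25]  # hypothetical system confidences

print(decode_confusion_network(aligned, system_weights))  # cat sat on the mat
```

The consensus output contains "on", which the first hypothesis lacks, and "the", which the third lacks, so it matches no single input exactly.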


•  Collaborative decoding (Li et al., 2009) ♦  avoid early pruning of potentially good translations

♦  leverage agreement information of n-grams

•  Multi-Engine Machine Translation (MEMT) ♦  METEOR alignment (Banerjee & Lavie, 2005)

♦  no fixed backbone

Other approaches


•  Not a trivial problem (Siohan et al., 2005)

•  Key point: complementary error patterns

•  Approaches: ♦  selecting systems from different paradigms

♦  diversifying one baseline system

Ø  introducing randomness (Siohan et al., 2005)

Ø  different morphological decompositions of source language (de Gispert et al., 2009)

Ø  varying alignment algorithms (Xu & Rosti, 2010)

Ø  controlling target “trait” values (Devlin and Matsoukas, 2012)

Diverse hypothesis generation


•  Exploiting multiple Chinese word segmentation standards: Zhang et al. (2008), Dyer et al. (2008), Xu et al. (2005)

•  Zhang et al. (2008): ♦  Exploiting four SIGHAN standards: AS, CITYU, MSR, PKU

Diverse hypothesis generation


•  Introduction •  Literature Review •  Multi-Engine Machine Translation (MEMT) ♦  Overview

♦  Description

•  Experiments •  Conclusion and Future Research

Outline


•  Open source toolkit: http://kheafield.com/code/memt/

•  WMT system name: cmu-combo (2009), cmu-heafield-combo (2010, 2011)

•  Superior performance in WMT 2011

•  Easy to use, robust and efficient

Overview


•  Combining 1-best outputs of component systems ♦  Pair-wise alignment (METEOR)

♦  Beam search

♦  Z-MERT tuning (Zaidan, 2009)

•  Features: ♦  length

♦  language model

♦  backoff

♦  match

Description


•  METEOR alignment: ♦  exact matches

♦  identical stems (Porter, 2001)

♦  WordNet synonyms (Miller, 1995)

♦  TERp unigram paraphrases (Snover et al., 2009)

Description
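The staged matching behind the METEOR alignment can be sketched as follows. This is only an illustration of the idea (exact matches first, then stems, then synonyms), not the METEOR aligner itself: `crude_stem` is a toy stand-in for the Porter stemmer, `TOY_SYNONYMS` is a hypothetical stand-in for WordNet, the paraphrase stage is omitted, and links are chosen greedily from left to right.

```python
def crude_stem(word):
    """Toy suffix stripper (an assumption; METEOR uses the Porter stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical synonym pairs standing in for WordNet lookups.
TOY_SYNONYMS = {frozenset({"sofa", "couch"})}


def match_type(a, b):
    """Return the strongest match type between two words, or None."""
    if a == b:
        return "exact"
    if crude_stem(a) == crude_stem(b):
        return "stem"
    if frozenset({a, b}) in TOY_SYNONYMS:
        return "synonym"
    return None


def align(hyp_a, hyp_b):
    """Greedy one-to-one alignment, processing match stages in priority order."""
    links = []
    used_a, used_b = set(), set()
    for stage in ("exact", "stem", "synonym"):
        for i, a in enumerate(hyp_a):
            if i in used_a:
                continue
            for j, b in enumerate(hyp_b):
                if j not in used_b and match_type(a, b) == stage:
                    links.append((i, j, stage))
                    used_a.add(i)
                    used_b.add(j)
                    break
    return sorted(links)


a = "the cat sits on the sofa".split()
b = "the cats sat on a couch".split()
for link in align(a, b):
    print(link)
```

Words left unaligned ("sits", the second "the", "sat", "a") are the points where the component outputs genuinely disagree and the combination search has to choose between them.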


•  Search space: ♦  picking one word at a time, from left to right

♦  maintaining two sets of “captured” and “uncaptured” words

♦  no duplication, fluency across switches

♦  no fixed backbone

Description


•  final hypothesis weaves together parts of component outputs

Description


•  Introduction •  Literature Review •  Multi-Engine Machine Translation (MEMT)

•  Experiments ♦  MEMT on WMT11

♦  MEMT on NIST MT08

♦  Diversifying Chinese-English phrase-based SMT

♦  Exploiting multiple CWS standards

•  Conclusion and Future Research

Outline


•  http://www.statmt.org/wmt11 •  two language pairs: French-English and Spanish-English •  Ranking participating systems by BLEU on the test set

•  Selecting different component systems for system combination

MEMT on WMT11


•  French-English

[Results figure not recovered from the slide; annotation: system combination gain]

MEMT on WMT11

•  Spanish-English

[Results figure not recovered from the slide; annotation: system combination gain]

MEMT on WMT11

•  Spanish-English ♦  Why does E1 (combining all systems) score lower than E2 (excluding the bottom two)?

MEMT on WMT11


•  LDC catalog no. LDC2010T21 and LDC2010T01

•  No accompanying system papers

•  Challenging: mix of newswire and web texts

•  Chinese-English and Arabic-English ♦  split datasets into tuning set and test set

MEMT on NIST MT08


•  Chinese-English: ♦  Tuning set: 524 sentences, test set: 788 sentences

♦  Combining the top 5 systems out of 23 systems

♦  similar to Ma and McKeown (2012)

•  Arabic-English ♦  Tuning set: 509 sentences, test set: 803 sentences

♦  Combining the top 7 systems out of 14 systems

MEMT on NIST MT08


•  Chinese-English, gain = 3.76

MEMT on NIST MT08


•  Arabic-English, gain = 3.47

MEMT on NIST MT08


•  Varying different steps of training pipeline

•  Tune on MTC1+MTC3 datasets (LDC2002T01 and LDC2004T07), test on NIST02-NIST08 evaluation sets

•  Varying decoding algorithm: Maximum A Posteriori (MAP), Minimum Bayes Risk (MBR), Lattice Minimum Bayes Risk (LMBR)

•  Varying reordering model: word-based (wbe), phrase-based (phrase), hierarchical (hier), combined reordering (phrase-hier)

Diversifying Chinese-English SMT
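The difference between MAP and MBR decoding can be sketched on an n-best list. MAP returns the single most probable hypothesis, while MBR returns the hypothesis with the lowest expected loss against the whole list. The n-best list and its probabilities below are hypothetical, and unigram F1 is used as a simple stand-in for the sentence-level BLEU gain usually used in practice; lattice MBR is not covered.

```python
from collections import Counter


def unigram_f1(a, b):
    """Similarity between two sentences: F1 over their word multisets."""
    tokens_a, tokens_b = Counter(a.split()), Counter(b.split())
    overlap = sum((tokens_a & tokens_b).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(tokens_a.values())
    recall = overlap / sum(tokens_b.values())
    return 2 * precision * recall / (precision + recall)


def map_decode(nbest):
    """Maximum a posteriori: the hypothesis with the highest probability."""
    return max(nbest, key=lambda item: item[1])[0]


def mbr_decode(nbest):
    """Minimum Bayes risk: the hypothesis with the lowest expected loss."""
    def expected_loss(candidate):
        return sum(prob * (1.0 - unigram_f1(candidate, other)) for other, prob in nbest)
    return min((hyp for hyp, _ in nbest), key=expected_loss)


# Hypothetical n-best list with posterior probabilities.
nbest = [
    ("the hat is on a rug", 0.40),
    ("the cat sat on the mat", 0.35),
    ("the cat sat on a mat", 0.25),
]

print(map_decode(nbest))  # the hat is on a rug
print(mbr_decode(nbest))  # the cat sat on a mat
```

In this toy list the most probable hypothesis is an outlier, so MBR prefers a hypothesis that resembles the rest of the probability mass.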


•  Varying decoding algorithm, gain=-0.17

Diversifying Chinese-English SMT


•  Varying reordering model, gain=0.19

Diversifying Chinese-English SMT


•  Chinese Word Segmentation ♦  Correlates weakly with MT quality

♦  Potential source of diversity

•  SIGHAN Bakeoff evaluation campaign ♦  Academia Sinica (AS)

♦  City University of Hong Kong (CITYU)

♦  Penn Chinese Treebank (CTB)

♦  Microsoft Research (MSR)

♦  Peking University (PKU)

Exploiting multiple CWS standards


•  Chinese Word Segmentation



•  Baseline System ♦  Chinese-English phrase-based SMT systems trained with Moses

♦  Segmenting and training five different systems corresponding to five CWS standards

♦  Training bi-text: 8,290,649 sentence pairs

♦  Interpolated language model of order 5

♦  Tuning set MTC1+MTC3: 1928 sentences, 4 references each

♦  GIZA++ alignment, combined reordering scheme, MBR decoding



•  System combination experiments ♦  Same tuning set MTC1+MTC3

♦  ZMERT and PRO tuning

♦  Test sets: NIST 2002 to 2006, 2008

♦  Evaluation: mteval-v11b, case-insensitive



•  Results – component systems

Exploiting multiple CWS standards


•  Results – combining 5 systems ♦  Avg gain: 0.52 (ZMERT) and 0.82 (PRO)



•  Results – combining the top 3 systems ♦  Avg gain: 0.35 (ZMERT) and 0.64 (PRO)

♦  Lower than when combining 5 systems



•  Discussion ♦  CWS is a good source to generate diverse SMT systems

♦  Benefits: Ø  Reducing segmentation errors Ø  Reducing out-of-vocabulary words Ø  Providing diverse translations



•  Component system outputs



•  Combined system output



Conclusion and future research

•  Conclusion ♦  System combination does benefit MT

♦  Exceptions Ø  Combining very few systems Ø  Some component systems with exceptionally bad performance Ø  Combining very similar systems (non-complementary)

♦  Achieved the goal of improving a Chinese-English SMT system


Conclusion and future research

•  Future research ♦  Evaluating different combination algorithms

Ø  Collaborative decoding (Li et al., 2009)

♦  Trait-based approach as a way to generate diverse inputs (Devlin and Matsoukas, 2012)


Summary

•  Empirical experiments ♦  MEMT as system combination module

♦  WMT and NIST evaluation sets

•  System combination does benefit MT quality ♦  comparable, complementary input systems

•  Exploiting multiple CWS as a way to diversify SMT systems ♦  improve a strong Chinese-English phrase-based system

♦  average gain 0.5-0.8 BLEU on NIST02-06 and NIST08


Thank You

