Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

25th International Conference, GSCL 2013. Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. September 25th-27th, 2013, Darmstadt, Germany. Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory, Department of Computer and Information Science, University of Macau


Page 1: Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

25th International Conference, GSCL 2013

Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu

September 25th -27th, 2013, Darmstadt, Germany

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory

Department of Computer and Information Science

University of Macau

Page 2:

Background of language Treebank

Motivation

Designed phrase tagset mapping

Application in MT evaluation

1. Manual evaluations

2. Traditional automatic MT evaluation methods

3. Designed unsupervised MT evaluation

4. Evaluating the evaluation method

5. Experiments

6. Open source code

Discussion

Further information

Page 3:

• To promote the development of syntactic analysis

• Many language treebanks have been developed

– English Penn Treebank (Marcus et al., 1993; Mitchell et al., 1994)

– German Negra Treebank (Skut et al., 1997)

– French Treebank (Abeillé et al., 2003)

– Chinese Sinica Treebank (Chen et al., 2003)

– Etc.

Page 4:

• Problems

– Different treebanks use their own syntactic tagsets

– The number of tags ranges from tens (e.g. English Penn Treebank) to hundreds (e.g. Chinese Sinica Treebank)

– Inconvenient when undertaking multilingual or cross-lingual research

Page 5:

• To bridge the gap between these treebanks and facilitate future research

– E.g. the unsupervised induction of syntactic structure

• Petrov et al. (2012) develop a universal POS tagset

• How about the phrase level tags?

• The disagreement problem among phrase-level tags remains unsolved, so let's try to solve it

Page 6:

• Tentative design of phrase tagset mapping

– On English Penn Treebank I, II & French Treebank

• 9 universal phrasal categories covering

– 14 phrase tags in English Penn Treebank I

– 26 phrase tags in English Penn Treebank II

– 14 phrase tags in French Treebank

Page 7:

Table 1: phrase tagset mapping for French and English treebanks

Page 8:

• Universal phrasal categories: NP (noun phrase), VP (verb phrase), AJP (adjective phrase), AVP (adverbial phrase), PP (prepositional phrase), S (sentence/sub-sentence), CONJP (conjunction phrase), COP (coordinated phrase), X (other phrases or unknown)

• NP covering

– French tags: NP

– English tags: NP, NAC (the scope of certain prenominal modifiers within an NP), NX (within certain complex NPs to mark the head of NP), WHNP (wh-noun phrase), QP (quantifier phrase)

Page 9:

• VP covering

– French tags: VN (verbal nucleus), VP (infinitives and nonfinite clauses)

– English tags: VP (verb phrase)

• AJP covering

– French tags: AP (adjectival phrase)

– English tags: ADJP (adjective phrase), WHADJP (wh-adjective phrase)

Page 10:

• AVP covering

– French tags: AdP (adverbial phrases)

– English tags: ADVP (adverb phrase), WHADVP (wh-adverb phrase), PRT (particle)

• PP covering

– French tags: PP

– English tags: PP, WHPP (wh-prepositional phrase)

Page 11:

• S covering

– French tags: SENT (sentence), S (finite clause)

– English tags: S (simple declarative clause), SBAR (clause introduced by a subordinating conjunction), SBARQ (direct question introduced by a wh-phrase), SINV (declarative sentence with subject-aux inversion), SQ (sub-constituent of SBARQ), PRN (parenthetical), FRAG (fragment), RRC (reduced relative clause).

• CONJP covering

– French tags: N/A

– English tags: CONJP

Page 12:

• COP covering

– French tags: COORD (coordinated phrase)

– English tags: UCP (coordinated phrases belonging to different categories)

• X covering

– French tags: unknown

– English tags: X (unknown or uncertain), INTJ (interjection), LST (list marker)
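The covering lists above can be collected into a simple lookup table. The sketch below is illustrative Python (not the authors' released tool); the tag inventories follow the slides:

```python
# Sketch of the phrase tagset mapping described above (9 universal categories).
# Tag lists follow the slides; the released tool may differ in details.
UNIVERSAL_PHRASE_MAP = {
    # English Penn Treebank tags
    "NP": "NP", "NAC": "NP", "NX": "NP", "WHNP": "NP", "QP": "NP",
    "VP": "VP",
    "ADJP": "AJP", "WHADJP": "AJP",
    "ADVP": "AVP", "WHADVP": "AVP", "PRT": "AVP",
    "PP": "PP", "WHPP": "PP",
    "S": "S", "SBAR": "S", "SBARQ": "S", "SINV": "S", "SQ": "S",
    "PRN": "S", "FRAG": "S", "RRC": "S",
    "CONJP": "CONJP",
    "UCP": "COP",
    "X": "X", "INTJ": "X", "LST": "X",
}

# French Treebank tags
FRENCH_PHRASE_MAP = {
    "NP": "NP", "VN": "VP", "VP": "VP", "AP": "AJP", "AdP": "AVP",
    "PP": "PP", "SENT": "S", "S": "S", "COORD": "COP",
}

def to_universal(tag, lang="en"):
    """Map a treebank phrase tag to its universal category (X if unknown)."""
    table = FRENCH_PHRASE_MAP if lang == "fr" else UNIVERSAL_PHRASE_MAP
    return table.get(tag, "X")
```

Unknown tags fall back to X, matching the role of the X category on the slides.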

Page 13:

4. Application in Machine Translation Evaluation

Page 14:

• Rapid development of Machine Translation

– MT began as early as in the 1950s (Weaver, 1955)

– Big progress since the 1990s due to the development of computers (storage capacity and computational power) and the enlarged bilingual corpora (Marino et al. 2006)

• Difficulties of MT evaluation

– language variability results in no single correct translation

– the natural languages are highly ambiguous and different languages do not always express the same content in the same way (Arnold, 2003)

Page 15:

• Traditional manual evaluation criteria:

– intelligibility (measuring how understandable the sentence is)

– fidelity (measuring how much information the translated sentence retains as compared to the original) by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966)

– adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility) by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994)

Page 16:

• Problems of manual evaluations:

– Time-consuming

– Expensive

– Unrepeatable

– Low agreement (Callison-Burch, et al., 2011)

Page 17:

• Measuring the similarity of automatic translation and reference translation

– Automatic translation (or hypothesis translation, target translation): by automatic MT system

– Reference translation: by professional translators

– Source language and source document: not used

• Traditional automatic evaluation:

– BLEU: n-gram precisions (Papineni et al., 2002)

– TER: edit distances (Snover et al., 2006)

– METEOR: precision and recall (Banerjee and Lavie, 2005)

Page 18:

• Problems in supervised MT evaluation

– Reference translations are expensive

– Reference translations are not available in some cases

• Could we get rid of the reference translation?

– Unsupervised MT evaluation method

– Extract information from source and target language

– How to use the designed universal phrase tagset?

Page 19:

• Assume that the translated sentence should have a set of phrase categories similar to that of the source sentence.

– This design is inspired by the synonymy relation between the source and target sentences.

• Two sentences with a similar set of phrases may still talk about different things.

– However, this evaluation approach is not designed for general circumstances

– It assumes that the target sentences are indeed translations of the source document

Page 20:

• First, we parse the source and target sentences respectively

• Second, we extract the phrase sets from the source and target sentences

• Third, we convert the phrases into the designed universal phrase categories

• Last, we measure the similarity of the source and target sides over the universal phrase tag sequences
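The four steps above can be sketched as follows. The parsers themselves (steps 1-2) are assumed external, the tiny mapping tables are illustrative subsets of the full tagset mapping, and the unigram-overlap score at the end is only a placeholder for the HPPR metric introduced on the following slides:

```python
# Sketch of the four evaluation steps, assuming phrase-tag sequences
# have already been extracted from parser output (parsers not included).
EN_MAP = {"NP": "NP", "VP": "VP", "ADVP": "AVP", "PP": "PP"}
FR_MAP = {"NP": "NP", "VN": "VP", "AdP": "AVP", "PP": "PP"}

def to_universal_seq(tags, table):
    """Step 3: convert extracted phrase tags to universal categories."""
    return [table.get(t, "X") for t in tags]

# Steps 1-2 (parsing + phrase extraction) assumed done; example sequences:
src = to_universal_seq(["NP", "VN", "AdP", "PP"], FR_MAP)
hyp = to_universal_seq(["NP", "VP", "PP"], EN_MAP)

# Step 4: a naive similarity placeholder (unigram set overlap), standing
# in for the full HPPR metric described later.
overlap = len(set(src) & set(hyp)) / len(set(src) | set(hyp))
```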

Page 21:

Figure 1: the parsed French and English sentence

Page 22:

Figure 2: convert the extracted phrase into universal phrase tags

The extracted phrase tags are taken from the level just above the POS tags, working bottom-up

Page 23:

• What similarity metric did we employ?

• Designed similarity metric: HPPR

– N1 gram position order difference penalty

– Weighted N2 gram precision

– Weighted N3 gram recall

– Weighted geometric mean in n-gram precision & recall

– Weighted harmonic mean to combine sub-factors

– The parameters are tunable according to different language pairs

Page 24:

• $\mathrm{HPPR} = Har(w_{Ps}\,N_1PsDif,\; w_{Pr}\,N_2Pre,\; w_{Rc}\,N_3Rec)$

• $\mathrm{HPPR} = \dfrac{w_{Ps} + w_{Pr} + w_{Rc}}{\frac{w_{Ps}}{N_1PsDif} + \frac{w_{Pr}}{N_2Pre} + \frac{w_{Rc}}{N_3Rec}}$

• $N_1PsDif$, $N_2Pre$, and $N_3Rec$ are the corpus-level scores of the sub-factors position difference penalty, precision, and recall.
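The weighted harmonic mean combination can be sketched directly from the formula. The default weights below are illustrative placeholders, not the tuned values from the paper:

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i)."""
    assert len(values) == len(weights)
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

def hppr(ps_dif, pre, rec, w_ps=1.0, w_pr=1.0, w_rc=1.0):
    """HPPR as the weighted harmonic mean of the three sub-factor scores.
    The weights are tunable per language pair; the defaults here are
    illustrative, not the values tuned on WMT11."""
    return weighted_harmonic_mean([ps_dif, pre, rec], [w_ps, w_pr, w_rc])
```

The harmonic mean pulls the combined score toward the weakest sub-factor, so a translation must do reasonably well on all three to score highly.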

Page 25:

• The sentence-level $N_1PsDif$ score:

• $N_1PsDif = \exp(-N_1PD)$

• $N_1PD = \frac{1}{Length_{hyp}} \sum_i |PD_i|$

• $PD_i = |PsN_{hyp} - MatchPsN_{src}|$

• $PsN_{hyp}$ and $MatchPsN_{src}$ are the position numbers of the matching tag in the hypothesis and source sentence respectively. When there is no match for a tag: $PD_i = |PsN_{hyp} - 0|$
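A sentence-level sketch of this penalty; the nearest-match alignment used here is a simplification assumed for illustration, standing in for the paper's N1-gram alignment algorithm (Figure 3):

```python
import math

def n1_ps_dif(hyp_tags, src_tags):
    """Sentence-level position difference penalty, exp(-N1PD).
    Each hypothesis tag is aligned to the nearest unmatched occurrence
    of the same tag in the source (a simplification of the paper's
    alignment algorithm); unmatched tags use match position 0."""
    used = set()
    total = 0.0
    for i, tag in enumerate(hyp_tags, start=1):  # 1-based positions
        best = None
        for j, s in enumerate(src_tags, start=1):
            if s == tag and j not in used:
                if best is None or abs(j - i) < abs(best - i):
                    best = j
        if best is not None:
            used.add(best)
        total += abs(i - (best or 0))  # |PsN_hyp - MatchPsN_src|
    n1_pd = total / len(hyp_tags)
    return math.exp(-n1_pd)
```

Identical tag sequences incur zero penalty and score exp(0) = 1; reorderings and unmatched tags push the score toward 0.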

Page 26:

Figure 3: N1 gram tag alignment algorithm

Page 27:

Figure 4: 𝑁1𝑃𝐷 calculation example

Page 28:

• Corpus-level weighted n-gram precision & recall:

• $N_2Pre = \exp\left(\sum_{n=1}^{N_2} w_n \log P_n\right)$

• $N_3Rec = \exp\left(\sum_{n=1}^{N_3} w_n \log R_n\right)$

• $P_n = \frac{\#\,\text{matched } n\text{-gram chunks}}{\#\,n\text{-gram chunks of hypothesis corpus}}$

• $R_n = \frac{\#\,\text{matched } n\text{-gram chunks}}{\#\,n\text{-gram chunks of source corpus}}$
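A sentence-level sketch of these scores (the paper computes them at corpus level); chunk matching is approximated here with clipped multiset counts, an assumption rather than the paper's exact matching procedure:

```python
import math
from collections import Counter

def ngram_chunks(tags, n):
    """All contiguous n-gram chunks of a tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def weighted_ngram_score(hyp, src, max_n, weights, denom="hyp"):
    """exp(sum_n w_n * log(s_n)), where s_n is clipped n-gram chunk
    precision (denom='hyp') or recall (denom='src')."""
    total = 0.0
    for n, w in zip(range(1, max_n + 1), weights):
        hyp_c = Counter(ngram_chunks(hyp, n))
        src_c = Counter(ngram_chunks(src, n))
        matched = sum((hyp_c & src_c).values())  # clipped match counts
        denom_count = sum(hyp_c.values()) if denom == "hyp" else sum(src_c.values())
        s_n = matched / denom_count if denom_count else 0.0
        if s_n == 0.0:
            return 0.0  # log(0): the score collapses, as with BLEU
        total += w * math.log(s_n)
    return math.exp(total)
```

The early return on a zero n-gram score mirrors why the paper caps N2 and N3 at 3: higher-order chunk matches are often empty and would zero out the whole product.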

Page 29:

Figure 5: bigram chunk matching example

Page 30:

• How reliable is the automatic metric?

• Evaluation criteria for evaluation metrics:

– Human judgments are currently the gold standard to approach

– Correlation with human judgments (Callison-Burch, et al., 2011, 2012)

• Spearman rank correlation coefficient $r_s$:

– $r_s(X, Y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

– for two rank sequences $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_n\}$, where $d_i = x_i - y_i$
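A minimal sketch of this coefficient, assuming no tied scores (system scores are first converted to ranks):

```python
def rank(scores):
    """1-based ranks of a score list, best score = rank 1 (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x_scores, y_scores):
    """Spearman rank correlation via the rank-difference formula above."""
    n = len(x_scores)
    x, y = rank(x_scores), rank(y_scores)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

In the WMT setting, one sequence would be the metric's ranking of the MT systems and the other the human ranking; identical rankings give +1, fully reversed rankings give -1.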

Page 31:

• Corpus from WMT

– Workshop on Statistical Machine Translation

– SIGMT, ACL's special interest group on machine translation

• Training data (WMT11), used to tune the parameters

– 3,003 sentences for each document

– 18 automatic French-to-English MT systems

• Testing data (WMT12)

– 3,003 sentences for each document

– 15 automatic French-to-English MT systems

Page 32:

• Training: tuning the parameters

– N1, N2 and N3 are tuned to 2, 3 and 3, because 4-gram chunk matching usually yields a score of 0.

– The tuned values of the factor weights are shown in Table 2

Table 2: tuned parameter values

Page 33:

• Comparisons with:

– BLEU: measures the closeness of the hypothesis and reference translations via n-gram precision

– TER: measures the edit distance from the hypothesis to the reference translations

Page 34:

Table 3: training (development) scores on WMT11 corpus

Table 4: testing scores on WMT12 corpus

Page 35:

Table 5: correlation score interpretation (Cohen, 1988)

The experiment results on the development and testing corpora show that HPPR, without using reference translations, has yielded promising correlation scores (0.63 and 0.59 respectively).

There is still potential to improve the performance of all three metrics, even though correlation scores higher than 0.5 are already considered strong, as shown in Table 5.

Page 36:

• Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation. Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. GSCL 2013, Darmstadt, Germany. LNCS Vol. 8105, pp. 119-131. Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.

• Open source tool for phrase tagset mapping and HPPR similarity measuring algorithms: https://github.com/aaronlifenghan/aaron-project-hppr

Page 37:

• To facilitate future research in multilingual and cross-lingual studies, this paper designs a phrase tagset mapping between the French Treebank and the English Penn Treebank using 9 universal phrase categories.

• One of the potential applications of the designed universal phrase tagset is shown in the unsupervised MT evaluation task in the experiment section.

Page 38:

• There are still some limitations in this work to be addressed in the future.

– The designed universal phrase categories may not be able to cover all the phrase tags of other language treebanks, so this tagset could be expanded when necessary.

– The designed HPPR formula contains the n-gram factors of position difference, precision and recall, which may not be sufficient or suitable for some other language pairs, so different measuring factors should be added or switched when facing new tasks.

Page 39:

• In essence, the designed models are closely related to similarity measurement; here we have employed them in MT evaluation. These works may be further extended to other areas:

– information retrieval

– question answering

– search

– text analysis

– etc.

Page 40:

• Ongoing and further works:

– The combination of translation and evaluation, tuning the translation model using evaluation metrics

– Evaluation models from the perspective of semantics

– The further explorations of unsupervised evaluation models, extracting other features from source and target languages

• Aaron's open-source tools: https://github.com/aaronlifenghan

• Aaron's homepage: http://www.linkedin.com/in/aaronhan

Page 41:

GSCL 2013, Darmstadt, Germany

Aaron L.-F. Han

email: hanlifengaaron AT gmail DOT com

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory

Department of Computer and Information Science

University of Macau