using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf ·...

56
Using unsupervised corpus-based methods to build rule-based machine translation systems Felipe S´ anchez Mart´ ınez [email protected] Ph.D. thesis supervised by Mikel L. Forcada Juan Antonio P´ erez Ortiz 30th June 2008 Felipe S´ anchez Mart´ ınez (Univ. d’Alacant) 30th June 2008 1 / 45

Upload: others

Post on 08-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Using unsupervised corpus-based methods to buildrule-based machine translation systems

Felipe Sanchez Martınez

[email protected]

Ph.D. thesissupervised by

Mikel L. Forcada

Juan Antonio Perez Ortiz

30th June 2008

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 1 / 45

Page 2: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 2 / 45

Page 3: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Motivation & goal

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 2 / 45

Page 4: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Motivation & goal

Motivation

Experience in the development of shallow-transfer MT systemsinterNOSTRUM Spanish↔CatalanTraductor Universia Spanish↔PortugueseApertium Several language pairs available

Huge human effort to code all the linguistic resources

Resources usually needed by shallow-transfer MT systemsMonolingual dictionariesPart-of speech (PoS) taggersBilingual dictionariesStructural transfer rules

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 3 / 45

Page 5: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Motivation & goal

Goal

Goal:To reduce the human effortUsing corpus-based methodsIn an unsupervised way

Focus on:the PoS taggers used in the analysis phasethe set of shallow structural transfer rules used in translation

⇒ Benefiting from the rest of resources⇐

lexicaltransfer

lSLtext

→ morph.analyzer → PoS

tagger →struct.transfer

→ morph.generator → post-

generator → TLtext

http://apertium.org

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 4 / 45

Page 6: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 4 / 45

Page 7: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation Part-of-speech tagging

Part-of-speech tagging /1

Problem: Selecting the correct PoS tag for those words with more thanone (ambiguous words)⇒ Hidden Markov models (HMM) are one of the standard statisticalsolutions

Each HMM state corresponds to a different PoS tagEach input word is replaced by its corresponding ambiguity class

verb

noun

. . .{verb, noun}

{verb} {verb, noun, adj}

. . .

{noun}{noun, verb}{noun, prep}

. . .

. . .

0.1

0.2 0.02

0.4

0.20.01

0.08

0.12{verb}

0

{noun}0

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 5 / 45

Page 8: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation Part-of-speech tagging

Part-of-speech tagging /2

In MT PoS tagging becomes crucial:

Translation may differ from one PoS tag to another

English PoS Spanish

book noun libroverb reservar

Structural transformations may be applied (or not) for some PoStag

English PoS Spanish reordering

the green house green-adj la casa verde ←rulegreen-noun * el cesped casa applied

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 6 / 45

Page 9: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation Part-of-speech tagging

General-purpose HMM training methods

General-purpose HMM training methods:

Supervised (hand-tagged corpora available):Maximum-likelihood estimate (MLE)

Unsupervised (only untagged corpora available):Baum-Welch (expectation-maximization, EM)

Main features:

Only use information from the language being taggedIndependent of the natural language processing applicationTo get high tagging accuracy supervised resources (hand-taggedcorpora) must be built

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 7 / 45

Page 10: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

MT-oriented HMM training method

PoS tagging is just an intermediate task for the whole translationprocedureGood translation performance, rather than PoS tagging accuracy,becomes the real objective

Idea: As the goal is to get good translations into TL,let a TL model decide whether a given “construction”in the TL is good or not

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 8 / 45

Page 11: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

MT-oriented HMM training method: overview /1

lexicaltransfer

lSLtext

→ morph.analyzer → PoS

tagger →struct.transfer

→ morph.generator → post-

generator → TLtext

Unsupervised training

Resources required:an SL untagged text automatically obtained from an SL raw corpus

the other modules of the MT system following the PoS tagger

a TL model trained from a raw TL corpus

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 9 / 45

Page 12: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

MT-oriented HMM training method: overview /2

Procedure:1 SL corpus is segmented2 All possible disambiguations of each segment are translated into TL3 A TL model is used to score each translation4 HMM parameters are computed according to the likelihood of the

corresponding translations into TL

paths translations MTL scores counts

s↗ g1↘

MT↗ τ(g1, s)↘

MTTL

↗ PTL(τ(g1, s)) 99K n(·)g2 τ(g2, s) PTL(τ(g2, s)) 99K n(·)

↘... ↗ ↘

... ↗ ↘...

......

gm τ(gm, s) PTL(τ(gm, s)) 99K n(·)

⇒ The resulting tagger is tuned to the translation fluency⇐

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 10 / 45

Page 13: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Example: English→Spanish

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Possible translations (Spanish) according to each disambiguationand their normalized likelihoods according to a TL model:

• El-prn mece-verb la-art mesa-noun 0.75• El-prn mece-verb la-art presenta-verb 0.15• El-prn rocas-noun la-art mesa-noun 0.06• El-prn rocas-noun la-art presenta-verb + 0.04

1.00

The HMM parameters involved in these 4 disambiguations areupdated according to their likelihoods in the TL

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 11 / 45

Page 14: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Example: English→Spanish

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Possible translations (Spanish) according to each disambiguationand their normalized likelihoods according to a TL model:

• El-prn mece-verb la-art mesa-noun 0.75• El-prn mece-verb la-art presenta-verb 0.15• El-prn rocas-noun la-art mesa-noun 0.06• El-prn rocas-noun la-art presenta-verb + 0.04

1.00

The HMM parameters involved in these 4 disambiguations areupdated according to their likelihoods in the TL

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 11 / 45

Page 15: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Example: English→Spanish

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Possible translations (Spanish) according to each disambiguationand their normalized likelihoods according to a TL model:

• El-prn mece-verb la-art mesa-noun 0.75• El-prn mece-verb la-art presenta-verb 0.15• El-prn rocas-noun la-art mesa-noun 0.06• El-prn rocas-noun la-art presenta-verb + 0.04

1.00

The HMM parameters involved in these 4 disambiguations areupdated according to their likelihoods in the TL

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 11 / 45

Page 16: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Experiments /1

Task: training PoS tagger for Spanish, French and Occitan to beused in MT into Catalan

TL model: trigram language model trained from a Catalan corpuswith ≈ 2 · 106 words

Experiments conducted with5 disjoint corpora with 0.5 · 106 words for Spanish5 disjoint corpora with 0.5 · 106 words for FrenchOnly one corpus with 0.3 · 106 words for Occitan

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 12 / 45

Page 17: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Experiments /2

Reference results:Baum-Welch expectation maximization on 10 · 106 words corpora

Supervised: MLE from a hand-tagged corpus ≈ 21.5 · 103 words(only for Spanish)

TLM-best: when a TL model is used at translation time to selectalways the most likely translation

approximate indication of the best results the MT-oriented methodcould achieve

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 13 / 45

Page 18: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Some results: Spanish→Catalan /1

Mean and std. dev. of the translation performance, WER (% of words)

6.5

6.75

7.0

8.5

0 0.1 0.2 0.3 0.4 0.5

Wor

d er

ror

rate

(W

ER

, % o

f wor

ds)

SL (Spanish) words x 106

Baum−Welch

SupervisedTLM−best

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 14 / 45

Page 19: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Some results: Spanish→Catalan /2

Mean and std. dev. of the PoS tagging error rate (% of words)

5

6

7

8

9

10

0 0.1 0.2 0.3 0.4 0.5

PoS

tagg

ing

erro

r ra

te (

% o

f wor

ds)

SL (Spanish) words x 106

Baum−Welch

Supervised

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 15 / 45

Page 20: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech taggers for machine translation MT-oriented hidden Markov model training

Some results: Spanish→Catalan /3

Why are the translation performances for the supervised and theMT-oriented method comparable, but no the PoS tagging error rates?

TL information does not discriminate among the SL analyses of asegment leading to the same translation

French PoS Spanish

la ville la-art la ciudadla-prn

Free-ride: phenomenon by which choosing the incorrectinterpretation for an ambiguous word does not result in atranslation error

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 16 / 45

Page 21: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 16 / 45

Page 22: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Disadvantages of the MT-oriented method

Disadvantages of the MT-oriented method

The number of possible disambiguations to translate growsexponentially with segment length

Translation is the most time-consuming task

Goal: To overcome this problem

How: Pruning unlikely disambiguation paths by using a prioriknowledge

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 17 / 45

Page 23: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Pruning method /1

Based on an initial model of SL tags

Assumption: Any reasonable model of SL tags maybe useful to choose a subset of possible disambigua-tion paths so that the correct one is in that subset

The model used for pruning can be updated dynamically duringtraining

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 18 / 45

Page 24: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Pruning method /2

1 The a priori likelihood of each possible disambiguation path of SLsegment s is calculated using the pruning model

2 The set of disambiguation paths to take into account isdetermined by using a mass probability threshold ρ

Only the minimum number of paths to reach the mass probabilitythreshold ρ are taken into account

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 19 / 45

Page 25: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Example (English→Spanish)

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Normalized a priori likelihoods:

g1 = (prn, verb, art, noun) 0.69g2 = (prn, verb, art, verb) 0.14g3 = (prn, noun, art, noun) 0.10g4 = (prn, noun, art, verb) + 0.07

1.00

With ρ = 0.8, g3 and g4 are discarded because 0.69 + 0.14 ≥ 0.8

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 20 / 45

Page 26: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Example (English→Spanish)

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Normalized a priori likelihoods:

g1 = (prn, verb, art, noun) 0.69g2 = (prn, verb, art, verb) 0.14g3 = (prn, noun, art, noun) 0.10g4 = (prn, noun, art, verb) + 0.07

1.00

With ρ = 0.8, g3 and g4 are discarded because 0.69 + 0.14 ≥ 0.8

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 20 / 45

Page 27: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Example (English→Spanish)

SL segment (English):He-prn rocks-noun|verb the-art table-noun|verb

Normalized a priori likelihoods:

g1 = (prn, verb, art, noun) 0.69g2 = (prn, verb, art, verb) 0.14g3 = (prn, noun, art, noun) 0.10g4 = (prn, noun, art, verb) + 0.07

1.00

With ρ = 0.8, g3 and g4 are discarded because 0.69 + 0.14 ≥ 0.8

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 20 / 45

Page 28: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Some results: Spanish→Catalan

Mean and std. dev. of the translation performance, WER (% of words)

6.6

6.7

6.8

6.9

7.0

7.1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Wor

d er

ror

rate

(W

ER

, % o

f wor

ds)

Probability mass (ρ)

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 21 / 45

Page 29: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Pruning of disambiguation paths Pruning method

Some results

Ratio of translated words

0

10

20

30

40

50

60

70

80

90

100

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Rat

io o

f tra

nsla

ted

wor

ds (

%)

Probability mass (ρ)

Spanish−CatalanFrench−Catalan

Occitan−Catalan

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 22 / 45

Page 30: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 22 / 45

Page 31: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering Best HMM topology for taggers used in MT

Best HMM topology for taggers used in MT

Large tagsets (set of PoS tags) for richly-inflected languagesfine PoS tags convey lot of informatione.g. verb.pret.3rd.pl, noun.m.sg

A reduced tagset manually defined following linguistic guidelinesis usually used

Maps fine tags into coarse onesShould allow for better parameter estimation

Goal: To automatically determine the set of states to be usedAvoid the human intervention in defining the tagset

⇒ Model merging approach (Stolcke and Omohundro, 1994)cannot be applied using untagged corpora

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 23 / 45

Page 32: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering Bottom-up agglomerative clustering

Bottom-up agglomerative clustering

1 Place each object in its own cluster (singleton)

2 Iteratively compare all pairs of clusters and choose the two closestclusters according to a distance measure

If the distance between the selected clusters is below a certainthreshold, merge both clusters

Otherwise, stop

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 24 / 45

Page 33: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering Bottom-up agglomerative clustering

Clustering of PoS tag

First model trained using the large tagset via the MT-orientedmethod

Distance between cluster based on the state-to-state transitionprobabilities

An additional constraint ensures that it is possible to restore theinformation about the fine tag from the coarse one

1 2 3 4 5 6 7

1 7

3456

3456

23456

HMM

HMM ’

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 25 / 45

Page 34: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering Bottom-up agglomerative clustering

Some results: Spanish→Catalan

Mean and std. dev. of the translation performance, WER

6.6

6.7

6.8

6.9

0 0.5 1 1.5 2 2.5 0

500

1000

1500

2000

Wor

d er

ror

rate

(W

ER

, % o

f wor

ds)

Num

ber

of c

oars

e ta

gs

Threshold

WER

# tags

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 26 / 45

Page 35: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Part-of-speech tag clustering Bottom-up agglomerative clustering

Some results: French→Catalan

Mean and std. dev. of the translation performance, WER

24.6

24.8

25

25.2

25.4

0 0.5 1 1.5 2 2.5 0

100

200

300

400

Wor

d er

ror

rate

(W

ER

, % o

f wor

ds)

Num

ber

of c

oars

e ta

gs

Threshold

WER

# tags

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 27 / 45

Page 36: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 27 / 45

Page 37: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules

Automatic inference of transfer rules

Goal:To automatically learn those transformations that produce correcttranslations in the TL

How:Adapting the alignment templates (ATs) already used in statisticalMT to the shallow-transfer approach

AT z = (Sn,Tm,G)

Sn: sequence of n SL word classesTm: sequence of m TL word classesG: alignment information

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 28 / 45

Page 38: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

AT for shallow-transfer MT: overview /1

Resources required:A SL–TL parallel corpusThe morphological analyzers and PoS taggers of the MT systemThe bilingual dictionary of the MT system

Procedure:1 Analyze both sides of the training corpus2 Compute word alignments3 Extract bilingual phrase pairs and derive ATs from them4 Generate shallow-transfer rules

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 29 / 45

Page 39: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

AT for shallow-transfer MT: overview /2

Word class: part-of-speech (including all the inflection information)

Exception: lexicalized words are placed in single-word classes

Lexicalized categories: categories that are known to be involved inlexical changes, such as prepositions

the method can learn not only syntactic changes

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 30 / 45

Page 40: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

AT for shallow-transfer MT: overview /3

ATs are extended with a set R of restrictions over the TL inflectioninformation of non-lexicalized words

AT z = (Sn,Tm,G,R)

Restrictions are derived from the bilingual dictionaryBilingual entry that does not change inflection information<e><p>

<l>castigo<s n="noun"/></l><r>castig<s n="noun"/></r>

</p></e> R: w=noun.*Bilingual entry that does change inflection information<e><p>

<l>calle<s n="noun"/><s n="f"/></l><r>carrer<s n="noun"/><s n="m"/></r>

</p></e> R: w=noun.m.*

The bilingual dictionary is also used to discard phrase pairs thatcannot be reproduced by the MT system

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 31 / 45

Page 41: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

AT for shallow-transfer MT: overview /3

ATs are extended with a set R of restrictions over the TL inflectioninformation of non-lexicalized words

AT z = (Sn,Tm,G,R)

Restrictions are derived from the bilingual dictionaryBilingual entry that does not change inflection information<e><p>

<l>castigo<s n="noun"/></l><r>castig<s n="noun"/></r>

</p></e> R: w=noun.*Bilingual entry that does change inflection information<e><p>

<l>calle<s n="noun"/><s n="f"/></l><r>carrer<s n="noun"/><s n="m"/></r>

</p></e> R: w=noun.m.*

The bilingual dictionary is also used to discard phrase pairs thatcannot be reproduced by the MT system

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 31 / 45

Page 42: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

Alignment template example /1

Bilingual phrase: Alignment template:

Alacant

a

viure

van

Alicanteen

vivieron

a

anar

en

(verb

.pret.

3rd.p

l)

(noun.loc)

-(pr)

(verb.inf)

-(vbaux.pres.3rd.pl)

-(pr)

(nou

n.loc

)

Spanish analysis: vivieron en Alicante1 −→vivir-(verb.pret.3rd.pl) en-(pr)Alicante-(noun.loc)

Catalan analysis: van viure a Alacant −→anar-(vbaux.pres.3rd.pl) viure-(verb.inf)a-(pr) Alacant-(noun.loc)

Restrictions: w2 =verb.*, w4=noun.*

1Translated into English as They lived in AlicanteFelipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 32 / 45

Page 43: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Alignment templates for shallow-transfer machine translation

Alignment template example /2

Bilingual phrase: Alignment template:

el

carrer

estret

la

calle

estrecha

el

el

(noun.m.sg)

(nou

n.f.sg

)-(art.m.sg)

-(art.

f.sg)

(adj.m.sg)

(adj.

f.sg)

Spanish analysis: la calle estrecha2 −→ el-(art.f.sg)calle-(noun.f.sg) estrecho-(adj.f.sg)

Catalan analysis: el carrer estret −→ el-(art.m.sg)carrer-(noun.m.sg) estret-(adj.m.sg)

Restrictions: w2 =noun.m.*, w3=adj.*

2Translated into English as The narrow streetFelipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 33 / 45

Page 44: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

Generation of Apertium transfer rules

Procedure:

1 Discard useless AT2 Select the AT to use according to their frequency3 For all ATs with the same SL part a rule is generated

Rule generation:

The rule matches the SL part all ATs have in common

In decreasing order of AT frequency counts code is generated totest the restrictions R over the TL inflection informationif they hold, apply the AT and stop rule execution

code that translates word-for-word is addedit is executed only if none of the AT were applicable

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 34 / 45

Page 45: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

AT applicability test /1

Restrictions R are tested by looking at the bilingual dictionary

Example:R: w2 =noun.m.*, w3=adj.*

el

el

(noun.m.sg)

(nou

n.f.sg

)-(art.m.sg)

-(art.

f.sg)

(adj.m.sg)

(adj.

f.sg)

Input string (Spanish): la senal roja −→el-(art.f.sg) senal-(noun.f.sg)rojo-(adj.f.sg)Translation of non-lexicalized words:

senal-(noun.f.sg)→senyal-(noun.m.sg)rojo-(adj.f.sg)→vermell-(adj.f.sg)

Restriction holds, AT can be applied

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 35 / 45

Page 46: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

AT applicability test /2

Restrictions R are tested by looking at the bilingual dictionary

Example:R: w2 =noun.m.*, w3=adj.*

el

el

(noun.m.sg)

(nou

n.f.sg

)-(art.m.sg)

-(art.

f.sg)

(adj.m.sg)

(adj.

f.sg)

Input string (Spanish): la silla blanca −→el-(art.f.sg) silla-(noun.f.sg)blanco-(adj.f.sg)Translation of non-lexicalized words:

silla-(noun.f.sg)→cadira-(noun.f.sg)blanco-(adj.f.sg)→blanc-(adj.f.sg)

Restriction does not hold, AT cannot be applied

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 36 / 45

Page 47: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

Alignment templates application, an example

Spanish (input): permanecieron en Alemania3 −→permanecer-(verb.pret.3rd.pl) en-(pr)Alemania-(noun.loc)

Catalan (output): anar-(vbaux.pres.3rd.pl)romandre-(verb.inf) a-(pr)Alemanya-(noun.loc) −→van romandre a Alemanya

Word-for-word translation:romangueren *en Alemanya

R: w1 =verb.*, w3=noun.*a

anar

en

(verb

.pret.

3rd.p

l)

(noun.loc)

-(pr)

(verb.inf)

-(vbaux.pres.3rd.pl)

-(pr)

(nou

n.loc

)

3Translated into English as They remained in GermanyFelipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 37 / 45

Page 48: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

Experiments

Task: Inference of shallow-transfer rules for Spanish↔Catalan,Spanish↔Galician and Spanish→Portuguese

≈ 8 lexicalized categories

Two different training corpora:One with 2 · 106 wordsAnother with only 0.5 · 106 words

Two different evaluation corpora:post-edit reference translation is a post-edited version of the

MT performed using hand-coded transfer rules

parallel text to translate and reference translation comes froma parallel corpus analogous to the one used fortraining

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 38 / 45

Page 49: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

Some results

Spanish→Catalan, WER ± 95% confidence interval

Training Test Word-for-word AT transfer Hand

2 · 106 post-edit 12.6± 0.9 8.7± 0.7 6.7± 0.7parallel 26.4± 1.2 20.3± 1.1 20.7± 1.0

0.5 · 106 post-edit 12.6± 0.9 9.9± 0.7 6.7± 0.7parallel 26.4± 1.2 21.4± 1.1 20.7± 1.0

Spanish→Portuguese, WER ± 95% confidence interval

Training Test Word-for-word AT transfer Hand

2 · 106 post-edit 11.9± 0.8 12.1± 0.9 7.0± 0.7parallel 47.9± 1.7 46.5± 1.7 47.6± 1.8

0.5 · 106 post-edit 11.9± 0.8 12.1± 0.9 7.0± 0.7parallel 47.9± 1.7 47.4± 1.7 47.6± 1.8

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 39 / 45

Page 50: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Automatic inference of transfer rules Generation of Apertium transfer rules

Some results

Why such a large difference between Spanish→Catalan andSpanish→Portuguese?

Because of how training corpora have been builtSpanish→Catalan, by translating one language into another(newspaper El Periodico de Catalunya)

22% of discarded ATs

Spanish→Portuguese, by translating from a third language(JRC-ACQUIS parallel corpus)

53% of discarded ATs

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 40 / 45

Page 51: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Concluding remarks

Outline

1 Motivation & goal

2 Part-of-speech taggers for machine translationPart-of-speech taggingMT-oriented hidden Markov model training

3 Pruning of disambiguation pathsDisadvantages of the MT-oriented methodPruning method

4 Part-of-speech tag clusteringBest HMM topology for taggers used in MTBottom-up agglomerative clustering

5 Automatic inference of transfer rulesAlignment templates for shallow-transfer machine translationGeneration of Apertium transfer rules

6 Concluding remarks

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 40 / 45

Page 52: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Concluding remarks

Concluding remarks /1

Steps towards more efficient development of RBMT systems

A new method to train PoS tagger to be used in MTfocuses on the task in which it will be used

uses TL information without using parallel corpora

benefits from information in the rest of modules

using a priori knowledge saves around 80% of the translations toperform while training

better translation quality than tagging accuracy

PoS tags clusteringhas not provided the expected results, but

may be useful if the number of states is crucial

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 41 / 45

Page 53: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Concluding remarks

Concluding remarks /2

A method to infer shallow-transfer rules from parallel corporaextends the definition of alignment template

small amount of information provided by human is used

the process followed to build the parallel corpus deserves specialattention

inferred rules are human-readable

they can coexist with hand-coded rules

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 42 / 45

Page 54: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Concluding remarks

Concluding remarks /3

Open-source softwareCan be downloaded from sf.net/projects/apertium

Packages apertium-tagger-training-tools andapertium-transfer-tools

Ensures reproducibility

Allows other researchers to improve them

Eases the development of new language pairs for Apertium

apertium-tagger-training-tools is being used byPrompsit Language Enginnering S.L.

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 43 / 45

Page 55: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Concluding remarks

Future research lines

This thesis opens several research lines:

the use of TL information to train other statistical models that runon the SL

the use of more than one TL (triangulation)

the use of a TL model of different nature

linguistically-driven extraction of bilingual phrases

a more flexible way to use lexicalized categories

a bootstrapping method to learn both the PoS tagger and the setof transfer rules cooperatively

...

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 44 / 45

Page 56: Using unsupervised corpus-based methods to build rule ...fsanchez/pub/thesis/slides.pdf · Disadvantages of the MT-oriented method Pruning method 4 Part-of-speech tag clustering Best

Acknowledgments

Acknowledgments

Spanish Ministry of Education and Science, and European SocialFund; research grant BES-2004-4711

Spanish Ministry of Industry, Commerce and Tourism; projectsTIC2003-08681-C02-01, FIT340101-2004-3 andFIT-350401-2006-5

Autonomous Government of Catalonia; project Traduccioautomatica de codi obert per al catala

Spanish Ministry of Education and Science; projectTIN2006-15071-C03-01

⇒Thank you very much for your attention⇐

Felipe Sanchez Martınez (Univ. d’Alacant) 30th June 2008 45 / 45