Enriching Morphologically Poor Languages for Statistical Machine Translation
Biplab Ch Das, Kritika Jain
Based on: Eleftherios Avramidis and Philipp Koehn, Proceedings of ACL-08: HLT, 2008


  • Enriching Morphologically Poor Languages

    for Statistical Machine Translation

    Biplab Ch Das, Kritika Jain

    Eleftherios Avramidis and Philipp Koehn,
    Proceedings of ACL-08: HLT, 2008

  • English sentence vs. Hindi sentence

    [Diagram: the English sentence leaves person, number, and gender largely implicit, while the Hindi sentence overtly marks person, number, gender, tense, mood, aspect, voice, verb inflections, case agreement, and possessive, nominative, and accusative cases.]

    Traditional SMT maps at the lexical level, so complex linguistic information is lost; yet predicting the correct morphological variant in the target language may depend on the role of the word in the source sentence.

  • In English, an NP is rendered the same whether it is a subject or an object, unlike in Hindi.

    Example:
    English: Ram saw a boy. / A boy came running.
    Hindi: Ram ne ek ladke ko dekha. / Ek ladka daudte hue aaya.

  • Morphology in phrase-based SMT: ran --> dauda (p1) / daudi (p2) / daude (p3). The probability of 'ran' being translated as 'dauda' becomes more uncertain as the number of candidates increases.

    Agreement (matching of words in gender, case, number, person, etc.): the rules of agreement are closely linked to the morphological structure of the language.

  • Why not phrase-based SMT or a language model?

    Phrase-based: fails when the words involved in agreement exceed the length of the chunk.

    LM: fails when the words involved in agreement exceed the target n-gram.

  • English sentence → SMT system → Hindi sentence

    [Diagram: case information, gender, and person are extracted from the English sentence and passed as tags to the SMT system that produces the Hindi sentence.]

  • Identification of the verb. What piece of linguistic information should be added as a tag?

  • Parse in a top-down manner.
    Search for two discrete nodes containing the verb and the corresponding subject.
    Recursively search for the subject until found.
    Identify the person of the subject and assign the tag to the verb and its subtrees.
    (A sketch of this search follows below.)
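
    A minimal Python sketch of this top-down search, assuming an nltk constituency Tree; the node-label set and the person map are illustrative, not the authors' exact rules:

    ```python
    from nltk import Tree

    def find_subject(tree):
        """Recurse top-down into subject-like nodes until a word is reached."""
        if tree.height() == 2:                 # preterminal: (POS word)
            return tree[0]
        for child in tree:
            if isinstance(child, Tree) and child.label() in (
                    "NP", "PRP", "NN", "NNS", "NNP", "EX"):
                return find_subject(child)     # recursively search for the subject
        return None

    parse = Tree.fromstring(
        "(S (NP (PRP I)) (VP (MD would) (VP (VB like) (NP (DT a) (NN cup)))))")
    subject = find_subject(parse)
    person = {"i": "1st", "we": "1st", "you": "2nd"}.get(subject.lower(), "3rd")
    print(subject, person)                     # -> I 1st (the tag for the verb)
    ```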

  • There are six horses running in the race.

    [Parse tree, approximately: (S (NP (EX There)) (VP (VBP are) (NP (CD six) (NNS horses)) (VP (VBG running) (PP (IN in) (NP (DT the) (NN race))))))]

  • There is a direct connection between the person of the verb and the subject of the sentence. In most cases it is directly inferred from a personal pronoun (I, you, etc.). Thus, when a pronoun existed, it was directly used as a tag.

    Example: I would like to have a cup of tea.

  • Any other pronoun in a different case (them, myself, etc.) is converted into the nominative case before being used as a tag.

    Example: Ram convinced them to do the work.

  • When the subject is a single noun, it is treated as third person. The POS tag of the noun is used to identify whether the subject is singular or plural (see the sketch after the examples).

    Example: Boys play football regularly.
    That boy plays football daily.
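
    A hypothetical sketch of these person-tagging rules; the pronoun table and the defaults are illustrative:

    ```python
    # Map non-nominative pronouns to their nominative forms (illustrative subset).
    NOMINATIVE = {"me": "i", "myself": "i", "us": "we", "them": "they",
                  "him": "he", "her": "she"}

    def person_tag(subject_word, pos_tag):
        word = NOMINATIVE.get(subject_word.lower(), subject_word.lower())
        if word in ("i", "we", "you", "he", "she", "it", "they"):
            return word                              # pronoun used directly as tag
        return "they" if pos_tag == "NNS" else "it"  # 3rd person plural vs. singular

    print(person_tag("Boys", "NNS"))   # -> 'they' (third person plural)
    print(person_tag("boy", "NN"))     # -> 'it'   (third person singular)
    print(person_tag("them", "PRP"))   # -> 'they' (converted to nominative)
    ```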

  • Gender does not affect the verb form in Greek, but in Hindi gender information needs to be tagged.

    She went for a ride with her friends.
    * vah apne doston ke saath ek savaari ke liye gayaa (incorrect masculine form)
    vah apne doston ke saath ek savaari ke liye gayi (correct feminine form)

  • Cases are mainly defined by the syntactic role of the noun phrase.

    Nominative case: used for the subject of the sentence.
    He has done his homework.

    Accusative case: used for the direct object.
    He filled the glass with water.

    Dative case: used for the indirect object of a bi-transitive verb.
    I gave the waiter a tip.

  • English is a morphologically poor language with fixed (subject-verb-object) word order, so subject and object can easily be identified using a simple parse tree. Given the agreement restriction, all words that accompany the noun (adjectives, articles, determiners) must follow the case of the noun.

    Once we have the word-case annotated data, factored SMT is trained on it to produce the correct inflections.

  • Parse the tree in a depth-first manner. The cases are identified within particular sub-tree patterns which are manually specified (a sketch follows below).

    Example:
    Initially, the S-(NP-VP) pattern is searched and nominative tags are assigned.
    Accusative tags are assigned similarly to the right-hand side and propagated into the nested sub-trees.
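
    A minimal sketch of the depth-first case assignment; the two patterns below are simplified stand-ins for the manually specified ones:

    ```python
    from nltk import Tree

    def tag_cases(tree, case=None, tags=None):
        """Walk the tree depth-first, emitting (word, case) pairs."""
        if tags is None:
            tags = []
        if tree.height() == 2:                      # preterminal: emit the word
            tags.append((tree[0], case))
        elif tree.label() == "S":
            for child in tree:                      # S-(NP-VP): left NP is nominative
                sub_case = "nominative" if child.label() == "NP" else case
                tag_cases(child, sub_case, tags)
        elif tree.label() == "VP":
            for child in tree:                      # NPs under VP get accusative
                sub_case = "accusative" if child.label() == "NP" else case
                tag_cases(child, sub_case, tags)
        else:                                       # propagate into nested sub-trees
            for child in tree:
                tag_cases(child, case, tags)
        return tags

    parse = Tree.fromstring(
        "(S (NP (NNP Ram)) (VP (VBD saw) (NP (DT a) (NN boy))))")
    print(tag_cases(parse))
    # [('Ram', 'nominative'), ('saw', None),
    #  ('a', 'accusative'), ('boy', 'accusative')]
    ```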

  • Factor Based Machine Translation

    Biplab Ch Das, Kritika Jain

  • Motivation

    This is a horse running. There are horses running.

  • Phrase-based Models: Flaw

    These models treat horse and horses as completely different words.

    Training occurrences of horse have no effect on learning the translation of horses.

    If we have only seen horse, we do not know how to translate horses.

  • Morphologically rich languages (the difficulty level increases)

    The word horse itself may have many forms in a morphologically rich language; we can see 24 different forms of ashwa in Sanskrit.

  • Better Approach

    A better approach is to analyze surface word forms into lemma and morphology: Horses -> Horse + plural.

    Translate the lemma and the morphology separately, then generate the target surface form (a toy illustration follows below).
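
    As a toy illustration (a real system would use a morphological analyzer, not a one-line suffix rule):

    ```python
    def analyze(surface):
        """Toy analysis of a surface form into (lemma, morphology)."""
        if surface.lower().endswith("s"):
            return surface[:-1], "plural"
        return surface, "singular"

    print(analyze("Horses"))   # -> ('Horse', 'plural')
    ```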

  • Factor based models

    We use not only the surface word for translation but also other factors of it.

  • Advantages

    Translation models that operate on more general representations, such as lemmas instead of surface forms of words, can draw on richer statistics and overcome the data sparseness problems caused by limited training data.

  • Advantages

    Many aspects of translation can be best explained on a morphological, syntactic, or semantic level.

    Having such information available to the translation model allows the direct modeling of these aspects.

  • Morphological Constraints on Translation

    We consider gender as a factor. English: She went to school. Google Translate: [masculine verb form]. But it must have been: [feminine verb form].

  • Re-ordering is at the syntactic level.

    English is basically SVO, e.g., Ram killed Ravana.

    Whereas Hindi is SOV, e.g., Ram ne Ravana ko maara.

  • Decomposition of Factored Translation

    1. Translate input lemmas into output lemmas.
    2. Translate morphological and POS factors.
    (Steps 1 and 2 operate at the phrase level.)
    3. Generate surface forms given the lemma and linguistic factors.
    (Step 3 operates at the word level; a toy sketch of all three steps follows below.)
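
    A toy end-to-end sketch of the three steps; every table entry here is a hypothetical illustration (real tables are learned from aligned data):

    ```python
    # Step 1: phrase-level lemma translation table.
    lemma_tt  = {("new", "house"): [("naya", "ghar")]}
    # Step 2: phrase-level factor (POS) translation table.
    factor_tt = {("JJ", "NNS"): [("JJ", "NNS")]}
    # Step 3: word-level generation table (lemma, POS) -> surface form.
    gen_table = {("naya", "JJ"): "naye", ("ghar", "NNS"): "ghar"}

    def translate(lemmas, factors):
        options = []
        for tgt_lemmas in lemma_tt.get(lemmas, []):
            for tgt_factors in factor_tt.get(factors, []):
                surface = [gen_table.get(p) for p in zip(tgt_lemmas, tgt_factors)]
                if None not in surface:          # generation licensed every word
                    options.append(" ".join(surface))
        return options

    print(translate(("new", "house"), ("JJ", "NNS")))   # -> ['naye ghar']
    ```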

  • Example

    Hindi (transliteration): naye ghar banaye gaye. English: new houses were built.

    Factored representation of houses: lemma house | part-of-speech NNS | count plural | case nominative | gender neutral.

    Let us consider only: lemma | part-of-speech NNS.

  • Mapping lemmas

    [Diagram: the lemmas of 'New houses were built' aligned to the corresponding Hindi lemmas.]

  • Mapping factors

    [Diagram: the source POS factors NNP NNS VBD VBN aligned to the target-side factors NNP, NNS, VBN, VBD.]

  • Generating the Surface Form

    Say mapping lemmas gives rise to: house, home, building, apartment.
    And mapping morphology gives: NNS -> NNS or NN.

    Then generating the surface form yields:
    houses|house|NNS, homes|home|NNS, buildings|building|NNS, shells|shell|NNS, house|house|NN

    (A code sketch of this step follows below.)
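
    A sketch of this generation step: cross the lemma options with the morphology options and keep only the pairs licensed by a generation table. The toy table below includes shell so the slide's output list is reproduced; apartment has no licensed form here and is filtered out:

    ```python
    lemma_options = ["house", "home", "building", "apartment", "shell"]
    morph_options = ["NNS", "NN"]
    generate = {("house", "NNS"): "houses",  ("house", "NN"): "house",
                ("home", "NNS"): "homes",    ("building", "NNS"): "buildings",
                ("shell", "NNS"): "shells"}   # toy table; the real one is learned

    options = [f"{generate[(l, m)]}|{l}|{m}"
               for l in lemma_options for m in morph_options
               if (l, m) in generate]
    print(options)
    # ['houses|house|NNS', 'house|house|NN', 'homes|home|NNS',
    #  'buildings|building|NNS', 'shells|shell|NNS']
    ```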

  • A note on training

    Word alignment methods like GIZA++ are used to obtain the word alignments. The same is done for the other factors.

    A phrase table is constructed for both lemmas and factors. The language and generation models are trained on the output data.

    Note that factored models used in practice are synchronous: the same segmentation into phrases is used for all translation steps.

  • Statistical Modeling

    1. The translation probability: the translation of the input sentence f into the output sentence e breaks down into a set of phrase translations $(\bar{f}_j, \bar{e}_j)$.

    We model this translation for all the factors.

  • Translation Models

    Each translation step contributes a feature of the form

    $h_T(\mathbf{e}, \mathbf{f}) = \prod_j \tau(\bar{f}_j, \bar{e}_j)$

    where $h$ is the function modeling the translation and $\tau$ is the scoring function for each phrase pair.

  • The language model

    For the language model we use only the target-side corpus; it is modeled as, say, a bigram model.
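
    Spelled out (the standard bigram decomposition, not shown on the slide):

    $p(\mathbf{e}) = \prod_{i=1}^{|\mathbf{e}|} p(e_i \mid e_{i-1})$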

  • The generation model

    Example: The/det man/nn sleeps/vbz

    Count collection: count(the,det)++, count(man,nn)++, count(sleeps,vbz)++

    Using MLE we get p(det|the), p(the|det), p(nn|man), p(man|nn), p(vbz|sleeps).

    We use these conditional probabilities as scores for the generation model (see the sketch below).
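
    A minimal sketch of the count collection and MLE estimation on the toy sentence above:

    ```python
    from collections import Counter

    pairs = [("the", "det"), ("man", "nn"), ("sleeps", "vbz")]
    pair_count = Counter(pairs)                     # count(word, tag)++
    word_count = Counter(w for w, _ in pairs)
    tag_count  = Counter(t for _, t in pairs)

    def p_tag_given_word(tag, word):                # e.g. p(vbz|sleeps)
        return pair_count[(word, tag)] / word_count[word]

    def p_word_given_tag(word, tag):                # e.g. p(the|det)
        return pair_count[(word, tag)] / tag_count[tag]

    print(p_tag_given_word("vbz", "sleeps"))        # -> 1.0
    print(p_word_given_tag("the", "det"))           # -> 1.0
    ```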


  • Combine using a Log-Linear Model

    For instance, we have various component models of a system (translation, language model, generation).

    We can combine these component models using a log-linear model:

    $p(x) = \frac{1}{Z} \exp \left( \sum_i \lambda_i h_i(x) \right)$

    where Z is the normalizing constant.

  • The model becomes

    We take the log-linear combination of each of the features to produce the complete statistical model (a numeric sketch follows below).
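
    A minimal numeric sketch of the combination; the weights and feature scores are made up:

    ```python
    import math

    def loglinear_score(features, weights):
        """Unnormalized score: exp(sum_i lambda_i * h_i)."""
        return math.exp(sum(weights[name] * h for name, h in features.items()))

    weights = {"translation": 1.0, "language_model": 0.6, "generation": 0.4}
    candidates = {
        "ghar banaa":  {"translation": -1.2, "language_model": -0.8, "generation": -0.3},
        "makaan bane": {"translation": -1.5, "language_model": -0.5, "generation": -0.4},
    }
    scores = {e: loglinear_score(h, weights) for e, h in candidates.items()}
    Z = sum(scores.values())                        # normalizing constant
    print({e: s / Z for e, s in scores.items()})    # probabilities summing to 1
    ```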

  • Learning the weights

    Weights in the log-linear model are determined using a minimum error rate training method, typically Powell's method.

    It is a variant of coordinate ascent: let one $\lambda_k$ vary while holding all the other $\lambda_i$ constant, and optimize the function in steps (see the sketch below).
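
    A toy coordinate-ascent sketch of the tuning idea (real MERT performs an exact line search on the error metric; this grid search only illustrates varying one weight at a time):

    ```python
    def coordinate_ascent(weights, objective, grid, sweeps=5):
        for _ in range(sweeps):
            for k in range(len(weights)):           # vary lambda_k, hold the rest
                best_val, best_obj = weights[k], objective(weights)
                for v in grid:
                    trial = weights[:k] + [v] + weights[k + 1:]
                    if objective(trial) > best_obj:
                        best_val, best_obj = v, objective(trial)
                weights[k] = best_val               # fix lambda_k, move on
        return weights

    # Hypothetical objective standing in for BLEU on a dev set:
    obj = lambda w: -(w[0] - 0.3) ** 2 - (w[1] - 0.7) ** 2
    print(coordinate_ascent([0.0, 0.0], obj, [i / 10 for i in range(11)]))
    # -> [0.3, 0.7]
    ```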

  • Efficient Decoding

    Compared to phrase-based models, the decomposition of phrase translation into several mapping steps creates additional computational complexity.

    Key insight: the execution of the mapping steps can be pre-computed and stored as translation options.
    1. Apply the mapping steps to all input phrases.
    2. Store the results as translation options.
    3. The decoding algorithm is unchanged. (A sketch follows below.)
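
    A sketch of the precomputation, with hypothetical mapping steps; the decoder later consumes the cached options unchanged:

    ```python
    def precompute_options(input_phrases, mapping_steps):
        """Run every mapping step once per input phrase and cache the results."""
        options = {}
        for phrase in input_phrases:
            candidates = [phrase]
            for step in mapping_steps:          # expand through each mapping step
                candidates = [out for c in candidates for out in step(c)]
            options[phrase] = candidates        # stored once, reused while decoding
        return options

    steps = [lambda p: [p.lower()],                              # toy normalization
             lambda p: {"ghar": ["house", "home"]}.get(p, [])]   # toy lemma lookup
    print(precompute_options(["Ghar"], steps))  # -> {'Ghar': ['house', 'home']}
    ```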

  • Example

    We can see that the translation options for ghar have already been precomputed.

  • Something is wrong

    It can cause a combinatorial explosion. Generating all of them is costly in terms of computational time and memory.

    Pruning will likely discard good hypotheses, as stacks will be filled with too many factor combinations.

    But consistency in translation options may help.

  • Consistency

    An expansion is considered consistent if the target side has the same length and if the shared factors match.
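
    A sketch of this consistency test, representing each target word as a dict of factors (the factor names are illustrative):

    ```python
    def consistent(partial, expansion, shared):
        """Keep an expansion only if lengths match and shared factors agree."""
        if len(partial) != len(expansion):      # target sides must align one-to-one
            return False
        return all(p[f] == e[f]
                   for p, e in zip(partial, expansion)
                   for f in shared if f in p and f in e)

    partial  = [{"pos": "DT"}, {"pos": "NN"}]
    good_exp = [{"pos": "DT", "lemma": "the"}, {"pos": "NN", "lemma": "house"}]
    bad_exp  = [{"pos": "ADJ"}, {"pos": "DT"}, {"pos": "NN"}]   # wrong length
    print(consistent(partial, good_exp, ["pos"]))   # -> True
    print(consistent(partial, bad_exp, ["pos"]))    # -> False
    ```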

  • Consistency in translation options

    The mapping DT NN -> ADJ DT NN is inconsistent with all of the mappings in Table 1.

  • Consistency in translation options

    The mappings DT NN -> is house and DT NN -> full building are discarded.

  • A Note on the Complexity of Factored Setups

    Estimate the average number of options for a single step.

    We cannot use the arithmetic mean, because extracted phrases obey a power law.

    Phrases that occur only once have only one translation in the phrase table.

    Very frequent phrases tend to have a large number of translations.

  • Formulation

    The frequency-weighted average ($t_i$ denotes the number of translations and $f_i$ the source phrase frequency):

    $\bar{t} = \frac{\sum_i f_i \, t_i}{\sum_i f_i}$
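
    The same frequency-weighted average in code, with made-up counts (two singleton phrases and one frequent, highly ambiguous phrase):

    ```python
    def weighted_avg_translations(pairs):
        """pairs: (t_i, f_i) = (number of translations, source frequency)."""
        total_f = sum(f for _, f in pairs)
        return sum(t * f for t, f in pairs) / total_f

    print(weighted_avg_translations([(1, 1), (1, 1), (12, 40)]))  # -> ~11.48
    ```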

  • The Order of application

    The order of application of the mapping steps plays a significant role. In this case, only consistent translation options can be generated during expansion.

    It limits the number of translation options generated from the existing options.

    It discards those partial options for which no consistent expansion exists.

  • Example

    For example, suppose that we define two separate translation steps:
    1. lemma -> lemma
    2. tag -> lemma

    Doing the mapping in order 1 -> 2 is good: first the words are mapped, and then the consistent tags are used for the translation options. But if order 2 -> 1 is followed, then tag -> lemma will map a tag to every lemma it can produce, a combinatorial explosion.

  • Experiments

    Experiments were run for the language pair German–English, using the 52,185-sentence News Commentary corpus.

    Using a part-of-speech language model, the BLEU score increases from 18.19% to 19.05%.

    But!!! Replacing the surface word translation mapping with a lemma/morphology mapping leads to a deterioration of performance, to a BLEU score of 14.46%.

  • Reason why this happens

    Note that this model completely ignores the surface forms of the input words and relies only on the more general lemma and morphology information.

    Though this allows the translation of word forms with a known lemma but an unknown surface form, on balance it seems to be a disadvantage to throw away surface form information.

  • The Alternative Approach

    To overcome this problem, an alternative-path model was introduced: translation options in this model may come either from the surface form model or from the lemma/morphology model.

    For surface forms with rich evidence in the training data, surface form mappings are preferred.

    For surface forms with poor or no evidence in the training data, the surface form is decomposed into lemma and morphology information, and these are mapped separately.


  • Various Other Factors

    We can consider various other factors, such as:

    t-lemma: the tectogrammatical lemma, i.e. the deep-syntactic lemma.

    functor: describes the syntactic-semantic relation of a node to its parent node. Its possible values include ACT (actor), PAT (patient) or ADDR (addressee).

    grammatemes: a set of factors that describe meaning-bearing morphological properties of t-nodes. We extracted the following categories:

    gender: grammatical gender.

    number: grammatical number.

  • Various Other Factors (continued)

    sempos: semantic part of speech. This factor classifies autosemantic words into 4 classes: nouns, adjectives, adverbs and verbs (with their respective subcategories).

    tense: specifies the tense of verbs.

    verbmod: indicates the verb mood.

    negation: indicator of negation.

    formeme: contains a projection of some morpho-syntactic information from the morphological and analytical layers.

  • German–English Systems

    System                     | BLEU (%) | BLEU (%)
    ---------------------------|----------|---------
    Baseline                   | 18.19    | 15.01
    With POS LM                | 19.05    | 15.03
    Morph-gen model            | 14.38    | 11.65
    Back-off lemma/morph model | 19.47    | 15.23

  • References

    Philipp Koehn and Hieu Hoang. Factored Translation Models. Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

    Aleš Tamchyna and Ondřej Bojar. No Free Lunch in Factored Phrase-Based Machine Translation. Institute of Formal and Applied Linguistics, 2013.

    Eleftherios Avramidis and Philipp Koehn. Enriching Morphologically Poor Languages for Statistical Machine Translation. Proceedings of ACL-08: HLT, 2008.