TRANSCRIPT
-
Enriching Morphologically Poor Languages
for Statistical Machine Translation
Biplab Ch Das, Kritika Jain
Eleftherios Avramidis, Philipp Koehn
Proceedings of ACL-08: HLT, 2008
-
English sentence                              Hindi sentence
Tense, mood, aspect, voice,                   Person, number, gender, tense, mood,
verb inflections                              aspect, voice, verb inflections
Case agreement: possessive cases              Case agreement: possessive, nominative,
Person? Number? Gender?                       and accusative cases
Traditional SMT uses mapping on the lexical level, and complex linguistic information is lost. Predicting the correct morphological variant in the target language may depend on the role of the word in the source sentence.
-
In English, an NP is rendered the same as a subject or an object, unlike in Hindi.
Example:
English: Ram saw a boy. / A boy came running.
Hindi: Ram ne ek ladke ko dekha. / Ek ladka daudte hue aaya.
-
Morphology in phrase-based SMT
ran --> dauda (p1) / daudi (p2) / daude (p3)
The probability of 'ran' being translated as 'dauda' becomes more uncertain as the number of candidates increases.
Agreement (matching of words in gender, case, number, person, etc.)
Rules of agreement are closely linked to the morphological structure of the language.
-
Why not phrase-based SMT or a language model?
Phrase-based: fails when the words involved in agreement exceed the length of the chunk.
LM: fails when the words involved in agreement exceed the target n-gram.
-
[Diagram: the English sentence is tagged with case information, gender, and person before being passed to the SMT system, which produces the Hindi sentence.]
-
Identification of the verb. What piece of linguistic information should be added as a tag?
-
Parse in a top-down manner.
Search for two discrete nodes containing the verb and the corresponding subject.
Recursively search for the subject until it is found.
Identify the person of the subject and assign the tag to the verb and its subtrees.
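The steps above can be sketched roughly as follows, using a toy bracketed tree of (label, children) tuples; the traversal order and the pronoun-to-person table are illustrative assumptions, not the authors' exact implementation.

```python
# Toy parse tree: (label, children) for nonterminals, (label, word) for leaves.
TREE = ("S",
        [("NP", [("PRP", "I")]),
         ("VP", [("VBP", "like"), ("NP", [("NN", "tea")])])])

PERSON = {"i": "p1", "we": "p1", "you": "p2"}  # everything else defaults to p3

def find_subject(node):
    """Top-down search: find the first NP, then recurse until a noun/pronoun leaf."""
    label, children = node
    if isinstance(children, str):                 # leaf node: (POS, word)
        return children if label in ("PRP", "NN", "NNS", "NNP") else None
    for child in children:
        if child[0] == "NP":                      # candidate subject NP found
            return find_subject(child)
    for child in children:                        # otherwise keep searching recursively
        found = find_subject(child)
        if found:
            return found
    return None

def person_tag(tree):
    subject = find_subject(tree)
    return PERSON.get(subject.lower(), "p3") if subject else "p3"

print(person_tag(TREE))  # -> p1
```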
-
There are six horses running in the race.
(S
  (NP (EX There))
  (VP (VBP are)
      (NP (NP (CD six) (NNS horses))
          (VP (VBG running)
              (PP (IN in)
                  (NP (DT the) (NN race)))))))
-
There is a direct connection between the verb person and the subject of the sentence. In most cases it is directly inferred from a personal pronoun
(I, you, etc.). Thus, when a pronoun exists, it is directly used as the tag.
Example: I would like to have a cup of tea.
-
Any other pronoun in a different case (them, myself, etc.) is converted into the nominative case before being used as a tag.
Example: Ram convinced them to do the work.
-
When the subject is a single noun, it is treated as third person. The POS tag of the noun is used to identify whether the subject is singular or plural.
Example: Boys play football regularly.
That boy plays football daily.
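The three tagging rules above (pronoun used directly, oblique pronouns normalized to nominative, nouns tagged third person with number from POS) can be sketched as below; the mapping tables are invented examples, not the paper's actual resources.

```python
# Illustrative sketch of the subject-tagging heuristics; tables are assumptions.
NOMINATIVE = {"me": "i", "myself": "i", "them": "they", "him": "he",
              "her": "she", "us": "we"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

def subject_tag(word, pos):
    """Return the person tag to attach to the verb for a given subject token."""
    w = word.lower()
    w = NOMINATIVE.get(w, w)                  # convert oblique pronouns to nominative
    if w in PRONOUNS:
        return w                              # the pronoun itself is used as the tag
    return "3pl" if pos == "NNS" else "3sg"   # single noun: third person, number from POS

print(subject_tag("Them", "PRP"))   # -> they
print(subject_tag("Boys", "NNS"))   # -> 3pl
print(subject_tag("boy", "NN"))     # -> 3sg
```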
-
Gender does not affect the verb form in Greek, but in Hindi gender information needs to be tagged.
She went for a ride with her friends.
* vah apne doston ke saath ek savaari ke liye gayaa
vah apne doston ke saath ek savaari ke liye gayi
-
Case is mainly defined by the syntactic role the noun phrase has.
Nominative case: used for the subject of the sentence. He has done his homework.
Accusative case: used for the direct object. He filled the glass with water.
Dative case: used for the indirect object of a bi-transitive verb. I gave the waiter a tip.
-
English is a morphologically poor language with fixed (subject-verb-object) word order, so the subject and object can easily be identified using a simple parse tree. Given the agreement restriction, all words that accompany the noun (adjectives, articles, determiners) must follow the case of the noun. Once we have the word-case annotated data, a factored SMT system is trained on it to produce the correct inflections.
-
Parse the tree in a depth-first manner. The cases are identified within particular sub-tree patterns which are manually specified.
Example:
Initially, the pattern S-(NP-VP) is searched and nominative tags are assigned.
Accusative tags are assigned similarly on the right-hand side and propagated into the nested sub-tree.
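A minimal sketch of this depth-first case tagging, assuming two hand-written patterns (the NP directly under S gets nominative, an NP inside VP gets accusative) over the same toy tuple trees as before; the pattern set is deliberately simplified.

```python
# Depth-first case tagging over a toy (label, children) tree.
def tag_cases(node, tags=None):
    if tags is None:
        tags = {}
    label, children = node
    if isinstance(children, str):
        return tags
    if label == "S":
        for child in children:
            if child[0] == "NP":
                mark(child, "nom", tags)      # subject NP under S: nominative
    if label == "VP":
        for child in children:
            if child[0] == "NP":
                mark(child, "acc", tags)      # object NP inside VP: accusative
    for child in children:
        tag_cases(child, tags)                # propagate into nested sub-trees
    return tags

def mark(node, case, tags):
    """Annotate every word in the sub-tree with the given case tag."""
    label, children = node
    if isinstance(children, str):
        tags[children] = case
    else:
        for child in children:
            mark(child, case, tags)

TREE = ("S", [("NP", [("NNP", "He")]),
              ("VP", [("VBD", "filled"), ("NP", [("DT", "the"), ("NN", "glass")])])])
print(tag_cases(TREE))  # -> {'He': 'nom', 'the': 'acc', 'glass': 'acc'}
```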
-
Factor Based Machine Translation
Biplab Ch Das, Kritika Jain
-
Motivation
This is a horse running.
There are horses running.
-
Phrase-based Models: A Flaw
These models treat "horse" and "horses" as completely different words.
Training occurrences of "horse" have no effect on learning the translation of "horses".
If we have only seen "horse", we do not know how to translate "horses".
-
Morphologically rich languages (the difficulty level increases)
The word "horse" itself may have many forms in morphologically rich languages: there are 24 different forms of "ashwa" (horse) in Sanskrit.
-
Better Approach
A better approach is to analyze surface word forms into lemma and morphology:
Horses -> horse + plural
Translate the lemma and the morphology separately, then generate the target surface form.
-
Factor based models
We not only use the word itself for translation; we also use its other factors.
-
Advantages
Translation models that operate on more general representations, such as lemmas instead of surface forms of words, can draw on richer statistics and overcome the data sparseness problems caused by limited training data.
-
Advantages
Many aspects of translation can be best explained on a morphological, syntactic, or semantic level.
Having such information available to the translation model allows the direct modeling of these aspects.
-
Morphological Constraints on translation
We consider the case of gender as a factor.
English: She went to school.
Google Translate:
But it must have been:
(The Hindi verb must agree with the feminine subject, as in the earlier "gayaa"/"gayi" example.)
-
Re-ordering is at the syntactic level.
English is basically SVO. E.g.: Ram killed Ravana.
Whereas Hindi is SOV. E.g.: Ram ne Ravana ko maara.
-
Decomposition of Factored Translation
1. Translate input lemmas into output lemmas.
2. Translate morphological and POS factors.
(Both steps 1 and 2 operate at the phrase level.)
3. Generate surface forms given the lemma and the linguistic factors.
(Step 3, however, operates at the word level.)
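The three-step decomposition can be sketched with toy mapping tables; all entries here (including the Hindi lemma "ghar" and the factor label "NN-pl") are invented for illustration, not learned models.

```python
# Toy illustration of the three factored mapping steps.
lemma_table = {"ghar": ["house", "home"]}          # step 1: lemma -> lemma
factor_table = {"NN-pl": ["NNS"]}                  # step 2: morphology/POS factors
generation_table = {("house", "NNS"): "houses",    # step 3: word-level generation
                    ("home", "NNS"): "homes"}

def translate(lemma, factor):
    options = []
    for out_lemma in lemma_table.get(lemma, []):
        for out_factor in factor_table.get(factor, []):
            surface = generation_table.get((out_lemma, out_factor))
            if surface:                            # generate surface form from factors
                options.append(surface)
    return options

print(translate("ghar", "NN-pl"))  # -> ['houses', 'homes']
```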
-
Example
Hindi (transliteration): naye ghar banaye gaye.
English: new houses were built.
houses = lemma house | part-of-speech NNS | count plural | case nominative | gender neutral
Let us consider only: lemma | part-of-speech.
-
Mapping lemmas
new houses were built
-
Mapping Factors
NNP NNS VBD VBN
-
Generating Surface Forms
Say mapping lemmas gives: -> house, home, building, apartment.
And mapping morphology gives: NNS -> NNS or NN.
Then generating surface forms yields:
houses|house|NNS, homes|home|NNS, buildings|building|NNS, shells|shell|NNS, house|house|NN
-
A note on training
Word alignment methods like GIZA++ are used to get the word alignments. The same is done for the other factors. A phrase table is constructed for both lemmas and factors. The language and generation models are trained on the output data. Note that the factored models used in practice are synchronous: the same segmentation into phrases is used for all translation steps.
-
Statistical Modeling
1. The translation probability: translation of the input sentence f into the output sentence e breaks down into a set of phrase translations (\bar{f}_j, \bar{e}_j).
We model this translation for all the factors.
-
Translation Models
The translation feature is modeled as h(\bar{f}, \bar{e}) = \prod_j \tau(\bar{f}_j, \bar{e}_j), where h is the feature function for a translation step and \tau is the phrase-pair scoring function.
-
The language model
For the language model we use only the target-side corpus; it is modeled as, say, a bigram model: p(e) = \prod_i p(e_i | e_{i-1}).
-
The Generation Model
Example: The/det man/nn sleeps/vbz
Count collection:
count(the, det)++, count(man, nn)++, count(sleeps, vbz)++
Using MLE we get p(det|the), p(the|det), p(nn|man), p(man|nn), p(vbz|sleeps), p(sleeps|vbz).
We use these conditional probabilities as scores for the generation model.
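The count collection and MLE step above can be sketched directly; this uses only the slide's three-token example, so all probabilities come out as 1.0.

```python
# Count collection and MLE estimation for the word-level generation model.
from collections import Counter

words_tags = [("the", "det"), ("man", "nn"), ("sleeps", "vbz")]

pair_count = Counter(words_tags)                  # count(word, tag)++
word_count = Counter(w for w, _ in words_tags)
tag_count = Counter(t for _, t in words_tags)

def p_tag_given_word(tag, word):
    return pair_count[(word, tag)] / word_count[word]

def p_word_given_tag(word, tag):
    return pair_count[(word, tag)] / tag_count[tag]

print(p_tag_given_word("det", "the"))   # -> 1.0
print(p_word_given_tag("man", "nn"))    # -> 1.0
```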
-
Generation Model
-
Combine using Log linear Model
For instance, we have various component models h_i of the system.
We can combine these component models using a log-linear model:
p(x) = (1/Z) exp( \sum_i \lambda_i h_i(x) )
where Z is the normalization constant.
-
The model becomes
We take the log-linear combination of each of the features to produce the complete statistical model.
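A minimal numeric sketch of the log-linear combination: two invented candidates, each with two log-scale feature scores (standing in for the translation and language models), combined with weights and normalized by Z.

```python
import math

# Log-linear combination of component model scores h_i with weights lambda_i.
def loglinear_score(features, weights):
    return math.exp(sum(l * h for l, h in zip(weights, features)))

def normalize(scores):
    z = sum(scores)                     # Z, the normalization constant
    return [s / z for s in scores]

# Two candidate translations with invented (log TM score, log LM score) features:
candidates = [(-1.0, -2.0), (-1.5, -1.0)]
weights = (1.0, 1.0)
probs = normalize([loglinear_score(f, weights) for f in candidates])
print(round(sum(probs), 6))  # -> 1.0
```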
-
Learning the weights
The weights in the log-linear model are determined using minimum error rate training, typically Powell's method.
It is a variant of coordinate ascent: you let one of the \lambda_k vary, hold all the other \lambda_i constant, and optimize the function in steps.
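The coordinate-ascent idea can be sketched on a toy objective; the `bleu` function here is a made-up concave stand-in for the real error metric, and the grid search over one weight at a time is a simplification of Powell's method.

```python
# Toy coordinate ascent: vary one weight at a time over a grid, keep the best.
def bleu(weights):                     # invented surrogate objective, peak at (0.3, 0.7)
    w1, w2 = weights
    return -(w1 - 0.3) ** 2 - (w2 - 0.7) ** 2

def coordinate_ascent(weights, grid, rounds=5):
    weights = list(weights)
    for _ in range(rounds):
        for k in range(len(weights)):  # let weight k vary, hold the others constant
            best = max(grid, key=lambda v: bleu(weights[:k] + [v] + weights[k + 1:]))
            weights[k] = best
    return weights

grid = [i / 10 for i in range(11)]
print(coordinate_ascent([0.0, 0.0], grid))  # -> [0.3, 0.7]
```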
-
Efficient Decoding
Compared to phrase-based models, the decomposition of phrase translation into several mapping steps creates additional computational complexity.
Key insight: the execution of the mapping steps can be pre-computed and stored as translation options.
1. Apply the mapping steps to all input phrases.
2. Store the results as translation options.
3. The decoding algorithm is unchanged.
-
Example
We can see that the translation options for "ghar" have already been precomputed.
-
Something is wrong
This can cause a combinatorial explosion. Generating all the options is costly in terms of computation time and memory.
Pruning will likely discard good hypotheses, as stacks will be filled with too many factor combinations.
But consistency between translation options may help.
-
Consistency
An expansion is considered consistent if the target side has the same length and if the shared factors match.
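The consistency definition above can be sketched as a check over per-word factor dictionaries on the target side; the representation (a list of dicts per option) is an assumption made for illustration.

```python
# Consistency check for expanding partial translation options: the expansion
# must have the same target length and agree on all shared factors.
def consistent(partial, expansion):
    """partial, expansion: lists of per-word factor dicts on the target side."""
    if len(partial) != len(expansion):
        return False
    for a, b in zip(partial, expansion):
        shared = set(a) & set(b)          # factors present in both options
        if any(a[f] != b[f] for f in shared):
            return False
    return True

opt = [{"lemma": "house"}, {"lemma": "be"}]
good = [{"pos": "NN"}, {"pos": "VBD"}]                        # no shared factor clashes
bad = [{"lemma": "building", "pos": "NN"}, {"pos": "VBD"}]    # lemma mismatch
print(consistent(opt, good))  # -> True
print(consistent(opt, bad))   # -> False
```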
-
Consistency in translation option
The mapping DT NN -> ADJ DT NN is inconsistent with any of the mappings in Table 1.
-
Consistency in translation options
The mappings DT NN -> is house and DT NN -> full building are discarded.
-
A Note on the Complexity of Factored Setups
To estimate the average number of options for a single step, we cannot use the arithmetic mean, because extracted phrases obey a power law:
Phrases that occur only once have only one translation in the phrase table.
Very frequent phrases tend to have a large number of translations.
-
Formulation
Frequency-weighted average (t_i denotes the number of translations of source phrase i and f_i is its frequency):
\bar{t} = \frac{\sum_i f_i t_i}{\sum_i f_i}
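The frequency-weighted average can be computed directly; the phrase statistics below are invented to show how it diverges from the arithmetic mean under a power-law distribution.

```python
# Frequency-weighted average number of translation options per source phrase:
#     t_bar = sum_i(f_i * t_i) / sum_i(f_i)
# Toy statistics: (f_i = source phrase frequency, t_i = number of translations).
phrases = [(1, 1), (1, 1), (100, 40)]   # two singleton phrases, one frequent phrase

def weighted_avg(stats):
    total_f = sum(f for f, _ in stats)
    return sum(f * t for f, t in stats) / total_f

print(weighted_avg(phrases))            # -> ~39.24, vs an arithmetic mean of 14
```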
-
The Order of Application
The order of application of the mapping steps plays a significant role. In this case, only consistent translation options can be generated during expansion.
It limits the number of translation options generated from the existing options.
It discards those partial options for which no consistent expansion exists.
-
Example
For example, suppose that we define two separate translation steps:
1. lemma -> lemma
2. tag -> lemma
Doing the mapping in the order 1 -> 2 is good: first the words are mapped, and then only the consistent tags are used for translation options. But if the order 2 -> 1 is followed, then tag -> lemma maps a tag to every lemma it can map to, a combinatorial explosion.
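The effect of the two orders can be counted on toy tables; the table contents and sizes are invented to make the sparse-vs-dense contrast visible.

```python
# Counting generated options for the two application orders of the example above.
lemma_step = {"ghar": ["house", "home"]}                # step 1: lemma -> lemma (sparse)
tag_step = {"NN": ["house", "home", "building",
                   "shell", "glass"]}                   # step 2: tag -> lemma (dense)

def options_order_1_then_2(lemma, tag):
    # Step 1 first: only lemmas compatible with the source word survive; the
    # tag step then only has to be consistent with those few options.
    return [l for l in lemma_step[lemma] if l in tag_step[tag]]

def options_order_2_then_1(lemma, tag):
    # Step 2 first: the tag immediately expands to every lemma it can map to.
    return list(tag_step[tag])

print(len(options_order_1_then_2("ghar", "NN")))  # -> 2
print(len(options_order_2_then_1("ghar", "NN")))  # -> 5
```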
-
Experiments
Experiments for the language pair German-English, using the 52,185-sentence News Commentary corpus.
Using a part-of-speech language model, the BLEU score increases from 18.19% to 19.05%.
But replacing the surface word translation mapping with a lemma/morphology mapping leads to a deterioration of performance, to a BLEU score of 14.46%.
-
Reason why this happens
Note that this model completely ignores the surface forms of the input words and relies only on the more general lemma and morphology information.
Though this allows the translation of word forms with a known lemma but an unknown surface form, on balance it seems to be a disadvantage to throw away surface form information.
-
The Alternative Approach
To overcome this problem, an alternative-path model was introduced: translation options in this model may come either from the surface form model or from the lemma/morphology model.
For surface forms with rich evidence in the training data, surface form mappings are preferred.
For surface forms with poor or no evidence in the training data, surface forms are decomposed into lemma and morphology information, which are mapped separately.
-
The Alternative approach
-
Various Other Factors
We can consider various other factors, such as:
t-lemma: the tectogrammatical lemma, i.e. the deep-syntactic lemma.
functor: describes the syntactic-semantic relation of a node to its parent node; its possible values include ACT (actor), PAT (patient) and ADDR (addressee).
grammatemes: a set of factors that describe meaning-bearing morphological properties of t-nodes. We extracted the following categories:
gender: grammatical gender.
number: grammatical number.
-
Various Other Factors
sempos: semantic part of speech; classifies autosemantic words into 4 classes: nouns, adjectives, adverbs and verbs (with their respective subcategories).
tense: specifies the tense of verbs.
verbmod: indicates the verb mood.
negation: indicator of negation.
formeme: contains a projection of some morpho-syntactic information from the morphological and analytical layers.
-
German-English Systems
System                        BLEU    BLEU
Baseline                      18.19   15.01
With POS LM                   19.05   15.03
Morph-gen model               14.38   11.65
Back-off lemma/morph model    19.47   15.23
-
References
Philipp Koehn and Hieu Hoang. Factored Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
Aleš Tamchyna and Ondřej Bojar. No Free Lunch in Factored Phrase-Based Machine Translation. Institute of Formal and Applied Linguistics, 2013.