Enriching Morphologically Poor Languages for Statistical Machine Translation
Biplab Ch Das, Kritika Jain
Based on: Eleftherios Avramidis and Philipp Koehn, Proceedings of ACL-08: HLT, 2008


  • Enriching Morphologically Poor Languages

    for Statistical Machine Translation

    Biplab Ch Das, Kritika Jain

    Eleftherios Avramidis and Philipp Koehn,
    Proceedings of ACL-08: HLT, 2008

  • English sentence vs. Hindi sentence

    [Diagram: the English sentence leaves person, number, and gender largely implicit, while the Hindi sentence overtly marks person, number, gender, tense, mood, aspect, voice, verb inflections, case agreement, and possessive, nominative, and accusative cases.]

    Traditional SMT maps at the lexical level, so complex linguistic information is lost; yet predicting the correct morphological variant in the target language may depend on the role of the word in the source sentence.

  • In English, an NP is rendered the same whether it is a subject or an object, unlike in Hindi.

    Example:
    English: Ram saw a boy. / A boy came running.
    Hindi: Ram ne ek ladke ko dekha. / Ek ladka daudte hue aaya.

  • Morphology in phrase-based SMT: ran --> dauda (p1) / daudi (p2) / daude (p3). The probability of 'ran' being translated as 'dauda' becomes more uncertain as the number of candidates increases.

    Agreement (matching of words in gender, case, number, person, etc.): the rules of agreement are closely linked to the morphological structure of the language.

  • Why not phrase-based SMT or a language model?

    Phrase-based: fails when the words involved in agreement exceed the length of the chunk.

    LM: fails when the words involved in agreement exceed the target n-gram.

  • English sentence → SMT system → Hindi sentence

    [Diagram: case information, gender, and person are extracted from the English sentence and passed as tags to the SMT system that produces the Hindi sentence.]

  • Identification of the verb. What piece of linguistic information should be added as a tag?

  • Parse in a top-down manner.
    Search for two discrete nodes containing the verb and the corresponding subject.
    Recursively search for the subject until found.
    Identify the person of the subject and assign the tag to the verb and its subtrees.
    (A sketch of this search follows below.)
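
    A minimal Python sketch of this top-down search, assuming an nltk constituency Tree; the node-label set and the person map are illustrative, not the authors' exact rules:

    ```python
    from nltk import Tree

    def find_subject(tree):
        """Recurse top-down into subject-like nodes until a word is reached."""
        if tree.height() == 2:                 # preterminal: (POS word)
            return tree[0]
        for child in tree:
            if isinstance(child, Tree) and child.label() in (
                    "NP", "PRP", "NN", "NNS", "NNP", "EX"):
                return find_subject(child)     # recursively search for the subject
        return None

    parse = Tree.fromstring(
        "(S (NP (PRP I)) (VP (MD would) (VP (VB like) (NP (DT a) (NN cup)))))")
    subject = find_subject(parse)
    person = {"i": "1st", "we": "1st", "you": "2nd"}.get(subject.lower(), "3rd")
    print(subject, person)                     # -> I 1st (the tag for the verb)
    ```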

  • There are six horses running in the race.

    [Parse tree, approximately: (S (NP (EX There)) (VP (VBP are) (NP (CD six) (NNS horses)) (VP (VBG running) (PP (IN in) (NP (DT the) (NN race))))))]

  • There is a direct connection between the person of the verb and the subject of the sentence. In most cases it is directly inferred from a personal pronoun (I, you, etc.). Thus, when a pronoun existed, it was directly used as a tag.

    Example: I would like to have a cup of tea.

  • Any other pronoun in a different case (them, myself, etc.) is converted into the nominative case before being used as a tag.

    Example: Ram convinced them to do the work.

  • When the subject is a single noun, it is treated as third person. The POS tag of the noun is used to identify whether the subject is singular or plural (see the sketch after the examples).

    Example: Boys play football regularly.
    That boy plays football daily.
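
    A hypothetical sketch of these person-tagging rules; the pronoun table and the defaults are illustrative:

    ```python
    # Map non-nominative pronouns to their nominative forms (illustrative subset).
    NOMINATIVE = {"me": "i", "myself": "i", "us": "we", "them": "they",
                  "him": "he", "her": "she"}

    def person_tag(subject_word, pos_tag):
        word = NOMINATIVE.get(subject_word.lower(), subject_word.lower())
        if word in ("i", "we", "you", "he", "she", "it", "they"):
            return word                              # pronoun used directly as tag
        return "they" if pos_tag == "NNS" else "it"  # 3rd person plural vs. singular

    print(person_tag("Boys", "NNS"))   # -> 'they' (third person plural)
    print(person_tag("boy", "NN"))     # -> 'it'   (third person singular)
    print(person_tag("them", "PRP"))   # -> 'they' (converted to nominative)
    ```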

  • Gender does not affect the verb form in Greek, but in Hindi gender information needs to be tagged.

    She went for a ride with her friends.
    * vah apne doston ke saath ek savaari ke liye gayaa (incorrect masculine form)
    vah apne doston ke saath ek savaari ke liye gayi (correct feminine form)

  • Cases are mainly defined by the syntactic role of the noun phrase.

    Nominative case: used for the subject of the sentence.
    He has done his homework.

    Accusative case: used for the direct object.
    He filled the glass with water.

    Dative case: used for the indirect object of a bi-transitive verb.
    I gave the waiter a tip.

  • English is a morphologically poor language with fixed (subject-verb-object) word order, so subject and object can easily be identified using a simple parse tree. Given the agreement restriction, all words that accompany the noun (adjectives, articles, determiners) must follow the case of the noun.

    Once we have the word-case annotated data, factored SMT is trained on it to produce the correct inflections.

  • Parse the tree in a depth-first manner. The cases are identified within particular sub-tree patterns which are manually specified (a sketch follows below).

    Example:
    Initially, the S-(NP-VP) pattern is searched and nominative tags are assigned.
    Accusative tags are assigned similarly to the right-hand side and propagated into the nested sub-trees.
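
    A minimal sketch of the depth-first case assignment; the two patterns below are simplified stand-ins for the manually specified ones:

    ```python
    from nltk import Tree

    def tag_cases(tree, case=None, tags=None):
        """Walk the tree depth-first, emitting (word, case) pairs."""
        if tags is None:
            tags = []
        if tree.height() == 2:                      # preterminal: emit the word
            tags.append((tree[0], case))
        elif tree.label() == "S":
            for child in tree:                      # S-(NP-VP): left NP is nominative
                sub_case = "nominative" if child.label() == "NP" else case
                tag_cases(child, sub_case, tags)
        elif tree.label() == "VP":
            for child in tree:                      # NPs under VP get accusative
                sub_case = "accusative" if child.label() == "NP" else case
                tag_cases(child, sub_case, tags)
        else:                                       # propagate into nested sub-trees
            for child in tree:
                tag_cases(child, case, tags)
        return tags

    parse = Tree.fromstring(
        "(S (NP (NNP Ram)) (VP (VBD saw) (NP (DT a) (NN boy))))")
    print(tag_cases(parse))
    # [('Ram', 'nominative'), ('saw', None),
    #  ('a', 'accusative'), ('boy', 'accusative')]
    ```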

  • Factor Based Machine Translation

    Biplab Ch Das, Kritika Jain

  • Motivation

    This is a horse running. There are horses running.

  • Phrase-based Models: Flaw

    These models treat horse and horses as completely different words.

    Training occurrences of horse have no effect on learning the translation of horses.

    If we have only seen horse, we do not know how to translate horses.

  • Morphologically rich languages (the difficulty level increases)

    The word horse itself may have many forms in a morphologically rich language; we can see 24 different forms of ashwa in Sanskrit.

  • Better Approach

    A better approach is to analyze surface word forms into lemma and morphology: Horses -> Horse + plural.

    Translate the lemma and the morphology separately, then generate the target surface form (a toy illustration follows below).
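
    As a toy illustration (a real system would use a morphological analyzer, not a one-line suffix rule):

    ```python
    def analyze(surface):
        """Toy analysis of a surface form into (lemma, morphology)."""
        if surface.lower().endswith("s"):
            return surface[:-1], "plural"
        return surface, "singular"

    print(analyze("Horses"))   # -> ('Horse', 'plural')
    ```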

  • Factor based models

    We use not only the surface word for translation but also other factors of it.

  • Advantages

    Translation models that operate on more general representations, such as lemmas instead of surface forms of words, can draw on richer statistics and overcome the data sparseness problems caused by limited training data.

  • Advantages

    Many aspects of translation can be best explained on a morphological, syntactic, or semantic level.

    Having such information available to the translation model allows the direct modeling of these aspects.

  • Morphological Constraints on Translation

    We consider gender as a factor. English: She went to school. Google Translate: [masculine verb form]. But it must have been: [feminine verb form].

  • Re-ordering is at the syntactic level.

    English is basically SVO, e.g., Ram killed Ravana.

    Whereas Hindi is SOV, e.g., Ram ne Ravana ko maara.

  • Decomposition of Factored Translation

    1. Translate input lemmas into output lemmas.
    2. Translate morphological and POS factors.
    (Steps 1 and 2 operate at the phrase level.)
    3. Generate surface forms given the lemma and linguistic factors.
    (Step 3 operates at the word level; a toy sketch of all three steps follows below.)
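
    A toy end-to-end sketch of the three steps; every table entry here is a hypothetical illustration (real tables are learned from aligned data):

    ```python
    # Step 1: phrase-level lemma translation table.
    lemma_tt  = {("new", "house"): [("naya", "ghar")]}
    # Step 2: phrase-level factor (POS) translation table.
    factor_tt = {("JJ", "NNS"): [("JJ", "NNS")]}
    # Step 3: word-level generation table (lemma, POS) -> surface form.
    gen_table = {("naya", "JJ"): "naye", ("ghar", "NNS"): "ghar"}

    def translate(lemmas, factors):
        options = []
        for tgt_lemmas in lemma_tt.get(lemmas, []):
            for tgt_factors in factor_tt.get(factors, []):
                surface = [gen_table.get(p) for p in zip(tgt_lemmas, tgt_factors)]
                if None not in surface:          # generation licensed every word
                    options.append(" ".join(surface))
        return options

    print(translate(("new", "house"), ("JJ", "NNS")))   # -> ['naye ghar']
    ```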

  • Example

    Hindi (transliteration): naye ghar banaye gaye. English: new houses were built.

    Factored representation of houses: lemma house | part-of-speech NNS | count plural | case nominative | gender neutral.

    Let us consider only: lemma | part-of-speech NNS.

  • Mapping lemmas

    [Diagram: the lemmas of 'New houses were built' aligned to the corresponding Hindi lemmas.]

  • Mapping factors

    [Diagram: the source POS factors NNP NNS VBD VBN aligned to the target-side factors NNP, NNS, VBN, VBD.]

  • Generating the Surface Form

    Say mapping lemmas gives rise to: house, home, building, apartment.
    And mapping morphology gives: NNS -> NNS or NN.

    Then generating the surface form yields:
    houses|house|NNS, homes|home|NNS, buildings|building|NNS, shells|shell|NNS, house|house|NN

    (A code sketch of this step follows below.)
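
    A sketch of this generation step: cross the lemma options with the morphology options and keep only the pairs licensed by a generation table. The toy table below includes shell so the slide's output list is reproduced; apartment has no licensed form here and is filtered out:

    ```python
    lemma_options = ["house", "home", "building", "apartment", "shell"]
    morph_options = ["NNS", "NN"]
    generate = {("house", "NNS"): "houses",  ("house", "NN"): "house",
                ("home", "NNS"): "homes",    ("building", "NNS"): "buildings",
                ("shell", "NNS"): "shells"}   # toy table; the real one is learned

    options = [f"{generate[(l, m)]}|{l}|{m}"
               for l in lemma_options for m in morph_options
               if (l, m) in generate]
    print(options)
    # ['houses|house|NNS', 'house|house|NN', 'homes|home|NNS',
    #  'buildings|building|NNS', 'shells|shell|NNS']
    ```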

  • A note on training

    Word alignment methods like GIZA++ are used to obtain the word alignments. The same is done for the other factors.

    A phrase table is constructed for both lemmas and factors. The language and generation models are trained on the output data.

    Note that factored models used in practice are synchronous: the same segmentation into phrases is used for all translation steps.

  • Statistical Modeling

    1. The translation probability: the translation of the input sentence f into the output sentence e breaks down into a set of phrase translations $(\bar{f}_j, \bar{e}_j)$.

    We model this translation for all the factors.

  • Translation Models

    Each translation step contributes a feature of the form

    $h_T(\mathbf{e}, \mathbf{f}) = \prod_j \tau(\bar{f}_j, \bar{e}_j)$

    where $h$ is the function modeling the translation and $\tau$ is the scoring function for each phrase pair.

  • The language model

    For the language model we use only the target-side corpus; it is modeled as, say, a bigram model.
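
    Spelled out (the standard bigram decomposition, not shown on the slide):

    $p(\mathbf{e}) = \prod_{i=1}^{|\mathbf{e}|} p(e_i \mid e_{i-1})$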

  • The generation model

    Example: The/det man/nn sleeps/vbz

    Count collection: count(the,det)++, count(man,nn)++, count(sleeps,vbz)++

    Using MLE we get p(det|the), p(the|det), p(nn|man), p(man|nn), p(vbz|sleeps).

    We use these conditional probabilities as scores for the generation model (see the sketch below).
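
    A minimal sketch of the count collection and MLE estimation on the toy sentence above:

    ```python
    from collections import Counter

    pairs = [("the", "det"), ("man", "nn"), ("sleeps", "vbz")]
    pair_count = Counter(pairs)                     # count(word, tag)++
    word_count = Counter(w for w, _ in pairs)
    tag_count  = Counter(t for _, t in pairs)

    def p_tag_given_word(tag, word):                # e.g. p(vbz|sleeps)
        return pair_count[(word, tag)] / word_count[word]

    def p_word_given_tag(word, tag):                # e.g. p(the|det)
        return pair_count[(word, tag)] / tag_count[tag]

    print(p_tag_given_word("vbz", "sleeps"))        # -> 1.0
    print(p_word_given_tag("the", "det"))           # -> 1.0
    ```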


  • Combine using a Log-Linear Model

    For instance, we have various component models of a system (translation, language model, generation).

    We can combine these component models using a log-linear model:

    $p(x) = \frac{1}{Z} \exp \left( \sum_i \lambda_i h_i(x) \right)$

    where Z is the normalizing constant.

  • The model becomes

    We take the log-linear combination of each of the features to produce the complete statistical model (a numeric sketch follows below).
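
    A minimal numeric sketch of the combination; the weights and feature scores are made up:

    ```python
    import math

    def loglinear_score(features, weights):
        """Unnormalized score: exp(sum_i lambda_i * h_i)."""
        return math.exp(sum(weights[name] * h for name, h in features.items()))

    weights = {"translation": 1.0, "language_model": 0.6, "generation": 0.4}
    candidates = {
        "ghar banaa":  {"translation": -1.2, "language_model": -0.8, "generation": -0.3},
        "makaan bane": {"translation": -1.5, "language_model": -0.5, "generation": -0.4},
    }
    scores = {e: loglinear_score(h, weights) for e, h in candidates.items()}
    Z = sum(scores.values())                        # normalizing constant
    print({e: s / Z for e, s in scores.items()})    # probabilities summing to 1
    ```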

  • Learning the weights

    Weights in the log-linear model are determined using a minimum error rate training method, typically Powell's method.

    It is a variant of coordinate ascent: let one $\lambda_k$ vary while holding all the other $\lambda_i$ constant, and optimize the function in steps (see the sketch below).
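
    A toy coordinate-ascent sketch of the tuning idea (real MERT performs an exact line search on the error metric; this grid search only illustrates varying one weight at a time):

    ```python
    def coordinate_ascent(weights, objective, grid, sweeps=5):
        for _ in range(sweeps):
            for k in range(len(weights)):           # vary lambda_k, hold the rest
                best_val, best_obj = weights[k], objective(weights)
                for v in grid:
                    trial = weights[:k] + [v] + weights[k + 1:]
                    if objective(trial) > best_obj:
                        best_val, best_obj = v, objective(trial)
                weights[k] = best_val               # fix lambda_k, move on
        return weights

    # Hypothetical objective standing in for BLEU on a dev set:
    obj = lambda w: -(w[0] - 0.3) ** 2 - (w[1] - 0.7) ** 2
    print(coordinate_ascent([0.0, 0.0], obj, [i / 10 for i in range(11)]))
    # -> [0.3, 0.7]
    ```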

  • Efficient Decoding

    Compared to phrase-based models, the decomposition of phrase translation into several mapping steps creates additional computational complexity.

    Key insight: the execution of the mapping steps can be pre-computed and stored as translation options.
    1. Apply the mapping steps to all input phrases.
    2. Store the results as translation options.
    3. The decoding algorithm is unchanged. (A sketch follows below.)
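
    A sketch of the precomputation, with hypothetical mapping steps; the decoder later consumes the cached options unchanged:

    ```python
    def precompute_options(input_phrases, mapping_steps):
        """Run every mapping step once per input phrase and cache the results."""
        options = {}
        for phrase in input_phrases:
            candidates = [phrase]
            for step in mapping_steps:          # expand through each mapping step
                candidates = [out for c in candidates for out in step(c)]
            options[phrase] = candidates        # stored once, reused while decoding
        return options

    steps = [lambda p: [p.lower()],                              # toy normalization
             lambda p: {"ghar": ["house", "home"]}.get(p, [])]   # toy lemma lookup
    print(precompute_options(["Ghar"], steps))  # -> {'Ghar': ['house', 'home']}
    ```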

  • Example

    We can see that the translation options for ghar have already been precomputed.

  • Something is wrong

    It can cause a combinatorial explosion. Generating all of them is costly in terms of computational time and memory.

    Pruning will likely discard good hypotheses, as stacks will be filled with too many factor combinations.

    But consistency in translation options may help.

  • Consistency

    An expansion is considered consistent if the target side has the same length and if the shared factors match.
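
    A sketch of this consistency test, representing each target word as a dict of factors (the factor names are illustrative):

    ```python
    def consistent(partial, expansion, shared):
        """Keep an expansion only if lengths match and shared factors agree."""
        if len(partial) != len(expansion):      # target sides must align one-to-one
            return False
        return all(p[f] == e[f]
                   for p, e in zip(partial, expansion)
                   for f in shared if f in p and f in e)

    partial  = [{"pos": "DT"}, {"pos": "NN"}]
    good_exp = [{"pos": "DT", "lemma": "the"}, {"pos": "NN", "lemma": "house"}]
    bad_exp  = [{"pos": "ADJ"}, {"pos": "DT"}, {"pos": "NN"}]   # wrong length
    print(consistent(partial, good_exp, ["pos"]))   # -> True
    print(consistent(partial, bad_exp, ["pos"]))    # -> False
    ```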

  • Consistency in translation options

    The mapping DT NN -> ADJ DT NN is inconsistent with all of the mappings in Table 1.

  • Consistency in translation options

    The mappings DT NN -> is house and DT NN -> full building are discarded.

  • A Note on the Complexity of Factored Setups

    Estimate the average number of options for a single step.

    We cannot use the arithmetic mean, because extracted phrases obey a power law.

    Phrases that occur only once have only one translation in the phrase table.

    Very frequent phrases tend to have a large number of translations.

  • Formulation

    The frequency-weighted average ($t_i$ denotes the number of translations and $f_i$ the source phrase frequency):

    $\bar{t} = \frac{\sum_i f_i \, t_i}{\sum_i f_i}$
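
    The same frequency-weighted average in code, with made-up counts (two singleton phrases and one frequent, highly ambiguous phrase):

    ```python
    def weighted_avg_translations(pairs):
        """pairs: (t_i, f_i) = (number of translations, source frequency)."""
        total_f = sum(f for _, f in pairs)
        return sum(t * f for t, f in pairs) / total_f

    print(weighted_avg_translations([(1, 1), (1, 1), (12, 40)]))  # -> ~11.48
    ```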

  • The Order of application

    The order of application of the mapping steps plays a significant role. In this case, only consistent translation options can be generated during expansion.

    It limits the number of translation options generated from the existing options.

    It discards those partial options for which no consistent expansion exists.

  • Example

    For example, suppose that we define two separate translation steps:
    1. lemma -> lemma
    2. tag -> lemma

    Doing the mapping in order 1 -> 2 is good: first the words are mapped, and then the consistent tags are used for the translation options. But if order 2 -> 1 is followed, then tag -> lemma will map a tag to every lemma it can produce, a combinatorial explosion.

  • Experiments

    Experiments were run for the language pair German–English, using the 52,185-sentence News Commentary corpus.

    Using a part-of-speech language model, the BLEU score increases from 18.19% to 19.05%.

    But!!! Replacing the surface word translation mapping with a lemma/morphology mapping leads to a deterioration of performance, to a BLEU score of 14.46%.

  • Reason why this happens

    Note that this model completely ignores the surface forms of the input words and relies only on the more general lemma and morphology information.

    Though this allows the translation of word forms with a known lemma but an unknown surface form, on balance it seems to be a disadvantage to throw away surface form information.

  • The Alternative Approach

    To overcome this problem, an alternative-path model was introduced: translation options in this model may come either from the surface form model or from the lemma/morphology model.

    For surface forms with rich evidence in the training data, surface form mappings are preferred.

    For surface forms with poor or no evidence in the training data, the surface form is decomposed into lemma and morphology information, and these are mapped separately.


  • Various Other Factors

    We can consider various other factors, such as:

    t-lemma: the tectogrammatical lemma, i.e. the deep-syntactic lemma.

    functor: describes the syntactic-semantic relation of a node to its parent node. Its possible values include ACT (actor), PAT (patient) or ADDR (addressee).

    grammatemes: a set of factors that describe meaning-bearing morphological properties of t-nodes. We extracted the following categories:

    gender: grammatical gender.

    number: grammatical number.

  • Various Other Factors (continued)

    sempos: semantic part of speech. This factor classifies autosemantic words into 4 classes: nouns, adjectives, adverbs and verbs (with their respective subcategories).

    tense: specifies the tense of verbs.

    verbmod: indicates the verb mood.

    negation: indicator of negation.

    formeme: contains a projection of some morpho-syntactic information from the morphological and analytical layers.

  • German–English Systems

    System                     | BLEU (%) | BLEU (%)
    ---------------------------|----------|---------
    Baseline                   | 18.19    | 15.01
    With POS LM                | 19.05    | 15.03
    Morph-gen model            | 14.38    | 11.65
    Back-off lemma/morph model | 19.47    | 15.23

  • References

    Philipp Koehn and Hieu Hoang. Factored Translation Models. Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

    Aleš Tamchyna and Ondřej Bojar. No Free Lunch in Factored Phrase-Based Machine Translation. Institute of Formal and Applied Linguistics, 2013.

    Eleftherios Avramidis and Philipp Koehn. Enriching Morphologically Poor Languages for Statistical Machine Translation. Proceedings of ACL-08: HLT, 2008.