discriminative modeling extraction sets for machine translation author john denero and dan kleinuc...

Discriminative Modeling of Extraction Sets for Machine Translation
Authors: John DeNero and Dan Klein, UC Berkeley
Presenter: Justin Chiu

Contribution
Extraction set
◦ Nested collection of all the overlapping phrase pairs consistent with an underlying word alignment
Advantages over word-factored alignment models
◦ Can incorporate features on phrase pairs, not only on individual word links
◦ Optimizes an extraction-based loss function, which relates directly to generating translations
Performs better than both supervised and unsupervised baselines

Progress of Statistical MT
Generate translated sentences word by word
Use whole fragments of training examples to build translation rules
◦ Aligned at the word level
◦ Extract fragment-level rules from word-aligned sentence pairs
Tree-to-string translation
Extraction set models
◦ Set of all overlapping phrasal translation rules + alignment

Outline
Extraction Set Models
Model Estimation
Model Inference
Experiments

EXTRACTION SET MODELS

Extraction Set Models
Input
◦ Unaligned sentence pair
Output
◦ Extraction set of phrasal translation rules
◦ Word alignment

Extraction Sets from Word Alignments
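The slides do not spell out how an extraction set is read off a word alignment. A minimal sketch, assuming the standard phrase-extraction consistency rule from phrase-based MT (a bispan must contain at least one link, and no link may cross its boundary) and ignoring the possible/null link distinction covered next; the function name `extract_phrase_pairs` and the `max_len` limit are illustrative, not from the paper.

```python
def extract_phrase_pairs(src_len, tgt_len, links, max_len=3):
    """Return all bispans consistent with a word alignment (illustrative sketch).

    A bispan ((i1, i2), (j1, j2)) uses inclusive word indices. It is
    consistent if it contains at least one link and no link connects a
    word inside the bispan to a word outside it.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(tgt_len, j1 + max_len)):
                    inside = 0
                    consistent = True
                    for s, t in links:
                        src_in = i1 <= s <= i2
                        tgt_in = j1 <= t <= j2
                        if src_in != tgt_in:
                            consistent = False  # this link crosses the bispan boundary
                            break
                        if src_in:
                            inside += 1
                    if consistent and inside > 0:
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


print(extract_phrase_pairs(2, 2, [(0, 0), (1, 1)], max_len=2))
print(extract_phrase_pairs(2, 1, [(0, 0)]))
```

In the second call, source word 1 is unaligned, so it can attach to the phrase pair around source word 0: both ((0, 0), (0, 0)) and ((0, 1), (0, 0)) are extracted, which is how a single alignment yields nested, overlapping phrase pairs.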

Possible and Null Alignment Links
Possible links have two types
◦ Function words that are unique to their language
◦ Short phrases that have no lexical equivalent
Null alignment
◦ Expresses content that is absent in the translation

Interpreting Possible and Null Alignment Links

Linear Model for Extraction Sets

Scoring Extraction Sets
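A linear model scores an extraction set as a weight vector dotted with a feature vector. The sketch below assumes that the feature vector is a sum of local features over the alignment links and the extracted phrase pairs (bispans); the two feature templates (`link_features`, `phrase_features`) are hypothetical stand-ins, not the paper's feature set.

```python
from collections import Counter


def link_features(link, src, tgt):
    # Hypothetical template: one indicator per aligned word pair.
    s, t = link
    return Counter({f"link:{src[s]}|{tgt[t]}": 1.0})


def phrase_features(pair, src, tgt):
    # Hypothetical templates: bispan shape plus a phrase-count feature.
    (i1, i2), (j1, j2) = pair
    return Counter({f"shape:{i2 - i1 + 1}x{j2 - j1 + 1}": 1.0, "phrase_count": 1.0})


def feature_vector(links, pairs, src, tgt):
    """Sum local features over alignment links and extracted phrase pairs."""
    total = Counter()
    for link in links:
        total.update(link_features(link, src, tgt))
    for pair in pairs:
        total.update(phrase_features(pair, src, tgt))
    return total


def score(weights, links, pairs, src, tgt):
    """Linear score: dot product of the weights with the summed features."""
    phi = feature_vector(links, pairs, src, tgt)
    return sum(weights.get(name, 0.0) * value for name, value in phi.items())


src, tgt = ["a", "b"], ["x", "y"]
links = [(0, 0), (1, 1)]
pairs = [((0, 0), (0, 0)), ((0, 1), (0, 1)), ((1, 1), (1, 1))]
weights = {"link:a|x": 2.0, "phrase_count": 0.5, "shape:2x2": 1.0}
print(score(weights, links, pairs, src, tgt))
```

Because the score decomposes over links and bispans, a dynamic program can add each local score as it builds up an alignment.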

MODEL ESTIMATION

MIRA (Margin-Infused Relaxed Algorithm)
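MIRA adjusts the weights just enough that the gold extraction set outscores the current prediction by a margin equal to the prediction's loss. A minimal 1-best sketch, assuming sparse dict features and the usual closed-form step clipped by a regularization constant `C`; whether the paper uses 1-best or k-best constraints, and its value of `C`, are not stated on this slide.

```python
def mira_update(weights, gold_feats, pred_feats, loss, C=0.01):
    """One 1-best MIRA step on sparse features given as {name: value} dicts."""
    # Feature difference between the gold and the predicted extraction set.
    delta = {}
    for name in set(gold_feats) | set(pred_feats):
        d = gold_feats.get(name, 0.0) - pred_feats.get(name, 0.0)
        if d != 0.0:
            delta[name] = d
    norm_sq = sum(d * d for d in delta.values())
    if norm_sq == 0.0:
        return dict(weights)  # identical features: no update is possible
    margin = sum(weights.get(name, 0.0) * d for name, d in delta.items())
    # Smallest step making gold outscore the prediction by `loss`, clipped to [0, C].
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    updated = dict(weights)
    for name, d in delta.items():
        updated[name] = updated.get(name, 0.0) + tau * d
    return updated


print(mira_update({}, {"a": 1.0}, {"b": 1.0}, loss=2.0, C=10.0))
```

The clip at `C` keeps a single badly scored example from moving the weights too far; with a large `C`, the update above lands exactly on the margin constraint.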

Extraction Set Loss Function
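The slide gives only the title, not the loss itself. As an assumption for illustration, the sketch counts gold phrase pairs that the prediction misses plus predicted phrase pairs that are not in the gold extraction set, with an optional weight on each kind of error; the paper's exact weighting may differ. Phrase pairs are ((i1, i2), (j1, j2)) bispans with inclusive indices, and the function name is hypothetical.

```python
def extraction_set_loss(predicted, gold, miss_weight=1.0, extra_weight=1.0):
    """Hypothetical extraction-set loss over sets of bispans."""
    predicted, gold = set(predicted), set(gold)
    missed = len(gold - predicted)  # recall errors: gold phrase pairs not predicted
    extra = len(predicted - gold)  # precision errors: predicted pairs not in gold
    return miss_weight * missed + extra_weight * extra


gold = [((0, 0), (0, 0)), ((1, 1), (1, 1))]
predicted = [((0, 0), (0, 0)), ((0, 1), (0, 1))]
print(extraction_set_loss(predicted, gold))
```

Because a loss like this is computed over phrase pairs rather than word links, one misplaced link that destroys several extracted phrase pairs is penalized for each of them, which is the motivation for an extraction-based loss in MIRA training.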

MODEL INFERENCE

Possible Decompositions

DP for Extraction Sets
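The slide gives only the title. Below is a plain ITG Viterbi recursion over bispans, the kind of synchronous-grammar dynamic program that ITG-based inference is built around; it is a simplified stand-in, not the paper's extraction-set algorithm. Assumptions: terminals are single word links or single null-aligned words (not the block terminals of a block ITG), `link_score` and `null_score` are hypothetical inputs, and phrase-pair features are ignored. It runs in O(n^6) time, so it only suits toy inputs.

```python
import math
from functools import lru_cache


def itg_viterbi(link_score, null_score=0.0):
    """Best ITG derivation score for a sentence pair (simplified sketch).

    link_score[s][t]: hypothetical score for linking source word s to target word t.
    null_score: hypothetical score for leaving a single word unaligned.
    """
    n_src = len(link_score)
    n_tgt = len(link_score[0]) if n_src else 0

    @lru_cache(maxsize=None)
    def best(i, k, j, l):
        # Best derivation covering source span [i, k) and target span [j, l).
        src_len, tgt_len = k - i, l - j
        if src_len == 1 and tgt_len == 1:
            score = link_score[i][j]  # terminal: a single word link
        elif src_len + tgt_len == 1:
            score = null_score  # terminal: a single null-aligned word
        else:
            score = -math.inf
        if src_len + tgt_len < 2:
            return score
        for m in range(i, k + 1):
            for n in range(j, l + 1):
                # Straight rule: [i,m)x[j,n) then [m,k)x[n,l), same order on both sides.
                if (m - i) + (n - j) > 0 and (k - m) + (l - n) > 0:
                    score = max(score, best(i, m, j, n) + best(m, k, n, l))
                # Inverted rule: [i,m) pairs with [n,l), and [m,k) with [j,n).
                if (m - i) + (l - n) > 0 and (k - m) + (n - j) > 0:
                    score = max(score, best(i, m, n, l) + best(m, k, j, n))
        return score

    return best(0, n_src, 0, n_tgt)


print(itg_viterbi([[1.0, 0.0], [0.0, 1.0]]))  # monotone links, via the straight rule
print(itg_viterbi([[0.0, 1.0], [1.0, 0.0]]))  # swapped links, via the inverted rule
```

Each child bispan is required to cover at least one word, so every recursive call is on a strictly smaller bispan and the memoized recursion terminates. For a pseudo-gold ITG alignment, one hypothetical heuristic (not necessarily the paper's) is to set `link_score` to +1 for gold links and -1 elsewhere, so the best derivation is the reachable ITG alignment that agrees most with the gold one; recovering the links themselves would need backpointers, which this sketch omits.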

Finding Pseudo-Gold ITG Alignments

EXPERIMENTS

Five Systems for Comparison
Unsupervised baselines
◦ GIZA++
◦ Joint HMM
Supervised baseline
◦ Block ITG
Extraction set coarse pass
◦ Does not score bispans that cross the bracketing of ITG derivations
Full extraction set model

Data
Discriminative training and alignment evaluation
◦ Trained the baseline HMM on 11.3 million words of FBIS newswire data
◦ Hand-aligned portion of the NIST MT02 test set: 150 training and 191 test sentences
End-to-end translation experiments
◦ Trained on a 22.1 million word parallel corpus of newswire data from the GALE program, consisting of sentences of up to 40 words
◦ NIST MT04/MT05 test sets

Results

Discussion
Syntax labels vs. words
From word alignments to rules, and from rules to word alignments
◦ Information flows in both directions
65% of type 1 error

