Thai-English MT Project: Transfer Module
Prachya Boonkwan (IPA: /pratʃəˑjaː bunˑkʰʷan/)
NECTEC, Thailand
April 11, 2007, CERDEC, NJ
Outline
Introduction
Analysis Module
Transfer Module
Generation Module
Conclusion
Introduction
Thai-English Machine Translation
Thai Sentence -> Thai Analysis -> Thai-English Transfer -> English Generation -> English Sentence
Introduction (cont’d)
Characteristics of Thai
Analytic language
Subject-Verb-Object pattern
Words written consecutively without space
Serial verb construction
No articles and no mass/concrete classification
Use of classifiers
Auxiliary words to express number, voice, tense, and aspect
Introduction (cont’d)
Issues of Thai-English translation (summarized from Monthika’s observation)
Different orderings between Thai and English
Different verb arguments between Thai and English
Implicit relations in Thai serial noun construction
Semantic duplication in Thai serial verb construction
No plural inflection in Thai
No inflection to express voices, tenses, and aspects in Thai
Introduction (cont’d)
Issue 1: Different orderings between Thai and English
Noun phrase
sûer/N mài/ADJ sõrng/NUM tua/CL nán/DET
Lit: shirt/N new/ADJ two/NUM body/CL that/DET
Trans: Those/DET two/NUM new/ADJ shirts/N
Verb phrase
mâekrua/N triam/V aahãan/N yàang/ADVMARK rûatrew/ADJ
Lit: female-cook/N prepare/V meal/N ADVMARK rapid/ADJ
Trans: The female cook/N rapidly/ADJ+ADVMARK prepares/V meal/N
Introduction (cont’d)
Issue 2: Different verb arguments between Thai and English
chãn/PR pai/V fràngsès/N
Lit: I/PR go/V France/N
Trans: I/PR go/V to France/N
phaanrowng/N1 yùt/V1 sùub/V2 bùrìi/N2
Lit: janitor/N1 stop/V1 smoke/V2 cigarette/N2
Trans: The janitor/N1 stopped/V1 smoking/V2 cigarette/N2
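The first example under Issue 2 can be sketched as a table lookup: when the English verb needs a preposition that Thai leaves out, the transfer step inserts it. This is a minimal sketch only; the table `VERB_PREPOSITION` and the function `add_verb_argument_marker` are hypothetical names, not part of the project described in the slides.

```python
# Hypothetical mapping table: English verb lemma -> preposition it needs
# before its noun argument. The single entry is illustrative only.
VERB_PREPOSITION = {"go": "to"}


def add_verb_argument_marker(verb, noun):
    """Return English tokens for a Thai 'V N' pair.

    A preposition is inserted only if the (assumed) table lists one.
    """
    preposition = VERB_PREPOSITION.get(verb)
    if preposition is None:
        return [verb, noun]
    return [verb, preposition, noun]


print(add_verb_argument_marker("go", "France"))
```

A real system would presumably key such a table on more than the verb lemma, for example on the semantic concept of the noun.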
Introduction (cont’d)
Issue 3: Implicit relations in Thai serial noun construction
Adjective
châonâathîi/N1 kamphuuchaa/N2
Lit: officer/N1 Cambodia/N2
Trans: Cambodian/N2 officer/N1
Possession
phuumpanyaa/N1 banphábùrùt/N2
Lit: intelligence/N1 ancestor/N2
Trans: ancestor/N2’s intelligence/N1
Apposition
phûusàmàk/N1 khonnôrk/N2
Lit: applicant/N1 outsider/N2
Trans: applicant/N1, who is an outsider/N2,
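Once the implicit relation between the two nouns is known, the three renderings under Issue 3 differ only in word order and inserted material. The sketch below assumes the relation has already been decided elsewhere; `ADJECTIVE_FORM` and `render_noun_pair` are hypothetical names used for illustration.

```python
# Hypothetical lexicon of adjective forms; illustrative only.
ADJECTIVE_FORM = {"Cambodia": "Cambodian"}


def render_noun_pair(n1, n2, relation):
    """Render a Thai 'N1 N2' sequence in English for a given relation."""
    if relation == "adjective":
        # N2 becomes an adjective placed before N1.
        return f"{ADJECTIVE_FORM.get(n2, n2)} {n1}"
    if relation == "possession":
        # N2 becomes a possessor marked with 's.
        return f"{n2}'s {n1}"
    if relation == "apposition":
        # N2 becomes a non-restrictive relative clause after N1.
        article = "an" if n2[0].lower() in "aeiou" else "a"
        return f"{n1}, who is {article} {n2},"
    raise ValueError(f"unknown relation: {relation}")


print(render_noun_pair("officer", "Cambodia", "adjective"))
```

Choosing the relation is the hard part and is not modelled here.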
Introduction (cont’d)
Issue 4: Semantic duplication in Thai serial verb construction
raayngaan/V1 hâi/P sâab/V2
Lit: report/V1 to/P know/V2
Trans: report/V1 (to know/V2)
khâa/V1 hâi/P taay/V2
Lit: kill/V1 to/P die/V2
Trans: kill/V1 (to die/V2)
phûut/V1 hâi/P fang/V2
Lit: tell/V1 to/P listen/V2
Trans: tell/V1 (to listen/V2)
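The Issue 4 examples suggest dropping V2 when its meaning is already implied by V1. A minimal sketch, assuming a hand-built table of such pairs: `REDUNDANT_RESULT` and `fuse_serial_verbs` are hypothetical, and the fallback of keeping "to V2" for unlisted pairs is an assumption rather than something the slides state.

```python
# Hypothetical table of (V1, V2) pairs in which V2 duplicates part of V1's meaning.
REDUNDANT_RESULT = {("report", "know"), ("kill", "die"), ("tell", "listen")}


def fuse_serial_verbs(v1, v2):
    """Return the English tokens kept for a Thai 'V1 hâi V2' construction."""
    if (v1, v2) in REDUNDANT_RESULT:
        return [v1]           # V2 is redundant in English, so it is dropped
    return [v1, "to", v2]     # assumed fallback: keep V2 as an infinitive


print(fuse_serial_verbs("kill", "die"))
```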
Introduction (cont’d)
Issue 5: No plural inflection in Thai
Pluralization method: Numeral phrase
sùnák/N sãam/NUM tua/CL
Lit: dog/N three/NUM body/CL
Trans: three/NUM dogs/N
Pluralization method: Collective phrase
plaa/N sãam/NUM fũung/COL
Lit: fish/N three/NUM school/COL
Trans: three/NUM schools/COL of fish/N
Pluralization method: Duplication
dèk-dèk/N
Lit: child-child/N
Trans: children/N
Pluralization method: Pluralization marker
phûak/PLU nák-rian/N
Lit: group/PLU student/N
Trans: students/N
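The four pluralization methods can be sketched as checks over a tagged noun phrase. The function below only decides whether plurality is signalled at all; it works on English glosses for readability, and `is_plural` is a hypothetical name. In the collective case the plural ends up on the collective noun in English, which this sketch does not model.

```python
def is_plural(tagged):
    """tagged: list of (gloss, tag) pairs for one Thai noun phrase."""
    for gloss, tag in tagged:
        if tag == "PLU":
            return True       # explicit pluralization marker
        if tag == "NUM" and gloss != "one":
            return True       # numeral or collective phrase with a count above one
        if tag == "N":
            parts = gloss.split("-")
            if len(parts) == 2 and parts[0] == parts[1]:
                return True   # duplication, e.g. child-child
    return False


print(is_plural([("dog", "N"), ("three", "NUM"), ("body", "CL")]))
```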
Introduction (cont’d)
Issue 6: No inflection to express voices, tenses, and aspects in Thai
Past tense
chãn/PR koey/PAST pai/V fràngsès/N
Lit: I/PR PAST go/V France/N
Trans: I/PR went/V+PAST to France/N
Progressive aspect
chãn/PR kamlang/PROG dùem/V náam/N
Lit: I/PR PROG drink/V water/N
Trans: I/PR [am drinking]/V+PROG water/N
Passive voice
chãn/PR thùuk/PASS khruu/N thamthôt/V
Lit: I/PR PASS teacher/N punish/V
Trans: I/PR [am punished]/V+PASS by the teacher/N
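In all three Issue 6 examples an auxiliary word in Thai becomes a feature of the English verb. A minimal sketch of that idea: marker tokens are removed and their feature is attached to the verb. The attribute spellings `+past` and `+pass` and the function name are assumptions; only `+prog` appears later in the slides.

```python
# Marker tag -> verb attribute. "+past" and "+pass" are assumed spellings.
MARKER_ATTRIBUTE = {"PAST": "+past", "PROG": "+prog", "PASS": "+pass"}


def assign_verb_attributes(tagged):
    """Drop marker tokens and attach their attributes to the verb.

    tagged: list of (gloss, tag); returns a list of (gloss, tag, attributes).
    """
    attributes = [MARKER_ATTRIBUTE[tag] for _, tag in tagged if tag in MARKER_ATTRIBUTE]
    kept = [(gloss, tag) for gloss, tag in tagged if tag not in MARKER_ATTRIBUTE]
    return [(gloss, tag, attributes if tag == "V" else []) for gloss, tag in kept]


print(assign_verb_attributes([("I", "PR"), ("PROG", "PROG"), ("drink", "V"), ("water", "N")]))
```

The passive example also needs reordering and a "by" insertion, which this sketch leaves out.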
Analysis Module
Overview of Analysis Module
Thai Sentence -> Thai Word Segmentor -> List of words and POSes (w1 w2 w3) -> Thai Parser -> Thai Dep. Tree
Resource: Thai Grammar
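Because Thai words are written without spaces, the word segmentor has to recover word boundaries before parsing. The slides do not say which algorithm the project uses; the sketch below shows greedy longest matching against a lexicon purely as an illustration, with `segment` and the tiny lexicon as hypothetical names and data.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation.

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    words = []
    max_len = max(len(word) for word in lexicon)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])
            i += 1
    return words


# Romanized example from Issue 2, written without spaces.
print(segment("chãnpaifràngsès", {"chãn", "pai", "fràngsès"}))
```

Greedy matching can pick the wrong boundary when one word is a prefix of another, which is one reason practical segmenters use more context.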
Analysis Module (cont’d)
Thai Parser: input and output
Input: แบลร์ ส่ง ทหาร อังกฤษ ไป อิรัก ใน ปี 2003
Lit: Blair send soldier Britain to Iraq in 2003
Trans: ‘Blair sent British soldiers to Iraq in 2003.’
Output: dependency tree over the following nodes (the arcs of the figure are not recoverable from the transcript)
แบลร์ Blair/N
ส่ง send/VT
ทหาร soldier/N
อังกฤษ Britain/N
ไป to/P
อิรัก Iraq/N
ใน in/P
ปี 2003 2003/N
Transfer Module
Overview of Transfer Module
Thai Dep. Tree -> Thai-Eng Transformation -> English Dep. Tree -> Leaf-Node Collection -> List of lemmas annotated with syntactic attributes (w1 w2 w3)
Resource: Mapping Tables
Transfer Module (cont’d)
Attributes of Thai nouns
Syntax
  Number: singular, plural
  Person: first, second, third
  Gender: masculine, feminine, neuter
  Definiteness: definite, indefinite
  Type: antecedent, anaphora
Semantics
  Concept: human, place, etc.
  Role: organization, etc.
  Domain: military, criminal, etc.
  Reference: human, place, etc.
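The noun attributes listed above map naturally onto a record type in which every attribute is optional until some transfer operation assigns it. This is a sketch of one possible representation, not the project's data structure; the class name and field names are assumptions derived from the attribute names on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounAttributes:
    # Syntactic attributes
    number: Optional[str] = None        # "singular" or "plural"
    person: Optional[str] = None        # "first", "second" or "third"
    gender: Optional[str] = None        # "masculine", "feminine" or "neuter"
    definiteness: Optional[str] = None  # "definite" or "indefinite"
    noun_type: Optional[str] = None     # "antecedent" or "anaphora"
    # Semantic attributes
    concept: Optional[str] = None       # e.g. "human", "place"
    role: Optional[str] = None          # e.g. "organization"
    domain: Optional[str] = None        # e.g. "military", "criminal"
    reference: Optional[str] = None     # e.g. "human", "place"


soldier = NounAttributes(number="plural", concept="human", domain="military")
print(soldier)
```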
Transfer Module (cont’d)
Attributes of Thai verbs
Syntax
  Number: singular, plural
  Time: present, past, future
  Gender: masculine, feminine, neuter
  Type: antecedent, anaphora
Semantics
  Concept: dynamic_place, etc.
  Domain: military, criminal, etc.
  Direction: inward, outward, etc.
  Reference: dynamic_place, etc.
Transfer Module (cont’d)
Transfer operations
Noun phrase: reordering, adjectivization, possessive insertion (-’s), ‘of’ insertion, appositivization, classifier dropping, collective restructuring, number assignment, possessivization
Verb phrase: reordering, VP structure selection, adverbialization, participialization (present, past), infinitivization (with ‘to’/without ‘to’), tense/aspect assignment
Sentence: reordering, sentence structure selection
Transfer Module (cont’d)
Reordering (R): relocates constituents, resulting in a quasi-English dependency tree
Attribute assignment (A): assigns English syntactic attributes to the quasi-English tree
Insertion (I) & Deletion (D): insert constituents into, or delete them from, the quasi-English dependency tree, resulting in the English tree
Transfer Module (cont’d)
Transfer operations classified into groups
Reordering: constituent reordering, collective restructuring, VP structure selection, sentence structure selection
Attribute assignment: number assignment, tense/aspect assignment, adjectivization, possessivization, participialization, adverbialization
Insertion: possessive insertion (-’s), ‘of’ insertion, appositivization, infinitivization
Deletion: classifier dropping, serial verb fusion
Transfer Module (cont’d)
Steps of transfer operations
Thai Dep. Tree -> Reordering -> Quasi-English Dep. Tree -> Attribute assignment -> Insertion & Deletion -> English Dep. Tree
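The step order above (reordering, then attribute assignment, then insertion and deletion) can be sketched as a stable sort of operations by group. The `STAGE` table and `schedule` function are hypothetical; the operation names are taken from the group classification in the slides.

```python
# Stage index per operation group; insertion and deletion share the last stage.
STAGE = {"R": 0, "A": 1, "I": 2, "D": 2}


def schedule(operations):
    """Order (name, group) pairs by stage.

    sorted() is stable, so operations within one stage keep their given order.
    """
    return sorted(operations, key=lambda operation: STAGE[operation[1]])


operations = [
    ("classifier dropping", "D"),
    ("number assignment", "A"),
    ("constituent reordering", "R"),
    ("'of' insertion", "I"),
]
for name, group in schedule(operations):
    print(group, name)
```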
Transfer Module (cont’d)
Graphical notations: tree pattern
[Figure: two example tree patterns; the arcs are labelled ‘any depth’ and ‘only one depth’]
W1 < *W2
(*W3 > (*W4 > *W1)) < *W2
Transfer Module (cont’d)
Graphical notation: transfer operation
[Figure: a source tree over N, V and ADV, an OPERATION arrow labelled R, and the resulting tree]
(N > V) < *ADV --> N > (*ADV > V) {R}
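The textual form of a transfer operation has three parts: a source pattern, a target pattern, and a group code in braces. The sketch below only splits a rule string into those parts with a regular expression; it does not interpret the tree patterns themselves, and `parse_rule` is a hypothetical helper.

```python
import re

# <source pattern> --> <target pattern> {<group>}, where group is R, A, I or D.
RULE = re.compile(r"^(?P<lhs>.+?)\s*-->\s*(?P<rhs>.+?)\s*\{(?P<group>[RAID])\}$")


def parse_rule(text):
    """Split a rule string into (source pattern, target pattern, group)."""
    match = RULE.match(text.strip())
    if match is None:
        raise ValueError(f"not a transfer rule: {text!r}")
    return match.group("lhs"), match.group("rhs"), match.group("group")


print(parse_rule("(N > V) < *ADV --> N > (*ADV > V) {R}"))
```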
Transfer Module (cont’d)
Demonstration
Tree leaves: dog/N body/CL big/ADJ three/NUM body/CL this/DET -/PROG bark/V I/PR
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: body/CL big/ADJ dog/N three/NUM body/CL this/DET -/PROG bark/V I/PR
Rule applied: N < ADJ --> ADJ > N {R}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: three/NUM body/CL body/CL big/ADJ dog/N this/DET -/PROG bark/V I/PR
Rule applied: N < NUM --> NUM > N {R}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET three/NUM body/CL body/CL big/ADJ dog/N -/PROG bark/V I/PR
Rule applied: N < DET --> DET > N {R}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR
Rule applied: NUM > N --> NUM > N[+plu] {A}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR
Rule applied: DET > N[+plu] --> DET[+plu] > N[+plu] {A}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR[+acc]
Rule applied: V < PR --> V < PR[+acc] {A}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] I/PR[+acc]
Rule applied: PROG > V --> PROG > V[+prog] {A}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]
Rule applied: bark < PR --> bark < (at < PR) {I}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]
Rule applied: NUM < CL --> NUM {D}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]
Rule applied: CL > ADJ --> ADJ {D}
Transfer Module (cont’d)
Demonstration (cont’d)
Tree leaves: this/DET[+plu] three/NUM big/ADJ dog/N[+plu] bark/V[+prog] at I/PR[+acc]
Rule applied: PROG > V[+prog] --> V[+prog] {D}
Transfer Module (cont’d)
Demonstration (cont’d)
English dependency tree leaves: this/DET[+plu] three/NUM big/ADJ dog/N[+plu] bark/V[+prog] at I/PR[+acc]
Leaf-Node Collection: this[+plu] three big dog[+plu] bark[+prog] at I[+acc]
These three big dogs are barking at me.
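The demonstration can be replayed on a small head-dependent tree. The sketch below is a simplified reading of the slides, not the project's implementation: each rule is hard-coded rather than matched from a pattern, the attachment of the two classifiers is inferred from the deletion rules, and the tag `P` given to the inserted "at" is an assumption. `Node`, `build_demo_tree` and `transfer` are hypothetical names.

```python
class Node:
    def __init__(self, lemma, tag, left=None, right=None):
        self.lemma, self.tag = lemma, tag
        self.attrs = set()
        self.left = left or []    # dependents that precede the head
        self.right = right or []  # dependents that follow the head

    def walk(self):
        yield self
        for child in self.left + self.right:
            yield from child.walk()

    def leaves(self):
        """Leaf-node collection: left dependents, head, right dependents."""
        out = []
        for child in self.left:
            out += child.leaves()
        out.append(self.lemma + "".join(sorted(self.attrs)))
        for child in self.right:
            out += child.leaves()
        return out


def build_demo_tree():
    dog = Node("dog", "N", right=[
        Node("big", "ADJ", left=[Node("body", "CL")]),
        Node("three", "NUM", right=[Node("body", "CL")]),
        Node("this", "DET"),
    ])
    return Node("bark", "V", left=[dog, Node("-", "PROG")], right=[Node("I", "PR")])


def transfer(root):
    nodes = list(root.walk())
    for n in nodes:  # {R}: N < X --> X > N for X in ADJ, NUM, DET
        if n.tag == "N":
            moved = [c for c in n.right if c.tag in ("ADJ", "NUM", "DET")]
            n.right = [c for c in n.right if c not in moved]
            n.left = moved[::-1] + n.left
    for n in nodes:  # {A}: number, case and aspect attributes
        tags = {c.tag for c in n.left + n.right}
        if n.tag == "N" and "NUM" in tags:
            n.attrs.add("[+plu]")
            for c in n.left:
                if c.tag == "DET":
                    c.attrs.add("[+plu]")
        if n.tag == "V":
            if "PROG" in tags:
                n.attrs.add("[+prog]")
            for c in n.right:
                if c.tag == "PR":
                    c.attrs.add("[+acc]")
    for n in nodes:  # {I}: bark < PR --> bark < (at < PR)
        if n.lemma == "bark":
            n.right = [Node("at", "P", right=[c]) if c.tag == "PR" else c
                       for c in n.right]
    for n in nodes:  # {D}: drop classifiers and the progressive marker
        drop = {"NUM": "CL", "ADJ": "CL", "V": "PROG"}.get(n.tag)
        n.left = [c for c in n.left if c.tag != drop]
        n.right = [c for c in n.right if c.tag != drop]
    return root


print(" ".join(transfer(build_demo_tree()).leaves()))
```

Reversing the moved dependents reproduces the order in which the slides apply the three reordering rules: the adjective ends up closest to the noun and the determiner furthest away.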
Generation Module
Overview of generation module
List of lemmas annotated with syntactic attributes (w1 w2 w3) -> Surface Word Generation -> Surface Word List (W’1 W’2 W’3) -> Article Insertion -> English Output Sentence (E1 E2 E3)
Resources: Lemma Mapping Tables, Discourse Stack DB
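Surface word generation turns each lemma and its attributes into an inflected English word. A minimal sketch for the demonstration sentence, assuming a tiny hand-written table in place of the lemma mapping tables; `SURFACE` and `generate` are hypothetical, and article insertion and the discourse stack are not modelled.

```python
# Hypothetical lemma mapping table: (lemma, attribute) -> surface form.
SURFACE = {
    ("this", "+plu"): "these",
    ("dog", "+plu"): "dogs",
    ("bark", "+prog"): "barking",
    ("I", "+acc"): "me",
}


def generate(units):
    """units: list of (lemma, attribute or None); returns an English sentence."""
    words = []
    plural_subject = False
    for lemma, attribute in units:
        form = SURFACE.get((lemma, attribute), lemma)
        if attribute == "+plu":
            plural_subject = True
        if attribute == "+prog":
            # The progressive needs an auxiliary that agrees with the subject.
            form = ("are " if plural_subject else "is ") + form
        words.append(form)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


print(generate([("this", "+plu"), ("three", None), ("big", None), ("dog", "+plu"),
                ("bark", "+prog"), ("at", None), ("I", "+acc")]))
```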
Conclusion
Thai-English Machine Translation
Thai Sentence -> Thai Analysis -> Thai Dep. Tree -> Thai-English Transfer -> List of lemmas annotated with syntactic attributes (w1 w2 w3) -> English Generation -> English Sentence
Conclusion (cont’d)
Issues of Thai-English translation
Attributes of Thai lexical units
Generalized transfer operations
  Reordering
  Attribute assignment
  Insertion
  Deletion