Tree Automata for Automatic Language Translation
Kevin Knight
Information Sciences Institute
University of Southern California
Outline
• History of the World (of Automata in NLP)
• Weighted string automata in NLP
  – Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
  – Generic algorithms and toolkits
• Weighted tree automata in NLP
  – Applications
  – Generic algorithms and toolkits
• Some connections with theory
History of the World
[Markov 1913]  consonant/vowel sequences in Pushkin novels
[Shannon 1948]  noisy channel model, cryptography
[Chomsky 1956, 1957]  context-free grammars, transformational grammars
[Rounds 1970] & [Thatcher 1970]  tree transducers, to formalize transformational grammars
Transformational Grammar
(S (NP (DT the) (N boy))
   (VP (V saw) (NP (DT the) (N door))))

        →

(S (NP (DT the) (N door))
   (VP (AUX was) (V seen)
       (PP (P by) (NP (DT the) (N boy)))))

“the boy saw the door”  →  “the door was seen by the boy”
History of the World

[Thatcher 1973]  tree automata survey article:
“The number one priority in the area [of tree automata theory] is a careful assessment of the significant problems concerning natural language and programming language semantics and translation. If such problems can be found and formulated, I am convinced that the approach informally surveyed here can provide a unifying framework within which to study them.”
History of the World
[Diagram: Linguistics, Tree Automata Theory, Computers]
History of the World
LINGUISTICS: Let’s drop formalism until we understand things better!
NATURAL LANGUAGE PROCESSING: Let’s build demo systems!
THEORY: Let’s prove theorems!
Natural Language Processing
• 1970s–80s
  – models of English syntax, demonstration grammars
  – beyond CFG:
    • augmented transition networks (ATN)
    • unification-based grammars (HPSG, LFG, ...)
  – mostly turned out to be formally equivalent to each other … and to Turing machines
    • tree-adjoining grammar (TAG), categorial grammar: mildly context-sensitive grammars
• Meanwhile, in speech recognition…
  – probabilistic finite-state grammars of English
  – built automatically from training data (corpus)
  – word n-grams
  – a successful paradigm
Natural Language Processing
• 1993
  – US agency DARPA presided over a forced marriage of speech and language research
• 1990s
  – NLP dominated by probabilistic finite-state string formalisms and automatic training
  – weighted FSA/FST toolkits
• 2000s
  – re-awakened interest in tree formalisms for modeling syntax-sensitive operations
Back to the Outline
• History of the World (of Automata for NLP)
• Weighted string automata in NLP
  – Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
  – Generic algorithms and toolkits
• Weighted tree automata in NLP
  – Applications
  – Generic algorithms and toolkits
• Some connections with theory
Natural Language Transformations
• Machine Translation
• Name Transliteration
• Compression
• Question Answering
• Spelling Correction
• Speech Recognition
• Language Generation
• Text to Speech

Input → Output
Finite-State Transducer (FST)
Original input: k n i g h t

Transition rules (state, input, next state, output; *e* = epsilon):
q   k  q2      *e*
q2  n  q       N
q   i  q       AY
q   g  q3      *e*
q3  h  q4      *e*
q4  t  qfinal  T
Finite-State (String) Transducer

Stepping through the input, the transducer applies one rule per symbol:

  q –k:*e*→ q2 –n:N→ q –i:AY→ q –g:*e*→ q3 –h:*e*→ q4 –t:T→ qfinal

Transformation:  k n i g h t  →  N AY T
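The derivation above can be sketched in a few lines of Python. This is a deterministic toy, assuming a rule table keyed by (state, input symbol); real WFSTs are nondeterministic and weighted.

```python
# Toy sketch of the slide's FST applied to "knight".
# Each rule maps (state, input symbol) to (next state, output symbol);
# None stands for the epsilon output *e*.
RULES = {
    ("q", "k"): ("q2", None),      # k : *e*
    ("q2", "n"): ("q", "N"),       # n : N
    ("q", "i"): ("q", "AY"),       # i : AY
    ("q", "g"): ("q3", None),      # g : *e*
    ("q3", "h"): ("q4", None),     # h : *e*
    ("q4", "t"): ("qfinal", "T"),  # t : T
}

def apply_fst(word, start="q", final="qfinal"):
    state, out = start, []
    for ch in word:
        state, emit = RULES[(state, ch)]
        if emit is not None:
            out.append(emit)
    assert state == final, "input rejected"
    return " ".join(out)

print(apply_fst("knight"))  # -> N AY T
```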
Transliteration
Angela Knight  ↔  a n ji ra na i to

Transliteration is a frequently occurring translation problem across languages with different sound systems and character sets (Japanese, Chinese, Arabic, Russian, English, …). It can’t be solved by dictionary lookup.
Forward and Backward Transliteration
Forward transliteration (some variation allowed):   Angela Knight → a n ji ra na i to
Backward transliteration (no variation allowed):    a n ji ra na i to → Angela Knight
Practical Problem
Transliteration
Angela Knight  –WFST→  a n ji ra na i to
(7 input symbols, 13 output symbols)
Transliteration: noisy channel framework

WFSA, P(e):      generates/accepts well-formed English sequences
WFST, P(k | e):  makes transformations without worrying too much about context

DECODE:  argmax_e P(e | k) = argmax_e P(e) · P(k | e)
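The decision rule can be illustrated with a toy sketch. The probability tables below are invented for illustration, standing in for the WFSA P(e) and WFST P(k | e); a real system would score all paths through composed automata rather than enumerate candidates.

```python
# Toy noisy-channel decoder: argmax_e P(e | k) = argmax_e P(e) * P(k | e).
# All numbers below are made up for illustration.
P_e = {"angela knight": 0.6, "angela nite": 0.1, "anjira naito": 0.3}
P_k_given_e = {  # P(Japanese sound sequence | English string)
    ("a n ji ra na i to", "angela knight"): 0.5,
    ("a n ji ra na i to", "angela nite"): 0.7,
    ("a n ji ra na i to", "anjira naito"): 0.2,
}

def decode(k):
    # Pick the English string maximizing the product of the two models.
    return max(P_e, key=lambda e: P_e[e] * P_k_given_e.get((k, e), 0.0))

print(decode("a n ji ra na i to"))  # -> angela knight
```

Note that the channel model alone prefers "angela nite" (0.7); the language model P(e) overrules it, which is the point of the framework.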
Transliteration: the “generative story”

Angela Knight → WFSA A → WFST B → WFST C → WFST D → a n ji ra na i to
(intermediate English phoneme sequence: AE N J EH L UH N AY T)

DECODE: run the cascade backwards from “a n ji ra na i to”; each stage yields millions of candidate sequences, e.g.
AE N J IH R UH N AY T
AH N J IH L UH N AY T OH
+ millions more
Machine Translation
美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件,威胁将会向机场等公众地方发动生化袭击後,关岛经保持高度戒备。
The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.
Machine Translation
“I see a Spanish sentence on the page. How did it get there?”

direct model vs. noisy channel model
Machine Translation
[Brown et al 93] [Knight & Al-Onaizan 98]

The translation model is built up as a cascade of weighted automata: WFSA A, WFSA B, WFSA C, WFSA D, WFSA E.
Other Applications of Weighted String Automata in NLP
• speech recognition [Pereira, Riley, Sproat 94]
• lexical processing
  – word segmentation [Sproat et al 96]
  – morphological analysis/generation [Kaplan and Kay 94; Clark 02]
• tagging
  – part-of-speech tagging [Church 88]
  – name finding
• summarization [Zajic, Dorr, Schwartz 02]
• optical character recognition [Kolak, Byrne, Resnik 03]
• decipherment [Knight et al 06]
Algorithms for String Automata
N-best                  … paths through a WFSA (Viterbi 1967; Eppstein 1998)
EM training             forward-backward EM (Baum & Welch 1971; Eisner 2001)
Determinization         … of weighted string acceptors (Mohri 1997)
Intersection            WFSA intersection
Application             string ∘ WFST → WFSA
Transducer composition  WFST composition (Pereira & Riley 1996)
String Automata Toolkits Used in NLP
• Unweighted
  – Xerox finite-state calculus, plus many children
• Weighted
  – AT&T FSM, plus many children: Google OpenFST, ISI Carmel, Aachen FSA, DFKI FSM toolkit, MIT FST toolkit, …
String Automata Toolkits Used in NLP
% echo 'a n ji ra ho re su te ru na i to' | carmel -rsi -k 5 -IEQ
      word.names.50000wds.transducer        /* wfsa */
      word-epron.names.55000wds.transducer  /* wfst */
      epron-jpron.1.transducer              /* wfst */
      jpron.transducer                      /* wfst */
      vowel-separator.transducer            /* wfst */
      jpron-asciikana.transducer            /* wfst */

ANGELA FORRESTAL KNIGHT   2.60e-20
ANGELA FORRESTER KNIGHT   6.00e-21
ANGELA FOREST EL KNIGHT   1.91e-21
ANGELA FORESTER KNIGHT    1.77e-21
ANGELA HOLLISTER KNIGHT   1.33e-21
The Beautiful World of Composable Transducers

[Diagram: a network of composable machines linking representations. Nodes: English word sequence, foreign word sequence, English phoneme sequence, foreign phoneme sequence, long English word sequence. Edges (weighted acceptors/transducers): P(e), P(f|e), P(e|f), P(p|e), P(e|p), P(r|e), P(f|r), P(r), P(l|e).]
Finite-State String Transducers
Nice properties, nice toolkits.

[Chart: translation accuracy in the NIST Common Evaluations, 2002–2006, for phrase substitution/transposition systems]
Finite-State String Transducers
• Not expressive enough for many problems! For example, machine translation:
  – Arabic to English: move the verb from the beginning of the sentence to the middle (in between the subject and object)
  – Chinese to English: when translating noun-phrase “de” noun-phrase, flip the order of the noun-phrases and substitute “of” for “de”
Experimental Progress in Statistical Machine Translation

[Chart: translation accuracy in the NIST Common Evaluations, 2002–2006: phrase substitution with no linguistic categories vs. tree transformation with linguistic categories]
Syntax Started to Be Helpful in 2006

[Chart: Chinese/English translation accuracy (30–45), April 2005 – February 2007, string-based vs. syntax-based systems, on all sentences (NIST-2003) and on sentences < 16 words (NIST-03/04)]
String-Based Output

Chinese input: 枪手 被 警方 击毙 。

Decoder hypothesis #1:      Gunman of police killed .
Decoder hypothesis #7:      Gunman of police attack .
Decoder hypothesis #12:     Gunman by police killed .
Decoder hypothesis #134:    Killed gunman by police .
Decoder hypothesis #9,329:  Gunman killed the police .
Decoder hypothesis #50,654: Gunman killed by police .   (highest-scoring output, phrase-based model)

Problematic:
• VBD “killed” needs a direct object
• VBN “killed” needs an auxiliary verb (“was”)
• countable “gunman” needs an article (“a”, “the”)
• the “passive marker” 被 in Chinese controls re-ordering
Can’t enforce/encourage any of this!
Tree-Based Output

Chinese input: 枪手 被 警方 击毙 。

Decoder hypothesis #1:    The gunman killed by police .      (DT NN VBD IN NN; NPB, PP, NP-C, VP, S)
Decoder hypothesis #16:   Gunman by police shot .            (NN IN NN VBD; NPB, PP, NP-C, VP, S)
Decoder hypothesis #1923: The gunman was killed by police .  (DT NN AUX VBN IN NN; NPB, PP, NP-C, VP, S)
                          (highest-scoring output, syntax-based model)
OK, so how does a Chinese string transform into an English tree, or vice-versa?
Back to the Outline
• History of the World (of Automata for NLP)
• Weighted string automata in NLP
  – Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
  – Generic algorithms and toolkits
• Weighted tree automata in NLP
  – Applications
  – Generic algorithms and toolkits
• Some connections with theory
Original input:

(S (NP (PRO he))
   (VP (VBZ enjoys)
       (NP (SBAR (VBG listening)
                 (VP (P to) (NP music))))))

Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)

The transducer transforms the tree from the root downward. The rule at S re-orders the subject, object, and verb, inserting the particles “wa” and “ga”; then each subtree is transformed in turn (e.g., NP(PRO(he)) → “kare”).

Final output:  kare wa ongaku o kiku no ga daisuki desu
Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)

With states and weights, each step applies a rule matching the current state and the root of the current subtree, e.g.:

q S(x0:NP VP(x1:VBZ x2:NP)) → s x0, “wa”, r x2, “ga”, q x1    (prob 0.2)
s NP(PRO(he)) → “kare”                                        (prob 0.7)

Starting in state q at the root of the input tree, the first rule re-orders subject, verb, and object, inserts the particles, and sends each subtree onward in a new state (s, r, q); the second rule then translates the subject. Continuing until every subtree is consumed yields the final output:

kare wa ongaku o kiku no ga daisuki desu

To get the total probability, multiply the probabilities of the individual steps.
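The weighted derivation can be sketched as recursive rule application. Only the 0.2 and 0.7 probabilities come from the slides; the other rules (the VBZ translation and the object-NP stand-in) are invented placeholders for the rest of the rule set.

```python
# Toy top-down tree transducer, following the slides' derivation.
# Trees are nested tuples (label, *children); each state function returns
# (probability, output words). Probabilities other than 0.2 and 0.7 are invented.

def q(t):
    if t[0] == "S":  # q S(x0:NP VP(x1:VBZ x2:NP)) -> s x0, "wa", r x2, "ga", q x1   (p = 0.2)
        x0, (_, x1, x2) = t[1], t[2]
        p0, o0 = s(x0)
        p2, o2 = r(x2)
        p1, o1 = q(x1)
        return 0.2 * p0 * p1 * p2, o0 + ["wa"] + o2 + ["ga"] + o1
    if t == ("VBZ", "enjoys"):           # invented lexical rule (p = 1.0)
        return 1.0, ["daisuki", "desu"]

def s(t):
    if t == ("NP", ("PRO", "he")):       # s NP(PRO(he)) -> "kare"                   (p = 0.7)
        return 0.7, ["kare"]

def r(t):                                # invented stand-in for the object subtree  (p = 1.0)
    return 1.0, ["ongaku", "o", "kiku", "no"]

tree = ("S", ("NP", ("PRO", "he")), ("VP", ("VBZ", "enjoys"), "OBJ"))
p, words = q(tree)
print(round(p, 3), " ".join(words))  # -> 0.14 kare wa ongaku o kiku no ga daisuki desu
```

The total probability 0.2 × 0.7 × 1.0 × 1.0 = 0.14 is exactly the "multiply the individual steps" computation described above.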
Top-Down Tree Transducer
• Introduced by Rounds (1970) & Thatcher (1970):
  “Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting … Our results should clarify the nature of syntax-directed translations and transformational grammars …”
  (Rounds 1970, “Mappings on Grammars and Trees”, Math. Systems Theory 4(3))
• Large theory literature, e.g., Gécseg & Steinby (1984), Comon et al (1997)
• Once again re-connecting with NLP practice, e.g., Knight & Graehl (2005), Galley et al (2004, 2006)
Tree Transducers Can be Extracted from Bilingual Data (Galley, Hopkins, Knight, Marcu, 2004)

i felt obliged to do my part
我 有 责任 尽 一份 力

RULES ACQUIRED:
VBD(felt) -> 有
VBN(obliged) -> 责任
VB(do) -> 尽
NN(part) -> 一份
NN(part) -> 一份 力
VP-C(x0:VBN x1:SG-C) -> x0 x1
VP(TO(to) x0:VP-C) -> x0
S(x0:NP-C x1:VP) -> x0 x1
…

[English parse tree over the sentence, with nodes S, NP-C, VP, VBD, SG-C, TO, VP-C, VB, NPB, PRP, PRP$, NN]

This is a tree-to-string transducer, used (noisy-channel-wise) to do string-to-tree translation.
Additional extraction methods: (Galley et al, 2006), (Marcu et al, 2006). Current systems learn ~500m rules.
Sample “said that” rules
0.57 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说 , x0
0.09 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说 x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 他 说 , x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 指出 , x0
0.02 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> x0
0.01 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 表示 x0
0.01 VP(VBD("said") SBAR-C(IN("that") x0:S-C)) -> 说 , x0 的
Sample Subject-Verb-Object Rules
CHINESE / ENGLISH
0.82 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1 x2 x3
0.02 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1 , x2 x3
0.01 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 , x1 x2 x3

ARABIC / ENGLISH
0.54 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x0 x1 x2 x3
0.44 S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) -> x1 x0 x2 x3
Decoding
• argmax_etree P(etree | cstring)
• Difficult search problem
  – bottom-up CKY parser
  – builds English constituents on top of Chinese spans
  – the record of rule applications (the derivation) provides information to construct the English tree
  – returns k-best trees

这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .

Rules apply when their right-hand sides (RHS) match some portion of the input.
Syntax-Based Decoding

Input: 这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .

Lexical rules fire first, wherever their RHSs match:

RULE 1: DT(these) -> 这
RULE 2: VBP(include) -> 中包括
RULE 4: NNP(France) -> 法国
RULE 5: CC(and) -> 和
RULE 6: NNP(Russia) -> 俄罗斯
RULE 8: NP(NNS(astronauts)) -> 宇航 , 员
RULE 9: PUNC(.) -> .

giving “these” “include” “France” “and” “Russia” “astronauts” “.” over the matched spans. Larger rules then combine constituents, bottom-up:

RULE 13: NP(x0:NNP, x1:CC, x2:NNP) -> x0 , x1 , x2
         “France and Russia”
RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) -> 来自 , x0
         “coming from France and Russia”
RULE 16: NP(x0:NP, x1:VP) -> x1 , 的 , x0
         “astronauts coming from France and Russia”
RULE 14: VP(x0:VBP, x1:NP) -> x0 , x1
         “include astronauts coming from France and Russia”
RULE 10: NP(x0:DT, CD(7), NNS(people)) -> x0 , 7 人
         “these 7 people”
RULE 15: S(x0:NP, x1:VP, x2:PUNC) -> x0 , x1 , x2
         “These 7 people include astronauts coming from France and Russia .”

The record of rule applications is the derivation tree; reading off the rules’ left-hand sides gives the derived English tree:

(S (NP (DT These) (CD 7) (NNS people))
   (VP (VBP include)
       (NP (NP (NNS astronauts))
           (VP (VBG coming)
               (PP (IN from)
                   (NP (NNP France) (CC and) (NNP Russia))))))
   (PUNC .))
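The bottom-up matching can be sketched as a tiny chart "decoder". The rule set below is a simplified subset of the slides' rules, and the chart machinery (templates, span tiling) is an illustrative assumption, not the real system.

```python
# Miniature syntax-based decoder sketch: a rule fires over a Chinese span when
# its RHS tiles the span, building an English constituent over that span.
NONTERMS = {"NNP", "CC", "NP"}
RULES = [  # (RHS of Chinese words / English nonterminals, English label, template)
    (("法国",), "NNP", "France"),
    (("俄罗斯",), "NNP", "Russia"),
    (("和",), "CC", "and"),
    (("NNP", "CC", "NNP"), "NP", "{0} {1} {2}"),  # like RULE 13 on the slides
]

def tilings(rhs, words, chart, i, j):
    """Yield one English string per RHS symbol, for each way rhs tiles span (i, j)."""
    if not rhs:
        if i == j:
            yield []
        return
    sym = rhs[0]
    if sym in NONTERMS:                   # nonterminal: consume a built constituent
        for k in range(i + 1, j + 1):
            for label, eng in chart[(i, k)]:
                if label == sym:
                    for tail in tilings(rhs[1:], words, chart, k, j):
                        yield [eng] + tail
    elif i < j and words[i] == sym:       # terminal: consume one Chinese word
        for tail in tilings(rhs[1:], words, chart, i + 1, j):
            yield [sym] + tail

def decode(words):
    n = len(words)
    chart = {(i, j): [] for i in range(n) for j in range(i + 1, n + 1)}
    for span in range(1, n + 1):          # smaller spans first (CKY order)
        for i in range(n - span + 1):
            j = i + span
            for rhs, label, template in RULES:
                for eng in list(tilings(rhs, words, chart, i, j)):
                    chart[(i, j)].append((label, template.format(*eng)))
    return chart

print(decode(["法国", "和", "俄罗斯"])[(0, 3)])  # -> [('NP', 'France and Russia')]
```

The full-size version would attach weights to rules and keep k-best scored entries per span instead of plain lists.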
Chinese/English Translation Examples
Chinese gloss: six unit Iraq civilian today in Iraq south part possessive protest in , police and UK troops shot killed .

Machine translation: Police and British troops shot and killed six Iraqi civilians in protests in southern Iraq today.
Chinese/English Translation Examples
Chinese: 印度目前共有 74 种控价药 , 增加后的控价 药品将占印度所售药品的 40% 以上。

Machine translation: Currently, a total of 74 types of medicine prices increased after the price of medicines will account for more than 40 per cent of medicines sold by India.
Arabic-English translation (error analysis):

First, this is not a sentence: the VP below is not finite (e.g., “visited Iran”).

Second, even if the S-C really were a sentence, the verb “discussed” doesn’t take an S argument. So this is a bogus VP.

Third, even if the lower VP weren’t bogus, “confirms” only takes a certain type of VP, namely a gerund (“confirms discussing the idea”).
Tree Automata Operations for Machine Translation?

argmax_etree P(etree | cstring)

e = yield(best-tree(intersect(lm.rtg, b-apply(cstring, tm.tt))))

lm.rtg: a weighted tree grammar that accepts/scores English trees
tm.tt:  a weighted tree-to-string transducer that turns English trees into Chinese strings
Tree Automata Algorithms

                         String automata                           Tree automata
N-best                   … paths through a WFSA                    … trees in a weighted forest
                         (Viterbi 1967; Eppstein 1998)             (Jiménez & Marzal 2000; Huang & Chiang 2005)
EM training              forward-backward EM                       tree transducer EM training
                         (Baum & Welch 1971; Eisner 2003)          (Graehl & Knight 2004)
Determinization          … of weighted string acceptors            … of weighted tree acceptors
                         (Mohri 1997)                              (Borchardt & Vogler 2003; May & Knight 2005)
Intersection             WFSA intersection                         tree acceptor intersection (despite CFGs not being closed)
Applying transducers     string ∘ WFST → WFSA                      tree ∘ TT → weighted tree acceptor
Transducer composition   WFST composition (Pereira & Riley 1996)   many tree transducers not closed under composition
                                                                   (Rounds 70; Engelfriet 75)
Tree Automata Toolkits Used in NLP

• Tiburon: weighted tree automata toolkit [May & Knight 06]
  – developed by Jonathan May, USC/ISI
  – first version distributed in April 2006
  – includes tutorial
  – inspired by string automata toolkits
  – www.isi.edu/licensed-sw/tiburon
Tree Automata Toolkits Used in NLP
% echo "A(B(C) B(B(C)))" | tiburon -k 1 - even.rtg three.rtg
A(B(C) B(B(C))): 3.16E-9
% echo "A(B(C) B(C))" | tiburon -k 1 - even.rtg three.rtg
Warning: returning fewer trees than requested
0
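A weighted tree acceptor of the kind Tiburon reads can be sketched directly. The grammar below is invented for illustration (it is not the actual even.rtg or three.rtg), and scoring takes the best derivation rather than the sum.

```python
# Minimal weighted regular tree grammar (RTG) recognizer sketch.
# Productions map a state to (weight, node label, child states); the grammar
# and its weights are invented placeholders.
RTG = {
    "q_even": [(0.5, "A", ("q_odd", "q_odd")), (0.5, "B", ("q_odd",))],
    "q_odd":  [(0.4, "B", ("q_even",)), (0.6, "C", ())],
}

def score(state, tree):
    """Best derivation weight of tree from state (0.0 if the tree is rejected)."""
    label, kids = tree[0], tree[1:]
    best = 0.0
    for w, lab, states in RTG.get(state, []):
        if lab == label and len(states) == len(kids):
            p = w
            for s, k in zip(states, kids):
                p *= score(s, k)   # recursively score each child in its state
            best = max(best, p)
    return best

# The tree A(C B(B(C))), in the tuple encoding (label, *children):
t = ("A", ("C",), ("B", ("B", ("C",))))
print(round(score("q_even", t), 3))  # -> 0.036
```

Rejected trees score 0.0, which mirrors Tiburon's "returning fewer trees than requested" above: the intersection of the input with the grammars is empty.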
Back to the Outline
• History of the World (of Automata for NLP)
• Weighted string automata in NLP
  – Applications: transliteration, machine translation, language modeling, speech, lexical processing, tagging, summarization, optical character recognition, …
  – Generic algorithms and toolkits
• Weighted tree automata in NLP
  – Applications
  – Generic algorithms and toolkits
• Some connections with theory
Desirable Properties of a Transducer Formalism

• Expressiveness: can express the knowledge needed to capture the transformation and solve the linguistic problem
• Modularity: can integrate smaller components into bigger systems, coordinate search
• Inclusiveness: encompasses simpler formalisms
• Teachability: can learn from input/output examples
Desirable Formal Properties of Transformation Formalism
Modularity       be closed under composition
Inclusiveness    capture any transformation that a string-based FST can
Teachability     given input/output tree pairs, find locally optimal rule probabilities in low-polynomial time
Expressiveness   see next few slides
Expressiveness

Some necessary things for machine translation:

• Re-ordering:                               S(X VP(Y Z)) -> Y X Z
• Non-constituent phrases:                   S(PRO(there) VP(VB(are) X)) -> hay X
• Lexicalized re-ordering:                   NP(X PP(P(of) Y)) -> Y X
• Phrasal translation:                       VP(VBZ(is) VBG(singing)) -> está cantando
• Non-contiguous phrases:                    VP(VB(put) X PRT(on)) -> poner X
• Context-sensitive word insertion/deletion: NPB(DT(the) X) -> X
Expressiveness

Local rotation. English:

(S (NP (DT the) (N boy))
   (VP (V saw) (NP (DT the) (N door))))

Arabic (verb first):

(S’ (CONJ wa- [and]) (V ra’aa [saw]) (NP (N atefl [the boy])) (NP (N albab [the door])))
Desirable Formal Properties of Transformation Formalism
How do different tree formalisms fare?

Expressiveness   do local rotation
Modularity       be closed under composition
Inclusiveness    capture any transformation that a string-based FST can
Teachability     given input/output pairs, find locally optimal rule probabilities in low-polynomial time
Top-down Tree Transducers

(T = top-down; L = linear, i.e., non-copying; N = non-deleting)

Every rule has a one-level LHS and a multilevel RHS. With x0 = Arabic verb, x1 = Arabic subject, x2 = Arabic object:

LNT:  q S(x0 x1 x2) -> S(x1 VP(x0 x2))
LT    can also delete subtrees, e.g.  q S(x0 x1 x2) -> S(VP(x0 x2))
T     can also copy subtrees, e.g.    q S(x0 x1 x2) -> S(x0 VP(x0 x2))

All of them employ states on the RHS, e.g.  q S(x0 x1 x2) -> S(r x1, VP(q x0, s x2)).
(Classes LNT ⊆ LT ⊆ T, along the copying/non-copying and deleting/non-deleting axes.)

Expressiveness: can a top-down transducer with one-level LHS rules perform the local rotation S(V PRO NP) →* S(PRO VP(V NP))?

LNT cannot: a one-level LHS sees only the root before committing to the output structure. With copying and deleting (full T), rules along these lines can simulate the rotation:

q S(x0 x1) -> S(q x0, r x1, s x1)    (copy a subtree into two states)
r VP(x0 x1) -> q x0                  (keep only one child)
s VP(x0 x1) -> q x1                  (keep only the other child)
Extended (x-) Transducers

(x = extended LHS: a multilevel LHS as well as a multilevel RHS, so a rule can grab more structure. With x0 = English subject, x1 = English verb, x2 = English object:)

xLNT:  q S(x0 VP(x1 x2)) -> S(x1 x0 x2)

• possibility mentioned in [Rounds 70]
• defined in [Graehl & Knight 04]
• used for practical MT by [Galley et al 04, 06]
[Hierarchy diagrams: the classes LNT ⊆ LT ⊆ T and their extended counterparts xLNT ⊆ xLT ⊆ xT ⊆ xTR = TR, arranged along the copying/non-copying and deleting/non-deleting axes, with separation results from Gécseg & Steinby 1984 (GS’84) and Graehl & Knight 2004 (GK’04). The extended classes add local rotation; the “R” classes add a finite check before deleting. Bottom-up transducers (B, with LB = LTR) are closed under composition and are expressive enough for local rotation. Expressive power theorems in [Maletti, Graehl, Hopkins, Knight, to appear, SIAM J. Comput.].]
Inclusiveness

• Tree transducers are described as generalizing FSTs (strings are “long skinny trees”):

FST transition       Equivalent tree transducer rule
q -> r, A : B        q A(x0) -> B(r x0)
q -> r, A : *e*      q A(x0) -> r x0
q -> r, *e* : B      q x -> B(r x)

But such epsilon transitions are not part of traditional tree transducers, which must consume a symbol at each step.
[Final hierarchy diagram: refinements of LNT and xLNT allowing extended RHS and input/output epsilons (LNT(xRHS, e-free), LNT(xRHS, input-e), LNT(xRHS, output-e), xLNT(xRHS, input-e, output-e), etc.), with FST, GSM, and MBOT placed in the landscape. The aim: a class that is closed under composition, expressive enough, and generalizes the FST.]
More Theory Connections

• Other desirable properties
  – more expressivity
  – other types of teachability
  – process trees horizontally as well as vertically
  – graph transduction
• Papers:
  – overview of tree automata in NLP [Knight & Graehl 05]
  – MT Journal [Knight 07]
  – SIAM J. Comput. [Maletti et al, forthcoming]
  – CIAA, e.g., Tiburon paper [May & Knight 06]
  – WATA (Weighted Automata: Theory and Applications)
  – FSMNLP (Finite-State Methods and Natural Language Processing); subworkshop “Tree Automata and Transducers” (papers due 4/13/09)
Conclusion

• Weighted string automata for NLP
– well understood and exploited
• Weighted tree automata for NLP– just starting
• Some connections with theory– of continuing interest
• Good news from the empirical front– making good progress on machine translation