Parsing Techniques for Lexicalized Context-Free Grammars*

Giorgio Satta, University of Padua

* Joint work with: Jason Eisner, Mark-Jan Nederhof
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD (top-down) techniques
  – BU (bottom-up) techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Lexicalized grammars
• each rule specialized for one or more lexical items
• advantages over non-lexicalized formalisms:
– express syntactic preferences that are sensitive to lexical words
– control word selection
Syntactic preferences
• adjuncts: Workers [ dumped sacks ] into a bin
  *Workers dumped [ sacks into a bin ]
• N-N compound: [ hydrogen ion ] exchange
  *hydrogen [ ion exchange ]
Word selection
• lexical: Nora convened the meeting
  ?Nora convened the party
• semantics: Peggy solved two puzzles
  ?Peggy solved two goats
• world knowledge: Mary shelved some books
  ?Mary shelved some cooks
Lexicalized CFG
Motivations :
• study computational properties common to generative formalisms used in state-of-the-art real-world parsers
• develop parsing algorithms that can be directly applied to these formalisms
Lexicalized CFG
Example derivation for "dumped sacks into a bin":

(VP[dump][sack]
  (VP[dump][sack]
    (V[dump] dumped)
    (NP[sack] (N[sack] sacks)))
  (PP[into][bin]
    (P[into] into)
    (NP[bin] (Det[a] a) (N[bin] bin))))
Lexicalized CFG
Context-free grammars with:

• alphabet V_T: dumped, sacks, into, ...
• delexicalized nonterminals V_D: NP, VP, ...
• nonterminals V_N: NP[sack], VP[dump][sack], ...
Lexicalized CFG
Delexicalized nonterminals encode:

• word sense: N, V, ...
• grammatical features: number, tense, ...
• structural information: bar level, subcategorization state, ...
• other constraints: distribution, contextual features, ...
Lexicalized CFG
• productions have two forms:
  – V[dump] → dumped
  – VP[dump][sack] → VP[dump][sack] PP[into][bin]
• lexical elements in lhs inherited from rhs
Lexicalized CFG
• a production is k-lexical if it has k occurrences of lexical elements in its rhs:
  – NP[bin] → Det[a] N[bin] is 2-lexical
  – VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
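The bracketed notation above can be mirrored in a few lines of code. This is an illustrative sketch (the tuple encoding and the helper name are my own, not the talk's): a nonterminal is a category followed by its head words, and a production's degree of lexicalization is the number of lexical elements on its right-hand side.

```python
def lexicality(rhs):
    """Count occurrences of lexical elements on the right-hand side."""
    k = 0
    for symbol in rhs:
        if isinstance(symbol, tuple):   # nonterminal, e.g. ("N", "bin")
            k += len(symbol) - 1        # one lexical element per [d] bracket
        else:                           # a bare terminal word
            k += 1
    return k

# NP[bin] -> Det[a] N[bin] is 2-lexical
assert lexicality([("Det", "a"), ("N", "bin")]) == 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical
assert lexicality([("VP", "dump", "sack"), ("PP", "into", "bin")]) == 4
# V[dump] -> dumped is 1-lexical
assert lexicality(["dumped"]) == 1
```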
LCFG at work
• 2-lexical CFG:
  – Alshawi 1996: Head Automata
  – Eisner 1996: Dependency Grammars
  – Charniak 1997: CFG
  – Collins 1997: generative model
LCFG at work
A probabilistic LCFG G is strongly equivalent to a probabilistic grammar G' iff:

• there is a one-to-one mapping between derivations
• each direction of the mapping is a homomorphism
• derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG:

NP^S[profits] → ADJ^NP[corporate] N^NP[profits]

with probability Pr1(corporate | ADJ, NP, profits) · Pr1(profits | N, NP, profits) · Pr2(NP → ADJ N | NP, S, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG:

VP⟨S, {NP-C}, left⟩[bought] → NP[IBM] VP⟨S, {}, left⟩[bought]

with probability Prleft(NP, IBM | VP, S, bought, left, {NP-C})
LCFG at work
Major limitation: cannot capture relations involving lexical items outside the actual constituent (cf. history-based models).

Example: with V[d0] governing NP[d1], and PP[d2][d3] attaching inside the NP, the grammar cannot look at d0 when computing the PP attachment.
LCFG at work
• lexicalized context-free parsers that are not LCFG:
  – Magerman 1995: Shift-Reduce+
  – Ratnaparkhi 1997: Shift-Reduce+
  – Chelba & Jelinek 1998: Shift-Reduce+
  – Hermjakob & Mooney 1997: LR
Related work
Other frameworks for the study of lexicalized grammars:

• Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness
• Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
Standard Parsing
• standard parsing algorithms (CKY, Earley, LC, ...) run on LCFG in time O(|G| |w|^3)
• for 2-lex CFG (simplest case), |G| grows with |V_D|^3 |V_T|^2 !!

Goal: get rid of the |V_T| factors
Standard Parsing: TD
Result (to be refined): algorithms satisfying the correct-prefix property are "unlikely" to run on LCFG in time independent of V_T
Correct-prefix property
Earley, Left-Corner, GLR, ...: at every left-to-right reading position, the prefix of w read so far must be extendable into a complete string derivable from S.
On-line parsing
No grammar precompilation (Earley):

G, w → Parser → Output
Standard Parsing: TD
Result: on-line parsers with the correct-prefix property cannot run in time O(f(|V_D|, |w|)), for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR):

G → PreComp → C(G)
C(G), w → Parser → Output
Standard Parsing: TD
Fact: we can simulate a nondeterministic FA M on w in time O(|M| |w|)

Conjecture: fix a polynomial p. We cannot simulate M on w in time p(|w|) unless we spend exponential time in precompiling M
Standard Parsing: TD
Assume our conjecture holds true.

Result: off-line parsers with the correct-prefix property cannot run in time O(p(|V_D|, |w|)), for any polynomial p, unless we spend exponential time in precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar parsing:

• select productions that are lexically grounded in w
• parse BU with the selected subset of G

Problem: this removes the |V_T| factors but introduces new |w| factors !!
Standard Parsing: BU
Running time is O(|V_D|^3 |w|^5) !!

The basic step combines B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j). Time charged:

• i, k, j: |w|^3
• A, B, C: |V_D|^3
• d1, d2: |w|^2
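The cost analysis above can be made concrete with a small hypothetical sketch of the naive combination step (right-headed case only; the item encoding is my own): for fixed boundaries i, k, j it enumerates every delexicalized skeleton A → B C and every pair of head positions, which is where the extra |w|^2 factor comes from.

```python
def naive_bilexical_step(chart, rules, i, k, j):
    """Combine items (B, d1, i, k) and (C, d2, k, j) into (A, d2, i, j)
    for every delexicalized skeleton A -> B C; the head of A is
    inherited from the right child C (the left-headed case is symmetric)."""
    new = set()
    for (A, B, C) in rules:          # |V_D|^3 skeletons
        for d1 in range(i, k):       # |w| candidate left-head positions
            for d2 in range(k, j):   # |w| candidate right-head positions
                if (B, d1, i, k) in chart and (C, d2, k, j) in chart:
                    new.add((A, d2, i, j))
    return new

chart = {("B", 0, 0, 1), ("C", 1, 1, 2)}
assert naive_bilexical_step(chart, {("A", "B", "C")}, 0, 1, 2) == {("A", 1, 0, 2)}
```

Run over all O(|w|^3) boundary triples (i, k, j), this step yields the O(|V_D|^3 |w|^5) total.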
Standard BU: Exhaustive

[Log-log plot of parsing time vs. sentence length for exhaustive parsing with naive BU; empirical fit y = c·x^5.2019]
Standard BU: Pruning

[Log-log plot of parsing time vs. sentence length for parsing with pruning with naive BU; empirical fit y = c·x^3.8282]
Summary
• Part I: Lexicalized Context-Free Grammars
  – motivations and definition
  – relation with other formalisms
• Part II: standard parsing
  – TD techniques
  – BU techniques
• Part III: novel algorithms
  – BU enhanced
  – TD enhanced
BU enhanced
Result: parsing with 2-lex CFG in time O(|V_D|^3 |w|^4)

Remark: the result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997

Remark: the technique extends to improve parsing of Lexicalized Tree Adjoining Grammars
Algorithm #1
Basic step in naive BU: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j).

Idea: indices d1 and j can be processed independently
Algorithm #1
• Step 1: combine B[d1] spanning (i, k) with the production A[d2] → B[d1] C[d2], forgetting d1; the result is an intermediate item for A[d2] spanning (i, k)
• Step 2: combine the intermediate item with C[d2] spanning (k, j) to obtain A[d2] spanning (i, j)

In each step at most four input positions are live, which yields the O(|w|^4) bound.
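A sketch of this two-step factorization (item shapes and function names are illustrative, not the talk's notation): Step 1 forgets d1 before Step 2 ever looks at j, so no single step enumerates all five position-like indices together.

```python
def step1(chart, rules, i, k):
    """For each skeleton A -> B C and left item (B, d1, i, k), emit a
    partial item (A, C, i, k); the left head d1 is discarded here."""
    partial = set()
    for (A, B, C) in rules:
        for d1 in range(i, k):
            if (B, d1, i, k) in chart:
                partial.add((A, C, i, k))
    return partial

def step2(partial, chart, k, j):
    """Complete a partial item (A, C, i, k) with a right item
    (C, d2, k, j), yielding (A, d2, i, j); d1 never appears here."""
    new = set()
    for (A, C, i, k2) in partial:
        if k2 != k:
            continue
        for d2 in range(k, j):
            if (C, d2, k, j) in chart:
                new.add((A, d2, i, j))
    return new

chart = {("B", 0, 0, 1), ("C", 1, 1, 2)}
partial = step1(chart, {("A", "B", "C")}, 0, 1)
assert partial == {("A", "C", 0, 1)}
assert step2(partial, chart, 1, 2) == {("A", 1, 0, 2)}
```

Each step touches at most four positions plus the delexicalized skeleton, matching the O(|V_D|^3 |w|^4) bound.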
BU enhanced
Upper bound provided by Algorithm #1: O(|w|^4)

Goal: can we go down to O(|w|^3)?
Spine

The spine of a parse tree is the path from the root to the root's head.

Example (parse of "IBM bought Lotus last week"): the spine is S[buy] — S[buy] — VP[buy] — V[buy] — bought, with NP[IBM], NP[Lotus] and AdvP[week] as siblings along it.
Spine projection

The spine projection is the yield of the sub-tree composed of the spine and all its sibling nodes.

Example: for the parse of "IBM bought Lotus last week", the spine projection is NP[IBM] bought NP[Lotus] AdvP[week]
Split Grammars
Split spine projections at the head:

Problem: how much information do we need to store in order to construct new grammatical spine projections from splits?
Split Grammars
Fact: the set of spine projections is a linear context-free language

Definition: a 2-lex CFG is split if its set of spine projections is a regular language

Remark: for split grammars, we can recombine splits using finite information
Split Grammars
Non-split grammar:

• unbounded # of dependencies between left and right dependents of the head, as in recursive productions of the form S[d] → AdvP[a] S1[d] AdvP[b], where each left dependent is paired with a right dependent
• linguistically unattested and unlikely
Split Grammars
Split grammar: finite # of dependencies between left and right dependents of the lexical head
Split Grammars
Precompile the grammar so that splits are derived separately: the delexicalized spine nodes of the example tree (S[buy], VP[buy], V[buy]) are replaced by spine-automaton symbols r1[buy], r2[buy], r3[buy], where r3[buy] is a split symbol.
Split Grammars
• t: max # of states per spine automaton
• g: max # of split symbols per spine automaton (g < t)
• m: # of delexicalized nonterminals that are maximal projections
BU enhanced
Result: parsing with split 2-lexical CFG in time O(t^2 g^2 m^2 |w|^3)

Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split
Algorithm #2
Idea:

• recognize left and right splits separately
• collect head dependents one split at a time
Algorithm #2

Example input: NP[IBM] bought NP[Lotus] AdvP[week]
Algorithm #2
[Diagram of the two combination steps: Step 1 combines the item for B[d1] with states s1, r1 at boundary k with a dependent split item for head d2 with states s2, r2; Step 2 extends the combined item for B[d1] to the left boundary i.]
Algorithm #2: Exhaustive

[Log-log plot of parsing time vs. sentence length for exhaustive parsing; empirical fits y = c·x^5.2019 (BU naive) and y = c·x^3.328 (BU split)]
Algorithm #2: Pruning

[Log-log plot of parsing time vs. sentence length for parsing with pruning; empirical fits y = c·x^3.8282 (BU naive) and y = c·x^2.8179 (BU split)]
Related work
Cubic time algorithms for lexicalized grammars:

• Sleator & Temperley 1991: Link Grammars
• Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)
TD enhanced
Goal: introduce TD prediction for 2-lexical CFG parsing, without |V_T| factors

Remark: must relax left-to-right parsing (because of the previous results)
TD enhanced
Result: TD parsing with 2-lex CFG in time O(|V_D|^3 |w|^4)

Open: O(|w|^3) extension to split grammars
TD enhanced
Strongest version of the correct-prefix property: at every reading position, the partial structure built so far must be part of a parse tree, rooted in S, for a complete sentence extending the prefix of w read so far.
Data Structures
Productions with lhs A[d]:

• A[d] → X1[d1] X2[d2]
• A[d] → Y1[d3] Y2[d2]
• A[d] → Z1[d2] Z2[d1]

Trie for A[d]: branches for the rhs head sequences d1·d2, d3·d2 and d2·d1
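One plausible realization of such a trie (nested dicts; this encoding is an assumption, not necessarily the talk's data structure) indexes the productions with lhs A[d] by the sequence of heads in their rhs, so prediction can follow trie edges while scanning head words; rules sharing a head prefix share a path.

```python
def build_trie(productions):
    """productions: iterable of (head_sequence, rule) pairs, e.g.
    (("d1", "d2"), "A[d] -> X1[d1] X2[d2]").  The sentinel key "$"
    marks end-of-sequence and stores the rules completed there."""
    root = {}
    for heads, rule in productions:
        node = root
        for h in heads:
            node = node.setdefault(h, {})
        node.setdefault("$", []).append(rule)
    return root

trie = build_trie([
    (("d1", "d2"), "A[d] -> X1[d1] X2[d2]"),
    (("d3", "d2"), "A[d] -> Y1[d3] Y2[d2]"),
    (("d2", "d1"), "A[d] -> Z1[d2] Z2[d1]"),
])
assert trie["d1"]["d2"]["$"] == ["A[d] -> X1[d1] X2[d2]"]
```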
Data Structures
Rightmost subsequence recognition by precompiling the input w into a deterministic FA

[Diagram: deterministic subsequence automaton built over the input, with transitions labeled by input symbols a, b, c]
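A sketch of the precompilation idea, under the assumption that the automaton is the standard subsequence automaton over w: a table of rightmost occurrences per prefix lets a head sequence be matched as a subsequence of w[:k], scanning the pattern right to left in time proportional to its length.

```python
def build_rightmost_table(words):
    """prev[i][a] = rightmost 1-based position j with j <= i and
    words[j-1] == a; one dict per prefix of w (O(|w| * alphabet) space)."""
    prev = [{}]
    for i, a in enumerate(words, start=1):
        row = dict(prev[-1])
        row[a] = i
        prev.append(row)
    return prev

def matches_before(pattern, prev, k):
    """Is pattern a subsequence of words[:k]?  Matched rightmost-first."""
    pos = k
    for a in reversed(pattern):
        pos = prev[pos].get(a, 0) - 1   # jump left past the matched symbol
        if pos < 0:
            return False
    return True

prev = build_rightmost_table(["a", "b", "a", "c"])
assert matches_before(["b", "a"], prev, 4)       # b at 2, a at 3
assert not matches_before(["c", "a"], prev, 4)   # no a after the c
```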
Algorithm #3
Item representation [A[d], i, j, k]:

• i, j indicate the extension of the partial analysis of A[d]
• k indicates the rightmost possible position for the completion of the analysis of A[d]
Algorithm #3 : Prediction
• Step 1: find a rightmost subsequence before position k for some production of A[d2]
• Step 2: make the Earley prediction for the children B[d1] and C[d2], with new rightmost position k'
Conclusions
• standard parsing techniques are not suitable for processing lexicalized grammars
• novel algorithms have been introduced using enhanced dynamic programming
• work to be done: extension to history-based models
The End
Many thanks for helpful discussion to:
Jason Eisner, Mark-Jan Nederhof