Trees, Grammars, and Parsing
Most slides are taken or adapted from slides by
Chris Manning and Dan Klein
Parse Trees
From latent state sequences to latent tree structures (edges and nodes)
Types of Trees
There are several ways to add tree structures to sentences. We will consider two:
– Phrase structure (constituency) trees
– Dependency trees
1. Phrase structure
• Phrase structure trees organize sentences into constituents or brackets.
• Each constituent gets a label.
• The constituents are nested in a tree form.
• Linguists can and do argue about the details.
• Lots of ambiguity …
Constituency Tests
• How do we know what nodes go in the tree?
• Classic constituency tests:
– Substitution by proform
– Question answers
– Semantic grounds
• Coherence
• Reference
• Idioms
– Dislocation
– Conjunction
• Cross-linguistic arguments
Conflicting Tests
Constituency isn’t always clear.
• Phonological Reduction:
– I will go → I'll go
– I want to go → I wanna go
– à le centre → au centre
• Coordination
– He went to and came from the store.
2. Dependency structure
• Dependency structure shows which words depend on (modify or are arguments of) which other words.
Example: The boy put the tortoise on the rug
[Dependency tree: put is the root; boy, tortoise, and on depend on put; rug depends on on; each "the" depends on its noun]
Classical NLP: Parsing
• Write symbolic or logical rules
• Use deduction systems to prove parses from words
– Minimal grammar on "Fed" sentence: 36 parses
– Simple, 10-rule grammar: 592 parses
– Real-size grammar: many millions of parses
– With a hand-built grammar, ~30% of sentences have no parse
• This scales very badly.
– Hard to produce enough rules for every variation of language (coverage)
– Many, many parses for each valid sentence (disambiguation)
Ambiguity examples
The bad effects of V/N ambiguities
Ambiguities: PP Attachment
Attachments
• I cleaned the dishes from dinner.
• I cleaned the dishes with detergent.
• I cleaned the dishes in my pajamas.
• I cleaned the dishes in the sink.
Syntactic Ambiguities 1
• Prepositional Phrases
They cooked the beans in the pot on the stove with handles.
• Particle vs. Preposition
The puppy tore up the staircase.
• Complement Structure
The tourists objected to the guide that they couldn't hear.
She knows you like the back of her hand.
• Gerund vs. Participial Adjective
Visiting relatives can be boring.
Changing schedules frequently confused passengers.
Syntactic Ambiguities 2
• Modifier scope within NPs
impractical design requirements
plastic cup holder
• Multiple gap constructions
The chicken is ready to eat.
The contractors are rich enough to sue.
• Coordination scope
Small rats and mice can squeeze into holes or cracks in the wall.
Classical NLP Parsing: The problem and its solution
• Very constrained grammars attempt to limit unlikely/weird parses for sentences
– But the attempt makes the grammars not robust: many sentences have no parse
• A less constrained grammar can parse more sentences
– But simple sentences end up with ever more parses
• Solution: We need mechanisms that allow us to find the most likely parse(s)
– Statistical parsing lets us work with very loose grammars that admit millions of parses for sentences but still quickly find the best parse(s)
Polynomial-time Parsing with Context Free Grammars
Parsing
Computational task: Given a set of grammar rules and a sentence, find a valid parse of the sentence (efficiently).

Naively, you could try all possible trees until you get to a parse tree that conforms to the grammar rules, has "S" at the root, and has the right words at the leaves.

But that takes exponential time in the number of words.
Aspects of parsing
• Running a grammar backwards to find possible structures for a sentence
• Parsing can be viewed as a search problem
• Parsing is a hidden data problem
• For the moment, we want to examine all structures for a string of words
• We can do this bottom-up or top-down
– This distinction is independent of depth-first or breadth-first search; we can do either both ways
– We search by building a search tree, which is distinct from the parse tree
Human parsing
• Humans often do ambiguity maintenance– Have the police … eaten their supper?– come in and look around.– taken out and shot.
• But humans also commit early and are “garden pathed”:– The man who hunts ducks out on weekends.– The cotton shirts are made from grows in Mississippi.– The horse raced past the barn fell.
A phrase structure grammar
S → NP VP        N → cats
VP → V NP        N → claws
VP → V NP PP     N → people
NP → NP PP       N → scratch
NP → N           V → scratch
NP → e           P → with
NP → N N
PP → P NP
• By convention, S is the start symbol, but in the PTB, we have an extra node at the top (ROOT, TOP)
Phrase structure grammars = context-free grammars
• G = (T, N, S, R)
– T is a set of terminals
– N is a set of nonterminals
• For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
– S is the start symbol (one of the nonterminals)
– R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
• A grammar G generates a language L.
Probabilistic or stochastic context-free grammars (PCFGs)
• G = (T, N, S, R, P)
– T is a set of terminals
– N is a set of nonterminals
• For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
– S is the start symbol (one of the nonterminals)
– R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
– P(R) gives the probability of each rule: $\forall X \in N,\ \sum_{X \to \gamma \,\in\, R} P(X \to \gamma) = 1$
• A grammar G generates a language model L.
Soundness and completeness
• A parser is sound if every parse it returns is valid/correct
• A parser terminates if it is guaranteed to not go off into an infinite loop
• A parser is complete if for any given grammar and sentence, it is sound, produces every valid parse for that sentence, and terminates
• (For many purposes, we settle for sound but incomplete parsers: e.g., probabilistic parsers that return a k-best list.)
![Page 24: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/24.jpg)
Top-down parsing
• Top-down parsing is goal-directed
• A top-down parser starts with a list of constituents to be built. The top-down parser rewrites the goals in the goal list by matching one against the LHS of the grammar rules, and expanding it with the RHS, attempting to match the sentence to be derived.
• If a goal can be rewritten in several ways, then there is a choice of which rule to apply (search problem)
• Can use depth-first or breadth-first search, and goal ordering.
![Page 25: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/25.jpg)
Top-down parsing
Problems with top-down parsing
• Left-recursive rules
• A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V.
• Useless work: expands things that are possible top-down but not there
• Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar
• Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup.
• Repeated work: anywhere there is common substructure
Repeated work…
Bottom-up parsing
• Bottom-up parsing is data-directed
• The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule.
• Parsing is finished when the goal list contains just the start category.
• If the RHS of several rules match the goal list, then there is a choice of which rule to apply (search problem)
• Can use depth-first or breadth-first search, and goal ordering.
• The standard presentation is as shift-reduce parsing.
Problems with bottom-up parsing
• Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted (but then it's generally incomplete)
• Useless work: locally possible, but globally impossible.
• Inefficient when there is great lexical ambiguity (grammar-driven control might help here)
• Conversely, it is data-directed: it attempts to parse the words that are there.
• Repeated work: anywhere there is common substructure
PCFGs – Notation
• $w_{1n} = w_1 \cdots w_n$ = the word sequence from 1 to n (a sentence of length n)
• $w_{ab}$ = the subsequence $w_a \cdots w_b$
• $N^j_{ab}$ = the nonterminal $N^j$ dominating $w_a \cdots w_b$
• We'll write $P(N^i \to \zeta^j)$ to mean $P(N^i \to \zeta^j \mid N^i)$
• We'll want to calculate $\max_t P(t \overset{*}{\Rightarrow} w_{ab})$
The probability of trees and strings
• $P(w_{1n}, t)$: the probability of a tree is the product of the probabilities of the rules used to generate it:
$P(w_{1n}, t) = \prod_{\{X \to A\,B\} \in t} P(X \to A\,B) \cdot \prod_{i=1}^{n} P(X \to w_i)$
• $P(w_{1n})$: the probability of the string is the sum of the probabilities of the trees which have that string as their yield:
$P(w_{1n}) = \sum_t P(w_{1n}, t)$, where t is a parse of $w_{1n}$
Example: A Simple PCFG (in Chomsky Normal Form)
S → NP VP    1.0        NP → NP PP         0.4
VP → V NP    0.7        NP → astronomers   0.1
VP → VP PP   0.3        NP → ears          0.18
PP → P NP    1.0        NP → saw           0.04
P → with     1.0        NP → stars         0.18
V → saw      1.0        NP → telescope     0.1
[Figures: the two parse trees t1 and t2 for "astronomers saw stars with ears", with the rule probability at each node]
Tree and String Probabilities
• w15 = astronomers saw stars with ears
• P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
• P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
• P(w15) = P(t1) + P(t2) = 0.0009072 + 0.0006804 = 0.0015876
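(Not on the slides: a few lines of Python to sanity-check these products. The rule lists for t1 and t2 below are read off the two trees pictured above; the string-keyed rule format is just for illustration.)

```python
# Check the slide's arithmetic: each tree's probability is simply the
# product of its rule probabilities (grammar from the simple PCFG slide).
from math import prod

rules = {"S->NP VP": 1.0, "VP->V NP": 0.7, "VP->VP PP": 0.3, "PP->P NP": 1.0,
         "P->with": 1.0, "V->saw": 1.0, "NP->NP PP": 0.4, "NP->astronomers": 0.1,
         "NP->ears": 0.18, "NP->saw": 0.04, "NP->stars": 0.18, "NP->telescope": 0.1}

# Rules used by each parse of "astronomers saw stars with ears"
t1 = ["S->NP VP", "NP->astronomers", "VP->V NP", "V->saw",
      "NP->NP PP", "NP->stars", "PP->P NP", "P->with", "NP->ears"]
t2 = ["S->NP VP", "NP->astronomers", "VP->VP PP", "VP->V NP", "V->saw",
      "NP->stars", "PP->P NP", "P->with", "NP->ears"]

p1, p2 = (prod(rules[r] for r in t) for t in (t1, t2))
print(p1, p2, p1 + p2)  # ≈ 0.0009072, 0.0006804, 0.0015876
```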
Chomsky Normal Form
• All rules are of the form X → Y Z or X → w.
• A transformation to this form doesn't change the weak generative capacity of CFGs.
– With some extra book-keeping in symbol names, you can even reconstruct the same trees with a detransform
– Unaries/empties are removed recursively
– N-ary rules introduce new nonterminals:
• VP → V NP PP becomes VP → V @VP-V and @VP-V → NP PP
• In practice it's a pain
– Reconstructing n-aries is easy
– Reconstructing unaries can be trickier
• But it makes parsing easier/more efficient
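(Not on the slides: a minimal sketch of such a binarization in Python. The nested-tuple tree format and the `h` parameter are illustration-only assumptions; `h` is the horizontal Markovization order discussed later (use h ≥ 1), and this variant ends each chain at the last two children rather than adding a final unary as in the figures below.)

```python
def binarize(tree, h=999):
    """Right-binarize an n-ary tree using @Parent->_sibling... symbols.

    h = horizontal Markovization order: how many already-generated sibling
    labels an intermediate symbol remembers (999 ~ infinite order; a small h
    merges long histories, e.g. @VP->_V_NP_PP keeps only its last h labels).
    A tree is a nested tuple (label, child, ...) with string leaves.
    """
    label, *kids = tree
    if isinstance(kids[0], str):              # preterminal, e.g. ('N', 'cats')
        return tree
    kids = [binarize(k, h) for k in kids]
    if len(kids) <= 2:
        return (label, *kids)

    def sym(seen):
        return "@" + label + "->" + "".join("_" + s for s in seen)

    def chain(i, seen):
        if i == len(kids) - 2:                # last two children end the chain
            return (sym(seen), kids[i], kids[i + 1])
        return (sym(seen), kids[i], chain(i + 1, (seen + [kids[i][0]])[-h:]))

    return (label, kids[0], chain(1, [kids[0][0]]))

vp = ("VP", ("V", "scratch"), ("NP", ("N", "people")),
      ("PP", ("P", "with"), ("NP", ("N", "claws"))))
print(binarize(vp))
# ('VP', ('V', 'scratch'), ('@VP->_V', ('NP', ('N', 'people')), ('PP', ...)))
```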
Treebank binarization
N-ary Trees in Treebank → [TreeAnnotations.annotateTree] → Binary Trees → Lexicon and Grammar → Parsing (TODO: CKY parsing)
An example: before binarization…
[Tree: ROOT → S; S → NP VP; NP → N (cats); VP → V (scratch) NP PP; NP → N (people); PP → P (with) NP; NP → N (claws)]
After binarization…
[Same tree with intermediate symbols: S → NP @S->_NP; @S->_NP → VP; VP → V @VP->_V; @VP->_V → NP @VP->_V_NP; @VP->_V_NP → PP; PP → P @PP->_P; @PP->_P → NP]
Binary rule:
[The same unbinarized tree, highlighting S → NP VP, which is already binary]
[The tree with PP → P NP rebinarized as PP → P @PP->_P, @PP->_P → NP]
Seems redundant? (The rule was already binary.) Reason: it is easier to see how to make finite-order horizontal Markovizations; it's like a finite automaton (explained later).
Ternary rule:
[The same tree, highlighting the ternary rule VP → V NP PP]
[The ternary rule binarized, step by step: VP → V @VP->_V; @VP->_V → NP @VP->_V_NP; @VP->_V_NP → PP]
[The fully binarized tree, now also with @S->_NP for S → NP VP]
[The fully binarized tree as above]
@VP->_V_NP (for VP → V NP PP) remembers 2 siblings.
If there's a rule VP → V NP PP PP, @VP->_V_NP_PP will exist.
Treebank: empties and unaries
[Figure: the PTB tree TOP → S-HLN → (NP-SUBJ → -NONE-) (VP → VB → Atone), then progressively simplified: NoFuncTags: TOP → S → (NP → -NONE-) (VP → VB → Atone); NoEmpties: TOP → S → VP → VB → Atone; NoUnaries, keeping the high label: TOP → S → Atone; NoUnaries, keeping the low label: TOP → VB → Atone]
CKY Parsing (aka, CYK)
Cocke–Younger–Kasami (CYK or CKY) parsing is a dynamic programming solution to identifying a valid parse for a sentence.
Dynamic programming: simplifying a complicated problem by breaking it down into simpler subproblems in a recursive manner
CKY – Basic Idea
Let the input be a string S consisting of n words: a1 … an.
Let the grammar contain r nonterminal symbols R1 … Rr, including a subset Rs of start symbols.
Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
At each step, the algorithm sets P[i,j,k] to true if the subsequence of words (the span) starting at i with length j can be generated from Rk.
We start with spans of length 1 (individual words) and then proceed to increasingly larger spans, determining which ones are valid given the smaller spans that have already been processed.
CKY Algorithm
For each i = 1 to n
  For each unit production Rj → ai, set P[i,1,j] = true.
For each i = 2 to n                   -- Length of span
  For each j = 1 to n-i+1             -- Start of span
    For each k = 1 to i-1             -- Partition of span
      For each production RA → RB RC
        If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
If any P[1,n,x] is true (x iterated over the set s, where s contains all the indices for Rs)
  Then S is a member of the language
  Else S is not a member of the language
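(Not from the slides: the boolean recognizer above, transcribed into runnable Python. The dict-based grammar format is an assumption for illustration.)

```python
from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules, start_symbols):
    """Boolean CKY recognizer for a grammar in Chomsky Normal Form.

    lexicon:      dict word -> set of preterminals  (rules Rj -> ai)
    binary_rules: dict (B, C) -> set of parents A   (rules A -> B C)
    Instead of a boolean array P[i, span, k], each chart cell holds the set
    of nonterminals that are true for that (start, span).
    """
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words, start=1):            # spans of length 1
        chart[(i, 1)] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):                      # increasing span length
        for left in range(1, n - span + 2):           # start of span
            for mid in range(1, span):                # partition of span
                for B in chart[(left, mid)]:
                    for C in chart[(left + mid, span - mid)]:
                        chart[(left, span)] |= binary_rules.get((B, C), set())
    return bool(chart[(1, n)] & start_symbols)

# Tiny example in the spirit of the running "cats scratch walls" sentence:
lexicon = {"cats": {"N"}, "scratch": {"V"}, "walls": {"N"}}
binary_rules = {("V", "N"): {"VP"}, ("N", "VP"): {"S"}}
print(cky_recognize("cats scratch walls".split(), lexicon, binary_rules, {"S"}))  # True
```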
CKY In Action
http://www.diotavelli.net/people/void/demos/cky.html
Probabilistic CKY (this version doesn't handle unaries)
Input: words, grammar. Output: most likely parse and its probability.

For each left = 1 to #words               // initialize: all length-1 spans (individual words)
  For each unit production Rj → words[left,left+1]
    set score[left,1,j] = P(Rj → words[left,left+1])
For each span = 2 to #words               // induction: increasing span length
  For each left = 1 to #words-span+1      // start of span
    For each mid = 1 to span-1            // partition of span
      For each production RA → RB RC
        If score[left,mid,B] > 0 and score[left+mid,span-mid,C] > 0
          score = score[left,mid,B] * score[left+mid,span-mid,C] * P(RA → RB RC)
          If score > score[left,span,A]
            score[left,span,A] = score
            back[left,span,A] = (B, C, mid)
Set parent = argmax over start symbols RS of score[1,#words,RS]
Set score = score[1,#words,parent]
Return [score, buildTree(parent, 1, #words, back)]
buildTree
Input: root, left, span, backpointers. Output: tree.

Set tree.symbol = root
If span = 1                               // base case
  Set tree.child = w[left,left+1]
Else                                      // recurse
  Set (B, C, mid) = backpointers[left, span, root]
  Set tree.leftChild = buildTree(B, left, mid, backpointers)
  Set tree.rightChild = buildTree(C, left+mid, span-mid, backpointers)
Return tree
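(Not from the slides: the probabilistic CKY and buildTree pseudocode above, combined into one runnable Python sketch. The dict grammar format and the tuple tree representation are illustration-only assumptions.)

```python
def pcky_parse(words, lexicon, rules, start_symbols):
    """Viterbi CKY for a PCFG in CNF (no unaries), with backpointers.

    lexicon: dict word -> {preterminal: probability}   (assumed format)
    rules:   dict (B, C) -> {A: P(A -> B C)}           (assumed format)
    Returns (probability, tree); a tree is a nested tuple (symbol, children...).
    """
    n = len(words)
    score = {}   # (left, span) -> {symbol: best probability}
    back = {}    # (left, span, symbol) -> (B, C, mid)
    for i, w in enumerate(words, start=1):             # length-1 spans
        score[(i, 1)] = dict(lexicon.get(w, {}))
    for span in range(2, n + 1):                       # increasing span length
        for left in range(1, n - span + 2):
            cell = score.setdefault((left, span), {})
            for mid in range(1, span):                 # partition point
                for B, pb in score.get((left, mid), {}).items():
                    for C, pc in score.get((left + mid, span - mid), {}).items():
                        for A, pr in rules.get((B, C), {}).items():
                            p = pb * pc * pr
                            if p > cell.get(A, 0.0):
                                cell[A] = p
                                back[(left, span, A)] = (B, C, mid)

    def build(sym, left, span):                        # = buildTree above
        if span == 1:
            return (sym, words[left - 1])
        B, C, mid = back[(left, span, sym)]
        return (sym, build(B, left, mid), build(C, left + mid, span - mid))

    root = score.get((1, n), {})
    best = max((s for s in start_symbols if s in root), key=root.get, default=None)
    return (0.0, None) if best is None else (root[best], build(best, 1, n))
```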
[Chart layout for "cats scratch walls with claws": a triangular array of cells score[left][span], left = 1–5, span = 1–5, with preterminal cells along the span-1 edge. Back-pointer entries not shown.]
Initialization — the span-1 cells hold preterminal scores:
cats: N → cats 0.1, V → cats 0.01
scratch: N → scratch .1, V → scratch .2
walls: N → walls .2, V → walls .01
with: P → with .5
claws: N → claws .1, V → claws .2
Induction: Span 2 — combining pairs of adjacent span-1 cells (chart as above).
Induction: Span 2 (Span = 2, Left = 1, Mid = 1)
Cell [1,2] ("cats scratch"): S → N V .002, NP → N N .001, VP → V N .0001, VP → V V .00001
Grammar probabilities: P(S → N V) = .1, P(NP → N N) = .1, P(VP → V N) = .1, P(VP → V V) = .005, P(NP → N P) = .01, P(VP → V P) = .02, P(PP → P N) = .1
Induction: Span 2 (Span = 2, Left = 2, Mid = 1)
Cell [2,2] ("scratch walls"): S → N V .0001, NP → N N .002, VP → V N .004, VP → V V .00001
(Grammar probabilities as above)
Induction: Span 2 completed
Cell [1,2]: S → N V .002, NP → N N .001, VP → V N .0001
Cell [2,2]: S → N V .002, NP → N N .001, VP → V N .0001
Cell [3,2] ("walls with"): NP → N P .000001, VP → V P .00001
Cell [4,2] ("with claws"): PP → P N .005
(Grammar probabilities as above)
Induction: Span 4 (Span = 4, Left = 2, Mid = 1)
Chart so far:
Cell [1,3]: S → N VP 2e-6, VP → V NP 2e-6, …
Cell [2,3]: S → N VP 1e-6, VP → V NP 2e-7
Cell [3,3]: NP → N PP 1e-4, VP → V PP 5e-6, @VP_V → N PP 1e-4
Cell [2,4] (in progress): VP → V @VP_V 6e-6, VP → V NP 4e-6
(Span-2 cells as above)
Grammar probabilities:
P(S → N V) = .1, P(S → N VP) = .2
P(NP → N N) = .1, P(NP → N PP) = .1, P(NP → N P) = .01
P(VP → V N) = .1, P(VP → V NP) = .2, P(VP → V V) = .005, P(VP → V P) = .02, P(VP → V PP) = .1, P(VP → V @VP_V) = .3, P(VP → VP PP) = .1
P(@VP_V → N PP) = .1
P(PP → P N) = .1
Induction: Span 4 (Span = 4, Left = 2, Mid = 2)
Cell [2,4] candidates now also include VP → VP PP 5e-8, which is worse than the current VP → V @VP_V 6e-6.
(Chart and grammar probabilities as above)
Induction: Span 4 (Span = 4, Left = 2, Mid = 3)
No better analyses are found at this split point. (Chart and grammar probabilities as above)
Induction: Span 4 (Span = 4, Left = 2), all partitions tried
(Chart and grammar probabilities as above)
Induction: Span 4 (Span = 4, Left = 2), winner recorded
Cell [2,4]: VP → V @VP_V 6e-6, back = (left = V, right = @VP_V, mid = 1)
(Chart and grammar probabilities as above)
Final Chart
Cell [1,5]: S → N VP 1.2e-7, back = (left = N, right = VP, mid = 1)
Cell [2,4]: VP → V @VP_V 6e-6, back = (left = V, right = @VP_V, mid = 1)
(All other cells and the grammar probabilities as above)
Corresponding Tree
[Tree: S → N (cats) VP; VP → V (scratch) @VP->_V; @VP->_V → N (walls) PP; PP → P (with) N (claws)]
(This is different from the tree shown before because this one doesn't include unaries.)
probability = score = 1.2e-7
Extended CKY parsing
• Unaries can be incorporated into the algorithm
– Messy, but doesn't increase algorithmic complexity
• Empties can be incorporated
– Use fenceposts
– Doesn't increase complexity; essentially like unaries
• Binarization is vital
– Without binarization, you don't get parsing cubic in the length of the sentence
• Binarization may be an explicit transformation or implicit in how the parser works (Earley-style dotted rules), but it's always there.
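(Not from the slides: one common way to fold unaries in, sketched under the same assumed chart format as the Python above; after a cell is filled by binary rules, close it under unary rules until nothing improves.)

```python
def close_unaries(cell, unary_rules, back, key):
    """Close one chart cell under unary rules A -> B.

    cell:        dict symbol -> best probability for this span
    unary_rules: dict child B -> {A: P(A -> B)}   (assumed format)
    back, key:   backpointer table and this cell's (left, span) key
    Looping to a fixed point handles unary chains; the strict > comparison
    guarantees termination even if some unary rules have probability 1.0.
    """
    improved = True
    while improved:
        improved = False
        for B, pb in list(cell.items()):
            for A, pr in unary_rules.get(B, {}).items():
                p = pb * pr
                if p > cell.get(A, 0.0):
                    cell[A] = p
                    back[key + (A,)] = ("unary", B)
                    improved = True
```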
Efficient CKY parsing
• CKY parsing can be made very fast (!), partly due to the simplicity of the structures used.
– But that means a lot of the speed comes from engineering details
– And a little from cleverer filtering
– Store the chart as a (ragged) 3-dimensional array of float (log probabilities): score[start][end][category]
• For treebank grammars the load is high enough that you don't really gain from lists of things that were possible
• 50 wds: (50×50)/2 × (1000 to 20000) × 4 bytes = 5–100 MB for the parse triangle. Large (can move to a beam for span[i][j]).
– Use int to represent categories/words (Index)
Efficient CKY parsing
• Provide efficient grammar/lexicon accessors:
– E.g., return the list of rules with this left child category
– Iterate over left children, checking for zero (neg. inf.) probability of X:[i,j] (abort the loop); otherwise get the rules with X on the left
– Some X:[i,j] can be filtered based on the input string
• Not enough space to complete a long flat rule?
• No word in the string can be a CC?
– Using a lexicon of possible POS for words gives a lot of constraint, rather than allowing all POS for all words
• Cf. later discussion of figures-of-merit/A* heuristics
Runtime in practice: super-cubic!
• Super-cubic in practice! Why?
[Chart: runtime (sec, 0–360) vs. sentence length (0–50); best-fit exponent 3.47]
How good are PCFGs?
• Robust (usually admit everything, but with low probability)
• Partial solution for grammar ambiguity: a PCFG gives some idea of the plausibility of a sentence
• But not so good because the independence assumptions are too strong
• Give a probabilistic language model – But in a simple case it performs worse than a trigram model
• The problem seems to be it lacks the lexicalization of a trigram model
Parser Evaluation
Evaluating Parsing Accuracy
• Most sentences are not given a completely correct parse by any currently existing parser.
• Standardly for Penn Treebank parsing, evaluation is done in terms of the percentage of correct constituents (labeled spans).
• A constituent is a triple [label, start, finish], all of which must be in the true parse for the constituent to be marked correct.
Evaluating Constituent Accuracy: LP/LR measure
• Let C be the number of correct constituents produced by the parser over the test set, M the total number of constituents produced, and N the total in the correct version [micro-averaged]
• Precision = C/M
• Recall = C/N
• It is possible to artificially inflate either one, so people typically report the F-measure (harmonic mean) of the two. Not a big issue here; it behaves like an average.
• This isn't necessarily a great measure; I and many other people think dependency accuracy would be better.
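(Not from the slides: the LP/LR computation above fits in one short function. Representing each parse as a collection of (label, start, finish) triples is the assumption here; exact-duplicate constituents are ignored for brevity.)

```python
def parseval(gold_parses, test_parses):
    """Micro-averaged labeled precision, recall, and F1 over constituents.

    Each parse is an iterable of (label, start, finish) triples.
    C = correct constituents, M = produced, N = in the gold standard.
    """
    C = M = N = 0
    for gold, test in zip(gold_parses, test_parses):
        gold, test = set(gold), set(test)
        C += len(gold & test)
        M += len(test)
        N += len(gold)
    precision, recall = C / M, C / N
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. one sentence: the parser got 2 of 3 gold constituents and produced 3:
gold = [[("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5)]]
test = [[("S", 0, 5), ("NP", 0, 1), ("VP", 2, 5)]]
print(parseval(gold, test))  # (0.666..., 0.666..., 0.666...)
```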
Extensions to basic PCFG Parsing
![Page 77: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/77.jpg)
Many, many possibilities
• Tree Annotations– Lexicalization– Grandparent, sibling, etc. annotations– Manual label splitting– Latent label splitting
• Horizontal and Vertical Markovization• Discriminative Reranking
![Page 78: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/78.jpg)
Putting words into PCFGs
• A PCFG uses the actual words only to determine the probability of parts-of-speech (the preterminals)
• In many cases we need to know about words to choose a parse
• The head word of a phrase gives a good representation of the phrase's structure and meaning
– Attachment ambiguities: The astronomer saw the moon with the telescope
– Coordination: the dogs in the house and the cats
– Subcategorization frames: put versus like
(Head) Lexicalization
• put takes both an NP and a PP
– Sue put [ the book ]NP [ on the table ]PP
– * Sue put [ the book ]NP
– * Sue put [ on the table ]PP
• like usually takes an NP and not a PP
– Sue likes [ the book ]NP
– * Sue likes [ on the table ]PP
• We can't tell this if we just have a VP with a verb, but we can if we know which verb it is
(Head) Lexicalization
• Collins 1997, Charniak 1997
• Puts the properties of words into a PCFG
[Lexicalized tree for "Sue walked into the store": S-walked → NP-Sue VP-walked; VP-walked → V-walked PP-into; PP-into → P-into NP-store; NP-store → DT-the N-store]
Lexicalized Parsing was seen as the breakthrough of the late 90s
• Eugene Charniak, 2000 JHU workshop: "To do better, it is necessary to condition probabilities on the actual words of the sentence. This makes the probabilities much tighter:
– p(VP → V NP NP) = 0.00151
– p(VP → V NP NP | said) = 0.00001
– p(VP → V NP NP | gave) = 0.01980"
• Michael Collins, 2003 COLT tutorial: "Lexicalized Probabilistic Context-Free Grammars … perform vastly better than PCFGs (88% vs. 73% accuracy)"
Michael Collins (2003, COLT)
![Page 83: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/83.jpg)
Klein and Manning, Accurate Unlexicalized Parsing: PCFGs and Independence
• The symbols in a PCFG define independence assumptions:
– At any node, the material inside that node is independent of the material outside that node, given the label of that node.
– Any information that statistically connects behavior inside and outside a node must flow through that node.
[Figure: an S → NP VP tree; the NP then expands independently, e.g. NP → DT NN]
Michael Collins (2003, COLT)
![Page 85: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/85.jpg)
Non-Independence I
• Independence assumptions are often too strong.
• Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects):
Expansion | All NPs | NPs under S | NPs under VP
NP PP | 11% | 9% | 23%
DT NN | 9% | 9% | 7%
PRP | 6% | 21% | 4%
Non-Independence II
• Who cares?
– NB, HMMs, all make false assumptions!
– For generation, consequences would be obvious.
– For parsing, does it impact accuracy?
• Symptoms of overly strong assumptions:
– Rewrites get used where they don't belong.
– Rewrites get used too often or too rarely.
In the PTB, this construction is for possessives.
Breaking Up the Symbols
• We can relax independence assumptions by encoding dependencies into the PCFG symbols:
– Parent annotation [Johnson 98]
– Marking possessive NPs
• What are the most useful features to encode?
Annotations
• Annotations split the grammar categories into sub-categories.
• Conditioning on history vs. annotating
– P(NP^S → PRP) is a lot like P(NP → PRP | S)
– P(NP-POS → NNP POS) isn't history conditioning.
• Feature grammars vs. annotation
– Can think of a symbol like NP^NP-POS as NP [parent: NP, +POS]
• After parsing with an annotated grammar, the annotations are then stripped for evaluation.
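(Not from the slides: a minimal sketch of parent annotation in Python, under the same nested-tuple tree format assumed earlier; preterminals are left unannotated so the lexicon is unaffected.)

```python
def parent_annotate(tree, parent="ROOT"):
    """Vertical order-2 annotation: relabel each nonterminal as label^parent.

    A tree is (label, child, ...) with string leaves; preterminals are left
    alone so that tags stay compatible with the lexicon.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree                              # preterminal: don't annotate
    return (f"{label}^{parent}",
            *(parent_annotate(c, label) for c in children))

# Example: NP under S becomes NP^S, mirroring P(NP^S -> PRP) ~ P(NP -> PRP | S)
print(parent_annotate(("S", ("NP", ("PRP", "He")), ("VP", ("VBD", "ran")))))
# ('S^ROOT', ('NP^S', ('PRP', 'He')), ('VP^S', ('VBD', 'ran')))
```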
Lexicalization
• Lexical heads are important for certain classes of ambiguities (e.g., PP attachment):
• Lexicalizing grammar creates a much larger grammar.– Sophisticated smoothing needed– Smarter parsing algorithms needed– More data needed
• How necessary is lexicalization?– Bilexical vs. monolexical selection– Closed vs. open class lexicalization
![Page 90: Trees, Grammars, and Parsing Most slides are taken or adapted from slides by Chris Manning Dan Klein](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe9e/html5/thumbnails/90.jpg)
Experimental Setup
• Corpus: Penn Treebank, WSJ
– Training: sections 02–21
– Development: section 22 (first 20 files)
– Test: section 23
• Accuracy: F1, the harmonic mean of per-node labeled precision and recall.
• Size: number of symbols in the grammar.
– Passive / complete symbols: NP, NP^S
– Active / incomplete symbols: NP → NP CC
Experimental Process
• We'll take a highly conservative approach:
– Annotate as sparingly as possible
– Highest accuracy with fewest symbols
– Error-driven, manual hill-climb, adding one annotation type at a time
Unlexicalized PCFGs
• What do we mean by an "unlexicalized" PCFG?
– Grammar rules are not systematically specified down to the level of lexical items
• NP-stocks is not allowed
• NP^S-CC is fine
– Closed vs. open class words (NP^S-the)
• Long tradition in linguistics of using function words as features or markers for selection
• Contrary to the bilexical idea of semantic heads
• Open-class selection is really a proxy for semantics
• Honesty checks:
– Number of symbols: keep the grammar very small
– No smoothing: over-annotating is a real danger
Horizontal Markovization
• Horizontal Markovization merges states.
[Charts: parsing accuracy (70%–74%) and number of symbols (0–12,000) vs. horizontal Markov order 0, 1, 2v, 2, ∞]
Vertical Markovization
• Vertical Markov order: rewrites depend on the past k ancestor nodes (cf. parent annotation).
[Figure: order-1 vs. order-2 trees. Charts: F1 (72%–79%) and number of symbols (0–25,000) vs. vertical Markov order 1, 2v, 2, 3v, 3]
Vertical and Horizontal
• Examples:
– Raw treebank: v=1, h=∞
– Johnson 98: v=2, h=∞
– Collins 99: v=2, h=2
– Best F1: v=3, h=2v
[Charts: F1 (66%–80%) and number of symbols (0–25,000) over all combinations of vertical order 1–3 and horizontal order 0–∞]
Model | F1 | Size
Base: v=h=2v | 77.8 | 7.5K
Unary Splits
• Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
• Solution: mark unary rewrite sites with -U.
Annotation | F1 | Size
Base | 77.8 | 7.5K
UNARY | 78.3 | 8.0K
Tag Splits
• Problem: Treebank tags are too coarse.
• Example: sentential, PP, and other prepositions are all marked IN.
• Partial solution: subdivide the IN tag.
Annotation | F1 | Size
Previous | 78.3 | 8.0K
SPLIT-IN | 80.3 | 8.1K
Other Tag Splits
• UNARY-DT: mark demonstratives as DT^U ("the X" vs. "those") — F1 80.4, Size 8.1K
• UNARY-RB: mark phrasal adverbs as RB^U ("quickly" vs. "very") — F1 80.5, Size 8.1K
• TAG-PA: mark tags with non-canonical parents ("not" is an RB^VP) — F1 81.2, Size 8.5K
• SPLIT-AUX: mark auxiliary verbs with -AUX [cf. Charniak 97] — F1 81.6, Size 9.0K
• SPLIT-CC: separate "but" and "&" from other conjunctions — F1 81.7, Size 9.1K
• SPLIT-%: "%" gets its own tag — F1 81.8, Size 9.3K
Treebank Splits
• The treebank comes with annotations (e.g., -LOC, -SUBJ, etc.).
– The whole set together hurt the baseline.
– Some (-SUBJ) were less effective than our equivalents.
– One in particular was very useful (NP-TMP) when pushed down to the head tag.
– We marked gapped S nodes as well.
Annotation | F1 | Size
Previous | 81.8 | 9.3K
NP-TMP | 82.2 | 9.6K
GAPPED-S | 82.3 | 9.7K
Yield Splits
• Problem: sometimes the behavior of a category depends on something inside its future yield.
• Examples:
– Possessive NPs
– Finite vs. infinite VPs
– Lexical heads!
• Solution: annotate future elements into nodes.
Annotation | F1 | Size
Previous | 82.3 | 9.7K
POSS-NP | 83.1 | 9.8K
SPLIT-VP | 85.7 | 10.5K
Distance / Recursion Splits
• Problem: vanilla PCFGs cannot distinguish attachment heights.
• Solution: mark a property of higher or lower sites:
– Contains a verb.
– Is (non)-recursive.
• Base NPs [cf. Collins 99]
• Right-recursive NPs
[Figure: an attachment site under VP is marked v (dominates a verb); one under NP is marked -v]
Annotation | F1 | Size
Previous | 85.7 | 10.5K
BASE-NP | 86.0 | 11.7K
DOMINATES-V | 86.9 | 14.1K
RIGHT-REC-NP | 87.0 | 15.2K
A Fully Annotated Tree
Final Test Set Results
• Beats "first generation" lexicalized parsers.
Parser | LP | LR | F1 | CB | 0 CB
Magerman 95 | 84.9 | 84.6 | 84.7 | 1.26 | 56.6
Collins 96 | 86.3 | 85.8 | 86.0 | 1.14 | 59.9
Klein & M 03 | 86.9 | 85.7 | 86.3 | 1.10 | 60.3
Charniak 97 | 87.4 | 87.5 | 87.4 | 1.00 | 62.1
Collins 99 | 88.7 | 88.6 | 88.6 | 0.90 | 67.1
(CB = average crossing brackets per sentence; 0 CB = % of sentences with no crossing brackets)
Bilexical statistics are used often [Bikel 2004]
• The 1.49% use of bilexical dependencies suggests they don't play much of a role in parsing
• But the parser pursues many (very) incorrect theories
• So, instead of asking how often the decoder can use bigram probability on average, ask how often it does so while pursuing its top-scoring theory
• Answer the question by having the parser constrain-parse its own output:
– train as normal on §§02–21
– parse §00
– feed the parse trees back in as constraints
• The percentage of time the parser made use of bigram statistics shot up to 28.8%
• So bilexical statistics are used often, but their use barely affects overall parsing accuracy
• Exploratory data analysis suggests an explanation:
– distributions that include head words are usually sufficiently similar to those that do not as to make almost no difference in terms of accuracy
Charniak (2000) NAACL: A Maximum-Entropy-Inspired Parser
• There was nothing maximum entropy about it. It was a cleverly smoothed generative model
• Smoothes estimates by smoothing the ratio of conditional terms (which are a bit like maxent features):
$\frac{P(t \mid l, l_p, t_p, l_g)}{P(t \mid l, l_p, t_p)}$
• Biggest improvement is actually that the generative model predicts the head tag first and then does P(w | t, …)
– Like Collins (1999)
• Markovizes rules similarly to Collins (1999)
• Gets 90.1% LP/LR F score on sentences ≤ 40 wds
Petrov and Klein (2006): Learning Latent Annotations
Can you automatically find good symbols?
[Figure: a tree with latent symbols X1–X7 over "He was right .", with the inside and outside regions of a node marked]
• Brackets are known
• Base categories are known
• Induce subcategories
• Clever split/merge category refinement
EM algorithm, like Forward–Backward for HMMs, but constrained by the tree.
Number of phrasal subcategories
[Bar chart: number of learned subcategories (0–40) per phrasal category, decreasing roughly in the order NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST]
POS tag splits, commonest words: effectively a class-based model
Proper Nouns (NNP):
NNP-14: Oct. Nov. Sept.
NNP-12: John Robert James
NNP-2: J. E. L.
NNP-1: Bush Noriega Peters
NNP-15: New San Wall
NNP-3: York Francisco Street
Personal pronouns (PRP):
PRP-0: It He I
PRP-1: it he they
PRP-2: it them him
The Latest Parsing Results…
Parser | F1 ≤ 40 words | F1 all words
Klein & Manning unlexicalized, 2003 | 86.3 | 85.7
Matsuzaki et al. simple EM latent states, 2005 | 86.7 | 86.1
Charniak generative ("maxent inspired"), 2000 | 90.1 | 89.5
Petrov and Klein, NAACL 2007 | 90.6 | 90.1
Charniak & Johnson discriminative reranker, 2005 | 92.0 | 91.4