SI485i : NLP
Set 8: PCFGs and the CKY Algorithm

PCFGs
• We saw how CFGs can model English (sort of)
• Probabilistic CFGs put weights on the production rules
• NP -> DET NN with probability 0.34
• NP -> NN NN with probability 0.16

PCFGs
• We still parse sentences and come up with a syntactic derivation tree
• But now we can talk about how confident we are in the tree: P(tree)!

Buffalo Example
• What is the probability of this tree?
• It's the product of the probabilities of all the rules used to build it, e.g., P(S -> NP VP) times the probabilities of the rules in the subtrees below it (as sketched below)
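
As a concrete illustration (not from the slides), here is a minimal sketch that scores a tree by multiplying rule probabilities bottom-up; the tuple tree encoding and all the numbers are made up:

```python
# Hypothetical sketch: score a parse tree as the product of its rule probabilities.
# Trees are (label, child, child, ...) tuples; leaves are plain strings.
RULE_PROBS = {
    ("S", ("NP", "VP")): 0.9,       # illustrative numbers, not from the slides
    ("NP", ("buffalo",)): 0.001,
    ("VP", ("V", "NP")): 0.5,
    ("V", ("buffalo",)): 0.0005,
}

def tree_prob(tree):
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROBS[(label, rhs)]     # probability of the rule at this node
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)    # multiply in each subtree's probability
    return p

# P(tree) for a tiny "buffalo buffalo buffalo" analysis
print(tree_prob(("S", ("NP", "buffalo"), ("VP", ("V", "buffalo"), ("NP", "buffalo")))))
```
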
PCFG Formalized
• G = (T, N, S, R, P)
• T is a set of terminals
• N is a set of nonterminals
• For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
• S is the start symbol (one of the nonterminals)
• R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals
• P(R) gives the probability of each rule
• A grammar G generates a language model L

Some slides adapted from Chris Manning
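
A minimal Python encoding of this five-tuple, with toy rules and probabilities invented for illustration:

```python
# A toy PCFG G = (T, N, S, R, P); all rules and probabilities are illustrative.
T = {"the", "dog", "barks"}                      # terminals
N = {"S", "NP", "VP", "DET", "NN", "V"}          # nonterminals (DET, NN, V are preterminals)
S = "S"                                          # start symbol
# R with P folded in: map each rule X -> gamma to P(X -> gamma | X)
R = {
    ("S",   ("NP", "VP")): 1.0,
    ("NP",  ("DET", "NN")): 0.34,
    ("NP",  ("NN", "NN")): 0.16,
    ("NP",  ("NN",)): 0.5,
    ("DET", ("the",)): 1.0,
    ("NN",  ("dog",)): 1.0,
    ("VP",  ("V",)): 1.0,
    ("V",   ("barks",)): 1.0,
}
```
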
Some notation
• w_1n = w_1 … w_n = the word sequence from word 1 to word n
• w_ab = the subsequence w_a … w_b
• We'll write P(Ni → ζj) to mean P(Ni → ζj | Ni)
• Take note, this is a conditional probability: for instance, the probabilities of all rules headed by NP must sum to 1!
• We'll want to calculate the best tree T: max_T P(T ⇒* w_ab)
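
Because each rule probability is conditioned on its head nonterminal, the rules sharing a left-hand side must sum to 1. A quick sanity check over the toy R dict sketched above (the check_pcfg name is ours):

```python
from collections import defaultdict

# Group rule probabilities by their left-hand-side nonterminal and
# verify each group sums to 1 (R is the toy rule dict sketched above).
def check_pcfg(R, tol=1e-9):
    totals = defaultdict(float)
    for (lhs, _rhs), p in R.items():
        totals[lhs] += p
    for lhs, total in totals.items():
        assert abs(total - 1.0) < tol, f"rules headed by {lhs} sum to {total}"
```
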
Trees and Probabilities
• P(t): the probability of a tree is the product of the probabilities of the rules used to generate it
• P(w_1n): the probability of the string is the sum of the probabilities of all possible trees that have the string as their yield
• P(w_1n) = Σ_j P(w_1n, t_j), where t_j is a parse of w_1n
          = Σ_j P(t_j)
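
Rendered directly (and inefficiently) in code, assuming a list of candidate parse trees in the encoding used by tree_prob above:

```python
# P(w_1n) = sum of P(t_j) over all parses t_j whose yield is w_1n.
def string_prob(parses):
    return sum(tree_prob(t) for t in parses)
```
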
Example PCFG
(Figures: the example PCFG and its derivations, continued over several slides.)

P(tree) computation
(Figure: worked P(tree) computation.)

Time to Parse
• Let's parse!!
• Almost ready…
• Trees must be in Chomsky Normal Form first.

Chomsky Normal Form
• All rules are Z -> X Y or Z -> w
• Transforming a grammar to CNF does not change its weak generative capacity
• Remove all unary rules and empties
• Transform n-ary rules: VP -> V NP PP becomes VP -> V @VP-V and @VP-V -> NP PP (a sketch of this step follows below)
• Why do we do this? Parsing is easier now.
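
Here is a minimal sketch of just the n-ary binarization step, using the @ naming convention from the slide. Unary and empty removal are omitted; in a PCFG, the original rule's probability would go on the first new rule, with the introduced @ rules given probability 1.0.

```python
# Binarize n-ary rules right-branching: X -> A B C ... becomes
# X -> A @X-A and @X-A -> B C ... (recursively), as in VP -> V @VP-V.
def binarize(rules):
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new_sym = f"@{lhs}-{rhs[0]}"       # intermediate symbol, e.g. @VP-V
            out.append((lhs, (rhs[0], new_sym)))
            lhs, rhs = new_sym, rhs[1:]        # continue with the remainder
        out.append((lhs, rhs))
    return out

print(binarize([("VP", ("V", "NP", "PP"))]))
# [('VP', ('V', '@VP-V')), ('@VP-V', ('NP', 'PP'))]
```
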
Converting into CNF
(Figure: worked CNF conversion example.)

The CKY Algorithm
• Cocke-Kasami-Younger (CKY)
• Dynamic programming is back!

The CKY Algorithm
• NP -> NN NNS (0.13): p = 0.13 × .0023 × .0014 ≈ 4.19 × 10^-7
• NP -> NNP NNS (0.056): p = 0.056 × .001 × .0014 = 7.84 × 10^-8
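
Each candidate entry for a cell multiplies the rule probability by the best probabilities already stored in the two child cells; the two candidates above, spelled out:

```python
# Score of a candidate: P(rule) * P(left child cell) * P(right child cell).
cand1 = 0.13  * 0.0023 * 0.0014   # NP -> NN NNS   -> ~4.19e-07
cand2 = 0.056 * 0.001  * 0.0014   # NP -> NNP NNS  -> 7.84e-08
best = max(cand1, cand2)          # keep the higher-scoring NP for this cell
```
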
The CKY Algorithm
• What is the runtime? O(n^3) in sentence length (times a grammar-size constant): there are O(n^2) cells, and each cell tries O(n) split points
• Note that each cell must check all pairs of children below it
• Binarizing the CFG rules is a must: with n-ary rules the number of child combinations per cell explodes (a sketch of the algorithm follows below)
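
A compact probabilistic CKY sketch for a grammar already in CNF; the dictionary-based grammar format here is an assumption for illustration, not the course's code:

```python
from collections import defaultdict

def cky(words, lexical, binary):
    """Probabilistic CKY for a PCFG in Chomsky Normal Form.

    lexical: dict word -> list of (TAG, prob) for rules TAG -> word
    binary:  dict (Y, Z) -> list of (X, prob) for rules X -> Y Z
    Returns chart, back: chart[(i, j)] maps each label to the best
    probability of that label spanning words[i:j]; back holds backpointers.
    """
    n = len(words)
    chart = defaultdict(dict)
    back = {}
    for i, w in enumerate(words):                # width-1 spans: preterminals
        for tag, p in lexical.get(w, []):
            chart[(i, i + 1)][tag] = p
    for width in range(2, n + 1):                # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):            # try every split point k
                for y, py in chart[(i, k)].items():
                    for z, pz in chart[(k, j)].items():
                        for x, pr in binary.get((y, z), []):
                            p = pr * py * pz     # P(rule) * P(left) * P(right)
                            if p > chart[(i, j)].get(x, 0.0):
                                chart[(i, j)][x] = p
                                back[(i, j, x)] = (k, y, z)
    return chart, back
```

The three nested loops over i, k, and j are where the O(n^3) comes from; the inner loops over cell entries and matching rules contribute the grammar-size constant.
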
Evaluating CKY
• How do we know if our parser works?
• Count the number of correct labels in your table: a constituent counts as correct only if both the label and the span it dominates match the gold tree
• [ label, start, finish ]
• Most trees have an error or two!
• Count how many spans are correct vs. wrong, and compute precision and recall over these labeled spans (a sketch follows below)
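
A hedged sketch of labeled-span evaluation, assuming the gold and predicted trees have already been flattened into (label, start, finish) triples:

```python
# Labeled-constituent precision/recall over (label, start, finish) triples.
def evaluate(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)                   # spans matching in label AND extent
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
pred = {("NP", 0, 2), ("VP", 1, 5), ("S", 0, 5)}
print(evaluate(gold, pred))   # (0.666..., 0.666...): the mis-spanned VP is wrong
```
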
Probabilities?
• Where do the probabilities come from?
• P( NP -> DT NN ) = ???
• The Penn Treebank: a corpus of newspaper articles whose sentences have been manually annotated with full parse trees
• Relative-frequency (maximum likelihood) estimate: P( NP -> DT NN ) = C( NP -> DT NN ) / C( NP )
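
The counting itself is a few lines; this sketch assumes treebank trees in the tuple encoding used earlier and is illustrative rather than the course's code:

```python
from collections import Counter

# Maximum likelihood rule probabilities from a treebank:
# P(X -> gamma) = C(X -> gamma) / C(X).
def estimate_pcfg(trees):
    rule_counts, lhs_counts = Counter(), Counter()
    def visit(tree):
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1           # count this rule occurrence
        lhs_counts[label] += 1                   # count its head nonterminal
        for c in children:
            if not isinstance(c, str):
                visit(c)
    for t in trees:
        visit(t)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
```
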