Lecture 10: Parsing II, University of California, San Diego

Source: idiom.ucsd.edu/~rlevy/lign256/winter2008/ppt/lecture_10_parsing_II.pdf

Probabilistic CKY

Roger Levy

[thanks to Jason Eisner]

Managing Ambiguity

•  John saw Mary
   •  Typhoid Mary
   •  Phillips screwdriver Mary
   Note how rare rules interact.

•  I see a bird
   •  Is this 4 nouns, parsed like “city park scavenger bird”?
   Rare parts of speech, plus systematic ambiguity in noun sequences.

•  Time flies like an arrow
   •  Fruit flies like a banana
   •  Time reactions like this one
   •  Time reactions like a chemist
   •  Or is it just an NP?

Our bane: Ambiguity

•  John saw Mary
   •  Typhoid Mary
   •  Phillips screwdriver Mary
   Note how rare rules interact.

•  I see a bird
   •  Is this 4 nouns, parsed like “city park scavenger bird”?
   Rare parts of speech, plus systematic ambiguity in noun sequences.

•  Time | flies like an arrow (NP VP)
   •  Fruit flies | like a banana (NP VP)
   •  Time | reactions like this one (V[stem] NP)
   •  Time reactions | like a chemist (S PP)
   •  Or is it just an NP?


How to solve this combinatorial explosion of ambiguity?

1.  First try parsing without any weird rules, throwing them in only if needed.

2.  Better: every rule has a weight. A tree’s weight is total weight of all its rules. Pick the overall lightest parse of sentence.

3.  Can we pick the weights automatically? We’ll get to this later …


The plan for the rest of parsing

•  There’s probability (inference) and then there’s statistics (model estimation)

•  These two problems are logically separate, yet interrelated in practice

•  We’ll start with the problem of efficient inference (probabilistic bottom-up parsing)

•  Then we’ll move to the estimation of good (generative) models that permit efficient bottom-up parsing

•  Finally, we’ll mention state-of-the-art work that doesn’t permit exact inference (“reranking” approaches)

The critical recursion

•  We’ll do bottom-up parsing (weighted CKY)
•  When combining constituents into a larger constituent:
   •  The weight of the new constituent is the sum of the weights of the combined subconstituents…
   •  …plus the weight of the rule used to combine the subconstituents
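The recursion above can be written as a one-line function. This is a minimal sketch; the name `combine` is hypothetical, and the numbers in the usage lines are taken from the "time flies like an arrow" example in this lecture.

```python
def combine(left_weight, right_weight, rule_weight):
    """Weight of a new constituent built from two adjacent subconstituents."""
    return left_weight + right_weight + rule_weight


# NP over "time" (weight 3) and VP over "flies" (weight 4),
# joined by the rule S -> NP VP (weight 1).
s_over_time_flies = combine(3, 4, 1)

# P over "like" (weight 2) and NP over "an arrow" (weight 10),
# joined by the rule PP -> P NP (weight 0).
pp_over_like_an_arrow = combine(2, 10, 0)
```

Lower weight is better here, so a constituent can never be lighter than the parts it was built from unless a rule weight is negative.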

Example: time 1 flies 2 like 3 an 4 arrow 5 (positions 0 to 5; cell [i,j] covers the words between positions i and j)

Grammar (weight, rule):

1  S → NP VP
6  S → Vst NP
2  S → S PP
1  VP → V NP
2  VP → VP PP
1  NP → Det N
2  NP → NP PP
3  NP → NP NP
0  PP → P NP

Initial chart (one cell per word):

[0,1] time: NP 3, Vst 3
[1,2] flies: NP 4, VP 4
[2,3] like: P 2, V 5
[3,4] an: Det 1
[4,5] arrow: N 8


Cell [0,2] (spanning "time flies") is filled by combining "time" with "flies":

•  NP 10 (NP → NP NP: 3 + 4 + 3)
•  S 8 (S → NP VP: 3 + 4 + 1)
•  S 13 (S → Vst NP: 3 + 4 + 6)

Cell [3,5] (spanning "an arrow"):

•  NP 10 (NP → Det N: 1 + 8 + 1)

Cell [2,5] (spanning "like an arrow"):

•  PP 12 (PP → P NP: 2 + 10 + 0)
•  VP 16 (VP → V NP: 5 + 10 + 1)

Cell [1,5] (spanning "flies like an arrow"):

•  NP 18 (NP → NP PP: 4 + 12 + 2)
•  S 21 (S → NP VP: 4 + 16 + 1)
•  VP 18 (VP → VP PP: 4 + 12 + 2)

Cell [0,5] (spanning the whole sentence), splitting after "time":

•  NP 24 (NP → NP NP: 3 + 18 + 3)
•  S 22 (S → NP VP: 3 + 18 + 1)
•  S 27 (S → Vst NP: 3 + 18 + 6)

Cell [0,5], splitting after "time flies":

•  NP 24 (NP → NP PP: 10 + 12 + 2)
•  S 27 (S → NP VP: 10 + 16 + 1)
•  S 22 (S → S PP: 8 + 12 + 2)
•  S 27 (S → S PP: 13 + 12 + 2)

Completed chart for "time flies like an arrow" (cell [i,j] covers the words between positions i and j):

[0,1] NP 3, Vst 3
[0,2] NP 10, S 8, S 13
[0,5] NP 24, S 22, S 27, NP 24, S 27, S 22, S 27
[1,2] NP 4, VP 4
[1,5] NP 18, S 21, VP 18
[2,3] P 2, V 5
[2,5] PP 12, VP 16
[3,4] Det 1
[3,5] NP 10
[4,5] N 8

Follow backpointers from the best S …

Following backpointers from the best S (weight 22) over the whole sentence, one rule at a time:

•  S → NP VP
•  VP → VP PP
•  PP → P NP
•  NP → Det N

Which entries do we need?

•  A heavier entry is not worth keeping when a lighter entry with the same label covers the same span …
•  … since it just breeds worse options (“inferior stock”). For example, S 13 over "time flies" loses to S 8, and it only leads to S 27 over the whole sentence where S 8 leads to S 22.
•  Keep only best-in-class!

Pruned chart for "time flies like an arrow" (lightest entry per label in each cell; cell [i,j] covers the words between positions i and j):

[0,1] NP 3, Vst 3
[0,2] NP 10, S 8
[0,5] NP 24, S 22
[1,2] NP 4, VP 4
[1,5] NP 18, S 21, VP 18
[2,3] P 2, V 5
[2,5] PP 12, VP 16
[3,4] Det 1
[3,5] NP 10
[4,5] N 8

Keep only best-in-class! (and backpointers so you can recover the parse)
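A minimal sketch of weighted CKY with best-in-class pruning and backpointers, run on the lecture's example grammar. The rule and word weights are the ones shown on the slides; the names `RULES`, `LEXICON`, `cky`, and `tree` are hypothetical, and ties are broken by keeping the first derivation found, which is an assumption rather than something the slides specify.

```python
from collections import defaultdict

# Binary rules as (parent, left, right) -> weight, copied from the slides.
RULES = {
    ("S", "NP", "VP"): 1, ("S", "Vst", "NP"): 6, ("S", "S", "PP"): 2,
    ("VP", "V", "NP"): 1, ("VP", "VP", "PP"): 2,
    ("NP", "Det", "N"): 1, ("NP", "NP", "PP"): 2, ("NP", "NP", "NP"): 3,
    ("PP", "P", "NP"): 0,
}
# Word-level chart entries, copied from the slides.
LEXICON = {
    "time": {"NP": 3, "Vst": 3}, "flies": {"NP": 4, "VP": 4},
    "like": {"P": 2, "V": 5}, "an": {"Det": 1}, "arrow": {"N": 8},
}


def cky(words):
    """Fill the chart bottom-up, keeping the lightest entry per label per cell."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][label] = lightest weight
    back = {}                 # back[(i, j, label)] = (split, left, right)
    for i, word in enumerate(words):
        best[(i, i + 1)] = dict(LEXICON[word])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for (parent, left, right), rule_w in RULES.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        w = best[(i, k)][left] + best[(k, j)][right] + rule_w
                        # Strict "<" keeps the first of several tied entries.
                        if w < best[(i, j)].get(parent, float("inf")):
                            best[(i, j)][parent] = w
                            back[(i, j, parent)] = (k, left, right)
    return best, back


def tree(words, back, i, j, label):
    """Recover a bracketed parse by following backpointers."""
    if (i, j, label) not in back:  # word-level entry
        return f"({label} {words[i]})"
    k, left, right = back[(i, j, label)]
    return (f"({label} {tree(words, back, i, k, left)} "
            f"{tree(words, back, k, j, right)})")


words = "time flies like an arrow".split()
best, back = cky(words)
parse = tree(words, back, 0, len(words), "S")
```

Because only one entry per label survives in each cell, the chart stays small even when the number of full parses grows quickly with sentence length.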

Probabilistic Trees

•  Instead of lightest weight tree, take highest probability tree
•  Given any tree, your assignment 1 generator would have some probability of producing it!
•  Just like using n-grams to choose among strings …
•  What is the probability of this tree?
•  You rolled a lot of independent dice …

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )


Chain rule: One word at a time

p(time flies like an arrow) = p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)

Chain rule + backoff (to get trigram model)

Each word is conditioned on at most the two words before it:

p(time flies like an arrow) ≈ p(time) * p(flies | time) * p(like | time flies) * p(an | flies like) * p(arrow | like an)
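To make the backoff step concrete, here is a small sketch that lists the conditioning context of each factor, first for the full chain rule and then for a trigram model. The function name `chain_rule_factors` and its `history` parameter are hypothetical.

```python
def chain_rule_factors(words, history=None):
    """Return (word, context) pairs, one per factor p(word | context).

    history=None keeps the whole prefix (exact chain rule);
    history=2 keeps only the two previous words (trigram backoff).
    """
    factors = []
    for i, word in enumerate(words):
        start = 0 if history is None else max(0, i - history)
        factors.append((word, tuple(words[start:i])))
    return factors


words = "time flies like an arrow".split()
full = chain_rule_factors(words)
trigram = chain_rule_factors(words, history=2)
```

The first three factors are identical in both lists; only "an" and "arrow" lose part of their context under the trigram approximation.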


Chain rule – written differently

p(time flies like an arrow) = p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an)

Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Chain rule + backoff

Dropping everything but the two most recent words of the context gives the trigram model in the same notation:

p(time flies like an arrow) ≈ p(time) * p(time flies | time) * p(time flies like | time flies) * p(flies like an | flies like) * p(like an arrow | like an)

Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Chain rule: One node at a time

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  = p( (S NP VP) | S )
  * p( (S (NP time) VP) | (S NP VP) )
  * p( (S (NP time) (VP VP PP)) | (S (NP time) VP) )
  * p( (S (NP time) (VP (VP flies) PP)) | (S (NP time) (VP VP PP)) )
  * …

Chain rule + backoff

Same product, one node at a time, but each factor backs off to condition only on the node being expanded and ignores the rest of the partial tree:

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  ≈ p( (S NP VP) | S )
  * p( (NP time) | NP )
  * p( (VP VP PP) | VP )
  * p( (VP flies) | VP )
  * …

Simplified notation

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  = p(S → NP VP | S)
  * p(NP → time | NP)
  * p(VP → VP PP | VP)
  * p(VP → flies | VP)
  * …

Already have a CKY alg for weights …

w( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) )
  = w(S → NP VP)
  + w(NP → time)
  + w(VP → VP PP)
  + w(VP → flies)
  + …

Just let w(X → Y Z) = -log p(X → Y Z | X)
Then lightest tree has highest prob
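A short sketch of the correspondence between weights and probabilities. It assumes base-2 logarithms, which is what the 2^-22 style figures elsewhere in the lecture suggest; the function names `to_weight` and `to_prob` are hypothetical.

```python
import math


def to_weight(prob):
    """Negative log (base 2) probability."""
    return -math.log2(prob)


def to_prob(weight):
    """Inverse of to_weight."""
    return 2.0 ** -weight


# Three factors with probabilities 2^-8, 2^-12 and 2^-2.
probs = [2.0 ** -8, 2.0 ** -12, 2.0 ** -2]

product_of_probs = math.prod(probs)                 # multiply probabilities
sum_of_weights = sum(to_weight(p) for p in probs)   # add weights
```

Since -log is monotonically decreasing, minimizing the summed weight and maximizing the product of probabilities pick out the same tree.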

Reading the chart for "time flies like an arrow" with each weight w as a probability 2^-w:

•  S over "time flies" (weight 8): 2^-8
•  PP over "like an arrow" (weight 12): 2^-12
•  Rule S → S PP (weight 2): 2^-2
•  Multiply to get 2^-22: 2^-8 * 2^-12 * 2^-2 = 2^-22
•  The other S over "time flies" (weight 13) has probability 2^-13, lower than 2^-8

Need only best-in-class to get best parse

Why probabilities not weights?

•  We just saw probabilities are really just a special case of weights …
•  … but we can estimate them from training data by counting and smoothing!
•  Warning: What kind of training data do we need for this type of estimation?
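A minimal sketch of the counting step, assuming the training data are already-parsed trees from which rule uses can be read off (one possible answer to the question above). It computes plain relative frequencies with no smoothing; the function name and the toy rule list are hypothetical and not from the lecture.

```python
from collections import Counter


def estimate_rule_probs(observed_rules):
    """Relative-frequency estimate of p(X -> rhs | X) from observed rule uses."""
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    return {
        (lhs, rhs): count / lhs_counts[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


# Hypothetical rule uses, as if read off a tiny set of parsed sentences.
toy_rules = [
    ("S", ("NP", "VP")), ("S", ("NP", "VP")), ("S", ("NP", "VP")),
    ("S", ("S", "PP")),
    ("NP", ("Det", "N")),
]
rule_probs = estimate_rule_probs(toy_rules)
```

A rule that never appears in the toy data gets no entry at all, so it would have probability zero; that gap is what smoothing is meant to address.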

A slightly different task

•  One task: What is probability of generating a given tree with a PCFG generator?
   •  To pick tree with highest prob: useful in parsing.
•  But could also ask: What is probability of generating a given string with the generator?
   •  To pick string with highest prob: useful in speech recognition, as substitute for an n-gram model.
   •  (“Put the file in the folder” vs. “Put the file and the folder”)
   •  To get prob of generating string, must add up probabilities of all trees for the string …

Could just add up the parse probabilities of the S entries over the whole sentence:

2^-22 + 2^-27 + 2^-27 + 2^-22 + 2^-27

Oops, back to finding exponentially many parses

Any more efficient way?

•  S over "time flies" has two entries: 2^-8 and 2^-13
•  Each combines with PP over "like an arrow" (2^-12) via S → S PP (2^-2)
•  That gives 2^-22 and 2^-27 for S over the whole sentence

Add as we go … (the “inside algorithm”)

•  S over "time flies": 2^-8 + 2^-13
•  Combining that sum with PP over "like an arrow" (2^-12) via S → S PP (2^-2) contributes 2^-22 + 2^-27 to S over the whole sentence
•  S over the whole sentence, summed over all derivations: 2^-22 + 2^-27 + 2^-27 + 2^-22 + 2^-27
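A minimal sketch of the inside algorithm on the lecture's example, treating each weight w as a probability 2^-w. It has the same loop structure as weighted CKY, but each cell sums over derivations instead of keeping the best one. The names `RULES`, `LEXICON`, `prob`, and `inside` are hypothetical, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from collections import defaultdict
from fractions import Fraction

# Binary rules as (parent, left, right) -> weight, copied from the slides.
RULES = {
    ("S", "NP", "VP"): 1, ("S", "Vst", "NP"): 6, ("S", "S", "PP"): 2,
    ("VP", "V", "NP"): 1, ("VP", "VP", "PP"): 2,
    ("NP", "Det", "N"): 1, ("NP", "NP", "PP"): 2, ("NP", "NP", "NP"): 3,
    ("PP", "P", "NP"): 0,
}
# Word-level chart entries, copied from the slides.
LEXICON = {
    "time": {"NP": 3, "Vst": 3}, "flies": {"NP": 4, "VP": 4},
    "like": {"P": 2, "V": 5}, "an": {"Det": 1}, "arrow": {"N": 8},
}


def prob(weight):
    """Probability 2^-weight as an exact fraction."""
    return Fraction(1, 2 ** weight)


def inside(words):
    """chart[(i, j)][label] = total probability of all derivations of that span."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(Fraction))  # missing entries are 0
    for i, word in enumerate(words):
        for label, w in LEXICON[word].items():
            chart[(i, i + 1)][label] += prob(w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for (parent, left, right), rule_w in RULES.items():
                    left_p = chart[(i, k)].get(left)
                    right_p = chart[(k, j)].get(right)
                    if left_p and right_p:
                        # Add as we go: sum over splits and rules.
                        chart[(i, j)][parent] += left_p * right_p * prob(rule_w)
    return chart


chart = inside("time flies like an arrow".split())
sentence_prob = chart[(0, 5)]["S"]
```

The only change from the best-parse version is replacing "keep the minimum weight" with "add the probabilities", so the running time stays polynomial in sentence length.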

PCFGs and HMMs

•  There is a natural connection between:
   •  the inside algorithm for filling out a probabilistic CKY chart
   •  and the forward algorithm for filling out an HMM trellis
•  Do you see the relationship?
•  However, bottom-up for PCFGs and left-to-right for HMMs are not the only ways to go for DPs
   •  You can go outside-in for PCFGs
   •  We’ll see left-to-right (the Earley algorithm) for PCFGs
   •  And you can go right-to-left for HMMs

Charts and lattices

•  You can equivalently represent a parse chart as a lattice constructed over some initial arcs
•  This will also set the stage for Earley parsing later

[Figure: parse chart and the equivalent lattice for "salt flies scratch", with entries and arcs labeled N, NP, V, VP, and S]

Grammar for the figure: S → NP VP, VP → V NP, VP → V, NP → N, NP → N N

(Speech) Lattices

•  There was nothing magical about words spanning exactly one position.
•  When working with speech, we generally don’t know how many words there are, or where they break.
•  We can represent the possibilities as a lattice and parse these just as easily.

[Figure: word lattice with arcs labeled "I", "awe", "of", "van", "eyes", "saw", "a", "'ve", "an", "Ivan"]