Statistical NLP, Winter 2009


Statistical NLP, Winter 2009

Lecture 10: Parsing I

Roger Levy

Thanks to Jason Eisner & Dan Klein for slides

Why is natural language parsing hard?

• As language structure gets more abstract, computing it gets harder

• Document classification
  • finite number of classes
  • fast computation at test time

• Part-of-speech tagging (recovering label sequences)
  • Exponentially many possible tag sequences
  • But exact computation possible in O(n)

• Parsing (recovering labeled trees)
  • Exponentially many, or even infinite, possible trees
  • Exact inference worse than tagging, but still within reach

Why parsing is harder than tagging

• How many trees are there for a given string?
• Imagine a rule VP → VP

• …∞!

• This is not a problem for inferring availability of structures (why?)

• Nor is this a problem for inferring the most probable structure in a PCFG (why?)

Why parsing is harder than tagging II

• Ingredient 1: syntactic category ambiguity
  • Exponentially many category sequences, like tagging

• Ingredient 2: attachment ambiguity
  • Classic case: prepositional-phrase (PP) attachment
  • 1 PP: no ambiguity

• 2 PPs: some ambiguity

Why parsing is harder than tagging III

• 3 PPs: much more attachment ambiguity!

• 5 PPs: 14 trees, 6 PPs: 42 trees, 7 PPs: 132 trees…

Why parsing is harder than tagging IV

• Tree-structure ambiguity grows like the Catalan numbers (Knuth, 1975; Church & Patil, 1982)

• The Catalan numbers grow roughly like 4^n, which multiplies the exponential growth associated with sequence-label ambiguity
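The Catalan counts quoted above (…, 14, 42, 132, …) can be checked directly; this is a minimal sketch using the closed form C(2n, n) / (n + 1):

```python
from math import comb

def catalan(n):
    # nth Catalan number: the number of distinct binary bracketings
    # of n+1 items; grows roughly like 4**n / n**1.5.
    return comb(2 * n, n) // (n + 1)

# Consecutive Catalan numbers match the tree counts in the text.
print([catalan(n) for n in range(1, 7)])  # → [1, 2, 5, 14, 42, 132]
```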

Why parsing is still tractable

• This all makes parsing look really bad
• But there’s still hope
• Those exponentially many parses are different combinations of common subparts

How to parse tractably

• Recall that we did HMM part-of-speech tagging by storing partial results in a trellis

• An HMM is a special type of grammar with essentially two types of rules:
  • “Category Y can follow category X (with cost π)”
  • “Category X can be realized as word w (with cost η)”

• The trellis is a graph whose structure reflects its rules
  • Edges between all sequentially adjacent category pairs
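The trellis idea can be made concrete with a minimal Viterbi sketch. The two-tag model below, with its transition costs π and emission costs η, is invented purely for illustration; lower cost is better, as with negative log-probabilities:

```python
def viterbi(words, tags, pi, eta):
    """Minimal Viterbi over a trellis: best[i][t] = lowest total cost of
    any tag sequence for words[:i+1] that ends in tag t."""
    best = [{t: eta[t].get(words[0], float("inf")) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            emit = eta[t].get(w, float("inf"))
            # Each trellis edge (s -> t) carries transition cost pi[(s, t)].
            col[t] = emit + min(best[-1][s] + pi[(s, t)] for s in tags)
        best.append(col)
    return min(best[-1].values())

# Toy (hypothetical) costs for a two-tag model.
tags = ["N", "V"]
pi = {("N", "N"): 2, ("N", "V"): 1, ("V", "N"): 1, ("V", "V"): 2}
eta = {"N": {"time": 1, "flies": 2}, "V": {"time": 3, "flies": 1}}
print(viterbi(["time", "flies"], tags, pi, eta))  # → 3  (time/N, flies/V)
```

Because each column only looks back one step, the whole table is filled in time linear in the sentence length.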

How to parse tractably II

• But a (weighted) CFG has more complicated rules:
  1. “Category X can rewrite as categories α (with cost π)”
  2. “Preterminal X can be realized as word w (with cost η)”

• (2 is really a special case of 1)
• A graph is not rich enough to reflect CFG/tree structure
• Phrases need to be stored as partial results
• We also need rule combination structure

• We’ll do this with hypergraphs

How to parse tractably III

• Hypergraphs are like graphs, but have hyper-edges instead of edges

• “We observe a DT as word 1 and an NN as word 2.”
• “Together, these let us infer an NP spanning words 1–2.”

(Figure: the start state has a hyperedge allowing us to infer each observation; a single hyperedge needs both the DT and the NN to infer the NP.)

How to parse tractably IV

• Hypergraph for Bird shot flies (only partial)

(Figure: chart nodes spanning words 1–2, 2–3, and 1–3, with hyperedges leading up to the Goal node.)

Grammar:
S → NP VP
VP → V NP
VP → V
NP → N
NP → N N
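The hypergraph view can be made concrete by counting derivations over a chart, where each (rule, split point) pair is one hyperedge. The binary and unary rules below are the slide's grammar; the POS lexicon (e.g. that "shot" and "flies" are each noun/verb ambiguous) is an assumption for illustration:

```python
from collections import defaultdict

# Rules from the slide's grammar.
binary = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "N", "N")]
unary = [("VP", "V"), ("NP", "N")]
# Hypothetical POS lexicon for "Bird shot flies".
lexicon = {"bird": {"N"}, "shot": {"N", "V"}, "flies": {"N", "V"}}

def count_parses(words, goal="S"):
    n = len(words)
    # chart[i][j][X] = number of derivations of category X over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        cell = chart[i][i + 1]
        for tag in lexicon[w]:
            cell[tag] = 1
        for lhs, rhs in unary:  # unary rules only fire over single words here
            cell[lhs] += cell[rhs]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # each (rule, split) is a hyperedge
                for lhs, b, c in binary:
                    chart[i][j][lhs] += chart[i][k][b] * chart[k][j][c]
    return chart[0][n][goal]

# Ambiguous: [bird] [shot flies] vs. [bird shot] [flies]
print(count_parses(["bird", "shot", "flies"]))  # → 2
```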

How to parse tractably V

• The nodes in the hypergraph can be thought of as being arranged in a triangle

• For a sentence of length N, this is the upper right triangle of an N×N matrix

• This matrix is called the parse chart

How to parse tractably VI

• Before we study examples of parsing, let’s linger on the hypergraph for a moment

• The goal of parsing is to fully interconnect all the evidence (words) and the goal

• This could be done from the bottom up…

• …or from the top down & left to right
• These correspond to different parsing strategies
• Today: bottom-up (later: top-down)

Bottom-up (CKY) parsing

• Bottom-up is the most straightforward efficient parsing algorithm to implement

• Known as the Cocke-Kasami-Younger (CKY) algorithm
• We’ll illustrate it for the weighted-CFG case
• Each rule has a weight (a negative log-probability) associated with it
• We’re looking for the “lightest” (lowest-weight or, equivalently, highest-probability) tree T for sentence S
• Implicitly this is Bayes’ rule!

CKY parsing II

• Here’s the (partial) weighted grammar we’ll use (weights precede rules; lower is better):

1 S → NP VP    6 S → Vst NP    2 S → S PP
1 VP → V NP    2 VP → VP PP
1 NP → Det N   2 NP → NP PP    3 NP → NP NP
0 PP → P NP

• The lexicon, with weights:

3 NP → time    3 Vst → time
4 NP → flies   4 VP → flies
2 P → like     5 V → like
1 Det → an     8 N → arrow

(Vst is the imperative-verb reading of “time”, as in “Do the dishes!”)

• The sentence we’ll parse (see the ambiguity?):

time 1 flies 2 like 3 an 4 arrow 5

• Seeding the chart from the lexicon (cell [i,j] covers the words between positions i and j; entries are category plus weight):

[0,1]: NP 3, Vst 3
[1,2]: NP 4, VP 4
[2,3]: P 2, V 5
[3,4]: Det 1
[4,5]: N 8

• The chart now fills in bottom-up, shortest spans first. For each cell [i,j], each split point k, and each binary rule, we combine an entry from [i,k] with an entry from [k,j], adding the rule’s weight. The completed chart:

[0,1]: NP 3, Vst 3
[1,2]: NP 4, VP 4
[2,3]: P 2, V 5
[3,4]: Det 1
[4,5]: N 8
[0,2]: NP 10 (NP NP), S 8 (NP VP), S 13 (Vst NP)
[3,5]: NP 10 (Det N)
[2,5]: PP 12 (P NP), VP 16 (V NP)
[1,5]: NP 18 (NP PP), S 21 (NP VP), VP 18 (VP PP)
[0,5]: NP 24, S 22, S 27 (each reachable by more than one derivation, with different backpointers)
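The chart fill can be reproduced with a short weighted-CKY sketch. This is a minimal implementation, not the lecture's code; the grammar and lexicon weights are the ones from the slides, and the lowest total weight wins:

```python
from collections import defaultdict

# Weighted grammar from the slides: (weight, LHS, RHS). Lower is better.
rules = [
    (1, "S", ("NP", "VP")), (6, "S", ("Vst", "NP")), (2, "S", ("S", "PP")),
    (1, "VP", ("V", "NP")), (2, "VP", ("VP", "PP")),
    (1, "NP", ("Det", "N")), (2, "NP", ("NP", "PP")), (3, "NP", ("NP", "NP")),
    (0, "PP", ("P", "NP")),
]
lexicon = {
    "time": [(3, "NP"), (3, "Vst")], "flies": [(4, "NP"), (4, "VP")],
    "like": [(2, "P"), (5, "V")], "an": [(1, "Det")], "arrow": [(8, "N")],
}

def cky(words):
    n = len(words)
    # chart[(i, j)][X] = (best weight, backpointer) for X over words[i:j]
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for wt, cat in lexicon[w]:
            chart[(i, i + 1)][cat] = (wt, w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for wt, lhs, (b, c) in rules:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        total = wt + chart[(i, k)][b][0] + chart[(k, j)][c][0]
                        # keep only best-in-class per (category, span)
                        if lhs not in chart[(i, j)] or total < chart[(i, j)][lhs][0]:
                            chart[(i, j)][lhs] = (total, (k, b, c))
    return chart

chart = cky("time flies like an arrow".split())
print(chart[(0, 5)]["S"][0])  # → 22, the best S weight over the whole sentence
```

Note that the inner comparison already implements the "keep only best-in-class" pruning discussed below, and the stored (split, child categories) tuples are exactly the backpointers needed to recover the tree.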

S — follow backpointers …

• Starting from the best S in cell [0,5] (weight 22), follow backpointers to recover the parse:

S → NP VP      (NP = time; VP spans flies like an arrow)
VP → VP PP     (VP = flies; PP spans like an arrow)
PP → P NP      (P = like; NP spans an arrow)
NP → Det N     (Det = an; N = arrow)

Which entries do we need?

• Only the lightest entry for each category and span. A heavier duplicate (e.g. S 13 in cell [0,2], alongside S 8) is not worth keeping … since it just breeds worse options: anything built on it could be rebuilt more cheaply on the lighter entry.

• Keep only best-in-class! (and backpointers, so you can recover the parse)

• After pruning, [0,2] keeps only NP 10 and S 8, and [0,5] keeps only NP 24 and S 22.

Computational complexity of parsing

• This approach has good space complexity
  • O(G·N²), where G is the number of categories in the grammar

• What is the time complexity of the algorithm?
  • It’s cubic in N… why?

• What about time complexity in G?
  • First, a clarification is in order
  • CFG rules can have right-hand sides of arbitrary length: X → α
  • But CKY works only with right-hand sides of length at most 2

• So we need to convert the CFG for use with CKY

Computational complexity II

• Any CFG can be transformed into a new CFG whose rules are at most binary-branching (|α| ≤ 2)
  • (Look up Chomsky normal form in the book for an example)

• This transformation is reversible with no loss of information
• It’s also possible to similarly transform weighted CFGs
• This makes CKY possible, and CKY is cubic in G
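The binarization step can be sketched as follows; the scheme for naming the intermediate categories is an arbitrary choice for illustration, and recording those names is what makes the transform reversible:

```python
def binarize(lhs, rhs):
    """Split one CFG rule LHS -> X1 X2 ... Xn (n >= 2) into binary rules
    by introducing fresh intermediate categories, as in Chomsky normal form."""
    rules, current = [], lhs
    while len(rhs) > 2:
        new = f"{current}_{rhs[0]}"          # hypothetical naming scheme
        rules.append((current, (rhs[0], new)))
        current, rhs = new, rhs[1:]
    rules.append((current, tuple(rhs)))
    return rules

# A ternary-plus rule becomes a chain of binary rules.
print(binarize("VP", ["V", "NP", "PP", "PP"]))
# An already-binary rule passes through unchanged.
print(binarize("VP", ["V", "NP"]))  # → [('VP', ('V', 'NP'))]
```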
