Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 22nd March, 2011


CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 29– CYK; Inside Probability; Parse Tree construction)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

22nd March, 2011

Penn POS Tags

[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]

• John wrote those words in the Book of Proverbs.
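A tagged line in this notation can be read mechanically. The sketch below is a hypothetical helper (parse_tagged is not part of any library); it assumes, as on the slide, that every token is written word/TAG and that square brackets group the tokens of one chunk:

```python
import re


def parse_tagged(line):
    """Split a '[ word/TAG ... ] word/TAG ...' line into chunks of (word, tag) pairs.

    A bracketed group becomes one chunk; every unbracketed token is its own chunk.
    """
    chunks = []
    # Each match is either a bracketed group (first group) or a lone word/TAG token.
    for group, single in re.findall(r"\[([^\]]*)\]|(\S+/\S+)", line):
        tokens = group.split() if group else [single]
        # rsplit on the last "/" so words such as "Nov." keep their own characters.
        chunks.append([tuple(tok.rsplit("/", 1)) for tok in tokens])
    return chunks


chunks = parse_tagged(
    "[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN "
    "[ the/DT Book/NN ] of/IN [ Proverbs/NNS ]"
)
print(chunks)
```

The same helper also reads the tagged line for "Official trading in the shares will start in Paris on Nov 6."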

Penn Treebank

(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))

• John wrote those words in the Book of Proverbs.
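Since a treebank bracketing is just nested parentheses, a few lines suffice to read it back into a tree and recover the sentence. A minimal sketch; parse_sexpr and leaves are hypothetical names, and the parser assumes balanced parentheses, with the first token after each "(" being the node label (or no label at all, as in the outer "( (S ...) )" wrapper):

```python
def parse_sexpr(text):
    """Parse a Penn Treebank-style bracketing into nested [label, *children] lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening parenthesis"
        pos += 1
        # The label is optional: "( (S ...) )" has an unlabeled outer node.
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())      # a nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return [label] + children

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """Return the words of a parsed tree from left to right."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_sexpr(
    "(S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in "
    "(NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))))))"
)
print(" ".join(leaves(tree)))
```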

PSG Parse Tree: Official trading in the shares will start in Paris on Nov 6.

[Figure: phrase-structure tree of the sentence, with nodes S, NP, VP, NAP, A, PP, P and VAux]

Penn POS Tags

[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]

• Official trading in the shares will start in Paris on Nov 6.

Penn POS Tag Set
• Adjective: JJ
• Adverb: RB
• Cardinal Number: CD
• Determiner: DT
• Preposition: IN
• Coordinating Conjunction: CC
• Subordinating Conjunction: IN
• Singular Noun: NN
• Plural Noun: NNS
• Personal Pronoun: PRP
• Proper Noun: NNP
• Verb base form: VB
• Modal verb: MD
• Verb (3sg Pres): VBZ
• Wh-determiner: WDT
• Wh-pronoun: WP

CYK Parsing

(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)

Shared Sub-Problems
• Observation: ambiguous parses still share sub-trees
• We don’t want to redo work that’s already been done
• Unfortunately, naïve backtracking leads to duplicate work

Shared Sub-Problems: Example

Efficient Parsing
• Dynamic programming to the rescue!
• Intuition: store partial results in tables, thereby:
  • Avoiding repeated work on shared sub-problems
  • Efficiently storing ambiguous structures with shared sub-parts
• Two algorithms:
  • CKY: roughly, bottom-up
  • Earley: roughly, top-down

CKY Parsing: CNF
• CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form
• All rules of the form:
  A → B C or D → w (w a terminal)
• What does the tree look like?
• What if my CFG isn’t in CNF?
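The two rule shapes above are easy to check mechanically. A small sketch, assuming rules are given as (lhs, rhs) pairs with rhs a tuple of symbols and that the set of non-terminals is known; is_cnf is a hypothetical helper:

```python
def is_cnf(rules, nonterminals):
    """True if every rule is A -> B C (two non-terminals) or A -> w (one terminal)."""
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
            continue  # binary rule A -> B C
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue  # lexical rule D -> w
        return False  # epsilon rule, unary A -> B, or right-hand side longer than 2
    return True


print(is_cnf([("S", ("NP", "VP")), ("DT", ("the",))], {"S", "NP", "VP", "DT"}))
print(is_cnf([("VP", ("NP", "PP", "PP"))], {"VP", "NP", "PP"}))
```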

CKY Parsing with Arbitrary CFGs
• Problem: my grammar has rules like VP → NP PP PP
  • Can’t apply CKY!
• Solution: rewrite grammar into CNF
  • Introduce new intermediate non-terminals into the grammar:
    A → B C D becomes A → X D and X → B C
    (where X is a symbol that doesn’t occur anywhere else in the grammar)
• What does this mean?
  • = weak equivalence
  • The rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
  • But the resulting derivations (trees) are different
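The rewrite A → B C D into A → X D, X → B C extends to any longer right-hand side by applying it repeatedly. A sketch, assuming rules are (lhs, rhs) pairs and that the fresh names X1, X2, ... do not already occur in the grammar; binarize is a hypothetical helper:

```python
from itertools import count


def binarize(rules):
    """Rewrite rules whose right-hand side is longer than 2 into binary rules.

    Each step turns A -> B C D ... into X -> B C plus A -> X D ..., where X is
    a fresh symbol (named X1, X2, ... here, an arbitrary choice).
    Rules of length <= 2 pass through unchanged.
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            x = f"X{next(fresh)}"
            out.append((x, (rhs[0], rhs[1])))  # X -> B C
            rhs = [x] + rhs[2:]                # A -> X D ...
        out.append((lhs, tuple(rhs)))
    return out


print(binarize([("VP", ("NP", "PP", "PP"))]))
```

For VP → NP PP PP this yields X1 → NP PP and VP → X1 PP. The extra X1 node has no linguistic meaning, which is why the rewritten grammar is only weakly equivalent to the original.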

CKY Parsing: Intuition
• Consider the rule D → w
  • Terminal (word) forms a constituent
  • Trivial to apply
• Consider the rule A → B C
  • If there is an A somewhere in the input, then there must be a B followed by a C in the input
  • First, precisely define span [ i, j ]
  • If A spans from i to j in the input, then there must be some k such that i < k < j, with B spanning [ i, k ] and C spanning [ k, j ]
  • Easy to apply: we just need to try different values for k

[Figure: span [ i, j ] split at k]

CKY Parsing: Table
• Any constituent can conceivably span [ i, j ] for all 0 ≤ i < j ≤ N, where N = length of input string
  • We need an N × N table to keep track of all spans…
  • But we only need half of the table
• Semantics of table: cell [ i, j ] contains A iff A spans i to j in the input string
  • Of course, must be allowed by the grammar!
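Only cells with i < j are ever used, which is why half of the table suffices. The sketch below lists those cells in one valid filling order: column by column, and bottom-up within a column, so that cells [ i, k ] and [ k, j ] are always filled before [ i, j ]. cells_in_fill_order is a hypothetical name:

```python
def cells_in_fill_order(n):
    """List chart cells (i, j), 0 <= i < j <= n, column by column, bottom-up.

    Cell (i, j) needs (i, k) for k < j (an earlier column) and (k, j) for
    k > i (lower in the same column); both come earlier in this order.
    """
    return [(i, j) for j in range(1, n + 1) for i in range(j - 1, -1, -1)]


cells = cells_in_fill_order(7)
print(len(cells), cells[:4])
```

For N = 7 this gives 28 of the 49 cells, starting (0,1), (1,2), (0,2), (2,3), which appears to match the cell order of the worked example later in this lecture.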

CKY Parsing: Table-Filling
• In order for A to span [ i, j ]:
  • A → B C is a rule in the grammar, and
  • There must be a B in [ i, k ] and a C in [ k, j ] for some i < k < j
• Operationally:
  • To apply rule A → B C, look for a B in [ i, k ] and a C in [ k, j ]
  • In the table: look left in the row and down in the column

CKY Algorithm
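A minimal sketch of a CKY recognizer consistent with the table semantics above (not the slide’s own pseudocode). The function and parameter names are hypothetical; the grammar is the lecture’s example PCFG with probabilities dropped, words lowercased, and the lexical rule P → with assumed. Because that grammar contains the unary rule NP → NNS, which strict CNF forbids, the sketch also closes each cell under unary rules:

```python
from collections import defaultdict


def cky_recognize(words, lexical, binary, unary=(), start="S"):
    """Return (accepted, chart); chart[(i, j)] is the set of labels spanning words[i:j].

    lexical: dict mapping a word to its pre-terminals (rules D -> w)
    binary:  list of (A, B, C) triples for rules A -> B C
    unary:   list of (A, B) pairs for rules A -> B (outside strict CNF;
             handled by closing each cell under these rules)
    """
    n = len(words)
    chart = defaultdict(set)

    def close_unary(cell):
        # Repeatedly add A for A -> B until no rule adds anything new.
        changed = True
        while changed:
            changed = False
            for a, b in unary:
                if b in cell and a not in cell:
                    cell.add(a)
                    changed = True

    for j in range(1, n + 1):                    # column by column
        chart[(j - 1, j)] |= set(lexical.get(words[j - 1], ()))
        close_unary(chart[(j - 1, j)])
        for i in range(j - 2, -1, -1):           # bottom-up within column j
            for k in range(i + 1, j):            # every split point i < k < j
                for a, b, c in binary:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
            close_unary(chart[(i, j)])
    return start in chart[(0, n)], chart


# Example grammar from this lecture, probabilities dropped ("P -> with" assumed).
LEXICAL = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"},
           "sprayed": {"VBD"}, "with": {"P"}, "bullets": {"NNS"}}
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "P", "NP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP")]
UNARY = [("NP", "NNS")]

words = "the gunman sprayed the building with bullets".split()
accepted, chart = cky_recognize(words, LEXICAL, BINARY, UNARY)
print(accepted, chart[(0, 7)])
```

Each chart[(i, j)] plays the role of cell [ i, j ]; the inner test "B in [ i, k ] and C in [ k, j ]" is the look-left-and-down step described above.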

CKY Parsing: Recognize or Parse
• Is this really a parser?
• Recognizer to parser: add backpointers!

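CKY: Algorithmic Complexity
• What’s the asymptotic complexity of CKY?
• O(n³), where n is the length of the input (for a fixed grammar; the number of rules enters as a multiplicative factor)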
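The cubic term comes from the triple loop over cells and split points: CKY visits one (i, k, j) combination for every 0 ≤ i < k < j ≤ n, that is C(n+1, 3) = (n+1)·n·(n-1)/6 of them, and checks each against every binary rule. A counting sketch; count_split_checks is a hypothetical name:

```python
from math import comb


def count_split_checks(n):
    """Number of (i, k, j) triples with 0 <= i < k < j <= n that CKY examines."""
    return sum(1
               for j in range(1, n + 1)
               for i in range(j - 1)        # spans of width >= 2 ending at j
               for k in range(i + 1, j))    # split points strictly inside


for n in (7, 20, 40):
    print(n, count_split_checks(n), comb(n + 1, 3))
```

For the 7-word example sentence used later in the lecture this is 56 (cell, split point) pairs.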

CKY: Analysis
• Since it’s bottom-up, CKY populates the table with a lot of “phantom constituents”
  • Spans that are constituents, but cannot really occur in the context in which they are suggested
• Conversion of grammar to CNF adds additional non-terminal nodes
  • Leads to weak equivalence with respect to the original grammar
  • Additional non-terminal nodes are not (linguistically) meaningful, but can be cleaned up with post-processing

Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?

Yes: Earley Parsing

Penn Treebank

( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))))

• Official trading in the shares will start in Paris on Nov 6.

Probabilistic Context Free Grammars

• S → NP VP 1.0
• NP → DT NN 0.5
• NP → NNS 0.3
• NP → NP PP 0.2
• PP → P NP 1.0
• VP → VP PP 0.6
• VP → VBD NP 0.4

• DT → the 1.0
• NN → gunman 0.5
• NN → building 0.5
• VBD → sprayed 1.0
• NNS → bullets 1.0
• P → with 1.0
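In a PCFG the probabilities of all rules sharing a left-hand side should sum to 1. A quick check with the grammar transcribed as Python data; lhs_totals is a hypothetical helper, and the rule P → with 1.0 is an assumption based on both example parses using P1.0 over “with”:

```python
from collections import defaultdict
from math import isclose

# The slide's PCFG as (lhs, rhs, probability) triples.
# P -> with 1.0 is assumed: both example parses use P1.0 over "with".
PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5),
    ("NP", ("NNS",), 0.3),
    ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6),
    ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5),
    ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0),
    ("NNS", ("bullets",), 1.0),
    ("P", ("with",), 1.0),
]


def lhs_totals(rules):
    """Sum the rule probabilities for each left-hand side."""
    totals = defaultdict(float)
    for lhs, _rhs, p in rules:
        totals[lhs] += p
    return dict(totals)


totals = lhs_totals(PCFG)
print(all(isclose(t, 1.0) for t in totals.values()))
```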

Example Parse t1

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.6 (VP0.4 (VBD1.0 sprayed)
                    (NP0.5 (DT1.0 the) (NN0.5 building)))
             (PP1.0 (P1.0 with)
                    (NP0.3 (NNS1.0 bullets)))))

P (t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2

The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.4 (VBD1.0 sprayed)
             (NP0.2 (NP0.5 (DT1.0 the) (NN0.5 building))
                    (PP1.0 (P1.0 with)
                           (NP0.3 (NNS1.0 bullets))))))

P (t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
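The probability of a parse is the product of the probabilities of every rule used in it. The sketch below encodes both trees as nested tuples and multiplies out their rules recursively; tree_prob and RULE_PROB are hypothetical names, words are lowercased, and P → with 1.0 is assumed because both trees use P1.0 over “with”:

```python
from math import prod

# Rule probabilities from the slide; ("P", ("with",)) is an assumed entry.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,
}


def tree_prob(tree):
    """P(tree) = product of the probabilities of the rules used at every node.

    A tree is (label, child, ...); a child that is a plain string is a word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULE_PROB[(label, rhs)] * prod(
        tree_prob(c) for c in children if not isinstance(c, str))


NP_GUNMAN = ("NP", ("DT", "the"), ("NN", "gunman"))
NP_BUILDING = ("NP", ("DT", "the"), ("NN", "building"))
PP_BULLETS = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))

t1 = ("S", NP_GUNMAN, ("VP", ("VP", ("VBD", "sprayed"), NP_BUILDING), PP_BULLETS))
t2 = ("S", NP_GUNMAN, ("VP", ("VBD", "sprayed"), ("NP", NP_BUILDING, PP_BULLETS)))

print(tree_prob(t1), tree_prob(t2))
```

Under these assumptions tree_prob(t1) is 0.0045 and tree_prob(t2) is 0.0015 (up to floating-point rounding), so t1, where the PP attaches to the VP, is the more probable reading.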

Illustrating the CYK [Cocke, Younger, Kasami] Algorithm

• S → NP VP 1.0
• NP → DT NN 0.5
• NP → NNS 0.3
• NP → NP PP 0.2
• PP → P NP 1.0
• VP → VP PP 0.6
• VP → VBD NP 0.4

• DT → the 1.0
• NN → gunman 0.5
• NN → building 0.5
• VBD → sprayed 1.0
• NNS → bullets 1.0
• P → with 1.0

CYK: Start with (0,1)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

Chart: rows are “From” positions 0 to 6, columns are “To” positions 1 to 7; cell (i,j) holds the non-terminals spanning positions i to j.

(0,1): DT (DT → the)

CYK: Keep filling diagonals

(1,2): NN (NN → gunman)

CYK: Try getting higher level structures

(0,2): NP (NP → DT NN, with DT in (0,1) and NN in (1,2))

CYK: Diagonal continues

(2,3): VBD (VBD → sprayed)

CYK (cont…)

(1,3) and (0,3): empty (no rule combines the entries available for these spans)

CYK (cont…)

(3,4): DT (DT → the)

CYK (cont…)

(2,4), (1,4), (0,4): empty
(4,5): NN (NN → building)

CYK: starts filling the 5th column

(3,5): NP (NP → DT NN, with DT in (3,4) and NN in (4,5))

CYK (cont…)

(2,5): VP (VP → VBD NP, with VBD in (2,3) and NP in (3,5))

CYK (cont…)

(1,5): empty

CYK: S found, but NO termination!

(0,5): S (S → NP VP, with NP in (0,2) and VP in (2,5))
This S covers only “The gunman sprayed the building”; the input ends at position 7, so filling continues.

CYK (cont…)

(5,6): P (P → with)

CYK (cont…)

(4,6), (3,6), (2,6), (1,6), (0,6): empty

CYK: Control moves to last column

(6,7): NNS (NNS → bullets) and NP (NP → NNS)
NP → NNS is a unary rule, so the grammar is not strictly in CNF; the chart absorbs it by adding NP to the cell that already holds NNS.

CYK (cont…)

(5,7): PP (PP → P NP, with P in (5,6) and NP in (6,7))

CYK (cont…)

(4,7): empty
(3,7): NP (NP → NP PP, with NP in (3,5) and PP in (5,7))

CYK (cont…)

(2,7): VP, found in two ways: VP → VBD NP (VBD in (2,3), NP in (3,7)) and VP → VP PP (VP in (2,5), PP in (5,7))

CYK: filling the last column

(1,7): empty

CYK: terminates with S in (0,7)

(0,7): S (S → NP VP, with NP in (0,2) and VP in (2,7)). Since S spans the whole input, the sentence is accepted.

Final chart (rows: From; columns: To; “-” marks an empty cell):

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -     S
1               NN    -     -     -     -     -
2                     VBD   -     VP    -     VP
3                           DT    NP    -     NP
4                                 NN    -     -
5                                       P     PP
6                                             NP, NNS

CYK: Extracting the Parse Tree

The parse tree is obtained by keeping back pointers.

S (0-7)
  NP (0-2)
    DT (0-1): The
    NN (1-2): gunman
  VP (2-7)
    VBD (2-3): sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4): the
        NN (4-5): building
      PP (5-7)
        P (5-6): with
        NP (6-7)
          NNS (6-7): bullets
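Keeping one back pointer per (cell, label) entry turns the recognizer into a parser. A minimal sketch, assuming lowercased words and the lecture grammar with P → with and the unary NP → NNS; cky_parse is a hypothetical name. It keeps only the first derivation found for each entry, trying split points from left to right, which for this sentence yields the tree above (the PP inside NP (3-7)); a full parser would keep every back pointer, or for a PCFG the most probable one:

```python
def cky_parse(words, lexical, binary, unary=(), start="S"):
    """Return one parse of words as a bracketed string, or None if none exists.

    back[(i, j)][A] records how A was first built over span (i, j):
      ("word", w)          lexical rule A -> w
      ("unary", B)         unary rule A -> B over the same span
      ("binary", k, B, C)  rule A -> B C with B over (i, k), C over (k, j)
    """
    n = len(words)
    back = {}

    def close_unary(cell):
        changed = True
        while changed:
            changed = False
            for a, b in unary:
                if b in cell and a not in cell:
                    cell[a] = ("unary", b)
                    changed = True

    for j in range(1, n + 1):
        w = words[j - 1]
        back[(j - 1, j)] = {d: ("word", w) for d in lexical.get(w, ())}
        close_unary(back[(j - 1, j)])
        for i in range(j - 2, -1, -1):
            cell = back[(i, j)] = {}
            for k in range(i + 1, j):          # left to right: first split wins
                for a, b, c in binary:
                    if b in back[(i, k)] and c in back[(k, j)] and a not in cell:
                        cell[a] = ("binary", k, b, c)
            close_unary(cell)

    def build(label, i, j):
        # Follow the back pointers recursively to rebuild the subtree.
        entry = back[(i, j)][label]
        if entry[0] == "word":
            return f"({label} {entry[1]})"
        if entry[0] == "unary":
            return f"({label} {build(entry[1], i, j)})"
        _, k, b, c = entry
        return f"({label} {build(b, i, k)} {build(c, k, j)})"

    return build(start, 0, n) if start in back.get((0, n), {}) else None


LEXICAL = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"},
           "sprayed": {"VBD"}, "with": {"P"}, "bullets": {"NNS"}}
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "P", "NP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP")]
UNARY = [("NP", "NNS")]

tree = cky_parse("the gunman sprayed the building with bullets".split(),
                 LEXICAL, BINARY, UNARY)
print(tree)
```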
