CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 29 – CYK; Inside Probability; Parse Tree Construction)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 22nd March, 2011


Page 1:

CS460/626: Natural Language Processing/Speech, NLP and the Web

(Lecture 29 – CYK; Inside Probability; Parse Tree Construction)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

22nd March, 2011

Page 2:

Penn POS Tags

[John/NNP] wrote/VBD [those/DT words/NNS] in/IN [the/DT Book/NN] of/IN [Proverbs/NNS]

• John wrote those words in the Book of Proverbs.

Page 3:

Penn Treebank

(S (NP-SBJ (NP John))
   (VP wrote
       (NP those words)
       (PP-LOC in
               (NP (NP-TTL (NP the Book)
                           (PP of (NP Proverbs)))))))

• John wrote those words in the Book of Proverbs.

Page 4:

PSG Parse Tree

• Official trading in the shares will start in Paris on Nov 6.

[Figure: phrase-structure parse tree of the sentence; node labels in the figure include S, NP, VP, PP, A, P, V and Aux]

Page 5:

Penn POS Tags

[Official/JJ trading/NN] in/IN [the/DT shares/NNS] will/MD start/VB in/IN [Paris/NNP] on/IN [Nov./NNP 6/CD]

• Official trading in the shares will start in Paris on Nov 6.

Page 6:

Penn POS Tag Set

• Adjective: JJ
• Adverb: RB
• Cardinal Number: CD
• Determiner: DT
• Preposition: IN
• Coordinating Conjunction: CC
• Subordinating Conjunction: IN
• Singular Noun: NN
• Plural Noun: NNS
• Personal Pronoun: PRP
• Proper Noun: NNP
• Verb base form: VB
• Modal verb: MD
• Verb (3sg Pres): VBZ
• Wh-determiner: WDT
• Wh-pronoun: WP

Page 7:

CYK Parsing

(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)

Page 8:

Shared Sub-Problems

• Observation: ambiguous parses still share sub-trees
• We don’t want to redo work that’s already been done
• Unfortunately, naïve backtracking leads to duplicate work

Page 9:

Shared Sub-Problems: Example

Page 10:

Efficient Parsing

• Dynamic programming to the rescue!
• Intuition: store partial results in tables, thereby:
  - Avoiding repeated work on shared sub-problems
  - Efficiently storing ambiguous structures with shared sub-parts
• Two algorithms:
  - CKY: roughly, bottom-up
  - Earley: roughly, top-down

Page 11:

CKY Parsing: CNF

• CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form
  - All rules of the form: A → B C or A → a
  - Example rules: A → B C, D → w
  - What does the tree look like?
• What if my CFG isn’t in CNF?

Page 12:

CKY Parsing with Arbitrary CFGs

• Problem: my grammar has rules like VP → NP PP PP
  - Can’t apply CKY!
• Solution: rewrite the grammar into CNF
  - Introduce new intermediate non-terminals into the grammar
  - A → B C D becomes A → X D and X → B C (where X is a symbol that doesn’t occur anywhere else in the grammar)
• What does this mean? = weak equivalence
  - The rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
  - But the resulting derivations (trees) are different
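The rewrite of A → B C D into A → X D and X → B C can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(lhs, rhs)` rule representation, the function name `binarize`, and the generated names `X1`, `X2`, ... are hypothetical and are assumed not to clash with symbols already in the grammar. Unary and ε rules, which a full CNF conversion must also handle, are left untouched here.

```python
from itertools import count


def binarize(rules):
    """Rewrite rules with more than two right-hand-side symbols into binary rules.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    A -> B C D becomes X1 -> B C and A -> X1 D.
    """
    fresh = count(1)  # source of new non-terminal names (assumed unused elsewhere)
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"
            out.append((new_sym, rhs[:2]))   # X -> B C
            rhs = (new_sym,) + rhs[2:]       # A -> X D ...
        out.append((lhs, rhs))
    return out


abcd = binarize([("A", ("B", "C", "D"))])
vp = binarize([("VP", ("NP", "PP", "PP"))])
```

The rewritten grammar generates the same strings, but the intermediate symbols show up in the trees, which is why the equivalence is only weak.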

Page 13:

CKY Parsing: Intuition

• Consider the rule D → w
  - Terminal (word) forms a constituent
  - Trivial to apply
• Consider the rule A → B C
  - If there is an A somewhere in the input then there must be a B followed by a C in the input
  - First, precisely define span [i, j]
  - If A spans from i to j in the input then there must be some k such that i < k < j, with B spanning [i, k] and C spanning [k, j]
  - Easy to apply: we just need to try different values for k

Page 14:

CKY Parsing: Table

• Any constituent can conceivably span [i, j] for all 0 ≤ i < j ≤ N, where N = length of input string
  - We need an N × N table to keep track of all spans…
  - But we only need half of the table
• Semantics of table: cell [i, j] contains A iff A spans i to j in the input string
  - Of course, must be allowed by the grammar!

Page 15:

CKY Parsing: Table-Filling

• In order for A to span [i, j]:
  - A → B C is a rule in the grammar, and
  - There must be a B in [i, k] and a C in [k, j] for some i < k < j
• Operationally:
  - To apply rule A → B C, look for a B in [i, k] and a C in [k, j]
  - In the table: look left in the row and down in the column

Page 16:

CKY Algorithm
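The algorithm itself did not survive in this transcript, so the following is a minimal sketch of a CKY recognizer that follows the table semantics of the preceding slides, not a reproduction of the original slide. The function name `cky_recognize` and the dictionary-based grammar encoding (`lexical`, `binary`) are assumptions made for illustration; the grammar is assumed to be in CNF.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: dict mapping a word to the set of A with a rule A -> word
    binary:  dict mapping (B, C) to the set of A with a rule A -> B C
    """
    n = len(words)
    # table[i][j] holds the non-terminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):                  # fill one column at a time
        table[j - 1][j] |= lexical.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):         # move up the column
            for k in range(i + 1, j):          # try every split point
                for b in table[i][k]:          # look left in the row
                    for c in table[k][j]:      # look down in the column
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]


toy_lexical = {"a": {"A"}, "b": {"B"}}
toy_binary = {("A", "B"): {"S"}}
accepted = cky_recognize(["a", "b"], toy_lexical, toy_binary)
rejected = cky_recognize(["b", "a"], toy_lexical, toy_binary)
```

The three nested loops over j, i and k are where the cubic running time in the sentence length comes from.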

Page 17:

CKY Parsing: Recognize or Parse

• Is this really a parser?
• Recognizer to parser: add backpointers!

Page 18:

CKY: Algorithmic Complexity

• What’s the asymptotic complexity of CKY?
  - O(n^3) in the sentence length n: there are O(n^2) cells, and each cell tries O(n) split points k

Page 19:

CKY: Analysis

• Since it’s bottom-up, CKY populates the table with a lot of “phantom constituents”
  - Spans that are constituents, but cannot really occur in the context in which they are suggested
• Conversion of the grammar to CNF adds additional non-terminal nodes
  - Leads to weak equivalence wrt the original grammar
  - The additional non-terminal nodes are not (linguistically) meaningful, but can be cleaned up with post-processing
• Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?
  - Yes: Earley parsing

Page 20:

Penn Treebank

( (S (NP-SBJ (NP Official trading)
             (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))))

• Official trading in the shares will start in Paris on Nov 6.

Page 21:

Probabilistic Context Free Grammars

Rule             Probability
S → NP VP        1.0
NP → DT NN       0.5
NP → NNS         0.3
NP → NP PP       0.2
PP → P NP        1.0
VP → VP PP       0.6
VP → VBD NP      0.4

DT → the         1.0
NN → gunman      0.5
NN → building    0.5
VBD → sprayed    1.0
NNS → bullets    1.0
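In a PCFG, the probabilities of all rules that share a left-hand side must sum to 1. The sketch below encodes the rules exactly as listed on the slide and checks that property; the tuple representation and the helper name `lhs_totals` are assumptions made for illustration.

```python
from collections import defaultdict
from math import isclose

# Rules as listed on the slide: (left-hand side, right-hand side, probability).
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5),
    ("NP", ("NNS",), 0.3),
    ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6),
    ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5),
    ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0),
    ("NNS", ("bullets",), 1.0),
]


def lhs_totals(rules):
    """Sum the rule probabilities for each left-hand-side symbol."""
    totals = defaultdict(float)
    for lhs, _rhs, p in rules:
        totals[lhs] += p
    return dict(totals)


totals = lhs_totals(RULES)
normalized = all(isclose(total, 1.0) for total in totals.values())
```

For example, the three NP rules contribute 0.5 + 0.3 + 0.2 and the two VP rules 0.6 + 0.4.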

Page 22:

Example Parse t1

The gunman sprayed the building with bullets.

(S 1.0
   (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
   (VP 0.6
       (VP 0.4 (VBD 1.0 sprayed)
               (NP 0.5 (DT 1.0 the) (NN 0.5 building)))
       (PP 1.0 (P 1.0 with)
               (NP 0.3 (NNS 1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Page 23:

Another Parse t2

The gunman sprayed the building with bullets.

(S 1.0
   (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
   (VP 0.4 (VBD 1.0 sprayed)
           (NP 0.2
               (NP 0.5 (DT 1.0 the) (NN 0.5 building))
               (PP 1.0 (P 1.0 with)
                       (NP 0.3 (NNS 1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
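The probability of a parse is the product of the probabilities of all rules used in it. As a check on the arithmetic, here is a small sketch that multiplies the rule probabilities of both parses; the nested-tuple tree encoding and the name `tree_prob` are assumptions made for illustration. Multiplying the factors listed for each tree gives 0.0045 for t1 and 0.0015 for t2.

```python
from math import isclose, prod

# Each node is (label, rule probability, children...); a leaf child is a plain string.
t1 = ("S", 1.0,
      ("NP", 0.5, ("DT", 1.0, "The"), ("NN", 0.5, "gunman")),
      ("VP", 0.6,
       ("VP", 0.4, ("VBD", 1.0, "sprayed"),
        ("NP", 0.5, ("DT", 1.0, "the"), ("NN", 0.5, "building"))),
       ("PP", 1.0, ("P", 1.0, "with"),
        ("NP", 0.3, ("NNS", 1.0, "bullets")))))

t2 = ("S", 1.0,
      ("NP", 0.5, ("DT", 1.0, "The"), ("NN", 0.5, "gunman")),
      ("VP", 0.4, ("VBD", 1.0, "sprayed"),
       ("NP", 0.2,
        ("NP", 0.5, ("DT", 1.0, "the"), ("NN", 0.5, "building")),
        ("PP", 1.0, ("P", 1.0, "with"),
         ("NP", 0.3, ("NNS", 1.0, "bullets"))))))


def tree_prob(node):
    """Multiply the rule probabilities of every node in the tree."""
    if isinstance(node, str):      # a word contributes no factor of its own
        return 1.0
    _label, p, *children = node
    return p * prod(tree_prob(child) for child in children)


p_t1 = tree_prob(t1)
p_t2 = tree_prob(t2)
```

Under this grammar the verb-attachment reading t1 is therefore three times as probable as the noun-attachment reading t2.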

Page 24:

Illustrating CYK [Cocke, Younger, Kasami] Algo

Rule             Probability
S → NP VP        1.0
NP → DT NN       0.5
NP → NNS         0.3
NP → NP PP       0.2
PP → P NP        1.0
VP → VP PP       0.6
VP → VBD NP      0.4

• DT → the       1.0
• NN → gunman    0.5
• NN → building  0.5
• VBD → sprayed  1.0
• NNS → bullets  1.0

Page 25:

CYK: Start with (0,1)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT
1         -
2         -     -
3         -     -     -
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 26:

CYK: Keep filling diagonals

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT
1         -     NN
2         -     -
3         -     -     -
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 27:

CYK: Try getting higher level structures

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP
1         -     NN
2         -     -
3         -     -     -
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 28:

CYK: Diagonal continues

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP
1         -     NN
2         -     -     VBD
3         -     -     -
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 29:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -
1         -     NN    -
2         -     -     VBD
3         -     -     -
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 30:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -
1         -     NN    -
2         -     -     VBD
3         -     -     -     DT
4         -     -     -     -
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 31:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -
1         -     NN    -     -
2         -     -     VBD   -
3         -     -     -     DT
4         -     -     -     -     NN
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 32:

CYK: starts filling the 5th column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -
1         -     NN    -     -
2         -     -     VBD   -
3         -     -     -     DT    NP
4         -     -     -     -     NN
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 33:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -
1         -     NN    -     -
2         -     -     VBD   -     VP
3         -     -     -     DT    NP
4         -     -     -     -     NN
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 34:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -
1         -     NN    -     -     -
2         -     -     VBD   -     VP
3         -     -     -     DT    NP
4         -     -     -     -     NN
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 35:

CYK: S found, but NO termination!

S appears in cell (0,5), which covers only words 0 to 5; the whole sentence spans 0 to 7, so filling continues.

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S
1         -     NN    -     -     -
2         -     -     VBD   -     VP
3         -     -     -     DT    NP
4         -     -     -     -     NN
5         -     -     -     -     -
6         -     -     -     -     -     -

Page 36:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S
1         -     NN    -     -     -
2         -     -     VBD   -     VP
3         -     -     -     DT    NP
4         -     -     -     -     NN
5         -     -     -     -     -     P
6         -     -     -     -     -     -

Page 37:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -
2         -     -     VBD   -     VP    -
3         -     -     -     DT    NP    -
4         -     -     -     -     NN    -
5         -     -     -     -     -     P
6         -     -     -     -     -     -

Page 38:

CYK: Control moves to last column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -
2         -     -     VBD   -     VP    -
3         -     -     -     DT    NP    -
4         -     -     -     -     NN    -
5         -     -     -     -     -     P
6         -     -     -     -     -     -     NNS,NP

Page 39:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -
2         -     -     VBD   -     VP    -
3         -     -     -     DT    NP    -
4         -     -     -     -     NN    -
5         -     -     -     -     -     P     PP
6         -     -     -     -     -     -     NNS,NP

Page 40:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -
2         -     -     VBD   -     VP    -
3         -     -     -     DT    NP    -     NP
4         -     -     -     -     NN    -     -
5         -     -     -     -     -     P     PP
6         -     -     -     -     -     -     NNS,NP

Page 41:

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -
2         -     -     VBD   -     VP    -     VP
3         -     -     -     DT    NP    -     NP
4         -     -     -     -     NN    -     -
5         -     -     -     -     -     P     PP
6         -     -     -     -     -     -     NNS,NP

Page 42:

CYK: filling the last column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -
1         -     NN    -     -     -     -     -
2         -     -     VBD   -     VP    -     VP
3         -     -     -     DT    NP    -     NP
4         -     -     -     -     NN    -     -
5         -     -     -     -     -     P     PP
6         -     -     -     -     -     -     NNS,NP

Page 43:

CYK: terminates with S in (0,7)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7

From\To   1     2     3     4     5     6     7
0         DT    NP    -     -     S     -     S
1         -     NN    -     -     -     -     -
2         -     -     VBD   -     VP    -     VP
3         -     -     -     DT    NP    -     NP
4         -     -     -     -     NN    -     -
5         -     -     -     -     -     P     PP
6         -     -     -     -     -     -     NNS,NP

Page 44:

CYK: Extracting the Parse Tree

The parse tree is obtained by keeping back pointers.

S (0-7)
  NP (0-2)
    DT (0-1) The
    NN (1-2) gunman
  VP (2-7)
    VBD (2-3) sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4) the
        NN (4-5) building
      PP (5-7)
        P (5-6) with
        NP (6-7)
          NNS (6-7) bullets
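The chart filling and back-pointer idea can be sketched end to end as follows. This is an illustrative sketch, not the lecture's own code: the names `cky_chart` and `trees`, the dictionary encoding of the grammar, and the back-pointer format are assumptions. Two further assumptions are marked in the comments: the lexical rule P → with is inferred from the parse trees (it is not in the rule list), and the unary rule NP → NNS, which is not in CNF, is handled by a separate step after each cell is filled.

```python
from collections import defaultdict

LEXICON = {"the": ["DT"], "gunman": ["NN"], "building": ["NN"],
           "sprayed": ["VBD"], "bullets": ["NNS"],
           "with": ["P"]}            # assumption: P -> with, inferred from the trees
UNARY = {"NNS": ["NP"]}              # NP -> NNS (unary, so not strictly CNF)
BINARY = {("NP", "VP"): ["S"], ("DT", "NN"): ["NP"], ("NP", "PP"): ["NP"],
          ("P", "NP"): ["PP"], ("VP", "PP"): ["VP"], ("VBD", "NP"): ["VP"]}


def cky_chart(words):
    """Fill the chart; chart[i, j][A] lists the back pointers that built A over i..j."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))
    for j in range(1, n + 1):                      # column by column, as on the slides
        word = words[j - 1]
        for tag in LEXICON.get(word.lower(), []):
            chart[j - 1, j][tag].append(word)      # lexical back pointer: the word
        for i in range(j - 1, -1, -1):
            for k in range(i + 1, j):
                for b in list(chart[i, k]):
                    for c in list(chart[k, j]):
                        for a in BINARY.get((b, c), []):
                            chart[i, j][a].append((k, b, c))
            for b in list(chart[i, j]):            # one level of unary rules
                for a in UNARY.get(b, []):
                    chart[i, j][a].append((b,))
    return chart


def trees(chart, i, j, label):
    """Yield every bracketed tree for `label` over span (i, j) via back pointers."""
    for bp in chart[i, j][label]:
        if isinstance(bp, str):                    # lexical rule
            yield f"({label} {bp})"
        elif len(bp) == 1:                         # unary rule
            for sub in trees(chart, i, j, bp[0]):
                yield f"({label} {sub})"
        else:                                      # binary rule with split point k
            k, b, c = bp
            for left in trees(chart, i, k, b):
                for right in trees(chart, k, j, c):
                    yield f"({label} {left} {right})"


words = "The gunman sprayed the building with bullets".split()
chart = cky_chart(words)
parses = list(trees(chart, 0, len(words), "S"))
```

Because the VP in cell (2,7) has two back pointers (VBD + NP split at 3, and VP + PP split at 5), `parses` contains both readings of the sentence; the tree drawn on this slide is the one that goes through NP (3-7).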