Parsing with Context Free Grammars
CSC 9010 Natural Language Processing
Paula Matuszek and Mary-Angela Papalaskari

This slide set was adapted from:
• Jim Martin (after Dan Jurafsky), U. Colorado
• Rada Mihalcea, University of North Texas, http://www.cs.unt.edu/~rada/CSCE5290/
• Robert Berwick, MIT
• Bonnie Dorr, University of Maryland



Parsing

Mapping from strings to structured representation

• Parsing with CFGs refers to the task of assigning correct trees to input strings
• "Correct" here means a tree that covers all and only the elements of the input and has an S at the top
• It doesn't actually mean that the system can select the correct tree from among the possible trees
• As with everything of interest, parsing involves a search, which involves making choices
• We'll start with some basic methods before moving on to more complex ones

Programming languages

max = min = grade;

// Read and process the rest of the grades
while (grade >= 0)
{
    count++;
    sum += grade;
    if (grade > max)
        max = grade;
    else if (grade < min)
        min = grade;
    System.out.print("Enter the next grade (-1 to quit): ");
    grade = Keyboard.readInt();
}

• Easy to parse
• Designed that way

Natural Languages

max = min = grade Read and process the rest of the grades while (grade >= 0) count++ sum += grade if (grade > max) max = grade else if (grade < min) min = grade System.out.print("Enter the next grade (-1 to quit): ") grade = Keyboard.readInt()

• No ( ) { } [ ] to indicate scope and precedence
• Lots of overloading (arity varies)
• Grammar isn't known in advance
• Context-free grammar is not the best formalism

Some assumptions

• You have all the words already in some buffer
• The input isn't POS tagged
• We won't worry about morphological analysis
• All the words are known

Top-Down Parsing

• Since we're trying to find trees rooted with an S (Sentence), start with the rules that give us an S
• Then work your way down from there to the words

Top Down Space


Bottom-Up Parsing

• Of course, we also want trees that cover the input words, so start with trees that link up with the words in the right way
• Then work your way up from there

Bottom-Up Space


Top-Down VS Bottom-Up

• Top-down
  – Only searches for trees that can be answers
  – But suggests trees that are not consistent with the words
  – Guarantees that the tree starts with S as root
  – Does not guarantee that the tree will match the input words
• Bottom-up
  – Only forms trees consistent with the words
  – But suggests trees that make no sense globally
  – Guarantees that the tree matches the input words
  – Does not guarantee that the parse tree will lead to S as a root
• Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Top-Down Depth-First Left-to-Right Search

Example (cont'd)

(parse-tree diagrams omitted: successive steps of the depth-first, left-to-right search, expanding down toward the word "flight")

Bottom-Up Filtering


Possible Problem: Left-Recursion

What happens in the following situation?

S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
…

With a sentence starting with:

Did the flight…
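The trouble can be seen in a few lines: a naive top-down expander that always tries the first rule loops on NP -> NP PP, because it re-predicts NP at the same input position without ever consuming a word. A minimal sketch (the grammar fragment and the depth limit are my own additions, the limit just makes the non-termination observable):

```python
# Hypothetical toy grammar: the first NP rule is left-recursive.
GRAMMAR = {
    "NP": [["NP", "PP"], ["Det", "Nominal"]],
}

def expand(symbol, depth=0, limit=25):
    """Naive top-down expansion: always try the first rule first.
    On a left-recursive rule this re-expands the same symbol at the
    same input position, so depth grows without consuming any input."""
    if depth > limit:
        return "gave up: left-recursive loop"
    first = GRAMMAR[symbol][0][0]        # leftmost symbol of the first rule
    if first in GRAMMAR:                 # non-terminal: recurse before
        return expand(first, depth + 1)  # reading any input at all
    return "reached a terminal"

print(expand("NP"))  # NP -> NP PP -> NP PP PP -> ... never reads a word
```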

Solution: Rule Ordering

S -> Aux NP VP
S -> NP VP
NP -> Det Nominal
NP -> NP PP

The key for the NP is that you want the recursive option after any base case.

Avoiding Repeated Work

Parsing is hard and slow. It's wasteful to redo stuff over and over and over.

Consider an attempt to top-down parse the following as an NP:

A flight from Indianapolis to Houston on TWA

(parse-tree diagrams omitted: the same "flight" NP subtree is rebuilt over and over)

Dynamic Programming

• We need a method that fills a table with partial results that:
  – Does not do (avoidable) repeated work
  – Does not fall prey to left-recursion
  – Solves an exponential problem in (approximately) polynomial time

Earley Parsing

• Fills a table in a single sweep over the input words
• The table is length N+1, where N is the number of words
• Table entries represent:
  – Completed constituents and their locations
  – In-progress constituents
  – Predicted constituents

States

The table entries are called states, and are represented with dotted rules:

S -> · VP              A VP is predicted
NP -> Det · Nominal    An NP is in progress
VP -> V NP ·           A VP has been found

States/Locations

It would be nice to know where these things are in the input, so…

S -> · VP [0,0]              A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]           A VP has been found starting at 0 and ending at 3
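These dotted rules with spans map naturally onto a small record type. One possible sketch (field and method names are my own, not from the slides):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule, e.g. "VP"
    rhs: tuple    # right-hand side symbols, e.g. ("V", "NP")
    dot: int      # position of the dot within rhs
    start: int    # input position where this constituent begins
    end: int      # input position the dot has reached

    def complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.complete() else self.rhs[self.dot]

# The three example states from the slide:
predicted   = State("S",  ("VP",),            0, 0, 0)  # S -> . VP [0,0]
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)  # NP -> Det . Nominal [1,2]
found       = State("VP", ("V", "NP"),        2, 0, 3)  # VP -> V NP . [0,3]

print(predicted.next_symbol(), in_progress.next_symbol(), found.complete())
# VP Nominal True
```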

Graphically


Earley

• As with most dynamic programming approaches, the answer is found by looking in the table in the right place
• In this case, there should be an S state in the final column that spans from 0 to N+1 and is complete
• If that's the case, you're done:
  – S -> α · [0,N+1]
• So sweep through the table from 0 to N+1…
  – New predicted states are created by states in the current chart
  – New incomplete states are created by advancing existing states as new constituents are discovered
  – New complete states are created in the same way

Earley

• More specifically…
  1. Predict all the states you can up front
  2. Read a word
  3. Extend states based on matches
  4. Add new predictions
  5. Go to 2
  6. When out of words, look at chart entry N+1 to see if you have a winner

Earley and Left Recursion

• So Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search
  – Never place a state into the chart that's already there
  – Copy states before advancing them

S -> NP VP
NP -> NP PP

• The first rule predicts S -> · NP VP [0,0], which adds NP -> · NP PP [0,0] and stops there, since adding any subsequent prediction would be fruitless
• When a state gets advanced, make a copy and leave the original alone
  – Say we have NP -> · NP PP [0,0]
  – We find an NP from 0 to 2, so we create NP -> NP · PP [0,2]
  – But we leave the original state as is

Predictor

Given a state:
– with a non-terminal to the right of the dot
– that is not a part-of-speech category
create a new state for each expansion of the non-terminal, and place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor, looking at:

S -> · VP [0,0]

results in:

VP -> · Verb [0,0]
VP -> · Verb NP [0,0]
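The predictor step can be sketched as a function over such states. Here a state is a tuple (lhs, rhs, dot, start, end); the representation, grammar fragment, and POS set are assumptions of mine, not the slides' notation:

```python
# A state is (lhs, rhs, dot, start, end); hypothetical representation.
GRAMMAR = {
    "S":  [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS_TAGS = {"Verb", "Det", "Noun"}  # pre-terminal categories

def predictor(state, chart):
    """For a non-POS non-terminal after the dot, add one fresh
    prediction per expansion, starting and ending where the dot is."""
    lhs, rhs, dot, start, end = state
    sym = rhs[dot]
    if sym in POS_TAGS:          # the predictor only fires on non-POS symbols
        return
    for expansion in GRAMMAR.get(sym, []):
        new = (sym, expansion, 0, end, end)
        if new not in chart[end]:
            chart[end].append(new)

chart = [[("S", ("VP",), 0, 0, 0)]]   # S -> . VP [0,0]
predictor(chart[0][0], chart)
for s in chart[0]:
    print(s)   # adds VP -> . Verb [0,0] and VP -> . Verb NP [0,0]
```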

Scanner

Given a state:
– with a non-terminal to the right of the dot
– that is a part-of-speech category
if the next word in the input matches this part of speech:
– create a new state with the dot moved over the non-terminal
– insert it in the next chart entry

So the scanner, looking at:

VP -> · Verb NP [0,0]

if the next word, "book", can be a verb, adds the new state:

VP -> Verb · NP [0,1]

to the chart entry following the current one.

Note: the Earley algorithm uses top-down input to disambiguate POS: only a POS predicted by some state can get added to the chart.
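The scanner step, in the same assumed tuple representation (the toy lexicon is mine; note "book" is deliberately POS-ambiguous):

```python
# A state is (lhs, rhs, dot, start, end); hypothetical representation.
LEXICON = {"book": {"Verb", "Noun"}}   # "book" can be a verb or a noun

def scanner(state, words, chart):
    """If the POS category after the dot matches the next word,
    add the advanced state to the *next* chart entry."""
    lhs, rhs, dot, start, end = state
    pos = rhs[dot]
    if end < len(words) and pos in LEXICON.get(words[end], set()):
        chart[end + 1].append((lhs, rhs, dot + 1, start, end + 1))

words = ["book", "that", "flight"]
chart = [[("VP", ("Verb", "NP"), 0, 0, 0)], [], [], []]
scanner(chart[0][0], words, chart)
print(chart[1])   # VP -> Verb . NP [0,1]
```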

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category:

• copy the state
• move the dot
• insert it in the current chart entry

Given:
NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]

Add:
VP -> Verb NP · [0,3]
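The completer step, again in the assumed tuple representation, reproducing the slide's example:

```python
# A state is (lhs, rhs, dot, start, end); hypothetical representation.
def completer(state, chart):
    """A complete state advances every earlier state that was
    waiting for its category at the position where it starts."""
    lhs, rhs, dot, start, end = state
    for (l2, r2, d2, s2, e2) in list(chart[start]):
        if d2 < len(r2) and r2[d2] == lhs:
            new = (l2, r2, d2 + 1, s2, end)   # copy the state, move the dot
            if new not in chart[end]:
                chart[end].append(new)        # insert in current chart entry

# Given NP -> Det Nominal . [1,3] and VP -> Verb . NP [0,1]:
chart = [[], [("VP", ("Verb", "NP"), 1, 0, 1)], [],
         [("NP", ("Det", "Nominal"), 2, 1, 3)]]
completer(chart[3][0], chart)
print(chart[3][-1])   # VP -> Verb NP . [0,3]
```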

Earley: how do we know we are done?

Find an S state in the final column that spans from 0 to N+1 and is complete:

S -> α · [0,N+1]

Earley

So sweep through the table from 0 to N+1…

– New predicted states are created by starting top-down from S
– New incomplete states are created by advancing existing states as new constituents are discovered
– New complete states are created in the same way

Earley

More specifically…
1. Predict all the states you can up front
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to 2
6. Look at chart entry N+1 to see if you have a winner

Example

Book that flight

We should find… an S from 0 to 3 that is a completed state.

Example (cont'd)

(chart diagrams for "Book that flight" omitted)

A simple example

Chart[0]
γ  -> · S        [0,0]  (dummy start state)
S  -> · NP VP    [0,0]  (predictor)
NP -> · N        [0,0]  (predictor)

Chart[1]
N  -> I ·        [0,1]  (scan)
NP -> N ·        [0,1]  (completer)
S  -> NP · VP    [0,1]  (completer)
VP -> · V NP     [1,1]  (predictor)

Chart[2]
V  -> saw ·      [1,2]  (scan)
VP -> V · NP     [1,2]  (completer)
NP -> · N        [2,2]  (predictor)

Chart[3]
N  -> Mary ·     [2,3]  (scan)
NP -> N ·        [2,3]  (completer)
VP -> V NP ·     [1,3]  (completer)
S  -> NP VP ·    [0,3]  (completer)

Grammar:
S -> NP VP
NP -> N
VP -> V NP

Lexicon: N -> I | saw | Mary;  V -> saw

Input: I saw Mary

Sentence accepted.
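The trace above can be reproduced with a compact recognizer. This is only a sketch of the algorithm as described, not the textbook pseudocode: the state representation and the GAMMA dummy symbol are my choices, and the scanner here advances directly over POS categories rather than adding separate N -> word states:

```python
GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    def add(state, i):
        if state not in chart[i]:   # never add a state that's already there
            chart[i].append(state)
    add(("GAMMA", ("S",), 0, 0, 0), 0)          # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                  # chart[i] grows as we iterate
            lhs, rhs, dot, start, end = state
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in POS:                  # scanner
                    if i < n and sym in LEXICON.get(words[i], set()):
                        add((lhs, rhs, dot + 1, start, i + 1), i + 1)
                else:                           # predictor
                    for exp in GRAMMAR[sym]:
                        add((sym, exp, 0, i, i), i)
            else:                               # completer
                for (l2, r2, d2, s2, e2) in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2, i), i)
    return ("GAMMA", ("S",), 1, 0, n) in chart[n]

print(earley_recognize(["I", "saw", "Mary"]))   # True
print(earley_recognize(["saw", "Mary"]))        # False
```

Note that this returns only accept/reject: it is a recognizer, exactly as the next slide points out.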

What is it?

What kind of parser did we just describe? (trick question)
An Earley parser… yes.
But not a parser: a recognizer.

The presence of an S state with the right attributes in the right place indicates a successful recognition.

But no parse tree… no parser. That's how we solve (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser.
Augment the "Completer" to point to where we came from.

Augmenting the chart with structural information

(chart diagram omitted: states S8–S13 linked by the backpointers the Completer adds)

Retrieving Parse Trees from Chart

All the possible parses for an input are in the table. We just need to read off all the backpointers from every complete S in the last column of the table:
– Find all the S -> X · [0,N+1]
– Follow the structural traces from the Completer
Of course, this won't be polynomial time, since there could be an exponential number of trees.
But we can at least represent ambiguity efficiently.

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
– Never place a state into the chart that's already there
– Copy states before advancing them

Earley and Left Recursion 1

S -> NP VP
NP -> NP PP

The predictor, given the first rule, predicts:

S -> · NP VP [0,0]

which adds:

NP -> · NP PP [0,0]

and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion 2

When a state gets advanced, make a copy and leave the original alone…

Say we have NP -> · NP PP [0,0].
We find an NP from 0 to 2, so we create NP -> NP · PP [0,2].
But we leave the original state as is.

Dynamic Programming Approaches

Earley
– Top-down, no filtering, no restriction on grammar form

CYK
– Bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF)

The details are not important:
– bottom-up vs. top-down
– with or without filters
– with restrictions on grammar form or not
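For contrast, a minimal CYK recognizer for the same toy sentence. The CNF conversion (folding the unit production NP -> N into terminal rules) and the rule tables are my own:

```python
from itertools import product

# Hand-converted CNF version of the toy grammar:
# S -> NP VP, VP -> V NP, plus terminal rules per word.
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
TERMINAL = {"I": {"NP"}, "saw": {"V", "NP"}, "Mary": {"NP"}}

def cyk_recognize(words):
    n = len(words)
    # table[i][j]: set of categories spanning words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(TERMINAL.get(w, set()))
    for span in range(2, n + 1):          # bottom-up, by span length
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return "S" in table[0][n]

print(cyk_recognize(["I", "saw", "Mary"]))   # True
```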

Slide 1

Parsing

Mapping from strings to structured representation

bull Parsing with CFGs refers to the task of assigning correct trees to input strings

bull Correct here means a tree that covers all and only the elements of the input and has an S at the top

bull It doesnrsquot actually mean that the system can select the correct tree from among the possible trees

bull As with everything of interest parsing involves a search which involves the making of choices

bull Wersquoll start with some basic methods before moving on to more complex ones

Slide 1

Programming languages

max = min = grade

Read and process the rest of the grades

while (grade gt= 0)

count++

sum += grade

if (grade gt max)

max = grade

else

if (grade lt min)

min = grade

Systemoutprint (Enter the next grade (-1 to quit) )

grade = KeyboardreadInt ()

bull Easy to parsebull Designed that way

Slide 1

Natural Languages

max = min = grade Read and process the rest of the grades while (grade gt= 0)

count++ sum += grade if (grade gt max) max = grade else if (grade lt min) min =

grade Systemoutprint (Enter the next grade (-1 to quit) ) grade =

KeyboardreadInt ()

bull No ( ) [ ] to indicate scope and precedence

bull Lots of overloading (arity varies)

bull Grammar isnrsquot known in advance

bullContext-free grammar is not the best formalism

Slide 1

Some assumptions

bull You have all the words already in some buffer

bull The input isnrsquot pos tagged

bull We wonrsquot worry about morphological analysis

bull All the words are known

Slide 1

Top-Down Parsing

bull Since wersquore trying to find trees rooted with an S (Sentences) start with the rules that give us an S

bull Then work your way down from there to the words

Slide 1

Top Down Space

Slide 1

Bottom-Up Parsing

bull Of course we also want trees that cover the input words So start with trees that link up with the words in the right way

bull Then work your way up from there

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Programming languages

max = min = grade

Read and process the rest of the grades

while (grade gt= 0)

count++

sum += grade

if (grade gt max)

max = grade

else

if (grade lt min)

min = grade

Systemoutprint (Enter the next grade (-1 to quit) )

grade = KeyboardreadInt ()

bull Easy to parsebull Designed that way

Slide 1

Natural Languages

max = min = grade Read and process the rest of the grades while (grade gt= 0)

count++ sum += grade if (grade gt max) max = grade else if (grade lt min) min =

grade Systemoutprint (Enter the next grade (-1 to quit) ) grade =

KeyboardreadInt ()

bull No ( ) [ ] to indicate scope and precedence

bull Lots of overloading (arity varies)

bull Grammar isnrsquot known in advance

bullContext-free grammar is not the best formalism

Slide 1

Some assumptions

bull You have all the words already in some buffer

bull The input isnrsquot pos tagged

bull We wonrsquot worry about morphological analysis

bull All the words are known

Slide 1

Top-Down Parsing

bull Since wersquore trying to find trees rooted with an S (Sentences) start with the rules that give us an S

bull Then work your way down from there to the words

Slide 1

Top Down Space

Slide 1

Bottom-Up Parsing

bull Of course we also want trees that cover the input words So start with trees that link up with the words in the right way

bull Then work your way up from there

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Natural Languages

max = min = grade Read and process the rest of the grades while (grade >= 0) count++ sum += grade if (grade > max) max = grade else if (grade < min) min = grade System.out.print ("Enter the next grade (-1 to quit): ") grade = Keyboard.readInt()

• No ( ) [ ] to indicate scope and precedence
• Lots of overloading (arity varies)
• Grammar isn't known in advance
• Context-free grammar is not the best formalism

Some assumptions

• You have all the words already, in some buffer
• The input isn't POS-tagged
• We won't worry about morphological analysis
• All the words are known

Top-Down Parsing

• Since we're trying to find trees rooted with an S (sentences), start with the rules that give us an S.
• Then work your way down from there to the words.

Top Down Space


Bottom-Up Parsing

• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.

Bottom-Up Space


Top-Down VS Bottom-Up

• Top-down
– Only searches for trees that can be answers
– But suggests trees that are not consistent with the words
– Guarantees that the tree starts with S as root
– Does not guarantee that the tree will match the input words

• Bottom-up
– Only forms trees consistent with the words
– Suggests trees that make no sense globally
– Guarantees that the tree matches the input words
– Does not guarantee that the parse tree will lead to S as a root

• Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Top-Down Depth-First Left-to-Right Search


Example (cont'd)

Example (cont'd)

Example (cont'd)

Bottom-Up Filtering


Possible Problem Left-Recursion

What happens in the following situation?

S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
...

with a sentence starting with "Did the flight..."?
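To make the loop concrete, here is a small sketch (not from the slides) of a naive top-down, depth-first recognizer over a fragment of this grammar; the grammar fragment, lexicon, and function name are illustrative. Because NP -> NP PP is tried first, expanding NP immediately re-expands NP, and the search never reaches a word:

```python
# Naive top-down, depth-first recognizer (illustrative sketch).
# The grammar is deliberately partial: with the left-recursive rule
# NP -> NP PP tried first, the search loops before consuming any input.
GRAMMAR = {
    "S":  [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nominal"]],   # recursive option first
}
LEXICON = {"did": "Aux", "the": "Det", "flight": "Nominal"}

def expand(cats, words):
    """Try to derive exactly `words` from the category sequence `cats`."""
    if not cats:
        return not words
    first, rest = cats[0], cats[1:]
    if first in GRAMMAR:
        # Depth-first: NP -> NP PP recurses without consuming a word
        return any(expand(exp + rest, words) for exp in GRAMMAR[first])
    # Pre-terminal: must match the next word's category
    return bool(words) and LEXICON.get(words[0]) == first and expand(rest, words[1:])

try:
    expand(["S"], "did the flight".split())
except RecursionError:
    print("left recursion: the top-down search never terminates")
```

Python's recursion limit stands in here for "runs forever": the expansion NP -> NP PP -> NP PP PP -> ... grows without ever touching the input.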

Solution Rule Ordering

S -> Aux NP VP
S -> NP VP
NP -> Det Nominal
NP -> NP PP

The key for the NP is that you want the recursive option after any base case.

Avoiding Repeated Work

Parsing is hard and slow. It's wasteful to redo stuff over and over and over.

Consider an attempt to top-down parse the following as an NP:

A flight from Indianapolis to Houston on TWA


Dynamic Programming

• We need a method that fills a table with partial results that:
– Does not do (avoidable) repeated work
– Does not fall prey to left-recursion
– Solves an exponential problem in (approximately) polynomial time

Earley Parsing

Fills a table in a single sweep over the input words:

• The table is length N+1, where N is the number of words
• Table entries represent:
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents

States

The table entries are called states, and are represented with dotted rules:

S -> · VP            A VP is predicted
NP -> Det · Nominal  An NP is in progress
VP -> V NP ·         A VP has been found

StatesLocations

It would be nice to know where these things are in the input, so:

S -> · VP [0,0]            A VP is predicted at the start of the sentence
NP -> Det · Nominal [1,2]  An NP is in progress; the Det goes from 1 to 2
VP -> V NP · [0,3]         A VP has been found starting at 0 and ending at 3
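One direct way to represent these dotted-rule states in code; the tuple layout below is an illustrative choice, not notation from the slides:

```python
# A state is (lhs, rhs, dot position, (start, end)); the dot index says
# how much of the right-hand side has been recognized so far.
predicted   = ("S",  ("VP",),            0, (0, 0))  # S -> . VP [0,0]
in_progress = ("NP", ("Det", "Nominal"), 1, (1, 2))  # NP -> Det . Nominal [1,2]
found       = ("VP", ("V", "NP"),        2, (0, 3))  # VP -> V NP . [0,3]

def is_complete(state):
    lhs, rhs, dot, span = state
    return dot == len(rhs)   # the dot has passed the whole right-hand side

assert not is_complete(predicted)
assert not is_complete(in_progress)
assert is_complete(found)
```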

Graphically


Earley

• As with most dynamic programming approaches, the answer is found by looking in the table in the right place.

• In this case, there should be an S state in the final column that spans from 0 to N and is complete.

• If that's the case, you're done:
– S -> α · [0,N]

• So sweep through the table from 0 to N:
– New predicted states are created by states in the current chart
– New incomplete states are created by advancing existing states as new constituents are discovered
– New complete states are created in the same way

Earley

• More specifically:
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. Look at the final column to see if you have a winner

Earley and Left Recursion

• So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search:
– Never place a state into the chart that's already there
– Copy states before advancing them

S -> NP VP
NP -> NP PP

• The first rule predicts S -> · NP VP [0,0], which adds NP -> · NP PP [0,0] and stops there, since adding the same prediction again would be fruitless.

• When a state gets advanced, make a copy and leave the original alone:
– Say we have NP -> · NP PP [0,0]
– We find an NP from 0 to 2, so we create NP -> NP · PP [0,2]
– But we leave the original state as is

Predictor

Given a state
• with a non-terminal to the right of the dot
• that is not a part-of-speech category,
create a new state for each expansion of the non-terminal, and place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor, looking at

S -> · VP [0,0]

results in

VP -> · Verb [0,0]
VP -> · Verb NP [0,0]

Scanner

Given a state
• with a non-terminal to the right of the dot
• that is a part-of-speech category,
if the next word in the input matches this part of speech:
– create a new state with the dot moved over the non-terminal
– insert it in the next chart entry

So the scanner, looking at

VP -> · Verb NP [0,0]

if the next word, "book", can be a verb, adds the new state

VP -> Verb · NP [0,1]

to the chart entry following the current one.

Note: the Earley algorithm uses top-down input to disambiguate POS: only a POS predicted by some state can get added to the chart.

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category:

• copy the state
• move the dot
• insert it in the current chart entry

Given

NP -> Det Nominal · [1,3]
VP -> Verb · NP [0,1]

add

VP -> Verb NP · [0,3]

Earley how do we know we are done

Find an S state in the final column that spans from 0 to N and is complete:

S -> α · [0,N]

Earley

So sweep through the table from 0 to N:

• New predicted states are created by starting top-down from S
• New incomplete states are created by advancing existing states as new constituents are discovered
• New complete states are created in the same way

Earley

More specifically:
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. Look at the final column to see if you have a winner

Example

Book that flight

We should find an S from 0 to 3 that is a completed state.

Example (cont'd)

Example (cont'd)

Example (cont'd)

A simple example

Chart[0]
γ -> · S [0,0]        (dummy start state)
S -> · NP VP [0,0]    (predictor)
NP -> · N [0,0]       (predictor)

Chart[1]
N -> I · [0,1]        (scanner)
NP -> N · [0,1]       (completer)
S -> NP · VP [0,1]    (completer)
VP -> · V NP [1,1]    (predictor)

Chart[2]
V -> saw · [1,2]      (scanner)
VP -> V · NP [1,2]    (completer)
NP -> · N [2,2]       (predictor)

Chart[3]
N -> Mary · [2,3]     (scanner)
NP -> N · [2,3]       (completer)
VP -> V NP · [1,3]    (completer)
S -> NP VP · [0,3]    (completer)

Grammar

S -> NP VP
NP -> N
VP -> V NP

Lexicon: N -> I | saw | Mary    V -> saw

Input: I saw Mary

Sentence accepted.
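The recognizer just traced can be sketched in a few lines of Python. The grammar and lexicon are the example's; one simplification (an assumption of this sketch, not the slides) is that the scanner advances straight over the POS category instead of adding separate lexical states such as N -> I ·, so the chart contents differ slightly from the hand trace:

```python
# A sketch of the Earley recognizer, with the example's toy grammar.
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("N",)],
    "VP": [("V", "NP")],
}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]      # N+1 columns
    def add(col, state):
        if state not in chart[col]:         # never add a duplicate state
            chart[col].append(state)
    add(0, ("GAMMA", ("S",), 0, 0))         # dummy start state
    for col in range(n + 1):
        for lhs, rhs, dot, start in chart[col]:   # grows while we iterate
            if dot < len(rhs) and rhs[dot] not in POS:
                # Predictor: expand the non-terminal right of the dot
                for expansion in GRAMMAR[rhs[dot]]:
                    add(col, (rhs[dot], expansion, 0, col))
            elif dot < len(rhs):
                # Scanner: move the dot if the next word has this POS
                if col < n and rhs[dot] in LEXICON.get(words[col], ()):
                    add(col + 1, (lhs, rhs, dot + 1, start))
            else:
                # Completer: advance every state waiting for this category
                for lhs2, rhs2, dot2, start2 in chart[start]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(col, (lhs2, rhs2, dot2 + 1, start2))
    # Done: a complete S spanning [0,N] sits in the last column
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])

print(earley_recognize("I saw Mary".split()))  # -> True
```

The `for ... in chart[col]` loop deliberately iterates a list that grows while it is processed, which is how the three operations run to fixity within each column.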

What is it

What kind of parser did we just describe? (A trick question.)

An Earley parser... yes. But not a parser: a recognizer.

The presence of an S state with the right attributes in the right place indicates a successful recognition.

But no parse tree... no parser. That's how we solve (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser: augment the Completer to point to where we came from.

Augmenting the chart with structural information


Retrieving Parse Trees from Chart

All the possible parses for an input are in the table; we just need to read off all the backpointers from every complete S in the last column of the table:

• Find all the S -> α · [0,N]
• Follow the structural traces from the Completer

Of course, this won't be polynomial time, since there could be an exponential number of trees. But we can at least represent ambiguity efficiently.
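One way to add those pointers, as a sketch (the representation is an illustrative assumption; the toy grammar is the earlier example's): each state carries the subtrees built so far, the scanner attaches a word leaf, and the completer attaches the completed constituent. Keeping only the first derivation per state yields a single tree; representing the full ambiguity would need a list of backpointers per state.

```python
# Earley with child pointers: states are ((lhs, rhs, dot, start), kids),
# where kids is a tuple of subtrees and a tree is (category, children).
GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def earley_parse(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    seen = [set() for _ in range(n + 1)]    # dedup on the state core only
    def add(col, core, kids):
        if core not in seen[col]:           # keep the first derivation found
            seen[col].add(core)
            chart[col].append((core, kids))
    add(0, ("GAMMA", ("S",), 0, 0), ())
    for col in range(n + 1):
        for (lhs, rhs, dot, start), kids in chart[col]:
            if dot < len(rhs) and rhs[dot] not in POS:
                for expansion in GRAMMAR[rhs[dot]]:       # predictor
                    add(col, (rhs[dot], expansion, 0, col), ())
            elif dot < len(rhs):
                if col < n and rhs[dot] in LEXICON.get(words[col], ()):
                    # scanner: attach the word as a leaf (POS, word)
                    add(col + 1, (lhs, rhs, dot + 1, start),
                        kids + ((rhs[dot], words[col]),))
            else:
                tree = (lhs, kids)          # a completed constituent
                for (l2, r2, d2, s2), k2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:    # completer
                        add(col, (l2, r2, d2 + 1, s2), k2 + (tree,))
    for (lhs, rhs, dot, start), kids in chart[n]:
        if lhs == "S" and dot == len(rhs) and start == 0:
            return ("S", kids)              # read the tree off the pointers
    return None
```

For "I saw Mary" this returns the nested tuple tree rooted in S, with NP and VP children; for an unrecognized string it returns None.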

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search:

• Never place a state into the chart that's already there
• Copy states before advancing them

Earley and Left Recursion 1

S -> NP VP
NP -> NP PP

The predictor, given the first rule, predicts S -> · NP VP [0,0], which predicts NP -> · NP PP [0,0] and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion 2

When a state gets advanced, make a copy and leave the original alone.

Say we have NP -> · NP PP [0,0]. We find an NP from 0 to 2, so we create

NP -> NP · PP [0,2]

but we leave the original state as is.

Dynamic Programming Approaches

Earley: top-down, no filtering, no restriction on grammar form.

CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF).

The details are not important:

• Bottom-up vs. top-down
• With or without filters
• With restrictions on grammar form or not
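For comparison, a minimal CYK recognizer. The toy grammar below is the example grammar converted to CNF by hand: the unit rule NP -> N is folded into direct lexical NP entries, an assumption made for illustration.

```python
from itertools import product

# CYK over a CNF grammar: binary rules plus lexical (terminal) rules.
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
LEXICAL = {"I": {"NP"}, "saw": {"NP", "V"}, "Mary": {"NP"}}

def cyk_recognize(words):
    n = len(words)
    # table[i][j] holds the categories that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] |= LEXICAL.get(w, set())
    for span in range(2, n + 1):            # bottom-up over span lengths
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return "S" in table[0][n]

print(cyk_recognize("I saw Mary".split()))  # -> True
```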

Some assumptions

bull You have all the words already in some buffer

bull The input isnrsquot pos tagged

bull We wonrsquot worry about morphological analysis

bull All the words are known

Slide 1

Top-Down Parsing

bull Since wersquore trying to find trees rooted with an S (Sentences) start with the rules that give us an S

bull Then work your way down from there to the words

Slide 1

Top Down Space

Slide 1

Bottom-Up Parsing

bull Of course we also want trees that cover the input words So start with trees that link up with the words in the right way

bull Then work your way up from there

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Top-Down Parsing

bull Since wersquore trying to find trees rooted with an S (Sentences) start with the rules that give us an S

bull Then work your way down from there to the words

Slide 1

Top Down Space

Slide 1

Bottom-Up Parsing

bull Of course we also want trees that cover the input words So start with trees that link up with the words in the right way

bull Then work your way up from there

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Top Down Space

Slide 1

Bottom-Up Parsing

bull Of course we also want trees that cover the input words So start with trees that link up with the words in the right way

bull Then work your way up from there

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a state with a non-terminal to the right of the dot that is not a part-of-speech category, create a new state for each expansion of that non-terminal. Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor, looking at

S -> • VP [0,0]

results in

VP -> • Verb [0,0]
VP -> • Verb NP [0,0]
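To make the predictor concrete, here is a minimal Python sketch. The tuple state encoding (lhs, rhs, dot, start, end), the toy GRAMMAR, and POS_CATEGORIES are illustrative assumptions, not the code behind these slides.

```python
# Illustrative sketch only: a state is a (lhs, rhs, dot, start, end) tuple,
# and GRAMMAR / POS_CATEGORIES are toy stand-ins, not the lecture's code.
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS_CATEGORIES = {"Verb", "Det", "Noun"}  # the predictor skips part-of-speech categories

def predictor(state, chart):
    lhs, rhs, dot, start, end = state
    next_cat = rhs[dot]                  # the non-terminal to the right of the dot
    if next_cat in POS_CATEGORIES:       # POS categories are handled by the scanner
        return
    for expansion in GRAMMAR.get(next_cat, []):
        # The new state begins and ends where the generating state ends.
        new_state = (next_cat, expansion, 0, end, end)
        if new_state not in chart[end]:
            chart[end].append(new_state)

chart = [[] for _ in range(4)]
predictor(("S", ("VP",), 0, 0, 0), chart)   # S -> . VP [0,0]
```

Running the predictor on S -> • VP [0,0] places both VP expansions, with the dot at the start, into chart[0].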

Scanner

Given a state with a non-terminal to the right of the dot that is a part-of-speech category: if the next word in the input matches this part of speech,
– create a new state with the dot moved over the non-terminal
– insert it in the next chart entry

So the scanner, looking at

VP -> • Verb NP [0,0]

if the next word "book" can be a verb, adds the new state

VP -> Verb • NP [0,1]

to the chart entry following the current one.

Note: the Earley algorithm uses its top-down predictions to disambiguate parts of speech. Only a POS predicted by some state can get added to the chart.
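The scanner can be sketched in the same style. The state tuples, word list, and lexicon are toy assumptions for illustration, not the slides' code.

```python
# Illustrative sketch of the scanner step.
def scanner(state, chart, words, lexicon):
    lhs, rhs, dot, start, end = state
    pos = rhs[dot]                                   # POS category to the right of the dot
    if end < len(words) and pos in lexicon.get(words[end], set()):
        # Move the dot over the POS and insert into the NEXT chart entry.
        new_state = (lhs, rhs, dot + 1, start, end + 1)
        if new_state not in chart[end + 1]:
            chart[end + 1].append(new_state)

words = ["book", "that", "flight"]
lexicon = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
chart = [[] for _ in range(len(words) + 1)]
scanner(("VP", ("Verb", "NP"), 0, 0, 0), chart, words, lexicon)  # VP -> . Verb NP [0,0]
```

Since "book" can be a Verb, VP -> Verb • NP [0,1] lands in chart[1], the entry following the current one.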

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category:
• copy the state
• move the dot
• insert it in the current chart entry

Given
NP -> Det Nominal • [1,3]
VP -> Verb • NP [0,1]

add
VP -> Verb NP • [0,3]
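A matching sketch of the completer, again using the illustrative tuple states rather than the lecture's own code:

```python
# Illustrative sketch of the completer step.
def completer(state, chart):
    lhs, rhs, dot, start, end = state               # dot == len(rhs): lhs found over [start, end]
    for waiting in list(chart[start]):              # earlier states ending where this one starts
        l2, r2, d2, s2, e2 = waiting
        if d2 < len(r2) and r2[d2] == lhs:          # the waiting state wants this category
            new_state = (l2, r2, d2 + 1, s2, end)   # copy, move dot, keep the original
            if new_state not in chart[end]:
                chart[end].append(new_state)

chart = [[] for _ in range(4)]
chart[1].append(("VP", ("Verb", "NP"), 1, 0, 1))       # VP -> Verb . NP [0,1]
completer(("NP", ("Det", "Nominal"), 2, 1, 3), chart)  # NP -> Det Nominal . [1,3]
```

The completed NP over [1,3] advances the waiting VP state, producing VP -> Verb NP • [0,3] in chart[3], exactly the transition on this slide.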

Earley: how do we know we are done?

Find a complete S state in the final column that spans the whole input:

S -> α • [0,N]

Earley

So sweep through the table from 0 to N:

New predicted states are created by starting top-down from S.

New incomplete states are created by advancing existing states as new constituents are discovered.

New complete states are created in the same way.

Earley

More specifically:
1. Predict all the states you can up front
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. At the end, look at chart[N] to see if you have a winner

Example

Book that flight.

We should find… an S from 0 to 3 that is a completed state.

Example (cont'd)

Example (cont'd)

Example (cont'd)

A simple example

Chart[0]
γ → • S [0,0] (dummy start state)
S → • NP VP [0,0] (predictor)
NP → • N [0,0] (predictor)

Chart[1]
N → I • [0,1] (scanner)
NP → N • [0,1] (completer)
S → NP • VP [0,1] (completer)
VP → • V NP [1,1] (predictor)

Chart[2]
V → saw • [1,2] (scanner)
VP → V • NP [1,2] (completer)
NP → • N [2,2] (predictor)

Chart[3]
N → Mary • [2,3] (scanner)
NP → N • [2,3] (completer)
VP → V NP • [1,3] (completer)
S → NP VP • [0,3] (completer)

Grammar
S → NP VP
NP → N
VP → V NP

Lexicon: N → I | saw | Mary; V → saw

Input: I saw Mary

Sentence accepted.
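The whole walkthrough above can be reproduced with a compact recognizer. This is a sketch over the same toy grammar; the tuple state encoding and helper names are illustrative assumptions, not the slides' implementation.

```python
# A compact Earley recognizer sketch for the toy grammar above.
GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    def add(state, i):
        if state not in chart[i]:       # never add a state that's already there
            chart[i].append(state)
    add(("GAMMA", ("S",), 0, 0, 0), 0)  # dummy start state
    for i in range(n + 1):
        for state in chart[i]:          # the list grows while we iterate; new items are processed too
            lhs, rhs, dot, start, end = state
            if dot < len(rhs) and rhs[dot] not in POS:
                for exp in GRAMMAR[rhs[dot]]:                         # predictor
                    add((rhs[dot], exp, 0, i, i), i)
            elif dot < len(rhs):
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    add((rhs[dot], (words[i],), 1, i, i + 1), i + 1)  # scanner
            else:
                for l2, r2, d2, s2, e2 in list(chart[start]):         # completer
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2, i), i)
    return any(s == ("S", ("NP", "VP"), 2, 0, n) for s in chart[n])

print(earley_recognize("I saw Mary".split()))  # True
```

The dedupe check in `add` is what keeps the algorithm from looping, even on left-recursive rules.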

What is it?

What kind of parser did we just describe? (Trick question.) An Earley parser… yes, but not a parser: a recognizer.

The presence of an S state with the right attributes in the right place indicates a successful recognition.

But no parse tree… no parser. That's how we solve (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser: augment the Completer to point to where we came from.

Augmenting the chart with structural information

(figure: chart states S8–S13 annotated with backpointers to the completed states that built them)

Retrieving Parse Trees from the Chart

All the possible parses for an input are in the table; we just need to read off all the backpointers from every complete S in the last column of the table:
– Find all the S → α • [0,N]
– Follow the structural traces from the Completer

Of course this won't be polynomial time, since there could be an exponential number of trees; but we can at least represent the ambiguity efficiently.

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search:
– Never place a state into the chart that's already there
– Copy states before advancing them

Earley and Left Recursion 1

S -> NP VP
NP -> NP PP

The predictor, given the first rule, produces
S -> • NP VP [0,0]

which predicts
NP -> • NP PP [0,0]

and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion 2

When a state gets advanced, make a copy and leave the original alone…

Say we have NP -> • NP PP [0,0]. We find an NP from 0 to 2, so we create
NP -> NP • PP [0,2]

but we leave the original state as is.

Dynamic Programming Approaches

Earley: top-down, no filtering, no restriction on grammar form.

CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF).

The details differ, but the design dimensions are the same: bottom-up vs. top-down, with or without filters, with or without restrictions on grammar form.
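For contrast with Earley, here is a matching sketch of CYK. The CNF grammar is an illustrative re-encoding of the toy "I saw Mary" example (with POS symbols as preterminals), not the slides' code.

```python
from itertools import product

def cyk_recognize(words, unary, binary, start="S"):
    """unary: word -> set of non-terminals; binary: (B, C) -> set of A where A -> B C."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(unary.get(w, ()))      # fill the diagonal from the lexicon
    for span in range(2, n + 1):                     # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                # try every split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return start in table[0][n]

unary = {"I": {"NP"}, "saw": {"V"}, "Mary": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
print(cyk_recognize("I saw Mary".split(), unary, binary))  # True
```

Note the inverted control flow: CYK never predicts anything top-down; it simply combines any two adjacent constituents that some CNF rule licenses.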

Bottom-Up Parsing

• Of course we also want trees that cover the input words, so start with trees that link up with the words in the right way.

• Then work your way up from there.

Bottom-Up Space

Top-Down vs. Bottom-Up

• Top-down:
– Only searches for trees that can be answers
– But suggests trees that are not consistent with the words
– Guarantees that the tree starts with S as root
– Does not guarantee that the tree will match the input words

• Bottom-up:
– Only forms trees consistent with the words
– But suggests trees that make no sense globally
– Guarantees that the tree matches the input words
– Does not guarantee that the parse will lead to an S root

• Combine the advantages of the two by doing a search constrained from both sides (top and bottom).

Top-Down Depth-First Left-to-Right Search

Example (cont'd)

Example (cont'd)

Example (cont'd)

Bottom-Up Filtering

Possible Problem: Left-Recursion

What happens in the following situation?

S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
…

with a sentence starting with:

Did the flight…

(A depth-first top-down parser that tries NP -> NP PP first keeps expanding NP into NP and never terminates.)
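The failure mode can be demonstrated in a few lines: a depth-first expander that always tries the left-recursive NP rule first never bottoms out. The toy grammar and function are illustrative, not the lecture's code.

```python
# A naive depth-first, top-down expander; illustrative only.
GRAMMAR = {"NP": [("NP", "PP"), ("Det", "Nominal")]}

def expand(symbol):
    if symbol not in GRAMMAR:
        return symbol                       # treat unknown symbols as terminals
    # Always tries the first alternative first: NP -> NP PP ...
    return expand(GRAMMAR[symbol][0][0])    # ... which asks to expand NP again

try:
    expand("NP")
    looped = False
except RecursionError:
    looped = True                           # only Python's stack limit stops it
print(looped)  # True
```

Nothing in the search itself ever terminates the recursion; the crash comes from the interpreter's stack limit, which is exactly why rule ordering (next slide) or a chart-based method is needed.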

Solution: Rule Ordering

S -> Aux NP VP
S -> NP VP
NP -> Det Nominal
NP -> NP PP

The key for the NP is that you want the recursive option after any base case.

Avoiding Repeated Work

Parsing is hard and slow; it's wasteful to redo stuff over and over and over.

Consider an attempt to top-down parse the following as an NP:

A flight from Indianapolis to Houston on TWA


Dynamic Programming

• We need a method that fills a table with partial results and that:
– Does not do (avoidable) repeated work
– Does not fall prey to left-recursion
– Solves an exponential problem in (approximately) polynomial time

Earley Parsing

Fills a table in a single sweep over the input words.
The table is of length N+1, where N is the number of words.
Table entries represent:
– Completed constituents and their locations
– In-progress constituents
– Predicted constituents

States

The table entries are called states, and are represented with dotted rules:

S -> • VP (a VP is predicted)
NP -> Det • Nominal (an NP is in progress)
VP -> V NP • (a VP has been found)

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Bottom-Up Space

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Top-Down VS Bottom-Up

bull Top-downndash Only searches for trees that can be answersndash But suggests trees that are not consistent with the wordsndash Guarantees that tree starts with S as rootndash Does not guarantee that tree will match input words

bull Bottom-upndash Only forms trees consistent with the wordsndash Suggest trees that make no sense globallyndash Guarantees that tree matches input wordsndash Does not guarantee that parse tree will lead to S as a root

bull Combine the advantages of the two by doing a search constrained from both sides (top and bottom)

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Top-Down Depth-First Left-to-Right Search

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table entries are called states, and are represented with dotted rules:

S -> . VP                  (a VP is predicted)
NP -> Det . Nominal        (an NP is in progress)
VP -> V NP .               (a VP has been found)

States/Locations

It would be nice to know where these things are in the input, so each state also records a [start, end] span:

S -> . VP [0,0]            (a VP is predicted at the start of the sentence)
NP -> Det . Nominal [1,2]  (an NP is in progress; the Det spans positions 1 to 2)
VP -> V NP . [0,3]         (a VP has been found, starting at 0 and ending at 3)

Graphically
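A dotted rule plus a span is easy to represent directly. A minimal sketch (the `State` class and its method names are illustrative, not from the slides):

```python
from dataclasses import dataclass

# Minimal sketch of an Earley state: a dotted rule plus its [start,end] span.
@dataclass(frozen=True)
class State:
    lhs: str     # left-hand side, e.g. "VP"
    rhs: tuple   # right-hand side symbols, e.g. ("V", "NP")
    dot: int     # dot position within rhs
    start: int   # input position where the constituent begins
    end: int     # input position the dot has reached

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_category(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        syms = list(self.rhs)
        syms.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(syms)} [{self.start},{self.end}]"

print(State("S", ("VP",), 0, 0, 0))              # S -> . VP [0,0]     predicted
print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # NP -> Det . Nominal [1,2]
print(State("VP", ("V", "NP"), 2, 0, 3))         # VP -> V NP . [0,3]  complete
```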

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

• Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
  - Never place a state into the chart that's already there.
  - Copy states before advancing them.

S -> NP VP
NP -> NP PP

• The first rule predicts S -> . NP VP [0,0], which adds NP -> . NP PP [0,0] and stops there, since any subsequent prediction would be redundant.
• When a state gets advanced, make a copy and leave the original alone:
  - Say we have NP -> . NP PP [0,0].
  - We find an NP from 0 to 2, so we create NP -> NP . PP [0,2].
  - But we leave the original state as is.

Predictor

Given a state with a non-terminal to the right of the dot that is not a part-of-speech category: create a new state for each expansion of that non-terminal, and place these new states in the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor, looking at:

S -> . VP [0,0]

results in:

VP -> . Verb [0,0]
VP -> . Verb NP [0,0]

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category:
• copy the state,
• move the dot,
• insert it in the current chart entry.

Given:
NP -> Det Nominal . [1,3]
VP -> Verb . NP [0,1]

add:
VP -> Verb NP . [0,3]
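With predictor, scanner, and completer in hand, a whole recognizer fits in a few dozen lines. The sketch below is a simplified illustration (the toy grammar, lexicon, tuple-based states, and the GAMMA dummy start symbol are assumptions, not the slides' code); its scanner matches part-of-speech categories directly against a small lexicon.

```python
# Compact Earley recognizer sketch: predictor / scanner / completer.
GRAMMAR = {
    "S":       [("NP", "VP"), ("VP",)],
    "NP":      [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP":      [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Det", "Noun", "Verb"}

def recognize(words):
    n = len(words)
    # A state is (lhs, rhs, dot, start); chart[i] holds states ending at i.
    chart = [[] for _ in range(n + 1)]

    def add(i, state):
        if state not in chart[i]:       # never add a state already there
            chart[i].append(state)

    add(0, ("GAMMA", ("S",), 0, 0))     # dummy start state
    for i in range(n + 1):
        for state in chart[i]:          # chart[i] may grow as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] not in POS:        # PREDICTOR
                for expansion in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], expansion, 0, i))
            elif dot < len(rhs):                              # SCANNER
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:                                             # COMPLETER
                for olhs, orhs, odot, ostart in chart[start]:
                    if odot < len(orhs) and orhs[odot] == lhs:
                        add(i, (olhs, orhs, odot + 1, ostart))
    return ("GAMMA", ("S",), 1, 0) in chart[n]

print(recognize("book that flight".split()))  # True
```

Iterating over `chart[i]` while `add` appends to it is what lets each entry act as its own agenda: newly predicted and completed states get processed in the same pass.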

Earley: how do we know we are done?

Find a complete S state in the final column that spans the whole input:

S -> α . [0,N]

Earley

So sweep through the table from 0 to N:

- New predicted states are created by starting top-down from S.
- New incomplete states are created by advancing existing states as new constituents are discovered.
- New complete states are created in the same way.

More specifically:

1. Predict all the states you can up front.
2. Read a word.
3. Extend states based on matches.
4. Add new predictions.
5. Go to step 2.
6. At the end, look at chart[N] to see if you have a winner.

Example

Book that flight.

We should find an S from 0 to 3 that is a completed state.

Example (cont'd)

[Figures: the chart for "Book that flight", built up entry by entry.]

A simple example

Grammar:
S -> NP VP
NP -> N
VP -> V NP

Lexicon: N -> I | saw | Mary; V -> saw

Input: "I saw Mary"

Chart[0]
γ -> . S [0,0]       (dummy start state)
S -> . NP VP [0,0]   (predictor)
NP -> . N [0,0]      (predictor)

Chart[1]
N -> I . [0,1]       (scanner)
NP -> N . [0,1]      (completer)
S -> NP . VP [0,1]   (completer)
VP -> . V NP [1,1]   (predictor)

Chart[2]
V -> saw . [1,2]     (scanner)
VP -> V . NP [1,2]   (completer)
NP -> . N [2,2]      (predictor)

Chart[3]
N -> Mary . [2,3]    (scanner)
NP -> N . [2,3]      (completer)
VP -> V NP . [1,3]   (completer)
S -> NP VP . [0,3]   (completer)

Sentence accepted.

What is it?

What kind of parser did we just describe? (Trick question.) An Earley parser... yes, but not really a parser: a recognizer.

The presence of an S state with the right attributes in the right place indicates a successful recognition, but there is no parse tree, so it is not yet a parser. That's how we "solve" (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser: augment the Completer to record where we came from.

Augmenting the chart with structural information

[Figure: chart states S8-S13, each complete state annotated with backpointers to the child states it was built from.]

Retrieving Parse Trees from the Chart

All the possible parses for an input are in the table; we just need to read off the backpointers from every complete S in the last column:
- Find all the states S -> α . [0,N].
- Follow the structural traces left by the Completer.

Of course this won't be polynomial time, since there can be an exponential number of trees. But at least we can represent the ambiguity efficiently.
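The backpointer idea can be sketched without the full parser. Below, a chart fragment for "I saw Mary" is built by hand (the layout, `st` helper, and names are illustrative assumptions): each complete state carries pointers to the completed child states the Completer combined, so reading off a tree is a simple recursion.

```python
# Hand-built complete states with Completer backpointers, and tree retrieval.
# state = (label, start, end, children); children are backpointer keys or a word.
chart_entries = {}

def st(label, start, end, children):
    chart_entries[(label, start, end)] = (label, start, end, tuple(children))

st("N", 0, 1, ["I"])
st("V", 1, 2, ["saw"])
st("N", 2, 3, ["Mary"])
st("NP", 0, 1, [("N", 0, 1)])
st("NP", 2, 3, [("N", 2, 3)])
st("VP", 1, 3, [("V", 1, 2), ("NP", 2, 3)])
st("S", 0, 3, [("NP", 0, 1), ("VP", 1, 3)])

def tree(key):
    """Follow backpointers recursively, yielding a nested-tuple parse tree."""
    label, start, end, children = chart_entries[key]
    kids = [c if isinstance(c, str) else tree(c) for c in children]
    return (label, *kids)

print(tree(("S", 0, 3)))
# ('S', ('NP', ('N', 'I')), ('VP', ('V', 'saw'), ('NP', ('N', 'Mary'))))
```

With ambiguity, a key would map to several backpointer lists; enumerating all trees is then the exponential part, while the shared chart itself stays compact.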

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
- Never place a state into the chart that's already there.
- Copy states before advancing them.

Earley and Left Recursion: 1

S -> NP VP
NP -> NP PP

The predictor, given S -> . NP VP [0,0], predicts NP -> . NP PP [0,0] and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion: 2

When a state gets advanced, make a copy and leave the original alone. Say we have NP -> . NP PP [0,0]. We find an NP from 0 to 2, so we create NP -> NP . PP [0,2], but leave the original state as is.

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Example (contrsquod)

flight flight

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Example (contrsquod)

flightflight

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Bottom-Up Filtering

Slide 1

Possible Problem Left-Recursion

What happens in the following situationS -gt NP VPS -gt Aux NP VPNP -gt NP PPNP -gt Det NominalhellipWith the sentence starting with

Did the flighthellip

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

More specifically:
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. When the words run out, look at chart[N] to see if you have a winner

Earley and Left Recursion

- Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
  - Never place a state into the chart that's already there
  - Copy states before advancing them

S -> NP VP
NP -> NP PP

- The first rule predicts S -> . NP VP [0,0], which adds NP -> . NP PP [0,0] and stops there, since adding any subsequent prediction would be redundant.
- When a state gets advanced, make a copy and leave the original alone:
  - Say we have NP -> . NP PP [0,0]
  - We find an NP from 0 to 2, so we create NP -> NP . PP [0,2]
  - But we leave the original state as is

Predictor

Given a state:
- with a non-terminal to the right of the dot
- that is not a part-of-speech category

create a new state for each expansion of that non-terminal. Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor looking at

S -> . VP [0,0]

results in

VP -> . Verb [0,0]
VP -> . Verb NP [0,0]

Scanner

Given a state:
- with a non-terminal to the right of the dot
- that is a part-of-speech category

if the next word in the input matches this part of speech:
- create a new state with the dot moved over the non-terminal
- insert it in the next chart entry

So the scanner looking at

VP -> . Verb NP [0,0]

if the next word "book" can be a verb, adds the new state

VP -> Verb . NP [0,1]

to the chart entry following the current one.

Note: the Earley algorithm uses top-down input to disambiguate POS. Only a POS predicted by some state can get added to the chart.
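The scanner's top-down POS filtering can be sketched as follows. The `scan` helper and the tiny lexicon are hypothetical, for illustration only: an ambiguous word like "book" (noun or verb) only enters the chart under the parts of speech some state is actually predicting.

```python
# Toy lexicon: word -> set of possible parts of speech (illustrative)
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def scan(predicted_pos, word, position):
    """Return the POS states the scanner would add to chart[position + 1].

    predicted_pos: the POS categories some state in chart[position] is
    waiting for; readings of `word` outside this set are never added.
    """
    added = []
    for pos in LEXICON.get(word, set()):
        if pos in predicted_pos:  # top-down filter: only predicted POS
            added.append((pos, word, position, position + 1))
    return added

# At the start of "Book that flight" a VP rule predicts a Verb, but nothing
# predicts a Noun, so only the verb reading of "book" enters the chart.
print(scan({"Verb"}, "book", 0))  # [('Verb', 'book', 0, 1)]
print(scan({"Det"}, "book", 0))   # []
```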

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of the input. Find and advance all previous states that were looking for this category:
- copy the state
- move the dot
- insert it in the current chart entry

Given
NP -> Det Nominal . [1,3]
VP -> Verb . NP [0,1]

add
VP -> Verb NP . [0,3]

Earley: how do we know we are done?

Find an S state in the final column that spans from 0 to N and is complete:

S -> α . [0,N]

Earley

So sweep through the table from 0 to N:

- New predicted states are created by starting top-down from S
- New incomplete states are created by advancing existing states as new constituents are discovered
- New complete states are created in the same way

Earley

More specifically:
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to step 2
6. When the words run out, look at chart[N] to see if you have a winner

Example

Book that flight

We should find an S from 0 to 3 that is a completed state.

Example (cont'd)

[Figures: chart entries built while parsing "Book that flight"; images not preserved in this transcript]

A simple example

Grammar:
S -> NP VP
NP -> N
VP -> V NP

Lexicon: N -> I | saw | Mary;  V -> saw

Input: I saw Mary

Chart[0]:
γ -> . S [0,0]       (dummy start state)
S -> . NP VP [0,0]   (predictor)
NP -> . N [0,0]      (predictor)

Chart[1]:
N -> I . [0,1]       (scanner)
NP -> N . [0,1]      (completer)
S -> NP . VP [0,1]   (completer)
VP -> . V NP [1,1]   (predictor)

Chart[2]:
V -> saw . [1,2]     (scanner)
VP -> V . NP [1,2]   (completer)
NP -> . N [2,2]      (predictor)

Chart[3]:
N -> Mary . [2,3]    (scanner)
NP -> N . [2,3]      (completer)
VP -> V NP . [1,3]   (completer)
S -> NP VP . [0,3]   (completer)

Sentence accepted.
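The walk-through above can be turned into code. Below is a minimal sketch of an Earley recognizer under the grammar, lexicon, and input of the simple example; names like `recognize` and the `GAMMA` dummy start symbol are illustrative assumptions, not code from the slides.

```python
from collections import namedtuple

# A state: dotted rule (lhs -> rhs with dot at `dot`) over span [start, end]
State = namedtuple("State", "lhs rhs dot start end")

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",)],
    "VP": [("V", "NP")],
}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}  # part-of-speech categories handled by the scanner

def recognize(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(i, state):
        if state not in chart[i]:  # never place a duplicate state in the chart
            chart[i].append(state)

    add(0, State("GAMMA", ("S",), 0, 0, 0))  # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:  # chart[i] may grow while we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in POS:
                    # Scanner: if the next word has this POS, add it to
                    # the following chart entry
                    if i < len(words) and nxt in LEXICON.get(words[i], set()):
                        add(i + 1, State(nxt, (words[i],), 1, i, i + 1))
                elif nxt in GRAMMAR:
                    # Predictor: expand the non-terminal at this position
                    for rhs in GRAMMAR[nxt]:
                        add(i, State(nxt, rhs, 0, i, i))
            else:
                # Completer: advance every earlier state waiting for this
                # category (copy, move the dot, insert in current entry)
                for prev in chart[state.start]:
                    if (prev.dot < len(prev.rhs)
                            and prev.rhs[prev.dot] == state.lhs):
                        add(i, State(prev.lhs, prev.rhs,
                                     prev.dot + 1, prev.start, i))
    # Done if a complete GAMMA -> S . spans the whole input
    return any(s.lhs == "GAMMA" and s.dot == 1 and s.start == 0
               for s in chart[len(words)])

print(recognize("I saw Mary".split()))   # True
print(recognize("saw Mary I".split()))   # False
```

As the slides note, this is a recognizer, not a parser: it answers yes/no but builds no tree. Adding backpointers in the completer turns it into a parser.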

What is it?

What kind of parser did we just describe? (Trick question.) An Earley parser, yes; but not a parser, a recognizer.

The presence of an S state with the right attributes in the right place indicates a successful recognition.

But no parse tree, so no parser. That's how we solve (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parser: augment the "Completer" to point to where we came from.

Augmenting the chart with structural information

[Figure: chart states S8-S13 annotated with backpointers added by the Completer; image not preserved in this transcript]

Retrieving Parse Trees from Chart

All the possible parses for an input are in the table; we just need to read off the backpointers from every complete S in the last column:
- Find all the S -> α . [0,N]
- Follow the structural traces from the Completer

Of course this won't be polynomial time, since there could be an exponential number of trees. But we can at least represent ambiguity efficiently.

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
- Never place a state into the chart that's already there
- Copy states before advancing them

Earley and Left Recursion 1

S -> NP VP
NP -> NP PP

The predictor, given the first rule, creates S -> . NP VP [0,0], which predicts NP -> . NP PP [0,0] and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion 2

When a state gets advanced, make a copy and leave the original alone.

Say we have NP -> . NP PP [0,0]. We find an NP from 0 to 2, so we create NP -> NP . PP [0,2], but we leave the original state as is.

Dynamic Programming Approaches

Earley: top-down, no filtering, no restriction on grammar form.

CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF).

The details matter less than the dimensions along which such parsers vary:
- Bottom-up vs. top-down
- With or without filters
- With or without restrictions on grammar form

Possible Problem: Left-Recursion

What happens in the following situation?

S -> NP VP
S -> Aux NP VP
NP -> NP PP
NP -> Det Nominal
...

with a sentence starting with

Did the flight...
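To see why left recursion breaks naive top-down parsing, here is a deliberately broken recursive-descent sketch (the `parse_np` helper is hypothetical): expanding NP -> NP PP re-enters NP before consuming any input, so the recursion never bottoms out.

```python
def parse_np(words, i):
    """Try the left-recursive rule NP -> NP PP first: this re-enters
    parse_np at the same input position i, consuming nothing."""
    return parse_np(words, i)  # infinite regress

def demo():
    try:
        parse_np(["did", "the", "flight"], 0)
        return "parsed"
    except RecursionError:
        return "left recursion blew the stack"

print(demo())  # left recursion blew the stack
```

Earley avoids this not by rewriting the grammar but by refusing to add a state that is already in the chart.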

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Solution Rule Ordering

S -gt Aux NP VPS -gt NP VPNP -gt Det NominalNP -gt NP PP

The key for the NP is that you want the recursive option after any base case

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Avoiding Repeated Work

Parsing is hard and slow Itrsquos wasteful to redo stuff over and over and over

Consider an attempt to top-down parse the following as an NP

A flight from Indianapolis to Houston on TWA

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

flight

Slide 1

flight

flight

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley: how do we know we are done?

Find an S state in the final column that spans from 0 to n+1 and is complete:

S -> α • [0,n+1]

Earley

So sweep through the table from 0 to n+1…

- New predicted states are created by starting top-down from S
- New incomplete states are created by advancing existing states as new constituents are discovered
- New complete states are created in the same way

Earley

More specifically…
1. Predict all the states you can upfront
2. Read a word
3. Extend states based on matches
4. Add new predictions
5. Go to 2
6. When the input runs out, look at chart entry N+1 to see if you have a winner

Example

"Book that flight."

We should find… an S from 0 to 3 that is a completed state.

Example (cont'd)

[The chart-by-chart figures for "Book that flight" are not reproduced in this transcript.]

A simple example

Grammar:
S → NP VP
NP → N
VP → V NP

Lexicon: N → I | saw | Mary; V → saw

Input: I saw Mary

Chart[0]
γ → • S [0,0] (dummy start state)
S → • NP VP [0,0] (predictor)
NP → • N [0,0] (predictor)

Chart[1]
N → I • [0,1] (scanner)
NP → N • [0,1] (completer)
S → NP • VP [0,1] (completer)
VP → • V NP [1,1] (predictor)

Chart[2]
V → saw • [1,2] (scanner)
VP → V • NP [1,2] (completer)
NP → • N [2,2] (predictor)

Chart[3]
N → Mary • [2,3] (scanner)
NP → N • [2,3] (completer)
VP → V NP • [1,3] (completer)
S → NP VP • [0,3] (completer)

Sentence accepted.
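The whole recognizer from the example above fits in a short Python sketch. The state representation, the `GAMMA` name for the dummy start state, and the dict-based grammar are assumptions of this sketch, not the slides' notation.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    def add(i, st):
        if st not in chart[i]:           # never add a state already there
            chart[i].append(st)
    add(0, State("GAMMA", ("S",), 0, 0, 0))          # dummy start state
    for i in range(n + 1):
        for st in chart[i]:              # chart[i] may grow while we iterate
            if st.dot < len(st.rhs):
                cat = st.rhs[st.dot]
                if cat in POS:                       # scanner
                    if i < n and cat in LEXICON.get(words[i], set()):
                        add(i + 1, State(cat, (words[i],), 1, i, i + 1))
                else:                                # predictor
                    for rhs in GRAMMAR[cat]:
                        add(i, State(cat, rhs, 0, i, i))
            else:                                    # completer
                for old in chart[st.start]:
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == st.lhs:
                        add(st.end, State(old.lhs, old.rhs, old.dot + 1,
                                          old.start, st.end))
    # Done when the dummy start rule is complete over the whole input
    return State("GAMMA", ("S",), 1, 0, n) in chart[n]
```

Running `recognize("I saw Mary".split())` reproduces the chart traced on the slide and accepts the sentence.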

What is it?

What kind of parser did we just describe? (A trick question.)
An Earley parser… yes.
But not a parser - a recognizer!

The presence of an S state with the right attributes in the right place indicates a successful recognition.

But no parse tree… so no parser. That's how we solve (not) an exponential problem in polynomial time.

Converting Earley from Recognizer to Parser

With the addition of a few pointers, we have a parser.
Augment the "Completer" to point to where we came from.

Augmenting the chart with structural information

[Figure: chart states S8-S13; each state advanced by the Completer carries a backpointer to the completed states that built it.]

Retrieving Parse Trees from the Chart

All the possible parses for an input are in the table; we just need to read off the backpointers from every complete S in the last column of the table:
- Find all the S -> X • [0,N+1] states
- Follow the structural traces from the Completer

Of course, this won't be polynomial time, since there can be an exponential number of trees, but we can at least represent the ambiguity efficiently.
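A sketch of the recognizer-to-parser conversion: the completer records, for each advanced state, where it came from (the previous state and the completed child), and a `tree` function reads the backpointers off afterwards. All names here are this sketch's assumptions; for simplicity, duplicate states keep only their first backpointer, so this recovers one parse, not the whole forest the slide mentions.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}
POS = {"N", "V"}

def parse(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    back = {}                      # advanced state -> (previous state, completed child)
    def add(i, st, bp=None):
        if st not in chart[i]:
            chart[i].append(st)
            back[st] = bp
    add(0, State("GAMMA", ("S",), 0, 0, 0))
    for i in range(n + 1):
        for st in chart[i]:
            if st.dot < len(st.rhs):
                cat = st.rhs[st.dot]
                if cat in POS:                       # scanner
                    if i < n and cat in LEXICON.get(words[i], set()):
                        add(i + 1, State(cat, (words[i],), 1, i, i + 1))
                else:                                # predictor
                    for rhs in GRAMMAR[cat]:
                        add(i, State(cat, rhs, 0, i, i))
            else:                                    # completer, with backpointer
                for old in chart[st.start]:
                    if old.dot < len(old.rhs) and old.rhs[old.dot] == st.lhs:
                        add(st.end, State(old.lhs, old.rhs, old.dot + 1,
                                          old.start, st.end), bp=(old, st))
    def tree(st):
        # Follow the backpointer chain leftwards, collecting one child per
        # dot advance; scanned POS states have no backpointer and are leaves.
        kids, cur = [], st
        while back.get(cur) is not None:
            cur, child = back[cur]
            kids.append(tree(child))
        if not kids:
            return (st.lhs, st.rhs[0])               # POS leaf over the word
        kids.reverse()
        return (st.lhs,) + tuple(kids)
    goal = State("GAMMA", ("S",), 1, 0, n)
    return tree(back[goal][1]) if goal in back else None
```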

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search:
- Never place a state into the chart that's already there
- Copy states before advancing them

Earley and Left Recursion: 1

S -> NP VP
NP -> NP PP

The predictor, given the first rule, adds
S -> • NP VP [0,0]
which predicts
NP -> • NP PP [0,0]
and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion: 2

When a state gets advanced, make a copy and leave the original alone…

Say we have NP -> • NP PP [0,0]. We find an NP from 0 to 2, so we create
NP -> NP • PP [0,2]
but we leave the original state as is.
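The "never add a state that's already there" rule is exactly what makes prediction terminate on a left-recursive rule. A minimal sketch (the closure function and grammar dict are assumptions of this sketch): predicting from NP -> NP PP would re-predict NP forever without the duplicate check.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

# Left-recursive toy grammar: NP -> NP PP | N
GRAMMAR = {"NP": [("NP", "PP"), ("N",)]}

def predict_closure(chart_entry, i):
    """Keep predicting until no new states appear. The duplicate check is
    what stops the left-recursive NP -> NP PP rule from looping forever."""
    for st in chart_entry:                      # list grows as we append
        if st.dot < len(st.rhs) and st.rhs[st.dot] in GRAMMAR:
            for rhs in GRAMMAR[st.rhs[st.dot]]:
                new = State(st.rhs[st.dot], rhs, 0, i, i)
                if new not in chart_entry:      # never place a duplicate
                    chart_entry.append(new)

entry = [State("S", ("NP", "VP"), 0, 0, 0)]     # S -> . NP VP [0,0]
predict_closure(entry, 0)                       # terminates with 3 states
```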

Dynamic Programming Approaches

Earley: top-down, no filtering, no restriction on grammar form.

CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF).

The details are not important; what matters is where an approach sits along these dimensions:
- Bottom-up vs. top-down
- With or without filters
- With restrictions on grammar form or not


Dynamic Programming

• We need a method that fills a table with partial results and that:
  - does not do (avoidable) repeated work
  - does not fall prey to left-recursion
  - solves an exponential problem in (approximately) polynomial time

Earley Parsing

Fills a table in a single sweep over the input words:
- The table is length N+1, where N is the number of words
- Table entries represent:
  - completed constituents and their locations
  - in-progress constituents
  - predicted constituents

States

The table entries are called states and are represented with dotted rules:

S -> • VP            (a VP is predicted)
NP -> Det • Nominal  (an NP is in progress)
VP -> V NP •         (a VP has been found)

States/Locations

It would be nice to know where these things are in the input, so…

S -> • VP [0,0]            (a VP is predicted at the start of the sentence)
NP -> Det • Nominal [1,2]  (an NP is in progress; the Det goes from 1 to 2)
VP -> V NP • [0,3]         (a VP has been found starting at 0 and ending at 3)
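The dotted-rule-plus-span states above map naturally onto a small data type. A sketch (the class and method names are this sketch's, not the slides'):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DottedState:
    lhs: str        # left-hand side of the rule
    rhs: tuple      # right-hand side symbols
    dot: int        # position of the dot within rhs
    start: int      # where the constituent begins in the input
    end: int        # how far into the input we have gotten

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_category(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(syms)} [{self.start},{self.end}]"

predicted = DottedState("S", ("VP",), 0, 0, 0)              # a VP is predicted
in_progress = DottedState("NP", ("Det", "Nominal"), 1, 1, 2)  # an NP is in progress
found = DottedState("VP", ("V", "NP"), 2, 0, 3)             # a VP has been found
```

`frozen=True` makes states hashable, which is convenient for the "never add a state already in the chart" check.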

Graphically

[Figure: the three dotted states above, drawn as edges over the input positions.]

Earley

• As with most dynamic programming approaches, the answer is found by looking in the table in the right place.

• In this case, there should be an S state in the final column that spans from 0 to n+1 and is complete:
  - S -> α • [0,n+1]

• If that's the case, you're done.

• So sweep through the table from 0 to n+1…
  - New predicted states are created by states in the current chart
  - New incomplete states are created by advancing existing states as new constituents are discovered
  - New complete states are created in the same way

Earley

• More specifically…
  1. Predict all the states you can upfront
  2. Read a word
  3. Extend states based on matches
  4. Add new predictions
  5. Go to 2
  6. Look at chart entry N+1 to see if you have a winner

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Dynamic Programming

bull We need a method that fills a table with partial results thatndash Does not do (avoidable) repeated workndash Does not fall prey to left-recursionndash Solves an exponential problem in (approximately) polynomial

time

Slide 1

Earley Parsing

Fills a table in a single sweep over the input wordsTable is length N+1 N is number of wordsTable entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

Slide 1

States

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

Slide 1

StatesLocations

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to

2

VP -gt V NP [03] A VP has been found starting at 0 and ending

at 3

Slide 1

Graphically

Slide 1

Earley

bull As with most dynamic programming approaches the answer is found by looking in the table in the right place

bull In this case there should be an S state in the final column that spans from 0 to n+1 and is complete

bull If thatrsquos the case yoursquore donendash S ndash α [0n+1]

bull So sweep through the table from 0 to n+1hellipndash New predicted states are created by states in current chartndash New incomplete states are created by advancing existing states

as new constituents are discoveredndash New complete states are created in the same way

Slide 1

Earley

bull More specificallyhellipndash Predict all the states you can upfrontndash Read a word

ndash Extend states based on matchesndash Add new predictionsndash Go to 2

ndash Look at N+1 to see if you have a winner

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Earley

More specifically…
1. Predict all the states you can upfront.
2. Read a word.
3. Extend states based on matches.
4. Add new predictions.
5. Go to step 2.
6. At the end of the input, look at the final chart entry to see if you have a winner.


Predictor

Given a state with a non-terminal to the right of the dot that is not a part-of-speech category: create a new state for each expansion of the non-terminal, and place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the Predictor, looking at

S -> • VP [0,0]

results in

VP -> • Verb [0,0]
VP -> • Verb NP [0,0]
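The Predictor step can be sketched in isolation. This is my own illustration (the `(lhs, rhs, dot, start)` state layout and the toy grammar are assumptions, not the slides' code):

```python
# Sketch of the Predictor alone; a state is (lhs, rhs, dot, start).
GRAMMAR = {"VP": [("Verb",), ("Verb", "NP")]}   # assumed toy grammar

def predictor(state, i, chart):
    """For the non-terminal right of the dot, add one new state per
    expansion, beginning and ending where the generating state ends."""
    lhs, rhs, dot, start = state
    for expansion in GRAMMAR.get(rhs[dot], []):
        new = (rhs[dot], expansion, 0, i)
        if new not in chart[i]:             # never re-add a state
            chart[i].append(new)

chart = [[("S", ("VP",), 0, 0)]]            # S -> . VP [0,0]
predictor(chart[0][0], 0, chart)
# chart[0] now also holds VP -> . Verb [0,0] and VP -> . Verb NP [0,0]
```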

Scanner

Given a state with a non-terminal to the right of the dot that is a part-of-speech category: if the next word in the input matches this part of speech,
• create a new state with the dot moved over the non-terminal,
• insert it in the next chart entry.

So the Scanner, looking at

VP -> • Verb NP [0,0]

if the next word, "book", can be a verb, adds the new state

VP -> Verb • NP [0,1]

to the chart entry following the current one.

Note: the Earley algorithm uses top-down input to disambiguate POS. Only a POS predicted by some state can get added to the chart.
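The Scanner step, sketched the same way (the lexicon entries here are illustrative assumptions):

```python
# Sketch of the Scanner alone; a state is (lhs, rhs, dot, start).
POS = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def scanner(state, i, words, chart):
    """If the POS right of the dot matches the next word, copy the state
    with the dot advanced into the next chart entry."""
    lhs, rhs, dot, start = state
    if i < len(words) and rhs[dot] in POS.get(words[i], set()):
        new = (lhs, rhs, dot + 1, start)
        if new not in chart[i + 1]:
            chart[i + 1].append(new)

chart = [[("VP", ("Verb", "NP"), 0, 0)], []]    # VP -> . Verb NP [0,0]
scanner(chart[0][0], 0, ["book", "that", "flight"], chart)
# chart[1] now holds VP -> Verb . NP [0,1]
```

Note the top-down filtering the slide mentions: "book" is ambiguous between Verb and Noun, but only the Verb reading is pursued, because only a Verb was predicted.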

Completer

Applied to a state when its dot has reached the right end of the rule: the parser has discovered a category over some span of input. Find and advance all previous states that were looking for this category:

• copy the state,
• move the dot,
• insert it in the current chart entry.

Given
NP -> Det Nominal • [1,3]
VP -> Verb • NP [0,1]

add
VP -> Verb NP • [0,3]
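And the Completer, sketched on the slide's own example (again my own state layout, not the slides' code):

```python
# Sketch of the Completer alone; a state is (lhs, rhs, dot, start).
def completer(state, i, chart):
    """For a completed state, advance every state in chart[start] that
    was waiting for this category: copy, move dot, insert in chart[i]."""
    lhs, rhs, dot, start = state            # here dot == len(rhs)
    for (l2, r2, d2, s2) in chart[start]:
        if d2 < len(r2) and r2[d2] == lhs:
            new = (l2, r2, d2 + 1, s2)
            if new not in chart[i]:
                chart[i].append(new)

# The slide's example: a complete NP over [1,3] advances VP -> Verb . NP [0,1].
chart = [[], [("VP", ("Verb", "NP"), 1, 0)], [],
         [("NP", ("Det", "Nominal"), 2, 1)]]
completer(chart[3][0], 3, chart)
# chart[3] now also holds VP -> Verb NP . [0,3]
```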

Earley how do we know we are done

Find a complete S state in the final column that spans the entire input, from 0 to N:

S -> α •, [0,N]

Earley

So, sweep through the table from 0 to N…

New predicted states are created by starting top-down from S.

New incomplete states are created by advancing existing states as new constituents are discovered.

New complete states are created in the same way.


Example

Book that flight

We should find… an S from 0 to 3 that is a completed state…

Example (cont'd)

(Three slides of chart diagrams tracing "Book that flight".)

A simple example

Chart[0]
γ → • S [0,0] (dummy start state)
S → • NP VP [0,0] (predictor)
NP → • N [0,0] (predictor)

Chart[1]
N → I • [0,1] (scan)
NP → N • [0,1] (completer)
S → NP • VP [0,1] (completer)
VP → • V NP [1,1] (predictor)

Chart[2]
V → saw • [1,2] (scan)
VP → V • NP [1,2] (completer)
NP → • N [2,2] (predictor)

Chart[3]
N → Mary • [2,3] (scan)
NP → N • [2,3] (completer)
VP → V NP • [1,3] (completer)
S → NP VP • [0,3] (completer)

Grammar:
S → NP VP
NP → N
VP → V NP

Lexicon: N → I | saw | Mary; V → saw

Input: I saw Mary

Sentence accepted
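A minimal recognizer reproducing this acceptance can be sketched as follows. This is an illustration, not the slides' code: the `(lhs, rhs, dot, start)` state layout is an assumption, and lexical rules are handled by advancing directly over the part of speech via a lexicon lookup rather than by separate N → word states.

```python
# Minimal Earley recognizer for the slide's grammar (a sketch).
GRAMMAR = {"S": [("NP", "VP")], "NP": [("N",)], "VP": [("V", "NP")]}
LEXICON = {"I": {"N"}, "saw": {"N", "V"}, "Mary": {"N"}}

def recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(col, state):
        if state not in chart[col]:        # never re-add a state
            chart[col].append(state)

    add(0, ("GAMMA", ("S",), 0, 0))        # dummy start state
    for i in range(n + 1):
        for state in chart[i]:             # chart[i] grows as we iterate
            lhs, rhs, dot, start = state
            if dot < len(rhs) and rhs[dot] in GRAMMAR:       # predictor
                for rule in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], rule, 0, i))
            elif dot < len(rhs):                             # scanner
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:                                            # completer
                for (l2, r2, d2, s2) in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, s2))
    return ("GAMMA", ("S",), 1, 0) in chart[n]   # complete S over [0,N]?

print(recognize(["I", "saw", "Mary"]))     # True: sentence accepted
```

Note the duplicate check in `add`: that single line is what makes left-recursive rules like NP -> NP PP terminate, as discussed above.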

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Earley and Left Recursion

bull So Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the search

ndash Never place a state into the chart thatrsquos already therendash Copy states before advancing them

S -gt NP VPNP -gt NP PP

bull The first rule predictsS -gt NP VP [00] that addsNP -gt NP PP [00]stops there since adding any subsequent prediction would be fruitless

bull When a state gets advanced make a copy and leave the original alone

ndash Say we have NP -gt NP PP [00]ndash We find an NP from 0 to 2 so we create NP -gt NP PP [02]ndash But we leave the original state as is

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Predictor

Given a stateWith a non-terminal to right of dotThat is not a part-of-speech categoryCreate a new state for each expansion of the non-terminalPlace these new states into same chart entry as generated state

beginning and ending where generating state ends

So predictor looking at

S -gt VP [00]

results in

VP -gt Verb [00]VP -gt Verb NP [00]

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Scanner

Given a stateWith a non-terminal to right of dotThat is a part-of-speech categoryIf the next word in the input matches this part-of-speechndash Create a new state with dot moved over the non-terminalndash insert in next chart entry

So scanner looking at

VP -gt Verb NP [00]

If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01]

Add this state to chart entry following current one

Note Earley algorithm uses top-down input to disambiguate POS Only POS predicted by some state can get added to chart

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Completer

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this

category

bull copy state bull move dotbull insert in current chart entry

GivenNP -gt Det Nominal [13]VP -gt Verb NP [01]

AddVP -gt Verb NP [03]

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Earley how do we know we are done

Find an S state in the final column that spans from 0 to n+1 and is complete

S ndashgt α [0n+1]

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Earley

So sweep through the table from 0 to n+1hellip

New predicted states are created by starting top-down from S

New incomplete states are created by advancing existing states as new constituents are discovered

New complete states are created in the same way

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limiting the searchNever place a state into the chart thatrsquos already thereCopy states before advancing them

Slide 1

Earley and Left Recursion 1

S -gt NP VPNP -gt NP PP

Predictor given first ruleS -gt NP VP [00]

PredictsNP -gt NP PP [00]stops there since predicting same again would be redundant

Slide 1

Earley and Left Recursion 2

When a state gets advanced make a copy and leave the original alonehellip

Say we have NP -gt NP PP [00]We find an NP from 0 to 2 so we create

NP -gt NP PP [02]But we leave the original state as is

Slide 1

Dynamic Programming Approaches

EarleyTop-down no filtering no restriction on grammar form

CYKBottom-up no filtering grammars restricted to Chomsky-Normal Form

(CNF)Details are not important

Bottom-up vs top-downWith or without filtersWith restrictions on grammar form or not

Slide 1

Earley

More specificallyhellipPredict all the states you can upfront

Read a wordExtend states based on matchesAdd new predictionsGo to 2

Look at N+1 to see if you have a winner

Slide 1

Example

Book that flight

We should findhellip an S from 0 to 3 that is a completed statehellip

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

Example (contrsquod)

Slide 1

A simple example

Chart[0]γ rarr S [00] (dummy start state)S rarr NP VP [00 ] (predictor)NP rarr N [00 ] (predictor)

Chart[1]N rarr I [01 ] (scan)NP rarr N [01 ] (completer)S rarr NP VP [01 ] (completer)VP rarr V NP [11 ] (predictor)

Chart[2]V rarr saw [12 ] (scan) VP rarr V NP [12 ] (complete) NP rarr N [22 ] (predict)

Chart[3]NP rarr N [23 ] (scan)NP rarr N [23 ] (completer)VP rarr V NP [13 ] (completer)S rarr NP VP [03 ] (completer)

Grammar

S rarr NP VP

NP rarr N

VP rarr V NP

Lexicon Nrarr I | saw | Mary Vrarr saw

Input I saw Mary

Sentence accepted

Slide 1

What is it

What kind of parser did we just describe (trick question)Earley parserhellip yesNot a parser ndash a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition

But no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

Slide 1

Converting Earley from Recognizer to Parser

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

Slide 1

Augmenting the chart with structural information

S8

S9

S10

S11

S13

S12

S8

S9

S8

Slide 1

Retrieving Parse Trees from Chart

All the possible parses for an input are in the tableWe just need to read off all the backpointers from every complete S

in the last column of the tableFind all the S -gt X [0N+1]Follow the structural traces from the CompleterOf course this wonrsquot be polynomial time since there could be an

exponential number of treesSo we can at least represent ambiguity efficiently

Slide 1

Earley and Left Recursion

Earley solves the left-recursion problem without having to alter the grammar or artificially limit the search:
- Never place a state into the chart that's already there.
- Copy states before advancing them.

Earley and Left Recursion 1

S → NP VP
NP → NP PP

The Predictor, given the first rule, creates
S → • NP VP [0,0]

which predicts
NP → • NP PP [0,0]

and stops there, since predicting the same state again would be redundant.

Earley and Left Recursion 2

When a state gets advanced, make a copy and leave the original alone…

Say we have NP → • NP PP [0,0]. We find an NP from 0 to 2, so we create
NP → NP • PP [0,2]
but we leave the original state as is.
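A toy illustration of why the duplicate check tames left recursion: with the rule NP → NP PP, the Predictor's closure still terminates, because the second attempt to predict NP → • NP PP [0,0] finds the state already in the chart. The `predict_closure` helper is hypothetical, isolated here from the full parser for clarity.

```python
# Without the "already there" check, predicting NP -> . NP PP [0,0]
# would trigger predicting NP again, forever.
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["NP", "PP"], ["N"]]}

def predict_closure(symbol, pos):
    """All states the Predictor would add at chart position pos for `symbol`."""
    states, agenda = set(), [symbol]
    while agenda:
        sym = agenda.pop()
        for prod in GRAMMAR.get(sym, []):
            state = (sym, tuple(prod), 0, pos)
            if state not in states:        # the duplicate check: stop repeats
                states.add(state)
                agenda.append(prod[0])     # predict the category after the dot
    return states
```

Calling `predict_closure("S", 0)` returns exactly three states, including the left-recursive NP → • NP PP [0,0] once, instead of looping.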

Dynamic Programming Approaches

Earley: top-down, no filtering, no restriction on grammar form.

CYK: bottom-up, no filtering, grammars restricted to Chomsky Normal Form (CNF).

The details matter less than the dimensions along which such parsers vary:
- bottom-up vs. top-down
- with or without filters
- with or without restrictions on grammar form
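For contrast, a minimal CYK recognizer over a hand-converted CNF version of the toy grammar. CNF forbids unit rules, so NP → N is eliminated and the lexicon assigns NP directly; this conversion is an assumption for illustration, not from the slides.

```python
# CYK: bottom-up dynamic programming over spans, grammar in CNF.
CNF_RULES = {("NP", "VP"): "S", ("V", "NP"): "VP"}
CNF_LEX = {"I": {"NP"}, "saw": {"NP", "V"}, "Mary": {"NP"}}

def cyk(words):
    n = len(words)
    # table[i][j]: set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # width-1 spans from lexicon
        table[i][i + 1] = set(CNF_LEX.get(w, ()))
    for width in range(2, n + 1):                  # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # every split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        if (b, c) in CNF_RULES:
                            table[i][j].add(CNF_RULES[(b, c)])
    return "S" in table[0][n]
```

Note the contrast with Earley: no dotted states and no top-down prediction, but the grammar had to be massaged into CNF first.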
