Basic Parsing Algorithms – Chart Parsing
Seminar: Recent Advances in Parsing Technology
WS 2011/2012
Anna Schmidt
Talk Outline
Chart Parsing – Basics
Chart Parsing – Algorithms
  – Earley Algorithm
  – CKY Algorithm
    → Basics
    → BitPar: Efficient Implementation of CKY
Chart Parsing – Basics
Chart Parsing – Basics
First proposed by Martin Kay
Dynamic programming approach
  – Partial results of the computation are stored and (re)used later if needed
    → Same problem is not solved more than once
Operates on a CFG
Functionality: Recogniser / Parser
… in this talk: focus on recogniser functionality
Main Components
Chart
Edges
Agenda
Component: Chart
Is a well-formed substring table (WFST)
  – Stores partial and complete analyses of substrings
  – Information stored in one triangular half of a two-dimensional array of size (n+1)*(n+1) (as in Earley) or n*n (as in CKY)
Can also be understood as a (directed) graph
  – Vertices: positions between input words
    0 Mary 1 feeds 2 the 3 otter 4
  – Edges connecting vertices
Allows no duplicate entries
Component: Edge
Data structure storing information about a particular step in the parsing process
Edges inhabit cells of the chart
Edges contain
  – Start and end position in input string
  – A dotted rule
  – Can also contain edge probability
Component: Edge
A dotted rule consists of
  – Left-hand side (LHS) = non-terminal symbol
  – Right-hand side (RHS) = sequence of non-terminal or terminal symbols
  – A dot between RHS symbols indicating which constituents have already been found
Edges can be
  – Active / incomplete: dot is not the last element of RHS
  – Inactive / complete: dot is the last element of RHS
Example: S → NP • VP (0,1)
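A minimal sketch of such an edge as a data structure. The class name `Edge` and its field and method names are assumptions made for illustration; the slides only specify what an edge contains. Making the dataclass frozen keeps edges hashable, so a chart cell can be a set and duplicate entries are rejected automatically.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """A dotted rule plus the span of input it covers (hypothetical layout)."""
    lhs: str                 # left-hand side non-terminal
    rhs: Tuple[str, ...]     # right-hand side symbols
    dot: int                 # number of RHS symbols already found
    start: int               # start position in the input
    end: int                 # end position in the input

    def is_complete(self) -> bool:
        # Inactive / complete: the dot is behind the last RHS symbol.
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        # The symbol an active edge is still waiting for.
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} ({self.start},{self.end})"


edge = Edge("S", ("NP", "VP"), dot=1, start=0, end=1)
print(edge)  # S → NP • VP (0,1)
```

The example edge is active: an NP has been found between positions 0 and 1, and a VP is still needed.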
Component: Agenda
Organises the order in which tasks are executed
Here all tasks (edges) are collected before being put on the chart
Ordering of agenda determines what is processed first
  → Therefore also which parse is found first
  – Queue, stack, ordering with respect to probabilities, …
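The effect of the agenda ordering can be illustrated with a small sketch. The three tasks and their probabilities are made-up values for illustration only; the point is that the same tasks come off the agenda in a different order depending on whether it is a queue, a stack, or a priority queue.

```python
import heapq
from collections import deque

# Hypothetical agenda items: (edge label, probability). Values are invented.
tasks = [("NP → • N", 0.3), ("NP → • DET N", 0.6), ("S → • NP VP", 1.0)]

# Queue: first in, first out.
queue = deque(tasks)
fifo_order = [queue.popleft()[0] for _ in range(len(tasks))]

# Stack: last in, first out.
stack = list(tasks)
lifo_order = [stack.pop()[0] for _ in range(len(tasks))]

# Probability ordering: most probable task first.
# heapq is a min-heap, so the probability is negated.
heap = [(-prob, label) for label, prob in tasks]
heapq.heapify(heap)
best_first_order = [heapq.heappop(heap)[1] for _ in range(len(tasks))]

print(fifo_order)
print(lifo_order)
print(best_first_order)
```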
Parsing Strategies
Kay differentiates parsing strategies along two dimensions:
  – Bottom-up versus top-down
  – Directed versus undirected
Directed bottom-up
  – Only build edges for phrases that can actually be incorporated into a higher level structure → Left-Corner Parser
Directed top-down
  – Only build a new (active) edge if the next word of the input can be used to extend such an edge → Earley
Undirected varieties: No such restrictions → Undirected Bottom-Up: CKY
Parsing Strategies
Ways of achieving directedness:
Reachability table:
  – Contains for each non-terminal N the set of all symbols that can be the first element of a string dominated by N
  – For example: NP can start with DET, N, ADJ, but not with V
Rule selection table:
  – M*N table where M = non-terminals excluding pre-terminals, N = all non-terminals
  – Contains all grammar rules applicable in a situation where M is the 'upper' and N is the 'lower' symbol
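One way to compute a reachability table is a simple fixed-point iteration over the grammar. The sketch below assumes an ε-free grammar, so the first RHS symbol of a rule is always its left corner, and uses the toy grammar of this talk (which has no ADJ). The function name `reachability` and the grammar encoding are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# Phrase-structure rules of the toy grammar used in this talk.
GRAMMAR: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
]


def reachability(grammar: List[Tuple[str, Tuple[str, ...]]]) -> Dict[str, Set[str]]:
    """For each LHS, collect every symbol that can start a string it dominates."""
    table: Dict[str, Set[str]] = {lhs: set() for lhs, _ in grammar}
    changed = True
    while changed:  # repeat until no entry grows any more
        changed = False
        for lhs, rhs in grammar:
            first = rhs[0]
            # The first RHS symbol, plus everything that symbol can start with.
            new = {first} | table.get(first, set())
            if not new <= table[lhs]:
                table[lhs] |= new
                changed = True
    return table


table = reachability(GRAMMAR)
print(sorted(table["NP"]))  # ['DET', 'N']
print("V" in table["NP"])   # False
```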
Chart Parsing: Advantages
No repeated computation of same subproblem
Deals well with left-recursive grammars
Deals well with ambiguity
No backtracking necessary
Earley Algorithm
Earley Algorithm
Proposed by Jay Earley
Top-down search
Can handle all CFGs
Efficient:
  – O(n^3) in the general case
  – Faster for particular types of grammar
Terminology
In his paper, Earley does not use the notion of a 'chart'
He represents the parsing process as sets of states
  – Index of each state set = end position of all states in the set
  – A state largely corresponds to an edge
    - Contains dotted rule
    - Pointer to start position
    - End position can be derived from state set
Terminology
Formalisms are very similar
Examples easier to follow when represented in charts
So we will stick with 'chart' representations
Algorithm – Components
Initialization
Predictor
Scanner
Completer
Algorithm operates on one half of an array of size (n+1)*(n+1)
Example: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
Chart cells are written [start,end]. Each step adds the listed edges to the chart:
 1. Initialise  [0,0]  X → • S eos
 2. Predict     [0,0]  S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
 3. Scan        [0,1]  N → Mary •
 4. Complete    [0,1]  NP → N •; S → NP • VP
 5. Predict     [1,1]  VP → • V NP; V → • feeds
 6. Scan        [1,2]  V → feeds •
 7. Complete    [1,2]  VP → V • NP
 8. Predict     [2,2]  NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
 9. Scan        [2,3]  DET → the •
10. Complete    [2,3]  NP → DET • N
11. Predict     [3,3]  N → • Mary; N → • otter
12. Scan        [3,4]  N → otter •
13. Complete    [2,4]  NP → DET N •
14. Complete    [1,4]  VP → V NP •
15. Complete    [0,4]  S → NP VP •
16. Complete    [0,4]  X → S • eos
17. Predict     [4,4]  eos → • eos
18. Scan        [4,5]  eos → eos •
19. Complete    [0,5]  X → S eos •
Final chart:
[0,0]  X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
[0,1]  N → Mary •; NP → N •; S → NP • VP
[0,4]  S → NP VP •; X → S • eos
[0,5]  X → S eos •
[1,1]  VP → • V NP; V → • feeds
[1,2]  V → feeds •; VP → V • NP
[1,4]  VP → V NP •
[2,2]  NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
[2,3]  DET → the •; NP → DET • N
[2,4]  NP → DET N •
[3,3]  N → • Mary; N → • otter
[3,4]  N → otter •
[4,4]  eos → • eos
[4,5]  eos → eos •
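The predictor, scanner and completer steps traced above can be sketched as a small recogniser. This is an illustrative sketch under stated assumptions, not the implementation behind the slides: it uses Earley's state sets indexed by end position rather than a two-dimensional chart, it treats `eos` as a plain terminal appended to the input (instead of the rule eos → eos shown on the slides), it assumes an ε-free grammar, and it has no lookahead. The name `earley_recognise` is hypothetical.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]
State = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot, start)

GRAMMAR: List[Rule] = [
    ("X", ("S", "eos")),
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("N", ("Mary",)),
    ("N", ("otter",)),
    ("V", ("feeds",)),
    ("DET", ("the",)),
]
NONTERMINALS: Set[str] = {lhs for lhs, _ in GRAMMAR}


def earley_recognise(words: List[str]) -> bool:
    tokens = words + ["eos"]
    n = len(tokens)
    # sets[end] holds all states ending at position `end`.
    sets: List[List[State]] = [[] for _ in range(n + 1)]

    def add(state: State, end: int) -> None:
        if state not in sets[end]:  # no duplicate entries
            sets[end].append(state)

    add(("X", ("S", "eos"), 0, 0), 0)  # Initialise
    for end in range(n + 1):
        i = 0
        while i < len(sets[end]):  # the state set may grow while we scan it
            lhs, rhs, dot, start = sets[end][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in NONTERMINALS:
                    # Predictor: expand the expected non-terminal at `end`.
                    for rule_lhs, rule_rhs in GRAMMAR:
                        if rule_lhs == nxt:
                            add((rule_lhs, rule_rhs, 0, end), end)
                elif end < n and tokens[end] == nxt:
                    # Scanner: the expected terminal matches the next word.
                    add((lhs, rhs, dot + 1, start), end + 1)
            else:
                # Completer: advance every state that was waiting for `lhs`.
                for p_lhs, p_rhs, p_dot, p_start in list(sets[start]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        add((p_lhs, p_rhs, p_dot + 1, p_start), end)
    return ("X", ("S", "eos"), 2, 0) in sets[n]


print(earley_recognise("Mary feeds the otter".split()))  # True
print(earley_recognise("Mary feeds".split()))            # False
```

The input is accepted exactly when the state X → S eos • with start position 0 ends up in the last state set, which corresponds to the edge in cell [0,5] of the example.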
Lookahead Component
In the original paper, Earley proposes the use of a lookahead string for each state, which represents the allowed successor for the LHS
Prevents the completer from processing a state if the lookahead string and the next word of the input do not match
  → Remember Kay's directed top-down strategy?
CKY: Basics
CKY Basics
Proposed by John Cocke, Daniel H. Younger, and Tadao Kasami (independently)
Bottom-up search
Incremental
Grammar must be in Chomsky normal form (CNF)
Complexity O(n^3)
Chart: (upper triangle of) array of size n*n
CKY Algorithm: Idea
Initialise upper triangle of a chart of size n*n
From upper left to lower right corner of chart: go to the next cell in the diagonal
  – Fill in POS tag of next word in input string
  – Each time a POS tag has been filled in, go up cell by cell and build larger constituents that end at the current end position
Example: 0 Mary 1 feeds 2 the 3 otter 4
Grammar:
S → NP VP
NP → N
NP → DET N
VP → V NP
N → Mary | otter
V → feeds
DET → the
The chart starts empty. Cell [i,j] covers words i to j; cells are filled in this order:
 1. [1,1]  N, NP   (Mary)
 2. [2,2]  V       (feeds)
 3. [1,2]  nothing to add
 4. [3,3]  DET     (the)
 5. [2,3]  nothing to add
 6. [1,3]  nothing to add
 7. [4,4]  N, NP   (otter)
 8. [3,4]  NP
 9. [2,4]  VP
10. [1,4]  S
Final chart (rows = first word, columns = last word):
     1       2     3     4
1    N, NP               S
2            V           VP
3                  DET   NP
4                        N, NP
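The walkthrough above can be reproduced with a short sketch. Assumptions: the example grammar contains the chain rule NP → N, which is not strictly CNF, so the sketch closes every cell under chain rules; the tables `LEXICON`, `UNARY`, `BINARY` and the function names are illustrative encodings of the grammar, not part of the slides. Cells are indexed as on the slides: `chart[(i, j)]` covers words i to j (1-based).

```python
from typing import Dict, List, Set, Tuple

LEXICON: Dict[str, Set[str]] = {
    "Mary": {"N"}, "otter": {"N"}, "feeds": {"V"}, "the": {"DET"},
}
UNARY: Dict[str, Set[str]] = {"N": {"NP"}}  # chain rule NP → N
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("DET", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}


def with_chain_rules(symbols: Set[str]) -> Set[str]:
    """Add every category reachable from `symbols` via chain rules."""
    result = set(symbols)
    todo = list(symbols)
    while todo:
        sym = todo.pop()
        for parent in UNARY.get(sym, ()):
            if parent not in result:
                result.add(parent)
                todo.append(parent)
    return result


def cky_chart(words: List[str]) -> Dict[Tuple[int, int], Set[str]]:
    n = len(words)
    chart: Dict[Tuple[int, int], Set[str]] = {}
    for j in range(1, n + 1):
        # Diagonal cell: POS tags of word j.
        chart[(j, j)] = with_chain_rules(LEXICON.get(words[j - 1], set()))
        # Go up the column: larger constituents ending at word j.
        for i in range(j - 1, 0, -1):
            cell: Set[str] = set()
            for k in range(i, j):  # split into words i..k and k+1..j
                for left in chart[(i, k)]:
                    for right in chart[(k + 1, j)]:
                        cell |= BINARY.get((left, right), set())
            chart[(i, j)] = with_chain_rules(cell)
    return chart


chart = cky_chart("Mary feeds the otter".split())
print(sorted(chart[(1, 1)]))  # ['N', 'NP']
print(sorted(chart[(1, 4)]))  # ['S']
```

The sentence is recognised because the start symbol S ends up in the cell spanning the whole input.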
CKY: BitPar
BitPar: Basics
Proposed by Helmut Schmid
Bit-vector-based parser
Efficiently implements a CKY-style algorithm
Uses bit vector operations to parallelise parsing operations
Idea: Don't try to decrease the number of edges that are built; instead, minimise the cost of building edges
Especially useful if all analyses are needed
BitPar: Requirements
Restrictions on Context Free Grammar
  – Must be in CNF
  – Must be ε-free
  – Chain rules allowed
Precomputed for each non-terminal N:
  – Set of non-terminals that are derivable from N via chain rules
  – Set is stored in the bit vector chainvec[N]
  – Set includes N itself
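A sketch of how such chain vectors could be precomputed, using Python integers as bit vectors. Assumptions: the bit order S, NP, N, VP, V, DET (leftmost symbol = most significant bit) is taken from the example charts later in the talk, and "derivable from N" is read bottom-up, so the chain rule NP → N puts NP into chainvec[N]. The helper names `bit` and `compute_chainvec` are hypothetical.

```python
from typing import Dict, List, Tuple

SYMBOLS: List[str] = ["S", "NP", "N", "VP", "V", "DET"]  # bit order, left to right
CHAIN_RULES: List[Tuple[str, str]] = [("NP", "N")]       # (parent, child): NP → N


def bit(symbol: str) -> int:
    """Bit mask of one non-terminal; the leftmost symbol is the highest bit."""
    return 1 << (len(SYMBOLS) - 1 - SYMBOLS.index(symbol))


def compute_chainvec() -> Dict[str, int]:
    # Every set includes the non-terminal itself.
    chainvec = {s: bit(s) for s in SYMBOLS}
    changed = True
    while changed:  # iterate until chains of chain rules are fully followed
        changed = False
        for parent, child in CHAIN_RULES:
            updated = chainvec[child] | chainvec[parent]
            if updated != chainvec[child]:
                chainvec[child] = updated
                changed = True
    return chainvec


chainvec = compute_chainvec()
print(format(chainvec["N"], "06b"))  # 011000
print(format(chainvec["V"], "06b"))  # 000010
```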
Background: Bitwise AND and OR
● AND
0101 & 0011 = 0001
Both corresponding bits must equal 1
● OR
0101 | 0011 = 0111
At least one of corresponding bits must equal 1
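The two operations can be checked directly in Python, which uses the same `&` and `|` operators on integers; the variable names are arbitrary.

```python
a = 0b0101
b = 0b0011

and_result = format(a & b, "04b")  # bit is 1 only where both inputs have 1
or_result = format(a | b, "04b")   # bit is 1 where at least one input has 1

print(and_result)  # 0001
print(or_result)   # 0111
```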
BitPar: Chart
Chart = three-dimensional bit array
chart [start position b] [end position e] = [011000...]
[b] [e] contains a bit vector with one bit for each non-terminal
  – Bit is set to 1 if non-terminal was inserted
  – 0 otherwise
Chart initialised with all bits = 0
Filling the Chart: POS Tags
Inserting POS tags into a cell of the diagonal:
For each non-terminal N that can be rewritten as the word at the current position
Do a bitwise OR of
  – Bits inhabiting the chart cell
  – chainvec[N]
→ N and all its chain derivations are inserted in just one operation
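A sketch of this step for the example sentence. The chain vectors are hard-coded in the bit order S, NP, N, VP, V, DET used in the example charts of this talk; the `LEXICON` table and the function name `init_diagonal` are assumptions for illustration. Index 0 of the chart is left unused so that positions are 1-based.

```python
from typing import Dict, List

# chainvec values in the bit order S, NP, N, VP, V, DET.
CHAINVEC: Dict[str, int] = {"N": 0b011000, "V": 0b000010, "DET": 0b000001}
LEXICON: Dict[str, List[str]] = {
    "Mary": ["N"], "feeds": ["V"], "the": ["DET"], "otter": ["N"],
}


def init_diagonal(words: List[str]) -> List[List[int]]:
    n = len(words)
    # chart[b][e] is a bit vector over non-terminals; all bits start at 0.
    chart = [[0] * (n + 1) for _ in range(n + 1)]
    for pos, word in enumerate(words, start=1):
        for tag in LEXICON.get(word, []):
            # One OR inserts the tag and everything reachable by chain rules.
            chart[pos][pos] |= CHAINVEC[tag]
    return chart


chart = init_diagonal("Mary feeds the otter".split())
diagonal = [format(chart[i][i], "06b") for i in range(1, 5)]
print(diagonal)  # ['011000', '000010', '000001', '011000']
```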
Bit order in each vector: S, NP, N, VP, V, DET
Chart after inserting the POS tags of "Mary" and "feeds":
     Mary    feeds   the     otter
     1       2       3       4
1  011000  000000  000000  000000
2  000000  000010  000000  000000
3  000000  000000  000000  000000
4  000000  000000  000000  000000
Next cell to be filled: [1] [2] (?)
Filling the Chart: Larger Constituents
Conceptually:
Determine if several cells can be combined to form a higher level constituent labeled N
For this:
  – Loop over grammar rules with LHS = N, extract RHS (consisting of RHS1, RHS2)
  – Loop over all possible combinations of cells that together could contain the substructure of N and determine whether they contain RHS1 and RHS2 respectively
Filling the Chart: Larger Constituents
This has to be done
  – For each super-diagonal cell
  – For each non-terminal
  – For all corresponding grammar rules
  – For all possible cell combinations that could constitute a substructure of N
This is a time-consuming process
BUT: The same functionality can be achieved by a single AND operation on two bit vectors
Filling the Chart: Larger Constituents
Internally:
Can a given non-terminal LHS be inserted into a given chart cell [b] [e]?
Get RHS1, RHS2 from grammar
Vector 1: contains bits stored in chart [ b ] [ b .. b+1 .. e-1 ] [ RHS1 ]
Vector 2: contains bits stored in chart [ b+1 .. b+2 .. e ] [ e ] [ RHS2 ]
Filling the Chart: Larger Constituents
If a bitwise AND operation on the two new vectors produces at least one bit = 1
  – A valid substructure for LHS has been found
  – LHS can be inserted into the chart cell
Let's look at an example
![Page 65: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/65.jpg)
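Example: Let's determine whether NP should go into cell [3] [4].
Input: Mary (1) feeds (2) the (3) otter (4)
Bit order in each cell: S, NP, N, VP, V, DET
Diagonal cells: [1] [1] = 011000, [2] [2] = 000010, [3] [3] = 000001, [4] [4] = 011000
Chart (rows = start position b, columns = end position e; "?" marks the cell in question):
     1       2       3       4
1    011000  000000  000000  000000
2    000000  000010  000000  000000
3    000000  000000  000001  000000 ?
4    000000  000000  000000  011000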
![Page 66: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/66.jpg)
Should NP go into [3] [4]?
First, we consult the grammar. We find a rule NP → DET N, so the allowed right-hand side for NP is RHS1 = DET, RHS2 = N.
Reminder: Rules
v1 = chart [ b ] [ b .. b+1 .. e-1 ] [ RHS1 ]
v2 = chart [ b+1 .. b+2 .. e ] [ e ] [ RHS2 ]
Vector1: does chart [3] [3] contain RHS1 = DET? → yes, so insert 1 (Vector1 = 1)
Vector2: does chart [4] [4] contain RHS2 = N? → yes, so insert 1 (Vector2 = 1)
Vector1 AND Vector2 = 1, so insert NP
![Page 67: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/67.jpg)
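Example: Let's determine whether NP should go into cell [3] [4]. → Yes!
Input: Mary (1) feeds (2) the (3) otter (4)
Bit order in each cell: S, NP, N, VP, V, DET
Diagonal cells: [1] [1] = 011000, [2] [2] = 000010, [3] [3] = 000001, [4] [4] = 011000
Chart (rows = start position b, columns = end position e), with the NP bit now set in cell [3] [4]:
     1       2       3       4
1    011000  000000  000000  000000
2    000000  000010  000000  000000
3    000000  000000  000001  010000
4    000000  000000  000000  011000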
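The example can be replayed in a few lines. This is only a sketch: cells are kept as 6-character bit strings in the order S, NP, N, VP, V, DET, and the helper names `has` and `set_bit` are made up for illustration.

```python
SYMBOLS = ["S", "NP", "N", "VP", "V", "DET"]


def has(cell: str, symbol: str) -> bool:
    """True if the bit for `symbol` is set in a cell such as '011000'."""
    return cell[SYMBOLS.index(symbol)] == "1"


def set_bit(cell: str, symbol: str) -> str:
    """Return a copy of the cell with the bit for `symbol` set to 1."""
    i = SYMBOLS.index(symbol)
    return cell[:i] + "1" + cell[i + 1:]


# Chart before the step: only the diagonal cells are filled.
chart = {(b, e): "000000" for b in range(1, 5) for e in range(1, 5)}
chart[(1, 1)] = "011000"  # Mary: NP, N
chart[(2, 2)] = "000010"  # feeds: V
chart[(3, 3)] = "000001"  # the: DET
chart[(4, 4)] = "011000"  # otter: NP, N

b, e = 3, 4
lhs, rhs1, rhs2 = "NP", "DET", "N"  # rule NP -> DET N

# One entry per split point m; cell [3][4] has only the split point m = 3.
vector1 = [int(has(chart[(b, m)], rhs1)) for m in range(b, e)]
vector2 = [int(has(chart[(m + 1, e)], rhs2)) for m in range(b, e)]

if any(x & y for x, y in zip(vector1, vector2)):
    chart[(b, e)] = set_bit(chart[(b, e)], lhs)

print(chart[(3, 4)])  # 010000
```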
![Page 68: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/68.jpg)
Thank you for your attention!
![Page 69: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/69.jpg)
References
Earley, Jay: An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970.
Jurafsky, Daniel and Martin, James H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall, 2009.
Kay, Martin: Algorithm schemata and data structures in syntactic processing. In Readings in natural language processing, pages 35–70. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1986.
Kay, Martin: Lecture Slides of the Course 'Basic Algorithms for Computational Linguistics'. http://www.coli.uni-saarland.de/courses/algorithms-11/
Schmid, Helmut: Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors. In Proceedings of Coling 2004, pages 162–168, Geneva, Switzerland, 2004.
Wirén, Mats: A Comparison of Rule-Invocation Strategies in Context-Free Chart Parsing
![Page 70: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/70.jpg)
Initialization
– introduces a new non-terminal start symbol X and a new end symbol EOS
– adds EOS to the end of the input string
– for each root symbol R of the grammar: add to chart[0,0] an edge of the form
X → . R EOS
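A minimal sketch of this initialization step. The `Edge` record and the function name `initialise` are assumptions made for illustration; the dot is stored as the number of RHS symbols already recognised.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of RHS symbols to the left of the dot
    start: int
    end: int


def initialise(words: List[str], roots: List[str]) -> Tuple[List[str], Set[Edge]]:
    """Append EOS to the input and seed the chart with X -> . R EOS at [0,0]."""
    tokens = words + ["EOS"]
    chart = {Edge("X", (root, "EOS"), 0, 0, 0) for root in roots}
    return tokens, chart


tokens, chart = initialise(["Mary", "feeds", "the", "otter"], ["S"])
```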
![Page 71: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/71.jpg)
Predictor
For all non-terminals N directly following a dot (in the current state set), and for each grammar rule with N as LHS: add a new edge with
– LHS = N
– RHS according to the grammar, with the dot before the first element of the RHS
– start and end = end of the original state
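The predictor can be sketched as below. The `Edge` record, the function name `predict` and the toy grammar are assumptions made for illustration, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of RHS symbols to the left of the dot
    start: int
    end: int


# Hypothetical toy grammar: LHS -> list of right-hand sides.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("DET", "N"), ("N",)],
    "VP": [("V", "NP")],
}


def predict(edge: Edge, grammar: Dict[str, List[Tuple[str, ...]]]) -> Set[Edge]:
    """For the symbol after the dot, add one empty edge per matching rule."""
    if edge.dot >= len(edge.rhs):
        return set()                   # dot is at the end: nothing to predict
    symbol = edge.rhs[edge.dot]
    # New edges start and end where the original edge ends, dot at position 0.
    return {Edge(symbol, rhs, 0, edge.end, edge.end)
            for rhs in grammar.get(symbol, [])}


predicted = predict(Edge("S", ("NP", "VP"), 0, 0, 0), GRAMMAR)
```

If the symbol after the dot has no rule in the grammar (for example a terminal), `predict` returns an empty set.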
![Page 72: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/72.jpg)
Scanner
for all terminal symbols immediately following a dot:
compare terminal symbol with input string starting at end position of current edge
if they match: add a new edge to the chart with
– dot moved over the terminal symbol
– end position incremented by 1
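A minimal sketch of the scanner, assuming lexical rules such as N → Mary are part of the grammar. The `Edge` record, the function name `scan` and the set `NONTERMINALS` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of RHS symbols to the left of the dot
    start: int
    end: int


NONTERMINALS: Set[str] = {"S", "NP", "N", "VP", "V", "DET"}


def scan(edge: Edge, tokens: List[str]) -> Optional[Edge]:
    """Move the dot over a terminal if it matches the input at edge.end."""
    if edge.dot >= len(edge.rhs):
        return None                            # no symbol after the dot
    symbol = edge.rhs[edge.dot]
    if symbol in NONTERMINALS:
        return None                            # the scanner handles terminals only
    if edge.end < len(tokens) and tokens[edge.end] == symbol:
        return Edge(edge.lhs, edge.rhs, edge.dot + 1, edge.start, edge.end + 1)
    return None


tokens = ["Mary", "feeds", "the", "otter", "EOS"]
scanned = scan(Edge("N", ("Mary",), 0, 0, 0), tokens)
```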
![Page 73: Basic Parsing Algorithms – Chart Parsingyzhang/rapt-ws1112/slides/...NP → DET N • 3 N → • Mary N → • otter N → otter • 4 eos → • eos eos → eos • 5 Complete](https://reader033.vdocuments.us/reader033/viewer/2022060723/6083396414ea8116984f24a0/html5/thumbnails/73.jpg)
Completer
If the dot is the last element of a production with LHS of type T:
Find edges that
– are still waiting for a constituent of type T
– end where the complete edge starts
Add to the chart an edge with
– dot moved over T
– end position = end position of the completed edge
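The completer can be sketched in the same way. The `Edge` record and the function name `complete` are assumptions made for illustration, and the chart is a plain set of edges.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int    # number of RHS symbols to the left of the dot
    start: int
    end: int


def complete(done: Edge, chart: Set[Edge]) -> Set[Edge]:
    """Advance every edge that waits for done.lhs and ends where done starts."""
    if done.dot < len(done.rhs):
        return set()                   # edge is not complete yet
    new_edges = set()
    for edge in chart:
        waiting = edge.dot < len(edge.rhs) and edge.rhs[edge.dot] == done.lhs
        if waiting and edge.end == done.start:
            # Dot moves over T; the new edge ends where the completed edge ends.
            new_edges.add(Edge(edge.lhs, edge.rhs, edge.dot + 1, edge.start, done.end))
    return new_edges


chart = {Edge("NP", ("DET", "N"), 1, 2, 3), Edge("VP", ("V", "NP"), 1, 1, 2)}
completed = complete(Edge("N", ("otter",), 1, 3, 4), chart)
```

Here the complete edge N → otter • spanning [3,4] advances NP → DET • N from [2,3] to NP → DET N • over [2,4]; the VP edge is left alone because it is waiting for NP, not N.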