spring 2014jim hogg - uw - cse - p501e-1 lr ~ bottom-up ~ shift-reduce - concluded dotted items...
TRANSCRIPT
Spring 2014 Jim Hogg - UW - CSE - P501 E-1
LR ~ Bottom-Up ~ Shift-Reduce - concluded
Dotted Items Building the Handles DFA Building Action & Goto Tables Building the Handles DFA: Theory Nullable, FIRST & FOLLOW SLR, LALR, LR
Next
CSE P501 – Compilers
Spring 2014 Jim Hogg - UW - CSE - P501 E-2
Recap: LR State Machine
Problem: when to reduce? when to shift?1. Clairvoyance - foretelling the future - tricky algorithm!2. Backtrack and try another path - slow and cumbersome3. DFA for prefixes - great solution, but still magical4. Action & Goto tables - encoding of 3, so still magical
Idea: devise algorithm for 3&4 - DFA to recognize handles
Language generated by a CFG is generally not regular (cannot be generated by regex, as we saw earlier)
But language of handles for a CFG is regular So use DFA to recognize those handles if (DFA accepts) then REDUCE else SHIFT
Spring 2014 Jim Hogg - UW - CSE - P501 E-3
Recap: Prefixes, Handles, Items
S is start symbol of a grammar G
If S =>* then is a sentential form of G
is a viable prefix of G if there is some derivation: S =>*rm Aw =>*rm w and is a prefix of
A viable prefix is a prefix of a sentential form in a rightmost derivation
The occurrence of in w is a handle of w
Spring 2014 Jim Hogg - UW - CSE - P501 E-4
An item is a production containing a at some position in its RHS
eg: [A X Y ] [A X Y ] [A X Y ]
The shows how much of the token stream we have seen so far
Dotted Items
Spring 2014 Jim Hogg - UW - CSE - P501 E-5
0. Z S $ ($ denotes end-of-file)1. S ( L )2. S x3. L S4. L L , S
We have added a production Z with the original start symbol followed by end of file marker, $
What language does this grammar generate?
Example Grammar
Spring 2014 Jim Hogg - UW - CSE - P501 E-6
Handles DFA - Opening Skirmish
Initially
Stack is empty Input is the right hand side of Z (ie, S$) Initial configuration is [Z S$] As before marks what we've seen so far - initially, nothing But, since is just before S, we might expect to next see
anything that can be derived from S. So: just before ( or x
0. Z S$1. S ( L )2. S x3. L S4. L L , S
Spring 2014 Jim Hogg - UW - CSE - P501 E-7
A state is just a set of items Start: an initial set of items
Completion (or closure): additional productions whose LHS appears immediately to the right of the dot in some item already in the state
Initial Items State
Z S $S ( L )S x
start item, or core
Completion, or closure, of core
0. Z S$1. S ( L )2. S x3. L S4. L L , S
Spring 2014 Jim Hogg - UW - CSE - P501 E-8
If we shift and see x, we will transition to a state containing the item S x
So, add a new state with that item In this case, closure of [S x] adds nothing If we hit this state during parse, we will reduce - no
alternative!
Shift Action on x
S x x
Z S $S ( L )S x
0. Z S$1. S ( L )2. S x3. L S4. L L , S
Spring 2014 Jim Hogg - UW - CSE - P501 E-9
If, instead, we shift and see ( we obtain item S ( L )
Its closure adds another 4 items, as shown
Shift Action on (
S ( L )L L , SL S S ( L ) S x
(Z S $S ( L )S x
0. Z S$1. S ( L )2. S x3. L S4. L L , S
Spring 2014 Jim Hogg - UW - CSE - P501 E-10
If we instead shift S, we pop the RHS from the stack exposing the first state. Add a Goto transition on S for this.
Goto Actions
Z S $SZ S $
S ( L )S x
0. Z S$1. S ( L )2. S x3. L S4. L L , S
Spring 2014 Jim Hogg - UW - CSE - P501 E-11
0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
Building the Handles DFA
Spring 2014 Jim Hogg - UW - CSE - P501 E-12
Build Handles DFA – Step 2 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
Spring 2014 Jim Hogg - UW - CSE - P501 E-13
0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
Build Handles DFA – Step 3
Spring 2014 Jim Hogg - UW - CSE - P501 E-14
Build Handles DFA – Step 4 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3(
Spring 2014 Jim Hogg - UW - CSE - P501 E-15
Build Handles DFA – Step 5 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
(x
Spring 2014 Jim Hogg - UW - CSE - P501 E-16
Build Handles DFA – Step 6 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
(x
L S 7
S
Spring 2014 Jim Hogg - UW - CSE - P501 E-17
Build Handles DFA – Step 7 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
Spring 2014 Jim Hogg - UW - CSE - P501 E-18
Build Handles DFA – Step 8 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
S ( L )L L , S 5
L
Spring 2014 Jim Hogg - UW - CSE - P501 E-19
Build Handles DFA – Step 9 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
S ( L )L L , S 5
L
)
S ( L ) 6
Spring 2014 Jim Hogg - UW - CSE - P501 E-20
Build Handles DFA – Step 10 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
S ( L )L L , S 5
L
)
L L , SS ( L )S x
8
,
S ( L ) 6
Spring 2014 Jim Hogg - UW - CSE - P501 E-21
Build Handles DFA – Step 11 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
S ( L )L L , S 5
S
)
,
L L , S
9
L
L L , SS ( L )S x
8
S ( L ) 6
Spring 2014 Jim Hogg - UW - CSE - P501 E-22
Build Handles DFA – Step 12 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
((
L S 7
S
x
S ( L )L L , S
5
S
)
,
L L , S
9
L
x L L , SS ( L )S x
8
S ( L ) 6
Spring 2014 Jim Hogg - UW - CSE - P501 E-23
Build Handles DFA – Step 13 0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
( (
L S 7
S
x
S ( L )L L , S
5
S
)
,
L L , S
9
L
x L L , SS ( L )S x
8
(
S ( L ) 6
Spring 2014 E-24
0. S’ S $1. S ( L )2. S x3. L S4. L L , S
S’ S $S ( L )S x
1
S’ S $4
S
xS x
2
S ( L )L SL L , SS ( L )S x
3
( (
L S 7
S
x
S ( L )L L , S
5
S
S ( L ) 6
)
,
L L , S
9
L
x L L , SS ( L )S x
8
(
( ) x , $
1 s3 s2
2 r2 r2 r2 r2 r2
3 s3 s2
4 acc
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
8 s3 s2
9 r4 r4 r4 r4 r4
Building Action & Goto Tables
S L
g4
g7 g5
g9
Spring 2014 Jim Hogg - UW - CSE - P501 E-25
Closure(S) S is a state in the Handles-DFA Closure adds all items implied by items already in S
Goto (S, A) S is a state in the Handles-DFA (set-of-items) A is a grammar symbol (single non-terminal) Goto moves the dot past symbol A in all appropriate
items in S
Building the Handles DFA: Theory
Spring 2014 Jim Hogg - UW - CSE - P501 E-26
Closure Algorithm
fun Closure(S) = // S is a state in the Handles-DFA repeat foreach item = [AB] in S do // B N
foreach production B do S = [B] enddo enddo
until S does not changereturn S // update-in-place
endfun
Adds all the items to S that it should have: given B we are about to see a token that starts B; equally well, a token that starts any production of B
Spring 2014 Jim Hogg - UW - CSE - P501 E-27
Goto Algorithm
fun Goto(S, X) = // S is node in DFA = set-of-Items
// X is a terminal or non-terminal
new = {}foreach item = [AX] in S do new = [AX] enddoreturn Closure(new)
endfun
How to find new, or existing, DFA state we reach on starting in state S and reading X.
We view S as a set-of-items
Spring 2014 Jim Hogg - UW - CSE - P501 E-28
LR(0) Construction
Augment grammar with extra start production S’ S $
Let N be the set of states (Nodes in DFA) Let E be the set of edges Initialize N to Closure( [S’ S $] ) Initialize E to empty
Our LR Parse Table and Algorithm is based only on what we can see on the parse stack; so far, we ignore lookahead
So our grammar is restricted to LR(0)
Spring 2014 Jim Hogg - UW - CSE - P501 E-29
LR(0) Construction Algorithm
repeat foreach state S in DFA do foreach item [A X] in S do new = Goto (S, X) S = new E = I x new enddo enddountil E and S do not change
View S (state in DFA) as a set-of-items. So S = new
For symbol $, don’t compute Goto(I, $); instead, make this an accept action.
Spring 2014 Jim Hogg - UW - CSE - P501 E-30
Building the Parse Tables (1)
foreach edge = I x J do if X T then Action[I, X] = sj // shift and goto state j if X N then Goto[I, X] = gj // goto state jenddo
KeyT = set of TerminalsN = set of NonTerminals
Spring 2014 Jim Hogg - UW - CSE - P501 E-31
Building the Parse Tables (2)
foreach state S do foreach item = [S' S$] do Action[I, $] = accept enddo foreach item = [A ] do Action[I, *] = rj, where j is production number for A enddoenddo
Spring 2014 Jim Hogg - UW - CSE - P501 E-32
Where have we reached?
We have built LR(0) DFA and Parser Tables (Action & Goto)
No lookahead yet
Variations of LR parsers add lookahead info, but basic idea of states, closures, and edges remains the same
Spring 2014 Jim Hogg - UW - CSE - P501 E-33
A Grammar that is not LR(0)
We cannot parse the grammar below using LR(0), because it produces a Shift-Reduce conflict:
• S E $• E T + E• E T• T x
(This grammar generates strings: x, x + x, x + x + x, etc)
Spring 2014 Jim Hogg - UW - CSE - P501 E-34
LR(0) Parser for 0. S E $1. E T + E2. E T3. T x
S E $E T + EE TT x
T x
S E $
E T + EE T
E T + EE T + EE TT x
E T + E
1 2
3
45
6
E
T
+ Tx
E
x + $ E T
1 s5 g2 G3
2 acc
3 r2 s4,r2 r2
4 s5 g6 G3
5 r3 r3 r3
6 r1 r1 r1
SR conflict in State 3 So grammar is not LR(0)
x
Spring 2014 Jim Hogg - UW - CSE - P501 E-35
SLR Parsers
Idea: Use information about what can follow a non-terminal to decide if we should perform a reduction
Easiest form is SLR – "Simple LR"
We need to compute FOLLOW(A) – the set of symbols that can immediately follow A in any possible derivation
ie, terminal t is in FOLLOW(A) if any derivation contains At To compute this, we need to compute FIRST() for strings that
can follow A
Spring 2014 Jim Hogg - UW - CSE - P501 E-36
Nullable(A) is true if A can derive the empty string ie, we can find A =>*
FIRST(A) = set of Terminals that can begin strings derived from A
FIRST() = set of Terminals that can begin strings derived from
FOLLOW(A) is the set of terminals that can immediately follow A
Nullable, FIRST & FOLLOW
Recall standard notation: A N // Non-terminal (N T)* // String of Non-terminals and Terminals
Spring 2014 Jim Hogg - UW - CSE - P501 E-37
To calculate FIRST():
if = A then FIRST() = FIRST(A), right?
Yes, but what if A has an epsilon production: A
Then we need to union-in FOLLOW(A) (in this case, equal to FIRST())
Computing FIRST: Intuition
Spring 2014 Jim Hogg - UW - CSE - P501 E-38
Example
Z dZ X Y ZY εY cX YX a
Nullable?
FIRST FOLLOW
X T {a c} {a c d}
Y T {c} {a c d}
Z F {a c d} { }
Spring 2014
Example
Z dZ X Y ZY εY cX YX a
X F
Y F
Z F
X T
Y T
Z F
X {}
Y {}
Z {}
Nullable? FIRST
X {a c}
Y {c}
Z {d}
X {a c}
Y {c}
Z {a c d}
X {}
Y {}
Z {}
FOLLOW
X {a c d}
Y {a c d}
Z {}
Jim Hogg - UW - CSE - P501
Iterate thru productions, until tables stop changing
Spring 2014 Jim Hogg - UW - CSE - P501 E-40
Nullable, FIRST & FOLLOW: Intuition
To compute, start by: A N, FIRST(A) = FOLLOW(A) = {} A N, Nullable(A) = false a T, FIRST(a) = {a}
Rules:• if A a then FIRST(A) = a
• if A Y1 and Y1 not Nullable then FIRST(A) = FIRST(Y1)
• if A Y1Y2 and Y1 is Nullable then FIRST(A) = FIRST(Y2)
• if A Y1Y2Y3 and Y1,Y2 both Nullable then FIRST(A) = FIRST(Y3)
• if A Y1..Yn and all Yi are Nullable then FIRST(A) = a T // TerminalA N // Non-terminalY (N T) // Terminal or Non-terminal (N T)* // String of Terminals and Non-terminals
Spring 2014 Jim Hogg - UW - CSE - P501 E-41
Computing Nullable
foreach A N do Nullable(A) = false enddo
repeat foreach A Y1..Yk in P do if Y1..Yk are all Nullable then Nullable(A) =
true enddountil Nullable does not change
Intuition: Suppose A XY then we can derive A =>* only if both X =>* AND Y =>*
Nullable(A) is true if A can derive the empty string. ie, we can find A =>*
Nullable :: N Bool
Spring 2014 Jim Hogg - UW - CSE - P501 E-42
Computing FIRST
foreach A N do FIRST(A) = {} enddo
repeat foreach A Y1..Yk in P do FIRST(A) = FIRST(Y1) foreach i in [1..k-1] do if Nullable(Yi) then FIRST(A) = FIRST(Yi+1) else
break enddo if all Y1..Yk are Nullable then FIRST(A) = enddountil FIRST does not change
FIRST :: N {T}
Spring 2014 Jim Hogg - UW - CSE - P501 E-43
Computing FOLLOW
foreach A N do FOLLOW(A) = {} enddo
repeat foreach A Y1..Yk in P do foreach i in [1..k-1] do if Yi N then FOLLOW(Yi) = FIRST(Yi+1) enddo enddountil FOLLOW does not change
FOLLOW :: N {T}
Intuition: We cannot tell anything about FOLLOW(A) by looking at A's
productions But, given A Y1..Yk we can infer FOLLOW(Yi). Eg: A aBcdEFg => FOLLOW(B) = c; FOLLOW(E) = FIRST(F);
etc
Spring 2014 Jim Hogg - UW - CSE - P501 E-44
SLR Construction
Identical to LR(0) states. But refine the reduce actions
Algorithm:
foreach state S in DFA do foreach item [A] in S do reduce by A onlyif lookahead FOLLOW(A) enddoenddo
Spring 2014 Jim Hogg - UW - CSE - P501 E-45
LR(0) Parser for 0. S E $1. E T + E2. E T3. T x
S E $E T + EE TT x
T x
S E $
E T + EE T
E T + EE T + EE TT x
E T + E
1 2
3
45
6
E
T
+ Tx
E
x + $ E T
1 s5 g2 g3
2 acc
3 r2 s4,r2 r2
4 s5 g6 g3
5 r3 r3 r3
6 r1 r1 r1
x
By taking account of lookahead, we can eliminate 5 reduce actions
Spring 2014 Jim Hogg - UW - CSE - P501 E-46
On To LR(1)
Many practical grammars are SLR
LR(1) is even more powerful
Similar construction for LR(1), but notion of an item is more complex, incorporating lookahead information
Spring 2014 Jim Hogg - UW - CSE - P501 E-47
LR(1) Items
An LR(1) item [A , a] is A grammar production (A ) A right hand side position (the dot) A lookahead symbol (a)
Idea: This item indicates that is the top of the stack and the next input is derivable from a
Full construction: see Cooper&Torczon
Spring 2014 Jim Hogg - UW - CSE - P501 E-48
LR(1) Tradeoffs
LR(1) Pro: extremely precise; largest set of
grammars / languages
Con: potentially very large parse tables with many states
Spring 2014 Jim Hogg - UW - CSE - P501 E-49
LALR(1)
Variation of LR(1), but merge any two states that differ only in lookahead
Example: these two would be merged[A ::= x , a][A ::= x , b]
Spring 2014 Jim Hogg - UW - CSE - P501 E-50
LALR(1) vs LR(1)
LALR(1) tables can have many fewer states than LR(1)
LALR(1) may have reduce conflicts where LR(1) would not (but in practice this doesn’t happen often)
Most practical bottom-up parser tools are LALR(1)(eg yacc, bison, CUP)
Species of Context-Free Grammars
Spring 2014 Jim Hogg - UW - CSE - P501 E-51
All
Unambiguous
LR(1)
LALR(1)
SLR(1)
LR(0)
Spring 2014 Jim Hogg - UW - CSE - P501 E-52
LR, Bottom-Up, Shift-Reduce is done!
LL(k) Parsing – Top-Down
Recursive Descent Parsers What to do if you need a parser in a hurry
Next