TRANSCRIPT
INF5110 Ch. 5: Bottom-up parsing
Part 1
12/2-2015 Stein Krogdahl
Ifi, UiO
Mandatory assignment 1: Will occur sometime during the week from 16/2
Bottom-up parsing

[Figure: a parse tree for the sentence t1 t2 t3 t4 t5 t6 t7, with internal nodes S, A and B.]
The methods listed below are, in the given order, able to handle more and more complicated grammars. Each «state» is represented as a row in the parsing table.
- LR(0): Can handle only very simple grammars. Requires about 300 states for a standard programming language. Serves only as an introduction to SLR(1) and LALR(1).
- SLR(1): Can handle most standard grammars for standard PLs. The same number of states as the LR(0) method. This method will be our main focus.
- LALR(1): Can handle a few more grammars than SLR(1). Again the same number of states as the LR(0) method. We will look at the ideas behind this method.
- LR(1): Can handle all grammars that in any way can be handled by looking at only the next token. The number of states will be around 3000.
Automated tools that, from a BNF grammar, can deliver a parser:
- YACC, Bison, CUP (all of these use LALR(1) techniques)
[Figure: the table for LR parsing — one row per state, one column per token and per nonterminal.]
Data structure for LR-parsing
S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
Assume that the grammar is unambiguous, that we are given a correct sentence «t1 t2 … t7» and that we know the parse tree for this sentence:
[Figure: the parse tree for this sentence, with root S’ → S.]
LR-parsing: • Have a stack representing what we have read
• Make a «reduction» of a subtree when «it occurs» at the top of the stack
Add a new «outermost» production. Thus, the new start symbol S’ will never occur in the right hand side of a production.
More about LR-parsing
And we assume that we know the parse tree for this sentence:

[Figure: the same parse tree as before, with root S’ → S.]
• We have a stack representing what is read
• Make a «reduction» of a subtree when it appears at the top of the stack.
• A reduction is to replace this with the non-terminal that produced this tree.
• A reduction: To use a production backwards
Start situation:   stack: $      input: t1 t2 t3 t4 t5 t6 t7 $
End situation:     stack: $ S’   input: $
S’ → S   (the new «outermost» production)
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
The principles of LR-parsing, and the shift and reduce operations
S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
stack            input
$                t1 t2 t3 t4 t5 t6 t7 $
$ t1             t2 t3 t4 t5 t6 t7 $
$ t1 t2          t3 t4 t5 t6 t7 $
$ t1 t2 t3       t4 t5 t6 t7 $
$ t1 B           t4 t5 t6 t7 $
$ A              t4 t5 t6 t7 $
$ A t4           t5 t6 t7 $
$ A t4 t5        t6 t7 $
$ A A            t6 t7 $
$ A A t6         t7 $
$ A B            t7 $
$ A B t7         $
$ S              $
$ S’             $
• There are two types of steps:
• Shift: Move the next input symbol over to the top of the stack.
• Reduction: Remove the symbols of the rightmost subtree from the stack, and replace them with the nonterminal at the root of that subtree.
If you know the parse tree it is easy to perform these steps correctly.
BUT: How can we do this without knowing:
- The full syntax tree
- The rest of the input
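To make the two step types concrete, here is a small sketch in Python (my own naming, not from the slides) that replays exactly the shift/reduce sequence from the trace above, with the known parse tree acting as the «oracle» we do not yet have:

```python
def shift(stack, inp):
    """Move the next input symbol over to the top of the stack."""
    stack.append(inp.pop(0))

def reduce(stack, lhs, rhs):
    """Use a production lhs -> rhs backwards: pop rhs, push lhs."""
    assert stack[-len(rhs):] == rhs, "the handle is not on top of the stack"
    del stack[-len(rhs):]
    stack.append(lhs)

def run(tokens, steps):
    """Replay a given sequence of shift/reduce steps."""
    stack, inp = [], list(tokens)
    for step in steps:
        if step == "shift":
            shift(stack, inp)
        else:
            _, lhs, rhs = step          # ("reduce", lhs, rhs)
            reduce(stack, lhs, rhs)
    return stack, inp

# The step sequence read off the known parse tree:
steps = [
    "shift", "shift", "shift",              # t1 t2 t3
    ("reduce", "B", ["t2", "t3"]),          # B -> t2 t3
    ("reduce", "A", ["t1", "B"]),           # A -> t1 B
    "shift", "shift",                       # t4 t5
    ("reduce", "A", ["t4", "t5"]),          # A -> t4 t5
    "shift",                                # t6
    ("reduce", "B", ["A", "t6"]),           # B -> A t6
    "shift",                                # t7
    ("reduce", "S", ["A", "B", "t7"]),      # S -> A B t7
    ("reduce", "S'", ["S"]),                # S' -> S
]
stack, rest = run(["t1", "t2", "t3", "t4", "t5", "t6", "t7"], steps)
print(stack, rest)   # ["S'"] []
```

The hard part, of course, is exactly what the slides ask: choosing these steps without the oracle.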
For comparison: Top down and bottom up parsing
[Figure: a partially built top-down parse tree with nodes S, A, B, C — the part already analysed and found OK, the current token, and the question «Which alternative for A?» The token will be matched against the rest of A.]
• In top-down (e.g. recursive descent) parsing we «produce» the syntax tree aiming to get the given sentence, while using the productions of the grammar.
• During bottom-up (e.g. LR) parsing we «reduce» the input according to productions, while aiming at the start symbol.
Example showing LR-parsing (when we know the tree!)
Right derivation (but the example is not good!)
[Figure: the parse tree for «n + n», with root E’ → E.]
In our textbook, but we do not stress this:
• The next reduction that should be made is called the ”handle” of the situation.
• «stack + input» will always form a sentential form occurring during a right derivation. These forms occur in the opposite order of that in which they occur in the right derivation.
Another example: LR-parsing from grammars with empty productions
[Figure: the parse tree for the sentence «( )», in which two occurrences of S derive ε.]
NB: S → ε will appear a little strange: during a reduction with this production, a nonterminal will show up at the top of the stack «from nowhere» (at the points marked in the figure).
A typical situation during LR-parsing
[Figure: a typical configuration — the stack holds the symbols s1 s2 s3 s4 ... sk against the current token and the rest of the input t1 t2 t3 ... tn $. All subtrees over the processed input are reduced: the stack is a reduced version of the processed input.]

After a shift, the next reduction to be made is a reduction with the production C → t1. Then, after two shifts, we will make a reduction with the production D → t2 t3. Then, what's next?
A plan for solving «The LR-problem»: «When to do a shift and when to do a reduction (with what production)»
(Not everything on the slides is found in our book. Only what’s in the book is curriculum)
We look at all possible stacks that can occur during LR-parsing of a sentence in L(G). We consider them as a new language Stacks(G), over the alphabet {terminals} ∪ {non-terminals}:

Stacks(G) = { s | s may occur as the stack during LR-parsing of a sentence in L(G) }

This language turns out to be regular, and can be described by an NFA where all states are accepting. These states are identified by «items» of the form: A → X Y . Z. The possible state transitions of the NFA can also be described rather straightforwardly.
We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)
Each state of this DFA will be a subset of the NFA states, and thus be sets of items.
The rest of: The plan for solving «The LR-problem»:
The new language Stacks(G), turns out to be regular, as it can be described by an NFA
In this NFA all states are accepting
The states of the NFA are all the possible ”items” of the grammar:
The position of the dot (e.g. in «A → X Y . Z» ) will play a central role in the later analyses.
We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)
Thus, each state of this DFA will consist of a set of items, and these items will indicate the possible «local situations» of the parsing (see: «The main proposition for LR-parsing»).
The NFA: The states and the transitions
(This is not fully explained in our book, and is not curriculum)
Given a grammar G, a randomly chosen sentence in L(G), and a randomly chosen point during LR-parsing of this sentence:

[Figure: the partial parse tree at such a point, drawn over the stack and the original input; the symbols along its left edge, from S’ down to the top of the stack, spell out the stack contents.]

We will describe an NFA that will accept exactly all such «lines», seen as sequences of symbols. This sequence is the content of the stack.

Note: No unreduced subtrees remain under the left-side branches of the remaining tree. They are all reduced to their root symbol.
The state transitions of the NFA
(This is not fully explained in our book, and is not curriculum)
[Figure: the same type of «random situation» as before — the stack against the original input, with a step along the left edge of the tree through a symbol X.]

The transitions:
• X can be either a non-terminal or a terminal: a step from the item A → α • X η to the item A → α X • η corresponds to an X-transition of the NFA.
• If X is a non-terminal: there is in addition an ε-transition from A → α • X η to each item X → • β.

We have to show:
(1) For any such situation, the left edge of the tree must be accepted by the NFA.
(2) For any path through the NFA, we can set up a parse tree that has this path as its left edge.
Example: The NFA that describes all possible stacks. More precisely: the LR(0)-NFA

E’ → E
E → E + n
E → n

The items (LR(0)-items) and the transitions of the NFA:
start state:  E’ → .E
E’ → .E   --E-->  E’ → E.
E → .E+n  --E-->  E → E.+n
E → E.+n  --+-->  E → E+.n
E → E+.n  --n-->  E → E+n.
E → .n    --n-->  E → n.
ε-transitions: from each item with the dot in front of E (that is, E’ → .E and E → .E+n) to E → .E+n and E → .n
The same NFA, in a slightly more orderly form. Again: the LR(0)-NFA (slightly more ordered than in the textbook), and the LR(0)-DFA made by the «subset construction» (Ch. 2):

State 0 (start):       E’ → .E,  E → .E+n,  E → .n
State 1 (from 0 on E): E’ → E.,  E → E.+n
State 2 (from 0 on n): E → n.
State 3 (from 1 on +): E → E+.n
State 4 (from 3 on n): E → E+n.

«E» on the stack ends in state 1. The two possible situations can then be exemplified by the two items of that state.

Closure: only the start state gets extra items from the closure; none of the other states will have any closure.
How to construct the LR(0)-DFA directly from the grammar
(Straight-ahead use of the «subset construction»)

Closure of a set I of items: if
A → α • B γ is an item in I, B is a non-terminal, and B → β1 | β2 | ...
then the items
B → • β1
B → • β2
…
should also be included in I.
The start state of the LR(0)-DFA is: S’ → • S, plus closure.

State transition for a symbol X from a state I (X is a terminal or a non-terminal; they are here treated the same way): collect all the items in I of the form «A → α • X β» (make sure that all of them are included):
........
A1 → α1 • X β1
........
A2 → α2 • X β2
........
The X-transition from I then leads to the state consisting of
A1 → α1 X • β1
A2 → α2 X • β2
plus closure.
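The two constructions above can be sketched in a few lines of Python. The representation is my own, not the book's: an item is a triple (lhs, rhs, dot), and the assumed example grammar is E’ → E, E → E + n | n from the earlier slides:

```python
# Grammar: nonterminal -> list of right-hand sides (tuples of symbols).
GRAMMAR = {"E'": [("E",)], "E": [("E", "+", "n"), ("n",)]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """If A -> alpha . B gamma is in the set and B is a nonterminal,
    add B -> . beta for every production B -> beta (repeat until stable)."""
    items = set(items)
    done = False
    while not done:
        done = True
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for beta in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], beta, 0) not in items:
                        items.add((rhs[dot], beta, 0))
                        done = False
    return frozenset(items)

def goto(state, X):
    """The X-transition: move the dot past X in every item A -> alpha . X beta,
    then take the closure of the result."""
    return closure({(lhs, rhs, dot + 1)
                    for (lhs, rhs, dot) in state
                    if dot < len(rhs) and rhs[dot] == X})

state0 = closure({("E'", ("E",), 0)})   # the start state: E' -> . E plus closure
state1 = goto(state0, "E")              # feeding E gives state 1
print(state1 == {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)})   # True
```

Repeating `goto` on every symbol from every discovered state, until no new item sets appear, yields the whole LR(0)-DFA.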
Another example: How to construct the LR(0)-DFA directly from the grammar
Note that S → ε gives only one item, which is: S → •   (not two items «S → • ε» and «S → ε •»).
[Figures: the items, the LR(0)-NFA and the LR(0)-DFA for this grammar are not reproduced in the transcript.]
What is the ”top state” telling us?
The top state: the DFA state that we arrive at when the stack is fed through the DFA.

The «Main proposition for LR-parsing»: the items in the top state will indicate all possible «local situations» that can occur in an LR parse with the current stack.
[Figure: the stack spells out the left edge of a partial parse tree through S’, S, B, A down to the item X → α • β; the rest of the input, starting with the current token, lies to the right.]

If the item X → α • β is a member of the top state, then the situation may be as shown in the figure.
When is shift a possibility?

Assume that the top state s contains the item X → α • a β, where «a» may be either a terminal or a non-terminal. The DFA then has an a-transition from s to a state t:

s: ... X → α • a β ...   --a-->   t: ... X → α a • β ...

This tells us that the situation may be as in the figure. Thus:
• Shift is a possible operation.
• Also: if shift is the correct operation and «a» is a terminal symbol equal to the token symbol, then the state after the shift will be t.
[Figure: the corresponding situation — the stack spells the left edge of the tree down to α, and «a» begins the rest of the input.]
When is a reduction (with a given production) a possibility?
Assume that the top state is s, and that this state contains the complete item A → γ •  (a «complete item» — in Norwegian «slutt-item» — has the dot at the end).

This indicates that the situation locally may be as in the figure, and that the next step might be a reduction with A → γ.

NEW: We remember the states between the stack symbols!

Current stack:  ... v u w z s     (the states w, z, s cover the symbols of γ)
New stack:      ... v u A t       (t is the state the DFA reaches from u on A)

The reduction step: pop off what corresponds to γ (and the states in between), push A as the new top symbol, and find the new top state t from u and A.
[Figure: the stack against the rest of the input, with γ as the rightmost reduced part, and the part of the DFA showing the A-transition from state u to state t.]
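The reduction step with remembered states can be sketched as follows (Python, my own names; the stack holds (symbol, state) pairs, and the assumed example uses the E-grammar DFA from the earlier slides, where goto(0, E) = 1):

```python
def reduce_step(stack, lhs, rhs_len, goto_table):
    """Pop |gamma| (symbol, state) pairs off the stack, push lhs,
    and find the new top state from the uncovered state u and lhs."""
    del stack[-rhs_len:]              # pop what corresponds to gamma
    u = stack[-1][1]                  # the uncovered state ('u' on the slide)
    stack.append((lhs, goto_table[u][lhs]))

# Assumed example: after shifting 'n' (state 2 holds E -> n .),
# we reduce with E -> n and land in goto(0, E) = 1.
goto_table = {0: {"E": 1}}
stack = [("$", 0), ("n", 2)]
reduce_step(stack, "E", 1, goto_table)
print(stack)   # [('$', 0), ('E', 1)]
```

Keeping the states interleaved with the symbols is what makes the new top state cheap to find: no re-feeding of the whole stack through the DFA is needed.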
Our old example, and LR(0) grammars
[Figure: the parse tree for «n + n $». After the first «n» has been reduced to E we should shift; at the very end we should reduce with E’ → E.]
Def. of LR(0)-grammar: «The top state uniquely decides the next step»
Thus: the above grammar is not LR(0), because of state 1.
If the stack is «E», the top state is 1, and we can either shift or reduce with E’ → E.
Are these example grammars LR(0)? A grammar is LR(0) iff: for every state, only one action is possible.
- When shift: many shift items may occur in the state; shift is still one action.
- When reduction: it must also be clear with which production.

The example grammars:

We have already looked at the grammar E’ → E, E → E + n | n: not LR(0)!

The grammar A’ → A, A → ( A ) | a gives the following LR(0)-DFA:

State   Possible actions
0       Only shift is possible
1       Only reduction is possible, with A’ → A
2       Only reduction is possible, with A → a
3       Only shift is possible
4       Only shift is possible
5       Only reduction is possible, with A → ( A )

Yes! This grammar is LR(0)!
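The «only one action per state» test can be sketched directly (Python, my own representation; items are triples (lhs, rhs, dot) as in the earlier sketches):

```python
def lr0_state_ok(state):
    """A state passes the LR(0) test iff it has no complete item
    (only shift is possible), or consists of exactly one complete item
    (only that one reduction is possible)."""
    complete = [it for it in state if it[2] == len(it[1])]
    return len(complete) == 0 or (len(complete) == 1 and len(state) == 1)

# State 1 of the E-grammar DFA: E' -> E .  together with  E -> E . + n
conflict_state = {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)}
# State 2 of the A-grammar DFA: A -> a .  alone
ok_state = {("A", ("a",), 1)}
print(lr0_state_ok(conflict_state), lr0_state_ok(ok_state))   # False True
```

A grammar is then LR(0) iff every state of its LR(0)-DFA passes this test.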
Parsing table for an LR(0) grammar. This table structure is slightly different from the one used for SLR(1), LALR(1) and LR(1) grammars. Therefore this table structure is not important (not curriculum).
Parsing of the sentence: ( ( a ) )

[Figure: the parse tree for «((a))» with root A’ → A, and the corresponding parsing table.]

At the end we reduce with A’ → A, and as the input is then empty we are finished; the sentence is correct.
If a reduction brings us to state 0 or 3, the Goto part will tell us what state pushing A will give
Parsing of erroneous sentences Grammar: A → ( A ) | a
Parsing of «( ( a )»:
stack                  input
$ 0                    ( ( a ) $
$ 0 ( 3                ( a ) $
$ 0 ( 3 ( 3            a ) $
$ 0 ( 3 ( 3 a 2        ) $
$ 0 ( 3 ( 3 A 4        ) $
$ 0 ( 3 ( 3 A 4 ) 5    $
$ 0 ( 3 A 4            $        error: no shift is possible on $

Parsing of «( )»:
$ 0                    ( ) $
$ 0 ( 3                ) $      error: no shift is possible on )
Important invariant for LR-parsing in general: we are never allowed to shift something illegal onto the stack!
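The whole LR(0) driver, including this invariant, can be sketched like this (Python; the tables are hand-written from the DFA for A → ( A ) | a with the state numbers above, but the table layout is mine, not the book's):

```python
# Shift entries: state -> {token: new state}.
SHIFT = {0: {"(": 3, "a": 2}, 3: {"(": 3, "a": 2}, 4: {")": 5}}
# Reduce entries: state -> (lhs, |rhs|); in LR(0) we reduce regardless of token.
REDUCE = {1: ("A'", 1), 2: ("A", 1), 5: ("A", 3)}
# Goto entries for the nonterminal A.
GOTO = {0: {"A": 1}, 3: {"A": 4}}

def parse(tokens):
    stack, inp = [("$", 0)], list(tokens) + ["$"]
    while True:
        state = stack[-1][1]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            if lhs == "A'":                    # reducing with A' -> A: accept,
                return inp == ["$"]            # but only if the input is used up
            del stack[-n:]
            stack.append((lhs, GOTO[stack[-1][1]][lhs]))
        elif inp[0] in SHIFT.get(state, {}):   # never shift anything illegal!
            tok = inp.pop(0)
            stack.append((tok, SHIFT[state][tok]))
        else:
            return False                       # syntax error

print(parse(list("((a))")), parse(list("((a)")), parse(list("()")))
# True False False
```

The two erroneous sentences fail exactly where the traces above got stuck: no legal shift exists and no reduction applies.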
A (slightly erroneous) formulation of the LR(0) requirement: if the following rules give an unambiguous algorithm, the grammar is LR(0):
• Shift the next input symbol X: the new top state is t, where s --X--> t in the DFA (s is the current top state).
• Reduce with A → γ (when the top state contains the complete item A → γ •): pop γ and the states covering it; the new top state is t, where u --A--> t and u is the uncovered state.
• Termination: we are finished when we reduce with S’ → S and the input is exhausted.
(Note: an item can occur in many states.)
The next grammar: Is it LR(0)? No, because of the states 0, 2 and 4
[Figure: the LR(0)-DFA for this grammar is not reproduced in the transcript.]
Old grammar that is not LR(0): how can we make a choice in state 1?

[Figure: the parse tree for «n + n $». After the first «n» we should shift; at the end we should reduce with E’ → E.]
Solution: We look at the next input symbol, in the token-variable!!
SLR(1) – grammars. The SLR(1) algorithm
Very few grammars are LR(0). By looking at the Follow sets we can obtain a much stronger algorithm. It will still use the LR(0)-DFA. The table structure will be a little different: the tables will have one column for each terminal and one for each non-terminal.
LR(0): A state containing two complete items
... A → α •     ... B → β •
gives an unsolvable reduce/reduce conflict.

SLR(1): If Follow(A) ∩ Follow(B) = ∅, then we can solve the conflict by looking at the next input symbol:
• If token ∈ Follow(A): reduce with A → α
• If token ∈ Follow(B): reduce with B → β
But the «...» may indicate other possibilities!
SLR(1) - grammars, SLR(1) - algorithms
Very few grammars are LR(0). By looking at the Follow sets we can obtain a much stronger algorithm. It will still use the LR(0)-DFA. The table structure will be a little different: the tables will have one column for each terminal and one for each non-terminal.
LR(0): A state containing a complete item together with shift items
... A → α •     ... B1 → β1 • b1 γ1     ... B2 → β2 • b2 γ2
gives an unsolvable shift/reduce conflict.

SLR(1): If Follow(A) ∩ {b1, b2, ...} = ∅, then we can solve the conflict by looking at the next input symbol (in token) as follows:
• If token ∈ Follow(A): reduce with A → α. The non-terminal A will decide the new top state.
• If token = b1, b2, ...: shift. The input symbol will decide the new top state.
But the «...» may indicate other possibilities!
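This conflict resolution can be sketched as a function that builds the action row for one state (Python, my own representation; the Follow sets are given as input, not computed):

```python
NONTERMINALS = {"E'", "E"}

def slr_row(state, follow):
    """Map each possible lookahead token to its unique action;
    fail if the state violates the SLR(1) condition."""
    actions = {}
    def add(token, action):
        if actions.get(token, action) != action:
            raise ValueError(f"not SLR(1): conflict on {token!r}")
        actions[token] = action
    for (lhs, rhs, dot) in state:
        if dot == len(rhs):                   # complete item: reduce ...
            for token in follow[lhs]:         # ... on tokens in Follow(lhs)
                add(token, ("reduce", lhs, rhs))
        elif rhs[dot] not in NONTERMINALS:    # shift item on a terminal
            add(rhs[dot], "shift")
    return actions

# State 1 of the E-grammar DFA: E' -> E .  and  E -> E . + n
state1 = {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)}
row = slr_row(state1, {"E'": {"$"}})
print(row["+"], row["$"])
```

With Follow(E’) = { $ }, the shift on «+» and the reduction on «$» no longer overlap, which is exactly why this state is acceptable in SLR(1) although it fails LR(0).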
Is this grammar SLR(1)?
The SLR(1) requirement in the book: for all DFA states s, we must have:
1. For any item A → α • X β in s where X is a terminal: X must not be in Follow(B) for any complete item B → γ • in s. (We would otherwise have a shift/reduce conflict on input X.)
2. For any two complete items A → α • and B → γ • in s: Follow(A) ∩ Follow(B) = ∅. (We would otherwise have a reduce/reduce conflict when the input is in this set.) Recall: «complete item» = an item with the dot at the end.

For our grammar: Follow(E’) = { $ }. State 1 has a shift/reduce conflict in LR(0), but not in SLR(1). Thus:
• Shift for ’+’
• Reduce for ’$’, with E’ → E (which is accept)
(In states 0 and 3 we should shift for n.)
A (slightly erroneous) formulation of the SLR(1) requirement: if the following rules give an unambiguous algorithm, the grammar is SLR(1):
• Shift on X: the new top state is t, where s --X--> t.
• Reduce with A → γ only when token ∈ Follow(A): the new top state is t, where u --A--> t and u is the uncovered state. (The Follow-test on reductions is what is new compared to LR(0).)
Parsing table for an SLR(1)-grammar
The SLR(1) requirement put another way: this table must be unambiguous!
’n’ not in Follow(E)
Parsing for an SLR(1)-grammar
[Figure: the parse tree for «n + n + n», with root E’ → E.]
We may also look at erroneous sentences: «+ n $», «n n $», «n + $».
Parsing of the sentence: n + n + n
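The parse of «n + n + n» (and the rejection of the erroneous sentences) can be replayed with a table-driven sketch (Python; the ACTION/GOTO layout is my own, the states are those of the LR(0)-DFA above, and Follow(E) = { +, $ } decides where the reduce entries go):

```python
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},                # reduce E' -> E = accept
    2: {"+": ("reduce", "E", 1), "$": ("reduce", "E", 1)},   # E -> n
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", "E", 3), "$": ("reduce", "E", 3)},   # E -> E + n
}
GOTO = {0: {"E": 1}}

def parse(tokens):
    stack, inp = [("$", 0)], list(tokens) + ["$"]
    while True:
        act = ACTION.get(stack[-1][1], {}).get(inp[0])
        if act is None:
            return False                 # erroneous sentence
        if act[0] == "accept":
            return True
        if act[0] == "shift":
            stack.append((inp.pop(0), act[1]))
        else:                            # reduce with a production of length act[2]
            _, lhs, n = act
            del stack[-n:]
            stack.append((lhs, GOTO[stack[-1][1]][lhs]))

print(parse(["n", "+", "n", "+", "n"]))                         # True
print(parse(["+", "n"]), parse(["n", "n"]), parse(["n", "+"]))  # False False False
```

Every lookup with an empty table entry is a syntax error, so the erroneous sentences are rejected at the first illegal token.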
Is this grammar SLR(1)?
Follow(S) = { ), $ }
SLR(k) – Possible to come up with a theory for this, but it is probably not used in any tools.