TRANSCRIPT
INF5110 Ch. 5: Bottom-up parsing
Part 1
12/2-2015 Stein Krogdahl
Ifi, UiO
Mandatory assignment 1: Will occur sometime during the week from 16/2
Bottom-up parsing

[Figure: a parse tree for the sentence t1 t2 t3 t4 t5 t6 t7, with internal nodes S, A and B.]
The methods listed below are, in the given order, able to handle more and more complicated grammars. Each «state» is represented as a row in the parsing table.
- LR(0): Can handle only very simple grammars. Requires about 300 states for a standard programming language. Serves only as an introduction to SLR(1) and LALR(1).
- SLR(1): Can handle most standard grammars for standard PLs. The same number of states as the LR(0) method. This method will be our main focus.
- LALR(1): Can handle a few more grammars than SLR(1). Again the same number of states as the LR(0) method. We will look at the ideas behind this method.
- LR(1): Can handle all grammars that in any way can be handled by looking at only the next token. The number of states will be around 3000.
Automated tools that, from a BNF grammar, can deliver a parser:
- YACC, Bison, CUP (all of these use LALR(1) techniques)
[Figure: the table for LR parsing — one row per state, one column per token and per nonterminal.]
Data structure for LR-parsing
S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
Assume that the grammar is unambiguous, that we are given a correct sentence «t1 t2 … t7» and that we know the parse tree for this sentence:
[Figure: the parse tree for this sentence, with root S’ → S.]
LR-parsing: • Have a stack representing what we have read
• Make a «reduction» of a subtree when «it occurs» at the top of the stack
Add a new «outermost» production. Thus, the new start symbol S’ will never occur in the right hand side of a production.
More about LR-parsing
And we assume that we know the parse tree for this sentence:

[Figure: the same parse tree as before, with root S’ → S.]
• We have a stack representing what is read
• Make a «reduction» of a subtree when it appears at the top of the stack.
• A reduction is to replace this with the non-terminal that produced this tree.
• A reduction: To use a production backwards
Start situation:   stack: $      input: t1 t2 t3 t4 t5 t6 t7 $
End situation:     stack: $ S’   input: $
S’ → S   (the new «outermost» production)
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
The principles of LR-parsing, and the shift and reduce operations
S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B | ....
B → t2 t3 | A t6 | ....
stack            input
$                t1 t2 t3 t4 t5 t6 t7 $
$ t1             t2 t3 t4 t5 t6 t7 $
$ t1 t2          t3 t4 t5 t6 t7 $
$ t1 t2 t3       t4 t5 t6 t7 $
$ t1 B           t4 t5 t6 t7 $
$ A              t4 t5 t6 t7 $
$ A t4           t5 t6 t7 $
$ A t4 t5        t6 t7 $
$ A A            t6 t7 $
$ A A t6         t7 $
$ A B            t7 $
$ A B t7         $
$ S              $
$ S’             $
• There are two types of steps:
• Shift: Move the next input symbol over to the top of the stack.
• Reduction: Remove the symbols of the rightmost subtree from the stack, and replace them with the nonterminal at the root of that subtree.
If you know the parse tree it is easy to perform these steps correctly.
BUT: How can we do this without knowing:
- The full syntax tree
- The rest of the input
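To make the two step types concrete, here is a small sketch in Python (my own naming, not from the slides) that replays exactly the shift/reduce sequence from the trace above, with the known parse tree acting as the «oracle» we do not yet have:

```python
def shift(stack, inp):
    """Move the next input symbol over to the top of the stack."""
    stack.append(inp.pop(0))

def reduce(stack, lhs, rhs):
    """Use a production lhs -> rhs backwards: pop rhs, push lhs."""
    assert stack[-len(rhs):] == rhs, "the handle is not on top of the stack"
    del stack[-len(rhs):]
    stack.append(lhs)

def run(tokens, steps):
    """Replay a given sequence of shift/reduce steps."""
    stack, inp = [], list(tokens)
    for step in steps:
        if step == "shift":
            shift(stack, inp)
        else:
            _, lhs, rhs = step          # ("reduce", lhs, rhs)
            reduce(stack, lhs, rhs)
    return stack, inp

# The step sequence read off the known parse tree:
steps = [
    "shift", "shift", "shift",              # t1 t2 t3
    ("reduce", "B", ["t2", "t3"]),          # B -> t2 t3
    ("reduce", "A", ["t1", "B"]),           # A -> t1 B
    "shift", "shift",                       # t4 t5
    ("reduce", "A", ["t4", "t5"]),          # A -> t4 t5
    "shift",                                # t6
    ("reduce", "B", ["A", "t6"]),           # B -> A t6
    "shift",                                # t7
    ("reduce", "S", ["A", "B", "t7"]),      # S -> A B t7
    ("reduce", "S'", ["S"]),                # S' -> S
]
stack, rest = run(["t1", "t2", "t3", "t4", "t5", "t6", "t7"], steps)
print(stack, rest)   # ["S'"] []
```

The hard part, of course, is exactly what the slides ask: choosing these steps without the oracle.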
For comparison: Top down and bottom up parsing
[Figure: a partially built top-down parse tree with nodes S, A, B, C — the part already analysed and found OK, the current token, and the question «Which alternative for A?» The token will be matched against the rest of A.]
• In top-down (e.g. recursive descent) parsing we «produce» the syntax tree aiming to get the given sentence, while using the productions of the grammar.
• During bottom-up (e.g. LR) parsing we «reduce» the input according to productions, while aiming at the start symbol.
Example showing LR-parsing (when we know the tree!)
Right derivation (but the example is not good!)
[Figure: the parse tree for «n + n», with root E’ → E.]
In our textbook, but we do not stress this:
• The next reduction that should be made is called the ”handle” of the situation.
• «stack + input» will always form a sentential form occurring during a right derivation. These forms occur in the opposite order of that in which they occur in the right derivation.
Another example: LR-parsing from grammars with empty productions
[Figure: the parse tree for the sentence «( )», in which two occurrences of S derive ε.]
NB: S → ε will appear a little strange: during a reduction with this production, a nonterminal will show up at the top of the stack «from nowhere» (at the points marked in the figure).
A typical situation during LR-parsing
[Figure: a typical configuration — the stack holds the symbols s1 s2 s3 s4 ... sk against the current token and the rest of the input t1 t2 t3 ... tn $. All subtrees over the processed input are reduced: the stack is a reduced version of the processed input.]

After a shift, the next reduction to be made is a reduction with the production C → t1. Then, after two shifts, we will make a reduction with the production D → t2 t3. Then, what's next?
A plan for solving «The LR-problem»: «When to do a shift and when to do a reduction (with what production)»
(Not everything on the slides is found in our book. Only what’s in the book is curriculum)
We look at all possible stacks that can occur during LR-parsing of a sentence in L(G). We consider them as a new language Stacks(G), over the alphabet {terminals} ∪ {non-terminals}:

Stacks(G) = { s | s may occur as the stack during LR-parsing of a sentence in L(G) }

This language turns out to be regular, and can be described by an NFA where all states are accepting. These states are identified by «items» of the form: A → X Y . Z. The possible state transitions of the NFA can also be described rather straightforwardly.
We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)
Each state of this DFA will be a subset of the NFA states, and thus be sets of items.
The rest of: The plan for solving «The LR-problem»:
The new language Stacks(G), turns out to be regular, as it can be described by an NFA
In this NFA all states are accepting
The states of the NFA are all the possible ”items” of the grammar:
The position of the dot (e.g. in «A → X Y . Z» ) will play a central role in the later analyses.
We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)
Thus, each state of this DFA will consist of a set of items, and these items will indicate the possible «local situations» of the parsing (see: «The main proposition for LR-parsing»).
The NFA: The states and the transitions
(This is not fully explained in our book, and is not curriculum)
Given a grammar G, a randomly chosen sentence in L(G), and a randomly chosen point during LR-parsing of this sentence:

[Figure: the partial parse tree at such a point, drawn over the stack and the original input; the symbols along its left edge, from S’ down to the top of the stack, spell out the stack contents.]

We will describe an NFA that will accept exactly all such «lines», seen as sequences of symbols. This sequence is the content of the stack.

Note: No unreduced subtrees remain under the left-side branches of the remaining tree. They are all reduced to their root symbol.
The state transitions of the NFA
(This is not fully explained in our book, and is not curriculum)
[Figure: the same type of «random situation» as before — the stack against the original input, with a step along the left edge of the tree through a symbol X.]

The transitions:
• X can be either a non-terminal or a terminal: a step from the item A → α • X η to the item A → α X • η corresponds to an X-transition of the NFA.
• If X is a non-terminal: there is in addition an ε-transition from A → α • X η to each item X → • β.

We have to show:
(1) For any such situation, the left edge of the tree must be accepted by the NFA.
(2) For any path through the NFA, we can set up a parse tree that has this path as its left edge.
Example: The NFA that describes all possible stacks. More precisely: the LR(0)-NFA

E’ → E
E → E + n
E → n

The items (LR(0)-items) and the transitions of the NFA:
start state:  E’ → .E
E’ → .E   --E-->  E’ → E.
E → .E+n  --E-->  E → E.+n
E → E.+n  --+-->  E → E+.n
E → E+.n  --n-->  E → E+n.
E → .n    --n-->  E → n.
ε-transitions: from each item with the dot in front of E (that is, E’ → .E and E → .E+n) to E → .E+n and E → .n
The same NFA, in a slightly more orderly form. Again: the LR(0)-NFA (slightly more ordered than in the textbook), and the LR(0)-DFA made by the «subset construction» (Ch. 2):

State 0 (start):       E’ → .E,  E → .E+n,  E → .n
State 1 (from 0 on E): E’ → E.,  E → E.+n
State 2 (from 0 on n): E → n.
State 3 (from 1 on +): E → E+.n
State 4 (from 3 on n): E → E+n.

«E» on the stack ends in state 1. The two possible situations can then be exemplified by the two items of that state.

Closure: only the start state gets extra items from the closure; none of the other states will have any closure.
How to construct the LR(0)-DFA directly from the grammar
(Straight-ahead use of the «subset construction»)

Closure of a set I of items: if
A → α • B γ is an item in I, B is a non-terminal, and B → β1 | β2 | ...
then the items
B → • β1
B → • β2
…
should also be included in I.
The start state of the LR(0)-DFA is: S’ → • S, plus closure.

State transition for a symbol X from a state I (X is a terminal or a non-terminal; they are here treated the same way): collect all the items in I of the form «A → α • X β» (make sure that all of them are included):
........
A1 → α1 • X β1
........
A2 → α2 • X β2
........
The X-transition from I then leads to the state consisting of
A1 → α1 X • β1
A2 → α2 X • β2
plus closure.
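The two constructions above can be sketched in a few lines of Python. The representation is my own, not the book's: an item is a triple (lhs, rhs, dot), and the assumed example grammar is E’ → E, E → E + n | n from the earlier slides:

```python
# Grammar: nonterminal -> list of right-hand sides (tuples of symbols).
GRAMMAR = {"E'": [("E",)], "E": [("E", "+", "n"), ("n",)]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """If A -> alpha . B gamma is in the set and B is a nonterminal,
    add B -> . beta for every production B -> beta (repeat until stable)."""
    items = set(items)
    done = False
    while not done:
        done = True
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for beta in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], beta, 0) not in items:
                        items.add((rhs[dot], beta, 0))
                        done = False
    return frozenset(items)

def goto(state, X):
    """The X-transition: move the dot past X in every item A -> alpha . X beta,
    then take the closure of the result."""
    return closure({(lhs, rhs, dot + 1)
                    for (lhs, rhs, dot) in state
                    if dot < len(rhs) and rhs[dot] == X})

state0 = closure({("E'", ("E",), 0)})   # the start state: E' -> . E plus closure
state1 = goto(state0, "E")              # feeding E gives state 1
print(state1 == {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)})   # True
```

Repeating `goto` on every symbol from every discovered state, until no new item sets appear, yields the whole LR(0)-DFA.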
Another example: How to construct the LR(0)-DFA directly from the grammar
Note that S → ε gives only one item, which is: S → •   (not two items «S → • ε» and «S → ε •»).
[Figures: the items, the LR(0)-NFA and the LR(0)-DFA for this grammar are not reproduced in the transcript.]
What is the ”top state” telling us?
The top state: the DFA state that we arrive at when the stack is fed through the DFA.

The «Main proposition for LR-parsing»: the items in the top state will indicate all possible «local situations» that can occur in an LR parse with the current stack.
[Figure: the stack spells out the left edge of a partial parse tree through S’, S, B, A down to the item X → α • β; the rest of the input, starting with the current token, lies to the right.]

If the item X → α • β is a member of the top state, then the situation may be as shown in the figure.
When is shift a possibility?

Assume that the top state s contains the item X → α • a β, where «a» may be either a terminal or a non-terminal. The DFA then has an a-transition from s to a state t:

s: ... X → α • a β ...   --a-->   t: ... X → α a • β ...

This tells us that the situation may be as in the figure. Thus:
• Shift is a possible operation.
• Also: if shift is the correct operation and «a» is a terminal symbol equal to the token symbol, then the state after the shift will be t.
[Figure: the corresponding situation — the stack spells the left edge of the tree down to α, and «a» begins the rest of the input.]
When is a reduction (with a given production) a possibility?
Assume that the top state is s, and that this state contains the complete item A → γ •  (a «complete item» — in Norwegian «slutt-item» — has the dot at the end).

This indicates that the situation locally may be as in the figure, and that the next step might be a reduction with A → γ.

NEW: We remember the states between the stack symbols!

Current stack:  ... v u w z s     (the states w, z, s cover the symbols of γ)
New stack:      ... v u A t       (t is the state the DFA reaches from u on A)

The reduction step: pop off what corresponds to γ (and the states in between), push A as the new top symbol, and find the new top state t from u and A.
[Figure: the stack against the rest of the input, with γ as the rightmost reduced part, and the part of the DFA showing the A-transition from state u to state t.]
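The reduction step with remembered states can be sketched as follows (Python, my own names; the stack holds (symbol, state) pairs, and the assumed example uses the E-grammar DFA from the earlier slides, where goto(0, E) = 1):

```python
def reduce_step(stack, lhs, rhs_len, goto_table):
    """Pop |gamma| (symbol, state) pairs off the stack, push lhs,
    and find the new top state from the uncovered state u and lhs."""
    del stack[-rhs_len:]              # pop what corresponds to gamma
    u = stack[-1][1]                  # the uncovered state ('u' on the slide)
    stack.append((lhs, goto_table[u][lhs]))

# Assumed example: after shifting 'n' (state 2 holds E -> n .),
# we reduce with E -> n and land in goto(0, E) = 1.
goto_table = {0: {"E": 1}}
stack = [("$", 0), ("n", 2)]
reduce_step(stack, "E", 1, goto_table)
print(stack)   # [('$', 0), ('E', 1)]
```

Keeping the states interleaved with the symbols is what makes the new top state cheap to find: no re-feeding of the whole stack through the DFA is needed.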
Our old example, and LR(0) grammars
[Figure: the parse tree for «n + n $». After the first «n» has been reduced to E we should shift; at the very end we should reduce with E’ → E.]
Def. of LR(0)-grammar: «The top state uniquely decides the next step»
Thus: the above grammar is not LR(0), because of state 1.
If the stack is «E», the top state is 1, and we can either shift or reduce with E’ → E.
Are these example grammars LR(0)? A grammar is LR(0) iff: for every state, only one action is possible.
- When shift: many shift items may occur in the state; shift is still one action.
- When reduction: it must also be clear with which production.

The example grammars:

We have already looked at the grammar E’ → E, E → E + n | n: not LR(0)!

The grammar A’ → A, A → ( A ) | a gives the following LR(0)-DFA:

State   Possible actions
0       Only shift is possible
1       Only reduction is possible, with A’ → A
2       Only reduction is possible, with A → a
3       Only shift is possible
4       Only shift is possible
5       Only reduction is possible, with A → ( A )

Yes! This grammar is LR(0)!
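The «only one action per state» test can be sketched directly (Python, my own representation; items are triples (lhs, rhs, dot) as in the earlier sketches):

```python
def lr0_state_ok(state):
    """A state passes the LR(0) test iff it has no complete item
    (only shift is possible), or consists of exactly one complete item
    (only that one reduction is possible)."""
    complete = [it for it in state if it[2] == len(it[1])]
    return len(complete) == 0 or (len(complete) == 1 and len(state) == 1)

# State 1 of the E-grammar DFA: E' -> E .  together with  E -> E . + n
conflict_state = {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)}
# State 2 of the A-grammar DFA: A -> a .  alone
ok_state = {("A", ("a",), 1)}
print(lr0_state_ok(conflict_state), lr0_state_ok(ok_state))   # False True
```

A grammar is then LR(0) iff every state of its LR(0)-DFA passes this test.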
Parsing table for an LR(0) grammar. This table structure is slightly different from the one used for SLR(1), LALR(1) and LR(1) grammars. Therefore this table structure is not important (not curriculum).
Parsing of the sentence: ( ( a ) )

[Figure: the parse tree for «((a))» with root A’ → A, and the corresponding parsing table.]

At the end we reduce with A’ → A, and as the input is then empty we are finished; the sentence is correct.
If a reduction brings us to state 0 or 3, the Goto part will tell us what state pushing A will give
Parsing of erroneous sentences Grammar: A → ( A ) | a
Parsing of «( ( a )»:
stack                  input
$ 0                    ( ( a ) $
$ 0 ( 3                ( a ) $
$ 0 ( 3 ( 3            a ) $
$ 0 ( 3 ( 3 a 2        ) $
$ 0 ( 3 ( 3 A 4        ) $
$ 0 ( 3 ( 3 A 4 ) 5    $
$ 0 ( 3 A 4            $        error: no shift is possible on $

Parsing of «( )»:
$ 0                    ( ) $
$ 0 ( 3                ) $      error: no shift is possible on )
Important invariant for LR-parsing in general: we are never allowed to shift something illegal onto the stack!
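The whole LR(0) driver, including this invariant, can be sketched like this (Python; the tables are hand-written from the DFA for A → ( A ) | a with the state numbers above, but the table layout is mine, not the book's):

```python
# Shift entries: state -> {token: new state}.
SHIFT = {0: {"(": 3, "a": 2}, 3: {"(": 3, "a": 2}, 4: {")": 5}}
# Reduce entries: state -> (lhs, |rhs|); in LR(0) we reduce regardless of token.
REDUCE = {1: ("A'", 1), 2: ("A", 1), 5: ("A", 3)}
# Goto entries for the nonterminal A.
GOTO = {0: {"A": 1}, 3: {"A": 4}}

def parse(tokens):
    stack, inp = [("$", 0)], list(tokens) + ["$"]
    while True:
        state = stack[-1][1]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            if lhs == "A'":                    # reducing with A' -> A: accept,
                return inp == ["$"]            # but only if the input is used up
            del stack[-n:]
            stack.append((lhs, GOTO[stack[-1][1]][lhs]))
        elif inp[0] in SHIFT.get(state, {}):   # never shift anything illegal!
            tok = inp.pop(0)
            stack.append((tok, SHIFT[state][tok]))
        else:
            return False                       # syntax error

print(parse(list("((a))")), parse(list("((a)")), parse(list("()")))
# True False False
```

The two erroneous sentences fail exactly where the traces above got stuck: no legal shift exists and no reduction applies.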
A (slightly erroneous) formulation of the LR(0) requirement: if the following rules give an unambiguous algorithm, the grammar is LR(0):
• Shift the next input symbol X: the new top state is t, where s --X--> t in the DFA (s is the current top state).
• Reduce with A → γ (when the top state contains the complete item A → γ •): pop γ and the states covering it; the new top state is t, where u --A--> t and u is the uncovered state.
• Termination: we are finished when we reduce with S’ → S and the input is exhausted.
(Note: an item can occur in many states.)
The next grammar: Is it LR(0)? No, because of the states 0, 2 and 4
[Figure: the LR(0)-DFA for this grammar is not reproduced in the transcript.]
Old grammar that is not LR(0): how can we make a choice in state 1?

[Figure: the parse tree for «n + n $». After the first «n» we should shift; at the end we should reduce with E’ → E.]
Solution: We look at the next input symbol, in the token-variable!!
SLR(1) – grammars. The SLR(1) algorithm
Very few grammars are LR(0). By looking at the Follow sets we can obtain a much stronger algorithm. It will still use the LR(0)-DFA. The table structure will be a little different: the tables will have one column for each terminal and one for each non-terminal.
LR(0): A state containing two complete items
... A → α •     ... B → β •
gives an unsolvable reduce/reduce conflict.

SLR(1): If Follow(A) ∩ Follow(B) = ∅, then we can solve the conflict by looking at the next input symbol:
• If token ∈ Follow(A): reduce with A → α
• If token ∈ Follow(B): reduce with B → β
But the «...» may indicate other possibilities!
SLR(1) - grammars, SLR(1) - algorithms
Very few grammars are LR(0). By looking at the Follow sets we can obtain a much stronger algorithm. It will still use the LR(0)-DFA. The table structure will be a little different: the tables will have one column for each terminal and one for each non-terminal.
LR(0): A state containing a complete item together with shift items
... A → α •     ... B1 → β1 • b1 γ1     ... B2 → β2 • b2 γ2
gives an unsolvable shift/reduce conflict.

SLR(1): If Follow(A) ∩ {b1, b2, ...} = ∅, then we can solve the conflict by looking at the next input symbol (in token) as follows:
• If token ∈ Follow(A): reduce with A → α. The non-terminal A will decide the new top state.
• If token = b1, b2, ...: shift. The input symbol will decide the new top state.
But the «...» may indicate other possibilities!
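This conflict resolution can be sketched as a function that builds the action row for one state (Python, my own representation; the Follow sets are given as input, not computed):

```python
NONTERMINALS = {"E'", "E"}

def slr_row(state, follow):
    """Map each possible lookahead token to its unique action;
    fail if the state violates the SLR(1) condition."""
    actions = {}
    def add(token, action):
        if actions.get(token, action) != action:
            raise ValueError(f"not SLR(1): conflict on {token!r}")
        actions[token] = action
    for (lhs, rhs, dot) in state:
        if dot == len(rhs):                   # complete item: reduce ...
            for token in follow[lhs]:         # ... on tokens in Follow(lhs)
                add(token, ("reduce", lhs, rhs))
        elif rhs[dot] not in NONTERMINALS:    # shift item on a terminal
            add(rhs[dot], "shift")
    return actions

# State 1 of the E-grammar DFA: E' -> E .  and  E -> E . + n
state1 = {("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)}
row = slr_row(state1, {"E'": {"$"}})
print(row["+"], row["$"])
```

With Follow(E’) = { $ }, the shift on «+» and the reduction on «$» no longer overlap, which is exactly why this state is acceptable in SLR(1) although it fails LR(0).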
Is this grammar SLR(1)?
The SLR(1) requirement in the book: for all DFA states s, we must have:
1. For any item A → α • X β in s where X is a terminal: X must not be in Follow(B) for any complete item B → γ • in s. (We would otherwise have a shift/reduce conflict on input X.)
2. For any two complete items A → α • and B → γ • in s: Follow(A) ∩ Follow(B) = ∅. (We would otherwise have a reduce/reduce conflict when the input is in this set.) Recall: «complete item» = an item with the dot at the end.

For our grammar: Follow(E’) = { $ }. State 1 has a shift/reduce conflict in LR(0), but not in SLR(1). Thus:
• Shift for ’+’
• Reduce for ’$’, with E’ → E (which is accept)
(In states 0 and 3 we should shift for n.)
A (slightly erroneous) formulation of the SLR(1) requirement: if the following rules give an unambiguous algorithm, the grammar is SLR(1):
• Shift on X: the new top state is t, where s --X--> t.
• Reduce with A → γ only when token ∈ Follow(A): the new top state is t, where u --A--> t and u is the uncovered state. (The Follow-test on reductions is what is new compared to LR(0).)
Parsing table for an SLR(1)-grammar
The SLR(1) requirement put another way: this table must be unambiguous!
’n’ not in Follow(E)
Parsing for an SLR(1)-grammar
[Figure: the parse tree for «n + n + n», with root E’ → E.]
We may also look at erroneous sentences: «+ n $», «n n $», «n + $».
Parsing of the sentence: n + n + n
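The parse of «n + n + n» (and the rejection of the erroneous sentences) can be replayed with a table-driven sketch (Python; the ACTION/GOTO layout is my own, the states are those of the LR(0)-DFA above, and Follow(E) = { +, $ } decides where the reduce entries go):

```python
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},                # reduce E' -> E = accept
    2: {"+": ("reduce", "E", 1), "$": ("reduce", "E", 1)},   # E -> n
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", "E", 3), "$": ("reduce", "E", 3)},   # E -> E + n
}
GOTO = {0: {"E": 1}}

def parse(tokens):
    stack, inp = [("$", 0)], list(tokens) + ["$"]
    while True:
        act = ACTION.get(stack[-1][1], {}).get(inp[0])
        if act is None:
            return False                 # erroneous sentence
        if act[0] == "accept":
            return True
        if act[0] == "shift":
            stack.append((inp.pop(0), act[1]))
        else:                            # reduce with a production of length act[2]
            _, lhs, n = act
            del stack[-n:]
            stack.append((lhs, GOTO[stack[-1][1]][lhs]))

print(parse(["n", "+", "n", "+", "n"]))                         # True
print(parse(["+", "n"]), parse(["n", "n"]), parse(["n", "+"]))  # False False False
```

Every lookup with an empty table entry is a syntax error, so the erroneous sentences are rejected at the first illegal token.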
Is this grammar SLR(1)?
Follow(S) = { ), $ }
SLR(k) – Possible to come up with a theory for this, but it is probably not used in any tools.