Bottom Up Parsing 1

Uploaded by: vivek-patel

Posted on: 26-Sep-2015

  • 1 BOTTOM UP PARSING

  • 2 Shift-Reduce Parsers

    There are two main categories of shift-reduce parsers:

    1. Operator-Precedence Parser

    Simple, but handles only a small class of grammars.

    2. LR-Parsers

    Cover a wide range of grammars.

    SLR: simple LR parser

    LR: most general LR parser

    LALR: intermediate LR parser (look-ahead LR parser)

    SLR, LR and LALR parsers work the same way; only their parsing tables are different.

    (Figure: nested grammar classes, SLR ⊂ LALR ⊂ LR ⊂ CFG)

  • 3 LR Parsers

    The most powerful (yet efficient) shift-reduce parsing method is LR(k) parsing.

    L: left-to-right scanning of the input; R: right-most derivation (in reverse); k: number of lookahead symbols (when k is omitted, it is 1).

    LR parsing is attractive because it is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.

    The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers: LL(1)-Grammars ⊂ LR(1)-Grammars.

    An LR-parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

  • 4 LL(k) vs. LR(k)

    LL(k): must predict which production to use having seen only the first k tokens of the RHS

    Works only with some grammars

    But the algorithm is simple (a parser can be constructed by hand)

    LR(k): more powerful

    Can postpone the decision until it has seen the tokens of the entire RHS of a production and k more beyond

  • 5 More on LR(k)

    Can recognize virtually all programming language constructs (for which a CFG can be given)

    Most general non-backtracking shift-reduce method known, yet it can be implemented efficiently

    The class of grammars that can be parsed is a superset of the grammars parsed by LL(k)

    Can detect syntax errors as soon as possible

  • 6 More on LR(k)

    Main drawback: too tedious to construct by hand for typical programming-language grammars; we need a parser generator

    Many are available:

    Yacc (yet another compiler compiler) or Bison for the C/C++ environment

    CUP (Constructor of Useful Parsers) for the Java environment

    JavaCC is another parser generator for Java, although it generates top-down LL(k) parsers rather than LR parsers

    We write the grammar, and the generator produces the parser for that grammar

  • 7 LR Parsers

    LR-Parsers

    Cover a wide range of grammars.

    SLR: simple LR parser

    LR: most general LR parser

    LALR: intermediate LR parser (look-ahead LR parser)

    SLR, LR and LALR work the same way (they use the same algorithm); only their parsing tables are different.

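  • 8 LR Parsing Algorithm

    (Figure: model of an LR parser)

    Stack: S0 X1 S1 ... Xm-1 Sm-1 Xm Sm, with Sm on top

    Input: a1 ... ai ... an $

    Action Table: rows are states, columns are terminals and $; each entry is one of four different actions

    Goto Table: rows are states, columns are non-terminals; each entry is a state number

    The LR parsing algorithm reads the stack and the input, consults both tables, and produces the output.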

  • 9 Parse Table For Expression Grammar

    Rules:

    1. E → E + T

    2. E → T

    3. T → T * F

    4. T → F

    5. F → ( E )

    6. F → id

    Notation:

    s5 = shift 5

    r2 = reduce by E → T

            action                              |  goto
    state   id    +     *     (     )     $     |  E    T    F
    0       s5                s4                |  1    2    3
    1             s6                      acc   |
    2             r2    s7          r2    r2    |
    3             r4    r4          r4    r4    |
    4       s5                s4                |  8    2    3
    5             r6    r6          r6    r6    |
    6       s5                s4                |       9    3
    7       s5                s4                |            10
    8             s6                s11         |
    9             r1    s7          r1    r1    |
    10            r3    r3          r3    r3    |
    11            r5    r5          r5    r5    |

  • 10

    Entries in Transition Table

    Entry   Meaning
    sn      Shift into state n (advance the input pointer to the next token)
    gn      Goto state n
    rk      Reduce by rule (production) k; the corresponding gn gives the next state
    acc     Accept
    blank   Error

  • 11

    Actions of an LR-Parser

    1. shift s -- shifts the next input symbol and the state s onto the stack

    ( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) ⇒ ( S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

    2. reduce A → β (or rn, where n is a production number)

    pop 2|β| items from the stack (r = |β| grammar symbols, each with its state);

    then push A and s, where s = goto[Sm-r, A]

    Output is the reducing production A → β

    3. Accept -- parsing successfully completed

    4. Error -- the parser detected an error (an empty entry in the action table)

  • 12

    LR Parsing Algorithm

    Refer to the text:

    Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

    Pages 218-219

  • 13

    Actions of an (S)LR-Parser -- Example

    stack       input       action               output
    0           id*id+id$   shift 5
    0id5        *id+id$     reduce by F → id     F → id
    0F3         *id+id$     reduce by T → F      T → F
    0T2         *id+id$     shift 7
    0T2*7       id+id$      shift 5
    0T2*7id5    +id$        reduce by F → id     F → id
    0T2*7F10    +id$        reduce by T → T*F    T → T*F
    0T2         +id$        reduce by E → T      E → T
    0E1         +id$        shift 6
    0E1+6       id$         shift 5
    0E1+6id5    $           reduce by F → id     F → id
    0E1+6F3     $           reduce by T → F      T → F
    0E1+6T9     $           reduce by E → E+T    E → E+T
    0E1         $           accept
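The trace above can be reproduced with a short driver loop. The following Python sketch is one possible rendering of the LR parsing algorithm from slides 11 and 12. It assumes the SLR table of slide 9 is stored in plain dictionaries; the names `ACTION`, `GOTO`, `PRODUCTIONS` and `lr_parse` are illustrative only. The stack alternates states and grammar symbols as in S0 X1 S1 ... Xm Sm, a reduction pops 2|β| items, and the returned list corresponds to the output column.

```python
# SLR table for the expression grammar (slide 9); productions are numbered 1..6.
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Return the productions used for reductions, in order; raise on a blank entry."""
    stack = [0]                      # S0 X1 S1 ... Xm Sm, top at the end
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION[state].get(a)   # a missing (blank) entry means error
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push the symbol and the new state
            stack += [a, int(act[1:])]
            pos += 1
        else:                        # reduce by A -> beta
            lhs, rhs = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - 2 * len(rhs):]   # pop 2|beta| items
            stack += [lhs, GOTO[stack[-1]][lhs]]    # push A and goto[Sm-r, A]
            output.append(f"{lhs} -> {' '.join(rhs)}")


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Calling `lr_parse(["id", "+"])` reaches the blank entry for `$` in state 6 and raises `SyntaxError`, which corresponds to the error action.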

  • 14

    Key Idea

    Deciding when to shift and when to reduce is based on a DFA applied to the stack

    Edges of the DFA are labeled by symbols that can be on the stack (terminals + non-terminals)

    The transition table defines the transitions (and characterizes the type of LR parser)
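To make the key idea concrete, here is a small Python sketch. It assumes the transitions of the final LR(0) DFA for the expression grammar are hard-coded in a dictionary, with states numbered as in the state-construction slides; `DFA` and `run_dfa` are hypothetical names. Feeding the grammar symbols on the stack into the DFA, from bottom to top, yields the state the parser is in. If some symbol has no edge, the sequence is not a viable prefix.

```python
# Transitions of the LR(0) DFA for the expression grammar (states 0-11).
DFA = {
    0: {"E": 1, "T": 2, "F": 3, "(": 4, "id": 5},
    1: {"+": 6},
    2: {"*": 7},
    4: {"E": 8, "T": 2, "F": 3, "(": 4, "id": 5},
    6: {"T": 9, "F": 3, "(": 4, "id": 5},
    7: {"F": 10, "(": 4, "id": 5},
    8: {"+": 6, ")": 11},
    9: {"*": 7},
}


def run_dfa(symbols, start=0):
    """Feed stack symbols (bottom to top) to the DFA.

    Returns the state reached, or None if some symbol has no edge, i.e. the
    sequence is not a viable prefix.
    """
    state = start
    for x in symbols:
        state = DFA.get(state, {}).get(x)   # states 3, 5, 10, 11 have no edges
        if state is None:
            return None
    return state


print(run_dfa(["E", "+", "T", "*"]))   # 7
print(run_dfa(["E", "+", "T", "+"]))   # None
```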

  • 15

    SLR PARSING

    The central idea in the SLR method is first to construct from the grammar a DFA to recognize viable prefixes. We group items into sets, which become the states of the SLR parser.

    Viable prefixes:

    The prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.

    Example: a, aa, aab, and aabb are viable prefixes of aabbbbd.

    One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing SLR parsers.

  • 16

    How to make the Parse Table?

    Use a DFA for building parse tables

    Each state summarizes how much we have seen so far and what we expect to see

    This helps us decide what action we need to take

    How do we build the DFA, then?

    Analyze the grammar and its productions

    We need a notation to show how much of a given production we have seen so far: the LR(0) item

  • 17

    LR(0) Item

    An LR(0) item is a production together with a position in its RHS marked by a dot (e.g., [A → α.β])

    The dot tells how much of the RHS we have seen so far. For example, for a production S → XYZ:

    [S → .XYZ]: we hope to see a string derivable from XYZ

    [S → X.YZ]: we have just seen a string derivable from X and we hope to see a string derivable from YZ

    [S → XY.Z]: we have just seen a string derivable from XY and we hope to see a string derivable from Z

    [S → XYZ.]: we have seen a string derivable from XYZ and are going to reduce it to S

    (X, Y, Z are grammar symbols)
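A tiny Python sketch lists every LR(0) item of a production by placing the dot at each of the |RHS| + 1 possible positions. It assumes a production is given as a left-hand-side string plus a list of right-hand-side symbols. `lr0_items` is a hypothetical helper, and "." stands for the dot.

```python
def lr0_items(lhs, rhs):
    """List every LR(0) item of lhs -> rhs, using '.' as the dot."""
    items = []
    for dot in range(len(rhs) + 1):  # the dot can sit before, between, or after symbols
        body = rhs[:dot] + ["."] + rhs[dot:]
        items.append(f"{lhs} -> {' '.join(body)}")
    return items


for item in lr0_items("S", ["X", "Y", "Z"]):
    print(item)
```

An ε-production A → ε has exactly one item, `A -> .`, which is both the initial and the complete item.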

  • 18

    Augmented Grammar

    If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' → S.

    The purpose of the augmenting production is to indicate to the parser when it should stop parsing and accept the input. That is, acceptance occurs only when the parser is about to reduce by the production S' → S.
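Augmenting a grammar is a purely mechanical step. The following minimal Python sketch assumes a grammar is a list of `(lhs, rhs)` pairs with `rhs` a list of symbols. `augment` is a hypothetical helper, not a library function.

```python
def augment(grammar, start):
    """Return (S', G'): a fresh start symbol S' and the grammar with S' -> S prepended."""
    nonterminals = {lhs for lhs, _ in grammar}
    new_start = start + "'"
    while new_start in nonterminals:   # keep the new symbol fresh
        new_start += "'"
    return new_start, [(new_start, [start])] + list(grammar)


grammar = [("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["T", "*", "F"]),
           ("T", ["F"]), ("F", ["(", "E", ")"]), ("F", ["id"])]
start, augmented = augment(grammar, "E")
print(start, augmented[0])
```

Placing S' → S first makes it production 0. A parser can then accept exactly when it would reduce by production 0.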

  • 19

    Constructing Sets of LR(0) Items

    1. Create a new nonterminal S' and a new production S' → S, where S is the start symbol.

    2. Put the item [S' → .S] into a start state called state 0.

    3. Closure: if [A → α.Bβ] is in state s, then add [B → .γ] to state s for every production B → γ in the grammar.

    4. Creating a new state from an old state [goto operation]: goto(I, X) is the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I, where X is a grammar symbol.

    5. Repeat steps 3 and 4 until no new states are created. A state is new if it is not identical to an old state.
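The five steps above translate into a worklist algorithm. In this Python sketch, an item is represented as a pair `(production index, dot position)`, and the augmented expression grammar is hard-coded as `GRAMMAR`. These representations and the function names `closure`, `goto` and `canonical_collection` are assumptions for illustration. States are explored in sorted symbol order, so their numbers need not match the numbering on the following slides, but the collection itself is the same.

```python
GRAMMAR = [  # augmented expression grammar; item = (production index, dot position)
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Step 3: for [A -> alpha . B beta], add [B -> . gamma] for each B-production."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for p, dot in list(result):
            rhs = GRAMMAR[p][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for q, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        changed = True
    return frozenset(result)


def goto(items, x):
    """Step 4: move the dot past x in every item where x follows it, then close."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)


def canonical_collection():
    """Steps 2 and 5 (step 1 is the hard-coded E' -> E production)."""
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states = [closure({(0, 0)})]          # state 0 = closure({[E' -> .E]})
    edges = {}                            # (state, symbol) -> state
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto(states[i], x)
            if not target:
                continue                  # no item has x after the dot
            if target not in states:      # new only if not identical to an old state
                states.append(target)
            edges[(i, x)] = states.index(target)
        i += 1
    return states, edges


states, edges = canonical_collection()
print(len(states), "states,", len(edges), "transitions")   # 12 states, 22 transitions
```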

  • 20

    The Closure Operation (Example)

    Grammar: E → E + T | T, T → T * F | F, F → ( E ) | id

    closure({[E' → .E]}) =

    { [E' → .E] }

    Add [E → .γ]: { [E' → .E], [E → .E + T], [E → .T] }

    Add [T → .γ]: { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F] }

    Add [F → .γ]: { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .( E )], [F → .id] }
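The rounds in this example can be reproduced mechanically. This Python sketch assumes items are `(production index, dot position)` pairs over a hard-coded `GRAMMAR`; `show` and `closure_rounds` are hypothetical helpers. Each round adds [B → .γ] for every item whose dot stands before a nonterminal B, and the loop stops when a round adds nothing new.

```python
GRAMMAR = [  # augmented expression grammar; item = (production index, dot position)
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def show(item):
    """Render an item such as (6, 0) as 'F -> . id'."""
    p, dot = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


def closure_rounds(items):
    """Apply the closure rule round by round; return the item set after each round."""
    current = set(items)
    rounds = [sorted(current)]
    while True:
        # For every item A -> alpha . B beta, add B -> . gamma for each B-production.
        new = {(q, 0)
               for p, dot in current if dot < len(GRAMMAR[p][1])
               for q, (lhs, _) in enumerate(GRAMMAR) if lhs == GRAMMAR[p][1][dot]}
        if new <= current:  # nothing new was added: the closure is complete
            return rounds
        current |= new
        rounds.append(sorted(current))


for n, items in enumerate(closure_rounds({(0, 0)})):
    print(n, [show(it) for it in items])
```

The four printed rounds contain 1, 3, 5 and 7 items, matching the four sets in the example.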

  • 21

    State 0

    We start by adding the item [E' → .E] to state 0.

    This item has a "." immediately to the left of a nonterminal. Whenever this is the case, we must perform step 3 (closure) of the set construction algorithm.

    We add the items [E → .E + T] and [E → .T] to state 0, giving

    I0: { E' → .E, E → .E + T, E → .T }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 22

    State 0

    Reapplying closure to [E → .T], we must add the items [T → .T * F] and [T → .F] to state 0, giving

    I0: { E' → .E, E → .E + T, E → .T, T → .T * F, T → .F }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 23

    State 0

    Reapplying closure to [T → .F], we must add the items [F → .( E )] and [F → .id] to state 0, giving

    I0: { E' → .E, E → .E + T, E → .T, T → .T * F, T → .F, F → .( E ), F → .id }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 24

    Formal Definition of GOTO operation for constructing LR(0) Items

    1. For each item [A → α.Xβ] ∈ I, add the set of items closure({[A → αX.β]}) to goto(I, X) if not already there

    2. Repeat step 1 until no more items can be added to goto(I, X)

  • 25

    The Goto Operation (Example 1)

    Grammar: E → E + T | T, T → T * F | F, F → ( E ) | id

    Suppose I = { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .( E )], [F → .id] }

    Then goto(I, E) = closure({[E' → E.], [E → E. + T]}) = { [E' → E.], [E → E. + T] }
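Here is a Python sketch of goto(I, X): move the dot over X in every item where X follows the dot, then take the closure. It assumes the `(production index, dot position)` item representation and a hard-coded augmented `GRAMMAR`; all helper names are illustrative. Applied to I and E, it yields the two items of the example.

```python
GRAMMAR = [  # augmented expression grammar; item = (production index, dot position)
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def closure(items):
    """Add B -> .gamma for every item with the dot in front of nonterminal B."""
    result = set(items)
    while True:
        new = {(q, 0) for p, d in result if d < len(GRAMMAR[p][1])
               for q, (lhs, _) in enumerate(GRAMMAR) if lhs == GRAMMAR[p][1][d]}
        if new <= result:
            return frozenset(result)
        result |= new


def goto(items, x):
    """Move the dot over x wherever x follows the dot, then close the result."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved)


def show(item):
    p, d = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {' '.join(rhs[:d] + ('.',) + rhs[d:])}"


I0 = closure({(0, 0)})                       # the seven items of I
print(sorted(show(it) for it in goto(I0, "E")))
```

By contrast, goto(I, +) is empty because no item in I has `+` right after the dot.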

  • 26

    Creating State 1 From State 0 [goto(I0, E)]

    Final version of state 0:

    I0: { E' → .E, E → .E + T, E → .T, T → .T * F, T → .F, F → .( E ), F → .id }

    Using step 4, we create the new state 1 from the items [E' → .E] and [E → .E + T]

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --E--> I1

  • 27

    State 1

    State 1 starts with the items [E' → E.] and [E → E. + T]. These items are formed from the items [E' → .E] and [E → .E + T] by moving the "." one grammar symbol to the right. In each case, the grammar symbol is E.

    Closure does not add any new items, so state 1 ends up with 2 items:

    I1: { E' → E., E → E. + T }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --E--> I1

  • 28

    Creating State 2 From State 0 [goto(I0, T)]

    Using step 4, we create state 2 from the items [E → .T] and [T → .T * F] by moving the "." past the T.

    State 2 starts with 2 items:

    I2: { E → T., T → T. * F }

    Closure does not add additional items to state 2.

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --T--> I2

  • 29

    Creating State 3 From State 0 [goto(I0, F)]

    Using step 4, we create state 3 from the item [T → .F].

    State 3 starts (and ends up) with one item:

    I3: { T → F. }

    Since the only item in state 3 is a complete item, there will be no transitions out of state 3.

    The figure on the next slide shows the DFA of viable prefixes to this point.

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --F--> I3

  • 30

    DFA After Creation of State 3

  • 31

    Creating State 4 From State 0 [goto(I0, ( )]

    Using step 4, we create state 4 from the item [F → .( E )].

    State 4 begins with one item:

    F → ( .E )

    Applying closure to this item, we add the items

    E → .E + T

    E → .T

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 32

    State 4

    Applying closure to [E → .T], we add the items [T → .T * F] and [T → .F] to state 4, giving

    { F → ( .E ), E → .E + T, E → .T, T → .T * F, T → .F }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 33

    State 4

    Applying step 3 to [T → .F], we add the items [F → .( E )] and [F → .id] to state 4, giving the final set of items

    I4: { F → ( .E ), E → .E + T, E → .T, T → .T * F, T → .F, F → .( E ), F → .id }

    The next slide shows the DFA to this point.

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --(--> I4

  • 34

    DFA After Creation of State 4

  • 35

    Creating State 5 From State 0 [goto(I0, id)]

    Finally, from the item [F → .id] in state 0, we create state 5, with the single item:

    I5: { F → id. }

    Since this item is a complete item, we will not be able to produce new states from state 5.

    The next slide shows the DFA to this point.

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I0 --id--> I5

  • 36

    DFA After Creation of State 5

  • 37

    Creating State 6 From State 1 [goto(I1, +)]

    State 1 consists of 2 items: [E' → E.] and [E → E. + T]

    Create state 6 from the item [E → E. + T], giving the item [E → E + .T].

    Closure results in the set of items

    I6: { E → E + .T, T → .T * F, T → .F, F → .( E ), F → .id }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I1 --+--> I6

  • 38

    DFA After Creation of State 6

  • 39

    Creating State 7 From State 2 [goto(I2, *)]

    State 2 has two items, [E → T.] and [T → T. * F]

    We create state 7 from [T → T. * F], giving the initial item [T → T * .F]

    Using closure, we end up with

    I7: { T → T * .F, F → .( E ), F → .id }

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I2 --*--> I7

  • 40

    DFA After Creation of State 7

  • 41

    Creating State 8 From State 4 [goto(I4, E)]

    We use the items [F → ( .E )] and [E → .E + T] from state 4 to add the following items to state 8:

    I8: { F → ( E. ), E → E. + T }

    No further items can be added to state 8 through closure.

    There are other transitions from state 4, but they do not result in new states.

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    Transition: I4 --E--> I8

  • 42

    Other Transitions From State 4 [goto(I4, T), goto(I4, F), goto(I4, ( ), goto(I4, id)]

    If we use the items [E → .T] and [T → .T * F] from state 4 to start a new state, we begin with the items

    E → T.

    T → T. * F

    This set is identical to state 2.

    Similarly, the items

    [T → .F] will produce state 3

    [F → .( E )] will produce state 4

    [F → .id] will produce state 5

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

  • 43

    DFA After Creation of State 8

  • 44

    Creating State 9 From State 6 [goto(I6, T)]

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    We use the items [E → E + .T] and [T → .T * F] from state 6 to create state 9:

    I9: { E → E + T., T → T. * F }

    All other transitions from state 6 go to existing states. The next slide shows the DFA to this point.

    Transition: I6 --T--> I9

  • 45

    DFA After Creation of State 9

  • 46

    Creating State 10 From State 7 [goto(I7, F)]

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    We use the item [T → T * .F] from state 7 to create state 10:

    I10: { T → T * F. }

    All other transitions from state 7 go to existing states. The next slide shows the DFA to this point.

    Transition: I7 --F--> I10

  • 47

    DFA After Creation of State 10

  • 48

    Creation of State 11 From State 8 [goto(I8, ) )]

    Grammar: E' → E, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id

    We use the item [F → ( E. )] from state 8 to create state 11:

    I11: { F → ( E ). }

    All other transitions from state 8 go to existing states.

    State 9 has one transition to an existing state (7). No other new states can be added, so we are done.

    The next slide shows the final DFA for viable prefixes.

    Transition: I8 --)--> I11

  • 49

    DFA for Viable Prefixes

  • 50

    DFA for Viable Prefixes

  • 51

    Constructing Parse Table

    Construct the DFA (state graph) as in LR(0)

    Action Table

    If there is a transition from state i to state j on a terminal a, ACTION[i, a] = shift j

    If there is a reduce item [A → α.] (for production #k) in state i, then for each a ∈ FOLLOW(A), ACTION[i, a] = reduce k

    If an item [S' → S.] is in state i, ACTION[i, $] = accept

    Otherwise, error

    GOTO

    Write GOTO entries for nonterminals; for terminals, the transitions are already embedded in the action table

  • 52

    Algorithm: Construction of SLR Parsing Table

    1. Construct the canonical collection of sets of LR(0) items for G': C ← {I0, ..., In}

    2. Create the parsing action table as follows:

    a. If a is a terminal, [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then action[i, a] is shift j.

    b. If [A → α.] is in Ii, then action[i, a] is reduce A → α for all a in FOLLOW(A), where A ≠ S'.

    c. If [S' → S.] is in Ii, then action[i, $] is accept.

    If any conflicting actions are generated by these rules, the grammar is not SLR(1).

    3. Create the parsing goto table: for all non-terminals A, if goto(Ii, A) = Ij, then goto[i, A] = j

    All entries not defined by (2) and (3) are errors.

    4. The initial state of the parser is the one containing [S' → .S]
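The whole SLR construction fits in one short program. This Python sketch assumes the augmented expression grammar is hard-coded as `GRAMMAR`, with the production number equal to the list index (0 for E' → E). It also assumes the grammar has no ε-productions, which keeps the FOLLOW computation simple. All names are illustrative. States are discovered in the order of the symbols after the dot, which happens to reproduce the state numbers used on these slides. Instead of overwriting a cell, `put` records any conflict, so a non-empty `conflicts` list means the grammar is not SLR(1).

```python
GRAMMAR = [  # augmented expression grammar; list index = production number
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NT = {lhs for lhs, _ in GRAMMAR}


def closure(items):  # items are (production, dot) pairs
    result = set(items)
    while True:
        new = {(q, 0) for p, d in result if d < len(GRAMMAR[p][1])
               for q, (lhs, _) in enumerate(GRAMMAR) if lhs == GRAMMAR[p][1][d]}
        if new <= result:
            return frozenset(result)
        result |= new


def goto(items, x):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def follow_sets():
    """FOLLOW sets; assumes there are no epsilon-productions (true here)."""
    first = {A: set() for A in NT}
    follow = {A: set() for A in NT}
    follow[GRAMMAR[0][0]].add("$")

    def fst(X):
        return first[X] if X in NT else {X}

    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            pairs = [(first[lhs], fst(rhs[0]))]   # FIRST(rhs[0]) is in FIRST(lhs)
            for i, X in enumerate(rhs):
                if X in NT:   # what follows X: the next symbol, or FOLLOW(lhs) at the end
                    nxt = fst(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    pairs.append((follow[X], nxt))
            for target, extra in pairs:
                if not extra <= target:
                    target |= extra           # in-place update of the stored set
                    changed = True
    return follow


def slr_table():
    follow = follow_sets()
    states, edges = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):            # canonical LR(0) collection (step 1)
        order = []                    # symbols after the dot, in item order
        for p, d in sorted(states[i]):
            rhs = GRAMMAR[p][1]
            if d < len(rhs) and rhs[d] not in order:
                order.append(rhs[d])
        for x in order:
            target = goto(states[i], x)
            if target not in states:
                states.append(target)
            edges[i, x] = states.index(target)
        i += 1

    action, goto_table, conflicts = {}, {}, []

    def put(s, a, entry):             # keep the first entry, record any clash
        old = action.setdefault(s, {}).setdefault(a, entry)
        if old != entry:
            conflicts.append((s, a, old, entry))

    for (s, x), j in edges.items():
        if x in NT:
            goto_table.setdefault(s, {})[x] = j      # step 3
        else:
            put(s, x, f"s{j}")                       # rule 2a
    for s, items in enumerate(states):
        for p, d in items:
            lhs, rhs = GRAMMAR[p]
            if d == len(rhs) and p == 0:
                put(s, "$", "acc")                   # rule 2c
            elif d == len(rhs):
                for a in follow[lhs]:
                    put(s, a, f"r{p}")               # rule 2b
    return action, goto_table, conflicts


action, goto_table, conflicts = slr_table()
print(conflicts)  # [] : no conflicts, so the grammar is SLR(1)
print(action[2])
```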

  • 53

    (SLR) Parsing Tables for Expression Grammar

    1) E → E+T

    2) E → T

    3) T → T*F

    4) T → F

    5) F → (E)

    6) F → id

  • 54

    DFA for Viable Prefixes

  • 55

    We use the partial DFA at right

    to fill in row 0 of the parse table.

    By rule 2a,

    action[ 0, ( ] = shift 4

    action[ 0, id ] = shift 5

    By rule 3,

    goto[ 0, E ] = 1

    goto[ 0, T ] = 2

    goto[ 0, F ] = 3

  • 56

    Action Table                                |  Goto Table
    state   id    +     *     (     )     $     |  E    T    F
    0       s5                s4                |  1    2    3
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11

    1) E → E+T

    2) E → T

    3) T → T*F

    4) T → F

    5) F → (E)

    6) F → id

  • 57

    We use the partial DFA at right

    to fill in row 1 of the parse table.

    By rule 2a,

    action [ 1, + ] = shift 6

    By rule 2c,

    action [ 1, $ ] = accept

  • 58

    Action Table                                |  Goto Table
    state   id    +     *     (     )     $     |  E    T    F
    0       s5                s4                |  1    2    3
    1             s6                      acc   |
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11

    1) E → E+T

    2) E → T

    3) T → T*F

    4) T → F

    5) F → (E)

    6) F → id

  • 59

    We use the partial DFA at right

    to fill in row 5 of the parse table.

    By rule 2b, we set

    action[ 5, x ] = reduce F → id

    for each x ∈ Follow(F).

    Since Follow(F) = { ), +, *, $ }, we have

    action[ 5, ) ] = reduce F → id

    action[ 5, + ] = reduce F → id

    action[ 5, * ] = reduce F → id

    action[ 5, $ ] = reduce F → id
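FOLLOW sets such as Follow(F) can be computed by iterating to a fixed point. This Python sketch also handles ε-productions: each is written as an empty right-hand side, and `""` stands for ε inside FIRST sets. `first_follow` and the grammar encoding are assumptions for illustration.

```python
def first_follow(grammar, start):
    """Compute FIRST and FOLLOW by fixed-point iteration.

    grammar: list of (lhs, rhs) pairs, rhs a tuple of symbols; an empty tuple is an
    epsilon-production. Inside FIRST sets, "" stands for epsilon.
    """
    nts = {lhs for lhs, _ in grammar}
    first = {A: set() for A in nts}
    follow = {A: set() for A in nts}
    follow[start].add("$")

    def first_of(seq):
        out = set()
        for X in seq:
            fx = first[X] if X in nts else {X}
            out |= fx - {""}
            if "" not in fx:
                return out
        out.add("")          # every symbol in seq can derive epsilon
        return out

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f = first_of(rhs)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, X in enumerate(rhs):
                if X in nts:
                    tail = first_of(rhs[i + 1:])
                    add = tail - {""}
                    if "" in tail:   # X can end the rhs: FOLLOW(lhs) flows into FOLLOW(X)
                        add |= follow[lhs]
                    if not add <= follow[X]:
                        follow[X] |= add
                        changed = True
    return first, follow


EXPR = [("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("T", "*", "F")),
        ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",))]
first, follow = first_follow(EXPR, "E")
print(sorted(follow["F"]))   # ['$', ')', '*', '+']
```

The same function gives FOLLOW(A) = FOLLOW(B) = {a, b} for the grammar S → AaAb | BbBa, A → ε, B → ε.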

  • 60

    Action Table                                |  Goto Table
    state   id    +     *     (     )     $     |  E    T    F
    0       s5                s4                |  1    2    3
    1             s6                      acc   |
    2
    3
    4
    5             r6    r6          r6    r6    |
    6
    7
    8
    9
    10
    11

    1) E → E+T

    2) E → T

    3) T → T*F

    4) T → F

    5) F → (E)

    6) F → id

  • 61

    Use the DFA to Finish the SLR Table

    The complete SLR parse table for the expression grammar is given on the next slide.

  • 62

    (SLR) Parsing Tables for Expression Grammar

    Action Table                                |  Goto Table
    state   id    +     *     (     )     $     |  E    T    F
    0       s5                s4                |  1    2    3
    1             s6                      acc   |
    2             r2    s7          r2    r2    |
    3             r4    r4          r4    r4    |
    4       s5                s4                |  8    2    3
    5             r6    r6          r6    r6    |
    6       s5                s4                |       9    3
    7       s5                s4                |            10
    8             s6                s11         |
    9             r1    s7          r1    r1    |
    10            r3    r3          r3    r3    |
    11            r5    r5          r5    r5    |

    1) E → E+T

    2) E → T

    3) T → T*F

    4) T → F

    5) F → (E)

    6) F → id

  • 63

    Actions of an (S)LR-Parser -- Example

    stack       input       action               output
    0           id*id+id$   shift 5
    0id5        *id+id$     reduce by F → id     F → id
    0F3         *id+id$     reduce by T → F      T → F
    0T2         *id+id$     shift 7
    0T2*7       id+id$      shift 5
    0T2*7id5    +id$        reduce by F → id     F → id
    0T2*7F10    +id$        reduce by T → T*F    T → T*F
    0T2         +id$        reduce by E → T      E → T
    0E1         +id$        shift 6
    0E1+6       id$         shift 5
    0E1+6id5    $           reduce by F → id     F → id
    0E1+6F3     $           reduce by T → F      T → F
    0E1+6T9     $           reduce by E → E+T    E → E+T
    0E1         $           accept

  • 64

    SLR PARSING

    The central idea behind the SLR method is first to construct from the grammar a DFA to recognize viable prefixes. We group items into sets, which become the states of the SLR parser.

    Viable prefixes:

    The prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.

    Example: a, aa, aab, and aabb are viable prefixes of aabbbbd.

  • 65

    Example SLR Grammar and LR(0) Items

    Augmented grammar: 1. C' → C   2. C → A B   3. A → a   4. B → a

    State I0: C' → .C, C → .A B, A → .a

    State I1: C' → C.

    State I2: C → A.B, B → .a

    State I3: A → a.

    State I4: C → A B.

    State I5: B → a.

    Transitions: goto(I0, C) = I1, goto(I0, A) = I2, goto(I0, a) = I3, goto(I2, B) = I4, goto(I2, a) = I5

    I0 = closure({[C' → .C]}) is the start state; I1 = goto(I0, C) = closure({[C' → C.]}) is the final state.

  • 66

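    Example SLR Parsing Table

    state   a     $     |  C    A    B
    0       s3          |  1    2
    1             acc   |
    2       s5          |            4
    3       r3          |
    4             r2    |
    5             r4    |

    State I0: C' → .C, C → .A B, A → .a

    State I1: C' → C.

    State I2: C → A.B, B → .a

    State I3: A → a.

    State I4: C → A B.

    State I5: B → a.

    (Figure: DFA with start state 0; 0 --C--> 1, 0 --A--> 2, 0 --a--> 3, 2 --B--> 4, 2 --a--> 5)

    Grammar: 1. C' → C   2. C → A B   3. A → a   4. B → a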
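The small table above can be exercised with a compact driver. This Python sketch assumes the table is hard-coded as dictionaries. As a simplification, it keeps only states on the stack, because each state implies its grammar symbol, so a reduction pops |β| entries instead of 2|β|. `parse`, `PRODS`, `ACTION` and `GOTO` are illustrative names.

```python
# Slide 66 table. Productions: 1. C' -> C  2. C -> A B  3. A -> a  4. B -> a
PRODS = {2: ("C", 2), 3: ("A", 1), 4: ("B", 1)}     # production -> (lhs, |rhs|)
ACTION = {0: {"a": "s3"}, 1: {"$": "acc"}, 2: {"a": "s5"},
          3: {"a": "r3"}, 4: {"$": "r2"}, 5: {"$": "r4"}}
GOTO = {0: {"C": 1, "A": 2}, 2: {"B": 4}}


def parse(tokens):
    """Return a trace of (state stack, remaining input, action) triples."""
    states, rest, trace = [0], list(tokens) + ["$"], []
    while True:
        act = ACTION[states[-1]].get(rest[0], "error")   # blank entry = error
        trace.append((list(states), "".join(rest), act))
        if act in ("acc", "error"):
            return trace
        if act[0] == "s":                    # shift: consume the token, push the state
            states.append(int(act[1:]))
            rest.pop(0)
        else:                                # reduce: pop |rhs| states, then goto
            lhs, n = PRODS[int(act[1:])]
            del states[len(states) - n:]
            states.append(GOTO[states[-1]][lhs])


for row in parse("aa"):
    print(row)
```

For the input `aa`, the actions are s3, r3, s5, r4, r2, acc. For the input `a`, the parser reaches state 3 with `$` as lookahead. Since FOLLOW(A) = {a}, that cell is blank, so the last recorded action is "error".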

  • 67

    shift/reduce and reduce/reduce conflicts

    If a state does not know whether it should perform a shift or a reduction for a terminal, we say that there is a shift/reduce conflict.

    If a state does not know whether it should reduce using production rule i or j for a terminal, we say that there is a reduce/reduce conflict.

    If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.

  • 68

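    Conflict Example

    Grammar: S → L=R | R, L → *R | id, R → L

    I0: S' → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L

    I1: S' → S.

    I2: S → L.=R, R → L.

    I3: S → R.

    I4: L → *.R, R → .L, L → .*R, L → .id

    I5: L → id.

    I6: S → L=.R, R → .L, L → .*R, L → .id

    I7: L → *R.

    I8: R → L.

    I9: S → L=R.

    Problem: FOLLOW(R) = {=, $}

    In I2 on "=": shift 6 (from S → L.=R) and reduce by R → L (since = ∈ FOLLOW(R))

    shift/reduce conflict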

  • 69

    Conflict Example 2

    Grammar: S → AaAb | BbBa, A → ε, B → ε

    I0: S' → .S, S → .AaAb, S → .BbBa, A → ., B → .

    Problem: FOLLOW(A) = {a, b} and FOLLOW(B) = {a, b}

    On a: reduce by A → ε and reduce by B → ε, a reduce/reduce conflict

    On b: reduce by A → ε and reduce by B → ε, a reduce/reduce conflict
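Both kinds of conflict appear when a table-building routine tries to write a second, different action into a cell that is already filled. The Python sketch below classifies such collisions and replays the two examples above, using the FOLLOW sets stated on the slides. `add_action` and the tuple encoding of entries are assumptions for illustration.

```python
def add_action(table, state, terminal, entry, conflicts):
    """Fill ACTION[state][terminal]; record a conflict if the cell already differs.

    Entries are ("shift", j) or ("reduce", production) tuples; the first entry is kept.
    """
    cell = table.setdefault(state, {})
    old = cell.setdefault(terminal, entry)
    if old != entry:
        kind = "shift/reduce" if "shift" in (old[0], entry[0]) else "reduce/reduce"
        conflicts.append((state, terminal, kind))


# Conflict Example, state I2: S -> L.=R shifts on '=', R -> L. reduces on FOLLOW(R) = {=, $}
table, conflicts = {}, []
add_action(table, 2, "=", ("shift", 6), conflicts)
for a in ["=", "$"]:
    add_action(table, 2, a, ("reduce", "R -> L"), conflicts)
print(conflicts)   # [(2, '=', 'shift/reduce')]

# Conflict Example 2, state I0: A -> . and B -> . both reduce on {a, b}
table, conflicts = {}, []
for prod in ["A -> eps", "B -> eps"]:
    for a in ["a", "b"]:
        add_action(table, 0, a, ("reduce", prod), conflicts)
print(conflicts)   # [(0, 'a', 'reduce/reduce'), (0, 'b', 'reduce/reduce')]
```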