cfg (3)

Upload: akshat-sapra

Post on 06-Apr-2018


  • 8/2/2019 CFG (3)

    1/58

    Context Free Grammars

    Grammar: A grammar is a recursive definition of a language (natural or programming).

    Formally: Grammar G = (V, T, P, S)

    Terminals T: The alphabet of the language, basically T = Σ

    Variables V: Non-terminal symbols that represent sets of strings being defined recursively

    Start symbol S: S belongs to V and is a special symbol that generates the desired language

    Production rules P: Recursive definitions

    Note: T, V and P are always finite sets.


    Context Free Grammars

    L_eq example: The grammar G_eq = (V, T, P, S)

    T = {0, 1}

    V = {S, A, B}

    P = { S → ε | 0A | 1B

    A → 1S | 0AA

    B → 0S | 1BB }

    Notation: For a set of rules

    A → α1, A → α2, A → α3

    Shortcut:

    A → α1 | α2 | α3


    Context Free Grammars

    Context Free Grammars (CFG): Are only allowed to have production rules for substitution of the form:

    A → α1 α2 α3 … αk

    Where: The LHS A belongs to V, and on the RHS each αi belongs to V ∪ T for all i

    Non Context Free Grammars: Rules might specify the context in which a substitution can be performed.

    e.g. 0A1 → 0 α1 α2 α3 … αk 1

    Thus the rule cannot be applied in other contexts.


    Context Free Grammars

    Language of CFG G = (V, T, P, S) is defined as

    L(G) = { w ∈ T* | S =>* w }

    A Context Free Language is any language which has a Context Free Grammar G

    Terminology:

    Sentence: Any w ∈ T* such that S =>* w

    Sentential form: Any α ∈ (V ∪ T)* such that S =>* α

    Terminals: a, b, c, …

    Variables: A, B, C, …

    Terminal strings: ε, u, v, w, x, y, z

    V ∪ T: …, X, Y, T

    Sentential forms: α, β, γ, …


    Context Free Grammars

    Reading assignment: From the textbook (2nd edition), Theorem 5.7 (which talks about the language of palindromes)

    Example: Arithmetic Expressions

    G = (V, T, P, S)

    V = {E, I}

    S = E

    T = {x, y, z, +, *, (, )}

    P = { E → I | (E) | E+E | E*E

    I → x | y | z }

    Apply: E => E+E => E+E*E => I+E*E => x+E*E => x+I*E

    => x+y*E => x+y*I => x+y*z


    Context Free Grammars

    Left-most derivation: Always substitute the leftmost variable of each sentential form arising in the course of a derivation.

    Notation: =(LM)=>

    Example: E => E+E => I+E => x+E.

    Similarly: We can define right-most derivations

    Example: E => E+E => E+E*E => E+E*I.
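The leftmost rule can be sketched in a few lines of Python, using the arithmetic-expression grammar from the earlier example. The function name `leftmost_derivation` and the dictionary encoding of P are illustrative assumptions, not part of the slides.

```python
# Productions of the arithmetic-expression grammar: variable -> list of bodies.
PRODUCTIONS = {
    "E": ["I", "(E)", "E+E", "E*E"],
    "I": ["x", "y", "z"],
}

def leftmost_derivation(start, bodies, productions=PRODUCTIONS):
    """Apply each body in turn to the leftmost variable; return all sentential forms."""
    form = start
    steps = [form]
    for body in bodies:
        # Position of the leftmost variable, or None if only terminals remain.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            raise ValueError(f"no variable left to expand in {form!r}")
        if body not in productions[form[idx]]:
            raise ValueError(f"{form[idx]} -> {body} is not a production")
        form = form[:idx] + body + form[idx + 1:]
        steps.append(form)
    return steps

# Reproduces the leftmost example: E => E+E => I+E => x+E
print(" => ".join(leftmost_derivation("E", ["E+E", "I", "x"])))
```

Because the variable to expand is always forced, a leftmost derivation is fully described by the list of production bodies alone.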

    Now we can talk about canonical derivation sequence and


    Context Free Grammars

    A-Tree: Any tree (or subtree) rooted at variable A.

    We simply call it a parse tree if it is rooted at S.

    Yield (or frontier) of a tree is the sequence of leaf labels read from left to right.

    Theorem: α is the yield of an A-tree if and only if A =>* α

    Proof: By induction on the height of the tree (see textbook)

    Observe: The preceding parse tree does not specify a unique way to derive α from A

    In fact it removes the non-determinism of which rules to apply but leaves the order of application unspecified.


    Context Free Grammars

    Leftmost derivation (lm) is obtained by traversing the tree in depth-first order, always going into left subtrees before right ones

    Similarly, rightmost derivation (rm) comes from a depth-first traversal going into right subtrees first.


    Context Free Grammars

    Claim: The following statements are all equivalent.

    For CFG G = (V, T, P, S) and string w ∈ T*

    a) w ∈ L(G)

    b) S =(LM)=>* w

    c) S =(RM)=>* w

    d) There exists an S-tree which yields w

    Of course: We could always use leftmost derivations to specify a canonical way to derive any w belonging to L(G), or convert a parse tree to a unique derivation, thus simplifying the task of parsers.

    But what if some w belonging to L(G) has two distinct parse trees and hence two distinct leftmost derivations?


    Context Free Grammars

    Example: x+y*z

    Note: This is not just a syntactic problem, as we get two different semantic interpretations

    Tree1: x+(y*z)

    Tree2: (x+y)*z
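To make the two trees concrete, here is a small Python sketch that counts the distinct parse trees of a string under the grammar E → I | (E) | E+E | E*E, I → x | y | z. The function name `count_parse_trees` and the span-based recursion are illustrative choices, not something taken from the slides.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees rooted at E whose yield is the string s."""
    @lru_cache(maxsize=None)
    def count_E(i, j):
        # Number of E-trees yielding s[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and s[i] in "xyz":        # E -> I -> x | y | z
            total += 1
        if s[i] == "(" and s[j - 1] == ")":     # E -> (E)
            total += count_E(i + 1, j - 1)
        for k in range(i + 1, j - 1):           # E -> E+E or E -> E*E, operator at k
            if s[k] in "+*":
                total += count_E(i, k) * count_E(k + 1, j)
        return total
    return count_E(0, len(s))

print(count_parse_trees("x+y*z"))    # two trees: x+(y*z) and (x+y)*z
print(count_parse_trees("(x+y)*z"))  # parentheses leave a single tree
```

Any count above 1 for some string is a witness that the grammar is ambiguous.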


    Context Free Grammars

    Definition: A CFG G is ambiguous if for some w ∈ L(G) there exists more than one distinct parse tree.

    In compilers, parse trees determine interpretation, so we cannot allow ambiguity.

    Of course we can force the use of parentheses, but we should really redesign the grammar to be unambiguous by encoding the precedence of operators. (See the textbook for redesigning grammars.)

    While the above grammar can be redesigned to be unambiguous, it is not always possible to do that.


    Context Free Grammars

    Definition: A CFL is called inherently ambiguous if all its grammars are ambiguous.

    Example: L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}

    Consider strings of the form a^k b^k c^k d^k. We can never tell whether such a string came from the first or the second type of strings in L, and any CFG for L must allow both of these possibilities.


    Push Down Automata(PDA)

    PDA is a class of machines corresponding to CFGs (accepting only the Context Free Languages), useful in designing parsers based on CFGs

    As we have discussed before, we must give a PDA unbounded memory to allow it to handle non-regular languages

    However, we will restrict its access to memory!


    Push Down Automata(PDA)

    Setup: A PDA, on each transition,

    1. Consumes an input symbol

    2. Goes to a new state (or stays in the old one)

    3. Replaces the top of the stack by any string (does nothing, pops the stack, or pushes a string onto the stack)

    A Push Down Automaton (PDA) is essentially an ε-NFA with a stack


    Push Down Automata(PDA)

    Stack Notation

    Content: (top)ABBAC(bottom)

    Pop: returns A; new content BBAC

    Push(XYZ): new content XYZBBAC

    Transitions: Determined by:

    Input symbol or ε-move

    Current state

    Stack top

    Effect:

    New state

    Pop

    Push new string


    Push Down Automata(PDA)

    Formally: A Push Down Automaton is a seven-tuple represented as

    M = (Q, Σ, Γ, δ, q0, Z0, F)

    Where: Q is a finite set of states

    Σ is a finite input alphabet

    Γ is a finite stack alphabet

    δ: Q × (Σ ∪ {ε}) × Γ → 2^(Q × Γ*) is the transition function

    q0 ∈ Q is the start state

    Z0 ∈ Γ is the start symbol for the stack, and

    F ⊆ Q is the set of accepting states


    Push Down Automata(PDA)

    Transition function: δ takes as argument a triple, with

    δ: Q × (Σ ∪ {ε}) × Γ → 2^(Q × Γ*)

    Suppose we have δ(q, a, X); then

    1. q is a state in Q

    2. a is either an input symbol in Σ or a = ε, the empty string, which is not assumed to be an input symbol.

    3. X is a stack symbol in Γ

    The output of δ is a finite set of pairs (pi, γi), where pi is the new state and γi is the string of stack symbols that replaces X.

    δ(q, a, X) = {(p1, γ1), (p2, γ2), …}


    Push Down Automata(PDA)

    Action: First the PDA pops the stack top to determine X and reads input to determine a (unless it is an ε-transition); then, knowing q, a, X, it selects non-deterministically one of the possibilities (pi, γi)

    Finally:

    State: goes from q to pi

    Input: scans past a (unless a = ε)

    Stack: loses the old top symbol X but gets γi pushed onto it.

    Note: We thus need Z0 on the stack initially to allow the first transition to pop the stack

    Convention: Strings in Σ*: x, y, z

    Strings in Γ*: α, β, γ


    Push Down Automata(PDA)

    Example: L = {0^n 1^n | n ≥ 1}

    PDA M: Σ = {0, 1}, Γ = {X, Z0}, Q = {q0, q1, q2}, F = {q2}

    Transitions:

    δ(q0, 0, Z0) = {(q0, XZ0)} [On input 0 add X to stack]

    δ(q0, 0, X) = {(q0, XX)} [On input 0 add X to stack]

    δ(q0, 1, X) = {(q1, ε)} [On input 1 switch to q1 and consume X]

    δ(q1, 1, X) = {(q1, ε)} [On input 1 keep consuming Xs]

    δ(q1, ε, Z0) = {(q2, ε)} [When Z0 is found, consume it and move to final state q2]


    Push Down Automata(PDA)

    Transition Diagram :

    Remarks:

    1. Will reject inputs not of the form 0*1* by not having any transition defined.

    2. If there are too few 1s, it will never go to q2

    3. If there are too many 1s, it will get stuck in q2 without reaching the end of input.
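The behaviour described in these remarks can be checked with a small simulator. The following Python sketch encodes the five transitions of the 0^n 1^n example and explores all reachable configurations breadth-first, accepting by final state. The encoding (tuples for stacks with the top first, `""` for ε, the name `accepts`) is an assumption made for illustration; the search is only guaranteed to terminate for PDAs, like this one, that cannot grow the stack forever on ε-moves.

```python
from collections import deque

EPS = ""  # stands for an epsilon move

# delta[(state, input symbol or EPS, stack top)] = set of (new state, pushed string)
DELTA = {
    ("q0", "0", "Z0"): {("q0", ("X", "Z0"))},
    ("q0", "0", "X"): {("q0", ("X", "X"))},
    ("q0", "1", "X"): {("q1", ())},
    ("q1", "1", "X"): {("q1", ())},
    ("q1", EPS, "Z0"): {("q2", ())},
}

def accepts(w, delta=DELTA, start="q0", z0="Z0", finals=frozenset({"q2"})):
    """Return True if some execution trace ends in a final state with w fully read."""
    first = (start, 0, (z0,))          # (state, input position, stack with top first)
    seen = {first}
    queue = deque([first])
    while queue:
        q, i, stack = queue.popleft()
        if i == len(w) and q in finals:
            return True
        if not stack:
            continue                   # empty stack: no transition is possible
        top, rest = stack[0], stack[1:]
        moves = [(EPS, i)]
        if i < len(w):
            moves.append((w[i], i + 1))
        for a, nxt in moves:
            for p, pushed in delta.get((q, a, top), ()):
                config = (p, nxt, tuple(pushed) + rest)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False

for word in ["01", "0011", "001", "011", "10", ""]:
    print(repr(word), accepts(word))
```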


    Push Down Automata(PDA)

    Instantaneous description (ID): Succinct notation for describing the entire configuration of a PDA mid-stream in an execution

    ID = (q, x, α)

    Where, q: current state

    x: unread input

    α: stack content

    Acceptance by a PDA: A PDA accepts input w if there is at least one execution trace which leads to a final state when the end of input is reached

    Rejection by a PDA:

    When no transition is possible (stuck)

    If the input is not over but the stack is empty

    If the input is over but the PDA is in a non-final state

    Of course this must happen on every trace to reject w.


    Push Down Automata(PDA)

    Rejection in such cases: Along every execution trace one of the following happens

    Before w is over, the stack gets empty

    When w is over, the stack is not empty

    Before w is over, the PDA gets stuck

    Example: L = {w w^R | w ∈ {a, b}*}

    PDA M: Q = {q0, q1}, F = ∅

    Σ = {a, b}, Γ = {A, B, Z0}

    Goal: Accept by empty stack


    Push Down Automata(PDA)

    Idea:

    1. q0 pushes the symbols of w onto the stack one by one

    2. Guess the mid point (the end of w) and move to q1

    3. In q1, match input with the stack top, one by one

    4. At the end, Z0 should be at the top, so remove it to halt and accept

    Key: In step 3, the stack pops w in reverse order.


    Push Down Automata(PDA)

    Example(Contd.) Input = aabbaa

    Execution trace

    Accept by empty stack

    Remark: Since the PDA is non-deterministic, other execution traces are possible


    Push Down Automata(PDA)

    Equivalence of language acceptance:

    Theorem: L = L(Pf) for some PDA Pf if and only if L = N(Pn) for some PDA Pn

    Proof: Given Pf = (Q, Σ, Γ, δ, q0, Z0, F), construct Pn such that

    N(Pn) = L(Pf)

    Pn = (Q2, Σ, Γ2, δ2, p0, X0, ∅)

    with: Q2 = Q ∪ {p0, p}

    Γ2 = Γ ∪ {X0}

    N in N(M) stands for null stack or empty stack


    Push Down Automata(PDA)

    δ2: Idea

    Start in p0 with X0 on the stack

    Move to q0 with Z0X0 on the stack

    Simulate Pf

    From any final state of Pf, add a transition to p, which will just empty the stack

    X0: It prevents accidental acceptance by Pn when Pf empties its stack and rejects


    Push Down Automata(PDA)

    The reverse: Given Pn = (Q, Σ, Γ, δ, q0, Z0, ∅)

    construct Pf such that L(Pf) = N(Pn)

    Pf = (Q1, Σ, Γ1, δ1, p0, X0, F1)

    where Q1 = Q ∪ {p0, pf}

    Γ1 = Γ ∪ {X0}

    F1 = {pf}

    Idea:

    Start in p0 with X0 on the stack

    Move to q0 with Z0X0 on the stack

    Simulate Pn

    When Pn empties its stack, it exposes X0

    From all states of Pn, add an ε-move to pf whenever X0 is on top of the stack


    Push Down Automata(PDA)

    Equivalence of CFGs and PDA

    Claim: Every CFL is accepted by some PDA, and every PDA accepts some CFL

    Theorem 1: If L is a CFL, then L = N(M) for some PDA M

    Proof: Suppose G is a CFG for L

    Our goal: Construct a PDA M for G such that L(G) = N(M)

    Idea: PDA M simulates LM derivations in G for input w such that at any step the sentential form is represented by

    a) The sequence of symbols consumed from input w by M

    b) Followed by the contents of M's stack


    Push Down Automata(PDA)

    Formally, given CFG G = (V, T, P, S)

    Construct PDA M = (Q, Σ, Γ, δ, q0, Z0, ∅)

    with Q = {q}, q0 = q, Σ = T, Γ = V ∪ T, Z0 = S

    Defining δ: Two types

    1. If terminal a is on the stack top, then expect to see an a in the input and consume both; note that there is no change in the sentential form

    2. If variable A is on the stack top, then replace it by the RHS of any of its production rules in P; note that there is no change in the input consumed.

    Thus:

    δ(q, a, a) = {(q, ε)} for all a ∈ T

    δ(q, ε, A) = {(q, α1), (q, α2), …, (q, αk)} for all A ∈ V

    where A → α1 | α2 | … | αk are in P


    Push Down Automata(PDA)

    Example: Consider G

    S → AS | ε

    A → 0A1 | A1 | 01

    PDA: M = ({q}, {0, 1}, {0, 1, A, S}, δ, q, S, ∅)

    δ: δ(q, ε, S) = {(q, AS), (q, ε)}

    δ(q, ε, A) = {(q, 0A1), (q, A1), (q, 01)}

    δ(q, 0, 0) = {(q, ε)}

    δ(q, 1, 1) = {(q, ε)}
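The construction is mechanical enough to write down directly. The sketch below builds the transition table of the single-state PDA from a grammar and reproduces the table of this example; the function name `cfg_to_pda` and the dictionary layout (bodies as strings, `""` for ε, pushed strings as tuples) are assumptions chosen for illustration.

```python
def cfg_to_pda(variables, terminals, productions):
    """Build delta for the one-state PDA that accepts L(G) by empty stack.

    productions maps each variable to a list of bodies; each body is a string
    of single-character symbols, and "" stands for an epsilon body.
    """
    q = "q"
    delta = {}
    # Type 2 moves: a variable on top is replaced by one of its bodies (epsilon move).
    for A in variables:
        delta[(q, "", A)] = {(q, tuple(body)) for body in productions.get(A, [])}
    # Type 1 moves: a terminal on top is matched against the same input symbol.
    for a in terminals:
        delta[(q, a, a)] = {(q, ())}
    return delta

G = {"S": ["AS", ""], "A": ["0A1", "A1", "01"]}
delta = cfg_to_pda({"S", "A"}, {"0", "1"}, G)
for key in sorted(delta):
    print(key, sorted(delta[key]))
```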


    Push Down Automata(PDA)

    Execution: Consider w = 011

    In G: S => AS => A1S => 011S => 011

    In M: (q, 011, S) ⊢ (q, 011, AS) ⊢ (q, 011, A1S)

    ⊢ (q, 011, 011S) ⊢ (q, 11, 11S)

    ⊢ (q, 1, 1S) ⊢ (q, ε, S)

    ⊢ (q, ε, ε)

    Observe: There is a one-to-one correspondence between the LM derivation and the execution trace.

    Of course there are many execution traces possible, each corresponding to a distinct derivation.

    Besides that, observe that if two distinct executions accept w, there exist two distinct LM derivations and thus the grammar is ambiguous.


    Push Down Automata(PDA)

    In the theorem, our construction heavily relied on the power of non-determinism to allow the machine to guess the correct derivation

    But in real life (or in parsers/YACC), we don't have non-deterministic power

    So we need to convert PDAs to some form of deterministic PDA

    Definition: A DPDA is a PDA with 2 restrictions:

    a) δ(q, a, Z) has at most 1 possibility, for every a ∈ Σ ∪ {ε}

    b) If δ(q, ε, Z) is defined, then for all a ∈ Σ, δ(q, a, Z) is empty
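The two restrictions can be checked directly on a transition table. A minimal sketch, assuming δ is stored as a Python dictionary keyed by (state, input symbol or `""` for ε, stack top) with a set of moves as value; the name `is_deterministic` is hypothetical.

```python
def is_deterministic(delta, sigma):
    """Check the two DPDA restrictions on a transition table."""
    for (q, a, Z), moves in delta.items():
        # Restriction a): at most one possibility per (q, a, Z).
        if len(moves) > 1:
            return False
        # Restriction b): an epsilon move on (q, Z) excludes every input move on (q, Z).
        if a == "" and moves and any(delta.get((q, b, Z)) for b in sigma):
            return False
    return True

# The PDA for 0^n 1^n given earlier satisfies both restrictions.
zero_one = {
    ("q0", "0", "Z0"): {("q0", ("X", "Z0"))},
    ("q0", "0", "X"): {("q0", ("X", "X"))},
    ("q0", "1", "X"): {("q1", ())},
    ("q1", "1", "X"): {("q1", ())},
    ("q1", "", "Z0"): {("q2", ())},
}
# A grammar-derived PDA typically does not: S has two bodies to choose from.
from_grammar = {("q", "", "S"): {("q", ("A", "S")), ("q", ())}}

print(is_deterministic(zero_one, {"0", "1"}))
print(is_deterministic(from_grammar, {"0", "1"}))
```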


    Context Free Grammars

    Idea: Identify useless symbols by removing:

    Step 1: Non-generating Xs

    Step 2: Non-reachable Xs,

    and all their productions

    Observe: We must do it in this order.

    Example: S → AB | a

    A → b

    Suppose we do Step 2 first: all symbols are reachable, so when we do Step 1 next we eliminate only B as being non-generating, and the useless A survives.

    But if we do it in the right order, we first eliminate B in Step 1 and also eliminate the production S → AB

    Now in Step 2 we find that A is non-reachable, so we eliminate A as well.

    In general we perform both steps recursively.


    Context Free Grammars

    Step 1: Eliminating non-generating symbols

    Basis: Label all terminals in T as generating

    Induction: For all productions X → X1X2…Xk, if each Xi is generating then X is generating.

    Terminate when no new generating symbol can be found

    Step 2: Eliminating non-reachable symbols

    Basis: S is reachable

    Induction: For all productions X → X1X2…Xk, if X is reachable, then label each of X1, X2, …, Xk as reachable.


    Context Free Grammars

    Example: S → AB | AC | CD

    A → BB

    B → AC | ab

    C → Ca | CC

    D → BC | b | d

    Step 1: Base: {a, b, d} is generating

    {a, b, d, A, B, D} is generating

    {a, b, d, A, B, D, S} is generating

    As C is not found to be generating, remove C and all the productions that contain C on either the LHS or the RHS.


    Context Free Grammars

    New grammar:

    G2: S → AB

    A → BB

    B → ab

    D → b | d

    Step 2: Reachable?

    Base: {S} is reachable

    {S, A, B} is reachable

    {S, A, B, a, b} is reachable

    Remove D and all productions that contain D on either the LHS or the RHS


    Context Free Grammars

    Finally: G3: S → AB

    A → BB

    B → ab
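Both steps are simple fixed-point computations. The following Python sketch runs them in the required order on the example grammar and arrives at G3; the function name `remove_useless` and the encoding of productions as (head, body) string pairs are illustrative assumptions.

```python
def remove_useless(terminals, productions, start):
    """Step 1 (non-generating) followed by Step 2 (non-reachable)."""
    # Step 1: grow the set of generating symbols until nothing changes.
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    kept = [(h, b) for h, b in productions
            if h in generating and all(s in generating for s in b)]

    # Step 2: grow the set of reachable symbols starting from the start symbol.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for head, body in kept:
            if head in reachable:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    return [(h, b) for h, b in kept if h in reachable]

G = [("S", "AB"), ("S", "AC"), ("S", "CD"), ("A", "BB"), ("B", "AC"), ("B", "ab"),
     ("C", "Ca"), ("C", "CC"), ("D", "BC"), ("D", "b"), ("D", "d")]
print(remove_useless({"a", "b", "d"}, G, "S"))
```

On the smaller grammar S → AB | a, A → b, the same function returns only S → a, matching the order argument made earlier.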

    Removing ε-productions: ε-productions slow down the parser

    Definition: X belonging to V is nullable if X =>* ε

    Idea: Find nullable symbols recursively

    Basis: If P contains A → ε, then label A as nullable

    Induction: For all productions X → X1X2X3…Xk, if every Xi is nullable, label X as nullable

    Terminate when no new nullable symbol can be found


    Context Free Grammars

    Overall algorithm:

    a) Identify all nullable symbols

    b) Replace any production X → X1X2X3…Xk by the set of productions of the form X → α1α2α3…αk, where

    i) αi = Xi if Xi is non-nullable

    ii) αi = Xi or ε if Xi is nullable

    c) Remove all ε-productions

    So in the previous example, the new G2 becomes

    S → ABC | AB | AC | A | BCB | BC | CB | BB | B | C | ε

    A → aB | a

    B → CC | C | b | ε

    C → S | ε

    Now remove all ε-productions.
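The algorithm can be sketched in Python as follows. The starting grammar used here (S → ABC | BCB, A → aB, B → b | CC, C → S | ε) is an assumption: the slide that introduced the example is not part of this transcript, and this grammar was chosen only because it expands to the G2 shown above. The function names are likewise illustrative.

```python
from itertools import product

def nullable_symbols(productions):
    """Fixed point: X is nullable if some body of X consists only of nullable symbols."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable

def remove_epsilon_productions(productions):
    """Expand every body over its nullable symbols, then drop the empty bodies."""
    nullable = nullable_symbols(productions)
    result = set()
    for head, body in productions:
        # A nullable symbol may be kept or dropped; any other symbol is always kept.
        options = [(s, "") if s in nullable else (s,) for s in body]
        for choice in product(*options):
            new_body = "".join(choice)
            if new_body:
                result.add((head, new_body))
    return result

# Assumed starting grammar; "" stands for an epsilon body.
G = [("S", "ABC"), ("S", "BCB"), ("A", "aB"), ("B", "b"), ("B", "CC"),
     ("C", "S"), ("C", "")]
print(sorted(nullable_symbols(G)))
print(sorted(remove_epsilon_productions(G)))
```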


    Context Free Grammars

    Glitch: Originally S =>* ε was possible, but after the final step we lose ε from L(G). This is unavoidable.

    Removing unit productions (e.g. A → B)

    Algorithm: Step 1: Remove ε-productions

    Step 2: For all X, Y belonging to V,

    if X =>* Y and Y → α is not a unit production,

    then add X → α

    Step 3: Eliminate all unit productions

    Finding X =>* Y

    Since there are no ε-productions, X =>* Y only if

    X => Y1 => Y2 => Y3 => … => Yk = Y

    with all Yi being distinct. Thus k ≤ |V|. We can use reachability in directed graphs.


    Context Free Grammars

    Example: G: S → A | B

    A → Sa | a

    B → S | b

    Algorithm: S =>* A, S =>* B

    B =>* S, B =>* A

    Get S → Sa | a | b | S | A | B

    A → Sa | a

    B → Sa | a | b | S | A | B

    Removing unit productions:

    S → Sa | a | b

    A → Sa | a

    B → Sa | a | b

    Observe: A and B are now useless as they are not reachable
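The same result can be obtained programmatically. This Python sketch computes, for each variable X, the set of variables reachable through unit productions alone, and then copies over their non-unit bodies; the name `remove_unit_productions` and the dictionary encoding with single-character variables are assumptions made for illustration.

```python
def remove_unit_productions(productions):
    """productions maps each variable to a set of bodies (strings); no epsilon bodies."""
    variables = set(productions)
    # unit[X] = all Y such that X =>* Y using unit productions only (including X itself).
    unit = {X: {X} for X in variables}
    changed = True
    while changed:
        changed = False
        for X in variables:
            for Y in list(unit[X]):
                for body in productions[Y]:
                    if body in variables and body not in unit[X]:
                        unit[X].add(body)
                        changed = True
    # Give X every non-unit body of every Y it can reach; unit bodies are dropped.
    return {
        X: {body for Y in unit[X] for body in productions[Y] if body not in variables}
        for X in variables
    }

G = {"S": {"A", "B"}, "A": {"Sa", "a"}, "B": {"S", "b"}}
for head, bodies in sorted(remove_unit_productions(G).items()):
    print(head, "->", " | ".join(sorted(bodies)))
```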


    Context Free Grammars

    Question: To remove useless symbols, ε-productions and unit productions all together, does the order matter?

    Observe:

    a) Removing useless symbols cannot add ε-productions or unit productions

    b) Removing ε-productions could add unit productions

    c) Removing unit productions:

    i) Needs ε-productions to be removed first

    ii) Could create useless symbols but not ε-productions.

    Thus use the following order:

    1. ε-productions

    2. Unit productions (no ε-productions added)

    3. Useless symbols (no productions added)


    Context Free Grammars

    Chomsky Normal Form:

    CFG G is in Chomsky Normal Form (CNF) if all its productions are of the form

    A → a

    A → XY, where a ∈ T and X, Y ∈ V

    Theorem: Given any CFG G1 with ε not in the language L(G1),

    we can find a CNF grammar G2 such that

    L(G1) = L(G2)

    Construction: Three step process:

    Step 1: Eliminate unit productions and ε-productions

    Now all productions are of the form

    A → a

    A → X1X2…Xk, with k ≥ 2 and X1, X2, …, Xk belonging to V ∪ T


    Context Free Grammars

    Step 2: Remove mixed bodies

    For each a belonging to T, add a new variable Va and the production Va → a

    In each production A → X1…Xk with k ≥ 2, replace each terminal a by Va

    Now all productions are of the form

    A → a

    A → A1…Ak with each Ai belonging to V

    Step 3: Factor long productions

    For A → A1A2…Ak with k ≥ 3

    Add new variables B1, B2, …, Bk-2


    Context Free Grammars

    Replace A → A1…Ak

    by A → A1B1

    B1 → A2B2

    B2 → A3B3

    …

    Bk-2 → Ak-1Ak

    Verify: We get a CNF grammar and the language is preserved

    Example: G1: S → ABB | ab

    A → Ba | ba

    B → aAbB


    Context Free Grammars

    Step 2: Va → a

    Vb → b

    S → ABB | VaVb

    A → BVa | VbVa

    B → VaAVbB


    Context Free Grammars

    Step 3: Va → a

    Vb → b

    S → AX1 | VaVb

    X1 → BB

    A → BVa | VbVa

    B → VaY1

    Y1 → AY2

    Y2 → VbB
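Steps 2 and 3 of the construction can be sketched in Python. The function below assumes Step 1 has already been done (no ε-productions or unit productions), represents bodies as tuples of symbols, and names the fresh variables N1, N2, … where the worked example uses X1, Y1, Y2. Those names, and the function name `to_cnf`, are hypothetical and must not clash with existing variables.

```python
def to_cnf(productions, terminals):
    """Apply Step 2 (remove mixed bodies) and Step 3 (factor long bodies)."""
    # Step 2a: one new variable Va with production Va -> a per terminal a.
    result = [(f"V{a}", (a,)) for a in sorted(terminals)]
    counter = 0
    for head, body in productions:
        if len(body) == 1:
            result.append((head, body))      # already of the form A -> a
            continue
        # Step 2b: inside bodies of length >= 2, replace each terminal a by Va.
        body = tuple(f"V{s}" if s in terminals else s for s in body)
        # Step 3: peel off one symbol at a time until the body has length 2.
        while len(body) > 2:
            counter += 1
            fresh = f"N{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result

G1 = [("S", ("A", "B", "B")), ("S", ("a", "b")),
      ("A", ("B", "a")), ("A", ("b", "a")),
      ("B", ("a", "A", "b", "B"))]
for head, body in to_cnf(G1, {"a", "b"}):
    print(head, "->", " ".join(body))
```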


    Context Free Grammars

    Greibach Normal Form (GNF)

    Definition: A CFG G is in Greibach Normal Form if every production is of the form

    A → aα, where α belongs to V* and a belongs to T.

    Note: α = ε is allowed

    GNF is a natural generalization of regular grammars. In a regular grammar the productions are of the form A → aα,

    where a ∈ T and α ∈ V ∪ {ε}


    Context Free Grammars

    Modifying productions (assume β doesn't start with a symbol of V)

    Modification 1: Productions of type A → Bγ:

    For any production of the form A → Bγ, where we have other productions of the form B → β1 | β2 | … | βk, replace this particular A-production with

    A → β1γ | β2γ | … | βkγ

    Modification 2: Productions of the form A → Aα:

    For productions of the form A → Aα1 | Aα2 | … | Aαk | β1 | β2 | … | βm, let Z be a new variable. Define new productions as follows:

    a) A → β1 | β2 | … | βm, A → β1Z | β2Z | … | βmZ

    b) Z → α1 | α2 | … | αk, Z → α1Z | α2Z | … | αkZ
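Modification 2 can be written as a short Python function. The sketch below splits the bodies of A into the left-recursive ones (A followed by some α) and the rest (the βs), then builds the new A and Z productions. The function name `remove_left_recursion`, the tuple encoding of bodies, and the sample input are assumptions for illustration; the sample A2 → A2A2A1 | aA1 | b was chosen because it produces the A2 and Z productions that appear in the worked example's Steps 4 and 5.

```python
def remove_left_recursion(A, bodies, Z):
    """Apply Modification 2 to the productions of A, using Z as the new variable.

    bodies is a list of tuples of symbols; returns a dict of new productions.
    """
    # alpha parts of the left-recursive bodies A -> A alpha
    alphas = [body[1:] for body in bodies if body and body[0] == A]
    # beta bodies: everything that does not start with A
    betas = [body for body in bodies if not body or body[0] != A]
    if not alphas:
        return {A: list(bodies)}             # nothing to do
    return {
        A: betas + [beta + (Z,) for beta in betas],        # A -> beta | beta Z
        Z: alphas + [alpha + (Z,) for alpha in alphas],    # Z -> alpha | alpha Z
    }

new = remove_left_recursion(
    "A2", [("A2", "A2", "A1"), ("a", "A1"), ("b",)], "Z")
for head, bodies in new.items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```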


    Context Free Grammars

    Step 1: Eliminate all ε-productions and construct a grammar G1 in Chomsky Normal Form

    Rename all variables as A1, A2, A3, …, An, where S = A1

    Step 2: Apply modification 1 on productions of type Ai → Ajγ, where j < i

    Step 3: Apply modification 2 on productions of type Ai → Aiα

    Step 4: Apply modification 1 on productions of type Ai → Ajγ, where j > i

    Step 5: Modify Z productions to convert them to the form Z → aα

    ____________________________________

    Example: Convert G1 = (V, T, P, S), defined by S → AA | a, A → SS | b, to a grammar G2 in GNF.

    Step 1: G1 is already in CNF, so rename variables as A1 = S, A2 = A


    Context Free Grammars

    Step 4: Modify A1 → A2A2 using modification 1. As the A2 productions are A2 → aA1 | b | aA1Z | bZ,

    the set of modified A1 productions is:

    A1 → a | aA1A2 | bA2 | aA1ZA2 | bZA2

    Step 5: Modify Z productions. The Z productions are

    Z → A2A1 | A2A1Z

    Applying modification 1, they become

    Z → aA1A1 | bA1 | aA1ZA1 | bZA1

    Z → aA1A1Z | bA1Z | aA1ZA1Z | bZA1Z


    Context Free Grammars

    The resulting grammar thus has the following production rules:

    A1 → a | aA1A2 | bA2 | aA1ZA2 | bZA2

    A2 → aA1 | b | aA1Z | bZ

    Z → aA1A1 | bA1 | aA1ZA1 | bZA1

    Z → aA1A1Z | bA1Z | aA1ZA1Z | bZA1Z