cfg (3)
8/2/2019 CFG (3)
Context Free Grammars
Grammar: A grammar is a recursive definition of a language
(natural or programming)
Formally: Grammar G = (V, T, P, S)
Terminals T: Basically T = Σ, the alphabet of the language
Variables V: Non-terminal symbols that represent sets of strings being defined recursively
Start symbol S: S belongs to V and is a special symbol that generates the desired language
Production rules P: Recursive definitions
Note: T, V and P are always finite sets.
-
Context Free Grammars
Example (Leq): The grammar Geq = (V, T, P, S)
T = {0, 1}
V = {S, A, B}
P = { S → ε | 0A | 1B
      A → 1S | 0AA
      B → 0S | 1BB }
Notation: For a set of rules
A → α1, A → α2, A → α3
Shortcut:
A → α1 | α2 | α3
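Sketch (Python, not part of the original slides): the grammar Geq can be written as a dictionary and explored mechanically. The names G_EQ and generate are illustrative. The search prunes on the number of terminals, which is enough for this grammar because every production other than S → ε adds a terminal; other grammars may need a stronger bound.

```python
from collections import deque

# Productions of Geq: variable -> list of bodies ("" stands for epsilon).
G_EQ = {
    "S": ["", "0A", "1B"],
    "A": ["1S", "0AA"],
    "B": ["0S", "1BB"],
}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            words.add(form)          # no variables left: a sentence
            continue
        for body in grammar[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            # Terminals never disappear, so too many of them is a dead end.
            terminals = sum(1 for c in new if c not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

if __name__ == "__main__":
    print(sorted(generate(G_EQ, "S", 4), key=lambda w: (len(w), w)))
```

Every string produced for max_len = 4 has as many 0s as 1s, which matches the name Leq.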
-
Context Free Grammars
Context Free Grammars (CFG): Are only allowed to have production rules for substitution of the form:
A → α1 α2 α3 … αk
Where: LHS: A belongs to V
RHS: αi belongs to V U T for all i
Non Context Free Grammars: Rules might specify the context in which a substitution can be performed.
e.g. 0A1 → 0 α1 α2 α3 … αk 1
Thus such a rule cannot be applied in other contexts.
-
Context Free Grammars
Language of CFG G = (V, T, P, S) is defined as
L(G) = { w ∈ T* | S =>* w }
A Context Free Language is any language which has a Context Free Grammar G
Terminology:
Sentence: Any w ∈ T* such that S =>* w
Sentential form: Any α ∈ (V U T)* such that S =>* α
Notational conventions:
Terminals: a, b, c, …
Variables: A, B, C, …
Terminal strings: …, u, v, w, x, y, z
Symbols in V U T: …, X, Y, Z
Sentential forms: α, β, γ, …
-
Context Free Grammars
Reading assignment: From the textbook (2nd edition), Theorem 5.7 (which talks about the language of palindromes)
Example: Arithmetic Expressions
G = (V, T, P, S)
V = {E, I}
S = E
T = {x, y, z, +, *, (, )}
P = { E → I | (E) | E+E | E*E
      I → x | y | z }
Apply: E => E+E => E+E*E => I+E*E => x+E*E => x+I*E
=> x+y*E => x+y*I => x+y*z
-
Context Free Grammars
Left-most derivation: Always substitute the leftmost variable in each sentential form arising in the course of a derivation.
Notation: =(LM)=>
Example: E => E+E => I+E => x+E
Similarly: We can define right-most derivations, written =(RM)=>
Example: E => E+E => E+E*E => E+E*I
Now we can talk about a canonical derivation sequence and …
-
Context Free Grammars
A-tree: Any tree (or subtree) rooted at variable A.
We simply call it a parse tree if it is rooted at S.
Yield (frontier) of a tree: the sequence of leaf labels read in left-to-right order
Theorem: α is the yield of an A-tree if and only if A =>* α
Proof: By induction on the height of the tree (see textbook)
Observe: The preceding parse tree does not specify a unique way to derive α from A
In fact it removes the non-determinism of which rules to apply but leaves the order of application unspecified.
-
Context Free Grammars
Leftmost derivation (lm) is obtained by traversing the tree in depth-first order, always going into left subtrees before right ones
Similarly, rightmost derivation (rm) comes from depth-first traversal going into right subtrees first.
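Sketch (Python, not part of the original slides): reading a derivation off a parse tree. The tree encoding (a terminal is a string, a variable node is a pair of label and child list) and the names TREE1 and derivation are assumptions made for illustration. TREE1 is one parse tree of x+y*z in the arithmetic expression grammar.

```python
# A variable node is (label, [children]); a terminal leaf is a plain string.
TREE1 = ("E", [
    ("E", [("I", ["x"])]),
    "+",
    ("E", [("E", [("I", ["y"])]), "*", ("E", [("I", ["z"])])]),
])

def derivation(tree, leftmost=True):
    """Return the sentential forms of the leftmost (or rightmost) derivation."""
    form = [tree]                      # current sentential form as a list of nodes
    steps = []
    while True:
        steps.append("".join(n if isinstance(n, str) else n[0] for n in form))
        variables = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not variables:
            return steps               # only terminals left: this is the yield
        i = variables[0] if leftmost else variables[-1]
        form[i:i + 1] = form[i][1]     # expand the chosen variable into its children

if __name__ == "__main__":
    print(" => ".join(derivation(TREE1, leftmost=True)))
    print(" => ".join(derivation(TREE1, leftmost=False)))
```

Both calls end in the same yield x+y*z; only the order in which variables are expanded differs.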
-
Context Free Grammars
Claim: The following are all equivalent statements.
For CFG G = (V, T, P, S) and string w ∈ T*
a) w ∈ L(G)
b) S =(LM)=>* w
c) S =(RM)=>* w
d) There exists an S-tree which yields w
Of course: We could always use leftmost derivations to specify a canonical way to derive any w belonging to L(G), or convert a parse tree to a unique derivation, thus simplifying the task of parsers.
But what if some w belonging to L(G) has two distinct parse trees and hence two distinct leftmost derivations?
-
Context Free Grammars
Example: x+y*z
Note: This is not just a syntactic problem, as we get two different semantic interpretations
Tree1: x+(y*z)
Tree2: (x+y)*z
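Sketch (Python, not part of the original slides): counting the parse trees of a string in the expression grammar E → I | (E) | E+E | E*E, I → x | y | z. The function name count_parse_trees is illustrative, and the rules are hard-coded for this one grammar rather than read from a general grammar object.

```python
from functools import lru_cache

def count_parse_trees(w):
    """Number of distinct E-rooted parse trees whose yield is w."""

    @lru_cache(maxsize=None)
    def count_e(i, j):
        # Counts E-trees with yield w[i:j].
        total = 0
        if j - i == 1 and w[i] in "xyz":                      # E -> I -> x | y | z
            total += 1
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":    # E -> (E)
            total += count_e(i + 1, j - 1)
        for k in range(i + 1, j - 1):                         # E -> E+E | E*E, operator at k
            if w[k] in "+*":
                total += count_e(i, k) * count_e(k + 1, j)
        return total

    return count_e(0, len(w))

if __name__ == "__main__":
    for s in ["x+y*z", "(x+y)*z", "x+(y*z)"]:
        print(s, count_parse_trees(s))
```

The string x+y*z gets two trees, corresponding to Tree1 and Tree2, while each parenthesized version gets exactly one.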
-
Context Free Grammars
Definition: A CFG G is ambiguous if some w ∈ L(G) has more than one distinct parse tree.
In compilers, parse trees determine interpretation, so we cannot allow ambiguity.
Of course we can force the use of parentheses, but we should really redesign the grammar to be unambiguous by encoding the precedence of operators. (See textbook for redesigning of grammars)
While the above grammar can be redesigned to be unambiguous, it is not always possible to do that.
-
Context Free Grammars
Definition: A CFL is called inherently ambiguous if all its grammars are ambiguous.
Example: L = {a^n b^n c^m d^m | n, m ≥ 1} U {a^n b^m c^m d^n | n, m ≥ 1}
Consider strings of the form a^k b^k c^k d^k. We can never tell whether such a string came from the first or the second type of strings in L, and any CFG for L must allow both of these possibilities.
-
Push Down Automata(PDA)
PDA is a class of machines corresponding to CFGs (accepting only the Context Free Languages), useful in designing parsers based on CFGs
As we have discussed before, we must give a PDA unbounded memory to allow it to handle non-regular languages
However, we will restrict its access to memory!
-
Push Down Automata(PDA)
Setup: On a transition, a PDA
1. Consumes an input symbol (or makes an ε-move)
2. Goes to a new state (or stays in the old one)
3. Replaces the top of the stack by any string (does nothing, pops the stack, or pushes a string onto the stack)
A Push Down Automaton (PDA) is essentially an ε-NFA with a stack
-
Push Down Automata(PDA)
Stack Notation
Content: (top)ABBAC(bottom)
Pop: returns A; new content BBAC
Push(XYZ): new content XYZBBAC
Transitions:
Determined by:
Input or ε-move
Current state
Stack top
Effect:
New state
Pop
Push new string
-
Push Down Automata(PDA)
Formally: A Push Down Automaton is a seven-tuple represented as
M = (Q, Σ, Γ, δ, q0, Z0, F)
Where: Q is a finite set of states
Σ is the finite input alphabet
Γ is the finite stack alphabet
δ : Q × (Σ U {ε}) × Γ → 2^(Q × Γ*) is the transition function
q0 ∈ Q is the start state
Z0 ∈ Γ is the start symbol for the stack, and
F ⊆ Q is the set of accepting states
-
Push Down Automata(PDA)
Transition function: δ takes as argument a triple (q, a, X)
δ : Q × (Σ U {ε}) × Γ → 2^(Q × Γ*)
Suppose we have δ(q, a, X), then
1. q is a state in Q
2. a is either an input symbol in Σ or a = ε, the empty string, which is not to be assumed an input symbol
3. X is a stack symbol in Γ
Output of δ is a finite set of pairs (pi, γi), where pi is the new state and γi is the string of stack symbols that replaces X.
δ(q, a, X) = {(p1, γ1), (p2, γ2), …}
-
Push Down Automata(PDA)
Action: First the PDA pops the stack top to determine X and reads the input to determine a (unless it is an ε-transition); then, knowing q, a, X, it selects non-deterministically one of the possibilities (pi, γi)
Finally:
State: goes from q to pi
Input: scans past a (unless a = ε)
Stack: loses the old top symbol X but gets γi pushed onto it
Note: We thus need Z0 on the stack initially to allow the first transition to pop the stack
Convention: Strings in Σ*: x, y, z
Strings in Γ*: α, β, γ
-
Push Down Automata(PDA)
Example: L = {0^n 1^n | n ≥ 1}
PDA M: Σ = {0, 1}, Γ = {X, Z0}, Q = {q0, q1, q2}, F = {q2}
Transitions:
δ(q0, 0, Z0) = {(q0, XZ0)} [On input 0 add X to stack]
δ(q0, 0, X) = {(q0, XX)} [On input 0 add X to stack]
δ(q0, 1, X) = {(q1, ε)} [On input 1 switch to q1 and consume X]
δ(q1, 1, X) = {(q1, ε)} [On input 1 keep consuming Xs]
δ(q1, ε, Z0) = {(q2, ε)} [When Z0 is found, consume it and move to final state q2]
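Sketch (Python, not part of the original slides): a small simulator that explores every configuration (state, unread input, stack) of this PDA and accepts by final state. The encoding is an assumption made for illustration: stack symbols are single characters (Z stands for Z0), the stack is a string with its top at index 0, and "" stands for ε. The search is only guaranteed to stop when ε-moves cannot grow the stack forever, which holds for this machine.

```python
# Transition table of the PDA for {0^n 1^n | n >= 1}; "Z" stands for Z0.
DELTA = {
    ("q0", "0", "Z"): {("q0", "XZ")},
    ("q0", "0", "X"): {("q0", "XX")},
    ("q0", "1", "X"): {("q1", "")},
    ("q1", "1", "X"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "")},
}

def accepts_by_final_state(delta, start, z0, finals, w):
    """True if some execution trace consumes w and ends in a final state."""
    frontier = [(start, w, z0)]
    seen = set(frontier)
    while frontier:
        state, rest, stack = frontier.pop()
        if not rest and state in finals:
            return True
        if not stack:
            continue                              # empty stack: no move possible
        top, below = stack[0], stack[1:]
        # epsilon-moves first, then moves that read one input symbol
        moves = [(p, rest, push + below) for p, push in delta.get((state, "", top), ())]
        if rest:
            moves += [(p, rest[1:], push + below)
                      for p, push in delta.get((state, rest[0], top), ())]
        for config in moves:
            if config not in seen:
                seen.add(config)
                frontier.append(config)
    return False

if __name__ == "__main__":
    for w in ["01", "000111", "001", "011", "10", ""]:
        print(repr(w), accepts_by_final_state(DELTA, "q0", "Z", {"q2"}, w))
```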
-
Push Down Automata(PDA)
Transition diagram: [figure not reproduced]
Remarks:
1. Will reject inputs not of the form 0*1* by not having any transition defined.
2. If there are too few 1s, it will never go to q2
3. If there are too many 1s, it will get stuck in q2 without reaching the end of the input.
-
Push Down Automata(PDA)
Instantaneous description (ID): Succinct notation for describing the entire configuration of a PDA mid-stream in an execution
ID = (q, x, α)
Where, q: current state
x: unread input
α: stack content
Acceptance by a PDA: A PDA accepts input w if there is at least one execution trace which leads to a final state when the end of the input is reached
Rejection by a PDA:
When no transition is possible (stuck)
If the input is not over but the stack is empty
If the input is over but the PDA is in a non-final state
Of course this must happen on every trace to reject w.
-
Push Down Automata(PDA)
Rejection in such cases (acceptance by empty stack): Along every execution trace one of the following happens
Before w is over, the stack gets empty
When w is over, the stack is not empty
Before w is over, the PDA gets stuck
Example: L = {ww^R | w ∈ {a, b}*}
PDA M: Q = {q0, q1}, F = ∅
Σ = {a, b}, Γ = {A, B, Z0}
Goal: Accept by empty stack
-
Push Down Automata(PDA)
Idea:
1. q0 pushes the symbols of w onto the stack one by one
2. Guess the midpoint of the input and move to q1
3. In q1, match the input with the stack top, one by one
4. At the end, Z0 should be at the top, so remove it to halt and accept
Key: In step 3, the stack pops w in reverse order.
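Sketch (Python, not part of the original slides): the four steps above turned into a transition table and simulated with acceptance by empty stack. The slides do not list the transitions, so the table DELTA below is an assumed reading of the idea: a is pushed as A, b as B, Z stands for Z0, the stack top is at index 0, and "" stands for ε.

```python
DELTA = {}
for c, sym in (("a", "A"), ("b", "B")):
    for top in "ABZ":
        DELTA[("q0", c, top)] = {("q0", sym + top)}   # step 1: push the symbol read
    DELTA[("q1", c, sym)] = {("q1", "")}              # step 3: match input with stack top
for top in "ABZ":
    DELTA[("q0", "", top)] = {("q1", top)}            # step 2: guess the midpoint
DELTA[("q1", "", "Z")] = {("q1", "")}                 # step 4: remove Z0

def accepts_by_empty_stack(delta, start, z0, w):
    """True if some trace consumes all of w and leaves the stack empty."""
    frontier = [(start, w, z0)]
    seen = set(frontier)
    while frontier:
        state, rest, stack = frontier.pop()
        if not stack:
            if not rest:
                return True          # input over and stack empty: accept
            continue                 # stack empty too early: this trace rejects
        top, below = stack[0], stack[1:]
        moves = [(p, rest, push + below) for p, push in delta.get((state, "", top), ())]
        if rest:
            moves += [(p, rest[1:], push + below)
                      for p, push in delta.get((state, rest[0], top), ())]
        for config in moves:
            if config not in seen:
                seen.add(config)
                frontier.append(config)
    return False

if __name__ == "__main__":
    for w in ["aabbaa", "abba", "aba", "aabb"]:
        print(w, accepts_by_empty_stack(DELTA, "q0", "Z", w))
```

Because the midpoint is guessed, the simulator tries every possible switch from q0 to q1; a string is accepted as soon as one guess works.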
-
Push Down Automata(PDA)
Example (contd.): Input = aabbaa
Execution trace: [figure not reproduced]
Accept by empty stack
Remark: Since the PDA is non-deterministic, other execution traces are possible
-
Push Down Automata(PDA)
Equivalence of language acceptance:
Theorem: L = L(Pf) for some PDA Pf if and only if L = N(Pn) for some PDA Pn
Proof: Given Pf = (Q, Σ, Γ, δ, q0, Z0, F), construct Pn such that
N(Pn) = L(Pf)
Pn = (Q2, Σ, Γ2, δ2, p0, X0, ∅)
with: Q2 = Q U {p0, p}
Γ2 = Γ U {X0}
Note: N in N(M) stands for null stack or empty stack
-
Push Down Automata(PDA)
δ2: Idea
Start in p0 with X0 on the stack
Move to q0 with Z0X0 on the stack
Simulate Pf
From any final state of Pf, add an ε-transition to p, which will just empty the stack
X0: It prevents accidental acceptance by Pn when Pf empties its stack and rejects
-
Push Down Automata(PDA)
The reverse: Given Pn = (Q, Σ, Γ, δ, q0, Z0, ∅),
construct Pf such that L(Pf) = N(Pn)
Construction: Pf = (Q1, Σ, Γ1, δ1, p0, X0, F1)
where, Q1 = Q U {p0, pf}
Γ1 = Γ U {X0}
F1 = {pf}
Idea:
Start in p0 with X0 on the stack
Move to q0 with Z0X0 on the stack
Simulate Pn
When Pn empties its stack, it exposes X0
From all states add an ε-move to pf whenever X0 is on top of the stack
-
Push Down Automata(PDA)
Equivalence of CFGs and PDA
Claim: Every CFL is accepted by some PDA, and every PDA accepts some CFL
Theorem 1: If L is a CFL, then L = N(M) for some PDA M
Proof: Suppose G is a CFG for L
Our goal: Construct a PDA M for G such that L(G) = N(M)
Idea: PDA M simulates LM derivations in G for input w such that at any step the sentential form is represented by
a) A sequence of symbols consumed from input w by M
b) Followed by the contents of M's stack
-
Push Down Automata(PDA)
Formally: Given CFG G = (V, T, P, S)
Construct PDA M = (Q, Σ, Γ, δ, q0, Z0, ∅)
with Q = {q}, q0 = q, Σ = T, Γ = V U T, Z0 = S
Defining δ: Two types
1. If terminal a is on the stack top, then expect to see an a in the input and consume both (note: no change in the sentential form)
2. If variable A is on the stack top, then replace it by the RHS of any of its production rules in P (note: no change in the input consumed)
Thus:
δ(q, ε, A) = {(q, α1), (q, α2), …, (q, αk)} for all A ∈ V, where A → α1 | α2 | … | αk are in P
δ(q, a, a) = {(q, ε)} for all a ∈ T
-
Push Down Automata(PDA)
Example: Consider G
S → AS | ε
A → 0A1 | A1 | 01
PDA: M = ({q}, {0, 1}, {0, 1, A, S}, δ, q, S, ∅)
δ: δ(q, ε, S) = {(q, AS), (q, ε)}
δ(q, ε, A) = {(q, 0A1), (q, A1), (q, 01)}
δ(q, 0, 0) = {(q, ε)}
δ(q, 1, 1) = {(q, ε)}
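Sketch (Python, not part of the original slides): building the one-state PDA from a grammar and running it with acceptance by empty stack. Since there is only one state, it is left out of the encoding. The function names are illustrative. The pruning rule is an addition for this sketch: terminals on the stack must still be matched against the input, so a stack holding more terminals than unread input symbols is a dead end. That bound is enough to stop the search for this example grammar, but not for every grammar.

```python
def grammar_to_pda(productions, terminals):
    """Transition table keyed by (input symbol or "", stack top) -> pushed strings."""
    delta = {}
    for head, bodies in productions.items():
        delta[("", head)] = set(bodies)       # type 2: replace a variable by a body
    for a in terminals:
        delta[(a, a)] = {""}                  # type 1: match a terminal and pop it
    return delta

def accepts_by_empty_stack(delta, start_symbol, terminals, w):
    frontier = [(w, start_symbol)]            # (unread input, stack with top at index 0)
    seen = set(frontier)
    while frontier:
        rest, stack = frontier.pop()
        if not stack:
            if not rest:
                return True
            continue
        top, below = stack[0], stack[1:]
        moves = [(rest, push + below) for push in delta.get(("", top), ())]
        if rest:
            moves += [(rest[1:], push + below) for push in delta.get((rest[0], top), ())]
        for rest2, stack2 in moves:
            # Pruning added for this sketch: too many terminals on the stack.
            if sum(c in terminals for c in stack2) > len(rest2):
                continue
            if (rest2, stack2) not in seen:
                seen.add((rest2, stack2))
                frontier.append((rest2, stack2))
    return False

G = {"S": ["AS", ""], "A": ["0A1", "A1", "01"]}
T = {"0", "1"}

if __name__ == "__main__":
    delta = grammar_to_pda(G, T)
    for w in ["011", "0011011", "001", "10"]:
        print(w, accepts_by_empty_stack(delta, "S", T, w))
```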
-
Push Down Automata(PDA)
Execution: Consider w = 011
In G: S => AS => A1S => 011S => 011
In M: (q, 011, S) |- (q, 011, AS) |- (q, 011, A1S)
|- (q, 011, 011S) |- (q, 11, 11S)
|- (q, 1, 1S) |- (q, ε, S)
|- (q, ε, ε)
Observe: There is a one-to-one correspondence between LM derivations and execution traces.
Of course there are many execution traces possible, each corresponding to a distinct derivation.
Besides that, observe that if two distinct executions accept w, there exist two distinct LM derivations and thus the grammar is ambiguous.
-
Push Down Automata(PDA)
In the theorem, our construction relied heavily on the power of non-determinism to allow the machine to guess the correct derivation
But in real life (or in parsers/YACC), we don't have non-deterministic power
So we need to convert PDAs to some form of deterministic PDA
Definition: DPDA is a PDA with 2 restrictions:
a) δ(q, a, Z) has at most 1 possibility
b) If δ(q, ε, Z) is defined, then for all a ∈ Σ, δ(q, a, Z) is empty
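Sketch (Python, not part of the original slides): checking the two DPDA restrictions on a transition table. The table encoding (keys are (state, input symbol or "", stack top), values are sets of moves) and the two example tables are assumptions made for illustration.

```python
def is_deterministic(delta, input_alphabet):
    """Check restrictions (a) and (b) of the DPDA definition."""
    for (state, symbol, top), moves in delta.items():
        if len(moves) > 1:
            return False                       # violates (a): more than one choice
        if symbol == "" and moves:
            # violates (b) if an input move competes with the epsilon-move
            if any(delta.get((state, a, top)) for a in input_alphabet):
                return False
    return True

# The PDA for {0^n 1^n | n >= 1}, with Z standing for Z0.
COUNTING = {
    ("q0", "0", "Z"): {("q0", "XZ")},
    ("q0", "0", "X"): {("q0", "XX")},
    ("q0", "1", "X"): {("q1", "")},
    ("q1", "1", "X"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "")},
}

# A fragment of a midpoint-guessing PDA: reading a and guessing compete on (q0, Z).
GUESSING = {
    ("q0", "a", "Z"): {("q0", "AZ")},
    ("q0", "", "Z"): {("q1", "Z")},
}

if __name__ == "__main__":
    print(is_deterministic(COUNTING, "01"))    # the counting PDA meets both restrictions
    print(is_deterministic(GUESSING, "ab"))    # the guessing fragment does not
```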
-
Context Free Grammars
Idea: Identify useless symbols by removing:
Step 1: Non-generating Xs
Step 2: Non-reachable Xs,
and all their productions
Observe: We must do it in this order.
Example: S → AB | a
A → b
Suppose we do Step 2 first: all symbols are reachable, so when we do Step 1 next we only eliminate B as being non-generating
But if we do it in the right order, we first eliminate B in Step 1 and also eliminate the production S → AB
Now in Step 2 we find that A is non-reachable, so we eliminate A as well.
In general we perform both steps recursively.
-
Context Free Grammars
Step 1: Eliminating non-generating symbols
Basis: Label all terminals in T as generating
Induction: For all productions X → X1X2…Xk, if each Xi is generating then X is generating.
Terminate when no new generating symbol can be found
Step 2: Eliminating non-reachable symbols
Basis: S is reachable
Induction: For all productions X → X1X2…Xk, if X is reachable, then label each of X1, X2, …, Xk as reachable.
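Sketch (Python, not part of the original slides): both steps as fixed-point computations, applied in the required order. The encoding is an assumption made for illustration: every symbol is one character and a grammar is a dictionary from variable to list of bodies. The demo uses the small grammar S → AB | a, A → b.

```python
def generating_symbols(productions, terminals):
    """Step 1: symbols that derive some terminal string."""
    gen = set(terminals)                               # basis
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in gen and any(all(s in gen for s in b) for b in bodies):
                gen.add(head)                          # induction
                changed = True
    return gen

def reachable_symbols(productions, start):
    """Step 2: symbols that appear in some sentential form derived from start."""
    reach, todo = {start}, [start]                     # basis
    while todo:
        x = todo.pop()
        for body in productions.get(x, []):
            for s in body:
                if s not in reach:
                    reach.add(s)                       # induction
                    todo.append(s)
    return reach

def remove_useless(productions, terminals, start):
    gen = generating_symbols(productions, terminals)
    step1 = {h: [b for b in bodies if all(s in gen for s in b)]
             for h, bodies in productions.items() if h in gen}
    reach = reachable_symbols(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}

if __name__ == "__main__":
    print(remove_useless({"S": ["AB", "a"], "A": ["b"]}, {"a", "b"}, "S"))
```

On the demo grammar only S → a survives: B is non-generating, and once S → AB is gone A is no longer reachable.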
-
Context Free Grammars
Example: S → AB | AC | CD
A → BB
B → AC | ab
C → Ca | CC
D → BC | b | d
Step 1: Base: {a, b, d} is generating
{a, b, d, A, B, D} is generating
{a, b, d, A, B, D, S} is generating
As C is not found to be generating, remove C and all the productions that contain C on either the LHS or RHS.
-
Context Free Grammars
New grammar:
G2: S → AB
A → BB
B → ab
D → b | d
Step 2: Reachable?
Base: {S} is reachable
{S, A, B} is reachable
{S, A, B, a, b} is reachable
Remove D and all productions that contain D on either the LHS or RHS
-
Context Free Grammars
Finally: G3: S → AB
A → BB
B → ab
Removing ε-productions: ε-productions slow down the parser
Definition: X belonging to V is nullable if X =>* ε
Idea: Find nullable symbols recursively
Basis: If P contains A → ε, then label A as nullable
Induction: For all productions X → X1X2X3…Xk, if every Xi is nullable, label X as nullable
Terminate when no new nullable symbol can be found
-
Context Free Grammars
Overall algorithm:
a) Identify all nullable symbols
b) Replace any production X → X1X2X3…Xk by the set of productions of the form X → α1α2α3…αk, where
   i) αi = Xi if Xi is non-nullable
   ii) αi = Xi or ε if Xi is nullable
c) Remove all ε-productions
So in the previous example, the new G2 becomes
S → ABC | AB | AC | A | BCB | BC | CB | BB | B | C | ε
A → aB | a
B → CC | C | b | ε
C → S | ε
Now remove all ε-productions.
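Sketch (Python, not part of the original slides): the nullable computation and steps a) to c) in code. The starting grammar is not shown in this transcript, so the grammar G below (S → ABC | BCB, A → aB, B → CC | b, C → S | ε) is an assumption, chosen because it expands to exactly the G2 listed above. Symbols are single characters and "" stands for ε.

```python
from itertools import product

def nullable_symbols(productions):
    """Variables X with X =>* epsilon, found by iterating to a fixed point."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # An empty body satisfies all(...) trivially, which covers the basis case.
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_epsilon(productions):
    nullable = nullable_symbols(productions)               # step a)
    result = {}
    for head, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            # step b): each nullable symbol is either kept or dropped
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                new_bodies.add("".join(choice))
        new_bodies.discard("")                             # step c)
        result[head] = new_bodies
    return result

# Assumed starting grammar (see the note above).
G = {"S": ["ABC", "BCB"], "A": ["aB"], "B": ["CC", "b"], "C": ["S", ""]}

if __name__ == "__main__":
    for head, bodies in remove_epsilon(G).items():
        print(head, "->", " | ".join(sorted(bodies)))
```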
-
Context Free Grammars
Glitch: Originally S =>* ε was possible, but after the final step we do lose ε from L(G). This is unavoidable.
Removing unit productions (e.g. A → B)
Algorithm: Step 1: Remove ε-productions
Step 2: For all X, Y belonging to V
if X =>* Y and Y → α is not a unit production
then add X → α
Step 3: Eliminate all unit productions
Finding X =>* Y
Since there are no ε-productions, X =>* Y only if
X => Y1 => Y2 => Y3 => … => Yk => Y
with all Yi being distinct. Thus k ≤ |V|. Can use reachability in directed graphs.
-
Context Free Grammars
Example: G: S → A | B
A → Sa | a
B → S | b
Algorithm: S =>* A, S =>* B
B =>* S, B =>* A
Get: S → Sa | a | b | S | A | B
A → Sa | a
B → Sa | a | b | S | A | B
Removing unit productions:
S → Sa | a | b
A → Sa | a
B → Sa | a | b
Observe: A and B are now useless as they are not reachable
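Sketch (Python, not part of the original slides): unit-production removal using reachability over unit productions, on the example grammar above. The encoding (single-character symbols, no ε-productions) and the function name are assumptions made for illustration. Steps 2 and 3 are folded together: only non-unit bodies are ever copied.

```python
def remove_unit_productions(productions):
    variables = set(productions)

    def is_unit(body):
        return body in variables              # body is exactly one variable

    result = {}
    for x in productions:
        # All Y with X =>* Y using unit productions only (X itself included).
        reach, todo = {x}, [x]
        while todo:
            y = todo.pop()
            for body in productions[y]:
                if is_unit(body) and body not in reach:
                    reach.add(body)
                    todo.append(body)
        # Add X -> alpha for every non-unit production Y -> alpha.
        result[x] = {b for y in reach for b in productions[y] if not is_unit(b)}
    return result

G = {"S": ["A", "B"], "A": ["Sa", "a"], "B": ["S", "b"]}

if __name__ == "__main__":
    for head, bodies in remove_unit_productions(G).items():
        print(head, "->", " | ".join(sorted(bodies)))
```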
-
Context Free Grammars
Question: To remove useless symbols, ε-productions and unit productions all together, does order matter?
Observe:
a) Removing useless symbols cannot add ε-productions or unit productions
b) Removing ε-productions could add unit productions
c) Removing unit productions
   i) Needs ε-productions removed first
   ii) Could create useless symbols but not ε-productions
Thus use the following order:
1. ε-productions
2. Unit productions (no epsilons added)
3. Useless symbols (no productions added)
-
Context Free Grammars
Chomsky Normal Form:
A CFG G is in Chomsky Normal Form (CNF) if all its productions are of the form
A → a
A → XY, where X and Y are variables
Theorem: Given any CFG G1 with ε not in the language L(G1),
we can find a CNF grammar G2 such that
L(G1) = L(G2)
Construction: Three step process:
Step 1: Eliminate unit productions and ε-productions
Now all productions are of the form
A → a
A → X1X2…Xk, with k ≥ 2 and X1, X2, …, Xk belonging to V U T
-
Context Free Grammars
Step 2: Remove mixed bodies
For each a belonging to T, add a new variable Va and the production Va → a
In each production A → X1…Xk (k ≥ 2) replace a by Va
Now all productions are of the form
A → a
A → A1…Ak with Ai belonging to V
Step 3: Factor long productions
For A → A1A2…Ak with k ≥ 3
Add new variables B1, B2, …, Bk-2
-
Context Free Grammars
Replace A → A1…Ak
By A → A1B1
B1 → A2B2
B2 → A3B3
…
Bk-2 → Ak-1Ak
Verify: We get a CNF grammar and the language is preserved
Example: G1: S → ABB | ab
A → Ba | ba
B → aAbB
-
Context Free Grammars
Step 2: Va → a
Vb → b
S → ABB | VaVb
A → BVa | VbVa
B → VaAVbB
-
Context Free Grammars
Step 3: Va → a
Vb → b
S → AX1 | VaVb
X1 → BB
A → BVa | VbVa
B → VaY1
Y1 → AY2
Y2 → VbB
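Sketch (Python, not part of the original slides): Steps 2 and 3 of the CNF construction applied to the example grammar. It assumes Step 1 is already done (no ε-productions, no unit productions). The fresh variable names are assumptions: V followed by the terminal for Step 2, and X1, X2, X3, … for Step 3, so the variables called Y1 and Y2 above come out as X2 and X3 here.

```python
def to_cnf(productions, terminals):
    """Return a list of (head, body) rules, each body a list of symbols."""
    rules = []
    term_var = {}                     # terminal -> variable that stands for it
    # Step 2: in bodies of length >= 2, replace each terminal a by Va.
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) >= 2:
                body = [term_var.setdefault(s, "V" + s) if s in terminals else s
                        for s in body]
            rules.append((head, list(body)))
    rules += [(v, [a]) for a, v in term_var.items()]
    # Step 3: factor bodies longer than 2 using fresh variables.
    result = []
    fresh = 0
    for head, body in rules:
        while len(body) > 2:
            fresh += 1
            new_var = f"X{fresh}"
            result.append((head, [body[0], new_var]))
            head, body = new_var, body[1:]
        result.append((head, body))
    return result

G1 = {"S": ["ABB", "ab"], "A": ["Ba", "ba"], "B": ["aAbB"]}

if __name__ == "__main__":
    for head, body in to_cnf(G1, {"a", "b"}):
        print(head, "->", " ".join(body))
```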
-
Context Free Grammars
Greibach Normal Form (GNF)
Definition: A CFG G is in Greibach Normal Form if every production is of the form
A → aα, where α belongs to V* and a belongs to Σ.
Note: α = ε is allowed
GNF is a natural generalization of regular grammars. In a regular grammar the productions are of the form A → aα,
where a ∈ Σ and α ∈ V U {ε}
-
Context Free Grammars
Modifying productions (assume β doesn't start with a variable)
Modification 1: Productions of type A → Bα:
For any production of the form A → Bα, where we have other productions of the form B → β1 | β2 | … | βk, replace this particular A-production with
A → β1α | β2α | … | βkα
Modification 2: Productions of the form A → Aα:
For productions of the form A → Aα1 | Aα2 | … | Aαk | β1 | β2 | … | βm, let Z be a new variable.
Define new productions as follows:
a) A → β1 | β2 | … | βm, A → β1Z | β2Z | … | βmZ
b) Z → α1 | α2 | … | αk, Z → α1Z | α2Z | … | αkZ
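Sketch (Python, not part of the original slides): Modification 2 as a function. Bodies are lists of symbols so that names like A1 and A2 can be used; the function name and the default new variable Z are illustrative. The demo input A2 → A2A2A1 | aA1 | b is what Modification 1 yields when A1 → A2A2 | a is substituted into A2 → A1A1 | b; that intermediate step is worked out here and does not appear in this transcript.

```python
def remove_left_recursion(head, bodies, new_var="Z"):
    """Apply Modification 2 to the productions of a single variable."""
    alphas = [b[1:] for b in bodies if b[:1] == [head]]   # bodies of the form head alpha
    betas = [b for b in bodies if b[:1] != [head]]        # all remaining bodies
    if not alphas:
        return {head: bodies}                             # nothing to do
    return {
        head: betas + [b + [new_var] for b in betas],          # a) beta | beta Z
        new_var: alphas + [a + [new_var] for a in alphas],     # b) alpha | alpha Z
    }

if __name__ == "__main__":
    out = remove_left_recursion("A2", [["A2", "A2", "A1"], ["a", "A1"], ["b"]])
    for head, bodies in out.items():
        print(head, "->", " | ".join("".join(b) for b in bodies))
```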
-
Context Free Grammars
Step 1: Eliminate all ε-productions and construct a grammar G1 in Chomsky Normal Form
Rename all variables as A1, A2, A3, …, An, where S = A1
Step 2: Apply modification 1 on productions of type Ai → Ajα, where j < i
Step 3: Apply modification 2 on productions of type Ai → Aiα
Step 4: Apply modification 1 on productions of type Ai → Ajα, where j > i
Step 5: Modify Z productions to convert them to the form Z → aα
____________________________________
Example: Convert G1 = (V, T, P, S) defined as S → AA | a, A → SS | b to a grammar G2 in GNF.
Step 1: G1 is already in CNF, so rename variables as A1 = S, A2 = A
-
Context Free Grammars
Step 4: Modify A1 → A2A2 using modification 1. As the A2 productions are A2 → aA1 | b | aA1Z | bZ,
the set of modified A1 productions is:
A1 → a | aA1A2 | bA2 | aA1ZA2 | bZA2
Step 5: Modify Z productions. The Z productions are
Z → A2A1 | A2A1Z
Applying modification 1, they become
Z → aA1A1 | bA1 | aA1ZA1 | bZA1
Z → aA1A1Z | bA1Z | aA1ZA1Z | bZA1Z
-
Context Free Grammars
The resulting grammar thus has the following production rules:
A1 → a | aA1A2 | bA2 | aA1ZA2 | bZA2
A2 → aA1 | b | aA1Z | bZ
Z → aA1A1 | bA1 | aA1ZA1 | bZA1
Z → aA1A1Z | bA1Z | aA1ZA1Z | bZA1Z