CS 1622 - University of Pittsburgh, people.cs.pitt.edu/~mock/cs1622/lectures/lecture07.pdf
CS 1622 Lecture 7 1
CS 1622
Lecture 7: Parsing (2)

Reading
• Chapter 3
• Introduction, Section 3.1

• Parsing = determining whether a string of tokens can be generated by a grammar
• Two classes, based on the order in which the parse tree is constructed:
  • Top-down parsing: start construction at the root of the parse tree
  • Bottom-up parsing: start at the leaves and proceed to the root

Derivations and Parse Trees
• A derivation is a sequence of productions
    S → … → … → …
• A derivation can be drawn as a tree
  • Start symbol is the tree's root
  • For a production N → X1 … Xn, add X1 … Xn as children to node N

Derivation Example
• E → E + E | E * E | id | (E)
• Derivation for: id * id + id
• Derivation for: id + id + id
• See board

Notes on Derivations
• A parse tree has
  • Terminals at the leaves
  • Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations; the input string does not
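
The relationship between a parse tree and its input can be sketched in C. This is only an illustration: the Node type and the function leaves_to_string are hypothetical names, not part of the lecture. Interior nodes hold a non-terminal, leaves hold a terminal, and reading the leaves from left to right gives back the token string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parse-tree node (not from the lecture).
   Interior nodes carry a non-terminal name and 1..3 children;
   leaves carry a terminal (token) and have nkids == 0. */
typedef struct Node {
    const char *label;
    int nkids;
    const struct Node *kids[3];
} Node;

/* Append the leaves of t, left to right, to out, separated by spaces.
   out must start as an empty string and be large enough for the result. */
void leaves_to_string(const Node *t, char *out) {
    assert(t != NULL && out != NULL);
    if (t->nkids == 0) {                 /* leaf: a terminal */
        if (out[0] != '\0') {
            strcat(out, " ");
        }
        strcat(out, t->label);
        return;
    }
    for (int i = 0; i < t->nkids; i++) { /* interior: visit children in order */
        leaves_to_string(t->kids[i], out);
    }
}
```

Applied to a tree for id * id + id, the function reproduces the input string regardless of how the operators are grouped in the tree, which is why the string alone does not show the association of operations.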

Terminals
• Terminals are so called because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals are the tokens of the language represented by the grammar

Left-most and Right-most Derivations
• The example is a left-most derivation
  • At each step, replace the left-most non-terminal
      E → E * E
        → id * E
        → id * E + E
        → id * id + E
        → id * id + id
• There is an equivalent notion of a right-most derivation

Right-most Derivation in Detail (1)
    E → E + E
      → E + id
      → E * E + id
      → E * id + id
      → id * id + id

Derivations and Parse Trees
• Note that right-most and left-most derivations have the same parse tree
• The difference is the order in which branches are added

Summary of Derivations
• We are not just interested in whether s ∈ L(G)
  • We need a parse tree for s
• A derivation defines a parse tree
  • But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation

Parse trees
• Question 1: for each of the two parse trees, find the corresponding left-most derivation
• Question 2: for each of the two parse trees, find the corresponding right-most derivation

Ambiguity (Cont.)
• A grammar is ambiguous if for some string (the following three conditions are equivalent)
  • it has more than one parse tree
  • there is more than one right-most derivation
  • there is more than one left-most derivation
• Ambiguity is BAD
  • Leaves meaning of some programs ill-defined

Dealing with Ambiguity
• There are several ways to handle ambiguity
• Most direct method is to rewrite the grammar unambiguously
    E → E' + E | E'
    E' → id * E' | id | (E)
  • Enforces precedence of * over +
• Separate (external to the grammar) conflict-resolution rules
  • E.g., precedence or associativity rules (e.g. via parser tool declarations)
  • Same idea we saw with scanning: instead of complicated REs, use the symbol table to recognize keywords

Expression Grammars (precedence)
• Rewrite the grammar
  • use a different nonterminal for each precedence level
  • start with the lowest precedence (MINUS)
    E → E - E | E / E | ( E ) | id
• rewrite to
    E → E - E | T
    T → T / T | F
    F → id | ( E )

Example
• parse tree for id - id / id
    E → E - E | T
    T → T / T | F
    F → id | ( E )

    E
    ├─ E
    │  └─ T
    │     └─ F
    │        └─ id
    ├─ -
    └─ E
       └─ T
          ├─ T
          │  └─ F
          │     └─ id
          ├─ /
          └─ T
             └─ F
                └─ id

More than one parse tree?
• Attempt to construct a parse tree for id-id/id that shows the wrong precedence.
• Question: Why do you fail to construct this parse tree?

Associativity
• The grammar captures operator precedence, but it is still ambiguous!
  • fails to express that both subtraction and division are left associative;
  • e.g., 5-3-2 is equivalent to ((5-3)-2) and not to (5-(3-2)).

Recursion
• A grammar is recursive in nonterminal X if:
    X →+ … X …
  • recall that →+ means “in one or more steps”; here X derives a sequence of symbols that includes an X
• A grammar is left recursive in X if:
    X →+ X …
  • in one or more steps, X derives a sequence of symbols that starts with an X
• A grammar is right recursive in X if:
    X →+ … X
  • in one or more steps, X derives a sequence of symbols that ends with an X

How to fix associativity
• The grammar given above is both left and right recursive in nonterminals E and T
  • try at home: write the derivation steps that show this.
• To correctly express operator associativity:
  • For left associativity, use left recursion.
  • For right associativity, use right recursion.
• Here's the correct grammar:
    E → E - T | T
    T → T / F | F
    F → id | ( E )
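
To see why the direction of recursion matters, here is a minimal C sketch. The helper names eval_left and eval_right are hypothetical, and the right-recursive rule E → T - E is an assumed variant used only for contrast. The first function groups a chain of subtractions the way the left-recursive rule E → E - T builds the tree, the second the way a right-recursive rule would.

```c
#include <assert.h>

/* Left grouping, as built by the left-recursive rule E -> E - T:
   ((v[0] - v[1]) - v[2]) - ... */
int eval_left(const int *v, int n) {
    assert(n >= 1);
    int acc = v[0];
    for (int i = 1; i < n; i++) {
        acc -= v[i];
    }
    return acc;
}

/* Right grouping, as an assumed right-recursive rule E -> T - E
   would build it: v[0] - (v[1] - (v[2] - ...)) */
int eval_right(const int *v, int n) {
    assert(n >= 1);
    if (n == 1) {
        return v[0];
    }
    return v[0] - eval_right(v + 1, n - 1);
}
```

For the operands 5, 3, 2 the left grouping gives ((5-3)-2) = 0, while the right grouping gives (5-(3-2)) = 4, so only the left-recursive form matches the usual meaning of subtraction.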

Here's the correct grammar - corrected for both precedence and associativity
    E → E - T | T
    T → T / F | F
    F → id | ( E )

Old grammar - corrected for precedence
    E → E - E | T
    T → T / T | F
    F → id | ( E )
• Ambiguity - has both left and right associativity

Ambiguity: More than Operators - The Dangling Else
• Consider the grammar
    E → if E then E
      | if E then E else E
      | print
• This grammar is also ambiguous

The Dangling Else: Example
• The expression
    if E1 then if E2 then E3 else E4
  has two parse trees
  • else attached to the outer if: if E1 then (if E2 then E3) else E4
  • else attached to the inner if: if E1 then (if E2 then E3 else E4)
• Typically we want the second form

The Dangling Else: A Fix
• else matches the closest unmatched then
• We can describe this in the grammar
    S → WithElse    /* all then are matched */
      | LastElse    /* some then are unmatched */
    WithElse → if E then WithElse else WithElse
             | print
    LastElse → if E then S
             | if E then WithElse else LastElse
• Describes the same set of strings

The Dangling Else: Example Revisited
• The expression if E1 then if E2 then E3 else E4
  • if E1 then (if E2 then E3) else E4
    Not valid because the then expression is not a MIF (matched if, i.e. WithElse)
  • if E1 then (if E2 then E3 else E4)
    A valid parse tree (for a UIF, i.e. an unmatched if, LastElse)

Precedence and Associativity Declarations
• Instead of rewriting the grammar
  • Use the more natural (ambiguous) grammar
  • Along with disambiguating declarations
• Most parser generators allow precedence and associativity declarations to disambiguate grammars
• Examples …

Associativity Declarations
• Consider the grammar E → E - E | int
• Ambiguous: two parse trees of int - int - int
  • (int - int) - int
  • int - (int - int)
• Left associativity declaration: %left -

Precedence Declarations
• Consider the grammar E → E + E | E * E | int
  • And the string int + int * int
• Two parse trees:
  • (int + int) * int
  • int + (int * int)
• Precedence declarations:
    %left +
    %left *

Parenthetical Excursion into the Land of Grammars
• Just FYI - if you are theoretically inclined you may want to know this
• Relevant for the practical compiler hacker with respect to which kinds of constructs can be put into a CFG

Chomsky Language Classification based on Grammar
• 4 types of grammars
• Type 0 grammar - unrestricted (recursively enumerable)
  • Form of rule: α → β, α ∈ (N ∪ T)*, β ∈ (N ∪ T)*
  • no restrictions on form of grammar rules
  • Example:
      aAB → aB
      A → ε
  • can have empty productions
  • the left side may be longer than the right: | α | > | β |

Type 1 grammar - context sensitive
• Form of rule:
    α A β → α γ β, where A ∈ N+, γ ∈ (N ∪ T)+, | A | ≤ | γ |
• Replace A by γ only if found in context of α and β
• No erasure rules

Type 2 grammars - context free
• A → α, A ∈ N, α ∈ (N ∪ T)*
• Programming language constructs - mostly context free, but not all:
  • declarations must precede use
  • number of formals and actuals must agree
• must check this in semantic phase

Type 3 grammar - regular
• rules: A → a or A → aB, with a ∈ T and A, B ∈ N
• tokens defined by type 3 grammar
• So lexical analysis - regular expressions - can only describe regular languages

End of Excursion
• Time to wake up again

Parsing Algorithms
• Can parse any grammar
  • Earley's algorithm - expensive, O(n^3)
• Top-down
  • Start from S and build input string
  • Leftmost derivation
  • Can be easily written by hand
    • Recursive descent
  • Alternative implementation
    • Predictive parsing
  • Only works for some grammars (LL(k))

Bottom-up Parsing
• Reduce input string to S (root of tree)
• Rightmost derivation (in reverse), starting from leaves
• Shift-reduce: shift symbols onto a stack, reduce to the lhs of a production when a “handle” is identified
• LR(k) parser
  • Hard to build by hand
  • Tools
  • More grammars can be parsed

Intro to Top-Down Parsing
• The parse tree is constructed
  • From the top
  • From left to right
• Terminals are seen in order of appearance in the token stream:
    t2 t5 t6 t8 t9
[Figure: parse tree with nodes labeled 1, t2, 3, 4, t5, t6, 7, t8, t9]

Recursive Descent Parsing
• Consider the grammar
    E → T + E | T
    T → int | int * T | ( E )
• Token stream is: int5 * int2
• Start with top-level non-terminal E
• Try the rules for E in order

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1 + E2
• Then try a rule for T1 → ( E3 )
  • But ( does not match input token int5
• Try T1 → int. Token matches.
  • But + after T1 does not match input token *
• Try T1 → int * T2
  • This will match but + after T1 will be unmatched
• Has exhausted the choices for T1
  • Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)
• Try E0 → T1
• Follow same steps as before for T1
  • And succeed with T1 → int * T2 and T2 → int
  • With the following parse tree

    E0
    └─ T1
       ├─ int5
       ├─ *
       └─ T2
          └─ int2

A Recursive Descent Parser. Preliminaries
• Let TOKEN be the type of tokens
  • Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global variable next point to the next token

A Recursive Descent Parser (2)
• Define boolean functions that check the token string for a match of
  • A given token terminal
      bool term(TOKEN tok) { return *next++ == tok; }
  • A given production of S (the nth), where S → S1 | S2 | … | Sn
      bool Sn() { … }
  • Any production of S
      bool S() { … }
• These functions advance next

A Recursive Descent Parser (3)
• For production E → T + E
    bool E1() { return T() && term(PLUS) && E(); }
• For production E → T
    bool E2() { return T(); }
• For all productions of E (with backtracking)
    bool E() {
      TOKEN *save = next;
      return (next = save, E1()) || (next = save, E2());
    }
• The alternatives are tried in grammar order, T + E before T, so that E does not accept a lone T and leave + E unread

A Recursive Descent Parser (4)
• Functions for non-terminal T
    bool T1() { return term(OPEN) && E() && term(CLOSE); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(INT); }

    bool T() {
      TOKEN *save = next;
      return (next = save, T1())
          || (next = save, T2())
          || (next = save, T3());
    }

Recursive Descent Parsing. Notes.
• To start the parser
  • Initialize next to point to first token
  • Invoke E()
• Notice how this simulates our previous example
• Easy to implement by hand
  • But does not always work …
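
The parser fragments from these slides can be assembled into one compilable unit. What follows is a sketch under a few assumptions that are not in the slides: the values of the TOKEN enum, an END token that terminates the token array, and the driver function parse. E tries T + E before T, following the grammar order E → T + E | T.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds named on the slides, plus END as an assumed end-of-input marker. */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static const TOKEN *next;   /* points to the next unread token */

static bool E(void);
static bool T(void);

/* Match one terminal. Always advances next, as on the slides;
   callers restore next before trying another alternative. */
static bool term(TOKEN tok) { return *next++ == tok; }

/* E -> T + E | T */
static bool E1(void) { return T() && term(PLUS) && E(); }
static bool E2(void) { return T(); }
static bool E(void) {
    const TOKEN *save = next;
    return (next = save, E1()) || (next = save, E2());
}

/* T -> ( E ) | int * T | int */
static bool T1(void) { return term(OPEN) && E() && term(CLOSE); }
static bool T2(void) { return term(INT) && term(TIMES) && T(); }
static bool T3(void) { return term(INT); }
static bool T(void) {
    const TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Hypothetical driver: true if the whole token array, which must end
   with END, is derivable from E. */
bool parse(const TOKEN *tokens) {
    assert(tokens != NULL);
    next = tokens;
    return E() && term(END);
}
```

For the token stream int * int, passed as the array {INT, TIMES, INT, END}, parse returns true. For int + (the array {INT, PLUS, END}) it returns false. The final term(END) check matters: without it, E() alone would report success after matching only a prefix of the input.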

When Recursive Descent Does Not Work
• Consider a production S → S a
    bool S1() { return S() && term(a); }
    bool S() { return S1(); }
• S() will get into an infinite loop: S() calls S1(), which calls S() again before any token is consumed
• A left-recursive grammar has a non-terminal S such that S →+ S α for some α
• Recursive descent does not work in such cases

Elimination of Left Recursion
• Consider the left-recursive grammar
    S → S α | β
• S generates all strings starting with a β and followed by a number of α
• Can rewrite using right-recursion
    S → β S'
    S' → α S' | ε
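
As a concrete instance of this rewrite, take α to be the character a and β the character b, so the language is b followed by any number of a. The C sketch below is a recognizer for the right-recursive form. The names S_rest (standing for S') and matches are hypothetical, and using a NUL-terminated string as the token stream is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *next;   /* next unread input character */

/* Match one terminal; always advances, callers restore next on failure. */
static bool term(char c) { return *next++ == c; }

/* S' -> a S' | epsilon */
static bool S_rest(void) {
    const char *save = next;
    if (term('a') && S_rest()) {
        return true;
    }
    next = save;           /* epsilon alternative: consume nothing */
    return true;
}

/* S -> b S'   (no left recursion, so the recursion always consumes input) */
static bool S(void) { return term('b') && S_rest(); }

/* Hypothetical driver: true if the whole string is derivable from S. */
bool matches(const char *input) {
    assert(input != NULL);
    next = input;
    return S() && *next == '\0';
}
```

Every recursive call to S_rest happens only after term('a') has consumed a character, so the recursion depth is bounded by the input length. The strings "b" and "baaa" are accepted, while "ab" and "bab" are rejected.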

More Elimination of Left-Recursion
• In general
    S → S α1 | … | S αn | β1 | … | βm
• All strings derived from S start with one of β1,…,βm and continue with several instances of α1,…,αn
• Rewrite as
    S → β1 S' | … | βm S'
    S' → α1 S' | … | αn S' | ε

General Left Recursion
• The grammar
    S → A α | δ
    A → S β
  is also left-recursive because
    S →+ S β α
• This left-recursion can also be eliminated - won't go into this

Summary of Recursive Descent
• Simple and general parsing strategy
  • Left-recursion must be eliminated first
  • … but that can be done automatically
• Unpopular because of backtracking
  • Thought to be too inefficient
• In practice, backtracking is eliminated by restricting the grammar