Transparency No. 1
Lecture 3
Introduction to Parsing and
Top-Down Parsing
Cheng-Chia Chen
Transparency No. 2
Outlines
- Parsing overview
- Limitation of regular expressions
- Context-free grammars
- Derivation and parse trees
- Ambiguity of CFGs
- Concrete parse tree vs. abstract syntax tree
- Recursive descent parsing
- LL(1) parsing
Transparency No. 3
Parsing Overview
What is syntax ? The way in which words are put together to form phrases,
clauses, or sentences. --- Webster’s Dictionary
The function of a parser : Input: sequence of tokens from lexer
Output: parse tree of the program
Transparency No. 4
Example
Java expression:  x == y ? 1 : 2
Parser input:     ID == ID ? INT : INT
Parser output (parse tree):

         ?:
       /  |  \
     ==  INT  INT
    /  \
  ID    ID
Transparency No. 5
Comparison with Lexical Analysis
  Phase   | Input                   | Output
  Lexer   | sequence of characters  | sequence of tokens
  Parser  | sequence of tokens      | parse tree
Transparency No. 6
The Role of the Parser
Not all sequences of tokens are programs; the parser must distinguish between valid and invalid sequences of tokens.
We need:
- a language for describing valid sequences of tokens
- a method for distinguishing valid from invalid sequences of tokens
- a way to construct the parse tree from the parsing process
Transparency No. 7
Limitation of regular expression
Recall how we define lexical structures using regular expressions.
Ex (1):
  digits = [0-9]+
  sum = (digits "+")* digits
matches sums of the form 28 + 301 + 9.
Can we use the same approach to define the syntax of a language?
Transparency No. 8
Inadequacy of regular expressions
Consider how to use regular expressions to express sums with parentheses: (109+235), 61, (1+(250+34)).
A solution (2):
  digits = [0-9]+
  sum = expr "+" expr
  expr = "(" sum ")" | digits
Note the difference between (1) and (2): in (1), digits and sum can be (and actually are) treated as abbreviations of their right-hand sides (RHS); in (2), the LHS names are used recursively in their RHS, and cannot be treated as abbreviations.
Transparency No. 9
Inadequacy of regular expressions
To translate (1)
  digits = [0-9]+
  sum = (digits "+")* digits
into an FA, we first turn each definition into a plain regular expression by replacing every reference to a definition by its RHS, recursively:
  sum = ([0-9]+ "+")* [0-9]+
and then apply the normal procedure to translate regular expressions into DFAs. But for definitions with recursion, like (2), this approach does not work.
Transparency No. 10
(2):
  1. digits = [0-9]+
  2. sum = expr "+" expr
  3. expr = "(" sum ")" | digits
Replace sum in 3 by its RHS from 2:
  expr = "(" expr "+" expr ")" | digits   ---- (4)
Replace expr in (4) by itself:
  expr = "(" ("(" expr "+" expr ")" | digits) "+" ("(" expr "+" expr ")" | digits) | digits   --- (5)
Conclusion: it is hopeless to eliminate names in the RHS by finitely many substitutions if there are direct or indirect recursions.
Transparency No. 11
It can be shown that regular expressions with non-recursive abbreviations do not increase the expressive power of regular expressions, but regular expressions allowing recursive abbreviations do increase it.
The resulting formalism is called the context-free grammar, and it is just what we need for parsing.
Transparency No. 12
Redundancy induced by recursion
Inner alternation (|) is not needed:
  expr = a b (c|d) e   ==>   aux = c | d
                             expr = a b aux e
                       ==>   aux = c
                             aux = d
                             expr = a b aux e
Repetition (*) is not needed:
  expr = (a b c)*      ==>   expr = a b c expr | ε   --- right recursion
                       (or   expr = expr a b c | ε   --- left recursion)
Transparency No. 13
Context-Free Grammars
Programming language constructs have recursive structure
An EXPR is
  if EXPR then EXPR else EXPR fi, or
  while ( EXPR ) EXPR, or
  …
Context-free grammars are a natural notation for this recursive structure.
Transparency No. 14
CFGs (Cont.)
A CFG consists of:
- a set of terminals T
- a set of non-terminals N
- a start symbol S (a non-terminal)
- a set P of productions, where each rule r has the form (assuming X ∈ N):
    X → ε, or
    X → Y1 Y2 … Yn, where each Yi ∈ N ∪ T
Transparency No. 15
Notational Conventions:
In these lecture notes:
- Non-terminals are written upper-case (S, A, B, C, …)
- Terminals are written lower-case (a, b, c, …)
- X, Y, Z, … range over T ∪ N; α, β, γ, … range over strings in (T ∪ N)*
- The start symbol (S) is the left-hand side of the first production
- Each terminal symbol corresponds to a token type from the lexer
- Each non-terminal symbol corresponds to a symbol occurring on the LHS of a production rule
Transparency No. 16
Examples of CFGs
Expr → if Expr then Expr else Expr
     | while Expr do Expr
     | id
Transparency No. 17
Examples of CFGs
Simple arithmetic expressions:
E → E * E
  | E + E
  | ( E )
  | id
Transparency No. 18
The Language of a CFG
Read productions as replacement rules:
  X → Y1 … Yn
means X can be replaced by Y1 … Yn;
  X → ε
means X can be erased (replaced with the empty string).
Transparency No. 19
Key Idea
1. Begin with a string consisting of the start symbol S.
2. Replace any non-terminal X in the string by the right-hand side of some production X → Y1 … Yn.
3. Repeat (2) until there are no non-terminals in the string.
Transparency No. 20
The Language of a CFG (Cont.)
Write
  X1 X2 … Xi … Xn  →  X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production
  Xi → Y1 … Ym
More formally: α X β → α γ β if X → γ ∈ P; i.e., define → to be the binary relation { (αXβ, αγβ) | X → γ ∈ P } on (T ∪ N)*.
Transparency No. 21
The Language of a CFG (Cont.)
Write X1 … Xn →* Y1 … Ym
if X1 … Xn → … → Y1 … Ym in 0 or more steps.
Formally, define →* to be the reflexive and transitive closure of the relation →; i.e., α →* β iff α = β or there exist α1, …, αn (n ≥ 1) such that α → α1 → … → αn = β.
Transparency No. 22
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language L(G) of G is the set
  { α | S →* α, where α is a terminal string }.
Transparency No. 23
Terminals
Terminals are so called because there are no rules for replacing them
Once generated, terminals are permanent
Terminals ought to be token types of the language
Transparency No. 24
Examples
Strings of balanced parentheses. Two representations of the same grammar G:
  S → ( S )
  S → ε
or
  S → ( S ) | ε
The language generated is { (^i )^i | i ≥ 0 }.
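The membership test for this language can be coded directly from the grammar. A minimal Python sketch (function name is mine, not from the slides): a string is in the language iff it is empty, or it is "(" + an inner member + ")".

```python
def in_language(s: str) -> bool:
    """Recognize { '('*i + ')'*i : i >= 0 }, following S -> ( S ) | eps."""
    if s == "":                     # the rule S -> eps
        return True
    # the rule S -> ( S ): strip one matching outer pair and recurse
    return len(s) >= 2 and s[0] == "(" and s[-1] == ")" and in_language(s[1:-1])
```

Note that "()()" is rejected: this grammar generates only fully nested pairs, not sequences of them.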
Transparency No. 25
Arithmetic Example
Simple arithmetic expressions:
  E → E + E | E * E | ( E ) | id
Some elements of the language:
  id        id + id
  ( id )    id * id
  ( id ) * id * ( id )
Transparency No. 26
Notes
The idea of a CFG is a big step. But:
- Membership in a language is just "yes" or "no"; we also need the parse tree of the input
- Must handle errors gracefully
- Need (tools for) an implementation of CFGs
Transparency No. 27
Derivations and Parse Trees
A derivation is a sequence
  S0, S1, S2, …, Sn
of strings over terminals and non-terminals such that
1. Si → Si+1 for 0 ≤ i < n, and
2. S0 = S is the start symbol.
A derivation can be shown as the drawing of a tree:
- the start symbol is the tree's root;
- for a production X → Y1 … Yn, add children Y1 … Yn to node X.
Transparency No. 28
Derivation Example
Grammar:
  E → E + E | E * E | ( E ) | id
String:
  id * id + id
Transparency No. 29
Derivation Example (Cont.)
E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id

The resulting parse tree:

           E
         / | \
        E  +  E
      / | \    \
     E  *  E    id
     |     |
     id    id
Transparency Nos. 30-35
Derivation in Detail (1)-(6)
These six slides grow the derivation and its parse tree one step at a time:
  E → E + E → E * E + E → id * E + E → id * id + E → id * id + id
At each step, the replaced non-terminal gains the right-hand side of the applied production as children.
Transparency No. 36
Properties of parse trees.
A parse tree has:
- the start symbol at the root
- terminals or ε at the leaves
- non-terminals at the internal nodes
- if internal node X has children Y1, …, Yn, then X → Y1 Y2 … Yn is a production rule
An in-order traversal of the leaves is the original input.
The parse tree makes explicit the structure of the input string.
Transparency No. 37
Left-most and Right-most Derivations
The example below is a right-most derivation: at each step, the right-most non-terminal is replaced. There is an equivalent notion of a left-most derivation (used in the previous example).
E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id
Transparency Nos. 38-43
Right-most Derivation in Detail (1)-(6)
These six slides grow the right-most derivation and its parse tree step by step:
  E → E + E → E + id → E * E + id → E * id + id → id * id + id
Transparency No. 44
Derivations and Parse Trees
Note that right-most and left-most derivations have the same parse tree
The difference is the order in which branches are added
Transparency No. 45
Summary of Derivations
We are not just interested in whether s ∈ L(G); we need a parse tree for s.
A derivation defines a parse tree, but one parse tree may have many derivations.
Left-most and right-most derivations are important in parser implementation in that each can serve as the canonical derivation of a parse tree: every parse tree has a unique left-most (and a unique right-most) derivation.
Transparency No. 46
Issues
A parser consumes a sequence of tokens s and produces a parse tree.
Issues:
- How to recognize that s ∈ L(G)?
- How to generate a parse tree of s once s ∈ L(G)?
- Ambiguity: is there more than one parse tree (interpretation) for some string s?
- Error handling: what should we do if no parse tree exists for an input string?
Transparency No. 47
Ambiguity
Grammar:
  E → E + E | E * E | ( E ) | int
String:
  int * int + int
Transparency No. 48
Ambiguity (Cont.)
This string has two parse trees:

       E                      E
     / | \                  / | \
    E  +  E                E  *  E
  / | \    \               |   / | \
 E  *  E   int            int E  +  E
 |     |                      |     |
int   int                    int   int

i.e., (int * int) + int versus int * (int + int).
Transparency No. 49
Ambiguity (Cont.)
A grammar is ambiguous if it has more than one parse tree for some string; equivalently, if there is more than one right-most (or left-most) derivation for some string.
Ambiguity is bad: it leaves the meaning of some programs ill-defined.
Ambiguity is common in programming languages: arithmetic expressions, IF-THEN-ELSE.
Transparency No. 50
Dealing with Ambiguity
There are several ways to handle ambiguity
Most direct method is to rewrite the grammar unambiguously
E → T + E | T
T → int * T | int | ( E )
This enforces precedence of * over +.
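The precedence enforced by the rewritten grammar can be seen by coding it as a recursive-descent sketch that builds nested tuples. This is illustrative Python; the token spelling and function names are mine, not from the slides:

```python
# Recursive descent for the unambiguous grammar
#   E -> T + E | T
#   T -> int * T | int | ( E )
# Each function returns (tree, next_index); trees are nested tuples.
def parse_E(toks, i):
    left, i = parse_T(toks, i)
    if i < len(toks) and toks[i] == "+":       # E -> T + E
        right, i = parse_E(toks, i + 1)
        return ("+", left, right), i
    return left, i                             # E -> T

def parse_T(toks, i):
    if toks[i] == "(":                         # T -> ( E )
        e, i = parse_E(toks, i + 1)
        assert toks[i] == ")", "missing )"
        return e, i + 1
    assert toks[i] == "int", "expected int"
    if i + 1 < len(toks) and toks[i + 1] == "*":   # T -> int * T
        right, i2 = parse_T(toks, i + 2)
        return ("*", "int", right), i2
    return "int", i + 1                        # T -> int
```

Because * is parsed inside T, it nests below +, matching the precedence the grammar enforces.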
Transparency No. 51
Ambiguity: The Dangling Else
Consider the grammar
  E → if E then E
    | if E then E else E
    | OTHER
This grammar is also ambiguous.
Transparency No. 52
The Dangling Else: Example
The expression
  if E1 then if E2 then E3 else E4
has two parse trees:
  if E1 then (if E2 then E3 else E4)    --- else matches the closest then
  if E1 then (if E2 then E3) else E4    --- else matches the outer then
• Typically we want the form in which else matches the closest then.
Transparency No. 53
The Dangling Else: A Fix
else matches the closest unmatched then. We can describe this in the grammar:
  E → MIF            /* all then are matched */
    | UIF            /* some then are unmatched */
  MIF → if E then MIF else MIF
      | OTHER
  UIF → if E then E
      | if E then MIF else UIF
This describes the same set of strings.
Transparency No. 54
The Dangling Else: Example Revisited
The expression if E1 then if E2 then E3 else E4 now has a unique parse tree:
  if E1 then (if E2 then E3 else E4)
  --- a valid parse tree: the outer if is a UIF whose body is a MIF
  if E1 then (if E2 then E3) else E4
  --- not valid: the then-expression would have to be a MIF, but if E2 then E3 is a UIF
Transparency No. 55
Ambiguity
There are no general techniques for handling ambiguity, and it is impossible to convert an ambiguous grammar to an unambiguous one automatically. However, used with care, ambiguity can simplify the grammar and sometimes allows more natural definitions; we then need disambiguation mechanisms.
Transparency No. 56
Precedence and Associativity Declarations
Instead of rewriting the grammar:
- use the more natural (ambiguous) grammar
- along with disambiguating declarations
Most tools allow precedence and associativity declarations to disambiguate grammars. Examples follow.
Transparency No. 57
Associativity Declarations
Consider the grammar
  E → E - E | int
It is ambiguous: int - int - int has two parse trees,
  (int - int) - int    and    int - (int - int).
• Left-associativity declaration: %left + -
This selects the left-associative tree, (int - int) - int.
Transparency No. 58
Precedence Declarations
Consider the grammar
  E → E + E | E * E | int
and the string int + int * int. It has two parse trees,
  (int + int) * int    and    int + (int * int).
• Precedence declarations:
    %left + -
    %left * /
Declaring * and / after + and - gives them higher precedence, selecting int + (int * int).
Transparency No. 59
Abstract Syntax Trees
So far, the parser traces the derivation of a sequence of tokens to generate a parse tree. Such a tree is called a concrete parse tree, since it reflects the full syntactic structure of the input. The rest of the compiler needs a structure more suitable for representing the program: the abstract syntax tree, abbreviated AST, which is like a parse tree but ignores some details.
Transparency No. 60
Abstract Syntax Tree. (Cont.)
Consider the grammar
  E → int | ( E ) | E + E
and the string 5 + (2 + 3).
After lexical analysis (a list of tokens):
  int5 '+' '(' int2 '+' int3 ')'
During parsing we build a parse tree …
Transparency No. 61
Example of Parse Tree
The parse tree of 5 + (2 + 3):

          E
        / | \
       E  +  E
       |   / | \
     int5 (  E  )
           / | \
          E  +  E
          |     |
        int2  int3

Characteristics of the parse tree:
- traces the operation of the parser
- does capture the nesting structure
- but contains too much information: parentheses, single-successor nodes
Transparency No. 62
Example of Abstract Syntax Tree
The AST also captures the nesting structure, but abstracts from the concrete syntax, so it is more compact and easier to use. It is an important data structure in a compiler.

    PLUS
   /    \
  5     PLUS
       /    \
      2      3
Transparency No. 63
Constructing An AST
We first define the AST class hierarchy: ASTNode, with subclasses IntNode and PlusNode.
Consider an abstract tree type with two constructors:
  new IntNode(n)       =  the leaf n
  new PlusNode(T1, T2) =  the tree
                              PLUS
                             /    \
                            T1    T2
Transparency No. 64
Semantic Actions (Syntax-directed definition )
This is what we’ll use to construct ASTs
Each grammar symbol may have attributes; for terminal symbols (lexical tokens), attributes can be calculated by the lexer.
Each production may have an action, written as
  X → Y1 … Yn   { action }
that can refer to or compute symbol attributes.
Transparency No. 65
Constructing an AST
We define an attribute ast for non-terminals; the values of ast attributes are ASTs. We assume that int.lexval is the value of the integer lexeme. The attribute is computed using semantic actions:
  E → int        E.ast = new IntNode(int.lexval)
    | E1 + E2    E.ast = new PlusNode(E1.ast, E2.ast)
    | ( E1 )     E.ast = E1.ast
Transparency No. 66
Parse Tree Example
Consider the string int5 '+' '(' int2 '+' int3 ')'. A bottom-up evaluation of the ast attribute gives:
  E.ast = new PlusNode(new IntNode(5),
                       new PlusNode(new IntNode(2), new IntNode(3)))
which is the AST

    PLUS
   /    \
  5     PLUS
       /    \
      2      3
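A Python sketch of these AST classes; the eval method is my addition for illustration (the slides only build the tree):

```python
# The AST class hierarchy from the slides: IntNode and PlusNode.
class IntNode:
    def __init__(self, n):
        self.n = n
    def eval(self):            # illustrative: interpret the leaf
        return self.n

class PlusNode:
    def __init__(self, t1, t2):
        self.t1, self.t2 = t1, t2
    def eval(self):            # illustrative: interpret the addition
        return self.t1.eval() + self.t2.eval()

# E.ast for "5 + (2 + 3)" as computed bottom-up by the semantic actions:
ast = PlusNode(IntNode(5), PlusNode(IntNode(2), IntNode(3)))
```

Note that the parentheses of the input leave no trace in the AST; only the nesting survives.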
Transparency No. 67
Review
We can specify language syntax using a CFG. A parser answers whether s ∈ L(G) and builds a parse tree, which we convert to an AST and pass on to the rest of the compiler.
Next lectures: how do we answer s ∈ L(G) and build a parse tree?
Transparency No. 68
Introduction to Top-Down Parsing
Terminals are seen in order of appearance in the token stream:
  t2 t5 t6 t8 t9
The parse tree is constructed from the top, and from left to right (internal nodes are numbered in order of creation):

        1
       / \
     t2   3
         / \
        4   7
       / \  / \
     t5  t6 t8 t9
Transparency No. 69
Recursive Descent Parsing
Intuitions: for a set of productions with the same LHS X,
  X → Y1 Y2 … Yn | …
- non-terminal X        => procedure definition: X() {…}
- non-terminal Y in RHS => procedure call: Y()
- terminal b in RHS     => match(b) : boolean
- concatenation Y Z     => sequence: Y(); Z()
- choice Y1 Y2 | Z1 Z2  => if (?) { Y1(); Y2(); } else if (?) { Z1(); Z2(); }
  using one or more lookahead symbols to help decide the branch.
Ex: E → T + E | T
  E() { if (?) { T(); match(+); E(); } else T(); }
Transparency No. 70
Recursive Descent Parsing. Example (Cont.)
Consider the grammar
  E → T + E | T
  T → int | int * T | ( E )
Start with the top-level non-terminal E. The token stream is: int5 * int2.
Try the rules for E in order. Try E0 → T1 + E2:
- Try a rule for T1: ( E3 ). But ( does not match the input token int5.
- Try T1 → int. The token matches, but + after T1 does not match the input token *.
- Try T1 → int * T2. This will match, but + after T1 will be unmatched.
We have exhausted the choices for T1; backtrack to the choice for E0.
Transparency No. 71
Recursive Descent Parsing. Example (Cont.)
Try E0 → T1 and follow the same steps as before for T1.
This time we succeed with T1 → int * T2 and T2 → int, resulting in the parse tree

       E0
       |
       T1
     / | \
  int5 *  T2
          |
        int2
Transparency No. 72
Recursive Descent Parsing. Notes.
Easy to implement by hand
But does not always work …
Implementation of a Recursive Descent Parser
Transparency No. 74
A Recursive Descent Parser. Preliminaries
Let Token be the type of all token objects.
  public class Token {
    int type;                       // token type
    public final static int INT = 1,
      OPEN = 2, CLOSE = 3, PLUS = 4, TIMES = 5,
      … ;                           // constants for all token types
    …                               // other fields omitted
    // the global tok points to the next token to be matched
    public static Token tok;
    // next() returns the token following 'this' from the lexer
    public Token next() {…}
    public static void advance() { tok = tok.next(); }
    public static void eat(int type) {
      if (tok.type == type) advance(); else error();
    }
    …
  }
Transparency No. 75
A Recursive Descent Parser (2)
Define boolean functions that check the input token stream for a match of:
- a given token type (terminal):
    bool term(int type) {
      if (tok.type == type) { advance(); return true; }
      return false;
    }
- a given production of S (the nth rule): bool Sn() { … }   // 'and' the tests inside the body
- any production of a non-terminal S: bool S() { … }        // 'or' the tests inside the body
These functions eat tokens.
Transparency No. 76
A Recursive Descent Parser (3)
For production E → T + E:
  bool E1() { return T() && term(PLUS) && E(); }
For production E → T:
  bool E2() { return T(); }
For all productions of E (with backtracking):
  bool E() {
    Token save = Token.tok;
    if (E1()) return true;
    Token.tok = save;    // E1() failed: restore the saved token and try the next rule
    if (E2()) return true;
    …
    Token.tok = save;    // En-1() failed: restore and try the last rule
    return En();
  }
Transparency No. 77
A Recursive Descent Parser (4)
Functions for non-terminal T:
  bool T1() { return term(OPEN) && E() && term(CLOSE); }
  bool T2() { return term(INT) && term(TIMES) && T(); }
  bool T3() { return term(INT); }
  bool T() {
    Token save = Token.tok;
    if (T1()) return true;
    Token.tok = save;    // T1() failed: backtrack and try the next rule
    if (T2()) return true;
    Token.tok = save;    // T2() failed: backtrack and try the last rule
    return T3();
  }
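The same backtracking parser can be sketched in Python, replacing the Token class's global pointer with a global index into the token list (structure otherwise follows the slides):

```python
# Backtracking recursive descent for
#   E -> T + E | T        T -> ( E ) | int * T | int
toks, pos = [], 0

def term(t):
    """Match one terminal; advance on success."""
    global pos
    if pos < len(toks) and toks[pos] == t:
        pos += 1
        return True
    return False

def E1(): return T() and term("+") and E()    # E -> T + E
def E2(): return T()                          # E -> T

def E():
    global pos
    save = pos
    if E1(): return True
    pos = save                                # backtrack, try the next rule
    return E2()

def T1(): return term("(") and E() and term(")")    # T -> ( E )
def T2(): return term("int") and term("*") and T()  # T -> int * T
def T3(): return term("int")                        # T -> int

def T():
    global pos
    save = pos
    if T1(): return True
    pos = save
    if T2(): return True
    pos = save
    return T3()

def parse(tokens):
    global toks, pos
    toks, pos = tokens, 0
    return E() and pos == len(toks)
```

As the slides note, this style does not always work; the sketch only mirrors the example's local backtracking.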
Transparency No. 78
Recursive Descent Parsing. Notes.
To start the parser:
1. invoke Token.init(lexer), a static init(Lexer lexer) {…} inside Token that sets tok to the first token;
2. invoke E().
Notice how this simulates the previous backtracking example.
It is easy to implement by hand, but it does not always work and is not efficient.
Predictive parsing (without backtracking) is better.
Transparency No. 79
Recursive descent parser with lookahead
Recursive descent parsers are unpopular because of inefficient backtracking. In practice:
1. Use lookahead symbols to predict which alternative rule to match, and thus avoid some backtracking (though this alone cannot avoid all backtracking!).
2. Restrict the grammar to a specific form so that backtracking is not needed.
Transparency No. 80
Example
Grammar 3.11:
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num = num
The normal procedure for S() with backtracking:
  bool S() {
    Token save = Token.tok;
    if (S1()) return true;  Token.tok = save;
    if (S2()) return true;  Token.tok = save;
    return S3();
  }
Observations:
- We can avoid unnecessary tries of S1() and S2() if we know Token.tok is a 'print' token.
- S1() and S2() can also be expanded in-line in S().
- A switch(Token.tok.type) {…} construct can be used to determine which Si() to try.
Transparency No. 81
The Improvement (no backtracking)
void S() {
  switch (Token.tok.type) {
    case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
    case BEGIN: eat(BEGIN); S(); L(); break;
    case PRINT: eat(PRINT); E(); break;
    default:    error();
  }
}
void L() {
  switch (Token.tok.type) {
    case END:  eat(END); break;
    case SEMI: eat(SEMI); S(); L(); break;
    default:   error();
  }
}
void E() { eat(NUM); eat(EQ); eat(NUM); }
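A Python transliteration of this no-backtracking parser for Grammar 3.11 (using string tokens instead of the slides' token-type constants):

```python
# Predictive parser for Grammar 3.11:
#   S -> if E then S else S | begin S L | print E
#   L -> end | ; S L
#   E -> num = num
toks, pos = [], 0

def eat(t):
    global pos
    if pos >= len(toks) or toks[pos] != t:
        raise SyntaxError(f"expected {t}")
    pos += 1

def S():
    if toks[pos] == "if":
        eat("if"); E(); eat("then"); S(); eat("else"); S()
    elif toks[pos] == "begin":
        eat("begin"); S(); L()
    elif toks[pos] == "print":
        eat("print"); E()
    else:
        raise SyntaxError(toks[pos])

def L():
    if toks[pos] == "end":
        eat("end")
    elif toks[pos] == ";":
        eat(";"); S(); L()
    else:
        raise SyntaxError(toks[pos])

def E():
    eat("num"); eat("="); eat("num")

def parse(tokens):
    global toks, pos
    toks, pos = tokens, 0
    S()
    return pos == len(toks)
```

One lookahead token selects the rule, so no token position is ever saved or restored.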
Transparency No. 82
Try with another grammar, Grammar 3.10:
  S → E $        // $ is EOF (the end-of-file marker)
  E → E + T | E - T | T
  T → T * F | T / F | F
  F → id | num | ( E )
The translated program:
  void S() { E(); eat(EOF); }
  void E() {
    switch (Token.tok.type) {
      case ?: E(); eat(PLUS); T(); break;
      case ?: E(); eat(MINUS); T(); break;
      case ?: T();
    }
  }
Problem: there is no apparent terminal which E() can use to decide which clause to proceed with. In fact, all clauses are possible: e.g., (2) + 3, (2) - 3, 2 * 3.
Transparency No. 83
void T() {
  switch (tok.type) {
    case ?: T(); eat(TIMES); F(); break;
    case ?: T(); eat(DIV); F(); break;
    case ?: F();
  }
}
void F() {
  switch (tok.type) {
    case ID:     eat(ID); break;
    case NUM:    eat(NUM); break;
    case LPAREN: eat(LPAREN); E(); eat(RPAREN);
  }
}
When will a predictive parser not work? When there are multiple production rules for a non-terminal starting with the same first terminal symbol. This can sometimes be resolved by factoring out common left parts.
Transparency No. 84
When Will Recursive Descent Not Work
Consider a production S → S a. In the process of parsing S we try the above rule:
  bool S() { return S() && term(a); }    or    void S() { S(); eat(a); }
What goes wrong? S() calls itself immediately without consuming any input, so it recurses forever.
A left-recursive grammar has a non-terminal S with S →+ S α for some α. Recursive descent does not work in such cases.
Transparency No. 85
Elimination of Left Recursion
Consider the left-recursive grammar
  S → S α | β
S generates all strings starting with a β and followed by any number of α's, i.e., β α*.
We can rewrite this using right recursion:
  S → β S'
  S' → α S' | ε
Transparency No. 86
More Elimination of Left-Recursion
In general,
  S → S α1 | … | S αn | β1 | … | βm
All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn, i.e., (β1 | … | βm)(α1 | … | αn)*.
Rewrite as
  S → β1 S' | … | βm S'
  S' → α1 S' | … | αn S' | ε
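This rewriting is mechanical. A Python sketch for immediate left recursion (the rule representation, symbol tuples with () standing for ε, is my choice):

```python
# Eliminate immediate left recursion for one non-terminal.
# Rules  S -> S a1 | ... | S an | b1 | ... | bm   become
#   S  -> b1 S' | ... | bm S'
#   S' -> a1 S' | ... | an S' | eps   (eps is the empty tuple)
def eliminate_left_recursion(nt, rules):
    rec  = [r[1:] for r in rules if r and r[0] == nt]   # the alpha_i
    base = [r for r in rules if not r or r[0] != nt]    # the beta_j
    nt2 = nt + "'"                                      # fresh non-terminal S'
    return {
        nt:  [b + (nt2,) for b in rec and base or base],
        nt2: [a + (nt2,) for a in rec] + [()],
    } if rec else {nt: rules}                           # nothing to do if not left-recursive
```

For the slides' S → S α | β (written with concrete symbols "a" and "b" here), the result is S → b S' and S' → a S' | ε.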
Transparency No. 87
General Left Recursion
The grammar
  S → A α | δ
  A → S β
is also left-recursive, because S →+ S β α.
This (indirect) left recursion can also be eliminated; see [Dragon book, Section 4.3] for the general algorithm.
Transparency No. 88
Summary of Recursive Descent
- Simple and general parsing strategy
- Left recursion must be eliminated first … but that can be done automatically
- Unpopular because backtracking is thought to be too inefficient
- In practice, backtracking is eliminated by restricting the grammar
Transparency No. 89
Predictive Parsers
Table-based: encode the grammar rules in a parsing table instead of program code.
Like recursive descent, but the parser can "predict" which production to use by looking at the next few tokens, with no backtracking.
Predictive parsers accept LL(k) grammars:
- the 1st L means "left-to-right" scan of the input
- the 2nd L means "leftmost derivation"
- k means "predict based on at most k tokens of lookahead"
In practice, LL(1) is used.
Transparency No. 90
LL(1) Languages
In recursive descent, for each non-terminal and input token there may be a choice of productions.
LL(1) means that for each non-terminal and token there is at most one production.
This can be specified as a 2D table:
- one dimension for the current non-terminal to expand
- one dimension for the next token
- a table entry contains zero or one production
Ex:
  S → if E then S else S   --- (1)
  S → begin S L            --- (2)
  S → print E              --- (3)
Then table[S][if] = (1), table[S][begin] = (2), etc.
Transparency No. 91
Predictive Parsing and Left Factoring
Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )
It is hard to predict because:
- for T, two productions start with int
- for E, it is not clear how to predict which rule to use
A grammar must be left-factored before it can be used for predictive parsing.
Transparency No. 92
Left-Factoring Example
Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )
• Factor out common prefixes of productions:
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
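One step of left factoring can likewise be mechanized. An illustrative Python sketch (the fresh-name scheme and rule representation are mine):

```python
from collections import defaultdict

# One step of left factoring: group a non-terminal's productions by their
# first symbol; any group sharing a prefix symbol gets a fresh non-terminal.
def left_factor(nt, rules):
    groups = defaultdict(list)
    for r in rules:                          # rules are tuples; () is eps
        groups[r[0] if r else ()].append(r)
    out, fresh = {nt: []}, 0
    for head, rs in groups.items():
        if len(rs) == 1:
            out[nt].append(rs[0])            # unique prefix: keep as is
        else:
            fresh += 1
            nt2 = f"{nt}{fresh}"             # fresh non-terminal, e.g. T1
            out[nt].append((head, nt2))      # nt -> head nt2
            out[nt2] = [r[1:] for r in rs]   # nt2 -> the factored tails
    return out
```

Applied to T → int | int * T | ( E ), this yields T → int T1 | ( E ) with T1 → ε | * T, matching the slide's int Y factoring up to renaming.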
Transparency No. 93
LL(1) Parsing Table Example
The left-factored grammar:
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε
The LL(1) parsing table:

        int      *      +      (      )      $
  E     T X                    T X
  X                    + E            ε      ε
  T     int Y                  ( E )
  Y              * T   ε              ε      ε
Transparency No. 94
LL(1) Parsing Table Example (Cont.)
Consider the [E, int] entry: "when the current non-terminal is E and the next input is int, use production E → T X". This production can generate an int in the first position: T X →* int …
Consider the [Y, +] entry: "when the current non-terminal is Y and the current token is +, get rid of Y" (i.e., use Y → ε). Y can be followed by + only in a derivation in which … X Y + … →* … X + …
Transparency No. 95
LL(1) Parsing Tables. Errors
Blank entries indicate error situations. Consider the [E, *] entry: "there is no way to derive a string starting with * from non-terminal E".
Transparency No. 96
Using Parsing Tables
The method is similar to recursive descent, except that for each non-terminal S we look at the next token a and choose the production shown at [S, a].
We use a stack to keep track of pending non-terminals.
We reject when we encounter an error entry, and accept when the stack is empty.
Transparency No. 97
LL(1) Parsing Algorithm
initialize: stack = <S $>, tok = (pointer to the next token)
repeat
  case stack of
    <X, rest> : if T[X, tok.type] == Y1…Yn
                then stack ← <Y1 … Yn rest>
                else error();
    <t, rest> : if t == tok.type
                then { stack ← <rest>; advance(); }
                else error();
until stack == < >
Transparency No. 98
LL(1) Parsing Example
  Stack           Input          Action
  E $             int * int $    E → T X
  T X $           int * int $    T → int Y
  int Y X $       int * int $    match int
  Y X $           * int $        Y → * T
  * T X $         * int $        match *
  T X $           int $          T → int Y
  int Y X $       int $          match int
  Y X $           $              Y → ε
  X $             $              X → ε
  $               $              ACCEPT
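The trace above can be reproduced with a small table-driven LL(1) driver in Python; the table is transcribed from the slide, with ε entries as empty lists:

```python
# LL(1) table for:  E -> T X ; X -> + E | eps ; T -> ( E ) | int Y ; Y -> * T | eps
TABLE = {
    ("E", "int"): ["T", "X"],   ("E", "("): ["T", "X"],
    ("X", "+"):   ["+", "E"],   ("X", ")"): [],  ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"):   ["*", "T"],   ("Y", "+"): [],  ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMS = {"E", "X", "T", "Y"}

def ll1_parse(tokens):
    stack, toks, i = ["$", "E"], tokens + ["$"], 0
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            rhs = TABLE.get((top, toks[i]))
            if rhs is None:                 # blank entry: error
                return False
            stack.extend(reversed(rhs))     # push RHS, leftmost symbol on top
        else:
            if top != toks[i]:              # terminal must match the input
                return False
            i += 1
    return i == len(toks)
```

Running it on ["int", "*", "int"] performs exactly the stack moves of the slide's trace.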
Transparency No. 99
Constructing Parsing Tables
LL(1) languages are those with an LL(1) parsing table: no table entry may contain more than one production.
How do we generate the parsing table from a CFG?
Transparency No. 100
Constructing Parsing Tables (Cont.)
If A → α, in which columns of the row of A do we place α? I.e., when is table[A][b] = α?
1. In the column of b if b can start a string derived from α: α →* b β. We say that b ∈ First(α).
2. In the column of b if α can reduce to ε and b can follow an A: S →* β A b δ. We say that b ∈ Follow(A).
Transparency No. 101
Computing Nullable Symbols
Definition:
A non-terminal X is nullable if it can derive the empty string: X →* ε.
Given a grammar G, the set Nullable(G) of nullable symbols can be defined recursively as follows:
1. Basis: X is nullable if X → ε is a rule of G.
2. Recursion: if Y1, Y2, …, Yk (k > 0) are nullable and X → Y1 Y2 … Yk is a rule, then X is nullable.
Transparency No. 102
Example
Let G be:
  S → A C A
  A → a A a | B | C
  B → b B | b
  C → c C | ε
By (1), C is nullable                --- (3)
By (2) and (3), A is nullable        --- (4)
By (2), (3), (4), S is nullable      --- (5)
By (1)-(5), no further nullable symbol can be found.
Hence Nullable(G) = {A, C, S}.
Transparency No. 103
For each non-terminal X define a number order(X) as follows:
- order(X) = 0 if X → ε is a rule of G;
- if X → Y1 … Yk is a rule with max{ order(Yi) | i = 1, …, k } = t, then order(X) = t + 1, provided there is no other rule X → Z1 … Zm with max{ order(Zi) | i = 1, …, m } < t.
Ex: in the previous example, order(C) = 1, order(A) = 2, order(S) = 3.
Theorem:
1. X is nullable iff order(X) is defined.
2. order(X) ≤ the number of non-terminals.
Transparency No. 104
Algorithm for computing Nullable(G)
Input: a CFG G. Output: NG, the set of all nullable symbols.
  NG = {};
  1. for every rule r: if r = X → ε then NG = NG ∪ {X};
  2. repeat
       NG' = NG;
       for each rule r:
         if r = X → Y1 Y2 … Yk and {Y1, …, Yk} ⊆ NG', then NG = NG ∪ {X};
     until NG = NG'   // no change of NG in the iteration
Correctness:
1. Step 1 computes all nullables of order 0.
2. The kth iteration of step 2 computes the nullables of order k.
3. Since all nullables are of finite order, the program must terminate.
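A Python sketch of this fixed-point computation, checked against the example grammar G from the previous slide (a grammar is a dict mapping each non-terminal to a list of right-hand sides, written as symbol tuples with () for ε):

```python
# Fixed-point computation of Nullable(G).
def nullable(grammar):
    ng = set()
    changed = True
    while changed:                       # one pass per "order" level
        changed = False
        for x, rhss in grammar.items():
            if x in ng:
                continue
            for rhs in rhss:
                if all(y in ng for y in rhs):   # () vacuously satisfies this
                    ng.add(x)
                    changed = True
                    break
    return ng

G = {
    "S": [("A", "C", "A")],
    "A": [("a", "A", "a"), ("B",), ("C",)],
    "B": [("b", "B"), ("b",)],
    "C": [("c", "C"), ()],
}
```

Each pass of the while loop adds exactly the nullables of the next order, mirroring the correctness argument on the slide.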
Transparency No. 105
Computing First Sets
Definition: First(X) = { b | X →* b α }. First(X) can be defined recursively:
1. Basis: First(b) = {b} for a terminal b.
2. Recursion: if X → A1 A2 … An Y α is a rule and A1, …, An are all nullable, then First(Y) ⊆ First(X).
We can extend First(-) to strings: First(α) = { b | α →* b β }, where α is a sequence of symbols. Then
  First(Y1 … Yk) = ∪ { First(Yj) | Y1, …, Yj-1 are all nullable }.
Notes: A, B, C are non-terminals; a, b, c are terminals; X, Y, Z, … are terminals or non-terminals; α, β are strings of terminals or non-terminals.
Transparency No. 106
First Sets. Example
Recall the grammar
  E → T X            X → + E | ε
  T → ( E ) | int Y  Y → * T | ε
Nullable set: {X, Y}.
First sets:
  First( ( )   = { ( }     First( T ) = { int, ( }
  First( ) )   = { ) }     First( E ) = First(T) = { int, ( }
  First( int ) = { int }   First( X ) = { + }
  First( + )   = { + }     First( Y ) = { * }
  First( * )   = { * }
Transparency No. 107
Algorithm for computing First(X)
Input: a CFG G. Output: First(X) for all symbols X.
  1. for each terminal b: First(b) = {b};
  2. repeat
       for each rule X → Y1 Y2 … Yk (k > 0):
         2.1 let t be the index of the first non-nullable symbol Yt on the RHS,
             or k+1 if all symbols are nullable;
         2.2 for j = 1 to min(t, k):
               First(X) = First(X) ∪ First(Yj);
     until no First(-) set changes in the iteration.
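A Python sketch of the First fixed point, checked against the left-factored grammar's First sets from the earlier slide (ε is handled through the nullable set rather than stored in First):

```python
# Fixed-point computation of First for all symbols.
def first_sets(grammar, terminals, null):
    first = {t: {t} for t in terminals}          # basis: First(b) = {b}
    first.update({x: set() for x in grammar})
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            for rhs in rhss:
                for y in rhs:                    # scan RHS left to right
                    before = len(first[x])
                    first[x] |= first[y]
                    changed |= len(first[x]) != before
                    if y not in null:            # stop at first non-nullable
                        break
    return first

# Left-factored grammar: E -> T X ; X -> + E | eps ; T -> ( E ) | int Y ; Y -> * T | eps
G = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = first_sets(G, {"+", "*", "(", ")", "int"}, {"X", "Y"})
```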
Transparency No. 108
Computing Follow Sets
Definition: Follow(X) = { b | S →* β X b δ }.
Intuition, for a rule X → A B:
- First(B) ⊆ Follow(A), and Follow(X) ⊆ Follow(B)
- also, if B →* ε, then Follow(X) ⊆ Follow(A)
- if S is the start symbol, then $ ∈ Follow(S)
Recursive definition:
Basis: 1. $ ∈ Follow(S).  2. if X → … Y α, then First(α) ⊆ Follow(Y).
Recursion: 3. if X → … A α and α is nullable, then Follow(X) ⊆ Follow(A).
Note: rules 1 and 2 account for following siblings, while rule 3 accounts for following "cousins".
Transparency No. 109
Computing Follow Sets (Cont.)
Algorithm:
  0. Follow(S) = {$}; Follow(X) = {} for X ≠ S;
  1. for each rule X → Y1 Y2 … Yk (k > 0):
       for i = 1 .. k-1:
         for j = i+1 .. k:
           Follow(Yi) = Follow(Yi) ∪ First(Yj);
           if Yj is not nullable then break;
  2. repeat
       for each production A → X1 X2 … Xn:
         for k = n .. 1:
           Follow(Xk) = Follow(Xk) ∪ Follow(A);
           if Xk is not nullable then break;
     until no Follow(-) set changes in the iteration.
Transparency No. 110
Follow Sets. Example
Recall the grammar
  E → T X            X → + E | ε
  T → ( E ) | int Y  Y → * T | ε
Follow sets:
  Follow( + )   = { int, ( }   Follow( * ) = { int, ( }
  Follow( ( )   = { int, ( }   Follow( ) ) = { +, ), $ }
  Follow( int ) = { *, +, ), $ }
  Follow( X )   = { ), $ }     Follow( T ) = { +, ), $ }
  Follow( E )   = { ), $ }     Follow( Y ) = { +, ), $ }
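A Python sketch of the Follow fixed point, computing the non-terminal Follow sets (which are what the parsing table needs); the First sets are inlined as literals from the earlier slide:

```python
# Fixed-point computation of Follow for all non-terminals.
def follow_sets(grammar, start, first, null):
    follow = {x: set() for x in grammar}
    follow[start].add("$")                    # $ is in Follow(S)
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            for rhs in rhss:
                for i, y in enumerate(rhs):
                    if y not in follow:       # terminal: skip here
                        continue
                    before = len(follow[y])
                    for z in rhs[i + 1:]:     # First of what can follow y
                        follow[y] |= first[z]
                        if z not in null:
                            break
                    else:                     # rest of the RHS is nullable
                        follow[y] |= follow[x]
                    changed |= len(follow[y]) != before
    return follow

G = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST = {"E": {"int", "("}, "T": {"int", "("}, "X": {"+"}, "Y": {"*"},
         "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "int": {"int"}}
FOLLOW = follow_sets(G, "E", FIRST, {"X", "Y"})
```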
Transparency No. 111
Constructing LL(1) Parsing Tables
To construct a parsing table T for a CFG G:
For each production A → α in G:
- for each terminal b ∈ First(α): T[A, b] = α
- if α is nullable, then for each b ∈ Follow(A): T[A, b] = α
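The table-construction rule can be sketched directly in Python; the First and Follow sets are inlined from the previous slides:

```python
# First of a sequence of symbols, plus whether the whole sequence is nullable.
def first_of_seq(rhs, first, null):
    out = set()
    for z in rhs:
        out |= first[z]
        if z not in null:
            return out, False
    return out, True                    # every symbol was nullable

# Build T[A, b] per the slide: b in First(alpha), and Follow(A) if nullable.
def build_table(grammar, first, follow, null):
    table = {}
    for a, rhss in grammar.items():
        for rhs in rhss:
            fs, eps = first_of_seq(rhs, first, null)
            for b in fs:
                table[(a, b)] = rhs
            if eps:
                for b in follow[a]:
                    table[(a, b)] = rhs
    return table

G = {"E": [("T", "X")], "X": [("+", "E"), ()],
     "T": [("(", "E", ")"), ("int", "Y")], "Y": [("*", "T"), ()]}
FIRST = {"E": {"int", "("}, "X": {"+"}, "T": {"int", "("}, "Y": {"*"},
         "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "int": {"int"}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}
LL1 = build_table(G, FIRST, FOLLOW, {"X", "Y"})
```

The result reproduces the LL(1) table shown earlier, with the ε entries (empty tuples) placed under Follow of X and Y.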
Transparency No. 112
Notes on LL(1) Parsing Tables
If any entry contains multiple rules, then G is not LL(1). This happens if G is ambiguous, if G is left-recursive, if G is not left-factored, and in other cases as well.
Most programming language grammars are not LL(1). There are tools that build LL(1) tables.
Transparency No. 113
Summary
For some grammars there is a simple parsing strategy Predictive parsing
Next time: a more powerful parsing strategy