Chapter 4
Chang Chi-Chung
2007.5.17
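The Role of the Parser
[Figure: Source Program → Lexical Analyzer → Parser → Rest of Front End. The parser requests tokens from the lexical analyzer with getNextToken, passes a parse tree to the rest of the front end, which produces the intermediate representation. The phases share the Symbol Table.]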
The Types of Parsers for Grammars
Universal (any CFG)
  Cocke-Younger-Kasami
  Earley
Top-down (CFG with restrictions)
  Build parse trees from the root to the leaves.
  Recursive descent (predictive parsing)
  LL (Left-to-right, Leftmost derivation) methods
Bottom-up (CFG with restrictions)
  Build parse trees from the leaves to the root.
  Operator precedence parsing
  LR (Left-to-right, Rightmost derivation) methods: SLR, canonical LR, LALR
Representative Grammars
Left-recursive expression grammar:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
Non-left-recursive version:
  E → T E’
  E’ → + T E’ | ε
  T → F T’
  T’ → * F T’ | ε
  F → ( E ) | id
Ambiguous version:
  E → E + E | E * E | ( E ) | id
Error Handling
A good compiler should assist in identifying and locating errors.
  Lexical errors: important, compiler can easily recover and continue
  Syntax errors: most important for compiler, can almost always recover
  Static semantic errors: important, can sometimes recover
  Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required
  Logical errors: hard or impossible to detect
Error Recovery Strategies
  Panic mode: discard input until a token in a set of designated synchronizing tokens is found
  Phrase-level recovery: perform local correction on the input to repair the error
  Error productions: augment the grammar with productions for erroneous constructs
  Global correction: choose a minimal sequence of changes to obtain a global least-cost correction
Context-Free Grammar
A context-free grammar is a 4-tuple G = <T, N, P, S> where
  T is a finite set of tokens (terminal symbols)
  N is a finite set of nonterminals
  P is a finite set of productions of the form α → β, where α ∈ N and β ∈ (N ∪ T)*
  S ∈ N is a designated start symbol
Notational Conventions
  Terminals: a, b, c, … ∈ T (example: 0, 1, +, *, id, if)
  Nonterminals: A, B, C, … ∈ N (example: expr, term, stmt)
  Grammar symbols: X, Y, Z ∈ (N ∪ T)
  Strings of terminals: u, v, w, x, y, z ∈ T*
  Strings of grammar symbols (sentential forms): α, β, γ ∈ (N ∪ T)*
  The head of the first production is the start symbol, unless stated otherwise.
Derivations
The one-step derivation is defined by
  α A β ⇒ α γ β
where A → γ is a production in the grammar.
In addition, we define
  ⇒ is leftmost (⇒lm) if α does not contain a nonterminal
  ⇒ is rightmost (⇒rm) if β does not contain a nonterminal
  Reflexive-transitive closure ⇒* (zero or more steps)
  Positive closure ⇒+ (one or more steps)
Sentence and Language
Sentential form
  If S ⇒* α in the grammar G, then α is a sentential form of G.
Sentence
  A sentence of G is a sentential form that has no nonterminals.
Language
  The language generated by G is its set of sentences:
  L(G) = { w ∈ T* | S ⇒* w }
  A language that can be generated by a context-free grammar is said to be a context-free language.
  If two grammars generate the same language, the grammars are said to be equivalent.
Example
G = <T, N, P, S>
T = { +, *, (, ), -, id }
N = { E }
P = E → E + E
    E → E * E
    E → ( E )
    E → - E
    E → id
S = E
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)
Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Grammar
  E → E + E | E * E | ( E ) | id
Example: id + id * id has two leftmost derivations, and therefore two parse trees
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
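The ambiguity can be checked mechanically by counting parse trees. The following is a minimal sketch, not part of the original slides: `count_parse_trees` is a hypothetical helper that counts, for every span of the token list, how many ways it can be derived from E in the ambiguous grammar E → E + E | E * E | ( E ) | id.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | ( E ) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving toks[i:j] from E.
        n = 0
        if j - i == 1 and toks[i] == "id":
            n += 1                                  # E -> id
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            n += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):               # k = position of the top operator
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)  # E -> E + E or E -> E * E
        return n

    return count(0, len(toks))


print(count_parse_trees("id + id * id".split()))
```

A count greater than 1 for some sentence is exactly what the definition of an ambiguous grammar requires.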
Grammar Classification
A grammar G is said to be
  Regular (Type 3)
    right linear: A → a B or A → a
    left linear: A → B a or A → a
  Context free (Type 2)
    α → β where α ∈ N and β ∈ (N ∪ T)*
  Context sensitive (Type 1)
    α A β → α γ β where A ∈ N, α, γ, β ∈ (N ∪ T)*, |γ| > 0
  Unrestricted (Type 0)
    α → β where α, β ∈ (N ∪ T)*, α ≠ ε
Language Classification
The set of all languages generated by grammars G of type T:
  L(T) = { L(G) | G is of type T }
  L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)
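Example
( a | b )* a b b
[Figure: NFA for ( a | b )* a b b with states A0, A1, A2, A3 and transitions labeled a and b]
The corresponding right-linear grammar:
  A0 → a A0 | b A0 | a A1
  A1 → b A2
  A2 → b A3
  A3 → ε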
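Example
L = { a^n b^n | n ≥ 1 } is context free, but it is not regular.
[Figure: DFA D with a path labeled a^i from s0 to si, a path labeled a^(j-i) looping from si back to si, and a path labeled b^i from si to the accepting state f]
Suppose a DFA D accepted L. D has only finitely many states, so while reading a's it must revisit some state si: once after a^i and again after a^j, with i < j. Because a^i b^i leads from si to f, a^j b^i can be accepted by D, but a^j b^i is not in L.
Finite automata cannot count. A CFG can count two items but not three (for example, { a^n b^n c^n | n ≥ 1 } is not context free).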
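To make the counting argument concrete, here is a minimal sketch (not from the original slides) of a recognizer for { a^n b^n | n ≥ 1 }. The helper name `in_anbn` is hypothetical. The point is the single unbounded counter, which is exactly the resource a finite automaton does not have.

```python
def in_anbn(s: str) -> bool:
    """Return True if s has the form a^n b^n for some n >= 1."""
    count = 0                      # number of a's not yet matched by a b
    i = 0
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:                 # n must be at least 1
        return False
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
        if count < 0:              # more b's than a's
            return False
    # Accept only if the whole input was consumed and every a was matched.
    return i == len(s) and count == 0
```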
Writing a Predictive Parsing Grammar
  Eliminating ambiguity
  Elimination of left recursion
  Left factoring (elimination of common left factors)
  Compute FIRST and FOLLOW
Two variants:
  Recursive (recursive calls)
  Non-recursive (table-driven)
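Eliminating Ambiguity
Dangling-else grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
Example sentence: if E1 then S1 else if E2 then S2 else S3

Eliminating Ambiguity (2)
The sentence if E1 then if E2 then S1 else S2 has two parse trees: the else can be attached to either then.
[Figure: the two parse trees for if E1 then if E2 then S1 else S2]

Eliminating Ambiguity (3)
Rewrite the dangling-else grammar so that each else is matched with the closest unmatched then:
  stmt → matched_stmt
       | open_stmt
  matched_stmt → if expr then matched_stmt else matched_stmt
               | other
  open_stmt → if expr then stmt
            | if expr then matched_stmt else open_stmt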
Elimination of Left Recursion
Productions of the form
  A → A α | β
are left recursive.
The equivalent non-left-recursive productions are
  A → β A’
  A’ → α A’ | ε
When one of the productions in a grammar is left recursive, a predictive parser loops forever on certain inputs.
Immediate Left-Recursion Elimination
Group the productions as
  A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
where no βi begins with an A.
Replace the A-productions by
  A → β1 A’ | β2 A’ | … | βn A’
  A’ → α1 A’ | α2 A’ | … | αm A’ | ε
Example
Left-recursive grammar
  A → A α | β | γ | A δ
Into right-recursive productions
  A → β AR
    | γ AR
  AR → α AR
     | δ AR
     | ε
Non-Immediate Left-Recursion
The grammar
  S → A a | b
  A → A c | S d | ε
The nonterminal S is left recursive, because
  S ⇒ A a ⇒ S d a
But S is not immediately left recursive.
Elimination of Left Recursion
Eliminating left recursion algorithm
Arrange the nonterminals in some order A1, A2, …, An
for (each i from 1 to n) {
    for (each j from 1 to i-1) {
        replace each production
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    }
    eliminate the immediate left recursion in Ai
}
Example
  A → B C | a
  B → C A | A b
  C → A B | C C | a
i = 1: nothing to do
i = 2, j = 1:
  B → C A | A b
  ⇒ B → C A | B C b | a b
  ⇒(imm) B → C A BR | a b BR
          BR → C b BR | ε
i = 3, j = 1:
  C → A B | C C | a
  ⇒ C → B C B | a B | C C | a
i = 3, j = 2:
  C → B C B | a B | C C | a
  ⇒ C → C A BR C B | a b BR C B | a B | C C | a
  ⇒(imm) C → a b BR C B CR | a B CR | a CR
          CR → A BR C B CR | C CR | ε
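Exercise
The grammar
  S → A a | b
  A → A c | S d | ε
Answer
Substituting the S-productions into A → S d gives
  A → A c | A a d | b d | ε
Eliminating the immediate left recursion among the A-productions then gives
  S → A a | b
  A → b d A’ | A’
  A’ → c A’ | a d A’ | ε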
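The substitution loop and the immediate-left-recursion step can be sketched in a few lines. This is an illustrative sketch, not the slides' own code. It assumes a grammar is a dict mapping each nonterminal to a list of alternatives, each alternative a list of symbols, with `[]` standing for ε. The function name and the convention of naming the new nonterminal by appending `'` are assumptions. As in the textbook algorithm, correctness is only guaranteed for grammars without cycles.

```python
def eliminate_left_recursion(grammar):
    """Return an equivalent grammar without left recursion.

    grammar: dict nonterminal -> list of alternatives (lists of symbols).
    The dict's insertion order is used as the order A1, A2, ..., An.
    """
    order = list(grammar)
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Replace Ai -> Aj gamma by Ai -> delta gamma for every Aj -> delta, j < i.
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Eliminate immediate left recursion in Ai.
        alphas = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        if alphas:
            betas = [alt for alt in g[ai] if not alt or alt[0] != ai]
            new = ai + "'"                       # assumed to be an unused name
            g[ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g


EXERCISE = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
for head, alts in eliminate_left_recursion(EXERCISE).items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Run on the exercise grammar, the sketch performs the same two steps as the answer above: substitution of the S-productions, then removal of the immediate recursion in A.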
Left Factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive (top-down) parsing.
Replace productions
  A → α β1 | α β2 | … | α βn | γ
with
  A → α AR | γ
  AR → β1 | β2 | … | βn
Example
The grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
Replace with
  stmt → if expr then stmt stmts
  stmts → else stmt | ε
Exercise
The following grammar
  S → i E t S | i E t S e S | a
  E → b
Answer
  S → i E t S S’ | a
  S’ → e S | ε
  E → b
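One round of left factoring can be sketched as follows. This is an illustrative sketch under assumptions, not the slides' method verbatim: alternatives are lists of symbols with `[]` for ε, alternatives are grouped by their first symbol, and each group of two or more is factored on its longest common prefix. The name `left_factor` and the primed names for new nonterminals are hypothetical.

```python
def left_factor(head, alternatives):
    """Apply one round of left factoring to the alternatives of `head`."""
    groups = {}                               # first symbol -> alternatives
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)

    result = {head: []}
    fresh = head
    for first_symbol, alts in groups.items():
        if first_symbol is None or len(alts) == 1:
            result[head].extend(alts)         # nothing to factor
            continue
        prefix = []                           # longest common prefix of the group
        for column in zip(*alts):
            if all(sym == column[0] for sym in column):
                prefix.append(column[0])
            else:
                break
        fresh += "'"                          # assumed to be an unused name
        result[head].append(prefix + [fresh])
        result[fresh] = [alt[len(prefix):] for alt in alts]
    return result


factored = left_factor("S", [["i", "E", "t", "S"],
                             ["i", "E", "t", "S", "e", "S"],
                             ["a"]])
print(factored)
```

A full left-factoring pass would repeat this step on the new nonterminals until no two alternatives share a prefix; the sketch stops after one round to stay short.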
Non-Context-Free Grammar Constructs
A few syntactic constructs found in typical programming languages cannot be specified using grammars alone.
Checking that identifiers are declared before they are used in a program:
  The abstract language is L1 = { wcw | w is in (a|b)* }
  aabcaab is in L1, and L1 is not context free.
  Grammars for C/C++ and Java do not distinguish among identifiers that are different character strings.
  All identifiers are represented by a token such as id in a grammar.
  The semantic-analysis phase checks that identifiers are declared before they are used.
Top-Down Parsing
LL methods and recursive-descent parsing
  Left-to-right, Leftmost derivation
  Creating the nodes of the parse tree in preorder (depth-first)
Grammar
  E → T + T
  T → ( E )
  T → - E
  T → id
Leftmost derivation
  E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: successive parse trees for this derivation, built from the root E down to the leaves id + id]
Predictive Parsing
Top-down parsing: LL, recursive-descent parsing
Given a grammar G
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
FIRST and FOLLOW
[Figure: parse tree with root S whose frontier is α A a β, where the subtree rooted at A derives c γ]
c is in FIRST(A)
a is in FOLLOW(A)
FIRST and FOLLOW
The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G.
During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply.
During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.
FIRST
FIRST(α): the set of terminals that begin all strings derived from α
  FIRST(a) = { a } if a ∈ T
  FIRST(ε) = { ε }
  FIRST(A) = ∪ FIRST(α) over all productions A → α ∈ P
  FIRST(X1X2…Xk):
    if ε ∈ FIRST(Xj) for all j = 1, …, i-1 then
      add the non-ε elements of FIRST(Xi) to FIRST(X1X2…Xk)
    if ε ∈ FIRST(Xj) for all j = 1, …, k then
      add ε to FIRST(X1X2…Xk)
FIRST (1)
By the definition of FIRST, we can compute FIRST(X):
  If X ∈ T, then FIRST(X) = { X }.
  If X ∈ N and X → ε, then add ε to FIRST(X).
  If X ∈ N and X → Y1 Y2 … Yn, then add all non-ε elements of FIRST(Y1) to FIRST(X); if ε ∈ FIRST(Y1), then add all non-ε elements of FIRST(Y2) to FIRST(X); …; if ε ∈ FIRST(Yn), then add ε to FIRST(X).
FOLLOW
FOLLOW(A): the set of terminals that can immediately follow nonterminal A
FOLLOW(A) =
  for all (B → α A β) ∈ P do
    add FIRST(β) - { ε } to FOLLOW(A)
  for all (B → α A β) ∈ P and ε ∈ FIRST(β) do
    add FOLLOW(B) to FOLLOW(A)
  for all (B → α A) ∈ P do
    add FOLLOW(B) to FOLLOW(A)
  if A is the start symbol S then
    add $ to FOLLOW(A)
FOLLOW (1)
By the definition of FOLLOW, we can compute FOLLOW(X):
  Put $ into FOLLOW(S).
  For each A → α B β, add all non-ε elements of FIRST(β) to FOLLOW(B).
  For each A → α B, or A → α B β where ε ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
Example
Given a grammar G
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

FIRST
  E     ( id
  E’    + ε
  T     ( id
  T’    * ε
  F     ( id
FOLLOW
  E     $ )
  E’    $ )
  T     + $ )
  T’    + $ )
  F     * + $ )
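The FIRST and FOLLOW rules are closure rules, so they can be applied repeatedly until nothing changes. The sketch below is one possible fixed-point formulation, not code from the slides. The function names, the dict-of-lists grammar encoding, and the use of the string "ε" as a marker are assumptions made for illustration.

```python
EPS = "ε"
END = "$"


def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xk), given the FIRST sets of the nonterminals."""
    out = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # for a terminal a, FIRST(a) = {a}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                               # every Xi can derive ε (or k = 0)
    return out


def compute_first_follow(grammar, start):
    """grammar: dict nonterminal -> list of alternatives (lists of symbols)."""
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:                             # iterate until a fixed point
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f = first_of_string(alt, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, b in enumerate(alt):
                    if b not in grammar:       # FOLLOW is defined for nonterminals only
                        continue
                    tail = first_of_string(alt[i + 1:], first)
                    add = tail - {EPS}
                    if EPS in tail:            # rest of the alternative can vanish
                        add |= follow[a]
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return first, follow


GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST, FOLLOW = compute_first_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

Applied to the expression grammar, the computed sets can be compared entry by entry with the FIRST and FOLLOW lists above.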
Recursive Descent Parsing
Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens
When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information
Procedure in Recursive-Descent Parsing
void A() {
    Choose an A-production, A → X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi = current input symbol a)
            advance the input to the next symbol;
        else /* an error has occurred */;
    }
}
Using FIRST and FOLLOW to Write a Recursive Descent Parser
expr → term rest
rest → + term rest | - term rest | ε
term → id

FIRST(+ term rest) = { + }
FIRST(- term rest) = { - }
FOLLOW(rest) = { $ }

rest() {
    if (lookahead in FIRST(+ term rest)) {
        match('+'); term(); rest();
    } else if (lookahead in FIRST(- term rest)) {
        match('-'); term(); rest();
    } else if (lookahead in FOLLOW(rest))
        return;
    else
        error();
}
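A runnable version of this parser might look like the sketch below. It is an illustration under assumptions rather than the slides' code: the input is assumed to be a list of token strings already produced by a lexer, "$" is appended as the end marker, and the names `Parser`, `ParseError` and `accepts` are hypothetical.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for expr -> term rest, rest -> + term rest | - term rest | ε."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        self.term()
        self.rest()

    def term(self):
        self.match("id")

    def rest(self):
        if self.lookahead == "+":             # FIRST(+ term rest) = { + }
            self.match("+")
            self.term()
            self.rest()
        elif self.lookahead == "-":           # FIRST(- term rest) = { - }
            self.match("-")
            self.term()
            self.rest()
        elif self.lookahead == "$":           # FOLLOW(rest) = { $ }: apply rest -> ε
            return
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.expr()
        parser.match("$")
    except ParseError:
        return False
    return True


print(accepts(["id", "+", "id", "-", "id"]))
```

Each nonterminal becomes one method, and the lookahead token alone selects the branch, so no backtracking is needed.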
LL(1)
LL(1) Grammar
Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
  The first “L” means the input is scanned from left to right.
  The second “L” means a leftmost derivation is produced.
  The “1” stands for using one input symbol of lookahead at each step to make parsing action decisions.
  An LL(1) grammar is not left recursive.
  An LL(1) grammar is not ambiguous.
LL(1)
A grammar G is LL(1) if it is not left recursive and for each collection of productions
  A → α1 | α2 | … | αn
for nonterminal A the following holds:
  FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
    (What happens if the intersection is not empty?)
  if αi ⇒* ε then
    αj does not derive ε for all i ≠ j
    FIRST(αj) ∩ FOLLOW(A) = ∅ for all i ≠ j
Example
Grammar                      Not LL(1) because
S → S a | a                  Left recursive
S → a S | a                  FIRST(a S) ∩ FIRST(a) = { a } ≠ ∅
S → a R | ε,  R → S | ε      For R: S ⇒* ε and ε ⇒* ε
S → a R a,  R → S | ε        For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
Non-Recursive Predictive Parsing
Table-Driven Parsing
Given an LL(1) grammar G = <T, N, P, S>, construct a table M[A, a] for A ∈ N, a ∈ T ∪ { $ }, and use a driver program with a stack.
[Figure: the predictive parsing program (driver) reads the input a + b $, keeps a stack holding X Y Z $, consults the parsing table M, and produces the output]
Predictive Parsing Table Algorithm
Example
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

A → α           FIRST(α)    FOLLOW(A)
E → T E’        ( id        $ )
E’ → + T E’     +           $ )
E’ → ε          ε           $ )
T → F T’        ( id        + $ )
T’ → * F T’     *           + $ )
T’ → ε          ε           + $ )
F → ( E )       (           * + $ )
F → id          id          * + $ )

Parsing table M
      id          +              *              (            )         $
E     E → T E’                                  E → T E’
E’                E’ → + T E’                                E’ → ε    E’ → ε
T     T → F T’                                  T → F T’
T’                T’ → ε         T’ → * F T’                 T’ → ε    T’ → ε
F     F → id                                    F → ( E )
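The table in this example follows from the FIRST(α) and FOLLOW(A) columns by a simple rule: production A → α goes into M[A, a] for every terminal a in FIRST(α), and, when ε is in FIRST(α), also for every symbol in FOLLOW(A). The sketch below applies that rule to sets copied from the example. The function name `build_table` and the data layout are assumptions made for illustration.

```python
EPS = "ε"


def build_table(productions, first_of_rhs, follow):
    """Fill M[A, a] from FIRST(α) and FOLLOW(A) for every production A → α.

    productions  : list of (head, rhs) pairs, rhs a tuple of symbols
    first_of_rhs : FIRST(α) for each production, in the same order
    follow       : dict nonterminal -> FOLLOW set
    Returns (table, conflicts); table maps (A, a) to a list of right-hand sides.
    """
    table = {}
    for (head, rhs), first in zip(productions, first_of_rhs):
        lookaheads = first - {EPS}
        if EPS in first:                      # α can vanish: use FOLLOW(A) as well
            lookaheads |= follow[head]
        for a in lookaheads:
            table.setdefault((head, a), []).append(rhs)
    # A cell holding more than one production means the grammar is not LL(1).
    conflicts = sorted(key for key, rhss in table.items() if len(rhss) > 1)
    return table, conflicts


PRODUCTIONS = [
    ("E", ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
FIRST_OF_RHS = [{"(", "id"}, {"+"}, {EPS}, {"(", "id"}, {"*"}, {EPS}, {"("}, {"id"}]
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}
TABLE, CONFLICTS = build_table(PRODUCTIONS, FIRST_OF_RHS, FOLLOW)
print(len(TABLE), "entries,", len(CONFLICTS), "conflicts")
```

Keeping a list per cell, instead of overwriting, makes duplicate entries visible, which is how a non-LL(1) grammar such as the dangling-else grammar shows up.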
Exercise
Given a grammar G as below
  S → i E t S S’ | a
  S’ → e S | ε
  E → b
Calculate FIRST and FOLLOW.
Create a predictive parsing table.

Answer
The grammar is ambiguous.

A → α              FIRST(α)    FOLLOW(A)
S → i E t S S’     i           e $
S → a              a           e $
S’ → e S           e           e $
S’ → ε             ε           e $
E → b              b           t

      a        b        e                   i                 t    $
S     S → a                                 S → i E t S S’
S’                      S’ → ε, S’ → e S                           S’ → ε
E              E → b

Error: duplicate table entry in M[S’, e], so the grammar is not LL(1).
Predictive Parsing Algorithm
Initially, w$ is in the input buffer and S is on top of the stack, above $.
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X != $) {
    let a be the input symbol that ip points to;
    if (X is a) pop the stack and advance ip;
    else if (X is a terminal) error();
    else if (M[X, a] is an error entry) error();
    else if (M[X, a] = X → Y1 Y2 … Yk) {
        output the production X → Y1 Y2 … Yk;
        pop the stack;
        push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
Example
MATCHED        STACK       INPUT           ACTION
               E$          id + id * id$
               TE’$        id + id * id$   E → T E’
               FT’E’$      id + id * id$   T → F T’
               id T’E’$    id + id * id$   F → id
id             T’E’$       + id * id$      match id
id             E’$         + id * id$      T’ → ε
id             +TE’$       + id * id$      E’ → + T E’
id +           TE’$        id * id$        match +
id +           FT’E’$      id * id$        T → F T’
id +           id T’E’$    id * id$        F → id
id + id        T’E’$       * id$           match id
id + id        * FT’E’$    * id$           T’ → * F T’
id + id *      FT’E’$      id$             match *
id + id *      id T’E’$    id$             F → id
id + id * id   T’E’$       $               match id
id + id * id   E’$         $               T’ → ε
id + id * id   $           $               E’ → ε
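The driver loop that produces a trace like this one can be sketched as follows. This is an illustration, not the slides' code: the table is hard-coded from the expression-grammar example, tokens are assumed to be plain strings, and the names `TABLE`, `NONTERMINALS` and `ll1_parse` are hypothetical. One check is added that the pseudocode leaves implicit: input left over after the stack is exhausted is reported as an error.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {  # M[A, a] for the non-left-recursive expression grammar; [] means ε
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order (a leftmost derivation)."""
    stack = ["$", start]                  # the top of the stack is the list's end
    inp = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], inp[ip]
        if x == a:                        # terminal on top matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:       # terminal on top does not match
            raise ValueError(f"expected {x!r}, found {a!r}")
        elif (x, a) not in TABLE:         # empty (error) table entry
            raise ValueError(f"no entry M[{x}, {a}]")
        else:
            rhs = TABLE[(x, a)]
            output.append((x, rhs))
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1, leaving Y1 on top
    if inp[ip] != "$":                    # added check: leftover input
        raise ValueError(f"unexpected {inp[ip]!r} after end of expression")
    return output


for head, rhs in ll1_parse("id + id * id".split()):
    print(head, "->", " ".join(rhs) or "ε")
```

The printed productions appear in the same order as the ACTION column of the trace, skipping the match steps.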
Table-Driven Parsing
Panic Mode Recovery
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

      id          +              *              (            )         $
E     E → T E’                                  E → T E’     synch     synch
E’                E’ → + T E’                                E’ → ε    E’ → ε
T     T → F T’    synch                         T → F T’     synch     synch
T’                T’ → ε         T’ → * F T’                 T’ → ε    T’ → ε
F     F → id      synch          synch          F → ( E )    synch     synch

FOLLOW(E) = { ) $ }
FOLLOW(E’) = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(T’) = { + ) $ }
FOLLOW(F) = { + * ) $ }

Add synchronizing actions to undefined entries based on FOLLOW.
synch: the driver pops the current nonterminal A and skips input until a synch token is found, or skips input until one of FIRST(A) is found.
Pro: Can be automated
Cons: Error messages are needed
Example
STACK       INPUT             ACTION
E$          ) id * + id $     error, skip )
E$          id * + id $       id is in FIRST(E)
TE’$        id * + id $
FT’E’$      id * + id $
id T’E’$    id * + id $
T’E’$       * + id $
* FT’E’$    * + id $
FT’E’$      + id $            error, M[F, +] = synch
T’E’$       + id $            F has been popped
E’$         + id $
+TE’$       + id $
TE’$        id $
FT’E’$      id $
id T’E’$    id $
T’E’$       $
E’$         $
$           $
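A panic-mode driver that reproduces the two error actions of this trace can be sketched as below. It is an illustration under stated assumptions, not the slides' algorithm verbatim. Assumptions: an empty entry skips the input symbol, a synch entry pops the nonterminal, a mismatched terminal on the stack is popped, and a synch entry for the last nonterminal above $ skips input instead of popping (this last rule is what makes the first step skip ")" as in the trace). All names are hypothetical.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
SYNCH = "synch"
TABLE = {  # productions are lists ([] means ε); SYNCH marks synchronizing entries
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}


def parse_with_panic_mode(tokens):
    """Parse with panic-mode recovery; return the list of recovery actions taken."""
    stack = ["$", "E"]
    inp = list(tokens) + ["$"]
    ip = 0
    errors = []
    while stack[-1] != "$":
        x, a = stack[-1], inp[ip]
        if x == a:
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:
            errors.append(f"missing {x!r}")   # mismatched terminal: pop it
            stack.pop()
        else:
            entry = TABLE.get((x, a))
            if isinstance(entry, list):       # ordinary entry: expand
                stack.pop()
                stack.extend(reversed(entry))
            elif a != "$" and (entry is None or len(stack) == 2):
                errors.append(f"skip {a!r}")  # empty entry, or synch for the last nonterminal
                ip += 1
            else:
                errors.append(f"pop {x}")     # synch entry: give up on x
                stack.pop()
    errors.extend(f"skip {t!r}" for t in inp[ip:-1])   # leftover input, if any
    return errors


print(parse_with_panic_mode(") id * + id".split()))
```

The returned list is only a log of recovery actions; a real compiler would turn each entry into an error message, which is the cost noted under "Cons" for this strategy.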
Phrase-Level Recovery
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

      id          +              *              (            )         $
E     E → T E’                                  E → T E’     synch     synch
E’                E’ → + T E’                                E’ → ε    E’ → ε
T     T → F T’    synch                         T → F T’     synch     synch
T’    insert *    T’ → ε         T’ → * F T’                 T’ → ε    T’ → ε
F     F → id      synch          synch          F → ( E )    synch     synch

Change the input stream by inserting missing tokens.
For example: id id is changed into id * id
insert *: the driver inserts the missing * and retries the production; parsing can then continue with T’ → * F T’.
Pro: Can be automated
Cons: Recovery is not always intuitive
Error Productions
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id

      id          +              *              (            )         $
E     E → T E’                                  E → T E’     synch     synch
E’                E’ → + T E’                                E’ → ε    E’ → ε
T     T → F T’    synch                         T → F T’     synch     synch
T’    T’ → F T’   T’ → ε         T’ → * F T’                 T’ → ε    T’ → ε
F     F → id      synch          synch          F → ( E )    synch     synch

Add the “error production” T’ → F T’ to ignore a missing *, e.g.: id id
Pro: Powerful recovery method
Cons: Cannot be automated