Compiler construction in4020 – lecture 3
Koen Langendoen
Delft University of Technology, The Netherlands
Summary of lecture 2
• lexical analyzer generator
  • description → FSA
• FSA construction
  • dotted items
  • character moves
  • ε moves
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator turns the token description into the lexical analyzer]
Quiz
2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number?
2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show that Mr. X is wrong.
Overview
• syntax analysis: tokens → AST
• AST construction
  • by hand: recursive descent
  • automatic: top-down (LLgen), bottom-up (yacc)
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator turns the language grammar into the syntax analyzer]
Syntax analysis
• parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together.
• result: parse tree
[Figure: parse tree for the token stream ‘b’ ‘*’ ‘b’ ‘-’ ‘4’ ‘*’ ‘a’ ‘*’ ‘c’, with interior nodes expression, term and factor above identifier and constant leaves]
syn-tax: the way in which words are put together to form phrases, clauses, or sentences.
Webster’s Dictionary
Context free grammar
• G = (V_N, V_T, S, P)
• V_N : set of non-terminal symbols
• V_T : set of terminal symbols
• S : start symbol (S ∈ V_N)
• P : set of production rules {N → α}
• V_N ∩ V_T = ∅
• P = {N → α | N ∈ V_N ∧ α ∈ (V_N ∪ V_T)*}
Top-down parsing
• process tokens left to right
• expression grammar
    input → expression EOF
    expression → term rest_expression
    term → IDENTIFIER | ‘(’ expression ‘)’
    rest_expression → ‘+’ expression | ε
• example expression: aap + ( noot + mies )
[Figure: parse tree built from the root down: input → expression EOF, expression → term rest_expression, term → IDENTIFIER (aap)]
Bottom-up parsing
• process tokens left to right
• expression grammar
    input → expression EOF
    expression → term rest_expression
    term → IDENTIFIER | ‘(’ expression ‘)’
    rest_expression → ‘+’ expression | ε
[Figure: parse tree built from the leaves up over the input aap + ( noot + mies ); nodes shown: IDENT, term, rest_expr, expression, rest_expression]
Comparison
                        top-down              bottom-up
node creation           pre-order             post-order
alternative selection   first token           last token
grammar type            restricted: LL(1)     relaxed: LR(1)
implementation          manual + automatic    automatic
[Figure: two five-node trees numbered in node-creation order: pre-order for top-down (the root is node 1), post-order for bottom-up (the root is node 5)]
Recursive descent parsing
• each rule N translates to a boolean function
  • return true if a terminal production of N was matched
  • return false otherwise (without consuming any token)
• try alternatives of N in turn
• a terminal symbol must match the current token
• a non-terminal is matched by calling its routine
input → expression EOF
int input(void) {
    return expression() && require(token(EOF));
}
Recursive descent parsing
expression → term rest_expression
int expression(void) {
    return term() && require(rest_expression());
}
term → IDENTIFIER | ‘(’ expression ‘)’
int term(void) {
    return token(IDENTIFIER) ||
           (token('(') && require(expression()) && require(token(')')));
}
Recursive descent parsing
• auxiliary functions
  • consume matched tokens
  • report syntax errors
int token(int tk) {
    if (Token.class != tk) return 0;
    get_next_token();
    return 1;
}
int require(int found) {
    if (!found) error();
    return 1;
}
rest_expression → ‘+’ expression | ε
int rest_expression(void) {
    return (token('+') && require(expression())) || 1;
}
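The routines on these slides can be assembled into a small recognizer. The following is a minimal sketch, not the course's own code: the scanner, the `input_text` cursor, the `error_count` counter, the `parse()` wrapper and the token class name `END_OF_FILE` (used instead of `EOF` to avoid clashing with the `<stdio.h>` macro) are assumptions made for illustration.

```c
#include <ctype.h>

/* Token classes: single characters stand for themselves. */
enum { IDENTIFIER = 256, END_OF_FILE };

static const char *input_text;        /* assumed: remaining program text */
static struct { int class; } Token;   /* current token, as on the slides */
static int error_count;               /* assumed: number of syntax errors */

/* Assumed minimal scanner: identifiers, single characters, end of input. */
static void get_next_token(void) {
    while (*input_text == ' ') input_text++;
    if (*input_text == '\0') { Token.class = END_OF_FILE; return; }
    if (isalpha((unsigned char)*input_text)) {
        while (isalnum((unsigned char)*input_text)) input_text++;
        Token.class = IDENTIFIER;
        return;
    }
    Token.class = (unsigned char)*input_text++;
}

static void error(void) { error_count++; }

/* Consume the current token if it has class tk. */
static int token(int tk) {
    if (Token.class != tk) return 0;
    get_next_token();
    return 1;
}

/* Report a syntax error if a mandatory part was not found. */
static int require(int found) {
    if (!found) error();
    return 1;
}

static int expression(void);

/* term -> IDENTIFIER | '(' expression ')' */
static int term(void) {
    return token(IDENTIFIER)
        || (token('(') && require(expression()) && require(token(')')));
}

/* rest_expression -> '+' expression | empty */
static int rest_expression(void) {
    return (token('+') && require(expression())) || 1;
}

/* expression -> term rest_expression */
static int expression(void) {
    return term() && require(rest_expression());
}

/* input -> expression EOF */
static int input(void) {
    return expression() && require(token(END_OF_FILE));
}

/* Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int parse(const char *text) {
    input_text = text;
    error_count = 0;
    get_next_token();
    return input() && error_count == 0;
}
```

With these assumptions, `parse("aap + ( noot + mies )")` accepts the example expression, while an input such as `"aap + "` is rejected because `require()` records an error for the missing expression.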
Automatic top-down parsing
• follow recursive descent scheme, but avoid interpretation overhead
• for each rule and alternative determine the tokens it can start with: FIRST set
• parsing scheme for rule N → A1 | A2 | …
  • if token ∉ FIRST(N) then ERROR
  • if token ∈ FIRST(A1) then parse A1
  • if token ∈ FIRST(A2) then parse A2
  • …
Exercise (7 min.)
• design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar.
• hint: consider the types of rules
  • alternatives
  • composition
  • empty productions
input → expression EOF
expression → term rest_expression
term → IDENTIFIER | ‘(’ expression ‘)’
rest_expression → ‘+’ expression | ε
Answers (Fig 2.58, page 122)
• N → w α (w ∈ V_T)
    FIRST(N) = {w}
• N → A1 | A2 | …
    FIRST(N) = FIRST(A1) ∪ FIRST(A2) ∪ …
• N → ε
    FIRST(N) = {ε}
• N → A α
    FIRST(N) = FIRST(A), if ε ∉ FIRST(A)
    FIRST(N) = (FIRST(A) \ {ε}) ∪ FIRST(α), otherwise
• closure algorithm
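The rules above can be applied repeatedly until no FIRST set grows any further. The sketch below does this for the expression grammar of the exercise; the grammar encoding (`struct rule`, the `NONE` terminator), the bit-set representation and all names are assumptions chosen for illustration, not the book's code.

```c
#include <stddef.h>

enum symbol { IDENT, PLUS, LPAR, RPAR, END, EPSILON,  /* terminals and the ε marker */
              INPUT, EXPRESSION, TERM, REST_EXPR,     /* non-terminals */
              N_SYMBOLS, NONE = -1 };                 /* NONE ends a right-hand side */

#define MAX_RHS 3
struct rule { enum symbol lhs; enum symbol rhs[MAX_RHS + 1]; };

static const struct rule grammar[] = {
    { INPUT,      { EXPRESSION, END, NONE } },
    { EXPRESSION, { TERM, REST_EXPR, NONE } },
    { TERM,       { IDENT, NONE } },
    { TERM,       { LPAR, EXPRESSION, RPAR, NONE } },
    { REST_EXPR,  { PLUS, EXPRESSION, NONE } },
    { REST_EXPR,  { NONE } },                         /* empty alternative */
};
#define N_RULES (sizeof grammar / sizeof grammar[0])

#define BIT(s) (1u << (s))

unsigned first[N_SYMBOLS];   /* bit s set: symbol s (a terminal or ε) is in the set */

/* FIRST of a sequence of symbols: take FIRST of the leading symbol and
   continue with the rest only while that symbol can produce ε. */
static unsigned first_of_sequence(const enum symbol *rhs) {
    unsigned set = 0;
    for (; *rhs != NONE; rhs++) {
        set |= first[*rhs] & ~BIT(EPSILON);
        if (!(first[*rhs] & BIT(EPSILON))) return set;
    }
    return set | BIT(EPSILON);   /* the whole sequence can vanish (or is empty) */
}

void compute_first_sets(void) {
    int changed = 1;
    for (int t = IDENT; t <= END; t++) first[t] = BIT(t);   /* FIRST(w) = {w} */
    for (int n = INPUT; n < N_SYMBOLS; n++) first[n] = 0;
    while (changed) {            /* closure: repeat until nothing is added */
        changed = 0;
        for (size_t r = 0; r < N_RULES; r++) {
            unsigned add = first_of_sequence(grammar[r].rhs);
            if ((first[grammar[r].lhs] | add) != first[grammar[r].lhs]) {
                first[grammar[r].lhs] |= add;   /* union over all alternatives */
                changed = 1;
            }
        }
    }
}
```

After `compute_first_sets()`, `first[EXPRESSION]` holds the bits for `IDENT` and `LPAR`, and `first[REST_EXPR]` holds `PLUS` and `EPSILON`.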
Break
Predictive parsing
• similar to recursive descent, but no back-tracking
• functions “know” what they are doing
input → expression EOF
FIRST(expression) = {IDENT, ‘(’}
void input(void) {
    switch (Token.class) {
    case IDENT: case '(':
        expression(); token(EOF); break;
    default:
        error();
    }
}
void token(int tk) {
    if (Token.class != tk) error();
    get_next_token();
}
Predictive parsing
expression → term rest_expression
FIRST(term) = {IDENT, ‘(’}
void expression(void) {
    switch (Token.class) {
    case IDENT: case '(':
        term(); rest_expression(); break;
    default:
        error();
    }
}
term → IDENTIFIER | ‘(’ expression ‘)’
void term(void) {
    switch (Token.class) {
    case IDENT:
        token(IDENT); break;
    case '(':
        token('('); expression(); token(')'); break;
    default:
        error();
    }
}
Predictive parsing
• FIRST(ε) = {ε}
  • check nothing?
  • NO: token ∈ FOLLOW(rest_expr)
rest_expression → ‘+’ expression | ε
FIRST(rest_expr) = {‘+’, ε}
void rest_expression(void) {
    switch (Token.class) {
    case '+':
        token('+'); expression(); break;
    case EOF: case ')':
        break;
    default:
        error();
    }
}
FOLLOW(rest_expr) = {EOF, ‘)’}
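Put together, the four predictive routines form a complete recognizer. The sketch below adds the pieces the slides leave out; the scanner, the `parse()` wrapper, the use of `setjmp`/`longjmp` to make `error()` abandon the parse, and the name `END_OF_FILE` (instead of `EOF`, which is a `<stdio.h>` macro) are assumptions for illustration.

```c
#include <ctype.h>
#include <setjmp.h>

enum { IDENT = 256, END_OF_FILE };    /* single characters stand for themselves */

static const char *input_text;        /* assumed: remaining program text */
static struct { int class; } Token;   /* current look-ahead token */
static jmp_buf on_error;              /* assumed: where error() jumps to */

/* Assumed minimal scanner: identifiers, single characters, end of input. */
static void get_next_token(void) {
    while (*input_text == ' ') input_text++;
    if (*input_text == '\0') { Token.class = END_OF_FILE; return; }
    if (isalpha((unsigned char)*input_text)) {
        while (isalnum((unsigned char)*input_text)) input_text++;
        Token.class = IDENT;
        return;
    }
    Token.class = (unsigned char)*input_text++;
}

static void error(void) { longjmp(on_error, 1); }   /* give up on first error */

static void token(int tk) {
    if (Token.class != tk) error();
    get_next_token();
}

static void expression(void);

static void term(void) {
    switch (Token.class) {
    case IDENT: token(IDENT); break;
    case '(':   token('('); expression(); token(')'); break;
    default:    error();
    }
}

static void rest_expression(void) {
    switch (Token.class) {
    case '+': token('+'); expression(); break;   /* FIRST of first alternative */
    case END_OF_FILE: case ')': break;           /* FOLLOW(rest_expr): choose ε */
    default:  error();
    }
}

static void expression(void) {
    switch (Token.class) {
    case IDENT: case '(': term(); rest_expression(); break;
    default: error();
    }
}

static void input(void) {
    switch (Token.class) {
    case IDENT: case '(': expression(); token(END_OF_FILE); break;
    default: error();
    }
}

/* Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int parse(const char *text) {
    input_text = text;
    if (setjmp(on_error)) return 0;   /* reached again via error() */
    get_next_token();
    input();
    return 1;
}
```

Each routine decides on the look-ahead token alone, so no alternative is ever tried and abandoned; the empty alternative of rest_expression is taken only when the look-ahead is in its FOLLOW set.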
Limitations of LL(1) parsers
• FIRST/FIRST conflict
    term → IDENTIFIER | IDENTIFIER ‘[’ expression ‘]’ | ‘(’ expression ‘)’
• FIRST/FOLLOW conflict
    S → A ‘a’ ‘b’
    A → ‘a’ | ε
    FIRST(A) = { ‘a’ } = FOLLOW(A)
• left recursion
    expression → expression ‘-’ term | term
Making grammars LL(1)
• manual labour
  • rewrite grammar
  • adjust semantic actions
• three rewrite methods
  • left factoring
  • substitution
  • left-recursion removal
Left factoring
term → IDENTIFIER
     | IDENTIFIER ‘[’ expression ‘]’
• factor out common prefix
term → IDENTIFIER after_identifier
after_identifier → ε | ‘[’ expression ‘]’
‘[’ ∉ FOLLOW(after_identifier)
Substitution
S → A ‘a’ ‘b’
A → ‘a’ | ε
• replace non-terminal by its alternatives
S → ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’
Left-recursion removal
N → N α | β
• replace by
N → β M
M → α M | ε
• example
expression → expression ‘-’ term | term
expression → term tail
tail → ‘-’ term tail | ε
[Figure: N generates β α … α]
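Removing left recursion changes the shape of the parse tree, which is why the semantic actions have to be adjusted: ‘-’ must still associate to the left. One way to do this, sketched below, is to pass the value computed so far into the routine for tail. The cursor `p`, the restriction of term to unsigned decimal constants, and the absence of error handling are simplifying assumptions.

```c
#include <ctype.h>

static const char *p;   /* assumed: cursor into the input text */

static void skip_spaces(void) { while (*p == ' ') p++; }

/* term: simplified here to an unsigned decimal constant */
static int term(void) {
    int value = 0;
    skip_spaces();
    while (isdigit((unsigned char)*p)) value = value * 10 + (*p++ - '0');
    return value;
}

/* tail -> '-' term tail | empty; left is the value of everything parsed so far */
static int tail(int left) {
    skip_spaces();
    if (*p != '-') return left;     /* empty alternative */
    p++;                            /* consume '-' */
    return tail(left - term());     /* subtract first, then continue with the rest */
}

/* expression -> term tail */
int evaluate(const char *text) {
    p = text;
    return tail(term());
}
```

Under these assumptions `evaluate("10 - 3 - 2")` computes (10 - 3) - 2, the result the original left-recursive rule implies, rather than 10 - (3 - 2).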
Exercise (7 min.)
• make the following grammar LL(1)
expression → expression ‘+’ term | expression ‘-’ term | term
term → term ‘*’ factor | term ‘/’ factor | factor
factor → ‘(’ expression ‘)’ | func-call | identifier | constant
func-call → identifier ‘(’ expr-list? ‘)’
expr-list → expression (‘,’ expression)*
• and what about
S → if E then S (else S)?
Answers
• substitution
    F → ‘(’ E ‘)’ | ID ‘(’ expr-list? ‘)’ | ID | constant
• left factoring
    E → E ( ‘+’ | ‘-’ ) T | T
    T → T ( ‘*’ | ‘/’ ) F | F
    F → ‘(’ E ‘)’ | ID ( ‘(’ expr-list? ‘)’ )? | constant
• left recursion removal
    E → T (( ‘+’ | ‘-’ ) T )*
    T → F (( ‘*’ | ‘/’ ) F )*
• if-then-else grammar is ambiguous
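The repetition operators in the rewritten rules for E and T map directly onto loops in a hand-written parser. The sketch below evaluates expressions using that structure; it is an illustration under assumptions: F is limited to parenthesised expressions and unsigned decimal constants (identifiers and function calls are left out), a missing ‘)’ is silently tolerated, and a division by zero is skipped.

```c
#include <ctype.h>

static const char *p;   /* assumed: cursor into the input text */

/* Skip blanks and return the next character without consuming it. */
static int peek(void) { while (*p == ' ') p++; return (unsigned char)*p; }

static int E(void);

/* F -> '(' E ')' | constant */
static int F(void) {
    int value = 0;
    if (peek() == '(') {
        p++;
        value = E();
        if (peek() == ')') p++;
        return value;
    }
    while (isdigit((unsigned char)*p)) value = value * 10 + (*p++ - '0');
    return value;
}

/* T -> F (('*' | '/') F)* */
static int T(void) {
    int value = F();
    for (;;) {
        int op = peek();
        if (op != '*' && op != '/') return value;
        p++;
        int right = F();
        if (op == '*') value *= right;
        else if (right != 0) value /= right;   /* division by zero is skipped */
    }
}

/* E -> T (('+' | '-') T)* */
static int E(void) {
    int value = T();
    for (;;) {
        int op = peek();
        if (op != '+' && op != '-') return value;
        p++;
        int right = T();
        value = (op == '+') ? value + right : value - right;
    }
}

int evaluate(const char *text) { p = text; return E(); }
```

Because each loop folds its operand into `value` as soon as it is parsed, the operators stay left-associative even though the grammar is no longer left-recursive.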
[Figure: compiler pipeline as before: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; automatic generation: the parser generator turns the language grammar into an LL(1) push-down automaton]
LL(1) push-down automaton
• stack right-hand side of production
transition table
state            look-ahead token
(top of stack)   IDENT            +              (                )    EOF
input            expression EOF                  expression EOF
expression       term rest-expr                  term rest-expr
term             IDENT                           ( expression )
rest-expr                         + expression                    ε    ε
LL(1) push-down automaton
• example run, using the transition table
input                       prediction stack      action
aap + ( noot + mies ) EOF   input                 replace non-terminal by transition entry
aap + ( noot + mies ) EOF   expression EOF        replace non-terminal by transition entry
aap + ( noot + mies ) EOF   term rest-expr EOF    replace non-terminal by transition entry
aap + ( noot + mies ) EOF   IDENT rest-expr EOF   pop matching token
+ ( noot + mies ) EOF       rest-expr EOF         replace non-terminal by transition entry
+ ( noot + mies ) EOF       + expression EOF      pop matching token
( noot + mies ) EOF         expression EOF
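The transition table and the two kinds of moves (replace a non-terminal by its table entry, pop a matching token) can be sketched as a table-driven parser. The encoding below (`struct rhs`, the `table` array, the fixed-size stack and the small scanner) is an assumption for illustration and not the output of any particular parser generator; the ε entries for ‘)’ and EOF follow from FOLLOW(rest_expr).

```c
#include <ctype.h>
#include <stddef.h>

enum symbol { IDENT, PLUS, LPAR, RPAR, END, N_TERMINALS,         /* terminals */
              INPUT = N_TERMINALS, EXPRESSION, TERM, REST_EXPR,  /* non-terminals */
              N_SYMBOLS };

struct rhs { int length; enum symbol symbols[3]; };   /* one right-hand side */

static const struct rhs
    R_INPUT = { 2, { EXPRESSION, END } },
    R_EXPR  = { 2, { TERM, REST_EXPR } },
    R_IDENT = { 1, { IDENT } },
    R_PAREN = { 3, { LPAR, EXPRESSION, RPAR } },
    R_PLUS  = { 2, { PLUS, EXPRESSION } },
    R_EMPTY = { 0, { 0 } };                           /* ε: nothing to stack */

/* table[state][look-ahead]; NULL marks an empty (error) entry. */
static const struct rhs *const table[N_SYMBOLS][N_TERMINALS] = {
    [INPUT]      = { [IDENT] = &R_INPUT, [LPAR] = &R_INPUT },
    [EXPRESSION] = { [IDENT] = &R_EXPR,  [LPAR] = &R_EXPR },
    [TERM]       = { [IDENT] = &R_IDENT, [LPAR] = &R_PAREN },
    [REST_EXPR]  = { [PLUS] = &R_PLUS, [RPAR] = &R_EMPTY, [END] = &R_EMPTY },
};

/* Assumed minimal scanner; returns N_SYMBOLS for an unknown character. */
static enum symbol next_token(const char **text) {
    while (**text == ' ') (*text)++;
    if (isalpha((unsigned char)**text)) {
        while (isalnum((unsigned char)**text)) (*text)++;
        return IDENT;
    }
    switch (**text) {
    case '+':  (*text)++; return PLUS;
    case '(':  (*text)++; return LPAR;
    case ')':  (*text)++; return RPAR;
    case '\0': return END;
    default:   return N_SYMBOLS;
    }
}

#define STACK_SIZE 64

/* Returns 1 if text is accepted, 0 on a syntax error or stack overflow. */
int parse(const char *text) {
    enum symbol stack[STACK_SIZE];
    int sp = 0;
    enum symbol look = next_token(&text);
    stack[sp++] = INPUT;                       /* initial prediction */
    while (sp > 0) {
        enum symbol top = stack[--sp];
        if (look >= N_TERMINALS) return 0;     /* unknown character */
        if (top < N_TERMINALS) {               /* terminal: pop matching token */
            if (top != look) return 0;
            look = next_token(&text);
        } else {                               /* non-terminal: use table entry */
            const struct rhs *r = table[top][look];
            if (r == NULL || sp + r->length > STACK_SIZE) return 0;
            for (int i = r->length - 1; i >= 0; i--)
                stack[sp++] = r->symbols[i];   /* leftmost symbol ends on top */
        }
    }
    return 1;                                  /* EOF matched, stack empty */
}
```

The right-hand side is pushed in reverse so that its leftmost symbol becomes the new top of the prediction stack, matching the stack contents shown in the run.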
LLgen
• top-down parser generator
• to be used in assignment #1
• discussed in lecture 5
Summary
• syntax analysis: tokens → AST
• top-down parsing
  • recursive descent
  • push-down automaton
  • making grammars LL(1)
[Figure: compiler pipeline: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator turns the language grammar into the syntax analyzer]
Homework
• study sections:
  • 1.10 closure algorithm
  • 2.2.4.6 error handling in LL(1) parsers
• print handout for next week [blackboard]
• find a partner for the “practicum”
• register your group
  • send e-mail to [email protected]