Post on 22-Dec-2015
CH2.1
CSE244
Chapter 2: A Simple One Pass Compiler
Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-1155
Storrs, CT 06269
http://www.cse.uconn.edu/~akiayias
CH2.2
The Entire Compilation Process
Grammars for Syntax Definition
Syntax-Directed Translation
Parsing - Top Down & Predictive
Pulling Together the Pieces
The Lexical Analysis Process
Symbol Table Considerations
A Brief Look at Code Generation
Concluding Remarks/Looking Ahead
CH2.3
Grammars for Syntax Definition
A Context-free Grammar (CFG) Is Utilized to Describe the Syntactic Structure of a Language
A CFG Is Characterized By:
1. A Set of Tokens or Terminal Symbols
2. A Set of Non-terminals
3. A Set of Production Rules
   Each Rule Has the Form NT → {T, NT}*
4. A Non-terminal Designated As the Start Symbol
CH2.4
Grammars for Syntax Definition
Example CFG
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the “|” means OR)
(So we could have written
list → list + digit | list - digit | digit )
CH2.5
Grammars are Used to Derive Strings:
Using the CFG defined on the previous slide, we can derive the string: 9 - 5 + 2 as follows:
list ⇒ list + digit              P1 : list → list + digit
     ⇒ list - digit + digit      P2 : list → list - digit
     ⇒ digit - digit + digit     P3 : list → digit
     ⇒ 9 - digit + digit         P4 : digit → 9
     ⇒ 9 - 5 + digit             P4 : digit → 5
     ⇒ 9 - 5 + 2                 P4 : digit → 2
CH2.6
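Grammars are Used to Derive Strings:
This derivation could also be represented via a Parse Tree
(parents on left, children on right)
[Figure: parse tree for 9 - 5 + 2. The root list has children list, +, digit (2). That inner list has children list, -, digit (5). The innermost list has the single child digit (9).]
list ⇒ list + digit
     ⇒ list - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2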
CH2.7
A More Complex Grammar
What is this grammar for?
What does “ε” represent?
What kind of production rule is this?
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
CH2.8
Defining a Parse Tree
More Formally, a Parse Tree for a CFG Has the Following Properties:
Root Is Labeled With the Start Symbol
Leaf Node Is a Token or ε
Interior Node (Now Leaf) Is a Non-Terminal
If A → x1x2…xn, Then A Is an Interior Node; x1x2…xn Are Children of A and May Be Non-Terminals or Tokens
CH2.9
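Other Important Concepts
Ambiguity
Grammar:
string → string + string | string – string | 0 | 1 | … | 9
Two derivations (Parse Trees) for the same token string.
[Figure: two parse trees for 9 - 5 + 2. In the first, the root is string + string: the left string expands to 9 - 5 and the right string to 2, i.e. (9 - 5) + 2. In the second, the root is string - string: the left string is 9 and the right string expands to 5 + 2, i.e. 9 - (5 + 2).]
Why is this a Problem?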
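One way to see why the two parse trees matter is to evaluate them. The sketch below is only an illustration: the nested-tuple encoding of a tree and the helper names `evaluate` and `tokens` are assumptions, not part of the slides.

```python
# A parse tree is encoded (by assumption) as a nested tuple (left, operator, right);
# a leaf is a plain integer.

def evaluate(tree):
    """Evaluate a tree bottom-up, so its shape decides the grouping."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r

def tokens(tree):
    """Read the leaves and operators left to right (the token string)."""
    if isinstance(tree, int):
        return [str(tree)]
    left, op, right = tree
    return tokens(left) + [op] + tokens(right)

first_tree = ((9, "-", 5), "+", 2)    # (9 - 5) + 2
second_tree = (9, "-", (5, "+", 2))   # 9 - (5 + 2)

print(" ".join(tokens(first_tree)), "=", evaluate(first_tree))
print(" ".join(tokens(second_tree)), "=", evaluate(second_tree))
```

Both trees yield the same token string, yet they evaluate to 6 and 2 respectively, so a translation driven by the tree is not well defined for an ambiguous grammar.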
CH2.10
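Other Important Concepts
Associativity of Operators
Left vs. Right
right → letter = right | letter
letter → a | b | c | … | z
[Figure: parse tree for a = b = c, growing to the right. The root right has children letter (a), =, right. That right has children letter (b), =, right. The last right has the single child letter (c).]
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | … | 9
[Figure: parse tree for 9 - 5 + 2, growing to the left. The root list has children list, +, digit (2). That list has children list, -, digit (5). The innermost list has the single child digit (9).]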
CH2.11
Embedding Associativity
The language of arithmetic expressions with + -
(ambiguous) grammar that does not enforce associativity
string → string + string | string – string | 0 | 1 | … | 9
non-ambiguous grammar enforcing left associativity (parse tree will grow to the left)
string → string + digit | string - digit | digit
digit → 0 | 1 | 2 | … | 9
non-ambiguous grammar enforcing right associativity (parse tree will grow to the right)
string → digit + string | digit - string | digit
digit → 0 | 1 | 2 | … | 9
CH2.12
Other Important Concepts
Operator Precedence
What does 9 + 5 * 2 mean?
Typically the precedence order is (highest first):
( )
* /
+ -
This can be incorporated into a grammar via rules:
expr → expr + term | expr – term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9
Precedence Achieved by: expr & term for each precedence level
Rules for each are left recursive or associate to the left
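To make the effect of the layered rules concrete, here is a minimal evaluator sketch with one function per precedence level. It is an illustration under assumptions: the function name `evaluate` is invented here, and the left-recursive rules are realized as loops (a common hand-coding choice) rather than as literal recursive calls.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic following the expr/term/factor layering."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():                       # factor -> digit | ( expr )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok in "0123456789":
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                         # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):     # loop keeps the grouping left-associative
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():                         # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("9 + 5 * 2"))    # * binds tighter than +
print(evaluate("(9 + 5) * 2"))  # parentheses override precedence
```

Because `expr` can only combine complete `term` values, a multiplication is always finished before the surrounding addition sees it.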
CH2.13
Syntax-Directed Translation
Associate Attributes With Grammar Rules & Constructs and Translate As Parsing Occurs
The translation will follow the parse tree structure (and as a result the structure and form of the parse tree will affect the translation).
First example: Inductive Translation.
Infix to Postfix Notation Translation for Expressions
Translation defined inductively as Postfix(E), where E is an Expression.
Rules:
1. If E is a variable or constant then Postfix(E) = E
2. If E is E1 op E2 then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
3. If E is (E1) then Postfix(E) = Postfix(E1)
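The three rules of the inductive definition map directly onto a recursive function. The sketch below assumes a made-up encoding of expressions: a string for a variable or constant, a 3-tuple `(E1, op, E2)` for a binary operation, and a 1-tuple `(E1,)` for a parenthesized expression.

```python
def postfix(e):
    """Translate an expression to postfix, one branch per rule of the definition."""
    if isinstance(e, str):       # rule 1: variable or constant
        return e
    if len(e) == 1:              # rule 3: (E1), the parentheses disappear
        return postfix(e[0])
    e1, op, e2 = e               # rule 2: E1 op E2
    return f"{postfix(e1)} {postfix(e2)} {op}"

# ( 9 - 5 ) + 2
print(postfix(((("9", "-", "5"),), "+", "2")))
# 9 - ( 5 + 2 )
print(postfix(("9", "-", (("5", "+", "2"),))))
```

The recursion visits the operands before emitting the operator, which is exactly why the operator ends up after both of its arguments.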
CH2.14
Examples
Postfix( ( 9 – 5 ) + 2 )
  = Postfix( ( 9 – 5 ) ) Postfix( 2 ) +
  = Postfix( 9 – 5 ) Postfix( 2 ) +
  = Postfix( 9 ) Postfix( 5 ) – Postfix( 2 ) +
  = 9 5 – 2 +
Postfix( 9 – ( 5 + 2 ) )
  = Postfix( 9 ) Postfix( ( 5 + 2 ) ) –
  = Postfix( 9 ) Postfix( 5 + 2 ) –
  = Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + –
  = 9 5 2 + –
CH2.15
Syntax-Directed Definition
Each Production Has a Set of Semantic Rules
Each Grammar Symbol Has a Set of Attributes
For the Following Example, String Attribute “t” is Associated With Each Grammar Symbol
expr → expr – term | expr + term | term
term → 0 | 1 | 2 | 3 | … | 9
recall: What is a Derivation for 9 + 5 - 2?
expr ⇒ expr - term
     ⇒ expr + term - term
     ⇒ term + term - term
     ⇒ 9 + term - term
     ⇒ 9 + 5 - term
     ⇒ 9 + 5 - 2
CH2.16
Syntax-Directed Definition (2)
Each Production Rule of the CFG Has a Semantic Rule
Production              Semantic Rule
expr → expr + term      expr.t := expr.t || term.t || ‘+’
expr → expr – term      expr.t := expr.t || term.t || ‘-’
expr → term             expr.t := term.t
term → 0                term.t := ‘0’
term → 1                term.t := ‘1’
….                      ….
term → 9                term.t := ‘9’
Note: Semantic Rules for expr define t as a “synthesized attribute”, i.e., the various copies of t obtain their values from “children t’s”
CH2.17
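Semantic Rules are Embedded in Parse Tree
[Figure: annotated parse tree for 9 - 5 + 2. The root has expr.t = 95-2+; its children are an expr with expr.t = 95-, the token +, and a term with term.t = 2. The expr with expr.t = 95- has children expr (expr.t = 9), the token -, and term (term.t = 5). The expr with expr.t = 9 has the single child term (term.t = 9).]
How Do Semantic Rules Work?
What Type of Tree Traversal is Being Performed?
How Can We More Closely Associate Semantic Rules With Production Rules?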
CH2.18
Translation Schemes
Embed Semantic Actions into the right sides of the productions.
expr → expr + term {print(‘+’)}
     | expr - term {print(‘-’)}
     | term
term → 0 {print(‘0’)}
term → 1 {print(‘1’)}
…
term → 9 {print(‘9’)}
[Figure: parse tree for 9 - 5 + 2 with the actions attached as extra leaves: {print(‘9’)}, {print(‘5’)} and {print(‘2’)} under the three term nodes, {print(‘-’)} as the last child of the inner expr, and {print(‘+’)} as the last child of the root expr.]
CH2.19
Parsing – Top-Down & Predictive
Top-Down Parsing: Parse tree / derivation of a token string occurs in a top down fashion.
For Example, Consider:
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
Suppose input is :
array [ num dotdot num ] of integer
Parsing would begin with the start symbol:
type → ???
CH2.20
Top-Down Parse (type = start symbol)
Input : array [ num dotdot num ] of integer
[Figure: successive parse-tree snapshots, each with the lookahead symbol marked on the input. First, the tree is the single node type, not yet expanded (“?”). Next, type is expanded to array [ simple ] of type. Then simple is expanded to num dotdot num.]
CH2.21
Top-Down Parse (type = start symbol)
Input : array [ num dotdot num ] of integer
[Figure: the final two parse-tree snapshots, each with the lookahead symbol marked on the input. The remaining type is expanded to simple, and that simple is then expanded to integer, completing the tree.]
CH2.22
Top-Down Process
Recursive Descent or Predictive Parsing
Parser Operates by Attempting to Match Tokens in the Input Stream
Utilize both Grammar and Input Below to Motivate Code for Algorithm
array [ num dotdot num ] of integer
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
procedure match ( t : token ) ;
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end ;
CH2.23
Top-Down Algorithm (Continued)
procedure type ;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = ‘↑’ then begin
        match ( ‘↑’ ) ; match ( id )
    end
    else if lookahead = array then begin
        match ( array ) ; match ( ‘[’ ) ; simple ; match ( ‘]’ ) ; match ( of ) ; type
    end
    else error
end ;
procedure simple ;
begin
    if lookahead = integer then
        match ( integer )
    else if lookahead = char then
        match ( char )
    else if lookahead = num then begin
        match ( num ) ; match ( dotdot ) ; match ( num )
    end
    else error
end ;
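The procedures above can be transcribed almost line for line into a runnable form. This is a sketch under assumptions: tokens are plain strings, `^` stands in for the ↑ token, `$` is an end marker invented here, and the class name `Parser` and helper `parse` are illustrative.

```python
class Parser:
    """Predictive parser for the type / simple grammar, one method per non-terminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0
        self.lookahead = self.tokens[0]

    def error(self, expected):
        raise SyntaxError(f"expected {expected}, found {self.lookahead!r}")

    def match(self, t):
        # Advance only when the lookahead is the token the grammar requires.
        if self.lookahead != t:
            self.error(repr(t))
        self.pos += 1
        self.lookahead = self.tokens[self.pos]

    def type_(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            self.error("a type")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            self.error("a simple type")


def parse(tokens):
    """Return True if the token list is derivable from type; raise SyntaxError otherwise."""
    parser = Parser(tokens)
    parser.type_()
    if parser.lookahead != "$":
        parser.error("end of input")
    return True


print(parse("array [ num dotdot num ] of integer".split()))
```

Each branch is selected by inspecting a single lookahead token, which is what makes the parser predictive: it never has to back up.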
CH2.24
Tracing
Input: array [ num dotdot num ] of integer
To initialize the parser:
    set global variable : lookahead = array
    call procedure: type
Procedure call to type with lookahead = array results in the actions:
    match( array ); match(‘[’); simple; match(‘]’); match(of); type
Procedure call to simple with lookahead = num results in the actions:
    match (num); match (dotdot); match (num)
Procedure call to type with lookahead = integer results in the actions:
    simple
Procedure call to simple with lookahead = integer results in the actions:
    match ( integer )
CH2.25
Limitations
Can we apply the previous technique to every grammar?
NO:
type → simple
     | array [ simple ] of type
simple → integer
       | array digit
digit → 0|1|2|3|4|5|6|7|8|9
consider the string “array 6”
the predictive parser starts with type and lookahead = array
apply production type → simple OR type → array [ simple ] of type ????
CH2.26
Designing a Predictive Parser
Consider A → α
FIRST(α) = set of leftmost tokens that appear in α or in strings generated by α.
E.g. FIRST(type) = {↑, array, integer, char, num}
Consider productions of the form A → α, A → β: the sets FIRST(α) and FIRST(β) should be disjoint
Then we can implement predictive parsing (initially: start NT + lookahead = leftmost token)
Starting with A → ? we find which FIRST(α) set the lookahead symbol belongs to and we use this production.
Any non-terminal results in the corresponding procedure call
Terminals are matched.
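The FIRST computation and the disjointness check can be sketched as a small fixed-point iteration. The encoding below is an assumption made for illustration: a grammar is a dict from non-terminal to a list of alternatives, `^` stands in for ↑, and the grammar is assumed to have no ε-productions (so only the first symbol of each alternative matters).

```python
def first_of(alternative, grammar, first):
    """FIRST of one right-hand side, assuming no epsilon-productions."""
    head = alternative[0]
    return first[head] if head in grammar else {head}

def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def is_predictive(grammar):
    """True if, for every non-terminal, the FIRST sets of its alternatives are disjoint."""
    first = first_sets(grammar)
    for alternatives in grammar.values():
        seen = set()
        for alt in alternatives:
            f = first_of(alt, grammar, first)
            if seen & f:
                return False
            seen |= f
    return True

type_grammar = {
    "type": [["simple"], ["^", "id"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}
# The grammar from the Limitations slide: two alternatives of type can start with array.
limited_grammar = {
    "type": [["simple"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["array", "digit"]],
    "digit": [[d] for d in "0123456789"],
}

print(sorted(first_sets(type_grammar)["type"]))
print(is_predictive(type_grammar), is_predictive(limited_grammar))
```

A grammar with ε-productions would need the fuller definition of FIRST, which this sketch deliberately leaves out.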
CH2.27
Problems with Top Down Parsing
Left Recursion in CFG May Cause Parser to Loop Forever. Indeed:
In the production A → Aα we write the program
procedure A
{
    if lookahead belongs to First(Aα) then
        call the procedure A
}
Solution: Remove Left Recursion... without changing the Language defined by the Grammar.
CH2.28
Dealing with Left Recursion
Solution: Algorithm to Remove Left Recursion:
expr → expr + term | expr - term | term
term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
becomes
expr → term rest
rest → + term rest | - term rest | ε
term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BASIC IDEA:
A → Aα | β
becomes
A → βR
R → αR | ε
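The basic idea can be written as a small function over one non-terminal's alternatives. This sketch handles only immediate left recursion; the list-of-symbols encoding, the empty list standing for ε, and the function name `remove_left_recursion` are assumptions made for illustration.

```python
def remove_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 R | ..., R -> a1 R | ... | epsilon.

    alternatives is a list of right-hand sides, each a list of symbols.
    The empty list [] stands for epsilon.
    """
    # The alpha parts: what follows A in each left-recursive alternative.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}      # nothing to do
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }

expr_rules = [["expr", "+", "term"], ["expr", "-", "term"], ["term"]]
for head, alts in remove_left_recursion("expr", expr_rules, "rest").items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Applied to the expr rules it reproduces the expr / rest grammar shown above, with rest now recursing on the right.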
CH2.29
What happens to semantic actions?
expr → expr + term {print(‘+’)}
     | expr - term {print(‘-’)}
     | term
term → 0 {print(‘0’)}
term → 1 {print(‘1’)}
…
term → 9 {print(‘9’)}
becomes
expr → term rest
rest → + term {print(‘+’)} rest
     | - term {print(‘-’)} rest
     | ε
term → 0 {print(‘0’)}
term → 1 {print(‘1’)}
…
term → 9 {print(‘9’)}
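The transformed translation scheme can be run directly as a recursive-descent translator, with each action executed at the point where it sits in the production. This is a sketch: the function name `infix_to_postfix` is invented here, and the actions append to a list instead of printing so the result can be inspected.

```python
DIGITS = set("0123456789")

def infix_to_postfix(text):
    """Translate single-digit +/- expressions using expr -> term rest."""
    tokens = [ch for ch in text if not ch.isspace()]
    out = []          # stands in for the print actions
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(t):
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r}, found {lookahead()!r}")
        pos += 1

    def term():                     # term -> digit {print(digit)}
        t = lookahead()
        if t not in DIGITS:
            raise SyntaxError(f"expected a digit, found {t!r}")
        match(t)
        out.append(t)

    def rest():                     # rest -> + term {print('+')} rest | - term {print('-')} rest | ε
        t = lookahead()
        if t in ("+", "-"):
            match(t)
            term()
            out.append(t)           # the action sits between term and rest
            rest()
        # otherwise take the ε alternative and consume nothing

    def expr():                     # expr -> term rest
        term()
        rest()

    expr()
    if lookahead() is not None:
        raise SyntaxError(f"unexpected token {lookahead()!r}")
    return "".join(out)

print(infix_to_postfix("9 - 5 + 2"))
```

Keeping the action before the recursive `rest()` call is what preserves left associativity in the output even though the grammar now recurses on the right.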
CH2.30
Comparing Grammars with Left Recursion
Notice Location of Semantic Actions in Tree
What is Order of Processing?
[Figure: parse tree for 9 - 5 + 2 under the left-recursive grammar. The root expr has children expr, +, term, {print(‘+’)}; the term has children 2, {print(‘2’)}. The inner expr has children expr, -, term, {print(‘-’)}; its term has children 5, {print(‘5’)}. The innermost expr has the single child term, which has children 9, {print(‘9’)}.]
CH2.31
Comparing Grammars without Left Recursion
Now, Notice Location of Semantic Actions in Tree for Revised Grammar
What is Order of Processing in this Case?
[Figure: parse tree for 9 - 5 + 2 under the revised grammar. The root expr has children term and rest; the term has children 9, {print(‘9’)}. That rest has children -, term, {print(‘-’)}, rest; its term has children 5, {print(‘5’)}. The next rest has children +, term, {print(‘+’)}, rest; its term has children 2, {print(‘2’)}. The last rest derives ε.]
CH2.32
The Lexical Analysis Process
A Graphical Depiction
[Figure: the lexical analyzer, lexan ( ), sits between the input and its caller.]
lexan ( ), the lexical analyzer:
uses getchar ( ) to read character
pushes back c using ungetc (c , stdin)
returns token to caller
sets global variable tokenval to attribute value
CH2.33
The Lexical Analysis Process
Functional Responsibilities
Input Token String Is Broken Down
White Space and Comments Are Filtered Out
Individual Tokens With Associated Values Are Identified
Symbol Table Is Initialized and Entries Are Constructed for Each “Appropriate” Token
Under What Conditions will a Character be Pushed Back?
CH2.34
Example of a Lexical Analyzer
function lexan: integer ;
var lexbuf : array[ 0 .. 100 ] of char ;
    c : char ;
begin
    loop begin
        read a character into c ;
        if c is a blank or a tab then
            do nothing
        else if c is a newline then
            lineno := lineno + 1
        else if c is a digit then begin
            set tokenval to the value of this and following digits ;
            return NUM
        end
CH2.35
Algorithm for Lexical Analyzer
        else if c is a letter then begin
            place c and successive letters and digits into lexbuf ;
            p := lookup ( lexbuf ) ;
            if p = 0 then
                p := insert ( lexbuf, ID ) ;
            tokenval := p ;
            return the token field of table entry p
        end
        else begin
            set tokenval to NONE ; /* there is no attribute */
            return integer encoding of character c
        end
    end
end
Note: Insert / Lookup operations occur against the Symbol Table !
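The algorithm above can be sketched as a small class. Several things here are assumptions made for illustration: the input is a string rather than stdin, stopping before the first character that does not belong to the token plays the role of the ungetc push-back, tokens are strings such as `"NUM"` rather than integer codes, and `DONE` is an end-of-input token invented for the sketch.

```python
NUM, ID, DIV, MOD, DONE = "NUM", "ID", "DIV", "MOD", "DONE"
NONE = None
DIGITS = "0123456789"

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.lineno = 1
        self.tokenval = NONE
        # Entry 0 is unused so that lookup can return 0 for "not found".
        self.symtable = [None, ("div", DIV), ("mod", MOD)]

    def lookup(self, lexeme):
        for p in range(1, len(self.symtable)):
            if self.symtable[p][0] == lexeme:
                return p
        return 0

    def insert(self, lexeme, token):
        self.symtable.append((lexeme, token))
        return len(self.symtable) - 1

    def lexan(self):
        while self.pos < len(self.text):
            c = self.text[self.pos]
            self.pos += 1
            if c in " \t":
                continue                              # strip blanks and tabs
            if c == "\n":
                self.lineno += 1
            elif c in DIGITS:
                start = self.pos - 1
                while self.pos < len(self.text) and self.text[self.pos] in DIGITS:
                    self.pos += 1                     # stop before the first non-digit
                self.tokenval = int(self.text[start:self.pos])
                return NUM
            elif c.isalpha():
                start = self.pos - 1
                while self.pos < len(self.text) and self.text[self.pos].isalnum():
                    self.pos += 1
                lexeme = self.text[start:self.pos]
                p = self.lookup(lexeme) or self.insert(lexeme, ID)
                self.tokenval = p
                return self.symtable[p][1]            # token field of entry p
            else:
                self.tokenval = NONE                  # there is no attribute
                return c
        return DONE

lexer = Lexer("count div 12\n+ i")
token = lexer.lexan()
while token != DONE:
    print(token, lexer.tokenval)
    token = lexer.lexan()
```

Reserved words come back with their own token because they were preloaded into the table, so the identifier branch needs no special case for them.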
CH2.36
Symbol Table Considerations
ARRAY symtable
index   lexptr      token   attributes
0
1       → div       div
2       → mod       mod
3       → count     id
4       → i         id
ARRAY lexemes
d i v EOS m o d EOS c o u n t EOS i EOS
OPERATIONS:
Insert (string, token_ID)
Lookup (string)
NOTICE:
Reserved words are placed into symbol table for easy lookup
Attributes may be associated with each entry, i.e., Semantic Actions, Typing Info: id → integer, etc.
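The two-array layout can be sketched as follows. The class name `SymbolTable`, the use of a NUL character for EOS, and the tuple shape `(lexptr, token)` are assumptions for illustration; the attributes column is omitted.

```python
EOS = "\0"   # end-of-string marker inside the lexemes array (assumed encoding)

class SymbolTable:
    def __init__(self):
        self.lexemes = []          # one flat array of characters
        self.symtable = [None]     # entry 0 unused; others are (lexptr, token)
        for word in ("div", "mod"):
            self.insert(word, word)   # reserved words are preloaded

    def _lexeme_at(self, lexptr):
        """Read characters from lexptr up to the next EOS."""
        end = self.lexemes.index(EOS, lexptr)
        return "".join(self.lexemes[lexptr:end])

    def lookup(self, s):
        """Return the index of the entry for s, or 0 if s is not present."""
        for p in range(1, len(self.symtable)):
            if self._lexeme_at(self.symtable[p][0]) == s:
                return p
        return 0

    def insert(self, s, token):
        """Append s to the lexemes array and return the index of its new entry."""
        lexptr = len(self.lexemes)
        self.lexemes.extend(s)
        self.lexemes.append(EOS)
        self.symtable.append((lexptr, token))
        return len(self.symtable) - 1

table = SymbolTable()
table.insert("count", "id")
table.insert("i", "id")
print(table.symtable[1:])
print("".join(table.lexemes).replace(EOS, " EOS "))
```

Storing lexemes in one shared array means entries stay fixed-size regardless of identifier length, at the cost of a linear scan in this simple version of lookup.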
CH2.37
A Brief Look at Code Generation
Back-end of Compilation Process - Which Will Not Be Our Emphasis
We’ll Focus on Front-end
Important Concepts to Re-emphasize
• Abstract Stack Machine for Intermediate Code Generation: (i) basic arithmetic, (ii) stack, (iii) flow control
• L-value Vs. R-value of an identifier
    I := 5 ;        L - Location
    I := I + 1 ;    R - Contents
CH2.38
A Brief Look at Code Generation
Employ Statement Templates for Code Generation.
Each Template Characterizes the Translation
Different Templates for Each Major Programming Language Construct: if, while, procedure, etc.
IF
    code for expr
    gofalse out
    code for stmt
    label out
WHILE
    label test
    code for expr
    gofalse out
    code for stmt
    goto test
    label out
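The two templates can be sketched as functions that splice already generated code for the expression and the statement between the control-flow instructions. The class name `CodeGen`, the label naming scheme (`L1`, `L2`, ...) and the placeholder strings for the nested code are assumptions; only `gofalse`, `goto` and `label` come from the templates.

```python
class CodeGen:
    """Emit stack-machine instructions as a list of strings, one per line."""

    def __init__(self):
        self.count = 0

    def new_label(self):
        # Fresh names keep nested or repeated statements from clashing.
        self.count += 1
        return f"L{self.count}"

    def gen_if(self, expr_code, stmt_code):
        out = self.new_label()
        return [*expr_code, f"gofalse {out}", *stmt_code, f"label {out}"]

    def gen_while(self, expr_code, stmt_code):
        test = self.new_label()
        out = self.new_label()
        return [
            f"label {test}",
            *expr_code,
            f"gofalse {out}",
            *stmt_code,
            f"goto {test}",
            f"label {out}",
        ]

gen = CodeGen()
body = gen.gen_if(["<code for inner expr>"], ["<code for inner stmt>"])
for line in gen.gen_while(["<code for expr>"], body):
    print(line)
```

Because each template returns a list, the output of one can be passed as the statement of another, which is how nesting such as an if inside a while falls out of the same two functions.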
CH2.39
Concluding Remarks / Looking Ahead
We’ve Reviewed / Highlighted Entire Compilation Process
Introduced Context-free Grammars (CFG) and Indicated / Illustrated Relationship to Compiler Theory
Reviewed Many Different Versions of Parse Trees That Assist in Both Recognition and Translation
We’ll Return to Beginning - Lexical Analysis
We’ll Explore Close Relationship of Lexical Analysis to Regular Expressions, Grammars, and Finite Automata