Intermediate Code Generation Professor Yihjia Tsai Tamkang University

Post on 20-Dec-2015


Intermediate Code Generation

Professor Yihjia Tsai, Tamkang University

Sanath Jayasena/Apr 2006 7-2

Introduction

• Intermediate representation (IR)
  – Generally a program for an abstract machine (can be assembly language or slightly above)
  – Easy to produce and translate into target code
• Why?
  – When a re-targetable compiler is needed
    • i.e., if we are planning a portable compiler, with different back ends
  – Better/easier for some optimizations
    • Machine code can be more complex

Sanath Jayasena/Apr 2006 7-3

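[Figure: four source languages (Java, ML, Pascal, C) and four target machines (Sparc, MIPS, Pentium, Alpha), drawn once with every language connected directly to every machine and once with all of them connected through a single Intermediate Representation]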

Sanath Jayasena/Apr 2006 7-4

Introduction …contd

• Front end can do scanning, parsing, semantic analysis and translation to IR

• Back end will then optimize and generate target code

• IR can modularize the task
  – Front end not bothered about machine details
  – Back end not bothered about source language

Sanath Jayasena/Apr 2006 7-5

Introduction …contd

• Qualities of a good IR
  – Convenient for semantic analysis phase to produce
  – Convenient to translate into machine language of all desired target hardware
  – Each construct has a clear and simple meaning
    • Easy for optimizing transformations

Sanath Jayasena/Apr 2006 7-6

Intermediate Representations

• Abstract syntax trees

• Postfix notation

• Directed acyclic graphs (DAGs)

• Three-address code (3AC)

Sanath Jayasena/Apr 2006 7-7

Abstract Syntax Trees

• Also called Intermediate Rep. (IR) trees
  – Has individual components that describe only very simple things
  – E.g., load, store, add, move, jump
  – E.g., pp. 136-139, Tiger book (see handout)

Sanath Jayasena/Apr 2006 7-8

Postfix Notation

• For an expression E, inductively:
  1. If E is a var or const, the postfix notation is E
  2. If E is of the form E1 <op> E2, the postfix notation is E1’ E2’ <op>, where E1’, E2’ are the postfix notations for E1, E2
  3. If E is of the form (E1), then the postfix notation for E1 is also that for E
  – Parentheses are unnecessary in postfix notation

Sanath Jayasena/Apr 2006 7-9

Example

• What are the postfix notations for (9-5)+2 and 9-(5+2)?

• (9-5)+2 in postfix notation is 95-2+

• 9-(5+2) in postfix notation is 952+-
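The inductive definition can be sketched in a few lines of Python. The nested-tuple encoding of expressions and the name `to_postfix` are assumptions made for this illustration only; they do not come from the slides.

```python
# Expressions are modeled as nested tuples (an assumption of this sketch):
#   a variable or constant is a string, e.g. "9"
#   a binary operation is (op, left, right), e.g. ("-", "9", "5")
# Parentheses never appear in this form; the nesting already records
# grouping, which is why rule 3 needs no code of its own.

def to_postfix(expr):
    """Return the postfix notation of expr, following rules 1 and 2."""
    if isinstance(expr, str):        # rule 1: var or const
        return expr
    op, left, right = expr           # rule 2: E1 <op> E2
    return to_postfix(left) + to_postfix(right) + op

print(to_postfix(("+", ("-", "9", "5"), "2")))  # (9-5)+2 -> 95-2+
print(to_postfix(("-", "9", ("+", "5", "2"))))  # 9-(5+2) -> 952+-
```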

Sanath Jayasena/Apr 2006 7-10

Syntax-Directed Translation

• Translation guided by CFGs
  – Based on “attributes” of language constructs
    • E.g., type, string, number, memory location
  – Attach attributes to grammar symbols
  – Values for attributes computed by semantic rules associated with productions
• Translation of a language construct in terms of attributes associated with its syntactic components

Sanath Jayasena/Apr 2006 7-11

Syntax-Directed Translation …contd

• Two notations for associating semantic rules with productions in a CFG

1. Syntax-directed definitions
  • High-level specs, details hidden, order of translation unspecified
2. Translation schemes
  • Order of translations specified, more details shown

• [Dragon book: Section 2.3 and Chapter 5]

Sanath Jayasena/Apr 2006 7-12

Syntax-Directed Definitions

• For each grammar symbol: associate a set of attributes (synthesized and inherited)
• For each production: a semantic rule defines the value of an attribute at a parse-tree node, namely the rule associated with the production used at that node
• Syntax-directed definition = grammar + set of semantic rules

Sanath Jayasena/Apr 2006 7-13

Annotated Parse Tree

• A parse tree showing the attribute values at each node
• Used for translation (which is an input-to-output mapping)
  – For input x, construct parse tree for x
  – If a node n in the tree is labeled by symbol Y
    • Value of attribute p of Y at node n denoted as Y.p
    • Value of Y.p computed using semantic rule for attribute p associated with the Y-production at n

Sanath Jayasena/Apr 2006 7-14

Synthesized Attributes

• An attribute is synthesized if its value at a parse tree node is determined from those at the child nodes

• Can be evaluated with a single bottom-up tree traversal (e.g., a postorder depth-first traversal)

• A syntax-directed definition that uses these exclusively is said to be an S-attributed definition

Sanath Jayasena/Apr 2006 7-15

Example 1: Translating expressions into postfix

“.t” is a string-valued attribute, || is concatenation

Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t || term.t || ‘+’
expr → expr1 - term     expr.t := expr1.t || term.t || ‘-’
expr → term             expr.t := term.t
term → 0                term.t := ‘0’
…                       …
term → 9                term.t := ‘9’

Sanath Jayasena/Apr 2006 7-16

Example 1 …contd

Annotated parse tree corresponding to “9-5+2” (children indented under their parent):

expr.t = 95-2+
  expr.t = 95-
    expr.t = 9
      term.t = 9
        9
    -
    term.t = 5
      5
  +
  term.t = 2
    2

Sanath Jayasena/Apr 2006 7-17

Example 2: Syntax-directed definition for desk calculator program

Production      Semantic Rule
L → E $         print(E.val)
E → E1 + T      E.val := E1.val + T.val
E → T           E.val := T.val
T → T1 * F      T.val := T1.val × F.val
T → F           T.val := F.val
F → digit       F.val := digit.lexval

Draw the annotated parse tree for “3*5+4 $”

Sanath Jayasena/Apr 2006 7-18

Example 2 …contd

Annotated parse tree corresponding to “3*5+4 $” (children indented under their parent):

L
  E.val = 19
    E.val = 15
      T.val = 15
        T.val = 3
          F.val = 3
            digit.lexval = 3
        *
        F.val = 5
          digit.lexval = 5
    +
    T.val = 4
      F.val = 4
        digit.lexval = 4
  $
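As a rough sketch of how the synthesized attribute val could be computed bottom-up, the following Python walks a hand-built parse tree for “3*5+4 $”. The tuple encoding of parse-tree nodes and the helper names `val` and `F` are assumptions for illustration, not part of the slides.

```python
# Parse-tree nodes are tuples whose first element names the grammar symbol
# (this encoding is an assumption of the sketch):
#   ("digit", lexval)               leaf supplied by the scanner
#   ("F", digit)                    F -> digit
#   ("T", F) or ("T", T1, "*", F)   T -> F | T1 * F
#   ("E", T) or ("E", E1, "+", T)   E -> T | E1 + T
#   ("L", E, "$")                   L -> E $

def val(node):
    """Compute the synthesized attribute val in one bottom-up pass."""
    symbol, *children = node
    if symbol == "digit":
        return children[0]                   # digit.lexval
    if symbol == "L":
        return val(children[0])              # the value that L -> E $ prints
    if len(children) == 1:
        return val(children[0])              # E -> T, T -> F, F -> digit
    left, op, right = children
    if op == "+":
        return val(left) + val(right)        # E.val := E1.val + T.val
    return val(left) * val(right)            # T.val := T1.val x F.val

def F(n):
    """Build the subtree F -> digit for a single digit n."""
    return ("F", ("digit", n))

# Parse tree for "3*5+4 $"
tree = ("L",
        ("E", ("E", ("T", ("T", F(3)), "*", F(5))), "+", ("T", F(4))),
        "$")
print(val(tree))  # 19
```

Each call to `val` only looks at the children of its node, which is what makes the attribute synthesized.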

Sanath Jayasena/Apr 2006 7-19

Inherited Attributes

• Value at a node is defined using attributes at siblings and/or parent of the node

• Useful for tracking the context of a construct
  – E.g., decide whether address or value of a var is needed by keeping track of whether it appears on RHS or LHS of an assignment

Sanath Jayasena/Apr 2006 7-20

Example: Syntax-directed definition with inherited attribute L.in for declaration of variables of type int or real

Production      Semantic Rule
D → T L         L.in := T.type
T → int         T.type := integer
T → real        T.type := real
L → L1 , id     L1.in := L.in
                addtype(id.entry, L.in)
L → id          addtype(id.entry, L.in)

Draw the annotated parse tree for “real id1, id2, id3”

Sanath Jayasena/Apr 2006 7-21

Example …contd

Annotated parse tree for “real id1, id2, id3” with inherited attribute in at each node L (children indented under their parent):

D
  T.type = real
    real
  L.in = real
    L.in = real
      L.in = real
        id1
      ,
      id2
    ,
    id3
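The way L.in flows downward can be sketched as a parameter passed from parent to child. In this Python illustration the tuple encoding of the tree, the dict used as a toy symbol table, and the function names `eval_D`, `eval_T` and `eval_L` are all assumptions; only `addtype` is named in the slides, and its body here is a stand-in.

```python
symbol_table = {}   # toy stand-in for the real symbol table (assumption)

def addtype(entry, type_):
    """Record the declared type for an identifier entry."""
    symbol_table[entry] = type_

def eval_T(keyword):
    # T -> int: T.type := integer ; T -> real: T.type := real
    return {"int": "integer", "real": "real"}[keyword]

def eval_L(node, inherited):
    # node is ("L", id) or ("L", L1, id); `inherited` plays the role of L.in
    if len(node) == 3:
        _, l1, ident = node
        eval_L(l1, inherited)        # L1.in := L.in
    else:
        _, ident = node
    addtype(ident, inherited)        # addtype(id.entry, L.in)

def eval_D(node):
    _, keyword, l = node             # D -> T L
    eval_L(l, eval_T(keyword))       # L.in := T.type

# "real id1, id2, id3"
eval_D(("D", "real", ("L", ("L", ("L", "id1"), "id2"), "id3")))
print(symbol_table)  # {'id1': 'real', 'id2': 'real', 'id3': 'real'}
```

The value of `inherited` is never computed from a node's children, which is the difference from a synthesized attribute.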

Sanath Jayasena/Apr 2006 7-22

Translation Schemes

• Semantic actions embedded within RHS of productions
  – Unlike syntax-directed definitions, order of evaluation of semantic rules explicitly shown
  – Action to be taken shown by enclosing in { }
    • E.g., rterm → term { print (‘+’) } rterm1
  – In a parse tree in this context, an action is shown by an extra child node & dashed edge

Sanath Jayasena/Apr 2006 7-23

Depth-First Order

• L-attributed definitions
  – Attributes can always be evaluated in depth-first order (left-to-right)
• Translation schemes with restrictions motivated by L-attributed definitions ensure that an attribute value is available when an action refers to it
  – E.g., when only synthesized attributes exist

Sanath Jayasena/Apr 2006 7-24

Example

• Translation scheme that maps infix expressions with addition/subtraction into corresponding postfix expressions

E → T R

R → addop T { print(addop.lexeme) } R1 | Λ

R → subop T { print(subop.lexeme) } R2 | Λ

T → num { print(num.val) }

• Show the parse tree for “9-5+2”

Sanath Jayasena/Apr 2006 7-25

Example …contd

Parse tree for “9-5+2” showing actions; when performed in depth-first order, prints “95-2+” (children indented under their parent):

E
  T
    9
    { print (‘9’) }
  R
    -
    T
      5
      { print (‘5’) }
    { print (‘-’) }
    R
      +
      T
        2
        { print (‘2’) }
      { print (‘+’) }
      R
        Λ
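One common way to run a translation scheme of the form E → T R, R → addop T { print } R | Λ, T → num { print } is a recursive-descent translator with one function per non-terminal, each action placed where it sits in the production. The sketch below assumes the input is already a list of single-character tokens and collects output in a list instead of printing; the name `translate` is hypothetical.

```python
def translate(tokens):
    """Translate tokens such as ['9', '-', '5', '+', '2'] into postfix."""
    out = []
    pos = 0

    def T():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected num")
        out.append(tokens[pos])          # { print(num.val) }
        pos += 1

    def R():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            T()
            out.append(op)               # action sits after T and before R1
            R()
        # otherwise R -> Λ: consume nothing, emit nothing

    T()                                  # E -> T R
    R()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return "".join(out)

print(translate(["9", "-", "5", "+", "2"]))  # 95-2+
```

Because the operator is emitted only after the call to `T()` returns, the output order matches a depth-first, left-to-right walk of the parse tree.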

Sanath Jayasena/Apr 2006 7-26

Emitting a Translation

• For simple syntax-directed definitions, implementation is possible with translation schemes where actions print the additional strings in the order of appearance
  – [Simple: the string representing the translation of the non-terminal on the LHS of each production is the concatenation of the translations of the non-terminals on the RHS, in the same order as in the production, with some additional strings (perhaps none) interleaved]

Sanath Jayasena/Apr 2006 7-27

Example

• A translation scheme derived from Example in slide 7-15

expr → expr + term { print (‘+’) }
expr → expr - term { print (‘-’) }
expr → term
term → 0 { print (‘0’) }
term → 1 { print (‘1’) }
…
term → 9 { print (‘9’) }

Sanath Jayasena/Apr 2006 7-28

Example …contd

Actions translating “9-5+2” into “95-2+” (children indented under their parent):

expr
  expr
    expr
      term
        9
        { print (‘9’) }
    -
    term
      5
      { print (‘5’) }
    { print (‘-’) }
  +
  term
    2
    { print (‘2’) }
  { print (‘+’) }

Sanath Jayasena/Apr 2006 7-29

Constructing Syntax Trees

• Syntax-directed definitions can be used
• Recall: syntax tree is a condensed form of parse tree
  – Operators, keywords appear as interior nodes
• Construction: similar to postfix notation
  – For a subexpression, create a node for each operator and operand
  – Children of operator node represent operands (as subexpressions) of that operator

Sanath Jayasena/Apr 2006 7-30

Nodes in a Syntax Tree

• A node is like a record with many fields:
  – label, pointers to operand nodes, value, etc.
• 3 basic functions to create nodes
  – mknode(op, left, right): operator node with label op, two pointer fields left and right
  – mkleaf(id, entry): ID node with label id and field entry pointing to symbol-table entry
  – mkleaf(num, val): a NUM node with label num and value field containing value of number

Sanath Jayasena/Apr 2006 7-31

Example

• From Example 5.7, p. 288
  – What is the sequence of calls to create the syntax tree for the expression “a - 4 + c”?

p1 = mkleaf(id, entry_a);
p2 = mkleaf(num, 4);
p3 = mknode(‘-’, p1, p2);
p4 = mkleaf(id, entry_c);
p5 = mknode(‘+’, p3, p4);

What is the syntax tree?
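A minimal Python sketch of the node record and the creation functions, assuming a single `Node` class with optional fields and plain strings in place of symbol-table pointers. The `Node` class and the `show` helper are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    entry: Optional[str] = None     # stand-in for a symbol-table pointer
    val: Optional[int] = None

def mknode(op, left, right):
    """Operator node with label op and two operand pointers."""
    return Node(label=op, left=left, right=right)

def mkleaf(label, value):
    """Leaf node: an id leaf keeps an entry, a num leaf keeps a value."""
    if label == "id":
        return Node(label="id", entry=value)
    return Node(label="num", val=value)

# The call sequence for "a - 4 + c"
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)

def show(n):
    """Render a tree as a fully parenthesized string."""
    if n.label == "id":
        return n.entry
    if n.label == "num":
        return str(n.val)
    return f"({show(n.left)} {n.label} {show(n.right)})"

print(show(p5))  # ((a - 4) + c)
```

The root is the ‘+’ node, its left child is the ‘-’ node over a and 4, and its right child is the leaf for c.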

Sanath Jayasena/Apr 2006 7-32

Constructing Syntax Trees …contd

• A syntax-directed definition may be used for constructing a syntax tree
  – Semantic rules: calls to functions mknode( ) and mkleaf( )
  – E.g., for the production E → E1 + T, we may have the semantic rule
    E.nptr = mknode(‘+’, E1.nptr, T.nptr)
  – Example 5.8, p. 289

Sanath Jayasena/Apr 2006 7-33

DAGs for Expressions

• A dag for an expression identifies common subexpressions
  – Unlike a syntax tree, a node for a common subexpression may have > 1 parent node
  – E.g., “a + a * (b-c) + (b-c) * d”
    • Fig. 5.11, p.291
• How to create a dag, given an expression?
  – Check if an identical node already exists
  – Example 5.9, p. 291
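The "check if an identical node already exists" step can be sketched with a dictionary keyed by a node's label and children. The class name `DagBuilder`, its method `node`, and the use of list positions as node identities are assumptions of this illustration, not the textbook's code.

```python
class DagBuilder:
    """Create nodes on demand, reusing any node with the same signature."""

    def __init__(self):
        self.nodes = []     # node i is the (label, left, right) tuple at index i
        self.index = {}     # signature -> position in self.nodes

    def node(self, label, left=None, right=None):
        key = (label, left, right)
        if key not in self.index:           # no identical node yet: create it
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]              # otherwise return the existing one

# a + a * (b - c) + (b - c) * d
dag = DagBuilder()
a, b, c, d = (dag.node(name) for name in "abcd")
first = dag.node("-", b, c)
second = dag.node("-", b, c)                # same signature, so no new node
root = dag.node("+",
                dag.node("+", a, dag.node("*", a, first)),
                dag.node("*", second, d))

print(first == second)   # True
print(len(dag.nodes))    # 9
```

A plain syntax tree for the same expression would need 13 nodes, because a and (b - c) would each be built twice.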

Sanath Jayasena/Apr 2006 7-34

Review

• Example: for the assignment statement a := b * -c + b * -c, give a syntax tree, dag and postfix notation

• Fig. 8.2, p. 464

Sanath Jayasena/Apr 2006 7-35

Three-Address Code (3AC)

• 3AC is a sequence of statements of the general form
    x := y <op> z
  – x, y, z are names, constants, or generated temps
  – <op> is any operator (arithmetic, logical)
• 3AC means each statement usually has 3 addresses (2 for operands, 1 for the result)

Sanath Jayasena/Apr 2006 7-36

Examples

• Given the expression x+y*z, the 3AC is
    t1 := y * z
    t2 := x + t1
• Show 3AC for (a) syntax tree, (b) dag discussed earlier in slide 7-34 (Fig. 8.2)
  – Fig. 8.5, p. 466
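Generating 3AC from an expression tree amounts to a postorder walk that invents a fresh temporary for every interior node. The Python below is a sketch under assumed conventions: expressions are nested tuples, temporaries are named t1, t2, …, and `gen_3ac` is a hypothetical helper name.

```python
def gen_3ac(expr):
    """Return (code, place): the 3AC lines and the name holding the result."""
    code = []
    counter = 0

    def newtemp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(e):
        if isinstance(e, str):          # a name or constant is its own place
            return e
        op, left, right = e
        l = walk(left)                  # operands first (postorder)
        r = walk(right)
        t = newtemp()                   # new temp for this interior node
        code.append(f"{t} := {l} {op} {r}")
        return t

    place = walk(expr)
    return code, place

code, place = gen_3ac(("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 := y * z
# t2 := x + t1
```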

Sanath Jayasena/Apr 2006 7-37

3AC …contd

• A name in a program is replaced by a pointer to a symbol table entry for that name
• 3AC statements are like assembly code
  – There are flow-control statements
  – They can have symbolic labels
  – A label represents the index of a 3AC statement in an array containing the intermediate code

Sanath Jayasena/Apr 2006 7-38

Types of 3AC Statements

1. Assignment statements with binary operators (arithmetic or logical)
  – Of the form x := y <op> z
2. Assignment statements with unary operators (minus, logical not, shift, etc.)
  – Of the form x := <op> y
3. Copy statements
  – Of the form x := y

Sanath Jayasena/Apr 2006 7-39

Types of 3AC Statements …contd

4. Unconditional jump: goto L
  – Statement with label L to be executed next
5. Conditional jump: if x <relop> y goto L
  – A relational operator (<, =, >=, …) is applied to x and y
  – If the relation holds, statement with label L executed next
  – If not, statement following it is executed

Sanath Jayasena/Apr 2006 7-40

Types of 3AC Statements …contd

6. Function calls: param x, call p, n and return y
  – “return y” is optional
  – E.g., for call p(x1, x2, …, xn) the 3AC will be
      param x1
      param x2
      …
      param xn
      call p, n

Sanath Jayasena/Apr 2006 7-41

Types of 3AC Statements …contd

7. Indexed assignments: x := y[i] , x[i] := y

– In x:=y[i] : x is set to the value in location i units beyond memory location y

– In x[i]:=y : value in location i units beyond memory location x is set to the value of y

– x, y and i are data objects

Sanath Jayasena/Apr 2006 7-42

Types of 3AC Statements …contd

8. Address & pointer assignments: x := &y, x := *y, *x := y
  – In x := &y: x is set to be the location of y
    • y denotes an l-value, x is a pointer name
  – In x := *y: (r-value of) x is set to the value in location pointed to by y
    • y is a pointer; r-value of y is a location
  – In *x := y: (r-value of) object pointed to by x is set to (the r-value of) y
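To make the statement kinds concrete, here is a small Python interpreter for a subset of them: binary assignment, copy, unconditional jump and conditional jump. The tuple encoding of statements, the use of list indices as labels, and the names `run`, `BINOPS` and `RELOPS` are assumptions of this sketch; function calls, indexed and pointer assignments are left out.

```python
import operator

BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          ">": operator.gt, ">=": operator.ge}

def run(program, env):
    """Execute a list of 3AC tuples; a label is an index into the list."""
    def value(a):
        return env[a] if isinstance(a, str) else a   # name or constant

    pc = 0
    while pc < len(program):
        instr = program[pc]
        kind = instr[0]
        pc += 1                                   # default: fall through
        if kind == "binop":                       # x := y <op> z
            _, x, y, op, z = instr
            env[x] = BINOPS[op](value(y), value(z))
        elif kind == "copy":                      # x := y
            _, x, y = instr
            env[x] = value(y)
        elif kind == "goto":                      # goto L
            pc = instr[1]
        elif kind == "if":                        # if x <relop> y goto L
            _, x, relop, y, label = instr
            if RELOPS[relop](value(x), value(y)):
                pc = label
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return env

# sum := 0; i := 1; while i <= n do sum := sum + i; i := i + 1
program = [
    ("copy", "sum", 0),                   # 0
    ("copy", "i", 1),                     # 1
    ("if", "i", ">", "n", 6),             # 2: leave the loop when i > n
    ("binop", "sum", "sum", "+", "i"),    # 3
    ("binop", "i", "i", "+", 1),          # 4
    ("goto", 2),                          # 5
]
print(run(program, {"n": 4})["sum"])  # 10
```

Label 6 is one past the last statement, so jumping to it ends execution.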

Sanath Jayasena/Apr 2006 7-43

Syntax-Dir. Translation into 3AC

• When 3AC code is generated, temp names are made up for interior nodes in syntax tree
  – E.g., for E → E1 + E2, value of E on LHS will be computed into a new temp t
• Example
  – Fig. 8.6, Fig 8.7 on p. 469

Sanath Jayasena/Apr 2006 7-44

Implementation of 3AC

• 3AC is an abstract form
  – Can be implemented in a compiler as records (with fields for operator and operands)
• Three representations
  – Quadruples
  – Triples
  – Indirect triples

Sanath Jayasena/Apr 2006 7-45

(a) Quadruples

• A record structure with 4 fields
  – op, arg1, arg2 and result
• Examples
  – For x := y op z we have:
    • y in arg1, z in arg2 and x in result
  – For unary operators, arg2 not used
  – For param operator, arg2 and result unused
  – Fig. 8.8(a), p. 471 for a := b * -c + b * -c
• Contents of fields are pointers to ST entries
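A quadruple maps naturally onto a four-field record. The sketch below writes out quadruples for a := b * -c + b * -c in Python, using plain strings where a compiler would store symbol-table pointers; the `Quad` type, the operator spellings `uminus` and `:=`, and the `to_3ac` helper are assumptions for illustration.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]    # None when unused (unary operators, copies)
    result: str

# a := b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

def to_3ac(q):
    """Render one quadruple back into textual three-address code."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

for q in quads:
    print(to_3ac(q))
```

Every temporary t1 to t5 appears by name in a result field, which is why quadruples need the temporaries to exist as named entries.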

Sanath Jayasena/Apr 2006 7-46

(b) Triples

• Temps generated in quadruples must be entered in symbol table

• To avoid this, we can refer to a temp value by the location of the relevant statement
  – We can have records with only 3 fields
    • op, arg1 and arg2
  – Fields arg1 and arg2 can be pointers to ST entries or to triple structure for temp values
  – Example: Fig 8.8(b), Fig. 8.9 on p. 471
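In a triple, an argument either names a program variable or points at an earlier triple. The Python sketch below assumes a simple encoding (a string is a name, an integer is the position of another triple) and adds a hypothetical `evaluate` helper to show that positions can stand in for temporaries.

```python
# a := b * -c + b * -c as triples (op, arg1, arg2).
# A str argument names a variable; an int argument refers to the value
# computed by the triple at that position (encoding assumed for this sketch).
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("assign", "a", 4),      # (5)
]

def evaluate(triples, env):
    """Run the triples in order, keeping each result at its own position."""
    results = []

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for op, arg1, arg2 in triples:
        if op == "uminus":
            results.append(-value(arg1))
        elif op == "*":
            results.append(value(arg1) * value(arg2))
        elif op == "+":
            results.append(value(arg1) + value(arg2))
        elif op == "assign":
            env[arg1] = value(arg2)
            results.append(None)     # an assignment produces no temp value
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env

print(evaluate(triples, {"b": 3, "c": 2})["a"])  # -12
```

No temporary names appear anywhere, at the cost that moving a triple would change the positions other triples refer to.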

Sanath Jayasena/Apr 2006 7-47

(c) Indirect Triples

• Listing of pointers to triples, rather than triples themselves

• Example
  – We can use an array to list pointers to triples in the desired order
  – Example: Fig 8.10 on p. 472
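The benefit of the extra level of indirection can be sketched as follows: the triples stay where they are and only a separate listing of positions is reordered. The encoding (strings for names, integers for triple positions), the `listing` array and the `evaluate` helper are assumptions of this Python illustration.

```python
# Triples for a := b * -c + b * -c; an int argument refers to another triple.
triples = [
    ("uminus", "c", None),   # 0
    ("*", "b", 0),           # 1
    ("uminus", "c", None),   # 2
    ("*", "b", 2),           # 3
    ("+", 1, 3),             # 4
    ("assign", "a", 4),      # 5
]

listing = [0, 1, 2, 3, 4, 5]      # execution order, as pointers to triples
reordered = [0, 2, 1, 3, 4, 5]    # both negations first; triples untouched

def evaluate(triples, listing, env):
    """Execute triples in the order given by listing."""
    results = {}                  # triple position -> computed value

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for i in listing:
        op, arg1, arg2 = triples[i]
        if op == "uminus":
            results[i] = -value(arg1)
        elif op == "*":
            results[i] = value(arg1) * value(arg2)
        elif op == "+":
            results[i] = value(arg1) + value(arg2)
        elif op == "assign":
            env[arg1] = value(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env

print(evaluate(triples, listing, {"b": 3, "c": 2})["a"])    # -12
print(evaluate(triples, reordered, {"b": 3, "c": 2})["a"])  # -12
```

Reordering changed only the listing, so references such as `("+", 1, 3)` remain valid without any renumbering.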

Sanath Jayasena/Apr 2006 7-48

Translating Language Constructs

• Balance of Chapter 8 in Dragon book covers details on implementing:
  – Declarations, scope
  – Assignments, array elements, fields in records
  – Boolean expressions
  – Case statements
  – Label renaming (called backpatching)
  – Function calls