intermediate code generation


Upload: rahim-moses

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Intermediate Code Generation


Intermediate Code Generation

Chapter 6

Page 2: Intermediate Code Generation

Back-end and Front-end of a Compiler

Page 3: Intermediate Code Generation

Back-end and Front-end of a Compiler

Close to source language (e.g. syntax tree)

Close to machine language

m × n compilers can be built by writing just m front ends and n back ends

Page 4: Intermediate Code Generation

Back-end and Front-end of a Compiler

Includes:
• Type checking
• Any syntactic checks that remain after parsing (e.g. ensuring that a break statement is enclosed within a while, for, or switch statement).

Page 5: Intermediate Code Generation

Component-Based Approach to Building Compilers

[Diagram: source programs in Language-1 and Language-2 → Language-1 Front End and Language-2 Front End → non-optimized intermediate code → Intermediate-code Optimizer → optimized intermediate code → Target-1 and Target-2 Code Generators → Target-1 and Target-2 machine code]

Page 6: Intermediate Code Generation

Intermediate Representation (IR)

A kind of abstract machine language that can express the target machine operations without committing to too much machine detail.

Why IR?

Page 7: Intermediate Code Generation

Without IR

[Diagram: source languages C, Pascal, FORTRAN, C++; target machines SPARC, HP PA, x86, IBM PPC]

Page 8: Intermediate Code Generation

With IR

[Diagram: source languages C, Pascal, FORTRAN, C++; a single IR in the middle; target machines SPARC, HP PA, x86, IBM PPC]

Page 9: Intermediate Code Generation

With IR

[Diagram: source languages C, Pascal, FORTRAN, C++ → IR → Common Backend?]

Page 10: Intermediate Code Generation

Advantages of Using an Intermediate Language

1. Retargeting: build a compiler for a new machine by attaching a new code generator to an existing front end.

2. Optimization: reuse intermediate-code optimizers in compilers for different languages and different machines.

Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are used interchangeably.

Page 11: Intermediate Code Generation

Issues in Designing an IR

• Whether to use an existing IR
  • if the target machine architecture is similar
  • if the new language is similar
• Whether the IR is appropriate for the kind of optimizations to be performed
  • e.g. speculation and predication
  • some transformations may take much longer than they would on a different IR

Page 12: Intermediate Code Generation

Issues in Designing an IR

• Designing a new IR needs to consider:
  • Level (how machine dependent it is)
  • Structure
  • Expressiveness
  • Appropriateness for general and special optimizations
  • Appropriateness for code generation
  • Whether multiple IRs should be used

Page 13: Intermediate Code Generation

Multiple-Level IR

[Diagram: Source Program → High-level IR → Low-level IR → Target code; stage labels: Semantic Check, High-level Optimization, Low-level Optimization]

Page 14: Intermediate Code Generation

Using Multiple-level IR

Translating from one level to another in the compilation process:

• Preserves an existing technology investment

• Some representations may be more appropriate for a particular task.

Page 15: Intermediate Code Generation

Commonly Used IR

Possible IR forms:
• Graphical representations, such as syntax trees, ASTs (abstract syntax trees), and DAGs
• Postfix notation
• Three-address code
• SSA (static single-assignment) form

An IR should have individual components that describe simple things.

Page 16: Intermediate Code Generation

Syntax Tree

Page 17: Intermediate Code Generation

Syntax Tree

[Figure: syntax tree with operator nodes +, +, *, *, -, - over the leaves a, a, b, c, b, c, d]

Page 18: Intermediate Code Generation

Syntax Tree


Page 19: Intermediate Code Generation

Syntax Tree

[Figure: syntax tree with operators +, * and - over the leaves a, b, c and d]

Directed Acyclic Graph (DAG):
• More compact representation
• Gives clues regarding generation of efficient code

A node can have more than one parent

Page 20: Intermediate Code Generation

Example

Construct the DAG for:

Page 21: Intermediate Code Generation

How to Generate DAG from Syntax-Directed Definition?

All that is needed is for functions such as Node and Leaf to check whether an identical node already exists. If it does, a pointer to the existing node is returned instead of creating a new one.

Page 22: Intermediate Code Generation

How to Generate DAG from Syntax-Directed Definition?

Page 23: Intermediate Code Generation

Data Structure: Array

Page 24: Intermediate Code Generation

Data Structure: Array

Left and right children of an intermediate node

Operation code

Leaves

Scanning the whole array each time a new node is needed is inefficient.

Page 25: Intermediate Code Generation

Data Structure: Hash Table

Hash function = h(op, L, R)
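A rough sketch of the array-plus-hash-table idea, assuming nodes are stored as (op, left, right) records whose array index serves as their value number. The class name `ValueNumberTable`, the bucket count and the example statement i = i + 10 are illustrative assumptions.

```python
class ValueNumberTable:
    """Array of node records plus hash buckets that index into it."""

    def __init__(self, bucket_count=16):
        self.nodes = []                                  # record i has value number i
        self.buckets = [[] for _ in range(bucket_count)]

    def _h(self, op, left, right):
        # Hash function h(op, L, R) reduced to a bucket index.
        return hash((op, left, right)) % len(self.buckets)

    def lookup_or_add(self, op, left=None, right=None):
        bucket = self.buckets[self._h(op, left, right)]
        for vn in bucket:                                # search one bucket, not the whole array
            if self.nodes[vn] == (op, left, right):
                return vn
        self.nodes.append((op, left, right))
        vn = len(self.nodes) - 1
        bucket.append(vn)
        return vn

# Assumed example: i = i + 10
table = ValueNumberTable()
i = table.lookup_or_add("id", "i")
ten = table.lookup_or_add("num", 10)
plus = table.lookup_or_add("+", i, ten)
assign = table.lookup_or_add("=", i, plus)
again = table.lookup_or_add("+", i, ten)   # already present, so no new record is added
```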

Page 26: Intermediate Code Generation

Three-Address Code
• Another option for intermediate representation
• Built from two concepts:
  – addresses and instructions
• At most one operator on the right-hand side of an instruction

Page 27: Intermediate Code Generation

Address
Can be one of the following:
• A name: a source-program name
• A constant
• A compiler-generated temporary

Page 28: Intermediate Code Generation

Instructions

Procedure call such as p(x1, x2, …, xn) is implemented as:

Page 29: Intermediate Code Generation

Example

OR

Page 30: Intermediate Code Generation

Choice of Operator Set
• Rich enough to implement the operations of the source language
• Close enough to machine instructions to simplify code generation

Page 31: Intermediate Code Generation

Data Structure
How to represent these instructions in a data structure?
– Quadruples
– Triples
– Indirect triples

Page 32: Intermediate Code Generation

Data Structure: Quadruples

• Has four fields: op, arg1, arg2, result

• Exceptions:
  – Unary operators: no arg2
  – Operators like param: no arg2, no result
  – (Un)conditional jumps: the target label is the result

Page 33: Intermediate Code Generation

Three-address code:

t1 = minus c
t2 = b * t1
t3 = minus c
t4 = b * t3
t5 = t2 + t4
a = t5

Quadruples:

     op      arg1   arg2   result
0    minus   c             t1
1    *       b      t1     t2
2    minus   c             t3
3    *       b      t3     t4
4    +       t2     t4     t5
5    =       t5            a
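One way such quadruples might be produced is a post-order walk over an expression tree. The sketch below assumes trees are nested tuples and names are plain strings; `Quad`, `to_quads` and `assign` are illustrative names, not part of the slides.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

def to_quads(tree, quads, counter):
    """Emit quadruples for `tree`; return the address that holds its value."""
    if isinstance(tree, str):                  # a name is its own address
        return tree
    op, *children = tree
    args = [to_quads(child, quads, counter) for child in children]
    counter[0] += 1
    result = f"t{counter[0]}"                  # fresh compiler-generated temporary
    arg2 = args[1] if len(args) == 2 else None # unary operators leave arg2 empty
    quads.append(Quad(op, args[0], arg2, result))
    return result

def assign(target, tree):
    quads, counter = [], [0]
    value = to_quads(tree, quads, counter)
    quads.append(Quad("=", value, None, target))
    return quads

# a = b * -c + b * -c
expr = ("+", ("*", "b", ("minus", "c")), ("*", "b", ("minus", "c")))
quads = assign("a", expr)
for position, quad in enumerate(quads):
    print(position, quad.op, quad.arg1, quad.arg2, quad.result)
```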

Page 34: Intermediate Code Generation

Data Structure: Triples

• Only three fields: there is no result field
• A result is referred to by the position of the instruction that computes it
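A small sketch of the same idea, assuming the statement a = b * -c + b * -c and nested-tuple expression trees. In this illustration a string operand is a name and an integer operand is the position of an earlier triple; a real representation would need to distinguish positions from integer constants explicitly.

```python
def to_triples(tree, triples):
    """Append triples for `tree`; return a name or the position of its value."""
    if isinstance(tree, str):
        return tree
    op, *children = tree
    args = [to_triples(child, triples) for child in children]
    triples.append((op, args[0], args[1] if len(args) == 2 else None))
    return len(triples) - 1          # the result is named by its position

expr = ("+", ("*", "b", ("minus", "c")), ("*", "b", ("minus", "c")))
triples = []
value = to_triples(expr, triples)
triples.append(("=", "a", value))    # assignment: target name and position of the value
```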

Page 35: Intermediate Code Generation

Data Structure: Indirect Triples

• When instructions are moved around during optimization, quadruples are better than triples, because moving a triple changes its position and therefore every reference to it.

• Indirect triples solve this problem.

A list of pointers to triples: an optimizing compiler can reorder the instruction list instead of changing the triples themselves.
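The reordering can be illustrated with a short sketch. The triples below are for the assumed statement a = b * -c + b * -c, and `order_is_valid` is a hypothetical helper that only checks that each positional operand is computed before it is used.

```python
triples = [
    ("minus", "c", None),   # 0
    ("*", "b", 0),          # 1
    ("minus", "c", None),   # 2
    ("*", "b", 2),          # 3
    ("+", 1, 3),            # 4
    ("=", "a", 4),          # 5
]
instruction_list = [0, 1, 2, 3, 4, 5]   # execution order: indices into `triples`

def order_is_valid(order, triples):
    """True if every triple referenced by position runs before its user."""
    done = set()
    for position in order:
        _, arg1, arg2 = triples[position]
        for arg in (arg1, arg2):
            if isinstance(arg, int) and arg not in done:
                return False
        done.add(position)
    return True

# Reordering touches only the list; the triples and their positions stay put.
instruction_list[1], instruction_list[2] = instruction_list[2], instruction_list[1]
```

After the swap the list reads 0, 2, 1, 3, 4, 5, and triple 4 still refers to positions 1 and 3 without any rewriting.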

Page 36: Intermediate Code Generation

Static Single-Assignment (SSA)

• Is an intermediate representation
• Facilitates certain code optimizations
• All assignments are to variables with distinct names

Page 37: Intermediate Code Generation

Static Single-Assignment (SSA)

Example:

If we use different names for x in the true part and the false part, then which name shall we use in the assignment y = x * a?

The answer is: the φ-function

It returns the value of its argument that corresponds to the control-flow path that was taken to get to the assignment statement containing the φ-function.
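To make the φ-function concrete, here is a small simulation. It assumes a source fragment of the form `if (flag) x = -1; else x = 1; y = x * a;` (an illustrative choice) and models φ as a lookup keyed by the branch that was taken; `phi`, `run` and the `came_from` marker are hypothetical names.

```python
def phi(came_from, operands, env):
    """Return the value of the operand tied to the path control arrived on."""
    return env[operands[came_from]]

def run(flag, a):
    env = {}
    if flag:
        env["x1"] = -1            # x renamed to x1 in the true part
        came_from = "then"
    else:
        env["x2"] = 1             # x renamed to x2 in the false part
        came_from = "else"
    # x3 = φ(x1, x2): pick x1 or x2 depending on the path taken
    env["x3"] = phi(came_from, {"then": "x1", "else": "x2"}, env)
    env["y"] = env["x3"] * a
    return env
```

Each name (x1, x2, x3, y) is assigned exactly once, which is the property SSA form requires.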

Page 38: Intermediate Code Generation

Example

Page 39: Intermediate Code Generation

Types and Declarations
• Type checking: to ensure that the types of operands match the type expected by the operator
• Determine the storage needed
• Calculate the address of an array reference
• Insert explicit type conversions
• Choose the right version of an operator
• …

Page 40: Intermediate Code Generation

Storage Layout
• From the type of a name, we can determine the amount of storage it needs at run time.
• At compile time, we use this amount to assign the name a relative address.
• The type and relative address are saved in the symbol-table entry for the name.
• For data whose length is determined only at run time, a pointer is saved in the symbol table.

Page 41: Intermediate Code Generation

Storage Layout
• Multibyte objects are stored in consecutive bytes and given the address of the first byte.
• Storage for aggregates (e.g. arrays and classes) is allocated in one contiguous block of bytes.

Page 42: Intermediate Code Generation


Page 43: Intermediate Code Generation

P → { offset = 0; } D

D → T id ; { top.put(id.lexeme, T.type, offset);
             offset = offset + T.width; } D1

D → ε
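The effect of these actions can be sketched in a few lines. The widths in `WIDTHS` (4 bytes for int, 8 for float) and the function name `lay_out` are assumptions for illustration, and alignment is ignored.

```python
WIDTHS = {"int": 4, "float": 8}   # assumed widths in bytes

def lay_out(declarations):
    """Assign each declared name a relative address, in declaration order."""
    symbol_table = {}
    offset = 0                                       # P -> { offset = 0; } D
    for type_name, name in declarations:             # D -> T id ; D1
        symbol_table[name] = (type_name, offset)     # top.put(id.lexeme, T.type, offset)
        offset += WIDTHS[type_name]                  # offset = offset + T.width
    return symbol_table, offset

table, total = lay_out([("int", "i"), ("float", "x"), ("int", "j")])
```

Here `i` lands at offset 0, `x` at 4 and `j` at 12, and `total` is the size of the whole data area.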

Page 44: Intermediate Code Generation

To keep track of the next available relative address

Create a symbol table entry

Page 45: Intermediate Code Generation

Translations of Statements and Expressions

Syntax-Directed Definition (SDD)

Syntax-Directed Translation (SDT)

Page 46: Intermediate Code Generation

PRODUCTION        SEMANTIC RULES

S → id = E ;      S.code = E.code || gen(top.get(id.lexeme) '=' E.addr)

E → E1 + E2       E.addr = new Temp()
                  E.code = E1.code || E2.code || gen(E.addr '=' E1.addr '+' E2.addr)

  | - E1          E.addr = new Temp()
                  E.code = E1.code || gen(E.addr '=' 'minus' E1.addr)

  | ( E1 )        E.addr = E1.addr
                  E.code = E1.code

  | id            E.addr = top.get(id.lexeme)
                  E.code = ''

Page 47: Intermediate Code Generation

Three-address code of E

Build an instruction

Address holding value of E(e.g. tmp variable, name, constant)

Get a temporary variable

Current symbol table

Page 48: Intermediate Code Generation

a = b + - c

translates, under the syntax-directed definition on the previous slide, into:

t1 = minus c
t2 = b + t1
a = t2

Page 49: Intermediate Code Generation

Generating three-address code incrementally, to avoid long string manipulations

gen() does two things:
• generates a three-address instruction
• appends it to the sequence of instructions generated so far
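A minimal sketch of the incremental scheme, assuming expressions arrive as nested tuples and identifiers are used directly as addresses (the symbol-table lookup is omitted). The class `Translator` and its method names are illustrative.

```python
import itertools

class Translator:
    def __init__(self):
        self.code = []                         # instructions generated so far
        self._temps = itertools.count(1)

    def new_temp(self):
        return f"t{next(self._temps)}"

    def gen(self, instruction):
        self.code.append(instruction)          # append instead of concatenating strings

    def expr(self, e):
        """Emit code for expression e and return E.addr."""
        if isinstance(e, str):                 # E -> id
            return e
        if e[0] == "minus":                    # E -> - E1
            addr1 = self.expr(e[1])
            addr = self.new_temp()
            self.gen(f"{addr} = minus {addr1}")
            return addr
        op, e1, e2 = e                         # E -> E1 + E2
        addr1, addr2 = self.expr(e1), self.expr(e2)
        addr = self.new_temp()
        self.gen(f"{addr} = {addr1} {op} {addr2}")
        return addr

    def assign(self, name, e):                 # S -> id = E ;
        addr = self.expr(e)
        self.gen(f"{name} = {addr}")

translator = Translator()
translator.assign("a", ("+", "b", ("minus", "c")))   # a = b + - c
```

Only E.addr is passed between calls; the code attribute is never built as a string, which is the point of the incremental approach.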

Page 50: Intermediate Code Generation

Arrays
• Elements of the same type
• Stored consecutively in memory
• In languages like C or Java, elements are numbered 0, 1, …, n-1
• In some other languages: low, low+1, …, high

Page 51: Intermediate Code Generation

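Arrays

If elements are numbered from 0 and each element has width w, then the address of a[i] is:

base + i × w

where base is the address of a[0].

Generalizing to two dimensions, a[i1][i2]:

base + i1 × w1 + i2 × w2

where w1 is the width of a row and w2 is the width of an element.

Generalizing to k dimensions:

base + i1 × w1 + i2 × w2 + … + ik × wk

or, for two dimensions in terms of element counts:

base + (i1 × n2 + i2) × w

or, for k dimensions:

base + ((…((i1 × n2 + i2) × n3 + i3)…) × nk + ik) × w

where w is the width of an element and nj is the number of elements along dimension j.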
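A short sketch of row-major address calculation, assuming zero-based indices and a fixed element width. The function names and the choice of 4-byte integers are illustrative assumptions.

```python
def element_address(base, indices, dims, width):
    """Row-major address of a[i1][i2]...[ik] using element counts per dimension."""
    position = 0
    for i, n in zip(indices, dims):
        position = position * n + i        # (...(i1 * n2 + i2) * n3 + i3 ...)
    return base + position * width

def element_address_by_widths(base, indices, widths):
    """Same address using per-dimension widths: base + i1*w1 + ... + ik*wk."""
    return base + sum(i * w for i, w in zip(indices, widths))

# A 2x3 array of 4-byte integers: a row is 12 bytes wide, an element 4 bytes.
by_counts = element_address(0, (1, 2), (2, 3), 4)
by_widths = element_address_by_widths(0, (1, 2), (12, 4))
```

Both forms give the same address; the first needs only the element width and the dimension sizes, which is convenient when the row widths are not precomputed.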

Page 52: Intermediate Code Generation

temporary used while computing the offset

pointer to symbol table entry

Page 53: Intermediate Code Generation

A is a 2x3 array of integers

Page 54: Intermediate Code Generation

Type Checking

Type checking includes several aspects.
• The language comes with a type system, i.e., a set of rules saying what types can appear where.
• The compiler assigns a type expression to parts of the source program.
• The compiler checks that the type usage in the program conforms to the type system for the language.

Page 55: Intermediate Code Generation

Type Checking

• All type checking could be done at run time:
  • The compiler generates code to do the checks.
  • Some languages have very weak typing; for example, variables can change their type during execution.
  • Often these languages need run-time checks.
  • Examples include Lisp, SNOBOL, and APL.

• A sound type system guarantees that all checks can be performed prior to execution. This does not mean that a given compiler will make all the necessary checks.

• An implementation is strongly typed if compiled programs are guaranteed to run without type errors.

Page 56: Intermediate Code Generation

Rules for Type Checking

• There are two forms of type checking.
  1. Type synthesis, which we will learn: the types of the parts are used to infer the type of the whole. For example, integer + real = real.
  2. Type inference: the type of a construct is determined from its usage. This permits languages like ML to check types even though names need not be declared.

• We consider type checking for expressions.
  • Checking statements is very similar.
  • View the statement as a function having its components as arguments and returning void.

Page 57: Intermediate Code Generation

Type Conversions

• A very strict type system would do no automatic conversion.
  • Instead it would offer functions for the programmer to explicitly convert between selected types. Then either the program has compatible types or it is in error.

• We will consider an approach in which the language permits certain implicit conversions that the compiler is to supply.

• This is called type coercion. Explicit conversions supplied by the programmer are called casts.

Page 58: Intermediate Code Generation

Type Conversions

• We will see integer and real types, and postulate a unary function denoted (real) that converts an integer into the real having the same value.

• We consider the more general case where there are multiple types, some of which have coercions (often called widening).

• For example, in C/Java, int can be widened to long, which in turn can be widened to float.

• Mathematically the hierarchy is a partially ordered set (poset) in which each pair of elements has a least upper bound (LUB).

• For many binary operators the two operands are converted to the LUB. So adding a short to a char requires both to be converted to an int. Adding a byte to a float requires the byte to be converted to a float (the float remains a float and is not converted).
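The LUB lookup can be sketched over a small widening hierarchy. The table `WIDENS_TO` below is an assumption modeled on Java's primitive widening chain; the function names are illustrative.

```python
# Each type maps to the next wider type; None marks the top of the hierarchy.
WIDENS_TO = {
    "byte": "short", "short": "int", "char": "int",
    "int": "long", "long": "float", "float": "double", "double": None,
}

def ancestors(t):
    """The type itself followed by every type it can be widened to."""
    chain = []
    while t is not None:
        chain.append(t)
        t = WIDENS_TO[t]
    return chain

def lub(t1, t2):
    """Least upper bound: the narrowest type both arguments can be widened to."""
    if t1 not in WIDENS_TO or t2 not in WIDENS_TO:
        raise TypeError(f"no LUB for {t1} and {t2}")
    above_t2 = set(ancestors(t2))
    for t in ancestors(t1):          # walk upward from t1 until t2's chain is met
        if t in above_t2:
            return t
    raise TypeError(f"no LUB for {t1} and {t2}")
```

With this table, `lub("short", "char")` is `"int"` and `lub("byte", "float")` is `"float"`, matching the two cases described above.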

Page 59: Intermediate Code Generation

Checking and Coercing Types for Basic Arithmetic

• The steps for addition, subtraction, multiplication, and division are all essentially the same:

• Convert each operand, if necessary, to the LUB type and then perform the arithmetic on the (converted or original) values. Note that conversion often requires the generation of code.

• Two functions are convenient.
  1. LUB(t1,t2) returns the type that is the LUB of the two given types. It signals an error if there is no LUB, for example if one of the types is an array.
  2. widen(a,t,w,newcode,newaddr). Given an address a of type t, and a (hopefully) wider type w, produce the instructions newcode needed so that the address newaddr holds the conversion of a to type w.

• LUB is simple: just look at the type lattice. If one of the type arguments is not in the lattice, signal an error; otherwise find the lowest common ancestor.

Page 60: Intermediate Code Generation

Checking and Coercing Types for Basic Arithmetic

• widen is more interesting. It involves n² cases for n types. Many of these are error cases (e.g., if t is wider than w). Below is the code for our situation with two possible types, integer and real.

• The four cases consist of 2 nops (when t=w), one error (t=real; w=integer) and one conversion (t=integer; w=real).

widen (a:addr, t:type, w:type, newcode:string, newaddr:addr)
    if t=w
        newcode = ""
        newaddr = a
    else if t=integer and w=real
        newaddr = new Temp()
        newcode = gen(newaddr = (real) a)
    else signal error
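A runnable rendering of the widen pseudocode might look like this. It is a sketch under two assumptions: the outputs newcode and newaddr are returned as a pair rather than passed as parameters, and temporaries are named t1, t2, … by a module-level counter.

```python
import itertools

_temps = itertools.count(1)   # assumed source of fresh temporary names

def widen(a, t, w):
    """Return (newcode, newaddr) for converting address `a` of type t to type w."""
    if t == w:
        return [], a                                   # nop: no code, same address
    if t == "integer" and w == "real":
        newaddr = f"t{next(_temps)}"
        return [f"{newaddr} = (real) {a}"], newaddr    # one conversion instruction
    raise TypeError(f"cannot widen {t} to {w}")        # e.g. real to integer
```

For example, widening `x` from integer to real yields a single instruction of the form `tN = (real) x` and the temporary `tN` as the new address.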

Page 61: Intermediate Code Generation

Checking and Coercing Types for Basic Arithmetic

• With these two functions it is not hard to modify the rules to catch type errors and perform coercions for arithmetic expressions.

• Maintain the type of each operand by defining type attributes for e, t, and f.

• Coerce each operand to the LUB.

• This requires that we have type information for the base entities, identifiers and numbers. The lexer can supply the type of the numbers. We retrieve it via get(NUM.type).

• It is more interesting for the identifiers. We insert that information when we process declarations. So we now have another semantic check: is the identifier declared before it is used?

Page 62: Intermediate Code Generation

Checking and Coercing Types for Basic Arithmetic

• Let's examine a particularly interesting entry. Consider the assignment statement

A[3/X+4] := X*5+Y;

• Consider the ra node, i.e., the node corresponding to the production ra → [ e ] := e1 ;

• When the tree traversal gets to this node, its parent has passed in the value of the inherited attribute ra.id=id.entry.

• Thus the ra node has access to the identifier table entry for ID, which in our example is the variable A.

Page 63: Intermediate Code Generation

Checking and Coercing Types for Basic Arithmetic

• Prior to doing its calculations, the ra node invokes its children and gets back all the synthesized attributes. To summarize, when the ra node performs its calculations, it has available:
  • ra.id: the identifier entry for A.
  • e.addr/e.code: executing e.code results in e.addr containing the value of the expression 3/X+4.
  • e1.addr/e1.code: executing e1.code results in e1.addr containing the value of the expression X*5+Y.
  • e.type/e1.type: the types of the expressions.

• What must the ra node do?
  • Ensure execution of e.code and e1.code.
  • Check that e.type is int (I don't do this).
  • Multiply e by the base width of the array A. (We need a temporary, ra.t1, to hold the computed value.)
  • Widen e1 to the base type of A. (We need, and widen generates, a temporary ra.addr to hold the widened value.)
  • Do the actual assignment of X*5+Y to A[3/X+4].

Page 64: Intermediate Code Generation

Control Flow

• Control flow includes the study of Boolean expressions, which have two roles.

1. They can be computed and treated similarly to integers or reals. There are also relational operators that produce Boolean values from arithmetic operands.

2. They are used in certain statements that alter the normal flow of control.

• One question that comes up with Boolean expressions is whether both operands need be evaluated. If we are evaluating A or B and find that A is true, must we evaluate B? For example, consider evaluating

A=0 OR 3/A < 1.2

when A is zero.

• Consider A*F(x). If the compiler knows that for this run A is zero, must it generate code to evaluate F(x)? Don't forget that functions can have side effects.
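The two questions can be tried out directly. The snippet below shows Python's own behavior, used here only as an illustration: `or` short-circuits, while `*` always evaluates both operands. The helper names `safe_test` and `f` are hypothetical.

```python
def safe_test(a):
    # `or` stops at the first true operand, so 3 / a is never evaluated when a == 0.
    return a == 0 or 3 / a < 1.2

calls = []

def f(x):
    calls.append(x)      # a visible side effect: record that f was called
    return x + 1

# Multiplication is not short-circuited: f runs even though the product must be 0.
product = 0 * f(7)
```

Skipping the call to `f` would change what the program does (here, `calls` would stay empty), which is why a compiler cannot drop it just because the result is known.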

Page 65: Intermediate Code Generation

Short-Circuit Code

• This is also called jumping code. Here the Boolean operators AND, OR, and NOT do not appear in the generated instruction stream. Instead we just generate jumps to either the true branch or the false branch.

• This statement can be translated as follows

Page 66: Intermediate Code Generation

Flow-of-Control Statements

• We now consider the translation of boolean expressions into three-address code in the context of statements such as those generated by the following grammar:

S → if ( B ) S1
S → if ( B ) S1 else S2
S → while ( B ) S1

• Both B and S have a synthesized attribute code, which gives the translation into three-address instructions.

• For simplicity, we build up the translations B.code and S.code as strings, using syntax-directed definitions.

Page 67: Intermediate Code Generation

• The translation of if (B) S1 consists of B.code followed by S1.code
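A sketch of how B.code and S1.code might be combined with labels. It assumes boolean expressions are nested tuples, that the statement after the if is reached through a label passed in as `s_next`, and that the statement being translated is `if (x < 100 || x > 200 && x != y) x = 0;` (an illustrative choice). All names are hypothetical.

```python
import itertools

_labels = itertools.count(1)

def new_label():
    return f"L{next(_labels)}"

def bool_code(b, true, false):
    """Jumping code for b: control ends up at label `true` or label `false`."""
    kind = b[0]
    if kind == "||":
        b1_false = new_label()                 # if B1 fails, fall into B2
        return (bool_code(b[1], true, b1_false) + [f"{b1_false}:"]
                + bool_code(b[2], true, false))
    if kind == "&&":
        b1_true = new_label()                  # if B1 holds, go on to test B2
        return (bool_code(b[1], b1_true, false) + [f"{b1_true}:"]
                + bool_code(b[2], true, false))
    if kind == "!":
        return bool_code(b[1], false, true)    # NOT just swaps the two targets
    left, right = b[1], b[2]                   # relational operator such as <, >, !=
    return [f"if {left} {kind} {right} goto {true}", f"goto {false}"]

def if_code(b, s1_code, s_next):
    """S -> if (B) S1: B.code, then the label B.true, then S1.code."""
    b_true = new_label()
    return bool_code(b, b_true, s_next) + [f"{b_true}:"] + s1_code

condition = ("||", ("<", "x", "100"), ("&&", (">", "x", "200"), ("!=", "x", "y")))
code = if_code(condition, ["x = 0"], "Lnext")
```

No instruction for ||, && or ! appears in `code`; the operators are expressed entirely through the jump targets. Some of the generated gotos jump to the very next line, which is the redundancy that later refinements remove.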

Page 68: Intermediate Code Generation


Page 69: Intermediate Code Generation


Control-Flow Translation of Boolean Expressions

Page 70: Intermediate Code Generation


Control-Flow Translation of Boolean Expressions

• Consider again the following statement

• Translating it into three-address code yields the following result

Page 71: Intermediate Code Generation


Avoiding Redundant Gotos

• For x > 200 we will have the following code fragment

Page 72: Intermediate Code Generation


Avoiding Redundant Gotos

Page 73: Intermediate Code Generation


Avoiding Redundant Gotos

Page 74: Intermediate Code Generation


Boolean Values and Jumping Code

• A boolean expression may also be evaluated for its value, as in assignment statements such as x = true; or x = a < b;

• A clean way of handling both roles of boolean expressions is to first build a syntax tree for expressions, using either of the following approaches:

1. Use two passes. Construct a complete syntax tree for the input, and then walk the tree in depth-first order, computing the translations specified by the semantic rules.

2. Use one pass for statements, but two passes for expressions. With this approach, we would translate E in while (E) S1 before S1 is examined. The translation of E, however, would be done by building its syntax tree and then walking the tree.