Intermediate Code Generation Mooly Sagiv [email protected] Schrierber 317 03-640-7606 Wed 10:00-12:00 html://www.math.tau.ac.il/~msagiv/ courses/wcc02.html Chapter 7 (Chapter 6 next week)

Post on 19-Dec-2015



Basic Compiler Phases

Source program (string)
  → lexical analysis → Tokens
  → syntax analysis → Abstract syntax tree
  → semantic analysis
  → Translate → Intermediate representation
  → Instruction selection → Assembly
  → Register Allocation → Fin. Assembly

Why can’t we translate directly into machine language?

Why use intermediate languages?

• Simplify the compilation phase
 – ultimately leads to more efficient code
• Portability of the compiler front-end
• Reusability of the compiler back-end

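[Figure: without an IR, each source language (Java, C, Pascal, C++, ML) needs its own translator to each target (Pentium, MIPS, Sparc); with an IR, each source language is translated to the IR and the IR is translated to each target.]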

IR Design Goals

• Convenient to generate IR from the source
• Convenient to generate machine code from IR
 – Mismatches between source and target
• Clear operational meaning

Textbook Solution

• Simple intermediate instructions
• Tree-like expressions

A Grammar for the Tree IR

T_stm ::= T_stm T_stm (T_SEQ)
T_stm ::= Temp_label (T_LABEL)
T_stm ::= T_exp Temp_labelList (T_JUMP)
T_stm ::= T_relOp T_exp T_exp Temp_label Temp_label (T_CJUMP)
T_stm ::= T_exp T_exp (T_MOVE)
T_stm ::= T_exp (T_EXP)

T_exp ::= T_binOp T_exp T_exp (T_BINOP)
T_exp ::= T_exp (T_MEM)
T_exp ::= Temp_temp (T_TEMP)
T_exp ::= T_stm T_exp (T_ESEQ)
T_exp ::= Temp_label (T_NAME)
T_exp ::= int (T_CONST)
T_exp ::= T_exp T_expList (T_CALL)

/* tree.h */
typedef struct T_stm_ *T_stm;
typedef struct T_exp_ *T_exp;

typedef enum {T_plus, T_minus, T_mul, T_div, T_and, T_or,
              T_lshift, T_rshift, T_arshift, T_xor} T_binOp;
typedef enum {T_eq, T_ne, T_lt, T_gt, T_le, T_ge,
              T_ult, T_ule, T_ugt, T_uge} T_relOp;

struct T_stm_ {
    enum {T_SEQ, T_LABEL, T_JUMP, …, T_EXP} kind;
    union {
        struct {T_stm left, right;} SEQ;
        …
    } u;
};

/* Constructors for statements */
T_stm T_Seq(T_stm left, T_stm right);
T_stm T_Label(Temp_label);
T_stm T_Jump(T_exp exp, Temp_labelList labels);
T_stm T_Cjump(T_relOp op, T_exp left, T_exp right,
              Temp_label _true, Temp_label _false);
T_stm T_Move(T_exp, T_exp);
T_stm T_Exp(T_exp);

struct T_exp_ {
    enum {T_BINOP, T_MEM, T_TEMP, …, T_CALL} kind;
    union {
        struct {T_binOp op; T_exp left; T_exp right;} BINOP;
        …
    } u;
};
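To make the tagged-union layout concrete, here is a minimal sketch of how a few node kinds might be built and interpreted. It is not the course's tree.h: the names ir_exp, ir_stm, mk_const, mk_temp, mk_binop, mk_move, eval and exec_move are hypothetical, only CONST, TEMP, BINOP and a MOVE into a temporary are modelled, and temporaries are assumed to live in a plain int array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down analogue of tree.h: three expression kinds, one statement kind. */
typedef struct ir_exp_ *ir_exp;
typedef struct ir_stm_ *ir_stm;
typedef enum { IR_PLUS, IR_MINUS, IR_MUL } ir_binop;
typedef enum { IR_CONST, IR_TEMP, IR_BINOP } ir_exp_kind;

struct ir_exp_ {
    ir_exp_kind kind;
    union {
        int value;                                   /* IR_CONST: literal value */
        int temp;                                    /* IR_TEMP: index of the temporary */
        struct { ir_binop op; ir_exp left, right; } binop;
    } u;
};

struct ir_stm_ {                                     /* only MOVE(TEMP t, exp) is modelled */
    int dst_temp;
    ir_exp src;
};

static ir_exp new_exp(ir_exp_kind kind) {
    ir_exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    return e;
}

ir_exp mk_const(int v) { ir_exp e = new_exp(IR_CONST); e->u.value = v; return e; }
ir_exp mk_temp(int t)  { ir_exp e = new_exp(IR_TEMP);  e->u.temp = t;  return e; }

ir_exp mk_binop(ir_binop op, ir_exp left, ir_exp right) {
    ir_exp e = new_exp(IR_BINOP);
    e->u.binop.op = op;
    e->u.binop.left = left;
    e->u.binop.right = right;
    return e;
}

ir_stm mk_move(int dst_temp, ir_exp src) {
    ir_stm s = malloc(sizeof *s);
    assert(s != NULL);
    s->dst_temp = dst_temp;
    s->src = src;
    return s;
}

/* Evaluate an expression tree by structural recursion on its kind. */
int eval(ir_exp e, const int temps[]) {
    switch (e->kind) {
    case IR_CONST: return e->u.value;
    case IR_TEMP:  return temps[e->u.temp];
    case IR_BINOP: {
        int l = eval(e->u.binop.left, temps);
        int r = eval(e->u.binop.right, temps);
        switch (e->u.binop.op) {
        case IR_PLUS:  return l + r;
        case IR_MINUS: return l - r;
        case IR_MUL:   return l * r;
        }
    }
    }
    assert(0);   /* unknown node kind */
    return 0;
}

/* Execute MOVE(TEMP dst, src). */
void exec_move(ir_stm s, int temps[]) { temps[s->dst_temp] = eval(s->src, temps); }
```

For example, mk_move(0, mk_binop(IR_PLUS, mk_temp(1), mk_const(5))) corresponds to MOVE(TEMP t0, BINOP(PLUS, TEMP t1, CONST 5)).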

Example: factorial

let
  function nfactor (n: int): int =
    if n = 0
    then 1
    else n * nfactor(n-1)
in
  nfactor(10)
end

Abstract Tiger Program

letExp(
  decList(
    functionDec(
      fundecList(
        fundec(nfactor,
          fieldList(field(n, int, fld-escaped=FALSE), fieldList()),
          int,
          ifExp(
            opExp(EQUAL, varExp(simpleVar(n)), intExp(0)),
            intExp(1),
            opExp(TIMES, varExp(simpleVar(n)),
              callExp(nfactor,
                expList(opExp(MINUS, varExp(simpleVar(n)), intExp(1)),
                        expList()))))),
        fundecList())),
    decList()),
  seqExp(
    expList(
      callExp(nfactor, expList(intExp(10), expList())),
      expList())))

IR for Main

/* prologue of main starts with l1 */
/* body of main */
MOVE(TEMP(RV),
     CALL(NAME(l2),
          ExpList(CONST(10), null /* next argument */)))
/* epilogue of main */

IR for nfactor

/* prologue of nfactor starts with l2 */
/* body of nfactor */
MOVE(TEMP(RV),
  ESEQ(
    SEQ(CJUMP(=, “n”, CONST(0), NAME(l3), NAME(l4)),
      SEQ(LABEL(l3), /* then-clause */
        SEQ(MOVE(TEMP(t1), CONST(1)),
          SEQ(JUMP(NAME(l5)),
            SEQ(LABEL(l4), /* else-clause */
              SEQ(MOVE(TEMP(t1),
                    BINOP(MUL, “n”,
                      CALL(NAME(l2),
                        ExpList(BINOP(MINUS, “n”, CONST(1)),
                                null /* next argument */)))),
                  LABEL(l5)))…),
    TEMP(t1)))
/* epilogue of nfactor */

Outline of the Translation (translate.c)

• Top-down traversal over the abstract syntax tree
• Generate code to allocate memory for declarations and initializations (next week)
• Generate code for function declarations:
 – Prologue
 – The body expression
 – Epilogue
• Generate code for expressions
 – Value expressions
   • x + y
 – Location expressions
   • x < y
• Statements
 – x := y
 – Control flow

The rest of this lecture

• L-values and R-values
• Arithmetic expressions
• Conditionals and loops
• Conversions
• Complex data types
 – Arrays
 – Structures
• Memory checks

L-values vs. R-values

• Assignment x := exp is compiled into:
 – Compute the address of x
 – Compute the value of exp
 – Store the value of exp into the address of x
• Generalization
 – R-value
 – L-value

rval(x) = value of x
rval(5) = 5
rval(x + y) = rval(x) + rval(y)

lval(x) = address of x
lval(5) = undefined
lval(x + y) = undefined
lval(C-array a) = undefined
lval(C-pointer a) = address of a
lval(Pascal-array a) = base address of a
lval(a[e]) = lval(a) + rval(e)
lval(*e) = rval(e)
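The rules for lval and rval can be tried out on a toy machine. The sketch below is only an illustration under several assumptions: memory is a word-addressed int array, arrays follow the Pascal-style rule (lval(a) is the base address), lval(*e) is taken to be rval(e), and the names struct exp, lval, rval, assign and UNDEFINED are hypothetical.

```c
#include <assert.h>

enum kind { NUM, VAR, ADD, INDEX, DEREF };

struct exp {
    enum kind kind;
    int n;                        /* NUM: the constant; VAR: address of the variable */
    const struct exp *l, *r;      /* operands (ADD, INDEX); DEREF uses l only */
};

#define UNDEFINED (-1)            /* marks expressions that have no l-value */

int rval(const struct exp *e, const int mem[]);

/* Address denoted by e, or UNDEFINED. */
int lval(const struct exp *e, const int mem[]) {
    switch (e->kind) {
    case VAR:   return e->n;                                  /* address of x */
    case INDEX: return lval(e->l, mem) + rval(e->r, mem);     /* lval(a) + rval(e) */
    case DEREF: return rval(e->l, mem);                       /* assumed rule for *e */
    default:    return UNDEFINED;                             /* constants and sums */
    }
}

/* Value denoted by e. */
int rval(const struct exp *e, const int mem[]) {
    switch (e->kind) {
    case NUM: return e->n;
    case ADD: return rval(e->l, mem) + rval(e->r, mem);
    default:  return mem[lval(e, mem)];                       /* load from the l-value */
    }
}

/* target := source, following the three steps of the assignment rule. */
void assign(const struct exp *target, const struct exp *source, int mem[]) {
    int address = lval(target, mem);      /* 1. compute the address of the target */
    int value = rval(source, mem);        /* 2. compute the value of the source   */
    assert(address != UNDEFINED);
    mem[address] = value;                 /* 3. store the value at that address   */
}
```

An assignment such as a[3] := 5 then stores into mem[base of a + 3], while asking for lval of x + y or of a constant yields UNDEFINED.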

Translating Expressions

• Straightforward by induction on the abstract expression tree

/* translate.c */
Tr_exp Tr_opExp(A_oper oper, Tr_exp left, Tr_exp right)
{
    switch (oper) {
    case A_plusOp:  return Tr_opArithExp(T_plus, left, right);
    case A_minusOp: return Tr_opArithExp(T_minus, left, right);
    case A_timesOp: …
    case A_eqOp:    return Tr_opCondExp(T_eq, left, right);
    case A_neqOp:   return Tr_opCondExp(T_ne, left, right);
    case A_ltOp:    …
    }
    assert(0);   /* unknown operator */
    return NULL;
}

Conditional Expressions

• Translating expressions in conditions may be tricky
• Two options
 – Value computation
   • Compute the value of the Boolean expression
 – Location computation
   • Compute a label in the code that will be reached if the expression holds
   • Allows shortcut computations

Example C code

if (a < 6 && b+1 > 7)
    a = b * c;

CJUMP(<, “a”, CONST(6), l1, l2)
LABEL(l1)
CJUMP(>, BINOP(+, “b”, CONST(1)), CONST(7), l3, l2)
LABEL(l3)
MOVE(“a”, BINOP(*, “b”, “c”))
LABEL(l2)
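The shortcut scheme in this example can be written as a small recursive generator. The following is a sketch, not the course's translate.c: conditions are a hypothetical struct cond, relational tests are carried as ready-made IR text, labels are plain integers printed as l1, l2, ..., and the output goes into a caller-supplied string buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ckind { C_REL, C_AND, C_OR, C_NOT };

struct cond {
    enum ckind kind;
    const char *rel;             /* C_REL only: "op, left, right" as IR text */
    const struct cond *l, *r;    /* operands of C_AND and C_OR; C_NOT uses l */
};

/* Append one line to out (capacity cap; out must already be NUL-terminated). */
static void append(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s\n", line);
}

/* Emit code that jumps to label t if c holds and to label f otherwise.
   *next is the next unused label number. */
void gen_cond(const struct cond *c, int t, int f, int *next, char *out, size_t cap) {
    char line[160];
    int mid;
    assert(c != NULL);
    switch (c->kind) {
    case C_REL:
        snprintf(line, sizeof line, "CJUMP(%s, l%d, l%d)", c->rel, t, f);
        append(out, cap, line);
        break;
    case C_AND:                  /* left operand false: the right one is never evaluated */
        mid = (*next)++;
        gen_cond(c->l, mid, f, next, out, cap);
        snprintf(line, sizeof line, "LABEL(l%d)", mid);
        append(out, cap, line);
        gen_cond(c->r, t, f, next, out, cap);
        break;
    case C_OR:                   /* left operand true: the right one is never evaluated */
        mid = (*next)++;
        gen_cond(c->l, t, mid, next, out, cap);
        snprintf(line, sizeof line, "LABEL(l%d)", mid);
        append(out, cap, line);
        gen_cond(c->r, t, f, next, out, cap);
        break;
    case C_NOT:                  /* negation only swaps the two targets */
        gen_cond(c->l, f, t, next, out, cap);
        break;
    }
}
```

Note that no Boolean value is ever materialised: the outcome of the condition is represented entirely by which label control reaches.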

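Conditional Expressions in Tiger

static Tr_exp Tr_opCondExp(T_relOp oper, Tr_exp left, Tr_exp right)
{
    struct Cx cx;

    /* Both targets are left NULL and filled in later through the patch lists.
       unEx (conversion of a Tr_exp to a T_exp, as in the textbook) is assumed here. */
    cx.stm = T_Cjump(oper, unEx(left), unEx(right), NULL, NULL);
    /* Each patch list records the address of a label field that is still unset. */
    cx.trues = PatchList(&cx.stm->u.CJUMP._true, NULL);
    cx.falses = PatchList(&cx.stm->u.CJUMP._false, NULL);
    return Tr_Cx(cx.trues, cx.falses, cx.stm);
}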
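The patch lists used here defer the choice of jump targets. As a rough, self-contained illustration of that idea (not the course's code), the sketch below keeps a list of pointers to unfilled label fields and fills them all at once. The names label, struct cjump, patch_list, do_patch and join_patch are hypothetical, and labels are modelled as integers with 0 meaning "not yet known".

```c
#include <assert.h>
#include <stdlib.h>

typedef int label;                    /* stand-in for Temp_label; 0 means "not yet known" */

struct cjump { const char *relop; label on_true, on_false; };

/* A patch list records the addresses of label fields that still have to be filled in. */
typedef struct patch_ *patch;
struct patch_ { label *head; patch tail; };

patch patch_list(label *head, patch tail) {
    patch p = malloc(sizeof *p);
    assert(p != NULL);
    p->head = head;
    p->tail = tail;
    return p;
}

/* Fill every recorded field with the now-known target label. */
void do_patch(patch list, label target) {
    for (; list != NULL; list = list->tail)
        *list->head = target;
}

/* Concatenate two patch lists, e.g. the false exits of both operands of 'and'. */
patch join_patch(patch first, patch second) {
    patch p = first;
    if (first == NULL)
        return second;
    while (p->tail != NULL)
        p = p->tail;
    p->tail = second;
    return first;
}
```

For a condition such as a < 6 && b+1 > 7, the false exits of both comparisons can be joined into one list and patched together once the label that follows the then-branch is known.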

if a > b then x := 5

SEQ(CJUMP(GT, “a”, “b”, NAME t, NAME f),
    SEQ(LABEL t,
        SEQ(code for x := 5,
            LABEL f)))

Loops

• Similar to if-then-else
• Need to handle break

while a > b do S
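One common layout for a while loop is a test label, a conditional jump, the body, a jump back to the test, and a done label that also serves as the target of break. The sketch below emits that layout as text. It is an assumption-laden illustration rather than the course's translation: struct stm, gen_stm and the integer labels are hypothetical, and conditions and simple statements are carried as ready-made IR text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum skind { S_SIMPLE, S_BREAK, S_SEQ, S_WHILE };

struct stm {
    enum skind kind;
    const char *text;            /* S_SIMPLE: IR text; S_WHILE: condition "op, left, right" */
    const struct stm *a, *b;     /* S_SEQ: a then b; S_WHILE: a is the loop body */
};

/* Append one line to out (capacity cap; out must already be NUL-terminated). */
static void append(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s\n", line);
}

/* Append a line that mentions a single label, e.g. "LABEL(l%d)". */
static void append_label_op(char *out, size_t cap, const char *fmt, int label) {
    char line[64];
    snprintf(line, sizeof line, fmt, label);
    append(out, cap, line);
}

/* done is the label a 'break' must jump to (0 when not inside a loop);
   *next is the next unused label number. */
void gen_stm(const struct stm *s, int done, int *next, char *out, size_t cap) {
    switch (s->kind) {
    case S_SIMPLE:
        append(out, cap, s->text);
        break;
    case S_BREAK:
        assert(done != 0);                       /* break outside a loop is an error */
        append_label_op(out, cap, "JUMP(NAME(l%d))", done);
        break;
    case S_SEQ:
        gen_stm(s->a, done, next, out, cap);
        gen_stm(s->b, done, next, out, cap);
        break;
    case S_WHILE: {
        char line[160];
        int test = (*next)++;
        int body = (*next)++;
        int exit_label = (*next)++;
        append_label_op(out, cap, "LABEL(l%d)", test);
        snprintf(line, sizeof line, "CJUMP(%s, l%d, l%d)", s->text, body, exit_label);
        append(out, cap, line);
        append_label_op(out, cap, "LABEL(l%d)", body);
        gen_stm(s->a, exit_label, next, out, cap);   /* inner breaks leave this loop */
        append_label_op(out, cap, "JUMP(NAME(l%d))", test);
        append_label_op(out, cap, "LABEL(l%d)", exit_label);
        break;
    }
    }
}
```

Passing the done label down the recursion is what lets a break inside nested loops jump out of the innermost one only.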

Conversions

• Local translation may require converting between representations
 – Value computation ↔ location computation
• Examples
 – if (x+5) then 0 else 1
 – (a > b) + b
 – x := if (a>b) then a else b
 – x := (a > b)
 – (if a>b then a else b) + 1
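Both directions of the conversion can be sketched as short code templates. The version below follows one common scheme (a value is treated as true when it is non-zero; a condition is turned into a value by writing 1 or 0 into a temporary) and emits the IR as text. The function names value_to_cond and cond_to_value, the integer labels, and the textual operands are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one line to out (capacity cap; out must already be NUL-terminated). */
static void append(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s\n", line);
}

/* Value -> location: jump to t when the value is non-zero, to f otherwise.
   Needed for cases such as: if (x+5) then 0 else 1 */
void value_to_cond(const char *value, int t, int f, char *out, size_t cap) {
    char line[160];
    assert(value != NULL);
    snprintf(line, sizeof line, "CJUMP(!=, %s, CONST(0), l%d, l%d)", value, t, f);
    append(out, cap, line);
}

/* Location -> value: leave 1 in temporary r if the test holds, 0 otherwise.
   Needed for cases such as: x := (a > b) */
void cond_to_value(const char *rel, const char *r, int t, int f, char *out, size_t cap) {
    char line[160];
    assert(rel != NULL && r != NULL);
    snprintf(line, sizeof line, "MOVE(TEMP(%s), CONST(1))", r);   /* assume true */
    append(out, cap, line);
    snprintf(line, sizeof line, "CJUMP(%s, l%d, l%d)", rel, t, f);
    append(out, cap, line);
    snprintf(line, sizeof line, "LABEL(l%d)", f);                 /* test failed */
    append(out, cap, line);
    snprintf(line, sizeof line, "MOVE(TEMP(%s), CONST(0))", r);
    append(out, cap, line);
    snprintf(line, sizeof line, "LABEL(l%d)", t);                 /* both paths meet */
    append(out, cap, line);
}
```

After cond_to_value, the temporary can be used like any other value, for instance as an operand of + in (a > b) + b.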

Complex Data Types

• Data types like arrays, strings, and records may require special treatment
• Important questions
 – Duration
 – Static vs. dynamic size
 – Structured L-values

Complex Data Types in Tiger

• Arrays, strings, and record fields are long-lived
 – Usually allocated in the heap
 – No structured L-values
• Example: Tiger record allocation

type foo = {a : ty1, b : ty2}
... = foo {a = e1, b = e2}

ESEQ(SEQ(MOVE(TEMP r, CALL(NAME MALLOC, CONST 2*W)),
         SEQ(MOVE(MEM(+(0*W, TEMP r)), TransExp(e1)),
             MOVE(MEM(+(1*W, TEMP r)), TransExp(e2)))),
     TEMP r)
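The run-time effect of the record-allocation tree can be mimicked directly in C. This is a sketch under assumptions: a word is taken to be the size of intptr_t, every field occupies exactly one word, and make_record, store_word and load_word are hypothetical helpers, not part of any Tiger runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { W = sizeof(intptr_t) };            /* assumed word size in bytes */

/* MOVE(MEM(+(offset, base)), value): store one word at base + offset. */
static void store_word(char *base, size_t offset, intptr_t value) {
    memcpy(base + offset, &value, sizeof value);
}

/* MEM(+(offset, base)) used as a value: load one word from base + offset. */
static intptr_t load_word(const char *base, size_t offset) {
    intptr_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

/* Allocate a record with nfields one-word fields (nfields >= 1) and initialise
   field i with values[i]. The caller owns the result and frees it. */
char *make_record(size_t nfields, const intptr_t values[]) {
    char *r = malloc(nfields * W);        /* MOVE(TEMP r, CALL(NAME MALLOC, CONST n*W)) */
    size_t i;
    assert(r != NULL);
    for (i = 0; i < nfields; i++)
        store_word(r, i * W, values[i]);  /* MOVE(MEM(+(i*W, TEMP r)), e_i) */
    return r;                             /* TEMP r is the value of the ESEQ */
}
```

The pointer returned by make_record plays the role of TEMP r: the record itself lives on the heap and only its address is passed around.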

Example: Tiger Arrays

let
  type intArray = array of int
  var a := intArray[12] of 0
  var b := intArray[13] of 7
in
  a := b
end

SEQ(
  SEQ(CONST 0,
    SEQ(MOVE(TEMP ta, CALL(NAME initArray, CONST 12, CONST 0)),
      SEQ(MOVE(TEMP tb, CALL(NAME initArray, CONST 13, CONST 7)),
        MOVE(TEMP ta, TEMP tb)))))

L-values of Arrays and Structures (Tiger)

• The l-value of a[i]:
    MEM(+(“a”, *(CONST W, “i”)))
• For a structure s.f:
    MEM(+(“s”, *(CONST W, CONST kf)))
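Both trees compute a base address plus a word-scaled offset. A minimal sketch of the same arithmetic in C, assuming one-word elements and fields and a word size of sizeof(intptr_t); elem_address and field_address are hypothetical names, and kf stands for the position of field f within the record:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { W = sizeof(intptr_t) };            /* assumed word size in bytes */

/* Address of a[i]: +(a, *(CONST W, i)) */
char *elem_address(char *a, size_t i) {
    assert(a != NULL);
    return a + W * i;
}

/* Address of s.f, where f is field number kf: +(s, *(CONST W, CONST kf)) */
char *field_address(char *s, size_t kf) {
    assert(s != NULL);
    return s + W * kf;
}
```

The only difference between the two cases is that the field offset W * kf is a compile-time constant, whereas the array offset generally has to be computed at run time.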

Big L-values

• In some programming languages, more than one word needs to be copied or stored
• Examples:
 – C structures
 – Pascal arrays
• How can this be handled?

Memory checks

• Can the compiler guarantee that no invalid memory is referenced?
 – At compile time?
 – At runtime?
• Examples
 – Array references
   • Algol, Pascal, Java, PL/1: runtime checks
   • C: no checks
   • Ada, C#: user control
 – Field and pointer dereferences
• The best solutions combine runtime and compile-time checks
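A runtime array check amounts to a comparison against the array length before the address is used. The sketch below shows one way to do this, assuming the length is stored alongside the data; struct int_array and checked_elem are hypothetical names. It uses a single unsigned comparison, which also rejects negative indices because they convert to very large unsigned values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array descriptor: the length travels with the data so it can be checked. */
struct int_array {
    size_t length;
    int *data;
};

/* Returns the address of a[i], or NULL when i is out of range. */
int *checked_elem(const struct int_array *a, long i) {
    assert(a != NULL && a->data != NULL);
    if ((size_t)i >= a->length)           /* a negative i wraps to a huge unsigned value */
        return NULL;
    return a->data + i;
}
```

In generated code the NULL result would typically be replaced by a jump to an error handler, and a compiler that can prove the index is in range at compile time can omit the comparison altogether.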

Summary

• Intermediate code simplifies the translation and increases re-use
• Tree-like intermediate code simplifies the translation of expressions
 – No temporaries
• Abstract syntax helps
• Memory management is interesting
 – Mostly next week