Lecture 10 – Code Generation. Eran Yahav. Reference: Dragon 8, MCD 4.2.4

Post on 29-Dec-2015


THEORY OF COMPILATION

Lecture 10 – Code Generation

Eran Yahav

Reference: Dragon 8. MCD 4.2.4

2

You are here

[Figure: compiler pipeline. Source text (txt) -> Compiler: Lexical Analysis -> Syntax Analysis (Parsing) -> Semantic Analysis -> Inter. Rep. (IR) -> Code Gen. -> Executable code (exe)]

3

Last Week: Runtime Part II
- Nested procedures
- Object layout
- Inheritance
- Multiple inheritance

4

Today

- Runtime checks
- Garbage collection
- Generating assembly code

5

Runtime checks

- Generate code for checking attempted illegal operations
  - Null pointer check
    - MoveField, MoveArray, ArrayLength, VirtualCall
    - Reference arguments to library functions should not be null
  - Array bounds check
  - Array allocation size check
  - Division by zero
  - …
- If a check fails, jump to error handler code that prints a message and gracefully exits the program

6

Null pointer check

    # null pointer check
    cmp $0,%eax
    je labelNPE

    labelNPE:
        push $strNPE      # error message
        call __println
        push $1           # error code
        call __exit

Single generated handler for entire program

7

Array bounds check

    # array bounds check
    mov -4(%eax),%ebx     # ebx = length
    mov $0,%ecx           # ecx = index
    cmp %ecx,%ebx
    jle labelABE          # ebx <= ecx ?
    cmp $0,%ecx
    jl labelABE           # ecx < 0 ?

    labelABE:
        push $strABE      # error message
        call __println
        push $1           # error code
        call __exit

Single generated handler for entire program

8

Array allocation size check

    # array size check
    cmp $0,%eax           # eax == array size
    jle labelASE          # eax <= 0 ?

    labelASE:
        push $strASE      # error message
        call __println
        push $1           # error code
        call __exit

Single generated handler for entire program
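The three checks share one shape: a short inline test at each use site, plus one handler per error kind for the whole program. A minimal Python sketch of an emitter for that shape follows. The function names and the choice of registers as parameters are illustrative assumptions; the labels, strings and runtime calls (`labelNPE`, `$strNPE`, `__println`, `__exit`) are the ones shown on the slides.

```python
def emit_null_check(reg):
    """Inline check: jump to the shared handler when reg holds null (0)."""
    return [f"cmp $0,%{reg}", "je labelNPE"]


def emit_bounds_check(array_reg, index_reg, length_reg):
    """Inline check: the array length is read from the word before element 0."""
    return [
        f"mov -4(%{array_reg}),%{length_reg}",  # length_reg = length
        f"cmp %{index_reg},%{length_reg}",
        "jle labelABE",                         # length <= index ?
        f"cmp $0,%{index_reg}",
        "jl labelABE",                          # index < 0 ?
    ]


def emit_handler(label, message_symbol):
    """Shared handler, emitted once per program for each error kind."""
    return [
        f"{label}:",
        f"push ${message_symbol}",  # error message
        "call __println",
        "push $1",                  # error code
        "call __exit",
    ]


program = (
    emit_null_check("eax")
    + emit_bounds_check("eax", "ecx", "ebx")
    + emit_handler("labelNPE", "strNPE")
    + emit_handler("labelABE", "strABE")
)
```

Keeping the handlers out of line means each use site pays only for a compare and a conditional jump.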

9

Automatic Memory Management
- Automatically free memory when it is no longer needed
- Not limited to OO programs; we show it here because it is prevalent in OO languages such as Java
  - also in functional languages
- Approximate reasoning about object liveness
  - use reachability to approximate liveness
  - assume reachable objects are live
  - non-reachable objects are dead
- Three classical garbage collection techniques
  - reference counting
  - mark and sweep
  - copying

10

GC using Reference Counting
- Add a reference-count field to every object
  - how many references point to it
- When (rc==0) the object is non-reachable
  - non-reachable => dead
  - can be collected (deallocated)

11

Managing Reference Counts

- Each object has a reference count o.RC
- A newly allocated object o gets o.RC = 1
  - why?
- Write-barrier for reference updates

    update(x,old,new) {
      old.RC--;
      new.RC++;
      if (old.RC == 0) collect(old);
    }

- collect(old) will decrement RC for all children and recursively collect objects whose RC reached 0.
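The write barrier and the recursive collect can be sketched in Python as follows. This is a minimal sketch, not the lecture's implementation: the `Obj` class, its `fields` dictionary, the `freed` log and the `release` helper are illustrative assumptions. Unlike the pseudocode above, it increments the new target before decrementing the old one, so storing the same object into a field twice cannot drop its count to 0 in between.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 1        # the reference held by whoever allocated the object
        self.fields = {}   # field name -> Obj or None


freed = []  # names of collected objects, in collection order


def collect(obj):
    """Deallocate obj and decrement the counts of its children."""
    freed.append(obj.name)
    for child in obj.fields.values():
        if child is not None:
            child.rc -= 1
            if child.rc == 0:
                collect(child)
    obj.fields.clear()


def update(holder, field, new):
    """Write barrier for the assignment holder.field = new."""
    old = holder.fields.get(field)
    if new is not None:
        new.rc += 1
    holder.fields[field] = new
    if old is not None:
        old.rc -= 1
        if old.rc == 0:
            collect(old)


def release(obj):
    """Drop the allocator's own reference, e.g. a local going out of scope."""
    obj.rc -= 1
    if obj.rc == 0:
        collect(obj)
```

For example, after `update(a, "next", b)`, `update(b, "next", c)`, `release(b)` and `release(c)`, the objects `b` and `c` are kept alive only through `a`; a later `update(a, "next", None)` collects `b` and then, recursively, `c`.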

12

Cycles!

- Cannot identify non-reachable cycles
  - reference counts for nodes on the cycle will never decrement to 0
- Several approaches for dealing with cycles
  - ignore
  - periodically invoke a tracing algorithm to collect cycles
  - specialized algorithms for collecting cycles
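A two-node cycle is enough to see the problem. In this small Python sketch (the `Node` class and the helper names are illustrative assumptions), both roots are dropped, yet each node still holds a count of 1 from its neighbour, so neither count reaches 0. A tracing pass from the now empty root set finds that nothing is reachable.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.rc = 1        # reference held by the root that allocated it
        self.next = None


def link(src, dst):
    """src.next = dst with the count update (src.next is assumed empty)."""
    src.next = dst
    dst.rc += 1


def drop_root(node):
    """The root forgets node; returns True if the count reached 0."""
    node.rc -= 1
    return node.rc == 0


def reachable(roots):
    """Tracing: names of all nodes reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        n = stack.pop()
        if n is not None and n.name not in seen:
            seen.add(n.name)
            stack.append(n.next)
    return seen


x, y = Node("x"), Node("y")
link(x, y)
link(y, x)
collectable = [drop_root(x), drop_root(y)]  # neither count reaches 0
garbage_found_by_tracing = {"x", "y"} - reachable([])
```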

13

GC Using Mark & Sweep

- Marking phase
  - mark roots
  - trace all objects transitively reachable from roots
  - mark every traversed object
- Sweep phase
  - scan all objects in the heap
  - collect all unmarked objects

14

GC Using Mark & Sweep

    mark_sweep() {
      for Ptr in Roots
        mark(Ptr)
      sweep()
    }

    mark(Obj) {
      if mark_bit(Obj) == unmarked {
        mark_bit(Obj) = marked
        for C in Children(Obj)
          mark(C)
      }
    }

    sweep() {
      p = Heap_bottom
      while (p < Heap_top)
        if (mark_bit(p) == unmarked) then free(p)
        else mark_bit(p) = unmarked;
        p = p + size(p)
    }
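The same two phases can be tried out in Python. This is a minimal sketch under simplifying assumptions: the heap is modelled as a Python list of objects rather than a range of addresses, `free` is modelled by dropping the object from that list, and the `Obj` class is hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.children = []


def mark(obj):
    """Mark obj and everything transitively reachable from it."""
    if not obj.marked:
        obj.marked = True
        for child in obj.children:
            mark(child)


def sweep(heap):
    """Free unmarked objects, clear the mark bit on the survivors."""
    survivors, freed = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            freed.append(obj.name)
    heap[:] = survivors
    return freed


def mark_sweep(roots, heap):
    for root in roots:
        mark(root)
    return sweep(heap)
```

Because marking starts only from the roots, an unreachable cycle is never marked and is freed by the sweep, which is the case reference counting misses. The recursive `mark` can run out of stack on long chains; real collectors use an explicit worklist or pointer reversal.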

15

Copying GC

- Partition the heap into two parts: old space, new space
- GC
  - copy all reachable objects from old space to new space
  - swap roles of old/new space

16

Example

[Figure: copying GC, before collection. Labels: old space, new space, Roots; objects A, B, C, D, E in old space.]

17

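Example

[Figure: copying GC, after copying. Labels: old space, new space, Roots; objects A, B, C, D, E in old space; copies of A and C in new space.]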
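A minimal Python sketch of the copy step follows. The `Obj` class, the forwarding table and the edges used in the usage note are illustrative assumptions (the pointer structure of the figure is not recoverable from the transcript). The sketch copies recursively; it is not Cheney's breadth-first algorithm.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.children = []


def copy_collect(roots, old_space):
    """Copy everything reachable from roots into a fresh new space."""
    new_space = []
    forward = {}  # id(old object) -> its copy in new space

    def copy(obj):
        if id(obj) in forward:          # already copied: reuse the copy
            return forward[id(obj)]
        clone = Obj(obj.name)
        forward[id(obj)] = clone        # record before recursing (cycles)
        new_space.append(clone)
        clone.children = [copy(c) for c in obj.children]
        return clone

    new_roots = [copy(r) for r in roots]
    old_space.clear()                   # old space is reused at the next GC
    return new_roots, new_space
```

With hypothetical edges A -> C and C -> A and a single root pointing at A, only A and C are copied; B, D and E are never visited and disappear when the old space is reclaimed. The cost is proportional to the live data, not to the heap size.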

18

Summary

How objects are organized in memory

Automatic management of memory

Coming up… Generating assembly code

19

target languages

[Figure: IR + Symbol Table feeds Code Gen., which can target absolute machine code, relative machine code, or assembly.]

20

From IR to ASM: Challenges
- Mapping IR to ASM operations
  - what instruction(s) should be used to implement an IR operation?
  - how do we translate code sequences
- Call/return of routines
  - managing activation records
- Memory allocation
- Register allocation
- Optimizations

21

Intel IA-32 Assembly

- Going from Assembly to Binary…
  - Assembling
  - Linking
- AT&T syntax vs. Intel syntax
- We will use AT&T syntax
  - matches GNU assembler (GAS)

23

IA-32 Registers

- Eight 32-bit general-purpose registers
  - EAX: accumulator for operands and result data. Used to return value from function calls.
  - EBX: pointer to data. Often used as array-base address
  - ECX: counter for string and loop operations
  - EDX: I/O pointer (GP for us)
  - ESI: GP and source pointer for string operations
  - EDI: GP and destination pointer for string operations
  - EBP: stack frame (base) pointer
  - ESP: stack pointer
- EFLAGS register
- EIP (instruction pointer) register
- Six 16-bit segment registers
- … (ignore the rest for our purposes)

24

Not all registers are born equal

- EAX
  - Required operand of MUL, IMUL, DIV and IDIV instructions
  - Contains the result of these operations
- EDX
  - Stores remainder of a DIV or IDIV instruction (EAX stores quotient)
- ESI, EDI
  - ESI: required source pointer for string instructions
  - EDI: required destination pointer for string instructions
- Destination registers of arithmetic operations
  - EAX, EBX, ECX, EDX
- EBP: stack frame (base) pointer
- ESP: stack pointer

25

IA-32 Addressing Modes

- Machine instructions take zero or more operands
- Source operand
  - Immediate
  - Register
  - Memory location
  - (I/O port)
- Destination operand
  - Register
  - Memory location
  - (I/O port)

26

Immediate and Register Operands

- Immediate
  - Value specified in the instruction itself
  - GAS syntax: immediate values preceded by $
  - add $4, %esp
- Register
  - Register name is used
  - GAS syntax: register names preceded with %
  - mov %esp,%ebp

27

Memory and Base Displacement Operands

- Memory operands
  - Value at given address
  - GAS syntax: parentheses
  - mov (%eax), %eax
- Base displacement
  - Value at computed address
  - Address computed out of base register, index register, scale factor, displacement
  - offset = base + (index*scale) + displacement
  - Syntax: disp(base,index,scale); the displacement is written without a $ prefix
  - movl $42, 2(%eax)
  - movl $42, 1(%eax,%ecx,4)

28

Base Displacement Addressing

mov (%ecx,%ebx,4), %eax

[Figure: an array of 4-byte elements, addressed from the array base reference.]

%ecx = base, %ebx = 3

offset = base + (index*scale) + displacement

offset = base + (3*4) + 0 = base + 12

(%ecx,%ebx,4)
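The address arithmetic above can be checked with a few lines of Python. The function name and the sample base address `0x1000` are illustrative assumptions; the restriction of the scale factor to 1, 2, 4 or 8 and the wrap-around to 32 bits reflect IA-32 addressing.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Address denoted by the GAS operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("IA-32 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# (%ecx,%ebx,4) with %ecx = 0x1000 (assumed base) and %ebx = 3
address = effective_address(base=0x1000, index=3, scale=4)
```

Here `address` is `0x100C`, that is base + 12, matching the worked example for element 3 of an array of 4-byte elements.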

29

How do we generate the code?
- Break the IR into basic blocks
- A basic block is a sequence of instructions with
  - single entry (to first instruction), no jumps to the middle of the block
  - single exit (last instruction)
  - code executes as a sequence from first instruction to last instruction without any jumps
- Edge from one basic block B1 to another block B2 when the last statement of B1 may jump to B2

30

Example

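[Figure: control flow graph of blocks B1, B2, B3 and B4; the conditional branch at the end of B1 has True and False edges.]

B1:
    t1 := 4 * i
    t2 := a[t1]
    if t2 <= 20 goto B3

Remaining blocks, in the order extracted from the figure:

    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    goto B4

    t7 := i + 1
    i := t2
    goto B5

    t3 := 4 * i
    t4 := b[t3]
    goto B4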

31

creating basic blocks

Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block
Method
- Determine the set of leaders (first statement of a block)
  - The first statement is a leader
  - Any statement that is the target of a conditional or unconditional jump is a leader
  - Any statement that immediately follows a goto or conditional jump statement is a leader
- For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program
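The leader rules can be sketched in Python as follows. The representation is an assumption made for brevity: statements are identified by their numbers 1..n, and `jumps` maps the number of each jump statement to the number of its target. The sample data describes a 17-statement program with jumps at statements 9, 11 and 17.

```python
def find_leaders(n, jumps):
    """Leaders of a program with statements 1..n; jumps: source -> target."""
    leaders = {1}                      # the first statement
    for source, target in jumps.items():
        leaders.add(target)            # target of a jump
        if source + 1 <= n:
            leaders.add(source + 1)    # statement following a jump
    return sorted(leaders)


def basic_blocks(n, jumps):
    """Each block runs from a leader up to, not including, the next leader."""
    leaders = find_leaders(n, jumps)
    ends = leaders[1:] + [n + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


jumps = {9: 3, 11: 2, 17: 13}
leaders = find_leaders(17, jumps)
blocks = basic_blocks(17, jumps)
```

For this input the leaders are statements 1, 2, 3, 10, 12 and 13, which gives six blocks: {1}, {2}, {3..9}, {10, 11}, {12} and {13..17}.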

32

control flow graph

A directed graph G=(V,E)
- nodes V = basic blocks
- edges E = control flow
  - (B1,B2) ∈ E when control from B1 flows to B2

B1:
    prod := 0
    i := 1

B2:
    t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * i
    t4 := b[t3]
    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 goto B2

example

1) i = 1
2) j = 1
3) t1 = 10*i
4) t2 = t1 + j
5) t3 = 8*t2
6) t4 = t3-88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i+1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i-1
14) t6 = 88*t5
15) a[t6] = 1.0
16) i = i+1
17) if i <= 10 goto (13)

33

source:

    for i from 1 to 10 do
        for j from 1 to 10 do
            a[i, j] = 0.0;
    for i from 1 to 10 do
        a[i, i] = 1.0;

CFG (basic blocks of the IR):

B1:
    i = 1
B2:
    j = 1
B3:
    t1 = 10*i
    t2 = t1 + j
    t3 = 8*t2
    t4 = t3-88
    a[t4] = 0.0
    j = j + 1
    if j <= 10 goto B3
B4:
    i = i+1
    if i <= 10 goto B2
B5:
    i = 1
B6:
    t5 = i-1
    t6 = 88*t5
    a[t6] = 1.0
    i = i+1
    if i <= 10 goto B6
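Given the blocks, the edges follow from the last statement of each block: a jump adds an edge to the block containing its target, and a block that does not end in an unconditional jump also falls through to the next block. The Python sketch below uses the same hypothetical representation as before (statement numbers, a `jumps` map, and a set naming which jumps are conditional) on the six blocks of the example.

```python
def build_cfg(blocks, jumps, conditional):
    """Return the CFG edges as sorted (from, to) pairs of block names."""
    block_of = {s: i for i, block in enumerate(blocks) for s in block}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last in jumps:                       # edge to the jump target
            edges.add((i, block_of[jumps[last]]))
        falls_through = last not in jumps or last in conditional
        if falls_through and i + 1 < len(blocks):
            edges.add((i, i + 1))               # edge to the next block
    return sorted((f"B{a + 1}", f"B{b + 1}") for a, b in edges)


blocks = [[1], [2], [3, 4, 5, 6, 7, 8, 9], [10, 11], [12],
          [13, 14, 15, 16, 17]]
jumps = {9: 3, 11: 2, 17: 13}
cfg = build_cfg(blocks, jumps, conditional={9, 11, 17})
```

The self-edges B3 -> B3 and B6 -> B6 are the two inner loop bodies, and B4 -> B2 is the back edge of the outer loop.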

34

Variable Liveness

- A statement x = y + z
  - defines x
  - uses y and z
- A variable x is live at a program point if its value is used at a later point

    y = 42         x undef, y live, z undef
    z = 73         x undef, y live, z live
    x = y + z      x is live, y dead, z dead
    print(x);      x is dead, y dead, z dead

(showing state after the statement)

35

Computing Liveness Information
- Between basic blocks: dataflow analysis (next lecture)
- Within a single basic block?
  - idea
    - use symbol table to record next-use information
    - scan basic block backwards
    - update next-use for each variable

36

Computing Liveness Information

INPUT: A basic block B of three-address statements. The symbol table initially shows all non-temporary variables in B as being live on exit.

OUTPUT: At each statement i: x = y + z in B, liveness and next-use information of x, y, and z at i.

Start at the last statement in B and scan backwards. At each statement i: x = y + z in B, we do the following:
1. Attach to i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.

37

Computing Liveness Information

can we change the order between 2 and 3?

    x = 1
    y = x + 3
    z = x * 3
    x = x * z
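The backward scan can be sketched in Python and run on the four statements above. The representation is an assumption: each statement is a tuple `(x, y, z)` for x = y op z, constants and missing operands are non-string values that the scan ignores, statements are indexed from 0, and each symbol-table entry is a pair `(live, next_use)`. The `swap_steps` flag exists only to try out the question about the order of steps 2 and 3.

```python
def next_use(block, live_on_exit, swap_steps=False):
    """Return, per statement, the liveness/next-use info attached to it."""
    table = {v: (True, None) for v in live_on_exit}
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        operands = [v for v in (y, z) if isinstance(v, str)]
        # Step 1: attach the current symbol-table information to statement i.
        info[i] = {v: table.get(v, (False, None)) for v in [x] + operands}
        if swap_steps:
            for v in operands:
                table[v] = (True, i)
            table[x] = (False, None)
        else:
            table[x] = (False, None)       # Step 2: x is defined here
            for v in operands:
                table[v] = (True, i)       # Step 3: y and z are used here
    return info


block = [("x", 1, None), ("y", "x", 3), ("z", "x", 3), ("x", "x", "z")]
info = next_use(block, live_on_exit={"x", "y", "z"})
swapped = next_use(block, live_on_exit={"x", "y", "z"}, swap_steps=True)
```

The last statement, x = x * z, both uses and defines x. With the original order, x ends up live before that statement, so `info[2]["x"]` is `(True, 3)`. With the steps swapped, the definition overwrites the use and `swapped[2]["x"]` is `(False, None)`, which is wrong: the order cannot be changed.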

38

common-subexpression elimination

Before:

    a = b + c
    b = a - d
    c = b + c
    d = a - d

After:

    a = b + c
    b = a - d
    c = b + c
    d = b

39

DAG Representation of Basic Blocks

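    a = b + c
    b = a - d
    c = b + c
    d = a - d

[Figure: DAG for the block. Leaves: b0, c0, d0. Interior nodes: + over (b0, c0), labeled a; - over (a, d0), labeled b,d; + over (the b,d node, c0), labeled c.]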

40

DAG Representation of Basic Blocks

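    a = b + c
    b = b - d
    c = c + d
    e = b + c

[Figure: DAG for the block. Leaves: b0, c0, d0. Interior nodes: + over (b0, c0), labeled a; - over (b0, d0), labeled b; + over (c0, d0), labeled c; + over (the b node, the c node), labeled e.]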
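One way to get the effect of the DAG without building it explicitly is local value numbering: each distinct value gets a number, and an expression whose operator and operand numbers were already seen is a common subexpression. The Python sketch below swaps in that technique as an illustration; it is not the construction from the lecture. The tuple format `(dst, op, left, right)` and the `"copy"` pseudo-operator are assumptions, and commutativity is not exploited.

```python
def local_cse(block):
    """Replace recomputed expressions in one basic block by copies."""
    version = {}   # variable -> value number it currently holds
    table = {}     # (op, vn_left, vn_right) -> value number of the result
    holder = {}    # value number -> a variable that currently holds it
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return counter

    def vn(var):
        if var not in version:          # first use: value on block entry
            version[var] = fresh()
        return version[var]

    out = []
    for dst, op, left, right in block:
        key = (op, vn(left), vn(right))
        n = table.get(key)
        if n is not None and n in holder:
            out.append((dst, "copy", holder[n], None))
        else:
            if n is None:
                n = table[key] = fresh()
            out.append((dst, op, left, right))
        old = version.get(dst)
        if old is not None and holder.get(old) == dst:
            del holder[old]             # dst no longer holds its old value
        version[dst] = n
        holder.setdefault(n, dst)
    return out


first = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
         ("c", "+", "b", "c"), ("d", "-", "a", "d")]
second = [("a", "+", "b", "c"), ("b", "-", "b", "d"),
          ("c", "+", "c", "d"), ("e", "+", "b", "c")]
```

On `first`, the last statement becomes a copy of b, as in the DAG where one node carries both labels b and d. On `second`, nothing changes: the b + c in the last statement uses the redefined b and c, so it is a different value from the b + c in the first statement.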

41

algebraic identities

Before:

    a = x^2
    b = x*2
    c = x/2
    d = 1*x

After:

    a = x*x
    b = x+x
    c = x*0.5
    d = x
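The four rewrites can be expressed as a small pattern match over three-address statements. This Python sketch covers only the identities listed here; the tuple format `(dst, op, left, right)` and the `"copy"` pseudo-operator are illustrative assumptions. Note that replacing x/2 by x*0.5 is only valid for floating-point operands, not for integer division.

```python
def simplify(dst, op, left, right):
    """Rewrite dst = left op right using the listed algebraic identities."""
    if op == "^" and right == 2:
        return (dst, "*", left, left)      # x^2  ->  x*x
    if op == "*" and right == 2:
        return (dst, "+", left, left)      # x*2  ->  x+x
    if op == "/" and right == 2:
        return (dst, "*", left, 0.5)       # x/2  ->  x*0.5 (floats only)
    if op == "*" and left == 1:
        return (dst, "copy", right, None)  # 1*x  ->  x
    return (dst, op, left, right)          # no identity applies


before = [("a", "^", "x", 2), ("b", "*", "x", 2),
          ("c", "/", "x", 2), ("d", "*", 1, "x")]
after = [simplify(*stmt) for stmt in before]
```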

42

coming up next

register allocation

43

The End
