Post on 18-Jan-2016

7. Code Generation

Chih-Hung Wang

Compilers

References

1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)

Overview

Interpretation

An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.

Two varieties
• Recursive
• Iterative

Interpretation

Recursive interpretation
• operates directly on the AST [attribute grammar]
• simple to write
• thorough error checks
• very slow: about 1000x slower than compiled code

Iterative interpretation
• operates on intermediate code
• good error checking
• slow: about 100x slower than compiled code
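To make the recursive variety concrete, here is a minimal sketch of an interpreter that walks an expression AST directly. The node classes `Const` and `BinOp` and the function `interpret` are hypothetical names chosen for this illustration, not part of the slides; the expression is the 7*(1+5) example used later in the deck.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: float


@dataclass
class BinOp:
    op: str            # one of "+", "-", "*", "/"
    left: "Node"
    right: "Node"


Node = Union[Const, BinOp]


def interpret(node: Node) -> float:
    """Recursively evaluate an expression AST, checking for errors at each node."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, BinOp):
        left = interpret(node.left)
        right = interpret(node.right)
        if node.op == "+":
            return left + right
        if node.op == "-":
            return left - right
        if node.op == "*":
            return left * right
        if node.op == "/":
            if right == 0:
                raise ZeroDivisionError("division by zero in interpreted program")
            return left / right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"unknown node type {type(node).__name__}")


# The AST for 7*(1+5)
result = interpret(BinOp("*", Const(7), BinOp("+", Const(1), Const(5))))
```

Every node visit repeats the type dispatch and the error checks, which is one reason this style is simple to write but slow.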

Recursive Interpretation

Self-identifying data
• must handle user-defined data types
• value = pointer to type descriptor + array of subvalues
• example: complex number (re: 3.0, im: 4.0)
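A self-identifying value can be sketched as a record that carries a reference to its type descriptor next to its subvalues. The class names `TypeDescriptor` and `Value` and the `field` helper are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TypeDescriptor:
    name: str
    field_names: List[str]


@dataclass
class Value:
    type: TypeDescriptor       # the "pointer" to the type descriptor
    subvalues: List[object]    # one entry per field of the type

    def field(self, name: str) -> object:
        """Look up a subvalue through the type descriptor."""
        return self.subvalues[self.type.field_names.index(name)]


COMPLEX = TypeDescriptor("complex_number", ["re", "im"])
z = Value(COMPLEX, [3.0, 4.0])
```

Because each value names its own type, the interpreter can check at run time that an operation is applied to data of the right kind.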

Complex number representation

Iterative interpretation
• Operates on threaded AST
• Active node pointer
• Flat loop over a case statement

[Figure: threaded AST of an if-statement with the nodes IF, condition, THEN, ELSE, and FI]
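The three ingredients above (threaded AST, active node pointer, flat loop) can be sketched as follows. The `Node` layout, the node kinds, and the operand stack are assumptions for this illustration; the slides' own main loop is not reproduced in the transcript.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str                                   # "const", "add", "if", or "fi"
    value: float = 0.0
    successor: Optional["Node"] = None          # next node on the thread
    true_successor: Optional["Node"] = None     # used by "if" nodes only
    false_successor: Optional["Node"] = None    # used by "if" nodes only


def run(start: Node) -> List[float]:
    """Follow the thread from `start`; return the final operand stack."""
    stack: List[float] = []
    active = start                              # the active node pointer
    while active is not None:                   # flat loop; the if-chain acts as the case statement
        if active.kind == "const":
            stack.append(active.value)
            active = active.successor
        elif active.kind == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            active = active.successor
        elif active.kind == "if":
            condition = stack.pop()
            active = active.true_successor if condition else active.false_successor
        elif active.kind == "fi":               # both branches merge here
            active = active.successor
        else:
            raise ValueError(f"unknown node kind {active.kind!r}")
    return stack


# Thread for: IF 1 THEN 7 ELSE 9 FI
fi = Node("fi")
then_part = Node("const", 7.0, successor=fi)
else_part = Node("const", 9.0, successor=fi)
if_node = Node("if", true_successor=then_part, false_successor=else_part)
condition = Node("const", 1.0, successor=if_node)
final_stack = run(condition)
```

No recursion is involved: control flow is encoded in the successor links, and the loop only ever looks at the active node.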

Sketch of the main loop

Example for demo compiler

Code Generation

Compilation produces object code from the intermediate code tree through a process called code generation.

Tree rewriting
• Replace nodes and subtrees of the AST by target code segments
• Produce a linear sequence of instructions from the rewritten AST

Example of code generation

a := (b[4*c+d]*2) + 9;

Machine instructions

Load_Addr M[Ri], C, Rd
Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes.

Load_Byte (M+Ro)[Ri], C, Rd
Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above.

Two sample instructions with their ASTs

Code generation

Main issues
• Code selection: which template?
• Register allocation: too few!
• Instruction ordering

Optimal code generation is NP-complete
• Consider small parts of the AST
• Simplify target machine
• Use conventions

Object code sequence

Load_Byte (b+Rd)[Rc], 4, Rt
Load_Addr 9[Rt], 2, Ra

Trivial code generation

Code for (7*(1+5))

Partial evaluation

New Code

Simple code generation

Consider one AST node at a time

Two simplistic target machines
• Pure register machine
• Pure stack machine

[Figure: stack layout showing BP, SP, the stack, the frame, and the vars]

Pure stack machine

Instructions

Example of p := p+5

Push_Local #p
Push_Const 5
Add_Top2
Store_Local #p

Pure register machine

Instructions

Example of p := p+5

Load_Mem p, R1
Load_Const 5, R2
Add_Reg R2, R1
Store_Reg R1, p

Simple code generation for a stack machine

The AST for b*b - 4*(a*c)

The ASTs for the stack machine instructions

The AST for b*b - 4*(a*c) rewritten

Simple code generation for a stack machine (demo)

example: b*b - 4*a*c

[Figure: threaded AST of the expression, then the rewritten AST in which each node carries its stack machine instruction: Sub_Top2 for "-", Mul_Top2 for each "*", Push_Local for b, a, and c, and Push_Const for 4]

Code read off the rewritten AST:

Push_Local #b
Push_Local #b
Mul_Top2
Push_Const 4
Push_Local #a
Push_Local #c
Mul_Top2
Mul_Top2
Sub_Top2

Depth-first code generation
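A minimal sketch of depth-first code generation for the pure stack machine follows. The tuple encoding of the AST and the function name `generate_stack_code` are assumptions for this illustration; the instruction names are the ones used in the demo above.

```python
from typing import List, Tuple, Union

# An expression is a variable name, an integer constant, or (operator, left, right).
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]

OPCODES = {"+": "Add_Top2", "-": "Sub_Top2", "*": "Mul_Top2"}


def generate_stack_code(expr: Expr) -> List[str]:
    """Post-order walk: code for the left operand, then the right, then the operator."""
    if isinstance(expr, int):
        return [f"Push_Const {expr}"]
    if isinstance(expr, str):
        return [f"Push_Local #{expr}"]
    op, left, right = expr
    return generate_stack_code(left) + generate_stack_code(right) + [OPCODES[op]]


# b*b - 4*(a*c)
code = generate_stack_code(("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c"))))
```

The stack itself does the bookkeeping for intermediate results, so no register allocation is needed on this machine.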

Stack configurations

Simple code generation for a register machine

The ASTs for the register machine instructions

Code generation with register allocation

Code generation with register numbering

Register machine code for b*b - 4*(a*c)
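One way to obtain such code is register numbering: the result of a node goes into a target register, the left operand is computed into that same register and the right operand into the next one. The sketch below assumes a tuple-encoded AST and a hypothetical function `generate_register_code`; it is not the slides' listing, which is not part of the transcript.

```python
from typing import List

REG_OPCODES = {"+": "Add_Reg", "-": "Sub_Reg", "*": "Mul_Reg"}


def generate_register_code(expr, target: int = 1) -> List[str]:
    """Generate code that leaves the value of expr in register R<target>."""
    if isinstance(expr, int):
        return [f"Load_Const {expr}, R{target}"]
    if isinstance(expr, str):
        return [f"Load_Mem {expr}, R{target}"]
    op, left, right = expr
    return (
        generate_register_code(left, target)           # left operand in R<target>
        + generate_register_code(right, target + 1)    # right operand in R<target+1>
        + [f"{REG_OPCODES[op]} R{target + 1}, R{target}"]
    )


# b*b - 4*(a*c)
code = generate_register_code(("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c"))))
```

With this fixed left-to-right order the expression needs four registers (R1 to R4), which motivates the weighted scheme that follows.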

Register contents

Weighted register allocation

It is advantageous to generate the code for the child that requires the most registers first.

Weight: the number of registers required by a node

Register weight of a node
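The weight computation and the "heaviest child first" rule can be sketched as below. The weight rule used here (a leaf needs one register; an operator needs one more than its children when both have equal weight, otherwise the larger of the two) is the standard one for a pure register machine and is stated as an assumption, since the slide's figure is not in the transcript. The names `weight` and `generate` are hypothetical.

```python
from typing import List

OPCODES = {"+": "Add_Reg", "-": "Sub_Reg", "*": "Mul_Reg"}


def weight(expr) -> int:
    """Number of registers needed to evaluate expr without spilling."""
    if not isinstance(expr, tuple):
        return 1                                  # a leaf is loaded into one register
    _, left, right = expr
    wl, wr = weight(left), weight(right)
    return wl + 1 if wl == wr else max(wl, wr)


def generate(expr, free: List[str], code: List[str]) -> str:
    """Append code for expr to `code`; return the register holding the result."""
    if isinstance(expr, int):
        code.append(f"Load_Const {expr}, {free[0]}")
        return free[0]
    if isinstance(expr, str):
        code.append(f"Load_Mem {expr}, {free[0]}")
        return free[0]
    op, left, right = expr
    if weight(left) >= weight(right):             # heavier child first
        left_reg = generate(left, free, code)
        right_reg = generate(right, [r for r in free if r != left_reg], code)
    else:
        right_reg = generate(right, free, code)
        left_reg = generate(left, [r for r in free if r != right_reg], code)
    code.append(f"{OPCODES[op]} {right_reg}, {left_reg}")   # left_reg := left_reg op right_reg
    return left_reg


expr = ("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c")))
code: List[str] = []
result_reg = generate(expr, ["R1", "R2", "R3"], code)
```

For b*b - 4*(a*c) the root has weight 3, and the generated code indeed stays within R1 to R3. The sketch does no spilling: it fails with an IndexError if fewer registers are supplied than the weight requires.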

AST for b*b-4*(a*c) with register weights

Weighted register machine code

Example

Parameter number N                              2   3   1
Stored weight                                   4   2   1
Registers occupied when starting parameter N    0   1   2
Maximum per parameter                           4   3   3
Overall maximum                                 4

Example: Tree representation

Register spilling

Too few registers?
• Spill registers in memory, to be retrieved later
• Heuristic: select subtree that uses all registers, and replace it by a temporary

example: b*b - 4*a*c, 2 registers

[Figure: AST of the expression annotated with register weights, with one subtree replaced by the temporary T1]

Register spilling

Load_Mem b, R1
Load_Mem b, R2
Mul_Reg R2, R1
Store_Mem R1, T1
Load_Mem a, R1
Load_Mem c, R2
Mul_Reg R2, R1
Load_Const 4, R2
Mul_Reg R1, R2
Load_Mem T1, R1
Sub_Reg R2, R1

Another example

[Figure: AST for b*b - 4*a*c annotated with register weights and the temporary T1]

Algorithm

Machines with register-memory operations

An instruction:

Add_Mem X, R1
Adds the contents of memory location X to R1.

Register-weighted tree for a memory-register machine

Code generation for basic blocks

Finding the optimal rewriting of the AST with available instruction templates is NP-complete.

Three techniques
• Basic blocks
• Bottom-up tree rewriting
• Register allocation by graph coloring

Basic block

Improve quality of code emitted by simple code generation
• Consider multiple AST nodes at a time
• Generate code for maximal basic blocks that cannot be extended by including adjacent AST nodes

basic block: a part of the control graph that contains no splits (jumps) or combines (labels)

Example of basic block

A basic block consists of expressions and assignments
• Fixed sequence (;) limits code generation
• An AST is too restrictive

From AST to dependency graph

AST for the simple basic block

Simple algorithm to convert AST to a data dependency graph
• Replace arcs by downwards arrows (upwards for destination under assignment)
• Insert data dependencies from use of V to preceding assignment to V
• Insert data dependencies from the assignment to a variable V to the previous assignment to V
• Add roots to the graph (output variables)
• Remove ;-nodes and connecting arrows

Simple data dependency graph

Cleaned-up graph

Exercise

{ int n;
  n = a+1;
  x = (b+c) * n;
  n = n+1;
  y = (b+c) * n;
}

Convert the above code to a data dependency graph.

Answer

[Figure: data dependency graph for the exercise, with roots x and y, operators + and *, and leaves a, b, c, and 1]

Common subexpression elimination

Simple example

x = a*a + 2*a*b + b*b;
y = a*a - 2*a*b + b*b;

Three common subexpressions

double quads = a*a + b*b;
double cross_prod = 2*a*b;
x = quads + cross_prod;
y = quads - cross_prod;

Common subexpression

Equal subexpressions in a basic block are not necessarily common subexpressions

x = a*a + 2*a*b + b*b;
a = b = 0;
y = a*a - 2*a*b + b*b;
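The difference between equal and common subexpressions can be checked mechanically: a subexpression is only reusable if none of its operands has been assigned since it was last computed. The sketch below is an illustration under assumptions (tuple-encoded expressions, a basic block as a list of assignments, hypothetical function names); it is not the dependency-graph method of the slides.

```python
def subexpressions(expr):
    """Yield every operator subtree of expr, children first."""
    if isinstance(expr, tuple):
        _, left, right = expr
        yield from subexpressions(left)
        yield from subexpressions(right)
        yield expr


def variables(expr):
    """Set of variable names read by expr."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return variables(expr[1]) | variables(expr[2])
    return set()


def find_common_subexpressions(block):
    """block: list of (target, expr) assignments in execution order."""
    available = set()
    common = set()
    for target, expr in block:
        for sub in subexpressions(expr):
            if sub in available:
                common.add(sub)
            available.add(sub)
        # The assignment to `target` invalidates every expression that reads it.
        available = {e for e in available if target not in variables(e)}
    return common


aa = ("*", "a", "a")
two_ab = ("*", ("*", 2, "a"), "b")
bb = ("*", "b", "b")
x = ("+", ("+", aa, two_ab), bb)
y = ("+", ("-", aa, two_ab), bb)

without_kill = find_common_subexpressions([("x", x), ("y", y)])
with_kill = find_common_subexpressions([("x", x), ("b", 0), ("a", 0), ("y", y)])
```

Without the intervening assignments the sketch reports a*a, 2*a*b, and b*b (plus the nested product 2*a); with a = b = 0 in between it reports nothing, although the text of the subexpressions is identical.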

Common subexpression example (1/3)

Common subexpression example (2/3)

Common subexpression example (3/3)

From dependency graph to code

Rewrite nodes with machine instruction templates, and linearize the result
• Instruction ordering: ladder sequences
• Register allocation: graph coloring

Linearization of the data dependency graph

Example: (a+b)*c - d

Load_Mem a, R1
Add_Mem b, R1
Mul_Mem c, R1
Sub_Mem d, R1

Definition of a ladder sequence
• Each root node is a ladder sequence
• A ladder sequence S ending in operator node N can be extended with the left operand of N
• If operator N is commutative then S may also be extended with the right operand of N
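For a tree in which every right operand is a variable in memory, the ladder is simply the left spine, and its code can be emitted with a single register. The following sketch assumes a tuple-encoded tree and a hypothetical function `ladder_code`; it covers only this simple case, not general dependency graphs.

```python
from typing import List

MEM_OPCODES = {"+": "Add_Mem", "-": "Sub_Mem", "*": "Mul_Mem"}


def ladder_code(expr, register: str = "R1") -> List[str]:
    """Emit code along the left spine; each right operand must be a memory variable."""
    if isinstance(expr, str):
        return [f"Load_Mem {expr}, {register}"]     # bottom of the ladder
    op, left, right = expr
    if not isinstance(right, str):
        raise ValueError("right operand must be a variable in memory")
    return ladder_code(left, register) + [f"{MEM_OPCODES[op]} {right}, {register}"]


# (a+b)*c - d
code = ladder_code(("-", ("*", ("+", "a", "b"), "c"), "d"))
```

The register named in every instruction is the ladder register; the sequence grows by one register-memory instruction per operator on the spine.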

Code generated for a given ladder sequence

Load_Mem b, R1
Add_Reg I1, R1
Add_Mem c, R1
Store_Reg R1, x

Heuristic ordering algorithm

To delay the issues of register allocation, use pseudo-registers during the linearization.

1. Select a ladder sequence S without more than one incoming dependency
2. Introduce temporary (pseudo-) registers for non-leaf operands, which become additional roots
3. Generate code for S, using R1 as the ladder register
4. Remove S from the graph
5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code

Example of linearization

[Figure: data dependency graph with the pseudo-register X1]

The code for y, *, +

Load_Reg X1, R1
Add_Const 1, R1
Mult_Mem d, R1
Store_Reg R1, y

Remove the ladder sequence y, *, +

The code for x, +, +, *

Load_Reg X1, R1
Mult_Reg X1, R1
Add_Mem b, R1
Add_Mem c, R1
Store_Reg R1, x

The last step

Load_Mem a, R1
Add_Const 1, R1
Load_Reg R1, X1

The results of code generation

Exercise

Generate code for the following dependency graph

[Figure: dependency graph with roots x and y, operators +, -, and *, the constant 2, and leaves a and b]

Answers

[Figure: the dependency graph of the exercise with the pseudo-registers R2, R3, and R4 marked on the shared operands]

1) ladder: x, +, +

Load_Reg R2, R1
Add_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, x

2) ladder: y, +, -

Load_Reg R2, R1
Sub_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, y

3) ladder: R3, *, *

Load_Const 2, R1
Mul_Reg Ra, R1
Mul_Reg Rb, R1
Load_Reg R1, R3

4) ladder: R2, *

Load_Reg Ra, R1
Mul_Reg Ra, R1
Load_Reg R1, R2

5) ladder: R4, *

Load_Reg Rb, R1
Mul_Reg Rb, R1
Load_Reg R1, R4

Register allocation for the linearized code

Map the pseudo-registers to memory locations or real registers

gcc compiler

Code optimization in the presence of pointers

Pointers cause two different problems for the dependency graph

a = x * y;
*p = 3;
b = x * y;

x * y is not a common subexpression if p happens to point to x or y

a = *p * q;
b = 3;
c = *p * q;

*p * q is not a common subexpression if p happens to point to b

Example (1/4)

Assignment under a pointer

Example (2/4)

Data dependency graph with an assignment under a pointer

Example (3/4)

Cleaned-up graph

Example (4/4)

Target code

*x:=R1

BURS code generation

In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized.

Machines often have several hundred different machine instructions, often each with ten or more addressing modes, and it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.

BURS code generation

Simple instruction patterns (1/2)

BURS code generation

Simple instruction patterns (2/2)

Example: Input tree

Naïve rewrite

Its cost is 17 units: 1 + 3 + 4 + 1 + 4 + 3 + 1 = 17

Code resulting

Top-down largest-fit rewrite

Discussion
• How do we find all possible rewrites, and how do we represent them? Clearly, we do not want to list them all.
• How do we find the best (cheapest) rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?

Bottom-up pattern matching

The dotted trees

Outline code for bottom-up pattern matching

Label set resulting

Instruction selection by dynamic programming

Bottom-up pattern matching with costs

[Figure: tree node labeled with the items #5->reg, #6->reg, #7.1, and #8.1, as input to instruction selection]

Cost evaluation

Lower *
• #5->reg@7
• #6->reg@8 (1+3+4)

Higher *
• #6->reg@12 (1+7+4)
• #8->reg@9 (1+3+5)

Top + (?): Exercise

Code generation by bottom-up matching

Code generation by bottom-up matching, using commutativity

Pattern matching and instruction selection combined

Two basic operands
• State S1: -> cst@0, #1->reg@1
• State S2: -> mem@0, #2->reg@3

States of the BURS

Creating the cost-conscious next-state table

The triplet {'+', S1, S1} = S3
S3:
– #4->reg@3 (1+1+1)

{'+', S1, S2} = S5
S5:
– #3->reg@1+0+3=4
– #4->reg@1+3+1=5

Exercise: {'+', S1, S5}

Exercise: {'*', S1, S2}
– #5->reg@1+0+6=7 (4)
– #6->reg@1+3+4=8
– #7.1@0+3+0=3 (0)
– #8.1@0+3+0=3 (0)

Cost-conscious next-state table

Code generation using cost-conscious next-state table

Register allocation by graph coloring
• Procedure-wide register allocation
• Only live variables require register storage
• Two variables (values) interfere when their live ranges overlap

dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead
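For straight-line code (a single path, so "some path" is the only path) liveness can be computed in one backward scan. The sketch below assumes instructions given as (target, operands) pairs and a hypothetical function `live_sets`; a full analysis over a control-flow graph with branches would need an iterative dataflow solver instead.

```python
from typing import Iterable, List, Set, Tuple


def live_sets(instructions: List[Tuple[str, Iterable[str]]],
              live_out: Iterable[str]) -> List[Set[str]]:
    """Return, for each instruction, the set of variables live just before it."""
    live = set(live_out)
    result = []
    for target, operands in reversed(instructions):
        # The target is overwritten here, so its old value is dead above this point;
        # the operands are used here, so they are live above this point.
        live = (live - {target}) | set(operands)
        result.append(set(live))
    result.reverse()
    return result


program = [
    ("a", ["b", "c"]),   # a = b + c
    ("d", ["a", "b"]),   # d = a + b
    ("e", ["d"]),        # e = d + 1
]
before = live_sets(program, live_out=["e"])
```

Variables that appear together in one of these sets have overlapping live ranges and therefore interfere.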

A program segment for live analysis

Live range of the variables

Graph coloring
• NP-complete problem
• Heuristic: color easy nodes last
  – Find node N with lowest degree
  – Remove N from the graph
  – Color the simplified graph
  – Set color of N to the first color that is not used by any of N's neighbors
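The heuristic translates almost line by line into a recursive function. The dictionary representation of the interference graph, the function name `color_graph`, and the example graph are assumptions for this sketch; colors are numbered from 0 and stand for registers.

```python
from typing import Dict, Set


def color_graph(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """graph maps each node to the set of its neighbors; returns node -> color."""
    if not graph:
        return {}
    # Find the node N with the lowest degree (ties broken by name).
    node = min(graph, key=lambda n: (len(graph[n]), n))
    # Remove N from the graph and color the simplified graph first.
    reduced = {n: nbrs - {node} for n, nbrs in graph.items() if n != node}
    colors = color_graph(reduced)
    # Give N the first color not used by any of its neighbors.
    used = {colors[n] for n in graph[node]}
    color = 0
    while color in used:
        color += 1
    colors[node] = color
    return colors


# Hypothetical interference graph: a, b, c interfere pairwise; d interferes with c only.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
colors = color_graph(interference)
```

The sketch always finds a coloring but does not bound the number of colors; a real allocator would spill a variable when the color index reaches the number of available registers.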

Coloring process

3 registers

Preprocessing the intermediate code

Preprocessing of expressions

char lower_case_from_capital(char ch) {
    return ch + ('a' - 'A');
}

Constant expression evaluation

char lower_case_from_capital(char ch) {
    return ch + 32;
}
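Constant expression evaluation amounts to a bottom-up pass that replaces an operator node by its value whenever both operands are constants. The tuple encoding and the name `fold_constants` are assumptions for this sketch; it handles integer +, -, and * only.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Return expr with every constant subexpression replaced by its value."""
    if not isinstance(expr, tuple):
        return expr                                # variable name or constant
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)           # evaluate at compile time
    return (op, left, right)


# ch + ('a' - 'A'), with the character constants given by their ASCII codes
folded = fold_constants(("+", "ch", ("-", ord("a"), ord("A"))))
```

Here the subtraction is folded to 32 while the addition stays, because `ch` is only known at run time.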

Arithmetic simplification

Transformations that replace an operation by a simpler one are called strength reductions.

Operations that can be removed completely are called null sequences.

Some transformations for arithmetic simplification
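The table of transformations is not part of this transcript, so the sketch below uses three commonly cited rules as assumptions: E + 0 => E and E * 1 => E as null sequences, and E * 2**n => E << n as a strength reduction. The function name `simplify` and the tuple encoding are likewise hypothetical.

```python
def simplify(expr):
    """Apply a few arithmetic simplifications bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "+" and right == 0:
        return left                                        # null sequence: E + 0 => E
    if op == "*" and right == 1:
        return left                                        # null sequence: E * 1 => E
    if op == "*" and isinstance(right, int) and right > 1 and (right & (right - 1)) == 0:
        return ("<<", left, right.bit_length() - 1)        # strength reduction: E * 2**n => E << n
    return (op, left, right)


shifted = simplify(("*", "x", 8))
removed = simplify(("+", ("*", "x", 1), 0))
```

The rules shown are valid for integer operands; floating-point operands would need separate treatment.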

Preprocessing of if-statements and goto statements

When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination.

If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.

Stack representations

Stack representations (details)

[Figure: threaded AST of an if-statement (IF, condition y > 0, THEN with x = 7;, ELSE, FI) annotated with the stack representation of x and y at each node, marking the dead code branch and the merge at FI]

Preprocessing of routines

In-lining method

In-lining result

Advanced examples:

{int n=3; printf("square=%d\n", n*n);}
=> {int n=3; printf("square=%d\n", 3*3);}
=> {int n=3; printf("square=%d\n", 9);}

Load_par "square=%d\n"
Load_par 9
Call printf

Cloning

Example

double power_series(int n, double a[], double x) {
    double result = 0.0;
    int p;
    for (p = 0; p < n; p++) result += a[p] * (x ** p);
    return result;
}

Is called with x set to 1.0

double power_series(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p < n; p++) result += a[p] * (1.0 ** p);
    return result;
}

double power_series(int n, double a[]) {
    double result = 0.0;
    int p;
    for (p = 0; p < n; p++) result += a[p];
    return result;
}

Postprocessing the target code

Stupid instruction sequences

Load_Reg R1, R2
Load_Reg R2, R1

or

Store_Reg R1, n
Load_Mem n, R1

Creating replacement patterns

Example

Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb

Load_Const 1, Ra; Add_Reg Rb, Rc | Ra=Rb, is_last_use(Rb) => Increment Rc
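A replacement pattern of this kind can be applied by a small peephole pass over adjacent instruction pairs. The sketch below implements the first pattern as written, plus a second rule for the store-then-reload pair that is an assumption of this example (it presumes n is an ordinary memory location). Instructions are modelled as (opcode, source, destination) tuples, and `peephole` is a hypothetical name.

```python
from typing import List, Tuple

Instruction = Tuple[str, str, str]    # (opcode, source, destination)


def peephole(code: List[Instruction]) -> List[Instruction]:
    """Drop the second instruction of each redundant adjacent pair."""
    result: List[Instruction] = []
    for instr in code:
        if result:
            prev = result[-1]
            # Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb
            if (prev[0] == "Load_Reg" and instr[0] == "Load_Reg"
                    and prev[1] == instr[2] and prev[2] == instr[1]):
                continue
            # Assumed rule: Store_Reg R, n; Load_Mem n, R => Store_Reg R, n
            if (prev[0] == "Store_Reg" and instr[0] == "Load_Mem"
                    and prev[1] == instr[2] and prev[2] == instr[1]):
                continue
        result.append(instr)
    return result


cleaned = peephole([
    ("Load_Reg", "R1", "R2"),
    ("Load_Reg", "R2", "R1"),
    ("Store_Reg", "R1", "n"),
    ("Load_Mem", "n", "R1"),
])
```

The second pattern on the slide needs the extra condition is_last_use(Rb), which requires liveness information and is therefore left out of this sketch.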

Locating and replacing instructions
• Multiple pattern matching
• Using FSA
• Dotted items

Homework

Study sections
• 4.2.13 Machine code generation
• 4.3 Assemblers, linkers and loaders
