Classical Optimization


Upload: gerodi

Post on 05-Jan-2016


Page 1: Classical Optimization

Classical Optimization

Types of classical optimizations:
- Operation level: one operation in isolation
- Local: optimize pairs of operations in the same basic block (with or without dataflow analysis), e.g. peephole optimization
- Global: optimize pairs of operations spanning multiple basic blocks; dataflow analysis is required in this case, e.g. reaching definitions, UD/DU chains, or SSA form
- Loop: optimize loop bodies and nested loops

Page 2: Classical Optimization

Local Constant Folding

Goal: eliminate unnecessary operations

Rules:
1. X is an arithmetic operation
2. If src1(X) and src2(X) are constant, then change X by applying the operation

Example:
    r7 = 4 + 1      (X: src1(X) = 4, src2(X) = 1)
    r5 = 2 * r4
    r6 = r5 * 2
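The folding rule above can be sketched in a few lines of Python. The tuple IR `(dest, opcode, src1, src2)`, the `"mov"` opcode name, and the function name are assumptions made for illustration; they are not part of the slides.

```python
import operator

# Hypothetical three-address IR: (dest, opcode, src1, src2).
# Python ints stand for literals, strings for registers.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """Replace arithmetic operations on two literals with a move of the result."""
    out = []
    for dest, op, a, b in block:
        if op in ARITH and isinstance(a, int) and isinstance(b, int):
            out.append((dest, "mov", ARITH[op](a, b), None))
        else:
            out.append((dest, op, a, b))
    return out

block = [("r7", "+", 4, 1), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
folded = fold_constants(block)
```

Only the first operation has two literal sources, so only `r7 = 4 + 1` is rewritten (to `r7 = 5`); the other two are left for constant combining and strength reduction.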

Page 3: Classical Optimization

Local Constant Combining

Goal: eliminate unnecessary operations
- First operation often becomes dead

Rules:
1. Operations X and Y in same basic block
2. X and Y have at least one literal src
3. Y uses dest(X)
4. None of the srcs of X have defs between X and Y (excluding Y)

Example:
    r7 = 5
    r5 = 2 * r4
    r6 = r5 * 2     becomes     r6 = r4 * 4

Page 4: Classical Optimization

Local Strength Reduction

Goal: replace expensive operations with cheaper ones

Rules (example):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X by using a shift operation
3. For k=1 can use add

Example:
    r7 = 5
    r5 = 2 * r4     becomes     r5 = r4 + r4
    r6 = r4 * 4     becomes     r6 = r4 << 2
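As a rough illustration of these rules, the sketch below rewrites multiplications by a power-of-two literal. The tuple IR `(dest, opcode, src1, src2)` and the helper names are hypothetical, and multiplication by 1 (k = 0) is deliberately left alone because the slide does not cover it.

```python
def power_of_two_exponent(n):
    """Return k if n is an int literal equal to 2**k with k >= 1, else None."""
    if isinstance(n, int) and n >= 2 and n & (n - 1) == 0:
        return n.bit_length() - 1
    return None

def reduce_strength(block):
    """Rewrite reg * 2**k as an add (k == 1) or a left shift (k > 1)."""
    out = []
    for dest, op, a, b in block:
        # Multiplication commutes, so accept the literal in either position.
        reg, lit = (a, b) if isinstance(b, int) else (b, a)
        k = power_of_two_exponent(lit) if op == "*" and isinstance(reg, str) else None
        if k is None:
            out.append((dest, op, a, b))
        elif k == 1:
            out.append((dest, "+", reg, reg))
        else:
            out.append((dest, "<<", reg, k))
    return out

block = [("r5", "*", 2, "r4"), ("r6", "*", "r4", 4), ("r9", "*", "r4", 3)]
reduced = reduce_strength(block)
```

The third operation multiplies by 3, which is not a power of two, so it is returned unchanged.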

Page 5: Classical Optimization

Local Constant Propagation

Goal: replace register uses with literals (constants) in single basic block

Rules:
1. Operation X is a move to register with src1(X) literal
2. Operation Y uses dest(X)
3. There is no def of dest(X) between X and Y (excluding defs at X and Y)
4. Replace dest(X) in Y with src1(X)

Example:
    r1 = 5
    r2 = _x
    r3 = 7
    r4 = r4 + r1
    r1 = r1 + r2
    r1 = r1 + 1
    r3 = 12
    r8 = r1 - r2
    r9 = r3 + r5
    r3 = r2 + 1
    r7 = r3 - r1
    M[r7] = 0
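A minimal single-pass sketch of these rules, assuming a hypothetical tuple IR `(dest, opcode, src1, src2)` in which `"mov"` marks a move. The input is an excerpt of the block above; the `_x` move and the store are left out to keep the IR small.

```python
def propagate_constants(block):
    """Forward pass over one basic block, replacing register uses by known literals."""
    known = {}  # register -> literal from the most recent move that is still valid
    out = []
    for dest, op, a, b in block:
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        out.append((dest, op, a, b))
        known.pop(dest, None)          # any def of dest ends the earlier move's reach
        if op == "mov" and isinstance(a, int):
            known[dest] = a            # rule 1: move of a literal into a register
    return out

block = [
    ("r1", "mov", 5, None),
    ("r3", "mov", 7, None),
    ("r4", "+", "r4", "r1"),
    ("r1", "+", "r1", "r2"),
    ("r3", "mov", 12, None),
    ("r9", "+", "r3", "r5"),
]
propagated = propagate_constants(block)
```

Here `r4 = r4 + r1` becomes `r4 = r4 + 5`, and `r9 = r3 + r5` picks up 12 rather than 7 because the second move to r3 redefines it.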

Page 6: Classical Optimization

Local Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression
- More efficient code
- Resulting moves can get copy propagated (see later)

Rules:
1. Operations X and Y have the same opcode and Y follows X
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src between X and Y (excluding Y)
4. No def of dest(X) between X and Y (excluding X and Y)
5. Replace Y with move dest(Y) = dest(X)

Example:
    r1 = r2 + r3
    r4 = r4 + 1
    r1 = 6
    r6 = r2 + r3
    r2 = r1 - 1
    r5 = r4 + 1
    r7 = r2 + r3
    r5 = r1 - 1
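One way to apply rules 1 to 5 in a single forward pass is to keep a table of expressions that are still available. This is a sketch under assumptions: the tuple IR `(dest, opcode, src1, src2)` and the function name are hypothetical, and operands are matched in the order written (no commutativity).

```python
def local_cse(block):
    """Replace a recomputed expression by a move from the register that holds it."""
    available = {}  # (opcode, src1, src2) -> register currently holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = available.get(key) if op != "mov" else None
        if holder is not None:
            out.append((dest, "mov", holder, None))   # rule 5
        else:
            out.append((dest, op, a, b))
        # A def of dest kills expressions that read dest (rule 3)
        # and expressions whose value is held in dest (rule 4).
        available = {k: r for k, r in available.items()
                     if r != dest and dest not in k[1:]}
        if holder is None and op != "mov" and dest not in (a, b):
            available[key] = dest
    return out

block = [
    ("r1", "+", "r2", "r3"),
    ("r4", "+", "r4", 1),
    ("r1", "mov", 6, None),
    ("r6", "+", "r2", "r3"),
    ("r2", "-", "r1", 1),
    ("r5", "+", "r4", 1),
    ("r7", "+", "r2", "r3"),
    ("r5", "-", "r1", 1),
]
optimized = local_cse(block)
```

On the block above only the last operation is rewritten, to `r5 = r2`. The earlier candidates fail a rule: `r6 = r2 + r3` because r1 is redefined in between, `r5 = r4 + 1` because the first `r4 + 1` overwrites its own source, and `r7 = r2 + r3` because r2 is redefined.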

Page 7: Classical Optimization

Dead Code Elimination

Goal: eliminate any operation whose result is never used

Rules (dataflow required):
1. X is an operation with no use in DU chain, i.e. dest(X) is not live
2. Delete X if removable (not a mem store or branch)

Rules too simple!
- Misses deletion of r4, even after deleting r7, since r4 is live in the loop
- Better is to trace UD chains backwards from "critical" operations

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r4 = r4 + 1
           r7 = r1 * r4
    block: r3 = r3 + 1
    block: r2 = 0
    block: r3 = r2 + r1
    block: M[r1] = r3
    block: r1 = 3
           r2 = 10
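The "trace backwards from critical operations" idea can be sketched as a mark-and-sweep pass. Everything named here is an assumption for illustration: the tuple IR `(dest, opcode, src1, src2)`, the `"store"`/`"branch"` opcode names, and above all the flow-insensitive stand-in for UD chains, which treats every def of a register as reaching every use of it. That is conservative; real UD chains could delete more.

```python
CRITICAL = {"store", "branch"}  # operations that must never be deleted

def eliminate_dead_code(ops):
    """Keep critical operations plus everything they transitively depend on."""
    live = [op[1] in CRITICAL for op in ops]
    worklist = [i for i, is_live in enumerate(live) if is_live]
    while worklist:
        _, _, a, b = ops[worklist.pop()]
        for i, (dest, _, _, _) in enumerate(ops):
            # Flow-insensitive approximation of a UD chain:
            # any def of a used register may reach this use.
            if not live[i] and dest is not None and dest in (a, b):
                live[i] = True
                worklist.append(i)
    return [op for op, keep in zip(ops, live) if keep]

ops = [
    ("r1", "mov", 3, None),
    ("r2", "mov", 10, None),
    ("r4", "+", "r4", 1),
    ("r7", "*", "r1", "r4"),
    ("r3", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r3", "+", "r2", "r1"),
    (None, "store", "r1", "r3"),   # M[r1] = r3
]
kept = eliminate_dead_code(ops)
```

Starting from the store, neither `r4 = r4 + 1` nor `r7 = r1 * r4` is ever marked, so both are removed. This is the case the liveness-based rule misses, because r4 keeps itself live around the loop.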

Page 8: Classical Optimization

Local Backward Copy Propagation

Goal: propagate LHS of moves backward
- Eliminates useless moves

Rules (dataflow required):
1. X and Y in same block
2. Y is a move to register
3. dest(X) is a register that is not live out of the block
4. Y uses dest(X)
5. dest(Y) not used or defined between X and Y (excluding X and Y)
6. No uses of dest(X) after the first redef of dest(Y)
7. Replace src(Y) on path from X to Y with dest(X) and remove Y

Example:
    r1 = r8 + r9
    r2 = r9 + r1
    r4 = r2
    r6 = r2 + 1
    r9 = r1
    r7 = r6
    r5 = r6 + 1
    r4 = 0
    r8 = r2 + r7

Page 9: Classical Optimization

Global Constant Propagation

Goal: globally replace register uses with literals

Rules (dataflow required):
1. X is a move to a register with src1(X) literal
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. Replace dest(X) in Y with src1(X)

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r5 = 2
           r7 = r1 * r5
    block: r3 = r3 + r5
    block: r2 = 0
    block: r3 = r2 + r1
           r6 = r7 * r4
    block: M[r1] = r3
    block: r1 = 4
           r2 = 10

Page 10: Classical Optimization

Global Constant Propagation with SSA

Goal: globally replace register uses with literals

Rules (high level):
1. For operation X with a register src(X)
2. Find def of src(X) in chain
3. If def is move of literal, src(X) is constant: done
4. If RHS of def is an operation, including φ node, recurse on all srcs
5. Apply rule for operation to determine src(X) constant
6. Note: abstract values ⊤ (top) and ⊥ (bottom) are often used to indicate unknown values

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r5 = 2
           r7 = r1 * r5
    block: r3 = r3 + r5
    block: r2 = 0
    block: r3 = r2 + r1
           r6 = r7 * r4
    block: M[r1] = r3
    block: r1 = 4
           r2 = 10

Exercise: compute SSA form and propagate constants
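The recursive rule can be sketched as below. This is not a solution to the exercise: the SSA names and the φ placement in `defs` are made up for illustration, since the CFG edges are not available here. The `defs` mapping, the `BOTTOM` sentinel, and the function name are likewise assumptions. Only ⊥ is modeled; a cycle through a φ node is conservatively treated as ⊥ instead of starting from ⊤ and iterating.

```python
import operator

BOTTOM = object()  # stands for "not known to be a constant"
OPS = {"+": operator.add, "*": operator.mul}

def constant_value(name, defs, in_progress=frozenset()):
    """Return the literal an SSA name must hold, or BOTTOM if that is unknown."""
    if isinstance(name, int):
        return name
    if name not in defs or name in in_progress:
        return BOTTOM  # no visible def (e.g. an input), or a cycle through a phi
    kind, *srcs = defs[name]
    vals = [constant_value(s, defs, in_progress | {name}) for s in srcs]
    if any(v is BOTTOM for v in vals):
        return BOTTOM
    if kind == "mov":
        return vals[0]
    if kind == "phi":  # constant only if every incoming value is the same literal
        return vals[0] if all(v == vals[0] for v in vals) else BOTTOM
    return OPS[kind](*vals)

# Hypothetical SSA definitions, one per name.
defs = {
    "r1_1": ("mov", 4),
    "r5_1": ("mov", 2),
    "r7_1": ("*", "r1_1", "r5_1"),
    "r2_1": ("mov", 10),
    "r2_2": ("mov", 0),
    "r2_3": ("phi", "r2_1", "r2_2"),
    "r3_1": ("+", "r2_3", "r1_1"),
    "r6_1": ("*", "r7_1", "r4_0"),  # r4_0 has no def in this fragment
}
r7_value = constant_value("r7_1", defs)
```

In this fragment `r7_1` evaluates to 8, while `r2_3` merges 10 and 0 and is therefore not a constant, which in turn makes `r3_1` unknown. The sketch does no memoization, so it is only suitable for small examples.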

Page 11: Classical Optimization

Forward Copy Propagation

Goal: globally propagate RHS of moves forward
- Reduces dependence chain
- May be possible to eliminate moves

Rules (dataflow required):
1. X is a move with src1(X) register
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. src1(X) has no def on any path from X to Y
5. Replace dest(X) in Y with src1(X)

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r1 = r2
           r3 = r4
    block: r6 = r3 + 1
    block: r2 = 0
    block: r5 = r2 + r3
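The rules above are global and rely on UD chains. The sketch below is a deliberately simplified single-block version, in which "no def on any path from X to Y" reduces to "no def between X and Y". The tuple IR `(dest, opcode, src1, src2)`, the function name, and the straight-line ordering of the example (with an extra use of r1 added at the end) are assumptions.

```python
def propagate_copies(block):
    """Replace uses of a copied register by the copy's source within one block."""
    copies = {}  # dest -> src for register moves that are still valid
    out = []
    for dest, op, a, b in block:
        a = copies.get(a, a) if isinstance(a, str) else a
        b = copies.get(b, b) if isinstance(b, str) else b
        out.append((dest, op, a, b))
        # A def of dest invalidates copies into dest and copies out of dest (rule 4).
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if op == "mov" and isinstance(a, str) and a != dest:
            copies[dest] = a
    return out

block = [
    ("r1", "mov", "r2", None),
    ("r3", "mov", "r4", None),
    ("r6", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r5", "+", "r2", "r3"),
    ("r9", "+", "r1", 1),      # added use of r1, not on the slide
]
propagated = propagate_copies(block)
```

The uses of r3 are replaced by r4, but the final use of r1 is left alone: r2 is redefined after `r1 = r2`, so rule 4 blocks the replacement.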

Page 12: Classical Optimization

Global Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression

Rules:
1. X and Y have the same opcode and X dominates Y
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X for new register rx
5. Replace Y with move dest(Y) = rx

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r1 = r2 * r6
           r3 = r4 / r7
    block: r2 = r2 + 1
    block: r1 = r3 * 7
    block: r5 = r2 * r6
           r8 = r4 / r7
    block: r9 = r3 * 7

Page 13: Classical Optimization

Loop Optimizations

Loops are the most important target for optimization
- Programs spend much time in loops

Loop optimizations:
- Invariant code removal (aka code motion)
- Global variable migration
- Induction variable strength reduction
- Induction variable elimination

Page 14: Classical Optimization

Code Motion

Goal: move loop-invariant computations to preheader

Rules:
1. Operation X in block that dominates all exit blocks
2. X is the only operation to modify dest(X) in loop body
3. All srcs of X have no defs in any of the basic blocks in the loop body
4. Move X to end of preheader
5. Note 1: if one src of X is a memory load, need to check for stores in loop body
6. Note 2: X must be movable and not cause exceptions

Example CFG (blocks only; the edges are not preserved in this transcript):
    preheader: r1 = 0
    block: r4 = M[r5]
           r7 = r4 * 3
    block: r8 = r2 + 1
           r7 = r8 * r4
    block: r3 = r2 + 1
    block: r1 = r1 + r7
    block: M[r1] = r3
    (the figure also labels the loop header)
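A minimal sketch of rules 2 to 4 and Note 1, restricted to a loop whose body is a single basic block so that the dominance condition in rule 1 holds trivially. The tuple IR `(dest, opcode, src1, src2)`, the `"load"`/`"store"` opcode names, and the example loop are assumptions. The sketch also refuses to hoist an operation whose dest is read earlier in the body, an extra safety check that is not on the slide, and it ignores Note 2 (exceptions).

```python
def hoist_invariants(preheader, body):
    """Move loop-invariant operations from a single-block loop body to the preheader."""
    defs = [dest for dest, _, _, _ in body if dest is not None]
    has_store = any(op == "store" for _, op, _, _ in body)
    hoisted, kept = [], []
    for i, (dest, op, a, b) in enumerate(body):
        # Extra check (not on the slide): dest must not be read before X in the body.
        used_earlier = any(dest in (x, y) for _, _, x, y in body[:i])
        movable = (
            dest is not None and op != "store"
            and defs.count(dest) == 1                 # rule 2: only def of dest in loop
            and a not in defs and b not in defs       # rule 3: srcs have no defs in loop
            and not used_earlier
            and not (op == "load" and has_store)      # Note 1: conservative memory check
        )
        (hoisted if movable else kept).append((dest, op, a, b))
    return preheader + hoisted, kept                  # rule 4: append to the preheader

preheader = [("r1", "mov", 0, None)]
body = [
    ("r4", "load", "r5", None),     # r4 = M[r5]
    ("r3", "+", "r2", 1),
    ("r1", "+", "r1", "r3"),
    (None, "store", "r1", "r3"),    # M[r1] = r3
]
new_preheader, new_body = hoist_invariants(preheader, body)
```

Only `r3 = r2 + 1` moves. The load stays because the body contains a store that might alias it, and `r1 = r1 + r3` stays because r1 is defined in the loop. A fuller pass would iterate, since hoisting one operation can make its users invariant.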

Page 15: Classical Optimization

Global Variable Migration

Goal: assign a global variable to a register for the entire duration of a loop

Rules:
1. X is a load or store to M[x]
2. Address x of M[x] not modified in loop
3. Replace all M[x] in loop by new register rx
4. Add rx = M[x] to preheader
5. Add M[x] = rx to each loop exit
6. Memory disambiguation is required: all mem ops in loop whose address can equal x must use same address x

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r4 = M[r5]
           r4 = r4 + 1
    block: r8 = M[r5]
           r7 = r8 * r4
    block: M[r5] = r4
    block: M[r5] = r7
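To see the effect of rules 3 to 5, the following sketch models memory as a Python dict and compares a loop before and after migration. The loop itself is a made-up single-block example, not the slide's CFG, and the function names are hypothetical.

```python
def loop_with_memory(M, r5, trips):
    """Every iteration loads from and stores to M[r5]."""
    for _ in range(trips):
        r4 = M[r5]
        r4 = r4 + 1
        M[r5] = r4
    return M

def loop_migrated(M, r5, trips):
    """Same loop with M[r5] kept in register rx for the whole loop."""
    rx = M[r5]            # rule 4: rx = M[x] in the preheader
    for _ in range(trips):
        r4 = rx           # rule 3: M[x] replaced by rx
        r4 = r4 + 1
        rx = r4
    M[r5] = rx            # rule 5: M[x] = rx at the loop exit
    return M

before = loop_with_memory({100: 7}, 100, 5)
after = loop_migrated({100: 7}, 100, 5)
```

Both versions leave the same value in memory, but the migrated loop performs one load and one store in total instead of one of each per iteration. The equivalence depends on rule 6: no other memory operation in the loop may touch address 100 under a different name.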

Page 16: Classical Optimization

Loop Strength Reduction (1)

Goal: create basic IVs from derived IVs

Rules:
1. X is a *, <<, +, or - operation
2. src1(X) is a basic IV
3. src2(X) is invariant
4. No other ops modify dest(X)
5. dest(X) != src(X) for all srcs
6. dest(X) is a register

Example loop (blocks only; the figure also labels the preheader and the header):
    block: r5 = r4 - 3
           r4 = r4 + 1
    block: r7 = r4 * r9
    block: r6 = r4 << 2

For X: r7 = r4 * r9
    src1(X) = r4
    src2(X) = r9
    dest(X) = r7

Basic IV r4 has triple (r4, 1, ?)

Page 17: Classical Optimization

Loop Strength Reduction (2)

Transformation:
1. Insert into the bottom of the preheader: new_reg = RHS(X)
2. If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
3. Else new_inc = inc(src1(X))
4. Insert at each update of src1(X): new_reg += new_inc
5. Change X by: dest(X) = new_reg

Example after transforming r7 = r4 * r9 (blocks only):
    preheader: r1 = r4 * r9
               r2 = 1 * r9
    block: r5 = r4 - 3
           r4 = r4 + 1
           r1 = r1 + r2
    block: r7 = r1
    block: r6 = r4 << 2

Exercise: apply strength reduction to r5 and r6
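The transformation of `r7 = r4 * r9` can be checked with a small simulation that records the values r7 takes in each iteration, before and after. The function names and the trip-count parameter are assumptions made for the illustration; the register names follow the slides.

```python
def original_loop(r4, r9, trips):
    """Derived IV r7 is recomputed with a multiply in every iteration."""
    r7_values = []
    for _ in range(trips):
        r4 = r4 + 1
        r7 = r4 * r9
        r7_values.append(r7)
    return r7_values

def reduced_loop(r4, r9, trips):
    """Same loop after strength reduction: the multiply moves to the preheader."""
    r1 = r4 * r9          # preheader: new_reg = RHS(X)
    r2 = 1 * r9           # preheader: new_inc = inc(r4) * src2(X)
    r7_values = []
    for _ in range(trips):
        r4 = r4 + 1
        r1 = r1 + r2      # inserted at the update of r4
        r7 = r1           # X becomes a move
        r7_values.append(r7)
    return r7_values

before = original_loop(3, 5, 4)
after = reduced_loop(3, 5, 4)
```

Both loops produce the same sequence of r7 values, while the reduced loop performs only additions inside the loop.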

Page 18: Classical Optimization

IV Elimination (1)

Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV

Rules for IVs with same increment and initial value:
1. Find two basic IVs x and y
2. x and y are in the same family and have the same increment and initial values
3. Incremented at same place
4. x is not live at loop exit
5. For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y
6. Replace uses of x with y

Example CFG (blocks only; the edges are not preserved in this transcript):
    block: r1 = r1 - 1
           r2 = r2 - 1
    block: r9 = r2 + r4
    block: r7 = r1 * r9
    block: r4 = M[r1]
    block: M[r2] = r7
    block: r1 = 0
           r2 = 0

Exercise: apply IV elimination

Page 19: Classical Optimization

IV Elimination (2)

Many variants, from simple to complex:
1. Trivial cases: IV variable that is never used except by the increment operations and is not live at loop exit
2. IVs with same increment and same initial value
3. IVs with same increment whose initial values are a known constant offset from each other
4. IVs with same increment, but initial values unknown
5. IVs with different increments and no info on initial values

Methods 1 and 2 are virtually free, so always applied
Methods 3 to 5 require preheader operations

Page 20: Classical Optimization

IV Elimination (3)

Example for method 4

Before:
    preheader: r1 = ?
               r2 = ?
    loop:      r3 = M[r1+4]
               r4 = M[r2+8]
               …
               r1 = r1 + 4
               r2 = r2 + 4

After:
    preheader: r1 = ?
               r2 = ?
               r5 = r2-r1+8
    loop:      r3 = M[r1+4]
               r4 = M[r1+r5]
               …
               r1 = r1 + 4
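That the rewritten loop touches the same addresses can be checked with a short simulation. The function names, the trip-count parameter, and the starting values are assumptions; the loads are modeled only by the addresses they would access.

```python
def addresses_before(r1, r2, trips):
    """Addresses loaded per iteration with both IVs r1 and r2 in the loop."""
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r2 + 8))   # r3 = M[r1+4]; r4 = M[r2+8]
        r1 = r1 + 4
        r2 = r2 + 4
    return trace

def addresses_after(r1, r2, trips):
    """Same loop with r2 eliminated; its offset from r1 is computed once."""
    r5 = r2 - r1 + 8                     # preheader operation
    trace = []
    for _ in range(trips):
        trace.append((r1 + 4, r1 + r5))  # r3 = M[r1+4]; r4 = M[r1+r5]
        r1 = r1 + 4
    return trace

before = addresses_before(100, 260, 3)
after = addresses_after(100, 260, 3)
```

Because both IVs advance by 4 per iteration, the difference r2 - r1 stays constant, so `r1 + r5` equals `r2 + 8` in every iteration even though the initial values are unknown at compile time.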