classical optimization
1
Classical Optimization
Types of classical optimizations:
Operation level: one operation in isolation
Local: optimize pairs of operations in the same basic block (with or without dataflow analysis), e.g. peephole optimization
Global: optimize pairs of operations spanning multiple basic blocks, which requires dataflow analysis, e.g. reaching definitions, UD/DU chains, or SSA form
Loop: optimize the loop body and nested loops
2
Local Constant Folding
Goal: eliminate unnecessary operations
Rules:
1. X is an arithmetic operation
2. If src1(X) and src2(X) are constant, then change X by applying the operation
Example:
r7 = 4 + 1    (X, with src1(X) = 4 and src2(X) = 1)
r5 = 2 * r4
r6 = r5 * 2
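A minimal Python sketch of the folding rule above. It assumes a hypothetical tuple encoding (dest, opcode, src1, src2) in which integers are literals and strings are register names; the encoding and the name fold_constants are illustrative and do not come from the slides.

```python
import operator

# Assumed encoding: (dest, opcode, src1, src2); ints are literals, strings are registers.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """Replace each arithmetic op whose two sources are literals by a move of the result."""
    out = []
    for dest, op, s1, s2 in block:
        if op in ARITH and isinstance(s1, int) and isinstance(s2, int):
            out.append((dest, "mov", ARITH[op](s1, s2), None))
        else:
            out.append((dest, op, s1, s2))
    return out

block = [("r7", "+", 4, 1), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
folded = fold_constants(block)
```

On the example block only r7 = 4 + 1 qualifies; the other two operations each have a register source and are left unchanged.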
3
Local Constant Combining
Goal: eliminate unnecessary operations
First operation often becomes dead
Rules:
1. Operations X and Y in same basic block
2. X and Y have at least one literal src
3. Y uses dest(X)
4. None of the srcs of X have defs between X and Y (excluding Y)
Example:
r7 = 5
r5 = 2 * r4
r6 = r5 * 2    becomes    r6 = r4 * 4
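The rules above can be sketched in Python for the multiply-by-literal case only. The tuple encoding (dest, opcode, src1, src2) and the helper names split_literal and combine_constants are assumptions made for illustration. The sketch conservatively also refuses to combine when X overwrites its own source register.

```python
def split_literal(s1, s2):
    """Return (register, literal) if exactly one source is an int literal, else None."""
    if isinstance(s1, str) and isinstance(s2, int):
        return s1, s2
    if isinstance(s1, int) and isinstance(s2, str):
        return s2, s1
    return None

def combine_constants(block):
    """Rewrite Y = dest(X) * c2 as Y = reg * (c1 * c2) when X is dest(X) = reg * c1."""
    out = list(block)
    for j, (dy, opy, y1, y2) in enumerate(out):
        y = split_literal(y1, y2)
        if opy != "*" or y is None:
            continue
        y_reg, y_lit = y
        # Nearest earlier def of the register that Y uses (rule 3).
        i = next((k for k in range(j - 1, -1, -1) if out[k][0] == y_reg), None)
        if i is None:
            continue
        dx, opx, x1, x2 = out[i]
        x = split_literal(x1, x2)
        if opx != "*" or x is None:
            continue
        x_reg, x_lit = x
        # Rule 4: the register source of X must not be redefined before Y.
        if any(out[k][0] == x_reg for k in range(i, j)):
            continue
        out[j] = (dy, "*", x_reg, x_lit * y_lit)
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
combined = combine_constants(block)
```

After the rewrite r6 no longer reads r5, so r5 = 2 * r4 becomes dead if it has no other use.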
4
Local Strength Reduction
Goal: replace expensive operations with cheaper ones
Rules (example):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X by using a shift operation
3. For k=1, an add can be used instead
Example:
r7 = 5
r5 = 2 * r4    becomes    r5 = r4 + r4
r6 = r4 * 4    becomes    r6 = r4 << 2
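A small Python sketch of the strength reduction rule above, assuming a hypothetical tuple encoding (dest, opcode, src1, src2) with int literals and string register names. The function name reduce_strength is illustrative. Multiplication by 1 is left alone because the slide does not cover it.

```python
def reduce_strength(block):
    """Turn reg * 2^k into an add (k = 1) or a left shift (k >= 2)."""
    out = []
    for dest, op, s1, s2 in block:
        # Identify which source is the literal, if any.
        reg, lit = (s1, s2) if isinstance(s2, int) else (s2, s1)
        is_pow2 = isinstance(lit, int) and lit >= 2 and lit & (lit - 1) == 0
        if op == "*" and isinstance(reg, str) and is_pow2:
            k = lit.bit_length() - 1          # lit == 2**k
            if k == 1:
                out.append((dest, "+", reg, reg))
            else:
                out.append((dest, "<<", reg, k))
        else:
            out.append((dest, op, s1, s2))
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r4", 4)]
reduced = reduce_strength(block)
```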
5
Local Constant Propagation
Goal: replace register uses with literals (constants) in a single basic block
Rules:
1. Operation X is a move to register with src1(X) literal
2. Operation Y uses dest(X)
3. There is no def of dest(X) between X and Y (excluding defs at X and Y)
4. Replace dest(X) in Y with src1(X)
Example:
r1 = 5
r2 = _x
r3 = 7
r4 = r4 + r1
r1 = r1 + r2
r1 = r1 + 1
r3 = 12
r8 = r1 - r2
r9 = r3 + r5
r3 = r2 + 1
r7 = r3 - r1
M[r7] = 0
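One way to sketch the local propagation rules above in Python is a single forward pass that remembers which registers currently hold a literal. The tuple encoding (dest, opcode, src1, src2) and the name propagate_constants are assumptions for illustration, and the block is a shortened version of the example (the label literal _x and the store are left out to keep the encoding simple).

```python
def propagate_constants(block):
    """Forward pass over one basic block, replacing register uses by known literals."""
    known = {}   # register -> literal it currently holds
    out = []
    for dest, op, s1, s2 in block:
        # Rule 4: substitute known literals into the sources of Y.
        s1 = known.get(s1, s1)
        s2 = known.get(s2, s2)
        out.append((dest, op, s1, s2))
        # Rule 3: any def of dest invalidates the old fact about it.
        known.pop(dest, None)
        # Rule 1: a move of a literal establishes a new fact.
        if op == "mov" and isinstance(s1, int):
            known[dest] = s1
    return out

block = [
    ("r1", "mov", 5, None),
    ("r3", "mov", 7, None),
    ("r4", "+", "r4", "r1"),
    ("r1", "+", "r1", "r2"),
    ("r3", "mov", 12, None),
    ("r9", "+", "r3", "r5"),
]
propagated = propagate_constants(block)
```

The use of r3 in the last operation receives 12 rather than 7, because the second move to r3 kills the first.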
6
Local Common Subexpression Elimination (CSE)
Goal: eliminate recomputations of an expression
More efficient code
Resulting moves can get copy propagated (see later)
Rules:
1. Operations X and Y have the same opcode and Y follows X
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src between X and Y (excluding Y)
4. No def of dest(X) between X and Y (excluding X and Y)
5. Replace Y with move dest(Y) = dest(X)
Example:
r1 = r2 + r3
r4 = r4 + 1
r1 = 6
r6 = r2 + r3
r2 = r1 - 1
r5 = r4 + 1
r7 = r2 + r3
r5 = r1 - 1
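A possible Python sketch of the local CSE rules above, using a table of available expressions. The tuple encoding (dest, opcode, src1, src2) and the name local_cse are assumptions for illustration. Commutativity is ignored, so r2 + r3 and r3 + r2 are treated as different expressions.

```python
def local_cse(block):
    """Replace a recomputed expression by a move from the register that holds it."""
    avail = {}   # (opcode, src1, src2) -> register currently holding that value
    out = []
    for dest, op, s1, s2 in block:
        key = (op, s1, s2)
        if op != "mov" and key in avail:
            out.append((dest, "mov", avail[key], None))      # rule 5
        else:
            out.append((dest, op, s1, s2))
        # A def of dest kills expressions that read dest (rule 3)
        # and expressions whose value was held in dest (rule 4).
        avail = {k: r for k, r in avail.items()
                 if dest not in k[1:] and r != dest}
        # Record the expression unless the op overwrote one of its own sources.
        if op != "mov" and dest not in (s1, s2):
            avail.setdefault(key, dest)
    return out

block = [
    ("r1", "+", "r2", "r3"), ("r4", "+", "r4", 1), ("r1", "mov", 6, None),
    ("r6", "+", "r2", "r3"), ("r2", "-", "r1", 1), ("r5", "+", "r4", 1),
    ("r7", "+", "r2", "r3"), ("r5", "-", "r1", 1),
]
optimized = local_cse(block)
```

On the example block only the last operation is rewritten, to r5 = r2. The other repeated expressions are blocked by an intervening def of dest(X) or of a source.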
7
Dead Code Elimination
Goal: eliminate any operation whose result is never used
Rules (dataflow required):
1. X is an operation with no use in DU chain, i.e. dest(X) is not live
2. Delete X if removable (not a mem store or branch)
Rules too simple!
Misses deletion of r4, even after deleting r7, since r4 is live in loop
Better is to trace UD chains backwards from “critical” operations
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r4 = r4 + 1; r7 = r1 * r4
r3 = r3 + 1
r2 = 0
r3 = r2 + r1
M[r1] = r3
r1 = 3; r2 = 10
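The backward trace from critical operations can be sketched in Python as a mark-and-sweep pass. This is a simplified, flow-insensitive stand-in for real UD chains: every def of a needed register is kept, wherever it sits in the control-flow graph. The encoding (dest, opcode, srcs) and the name eliminate_dead_code are assumptions for illustration.

```python
CRITICAL = {"store", "branch"}   # operations that must never be deleted

def eliminate_dead_code(ops):
    """Keep critical ops and every op that may feed them; drop the rest."""
    live = {i for i, (_, opc, _) in enumerate(ops) if opc in CRITICAL}
    work = list(live)
    while work:
        i = work.pop()
        for src in ops[i][2]:
            # Flow-insensitive approximation of a UD chain: all defs of src.
            for j, (dest, _, _) in enumerate(ops):
                if dest == src and j not in live:
                    live.add(j)
                    work.append(j)
    return [op for i, op in enumerate(ops) if i in live]

ops = [
    ("r1", "mov", (3,)), ("r2", "mov", (10,)),
    ("r4", "+", ("r4", 1)), ("r7", "*", ("r1", "r4")),
    ("r3", "+", ("r3", 1)), ("r2", "mov", (0,)),
    ("r3", "+", ("r2", "r1")),
    (None, "store", ("r1", "r3")),
]
kept = eliminate_dead_code(ops)
```

Because nothing reachable from the store reads r7 or r4, both operations are removed, including the self-referencing r4 = r4 + 1 that a liveness-only rule would keep.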
8
Local Backward Copy Propagation
Goal: propagate LHS of moves backward
Eliminates useless moves
Rules (dataflow required):
1. X and Y in same block
2. Y is a move to register
3. dest(X) is a register that is not live out of the block
4. Y uses dest(X)
5. dest(Y) not used or defined between X and Y (excluding X and Y)
6. No uses of dest(X) after the first redef of dest(Y)
7. Replace src(Y) on path from X to Y with dest(Y) and remove Y
Example:
r1 = r8 + r9
r2 = r9 + r1
r4 = r2
r6 = r2 + 1
r9 = r1
r7 = r6
r5 = r6 + 1
r4 = 0
r8 = r2 + r7
9
Global Constant Propagation
Goal: globally replace register uses with literals
Rules (dataflow required):
1. X is a move to a register with src1(X) literal
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. Replace dest(X) in Y with src1(X)
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r5 = 2; r7 = r1 * r5
r3 = r3 + r5
r2 = 0
r3 = r2 + r1; r6 = r7 * r4
M[r1] = r3
r1 = 4; r2 = 10
10
Global Constant Propagation with SSA
Goal: globally replace register uses with literals
Rules (high level):
1. For operation X with a register src(X)
2. Find def of src(X) in chain
3. If def is move of literal, src(X) is constant: done
4. If RHS of def is an operation, including a φ node, recurse on all srcs
5. Apply rule for operation to determine whether src(X) is constant
6. Note: abstract values ⊤ (top) and ⊥ (bottom) are often used to indicate unknown values
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r5 = 2; r7 = r1 * r5
r3 = r3 + r5
r2 = 0
r3 = r2 + r1; r6 = r7 * r4
M[r1] = r3
r1 = 4; r2 = 10
Exercise: compute SSA form and propagate constants
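A hedged Python sketch of constant evaluation over SSA definitions, using the top and bottom values mentioned in the note. It assumes one common convention: TOP means "not yet known" and BOTTOM means "not a constant". The SSA names below (r5_1, r2_3, and so on) form a small hypothetical program loosely modeled on the slide's registers; they are not the solution to the exercise. The sketch iterates to a fixed point instead of recursing, which also handles φ nodes in loops.

```python
import operator

TOP, BOTTOM = "TOP", "BOTTOM"   # assumed meaning: not yet known / not constant
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def meet(a, b):
    """Combine two lattice values arriving at a phi node."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def evaluate(defn, val):
    kind, args = defn[0], defn[1:]
    if kind == "const":
        return args[0]
    # Names without a def (e.g. inputs) are treated as not constant.
    args = [val.get(a, BOTTOM) if isinstance(a, str) else a for a in args]
    if kind == "phi":
        result = TOP
        for a in args:
            result = meet(result, a)
        return result
    if BOTTOM in args:
        return BOTTOM
    if TOP in args:
        return TOP
    return OPS[kind](*args)

def propagate(defs):
    """Iterate over all SSA defs until no lattice value changes."""
    val = {name: TOP for name in defs}
    changed = True
    while changed:
        changed = False
        for name, defn in defs.items():
            new = evaluate(defn, val)
            if new != val[name]:
                val[name], changed = new, True
    return val

defs = {
    "r5_1": ("const", 2), "r1_1": ("const", 4),
    "r7_1": ("*", "r1_1", "r5_1"),
    "r2_1": ("const", 10), "r2_2": ("const", 0),
    "r2_3": ("phi", "r2_1", "r2_2"),
    "r3_1": ("+", "r2_3", "r1_1"),
    "r6_1": ("*", "r7_1", "r4_0"),   # r4_0 has no def in this fragment
}
values = propagate(defs)
```

Here r7_1 evaluates to the constant 8, while r2_3 merges 10 and 0 at the φ node and therefore becomes BOTTOM, as does everything computed from it.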
11
Forward Copy Propagation
Goal: globally propagate RHS of moves forward
Reduces dependence chain
May be possible to eliminate moves
Rules (dataflow required):
1. X is a move with src1(X) register
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. src1(X) has no def on any path from X to Y
5. Replace dest(X) in Y with src1(X)
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r1 = r2; r3 = r4
r6 = r3 + 1
r2 = 0
r5 = r2 + r3
12
Global Common Subexpression Elimination (CSE)
Goal: eliminate recomputations of an expression
Rules:
1. X and Y have the same opcode and X dominates Y
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X for new register rx
5. Replace Y with move dest(Y) = rx
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r1 = r2 * r6; r3 = r4 / r7
r2 = r2 + 1
r1 = r3 * 7
r5 = r2 * r6; r8 = r4 / r7
r9 = r3 * 7
13
Loop Optimizations
Loops are the most important target for optimization
Programs spend much time in loops
Loop optimizations:
Invariant code removal (a.k.a. code motion)
Global variable migration
Induction variable strength reduction
Induction variable elimination
14
Code Motion
Goal: move loop-invariant computations to preheader
Rules:
1. Operation X in block that dominates all exit blocks
2. X is the only operation to modify dest(X) in loop body
3. All srcs of X have no defs in any of the basic blocks in the loop body
4. Move X to end of preheader
5. Note 1: if one src of X is a memory load, need to check for stores in loop body
6. Note 2: X must be movable and not cause exceptions
Example (figure with blocks labeled preheader and header; operations grouped as in the figure):
r4 = M[r5]; r7 = r4 * 3
r8 = r2 + 1; r7 = r8 * r4
r3 = r2 + 1
r1 = r1 + r7
M[r1] = r3
r1 = 0    (preheader)
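A rough Python sketch of rules 2 to 4 and the two notes, under strong simplifying assumptions: the loop body is a single straight-line block (so rule 1 holds trivially), any store in the loop blocks the hoisting of every load (no memory disambiguation), and non-memory operations are assumed not to trap. The encoding (dest, opcode, srcs) and the name hoist_invariants are illustrative, and the body is a shortened, straight-line version of the slide's example. The sketch does not check whether dest(X) is used in the loop before X.

```python
def hoist_invariants(preheader, body):
    """Move loop-invariant ops from a straight-line loop body to the preheader."""
    defs = [dest for dest, _, _ in body if dest is not None]
    has_store = any(opc == "store" for _, opc, _ in body)
    hoisted, kept = [], []
    for op in body:
        dest, opc, srcs = op
        only_def = dest is not None and defs.count(dest) == 1          # rule 2
        invariant = all(s not in defs for s in srcs)                   # rule 3
        safe = opc != "store" and not (opc == "load" and has_store)    # notes 1 and 2
        (hoisted if only_def and invariant and safe else kept).append(op)
    return preheader + hoisted, kept                                   # rule 4

preheader = [("r1", "mov", (0,))]
body = [
    ("r4", "load", ("r5",)),
    ("r7", "*", ("r4", 3)),
    ("r3", "+", ("r2", 1)),
    ("r1", "+", ("r1", "r7")),
    (None, "store", ("r1", "r3")),
]
new_preheader, new_body = hoist_invariants(preheader, body)
```

Only r3 = r2 + 1 moves: the load of M[r5] is held back by the store in the loop, and r7 and r1 depend on registers defined inside the loop.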
15
Global Variable Migration
Goal: assign a global variable to a register for the entire duration of a loop
Rules:
1. X is a load or store to M[x]
2. Address x of M[x] not modified in loop
3. Replace all M[x] in loop by new register rx
4. Add rx = M[x] to preheader
5. Add M[x] = rx to each loop exit
6. Memory disambiguation is required: all mem ops in loop whose address can equal x must use same address x
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r4 = M[r5]; r4 = r4 + 1
r8 = M[r5]; r7 = r8 * r4
M[r5] = r4
M[r5] = r7
16
Loop Strength Reduction (1)
Goal: create basic IVs from derived IVs
Rules:
1. X is a *, <<, +, or - operation
2. src1(X) is a basic IV
3. src2(X) is invariant
4. No other ops modify dest(X)
5. dest(X) != src(X) for all srcs
6. dest(X) is a register
Example (loop from the slide figure; the figure also labels the preheader and header blocks):
r5 = r4 - 3; r4 = r4 + 1
r7 = r4 * r9    (X, with src1(X) = r4, src2(X) = r9, dest(X) = r7)
r6 = r4 << 2
Basic IV r4 has triple (r4, 1, ?)
17
Loop Strength Reduction (2)
Transformation:
1. Insert into the bottom of the preheader: new_reg = RHS(X)
2. If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
3. Else new_inc = inc(src1(X))
4. Insert at each update of src1(X): new_reg += new_inc
5. Change X by: dest(X) = new_reg
Example (after strength reduction of r7, operations grouped as in the figure):
r5 = r4 - 3; r4 = r4 + 1; r1 = r1 + r2
r7 = r1
r6 = r4 << 2
r1 = r4 * r9; r2 = 1 * r9    (preheader)
Exercise: apply strength reduction to r5 and r6
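To see why the transformation preserves the value of r7, the following Python sketch simulates the loop before and after strength reduction, keeping only the operations that affect r7. The function names and the iteration count n are assumptions for illustration; the register names follow the slide.

```python
def loop_original(r4, r9, n):
    """r7 = r4 * r9 recomputed with a multiply in every iteration."""
    trace = []
    for _ in range(n):
        r4 = r4 + 1
        r7 = r4 * r9
        trace.append(r7)
    return trace

def loop_reduced(r4, r9, n):
    """Same loop after strength reduction of r7."""
    r1 = r4 * r9      # preheader: new_reg = RHS(X)
    r2 = 1 * r9       # preheader: new_inc = inc(r4) * r9
    trace = []
    for _ in range(n):
        r4 = r4 + 1
        r1 = r1 + r2  # inserted at the update of r4
        r7 = r1       # X becomes a move
        trace.append(r7)
    return trace

before = loop_original(0, 5, 4)
after = loop_reduced(0, 5, 4)
```

Both versions produce the same sequence of r7 values; the multiply now executes once in the preheader instead of once per iteration.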
18
IV Elimination (1)
Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV
Rules for IVs with same increment and initial value:
1. Find two basic IVs x and y
2. x and y are in the same family and have the same increment and initial values
3. x and y are incremented at the same place
4. x is not live at loop exit
5. For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y
6. Replace uses of x with y
Example (operations from the slide's control-flow graph figure, grouped as in the figure):
r1 = r1 - 1; r2 = r2 - 1
r9 = r2 + r4
r7 = r1 * r9
r4 = M[r1]
M[r2] = r7
r1 = 0; r2 = 0
Exercise: apply IV elimination
19
IV Elimination (2)
Many variants, from simple to complex:
1. Trivial cases: IV that is never used except by the increment operations and is not live at loop exit
2. IVs with same increment and same initial value
3. IVs with same increment whose initial values are a known constant offset from each other
4. IVs with same increment, but initial values unknown
5. IVs with different increments and no info on initial values
Methods 1 and 2 are virtually free, so they are always applied
Methods 3 to 5 require preheader operations
20
IV Elimination (3)
Example for method 4
Before, loop body:
r3 = M[r1+4]
r4 = M[r2+8]
…
r1 = r1 + 4
r2 = r2 + 4
Before, initial values:
r1 = ?; r2 = ?
After, loop body:
r3 = M[r1+4]
r4 = M[r1+r5]
…
r1 = r1 + 4
After, initial values:
r1 = ?; r2 = ?; r5 = r2-r1+8
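The method 4 example can be checked with a short Python simulation that records the two load addresses in each iteration, before and after r2 is eliminated. The function names, the iteration count n, and the sample initial values are assumptions for illustration.

```python
def addresses_before(r1, r2, n):
    """Addresses of the two loads when both IVs r1 and r2 are kept."""
    trace = []
    for _ in range(n):
        trace.append((r1 + 4, r2 + 8))
        r1 = r1 + 4
        r2 = r2 + 4
    return trace

def addresses_after(r1, r2, n):
    """Addresses of the two loads after r2 is replaced by r1 + r5."""
    r5 = r2 - r1 + 8          # computed once, before the loop
    trace = []
    for _ in range(n):
        trace.append((r1 + 4, r1 + r5))
        r1 = r1 + 4
    return trace

same = addresses_before(100, 260, 5) == addresses_after(100, 260, 5)
```

Since both IVs advance by 4, the difference r2 - r1 never changes, so r1 + r5 always equals r2 + 8. The cost is one extra subtraction and addition outside the loop in exchange for one fewer increment inside it.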