PSU CS322 HM 1
Languages and Compiler Design II
IR Code Optimization
Material provided by Prof. Jingke Li
Stolen with pride and modified by Herb Mayer
PSU Spring 2010
rev.: 4/16/2010
Agenda
• IR Optimization
• Redundancy Elimination
• Sample: CSE
• Partial Redundancy Elimination (PRE)
• Copy Propagation
• Value Numbering
• Loop Invariant Code Motion
• Counter Examples
• Strength Reduction
• Induction Variable (IV) Elimination
IR Optimization
• Definition: Optimization is the translation of an original program P1 into a semantically equivalent program P2 with better properties
• “Better” depends on the project. Possibilities include code compactness, execution speed, numeric precision, and others
IR Optimization
Optimizations transform a program into a functionally equivalent program with better performance. Transformations can be implemented at various stages and levels.
Advantages of IR-Level Optimization:
• IR operations are explicit, so cost estimates can be accurate
• IR optimizations are machine-independent, hence the results are portable across different target machines
Scopes of Optimization:
• Local: Transforming code by analyzing a single basic block
• Global: Transforming code by analyzing a whole subroutine
• Inter-Procedural: Transforming code by analyzing the whole program
Concepts and Techniques:
• Basic blocks & flow graphs
• Control-flow analysis & data-flow analysis
Redundancy Elimination
IR code optimization removes redundant computations. The following are specific examples:
• Common Subexpression Elimination (CSE): Based on lexical representation, applicable to global scope
• Partial Redundancy Elimination (PRE): More powerful than CSE
• Copy Propagation: Companion optimization to CSE
• Value Numbering (VN): Value based, single Basic Block
• Super-local Value Numbering: Extends VN to multiple blocks
• Loop Invariant Elimination: Moves code from frequently executed to rarely executed parts of the program
Common Subexpression Elimination (CSE)
• E is a common subexpression if it occurs at L1 and L2, was computed at L1, and no components received new values along the path to L2
• To achieve CSE, introduce a Temp to hold the subexpression when first evaluated; see the example from Quicksort():
The second occurrence of 4*i in BB (from Quicksort()) is a common subexpression; so is the second occurrence of 4*j

BB before CSE:
    t11 := 4*i
    x := a[t11]
    t12 := 4*i
    t13 := 4*j
    t14 := a[t13]
    a[t12] := t14
    t15 := 4*j
    a[t15] := x

BB’ after CSE:
    t11 := 4*i
    x := a[t11]
    t12 := t11
    t13 := 4*j
    t14 := a[t13]
    a[t12] := t14
    t15 := t13
    a[t15] := x

BB’’ after total CSE:
    t11 := 4*i
    x := a[t11]
    t13 := 4*j
    t14 := a[t13]
    a[t11] := t14
    a[t13] := x
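
The step from BB to BB’ can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's implementation: the tuple encoding (dest, op, a, b), the op name "copy", and the function name local_cse are all made up for illustration, and only pure arithmetic statements are handled (array loads and stores are left out).

```python
def local_cse(block):
    """Rewrite repeated expressions in one basic block as copies.

    block is a list of (dest, op, a, b) tuples, a hypothetical encoding of
    three-address statements, e.g. t11 := 4*i becomes ("t11", "*", 4, "i").
    """
    available = {}  # (op, a, b) -> variable that currently holds the value
    result = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            # Repeated occurrence: reuse the variable that holds the value.
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, a, b))
        # Assigning dest invalidates expressions that read dest and
        # entries whose holding variable is dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # Record the expression unless the statement overwrote its own operand.
        if dest not in (a, b):
            available.setdefault(key, dest)
    return result


block = [("t11", "*", 4, "i"), ("t12", "*", 4, "i"),
         ("t13", "*", 4, "j"), ("t15", "*", 4, "j")]
for stmt in local_cse(block):
    print(stmt)
```

The copies this produces (t12 := t11, t15 := t13) are what a later copy propagation pass would remove to reach BB’’.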
CSE Across BBs
CSE can eliminate redundant computation across Basic Blocks:

Before CSE:
BB1:
    i := j
    a := 4 * i
    if … goto BB3
BB2:
    i := j
    b := 4 * i
BB3:
    i := j
    c := 4 * i

After CSE:
BB1’:
    i := j
    temp := 4 * i
    a := temp
    if … goto BB3
BB2’:
    i := j
    b := temp
BB3’:
    i := j
    c := temp
Global CSE
Both 4*i in BB5 (and BB6) are CSEs ⇒ eliminate t6 and t11, t7, t12, replace with t2
4*j in BB5 and BB6 are CSEs ⇒ eliminate t10 and t15, replace with t8 and t13
Now a[t2] in BB5 and BB6 become CSEs ⇒ replace with t3

BB1:
    i := m-1
    j := n
    t1 := 4*n
    v := a[t1]
BB2:
    i := i+1
    t2 := 4*i
    t3 := a[t2]
    if t3 < v goto BB2
BB3:
    j := j-1
    t4 := 4*j
    t5 := a[t4]
    if t5 > v goto BB3
BB4:
    if i >= j goto BB6
BB5:
    t6 := 4*i
    x := a[t6]
    t7 := 4*i
    t8 := 4*j
    t9 := a[t8]
    a[t7] := t9
    t10 := 4*j
    a[t10] := x
    goto BB2
BB6:
    t11 := 4*i
    x := a[t11]
    t12 := 4*i
    t13 := 4*j
    t14 := a[t13]
    a[t12] := t14
    t15 := 4*j
    a[t15] := x
Global CSE
Flow graph after global CSE:

BB1:
    i := m-1
    j := n
    t1 := 4*n
    v := a[t1]
BB2:
    i := i+1
    t2 := 4*i
    t3 := a[t2]
    if t3 < v goto BB2
BB3:
    j := j-1
    t4 := 4*j
    t5 := a[t4]
    if t5 > v goto BB3
BB4:
    if i >= j goto BB6
BB5:
    x := t3
    a[t2] := t5
    a[t4] := x
    goto BB2
BB6:
    x := t3
    t14 := a[t1]
    a[t2] := t14
    a[t1] := x
CSE Algorithm
Available expressions: An expression x ⊕ y is available at node n if every path from the entry node to n evaluates the expression, and there are no definitions of x or y after the last evaluation
Algorithm:
1. Compute available expressions for all expressions.
2. At each node n : w := x ⊕ y, where the expression x ⊕ y is available, search backwards for the evaluations of x ⊕ y that reach n
3. Replace each evaluation v := x ⊕ y found in the search by t := x ⊕ y; v := t
4. Replace n by w := t
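
Step 1 is a forward data-flow problem. The following Python sketch computes the expressions available on entry to each basic block by iterating to a fixed point. It is a sketch under assumptions: the names available_expressions, blocks, preds and entry are invented here, statements use a hypothetical (dest, op, a, b) tuple encoding, and availability is tracked per block rather than per statement.

```python
def available_expressions(blocks, preds, entry):
    """Return {block name: set of (op, a, b) available on entry}.

    blocks: {name: [(dest, op, a, b), ...]}   (hypothetical encoding)
    preds:  {name: [predecessor block names]}
    """
    all_exprs = {(op, a, b)
                 for stmts in blocks.values()
                 for _dest, op, a, b in stmts}

    def transfer(stmts, avail):
        # Expressions available at the end of a block, given those at its start.
        avail = set(avail)
        for dest, op, a, b in stmts:
            avail.add((op, a, b))                             # evaluated here
            avail = {e for e in avail if dest not in e[1:]}   # dest redefined
        return avail

    # Optimistic start: everything is available except at the entry node.
    avail_in = {n: set() if n == entry else set(all_exprs) for n in blocks}
    changed = True
    while changed:
        changed = False
        for n in blocks:
            if n == entry:
                continue
            outs = [transfer(blocks[p], avail_in[p]) for p in preds[n]]
            new_in = set.intersection(*outs) if outs else set()
            if new_in != avail_in[n]:
                avail_in[n] = new_in
                changed = True
    return avail_in


# Diamond-shaped flow graph: B1 branches to B2 and B3, which join at B4.
blocks = {
    "B1": [("x", "+", "a", "b")],
    "B2": [("y", "*", "c", "d")],
    "B3": [("w", "-", "c", "d")],
    "B4": [("z", "+", "a", "b")],
}
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
print(available_expressions(blocks, preds, "B1")["B4"])
```

Because a + b is evaluated in B1 and neither a nor b is redefined on either branch, it is available at B4, so z := a + b is a candidate for steps 2 to 4.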
An Improved CSE Algorithm
The previous CSE algorithm performs the expensive backward search and inserts a new temp for every use of a common subexpression. The following ideas can improve the algorithm:
– Reduce number of new temps by assigning a unique name to each unique expression
– Avoid backward search by a separate traversal of the CFG
Algorithm:
1. Compute available expressions for all expressions
2. Initialize an array Name[ e ] = ø for all expressions
3. At each node n : w := x ⊕ y, where the expression x ⊕ y (denoted e below) is available:
   If Name[ e ] = ø, allocate new name t and set Name[ e ] = t;
   Else let t = Name[ e ];
   Replace n by w := t;
4. In a subsequent traversal of CFG, at each node v := e, if Name[ e ] != ø, let t = Name[ e ]; replace the node by t := e; v := t;
Yet Another CSE Algorithm
Ideas:
Create one temp for each unique expression.
Let subsequent pass eliminate unnecessary temps.
Algorithm:
1. Compute available expressions for all expressions.
2. At each evaluation of e:
   • Hash e to a name, t, in a table
   • Insert assignment t = e.
3. At a use of e where e is available:
   • Look up e’s name t in the hash table
   • Replace e with t.
Partial Redundancy Elimination (PRE)
An expression x ⊕ y is partially redundant at node n if some path from the entry node to n evaluates x ⊕ y, and there are no definitions of x or y after the last evaluation
PRE Optimization (it subsumes CSE):
• Discover partially redundant expressions
• Convert them to fully redundant expressions
• Remove redundancy, to reduce the overall number of computations at runtime
[Figure: flow graphs around node n (... = x ⊕ y), showing the evaluations of x ⊕ y on the paths into n before ⇒ after PRE]
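
The three PRE steps can be illustrated on a single if/else in Python. This is only an illustration under assumptions, not a PRE algorithm: the functions before_pre, after_pre and count_calls are invented for this sketch, and the operation x ⊕ y is passed in as a callable named add so that its evaluations can be counted.

```python
def before_pre(cond, x, y, add):
    if cond:
        a = add(x, y)      # x ⊕ y is evaluated on this path only
    else:
        a = 0
    b = add(x, y)          # node n: redundant only when cond was true
    return a, b


def after_pre(cond, x, y, add):
    if cond:
        t = add(x, y)
        a = t
    else:
        a = 0
        t = add(x, y)      # inserted: now every path into n evaluates x ⊕ y
    b = t                  # node n reuses the temp instead of recomputing
    return a, b


def count_calls(fn, cond):
    """Run fn and report its result plus how often x ⊕ y was evaluated."""
    calls = []

    def add(p, q):
        calls.append((p, q))
        return p + q

    return fn(cond, 3, 4, add), len(calls)


for cond in (True, False):
    print(cond, count_calls(before_pre, cond), count_calls(after_pre, cond))
```

Both versions return the same values; on the path where cond is true the transformed version evaluates x ⊕ y once instead of twice, and on the other path the count is unchanged.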
Copy Propagation
Copy statement has the form f := g
A large number of copy statements may be generated after performing
CSE optimizations. Copy propagation eliminates copy statements
by using g for f wherever possible
Before (BB5):
    t6 := 4*i
    x := a[t6]
    t7 := t6
    t8 := 4*j
    t9 := a[t8]
    a[t7] := t9
    t10 := t8
    a[t10] := x
    goto BB2
⇒
After (BB5’):
    t6 := 4*i
    x := a[t6]
    t8 := 4*j
    t9 := a[t8]
    a[t6] := t9
    a[t8] := x
    goto BB2
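
A block-local version of this transformation can be sketched in Python. The sketch rests on assumptions made for illustration: statements are (dest, op, a, b) tuples, a load x := a[t] is encoded as ("x", "load_a", "t", None), a store a[t] := v as (None, "store_a", "t", "v"), the goto is omitted, and the function name copy_propagate and its live_out parameter are hypothetical. A copy statement is dropped only when its target is not read later in the block and is not listed in live_out.

```python
def copy_propagate(block, live_out=()):
    """Use g for f after a copy f := g, then drop copies that became dead."""
    copies = {}          # f -> g for copies f := g that are still valid
    propagated = []
    for dest, op, a, b in block:
        # Replace operands by the variable they were copied from.
        a, b = copies.get(a, a), copies.get(b, b)
        # Redefining dest invalidates copies to or from dest.
        copies = {f: g for f, g in copies.items() if dest not in (f, g)}
        if op == "copy" and a != dest:
            copies[dest] = a
        propagated.append((dest, op, a, b))

    # Backward pass: remove copy statements whose target is never read.
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(propagated):
        if op == "copy" and dest not in live:
            continue
        live.discard(dest)
        live.update(v for v in (a, b) if isinstance(v, str))
        kept.append((dest, op, a, b))
    return kept[::-1]


bb5 = [
    ("t6", "*", 4, "i"),
    ("x", "load_a", "t6", None),
    ("t7", "copy", "t6", None),
    ("t8", "*", 4, "j"),
    ("t9", "load_a", "t8", None),
    (None, "store_a", "t7", "t9"),
    ("t10", "copy", "t8", None),
    (None, "store_a", "t10", "x"),
]
for stmt in copy_propagate(bb5):
    print(stmt)
```

On this input the two copies t7 := t6 and t10 := t8 disappear and the stores use t6 and t8 directly, matching BB5’.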
Cascading Problem
CSE transformations may have a cascading effect: more rounds of CSE/copy propagation may be needed before reaching the final form:

    x := b + c
    y := a + x
    u := b + c
    v := a + u
⇒
    x := b + c
    y := a + x
    u := x
    v := a + u
⇒
    x := b + c
    y := a + x
    v := a + x
⇒
    x := b + c
    y := a + x
    v := y
Value Numbering
• Each variable is assumed to have a unique initial value
• Each unique value is assigned a unique number
• An expression’s value is represented by a corresponding symbolic expression based on the operands’ numbers
• E.g. the value of expression x + y is 1+2, if 1 and 2 are the value numbers of x and y, respectively
• Each unique expression value is also assigned a unique number
• When a new variable or expression is encountered, check whether it has been assigned a number; if so, use the number, otherwise assign it a new number
• Use a hash table for efficient number lookup
Sample: Value Numbering
Value numbering uses a single round to calculate the effect
of cascaded optimizations
    x := b + c
    y := a + x
    u := b + c
    v := a + u

statement     var or expr     assigned #
x := b + c    b               1
              c               2
              b+c (1+2)       3
              x               3
y := a + x    a               4
              a+x (4+3)       5
              y               5
u := b + c    u (1+2)         3
v := a + u    v (4+3)         5
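
The numbering in the table can be reproduced with a small hash-table based sketch in Python. The function name value_number and the (dest, op, a, b) tuple encoding are assumptions of this sketch; it only assigns numbers and does not rewrite the code, and it treats a + x and x + a as different expressions.

```python
def value_number(block):
    """Return [(dest, value number)] for each statement of a basic block."""
    numbers = {}   # variable name or (op, vn_a, vn_b) -> value number
    counter = 0

    def number_of(key):
        # Look the key up; assign the next free number if it is new.
        nonlocal counter
        if key not in numbers:
            counter += 1
            numbers[key] = counter
        return numbers[key]

    result = []
    for dest, op, a, b in block:
        # The expression is identified by its operator and operand numbers.
        expr = (op, number_of(a), number_of(b))
        vn = number_of(expr)
        numbers[dest] = vn          # dest now holds that value
        result.append((dest, vn))
    return result


block = [("x", "+", "b", "c"), ("y", "+", "a", "x"),
         ("u", "+", "b", "c"), ("v", "+", "a", "u")]
print(value_number(block))   # [('x', 3), ('y', 5), ('u', 3), ('v', 5)]
```

Equal numbers for x and u, and for y and v, expose both redundancies in one pass, without separate rounds of CSE and copy propagation.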
Loop Invariant Code Motion
If a loop contains a statement t ← a ⊕ b such that a and b have the same values each time around the loop, then t will also have the same value each time. Hoist such a loop-invariant statement out of the loop.

Before:
BB1:
    t1 := 0
BB2:
    i := i+1
    t2 := a * b
    M[i] := t2
    if a < N goto BB3
BB3:
    x := t2
⇒
After:
BB1’:
    t1 := 0
    t2 := a * b
BB2’:
    i := i+1
    M[i] := t2
    if a < N goto BB3’
BB3’:
    x := t2
Loop Invariant Criteria
A statement S : t ← a1 ⊕ a2 is loop-invariant within loop L if, for each operand ai
1.) ai is a constant, or
2.) all definitions of ai that reach S are outside the loop, or
3.) only one definition of ai reaches S, and that definition is loop-invariant
An iterative algorithm can be used to find all loop-invariant statements
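
One possible shape of such an iterative algorithm is sketched below in Python. It is a simplification under loudly stated assumptions: it does not compute reaching definitions, but assumes that a variable defined exactly once inside the loop is reached only by that definition at its uses in the loop (as in SSA form), and that a variable not defined in the loop has all its definitions outside. Memory operations are never marked invariant. The function name and the (dest, op, a, b) tuple encoding are hypothetical.

```python
def loop_invariant_statements(loop):
    """Return the indices of loop-invariant statements in a loop body.

    loop is a list of (dest, op, a, b) tuples (a hypothetical encoding).
    Operands that are not strings count as constants.
    """
    defs_in_loop = {}   # variable -> indices of loop statements defining it
    for idx, (dest, _op, _a, _b) in enumerate(loop):
        defs_in_loop.setdefault(dest, []).append(idx)

    invariant = set()

    def operand_is_invariant(v):
        if not isinstance(v, str):
            return True                                  # 1.) constant
        defs = defs_in_loop.get(v, [])
        if not defs:
            return True                                  # 2.) defined outside
        return len(defs) == 1 and defs[0] in invariant   # 3.) one invariant def

    changed = True
    while changed:                      # repeat until no new statement is found
        changed = False
        for idx, (_dest, op, a, b) in enumerate(loop):
            if idx in invariant or op in ("load", "store"):
                continue                # memory operations are skipped
            if operand_is_invariant(a) and operand_is_invariant(b):
                invariant.add(idx)
                changed = True
    return sorted(invariant)


body = [
    ("i", "+", "i", 1),          # i := i + 1   (depends on itself)
    ("t2", "*", "a", "b"),       # t2 := a * b  (a, b defined outside)
    (None, "store", "i", "t2"),  # M[i] := t2
    ("t3", "+", "t2", 1),        # t3 := t2 + 1 (depends on invariant t2)
]
print(loop_invariant_statements(body))   # [1, 3]
```

Finding that a statement is invariant is only the first half; whether it may actually be hoisted needs further checks that this sketch does not perform.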
Strength Reduction (SR)
• Definition: Reduction in strength is the replacement of an operation by a cheaper one, e.g. replace * by + if feasible
• Do not make such changes in the source, e.g. do not replace j=2*k; with j=k+k; let optimizer do this
Before:
BB1:
    if i >= y goto BB3
BB2:
    Call func1
    j := 2 * k
    i := i + 1
    goto BB1
BB3:
    x := ...
⇒
After:
BB1:
    if i >= y goto BB3
BB2:
    Call func1
    j := k + k
    i++
    goto BB1
BB3:
    x := ...
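
The j := 2 * k rewrite can be expressed as a tiny peephole pass. The Python sketch below handles only this one pattern (multiplication by the constant 2); the function name reduce_strength and the (dest, op, a, b) tuple encoding are assumptions made for illustration, and a real optimizer would consult a cost model before deciding.

```python
def reduce_strength(block):
    """Replace multiplications by the constant 2 with an addition."""
    result = []
    for dest, op, a, b in block:
        if op == "*" and a == 2:
            op, a = "+", b          # 2 * b  becomes  b + b
        elif op == "*" and b == 2:
            op, b = "+", a          # a * 2  becomes  a + a
        result.append((dest, op, a, b))
    return result


bb2 = [("j", "*", 2, "k"), ("i", "+", "i", 1)]
print(reduce_strength(bb2))   # [('j', '+', 'k', 'k'), ('i', '+', 'i', 1)]
```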
Induction Variable Elimination (IVE)
• Definition: An Induction Variable (IV) is a variable iterating through a linear progression of values in a program section
• The program section is frequently a proper loop
• IVs are either fundamental or dependent on other IVs
• IV elimination reduces multiple IVs into fewer, thus saving operations
  – Since these operations are inside inner loops, savings can be significant
• After IVE other optimizations can be applied too, e.g. SR
Induction Variable Elimination, Cont’d
    integer a(100)     -- low bound is 1, not 0 like in C++ or Java, subtract!
    do i = 1, 100      -- OK for i to be undefined after loop
      a(i) = 2 * i     -- rhs deliberately not 4 * i, which would be easy: = IV
    enddo

Original flow graph:
BB0:
    t1 = 1                // i
BB1:
    if t1 > 100 goto BB3
BB2:
    t2 = 2 * t1
    t3 = 4 * t1
    t4 = t3 - 4
    t5 = A(a) + t4
    *t5 = t2
    t1 = t1 + 1
    goto BB1
BB3:
    After loop i undefined
⇒
With byte offset t0 as IV:
BB0’:
    t0 = 0                // IV
    t1 = 1                // i
BB1’:
    if t0 >= 400 goto BB3’
BB2’:
    t2 = 2 * t1
    t5 = A(a) + t0
    *t5 = t2
    t0 = t0 + 4
    goto BB1’
BB3’:
    After loop i undefined
⇒
With address t0 as IV:
BB0’’:
    t0 = A(a)             // IV
    t1 = 1                // i
BB1’’:
    if t0 >= A(a)+400 goto BB3’’
BB2’’:
    t2 = 2 * t1
    *t0 = t2
    t0 = t0 + 4
    goto BB1’’
BB3’’:
    After loop i undefined
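
The effect of replacing the index computation by a stepping address can be checked with a small Python simulation. Everything here is an assumption made for illustration: memory is modeled as a dict from byte address to value, BASE stands in for A(a) with an arbitrary value, and the function names are invented. One deliberate difference from the flow graphs above: the reduced loop also carries the stored value as its own stepping variable (t2 = t2 + 2) instead of recomputing 2 * t1, so that the counter t1 is not needed inside the loop at all.

```python
BASE = 1000   # hypothetical stand-in for A(a), the address of a(1)


def loop_original(n=100):
    mem = {}
    t1 = 1                      # i
    while not t1 > n:
        t2 = 2 * t1             # value to store
        t3 = 4 * t1             # byte offset for 4-byte elements
        t4 = t3 - 4             # subtract because the low bound is 1
        t5 = BASE + t4          # element address
        mem[t5] = t2            # *t5 = t2
        t1 = t1 + 1
    return mem


def loop_reduced(n=100):
    mem = {}
    t0 = BASE                   # address IV, replaces t3, t4 and t5
    t2 = 2                      # value IV, holds 2 * i for i = 1
    while not t0 >= BASE + 4 * n:
        mem[t0] = t2            # *t0 = t2
        t0 = t0 + 4
        t2 = t2 + 2             # strength-reduced form of 2 * i
    return mem


print(loop_original() == loop_reduced())
```

Both loops write the same 100 elements; the reduced loop does two additions per iteration where the original does two multiplications, a subtraction and two additions.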