CS502: Compiler Design
Code Optimization
Manas Thakur
Fall 2020
Manas Thakur CS502: Compiler Design 2
Fast. Faster. Fastest?
Lexical AnalyzerLexical Analyzer
Syntax AnalyzerSyntax Analyzer
Semantic AnalyzerSemantic Analyzer
Intermediate Code Generator
Intermediate Code Generator
Character stream
Token stream
Syntax tree
Syntax tree
Intermediaterepresentation
Machine-Independent Code Optimizer
Code GeneratorCode Generator
Target machine code
Intermediate representation
Machine-Dependent Code Optimizer
Target machine code
SymbolTable
F r
o n
t e
n d
B a
c k
e n
d
Role of Code Optimizer
● Make the program better
– time, memory, energy, ...
● No guarantees in this land!
– Will a particular optimization definitely improve something?
– Will performing an optimization adversely affect something else?
– In what order should the optimizations be performed?
– At what “scope” should a particular optimization be applied?
– Is the optimizer fast enough?
● Can an optimized program be optimized further?
Full employment theorem for compiler writers
● Statement: There is no fully optimizing compiler.
● Assume it exists:
– i.e., it transforms any program P into the smallest program Opt(P) that has the same behaviour as P.
– The halting problem comes to the rescue:
● The smallest program that never halts is:
      L1: goto L1
● Thus, a fully optimizing compiler could solve the halting problem: a program P never halts iff Opt(P) is "L1: goto L1"!
– But the halting problem is undecidable.
– Hence, a fully optimizing compiler can’t exist!
● Therefore we talk just about an optimizing compiler.
– and keep working without worrying about future prospects!
How to perform optimizations?
● Analysis
– Go over the program
– Identify some properties
● Potentially useful properties
● Transformation
– Use the information computed by the analysis to transform the program
● without affecting the semantics
● An example that we have (not literally) seen:
– Compute liveness information
– Delete assignments to variables that are dead
Classifying optimizations
● Based on scope:
– Local to basic blocks
– Intraprocedural
– Interprocedural
● Based on positioning:
– High-level (transform source code or high-level IR)
– Low-level (transform mid/low-level IR)
● Based on (in)dependence w.r.t. target machine:
– Machine independent (general enough)
– Machine dependent (specific to the architecture)
May versus Must information
● Consider the program:

      if (c) {
        a = ...
        b = ...
      } else {
        a = ...
        c = ...
      }

● Which variables may be assigned?
– a, b, c
● Which variables must be assigned?
– a
● May analysis:
– the computed information may hold in at least one execution of the program.
● Must analysis:
– the computed information must hold in every execution of the program.
Many many optimizations
● Constant folding, constant propagation, tail-call elimination, redundancy elimination, dead code elimination, loop-invariant code motion, loop splitting, loop fusion, strength reduction, array scalarization, inlining, synchronization elision, cloning, data prefetching, parallelization, etc.
● How do they interact?
– Optimist: we get the sum of all improvements.
– Realist: many are in direct opposition.
● Let us study some of them!
Constant propagation
● Idea:
– If the value of a variable is known to be a constant at compile-time, replace the use of the variable with the constant.
– Usually a very helpful optimization

      Before:                    After:
      n = 10;                    n = 10;
      c = 2;                     c = 2;
      for (i=0; i<n; ++i)        for (i=0; i<10; ++i)
        s = s + i * c;             s = s + i * 2;

– e.g., Can we now unroll the loop?
● Why is it good?
● Why could it be bad?
– When can we eliminate n and c themselves?
● Now you know how well different optimizations might interact!
Constant folding
● Idea:
– If the operands are known at compile-time, evaluate the expression at compile-time.

      r = 3.141 * 10;   =>   r = 31.41;

– What if the code was?

      PI = 3.141;
      r = PI * 10;

● Constant propagation (PI * 10 => 3.141 * 10) followed by constant folding still yields r = 31.41.
– And what now?

      PI = 3.141;
      r = PI * 10;
      d = 2 * r;

● Propagation and folding cascade: first r = 31.41, then d = 62.82.
● Evaluating parts of a program at compile-time using known values is called partial evaluation.
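The propagation-then-folding cascade above can be sketched in a few lines. This is a minimal illustration, not the course's algorithm: straight-line three-address code is represented as hypothetical (target, op, arg1, arg2) tuples, where op = None means a plain constant assignment.

```python
def fold_and_propagate(code):
    """One pass of constant propagation + folding over straight-line code."""
    env = {}   # variables currently known to hold compile-time constants
    out = []
    for target, op, a, b in code:
        a = env.get(a, a)   # constant propagation: substitute known values
        b = env.get(b, b)
        if op is None and isinstance(a, (int, float)):
            env[target] = a                        # plain constant assignment
            out.append((target, None, a, None))
        elif op is not None and isinstance(a, (int, float)) and isinstance(b, (int, float)):
            val = {'+': a + b, '*': a * b}[op]     # constant folding
            env[target] = val
            out.append((target, None, val, None))
        else:
            env.pop(target, None)                  # target no longer a known constant
            out.append((target, op, a, b))
    return out

# PI = 3.141; r = PI * 10; d = 2 * r  -- everything becomes a constant
code = [('PI', None, 3.141, None),
        ('r', '*', 'PI', 10),
        ('d', '*', 2, 'r')]
folded = fold_and_propagate(code)
```

Note how one optimization enables the next: propagating PI makes `r` foldable, and folding `r` makes `d` foldable.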
Common sub-expression elimination
● Idea:
– If the program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.

      Before:             After:
      a = b + c;          t = b + c;
      c = b + c;          a = t;
      d = b + c;          c = t;
                          d = b + c;
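A sketch of local (within one basic block) CSE: remember which variable currently holds each (op, arg1, arg2) expression, and invalidate entries whose operands are redefined. Unlike the slide, this sketch reuses the first variable that computed the expression instead of introducing a fresh temporary t; the tuple representation is hypothetical.

```python
def local_cse(block):
    """Reuse the variable that last computed an (op, a, b) expression."""
    avail = {}                     # (op, a, b) -> variable holding that value
    out = []
    for target, op, a, b in block:
        key = (op, a, b)
        if key in avail:
            out.append((target, 'copy', avail[key], None))   # reuse earlier value
        else:
            out.append((target, op, a, b))
        # redefining `target` kills expressions that use it or are held in it
        avail = {k: v for k, v in avail.items()
                 if target not in (k[1], k[2]) and v != target}
        if target not in (a, b):
            avail[key] = target
    return out

# a = b + c; c = b + c; d = b + c
block = [('a', '+', 'b', 'c'), ('c', '+', 'b', 'c'), ('d', '+', 'b', 'c')]
optimized = local_cse(block)
```

As on the slide, the third occurrence of b + c must be recomputed, because the assignment to c killed the remembered expression.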
Copy propagation
● Idea:
– After an assignment x = y, replace the uses of x with y.
– Can only apply up to another assignment to x, or another assignment to y!
– What if there was an assignment y = z earlier?
● Apply transitively to all assignments.

      Before:                After:
      x = y;                 x = y;
      if (x > 1)             if (y > 1)
        s = x + f(x);          s = y + f(y);
Dead-code elimination
● Idea:
– If the result of a computation is never used, remove the computation.
– Remove code that assigns to dead variables.
● The liveness analysis done before would help!
– This may, in turn, create more dead code.
● Dead-code elimination usually works transitively.

      Before:             After:
      x = y + 1;          y = 1;
      y = 1;              x = 2 * z;
      x = 2 * z;
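For straight-line code, dead-assignment elimination can be sketched as a single backward scan that maintains the live set, exactly the direction the liveness insight suggests. The tuple representation and the live-out set are assumptions for illustration.

```python
def eliminate_dead(block, live_out):
    """Backward scan: keep an assignment only if its target is live."""
    live = set(live_out)
    kept = []
    for target, op, a, b in reversed(block):
        if target in live:
            kept.append((target, op, a, b))
            live.discard(target)                                  # killed by this def
            live.update(v for v in (a, b) if isinstance(v, str))  # operands become live
        # else: target is dead at this point -- drop the assignment entirely
    return kept[::-1]

# x = y + 1; y = 1; x = 2 * z   with x and y live at the end of the block
block = [('x', '+', 'y', 1), ('y', None, 1, None), ('x', '*', 2, 'z')]
optimized = eliminate_dead(block, live_out={'x', 'y'})
```

The first assignment to x is dropped because x is redefined at the end of the block before any use.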
Unreachable-code elimination
● Idea:
– Eliminate code that can never be executed.
– High-level: look for if (false) or while (false)
● perhaps after constant folding!
– Low-level: more difficult
● Code is just labels and gotos
● Traverse the CFG, marking reachable blocks

      #define DEBUG 0
      if (DEBUG)
        print("Current value = ", v);
Next class
● Next class:
– How to perform the optimizations that we have seen using a dataflow analysis?
● Starting with:
– The back end’s full form of CFG (now a control-flow graph, not a context-free grammar)!
● Approximately only 10 more classes left.
– Hope this course is being successful in making (y)our hectic days a bit more exciting :-)
CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur
Fall 2020
Recall A2
● Is ‘a’ initialized in this program?

      int a;
      if (*) {
        a = 10;
      } else {
        // something that doesn’t touch ‘a’
      }
      x = a;

– Reality during run-time: depends (on the branch taken)
– What to tell at compile-time?
● Is this a ‘must’ question or a ‘may’ question?
● Correct answer: No
– How do we obtain such answers?
● Need to model the control-flow
Control-Flow Graph (CFG)
● Nodes represent instructions; edges represent flow of control
      a = 0
  L1: b = a + 1
      c = c + b
      a = b * 2
      if a < N goto L1
      return c

● As a CFG (one instruction per node; edges 1→2, 2→3, 3→4, 4→5, 5→2, 5→6):
  1: a = 0
  2: b = a + 1
  3: c = c + b
  4: a = b * 2
  5: a < N
  6: return c
Some CFG terminology
● pred[n] gives predecessors of n
– pred[1]? pred[4]? pred[2]?
● succ[n] gives successors of n
– succ[2]? succ[5]?
● def(n) gives variables defined by n
– def(3) = {c}
● use(n) gives variables used by n
– use(3) = {b, c}
(Nodes as before: 1: a = 0, 2: b = a + 1, 3: c = c + b, 4: a = b * 2, 5: a < N, 6: return c; edges 1→2, 2→3, 3→4, 4→5, 5→2, 5→6.)
Live ranges revisited
● A variable is live if its current value may be used in the future.
– Insight:
● work from future to past
● backward over the CFG
● Live ranges:
– a: {1->2, 4->5->2}
– b: {2->3, 3->4}
– c: all edges, even the entry edge into node 1 (c may be used at node 3 before ever being defined!)
Liveness
● A variable v is live on an edge if there is a directed path from that edge to a use of v that does not go through any def of v.
● A variable is live-in at a node if it is live on any of the in-edges of that node.
● A variable is live-out at a node if it is live on any of the out-edges of that node.
● Verify:
– a: {1->2, 4->5->2}
– b: {2->4}
Computation of liveness
● Say live-in of n is in[n], and live-out of n is out[n].
● We can compute in[n] and out[n] for any n as follows:
      in[n]  = use[n] ∪ (out[n] – def[n])
      out[n] = ∪ { in[s] : s ∈ succ[n] }

● These equations are called dataflow equations; the functions that compute in[n] and out[n] from neighbouring sets are called flow functions.
Liveness as an iterative dataflow analysis
      for each n                                       // initialize
          in[n] = {}; out[n] = {}
      repeat
          for each n
              in’[n] = in[n]; out’[n] = out[n]         // save previous values
              in[n]  = use[n] ∪ (out[n] – def[n])      // compute new values
              out[n] = ∪ { in[s] : s ∈ succ[n] }
      until in’[n] == in[n] and out’[n] == out[n] ∀n   // repeat till fixed point
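The algorithm is short enough to transcribe directly. Below is a Python sketch run on the six-node CFG from the earlier slides (node numbers and use/def sets as on the terminology slide). Note that c comes out live-in at node 1, i.e., possibly used uninitialized.

```python
# The six-node CFG from the earlier slides (5 -> 2 is the loop edge).
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use  = {1: set(), 2: {'a'}, 3: {'b', 'c'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

live_in  = {n: set() for n in succ}   # initialize
live_out = {n: set() for n in succ}
while True:
    prev_in  = {n: set(s) for n, s in live_in.items()}    # save previous values
    prev_out = {n: set(s) for n, s in live_out.items()}
    for n in succ:                                        # compute new values
        live_out[n] = set().union(*[live_in[s] for s in succ[n]])
        live_in[n]  = use[n] | (live_out[n] - defs[n])
    if prev_in == live_in and prev_out == live_out:       # repeat till fixed point
        break
```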
Liveness analysis example
Running the equations on the six-node CFG (1: a = 0, 2: b = a + 1, 3: c = c + b, 4: a = b * 2, 5: a < N, 6: return c) reaches this fixed point:

      node | use  | def | in   | out
      -----+------+-----+------+-----
        1  |      | a   | c    | a,c
        2  | a    | b   | a,c  | b,c
        3  | b,c  | c   | b,c  | b,c
        4  | b    | a   | b,c  | a,c
        5  | a    |     | a,c  | a,c
        6  | c    |     | c    |

      in[n]  = use[n] ∪ (out[n] – def[n])
      out[n] = ∪ { in[s] : s ∈ succ[n] }
In backward order
● Processing the nodes in backward order (6, 5, 4, 3, 2, 1) reaches the fixed point in only 3 iterations!
● Thus, the order of processing statements is important for efficiency.

      in[n]  = use[n] ∪ (out[n] – def[n])
      out[n] = ∪ { in[s] : s ∈ succ[n] }
Complexity of our liveness computation algorithm
● For an input program of size N:
– ≤ N nodes in the CFG and ≤ N variables
⇒ ≤ N elements per in/out set
⇒ O(N) time per set union
– The for loop performs a constant number of set operations per node
⇒ O(N²) time for the for loop
– Each iteration of the for loop can only add to each set (monotonicity)
– The sizes of all in and out sets sum to at most 2N², thus bounding the number of iterations of the repeat loop
⇒ worst-case complexity of O(N⁴)
– Much less in practice (usually O(N) or O(N²)) if nodes are ordered properly.

      repeat
          for each n
              in’[n] = in[n]; out’[n] = out[n]
              in[n]  = use[n] ∪ (out[n] – def[n])
              out[n] = ∪ { in[s] : s ∈ succ[n] }
      until in’[n] == in[n] and out’[n] == out[n] ∀n
Least fixed points
● There is often more than one solution for a given dataflow problem.
– Any solution to dataflow equations is a conservative approximation.
● Conservatively assuming a variable is live does not break the program:
– Just means more registers may be needed.
● Assuming a variable is dead when really live will break things.
● Many possible solutions; but we want the smallest: the least fixed point.
● The iterative algorithm computes this least fixed point.
Confused!?
● Is compilers a theoretical topic or a practical one?
● Recall:
– “A sangam (confluence) of theory and practice.”
● Next class:
– We are not leaving a topic as important as IDFA so soon!
CS502: Compiler Design
Code Optimization (Cont.)
Manas Thakur
Fall 2020
Recall our IDFA algorithm
      for each n                                       // initialize
          in[n] = ...; out[n] = ...
      repeat
          for each n
              in’[n] = in[n]; out’[n] = out[n]         // save previous values
              in[n]  = ...                             // compute new values
              out[n] = ...
      until in’[n] == in[n] and out’[n] == out[n] for all n   // repeat till fixed point

● Do we need to process all the nodes in each iteration?
Worklist-based Implementation of IDFA
● Initialize a worklist of statements
● Forward analysis:
– Start with the entry node
– If OUT(n) changes, then add succ(n) to the worklist
● Backward analysis:
– Start with the exit node
– If IN(n) changes, then add pred(n) to the worklist
● In both the cases, iterate till fixed point.
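The worklist scheme for a backward analysis can be sketched as follows, again using the six-node liveness CFG from the earlier slides: seed the worklist with the exit node, and re-add a node's predecessors whenever its IN set changes.

```python
from collections import deque

# The same six-node CFG; pred is derived from succ.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
pred = {n: [] for n in succ}
for n, ss in succ.items():
    for s in ss:
        pred[s].append(n)
use  = {1: set(), 2: {'a'}, 3: {'b', 'c'}, 4: {'b'}, 5: {'a'}, 6: {'c'}}
defs = {1: {'a'}, 2: {'b'}, 3: {'c'}, 4: {'a'}, 5: set(), 6: set()}

live_in  = {n: set() for n in succ}
live_out = {n: set() for n in succ}
work = deque([6])                       # backward analysis: start at the exit
while work:
    n = work.popleft()
    live_out[n] = set().union(*[live_in[s] for s in succ[n]])
    new_in = use[n] | (live_out[n] - defs[n])
    if new_in != live_in[n]:            # IN changed: predecessors must be redone
        live_in[n] = new_in
        work.extend(pred[n])
```

Each node is now reprocessed only when something it depends on actually changed, instead of once per global iteration.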
Writing an IDFA (Cont.)
● Initialization of IN and OUT sets depends on the analysis:
– empty sets if the information only grows
– full (universal) sets if the information only shrinks
● Requirement for termination:
– unidirectional growth/shrinkage
– Called monotonicity
● Confluence/Meet operation (at control-flow merges):
– Union, or
– Intersection
(which one depends on the analysis)
Live-variable analysis revisited
● Direction:
– Backward
● Initialization:
– Empty sets
● Flow functions:
– out[n] = ∪ { in[s] : s ∈ succ[n] }
– in[n]  = use[n] ∪ (out[n] – def[n])
● Confluence operation:
– Union
Common sub-expressions revisited
● Idea:
– If a program computes the same value multiple times, reuse the value.
– Subexpressions can be reused until operands are redefined.
– Say, given a node n, the expressions computed at n are denoted gen(n), and the ones killed (operands redefined) at n are denoted kill(n).

      Before:             After:
      a = b + c;          t = b + c;
      c = b + c;          a = t;
      d = b + c;          c = t;
                          d = b + c;
Common subexpressions as an IDFA
● Direction:
– Forward
● Initialization:
– Full (universal) sets at every node except the entry: with intersection at merges the sets only shrink, and starting from “everything available” yields the precise (greatest) fixed point; starting from empty sets would be sound but needlessly imprecise inside loops.
● Flow functions:
– in[n]  = ∩ { out[p] : p ∈ pred[n] }
– out[n] = gen[n] ∪ (in[n] – kill[n])
● Confluence operation:
– Intersection
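This available-expressions analysis can be sketched as below. The diamond-shaped CFG, the expression string 'b+c', and the gen/kill sets are made up for illustration: node 1 computes b + c, node 2 redefines c (killing it), and node 3 recomputes it, so at the merge node 4 the expression is available only along one path and intersection correctly discards it.

```python
def available(pred, gen, kill):
    """Forward must-analysis: intersection at merges, optimistic start."""
    nodes = list(pred)
    universe = set().union(*gen.values())
    avail_in  = {n: set() for n in nodes}
    avail_out = {n: universe.copy() for n in nodes}   # optimistic initialization
    changed = True
    while changed:
        changed = False
        for n in nodes:
            ps = pred[n]
            new_in = set.intersection(*[avail_out[p] for p in ps]) if ps else set()
            new_out = gen[n] | (new_in - kill[n])
            if (new_in, new_out) != (avail_in[n], avail_out[n]):
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in, avail_out

# Diamond: 1 -> 2 -> 4 and 1 -> 3 -> 4.
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
gen  = {1: {'b+c'}, 2: set(), 3: {'b+c'}, 4: set()}
kill = {1: set(), 2: {'b+c'}, 3: set(), 4: set()}
avail_in, avail_out = available(pred, gen, kill)
```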
Are we efficient enough?
● When can IDFAs take a lot of time?
● Which operations could be expensive?
– Confluence
– Equality
● Compilers may have to perform several IDFAs.
● How can we make an IDFA more efficient (perhaps with some loss of precision)?
      repeat
          for each n
              in’[n] = in[n]; out’[n] = out[n]
              in[n]  = use[n] ∪ (out[n] – def[n])
              out[n] = ∪ { in[s] : s ∈ succ[n] }
      until in’[n] == in[n] and out’[n] == out[n] ∀n
Basic Blocks

      a = 0
  L1: b = a + 1
      c = c + b
      a = b * 2
      if a < N goto L1
      return c

Each instruction as a node:
  [a = 0] → [b = a + 1] → [c = c + b] → [a = b * 2] → [a < N] → [return c], with a back edge from [a < N] to [b = a + 1]

Using basic blocks:
  [a = 0] → [b = a + 1; c = c + b; a = b * 2; a < N] → [return c], with a back edge from the middle block to itself
Basic Blocks (Cont.)
● Idea:
– Once execution enters a basic block, all statements are executed in sequence.
– Single-entry, single-exit region
● Details:
– Starts with a label
– Ends with one or more branches
– Edges may be labeled with predicates
● True/false
● Exceptions
● Key: Improve efficiency, with reasonable precision.
Have you got a compiler’s eyes yet?
● What properties can you identify about this program?

      S1: y = 1;
      S2: y = 2;
      S3: x = y;

● What’s the advantage if it was rewritten as follows?

      S1: y1 = 1;
      S2: y2 = 2;
      S3: x = y2;

● Def-use becomes explicit.
Static Single Assignment (SSA)
● A form of IR in which each use can be mapped to a single definition.
– Achieved using variable renaming and phi nodes.
● Many compilers use SSA form in their IRs.
      Before:                  After (SSA):
      if (flag) x = -1;        if (flag) x1 = -1;
      else x = 1;              else x2 = 1;
                               x3 = Φ(x1, x2);
      y = x * a;               y = x3 * a;
SSA Classwork
● Convert the following program to SSA form:
– (Hint: First convert to 3AC)

      x = 0;
      for (i=0; i<N; ++i) {
        x += i;
        i = i + 1;
        x--;
      }
      x = x + i;

– Solution:

      x1 = 0;
      i1 = 0;
      L1: i13 = Φ(i1, i3);
          if (i13 < N) {
            x13 = Φ(x1, x3);
            x2 = x13 + i13;
            i2 = i13 + 1;
            x3 = x2 – 1;
            i3 = i2 + 1;
            goto L1;
          }
      x4 = Φ(x1, x3);
      x5 = x4 + i13;
Effect of SSA on Register Allocation!?
● What is the effect of SSA form on liveness?
● What does SSA do?
– Breaks a single variable into multiple instances
– Instances represent distinct, non-overlapping uses
● Effect:
– Breaks up live ranges; often improves register allocation
(Figure: the single live range of x is split into shorter, non-overlapping live ranges for x1 and x2.)
Featuring Next in Code Optimization
● Heard of the 80-20 or 90-10 rule?
– X% of time is spent in executing y% of the code, where X >> y.
● Which kinds of code portions tend to form the region ‘y’ in typical programs?
– Loops
– Methods
● Tomorrow: Loop optimizations
CS502: Compiler Design
Loop Optimizations
Manas Thakur
Fall 2020
Why optimize loops?
● Loops form a significant portion of the time spent in executing programs.

      for (i=0; i<N; i++) {
        S1;
        S2;
      }

– If N is just 10000 (not uncommon), we have too many instructions!
● How many in the above loop?
– What if S1/S2 is/are function calls?
● Loops involve costly instructions in each iteration:
– Comparisons
– Jumps
What is a loop?
● A loop in a CFG is a set of nodes S such that:
– There is a designated header node h in S
– There is a path from each node in S to h
– There is a path from h to each node in S
– h is the only node in S with an incoming edge from outside S
Are all these loops?
What about these?
Identifying loops using dominators
● A node d dominates a node n if every path from entry to n goes through d.
● Compute dominators of each node:
Flow function for computing dominators
● Assuming D[i] is the set of dominators of node i:
      D[entry] = {entry}
      D[n]     = {n} ∪ (∩ { D[p] : p ∈ pred[n] })
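The flow function above iterates to a fixed point just like liveness did, except forward and with intersection. A Python sketch, run on the six-node loop CFG from the liveness slides (where 5 → 2 turns out to be a back edge, since 2 dominates 5):

```python
def dominators(pred, entry):
    """Iterate D[n] = {n} ∪ (∩ over predecessors) to a fixed point."""
    nodes = set(pred)
    dom = {n: set(nodes) for n in nodes}   # initially: dominated by everyone
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*[dom[p] for p in pred[n]])
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# The six-node loop CFG used in the liveness slides (5 -> 2 is the loop edge).
pred = {1: [], 2: [1, 5], 3: [2], 4: [3], 5: [4], 6: [5]}
dom = dominators(pred, entry=1)
```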
Identifying loops using dominators (Cont.)
● First, identify a back edge:
– An edge from a node n to another node h, where h dominates n.
● Each back edge n → h leads to a loop:
– The set X of nodes such that for each x ∈ X, h dominates x and there is a path from x to n not containing h.
– h is the header.
● Verify:
Loop-Invariant Code Motion (LICM)
● Loop-invariant code:
– d: t = a OP b is loop-invariant if:
● a and b are constants; or
● all the definitions of a and b that reach d are outside the loop; or
● only one definition each of a and b reaches d, and that definition is loop-invariant.
● Example (t = a * b is loop-invariant):

      L0: t = 0
      L1: i = i + 1
          t = a * b
          M[i] = t
          if i<N goto L1
      L2: x = t
LICM: Get ready for code hoisting
● Can we always hoist loop-invariant code?
● Criteria for hoisting d: t = a OP b:
– d dominates all loop exits at which t is live-out, and
– there is only one definition of t in the loop, and
– t is not live-out of the loop preheader
● The original loop satisfies the criteria:

      L0: t = 0
      L1: i = i + 1
          t = a * b
          M[i] = t
          if i<N goto L1
      L2: x = t

● How can we hoist code in the following two variants, which violate them?

  Variant 1 (the loop may run zero times, so t = a * b does not dominate the exit L2):

      L0: t = 0
      L1: if i>=N goto L2
          i = i + 1
          t = a * b
          M[i] = t
          goto L1
      L2: x = t

  Variant 2 (t is used at M[j] = t before being redefined in the same iteration):

      L0: t = 0
      L1: M[j] = t
          i = i + 1
          t = a * b
          M[i] = t
          if i<N goto L1
      L2: x = t
Induction-variable optimization
● Induction variables:
– Variables whose values depend on the loop’s iteration variable
● Optimization:
– Compute them efficiently, if possible
Before:

          s = 0
          i = 0
      L1: if i>=N goto L2
          j = i * 4
          k = j + a
          x = M[k]
          s = s + x
          i = i + 1
          goto L1
      L2:

After (strength-reduced; k' = a + 4*i is maintained incrementally):

          s = 0
          k' = a
          b = N * 4
          c = a + b
      L1: if k'>=c goto L2
          x = M[k']
          s = s + x
          k' = k' + 4
          goto L1
      L2:
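A quick sanity check that the strength-reduced loop computes the same sum: the multiply and add per iteration (j = i * 4; k = j + a) are replaced by a single increment of k'. The base address a = 100, the bound N = 10, and the memory contents are made up for the check.

```python
a, N = 100, 10
M = {a + 4 * i: i * i for i in range(N)}   # fake word-addressed memory

# Original loop: recompute the address from i every iteration.
s, i = 0, 0
while i < N:
    j = i * 4
    k = j + a
    s += M[k]
    i += 1

# Strength-reduced loop: k' walks through memory in steps of 4.
s2, k2 = 0, a
c = a + N * 4
while k2 < c:
    s2 += M[k2]
    k2 += 4
```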
Loop unrolling
● Minimize the number of increments and condition-checks
● Be careful about the increase in code size (I-cache misses!)
Original:

      L1: x = M[i]
          s = s + x
          i = i + 4
          if i<N goto L1
      L2:

Unrolled by a factor of 2 (correct only for an even number of iterations):

      L1: x = M[i]
          s = s + x
          x = M[i+4]
          s = s + x
          i = i + 8
          if i<N goto L1
      L2:

Any number of iterations (with an epilogue):

          if i<N-8 goto L1
          goto L2
      L1: x = M[i]
          s = s + x
          x = M[i+4]
          s = s + x
          i = i + 8
          if i<N-8 goto L1
      L2: x = M[i]
          s = s + x
          i = i + 4
          if i<N goto L2
      L3:
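The "any number of iterations" shape, i.e., an unrolled main loop plus an epilogue for the leftovers, can be checked quickly in Python (the helper name and the list-based data are made up for the check):

```python
def sum_unrolled(xs):
    """Sum a list, processing two elements per main-loop iteration."""
    s, i, n = 0, 0, len(xs)
    while i < n - 1:          # main loop, unrolled by a factor of 2
        s += xs[i] + xs[i + 1]
        i += 2
    while i < n:              # epilogue for a possible leftover element
        s += xs[i]
        i += 1
    return s
```

Only half as many condition checks and increments run in the main loop, at the cost of a larger body.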
Loop interchange
● A C/Java programmer starting with MATLAB:

      for i=1:1000,
        for j=1:1000,
          a(i) = a(i) + b(i,j)*c(i)
        end
      end

● But MATLAB stores matrices in column-major order!
● Implication?
– Cache misses (perhaps in each iteration)!
● Solution (interchange the loops!):

      for j=1:1000,
        for i=1:1000,
          a(i) = a(i) + b(i,j)*c(i)
        end
      end
Many more loop optimizations
● Some other time:
– Loop fusion
– Loop fission
– Loop inversion
– Loop tiling
– Loop unswitching
– . . .
● Next class:
– Vectorization
– Parallelization