Instruction Scheduling on VLIW Architectures
Spring 2011
4541.775 Topics on Compilers
Instruction Scheduling
● Limited ILP
● Trace Scheduling
● Superblock Scheduling
● Hyperblock Scheduling
● Modulo Scheduling
Instruction Scheduling
● Insufficient ILP
● “normal” code does not contain enough ILP
● ILP within basic blocks is limited for control-intensive programs
– the problem is accentuated by longer latencies
unsigned int abs_sum = 0;
for (int i=0; i<N; i++) {
  int abs = (A[i] >= 0? A[i] : -A[i]);
  abs_sum += abs;
}

b0:         mov r0 ← #0
            mov r1 ← #0
            mov r2 ← N shl #2
            mov r3 ← @A
b1: .loop   ld  r4 ← mem[r3 + r1]
            bge r4, #0, .skip
b2:         not r4 ← r4
            add r4 ← r4, #1
b3: .skip   add r0 ← r0, r4
            add r1 ← r1, #4
            blt r1, r2, .loop
Instruction Scheduling
● Insufficient ILP (cont’d)
● example: basic block b1 (ld r4 ← …, bge r4, …)
– ld latency: 4 cycles
– bge depends on the result of ld, so it cannot issue until 4 cycles after the ld
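The effect of the load latency on b1 can be reproduced with a small earliest-issue calculation. The following is a minimal sketch, not part of the original slides: it assumes a machine with unlimited issue width, so only data dependences constrain the schedule, and the names `Dep` and `earliest_issue` are hypothetical.

```c
#include <stddef.h>

/* One data dependence: instruction `to` needs the result of `from`,
   which becomes available `latency` cycles after `from` issues. */
typedef struct {
    int from;
    int to;
    int latency;
} Dep;

/* Computes the earliest issue cycle of each of the n instructions of a
   basic block. Instructions are numbered in program order, so every
   dependence must satisfy from < to. Resource limits are ignored
   (assumption). Returns the cycle in which the last instruction issues. */
int earliest_issue(int n, const Dep *deps, size_t ndeps, int *cycle)
{
    int last = 0;
    for (int i = 0; i < n; i++) {
        cycle[i] = 0;
        for (size_t d = 0; d < ndeps; d++) {
            if (deps[d].to == i) {
                int ready = cycle[deps[d].from] + deps[d].latency;
                if (ready > cycle[i])
                    cycle[i] = ready;
            }
        }
        if (cycle[i] > last)
            last = cycle[i];
    }
    return last;
}
```

For b1, with ld as instruction 0, bge as instruction 1, and a single dependence of latency 4, the ld issues in cycle 0 and the bge in cycle 4. The block contains nothing else that could fill the cycles in between, which is why scheduling has to look beyond basic block boundaries.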
Instruction Scheduling
● ILP within basic blocks is limited for control-intensive programs.
→ optimizations across basic blocks are needed
– trace scheduling (J. Fisher, 1981)
– superblock scheduling (P. Chang, 1991)
– hyperblock scheduling (S. Mahlke, 1992)
Instruction Scheduling
● Trace Scheduling
J. A. Fisher: Trace Scheduling: A Technique for Global Microcode Compaction (IEEE Transactions on Computers, vol. 30, no. 7, 1981)
● basic idea: schedule the most frequently executed trace of basic blocks as one unit
● requires compensation code if the program takes another route than expected
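[Figure: code motion of add r4 ← r0, r1 along the trace (branch probability 0.9), with a copy of the instruction inserted as compensation code on the off-trace path (probability 0.1)]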
Instruction Scheduling
● Trace Scheduling
● A trace consists of a sequence of instructions
– including branches
– but not including loops
● example:
– assume B1, B3, B4, B5, B7 is the most frequently executed path
[Figure: control-flow graph with basic blocks B1 to B7]
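How a trace such as B1, B3, B4, B5, B7 might be picked from profile data can be sketched as follows. This is a simplified, forward-only illustration and not Fisher's full algorithm; the `Block` layout, the function name `select_trace`, and the 64-block limit are assumptions made for the example.

```c
#define MAX_SUCC 2

/* A basic block with up to MAX_SUCC successors and the profiled
   probability of taking each outgoing edge (hypothetical layout). */
typedef struct {
    int nsucc;
    int succ[MAX_SUCC];
    double prob[MAX_SUCC];
} Block;

/* Grows a trace forward from `seed`: repeatedly follow the most likely
   successor, stopping at a block without successors or at a block that
   is already in the trace (so the trace never contains a loop).
   Returns the number of blocks written to `trace`. */
int select_trace(const Block *cfg, int nblocks, int seed, int *trace)
{
    int in_trace[64] = {0};   /* sketch: supports at most 64 blocks */
    int len = 0;
    int cur = seed;

    while (cur >= 0 && cur < nblocks && nblocks <= 64 && !in_trace[cur]) {
        in_trace[cur] = 1;
        trace[len++] = cur;

        int best = -1;        /* index of the most likely outgoing edge */
        for (int s = 0; s < cfg[cur].nsucc; s++)
            if (best < 0 || cfg[cur].prob[s] > cfg[cur].prob[best])
                best = s;
        cur = (best < 0) ? -1 : cfg[cur].succ[best];
    }
    return len;
}
```

With B1 to B7 stored at indices 0 to 6 and assumed edges B1→{B2: 0.1, B3: 0.9}, B2→B4, B3→B4, B4→{B5: 0.8, B6: 0.2}, B5→B7, B6→B7, starting at B1 yields the trace B1, B3, B4, B5, B7. The edges and weights are invented for illustration; the slides only state which path is the most frequent one.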
Instruction Scheduling
● Trace Scheduling
[Figure: the control-flow graph B1 to B7 before and after scheduling the trace B1, B3, B4, B5, B7 as one unit]
● add compensation code if necessary
Instruction Scheduling
● Trace Scheduling
● Compensation Code
– moving an instruction below a side exit
  before: instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
  after: instr 2, instr 3, instr 4, instr 5, instr 1, instr 6
  compensation code on the side exit: instr 1
Instruction Scheduling
● Trace Scheduling
● Compensation Code
– moving an instruction above a side exit (speculative execution)
  before: instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
  after: instr 1, instr 5, instr 2, instr 3, instr 4, instr 6
  compensation code on the side exit: [undo instr 5]
Instruction Scheduling
● Trace Scheduling
● Compensation Code
– moving an instruction below a side entrance
– moving an instruction above a side entrance
  before: instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
  after: instr 2, instr 3, instr 4, instr 5, instr 1, instr 6
  compensation code (as labeled in the figure): instr 5, instr 4
Instruction Scheduling
● Superblock Scheduling
Wen-mei Hwu et al.: The Superblock: An Effective Technique for VLIW and Superscalar Compilation (The Journal of Supercomputing, vol. 7, issue 1-2, 1993)
● tries to overcome some difficulties with trace scheduling
– complicated bookkeeping when moving instructions above/below a side entrance/exit
– some compiler optimizations require additional bookkeeping when side entrances are present
example: copy propagation
Instruction Scheduling
● Superblock Scheduling
● a superblock is a trace with no side entrances
→ control may only enter from the top, but may leave at one or more exit points
● similar to extended basic blocks (Aho et al, 1986)
● superblock formation:
1. identify trace using profile information
2. apply tail duplication until all side entrances have been eliminated
● tail duplication
1. copy the tail portion of the trace from the first side entrance to the end
2. move all side entrances to the corresponding duplicated basic blocks
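The two tail-duplication steps can be sketched on a small control-flow graph. This is an illustrative sketch under simplifying assumptions: the `Cfg` and `Block` types and the function names are hypothetical, each block has at most two successors, and the copied tail keeps the off-trace exits of the original blocks.

```c
#define MAX_BLOCKS 32
#define MAX_SUCC 2

typedef struct {
    int nsucc;
    int succ[MAX_SUCC];
} Block;

typedef struct {
    int nblocks;
    Block b[MAX_BLOCKS];
} Cfg;

/* Position of block `id` within the trace, or -1 if it is off-trace. */
static int pos_in_trace(const int *trace, int len, int id)
{
    for (int i = 0; i < len; i++)
        if (trace[i] == id)
            return i;
    return -1;
}

/* Position of the first trace block (other than the head) that has a
   predecessor different from its trace predecessor; -1 if none. */
int first_side_entrance(const Cfg *g, const int *trace, int len)
{
    for (int i = 1; i < len; i++)
        for (int p = 0; p < g->nblocks; p++)
            for (int s = 0; s < g->b[p].nsucc; s++)
                if (g->b[p].succ[s] == trace[i] && p != trace[i - 1])
                    return i;
    return -1;
}

/* Step 1: copy the tail of the trace, starting at the first side
   entrance. Step 2: redirect every side entrance to the matching copy.
   Returns the number of blocks copied, or -1 if the graph is full. */
int tail_duplicate(Cfg *g, const int *trace, int len)
{
    int first = first_side_entrance(g, trace, len);
    if (first < 0)
        return 0;
    int ncopies = len - first;
    if (g->nblocks + ncopies > MAX_BLOCKS)
        return -1;

    int old_n = g->nblocks;   /* the copy of trace[j] gets id old_n + (j - first) */
    for (int j = first; j < len; j++) {
        Block c = g->b[trace[j]];
        for (int s = 0; s < c.nsucc; s++) {
            int k = pos_in_trace(trace, len, c.succ[s]);
            if (k >= first)   /* stay inside the duplicated tail */
                c.succ[s] = old_n + (k - first);
        }
        g->b[g->nblocks++] = c;
    }
    for (int p = 0; p < old_n; p++)
        for (int s = 0; s < g->b[p].nsucc; s++) {
            int k = pos_in_trace(trace, len, g->b[p].succ[s]);
            if (k >= first && p != trace[k - 1])
                g->b[p].succ[s] = old_n + (k - first);
        }
    return ncopies;
}
```

After the call, the original trace can only be entered at its head, so it forms a superblock; the price is the code growth caused by the duplicated blocks.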
Instruction Scheduling
● Superblock Scheduling
● example: superblock formation
Instruction Scheduling
● Superblock Scheduling
● superblock ILP optimizations
optimizations that are performed on superblocks before scheduling, with the goal of enlarging the superblock and increasing ILP by removing dependences.
● superblock enlarging optimizations
– branch target expansion
  ● expand the target of the likely taken control transfer that ends a superblock
  ● not applied to back edges
  ● stops when a predefined superblock size is reached or the branch does not favor one direction
Instruction Scheduling
● Superblock Scheduling
● superblock enlarging optimizations (cont’d)
– loop peeling
  ● applied to superblock loops (superblocks which end with a likely taken control transfer to themselves) that tend to iterate only a few (k) times
  ● peel the first k iterations and insert control flow to branch to the original loop body if the loop is not executed k times
  ● after loop peeling, the superblock may be extended both at the head and the tail of the superblock loop
– loop unrolling
  ● unroll the body of a superblock loop that tends to iterate many times
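Loop peeling can be illustrated at the source level. The sketch below is an assumption-laden example, not taken from the slides: the function names are hypothetical, k is fixed at 2, and the peeling is done by hand rather than by a compiler on superblocks.

```c
/* Original loop: counts the leading nonzero entries of a[0..n-1].
   Assume profiling shows that it usually iterates at most twice. */
int count_leading(const int *a, int n)
{
    int i = 0;
    while (i < n && a[i] != 0)
        i++;
    return i;
}

/* The same loop with the first k = 2 iterations peeled. The peeled
   copies are straight-line code that can be merged with the code
   before and after the loop; the original loop remains only for the
   infrequent case of more than two iterations. */
int count_leading_peeled(const int *a, int n)
{
    int i = 0;
    if (!(i < n && a[i] != 0))   /* peeled iteration 1 */
        return i;
    i++;
    if (!(i < n && a[i] != 0))   /* peeled iteration 2 */
        return i;
    i++;
    while (i < n && a[i] != 0)   /* original loop body, rarely reached */
        i++;
    return i;
}
```

Both functions return the same result for every input; only the shape of the control flow changes.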
Instruction Scheduling
● Superblock Scheduling
● superblock dependence removing optimizations
remove data dependences between instructions in a superblock
– register renaming
  e.g., in unrolled loop bodies
– operation migration
  ● move instructions whose result is not used within a superblock to a less frequently executed superblock
  ● decision based on a cost function
– induction variable expansion
  ● create a separate copy of the loop induction variable for each unrolled loop body
  ● requires additional patch code at the loop preheader and at exits
Instruction Scheduling
● Superblock Scheduling
● superblock dependence removing optimizations (cont’d)
– accumulator variable expansion
  ● use a separate accumulator for each unrolled instance of loops accumulating a sum or product in every iteration
  ● additional patch code at the loop preheader needed
  ● additional patch code at the loop exits needed (summing up the individual accumulators)
– operation combining
  ● for certain classes of instructions, true dependencies can be eliminated by precomputing new immediate values at compile time
  ● example:
  before: add x ← x, #4; …; add x ← x, #4
  after: add x’ ← x, #8; add x ← x, #4; …; mov x ← x’
Instruction Scheduling
● Superblock Scheduling
● example: superblock dependence removing optimizations
[Figure panels: accumulator variable expansion; induction variable expansion]
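Both expansions can be shown together on a simple summation loop. This is a hand-written source-level sketch, not the example from the missing figure; the function names and the unroll factor of 4 are assumptions.

```c
#include <stddef.h>

/* Baseline: every iteration depends on the previous one through both
   the accumulator `sum` and the induction variable `i`. */
long sum_simple(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled four times with accumulator variable expansion (s0..s3)
   and induction variable expansion (i0..i3). */
long sum_expanded(const int *a, size_t n)
{
    /* Patch code at the loop preheader: initialize the copies. */
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i0 = 0, i1 = 1, i2 = 2, i3 = 3;

    /* The four unrolled bodies no longer depend on one another. */
    while (i3 < n) {
        s0 += a[i0]; i0 += 4;
        s1 += a[i1]; i1 += 4;
        s2 += a[i2]; i2 += 4;
        s3 += a[i3]; i3 += 4;
    }

    /* Patch code at the loop exit: sum up the individual accumulators
       and handle the remaining n % 4 elements. */
    long sum = s0 + s1 + s2 + s3;
    for (size_t i = i0; i < n; i++)
        sum += a[i];
    return sum;
}
```

In `sum_simple`, each addition must wait for the previous one. In `sum_expanded`, the four additions of one unrolled iteration can be issued in the same cycle on a sufficiently wide VLIW machine.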
Instruction Scheduling
● Superblock Scheduling
● speculative execution
– occurs when moving an instruction up above a control transfer instruction B
– the instruction is executed in any case, even if the control transfer instruction would branch out of the superblock (i.e., speculative instructions)
– restrictions for an instruction I to be executed speculatively
1. the destination of I is not used before it is redefined when B is taken
2. I will never cause an exception that may terminate the program when B is taken
– instructions that may cause exceptions
  ● memory load
  ● memory store
  ● integer divide
  ● floating point operations
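The two restrictions translate into a simple legality check. The sketch below assumes a hypothetical instruction representation (`Instr`, `OpKind`) and a 32-bit liveness mask for the taken path of B; it models only the two restrictions listed above and no hardware support for ignoring exceptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_ALU, OP_LOAD, OP_STORE, OP_INT_DIV, OP_FP } OpKind;

typedef struct {
    OpKind kind;
    int dest;   /* destination register number, 0..31 */
} Instr;

/* Instruction classes that may raise an exception. */
static bool may_trap(OpKind kind)
{
    return kind != OP_ALU;
}

/* Decides whether instruction i may be moved above branch B.
   Bit r of live_when_taken is set if register r is used before it is
   redefined when B is taken. */
bool can_speculate(Instr i, uint32_t live_when_taken)
{
    /* Restriction 1: the destination must be dead on the taken path. */
    bool dest_is_dead = (live_when_taken & (UINT32_C(1) << i.dest)) == 0;
    /* Restriction 2: the instruction must never cause an exception. */
    return dest_is_dead && !may_trap(i.kind);
}
```

An ALU instruction writing r5 may be hoisted as long as r5 is not live on the taken path; a load is rejected regardless of liveness, because it may trap.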
Instruction Scheduling
● Superblock Scheduling
● speculative execution (cont’d)
– exception models
  ● restricted percolation model: no support for disregarding exceptions generated by speculatively executed instructions
    – limits performance in superblocks that contain many long-latency, potentially trap-causing instructions (e.g., memory loads) above branches
  ● general percolation model: the architecture provides non-trapping versions of the instructions that may cause exceptions
    – convert speculatively executed, potentially trapping instructions to their non-trapping counterparts
    – if detection of the exception is required, additional architecture and compiler support is needed
Instruction Scheduling
● Superblock Scheduling
● Analysis
– implementation complexity in the IMPACT-I C compiler
total size: ~92K lines
Instruction Scheduling
● Superblock Scheduling
● Analysis
– compilation time (IMPACT-I)
Instruction Scheduling
● Superblock Scheduling
● Analysis
– performance improvement due to superblock ILP optimization
Instruction Scheduling
● Superblock Scheduling
● Analysis
– effect of speculative execution support
Instruction Scheduling
● Superblock Scheduling
● Analysis
– code size increase
Instruction Scheduling
● Hyperblock Scheduling
Scott Mahlke et al.: Effective Compiler Support for Predicated Execution Using the Hyperblock (MICRO-25, 1992)
● tries to overcome some difficulties with superblock scheduling
– superblocks end when both targets of a control flow instruction have a similar probability to be taken
● hyperblock scheduling
– combine basic blocks from multiple control paths (using if-conversion)
– for programs without heavily biased branches, hyperblocks provide a more flexible framework
Instruction Scheduling
● Hyperblock Scheduling
● Predicated execution
– When the predicate is TRUE the instruction is executed normally
– When the predicate is FALSE the instruction is treated as a NOP
● Conditional branches can be eliminated with predicated execution (if-conversion)
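If-conversion can be illustrated with the abs_sum loop from the beginning of the lecture. C has no predicated instructions, so the sketch below emulates one with a conditional select; the function names are hypothetical, and the inputs are assumed not to contain INT_MIN, whose negation overflows.

```c
/* Branching version: the body contains a conditional branch around
   the negation, as in blocks b1 to b3 of the assembly example. */
unsigned abs_sum_branch(const int *A, int N)
{
    unsigned abs_sum = 0;
    for (int i = 0; i < N; i++) {
        int r4 = A[i];
        if (r4 < 0)
            r4 = -r4;
        abs_sum += (unsigned)r4;
    }
    return abs_sum;
}

/* If-converted version: the branch is replaced by a predicate p that
   guards the negation. With p TRUE the negation takes effect, with p
   FALSE it behaves like a NOP, so the loop body is one basic block. */
unsigned abs_sum_predicated(const int *A, int N)
{
    unsigned abs_sum = 0;
    for (int i = 0; i < N; i++) {
        int r4 = A[i];
        int p = (r4 < 0);       /* predicate define                  */
        r4 = p ? -r4 : r4;      /* emulates: (p) r4 = -r4            */
        abs_sum += (unsigned)r4;
    }
    return abs_sum;
}
```

Whether a C compiler emits a branch or a conditional move for the select is up to the compiler; on a machine with predicated execution the guarded negation would be a single predicated instruction.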
Instruction Scheduling
● Hyperblock Scheduling
● The Hyperblock
– set of predicated basic blocks in which control may only enter at the top, but several exits may exist
– very similar to superblock formation
● Building Hyperblocks
1. hyperblock block selection
  ● decide which basic blocks in a region should be included in the hyperblock
  ● three features of each block are examined
    – execution frequency
    – block size
    – instruction characteristics
  ● use heuristic functions
Instruction Scheduling
● Hyperblock Scheduling
● Building Hyperblocks (cont’d)
2. hyperblock formation
  ● tail duplication
  ● loop peeling
  ● node splitting
    – eliminate dependences created by control path merges
    – duplicate all blocks subsequent to the merge point for each path
  ● if-conversion
Instruction Scheduling
● Hyperblock Scheduling
● Building Hyperblocks (cont’d)
Instruction Scheduling
● Hyperblock Scheduling
● Control Flow Information
– instructions within a hyperblock are not sequential
→ a more complex analysis is required
● Predicate Hierarchy Graph (PHG)
– determine if two instructions can ever be executed in a single path
– if they can, then there is a control flow path between these two instructions
Instruction Scheduling
● Hyperblock Scheduling
● Predicate Hierarchy Graph (PHG) example
– same path: AND
  p4 = c1 ∙ c2
– multiple paths meet: OR
  p5 = ~c1 + c1 ∙ ~c2
– ANDing p4 and p5:
  p4 ∙ p5 = (c1 ∙ c2) ∙ (~c1 + c1 ∙ ~c2) = 0
→ there is no viable path between p4 and p5
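The result p4 ∙ p5 = 0 can be double-checked by enumerating all values of the conditions c1 and c2. The sketch below is a brute-force check for this one example, not an implementation of the predicate hierarchy graph; the function names are hypothetical.

```c
#include <stdbool.h>

typedef bool (*Predicate)(bool c1, bool c2);

/* p4 = c1 AND c2 (conditions along the same path). */
static bool p4(bool c1, bool c2)
{
    return c1 && c2;
}

/* p5 = ~c1 OR (c1 AND ~c2) (two paths meeting). */
static bool p5(bool c1, bool c2)
{
    return !c1 || (c1 && !c2);
}

/* Returns true if some assignment of c1 and c2 makes both predicates
   TRUE, i.e., if instructions guarded by p and q can execute on the
   same path. */
bool can_execute_together(Predicate p, Predicate q)
{
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++)
            if (p(c1 != 0, c2 != 0) && q(c1 != 0, c2 != 0))
                return true;
    return false;
}
```

`can_execute_together(p4, p5)` is false for all four assignments, which matches the algebraic result: instructions guarded by p4 and p5 never execute on the same path, so the scheduler does not need to order them with respect to each other.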
● Hyperblock Scheduling
● Hyperblock-Specific Optimizations
– similar to optimizations for superblocks
– instruction promotion
  ● removes the dependence between the predicated instruction and the instruction which sets the corresponding predicate value
– instruction merging
  ● combine two instructions in a hyperblock with complementary predicates into a single instruction
● Summary
● Trace Scheduling can increase ILP
– side entrances are too complex to handle
● Superblock Scheduling removes the side entrances from the trace
– weak point: unbiased branches
● Hyperblock Scheduling
– for programs without heavily biased branches, hyperblocks provide a more flexible framework
● Modulo Scheduling → next class!