instr sched 3
TRANSCRIPT
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 1/34
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 2/34
Instruction Scheduling
Reordering of instructions so as to keep the pipelines of
functional units full with no stalls
NP-Complete and needs heuristcs
Applied on basic blocks (local)
Global scheduling requires elongation of basic blocks
(super-blocks)
Y.N. Srikant Instruction Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 3/34
Optimal Instruction Scheduling using Integer Linear
Programming
This is useful for the evaluation of instruction scheduling
heuristics that do not generate optimal schedules
Careful implementation may enable these methods to be
deployed even in production quality compilersAssume a simple resource model in which all the
functional units are fully pipelined
Assume an architecture with integer ALU, FP add unit, FP
mult/div unit, and load/store unit with possibly differing
execution latencies
Assume that there are R r instances of the functional unit r
Y.N. Srikant Instruction Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 4/34
Optimal Instruction Scheduling using Integer Linear
Programming
Let σi be the time at which instruction i is scheduledLet d (i , j ) be the weight of the edge (i , j ) of the DAG
To satisfy dependence constraints, for each arc (i , j ) of the
DAG
σ j ≥ σi + d (i , j ) (1)
A matrix K n ×T , where n is the number of instructions in theDAG and T is an estimate of the worst case execution timeof the schedule, is used
T can be estimated by summing up the execution times ofall the instructions in the DAG
K [i , t ] is 1, if instruction i is scheduled at time step t and 0
otherwise
Y.N. Srikant Instruction Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 5/34
Optimal Instruction Scheduling using Integer Linear
Programming
The schedule time σi of instruction i can be expressed as
σi = k i ,0 · 0 + k i ,1 · 1 + · · ·+ k i ,T −1 · (T − 1)
where exactly one of the k i , j is 1
This can be written in matrix form for all σi ’s as:
σ0
σ1...
σn −1
=
k 0,0 k 0,1, · · · k 0,T −1
k 1,0 k 1,1 · · · k 1,T −1...
......
...
k n −1,0 k n −1,1 · · · k n −1,T −1
∗
01...
T − 1
(2)To express that each instruction is scheduled exactly once,
we include the constraintt
k i ,t = 1, ∀i (3)
Y.N. Srikant Instruction Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 6/34
Optimal Instruction Scheduling using Integer Linear
Programming
The resource constraint that no more than R r instructionsare scheduled in any time step can be expressed as
i ∈ F (r )
k i ,t ≤ R r , ∀ t and ∀ r (4)
where F (r ) represents the set of instructions that can be
executed in functional unit type r .
The objective function is to minimize the execution time or
schedule length, subject to the constraints in equations 1-4
above. This can be represented as:
minimize(maxi
(σi + d (i , j )))
Y.N. Srikant Instruction Scheduling
S
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 7/34
Delayed Load Scheduling Algorithm for Trees
RISC load/store architecture with delayed loadsSingle cycle issue/execution, with only loads pipelined
(load delay = 1 cycle)
Generates optimal code without any interlocks for
expression trees
Three phases
Computation of minReg as in Sethi-Ullman code generationalgorithmOrdering of loads and operations as in the SU algorithmEmitting code in canonical DLS order
Uses 1 + minReg number of registers and can handle only
one cycle load delay
Y.N. Srikant Instruction Scheduling
S hi Ull i R C i Al i h
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 8/34
Sethi-Ullman minReg Computation Algorithm
if (isLeaf(node)) then {node.minReg = 1}
else
if (node.left.minReg == node.right.minReg) then{node.minReg = node.left.minReg + 1}
else {node.minReg = MAX(node.left.minReg,
node.right.minReg)}
Y.N. Srikant Instruction Scheduling
S thi Ull i R C t ti E l
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 9/34
Sethi-Ullman minReg Computation Example
½ Ø ½ Ð Ó
¾ Ø ¾ Ð Ó
¿ Ø ¿ Ø ½ · Ø ¾
Ø Ð Ó
Ø Ð Ó
Ø Ð Ó
Ø Ø · Ø
Ø Ø ¹ Ø
Ø Ø ¿ ¶ Ø
½ ¼ × Ø Ø
´ µ ¿ ¹ Ö × × Ó
3
3
2
1 1 111
2 2
i10
i9
i3
i1 i2 i5
i7
i8
i4i6
st
mult
addadd
sub
loadloadloadloadload
´ µ Ü Ô Ö × × Ó Ò Ì Ö
Y.N. Srikant Instruction Scheduling
S thi Ull Al ith C d G E l
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 10/34
Sethi-Ullman Algorithm Code Gen Example
½ Ö ½ Ð Ó
¾ Ö ¾ Ð Ó
¿ Ö ½ Ö ½ · Ö ¾
Ö ¾ Ð Ó
Ö ¿ Ð Ó
Ö Ð Ó
Ö ¿ Ö ¿ · Ö
Ö ¾ Ö ¿ ¹ Ö ¾
Ö ½ Ö ½ ¶ Ö ¾
½ ¼ × Ø Ö ½
´ µ Ó Ë Õ Ù Ò Ù × Ò Ê × Ø Ö ×
Ö ½ Ð Ó
Ö ¾ Ð Ó
Ö ½ Ö ½ · Ö ¾
Ö ¾ Ð Ó
Ö ½ Ö ½ ¹ Ö ¾
½ Ö ¾ Ð Ó
¾ Ö ¿ Ð Ó
¿ Ö ¾ Ö ¾ · Ö ¿
Ö ½ Ö ½ ¶ Ö ¾
½ ¼ × Ø Ö ½
´ µ Ç Ô Ø Ñ Ð Ó Ë Õ Ù Ò Û Ø ¿ Ê × Ø Ö ×
Y.N. Srikant Instruction Scheduling
DLS C t ti E l
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 11/34
DLS Computation Example
Ö ½ Ð Ó
Ö ¾ Ð Ó
Ö ½ Ö ½ · Ö ¾ ± ½ × Ø Ð Ð
Ö ¾ Ð Ó
Ö ½ Ö ½ ¹ Ö ¾ ± ½ × Ø Ð Ð
½ Ö ¾ Ð Ó
¾ Ö ¿ Ð Ó
¿ Ö ¾ Ö ¾ · Ö ¿ ± ½ × Ø Ð Ð
Ö ½ Ö ½ ¶ Ö ¾
½ ¼ × Ø Ö ½
´ µ Ë Ø Ð Ð × Ò Ë Ø ¹ Í Ð Ð Ñ Ò Ë Õ Ù Ò
Ö ½ Ð Ó
Ö ¾ Ð Ó
Ö ¿ Ð Ó
½ Ö Ð Ó
Ö ½ Ö ½ · Ö ¾
¾ Ö ¾ Ð Ó
Ö ½ Ö ½ ¹ Ö ¿
¿ Ö Ö · Ö ¾
Ö ½ Ö ½ ¶ Ö
½ ¼ × Ø Ö ½
´ µ Ä Ë Ë Õ Ù Ò Û Ø Æ Ó Ë Ø Ð Ð ×
Y.N. Srikant Instruction Scheduling
DLS Algorithm Main Program
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 12/34
DLS Algorithm - Main Program
Procedure Generate(root: ExprNode)
{ label(root); //Calculate minReg values
opSched = loadSched = emptyList(); //Initialize
order(root, opSched, loadSched);//Find load and operation order
schedule(opSched, loadSched, root.minReg+1);
//Emit canonical order
}
Y.N. Srikant Instruction Scheduling
DLS Algorithm Finding SU Order
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 13/34
DLS Algorithm - Finding SU Order
Procedure Order(root: ExprNode;
var opSched, loadSched: NodeList){ if (not(isLeaf(root))
{ if (root.left.minReg < root.right.minReg)
{ order(root.right, opSched, loadSched);
order(root.left, opSched, loadSched);
} else
{order(root.left, opSched, loadSched);
order(root.right, opSched, loadSched);
}
append(root, opSched);}
else { append(root, loadSched);
}
Y.N. Srikant Instruction Scheduling
DLS Algorithm Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 14/34
DLS Algorithm - Scheduling
Procedure schedule(opSched, loadSched: NodeList;
Regs: integer)
{ for i = 1 to MIN(Regs, length(loadSched)) do// loads first
{ ld = popHead(loadSched);
ld.reg = getReg(); gen(Load, ld.name, ld.Reg)}
while (not Empty(loadSched))
// (Operation,Load) pairs next{ op = popHead(opSched); op.reg = op.left.reg;
gen(op.op, op.left.reg, op.right.reg, op.reg);
ld = popHead(loadSched); ld.reg = op.right.reg;
gen(Load, ld.name, ld.reg) }
while (not Empty(opSched)) //Remaining Operations
{ op = popHead(opSched); op.reg = op.left.reg;
gen(op.op, op.left.reg, op.right.reg, op.reg);
freeReg(op.right.reg) }
}
Y.N. Srikant Instruction Scheduling
Global Acyclic Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 15/34
Global Acyclic Scheduling
Average size of a basic block is quite small (5 to 20instructions)
Effectiveness of instruction scheduling is limitedThis is a serious concern in architectures supportinggreater ILP
VLIW architectures with several function unitssuperscalar architectures (multiple instruction issue)
Global scheduling is for a set of basic blocks
Overlaps execution of successive basic blocksTrace scheduling, Superblock scheduling, Hyperblock
scheduling, Software pipelining, etc.
Y.N. Srikant Instruction Scheduling
Trace Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 16/34
Trace Scheduling
A Trace is a frequently executed acyclic sequence of basicblocks in a CFG (part of a path)
Identifying a trace
Identify the most frequently executed basic blockExtend the trace starting from this block, forward and
backward, along most frequently executed edges
Apply list scheduling on the trace (including the branch
instructions)
Execution time for the trace may reduce, but execution time
for the other paths may increaseHowever, overall performance will improve
Y.N. Srikant Instruction Scheduling
Trace Example
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 17/34
Trace Example
Ó Ö ´ ¼ ½ ¼ ¼ · · µ
ß
´ ¼ µ
· ×
Ð ×
× Ù Ñ × Ù Ñ ·
´ µ À ¹ Ä Ú Ð Ó
± ± Ö ½ ¼
± ± Ö ¼
± ± Ö ¼ ¼
± ± Ö ×
½ ½ Ö ¾ Ð Ó ´ Ö ½ µ
¾ ´ Ö ¾ ¼ µ Ó Ø Ó
¾ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
Ö Ö ¿ · Ö
´ Ö ½ µ Ö
Ó Ø Ó
¿ Ö Ö ¾
´ Ö ½ µ Ö ¾
Ö Ö · Ö
½ ¼ Ö ½ Ö ½ ·
½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
´ µ × × Ñ Ð Ý Ó
B2
B1
B3
B4
main trace
´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô
Y.N. Srikant Instruction Scheduling
Trace - Basic Block Schedule
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 18/34
Trace Basic Block Schedule
2-way issue architecture with 2 integer units
add, sub, store : 1 cycle, load : 2 cycles, goto : no stall
9 cycles for the main trace and 6 cycles for the off-trace
Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾
¼ ½ Ö ¾ Ð Ó ´ Ö ½ µ
½
¾ ¾ ´ Ö ¾ ¼ µ Ó Ø Ó
¿ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
Ö Ö ¿ · Ö
´ Ö ½ µ Ö Ó Ø Ó
¿ Ö Ö ¾ ´ Ö ½ µ Ö ¾
´ µ Ö Ö · Ö ½ ¼ Ö ½ Ö ½ ·
´ µ ½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
Y.N. Srikant Instruction Scheduling
Trace Schedule
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 19/34
Trace Schedule
Y.N. Srikant Instruction Scheduling
Trace Schedule
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 20/34
Trace Schedule
6 cycles for the main trace and 7 cycles for the off-trace
Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾
¼ ½ Ö ¾ Ð Ó ´ Ö ½ µ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
½
¾ ¾ ´ Ö ¾ ¼ µ Ó Ø Ó Ö Ö ¿ · Ö
¿ ´ Ö ½ µ Ö
´ µ Ö Ö · Ö ½ ¼ Ö ½ Ö ½ ·
´ µ ½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
¿ Ö Ö ¾ ´ Ö ½ µ Ö ¾
½ ¾ Ó Ø Ó
Y.N. Srikant Instruction Scheduling
Trace Scheduling - Issues
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 21/34
Trace Scheduling Issues
Side exits and side entrances are ignored during
scheduling of a trace
Required compensation code is inserted during
book-keeping (after scheduling the trace)
Speculative code motion - load instruction moved ahead ofconditional branch
Example: Register r3 should not be live in block B3(off-trace path)May cause unwanted exceptions
Requires additional hardware support!
Y.N. Srikant Instruction Scheduling
Compensation Code - Side Exit
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 22/34
Compensation Code Side Exit
Y.N. Srikant Instruction Scheduling
Compensation Code - Side Exit
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 23/34
p
Y.N. Srikant Instruction Scheduling
Compensation Code - Side Entry
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 24/34
p y
Y.N. Srikant Instruction Scheduling
Compensation Code - Side Entry
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 25/34
p y
Y.N. Srikant Instruction Scheduling
Superblock Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 26/34
p g
A Superblock is a trace without side entrances
Control can enter only from the topMany exits are possibleEliminates several book-keeping overheads
Superblock formation
Trace formation as before
Tail duplication to avoid side entrances into a superblockCode size increases
Y.N. Srikant Instruction Scheduling
Superblock Example
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 27/34
p p
5 cycles for the main trace and 6 cycles for the off-trace
B1
B2
B4
B3
B4’
SuperBlock 1
SuperBlock 2
´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô
Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾
¼ ½ Ö ¾ Ð Ó ´ Ö ½ µ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
½
¾ ¾ ´ Ö ¾ ¼ µ Ó Ø Ó Ö Ö ¿ · Ö
¿ ´ Ö ½ µ Ö ½ ¼ Ö ½ Ö ½ ·
Ö Ö · Ö ½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
¿ Ö Ö ¾ ´ Ö ½ µ Ö ¾
³ Ö Ö · Ö ½ ¼ ³ Ö ½ Ö ½ ·
½ ½ ³ ´ Ö ½ Ö µ Ó Ø Ó ½
´ µ Ë Ù Ô Ö Ð Ó Ë Ù Ð
Y.N. Srikant Instruction Scheduling
Hyperblock Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 28/34
yp g
Superblock scheduling does not work well withcontrol-intensive programs which have many control flow
paths
Hyperblock scheduling was proposed to handle such
programs
Here, the control flow graph is IF-converted to eliminate
conditional branches
IF-conversion replaces conditional branches with
appropriate predicated instructions
Now, control dependence is changed to a datadependence
Y.N. Srikant Instruction Scheduling
IF-Conversion Example
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 29/34
Y.N. Srikant Instruction Scheduling
Hyperblock Example Code
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 30/34
Ó Ö ´ ¼ ½ ¼ ¼ · · µ
ß
´ ¼ µ
· ×
Ð ×
× Ù Ñ × Ù Ñ ·
´ µ À ¹ Ä Ú Ð Ó
± ± Ö ½ ¼
± ± Ö ¼
± ± Ö ¼ ¼
± ± Ö ×
½ ½ Ö ¾ Ð Ó ´ Ö ½ µ
¾ ´ Ö ¾ ¼ µ Ó Ø Ó
¾ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
Ö Ö ¿ · Ö
´ Ö ½ µ Ö
Ó Ø Ó
¿ Ö Ö ¾
´ Ö ½ µ Ö ¾
Ö Ö · Ö
½ ¼ Ö ½ Ö ½ ·
½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
´ µ × × Ñ Ð Ý Ó
B2
B1
B3
B4
main trace
´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô
Y.N. Srikant Instruction Scheduling
Hyperblock Example
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 31/34
6 cycles for the entire set of predicated instructions
Instructions i3 and i4 can be executed speculatively and
can be moved up, instead of being scheduled after cycle 2
B2
B1
B3
B4
Hyperblock
´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô
Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾
¼ ½ Ö ¾ Ð Ó ´ Ö ½ µ ¿ Ö ¿ Ð Ó ´ Ö ½ µ
½
¾ ¾ ³ Ô ½ ´ Ö ¾ ¼ µ Ö Ö ¿ · Ö
¿ ´ Ö ½ µ Ö ¸ Ô ½ ´ Ö ½ µ Ö ¾ ¸ Ô ½
½ ¼ Ö ½ Ö ½ · Ö Ö ¾ ¸ Ô ½
Ö Ö · Ö ½ ½ ´ Ö ½ Ö µ Ó Ø Ó ½
´ µ À Ý Ô Ö Ð Ó Ë Ù Ð
Y.N. Srikant Instruction Scheduling
Delayed Branch Scheduling
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 32/34
Delayed branching
One instruction immediately following the delayed branchinstruction will be executed before the branch is taken
The instruction occupying the delay slot should beindependent of the branch instruction
It is best to fill the branch delay slot with an instruction from
the basic block that the branch terminatesOtherwise, an instruction from either the target block or thefall-through block, whichever is most likely to be executed,is selected
The selected instruction should either be a root node of the
DAG of the basic block (target of fall-through)and has a destination register that is not live-in in the otherblockor has a destination register that can be renamed
Y.N. Srikant Instruction Scheduling
Delay Branch Scheduling Conditions - 1
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 33/34
Y.N. Srikant Instruction Scheduling
Delay Branch Scheduling Conditions - 2
7/28/2019 Instr Sched 3
http://slidepdf.com/reader/full/instr-sched-3 34/34
Y.N. Srikant Instruction Scheduling