Optimal Superblock Scheduling Using Enumeration
DESCRIPTION
Optimal Superblock Scheduling Using Enumeration. Ghassan Shobaki, CS Dept. Kent Wilken, ECE Dept. University of California, Davis www.ece.ucdavis.edu/aco/. Outline. Background Existing Solutions Optimal Solution Experimental Results Summary and Future Work.
TRANSCRIPT
Optimal Superblock Scheduling Using Enumeration
Ghassan Shobaki, CS Dept.
Kent Wilken, ECE Dept.
University of California, Davis
www.ece.ucdavis.edu/aco/
2
Outline
- Background
- Existing Solutions
- Optimal Solution
- Experimental Results
- Summary and Future Work
3
Overview
- “Instruction Scheduling is the most fundamental ILP-oriented phase”. [Josh Fisher et al., “Embedded Computing”]
- Scheduler tries to find an instruction order that minimizes pipeline stalls
- Schedule must preserve program’s semantics and honor hardware constraints
4
Elements of Instruction Scheduling
- Region Formation
- Schedule Construction (the focus of our research)
5
Region Formation
- Scheduler’s scope is a sub-graph of the program’s control flow graph (CFG)
- Local scheduling: single basic block
- Global scheduling: multiple basic blocks:
  - Trace
  - Superblock and hyperblock
  - Treegion
  - General acyclic: e.g. Wavefront (2000)
6
Schedule Construction
- NP-Hard problem for realistic machines
- Heuristic Solutions: Virtually all production compilers and most research
- Optimal Approaches: Recent research
  - Local: Integer Programming and enumeration
  - Global: Integer Programming
7
The Superblock
- Single-entry multiple-exit sequence of basic blocks
- Data and control dependencies and allowed code motions are represented by a Directed Acyclic Graph (DAG)
8
Example Superblock DAG
[Figure: dependence DAG over instructions A through I with edge latencies; the exits are labeled with probabilities 0.3, 0.2, and 0.5.]
9
List Scheduling
- Most common method in practice
- Approximate greedy algorithm that runs fast in practice
- Data-ready instructions stored in a priority list
- Priorities assigned according to heuristics
- If ready list is not empty, schedule top priority instruction
- Else schedule a stall
- Advance to next issue slot
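The loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' scheduler: the function name `list_schedule`, the `succs` and `priority` dictionaries, and the small DAG are all hypothetical, and the machine model is reduced to a single `issue_width` parameter.

```python
def list_schedule(succs, priority, issue_width=1):
    """Greedy list scheduling sketch (hypothetical helper).

    succs: dict instruction -> list of (successor, latency) pairs.
    priority: dict instruction -> number; larger is scheduled first.
    Returns dict instruction -> issue cycle.
    """
    # Number of not-yet-scheduled predecessors of each instruction.
    preds_left = {n: 0 for n in succs}
    for n in succs:
        for s, _lat in succs[n]:
            preds_left[s] += 1
    earliest = {n: 0 for n in succs}  # earliest data-ready cycle
    schedule = {}
    cycle = 0
    while len(schedule) < len(succs):
        # Ready list: all predecessors scheduled and their latencies elapsed.
        ready = [n for n in succs
                 if n not in schedule and preds_left[n] == 0
                 and earliest[n] <= cycle]
        ready.sort(key=lambda n: (-priority[n], n))
        # Issue the top-priority instructions; an empty list means a stall.
        for n in ready[:issue_width]:
            schedule[n] = cycle
            for s, lat in succs[n]:
                preds_left[s] -= 1
                earliest[s] = max(earliest[s], cycle + lat)
        cycle += 1  # advance to the next issue slot
    return schedule


# Hypothetical DAG: a -> c (latency 2), b -> c (latency 1), c -> d (latency 1).
demo_succs = {"a": [("c", 2)], "b": [("c", 1)], "c": [("d", 1)], "d": []}
demo_priority = {"a": 3, "b": 2, "c": 1, "d": 0}
single_issue = list_schedule(demo_succs, demo_priority)
dual_issue = list_schedule(demo_succs, demo_priority, issue_width=2)
```

On a dual-issue machine the sketch issues a and b together and then stalls for one cycle, because c is not data-ready until cycle 2.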
10
Critical-Path Heuristic
[Figure: the example superblock DAG (instructions A through I, exit probabilities 0.3, 0.2, 0.5) with a critical-path priority on each node.]
Cycle  Instruction
0      A
1      B
2      G
3      C
4      D
5      H
6      E
7      F
8      I
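The per-node priorities in the figure cannot be mapped back to nodes from this transcript, so the sketch below uses a hypothetical DAG. It assumes the usual definition of the critical-path priority: the longest latency-weighted path from an instruction to any leaf. The name `critical_path_priorities` is illustrative only.

```python
def critical_path_priorities(succs):
    """Longest latency-weighted distance from each node to a leaf.

    succs: dict instruction -> list of (successor, latency) pairs.
    """
    memo = {}

    def dist(n):
        if n not in memo:
            # A leaf has distance 0; otherwise take the longest outgoing path.
            memo[n] = max((lat + dist(s) for s, lat in succs[n]), default=0)
        return memo[n]

    return {n: dist(n) for n in succs}


# Hypothetical DAG: a -> c (latency 2), b -> c (latency 1), c -> d (latency 1).
demo_dag = {"a": [("c", 2)], "b": [("c", 1)], "c": [("d", 1)], "d": []}
demo_cp = critical_path_priorities(demo_dag)
```

A list scheduler using these values as priorities picks, among the data-ready instructions, the one with the longest remaining dependence chain.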
11
Superblock Heuristics
- Critical Path
- Successive Retirement
- Dependence height and speculative yield (DHASY)
- G*
- Speculative Hedge
- Balance Scheduling
12
Optimal Scheduling
- Can make improvement over heuristics
- Accurate heuristic methods are already complex
- In some applications, longer compile times can be tolerated
- Reference for evaluating accuracy of heuristics and studying ILP limits
13
Objective
S : A given schedule
P_i : Probability of exit i
D_i : Delay of exit i from its lower bound L_i
E : Number of side exits
Find a schedule with minimum cost, where exit E+1 is the final exit:
$$\mathrm{Cost}(S) = \sum_{i=1}^{E+1} P_i D_i$$
14
Cost Function Example: CP
[Figure: the example superblock DAG (instructions A through I, exit probabilities 0.3, 0.2, 0.5) with a cycle range on each node: [0,0], [6,7], [1,2], [2,3], [3,4], [3,6], [1,4], [2,5], [8,8].]
Cycle  Instruction
0      A
1      B
2      G
3      C
4      D
5      H
6      E
7      F
8      I
Cost = 0.3*1 + 0.2*1 + 0.5*0 = 0.5
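The arithmetic above can be reproduced with a short sketch. The exit data (C with probability 0.3 and lower bound 2, F with 0.2 and 6, I with 0.5 and 8) is read off the branch-combination table and ranges elsewhere in this talk; the function name `schedule_cost` and the dictionary layout are hypothetical, and one instruction per cycle is assumed.

```python
def schedule_cost(issue_cycle, exits):
    """Weighted exit delay: sum of P_i * (cycle_i - L_i) over all exits.

    issue_cycle: dict instruction -> cycle it is scheduled in.
    exits: dict exit instruction -> (probability, lower bound).
    """
    return sum(p * (issue_cycle[b] - lb) for b, (p, lb) in exits.items())


# Exit data as read from the slides (an interpretation of the transcript).
exits = {"C": (0.3, 2), "F": (0.2, 6), "I": (0.5, 8)}

# Critical-path schedule from this slide, one instruction per cycle.
cp_order = ["A", "B", "G", "C", "D", "H", "E", "F", "I"]
cp_cost = schedule_cost({ins: cyc for cyc, ins in enumerate(cp_order)}, exits)

# Schedule reported as optimal later in the talk.
opt_order = ["A", "B", "C", "G", "D", "H", "E", "F", "I"]
opt_cost = schedule_cost({ins: cyc for cyc, ins in enumerate(opt_order)}, exits)
```

Moving C one cycle earlier removes its delay, which is why the second schedule costs 0.2 instead of 0.5.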
15
Optimal Algorithm
[Flowchart: Heuristic Solution -> Lower Bounds -> Cost = 0? YES: Done. NO: Fix Branches -> Enumerate -> Feasible? YES: Done. NO: back to Fix Branches.]
16
Enumeration
- List scheduling with backtracking
- Explores one target length at a time
- A subset of instructions can be fixed
- Branch-and-Bound approach with four feasibility tests (pruning techniques)
  - Node superiority
  - LB tightening
  - History-based domination
  - Relaxed Scheduling
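A rough sketch of list scheduling with backtracking for one target length is given below. It assumes a single-issue machine, and it does not implement any of the four feasibility tests named on the slide; the only pruning is a simple count of remaining issue slots. The name `enumerate_schedule`, its parameters, and the DAG are hypothetical.

```python
def enumerate_schedule(succs, target_length, fixed=None):
    """Search for a single-issue schedule ending by cycle target_length.

    succs: dict instruction -> list of (successor, latency) pairs.
    fixed: optional dict instruction -> cycle it must be issued in.
    Returns dict instruction -> cycle, or None if no schedule exists.
    """
    fixed = fixed or {}
    preds = {n: [] for n in succs}
    for n in succs:
        for s, lat in succs[n]:
            preds[s].append((n, lat))
    schedule = {}

    def ready(n, cycle):
        if n in schedule or (n in fixed and fixed[n] != cycle):
            return False
        return all(p in schedule and schedule[p] + lat <= cycle
                   for p, lat in preds[n])

    def search(cycle):
        remaining = len(succs) - len(schedule)
        if remaining == 0:
            return True
        # Prune: unscheduled instructions must fit in the slots that are left.
        if remaining > target_length - cycle + 1:
            return False
        # Prune: a fixed instruction whose cycle has passed cannot be placed.
        if any(n not in schedule and c < cycle for n, c in fixed.items()):
            return False
        for n in sorted(succs):
            if ready(n, cycle):
                schedule[n] = cycle
                if search(cycle + 1):
                    return True
                del schedule[n]  # backtrack and try another choice
        return search(cycle + 1)  # last choice: leave the slot empty (stall)

    return dict(schedule) if search(0) else None


# Hypothetical DAG: a -> c (latency 1), b -> d (latency 3).
demo = {"a": [("c", 1)], "b": [("d", 3)], "c": [], "d": []}
found = enumerate_schedule(demo, target_length=3)
too_short = enumerate_schedule(demo, target_length=2)
pinned = enumerate_schedule(demo, target_length=3, fixed={"a": 0})
```

In this example the search first tries a in cycle 0, fails because d can then no longer finish by cycle 3, and backtracks to issue b first. Pinning a to cycle 0 makes target length 3 infeasible, which is the kind of answer the enumerator returns when a subset of instructions is fixed.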
17
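Enumeration Example
[Figure: DAG over instructions I1 through I5 with four edges of latency 2, and an enumeration tree for target length = 4 over I1 through I5 and a stall; one path is marked "Infeasible!" and followed by a backtrack.]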
18
Branch Combinations & Subset Sum
- Branch Combination Problem is NP-Complete!
- Can be reduced to Subset Sum
- In practice, the number of branches and ranges are small.
- Solved efficiently using Dynamic Programming
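The transcript does not show the dynamic program itself. The sketch below is one possible subset-sum style formulation, offered as an assumption: it computes which total costs are reachable by some combination of branch delays and keeps one combination per cost. Probabilities are scaled to integer percentages so that equal costs merge exactly; the name `reachable_costs` and its parameters are hypothetical.

```python
def reachable_costs(weights, max_delays):
    """Subset-sum style DP over branch delays.

    weights: exit probability of each branch, as an integer percentage.
    max_delays: largest delay allowed for each branch (range width).
    Returns dict total cost -> one delay combination reaching that cost.
    """
    reachable = {0: ()}
    for w, dmax in zip(weights, max_delays):
        extended = {}
        for cost, combo in reachable.items():
            for d in range(dmax + 1):
                # Keep the first combination found for each total cost.
                extended.setdefault(cost + w * d, combo + (d,))
        reachable = extended
    return reachable


# Branches C (30%) and F (20%) from the running example, each with a
# one-cycle range; the heuristic schedule costs 50 (that is, 0.5).
heuristic_cost = 50
candidates = [(cost, combo)
              for cost, combo in sorted(reachable_costs([30, 20], [1, 1]).items())
              if cost < heuristic_cost]
```

For the running example this yields the combinations (0, 0), (0, 1), and (1, 0) in increasing cost order, which are the only ones that could improve on the heuristic.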
19
Complete Example
[Figure: the example superblock DAG (instructions A through I, exit probabilities 0.3, 0.2, 0.5) with a cycle range on each node: [0,0], [6,7], [1,2], [2,3], [3,4], [3,6], [1,4], [2,5], [8,8].]
- Start with CP heuristic: Cost = 0.5
- Only length 8 is interesting
BranchComb  C  F  Cost
(0, 0)      2  6  0.0
(0, 1)      2  7  0.2
(1, 0)      3  6  0.3
20
Branch Combination (0,0), Cost = 0.0
[Figure: the example superblock DAG with the ranges for this combination: [0,0], [6,6], [1,1], [2,2], [3,3], [3,5], [1,4], [2,5], [8,8].]
Partial schedule:
0 : A
1 : B
2 : C
3 : D
4 : G
5 : E
Relaxed Sched: H marked X, Infeasible
21
Branch Combination (0,1), Cost = 0.2
[Figure: the example superblock DAG with the ranges for this combination: [0,0], [7,7], [1,1], [2,2], [3,4], [3,6], [1,4], [2,5], [8,8], and the enumeration tree explored for it.]
Optimal Schedule: A, B, C, G, D, H, E, F, I with cost 0.2
22
Experimental Results
- Superblocks imported from GCC using SPEC CPU2000, FP and INT
- Scheduled for 4 machine models:
  - single-issue
  - dual-issue
  - quad-issue
  - six-issue
- Time limit set to 1 second per problem
23
Superblock Statistics
                            FP2000       INT2000
                            Max   Avg    Max   Avg
DAG Size                    1236  24     454   17
Exit Count                  31    2.8    42    3.3
Final-Exit Probability (%)  99    68     99    66
Side-Exit Probability (%)   48    17     49    14
24
INT2000 Results
Issue Rate           1     2     4     6     Avg
Hard Blocks          2513  2131  1685  573   1726
% Timeouts           1.4   0.8   1.1   0.9   1.1
Avg Soln Time (ms)   5     5     9     9     7
% Improved Blocks    85    70    82    81    79
% Cycle Improvement  2.9   2.4   3.5   4.1   3
25
Summary & Future Work
- An optimal superblock scheduling technique has been developed
- About 99% of hard problems solved within 1 sec
- 80% improved
- Next Goal: explore other global regions. Trace is strongest candidate