PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING
Mark Stephenson & Saman Amarasinghe
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Lab
INTRODUCTION & MOTIVATION
Compiler heuristics rely on detailed knowledge of the system
Compiler interactions are not well understood
Architectures are complex
• Pentium® (3M transistors) vs. Pentium 4 (55M transistors), with features such as:
• Superscalar execution
• Hyperthreading
• Speculative execution
• Improved FPU
HEURISTIC DESIGN
The current approach to heuristic development is somewhat ad hoc
Can compiler writers learn anything from baseball?
• Is it feasible to deal with empirical data?
• Can we use statistics and machine learning to build heuristics?
CASE STUDY
Loop unrolling
• Code expansion can degrade performance
• Increased live ranges, register pressure
• A myriad of interactions with other passes
Requires categorization into multiple classes
• i.e., what’s the unroll factor?
ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)
if (trip_count_tn == NULL) {
    UINT32 ntimes = MAX(1, OPT_unroll_times - 1);
    INT32 body_len = BB_length(head);
    while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
        ntimes--;
    Set_unroll_factor(ntimes);
} else { … }
ORC’S HEURISTIC (KNOWN TRIPCOUNT)
} else {
    BOOL const_trip = TN_is_constant(trip_count_tn);
    INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
    INT32 body_len = BB_length(head);
    CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
    if (const_trip && CG_LOOP_unroll_fully &&
        (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
         CG_LOOP_unrolled_size_max == 0 &&
         CG_LOOP_unroll_times_max >= const_trip_count)) {
        Set_unroll_fully();
        Set_unroll_factor(const_trip_count);
    } else {
        UINT32 ntimes = OPT_unroll_times;
        ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
        if (!is_power_of_two(ntimes))
            ntimes = 1 << log2(ntimes);
        while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
            ntimes /= 2;
        if (const_trip) {
            while (ntimes > 1 && const_trip_count < 2 * ntimes)
                ntimes /= 2;
        }
        Set_unroll_factor(ntimes);
    }
}
SUPERVISED LEARNING
Supervised learning algorithms try to find a function F(X) → Y
• X: a vector of characteristics that define a loop
• Y: the empirically found best unroll factor
[Figure: F(X) maps loops to unroll factors 1–8]
EXTRACTING THE DATA
Extract features
• Most features readily available in ORC
• Kitchen-sink approach
Finding the labels (best unroll factors)
• Added an instrumentation pass
• Assembly instructions inserted to time loops; calls to a library at all exit points
• Compile and run at all unroll factors (1..8)
• For each loop, choose the best one as the label
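The labeling step above can be sketched in a few lines; the helper name and the timing numbers below are hypothetical stand-ins for the instrumented runs:

```python
# Sketch of the labeling step: time each loop at every unroll factor
# (1..8) and pick the fastest run as that loop's label.
def best_unroll_factor(times_by_factor):
    """times_by_factor: dict mapping unroll factor -> measured loop time."""
    return min(times_by_factor, key=times_by_factor.get)

# Hypothetical measurements for one loop (time at unroll factors 1..8)
measured = {1: 4.1, 2: 2.3, 3: 2.4, 4: 1.9, 5: 2.1, 6: 2.0, 7: 2.2, 8: 2.5}
label = best_unroll_factor(measured)   # -> 4
```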
LEARNING ALGORITHMS
Prototyped in Matlab
Two learning algorithms classified our data set well
• Near neighbors
• Support Vector Machine (SVM)
Both algorithms classify quickly
• Train at the factory
• No increase in compilation time
NEAR NEIGHBORS
[Figure: loops plotted by # branches vs. # FP operations, with regions labeled “unroll” and “don’t unroll”]
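The near-neighbors idea can be sketched as a 1-nearest-neighbor classifier over loop features; the features and training loops below are hypothetical:

```python
# Minimal nearest-neighbor sketch: predict the unroll factor of the
# closest training loop in feature space. Data is made up for illustration.
import math

def nearest_neighbor(train, query):
    """train: list of (feature_vector, unroll_factor); query: feature_vector."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    features, factor = min(train, key=lambda ex: dist(ex[0], query))
    return factor

# Hypothetical training set: (# FP operations, # branches) -> best factor
train = [((20, 1), 8), ((2, 4), 1), ((10, 2), 4)]
print(nearest_neighbor(train, (18, 1)))   # closest to (20, 1) -> 8
```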
SUPPORT VECTOR MACHINES
Map the original feature space into a higher-dimensional space (using a kernel)
Find a hyperplane that maximally separates the data
SUPPORT VECTOR MACHINES
[Figure: left, loops plotted by # branches vs. # FP operations are not linearly separable; right, after mapping to # branches² vs. # FP operations, the “unroll” and “don’t unroll” classes are separated by a hyperplane]
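The kernel mapping can be illustrated with the classic one-dimensional example (synthetic data, not from the paper): no single threshold on x separates the classes, but after lifting each point to (x, x²) a threshold on the x² coordinate does:

```python
# Classic illustration of the kernel idea: the 1-D points below are not
# linearly separable (the "near" class sits between the "far" points),
# but lifting each point x to (x, x**2) makes a single hyperplane
# (here: x**2 = 2) separate the two classes. Data is synthetic.
def feature_map(x):
    return (x, x * x)   # lift a 1-D point into 2-D

def classify(x):
    _, x_sq = feature_map(x)
    return "near" if x_sq < 2 else "far"

points = [(-3, "far"), (-1, "near"), (0, "near"), (1, "near"), (3, "far")]
```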
PREDICTION ACCURACY
Leave-one-out cross validation
Filter out ambiguous training examples
• Only keep examples where one unroll factor is obviously better (≥1.05x speedup)
• Throw away obviously noisy examples
           NN    SVM   ORC
Accuracy   62%   65%   16%
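Leave-one-out cross validation itself is simple to sketch; this uses a 1-nearest-neighbor stand-in classifier and hypothetical loop features:

```python
# Leave-one-out cross validation: for each loop, train on all the other
# loops and test on the held-out one; accuracy is the fraction predicted
# correctly. Uses a 1-nearest-neighbor stand-in classifier.
import math

def loocv_accuracy(data):
    """data: list of (feature_vector, label)."""
    def predict(train, query):
        def dist(a, b):
            return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        return min(train, key=lambda ex: dist(ex[0], query))[1]
    correct = 0
    for i, (x, y) in enumerate(data):
        held_out_train = data[:i] + data[i + 1:]
        if predict(held_out_train, x) == y:
            correct += 1
    return correct / len(data)

# Hypothetical loops: (# FP ops, # branches) -> best unroll factor
data = [((20, 1), 8), ((22, 1), 8), ((2, 4), 1), ((3, 5), 1)]
print(loocv_accuracy(data))   # -> 1.0 on this toy set
```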
REALIZING SPEEDUPS (SWP DISABLED)
[Chart: speedup over ORC, ranging from −10% to 40%, for NN v. ORC, SVM v. ORC, and Oracle v. ORC]
FEATURE SELECTION
Feature selection is a way to identify the best features
Start with loads of features
Small feature sets are better
• Learning algorithms run faster
• Less prone to overfitting the training data
• Useless features can confuse learning algorithms
FEATURE SELECTION (CONT.): MUTUAL INFORMATION SCORE
Measures the reduction of uncertainty in one variable given knowledge of another variable
Does not tell us how features interact with each other
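As a sketch, mutual information between a discretized feature and the label can be computed directly from empirical counts; the feature/label pairs below are synthetic:

```python
# Mutual information I(F; Y) = sum over (f, y) of
#   p(f, y) * log2( p(f, y) / (p(f) * p(y)) )
# estimated from empirical counts of feature/label pairs.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[f] / n) * (py[y] / n)))
               for (f, y), c in pxy.items())

# A feature identical to the label carries maximal information...
print(mutual_information(["hi", "lo", "hi", "lo"], [8, 1, 8, 1]))  # -> 1.0
# ...while a constant feature carries none.
print(mutual_information(["a", "a", "a", "a"], [8, 1, 8, 1]))      # -> 0.0
```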
FEATURE SELECTION (CONT.): GREEDY FEATURE SELECTION
Choose the single best feature
Choose another feature that, together with the best feature, most improves classification accuracy
…
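The greedy procedure can be sketched as forward selection against a scoring function; the feature names and accuracy numbers below are hypothetical:

```python
# Greedy forward feature selection: repeatedly add the candidate feature
# that most improves the score of the already-selected set. `score` is a
# stand-in for classification accuracy using those features.
def greedy_select(features, score, k):
    """features: candidate names; score(subset) -> accuracy; pick k greedily."""
    selected = []
    for _ in range(k):
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical accuracies: FP ops helps most alone; nest level adds most on top.
scores = {("fp_ops",): 0.59, ("nest_level",): 0.40, ("operands",): 0.45,
          ("fp_ops", "nest_level"): 0.70, ("fp_ops", "operands"): 0.63}
print(greedy_select(["fp_ops", "nest_level", "operands"],
                    lambda s: scores.get(tuple(s), 0.0), 2))
# -> ['fp_ops', 'nest_level']
```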
FEATURE SELECTION: THE BEST FEATURES
Rank   Mutual Information Score    Greedy Feature Selection with SVM
1      # FP operations             # FP operations (0.59)
2      # Operands                  Loop nest level (0.49)
3      Instruction fan-in in DAG   # Operands (0.34)
4      # Live ranges               # Branches (0.20)
5      # Memory operations         # Memory operations (0.13)
RELATED WORK
Monsifrot et al., “A Machine Learning Approach to Automatic Production of Compiler Heuristics.” 2002
Calder et al., “Evidence-Based Static Branch Prediction Using Machine Learning.” 1997
Cavazos et al., “Inducing Heuristics to Decide Whether to Schedule.” 2004
Moss et al., “Learning to Schedule Straight-Line Code.” 1997
Cooper et al., “Optimizing for Reduced Code Space using Genetic Algorithms.” 1999
Puppin et al., “Adapting Convergent Scheduling using Machine Learning.” 2003
Stephenson et al., “Meta Optimization: Improving Compiler Heuristics with Machine Learning.” 2003
CONCLUSION
Supervised classification can effectively find good heuristics
• Even for multi-class problems
• SVM and near neighbors perform well
• Potentially have a big impact
Spent very little time tuning the learning parameters
Let a machine learning algorithm tell us which features are best
THE END
SOFTWARE PIPELINING
ORC has been tuned with SWP in mind
• Every major release of ORC has had a different unrolling heuristic for SWP
• Currently 205 lines long
Can we learn a heuristic that outperforms ORC’s SWP unrolling heuristic?
REALIZING SPEEDUPS (SWP ENABLED)
[Chart: improvement over ORC, ranging from −10% to 25%, for NN v. ORC, SVM v. ORC, and Oracle v. ORC]
HURDLES
The compiler writer must extract features
Acquiring labels takes time
• Instrumentation library
• ~2 weeks to collect data
Predictions are confined to the training labels
Have to tweak learning algorithms
Noise