PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING
Mark Stephenson & Saman Amarasinghe
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Lab
INTRODUCTION & MOTIVATION
Compiler heuristics rely on detailed knowledge of the system
Compiler interactions are not well understood
Architectures are complex
• Pentium® (3M transistors) vs. Pentium 4 (55M transistors), with features such as:
• Superscalar execution
• Hyperthreading
• Speculative execution
• Improved FPU
HEURISTIC DESIGN
The current approach to heuristic development is somewhat ad hoc
Can compiler writers learn anything from baseball?
• Is it feasible to deal with empirical data?
• Can we use statistics and machine learning to build heuristics?
CASE STUDY
Loop unrolling
• Code expansion can degrade performance
• Increased live ranges, register pressure
• A myriad of interactions with other passes
Requires categorization into multiple classes
• i.e., what’s the unroll factor?
ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)
if (trip_count_tn == NULL) {
    UINT32 ntimes = MAX(1, OPT_unroll_times - 1);
    INT32 body_len = BB_length(head);
    while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
        ntimes--;
    Set_unroll_factor(ntimes);
} else { … }
ORC’S HEURISTIC (KNOWN TRIPCOUNT)
} else {
    BOOL const_trip = TN_is_constant(trip_count_tn);
    INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
    INT32 body_len = BB_length(head);
    CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
    if (const_trip && CG_LOOP_unroll_fully &&
        (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
         CG_LOOP_unrolled_size_max == 0 &&
         CG_LOOP_unroll_times_max >= const_trip_count)) {
        Set_unroll_fully();
        Set_unroll_factor(const_trip_count);
    } else {
        UINT32 ntimes = OPT_unroll_times;
        ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
        if (!is_power_of_two(ntimes))
            ntimes = 1 << log2(ntimes);
        while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
            ntimes /= 2;
        if (const_trip) {
            while (ntimes > 1 && const_trip_count < 2 * ntimes)
                ntimes /= 2;
        }
        Set_unroll_factor(ntimes);
    }
}
SUPERVISED LEARNING
Supervised learning algorithms try to find a function F(X) → Y
• X: a vector of characteristics that define a loop
• Y: the empirically found best unroll factor
[Figure: F(X) maps loops to unroll factors 1–8]
EXTRACTING THE DATA
Extract features
• Most features readily available in ORC
• Kitchen-sink approach
Finding the labels (best unroll factors)
• Added an instrumentation pass
• Assembly instructions inserted to time loops; calls to a library at all exit points
• Compile and run at all unroll factors (1..8)
• For each loop, choose the best one as the label
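The labeling step above can be sketched in a few lines; the helper name and the timing numbers below are hypothetical stand-ins for the instrumented runs:

```python
# Sketch of the labeling step: time each loop at every unroll factor
# (1..8) and pick the fastest run as that loop's label.
def best_unroll_factor(times_by_factor):
    """times_by_factor: dict mapping unroll factor -> measured loop time."""
    return min(times_by_factor, key=times_by_factor.get)

# Hypothetical measurements for one loop (time at unroll factors 1..8)
measured = {1: 4.1, 2: 2.3, 3: 2.4, 4: 1.9, 5: 2.1, 6: 2.0, 7: 2.2, 8: 2.5}
label = best_unroll_factor(measured)   # -> 4
```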
LEARNING ALGORITHMS
Prototyped in Matlab
Two learning algorithms classified our data set well
• Near neighbors
• Support Vector Machine (SVM)
Both algorithms classify quickly
• Train at the factory
• No increase in compilation time
NEAR NEIGHBORS
[Figure: loops plotted by # branches vs. # FP operations, with regions labeled “unroll” and “don’t unroll”]
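The near-neighbors idea can be sketched as a 1-nearest-neighbor classifier over loop features; the features and training loops below are hypothetical:

```python
# Minimal nearest-neighbor sketch: predict the unroll factor of the
# closest training loop in feature space. Data is made up for illustration.
import math

def nearest_neighbor(train, query):
    """train: list of (feature_vector, unroll_factor); query: feature_vector."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    features, factor = min(train, key=lambda ex: dist(ex[0], query))
    return factor

# Hypothetical training set: (# FP operations, # branches) -> best factor
train = [((20, 1), 8), ((2, 4), 1), ((10, 2), 4)]
print(nearest_neighbor(train, (18, 1)))   # closest to (20, 1) -> 8
```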
SUPPORT VECTOR MACHINES
Map the original feature space into a higher-dimensional space (using a kernel)
Find a hyperplane that maximally separates the data
SUPPORT VECTOR MACHINES
[Figure: left, loops plotted by # branches vs. # FP operations are not linearly separable; right, after mapping to # branches² vs. # FP operations, the “unroll” and “don’t unroll” classes are separated by a hyperplane]
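The kernel mapping can be illustrated with the classic one-dimensional example (synthetic data, not from the paper): no single threshold on x separates the classes, but after lifting each point to (x, x²) a threshold on the x² coordinate does:

```python
# Classic illustration of the kernel idea: the 1-D points below are not
# linearly separable (the "near" class sits between the "far" points),
# but lifting each point x to (x, x**2) makes a single hyperplane
# (here: x**2 = 2) separate the two classes. Data is synthetic.
def feature_map(x):
    return (x, x * x)   # lift a 1-D point into 2-D

def classify(x):
    _, x_sq = feature_map(x)
    return "near" if x_sq < 2 else "far"

points = [(-3, "far"), (-1, "near"), (0, "near"), (1, "near"), (3, "far")]
```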
PREDICTION ACCURACY
Leave-one-out cross validation
Filter out ambiguous training examples
• Only keep examples where one unroll factor is obviously better (≥1.05x speedup)
• Throw away obviously noisy examples
           NN    SVM   ORC
Accuracy   62%   65%   16%
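Leave-one-out cross validation itself is simple to sketch; this uses a 1-nearest-neighbor stand-in classifier and hypothetical loop features:

```python
# Leave-one-out cross validation: for each loop, train on all the other
# loops and test on the held-out one; accuracy is the fraction predicted
# correctly. Uses a 1-nearest-neighbor stand-in classifier.
import math

def loocv_accuracy(data):
    """data: list of (feature_vector, label)."""
    def predict(train, query):
        def dist(a, b):
            return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        return min(train, key=lambda ex: dist(ex[0], query))[1]
    correct = 0
    for i, (x, y) in enumerate(data):
        held_out_train = data[:i] + data[i + 1:]
        if predict(held_out_train, x) == y:
            correct += 1
    return correct / len(data)

# Hypothetical loops: (# FP ops, # branches) -> best unroll factor
data = [((20, 1), 8), ((22, 1), 8), ((2, 4), 1), ((3, 5), 1)]
print(loocv_accuracy(data))   # -> 1.0 on this toy set
```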
REALIZING SPEEDUPS (SWP DISABLED)
[Chart: speedup over ORC, ranging from −10% to 40%, for NN v. ORC, SVM v. ORC, and Oracle v. ORC]
FEATURE SELECTION
Feature selection is a way to identify the best features
Start with loads of features
Small feature sets are better
• Learning algorithms run faster
• Less prone to overfitting the training data
• Useless features can confuse learning algorithms
FEATURE SELECTION (CONT.): MUTUAL INFORMATION SCORE
Measures the reduction of uncertainty in one variable given knowledge of another variable
Does not tell us how features interact with each other
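As a sketch, mutual information between a discretized feature and the label can be computed directly from empirical counts; the feature/label pairs below are synthetic:

```python
# Mutual information I(F; Y) = sum over (f, y) of
#   p(f, y) * log2( p(f, y) / (p(f) * p(y)) )
# estimated from empirical counts of feature/label pairs.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[f] / n) * (py[y] / n)))
               for (f, y), c in pxy.items())

# A feature identical to the label carries maximal information...
print(mutual_information(["hi", "lo", "hi", "lo"], [8, 1, 8, 1]))  # -> 1.0
# ...while a constant feature carries none.
print(mutual_information(["a", "a", "a", "a"], [8, 1, 8, 1]))      # -> 0.0
```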
FEATURE SELECTION (CONT.): GREEDY FEATURE SELECTION
Choose the single best feature
Choose another feature that, together with the best feature, most improves classification accuracy
…
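The greedy procedure can be sketched as forward selection against a scoring function; the feature names and accuracy numbers below are hypothetical:

```python
# Greedy forward feature selection: repeatedly add the candidate feature
# that most improves the score of the already-selected set. `score` is a
# stand-in for classification accuracy using those features.
def greedy_select(features, score, k):
    """features: candidate names; score(subset) -> accuracy; pick k greedily."""
    selected = []
    for _ in range(k):
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical accuracies: FP ops helps most alone; nest level adds most on top.
scores = {("fp_ops",): 0.59, ("nest_level",): 0.40, ("operands",): 0.45,
          ("fp_ops", "nest_level"): 0.70, ("fp_ops", "operands"): 0.63}
print(greedy_select(["fp_ops", "nest_level", "operands"],
                    lambda s: scores.get(tuple(s), 0.0), 2))
# -> ['fp_ops', 'nest_level']
```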
FEATURE SELECTION: THE BEST FEATURES
Rank   Mutual Information Score    Greedy Feature Selection with SVM
1      # FP operations             # FP operations (0.59)
2      # Operands                  Loop nest level (0.49)
3      Instruction fan-in in DAG   # Operands (0.34)
4      # Live ranges               # Branches (0.20)
5      # Memory operations         # Memory operations (0.13)
RELATED WORK
Monsifrot et al., “A Machine Learning Approach to Automatic Production of Compiler Heuristics.” 2002
Calder et al., “Evidence-Based Static Branch Prediction Using Machine Learning.” 1997
Cavazos et al., “Inducing Heuristics to Decide Whether to Schedule.” 2004
Moss et al., “Learning to Schedule Straight-Line Code.” 1997
Cooper et al., “Optimizing for Reduced Code Space using Genetic Algorithms.” 1999
Puppin et al., “Adapting Convergent Scheduling using Machine Learning.” 2003
Stephenson et al., “Meta Optimization: Improving Compiler Heuristics with Machine Learning.” 2003
CONCLUSION
Supervised classification can effectively find good heuristics
• Even for multi-class problems
• SVM and near neighbors perform well
• Potentially have a big impact
Spent very little time tuning the learning parameters
Let a machine learning algorithm tell us which features are best
THE END
SOFTWARE PIPELINING
ORC has been tuned with SWP in mind
• Every major release of ORC has had a different unrolling heuristic for SWP
• Currently 205 lines long
Can we learn a heuristic that outperforms ORC’s SWP unrolling heuristic?
REALIZING SPEEDUPS (SWP ENABLED)
[Chart: improvement over ORC, ranging from −10% to 25%, for NN v. ORC, SVM v. ORC, and Oracle v. ORC]
HURDLES
The compiler writer must extract features
Acquiring labels takes time
• Instrumentation library
• ~2 weeks to collect data
Predictions are confined to the training labels
Have to tweak learning algorithms
Noise