Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen and Nicolas Vasilache

ALCHEMY, INRIA Futurs / University of Paris-Sud XI

March 12, 2007

Fifth International Symposium on Code Generation and Optimization, San Jose, California

Outline

Context of this study:
- Focus on loop nest optimization for regular loops
- Automatic method for parallelism extraction / loop transformation
- Combine iterative methods with the power of the polyhedral model
- Solution independent of the compiler and the target machine

Our contribution:
- Search space construction
  - 1 point in the space ⇔ 1 distinct legal program version
  - Suitable for various exploration methods
- Performance
  - 99% of the best speedup attained within 20 runs of a dedicated heuristic
  - Wall-clock optimal transformation discoverable on small kernels

Scheduling in the Polyhedral Model: A Motivating Example

One-Dimensional Scheduling

Original schedule

Original code:

    for (i=0; i<n; ++i) {
      S1(i);
      for (j=0; j<n; ++j)
        S2(i,j);
    }

Schedule:

\[ \theta_{S1} = i, \qquad \theta_{S2} = i \]

Generated code (identical to the original code):

    for (i=0; i<n; ++i) {
      S1(i);
      for (j=0; j<n; ++j)
        S2(i,j);
    }

- Specify the outermost loop only
- The initial outermost loop is i

Distribute loops

Schedule:

\[ \theta_{S1} = i, \qquad \theta_{S2} = i + n \]

Generated code:

    for (i=0; i<n; ++i)
      S1(i);
    for (i=n; i<2*n; ++i)
      for (j=0; j<n; ++j)
        S2(i-n,j);

- Specify the outermost loop only
- All instances of S1 are executed before the first S2 instance

Distribute loops + Interchange loops for S2

Schedule:

\[ \theta_{S1} = i, \qquad \theta_{S2} = j + n \]

Generated code:

    for (i=0; i<n; ++i)
      S1(i);
    for (j=n; j<2*n; ++j)
      for (i=0; i<n; ++i)
        S2(i,j-n);

- Specify the outermost loop only
- The outermost loop for S2 becomes j

Transformation   Description
--------------   -----------
reversal         Changes the direction in which a loop traverses its iteration range
skewing          Makes the bounds of a given loop depend on an outer loop counter
interchange      Exchanges two loops in a perfectly nested loop, a.k.a. permutation
peeling          Extracts one iteration of a given loop
shifting         Allows loops to be reordered
fusion           Fuses two loops, a.k.a. jamming
distribution     Splits a single loop nest into many, a.k.a. fission or splitting

One-Dimensional Scheduling: affine schedules

    for (i=0; i<n; ++i) {
      S1(i);
      for (j=0; j<n; ++j)
        S2(i,j);
    }

- A schedule is an affine function of the iteration vector and the parameters

\[ \theta_{S1}(\vec{x}_{S1}) = t_{1_{S1}} \cdot i_{S1} + t_{2_{S1}} \cdot n + t_{3_{S1}} \cdot 1 \]
\[ \theta_{S2}(\vec{x}_{S2}) = t_{1_{S2}} \cdot i_{S2} + t_{2_{S2}} \cdot j_{S2} + t_{3_{S2}} \cdot n + t_{4_{S2}} \cdot 1 \]

- For $-1 \le t \le 1$, there are $3^7 = 2187$ possible schedules
- But only 129 legal distinct schedules

The same loop nest with concrete statement bodies (S1: s[i] = 0; S2: s[i] = s[i]+a[i][j]*x[j]):

    for (i=0; i<n; ++i) {
      s[i] = 0;
      for (j=0; j<n; ++j)
        s[i] = s[i]+a[i][j]*x[j];
    }


Scheduling in the Polyhedral Model: Overview

Our Objective

1. Search space construction
   - Efficiently construct a space of all legal, distinct affine schedules

                         matmult    locality   fir        h264       crout
     $\vec{\imath}$-Bounds   −1,1       −1,1       0,1        −1,1       −3,3
     c-Bounds            −1,1       −1,1       0,3        0,4        −3,3
     #Sched.             1.9×10^4   5.9×10^4   1.2×10^7   1.8×10^8   2.6×10^15
       ⇓
     #Legal              912        6561       792        360        798

   - Rely on the polyhedral model and Integer Linear Programming to guarantee completeness and correctness of the space properties
   - The search space will encompass unique, distinct compositions of reversal, skewing, interchange, fusion, peeling, shifting, distribution

2. Search space exploration
   - Perform an exhaustive scan to discover the wall-clock optimal schedule, and evidence of the intricacy of the best transformation
   - Build an efficient heuristic to accelerate the space traversal


Search Space Construction: Preliminaries

Polyhedral Representation of Programs

Static Control Parts:
- Loops have affine control only

- Iteration domain: represented as integer polyhedra

    for (i=1; i<=n; ++i)
      for (j=1; j<=n; ++j)
        if (i<=n-j+2)
          s[i] = ...

\[
\mathcal{D}_{S1}:\quad
\begin{bmatrix}
 1 &  0 & 0 & -1 \\
-1 &  0 & 1 &  0 \\
 0 &  1 & 0 & -1 \\
 0 & -1 & 1 &  0 \\
-1 & -1 & 1 &  2
\end{bmatrix}
\cdot
\begin{pmatrix} i \\ j \\ n \\ 1 \end{pmatrix}
\ge \vec{0}
\]

- Memory accesses: static references, represented as affine functions of $\vec{x}_S$ and $\vec{p}$

    for (i=0; i<n; ++i) {
      s[i] = 0;
      for (j=0; j<n; ++j)
        s[i] = s[i]+a[i][j]*x[j];
    }

\[
f_s(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}
\]
\[
f_a(\vec{x}_{S2}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}
\]
\[
f_x(\vec{x}_{S2}) = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix} \cdot \begin{pmatrix} \vec{x}_{S2} \\ n \\ 1 \end{pmatrix}
\]

- Data dependence between S1 and S2: a subset of the Cartesian product of $\mathcal{D}_{S1}$ and $\mathcal{D}_{S2}$ (exact analysis)

    for (i=1; i<=3; ++i) {
      s[i] = 0;
      for (j=1; j<=3; ++j)
        s[i] = s[i] + 1;
    }

\[
\mathcal{D}_{S1 \delta S2}:\quad
\begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix}
\cdot
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} = 0,
\qquad
\begin{bmatrix}
 1 &  0 &  0 & -1 \\
-1 &  0 &  0 &  3 \\
 0 &  1 &  0 & -1 \\
 0 & -1 &  0 &  3 \\
 0 &  0 &  1 & -1 \\
 0 &  0 & -1 &  3
\end{bmatrix}
\cdot
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \ge \vec{0}
\]

Figure: S1 iterations and S2 iterations plotted along i.

- Reduced dependence graph labeled by dependence polyhedra

Search Space Construction: Way to Go

Space Construction

Diagram (labels): Affine Schedules; Legal Distinct Schedules.

Causality condition

Property (Causality condition for schedules). Given $R \delta S$, $\theta_R$ and $\theta_S$ are legal iff for each pair of instances in dependence:

\[ \theta_R(\vec{x}_R) < \theta_S(\vec{x}_S) \]

Equivalently:

\[ \Delta_{R,S} = \theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1 \ge 0 \]

Farkas Lemma

Lemma (Affine form of Farkas lemma). Let $\mathcal{D}$ be a nonempty polyhedron defined by $A\vec{x} + \vec{b} \ge \vec{0}$. Then any affine function $f(\vec{x})$ is non-negative everywhere in $\mathcal{D}$ iff it is a positive affine combination:

\[ f(\vec{x}) = \lambda_0 + \vec{\lambda}^T (A\vec{x} + \vec{b}), \quad \text{with } \lambda_0 \ge 0 \text{ and } \vec{\lambda} \ge \vec{0}. \]

$\lambda_0$ and $\vec{\lambda}^T$ are called the Farkas multipliers.
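A one-variable instance may help make the lemma concrete. The polyhedron and the functions below are chosen purely for illustration and do not come from the talk. Take $\mathcal{D} = \{x \mid x \ge 0,\ 3 - x \ge 0\}$, so that any candidate has the form $\lambda_0 + \lambda_1 x + \lambda_2 (3 - x)$:

\[
\begin{aligned}
f(x) = 2x + 1 &= 1 + 2 \cdot x + 0 \cdot (3 - x) && (\lambda_0, \lambda_1, \lambda_2) = (1, 2, 0) \\
g(x) = 6 - 2x &= 0 + 0 \cdot x + 2 \cdot (3 - x) && (\lambda_0, \lambda_1, \lambda_2) = (0, 0, 2) \\
h(x) = x - 1 &: \quad \lambda_1 - \lambda_2 = 1,\ \ \lambda_0 + 3\lambda_2 = -1 && \text{no solution with } \lambda_0, \lambda_2 \ge 0
\end{aligned}
\]

The functions $f$ and $g$ are non-negative on $\mathcal{D}$ and admit non-negative multipliers, whereas $h$ is negative at $x = 0$ and, consistently, no non-negative multipliers exist for it.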


Diagram (labels): Affine Schedules; Legal Distinct Schedules; Causality condition and Farkas Lemma; Valid Farkas Multipliers; Many to one.


Identification

\[
\theta_S(\vec{x}_S) - \theta_R(\vec{x}_R) - 1
= \lambda_0 + \vec{\lambda}^T \left( D_{R,S} \begin{pmatrix} \vec{x}_R \\ \vec{x}_S \end{pmatrix} + \vec{d}_{R,S} \right) \ge 0
\]

\[
\mathcal{D}_{R \delta S}:\quad
\begin{cases}
i_R: & -t_{1_R} = \lambda_{D_{1,1}} - \lambda_{D_{1,2}} + \lambda_{D_{1,3}} - \lambda_{D_{1,4}} \\
i_S: & t_{1_S} = -\lambda_{D_{1,1}} + \lambda_{D_{1,2}} + \lambda_{D_{1,5}} - \lambda_{D_{1,6}} \\
j_S: & t_{2_S} = \lambda_{D_{1,7}} - \lambda_{D_{1,8}} \\
n: & t_{3_S} - t_{2_R} = \lambda_{D_{1,4}} + \lambda_{D_{1,6}} + \lambda_{D_{1,8}} \\
1: & t_{4_S} - t_{3_R} - 1 = \lambda_{D_{1,0}}
\end{cases}
\]

Projection

- Solve the constraint system
- Use an (optimized) Fourier-Motzkin projection algorithm
  - Reduce redundancy
  - Detect implicit equalities
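To illustrate what projection does, consider a deliberately tiny system that is hypothetical and much smaller than the one produced by identification: two schedule coefficients $t_1, t_2$ tied to two multipliers $\lambda_1, \lambda_2$.

\[
\begin{cases}
t_1 = \lambda_1 - \lambda_2 \\
t_2 = \lambda_2 \\
\lambda_1 \ge 0,\ \lambda_2 \ge 0
\end{cases}
\quad\Longrightarrow\quad
\begin{cases}
\lambda_2 = t_2 \\
\lambda_1 = t_1 + t_2
\end{cases}
\quad\Longrightarrow\quad
\begin{cases}
t_2 \ge 0 \\
t_1 + t_2 \ge 0
\end{cases}
\]

The equalities are used to substitute the multipliers away, and the non-negativity of each multiplier becomes a constraint on the schedule coefficients alone. Conversely, any $(t_1, t_2)$ satisfying the projected system yields valid multipliers $\lambda_2 = t_2$ and $\lambda_1 = t_1 + t_2$, so nothing is lost. When a multiplier is bounded only by inequalities, Fourier-Motzkin elimination instead combines each of its lower bounds with each of its upper bounds.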


Diagram (labels): Affine Schedules; Legal Distinct Schedules; Causality condition and Farkas Lemma; Valid Farkas Multipliers; Identification and Projection; Valid Transformation Coefficients; Bijection.

- One point in the space ⇔ one set of legal schedules w.r.t. the dependence

Overview

Algorithm:
- Add constraints obtained for each dependence
- Bound the space
- Search space: set of linear constraints on the schedule coefficients (i.e. a Z-polytope)

- To each integral point in the space corresponds a distinct program version where the semantics is preserved

Benchmark   $\vec{\imath}$-Bounds   #Sched       #Legal   Time
---------   --------   ------       ------   ----
matmult     −1,1       1.9×10^4     912      0.029
locality    −1,1       5.9×10^4     6561     0.022
fir         0,1        1.2×10^7     792      0.047
h264        −1,1       1.8×10^8     360      0.024
crout       −3,3       2.6×10^15    798      0.046

Search Space Exploration: Framework for Iterative Optimization

Workflow

Figure (workflow diagram, labels only): Source Code; Static Analysis; SCoP representation; Polyhedral representation of SCoP; Space Construction; Bounded search space; Space Exploration; Kernel Generation; Unit Generation; Code generation; C compilable code; Compilation; Run; Feedback from hardware counter(s); Target Code; Iterative compilation and run of base source code with transformed SCoP; Polyhedral computing libraries (PIPLib, PolyLib, CLooG).

- CLooG: http://www.cloog.org
- PiPLib: http://www.piplib.org
- PolyLib: http://icps.u-strasbg.fr/polylib

Search Space Exploration: Exhaustive Scan

Performance Distribution [1/2]

Figure: Performance distribution for matmult and locality (cycles vs. transformation identifier, with the original version marked).

Performance Distribution [2/2]

Figure: The effect of the compiler on crout (cycles vs. transformation identifier, with the original version marked): (a) GCC -O3, (b) ICC -fast.

Performance Comparison

Figure: Best Version vs Original

Search Space Exploration: Heuristic Scan

Heuristic Scan

Propose a decoupling heuristic:
- The general "form" of the schedule is embedded in the iterator coefficients
- Decouple the schedule:

\[
\theta_S(\vec{x}_S) = \begin{pmatrix} \vec{\imath} & \vec{p} & c \end{pmatrix} \begin{pmatrix} \vec{x}_S \\ \vec{n} \\ 1 \end{pmatrix}
\]

- Parameters and constant coefficients can be seen as a refinement

Addressing scalability to larger SCoPs:
1. impose a static or dynamic limit on the number of runs (limit to the $\vec{\imath}$ part)
2. replace an exhaustive enumeration of the $\vec{\imath}$ combinations by a limited set of random draws in the $\vec{\imath}$ space.


Results

Figure: Comparison between random and decoupling heuristics (maximum speedup achieved, in %, vs. number of runs) for locality, matmult and mvt.

Figure: Performance distributions (cycles vs. transformation identifier, with the original version marked) for locality, matmult and matvecttransp.

Conclusion

- Optimizing and / or enabling transformation framework on top of the compiler
- Encouraging speedups, fast heuristic convergence
- On small kernels, the optimal transformation can be discovered

Ongoing and future work:
- Couple with state-of-the-art feedback-directed iterative methods
- Part II: multidimensional schedules
- Integrate into GCC GRAPHITE branch


Questions: A Transformation Example

Intricacy of the Transformed Code

Optimal Transformation for locality, GCC 4 -O3, P4 Xeon

Statements:

    S1: B[j] = A[j]
    S2: C[j] = A[j + N]

Original code:

    for (i=0;i<=M;i++) {
      for (j=0;j<=M;j++) {
        S1(i,j);
        S2(i,j);
      }
    }

Transformed code:

    for (c1=-N;c1<=min(-2,M-N);c1++)
      for (j=0;j<=M;j++)
        S1(c1+N,j);
    for (c1=-1;c1<=M-N;c1++) {
      for (j=0;j<=M;j++)
        S2(c1+1,j);
      for (j=0;j<=M;j++)
        S1(c1+N,j);
    }
    for (c1=max(M-N+1,-1);c1<=M-1;c1++)
      for (j=0;j<=M;j++)
        S2(c1+1,j);

→ 19.4% speedup, without vectorization