
Introduction to Polyhedral Compilation

Akihiro Hayashi, Jun Shirako (Rice University)

Outline
- High-level Summary
- Theory
- Compilers and Tools

HIGH-LEVEL SUMMARY

Parallel Computing
- The first priority is “performance”

[Figure: supercomputers, personal computers, smartphones, and embedded devices. Pictures borrowed from commons.wikimedia.org, www.hirt-japan.info]

Parallel programming is hard…

[Figure: target hardware and the challenges it poses. Multi-core CPUs: cores with SIMD units, L1/L2/L3 caches, and DRAM (registers are fastest, DRAM is slowest); many-core GPUs: many small cores, an L2 cache, and DRAM. Challenges: exploiting SIMD, scheduling tasks on CPUs, optimizing data locality, and utilizing accelerators.]

A gap between domain experts and hardware

[Figure: the software stack between domain experts and hardware. Application domain (domain experts) at the top; programming languages, compilers, and runtimes in the middle; hardware (concurrency experts) at the bottom. Domain experts want to get significant performance improvement easily (performance portability), yet it is hard to exploit the full capability of the hardware.]

We believe languages and compilers are very important!

A review of literature
- Automatic parallelizing compilers
  - IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
- Parallel languages
  - Language-based: Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
  - Directive-based: OpenMP, OpenACC, OmpSs, …
  - Library-based: Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …

From the perspective of compilers…

- Compilers are among the most complicated pieces of software:
  - Pointer Analysis
  - Scalar Optimizations
  - Loop Transformations
  - Vectorization/SIMDization
  - Scheduling
  - Exploiting accelerators
  - …

Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/

What are compilers doing?

[Figure: Programs -> (Parsing) -> Intermediate Representation (e.g., AST) -> (Optimizations) -> “Optimized” Code]

Program:
  x = a + b;
  y = a + b;
  z = x + y;

“Optimized” code:
  x = a + b;
  y = x;
  z = x + y;

What are compilers doing?

- A compiler can modify a program (e.g., change the execution order of statements) as long as it maintains the semantics of the program.

[Figure: the same Programs -> AST -> “Optimized” Code pipeline as above; this slide additionally shows the statements reordered in the output (e.g., z = x + y; x = a + b;).]

Examples of optimizations: scalar optimizations

Common Subexpression Elimination (CSE):
  x = a + b;        x = a + b;
  y = a + b;   =>   y = x;
  z = x + y;        z = x + y;

Constant Propagation, then Dead Code Elimination:
  a = 0;            a = 0;            a = 0;
  if (a) {    =>    if (0) {    =>
    ...               ...
  }                 }

Examples of optimizations: loop permutation (interchange)

Original (offset access, faster on CPUs):
  for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
      b[i][j] = a[i][j];
    }
  }

Interchanged (stride access, slower on CPUs):
  for (j = 0; j < N; j++) {
    for (i = 0; i < M; i++) {
      b[i][j] = a[i][j];
    }
  }

Examples of optimizations: loop fusion/distribution

Fused (better temporal locality on CPUs):
  for (i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
    d[i] = a[i] + e[i];
  }

Distributed (good for vectorization on CPUs):
  for (i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
  }
  for (i = 0; i < N; i++) {
    d[i] = a[i] + e[i];
  }

Which is better depends on the loop size N.

The phase-ordering problem
- Which order is better?

Dead Code Elimination first, then Constant Propagation:
  a = 0;              a = 0;              a = 0;
  if (a) {   =DCE=>   if (a) {   =CP=>    if (0) {
    ...                 ...                 ...
  }                   }                   }

Constant Propagation first, then Dead Code Elimination:
  a = 0;              a = 0;
  if (a) {   =CP=>    if (0) {   =DCE=>   a = 0;
    ...                 ...
  }                   }

AST vs. The Polyhedral Model

[Figure: two representations of the same program (x = a + b; y = a + b; z = x + y;). AST path: Programs -> AST -> “Optimized” Code (x = a + b; y = x; z = x + y;). Polyhedral path (today’s topic): Programs -> Polyhedron (affine inequalities such as i >= 0; i < N; …) -> “Synthesized” Code.]

Why the Polyhedral Model?
- One solution for tackling the phase-ordering problem
- Good for performing a set of loop transformations:
  - Loop permutation
  - Loop fusion/distribution
  - Loop tiling (a small hand-written sketch follows below)
  - …

“The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility” - OpenScop Specification and Library
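Loop tiling is listed above but not demonstrated elsewhere in this deck, so here is a minimal hand-written sketch of what it looks like for a matrix transpose (the function name and the tile size of 32 are arbitrary choices, not from the slides). Each 32x32 block of a and b is reused while it is still in cache, which is the point of the transformation:

  #include <stddef.h>

  /* Tiled matrix transpose: b = a^T for n x n row-major matrices. */
  void transpose_tiled(size_t n, const double *a, double *b) {
      const size_t T = 32;                                    /* tile size (assumption) */
      for (size_t ii = 0; ii < n; ii += T)                    /* loop over tiles */
          for (size_t jj = 0; jj < n; jj += T)
              for (size_t i = ii; i < ii + T && i < n; i++)   /* points inside a tile */
                  for (size_t j = jj; j < jj + T && j < n; j++)
                      b[j * n + i] = a[i * n + j];
  }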

THEORY

The polyhedral model in a nutshell
- The polyhedral transformation = “scheduling” (determining the execution order of statements)
- 3 important things:
  - Domain: a set of instances for a statement
  - Scattering (scheduling): maps an instance to a time stamp
  - Access: maps an instance to array element(s)
- Limitation: only applicable to Static Control Parts (SCoPs) in general
  - Loop bounds and conditionals are affine functions of the surrounding loop iterators

[Figure: overall flow. Program (for (i=1; …) { S1; for (j=1; …) S2; }) -> constraints as affine inequalities (1 ≤ iS1 ≤ 2; 1 ≤ iS2 ≤ 2; 1 ≤ jS2 ≤ 3; iS1 = iS2) and a cost function on dependence distances δe(s, t) = φSj(t) − φSi(s) -> ILP over the schedule coefficients (Ci − Cj ≥ 0, …) -> “synthesized” code.]

Representation of “Domain”

  for (i = 1; i <= 5; i++)
    for (j = 1; j <= 6; j++)
      S1;

- Observations:
  - S1 is executed 30 times (30 instances)
  - Each instance is associated with (i, j)

“The key aspect of the polyhedral model is to consider statement instances.” - OpenScop Specification and Library

Iteration Domain
- A set of constraints to represent the instances of a statement
  - Using iteration vectors (i, j)
  - If those constraints are affine -> a polyhedron

  for (i = 1; i <= 5; i++)
    for (j = 1; j <= 6; j++)
      S1;

Constraints: 1 ≤ i ≤ 5, 1 ≤ j ≤ 6, i.e.

  D_S1 :  [  1  0 -1 ]   [ i ]
          [ -1  0  5 ] . [ j ]  ≥ 0
          [  0  1 -1 ]   [ 1 ]
          [  0 -1  6 ]

Credits: Clint (https://www.ozinenko.com/clint)
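To make the matrix form concrete, here is a small brute-force sketch (not from the slides; the names are hypothetical) that tests D_S1 · (i, j, 1)^T ≥ 0 row by row and prints exactly the 30 instances the loop nest above would execute:

  #include <stdio.h>

  /* D_S1 encodes 1 <= i <= 5 and 1 <= j <= 6 as D * (i, j, 1)^T >= 0. */
  static const int D_S1[4][3] = {
      { 1,  0, -1},   /*  i - 1 >= 0 */
      {-1,  0,  5},   /* -i + 5 >= 0 */
      { 0,  1, -1},   /*  j - 1 >= 0 */
      { 0, -1,  6},   /* -j + 6 >= 0 */
  };

  static int in_domain(int i, int j) {
      for (int r = 0; r < 4; r++)
          if (D_S1[r][0] * i + D_S1[r][1] * j + D_S1[r][2] < 0)
              return 0;                  /* some inequality is violated */
      return 1;
  }

  int main(void) {
      int count = 0;
      /* Scan a box that safely contains the domain, keep the points inside it. */
      for (int i = 0; i <= 7; i++)
          for (int j = 0; j <= 7; j++)
              if (in_domain(i, j)) {
                  printf("S1(%d,%d)\n", i, j);
                  count++;
              }
      printf("%d instances\n", count);   /* prints 30 */
      return 0;
  }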

Representation of “Scheduling”: 1-dimensional schedules
- Function T returns the logical date of each statement

  x = a + b; // S1
  y = a + b; // S2
  z = x + y; // S3

  T_S1 = 0; T_S2 = 1; T_S3 = 2;

[Figure: the three statements placed on a logical time axis at T=0, T=1, T=2.]

Representation of “Scheduling”: multi-dimensional schedules
- Function T returns the logical date of each statement
- Logical dates may be multi-dimensional and are compared in lexicographical order (cf. clocks: days, hours, minutes, seconds)

  x = a + b;              // S1
  for (i = 0; i < 2; i++) {
    a[i] = x;             // S2
  }
  z = x + y;              // S3

  T_S1    = (0);
  T_S2(0) = (1, 0);
  T_S2(1) = (1, 1);
  T_S3    = (2);

  T_S1 ≺ T_S2 ≺ T_S3  ⇔  (0) ≺ (1, i) ≺ (2)

[Figure: S1 at logical time T=0, the two instances of S2 (i=0, i=1) at T=1, and S3 at T=2.]
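Lexicographic comparison of logical dates is just a left-to-right comparison of their components. A minimal sketch (not from the slides; lex_lt is a hypothetical helper, and shorter dates are assumed to be padded with trailing zeros by the caller):

  /* Returns 1 if date a strictly precedes date b lexicographically, 0 otherwise. */
  static int lex_lt(const int *a, const int *b, int n) {
      for (int k = 0; k < n; k++) {
          if (a[k] < b[k]) return 1;    /* decided by the first differing component */
          if (a[k] > b[k]) return 0;
      }
      return 0;                         /* equal dates: not strictly before */
  }

  /* Example: T_S2(0) = (1, 0) precedes T_S2(1) = (1, 1), which precedes T_S3 = (2, 0). */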

Representation of “Scheduling”: multi-dimensional schedules (cont’d)
- The date of S2 can be parameterized by its iteration vector (recall the iteration domain: 0 ≤ i < 2):

  T_S1    = (0);
  T_S2(i) = (1, i);
  T_S3    = (2);

Representation of “Scheduling”: multi-dimensional schedules (cont’d)

  x = a + b;                 // S1
  for (i = 0; i < 2; i++) {
    a[i] = x;                // S2
  }
  for (i = 0; i < 2; i++) {
    for (j = 0; j < 3; j++) {
      b[i][j] += a[i];       // S3
    }
  }

  T_S1      = (0);
  T_S2(i)   = (1, i);
  T_S3(i,j) = (2, i, j);

[Figure: S1 at logical time T=0, the instances of S2 at T=1 (i=0, i=1), and the instances of S3 at T=2, ordered by (i, j).]

Loop transformations with schedules
- A new schedule is obtained by applying a transformation matrix to the iteration vector.

Original:
  for (i = 0; i < 2; i++) {
    for (j = 0; j < 3; j++) {
      b[i][j] = ...; // S1
    }
  }
Original schedule: T_S1(i, j) = (i, j)

New schedule (identity transformation, code unchanged):
  T_S1(i, j) = (i, j)

  T_S1(i,j) = [ 1 0 ] [ i ]  =  [ i ]
              [ 0 1 ] [ j ]     [ j ]

Loop transformations with schedules: Loop Reversal

Original:
  for (i = 0; i < 2; i++) {
    for (j = 0; j < 3; j++) {
      b[i][j] = ...; // S1
    }
  }
Original schedule: T_S1(i, j) = (i, j)

New schedule: T_S1(i, j) = (-i, j)

  T_S1(i,j) = [ -1 0 ] [ i ]  =  [ -i ]
              [  0 1 ] [ j ]     [  j ]

New code (i_new = -i_old, i.e., i_old -> -i_new):
  for (i = -1; i <= 0; i++) {
    for (j = 0; j < 3; j++) {
      b[-i][j] = ...; // S1
    }
  }

Loop transformations with schedules: Loop Permutation

Original:
  for (i = 0; i < 2; i++) {
    for (j = 0; j < 3; j++) {
      b[i][j] = ...; // S1
    }
  }
Original schedule: T_S1(i, j) = (i, j)

New schedule: T_S1(i, j) = (j, i)

  T_S1(i,j) = [ 0 1 ] [ i ]  =  [ j ]
              [ 1 0 ] [ j ]     [ i ]

New code:
  for (j = 0; j < 3; j++) {
    for (i = 0; i < 2; i++) {
      b[i][j] = ...; // S1
    }
  }

Loop transformations with schedules: Loop Skewing

Original:
  for (i = 1; i <= 5; i++) {
    for (j = 1; j <= 5; j++) {
      a[i][j] = a[i-1][j+1]; // S1
    }
  }
Original schedule: T_S1(i, j) = (i, j)

New schedule: T_S1(i, j) = (i, i+j)

  T_S1(i,j) = [ 1 0 ] [ i ]  =  [ i   ]
              [ 1 1 ] [ j ]     [ i+j ]

New code (j_new = i + j_old, i.e., j_old -> j_new - i):
  for (i = 1; i <= 5; i++) {
    for (j = i+1; j <= i+5; j++) {
      a[i][j-i] = a[i-1][j-i+1]; // S1
    }
  }

Loop transformations with schedules: Loop Skewing (cont’d)

  T_S1(i,j) = [ 1 0 ] [ i ]
              [ 1 1 ] [ j ]

Execution order under the skewed schedule (i, i+j):
  (1,2); (1,3); (1,4); (1,5); (1,6); (2,3); (2,4); (2,5); (2,6); (2,7); (3,4); (3,5); (3,6); (3,7); (3,8); (4,5); …
Execution order under the original schedule (i, j):
  (1,1); (1,2); (1,3); (1,4); (1,5); (2,1); (2,2); (2,3); (2,4); (2,5); (3,1); (3,2); (3,3); (3,4); (3,5); …

[Figure: the iteration space with its dependences and the two execution orders, visualized with Clint.]
Credits: Clint (https://www.ozinenko.com/clint)

Scalar Dimensions in schedules
- 2d+1 format (d loop dimensions + d+1 scalar dimensions)
- Can represent/transform imperfectly nested loops
  - e.g., loop fusion/distribution

  for (i = 0; i < 2; i++) {
    s[i] = ...;              // S1
    for (j = 0; j < 3; j++)
      a[i][j] = ...;         // S2
  }
  for (i = 0; i < 2; i++)
    for (j = 0; j < 3; j++)
      b[i] = ...;            // S3

  T_S1(i)   = (0, i, 0);
  T_S2(i,j) = (0, i, 1, j, 0);
  T_S3(i,j) = (1, i, 0, j, 0);

Loop transformations with schedules: loop fusion w/ scalar dimensions

Original:
  for (i = 0; i < 2; i++)
    for (j = 0; j < 3; j++)
      a[i] = ...; // S1
  for (i = 0; i < 2; i++)
    for (j = 0; j < 3; j++)
      b[i] = ...; // S2
Original schedule:
  T_S1(i,j) = (0, i, 0, j);
  T_S2(i,j) = (1, i, 0, j);

New (fused):
  for (i = 0; i < 2; i++) {
    for (j = 0; j < 3; j++)
      a[i] = ...; // S1
    for (j = 0; j < 3; j++)
      b[i] = ...; // S2
  }
New schedule:
  T_S1(i,j) = (0, i, 0, j);
  T_S2(i,j) = (0, i, 1, j);

The new schedule of S2 as a transformation (coefficient matrix applied to the iteration vector, plus scalar dimensions):

  T_S2(i,j) = [ 0 0 ] [ i ]   [ 0 ]     [ 0 ]
              [ 1 0 ] [ j ] + [ 0 ]  =  [ i ]
              [ 0 0 ]         [ 1 ]     [ 1 ]
              [ 0 1 ]         [ 0 ]     [ j ]

Schedules in general

  T_S(i) = ( φS1(i) )     ( C_11  C_12  …  C_1m )           ( C_10 )
           ( φS2(i) )  =  ( C_21  C_22  …  C_2m )  ( i )  + ( C_20 )
           (   ...  )     (  ...   ...       ... )          (  ... )
           ( φSd(i) )     ( C_d1  C_d2  …  C_dm )           ( C_d0 )

- A d × m_S matrix of per-statement coefficients is applied to the iteration vector i, and a d-vector of offsets (the scalar dimensions) is added.
- d = 2·m_S + 1, where m_S is the size of the iteration vector.
- Example schedule: (0, i, 0, j).


Goal: Compute the coefficients and offsets for each statement
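As a concrete illustration (not from the slides; all names are hypothetical), applying such a schedule to one statement instance is just a matrix-vector product plus the offset vector:

  #include <stdio.h>

  #define D 5   /* schedule dimensions: 2*M + 1 */
  #define M 2   /* iteration vector size */

  /* Logical date T_S(i) = C*i + c0: one row per schedule dimension. */
  static void apply_schedule(const int C[D][M], const int c0[D],
                             const int iter[M], int date[D]) {
      for (int k = 0; k < D; k++) {
          date[k] = c0[k];                      /* offset (scalar dimension) */
          for (int l = 0; l < M; l++)
              date[k] += C[k][l] * iter[l];     /* coefficient * iterator */
      }
  }

  int main(void) {
      /* T_S2(i,j) = (0, i, 1, j, 0) from the scalar-dimension slide above. */
      const int C_S2[D][M] = {{0,0},{1,0},{0,0},{0,1},{0,0}};
      const int c0_S2[D]   = {0, 0, 1, 0, 0};
      const int iter[M]    = {1, 2};            /* the instance S2(1,2) */
      int date[D];
      apply_schedule(C_S2, c0_S2, iter, date);
      printf("(%d, %d, %d, %d, %d)\n",
             date[0], date[1], date[2], date[3], date[4]);   /* (0, 1, 1, 2, 0) */
      return 0;
  }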

Legality of transformations
- Are all transformations valid? NO!

Original:
  for (i = 1; i <= 10; i++) {
    s[i] = ...;              // S1
    for (j = 0; j < 3; j++)
      a[i][j] = s[i];        // S2
  }
  T_S1(i)   = (0, i, 0);
  T_S2(i,j) = (0, i, 1, j, 0);

New (invalid: S2 now reads s[i] before S1 writes it):
  for (i = 1; i <= 10; i++) {
    for (j = 0; j < 3; j++)
      a[i][j] = s[i];        // S2
    s[i] = ...;              // S1
  }
  T_S2(i,j) = (0, i, 0, j, 0);
  T_S1(i)   = (0, i, 1);

Dependences
- Three types of dependence:
  - Read-After-Write (RAW): a = 1; then b = a;
  - Write-After-Read (WAR): b = a; then a = 1;
  - Write-After-Write (WAW): a = 1; then a = 2;
- Dependences are computed from the domain, access, and schedule
  - Transformation = find a new schedule that satisfies all dependences (a loop-carried version of the three kinds is sketched below)
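The same three kinds also appear as loop-carried dependences, which is what polyhedral dependence analysis reasons about. A small illustrative loop (not from the slides; the function and array names are hypothetical, and c is assumed to have at least n+1 elements):

  void dependence_kinds(int n, int *a, int *b, int *c, int *d) {
      for (int i = 1; i < n; i++) {
          a[i] = a[i-1] + 1;   /* RAW: reads a[i-1], written in the previous iteration   */
          b[i] = c[i+1];       /* WAR: reads c[i+1], which the next iteration overwrites */
          c[i] = 0;
          d[0] = i;            /* WAW: every iteration writes d[0]; the last write wins  */
      }
  }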

Dependence polyhedron
- Dependence polyhedron P_e: a set of (in)equalities
  - A general and accurate representation of instance-wise dependences

  for (i = 1; i <= 10; i++) {
    s[i] = ...;              // S1
    for (j = 0; j < 3; j++)
      a[i][j] = s[i];        // S2
  }

Constraints: iS1 = iS2; 1 ≤ iS1 ≤ 10; 1 ≤ iS2 ≤ 10; 0 ≤ jS2 < 3, i.e.

  [  1 -1  0  0 ]   [ iS1 ]
  [  1  0  0 -1 ]   [ iS2 ]
  [ -1  0  0 10 ] . [ jS2 ]
  [  0  0  1  0 ]   [  1  ]
  [  0  0 -1  2 ]

  where the first row is an equality (= 0) and the remaining rows are inequalities (≥ 0).
  (The equality iS1 = iS2 can equivalently be written as iS1 − iS2 ≥ 0 ∧ iS2 − iS1 ≥ 0.)

[Figure: the iteration domains D_S1 and D_S2 and the dependence between S1 and S2, visualized with Clint.]
Credits: Clint (https://www.ozinenko.com/clint)
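For intuition, the dependence polyhedron above can also be enumerated by brute force: pair every write instance S1(iS1) with every read instance S2(iS2, jS2) that touches the same element of s. A small sketch (not from the slides; names are hypothetical):

  #include <stdio.h>

  /* Enumerate the instance-wise RAW dependences for:
   *   S1(i):   writes s[i],  1 <= i <= 10
   *   S2(i,j): reads  s[i],  1 <= i <= 10, 0 <= j < 3
   * A dependence exists exactly when both instances access the same element,
   * i.e. iS1 = iS2 (the equality row of the dependence polyhedron). */
  int main(void) {
      for (int iS1 = 1; iS1 <= 10; iS1++)
          for (int iS2 = 1; iS2 <= 10; iS2++)
              for (int jS2 = 0; jS2 < 3; jS2++)
                  if (iS1 == iS2)
                      printf("S1(%d) -> S2(%d,%d)\n", iS1, iS2, jS2);
      return 0;
  }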

Legality of transformations
- Dependence polyhedron: P_e
- Legality: ∀ (s, t) ∈ P_e, with s ∈ D_Si and t ∈ D_Sj:  T_Si(s) ≺ T_Sj(t)
  - If the “source” instance must happen before the “target” instance in the original program, the transformed program must preserve this property (it must satisfy the dependence)

Putting it all together
- Goal: compute all coefficients and offsets such that
    ∀ (s, t) ∈ P_e, with s ∈ D_S1 and t ∈ D_S2:  T_S1(s) ≺ T_S2(t)

Schedules (unknown coefficients and offsets):

  T_S1(iS1) = ( C11_S1 )               ( C10_S1 )
              ( C21_S1 ) ( iS1 )   +   ( C20_S1 )
              ( C31_S1 )               ( C30_S1 )

  T_S2(iS2,jS2) = ( C11_S2  C12_S2 )             ( C10_S2 )
                  ( C21_S2  C22_S2 )             ( C20_S2 )
                  ( C31_S2  C32_S2 ) ( iS2 )  +  ( C30_S2 )
                  ( C41_S2  C42_S2 ) ( jS2 )     ( C40_S2 )
                  ( C51_S2  C52_S2 )             ( C50_S2 )

Dependence polyhedron P_e (iS1 = iS2; 1 ≤ iS1 ≤ 10; 1 ≤ iS2 ≤ 10; 0 ≤ jS2 < 3):

  [  1 -1  0  0 ]   [ iS1 ]
  [  1  0  0 -1 ]   [ iS2 ]
  [ -1  0  0 10 ] . [ jS2 ]
  [  0  0  1  0 ]   [  1  ]
  [  0  0 -1  2 ]

  where the first row is an equality (= 0) and the remaining rows are inequalities (≥ 0).

Linearizing the legality condition (the Pluto algorithm)
- The legality condition (on the iteration vectors):
    δ(s, t) = (c1^Sj, c2^Sj, …, cm^Sj) · t − (c1^Si, c2^Si, …, cm^Si) · s ≥ 0,   (s, t) ∈ P_e
- Uniform dependences: the distance between two dependent iterations is a constant (δ(s, t) is a constant)
    e.g., i → i + 1  ⇒  δ(s, t) is constant
- Non-uniform dependences: the distance between two dependent iterations varies (δ(s, t) is a function of the iterators, e.g., of j)
    e.g., i → i + j  ⇒  δ(s, t) depends on j
  - Apply Farkas’ lemma: an affine form is nonnegative everywhere in the dependence polyhedron iff it can be written as a nonnegative affine combination of the polyhedron’s inequalities:
    (c1^Sj, …, cm^Sj) · t − (c1^Si, …, cm^Si) · s ≡ λ_e0 + Σ_{k=1..m_e} λ_ek · P_ek,   λ_ek ≥ 0
    where each P_ek is one inequality of the dependence polyhedron.

Cost Function & Objective Function (the Pluto algorithm)
- Compute all coefficients and offsets under the legality condition: solve an ILP problem
- Cost function = transformation policy
  - Pluto’s cost function is based on the dependence distance
      δ(s, t) = (c1^Sj, …, cm^Sj) · t − (c1^Si, …, cm^Si) · s,   (s, t) ∈ P_e
    - Fuse loops as much as possible
    - Push loops carrying dependences to the inner levels
  - Also used in ISL (Polly, PPCG, …)
- Objective function:
    minimize_≺ (u1, w, c1^Sj, c2^Sj)
  - Iteratively find linearly independent solutions
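Where do u1 and w come from? As I understand the Pluto formulation (consistent with the "constraints using parameter N that bound the dependence distances" in the step-by-step example below, though the exact form here is my reconstruction), every dependence distance is bounded from above by an affine function of the program parameters, and that bound is what is minimized lexicographically:

  \forall (s, t) \in P_e:\quad u_1 \cdot N + w \;-\; \delta(s, t) \;\ge\; 0, \qquad u_1 \ge 0,\ w \ge 0
  \text{objective: } \operatorname{minimize}_{\prec} \, (u_1, w, c_1, c_2, \ldots)

Farkas’ lemma is applied to this bounding inequality in the same way as to the legality condition, which is how constraints such as w ≥ c2^S1 and 3·u1 + w ≥ c1^S1 − c2^S1 arise below.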

Step-by-step example

  for (i = 0; i < N; i++) {
    for (j = 1; j < N; j++) {
      a[i][j] = a[j][i] + a[i][j-1]; // S1
    }
  }

Unrolled instances:
  a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
  a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
  a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
  ...
  a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
  a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
  a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
  ...
  a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
  a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
  a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
  ...
  a[3][1] = a[1][3] + a[3][0]; // S1(3,1)

Three dependences (is, js) → (it, jt): Dependence 1 (RAW), Dependence 2 (RAW), Dependence 3 (WAR).

Step-by-step example: Legality Constraints 1 (the Pluto algorithm)
- Dependence 1: RAW (flow dependence), e.g.
    Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
    Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
- Dependence polyhedron Pe1: is = it, js = jt − 1, 0 ≤ it ≤ N − 1, 2 ≤ jt ≤ N
- Legality constraint: δ(s, t) = (c1^S1, c2^S1) · (it, jt) − (c1^S1, c2^S1) · (is, js) ≥ 0 for (s, t) ∈ Pe1

  c1^S1·it + c2^S1·jt − (c1^S1·is + c2^S1·js) = c1^S1·it + c2^S1·jt − (c1^S1·it + c2^S1·(jt − 1)) ≥ 0
  ⇒ c2^S1 ≥ 0

Step-by-step example: Legality Constraints 2 (the Pluto algorithm)
- Dependence 2: RAW (flow dependence), e.g.
    Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)   (writes a[1][2])
    Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)   (reads a[1][2])
- Dependence polyhedron Pe2: is = jt, js = it, 1 ≤ it ≤ N, 2 ≤ jt ≤ N, it − jt ≥ 1
- Legality constraint: δ(s, t) = (c1^S1, c2^S1) · (it, jt) − (c1^S1, c2^S1) · (is, js) ≥ 0 for (s, t) ∈ Pe2

  c1^S1·it + c2^S1·jt − (c1^S1·is + c2^S1·js) = c1^S1·it + c2^S1·jt − (c1^S1·jt + c2^S1·it) ≥ 0
  ⇒ (c1^S1 − c2^S1)·it + (c2^S1 − c1^S1)·jt ≥ 0,   1 ≤ it ≤ N, 2 ≤ jt ≤ N, it − jt ≥ 1

  Farkas’ lemma + Fourier-Motzkin elimination  ⇒  c1^S1 − c2^S1 ≥ 0

Step-by-step example: Putting it all together (the Pluto algorithm)
- Dependence 1:         c2^S1 ≥ 0,   w ≥ c2^S1
- Dependences 2 & 3:    c1^S1 − c2^S1 ≥ 0,   u1 ≥ 0,   u1 ≥ c1^S1 − c2^S1,   3·u1 + w ≥ c1^S1 − c2^S1
  (the constraints involving u1 and w use the parameter N to bound the dependence distances)
- Avoiding the zero vector:   c1^S1 + c2^S1 ≥ 1
- Objective function:   minimize_≺ (u1, w, c1^S1, c2^S1)  →  (0, 1, 1, 1)
- Then find a linearly independent solution for the next schedule dimension:

  T_S1(i,j) = [ 1 1 ] [ i ]
              [ 1 0 ] [ j ]
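For intuition (this code is not on the slides), the resulting schedule T_S1(i, j) = (i+j, i) corresponds to a skewed and permuted loop nest. The bounds below were derived by hand; a polyhedral code generator such as CLooG (mentioned in the tools section) would emit an equivalent nest automatically:

  /* New schedule (t1, t2) = (i+j, i), i.e. i = t2, j = t1 - t2,
   * over the original domain 0 <= i < N, 1 <= j < N. */
  void s1_transformed(int N, double a[N][N]) {
      for (int t1 = 1; t1 <= 2 * N - 2; t1++) {
          int lb = (t1 - N + 1 > 0) ? t1 - N + 1 : 0;   /* t2 >= 0  and t2 >= t1-(N-1) */
          int ub = (t1 - 1 < N - 1) ? t1 - 1 : N - 1;   /* t2 <= N-1 and t2 <= t1-1    */
          for (int t2 = lb; t2 <= ub; t2++)
              a[t2][t1 - t2] = a[t1 - t2][t2] + a[t2][t1 - t2 - 1];  /* S1(i=t2, j=t1-t2) */
      }
  }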

Summary
- The polyhedral transformation = “scheduling” (determining the execution order of statements)
- 3 important things:
  - Domain: a set of instances for a statement
  - Scattering (scheduling): maps an instance to a time stamp
  - Access: maps an instance to array element(s)
- Limitation: only applicable to Static Control Parts (SCoPs) in general
  - Loop bounds and conditionals are affine functions of the surrounding loop iterators

[Figure: the same overall flow as before: program -> affine constraints and a cost function on dependence distances -> ILP -> “synthesized” code.]

COMPILERS AND TOOLS

Polyhedral Compilers & Tools
- PoCC (The Polyhedral Compiler Collection)
  - http://web.cs.ucla.edu/~pouchet/software/pocc/
  - Clan: extracts a polyhedral IR from the source code
  - Candl: a dependence analyzer
  - LetSee: a legal transformation space explorer
  - PLuTo: an automatic parallelizer and locality optimizer
  - CLooG: code generation from the polyhedral IR

Polyhedral Compilers & Tools
- Polly
  - http://polly.llvm.org/
  - ISL: Integer Set Library (including a code generator)
- Clay/Chrole/Clint
  - https://www.ozinenko.com/projects
  - Clay: “Chunky Loop Alteration wizardrY”
  - Chrole: “Recovering high-level syntactic description of the automatically computed polyhedral optimization”
  - Clint: “Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model”

Clint

[Figure: Clint’s interactive visualization of an iteration space.]

Further readings
- Fundamentals
  - OpenScop Specification: http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
  - ISL: https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
- Pluto algorithm
  - U. Bondhugula, “Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model” (PhD Dissertation, 2010)
  - U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, “A Practical Automatic Polyhedral Parallelizer and Locality Optimizer” [PLDI’08]
- Polly
  - T. Grosser, S. Verdoolaege, A. Cohen, “Polyhedral AST Generation Is More Than Scanning Polyhedra” [ACM TOPLAS 2015]
- Polyhedral model + AST-based cost function
  - J. Shirako, L.-N. Pouchet, V. Sarkar, “Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations” [SC’14]
- GPU code generation
  - S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, “Polyhedral Parallel Code Generation for CUDA” [ACM TACO 2013]
  - J. Shirako, A. Hayashi, V. Sarkar, “Optimized Two-level Parallelization for GPU Accelerators Using the Polyhedral Model” [CC’17]
