spire: a methodology for sequential to parallel intermediate ... · async nish next atomic pgas...

SPIRE: A Methodology for Sequential to ParallelIntermediate Representation ExtensionDounia Khaldi Pierre Jouvelot François Irigoin Corinne An ourtCRI, Mathématiques et systèmesMINES ParisTe h35 rue Saint-Honoré, 77300 Fontainebleau, Fran eHiPEAC Computing Systems Week, Paris, Fran eMay 03, 20131/14

Context and MotivationCilk, UPC, X10, Habanero-Java, Chapel, OpenMP, MPI,OpenCL...Parallelism handling in ompilersA Parallel Intermediate RepresentationTrade-o� between expressibility and on iseness of representationGeneri (language-neutral) and simpli ityHuge ompiler platformsGCC (more than 7 million lines of ode)LLVM (more than 1 million lines of ode)PIPS (600 000 lines of ode)ProposalSPIRE: Sequential to Parallel Intermediate Representation Extensionmethodology 2/14

Ri h Related Work !Sequential Parallel ParallelismIR IRAST

GIMPLE GCC (GOMP) Additional built-ins[Merrill, 2003℄ [Novillo, 2006℄ (__sync_fetch_and_add)Habanero-Java HPIR New nodes:[Cavé et al., 2011℄ [Zhao and Sarkar, 2011℄ AsyncRegionEntry,AST IR AsyncRegionExitFinishRegionEntry,FinishRegionExit� InsPIRe [Insieme, ℄ Built-ins� PLASMA [Pai et al., 2010℄ Ve tor operators:add, reduce and par

Graph

Program Parallel Program mgoto ontrol andDependen e Graph Graph [Sarkar and Simons, 1993℄ syn hronization edges[Ferrante et al., 1987℄� Hierar hi al Task Graph New nodes[Girkar and Poly hronopoulos, 1992℄ for tasks� Stream Graph [Choi et al., 2009℄ Nodes for streamsLLVM IR LLVM PIR Metadata[Team, 2010℄ llvm.loop.parallel 3/14

Outline1 Design Approa h2 PIPS (Sequential) IR3 SPIRE, a Sequential to Parallel IR Extension4 SPIRE Operational Semanti s5 Validation: Appli ation to LLVM6 Con lusion 4/14

Design Approa h“parallelism or concurrency are operational concepts

that refer not to the program, but to itsexecution.” [Dijkstra, 1977]

Execution Synchronization MemoryLanguage Parallelism Task Task Point-to- Atomi Model Data dis-tribution reation join point se tionCilk � spawn syn � ilk_lo k Shared �(MIT)Chapel forall begin � syn syn PGAS (on)(Cray) oforall atomi (Lo ales) obeginX10(IBM), forea h asyn �nish next atomi PGAS (at)Habanero-Java(Ri e) future for eget isolated (Pla es)OpenMP omp for omp task omp taskwait � omp riti al Shared private,omp se -tions omp se tion omp barrier ompatomi shared...OpenCL EnqueueND- EnqueueTask Finish events atom_add, Distribu- ReadBu�erRangeKernel EnqueueBarrier ... ted WriteBu�erMPI MPI_Init MPI_spawn MPI_Finalize � � Distribu- MPI_SendMPI_Barrier ted MPI_Re v...SPIRE sequential ,

parallelspawn barrier signal ,

waitatomic Shared,

Distribu-ted

send ,recv 5/14

PIPS (Sequential) IR in Newgeninstruction = call + forloop + sequence;

statement = instruction x declarations:entity*;

entity = name:string x type x initial:value;

forloop = index:entity xlower:expression x upper:expression xstep:expression x body:statement;

sequence = statements:statement*;

6/14

SPIRE: Exe utionexecution = sequential :unit + parallel :unit;Add execution to ontrol onstru ts:

loopsequence

forall I in 1..n dot[i] = 0;

forloop(I,1,n,1,t[i] = 0,parallel )

forall in Chapel, and its SPIRE ore language representation7/14

SPIRE: Syn hronizationsynchronization = none:unit +

spawn:entity + barrier :unit +single :bool + atomic :reference;Add synchronization to statement

mode = OUT_OF_ORDER_EXEC_MODE_ENABLE;c = clCreateCommandQueue (context,

device_id,mode,&err);clEnqueueTask (c, k_A, 0, NULL, NULL);clEnqueueTask (c, k_B, 0, NULL, NULL);clEnqueueBarrier (c);clEnqueueTask (c, k_C, 0, NULL, NULL);

barrier (spawn(zero,k_A(...));spawn(one,k_B(...)));spawn(zero,k_C(...))OpenCL example illustrating spawn and barrier statements, and itsSPIRE ore language representation 8/14

SPIRE: Point-to-Point Syn hronization (Event API)event newEvent (int i);void freeEvent (event e);void signal (event e);void wait (event e);Add event as a new typefinish {

phaser ph=new phaser ();for (j = 1;j <= n;j++){

async phased (ph<SIG_WAIT>){

S; next ; S′;}

}}

barrier (ph=newEvent (-(n-1));forloop(j, 1, n, 1,

spawn(j, S;signal (ph);wait (ph);signal (ph);S′),

parallel );freeEvent (ph)

)A phaser in Habanero-Java, and its SPIRE ore language representation9/14

SPIRE as a Language Transformerκ ∈ Con�guration = Memory × Stmtπ ∈ State = Pro → Con�guration × Pro si ∈ Pro = N ∈ Pro s = P(Pro )dom(π) = {i ∈ Pro /π(i) is de�ned}π[i → (κ, )] the state π extended at i with (κ, )

κ → κ′

π[i → (κ, )] →֒ π[i → (κ′, )]n = ζ(I)mπ[i → ((m,spawn(I,S)), )] →֒

π[i → ((m,nop), ∪ {n})][n → ((m,S), ∅)] 11/14

Validation: LLVMfunction = blocks:block*;block = label:entity x phi_nodes:phi_node* x

instructions:instruction* x terminator;phi_node = call;instruction = call;terminator = conditional_branch + unconditional_branch +

return;conditional_branch = value:entity x label_true:entity x

label_false:entity;unconditional_branch = label:entity;return = value:entity;

12/14

Validation: LLVMfunction = blocks:block*;block = label:entity x phi_nodes:phi_node* x




sum = 42;for (i=0; i<10; i++){

sum = sum + 2;}

entry:br label %bb1bb: ; preds = %bb1%0 = add nsw i32 %sum.0, 2%1 = add nsw i32 %i.0, 1br label %bb1bb1: ; preds = %bb, %entry%sum.0 = phi i32 [42,%entry],[%0,%bb]%i.0 = phi i32 [0,%entry],[%1,%bb]%2 = icmp sle i32 %i.0, 10br i1 %2, label %bb, label %bb2 12/14

Validation: SPIRE (LLVM)function = blocks:block*;block = label:entity x phi_nodes:phi_node* x




13/14

Validation: SPIRE (LLVM)function = blocks:block*;block = label:entity x phi_nodes:phi_node* x




function ; function x execution ;block ; block x execution ;instruction ; instruction x synchronization ;Intrinsic Functions: send , recv , signal , wait 13/14

Con lusionSPIRE methodology as an IR transformer:10 on epts olle ted in three groupsexe ution, syn hronization, data distributionSPIRE (PIPS), SPIRE (LLVM)Operational semanti s for SPIRE [Khaldi et al., 2012b℄Trade-o� between expressibility and on iseness of representationAppli ations:Implementation of SPIRE (PIPS)Implementation of a new BDSC-based task parallelizationalgorithm [Khaldi et al., 2012a℄Generation of OpenMP and MPI ode from the same parallel IRFuture WorkTransformations for parallel languages en oded in SPIRERepresentation of the PGAS memory modelAddition of programming features su h as ex eptions andspe ulation 14/14

Referen esCavé, V., Zhao, J., and Sarkar, V. (2011).Habanero-Java: the New Adventures of Old X10.In 9th International Conferen e on the Prin iples and Pra ti e of Programming in Java (PPPJ).Choi, Y., Lin, Y., Chong, N., Mahlke, S., and Mudge, T. (2009).Stream Compilation for Real-Time Embedded Multi ore Systems.In Pro eedings of the 7th annual IEEE/ACM International Symposium on Code Generation andOptimization, CGO '09, pages 210�220, Washington, DC, USA.Dijkstra, E. W. (1977).Re: �Formal Derivation of Strongly Corre t Parallel Programs� by Axel van Lamsweerde andM.Sintzo�. ir ulated privately.Ferrante, J., Ottenstein, K. J., and Warren, J. D. (1987).The Program Dependen e Graph And Its Use In Optimization.ACM Trans. Program. Lang. Syst., 9(3):319�349.Girkar, M. and Poly hronopoulos, C. D. (1992).Automati Extra tion of Fun tional Parallelism from Ordinary Programs.IEEE Trans. Parallel Distrib. Syst., 3:166�178.Insieme.Insieme - an Optimization System for OpenMP, MPI and OpenCL Programs.http://www.dps.uibk.ac.at/insieme/architecture.html.Khaldi, D., Jouvelot, P., and An ourt, C. (2012a).Parallelizing with BDSC, a Resour e-Constrained S heduling Algorithm for Shared and DistributedMemory Systems.Te hni al Report CRI/A-499 (Submitted to Parallel Computing), MINES ParisTe h.Khaldi, D., Jouvelot, P., An ourt, C., and Irigoin, F. (2012b).SPIRE: A Sequential to Parallel Intermediate Representation Extension.Te hni al Report CRI/A-487 (Submitted to CGO'13), MINES ParisTe h.Merrill, J. (2003).GENERIC and GIMPLE: a New Tree Representation for Entire Fun tions.In GCC developers summit 2003, pages 171�180.Novillo, D. (2006).OpenMP and Automati Parallelization in GCC.In the Pro eedings of the GCC Developers Summit.Pai, S., Govindarajan, R., and Thazhuthaveetil, M. J. (2010).PLASMA: Portable Programming for SIMD Heterogeneous A elerators.In Workshop on Language, Compiler, and Ar hite ture Support for GPGPU, held in onjun tionwith HPCA/PPoPP 2010, Bangalore, India.Sarkar, V. and Simons, B. (1993).Parallel Program Graphs and their Classi� ation.In Banerjee, U., Gelernter, D., Ni olau, A., and Padua, D. A., editors, LCPC, volume 768 ofLe ture Notes in Computer S ien e, pages 633�655. Springer.Team, T. L. D. (2010).The LLVM Referen e Manual (Version 2.6)..Zhao, J. and Sarkar, V. (2011).Intermediate Language Extensions for Parallelism.In Pro eedings of the ompilation of the o-lo ated workshops on DSM'11, TMC'11, AGERE!'11,AOOPES'11, NEAT'11, & VMIL'11, SPLASH '11 Workshops, pages 329�340, New York, NY,USA. ACM.

15/14

http://www.dps.uibk.ac.at/insieme/architecture.html

http://llvm.org/docs/LangRef.html

SPIRE: A Methodology for Sequential to ParallelIntermediate Representation ExtensionDounia Khaldi Pierre Jouvelot François Irigoin Corinne An ourtCRI, Mathématiques et systèmesMINES ParisTe h35 rue Saint-Honoré, 77300 Fontainebleau, Fran eHiPEAC Computing Systems Week, Paris, Fran eMay 03, 201316/14

SPIRE: Data Distributionvoid send (int dest, entity buf);void recv (int source, entity buf);

MPI_Init(&argc, &argv[]);MPI_Comm_rank(MPI_COMM_WORLD, &rank);MPI_Comm_size(MPI_COMM_WORLD, &size);if (rank == 0)

MPI_Recv(sum,sizeof (sum),MPI_FLOAT,1,1,MPI_COMM_WORLD,&stat);

else if (rank == 1){sum = 42;MPI_Send(sum,sizeof (sum),MPI_FLOAT,

0,1,MPI_COMM_WORLD);}MPI_Finalize();

forloop(rank,0,size,1,test(rank==0,

recv (one,sum),test(rank==1,

sum=42;send (zero,sum),nop)

),parallel )

17/14

SPIRE: Data DistributionNon-blo king send ≡

spawn(new,send (zero,sum))Non-blo king re eive ≡

finish_recv = false;spawn(new,recv (one,sum);

atomic (finish_recv=true))Broad ast ≡test(rank==0,

forloop(rank,1,size,1,send (rank,sum),parallel ),

recv (zero,sum)) 18/14

spire: a methodology for sequential to parallel intermediate ... · async nish next atomic pgas...

Documents