Parallel Programming using the Iteration Space Visualizer
Yijun Yu and Erik H. D'Hollander
University of Ghent, Belgium
http://www.elis.rug.ac.be/paris/ppt

Introduction
- Overview of the approach
  - interactive vs automatic
- Loop dependence
  - Iteration Space Dependence Graph (ISDG)
  - Instrumentation and construction of the ISDG
- Visualization of …
  - Dependence
  - Transformations
- Applications and Results
- Conclusion and Future work

Overview of the approach
[Figure: flowchart leading from "Program" to "Code Generation". Box labels: Instrument the program, Construct the ISDG, Visualize dependence, Visualize transformation, Dataflow Analysis, Dependence Analysis, Program Transformation. The two routes are labeled "Iteration Space Visualizer" and "Parallel Compiler", "Interactive" and "Automatic", with the annotations "exact?" and "why?".]

Loop Dependence
- Nested loops are the focus of parallel programming
- Data dependences arise when there are multiple accesses to the same memory location and at least one of them is a WRITE
- Data dependence is classified as flow (first WRITE then READ), anti (first READ then WRITE) or output (WRITE after WRITE)
- Loop dependence is the ordering between data-dependent loop iterations
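
The classification above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the 'R'/'W' encoding of accesses are assumptions made for this sketch and are not taken from the tool.

```python
def classify(first, second):
    """Classify the dependence between two accesses to the same memory
    location, given in execution order. Each access is 'R' or 'W'."""
    if first == "W" and second == "R":
        return "flow"      # first WRITE then READ
    if first == "R" and second == "W":
        return "anti"      # first READ then WRITE
    if first == "W" and second == "W":
        return "output"    # WRITE after WRITE
    return None            # READ after READ: no dependence


kinds = [classify(a, b) for a, b in [("W", "R"), ("R", "W"), ("W", "W"), ("R", "R")]]
```

Two READs of the same location impose no ordering, which is why the last case returns `None`.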

The Iteration Space Dependence Graph (ISDG)
- The object to be visualized is … ISDG = Iteration Space + Loop Dependence
- An iteration I = (i1, ..., im) is a point in the m-dimensional iteration space, which is mapped to the 3D space
- Two dependent iterations I and J are linked by an arrow I -> J

An example of ISDG
      do i=1,n
        do j=1,n
          do k=1,2
            if (k.eq.1) then
              a(i,j,k)=(a(i-1,j,k)+a(i+1,j,k))/2
            else
              a(i,j,k)=(a(i,j-1,k)+a(i,j+1,k))/2
            endif
          enddo
        enddo
      enddo
[Figure: 3D ISDG of this loop for n=4 with axes i, j, k. The iteration points are labeled (1,1,1) to (4,4,1) in the plane k=1 and (1,1,2) to (4,4,2) in the plane k=2.]

Instrumentation and the ISDG construction
- Program instrumentation
  - Loop iteration: id + indices
  - Array reference: id + name + Read | Write + subscripts
- ISDG construction
  1. Create the iteration points from the indices
  2. Set up a reference list for every accessed location
  3. Mark flow-, anti- and output-dependence arrows
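
The three construction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the trace is a hypothetical in-memory list of (iteration, 'R' or 'W', location) records rather than the instrumentation file, and the names `build_isdg` and `example_trace` are invented here. The trace generator replays the references of the example loop (do i, do j, do k=1,2) shown earlier.

```python
from collections import defaultdict


def build_isdg(trace):
    """trace: (iteration, mode, location) records in execution order.
    Returns the set of (source, sink, kind) dependence arrows."""
    last_write = {}                    # location -> iteration of the latest WRITE
    reads_since = defaultdict(list)    # location -> READs since that WRITE
    arrows = set()
    for it, mode, loc in trace:
        if mode == "R":
            w = last_write.get(loc)
            if w is not None and w != it:
                arrows.add((w, it, "flow"))
            reads_since[loc].append(it)
        else:
            for r in reads_since[loc]:
                if r != it:
                    arrows.add((r, it, "anti"))
            w = last_write.get(loc)
            if w is not None and w != it:
                arrows.add((w, it, "output"))
            last_write[loc] = it
            reads_since[loc] = []
    return arrows


def example_trace(n):
    """References of the example loop: k=1 averages along i, k=2 along j."""
    trace = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in (1, 2):
                it = (i, j, k)
                if k == 1:
                    reads = [(i - 1, j, k), (i + 1, j, k)]
                else:
                    reads = [(i, j - 1, k), (i, j + 1, k)]
                for loc in reads:
                    trace.append((it, "R", loc))
                trace.append((it, "W", (i, j, k)))
    return trace


arrows = build_isdg(example_trace(4))
distances = {tuple(b - a for a, b in zip(src, dst)) for src, dst, _ in arrows}
kinds = {kind for _, _, kind in arrows}
```

For n=4 every arrow has distance (1,0,0) in the plane k=1 or (0,1,0) in the plane k=2, and only flow and anti arrows occur because each element is written once.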

Dependence Visualization
- Loop visualization
  - 3D view-port of the iteration space
  - Graphical operations
- Detecting and enhancing parallelism
  - Automatic parallelization
  - Maximal parallelism detection
  - Parallelization by plane execution

Loop Visualization
- Visualization of the ISDG
  - Points + Arrows + Colors + Labels + Axes
- 3D view-port of the iteration space
  - =3D, >3D and <3D
  - projection (condensed points and arrows)
  - expansion (dummy index dimension)
- ISDG operations
  - Graphical operations: rotate, move and animate
  - Query dialogs: selection, variable zooming, dependence type filtering, etc.

Automatic Parallelization
- Sequential execution
  - Traverse the iteration space in lexicographical order and count the iterations: Tseq
- Parallel execution
  - Traverse the iterations of a marked loop in parallel and count the steps: Tpar
- Report the speedup Spara = Tseq / Tpar
- Automatic parallelization
  - Test whether the dependence ordering is kept for all combinations of loop parallelizations:
    DOALL i1,i2,i3? + DOALL i1,i2? + DOALL i1,i3? + DOALL i2,i3? + DOALL i1? + DOALL i2? + DOALL i3?
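
A hedged sketch of this test in Python follows. The validity rule used here is an assumption of the sketch: a marking is accepted when no arrow is carried by a marked loop, i.e. the first index in which source and sink differ belongs to a sequential loop. The step count treats sequential loops as sums and DOALL loops as maxima. The names `t_par`, `doall_valid` and `valid_markings` are hypothetical, loop levels are numbered from 0, and the arrows are those of the example loop shown earlier (distance (1,0,0) for k=1 and (0,1,0) for k=2).

```python
from itertools import combinations


def t_par(points, parallel, level=0):
    """Steps needed when the loop levels in `parallel` run as DOALL."""
    if not points:
        return 0
    if level == len(points[0]):
        return 1                       # one iteration takes one step
    groups = {}
    for p in points:
        groups.setdefault(p[level], []).append(p)
    times = [t_par(g, parallel, level + 1) for g in groups.values()]
    return max(times) if level in parallel else sum(times)


def doall_valid(arrows, parallel):
    """True if no dependence arrow is carried by a parallel loop level."""
    for src, dst in arrows:
        for level, (a, b) in enumerate(zip(src, dst)):
            if a != b:
                if level in parallel:
                    return False
                break
    return True


def valid_markings(points, arrows):
    """Speedup Tseq / Tpar for every valid combination of DOALL loops."""
    depth = len(points[0])
    t_seq = len(points)
    result = {}
    for r in range(1, depth + 1):
        for combo in combinations(range(depth), r):
            if doall_valid(arrows, set(combo)):
                result[combo] = t_seq / t_par(points, set(combo))
    return result


n = 4
points = [(i, j, k) for i in range(1, n + 1) for j in range(1, n + 1) for k in (1, 2)]
arrows = [((i, j, 1), (i + 1, j, 1)) for i in range(1, n) for j in range(1, n + 1)]
arrows += [((i, j, 2), (i, j + 1, 2)) for i in range(1, n + 1) for j in range(1, n)]
speedups = valid_markings(points, arrows)
```

For this example only the innermost loop k (level 2) can be marked DOALL, giving Tpar = 16 and a speedup of 2.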

Maximal Parallelism Detection
- Data-flow order
  - An iteration is executed as soon as its data are ready, i.e. after all the iterations it depends on have been carried out
  - Iterations with the same delay are executed at the same time, i.e. in parallel
  - Dependent iterations are executed sequentially. Count the steps: Tdf
- Minimal execution time = Maximal parallelism
  - Maximal speedup Smax = Tseq / Tdf (written Sdf on the following slides)
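
The data-flow order amounts to a longest-path computation on the ISDG. A minimal sketch, assuming the points are supplied in sequential order and every arrow points forward in that order; the function name `dataflow_steps` and the small 2D wavefront example (a(i,j) depending on a(i-1,j) and a(i,j-1)) are hypothetical and not taken from the slides.

```python
def dataflow_steps(points, arrows):
    """points: iterations in sequential (lexicographic) order.
    arrows: (source, sink) pairs, with the source executed before the sink.
    Returns the data-flow step (1-based delay) of every iteration."""
    preds = {p: [] for p in points}
    for src, dst in arrows:
        preds[dst].append(src)
    step = {}
    for p in points:                   # sources always precede their sinks
        step[p] = 1 + max((step[q] for q in preds[p]), default=0)
    return step


n = 4
points = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
arrows = [((i - 1, j), (i, j)) for i in range(2, n + 1) for j in range(1, n + 1)]
arrows += [((i, j - 1), (i, j)) for i in range(1, n + 1) for j in range(2, n + 1)]

step = dataflow_steps(points, arrows)
t_seq = len(points)                    # Tseq: one step per iteration
t_df = max(step.values())              # Tdf: length of the longest chain
s_max = t_seq / t_df                   # Smax = Tseq / Tdf
```

Iterations with equal `step` values form one parallel front; here iteration (i,j) runs at step i+j-1, so Tdf = 7 for 16 iterations.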

Plane Parallelization
- Define a cutting plane Ax+By+Cz=D
  - by clicking at three points, or
  - by giving the parameters A, B, C, D
- Plane execution
  - Traverse the planes d0 ≤ Ax+By+Cz < d0+Td along the normal vector (A,B,C)
- Plane parallelization
  - Matching the dataflow execution may enhance the speedup Splane = Tseq / Td
  - Verified by cross-plane dependence checking or 3D->2D projection checking
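
Cross-plane dependence checking can be sketched as below. The assumptions are labeled here and in the code: iterations on one plane run in parallel, planes run one after another along the normal vector, and a plane is accepted only if every arrow goes to a strictly later plane. The name `plane_schedule` and the 3x3x3 example with unit distance vectors are hypothetical.

```python
def plane_schedule(points, arrows, normal):
    """Return (legal, Td) for execution by planes A*x + B*y + C*z = d.
    legal: every arrow crosses from a lower plane to a strictly higher one.
    Td:    number of non-empty planes, i.e. the number of parallel steps."""
    def d(p):
        return sum(c * x for c, x in zip(normal, p))

    legal = all(d(src) < d(dst) for src, dst in arrows)
    planes = {d(p) for p in points}
    return legal, len(planes)


n = 3
points = [(i, j, k) for i in range(1, n + 1)
          for j in range(1, n + 1) for k in range(1, n + 1)]
# Hypothetical dependences with distance vectors (1,0,0), (0,1,0), (0,0,1).
arrows = []
for (i, j, k) in points:
    for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
        q = (i + di, j + dj, k + dk)
        if max(q) <= n:
            arrows.append(((i, j, k), q))

diagonal = plane_schedule(points, arrows, (1, 1, 1))
axis = plane_schedule(points, arrows, (1, 0, 0))
s_plane = len(points) / diagonal[1]    # Splane = Tseq / Td
```

The diagonal plane x+y+z=d is legal with Td = 7, whereas the plane x=d is rejected because arrows of distance (0,1,0) stay inside one plane.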

Dependence Visualization procedural summary
[Figure: flowchart. Start; maximal parallelism detection (Sdf); automatic parallelization (Spara); decision "Spara = Sdf?" (Yes/No); plane parallelization (Splane); decision "Splane > Spara?" (Yes/No); prune false dependences; program transformation; End.]

Program Transformations
- When Sdf > Spara, loop transformations may enhance the parallelism of the target loop…
- Unimodular Loop Transformations
  - Why? 3D -> 3D, 1-to-1, etc.
- Loop Projections and Expansions
  - Loop Projection: >3D -> 3D
  - Loop Expansion: <3D -> 3D

Unimodular Transformations
[Figure: three 3x3 matrices. In the first all entries are unknown ("?"); in the second the first row is the normal vector (A, B, C) and the other entries are unknown; in the third the first row is (A, B, C) and the other entries are determined ("!"). Requirements: unimodular, legality.]
- Look for a suitable transformation
  - Interactive way
  - Automatic way
    - Possible when the array index expressions are linear and all the distance vectors lie in a plane
    - Extract the largest base vectors of the dependence distances and construct the transformation (pseudo distance matrix approach)
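
The two requirements named on this slide, unimodularity and legality, can be checked mechanically. The sketch below assumes the usual definitions: an integer matrix is unimodular when its determinant is +1 or -1, and a transformation is legal when every transformed distance vector stays lexicographically positive. The matrices and distance vectors are hypothetical examples, and the helper names are invented for this sketch.

```python
def det3(m):
    """Determinant of a 3x3 integer matrix given as three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def apply(m, v):
    """Matrix-vector product m * v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in m)


def lex_positive(v):
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_unimodular(m):
    return abs(det3(m)) == 1


def is_legal(m, distances):
    return all(lex_positive(apply(m, d)) for d in distances)


# Hypothetical dependence distances and candidate transformations.
distances = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
skew = [(1, 1, 1), (0, 1, 0), (0, 0, 1)]       # first row = plane normal (1,1,1)
reversal = [(-1, 0, 0), (0, 1, 0), (0, 0, 1)]  # reverses the outer loop

# If every transformed distance has a positive first component, the new
# outer loop carries all dependences and the inner loops can run as DOALL.
outer_carries_all = all(apply(skew, d)[0] > 0 for d in distances)
```

Here `skew` is unimodular and legal, while `reversal` is unimodular but illegal because it maps the distance (1,0,0) to (-1,0,0).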

Loop Expansion
- Non-perfectly vs perfectly nested loop
- Statement vs iteration-level parallelism
- Statement reordering affine remapping
- Loop expansion
  - Use an additional dimension to index the statements in the loop body
  - Unimodular loop transformations are still applicable at the statement level

Applications and Results
- Gauss-Jordan: linear system solver
- Lim’s example: statement-level parallelism
- Cholesky kernel: loop projection
- CFD application: unimodular transformation

Gauss-Jordan elimination
The original program:
      do i=1,n
        do j=1,n
          if (i.ne.j) then
            f=a(j,i)/a(i,i)
C$doisv
            do k=i+1,n+1
              a(j,k)=a(j,k)-f*a(i,k)
            enddo
          endif
        enddo
      enddo
The instrumented program:
      id=0
      do i = 1,n
        do j = 1,n
          if (i.ne.j) then
C           references made while computing f, charged to iteration id+1
            write(11,*) id+1," r ","a",2,j,i
            write(11,*) id+1," r ","a",2,i,i
            write(11,*) id+1," w ","f"," 1 0 "
            f=a(j,i)/a(i,i)
            do k = i+1,n+1
              id=id+1
C             iteration record: id + indices
              write(11,*) id,i,j,k
C             reference records: id + Read | Write + name + subscripts
              write(11,*) id," r ","a",2,j,k
              write(11,*) id," r ","f"," 1 0 "
              write(11,*) id," r ","a",2,i,k
              write(11,*) id," w ","a",2,j,k
              a(j,k)=a(j,k)-f*a(i,k)
            enddo
          endif
        enddo
      enddo

Gauss-Jordan elimination
[Figure: ISDG with axes I, J, K. The iteration points are labeled with their (I,J,K) indices, e.g. (1,2,2), (2,1,3), (4,3,5).]
- Plane: I = 1
- DOALL J, K valid
- Seq. time: 30, Dataflow: 4, Speedup: 7.5
- Loop time: 4, Speedup: 7.5

Lim’s Example
The original program
      do l1=1,n
        do l2=1,n
          a(l1,l2)=a(l1,l2)+b(l1-1,l2)
          b(l1,l2)=a(l1,l2-1)*b(l1,l2)
        enddo
      enddo
Loop Expansion
      do l1=1,n
        do l2=1,n
c$doisv
          do l3=0,1
            if(l3.eq.0) a(l1,l2)=a(l1,l2)+b(l1-1,l2)
            if(l3.eq.1) b(l1,l2)=a(l1,l2-1)*b(l1,l2)
          enddo
        enddo
      enddo

Lim’s example: unimodular transformation
Before the transformation (axes l1, l2, l3):
- Plane: L1-L2+L3=0
- DOALL L3 valid
- Seq. time: 32, Dataflow: 7, Speedup: 4.57
- Loop time: 16, Speedup: 2.00
Transformation matrix:
T = [ 1 -1  1 ]
    [ 1  0  0 ]
    [ 0  1  0 ]
After the transformation (axes i1, i2, i3):
- Plane: i1 = 0
- DOALL i1 valid
- Seq. time: 32, Dataflow: 7, Speedup: 4.57
- Loop time: 7, Speedup: 4.57
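
The figures on this slide can be reproduced with a short sketch for n=4. Two assumptions are made and labeled: the dependence arrows are derived by hand from the expanded loop (statement l3=1 writes b(l1,l2), which statement l3=0 reads at l1+1; statement l3=0 writes a(l1,l2), which statement l3=1 reads at l2+1), and the loop time counts sequential loops as sums and DOALL loops as maxima. Function names are hypothetical.

```python
T = [(1, -1, 1), (1, 0, 0), (0, 1, 0)]     # i1 = l1-l2+l3, i2 = l1, i3 = l2


def transform(p):
    return tuple(sum(a * x for a, x in zip(row, p)) for row in T)


def t_par(points, parallel, level=0):
    """Loop time: sequential levels add up, DOALL levels take the maximum."""
    if level == len(points[0]):
        return 1
    groups = {}
    for p in points:
        groups.setdefault(p[level], []).append(p)
    times = [t_par(g, parallel, level + 1) for g in groups.values()]
    return max(times) if level in parallel else sum(times)


def dataflow_time(points, arrows):
    """Length of the longest dependence chain (Tdf)."""
    preds = {p: [] for p in points}
    for src, dst in arrows:
        preds[dst].append(src)
    step = {}
    for p in sorted(points):               # arrows point forward in this order
        step[p] = 1 + max((step[q] for q in preds[p]), default=0)
    return max(step.values())


n = 4
points = [(l1, l2, l3) for l1 in range(1, n + 1)
          for l2 in range(1, n + 1) for l3 in (0, 1)]
arrows = []
for l1, l2, l3 in points:
    if l3 == 1 and l1 < n:                 # flow on b: distance (1, 0, -1)
        arrows.append(((l1, l2, 1), (l1 + 1, l2, 0)))
    if l3 == 0 and l2 < n:                 # flow on a: distance (0, 1, 1)
        arrows.append(((l1, l2, 0), (l1, l2 + 1, 1)))

t_seq = len(points)
t_df = dataflow_time(points, arrows)
loop_before = t_par(points, {2})                          # DOALL l3
loop_after = t_par([transform(p) for p in points], {0})   # DOALL i1
# No arrow leaves its plane l1-l2+l3 = const, so i1 carries no dependence.
same_plane = all(transform(s)[0] == transform(d)[0] for s, d in arrows)
```

Under these assumptions the sketch yields Tseq = 32, Tdf = 7, a loop time of 16 before and 7 after the transformation, matching the speedups 2.00 and 4.57 reported above.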

Lim’s example: code generation
Transformation matrix T and its inverse (Inversion); the loop bounds are obtained by Fourier-Motzkin elimination:
T = [ 1 -1  1 ]      T^-1 = [ 0  1  0 ]
    [ 1  0  0 ]             [ 0  0  1 ]
    [ 0  1  0 ]             [ 1 -1  1 ]
C The unimodular transformed code
      doall i1 = 1-n, n
        do i2 = max(i1,1), min(n,i1+n)
          do i3 = max(-i1+i2,1), min(-i1+i2+1,n)
            l1 = i2
            l2 = i3
            l3 = i1 - i2 + i3
            if (l3.eq.0) a(l1,l2)=a(l1,l2)+b(l1-1,l2)
            if (l3.eq.1) b(l1,l2)=a(l1,l2-1)*b(l1,l2)
          enddo
        enddo
      enddoall
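
The inversion step and the generated bounds can be checked with a small sketch. It relies on the fact that a unimodular matrix has an integer inverse equal to its adjugate divided by the determinant (+1 or -1). The function names are hypothetical, and the bounds are copied from the transformed loop above; the sketch only verifies them for n=4.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse_unimodular(m):
    """Integer inverse of a 3x3 unimodular matrix via the adjugate."""
    d = det3(m)
    if d not in (1, -1):
        raise ValueError("matrix is not unimodular")

    def minor(r, c):
        rows = [row for i, row in enumerate(m) if i != r]
        sub = [[x for j, x in enumerate(row) if j != c] for row in rows]
        return sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]

    # inverse[i][j] = (-1)^(i+j) * minor(j, i) / d, and 1/d == d for d = +-1
    return [[(-1) ** (i + j) * minor(j, i) * d for j in range(3)]
            for i in range(3)]


T = [[1, -1, 1], [1, 0, 0], [0, 1, 0]]
T_inv = inverse_unimodular(T)

# Walk the transformed loop nest for n = 4 and map each point back to
# (l1, l2, l3) with the inverse: l1 = i2, l2 = i3, l3 = i1 - i2 + i3.
n = 4
visited = []
for i1 in range(1 - n, n + 1):
    for i2 in range(max(i1, 1), min(n, i1 + n) + 1):
        for i3 in range(max(-i1 + i2, 1), min(-i1 + i2 + 1, n) + 1):
            v = (i1, i2, i3)
            visited.append(tuple(sum(a * x for a, x in zip(row, v))
                                 for row in T_inv))

expected = [(l1, l2, l3) for l1 in range(1, n + 1)
            for l2 in range(1, n + 1) for l3 in (0, 1)]
```

Each of the 32 original iterations is visited exactly once, and l3 only takes the values 0 and 1 inside the transformed nest.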

Lim’s example: code generation
Transformation: I’ = I - J + K, J’ = I, K’ = J
Omega calculator input:
symbolic n;
IS1 := {[i,j,k]: 1<=i,j<=n && k=0};
IS2 := {[i,j,k]: 1<=i,j<=n && k=1};
T1 := {[i,j,k] -> [i-j+k,i,j]};
T2 := {[i,j,k] -> [i-j+k,i,j]};
codegen 0 T1:IS1,T2:IS2;

Lim’s example: code generation
C the optimized code by Omega calculator
      doall p = 1-n, n
        if (p.ge.1) b(p,1) = a(p,0) * b(p,1)
        do l1 = max(p+1,1), min(p+n-1,n)
          a(l1,l1-p)   = a(l1,l1-p) + b(l1-1,l1-p)
          b(l1,l1-p+1) = a(l1,l1-p) * b(l1,l1-p+1)
        enddo
        if (p.le.0) a(p+n,n) = a(p+n,n) + b(p+n-1,n)
      enddoall

Cholesky kernel (I,K,J,L)
C THE ORIGINAL KERNEL
      DO 6 I = 0, NRHS
        DO 7 K = 0, N
          DO 8 L = 0, NMAT
    8       B(I,L,K) = B(I,L,K) * A(L,0,K)
          DO 7 J = 1, MIN (M, N-K)
            DO 7 L = 0, NMAT
    7         B(I,L,K+J) = B(I,L,K+J) - A(L,-J,K+J) * B(I,L,K)
        DO 6 K = N, 0, -1
          DO 9 L = 0, NMAT
    9       B(I,L,K) = B(I,L,K) * A(L,0,K)
          DO 6 J = 1, MIN (M, K)
            DO 6 L = 0, NMAT
    6         B(I,L,K-J) = B(I,L,K-J) - A(L,-J,K) * B(I,L,K)
Loop Fusion
      DO 1 I = 0,NRHS
        DO 1 K = 0,2*N+1
          IF (K.LE.N) THEN
            I0 = MIN(M,N-K)
          ELSE
            I0 = MIN(M,2*N-K+1)
          ENDIF
          DO 1 J = 0,I0
C$DOISV
            DO 1 L = 0,NMAT
              IF (K.LE.N) THEN
                IF (J.EQ.0) THEN
    8             B(I,L,K)=B(I,L,K)*A(L,0,K)
                ELSE
    7             B(I,L,K+J)=B(I,L,K+J)-A(L,-J,K+J)*B(I,L,K)
                ENDIF
              ELSE
                IF (J.EQ.0) THEN
    9             B(I,L,K)=B(I,L,K)*A(L,0,K)
                ELSE
    6             B(I,L,K-J)=B(I,L,K-J)-A(L,-J,K)*B(I,L,K)
                ENDIF
              ENDIF
    1 CONTINUE

Cholesky Kernel
[Figure: loop projections of the 4D iteration space (I,K,J,L) into 3D views, one with axes I, K, J and one with axes I, K, L. Plane: L=0.]

Cholesky kernel (I,K,J,L)
      DO 1 I = 0,NRHS
        DO 1 K = 0,2*N+1
          IF (K.LE.N) THEN
            I0 = MIN(M,N-K)
          ELSE
            I0 = MIN(M,2*N-K+1)
          ENDIF
          DO 1 J = 0,I0
C$DOISV
            DOALL 1 L = 0,NMAT
              IF (K.LE.N) THEN
                IF (J.EQ.0) THEN
    8             B(I,L,K)=B(I,L,K)*A(L,0,K)
                ELSE
    7             B(I,L,K+J)=B(I,L,K+J)-A(L,-J,K+J)*B(I,L,K)
                ENDIF
              ELSE
                IF (J.EQ.0) THEN
    9             B(I,L,K)=B(I,L,K)*A(L,0,K)
                ELSE
    6             B(I,L,K-J)=B(I,L,K-J)-A(L,-J,K)*B(I,L,K)
                ENDIF
              ENDIF
    1 CONTINUE
(L,I,K,J)

CFD application
- Computational Fluid Dynamics (CFD)
  - Navier-Stokes equations
  - Successive Over-Relaxation (SOR)
- Kernel 3D loop: difficult to analyze
  - 172 array references/iteration
  - 33 if-branches/iteration
- Unimodular transformation found!

CFD Application
Before the transformation (axes i1, i2, i3):
- Range: i1 = 1..4, i2 = 1..4, i3 = 1..4
- Plane: 3 i1 + 2 i2 + i3 = 9, containing the points (2,1,1), (1,2,2), (1,1,4)
- Seq. time: 64, Dataflow: 19, Speedup: 3.37
- Loop time: 64, Speedup: 1.00
Transformation matrix:
T = [ 3  2  1 ]
    [ 0  1  0 ]
    [ 1  0  0 ]
After the transformation (axes I1’, I2’, I3’):
- Range: I1’ = 6..24, I2’ = 1..4, I3’ = 1..4
- Plane: i1’ = 9, containing the points (9,1,1), (9,2,1), (9,1,2)
- DOALL i2’, i3’
- Seq. time
- Dataflow: 19, Speedup: 3.37
- Loop time: 19, Speedup: 3.37
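
The ranges and the loop time on this slide follow from the transformation matrix alone, as the sketch below illustrates. It assumes, as the slide states, that the loops i2’ and i3’ are DOALL, so that the loop time equals the number of distinct i1’ planes. The dependences of the CFD kernel are not reproduced here, and the variable names are hypothetical.

```python
T = [(3, 2, 1), (0, 1, 0), (1, 0, 0)]      # i1' = 3*i1 + 2*i2 + i3


def transform(p):
    return tuple(sum(a * x for a, x in zip(row, p)) for row in T)


# Original 4 x 4 x 4 iteration space.
points = [(i1, i2, i3) for i1 in range(1, 5)
          for i2 in range(1, 5) for i3 in range(1, 5)]
image = [transform(p) for p in points]

# Range of every transformed index.
ranges = [(min(q[c] for q in image), max(q[c] for q in image)) for c in range(3)]

# With i2' and i3' executed as DOALL, one step is needed per i1' plane.
t_seq = len(points)
t_plane = len({q[0] for q in image})
speedup = round(t_seq / t_plane, 2)

# The three labeled points of the plane 3*i1 + 2*i2 + i3 = 9.
on_plane = sorted(transform(p) for p in [(2, 1, 1), (1, 2, 2), (1, 1, 4)])
```

The transformed ranges are 6..24, 1..4 and 1..4, the 64 iterations fall on 19 planes, and the three labeled points map to (9,1,1), (9,1,2) and (9,2,1).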

Conclusion and Future work
- Allowing the exact visualization of real program loops
- Assistance with detecting parallel loops
- Estimation of the maximal speedup using dataflow execution
- Assistance with finding suitable loop transformations
- Future work:
  - Seamless integration into PPT (parallel programming environment)

Thanks for your attention!
Any questions?