dependence-driven loop manipulation
DESCRIPTION
DEPENDENCE-DRIVEN LOOP MANIPULATION. Based on notes by David Padua University of Illinois at Urbana-Champaign. DEPENDENCES. DEPENDENCES (continued). DEPENDENCE ANDPARALLELIZATION (SPREADING). OpenMP Implementation. RENAMING: To Remove Memory Related Dependencies . DEPENDENCES IN LOOPS. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/1.jpg)
DEPENDENCE-DRIVEN LOOPMANIPULATION
Based on notes by David PaduaUniversity of Illinois at Urbana-Champaign
1
![Page 2: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/2.jpg)
DEPENDENCES
2
![Page 3: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/3.jpg)
DEPENDENCES (continued)
3
![Page 4: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/4.jpg)
DEPENDENCE ANDPARALLELIZATION (SPREADING)
4
![Page 5: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/5.jpg)
OpenMP Implementation
5
![Page 6: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/6.jpg)
RENAMING: To Remove Memory Related Dependencies
6
![Page 7: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/7.jpg)
DEPENDENCES IN LOOPS
7
![Page 8: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/8.jpg)
DEPENDENCES IN LOOPS (Cont.)
8
![Page 9: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/9.jpg)
DEPENDENCE ANALYSIS
9
![Page 10: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/10.jpg)
DEPENDENCE ANALYSIS (continued)
10
![Page 11: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/11.jpg)
LOOP PARALLELIZATION AND VECTORIZATION
11
The reason is that if there are no cycles in the dependence graph, then there will be no races in the parallel loop.
A loop whose dependence graph is cycle free can be parallelized or vectorized
![Page 12: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/12.jpg)
ALGORITHM REPLACEMENT
12
• Some program patterns occur frequently in programs. They can be replaced with a parallel algorithm.
![Page 13: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/13.jpg)
LOOP DISTRIBUTION• To insulate these patterns, we
can decompose loops into several loops, one for each strongly-connected component (π-block)in the dependence graph.
13
![Page 14: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/14.jpg)
LOOP DISTRIBUTION (continued)
14
![Page 15: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/15.jpg)
LOOP INTERCHANGING• The dependence information determines whether or not the loop headers
can be interchanged. • The headers of the following loop can be interchanged
15
![Page 16: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/16.jpg)
LOOP INTERCHANGING (continued)• The headers of the following loop can not be interchanged
16
![Page 17: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/17.jpg)
DEPENDENCE REMOVAL: Scalar Expansion:• Some cycles in the dependence graph can be eliminated by using
elementary transformations.
17
![Page 18: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/18.jpg)
Induction variable recognition
• Induction variable: a variable that gets increased or decreased by a fixed amount on every iteration of a loop
18
![Page 19: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/19.jpg)
More about the DO to PARALLEL DO transformation
• When the dependence graph inside a loop has no cross-iteration dependences, it can be transformed into a PARALLEL loop.
19
do i=1,nS1: a(i) = b(i) + c(i)S2: d(i) = x(i) + 1end do
do i=1,nS1: a(i) = b(i) + c(i)S2: d(i) = a(i) + 1end do
![Page 20: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/20.jpg)
20
![Page 21: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/21.jpg)
Loop Alignment• When there are cross iteration dependences, but no cycles, do loops can be
aligned to be transformed into parallel loops (DOALLs)
21
![Page 22: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/22.jpg)
Loop Distribution• Another method for eliminating cross-iteration dependences
22
![Page 23: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/23.jpg)
Loop Coalescing for DOALL loops
Consider a perfectly nested DOALL loop such as
23
This could be trivially transformed into a singly nested loop with a tuple of variables as index:
This coalescing transformation is convenient for scheduling and could reduce the overhead involved in starting DOALL loops.
![Page 24: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/24.jpg)
Extra Slides
24
![Page 25: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/25.jpg)
25
Why loop Interchange: Matrix Multiplication Example A classic example for locality-aware programming is
matrix multiplication
for (i=0;i<N;i++)for (j=0;j<N;j++)for (k=0;k<N;k++)c[i,j] += a[i][k] * b[k][j];
There are 6 possible orders for the three loops i-j-k, i-k-j, j-i-k, j-k-i, k-i-j, k-j-i
Each order corresponds to a different access patterns of the matrices
Let’s focus on the inner loop, as it is the one that’s executed most often
![Page 26: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/26.jpg)
26
Inner Loop Memory Accesses Each matrix element can be accessed in three modes in
the inner loop Constant: doesn’t depend on the inner loop’s index Sequential: contiguous addresses Stride: non-contiguous addresses (N elements apart)
c[i][j] += a[i][k] * b[k][j]; i-j-k: Constant Sequential Strided i-k-j: Sequential Constant Sequential j-i-k: Constant Sequential Strided j-k-i: Strided Strided Constant k-i-j: Sequential Constant Sequential k-j-i: Strided Strided Constant
![Page 27: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/27.jpg)
27
Loop order and Performance Constant access is better than sequential
access it’s always good to have constants in loops
because they can be put in registers Sequential access is better than strided
access sequential access is better than strided because
it utilizes the cache better Let’s go back to the previous slides
![Page 28: DEPENDENCE-DRIVEN LOOP MANIPULATION](https://reader035.vdocuments.us/reader035/viewer/2022062811/568160d0550346895dd00568/html5/thumbnails/28.jpg)
28
Best Loop Ordering?c[i][j] += a[i][k] * b[k][j];
i-j-k: Constant Sequential Stridedi-k-j: Sequential Constant Sequentialj-i-k: Constant Sequential Stridedj-k-i: Strided Strided Constantk-i-j: Sequential Constant Sequentialk-j-i: Strided Strided Constant
k-i-j and i-k-j should have the best performance (no Strided) i-j-k and j-i-k should be worse ( 1 strided) j-k-i and k-j-i should be the worst (2 strided)