March 14, 2002 1
CMPUT680 - Winter 2006
Topic C: Loop Fusion
Kit Barton
www.cs.ualberta.ca/~cbarton
Outline
• Definition of loop fusion
• Basic concepts
• Prerequisites of loop fusion
• A loop fusion algorithm
• Example
Loop Fusion
• Combine 2 or more loops into a single loop
• This cannot violate any dependencies between the loop bodies
• Several conditions which must be met for fusion to occur
• Often these conditions are not initially satisfied
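As a minimal illustration of the definition above (not taken from the slides), the sketch below runs two conforming, adjacent loops separately and then as one fused loop. The function names and the arithmetic in the loop bodies are hypothetical; the only point is that the fused form produces the same values when no dependence is violated.

```python
def separate_loops(x):
    """Two loops over the same range, executed one after the other."""
    n = len(x)
    doubled = [0] * n
    shifted = [0] * n
    for i in range(n):
        doubled[i] = x[i] * 2
    for i in range(n):
        shifted[i] = doubled[i] + 1
    return doubled, shifted


def fused_loop(x):
    """The same work in a single loop: one increment and branch per element."""
    n = len(x)
    doubled = [0] * n
    shifted = [0] * n
    for i in range(n):
        doubled[i] = x[i] * 2
        # doubled[i] was produced in this same iteration, so it can be reused
        shifted[i] = doubled[i] + 1
    return doubled, shifted


assert separate_loops([1, 2, 3]) == fused_loop([1, 2, 3])
```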
Advantages of Loop Fusion
• Save increment and branch instructions
• Creates opportunities for data reuse
• Provide more instructions to instruction scheduler to balance the use of functional units
Disadvantages of Loop Fusion
• Increase code size, affecting instruction cache performance
• Increase register pressure within a loop
• Could cause the formation of loops with more complex control flow
Background
• There has been extensive work done on loop fusion
• Most has focused on weighted loop fusion (Gao et al., Kennedy and McKinley, Megiddo and Sarkar)
• Extensive work has also been done on performing loop fusion to increase parallelism
Weighted Loop Fusion
• Associates non-negative weights with each pair of loop nests
• Weights are a measurement of the expected gain if the two loops are fused
• Gains include potential for array contraction, data reuse and improved local register allocation
Optimal Loop Fusion
• Fuse loops to optimize data reuse, taking into consideration resource constraints and register usage
• This problem is NP-Hard
Maximal Loop Fusion
• Our approach is to perform maximal loop fusion
• Fuse as many loops as possible, without considering resource constraints
• Fuse loops as soon as possible, not considering the consequences
Dominators and Post Dominators
• A node x in a directed graph G with a single entry node dominates node y in G if every path from the entry node of G to y must pass through x
• A node x in a directed graph G with a single exit node post-dominates node y in G if every path from y to the exit node of G must pass through x
Reference: Allen & Kennedy, p. 150, 353
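The two definitions can be turned into a small computation. The sketch below uses the standard iterative data-flow formulation (each node's dominator set is the node itself plus the intersection of its predecessors' sets); the slides do not say which algorithm the author used, so treat this as one possible approach. The CFG, its node names, and the helper `reverse` are hypothetical. Post-dominators are obtained by running the same function on the reversed graph, starting from the exit node.

```python
def dominators(cfg, entry):
    """Return {node: set of nodes that dominate it} for a CFG given as
    {node: [successors]}. Assumes every node is reachable from entry."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(cfg):
    """Flip every edge, so dominators of the result are post-dominators."""
    rev = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            rev[s].append(n)
    return rev


# Hypothetical diamond-shaped CFG: entry -> a -> (b | c) -> d
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
dom = dominators(cfg, "entry")
pdom = dominators(reverse(cfg), "d")
assert dom["d"] == {"entry", "a", "d"}     # neither b nor c dominates d
assert pdom["a"] == {"a", "d"}             # d post-dominates a
```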
Requirements for Loop Fusion
i. Loops must have identical iteration counts (be conforming)
ii. Loops must be control-flow equivalent
iii. Loops must be adjacent
iv. There cannot be any negative distance dependencies between the loops
Non-conforming Loops
• If iteration counts are different, one loop must be manipulated to make the iteration counts the same
1. Loop peeling
2. Introduce a guard into one of the loops
Loop Peeling
• Find the difference between the iteration count of the two loops (n)
• Duplicate the body of the loop with the higher iteration count n times
• Update the iteration count of the peeled loop
Loop Peeling Example
Original loops:
    while (i < 10){
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 12){
        b[j] = b[j - 1] - 2;
        j++;
    }
After peeling two iterations from the second loop:
    while (i < 10){
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 10){
        b[j] = b[j - 1] - 2;
        j++;
    }
    b[j] = b[j - 1] - 2;
    j++;
    b[j] = b[j - 1] - 2;
    j++;
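The peeling example can be checked by running both versions on the same data. This is a sketch under assumptions the slide leaves open: both counters start at 1, and the arrays and their initial contents are made up for the test.

```python
def run_original(a, b):
    """The two loops as written on the slide (counters assumed to start at 1)."""
    a, b = a[:], b[:]
    i = j = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def run_peeled(a, b):
    """Second loop shortened to 9 iterations; the last two are peeled off."""
    a, b = a[:], b[:]
    i = j = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 10:
        b[j] = b[j - 1] - 2
        j += 1
    # Peeled iterations (j == 10 and j == 11), now outside the loop
    b[j] = b[j - 1] - 2
    j += 1
    b[j] = b[j - 1] - 2
    j += 1
    return a, b


a0 = [1] + [0] * 9      # hypothetical input, 10 elements
b0 = [100] + [0] * 11   # hypothetical input, 12 elements
assert run_original(a0, b0) == run_peeled(a0, b0)
```

After peeling, both loops execute the same number of iterations, which is the conformance requirement for fusion.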
Guarding Iterations
• Increase the iteration count of the loop with fewer iterations
• Insert a guard branch around statements that would not normally be executed
Guarding Iterations Example
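Original loops:
    while (i < 10){
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 12){
        b[j] = b[j - 1] - 2;
        j++;
    }
After extending the first loop and guarding its body (the increment stays outside the guard so the loop still terminates):
    while (i < 12){
        if (i < 10){
            a[i] = a[i - 1] * 2;
        }
        i++;
    }
    while (j < 12){
        b[j] = b[j - 1] - 2;
        j++;
    }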
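The guarded form can be checked the same way as the peeled one. This sketch again assumes both counters start at 1 and uses made-up input arrays; the increment is placed outside the guard so that the extended loop terminates.

```python
def run_original(a, b):
    """The two loops as written on the slide (counters assumed to start at 1)."""
    a, b = a[:], b[:]
    i = j = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def run_guarded(a, b):
    """First loop extended to 11 iterations; the guard skips the extra two."""
    a, b = a[:], b[:]
    i = j = 1
    while i < 12:
        if i < 10:              # guard: only the original iterations do work
            a[i] = a[i - 1] * 2
        i += 1                  # increment must stay outside the guard
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


a0 = [1] + [0] * 9      # hypothetical input, 10 elements
b0 = [100] + [0] * 11   # hypothetical input, 12 elements
assert run_original(a0, b0) == run_guarded(a0, b0)
```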
Loop Peeling
• Advantage:
  o Does not generate control flow within a loop body
• Disadvantage:
  o Generates additional code outside of loops, which could possibly intervene between other loops
Guarding Iterations
• Advantages:
  o Does not introduce intervening code
  o Can be “undone” later
• Disadvantage:
  o Generates control flow within a loop
Control Flow Equivalence
• Two loops are control-flow equivalent if when one executes, the other also executes
[Figure: two example CFGs; the first contains the nodes Loop 1, BB, and Loop 2, and the second contains Loop 1, Loop 3, BB, and Loop 2]
Determining Control Flow Equivalence
• Use the concepts of dominators and post dominators
• Two loops L1 and L2 are control-flow equivalent if the following two conditions are true:
  o L1 dominates L2; and
  o L2 post dominates L1
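The two conditions can be tested directly from the definitions: x dominates y exactly when removing x from the graph makes y unreachable from the entry, and post-dominance is the same test on the reversed graph from the exit. The sketch below takes that reachability route for brevity; it is not necessarily how a production compiler would compute it. Both CFGs and all node names are hypothetical and are not a reconstruction of the slide's figure.

```python
def reachable(graph, start, removed=None):
    """Nodes reachable from start when the node `removed` is deleted."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n == removed or n in seen:
            continue
        seen.add(n)
        stack.extend(graph[n])
    return seen


def dominates(graph, entry, x, y):
    """x dominates y if every path from entry to y passes through x."""
    return x == y or y not in reachable(graph, entry, removed=x)


def reverse(graph):
    rev = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev[s].append(n)
    return rev


def control_flow_equivalent(graph, entry, exit_node, l1, l2):
    """L1 dominates L2 and L2 post-dominates L1."""
    return (dominates(graph, entry, l1, l2)
            and dominates(reverse(graph), exit_node, l2, l1))


# Straight-line CFG: both loops always execute together.
straight = {"entry": ["L1"], "L1": ["BB"], "BB": ["L2"],
            "L2": ["exit"], "exit": []}
# Branching CFG: after L1, control may go to L3 and skip L2.
branching = {"entry": ["L1"], "L1": ["BB", "L3"], "BB": ["L2"],
             "L2": ["exit"], "L3": ["exit"], "exit": []}

assert control_flow_equivalent(straight, "entry", "exit", "L1", "L2")
assert not control_flow_equivalent(branching, "entry", "exit", "L1", "L2")
```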
Intervening Code
• Two loops are adjacent if there are no statements between the two loops
• Can be determined using the CFG:
  o If the immediate successor of the first loop is the second loop, the two loops are adjacent
• If two loops are not adjacent, there is intervening code between them
Dealing with Non-Adjacent Loops
• If two loops are not adjacent, we attempt to make them adjacent by moving the intervening code
• Intervening code can be moved, as long as no data dependencies are violated:
  o Above the first loop
  o Below the second loop
  o Both
Intervening Code Example
• Assume CFG has 20 nodes
• 0-5 are above Loop 1
• 17-19 are below Loop 2
• What algorithm should be used to determine which nodes are between Loop 1 and Loop 2?
[Figure: CFG showing Loop 1, Loop 2, and nodes 6 through 16]
Gathering Intervening Code
• Given two loops L1 and L2, a basic block B is intervening code between L1 and L2 if and only if:
  o B is strictly dominated by L1
  o B is not dominated by L2
• Once the dominance relations are known, the set subtraction can be efficiently computed using bit vectors
Intervening Code Example
[Figure: CFG showing Loop 1, Loop 2, and nodes 6 through 16]
Loop 1:     0000 0011 1111 1111 1111 1
Loop 2:     0000 0000 0000 0000 1111 1
Difference: 0000 0011 1111 1111 0000 0
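The set subtraction on this slide is a single AND-NOT on the two bit vectors. The sketch below reproduces it with Python integers; the three bit strings are copied from the slide, while the helper names `to_int` and `to_text` are made up for this example.

```python
def to_int(text):
    """Parse a bit string such as '0000 0011' (spaces ignored)."""
    return int(text.replace(" ", ""), 2)


def to_text(value, width):
    """Format an integer as a bit string in groups of four."""
    bits = format(value, "0{}b".format(width))
    return " ".join(bits[i:i + 4] for i in range(0, width, 4))


loop1 = to_int("0000 0011 1111 1111 1111 1")   # vector shown for Loop 1
loop2 = to_int("0000 0000 0000 0000 1111 1")   # vector shown for Loop 2

# Set subtraction: keep bits that are set for Loop 1 but not for Loop 2
difference = loop1 & ~loop2

assert to_text(difference, 21) == "0000 0011 1111 1111 0000 0"
```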
Analyze Intervening Code
• Build a DDG of the intervening code
• Put all nodes with no predecessors into queue
• For each node in the queue:
  o If there are no dependencies between the node and the loop:
    o Mark node as moveable
    o Add all of the node's immediate successors to the queue
• All nodes marked can be moved around the loop
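The queue-based analysis above can be sketched as follows. Several things here are assumptions rather than the author's implementation: dependences are approximated with read/write sets of scalar names, a successor is only marked once all of its DDG predecessors are marked, and the short statement labels ("b", "c", "g", "h", "if") stand in for the intervening statements of the non-adjacent loops example in this deck.

```python
from collections import deque


def moveable_nodes(stmts, edges, loop_reads, loop_writes):
    """stmts: {label: (reads, writes)}; edges: DDG edges (src, dst).
    Returns the labels that can be moved around the loop, in visit order."""
    preds = {s: set() for s in stmts}
    succs = {s: [] for s in stmts}
    for src, dst in edges:
        preds[dst].add(src)
        succs[src].append(dst)

    def depends_on_loop(label):
        reads, writes = stmts[label]
        return bool(writes & (loop_reads | loop_writes) or reads & loop_writes)

    queue = deque(s for s in stmts if not preds[s])   # nodes with no predecessors
    moveable = []
    while queue:
        s = queue.popleft()
        if s in moveable or depends_on_loop(s):
            continue
        if not preds[s] <= set(moveable):   # assumption: all predecessors must move too
            continue
        moveable.append(s)
        queue.extend(succs[s])
    return moveable


# Intervening statements: b := a * 2; c := b + 6; g := 0; h := g + 10; if (c < 100) ...
stmts = {
    "b": ({"a"}, {"b"}),
    "c": ({"b"}, {"c"}),
    "g": (set(), {"g"}),
    "h": ({"g"}, {"h"}),
    "if": ({"c"}, {"d", "e"}),
}
edges = [("b", "c"), ("c", "if"), ("g", "h")]

# Loop 1 updates a and i; Loop 2 reads g and updates f and j.
around_loop1 = moveable_nodes(stmts, edges, {"a", "i", "N"}, {"a", "i"})
around_loop2 = moveable_nodes(stmts, edges, {"g", "j", "N"}, {"f", "j"})
assert around_loop1 == ["g", "h"]
assert around_loop2 == ["b", "c", "if"]
```

With these inputs the statements that can cross Loop 1 are `g := 0` and `h := g + 10`, and those that can cross Loop 2 are `b := a * 2`, `c := b + 6`, and the `if` statement.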
Non-Adjacent loops example
    while (i < N) {
        a += i;
        i++;
    }
    b := a * 2;
    c := b + 6;
    g := 0;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    while (j < N) {
        f := g + 6;
        j++;
    }
[Figure: DDG of the intervening code, with nodes b := a * 2; c := b + 6; g := 0; if (c < 100) d := c/2; else e := c * 2; h := g + 10]
Non-Adjacent loops example
After moving the intervening code, the two loops are adjacent:
    g := 0;
    h := g + 10;
    while (i < N) {
        a += i;
        i++;
    }
    while (j < N) {
        f := g + 6;
        j++;
    }
    b := a * 2;
    c := b + 6;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
Non-Adjacent loops example
[Figure: DDG of the intervening code, analyzed against Loop 2]
Node Queue:
    b := a * 2;
    g := 0;
Loop 2:
    while (j < N) {
        f := g + 6;
        j++;
    }
Moveable Nodes:
    b := a * 2;
    c := b + 6;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
Non-Adjacent loops example
[Figure: DDG of the intervening code, analyzed against Loop 1]
Node Queue:
    b := a * 2;
    g := 0;
Loop 1:
    while (i < N) {
        a += i;
        i++;
    }
Moveable Nodes:
    g := 0;
    h := g + 10;
Dependencies Preventing Fusion
Can the following loops be fused?
    i = j = 1;
    while (i < 10){
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10){
        b[j] = a[j+1] * 2;
        j++;
    }
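A quick experiment, not from the slides, suggests that fusing these two loops directly is not safe: in a fused body, `b[i]` would read `a[i+1]` one iteration before it is written. The array sizes and the contents of `c` below are made up for the test.

```python
def run_separate(c):
    """The two loops exactly as written, one after the other."""
    a = [0] * 11
    b = [0] * 11
    i = j = 1
    while i < 10:
        a[i] = c[i] + 10
        i += 1
    while j < 10:
        b[j] = a[j + 1] * 2
        j += 1
    return a, b


def run_naively_fused(c):
    """Both statements in one loop body, with no alignment."""
    a = [0] * 11
    b = [0] * 11
    i = 1
    while i < 10:
        a[i] = c[i] + 10
        b[i] = a[i + 1] * 2   # reads a[i+1] before the next iteration writes it
        i += 1
    return a, b


c = list(range(11))           # hypothetical input
a_sep, b_sep = run_separate(c)
a_fused, b_fused = run_naively_fused(c)
assert a_sep == a_fused       # a is still computed correctly
assert b_sep != b_fused       # but b now sees stale values of a
```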
Dependencies Preventing Fusion
• If we look at the array access patterns of a[], we see the following
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
Dependencies Preventing Fusion
• By aligning the array access patterns, we get the following:
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
Loop Alignment
Original loops:
    i = j = 1;
    while (i < 10){
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10){
        b[j] = a[j+1] * 2;
        j++;
    }
After aligning (the first iteration of the first loop is peeled off):
    j = 1; i = 2;
    a[1] = c[1] + 10;
    while (i < 10){
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10){
        b[j] = a[j+1] * 2;
        j++;
    }
Loop Alignment
• Loop alignment can be used to remove dependencies between loop bodies
• Easy to do when all dependencies have the same distance
• Gets tricky when there are multiple dependencies with different distances
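For the single-distance case, the sketch below shows one way the aligned loops from the earlier example could be fused and checks the result against the original. The fused form itself is not shown on the slides: shifting the first statement by one iteration and finishing the last `b` element after the loop are assumptions of this illustration, as are the array sizes and the contents of `c`.

```python
def run_separate(c):
    """The two original loops, one after the other."""
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return a, b


def run_aligned_and_fused(c):
    """Peel a[1], shift the first statement by one, then fuse."""
    a = [0] * 11
    b = [0] * 11
    a[1] = c[1] + 10              # peeled first iteration of loop 1
    for i in range(1, 9):         # i = 1 .. 8
        a[i + 1] = c[i + 1] + 10  # loop 1 statement, shifted by one
        b[i] = a[i + 1] * 2       # now reads the value written just above
    b[9] = a[10] * 2              # leftover last iteration of loop 2
    return a, b


c = list(range(11))               # hypothetical input
assert run_separate(c) == run_aligned_and_fused(c)
```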
Putting it all together
• We’ve seen ways to deal with each of the preconditions of loop fusion
• If the conditions are not met, we apply transformations to try and modify the code
• If the transformations are successful, loop fusion can occur
• But in what order should these transformations be applied?
Loop Fusion Algorithm
For each Ni from outermost to innermost:
    Gather control equivalent loops in Ni into LoopSets
    For each set Si in LoopSets
        remove non-eligible loops from Si
        FusedLoops = true
        Direction = forward
        while FusedLoops == true
            if |Si| < 2 break
            Compute Dominance Relation
            FusedLoops = LoopFusionPass(Si, Direction)
            Reverse Direction
Loop Fusion Algorithm
LoopFusionPass(S, Direction)
    FusedLoops = false
    For each pair of loops Lj and Lk in S such that Lj dominates Lk in Direction
        if (DependenceDistance(Lj, Lk) < 0) continue
        if (InterveningCode(Lj, Lk) == true and
            IsInterveningCodeMoveable(Lj, Lk) == false) continue
        d = | IterationCount(Lj) - IterationCount(Lk) |
        if (Lj and Lk are non-conforming and
            (d cannot be determined at compile time or d > MAXPEEL)) continue
        if (Lj and Lk are non-conforming) Peel iterations
        MoveInterveningCode(Lj, Lk)
        if InterveningCode(Lj, Lk) == false
            FuseLoops(Lj, Lk)
            FusedLoops = true
    Return FusedLoops
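The conformance test in the middle of LoopFusionPass can be sketched in isolation. Everything below is an assumption made for illustration: iteration counts are modelled as a `(symbol, offset)` pair such as `("n", -1)` for n-1, two counts are comparable at compile time only when they share the same symbol, and the value chosen for `MAX_PEEL` is arbitrary. The slides do not describe how the real implementation represents iteration counts.

```python
MAX_PEEL = 4   # hypothetical limit; the slides only name the constant MAXPEEL


def peel_distance(count_j, count_k):
    """Return d = |count_j - count_k|, or None if it is not known statically.

    Counts are (symbol, offset) pairs, e.g. ("n", -1) means n - 1.
    """
    symbol_j, offset_j = count_j
    symbol_k, offset_k = count_k
    if symbol_j != symbol_k:
        return None              # e.g. n - 1 versus m: difference unknown
    return abs(offset_j - offset_k)


def can_make_conforming(count_j, count_k, max_peel=MAX_PEEL):
    """True if the loops already conform or can be made to by peeling."""
    d = peel_distance(count_j, count_k)
    return d is not None and d <= max_peel


# Loops with n and n-1 iterations: peel one iteration
assert can_make_conforming(("n", 0), ("n", -1))
# Loops with n-1 and m iterations: difference unknown, so skip the pair
assert not can_make_conforming(("n", -1), ("m", 0))
```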
Example
    L1: do i1 = 1, n
          a(i1) = a(i1) * k1
        end do
    L2: do i2 = 1, n-1
          d(i2) = a(i2) - b(i2+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Loop Set: L1, L2, L3, L4
Peeling Loop 1
After peeling the first iteration of L1 (new statement S7):
    S7: a(1) = a(1) * k1
    L1: do i1 = 1, n-1
          a(i1+1) = a(i1+1) * k1
        end do
    L2: do i2 = 1, n-1
          d(i2) = a(i2) - b(i2+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Fuse L1 and L2
After fusing L1 and L2 into L5:
    S7: a(1) = a(1) * k1
    L5: do i5 = 1, n-1
          a(i5+1) = a(i5+1) * k1
          d(i5) = a(i5) - b(i5+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Compare L5 and L3
• We now compare loops L5 and L3
• They are not adjacent, but the intervening code can move
• Difference in iteration count (n-1 versus m) is not known at compile time, so fusion fails
Compare L5 and L4
Intervening Code:
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
Peel L5
After peeling the first iteration of L5 (new statements S8 and S9):
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Move Intervening Code
After moving S1 and S2-S5 above L5:
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Reverse Pass
Loop Set: L5, L3, L4
Sorted in Reverse Dominance Direction: L4, L3, L5
Compare L4 and L3
• Compare L4 and L3
• No dependencies to prevent fusion
• Difference in iteration count (n-2 versus m) cannot be determined at compile time
• Fusion fails
Compare L4 and L5
Intervening Code:
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
Move Intervening Code
After moving L3 below L4:
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
Fuse L4 and L5
After fusing L5 and L4 into L6:
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L6: do i6 = 1, n-2
          a(i6+2) = a(i6+2) * k1
          d(i6+1) = a(i6+1) - b(i6+2) * k2
          b(i6) = a(i6) + b(i6) / c(i6)
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
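As a sanity check on the whole walkthrough, the sketch below transcribes the original four-loop program and the final fused program into Python and compares their results. It is an illustration, not part of the slides: arrays are plain lists with index 0 unused to mimic 1-based indexing, and the input values, `n`, `m`, `k1`, and `k2` are made up (with `n >= 2` and `m <= n-1` so every access is in range).

```python
def run_original(a, b, c, d, n, m, k1, k2):
    a, b, c, d = a[:], b[:], c[:], d[:]
    for i in range(1, n + 1):            # L1
        a[i] = a[i] * k1
    for i in range(1, n):                # L2
        d[i] = a[i] - b[i + 1] * k2
    ds = 0.0                             # S1
    for i in range(1, m + 1):            # L3
        ds = ds + d[i]
    c[n - 2] = n if n < m else m         # S2-S5
    for i in range(1, n - 1):            # L4
        b[i] = a[i] + b[i] / c[i]
    return a, b, c, d, ds


def run_fused(a, b, c, d, n, m, k1, k2):
    a, b, c, d = a[:], b[:], c[:], d[:]
    a[1] = a[1] * k1                     # S7 (peeled from L1)
    a[2] = a[2] * k1                     # S8 (peeled from L5)
    d[1] = a[1] - b[2] * k2              # S9 (peeled from L5)
    ds = 0.0                             # S1, moved above the loop
    c[n - 2] = n if n < m else m         # S2-S5, moved above the loop
    for i in range(1, n - 1):            # L6 = L1 + L2 + L4
        a[i + 2] = a[i + 2] * k1
        d[i + 1] = a[i + 1] - b[i + 2] * k2
        b[i] = a[i] + b[i] / c[i]
    for i in range(1, m + 1):            # L3, moved below
        ds = ds + d[i]
    return a, b, c, d, ds


n, m, k1, k2 = 8, 5, 3.0, 0.5            # hypothetical sizes and constants
a0 = [0.0] + [float(i) for i in range(1, n + 1)]
b0 = [0.0] + [0.5 * i for i in range(1, n + 1)]
c0 = [0.0] + [i + 1.0 for i in range(1, n + 1)]
d0 = [0.0] * (n + 1)

assert run_original(a0, b0, c0, d0, n, m, k1, k2) == \
       run_fused(a0, b0, c0, d0, n, m, k1, k2)
```

Because each element is computed by the same expression on the same operands in both versions, the comparison can use exact equality even with floating-point values.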