Loop Alignment (Advanced Compilers) By- Isha Pandya Sumita Das

Upload: sumita-das

Post on 07-Jan-2017

Page 1: Loop alignment

Loop Alignment (Advanced Compilers)

By: Isha Pandya, Sumita Das

Page 2: Loop alignment

Created by Sumita Das

Introduction

Loop distribution eliminates loop-carried dependences by executing the sources of all dependences before executing any sinks.

Many carried dependencies are due to array alignment issues.

If we can align all references, then dependencies would go away, and parallelism is possible.

For example:

DO I = 2, N
   A(I) = B(I) + C(I)
   D(I) = A(I-1) * 2.0
ENDDO

Page 3: Loop alignment

This loop cannot be run in parallel because the value of A computed on iteration I is used on iteration I+1.

The two statements can be aligned to compute and use the values in the same iteration by adding an extra iteration and adjusting the indices of one of the statements.

For example:

DO I = 1, N
   IF (I .GT. 1) A(I) = B(I) + C(I)
   IF (I .LT. N) D(I+1) = A(I) * 2.0
ENDDO
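A small Python sketch (not part of the original slides) can check this transformation. It simulates both loops on 1-based arrays stored in padded lists; the function names, the sample data, and N = 6 are illustrative assumptions. The aligned version here runs I from 1 to N and guards the second statement with I < N, so that it performs exactly the assignments of the original loop.

```python
def original(A, B, C, D, order):
    """A(I) = B(I)+C(I); D(I) = A(I-1)*2.0, visiting I in the given order."""
    A, D = A[:], D[:]
    for I in order:
        A[I] = B[I] + C[I]
        D[I] = A[I - 1] * 2.0
    return A, D


def aligned(A, B, C, D, N, order):
    """Aligned loop: the definition and the use of A(I) share one iteration."""
    A, D = A[:], D[:]
    for I in order:
        if I > 1:
            A[I] = B[I] + C[I]
        if I < N:
            D[I + 1] = A[I] * 2.0
    return A, D


N = 6
# Index 0 is unused padding so that indices match the Fortran code.
A0 = [0.0] + [float(i) for i in range(1, N + 1)]
B = [0.0] + [10.0 * i for i in range(1, N + 1)]
C = [0.0] + [0.5] * N
D0 = [0.0] * (N + 1)

reference = original(A0, B, C, D0, range(2, N + 1))
same_forward = aligned(A0, B, C, D0, N, range(1, N + 1)) == reference
same_reversed = aligned(A0, B, C, D0, N, range(N, 0, -1)) == reference
original_reversed_ok = original(A0, B, C, D0, range(N, 1, -1)) == reference
print(same_forward, same_reversed, original_reversed_ok)  # True True False
```

Running the aligned iterations backwards gives the same arrays, which is what the absence of a carried dependence predicts; the original loop gives different values for D when its iterations are reversed.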

Page 4: Loop alignment

[Figure: Illustration of Loop Alignment]

Page 5: Loop alignment

Alignment

Loop alignment does incur some overhead: one extra loop iteration and the extra work required to test the conditionals.

This overhead can be reduced by executing the last iteration of the first statement with the first iteration of the second statement:

DO I = 2, N
   J = MOD(I+N-4, N-1) + 2
   A(J) = B(J) + C(J)
   D(I) = A(I-1) * 2.0
ENDDO

Page 6: Loop alignment

For every iteration other than the first, J is one less than I, so the assignment to A defines the element A(I-1) that the second statement uses in the same iteration.

On the first iteration (I = 2), J = N, so the assignment to the last location of A is correctly executed.

As a result, the total number of loop iterations is restored to its original count, but there is still the overhead of the MOD calculation.
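The index arithmetic can be checked with a short Python sketch (an illustration added here, not taken from the slides). It evaluates J = MOD(I+N-4, N-1) + 2 for I = 2..N and compares the wrapped loop against the original loop; N = 6, the sample data, and the reading of the third operand as the array element C(J) are assumptions.

```python
def original(A, B, C, N):
    """Original loop: A(I) = B(I)+C(I); D(I) = A(I-1)*2.0 for I = 2..N."""
    A, D = A[:], [0.0] * (N + 1)
    for I in range(2, N + 1):
        A[I] = B[I] + C[I]
        D[I] = A[I - 1] * 2.0
    return A, D


def wrapped(A, B, C, N):
    """MOD version: the first iteration defines A(N), later ones define A(I-1)."""
    A, D = A[:], [0.0] * (N + 1)
    for I in range(2, N + 1):
        J = (I + N - 4) % (N - 1) + 2
        A[J] = B[J] + C[J]
        D[I] = A[I - 1] * 2.0
    return A, D


N = 6
js = [(I + N - 4) % (N - 1) + 2 for I in range(2, N + 1)]
print(js)  # [6, 2, 3, 4, 5]

# Index 0 is unused padding so that indices match the Fortran code.
A0 = [0.0] + [float(i) for i in range(1, N + 1)]
B = [0.0] + [10.0 * i for i in range(1, N + 1)]
C = [0.0] + [0.5] * N
ref = original(A0, B, C, N)
got = wrapped(A0, B, C, N)
print(got == ref)  # True
```

Both loops run N-1 iterations, so the extra iteration introduced by the conditional form is gone.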

Page 7: Loop alignment

Alternatively, the conditional statements can be eliminated without adding calls to MOD by peeling off the first and last executions for each of the statements, yielding:

D(2) = A(1) * 2.0
DO I = 2, N-1
   A(I) = B(I) + C(I)
   D(I+1) = A(I) * 2.0
ENDDO
A(N) = B(N) + C(N)

This form permits efficient parallelism with the added overhead of two statements, one before and one after the loop, that cannot be executed in parallel.
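As a rough check of the peeled form, the following Python sketch (added for illustration; the names, data, and the fixed shuffle seed are assumptions) runs the peeled loop body in a shuffled iteration order and compares the result with the original sequential loop.

```python
import random


def original(A, B, C, N):
    """Original loop: A(I) = B(I)+C(I); D(I) = A(I-1)*2.0 for I = 2..N."""
    A, D = A[:], [0.0] * (N + 1)
    for I in range(2, N + 1):
        A[I] = B[I] + C[I]
        D[I] = A[I - 1] * 2.0
    return A, D


def peeled(A, B, C, N, order):
    """Peeled form; `order` may be any permutation of 2..N-1."""
    A, D = A[:], [0.0] * (N + 1)
    D[2] = A[1] * 2.0            # peeled first execution of the second statement
    for I in order:
        A[I] = B[I] + C[I]
        D[I + 1] = A[I] * 2.0
    A[N] = B[N] + C[N]           # peeled last execution of the first statement
    return A, D


N = 8
# Index 0 is unused padding so that indices match the Fortran code.
A0 = [0.0] + [float(i) for i in range(1, N + 1)]
B = [0.0] + [10.0 * i for i in range(1, N + 1)]
C = [0.0] + [0.5] * N

order = list(range(2, N))
random.Random(0).shuffle(order)  # stand-in for an arbitrary parallel schedule
matches = peeled(A0, B, C, N, order) == original(A0, B, C, N)
print(matches)  # True
```

A shuffled order is only a stand-in for parallel execution, but it does show that no iteration of the peeled loop depends on another.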

Page 8: Loop alignment

It is not possible to use alignment to eliminate all carried dependences in a loop if the carried dependence is involved in a recurrence, as the following example shows:

DO I = 1, N
   A(I) = B(I) + C
   B(I+1) = A(I) + D
ENDDO

In this example, the references to B create a carried dependence.

For alignment to be successful in this case, we would need to interchange the order of the two statements in the loop body.

Page 9: Loop alignment

However, the loop-independent dependence involving A prevents interchanging the statements before alignment, so our hope is that we can do the alignment and statement interchange in a single step to eliminate the carried dependence:

DO I = 1, N+1
   IF (I .NE. 1) B(I) = A(I-1) + D
   IF (I .NE. N+1) A(I) = B(I) + C
ENDDO

Although B is now aligned, the references to A are misaligned, creating a new carried dependence.

Looking at this example, it is reasonable to believe that loop alignment cannot eliminate carried dependences in a recurrence.
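The recurrence case can be simulated in the same way. This Python sketch (an added illustration; the scalar values c = 1.0 and d = 2.0, N = 5, and the array contents are assumptions) confirms that the aligned and interchanged loop computes the same values as the original when run in order, but not when its iterations are reordered, so a carried dependence is still present.

```python
def original(A, B, c, d, N):
    """A(I) = B(I) + C; B(I+1) = A(I) + D for I = 1..N (C and D are scalars)."""
    A, B = A[:], B[:]
    for I in range(1, N + 1):
        A[I] = B[I] + c
        B[I + 1] = A[I] + d
    return A, B


def aligned(A, B, c, d, N, order):
    """Aligned and interchanged loop over I = 1..N+1, visited in `order`."""
    A, B = A[:], B[:]
    for I in order:
        if I != 1:
            B[I] = A[I - 1] + d
        if I != N + 1:
            A[I] = B[I] + c
    return A, B


N = 5
# Index 0 is unused padding so that indices match the Fortran code.
A0 = [0.0] * (N + 1)                                # A(1..N)
B0 = [0.0] + [float(i) for i in range(1, N + 2)]    # B(1..N+1)

ref = original(A0, B0, 1.0, 2.0, N)
forward_ok = aligned(A0, B0, 1.0, 2.0, N, range(1, N + 2)) == ref
reversed_ok = aligned(A0, B0, 1.0, 2.0, N, range(N + 1, 0, -1)) == ref
print(forward_ok, reversed_ok)  # True False
```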

Page 10: Loop alignment

Theorem

Alignment, replication, and statement reordering are sufficient to eliminate all carried dependences in a single loop that contains no recurrence and in which the distance of each dependence is a constant independent of the loop index.

We can establish this constructively. Let G = (V, E, δ) be a weighted graph. Each v ∈ V is a statement, and δ(v1, v2) is the dependence distance between v1 and v2. Let o: V → Z give the offset of each vertex. G is said to be carry free if o(v1) + δ(v1, v2) = o(v2) for every edge (v1, v2) ∈ E.
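The carry-free condition is easy to state in code. The sketch below is an added illustration: the function name is_carry_free, the edge-list representation, and the statement labels S1 and S2 are assumptions. An offset is read here as the amount added to the loop index inside a statement.

```python
def is_carry_free(edges, offset):
    """edges: iterable of (v1, v2, distance); offset: dict statement -> int."""
    return all(offset[v1] + dist == offset[v2] for v1, v2, dist in edges)


# First example of the deck: S1 defines A(I) and S2 uses A(I-1),
# so there is one dependence S1 -> S2 with distance 1.
edges = [("S1", "S2", 1)]

unaligned = is_carry_free(edges, {"S1": 0, "S2": 0})
shifted = is_carry_free(edges, {"S1": 0, "S2": 1})
print(unaligned, shifted)  # False True
```

The offsets 0 and 1 correspond to leaving A(I) = B(I)+C(I) unchanged and rewriting the second statement as D(I+1) = A(I)*2.0.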

Page 11: Loop alignment

Why can carried dependences that are not involved in a recurrence not always be eliminated by alignment without introducing new carried dependences?

Because of the possibility of an alignment conflict: two or more dependences that cannot be simultaneously aligned.

Consider the following example:

DO I = 1, N
   A(I+1) = B(I) + C
   X(I) = A(I+1) + A(I)
ENDDO

This loop contains two dependences involving the array A, one loop-independent dependence and a loop-carried dependence.

Page 12: Loop alignment

If the statements are aligned to eliminate the carried dependence, the following code results:

DO I = 0, N
   IF (I .NE. 0) A(I+1) = B(I) + C
   IF (I .NE. N) X(I+1) = A(I+2) + A(I+1)
ENDDO

The original loop-carried dependence has been eliminated, but eliminating it has turned the original loop-independent dependence into a loop-carried one: A(I+2) is read in iteration I but assigned in iteration I+1. The loop still cannot be correctly run in parallel.
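The conflict can also be seen from the dependence distances alone. In this added Python sketch (the edge-list representation and the labels S1 and S2 are assumptions), both dependences run from the statement that defines A to the statement that uses it, one with distance 0 and one with distance 1, so no single offset difference between the two statements can satisfy both.

```python
# S1: A(I+1) = B(I) + C        S2: X(I) = A(I+1) + A(I)
# The use of A(I+1) gives distance 0; the use of A(I) gives distance 1.
edges = [("S1", "S2", 0), ("S1", "S2", 1)]

# Each edge (v1, v2, dist) requires o(v2) - o(v1) == dist.
required = {}
for v1, v2, dist in edges:
    required.setdefault((v1, v2), set()).add(dist)

conflicts = {pair: sorted(dists) for pair, dists in required.items() if len(dists) > 1}
print(conflicts)  # {('S1', 'S2'): [0, 1]}
```

This check only catches conflicts between dependences that join the same pair of statements; conflicts that arise along different paths through the graph would need the offsets to be propagated.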

Page 13: Loop alignment

Alignment Procedure

procedure Align(V, E, δ, o)
   while V is not empty
      remove element v from V
      for each (w, v) ∈ E
         if w ∈ V
            W ← W ∪ {w}
            o(w) ← o(v) - δ(w, v)
         else if o(w) != o(v) - δ(w, v)
            create vertex w'
            replace (w, v) with (w', v)
            replicate all edges into w onto w'
            W ← W ∪ {w'}
            o(w') ← o(v) - δ(w, v)
      for each (v, w) ∈ E
         if w ∈ V
            W ← W ∪ {w}
            o(w) ← o(v) + δ(v, w)
         else if o(w) != o(v) + δ(v, w)
            create vertex v'
            replace (v, w) with (v', w)
            replicate edges into v onto v'
            W ← W ∪ {v'}
            o(v') ← o(w) - δ(v, w)
end align
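The procedure can be sketched in Python as follows. This is one possible reading, not the authors' implementation: the explicit worklist, the starting offset of 0, the naming of replicated statements, and the edge-list representation are all assumptions, and the sketch has only been tried on small graphs without recurrences.

```python
def align(vertices, edges):
    """Assign offsets, replicating statements when an alignment conflict occurs.

    vertices: iterable of statement names; edges: list of (src, dst, distance).
    Returns (offsets, edges) with every returned edge carry free.
    """
    edges = list(edges)
    unvisited = set(vertices)
    offsets = {}
    copies = 0
    while unvisited:
        start = min(unvisited)           # arbitrary but deterministic choice
        unvisited.remove(start)
        offsets[start] = 0
        work = [start]
        while work:
            v = work.pop()
            # Edges into v: each source w should sit at o(v) - distance.
            for i, (w, dst, dist) in enumerate(list(edges)):
                if dst != v:
                    continue
                want = offsets[v] - dist
                if w in unvisited:
                    unvisited.remove(w)
                    offsets[w] = want
                    work.append(w)
                elif offsets[w] != want:
                    copies += 1
                    w2 = f"{w}_copy{copies}"        # replicated statement
                    edges[i] = (w2, v, dist)
                    edges += [(u, w2, d) for u, x, d in edges if x == w]
                    offsets[w2] = want
                    work.append(w2)
            # Edges out of v: each sink w should sit at o(v) + distance.
            for i, (src, w, dist) in enumerate(list(edges)):
                if src != v:
                    continue
                if w in unvisited:
                    unvisited.remove(w)
                    offsets[w] = offsets[v] + dist
                    work.append(w)
                elif offsets[w] != offsets[v] + dist:
                    copies += 1
                    v2 = f"{v}_copy{copies}"        # replicated statement
                    edges[i] = (v2, w, dist)
                    edges += [(u, v2, d) for u, x, d in edges if x == v]
                    offsets[v2] = offsets[w] - dist
                    work.append(v2)
    return offsets, edges


# Alignment-conflict example: two dependences S1 -> S2, distances 0 and 1.
offsets, new_edges = align(["S1", "S2"], [("S1", "S2", 0), ("S1", "S2", 1)])
carry_free = all(offsets[a] + d == offsets[b] for a, b, d in new_edges)
print(offsets, carry_free)
```

On the conflict example the sketch keeps S1 and S2 at offset 0 and creates one replica of S1 at offset -1, after which both dependences satisfy the carry-free condition.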

Page 14: Loop alignment

References

[1] Randy Allen and Ken Kennedy, "Optimizing Compilers for Modern Architectures", Chapter 6: Creating Coarse-Grained Parallelism, 1st Edition.