loop alignment

Loop Alignment (Advanced Compilers)

By Isha Pandya and Sumita Das

Created by Sumita Das

Introduction

Loop distribution eliminates loop-carried dependences by executing the sources of all dependences before executing any sinks.

Many carried dependences are due to array alignment issues.

If we can align all references, then the dependences go away, and parallelism is possible.

For example:

    DO I = 2,N
        A(I) = B(I)+C(I)
        D(I) = A(I-1)*2.0
    ENDDO


This loop cannot be run in parallel because the value of A computed on iteration I is used on iteration I+1.

The two statements can be aligned to compute and use the values in the same iteration by adding an extra iteration and adjusting the indices of one of the statements to produce:

    DO I = 1,N
        IF (I .GT. 1) A(I) = B(I)+C(I)
        IF (I .LT. N) D(I+1) = A(I)*2.0
    ENDDO
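The effect of alignment can be checked with a small simulation. The sketch below is in Python rather than Fortran (lists are used 1-based, index 0 is unused); the function names `original` and `aligned` and the test data are illustrative assumptions. It runs the aligned loop over I = 1..N with the guards I > 1 and I < N, which assign exactly A(2..N) and D(2..N), once in sequential order and once in shuffled order, and compares both with the original loop.

```python
import random


def original(a, b, c, d, n):
    """DO I = 2,N: A(I) = B(I)+C(I); D(I) = A(I-1)*2.0"""
    a, d = a[:], d[:]
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0
    return a, d


def aligned(a, b, c, d, n, order):
    """Aligned loop over I = 1..N, iterations executed in the given order."""
    a, d = a[:], d[:]
    for i in order:
        if i > 1:
            a[i] = b[i] + c[i]
        if i < n:
            d[i + 1] = a[i] * 2.0   # uses the A(I) written just above
    return a, d


n = 8
rng = random.Random(0)
a0, b0, c0, d0 = ([rng.random() for _ in range(n + 2)] for _ in range(4))

expected = original(a0, b0, c0, d0, n)
order = list(range(1, n + 1))
sequential_ok = aligned(a0, b0, c0, d0, n, order) == expected
rng.shuffle(order)                  # stand-in for an arbitrary parallel schedule
shuffled_ok = aligned(a0, b0, c0, d0, n, order) == expected
print(sequential_ok, shuffled_ok)
```

Each aligned iteration reads only the A(I) it has just written (or A(1), which is never written), so the order of the iterations should not matter, which is what makes parallel execution possible.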


Illustration of Loop Alignment


Alignment

Loop alignment does incur some overhead: one extra loop iteration and the extra work required to test the conditionals.

This overhead can be reduced by executing the last iteration of the first statement with the first iteration of the second statement:

    DO I = 2,N
        J = MOD(I+N-4,N-1)+2
        A(J) = B(J)+C(J)
        D(I) = A(I-1)*2.0
    ENDDO

For every iteration other than the first, J is one less than I, so the assignment to A writes the element A(I-1) that the second statement reads in the same iteration.

On the first iteration (I = 2), J = MOD(N-2,N-1)+2 = N, so the assignment to the last location of A is executed there.

As a result, the total number of loop iterations is restored to its original count, but there is still the overhead of the MOD calculation.
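The index mapping produced by the MOD expression can be tabulated directly. This is a minimal sketch in Python; the helper name `wrapped_index` and the choice N = 6 are assumptions made for illustration. For the non-negative operands used here, Python's `%` gives the same result as Fortran's MOD.

```python
def wrapped_index(i, n):
    """J = MOD(I+N-4, N-1) + 2 from the loop body."""
    return (i + n - 4) % (n - 1) + 2


n = 6
pairs = [(i, wrapped_index(i, n)) for i in range(2, n + 1)]
for i, j in pairs:
    print(f"I = {i}  ->  J = {j}")

# Every element A(2..N) is assigned exactly once over the whole loop.
covered = sorted(j for _, j in pairs)
```

The first iteration wraps around to J = N, and every later iteration gets J = I-1.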


Alternatively, the conditional statements can be eliminated without adding calls to MOD by peeling off the first execution of the second statement and the last execution of the first statement, yielding:

    D(2) = A(1) * 2.0
    DO I = 2, N-1
        A(I) = B(I) + C(I)
        D(I+1) = A(I) * 2.0
    ENDDO
    A(N) = B(N) + C(N)

This form permits efficient parallelism with the added overhead of two statements, one before and one after the loop, that cannot be executed in parallel.
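As a rough check of the peeled form, the sketch below runs the loop body on a thread pool and compares the result with the original sequential loop. It is written in Python with 1-based list indices; the names `original`, `peeled`, and `body`, the pool size, and the input data are all assumptions for illustration, not part of the source.

```python
from concurrent.futures import ThreadPoolExecutor


def original(a, b, c, d, n):
    """DO I = 2,N: A(I) = B(I)+C(I); D(I) = A(I-1)*2.0"""
    a, d = a[:], d[:]
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0
    return a, d


def peeled(a, b, c, d, n):
    a, d = a[:], d[:]
    d[2] = a[1] * 2.0                      # peeled first execution of statement 2

    def body(i):                           # one iteration of DO I = 2, N-1
        a[i] = b[i] + c[i]
        d[i + 1] = a[i] * 2.0

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(body, range(2, n)))  # iterations may run in any order

    a[n] = b[n] + c[n]                     # peeled last execution of statement 1
    return a, d


n = 10
a0 = [0.5 * k for k in range(n + 2)]
b0 = [float(k) for k in range(n + 2)]
c0 = [2.0 * k for k in range(n + 2)]
d0 = [0.0] * (n + 2)

matches = peeled(a0, b0, c0, d0, n) == original(a0, b0, c0, d0, n)
print(matches)
```

No iteration of the remaining loop touches an element that another iteration reads or writes, so the schedule chosen by the pool should not affect the result.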


It is not possible to use alignment to eliminate all carried dependences in a loop if the carried dependence is involved in a recurrence, as the following example shows:

    DO I = 1, N
        A(I) = B(I) + C
        B(I+1) = A(I) + D
    ENDDO

In this example, the references to B create a carried dependence.

For alignment to be successful in this case, we would need to interchange the order of the two statements in the loop body.


However, the loop-independent dependence involving A prevents interchanging the statements before alignment, so our hope is that we can do the alignment and statement interchange in a single step to eliminate the carried dependence:

    DO I = 1, N+1
        IF (I .NE. 1) B(I) = A(I-1) + D
        IF (I .NE. N+1) A(I) = B(I) + C
    ENDDO

Although B is now aligned, the references to A are misaligned, creating a new carried dependence.

Looking at this example, it is reasonable to believe that loop alignment cannot eliminate carried dependences in a recurrence.
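One way to see why the recurrence resists alignment is to look at what an offset does to each dependence distance. The sketch below is a hedged illustration, not the source's argument: it assumes that shifting statement u by offset o(u) changes the distance of a dependence u -> v from d to d + o(u) - o(v), and the labels S1 and S2 are names introduced here for the two statements. Under that assumption the distances around the cycle always add up to the same total, so they cannot all become zero.

```python
from itertools import product

# Dependence edges (source, sink, distance) for the recurrence example:
#   S1: A(I)   = B(I) + C
#   S2: B(I+1) = A(I) + D
edges = [
    ("S1", "S2", 0),   # A(I) is written and read in the same iteration
    ("S2", "S1", 1),   # B(I+1) is read one iteration after it is written
]


def aligned_distances(edges, offset):
    """Distance of every dependence after shifting each statement by its offset."""
    return [d + offset[u] - offset[v] for u, v, d in edges]


results = {
    (o1, o2): aligned_distances(edges, {"S1": o1, "S2": o2})
    for o1, o2 in product(range(-3, 4), repeat=2)
}

all_sum_to_one = all(sum(r) == 1 for r in results.values())
any_carry_free = any(all(x == 0 for x in r) for r in results.values())
print(all_sum_to_one, any_carry_free)
```

The offsets (0, -1) correspond to the aligned loop shown above: the distance on B drops to 0 while the distance on A rises to 1.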


Theorem

Alignment, replication, and statement reordering are sufficient to eliminate all carried dependences in a single loop that contains no recurrence and in which the distance of each dependence is a constant independent of the loop index.

We can establish this constructively. Let G = (V, E, δ) be a weighted graph. Each v ∈ V is a statement, and δ(v1, v2) is the dependence distance between v1 and v2. Let o: V → Z give the offset of each vertex. G is said to be carry free if o(v1) + δ(v1, v2) = o(v2) for every edge (v1, v2) ∈ E.
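As a worked instance of the carry-free condition, take the first loop of these slides, with S1 standing for A(I) = B(I)+C(I) and S2 for D(I) = A(I-1)*2.0. The labels S1 and S2, and the reading of an offset as the amount added to a statement's index expressions, are assumptions made here for illustration.

```latex
\begin{aligned}
V &= \{S_1, S_2\}, \qquad E = \{(S_1, S_2)\}, \qquad \delta(S_1, S_2) = 1 \\
o(S_1) &= 0 \;\Longrightarrow\; o(S_2) = o(S_1) + \delta(S_1, S_2) = 1
\end{aligned}
```

An offset of 1 for S2 matches the aligned statement D(I+1) = A(I)*2.0, in which every index of S2 is raised by one.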


Carried dependences that are not involved in a recurrence cannot always be eliminated by alignment alone without introducing new carried dependences. The reason is the possibility of an alignment conflict: two or more dependences that cannot be simultaneously aligned.

Consider the following example:

    DO I = 1, N
        A(I+1) = B(I) + C
        X(I) = A(I+1) + A(I)
    ENDDO

This loop contains two dependences involving the array A, one loop-independent dependence and a loop-carried dependence.


If the statements are aligned to eliminate the carried dependence, the following code results:

    DO I = 0, N
        IF (I .NE. 0) A(I+1) = B(I) + C
        IF (I .NE. N) X(I+1) = A(I+2) + A(I+1)
    ENDDO

The original loop-carried dependence has been eliminated, but eliminating it has transformed the original loop-independent dependence into a loop-carried dependence: X(I+1) reads A(I+2) in iteration I, while A(I+2) is assigned in iteration I+1. The loop still cannot be correctly run in parallel.
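The conflict can be made visible by listing, for each element of A that is both written and read inside the loop, how many iterations separate the read from the write. This is a small Python sketch; the helper `access_distances` and the value N = 6 are assumptions for illustration. A distance of 0 means both accesses fall in the same iteration; a negative distance means the read happens in an earlier iteration than the write.

```python
def access_distances(lo, hi, writes, reads):
    """Set of (reader iteration - writer iteration) over the loop I = lo..hi."""
    writer = {}
    for i in range(lo, hi + 1):
        for elem in writes(i):
            writer[elem] = i               # iteration that assigns A(elem)
    return {
        i - writer[elem]
        for i in range(lo, hi + 1)
        for elem in reads(i)
        if elem in writer
    }


n = 6

# Original loop, I = 1..N: A(I+1) = ...; X(I) = A(I+1) + A(I)
original = access_distances(1, n, lambda i: [i + 1], lambda i: [i + 1, i])

# Aligned loop, I = 0..N, with the guards I != 0 and I != N
aligned = access_distances(
    0, n,
    lambda i: [i + 1] if i != 0 else [],
    lambda i: [i + 2, i + 1] if i != n else [],
)

print(sorted(original), sorted(aligned))
```

In both versions one of the two references to A stays in a different iteration from its assignment; alignment only moves the nonzero distance from one reference to the other.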


Alignment Procedure

    procedure Align(V, E, δ, o)
        while V is not empty
            remove element v from V
            for each (w,v) ∈ E
                if w ∈ V
                    W ← W ∪ {w}
                    o(w) ← o(v) - δ(w,v)
                else if o(w) != o(v) - δ(w,v)
                    create vertex w'
                    replace (w,v) with (w',v)
                    replicate all edges into w onto w'
                    W ← W ∪ {w'}
                    o(w') ← o(v) - δ(w,v)
            for each (v,w) ∈ E
                if w ∈ V
                    W ← W ∪ {w}
                    o(w) ← o(v) + δ(v,w)
                else if o(w) != o(v) + δ(v,w)
                    create vertex v'
                    replace (v,w) with (v',w)
                    replicate edges into v onto v'
                    W ← W ∪ {v'}
                    o(v') ← o(w) - δ(v,w)
    end Align
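The procedure can be sketched in Python as follows. This is one possible reading of it, not a definitive implementation: the worklist is seeded by giving the first statement of each connected component offset 0, edges are stored as (source, sink, distance) triples, replicated statements get generated names such as `S1'1`, and the graph is assumed to contain no cycle (no recurrence). All of these choices, and the function names, are assumptions made here.

```python
def align(vertices, edges):
    """Assign an offset to every statement, replicating statements on conflict.

    Returns (offset, edges) for the possibly enlarged graph.
    Assumes the dependence graph is acyclic.
    """
    edges = [list(e) for e in edges]       # mutable [source, sink, distance]
    unplaced = set(vertices)               # statements without an offset yet
    offset = {}
    copies = 0

    def replicate(u, new_offset):
        # The copy computes the same value as u, so it needs the same inputs:
        # every edge into u is duplicated onto the copy.
        nonlocal copies
        copies += 1
        clone = f"{u}'{copies}"
        offset[clone] = new_offset
        edges.extend([s, clone, d] for s, t, d in list(edges) if t == u)
        return clone

    for root in vertices:                  # one seed per connected component
        if root not in unplaced:
            continue
        unplaced.remove(root)
        offset[root] = 0
        work = [root]
        while work:
            v = work.pop()
            k = 0
            while k < len(edges):          # the edge list may grow during the scan
                s, t, d = edges[k]
                k += 1
                if v not in (s, t):
                    continue
                other = s if t == v else t
                # Offset the other endpoint needs for this edge to be carry free.
                want = offset[v] - d if t == v else offset[v] + d
                if other in unplaced:
                    unplaced.remove(other)
                    offset[other] = want
                    work.append(other)
                elif offset[other] != want:
                    # Conflict: replicate the source so that this edge is aligned.
                    clone = replicate(s, offset[t] - d)
                    edges[k - 1][0] = clone
                    work.append(clone)
    return offset, edges


# Alignment-conflict example: two dependences from S1 to S2, distances 0 and 1.
offset, new_edges = align(["S1", "S2"], [("S1", "S2", 0), ("S1", "S2", 1)])
carry_free = all(offset[s] + d == offset[t] for s, t, d in new_edges)
print(offset, carry_free)
```

For the conflict example the sketch creates one copy of S1 with its own offset, so that each of the two dependences into S2 can be aligned separately.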


References

[1] Randy Allen and Ken Kennedy, "Optimizing Compilers for Modern Architectures", Chapter 6: Creating Coarse-Grained Parallelism, 1st Edition.
