CS4402 – Parallel Computing
Lecture 9 – Sorting Algorithms (2)
Compare and Exchange Operation
Compare and Exchange Sorting
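Compare and Exchange Operation
Takes place between two processors, rank1 and rank2.
Each processor keeps a sorted sub-array a = (a[i], i = 0, 1, …, n-1).
After the operation rank1 holds the n smallest elements and rank2 the n largest.
if (rank == rank1) {
    /* send my block to rank2, receive its block, keep the lower half of the merge */
    MPI_Send(a, n, MPI_INT, rank2, tag1, MPI_COMM_WORLD);
    MPI_Recv(b, n, MPI_INT, rank2, tag2, MPI_COMM_WORLD, &status);
    c = merge(n, a, n, b);
    for (i = 0; i < n; i++) a[i] = c[i];
}
if (rank == rank2) {
    /* the partner here is rank1; keep the upper half of the merge */
    MPI_Send(a, n, MPI_INT, rank1, tag2, MPI_COMM_WORLD);
    MPI_Recv(b, n, MPI_INT, rank1, tag1, MPI_COMM_WORLD, &status);
    c = merge(n, a, n, b);
    for (i = 0; i < n; i++) a[i] = c[i + n];
}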
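The slide calls a helper merge(n, a, n, b) without showing it. A minimal sketch of such a helper follows, assuming both inputs are already sorted int arrays and that the result is a newly allocated array which the caller frees. The ownership convention is an assumption; the slide does not specify it.

```c
#include <assert.h>
#include <stdlib.h>

/* Merge sorted a[0..n-1] and sorted b[0..m-1] into a new sorted array of
   n+m ints. The caller frees the result. Returns NULL if malloc fails. */
int *merge(int n, const int *a, int m, const int *b) {
    int i = 0, j = 0, k = 0;
    int *c;
    assert(n >= 0 && m >= 0);
    c = malloc(((size_t)n + (size_t)m) * sizeof *c);
    if (c == NULL) return NULL;
    /* take the smaller head element until one input is exhausted */
    while (i < n && j < m) c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    /* copy whatever remains of the other input */
    while (i < n) c[k++] = a[i++];
    while (j < m) c[k++] = b[j++];
    return c;
}
```

With this helper, rank1 copies c[0..n-1] back into a and rank2 copies c[n..2n-1], as in the slide.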
Compare and Exchange Operation
Complexity?
What amount of computation is being used?
What amount of communication takes place?
Can you find arguments to prove that this is optimal or efficient?
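One possible way to start on these questions is sketched below. It assumes that $T_{com}$ and $T_{comm}$ denote the per-element computation and communication costs (an interpretation; the slides use the symbols without defining them) and that the merge writes each of the 2n elements once.

```latex
T_{C\&E}(n) \;\approx\; 2n\,T_{comm} \;+\; 2n\,T_{com}
```

The first term counts the n elements sent plus the n elements received by one processor. The second term counts the 2n elements produced by the merge. Both terms are linear in n.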
Compare and Exchange Algorithms
Step 1. The array is scattered into p sub-arrays, one per processor.
Step 2. Each processor sorts its sub-array. From then on the processors keep their sub-arrays sorted at all times.
Step 3. While the array is not sorted, perform compare and exchange between some pairs of processors.
Step 4. Gather the sub-arrays to restore a sorted array.
Bubble Sort
Odd-Even Sort
1. Scatter the array onto the processors.
2. Sort each sub-array aa.
3. Repeat for step = 0, 1, 2, …, p-1:
if (step is odd) {
    if (rank is odd) exchange(aa, n/size, rank, rank+1);
    if (rank is even) exchange(aa, n/size, rank-1, rank);
}
if (step is even) {
    if (rank is even) exchange(aa, n/size, rank, rank+1);
    if (rank is odd) exchange(aa, n/size, rank-1, rank);
}
An exchange is skipped when the neighbor rank-1 or rank+1 does not exist.
4. Gather the sub-arrays back to the root.
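The steps above can be tried without MPI. The sketch below is a serial simulation, not the lecture's implementation: block r of a single array plays the role of processor r, and the hypothetical helpers compare_exchange and odd_even_block_sort stand in for the exchange routine and the main loop. It assumes int data and p blocks of equal size m.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints */
static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Compare-and-exchange between two sorted blocks of m ints:
   afterwards lo holds the m smallest values and hi the m largest. */
static void compare_exchange(int *lo, int *hi, int m) {
    int *c = malloc(2 * (size_t)m * sizeof *c);
    int i = 0, j = 0, k = 0;
    assert(c != NULL);
    while (i < m && j < m) c[k++] = (lo[i] <= hi[j]) ? lo[i++] : hi[j++];
    while (i < m) c[k++] = lo[i++];
    while (j < m) c[k++] = hi[j++];
    memcpy(lo, c, (size_t)m * sizeof *c);
    memcpy(hi, c + m, (size_t)m * sizeof *c);
    free(c);
}

/* Serial simulation: a has p*m ints, block r plays the role of processor r. */
void odd_even_block_sort(int *a, int p, int m) {
    int r, step;
    assert(p >= 1 && m >= 1);
    /* step 2 of the algorithm: every "processor" sorts its own block */
    for (r = 0; r < p; r++) qsort(a + r * m, (size_t)m, sizeof *a, cmp_int);
    /* step 3: p rounds; even steps pair (0,1),(2,3),..., odd steps (1,2),(3,4),... */
    for (step = 0; step < p; step++) {
        for (r = step % 2; r + 1 < p; r += 2)
            compare_exchange(a + r * m, a + (r + 1) * m, m);
    }
}
```

In the real program the inner loop over r disappears, because every processor handles only its own pair in each round.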
Odd-Even Sort
Simple Remarks:
- Odd-Even Sort uses size rounds of exchange, where size is the number of processors.
- Odd-Even Sort keeps all processors busy … or almost all.
- The complexity is given by:
  - Scatter and Gather of the array: n/size elements per processor
  - Sorting the local array: n/size elements
  - Compare and Exchange: size rounds, each involving n/size elements
if (rank == 0) {
    /* the root creates n random values in [0, m] */
    array = (double *) calloc(n, sizeof(double));
    srand((unsigned) time(NULL) + rank);
    for (x = 0; x < n; x++) array[x] = ((double) rand() / RAND_MAX) * m;
}
/* every processor receives a block of n/size values and sorts it */
MPI_Scatter(array, n/size, MPI_DOUBLE, a, n/size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
merge_sort(n/size, a);
/* size rounds of compare and exchange with alternating neighbors */
for (i = 0; i < size; i++) {
    if ((i + rank) % 2 == 0) {
        if (rank < size-1) exchange(n/size, a, rank, rank+1, MPI_COMM_WORLD);
    } else {
        if (rank > 0) exchange(n/size, a, rank-1, rank, MPI_COMM_WORLD);
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
MPI_Gather(a, n/size, MPI_DOUBLE, array, n/size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
if (rank == 0) {
    for (x = 0; x < n; x++) printf("Output : %f\n", array[x]);
}
Comments on Odd-Even
Features of the algorithm:
- Simple and quite efficient.
- After p steps of compare and exchange the array is sorted.
  - Why???
- The number of steps can be reduced by testing “array sorted” after each step, but it is still O(p).
- C&E operations take place only between neighbors.
Can we do C&E operations between other processors?
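Odd-Even Sort Complexity
Stage 0. To sort the scattered array:
$$\frac{n}{p}\log\frac{n}{p}\,T_{com}$$
Stage 1. Odd-Even for p levels:
$$p\left(2\frac{n}{p}\,T_{comm} + 2\frac{n}{p}\,T_{com}\right) = 2n\,T_{comm} + 2n\,T_{com}$$
Scatter and Gather:
$$2\frac{n}{p}\,T_{comm}$$
Total computation complexity:
$$\left(\frac{n}{p}\log\frac{n}{p} + 2n\right)T_{com} + \left(2\frac{n}{p} + 2n\right)T_{comm}$$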
isSorted(n, a, comm)
The parallel routine int isSorted(int n, double *a, MPI_Comm comm):
1. Tests whether the local arrays of all the processors are in order.
2. That is, if rank1 < rank2 then the elements of rank1 are not greater than the elements of rank2.
3. If the answer is yes then no exchange is needed.
How to do it?
1. The test is done at the root.
2. The test is done collectively by all processors.
isSorted(n,a,comm) – Strategy 1
The test is done collectively by all processors.
1. Send the last element, a[n-1], to the right processor.
2. Receive last from the left processor.
3. If last > a[0] then answer = 0, otherwise answer = 1.
4. MPI_Allreduce the answer by using MPI_MIN.
isSorted(n,a,comm) – Strategy 2
The test is done at the root.
1. Gather the first elements to the root.
2. Gather the last elements to the root.
3. If rank == 0 then, for i = 0, 1, …, size-2:
- test if last[i] > first[i+1]; if so, answer = 0
4. Broadcast the answer.
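The root-side part of this strategy can be sketched in plain C. The sketch assumes the two gathers have already filled the arrays first and last at the root. The function name blocks_in_order is hypothetical, and the MPI_Gather and MPI_Bcast calls around it are left out.

```c
#include <assert.h>

/* first[r] and last[r] are the smallest and largest element of the sorted
   block held by processor r. Returns 1 if the blocks are globally in
   order, 0 as soon as one boundary is out of order. */
int blocks_in_order(int size, const double *first, const double *last) {
    int i;
    assert(size >= 1);
    /* size-1 boundary tests, one per pair of neighboring processors */
    for (i = 0; i + 1 < size; i++)
        if (last[i] > first[i + 1]) return 0;
    return 1;
}
```

Only 2 values per processor travel to the root, so the test is cheap compared with a round of compare and exchange on n/size elements.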
Shell Sort
It is based on the notion of “shell/group” of consecutive processors.
- C&E take place between processors at equal distances from the two ends of a shell.
- The shell is then divided into 2.
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15) #(shell)=p
(0 1 2 3 4 5 6 7) (8 9 10 11 12 13 14 15) #(shell)=p/2
(0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15) #(shell)=p/4
(0 1) (2 3) (4 5) (6 7) (8 9) (10 11) (12 13) (14 15) #(shell)=p/8
- There are log(p) levels of division.
For the level l we have:
- there are pow(2,l) shells, each of size p/pow(2,l).
- The shell k contains the processors
$$k\frac{p}{2^l},\quad k\frac{p}{2^l}+1,\quad \ldots,\quad (k+1)\frac{p}{2^l}-1$$
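The index formula can be checked with a few helper functions. This is a sketch under two assumptions: p is a power of two, and the C&E partner of a processor is its mirror image inside the shell (first pairs with last, second with second-to-last, and so on). The function names are hypothetical.

```c
#include <assert.h>

/* First and last rank of shell k at level l, for p = 2^d processors. */
int shell_first(int p, int l, int k) { return k * (p >> l); }
int shell_last(int p, int l, int k)  { return (k + 1) * (p >> l) - 1; }

/* Partner of rank at level l: the processor at the same distance from
   the opposite end of the shell that contains rank. */
int shell_partner(int p, int l, int rank) {
    int size = p >> l;   /* shell size p / 2^l */
    int k;
    assert(size >= 1 && rank >= 0 && rank < p);
    k = rank / size;     /* index of the shell containing rank */
    return shell_first(p, l, k) + shell_last(p, l, k) - rank;
}
```

For p = 16 this gives, for example, partner 15 for rank 0 at level 0 and partner 14 for rank 9 at level 1, matching the shells listed above.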
Shell Sort
Shell Sort is based on two stages:
Stage 1. Divide the shells
for l = 0, 1, 2, …, log(p)-1
- exchange in parallel between extreme processors in each shell.
Stage 2. Odd-Even
for l = 0, 1, 2, …, p
- if rank and l are both even then exchange in parallel between rank and rank+1
- if rank and l are both odd then exchange in parallel between rank and rank+1
- test “array sorted” and stop as soon as it holds
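Shell Sort Complexity
Stage 0. To sort the scattered array:
$$\frac{n}{p}\log\frac{n}{p}\,T_{com}$$
Stage 1. Odd-Even for l levels:
$$2\frac{n}{p}\,l\,T_{comm} + 2\frac{n}{p}\,l\,T_{com} \approx 2\frac{n}{p}\log^2 p\,T_{comm} + 2\frac{n}{p}\log^2 p\,T_{com}$$
Catch: the average value of l is in this case O(log^2(p)), so that on average the shell can be …
Scatter and Gather:
$$2\frac{n}{p}\,T_{comm}$$
Total computation complexity:
$$\left(\frac{n}{p}\log\frac{n}{p} + 2\frac{n}{p}\log^2 p\right)T_{com} + \left(2\frac{n}{p} + 2\frac{n}{p}\log^2 p\right)T_{comm}$$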
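Complexity Comparison for Parallel Sorting
Odd-Even Sort
$$\left(\frac{n}{p}\log\frac{n}{p} + 2n\right)T_{com} + \left(2\frac{n}{p} + 2n\right)T_{comm}$$
Shell Sort
$$\left(\frac{n}{p}\log\frac{n}{p} + 2\frac{n}{p}\log^2 p\right)T_{com} + \left(2\frac{n}{p} + 2\frac{n}{p}\log^2 p\right)T_{comm}$$
Merge Sort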
$$\left(\frac{n}{p}\log\frac{n}{p} + 2n - 2\frac{n}{p}\right)T_{com} + \left(2n - 2\frac{n}{p}\right)T_{comm}$$
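To get a feel for how the Odd-Even and Shell totals compare, the sketch below evaluates both cost models for given n and p. It assumes p and n/p are powers of two, takes log as log base 2, and uses log^2(p) as the average number of odd-even rounds in Shell Sort. The function names and the unit costs passed in are illustrative only.

```c
#include <assert.h>

/* floor(log2(x)) for x >= 1, kept integer so no math library is needed */
static int ilog2(long x) {
    int r = 0;
    assert(x >= 1);
    while (x > 1) { x /= 2; r++; }
    return r;
}

/* Odd-Even total: (n/p log(n/p) + 2n) t_com + (2n/p + 2n) t_comm */
double odd_even_cost(long n, long p, double t_com, double t_comm) {
    double b;
    assert(p >= 1 && n >= p);
    b = (double)n / (double)p;   /* block size n/p */
    return (b * ilog2(n / p) + 2.0 * n) * t_com
         + (2.0 * b + 2.0 * n) * t_comm;
}

/* Shell total:
   (n/p log(n/p) + 2 n/p log^2 p) t_com + (2n/p + 2 n/p log^2 p) t_comm */
double shell_cost(long n, long p, double t_com, double t_comm) {
    double b, l;
    assert(p >= 1 && n >= p);
    b = (double)n / (double)p;
    l = (double)ilog2(p) * (double)ilog2(p);   /* average rounds, log^2 p */
    return (b * ilog2(n / p) + 2.0 * b * l) * t_com
         + (2.0 * b + 2.0 * b * l) * t_comm;
}
```

The two models differ only in the number of exchange rounds, p against log^2(p). They coincide for p = 16, and the Shell model is cheaper for larger powers of two.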
Assignment
Description: Write an MPI program to sort an array:
1. Use an MPI method to compare and exchange.
2. Use an MPI method to test isSorted().
3. Use the odd-even sort.
4. Evaluate the performance of the program in a readme.doc.
General Points:
1. It is for 10% of the marks.
2. Deadline on Monday 2/12/2013 at 5 pm.
3. The following elements must be submitted by email to [email protected]:
   1. The C program, named with your name and student number, e.g. SabinTabirca_111111111.c.
   2. The Makefile.
   3. Readme.doc, in which you have 1) to give your student details and 2) to state the performance.