ECE 1747H : Parallel Programming Message Passing (MPI)
TRANSCRIPT
ECE 1747H : Parallel Programming
Message Passing (MPI)
Explicit Parallelism
• Same thing as multithreading for shared memory.
• Explicit parallelism is more common with message passing.
  – The user has explicit control over processes.
  – Good: that control can be used for performance benefit.
  – Bad: the user has to deal with it.
Distributed Memory - Message Passing
[Slide diagram: processors proc1..procN, each with its own local memory mem1..memN, connected by a network]
Distributed Memory - Message Passing
• A variable x, a pointer p, or an array a[] refer to different memory locations, depending on the processor.
• In this course, we discuss message passing as a programming model (it can run on any hardware).
What does the user have to do?
• This is what we said for shared memory:
  – Decide how to decompose the computation into parallel parts.
  – Create (and destroy) processes to support that decomposition.
  – Add synchronization to make sure dependences are covered.
• Is the same true for message passing?
Another Look at SOR Example
for some number of timesteps/iterations {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 *
        ( grid[i-1][j] + grid[i+1][j] +
          grid[i][j-1] + grid[i][j+1] );
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
}
Shared Memory
[Slide diagram: proc1..procN all share the same grid and temp arrays, which are divided into blocks 1-4, one block per processor]
Message-Passing Data Distribution (only middle processes)
[Slide diagram: proc2 and proc3 each hold only their own blocks (2 and 3) of grid and temp in their private memories]
Is this going to work?
Same code as we used for shared memory
for( i=from; i<to; i++ )
for( j=0; j<n; j++ )
temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
+ grid[i][j-1] + grid[i][j+1]);
No, we need extra boundary elements for grid.
Data Distribution (only middle processes)
[Slide diagram: proc2 and proc3 hold their blocks (2 and 3) of grid and temp, with grid now extended by an extra boundary row above and below each block]
Is this going to work?
Same code as we used for shared memory
for( i=from; i<to; i++)
for( j=0; j<n; j++ )
temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
+ grid[i][j-1] + grid[i][j+1]);
No, on the next iteration we need boundary elements from our neighbors.
Data Communication (only middle processes)
[Slide diagram: proc2 and proc3 exchange the boundary rows of grid with their neighbours]
Is this now going to work?
Same code as we used for shared memory
for( i=from; i<to; i++ )
for( j=0; j<n; j++ )
temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
+ grid[i][j-1] + grid[i][j+1]);
No, we need to translate the indices.
Index Translation
for( i=0; i<n/p; i++)
for( j=0; j<n; j++ )
temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
+ grid[i][j-1] + grid[i][j+1]);
Remember, all variables are local.
Index Translation is Optional
• Allocate the full arrays on each processor.
• Leave indices alone.
• Higher memory use.
• Sometimes necessary (see later).
What does the user need to do?
• Divide up the program into parallel parts.
• Create and destroy processes to do the above.
• Partition and distribute the data.
• Communicate data at the right time.
• (Sometimes) perform index translation.
• Still need to do synchronization?
  – Sometimes, but it often goes hand in hand with data communication.
Message Passing Systems
• Provide process creation and destruction.
• Provide message passing facilities (send and receive, in various flavors) to distribute and communicate data.
• Provide additional synchronization facilities.
MPI (Message Passing Interface)
• Is the de facto message passing standard.
• Available on virtually all platforms.
• Grew out of an earlier message passing system, PVM, now outdated.
MPI Process Creation/Destruction
MPI_Init( int *argc, char ***argv )
  Initiates a computation.
MPI_Finalize()
  Terminates a computation.
MPI Process Identification
MPI_Comm_size( comm, &size )
Determines the number of processes.
MPI_Comm_rank( comm, &pid )
  Pid is the process identifier of the caller.
MPI Basic Send
MPI_Send(buf, count, datatype, dest, tag, comm)
buf: address of send buffer
count: number of elements
datatype: data type of send buffer elements
dest: process id of destination process
tag: message tag (ignore for now)
comm: communicator (ignore for now)
MPI Basic Receive
MPI_Recv(buf, count, datatype, source, tag, comm, &status)
buf: address of receive buffer
count: size of receive buffer in elements
datatype: data type of receive buffer elements
source: source process id or MPI_ANY_SOURCE
tag and comm: ignore for now
status: status object
MPI Matrix Multiply (w/o Index Translation)
main(int argc, char *argv[])
{
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
  MPI_Comm_size( MPI_COMM_WORLD, &p );
  from = (myrank * n)/p;
  to = ((myrank+1) * n)/p;
  /* Data distribution */ ...
  /* Computation */ ...
  /* Result gathering */ ...
  MPI_Finalize();
}
MPI Matrix Multiply (w/o Index Translation)
/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[from], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (w/o Index Translation)
/* Computation */
for( i=from; i<to; i++ )
  for( j=0; j<n; j++ ) {
    c[i][j] = 0;
    for( k=0; k<n; k++ )
      c[i][j] += a[i][k]*b[k][j];
  }
MPI Matrix Multiply (w/o Index Translation)
/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[from], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );
MPI Matrix Multiply (with Index Translation)
main(int argc, char *argv[])
{
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
  MPI_Comm_size( MPI_COMM_WORLD, &p );
  from = (myrank * n)/p;
  to = ((myrank+1) * n)/p;
  /* Data distribution */ ...
  /* Computation */ ...
  /* Result gathering */ ...
  MPI_Finalize();
}
MPI Matrix Multiply (with Index Translation)
/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[from], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (with Index Translation)
/* Computation */
for( i=0; i<n/p; i++ )
  for( j=0; j<n; j++ ) {
    c[i][j] = 0;
    for( k=0; k<n; k++ )
      c[i][j] += a[i][k]*b[k][j];
  }
MPI Matrix Multiply (with Index Translation)
/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[from], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );
Running an MPI Program
• mpirun <program_name> <arguments>
• Interacts with a daemon process on the hosts.
• Causes a Unix process to be run on each of the hosts.
• May only run in interactive mode (batch mode may be blocked by ssh)
ECE1747 Parallel Programming
Message Passing (MPI)
Global Operations
What does the user need to do?
• Divide the program into parallel parts.
• Create and destroy processes to do the above.
• Partition and distribute the data.
• Communicate data at the right time.
• (Sometimes) perform index translation.
• Still need to do synchronization?
  – Sometimes, but it often goes hand in hand with data communication.
MPI Process Creation/Destruction
MPI_Init( int *argc, char ***argv )
  Initiates a computation.
MPI_Finalize()
  Finalizes a computation.
MPI Process Identification
MPI_Comm_size( comm, &size )
Determines the number of processes.
MPI_Comm_rank( comm, &pid )
  Pid is the process identifier of the caller.
MPI Basic Send
MPI_Send(buf, count, datatype, dest, tag, comm)
buf: address of send buffer
count: number of elements
datatype: data type of send buffer elements
dest: process id of destination process
tag: message tag (ignore for now)
comm: communicator (ignore for now)
MPI Basic Receive
MPI_Recv(buf, count, datatype, source, tag, comm, &status)
buf: address of receive buffer
count: size of receive buffer in elements
datatype: data type of receive buffer elements
source: source process id or MPI_ANY_SOURCE
tag and comm: ignore for now
status: status object
Global Operations (1 of 2)
• So far, we have only looked at point-to-point or one-to-one message passing facilities.
• Often, it is useful to have one-to-many or many-to-one message communication.
• This is what MPI’s global operations do.
Global Operations (2 of 2)
• MPI_Barrier
• MPI_Bcast
• MPI_Gather
• MPI_Scatter
• MPI_Reduce
• MPI_Allreduce
Barrier
MPI_Barrier(comm)
Global barrier synchronization, as before: all processes wait until all have arrived.
Broadcast
MPI_Bcast(inbuf, incnt, intype, root, comm)
inbuf: address of input buffer (on root); address of output buffer (elsewhere)
incnt: number of elements
intype: type of elements
root: process id of root process
Before Broadcast
[Slide diagram: before the broadcast, only the root's inbuf holds the data]
After Broadcast
[Slide diagram: after the broadcast, every process's inbuf holds a copy of the data]
Scatter
MPI_Scatter(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements (sent to each process)
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements (received by each process)
outtype: type of output elements
root: process id of root process
Before Scatter
[Slide diagram: before the scatter, the root's inbuf holds the whole array and all outbufs are empty]
After Scatter
[Slide diagram: after the scatter, each process's outbuf holds one consecutive slice of the root's inbuf]
Gather
MPI_Gather(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements (sent by each process)
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements (received from each process, significant at the root)
outtype: type of output elements
root: process id of root process
Before Gather
[Slide diagram: before the gather, each process's inbuf holds one slice and the root's outbuf is empty]
After Gather
[Slide diagram: after the gather, the root's outbuf holds all the slices, concatenated in rank order]
Broadcast/Scatter/Gather
• Funny thing: these three primitives are sends and receives at the same time (a little confusing sometimes).
• Perhaps unintended consequence: they require global agreement on the layout of the array.
MPI Matrix Multiply (w/o Index Translation)
main(int argc, char *argv[])
{
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
  MPI_Comm_size( MPI_COMM_WORLD, &p );
  for( i=0; i<p; i++ ) {
    from[i] = (i * n)/p;
    to[i] = ((i+1) * n)/p;
  }
  /* Data distribution */ ...
  /* Computation */ ...
  /* Result gathering */ ...
  MPI_Finalize();
}
MPI Matrix Multiply (w/o Index Translation)
/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a[from[myrank]], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[from[i]], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (w/o Index Translation)
/* Computation */
for( i=from[myrank]; i<to[myrank]; i++ )
  for( j=0; j<n; j++ ) {
    c[i][j] = 0;
    for( k=0; k<n; k++ )
      c[i][j] += a[i][k]*b[k][j];
  }
MPI Matrix Multiply (w/o Index Translation)
/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c[from[myrank]], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[from[i]], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );
MPI Matrix Multiply Revised (1 of 2)
main(int argc, char *argv[])
{
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
  MPI_Comm_size( MPI_COMM_WORLD, &p );
  from = (myrank * n)/p;
  to = ((myrank+1) * n)/p;
  MPI_Scatter( a, n*n/p, MPI_INT, a, n*n/p, MPI_INT, 0, MPI_COMM_WORLD );
  MPI_Bcast( b, n*n, MPI_INT, 0, MPI_COMM_WORLD );
  ...
MPI Matrix Multiply Revised (2 of 2)
  ...
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
      c[i][j] = 0;
      for( k=0; k<n; k++ )
        c[i][j] += a[i][k]*b[k][j];
    }
  MPI_Gather( c[from], n*n/p, MPI_INT, c[from], n*n/p, MPI_INT, 0, MPI_COMM_WORLD );
  MPI_Finalize();
}
SOR Sequential Code
for some number of timesteps/iterations {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 *
        ( grid[i-1][j] + grid[i+1][j] +
          grid[i][j-1] + grid[i][j+1] );
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
}
MPI SOR
• Allocate grid and temp arrays.
• Use MPI_Scatter to distribute initial values, if any (requires non-local allocation).
• Use MPI_Gather to return the results to process 0 (requires non-local allocation).
• Focusing only on communication within the computational part ...
Data Communication (only middle processes)
[Slide diagram: proc2 and proc3 exchange the boundary rows of grid with their neighbours]
MPI SOR
for some number of timesteps/iterations {
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 *
        ( grid[i-1][j] + grid[i+1][j] +
          grid[i][j-1] + grid[i][j+1] );
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
  /* here comes communication */
}
MPI SOR Communication
if( myrank != 0 ) {
  MPI_Send( grid[from], n, MPI_DOUBLE, myrank-1, tag, MPI_COMM_WORLD );
  MPI_Recv( grid[from-1], n, MPI_DOUBLE, myrank-1, tag, MPI_COMM_WORLD, &status );
}
if( myrank != p-1 ) {
  MPI_Send( grid[to-1], n, MPI_DOUBLE, myrank+1, tag, MPI_COMM_WORLD );
  MPI_Recv( grid[to], n, MPI_DOUBLE, myrank+1, tag, MPI_COMM_WORLD, &status );
}
No Barrier Between Loop Nests?
• Not necessary.
• Anti-dependences do not need to be covered in message passing.
• Memory is private, so overwrite does not matter.
SOR: Terminating Condition
• Real versions of SOR do not run for some fixed number of iterations.
• Instead, they test for convergence.
• Possible convergence criterion: difference between two successive iterations is less than some delta.
SOR Sequential Code with Convergence
for( ; diff > delta; ) {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ ) { ... }
  diff = 0;
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ ) {
      diff = max(diff, fabs(grid[i][j] - temp[i][j]));
      grid[i][j] = temp[i][j];
    }
}
Reduction
MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)
inbuf: address of input buffer
outbuf: address of output buffer
count: number of elements in input buffer
type: datatype of input buffer elements
op: operation (MPI_MIN, MPI_MAX, etc.)
root: process id of root process
Global Reduction
MPI_Allreduce(inbuf, outbuf, count, type, op, comm)
inbuf: address of input buffer
outbuf: address of output buffer
count: number of elements in input buffer
type: datatype of input buffer elements
op: operation (MPI_MIN, MPI_MAX, etc.)
no root process
MPI SOR Code with Convergence
for( ; diff > delta; ) {
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) { ... }
  mydiff = 0.0;
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
      mydiff = max(mydiff, fabs(grid[i][j] - temp[i][j]));
      grid[i][j] = temp[i][j];
    }
  MPI_Allreduce( &mydiff, &diff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD );
  ...
}