MPI: Message Passing Interface
Prabhaker Mateti, Wright State University
Overview

MPI Hello World!
Introduction to programming with MPI
MPI library calls
MPI Overview

Similar to PVM
Network of heterogeneous machines
Multiple implementations
– Open source: MPICH, LAM
– Vendor specific
MPI Features

Rigorously specified standard
Portable source code
Enables third-party libraries
Derived data types to minimize overhead
Process topologies for efficiency on MPPs
Can fully overlap communication
Extensive group communication
MPI 2

Dynamic Process Management
One-Sided Communication
Extended Collective Operations
External Interfaces
Parallel I/O
Language Bindings (C++ and Fortran-90)
http://www.mpi-forum.org/
MPI Overview

125+ functions; typical applications need only about 6.
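For illustration only (not from the original slides), a minimal program that uses just those six calls (MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize) might look like this: every non-zero rank sends its rank number to process 0, which prints each one it receives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int np, myrank;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (myrank == 0) {                 /* rank 0 collects the greetings */
    int i, r;
    MPI_Status status;
    for (i = 1; i < np; ++i) {
      MPI_Recv(&r, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
      printf("hello from rank %d\n", r);
    }
  } else {                           /* every other rank reports in */
    MPI_Send(&myrank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}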
MPI: manager+workers

#include <mpi.h>
int main(int argc, char *argv[])
{
  int myrank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (myrank == 0) manager(); else worker();
  MPI_Finalize();
  return 0;
}
MPI_Init initializes the MPI system.
MPI_Finalize is called last by all processes.
MPI_Comm_rank identifies a process by its rank.
MPI_COMM_WORLD is the group that this process belongs to.
MPI: manager()

manager()
{
  MPI_Status status;
  MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
  for (i = 1; i < ntasks; ++i) {
    work = nextWork();
    MPI_Send(&work, 1, MPI_INT, i, WORKTAG, MPI_COMM_WORLD);
  }
  …
  MPI_Reduce(&sub, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
}
MPI_Comm_size gives the number of processes in the group; MPI_Send sends one work item to each worker.
MPI: worker()

worker()
{
  MPI_Status status;
  for (;;) {
    MPI_Recv(&work, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    result = doWork();
    MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
  }
}
MPI_Recv blocks until a matching message has been received into the buffer.
MPI computes π

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
  int n, np, myid;
  double sub, pi;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  n = ...; /* intervals */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  sub = series_sum(n, np);
  MPI_Reduce(&sub, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0) printf("pi is %.16f\n", pi);
  MPI_Finalize();
  return 0;
}
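series_sum() is not shown on the slide. A plausible sketch, assuming it computes this rank's share of the midpoint-rule sum for the integral of 4/(1+x*x) over [0,1] (which equals π). Note that the slide's version takes only (n, np), so passing the rank explicitly here is an assumption made for illustration.

/* Hypothetical partial sum for rank myid: every np-th term of the
   midpoint rule for the integral of 4/(1+x*x) on [0,1]. */
double series_sum(int n, int np, int myid)
{
  double h = 1.0 / (double)n, sum = 0.0, x;
  int i;
  for (i = myid + 1; i <= n; i += np) {
    x = h * ((double)i - 0.5);       /* midpoint of interval i */
    sum += 4.0 / (1.0 + x * x);
  }
  return h * sum;                    /* this rank's contribution to pi */
}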
Process groups

Group membership is static. There are no race conditions caused by processes independently entering and leaving a group.
New group formation is collective and group membership information is distributed, not centralized.
MPI_Send: blocking send

MPI_Send(&sendbuffer,  /* message buffer */
         n,            /* n items of */
         MPI_type,     /* data type in message */
         destination,  /* process rank */
         WORKTAG,      /* user chosen tag */
         MPI_COMM      /* group */);
MPI_Recv: blocking receive

MPI_Recv(&recvbuffer,     /* message buffer */
         n,               /* n data items */
         MPI_type,        /* of type */
         MPI_ANY_SOURCE,  /* from any sender */
         MPI_ANY_TAG,     /* any type of message */
         MPI_COMM,        /* group */
         &status);
Send-receive succeeds …

Sender’s destination is a valid process rank
Receiver specified a valid source process
Communicator is the same for both
Tags match
Message data types match
Receiver’s buffer is large enough
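As an illustration (not from the slides), a pair of calls that satisfies all of these conditions, assuming myrank came from MPI_Comm_rank and the program runs with at least two processes:

double value = 3.14, got;
MPI_Status status;
if (myrank == 0)        /* valid destination rank 1, tag 7, one MPI_DOUBLE */
  MPI_Send(&value, 1, MPI_DOUBLE, 1, 7, MPI_COMM_WORLD);
else if (myrank == 1)   /* same communicator, matching tag, type, and size */
  MPI_Recv(&got, 1, MPI_DOUBLE, 0, 7, MPI_COMM_WORLD, &status);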
Message Order

P sends message m1 first, then m2, to Q. Q will receive m1 before m2.

P sends m1 to Q, then m2 to R. In terms of a global wall clock, conclude nothing about R receiving m2 before or after Q receives m1.
Blocking and Non-blocking

Send, receive can be blocking or not
A blocking send can be coupled with a non-blocking receive, and vice-versa
Non-blocking send can use
– Standard mode MPI_Isend
– Synchronous mode MPI_Issend
– Buffered mode MPI_Ibsend
– Ready mode MPI_Irsend
MPI_Isend: non-blocking send

MPI_Isend(&buffer,      /* message buffer */
          n,            /* n items of */
          MPI_type,     /* data type in message */
          destination,  /* process rank */
          WORKTAG,      /* user chosen tag */
          MPI_COMM,     /* group */
          &handle);
MPI_Irecv: non-blocking receive

MPI_Irecv(&result,         /* message buffer */
          n,               /* n data items */
          MPI_type,        /* of type */
          MPI_ANY_SOURCE,  /* from any sender */
          MPI_ANY_TAG,     /* any type of message */
          MPI_COMM_WORLD,  /* group */
          &handle);
MPI_Wait

MPI_Wait(&handle, &status);
MPI_Wait, MPI_Test

MPI_Wait(&handle, &status);

MPI_Test(&handle, &flag, &status);
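A short sketch (not from the slides) of how these fit together: post a non-blocking receive with MPI_Irecv, poll it with MPI_Test, and finally block in MPI_Wait until the message has arrived.

int data, flag = 0;
MPI_Request handle;
MPI_Status status;
MPI_Irecv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
          MPI_COMM_WORLD, &handle);
/* ... do useful computation while the message is in flight ... */
MPI_Test(&handle, &flag, &status);  /* flag becomes non-zero if complete */
if (!flag)
  MPI_Wait(&handle, &status);       /* block until the receive completes */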
Collective Communication
MPI_Bcast

MPI_Bcast(buffer, count, MPI_Datatype, root, MPI_Comm);

All processes use the same count, data type, root, and communicator. Before the operation, the root’s buffer contains a message. After the operation, all buffers contain the message from the root.
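A small illustration (assumed, not from the slides; myrank comes from MPI_Comm_rank): only the root sets n, and after the call every process has the same value.

int n = 0;
if (myrank == 0) n = 1000;               /* only the root has the value */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* now n == 1000 on every process in MPI_COMM_WORLD */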
MPI_Scatter

MPI_Scatter(sendbuffer, sendcount, MPI_Datatype,
            recvbuffer, recvcount, MPI_Datatype,
            root, MPI_Comm);

All processes use the same send and receive counts, data types, root, and communicator. Before the operation, the root’s send buffer contains a message of length sendcount * N, where N is the number of processes. After the operation, the message is divided equally and dispersed to all processes (including the root) in rank order.
MPI_Gather

MPI_Gather(sendbuffer, sendcount, MPI_Datatype,
           recvbuffer, recvcount, MPI_Datatype,
           root, MPI_Comm);

This is the “reverse” of MPI_Scatter(). After the operation, the root process has in its receive buffer the concatenation of the send buffers of all processes (including its own), with a total message length of recvcount * N, where N is the number of processes. The message is gathered in rank order.
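An illustrative scatter/gather round trip (not from the slides; assumes np and myrank come from MPI_Comm_size and MPI_Comm_rank, and that <stdlib.h> is included for malloc): the root scatters one int to each rank, each rank doubles its piece, and the root gathers the results back in rank order.

int i, piece, *all = NULL, *result = NULL;
if (myrank == 0) {                        /* buffers needed only at the root */
  all = malloc(np * sizeof(int));
  result = malloc(np * sizeof(int));
  for (i = 0; i < np; ++i) all[i] = i;
}
MPI_Scatter(all, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
piece *= 2;                               /* local work on this rank's piece */
MPI_Gather(&piece, 1, MPI_INT, result, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* at the root, result[i] == 2*i, gathered in rank order */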
MPI_Reduce

MPI_Reduce(sndbuf, rcvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm);

After the operation, the root process has in its receive buffer the result of the pair-wise reduction of the send buffers of all processes, including its own.
Predefined Reduction Ops

MPI_MAX MPI_MIN MPI_SUM MPI_PROD MPI_LAND MPI_BAND MPI_LOR MPI_BOR MPI_LXOR MPI_BXOR
MPI_MAXLOC MPI_MINLOC
L = logical, B = bit-wise
User Defined Reduction Ops

void myOperator(void *invector, void *inoutvector,
                int *length, MPI_Datatype *datatype)
{
  …
}
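The body above is left blank on the slide. A hedged sketch of one possible operator (element-wise maximum of doubles; the names myOp, sub, and result are chosen here for illustration), registered with MPI_Op_create and used in MPI_Reduce:

/* Element-wise maximum of doubles; the result accumulates in inoutvector. */
void myOperator(void *invector, void *inoutvector,
                int *length, MPI_Datatype *datatype)
{
  double *in = (double *)invector, *inout = (double *)inoutvector;
  int i;
  for (i = 0; i < *length; ++i)
    if (in[i] > inout[i]) inout[i] = in[i];
}

MPI_Op myOp;
MPI_Op_create(myOperator, 1 /* commutative */, &myOp);
MPI_Reduce(&sub, &result, 1, MPI_DOUBLE, myOp, 0, MPI_COMM_WORLD);
MPI_Op_free(&myOp);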
Ten Reasons to Prefer MPI over PVM

1. MPI has more than one free, quality implementation.
2. MPI can efficiently program MPPs and clusters.
3. MPI is rigorously specified.
4. MPI efficiently manages message buffers.
5. MPI has full asynchronous communication.
6. MPI groups are solid, efficient, and deterministic.
7. MPI defines a 3rd party profiling mechanism.
8. MPI synchronization protects 3rd party software.
9. MPI is portable.
10. MPI is a standard.
Summary

Introduction to MPI
Reinforced the manager-workers paradigm
Send, receive: blocking, non-blocking
Process groups
MPI resources

Open source implementations
– MPICH
– LAM
Books
– Using MPI, by William Gropp, Ewing Lusk, Anthony Skjellum
– Using MPI-2, by William Gropp, Ewing Lusk, Rajeev Thakur
On-line tutorials
– www.tc.cornell.edu/Edu/Tutor/MPI/