
Page 1: MPI Presentation


Parallel Programming using

Message Passing Interface

(MPI)

metu-ceng [email protected]

25 April 2008

Page 2: MPI Presentation


Outline

• What is MPI?
• MPI Implementations
• OpenMPI
• MPI
• References
• Q&A

Page 3: MPI Presentation


What is MPI?

• A standard with many implementations (LAM/MPI and MPICH, evolving into OpenMPI and MVAPICH).

• A message passing API
• A library for programming clusters
• Needs to be high performing, scalable, portable ...

Page 4: MPI Presentation


MPI Implementations

• Is it up for the challenge? MPI does not have many alternatives (what about OpenMP, MapReduce etc.?).
• Many implementations out there.
• The programming interface is the same everywhere, but the underlying implementations differ in what they support in terms of connectivity, fault tolerance etc.
• On ceng-hpc, both MVAPICH and OpenMPI are installed.

Page 5: MPI Presentation


OpenMPI

• We'll use OpenMPI for this presentation.
• It's open source, MPI-2 compliant, portable, has fault tolerance, and combines the best practices of a number of other MPI implementations.
• To install it, for example on Debian/Ubuntu, type:

  # apt-get install openmpi-bin libopenmpi-dev openmpi-doc

Page 6: MPI Presentation


MPI – General Information

• Functions start with MPI_* to distinguish them from application code

• MPI has defined its own data types to abstract machine dependent implementations (MPI_CHAR, MPI_INT, MPI_BYTE etc.)

Page 7: MPI Presentation


MPI - API and other stuff

• Housekeeping (initialization, termination, header file)

• Two types of communication: Point-to-point and collective communication

• Communicators

Page 8: MPI Presentation


Housekeeping

• You include the header mpi.h
• Initialize using MPI_Init(&argc, &argv) and end MPI using MPI_Finalize()
• Demo time – “hello world!” using MPI (see the sketch below)
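A minimal sketch of what such a “hello world” program might look like (illustrative only, not the original demo source; the file name hello_mpi.c is made up):

  /* hello_mpi.c – a minimal MPI "hello world" (illustrative sketch) */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, size;

      MPI_Init(&argc, &argv);                  /* start up MPI */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank of this process */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

      printf("hello world from rank %d of %d\n", rank, size);

      MPI_Finalize();                          /* shut down MPI */
      return 0;
  }

Compile with the mpicc wrapper and launch with mpirun, e.g. mpicc hello_mpi.c -o hello and then mpirun -np 4 ./hello.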

Page 9: MPI Presentation


Point-to-point communication

• Related definitions – source, destination, communicator, tag, buffer, data type, count

• man MPI_Send, MPI_Recv

  int MPI_Send(void *buf, int count, MPI_Datatype datatype,
               int dest, int tag, MPI_Comm comm)

• Blocking send – the call does not return until the message buffer can safely be reused

Page 10: MPI Presentation


P2P Communication (cont.)

• int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
               int source, int tag, MPI_Comm comm, MPI_Status *status)

• Source, tag and communicator have to match for the message to be received
• Demo time – simple send (a sketch follows below)
• One last thing: you can use the wildcards MPI_ANY_SOURCE and MPI_ANY_TAG in place of source and tag
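A rough sketch of a “simple send” program (not the original demo; it also shows the wildcards mentioned above):

  /* simple_send.c – rank 0 sends one int to rank 1 (illustrative sketch). */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, value = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          value = 42;
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);        /* dest = 1, tag = 0 */
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,  /* wildcards, as above */
                   MPI_COMM_WORLD, &status);
          printf("rank 1 got %d from rank %d (tag %d)\n",
                 value, status.MPI_SOURCE, status.MPI_TAG);
      }

      MPI_Finalize();
      return 0;
  }

Run with at least two processes, e.g. mpirun -np 2 ./simple_send.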

Page 11: MPI Presentation


P2P Communication (cont.)

• The receiver does not know in advance how much data will arrive; it posts a buffer large enough for the biggest message it expects.

• To find out how much was actually received, one can use:

• int MPI_Get_count(MPI_Status *status, MPI_Datatype dtype, int *count);

• Demo time – change simple send to check the received message size.
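A sketch of that change: the receiver posts an oversized buffer and then asks the status object how much really arrived (MAXLEN is made up; status as in the receive above).

  /* On the receiver: the count passed to MPI_Recv is only an upper bound. */
  #define MAXLEN 100
  int data[MAXLEN], nreceived;

  MPI_Recv(data, MAXLEN, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
  MPI_Get_count(&status, MPI_INT, &nreceived);   /* how many ints actually arrived */
  printf("received %d ints\n", nreceived);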

Page 12: MPI Presentation


P2P Communication (cont.)

• For a receive operation, communication is completed when the message has been copied into the local variables.
• For a send operation, communication is completed when the message has been handed over to MPI for sending (so that the buffer can be reused).
• Blocking operations return only when the communication has been completed.
• Beware – there are some intricacies. Check [2] for more information.

Page 13: MPI Presentation


P2P Communication (cont.)

• For blocking communications, deadlock is a possibility:

  if( myrank == 0 ) {
      /* Receive, then send a message */
      MPI_Recv( b, 100, MPI_DOUBLE, 1, 19, MPI_COMM_WORLD, &status );
      MPI_Send( a, 100, MPI_DOUBLE, 1, 17, MPI_COMM_WORLD );
  }
  else if( myrank == 1 ) {
      /* Receive, then send a message */
      MPI_Recv( b, 100, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD, &status );
      MPI_Send( a, 100, MPI_DOUBLE, 0, 19, MPI_COMM_WORLD );
  }

• How to remove the deadlock?
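One answer, sketched with the same variables as above: reorder the calls on one rank so that a send is always met by a matching receive.

  /* One possible fix: rank 0 sends first while rank 1 receives first. */
  if( myrank == 0 ) {
      MPI_Send( a, 100, MPI_DOUBLE, 1, 17, MPI_COMM_WORLD );
      MPI_Recv( b, 100, MPI_DOUBLE, 1, 19, MPI_COMM_WORLD, &status );
  }
  else if( myrank == 1 ) {
      MPI_Recv( b, 100, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD, &status );
      MPI_Send( a, 100, MPI_DOUBLE, 0, 19, MPI_COMM_WORLD );
  }

MPI_Sendrecv, which pairs a send and a receive in a single call, is another way to avoid this pattern.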

Page 14: MPI Presentation


P2P Communication (cont.)

• When non-blocking communication is used, the program continues its execution while the communication is in progress
• A process can use a blocking send while the receiver uses a non-blocking receive, or vice versa
• Very similar function calls:

  int MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest,
                int tag, MPI_Comm comm, MPI_Request *request);

• The request handle can be used later, e.g. with MPI_Wait, MPI_Test ...
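A sketch of a non-blocking version of the earlier exchange (reusing a, b and myrank from the previous slide); posting the receives up front also removes the deadlock shown before:

  /* Non-blocking exchange (sketch). */
  MPI_Request reqs[2];
  MPI_Status  stats[2];
  int other = 1 - myrank;                            /* the partner rank */

  MPI_Irecv(b, 100, MPI_DOUBLE, other, myrank == 0 ? 19 : 17,
            MPI_COMM_WORLD, &reqs[0]);               /* post the receive immediately */
  MPI_Isend(a, 100, MPI_DOUBLE, other, myrank == 0 ? 17 : 19,
            MPI_COMM_WORLD, &reqs[1]);               /* start the send */

  /* ... useful computation can overlap with the communication here ... */

  MPI_Waitall(2, reqs, stats);                       /* a and b are safe only after this */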

Page 15: MPI Presentation


P2P Communication (cont.)

• Demo time – non_blocking
• There are other modes of sending (but not receiving!) – check the documentation for synchronous, buffered and ready mode sends in addition to the standard mode we have seen here.

Page 16: MPI Presentation


P2P Communication (cont.)

• Keep in mind that each send/receive is costly – try to piggyback

• You can send different data types at the same time – e.g. integers, floats, characters, doubles ... using MPI_Pack. This function packs the values into an intermediate buffer, which you then send.

• int MPI_Pack(void *inbuf, int incount, MPI_Datatype datatype, void *outbuf, int outsize, int *position, MPI_Comm comm)

• MPI_Send(buffer, count, MPI_PACKED, dest, tag, MPI_COMM_WORLD);
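A sketch of packing two values of different types into one message (dest, source, tag and status are assumed to be set up elsewhere):

  /* Pack an int and a double into one MPI_PACKED message (sketch). */
  char buffer[64];
  int  position = 0;
  int  n = 10;
  double x = 3.14;

  /* sender side */
  MPI_Pack(&n, 1, MPI_INT,    buffer, 64, &position, MPI_COMM_WORLD);
  MPI_Pack(&x, 1, MPI_DOUBLE, buffer, 64, &position, MPI_COMM_WORLD);
  MPI_Send(buffer, position, MPI_PACKED, dest, tag, MPI_COMM_WORLD);

  /* receiver side */
  MPI_Recv(buffer, 64, MPI_PACKED, source, tag, MPI_COMM_WORLD, &status);
  position = 0;
  MPI_Unpack(buffer, 64, &position, &n, 1, MPI_INT,    MPI_COMM_WORLD);
  MPI_Unpack(buffer, 64, &position, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD);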

Page 17: MPI Presentation


P2P Communication (cont.)

• You can also send your own structs (user defined types). See the documentation
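For reference, a sketch of how a user-defined type could be built with MPI_Type_create_struct (the Item struct here is made up for illustration):

  /* Build an MPI datatype matching a C struct (sketch). */
  typedef struct {
      int    id;
      double value;
  } Item;

  Item         item;
  MPI_Datatype item_type;
  int          blocklens[2] = { 1, 1 };
  MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
  MPI_Aint     displs[2], base;

  MPI_Get_address(&item,       &base);
  MPI_Get_address(&item.id,    &displs[0]);
  MPI_Get_address(&item.value, &displs[1]);
  displs[0] -= base;                       /* displacements relative to the struct start */
  displs[1] -= base;

  MPI_Type_create_struct(2, blocklens, displs, types, &item_type);
  MPI_Type_commit(&item_type);
  /* item_type can now be used in MPI_Send / MPI_Recv just like MPI_INT */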

Page 18: MPI Presentation


Collective Communication

• Works like point-to-point, except all processors in the communicator take part

• MPI_Barrier(comm) blocks until every processor has called it. Synchronizes everyone.

• Broadcast operation MPI_Bcast copies the data value in one processor to others.

• Demo time - bcast_example
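The bcast_example demo is not reproduced here; a minimal broadcast might look roughly like this:

  /* bcast_sketch.c – one value copied from the root to everyone. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, value = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0)
          value = 1234;                  /* only the root has the value initially */

      MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* root = 0 */
      printf("rank %d now has value %d\n", rank, value);

      MPI_Finalize();
      return 0;
  }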

Page 19: MPI Presentation


Collective Communication

• MPI_Reduce collects data from all processors, applies a reduction operation to it, and returns a single value on the root
• Demo time – reduce_op example
• There are MPI-defined reduce operations (MPI_SUM, MPI_MAX, ...) but you can define your own
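A minimal reduction sketch (rank is assumed to come from MPI_Comm_rank; the result lands only on the root):

  /* Sum one value from every process at the root (sketch). */
  int local = rank + 1;                    /* each process contributes something */
  int total = 0;

  MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0)
      printf("sum over all ranks = %d\n", total);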

Page 20: MPI Presentation


Collective Communication - MPI_Gather

• Gather and scatter operations
• They do what their names imply
• Gather – every process sends its send buffer and the root process receives them all

• Demo time - gather_example
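A gather sketch (rank and size are assumed to come from MPI_Comm_rank/MPI_Comm_size, with <stdlib.h> included for malloc):

  /* Gather one int from every process into an array at the root (sketch). */
  int value = rank;                          /* each process contributes its rank */
  int *all  = NULL;

  if (rank == 0)
      all = malloc(size * sizeof(int));      /* only the root needs the buffer */

  MPI_Gather(&value, 1, MPI_INT,             /* what each process sends */
             all,    1, MPI_INT,             /* per-process count at the root */
             0, MPI_COMM_WORLD);

  if (rank == 0)
      free(all);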

Page 21: MPI Presentation


Collective Communication - MPI_Scatter

• Similar to MPI_Gather but here data is sent from root to other processors

• Like gather, you could accomplish it by having the root call MPI_Send repeatedly and the others call MPI_Recv

• Demo time – scatter_example
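A matching scatter sketch, under the same assumptions as the gather sketch above:

  /* Scatter one int to every process from an array at the root (sketch). */
  int chunk;
  int *all = NULL;
  int i;

  if (rank == 0) {
      all = malloc(size * sizeof(int));
      for (i = 0; i < size; i++)
          all[i] = 100 + i;                  /* some per-process data */
  }

  MPI_Scatter(all,    1, MPI_INT,            /* sendcount is per process */
              &chunk, 1, MPI_INT,
              0, MPI_COMM_WORLD);
  printf("rank %d got %d\n", rank, chunk);

  if (rank == 0)
      free(all);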

Page 22: MPI Presentation


Collective Communication – More functionality

• Many more functions to take the hard work off your hands.

• MPI_Allreduce, MPI_Gatherv, MPI_Scan, MPI_Reduce_Scatter ...

• Check out the API documentation
• Manual pages are your best friend.

Page 23: MPI Presentation


Communicators

• Communicators group processors
• The basic communicator MPI_COMM_WORLD is defined over all processors

• You can create your own communicators to group processors. Thus you can send messages to only a subset of all processors.
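A sketch of creating sub-communicators with MPI_Comm_split (rank assumed to come from MPI_Comm_rank):

  /* Split MPI_COMM_WORLD into two halves: even and odd ranks (sketch). */
  MPI_Comm half;
  int color = rank % 2;                        /* 0 = even ranks, 1 = odd ranks */
  int value = 0;

  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &half);

  if (rank < 2)
      value = 1000 + color;                    /* ranks 0 and 1 become the sub-roots */

  MPI_Bcast(&value, 1, MPI_INT, 0, half);      /* this broadcast stays inside the half */
  MPI_Comm_free(&half);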

Page 24: MPI Presentation


More Advanced Stuff

• Parallel I/O – reading from disk through a single node is slow; you can have each node use its own local disk instead.

• One sided communications – Remote memory access

• Both are MPI-2 capabilities. Check your MPI implementation to see how much of MPI-2 it implements.

Page 25: MPI Presentation


References

[1] Wikipedia articles in general, including but not limited to:
    http://en.wikipedia.org/wiki/Message_Passing_Interface

[2] An excellent guide at NCSA (National Center for Supercomputing Applications):
    http://webct.ncsa.uiuc.edu:8900/public/MPI/

[3] OpenMPI official web site:
    http://www.open-mpi.org/

Page 26: MPI Presentation


The End

Thanks for your time.
Any questions?