Lecture 2: Part II
Message Passing Programming: MPI

Outline:
- Introduction to MPI
- MPI programming
- Running an MPI program
- Architecture of MPICH
Message Passing Interface (MPI)
What is MPI?
A message-passing library specification:
- message-passing model
- not a compiler specification
- not a specific product
For parallel computers, clusters and heterogeneous networks.
Full-featured.
Why use MPI? (1)
Message passing is now mature as a programming paradigm:
- well understood
- an efficient match to the hardware
- many applications
Why use MPI? (2)
Full range of desired features:
- modularity
- access to peak performance
- portability
- heterogeneity
- subgroups
- topologies
- performance measurement tools
Who Designed MPI?
- Vendors: IBM, Intel, TMC, SGI, Meiko, Cray, Convex, Ncube, …
- Library writers: PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, DP (HKU), PM (Japan), AM (Berkeley), FM (HPVM at Illinois)
- Application specialists and consultants
Cho-Li Wang 7
Vendor-Supported MPI
- HP-MPI: Hewlett Packard; Convex SPP
- MPI-F: IBM SP1/SP2
- Hitachi/MPI: Hitachi
- SGI/MPI: SGI PowerChallenge series
- MPI/DE: NEC
- INTEL/MPI: Intel Paragon (iCC lib)
- T.MPI: Telmat Multinode
- Fujitsu/MPI: Fujitsu AP1000
- EPCC/MPI: Cray & EPCC, T3D/T3E
Public-Domain MPI
- MPICH: Argonne National Lab. & Mississippi State Univ.
- LAM: Ohio Supercomputer Center
- MPICH/NT: Mississippi State University
- MPI-FM: Illinois (Myrinet)
- MPI-AM: UC Berkeley (Myrinet)
- MPI-PM: RWCP, Japan (Myrinet)
- MPI-CCL: California Institute of Technology
- CRI/EPCC MPI: Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E)
- MPI-AP: Australian National University, CAP Research Program (AP1000)
- W32MPI: Illinois, Concurrent Systems
- RACE-MPI: Hughes Aircraft Co.
- MPI-BIP: INRIA, France (Myrinet)
Communicator Concept in MPI
A communicator identifies the process group and the context with respect to which an operation is performed.
[Figure: a group of processes enclosed in one communicator]
Communicator (2)
[Figure: four communicators, each containing several processes]
Processes in different communicators cannot communicate with each other through those communicators.
Communicator within Communicator
[Figure: a communicator nested inside a larger one]
The same process can exist in several different communicators.
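Ranks are always relative to a communicator, which is how one process can belong to several communicators with a different rank in each. MPI's own call for carving a communicator into sub-communicators is MPI_Comm_split(comm, color, key, &newcomm): processes passing the same color land in the same new communicator, ordered by key, with ties broken by their rank in the old communicator. The plain-C sketch below only models that rank assignment in a single address space; split_rank and split_size are hypothetical helpers, not MPI functions.

```c
#include <assert.h>

/* Models the rank a process receives from MPI_Comm_split: among all
 * processes with the same color, order by key and break ties by the
 * old rank. color[] and key[] are indexed by old rank (0..n-1).
 * Hypothetical helper for illustration; not part of MPI. */
int split_rank(const int color[], const int key[], int n, int old_rank)
{
    assert(old_rank >= 0 && old_rank < n);
    int new_rank = 0;
    for (int j = 0; j < n; j++) {
        if (j == old_rank || color[j] != color[old_rank])
            continue;                       /* self, or another communicator */
        if (key[j] < key[old_rank] ||
            (key[j] == key[old_rank] && j < old_rank))
            new_rank++;                     /* j is ordered before us */
    }
    return new_rank;
}

/* Number of processes in the new communicator that old_rank joins. */
int split_size(const int color[], int n, int old_rank)
{
    assert(old_rank >= 0 && old_rank < n);
    int size = 0;
    for (int j = 0; j < n; j++)
        if (color[j] == color[old_rank])
            size++;
    return size;
}
```

For example, with six processes, color = rank % 2 and key = rank, world rank 4 becomes rank 2 of a three-process "even" communicator while still being rank 4 in MPI_COMM_WORLD.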
Features of MPI (1)
- General:
  - Communicators combine context and group for message security.
Features of MPI (2)
- Point-to-point communication:
  - structured buffers and derived datatypes, heterogeneity
  - modes: normal (blocking and non-blocking), synchronous, ready (to allow access to fast protocols), buffered
- Collective communication:
  - both built-in and user-defined collective operations
  - a large number of data movement routines
  - subgroups defined directly or by topology
  - e.g. broadcast, barrier, reduce, scatter, gather, all-to-all, …
Features of MPI (3)
MPI Programming
Writing MPI programs
- MPI comprises 125 functions.
- Many parallel programs can be written with just 6 basic functions.
Six basic functions (1)
- MPI_INIT: initiate an MPI computation
- MPI_FINALIZE: terminate a computation
Six basic functions (2)
- MPI_COMM_SIZE: determine the number of processes in a communicator
- MPI_COMM_RANK: determine the identifier (rank) of the calling process in a specific communicator
Six basic functions (3)
- MPI_SEND: send a message from one process to another process
- MPI_RECV: receive a message sent by another process
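#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int count, myid;

    MPI_Init(&argc, &argv);                  /* initiate the computation */
    MPI_Comm_size(MPI_COMM_WORLD, &count);   /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);    /* rank of this process */
    printf("I am %d of %d\n", myid, count);
    MPI_Finalize();                          /* shut down */
    return 0;
}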
A simple program
- MPI_INIT(): initiate the computation
- MPI_COMM_SIZE(MPI_COMM_WORLD, count): find the number of processes
- MPI_COMM_RANK(MPI_COMM_WORLD, myid): find the rank (ID) of the current process
- print("I am ", myid, " of ", count): each process prints its own output
- MPI_FINALIZE(): shut down
Result (4 processes; the order of the lines can differ from run to run):
I am 3 of 4
I am 0 of 4
I am 1 of 4
I am 2 of 4
[Figure: Processes 0, 1, 2 and 3, each printing one line]
Point-to-Point Communication
The basic point-to-point communication operators are send and receive.
[Figure: Send transmits the contents of the sender's buffer; Receive places them in the receiver's buffer]
Another simple program (2 nodes)
…..
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid = 0
    MPI_SEND("Zero", …, …, 1, …, …)
    MPI_RECV(words, …, …, 1, …, …, …)
else
    MPI_RECV(words, …, …, 0, …, …, …)
    MPI_SEND("One", …, …, 0, …, …)
END IF
print("Received from ", words)
……
Execution:
- Process 0 ("I'm process 0!") takes the if branch: it sends "Zero" to process 1, then sets up the buffer words and waits for the message from process 1.
- Process 1 ("I'm process 1!") takes the else branch: it sets up the buffer words and waits for the message from process 0; once "Zero" has been received, it sends "One" to process 0.
- Each process then prints the contents of words.
Result:
Process 0: Received from One
Process 1: Received from Zero
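In this exchange each MPI_RECV names the source it expects (1 or 0), and MPI matches an incoming message by its envelope (source, tag and communicator), not merely by arrival. The single-address-space C sketch below models only that matching rule with a small mailbox: a receive takes the oldest message whose source and tag match, even if another message arrived earlier. struct mailbox, mailbox_post and mailbox_match are hypothetical names, ANY_SOURCE stands in for MPI_ANY_SOURCE, and real MPI also matches on the communicator.

```c
#include <assert.h>
#include <string.h>

#define ANY_SOURCE (-1)   /* stand-in for MPI_ANY_SOURCE */
#define MAX_MSGS 16

struct msg {
    int source, tag;      /* the envelope */
    char data[32];        /* the payload */
};

struct mailbox {
    struct msg q[MAX_MSGS];
    int count;
};

/* Append a message in arrival order. Returns 0 on success, -1 if full. */
int mailbox_post(struct mailbox *mb, int source, int tag, const char *data)
{
    if (mb->count == MAX_MSGS)
        return -1;
    struct msg *m = &mb->q[mb->count++];
    m->source = source;
    m->tag = tag;
    strncpy(m->data, data, sizeof m->data - 1);
    m->data[sizeof m->data - 1] = '\0';
    return 0;
}

/* Remove the oldest message whose envelope matches (source, tag) and copy
 * its payload into out. Returns 1 if a message matched, 0 otherwise. */
int mailbox_match(struct mailbox *mb, int source, int tag, char *out, size_t outlen)
{
    assert(outlen > 0);
    for (int i = 0; i < mb->count; i++) {
        struct msg *m = &mb->q[i];
        if ((source == ANY_SOURCE || m->source == source) && m->tag == tag) {
            strncpy(out, m->data, outlen - 1);
            out[outlen - 1] = '\0';
            /* close the gap so later messages keep their arrival order */
            memmove(&mb->q[i], &mb->q[i + 1], (size_t)(mb->count - i - 1) * sizeof *m);
            mb->count--;
            return 1;
        }
    }
    return 0;
}
```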
Collective Communication (1)
Communication that involves a group of processes
[Figure: one sender's buffer is transmitted to the buffers of several receivers]
Collective Communication (2)
Three types:
- Barrier: MPI_BARRIER
- Data movement: MPI_BCAST, MPI_GATHER, MPI_SCATTER
- Reduction operations: MPI_REDUCE
Barrier
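MPI_BARRIER: used to synchronize the execution of a group of processes. No process leaves the barrier until every process in the group has reached it.
[Figure: processes that reach the barrier early must wait ("Wait for us!", "We can't go on!"); once all have arrived ("We're together!"), the barrier is lifted and they all continue ("Let's go!")]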
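MPI_BARRIER works across processes; to show the "nobody leaves until everyone has arrived" rule in a form that runs on one machine, the sketch below builds a reusable barrier for threads from a POSIX mutex and condition variable. It is a shared-memory analogue, not how MPI implements MPI_BARRIER; struct barrier, barrier_init, barrier_wait and run_barrier_demo are hypothetical names (POSIX also provides a ready-made pthread_barrier_t).

```c
#include <assert.h>
#include <pthread.h>

/* Reusable counting barrier: the last thread to arrive starts a new
 * generation and wakes every thread waiting on the current one. */
struct barrier {
    pthread_mutex_t lock;
    pthread_cond_t all_here;
    int needed;       /* number of participants */
    int arrived;      /* arrivals in the current generation */
    unsigned gen;     /* incremented each time the barrier opens */
};

void barrier_init(struct barrier *b, int n)
{
    assert(n > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->needed = n;
    b->arrived = 0;
    b->gen = 0;
}

void barrier_wait(struct barrier *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_gen = b->gen;
    if (++b->arrived == b->needed) {
        b->arrived = 0;                  /* reset for the next use */
        b->gen++;                        /* open the barrier */
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (my_gen == b->gen)         /* loop guards against spurious wakeups */
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Demo: each worker records its arrival, waits at the barrier, then
 * checks that every worker had arrived before it was let through. */
#define NWORKERS 4
static struct barrier demo_barrier;
static int arrivals;                     /* protected by demo_lock */
static int all_seen_after[NWORKERS];
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    pthread_mutex_lock(&demo_lock);
    arrivals++;
    pthread_mutex_unlock(&demo_lock);
    barrier_wait(&demo_barrier);
    pthread_mutex_lock(&demo_lock);
    all_seen_after[id] = (arrivals == NWORKERS);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

int run_barrier_demo(void)
{
    pthread_t t[NWORKERS];
    int ids[NWORKERS];
    barrier_init(&demo_barrier, NWORKERS);
    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    int ok = 1;
    for (int i = 0; i < NWORKERS; i++)
        ok = ok && all_seen_after[i];
    return ok;
}
```

The generation counter is what makes the barrier reusable: a thread that wakes late cannot be confused by the next round's arrivals.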
Data movement (1)
MPI_BCAST: one process (the root) sends the same data to all processes in the group, itself included.
[Figure: Process 0 holds "FACE"; after all four processes call BCAST, Processes 0 to 3 each hold "FACE"]
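The slide states what MPI_BCAST delivers, not how. One common way for an implementation to realize it is a binomial tree: every process that already holds the data forwards it, so p processes are covered in ceil(log2 p) rounds rather than p - 1 sends from the root. The serial C sketch below simulates that schedule for root 0; it illustrates the technique and is not MPICH's actual code, and bcast_binomial is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define MAXP 64
#define LEN 8

/* Simulates a binomial-tree broadcast from rank 0 over p "processes",
 * where buf[r] is rank r's buffer. In the round with mask = 2^k, every
 * rank r < mask already holds the data and sends it to rank r + mask
 * (if that rank exists). Returns the number of rounds used. */
int bcast_binomial(char buf[][LEN], int p)
{
    assert(p >= 1 && p <= MAXP);
    int rounds = 0;
    for (int mask = 1; mask < p; mask <<= 1) {
        for (int r = 0; r < mask && r + mask < p; r++)
            memcpy(buf[r + mask], buf[r], LEN);   /* one send/receive pair */
        rounds++;
    }
    return rounds;
}
```

With 4 processes this takes 2 rounds and with 5 processes 3 rounds.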
Data movement (2)
MPI_GATHER: every process (including the root) sends its data, all of the same size, to one process (the root), which stores them in rank order.
[Figure: Processes 0 to 3 hold "F", "A", "C" and "E"; after all four call GATHER, the root holds "FACE"]
Data movement (3)
MPI_SCATTER: one process (the root) splits a message into equal parts and sends the ith part to the process with rank i.
[Figure: the root holds "FACE"; after all four processes call SCATTER, Processes 0 to 3 hold "F", "A", "C" and "E"]
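Scatter and gather are inverses of each other: scatter cuts the root's buffer into p equal slices in rank order, and gather reassembles them in the same order. The serial C sketch below models both, with each "process" buffer held in an array slot; scatter_model and gather_model are hypothetical names, and the real MPI_Scatter and MPI_Gather additionally take datatypes, a root rank and a communicator.

```c
#include <assert.h>
#include <string.h>

/* Serial model of MPI_SCATTER: root_buf holds p * count bytes on the
 * root, and the ith count-byte slice goes to local[i] (rank i). */
void scatter_model(const char *root_buf, char *local[], int p, int count)
{
    assert(p > 0 && count > 0);
    for (int i = 0; i < p; i++)
        memcpy(local[i], root_buf + (size_t)i * count, (size_t)count);
}

/* Serial model of MPI_GATHER: each rank's count bytes are stored in
 * rank order in the root's buffer. */
void gather_model(char *const local[], char *root_buf, int p, int count)
{
    assert(p > 0 && count > 0);
    for (int i = 0; i < p; i++)
        memcpy(root_buf + (size_t)i * count, local[i], (size_t)count);
}
```

Scattering "FACE" with count = 1 over four processes leaves 'F', 'A', 'C' and 'E' on ranks 0 to 3, and gathering them again rebuilds "FACE", matching the figures.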
Data movement (4)
MPI_REDUCE (e.g. find the maximum value): combines the values held by all processes using a specified operation and returns the combined value to one process (the root).
[Figure: Processes 0 to 3 hold 8, 9, 3 and 7; REDUCE with the max operation gives 9 at the root]
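A reduction can be carried out as a tree: in each round, pairs of partial results are combined, so p values need about log2 p rounds. This only gives the right answer because the operation is associative, which MPI requires of reduction operations. The serial C sketch below simulates such a tree toward rank 0; reduce_tree, op_max and op_sum are hypothetical names standing in for MPI's predefined MPI_MAX and MPI_SUM, and the tree shape is an illustration rather than MPICH's exact algorithm.

```c
#include <assert.h>

typedef int (*reduce_op)(int, int);

static int op_max(int a, int b) { return a > b ? a : b; }
static int op_sum(int a, int b) { return a + b; }

/* Simulates a tree-structured reduction to rank 0. v[r] is rank r's
 * value; in each round, rank r (a multiple of 2*step) combines its
 * partial result with that of rank r + step. v[] is overwritten and
 * the final result ends up in v[0]. */
int reduce_tree(int v[], int p, reduce_op op)
{
    assert(p > 0);
    for (int step = 1; step < p; step *= 2)
        for (int r = 0; r + step < p; r += 2 * step)
            v[r] = op(v[r], v[r + step]);
    return v[0];
}
```

With the figure's values 8, 9, 3 and 7, the max reduction first forms 9 and 7, then 9.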
Example program (1)
Calculating the value of π by:
$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx$$
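The integral equals π because an antiderivative of 4/(1+x^2) is 4 arctan x. To evaluate it numerically, a common choice (and, as an assumption about what "compute the area for each interval" means here, the one used by the cpi example distributed with MPICH) is the midpoint rule with n intervals of width h. The loop in Example program (2) then assigns interval i to the process with myid = (i - 1) mod numprocs, so the total splits into one partial sum per process:

```latex
\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1) = \pi, \qquad
h = \frac{1}{n}, \qquad x_i = h\left(i - \tfrac{1}{2}\right)

\pi \approx h \sum_{i=1}^{n} \frac{4}{1+x_i^2}
    = \sum_{\mathit{myid}=0}^{\mathit{numprocs}-1}
      \underbrace{h \sum_{\substack{1 \le i \le n \\ i \equiv \mathit{myid}+1 \pmod{\mathit{numprocs}}}}
      \frac{4}{1+x_i^2}}_{\text{partial sum of process } \mathit{myid}}
```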
Example program (2)
……
MPI_BCAST(n, …, …, 0, …)                    /* broadcast the number of intervals n from process 0 */
for (i = myid + 1; i <= n; i += numprocs)   /* each process calculates its own intervals */
    compute the area for each interval
    accumulate the result in the process's program data (sum)
MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)   /* sum up all the areas on process 0 */
if (myid == 0)                              /* print the result */
    output result
……
[Figure: process 0 announces "Start calculation!"; Processes 0 to 3 each calculate their share of the areas ("OK!"), and the combined result is π = 3.141...]
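Because the pseudocode leaves the MPI arguments elided, the sketch below reproduces only the arithmetic, serially and in plain C: cpi_partial is what one process would compute in the for loop (assuming the midpoint rule, as in MPICH's cpi example), and cpi_reduce adds the partial sums the way MPI_REDUCE with MPI_SUM would on process 0. Both names are hypothetical.

```c
#include <assert.h>

/* Partial sum computed by process myid out of numprocs for the
 * midpoint-rule approximation of the integral of 4/(1+x^2) on [0,1]
 * with n intervals; mirrors the loop in Example program (2). */
double cpi_partial(int myid, int numprocs, int n)
{
    assert(n > 0 && numprocs > 0 && myid >= 0 && myid < numprocs);
    double h = 1.0 / n, sum = 0.0;
    for (int i = myid + 1; i <= n; i += numprocs) {
        double x = h * (i - 0.5);        /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);      /* height of the strip */
    }
    return h * sum;                      /* area of this process's strips */
}

/* Serial stand-in for MPI_REDUCE(..., MPI_SUM, 0, ...). */
double cpi_reduce(int numprocs, int n)
{
    double pi = 0.0;
    for (int id = 0; id < numprocs; id++)
        pi += cpi_partial(id, numprocs, n);
    return pi;
}
```

The result does not depend on how many processes share the work (up to floating-point rounding); with n = 1000 it agrees with π to within 1e-6.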
MPICH - A Portable Implementation of MPI
Argonne National Laboratory
What is MPICH?
- The first complete and portable implementation of the full MPI standard.
- 'CH' stands for "Chameleon", a symbol of adaptability and portability.
- It contains a programming environment for working with MPI programs.
- It includes a portable startup mechanism and libraries.
How can I install it?
- Unpack the package mpich.tar.gz into a directory.
- Run './configure' to choose the appropriate architecture and device, then 'make >& make.log' to compile (the build output goes to make.log).
  - Syntax: ./configure -device=DEVICE -arch=ARCH_TYPE
  - ARCH_TYPE: the type of machine to be configured
  - DEVICE: the communication device the system will use, e.g. ch_p4 (TCP/IP)
How to run an MPI Program
1. Edit mpich/util/machines/machines.XXXX so that it contains the names of the machines of architecture XXXX. For example, with four computers named mercury, venus, mars and earth, the file lists one machine name per line:
mercury
venus
earth
mars
earth
mars
2. Include "mpi.h" in the source program.
3. Compile the program with 'mpicc', e.g. mpicc -c foo.c (the -c option compiles to an object file without linking).
4. Use 'mpirun' to run the MPI program; mpirun determines the environment in which the program runs.
Examples:
- mpirun -np 4 a.out: runs a.out as four processes, e.g. on a massively parallel processor.
- mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program: runs program on 2 sun4s and 3 rs6000s, with the local machine being a sun4 (multiple architectures).
MPIRUN (1)
How do you start an MPI program? Use mpirun. Example:
# mpirun -np 4 cpi
starts four processes of cpi.
MPIRUN (2)
What does mpirun do?
1. Reads the arguments that specify the environment of the MPI program:
   i) how many processes should be started
   ii) on which machines the MPI program will be started
   iii) which device will be used (e.g. ch_p4)
2. Splits the processes among the machines they will run on.
3. Records the split in the PI???? file.
MPIRUN (3)
Example, using the ch_p4 device:
# mpirun -np 4 cpi
1. mpirun knows that 4 processes need to be started.
2. mpirun reads the machines file to find the machines the processes can run on.
3. The ch_p4 device is used if no device is specified on the command line.
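MPIRUN (4)
4. Split the tasks and save the result in the PI???? file.
   File format:
   <hostname> <no. of proc.> <program>
   genius.cs.hku.hk 0 cpi
   eagle.cs.hku.hk 1 cpi
   dragon.cs.hku.hk 1 cpi
   virtue.cs.hku.hk 1 cpi
   The counts add up to 3: the fourth process is presumably the one already running on the local host (genius), which is therefore listed with 0 additional processes.
5. Start the processes on the remote machines using "rsh".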
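The split in steps 2 to 4 can be sketched as follows. This is a simplified model under stated assumptions, not mpirun's actual code: the first machine in the machines file is taken to be the local host, which runs the first process itself (hence its count of 0), and each remaining process is given to the next machine in the file, one per line, wrapping around when the list runs out. write_procgroup is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes PI-file lines "<hostname> <no. of proc.> <program>" into out.
 * Assumption: hosts[0] is the local machine, which already runs the
 * first process, so it is listed with 0 additional processes. The other
 * np - 1 processes go round-robin over hosts[1..nhosts-1] (or back to
 * hosts[0] if it is the only machine).
 * Returns the number of characters written, or -1 if out is too small. */
int write_procgroup(const char *hosts[], int nhosts, int np,
                    const char *program, char *out, size_t outlen)
{
    assert(nhosts > 0 && np > 0);
    assert(program != NULL && strlen(program) > 0);
    int n = snprintf(out, outlen, "%s 0 %s\n", hosts[0], program);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    size_t used = (size_t)n;
    for (int k = 1; k < np; k++) {
        /* skip the local host while other machines are available */
        int h = (nhosts == 1) ? 0 : 1 + (k - 1) % (nhosts - 1);
        n = snprintf(out + used, outlen - used, "%s 1 %s\n", hosts[h], program);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Given the four HKU hosts and np = 4, this reproduces the example file above; how the real mpirun distributes processes when there are more processes than machines may differ.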
Architecture of MPICH
[Figure: several low-level layers, each accessed through an abstract device interface]
Structure of MPICH (from top to bottom):
- MPI portable API library
- MPICH abstract device
- MPICH channel interface
- Low-level layers: socket (TCP/IP), shared memory, vendor design
MPICH - Abstract Device Interface
- Interface between the high-level MPI library and the low-level device.
- Manages message packaging and buffering policies, and handles heterogeneous communication.
- 4 sets of functions:
  1. Specify the sending or receiving of a message.
  2. Move data between the API and the hardware.
  3. Manage lists of pending messages.
  4. Provide information about the execution environment.
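An abstract device interface of this kind is commonly expressed in C as a table of function pointers that the upper layer calls without knowing which device sits underneath. The sketch below is hypothetical throughout: struct adi_device and every function name are invented for illustration and are not MPICH's real ADI routines. It provides one function per set on the slide, backed by a trivial in-memory "loopback" device.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device operations, one per function set on the slide. */
struct adi_device {
    int    (*post_send)(void *dev, const void *buf, size_t len); /* 1. start a send */
    size_t (*move_data)(void *dev, void *buf, size_t cap);       /* 2. device -> user buffer */
    int    (*pending)(void *dev);                                /* 3. pending-message count */
    int    (*world_size)(void *dev);                             /* 4. environment info */
    void   *state;                                               /* device-private data */
};

/* Loopback device: a single-slot message queue in memory. */
struct loopback { char slot[64]; size_t len; int full; };

static int lb_post_send(void *dev, const void *buf, size_t len)
{
    struct loopback *lb = dev;
    if (lb->full || len > sizeof lb->slot)
        return -1;
    memcpy(lb->slot, buf, len);
    lb->len = len;
    lb->full = 1;
    return 0;
}

static size_t lb_move_data(void *dev, void *buf, size_t cap)
{
    struct loopback *lb = dev;
    if (!lb->full)
        return 0;
    size_t n = lb->len < cap ? lb->len : cap;
    memcpy(buf, lb->slot, n);
    lb->full = 0;
    return n;
}

static int lb_pending(void *dev) { return ((struct loopback *)dev)->full; }
static int lb_world_size(void *dev) { (void)dev; return 1; }

/* The "MPI layer" only ever calls through the table. */
int api_send(struct adi_device *d, const void *buf, size_t len)
{
    assert(d != NULL);
    return d->post_send(d->state, buf, len);
}

size_t api_recv(struct adi_device *d, void *buf, size_t cap)
{
    assert(d != NULL);
    return d->move_data(d->state, buf, cap);
}
```

Porting to a new network then means filling in a new table, while api_send and api_recv stay unchanged, which is the portability argument behind the layered design.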
MPICH - The Channel Interface (1)
The interface transfers data from one process's address space to another's.
Information is divided into two parts: the message envelope and the data.
It includes five functions:
- MPID_SendControl, MPID_RecvAnyControl, MPID_ControlMsgAvail: envelope information
- MPID_SendChannel, MPID_RecvFromChannel: data
MPICH - The Channel Interface (2)
The channel interface chooses the data exchange mechanism according to the size of the message.
Data exchange mechanisms implemented: Short, Eager, Rendezvous, Get
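Choosing a mechanism by message size amounts to comparing the length against thresholds. In the sketch below the cut-off values SHORT_LIMIT and EAGER_LIMIT, the enum and choose_protocol are hypothetical placeholders, since the slides give no numbers; following the later description, Get is only chosen when the two processes share memory.

```c
#include <assert.h>
#include <stddef.h>

enum protocol { PROTO_SHORT, PROTO_EAGER, PROTO_RENDEZVOUS, PROTO_GET };

/* Hypothetical thresholds in bytes; real values are implementation-specific. */
#define SHORT_LIMIT 64
#define EAGER_LIMIT (16 * 1024)
static_assert(SHORT_LIMIT < EAGER_LIMIT, "thresholds must increase");

enum protocol choose_protocol(size_t len, int shared_memory)
{
    if (len <= SHORT_LIMIT)
        return PROTO_SHORT;        /* data fits inside the envelope */
    if (shared_memory)
        return PROTO_GET;          /* receiver copies straight from sender memory */
    if (len <= EAGER_LIMIT)
        return PROTO_EAGER;        /* send now; receiver buffers if needed */
    return PROTO_RENDEZVOUS;       /* wait until the receiver asks for it */
}
```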
Protocol - Short
This mechanism is used for the smallest messages.
The data is delivered inside the message envelope.
[Figure: Short protocol data transfer. The data travels inside control messages; control messages that reach the receiver before the matching MPI_Recv is called are stored in a buffer and handed over when MPI_Recv is called]
Protocol - Eager
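Data is sent to the destination immediately, without waiting for the receiver.
If no matching receive has been posted yet, the receiver must allocate space to store the data locally.
It is the default choice in MPICH.
It is not suitable for transferring large amounts of data, since the receiver's buffer space can run out.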
[Figure: Eager protocol data transfer. Each control message (MPI_Control) is followed immediately by its data, which is saved in the receiver's buffer until MPI_Recv is called; when many messages (Data1 to Data4) arrive before being received, the buffer becomes full ("Buffer Full!!!")]
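The weakness of the eager protocol is that data arriving before its receive has been posted must be stored somewhere. The sketch below models a receiver with a fixed-size buffer for such early data: eager_arrive delivers straight into a posted receive when there is one, otherwise copies into the buffer, and fails once the buffer is full, which is the buffer-full situation shown in the figure. The capacity and all names are hypothetical, and envelope matching is left out.

```c
#include <assert.h>
#include <string.h>

#define UNEXPECTED_CAP 32   /* hypothetical size of the receiver's buffer, in bytes */

struct eager_rx {
    char *posted;           /* user buffer of a posted MPI_Recv, or NULL */
    size_t posted_cap;      /* capacity of that buffer */
    size_t delivered;       /* bytes delivered straight into it */
    char unexpected[UNEXPECTED_CAP];
    size_t used;            /* bytes held for receives not yet posted */
};

/* Called when eagerly sent data reaches the receiver.
 * Returns  1: delivered into the posted receive, which then completes
 *          0: stored in the receiver's buffer until MPI_Recv is called
 *         -1: buffer full, the data cannot be accepted
 *         -2: posted buffer too small (MPI reports a truncation error) */
int eager_arrive(struct eager_rx *rx, const char *data, size_t len)
{
    assert(rx != NULL && data != NULL);
    if (rx->posted != NULL) {
        if (len > rx->posted_cap)
            return -2;
        memcpy(rx->posted, data, len);   /* no intermediate copy needed */
        rx->delivered = len;
        rx->posted = NULL;
        return 1;
    }
    if (rx->used + len > UNEXPECTED_CAP)
        return -1;
    memcpy(rx->unexpected + rx->used, data, len);
    rx->used += len;
    return 0;
}
```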
Protocol - Rendezvous
Data is sent to the destination only when the receiver requests it.
To use it, add -use_rndv to the './configure' command.
No buffering is required.
[Figure: Rendezvous protocol data transfer. The sender sends only control messages (MPI_Control) and waits; when the receiver calls MPI_Recv, it replies with a request (MPI_Request); once the request matches, the data is transferred ("Received!")]
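The sender side of this handshake can be sketched as a small state machine: the sender first transmits only the envelope and holds the data, then transfers it once the receiver, having posted a matching MPI_Recv, asks for it. The states, events and rndv_step are hypothetical names, and the model assumes a single outstanding message.

```c
#include <assert.h>

enum rndv_state { RNDV_IDLE, RNDV_WAIT_REQUEST, RNDV_SENDING_DATA, RNDV_DONE };
enum rndv_event { EV_SEND_CALLED, EV_REQUEST_RECEIVED, EV_DATA_SENT };

/* Sender side of a rendezvous transfer. Returns the next state; an
 * event that is not valid in the current state leaves it unchanged. */
enum rndv_state rndv_step(enum rndv_state s, enum rndv_event ev)
{
    switch (s) {
    case RNDV_IDLE:
        /* send only the control message (envelope); keep the data */
        return ev == EV_SEND_CALLED ? RNDV_WAIT_REQUEST : s;
    case RNDV_WAIT_REQUEST:
        /* receiver posted a matching receive and asked for the data */
        return ev == EV_REQUEST_RECEIVED ? RNDV_SENDING_DATA : s;
    case RNDV_SENDING_DATA:
        /* data goes straight into the receiver's buffer: no staging copy */
        return ev == EV_DATA_SENT ? RNDV_DONE : s;
    case RNDV_DONE:
        return s;
    }
    assert(0 && "unknown state");
    return s;
}
```

Because no data moves until the receiver is ready, the receiver never has to buffer it, which is why this protocol suits large messages.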
Protocol - Get
In this protocol, the data is read directly by the receiver.
Data is transferred directly from one process's memory to another's.
Highest performance, but it requires either:
- shared memory, or
- remote memory operations
[Figure: Get protocol data transfer. The receiver, wanting data from the sender, directly accesses the sender's shared memory and copies the data into its own memory]
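The Get protocol needs memory that both processes can reach. As a runnable POSIX illustration, not MPICH's implementation, the sketch below maps an anonymous shared region, forks a "sender" child that writes its message into it, and lets the parent "receiver" copy the data directly out of that region once the child has finished. get_demo is a hypothetical name, waiting for the child to exit stands in for real synchronization, and MAP_ANONYMOUS is widely available but absent from older POSIX editions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copies the message the "sender" (child process) placed in shared
 * memory into out. Returns 0 on success, -1 on any system-call failure. */
int get_demo(const char *msg, char *out, size_t outlen)
{
    assert(outlen > 0);
    size_t size = 4096;
    char *shared = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, size);
        return -1;
    }
    if (pid == 0) {                       /* sender: write into shared memory */
        strncpy(shared, msg, size - 1);
        shared[size - 1] = '\0';
        _exit(0);
    }

    int status;                           /* receiver: wait until the data is ready */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(shared, size);
        return -1;
    }
    strncpy(out, shared, outlen - 1);     /* direct copy from the sender's memory */
    out[outlen - 1] = '\0';
    munmap(shared, size);
    return 0;
}
```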
Conclusion
MPI-1.1 (June 95)
MPI-1.1 does not provide:
- process management
- remote memory transfers
- active messages
- threads
- virtual shared memory
MPI-2 (July 97)
Extensions to MPI:
- process creation and management
- one-sided communications
- extended collective operations
- external interfaces
- I/O
- additional language bindings