29 November 2004 Parallel Computing @ ISDS 2
Organization

- Basics of Parallel Computing: structural, computational
- Coding for Distributed Computing: examples
- Resources at Duke: the CSEM Cluster
Basics of Distributed Computing

Online tutorial (Lawrence Livermore National Lab):
http://www.llnl.gov/computing/tutorials/parallel_comp/

- Serial computing: one computer, one CPU
- Parallel computing: multiple computers working at the same time
Various Setups

Collection of workstations:
- PVM (Parallel Virtual Machine)
- R (rpvm, Rmpi, SNOW)
- LAM-MPI (Local Area Multicomputer)
- Matlab

Dedicated cluster:
- Master/Slave model
- MPI
Network Layout

Basic layout: each node has CPU(s) and memory.

[Diagram: Nodes 1-4 connected by a network]
Designing a Parallel Program

Do the nodes need to interact?

"Embarrassingly parallel": very little communication
-- Monte Carlo

"Shamelessly parallel": communication needed
-- Heat diffusion models
-- Spatial models
Message Passing Interface

The "MPI" standard: easy-to-use functions that manage the communication between nodes.
Master/Slave Model

Organize the layout: Node 1 is the "Master"; Nodes 2 through p are the slaves.

[Diagram: Master (Node 1) connected to slave Nodes 2, 3, 4, ..., p]
Master/Slave Model

The "Master" divides the task into pieces, while the slaves "listen" to the network, waiting for work.
The Master sends out the work to be done; the slaves do the work while the Master waits for the answers.
The slaves return the results.
Example: Monte Carlo

The Master asks each of three slaves to draw N/3 values, compute, and return the result.

[Diagram: Master dispatching to Slaves 1-3; each slave draws N/3 values, computes, and returns]
Code

Write ONE program, the same for the master and the slaves.
Run that program on EACH node; each running copy has to figure out whether it is the master or a slave. (MPI provides the node's ID.)
Pseudo Code

The same program runs on every node; the original slide shows four identical copies, one each for the Master and Slaves 1-3:

LOAD DATA;
GET ID;
IF(ID==MASTER)
{
    MASTER();
}
ELSE
{
    SLAVE();
}
...
The MASTER() routine runs on the Master node; the identical SLAVE() routine runs on each of Slaves 1-3:

MASTER {
    // Find # of nodes
    GET NP;
    for(i in 1:NP)
    {
        "Tell process i to compute the mean for 1000 samples"
    }
    RET=RES=0;
    // Wait for results
    WHILE(RET<NP){
        ANS = RECEIVE();
        RES+=ANS/NP;
        RET++;
    }

SLAVE {
    ANS = 0;
    // Wait for orders
    // Receive NREPS
    for(i in 1:NREPS)
    {
        ANS += DRAW();
    }
    SEND(ANS/NREPS);
    RETURN TO MAIN;
};
The MASTER() routine then finishes, and every node, Master and slaves alike, returns to the common program and calls FINALIZE():

    PRINT RES;
    RETURN TO MAIN;
}

...
FINALIZE();
}
Example: C++ Code

Large dataset: N = 40, p = 1,000; one outcome variable, y.
Calculate R2 for all p one-variable regressions.
(parallel_R2.cpp)
CSEM Cluster @ Duke

Computational Science, Engineering and Medicine Cluster
http://www.csem.duke.edu/Cluster/clustermain.htm

Shared machines: core nodes and contributed nodes.
CSEM Cluster Details

- 4 dual-processor head nodes
- 64 dual-processor shared nodes (Intel Xeon 2.8 GHz)
- 40 dual-processor "stat" nodes (Intel Xeon 3.1 GHz)
- 161 dual-processor "other" nodes (owners get priority)
Using the Cluster

Access is limited:
ssh -l cmh27 cluster1.csem.duke.edu

- Data are stored locally on the cluster
- Compile using mpicc, mpif77, or mpif90
- The cluster uses the SGE queuing system
Queuing System

Submit your job with resource requests:
- memory usage
- number of CPUs (nodes)

SGE assigns nodes and schedules the job on the least-loaded machines fitting the requirements.
Jobs run outside of SGE are killed.
Compiling

Linked libraries, etc.:

g++ -c parallel_R2.cpp -I/opt/mpich-1.2.5/include -L/opt/mpich-1.2.5/lib
mpicc parallel_R2.o -o parallel_R2.exe -lstdc++ -lg2c -lm
Submitting a Job

Create a queue script:

#!/bin/tcsh
#
#$ -S /bin/tcsh -cwd
#$ -M [email protected] -m b
#$ -M [email protected] -m e
#$ -o parallel_R2.out -j y
#$ -pe low-all 5
cd /home/stat/username/
mpirun -np $NSLOTS -machinefile $TMPDIR/machines parallel_R2.exe

- The -M/-m lines email me when the job is submitted (b) and when it finishes (e)
- -o names the output file
- -pe sets the priority and the number of nodes requested
- cd moves to your home directory on the CSEM cluster
Submitting a Job

Type:
[cmh27@head1 cmh27]$ qsub parallel_R2.q
Your job 93734 ("parallel_R2.q") has been submitted.

Monitoring: http://clustermon.csem.duke.edu/ganglia/