MPI and High Performance Computing: Systems and Programming
Barry Britt, Systems Administrator
Department of Computer Science
Iowa State University
Purpose

To give you:
- an overview of some new system-level MPI functions
- access to tools that you need to compile and run MPI jobs
- some instruction in the creation and use of Makefiles
- some instruction on how to tell time in C programs
Makefiles
GNU Make
- Enables the end user to build and install a package without worrying about the details.
- Automatically figures out which files it needs to update based on which source files have changed.
- Not language dependent.
- Not limited to building a package; can also be used to install or uninstall.
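As a rough illustration of how Make decides what to update, the C sketch below compares modification times: a target is treated as out of date if it does not exist or if any dependency is newer than it. The function name `needs_rebuild` and its parameters are hypothetical and are not part of GNU Make, whose real logic also covers cases such as phony targets and implicit rules.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the target should be rebuilt, 0 otherwise.
 * target_exists : nonzero if the target file is present
 * target_mtime  : modification time of the target (ignored if it is missing)
 * dep_mtimes    : modification times of the dependencies
 * ndeps         : number of dependencies */
int needs_rebuild(int target_exists, time_t target_mtime,
                  const time_t *dep_mtimes, size_t ndeps)
{
    size_t k;

    assert(ndeps == 0 || dep_mtimes != NULL);

    if (!target_exists)
        return 1;                      /* nothing built yet */

    for (k = 0; k < ndeps; k++) {
        if (dep_mtimes[k] > target_mtime)
            return 1;                  /* a dependency changed after the build */
    }
    return 0;                          /* target is newer than all dependencies */
}
```

In a real program the modification times would typically come from stat(); they are passed in directly here to keep the sketch short.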
Makefile Rules

A rule tells Make how to execute a series of commands in order to build a target from source files.
- Specifies a list of dependencies.
- Dependencies should include ALL files that the target depends on.

General form (each command line must begin with a tab character):

    target: dependencies ...
    	commands ...
Example Makefile for C Source

    CC=gcc
    CFLAGS=-Wall
    INCLUDES=
    BINARIES=rand test

    .SUFFIXES: .c .o

    .c.o:
    	$(CC) $(CFLAGS) -c $*.c

    all: $(BINARIES)

    rand.o: rand.c
    test.o: test.c

    rand: rand.o
    	$(CC) $(CFLAGS) -o rand rand.o

    test: test.o
    	$(CC) $(CFLAGS) -o test test.o

    clean:
    	rm -f a.out core *.o $(BINARIES)
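Example Makefile for C Source

    CC=gcc
    CFLAGS=-Wall
    INCLUDES=
    BINARIES=rand test

Variables
- CC is set to use the GCC compiler. For MPI programs, set it to mpicc, not gcc.
- Compiler flags:
  - -c: compile to an object file without linking (passed in the .c.o rule rather than in CFLAGS)
  - -Wall: enable the commonly used set of compiler warnings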
Example Makefile for C Source

    clean:
    	rm -f a.out core *.o $(BINARIES)

Target “clean”. Use by typing make clean.

Rule states: in my current directory, run

    rm -f a.out core *.o $(BINARIES)

which, with BINARIES=rand test, expands to

    rm -f a.out core *.o rand test
Example Makefile for C Source

    .SUFFIXES: .c .o

    .c.o:
    	$(CC) $(CFLAGS) -c $*.c

Makefile instruction on how to handle .c files and turn them into object (.o) files:
- Compile using the $(CC) value with $(CFLAGS).
- Compile each individual file into its appropriate .o file.
Example Makefile for C Source

    rand.o: rand.c
    test.o: test.c

    rand: rand.o
    	$(CC) $(CFLAGS) -o rand rand.o

    test: test.o
    	$(CC) $(CFLAGS) -o test test.o

Target: rand or test
- Run $(CC) $(CFLAGS) -o rand rand.o, which expands to gcc -Wall -o rand rand.o.
- If you were going to link external libraries, they would be listed at the end of the command in the rule (for example, -lm for the math library).
Random Matrix Generation
Random Generator for Matrices

rand options:
- -f: filename to which to write the matrix
- -c: number of matrix columns
- -r: number of matrix rows
- -h: help documentation
- -s: seed
- -m: max integer in matrix cells
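The slides do not show how rand reads these options. One plausible approach, assuming a POSIX system, is getopt() from unistd.h. The struct `rand_opts` and the function `parse_rand_opts` below are hypothetical names used only for illustration, and the zero defaults are an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose getopt() under strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the options listed above. */
struct rand_opts {
    const char *filename;  /* -f */
    int cols;              /* -c */
    int rows;              /* -r */
    unsigned int seed;     /* -s */
    int max;               /* -m */
    int help;              /* -h */
};

/* Parse argv into *o. Returns 0 on success, -1 on an unknown option
 * or a missing option argument. */
int parse_rand_opts(int argc, char **argv, struct rand_opts *o)
{
    int ch;

    assert(o != NULL);
    o->filename = NULL;
    o->cols = o->rows = o->max = o->help = 0;
    o->seed = 0;

    optind = 1;  /* restart scanning in case getopt() was used before */
    while ((ch = getopt(argc, argv, "f:c:r:s:m:h")) != -1) {
        switch (ch) {
        case 'f': o->filename = optarg; break;
        case 'c': o->cols = atoi(optarg); break;
        case 'r': o->rows = atoi(optarg); break;
        case 's': o->seed = (unsigned int)strtoul(optarg, NULL, 10); break;
        case 'm': o->max = atoi(optarg); break;
        case 'h': o->help = 1; break;
        default:  return -1;
        }
    }
    return 0;
}
```

With this sketch, an invocation such as rand -f m.txt -r 3 -c 4 -s 7 -m 9 would fill the struct with the file name, 3 rows, 4 columns, seed 7, and a maximum cell value of 9.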
Random Generator for Matrices
- Completely random generation for an m by n matrix
- Uses a random seed to create the matrix
- Output file:
  - First line contains the number of rows and the number of columns.
  - Subsequent lines contain matrix cell values, one per line.
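A minimal sketch of a writer for the file format just described. Several details are assumptions, since the slides do not specify them: the two dimensions on the first line are separated by a space, cell values lie in the range 0..max, and the C library's srand()/rand() stand in for whatever generator the real tool uses. The function name `write_random_matrix` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a rows x cols matrix of pseudo-random integers in [0, max] to out.
 * Line 1: "<rows> <cols>"; every following line holds one cell value,
 * in row-major order. Returns 0 on success, -1 on a write error. */
int write_random_matrix(FILE *out, int rows, int cols,
                        unsigned int seed, int max)
{
    long k, cells;

    assert(out != NULL && rows > 0 && cols > 0);
    assert(max >= 0 && max < RAND_MAX);

    cells = (long)rows * cols;
    srand(seed);                       /* same seed gives the same matrix */

    if (fprintf(out, "%d %d\n", rows, cols) < 0)
        return -1;

    for (k = 0; k < cells; k++) {
        if (fprintf(out, "%d\n", rand() % (max + 1)) < 0)
            return -1;
    }
    return 0;
}
```

Calling write_random_matrix(fp, 5, 5, 1, 9) on an open file would produce 26 lines: one header line and 25 cell values.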
Random Generator for Matrices

For a matrix with row length m (that is, m columns), cell A[i,j] is on line:

    m * i + j + 2

The indices i and j are zero-indexed, but line numbers are not: the file starts at line 1, which holds the dimensions. Therefore, for a 5 x 5 matrix:
- A[0, 0] is on line 2
- A[0, 1] is on line 3
- A[2, 3] is on line 15
- A[4, 4] is on line 26
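The line formula can be written as a small C function and checked against the 5 x 5 examples above. The name `matrix_line` is hypothetical; `row_len` is the row length m from the formula.

```c
#include <assert.h>

/* Return the 1-based line number in the matrix file that holds A[i][j],
 * where i and j are zero-based and row_len is the number of columns.
 * Line 1 is the header, so cell A[0][0] sits on line 2. */
long matrix_line(int row_len, int i, int j)
{
    assert(row_len > 0);
    assert(i >= 0);
    assert(j >= 0 && j < row_len);

    return (long)row_len * i + j + 2;
}
```

For example, matrix_line(5, 2, 3) evaluates to 5 * 2 + 3 + 2 = 15, matching the slide.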
Calculating Run Time in C
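Calculating Running Time in C

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>   /* sleep() */

    int main() {
        struct timeval begin, end;
        double time;

        /* Record the wall-clock time before and after the work being measured. */
        gettimeofday(&begin, NULL);
        sleep(10);
        gettimeofday(&end, NULL);

        /* Whole seconds plus the microsecond difference converted to seconds. */
        time = (end.tv_sec - begin.tv_sec)
             + ((end.tv_usec - begin.tv_usec) / 1000000.0);
        printf("This program ran for %f seconds\n", time);

        return 0;
    }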
C Time

struct timeval
- Includes seconds and microseconds
- Used by the gettimeofday() system call

gettimeofday()
- Returns the number of seconds (and microseconds) since the UNIX Epoch
- Is this completely accurate? No, but it's VERY close (within a few microseconds).
C Time
- You MUST use the timeval struct for the gettimeofday() call.
- On UNIX systems, you need to include sys/time.h to use this.
- Calculation of time is:

    (end seconds - begin seconds) + ((end microseconds - begin microseconds) / 1000000.0)

  Divide by 1000000.0 rather than 1000000 so that C performs floating-point division and keeps the fractional part.
- You can calculate:
  - Program run time
  - Algorithm execution time
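To time a single algorithm rather than the whole program, the same calculation can be wrapped in a helper and called around the section of interest. This is a sketch; the name `elapsed_seconds` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() readings.
 * The microsecond difference may be negative (for example 0.9 s to 0.1 s
 * of the next second); adding it to the seconds difference still gives
 * the correct total. */
double elapsed_seconds(const struct timeval *begin, const struct timeval *end)
{
    assert(begin != NULL && end != NULL);

    return (double)(end->tv_sec - begin->tv_sec)
         + (double)(end->tv_usec - begin->tv_usec) / 1000000.0;
}
```

Typical use: call gettimeofday(&begin, NULL) immediately before the algorithm, gettimeofday(&end, NULL) immediately after it, and pass both structs to elapsed_seconds().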
Using the PBS Job Submission System
PBS (Torque/Maui)
- hpc-class job submission system
- Jobs are submitted with qsub.
- All queues are managed by the scheduler.
- PBS scripts can be created at:
  http://hpcgroup.public.iastate.edu/HPC/hpc-class/hpc-class_script_writer.html
Example script

    #!/bin/csh

    #PBS -o BATCH_OUTPUT
    #PBS -e BATCH_ERRORS
    #PBS -lvmem=256Mb,pmem=256Mb,mem=256Mb,nodes=16:ppn=2,cput=2:00:00,walltime=1:00:00

    # Change to directory from which qsub was executed
    cd $PBS_O_WORKDIR

    time mpirun -np 32 <program>
PBS Variables

-l (resources)
- vmem: total virtual memory
- pmem: per task memory
- mem: total aggregate memory
- nodes: total number of nodes
- ppn: processors per node
- cput: CPU time
- walltime: elapsed (wall-clock) time allowed for the job
PBS Variables
- vmem = pmem = mem
- total CPUs = nodes * ppn
- cput = walltime * ppn
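The two relations above are simple arithmetic, sketched here in C so they can be checked against the example script. The function names `total_cpus` and `cput_seconds` are hypothetical, and times are expressed in seconds for convenience.

```c
#include <assert.h>

/* total CPUs = nodes * ppn */
int total_cpus(int nodes, int ppn)
{
    assert(nodes > 0 && ppn > 0);
    return nodes * ppn;
}

/* cput = walltime * ppn, with both times given in seconds */
long cput_seconds(long walltime_seconds, int ppn)
{
    assert(walltime_seconds >= 0 && ppn > 0);
    return walltime_seconds * ppn;
}
```

For the example script, nodes=16 and ppn=2 give 32 CPUs, which matches mpirun -np 32, and a walltime of 1:00:00 (3600 seconds) with ppn=2 gives 7200 seconds, which is the cput value of 2:00:00.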
PBS (Torque/Maui)

Based on the previous script:
- BATCH_OUTPUT contains the output from the batch job.
- BATCH_ERRORS contains the error information from the batch job.
Some other important information
- Max CPU: 32 for classwork
- Max memory: 2.0 GB
- Max swap: 2.0 GB
- Short queue:
  - 4 nodes per job; 16 total CPUs
  - 1 hour per job
  - 2 total jobs per user
MPI Blocking vs. Non-Blocking Communication
MPI Communication

Blocking Communication: MPI_Send | MPI_Recv
- MPI_Send → Basic blocking send operation. The routine returns only after the application buffer in the sending task is free for reuse.
- MPI_Recv → Receives a message and blocks until the requested data is available in the application buffer in the receiving task.
MPI Communication

Non-blocking Communication: MPI_Isend | MPI_Irecv | MPI_Wait | MPI_Test
- MPI_Isend → Identifies an area in memory to serve as a send buffer. Processing continues without waiting for the message to be copied out from the buffer.
- MPI_Irecv → Identifies an area in memory to serve as a receive buffer. Processing continues immediately without waiting for the message to be received and copied into the buffer.
- MPI_Test → Checks the status of a non-blocking send or receive.
- MPI_Wait → Blocks until a specified non-blocking send or receive operation has completed.
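Why non-blocking communication?
- In some cases, it can increase performance.
- If there is an expensive operation you need to do, overlapping it with communication helps speed up the program:
  - Disk I/O
  - Heavy processing on already received data
- BE CAREFUL!!! If you try to access a buffer before the non-blocking operation on it has completed (that is, before MPI_Wait returns or MPI_Test reports completion), your program can fail or produce wrong results.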
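A minimal sketch of overlapping computation with a non-blocking receive. It assumes at least two processes and an MPI implementation that provides MPI_STATUS_IGNORE (MPI-2 or later). The tag value, the payload 42, and the local loop are placeholders chosen for illustration; they do not come from the slides.

```c
#include <stdio.h>
#include <mpi.h>

#define TAG 0   /* assumed message tag */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "Run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        int incoming = 0, local = 0, i;
        MPI_Request req;

        /* Post the receive, then keep working while the message is in flight. */
        MPI_Irecv(&incoming, 1, MPI_INT, 1, TAG, MPI_COMM_WORLD, &req);

        /* Placeholder work that does NOT touch the receive buffer. */
        for (i = 1; i <= 1000; i++)
            local += i;

        /* Only after MPI_Wait returns is it safe to read 'incoming'. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("local = %d, received = %d\n", local, incoming);
    } else if (rank == 1) {
        int outgoing = 42;
        MPI_Request req;

        MPI_Isend(&outgoing, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD, &req);

        /* 'outgoing' must not be modified until the send has completed. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with, for example, mpirun -np 2 <program>. Ranks other than 0 and 1 do nothing in this sketch.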
    #include <stdio.h>
    #include <mpi.h>

    #define TAG 0   /* message tag; not defined on the slides, 0 is assumed here */

    int master(void);
    int slave(void);

    int main (int argc, char **argv) {
        int myRank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

        /* Rank 0 collects the partial sums; every other rank computes one. */
        if (myRank == 0)
            master();
        else
            slave();

        MPI_Finalize();
        return 0;
    }

    int master(void) {
        int i, size, my_answer = 0, their_work = 0;
        MPI_Status status;

        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Blocking receive from each worker in rank order. */
        for (i = 1; i < size; i++) {
            MPI_Recv(&their_work, 1, MPI_INT, i, TAG, MPI_COMM_WORLD, &status);
            my_answer += their_work;
        }
        printf("The answer is: %d\n", my_answer);

        return 0;
    }

    int slave(void) {
        int i, myRank, size, namelength, work = 0;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &namelength);

        /* Each of the (size-1) workers adds up its own slice of 1..100. */
        printf("[%s]: Adding the numbers %d to %d = ", name,
               (100 / (size-1)) * (myRank-1) + 1, (100 / (size-1)) * myRank);

        for (i = (100 / (size-1)) * (myRank-1) + 1; i <= myRank * (100 / (size-1)); i++) {
            work = work + i;
        }
        printf("%d\n", work);

        /* Blocking send of the partial sum to the master (rank 0). */
        MPI_Send(&work, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);

        return 0;
    }
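The range that each worker adds up in slave() can be checked without MPI. The sketch below reproduces the same arithmetic in plain C; `chunk_start`, `chunk_end`, and `covered_sum` are hypothetical helper names, `workers` stands for size - 1, and `worker_rank` runs from 1 to `workers`. Because of integer division, the ranges cover all of 1..100 only when 100 is divisible by the number of workers.

```c
#include <assert.h>

/* First number handled by a worker, as in slave():
 * (total / workers) * (rank - 1) + 1 */
int chunk_start(int total, int workers, int worker_rank)
{
    assert(workers > 0 && worker_rank >= 1 && worker_rank <= workers);
    return (total / workers) * (worker_rank - 1) + 1;
}

/* Last number handled by a worker, as in slave():
 * (total / workers) * rank */
int chunk_end(int total, int workers, int worker_rank)
{
    assert(workers > 0 && worker_rank >= 1 && worker_rank <= workers);
    return (total / workers) * worker_rank;
}

/* Sum that the master would print: the partial sums of all workers added up. */
long covered_sum(int total, int workers)
{
    long sum = 0;
    int w, i;

    for (w = 1; w <= workers; w++) {
        for (i = chunk_start(total, workers, w);
             i <= chunk_end(total, workers, w); i++) {
            sum += i;
        }
    }
    return sum;
}
```

With 4 workers (5 processes), each worker handles 25 numbers and covered_sum(100, 4) is 5050. With 3 workers (4 processes), each handles 33 numbers, the value 100 is never added, and covered_sum(100, 3) is 4950.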