Advanced High Performance Computing Workshop
HPC 201
Charles J Antonelli, Mark Champe, Seth Meyer
LSA IT ARS
October, 2014
Roadmap
• Flux review
• Globus Connect
• Advanced PBS: array & dependent scheduling
• Tools
• GPUs on Flux
• Scientific applications: R, Python, MATLAB
• Parallel programming
• Debugging & profiling
Flux review
The Flux cluster
• Login nodes
• Compute nodes
• Storage…
• Data transfer node
A Flux node
• 12-40 Intel cores
• 48 GB - 1 TB RAM
• Local disk
• 8 GPUs (GPU Flux); each GPU contains 2,688 GPU cores
Programming Models
Two basic parallel programming models:
Message-passing
The application consists of several processes running on different nodes and communicating with each other over the network
Used when the data are too large to fit on a single node, and simple synchronization is adequate
"Coarse-grained parallelism" or "SPMD"
Implemented using MPI (Message Passing Interface) libraries
Multi-threaded
The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
Used when the data fit into a single process, and the communication overhead of message passing would be intolerable
"Fine-grained parallelism" or "shared-memory parallelism"
Implemented using OpenMP (Open Multi-Processing) compilers and libraries
Both models can also be combined in a single hybrid application
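As a rough sketch, the two models differ visibly at launch time. The binary names below are hypothetical; mpirun appears later in this workshop's batch scripts, and OMP_NUM_THREADS is the standard OpenMP thread-count variable:

```shell
# Message-passing: many processes, spread across nodes by the MPI launcher
mpirun -np 12 ./mpi_app        # 12 MPI ranks, possibly on different nodes

# Multi-threaded: one process, parallelism controlled by OpenMP
OMP_NUM_THREADS=12 ./omp_app   # 12 threads sharing one node's memory
```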
Using Flux
Three basic requirements:
• A Flux login account
• A Flux allocation
• An MToken (or a Software Token)
Logging in to Flux:
ssh flux-login.engin.umich.edu
works from campus wired networks or MWireless. Otherwise:
• use the VPN, or
• ssh login.itd.umich.edu first
Cluster batch workflow
You create a batch script and submit it to PBS
PBS schedules your job, and it enters the flux queue
When its turn arrives, your job will execute the batch script
Your script has access to all Flux applications and data
When your script completes, anything it sent to standard output and error is saved in files stored in your submission directory
You can ask that email be sent to you when your job starts, ends, or aborts
You can check on the status of your job at any time,or delete it if it's not doing what you want
A short time after your job completes, it disappears from PBS
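A typical interactive session around this workflow looks like the following sketch (the jobid shown is illustrative; qsub, qstat, and qdel are the standard Torque/PBS commands):

```shell
qsub myjob.pbs       # submit the batch script; prints the new jobid, e.g. 12345
qstat -u uniqname    # check on the status of your jobs
qdel 12345           # delete the job if it's not doing what you want
```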
Loosely-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01
Tightly-coupled batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=4gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script
GPU batch script
#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:gpus=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r gpuscript
Copying data
From Linux or Mac OS X, use scp or sftp.
Non-interactive (scp):
scp localfile uniqname@flux-xfer.engin.umich.edu:remotefile
scp -r localdir uniqname@flux-xfer.engin.umich.edu:remotedir
scp uniqname@flux-login.engin.umich.edu:remotefile localfile
Use "." as the destination to copy to your Flux home directory:
scp localfile uniqname@flux-xfer.engin.umich.edu:.
… or to your Flux scratch directory:
scp localfile uniqname@flux-xfer.engin.umich.edu:/scratch/allocname/uniqname
Interactive (sftp):
sftp uniqname@flux-xfer.engin.umich.edu
From Windows, use WinSCP (U-M Blue Disc: http://www.itcs.umich.edu/bluedisc/)
Version ca-0.96 - 8 Oct 2014
Globus Online
Features:
• High-speed data transfer, much faster than scp or sftp
• Reliable & persistent
• Minimal client software: Mac OS X, Linux, Windows
GridFTP endpoints: gateways through which data flow
• Exist for XSEDE, OSG, …
• UMich: umich#flux, umich#nyx
• Add your own client endpoint!
• Add your own server endpoint: contact flux-support@umich.edu
More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp
Advanced PBS
Job Arrays
• Submit copies of identical jobs
• Invoked via qsub -t:
qsub -t array-spec pbsbatch.txt
where array-spec can be
m-n
a,b,c
m-n%slotlimit
e.g.
qsub -t 1-50%10
submits fifty jobs, numbered 1 through 50, of which only ten can run simultaneously
• $PBS_ARRAYID records the array identifier of each copy
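A sketch of how an array batch script typically uses $PBS_ARRAYID: each copy of the job selects its own input file (the filenames here are hypothetical, and the fallback value lets the sketch run outside PBS):

```shell
# Body of an array-job batch script (sketch): PBS sets PBS_ARRAYID for each copy
PBS_ARRAYID=${PBS_ARRAYID:-7}    # fallback so the sketch also runs outside PBS
INPUT="input_${PBS_ARRAYID}.dat"
echo "task ${PBS_ARRAYID} processing ${INPUT}"
```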
Dependent scheduling
• Submit a job to become eligible for execution at a given time
• Invoked via qsub -a:
qsub -a [[[[CC]YY]MM]DD]hhmm[.SS] …
qsub -a 201412312359 j1.pbs
j1.pbs becomes eligible one minute before New Year's Day 2015
qsub -a 1800 j2.pbs
j2.pbs becomes eligible at six PM today (or tomorrow, if submitted after six PM)
Dependent scheduling
• Submit a job to run after specified job(s)
• Invoked via qsub -W:
qsub -W depend=type:jobid[:jobid]…
where type can be:
after       schedule this job after the jobids have started
afterany    schedule this job after the jobids have finished
afterok     schedule this job after the jobids have finished with no errors
afternotok  schedule this job after the jobids have finished with errors
qsub first.pbs                              # assume it receives jobid 12345
qsub -W depend=afterany:12345 second.pbs
schedules second.pbs after first.pbs completes
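In practice you rarely type the jobid by hand: qsub prints the id of the job it just submitted, so the dependency can be captured in a shell variable. This sketch assumes the same first.pbs and second.pbs as above:

```shell
jobid=$(qsub first.pbs)                     # qsub prints the new job's id
qsub -W depend=afterany:${jobid} second.pbs
```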
Dependent scheduling
• Submit a job to run before specified job(s)
• Requires the dependent jobs to be scheduled first
• Invoked via qsub -W:
qsub -W depend=type:jobid[:jobid]…
where type can be:
before       the jobids are scheduled after this job starts
beforeany    the jobids are scheduled after this job completes
beforeok     the jobids are scheduled after this job completes with no errors
beforenotok  the jobids are scheduled after this job completes with errors
on:N         wait for N job completions
qsub -W depend=on:1 second.pbs              # assume it receives jobid 12345
qsub -W depend=beforeany:12345 first.pbs
schedules second.pbs after first.pbs completes
Troubleshooting
showq [-r][-i][-b][-w user=uniq] # running/idle/blocked jobs
qstat -f jobno # full info incl gpu
qstat -n jobno # nodes/cores where job running
diagnose -p # job prio and components
pbsnodes # nodes, states, properties
pbsnodes -l # list nodes marked down
checkjob [-v] jobno # why job jobno not running
mdiag -a # allocs & users (flux)
freenodes # aggregate node/core busy/free
mdiag -u uniq # allocs for uniq (flux)
mdiag -a alloc_flux # cores active, alloc (flux)
Scientific applications
Scientific Applications
R (incl snow and multicore)
R with GPU (GpuLm, dist)
SAS, Stata
Python, SciPy, NumPy, BioPy
MATLAB with GPU
CUDA Overview
CUDA C (matrix multiply)
Python
Python software available on Flux:
EPD: The Enthought Python Distribution provides scientists with a comprehensive set of tools to perform rigorous data analysis and visualization. https://www.enthought.com/products/epd/
biopython: Python tools for computational molecular biology. http://biopython.org/wiki/Main_Page
numpy: the fundamental package for scientific computing. http://www.numpy.org/
scipy: a Python-based ecosystem of open-source software for mathematics, science, and engineering. http://www.scipy.org/
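On a module-based cluster like Flux you load a Python build before using these packages. The module name below is an assumption; check what is actually installed first:

```shell
module avail python    # list the Python builds installed on Flux
module load python     # assumed module name; substitute the version you need
python -c 'import numpy, scipy; print(numpy.__version__)'
```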
Debugging & profiling
Debugging with GDB
Command-line debugger:
• Start programs or attach to running programs
• Display source program lines
• Display and change variables or memory
• Plant breakpoints, watchpoints
• Examine stack frames
Excellent tutorial documentation: http://www.gnu.org/s/gdb/documentation/
Compiling for GDB
Debugging is easier if you ask the compiler to generate extra source-level debugging information.
Add the -g flag to your compilation:
icc -g serialprogram.c -o serialprogram
or
mpicc -g mpiprogram.c -o mpiprogram
GDB will work without symbols, but you'll need to be fluent in machine instructions and hexadecimal.
Be careful using -O with -g:
• Some compilers won't optimize code when debugging
• Most will, but you sometimes won't recognize the resulting source code at optimization level -O2 and higher
• Use -O0 -g to suppress optimization
Running GDB
Two ways to invoke GDB:
Debugging a serial program:
gdb ./serialprogram
Debugging an MPI program:
mpirun -np N xterm -e gdb ./mpiprogram
This gives you N separate GDB sessions, each debugging one rank of the program.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start xterms there.
Useful GDB commands
gdb exec              start gdb on executable exec
gdb exec core         start gdb on executable exec with core file core
l [m,n]               list source
disas                 disassemble function enclosing current instruction
disas func            disassemble function func
b func                set breakpoint at entry to func
b line#               set breakpoint at source line#
b *0xaddr             set breakpoint at address addr
i b                   show breakpoints
d bp#                 delete breakpoint bp#
r [args]              run program with optional args
bt                    show stack backtrace
c                     continue execution from breakpoint
step                  single-step one source line
next                  single-step, don't step into functions
stepi                 single-step one instruction
p var                 display contents of variable var
p *var                display value pointed to by var
p &var                display address of var
p arr[idx]            display element idx of array arr
x 0xaddr              display hex word at addr
x *0xaddr             display hex word pointed to by addr
x/20x 0xaddr          display 20 words in hex starting at addr
i r                   display registers
i r ebp               display register ebp
set var = expression  set variable var to expression
q                     quit gdb
Debugging with DDT
Allinea's Distributed Debugging Tool (DDT) is a comprehensive graphical debugger designed for the complex task of debugging parallel code.
Advantages include:
• Provides a GUI interface to debugging, with capabilities similar to, e.g., Eclipse or Visual Studio
• Supports parallel debugging of MPI programs
• Scales much better than GDB
Running DDT
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the DDT module:
module load ddt
Start DDT:
ddt mpiprogram
This starts a DDT session, debugging all ranks concurrently.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start DDT there.
http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
http://content.allinea.com/downloads/userguide.pdf
Application Profiling with MAP
Allinea's MAP tool is a statistical application profiler designed for the complex task of profiling parallel code.
Advantages include:
• Provides a GUI interface to profiling
• Observe cumulative results, drill down for details
• Supports parallel profiling of MPI programs
• Handles most of the details under the covers
Running MAP
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the DDT module (MAP is installed alongside DDT):
module load ddt
Start MAP:
map mpiprogram
This starts a MAP session: it runs your program, gathers profile data, and displays summary statistics.
Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start MAP there.
http://content.allinea.com/downloads/userguide.pdf
Resources
• U-M Advanced Research Computing Flux pages:
http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
http://arc.research.umich.edu/resources-services/flux/
• CAEN HPC Flux pages: http://cac.engin.umich.edu/
• CAEN HPC YouTube channel: http://www.youtube.com/user/UMCoECAC
For assistance: flux-support@umich.edu
• Read by a team of people, including unit support staff
• They cannot help with programming questions, but can help with operational Flux and basic usage questions
References
1. Supported Flux software, http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/ (accessed June 2014).
2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed June 2014).
3. Intel C and C++ Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_c (accessed June 2014).
4. Intel Fortran Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_f (accessed June 2014).
5. Torque Administrator's Guide, http://docs.adaptivecomputing.com/torque/4-2-8/torqueAdminGuide-4.2.8.pdf (accessed June 2014).
6. Submitting GPGPU Jobs, https://sites.google.com/a/umich.edu/engin-cac/resources/systems/flux/gpgpus (accessed June 2014).
7. Allinea user guide, http://content.allinea.com/downloads/userguide.pdf (accessed October 2014).