Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Page 1: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014

Advanced High Performance Computing Workshop
HPC 201

Charles J Antonelli
Mark Champe
Seth Meyer
LSAIT ARS

October, 2014

Page 2: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Roadmap

Flux review

Globus Connect

Advanced PBS: Array & dependent scheduling

Tools

GPUs on Flux

Scientific applications: R, Python, MATLAB

Parallel programming

Debugging & profiling


Page 3: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Flux review


Page 4: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


The Flux cluster

Diagram: login nodes, compute nodes, storage, and a data transfer node.


Page 5: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


A Flux node

12-40 Intel cores

48 GB - 1 TB RAM

Local disk


8 GPUs (GPU Flux)

Each GPU contains 2,688 GPU cores

Page 6: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Programming Models

Two basic parallel programming models:

Message-passing: the application consists of several processes running on different nodes, communicating with each other over the network.

Used when the data are too large to fit on a single node and simple synchronization is adequate.

"Coarse-grained parallelism" or "SPMD".

Implemented using MPI (Message Passing Interface) libraries.

Multi-threaded: the application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives.

Used when the data fit into a single process and the communication overhead of the message-passing model is intolerable.

"Fine-grained parallelism" or "shared-memory parallelism".

Implemented using OpenMP (Open Multi-Processing) compilers and libraries.

Both models can be combined in a single hybrid application; a minimal build-and-run sketch of each model follows below.
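A minimal sketch of how each model is typically built and launched; the source file names are illustrative, and the OpenMP flag shown is for the Intel 14 compiler (-fopenmp with gcc):

# Message-passing (MPI): separate processes, one per core, possibly on different nodes
mpicc -g mpi_hello.c -o mpi_hello        # compile with the MPI wrapper compiler
mpirun -np 12 ./mpi_hello                # launch 12 ranks on the cores PBS assigned

# Multi-threaded (OpenMP): one process, several threads sharing memory on one node
icc -openmp omp_hello.c -o omp_hello     # enable OpenMP (-fopenmp with gcc)
export OMP_NUM_THREADS=12                # number of threads to create
./omp_hello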


Page 7: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Using Flux

Three basic requirements:
A Flux login account
A Flux allocation
An MToken (or a Software Token)

Logging in to Flux:
ssh flux-login.engin.umich.edu
works from campus wired or MWireless; otherwise:

use the VPN, or

ssh login.itd.umich.edu first
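A minimal sketch of the off-campus two-hop path; "uniqname" is a placeholder for your U-M login name:

ssh uniqname@login.itd.umich.edu           # first hop: the campus login service
ssh uniqname@flux-login.engin.umich.edu    # second hop: on to Flux
# add -X (or -Y) to the ssh commands if you will need X11 applications such as xterm later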


Page 8: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Cluster batch workflow

You create a batch script and submit it to PBS

PBS schedules your job, and it enters the flux queue

When its turn arrives, your job will execute the batch script

Your script has access to all Flux applications and data

When your script completes, anything it sent to standard output and error is saved in files stored in your submission directory

You can ask that email be sent to you when your job starts, ends, or aborts

You can check on the status of your job at any time, or delete it if it's not doing what you want

A short time after your job completes, it disappears from PBS
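A minimal sketch of that workflow at the shell, assuming a script named myjob.pbs; the job id shown is illustrative:

qsub myjob.pbs        # submit the batch script; PBS prints the new job id
qstat -u $USER        # check on all of your queued and running jobs
checkjob 12345        # ask why a specific job (here 12345) is not running yet
qdel 12345            # delete the job if it's not doing what you want
ls                    # after completion, output appears here, e.g. yourjobname.o12345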


Page 9: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01


Page 10: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=4gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script


Page 11: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


GPU batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:gpus=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

#Your Code Goes Below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r gpuscript
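If you want to confirm which GPU the job was given before starting MATLAB, a hedged addition to the script body is sketched here; nvidia-smi is the standard NVIDIA utility, and whether CUDA_VISIBLE_DEVICES is set depends on the scheduler configuration:

nvidia-smi                    # list the GPU(s) visible to this job
echo $CUDA_VISIBLE_DEVICES    # GPU index assigned by the scheduler, if the variable is set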


Page 12: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Copying data

From Linux or Mac OS X, use scp or sftp

Non-interactive (scp):
scp localfile [email protected]:remotefile
scp -r localdir [email protected]:remotedir
scp [email protected]:remotefile localfile

Use "." as the destination to copy to your Flux home directory:
scp localfile [email protected]:.

... or to your Flux scratch directory:
scp localfile [email protected]:/scratch/allocname/uniqname

Interactive (sftp), with a short session sketch below:
sftp [email protected]

From Windows, use WinSCP
U-M Blue Disc: http://www.itcs.umich.edu/bluedisc/
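A minimal interactive sftp session, connecting to the same transfer host as above; the file names are illustrative:

sftp [email protected]
sftp> put localfile      # upload into the current remote directory
sftp> get remotefile     # download into the current local directory
sftp> quit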


Page 13: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Globus Online

Features:

High-speed data transfer, much faster than SCP or SFTP

Reliable & persistent

Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints: gateways through which data flow

Exist for XSEDE, OSG, …

UMich: umich#flux, umich#nyx

Add your own client endpoint!

Add your own server endpoint: contact [email protected]

More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

Page 14: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Advanced PBS


Page 15: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Job Arrays

• Submit copies of identical jobs
• Invoked via qsub -t:

qsub -t array-spec pbsbatch.txt

Where array-spec can be

m-n

a,b,c

m-n%slotlimit

e.g.

qsub -t 1-50%10
Fifty jobs, numbered 1 through 50; only ten can run simultaneously

• $PBS_ARRAYID records the array identifier of each job (see the sketch below)
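A minimal sketch of an array-aware script body, assuming input files named input.1 through input.50; the program and file names are illustrative:

#Your Code Goes Below:
cd $PBS_O_WORKDIR
# each array element processes the input file selected by its array id
./myprogram input.$PBS_ARRAYID > output.$PBS_ARRAYID

Submit it with, for example, qsub -t 1-50%10 arrayjob.pbs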


Page 16: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Dependent scheduling

• Submit job to become eligible for execution at a given time

• Invoked via qsub -a:
qsub -a [[[[CC]YY]MM]DD]hhmm[.SS] …

qsub -a 201412312359 j1.pbs
j1.pbs becomes eligible one minute before New Year's Day 2015

qsub -a 1800 j2.pbs
j2.pbs becomes eligible at six PM today (or tomorrow, if submitted after six PM)


Page 17: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Dependent scheduling

• Submit job to run after specified job(s)
• Invoked via qsub -W:

qsub -W depend=type:jobid[:jobid]…

Where depend can be:

after        Schedule this job after jobids have started
afterany     Schedule this job after jobids have finished
afterok      Schedule this job after jobids have finished with no errors
afternotok   Schedule this job after jobids have finished with errors

qsub first.pbs                              # assume it receives jobid 12345
qsub -W depend=afterany:12345 second.pbs
Schedules second.pbs after first.pbs completes
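In practice you would capture the job id rather than typing it by hand; a minimal sketch, assuming a standard Torque qsub that prints the job id on standard output:

jid=$(qsub first.pbs)                       # capture the job id printed by qsub
qsub -W depend=afterok:$jid second.pbs      # runs only if first.pbs finishes with no errors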


Page 18: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Dependent scheduling

• Submit job to run before specified job(s)
• Requires dependent jobs to be scheduled first
• Invoked via qsub -W:

qsub -W depend=type:jobid[:jobid]…

Where depend can be:

before       jobids scheduled after this job starts
beforeany    jobids scheduled after this job completes
beforeok     jobids scheduled after this job completes with no errors
beforenotok  jobids scheduled after this job completes with errors
on:N         wait for N job completions

qsub -W depend=on:1 second.pbs              # assume it receives jobid 12345
qsub -W depend=beforeany:12345 first.pbs
Schedules second.pbs after first.pbs completes


Page 19: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Troubleshooting

showq [-r][-i][-b][-w user=uniq] # running/idle/blocked jobs

qstat -f jobno # full info incl gpu

qstat -n jobno # nodes/cores where job running

diagnose -p # job prio and components

pbsnodes # nodes, states, properties

pbsnodes -l # list nodes marked down

checkjob [-v] jobno # why job jobno not running

mdiag -a # allocs & users (flux)

freenodes # aggregate node/core busy/free

mdiag -u uniq # allocs for uniq (flux)

mdiag -a alloc_flux # cores active, alloc (flux)


Page 20: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Scientific applications


Page 21: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Scientific Applications

R (incl snow and multicore)

R with GPU (GpuLm, dist)

SAS, Stata

Python, SciPy, NumPy, BioPy

MATLAB with GPU

CUDA Overview

CUDA C (matrix multiply)


Page 22: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Python

Python software available on Flux:

EPD: the Enthought Python Distribution provides scientists with a comprehensive set of tools to perform rigorous data analysis and visualization.
https://www.enthought.com/products/epd/

biopython: Python tools for computational molecular biology
http://biopython.org/wiki/Main_Page

numpy: fundamental package for scientific computing
http://www.numpy.org/

scipy: Python-based ecosystem of open-source software for mathematics, science, and engineering
http://www.scipy.org/
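A minimal check that these packages are available in your session; the module name is an assumption, so verify it with module avail python:

module load python                                          # assumed module name on Flux
python -c "import numpy, scipy; print(numpy.__version__)"   # quick import test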


Page 23: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Debugging & profiling


Page 24: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Debugging with GDB

Command-line debugger
Start programs or attach to running programs
Display source program lines
Display and change variables or memory
Plant breakpoints, watchpoints
Examine stack frames

Excellent tutorial documentation:
http://www.gnu.org/s/gdb/documentation/


Page 25: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Compiling for GDB

Debugging is easier if you ask the compiler to generate extra source-level debugging information

Add the -g flag to your compilation:
icc -g serialprogram.c -o serialprogram
or
mpicc -g mpiprogram.c -o mpiprogram

GDB will work without symbols, but you need to be fluent in machine instructions and hexadecimal

Be careful using -O with -g

Some compilers won't optimize code when debugging

Most will, but you sometimes won't recognize the resulting source code at optimization level -O2 and higher

Use -O0 -g to suppress optimization
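A minimal side-by-side sketch of the two compilations; the source file name is illustrative:

icc -O0 -g myprog.c -o myprog_debug    # no optimization: stepping follows the source line by line
icc -O2 -g myprog.c -o myprog_fast     # optimized: still debuggable, but code motion can be confusing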


Page 26: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Running GDB

Two ways to invoke GDB:

Debugging a serial program:
gdb ./serialprogram

Debugging an MPI program:
mpirun -np N xterm -e gdb ./mpiprogram

This gives you N separate GDB sessions, each debugging one rank of the program

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start xterms there
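Putting the pieces together for a small parallel debug session; "uniqname", the rank count, and the program name are illustrative, and X11 forwarding must work end to end for the xterms to appear:

ssh -X uniqname@flux-login.engin.umich.edu   # -X (or -Y) forwards X11
mpicc -g mpiprogram.c -o mpiprogram          # build with debug symbols
mpirun -np 4 xterm -e gdb ./mpiprogram       # one xterm running gdb per MPI rank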


Page 27: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Useful GDB commands

gdb exec           start gdb on executable exec
gdb exec core      start gdb on executable exec with core file core
l [m,n]            list source
disas              disassemble function enclosing current instruction
disas func         disassemble function func
b func             set breakpoint at entry to func
b line#            set breakpoint at source line#
b *0xaddr          set breakpoint at address addr
i b                show breakpoints
d bp#              delete breakpoint bp#
r [args]           run program with optional args
bt                 show stack backtrace
c                  continue execution from breakpoint
step               single-step one source line
next               single-step, don't step into functions
stepi              single-step one instruction
p var              display contents of variable var
p *var             display value pointed to by var
p &var             display address of var
p arr[idx]         display element idx of array arr
x 0xaddr           display hex word at addr
x *0xaddr          display hex word pointed to by addr
x/20x 0xaddr       display 20 words in hex starting at addr
i r                display registers
i r ebp            display register ebp
set var = expression   set variable var to expression
q                  quit gdb
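A minimal interactive session using a few of these commands on a program built with -g; the variable name is illustrative, and the text after each command describes what it does:

gdb ./serialprogram
(gdb) b main        set a breakpoint at the entry to main
(gdb) r             run the program; it stops at the breakpoint
(gdb) p argc        display the contents of a variable in scope
(gdb) bt            show the stack backtrace
(gdb) c             continue execution to completion
(gdb) q             quit gdb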


Page 28: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Debugging with DDT

Allinea's Distributed Debugging Tool is a comprehensive graphical debugger designed for the complex task of debugging parallel code

Advantages include:
Provides a GUI interface to debugging

Similar capabilities as, e.g., Eclipse or Visual Studio

Supports parallel debugging of MPI programs
Scales much better than GDB


Page 29: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Running DDT

Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram

Load the DDT module:
module load ddt

Start DDT:
ddt mpiprogram

This starts a DDT session, debugging all ranks concurrently

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start ddt there

http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/

http://content.allinea.com/downloads/userguide.pdf


Page 30: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Application Profiling with MAP

Allinea's MAP Tool is a statistical application profiler designed for the complex task of profiling parallel code

Advantages include:
Provides a GUI interface to profiling

Observe cumulative results, drill down for details

Supports parallel profiling of MPI programs

Handles most of the details under the covers


Page 31: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Running MAP

Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram

Load the MAP module:
module load ddt

Start MAP:
map mpiprogram

This starts a MAP session
Runs your program, gathers profile data, and displays summary statistics

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start MAP there

http://content.allinea.com/downloads/userguide.pdf


Page 32: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


Resources

http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
U-M Advanced Research Computing Flux pages

http://arc.research.umich.edu/resources-services/flux/
U-M Advanced Research Computing Flux pages

http://cac.engin.umich.edu/
CAEN HPC Flux pages

http://www.youtube.com/user/UMCoECAC
CAEN HPC YouTube channel

For assistance: [email protected]
Read by a team of people including unit support staff
Cannot help with programming questions, but can help with operational Flux and basic usage questions


Page 33: Advanced High Performance Computing Workshop HPC 201 Charles J Antonelli Mark Champe Seth Meyer LSAIT ARS October, 2014


References

1. Supported Flux software, http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/ (accessed June 2014).

2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed June 2014).

3. Intel C and C++ Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_c (accessed June 2014).

4. Intel Fortran Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_f (accessed June 2014).

5. Torque Administrator's Guide, http://docs.adaptivecomputing.com/torque/4-2-8/torqueAdminGuide-4.2.8.pdf (accessed June 2014).

6. Submitting GPGPU Jobs, https://sites.google.com/a/umich.edu/engin-cac/resources/systems/flux/gpgpus (accessed June 2014).

7. http://content.allinea.com/downloads/userguide.pdf (accessed October 2014).
