
Advanced High Performance Computing Workshop: HPC 201

Charles J Antonelli, Mark Champe, Seth Meyer
LSAIT ARS
September, 2014

Roadmap

Flux review

Globus Connect

Advanced PBS: array & dependent scheduling

Tools

GPUs on Flux

Scientific applications: R, Python, MATLAB

Parallel programming

Debugging & profiling

Flux review

The Flux cluster

Login nodes
Compute nodes
Storage…
Data transfer node

A Flux node

12-40 Intel cores

48 GB – 1 TB RAM

Local disk

8 GPUs (GPU Flux)

Each GPU contains 2,688 GPU cores
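On a GPU node, a quick way to see what your job can use is NVIDIA's standard query tool (a sketch; assumes the usual driver utilities are on the PATH):

nvidia-smi       # driver version, per-GPU memory and utilization
nvidia-smi -L    # one line per device: index, model, UUID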

Programming Models

Two basic parallel programming models:

Message-passing
The application consists of several processes running on different nodes and communicating with each other over the network

Used when the data are too large to fit on a single node, and simple synchronization is adequate

“Coarse parallelism” or “SPMD”

Implemented using MPI (Message Passing Interface) libraries

Multi-threaded
The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives

Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable

“Fine-grained parallelism” or “shared-memory parallelism”

Implemented using OpenMP (Open Multi-Processing) compilers and libraries

Both models can also be combined in a single hybrid application
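As a concrete sketch of the difference (hello_mpi.c and hello_omp.c are hypothetical sources; the usual MPI and GNU toolchains are assumed):

mpicc hello_mpi.c -o hello_mpi          # message-passing: processes + MPI library calls
mpirun -np 12 ./hello_mpi               # 12 processes, possibly spread across nodes

gcc -fopenmp hello_omp.c -o hello_omp   # multi-threaded: OpenMP pragmas in one program
OMP_NUM_THREADS=12 ./hello_omp          # 12 threads sharing one process's memory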

Using Flux

Three basic requirements:

A Flux login account
A Flux allocation
An MToken (or a Software Token)

Logging in to Flux:
ssh flux-login.engin.umich.edu
Works from campus wired networks or MWireless; otherwise:

VPN

ssh login.itd.umich.edu first
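If you plan to use the X11-based tools covered later (xterm, DDT, MAP), enable X forwarding when you log in; uniqname below is a placeholder for your own login name:

ssh -Y uniqname@flux-login.engin.umich.edu   # -Y (or -X) forwards X11 for GUI tools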

Cluster batch workflow

You create a batch script and submit it to PBS

PBS schedules your job, and it enters the flux queue

When its turn arrives, your job will execute the batch script

Your script has access to all Flux applications and data

When your script completes, anything it wrote to standard output and standard error is saved in files in your submission directory

You can ask that email be sent to you when your job starts, ends, or aborts

You can check on the status of your job at any time, or delete it if it's not doing what you want

A short time after your job completes, it disappears from PBS
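A minimal session illustrating this workflow (the script name and job ID are hypothetical):

qsub myjob.pbs      # submit; PBS prints the new job ID, e.g. 12345
qstat -u $USER      # check on the status of your jobs
qdel 12345          # delete the job if it's not doing what you want
# after completion, output appears in myjob.o12345 (and myjob.e12345 unless joined with -j oe)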

Loosely-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=12,pmem=1gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpirun ./c_ex01

Tightly-coupled batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:ppn=12,mem=4gb,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cd $PBS_O_WORKDIR
matlab -nodisplay -r script

GPU batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l nodes=1:gpus=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your code goes below:
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
matlab -nodisplay -r gpuscript

Copying data

Three ways to copy data to/from Flux:

Use scp from the login server:
scp flux-login.engin.umich.edu:hpc201cja/example563.png .

Use scp from the transfer host:
scp flux-xfer.engin.umich.edu:hpc201cja/example563.png .

Use Globus Online

Globus Online

Features:

High-speed data transfer, much faster than SCP or SFTP

Reliable & persistent

Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints
Gateways through which data flow

Exist for XSEDE, OSG, …

UMich: umich#flux, umich#nyx

Add your own client endpoint!

Add your own server endpoint: contact [email protected]

More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

Job Arrays

• Submit copies of identical jobs
• Invoked via qsub -t:

qsub -t array-spec pbsbatch.txt

Where array-spec can be

m-n

a,b,c

m-n%slotlimit

e.g.

qsub -t 1-50%10
Fifty jobs, numbered 1 through 50; only ten can run simultaneously

• $PBS_ARRAYID records array identifier
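A sketch of how a batch script might use it; the program name and input-file naming are hypothetical:

# inside pbsbatch.txt: each array member works on its own input file
cd $PBS_O_WORKDIR
./analyze input.$PBS_ARRAYID > output.$PBS_ARRAYID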

Dependent scheduling

• Submit job to become eligible for execution at a given time

• Invoked via qsub -a:

qsub -a [[[[CC]YY]MM]DD]hhmm[.SS] …

qsub -a 201412312359 j1.pbs
j1.pbs becomes eligible one minute before New Year's Day 2015

qsub -a 1800 j2.pbs
j2.pbs becomes eligible at six PM today (or tomorrow, if submitted after six PM)

Dependent scheduling

• Submit job to run after specified job(s)
• Invoked via qsub -W:

qsub -W depend=type:jobid[:jobid]…

Where type can be:
after       Schedule this job after jobids have started
afterany    Schedule this job after jobids have finished
afterok     Schedule this job after jobids have finished with no errors
afternotok  Schedule this job after jobids have finished with errors

qsub first.pbs                             # assume it receives jobid 12345
qsub -W depend=afterany:12345 second.pbs
Schedules second.pbs after first.pbs completes
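In practice you rarely know the job ID in advance; since qsub prints the new job's ID on standard output, the dependency can be wired up in a script (script names hypothetical):

first=$(qsub first.pbs)                    # capture the job ID that qsub prints
qsub -W depend=afterok:$first second.pbs   # second.pbs runs only if first.pbs succeeds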

Dependent scheduling

• Submit job to run before specified job(s)
• Requires the dependent jobs to be scheduled first
• Invoked via qsub -W:

qsub -W depend=type:jobid[:jobid]…

Where type can be:
before       jobids scheduled after this job starts
beforeany    jobids scheduled after this job completes
beforeok     jobids scheduled after this job completes with no errors
beforenotok  jobids scheduled after this job completes with errors
on:N         wait for N job completions

qsub -W depend=on:1 second.pbs             # assume it receives jobid 12345
qsub -W depend=beforeany:12345 first.pbs
Schedules second.pbs after first.pbs completes

Troubleshooting

showq [-r][-i][-b][-w user=uniq]  # running/idle/blocked jobs

qstat -f jobno # full info incl gpu

qstat -n jobno # nodes/cores where job running

diagnose -p # job prio and components

pbsnodes # nodes, states, properties

pbsnodes -l # list nodes marked down

checkjob [-v] jobno # why job jobno not running

mdiag -a # allocs & users (flux)

freenodes # aggregate node/core busy/free

mdiag -u uniq # allocs for uniq (flux)

mdiag -a alloc_flux # cores active, alloc (flux)
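For example, a typical sequence for diagnosing a job stuck in the queue (job ID hypothetical) might be:

qstat -f 12345       # what the job asked for, and its current state
checkjob -v 12345    # the scheduler's explanation of why it isn't running
freenodes            # whether enough cores are actually free to satisfy it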

Scientific applications

Scientific Applications

R (incl snow and multicore)

R with GPU (GpuLm, dist)

SAS, Stata

Python, SciPy, NumPy, BioPy

MATLAB with GPU

CUDA Overview

CUDA C (matrix multiply)

Python

Python software available on Flux:

EPD
The Enthought Python Distribution provides scientists with a comprehensive set of tools to perform rigorous data analysis and visualization.
https://www.enthought.com/products/epd/

biopython
Python tools for computational molecular biology
http://biopython.org/wiki/Main_Page

numpy
Fundamental package for scientific computing
http://www.numpy.org/

scipy
Python-based ecosystem of open-source software for mathematics, science, and engineering
http://www.scipy.org/
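To use these, load a Python module first; the module name below is an assumption, so check what module avail reports on your cluster:

module avail python 2>&1 | less    # see which Python builds are installed
module load python                 # assumed module name; adjust to what's listed
python -c 'import numpy; print(numpy.__version__)'   # verify numpy is importable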

Debugging & profiling

Debugging with GDB

Command-line debugger

Start programs or attach to running programs
Display source program lines
Display and change variables or memory
Plant breakpoints, watchpoints
Examine stack frames

Excellent tutorial documentation:
http://www.gnu.org/s/gdb/documentation/

Compiling for GDB

Debugging is easier if you ask the compiler to generate extra source-level debugging information

Add the -g flag to your compilation:
icc -g serialprogram.c -o serialprogram
or
mpicc -g mpiprogram.c -o mpiprogram

GDB will work without symbols, but then you need to be fluent in machine instructions and hexadecimal

Be careful using -O with -g

Some compilers won't optimize code when debugging

Most will, but you sometimes won't recognize the resulting source code at optimization level -O2 and higher

Use -O0 -g to suppress optimization

Running GDB

Two ways to invoke GDB:

Debugging a serial program:
gdb ./serialprogram

Debugging an MPI program:
mpirun -np N xterm -e gdb ./mpiprogram

This gives you N separate GDB sessions, each debugging one rank of the program

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start xterms there

Useful GDB commands

gdb exec              start gdb on executable exec
gdb exec core         start gdb on executable exec with core file core
l [m,n]               list source lines
disas                 disassemble function enclosing current instruction
disas func            disassemble function func
b func                set breakpoint at entry to func
b line#               set breakpoint at source line#
b *0xaddr             set breakpoint at address addr
i b                   show breakpoints
d bp#                 delete breakpoint bp#
r [args]              run program with optional args
bt                    show stack backtrace
c                     continue execution from breakpoint
step                  single-step one source line
next                  single-step, don't step into functions
stepi                 single-step one instruction
p var                 display contents of variable var
p *var                display value pointed to by var
p &var                display address of var
p arr[idx]            display element idx of array arr
x 0xaddr              display hex word at addr
x *0xaddr             display hex word pointed to by addr
x/20x 0xaddr          display 20 words in hex starting at addr
i r                   display registers
i r ebp               display register ebp
set var = expression  set variable var to expression
q                     quit gdb
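As a quick worked example stringing several of these together non-interactively (the program name is hypothetical):

gdb --batch -q ./serialprogram -ex 'b main' -ex run -ex bt -ex continue
# plants a breakpoint at main, runs, prints the stack, then continues to completion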

Debugging with DDT

Allinea's Distributed Debugging Tool is a comprehensive graphical debugger designed for the complex task of debugging parallel code

Advantages include:

Provides a GUI interface to debugging
Similar capabilities as, e.g., Eclipse or Visual Studio
Supports parallel debugging of MPI programs
Scales much better than GDB

Running DDT

Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram

Load the DDT module:
module load ddt

Start DDT:
ddt mpiprogram

This starts a DDT session, debugging all ranks concurrently

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start DDT there

http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/

http://content.allinea.com/downloads/userguide.pdf

Application Profiling with MAP

Allinea’s MAP Tool is a statistical application profiler designed for the complex task of profiling parallel code

Advantages include:

Provides a GUI interface to profiling

Observe cumulative results, drill down for details

Supports parallel profiling of MPI programs

Handles most of the details under the covers

Running MAP

Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram

Load the MAP module:
module load ddt

Start MAP:
map mpiprogram

This starts a MAP session:
Runs your program, gathers profile data, and displays summary statistics

Remember to use the -X or -Y option to ssh when connecting to Flux, or you can't start MAP there

http://content.allinea.com/downloads/userguide.pdf

Resources

http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/
Supported Flux software

http://arc.research.umich.edu/resources-services/flux/
U-M Advanced Research Computing Flux pages

http://cac.engin.umich.edu/
CAEN HPC Flux pages

http://www.youtube.com/user/UMCoECAC
CAEN HPC YouTube channel

For assistance: [email protected]
Read by a team of people including unit support staff
They cannot help with programming questions, but can help with operational Flux and basic usage questions

References

1. Supported Flux software, http://arc.research.umich.edu/flux-and-other-hpc-resources/flux/software-library/ (accessed June 2014).

2. Free Software Foundation, Inc., "GDB User Manual," http://www.gnu.org/s/gdb/documentation/ (accessed June 2014).

3. Intel C and C++ Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_c (accessed June 2014).

4. Intel Fortran Compiler 14 User and Reference Guide, https://software.intel.com/en-us/compiler_14.0_ug_f (accessed June 2014).

5. Torque Administrator's Guide, http://docs.adaptivecomputing.com/torque/4-2-8/torqueAdminGuide-4.2.8.pdf (accessed June 2014).

6. Submitting GPGPU Jobs, https://sites.google.com/a/umich.edu/engin-cac/resources/systems/flux/gpgpus (accessed June 2014).
