High Performance Computing Workshop (Statistics) HPC 101
Dr. Charles J Antonelli, LSAIT ARS
January, 2013

TRANSCRIPT

Page 1:

High Performance Computing Workshop (Statistics)
HPC 101

Dr. Charles J Antonelli
LSAIT ARS

January, 2013

Page 2:

Credits

Contributors:

Brock Palen (CoE-IT CAC)

Jeremy Hallum (MSIS)

Tony Markel (MSIS)

Bennet Fauber (CoE-IT CAC)

LSAIT ARS

UM CoE-IT CAC

Page 3:

Roadmap

Flux Mechanics

High Performance Computing

Flux Architecture

Flux Batch Operations

Introduction to Scheduling

Page 4:

Flux Mechanics

Page 5:

Using Flux

Three basic requirements to use Flux:

1. A Flux account
2. A Flux allocation
3. An MToken (or a Software Token)

Page 6:

Using Flux: 1. A Flux account

Allows login to the Flux login nodes

Develop, compile, and test code

Available to members of U-M community, free

Get an account by visiting https://www.engin.umich.edu/form/cacaccountapplication

Page 7:

Using Flux: 2. A Flux allocation

Allows you to run jobs on the compute nodes

Current rates: $18 per core-month for Standard Flux

$24.35 per core-month for BigMem Flux

$8 subsidy per core month for LSA and Engineering

Details at http://www.engin.umich.edu/caen/hpc/planning/costing.html

To inquire about Flux allocations please email [email protected]

Page 8:

Using Flux: 3. An MToken (or a Software Token)

Required for access to the login nodes. Improves cluster security by requiring a second means of proving your identity.

You can use either an MToken or an application for your mobile device (called a Software Token) for this

Information on obtaining and using these tokens at http://cac.engin.umich.edu/resources/loginnodes/twofactor.html

Page 9:

Logging in to Flux

ssh flux-login.engin.umich.edu

MToken (or Software Token) required

You will be randomly connected to a Flux login node (currently flux-login1 or flux-login2)

Firewalls restrict access to flux-login. To connect successfully, either

Physically connect your ssh client platform to the U-M campus wired network, or

Use VPN software on your client platform, or

Use ssh to login to an ITS login node, and ssh to flux-login from there

Page 10:

Modules

The module command allows you to specify what versions of software you want to use:
module list -- Show loaded modules
module load name -- Load module name for use
module avail -- Show all available modules
module avail name -- Show versions of module name*
module unload name -- Unload module name
module -- List all options
Enter these commands at any time during your session.
A configuration file allows default module commands to be executed at login.

Put module commands in the file ~/privatemodules/default. Don’t put module commands in your .bashrc / .bash_profile.

Page 11:

Flux environment

The Flux login nodes have the standard GNU/Linux toolkit:

make, autoconf, awk, sed, perl, python, java, emacs, vi, nano, …

Watch out for source code or data files written on non-Linux systems

Use these tools to analyze and convert source files to Linux format: file, dos2unix, mac2unix

Page 12:

Lab 1
Task: Invoke R interactively on the login node

module load R
module list

R
q()

Please run only very small computations on the Flux login nodes, e.g., for testing
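For instance, a suitably small interactive test might be (illustrative only, not part of the workshop materials):

x <- rnorm(1000)    # small simulated sample
summary(x)          # confirms R and the module are working
q()                 # quit when done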

Page 13:

Lab 2
Task: Run R in batch mode

module load R

Copy sample code to your login directory:
cd
cp ~cja/stats-sample-code.tar.gz .
tar -zxvf stats-sample-code.tar.gz
cd ./stats-sample-code

Examine lab2.pbs and lab2.R

Edit lab2.pbs with your favorite Linux editor. Change the #PBS -M email address to your own.

Page 14:

Lab 2
Task: Run R in batch mode

Submit your job to Flux:
qsub lab2.pbs

Watch the progress of your job:
qstat -u uniqname
where uniqname is your own uniqname

When complete, look at the job’s output:
less lab2.out
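The real lab2.R ships in the sample-code tarball; as a rough idea of the shape of a serial R batch job, a hypothetical stand-in (not the actual file) might be:

# hypothetical serial example in the spirit of lab2.R
set.seed(1)
fit <- lm(dist ~ speed, data = cars)   # small built-in dataset
print(summary(fit))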

Page 15:

Lab 3
Task: Use the multicore package in R

The multicore package allows you to use multiple cores on a single node.
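As a rough illustration of the idea (a minimal sketch, not the workshop’s lab3.R; mclapply comes from the multicore package, which newer R versions fold into the built-in parallel package):

library(multicore)                            # on newer R: library(parallel)
f <- function(i) mean(rnorm(1e6, mean = i))   # an independent task
res <- mclapply(1:8, f, mc.cores = 4)         # run up to 4 tasks at once on one node
print(unlist(res))

Because the workers are forked from the same process, this only scales within a single node, which is why snow (Lab 5) is needed to span nodes.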

module load R
cd ~/stats-sample-code

Examine lab3.pbs and lab3.R

Edit lab3.pbs with your favorite Linux editor. Change the #PBS -M email address to your own.

Page 16:

Lab 3
Task: Use the multicore package in R

Submit your job to Flux:
qsub lab3.pbs

Watch the progress of your job:
qstat -u uniqname
where uniqname is your own uniqname

When complete, look at the job’s output:
less lab3.out

Page 17:

Lab 4
Task: Another multicore example in R

module load R
cd ~/stats-sample-code

Examine lab4.pbs and lab4.R

Edit lab4.pbs with your favorite Linux editor. Change the #PBS -M email address to your own.
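Since lab4.R is another multicore example, here is a second illustration of the same package, a hypothetical parallel bootstrap (not the workshop’s file):

library(multicore)                      # library(parallel) on newer R
obs <- rnorm(500, mean = 2)             # pretend these are the observed data
boot_mean <- function(i) mean(sample(obs, replace = TRUE))
reps <- mclapply(1:2000, boot_mean, mc.cores = 4)
print(quantile(unlist(reps), c(0.025, 0.975)))   # bootstrap 95% interval for the mean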

Page 18:

Lab 4
Task: Another multicore example in R

Submit your job to Flux:
qsub lab4.pbs

Watch the progress of your job:
qstat -u uniqname
where uniqname is your own uniqname

When complete, look at the job’s output:
less lab4.out

Page 19:

Lab 5
Task: Run snow interactively in R

The snow package allows you to use cores on multiple nodes.
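A minimal sketch of the snow pattern (not the workshop’s lab5.R; it assumes snow was built with MPI support via Rmpi, and the worker count matches the three cores requested below):

library(snow)
cl <- makeCluster(3, type = "MPI")             # one worker per core granted by PBS
res <- clusterApply(cl, 1:3, function(i) sum(rnorm(1e6) + i))
print(unlist(res))
stopCluster(cl)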

module load R
cd ~/stats-sample-code

Examine lab5.R

Start an interactive PBS session:
qsub -I -V -l procs=3 -l walltime=30:00 -A stats_flux -l qos=flux -q flux

Page 20:

Lab 5
Task: Run snow interactively in R

cd $PBS_O_WORKDIR

Run snow in the interactive PBS session:
R CMD BATCH --vanilla lab5.R lab5.out
… ignore any “Connection to lifeline lost” message

Page 21:

Lab 6
Task: Run snowfall in R

The snowfall package is similar to snow, and allows you to change the number of cores used without modifying your R code.
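A minimal sketch of the snowfall idiom (not the workshop’s lab6.R): if sfInit() is called with no arguments, snowfall can pick up the cpu count from command-line options instead, which is how the core count changes without touching the R code.

library(snowfall)
sfInit(parallel = TRUE, cpus = 4)    # or sfInit() plus --parallel --cpus=4 on the command line
res <- sfLapply(1:8, function(i) mean(rnorm(1e6, mean = i)))
print(unlist(res))
sfStop()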

module load R
cd ~/stats-sample-code

Examine lab6.pbs and lab6.R

Edit lab6.pbs with your favorite Linux editor. Change the #PBS -M email address to your own.

Page 22:

Lab 6
Task: Run snowfall in R

Submit your job to Flux:
qsub lab6.pbs

Watch the progress of your job:
qstat -u uniqname
where uniqname is your own uniqname

When complete, look at the job’s output:
less lab6.out

Page 23:

Lab 7
Task: Run parallel MATLAB

Distribute parfor iterations over multiple cores on multiple nodes

Do this once:
mkdir ~/matlab/
cd ~/matlab
wget http://cac.engin.umich.edu/resources/software/matlabdct/mpiLibConf.m

Page 24:

Lab 7
Task: Run parallel MATLAB

Start an interactive PBS session:
module load matlab
qsub -I -V -l nodes=2:ppn=3 -l walltime=30:00 -A stats_flux -l qos=flux -q flux

Start MATLAB:
matlab -nodisplay

Page 25:

Lab 7
Task: Run parallel MATLAB

Set up a matlabpool:
sched = findResource('scheduler', 'type', 'mpiexec')
set(sched, 'MpiexecFileName', '/home/software/rhel6/mpiexec/bin/mpiexec')
set(sched, 'EnvironmentSetMethod', 'setenv')
% use the 'sched' object when calling matlabpool
% the syntax for matlabpool must use the (sched, N) format
matlabpool(sched, 6)

… ignore “Found pre-existing parallel job(s)” warnings

Page 26:

Lab 7
Task: Run parallel MATLAB

Run a simple parfor:
tic
x = 0;
parfor i = 1:100000000
  x = x + i;
end
toc

Close the matlabpool:
matlabpool close

Page 27:

Compiling Code

Assuming default module settings

Use mpicc/mpiCC/mpif90 for MPI code

Use icc/icpc/ifort with -openmp for OpenMP code

Serial code, Fortran 90:
ifort -O3 -ipo -no-prec-div -xHost -o prog prog.f90

Serial code, C:
icc -O3 -ipo -no-prec-div -xHost -o prog prog.c

MPI parallel code:
mpicc -O3 -ipo -no-prec-div -xHost -o prog prog.c
mpirun -np 2 ./prog

Page 28:

Lab
Task: compile and execute simple programs on the Flux login node

Copy sample code to your login directory:
cd
cp ~brockp/cac-intro-code.tar.gz .
tar -xvzf cac-intro-code.tar.gz
cd ./cac-intro-code

Examine, compile & execute helloworld.f90:
ifort -O3 -ipo -no-prec-div -xHost -o f90hello helloworld.f90
./f90hello

Examine, compile & execute helloworld.c:
icc -O3 -ipo -no-prec-div -xHost -o chello helloworld.c
./chello

Examine, compile & execute MPI parallel code:
mpicc -O3 -ipo -no-prec-div -xHost -o c_ex01 c_ex01.c
… ignore the “feupdateenv is not implemented and will always fail” warning
mpirun -np 2 ./c_ex01
… ignore runtime complaints about missing NICs

Page 29:

Makefiles

The make command automates your code compilation process. It uses a makefile to specify dependencies between source and object files. The sample directory contains a sample makefile.
To compile c_ex01: make c_ex01
To compile all programs in the directory: make
To remove all compiled programs: make clean
To make all the programs using 8 compiles in parallel: make -j8

Page 30:

High Performance Computing

Page 31:

Advantages of HPC

Cheaper than the mainframe

More scalable than your laptop

Buy or rent only what you need

COTS (commercial off-the-shelf) hardware

COTS software

COTS expertise

Page 32:

Disadvantages of HPC

Serial applications

Tightly-coupled applications

Truly massive I/O or memory requirements

Difficulty/impossibility of porting software

No COTS expertise

Page 33:

Programming Models

Two basic parallel programming models:

Message-passing
The application consists of several processes running on different nodes and communicating with each other over the network

Used when the data are too large to fit on a single node, and simple synchronization is adequate

“Coarse parallelism”

Implemented using MPI (Message Passing Interface) libraries

Multi-threaded
The application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives

Used when the data can fit into a single process, and the communications overhead of the message-passing model is intolerable

“Fine-grained parallelism” or “shared-memory parallelism”

Implemented using OpenMP (Open Multi-Processing) compilers and libraries

Both models can also be combined in the same application

Page 34:

Good parallel

Embarrassingly parallel
Folding@home, RSA Challenges, password cracking, …

Regular structures
Divide & conquer, e.g., Quicksort

Pipelined: N-body problems, matrix multiplication
O(n^2) -> O(n)

Page 35:

Less good parallel

Serial algorithms
Those that don’t parallelize easily

Irregular data & communications structures
E.g., surface/subsurface water hydrology modeling

Tightly-coupled algorithms

Unbalanced algorithms
Master/worker algorithms, where the worker load is uneven

Page 36:

Amdahl’s Law

If you enhance a fraction f of a computation by a speedup S, the overall speedup is:

Speedup(f, S) = 1 / ((1 - f) + f / S)
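For example, parallelizing 90% of a computation perfectly across 10 cores (f = 0.9, S = 10) gives an overall speedup of 1 / (0.1 + 0.9/10) = 1 / 0.19 ≈ 5.3, far less than 10, because the remaining serial 10% quickly dominates.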

Page 37:

Amdahl’s Law

Page 38:

Flux Architecture

Page 39:

The Flux cluster
[Diagram: login nodes, compute nodes, storage, and a data transfer node]

Page 40:

Behind the curtain
[Diagram: login nodes, compute nodes, storage, and the data transfer node; the flux and nyx clusters share infrastructure]

Page 41:

A Flux node

12 Intel cores

48 GB RAM

Local disk

Ethernet InfiniBand

Page 42:

A Newer Flux node

16 Intel cores

64 GB RAM

Local disk

Ethernet InfiniBand

Page 43:

A Flux BigMem node

40 Intel cores

1 TB RAM

Local disk

Ethernet InfiniBand

Page 44:

Flux hardware (January 2012)

Standard Flux: 8,016 Intel cores, 632 nodes
48 GB RAM/node, 4 GB RAM/core (average)

BigMem Flux: 200 Intel cores, 5 nodes
1 TB RAM/node, 25 GB RAM/core

4X InfiniBand network (interconnects all nodes)
40 Gbps, <2 µs latency
Latency an order of magnitude less than Ethernet

Lustre Filesystem
Scalable, high-performance, open
Supports MPI-IO for MPI jobs
Mounted on all login and compute nodes

Page 45:

Flux software

Default Software:

Intel Compilers with OpenMPI for Fortran and C

Optional software:
PGI Compilers

Unix/GNU tools

gcc/g++/gfortran

Licensed software:
Abaqus, ANSYS, Mathematica, Matlab, R, STATA SE, …

See http://cac.engin.umich.edu/resources/software/index.html

You can choose software using the module command

Page 46:

Flux networkAll Flux nodes are interconnected via Infiniband and a campus-wide private Ethernet network

The Flux login nodes are also connected to the campus backbone network

The Flux data transfer node will soon be connected over a 10 Gbps connection to the campus backbone network

This means:
The Flux login nodes can access the Internet

The Flux compute nodes cannot

If Infiniband is not available for a compute node, code on that node will fall back to Ethernet communications

Page 47:

Flux data

Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes

342TB of short-term storage for batch jobs

Large, fast, short-term

NFS filesystems mounted on /home and /home2 on all nodes

40 GB of storage per user for development & testing

Small, slow, long-term

Page 48:

Flux data

Flux does not provide large, long-term storage

Alternatives:
ITS Value Storage

Departmental server

CAEN can mount your storage on the login nodes

Issue the df -kh command on a login node to see what other groups have mounted

Page 49:

Globus Online

Features

High-speed data transfer, much faster than SCP or SFTP

Reliable & persistent

Minimal client software: Mac OS X, Linux, Windows

GridFTP Endpoints
Gateways through which data flow

Exist for XSEDE, OSG, …

UMich: umich#flux, umich#nyx

Add your own server endpoint: contact flux-support

Add your own client endpoint!

More information: http://cac.engin.umich.edu/resources/loginnodes/globus.html

Page 50:

Flux Batch Operations

Page 51:

Portable Batch System

All production runs are run on the compute nodes using the Portable Batch System (PBS)

PBS manages all aspects of cluster job execution except job scheduling

Flux uses the Torque implementation of PBS

Flux uses the Moab scheduler for job scheduling

Torque and Moab work together to control access to the compute nodes

PBS puts jobs into queues
Flux has a single queue, named flux

Page 52:

Cluster workflow

You create a batch script and submit it to PBS

PBS schedules your job, and it enters the flux queue

When its turn arrives, your job will execute the batch script

Your script has access to any applications or data stored on the Flux cluster

When your job completes, anything it sent to standard output and standard error is saved and returned to you

You can check on the status of your job at any time, or delete it if it’s not doing what you want

A short time after your job completes, it disappears

Page 53:

Sample serial script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your Code Goes Below:
cd $PBS_O_WORKDIR
./f90hello

Page 54:

Sample batch script

#PBS -N yourjobname
#PBS -V
#PBS -A youralloc_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=16,walltime=00:05:00
#PBS -M youremailaddress
#PBS -m abe
#PBS -j oe

# Your Code Goes Below:
cat $PBS_NODEFILE          # lists the node(s) your job ran on
cd $PBS_O_WORKDIR          # change to the submission directory
mpirun ./c_ex01            # no need to specify -np

Page 55:

Basic batch commands

Once you have a script, submit it:
qsub scriptfile

$ qsub singlenode.pbs
6023521.nyx.engin.umich.edu

You can check on the job status:
qstat jobid

$ qstat 6023521
nyx.engin.umich.edu:
                                                           Req'd  Req'd   Elap
Job ID           Username Queue    Jobname  SessID NDS TSK Memory Time  S Time
---------------- -------- -------- -------- ------ --- --- ------ ----- - -----
6023521.nyx.engi cja      flux     hpc101i  --     1   1   --     00:05 Q --

To delete your job:
qdel jobid

$ qdel 6023521
$

Page 56:

Lab
Task: Run an MPI job on 8 cores

Compile c_ex05:
cd ~/cac-intro-code
make c_ex05

Edit the file run with your favorite Linux editor
Change #PBS -M address to your own
I don’t want Brock to get your email!
Change #PBS -A allocation to stats_flux, or to your own allocation, if desired
Change #PBS -l allocation to flux

Submit your job:
qsub run

Page 57:

PBS attributes

As always, man qsub is your friend

-N : sets the job name, can’t start with a number
-V : copy shell environment to compute node
-A youralloc_flux : sets the allocation you are using
-l qos=flux : sets the quality of service parameter
-q flux : sets the queue you are submitting to
-l : requests resources, like number of cores or nodes
-M : whom to email, can be multiple addresses
-m : when to email: a=job abort, b=job begin, e=job end
-j oe : join STDOUT and STDERR to a common file
-I : allow interactive use
-X : allow X GUI use

Page 58:

PBS resources (1)

A resource (-l) can specify:

Request wallclock (that is, running) time
-l walltime=HH:MM:SS

Request C MB of memory per core
-l pmem=Cmb

Request T MB of memory for the entire job
-l mem=Tmb

Request M cores on arbitrary node(s)
-l procs=M

Request a token to use licensed software
-l gres=stata:1
-l gres=matlab
-l gres=matlab%Communication_toolbox

Page 59:

PBS resources (2)

A resource (-l) can specify:

For multithreaded code:
Request M nodes with at least N cores per node
-l nodes=M:ppn=N

Request M nodes with exactly N cores per node
-l nodes=M:tpn=N
(you’ll only use this for specific algorithms)

Page 60:

Interactive jobs

You can submit jobs interactively:
qsub -I -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux

This queues a job as usual
Your terminal session will be blocked until the job runs

When it runs, you will be connected to one of your nodes

Invoked serial commands will run on that node

Invoked parallel commands (e.g., via mpirun) will run on all of your nodes

When you exit the terminal session your job is deleted

Interactive jobs allow you to:
Test your code on cluster node(s)

Execute GUI tools on a cluster node with output on your local platform’s X server

Utilize a parallel debugger interactively

Page 61:

Lab
Task: Run an interactive job

Enter this command (all on one line):
qsub -I -V -l procs=2 -l walltime=15:00 -A FluxTraining_flux -l qos=flux -q flux

When your job starts, you’ll get an interactive shell

Copy and paste the batch commands from the “run” file, one at a time, into this shell

Experiment with other commands

After fifteen minutes, your interactive shell will be killed

Page 62:

Introduction to Scheduling

Page 63:

The Scheduler (1/3)

Flux scheduling policies:

The job’s queue determines the set of nodes you run on

The job’s account and qos determine the allocation to be charged

If you specify an inactive allocation, your job will never run

The job’s resource requirements help determine when the job becomes eligible to run

If you ask for unavailable resources, your job will wait until they become free

There is no pre-emption

Page 64:

The Scheduler (2/3)

Flux scheduling policies:

If there is competition for resources among eligible jobs in the allocation or in the cluster, two things help determine when you run:

How long you have waited for the resource

How much of the resource you have used so far

This is called “fairshare”

The scheduler will reserve nodes for a job with sufficient priority

This is intended to prevent jobs with large resource requirements from being starved

Page 65:

The Scheduler (3/3)

Flux scheduling policies:

If there is room for shorter jobs in the gaps of the schedule, the scheduler will fit smaller jobs in those gaps

This is called “backfill”

[Diagram: cores vs. time, with smaller jobs backfilled into gaps in the schedule]

Page 66:

Gaining insight

There are several commands you can run to get some insight into the scheduler’s actions:

freenodes : shows the number of free nodes and cores currently available

showq : shows the state of the queue (like qstat -a), except it shows running jobs in order of finishing

mdiag -p -t flux : shows the factors used in computing job priority

checkjob jobid : can show why your job might not be starting

showstart -e all : gives you a coarse estimate of job start time; use the smallest value returned

Page 67:

More advanced scheduling

Job Arrays

Dependent Scheduling

Page 68:

Job Arrays

• Submit copies of identical jobs
• Invoked via qsub -t:
qsub -t array-spec pbsbatch.txt

Where array-spec can be
m-n
a,b,c
m-n%slotlimit

e.g.
qsub -t 1-50%10
Fifty jobs, numbered 1 through 50, only ten can run simultaneously

• $PBS_ARRAYID records the array identifier (see the sketch below for using it from R)
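A hypothetical sketch (not from the slides) of how an R batch script could use the array identifier to select its own piece of work:

# each array element gets its own task number via the environment
task <- as.integer(Sys.getenv("PBS_ARRAYID", unset = "1"))
set.seed(task)                                   # distinct seed per element
res  <- mean(rnorm(1e6, mean = task))
write.csv(data.frame(task = task, result = res),
          file = sprintf("result-%03d.csv", task), row.names = FALSE)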

Page 69:

Dependent scheduling

• Submit jobs whose execution scheduling depends on other jobs

• Invoked via qsub -W:
qsub -W depend=type:jobid[:jobid]…

Where type can be

after : Schedule after jobids have started

afterok : Schedule after jobids have finished, only if no errors

afternotok : Schedule after jobids have finished, only if errors

afterany : Schedule after jobids have finished, regardless of status

before, beforeok, beforenotok, beforeany

Page 70:

Dependent scheduling

Where type can be (cont’d)

before : When this job has started, the listed jobids will be scheduled

beforeok : After this job completes without errors, the listed jobids will be scheduled

beforenotok : After this job terminates with errors, the listed jobids will be scheduled

beforeany : After this job completes, regardless of status, the listed jobids will be scheduled

Page 71:

Flux On-Demand Pilot

Alternative to a static allocation
Pay only for the core time you use

Pros
Accommodates “bursty” usage patterns

Cons
Limit of 50 cores total
Limit of 25 cores for any user

The FoD pilot has ended
FoD pilot users continue to be supported
An FoD service is being defined

To inquire about FoD allocations please email [email protected]

Page 72:

Flux Resources

http://www.youtube.com/user/UMCoECAC
UMCoECAC’s YouTube channel

http://orci.research.umich.edu/resources-services/flux/
U-M Office of Research Cyberinfrastructure Flux summary page

http://cac.engin.umich.edu/
Getting an account, basic overview (use menu on left to drill down)

http://cac.engin.umich.edu/started
How to get started at the CAC, plus cluster news, RSS feed and outages

http://www.engin.umich.edu/caen/hpc
XSEDE information, Flux in grant applications, startup & retention offers

http://cac.engin.umich.edu/ Resources | Systems | Flux | PBS
Detailed PBS information for Flux use

For assistance: [email protected]

Read by a team of people

Cannot help with programming questions, but can help with operational Flux and basic usage questions

Page 73:

Summary

The Flux cluster is just a collection of similar Linux machines connected together to run your code, much faster than your desktop can

Command-line scripts are queued by a batch system and executed when resources become available

Some important commands are:
qsub
qstat -u username
qdel jobid
checkjob

Develop and test, then submit your jobs in bulk and let the scheduler optimize their execution

Page 74:

Any Questions?

Charles J. Antonelli
LSAIT Research Systems Group
[email protected]
http://www.umich.edu/~cja
734 763 0607

Page 75:

References

1. http://cac.engin.umich.edu/resources/software/R.html
2. http://cac.engin.umich.edu/resources/software/snow.html
3. http://cac.engin.umich.edu/resources/software/matlab.html
4. CAC supported Flux software, http://cac.engin.umich.edu/resources/software/index.html (accessed August 2011).
5. J. L. Gustafson, “Reevaluating Amdahl’s Law,” chapter for the book Supercomputers and Artificial Intelligence, edited by Kai Hwang, 1988. http://www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html (accessed November 2011).
6. Mark D. Hill and Michael R. Marty, “Amdahl’s Law in the Multicore Era,” IEEE Computer, vol. 41, no. 7, pp. 33-38, July 2008. http://research.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf (accessed November 2011).
7. InfiniBand, http://en.wikipedia.org/wiki/InfiniBand (accessed August 2011).
8. Intel C and C++ Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/index.htm (accessed August 2011).
9. Intel Fortran Compiler 11.1 User and Reference Guide, http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm (accessed August 2011).
10. Lustre file system, http://wiki.lustre.org/index.php/Main_Page (accessed August 2011).
11. Torque User’s Manual, http://www.clusterresources.com/torquedocs21/usersmanual.shtml (accessed August 2011).
12. Jurg van Vliet and Flavia Paganelli, Programming Amazon EC2, O’Reilly Media, 2011. ISBN 978-1-449-39368-7.