Until now:
- access the cluster
- copy data to/from the cluster
- create parallel software
- compile code and use optimized libraries
- how to run the software on the full cluster
tl;dr:
- submit a job to the scheduler
Job scheduler/Resource manager:
Piece of software which:
● manages and allocates resources;
● manages and schedules jobs;
● sets up the environment for parallel and distributed computing.
Two computers are available for 10h
You go, then you go. You wait.
Slurm
Free and open-source
Mature
Very active community
Many success stories
Runs 50% of TOP10 systems, including 1st
Also an intergalactic soft drink
You will learn how to:
● Create a job
● Monitor the jobs
● Control your own job
● Get job accounting info
with
1. Make up your mind
● resources you need, e.g. 1 core, 2GB RAM for 1 hour (job parameters);
● operations you need to perform, e.g. launch 'myprog' (job steps).
2. Write a submission script
It is a shell script (Bash), made of:
● Regular Bash comments
● #SBATCH lines: Bash sees these as comments, Slurm takes them as commands
● Job step creation (srun)
● Regular Bash commands
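A minimal sketch of such a script, using the example request from step 1 (1 core, 2GB RAM, 1 hour, launch 'myprog'). The file name submit.sh and the job name demo are placeholders chosen for illustration. The snippet only writes the file and lists its directives, so it can be tried on any machine.

```shell
# Write the submission script to a file (submit.sh is a placeholder name).
cat > submit.sh <<'EOF'
#!/bin/bash
# Regular Bash comment: Bash ignores the directive lines below, Slurm reads them.
#SBATCH --job-name=demo
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2048
#SBATCH --time=01:00:00

# Regular Bash command
echo "Running on $(hostname)"

# Job step creation
srun ./myprog
EOF

# Show the directives Slurm will read (memory is given in megabytes).
grep '^#SBATCH' submit.sh
```

On the cluster, the file would then be handed to the scheduler with `sbatch submit.sh`.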
Other useful parameters
You want                              You ask
To set a job name                     --job-name=MyJobName
To attach a comment to the job        --comment="Some comment"
To get emails                         --mail-type=BEGIN|END|FAIL --mail-user=[email protected]
To set the name of the output file    --output=result-%j.txt --error=error-%j.txt
To delay the start of your job        --begin=16:00, --begin=now+1hour, --begin=2010-01-20T12:34:00
To specify an ordering of your jobs   --dependency=after(ok|notok|any):jobids, --dependency=singleton
To control failure options            --no-kill, --no-requeue, --requeue
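As an illustration of --dependency, the sketch below chains several job scripts so that each one starts only after the previous one succeeded. The function name submit_chain and the script names are hypothetical. It relies on `sbatch --parsable`, which prints only the job id (possibly followed by `;cluster`).

```shell
# submit_chain SCRIPT...
# Submit the scripts in order; each job starts only if the previous one
# completed successfully (afterok). Prints the job id of each submission.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jid=$(sbatch --parsable "$script") || return 1
        else
            jid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        prev=${jid%%;*}   # --parsable may print "jobid;cluster"
        echo "$prev"
    done
}

# Example (on the cluster; the script names are placeholders):
#   submit_chain preprocess.sh compute.sh postprocess.sh
```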
Constraints and resources
You want                                                             You ask
To choose a specific feature (e.g. a processor type or a NIC type)   --constraint
To use a specific resource (e.g. a gpu)                              --gres
To reserve a whole node for yourself                                 --exclusive
To choose a partition                                                --partition
So you can play
Download http://www.cism.ucl.ac.be/Services/Formations/slurm.tgz
with wget and untar it on hmem
Compile the 'stress' program; you can use it to burn CPU time and memory:
./stress --cpu 1 --vm-bytes 128M --timeout 30s
● Write a job script
● Submit a job
● See it running
● Cancel it
● Get it killed
A word about backfill
The rule: a job with a lower priority can start before a job with a higher priority if it does not delay that job's start time.
[Figure: resources versus time; each job is labeled with its priority (60, 100, 80, 70, 10).]
A low-priority job with a short maximum run time and smaller requirements starts before a higher-priority job.
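The backfill rule can be read as a simple test on times. The sketch below is an illustration only, not the scheduler's actual algorithm: it assumes the resources the low-priority job needs are idle right now and that times are whole minutes. The function name can_backfill and its arguments are made up for the example.

```shell
# can_backfill NOW MAX_RUNTIME HIGH_PRIO_START
# Succeeds if a low-priority job started at NOW, running at most MAX_RUNTIME,
# is guaranteed to finish before the higher-priority job's planned start.
can_backfill() {
    now=$1; max_runtime=$2; high_prio_start=$3
    [ $((now + max_runtime)) -le "$high_prio_start" ]
}

# The higher-priority job is planned to start at t=60 minutes.
if can_backfill 0 30 60; then echo "30-minute job: backfilled"; fi
if ! can_backfill 0 90 60; then echo "90-minute job: must wait"; fi
```

The test uses the maximum run time the job requested, which is why a tight but realistic time limit helps a job start earlier.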
4. Monitor your job
● squeue
● sprio
● sstat
● sview
http://www.schedmd.com/slurmdocs/slurm_ug_2011/sview-users-guide.pdf
5. Control your job
● scancel
● scontrol
● sview
http://www.schedmd.com/slurmdocs/slurm_ug_2011/sview-users-guide.pdf
The rules of fairshare
● A share is allocated to you: 1/nbusers
● If your actual usage is above that share, your fairshare value is decreased towards 0.
● If your actual usage is below that share, your fairshare value is increased towards 1.
● The actual usage taken into account decreases over time.
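To make these rules concrete, the sketch below uses F = 2^(-usage/share), which resembles the classic Slurm fair-share formula, together with a half-life decay of past usage. Both formulas, the function names, and the 7-day half-life are assumptions for illustration; the actual configuration of a given cluster may differ.

```shell
# fairshare USAGE NB_USERS
# USAGE is the user's fraction of recent cluster usage (0..1);
# the allocated share is 1/NB_USERS, so usage/share = USAGE * NB_USERS.
fairshare() {
    awk -v u="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 2 ^ (-u * n) }'
}

# decayed_usage USAGE AGE_DAYS HALF_LIFE_DAYS
# Past usage counts for less as it gets older.
decayed_usage() {
    awk -v u="$1" -v t="$2" -v h="$3" 'BEGIN { printf "%.3f\n", u * 0.5 ^ (t / h) }'
}

fairshare 0 3        # no usage at all: 1.000
fairshare 0.5 2      # usage equal to the share: 0.500
fairshare 1 2        # usage twice the share: 0.250
decayed_usage 8 7 7  # 8 core-hours, one half-life later: 4.000
```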
A word about fairshare
● Assume 3 users, 3-cores cluster
● Red uses 1 core for a certain period of time
● Blue uses 2 cores for half that period
● Red uses 2 cores afterwards
[Figure: #nodes versus time]
Summary
● Explore the environment
  ● Get node features (sinfo --Node --long)
  ● Get node usage (sinfo --summarize)
● Submit a job:
  ● Define the resources you need
  ● Determine what the job should do
  ● Submit the job script (sbatch)
  ● View the job status (squeue)
  ● Get accounting information (sacct)
job script
Concurrent - Parallel - Distributed
Master/slave vs SPMD
Synchronous vs asynchronous
Message passing vs shared memory
Typical resource request
You want                                                    You ask
16 independent processes (no communication)                 --ntasks=16
MPI and do not care about where cores are distributed       --ntasks=16
cores spread across distinct nodes                          --ntasks=16 --nodes=16
cores spread across distinct nodes and nobody else around   --ntasks=16 --nodes=16 --exclusive
16 processes to spread across 8 nodes                       --ntasks=16 --ntasks-per-node=2
16 processes on the same node                               --ntasks=16 --ntasks-per-node=16
one process that can use 16 cores for multithreading        --ntasks=1 --cpus-per-task=16
4 processes that can use 4 cores                            --ntasks=4 --cpus-per-task=4
more constraint requests                                    --distribution=block|cyclic|arbitrary
Use case 1: Random sampling
● Your program draws random numbers and processes them sequentially
● Parallelism is obtained by launching the same program multiple times simultaneously
● Every process does the same thing
● No inter process communication
● Results appended to one common file
Use case 1: Random sampling
You want                                      You ask
16 independent processes (no communication)   --ntasks=16
You use: srun ./myprog
Use case 1: Random sampling
You want                                      You ask
16 independent processes (no communication)   --array=1-16 --output=res%a
You merge with: cat res*
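A sketch of what the job-array version could look like. The awk one-liner is only a stand-in for ./myprog, and seeding it with the array index is an assumption made here so that repeated runs of the same task are reproducible. The script falls back to index 1 when run outside Slurm, so it can be tried locally.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --array=1-16
#SBATCH --output=res%a

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be tried outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Stand-in for ./myprog: draw three pseudo-random numbers from a given seed.
# Under Slurm, this output lands in the file res<task_id>.
draw_samples() {
    awk -v seed="$1" 'BEGIN { srand(seed); for (i = 0; i < 3; i++) print rand() }'
}

draw_samples "$task_id"
```

Once all 16 tasks have finished, `cat res* > merged.txt` gathers the results in one file.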
Use case 2: Multiple datafiles
● Your program processes data from one datafile
● Parallelism is obtained by launching the same program multiple times on distinct data files
● Everybody does the same thing on distinct data stored in different files
● No inter process communication
● Results appended to one common file
Use case 2: Multiple datafiles
You want                                      You ask
16 independent processes (no communication)   --ntasks=16
You use: srun ./myprog, with $SLURM_PROCID identifying each process
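srun sets SLURM_PROCID separately in every process it launches, so the variable has to be read inside the process rather than on the srun command line of the batch script. A hedged sketch with a small wrapper follows; the name task.sh and the data<rank> file naming are assumptions for illustration.

```shell
# Write a small per-process wrapper (task.sh is a placeholder name).
cat > task.sh <<'EOF'
#!/bin/bash
# srun sets SLURM_PROCID (0 .. ntasks-1) separately in each process.
rank=${SLURM_PROCID:-0}
echo "process $rank would run: ./myprog data$rank"
EOF
chmod +x task.sh

# Outside Slurm the wrapper behaves like rank 0.
bash task.sh

# In the job script, with --ntasks=16:
#   srun ./task.sh
```

Replacing the echo with the real call gives each of the 16 processes its own data file.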
Use case 2: Multiple datafiles
Useful commands: xargs and find/ls:
Single node:
ls data* | xargs -n1 -P $SLURM_NPROCS myprog
Multiple nodes:
ls data* | xargs -n1 -P $SLURM_NTASKS srun -n1 -c1 myprog
Safer: find . -maxdepth 1 -name "data*" -print0 | xargs -0 -n1 -P ...
Use case 2: Multiple datafiles
You want                                      You ask
16 independent processes (no communication)   --array=1-16
You use: $SLURM_ARRAY_TASK_ID
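One way to turn the array index into a data file is to take the N-th file matching data*, as sketched below. The helper name pick_file is made up, and the sketch assumes plain files whose names contain no newlines. The echo stands in for the real call to ./myprog.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --array=1-16

# pick_file N: print the N-th file matching data* (1-based, sorted by name).
pick_file() {
    ls -1 data* 2>/dev/null | sed -n "${1}p"
}

# Default to index 1 so the script can also be tried outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}
datafile=$(pick_file "$task_id")

if [ -n "$datafile" ]; then
    echo "task $task_id would run: ./myprog $datafile"
else
    echo "task $task_id: no matching data file" >&2
fi
```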
Use case 3: Parameter sweep
● Your program tests something for one particular value of a parameter
● Parallelism is obtained by launching the same program multiple times with a distinct identifier
● Everybody does the same thing except for a given parameter value based on the identifier
● No inter process communication
● Results appended to one common file
Use case 3: Parameter sweep
You want                                      You ask
16 independent processes (no communication)   --ntasks=16
You use: srun ./myprog, with $SLURM_PROCID identifying each process
Use case 3: Parameter sweep
You want                                      You ask
16 independent processes (no communication)   --array=1-16 --output=res%a
You use: $SLURM_ARRAY_TASK_ID
You merge with: cat res*
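For a sweep over two parameters, the array index can be decoded into one point of a grid. The sketch below assumes a hypothetical 4 x 4 grid (numbers 1..4, letters A..D) to match --array=1-16. The helper name decode_params is made up, and the echo stands in for the real call to ./myprog.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --array=1-16
#SBATCH --output=res%a

# decode_params ID: map an array index (1..16) to one point of a 4 x 4 grid.
decode_params() {
    idx=$(( $1 - 1 ))
    number=$(( idx / 4 + 1 ))                                  # 1..4
    letter=$(echo "A B C D" | cut -d' ' -f$(( idx % 4 + 1 )))  # A..D
    echo "$number $letter"
}

# Default to index 1 so the script can also be tried outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "task $task_id would run: ./myprog $(decode_params "$task_id")"
```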
Use case 3: Parameter sweep
Useful command: GNU Parallel
Single node:
parallel -j $SLURM_NPROCS myprog ::: {1..5} ::: {A..D}
Multiple nodes:
parallel -j $SLURM_NTASKS srun -n1 -c1 myprog ::: {1..5} ::: {A..D}
Useful: parallel --joblog runtask.log --resume for checkpointing
parallel echo data_{1}_{2}.dat ::: 1 2 3 ::: 1 2 3
Use case 4: Multithread
● Your program uses OpenMP or TBB
● Parallelism is obtained by launching a multithreaded program
● One program spawns itself on the node
● Inter process communication by shared memory
● Results managed in the program which outputs a summary
Use case 4: Multithread
You want                                               You ask
one process that can use 16 cores for multithreading   --ntasks=1 --cpus-per-task=16
You use: OMP_NUM_THREADS=16 srun myprog
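A sketch of a job script for the multithreaded case. Rather than hard-coding 16, it takes the thread count from SLURM_CPUS_PER_TASK, which Slurm exports when --cpus-per-task is given. The guard around srun is only there so the sketch can be dry-run on a machine without Slurm.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

# Use as many OpenMP threads as cores were requested; fall back to 1 thread
# if the script is run outside Slurm.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "OpenMP threads: $OMP_NUM_THREADS"

# Launch the job step only where srun exists (i.e. on the cluster).
# The core count is passed explicitly because some Slurm versions do not
# propagate --cpus-per-task from sbatch to srun.
if command -v srun >/dev/null 2>&1; then
    srun --cpus-per-task="$OMP_NUM_THREADS" ./myprog
fi
```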
Use case 5: Message passing
● Your program uses MPI
● Parallelism is obtained by launching a multi-process program
● One program spawns itself on several nodes
● Inter process communication by the network
● Results managed in the program which outputs a summary
Use case 5: Message passing
You want                        You ask
16 processes for use with MPI   --ntasks=16
You use: module load openmpi
         mpirun myprog
Use case 6: Master/slave
● You have two types of programs: master and slave
● Parallelism is obtained by launching several slaves, managed by the master
● The master launches several slaves on distinct nodes
● Inter process communication by the network or the disk
● Results managed in the master program which outputs a summary
Use case 6: Master/slave
You want                   You ask
16 processes 16 threads    --ntasks=16 --cpus-per-task=16
You use: --multi-prog + conf file
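A sketch of what the configuration file for `srun --multi-prog` could look like: each line gives a set of task ranks, the program to run for them, and its arguments, with %t replaced by the task rank. The names multi.conf, ./master, ./slave, and the --rank option are hypothetical.

```shell
# Write a --multi-prog configuration file: one line per group of task ranks.
cat > multi.conf <<'EOF'
# rank(s)   program   arguments
0           ./master
1-15        ./slave --rank=%t
EOF

cat multi.conf

# In the job script, with --ntasks=16:
#   srun --multi-prog multi.conf
```

Rank 0 runs the master and ranks 1 to 15 run one slave each.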
Summary
● Choose number of processes: --ntasks
● Choose number of threads: --cpus-per-task
● Launch processes with srun or mpirun
● Set multithreading with OMP_NUM_THREADS
● You can use $SLURM_PROCID and $SLURM_ARRAY_TASK_ID