![Page 1: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/1.jpg)
More SLURM
Advanced UPPMAX usage
Using Bash to manage jobs
Job efficiency
![Page 2: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/2.jpg)
More SLURM and other advanced UPPMAX techniques
● A closer look at SLURM
● GPUs on Snowy
● Jobstats: our mutual friend in the fight for efficiency
● Advanced job submission
![Page 3: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/3.jpg)
SLURM
● Free, popular, lightweight
● Open source: https://github.com/SchedMD/slurm
● UPPMAX Slurm user guide: https://www.uppmax.uu.se/support/user-guides/slurm-user-guide/
![Page 4: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/4.jpg)
More on sbatch
● A recap:
● sbatch -A snic2021-1-123 -t 10:00 -p core -n 10 myjob.sh
− sbatch: Slurm batch
− -A snic2021-1-123: project name
− -t 10:00: maximum runtime
− -p core: “partition” (“job type”)
− -n 10: number of cores
− myjob.sh: job script
![Page 5: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/5.jpg)
More on time limits
● -t dd-hh:mm:ss
● 0-00:10:00 = 00:10:00 = 10:00 = 10
● 0-12:00:00 = 12:00:00
● 3-00:00:00 = 3-0
● 3-12:10:15
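The equivalences above can be checked with a small Bash helper. This is only a sketch: `slurm_time_to_seconds` is a hypothetical name, not a Slurm tool, and it assumes the time forms MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM and D-HH:MM:SS.

```shell
#!/bin/bash
# Convert a Slurm time string to seconds (illustrative helper, not part of Slurm).
# Assumed forms: MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS
slurm_time_to_seconds() {
    local spec=$1 days=0 h=0 m=0 s=0 rest
    if [[ $spec == *-* ]]; then
        days=${spec%%-*}
        rest=${spec#*-}
        # With a day part, the remaining fields are read as HH[:MM[:SS]]
        IFS=: read -r h m s <<< "$rest"
    else
        # Without a day part, the number of colons tells the forms apart
        case $spec in
            *:*:*) IFS=: read -r h m s <<< "$spec" ;;
            *:*)   IFS=: read -r m s <<< "$spec" ;;
            *)     m=$spec ;;
        esac
    fi
    # 10# forces base 10 so that fields like "08" are not read as octal
    echo $(( 10#${days:-0} * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

# Print each time string next to its length in seconds
for t in 10 10:00 00:10:00 0-00:10:00 3-0 3-12:10:15; do
    printf '%-12s %s\n' "$t" "$(slurm_time_to_seconds "$t")"
done
```

The first four strings should all give the same number of seconds, which is the point of the list above.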
![Page 6: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/6.jpg)
Recall from “Intro to UPPMAX”
● Q: When you have no idea how long a program will run, what should you book?
− A: very long time, e.g. 10-00:00:00
● Q: When you do have an idea of how long a program will run, what should you book?
− A: overbook by 150%
![Page 8: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/8.jpg)
More on partitions
● -p core
● The default
● < 20 cores on Rackham
● < 16 cores on Snowy or Bianca
● A script or program written without any thought to parallelism will use 1 core
![Page 9: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/9.jpg)
Quick testing
● The “devel” partition
− 2 nodes
− Up to 1 hour in length
− Only 1 at a time
− -p devcore, -p devel
● High-priority short jobs
− 4 nodes
− Up to 15 minutes
− --qos=short
● Interactive jobs
− Up to 12 hours
− Handy for debugging a script by executing it manually line by line.
![Page 10: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/10.jpg)
When a job goes wrong
● scancel
− <jobid>
− -u username: cancel all your jobs
− -t <state>: cancel pending or running jobs
− -n <name>: cancel jobs with name
− -i: asks for confirmation
![Page 11: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/11.jpg)
Parameters in job script or on command line?
● Command line parameters override script parameters
● A typical script might start:
#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 24:00:00
● Just a quick test:
● $ sbatch -p devcore -t 00:15:00 job.sh
![Page 12: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/12.jpg)
Memory in core or devcore jobs
● -n X
● On Rackham: get 6.4 GB per core
● On Snowy/Bianca: get 8 GB per core
● Slurm reports available memory on starting an interactive job
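Since memory comes with the cores you book, a job that needs more memory than one core provides has to ask for more cores. A rough sketch of that arithmetic follows. `cores_for_memory` is a hypothetical helper, and the 20 GB requirement is made up for illustration.

```shell
#!/bin/bash
# cores_for_memory NEEDED_GB GB_PER_CORE
# Prints the smallest whole number of cores whose combined memory covers NEEDED_GB.
# Hypothetical helper for illustration; the per-core figures come from the slide.
cores_for_memory() {
    awk -v need="$1" -v per="$2" 'BEGIN {
        n = int(need / per)
        if (n * per < need) n++     # round up to cover the remainder
        if (n < 1) n = 1            # a job always gets at least one core
        print n
    }'
}

cores_for_memory 20 6.4   # Rackham: value to pass to -n
cores_for_memory 20 8     # Snowy/Bianca: value to pass to -n
```

The printed number is what you would pass to -n for a job that needs about 20 GB.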
![Page 13: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/13.jpg)
More flags
● -J jobname
● Email:
− --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
− --mail-user: Don’t use. Set your email in SUPR correctly.
● Output redirection:
− --output=my.output.file
− --error=my.error.file
![Page 14: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/14.jpg)
More flags
● Memory
− -C thin / -C 128GB
− -C fat / -C 256GB / -C 1TB
● Dependencies: --dependency
● Job array: --array
● More at https://slurm.schedmd.com/sbatch.html
− Or just man sbatch
− (though not all work on all systems!)
![Page 15: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/15.jpg)
GPU Nodes on Snowy
● Nodes with 1 Nvidia T4
● Available to everyone, priority to groups that paid for them
#SBATCH -M snowy
#SBATCH --gres=gpu:1 or --gpus=1
#SBATCH --gpus-per-node=1
● There is a system installation of CUDA v 11, other versions available via modules
● https://www.uppmax.uu.se/support/user-guides/using-the-gpu-nodes-on-snowy/
![Page 16: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/16.jpg)
Time for a break?
● That’s enough about sbatch
● Next up: monitoring jobs and job efficiency
![Page 17: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/17.jpg)
Monitoring
● jobinfo: a wrapper around squeue
− jobinfo -u username
− jobinfo -A snic2021-22-606
● Can also use squeue directly
![Page 18: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/18.jpg)
Priority
● Roughly:
− First job of the day gets elevated priority
− Other normal jobs run in order of submission (subject to scheduling)
− Projects exceeding allocation get successively lower priority category
− Bonus jobs run after higher priority categories
![Page 19: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/19.jpg)
Priority
● In practice:
− Submit early, run early.
− Bonus jobs always run eventually, but sometimes wait until night or weekend.
● In detail:
− https://www.uppmax.uu.se/support/faq/running-jobs-faq/your-priority-in-the-waiting-job-queue/
![Page 20: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/20.jpg)
Job efficiency
● jobstats: our mutual friend in the fight for productivity
− Only works for jobs > 5-15 minutes in length
− -r: check running jobs
− -A <project>: check all recent jobs for a project
− -p: produce CPU & memory usage plot
− -M <cluster>: check jobs on other cluster
![Page 21: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/21.jpg)
Jobstats exercise
● Generate jobstats plots for your jobs
− First, find some job IDs from this month
− $ finishedjobinfo -m <yourusername>
− Note the job IDs of some interesting jobs.
− Generate the images
− $ jobstats -p id1 id2 id3
● Look at the images. I have put some interesting ones in /proj/introtouppmax/labs/moreslurm/jobstatsplots
− $ eog *.png &
![Page 22: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/22.jpg)
Jobstats plots
● Which of the plots in labs/moreslurm/jobstatsplots:
− Show good CPU/memory usage?
− Show a job that needs a fat node?
![Page 23: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/23.jpg)
Time for a break?
● Next is advanced job submission
![Page 24: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/24.jpg)
Multicore jobs
● (Efficient) multicore jobs need either high memory utilisation or multiple execution threads. Either:
− Job script launches multiple programs
− One program runs multithreaded
![Page 25: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/25.jpg)
Multithreaded script
#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J 4commands

cd /proj/introtouppmax/labs/moreslurm
./work.sh 1 10000000 &
./work.sh 2 15000000 &
./work.sh 3 20000000 &
./work.sh 4 10000000 &
wait
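The `&` and `wait` pattern in the script above can be tried outside Slurm. In this sketch the `work` function is a made-up stand-in for work.sh that simply sleeps; it is not the course script.

```shell
#!/bin/bash
# Hypothetical stand-in for work.sh: sleep for a while, then report.
work() {
    local id=$1 duration=$2
    sleep "$duration"
    echo "task $id finished"
}

run_all() {
    # Each trailing & starts the task in the background, so all four run at once
    work 1 1 &
    work 2 2 &
    work 3 1 &
    work 4 1 &
    wait   # block until every background task has exited
    echo "all tasks done"
}

run_all
```

Without the final `wait`, the script would reach its end while the tasks are still running, and in a batch job Slurm would then end the job and stop them.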
![Page 26: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/26.jpg)
Multithreaded program
● The program's documentation may mention “OpenMP”, “MPI”, “pthreads”, or other parallel programming technologies
#!/bin/bash -l
#SBATCH -A snic2021-22-606
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 00:15:00
#SBATCH -J multithreaded

cd /proj/introtouppmax/labs/moreslurm
./work_threaded.sh
![Page 27: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/27.jpg)
Dependencies
● --dependency=afterok:<jobid> : job enters the queue after successful end of job <jobid>
● Very handy for “fire and forget” workloads
● Potentially lots of time spent in queue
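A chain of dependent jobs could be submitted along these lines. This is a sketch, not the course's dependency_submit.sh: `submit_chain` is a hypothetical function, and it relies on `sbatch --parsable` printing the job ID (optionally followed by `;cluster`).

```shell
#!/bin/bash
# submit_chain SCRIPT...
# Submits each script so that it starts only after the previous one succeeded.
# Prints one job ID per line. Hypothetical helper for illustration.
submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job in the chain has nothing to wait for
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the ID
        jobid=${jobid%%;*}
        echo "$jobid"
        prev=$jobid
    done
}
```

On a login node it would be called as, for example, `submit_chain step1.sh step2.sh step3.sh`, where the three script names are placeholders for your own job scripts.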
![Page 28: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/28.jpg)
Exercise
● Look at /proj/introtouppmax/labs/moreslurm/dependency/
● Run dependency_submit.sh and see how it works
● Read man sbatch for more information
− When might you use afterany instead of afterok?
− When might singleton be a good idea?
● Discuss in HackMD
![Page 29: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/29.jpg)
Dividing up a big chunk of work
● A common question is how to divide up a really big job and manage the chunks
● The best approach depends on specifics, but is usually one of:
− Job arrays
− Some Bash scripting that submits lots of reasonable-sized jobs
− A workflow manager such as Snakemake or Nextflow
![Page 30: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/30.jpg)
Snakemake and Nextflow
● Conceptually similar, but with different flavours
● First define steps, each with an input, an output, and a command that transforms an input into an output
● Then just ask for desired output, and the system will handle the rest
![Page 31: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/31.jpg)
Job arrays
● Submit many jobs at once with same parameters
● Use $SLURM_ARRAY_TASK_ID in script to find the correct part of the workload
● You can find a simple example in moreslurm/jobarrays
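One way to map $SLURM_ARRAY_TASK_ID to a part of the workload is sketched below. It is not the example in moreslurm/jobarrays: `chunk_bounds` is a hypothetical helper, and the figures (tasks numbered from 1, 10 runs per task) are assumptions chosen for illustration.

```shell
#!/bin/bash
# chunk_bounds TASK_ID RUNS_PER_TASK
# Prints the first and last run number handled by a 1-based array task.
# Hypothetical helper for illustration.
chunk_bounds() {
    local task=$1 per_task=$2
    local first=$(( (task - 1) * per_task + 1 ))
    local last=$(( task * per_task ))
    echo "$first $last"
}

# In a job submitted with e.g. --array=1-4, Slurm sets SLURM_ARRAY_TASK_ID.
# The default of 1 only lets this sketch run outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}
read -r first last <<< "$(chunk_bounds "$task_id" 10)"
for (( run = first; run <= last; run++ )); do
    echo "task $task_id: run $run"
done
```

Every task runs the same script, and only the value of the task ID decides which runs it performs.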
![Page 32: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/32.jpg)
Exercise
● Suppose you have to do 1000 runs of a program and want to do 50 runs per job.
● Modify the jobarrays example to submit 20 1-core jobs in an array, each of which will run “echo” 50 times.
![Page 33: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/33.jpg)
DIY Workflows
● For middle-of-the-road situations, some simple Bash (or Python) will suffice.
● labs/moreslurm/manyjobs/ contains an example
− job.sh does a “chunk” of work
− jobsubmit.sh submits the jobs to Slurm
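A submission loop of this kind might look roughly like the following. The contents of the course's job.sh and jobsubmit.sh are not shown here, so this is a guess at the general shape: `submit_chunks` is a hypothetical function, and it assumes the job script takes the first and last item of its chunk as arguments.

```shell
#!/bin/bash
# submit_chunks TOTAL CHUNK_SIZE JOB_SCRIPT
# Submits one job per chunk, passing the chunk's first and last item
# to the job script as arguments. Hypothetical helper for illustration.
submit_chunks() {
    local total=$1 size=$2 script=$3 start end
    for (( start = 1; start <= total; start += size )); do
        end=$(( start + size - 1 ))
        # The last chunk may be smaller than the others
        if (( end > total )); then
            end=$total
        fi
        sbatch -J "chunk_${start}_${end}" "$script" "$start" "$end" || return 1
    done
}
```

Calling `submit_chunks 25 10 job.sh` would submit three jobs covering items 1-10, 11-20 and 21-25. The project, partition and time limit are assumed to be set with #SBATCH lines inside the job script.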
![Page 34: Advanced UPPMAX usage More SLURM](https://reader034.vdocuments.us/reader034/viewer/2022042723/6267d5c238b16c5a947d3ae2/html5/thumbnails/34.jpg)
THE END
Now you know everything there is to know about using Slurm at UPPMAX