Converting Your

(Simple) Job Scripts

from PBS to SLURM

on discover

NASA Center for Climate Simulation

High Performance Science

July 31, 2014

Introduction

• Portable Batch System (PBS)

– Developed at Ames for NASA

– Commercial version: PBS Pro (Altair Engineering)

• Simple Linux Utility for Resource Management (SLURM)

– Developed at LLNL

– Open-source (supported by SchedMD)

• discover moved from PBS to SLURM in October 2013.

NCCS Brown Bag 7/31/2014 NASA Center for Climate Simulation

What’s the difference?

• Concepts and commands have new names.

• Overall script design remains essentially the same.

• A PBS “queue” is equivalent to a SLURM “partition”.

Why did we switch?

• Quality of Service (QoS)

– Eliminates need for dedicated queues

• Great reduction in cost

• But PBS is still used at NAS….

– … so we use a PBS emulation layer.

PBS emulation with SLURM

• SchedMD provided wrapper scripts (in Perl).

• We modified the wrappers for discover.

• Most changes were folded back into baseline.

• Wrapped tools: qsub, qalter, qdel, qhold, qrerun, qrls, qstat, xsub

• Wrappers handle command-line options only.

• #PBS script directives are translated to #SBATCH and processed by sbatch.

Emulation “gotchas”

• Not all PBS features can be emulated.

• SLURM exports user environment by default.

• SLURM runs in the current directory.

• SLURM combines stdout and stderr.

Batch job submission

• For simple cases, just replace qsub with sbatch.

$ qsub myjob.sh

becomes

$ sbatch myjob.sh

Naming your job

• Naming the job makes it easier to find.

#PBS -N job_name

becomes

#SBATCH -J job_name

or

#SBATCH --job-name=job_name

Specifying the account

• Make sure the proper account is charged.

#PBS -A account_name

becomes

#SBATCH -A account_name

or

#SBATCH --account=account_name

Specifying the partition

• Only if you have to ….

#PBS -q destination

becomes

#SBATCH -p destination

or

#SBATCH --partition=destination

Specifying the number of nodes

• Specify how many nodes you need.

#PBS -l select=num

becomes

#SBATCH -N num

or

#SBATCH --nodes=num

• A range can also be specified as nmin-nmax.

Specifying processes per node

• By default, one process runs per core on each node, but you may want fewer.

#PBS -l mpiprocs=num

becomes

#SBATCH --ntasks-per-node=num
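
As a rough sketch of how the node count and the processes per node combine: the total number of MPI processes is the product of the two. The script below assumes bash. SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are set by SLURM inside a job (the latter only when --ntasks-per-node is given); the fallback values 12 and 8 are illustrative, so the arithmetic can also be tried outside a job.

```shell
#!/bin/bash
#SBATCH --nodes=12
#SBATCH --ntasks-per-node=8

# Inside a job these come from SLURM; the defaults after ":-" are
# illustrative values used only when the script runs outside a job.
nodes="${SLURM_JOB_NUM_NODES:-12}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-8}"

# Total MPI processes = nodes x processes per node.
total_tasks=$(( nodes * tasks_per_node ))

echo "Running ${total_tasks} processes on ${nodes} nodes (${tasks_per_node} per node)"
```

Asking for fewer processes than cores leaves the remaining cores on each node free for threads or for extra memory bandwidth per process.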

Specifying processor type

• Choices are:

– Sandy Bridge (sand – 16 cores/node)

– Westmere (west – 12 cores/node)

#PBS -l proc=proc_type

becomes

#SBATCH -C proc_type

or

#SBATCH --constraint=proc_type

stdout and stderr streams

• Specify where the output streams are written.

#PBS -o opath -e epath

becomes

#SBATCH -o opath -e epath

or

#SBATCH --output=opath --error=epath

• Streams are joined in SLURM by default (./slurm-NNNNNNN.out), which required -j oe or -j eo in PBS.
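
To keep the two streams apart, name both files. A minimal sketch, assuming bash: the file names and the report function are hypothetical, and %j is the sbatch filename pattern that expands to the job ID.

```shell
#!/bin/bash
#SBATCH --output=revenge-%j.out
#SBATCH --error=revenge-%j.err

# report is a hypothetical stand-in for the real application.
report() {
    echo "result: ok"            # stdout: written to the --output file
    echo "warning: slow" >&2     # stderr: written to the --error file
}

report
```

If only --output is given, both streams still end up in that one file.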

Mail notification

• Use to get a message when your job is done, or when something bad happens…

#PBS -M user_list

becomes

#SBATCH --mail-type=type

#SBATCH --mail-user=user

• Type can be BEGIN, END, FAIL, ALL (any state change).

• Default user is the submitter.

Your working directory

• PBS jobs ran in a spool directory.

• SLURM jobs run in the current directory.

• Can be changed with cd command, or:

#SBATCH -D path

or

#SBATCH --workdir=path
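
Old PBS scripts often begin with cd $PBS_O_WORKDIR. A hedged sketch of a line that works under either system, assuming bash: SLURM_SUBMIT_DIR and PBS_O_WORKDIR are the submit-directory variables set by SLURM and PBS respectively, and the final fallback to $PWD is only there so the script can be tried outside a job. More recent Slurm releases document the long option as --chdir rather than --workdir, so check man sbatch on your system.

```shell
#!/bin/bash
#SBATCH --job-name=revenge

# Prefer the SLURM variable, then the PBS one, then the current directory.
workdir="${SLURM_SUBMIT_DIR:-${PBS_O_WORKDIR:-$PWD}}"

# Stop early if the directory is missing rather than run in the wrong place.
cd "$workdir" || exit 1

echo "Working directory: $(pwd)"
```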

Exporting environment variables

• PBS exported nothing by default.

• SLURM exports everything by default.

• Change with one or more of:

#SBATCH --export=names

#SBATCH --export=ALL

#SBATCH --export=NONE

#SBATCH --export-file=path
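
Because the export defaults differ between the two systems, a converted script may want to check that the variables it depends on actually arrived. A minimal sketch, assuming bash and the standard printenv utility: require_var is a hypothetical helper, and BUTTERCUP is just an example variable name.

```shell
#!/bin/bash
#SBATCH --export=BUTTERCUP

# require_var NAME: hypothetical helper that succeeds only if NAME is
# present in the environment of this script.
require_var() {
    local name="$1"
    if ! printenv "$name" > /dev/null; then
        echo "error: ${name} was not exported to the job" >&2
        return 1
    fi
}

# A real job script would typically stop instead:
#   require_var BUTTERCUP || exit 1
if require_var BUTTERCUP; then
    echo "BUTTERCUP=${BUTTERCUP}"
fi
```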

Threads and MPI

• Set up and run threads as you always have.

• mpirun/mpiexec/mpiexec.hydra are not part of PBS, so no changes needed.

• The SLURM tool srun provides additional features that are SLURM-specific.

– Provides features similar to those of MPI tools.

– Differences in job step control and signal propagation.
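
For a hybrid job, the thread count usually follows from the cores left over per process. A sketch only, assuming bash and an application that uses OpenMP and honors OMP_NUM_THREADS; the 16 cores per node is the Sandy Bridge figure for discover, and the fallback value 8 is illustrative.

```shell
#!/bin/bash
#SBATCH --constraint=sand
#SBATCH --ntasks-per-node=8

cores_per_node=16    # Sandy Bridge nodes; use 12 for Westmere

# Set by SLURM when --ntasks-per-node is given; 8 is an illustrative default.
tasks_per_node="${SLURM_NTASKS_PER_NODE:-8}"

# Give each MPI process an equal share of the node's cores.
export OMP_NUM_THREADS=$(( cores_per_node / tasks_per_node ))

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```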

A simple example

• User inigo has an old PBS script:

–The job name is revenge.

–Runs in the default PBS queue.

–Runs the application sword.x.

–Uses 8 Westmere nodes.

The simple script (PBS)

#PBS -N revenge

#PBS -l select=8:proc=west

mpirun sword.x

The simple script (SLURM)

#SBATCH --job-name=revenge

#SBATCH --nodes=8

#SBATCH --constraint=west

mpirun sword.x

# Could also use:

# srun sword.x

The simple results…

• Program runs in the current directory, not the spool directory.

• User environment is exported.

• Standard output and standard error are written together to ./slurm-NNNNNNN.out.

A not-so-simple example

• User westley has an old PBS script:

–The job name is pirate.

–Charge the account roberts.

–Runs in the PBS queue dread.

–Uses 12 Sandy Bridge nodes.

–Uses 8 cores per node.

–Export only the variable BUTTERCUP.

–Runs the application sword.x.

The not-so-simple (NSS) script (PBS)

#PBS -N pirate

#PBS -A roberts

#PBS -q dread

#PBS -l select=12:proc=sand:mpiprocs=8

#PBS -v BUTTERCUP

mpirun sword.x

The NSS Script (SLURM)

#SBATCH --job-name=pirate

#SBATCH --account=roberts

#SBATCH --partition=dread

#SBATCH --nodes=12

#SBATCH --constraint=sand

#SBATCH --ntasks-per-node=8

#SBATCH --export=NONE,BUTTERCUP

mpirun sword.x

# or

# srun sword.x

The NSS results…

• Program runs in the current directory, not the spool directory.

• User environment is exported.

• Standard output and standard error are written together to ./slurm-NNNNNNN.out.

• NOTE: If you have an environment variable named NONE, and use --export=NONE, nothing is exported. But if you have NONE, and use --export=NONE,OTHER, NONE and OTHER are exported with everything else! So don’t do that….
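
The directive mappings used in the two examples are mechanical enough to sketch as a small filter. This is an illustration only, assuming bash: pbs_to_sbatch and convert_select are hypothetical helpers, not the NCCS or SchedMD wrapper scripts, and they cover only the -N, -A, -q and -l select forms shown in this talk. Anything else, including -v, is passed through unchanged for manual review, because the export defaults differ between the two systems.

```shell
#!/bin/bash
# pbs_to_sbatch and convert_select are hypothetical helpers for illustration.

convert_select() {
    # Split "select=12:proc=sand:mpiprocs=8" on ':' and map each item.
    local IFS=':' item
    for item in $1; do
        case "$item" in
            select=*)   printf '#SBATCH --nodes=%s\n' "${item#select=}" ;;
            proc=*)     printf '#SBATCH --constraint=%s\n' "${item#proc=}" ;;
            mpiprocs=*) printf '#SBATCH --ntasks-per-node=%s\n' "${item#mpiprocs=}" ;;
            *)          printf '#PBS -l %s\n' "$item" ;;   # unknown: keep for review
        esac
    done
}

pbs_to_sbatch() {
    # Reads a PBS script on stdin and writes the converted script to stdout.
    local re='^#PBS[[:space:]]+-([A-Za-z])[[:space:]]+(.+)$'
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ "$line" =~ $re ]]; then
            case "${BASH_REMATCH[1]}" in
                N) printf '#SBATCH --job-name=%s\n' "${BASH_REMATCH[2]}" ;;
                A) printf '#SBATCH --account=%s\n' "${BASH_REMATCH[2]}" ;;
                q) printf '#SBATCH --partition=%s\n' "${BASH_REMATCH[2]}" ;;
                l) convert_select "${BASH_REMATCH[2]}" ;;
                *) printf '%s\n' "$line" ;;   # e.g. -v: convert by hand
            esac
        else
            printf '%s\n' "$line"             # not a PBS directive: copy as-is
        fi
    done
}

# Demo: convert the simple PBS script from the earlier example.
pbs_to_sbatch <<'EOF'
#PBS -N revenge
#PBS -l select=8:proc=west
mpirun sword.x
EOF
```

Usage on a file would look like pbs_to_sbatch < old_job.pbs > new_job.sh, where both file names are placeholders. Review the result before submitting it.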

Much more to come…

• Using mpirun/mpiexec/mpiexec.hydra vs. using srun.

– Differing behavior for signal propagation and job control commands.

• Job dependencies with strigger.

• Copying files with sbcast.

• Attaching to running jobs with sattach.

Questions?
