Running Jobs on Jacquard
An overview of interactive and batch computing, with comparisons to Seaborg
David Turner, NUG Meeting, 3 Oct 2005
2
Topics
• Interactive
  – Serial
  – Parallel
  – Limits
• Batch
  – Serial
  – Parallel
  – Queues and Policies
• Charging
• Comparison with Seaborg
3
Execution Environment
• Four login nodes
  – Serial jobs only
  – CPU limit: 60 minutes
  – Memory limit: 64 MB
• 320 compute nodes
  – "Interactive" parallel jobs
  – Batch serial and parallel jobs
  – Scheduled by PBSPro
• Queue limits and policies established to meet system objectives
  – User input is critical!
4
Interactive Jobs
• Serial jobs run on login nodes
  – cd, ls, pathf90, etc.
  – ./a.out
• Parallel jobs run on compute nodes
  – Controlled by PBSPro
% qsub -I -q interactive -l nodes=8:ppn=2
% cd $PBS_O_WORKDIR
% mpirun -np 16 ./a.out
• Larger or longer interactive sessions can be requested from the batch queue:
% qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00
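With ppn=2 requesting two processors per node, the task count passed to mpirun is nodes × ppn (8 × 2 = 16 in the session above). A minimal bash sketch of that arithmetic, assuming a resource string in the nodes=N:ppn=P form used on this slide; np_from_resources is a hypothetical helper, not a PBS or NERSC command:

```shell
# np_from_resources: hypothetical helper (not part of PBS) that computes
# the MPI task count (nodes x ppn) from a resource string such as
# "nodes=8:ppn=2" or "nodes=32:ppn=2,walltime=18:00:00".
np_from_resources() {
  local spec=$1 nodes ppn
  nodes=${spec#*nodes=}        # text after "nodes="
  nodes=${nodes%%[,:]*}        # stop at the first "," or ":"
  case $spec in
    *ppn=*) ppn=${spec#*ppn=}; ppn=${ppn%%[,:]*} ;;
    *)      ppn=1 ;;           # assumption: one processor per node if ppn is omitted
  esac
  echo $(( nodes * ppn ))
}
```

For example, `mpirun -np "$(np_from_resources nodes=8:ppn=2)" ./a.out` keeps the -np value in step with the resource request.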
5
PBSPro
• Marketed by Altair Engineering
  – Based on open source Portable Batch System developed for NASA
  – Also installed on DaVinci
• Batch scripts contain directives:
  #PBS -o myjob.out
• Directives may also appear as command-line options:
  qsub -o myjob.out …
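Because a directive is an ordinary shell comment, the shell ignores it and only qsub interprets it. A rough awk sketch that lists the directives at the top of a script, assuming (as the PBS Pro documentation describes) that qsub stops reading directives at the first executable line; pbs_directives is a hypothetical helper, not a PBS command:

```shell
# pbs_directives: hypothetical helper that prints the "#PBS" lines at the
# top of a batch script. It stops at the first line that is neither blank
# nor a comment, mimicking where qsub stops parsing directives.
pbs_directives() {
  awk '
    /^#PBS/            { print; next }   # a PBS directive
    /^[ \t]*(#.*)?$/   { next }          # blank line or plain comment
                       { exit }          # first executable line: stop
  ' "$1"
}
```

Running `pbs_directives myjob` before `qsub myjob` is a quick way to spot a directive that was accidentally placed after the first command and will therefore be ignored.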
6
Simple Batch Script
#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N myjob
#PBS -o myjob.out
#PBS -e myjob.err
#PBS -A mp999
#PBS -q debug
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out
7
Useful PBS Options (1)
-A repo       Charge this job to repository repo
              Default: Your default repository
-N jobname    Provide name for job; up to 15 printable, non-whitespace characters
              Default: Name of batch script
-q qname      Submit job to batch queue qname
              Default: batch
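A job name can be checked against the -N rule before submitting. A minimal bash sketch; valid_jobname is hypothetical and checks only the rule stated on this slide (1 to 15 printable, non-whitespace characters), not any further restrictions a particular PBS version may impose:

```shell
# valid_jobname: hypothetical check of the -N rule: 1 to 15 printable,
# non-whitespace characters ([[:graph:]] excludes spaces and control chars).
valid_jobname() {
  local re='^[[:graph:]]{1,15}$'
  [[ $1 =~ $re ]]
}
```

For example, `valid_jobname "my job"` fails because of the space, while `valid_jobname myjob` succeeds.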
8
Useful PBS Options (2)
-S shell      Specify shell as the scripting language
              Default: Your login shell
-V            Export current environment variables into the batch job environment
              Default: Do not export
9
Useful PBS Options (3)
-o outfile    Write STDOUT to outfile
              Default: <jobname>.o<jobid>
-e errfile    Write STDERR to errfile
              Default: <jobname>.e<jobid>
-j [oe|eo]    Join STDOUT and STDERR on STDOUT (oe) or STDERR (eo)
              Default: Do not join
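The default names can be worked out from the job name and jobid. A small bash sketch, assuming PBS uses only the numeric sequence part of the jobid in the file name (93935 from 93935.jacin03); default_outfiles is a hypothetical helper:

```shell
# default_outfiles: hypothetical helper printing the default STDOUT and
# STDERR file names, <jobname>.o<jobid> and <jobname>.e<jobid>, assuming
# only the numeric sequence part of the jobid is used.
default_outfiles() {
  local name=$1 seq=${2%%.*}    # "93935.jacin03" -> "93935"
  printf '%s.o%s\n%s.e%s\n' "$name" "$seq" "$name" "$seq"
}
```

When -N is not given, the job name is the name of the batch script, so that is the value to pass as the first argument.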
10
Useful PBS Options (4)
-m [a|b|e|n]  E-mail notification
              a = send mail when job is aborted by the system
              b = send mail when job begins
              e = send mail when job ends
              n = do not send mail
              Options a, b, and e may be combined
              Default: a
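The combination rule amounts to: "n" on its own, or any mix of a, b, and e. A minimal POSIX-shell sketch of that rule; valid_mail_opts is a hypothetical helper, not a PBS command:

```shell
# valid_mail_opts: hypothetical check of a -m argument: "n" alone,
# or a non-empty combination of the letters a, b and e.
valid_mail_opts() {
  case $1 in
    n)            return 0 ;;
    ''|*[!abe]*)  return 1 ;;   # empty, or contains a letter other than a/b/e
    *)            return 0 ;;
  esac
}
```

For example, `#PBS -m be` passes the check and requests mail when the job begins and when it ends.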
11
Batch Queues
Submit queue  Execution queue  Nodes      Walltime limit
interactive   interactive      1 – 16     30 mins
debug         debug            1 – 32     30 mins
batch         batch16          1 – 16     48 hours
              batch32          17 – 32    24 hours
              batch64          33 – 64    12 hours
              batch128         65 – 128   6 hours
              batch256         129 – 256  6 hours
low           low              1 – 64     6 hours
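Jobs submitted to the batch queue are routed to an execution queue by node count. A bash sketch of that lookup, transcribed from the table above; route_batch is hypothetical, PBSPro performs the actual routing, and the limits may change:

```shell
# route_batch: hypothetical lookup of the execution queue and walltime
# limit for a job submitted to the "batch" queue, per the table above.
route_batch() {
  local nodes=$1
  if   (( nodes < 1 ));    then echo "invalid node count" >&2; return 1
  elif (( nodes <= 16 ));  then echo "batch16 48:00:00"
  elif (( nodes <= 32 ));  then echo "batch32 24:00:00"
  elif (( nodes <= 64 ));  then echo "batch64 12:00:00"
  elif (( nodes <= 128 )); then echo "batch128 06:00:00"
  elif (( nodes <= 256 )); then echo "batch256 06:00:00"
  else echo "too many nodes for the batch queue" >&2; return 1
  fi
}
```

By this table, the 32-node interactive session requested from the batch queue earlier would land in batch32, whose 24-hour limit covers its 18-hour request.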
12
Batch Queue Policies
• Each user may have:
  – One running interactive job
  – One running debug job
  – Four jobs running over the entire system
• Only one batch128 job is allowed to run at a time.
• The batch256 queue usually has a run limit of zero. NERSC staff will arrange to run jobs of this size.
13
Submitting Batch Jobs
% qsub myjob
93935.jacin03
%
• Record jobid for tracking!
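One way to follow this advice is to capture the jobid that qsub prints. A minimal bash sketch; submit_and_record and the default log file $HOME/jobids.log (overridable through a JOBLOG variable) are hypothetical, and the submit command is passed as arguments so any command that prints a jobid works:

```shell
# submit_and_record: hypothetical wrapper that runs a submit command
# (normally "qsub <script>"), appends the jobid it prints to a log file,
# and echoes the jobid for later use with qstat or qdel.
submit_and_record() {
  local log=${JOBLOG:-$HOME/jobids.log} jobid
  jobid=$("$@") || return 1     # e.g. "93935.jacin03"
  printf '%s %s\n' "$jobid" "$*" >> "$log"
  printf '%s\n' "$jobid"
}
```

Typical use: `jobid=$(submit_and_record qsub myjob)`, and later `qdel "$jobid"` if the job has to be removed.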
14
Deleting Batch Jobs
% qdel 93935.jacin03
%
15
Monitoring Batch Jobs (1)
• PBS command qstat
% qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
93894.jacin03    EV80fl02_3       legendre         0        H batch16
93330.jacin03    test.script      laplace          00:00:23 R batch32
93897.jacin03    runlu8x8         rasputin         0        Q batch32
93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
...
• Use -u option for single-user output
% qstat -u einstein
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
93295.jacin03-ib job5             einstein         00:00:00 R batch16
%
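The S column gives each job's state: R (running), Q (queued), and H (held) in the listing above. A rough bash sketch that tallies jobs by state, assuming qstat's default six-column layout as shown (two header lines, job names without whitespace, per the -N rule); count_states is a hypothetical helper:

```shell
# count_states: hypothetical helper that reads qstat's default listing on
# stdin, skips the two header lines, and prints "<state> <count>" pairs.
count_states() {
  awk 'NR > 2 && NF >= 6 { n[$5]++ }
       END { for (s in n) print s, n[s] }' | sort
}
```

Usage: `qstat | count_states`, or `qstat -u einstein | count_states` for a single user's jobs.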
16
Monitoring Batch Jobs (2)
• NERSC command qs
% qs
JOBID ST USER     NAME       NDS REQ      USED     SUBMIT
93939 R  gauss    STDIN      1   00:30:00 00:10:43 Oct 2 16:47:00
93891 R  einstein runlu4x8   16  01:00:00 00:38:48 Oct 2 15:23:36
93918 R  inewton  r4_16      8   01:00:00 00:10:37 Oct 2 15:36:35
...
93785 Q  inewton  r4_64      32  01:00:00 -        Oct 2 08:42:36
93828 Q  rasputin nodemove   64  00:05:00 -        Oct 2 12:00:11
93897 Q  einstein runlu8x8   32  01:00:00 -        Oct 2 15:24:27
...
93893 H  legendre EV80fl02_2 4   03:00:00 -        Oct 2 15:24:23
93894 H  legendre EV80fl02_3 4   03:00:00 -        Oct 2 15:24:24
93917 H  legendre EV80fl98_5 4   03:00:00 -        Oct 2 15:26:06
...
• Also provides -u option
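For a running job, REQ minus USED is the walltime remaining; for job 93891 above, 01:00:00 - 00:38:48 = 00:21:12. A bash sketch of that arithmetic on HH:MM:SS values (queued jobs show "-" for USED and are not handled); both function names are hypothetical:

```shell
# to_seconds: convert HH:MM:SS to seconds; 10# forces base 10 so that
# fields such as "08" or "09" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# time_left: hypothetical helper printing REQ - USED as HH:MM:SS.
time_left() {
  local left=$(( $(to_seconds "$1") - $(to_seconds "$2") ))
  if (( left < 0 )); then left=0; fi
  printf '%02d:%02d:%02d\n' $(( left / 3600 )) $(( left % 3600 / 60 )) $(( left % 60 ))
}
```

For example, `time_left 00:30:00 00:10:43` gives the time left for job 93939.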
17
Monitoring Batch Jobs (3)
• NERSC website has a current queue listing:
  http://www.nersc.gov/nusers/status/jacquard/qstat
• Also has a completed jobs list:
  http://www.nersc.gov/nusers/status/jacquard/pbs_summary
• Numerous filtering options available
  – Owner
  – Account
  – Queue
  – Jobid
18
Charging
• Machine charge factor (cf) = 4
  – Based on benchmarks and user applications
  – Currently under review
• Serial interactive
  – Charge = cf × cputime
  – Always charged to default repository
• All parallel
  – Charge = cf × 2 × nodes × walltime
  – Charged to default repo unless -A specified
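Worked through for the simple batch script shown earlier (8 nodes, running for its full 30-minute walltime): 4 × 2 × 8 × 0.5 = 32 charged hours. A minimal bash sketch of both formulas with cf = 4, assuming times are given as HH:MM:SS and charges are expressed in hours; the function names are hypothetical, and the factor itself is, per the slide, under review:

```shell
CF=4   # machine charge factor from the slide; currently under review

# parallel_charge NODES WALLTIME: cf * 2 * nodes * walltime, in hours.
parallel_charge() {
  awk -v cf="$CF" -v n="$1" -v t="$2" 'BEGIN {
    split(t, p, ":")                           # HH:MM:SS -> p[1], p[2], p[3]
    hours = p[1] + p[2] / 60 + p[3] / 3600
    printf "%.2f\n", cf * 2 * n * hours
  }'
}

# serial_charge CPUTIME: cf * cputime, in hours (serial interactive work).
serial_charge() {
  awk -v cf="$CF" -v t="$1" 'BEGIN {
    split(t, p, ":")
    printf "%.2f\n", cf * (p[1] + p[2] / 60 + p[3] / 3600)
  }'
}
```

Because parallel jobs are charged on walltime rather than CPU time, an idle or mostly waiting parallel job costs the same as a busy one of the same size and duration.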
19
Things To Look Out For (1)
• Do not set group write permission for your home directory; it will prevent PBS from running your jobs.
• Library modules must be loaded at run time as well as at link time.
• Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
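The first item can be checked with the standard find -perm test, where 020 is the group-write bit. A short sketch; home_group_writable is a hypothetical name:

```shell
# home_group_writable: succeed if the given directory (default $HOME)
# has the group-write permission bit (020) set. -prune stops find from
# descending, so only the directory itself is tested.
home_group_writable() {
  local dir=${1:-$HOME}
  [ -n "$(find "$dir" -prune -perm -020)" ]
}
```

If the check succeeds, `chmod g-w "$HOME"` removes the group-write bit.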
20
Things To Look Out For (2)
• Do not run more than one MPI program in a single batch script.
• If your login shell is bash, you may see:
  accept: Resource temporarily unavailable
  done.
  In this case, specify a different shell using the -S directive, such as:
  #PBS -S /usr/bin/ksh
21
Things To Look Out For (3)
• Batch jobs always start in $HOME. To get to the directory where the job was submitted:
  cd $PBS_O_WORKDIR
  For jobs that work with large files, use a scratch directory instead:
  cd $SCRATCH/some_subdirectory
• PBS buffers output and error files until the job completes. To view the files (in your home directory) while the job is running:
  -k oe
22
Things To Look Out For (4)
• The following is just a warning and can be ignored:
  Warning: no access to tty (Bad file descriptor).
  Thus no job control in this shell.
23
LoadLeveler vs. PBS
LL                    PBS
#@ node               #PBS -l nodes
#@ tasks_per_node     #PBS -l ppn
#@ wall_clock_limit   #PBS -l walltime
#@ class              #PBS -q
#@ job_name           #PBS -N
#@ account_no         #PBS -A
#@ notification       #PBS -m
#@ shell              #PBS -S
#@ output             #PBS -o
#@ error              #PBS -e
#@ environment        #PBS -V
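The one-to-one rows of this table lend themselves to a mechanical first pass when porting Seaborg scripts. A rough awk sketch, assuming LoadLeveler lines of the form "#@ keyword = value" (with or without a space after #); ll2pbs is a hypothetical helper, not a NERSC tool. Keywords whose values do not translate directly (node and tasks_per_node combine into one -l nodes=N:ppn=P resource; notification and environment take different values), and any other keyword including #@ queue, are left as TODO comments:

```shell
# ll2pbs: hypothetical sketch that rewrites simple LoadLeveler keyword
# lines ("#@ keyword = value") as PBS directives, following the table
# above. Everything else is passed through unchanged.
ll2pbs() {
  awk '
    /^#[ \t]*@/ {
      line = $0
      sub(/^#[ \t]*@[ \t]*/, "", line)          # drop the "#@" prefix
      key = line; val = ""
      if (index(line, "=") > 0) {
        key = substr(line, 1, index(line, "=") - 1)
        val = substr(line, index(line, "=") + 1)
      }
      gsub(/[ \t]/, "", key)                    # keyword has no spaces
      gsub(/^[ \t]+|[ \t]+$/, "", val)          # trim the value
      if      (key == "job_name")         print "#PBS -N " val
      else if (key == "class")            print "#PBS -q " val
      else if (key == "account_no")       print "#PBS -A " val
      else if (key == "output")           print "#PBS -o " val
      else if (key == "error")            print "#PBS -e " val
      else if (key == "shell")            print "#PBS -S " val
      else if (key == "wall_clock_limit") print "#PBS -l walltime=" val
      else                                print "# TODO convert by hand: " $0
      next
    }
    { print }                                   # non-LoadLeveler lines
  ' "$@"
}
```

Usage: `ll2pbs seaborg_job > jacquard_job`, then review every TODO line and remember that the script body still needs `cd $PBS_O_WORKDIR`.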
24
Resources
• NERSC Website
  http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
  http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
• NERSC Consulting
  – 1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
  – (510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
  – http://help.nersc.gov/