How to get started on CEES
Mandy
SEP Style
Resources
• cees-clusters
  – SEP-reserved disk: 20TB
  – SEP-reserved nodes: 35 (currently 25)
  – Default max nodes: 149 (8 cores per node)
  – Compute node hardware: 2.26 GHz dual-processor quad-core Nehalem
• cees-rcf
  – SEP-reserved disk: 30TB
  – SEP-reserved nodes: 21 (16 cores per node)
  – Default max nodes: 137 (16 cores per node)
  – Compute node hardware: Sandy Bridge
Home and working directories
• /home/username
  – 10GB quota
  – Backed up daily
  – Mounted read-only on compute nodes
• /data/sep/username
  – Everyone has write access to 20TB in /data/cees
  – Not backed up
  – SEP partition in /data/sep (20TB on cees-clusters and 30TB on cees-rcf)
• Options
  1) Run your code in /home, but use absolute paths to write output to /data (/home is read-only on the compute nodes)
  2) Run your code in /data, but back up your code in /home
• Tips
  – Writing to /tmp on each node first and then copying the results back to /data is a lot faster
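A minimal bash sketch of the /tmp tip, assuming your program writes its output with relative paths. The function name `run_in_tmp` is hypothetical; it uses `$TMPDIR` if the job environment sets it and falls back to `/tmp` otherwise.

```shell
#!/bin/bash
# Run a command inside a fresh scratch directory on the node-local disk,
# then copy whatever it wrote back to DEST (e.g. under /data/sep/$USER).
# Assumes the command writes its outputs with relative paths.
# Usage: run_in_tmp DEST command [args...]
run_in_tmp() {
  local dest=$1; shift
  local scratch status
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
  # Subshell so the cd does not leak into the caller
  ( cd "$scratch" && "$@" )
  status=$?
  # Copy results even if the command failed, so partial output is kept
  mkdir -p "$dest" && cp -R "$scratch"/. "$dest"/ || status=1
  rm -rf "$scratch"
  return "$status"
}
```

For example, `run_in_tmp /data/sep/$USER/run1 ./my_program` (program name hypothetical) does all the heavy writing on the local disk and makes one bulk copy to /data at the end.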
Where is SEPlib?
• # my own environment variables (csh/tcsh syntax)
  setenv SEP /usr/local/SEP
  setenv SEPINC /usr/local/SEP/include
  setenv SEPBIN /usr/local/SEP/bin
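`setenv` only works in csh/tcsh. If your login shell is bash, a rough equivalent (for example in `~/.bashrc`) is sketched below; the paths are the ones on the slide, and the `PATH` line is an optional addition that does not appear on the slide.

```shell
#!/bin/bash
# bash/sh equivalent of the csh setenv lines for SEPlib
export SEP=/usr/local/SEP
export SEPINC="$SEP/include"
export SEPBIN="$SEP/bin"
# Optional (assumption, not on the slide): put the SEPlib programs on your PATH
export PATH="$SEPBIN:$PATH"
```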
How to submit a job
When submitting, specify:
• The number of nodes and cores you need
• The max run time of your job before it is killed (note: must be < 2 hours for the default queue)
• Stdout and stderr logs
• The queue, either default or sep
• The job name
• The command for your job
Then submit your job using qsub.
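The options above map onto a job script. The sketch below assumes the PBS/Torque scheduler that `qsub` usually belongs to (the `#PBS` directives and `$PBS_O_WORKDIR` are standard there) and uses the queue names from the slide. The generator function `write_job_script` is hypothetical; it also refuses a default-queue walltime of 2 hours or more.

```shell
#!/bin/bash
# Print a PBS/Torque job script (assumed scheduler) that sets the options
# from the slides: job name, queue, nodes x cores, walltime, logs, command.
# Usage: write_job_script NAME QUEUE NODES PPN HH:MM:SS "COMMAND" > NAME.pbs
write_job_script() {
  local name=$1 queue=$2 nodes=$3 ppn=$4 walltime=$5 cmd=$6
  local h m s
  # The default queue kills jobs at 2 hours, so refuse anything that long
  if [ "$queue" = default ]; then
    IFS=: read -r h m s <<< "$walltime"
    # 10# forces base 10 so fields like "08" are not read as octal
    if (( 10#$h * 3600 + 10#$m * 60 + 10#$s >= 7200 )); then
      echo "walltime $walltime is not under 2 hours (default queue)" >&2
      return 1
    fi
  fi
  # \$PBS_O_WORKDIR stays literal here; PBS sets it to the submit directory
  cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -q $queue
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -l walltime=$walltime
#PBS -o $name.out
#PBS -e $name.err
cd "\$PBS_O_WORKDIR"
$cmd
EOF
}
```

For example, `write_job_script mig01 default 1 8 01:59:00 "./migrate.x par=shot01.par" > mig01.pbs` followed by `qsub mig01.pbs` (program and file names hypothetical). Use `ppn=8` on cees-clusters and `ppn=16` on cees-rcf.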
Do not run big jobs on the head node
  – Talk to Dennis when moving large datasets
  – You can also use cees-rcf-tools to test jobs
Check jobs
Cancel jobs
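A minimal sketch of checking and cancelling jobs, assuming the PBS/Torque tools that accompany `qsub` (`qstat -u`, `qselect -u -s`, `qdel`). The wrapper names `my_jobs` and `cancel_my_jobs` are hypothetical; to cancel a single job, run `qdel` with its job ID directly.

```shell
#!/bin/bash
# Job-control helpers, assuming PBS/Torque (qstat, qselect, qdel).

# Check: show only your own jobs rather than the whole cluster's queue
my_jobs() {
  qstat -u "$USER"
}

# Cancel: delete every job you own that is queued, running or held.
# qselect prints one job ID per line.
cancel_my_jobs() {
  local ids
  ids=$(qselect -u "$USER" -s QRH) || return 1
  [ -n "$ids" ] || return 0   # nothing to cancel
  # Unquoted on purpose: each job ID becomes a separate argument
  qdel $ids
}
```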
Typical computation structure
[Figure: a chain of steps alternating between needing 40 nodes and needing 1 node (40, 1, 40, 1, 40). 40-node steps, ex. pre-stack forward or adjoint operation; 1-node steps, ex. stacking, step sizes, updating.]
1 job or many jobs?
• Reserved (sep) queue: jobs can run forever
• Default queue: jobs must finish within 2 hours
Waiting…
I am taking over every single node.
muahahaha
Bob’s advice
• Break your jobs into 2-hour blocks and use the default queue
• Only store intermediate results on the clusters
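Sizing a 2-hour block is simple arithmetic: with a measured runtime of $t$ minutes per work item and a safety margin of $s$ minutes, one default-queue job holds $\lfloor (120 - s)/t \rfloor$ items. A minimal sketch; the function name and the 10-minute default margin are assumptions, not cluster rules.

```shell
#!/bin/bash
# How many work items (say, shots) fit in one default-queue job?
# MINUTES_PER_ITEM is your own measured runtime; the safety margin
# (default 10 minutes) is an assumption.
# Usage: items_per_job MINUTES_PER_ITEM [SAFETY_MINUTES]
items_per_job() {
  local per_item=$1 safety=${2:-10}
  local budget=$(( 120 - safety ))
  (( per_item > 0 )) || { echo "minutes per item must be positive" >&2; return 1; }
  local n=$(( budget / per_item ))   # integer division rounds down
  if (( n < 1 )); then
    echo "one item does not fit in $budget minutes; use the sep queue or split it" >&2
    return 1
  fi
  echo "$n"
}
```

For example, 7 minutes per shot with a 15-minute margin gives 105 / 7 = 15 shots per job.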
Scripting is useful for job management
• On cees-clusters: /data/sep/mandyman/Tutorial
  1. Embarrassingly parallel job submission
  2. Timer to check jobs
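This is not the tutorial's own script, just a minimal bash sketch of both ideas, assuming PBS/Torque (`qsub -q -N -l -v`, `qselect -u -s`). The function names are hypothetical, and so is `job.sh`, which would read the item range from the `FIRST`/`LAST` variables passed with `qsub -v`.

```shell
#!/bin/bash
# 1. Embarrassingly parallel submission: one default-queue job per block of
#    SIZE items out of TOTAL. qsub prints each new job ID on stdout.
# Usage: submit_blocks SCRIPT TOTAL SIZE
submit_blocks() {
  local script=$1 total=$2 size=$3 first last
  (( size > 0 )) || return 1
  for (( first = 1; first <= total; first += size )); do
    last=$(( first + size - 1 ))
    (( last > total )) && last=$total
    qsub -q default -N "blk_$first" -l walltime=01:59:00 \
         -v "FIRST=$first,LAST=$last" "$script" || return 1
  done
}

# 2. Timer: every INTERVAL seconds, count how many of the given jobs are
#    still queued, running or held; return once none are left.
# Usage: wait_for_jobs INTERVAL JOBID...
wait_for_jobs() {
  local interval=$1; shift
  local active id left
  while :; do
    # If qselect itself fails, wait and retry rather than assume "done"
    active=$(qselect -u "$USER" -s QRH) || { sleep "$interval"; continue; }
    # Compare only the numeric part, since the server suffix may differ
    active=$(cut -d. -f1 <<< "$active")
    left=0
    for id in "$@"; do
      grep -qxF "${id%%.*}" <<< "$active" && left=$(( left + 1 ))
    done
    (( left == 0 )) && return 0
    echo "$(date '+%H:%M:%S') $left job(s) still in the queue" >&2
    sleep "$interval"
  done
}

# Example: ids=$(submit_blocks ./job.sh 400 20); wait_for_jobs 60 $ids
```

Waiting on the returned job IDs before launching the 1-node step (stacking, updating) is what lets a 40-node / 1-node loop run as many short default-queue jobs instead of one long job.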
Sharing resources
We are here now