UPPMAX Introduction Marcus Holm [email protected] Slides courtesy of: Martin Dahlö [email protected]


Post on 30-Aug-2019


Page 1: UPPMAX Introduction · Projects and resources All UPPMAX resources are allocated to projects. All members are responsible for sharing project resources constructively

UPPMAX Introduction

Marcus Holm [email protected]

Slides courtesy of: Martin Dahlö [email protected]


UPPMAX

● Uppsala Multidisciplinary Center for Advanced Computational Science

● http://www.uppmax.uu.se

● 2 clusters for "public" use
  – Rackham: 486 nodes à 20 cores (128 GB RAM)
    ● 32 nodes with 256 GB, 4 with 1 TB
    ● 2 TB local disk per node
    ● 6 PB of fast network storage (Crex)
  – Bianca: 200 nodes à 16 cores, for SNIC-SENS
    ● 9.5 PB of storage

● SNIC-Cloud system: Dis


Organisational context

● UPPMAX is:
  – A centre at the Dept of IT at Uppsala University
  – A centre of Swedish National Infrastructure for Computing (SNIC)

● UPPMAX hosts the SciLifeLab Compute and Storage facility
  – Supports life science researchers' needs

● All projects and user accounts are handled via SUPR, the SNIC project management portal (supr.snic.se)


Projects and resources

● All UPPMAX resources are allocated to projects. All members are responsible for sharing project resources constructively.
● All projects are created and managed in SUPR.
● A project can have:
  – Thousands of core-hours per month (kch/m)
    ● A constant 2-core job will use about 1.5 kch/month
  – Storage (GB)
    ● Project storage in /proj/xyz
    ● Can have backup or nobackup, separately or together
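As a rough check of the 1.5 kch/month figure: core-hours are simply cores multiplied by wall-clock hours. The sketch below assumes a 30-day month, and the function name is made up for illustration.

```bash
#!/usr/bin/env bash
# Core-hours used in one month by a job that holds CORES cores all the time.
# Assumption: a 30-day month (720 hours). The function name is hypothetical.
core_hours_per_month() {
    local cores=$1
    local hours_per_month=$((24 * 30))
    echo $((cores * hours_per_month))
}

core_hours_per_month 2    # prints 1440, i.e. about 1.5 kch/month
```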


Projects and resources

● The cost? Free to you but…
● Rackham's extension cost (very roughly):
  – 375 kr/TB per year
  – 0.1 kr/core-hour
● A typical PhD student's project that uses 1 TB for four years and averages 1000 core-hours/month represents 6,300 kr.
● An ongoing potato genomics project that uses 20 TB and averages 1000 core-hours/month represents 8,700 kr each year.
● A large shotgun genomics project that uses 5 TB and averages 30,000 core-hours/month for half a year represents 19,000 kr.
● The creation of a new reference genome, requiring 100 TB and 50,000 core-hours/month for a year, represents almost 100,000 kr.
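The four examples can be reproduced from the two rates. The sketch below is only a back-of-the-envelope model: it uses integer arithmetic, so results are rounded down to whole kronor, and the function name and argument order are invented for this example.

```bash
#!/usr/bin/env bash
# Rough cost model using the slide's rates:
#   375 kr per TB per year, 0.1 kr per core-hour.
# Arguments: storage in TB, core-hours per month, duration in months.
# project_cost is a hypothetical helper, not an UPPMAX tool.
project_cost() {
    local tb=$1 core_hours_per_month=$2 months=$3
    local storage=$((tb * 375 * months / 12))
    local compute=$((core_hours_per_month * months / 10))
    echo $((storage + compute))
}

project_cost 1 1000 48      # PhD student project: 6300
project_cost 20 1000 12     # potato genomics, one year: 8700
project_cost 5 30000 6      # shotgun genomics, half a year: 18937
project_cost 100 50000 12   # new reference genome, one year: 97500
```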


Project types

● SNIC (Rackham)
  – Small: anyone can get 2 kch/m and 128 GB
  – Medium: researchers can get up to 100 kch/m
  – Large: groups can get lots of time
  – UPPMAX Storage: additional GB for SNIC projects (still has plenty of space)
  – SciLifeLab Storage: additional GB for life science research on Rackham (fully booked)

● SNIC-SENS (Bianca)
  – For work with sensitive personal data
  – Small (up to 20 TB) and Medium

Guide for project applications: http://uppmax.uu.se/support/getting-started/applying-for-projects/


Who are we, what do we do?

● System Administrators (about 10 people)

– Have root access

– Fix problems requiring privileged access (e.g. account issues)

– Maintain operating systems and build software infrastructure

– Etc etc etc: They keep all the systems running

● Application Experts (about 7 people)

– Install software

– Help users with application- or science-related issues

– Give user workshops & seminars

– Represent user community to UPPMAX & SNIC

● Others

– UPPMAX Director: Elisabeth Larsson

– SNAC WG: Marcus Holm (manages project allocations)

– Economy admin


UPPMAX

● The basic structure of a supercomputer



UPPMAX

● The basic structure of a compute node

[Figure: a compute node, with cores, RAM and a local disk, connected to other nodes, network storage and the internet]


UPPMAX

Storage systems:
  – Crex: Rackham /proj directories
  – Castor: storage for Bianca
  – Cygnus: new storage for Bianca
  – "scratch": node-local disks


UPPMAX

Storage system basics:

● All nodes can access:
  – Your home directory on Domus
  – A project directory on Crex or Castor
  – Its own local disk (2-3 TB)

● If you're reading/writing a file once, use a directory on Crex or Castor

● If you're reading/writing a file many times, copy the file to "scratch", the node-local disk:

    cp myFile $SNIC_TMP
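A minimal sketch of that staging pattern follows. Inside a job, $SNIC_TMP points at the node-local disk; outside a job it is normally unset, so the sketch falls back to a temporary directory purely so that it can be tried anywhere. The file names and the placeholder workload are assumptions, and node-local scratch should be treated as gone once the job ends, so results are copied back at the end.

```bash
#!/usr/bin/env bash
# Stage a file to node-local scratch, work on the local copy, copy results back.
# Fallback to mktemp is an assumption for running this outside a job.
scratch=${SNIC_TMP:-$(mktemp -d)}
workdir=$(mktemp -d)                          # stands in for a project directory
echo "some input data" > "$workdir/myFile"    # hypothetical input file

cp "$workdir/myFile" "$scratch/"              # one copy over the network

# Read the local copy as many times as needed (placeholder workload).
for i in 1 2 3; do
    wc -c < "$scratch/myFile" >> "$scratch/result.txt"
done

cp "$scratch/result.txt" "$workdir/"          # save results before the job ends
```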


Graphical Access

● ThinLinc

Modern, efficient, experimental

http://www.uppmax.uu.se/support/user-guides/thinlinc-graphical-connection-guide/

● X11-forwarding

Ancient, slow, still useful

Connect with "ssh -X [email protected]"

Check if it works by running e.g. xclock

Are you using a Mac (OS X/macOS)?

Then you must install XQuartz (https://www.xquartz.org)


Queue System

● More users than nodes
  – Need for a queue


Queue System

● Scheduling jobs
  – Long or short, narrow or wide?

[Figure: jobs of different shapes packed into a schedule; axis: time]


Queue System

● Scheduling jobs
  – Short and narrow jobs are easier to schedule

[Figure: jobs packed into a schedule; axis: time]



A job?

● Job = what happens during booked time
  – Described in a Bash script file
    ● Slurm parameters
    ● Load software modules
    ● Move around the file system
    ● Start programs
    ● ...and more


Queue System

● 1 mandatory setting for jobs:
  – Who "pays" for it? (-A)

● 3 settings you really should set:
  – Where should it run? (-p)
  – How wide is it? (-n)
  – How long at most? (-t)

If in doubt: -p core -n 1 -t 10-00:00:00


Queue System

● Who "pays" for it? (-A)
  – Only projects can be charged
  – You have to be a member
  – This course's project ID: g2018014

● -A = account (the account you charge)
  – No default value, mandatory


Queue System

● Where should it run? (-p)
  – Use a whole node or just part of it?
    ● 1 node = 20 cores (16 on Bianca)
    ● 1 hour walltime = 20 core hours = expensive
    ● Waste of resources unless you have a parallel program

● -p = partition (node or core)
  – Default value: core
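The difference between the two partitions can be put into numbers. This is a sketch of the rule of thumb above, where a whole node counts as all of its cores; the function name and argument order are invented for illustration.

```bash
#!/usr/bin/env bash
# Core-hours for a booking: a whole node counts as all of its cores
# (20 on Rackham, 16 on Bianca). booked_core_hours is a hypothetical helper.
# Arguments: partition (node|core), number of nodes or cores, hours,
# and optionally cores per node (defaults to 20).
booked_core_hours() {
    local partition=$1 count=$2 hours=$3 cores_per_node=${4:-20}
    if [ "$partition" = "node" ]; then
        echo $((count * cores_per_node * hours))
    else
        echo $((count * hours))
    fi
}

booked_core_hours node 1 1       # one Rackham node for one hour: 20
booked_core_hours core 1 1       # one core for one hour: 1
booked_core_hours node 1 1 16    # one Bianca node for one hour: 16
```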


Queue System

● How wide is it? (-n)
  – How much of the node should be booked?
    ● 1 node = 20 cores
    ● Any number of cores: 1, 2, 5, 13, 15 etc

● -n = number of cores
  – Default value: 1
  – Usually used together with -p core


Queue System

● How long is it? (-t)
  – Always overestimate by ~50%
    ● Jobs are killed when the time limit is reached
    ● Only charged for time used

● -t = time (hh:mm:ss or d-hh:mm:ss)
  – 78:00:00 or 3-6:00:00
  – Default value: 7-00:00:00
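The 50% rule and the day-hour format can be combined in a small helper. This is only a sketch: the function name is hypothetical and it works in whole hours, rounding up.

```bash
#!/usr/bin/env bash
# Turn an estimated run time in whole hours into a time limit of the form
# d-hh:mm:ss with 50% added on top. walltime_with_margin is a made-up name.
walltime_with_margin() {
    local est_hours=$1
    local total=$(( (est_hours * 3 + 1) / 2 ))   # 1.5x, rounded up
    printf '%d-%02d:00:00\n' $((total / 24)) $((total % 24))
}

walltime_with_margin 52    # 78 hours, printed as 3-06:00:00
```

An estimate of 52 hours therefore gives the same limit as the 78:00:00 example above.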


Queue System

● How to submit a job
  – Write a script (bash)
    ● Queue options
    ● Rest of the script

    #!/bin/bash -l
    #SBATCH -A g2018014
    #SBATCH -p core
    #SBATCH -n 1
    #SBATCH -t 00:10:00
    #SBATCH -J Template_script

    # go to some directory
    cd /proj/g2018014/marcusl

    # load software modules
    module load bioinfo-tools

    # do something
    echo Hello world!
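When many similar jobs are needed, the queue options can be filled in from parameters instead of being typed by hand. The sketch below only writes a script file and submits nothing. make_job_script and its argument order are assumptions made for this example, not an UPPMAX tool.

```bash
#!/usr/bin/env bash
# Write a job script like the template above, with the main settings taken
# from arguments: project, partition, cores, time limit, output file.
# make_job_script is a hypothetical helper.
make_job_script() {
    local project=$1 partition=$2 cores=$3 time=$4 out=$5
    cat > "$out" <<EOF
#!/bin/bash -l
#SBATCH -A $project
#SBATCH -p $partition
#SBATCH -n $cores
#SBATCH -t $time
#SBATCH -J Template_script

echo Hello world!
EOF
}

jobfile=$(mktemp)    # temporary file so the sketch leaves nothing behind
make_job_script g2018014 core 1 00:10:00 "$jobfile"
cat "$jobfile"
```

On the cluster, the generated file would then be handed to sbatch in the usual way.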


Queue System

● How to submit a job
  – Script written, now what?

    [marcusl@rackham1 ~]$ sbatch myjobscript.sh
    Submitted batch job 4367759
    [marcusl@rackham1 ~]$ jobinfo -u marcusl

    CLUSTER: rackham
    Running jobs:
       JOBID PARTITION NAME USER ACCOUNT ST START_TIME TIME_LEFT NODES CPUS NODELIST(REASON)

    Nodes in use:                            479
    Nodes in devel, free to use:               1
    Nodes in other partitions, free to use:    0
    Nodes available, in total:               480

    Nodes in test and repair:                  6
    Nodes, all in total:                     486

    Waiting jobs:
       JOBID POS PARTITION NAME USER ACCOUNT ST START_TIME TIME_LEFT PRIORITY CPUS NODELIST(REASON) FEATURES DEPENDENCY
     4367759  12 core Template_script marcusl staff PD N/A 1:00:00 190000 1 (Resources) (null)

    Waiting bonus jobs:


SLURM Output

● Prints to a file instead of terminal
  – slurm-<job id>.out

    [marcusl@tintin2 glob]$ ls -l
    total 4
    -rw-rw-r-- 1 marcusl marcusl 62 Jun 20 13:40 my_script.sb
    [marcusl@tintin2 glob]$
    [marcusl@tintin2 glob]$ sbatch my_script.sb
    Submitted batch job 10281906
    [marcusl@tintin2 glob]$
    [marcusl@tintin2 glob]$ ls -l
    total 4
    -rw-rw-r-- 1 marcusl marcusl 92 Jun 20 13:40 my_script.sb
    -rw-rw-r-- 1 marcusl marcusl 87 Jun 20 13:40 slurm-10281906.out
    [marcusl@tintin2 glob]$


SLURM Output

● Prints to a file instead of terminal
  – slurm-<job id>.out

    [marcusl@rackham2 test]$ ls
    my_script.sh
    [marcusl@rackham2 test]$
    [marcusl@rackham2 test]$ sbatch my_script.sh
    Submitted batch job 10281906
    [marcusl@rackham2 test]$
    [marcusl@rackham2 test]$ ls
    my_script.sh  slurm-10281906.out
    [marcusl@rackham2 test]$
    [marcusl@rackham2 test]$ cat slurm-10281906.out
    Example of error with line number and message
    slurm_script: 40: An error has occurred.
    [marcusl@rackham2 test]$
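With many jobs it gets tedious to cat every output file. One possible approach, sketched below, is to list only the files that mention an error. The helper name is invented, the second job ID is made up, and the sample files merely mimic the output above so that the sketch runs anywhere.

```bash
#!/usr/bin/env bash
# List Slurm output files in a directory that mention an error
# (case-insensitive). find_failed_outputs is a hypothetical helper.
find_failed_outputs() {
    local dir=${1:-.}
    grep -il 'error' "$dir"/slurm-*.out 2>/dev/null
}

# Demo data: one failing and one clean output file in a temporary directory.
demo=$(mktemp -d)
printf 'slurm_script: 40: An error has occurred.\n' > "$demo/slurm-10281906.out"
printf 'Hello world!\n' > "$demo/slurm-10281907.out"

find_failed_outputs "$demo"    # prints only the path of slurm-10281906.out
```

A plain text match like this misses failures that never print the word "error", so it is a first filter rather than a verdict.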


SLURM Tools

● squeue: quick info about jobs in queue
● jobinfo: detailed info about jobs
● finishedjobinfo: summary of finished jobs
● jobstats: efficiency of booked resources


Squeue

● Shows quick information about job queue
  – All jobs: squeue
  – Your jobs: squeue -u <user>

    [marcusl@rackham2 test]$ squeue
      JOBID PARTITION     NAME     USER ST     TIME NODES NODELIST(REASON)
    4362762      core    kbs-3   peterj CG 18:00:47     1 r446
    4362767      core    kbs-3   peterj CG 18:00:47     1 r446
    4430481      node supernov     remi PD     0:00     1 (Priority)
    4433857      node pretest3 maka4186 PD     0:00    16 (Priority)
    4433861      node     Freq    emile PD     0:00     4 (Priority)
    4433740      node    REDOH    jolla PD     0:00     4 (Priority)
    4433872      node q_timing batchtst PD     0:00     1 (Priority)
    4433878      core final_vc madeline PD     0:00     1 (Priority)
    4433890      node gm_grnof    gulla PD     0:00     1 (Priority)
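The ST column is what usually matters: PD means pending, CG completing. A small filter can summarise it. The sketch below assumes the default column layout shown above, with the state in the fifth column and no spaces in job names; count_states is a made-up name, and a few lines from the slide stand in for real output.

```bash
#!/usr/bin/env bash
# Count jobs per state (ST column) in squeue-style output read from stdin.
# Assumes the state is the fifth whitespace-separated column.
count_states() {
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# On a cluster one might run: squeue -u "$USER" | count_states
count_states <<'EOF'
  JOBID PARTITION     NAME     USER ST     TIME NODES NODELIST(REASON)
4362762      core    kbs-3   peterj CG 18:00:47     1 r446
4430481      node supernov     remi PD     0:00     1 (Priority)
4433878      core final_vc madeline PD     0:00     1 (Priority)
EOF
```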


Jobstats

● Shows efficiency information of finished jobs

[marcusl@localmac ~]$ ssh -X [email protected]

..

[marcusl@rackham2 test]$ jobstats -p -A g2018014

Running '/sw/uppmax/bin/finishedjobinfo -M rackham g2018014' through a pipe to get more information, please be patient…

..

*** 10 total jobs, 0 jobs not run, 0 jobs had no jobstats files (includes jobs not run)

[marcusl@rackham2 test]$ eog *.png &



Interactive

● Books a node and connects you to it
  – No X11 forwarding through this connection; you have to "ssh -X" in with another window

    interactive -A <proj id> -p <core or node> -t <time>

    [marcusl@rackham2 test]$ interactive -A g2018014 -p core -n 4 -t 03:00:00
    You receive the high interactive priority.
    There are free cores, so your job is expected to start at once.

    Please, use no more than 25.6 GB of RAM.

    Waiting for job 4434279 to start…
    [marcusl@r483 ~/test]$
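The 25.6 GB in that message is consistent with an even split of a standard Rackham node's memory: 128 GB over 20 cores is 6.4 GB per core, and four cores give 25.6 GB. The even split is an inference from the numbers in these slides rather than a documented rule, and the function name below is hypothetical.

```bash
#!/usr/bin/env bash
# RAM share for a core-partition booking on a standard Rackham node,
# assuming memory is divided evenly: 128 GB / 20 cores = 6.4 GB per core.
ram_limit_gb() {
    local cores=$1
    awk -v c="$cores" 'BEGIN { printf "%.1f\n", c * 128 / 20 }'
}

ram_limit_gb 4    # prints 25.6, matching the message above
```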


UPPMAX Software

● 100+ programs installed
● Managed by a 'module system'
  – Installed, but hidden
  – Manually loaded before use

module avail — Lists all available modules

module load <module name> — Loads the module

module unload <module name> — Unloads the module

module list — Lists loaded modules

module spider <name> — search for modules
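In a job script it can help to stop early if a module fails to load, rather than carry on and fail later with a confusing message. The wrapper below is a sketch: load_modules is an invented helper, and the module command itself is provided by the cluster's module system, so on an ordinary laptop the function just reports that it is missing.

```bash
#!/usr/bin/env bash
# Load several modules in order and stop at the first failure.
# load_modules is a hypothetical helper, not part of the module system.
load_modules() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not found (not on the cluster?)" >&2
        return 1
    fi
    local m
    for m in "$@"; do
        module load "$m" || return 1
    done
    module list
}

# Typical use inside a job script (samtools is just an example name):
# load_modules bioinfo-tools samtools
```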


UPPMAX Commands

● uquota – show disk space


● Laboratory time! (again)
  – Same instructions PDF as this morning
  – Do chapter 3
  – If you finish:
    ● Go back and finish chapter 2
    ● Then do chapter 4