Page 1: UPPMAX Introduction - GitHub Pages
Page 2: UPPMAX Introduction - GitHub Pages

UPPMAX Introduction

2018-02-07

Valentin

Martin Dahlö

Page 3: UPPMAX Introduction - GitHub Pages

Objectives

What UPPMAX is and what it provides

Projects at UPPMAX

How to access UPPMAX

Jobs and queuing systems

How to use the resources of UPPMAX

How to use the resources of UPPMAX in a good way: efficiency!

Page 5: UPPMAX Introduction - GitHub Pages

UPPMAX

Uppsala Multidisciplinary Center for Advanced Computational Science

http://www.uppmax.uu.se

Computer clusters:
● Rackham, 334 (600) computers à 20 cores (128 GB RAM)
● Bianca, 200 nodes à 16 cores (128 GB RAM)
● (Milou, 208 computers à 16 cores (128 GB RAM); 17 with 256 GB, 17 with 512 GB)

~12 PB fast parallel storage

Bioinformatics software

Page 6: UPPMAX Introduction - GitHub Pages

UPPMAX

The basic structure of a supercomputer cluster

Login nodes

node = computer

Page 9: UPPMAX Introduction - GitHub Pages

UPPMAX

UPPMAX provides

Compute and Storage

Page 11: UPPMAX Introduction - GitHub Pages

Projects

UPPMAX provides its resources via projects

Page 12: UPPMAX Introduction - GitHub Pages

Projects

your project

Page 13: UPPMAX Introduction - GitHub Pages

Projects

Resources:

compute (core-hours/month)
storage (GB/TB)

Page 14: UPPMAX Introduction - GitHub Pages

Projects

Two separate projects:

SNIC project: cluster Rackham, 2000 core-hours/month, 128 GB

Uppstore project: storage system CREX, 1 - 100 TB

Page 18: UPPMAX Introduction - GitHub Pages

How to access UPPMAX

SSH to a cluster

ssh -Y your_username@cluster_name.uppmax.uu.se

Page 19: UPPMAX Introduction - GitHub Pages

How to access UPPMAX

SSH to Rackham
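
As a small sketch, the login command for Rackham can be built from the template on the previous slide. It assumes that the host name uses the lower-case cluster name (rackham); "your_username" is a placeholder for your own UPPMAX user name. The -Y option asks ssh for trusted X11 forwarding, which graphical programs need.

```shell
# Fill in the template ssh -Y your_username@cluster_name.uppmax.uu.se
user="your_username"      # placeholder, replace with your own user name
cluster="rackham"         # assumption: lower-case cluster name is the host name
login_cmd="ssh -Y ${user}@${cluster}.uppmax.uu.se"

# Print the command; copy it to a terminal (or run it directly) to log in
echo "$login_cmd"
```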

Page 22: UPPMAX Introduction - GitHub Pages

How to use UPPMAX

Login nodes
use them to access UPPMAX
never use them to run jobs
don't even use them to do "quick stuff"

Calculation nodes
do your work here - testing and running

Page 23: UPPMAX Introduction - GitHub Pages

How to use UPPMAX

Calculation nodes
not accessible directly
SLURM (queueing system) gives you access

Page 25: UPPMAX Introduction - GitHub Pages

Job

Job (computing), from Wikipedia, the free encyclopedia

For other uses, see Job (Unix) and Job stream.

In computing, a job is a unit of work or unit of execution (that performs said work). A component of a job (as a unit of work) is called a task or a step (if sequential, as in a job stream). As a unit of execution, a job may be concretely identified with a single process, which may in turn have subprocesses (child processes; the process corresponding to the job being the parent process) which perform the tasks or steps that comprise the work of the job; or with a process group; or with an abstract reference to a process or process group, as in Unix job control.

Page 26: UPPMAX Introduction - GitHub Pages

Job

Read/open files

Do something with the data

Print/save output

Page 28: UPPMAX Introduction - GitHub Pages

Job

Standard way of running jobs

job

Page 29: UPPMAX Introduction - GitHub Pages

Parallel computing

Job

jobs

Page 30: UPPMAX Introduction - GitHub Pages

Queue System

More users than nodes
Need for a queue

nodes - hundreds
users - thousands

Page 34: UPPMAX Introduction - GitHub Pages

SLURM

Also known as: queue system, workload manager, job queue, batch queue, job scheduler

SLURM (Simple Linux Utility for Resource Management)
free and open source

Page 36: UPPMAX Introduction - GitHub Pages

SLURM

1) Ask for resources and run jobs manually
mainly for testing and small jobs

2) Write a script and submit it to SLURM
do the real job

Page 37: UPPMAX Introduction - GitHub Pages

SLURM

1) Ask for resources and run jobs manually

submit a request for resources

ssh to a calculation node

run programs

Page 38: UPPMAX Introduction - GitHub Pages

SLURM

1) Ask for resources and run jobs manually
submit a request for resources

salloc -A b2015245 -p core -n 1 -t 00:05:00

salloc - command
mandatory job parameters:
-A - project ID (who "pays")
-p - node or core (the type of resource)
-n - number of nodes/cores
-t - time
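
The four mandatory parameters can be put together as in the sketch below. The helper build_salloc is hypothetical (it is not an UPPMAX or SLURM command) and only prints the resulting command line. It assumes that whole nodes are counted with the standard salloc option -N, while cores are counted with -n; the project ID is the one from the example above.

```shell
# build_salloc PROJECT PARTITION COUNT TIME
# Hypothetical helper: prints the salloc command line, it does not run it.
build_salloc() {
    count_flag="-n"                   # -n counts cores
    if [ "$2" = "node" ]; then
        count_flag="-N"               # -N counts whole nodes
    fi
    echo "salloc -A $1 -p $2 $count_flag $3 -t $4"
}

# One core for five minutes (the example from the slide)
build_salloc b2015245 core 1 00:05:00

# One whole node for one hour
build_salloc b2015245 node 1 01:00:00
```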

Page 39: UPPMAX Introduction - GitHub Pages

SLURM

-A project ID
you have to be a member

-p node: 1 node = 16 cores
1 hour walltime = 16 core-hours

-n number of cores (default value = 1)
-N number of nodes

-t format: hh:mm:ss (or d-hh:mm:ss)
default value = 7-00:00:00

jobs are killed when the time limit is reached - always overestimate by ~50%
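
The core-hour cost and the 50% safety margin can be worked out with a little shell arithmetic. This is only a sketch: the two function names are made up for the illustration, and the numbers assume 16 cores per node as on this slide.

```shell
# core_hours CORES HOURS: core-hours charged to the project for a job
core_hours() {
    echo $(( $1 * $2 ))
}

# padded_minutes MINUTES: expected runtime plus a 50% margin, rounded up
padded_minutes() {
    echo $(( ($1 * 3 + 1) / 2 ))
}

core_hours 16 1       # one full 16-core node for one hour, prints 16
padded_minutes 40     # a job expected to take 40 minutes, prints 60
```

With a 2000 core-hours/month allocation, a job that keeps one 16-core node busy for 24 hours would use 384 core-hours.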

Page 40: UPPMAX Introduction - GitHub Pages

SLURM

Information about your jobs

squeue -u <user>

Page 41: UPPMAX Introduction - GitHub Pages

SLURM

SSH to a calculation node (from a login node)

ssh -Y <node_name>

Page 43: UPPMAX Introduction - GitHub Pages

SLURM

You can run programs now!

Page 44: UPPMAX Introduction - GitHub Pages

SLURM

2) Write a script and submit it to SLURM

put all commands in a text file - script

tell SLURM to run the script (use the same job parameters)

Page 45: UPPMAX Introduction - GitHub Pages

SLURM

2) Write a script and submit it to SLURM

put all commands in a text file - script

job parameters

tasks to be done

Page 46: UPPMAX Introduction - GitHub Pages

SLURM

2) Write a script and submit it to SLURM

put all commands in a text file - script
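
A minimal sketch of what such a script file could look like. It assumes the project ID and job parameters from the salloc example; the #SBATCH lines are the job parameters and everything below them is the work to be done. The echo task is only a hypothetical stand-in for a real analysis, and the module line is commented out because the module name depends on your work.

```shell
#!/bin/bash -l
# Job parameters: read by SLURM, treated as ordinary comments by bash
#SBATCH -A b2015245
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 00:05:00

# Modules to be loaded (placeholder, uncomment and adjust)
# module load <module name>

# Tasks to be done: here just report which node ran the job
greeting="Hello from $(uname -n)"
echo "$greeting"
```

Saved as a text file, for example test.sbatch, the script can then be handed to SLURM.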

Page 48: UPPMAX Introduction - GitHub Pages

SLURM

2) Write a script and submit it to SLURM

tell SLURM to run the script (use the same job parameters)

sbatch test.sbatch

sbatch - command
test.sbatch - name of the script file

Page 49: UPPMAX Introduction - GitHub Pages

SLURM

2) Write a script and submit it to SLURM

tell SLURM to run the script (use the same job parameters)

sbatch -A b2015245 -p core -n 1 -t 00:05:00 test.sbatch

Page 50: UPPMAX Introduction - GitHub Pages

SLURM Output

Prints to a file instead of the terminal

slurm-<job id>.out
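
A sketch of how the output file name follows from the job ID. When a job is accepted, sbatch prints a line of the form "Submitted batch job <job id>"; the job ID 1234567 below is made up for the illustration.

```shell
# Hypothetical confirmation line printed by sbatch
submit_message="Submitted batch job 1234567"

# The job ID is the last word of that line
job_id="${submit_message##* }"

# Default name of the file that collects the job's output
out_file="slurm-${job_id}.out"
echo "$out_file"
```

Once the job has started, the file appears in the directory the job was submitted from and can be read with, for example, cat or less.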

Page 51: UPPMAX Introduction - GitHub Pages

Squeue

Shows information about your jobs

squeue -u <user>

jobinfo -u <user>

Page 52: UPPMAX Introduction - GitHub Pages

Queue System

SLURM user guide
go to http://www.uppmax.uu.se/
click Support (left-hand side menu)
click User Guides
click Slurm user guide

or just google “uppmax slurm user guide”

link: http://www.uppmax.uu.se/support/user-guides/slurm-user-guide/

Page 53: UPPMAX Introduction - GitHub Pages

UPPMAX Software

100+ programs installed

Managed by a 'module system'
Installed, but hidden
Manually loaded before use

module avail - Lists all available modules
module load <module name> - Loads the module
module unload <module name> - Unloads the module
module list - Lists loaded modules
module spider <word> - Searches all modules for 'word'
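
A short sketch of a typical module session. The module names bioinfo-tools and samtools are assumptions used for illustration; check module avail or module spider for what is actually installed. The check at the top only makes the snippet safe to run on a machine without the module system.

```shell
if type module >/dev/null 2>&1; then
    module load bioinfo-tools samtools   # assumed module names
    module list                          # show what is loaded now
    module unload samtools               # remove it again when done
    status="module commands were run"
else
    status="module command not available on this machine"
fi
echo "$status"
```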

Page 55: UPPMAX Introduction - GitHub Pages

UPPMAX Commands

uquota

Page 56: UPPMAX Introduction - GitHub Pages

UPPMAX Commands

projinfo

Page 57: UPPMAX Introduction - GitHub Pages

UPPMAX Commands

projplot -A <proj-id> (-h for more options)

Page 59: UPPMAX Introduction - GitHub Pages

UPPMAX Commands

Plot efficiency

jobstats -p -A <projid>

Page 63: UPPMAX Introduction - GitHub Pages

UPPMAX

Summary

All jobs are run on nodes through the queue system

A job script usually consists of:
Job settings (-A, -p, -n, -t)
Modules to be loaded
Bash code to perform actions
Run a program, or multiple programs

More info on the UPPMAX homepage
http://www.uppmax.uu.se/milou-user-guide