An IT Services and Oxford e-Research Centre Partner facility
Advanced Research Computing (previously known as the Oxford Supercomputing Centre)
Introduction Training
The ARC Training Programme
Course 1:
An introduction to Supercomputing at Oxford
– Introduction to the ARC
– Introduction to Parallel Computing & HPC
– Job Submission, Account Management
Course 2:
Introduction to Scientific Programming
Mihai Duta and Ian Bush
Introduction to ARC
Who are we?
What do we offer?
What is High Performance Computing?
Who are we?
Helping researchers access advanced research computing facilities, locally, nationally and internationally:
Central University HPC
User Support and Training
Access to range of machines
Priority Usage Service
Support for Commercial Customers
ARC Team: [email protected]
Andrew Richards
Head of ARC
Steven Young
Head of Technical Services
Dugan Witherick
Senior Research Computing Specialist
Mihai Duta
Scientific Software Advisor
Ian Bush
Scientific Application Specialist
ARC Governance Structure
Executive Committee
IT Committee
User Group
Operations Group
Technical Advisory Group
Where we are
ARC Data Centre & Offices
Data Centre
Provide access to:
Hardware
Software
User Support
Beyond Personal Computing…
Computing to Compete
• IT in all its forms increasingly underpins research activities
• Resources of the required scale cannot always be provided locally by research institutions
• There is a clear need for regional, national and international collaboration on e-infrastructure
Research Computing Pyramid
[Pyramid diagram: Tier 0 at the apex down to Tier 3 at the base]
Tier 0: Europe-wide facilities with users from multiple countries, e.g. PRACE. Tier-0 for the particle physics community is the HPC Data Centre at CERN.
Tier 1: National facilities, e.g. the ARCHER facility. For particle physics users it is the LHC Tier-1 Centre at RAL.
Tier 2: Regional centres, e.g. the SES Centre for Innovation.
Tier 3: Main institutional computing services, such as ARC.
HPC Landscape in the UK
ARC Services
• Local HPC
– Range of clusters, shared-memory and development systems: ARCUS, ARCUS-GPU, SAL, HAL, CARIBOU, RUBY, PHILEAS…
– Dedicated storage infrastructure
• Regional HPC
– SES-5 e-Infrastructure: EMERALD (GPU) and IRIDIS (x86)
• Training, Support, Consultancy
What is Linux?
• Linux is an Operating System
– Software loaded into your computer's memory when you switch it on
– Software which manages the hardware
• Linux is “Unix-like”
– Unix is an operating system originally developed in the 1970s to run large, central computers with many users
– Runs everything from phones to supercomputers today
Operating Systems
Brief history of UNIX and Linux
• A multi-user, multi-tasking operating system
• 1960s
– MIT, AT&T and General Electric develop Multics (Multiplexed Information and Computing Service)
• 1970s
– Unics (Uniplexed Information and Computing Service) – renamed to Unix
• 1972
– rewritten in C, resulting in software more portable across different hardware
• 1980
– AT&T licenses UNIX System III – consolidated differences into UNIX System V release 1
• 1983
– AT&T commercialise UNIX System V
– Academia continue to develop BSD Unix as an alternative
– BSD Unix main contributor of TCP/IP network code to the mainstream Unix kernel
• 1990
– Open Software Foundation released OSF/1 – standard Unix implementation (Mach and BSD)
• 1991
– Linus Torvalds starts Linux (originally Freax). Mixture of BSD and SYS-V
Why do we use Linux?
• Linux is very popular on critical servers due to:
– Stability
– Security
– Performance
– Price
– Flexibility & Open Source nature
• All of the above are applicable to high performance computing
Storage
ARC provides three types of storage:
• Home areas (/home)
– mid-term storage; 20GB quota per user
• General purpose (/data)
– mid-term storage; 5TB per group/project
• Scratch space (/scratch)
– high-performance storage, specific to each machine
– jobs which perform significant disk read/write should use the scratch disks and not /home
– short term (duration of the job)
Larger quotas are available on request (charges may apply).
This is active data storage for ARC projects only; backups of home and data are limited.
Storage Management: http://www.arc.ox.ac.uk/content/storage-management
Data Storage Policy
The ARC makes every effort to ensure the integrity of data stored on our facilities
• However, we are under no obligation to guarantee the integrity or availability of data - this is the responsibility of the individual user.
No Backups (limited snapshots of home and data)
• The ARC does not accept any liability, financial or otherwise for loss of data.
• We recommend that users employ standard industry practice for their important data and store it at sites other than the ARC, for example in their department.
Installed Software
Libraries & Compilers
• Intel C/C++
• Intel Fortran
• Intel MKL
• Portland
• BLACS
• CGNS
• HDF5
• ParMetis
• PETSc
• ScaLAPACK
• VTK
• MUMPS
• NetCDF
• DDT
Statistics
• Stata MP4
• R
• Rmpi
Chemistry
• ADF
• Gaussian03
• Gaussian09
• CPMD
• SIESTA
• CP2K
• GAMESS-US
• GROMACS
• NAMD
• Abaqus – FE
• CFD-ACE
The ‘module avail’ command on ARC services will show the applications available.
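As a minimal sketch of typical module usage on such systems (the module name loaded below is a hypothetical example, not a guarantee of what is installed on ARC):

    # List all applications and libraries available on the current system
    module avail

    # Load a package into your environment (hypothetical module name)
    module load intel-compilers

    # Show which modules are currently loaded
    module list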
How much computing knowledge do I need?
User Support
Training Courses
ARC User Guide
User Forum
Seminars on parallel Matlab & Mathematica
Advice on costing grants
Installing, testing software applications
Advice on using software, best practice, examples
What else would be of use in the future?
How do I get on the Systems?
ARC is open to all researchers at the University of Oxford.
• To use the ARC you must first set up a new project and then apply for an individual user account.
Projects can be research groups, individuals, or whatever best suits your local setup.
• For example, a research group may have two areas of research with different funding streams.
• It might make sense to create separate projects and have different user accounts for each project.
• Users can decide on the most sensible way to approach this.
Logging onto ARC Systems
Remote access
• Linux, Mac (and other Unix/Unix-like) users should use ssh to connect to the ARC systems from a terminal:
ssh -X [email protected]
• Windows users should download and install an application called PuTTY, plus Xming for X11 support
Batch jobs
• Never run intensive jobs on ARC systems without using the job scheduler
• Jobs are submitted to the scheduler using the "qsub" command and a job submission script (a sketch follows below)
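For illustration, a minimal submission script might look like the sketch below. The directives shown are typical of PBS-style schedulers that use qsub; the resource requests, job name and program are hypothetical examples rather than ARC-specific values:

    #!/bin/bash
    # Request 1 node with 16 cores and 1 hour of walltime (PBS-style syntax)
    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=01:00:00
    #PBS -N example_job

    # Move to the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Run the (hypothetical) program
    ./my_program

Saved as, say, myjob.sh, this would be submitted with 'qsub myjob.sh'; the scheduler then runs it when the requested resources become free.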
User Account Management using ‘OSCtool’
Passwords and other user functions can be managed via the osctool (to be replaced soon).
How is usage funded?
Access Policy
Direct
• Purchase of time via research grants
• For larger, sustained use of ARC systems
• Higher-priority job queue
Indirect
• Basic service
• Usage charges distributed across Divisions and Departments
• Contact us to request credits
ARC costs around £0.5M a year to run and is currently supported through a hybrid funding model: the basic provision of the service is funded through indirect charges, with additional direct costs from grants for the large users of the facility.
ARC charges
Compute Time
• Funds deposited for usage on ARC systems are converted to credits, where 1 credit = 1 processor core × 1 second of walltime.
• The current Full Economic Cost is 3,600 credits for £0.02 (i.e. 2p per core-hour); pricing is reviewed annually (see the worked example below).
• EU grants are charged differently (please speak to us first).
• Please contact us for the current ‘price’ of credits.
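As a worked example, with a hypothetical job size and the quoted rate of 2p per core-hour: a job using 16 cores for 24 hours of walltime consumes

\[
16 \times 24 = 384 \text{ core-hours}, \qquad 384 \times £0.02 = £7.68
\]

so roughly £7.68 worth of credits.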
Storage
• £250 per TB per year above 5TB space limit provided to projects
Consultancy
• Please contact us.
Parallel Computing
High Performance Computing
High Throughput Computing
Supercomputing
What is High Performance Computing?
• No single, all-encompassing definition
• For the purposes of this course, it is computing which:
– you can’t do on a desktop or server,
– requires specialised resources, and
– is carried out on multiple processors in ‘parallel’
• We’re going to be working at the “sports car” level of computing...
A useful analogy
Local desktop
• Easy to buy
• Cheap
• Low running costs
• Easy to use
• No training required (but available)
• Low performance
Servers
• Very common but more expensive
• Can be expensive to run
• Some local training may be required to use properly.
• Relatively easy to get the best out of them with some practice
• Mid-range performance
A useful analogy
Campus HPC (Tier 3)
• Less common
• Fairly expensive
• High running costs
• Training can build on previous experience BUT to squeeze out the best performance, more advanced training and years of experience needed
• New users can have accidents easily
A useful analogy
National HPC
• A few per country
• Extremely expensive
• High running costs
• Specialist training and experience required to begin usage
• New users not allowed near them
A useful analogy
Impact of HPC
• Waiting for computers is dead time.
• Take any one thing which takes a week to do in your place of work and bring it down to a few hours... how big is the impact?
• Performance also limits the scale of the problems we can tackle.
• Take a problem which you could tackle in ten times more detail, ten times the size (or both)... how big is the impact?
How fast is fast enough?
A home PC can manage on the order of a few billion calculations per second – isn't this fast enough?
To run weather and climate models in a sensible amount of time, the Met Office needs a machine which can reach 1 PFlop (a million billion calculations per second) by 2011.
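To put that in perspective, taking "a few billion" to mean roughly $4 \times 10^{9}$ calculations per second (an assumed figure for a home PC), a 1 PFlop machine is equivalent to

\[
\frac{10^{15}}{4 \times 10^{9}} \approx 250{,}000
\]

such PCs working flat out.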
Why can't we just make our PCs that fast?
Computational Performance
• Computers are an amalgamation of components, some faster than others
• Processors have enjoyed the majority of progress, but increasingly software is bottlenecked by other components:
– Processor (FAST)
– Bus (SLOW)
– Memory (SLOW)
– Network (SLOW)
– Storage (SLOW)
Making Processors Faster
1980s to early 00's – just make the processor clocks tick faster!
Higher clock speeds increase heat production; chips experience thermal limits/damage
Power consumption getting out of control
The “Megahertz Myth”
Make components smaller, faster
Runs into physical limitations
Physical Limitations
• Size of transistors:
– 2003 90nm
– 2009 32nm
– 2014 14nm
– 2015 8-11nm (TBC)
• Size of Silicon atom = 0.22nm
• Analogy:
– Silicon atom = football
– 2015 transistor = centre circle
– 2009 transistor = width of pitch
What do we do when we're near 1nm?
Solution
• Use more processors!
– 1950s: multiple processor computers
– 1980s: possible to link multiple computers together (clusters)
– Late 1990s: multiple chip PCs
– Early 00's: multiple “cores” on single PC chip
• Now: hundreds of cores on specialist hardware
But: multiple cores bring additional problems for software design and for optimal ‘placement’ of code on cores.
How does it work?
• We achieve higher performance by looking for parallelism – Items of work you can do independently at the same time.
• A Problem can have:
– lots of parallelism (little communication)
– no parallelism!
– some parts parallelise, others don't (typical)
– trivial parallelism (no communication except start/end)
Analogy: Building a House
• Building the foundation and the roof – no parallelism
• Laying the bricks – partial parallelism
• Building the walls and laying the driveway – trivially parallel
Too good to be true ?
• Splitting the task up isn't always straightforward
– Sometimes done by the application automatically
– Sometimes done by the ARC staff
– Rarely done by the end user
• Not a magic bullet approach – speedup and scaling
• Communication much slower than computation...
[Figure: example speedups – performance vs. number of processors (1–16), showing a linear and a sub-linear scaling curve]
Program Speedup: Scaling
• Algorithm may not parallelise fully (Amdahl's law)
• If 20% of a problem won't parallelise, you can never make it more than 5 times faster
http://en.wikipedia.org/wiki/Amdahl%27s_law
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (95%) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence the speedup is limited to at most 20×.
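In symbols: if a fraction $p$ of a program's runtime can be parallelised and $N$ processors are used, Amdahl's law gives the speedup

\[
S(N) = \frac{1}{(1-p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1-p}
\]

With $p = 0.95$ as in the example above, the limit is $1/0.05 = 20\times$; with $p = 0.8$ (20% serial), it is $1/0.2 = 5\times$, as stated earlier.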
Reasons for sub-linear scaling
Remember that computers are multi-component systems whose parts run at different speeds:
• Processors – very fast
• Cache – fast
• Main memory – slow
• Interconnect – slow
• Disk – very slow
Different applications place different loads on each subsystem
HPC Terminology?
• High Throughput Computing (Capacity)
• High Performance Computing (Capability)
Capacity vs Capability Computing
Capacity
• Using computing power to process many problems simultaneously
• Individual tasks, no communication between processes
• No specialist hardware interconnects needed between nodes in a cluster
Capability
• Using maximum computing power to solve a large problem that no other computer can
• One single task comprised of many child processes, all communicating with each other
• Specialist, high-performance, low-latency hardware interconnects needed between nodes in a cluster
Clusters (Distributed memory)
• Problem split across smaller machines with software, typically MPI
• Sub-tasks occasionally exchange information
• Cheap to build
• Hard to program
• Good for partially self-contained problems, e.g. fluid flow (see the sketch below)
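As a rough sketch of the distributed-memory workflow (the module and file names are hypothetical, not ARC-specific), an MPI program is compiled with the MPI compiler wrapper and launched as many communicating processes:

    # Make an MPI implementation available (hypothetical module name)
    module load mpi

    # Compile the program with the MPI compiler wrapper
    mpicc -O2 flow.c -o flow

    # Launch 64 communicating processes across the allocated nodes
    mpirun -np 64 ./flow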
Shared Memory Systems
• Lots of processors and memory in a box
• All sub processes in one box
• Acts like a large ‘desktop PC’
• Applications split into “threads”
• Usually programmed in something like OpenMP
• Easy to program
• Financially expensive
• Good for processes with common data, e.g. stats, genomics (see the sketch below)
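By way of contrast, a minimal shared-memory workflow (again a sketch, with a hypothetical file name) compiles with OpenMP support and sets the thread count through an environment variable:

    # Compile with OpenMP enabled (GCC shown; other compilers use similar flags)
    gcc -fopenmp stats.c -o stats

    # Run with 16 threads sharing one machine's memory
    export OMP_NUM_THREADS=16
    ./stats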
Other HPC Resources
SES: Centre for Innovation
– EMERALD: 372 Tesla GPUs
• 114 Teraflops
• InfiniBand and 10Gb Ethernet/Gnodal interconnect
– IRIDIS: 12,000 core Intel based cluster
• Capability/highly scaling work which cannot be accommodated on local systems
UK e-Infrastructure
National and Regional HPC Centres
MidPlus
HPC Matters?
• Analysing data - see structure in the chaos
• Modelling endless combinations of molecules to find the cure for <insert disease>
• Turn back the clock 14 billion years
• Modelling the path of storms
• Everyday life – investments, cars, shopping
• HPC Matters – http://sc14.supercomputing.org/about-sc14
Contact us
• http://www.arc.ox.ac.uk
Problems
Ideas
Questions