
Getting Started on Topsail

Charles Davis
ITS Research Computing
February 10, 2010

Outline

History of Topsail
Structure of Topsail
File Systems on Topsail
Compiling on Topsail
Topsail and LSF

Initial Topsail Cluster

Initially a 1,040-CPU Dell Linux cluster
• 520 dual-socket, single-core nodes
Infiniband interconnect
Intended for capability research
Housed in the ITS Franklin machine room
Fast and efficient for large computational jobs

Topsail Upgrade 1

Topsail upgraded to 4,160 CPUs
• Replaced blades with dual-socket, quad-core Intel Xeon 5345 (Clovertown) processors
• Quad-core with 8 CPUs/node
Increased the number of processors, but decreased individual processor speed (from 3.6 GHz to 2.33 GHz)
Decreased energy usage and cooling requirements
Summary: slower clock speed, better memory bandwidth, less heat
• Benchmarks tend to run at the same speed per core
• Topsail shows a net ~4X improvement
• Of course, this number is VERY application dependent

Topsail – Upgraded Blades

52 chassis: basis of node names
• Each holds 10 blades -> 520 blades total
• Nodes = cmp-chassis#-blade#
Old compute blades: Dell PowerEdge 1855
• 2 single-core Intel Xeon EM64T 3.6 GHz processors
• 800 MHz FSB
• 2 MB L2 cache per socket
• Intel NetBurst microarchitecture
New compute blades: Dell PowerEdge 1955
• 2 quad-core Intel 2.33 GHz processors
• 1333 MHz FSB
• 4 MB L2 cache per socket
• Intel Core 2 microarchitecture

Topsail Upgrade 2

Most recent Topsail upgrade (Feb/Mar '09)
Refreshed much of the infrastructure
Improved IBRIX filesystem
Replaced and improved Infiniband cabling
Moved the cluster to the ITS-Manning building
• Better cooling and UPS

Current Topsail Architecture

Login node: 8 CPU @ 2.3 GHz Intel EM64T, 12 GB memory

Compute nodes: 4,160 CPU @ 2.3 GHz Intel EM64T, 12 GB memory

Shared disk: 39TB IBRIX Parallel File System

Interconnect: Infiniband 4x SDR

64-bit Linux operating system

Multi-Core Computing

Processor Structure on Topsail

• 500+ nodes

• 2 sockets/node

• 1 processor/socket

• 4 cores/processor (Quad-core)

• 8 cores/node

http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html

Multi-Core Computing

The trend in High Performance Computing is towards multi-core or many core computing.

More cores at slower clock speeds for less heat

Now, dual and quad core processors are becoming common.

Soon, processors with 64+ cores will be common
• And these may be heterogeneous!

The Heat Problem

Taken From: Jack Dongarra, UT

More Parallelism

Taken From: Jack Dongarra, UT

Infiniband Connections

Connections come in single (SDR), double (DDR), and quad (QDR) data rates.
• Topsail is SDR. Single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated: 1x, 4x, 12x.
• Topsail is 4x. Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data rate is four-fifths of the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s per link, respectively.
Data rate for Topsail is therefore 4 links × 2.5 Gbit/s × 8/10 = 8 Gbit/s (4x SDR).

Topsail Network Topology

Infiniband Benchmarks

Point-to-point (PTP) intranode communication on Topsail for various MPI send types
Peak bandwidth:
• 1288 MB/s
Minimum latency (1-way):
• 3.6 µs

Infiniband Benchmarks

Scaled aggregate bandwidth for MPI Broadcast on Topsail

Note the good scaling throughout the tested range (24 to 1,536 cores)

Login to Topsail

Use ssh to connect:
• ssh topsail.unc.edu
On Windows, use SSH Secure Shell.
For using interactive programs with X-Windows display:
• ssh -X topsail.unc.edu
• ssh -Y topsail.unc.edu
Off-campus users (i.e., domains outside of unc.edu) must use a VPN connection.

Topsail File Systems

39 TB IBRIX Parallel File System
Split into home and scratch space
Home: /ifs1/home/my_onyen
Scratch: /ifs1/scr/my_onyen
Mass Storage
• Only Home is backed up
• /ifs1/home/my_onyen/ms

File System Limits

500 GB total limit per user
Home: 15 GB limit, for backups
Scratch:
• No limit except the 500 GB total
• Not backed up
• Periodically cleaned
Few installed packages/programs

Compiling on Topsail

Modules
Serial programming:
• Intel Compiler Suite for Fortran77, Fortran90, C, and C++ – recommended by Research Computing
• GNU
Parallel programming:
• MPI
• OpenMP – must use the Intel Compiler Suite; compiler flag: -openmp; must set OMP_NUM_THREADS in the submission script (see the sketch below)
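As a minimal sketch of the OpenMP workflow (hello_omp.c and run.omp are hypothetical names; 8 threads assumes one per core on a Topsail node):

icc -openmp -o hello_omp hello_omp.c    # compile with the Intel compiler

# run.omp – submission script:
#BSUB -q week
#BSUB -n 8 -R "span[ptile=8]"           # OpenMP is shared memory: keep all cores on one node
#BSUB -o out.%J
#BSUB -e err.%J
export OMP_NUM_THREADS=8                # one thread per requested core
./hello_omp

# submit with:
bsub < run.omp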

Compiling Modules

Module commands:
• module – list the module subcommands
• module avail – list the available modules
• module add – add a module for the current session
• module list – list the modules being used
• module clear – remove all loaded modules for the current session
To load a module at every login, add it in your startup files
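For example, to pick up the MVAPICH/Intel toolchain used in the compile examples later in this document, a session might look like:

module avail                       # see what is installed
module add hpc/mvapich-intel-11    # load MVAPICH + Intel compilers for this session
module list                        # confirm it is loaded
which mpicc                        # the wrapper should now be on your PATH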

Available Compilers

Intel – ifort, icc, icpc
GNU – gcc, g++, gfortran
Libraries – BLAS/LAPACK
MPI:
• mpicc/mpiCC
• mpif77/mpif90
The mpi* commands are just wrappers around the Intel or GNU compiler
• They add the location of the MPI libraries and include files
• Provided as a convenience
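To see what the wrapper adds, MPICH-derived implementations such as MVAPICH accept a -show flag that prints the underlying compiler command without running it (the output below is illustrative, not Topsail's actual paths):

mpicc -show
# e.g.  icc -I/opt/mvapich/include -L/opt/mvapich/lib -lmpich ...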

Test MPI Compile

Copy cpi.c to your scratch directory:
• cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/.
Add the Intel MPI module:
• module load hpc/mvapich-intel-11
Confirm the Intel module:
• which mpicc
Compile the code:
• mpicc -o cpi cpi.c

MPI/OpenMP Training

Courses are taught throughout the year by Research Computing: http://learnit.unc.edu/workshops
Next courses:
• MPI – Summer
• OpenMP – March 3rd

Running Programs on Topsail

Upon ssh to Topsail, you are on the login node.
Programs SHOULD NOT be run on the login node.
Submit programs to the compute nodes (4,160 CPUs).
Submit jobs using the Load Sharing Facility (LSF).

Job Scheduling Systems

Allocate compute nodes to job submissions based on user priority, requested resources, execution time, etc.
Many types of schedulers:
• Load Sharing Facility (LSF) – used by Topsail
• IBM LoadLeveler
• Portable Batch System (PBS)
• Sun Grid Engine (SGE)

Load Sharing Facility (LSF)

[Diagram: a job submitted with bsub flows from the submission host (LIM, Batch API) through the master host (MLIM, MBD) and a queue to the execution host (SBD, child SBD, LIM, RES), which runs the user job; load information is exchanged with other hosts.]
• LIM – Load Information Manager
• MLIM – Master LIM
• MBD – Master Batch Daemon
• SBD – Slave Batch Daemon
• RES – Remote Execution Server

Submitting a Job to LSF

For a compiled MPI job:
• bsub -n "<number of CPUs>" -o out.%J -e err.%J -a mvapich mpirun ./mycode
bsub – the LSF command that submits a job to the compute nodes
bsub -o and bsub -e
• Job output and errors are saved to files in the submission directory (%J expands to the job ID)
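Filled in with concrete values (8 CPUs here is just an example), the submission reads:

bsub -n 8 -o out.%J -e err.%J -a mvapich mpirun ./mycode
# -n 8        request 8 CPUs
# -o / -e     stdout and stderr files; %J becomes the job ID
# -a mvapich  select the MVAPICH integration so mpirun runs correctly under LSF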

Queue System on Topsail

Topsail uses queues to distribute jobs.
Specify a queue with -q in bsub:
• bsub -q week …
No -q specified = default queue (week)
Queues vary depending on the size and required time of jobs
See the listing of queues:
• bqueues

Topsail Queues

Queue    Time Limit    Jobs/User    CPU/Job
int      2 hrs         128          ---
debug    2 hrs         128          ---
day      24 hrs        1024         4 – 128
week     1 week        1024         4 – 128
512cpu   4 days        1024         32 – 1024
128cpu   4 days        1024         32 – 128
32cpu    2 days        1024         4 – 32
chunk    4 days        1024         Batch Jobs

• Most jobs do not scale very well over 128 CPUs.

Submission Scripts

It is easier to write a submission script that can be edited for each job submission.
Example script file – run.hpl:
#BSUB -n "<number of CPUs>"
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode

Submit with: bsub < run.hpl
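A slightly fuller variant, combining options that appear elsewhere in this document (the queue, job name, and node placement are example values):

#BSUB -q week               # queue (see bqueues)
#BSUB -J mycode_run         # job name shown by bjobs
#BSUB -n 8                  # number of CPUs
#BSUB -R "span[ptile=8]"    # place all 8 processes on one node
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode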

More bsub Options

bsub -x – NO LONGER USE!!!!
• Exclusive use of a node
• Was used extensively when first testing code
bsub -n 4 -R span[ptile=4]
• Forces all 4 processors to be on the same node
• Similar to -x
bsub -J job_name
• Assigns a name to the job
See the man pages for a complete description:
• man bsub

Performance Test

Gromacs MD simulation of bulk water
Simulation setups:
• Case 1: -n 8 -R span[ptile=1] (8 nodes, 1 process each)
• Case 2: -n 8 -R span[ptile=8] (1 node, 8 processes)
Simulation times (1 ns MD):
• Case 1: 1445 sec
• Case 2: 1255 sec
Using a single node (Case 2) improved speed by ~13%
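Expressed as full submissions, the two cases look like the sketch below (./my_gromacs_run stands in for the actual Gromacs command line, which is not given here):

# Case 1: spread the 8 MPI processes across 8 different nodes
bsub -q week -n 8 -R "span[ptile=1]" -a mvapich mpirun ./my_gromacs_run
# Case 2: pack all 8 processes onto a single 8-core node
bsub -q week -n 8 -R "span[ptile=8]" -a mvapich mpirun ./my_gromacs_run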

Following a Job After Submission

bjobs
• bjobs -l JobID
• Shows the current status of the job
bhist
• bhist -l JobID
• More detailed information regarding the job history
bkill
• bkill -r JobID
• Ends a job prematurely
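A typical check-up on a job (the JobID 123456 is hypothetical):

bjobs                  # list your pending and running jobs
bjobs -l 123456        # full status of one job
bhist -l 123456        # detailed history: pending time, execution host, etc.
bkill -r 123456        # remove the job if it is misbehaving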

Submit Test MPI Job

Submit the test MPI program on Topsail:
• bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi
Follow the submission: bjobs
Output is stored in the out.%J file

Pre-Compiled Programs on Topsail

Some applications are precompiled for all users:
• /ifs1/apps
• Amber, Gaussian, Gromacs, NetCDF, NWChem, R
Add an application to your path using the module commands:
• module avail – shows the available applications
• module add – adds a specific application
Once the module command is used, the executable is on your path

Test Gaussian Job on Topsail

Add the Gaussian application to your path:
• module add apps/gaussian-03e01
• module list
Copy the input com file:
• cp /ifs1/scr/cdavis/Topsail/water.com .
Check that the executable has been added to the path:
• echo $PATH
Submit the job:
• bsub -q week -n 4 -e err.%J -o out.%J g03 water.com

Common Error 1

If a job immediately dies, check the err.%J file
err.%J file has the error:
• Can't read MPIRUN_HOST
Problem: MPI environment settings were not correctly applied on the compute node
Solution: include mpirun in the bsub command

Common Error 2

Job immediately dies after submission
err.%J file is blank
Problem: ssh passwords and keys were not correctly set up at the initial login to Topsail
Solution:
• cd ~/.ssh/
• mv id_rsa id_rsa-orig
• mv id_rsa.pub id_rsa.pub-orig
• Log out of Topsail
• Log in to Topsail and accept all defaults

Interactive Jobs

To run long shell scripts on Topsail, use the int queue
bsub -q int -Ip /bin/bash
• This bsub command provides a prompt on a compute node
• You can run a program or shell script interactively from the compute node
The TotalView debugger can also be run interactively on Topsail
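A minimal interactive session might look like this (myscript.sh is a hypothetical script name):

bsub -q int -Ip /bin/bash    # request an interactive shell on a compute node
# ...wait for dispatch; a prompt appears on the compute node...
./myscript.sh                # run your work here instead of on the login node
exit                         # give the node back when finished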

Further Help with Topsail

More details about using Topsail can be found in the Getting Started on Topsail help document:
• http://help.unc.edu/?id=6214
• http://keel.isis.unc.edu/wordpress/ – ON CAMPUS only
For assistance with Topsail, please contact the ITS Research Computing group
• Email: research@unc.edu
For immediate assistance, see the manual pages on Topsail:
• man <command>
