
Page 1: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

KAUST Supercomputing Laboratory Orientation Workshop

October 13, 2009

Page 2: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Agenda

1 Introduction to KAUST SL and team

2 Computational resources currently available

3 Computational resources available in the near future

4 Getting an account on KAUST SL machines

5 Q & A

6 Machine-room viewing

Page 3: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


KAUST Supercomputer Lab

Our Mission

• To offer resources that are world-class in both capacity and diversity

• HPC systems (BG/P, GPUs, SMP, Linux)

• Data systems (on-demand filesystems, archive)

• To assist KAUST researchers in fully exploiting these resources

• Via a talented, skilled and experienced staff

• Joint research collaborations between SL team and researchers

• SL team will conduct its own HPC exploitation research

Thank you for your continued patience and understanding!!

Page 4: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


The KAUST SL team

Management

Jim Sexton

Richard Orme

Systems Administration

Jonathon Anderson

Iain Georgeson

Research & Enablement

Aron Ahmadia

Samar Aseeri

Dodi Heryadi

Mark Cheeseman

Ying Qian

**Possibility of getting IBM expertise as part of CDCR collaboration

Page 5: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Currently available resources

Capability machines (Blue Gene/P)

• WATSONshaheen

• Shaheen (early user access only)

Capacity machines (Linux clusters)

• WATSONlinux

• Shaheen (early user access only)

• Texas A&M University Linux clusters

Data stores

• Storage available at WATSON (not backed up)

• 0.5TB shared on Shaheen

Page 6: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Available Blue Gene/P Systems

KAUST SL Orientation Session

October 13, 2009

Page 7: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Blue Gene/P – compute design

CPU: 4 cores @ 850 MHz, 13.6 GF/s

Compute card: 1 CPU, 4 GB DDR2

Node card: 32 compute cards, 0-2 IO cards, 128 GB DDR2, 435 GF/s

Rack: 32 node cards, 16-32 IO cards, 4 TB DDR2, 13.9 TF/s

Shaheen system: 4 or 8 racks, 16 or 32 TB DDR2, 55.6 or 111.2 TF/s

Page 8: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Blue Gene/P – communication networks

3D torus: point-to-point communication

• twelve 425 MB/s links per compute node (5.1 GB/s total)

• 41 or 167 TB/s aggregate for the system

Collective: optimized collective operations (broadcast, reduction, …)

• three 850 MB/s links per compute/IO node (5.1 GB/s total)

• serves as the connection between compute and IO nodes

Barrier: low-latency barriers and interrupts

External: 10GbE connection for external communication (file IO)

Page 9: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Accessing the Shaheen platforms

KAUST researchers have access to 2 BG/P's:

• WATSONshaheen (4 racks)

• Shaheen (8 racks)

Need to ssh/sftp into a front-end machine

kstfen1.watson.ibm.com

kstfen2.watson.ibm.com

shaheen.hpc.kaust.edu.sa

NOTE: the front-end machines are of a different architecture than the compute nodes

• Power6 (32-way)

• 64 GB memory

• access to shared GPFS filesystems
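For example, a minimal login and file-transfer session might look like the sketch below ("username" is a placeholder for your KAUST SL account name):

# log in to the Shaheen front-end
ssh username@shaheen.hpc.kaust.edu.sa
# or transfer files via a WATSON front-end
sftp username@kstfen1.watson.ibm.com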

Page 10: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Filesystem layout

WATSONshaheen

• 5 PB scratch GPFS (not backed up)

• No archive access for KAUST users

– Users are responsible for backing up important data

Shaheen

• Currently only 0.5PB available

• Archive is not available

• 3 GPFS filesystems shared between the BG/P and Xeon cluster

– Home

– Project (shared between users of the same project)

– scratch

Page 11: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Shaheen – programming environment

Full IBM and GNU compiler suite available on both BG/P systems

MPI and OpenMP supported

applications

abinit

cpmd

dlpoly

espresso

gamess

gromacs

nwchem

openfoam

siesta

math libraries

blas/lapack

boost

essl

fftw

hypre

parmetis

petsc

scalapack

trilinos

IO libraries

hdf5

netcdf

pnetcdf

debugging

gdb

Totalview (Shaheen only)

performance analysis

fpmpi

gprof

ipm

mpiP

mpitrace

tau

Supported software:

• located under /soft

• controlled via Modules

Please allow some time for the Shaheen supported software stack to be built.

Page 12: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Shaheen - compiling

Do not use the normal compiler calls (gcc, gfortran, xlc, …):

– they create binaries that run on the login nodes, NOT the compute nodes

– the login and compute nodes have different architectures

Use the IBM-provided wrapper compiler commands:

– they create binaries for the compute nodes

– native MPI support is included

GNU:

mpicc -o test.exe test.c
mpicxx -o test.exe test.cpp
mpif77 -o test.exe test.f
mpif90 -o test.exe test.f90

IBM:

mpixlc -o test.exe test.c
mpixlcxx -o test.exe test.cpp
mpixlf90 -o test.exe test.f90
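As a sketch only (the -O3 and -qsmp=omp flags are illustrative IBM XL options, not site requirements, and this assumes the usual thread-safe _r wrapper variants are installed), a hybrid MPI/OpenMP code would be built as:

# hybrid MPI + OpenMP build with a thread-safe IBM wrapper
mpixlc_r -O3 -qsmp=omp -o hybrid.exe hybrid.c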

Page 13: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Shaheen – running a job

WATSONshaheen

• No job management or queuing system present

– all jobs are run interactively via the mpirun command

mpirun -np 16 -partition r001n00-c32i2 -VN -cwd `pwd` -exe test.exe

where

-np indicates the number of MPI tasks

-partition indicates the BG/P partition to use

-VN indicates the run mode

-cwd gives the runtime directory

-exe gives the name of the executable to be run

In the above example, test.exe is run on 4 quad-core CPUs (16 MPI tasks in VN mode) in the current directory

• How do I find an appropriate BG/P partition?

/soft/bgp_partition_finder <#_of_quad-core_cpus>

NOTE: this is only a simple script and may occasionally fail
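Putting the two steps together, a typical interactive run on WATSONshaheen might look like this sketch (the partition name is whatever the finder script reports; test.exe is the example binary from the compile slide):

# find a free partition with 8 quad-core CPUs ...
/soft/bgp_partition_finder 8
# ... then launch 32 MPI tasks on it in VN mode (4 tasks per CPU)
mpirun -np 32 -partition <reported_partition> -VN -cwd `pwd` -exe test.exe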

Page 14: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Shaheen – running a job continued

WATSONshaheen continued…

• Run modes

– SMP: 1 MPI task per CPU, 4 GB available to the task

– DUAL: 2 MPI tasks per CPU, 2 GB available to each task

– VN: 4 MPI tasks per CPU, 1 GB available to each task

Shaheen

• LoadLeveler job management system to be used (a job-file sketch follows below)

– 2 queues (12-hour production and 30-minute development)

– Users do not need to specify a partition id

• Pre/post-processing work is to be run on the Linux cluster

– Shared filesystems allow easy data transfer
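As referenced above, a rough sketch of what a Shaheen LoadLeveler job file might look like; job_type = bluegene and bg_size are standard LoadLeveler Blue Gene keywords, but the class name, wall-clock limit and sizes below are assumptions, not confirmed site settings:

#! /bin/csh -f
#@ job_type = bluegene
#@ bg_size = 64
#@ class = production
#@ wall_clock_limit = 12:00:00
#@ output = out
#@ error = err
#@ queue

# LoadLeveler allocates the partition, so no -partition flag is needed;
# 64 nodes in VN mode gives 256 MPI tasks
mpirun -mode VN -np 256 -cwd `pwd` -exe test.exe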

Page 15: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Available IBM Linux Clusters

KAUST SL Orientation Session

October 13, 2009

Page 16: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters - overview

KAUST researchers have access to two clusters

• WATSONlinux (32-node system @ NY, USA)

• Shaheen (96-node system @ KAUST)

NOTE: these systems are primarily intended as auxiliary computational resources for pre/post-processing and for initial x86 code tests prior to enablement on Shaheen

Page 17: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – overview continued

Page 18: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Accessing the Linux clusters

Need to ssh/sftp into a front-end machine

kstxfen1.watson.ibm.com

kstxfen2.watson.ibm.com

shaheenx.hpc.kaust.edu.sa

Page 19: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters - Modules

A simple mechanism to update a user's environment, such as PATH, MANPATH, NLSPATH, LD_LIBRARY_PATH, etc.

module list -> to show currently loaded modules

module avail -> to show available modules

module whatis <name> -> to describe the <name> module

module load <name> -> to load the <name> module

[xxxxxxxx@n1 ~]$ module avail

--------------------------------- /opt/modules ---------------------------------
Loadleveler          hdf5                    postprocessing/nccmp
compilers/GNU        netcdf                  postprocessing/ncl
compilers/INTEL      netcdf4                 totalview
fftw2                postprocessing/ferret   wien2k
fftw3                postprocessing/grads
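A minimal follow-up session, using one of the modules listed above (the module list output shown is indicative):

[xxxxxxxx@n1 ~]$ module load compilers/INTEL
[xxxxxxxx@n1 ~]$ module list
Currently Loaded Modulefiles:
  1) compilers/INTEL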

Page 20: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – Programming Environment

Compilers

GNU and Intel compilers (C, C++ and FORTRAN) available

PGI has been ordered

MPI Support

MPICH2 is default

MPICH1 and OpenMPI are available as well

It is strongly encouraged that Modules be used for compiling and linking

Page 21: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – Compiling serial codes

Intel compilers:

module load compilers/INTEL

ifort -> calls the Intel Fortran compiler

icc -> calls the Intel C compiler

icpc -> calls the Intel C++ compiler

GNU compilers:

module load compilers/GNU

gfortran -> calls the GNU Fortran compiler

gcc -> calls the GNU C compiler

g++ -> calls the GNU C++ compiler

Page 22: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – Compiling MPI codes

Intel compilers:

module load compilers/INTEL

mpicc -> calls the Intel C compiler with MPI support enabled

mpic++ -> calls the Intel C++ compiler with MPI support enabled

mpif77 -> calls the Intel F77 compiler with MPI support enabled

mpif90 -> calls the Intel F90 compiler with MPI support enabled

GNU compilers:

module load compilers/GNU

mpicc -> calls the GNU C compiler with MPI support enabled

mpic++ -> calls the GNU C++ compiler with MPI support enabled

mpif77 -> calls the GNU F77 compiler with MPI support enabled

mpif90 -> calls the GNU F90 compiler with MPI support enabled
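For example, a minimal sketch of building a small MPI test code with the GNU toolchain (the file names and -O2 flag are illustrative):

module load compilers/GNU
mpif90 -O2 -o hello_mpi hello_mpi.f90
mpicc -O2 -o hello_mpi_c hello_mpi.c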

Page 23: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – INTEL MKL

The following Intel Math Kernel Library (MKL) components are available:

BLAS

LAPACK

BLACS

SCALAPACK

Page 24: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – INTEL MKL

Linking codes with Intel MKL BLAS and LAPACK

Static, sequential, 64-bit integer

$MKLPATH/libmkl_solver_ilp64_sequential.a -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread

Dynamic, multi-threaded, 64-bit integer

-L$MKLPATH $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread
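As an illustrative end-to-end sketch (myprog.f90 is a placeholder, and MKLPATH is assumed to point at the MKL library directory), the dynamic multi-threaded line above would be used like so; note that the ilp64 libraries expect 64-bit default integers, hence -i8:

module load compilers/INTEL
# compile and link against dynamic, multi-threaded, 64-bit-integer MKL
ifort -i8 myprog.f90 -o myprog \
  -L$MKLPATH $MKLPATH/libmkl_solver_ilp64.a \
  -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group \
  -openmp -lpthread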

Page 25: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – INTEL MKL

Linking codes with Intel MKL SCALAPACK and BLACS

SCALAPACK: Static, multi-threaded, 64-bit integer, MPICH2

$MKLPATH/libmkl_scalapack_ilp64.a $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -openmp -lpthread

BLACS: Dynamic, multi-threaded, 64-bit integer, MPICH2

-L$MKLPATH $MKLPATH/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -Wl,--end-group -openmp -lpthread

Page 26: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – running a job

LoadLeveler job management and queuing system present

Useful LoadLeveler commands:

llsubmit <job file>   submit a job to LoadLeveler
llq                   show queued and running jobs
llcancel <job_id>     delete a queued or running job
llstatus              display system information

[xxxxx@n1 ~]$ module load compilers/INTEL
[xxxxx@n1 ~]$ module load Loadleveler
[xxxxx@n1 ~]$ llsubmit jobscript
llsubmit: The job "n1.linux32.watson.ibm.com.96" has been submitted.
[xxxxx@n1 ~]$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
n1.96.0                  xxxxx      10/13 03:42 R  50  No_Class     n1

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
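To delete the job shown above before it finishes, pass its id to llcancel (the output message is indicative):

[xxxxx@n1 ~]$ llcancel n1.96.0
llcancel: Cancel command has been sent to the central manager.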

Page 27: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – constructing a jobfile

EXAMPLE: parallel job with only MPI tasks

#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 1
#@ notification = never
#@ environment = COPY_ALL
#@ queue

cd $LOADL_STEP_INITDIR
mpdboot -n 1 -f ${LOADL_HOSTFILE}
mpiexec -n 8 ./hello_intel
mpdallexit

**Here 8 MPI tasks are spawned on a single (2 quad-core Xeon) compute node

Page 28: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – constructing a jobfile

EXAMPLE: parallel job with only OpenMP threads

#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 1
#@ notification = never
#@ environment = COPY_ALL
#@ queue

setenv OMP_NUM_THREADS 8

cd $LOADL_STEP_INITDIR
./hello_omp_gnu

**Here 8 OpenMP threads are spawned on a single (2 quad-core Xeon) compute node

Page 29: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – constructing a jobfile

EXAMPLE: parallel job with 2 MPI tasks that each spawn 8 OpenMP threads

#! /bin/csh -f
#@ output = out
#@ error = err
#@ job_type = parallel
#@ node = 2
#@ notification = never
#@ environment = COPY_ALL
#@ queue

setenv OMP_NUM_THREADS 8

cd $LOADL_STEP_INITDIR
mpdboot -n 2 -f ${LOADL_HOSTFILE}
mpiexec -np 2 ./hello_mpi_omp_intel
mpdallexit

**Here 2 MPI tasks are spawned, one per compute node, and each task runs 8 OpenMP threads on its (2 quad-core Xeon) node

Page 30: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


IBM Linux clusters – 3rd party software

Installation/support of 3rd-party software is based on mutual agreement between the requesting PI and KAUST SL

applications

gaussian

schrodinger

wien2k

math libraries

blas/lapack

boost

INTEL MKL

fftw

GSL

IO libraries

hdf5

netcdf

pnetcdf

debugging

gdb

Totalview (Shaheen only)

Supported software:

• located under /opt

• controlled via Modules

Please allow some time for the supported software stack to be built.

Post-processing

ferret

grads

ncl

ncview

udunits

Page 31: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Available Texas A&M Linux Clusters

KAUST SL Orientation Session

October 13, 2009

Page 32: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Resources Available in the “Near Future”

KAUST SL Orientation Session

October 13, 2009

Page 33: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


More resources are on the way…

Shaheen installation continues

• Expansion from 8 to 16 racks

• Full 1.9PB shared disk space

Archive not built yet

Other HPC systems are being shipped

• 256-node x86 Linux cluster

• 4 SMP nodes

• 16 TESLA GPGPU nodes

• 1 PB shared disk

Page 34: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Project & Account Creation Procedures

KAUST SL Orientation Session

October 13, 2009

Page 35: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Accessing Shaheen

Organization application

• Terms and conditions acknowledgement

• Funding authorization

Project proposal

• Scientific description

• Authorized researchers

Individual application

• Personal information

Page 36: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Accessing Shaheen (restrictions)

Nationals of “group E” countries

• Cuba

• Iran

• North Korea

• Sudan

• Syria

Page 37: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Accessing Shaheen (restrictions)

Unauthorized research

• Weapons

• Rockets

• Unmanned aerial vehicles

• Nuclear fuel facilities (except by treaty)

• Heavy water production facilities (except by treaty)

Page 38: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009


Contacting Us

Our internal wiki/website is available to KAUST users

http://www.hpc.kaust.edu.sa

For HPC Support Queries

[email protected]

Or drop by and see us in person

Level 0, Building 1 (across from the cafeteria)

Offices 0121-0126

Page 39: KAUST Supercomputing Laboratory Orientation Workshop October 13, 2009

Thank you for your attention

Questions?