scientific computation (at nc state)

39
Scientific Computation (at NC State) Gary Howell, HPC/OIT NC State University gary [email protected] Scientific Computation (at NC State) – p. 1/2

Upload: doquynh

Post on 01-Jan-2017

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scientific Computation (at NC State)

Scientific Computation (at NCState)

Gary Howell, HPC/OIT

NC State University

gary [email protected]

Scientific Computation (at NC State) – p. 1/29

Page 2: Scientific Computation (at NC State)

Science

Theoretical

Experimental

Computational

Scientific Computation (at NC State) – p. 2/29

Page 3: Scientific Computation (at NC State)

Science

Theoretical Simplify to get ideal cases

Experimental

Computational Last 50 years – leave in asmuch complication as the computer can dealwith

Scientific Computation (at NC State) – p. 3/29

Page 4: Scientific Computation (at NC State)

Science

Theoretical Simplify to get ideal cases

Experimental

Computational Last 50 years – leave in asmuch complication as the computer can dealwith

Algorithm Development?

Scientific Computation (at NC State) – p. 3/29

Page 5: Scientific Computation (at NC State)

Efficiency

Ease in setting up the computation

Using resources efficiently

Scientific Computation (at NC State) – p. 4/29

Page 6: Scientific Computation (at NC State)

Efficiency

Ease in setting up the computation High levellanguages are quick for programming .. canget the answer same day ?

Using resources efficiently Can get a betterquality answer from the same computationalresource .. price of much work for theprogrammer ?

Scientific Computation (at NC State) – p. 5/29

Page 7: Scientific Computation (at NC State)

Command Line vs. Mouse

Point and Click computing

Command Line

Scientific Computation (at NC State) – p. 6/29

Page 8: Scientific Computation (at NC State)

Command Line vs. Mouse

Point and Click computing Can only do whatis already set up

Command Line linux or cygwin fromWindows or darwin from a Mac. Once you’velaboriously constructed a sequence ofcommands, you turn it into a file and just typethe name of the file.

Need to use a native editor to linux, cygwin,darwin, as Windows dos files have an extrafew symbols

Scientific Computation (at NC State) – p. 6/29

Page 9: Scientific Computation (at NC State)

High Level Languages

Can get to an answer quickly .. more powerfulthan a calculator ..

Maple, Mathematica

Matlab,

Sas

Python (numpy, scipy)

Scientific Computation (at NC State) – p. 7/29

Page 10: Scientific Computation (at NC State)

High Level Languages

Can get to an answer quickly .. more powerfulthan a calculator ..

Maple, Mathematica Reduce

Matlab, Octave

Sas R

Python (numpy, scipy) color are open source

Scientific Computation (at NC State) – p. 7/29

Page 11: Scientific Computation (at NC State)

High Level Languages

Can get to an answer quickly .. more powerfulthan a calculator ..

Maple, Mathematica Reduce,Sage includesall the red colored languages

Matlab, Octave

Sas R

Python (numpy, scipy) color are open source

Open Source codes are good when you move onand find that you might have to purchase yourown license

Scientific Computation (at NC State) – p. 7/29

Page 12: Scientific Computation (at NC State)

Some more specialized packages

finite elements

fluids

chemistry codes

weather and oceanographic

genetics codes

Scientific Computation (at NC State) – p. 8/29

Page 13: Scientific Computation (at NC State)

Some more specialized packages

finite elements Abaqus, ANSYS

fluids CFX

chemistry codes Gaussian, VASP, Amber

weather and oceanographic

genetics codes

Scientific Computation (at NC State) – p. 8/29

Page 14: Scientific Computation (at NC State)

Some more specialized packages

finite elements Abaqus, ANSYS Elmer

fluids CFX OpenFOAM

chemistry codes Gaussian, VASP, Amberlammps, nwchem, abinit, espresso, ..

weather and oceanographic wrf wrf-chem,cmaq, ROMS ..

genetics codes Mr. Bayes, BLAST

Scientific Computation (at NC State) – p. 8/29

Page 15: Scientific Computation (at NC State)

Lower Level Languages

Fortran 77 – slightly higher level than C

C – more versatile than Fortran

Scientific Computation (at NC State) – p. 9/29

Page 16: Scientific Computation (at NC State)

Lower Level Languages

Fortran 77 – slightly higher level than C ,Fortran 90

C – more versatile than Fortran , C++

Fortran 90 and C++ are more "modern". Havefeatures that allow large packages to be writtenand maintained. But are also more complicatedthan C or Fortran 77

Scientific Computation (at NC State) – p. 9/29

Page 17: Scientific Computation (at NC State)

Lower Level Languages

Fortran and C codes need to be compiled. Addsan extra step to the code writing process. Allowsloops to execute efficiently.For languages like R and Matlab-Octave, we canwrite the code a line at a time and execute it,good for verifying it as you go. For Fortran or Cyou have to compile then execute. If you want tostep through the code, you attach a debugger tothe compiled code.

Scientific Computation (at NC State) – p. 10/29

Page 18: Scientific Computation (at NC State)

Libraries

dense linear algebra

sparse linear algebra

FFT

numeric libraries (e.g. evaluate Besselfunction, generate random number, computeintegrals )

Scientific Computation (at NC State) – p. 11/29

Page 19: Scientific Computation (at NC State)

Linking to Libraries

To compile codes into executables in 2 phases.

compile .f or .c (source code files) to .o objectfiles by for example

pgf90 -c foo.f

Libraries are collections of .o files with anextension .a Fortran modules have extension.mod. The .a, .o and .mod files are linkedtogether to get an executable "foo" by (forexample)

pgf90 -o foo *.o *.a *.mod

Then we can run the executable by

./fooScientific Computation (at NC State) – p. 12/29

Page 20: Scientific Computation (at NC State)

Fast High level languages

If the high level code, e.g. Matlab as a wrapperfor LAPACK, is spending most of its time in callsto LAPACK, it’s pretty efficient.You can sometimes get the high level code to callC or Fortran subroutines .. e.g. mex files inMatlab.

Scientific Computation (at NC State) – p. 13/29

Page 21: Scientific Computation (at NC State)

Converting Code

You can transition high level code to low level byrewriting it so that lines of the code correspond tolibrary calls.For example, "y =A*x" in Matlab corresponds to acall to the DGEMV subroutine (from BLAS)called in a Fortran or C code.

Scientific Computation (at NC State) – p. 14/29

Page 22: Scientific Computation (at NC State)

Parallelization Libraries

Shared Memory

Distributed Memory

Scientific Computation (at NC State) – p. 15/29

Page 23: Scientific Computation (at NC State)

Parallelization Libraries

Shared Memory Processors can see thesame variables .. access the same RAM, e.g.multi-core machines. Many NCSUcomputational nodes have 8 cores.

Distributed Memory Processors are joined bya network, e.g., ethernet, programmer mustspecify what information is to be sent over thenetwork. Hundreds of computational nodesare available.

Scientific Computation (at NC State) – p. 15/29

Page 24: Scientific Computation (at NC State)

Shared Memory

Pthreads, Cilk, OpenMP, UPC

OpenMP is particularly easy to use. InsertPragmas in a Fortran or C code to run a loopin parallel. The code will still run in serialmode. See the short course on-line athpc.ncsu.edu (under Courses).

All our new HPC blades have at least 8 cores.Some have 16 cores and 128 GBytes ofRAM. Anyone who can write a code inFortran or C can try OpenMP at a fraction ofthe effort of writing the code.

Scientific Computation (at NC State) – p. 16/29

Page 25: Scientific Computation (at NC State)

Shared Memory Libraries

One way to take advantage of shared memory isjust to call shared memory libraries.

dense linear algebra

sparse linear algebra

FFT

Scientific Computation (at NC State) – p. 17/29

Page 26: Scientific Computation (at NC State)

Shared Memory Libraries

One way to take advantage of shared memory isjust to call shared memory libraries.

dense linear algebra – acml and mkl BLASand LAPACK libraries. use 64 bit integerversions for large problems

sparse linear algebra – SuperLUmulti-threaded version

FFT – Multi-thread version of fftw

Scientific Computation (at NC State) – p. 17/29

Page 27: Scientific Computation (at NC State)

Distributed Memory via MPI

MPI –Calls to Message Passing Interface libraryroutines "mail" i data from one processor toanother. Accomplished via MPI library callsinserted into Fortran or C (or R or Matlab orJava) routines.Most computer codes which use manyprocessors use MPI to communicate betweenprocessors. See short course at hpc.ncsu.edufor a short introduction.

Scientific Computation (at NC State) – p. 18/29

Page 28: Scientific Computation (at NC State)

Difficulties of MPI

Messages are slow compared tocomputation. Starting a communication is thesame time as O(105) floating point adds ormultiplies point adds or multiplies (flops). Ittakes only a few hundred flops to equal afetch of a number from RAM (memory).

Can require substantial reorganisation ofcode and careful distribution of data.

Debugging can be difficult.

Scientific Computation (at NC State) – p. 19/29

Page 29: Scientific Computation (at NC State)

Advantages of MPI

It’s standard, available everywhere, will stillwork on the next comuter cluster you use.

We (NC State) have thousands oncomputational nodes on-line. You can get anaccount to use them. (get your advisor to goto hpc.ncsu.edu and request an account).

Scientific Computation (at NC State) – p. 20/29

Page 30: Scientific Computation (at NC State)

Many MPI codes already exist

For example, the specialized packages from aprevious slide.

finite elements – Abaqus, ANSYS (Elmer)

fluids – CFX (OpenFOAM)

chemistry codes – Gaussian, VASP, Amber(lammps, nwchem, abinit, espresso, .. )

weather and oceanographic – (wrf wrf-chem,cmaq, ROMS)

genetic tree codes (Mr. Bayes, BLAST)

Scientific Computation (at NC State) – p. 21/29

Page 31: Scientific Computation (at NC State)

Distributed Memory Libraries

And of course many distributed memory librariesexist.

dense linear algebra

sparse linear algebra

FFT

Scientific Computation (at NC State) – p. 22/29

Page 32: Scientific Computation (at NC State)

Distributed Memory Libraries

And of course many distributed memory librariesexist.

dense linear algebra – SCALAPACK

sparse linear algebra –PETsc or Trilinos

FFT –distributed version of fftw

Scientific Computation (at NC State) – p. 22/29

Page 33: Scientific Computation (at NC State)

Running parallel codes on a cluster

To run on in serial, in linux we might type./fooand then wait for the prompt to come back. But ifeveryone does this on one of the cluster loginnodes, then it locks up and we have to reboot.Generally, you’re sharing the login node with lotsof other users and really it’s nothing but a"souped up PC" so you’d be better off runningthe code on your laptop.

Scientific Computation (at NC State) – p. 23/29

Page 34: Scientific Computation (at NC State)

Running jobs on a cluster

So on any cluster, you instead submit jobs to runin a queue. On our cluster you submit a job tothe queue bybsub < bfoowhere bfoo is a file you wrote specifying howmuch time the job will need, how manyprocessors, and names of the files where outputwill be written.The job or jobs may run a few days. You don’thave to stay logged in, but can monitor theprogress of the output files.

Scientific Computation (at NC State) – p. 24/29

Page 35: Scientific Computation (at NC State)

Running jobs on a cluster – II

For a serial job one line of bfoo might be "./foo(same as running from a command line.) For aparallel job the corresponding line might bempiexec_hydra ./fooSee the HowTo on hpc.ncsu.edu

Scientific Computation (at NC State) – p. 25/29

Page 36: Scientific Computation (at NC State)

GPUs are fast

Graphical Processing Units were developed torun games. They have high bandwidth and lotsof simple processors. They can be programmedwith

OpenCL (standard)

CUDA – specific to NVIDIA

Both are extensions to C. OIT HPC does notsupport GPUs, but there is some expertise in theCS department. For Example, Frank Mueller hasa GPU cluster.

Scientific Computation (at NC State) – p. 26/29

Page 37: Scientific Computation (at NC State)

Disadvantages to GPUs

Lower level programming.

Not as many libraries available

Better at single precision arithmetic.

Arithmetic is substandard (don’t round .. chop.. ?)

Often only one portion of the code can run onthe GPU .. poor bandwidth back to the mainprocessor .. limited memory on the GPU

Scientific Computation (at NC State) – p. 27/29

Page 38: Scientific Computation (at NC State)

Distributed memory with attached GPUs

The world’s fastest computers typically havethousands of multi-cores (shared memory)processors with attached GPUS.So if you want to scale your Matlab code to runreally fast ?

Scientific Computation (at NC State) – p. 28/29

Page 39: Scientific Computation (at NC State)

Scaling high level code to run fast

Do as much as possible while the code is still inMatlab.

You can insert MPI calls from the bcMPIlibrary, so then you have parallel matlab (orparallel octave) code.

Figure out how to do the matlab commands interms of libraries (e.g. matrix operations arefrom LAPACK or BLAS or ..may run evenfaster if they are call from a GPU library).

Figure out how to partition the data and see ifyou can manage that within Matlab. Finallyyou can convert it to Fortan or C.

Scientific Computation (at NC State) – p. 29/29