Scientific Computation (at NCState)
Gary Howell, HPC/OIT
NC State University
gary [email protected]
Scientific Computation (at NC State) – p. 1/29
Science
Theoretical – simplify to get ideal cases
Experimental
Computational – over the last 50 years: leave in as much complication as the computer can deal with
Algorithm Development?
Efficiency
Ease in setting up the computation – high level languages are quick for programming .. can get the answer the same day?
Using resources efficiently – can get a better quality answer from the same computational resource .. at the price of much work for the programmer?
Command Line vs. Mouse
Point and Click computing – can only do what is already set up
Command Line – linux, or cygwin from Windows, or darwin from a Mac. Once you’ve laboriously constructed a sequence of commands, you turn it into a file and just type the name of the file.
Need to use an editor native to linux, cygwin, or darwin, as Windows dos files have a few extra symbols
High Level Languages
Can get to an answer quickly .. more powerful than a calculator ..
Maple, Mathematica – open source: Reduce, Sage (Sage includes all the open source languages on this slide)
Matlab – open source: Octave
Sas – open source: R
Python (numpy, scipy) – open source
Open Source codes are good when you move on and find that you might have to purchase your own license
Some more specialized packages
finite elements – Abaqus, ANSYS; open source: Elmer
fluids – CFX; open source: OpenFOAM
chemistry codes – Gaussian, VASP, Amber; open source: lammps, nwchem, abinit, espresso, ..
weather and oceanographic – wrf, wrf-chem, cmaq, ROMS, ..
genetics codes – Mr. Bayes, BLAST
Lower Level Languages
Fortran 77 – slightly higher level than C; also Fortran 90
C – more versatile than Fortran; also C++
Fortran 90 and C++ are more "modern". They have features that allow large packages to be written and maintained, but are also more complicated than C or Fortran 77.
Lower Level Languages
Fortran and C codes need to be compiled. This adds an extra step to the code writing process, but allows loops to execute efficiently. For languages like R and Matlab/Octave, we can write the code a line at a time and execute it, good for verifying it as you go. For Fortran or C you have to compile, then execute. If you want to step through the code, you attach a debugger to the compiled code.
Libraries
dense linear algebra
sparse linear algebra
FFT
numeric libraries (e.g. evaluate a Bessel function, generate random numbers, compute integrals)
Linking to Libraries
Codes are compiled into executables in 2 phases.
compile .f or .c (source code files) to .o object files by, for example,
pgf90 -c foo.f
Libraries are collections of .o files with an extension .a. Fortran modules have extension .mod. The .a, .o and .mod files are linked together to get an executable "foo" by (for example)
pgf90 -o foo *.o *.a *.mod
Then we can run the executable by
./foo
Fast High Level Languages
If the high level code, e.g. Matlab as a wrapper for LAPACK, is spending most of its time in calls to LAPACK, it’s pretty efficient. You can sometimes get the high level code to call C or Fortran subroutines .. e.g. mex files in Matlab.
Converting Code
You can transition high level code to low level by rewriting it so that lines of the code correspond to library calls. For example, "y = A*x" in Matlab corresponds to a call to the DGEMV subroutine (from the BLAS) in a Fortran or C code.
Parallelization Libraries
Shared Memory – processors can see the same variables .. access the same RAM, e.g. multi-core machines. Many NCSU computational nodes have 8 cores.
Distributed Memory – processors are joined by a network, e.g. ethernet; the programmer must specify what information is to be sent over the network. Hundreds of computational nodes are available.
Shared Memory
Pthreads, Cilk, OpenMP, UPC
OpenMP is particularly easy to use. Insert pragmas in a Fortran or C code to run a loop in parallel. The code will still run in serial mode. See the short course on-line at hpc.ncsu.edu (under Courses).
All our new HPC blades have at least 8 cores. Some have 16 cores and 128 GBytes of RAM. Anyone who can write a code in Fortran or C can try OpenMP at a fraction of the effort of writing the code.
Shared Memory Libraries
One way to take advantage of shared memory is just to call shared memory libraries.
dense linear algebra – acml and mkl BLAS and LAPACK libraries; use 64 bit integer versions for large problems
sparse linear algebra – SuperLU, multi-threaded version
FFT – multi-threaded version of fftw
Distributed Memory via MPI
MPI – calls to Message Passing Interface library routines "mail" data from one processor to another. This is accomplished via MPI library calls inserted into Fortran or C (or R or Matlab or Java) routines. Most computer codes which use many processors use MPI to communicate between processors. See the short course at hpc.ncsu.edu for a short introduction.
Difficulties of MPI
Messages are slow compared to computation. Starting a communication takes the same time as O(10^5) floating point adds or multiplies (flops). It takes only a few hundred flops to equal a fetch of a number from RAM (memory).
Can require substantial reorganisation of code and careful distribution of data.
Debugging can be difficult.
Advantages of MPI
It’s standard, available everywhere, and will still work on the next computer cluster you use.
We (NC State) have thousands of computational nodes on-line. You can get an account to use them (get your advisor to go to hpc.ncsu.edu and request an account).
Many MPI codes already exist
For example, the specialized packages from aprevious slide.
finite elements – Abaqus, ANSYS (Elmer)
fluids – CFX (OpenFOAM)
chemistry codes – Gaussian, VASP, Amber(lammps, nwchem, abinit, espresso, .. )
weather and oceanographic – (wrf, wrf-chem, cmaq, ROMS)
genetic tree codes – (Mr. Bayes, BLAST)
Distributed Memory Libraries
And of course many distributed memory libraries exist.
dense linear algebra – ScaLAPACK
sparse linear algebra – PETSc or Trilinos
FFT – distributed version of fftw
Running parallel codes on a cluster
To run in serial, in linux we might type
./foo
and then wait for the prompt to come back. But if everyone does this on one of the cluster login nodes, then it locks up and we have to reboot. Generally, you’re sharing the login node with lots of other users, and really it’s nothing but a "souped up PC", so you’d be better off running the code on your laptop.
Running jobs on a cluster
So on any cluster, you instead submit jobs to run in a queue. On our cluster you submit a job to the queue by
bsub < bfoo
where bfoo is a file you wrote specifying how much time the job will need, how many processors, and names of the files where output will be written. The job or jobs may run a few days. You don’t have to stay logged in, but can monitor the progress of the output files.
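For illustration only, a bfoo file for an LSF queue (the bsub command above) might look like the sketch below; the numbers and file names are hypothetical, and the directives actually required on our cluster are in the HowTo on hpc.ncsu.edu.

```shell
#!/bin/bash
#BSUB -n 8            # number of processors requested (hypothetical value)
#BSUB -W 120          # wall-clock time limit, in minutes (hypothetical value)
#BSUB -o out.%J       # file where standard output is written (%J = job number)
#BSUB -e err.%J       # file where standard error is written
mpiexec_hydra ./foo   # the command the job runs
```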
Running jobs on a cluster – II
For a serial job, one line of bfoo might be
./foo
(the same as running from a command line). For a parallel job the corresponding line might be
mpiexec_hydra ./foo
See the HowTo on hpc.ncsu.edu
GPUs are fast
Graphical Processing Units were developed to run games. They have high bandwidth and lots of simple processors. They can be programmed with
OpenCL (standard)
CUDA – specific to NVIDIA
Both are extensions to C. OIT HPC does not support GPUs, but there is some expertise in the CS department. For example, Frank Mueller has a GPU cluster.
Disadvantages to GPUs
Lower level programming.
Not as many libraries available.
Better at single precision arithmetic.
Arithmetic is substandard (don’t round .. chop .. ?)
Often only one portion of the code can run on the GPU .. poor bandwidth back to the main processor .. limited memory on the GPU
Distributed memory with attached GPUs
The world’s fastest computers typically have thousands of multi-core (shared memory) processors with attached GPUs. So what if you want to scale your Matlab code to run really fast?
Scaling high level code to run fast
Do as much as possible while the code is still in Matlab.
You can insert MPI calls from the bcMPI library, so then you have parallel Matlab (or parallel Octave) code.
Figure out how to do the Matlab commands in terms of libraries (e.g. matrix operations are from LAPACK or BLAS, or may run even faster if they are called from a GPU library).
Figure out how to partition the data and see if you can manage that within Matlab. Finally you can convert it to Fortran or C.