science on a (linux) computer in three...

Introductory short course, Part II

Thorsten Becker, University of Southern California, Los Angeles

September 2007

Science on a (LINUX) computer in three parts

http://geodynamics.usc.edu/~becker/teaching.html#unix

The first part dealt with

● UNIX: what and why
● File system, Window managers
● Shell environment
● Editing files
● Command line tools
● Scripts and GUIs
● Virtualization

Contents part II

● Typesetting
● Programming
– common languages
– philosophy
– compiling, debugging, make, version control
– C and F77 interfacing
– libraries and packages
● Number crunching
● Visualization tools

Programming: Traditional languages in the natural sciences

● Fortran: higher level, good for math
– F77: legacy, don't use (but know how to read)
– F90/F95: nice vector features, finally implements C capabilities (structures, memory allocation)
● C: low level (e.g. pointers), better structured
– very close to UNIX philosophy
– structures offer nice way of modular programming, see Wikipedia on C
● I recommend F95, and use C happily myself

http://en.wikipedia.org/wiki/Fortran

http://www.ictp.trieste.it/~manuals/programming/sun/fortran/f77rm/1_elements.doc.html

http://www-ccs.ucsd.edu/c/

http://en.wikipedia.org/wiki/C_programming_language

Programming: Some languages that haven't completely made it to scientific computing

● C++: object oriented programming model
– reusable objects with methods and such
– can be partly realized by modular programming in C
● Java: what's good for commercial projects (or smart, or elegant) doesn't have to be good for scientific computing
● Concern about portability as well as general access

Programming: Compromises

● Python
– Object oriented
– Interpreted
– Interfaces easily with F90/C
– Numerous scientific packages

Programming: Other interpreted, high-abstraction languages for scientists

● Matlab (or octave)
– language is some mix between F77 and C, highly vectorized
– might be good enough for your tasks
– interpreted (slower), but can be compiled
● awk, perl, etc: script languages
– interpreted C, good for simple data processing
● IDL: visualization mostly, like matlab
● Mathematica: symbolic math mostly

http://www.mathworks.com/access/helpdesk/help/helpdesk.html

http://www.octave.org/

http://www.rsinc.com/idl/

http://documents.wolfram.com/mathematica/

Programming: Trade-offs for scientific computing

[Figure: assembler, C, F77, F95, Java, python, and matlab placed on a diagram with the axes "abstraction, convenience" and "efficiency, speed".]

Programming: Most languages need to be compiled to assembler language

● $(F77) $(FFLAGS) -c main.f -o main.o
● there are standards, but implementations differ, especially for F77/F90/F95, not so much for C
● the compiler optimization flags, and the choice of compiler, can change the execution time of your code by factors of up to ~10, check out some example benchmarks for F90
● the time you save might be your own
● don't expect -O5 to work all the time

http://www.polyhedron.co.uk/compare/linux/f90bench_p4.html

Programming: Philosophy

● strive for transparency by commenting and documenting the hell out of your code
● create modularity by breaking tasks into reusable, general, and flexible bits and pieces
● achieve robustness by testing often
● obtain efficiency by avoiding unnecessary computational operations (instructions)
● maintain portability by adhering to standards

D. Knuth: The art of computer programming

http://www-cs-faculty.stanford.edu/~knuth/taocp.html

Programming: Example project: main.c

[Source listing of main.c]

http://geodynamics.usc.edu/~becker/ftp/unix/example.tar

Programming: Example project: mysincos.h

[Source listing of mysincos.h]


Programming: Example project: myfunctions.c

[Source listing of myfunctions.c]


Building: How to compile the example project

becker@jackie:~/dokumente/teaching/unix/example > ls
bin/ main.c makefile myfunctions.c mysincos.h objects/ RCS/
becker@jackie:~/dokumente/teaching/unix/example > cc main.c -c
becker@jackie:~/dokumente/teaching/unix/example > cc -c myfunctions.c
becker@jackie:~/dokumente/teaching/unix/example > cc main.o myfunctions.o -o mysincos -lm
becker@jackie:~/dokumente/teaching/unix/example > echo 0 90 -90 | mysincos
mysincos: reading x values in degrees from stdin
0 1
1 6.12303e-17
-1 6.12303e-17
mysincos: computed 3 pairs of sines/cosines

Building: Automating the build process with make: makefile

[Listing of the makefile]

Building: Building with make

becker@jackie:~/dokumente/teaching/unix/example > make dirs
if [ ! -s ./objects/ ]; then mkdir objects;fi;if [ ! -s objects/i686/ ];then mkdir objects/i686/;fi;\
if [ ! -s ./bin/ ];then mkdir bin;fi;\
if [ ! -s bin/i686/ ];then mkdir bin/i686;fi;
becker@jackie:~/dokumente/teaching/unix/example > make
icc -no-gcc -O3 -unroll -vec_report0 -DLINUX_SUBROUTINE_CONVENTION -c main.c \
-o objects/i686//main.o
icc -no-gcc -O3 -unroll -vec_report0 -DLINUX_SUBROUTINE_CONVENTION -c myfunctions.c \
-o objects/i686//myfunctions.o
icc objects/i686//main.o objects/i686//myfunctions.o -o bin/i686//mysincos -lm
becker@jackie:~/dokumente/teaching/unix/example > make
make: Nothing to be done for `all'.
becker@jackie:~/dokumente/teaching/unix/example > touch main.c
becker@jackie:~/dokumente/teaching/unix/example > make
icc -no-gcc -O3 -unroll -vec_report0 -DLINUX_SUBROUTINE_CONVENTION -c main.c \
-o objects/i686//main.o
icc objects/i686//main.o objects/i686//myfunctions.o -o bin/i686//mysincos -lm

Building: Version control

● RCS, SCCS, CVS: tools to keep track of changes in any documents, such as source code or HTML pages

● different versions of a document are checked in and out and can be retrieved by date, version number etc.

● I recommend using RCS for everything

Building: RCS example

becker@jackie:~/dokumente/teaching/unix/example > co -l main.c
RCS/main.c,v --> main.c
revision 1.2 (locked)
done
becker@jackie:~/dokumente/teaching/unix/example > emacs main.c
becker@jackie:~/dokumente/teaching/unix/example > ci -u main.c
RCS/main.c,v <-- main.c
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> corrected some typos
>>
done
becker@jackie:~/dokumente/teaching/unix/example > rcsdiff -r1.2 main.c
===================================================================
RCS file: RCS/main.c,v
retrieving revision 1.2
diff -r1.2 main.c
9c9
< $Id: main.c,v 1.2 2005/07/30 19:57:34 becker Exp $
---
> $Id: main.c,v 1.3 2005/07/30 20:42:07 becker Exp $
27c27
< fprintf(stderr,"%s: reading x values in degrees from stdin\n",
---
> fprintf(stderr,"%s: reading x angles in degrees from stdin\n",

Building: Version control is worth it

● small learning curve, big payoff
● EMACS can integrate RCS, make, debugging etc. consistently and conveniently
● opening files, checking in/out can be done with a few keystrokes or menu options

Building: EMACS modes

● EMACS is just one example of a programming environment

● e.g. there is a vi mode within EMACS

● dotfiles.com on .emacs

http://www.dotfiles.com/index.php3?app_id=6

Building: Debugging

● put extra output statements into the code (still the only option for MPI code, kind of)
● use debuggers:
– compile with debugging symbols and without optimization: cc -g main.c -c
– gdb: command line debugger
  ● gdb bin/i686/mysincos
  ● (gdb) run
  ● after a crash, use where to find the location in the code that caused the coredump, etc.
– visual debuggers: ddd, photran, etc.

http://www.gnu.org/software/ddd/

http://www.photran.org/
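The "extra output statements" approach can be sketched as follows. The macro DEBUG_PRINT and the function mean are hypothetical names chosen for illustration; the assumption is a C99 compiler (variadic macros) and that debugging output is switched on at compile time with -DDEBUG.

```c
#include <stdio.h>

/* Hypothetical macro: prints to stderr only when the code is compiled
   with -DDEBUG, and expands to a no-op otherwise. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Arithmetic mean of x[0..n-1]; reports the running sum when debugging. */
double mean(const double *x, int n)
{
  double sum = 0.0;
  int i;
  for (i = 0; i < n; i++) {
    sum += x[i];
    DEBUG_PRINT("mean: i=%d x=%g running sum=%g\n", i, x[i], sum);
  }
  return (n > 0) ? sum / n : 0.0;
}
```

Compiling with cc -g -DDEBUG turns the messages on; without -DDEBUG the statements vanish, so they can stay in the source without slowing down production runs.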

Building: ddd

Building: Eclipse environment (see also Code warrior etc.)

Tuning: Tuning programs

● use compiler options, e.g. (AMD64 in brackets)
– GCC: -O3 -march=pentium4 (-m64 -m3dnow -march=k8) -funroll-loops -ffast-math -fomit-frame-pointer
– ifc: -O3 -fpp -unroll -vec_report0
– PGI: -fast -Mipa (-tp=amd64)
● use profilers to find bottlenecks
● use computational libraries
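A profiler such as gprof reports where the time goes. For a quick first look, a suspect routine can also be timed by hand. The sketch below is one possible way to do that with the standard clock() call; cpu_seconds and busy_loop are hypothetical names, and busy_loop is only a stand-in workload.

```c
#include <time.h>

/* Hypothetical helper: CPU seconds consumed by one call of work(). */
double cpu_seconds(void (*work)(void))
{
  const clock_t t0 = clock();
  work();
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Stand-in workload; volatile keeps the compiler from removing the loop. */
void busy_loop(void)
{
  volatile double s = 0.0;
  long i;
  for (i = 0; i < 1000000L; i++)
    s += (double)i;
}
```

Calling cpu_seconds(busy_loop) before and after changing compiler flags gives a rough idea of whether a flag helps. Note that clock() measures CPU time, not wall-clock time, and that its resolution is limited.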

Tuning: Using standard subroutines and libraries

● many tasks and problems have been encountered by computational scientists for the last 50 years
● don't re-invent the wheel
● don't use black boxes either
● exceptions: I/O, visualization, and complex matrix operations

Algorithms: Collection of subroutines: Numerical recipes

● THE collection of standard solutions in source code
● usually works, might not be best, but OK
● NR website has the PDFs, I recommend buying the book
● criticism of numerical recipes
● website on alternatives to numerical recipes
● use packages like LAPACK, if possible
● else, use NR libraries but check
● MATLAB wraps NumRec, LAPACK & Co.

http://www.nrbook.com/a/bookcpdf.php

http://nr.com/

http://amath.colorado.edu/computing/Fortran/numrec.html

http://www.fceia.unr.edu.ar/~fisicomp/apuntes/biblios/altnr.htm

Algorithms: Take a class or read a book on inverse theory

● Standard texts
– Menke (good, hands on, expensive)
– Parker (mathematical)
● Newer texts
– Gubbins
– Tarantola
● Class notes
– Marc Spiegelman's MMM: great for PDEs
– WHOI 12.747 class notes with focus on geoscience and oceanography (encompasses some of the Numerical Recipes material)

http://www.amazon.com/Geophysical-Data-Analysis-International-Geophysics/dp/0124909213/ref=sr_1_3/104-8632725-7247902?ie=UTF8&s=books&qid=1188449938&sr=1-3

http://www.amazon.com/Geophysical-Inverse-Theory-Robert-Parker/dp/0691036349/ref=pd_bbs_sr_1/104-8632725-7247902?ie=UTF8&s=books&qid=1188449938&sr=1-1

http://www.amazon.com/Time-Analysis-Inverse-Theory-Geophysicists/dp/0521819652/ref=sr_1_5/104-8632725-7247902?ie=UTF8&s=books&qid=1188449938&sr=1-5

http://www.amazon.com/Inverse-Problem-Methods-Parameter-Estimation/dp/0898715725/ref=pd_bbs_sr_1/104-8632725-7247902?ie=UTF8&s=books&qid=1188451083&sr=8-1

http://www.ldeo.columbia.edu/~mspieg/mmm/

http://w3eos.whoi.edu/12.747/

Tuning: Linear algebra packages, examples

● EISPACK, ARPACK: eigensystem routines
● BLAS: basic linear algebra (e.g. matrix multiplication)
● LAPACK: linear algebra routines (e.g. SVD, LU solvers), SCALAPACK (parallel)
● parallel solvers: MUMPS (parallel, sparse, direct), SuperLU
● PETSc: parallel PDE solvers, a whole different level of complexity

http://www.netlib.org/eispack/

http://www.caam.rice.edu/software/ARPACK/

http://www.netlib.org/blas/

http://www.netlib.org/lapack/

http://www.netlib.org/scalapack/scalapack_home.html

http://graal.ens-lyon.fr/~jylexcel/MUMPS/

http://crd.lbl.gov/~xiaoye/SuperLU/

http://www-unix.mcs.anl.gov/petsc/petsc-2/

Tuning: Examples for some other science (meta-)packages

● netlib repository
● GNU scientific library
● GAMS math/science software repository
● FFTW: fast Fourier transforms
● hardware optimized solutions
– ATLAS: automatically optimized BLAS (LAPACK)
– GOTO: BLAS for Intel/AMD under LINUX
– Intel MKL: vendor collection for Pentium/Itanium
– ACML: AMD core math library

http://www.netlib.org/

http://www.gnu.org/software/gsl/

http://math.nist.gov/

http://www.fftw.org/

http://www.netlib.org/atlas/

http://www.cs.utexas.edu/users/kgoto/signup_first.html

http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/index.htm

http://developer.amd.com/acml.aspx

Tuning: Calling F90 from C

● some subroutines are Fortran functions which you might want to call from C
● this works if you pointerize and flip
– Fortran passes arguments by reference: call func(x) with real*8 x as func(&x) from C, with x a double
– an x(m,n) array in Fortran is stored column by column: counting i and j from zero, element (i,j) sits at x[j*m+i] (first index runs fastest), whereas C stores it at x[i*n+j] (last index runs fastest)
● C x[0,1,2,...,n-1] will be x(1,2,...,n) in Fortran
● don't pass strings (hardware dependent)
● BTW: Fortran direct binary I/O isn't really binary
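The "pointerize and flip" rules can be sketched in plain C. The routine colsum below is a hypothetical stand-in, written in C so that the example compiles without a Fortran compiler; it only mimics the interface that a Fortran subroutine colsum(x, m, n, j, s) would present. With a real Fortran routine the symbol name seen from C is compiler dependent (gfortran, for instance, appends an underscore by default).

```c
#include <stddef.h>

/* Offset of Fortran element x(i+1, j+1) of an x(m,n) array for
   zero-based i, j: column-major, the first index runs fastest. */
size_t fidx(size_t i, size_t j, size_t m)
{
  return j * m + i;
}

/* Hypothetical stand-in for a Fortran routine
     subroutine colsum(x, m, n, j, s)
   Every argument arrives by reference, and the column number j is
   one-based, as it would be on the Fortran side. */
void colsum(const double *x, const int *m, const int *n, const int *j,
            double *s)
{
  int i;
  (void)n; /* not needed here, but part of the Fortran-style interface */
  *s = 0.0;
  for (i = 0; i < *m; i++)
    *s += x[fidx((size_t)i, (size_t)(*j - 1), (size_t)*m)];
}
```

From C, the call reads colsum(x, &m, &n, &j, &s): scalars are passed by address, and the array x must have been filled in column-major order, for example with x[fidx(i, j, m)] = value.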

Tuning: Matlab

● basically all actual computations within matlab are based on LAPACK, ARPACK and other library routines etc.
● MathWorks added a bunch of convenient wrappers and neat ways to do things vectorially
● plus plotting and GUI creation
● Matlab might be all you need

Tuning: How to design good code

● strive for transparency, modularity, robustness, efficiency, portability
● benchmark and test
● document
● make things sharable (no funny stuff)
● share. open is good (at least for finding bugs)

Tuning: How to design efficient code

● watch array ordering in loops x[i*n+j]
● avoid things like:
– f1=cos(x)*sin(y);f2=cos(x)*cos(y); (cos(x) is evaluated twice)
● use BLAS, LAPACK
● experiment with compiler options which can turn good (readable) code into efficient code
● look into hardware optimized packages
● design everything such that you can run in parallel (0th order) on one file system

Visualization: General 2-D

● matlab, IDL, and 1000 others
● I recommend:
– GNUPLOT: nice 2-D PS plots, scriptable
  ● plot 'a.dat' using 1 : ($3/10) title 'data' w lp
– GMT: made for mapping, but can produce outstanding quality EPS figures (with some work)
– Matplotlib for python
– those are script driven programs, no real GUI (but there is iGMT and some stuff for gnuplot)
– both are available as C source code

http://www.gnuplot.info/

http://gmt.soest.hawaii.edu/

Visualization: Geographic data

● matlab, IDL, and GIS systems
● Free matlab interfaces to python matplotlib
● Low level routines (proj etc.)
● I recommend:
– GMT: tons of projections, tool of choice for professional quality plots
– I use GMT even for x – y plots
– iGMT: can help you obtain initial scripts to modify by hand later
● Link to my PDF lecture notes on GMT

http://gmt.soest.hawaii.edu/

http://www.seismology.harvard.edu/~becker/igmt/

http://www.seismology.harvard.edu/~becker/igmt/igmt_tutorial.pdf

Visualization: 3-D

● Commercial
– Matlab
– IDL (not so good)
● free software with available source code:
– geomview: nice 3-D geometry viewer (e.g. faults)
– OPEN DX: object oriented, powerful package
– paraview: VTK based parallel tool
● I recommend paraview and open-dx
● There is plenty of stuff around (don't write your own)

http://www.geomview.org/

http://www.opendx.org/index2.php

http://www.paraview.org/HTML/Index.html

Type setting: Graphics embedding: always use EPS or PDF files

● Adobe PS, EPS, or PDF files are moderately portable
● EPS & PDF can be included into (pdf)latex and Word (e.g. as converted png), and easily further edited
● EPS can be used for publications
● use Adobe Illustrator for EPS editing and making posters (this is the only non-LINUX, non-freeware software I am using)

Type setting: Some notes on type setting, publishing, and layout

● Alternatives (rest of the world) use office software: Word, Excel, Powerpoint
– the free alternatives from OpenOffice (Mac, Windows, LINUX) are quite good now
– this is good for small text projects without many equations, or presentations
● Latex (built on D. Knuth's TeX) is much better for larger projects (e.g. thesis), and if using a lot of equations (this might change, Word Latex capabilities)
● Latex is bizarre.

http://www.openoffice.org/

Type setting: Latex

● typesetting program, not WYSIWYG
– But there are tools like LyX, which are
● takes time to learn
● results look beautiful and are book quality
● produces DVI, PS or PDF files
● use bibtex for citing references
● good short reference: Short introduction to Latex in 120 minutes

http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf

The next lecture

● Matlab
