Science on a (LINUX) computer in three parts
Introductory short course, Part I
Thorsten Becker, University of Southern California, Los Angeles
September 2007
Contents
● Part one: UNIX and computers
● Part two: Scientific computing and data analysis
● Part three: Matlab practical
Purpose of this short course
● Introduce UNIX-based (e.g. LINUX, Mac OS X) computing and scientific workflow environments by providing pointers for further information
● Describe what I think are best practices in moderate to high-performance computing
– I will make judgments and provide specific recommendations
– I cannot possibly provide a comprehensive, fair, or entirely up-to-date overview
Typographic conventions
● Most important:
– links to web based information are blue
● UNIX (or shell) commands and program names you might type at the command line are written in bold
Contents part I
● UNIX (or LINUX, used synonymously here): what and why
● The file system and window managers
● Shell environment
● Editing files
● Command line tools
● Scripts and GUIs
● Typesetting, publishing, layout
UNIX: What is UNIX?
● an operating system that originated in the 1970s
● built for multi-user, multi-tasking, scalable operation (that was new way back then)
● runs on all computing hardware, including the iPod
● many flavours, free: LINUX, BSD (a version made it into OS X), Solaris (SUN)
● they are all kind of the same thing, but your mileage may vary (e.g. directory structure)
● there is convergence between the LINUX, Mac OS, and Windows look and feel
UNIX: Why use UNIX?
● can use same tools and programs on laptop, workstation, and supercomputer (less important if virtualization is available)
● flexible, modular, powerful
● seamless integration of C and F90 programs, shell commands, and post-processing (UNIX is written in C)
● all important numerical tools and libraries are available
● LINUX is open (security!), and ubiquitous
UNIX: UNIX/LINUX references
● UNIX reference card
● Robbins: UNIX in a Nutshell
● Siever et al.: LINUX in a Nutshell
● Online list of UNIX commands
● Local computing center documentation and lecture notes
● google on UNIX: commands, shell scripts, EMACS, awk, ...
UNIX: This overview
● describes a typical, ca. anno 2005, scientific workplace set-up in the natural sciences
● fairly low-level, close to the machine
● tries to not
– spend a lot of time on point-and-click GUIs
– discuss vapor ware
– discuss cutting-edge programs (with unclear support situation and user base)
● will be out of date tonight
File system: Graphical Window managers and tools
● GNOME
● KDE
● Provide support and interface with other apps (search, web, access files on other servers, etc.)
File system: Hardcore: The actual file system
● user versus super-user (administrator) setup
● tree structure of files within directories:
– /usr/local has software
– /dev has devices
– /home/$USER has all the user's files, which might be subdivided into folders
– /mnt/data/ might hold shared data/storage
File system: But where is it? The shell
● open a shell to get a command line
● type commands, such as ls to list the contents of a directory
Even if you regularly use Mac OS X or GNOME, some knowledge of the background can save the day!
File system: Naming conventions
● suffixes indicate the type of file: file.dat, file.c, file.f, file.f90, file.awk, file.txt, file.tex, file.ps (and determine helper applications)
● UNIX is case sensitive
● normally, use lower case for files and directories
● some symbols (e.g.: *, %, ?) are special; if you want those literally you have to quote them (\*, \%, \?)
● different quotes (", ', `) have different meanings
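The difference between the three kinds of quotes and the backslash is easiest to see by trying them. The following is a minimal bash sketch; the variable name is hypothetical and chosen only for illustration.

```bash
#!/bin/bash
# Hypothetical variable, used only for illustration.
name="world"

double="hello $name"      # double quotes: $name is expanded
single='hello $name'      # single quotes: everything is taken literally
backtick=`echo hello`     # backticks: replaced by the output of the command
escaped=\*.dat            # backslash: the * is literal, not a wildcard

echo "$double"
echo "$single"
echo "$backtick"
echo "$escaped"
```

The same rules apply in csh and tcsh, although the details of nested quoting differ between shells.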
File system: ls: list contents of directories
becker@jackie:~ > ls
calendar  data     dokumente  idl_gmt  mail    plates  public_html  RCS             subduct   TEX  unison.log
CITCOM    Desktop  evolution  ioffice  mylibs  progs   quakes       Screenshot.png  teaching  tmp
becker@jackie:~ > ls -F -l
total 6500
-rw-r--r--   1 becker users    1638 Jun 17 07:39 calendar
drwxrwxr-x   4 becker users    4096 Jun 17 07:39 CITCOM/
drwxr-xr-x  35 becker users    4096 Jul 12 15:22 data/
drwx------   2 becker users    4096 Jul 26 17:20 Desktop/
drwxr-xr-x  25 becker users    4096 Jul 12 07:48 dokumente/
drwx------   7 becker users    4096 Jun 17 11:53 evolution/
drwxr-xr-x   3 becker users   20480 Jul 27 15:00 idl_gmt/
drwxr-xr-x   3 becker users    4096 Jun 17 07:39 ioffice/
drwx------   2 becker users    4096 Jul  7 12:21 mail/
drwxr-xr-x  15 becker users    4096 Jun 17 07:46 mylibs/
drwxr-xr-x  12 becker users    4096 Jun 17 07:46 plates/
drwxr-xr-x  12 becker users    4096 Jun 17 07:46 progs/
drwxr-xr-x  27 becker users   12288 Jul 18 19:20 public_html/
drwxr-xr-x   4 becker users    4096 Jun 17 07:47 quakes/
drwxrwxr-x   2 becker users    4096 Jun 17 07:39 RCS/
-rw-r--r--   1 becker users   35775 Jul 27 16:15 Screenshot.png
drwxrwxr-x   5 becker users    4096 Jun 17 07:47 subduct/
lrwxrwxrwx   1 becker users      19 Jun 17 07:39 teaching -> dokumente/teaching//
drwxr-xr-x  29 becker users    4096 Jul 26 17:39 TEX/
lrwxrwxrwx   1 becker users      12 Jun 16 17:28 tmp -> /mnt/dos/tmp/
-rw-------   1 becker users 6508582 Jul 27 15:01 unison.log
File system: Commands have options
● command output and workings can be modified by adding -x (or x for tar)
● ls:
– ls -F
– ls -la
● usually, you can do "command --help" to learn more
● often, there are long versions: ls --all --full
● man pages (RTFM): "man command"
File system: File system commands I
● cp: copy files (will normally overwrite!)
– cp filea fileb
● rm: remove files (for real!)
– rm goneforever.dat
– rm -i goneforever.dat
● mkdir: make directories
– mkdir new_dir/
● cd: change directories (cd ..; cd -; cd ~)
● pwd: print current directory
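These commands can be tried without risk in a scratch directory. The sketch below is a bash session written as a script; the scratch directory comes from mktemp, which is an addition not mentioned on the slide, and the file names follow the slide.

```bash
#!/bin/bash
# Work in a throwaway directory so nothing real is overwritten or deleted.
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

echo "some data" > filea
cp filea fileb              # fileb is created (or silently overwritten)
mkdir new_dir
cp filea new_dir/           # copy into a directory, keeping the name
cd new_dir
here=$(basename "$(pwd)")   # pwd prints the current directory
cd ..                       # back up one level
rm fileb                    # gone for real; rm -i would ask first
listing=$(ls -F)            # -F marks directories with a trailing /

echo "$here"
echo "$listing"

cd "$start"
rm -rf "$workdir"           # remove the scratch directory again
```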
File system: File system commands II
● scp: copy files across machines
– scp filea user@machine.usc.edu:~/directory/fileb
● more: display files
– more filea.dat
● ln: create (symbolic) links (shortcuts in Windows)
– cd new_dir
– ln -s ../old_dir/script .
– soft vs. hard: deleting a soft link never touches the target; the file itself is only gone once its last hard link is deleted
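The difference between soft and hard links shows up when the original name is removed. A small bash sketch, using hypothetical file names in a scratch directory:

```bash
#!/bin/bash
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

echo "precious" > original.dat
ln original.dat hard.dat       # hard link: a second name for the same data
ln -s original.dat soft.dat    # soft link: a pointer to the name original.dat

rm original.dat                # remove the original name

hard_content=$(cat hard.dat)   # the data is still reachable via the hard link
if [ -e soft.dat ]; then       # -e follows the link, so a dangling link fails
    soft_state="ok"
else
    soft_state="dangling"
fi
echo "$hard_content $soft_state"

cd "$start"
rm -rf "$workdir"
```

The data survives as long as one hard link remains, while the soft link is left pointing at a name that no longer exists.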
File system: Using wildcards (shell patterns, a simple cousin of regular expressions)
● * (all): cp *.dat new_dir/
● [pat] (pattern): cp file[1-5].dat new_dir
● ? (any single character): cp file??.dat new_dir
● rm -rf * (DON'T TRY IT, IT WORKS)
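A safe way to check what a pattern matches before handing it to cp or rm is to give it to echo first. The bash sketch below uses hypothetical file names in a scratch directory.

```bash
#!/bin/bash
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
touch file1.dat file2.dat file7.dat file10.dat notes.txt

all_dat=$(echo *.dat)          # every name ending in .dat
range=$(echo file[1-5].dat)    # one character out of the set 1 to 5
two_chars=$(echo file??.dat)   # exactly two arbitrary characters
quoted=$(echo \*.dat)          # escaped: the pattern itself, no matching

echo "$all_dat"
echo "$range"
echo "$two_chars"
echo "$quoted"

cd "$start"
rm -rf "$workdir"
```

The shell expands the pattern before the command runs, so the command only ever sees the resulting list of names.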
File system: Permissions
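● first character: - (file), d (directory), l (link)
● r: read w: write x: execute or list
● u: user g: group o: other a: all
– chmod u+x file
– chmod a+r *.dat
– chmod -R o-rwx my_stuff
● whoami, id: output of user and group
● reading a line of the long listing:
-rw-r--r-- 1 becker users 1638 Jun 17 07:39 calendar
– after the type character come three permission triplets: u (rw-), g (r--), o (r--)
– then: link count, user, group, size, modification time, filename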
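The effect of chmod can be followed by looking at the first ten characters of the long listing before and after. A minimal bash sketch with a hypothetical script file; the starting mode is set explicitly so the result does not depend on the local umask.

```bash
#!/bin/bash
workdir=$(mktemp -d)
script="$workdir/myscript"
printf '#!/bin/sh\necho hi\n' > "$script"

chmod u=rw,go=r "$script"                # start from rw-r--r--
before=$(ls -l "$script" | cut -c1-10)   # keep only type and permissions

chmod u+x "$script"                      # the owner may now execute it
chmod o-rwx "$script"                    # others lose all access
after=$(ls -l "$script" | cut -c1-10)

echo "$before -> $after"
rm -rf "$workdir"
```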
Shells: The environment
● shells: interpret your commands when logged in and using a terminal session
● csh, tcsh: nice for interactive stuff, syntax close to C, command completion, auto correction
● bash, ksh: nice for programming
● shells use mostly the same commands, but there are differences in the script languages, e.g.
– export var=100 (bash)
– setenv var 100 (csh)
Shells: Can use variables, and many are predefined (csh example)
becker@jackie:~ > setenv region 0/360/-90/90
becker@jackie:~ > echo $region
0/360/-90/90
becker@jackie:~ > echo $HOME
/home/becker
becker@jackie:~ > echo $USER
becker
becker@jackie:~ > env
BIBINPUTS=.:/home/becker/TEX//:
CFLAGS_DEBUG=-g -DDEBUG -DDEBUG -DLINUX_SUBROUTINE_CONVENTION
LDFLAGS=-posixlib -nofor_main -Vaxlib -L/usr/lib/gcc/i386-redhat-linux/3.4.3/ -lg2c -lm
MANPATH=/home/becker/progs/man/:/home/becker/progs/man/:
DVIPSHEADERS=/home/becker/TEX//:
SUPPORTED=en_US.UTF-8:en_US:en
SSH_AGENT_PID=31881
HOSTNAME=jackie.usc.edu
DXROOT=
DXMEMORY=128
CONFC=ifort
HOST=jackie.usc.edu
SHELL=/usr/bin/tcsh
FFLAGS_DEBUG=-g -DDEBUG -fpp -nofor_main -DDEBUG
....
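For comparison with the csh session, the bash way of setting variables might look like the sketch below. The variable name myvar is hypothetical; the point is the difference between a plain shell variable and an exported one.

```bash
#!/bin/bash
myvar=100                                    # plain variable: this shell only
before=$(bash -c 'echo "${myvar-unset}"')    # a child shell cannot see it yet

export myvar                                 # exported: children inherit it
after=$(bash -c 'echo "${myvar-unset}"')

echo "child before export: $before"
echo "child after export:  $after"
echo "home directory:      $HOME"            # predefined, as in csh
```

In csh and tcsh the same distinction is made by set (shell variable) versus setenv (environment variable).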
Shells: Source startup scripts
● ~/.login (= $HOME/.login) at startup
● ~/.cshrc every time you start a shell
● those scripts are where you define environment variables and aliases you want to use in all sessions
– alias rm 'rm -i'
– setenv F77 ifort; setenv FFLAGS "-O3 -ipo"
● see references on UNIX and dotfiles.com
Shells: A few lines from my .tcshrc
# set architecture flag, e.g., ip27 for IRIX and i686 for Pentium
#
setenv ARCH `uname -m | gawk '{print(tolower($1))}'`
#
# hostname without domain
#
setenv myhostname `hostname | gawk '{split($1,a,".");print(a[1])}'`
#
if ( $ARCH == "i686" ) then
    #
    # Pentium/Xeon Linux system
    #
    # GMT etc
    setenv GMT_VERSION GMT3.4.5
    #setenv GMT_VERSION GMT4.0
    set local_gmt_path = /usr/local/src/${GMT_VERSION}/
    set local_netcdf_dir = /usr/local/src/netcdf3.5.0/
...
set rmstar
set correct=cmd
set autocorrect
set nobeep
set prompt="%B%n@%m:%b%~\n> "
....
Shells: Command history and other features that save typing
● use up, down, left, right arrows to navigate and edit commands on the command line
● use a bunch of tricks to access and modify last commands, e.g.
– !n: execute last command that starts with "n"
● auto-completion (TAB key)
● auto-correction
● many more tricks
Shells: Job control
● ps: list currently running processes
● jobs: list current jobs (processes started from shell in background)
● running commands in background
– emacs & (or: emacs; CTRL-Z; bg)
– nohup mybigjob.exe & (keeps running when the shell quits)
– kill %2 (kill the second job running, % are job IDs)
– kill -9 12344 (kill process with PID 12344)
● top: show machine load
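In a script, the same job control can be done with process IDs. The bash sketch below uses sleep as a stand-in for a long-running job; it relies on kill -0, which only tests whether a process exists, instead of parsing ps output.

```bash
#!/bin/bash
sleep 30 &                     # & starts the command in the background
pid=$!                         # $! holds the PID of the last background job

if kill -0 "$pid" 2>/dev/null; then   # signal 0: only checks for existence
    was_running="yes"
else
    was_running="no"
fi

kill "$pid"                    # ask the job to terminate (kill -9 forces it)
wait "$pid" 2>/dev/null || true       # collect the finished job

if kill -0 "$pid" 2>/dev/null; then
    state="alive"
else
    state="gone"
fi
echo "was running: $was_running, now: $state"
```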
Shells: Job control example
becker@jackie:~ > ps
  PID TTY          TIME CMD
 1758 pts/5    00:00:00 tcsh
 2500 pts/5    00:00:00 ps
becker@jackie:~ > ps aux | tail
becker 1413 0.0 0.0  4348  1004 ?     S   15:10 0:00 /bin/sh /usr/bin/realplay /tmp/youfm_cms.ram
becker 1418 3.2 1.1 79664 12000 ?     Sl  15:10 3:24 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker 1420 0.0 0.5 25720  5260 ?     S   15:10 0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker 1421 0.0 0.5 25720  5260 ?     S   15:10 0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker 1642 0.0 0.1  5200  1776 pts/3 Ss+ 16:03 0:00 csh
becker 1703 0.0 0.1  6624  1764 pts/4 Ss+ 16:08 0:00 csh
becker 1758 0.0 0.1  5328  1984 pts/5 Ss  16:14 0:00 csh
becker 1888 0.0 1.6 27332 17228 ?     S   16:26 0:01 /usr/lib/acroread/Reader/intellinux/bin/acroread -display :0.0 -name main -visual default +useFrontEndProgram -xrm *useNullDoc:false -progressPipe 3 -xrm *noPrivateColormap:true -xrm *exitPipe:4
becker 2501 0.0 0.0  3032   772 pts/5 R+  16:54 0:00 ps aux
becker 2502 0.0 0.0  4212   532 pts/5 R+  16:54 0:00 tail
becker@jackie:~ > kill 1413
...
Shells: Cluster job control
● on large, parallel machines one typically runs batch schedulers or queuing systems
● this allows distributing jobs and utilizing resources efficiently
● PBS
– qsub myjob.exe -tricky_options -q large
– qstat | grep $USER
– pbstop
– qdel job-ID
Editors: Editing text or ASCII data files
● vi: old school: fast, efficient, bizarre
– controlled by typing commands like :w, /text
– good for minor editing tasks, required for admins
● emacs: best overall tool
– GUI, menus
– flexible, expandable
– bizarre
● tons of others, but don't use Word or such, since UNIX tools expect plain ASCII text
UNIX tools: Command line tools for file management
● more, less: display files page by page interactively
● cat: display file
● head: display first few lines of file
● tail: guess
● paste: align files with columns row by row
– paste file1.dat file2.dat
● wc: count words, lines, and bytes of file
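A short bash sketch of these tools on two small, made-up column files (the file names and contents are hypothetical):

```bash
#!/bin/bash
workdir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$workdir/x.dat"
printf 'a\nb\nc\nd\ne\n' > "$workdir/y.dat"

first_two=$(head -n 2 "$workdir/x.dat")            # first two lines
last_one=$(tail -n 1 "$workdir/y.dat")             # last line
# paste joins the files row by row, separated by a TAB
third_row=$(paste "$workdir/x.dat" "$workdir/y.dat" | head -n 3 | tail -n 1)
n_lines=$(wc -l < "$workdir/x.dat" | tr -d ' ')    # number of lines

echo "$first_two"
echo "$last_one"
echo "$third_row"
echo "$n_lines"
rm -rf "$workdir"
```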
UNIX tools: Pipes and redirection (ksh example)
● >: redirect stdout, <: stdin, 2>: stderr
● >>: append, |: pipe
– cat file1.dat > combined.dat
– cat file2.dat >> combined.dat
– cat file1.dat | wc
● myconvectioncode.exe < input.dat
● echo Whatever! > /dev/null
● mycode > log.dat 2> error.dat
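The redirection operators can be tried with harmless commands. In the bash sketch below, a brace group that writes one line to stdout and one to stderr stands in for a real code such as mycode; all file names are hypothetical.

```bash
#!/bin/bash
workdir=$(mktemp -d)

echo "first"  >  "$workdir/combined.dat"    # > creates or overwrites
echo "second" >> "$workdir/combined.dat"    # >> appends
n_lines=$(cat "$workdir/combined.dat" | wc -l | tr -d ' ')   # | pipes stdout

# < feeds a file into the standard input of a command
from_stdin=$(tr 'a-z' 'A-Z' < "$workdir/combined.dat" | head -n 1)

# stand-in for a program that writes results to stdout and warnings to stderr
{ echo "result"; echo "warning" >&2; } > "$workdir/log.dat" 2> "$workdir/error.dat"
log=$(cat "$workdir/log.dat")
err=$(cat "$workdir/error.dat")

echo "$n_lines $from_stdin $log $err"
rm -rf "$workdir"
```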
UNIX tools: grep and sort
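● grep: find patterns in file
– grep my_function *.c | more
– grep -ni my_function *.c (disregard case and list line numbers)
● sort: sort row data
– sort -n -k 3 file.dat (numeric sort on the third column; the old form was sort -n +2 file.dat)
● uniq: only print unique lines (removes adjacent duplicates, so sort first)
– sort -n splitting.dat | uniq > stations.dat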
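A small bash sketch of grep, sort, and uniq working together. The station codes and values in the data file are made up for illustration.

```bash
#!/bin/bash
workdir=$(mktemp -d)
# hypothetical two-column data: station code and a value, with duplicates
printf 'PAS 3.2\nBKS 1.5\nPAS 3.2\nGSC 10.1\nBKS 1.5\n' > "$workdir/splitting.dat"

# numeric sort on the second column, then drop adjacent duplicates
by_value=$(sort -n -k 2 "$workdir/splitting.dat" | uniq)

# case-insensitive search, with line numbers
hits=$(grep -n -i 'pas' "$workdir/splitting.dat")

n_unique=$(sort "$workdir/splitting.dat" | uniq | wc -l | tr -d ' ')

echo "$by_value"
echo "$hits"
echo "$n_unique"
rm -rf "$workdir"
```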
UNIX tools: awk and sed
● awk: (or gawk) powerful language for ASCII data and text manipulations
– like C, interpreted at run time
– the best thing since sliced bread
– cat file.dat | gawk '{print($2,cos($5))}' or
– gawk '{print($2,cos($5))}' file.dat
● sed: stream editor
– sed 's/Bush/Kerry/g' file.dat > new_file.dat
● perl: more powerful, more complex
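Two typical one-liners, as a bash sketch on made-up data. It calls awk rather than gawk, on the assumption that some awk is installed everywhere while gawk may not be; the programs shown work in both.

```bash
#!/bin/bash
workdir=$(mktemp -d)
printf '1 10\n2 20\n3 30\n' > "$workdir/file.dat"

# mean of the second column: accumulate per line, print at the end
mean=$(awk '{ sum += $2 } END { print sum / NR }' "$workdir/file.dat")

# swap the two columns, show the first row only
swapped=$(awk '{ print $2, $1 }' "$workdir/file.dat" | head -n 1)

# sed: replace every occurrence of a word in the stream
edited=$(echo "old value, old unit" | sed 's/old/new/g')

echo "$mean"
echo "$swapped"
echo "$edited"
rm -rf "$workdir"
```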
UNIX tools: compression and dealing with big files
● gzip: compress ASCII files, which can be huge, to binary
– compress: gzip file
– uncompress: gunzip file.gz
● can write gzipped from within C, can use gunzip on the fly (zcat): this allows using nice ASCII tools such as awk while storing things compactly
● bzip2: smaller files, takes longer
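Keeping data compressed on disk while still using ASCII tools might look like the bash sketch below. It uses gunzip -c, which behaves like zcat; the data file is made up.

```bash
#!/bin/bash
workdir=$(mktemp -d)
printf '1 10\n2 20\n3 30\n' > "$workdir/file.dat"

gzip "$workdir/file.dat"          # replaces file.dat by file.dat.gz
if [ -e "$workdir/file.dat" ]; then plain="present"; else plain="absent"; fi

# decompress on the fly into awk; nothing uncompressed is written to disk
total=$(gunzip -c "$workdir/file.dat.gz" | awk '{ s += $2 } END { print s }')

echo "$plain $total"
rm -rf "$workdir"
```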
UNIX tools: storage and backup
● tar: package multiple files into one file or onto a tape drive (e.g. to back up or send across the internet)
– pack: tar cvf package.tar file_dir/*
– display contents: tar tf package.tar
– expand: tar xvf package.tar
● Unison file synchronizer (to sync laptop and workstation)
● backup all the time (it's easy to screw up big time), or have your admin backup for you
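A round trip through tar, as a bash sketch in a scratch directory with hypothetical file names. The v (verbose) flag from the slide is left out to keep the output short.

```bash
#!/bin/bash
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"
mkdir file_dir
echo a > file_dir/a.dat
echo b > file_dir/b.dat

tar cf package.tar file_dir            # pack the directory into one file
n_entries=$(tar tf package.tar | wc -l | tr -d ' ')   # list the contents

mkdir restore
(cd restore && tar xf ../package.tar)  # expand inside another directory
restored=$(cat restore/file_dir/b.dat)

echo "$n_entries $restored"
cd "$start"
rm -rf "$workdir"
```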
UNIX tools: Getting smart: a few tricks
● unpacking on the fly:
– gunzip -c newsoftware.tgz | tar xvf -
● interpreting commands on the fly:
– echo $variable_a `cat file.dat | gawk -f mean.awk`
● tcsh interactive functionality:
– foreach f ( *.ps )
– convert $f $f:r.gif
– end
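The tcsh foreach loop has a close bash counterpart. The sketch below only records which conversions would be run, so it does not need the convert program (part of ImageMagick) to be installed; the file names are hypothetical.

```bash
#!/bin/bash
workdir=$(mktemp -d)
touch "$workdir/map.ps" "$workdir/profile.ps"

planned=""
for f in "$workdir"/*.ps; do
    out="${f%.ps}.gif"          # strip the .ps suffix, like $f:r in tcsh
    planned="$planned$(basename "$f")->$(basename "$out") "
    # convert "$f" "$out"       # the actual conversion, if convert is installed
done

echo "$planned"
rm -rf "$workdir"
```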
Scripts: Scripts and GUIs
● scripts are the opposite of point-and-click
● need to work hard once to generate a template
● benefit forever if you want to produce more products (e.g. plots) using different parameters, or if the data has changed
● automate research galore (needed to explore parameter space)
● scripts can serve as documentation of the steps taken to analyse data and produce results
Scripts: An example script
#!/bin/bash
#
# run run_fstrack for different models
#
models=${1-"pmDsmean_nt pmDnngrand_nt saf1 saf2 saf3 "}
strains=${2-"2 1 0.5"}

# PBS queues to use
q1="becker64";q2="scec"

c=0
for m in $models;do
    cd $m
    for s in $strains ;do
        if [ $c -eq 1 ];then
            queue=$q1;c=0
        else
            queue=$q2;c=1
        fi
        # regional
        ../run_fstrack 1 0 0 0 $s 1 2 1 14 0 60 "" 1 $queue
    done
    cd -
done
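Two idioms in this script are worth isolating: default values for optional arguments (${1-"..."}) and alternating between two queues with a flip-flop counter. The bash sketch below is a dry run that only prints the plan; the model and queue names are hypothetical and nothing is submitted.

```bash
#!/bin/bash
# Dry run: build the list of (model, strain, queue) combinations.
plan_jobs() {
    local models=${1-"modelA modelB"}   # default used if no first argument
    local strains=${2-"2 1 0.5"}        # default used if no second argument
    local q1="queue_one" q2="queue_two" # hypothetical queue names
    local c=0 m s queue
    for m in $models; do
        for s in $strains; do
            # alternate between the two queues on every job
            if [ $c -eq 1 ]; then
                queue=$q1; c=0
            else
                queue=$q2; c=1
            fi
            printf '%s/%s:%s ' "$m" "$s" "$queue"
        done
    done
}

plan=$(plan_jobs)             # no arguments: both defaults apply
custom=$(plan_jobs "only" "7")   # both arguments given
echo "$plan"
echo "$custom"
```

Printing the plan first is a cheap way to check the loop logic before any jobs reach the queue.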
Scripts: scripting languages
● csh, tcsh
● bash
● perl
● python
● Tk
● Scripted programs
– gnuplot
– Matlab
● Script whenever you can (because you'll want to reproduce things exactly)
Scripts: Visual scripting languages
● Tcl/Tk (e.g. iGMT), GTK, Tkinter, Qt
Scripting
● Can be the way to go if individual processing steps are not time-sensitive
● If speed is an issue, the time-critical parts need to be compiled from a language such as C or Fortran rather than run as a script
● For basic LINUX automation tasks, bash and awk are very useful
● Python seems to be a nice middle ground for more advanced projects