Distributed & Parallel Computing Cluster
Patrick McGuigan
[email protected]
2/12/04
DPCC Background
NSF-funded Major Research Instrumentation (MRI) grant
Goals
Personnel
– PI
– Co-PIs
– Senior Personnel
– Systems Administrator
DPCC Goals
Establish a regional Distributed and Parallel Computing Cluster at UTA (DPCC@UTA)
An inter-departmental and inter-institutional facility
Facilitate collaborative research that requires large-scale storage (terabytes to petabytes), high-speed access (gigabit or more), and massive processing (hundreds of processors)
DPCC Research Areas
Data mining / KDD
– Association rules, graph mining, stream processing, etc.
High Energy Physics
– Simulation; moving towards a regional DØ center
Dermatology / skin cancer
– Image database, lesion detection and monitoring
Distributed computing
– Grid computing, PICO
Networking
– Non-intrusive network performance evaluation
Software Engineering
– Formal specification and verification
Multimedia
– Video streaming, scene analysis
Facilitate collaborative efforts that need high-performance computing
DPCC Personnel
PI
– Dr. Chakravarthy
Co-PIs
– Drs. Aslandogan, Das, Holder, Yu
Senior Personnel
– Paul Bergstresser, Kaushik De, Farhad Kamangar, David Kung, Mohan Kumar, David Levine, Jung-Hwan Oh, Gergely Zaruba
Systems Administrator
– Patrick McGuigan
DPCC Components
Establish a distributed memory cluster (150+ processors)
Establish a symmetric (shared-memory) multiprocessor system
Establish large, shareable, high-speed storage (hundreds of terabytes)
DPCC Cluster as of 2/1/2004
Located in 101 GACB
Inauguration 2/23 as part of E-Week
5 racks of equipment + UPS
[Photos of the cluster racks, scaled for presentation]
DPCC Resources
97 machines
– 81 worker nodes
– 2 interactive nodes
– 10 IDE-based RAID servers
– 4 nodes supporting the Fibre Channel SAN
50+ TB storage
– 4.5 TB in each IDE RAID
– 5.2 TB in the FC SAN
DPCC Resources (continued)
1 Gb/s network interconnections
– core switch
– satellite switches
1 Gb/s SAN network
UPS
DPCC Layout
[Diagram: 81 worker nodes (node1–node81) plus supporting servers. Nodes 1–32, the interactive nodes master.dpcc.uta.edu and grid.dpcc.uta.edu, RAID servers raid3, raid6, raid8, raid9, and raid10, the GFS nodes gfsnode1–gfsnode3, and the lockserver connect directly to the Foundry FastIron 800 core switch. The remaining worker nodes attach in groups of nine or ten through Linksys 3512 satellite switches, each group paired with an IDE RAID server (raid1, raid2, raid4, raid5, raid7; 6 TB each). The GFS nodes reach the ArcusII 5.2 TB SAN through a Brocade 3200 FC switch with four open ports. A 100 Mb/s switch provides the uplinks to the campus network.]
DPCC Resource Details
Worker nodes
– Dual Xeon processors
  32 machines @ 2.4 GHz
  49 machines @ 2.6 GHz
– 2 GB RAM
– IDE storage
  32 machines @ 60 GB
  49 machines @ 80 GB
– Red Hat 7.3 Linux (2.4.20 kernel)
DPCC Resource Details (cont.)
RAID server
– Dual Xeon processors (2.4 GHz)
– 2 GB RAM
– 4 RAID controllers
  2-port controller (qty 1): mirrored OS disks
  8-port controllers (qty 3): RAID5 with hot spare
– 24 250 GB disks
– 2 40 GB disks
– NFS used to serve the worker nodes
DPCC Resource Details (cont.)
FC SAN
– RAID5 array
– 42 142 GB FC disks
– FC switch
– 3 GFS nodes
  Dual Xeon (2.4 GHz)
  2 GB RAM
  Global File System (GFS)
  Served to the cluster via NFS
– 1 GFS lock server
Using DPCC
Two nodes available for interactive use
– master.dpcc.uta.edu
– grid.dpcc.uta.edu
More nodes will likely be added to support other services (Web, DB access)
Access through SSH (version 2 client)
– Freeware Windows clients are available (ssh.com)
– File transfers through SCP/SFTP
Using DPCC (continued)
User quotas are not yet implemented on home directories; be sensible in your usage.
Large data sets will be stored on the RAIDs (requires coordination with the systems administrator).
All storage is visible to all nodes.
Getting Accounts
Have your supervisor request an account
Account will be created
Bring your ID to 101 GACB to receive your password
– Keep your password safe
Log in to either interactive machine
– master.dpcc.uta.edu
– grid.dpcc.uta.edu
Use the yppasswd command to change your password
If you forget your password
– See me in my office and I will reset it (bring your ID)
– Call or e-mail me and I will reset it to the original password
User environment
Default shell is bash
– Change it with ypchsh
– Customize the user environment using startup files
  .bash_profile (login sessions)
  .bashrc (non-login)
– Customize with statements like those in the sketch below:
  export <variable>=<value>
  source <shell file>
– Much more information in the bash man page
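A minimal ~/.bash_profile sketch (the MPICH path appears later in these slides; the editor choice is just a placeholder):

# Share settings with non-login (.bashrc) shells
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi
# Put the cluster's MPICH tools (mpicc, mpirun) on the search path
export PATH=$PATH:/usr/local/mpich-1.2.5/bin
# Placeholder: preferred editor
export EDITOR=vi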
Program development tools
GCC 2.96
– C
– C++
– Java (gcj)
– Objective-C
– CHILL
Java
– Sun J2SDK 1.4.2
Development tools (cont.)
Python
– python = version 1.5.2
– python2 = version 2.2.2
Perl
– version 5.6.1
Flex, Bison, gdb…
If your favorite tool is not available, we'll consider adding it!
Batch Queue System
OpenPBS
– Server runs on master
– pbs_mom runs on the worker nodes
– Scheduler runs on master
– Jobs can be submitted from any interactive node
– User commands:
  qsub – submit a job for execution
  qstat – determine the status of a job, queue, or the server
  qdel – delete a job from the queue
  qalter – modify attributes of a job
– Single queue (workq)
PBS qsub
qsub is used to submit jobs to PBS
– A job is represented by a shell script
– The shell script can alter the environment and proceed with execution
– The script may contain embedded PBS directives
– The script (not PBS) is responsible for starting parallel jobs
Hello World
[mcguigan@master pbs_examples]$ cat helloworld
echo Hello World from $HOSTNAME
[mcguigan@master pbs_examples]$ qsub helloworld
15795.master.cluster
[mcguigan@master pbs_examples]$ ls
helloworld  helloworld.e15795  helloworld.o15795
[mcguigan@master pbs_examples]$ more helloworld.o15795
Hello World from node1.cluster
Hello World (continued)
Job ID is returned from qsub
Default attributes allow the job to run with:
– 1 node
– 1 CPU
– 36:00:00 CPU time
Standard output and standard error streams are returned as files (the .o and .e files above)
Hello World (continued)
Environment of the job
– Defaults to the login shell (override with #! or the -S switch)
– Login environment variable list with PBS additions:
  PBS_O_HOST, PBS_O_QUEUE, PBS_O_WORKDIR, PBS_ENVIRONMENT, PBS_JOBID, PBS_JOBNAME, PBS_NODEFILE, PBS_QUEUE
– Additional environment variables may be transferred using the -v switch, as in the sketch below
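An illustrative use of -v (the variable name and path are placeholders, not real cluster data):

qsub -v DATADIR=/data31/mydataset helloworld

The job script can then read $DATADIR to locate its input.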
PBS Environment Variables
PBS_ENVIRONMENT  PBS_JOBCOOKIE  PBS_JOBID  PBS_JOBNAME
PBS_MOMPORT  PBS_NODENUM  PBS_O_HOME  PBS_O_HOST
PBS_O_LANG  PBS_O_LOGNAME  PBS_O_MAIL  PBS_O_PATH
PBS_O_QUEUE  PBS_O_SHELL  PBS_O_WORKDIR
PBS_QUEUE=workq  PBS_TASKNUM=1
qsub options
Output streams
– -e (error output path)
– -o (standard output path)
– -j (join error + output as either output or error)
Mail options
– -m [aben] when to mail (abort, begin, end, none)
– -M who to mail
Name of job
– -N (max 15 printable characters; the first must be alphabetic)
Which queue to submit the job to
– -q [name] (unimportant for now)
Environment variables
– -v pass specific variables
– -V pass all environment variables of qsub to the job
Additional attributes
– -W specify dependencies
qsub options (continued)
-l switch used to specify needed resources
– Number of nodes: nodes=x
– Number of processors: ncpus=x
– CPU time: cput=hh:mm:ss
– Walltime: walltime=hh:mm:ss
See the pbs_resources man page
Hello World
qsub -l nodes=1 -l ncpus=1 -l cput=36:00:00 -N helloworld -m a -q workq helloworld
Options can be included in the script:
#PBS -l nodes=1
#PBS -l ncpus=1
#PBS -m a
#PBS -N helloworld2
#PBS -l cput=36:00:00
echo Hello World from $HOSTNAME
qstat
Used to determine the status of jobs, queues, and the server
$ qstat
$ qstat <job id>
Switches
– -u <user> list jobs of a user
– -f provides extra output
– -n shows the nodes given to a job
– -q status of the queue
– -i show idle jobs
qdel & qalter
qdel is used to remove a job from a queue
– qdel <job ID>
qalter is used to alter the attributes of a currently queued job
– qalter takes attributes similar to qsub; see the examples below
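Illustrative commands, reusing the job ID returned by qsub earlier (the new values are examples only):

qdel 15795.master.cluster
qalter -l cput=12:00:00 -N newname 15795.master.cluster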
Processing on a worker node
All RAID storage is visible to all nodes
– /dataxy, where x is the RAID ID and y is the volume (1-3)
– /gfsx, where x is the GFS volume (1-3)
Local storage on each worker node
– /scratch
Data-intensive applications should copy input data (when possible) to /scratch for manipulation and copy results back to RAID storage, as in the sketch below
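A minimal sketch of the copy-in/copy-out pattern (the data path, program name, and file names are hypothetical):

#!/bin/sh
#PBS -N scratch_example
#PBS -l nodes=1
# Stage input from shared RAID storage onto the node's local disk
cp /data31/input.dat /scratch/input.dat
cd /scratch
# Hypothetical analysis program reading the local copy
~/bin/myanalysis input.dat > output.dat
# Copy results back to shared storage and clean up the local disk
cp /scratch/output.dat /data31/
rm /scratch/input.dat /scratch/output.dat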
Parallel Processing
MPI installed on interactive and worker nodes
– MPICH 1.2.5
– Path: /usr/local/mpich-1.2.5
Asking for multiple processors (example below)
– -l nodes=x
– -l ncpus=2x
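For example, to ask for four of the dual-processor nodes (x = 4, so 8 CPUs; "myjob" is a placeholder script name):

qsub -l nodes=4 -l ncpus=8 myjob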
Parallel Processing (continued)
PBS node file is created when the job executes
Available to the job via $PBS_NODEFILE
Used to start processes on remote nodes
– mpirun
– rsh
Using node file (example job)
#!/bin/sh
#PBS -m n
#PBS -l nodes=3:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe
#PBS -o helloworld.out
#PBS -N helloworld_mpi
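# Count the processors PBS assigned (one line per processor in the node file)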
NN=`cat $PBS_NODEFILE | wc -l`
echo "Processors received = "$NN
echo "script running on host `hostname`"
cd $PBS_O_WORKDIR
echo
echo "PBS NODE FILE"
cat $PBS_NODEFILE
echo
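# Launch the MPI program across the assigned processors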
/usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld
MPI
Shared memory vs. message passing
MPI
– C-based library that allows programs to communicate
– Each cooperating process runs the same program image
– Different processes can do different computations based on the notion of rank
– MPI primitives allow construction of more sophisticated synchronization mechanisms (barrier, mutex)
helloworld.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size;
    char host[256];
    int val;

    /* Record which cluster node this process is running on */
    val = gethostname(host, 255);
    if (val != 0) {
        strcpy(host, "UNKNOWN");
    }

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    printf("Hello world from node %s: process %d of %d\n", host, rank, size);
    MPI_Finalize();
    return 0;
}
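helloworld.c prints from every rank but never exchanges a message. A minimal point-to-point sketch under the same MPICH setup (illustrative, not from the original deck; assumes the job runs with at least two processes):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sends a token to rank 1 */
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent token %d\n", token);
    } else if (rank == 1) {
        /* Rank 1 receives the token from rank 0 */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received token %d\n", token);
    }

    /* Wait for all processes before exiting */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}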
Using MPI programs
Compiling
– $ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c
Executing
– $ /usr/local/mpich-1.2.5/bin/mpirun <options> helloworld
Common options:
– -np number of processes to create
– -machinefile list of nodes to run on
Resources for MPI
http://www-hep.uta.edu/~mcguigan/dpcc/mpi
– MPI documentation
http://www-unix.mcs.anl.gov/mpi/indexold.html
– Links to various tutorials
Parallel programming course