TRANSCRIPT
Experiences deploying Clusterfinder on the grid
Arthur Carlson (MPE)
7th AstroGrid-D Meeting
TUM, 11th-12th June 2007
Experiences deploying Clusterfinder on the grid
• What is the deployment problem?
• A prototype solution using
  – “grid-modules”
  – “environments”
• Status and conclusions
Deployment is when ...

each of many users
can (build and) run each of many applications
on each of many hosts.

“each” = >90%, “many” = >10

What has to be dealt with:
• certificates/password files
• VOs (update of grid-mapfile, sharing software)
• firewalls
• repository/distribution/version control
• data access
• “standard software” (compiler, ...)
• environment
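The thresholds above imply a lot of combinations even in the minimal case; a quick back-of-the-envelope check, using 10 for each “many” as the slide's lower bound:

```shell
#!/bin/sh
# With the slide's thresholds ("many" = >10, "each" = >90%), even the
# minimal case of 10 users x 10 applications x 10 hosts means hundreds
# of (user, application, host) triples must work.
users=10 apps=10 hosts=10
combinations=$((users * apps * hosts))
must_work=$((combinations * 90 / 100))
echo "$combinations combinations, >$must_work must work"
```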
grid-modules
A prototype system for getting software from where it is maintained to where it is used.
• Inspired by the environment modules package
  – load/unload (PATH)
  – initadd/initclear (.profile)
• for software from a remote repository
  – update/deinstall
  – build/clean
  – test
grid-modules: install and use
• grid-modules-clone NEWHOST(LIST)
• also copies ~/.subversion for passwords
• grid-module [update|load|initadd|build|test] [gridmod|env|gmon|cf|proc|gat]
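The slides do not show how grid-module routes an action to a module, so here is a minimal sketch of one plausible dispatch, assuming a per-module script directory and a `<module>.<action>` naming convention (both are illustrative assumptions, not the actual grid-modules internals):

```shell
#!/bin/sh
# Hypothetical dispatcher sketch: route "grid-module ACTION MODULE"
# to a per-module customization script. MODULE_DIR and the
# "<module>.<action>" file naming are assumptions for illustration.
MODULE_DIR=${MODULE_DIR:-$HOME/grid-modules/scripts}

grid_module() {
  action=$1
  module=$2
  script="$MODULE_DIR/$module.$action"
  if [ -f "$script" ]; then
    # source, not execute, so that exported variables persist
    # in the calling shell (as load/initadd require)
    . "$script"
  else
    echo "no $action script for module $module" >&2
    return 1
  fi
}
```

Sourcing rather than executing matters for actions like load, which must change PATH in the user's current shell.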
grid-modules: adding modules

• set_module_info

agd_rep='svn://svn.gac-grid.org/software'
planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
all_modules='gridmod cf proc'
case $module in
  gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin;;
  cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
  proc)    rep=$planck_rep/ProC-2.3; frag=proc/build/dist/bin;;
  *)       rep=unknown; frag=unknown;;
esac

• customization scripts

===== proc.build =====
cd ~/grid-modules/proc/ProC-base
ant

===== proc.load =====
mkdir -p $HOME/.planck
echo "allowIncompleteConf = true" > "$HOME/.planck/pipelinecoordinator.pref"

===== proc.unload =====
rm -r $HOME/.planck
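For reference, the lookup above can be packaged as a self-contained, testable function; the URLs and module names are taken from the slide, while the function wrapper itself is an assumption:

```shell
#!/bin/sh
# Sketch: set_module_info as a function mapping a module name to its
# repository URL and bin-directory fragment. The function wrapper is
# an assumption; the URLs and cases are those shown on the slide.
set_module_info() {
  module=$1
  agd_rep='svn://svn.gac-grid.org/software'
  planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
  case $module in
    gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin;;
    cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
    proc)    rep=$planck_rep/ProC-2.3; frag=proc/build/dist/bin;;
    *)       rep=unknown; frag=unknown;;
  esac
}
```

Adding a module then means adding one case line plus its customization scripts.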
environments
A prototype system for making different hosts look alike.
• Does a required software package exist on a remote host, and where is it installed?
  export IMAGEMAGICK_HOME=/usr/local/ImageMagick-6.3.2
• Make it available!
  export PATH=$PATH:/usr/local/ImageMagick-6.3.2/bin
• Host-specific information must be maintained by somebody somewhere.
  – require modules or take the bull by the horns
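The probe-and-export step above can be sketched as a small helper; the candidate-prefix list and the function name are illustrative assumptions, not part of the environments system:

```shell
#!/bin/sh
# Sketch: look for ImageMagick under a few candidate install prefixes
# and export the variables the slide uses. The prefix list and the
# helper name are assumptions for illustration.
find_imagemagick() {
  for prefix in "$@"; do
    if [ -x "$prefix/bin/convert" ]; then
      export IMAGEMAGICK_HOME=$prefix
      export PATH=$PATH:$prefix/bin
      return 0
    fi
  done
  echo "ImageMagick not found" >&2
  return 1
}
```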
environments: load_env
The trick is to find the right scripts to execute for each host.
if ! hostname=`hostname -f 2>/dev/null`; then hostname=`hostname`; fi
scripts=`sed -n "s/^ *$hostname *//p" <<EOF
astrogrid.aei.mpg.de aei
buran.aei.mpg.de aei
lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
...
EOF`
cd ~/grid-modules/env/bin
source ./default
if [[ -f local ]]; then
  echo sourcing local environment script
  source local
elif [[ "$scripts" ]]; then
  echo For $hostname sourcing these scripts: $scripts
  for script in $scripts; do source ./$script; done
fi
This may need to be changed when adding a new host.
environments: scripts
The work is done in the scripts.

===== default =====
export GSL_INCL=-I/usr/include
export GSL_LIBS=-L/usr/lib
export IMAGEMAGICK_INCL=-I/usr/include/
export IMAGEMAGICK_LIBS=-L/usr/lib/
export FC='gfortran -std=gnu -fno-second-underscore'
export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
export F_COMMONFLAGS='-W -Wall -Wno-uninitialized -Wno-unused -O2 -Wfatal-errors $(F_PORTABILITY_FLAGS)'
export FCFLAGS='-c $(F_COMMONFLAGS) -I$(INCDIR)'
export CC=gcc
export CCFLAGS_NO_C='-W -Wall -I$(INCDIR) $(GSL_INCL) $(IMAGEMAGICK_INCL) -fno-strict-aliasing -O2 -g0 -s -ffast-math'
export CCFLAGS='$(CCFLAGS_NO_C) -c'
===== lrz =====
export GSL_INCL='$(GSL_INC)'
export GSL_LIBS='$(GSL_SHLIB) $(GSL_BLAS_SHLIB)'
export ANT_HOME=/lrz/sys/apache-ant-1.6.5
module load gsl
module load java
module load gcc/4.1.0
module load g95
module load mpi.shmem/gcc
export PATH=/lrz/sys/jdk1.5.0_07/bin:${PATH}

===== g95 =====
export FC=g95
export F_PORTABILITY_FLAGS=-DPLANCK_G95
Defaults can be overridden.
Defaults work in most cases.
Cooperates with modules.
New scripts may need to be written for new hosts.
Status
• ca. 23 AGD hosts + 9 DGI hosts are accessible
• F90 build of Clusterfinder successful on 22 hosts (70%)
• Some of the problems experienced:
  – difficulty finding FQDNs of resources, hosts listed by mistake
  – gsissh disabled
  – default job factory type disabled for globusrun-ws
  – no gsiscp installed, or unexpected default ports
  – svn not installed, too old, or not allowed to connect
  – shell not bash, .profile not processed with batch jobs
  – file quota too small
  – some hosts (lx[32|64]ia1 at LRZ) share a file system
  – no F90 compiler installed, or hard to find
  – deep changes in grid-modules are hard to update
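Several of the tooling problems in this list could be caught up front with a quick probe of each new host; a minimal sketch (the tool list and function name are illustrative, not part of grid-modules):

```shell
#!/bin/sh
# Sketch: probe a host for tools the problem list above mentions
# (svn, gsissh, an F90 compiler, ...). The tool list is passed in
# by the caller; the function name is an assumption.
probe_host() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "MISSING: $tool"
      missing=$((missing + 1))
    fi
  done
  return $missing   # nonzero if anything is missing
}
```

Run once per host, e.g. `probe_host svn gsissh gfortran bash`, before attempting a deployment.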
Conclusions
• Clusterfinder has been deployed on “many” hosts using a prototype deployment system that is “easily” extendable to many users and many applications.
• The system handles diversity without standing in the way of defining standards.
• AGD should use this system or decide on something better, but should not diverge.