Using the CVMFS for Distributing Data Analysis Applications for Fermilab Scientific Programs
A. Norman & A. Lyon
Fermilab Scientific Computing Division
CVMFS for Fermilab Science
• Not going to talk about the internals of CVMFS
– See J. Blomer et al. (CHEP 2012)
• Will cover key design choices for running in the FNAL IF environment
– Server Infrastructure
  • FNAL based (for LAN ops)
  • OSG based (for WAN ops)
• Case Study of Migration to CVMFS Operations:
– NOvA Experiment's code base
• Operational Experiences
Goal in Deploying CVMFS
• Introduce a new infrastructure to distribute input files to analysis jobs on the FNAL LAN
– Something that was NOT susceptible to failure modes seen with central (NFS) disk
– Meet the security needs of the lab environment
– Permit
[Figure: Central Disk Overload]
Requirements
• Scalable to the 10k's-of-concurrent-jobs level
• Compatible with the experiments' analysis suites
• Able to distribute files up to 2-4 GB in size (static data tables, flux ntuples, event template libraries)
• Centrally managed
• Compartmentalized failure modes (i.e. experiments don't adversely affect each other)
• Secure access for experiment librarians to update/maintain code repositories
• Extensible to non-local file distribution
2 Servers for 2 Environments
FNAL Server:
• Hosted by the Sci. Computing Division
• Designed for LAN operations at FNAL
• Scalable to meet the concurrency needs of FermiGrid
• Sized to accommodate large experiment code bases
• Designed to compartmentalize experiments' impact on each other
– Avoid BlueArc-style failures
• Designed to meet the data preservation needs of Tevatron Run II

Oasis Server (hosted by the OSG-GOC):
• Designed for WAN operation across all of OSG
• Designed to aggregate VOs into a single mount point to simplify site management and VOs' software availability
• Scalable to meet the concurrency needs of the OSG
• Sized to accommodate diverse code bases
• Client integrated into general OSG client software bundles
FNAL Infrastructure
• Separate repository servers for each experiment
• Redundant Stratum 1 servers for aggregating the individual repositories
• The CVMFS Master's role is ONLY to sign the repos
• BOTH the FNAL CVMFS repos and the OSG Oasis repo are available to the client and buffered by FNAL squids
FNAL Infrastructure
• Each experiment is insulated from failures in the other repositories.
– This includes lock-outs due to repo updates
• Permits scalability through the addition of repo servers
• Total outbound bandwidth can be scaled through additional Stratum 1s and squid proxies (a sketch of the client-side configuration that consumes them is shown below)
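To make the client side of this picture concrete, a minimal worker-node configuration is sketched here. The repository names come from the talk (novacfs.fnal.gov and oasis.opensciencegrid.org), but the squid and Stratum 1 host names are placeholders, not the actual FNAL values:

# /etc/cvmfs/default.local -- minimal client configuration (host names are placeholders)
CVMFS_REPOSITORIES=novacfs.fnal.gov,oasis.opensciencegrid.org
CVMFS_HTTP_PROXY="http://squid1.example.fnal.gov:3128|http://squid2.example.fnal.gov:3128"
CVMFS_QUOTA_LIMIT=20000        # local disk cache limit in MB

# /etc/cvmfs/config.d/novacfs.fnal.gov.local -- point the client at the redundant Stratum 1s
CVMFS_SERVER_URL="http://stratum1a.example.fnal.gov/cvmfs/@fqrn@;http://stratum1b.example.fnal.gov/cvmfs/@fqrn@"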
OSG Infrastructure
• Central repository for all VOs
– Each VO has a directory hierarchy available under the common /cvmfs/oasis.opensciencegrid.org/ base path, which maps to working areas within the repository servers (an illustrative layout is shown after this list)
– This allows for common/shared code across VOs (i.e. fermilab common products from ouser.fermilab are accessible transparently to the nova and g-2 VOs)
– Each VO has an independent logon to the repository server and can affect only files in its own area
– Updates/publication of the repository are unified (the update process rebuilds all VOs' areas)
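A purely illustrative view of what this single mount point gives a job; the per-VO subdirectory names below are assumptions (only oasis.opensciencegrid.org and ouser.fermilab appear in the talk):

# Everything is published under one base path; each VO owns its own subtree
ls /cvmfs/oasis.opensciencegrid.org/
#   fermilab/  nova/  gm2/  ...          (illustrative VO directories)

# Common products published from the ouser.fermilab logon are then usable
# transparently by the nova and g-2 VOs, e.g. something like:
ls /cvmfs/oasis.opensciencegrid.org/fermilab/products/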
[Diagram: the published view of the VOs under the common mount point vs. the private view of the VOs' sandboxes on the repository server]
Code Repository Structure
• Transitioning to a CVMFS infrastructure required that both the experiment code base and the external product dependencies be:
– Runnable from a read-only file system
  • Fix improper transient/temp files (made within module homes & within user areas)
  • Ownership/access corrections/enforcement
– Fully re-locatable at runtime or build time
  • Removal of absolute path references, assumptions about directory hierarchies, relocation of referenced files, etc. (a typical fix is sketched after this list)
  • External products need to resolve link-time dependencies correctly
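As a hypothetical illustration of the relocation work (the variable and file names below are invented; only the $CVMFS_DISTRO_BASE convention comes from the talk), a typical fix replaces a baked-in absolute path and redirects transient output away from the read-only tree:

# Before: absolute path and temp files written inside the (now read-only) module home
export FLUX_TABLE=/nova/app/externals/flux/flux_tables.root     # hypothetical name
scratch_dir=$MODULE_HOME/tmp

# After: everything resolved against the relocatable distribution base,
# transient files moved to a writable scratch area
export FLUX_TABLE=$CVMFS_DISTRO_BASE/externals/flux/flux_tables.root
scratch_dir=${_CONDOR_SCRATCH_DIR:-/tmp}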
Code Repository Structure

Under the CVMFS experiment base path, the distribution is split into three areas:

• External Products: relocatable UPS product hierarchy (versions, arch flavors)
– Seldom changes
– Used by the framework and different job classes
– Large storage footprint
– Very small run time footprint

• Experiment Code (version controlled): release tools, tagged releases (src code, user modules), and a development tree (src code, user modules)
– Experiment specific, daily changes
– Represents many different job classes
– Many small files
– Required for build systems

• Experiment Static Data: data and app areas
– Experiment specific but rarely changes
– Common across many job classes
– Large file sizes (GB+)
Nova Code Repository Structure

• $CVMFS_DISTRO_BASE: CVMFS experiment base path
• $CVMFS_DISTRO_BASE/externals: External Products, a relocatable UPS product hierarchy
– ART releases and deps (7+ versions), ~400 GB
– Arch flavors (SLF5/6, OSX)
– Full ART suite distributions for compatibility with all frozen nova releases
– System compatibility libraries (SLF5/6, Ubuntu)
– Merge-able with other UPS trees (common tools)
• $CVMFS_DISTRO_BASE/novasvn: Experiment Code (novasvn)
– SoftRelTools, scripts
– Tagged base releases: 7,120,000+ lines of code
– Development tree: 103 modules, 805,000+ lines of code
– Frozen releases for all active analyses; development snapshots (7 days); SRT release management
• $CVMFS_DISTRO_BASE/nova: Experiment Static Data (nova), split into data and app areas
– Data: flux files and PID libraries, including neutrino flux file libraries (200 GB) and neutrino interaction templates (140 GB)
– App: database interface and database support files
CVMFS Distribution/Release Deployment
• Goal is for everything to be deployable as "tar"-style archives to specific boxes in the distribution (i.e. products, releases, etc.)
– Design of the repository allows for independent updating of products & releases
– Also allows for patches to individual trees within boxes
• Deployment model uses arch-specific build machines
– Builds specific tagged/frozen releases for major architectures
– Auto (nightly) builds of experiment-specific development (an illustrative packaging step is sketched after this list)
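A minimal sketch of producing one such archive on an arch-specific build machine; the release tag and build paths are hypothetical:

# Package one frozen release as a "tar"-style archive for a single box of the distribution
RELEASE=S13-06-18                                    # hypothetical release tag
tar -czf nova_release_${RELEASE}_slf6.tar.gz -C /build/slf6/releases ${RELEASE}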
FNAL CVMFS Deployments
• Trivial deployment
– Librarian can copy tarballs to a staging area on the repository node
– Kick off an untar operation of the archives to the proper top node in the area to be published
– Initiate a rebuild of the CVMFS catalogs
  • Rebuild process returns status to allow for error checking by the librarian
[Diagram: experiment repository node with staging and publication volumes. User-initiated flow: deployment/release archives (2-400 GB per archive) are copied as release tarballs to the staging volume, untarred to the destination tree on the publication volume, followed by a CVMFS catalog update and publication to Stratum 1]
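A sketch of the librarian's three steps under assumed host and path names; the catalog-rebuild command itself depends on the CVMFS server version in use, so the CVMFS 2.1-style commands are shown only as an example:

# 1. Copy the release archive to the staging volume on the experiment's repository node
scp nova_release_S13-06-18_slf6.tar.gz librarian@nova-repo.example.fnal.gov:/staging/

# 2. On the repository node: untar it onto the proper top node of the publication area
tar -xzf /staging/nova_release_S13-06-18_slf6.tar.gz -C /publication/releases/

# 3. Rebuild the CVMFS catalogs and check the returned status.
#    With a CVMFS 2.1-style server the unpack would sit inside a transaction, e.g.:
#      cvmfs_server transaction novacfs.fnal.gov
#      tar -xzf ... -C /cvmfs/novacfs.fnal.gov/releases/
#      cvmfs_server publish novacfs.fnal.gov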
OSG CVMFS Deployments
• Dedicated staging space is limited
– Can not stage large files (i.e. 200 GB archives)
– Instead the librarian must do a streaming transfer/unpack of the tarballs directly into an area that is mirrored to the actual repository
– Initiate a sync of the mirror area to the real server
– Master catalog is rebuilt
– Note: a rebuild on Oasis affects ALL VOs. Repo maintenance can not be retriggered for your VO until existing builds are done.
[Diagram: deployment/release archives (2-400 GB per archive) are stream-untarred as release tarballs onto the mirror volume of the Oasis login node, synced to the publication volume of the Oasis repository, followed by a CVMFS catalog update and publication to Stratum 1]
cat nova_release.tar | gsissh oasis-opensciencegrid.org "tar -C <deploy_path> -xf -"
Distribution Setup
• Entire NOvA offline distribution can be set up (anywhere) using:

function setup_novaoffline {
  export CVMFS_DISTRO_BASE=/cvmfs/oasis.opensciencegrid.org
  export EXTERNALS=$CVMFS_DISTRO_BASE/externals
  export CODE=$CVMFS_DISTRO_BASE/novasvn
  source $CODE/srt/srt.sh
  source $CODE/setup/setup_novasoft.sh "$@"
}

• The CVMFS base path is detected using a block like:

for distro in ${distrolist[*]}
do
  if [ -r $distro/setup ]
  then
    export CVMFS_DISTRO_BASE=$distro
    return 0
  fi
done
return 1

With:

distrolist=(/cvmfs/novacfs.fnal.gov
            /cvmfs/oasis.opensciencegrid.org
            /nusoft/app/externals
            …)

• Hierarchical support for different CVMFS repos as well as central disk (BlueArc) legacy distributions
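Illustrative usage once the function above is sourced; the release flag is a hypothetical example of an argument forwarded to setup_novasoft.sh through "$@":

setup_novaoffline -r S13-06-18            # hypothetical release tag
echo "Distribution base: $CVMFS_DISTRO_BASE"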
Job Submission/Running
• After initializing the CVMFS-resident code distribution, running analysis is transparent to the end user
– All search and library load paths are searched properly
  • Base release files are picked up from the CVMFS distro
  • Test release files in local storage (overriding the base release)
– Configuration files are pulled from base or test releases properly
– Output is generated in the proper writable area
– Scripts for data handling are stored in CVMFS to simplify offsite copy back
(a sketch of such a job script follows this list)
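A rough sketch of what such a transparent job could look like; the executable, fcl configuration, event count, and release tag are assumptions used only to illustrate the pattern described above:

#!/bin/bash
# Read-only inputs (code, configs, static data) come from CVMFS; output goes to scratch
setup_novaoffline -r S13-06-18                       # hypothetical release tag
cd ${_CONDOR_SCRATCH_DIR:-/tmp}                      # writable working area
nova -c prodgenie_cosmics.fcl -n 100 -o out.root     # illustrative ART job
# data-handling copy-back scripts also live in CVMFS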
First Run Cache Footprint
[Plots: CVMFS client cache footprint vs. time for two first-run test Monte Carlo jobs (Test Monte Carlo Job #1 and Job #2)]
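For reference, a curve like this can be reproduced by sampling the client cache while the test job runs; the cache path below is a common default and would be replaced by the site's CVMFS_CACHE_BASE setting:

# Log the CVMFS client cache size (MB) once per second during a test job
while true; do
  echo "$(date +%s) $(du -sm /var/lib/cvmfs 2>/dev/null | cut -f1)"
  sleep 1
done >> cache_footprint.log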
Correlation with NOvA Job Flow
[Plot: the cache-growth curve annotated with the NOvA job stages that drive it: ART library load, ROOT geometry load, Geant init, Geant cross-section loading, ART module load (readout sim), Geant detector sim, readout simulation]
Repeat Run Cache Size
Repeated runs of the same (or similar) jobs start with a fully populated cache and take no startup penalty
Repeat Run Cache Size
Cache growth occurs as the simulation enters the detector sim and loads different interaction tables
Startup Overhead
• Measured job startup overhead using jobs which generate a single Monte Carlo event
– This is the minimum ratio of work to overhead
• Average job time (empty cache): 241.8 seconds
• Average job time (pre-loaded cache): 279.6 seconds
• Variation in the processing time per event completely dominates the measurement
[Plot: CVMFS job run time distribution (3343 trials); x-axis: generation time for 100 events (min); average event generation time = 198 s]
FermiGrid Running
• Large scale running against the FNAL CVMFS has been successful
• Demonstrated peak concurrent running of 2000 NOvA jobs using CVMFS for program + static data delivery
• 2000 concurrent jobs (from a single submission) is a limitation of the queue system (not CVMFS) and is being addressed
[Plot: 2000 concurrent CVMFS-based jobs running on FermiGrid]
OSG Oasis Running
• Started a pilot campaign with OSG to migrate generation of NOvA Monte Carlo to the Open Science Grid using Oasis-hosted CVMFS
– Phase 1: Generate 1 million cosmic ray interaction overlay events using specific sites
– Phase 2: Generate 16 million cosmic ray interaction overlay events using general OSG resources
– Phase 3: Reconstruct 16 million cosmic ray events using output from Phase 2
Phase 1: Results
• Generated 934,500 events across 10 OSG sites
• Oasis CVMFS was used for distribution of all job code
[Plot: jobs by site; a combination of "NOvA" dedicated sites and other general OSG sites, with a peak of ~1250 concurrent jobs using Oasis]
Summary
• Fermilab has designed, implemented, and deployed a new data handling infrastructure based on CVMFS
– Major legacy code bases have been ported/restructured to work with the system (including the NOvA & g-2 experiments)
– The new system is compatible with both Fermilab and OSG grid environments
– Large scale production has been successfully initiated using this system