Feedback from ATLAS
Speaker – Doug Benjamin (Duke University)
On behalf of the ATLAS collaboration
Contributors to talk: DB, Frank Berghaus, Alessandro De Salvo, Asoka De Silva, Andrej Filipcic, Ian Gable, John Hover, Ryan Taylor, Alex Undrus, Rod Walker
Outline of Talk
• ATLAS collaboration's use of the CVMFS file system
  o Use cases and issues
• ATLAS cloud computing
  o How CernVM is used, and at what scale
  o Why CernVM is not a universal choice across ATLAS
• Shoal
  o An implemented and tested solution for proxy cache discovery
CVMFS in ATLAS
• Initially deployed in the ATLAS Tier 3 community as a way to trivially deploy software to sites with little computing support.
• Good ideas spread fast – ATLAS quickly realized its power, and use eventually spread to all ATLAS grid computing sites.
• CVMFS is used by all aspects of ATLAS offline computing, from grid and batch jobs to users' interactive analysis
  o LXPLUS, Tier 1 computing centers, local off-grid clusters, and individual physics laptops
Types of data served via CVMFS
• ATLAS offline software releases
  o atlas.cern.ch – total size 653 GB, grew 144 kB in the last 3 months
• ATLAS conditions data too big for the Oracle DB
  o atlas-cond.cern.ch – total size 515 GB, grew 11 kB in the last 3 months
• CERN SFT software repository
• ATLAS nightly software releases
• User interface software for interactive use
• Site configuration data / configuration agents
• Alessandro De Salvo: "In general, everything that you want to be distributed to many sites while centrally managing a single master copy"
• CVMFS client stability/reliability
  o Saw file corruptions with old CVMFS v2.0 clients
  o Since v2.1 – problem solved!
CVMFS over NFS
• Use in ATLAS pioneered, and early testing done, by NorduGrid
• SiGNET (4k cores) with NFSv4 exports and a single NFS server for over 2 years – works rather well.
  o Except when the server gets into a strange state with NFS problems and is rebooted; the entire cluster then needs to remount NFS (a couple of times per year).
  o Stability is improved with recent kernels (3.12) with better fuse/NFSv4 support on the NFS server; client nodes unchanged.
  o If many software validation jobs (~100) with different releases are running, the jobs can take a very long time to complete (10 h) and some might crash.
• DESY uses CVMFS over NFS
• In the CA cloud, SFU uses it as a test bed – happy with it
  o They have tested CVMFS over NFS on SL7 (the latest test version of CVMFS, announced a few weeks ago) on the NFS server and exported it successfully.
CVMFS and HPCs (i.e. supercomputers)
• CVMFS works on many HPC sites already – cluster-like sites, typically smaller
• The largest sites, with thousands of nodes and cores, have high-speed, very powerful parallel file systems, but…
  o Nodes have no TCP stack and no connectivity to the outside
  o Many simultaneous I/O operations negatively affect the parallel file systems
  o FUSE on the compute nodes is not an option
• We need some straightforward tool to cache the files we need in the compute nodes' memory, if small enough – a new project, perhaps?
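Such a tool does not exist yet; as a minimal sketch of the idea, the following hypothetical helper copies a (small enough) release subtree into a node-local memory-backed cache such as a tmpfs mount. The function name and the size-limit policy are assumptions for illustration, not an existing ATLAS tool.

```python
import os
import shutil


def preload_release(src_dir, cache_dir, max_bytes):
    """Copy a software release tree into a node-local (e.g. tmpfs)
    cache directory, refusing to exceed max_bytes of cache space.
    Returns the number of bytes copied.  Hypothetical sketch only."""
    copied = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target = os.path.join(cache_dir, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if copied + size > max_bytes:
                # Release does not fit in memory -- the hard case the
                # slide alludes to; a real tool would need eviction or
                # partial caching here.
                raise RuntimeError("release too large for memory cache")
            shutil.copy2(path, os.path.join(target, name))
            copied += size
    return copied
```

On an HPC compute node, `cache_dir` would typically point at `/dev/shm` or another RAM-backed mount, avoiding both outside connectivity and parallel-file-system load at job time.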
CVMFS and ATLAS nightly code releases
(slide from Alex Undrus)
• To aid ATLAS software development and debugging, ATLAS produces releases that change every night.
• The nightly CVMFS repository demonstrates another practical CVMFS use case: a repository for short-term storage.
• The improved CVMFS server version 2.1.19 made this use case easier to handle:
  o Reduced publish times
  o Improved garbage handling
• With daily removals/additions, disk usage only doubled during 4 months of operation
• With the previous CVMFS server version, disk usage grew much faster
• However, ATLAS cannot operate CVMFS servers without CERN IT assistance
  o The nightly CVMFS repository is handled differently than the other repositories – should it be?
• We recommend CERN IT invest a bit more effort in CVMFS support.
  o For ATLAS, CVMFS and AFS are almost equally important
CVMFS and ATLAS nightly code releases (2)
(slide from Alex Undrus)
• Use cases for ATLAS nightly releases on CVMFS include:
  o Use at remote Tier 3s: CVMFS is faster at remote locations
  o Use on systems that do not have AFS for some reason
  o Use for limited production on the Grid (rarely used by ATLAS, but possible)
  o Use for nightlies validation at Grid sites (through the HammerCloud system, which fully supports nightlies testing)
• Number of repositories: 1 (cvmfs/atlas-nightlies.cern.ch/data), size: 0.27 TB
• Amount of data served: ~0.3–0.4 TB (varies day by day)
• Repository growth: none – 6 nightly releases are installed daily
• How do we deal with mistakes?
  o Mis-installation: due to the transient nature of nightlies, releases with installation problems are simply removed (but such problems are rare)
• How often do we remove data from the repositories?
  o There are 6 releases of 6 nightly branches available (36 releases total). Every day, 1 of the 6 releases of each nightly branch is replaced with a newly built nightly release.
  o A substantial amount of data is removed every day (but many files in the replacement release are unchanged; the percentage varies from 100% to ~30%)
CVMFS section conclusions
• CVMFS is a critical part of the ATLAS computing infrastructure.
  o User tools for interactive work
  o Offline software (both static releases and nightlies)
  o Conditions data
  o Site configuration code and data
• Large HPCs represent new challenges and opportunities.
  o They can be much different from Grid or Cloud
  o Hopefully CVMFS can be part of the solution
Sept 17, 2014 ATLAS Software & Computing Workshop 11/22
Cloud Scheduler
[Diagram: a User submits jobs to the HTCondor Central Manager; Cloud Scheduler watches scheduler status and, through cloud interfaces (OpenStack, Google, …), boots Virtual Machines on the clouds. Each VM contextualizes to attach to the condor pool and processes jobs; Cloud Scheduler retires a VM when no jobs require that VM.]
CernVM in ATLAS
(Frank Berghaus)
• Features:
  o Operating system and project software are made available over CVMFS
  o Cloud-init is used for contextualization of images
  o The same image works anywhere, on any hypervisor, and on any cloud type
• Dynamic Condor slot configuration
  o Multi-core and single-core
ATLAS Cloud Production in 2014
[Plot: completed jobs, Jan 2014 – Sep 2014]
• Over 1.2M ATLAS jobs completed (as of September 2014)
• Mostly single-core production
• Clouds used: CERN, North America, Australia, UK, Amazon, HelixNebula
• New ATLAS requirements: multi-core, high memory
ATLAS Cloud 2014
[Plots: daily running job slots (0–20k) and CPU consumption, Jan '14 – Jan '15]
• CERN PROD – 394 CPU years (CernVM)
• IAAS – 202 CPU years (CernVM)
• BNL/Amazon – 118 CPU years
• ATLAS@HOME – 89 CPU years (CernVM)
• NECTAR – 35 CPU years
• GRIDPP – 17 CPU years (CernVM)
• CERN HLT farm at Point 1 – Sim@P1
  o For Run 2: runs when the LHC is off for 24 hrs, under complete control of the Online group
  o 1.5 hrs to saturate the farm with VM instantiation
  o Running a 10 GB cvmfs cache without issues
Why do some sites not use CernVM?
(John Hover, BNL)
• RACF/USATLAS builds its own images for several reasons:
  1. First and foremost, we are OSG sites. Part of our contribution back to OSG is to provide methods and tools usable by any OSG VO, many from other sciences, e.g. biology.
  2. CernVM, as a CERN product, is perceived to be more-or-less HEP oriented. How does one change this perception?
  3. CernVM is an artifact produced by a service at CERN, and only reproducible by them.
  4. USATLAS provides both build-time and run-time customization via a (user-usable) toolset – a product, freely available for everyone's use, heavily based on existing 3rd-party utilities (imagefactory, oz, RPM, puppet, etc.).
• What would it take for USATLAS/OSG to adopt CernVM?
  o "If the CernVM team were to release the complete toolset, and support it so anyone can generate their own customized VM, then we'd seriously consider it." – John Hover
  o The issue is really service vs. product. People are more willing to try products than services.
CernVM and Cloud Computing Conclusions
• ATLAS routinely uses CernVM as part of its cloud portfolio
• Thought should be given to how to attract others outside of HEP and astrophysics to using CernVM.
Shoal – Dynamic web cache discovery system
Ian Gable
https://github.com/hep-gc/shoal
Frank Berghaus, Andre Charbonneau,
Mike Chester, Ron Demarais, Colson
Driemel, Rob Prior, Alex Lam, Randall
Sobie, Ryan Taylor
• Squid caches advertise their existence every 30 seconds via AMQP messages to the shoal server; each squid also sends network load information.
• VMs contact the server's REST interface to get the closest squid.
• The server uses GeoIP libraries to determine the closest squid for each client that contacts it.
• Squids that don't send messages regularly are automatically removed.
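The server-side selection logic described above can be sketched in a few lines. This is a simplified illustration, not shoal's actual code: real shoal resolves client and squid locations with GeoIP, whereas here coordinates are assumed to be pre-resolved, and the heartbeat timeout value is an assumption.

```python
import math
import time


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_squids(squids, client_lat, client_lon, now=None, max_age=180):
    """Drop squids whose last AMQP heartbeat is older than max_age
    seconds (shoal removes quiet squids automatically), then order
    the survivors by distance to the client's location."""
    now = time.time() if now is None else now
    alive = [s for s in squids if now - s["last_active"] <= max_age]
    return sorted(alive, key=lambda s: haversine_km(
        client_lat, client_lon, s["lat"], s["lon"]))
```

A production server would additionally weigh the per-squid load figures the agents report, not distance alone.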
Three components
• shoal-server: maintains the list of running squids. It uses RabbitMQ to handle incoming AMQP messages from squid servers, provides a REST interface for programmatically retrieving a JSON-formatted ordered list of squids, and also provides a web interface for viewing the list.
• shoal-agent: runs on squid servers and publishes the load and IP of the squid server to the shoal-server in a JSON-formatted AMQP message at regular intervals. The agent is designed to be trivially installed in a few seconds with python's pip tool or yum.
• shoal-client: a reference client that can be used to contact the REST interface of the shoal-server to configure CVMFS.
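The client side amounts to turning the server's ordered squid list into a CVMFS proxy setting. A minimal sketch, assuming a response shape of rank-keyed entries with `hostname` and `squid_port` fields (an illustrative assumption; consult the shoal docs for the exact REST payload):

```python
import json


def proxy_config_line(response_text, max_squids=3):
    """Turn a shoal-server JSON response (assumed shape:
    {"0": {"hostname": ..., "squid_port": ...}, "1": ...}, ordered
    nearest-first) into the CVMFS_HTTP_PROXY line a client would
    write into its CVMFS configuration."""
    squids = json.loads(response_text)
    ordered = [squids[str(i)] for i in range(min(len(squids), max_squids))]
    proxies = ["http://%s:%s" % (s["hostname"], s["squid_port"])
               for s in ordered]
    # Semicolon-separated proxies are tried in order by CVMFS.
    return 'CVMFS_HTTP_PROXY="%s"' % ";".join(proxies)
```

In the real shoal-client this line ends up in the local CVMFS configuration so that subsequent repository accesses go through the nearest advertised squid.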
Reliability features
• 'Verification': each squid can be verified by the server, i.e. the server confirms that the CVMFS repos are accessible through that squid. Currently this is tested by checking the availability of the .cvmfswhitelist file from multiple Stratum 1 servers, for example:
  "http://cvmfs.racf.bnl.gov:8000/cvmfs/atlas/.cvmfswhitelist"
  This protects against accidentally misconfigured agents.
• Configurable bandwidth limit on the agent: set the maximum bandwidth you want your squid to serve before shoal stops handing it out.
• Configurable access levels on the agents: a squid can be advertised as "No outside access", "Global", etc. (see docs).
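The whitelist check behind 'verification' can be sketched as follows. The fetcher is injected so the logic is testable; the built-in fetcher is a plausible stdlib implementation, not shoal's actual code.

```python
import urllib.request


def make_fetcher(timeout=10):
    """Return a function that fetches target_url through the given
    HTTP proxy and reports whether the GET succeeded."""
    def fetch_via_proxy(proxy_url, target_url):
        handler = urllib.request.ProxyHandler({"http": proxy_url})
        opener = urllib.request.build_opener(handler)
        try:
            with opener.open(target_url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False
    return fetch_via_proxy


def verify_squid(proxy_url, whitelist_urls, fetch=None):
    """A squid passes verification if at least one Stratum 1
    .cvmfswhitelist is reachable through it -- guarding against
    accidentally misconfigured agents."""
    fetch = fetch or make_fetcher()
    return any(fetch(proxy_url, url) for url in whitelist_urls)
```

Checking against multiple Stratum 1 servers, as the slide notes, avoids flagging a healthy squid just because one upstream server is down.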
Scalability
• With 800 squids advertising every 30 seconds, the shoal server is able to serve 1000 REST requests per second for the nearest squid, using a 2012-generation 16-core server. No real attempt to optimize yet.
• CHEP paper: http://dx.doi.org/10.1088/1742-6596/513/3/032035
Deploying
• Now used for IaaS ATLAS clouds
• All components available as SL RPMs from a yum repo
• Shoal agent and client available from python pip
• The client in particular has a trivial dependency on only python 2.4+
• All docs and installation instructions available on GitHub: https://github.com/hep-gc/shoal
Experimental Docker Support
• Containers for all three components
• Automated builds that start from the CentOS image (could be changed)
• Very simple Dockerfile
Conclusions
• Shoal solves the need for squid discovery for cloud clients.
• ATLAS uses it at scale and welcomes others to use it as well.
• CVMFS and CernVM are software tools very important to the ATLAS experiment.
• Our collaboration with the CernVM team over the years has been, and continues to be, very successful and enjoyable.
• We hope to continue this collaboration as we expand more and more into HPCs.
Infrastructure-as-a-Service (IaaS) Clouds
• IaaS cloud: a pool of virtual machine hypervisors presenting a single controller interface
• Run many instances of one virtual machine configured for ATLAS computing
• Advantages:
  o Isolates complex application software from site administration
  o Minimizes dependence on the local system
  o Flexible resource allocation
• Examples: OpenStack, Nimbus; commercial clouds: Amazon, Google, etc.
• Running at labs (e.g., CERN), universities (e.g., Victoria), and research networks (e.g., GridPP)
Cloud Scheduler
• Cloud Scheduler is a python package for managing VMs on IaaS clouds
• Users submit HTCondor jobs; optional attributes specify virtual machine properties
• Dynamically manages the quantity and type of VMs in response to user demand
• Easily connects to many IaaS clouds, and aggregates their resources
• Provides IaaS resources in the form of an ordinary HTCondor batch system
• Used by ATLAS, Belle II, CANFAR, and BaBar
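The "optional attributes" mentioned above are custom `+` attributes in the HTCondor submit description that Cloud Scheduler reads from the job ad. A hedged sketch of generating such a description; the attribute names used here (VMType, VMCPUCores) are illustrative assumptions, not Cloud Scheduler's definitive schema:

```python
def submit_description(executable, vm_attrs):
    """Render a minimal HTCondor submit description whose custom '+'
    attributes describe the VM the job needs.  Attribute names are
    illustrative only -- see the cloud-scheduler docs for the real set."""
    lines = ["universe = vanilla", "executable = %s" % executable]
    for key, value in sorted(vm_attrs.items()):
        # Custom ClassAd attributes are prefixed with '+' and quoted.
        lines.append('+%s = "%s"' % (key, value))
    lines.append("queue")
    return "\n".join(lines)
```

Cloud Scheduler then matches pending job ads against available cloud resources and boots (or retires) VMs of the requested type accordingly.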
Code: https://github.com/hep-gc/cloud-scheduler
Website: http://cloudscheduler.org/
Publication: http://arxiv.org/abs/1007.0050