helix nebula marketplace for end-users: the cern experience
TRANSCRIPT
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Helix Nebula Marketplace for end-users:!the CERN experience D. Giordano (CERN) The Helix Nebula Initiative & PICSE: Towards a European Open Science Cloud 26 June 2015
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015 2"Bob"Jones,"CERN"
LHC"compu5ng"resources"in"2014"• 3.4"billion"CPU"hours"• 100"PetaBytes"storage"
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
The Worldwide LHC Computing Grid
WLCG:!An International collaboration to distribute and analyse LHC data Integrates computer centres worldwide that provide computing and storage resource into a single infrastructure accessible by all LHC physicists
3
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Growing data and processing needs
• Very rough estimate of a new RAW data per year of running using a simple extrapolation of current data volume scaled by the output rates.
• To be added: derived data (ESD, AOD), simulation, user data…
PB
0.0"
50.0"
100.0"
150.0"
200.0"
250.0"
300.0"
350.0"
400.0"
450.0"
Run"1" Run"2" Run"3" Run"4"
CMS"
ATLAS"
ALICE"
LHCb"
We are here
Data: Outlook for HL-LHC
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Helix Nebula: CERN use case Evaluating the use of public cloud IaaS for LHC computing
– Transparent integration of cloud resources within experiments’ distributed computing software and services
– Evaluation of financial costs for processing, data transfer, storage
5
HNX Platform
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Example HNX procurement • CERN price enquiry for a small fraction of the
resources necessary to run the ATLAS experiment simulation software
• Several offers received
– Atos selected as contractor • Providing services via the SlipStream engine
(from SixSq SME) • Services delivered over the GÉANT network
6
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Summary results
• Production phase started in March’15 • Up to 3,000 concurrent running VMs
– 5 weeks of production – ~1.2 million CPU hours of processing!
• ATLAS GEANT4 Simulation of events
– ~11.5 million events processed ! ~80,000 jobs – ~9 hours single job duration – ~97% job efficiency
7
!!!
11.5 million
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Compared with other ATLAS cloud sites (March)
8
• Significant contribution compared with other ATLAS cloud sites running simulation – Largest # of processed events – Longest wall time consumption – High wall time efficiency
(*) CERN-PROD_CLOUD: 90% of processed events belongs to !
fast workloads (~24 s/evt)
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Compared with other ATLAS cloud sites (March)
8
• Significant contribution compared with other ATLAS cloud sites running simulation – Largest # of processed events – Longest wall time consumption – High wall time efficiency
(*) CERN-PROD_CLOUD: 90% of processed events belongs to !
fast workloads (~24 s/evt)
Wall time consumption in seconds [1.e9]
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Compared with other ATLAS cloud sites (March)
8
WallClock efficiency: success/all jobs
• Significant contribution compared with other ATLAS cloud sites running simulation – Largest # of processed events – Longest wall time consumption – High wall time efficiency
(*) CERN-PROD_CLOUD: 90% of processed events belongs to !
fast workloads (~24 s/evt)
Wall time consumption in seconds [1.e9]
0.8 1
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Provisioning & monitoring chain Cloud resources fully integrated in the computing system of the ATLAS experiment
9
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
Future procurement initiatives • Go beyond the simulation use-case of the 2014 price enquiry
– Include reconstruction and analysis workloads
10
• Two procurements to be
launched in July ‘15 – simulation use-case – full-chain:!
simulation!+ reconstruction !+ analysis
Helix Nebula Initiative & PICSE D. Giordano 26/06/2015
• Successful activity – Unprecedented capacity acquired and maintained in HNX for
CERN use case • 1.2 million CPU hours of processing (~11.5 million events)
• The use of commercial IaaS is technically feasible – Evaluated for simulation workloads
• Next focus on workloads more challenging for storage and network – Variations in APIs can be overcome with reasonable technical
effort – Independent consumer-side monitoring and benchmarking of
the used resources used is essential
Final remarks
11