Status and future evolution of the ATLAS Offline Software Rolf Seuster (TRIUMF)

  • Status and future evolution of the ATLAS Offline Software

    Rolf Seuster (TRIUMF)

  • 13-17 April 2015 Rolf Seuster (TRIUMF) CHEP 2015: Status of offline ATLAS SW 2

    Introduction

    ● Since the end of LHC Run 1, ATLAS has improved the full analysis chain, from RAW data to final ntuples, addressing issues identified as potential problems for Run 2

    ● This talk shows the big picture, summarizes the progress, and refers to other talks and posters with more details

  • Challenges for Run 2

    ● The LHC will deliver higher √s and higher instantaneous luminosity with 25 ns bunch spacing, as originally designed; expect ~40 interactions per bunch crossing

    ● L1 trigger bandwidth increased to 100 kHz (was 75 kHz)

    ● To cope with the higher rate of interesting events, the EF output rate also increases to 1 kHz (was ~400 Hz in Run 1)
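A back-of-the-envelope reading of the rates above (simple ratios of the quoted values, not additional ATLAS figures): the trigger stages after L1 must keep about one event in 100, and offline reconstruction must handle about 2.5 times more events per second of data-taking than in Run 1, each with more pile-up.

```latex
\[
\frac{100\ \mathrm{kHz}}{1\ \mathrm{kHz}} = 100,
\qquad
\frac{1\ \mathrm{kHz}}{400\ \mathrm{Hz}} = 2.5
\]
```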

  • Achievements: reduction of processing time

    ● Reconstruction sped up by a factor of ~4 to cope with the 1 kHz EF rate

    ● In Run 1 the Inner Detector dominated the processing time, so most of the work was done there:

    – the more modern matrix library Eigen replaced CLHEP: faster thanks to SIMD intrinsics, ...

    – general code cleanups; magnetic field code rewritten in C++

    – 'free' lunch: newer gcc, Intel math library, SLC6, 64-bit builds

    ● More details in the talk by Andi Salzburger (Abstr. 209) and the poster by Jovan Mitrevski (Abstr. 147)

  • Achievements: memory usage reduction

    ● 64-bit executables are now used by default: ~25% faster, but with 25-50% more memory consumption

    – some ATLAS workloads then use over 2 GB / core RSS

    – use athenaMP to lower the total memory usage

    – fork workers and rely on the OS copy-on-write (CoW) feature

    – overhead caused by the sequential part: initialize / finalize, plus merging of outputs if needed

    – problem: most OS tools report misleading numbers

    ● For the status of athenaMP, see the talk by Vakho Tsulaia (Abstr. 165)
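The fork-and-share idea behind athenaMP can be sketched in a few lines of POSIX C++. This is a minimal illustration under stated assumptions, not athenaMP code: buildSharedState, processEvent and runWorkers are hypothetical names, the "state" vector stands in for the geometry and conditions loaded at initialization, and events are distributed round-robin.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the expensive, read-only state built once during
// initialize() (geometry, conditions, ...). After fork() the workers share
// these pages with the mother process until someone writes to them (CoW).
static std::vector<double> buildSharedState(std::size_t n) {
    std::vector<double> state(n);
    std::iota(state.begin(), state.end(), 0.0);
    return state;
}

// Hypothetical per-event work that only reads the shared state.
static double processEvent(const std::vector<double>& state, int event) {
    return state[static_cast<std::size_t>(event) % state.size()];
}

// Mother process: initialize once, fork nWorkers children, wait for all of
// them. Worker w processes events w, w + nWorkers, w + 2*nWorkers, ...
// Returns the number of workers that exited successfully.
int runWorkers(int nWorkers, int nEvents) {
    const std::vector<double> state = buildSharedState(std::size_t{1} << 20);  // sequential part
    std::vector<pid_t> children;
    for (int w = 0; w < nWorkers; ++w) {
        const pid_t pid = fork();
        if (pid < 0) {
            std::perror("fork");
            break;
        }
        if (pid == 0) {  // worker: reads 'state' without copying it
            double sum = 0.0;
            for (int ev = w; ev < nEvents; ev += nWorkers) sum += processEvent(state, ev);
            // A real worker would write its output to its own file here, to be
            // merged afterwards. _exit skips destructors and inherited stdio buffers.
            _exit(sum >= 0.0 ? 0 : 1);  // sanity check: all state values are non-negative
        }
        children.push_back(pid);
    }
    int succeeded = 0;
    for (const pid_t pid : children) {  // sequential part: wait for all workers
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++succeeded;
    }
    return succeeded;
}
```

For example, runWorkers(8, 1000) forks eight workers and returns 8 if all of them exit cleanly. The saving comes from the workers only reading "state": the kernel keeps one physical copy of those pages, and only pages a worker writes to are duplicated for that worker.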

  • AthenaMP: reporting the right numbers

    – black/violet: correct measures, i.e. PSS via smaps, or cgroups; the latter do not include sharing via memory-mapped files or shmem (MP EventLoopMgr)

    – green/red: VMem and RSS from smaps, obviously wrong numbers, but used by many queueing systems

    ● The problem: most OS tools don't take sharing via CoW into account

    ● ATLAS developed its own tool; cgroups also work

    ● athenaMP with 8 workers: ~15 GB actually in use in total, i.e. ~1.9 GB / core, below 2 GB / core
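On Linux, PSS can be read from /proc/<pid>/smaps: each mapping lists "Rss:" and "Pss:" in kB, and PSS divides every shared page among the processes sharing it. Summing PSS over the mother and all workers therefore counts each shared page once, whereas summing RSS counts it once per process. The helper below is a minimal sketch of that summation (MemTotals, sumSmaps and sumSmapsOfPid are hypothetical names); it is not the tool ATLAS developed.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Totals, in kB, of the "Rss:" and "Pss:" fields of one smaps file.
struct MemTotals {
    long rssKb = 0;
    long pssKb = 0;
};

// Sums the Rss:/Pss: fields of an smaps-formatted stream. Mapping header lines
// ("00400000-0040b000 r-xp ...") and non-numeric lines ("VmFlags: rd ex") fail
// to parse a number and are skipped; "Pss_Anon:", "SwapPss:" etc. are ignored.
MemTotals sumSmaps(std::istream& in) {
    MemTotals totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        if (!(fields >> key >> value)) continue;
        if (key == "Rss:") totals.rssKb += value;
        else if (key == "Pss:") totals.pssKb += value;
    }
    return totals;
}

// Reads /proc/<pid>/smaps; returns all zeros if the file cannot be opened.
MemTotals sumSmapsOfPid(int pid) {
    std::ifstream file("/proc/" + std::to_string(pid) + "/smaps");
    return sumSmaps(file);
}
```

For an athenaMP job one would call sumSmapsOfPid for the mother and each worker PID and add up the pssKb values; adding up rssKb instead reproduces the inflated numbers shown in green/red.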

  • User experience: new data model

    ● xAOD, ATLAS' new analysis-focused EDM:

    – performant access in both ROOT and athena

    – streamlined layout, single base class defining the interface

    ● Lightweight objects, with additional data kept in an AuxStore

    – no transient/persistent layer; rely on ROOT schema evolution

    ● Structure-of-Arrays-like memory layout

    – better data locality

    – possibility to auto-vectorize, for a further performance boost

    ● See talk by Scott Snyder (Abstr. 182)
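To make the Structure-of-Arrays idea concrete, here is a toy version with hypothetical class names (AuxStore, Particle, scalePt). It only illustrates the layout; the real xAOD classes are considerably richer, and the sketch makes no claim about their interfaces. Each variable is one contiguous std::vector<float>, and the interface object holds nothing but a pointer and an index.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical aux store: each variable is one contiguous column
// (structure of arrays) instead of one struct per object (array of structures).
class AuxStore {
public:
    // Returns the column, creating it if needed, so new variables can be added at run time.
    std::vector<float>& column(const std::string& name) { return columns_[name]; }
    // Read-only access; throws std::out_of_range for an unknown variable.
    const std::vector<float>& column(const std::string& name) const { return columns_.at(name); }

private:
    std::map<std::string, std::vector<float>> columns_;
};

// Hypothetical lightweight interface object: only a store pointer and an index;
// all data lives in the aux store.
class Particle {
public:
    Particle(const AuxStore& store, std::size_t index) : store_(&store), index_(index) {}
    float pt() const { return store_->column("pt").at(index_); }
    float eta() const { return store_->column("eta").at(index_); }

private:
    const AuxStore* store_;
    std::size_t index_;
};

// Element-wise loop over one contiguous column (e.g. applying a calibration
// scale factor): the kind of loop compilers can auto-vectorize.
void scalePt(AuxStore& store, float factor) {
    for (float& pt : store.column("pt")) pt *= factor;
}
```

A loop such as scalePt touches a single contiguous array of floats, which is cache-friendly and easy for the compiler to vectorize; with one struct per object the same loop would stride over unrelated data.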

  • Getting data to the users

    ● It is impossible for every user to analyse PB-sized datasets

    ● Centrally slim datasets down to only the relevant entities

    ● Introduced a reduction, or derivation, framework:

    – flexible, athena-based framework to select events, containers, objects, or parts of objects in containers

    ● See talk by James Catmore (Abstr. 164)
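The selection levels listed above (often called skimming, thinning and slimming) can be illustrated with a toy function. This is a hypothetical sketch, not the athena-based derivation framework: the event model (Jet, Event, SlimJet, DerivedEvent) and derive() are invented for the example, and dropping whole containers is not shown.

```cpp
#include <functional>
#include <vector>

// Hypothetical, simplified event content.
struct Jet { float pt; float eta; float width; };
struct Event { int number; std::vector<Jet> jets; };

// Output format after slimming: the 'width' variable is not written.
struct SlimJet { float pt; float eta; };
struct DerivedEvent { int number; std::vector<SlimJet> jets; };

// Hypothetical reduction step:
//   skimming - drop whole events that fail eventSel,
//   thinning - drop individual jets that fail jetSel,
//   slimming - keep only the jet variables the analysis needs.
std::vector<DerivedEvent> derive(const std::vector<Event>& input,
                                 const std::function<bool(const Event&)>& eventSel,
                                 const std::function<bool(const Jet&)>& jetSel) {
    std::vector<DerivedEvent> output;
    for (const Event& ev : input) {
        if (!eventSel(ev)) continue;                    // skimming
        DerivedEvent reduced{ev.number, {}};
        for (const Jet& jet : ev.jets) {
            if (!jetSel(jet)) continue;                 // thinning
            reduced.jets.push_back({jet.pt, jet.eta});  // slimming
        }
        output.push_back(reduced);
    }
    return output;
}
```

Keeping the selections as predicates mirrors the "flexible" aspect: each analysis group can supply its own cuts while the reduction machinery stays shared.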

  • athena and ROOT: code reuse

    ● Problem: athena-based and ROOT-based analyses must give identical results. How can this be ensured?

    ● Use the same code and calibrations in both frameworks, with the following limitations for ROOT analyses:

    – conditions access is allowed only from athena; from ROOT only for stable corrections, such as object calibrations that are constant over long time periods (e.g. egamma, jets)

    – no access to geometry, magnetic field, etc. from ROOT

    ● See poster by Steve Farrell (Abstr. 177)
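One way to obtain identical results is to let analysis code depend only on a framework-neutral interface, with the physics implementation free of athena-specific services. The sketch below shows that pattern under stated assumptions; ICalibrationTool, SimpleJetCalibration and sumCalibratedPt are hypothetical names, and the actual ATLAS mechanism is not reproduced here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical interface seen by analysis code in both environments.
class ICalibrationTool {
public:
    virtual ~ICalibrationTool() = default;
    virtual double calibratedPt(double rawPt, double eta) const = 0;
};

// Hypothetical framework-independent implementation. Its constants are stable
// over long periods, so they are passed in at construction (e.g. read from a
// file) instead of coming from per-run conditions, which only athena provides.
class SimpleJetCalibration : public ICalibrationTool {
public:
    SimpleJetCalibration(double centralScale, double forwardScale, double etaBoundary)
        : central_(centralScale), forward_(forwardScale), boundary_(etaBoundary) {}

    double calibratedPt(double rawPt, double eta) const override {
        return rawPt * (std::abs(eta) < boundary_ ? central_ : forward_);
    }

private:
    double central_;
    double forward_;
    double boundary_;
};

// Analysis code depends only on the interface, so the same function can be
// called from an athena algorithm or from a plain ROOT event loop.
double sumCalibratedPt(const ICalibrationTool& tool, const std::vector<double>& pts,
                       const std::vector<double>& etas) {
    double sum = 0.0;
    for (std::size_t i = 0; i < pts.size() && i < etas.size(); ++i)
        sum += tool.calibratedPt(pts[i], etas[i]);
    return sum;
}
```

Because both frameworks execute the same compiled calibration code with the same constants, agreement follows by construction rather than by validation.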

  • Future framework: initial R&D

    ● Testing GaudiHive within ATLAS

    – initial tests with partial reconstruction (parts of the Calorimeter and Inner Detector reconstruction) are promising

    ● Full reconstruction needs extensive work to adjust our code

    – a lot of code is not thread safe: caching, memory pools, etc.

    ● Just started: work on simulation; the new Geant4 10 is thread safe and is to be combined with GaudiHive

    ● Need clear recipes on how to modify code to comply with the future framework

    ● See talk by Charles Leggett (Abstr. 166)
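The caching problem mentioned above arises because serial code often keeps per-event results in tool members; with several events in flight, concurrent events would overwrite each other's cache. A common remedy is to move such state into a per-event context. The sketch below illustrates this with std::async rather than GaudiHive; EventContext, FieldTool and processConcurrently are hypothetical names.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-event context: state that serial code often cached in tool
// members moves here, so every event in flight has its own copy.
struct EventContext {
    std::size_t eventNumber = 0;
    bool hasField = false;
    double field = 0.0;
};

// Hypothetical tool with no mutable members: safe to share between threads.
class FieldTool {
public:
    explicit FieldTool(double scale) : scale_(scale) {}

    // Computes the value once per event and caches it in the event context,
    // not in the tool, so concurrent events cannot overwrite each other's cache.
    double fieldFor(EventContext& ctx) const {
        if (!ctx.hasField) {
            ctx.field = scale_ * static_cast<double>(ctx.eventNumber);  // stand-in for an expensive lookup
            ctx.hasField = true;
        }
        return ctx.field;
    }

private:
    const double scale_;
};

// Processes nEvents concurrently, one asynchronous task per event. Inside each
// task two "algorithms" ask for the field; the second call hits the per-event cache.
std::vector<double> processConcurrently(const FieldTool& tool, std::size_t nEvents) {
    std::vector<std::future<double>> pending;
    for (std::size_t ev = 0; ev < nEvents; ++ev) {
        pending.push_back(std::async(std::launch::async, [&tool, ev] {
            EventContext ctx;
            ctx.eventNumber = ev;
            const double first = tool.fieldFor(ctx);   // computes
            const double second = tool.fieldFor(ctx);  // cached
            return first + second;
        }));
    }
    std::vector<double> results;
    for (auto& f : pending) results.push_back(f.get());
    return results;
}
```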

  • Future framework: requirements

    ● Collected requirements for various use cases; consequences include:

    – no public tools; tools are private or are services

    – tools should be thread safe and stateless

    – need some way to run legacy code that is not thread safe

    – allow for the use of accelerators

    ● See talk by Sami Kama (Abstr. 151)

    [Figure: GAUDI and GAUDI Hive]
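Two of the requirements above, stateless thread-safe tools and a way to keep running thread-unsafe legacy code, can be sketched side by side. In this hypothetical example (all class names invented), the stateless tool is shared freely between threads, while the legacy tool is wrapped so that a mutex serializes every call; cloning the legacy tool per thread would be another option. Neither corresponds to a specific Gaudi API.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical legacy tool with internal state: unsafe to call concurrently.
class LegacyCounterTool {
public:
    int process(int x) { ++calls_; return x * 2; }
    int calls() const { return calls_; }

private:
    int calls_ = 0;
};

// Hypothetical adapter: a mutex serializes every call into the legacy tool,
// so thread-unsafe code still runs correctly, at the cost of concurrency.
class SerializedLegacyTool {
public:
    int process(int x) {
        std::lock_guard<std::mutex> lock(mutex_);
        return legacy_.process(x);
    }
    int calls() {
        std::lock_guard<std::mutex> lock(mutex_);
        return legacy_.calls();
    }

private:
    std::mutex mutex_;
    LegacyCounterTool legacy_;
};

// Hypothetical stateless tool: a const function of its inputs, safe to share
// across threads without any locking.
class StatelessScaleTool {
public:
    explicit StatelessScaleTool(int factor) : factor_(factor) {}
    int process(int x) const { return x * factor_; }

private:
    const int factor_;
};

// Runs nThreads threads, each making nCalls through both tools, and returns
// how many calls the legacy tool recorded.
int runConcurrently(int nThreads, int nCalls) {
    StatelessScaleTool stateless(3);
    SerializedLegacyTool legacy;
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < nCalls; ++i) legacy.process(stateless.process(i));
        });
    }
    for (auto& th : threads) th.join();
    return legacy.calls();
}
```

Serializing calls keeps legacy code correct but turns it into a bottleneck, which is why stateless tools scale better.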

  • Other contributions about ATLAS offline SW not covered here

    ● ATLAS strategy for primary vertex reconstruction during run-II of the LHC (Abstr. 163)

    ● ATLAS I/O Performance Optimization in as-deployed Environments (Abstr. 171)

    ● ATLAS Metadata Infrastructure Evolution for Run 2 and beyond (Abstr. 172)

    ● Event-driven Messaging for Offline Data Quality Monitoring at ATLAS (Abstr. 176)

    ● The ATLAS Event Service: A new approach to event processing (Abstr. 183)

    ● Evolution of ATLAS Conditions Data and its management for LHC run-2 (Abstr. 203)

    ● The ATLAS EventIndex: architecture, design choices, deployment and first operation experience (Abstr. 208)

  • Conclusions

    ● Since the end of Run 1, ATLAS has significantly updated its offline software to meet the new requirements

    – factor of ~4 speedup of data reconstruction

    – overhauled analysis model, including the EDM

    – multi-processing software in production

    ● Requirements for the future framework identified

    – R&D has already started

    Enjoy Japan! I love the food
