Advanced Grid Technologies in ATLAS Data Management
Alexandre Vaniachine
Argonne National Laboratory
Invited talk at NEC'2003, XIX International Symposium on Nuclear Electronics & Computing
Varna, Bulgaria, 15-20 September 2003
ATLAS software overview
Grid technologies deployed
DC1 production experience
ATLAS computing challenge
Core software domains
Data management architecture
ATLAS Commissioning (26th March 2003 LHCC ATLAS status report)
Phase A: System at ROD level. Systems for LVL1, DCS and DAQ. Check cable connections. Infrastructure. Some system tests.
Phase B: Calibration runs on local systems.
Phase C: Systems/Trigger/DAQ combined.
Phase D: Global commissioning. Cosmic ray runs. Initial off-line software. Initial physics runs.
Timeline: 8/03, 12/04, 03/06, 10/06
The discussions and the planning for the commissioning phases of the experiment have started in the Collaboration at many levels.
ATLAS Computing Challenge
Our event size: 1-1.5 MB
After on-line selection, events will be written to permanent storage at a rate of 100-200 Hz
Raw data: 1 PB/year
With reconstructed and simulated data the total is ~10 PB/year
ATLAS depends on computing as much as it depends on the trigger or the hadron calorimeter
These data start coming at the full rate at the end of 2006
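As a rough cross-check of these numbers (a back-of-the-envelope sketch assuming ~10^7 seconds of live data-taking per year, a standard planning figure that is not stated on this slide), the raw-data volume follows directly from the event size and trigger rate:

```latex
% Order-of-magnitude estimate; assumes ~1e7 s of data-taking per year
1\,\mathrm{MB/event} \times 100\,\mathrm{Hz} = 100\,\mathrm{MB/s},
\qquad 100\,\mathrm{MB/s} \times 10^{7}\,\mathrm{s/year} = 10^{15}\,\mathrm{B} \approx 1\,\mathrm{PB/year}
```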
+ The problem of a larger and more distributed collaboration: >2000 collaborators, 151 institutions, 34 countries
+ The decision that CERN will supply only a fraction of the computing, with the rest supplied by collaborators
The RESULT of the unprecedented data sizes and the distributed nature of physicists and computing is the need for multiple advances in computing tools
Planetary Computing Model: computing infrastructure, which was centralized in the past, will now be distributed
(For experiments the trend is the reverse)
Software Framework: Athena
Athena features:
Common code base with the Gaudi framework (LHCb)
Separation of data and algorithms
Memory management
Transient/persistent data split
The backbone of the ATLAS Computing Model data flow
Core Computing Domains
Separation of transient and persistent data in the ATLAS software architecture determines three core computing domains:
• Data Management: scalable solutions for data persistency
• Software Framework: for data processing algorithms
• Grid Computing: for data processing and analysis
My presentation will focus on advances in computing technologies integrating Grid Computing and Data Management – the two core software domains providing the foundation for the ATLAS Software Framework
Interfacing Athena to the Grid
GANGA: Gaudi/Athena aNd Grid Alliance
[Diagram: the Athena/GAUDI application exchanges virtual data, algorithms, histograms and monitoring results with GRID services; GANGA handles job configuration, monitoring and scheduling, and resource estimation and booking]
ATLAS Database Architecture
Described in the ATLAS Database Architecture document
[Diagram: data flow between Site 1, Site 2 and Site 3 via "Transport & Install", "Extract & Transform", "Just Extract" and "Transport, Transform & Install" operations]
Ready for Grid integration
Independent of persistency technology
Technology Independence
Ensuring that the 'application' software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the "transient/persistent" split)
Changing the persistency mechanism (e.g. Objectivity -> Root I/O) requires a change of "converter", but of nothing else
The 'ease' of the baseline change demonstrates the benefits of decoupling transient/persistent representations
Integrated operation of the framework & data management domains demonstrated the capability of:
• reading the same data from different frameworks
• switching between persistency technologies: Objectivity/DB & ROOT I/O persistency in ATLAS DC0; ATLAS-specific temporary solution (AthenaROOT) in DC1
An important milestone towards DC2 has been achieved recently:
• the LHC-wide hybrid ROOT-based persistency technology POOL for DC2 delivered in the latest ATLAS software release 7.0.0 (AthenaPOOL)
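As an illustration of the converter idea behind the transient/persistent split, here is a minimal self-contained sketch (my own Python illustration of the pattern; it is not the actual Athena/Gaudi converter API, and the class names are invented): algorithms only ever see the transient object, and switching the storage technology replaces only the registered converter.

```python
# Illustrative sketch of the transient/persistent split (not the real Athena/Gaudi API).
# Algorithms see only the transient TrackCollection; swapping the persistency
# technology means registering a different converter, nothing else changes.

class TrackCollection:
    """Transient representation used by all algorithms."""
    def __init__(self, tracks):
        self.tracks = tracks

class Converter:
    """Base class: maps a transient object to/from one persistent technology."""
    def write(self, obj, destination): raise NotImplementedError
    def read(self, source): raise NotImplementedError

class RootConverter(Converter):
    def write(self, obj, destination):
        # would stream obj.tracks into a ROOT file here
        print(f"ROOT I/O: writing {len(obj.tracks)} tracks to {destination}")
    def read(self, source):
        print(f"ROOT I/O: reading tracks from {source}")
        return TrackCollection(tracks=[])

class PoolConverter(Converter):
    def write(self, obj, destination):
        # would use the POOL hybrid (ROOT streaming + relational catalog) layer here
        print(f"POOL: writing {len(obj.tracks)} tracks to {destination}")
    def read(self, source):
        print(f"POOL: reading tracks from {source}")
        return TrackCollection(tracks=[])

# Changing the baseline technology is a one-line change for the framework:
converters = {"AthenaROOT": RootConverter(), "AthenaPOOL": PoolConverter()}
store = converters["AthenaPOOL"]          # e.g. switch from AthenaROOT to AthenaPOOL
store.write(TrackCollection(tracks=[1, 2, 3]), "dc2.pool.root")
```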
LHC Common Persistence Infrastructure (POOL)
During the past year a new effort emerged – the LHC-wide Computing Grid Project (LCG)
The LCG's Requirements Technical Assessment Group (RTAG) on persistence recommended a common infrastructure:
• an object streaming layer based upon ROOT
• a relational database layer for file management and higher-level services
Based on the RTAG recommendations a common development project was launched: POOL
ATLAS is committed to this effort and adopted the POOL technology
To be clear: the common project infrastructure that POOL will provide is our baseline event store technology
ATLAS Data Challenges
In a recent world-wide collaborative effort – Data Challenge 1 (DC1) – spanning 56 prototype tier centers in 21 countries on four continents, ATLAS produced more than 60 TB of data for physics studies
DC1 provided a testbed for integration and testing of advanced Grid computing components in a production environment
DC1 Production on the Grid
A significant fraction of DC1 data was produced on: NorduGrid, US ATLAS Grid Testbed
DC1 jobs successfully tested on: EDG, Grid3 (US ATLAS, US CMS, LIGO, SDSS sites)
[Map of US ATLAS Grid Testbed sites: BNL (Tier1) and prototype Tier2 centers, with testbed sites at Boston U, Indiana, Michigan, UTA, OU, LBL, UNM, HU, Argonne, Chicago, and an outreach site at SMU. Site roles include Condor-G submit & VDC hosts, Chimera execution sites, Chimera storage host & MAGDA cache, RLS servers (Chicago), MAGDA server, AtlasChimera Pacman caches, and Pacman caches of ATLAS releases from NorduGrid and CERN]
Innovative Technologies
Several novel Grid technologies were used in ATLAS data production and data management for the first time. My presentation will describe new Grid technologies introduced in the HEP production environment:
• Chimera Virtual Data System automating data derivation
• Virtual Data Cookbook services managing templated production recipes
• efficient Grid certificate authorization technologies for virtual data access control
• virtual database services delivery for reconstruction on Grid clusters behind closed firewalls
Centralized Management
For efficiency of the large production tasks distributed worldwide, it is essential to establish shared production management tools:
• Metadata Catalog: LFN -> attribute, value
• Replica Catalog: LFN -> PFNs[ ]
• Virtual Data Catalog: derived LFNs[ ], required LFNs[ ], transformation
The ATLAS Metadata Catalogue AMI and the Replica Catalogue MAGDA exemplify such Grid tools deployed in DC1
To complete the data management architecture for distributed production, ATLAS prototyped Virtual Data services
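To make the three catalogs concrete, below is a minimal sketch of the data each one holds (purely illustrative; the layout, LFNs and PFNs are hypothetical examples of mine, not the actual AMI, MAGDA or Virtual Data Catalog schemas):

```python
# Hypothetical, simplified view of the three catalogs (illustration only;
# the real AMI, MAGDA and Virtual Data Catalog schemas differ).

# Metadata Catalog: logical file name (LFN) -> attribute/value pairs
metadata_catalog = {
    "dc1.002001.lumi10.00001.digis": {"dataset": "2001", "events": 200},
}

# Replica Catalog: LFN -> physical file names (PFNs) at the various sites
replica_catalog = {
    "dc1.002001.lumi10.00001.digis": [
        "gsiftp://atlas.bnl.gov/data/dc1/00001.zebra",      # hypothetical PFN
        "castor://castor.cern.ch/atlas/dc1/00001.zebra",    # hypothetical PFN
    ],
}

# Virtual Data Catalog: derived LFNs <- transformation(required LFNs)
virtual_data_catalog = [
    {
        "transformation": "athena-reconstruction",
        "required": ["dc1.002001.lumi10.00001.digis"],
        "derived": ["dc1.002001.lumi10.00001.recon"],
    },
]

def locate(lfn):
    """Resolve a logical file name to its replicas (Replica Catalog lookup)."""
    return replica_catalog.get(lfn, [])

print(locate("dc1.002001.lumi10.00001.digis"))
```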
MAGDA Architecture
Replica Catalogue MAGDA: MAnager for Grid-based DAta
AMI Architecture
Metadata Catalogue AMI: ATLAS Metadata Interface
Introducing Virtual Data
The prevailing views in HEP computing have been data-centric: we need to produce the data (ASAP), with the production recipes being just some tools used in the process by the "production gurus". The value of the production recipes has not been fully appreciated.
Preparation of recipes for data production requires significant effort and encapsulates considerable expert knowledge
Because the production recipes have to be fully validated, their development is an iterative, time-consuming process similar to fundamental knowledge discovery
The GriPhyN project (www.griphyn.org) introduced a different perspective: recipes are as valuable as the data
If you have the recipes you may not even need the data: you can reproduce the data 'on-demand'
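The "reproduce on-demand" idea can be sketched in a few lines (my own illustration, not the Chimera implementation; the transformation names and LFNs are invented): if a requested logical file does not exist, first materialize its required inputs recursively, then run the recorded transformation.

```python
# Illustration of on-demand data derivation from recorded recipes
# (not the actual Chimera Virtual Data System code).

derivations = {
    # derived LFN: (transformation name, list of required LFNs) -- hypothetical entries
    "recon.root": ("athena-recon", ["digis.root", "geometry.root"]),
    "digis.root": ("athena-conversion", ["digis.zebra"]),
}

existing = {"digis.zebra", "geometry.root"}   # files already on storage

def run_transformation(name, inputs, output):
    print(f"running {name}({', '.join(inputs)}) -> {output}")
    existing.add(output)

def materialize(lfn):
    """Return lfn, deriving it (and anything it depends on) if it does not yet exist."""
    if lfn in existing:
        return lfn
    transformation, required = derivations[lfn]
    for dependency in required:          # acyclic dependencies: recurse first
        materialize(dependency)
    run_transformation(transformation, required, lfn)
    return lfn

materialize("recon.root")
```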
VDC Architecture
Virtual Data in DC1 Production
To deliver a scalable data management solution, ATLAS implemented innovative Computing Science concepts in practice: the first use of Virtual Data technologies in DC1 production
Two concepts are implemented in ATLAS Virtual Data System operation:
Production workflow became computerized
• acyclic data dependencies tracking using GriPhyN and iVDGL software
• providing Data Provenance services
• first use of the Chimera Virtual Data System in production
Production recipes became templatized (see the sketch after this list)
• templated recipes repository: Cookbook
• providing Data Providence* services
• about half of the more than two hundred DC1 datasets were serviced
* prov·i·dence n. 1. Care or preparation in advance; foresight. The American Heritage Dictionary of the English Language
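To illustrate what a templated recipe means, here is a hedged sketch (the parameter names and the job-options fragment are my invention; the actual Cookbook recipe format is not shown in this talk): one stored template is instantiated into many concrete production jobs.

```python
# Hypothetical templated production recipe, illustrating the Cookbook idea
# (the real Virtual Data Cookbook format is not shown in this talk).
from string import Template

recipe_template = Template("""\
# reconstruction job options (illustrative)
InputFile   = "dc1.${dataset}.lumi10.${partition}.digis.root"
OutputFile  = "dc1.${dataset}.lumi10.${partition}.recon.root"
RandomSeed  = ${seed}
""")

# Instantiating the template turns one stored recipe into many concrete jobs:
for partition in ("00001", "00002"):
    print(recipe_template.substitute(dataset="002001", partition=partition, seed=12345))
```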
Acyclic Portion of DC1 Workflow
The Chimera Virtual Data system eliminates 'manual' tracking of the data dependencies between independent production steps & enables multi-step compound data transformations on-demand
[Workflow diagram: transformations and data products include Athena Generators -> HepMC.root; atlsim and atlsim pileup -> digis.zebra, geometry.zebra; Athena conversion -> digis.root, geometry.root; Athena recon -> recon.root; Athena QA -> QA.ntuple; Athena Atlfast -> Atlfast.root, filtering.ntuple; Atlfast recon -> recon.root]
The feedback loop introduced in ATLAS by physics validation is omitted
Chimera in DC1 Reconstruction
Installed ATLAS releases 6.0.2+ (Pacman cache) on selected US ATLAS testbed sites
2x520 partitions of DataSet 2001 (lumi10) have been reconstructed at the JAZZ cluster (Argonne), LBNL, IU and BU, BNL (test)
2x520 Chimera derivations, ~200,000 events reconstructed
Submit hosts: LBNL; others: Argonne, UC, IU
RLS servers at the University of Chicago and BNL
Storage host and Magda cache at BNL
Group-level Magda registration of output
Output transferred to BNL and CERN/Castor
Uncharted OGSA Area
Interest in the X509 authorization capabilities of MySQL was prompted by Doug Olson's announcement to the PPDG mailing list
Numerous e-mail exchanges and discussions with interested PPDG participants on grid-enabling MySQL
Grid Service example by Kate Keahey (SC02 OGSA Tutorial) – Database Service:
• a DBaccess Grid service will support at least two portTypes: GridService and Database_PortType
• each has service data
• GridService: basic introspection information, lifetime, ...
• DB info: database type, query languages supported, current load, ...
Database services on the Grid are an uncharted OGSA area; at CHEP'03 MySQL emerged as the most popular database
Database Access on the Grid
Different security models:
A separate server does the grid authorization:
• Spitfire (EDG WP2) – SOAP/XML text-only data transport
• DAI (IBM UK) – Spitfire technologies + XML binary extensions
• Perl DBI database proxy (ALICE) – SQL data transport
• Oracle 10g (separate authorization layer)
Authorization is integrated in the database server:
• on a higher level: GSS API (work by Richard Casella, BNL)
• on a lower level: certificate verification (my current work)
Grid-enabling MySQL
Tested MySQL X509 certificate authorization technology:
• validated with DOE, CERN and NorduGrid certificates
• potential problem with host certificates issued at CERN
Developed solutions for MySQL security problems:
• adopted in MySQL 4.0.13
Increased MySQL AB awareness of grid computing needs
Set up a grid-enabled server prototype for ATLAS:
• used in ATLAS Data Challenge 1 production for Chimera-based reconstruction
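For concreteness, here is a minimal sketch of what X509-based authorization can look like with stock MySQL (the host name, database, file paths and certificate subjects are my assumptions, not the actual DC1 server setup): the server ties an account to a certificate subject and issuer, and the client presents its grid certificate and key over SSL.

```python
# Sketch of certificate-based access to a grid-enabled MySQL server
# (host names, paths and certificate subjects below are assumptions, not the DC1 setup).
import MySQLdb  # MySQL-python client library

# Server side (run once by the DBA): tie the account to an X509 subject/issuer.
GRANT_STATEMENT = """
GRANT SELECT ON dc1.* TO 'production'@'%'
  REQUIRE SUBJECT '/DC=org/DC=doegrids/OU=People/CN=Production Manager'
      AND ISSUER  '/DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1';
"""

# Client side: present the grid certificate and key over SSL when connecting.
connection = MySQLdb.connect(
    host="vdc.example.org",          # hypothetical database host
    user="production",
    db="dc1",
    ssl={
        "ca":   "/etc/grid-security/certificates/doegrids-ca.pem",
        "cert": "/home/prod/.globus/usercert.pem",
        "key":  "/home/prod/.globus/userkey.pem",
    },
)
cursor = connection.cursor()
cursor.execute("SELECT VERSION()")
print(cursor.fetchone())
```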
Production Experience
Collected production experience with the grid security model:
• need to expand backward compatibility of grid proxy tools
• need to add the server purpose to grid host certificates
• need to initiate the grid proxy upon login (similar to an AFS token)
• need for shared grid certificates, similar to privileged accounts traditionally shared in HENP computing for production, librarian, data management and database administration tasks
More information was presented at PPDG (All-hands meeting) and Grid3 (production experience reported)
Coherent Approach
[Diagram: a main server feeds replica servers via "Extract & Transport" and "Transport & Install" steps]
Extract-Transport-Install: MySQL simplified the delivery of the extract-transport-install components of the ATLAS database architecture to provide the database services needed for DC1 reconstruction on sites with Grid Compute Elements behind closed firewalls (e.g., NorduGrid)
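A hedged sketch of one way such an extract-transport-install step can be scripted with standard MySQL tools (the host and database names are placeholders; the actual ATLAS scripts are not shown here): dump the needed tables at the main server, ship the dump with the job, and load it into a local MySQL server inside the firewall.

```python
# Illustration of the extract-transport-install pattern with standard MySQL tools
# (placeholder host/database names; not the actual ATLAS DC1 scripts).
import subprocess

# Extract: dump the database content needed for reconstruction at the main server.
with open("dc1_dbsnapshot.sql", "w") as dump:
    subprocess.run(
        ["mysqldump", "--host=maindb.example.org", "--user=reader", "dc1_conditions"],
        stdout=dump, check=True,
    )

# Transport: the dump file travels with the job input sandbox to the remote site.

# Install: load the snapshot into a local MySQL server inside the firewall,
# so reconstruction jobs never need an outbound database connection.
with open("dc1_dbsnapshot.sql") as dump:
    subprocess.run(
        ["mysql", "--host=localhost", "--user=installer", "dc1_conditions_local"],
        stdin=dump, check=True,
    )
```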
Roadmap to Success
ATLAS computing is steadily progressing towards a highly functional software suite, plus a world-wide computing model
During the past year, Data Challenges have provided both an impetus and a testbed for bringing coherence to developments in all core software domains
Several advanced Grid Computing technologies were successfully tested and deployed in the ATLAS Data Challenge 1 production environment