CHEP 2004, Core Software
Integration of POOL into three Experiment Software Frameworks
Giacomo Govi, CERN IT-DB & LCG-POOL
K. Karr, D. Malon, A. Vaniachine (Argonne National Laboratory); P. Van Gemmeren (BNL); R. Chytracek, D. Duellmann, M. Frank, M. Girone, G. Govi, V. Innocente, P. Mato Vila, J. Moscicki, I. Papadopoulos, H. Schmuecker (CERN); R. D. Schaffer (LAL-IN2P3); Z. Xie (Princeton University); T. Barrass (University of Bristol); C. Cioffi (University of Oxford); W. Tanenbaum (Fermi National Accelerator Laboratory)
OUTLINE
• POOL Project mandate
• ATLAS, CMS, LHCb integration
• Commonalities and differences
• Learning from integration experience
• Conclusions
POOL project mandate
• Provide a framework for C++ object persistency
  - API neutral with respect to storage technology
  - ROOT and RDBMS backends
  - File access integrated with current Grid technologies
• Follow up experiment-specific requirements
  - Extract a 'synthesis' among many (overlapping) use cases
  - Resolve conflicting requirements
  - Find common solutions
• Encourage concrete experiment participation
  - Include experiment members in the POOL core developer team
  - Follow up quick integration of POOL releases in the experiment frameworks
  - Involve the experiments in the validation phase of the release process
• Re-use experience from previously adopted persistency-related technologies: Objectivity, Gaudi, RD45
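The storage-technology neutrality above can be illustrated with a minimal sketch. The names (`IStorageSvc`, `RootStorage`, `roundTrip`) are invented for illustration, not the real POOL API: client code depends only on an abstract interface, so a ROOT or an RDBMS backend can be swapped in without changes.

```cpp
#include <map>
#include <string>

// Illustrative only, not the real POOL API: a storage-neutral interface.
// Writing returns an opaque token; reading resolves it back to the payload.
struct IStorageSvc {
    virtual ~IStorageSvc() = default;
    virtual std::string write(const std::string& payload) = 0;
    virtual std::string read(const std::string& token) = 0;
};

// Hypothetical ROOT-file backend, mocked with an in-memory map.
class RootStorage : public IStorageSvc {
    std::map<std::string, std::string> file_;
    int next_ = 0;
public:
    std::string write(const std::string& payload) override {
        std::string token = "root:" + std::to_string(next_++);
        file_[token] = payload;
        return token;
    }
    std::string read(const std::string& token) override { return file_.at(token); }
};

// Client code sees only IStorageSvc, so a hypothetical RDBMS backend
// could replace RootStorage without touching this function.
std::string roundTrip(IStorageSvc& svc, const std::string& obj) {
    return svc.read(svc.write(obj));
}
```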
ATLAS: Overall approach of the experiment framework to object persistency
From the Athena/Gaudi framework point of view, POOL is just a new I/O "technology". This implies writing a new conversion service. Main components:
  - AthenaPoolCnvSvc - conversion service
  - AthenaPoolConverter - converter base class
  - T_AthenaPoolCnv<T> - templated converters
  - PoolSvc - Athena/Gaudi service interface to POOL; allows jobOptions configuration
  - DataHeader - stores the refs of the Event Data Objects; a Ref to the DataHeader is inserted in the event collection
ATLAS has simplified the user interface by allowing "generic" converters: the templated converter is used to generate the necessary classes so that the converter is created automatically. The user just needs to specify a ".h" file for each DataObject (POOL-ref'ed object) to be stored.
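The generic-converter idea can be sketched as follows. This is a mock of the pattern only; the real class is ATLAS's T_AthenaPoolCnv<T>, and the names and the stream-based "persistent representation" below are invented (POOL actually uses dictionaries, not iostreams):

```cpp
#include <sstream>
#include <string>

// Invented base class standing in for AthenaPoolConverter.
struct ConverterBase {
    virtual ~ConverterBase() = default;
};

// One class template covers every DataObject type: the user only declares
// the type to store, and a converter is instantiated automatically.
template <typename T>
class TPoolConverter : public ConverterBase {
public:
    // "Persistify" the transient object (mocked via iostream formatting).
    std::string createPersistent(const T& obj) {
        std::ostringstream os;
        os << obj;
        return os.str();
    }
    // Rebuild the transient object from its persistent representation.
    T createTransient(const std::string& rep) {
        std::istringstream is(rep);
        T obj{};
        is >> obj;
        return obj;
    }
};
```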
ATLAS: Overall approach of the experiment framework to object persistency
[Diagram: an Algorithm calls retrieve(ptr, "key") or record(ptr, "key") on DataObjects (with their sub-objects) in the Transient Data Store. The Athena Data Service delegates to the Conversion Service (generic conversion), which drives the POOL-specific Athena services (AthenaPoolCnvSvc, PoolSvc) and the POOL services (POOL::DataSvc, PersistencyService, POOL::FileCatalog), down to the ROOT files.]
ATLAS: The POOL components used
POOL DataSvc is for now the entry point for object persistency
  - Two caches (input/output); Ref<T> used for object I/O
  - The cache functionality is not strictly needed => it duplicates the Athena/Gaudi transient store
  - Object lifetime managed by the Athena/Gaudi transient store
For event storage: RootStorageSvc and implicit collections; some conditions storage also uses RootStorageSvc
  - Both tree-based and key-based ROOT storage supported, selectable in Athena via jobOptions
  - ROOT Trees used as the default for now, to gain experience
  - The key-based approach is similar to what has already been tested in ATLAS via Objectivity
  - Expect to also use Object/Relational storage when available
XML catalogs used for local data access; EDG RLS and Globus RLS for master file catalogs
Tag collections: both ROOT and MySQL collections in use
Oracle DB currently being deployed for detector-description parameters via the Relational Access Layer
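The Ref<T>-based navigation mentioned above can be mocked with a lazy smart reference. This is a sketch of the idea only (the name `LazyRef` and the loader callback are invented, not POOL's real Ref template): the reference loads its target from the store on first dereference and caches it afterwards.

```cpp
#include <functional>
#include <memory>

// Mock of the Ref<T> idea: hold a loader instead of the object, load on
// first dereference, then serve the cached instance on later accesses.
template <typename T>
class LazyRef {
    std::function<std::shared_ptr<T>()> loader_;  // stands in for a persistent token
    mutable std::shared_ptr<T> cached_;
public:
    explicit LazyRef(std::function<std::shared_ptr<T>()> loader)
        : loader_(std::move(loader)) {}
    const T& operator*() const {
        if (!cached_) cached_ = loader_();  // load on demand, once
        return *cached_;
    }
    bool loaded() const { return static_cast<bool>(cached_); }
};
```

In this toy version the cache lives inside the reference; in the integration discussed above the contentious design point is precisely who owns that cached object, the POOL DataSvc cache or the Athena/Gaudi transient store.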
Use of POOL in CMS: current status
COBRA / OSCAR / ORCA: current version being tested with the latest POOL 2.0.0 internal release
  - Production still uses POOL 1.7
Usable for production, deployed to physicists
  - Used for SW tutorials each Friday since autumn 2003
  - 35 million events produced with OSCAR (Geant4 simulation)
Essentially the same functionality as the previous Objectivity-based code
Limitations
  - No concurrent update of databases
  - No direct connection to the central database while running
  - Remote access limited to RFIO or dCache (soon GFAL?)
  - No schema evolution
Added value
  - No need for a common base class (ooObj)
  - Native support of STL containers
  - Support for transient attributes
Data Access in CMS

[Diagram: an Algorithm, running in the context of an Event, follows POOL Refs in a local transient store; reconstruction on demand causes the POOL DataSvc (object cache) to retrieve a chunk from the persistent store.]
What CMS uses of POOL
Transition from Objectivity to POOL inspired by a minimum-impact principle
• All objects (event and metadata) are stored as ROOT keyed objects (no ROOT tree)
• Only object navigation is used, no other access mechanism
• Ref: full interface
• File Catalog: full interface
  - XML implementation in physics applications
  - MySQL & RLS used in production (DC04)
• Session-only transaction management; no explicit database/container handling
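For orientation, a POOL XML file catalog entry maps a file's GUID to its physical and logical file names, roughly as below. The element names are reproduced from memory and may differ in detail from a given POOL release; the GUID and file names are invented placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<POOLFILECATALOG>
  <!-- One entry per file: the GUID is the stable identity stored in Refs -->
  <File ID="F1E2D3C4-0000-0000-0000-000000000001">
    <physical>
      <pfn filetype="ROOT_All" name="rfio:/castor/cern.ch/user/demo/events.root"/>
    </physical>
    <logical>
      <lfn name="demo.events.root"/>
    </logical>
  </File>
</POOLFILECATALOG>
```

The same catalog interface is backed by MySQL or RLS in production, which is what allows the XML flavor to be used for local work and swapped out in the data challenges.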
LHCb Goals for POOL Integration
• Keep the existing framework architecture
  – Objects are transient and reside in "Data Stores"
    • source for conversion to a persistent or graphical representation
  – "Algorithms" access objects by "logical name" from a data store
• Keep the existing event model description
  – Code (headers) generated from XML files
• Need to access existing data
  – Read data with pre-POOL software (ROOT based)
• Usage of POOL transparent to end-users (physicists)
Data Access in Gaudi Applications
[Diagram: (1) an Algorithm calls retrieveObject(...) on the DataService to access an object's data; (2) the DataService searches the Data Store; (3) on a miss it requests a load from the PersistencyService, whose technology dispatcher routes the request to the appropriate ConversionService(s) and to POOL's PersistencyService and StorageService; (5) the loaded object is registered in the Data Store.]
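The load-on-demand sequence above can be sketched in a few lines. `ToyDataService` and its `load` callback are invented stand-ins (not Gaudi's real DataSvc API): the service searches its transient store first and only goes through the persistency/conversion layer on a miss, registering the result.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy rendering of the retrieve sequence; names are invented.
struct ToyDataService {
    std::map<std::string, std::string> store;             // transient data store
    std::function<std::string(const std::string&)> load;  // persistency/conversion layer

    const std::string& retrieveObject(const std::string& path) {
        auto it = store.find(path);                       // search in store
        if (it == store.end()) {
            std::string obj = load(path);                 // request load via POOL
            it = store.emplace(path, std::move(obj)).first;  // register in store
        }
        return it->second;                                // hand object to the caller
    }
};
```

Because the store is keyed by logical name, the caller never sees whether the object was cached or freshly converted, which is what keeps POOL usage transparent to end-users.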
Customization of POOL for Gaudi
• Currently the main technology for event storage
  – Write event data in ROOT tree mode
  – Detector data etc. not (yet) implemented
• POOL components used
  – FileCatalog (XML flavor), PersistencySvc, StorageSvc + ROOT backend implementation
  – Collections ??
• Usage of the Gaudi object cache
  – Efficiently managed by the Gaudi framework: event / time interval (detector description), …
  – Tree-like structure, like a file system: "/Event/MC/Particles"
  – Consequences for the reference implementation
• Dictionaries generated from the XML event description
  – Non-trivial: dictionary completeness
Commonalities & differences I
• Common starting point: re-use code from existing frameworks
  - Minimize the impact of the integration / migration to POOL-supported technologies
  - Results in three rather different integration approaches
  - POOL API design highly influenced by experiment-specific requirements driven by this principle
• ROOT backend adopted as the main storage technology for event data
  - With tree-based containers (ATLAS, LHCb)
  - With key-based containers (CMS)
  - Common interest in the future development of an RDBMS backend
• Common choice of file bookkeeping through POOL catalogues
  - XML catalog adopted in the three production chains
Commonalities & differences II
• Object caching and navigation
  - ATLAS: integrate the POOL Ref API with the Athena object store, using a customized ownership policy
  - CMS: replace Objectivity with the POOL OO-DB API (similar rules for navigation)
  Both approaches are object-based and leave database and container management implicit.
  - LHCb: integrate a lower-level component (PersistencySvc); object bookkeeping and navigation are left to the Gaudi framework.
  A service-based approach: database and container management are explicitly controlled.
Learning from Integration Experiences
Differences are largest in the transient object store and navigation area:
- The navigation mechanism strongly influences the cache features (object lifetime management, semantics for object association)
- Some experiment frameworks already had specific object bookkeeping services, not easily re-usable for common purposes
- Real decoupling of the Ref implementation from the cache details is difficult to achieve
- A review of the top-level architecture and API (done in 2003 with the experiments) could not find a common solution
Remarks
• ATLAS
  – Some complex components of the data model and some StoreGate constructs were initially difficult to persistify in POOL
  – Some compromises made; the ATLAS event data model is for the most part persistifiable in POOL today (sufficient for the Data Challenge)
• CMS
  – Integration started early, with the first POOL releases: an important contribution to debugging and consolidation
  – ROOT-based streaming in POOL not optimized for UPDATE operations
• LHCb
  – A high level of customization of the POOL API was required to minimize the integration impact
  – POOL (or ROOT storing complex objects) needs considerably more CPU than the simple Gaudi object serialization (based on BLOB structures)
  – But: ROOT provides schema evolution; BLOB serialization did not
Conclusions
• POOL has been successfully integrated into three of the LHC experiment software frameworks and used in data challenges
  - The common solution provided satisfies the requirements for production
  - As required by all experiments, the impact of the POOL integration on the existing frameworks has been kept reasonably low
• Integration approaches differ in the object navigation area
  - POOL component usage follows the different requirements of the experiment frameworks
  - Some areas of duplication remain between POOL and the experiment frameworks
• The core POOL components are all used by at least one experiment
• Common outlook
  - Improve performance (where possible) for the ROOT backend
  - Provide an RDBMS backend