Summary of the Persistency RTAG Report
(heavily based on David Malon’s slides)
- and some personal remarks
Dirk Düllmann, [email protected]
CCS Meeting, April 18th 2002

SC2 mandate to the RTAG
Write the product specification for the Persistency Framework for Physics Applications at LHC
Construct a component breakdown for the management of all types of LHC data
Identify the responsibilities of Experiment Frameworks, existing products (such as ROOT) and as yet to be developed products
Develop requirements/use cases to specify (at least) the metadata/navigation component(s)
Estimate resources (manpower) needed to prototype missing components

Guidance from the SC2
The RTAG may decide to address all types of data, or may decide to postpone some topics for other RTAGs, once the components have been identified.
The RTAG should develop a detailed description at least for the event data management.
Issues of schema evolution, dictionary construction and storage, object and data models should be addressed.

RTAG Composition
One member from each experiment, one from IT/DB, one from ROOT team:
  – Fons Rademakers (ALICE)
  – David Malon (ATLAS)
  – Vincenzo Innocente (CMS)
  – Pere Mato (LHCb)
  – Dirk Düllmann (IT/DB)
  – Rene Brun (ROOT)
Quoting Vincenzo’s report at CMS Week (6 March 02)
  – “Collaborative, friendly atmosphere”
  – “Real effort to define a common product”
This is already an accomplishment.

Response of RTAG to mandate and guidance (excerpted from report)
Intent of this RTAG is to assume an optimistic posture regarding the potential for commonality among the LHC experiments in all areas related to data management
Limited time available to the RTAG precludes treatment of all components of a data management architecture at equal depth
  – will propose areas in which further work, and perhaps additional RTAGs, will be needed

Response of RTAG to mandate and guidance (excerpted from report)
Consonant with SC2 guidance, the RTAG has chosen to focus its initial discussions on the architecture of a persistence management service based upon a common streaming layer, and on the associated services needed to support it
  – Even if we cannot accomplish everything we aspire to, we want to ensure that we have provided a solid foundation for a near-term common project
While our aim is to define components and their interactions in terms of abstract interfaces that any implementation must respect, it is not our intention to produce a design that requires a clean-slate implementation

Response of RTAG to mandate and guidance (excerpted from report)
For the streaming layer and related services, we plan to provide a foundation for an initial common project that can be based upon the capabilities of existing implementations, and upon ROOT’s I/O capabilities in particular
While new capabilities required of an initial implementation should not be daunting, we do not wish at this point to underestimate the amount of repackaging and refactoring work required to support common project requirements

Status
Reasonable agreement on design criteria, e.g.,
  – Component oriented, communication through abstract interfaces, no back channels, components make no assumptions about implementation technology of components with which they communicate
  – Persistence for C++ data models is the principal target, but our environments are already multilingual; should avoid constructions that make language migration and multi-language support difficult
  – Architecture should not preclude multiple persistence technologies
  – Experiments’ transient data models should not need compile-time/link-time dependencies on persistence technology in order to use persistence services

Status II
Reasonable agreement on design criteria, e.g.,
  – Transient object types may have several persistent representations; the type of a transient object restored from a persistent one may be different from the type of the object that was saved; a persistent object cannot assume it “knows” what type of transient object will be built from it
  – …more…
Component discussions and requirement discussions have been uneven: extremely detailed and highly technical in some areas, with other areas neglected thus far for lack of time
Primary focus has been on issues and components involved in defining a common persistence service
  – Cache manager, persistence manager, storage manager, streamer service, placement service, dictionary service(s), …
  – Object identification, navigation, …

The proposed initial project
Charge to the initial project is to deliver the components of a common file-based streaming layer sufficient to support persistence for all four experiments’ event models, with management of the resulting files hosted in a relational layer
  – Elaboration of what this means appears in the report
  – Note that persistence service is intended to support all kinds of data; it is not specific to event data

Initial project components
The specification in the report describes agreed-upon common project components, including
  – persistence manager
  – placement service
  – streaming service
  – storage manager
  – externalizable technology-independent references
  – services to support event collections
  – connections to grid-provided replica management and LFN/PFN services

Initial project implementation technologies
Streaming layer should be implemented using the ROOT framework’s I/O services
Components with relational implementations should make no deep assumptions about the underlying technology
  – Nothing intentionally proposed that precludes implementation using such open source products as MySQL

RTAG’s First Component Diagram (under discussion)
[Diagram: components PersistencyMgr, StreamerSvc, DictionarySvc, StorageMgr, CacheMgr and PlacementSvc, connected through the interfaces IReflection, IPReflection, IPlacement, ICnv, IReadWrite and IPers, plus a box labelled C++]

Persistence Manager
Principal point of contact between experiment-specific frameworks and persistence services
Handles requests to store state of an object, returning a token (e.g., a persistent address)
Handles requests to retrieve state of an object corresponding to a token
Like Gaudi/Athena persistence service
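
The store/retrieve contract above can be sketched in C++ as follows. This is a minimal illustration only: the names `Token`, `IPersistencyMgr` and `InMemoryPersistencyMgr` and all signatures are assumptions made for the example and are not taken from the RTAG report, and the in-memory map merely stands in for a real storage back end.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical token: an opaque persistent address handed back to the caller.
struct Token {
    std::size_t value;
};

// Hypothetical abstract interface; the report does not fix these signatures.
class IPersistencyMgr {
public:
    virtual ~IPersistencyMgr() {}
    // Store the (already streamed) state of an object and return its token.
    virtual Token store(const std::vector<char>& state) = 0;
    // Retrieve the state for a token; returns false if the token is unknown.
    virtual bool retrieve(const Token& token, std::vector<char>& state) const = 0;
};

// Toy in-memory implementation standing in for a real storage back end.
class InMemoryPersistencyMgr : public IPersistencyMgr {
public:
    Token store(const std::vector<char>& state) override {
        Token t{next_++};
        objects_[t.value] = state;
        return t;
    }
    bool retrieve(const Token& token, std::vector<char>& state) const override {
        std::map<std::size_t, std::vector<char> >::const_iterator it =
            objects_.find(token.value);
        if (it == objects_.end()) return false;
        state = it->second;
        return true;
    }
private:
    std::size_t next_ = 0;                                // next token to hand out
    std::map<std::size_t, std::vector<char> > objects_;   // token -> stored state
};
```

A framework would hold only an `IPersistencyMgr` pointer, so the technology behind it could be exchanged without touching client code.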

Dictionary services
Manages descriptions of transient (persistence-capable) classes and their persistent representations
Interface to obtain reflective information about classes
Entries in dictionary may be generated by disparate sources
  – Rootcint, ADL, LHCb XML, …
With automatic converter/streamer generation, likely that some persistent representations will be derivable from transient representation, but possibility of multiple persistent representations suggests separate dictionaries
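
As a rough illustration of what such a dictionary could hold, the sketch below keeps class descriptions in a lookup table. `FieldDesc`, `ClassDesc` and `DictionarySvc` are hypothetical names chosen for this example; the report does not prescribe this layout, and a real dictionary would be filled by tools such as rootcint rather than by hand.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one data member of a class.
struct FieldDesc {
    std::string name;
    std::string type;
};

// Hypothetical description of a class, whatever tool produced it.
struct ClassDesc {
    std::string name;
    std::vector<FieldDesc> fields;
};

// Toy dictionary: one instance can hold transient descriptions, another
// the persistent representations, keeping the two dictionaries separate.
class DictionarySvc {
public:
    // Register or overwrite a class description.
    void add(const ClassDesc& desc) { classes_[desc.name] = desc; }
    // Return the description for a class name, or nullptr if unknown.
    const ClassDesc* find(const std::string& className) const {
        std::map<std::string, ClassDesc>::const_iterator it =
            classes_.find(className);
        return it == classes_.end() ? nullptr : &it->second;
    }
private:
    std::map<std::string, ClassDesc> classes_;
};
```

Using two `DictionarySvc` instances, one for transient and one for persistent descriptions, mirrors the "separate dictionaries" suggestion.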

Streaming or conversion service
Consults transient and persistent representation dictionaries to produce persistent (or “persistence-ready”) representation of a transient object, or vice versa
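
A minimal sketch of the two conversion directions is given below. The types `Hit` and `PersistentHit`, the text payload and the version numbers are assumptions for illustration; the converters are written by hand here, whereas the service described above would derive them from the dictionaries.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Transient class: knows nothing about persistence.
struct Hit {
    int channel;
    double energy;
};

// Hypothetical persistent representation: a version tag plus a flat payload.
struct PersistentHit {
    int version;
    std::string payload;
};

// Transient -> persistent, always writing the newest layout (version 2).
PersistentHit toPersistent(const Hit& hit) {
    std::ostringstream os;
    os << hit.channel << ' ' << hit.energy;
    return PersistentHit{2, os.str()};
}

// Persistent -> transient; version 1 is assumed to hold the channel only.
bool toTransient(const PersistentHit& p, Hit& hit) {
    std::istringstream is(p.payload);
    if (p.version == 1) {
        hit.energy = 0.0;  // field absent in the old layout
        return static_cast<bool>(is >> hit.channel);
    }
    if (p.version == 2) {
        return static_cast<bool>(is >> hit.channel >> hit.energy);
    }
    return false;  // unknown representation
}
```

Because `toTransient` accepts two layouts, one transient type is restored from more than one persistent representation.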

Placement service
Supports runtime control of physical placement, equivalent to “where” hints in ODMG
Intended to support physical clustering within an event, and to separate events written to different physics streams
Interface seen by experiments’ application frameworks is independent of persistence technology
Insufficiently specified in the RTAG report
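
Since the report leaves this service underspecified, the following is only one conceivable shape for a technology-independent placement interface. `Placement`, `IPlacementSvc`, `StreamPlacementSvc` and the naming policy are all invented for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical placement hint: names a destination without naming a technology.
struct Placement {
    std::string file;       // logical file to write to
    std::string container;  // cluster within that file
};

// Hypothetical interface seen by the experiment framework.
class IPlacementSvc {
public:
    virtual ~IPlacementSvc() {}
    // Decide where an object of the given type goes for the given physics stream.
    virtual Placement placementFor(const std::string& stream,
                                   const std::string& typeName) const = 0;
};

// Toy policy: one file per physics stream, one container per object type.
class StreamPlacementSvc : public IPlacementSvc {
public:
    Placement placementFor(const std::string& stream,
                           const std::string& typeName) const override {
        return Placement{stream + ".events", typeName};
    }
};
```

With this policy, objects of one type are clustered together within a stream's file, and events of different streams end up in different files.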

References and reference services
Common class for encapsulating persistent addresses
Externalizable
Independent of storage technology, likely with an opaque payload that is technology-specific
In hybrid model, intent is that from a Ref, one can determine what “file” is needed without consulting the particular storage technology
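
One way to picture such a reference is sketched below, assuming a hypothetical `Ref` with a readable file id, a technology tag and an opaque payload. The field names and the `fileId:technology:payload` text format are assumptions for illustration, not the report's definition.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical reference layout; field names are illustrative only.
struct Ref {
    unsigned long fileId;    // readable without touching the storage technology
    std::string technology;  // tag naming the back end that understands the payload
    std::string payload;     // opaque, technology-specific address
};

// Externalize a Ref as "fileId:technology:payload".
std::string externalize(const Ref& ref) {
    std::ostringstream os;
    os << ref.fileId << ':' << ref.technology << ':' << ref.payload;
    return os.str();
}

// Rebuild a Ref from its external form; returns false on malformed input.
// Assumes the technology tag itself contains no ':' character.
bool internalize(const std::string& text, Ref& ref) {
    const std::string::size_type first = text.find(':');
    if (first == std::string::npos) return false;
    const std::string::size_type second = text.find(':', first + 1);
    if (second == std::string::npos) return false;
    std::istringstream is(text.substr(0, first));
    if (!(is >> ref.fileId)) return false;
    ref.technology = text.substr(first + 1, second - first - 1);
    ref.payload = text.substr(second + 1);  // may itself contain ':'
    return true;
}
```

The file id sits outside the payload, so the needed file can be determined without asking the storage technology to decode anything.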

Storage manager
Stores and retrieves variable-length stream of bytes
Deals with issues at the file level
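
The byte-stream contract can be illustrated with a toy class that appends length-prefixed records to a single in-memory "file". The class name `StorageMgr` follows the component diagram, but its methods and the record layout are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy storage manager: appends variable-length records to one in-memory
// "file" and hands back the offset needed to read them again.
class StorageMgr {
public:
    // Append a record as [length header][bytes]; returns the record's offset.
    std::size_t write(const std::string& bytes) {
        const std::size_t offset = file_.size();
        const std::size_t n = bytes.size();
        // The header is the record length, stored in sizeof(std::size_t) bytes.
        const char* header = reinterpret_cast<const char*>(&n);
        file_.insert(file_.end(), header, header + sizeof n);
        file_.insert(file_.end(), bytes.begin(), bytes.end());
        return offset;
    }
    // Read the record starting at offset; returns false if it is out of range.
    bool read(std::size_t offset, std::string& bytes) const {
        std::size_t n = 0;
        if (offset > file_.size() || file_.size() - offset < sizeof n) return false;
        std::memcpy(&n, file_.data() + offset, sizeof n);
        const std::size_t start = offset + sizeof n;
        if (file_.size() - start < n) return false;
        bytes.assign(file_.data() + start, n);
        return true;
    }
private:
    std::vector<char> file_;  // stands in for a real file
};
```

The returned offset plays the role of the technology-specific part of a persistent address.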

Ref → LFN service
Translates an Object Reference into the logical filename of the file containing the referenced objects
Expected that Ref will have some kind of file id, that can be used to determine logical file name in the grid sense

LFN → {PFN →} PosixName service
Translate a logical filename into the POSIX name of one physical replica of this file
Expect to get this from grid projects, though common project may need to deliver a service that hides several possible paths behind a single interface
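
The two lookup steps (file id to logical filename, logical filename to one physical replica) can be sketched with a single toy catalog. `FileCatalog`, its methods and the "first registered replica" policy are hypothetical; in practice these lookups are expected to come from grid services.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical catalog covering both lookups: file id -> LFN -> replicas.
class FileCatalog {
public:
    void registerFile(unsigned long fileId, const std::string& lfn) {
        lfns_[fileId] = lfn;
    }
    void addReplica(const std::string& lfn, const std::string& posixName) {
        replicas_[lfn].push_back(posixName);
    }
    // Ref -> LFN step: map the file id carried by a reference to a logical name.
    bool lfnFor(unsigned long fileId, std::string& lfn) const {
        std::map<unsigned long, std::string>::const_iterator it = lfns_.find(fileId);
        if (it == lfns_.end()) return false;
        lfn = it->second;
        return true;
    }
    // LFN -> POSIX step: pick one physical replica (here the first registered).
    bool posixNameFor(const std::string& lfn, std::string& posixName) const {
        std::map<std::string, std::vector<std::string> >::const_iterator it =
            replicas_.find(lfn);
        if (it == replicas_.end() || it->second.empty()) return false;
        posixName = it->second.front();
        return true;
    }
private:
    std::map<unsigned long, std::string> lfns_;
    std::map<std::string, std::vector<std::string> > replicas_;
};
```

Hiding both maps behind one class is one reading of "a service that hides several possible paths behind a single interface".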

Event collection services
Support for explicit event collections (not just collections by containment)
  – Support for collections of event references
Queryable collections: Like a list of events, together with queryable tags
  – Possibly indexed
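
A queryable collection of event references might look roughly like the sketch below. `EventEntry`, `EventCollection`, the numeric tag map and the predicate-based `select` are assumptions for illustration; no indexing is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical entry: a reference to an event plus its queryable tags.
struct EventEntry {
    std::string eventRef;                // externalized reference to the event
    std::map<std::string, double> tags;  // e.g. "nMuons", "missingEt"
};

// Explicit collection: it holds references, not the events themselves.
class EventCollection {
public:
    void add(const EventEntry& entry) { entries_.push_back(entry); }
    std::size_t size() const { return entries_.size(); }
    // Return the references of all events whose tags satisfy the predicate.
    std::vector<std::string> select(
        const std::function<bool(const std::map<std::string, double>&)>& pred) const {
        std::vector<std::string> refs;
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (pred(entries_[i].tags)) refs.push_back(entries_[i].eventRef);
        }
        return refs;
    }
private:
    std::vector<EventEntry> entries_;
};
```

A selection runs over the tags alone, so the referenced events need not be read until the matching references are followed.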

One possible mapping to a ROOT implementation (under discussion)
[Diagram: the component diagram (PersistencyMgr, StreamerSvc, DictionarySvc, StorageMgr, CacheMgr, PlacementSvc) extended with MetaDataCatalog, FileCatalog, IteratorSvc, SelectorSvc and CustomCacheMgr and the interfaces ICache, IFCatalog and IMCatalog, annotated with ROOT classes: TFile, TDirectory, TSocket, TClass, TBuffer, TMessage, TRef, TKey, TGrid, TTree, TStreamerInfo, TChain, TEventList, TDSet]

Clarify Resources, Responsibilities & Risks
Expected resources at project start and their evolution
  – Commitments from experiments, the ROOT team and IT
Who does what (and until when)?
  – Who develops which software component(s)?
  – Who maintains those components afterwards?
  – Who develops production services around those?
What is the procedure for dropping any of those services?
  – Any “in house” development involves a significant maintenance commitment by somebody and risk for somebody else
  – Need to agree on these commitments

Components & Interfaces
Need to rapidly agree on concrete component interfaces
  – Could we classify/prioritize interfaces by risk? E.g. by damage caused by a later interface change
  – At least the major (aka external) interfaces need the official blessing of the Architects’ Forum and cannot be modified without its agreement
No component infrastructure defined so far
  – Component inheritance hierarchy, component factories, component mapping to shared libraries etc.
LCG approach needs to be “compatible” with
  – several LCG sub-projects
  – several different experiment frameworks
  – several existing HEP packages, e.g. ROOT I/O
  – several RDBMS implementations
May need to assume some instability until solid foundation is accepted for LCG applications area

Components & Interface cont.
Fast adoption of existing “interface” classes may be tempting but is also very risky
  – Should not just bless existing header files which were conceived as implementation headers
  – Should take the opportunity to (re-)design a minimal, but complete set of abstract interfaces
  – And then implement them using existing technology

Timescales, Functionality & Technologies of an Initial Prototype
Experiments have somewhat differing timescales for their first use of LCG components
  – Synchronization of initial release schedule would definitely improve chances of success
Experiments may favor different subsets of full functionality for a first prototype
  – Need to agree on main requirements for prototype s/w and associated services to guide implementation and technology choices
  – Synchronization of feature content and implementation technology is required
Which RDBMS backend? What are the deployment requirements?
  – “lightweight” system (end-user managed) - maybe reduced requirements on scalability and fault tolerance and even on functionality
  – Fully managed production system - based on established database services (incl. backup, recovery from h/w fault …)
May need prototype implementation for both

Summary
Persistency RTAG has delivered its final report to the SC2
  – Significant agreement on requirements and a component breakdown have been achieved
  – Report does not define all components and their interaction in full or equal depth
  – A common project on object persistency is proposed
Possible next steps
  – Clarify available resources and responsibilities
  – Agree on scope, timescale and deployment model of first project prototype
  – Rapidly agree on concrete set of component interfaces and spawn work packages to implement them
  – Continue to resolve remaining open questions