David Adams
ATLAS
Virtual Data in ATLAS
David Adams
BNL
May 5, 2002
US ATLAS core/grid software meeting
May 5, 2002Virtual Data in ATLAS US ATLAS core/grid mtg 2
David Adams
ATLAS
Contents
• Warning
• Definitions
• Purpose
• Event data granularity
– EDO
– Sharing category
– File
– Event list
– Dataset
– ADB event collection
• ADB differences
• Event data space
• ATLAS data model
• Event data history
• EDO VDS
• SC or event VDS
• File VDS
• Event list
• Dataset VDS
• Conclusions

Warning
Starting point
• The following is intended as a starting point for discussion
Sources
• Opinions expressed are my own
• I don’t know of any ATLAS policies or conventions for virtual data
• There is ATLAS work in progress to use GriPhyN virtual data model for DC1

Definitions
Virtual data
• Data which may be brought into existence using associated history or prehistory
History
• Record of how data was produced
Prehistory
• Prescription for creating data

Definitions (cont)
GriPhyN virtual data system (VDS)
• Unit of data
– so far file
• Transformation takes data units as input and produces more data units
– so far an executable with formal parameters
• Derivation is an application of a transformation
• How do we map ATLAS onto this model?
– See following…
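The transformation/derivation split can be illustrated with a minimal Python sketch. It is not GriPhyN code: the class names, the `run` callable standing in for an executable, and the toy `_reconstruct` body are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Transformation:
    """A recipe with formal parameters (hypothetical stand-in for an executable)."""
    name: str
    formal_params: Tuple[str, ...]
    # Maps (input unit names, bound parameters) to named output units.
    run: Callable[..., Dict[str, str]]


@dataclass(frozen=True)
class Derivation:
    """One application of a transformation to concrete inputs."""
    transformation: Transformation
    inputs: Tuple[str, ...]           # names of input data units (files, so far)
    actual_params: Dict[str, object]  # values bound to the formal parameters

    def execute(self) -> Dict[str, str]:
        formal = set(self.transformation.formal_params)
        if formal != set(self.actual_params):
            raise ValueError("actual parameters must bind exactly the formal ones")
        return self.transformation.run(self.inputs, **self.actual_params)


def _reconstruct(inputs, threshold):
    # Toy body: a real transformation would launch an executable.
    return {f"{name}.reco": f"reco({name}, threshold={threshold})" for name in inputs}


reco = Transformation("reco", ("threshold",), _reconstruct)
derivation = Derivation(reco, ("raw_001",), {"threshold": 5.0})
outputs = derivation.execute()
```

The same transformation can back many derivations. Only the inputs and the actual parameter values change.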

Purpose
Record keeping
• History provides a record of how data was produced (event-by-event and collectively)
On-demand generation
• If data does not exist or is not easily accessible
– History can be used to regenerate data
– Prehistory can be used to generate data
Production
• Prehistory can be used to configure production
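On-demand generation can be sketched as a small lookup: return the data if it exists, otherwise regenerate it from history or generate it from prehistory. The `VirtualDataCatalog` class and its fields are hypothetical. Moving a prehistory entry into history once it has run is an assumption of this sketch, not a stated ATLAS or GriPhyN policy.

```python
from typing import Callable, Dict


class VirtualDataCatalog:
    """Toy catalog: data is returned if present, otherwise (re)generated."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}                     # data that exists now
        self.history: Dict[str, Callable[[], str]] = {}     # how existing data was produced
        self.prehistory: Dict[str, Callable[[], str]] = {}  # how to create data not yet produced

    def get(self, name: str) -> str:
        if name in self.store:
            return self.store[name]
        if name in self.history:
            recipe = self.history[name]          # regenerate from the record
        elif name in self.prehistory:
            recipe = self.prehistory.pop(name)   # generate from the prescription
            self.history[name] = recipe          # assumption: it becomes history once run
        else:
            raise KeyError(f"no data, history or prehistory for {name!r}")
        self.store[name] = recipe()
        return self.store[name]


catalog = VirtualDataCatalog()
catalog.prehistory["jets_v2"] = lambda: "jets built with cone 0.4"
first = catalog.get("jets_v2")    # generated from prehistory
del catalog.store["jets_v2"]      # simulate the data being deleted
second = catalog.get("jets_v2")   # regenerated from history
```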

Event data granularity
ATLAS levels of data granularity
• Physics object (e.g. track, jet or electron)
• EDO – event data object
• Sharing category
• Event
• File
• Event list
• Dataset

EDO
Definition
• EDO is a collection of physics objects
– Typically homogeneous
– May add some collective data such as total transverse energy
• An algorithm takes one or more EDO’s as input and produces one (or more) as output
– Reminiscent of VDS
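A rough sketch of an EDO and of an algorithm that maps one EDO to another is given here in Python for brevity. The `Jet`, `EDO` and `select_hard_jets` names, the `type_key` field and the 20 GeV threshold are illustrative assumptions and do not come from ATLAS code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Jet:
    et: float  # transverse energy in GeV


@dataclass
class EDO:
    type_key: str                  # identifies the kind of content
    objects: List[Jet]             # homogeneous physics objects
    collective: Dict[str, float] = field(default_factory=dict)  # e.g. total transverse energy


def select_hard_jets(jets: EDO, min_et: float) -> EDO:
    """Toy algorithm: one EDO in, one EDO out."""
    kept = [jet for jet in jets.objects if jet.et >= min_et]
    total_et = sum(jet.et for jet in kept)
    return EDO("HardJets", kept, {"total_et": total_et})


all_jets = EDO("Jets", [Jet(10.0), Jet(25.0), Jet(40.0)])
hard_jets = select_hard_jets(all_jets, min_et=20.0)
```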

Sharing Category
Definition
• Collection of related EDO’s with the same event ID
– E.g. tracking data or high-level physics objects
• No sharing of EDO’s between categories?
• Sharing category is not shared between files

Event
Warning
• Event may mean beam crossing or subset of associated data (event view)
HES event view
• Arbitrary collection of EDO’s associated with the same event ID
• Scope defined by context
– E.g. file or transient data store
– All data (including versions) probably not useful
• Typically (always?) includes all contents of a well-defined set of sharing categories

File
Current HES definition
• Holds EDO’s for a specified set of event ID’s
• Holds the same set of sharing categories for each event
• Sharing category or EDO may be held by value or reference
• EDO may be a replica

File (cont)
[Figure: Example of possible associations between HES files, placement categories (PC's) and event data objects (EDO's). Files f1, f2 and f3 each hold events e1 and/or e2, with placement categories p1 to p3 containing EDO's.]

File (cont)
Future HES definition
• Add history for each EDO
• Option to only hold history (regeneration)
• Include non-event data
– E.g. replicas of shared history objects such as algorithms
• Option to hold only instruction for building (prehistory)?
• Drop PC’s (sharing categories)?

Event list
Definition
• Collection of ID’s for events satisfying physics selection criteria
– E.g. 2 or more jets, one lepton, missing ET all with energy or momentum thresholds
• Data versions on which selections were based
• Collective properties
– Integrated luminosity

Dataset
Purpose
• To identify the data (and hence the files) that must be gathered for a job to run
Definition
• Event list
• Restriction on content (EDO type-keys)
– E.g. only summary data or tracking data
• Versions of these EDO’s
– Require consistency with selection versions?
• File collection(s) holding these EDO’s
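The dataset's stated purpose, identifying the files a job needs, can be sketched as a lookup against a file catalog. The catalog layout (file name mapped to the event ID, type-key and version triples it holds), the `Dataset` fields and the `files_needed` helper are hypothetical choices for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical catalog: file name -> (event ID, type-key, version) entries it holds.
FileCatalog = Dict[str, Set[Tuple[int, str, str]]]


@dataclass
class Dataset:
    event_ids: FrozenSet[int]  # the event list
    versions: Dict[str, str]   # allowed type-keys and the version wanted for each


def files_needed(dataset: Dataset, catalog: FileCatalog) -> List[str]:
    """Return the files holding at least one EDO the dataset asks for."""
    wanted = {
        (event_id, type_key, version)
        for event_id in dataset.event_ids
        for type_key, version in dataset.versions.items()
    }
    return sorted(name for name, held in catalog.items() if held & wanted)


catalog: FileCatalog = {
    "f1": {(1, "Tracks", "v1"), (2, "Tracks", "v1")},
    "f2": {(1, "Jets", "v1")},
    "f3": {(2, "Tracks", "v2"), (3, "Tracks", "v1")},
}
tracking = Dataset(frozenset({1, 2}), {"Tracks": "v1"})
needed = files_needed(tracking, catalog)
```

Here only `f1` qualifies: `f2` holds the wrong type-key, and `f3` holds either the wrong version or an event outside the list.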

ADB differences
Event
• ADB event generally holds copies or references to all the event data used in its construction
• Not possible to combine views of an event
– E.g. tracks from one and jets from another
– Advantage is enforced consistency
– Disadvantage is limited flexibility
Event collection
• ADB event collection is between the event list and dataset defined here

Event data space
[Figure: event data space with axes Event ID, Version (code and parameters) and Content (type-key, sharing category, RAW/ESD/AOD). Labeled items: Event list, Files, Dataset.]

ATLAS data model
[Figure: muon event data model. Raw data (byte stream); unpacking; MDT, RPC, TGC and CSC digits; calibration, alignment and clustering; MDT, RPC, TGC and CSC hits; muon tracks. Annotation: candidates for virtual data and/or smart containers.]

Event data history
Current ATLAS model is object oriented
• History object for each EDO references
– EDO
– parents of EDO
– Algorithm history object
– Job history object
• See figure
Contains complete history only if ancestor history objects are present
• Regeneration not possible if these are gone
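The completeness condition can be made concrete with a short sketch that walks the parent links and reports whether every ancestor history object is still present. The `EDOHistory` fields are simplified stand-ins for the referenced objects. Treating raw data as needing no history is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class EDOHistory:
    edo_id: str
    parent_ids: List[str]  # EDOs used as input
    algorithm: str         # stands in for the algorithm history object
    job: str               # stands in for the job history object


def has_complete_history(
    edo_id: str, histories: Dict[str, EDOHistory], raw_ids: Set[str]
) -> bool:
    """True if every ancestor back to the raw data still has a history object."""
    pending = [edo_id]
    visited: Set[str] = set()
    while pending:
        current = pending.pop()
        if current in raw_ids or current in visited:
            continue  # assumption: raw data is the starting point and needs no history
        visited.add(current)
        record = histories.get(current)
        if record is None:
            return False  # an ancestor history is gone, so regeneration is not possible
        pending.extend(record.parent_ids)
    return True


histories = {
    "tracks": EDOHistory("tracks", ["hits"], "TrackFinder v3", "job 17"),
    "hits": EDOHistory("hits", ["raw"], "HitMaker v2", "job 12"),
}
complete = has_complete_history("tracks", histories, raw_ids={"raw"})
del histories["hits"]
after_loss = has_complete_history("tracks", histories, raw_ids={"raw"})
```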

Event data history (cont)
[Figure: EDO history object diagram. EDOHistory (event ID, start time, stop time, CPU time, return status, checksum) is linked to the EDO, its parents, calib/geom, a JobHistory (release version, start time, CPU, OS, ...) and an AlgorithmHistory (jobOptions, algorithm version).]

Event data history
Modify to add prehistory
• Enable regeneration from a single event history object
• Replace algorithm history with algorithm history DAG (directed acyclic graph)
– Include links to ancestor algorithm history objects
– Requires opening the history objects of the parent EDO’s (unless these were written as part of the same job)
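One way to picture the algorithm history DAG is to follow parent links from an EDO back to the raw data and then order the steps so that inputs come before outputs. This is only a sketch. The `parents_of` table and its type-keys are made up for illustration, and `graphlib.TopologicalSorter` (Python 3.9+ standard library) is used as a convenient ordering tool, not as something the ATLAS model prescribes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical per-EDO histories: type-key -> type-keys of the parent EDOs.
parents_of: Dict[str, List[str]] = {
    "MuonTracks": ["MDTHits", "RPCHits"],
    "MDTHits": ["MDTDigits"],
    "RPCHits": ["RPCDigits"],
    "MDTDigits": ["RawData"],
    "RPCDigits": ["RawData"],
    "RawData": [],
}


def ancestor_dag(edo: str, parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Collect the EDO and all of its ancestors as node -> direct parents."""
    dag: Dict[str, Set[str]] = {}
    pending = [edo]
    while pending:
        current = pending.pop()
        if current in dag:
            continue
        dag[current] = set(parents.get(current, []))
        pending.extend(dag[current])
    return dag


dag = ancestor_dag("MuonTracks", parents_of)
# A valid regeneration order: every EDO appears after all of its parents.
order = list(TopologicalSorter(dag).static_order())
```

Storing such a DAG with each EDO history is what would allow regeneration from a single history object, at the cost of opening the parent histories when the DAG is written.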

EDO VDS
ATLAS data unit
• Having identified the ATLAS levels of granularity, we need to select which one(s) is used to define our VDS unit of data
• In the original GriPhyN design, the file was chosen
– This is being generalized
• Natural choice for us is the EDO
– Smallest unit of processing (?)
– Smallest unit of replication (?)

EDO VDS (cont)
EDO transformation
• Transformation is an Athena algorithm specified by
– Parameters in jobOptions
– Algorithm version
• Athena executable typically performs multiple transformations
– Algorithm DAG
• Input type-keys are implicit (buried in the code)
• This is prehistory data

EDO VDS (cont)
EDO derivation
• Specify input data (event view)
– Event ID
– Input EDO instances (not just type-key)
– Use parent EDO histories to extend algorithm DAG’s back to the raw data
• Job-specific (which CPU, resources consumed,…)
• Combined with transformation data, these give the history for each produced EDO

SC or event VDS
Sharing category or event (view) is a collection of EDO’s
• Transformations and derivations can be expressed by merging those for the constituent EDO’s
• Algorithm DAG’s can often be merged into a single (connected) DAG
– Transformation (derivation) should express which EDO’s are kept (present)
Sensible to speak of SC or event VDS
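Merging the algorithm DAG's of the constituent EDO's amounts to a union of their edges, followed by a check that the result is one connected piece. The sketch below assumes a simple node-to-parents dictionary as the DAG representation. The `merge_dags` and `is_connected` helpers and the type-keys in the example are hypothetical.

```python
from typing import Dict, Set

Dag = Dict[str, Set[str]]  # node -> direct parents


def merge_dags(*dags: Dag) -> Dag:
    """Union of nodes and edges from several DAGs."""
    merged: Dag = {}
    for dag in dags:
        for node, parents in dag.items():
            merged.setdefault(node, set()).update(parents)
            for parent in parents:
                merged.setdefault(parent, set())
    return merged


def is_connected(dag: Dag) -> bool:
    """True if the DAG is one piece when edge direction is ignored."""
    if not dag:
        return True
    neighbours = {node: set(parents) for node, parents in dag.items()}
    for node, parents in dag.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(node)
    start = next(iter(neighbours))
    seen = {start}
    pending = [start]
    while pending:
        for nxt in neighbours[pending.pop()] - seen:
            seen.add(nxt)
            pending.append(nxt)
    return len(seen) == len(neighbours)


tracks_dag: Dag = {"Tracks": {"Hits"}, "Hits": {"RawData"}}
jets_dag: Dag = {"Jets": {"Cells"}, "Cells": {"RawData"}}
event_dag = merge_dags(tracks_dag, jets_dag)
shares_ancestor = is_connected(event_dag)
```

The two DAG's connect here because both trace back to the same raw data. EDO's with no common ancestor would merge into a disconnected graph.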

File VDS (cont)
File is also a collection of EDO’s but
• Do events have a common transformation?
– Same algorithm histories
• Do events have a common derivation?
– Same job
– Same input file algorithm DAG’s
Probably not, which implies VDS is less useful for files
• However it is likely useful to keep track of the transformations and derivations used in each file

Event list VDS (cont)
Event list
Transformation
• Is a selection algorithm applied to each event
• Includes specification of the content (EDO type-keys) on which the selection is based
• Might include restriction on EDO versions
Derivation
• Recorded at the event level, specifies
– EDO instances (normally from dataset)
– Job parameters (CPU, …)
• Meaningful in the context of a dataset
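An event list transformation can be sketched as a selection function plus the type-keys it reads, applied event by event. The toy content model (lists of energies per type-key), the thresholds and the `two_jets_one_lepton` selection are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

EventView = Dict[str, List[float]]  # type-key -> object energies in GeV (toy content)


@dataclass
class EventListTransformation:
    type_keys: Tuple[str, ...]           # content the selection is based on
    select: Callable[[EventView], bool]  # selection algorithm, applied per event


def build_event_list(
    events: Dict[int, EventView], transformation: EventListTransformation
) -> List[int]:
    """Apply the selection to each event and collect the passing event IDs."""
    selected = []
    for event_id in sorted(events):
        content = events[event_id]
        # Expose only the declared type-keys to the selection.
        view = {key: content.get(key, []) for key in transformation.type_keys}
        if transformation.select(view):
            selected.append(event_id)
    return selected


def two_jets_one_lepton(view: EventView) -> bool:
    # Hypothetical thresholds: jets above 20 GeV, leptons above 15 GeV.
    jets = [e for e in view["Jets"] if e > 20.0]
    leptons = [e for e in view["Leptons"] if e > 15.0]
    return len(jets) >= 2 and len(leptons) == 1


events = {
    1: {"Jets": [30.0, 45.0], "Leptons": [22.0]},
    2: {"Jets": [30.0, 10.0], "Leptons": [22.0]},
    3: {"Jets": [50.0, 60.0, 25.0], "Leptons": []},
}
selection = EventListTransformation(("Jets", "Leptons"), two_jets_one_lepton)
event_list = build_event_list(events, selection)
```

The derivation would additionally record which EDO instances and job parameters were used for each event, which this sketch leaves out.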

Dataset VDS (cont)
Dataset transformations include
• Algorithm DAG
• Event selection
• Event merge (new but trivial)
Dataset derivation includes
• Input datasets
• Distributed job description
– Full specification (e.g. CPU for a given EDO) probably requires examining EDO histories

Conclusions
Much work to do. This is a first pass.
Most useful data units for VDS are
• EDO for tracking data at the event level and data regeneration
• Dataset for staging and tracking production and shared event selection
• What about files?