David Adams ATLAS Virtual Data in ATLAS David Adams BNL May 5, 2002 US ATLAS core/grid software meeting

David Adams

ATLAS

Virtual Data in ATLAS

David Adams

BNL

May 5, 2002

US ATLAS core/grid software meeting

Contents

• Warning
• Definitions
• Purpose
• Event data granularity
– EDO
– Sharing category
– File
– Event list
– Dataset
– ADB event collection
• ADB differences
• Event data space
• ATLAS data model
• Event data history
• EDO VDS
• SC or event VDS
• File VDS
• Event list
• Dataset VDS
• Conclusions

Warning

Starting point
• The following is intended as a starting point for discussion

Sources
• Opinions expressed are my own
• I don’t know of any ATLAS policies or conventions for virtual data
• There is ATLAS work in progress to use the GriPhyN virtual data model for DC1

Definitions

Virtual data
• Data which may be brought into existence using associated history or prehistory

History
• Record of how data was produced

Prehistory
• Prescription for creating data

Definitions (cont)

GriPhyN virtual data system (VDS)
• Unit of data
– so far, a file
• Transformation takes data units as input and produces more data units
– so far, an executable with formal parameters
• Derivation is an application of a transformation
• How do we map ATLAS onto this model?
– See following…
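The transformation/derivation distinction can be illustrated with a minimal Python sketch. This is not GriPhyN code: the class names, the field names and the `reconstruct` function are all hypothetical, and data units are represented simply by their names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Transformation:
    """An executable with formal parameters (modelled here as a Python callable)."""
    name: str
    formal_params: Tuple[str, ...]
    run: Callable[..., List[str]]


@dataclass(frozen=True)
class Derivation:
    """An application of a transformation to actual inputs and parameter values."""
    transformation: Transformation
    inputs: Tuple[str, ...]        # names of the input data units (files, so far)
    actual_params: Dict[str, str]  # values bound to the formal parameters

    def execute(self) -> List[str]:
        # Every formal parameter must be bound before the transformation can run.
        missing = set(self.transformation.formal_params) - set(self.actual_params)
        if missing:
            raise ValueError(f"unbound formal parameters: {sorted(missing)}")
        return self.transformation.run(self.inputs, **self.actual_params)


def reconstruct(inputs, threshold):
    # Hypothetical executable: names one output data unit per input.
    return [f"{name}.reco(threshold={threshold})" for name in inputs]


reco = Transformation("reconstruct", ("threshold",), reconstruct)
derivation = Derivation(reco, ("raw_001",), {"threshold": "5"})
outputs = derivation.execute()
```

The same `Transformation` can back many derivations; only the inputs and the bound parameter values change.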

Purpose

Record keeping
• History provides a record of how data was produced (event-by-event and collectively)

On-demand generation
• If data does not exist or is not easily accessible
– History can be used to regenerate data
– Prehistory can be used to generate data

Production
• Prehistory can be used to configure production
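On-demand generation might be sketched as follows. The three dictionaries and the `materialize` function are assumptions made for illustration; history and prehistory are reduced to callables that produce the data.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for a data store and its catalogs.
store: Dict[str, str] = {}                     # data that currently exists
history: Dict[str, Callable[[], str]] = {}     # how data was produced
prehistory: Dict[str, Callable[[], str]] = {}  # prescription for creating data


def materialize(name: str) -> str:
    """Return the named data, generating it on demand if it does not exist."""
    if name in store:
        return store[name]
    # Prefer history (regeneration); fall back to prehistory (first generation).
    recipe = history.get(name) or prehistory.get(name)
    if recipe is None:
        raise KeyError(f"no history or prehistory for {name!r}")
    store[name] = recipe()
    return store[name]


history["tracks_v1"] = lambda: "tracks rebuilt from history"
prehistory["jets_v2"] = lambda: "jets built from prehistory"
```

A second call to `materialize` for the same name returns the stored copy without running the recipe again.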

Event data granularity

ATLAS levels of data granularity
• Physics object (e.g. track, jet or electron)
• EDO – event data object
• Sharing category
• Event
• File
• Event list
• Dataset

EDO

Definition
• EDO is a collection of physics objects
– Typically homogeneous
– May add some collective data such as total transverse energy
• An algorithm takes one or more EDO’s as input and produces one (or more) as output
– Reminiscent of VDS
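As a rough illustration of the definition above, an EDO could be modelled as a homogeneous collection with a derived collective quantity, and an algorithm as a function from EDO to EDO. The `Jet`, `JetEDO` and `select_hard_jets` names are hypothetical and do not correspond to ATLAS classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Jet:
    et: float  # transverse energy in GeV (hypothetical field)


@dataclass
class JetEDO:
    """A homogeneous collection of physics objects plus collective data."""
    jets: List[Jet]

    @property
    def total_et(self) -> float:
        # Collective data: total transverse energy of the collection.
        return sum(jet.et for jet in self.jets)


def select_hard_jets(edo: JetEDO, min_et: float) -> JetEDO:
    """An 'algorithm': takes one EDO as input and produces a new EDO."""
    return JetEDO([jet for jet in edo.jets if jet.et >= min_et])


all_jets = JetEDO([Jet(40.0), Jet(12.5), Jet(7.5)])
hard_jets = select_hard_jets(all_jets, min_et=10.0)
```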

Sharing category

Definition
• Collection of related EDO’s with the same event ID
– E.g. tracking data or high-level physics objects
• No sharing of EDO’s between categories?
• Sharing category is not shared between files

Event

Warning
• Event may mean beam crossing or subset of associated data (event view)

HES event view
• Arbitrary collection of EDO’s associated with the same event ID
• Scope defined by context
– E.g. file or transient data store
– All data (including versions) probably not useful
• Typically (always?) includes all contents of a well-defined set of sharing categories

File

Current HES definition
• Holds EDO’s for a specified set of event ID’s
• Holds the same set of sharing categories for each event
• Sharing category or EDO may be held by value or reference
• EDO may be a replica

File (cont)

[Figure: HES files f1, f2 and f3, each holding placement categories (p1 to p3) for events e1 and/or e2, together with the EDO’s they contain.]

Example of possible associations between HES files, placement categories (PC’s) and event data objects (EDO’s).

File (cont)

Future HES definition
• Add history for each EDO
• Option to only hold history (regeneration)
• Include non-event data
– E.g. replicas of shared history objects such as algorithms
• Option to hold only instruction for building (prehistory)?
• Drop PC’s (sharing categories)?

Event list

Definition
• Collection of ID’s for events satisfying physics selection criteria
– E.g. 2 or more jets, one lepton, missing ET all with energy or momentum thresholds
• Data versions on which selections were based
• Collective properties
– Integrated luminosity

Dataset

Purpose
• To identify the data (and hence the files) that must be gathered for a job to run

Definition
• Event list
• Restriction on content (EDO type-keys)
– E.g. only summary data or tracking data
• Versions of these EDO’s
– Require consistency with selection versions?
• File collection(s) holding these EDO’s
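The dataset's purpose, finding the files a job needs, can be sketched under the assumption of a simple catalog that records which (event ID, type-key, version) triples each file holds. The `Dataset` class, the `Catalog` layout and `files_needed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical catalog: file name -> set of (event ID, type-key, version) it holds.
Catalog = Dict[str, Set[Tuple[int, str, str]]]


@dataclass(frozen=True)
class Dataset:
    event_ids: FrozenSet[int]  # the event list
    type_keys: FrozenSet[str]  # restriction on content
    versions: Dict[str, str]   # type-key -> required EDO version

    def wanted(self) -> Set[Tuple[int, str, str]]:
        # Every (event, type-key, version) combination the dataset covers.
        return {(event_id, key, self.versions[key])
                for event_id in self.event_ids
                for key in self.type_keys}


def files_needed(dataset: Dataset, catalog: Catalog) -> Set[str]:
    """Files holding at least one EDO that belongs to the dataset."""
    wanted = dataset.wanted()
    return {name for name, held in catalog.items() if held & wanted}


catalog: Catalog = {
    "f1": {(1, "tracks", "v2"), (2, "tracks", "v2")},
    "f2": {(1, "jets", "v1")},
    "f3": {(3, "tracks", "v2"), (1, "tracks", "v1")},
}
tracking = Dataset(frozenset({1, 2}), frozenset({"tracks"}), {"tracks": "v2"})
```

Here only `f1` qualifies: `f2` holds the wrong content, and `f3` holds either the wrong event or the wrong version.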

ADB differences

Event
• ADB event generally holds copies or references to all the event data used in its construction
• Not possible to combine views of an event
– E.g. tracks from one and jets from another
– Advantage is enforced consistency
– Disadvantage is limited flexibility

Event collection
• ADB event collection is between the event list and dataset defined here

Event data space

[Figure: event data space with three axes: event ID, version (code and parameters) and content (type-key, sharing category, RAW/ESD/AOD). Labeled elements: event list, files, dataset.]

ATLAS data model

[Figure: muon event data model. Boxes: raw data (byte stream); MDT, RPC, TGC and CSC digits; MDT, RPC, TGC and CSC hits; muon tracks. Step labels: unpacking; calibration, alignment and clustering. Annotation: candidates for virtual data and/or smart containers.]

Event data history

Current ATLAS model is object oriented
• History object for each EDO references
– EDO
– parents of EDO
– Algorithm history object
– Job history object
• See figure

Contains complete history only if ancestor history objects are present
• Regeneration not possible if these are gone
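The completeness condition can be made concrete with a small sketch that walks the parent links and reports any history object that is absent. The `EDOHistory` fields below are simplified stand-ins, and the sketch assumes raw data has a history object with no parents.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EDOHistory:
    edo_id: str
    parent_ids: List[str] = field(default_factory=list)
    algorithm: str = ""  # stands in for the algorithm history object
    job: str = ""        # stands in for the job history object


def missing_histories(edo_id: str, histories: Dict[str, EDOHistory]) -> List[str]:
    """IDs (the EDO or its ancestors) whose history object is absent.

    An empty result means the history is complete and regeneration is possible.
    """
    missing, seen, stack = [], set(), [edo_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record = histories.get(current)
        if record is None:
            # Cannot look further back: this object's parents are unknown.
            missing.append(current)
            continue
        stack.extend(record.parent_ids)
    return sorted(missing)


histories = {
    "tracks": EDOHistory("tracks", ["hits"]),
    "hits": EDOHistory("hits", ["digits"]),
    "digits": EDOHistory("digits", ["raw"]),
    "raw": EDOHistory("raw"),
}
```

Deleting the `hits` entry makes the walk from `tracks` stop there, which is the situation in which regeneration is no longer possible.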

Event data history (cont)

[Figure: EDOHistory (event ID, start time, stop time, CPU time, return status, checksum) linked to its EDO, calib/geom, parents, a JobHistory (release version, start time, CPU, OS, ...) and an AlgorithmHistory (jobOptions, algorithm version).]

Event data history

Modify to add prehistory
• Enable regeneration from a single event history object
• Replace algorithm history with algorithm history DAG (directed acyclic graph)
– Include links to ancestor algorithm history objects
– Requires opening the history objects of the parent EDO’s (unless these were written as part of the same job)
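With the full algorithm history DAG stored alongside a single EDO, a regeneration order follows from a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the algorithm names are hypothetical and only loosely follow the muon data model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical algorithm history DAG held with one EDO's history:
# each algorithm maps to the algorithms whose output it consumed.
algorithm_dag: Dict[str, Set[str]] = {
    "MuonTracking": {"MdtClustering", "RpcClustering"},
    "MdtClustering": {"Unpacking"},
    "RpcClustering": {"Unpacking"},
    "Unpacking": set(),
}


def regeneration_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Order in which algorithms must be rerun: ancestors before descendants."""
    return list(TopologicalSorter(dag).static_order())


order = regeneration_order(algorithm_dag)
```

Because the ancestor links are carried in the DAG itself, no parent history objects need to be opened to compute the order.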

EDO VDS

ATLAS data unit
• Having identified the ATLAS levels of granularity, we need to select which one(s) is used to define our VDS unit of data
• In the original GriPhyN design, the file was chosen
– This is being generalized
• Natural choice for us is the EDO
– Smallest unit of processing (?)
– Smallest unit of replication (?)

EDO VDS (cont)

EDO transformation
• Transformation is an Athena algorithm specified by
– Parameters in jobOptions
– Algorithm version
• Athena executable typically performs multiple transformations
– Algorithm DAG
• Input type-keys are implicit (buried in the code)
• This is prehistory data

EDO VDS (cont)

EDO derivation
• Specify input data (event view)
– Event ID
– Input EDO instances (not just type-key)
– Use parent EDO histories to extend algorithm DAG’s back to the raw data
• Job-specific data (which CPU, resources consumed, …)
• Combined with transformation data, these give the history for each produced EDO

SC or event VDS

Sharing category or event (view) is a collection of EDO’s
• Transformations and derivations can be expressed by merging those for the constituent EDO’s
• Algorithm DAG’s can often be merged into a single (connected) DAG
– Transformation (derivation) should express which EDO’s are kept (present)

Sensible to speak of SC or event VDS
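Merging per-EDO algorithm DAG's and checking whether the result is a single connected DAG might look like the sketch below. The `Dag` representation (algorithm mapped to the algorithms it depends on) and the algorithm names are assumptions.

```python
from typing import Dict, Set

Dag = Dict[str, Set[str]]  # algorithm -> algorithms it depends on


def merge_dags(*dags: Dag) -> Dag:
    """Union of nodes and edges from the per-EDO algorithm DAG's."""
    merged: Dag = {}
    for dag in dags:
        for node, parents in dag.items():
            merged.setdefault(node, set()).update(parents)
            for parent in parents:
                merged.setdefault(parent, set())
    return merged


def is_connected(dag: Dag) -> bool:
    """True if the DAG, viewed as an undirected graph, is a single component."""
    if not dag:
        return True
    neighbours = {node: set(parents) for node, parents in dag.items()}
    for node, parents in dag.items():
        for parent in parents:
            neighbours.setdefault(parent, set()).add(node)
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(neighbours)


tracks_dag: Dag = {"Tracking": {"Clustering"}, "Clustering": {"Unpacking"}}
jets_dag: Dag = {"JetFinding": {"CaloClustering"}, "CaloClustering": {"Unpacking"}}
merged = merge_dags(tracks_dag, jets_dag)
```

The two DAG's share the `Unpacking` step, so the merged graph is connected; adding an unrelated algorithm would break that.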

File VDS (cont)

File is also a collection of EDO’s but
• Do events have a common transformation?
– Same algorithm histories
• Do events have a common derivation?
– Same job
– Same input file algorithm DAG’s

Probably not, which implies VDS is less useful for files
• However it is likely useful to keep track of the transformations and derivations used in each file

Event list VDS (cont)

Event list transformation
• Is a selection algorithm applied to each event
• Includes specification of the content (EDO type-keys) on which the selection is based
• Might include restriction on EDO versions

Derivation
• Recorded at the event level, specifies
– EDO instances (normally from dataset)
– Job parameters (CPU, …)
• Meaningful in the context of a dataset
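An event list transformation, a selection applied to each event over stated type-keys, could be sketched like this. The event layout (type-key mapped to a small dictionary) and the class name are hypothetical; the cut mirrors the "2 or more jets, one lepton" example without thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Event = Dict[str, dict]  # type-key -> EDO contents for one event (hypothetical)


@dataclass(frozen=True)
class EventListTransformation:
    type_keys: FrozenSet[str]        # content on which the selection is based
    select: Callable[[Event], bool]  # selection algorithm applied to each event

    def apply(self, events: Dict[int, Event]) -> List[int]:
        """Return the IDs of events that hold all type-keys and pass the selection."""
        selected = []
        for event_id, event in sorted(events.items()):
            view = {key: event[key] for key in self.type_keys if key in event}
            if len(view) == len(self.type_keys) and self.select(view):
                selected.append(event_id)
        return selected


two_jets_one_lepton = EventListTransformation(
    frozenset({"jets", "leptons"}),
    lambda ev: ev["jets"]["n"] >= 2 and ev["leptons"]["n"] == 1,
)
events = {
    1: {"jets": {"n": 3}, "leptons": {"n": 1}},
    2: {"jets": {"n": 1}, "leptons": {"n": 1}},
    3: {"jets": {"n": 2}},  # no lepton EDO, so the selection cannot be evaluated
}
event_list = two_jets_one_lepton.apply(events)
```

Skipping events that lack a required type-key is a design choice of this sketch; a real system might instead trigger generation of the missing EDO.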

Dataset VDS (cont)

Dataset transformations include
• Algorithm DAG
• Event selection
• Event merge (new but trivial)

Dataset derivation includes
• Input datasets
• Distributed job description
– Full specification (e.g. CPU for a given EDO) probably requires examining EDO histories

Conclusions

Much work to do. This is a first pass.

Most useful data units for VDS are
• EDO for tracking data at the event level and data regeneration
• Dataset for staging and tracking production and shared event selection
• What about files?