ATLAS Data Challenges
LCG - PEB meeting
CERN December 12th 2001
Gilbert Poulard, CERN EP-ATC
Outlook
• ATLAS Data Challenges
• Some considerations
ATLAS Data Challenges
Goal: understand and validate our computing model, our data model, our software and our technology choices.
How? By iterating on a set of DCs of increasing complexity.
Ideally:
• Start with data which looks like real data
• Run the filtering and reconstruction chain
• Store the output data into our database
• Run the analysis
• Produce physics results
• Study performance issues, database technologies, analysis scenarios, ...
• Identify weaknesses, bottlenecks, etc. (but also good points)
But we need to produce the 'data' and satisfy 'some' communities:
• Simulation will be part of DC0 & DC1
• Data needed by the HLT community
ATLAS Data Challenges: DC0
Three 'original' paths involving databases:
• Generator → Geant3 (Zebra → Objy) → Athena reconstruction → simple analysis
  This is the "primary" chain (100,000 events). Purpose: the principal continuity test.
• Atlfast chain: Generator → Atlfast → simple analysis
  Demonstrated for Lund, but the (transient) software is changing. Purpose: continuity test.
• Physics TDR data (Zebra → Objy) → Athena reconstruction → simple analysis
  Purpose: Objy test?
Additional path:
• Generator → Geant4 (Objy). Purpose: robustness test (100,000 events).
ATLAS Data Challenges: DC0
Originally: November-December 2001
• A 'continuity' test through the software chain; the aim is primarily to check the state of readiness for DC1.
• We plan ~100k Z+jet events, or similar.
• "Software works" issues to be checked include:
  – G3 simulation running with the 'latest' version of the geometry
  – reconstruction running
• Data must be written to / read from the database.
Now:
• Before Xmas: ~30k events (full simulation) + ~30k events (conversion); G4 robustness test (~100k events)
• Early January: repeat the exercise with a new release (full chain)
• DC0: end January; statistics to be defined (~100k events)
ATLAS Data Challenges: DC1
DC1: February-July 2002
• Reconstruction & analysis on a large scale; learn about the data model and I/O performance, identify bottlenecks, ...
• Use of GRID as and when possible and appropriate
• Data management: use (evaluate) more than one database technology (Objectivity and ROOT I/O); their relative importance is under discussion
• Learn about distributed analysis
• Should involve CERN & outside-CERN sites; site planning is going on, and an incomplete list already includes sites from Canada, France, Italy, Japan, UK, US and Russia
• Scale: 10^7 events in 10-20 days, O(1000) PCs (see the sketch after this list)
• Data needed by HLT & physics groups (others?); simulation & pile-up will play an important role; shortcuts may be needed (especially for HLT)!
• Checking of Geant4 versus Geant3
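A rough consistency check of that scale, as a minimal Python sketch: the 3000 SI95-sec per simulated event is taken from the DC1-HLT CPU table later in this talk, while the ~35 SI95 assumed for one PC of the era is our assumption, not a number from the slides.

    # Rough check of the DC1 scale: 10^7 events in 10-20 days on O(1000) PCs.
    events = 1e7
    si95_sec_per_event = 3000.0   # simulation cost, from the DC1-HLT CPU table
    si95_per_pc = 35.0            # assumed rating of a Pentium III-class PC

    total = events * si95_sec_per_event          # 3e10 SI95-sec in total
    for days in (10, 20):
        pcs = total / si95_per_pc / (days * 86400)
        print(f"{days} days -> ~{pcs:.0f} PCs")  # ~990 and ~500 PCs

Both numbers are consistent with the O(1000) PCs quoted above.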
ATLAS Data Challenges: DC1
DC1 will have two distinct phases:
• First, production of events for the HLT TDR, where the primary concern is delivery of events to the HLT community.
• Second, testing of software (G4, databases, detector description, etc.) with delivery of events for physics studies.
The software will change between these two phases.
Simulation & pile-up will be of great importance; the strategy is to be defined (I/O rate, number of "event" servers?).
As we want to do it 'world-wide' we will 'port' our software to the GRID environment and use the GRID middleware as much as possible (an ATLAS kit is to be prepared).
ATLAS Data Challenges: DC2
DC2: Spring-Autumn 2003
The scope will depend on what has and has not been achieved in DC0 & DC1. At this stage the goal includes:
• Use of the 'TestBed' which will be built in the context of Phase 1 of the "LHC Computing Grid Project"
• Scale: a sample of 10^8 events
• System at a complexity X% of the 2006-2007 system
• Extensive use of the GRID middleware
• Geant4 should play a major role
• Physics samples could (should) have 'hidden' new physics
• Calibration and alignment procedures should be tested
• Maybe to be synchronized with "Grid" developments
DC scenario
Production chain:
Event generation → Detector simulation → Pile-up → Detector responses → Reconstruction → Analysis
These steps should be as independent as possible.
Production stream for DC0-1

Step                               | Program                | Input    | Output          | Framework
Event generation                   | Pythia, Herwig, Isajet | none     | OO-db           | Athena
Detector simulation                | Geant3 (Dice)          | OO-db    | FZ              | Atlsim
                                   | Geant4                 | OO-db    | OO-db           | FADS/Goofy
Pile-up (DC1) & detector responses | Atlsim (Dice)          | OO-db    | FZ              | Atlsim/Dice
                                   |                        | OO-db    | OO-db           | Athena
Data conversion                    |                        | FZ       | OO-db           | Athena
Reconstruction                     |                        | OO-db    | OO-db, "Ntuple" | Athena
Analysis                           | Paw/Root, Anaphe/Jas   | "Ntuple" |                 |

"OO-db" stands for "OO database"; it could be Objectivity, ROOT I/O, ...
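Read as a pipeline, the table says that each step's input format must match the previous step's output. A minimal Python sketch of that consistency check along the Geant3 path (pile-up omitted); the step names and formats are from the table, the code itself is purely illustrative:

    # Each step declares the format it reads and the format it writes;
    # the chain is consistent if these match pairwise (Geant3 path, no pile-up).
    STEPS = [
        ("event generation (Pythia/Herwig/Isajet)", None,     "OO-db"),
        ("detector simulation (Geant3/Dice)",       "OO-db",  "FZ"),
        ("data conversion",                         "FZ",     "OO-db"),
        ("reconstruction",                          "OO-db",  "Ntuple"),
        ("analysis (Paw/Root, Anaphe/JAS)",         "Ntuple", None),
    ]

    for (name_a, _, out_a), (name_b, in_b, _) in zip(STEPS, STEPS[1:]):
        assert out_a == in_b, f"{name_a} -> {name_b}: {out_a} != {in_b}"
    print("production chain is consistent")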
DC0
[Dataflow diagram] Generators (Pythia, Isajet, Herwig) write HepMC (Obj., Root) which feeds ATLFAST (OO), producing Ntuple / Ntuple-like output (Obj., Root); GENZ/ZEBRA feeds G3/DICE (RD event? OO-DB?) into the ATHENA reconstruction, producing combined Ntuples (Obj., Root); Physics TDR data enter the same reconstruction path.
Missing:
• filter, trigger
• HepMC in Root
• ATLFAST output in Root (TObjects)
• link between MC truth and ATLFAST
• reconstruction output in Obj., Root
• EDM (e.g. G3/DICE input to ATHENA)
DC1
[Dataflow diagram] As for DC0, with MyGeneratorModule added to the generators (Pythia, Isajet, Herwig), HepMC (Obj., Root) feeding ATLFAST (OO) with Ntuple / Ntuple-like output (Obj., Root), and GENZ/ZEBRA feeding G3/DICE (RD event? OO-DB?) into the ATHENA reconstruction with combined Ntuples (Obj., Root); in addition, G4 output (Obj.) enters the reconstruction.
Missing:
• filter, trigger
• detector description
• HepMC in Root
• digitisation
• ATLFAST output in Root (TObjects)
• pile-up
• link between MC truth and ATLFAST
• reconstruction output in Obj., Root
• EDM (e.g. G3/DICE, G4 input to ATHENA)
DC0 G4 Robustness Test
Test plan: two kinds of tests:
• A 'large-N' generation with the ATLAS detector geometry
  – detailed geometry for the muon system (input from AMDB)
  – a crude geometry for the Inner Detector and the Calorimeter
• A 'large-N' generation with a test-beam geometry
  – TileCal test beam for electromagnetic interactions
Physics processes:
• Higgs → 4 muons (by Pythia) ← main target
• Minimum bias events ← if possible
DC0 G4 Robustness Test
Expected data size and CPU required (only for the ATLAS detector geometry):

                          per event   1,000 events
4-vectors database        ~ 50 KB     ~ 50 MB
Hits/hit-collections db   ~ 1.5 MB    ~ 1.5 GB
CPU                       ~ 60 sec    ~ 17 hours (Pentium III, 800 MHz)

[Note] Not the final numbers; they include a safety factor to reserve extra disk space.

Required resources (only for the ATLAS detector geometry):
• PC farm: ~10 CPUs (5 machines with dual processors)
• Disk space: ~155 GB
• Processing period: ~1 week
(A quick consistency check of these numbers follows.)
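These figures hang together, as a short Python check shows, assuming the ~100k-event sample quoted earlier for the robustness test:

    # Disk: ~100k events x (1.5 MB hits + 50 KB 4-vectors) per event.
    events = 100_000
    disk_gb = events * (1.5 + 0.05) / 1000
    # CPU: 60 s/event spread over 10 CPUs.
    days = events * 60 / 10 / 86400
    print(f"disk ~{disk_gb:.0f} GB, time ~{days:.1f} days")
    # -> disk ~155 GB, time ~6.9 days, i.e. about one week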
Data management
Data management is a key issue; the evaluation of more than one technology is part of DC1.
Infrastructure has to be put in place:
• For Objectivity & ROOT I/O: software, hardware and tools to manage the data (creation, replication, distribution, ...)
• Tools are needed to run the production: "bookkeeping", "cataloguing", "job submission", ...
• We intend to use GRID tools as much as possible (Magda for DC0); a sketch of the kind of record such tools must track follows.
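As a purely hypothetical illustration (the field names below are ours, not Magda's actual schema), a bookkeeping/cataloguing record for one produced dataset might look like:

    from dataclasses import dataclass, field

    @dataclass
    class DatasetEntry:
        dataset: str            # logical dataset name (illustrative)
        step: str               # generation / simulation / reconstruction ...
        technology: str         # "Objectivity" or "ROOT I/O"
        site: str               # producing site
        n_events: int
        replicas: list = field(default_factory=list)  # sites holding a copy

    entry = DatasetEntry("dc1.hlt.sample", "simulation",
                         "ROOT I/O", "CERN", 10_000, ["CERN", "BNL"])
    print(entry)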
DC1-HLT - CPU

               | Number of events | Time per event (SI95-sec) | Total time (SI95-sec) | Total time (SI95-hours)
simulation     | 10^7             | 3000                      | 3 x 10^10             | 10^7
reconstruction | 10^7             | 640                       | 6.4 x 10^9            | 2 x 10^6

Based on experience from the Physics TDR.
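The totals follow directly from the per-event costs; a one-liner check in Python:

    events = 1e7
    for step, si95_sec in (("simulation", 3000), ("reconstruction", 640)):
        total = events * si95_sec
        print(f"{step:15s} {total:.1e} SI95-sec = {total/3600:.1e} SI95-hours")
    # simulation      3.0e+10 SI95-sec = 8.3e+06 SI95-hours (~10^7 in the table)
    # reconstruction  6.4e+09 SI95-sec = 1.8e+06 SI95-hours (~2 x 10^6)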
DC1-HLT - data

               | Number of events | Event size (MB) | Total size (GB) | Total size (TB)
simulation     | 10^7             | 2               | 20000           | 20
reconstruction | 10^7             | 0.5             | 5000            | 5
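The volumes are simply the number of events times the event size:

    events = 1e7
    for step, mb in (("simulation", 2.0), ("reconstruction", 0.5)):
        print(f"{step:15s} {events * mb / 1e3:6.0f} GB = {events * mb / 1e6:3.0f} TB")
    # simulation       20000 GB =  20 TB
    # reconstruction    5000 GB =   5 TB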
DC1-HLT - data with pile-up

L (cm^-2 s^-1) | Number of events | Event size (MB)    | Total size (GB) | Total size (TB)
2 x 10^33      | 1.5 x 10^6       | (1) 2.6  (2) 4.7   | 4000 / 7000     | 4 / 7
10^34          | 1.5 x 10^6       | (1) 6.5  (2) 17.5  | 10000 / 26000   | 10 / 26

In addition to the 'simulated' data, assuming 'filtering' after simulation (~14% of the events kept).
(1) keeping only 'digits'
(2) keeping 'digits' and 'hits'
(A short check of these volumes follows.)
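The pile-up volumes follow from the ~1.5 x 10^6 kept events (about 14% of the 10^7 simulated) and the per-event sizes; a short Python check:

    kept = 1.5e6   # ~14% of the 10^7 simulated events survive the filter
    sizes_mb = {"2 x 10^33": (2.6, 4.7), "10^34": (6.5, 17.5)}
    for lumi, (digits, digits_hits) in sizes_mb.items():
        for label, mb in (("digits only", digits), ("digits+hits", digits_hits)):
            print(f"L = {lumi:9s} {label:12s} ~{kept * mb / 1e6:5.1f} TB")
    # 2 x 10^33: ~3.9 and ~7.1 TB; 10^34: ~9.8 and ~26.3 TB (rounded in the table)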
Ramp-up scenario @ CERN
[Chart: planned ramp-up of the number of CPUs at CERN, from 0 to ~400, versus week in 2002 (weeks 7 to 26).]
Some considerations (1):
• We consider that LCG is crucial for our success.
• We agree to have as many common projects as possible under the control of the project.
• We think that a high priority should be given to the development of the shared Tier0 & shared Tier1 centers.
• We are interested in "cross-grid" projects, obviously to avoid duplication of work.
• We consider the interoperability between the US and EU Grids as very important (Magda as a first use case).
Some considerations (2):
• We would like to set up a truly distributed production system (simulation, reconstruction, analysis) making use, already for DC1, of the GRID tools (especially those of EU-DataGrid Release 1).
• The organization of the operation of the infrastructure should be defined and put in place.
• We need a 'stable' environment during the data challenges and a clear picture of the available resources as soon as possible.
Some considerations (3):
• We consider that the discussion on the common persistence technology should start as soon as possible under the umbrella of the project.
• We think that other common items (e.g. dictionary languages, release tools, etc.) are worthwhile (not with the same priority), but we must ask what is desirable and what is necessary.
• We think that the plan for the simulation should be understood.