TRANSCRIPT
The ATLAS Grid Progress
Roger Jones
Lancaster University
GridPP CM
QMUL, 28 June 2006

RWL Jones, 28 June 2006, QMUL
ATLAS partial & “average” T1 Data Flow (2008)

[Diagram: data flows between the Tier-0 (CPU farm, disk buffer), an “average” Tier-1 (tape and disk storage), the other Tier-1s, and each associated Tier-2. The per-stream figures, grouped as far as they can be recovered from the diagram:]

Tier-0 → this Tier-1:
  RAW     1.6 GB/file   0.02 Hz    1.7K files/day   32 MB/s   2.7 TB/day   (to tape)
  ESD2    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AOD2    10 MB/file    0.2 Hz     17K files/day     2 MB/s   0.16 TB/day
  AODm2   500 MB/file   0.004 Hz   0.34K files/day   2 MB/s   0.16 TB/day
  Combined RAW + ESD2 + AODm2: 0.044 Hz, 3.74K files/day, 44 MB/s, 3.66 TB/day
Tier-0 total output (RAW, ESD ×2, AODm ×10): 1 Hz, 85K files/day, 720 MB/s
Exchange with the other Tier-1s:
  ESD2    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AODm2   500 MB/file   0.036 Hz   3.1K files/day   18 MB/s   1.44 TB/day
Reprocessed data:
  ESD1    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AODm1   500 MB/file   0.04 Hz    3.4K files/day   20 MB/s   1.6 TB/day
Tier-1 → each Tier-2:
  AODm2   500 MB/file   0.04 Hz    3.4K files/day   20 MB/s   1.6 TB/day

Plus simulation and analysis data flow
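The per-stream figures above are internally consistent: the event-file rate in Hz and the file size determine the files/day, MB/s and TB/day columns. A small sanity-check sketch (a toy helper, not ATLAS code; decimal units, 1 TB = 10^6 MB, are assumed):

```python
# Derive daily file count and bandwidth from a file rate and file size,
# as a check of the slide's figures. Illustrative helper, not ATLAS software.

def stream_rates(file_size_mb, file_rate_hz):
    """Return (files/day, MB/s, TB/day) for one data stream."""
    files_per_day = file_rate_hz * 86400          # seconds per day
    mb_per_s = file_size_mb * file_rate_hz
    tb_per_day = mb_per_s * 86400 / 1e6           # decimal units assumed
    return files_per_day, mb_per_s, tb_per_day

# RAW: 1.6 GB/file at 0.02 Hz -> ~1.7K files/day, 32 MB/s, ~2.7 TB/day
files, mbs, tbd = stream_rates(1600, 0.02)
```

The same helper reproduces the other rows, e.g. ESD2 at 0.5 GB/file and 0.02 Hz gives 10 MB/s.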
Computing System Commissioning

• ATLAS developments are all driven by the Computing System Commissioning (CSC)
  • Runs from June 2006 to ~March 2007
  • Not monolithic: many components
  • Careful scheduling of interrelated components is needed – workshop next week for package leaders
• Begins with Tier-0/Tier-1/(some) Tier-2s
  • Exercises the data handling and transfer systems
• Lesson from the previous round of experiments at CERN (LEP, 1989–2000):
  • Reviews in 1988 underestimated the computing requirements by an order of magnitude!
CSC items

• Full Software Chain
• Tier-0 Scaling
• Streaming tests
• Calibration & Alignment
• High-Level Trigger
• Distributed Data Management
• Distributed Production
• Physics Analysis
ATLAS Distributed Data Management

• ATLAS reviewed all of its own Grid distributed systems (data management, production, analysis) during the first half of 2005
• Data Management is key
• A new Distributed Data Management system (DDM) was designed, based on:
  • A hierarchical definition of datasets
  • Central dataset catalogues and distributed file catalogues
  • Data blocks as units of file storage and replication
  • Automatic data transfer mechanisms using distributed services (dataset subscription system)
• The DDM system supports the basic data tasks:
  • Distribution of raw and reconstructed data from CERN to the Tier-1s
  • Distribution of AODs (Analysis Object Data) to Tier-2 centres for analysis
  • Storage of simulated data (produced by Tier-2s) at Tier-1 centres for further distribution and/or processing
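The subscription idea above can be sketched as a toy model: a central catalogue maps datasets to files, sites subscribe to datasets, and newly registered files automatically generate transfer requests to every subscriber. All class and method names here are illustrative, not the real DQ2 API:

```python
# Toy sketch of the DDM dataset-subscription mechanism described above.
# Not the real DQ2 interface; names and structures are made up for clarity.

class DatasetCatalogue:
    def __init__(self):
        self.datasets = {}        # dataset name -> list of file identifiers
        self.subscriptions = {}   # dataset name -> set of subscribed sites

    def subscribe(self, dataset, site):
        """Record that a site wants every file added to this dataset."""
        self.subscriptions.setdefault(dataset, set()).add(site)

    def add_files(self, dataset, files):
        """Register new files and return the transfers they trigger."""
        self.datasets.setdefault(dataset, []).extend(files)
        return [(f, site)
                for site in self.subscriptions.get(dataset, set())
                for f in files]

cat = DatasetCatalogue()
cat.subscribe("csc.AOD.v1", "RAL")                       # hypothetical names
transfers = cat.add_files("csc.AOD.v1", ["file1", "file2"])
# each new file now has a pending transfer to every subscribed site
```

This captures why subscriptions make replication automatic: producers only register files; the movement to Tier-1s/Tier-2s follows from the standing subscriptions.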
ATLAS DDM Organization
Central vs Local Services

• The DDM system has a central role with respect to ATLAS Grid tools
  • Its slow roll-out on LCG is causing problems for other components
• Predicated on distributed file catalogues and auxiliary services
  • We do not ask every single Grid centre to install ATLAS services
  • Instead, we decided to install “local” catalogues and services at Tier-1 centres
  • We then defined “regions”, each consisting of a Tier-1 and all other Grid computing centres that:
    • are well (network-)connected to this Tier-1
    • depend on this Tier-1 for ATLAS services (including the file catalogue)
• CSC will establish whether this scales to the needs of the LHC data-taking era:
  • Moving several tens of thousands of files per day
  • Supporting up to 100,000 organized production jobs per day
  • Supporting the analysis work of >1000 active ATLAS physicists
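The region concept reduces every service-location question to one lookup: which Tier-1 does a given site depend on? A minimal sketch, with an illustrative (not official) region table:

```python
# Minimal sketch of the "regions" described above: each Grid site resolves to
# the Tier-1 hosting its catalogue and services. Site names are examples only.

REGIONS = {
    "RAL": ["Lancaster", "QMUL", "Glasgow"],   # a hypothetical UK cloud
    "CNAF": ["Milano", "Roma"],                # a hypothetical Italian cloud
}

def services_tier1(site):
    """Return the Tier-1 whose local catalogue/services this site uses."""
    for tier1, members in REGIONS.items():
        if site == tier1 or site in members:
            return tier1
    raise KeyError(f"site {site!r} is not assigned to any region")
```

A Tier-2 never runs its own catalogue; `services_tier1("QMUL")` simply points its tools at the RAL instances.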
ATLAS Data Management Model

• In practice, it turns out to be convenient (and more robust) to partition the Grid so that there are default (not compulsory) Tier-1↔Tier-2 paths
  • FTS channels are installed for these data paths for production use
  • All other data transfers go through normal network routes
• In this model, a number of data management services are installed only at Tier-1s and act also on their “associated” Tier-2s:
  • VO Box
  • FTS channel server (both directions)
  • Local file catalogue (part of DDM/DQ2)
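The routing rule above is simple enough to state in a few lines: a transfer between a Tier-1 and one of its associated Tier-2s (in either direction) uses the dedicated FTS channel; anything else takes normal network routes. The association table here is hypothetical:

```python
# Sketch of the default-path rule described above. The Tier-1/Tier-2
# associations are illustrative examples, not the real channel configuration.

ASSOCIATED = {("RAL", "Lancaster"), ("RAL", "QMUL")}  # (Tier-1, Tier-2) pairs

def transfer_route(src, dst):
    """Pick the FTS channel for default Tier-1<->Tier-2 paths, else plain network."""
    if (src, dst) in ASSOCIATED or (dst, src) in ASSOCIATED:
        return "fts-channel"
    return "network"
```

Since the channels exist in both directions, the check is symmetric; a direct Tier-2 to Tier-2 copy falls through to the generic network path.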
Tiers of ATLAS

[Diagram: the T0 and each T1 run an LFC, an FTS server and a VO box; the T2s attach to their T1’s services. LFC: local within the ‘cloud’. All SEs are SRM.]
Job Management: Productions

• Next step: rework the distributed production system to optimise job distribution by sending jobs to the data (or as close as possible to them)
  • This was not the case previously: jobs were sent to free CPUs and had to copy the input file(s) to the local worker node from wherever in the world the data happened to be
• Make better use of the task and dataset concepts
  • A “task” acts on a dataset and produces more datasets
  • Use bulk submission functionality to send all jobs of a given task to the location of their input datasets
  • Minimise file transfers and waiting time before execution
  • Collect output files from the same dataset on the same SE and transfer them asynchronously to their final locations
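The "send jobs to the data" rework can be sketched as a placement plan: group a task's jobs by the site that already holds their input dataset, so each group can be bulk-submitted there. Dataset and site names below are made up:

```python
# Sketch of job-to-data placement as described above: instead of shipping
# inputs to free CPUs, bundle jobs by the site hosting their input dataset.
# Illustrative only; not the real ATLAS production-system code.

from collections import defaultdict

def bulk_submit_plan(jobs, dataset_location):
    """Map each site to the list of jobs whose input dataset it hosts."""
    plan = defaultdict(list)
    for job, dataset in jobs:
        plan[dataset_location[dataset]].append(job)
    return dict(plan)

jobs = [("job1", "dsA"), ("job2", "dsA"), ("job3", "dsB")]   # hypothetical
where = {"dsA": "RAL", "dsB": "CNAF"}                        # hypothetical
plan = bulk_submit_plan(jobs, where)
```

All jobs of a task sharing an input dataset land in one group, which is what makes bulk submission to that site possible and keeps wide-area file copies to a minimum.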
Job Management: Analysis

• A central job queue is good for scheduled productions (priority settings), but too heavy for user analysis
• Interim tools have been developed to submit Grid jobs on specific deployments and with limited data management:
  • LJSF for the LCG/EGEE Grid
  • pathena, which generates ATLAS jobs that act on a dataset and submits them to PanDA on the OSG Grid
• The baseline tool to help users submit Grid jobs is Ganga:
  • Job splitting and bookkeeping
  • Several submission possibilities
  • Collection of output files
  • Now becoming useful as DDM is populated
  • Rapid progress after user feedback; rich features
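The job splitting that Ganga provides can be illustrated with a toy splitter: divide a dataset's file list into subjobs of at most n files each (this is not Ganga's real API, just the idea):

```python
# Toy version of dataset-driven job splitting as provided by Ganga.
# Illustrative only; Ganga's actual splitter interface differs.

def split_dataset(files, files_per_subjob):
    """Return a list of subjobs, each holding a chunk of the input files."""
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]
```

Each chunk becomes one Grid job over its slice of the dataset; the bookkeeping layer then tracks the subjobs and collects their outputs.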
ATLAS Analysis Work Model

Job preparation (local system, shell):
  Prepare JobOptions → run Athena (interactive or batch) → get output

Medium-scale (on-demand) running and testing (local system, Ganga):
  Prepare JobOptions → find dataset from DDM → generate & submit jobs → run Athena on the Grid → job bookkeeping → access output from the Grid → merge results

Large-scale (scheduled) running (local system, Ganga):
  Prepare JobOptions → find dataset from DDM → generate & submit jobs → run Athena on the Grid via ProdSys → store output on the Grid → job bookkeeping → get output
Analysis Jobs at Tier-2s

• Analysis jobs must run where the input data files are
• Most analysis jobs will take AODs as input for complex calculations and event selections
  • Most will likely output Athena-Aware Ntuples (AANs, to be stored on a nearby SE) and histograms (to be sent back to the user)
• People will develop their analyses on reduced samples many, many times before launching runs on a complete dataset
  • There will be a large number of failures due to people’s code!
• We are exploring a priority system that separates centrally organised productions from analysis tasks
ATLAS requirements

• General production
  • Organized production
  • Share defined by the management
• Group production
  • Organized production
  • About 24 groups identified
  • Share defined by the management
• General users
  • Chaotic use pattern
  • Fair share between users
• An analysis service is to be deployed over the summer
  • Various approaches to prioritisation (VOViews, gpbox, queues) to be explored
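The share model above (managed shares for the two production classes, a fair share for users) can be sketched with a simple deficit-based scheduler: at each step, run the activity furthest below its target share. The percentages are made up for illustration:

```python
# Sketch of fair-share prioritisation across the three activity classes
# listed above. The shares and the algorithm are illustrative, not any of
# the actual candidate systems (VOViews, gpbox, queue-based schemes).

def pick_next(shares, usage):
    """Pick the activity whose used fraction is furthest below its share."""
    total = sum(usage.values()) or 1          # avoid division by zero
    return min(shares, key=lambda a: usage.get(a, 0) / total - shares[a])

shares = {"general-production": 0.5, "group-production": 0.3, "users": 0.2}
```

With 50 production slots and 30 group slots used but no user jobs run yet, the users' deficit is largest, so user analysis goes next; this is the behaviour a "fair share between users" policy needs on top of the managed production shares.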
Conditions data model

• All non-event data for simulation, reconstruction and analysis
  • Calibration/alignment data, plus DCS (slow controls) data, subdetector and trigger configuration, monitoring, …
• Several technologies are employed:
  • Relational databases: COOL for Intervals of Validity and some payload data, plus other relational database tables referenced by COOL
    • COOL databases in Oracle, MySQL, or SQLite file-based databases
    • Accessed via the ‘CORAL’ software (a common, backend-independent database layer), independent of the underlying database
    • Mixing technologies is part of the database distribution strategy
  • File-based data (persistified calibration objects), stored in files and indexed/referenced by COOL
    • File-based data will be organised into datasets and handled using DDM (the same system as used for event data)
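The interval-of-validity idea underlying COOL can be sketched as a toy folder: each payload is valid for a [since, until) time range, and a lookup returns the payload covering a given event time. This is a model of the concept, not the COOL API:

```python
# Toy interval-of-validity store in the spirit of COOL folders.
# Illustrative only; the real COOL interface and schema differ.

import bisect

class IovFolder:
    def __init__(self):
        self._since = []    # sorted interval start times
        self._entries = []  # parallel list of (until, payload)

    def store(self, since, until, payload):
        """Insert a payload valid for the half-open range [since, until)."""
        i = bisect.bisect(self._since, since)
        self._since.insert(i, since)
        self._entries.insert(i, (until, payload))

    def lookup(self, t):
        """Return the payload valid at time t, assuming non-overlapping IoVs."""
        i = bisect.bisect_right(self._since, t) - 1
        if i >= 0 and t < self._entries[i][0]:
            return self._entries[i][1]
        raise KeyError(f"no conditions valid at time {t}")
```

Reconstruction then asks the folder for the calibration valid at each event's timestamp, which is exactly the access pattern that makes COOL replication to Tier-1/2 sites necessary.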
Calibration data challenge

• So far, ATLAS Tier-2s have only done simulation/reconstruction
  • Static replicas of conditions data in SQLite files, or preloaded MySQL replicas – the conditions data are known in advance
• The ATLAS calibration data challenge (late 2006) will change this
  • Reconstruct misaligned/miscalibrated data, derive calibrations, re-reconstruct and iterate – as close as possible to real data
  • Will require ‘live’ replication of new data out to Tier-1/2 centres
• Technologies to be used at Tier-2s:
  • Will need COOL replication, either via local MySQL replicas or via Frontier
  • Currently just starting ATLAS tests of Frontier – need experience
  • Decision in a few months on what to use for the calibration data challenge
  • Will definitely need DDM replication of new conditions datasets (sites subscribe to evolving datasets)
  • External sites will submit updates as COOL SQLite files, to be merged into the central CERN Oracle databases
Conclusions

• We are trying not to impose any particular load on Tier-2 managers, by running distributed services at Tier-1s
  • Although this concept breaks the symmetry and forces us to set up default Tier-1–Tier-2 associations
• All that is required of Tier-2s is to set up the Grid environment
  • Including whichever job-queue priority scheme is found most useful
  • And SRM Storage Elements with (when available) a correct implementation of the space reservation and accounting system