TRANSCRIPT
The ATLAS Grid Progress
Roger Jones
Lancaster University
GridPP CM
QMUL, 28 June 2006

RWL Jones, 28 June 2006, QMUL
ATLAS partial & “average” T1 Data Flow (2008)

[Diagram: data flows between the Tier-0 (CPU farm, disk buffer), an “average” Tier-1 (tape and disk storage), the other Tier-1s, and each associated Tier-2. The per-stream figures, grouped as far as they can be recovered from the diagram:]

Tier-0 → this Tier-1:
  RAW     1.6 GB/file   0.02 Hz    1.7K files/day   32 MB/s   2.7 TB/day   (to tape)
  ESD2    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AOD2    10 MB/file    0.2 Hz     17K files/day     2 MB/s   0.16 TB/day
  AODm2   500 MB/file   0.004 Hz   0.34K files/day   2 MB/s   0.16 TB/day
  Combined RAW + ESD2 + AODm2: 0.044 Hz, 3.74K files/day, 44 MB/s, 3.66 TB/day
Tier-0 total output (RAW, ESD ×2, AODm ×10): 1 Hz, 85K files/day, 720 MB/s
Exchange with the other Tier-1s:
  ESD2    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AODm2   500 MB/file   0.036 Hz   3.1K files/day   18 MB/s   1.44 TB/day
Reprocessed data:
  ESD1    0.5 GB/file   0.02 Hz    1.7K files/day   10 MB/s   0.8 TB/day
  AODm1   500 MB/file   0.04 Hz    3.4K files/day   20 MB/s   1.6 TB/day
Tier-1 → each Tier-2:
  AODm2   500 MB/file   0.04 Hz    3.4K files/day   20 MB/s   1.6 TB/day

Plus simulation and analysis data flow
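The per-stream figures above are internally consistent: the event-file rate in Hz and the file size determine the files/day, MB/s and TB/day columns. A small sanity-check sketch (a toy helper, not ATLAS code; decimal units, 1 TB = 10^6 MB, are assumed):

```python
# Derive daily file count and bandwidth from a file rate and file size,
# as a check of the slide's figures. Illustrative helper, not ATLAS software.

def stream_rates(file_size_mb, file_rate_hz):
    """Return (files/day, MB/s, TB/day) for one data stream."""
    files_per_day = file_rate_hz * 86400          # seconds per day
    mb_per_s = file_size_mb * file_rate_hz
    tb_per_day = mb_per_s * 86400 / 1e6           # decimal units assumed
    return files_per_day, mb_per_s, tb_per_day

# RAW: 1.6 GB/file at 0.02 Hz -> ~1.7K files/day, 32 MB/s, ~2.7 TB/day
files, mbs, tbd = stream_rates(1600, 0.02)
```

The same helper reproduces the other rows, e.g. ESD2 at 0.5 GB/file and 0.02 Hz gives 10 MB/s.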
Computing System Commissioning

• ATLAS developments are all driven by the Computing System Commissioning (CSC)
  • Runs from June 2006 to ~March 2007
  • Not monolithic: many components
  • Careful scheduling of interrelated components is needed – workshop next week for package leaders
• Begins with Tier-0/Tier-1/(some) Tier-2s
  • Exercises the data handling and transfer systems
• Lesson from the previous round of experiments at CERN (LEP, 1989–2000):
  • Reviews in 1988 underestimated the computing requirements by an order of magnitude!
CSC items

• Full Software Chain
• Tier-0 Scaling
• Streaming tests
• Calibration & Alignment
• High-Level Trigger
• Distributed Data Management
• Distributed Production
• Physics Analysis
ATLAS Distributed Data Management

• ATLAS reviewed all of its own Grid distributed systems (data management, production, analysis) during the first half of 2005
• Data Management is key
• A new Distributed Data Management system (DDM) was designed, based on:
  • A hierarchical definition of datasets
  • Central dataset catalogues and distributed file catalogues
  • Data blocks as units of file storage and replication
  • Automatic data transfer mechanisms using distributed services (dataset subscription system)
• The DDM system supports the basic data tasks:
  • Distribution of raw and reconstructed data from CERN to the Tier-1s
  • Distribution of AODs (Analysis Object Data) to Tier-2 centres for analysis
  • Storage of simulated data (produced by Tier-2s) at Tier-1 centres for further distribution and/or processing
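The subscription idea above can be sketched as a toy model: a central catalogue maps datasets to files, sites subscribe to datasets, and newly registered files automatically generate transfer requests to every subscriber. All class and method names here are illustrative, not the real DQ2 API:

```python
# Toy sketch of the DDM dataset-subscription mechanism described above.
# Not the real DQ2 interface; names and structures are made up for clarity.

class DatasetCatalogue:
    def __init__(self):
        self.datasets = {}        # dataset name -> list of file identifiers
        self.subscriptions = {}   # dataset name -> set of subscribed sites

    def subscribe(self, dataset, site):
        """Record that a site wants every file added to this dataset."""
        self.subscriptions.setdefault(dataset, set()).add(site)

    def add_files(self, dataset, files):
        """Register new files and return the transfers they trigger."""
        self.datasets.setdefault(dataset, []).extend(files)
        return [(f, site)
                for site in self.subscriptions.get(dataset, set())
                for f in files]

cat = DatasetCatalogue()
cat.subscribe("csc.AOD.v1", "RAL")                       # hypothetical names
transfers = cat.add_files("csc.AOD.v1", ["file1", "file2"])
# each new file now has a pending transfer to every subscribed site
```

This captures why subscriptions make replication automatic: producers only register files; the movement to Tier-1s/Tier-2s follows from the standing subscriptions.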
ATLAS DDM Organization
Central vs Local Services

• The DDM system has a central role with respect to ATLAS Grid tools
  • Its slow roll-out on LCG is causing problems for other components
• Predicated on distributed file catalogues and auxiliary services
  • We do not ask every single Grid centre to install ATLAS services
  • Instead, we decided to install “local” catalogues and services at Tier-1 centres
  • We then defined “regions”, each consisting of a Tier-1 and all other Grid computing centres that:
    • are well (network-)connected to this Tier-1
    • depend on this Tier-1 for ATLAS services (including the file catalogue)
• CSC will establish whether this scales to the needs of the LHC data-taking era:
  • Moving several tens of thousands of files per day
  • Supporting up to 100,000 organized production jobs per day
  • Supporting the analysis work of >1000 active ATLAS physicists
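The region concept reduces every service-location question to one lookup: which Tier-1 does a given site depend on? A minimal sketch, with an illustrative (not official) region table:

```python
# Minimal sketch of the "regions" described above: each Grid site resolves to
# the Tier-1 hosting its catalogue and services. Site names are examples only.

REGIONS = {
    "RAL": ["Lancaster", "QMUL", "Glasgow"],   # a hypothetical UK cloud
    "CNAF": ["Milano", "Roma"],                # a hypothetical Italian cloud
}

def services_tier1(site):
    """Return the Tier-1 whose local catalogue/services this site uses."""
    for tier1, members in REGIONS.items():
        if site == tier1 or site in members:
            return tier1
    raise KeyError(f"site {site!r} is not assigned to any region")
```

A Tier-2 never runs its own catalogue; `services_tier1("QMUL")` simply points its tools at the RAL instances.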
ATLAS Data Management Model

• In practice, it turns out to be convenient (and more robust) to partition the Grid so that there are default (not compulsory) Tier-1↔Tier-2 paths
  • FTS channels are installed for these data paths for production use
  • All other data transfers go through normal network routes
• In this model, a number of data management services are installed only at Tier-1s and act also on their “associated” Tier-2s:
  • VO Box
  • FTS channel server (both directions)
  • Local file catalogue (part of DDM/DQ2)
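The routing rule above is simple enough to state in a few lines: a transfer between a Tier-1 and one of its associated Tier-2s (in either direction) uses the dedicated FTS channel; anything else takes normal network routes. The association table here is hypothetical:

```python
# Sketch of the default-path rule described above. The Tier-1/Tier-2
# associations are illustrative examples, not the real channel configuration.

ASSOCIATED = {("RAL", "Lancaster"), ("RAL", "QMUL")}  # (Tier-1, Tier-2) pairs

def transfer_route(src, dst):
    """Pick the FTS channel for default Tier-1<->Tier-2 paths, else plain network."""
    if (src, dst) in ASSOCIATED or (dst, src) in ASSOCIATED:
        return "fts-channel"
    return "network"
```

Since the channels exist in both directions, the check is symmetric; a direct Tier-2 to Tier-2 copy falls through to the generic network path.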
Tiers of ATLAS

[Diagram: the T0 and each T1 run an LFC, an FTS server and a VO box; the T2s attach to their T1’s services. LFC: local within the ‘cloud’. All SEs are SRM.]
Job Management: Productions

• Next step: rework the distributed production system to optimise job distribution by sending jobs to the data (or as close as possible to them)
  • This was not the case previously: jobs were sent to free CPUs and had to copy the input file(s) to the local worker node from wherever in the world the data happened to be
• Make better use of the task and dataset concepts
  • A “task” acts on a dataset and produces more datasets
  • Use bulk submission functionality to send all jobs of a given task to the location of their input datasets
  • Minimise file transfers and waiting time before execution
  • Collect output files from the same dataset on the same SE and transfer them asynchronously to their final locations
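The "send jobs to the data" rework can be sketched as a placement plan: group a task's jobs by the site that already holds their input dataset, so each group can be bulk-submitted there. Dataset and site names below are made up:

```python
# Sketch of job-to-data placement as described above: instead of shipping
# inputs to free CPUs, bundle jobs by the site hosting their input dataset.
# Illustrative only; not the real ATLAS production-system code.

from collections import defaultdict

def bulk_submit_plan(jobs, dataset_location):
    """Map each site to the list of jobs whose input dataset it hosts."""
    plan = defaultdict(list)
    for job, dataset in jobs:
        plan[dataset_location[dataset]].append(job)
    return dict(plan)

jobs = [("job1", "dsA"), ("job2", "dsA"), ("job3", "dsB")]   # hypothetical
where = {"dsA": "RAL", "dsB": "CNAF"}                        # hypothetical
plan = bulk_submit_plan(jobs, where)
```

All jobs of a task sharing an input dataset land in one group, which is what makes bulk submission to that site possible and keeps wide-area file copies to a minimum.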
Job Management: Analysis

• A central job queue is good for scheduled productions (priority settings), but too heavy for user analysis
• Interim tools have been developed to submit Grid jobs on specific deployments and with limited data management:
  • LJSF for the LCG/EGEE Grid
  • pathena, which generates ATLAS jobs that act on a dataset and submits them to PanDA on the OSG Grid
• The baseline tool to help users submit Grid jobs is Ganga:
  • Job splitting and bookkeeping
  • Several submission possibilities
  • Collection of output files
  • Now becoming useful as DDM is populated
  • Rapid progress after user feedback; rich features
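The job splitting that Ganga provides can be illustrated with a toy splitter: divide a dataset's file list into subjobs of at most n files each (this is not Ganga's real API, just the idea):

```python
# Toy version of dataset-driven job splitting as provided by Ganga.
# Illustrative only; Ganga's actual splitter interface differs.

def split_dataset(files, files_per_subjob):
    """Return a list of subjobs, each holding a chunk of the input files."""
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]
```

Each chunk becomes one Grid job over its slice of the dataset; the bookkeeping layer then tracks the subjobs and collects their outputs.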
ATLAS Analysis Work Model

Job preparation (local system, shell):
  Prepare JobOptions → run Athena (interactive or batch) → get output

Medium-scale (on-demand) running and testing (local system, Ganga):
  Prepare JobOptions → find dataset from DDM → generate & submit jobs → run Athena on the Grid → job bookkeeping → access output from the Grid → merge results

Large-scale (scheduled) running (local system, Ganga):
  Prepare JobOptions → find dataset from DDM → generate & submit jobs → run Athena on the Grid via ProdSys → store output on the Grid → job bookkeeping → get output
Analysis Jobs at Tier-2s

• Analysis jobs must run where the input data files are
• Most analysis jobs will take AODs as input for complex calculations and event selections
  • Most will likely output Athena-Aware Ntuples (AANs, to be stored on a nearby SE) and histograms (to be sent back to the user)
• People will develop their analyses on reduced samples many, many times before launching runs on a complete dataset
  • There will be a large number of failures due to people’s code!
• We are exploring a priority system that separates centrally organised productions from analysis tasks
ATLAS requirements

• General production
  • Organized production
  • Share defined by the management
• Group production
  • Organized production
  • About 24 groups identified
  • Share defined by the management
• General users
  • Chaotic use pattern
  • Fair share between users
• An analysis service is to be deployed over the summer
  • Various approaches to prioritisation (VOViews, gpbox, queues) to be explored
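The share model above (managed shares for the two production classes, a fair share for users) can be sketched with a simple deficit-based scheduler: at each step, run the activity furthest below its target share. The percentages are made up for illustration:

```python
# Sketch of fair-share prioritisation across the three activity classes
# listed above. The shares and the algorithm are illustrative, not any of
# the actual candidate systems (VOViews, gpbox, queue-based schemes).

def pick_next(shares, usage):
    """Pick the activity whose used fraction is furthest below its share."""
    total = sum(usage.values()) or 1          # avoid division by zero
    return min(shares, key=lambda a: usage.get(a, 0) / total - shares[a])

shares = {"general-production": 0.5, "group-production": 0.3, "users": 0.2}
```

With 50 production slots and 30 group slots used but no user jobs run yet, the users' deficit is largest, so user analysis goes next; this is the behaviour a "fair share between users" policy needs on top of the managed production shares.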
Conditions data model

• All non-event data for simulation, reconstruction and analysis
  • Calibration/alignment data, plus DCS (slow controls) data, subdetector and trigger configuration, monitoring, …
• Several technologies are employed:
  • Relational databases: COOL for Intervals of Validity and some payload data, plus other relational database tables referenced by COOL
    • COOL databases in Oracle, MySQL, or SQLite file-based databases
    • Accessed via the ‘CORAL’ software (a common, backend-independent database layer), independent of the underlying database
    • Mixing technologies is part of the database distribution strategy
  • File-based data (persistified calibration objects), stored in files and indexed/referenced by COOL
    • File-based data will be organised into datasets and handled using DDM (the same system as used for event data)
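The interval-of-validity idea underlying COOL can be sketched as a toy folder: each payload is valid for a [since, until) time range, and a lookup returns the payload covering a given event time. This is a model of the concept, not the COOL API:

```python
# Toy interval-of-validity store in the spirit of COOL folders.
# Illustrative only; the real COOL interface and schema differ.

import bisect

class IovFolder:
    def __init__(self):
        self._since = []    # sorted interval start times
        self._entries = []  # parallel list of (until, payload)

    def store(self, since, until, payload):
        """Insert a payload valid for the half-open range [since, until)."""
        i = bisect.bisect(self._since, since)
        self._since.insert(i, since)
        self._entries.insert(i, (until, payload))

    def lookup(self, t):
        """Return the payload valid at time t, assuming non-overlapping IoVs."""
        i = bisect.bisect_right(self._since, t) - 1
        if i >= 0 and t < self._entries[i][0]:
            return self._entries[i][1]
        raise KeyError(f"no conditions valid at time {t}")
```

Reconstruction then asks the folder for the calibration valid at each event's timestamp, which is exactly the access pattern that makes COOL replication to Tier-1/2 sites necessary.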
Calibration data challenge

• So far, ATLAS Tier-2s have only done simulation/reconstruction
  • Static replicas of conditions data in SQLite files, or preloaded MySQL replicas – the conditions data are known in advance
• The ATLAS calibration data challenge (late 2006) will change this
  • Reconstruct misaligned/miscalibrated data, derive calibrations, re-reconstruct and iterate – as close as possible to real data
  • Will require ‘live’ replication of new data out to Tier-1/2 centres
• Technologies to be used at Tier-2s:
  • Will need COOL replication, either via local MySQL replicas or via Frontier
  • Currently just starting ATLAS tests of Frontier – need experience
  • Decision in a few months on what to use for the calibration data challenge
  • Will definitely need DDM replication of new conditions datasets (sites subscribe to evolving datasets)
  • External sites will submit updates as COOL SQLite files, to be merged into the central CERN Oracle databases
Conclusions

• We are trying not to impose any particular load on Tier-2 managers, by running distributed services at Tier-1s
  • Although this concept breaks the symmetry and forces us to set up default Tier-1–Tier-2 associations
• All that is required of Tier-2s is to set up the Grid environment
  • Including whichever job-queue priority scheme is found most useful
  • And SRM Storage Elements with (when available) a correct implementation of the space reservation and accounting system