ATLAS, eScience and the Grid
Birmingham, 9th June 2004
RWL Jones, Lancaster University


Page 1: ATLAS, eScience and the Grid

Birmingham, 9th June 2004
RWL Jones, Lancaster University

Page 2: Overview

What is eScience, what are Grids?
Why does ATLAS need them?
What deployments exist?
What is the ATLAS Computing Model?
How will ATLAS test this?
Conclusions

Page 3: What is eScience?

Electronic Science? For particle physics, eScience mainly means Grids…

Science on `E'? (Maybe!)

"Enhanced" Science – John Taylor. In practice, anything involving HPC and/or high-speed networking, but really anything that can only be done with modern computing!

Cynical view: it has been a useful way to get funding from Governments etc.!

GridPP had £17.5M for LCG, hardware (£3.5M), middleware, applications

GridPP2 has £14M for more hardware, deployment, applications

Page 4: The Grid

Note: truly HPC, but requires more. Not designed for tightly-coupled problems, but there are many spin-offs.

Page 5: Grids – 3 Different Kinds

Computational Grid: lots of fast processors spread over a large physical area, interlinked by fast networks. Effectively a huge multiprocessor computer. Shared memory is more difficult, but do-able.

Data Grid: lots of databases linked by fast networks. Needs effective access to mass stores, and database query tools that span different sites and different database systems. Examples: Sloan Sky Survey, social sciences.

Sensor or Control Grid: wide-area sensor networks or remote control, connected by fast networks. Examples: flood-plain monitoring, accelerator control rooms.

ATLAS needs a hybrid of the first two

Page 6:

[Hype-cycle chart: hype vs. time, from the technology Trigger through the Peak of Inflated Expectations and the Trough of Disillusionment, up the Slope of Enlightenment to the Plateau of Productivity.]

Page 7: The ATLAS Data

ATLAS: not one experiment! A facility for many different measurements and physics topics.

Event selection:
1 GHz pp collision rate
40 MHz bunch-crossing rate
200 Hz event rate to mass storage
Real-time selection on leptons and jets

Page 8: The ATLAS Computing Challenge

Running conditions at startup (2007):
Average luminosity (10^33 cm^-2 s^-1): 1
Trigger rate (Hz): 160
Physics rate (Hz): 140
Running (equivalent days): 50
Physics events (10^9): 0.8

CPU: ~14.5M SpecInt2k, including analysis

0.8x10^9 event sample: 1.3 PB/year, before data processing

"Reconstructed" events, Monte Carlo data: ~10 PB/year (~3 PB on disk)

CERN alone can handle only a fraction of these resources
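As a rough cross-check of the event sample and data volume quoted above, a back-of-envelope sketch; the ~1.6 MB raw event size is an assumption for illustration, not a number from this slide.

```python
# Back-of-envelope check of the raw data volume quoted above.
# The ~1.6 MB/event raw size is an assumption for illustration.
events_per_year = 0.8e9          # physics events to mass storage per year
raw_event_size_mb = 1.6          # assumed raw event size in MB

raw_pb_per_year = events_per_year * raw_event_size_mb / 1e9  # MB -> PB
print(f"Raw data: ~{raw_pb_per_year:.1f} PB/year")            # ~1.3 PB/year
```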

Page 9: The System

[Tier hierarchy diagram: the Event Builder (~PB/sec off the detector) feeds the Event Filter (~7 MSI2k) at >10 GB/sec; the Tier 0 centre (~5 MSI2k, ~9 PB/year, no simulation) sends ~300 MB/s per Tier 1 per experiment to Regional Centres (RAL for the UK, plus US, French and Asian centres); Tier 2 centres of ~200 kSI2k each (e.g. a Northern Tier of Sheffield, Manchester, Liverpool and Lancaster, ~0.25 TIPS) connect at 622 Mb/s and serve physics data caches and workstations at 100-1000 MB/s. Some data for calibration and monitoring goes to institutes at 450 Mb/s, and calibrations flow back. PC (2004) = ~1 kSpecInt2k.]

Each Tier 2 has ~25 physicists working on one or more channels
Each Tier 2 should have the full AOD, TAG & relevant Physics Group summary data
Tier 2s do the bulk of simulation
N Tier 1s each store 1/N of the raw data, reprocess it & archive the ESD, hold 2/N of the current ESD for scheduled analysis & all AOD+TAG

Page 10: Complexity of the Problem

ATLAS is a worldwide collaboration, and so we span most Grid projects. We benefit from all developments, but we have problems maintaining coherence.

It is almost certain we will ultimately be working with several Grids (with defined interfaces). This may not be what funders like the EU want to hear!

Page 11: The ATLAS Components

Grid Projects: develop the middleware, provide hardware resources and some manpower, but also drain resources from our core activities.

Computing Model: a dedicated group to develop the computing model. A revised resources and planning paper is evolving (September 2004). Now examined from DAQ to end-user; must include university/local resources; devise various scenarios with different distributions of data.

Data Challenges: test the computing model; service other needs in ATLAS (but this must be secondary in DC2).

Page 12: Grid Projects

[Logos of the many Grid projects ATLAS spans: EGEE, etc.]

Until these groups provide interoperability, the experiments must provide it themselves.

Page 13: Deployments

Whichever deployment you have, you need:

Hardware to run things on

Middleware to glue it together:
a scheduler
a database of known files
an information system for available resources
authentication and authorisation
file replication
a resource broker (maybe)

Front ends to hide complexity from the users
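To illustrate how these middleware pieces fit together from a user's point of view, here is a purely schematic sketch; every class and function name below is invented and does not correspond to any real Grid middleware API.

```python
# Schematic job lifecycle over the middleware pieces listed above.
# Every class here is a stub invented for illustration.
class InformationSystem:
    def available_sites(self):
        return ["site-A", "site-B"]           # resources advertised to the Grid

class ReplicaCatalogue:
    def locate(self, lfn):
        return {"site-A": f"/storage/{lfn}"}  # logical -> physical file names

class ResourceBroker:
    def match(self, sites, data_sites):
        # prefer a site that already holds the input data
        return next((s for s in sites if s in data_sites), sites[0])

def submit(lfn, user_proxy):
    assert user_proxy, "authentication/authorisation happens first"
    sites = InformationSystem().available_sites()
    replicas = ReplicaCatalogue().locate(lfn)
    site = ResourceBroker().match(sites, replicas)
    print(f"scheduling job at {site}, reading {replicas.get(site, lfn)}")

submit("dc2.simul.0001.root", user_proxy="grid-certificate")
```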

Page 14: Current Grid3 Status (3/1/04)

(http://www.ivdgl.org/grid2003)

• 28 sites, multi-VO
• shared resources
• ~2000 CPUs
• dynamic – roll in/out

Main LCG middleware:

Virtual Data Toolkit, captures the recipe to remake data

Chimera, captures workflows in jobs
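To make the "recipe to remake data" idea concrete, here is a toy provenance record; the dictionary layout is invented for illustration and is not Chimera's VDL or a VDT format.

```python
# Toy "virtual data" record: enough information to rerun the transformation
# that produced a file, rather than (or as well as) storing the file itself.
# The layout and names are invented for illustration.
recipe = {
    "output": "dc2.recon.0042.root",
    "transformation": "athena",
    "release": "8.0.5",
    "arguments": ["RecExCommon_jobOptions.py"],
    "inputs": ["dc2.simul.0042.root"],
}

def remake(r):
    """Reconstruct the command needed to regenerate the output dataset."""
    return (f"{r['transformation']} (release {r['release']}) "
            f"{' '.join(r['arguments'])} < {', '.join(r['inputs'])}")

print("To remake", recipe["output"], "run:", remake(recipe))
```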

Page 15: LCG-2 today (May 14)

Inherited European Data Grid software

From Development to Deployment

Resource Brokerage

Replica Lookup Service

Metadata services

Security

R-GMA information system

ARDA middleware

166 FTE, about 20 to the UK

Also provides experiment support:

POOL object persistency

SEAL core libraries and services

Software Process and Infrastructure

Simulation (G4 and GENSER)

Page 16: NorduGrid

NorduGrid middleware is deployed in:

Sweden (15 sites) Denmark (10 sites) Norway (3 sites) Finland (3 sites) Slovakia (1 site) Estonia (1 site)

Sites to join before/during DC2 (preliminary):

Norway (1-2 sites) Russia (1-2 sites) Estonia (1-2 sites) Sweden (1-2 sites) Finland (1 site) Germany (1 site)

Lightweight deployment based on Globus

Many prototypes; an important contribution to ATLAS, especially installations

[NorduGrid Resources: details table not reproduced in this transcript.]

Page 17: GridPP & GridPP2

Deployment Area: hardware (Tier-1/A and front-ends for Tier-2s), hardware support for Tier-2s, Grid Operations Centre for EGEE

Middleware: security and Virtual Organisation Management Service, R-GMA deployment, networking services, MSS

Applications: complete the Grid integration of the first wave of experiments, support new experiments, generic Grid portal

Page 18: GridPP Summary: From Prototype to Production

[Timeline diagram, 2001 -> 2004 -> 2007: from separate experiments, resources and multiple accounts (BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE using SAMGrid, BaBarGrid, EDG and GANGA against the CERN and RAL computer centres and 19 UK institutes), through prototype Grids (LCG, EGEE, ARDA; CERN prototype Tier-0, UK prototype Tier-1/A, 4 UK prototype Tier-2 centres), to 'one' production Grid (CERN Tier-0 centre, UK Tier-1/A centre, 4 UK Tier-2 centres).]

Page 19: EDG and LCG Strategy

Try to write examples of the main components
Try to get a small working Grid for production jobs:
well-defined datasets
well-defined (pre-installed) code
coherent job submission
Test scalability
Redesign
Set up an analysis environment

Develop user interfaces in parallel
Develop experiment-specific tools
! Requires clean interfaces/component design

You can develop end-to-end prototypes faster (e.g. NorduGrid)

but this aims for something robust, generic and reusable

Page 20: Rough Architecture

[Architecture diagram: the user works through a user interface to the Grid plus the experiment framework; jobs go via the middleware (Resource Broker, Grid Information System) to compute + store sites, on which the software and environment have been installed; the Data Catalogue and the Job Configuration/VDC/metadata services are consulted along the way.]

Page 21: ATLAS Computing Model

Areas being addressed:
1. Computing Resources
2. Networks from DAQ to primary storage
3. Databases
4. Grid Interfaces
5. Computing Farms
6. Distributed Analysis
7. Distributed Production
8. Alignment & Calibration Procedures
9. Tests of Computing Model
10. Minimum permissible service
11. Simulation of model

Report at the end of 2004, ready for the Computing Technical Design Report

Page 22: A More Grid-like Model

[Cloud diagram of the LHC Computing Facility: CERN at the centre; Tier-1 centres in Germany, the UK, France, Italy, NL, the USA (FermiLab, Brookhaven) and elsewhere; Tier-2 labs and universities (Lab a, Lab b, Lab c, Lab m, Uni b, Uni n, Uni x, Uni y, Lancs, ...); physics department and desktop resources; UK Tier-2 federations NorthGrid, SouthGrid, LondonGrid and ScotGrid.]

Page 23: Features of the Model

All T1 facilities have 1/6 of the raw data: allows reprocessing!

All T1 facilities have 1/3 of the full reconstructed data: allows more on disk/fast-access space, saves tape

All regional facilities have all of the analysis data (AOD)

Centres become facilities (even at T2 level). Facilities are regional and NOT national: physicists from other regions should also have access to the computing resources. Cost sharing is an issue.

Implications for the Grid middleware on accounting and priorities: between experiments, between regions, between analysis groups. Virtual Organisation Management System. Also, different activities will require different priorities.
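The 1/6 and 1/3 fractions above line up with the 1/N raw and 2/N ESD shares described on the Tier-0 and Tier-1 slides that follow, once one assumes the ~6 Tier-1s envisaged later in the talk; a minimal sketch:

```python
# Minimal sketch connecting the per-Tier-1 shares with the fractions above,
# assuming N = 6 Tier-1s as envisaged later in the talk.
from fractions import Fraction

n_tier1 = 6
raw_share_per_t1 = Fraction(1, n_tier1)   # raw data kept on tape per Tier-1
esd_share_per_t1 = Fraction(2, n_tier1)   # current ESD kept on disk per Tier-1

print(f"raw per Tier-1: {raw_share_per_t1}")   # 1/6
print(f"ESD per Tier-1: {esd_share_per_t1}")   # 1/3
```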

Page 24: Operation of Tier-0

The Tier-0 facility at CERN will have to:

hold a copy of all raw data to tape

copy in real time all raw data to the Tier-1s (the second copy is also useful for later reprocessing)

keep calibration data on disk

run first-pass reconstruction

distribute ESDs to the external Tier-1s (2/N to each one of N Tier-1s)

Currently under discussion:

"shelf" vs "automatic" tapes

archiving of simulated data

sharing of facilities between HLT and Tier-0

Tier-0 will have to be a dedicated facility, where the CPU power and network bandwidth match the real-time event rate.

Page 25: The Global View

Distribution to ~6 T1s

Each T1 holds 1/3 of the reconstructed data. The ability to do research therefore requires a sophisticated software infrastructure for complete and convenient data access for the whole collaboration, and sufficient network bandwidth (2.5 Gb/s) to keep up with the data transfer from T0 to the T1s.
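A quick consistency check relates this figure to the ~300 MB/s per Tier-1 per experiment quoted on the earlier system slide; treating 1 MB as 10^6 bytes is an assumption.

```python
# Quick consistency check of the T0 -> T1 bandwidth figure.
mb_per_s = 300                      # MB/s per Tier-1 per experiment (earlier slide)
gbit_per_s = mb_per_s * 8 / 1000    # convert megabytes/s to gigabits/s
print(f"{mb_per_s} MB/s ~= {gbit_per_s:.1f} Gb/s")   # ~2.4 Gb/s, matching the 2.5 Gb/s quoted
```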

Page 26: Operation of Tier-1s and Tier-2s

We envisage at least 6 Tier-1s for ATLAS. Each one will:

keep on disk 2/N of the ESDs and a full copy of the AODs and TAGs

keep on tape 1/N of the raw data

keep on disk 2/N of the currently simulated ESDs and on tape 1/N of previous versions

provide facilities (CPU and disk space) for Physics Group analysis of ESDs

run simulation, calibration and/or reprocessing of real data

We estimate ~4 Tier-2s for each Tier-1. Each one will:

keep on disk a full copy of the AODs and TAGs

(possibly) keep on disk a selected sample of ESDs

provide facilities (CPU and disk space) for user analysis (~25 users/Tier-2)

run simulation and/or calibration procedures
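To give a feel for what a full AOD copy means for a Tier-2, here is a rough illustration assuming ~100 kB per AOD event (an assumed size, not a number from this talk) and the 0.8x10^9 events/year quoted earlier.

```python
# Rough illustration of a "full copy of the AODs" at a Tier-2.
events_per_year = 0.8e9
aod_kb_per_event = 100          # assumed AOD size per event, for illustration

aod_tb = events_per_year * aod_kb_per_event * 1e3 / 1e12   # kB -> bytes -> TB
print(f"Full AOD copy: ~{aod_tb:.0f} TB per year")          # ~80 TB/year
```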

Page 27: Analysis on Tier-2s and Tier-3s

This area is under the most active change. We are trying to forecast resource usage and usage patterns from the Physics Working Groups.

Assume about ~10 selected large AOD datasets, one for each physics analysis group

Assume that each large local centre will have the full TAG to allow simple selections

Using these, jobs are submitted to the T1 cloud to select on the full ESD; a new collection or ntuple-equivalent is returned to the local resource

Distributed analysis systems are under development. Metadata integration, event navigation and database designs are all at top priority. ARDA may help, but will be late in the day for DC2 (risk of interference with DC2 developments).

Page 28: Resource Summary

                  CERN   All T1   All T2   Total
Auto tape (PB)     4.4      7.2      1.4    12.9
Shelf tape (PB)    3.2      0.0      0.0     3.2
Disk (PB)          1.9      6.8      3.5    12.2
CPU (MSI2k)        4.8     12.7      4.8    23.7

Page 29: New ATLAS Production System

[Production system diagram: the production database (ProdDB) and the Don Quijote data management system sit above a common supervisor layer (Windmill); supervisors talk, via Jabber or SOAP, to executors for each flavour of resource (Lexor for LCG, Dulcinea for NorduGrid, Capone for Grid3, plus a legacy LSF executor); each Grid has its own RLS file catalogue, and AMI provides the metadata interface.]

Much of the problem is data management

This must cope with >= 3 Grid catalogues

The demands will be greater for analysis
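A minimal sketch of the supervisor/executor split in the diagram above; the class and method names are invented and do not reproduce the real Windmill, Lexor, Dulcinea or Capone interfaces.

```python
# Illustrative supervisor/executor split; all names are invented stubs.
class Executor:
    """Adapter hiding one Grid flavour behind a common interface."""
    def __init__(self, flavour):
        self.flavour = flavour
    def submit(self, job):
        print(f"[{self.flavour}] submitting job {job['id']}")
        return f"{self.flavour}-{job['id']}"

class Supervisor:
    """Pulls job definitions from the production DB and farms them out."""
    def __init__(self, executors):
        self.executors = executors
    def run(self, jobs):
        for i, job in enumerate(jobs):
            executor = self.executors[i % len(self.executors)]  # trivial round-robin
            handle = executor.submit(job)
            print(f"recorded {handle} in ProdDB (stub)")

jobs = [{"id": n, "transformation": "simul"} for n in range(4)]
Supervisor([Executor("LCG"), Executor("NorduGrid"), Executor("Grid3")]).run(jobs)
```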

Page 30: GANGA: Interfacing Athena/Gaudi to the GRID

[Diagram: the GANGA/Grappa GUI sits between the Athena/GAUDI application (job options/virtual data, algorithms) and the Grid services, returning histograms, monitoring and results.]

For LHCb an end-to-end solution; for ATLAS a front end; for BaBar a working option!

Major contribution from Alvin Tan – Job Options Editor, design

Highly rated in the GridPP review

This is a substantial UK contribution

Page 31: GANGA Design

- The user has access to the functionality of GANGA components through a GUI and a CLI, layered one over the other above a Python software bus.

- The components used by GANGA to define a job are Python classes.

- They fall into 3 categories:
Ganga components of general applicability (to the right in the diagram)
Ganga components providing specialised functionality (to the left in the diagram)
External components (at the bottom in the diagram)
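A minimal sketch of the "job defined by Python components" idea; the class and attribute names below are invented for illustration and are not the actual GANGA API.

```python
# Job as a composition of Python components, assembled on a "software bus".
class Application:            # what to run (e.g. an Athena job options file)
    def __init__(self, options_file):
        self.options_file = options_file

class Backend:                # where to run it (local batch, a Grid, ...)
    def __init__(self, name):
        self.name = name

class Job:                    # the job object steered from CLI or GUI
    def __init__(self, application, backend):
        self.application = application
        self.backend = backend
    def submit(self):
        print(f"Submitting {self.application.options_file} to {self.backend.name}")

job = Job(Application("MyAnalysis_jobOptions.py"), Backend("LCG"))
job.submit()
```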

Page 32: Analysis: Next Component

Next step: Grid for distributed analysis. Run analysis jobs from the 'home' computer:
jobs partitioned and sent to the centres where the data resides, and/or
relevant data extracted from the remote centres and transferred to the local installation

ARDA will eventually provide the lower middleware. The first prototype is in test, but too late for the computing model tests this year.

[ARDA services diagram, with the ATLAS-specific environment layered on top: Information Service, Authentication, Authorisation, Auditing, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy, Accounting, User Interface / API.]

Personal view on the importance of ARDA: the central role of clients (deployment over development).

Page 33: ATLAS Distributed Analysis & GANGA

The ADA (ATLAS Distributed Analysis) project started in late 2003 to bring together in a coherent way all efforts already present in the ATLAS Collaboration to develop a DA infrastructure:
GANGA (GridPP in the UK) – front-end, splitting
DIAL (PPDG in the USA) – job model

It is based on a client/server model with an abstract interface between services: a thin client in the user's computer, and an "analysis service" consisting itself of a collection of services in the server.

The vast majority of GANGA modules fit easily into this scheme (or are being integrated right now): GUI, CLI, JobOptions editor, job splitter, output merger, ...

Job submission will go through (a clone of) the production system, using the existing infrastructure to access resources on the 3 Grids and the legacy systems.

The forthcoming release of ADA (with GANGA 2.0) will have the first basic functionality to allow DC2 Phase III to proceed.

Page 34: ATLAS Data Analysis: GANGA + DIAL + AtCom + CMT/Pacman

[Sequence diagram for the analysis service: from the analysis framework (e.g. ROOT) the user (1) locates and (2) selects a dataset, (3) creates or selects an application (e.g. athena) and (4) selects a task (exe, packages, scripts, code), then (5) submits (app, task, dataset) to the Analysis Service; the service (6) splits the dataset, (7) creates sub-jobs (Job 1, Job 2, ... over Dataset 1, Dataset 2, ...), (9) creates their results and (10) gathers them into the overall Result.]
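A toy rendering of the submit/split/gather flow in the sequence above, with invented names (AnalysisService, split_dataset) rather than the real ADA/DIAL interfaces.

```python
# Toy submit/split/gather flow; names and behaviour are illustrative only.
def split_dataset(dataset, n_parts):
    """Split a list of files into n_parts roughly equal sub-datasets."""
    return [dataset[i::n_parts] for i in range(n_parts)]

class AnalysisService:
    def submit(self, app, task, dataset, n_jobs=2):
        results = []
        for sub in split_dataset(dataset, n_jobs):
            # In the real system each sub-job would run app+task at a Grid site.
            results.append(f"result of {app}/{task} on {sub}")
        return self.gather(results)
    def gather(self, results):
        return {"merged": results}

service = AnalysisService()
print(service.submit("athena", "my_analysis", ["file1", "file2", "file3", "file4"]))
```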

Page 35: Analysis System

A first prototype exists. Next: integrate with the ARDA back-end. Much work is needed on metadata for analysis (the LCG and GridPP metadata projects).

N.B. GANGA allows non-production MC job submission and data reconstruction end-to-end in LCG.

[Layered diagram: client tools (GANGA GUI, GANGA and ROOT command-line clients, graphical job builder, GANGA task and job management, dataset splitter and merger) talk through high-level service interfaces (AJDL) to the Analysis Service and catalogue services, which in turn use the middleware service interfaces (CE, WMS, File Catalogue, etc.).]

Page 36: Installation Tools

To use the Grid, deployable software must be deployed on the Grid fabrics, and the deployable run-time environment established.

Installable code and run-time environment/configuration: no explicit absolute paths (now OK), no licensed software (now OK).

Deployable package (e.g. a set of RPMs): both ATLAS and LHCb use CMT for software management and environment configuration. CMT knows the package interdependencies and external dependencies, so it is the obvious tool to prepare the deployable code (rpms, tar) and to `expose' the dependencies to the deployment tool.

Grid-aware tool to deploy the above: PACMAN is a candidate which seems fairly easy to interface with CMT (see following talk).

This is a substantial UK contribution.
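To illustrate the idea of exposing CMT's dependency knowledge to a deployment tool, the sketch below walks an invented package dependency graph and emits an install order; the package names and graph format are assumptions, not CMT or PACMAN syntax.

```python
# Walk a package dependency graph (the kind of information CMT holds) and
# emit an install order for a deployment tool. Names and format are invented.
deps = {
    "AtlasRelease": ["AtlasEvent", "AtlasSimulation"],
    "AtlasEvent": ["AtlasCore"],
    "AtlasSimulation": ["AtlasCore", "Geant4"],
    "AtlasCore": [],
    "Geant4": [],
}

def install_order(package, graph, done=None):
    """Return packages in dependency order (dependencies first)."""
    if done is None:
        done = []
    for dep in graph[package]:
        install_order(dep, graph, done)
    if package not in done:
        done.append(package)
    return done

print(install_order("AtlasRelease", deps))
# ['AtlasCore', 'AtlasEvent', 'Geant4', 'AtlasSimulation', 'AtlasRelease']
```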

Page 37: ATLAS Computing Timeline

• POOL/SEAL release (done)

• ATLAS release 7 (with POOL persistency) (done)

• LCG-1 deployment (in progress...)

• ATLAS complete Geant4 validation (done)

• ATLAS release 8 (done)

• DC2 Phase 1: simulation production

• DC2 Phase 2: intensive reconstruction (the real challenge!)

• Combined test beams (barrel wedge)

• Computing Model paper

• Computing Memorandum of Understanding

• ATLAS Computing TDR and LCG TDR

• DC3: produce data for PRR and test LCG-n

• Physics Readiness Report

• Start commissioning run
• GO!

[Timeline spanning 2003 to 2007, with "NOW" in mid-2004: LCG and GEANT4 integration, then testing the computing model with DC2, then testing physics readiness with DC3, then data-ready versions confronted with data. Packages shake down in DC3 (or earlier), ready for physics in 2007.]

Page 38: Test Bench – Data Challenges

ATLAS DC1, Jul 2002 - May 2003:
Showed the many resources available (hardware, willing people)
Made clear the need for an integrated system
Very manpower intensive
Some tests of Grid software
Mainly driven by HLT and Physics Workshop needs; one external driver is sustainable, two is not!

Page 39: DC2: May – Sept 2004

The goals include:
Use the Grid middleware and tools widely
Large-scale physics analysis
Computing model studies (document at the end of 2004)
Slice test of the computing activities in 2007
Run the production as much as possible on LCG-2

Simultaneous with the test beam:
Simulation of full ATLAS and the 2004 combined test beam
Test the calibration and alignment procedures, using the same tools

Page 40:

Preparation phase: worldwide exercise (May-June 04)
Event generation; simulation; pile-up and digitization
All "byte-stream" data sent to CERN

Reconstruction at Tier-0:
~400 processors, short term, sets the scale
Several streams: express lines, calibration and alignment lines, different output streams
ESD and AOD replicated to Tier-1 sites

Out of Tier-0:
Re-calibration: new calibrations and alignment parameters
Re-processing
Analysis

Page 41: Monitoring & Accounting

We need to monitor the operation to validate the model. The production database gives a historical, integrated view:
Publish on the web, in real time, relevant data concerning the running of DC2 and event production
SQL queries are submitted to the ProdDB hosted at CERN; the result is HTML-formatted and published on the web
A first basic tool is already available as a prototype

We also need snapshots to find bottlenecks; this needs Grid monitoring tools:
MonALISA is deployed for Grid3 and NG monitoring
On LCG: an effort to verify the status of the Grid, with two main tasks (site monitoring and job monitoring), based on R-GMA & GridICE
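A minimal sketch of the SQL-to-HTML publishing idea described above; an in-memory SQLite table with an invented schema stands in for the real ProdDB at CERN.

```python
# Query a jobs table and publish the result as an HTML snippet.
# The schema and data are invented; the real ProdDB differs.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jobs (site TEXT, status TEXT)")
con.executemany("INSERT INTO jobs VALUES (?, ?)",
                [("RAL", "done"), ("RAL", "failed"), ("CERN", "done")])

rows = con.execute(
    "SELECT site, status, COUNT(*) FROM jobs GROUP BY site, status").fetchall()

html = "<table>\n" + "\n".join(
    f"<tr><td>{site}</td><td>{status}</td><td>{n}</td></tr>"
    for site, status, n in rows) + "\n</table>"
print(html)   # in the real tool this page would be published on the web
```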

Page 42: DC3

From the end of September, pre-production begins for DC3. This will be more than an order of magnitude bigger than DC2.

The Physics TDR will be a major driver. We will have many real users. It is the last chance to validate the software and computing before the real data.

Page 43: Conclusions

The Grid is the only practical way to function as a world-wide collaboration

DC1 showed we have many resources, especially people

Grid projects are starting to deliver:
slower than desirable
tensions over manpower
problems of coherence

Real tests of the computing model are due this year:
serious and prompt input is needed from the community
revised costs are encouraging

Real sharing of resources is required:
the rich must shoulder a large part of the burden
poorer members must also contribute
this technology allows them to do so more effectively

Page 44: Data Management Architecture

AMI (ATLAS Metadata Interface): query LFNs and their associated attributes and values

Don Quijote: replaces MAGDA; manages replication and physical location

VDC (Virtual Data Catalog): derive and transform LFNs