
LHC Computing Grid: CCIN2P3 Role and Contribution
KISTI-CCIN2P3 Workshop
Ghita Rahal
KISTI, December 1st, 2008

Index

• LHC Computing Grid
• LCG-France
• LCG at CC-IN2P3
• Infrastructure validation: an example with ALICE
• General issues
• Conclusions

Credits to Fabio Hernandez (CC), Latchezar Betev (Alice)


Worldwide LCG Collaboration

• LHC Computing Grid
  ■ Purpose: develop, build and maintain a distributed computing environment for the storage and processing of data for the four LHC experiments
    • Ensure the computing service and the application libraries and tools common to the four experiments
  ■ Resources are contributed by the countries participating in the experiments
    • Commitments are made each October of year N for year N+1
    • Planning extends five years forward


LHC Data Flow

• Raw data are generated by the detectors and must be permanently stored
• These figures include neither derived nor simulated data


Experiment            Data rate [MB/s]
ALICE                 100
ATLAS                 320
CMS                   220
LHCb                   50
Σ all experiments     690

Accelerator duty cycle: 14 hours/day, 200 days/year
→ about 7 PB of additional raw data per nominal year
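The 7 PB figure follows directly from the summed data rate and the duty cycle above; a minimal sketch of that arithmetic, assuming decimal units (1 PB = 10^15 bytes):

# Back-of-the-envelope check of the raw-data volume quoted on the slide.
rate_mb_per_s = 100 + 320 + 220 + 50       # ALICE + ATLAS + CMS + LHCb = 690 MB/s
seconds_per_year = 14 * 3600 * 200         # duty cycle: 14 hours/day, 200 days/year
raw_bytes_per_year = rate_mb_per_s * 1e6 * seconds_per_year

print(f"~{raw_bytes_per_year / 1e15:.1f} PB of raw data per nominal year")  # ~7.0 PB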


Processing power for LHC data

• Computing resource requirements
  ■ All LHC experiments, for 2009


Type of resource      Requirement
CPU                   183 M SpecInt2000
Disk storage          73 PB
Tape storage          71 PB

• About 28,000 quad-core Intel Xeon 2.33 GHz (Clovertown) CPUs (14,000 compute nodes)
• More than 73,000 1 TB disk spindles
• … and 5 MW of electrical power!

Source: WLCG Revised Computing Capacity Requirements, Oct. 2007
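As a rough cross-check of those hardware estimates, the per-core SpecInt2000 rating below is simply what the slide's own totals imply; it is derived here, not quoted from the WLCG requirements document:

# Relate the 2009 resource requirements to the hardware estimates above.
required_si2k = 183e6                  # total CPU requirement in SpecInt2000
cpus, cores_per_cpu = 28_000, 4        # quad-core Clovertown estimate from the slide
per_core_si2k = required_si2k / (cpus * cores_per_cpu)
print(f"implied rating: ~{per_core_si2k:.0f} SpecInt2000 per core")   # ~1634

disk_requirement_tb = 73_000           # 73 PB of disk in 1 TB spindles
print(f"1 TB spindles needed: ~{disk_requirement_tb:,}")              # ~73,000 disks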


WLCG Architecture (cont.)

• Resource location per tier level


Significant fraction of the resources distributed over 130+ centres


Tier-1 centres


Institution    Country          Experiments served with priority
                                ALICE  ATLAS  CMS  LHCb
CA-TRIUMF      Canada                  X
ES-PIC         Spain                   X      X    X
FR-CCIN2P3     France           X      X      X    X
DE-KIT         Germany          X      X      X    X
IT-INFN-CNAF   Italy            X      X      X    X
NDGF           DK/FI/NO/SE      X      X
NL-T1          Netherlands      X      X           X
TW-ASGC        Taiwan                  X      X
UK-T1-RAL      United Kingdom   X      X      X    X
US-T1-BNL      USA                     X
US-FNAL-CMS    USA                            X
Total                           6      10     7    6

Source: WLCG Memorandum of Understanding, 2007/12/07


LCG-France project

• Goal
  ■ Set up, develop and maintain a WLCG Tier-1 and an Analysis Facility at CC-IN2P3
  ■ Promote the creation of French Tier-2/Tier-3 sites and coordinate their integration into the WLCG collaboration
• Funding
  ■ National funding for the Tier-1 and the Analysis Facility
  ■ Tier-2s and Tier-3s funded by universities, local/regional governments, hosting laboratories, …
• Schedule
  ■ Started in June 2004
  ■ 2004-2008: setup and ramp-up phase
  ■ 2009 onwards: cruise phase
• Equipment budget for the Tier-1 and Analysis Facility
  ■ 2005-2012: 32 M€


LCG-France


LCG-France sites on the map (source: http://lcg.in2p3.fr):
• CC-IN2P3 (Lyon): Tier-1 & Analysis Facility
• GRIF (Île-de-France): Tier-2 (APC, CEA/DSM/IRFU, IPNO, LAL, LLR, LPNHE)
• Subatech (Nantes): Tier-2
• LPC (Clermont-Ferrand): Tier-2
• LAPP (Annecy): Tier-2
• IPHC (Strasbourg): Tier-3
• CPPM (Marseille): Tier-3
• IPNL (Lyon): Tier-3
• LPSC (Grenoble): Tier-3


Associated Tier-2s


Tier-2s associated with CC-IN2P3 (Lyon):
• IHEP: ATLAS/CMS Tier-2 in Beijing
• ICEPP: ATLAS Tier-2 in Tokyo
• Romanian ATLAS federation
• Belgium CMS Tier-2s


LCG-France sites

Tier-1:   CC-IN2P3
Tier-2:   Analysis Facility (Lyon), GRIF (Paris region), LAPP (Annecy), LPC (Clermont), Subatech (Nantes)
Tier-3:   CPPM (Marseille), IPHC (Strasbourg), IPNL (Lyon), LPSC (Grenoble)

• Most sites serve the needs of more than one experiment and group of users


Tier-2s planned contribution: % of required resources provided by all Tier-2 sites, 2007-2012, all experiments

                              2007   2008   2009   2010   2011   2012
Ratio CPU (% of all tiers)     12%    10%    12%    10%    10%    10%
Ratio Disk (% of all tiers)     9%     7%     8%     8%    10%     8%

Balance 2008 for LCG-France Tier-2 sites: installed vs. pledged CPU and disk capacity (31/06/08), for CC-IN2P3 AF, GRIF, LAPP, LPC Clermont and Subatech
• Installed CPU = 100% of 2008 pledges
• Installed disk = 80% of 2008 pledges (disk: -20%)


Connectivity

• Excellent connectivity to other national and international institutions provided by RENATER

• The role of the national academic & research network is instrumental for the effective deployment of the grid infrastructure


[Map of the RENATER network: dark fiber, 2.5 Gbit/s and 1 Gbit/s (GE) links, including the connection to Genève (CERN). Source: Frank Simon, RENATER]


LCG-France tier-1 & AF


Roughly equivalent to 305 Thumpers (with 1 TB disks) or 34 racks


LCG-France tier-1 & AF contribution


CPU ongoing activity at CC

[Plots of CPU usage at CC for ATLAS, CMS, ALICE and LHCb over 2007-2008. Note: the scale is not the same on all plots.]


Resource usage (tier-1 + AF)



Resource deployment


[Chart: × 6.7 growth]


Resource deployment (cont.)


[Chart: × 3.1 growth]


LCG tier-1: availability & reliability


Scheduled shutdowns of services on 18/09/2007, 03/11/2007 and 11/03/2008

Source: WLCG T0 & T1 Site Reliability Reports


LCG tier-1: availability & reliability (cont.)


Source: WLCG T0 & T1 Site Reliability Reports


Validation program: Goal

• Registration of data at T0 and on the grid
• T0 → T1 replication
• Condition data on the grid
• Quasi-online reconstruction
  ■ Pass 1 at T0
  ■ Reprocessing at T1
  ■ Replication of ESDs: T1 → T2/CAFs
• Quality control
• MC production and user analysis at T2/CAFs


Data flow and rates

[Diagram: data flow between the DAQ, CASTOR2, the CAF, reconstruction at T0 and T1 storage, using rfcp, xrootd and FTS/GridFTP at 60 MB/s. Source: L. Betev]

First part: ½ the nominal p+p acquisition rate (DAQ) plus the nominal rate for distribution
Average: 60 MB/s, peak: 3 GB/s


CCRC08: 15 February - 10 March

• Tests with half the DAQ-to-CASTOR rates


• 82 TB in total, in 90k files (0.9 GB/file)
• 70% of the nominal monthly p+p volume
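A quick consistency check of those CCRC08 figures, using the 60 MB/s average rate quoted on the previous slide; the comparison with the roughly 24-day test window is my own sanity check, not a number from the slides:

# Sanity-check the CCRC08 transfer figures.
total_bytes = 82e12                    # 82 TB moved in total
n_files = 90_000
print(f"average file size: {total_bytes / n_files / 1e9:.2f} GB")     # ~0.91 GB

avg_rate = 60e6                        # 60 MB/s average from the data-flow slide
days_at_avg_rate = total_bytes / avg_rate / 86_400
print(f"~{days_at_avg_rate:.0f} days at a sustained 60 MB/s")         # ~16 of the ~24-day window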


T0 → T1 replication


End of data taking
Expected rate: 60 MB/s


T0 → T1 replication, all experiments


CCRC phase 1

CCRC phase 2


T0 → CC-IN2P3


End of data taking
Tests before Run III (May)


T0 → CC-IN2P3, all experiments


Note: the expected rates are still unknown for some experiments (and keep changing). This is the goal according to the Megatable, which is the reference document (even if it is no longer maintained).


ALICE CCRC08: May period

• Detector activities
• ALICE offline upgrades
  ■ New VO-box installation
  ■ New AliEn version
  ■ Tuning of the reconstruction software
  ■ Exercise of the 'fast lane' calibration/alignment procedure…
• Data replication
  ■ T0 → T1, scheduled according to the ALICE shares


May: All 4 experiments concurrently


• Tier-0 → CC-IN2P3

Note: the expected rates are still unknown for some experiments (and keep changing). This is the goal according to the Megatable, which is the reference document (even if it is no longer maintained).


Post Mortem CCRC08

• Reliable central data distribution
• High CC-IN2P3 efficiency and stability (dCache, FTS, …)
• Good, high performance of the French Tier-2s
• Demonstrated large safety margins for transfers between T1 and T2


High priority: Analysis Farm 1/2

• Time to concentrate on user analysis:
  ■ Must take place in parallel with the other tasks
  ■ Unscheduled, bursty access to the data
  ■ Users expect a fast return of their output
  ■ Interactivity…
• Ongoing activity at CC:
  ■ Identify the needs
  ■ Set up a common infrastructure for the 4 LHC experiments


High priority: Analysis Farm 2/2

• Ongoing activity at CC (cont'd):
  ■ Goal: a prototype to be tested at the beginning of 2009
• ALICE specifics:
  ■ A farm design is already under test at CERN. We expect to deploy one in France according to those specs, but shareable with the other experiments.


General Issues for CC-IN2P3

• Improve each component:
  ■ Storage: higher performance for HPSS and improved interaction with dCache
  ■ Increase the level of redundancy of the services to reduce human intervention (VO boxes, LFC, …)
  ■ Monitoring, monitoring, monitoring…
• Manpower: need to reach a higher level of staffing, mainly for storage


Conclusion

• The 2008 challenge has shown the capability of LCG-France to meet the challenges of LHC computing
• It has also shown the need for permanent background testing and monitoring of the worldwide platform
• The reliability of the storage and data distribution components still needs to improve


ALICE Computing Model

• p-p:
  ■ Quasi-online data distribution and first reconstruction at T0
  ■ Further reconstruction at the Tier-1s
• A-A:
  ■ Calibration, alignment and pilot reconstruction during data taking
  ■ Data distribution and first reconstruction at T0
• One copy of the RAW data at T0 and one distributed among the Tier-1s


ALICE Computing Model (cont.)

• T0:
  ■ First-pass reconstruction, storage of one copy of RAW
  ■ Calibration and first-pass ESDs
• T1:
  ■ Storage of a share of the RAW data, ESDs and AODs on disk
  ■ Reconstruction passes
  ■ Scheduled analysis
• T2:
  ■ Simulation
  ■ End-user analysis
  ■ Copy of ESDs and AODs