TRANSCRIPT
LHC Computing Grid: CCIN2P3 Role and Contribution
KISTI-CCIN2P3 Workshop
Ghita Rahal
KISTI, December 1st, 2008
Index
• LHC Computing Grid
• LCG-France
• LCG at CCIN2P3
• Infrastructure validation: an example with ALICE
• General issues
• Conclusions
Credits to Fabio Hernandez (CC), Latchezar Betev (Alice)
Worldwide LCG Collaboration
• LHC Computing Grid
  ■ Purpose: develop, build and maintain a distributed computing environment for the storage and processing of data from the 4 LHC experiments
    • Ensure the computing service and the application libraries and tools common to the 4 experiments
  ■ Resources contributed by the countries participating in the experiments
    • Commitments made each October of year N for year N+1
    • Planning 5 years forward
LHC Data Flow
• Raw data generated by the detectors must be permanently stored
• These figures include neither the derived nor the simulated data
Experiment           Data rate [MB/s]
ALICE                100
ATLAS                320
CMS                  220
LHCb                 50
Σ all experiments    690
Accelerator duty cycle: 14 hours/day, 200 days/year
→ 7 PB of additional raw data per nominal year
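As a quick sanity check, the 7 PB figure follows directly from the summed rate and the duty cycle above; a minimal sketch using only the numbers on this slide:

```python
# Sanity check of the "7 PB of raw data per nominal year" figure,
# using the aggregate data rate and accelerator duty cycle quoted above.
rate_mb_per_s = 690          # sum over the 4 experiments [MB/s]
seconds_per_day = 14 * 3600  # 14 hours of data taking per day
days_per_year = 200          # nominal running days per year

raw_bytes = rate_mb_per_s * 1e6 * seconds_per_day * days_per_year
print(f"{raw_bytes / 1e15:.1f} PB per nominal year")  # ~7.0 PB
```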
Processing power for LHC data
• Computing resource requirements
  ■ All LHC experiments, for 2009
Type of resource     Requirement
CPU                  183 M SpecInt2000
Disk storage         73 PB
Tape storage         71 PB
• About 28,000 quad-core Intel Xeon 2.33 GHz (Clovertown) CPUs (14,000 compute nodes)
• More than 73,000 1 TB disk spins
• … and 5 MW of electrical power!
Source: WLCG Revised Computing Capacity Requirements, Oct. 2007
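A rough cross-check of how the CPU requirement maps onto that hardware estimate; the resulting ~1.6 kSI2k per core is derived here, not stated on the slide:

```python
# Cross-check of the 2009 CPU requirement against the quoted hardware estimate.
cpu_requirement_si2k = 183e6   # 183 M SpecInt2000, from the table above
cpus = 28_000                  # quad-core Clovertown CPUs quoted above
cores = cpus * 4

print(f"~{cpu_requirement_si2k / cores:.0f} SI2k per core")  # ~1634, i.e. ~1.6 kSI2k/core

# Disk: 73 PB served by 1 TB drives indeed means more than 73,000 spindles.
print(f"{73e15 / 1e12:.0f} drives")  # 73000
```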
WLCG Architecture (cont.)
• Resource location per tier level
• Significant fraction of the resources distributed over 130+ centres
Tier-1 centres
Institution     Country          ALICE  ATLAS  CMS  LHCb   (experiments served with priority)
CA-TRIUMF       Canada                  ✓
ES-PIC          Spain                   ✓      ✓    ✓
FR-CCIN2P3      France           ✓      ✓      ✓    ✓
DE-KIT          Germany          ✓      ✓      ✓    ✓
IT-INFN-CNAF    Italy            ✓      ✓      ✓    ✓
NDGF            DK/FI/NO/SE      ✓      ✓
NL-T1           Netherlands      ✓      ✓           ✓
TW-ASGC         Taiwan                  ✓      ✓
UK-T1-RAL       United Kingdom   ✓      ✓      ✓    ✓
US-T1-BNL       USA                     ✓
US-FNAL-CMS     USA                            ✓
Total                            6      10     7    6
Source: WLCG Memorandum of Understanding – 2007/12/07
LCG-France project
• Goal
  ■ Set up, develop and maintain a WLCG Tier-1 and an Analysis Facility at CC-IN2P3
  ■ Promote the creation of French Tier-2/Tier-3 sites and coordinate their integration into the WLCG collaboration
• Funding
  ■ National funding for the Tier-1 and the Analysis Facility
  ■ Tier-2s and Tier-3s funded by universities, local/regional governments, hosting laboratories, …
• Schedule
  ■ Started in June 2004
  ■ 2004-2008: setup and ramp-up phase
  ■ 2009 onwards: cruise phase
• Equipment budget for the Tier-1 and Analysis Facility
  ■ 2005-2012: 32 M€
LCG-France
• CC-IN2P3 (Lyon): Tier-1 & Analysis Facility
• GRIF (Ile-de-France): Tier-2 (APC, CEA/DSM/IRFU, IPNO, LAL, LLR, LPNHE)
• LAPP (Annecy): Tier-2
• LPC (Clermont-Ferrand): Tier-2
• Subatech (Nantes): Tier-2
• CPPM (Marseille): Tier-3
• IPHC (Strasbourg): Tier-3
• IPNL (Lyon): Tier-3
• LPSC (Grenoble): Tier-3
Source: http://lcg.in2p3.fr
Associated Tier-2s
• IHEP (Beijing): ATLAS/CMS Tier-2
• ICEPP (Tokyo): ATLAS Tier-2
• Romanian ATLAS federation
• Belgian CMS Tier-2s
All associated with the CC-IN2P3 Tier-1 in Lyon
LCG-France sites
• Tier-1: CC-IN2P3
• Analysis Facility: Lyon
• Tier-2: GRIF (Paris region), LAPP (Annecy), LPC (Clermont), Subatech (Nantes)
• Tier-3: CPPM (Marseille), IPHC (Strasbourg), IPNL (Lyon), LPSC (Grenoble)
• Most sites serve the needs of more than one experiment and group of users
% of required resources in all Tier-2 sites, 2007-2012 (all experiments)
Tier-2s planned contribution, relative to all Tiers:

                           2007   2008   2009   2010   2011   2012
Ratio CPU (% all Tiers)    12%    10%    12%    10%    10%    10%
Ratio Disk (% all Tiers)    9%     7%     8%     8%    10%     8%

Balance 2008 for LCG-France Tier-2 sites
Installed/pledged CPU and disk capacity (31/06/08) for CC-IN2P3 AF, GRIF, LAPP, LPC Clermont and Subatech:
• Installed CPU = 100% of 2008 pledges
• Installed disk = 80% of 2008 pledges (disk: -20%)
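The per-site bars of the original chart cannot be reproduced here; the quantity plotted is simply installed capacity divided by the 2008 pledge. A minimal sketch with placeholder numbers (the site figures below are invented for illustration, not the real 2008 values):

```python
# Ratio plotted in the "Balance 2008" chart: installed capacity / 2008 pledge.
# The per-site numbers below are illustrative placeholders only.
sites = {
    # site: (installed CPU [kSI2k], pledged CPU [kSI2k], installed disk [TB], pledged disk [TB])
    "GRIF":     (1000, 1000, 400, 500),
    "Subatech": (300,  250,  100, 120),
}

for name, (cpu_inst, cpu_pled, disk_inst, disk_pled) in sites.items():
    print(f"{name}: CPU {cpu_inst / cpu_pled:.0%} of pledge, disk {disk_inst / disk_pled:.0%} of pledge")
```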
Connectivity
• Excellent connectivity to other national and international institutions, provided by RENATER
• The role of the national academic & research network is instrumental in the effective deployment of the grid infrastructure
[Map of the RENATER network: dark-fibre backbone with 2.5 Gbit/s and 1 Gbit/s (GE) links, including the link to Genève (CERN)]
Source: Frank Simon, RENATER
LCG-France tier-1 & AF
Roughly equivalent to 305 Thumpers (with 1 TB disks), or 34 racks
CPU ongoing activity at CC
[Plots of CPU usage at CC-IN2P3 over 2007-2008 for ATLAS, CMS, ALICE and LHCb]
Note: the scale is not the same on all plots
LCG tier-1: availability & reliability
Scheduled shutdowns of services on: 18/09/2007, 03/11/2007, 11/03/2008
Source: WLCG T0 & T1 Site Reliability Reports
LCG tier-1: availability & reliability (cont.)
Source: WLCG T0 & T1 Site Reliability Reports
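For context, the availability and reliability numbers in those reports are ratios over the reporting period; a minimal sketch of the usual WLCG-style definitions as I understand them (not taken from the slides):

```python
# WLCG-style site metrics over a reporting period (my paraphrase of the usual definitions):
#   availability = time the site passed the tests / total time
#   reliability  = time the site passed the tests / (total time - scheduled downtime)
# Scheduled shutdowns like those listed above therefore lower availability,
# but are excluded from the reliability denominator.
def availability(up_hours: float, total_hours: float) -> float:
    return up_hours / total_hours

def reliability(up_hours: float, total_hours: float, scheduled_down_hours: float) -> float:
    return up_hours / (total_hours - scheduled_down_hours)

# Example: a 30-day month with 24 h of scheduled downtime and 12 h of unscheduled failures.
total, scheduled, unscheduled = 30 * 24, 24, 12
up = total - scheduled - unscheduled
print(f"availability = {availability(up, total):.1%}")            # 95.0%
print(f"reliability  = {reliability(up, total, scheduled):.1%}")  # 98.3%
```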
Validation program: Goal
• Registration of data at T0 and on the grid
• T0 → T1 replication
• Condition data on the grid
• Quasi-online reconstruction
  ■ Pass 1 at T0
  ■ Reprocessing at the T1s
  ■ Replication of ESDs: T1 → T2s/CAFs
• Quality control
• MC production and user analysis at T2s/CAFs
(The chain is sketched as an ordered pipeline below.)
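Reading the validation goals as one ordered chain from data taking to analysis; the step names below are my own phrasing, only the ordering comes from the slide:

```python
# Validation chain as an ordered pipeline (step names are illustrative; order from the slide).
VALIDATION_CHAIN = [
    "register raw data at T0 and in the grid catalogue",
    "replicate raw data T0 -> T1s",
    "publish condition data on the grid",
    "pass-1 reconstruction at T0 (quasi-online)",
    "reprocessing at the T1s",
    "replicate ESDs T1 -> T2s/CAFs",
    "quality control",
    "MC production and user analysis at T2s/CAFs",
]

for i, step in enumerate(VALIDATION_CHAIN, start=1):
    print(f"{i}. {step}")
```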
Data flow and rates
[Diagram: DAQ → CASTOR2 via rfcp; from CASTOR2: reco@T0, export to the CAF via xrootd, and export to T1 storage via FTS/GridFTP at 60 MB/s; T1 storage read back via xrootd]
First part: ½ of the nominal p+p acquisition rate (DAQ) + nominal rate for distribution
Average: 60 MB/s; peak: 3 GB/s
Source: L. Betev
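The interesting tension in those numbers is between the 3 GB/s peak into CASTOR2 and the 60 MB/s average export; a back-of-the-envelope sketch (the one-hour burst length is an assumption for illustration, not a figure from the slide):

```python
# How much buffer a peak implies, given the quoted peak and average export rates.
peak_rate = 3e9          # 3 GB/s peak into CASTOR2 (from the slide)
export_rate = 60e6       # 60 MB/s average T0 -> T1 export (from the slide)
burst_seconds = 3600     # assumed 1-hour burst, for illustration only

buffered = (peak_rate - export_rate) * burst_seconds
print(f"buffer filled during the burst: {buffered / 1e12:.1f} TB")            # ~10.6 TB
print(f"time to drain it at 60 MB/s: {buffered / export_rate / 3600:.0f} h")  # ~49 h
```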
CCRC08: 15 February - 10 March
• Tests with half the DAQ-to-CASTOR rates
• 82 TB in total, with 90k files (0.9 GB/file)
• 70% of the nominal monthly p+p volume
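Those numbers are internally consistent; a quick check (the ~117 TB nominal month is derived from the 70% figure, it is not stated on the slide):

```python
# Consistency check of the CCRC08 February/March figures quoted above.
total_tb = 82
n_files = 90_000

print(f"average file size: {total_tb * 1e12 / n_files / 1e9:.2f} GB")   # ~0.91 GB/file
print(f"implied nominal monthly p+p volume: {total_tb / 0.70:.0f} TB")  # ~117 TB
```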
T0 → T1 replication
[Transfer-rate plot; marker: end of data taking]
Expected rate: 60 MB/s
T0 → CC-IN2P3
[Transfer-rate plot; markers: end of data taking, tests before run III (May)]
T0 → CC-IN2P3, all experiments
Note: the expected rates are still unknown for some experiments (and keep changing). The goal shown is the one from the Megatable, which remains the reference document (even though it is no longer maintained).
ALICE CCRC08: May period
• Detector activities
• ALICE offline upgrades
  ■ New VO-box installation
  ■ New AliEn version
  ■ Tuning of the reconstruction software
  ■ Exercise of the 'fast lane' calibration/alignment procedure, …
• Data replication
  ■ T0 → T1, scheduled according to the ALICE shares
May: All 4 experiments concurrently
• Tier-0 → CC-IN2P3 transfers
Note: the expected rates are still unknown for some experiments (and keep changing). The goal shown is the one from the Megatable, which remains the reference document (even though it is no longer maintained).
Post Mortem CCRC08
• Reliable central data distribution
• High CC-IN2P3 efficiency and stability (dCache, FTS, …)
• Good, high performance of the French Tier-2s
• Demonstrated large safety margins for transfers between the T1 and the T2s
High priority: Analysis Farm 1/2
• Time to concentrate on user analysis:
  ■ Must take place in parallel with other tasks
  ■ Unscheduled, bursty access to the data
  ■ Users expect fast return of their output
  ■ Interactivity, …
• Ongoing activity at CC:
  ■ Identify the needs
  ■ Set up a common infrastructure for the 4 LHC experiments
High priority: Analysis Farm 2/2
• Ongoing activity at CC (cont'd):
  ■ Goal: a prototype to be tested at the beginning of 2009
• ALICE specifics:
  ■ Farm design already being tested at CERN. Expect to deploy one in France according to those specs, but shareable with the other experiments.
General Issues for CC-IN2P3
• Improve each component:
  ■ Storage: higher performance for HPSS and improved interaction with dCache
  ■ Increase the level of redundancy of the services to reduce human intervention (VO boxes, LFC, …)
  ■ Monitoring, monitoring, monitoring…
• Manpower: need to reach a higher level of staffing, mainly for storage
Conclusion
• The 2008 challenge has shown the capability of LCG-France to meet the challenges of LHC computing
• It has also shown the need for permanent background testing and monitoring of the worldwide platform
• The reliability of the storage and data-distribution components still needs to be improved
ALICE Computing Model
• p-p:
  ■ Quasi-online data distribution and first reconstruction at T0
  ■ Further reconstruction at the Tier-1s
• A-A:
  ■ Calibration, alignment and pilot reconstruction during data taking
  ■ Data distribution and first reconstruction at T0
• One copy of the RAW data kept at T0 and one distributed among the Tier-1s