TRANSCRIPT
The Worldwide LHC Computing Grid
WLCG Service Ramp-Up
LHCC Referees’ Meeting, January 2007
Ramp-Up Outline
The clear goal for 2007 is to be ready for first data taking ahead of the machine itself
• This translates to:
– Dress Rehearsals in the 2nd half of the year
– Preparation for these in the 1st half
– Continuous service operation and hardening
– Continual (quasi-continuous) experiment production
• Different views: experiment, site, Grid-specific, WLCG…
• Will focus on the first and (mainly) the last of these
– Other views, in particular site views, will come shortly
WLCG Commissioning Schedule
Still an ambitious programme ahead.
Timely testing of the full data chain from DAQ to Tier-2 was a major item from the last CR; the DAQ to Tier-0 part of the chain is still largely untested.
Service Ramp-Up
• As discussed at last week’s WLCG Collaboration Workshop, much work has already been done on service hardening – Reliable hardware, improved monitoring & logging, middleware enhancements
Much still remains to be done – this will be an ongoing activity during the rest of 2007 and probably beyond
• The need to provide as much robustness in the services themselves – as opposed to constant baby-sitting – is well understood
• There are still new / updated services to deploy in full production (see previous slide)
It is unrealistic to expect that all of these will be ready prior to the start of the Dress Rehearsals
• Foresee a ‘staged approach’ – focussing on maintaining and improving both service stability and functionality (‘residual services’)
Must remain in close contact with both experiments and sites on schedule and service requirements – these will inevitably change with time
• Draft of experiment schedule (from December 2006) attached to agenda
• Updated schedules presented last Friday during the WLCG workshop (pointer)
ATLAS 2007 Timeline
• Running continuously throughout the year (at increasing rate): simulation production; cosmic-ray data-taking (detector commissioning)
• January to June: data streaming tests
• February and May: intensive Tier-0 tests
• From February onwards: data distribution tests
• From March onwards: distributed analysis (intensive tests)
• May to July: Calibration Data Challenge
• June to October: Full Dress Rehearsal
• November: GO!
Stefano Belforte, INFN Trieste
Timeline
• February: deploy PhEDEx 2.5; T0-T1, T1-T1, T1-T2 independent transfers; restart job robot; start work on SAM; FTS full deployment
• March: SRM v2.2 tests start; T0-T1(tape)-T2 coupled transfers (same data); measure data serving at sites (esp. T1); production/analysis share at sites verified
• April: repeat transfer tests with SRM v2.2, FTS v2; scale up job load; gLite WMS test completed (synch. with ATLAS)
• May: start ramping up to CSA07
• June:
WLCG Milestones
• These high-level milestones are complementary to the experiment-specific milestones and to the more detailed goals and objectives listed in the WLCG Draft Service Plan (see attachment to agenda)
– Similar to that prepared and maintained in previous years
– Regularly reviewed and updated through the LCG ECM
– Regular reports on status and updates to WLCG MB / GDB
• Focus is on real production scenarios and (moving rapidly to) end-to-end testing
– The time for component testing is over – we learnt a lot, but not enough!
– The time before data taking is very short – let alone before the dress rehearsals
• All data rates refer to the Megatable and to pp running
• Any ‘factors’, such as accelerator and/or service efficiency, are mentioned explicitly
– N.B. ‘catch-up’ is a proven feature of the end-to-end FTS service
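The "catch-up" point above can be made concrete with a little arithmetic: when a Tier-1 link runs below its nominal rate for some period, the backlog can only be recovered at the margin between link capacity and the steady nominal rate. The sketch below is illustrative only; all numbers are invented, not Megatable values.

```python
# Hypothetical illustration of FTS "catch-up": the backlog built up while
# a link is down (or degraded) must be recovered using the headroom
# between the link's capacity and its steady nominal rate.
# All rates here are invented for illustration, not Megatable figures.

def catchup_hours(nominal_mb_s, capacity_mb_s, outage_hours):
    """Hours needed to clear the backlog accumulated during an outage."""
    if capacity_mb_s <= nominal_mb_s:
        raise ValueError("no headroom above nominal: backlog never clears")
    backlog_mb = nominal_mb_s * outage_hours * 3600  # data not shipped
    margin_mb_s = capacity_mb_s - nominal_mb_s       # spare bandwidth
    return backlog_mb / (margin_mb_s * 3600)

# Example: 200 MB/s nominal, 300 MB/s link capacity, 6 h outage
print(catchup_hours(200, 300, 6))  # -> 12.0 hours of catch-up
```

This is why catch-up only works when sites provision noticeably above nominal: with little headroom, even a short outage takes a long time to recover.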
Q1 2007 – Tier0 / Tier1s
1. Demonstrate Tier0-Tier1 data export at 65% of full nominal rates per site using experiment-driven transfers
– Mixture of disk / tape endpoints as defined by the experiment computing models, i.e. 40% tape for ATLAS; transfers driven by the experiments
– Period of at least one week; daily VO-averages may vary (~normal)
2. Demonstrate Tier0-Tier1 data export at 50% of full nominal rates (as above) in conjunction with T1-T1 / T1-T2 transfers
– Inter-Tier transfer targets taken from ATLAS DDM tests / CSA06 targets
3. Demonstrate Tier0-Tier1 data export at 35% of full nominal rates (as above) in conjunction with T1-T1 / T1-T2 transfers and Grid production at Tier1s
– Each file transferred is read at least once by a Grid job
– Some explicit targets for the WMS at each Tier1 need to be derived from the above
4. Provide SRM v2.2 endpoint(s) that implement(s) all methods defined in the SRM v2.2 MoU; all critical methods pass tests
– See attached list; levels of success: threshold, pass, success, (cum laude)
– This is a requirement if production deployment is to start in Q2!
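The rate arithmetic behind milestones 1–3 is simple to sketch. The per-site nominal rates below are placeholders (the real values come from the Megatable); the 40% tape share follows the ATLAS split quoted in milestone 1.

```python
# Sketch of the Q1 milestone arithmetic: the three demonstrations target
# 65%, 50% and 35% of each site's full nominal Tier0->Tier1 export rate.
# The nominal rates below are hypothetical, NOT the actual Megatable values.

NOMINAL_MB_S = {"CNAF": 200.0, "FZK": 200.0, "RAL": 150.0}  # placeholders
MILESTONE_FRACTIONS = {1: 0.65, 2: 0.50, 3: 0.35}
ATLAS_TAPE_FRACTION = 0.40  # 40% to tape endpoints, per milestone 1

def targets(site):
    """Per-milestone target rates (MB/s) for one site, with the
    disk/tape split applied as in the ATLAS computing model."""
    nominal = NOMINAL_MB_S[site]
    out = {}
    for m, frac in MILESTONE_FRACTIONS.items():
        rate = nominal * frac
        out[m] = {"total": rate,
                  "tape": rate * ATLAS_TAPE_FRACTION,
                  "disk": rate * (1 - ATLAS_TAPE_FRACTION)}
    return out

print(targets("RAL")[1])  # milestone 1 for a hypothetical 150 MB/s site
```

For a hypothetical 150 MB/s site this gives a 97.5 MB/s milestone-1 target, of which roughly 39 MB/s goes to tape and 58.5 MB/s to disk.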
Q2 2007 – Tier0 / Tier1s
• As Q1, but using SRM v2.2 services at Tier0 and Tier1, gLite 3.x-based services and SL(C)4 as appropriate (higher rates? T1-T1 / T1-T2)
• Provide services required for Q3 dress rehearsals
– Includes, for example, production Distributed Database Services at the required sites and scale
• Full detail to be provided in coming weeks…
Measuring Our Level of Success
• Existing tools and metrics, such as the CMS PhEDEx quality plots and the ATLAS DDM transfer status, provide clear and intuitive views
– These plots are well known to the sites and provide a good measure of current status as well as showing evolution with time
• Need metrics for WMS related to milestone 3– CMS CSA06 metrics are a good model
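The quality plots mentioned above boil down to one number per link per time bin: the fraction of transfer attempts that succeeded. A minimal sketch of that metric, with invented input records (real monitoring pulls these from the transfer databases), might look like:

```python
# Minimal sketch of a transfer-quality metric of the kind the PhEDEx
# quality plots and ATLAS DDM status pages visualise: per (day, link),
# the fraction of successful transfer attempts. Input records are
# invented for illustration.

from collections import defaultdict

def quality(records):
    """records: (day, src, dst, ok) tuples
    -> {(day, src, dst): success fraction in [0, 1]}"""
    ok = defaultdict(int)
    total = defaultdict(int)
    for day, src, dst, success in records:
        key = (day, src, dst)
        total[key] += 1
        ok[key] += int(success)
    return {k: ok[k] / total[k] for k in total}

sample = [("2007-02-01", "T0", "RAL", True),
          ("2007-02-01", "T0", "RAL", True),
          ("2007-02-01", "T0", "RAL", False),
          ("2007-02-01", "T0", "PIC", True)]
q = quality(sample)
print(q[("2007-02-01", "T0", "RAL")])  # ~0.667 (2 of 3 attempts OK)
```

Binning the same fraction by day and colouring it per link reproduces the familiar red-to-green quality matrix.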
DDM Functional Test 2006 (9 Tier-1s, 40 Tier-2s)

| Tier-1 | Tier-2s | Sept 06 | Oct 06 | Nov 06 |
|--------|---------|---------|--------|--------|
| ASGC | IPAS, Uni Melbourne | Failed within the cloud | Failed for Melbourne | T1-T1 not tested |
| BNL | GLT2, NET2, MWT2, SET2, WT2 | done | done | 2+GB & DPM |
| CNAF | LNF, Milano, Napoli, Roma1 | 65% failure rate | done | |
| FZK | CSCS, CYF, DESY-ZN, DESY-HH, FZU, WUP | Failed from T2 to FZK | dCache problem | T1-T1 not tested |
| LYON | BEIJING, CPPM, LAPP, LPC, LPHNE, SACLAY, TOKYO | done | done, FTS conn =< 6 | |
| NG | | not tested | not tested | not tested |
| PIC | IFAE, IFIC, UAM | Failed within the cloud | done | |
| RAL | CAM, EDINBURGH, GLASGOW, LANCS, MANC, QMUL | Failed within the cloud | Failed for Edinburgh | done |
| SARA | IHEP, ITEP, SINP | Failed IHEP | not tested | IHEP in progress |
| TRIUMF | ALBERTA, TORONTO, UniMontreal, SFU, UVIC | Failed within the cloud | Failed T1-T1 | not tested |

(New DQ2 release (0.2.12) after SC4 test)
Summary
• 2007 will be an extremely busy and challenging year!
For those of us who have been working on LHC Computing for 15+ years (and others too…) it will nonetheless be extremely rewarding
Is there a more important Computing Challenge on the planet this year?
The ultimate goal – to enable the exploitation of the LHC’s physics discovery potential – is beyond measure