GridPP3 Project Status, Sarah Pearce, 24 April 2010, GridPP25, Ambleside
GridPP24, Ambleside
Skiddaw
• The 4th highest mountain in England (or the 3rd, depending)
• The “simplest of the mountains of this height to ascend”
• A well trodden tourist track
• The first summit of the ‘Bob Graham Round’ fell running challenge
• The view from the top is ‘panoramic’
24/8/10
Since the last meeting
• LHC continues to take data – see Pete’s talk
• EGEE finished and EGI started – see Jeremy and Andy’s discussion
• Tier-1 running well
• Tier-2s procuring more equipment from the 2nd round of hardware grants
• GridPP4 proposal reviewed and accepted – see Dave’s talk
Tier-1
• CPU hardware delivered and commissioned in time to meet WLCG pledge
• One tranche of disk delivery still going through acceptance
• Procurements for next round of CPU and disk have started
• Testing for upgrade to CASTOR 2.1.9 (from 2.1.7)
• Operations very stable
Tier-2s
• RHUL cluster successfully running in new RHUL machine room
• UCL-Central removed from list of UK sites
• All grants for 2nd tranche of hardware issued: sites procuring hardware to meet 2010 pledge
• Several sites made significant upgrades, including:
– Sheffield (inc. air con/temperature monitoring equipment)
– Lancaster kit for new machine room
– Cambridge increased disk and CPU
– IC moved site outside firewall – x2 improvement in performance
• Some issues with staffing (Durham, likely at Bristol)
– Discussion today at PMB/DB on how to cover sites with little (or no) dedicated staff
EGI, EGI-InSPIRE etc.
• EGI started operations on 1 May 2010
– Governed by EGI Council
– Executive Board reports to Council – Neil Geddes elected member of the EB
– Key staff now in Amsterdam (except Neasan)
– First Technical Forum will be held Sept 14-17 in Amsterdam
• EGI-InSPIRE also started
– Grant Agreement with EC not signed yet – so no money so far
• e-ScienceTalk will start 1 September
– funds UK staff at IC and QMUL
UKI CPU contribution (LHC)
CPU August 2010 – GStat2.0
Since April 2010
Country stats
Storage
• From GStat (and previous talks…)
[Figure panels: September 2008, March 2009, September 2009, April 2010]
• From GStat2.0 (today)
Experiments
ATLAS
• T1 data acceptance from CERN, T1s and T2s up from 79% to 96%
• Data availability in T2 storage is green, but this hides quite significant SE issues at some sites
LHCb
• Sharp drop in the proportion of production computing taking place in the UK, from 28% to 16% - early user jobs at CERN
• Issue with data transfer from the T2s to RAL (1.2.5)
• Ganga milestone delayed (Integrate XML job summary from Dirac into Ganga) due to setting up new DAST
CMS
• Some data loss at T1 and T2 but not considered significant by CMS
• Going well – CMS recognises the UK’s contribution
Other experiments
• MINOS, D0 and Babar mainly this quarter
• Red milestones for experiment satisfaction/user support questionnaire – waiting on ATLAS reply
Grid services
Operations
• 2.1.3 Fraction job slots used (Target 80%, achieved 37%). Overall occupancy low this quarter.
Security
• No incidents this quarter
Networking
• No red metrics. Second (resilient) OPN link from RAL is operational
Data and storage
• Record FTS transfer rates (2.4.4), with an average over 370 MB/s sustained over the whole quarter
• Still questions over published storage values
Tier-1
• T1 operating extremely well. Nearly all metrics for front-end systems at 100%.
• CASTOR SAM tests at 100% for the first time (3.4.8)
• Red metrics for farm occupancy (43%, against a target of 80%, 3.2.11)
• Red milestone for 2009 disk hardware accepted. One tranche of disk capacity failed acceptance – firmware fix and running again.
• Red milestone on moving out of Atlas centre – revised and will be met next quarter
Tier-2s
• % of promised CPU available – green for all Tier-2s (metric 2). % of disk red for NorthGrid, but procurements underway. Next quarter will be measured against 2010 pledge.
• SAM availability and reliability tests green or orange (so above 90%) for most Tier-2s (metrics 3&4). Range of issues at SouthGrid sites.
• Other red metrics:
• CPU utilisation (wall clock time & CPU time, metrics 7/8) LondonGrid, SouthGrid – but generally low
• Number of management meetings NorthGrid (metric 11)
• Staff changes at several sites (Durham, Glasgow, Manchester, QMUL)
Management and external
Project execution – red metrics
• All quarterly reports in by target time (though some earlier than others…)
• Red metric for no. of UB meetings
Rest of Map
• No red metrics
• EGEE/EGI metrics being revised to reflect EGI start
Risk register
• 3 high level risks
– Recruitment and retention – more of an issue as we get closer to GridPP4
– Sudden loss of key staff – as above
– Uncertain long term funding. GridPP4 approved, but government funding an issue everywhere
Finances
• Substantial reduction in the Tier-1 FY10 hardware line
– STFC requested reduced capital spend of £1.1m
– New experiment resource requirements from C-RRB in April 2010. Overall (to 2015) reduction in disk and CPU but increase in custodial storage.
• Second tranche of Tier-2 hardware grants all issued
• Bridging posts for EGEE-funded staff
• Travel costs £173k for 09/10 – within budget
• Small amount of funding for R-GMA over 6 months