wlcg service report [email protected] ~~~ wlcg management board, 24 th november 2009 1

10
WLCG Service Report [email protected] [email protected] ~~~ WLCG Management Board, 24 th November 2009 1

Upload: charla-mccarthy

Post on 04-Jan-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

WLCG Service Report

[email protected] [email protected] ~~~

WLCG Management Board, 24th November 2009

1

Page 2: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

Introduction• Covers the weeks 9th to 22th November.

• Mixture of problems

• Incidents leading to (eventual) service incident reports• Kernel upgrade at CNAF took too long (4 days)• CASTOR SRM problems at CERN• GGUS: impossible to ALARM tickets last Friday

2

Page 3: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

Meeting Attendance Summary

3

Site M T W T F

CERN Y Y Y Y Y

ASGC Y Y Y Y Y

BNL Y Y Y Y

CNAF Y Y Y Y

FNAL

FZK Y Y Y Y Y

IN2P3 Y Y Y

NDGF

NL-T1 Y Y Y Y Y

PIC Y Y Y

RAL Y Y Y Y

TRIUMF

Page 4: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

GGUS summary (2 weeks)

VO User Team Alarm Total

ALICE 0 0 0 0

ATLAS 29 91 1 121

CMS 2 1 0 3

LHCb 5 13 0 18

Totals 36 105 1 142

4

Page 5: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

Alarm tickets

• There was one alarm ticket submitted by Atlas• Disk space problem at SARA on 8th November

Page 6: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

GGUS

• with the LHC startup it is now urgent to make sure GGUS infrastructure ensures a 24/7 availability• See https://savannah.cern.ch/support/?101122

Page 7: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

7

Page 8: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

SRM problems at CERN

• Wrong status returned• Thread exhaustion• Core dumps• For example on 20th November

• Problems exporting from CASTORATLAS to Tier1s• https://twiki.cern.ch/twiki/bin/view/FIOgroup/

PostMortem20Nov09

Page 9: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

Miscellaneous Reports

• CMS MC production issues with WMS at CNAF due to Condor bug (fix successfully tested)

• SRM instabilities at BNL (problem now understood)• dCache golden release deployed at SARA, IN2P3 and KIT• Dashboard DB migration at CERN

• OK for Atlas• Low performance after migration for CMS (understood and

fixed)

• AFS problems at CERN: is the service critical for experiments and what is the coverage?

• Fibre cut to PIC (between Barcelona and Madrid)• Incorrectly labeled cartridges at ASGC• CREAM CEs for Alice at CERN are unstable• Nikhef: compute capacity increased, network

infrastructure upgraded to 160 Gbps• PPS has been replaced by staged rollouts for middleware 9

Page 10: WLCG Service Report Jean-Philippe.Baud@cern.ch ~~~ WLCG Management Board, 24 th November 2009 1

Summary/Conclusions

• Kernel security updates went well at all sites but took far too long at CNAF.

• Quite a few SRM instabilities at several sites for different reasons

• CREAM CEs are still unstable

10