Site report: CERN
Helge Meinhard (at) cern ch
HEPiX fall 2004 @ BNL
Events
50 years of CERN
  29 Sep: LHC ring illumination
  16 Oct: Open day at CERN: 32’000 visitors
  19 Oct: Official ceremony with VIPs
CHEP Interlaken, 27 Sep – 01 Oct
  Record-breaking attendance: more than 500
  Lot of work… and lot of fun to see it all run well
Fabric Infrastructure and Operations (1)
Central Data Recording in full swing
  Experiment peaks of 120 MB/s, combined average 10 TB/day, 1.5 PB expected in 2004
  Major problem with disk server stability
    Finally whole batch, i.e. 1200 disks (~20 % of our total disk count), replaced
Many stateful servers successfully quattorised (J. van Eldik)
Machine room refurbishment going on
  LHS of machine room now empty, all machines and services moved to RHS and vault
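As a rough cross-check of the Central Data Recording figures above, the sketch below converts the 10 TB/day combined average into a sustained rate and estimates the total disk count implied by "1200 disks (~20 %)". It assumes decimal units (1 TB = 10^6 MB), which the slide does not state, and the function names are purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in TB to a sustained rate in MB/s.

    Assumption: decimal units, i.e. 1 TB = 1_000_000 MB.
    """
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def implied_total(part: float, fraction: float) -> float:
    """Total count implied by a part that makes up `fraction` of the whole."""
    return part / fraction


average_mb_s = tb_per_day_to_mb_per_s(10)   # combined average from the slide
total_disks = implied_total(1200, 0.20)     # "~20 %" is approximate

print(f"10 TB/day is about {average_mb_s:.1f} MB/s sustained")
print(f"1200 disks at ~20 % implies about {total_disks:.0f} disks in total")
```

Under these assumptions the combined average comes out close to the quoted 120 MB/s experiment peak, which shows how little headroom there is between the average and the peak load.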
Fabric Infrastructure and Operations (2)
New machines
  400 farm machines being installed, alleviating pressure on batch farm
  Awaiting 75 disk servers (360 TB SATA, 3Ware 9500), plus 22 disk arrays (140 TB SATA, Infortrend, FC attachment)
System administration team taking over more specialised tasks and installations
Legato phased out, new machine for TSM coming
Fabric Infrastructure and Operations (3)
Media migration to 9940 completed, evaluations of new tape drives coming
Purchasing for 2006…2008 becoming clearer
Lxbatch field-proving Quattor again and again (T. Kleinwort)
Preparing for LSF 6
Architecture and Data Challenges (1)
Worked with FNAL on SL: SLC3 in final phase of certification
  Production-like systems for tests
Support from RedHat started in July
OpenLab
  Voltaire joined as contributing partner
  Project proposal on security
  Disk server studies: 4-way Itanium SIAB (Storage In A Box), 2 x 3Ware 9500, 24 SATA disks: 750 MB/s read, 340 MB/s write, 350 MB/s read over 10GbE (J. Iven)
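To put the disk server numbers above in perspective, the following sketch breaks the aggregate rates down per disk and expresses the network read rate as a fraction of the link. It is a back-of-the-envelope calculation only: it assumes decimal megabytes, that all 24 disks contribute equally, and a nominal 10 Gbit/s line rate with protocol overhead ignored. None of this is stated on the slide.

```python
def per_disk_mb_s(aggregate_mb_s: float, n_disks: int) -> float:
    """Average rate per disk, assuming every disk contributes equally."""
    return aggregate_mb_s / n_disks


def link_utilisation(mb_s: float, link_gbit_s: float = 10.0) -> float:
    """Fraction of the nominal link rate used (decimal units, no overhead)."""
    return mb_s * 8 / (link_gbit_s * 1000)


N_DISKS = 24

read_per_disk = per_disk_mb_s(750, N_DISKS)    # local read
write_per_disk = per_disk_mb_s(340, N_DISKS)   # local write
net_fraction = link_utilisation(350)           # read over 10GbE

print(f"read:  {read_per_disk:.2f} MB/s per disk")
print(f"write: {write_per_disk:.2f} MB/s per disk")
print(f"10GbE: {net_fraction:.0%} of nominal line rate")
```

Under these assumptions the network read uses well under half of the nominal link rate, so the limit in that test would lie somewhere other than the raw link bandwidth.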
Architecture and Data Challenges (2)
New Castor successfully tested in full chain
OpenAFS: High load on big servers revealed bugs
  CERN providing fixes to OpenAFS
Still investigating migration to krb5
Consolidated HEPiX scripts released
Data Bases
Looking at Oracle 10g
  Strong collaboration with Oracle (2 Oracle-funded fellows in OpenLab)
  DataGuard (disaster recovery and scheduled upgrades)
  Oracle Streams (replication)
Preparing for a high-availability Oracle cluster (RAC) running under Linux
POOL passed real test with 400 TB of data (D. Duellmann)
Product Support (1)
Solaris 9 certification delayed, now targeted for end 2004
  Moving target
  Debugging took longer than expected, as did writing NCM components
  Solaris will be much more in line with Linux than before
  Will use software from Solaris Software Companion
Solaris security
  Certification machines compromised three times
  Awareness and requirements raised
  Considering Ipfilter, AV software, and automatic security updates
Product Support (2)
CVS services
  100 projects, 14 GB of source code, 4 new projects/month
  Some servers compromised by pserver exploit
  Finer control of Web-based access required
Monitoring migrating from UIMON to Lemon (M. Schröder)
Transition to new desktop support contract
  New contractor since 01 July
Licensing infrastructure (M. Schröder)
User and Document Services
Procedure rolling for new supply contract for CERN desktops
InDiCo (conference management tool) passed field test at CHEP with highest marks
CERN Computer Newsletter changed:
  Dedicated pages on computing in every other issue of CERN Courier
  More technical and CERN-specific info in online CNL
Internet Services
Windows Terminal Server now production service (R. Gaspar)
SMS 2003 client deployed everywhere (R. Otto)
More than 100 new servers, Windows 2003 server
Mail
  Secure (SSL) services for imap, pop, smtp as an option
  Accepting mail requires successful reverse DNS lookup and successful smtp connect
  Listbox service renewal (R. Gaspar)
  Spam fighting, Exchange 2003 (R. Otto)
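The mail acceptance rule above (reverse DNS lookup plus SMTP connect) could be sketched roughly as follows. This is only one plausible reading of the slide, not CERN's actual mail gateway logic: it assumes that "smtp connect" means a TCP connection back to port 25 of the host name found by the reverse lookup. The function names and the pluggable `lookup`/`probe` parameters are hypothetical and exist so the policy can be exercised without network access.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR host name for `ip`, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def smtp_reachable(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def accept_mail(
    ip: str,
    lookup: Callable[[str], Optional[str]] = reverse_dns,
    probe: Callable[[str], bool] = smtp_reachable,
) -> bool:
    """Accept only if the reverse lookup succeeds AND the SMTP probe succeeds."""
    name = lookup(ip)
    if name is None:
        return False          # no reverse DNS entry: reject
    return probe(name)        # reject if the sender cannot be reached
```

For a dry run without touching the network, stub functions can be passed in, for example `accept_mail("192.0.2.1", lookup=lambda ip: "mail.example.org", probe=lambda host: True)`.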
Security
Many incidents
  Sasser worm, CVS break-in, Windows machines taken over by intruders, Solaris problems, lxplus root compromised, …
Lessons:
  Tightening firewalls and patch handling
  Dual-boot machines are problematic
  Non-centrally managed machines are problematic
    Visitors’ laptops are very non-centrally managed…
  Local Windows admin account forcefully secured by strong password
  Viruses arrive at CERN faster than updated signatures
  P2P activities often the source of problems
Miscellaneous
Grid Deployment moving ahead (I. Bird)
  82 sites, 9000 CPUs in LCG2
ACB (Automatic Call-Back) on its way out
  Call-back and ‘0800’ numbers stopped in July
  Final switch-off end 2004
Pilot for IP telephony
Beam studies for the LHC à la SETI: LHC@home
LCG service challenges (HEPiX Edinburgh)
LCG MoU being prepared