FIO Services and Projects Post-C5, February 22nd 2002, Tony.Cass@cern.ch
Headline Services
– Physics Services: P=4FTE; M=800K; I=1,300K
– Computing Hardware Supply: P=6.35FTE; M=15K; I=200K (funded by sales uplift)
– Computer Centre Operations: P=3.7FTE; M=100K; I=635K
– Printing: P=3.5FTE; M=20K; I=175K
– Remedy Support: P=1.65FTE; M=215K; I=0
– Mac Support: P=1.25FTE; M=5K; I=50K
Projects
WP4
– Contribution to WP4 of the EU DataGrid
– P=0.5FTE
LCG
– Implementation of WP4 tools
– Active progress towards day-to-day management of large farms
– P=LCG allocation (2FTE?)
Computer Centre Supervision (PVSS)
– Test PVSS for CC monitoring. Prototypes in Q1, Q2 and Q4.
– P=2.7FTE, M=25K
B513 Refurbishment
– Adapt B513 for 2006 needs. Remodel the vault in 2002.
– P=0.3FTE, M=1,700K
Macintosh Support
Support for MacOS and applications, plus backup services.
– We only support MacOS 9, but this is out of date.
– No formal MacOS X support, but software is downloaded centrally for general efficiency.
Staffing level for Macintosh support is declining; now at 1.25FTE plus 50% of a service contract.
– Plus 0.25FTE for CARA—used by some PC people, not just Mac users.
Key work area in 2002 is the removal of AppleTalk access to printers.
– Migrate users to the lpr client (already used by Frank and Ludwig); see the sketch after this list.
– Streamlines the general printing service—and is another move towards an IP-only network. (LocalTalk was removed last year.)
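As a minimal illustration of the lpr migration target, the sketch below submits a file to a centrally managed IP print queue. The queue name is a placeholder, not a real CERN queue, and the Python wrapper is only for illustration; users would normally call lpr directly.

```python
import subprocess

def print_via_lpr(path, queue="example-queue"):
    """Send a file to an IP print queue with the BSD-style lpr client.

    "example-queue" is a placeholder; users would substitute the name
    of their nearest centrally managed printer.
    """
    subprocess.run(["lpr", "-P", queue, path], check=True)

if __name__ == "__main__":
    print_via_lpr("test-page.ps")
```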
Printing Support
Overall service responsibility is with FIO, but clearly much valuable assistance from
– PS for OS and software support for the central servers
– IS for Print Wizard
General aim is to have happy and appreciative users.
– Install printers, maintain them, replace toner as necessary, …
– Seems to be working: spontaneous outburst of public appreciation during January’s Desktop Forum.
– Promote and support projector installation in order to reduce (expensive) colour printing.
Working (if slowly) to improve remote monitoring of printers—enable pre-emptive action.
– Or (say it softly) a “Managed Print Service”
Computing Hardware Supply
Aim to supply standardised hardware (desktop PCs, portables, printers, Macs) to users rapidly and efficiently.
Migration to use of CERN’s BAAN package is almost complete. It increases efficiency through
– use of a standard stock management application,
– end-user purchases by Material Request rather than TID,
– streamlined ordering procedures.
Could we ever move to a “Managed Desktop” service rather than shifting boxes?
– Idea is appreciated outside IT but needs capital.
Service relies on the Desktop Support Contract…
Service also handles CPU and disk servers.
Remedy Support
“Remedy” was introduced to meet the needs of the Desktop Support and Serco contracts for workflow management—problem and activity tracking.
FIO supports two Remedy applications:
– PRMS for general problem tracking (the “3 level” support model)
» Used for the Desktop Contract (including the helpdesk) and within IT
– ITCM tracks direct CERN-to-Contractor activity requests for the Serco and Network Management contracts.
Do we need two different applications? Yes and no.
– Two distinct needs, but they could be merged.
» However, this isn’t a priority and effort is scarce.
» And don’t even ask about consolidated Remedy support across CERN!
PRMS and ITCM Developments
PRMS
– The continuing focus over the past couple of years has been to consolidate the basic service—integrating the many little changes that have been made to meet punctual needs.
– Outstanding requests for additional functionality include
» An improved “SLA trigger mechanism”—defining how and when (and to whom) to raise alarms if tickets are left untreated too long (see the sketch after this slide).
» A “service logbook” to track interventions on a single system
» Various small items, including a Palm interface
ITCM
– No firm developments planned, but many suggestions are floating around.
Overall, we need to migrate to Remedy 5… …and the available effort is limited.
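A minimal sketch of what an SLA trigger mechanism could look like, assuming an invented escalation ladder. The thresholds, ticket fields and recipients are placeholders, not actual PRMS/Remedy configuration.

```python
from datetime import datetime, timedelta

# Hypothetical escalation ladder: after this much inactivity, notify this role.
ESCALATION = [
    (timedelta(hours=4), "assigned supporter"),
    (timedelta(days=1), "support line manager"),
    (timedelta(days=3), "service manager"),
]

def sla_alarms(last_update, now=None):
    """Return who should be alerted for a ticket idle since last_update."""
    now = now or datetime.now()
    idle = now - last_update
    return [who for limit, who in ESCALATION if idle >= limit]

if __name__ == "__main__":
    opened = datetime.now() - timedelta(days=2)
    print(sla_alarms(opened))  # -> ['assigned supporter', 'support line manager']
```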
Physics Services
Last year’s reorganisation split “PDP” services across
– ADC: services to “push the envelope”
– PS: Solaris and engineering services
– FIO: everything else.
So, what is “everything else”?
– lxplus: main interactive service
– lxbatch: ditto for batch
– lxshare: time-shared lxbatch extension
– RISC remnants—mainly for the LEP experiments
– Much general support infrastructure
– First-line interface for physics support
Physics Service Concerns
– RISC Reduction, RISC Reduction, RISC Reduction
– Managing Large Linux Clusters
» Fabric Management
Fabric Management Concerns
– Software Installation — OS and Applications
– (Performance and Exception…) Monitoring
– Configuration Management
– Logistics
– State Management
Fabric Management Concerns: Software Installation (OS and Applications)
– We need rapid and rock-solid system and application installation tools (a toy sketch of the desired-state idea follows below).
– Development discussions are part of EDG/WP4, to which we contribute.
– Full-scale testing and deployment as part of the LCG project.
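The sketch below is only a toy illustration of desired-state installation on an RPM-based node, not the EDG/WP4 tools themselves; the package list is invented.

```python
import subprocess

# Hypothetical list of packages this node is supposed to carry.
DESIRED_PACKAGES = ["openssh", "lsf-client", "afs-client"]

def installed_packages():
    """Names of RPM packages currently installed on this node."""
    out = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\n"],
        capture_output=True, text=True, check=True,
    )
    return set(out.stdout.split())

def converge(desired):
    """Install whatever is declared but missing, so the node matches its profile."""
    have = installed_packages()
    for pkg in desired:
        if pkg not in have:
            # A real tool would fetch the package from a trusted repository;
            # here we simply hand a local file to the system installer.
            subprocess.run(["rpm", "-ivh", f"{pkg}.rpm"], check=True)

if __name__ == "__main__":
    converge(DESIRED_PACKAGES)
```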
Fabric Management Concerns: (Performance and Exception…) Monitoring
– The Division is now committed to testing PVSS as a monitoring and control framework for the computer centre (a generic sketch of the agent-side pattern follows below).
» The overall architecture remains as decided within PEM and WP4.
– The new “Computer Centre Supervision” project has 3 key milestones for 2002:
» “Brainless” rework of PEM monitoring with PVSS for the ~900 systems now being monitored; post-C5 presentation in March/April.
» Intelligent rework for Q2, then a wider system for Q4.
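PVSS is a commercial SCADA framework and its API is not reproduced here; the sketch below only illustrates the generic agent-side pattern assumed by the PEM/WP4 architecture of sampling a metric, comparing it with a threshold, and raising an alarm to a central collector. The collector address and threshold are invented.

```python
import os
import socket
import time

ALARM_COLLECTOR = ("monitor.example.org", 9999)  # hypothetical central collector
LOAD_THRESHOLD = 10.0                            # hypothetical threshold

def sample_load():
    # 1-minute load average; a real agent would sample many sensors.
    return os.getloadavg()[0]

def send_alarm(message):
    with socket.create_connection(ALARM_COLLECTOR, timeout=5) as s:
        s.sendall(message.encode())

def run_agent(poll_seconds=60):
    host = socket.gethostname()
    while True:
        load = sample_load()
        if load > LOAD_THRESHOLD:
            send_alarm(f"{host}: load {load:.1f} exceeds {LOAD_THRESHOLD}")
        time.sleep(poll_seconds)

if __name__ == "__main__":
    run_agent()
```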
Fabric Management Concerns: Configuration Management
– How do systems know what they should install?
– How does the monitoring system know what a system should be running?
– An overall configuration database is required (a toy lookup sketch follows below).
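A toy sketch of the configuration-database idea: installation and monitoring tools read one authoritative description of what each node should be. This is not the EDG/WP4 configuration system; node names, cluster names and fields are invented.

```python
from dataclasses import dataclass, field

@dataclass
class NodeProfile:
    cluster: str                                   # e.g. "lxbatch"
    os_release: str                                # e.g. "RedHat 7.2"
    packages: list = field(default_factory=list)   # what installation should provide
    services: list = field(default_factory=list)   # what monitoring should expect

# A real configuration database would be a managed service, not a dict.
CONFIG_DB = {
    "lxb0001": NodeProfile("lxbatch", "RedHat 7.2",
                           packages=["lsf-client"], services=["lsf"]),
    "lxp0001": NodeProfile("lxplus", "RedHat 7.2",
                           packages=["afs-client"], services=["sshd"]),
}

def expected_services(node):
    """Answer the monitoring system's question: what should run here?"""
    return CONFIG_DB[node].services

if __name__ == "__main__":
    print(expected_services("lxb0001"))
```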
Fabric Management Concerns: Logistics
– How do we keep track of 20,000+ objects? We can’t manage 5,000 objects today.
» Where are they all? (Feb 9th: some systems couldn’t be found.)
» Which are in production? New? Obsolete? And which are temporarily out of service?
– How do physical and logical arrangements relate? (A toy inventory sketch follows below.)
» Where is this service located?
» What happens if this normabarre/PDU fails?
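A toy sketch of an inventory that links the physical and logical views. All names (racks, PDUs, services) are invented; the point is that one record per box ties location, power feed, life-cycle status and service together, so "what fails if this PDU fails?" becomes a simple query.

```python
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    rack: str
    pdu: str       # normabarre / power distribution unit feeding the box
    status: str    # "new", "production", "out-of-service", "obsolete"
    service: str   # logical service, e.g. "lxbatch"

INVENTORY = [
    Box("lxb0001", "vault-R01", "PDU-03", "production", "lxbatch"),
    Box("lxb0002", "vault-R01", "PDU-03", "out-of-service", "lxbatch"),
    Box("disk0007", "vault-R12", "PDU-05", "production", "disk-server"),
]

def impact_of_pdu_failure(pdu):
    """Which production services lose boxes if this PDU fails?"""
    return sorted({b.service for b in INVENTORY
                   if b.pdu == pdu and b.status == "production"})

if __name__ == "__main__":
    print(impact_of_pdu_failure("PDU-03"))
```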
Fabric Management Concerns: State Management
– What needs to be done to move this box
» from reception to a final location,
» to be part of a given service?
– What procedures should be followed if a box fails (after automatic recovery actions, naturally)?
– This is workflow management
» that should integrate with overall workflow management.
(A toy life-cycle sketch follows below.)
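A toy sketch of box life-cycle state management. The states and transitions are invented, not a documented CERN workflow; they only illustrate treating "move this box into a service" as a workflow with explicit, checkable steps.

```python
# Allowed transitions between life-cycle states of a box.
ALLOWED = {
    "reception":         {"installed-in-rack"},
    "installed-in-rack": {"in-service", "repair"},
    "in-service":        {"repair", "retired"},
    "repair":            {"in-service", "retired"},
    "retired":           set(),
}

class BoxWorkflow:
    def __init__(self, name):
        self.name = name
        self.state = "reception"

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state} to {new_state}")
        # A real system would trigger the procedures attached to this
        # transition (cabling checks, installation, monitoring sign-off...).
        self.state = new_state

if __name__ == "__main__":
    box = BoxWorkflow("lxb0001")
    box.move_to("installed-in-rack")
    box.move_to("in-service")
    print(box.name, "is now", box.state)
```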
Fabric Management Concerns
Work on these items is the FIO contribution to the Fabric Management part of the LHC Computing Grid project.
– Detailed activities and priorities will be set by LCG.
» They are providing the additional manpower!
» A planning document is being prepared now, based on input from FIO and ADC.
… And where do the clusters go?
Estimated Space and Power Requirements for LHC Computing
– 2,500m2 — an increase of ~1,000m2
– 2MW — a nominal increase of 800kW (1.2MW above the current load)
Conversion of the Tape Vault to a Machine Room area was agreed at the post-C5 in June 2001.
– Best option for space provision
– Initial cost estimate of 1,300-1,400KCHF
We are converting the tape vault to a Machine Room area of ~1,200m2 with
– False floor, finished height of 70cm
– 6 “in room” air conditioning units
» Total cooling capacity: 500kW
– 5 130kW electrical cabinets (a quick arithmetic check of these figures follows below)
» Double power input
» 5 or 6 20kW normabarres/PDUs
» 3-4 racks of 44 PCs per normabarre
– 2 130kW cabinets supplying the “critical equipment area”
» Critical equipment can be connected to each PDU
» Two zones, one for network equipment, one for other critical services
– Smoke detection, but no fire extinction
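A back-of-the-envelope check of the figures above, assuming the stated configuration (5 general-purpose 130kW cabinets, 5-6 normabarres/PDUs of 20kW each, 3-4 racks of 44 PCs per normabarre). The per-PC wattage is a derived estimate, not a figure from the slides.

```python
CABINETS = 5
CABINET_KW = 130
NORMABARRES_PER_CABINET = (5, 6)
NORMABARRE_KW = 20
RACKS_PER_NORMABARRE = (3, 4)
PCS_PER_RACK = 44
COOLING_KW = 500

# Check that the normabarres on one cabinet stay within its 130kW rating.
for n_pdu in NORMABARRES_PER_CABINET:
    print(f"{n_pdu} normabarres -> {n_pdu * NORMABARRE_KW}kW per {CABINET_KW}kW cabinet")

# Derive the implied power budget per PC on a 20kW normabarre.
for racks in RACKS_PER_NORMABARRE:
    pcs = racks * PCS_PER_RACK
    watts_per_pc = NORMABARRE_KW * 1000 / pcs
    print(f"{racks} racks ({pcs} PCs) on a 20kW normabarre -> ~{watts_per_pc:.0f}W per PC")

print(f"General-purpose electrical capacity: {CABINETS * CABINET_KW}kW; "
      f"cooling capacity: {COOLING_KW}kW")
```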
Vault Conversion
The Next Steps
Create a new substation for B513
– To power 2MW of computing equipment plus air-conditioning and ancillary loads.
– Included in the site-wide 18kV loop—more redundancy.
– Favoured location: underground, but with the 5 transformers on top.
Refurbish the main Computer Room once enough equipment has moved to the vault.
Summary
Six Services
– Physics Services
– Computing Hardware Supply
– Computer Centre Operations
– Remedy Support
– Printing
– Macintosh Support
Service Developments to
– Follow natural developments
» Remedy 5, LSF 4.2, RedHat 7.2
– Streamline provision of existing services
» To reduce “P”—c.f. BAAN for Hardware Supply
» To manage more—c.f. developments for Physics Services
Four Projects
– Computer Centre Supervision
– B513 Refurbishment
– Fabric Management Development (EDG)
– Fabric Management Implementation (LCG)