
FIO Services and Projects

Post-C5, February 22nd 2002

Tony.Cass@CERN.ch

Headline Services

Physics Services
– P=4FTE; M=800K; I=1,300K
Computing Hardware Supply
– P=6.35FTE; M=15K; I=200K (funded by sales uplift)
Computer Centre Operations
– P=3.7FTE; M=100K; I=635K
Printing
– P=3.5FTE; M=20K; I=175K
Remedy Support
– P=1.65FTE; M=215K; I=0
Mac Support
– P=1.25FTE; M=5K; I=50K

Projects

WP4
– Contribution to WP4 of the EU DataGrid
– P=0.5FTE
LCG
– Implementation of WP4 tools
– Active progress towards day-to-day management of large farms
– P=LCG allocation (2FTE?)
Computer Centre Supervision (PVSS)
– Test PVSS for CC monitoring. Prototypes in Q1, Q2 and Q4.
– P=2.7FTE, M=25K
B513 Refurbishment
– Adapt B513 for 2006 needs. Remodel the vault in 2002.
– P=0.3FTE, M=1,700K

Macintosh Support

Support for MacOS and applications, plus backup services.
We support only MacOS 9, but this is out of date. There is no formal MacOS X support, but software is downloaded centrally for general efficiency.
The staffing level for Macintosh support is declining; now at 1.25FTE plus a 50% service contract.
– Plus 0.25FTE for CARA—used by some PC people, not just Mac users.
The key work area in 2002 is removal of AppleTalk access to printers.
– Migrate users to the lpr client, already used by Frank and Ludwig (a sketch of lpr submission follows below).
– Streamlines the general printing service—and is another move towards an IP-only network. (LocalTalk was removed last year.)
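For illustration only, printing over IP with the lpr client reduces to naming a queue and a file; the queue name below is invented for the example, not a real CERN printer.

```python
# Hedged sketch: submit a file to a named print queue with the standard
# lpr client. The queue name "b513-pub" is invented for illustration.
import subprocess

def print_file(path: str, queue: str = "b513-pub") -> None:
    """Send a document to an IP print queue via lpr."""
    subprocess.run(["lpr", "-P", queue, path], check=True)

print_file("report.ps")
```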

Printing Support

Overall service responsibility is with FIO, but clearly much valuable assistance from
– PS for OS and software support for the central servers
– IS for Print Wizard
The general aim is to have happy and appreciative users.
– Install printers, maintain them, replace toner as necessary, …
– Seems to be working: a spontaneous outburst of public appreciation during January’s Desktop Forum.
– Promote and support projector installation in order to reduce (expensive) colour printing.
Working (if slowly) to improve remote monitoring of printers—to enable pre-emptive action.
– Or (say it softly) a “Managed Print Service”

Computing Hardware Supply

Aim to supply standardised hardware (desktop PCs, portables, printers, Macs) to users rapidly and efficiently.
Migration to CERN’s BAAN package is almost complete. It increases efficiency through
– use of a standard stock management application,
– end-user purchases by Material Request rather than TID,
– streamlined ordering procedures.
Could we ever move to a “Managed Desktop” service rather than shifting boxes?
– The idea is appreciated outside IT, but it needs capital.
The service relies on the Desktop Support Contract…
The service also handles CPU and disk servers.

Remedy Support

“Remedy” was introduced to meet the needs of the Desktop Support and Serco contracts for workflow management—problem and activity tracking.
FIO supports two Remedy applications:
– PRMS for general problem tracking (the “3 level” model for support)
» Used for the Desktop Contract (including the helpdesk) and within IT
– ITCM, which tracks direct CERN-to-Contractor activity requests for the Serco and Network Management Contracts.
Do we need two different applications? Yes and no.
– Two distinct needs, but they could be merged.
» However, this isn’t a priority and effort is scarce.
» And don’t even ask about consolidated Remedy support across CERN!

PRMS and ITCM Developments

PRMS
– The continuing focus over the past couple of years has been to consolidate the basic service—integrating the many little changes that have been made to meet ad hoc needs.
– Outstanding requests for additional functionality include:
» An improved “SLA trigger mechanism”—defining how and when (and to whom) to raise alarms if tickets are left untreated for too long (see the sketch after this slide).
» A “service logbook” to track interventions on a single system
» Various small items, including a Palm interface
ITCM
– No firm developments planned, but many suggestions are floating around.
Overall, we need to migrate to Remedy 5… and available effort is limited.
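To make the SLA trigger idea concrete, here is a minimal sketch of the escalation logic; the priority limits and the notify() stand-in are assumptions for illustration, not the PRMS design.

```python
# Minimal sketch of an SLA trigger (illustration only, not PRMS code):
# raise an alarm when a ticket has been left untreated beyond the limit
# for its priority. Limits and the notify() target are assumed values.
from datetime import datetime, timedelta

SLA_LIMITS = {"critical": timedelta(hours=4), "normal": timedelta(days=2)}

def notify(address: str, message: str) -> None:
    # Stand-in for the real alarm channel (mail, paging, ...).
    print(f"ALARM to {address}: {message}")

def check_sla(opened: datetime, priority: str, owner: str) -> None:
    """Escalate if the ticket is older than its SLA limit allows."""
    age = datetime.now() - opened
    if age > SLA_LIMITS[priority]:
        notify(owner, f"{priority} ticket untreated for {age}")
```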


Physics Services

Last year’s reorganisation split the “PDP” services across:
– ADC: services to “push the envelope”
– PS: Solaris and engineering services
– FIO: everything else.
So, what is “everything else”?
– lxplus: the main interactive service
– lxbatch: ditto, for batch
– lxshare: time-shared lxbatch extension
– RISC remnants—mainly for the LEP experiments
– Much general support infrastructure
– First-line interface for physics support

Physics Service Concerns

RISC Reduction. RISC Reduction. RISC Reduction.
Managing Large Linux Clusters
– Fabric Management

Fabric Management Concerns

Software Installation — OS and Applications
(Performance and Exception…) Monitoring
Configuration Management
Logistics
State Management

Fabric Management Concerns: Software Installation — OS and Applications

– We need rapid and rock-solid system and application installation tools (the sketch below illustrates the desired-state idea).
– Development discussions are part of EDG/WP4, to which we contribute.
– Full-scale testing and deployment as part of the LCG project.
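As a toy illustration of a desired-state approach (an assumption made for this sketch, not the WP4/EDG design), installation reduces to comparing a node's target package set with what is actually installed:

```python
# Toy sketch of desired-state installation (not the WP4/EDG design):
# compare a node's target package set with what is installed and act
# on the difference. Package names here are invented examples.
def reconcile(target: set[str], installed: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_install, to_remove) for one node."""
    return target - installed, installed - target

to_install, to_remove = reconcile(
    target={"kernel", "lsf", "afs-client"},
    installed={"kernel", "afs-client", "games"},
)
# to_install == {"lsf"}; to_remove == {"games"}
```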

Fabric Management Concerns: (Performance and Exception…) Monitoring

– The Division is now committed to testing PVSS as a monitoring and control framework for the computer centre (a generic sketch of the idea follows below).
» The overall architecture remains as decided within PEM and WP4.
– The new “Computer Centre Supervision” project has 3 key milestones for 2002:
» A “brainless” rework of PEM monitoring with PVSS; 900 systems are now being monitored. Post-C5 presentation in March/April.
» An intelligent rework for Q2, then a wider system for Q4.
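Exception monitoring boils down to checking measured values against per-metric thresholds and raising alarms. The sketch below shows only that generic idea; PVSS is a commercial SCADA framework and works quite differently, and the metric names and limits are invented.

```python
# Generic exception-monitoring idea only; PVSS itself is a commercial
# SCADA product. Metric names and thresholds are invented examples.
THRESHOLDS = {"load": 10.0, "disk_used_pct": 95.0}

def exceptions(node: str, metrics: dict[str, float]) -> list[str]:
    """Return one alarm string per metric that exceeds its threshold."""
    return [f"{node}: {name}={value} exceeds limit {THRESHOLDS[name]}"
            for name, value in metrics.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]

print(exceptions("lxbatch042", {"load": 12.3, "disk_used_pct": 71.0}))
```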

Fabric Management Concerns: Configuration Management

– How do systems know what they should install?
– How does the monitoring system know what a system should be running?
– An overall configuration database is required (a minimal sketch follows).
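A minimal sketch of what such a configuration database might hold: one authoritative record per node, consulted by both the installation tools and the monitoring system. Node names, fields and values are invented for illustration.

```python
# Invented illustration of a configuration database: a single
# authoritative description per node, consulted by both installation
# and monitoring. Field names and values are assumptions.
CONFIG_DB = {
    "lxbatch042": {
        "cluster": "lxbatch",
        "os": "RedHat 7.2",
        "packages": ["kernel", "lsf", "afs-client"],
        "services": ["lsf", "sshd"],
    },
}

def expected_services(node: str) -> list[str]:
    """What the monitoring system should expect to find running."""
    return CONFIG_DB[node]["services"]

def target_packages(node: str) -> list[str]:
    """What the installation tools should put on the node."""
    return CONFIG_DB[node]["packages"]
```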

Fabric Management Concerns: Logistics

– How do we keep track of 20,000+ objects? We can’t manage 5,000 objects today.
» Where are they all? (Feb 9th: some systems couldn’t be found.)
» Which are in production? New? Obsolete? And which are temporarily out of service?
– How do physical and logical arrangements relate? (See the sketch after this list.)
» Where is this service located?
» What happens if this normabarre/PDU fails?
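One way to make "physical meets logical" concrete is an inventory record per box that carries both its service role and its power path. Everything named below (states, rack labels, PDU names) is invented for illustration.

```python
# Invented illustration: one inventory record per box, linking its
# logical role to its physical location and power feed, so questions
# like "what dies if this PDU fails?" become simple queries.
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    state: str     # e.g. "new", "production", "obsolete", "out of service"
    service: str   # logical role, e.g. "lxbatch"
    rack: str      # physical location
    pdu: str       # normabarre/PDU feeding the rack

INVENTORY = [
    Box("lxbatch042", "production", "lxbatch", "vault-R07", "PDU-3"),
    Box("lxplus001", "production", "lxplus", "vault-R02", "PDU-1"),
]

def affected_by(pdu: str) -> list[str]:
    """Boxes (and hence services) that lose power if this PDU fails."""
    return [box.name for box in INVENTORY if box.pdu == pdu]
```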

Fabric Management Concerns: State Management

– What needs to be done to move this box
» from reception
» to a final location
» to being part of a given service?
– What procedures should be followed if a box fails (after automatic recovery actions, naturally)?
– This is workflow management (a lifecycle sketch follows),
» and it should integrate with overall workflow management.
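The box lifecycle can be read as a small state machine. The states and allowed transitions below are assumptions sketched for illustration, not an agreed FIO workflow.

```python
# Assumed lifecycle states and transitions for a box, for illustration
# only. The real workflow would be agreed with the service managers.
TRANSITIONS = {
    "reception": {"installed"},
    "installed": {"production"},
    "production": {"failed", "out of service", "obsolete"},
    "failed": {"production", "obsolete"},   # repaired, or retired
    "out of service": {"production"},
}

def move(state: str, target: str) -> str:
    """Advance a box to its next lifecycle state if the move is legal."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target

state = move("reception", "installed")   # OK
state = move(state, "production")        # OK
```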

Fabric Management Concerns

Work on these items is the FIO contribution to the Fabric Management part of the LHC Computing Grid Project.
– Detailed activities and priorities will be set by LCG.
» They are providing the additional manpower!
» A planning document is being prepared now, based on input from FIO and ADC.

… And where do the clusters go?

Estimated space and power requirements for LHC computing:
– 2,500m² — an increase of ~1,000m²
– 2MW — a nominal increase of 800kW (and 1.2MW above the current load)
Conversion of the tape vault to Machine Room area was agreed at the post-C5 in June 2001.
– The best option for space provision
– Initial cost estimate of 1,300-1,400KCHF

Vault Conversion

We are converting the tape vault to a Machine Room area of ~1,200m² with:
– A false floor, finished height of 70cm
– 6 “in room” air conditioning units
» Total cooling capacity: 500kW
– 5 130kW electrical cabinets
» Double power input
» 5 or 6 20kW normabarres/PDUs
» 3-4 racks of 44 PCs per normabarre
– 2 130kW cabinets supplying the “critical equipment area”
» Critical equipment can be connected to each PDU
» Two zones: one for network equipment, one for other critical services
– Smoke detection, but no fire extinction
(A quick check of the power arithmetic follows.)
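A back-of-envelope check of the figures above, taking the upper bound of 4 racks of 44 PCs per 20kW normabarre. The per-PC wattage is derived here, not stated on the slide.

```python
# Back-of-envelope check of the vault power figures above. The
# per-PC budget is derived, not stated on the slide.
normabarre_kw = 20
pcs_per_normabarre = 4 * 44                    # upper bound: 4 racks of 44 PCs
watts_per_pc = normabarre_kw * 1000 / pcs_per_normabarre
print(f"~{watts_per_pc:.0f} W per PC")         # ~114 W, plausible for 2002 PCs

main_cabinets_kw = 5 * 130                     # the five main cabinets
print(f"{main_cabinets_kw} kW electrical vs 500 kW cooling capacity")
```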

[Photos: the vault not so long ago; an almost empty vault; one air conditioning room is gone; the next is on the way out.]

The Next Steps

Create a new substation for B513
– To power 2MW of computing equipment plus air-conditioning and ancillary loads.
– Included in the site-wide 18kV loop—more redundancy.
– Favoured location: underground, but with 5 transformers on top.
Refurbish the main Computer Room once enough equipment has moved to the vault.

[Images: view from B31 today; view from B31 with the substation; looking towards B31.]

Summary

Six services:
– Physics Services
– Computing Hardware Supply
– Computer Centre Operations
– Remedy Support
– Printing
– Macintosh Support
Service developments to:
– Follow natural developments
» Remedy 5, LSF 4.2, RedHat 7.2
– Streamline provision of existing services
» To reduce “P”—c.f. BAAN for Hardware Supply
» To manage more—c.f. developments for Physics Services
Four projects:
– Computer Centre Supervision
– B513 Refurbishment
– Fabric Management Development (EDG)
– Fabric Management Implementation (LCG)