
Console Infrastructure in the CERN Computer Centre

Helge.Meinhard (at) cern.ch

HEPiX / HEPNT Autumn 2003

Vancouver

Mostly work done by [email protected]

HEPiX Vancouver: Console management at CERN, Helge.Meinhard (at) cern.ch

The problem

• CERN CC is running large farms
  • CPU servers: now 1500 boxes, 6000* in 2006
  • Disk/tape servers: now 300 boxes, 1200* in 2006
  • *) Error bar: ~ factor 2
• Attempt at high-level management solution: ELFms (T. Kleinwort)
• Low-level problems
  • E.g. machine unpingable
  • Console access and/or reset required


Existing solutions…

… do not scale


Requirements

Considered systematically in summer 2003

Main points:
• Remote console access
  • To boot loader and operating system (Linux)
  • Preferably to BIOS as well
• Remote reset
  • ATX reset and/or
  • ATX power on/off and/or
  • Remote power cycling


Options

[Options comparison table not preserved in transcript]
(1 CHF = 0.75 USD = 0.65 EUR = 1 CAD)
†: yes, but…


Prototypes

• Serial daisy-chaining
  • Up to 4 nodes
  • BIOS, boot loader, OS
  • Console: minicom
  • But few boards come with two serial lines these days…
• Remote reset

[Figure labels: port 0 and port 1, repeated four times]


Decisions

• Infrastructure for serial console via serial cards in PCs to be deployed
• Nothing else for now (no remote reset etc.)
  • 24 x 7 operator coverage can step in
  • Many services are redundant
• Specs for all new servers require support for
  • Redirection of BIOS to serial line…
  • … and controllable system behaviour (stay off vs. previous state) on power cycle


Serial infrastructure: head nodes

Dedicated head nodes vs. worker nodes serving as heads for a small number of peers

+ Cleaner – all worker nodes remain the same

+ Can be used for other head node applications (e.g. software distribution) if desired

– Extra investment, extra space

– If down, larger number of machines inaccessible via serial console

Decided in favour of dedicated head nodes


Concentration factor, scope

• Head nodes equipped with six 8-port cards
  • Complete head node (w/o serial cables) is about 1800 CHF
  • By far cheaper than a higher number of ports per console server, even though more console servers are needed
• Will equip the entire CERN computer centre
  • Machine rooms on ground floor and basement
  • Except Windows machines and machines dedicated to network services
• Procurement running for 75 head nodes
• Cross-connection of head nodes not decided yet
  • Some free ports on head nodes
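The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the slides, with one assumption: 44 machines per head node, taken from the schematic whose labels run to "Machine 1.44". The function and constant names are illustrative.

```python
# Figures quoted in the slides.
CARDS_PER_HEAD_NODE = 6
PORTS_PER_CARD = 8
HEAD_NODES = 75
HEAD_NODE_COST_CHF = 1800  # complete head node, without serial cables

# Assumption: 44 machines per head node, read off the schematic labels.
MACHINES_PER_HEAD_NODE = 44


def capacity():
    """Return port counts and rough costs for the planned deployment."""
    ports_per_node = CARDS_PER_HEAD_NODE * PORTS_PER_CARD
    return {
        "ports_per_node": ports_per_node,
        "total_ports": ports_per_node * HEAD_NODES,
        "machines_served": MACHINES_PER_HEAD_NODE * HEAD_NODES,
        "free_ports_per_node": ports_per_node - MACHINES_PER_HEAD_NODE,
        "cost_per_port_chf": HEAD_NODE_COST_CHF / ports_per_node,
        "total_cost_chf": HEAD_NODE_COST_CHF * HEAD_NODES,
    }


if __name__ == "__main__":
    for name, value in capacity().items():
        print(f"{name}: {value}")
```

Under these assumptions each head node offers 48 ports, of which 4 stay free, which would be consistent with the "some free ports" remark.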


Software

• Need a bit more than minicom
  • Logging into one of ~75 servers and requesting /dev/ttyS25 not going to scale
  • Authentication and authorisation
  • Logging of console output
• Started prototyping our own solution (Andras Horvath / CERN)
  • Put on hold when we learned (at HEPiX Amsterdam) of …
• Software by Chuck Boeheim (SLAC) used at SLAC, Fermi, LBL, …
  • Provides most of the functionality we require
  • CERN-specific extensions can be easily added (wrapper scripts)
  • Constructive discussions with Chuck, expect to share the work
  • Aim is one common code base


Software schematics

[Diagram labels:
• Hosts: xxx, pcitfionnn, lxplusnnn; Userapp
• CDB – config service
  • Machine – port @ head node mapping
  • User – machine authorisations
• Console server 1 … Console server 75, each with Serverproc, conf and log, connected via RS-232 to Machine n.1 … Machine n.44
• Console log repository]


Software components

• User application
  • Should run on all on-site Linux machines; Windows, Solaris?
• Console application on head nodes
  • Grants and logs access to serial lines
  • Logs console output
• Configuration service
  • Machine – port @ head node mapping
  • User – machine mapping (authorisation to access serial line)
• Store for console logs
  • Nothing on machines…
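The two mappings held by the configuration service can be illustrated with a minimal sketch. Everything named here is hypothetical: the machine, head node and user names, the in-memory dictionaries and the `resolve` function are placeholders for illustration, not the interface of CDB or of the SLAC console software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialLine:
    head_node: str  # head node the machine's console is cabled to
    port: int       # serial port number on that head node


# Hypothetical sample data standing in for the configuration service.
MACHINE_TO_LINE = {
    "machine-1-1": SerialLine("console-server-1", 0),
    "machine-1-2": SerialLine("console-server-1", 1),
}
USER_TO_MACHINES = {
    "operator": {"machine-1-1", "machine-1-2"},
    "alice": {"machine-1-1"},
}


def resolve(user, machine):
    """Return the serial line for `machine` if `user` is authorised."""
    if machine not in USER_TO_MACHINES.get(user, set()):
        raise PermissionError(f"{user} may not access the console of {machine}")
    try:
        return MACHINE_TO_LINE[machine]
    except KeyError:
        raise LookupError(f"no serial line configured for {machine}") from None


if __name__ == "__main__":
    print(resolve("operator", "machine-1-2"))
```

Checking the authorisation before the port lookup means an unauthorised user learns nothing about which machines are cabled where; whether the real service should behave that way is a design choice left open here.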


Software: TBD

• On our wishlist:
  • Authentication of head node towards user app, and of user towards server process on head node
  • Per-line control of access rights
  • (Possibility of) logging via syslog
• CERN-specific extensions being designed
  • Machine detection, feedback to config service
  • Wrapper around user app asking config service to provide mapping of machine to (port @) head node
  • Automatic creation of local config files on head nodes
  • Collection of console logs in central repository
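The automatic creation of local config files on head nodes could look roughly like the sketch below, which groups a central machine-to-port mapping by head node. The one-line-per-machine file format and the function name are assumptions for illustration; they are not the format used by the SLAC console software. Only the /dev/ttyS device naming comes from the slides.

```python
from collections import defaultdict


def render_head_node_configs(mapping):
    """Group a central mapping by head node.

    `mapping` maps machine name -> (head node name, serial port number).
    Returns head node name -> config text, one "<machine> <device>" line
    per machine, sorted by port number (hypothetical format).
    """
    per_node = defaultdict(list)
    for machine, (head_node, port) in mapping.items():
        per_node[head_node].append((port, machine))

    configs = {}
    for head_node, lines in per_node.items():
        configs[head_node] = "".join(
            f"{machine} /dev/ttyS{port}\n" for port, machine in sorted(lines)
        )
    return configs


if __name__ == "__main__":
    # Hypothetical sample mapping.
    sample = {
        "machine-1-2": ("console-server-1", 1),
        "machine-1-1": ("console-server-1", 0),
        "machine-75-1": ("console-server-75", 0),
    }
    for node, text in render_head_node_configs(sample).items():
        print(f"# {node}\n{text}")
```

Generating the files from one central mapping keeps the head nodes free of hand-edited state, which fits the stated aim of holding the mapping in the config service.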


Status, outlook

• HW: Orders for head nodes, serial cards, cables out or being finalised
  • Expected delivery: second half of November 2003
• SW: Started discussing and investigating adaptations; CERN-specific elements being designed
• Hope to have first head node ready in time for next disk server delivery (early December; no KVM switches!)
• Full deployment will run well into 2004