AB-CO-Technical Committee, 21st September 2006


Functionality of the Control System after infrastructure failures

Alastair Bland (AB/CO/IN), with help from Enzo Genuardi and Jean Juillard

http://ab-co-tech-committee.web.cern.ch/ab-co-tech-committee/


Introduction

Intended audience

Who to call when there are problems

Building 874 infrastructure (access, powering, cooling)

Building 874 computers

First steps to take when there is an infrastructure failure

What the TI operators need to work

Power off (voluntary or forced) then power on in the CCC and CCR

Recommendations for the future

Annex: Simplified Timeline of 29/7/2006 power cut


Intended Audience

Anyone who might have to deal with a major power, cooling or network problem affecting the AB Control System and in particular the Control Room environment

– Technical Infrastructure (TI) operators

– Accelerator operators dealing with Access, Radiation, etc. (getting beams back is not covered: this requires Timing Experts and a lot of Front End rebooting!)

– AB/CO Exploitation Team

– AB/CO Specialists

– AB/CO Supervisors

Emphasis is on Building 874, Prevessin


Who to call when there are problems

When there is a serious power or network problem the first difficulty in the modern paperless world is who to call and what is their number!

– Windows: the phone book should be cached on the hard disk if the user has already opened it on that computer

– Linux: in an xterm type: /usr/bin/phone SURNAME

This will not work if IT Building 513 is powered down or unreachable because of network problems. Starting an xterm is difficult when the AB/CO NFS file servers ABSRV1 or CS-CCR-FEOP are not available.

– Legacy HPUX: in an xterm type: cd /user/pcrops/production/phonex then ./xpb

This program has certain advantages over all the others but the database may not be up to date.

Do not forget that your own portable phone usually has a good list of colleagues. If Sunrise GSM does not work switch to another operator if your subscription allows this.


Useful phone numbers

NAME                     Group      Speciality                  Numbers (Telephone, GSM, Home; in the order listed)
SCHMICKLER Hermann       AB/CO      Group Leader                77078  74788  164004
SICARD Claude-Henri      AB/CO      Exploitation Manager        73071
CHARRUE Pierre           AB/CO/IN   Section Leader              75410  163230
BLAND Alastair           AB/CO/IN   Windows/Linux Software      75568  163727
GENUARDI Enzo            AB/CO/IN   Linux/HPUX Hardware         75537  163395
BAKKER Dirk              AB/CO/IN   Video distribution          75575  163235
BALLET-THOUBLE Rene      AB/CO/IN   Control Room hardware       75637  75144  163231
ELYN Jean-Michel         AB/CO/IN   Linux software              78754  163591
GLAFIROV Vladimir        AB/CO/IN   Windows/Linux Software      79369
SIGERUD Katarina         AB/CO/AP   Laser Alarm System          71464  79898  164648
STAPLEY Niall            AB/CO/AP   Laser Alarm System          75834  79898  160918
DE METZ-NOBLAT Nicolas   AB/CO/FE   Front Ends, Linux           73487  163070
SURBACK Guy              AB/CO/FE   Front Ends, remote reboot   72718

The CO Exploitation team could also be called, especially if the PS is affected.


AB/CO Organigram of September 2006


Building 874 plan (ground floor only)

Warning: this plan is old and does not show the corridor correctly!


How to get into Building 874

Normally you will need access privileges CCCPRIV and CCRPRIV

– Ask for this via EDH

– You need the special card with the RFID chip. It is worth testing that you can get in by waving your card in front of the CCC and CCR readers

During a power failure the access system to the CCC and CCR may fail

The glass doors to the CCC normally default to open without power

– CO have a key to the CCC external doors. As there should always be operators in the CCC this should not be needed

Apparently the access control for the outside doors fails to the locked state

– There is another way into Building 874

Apparently the access control for the CCR fails to locked as well

– The operators’ key can open the CCR

– I recommend blocking the CCR door open with a chair


CERN Control Centre (CCC)

Warning: this simulation is old, in particular the TI and Cryo desks are joined!


Powering and Cooling in the CCC

The CCC Cooling is powered by normal electricity

– The CCC does not overheat rapidly, so this is not a problem

There are two sources of normal power: EBD5 and EBD6

– These feed the ceiling lights and the sixteen 46 inch screens on the wall

– After a power cut use the remote control to waken the wall screens

The UPS power comes from EOD3

– Like EOD2 in the CCR, this runs until it drops (no set time limit). On 29/7/2006 it ran for around an hour but was within a few tens of minutes of dropping

– All the Consoles should be on EOD3 and almost all the screens are too

The lights on the tables are on EOD3; they are the emergency lighting

The IP telephones are powered from the network starpoint

LHC island


Computer Control Room (CCR) and Network Starpoint

[Figure: CCR and Network Starpoint rack layout with logical UPS powering arrangement. Alastair Bland (AB/CO/IN), 19/09/2006, based on CCR layout diagram of Rene Ballet-Thouble and Claes Frisk]

Areas shown: CCR; IT/CS Network Starpoint (locked - TI Operators have key); Maintenance Lab (Dirk and Rene)

CCR rack rows: Intercom; Access; CATV Machines; Teletext Pages; ASOS SPS analog, sign.; TS/CV; RF - LHC; Central Timing; PC's from PS; HPUX servers and Linux PC's; CCC/CCR switches and routers

Rack labels legible in the plan: 1206 absrv1; 1210 samoa; 1214 elsrv1; 1216 stsrv1; RA7411 Beam Int.; network racks 606, 621, 917 and 1215. Also labelled: tcrsrv1, cs-ccr-feop, laser1 + tcrpl*, laser2 + tcrpl*, HP ProLiants, Ramses, WinXP Console, Linux Console

Starpoint equipment: TN Router; GPN router + tele; GPN Prev. Router; TN Switches (LHC / SPS); GPN Switches; fiber switches; Patch panels; IT/CS Servers; terminal; Sunrise GSM; Air Con.

Power sources marked: EOD1, EOD2, EOD9, EBD1, EBD4

Other items: Thermometer zone; CCRPRIV card reader; rechargeable lamp; wireless LAN; Air Con. Vent Out; Access to gallery

Rack legend: tiles with holes; network; green rack; blue rack; HP rack; console table; 6 x network outlets; 4 x network outlets


CCR computer area

[Figure: rack layout plan of the CCR computer area, with the Maintenance Lab (Dirk and Rene)]

Rack rows: Intercom; Access; CATV Machines; Teletext Pages; ASOS SPS analog, sign.; RF - LHC; Central Timing; PC's from PS; HPUX servers and Linux PC's

Rack labels legible in the plan: 1206 absrv1; 1210 samoa; 1214 elsrv1; 1216 stsrv1; RA7411 Beam Int.; network racks 606, 621, 917 and 1215. Also labelled: tcrsrv1, cs-ccr-feop, laser1 + tcrpl*, laser2 + tcrpl*, HP ProLiants, Ramses, WinXP Console, Linux Console

Power sources marked: EOD1, EOD2, EBD4

Other items: Thermometer zone; rechargeable lamp; wireless LAN; Air Con. Vent Out; Access to gallery


Powering and cooling in the CCR (general)

The cooling is powered by normal electricity.

– It sucks air out of the floor (“Air Con. Vent out” on the plan)

Once we lose normal power the CCR heats up very quickly

– The normal temperature for the rack of LASER2 is 24 degrees centigrade

– It was 33 degrees centigrade at 12:15 on 29/7/2006

– You must block open the door to the corridor, open outside doors, open starpoint door (key from TI operator).

– If more than 35 degrees: open outside door of starpoint and door/windows of maintenance lab. If still too hot you must start switching off equipment
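The temperature rules above can be sketched as a small checklist function. This is only an illustration of the slide's thresholds; the function name and the wording of the steps are assumptions, and the temperature is taken to be the reading for the LASER2 rack.

```python
def ccr_cooling_actions(rack_temp_c: float, normal_power_lost: bool) -> list[str]:
    """Return the manual cooling steps listed on the slide, in order."""
    if not normal_power_lost:
        return []  # the air conditioning is assumed to be running
    actions = [
        "block open the CCR door to the corridor",
        "open the outside doors",
        "open the starpoint door (key from TI operator)",
    ]
    if rack_temp_c > 35:
        # Extra steps from the slide once the rack is above 35 degrees centigrade
        actions += [
            "open the outside door of the starpoint",
            "open the door and windows of the maintenance lab",
            "if still too hot, start switching off equipment",
        ]
    return actions


if __name__ == "__main__":
    # 33 degrees was the reading at 12:15 on 29/7/2006
    for step in ccr_cooling_actions(rack_temp_c=33, normal_power_lost=True):
        print(step)
```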

The ceiling lights are on normal power

– There is a rechargeable lamp in the Maintenance lab. Use it as you do not want to fall through the false floor if it is open!

Power is distributed to racks via “normabarres”. The source is clearly labeled.

Please check whether any of the “multiprises” have tripped, either on overload or on the 30 mA earth-leakage detection


Powering and cooling in the CCR (computers)

EOD1 and EOD2 give UPS power. There are no batteries in the CCR; EOD1 and EOD2 are fed from the old TCR (building 212) diesels, from normal power, or from batteries in SE0 (building 924, Prevessin).

The HP type racks are powered from EOD1 and EOD2:

– EOD1 is cut after 10 minutes (this occurred on 29/7/2006)

– EOD2 runs until it drops (it ran without cutting on 29/7/2006)

– HP Proliants, HP network switches in the “starpoint deporté” and disks of ABSRV1 and TCRSRV1 are really dual powered

– Keyboard, Video and Mouse (KVM) switches and the CRT or TFT screens for HP Proliants are powered by EOD1 or EOD2

– All HPUX systems including the two boxes forming ABSRV1 and TCRSRV1 are powered by EOD1 or EOD2

All other racks are powered by EOD1 only
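To see what the powering arrangement means in practice, the sketch below models each device by the set of sources it is cabled to and asks which devices stay up for a given set of live sources. The inventory entries are hypothetical examples, not a real list of CCR equipment; only the source names EOD1 and EOD2 come from the slide.

```python
def still_powered(feeds: set[str], live_sources: set[str]) -> bool:
    """A device stays up while at least one of its feeds is live."""
    return bool(feeds & live_sources)


# Hypothetical inventory for illustration only.
EQUIPMENT = {
    "dual-powered HP Proliant": {"EOD1", "EOD2"},
    "HPUX box cabled to EOD2": {"EOD2"},
    "non-HP rack": {"EOD1"},
}


def survivors(live_sources: set[str]) -> list[str]:
    """Names of the devices that still have power, sorted for stable output."""
    return sorted(
        name for name, feeds in EQUIPMENT.items()
        if still_powered(feeds, live_sources)
    )


if __name__ == "__main__":
    # Situation after EOD1 has been cut (10 minutes without input power).
    print(survivors({"EOD2"}))
```

In this model a machine with both inputs on the same source behaves like a single-powered one, which is why the cabling matters as much as the number of power supplies.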

Check voltage and current here


IT/CS Network Starpoint

[Figure: layout plan of the IT/CS Network Starpoint (locked - TI Operators have key)]

Equipment shown: TN Router; GPN router + tele; GPN Prev. Router; TN Switches (LHC / SPS); GPN Switches; fiber switches; Patch panels; IT/CS Servers; terminal; Sunrise GSM; CCC/CCR switches and routers; TS/CV; Access

Power and cooling marked: EOD9; Air Con.; Air Con. Vent Out


IT Network Starpoint cooling and powering

The network starpoint is cooled by an air conditioner in the room. It used to leak water but was fixed after 29/7/2006.

The starpoint has two power sources:

– EOD2 from the CCR

– EOD9 in the starpoint. EOD9 contains batteries itself (less than 10 minutes available?). It is fed from normal electricity at the moment.

The Technet and General Purpose Network routers are dual powered.

The IT/CS Spectrum system, IP-DNS-4 and IP-TIME-4 are dual powered.

The HP Procurve Gigabit switches have only one 230 V input. However, they are often (but not always) associated with an HP Redundant Power Supply (RPS), which supplies low voltage power if they lose 230 V or the internal power supply fails.

– Beware: we have noticed a tendency for these RPS units to trip their own circuit breakers when there is power loss to the switches.


First steps to take when there is an infrastructure failure

The TI operators should call us if:

– The air conditioning in the CCR is not running

– We have lost any of the UPS sources (EOD1, 2, 3 or 9)

When you arrive, prepare pen and paper and note down:

– The time (from the CCC Rolexes!)

– The situation now

– Any interventions you perform or people called

• If you have a camera or camera phone take a picture of a Rolex (to synch real time with camera time) then take pictures, preferably without flash, of the racks and equipment before and after you flick a switch back on.

– Save useful logfiles such as /var/log/messages before they are overwritten

– Your leaving time

All this is vital because there will be “a fact finding mission” or “investigation”
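One way to save a logfile before it is overwritten is to copy it under a timestamped name. The sketch below is a minimal example; the function name, the destination directory and the naming scheme are assumptions, and reading /var/log/messages normally requires root privileges.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(src, dest_dir, now=None) -> Path:
    """Copy src into dest_dir as '<name>.<YYYYmmdd-HHMMSS>' and return the new path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # time.localtime(None) uses the current time
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also keeps the original modification time
    return dest


if __name__ == "__main__":
    # The destination directory here is a hypothetical choice.
    print(snapshot_log("/var/log/messages", Path.home() / "incident-logs"))
```

Keeping the original modification time helps later when matching log entries against the notes and photographs taken on site.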

Feel free to call your supervisor (you can delegate decisions and responsibility to him or her!)

You cannot fix the whole Control System on your own: get one of your technical colleagues called in too.


What the TI operators need to work

Their Windows Consoles

The Phone book

The Electrical Network Supervisor

– This is in the old TCR, building 212, behind a firewall

– Started from “Startup” of TIOP login which starts Exceed to our HPUX system ELSRV1. From there the old X-Motif Console Manager is used to actually start the ENS programs.

The LASER Alarm System

– Started from Java Console Manager (probably needs ABSRV1, CS-CCR-WWW1)

– Needs HP Proliants LASER2 (oc4j + SonicMQ), SLJAS2 and SLJAS3 (SonicMQ). The HPUX system SLJAS1 (SonicMQ) is also probably needed. If LASER1/2 restart then TCRSC01 or 02 Oracle Databases in Building 513 must be running

The TIM system

– Runs on HP Proliants called TCRPL*. Needs TCRSC01 or 02 for login.

XCLUC (for monitoring the Proliants, HPUX systems and Front Ends)
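The dependencies above can be written down as data and checked against the list of machines that are currently up. This is a simplified sketch: it only includes the requirements the slide states firmly (the "probably needed" machines and the TCRPL* hosts are left out), and the service labels and function name are assumptions.

```python
# Each service maps to a list of requirement groups.
# A group is satisfied when at least one host in it is up.
REQUIRES = {
    "Electrical Network Supervisor": [{"ELSRV1"}],
    "LASER Alarm System": [{"LASER2"}, {"SLJAS2"}, {"SLJAS3"}],
    "TIM login": [{"TCRSC01", "TCRSC02"}],
}


def available(service: str, hosts_up: set[str]) -> bool:
    """True when every requirement group of the service has a live host."""
    return all(group & hosts_up for group in REQUIRES[service])


if __name__ == "__main__":
    up = {"ELSRV1", "LASER2", "SLJAS2", "TCRSC02"}
    for name in REQUIRES:
        print(name, "OK" if available(name, up) else "NOT AVAILABLE")
```

With the example set of hosts, the alarm system is reported as unavailable because SLJAS3 is missing, while the TIM login works on TCRSC02 alone.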


Scenario

Like 29/7/2006

– We lose normal power

– The diesels start but run only a few minutes

– 10 minutes later we lose EOD1 and EOD9

– You arrive

– Your aim is to make the UPS last as long as possible

Unlike 29/7/2006

– We then lose EOD2

– You have to restart all the HP Proliants, the HPUX systems, etc.


Power off in the CCC

Organize powering off of all unused Consoles and Screens in CCC. This is done by briefly pushing the HP DC7600 power switch (clean shutdown for Windows, dirty for Linux)

Do not turn off:

– TI consoles and CSAM (CWO-CCC-C0WF, C1WF, C2WC, C2WF, C0WA, C1WA and STCSAM-TCR2)

– 2 Cryo Consoles (CWO-CCC-C4WC and C8WC)

– 2 PS Access Consoles (CWO-CCC-B9WC and B9WF) and Radiation Console (CWO-CCC-B9LC)

– 4 SPS and North Areas Consoles (CWO-CCC-A8WC, A9WC, A8LF and A9LF)

– If the LHC Access system has been installed leave it on

– One Linux and one Windows system in each Island for general use

Turn off most of the Watch4TV Linux boxes (not the Access ones!). Leave one per island at least.

Turn off the wall display Linux systems (CS-CCR-A, B, C and DWALL) as presumably the wall displays are cut already.
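The "do not turn off" list can be kept as a set so that any list of console names can be filtered quickly. In this sketch the abbreviated names on the slide (C1WF, A9WC, etc.) are assumed to share the CWO-CCC- prefix, and CWO-CCC-X1WC is a hypothetical console used only as an example. The rule about leaving one Linux and one Windows system per island is not modelled.

```python
# Consoles the slide says must stay on (prefix expansion is an assumption).
KEEP_ON = {
    # TI consoles and CSAM
    "CWO-CCC-C0WF", "CWO-CCC-C1WF", "CWO-CCC-C2WC", "CWO-CCC-C2WF",
    "CWO-CCC-C0WA", "CWO-CCC-C1WA", "STCSAM-TCR2",
    # Cryo
    "CWO-CCC-C4WC", "CWO-CCC-C8WC",
    # PS Access and Radiation
    "CWO-CCC-B9WC", "CWO-CCC-B9WF", "CWO-CCC-B9LC",
    # SPS and North Areas
    "CWO-CCC-A8WC", "CWO-CCC-A9WC", "CWO-CCC-A8LF", "CWO-CCC-A9LF",
}


def candidates_for_power_off(consoles: list[str]) -> list[str]:
    """Consoles that are not on the keep-on list, sorted by name."""
    return sorted(c for c in consoles if c.upper() not in KEEP_ON)


if __name__ == "__main__":
    print(candidates_for_power_off(["CWO-CCC-A8WC", "CWO-CCC-X1WC"]))
```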


Power Off in the CCR

If you are tempted to economize power in the CCR beware of:

– Briefly pressing the power button on a Linux machine performs a dirty shut down of the machine. This may corrupt:

• SonicMQ databases (LASER1/2, SLJAS*, TCRPL*, ABCOPL8 for Oasis)

• PVSS databases (CS-CCR-Q*, QPS*, WIC01, PIC01)

• Linux/VMware/WindowsXP/Wizcon systems (CS-CCR-CV*)

– Many other machines could be shut down in the CCR but the list has not been drawn up. Try CS-CCR-SPARE*!
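Before pressing a power button it helps to check the machine name against the systems listed above. The sketch below does this with shell-style wildcards; the function name is an assumption and CS-CCR-SPARE1 is a hypothetical host name. A machine that does not match is not guaranteed to be safe, it is simply not on the slide's list.

```python
from fnmatch import fnmatchcase

# Name patterns taken from the slide: a dirty shutdown may corrupt these.
RISKY_PATTERNS = [
    "LASER1", "LASER2", "SLJAS*", "TCRPL*", "ABCOPL8",  # SonicMQ databases
    "CS-CCR-Q*", "QPS*", "WIC01", "PIC01",              # PVSS databases
    "CS-CCR-CV*",                                       # Linux/VMware/WindowsXP/Wizcon
]


def dirty_shutdown_is_risky(host: str) -> bool:
    """True when the host name matches one of the patterns on the slide."""
    name = host.upper()
    return any(fnmatchcase(name, pattern) for pattern in RISKY_PATTERNS)


if __name__ == "__main__":
    for host in ["CS-CCR-Q4DS3", "sljas2", "CS-CCR-SPARE1"]:
        print(host, dirty_shutdown_is_risky(host))
```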

To cleanly shut down Linux systems you need to have the root password or be in the list of “sudoers”. Many of you are already in this list which includes all the CO Exploitation team. The command to execute is:

shutdown -h now

Many of the AB/CO/IN team can do this remotely from home. They can also power them back on using the Integrated Lights Out (ILO) web pages.

HP Proliant rack

HP Proliant DL380 G4


Power on in the CCR

The basic order of restart after a complete power loss is:

– SAMOA (HPUX Yellow Pages Server)

– HPDEPOT (HPUX 11 infrastructure)

– ABSRV1 (composed of either ABSRV2 or ABSRV3)

– TCRSRV1 (composed of either TCRSRV2 or TCRSRV3), STSRV1, ELSRV1

– CS-CCR-FEOP, CS-CCR-FELAB, CS-CCR-NFS*

– CS-CCR-INF* (for XCLUC, Big Brother, LEMON, etc.), ABSPS1

– CS-CCR-WWW1, HPSLWEB (Web Servers), SEATTLE, LSASRV1

– CS-CCR-CMW1, SLJAS1, SLJAS2, SLJAS3 (Middleware)

– LASER1 and LASER2 (Alarms)

– TCRPL* (TIM)

– The rest of the HP Proliants, legacy PC desktop Linux boxes and HPUX systems (may need fsck -y) followed by the Consoles in the CCC

– Make sure the Front Ends for Remote Reboot called RMSPCR and RMSTCR are running.
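The restart order above can be encoded as a list of stages so that any set of machines can be sorted into it. This is a sketch under the assumption that names are matched with shell-style wildcards; machines that match no stage fall into the final "the rest" stage. The function names are illustrative only.

```python
from fnmatch import fnmatchcase

# Stages in the order given on the slide.
RESTART_STAGES = [
    ["SAMOA"],
    ["HPDEPOT"],
    ["ABSRV1"],
    ["TCRSRV1", "STSRV1", "ELSRV1"],
    ["CS-CCR-FEOP", "CS-CCR-FELAB", "CS-CCR-NFS*"],
    ["CS-CCR-INF*", "ABSPS1"],
    ["CS-CCR-WWW1", "HPSLWEB", "SEATTLE", "LSASRV1"],
    ["CS-CCR-CMW1", "SLJAS1", "SLJAS2", "SLJAS3"],
    ["LASER1", "LASER2"],
    ["TCRPL*"],
]


def restart_stage(host: str) -> int:
    """Index of the stage the host belongs to; unknown hosts come last."""
    name = host.upper()
    for index, patterns in enumerate(RESTART_STAGES):
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return index
    return len(RESTART_STAGES)


def restart_order(hosts: list[str]) -> list[str]:
    """Hosts sorted by stage, then alphabetically within a stage."""
    return sorted(hosts, key=lambda h: (restart_stage(h), h.upper()))


if __name__ == "__main__":
    print(restart_order(["LASER2", "ABSRV1", "SAMOA", "CS-CCR-Q4DS3", "SLJAS2"]))
```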


Recommendations for the future

Train the TI Operators to:

– Call us quickly (I was called by an SPS operator at 12:15 29/7/2006)

– Check the switches and air conditioning in the CCR and Starpoint

I have done this for quite a few TI and SPS operators already

Train other members of CO what to do and provide documentation

I hope this Technical Committee presentation has achieved this aim

Fix the main weaknesses in the infrastructure

– Errors in HP Proliant cabling - done

– HP Proliant Firmware update – partially done

– Cable HP Procurve switches on EOD2 to avoid trips - started

– Move the XCLUC/Clogger system to a dual powered system – started

– Move our online backup machines CS-CCR-BACKUP* elsewhere (513?) – dialog with IT started

– Consider making all our NFS fileservers dual powered – there are pros and cons, principally software reliability versus hardware reliability


Annex: Simplified Timeline of 29/7/2006 power cut (1/2)

7h47 : Explosion of the Swiss/French Auto Transfer system

7h49 : Start of the Diesel(s) in Building 212 (Jura)

7h56 : loss of EOD9

– loss of General Network CESAM switch

– loss of switch connecting half the CCC IP phones (including 72200)

8h35 : loss of diesel

8h47 : loss of EOD1, dropped after 10 minutes of missing input power

– Crash and restart of LASER1, LASER2 and CS-CCR-Q4DS3 (cryo). HP claims that this can be fixed by upgrading the firmware. TI Operators probably lost the Alarm System at this moment.

– loss of 6 HP Proliants with both inputs connected to EOD1 only (now fixed)

9h23 : Loss of UPS in Building 866 leads to loss of Telephone Node 5 and the backup in Building 58 is not accessible. Total loss of IP + “analog” phones + Sunrise GSM antenna. TI could receive calls via Orange France but not make them.


Annex: Simplified Timeline of 29/7/2006 power cut (2/2)

9h44 : Re-powering of EBD5 and EBD4 but EBD1 in CCR trips and EBD1, EBD2, EBD3 and EBD6 are not working. Overload in the line before EOD9 means EOD9 is also not re-powered. Building 866 telephony re-powered.

9h45 : Re-powering of EOD2 and EOD3 from normal power. However source of EOD1 does not have automatic re-enable so EOD1 is down.

9h46 : Loss of the Technical Network switches because the “disjoncteur” has tripped

9h50 : Telephone Node 5 restarts, “analog” phones work.

13h45: Source of EOD1 is manually re-enabled, so EOD1 now works. Slightly before this, the “Star Point deporté” (serving in particular LASER2 and CS-CCR-FEOP) was fixed, as were the Technet switches in the main Starpoint.
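The intervals between timeline entries are easy to work out with a small helper. The sketch below parses the "7h47" notation used in the annex; the function names are assumptions, and all times are taken to fall on the same day.

```python
def minutes(hhmm: str) -> int:
    """Convert a time written like '7h47' into minutes since midnight."""
    hours, mins = hhmm.split("h")
    return int(hours) * 60 + int(mins)


def elapsed(start: str, end: str) -> int:
    """Minutes between two same-day times in the timeline's notation."""
    return minutes(end) - minutes(start)


if __name__ == "__main__":
    # Loss of diesel (8h35) to loss of EOD1 (8h47)
    print(elapsed("8h35", "8h47"))
    # Loss of EOD1 (8h47) to manual re-enable of its source (13h45)
    print(elapsed("8h47", "13h45"))
```

By this arithmetic EOD1 was unavailable for 298 minutes, just under five hours, on 29/7/2006.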

adapted from “Rapport coupure CCC du 29-07-2006.pdf” by Jean Juillard (AB/OP/TI)

see also “Report on the Power Cut of 29/7/2006” by Alastair Bland at the end of https://edms.cern.ch/file/766492/1/MAM27.doc