Managed by UT-Battelle for the Department of Energy. Best Ever Alarm System Tool. Xihui Chen, Katia Danilova, Kay Kasemir, SNS/ORNL, kasemirk@ornl.gov, April 2009



Managed by UT-Battelle for the Department of Energy

Best Ever Alarm System Tool

Xihui Chen, Katia Danilova, Kay Kasemir

SNS/ORNL

kasemirk@ornl.gov

April 2009


Previous Attempts

First ALH, then soft-IOCs and EDM generated from the ALH config (Pam Gurd).

Issues
– GUI
    Static layouts
    N clicks to see (some of the) active alarms
– Configuration .. was bad
    Always too many alarms
    Changes required contacting one of the 2 experts, waiting ~days, restarting the CA gateway, and hoping that nothing else broke
– Information
    Operator guidance? Related displays? Most frequent alarm? Timeline of alarm?


New End-User View: Alarm Table

All current alarms
– new, ack’ed

Sort by PV, Descr., Time, Severity, …

Optional: Annunciate or Enunciate

Acknowledge one or multiple alarms
– Select by PV or description
– BNL/RHIC type un-ack’


Another View: Alarm Tree

All alarms
– Disabled, inactive, new, ack’ed

Hierarchical
– Optionally only show active alarms
– Ack’/Un-ack’ PVs or sub-tree


Guidance, Related Displays, Commands

Basic Text

Start EDM screen

Open web page

Run ext. command

Hierarchical: Including info of parent entries

Merges Guidance etc. from all selected alarms


.. Within CSS

Alarms

History of PV

EPICS Config.


CSS Context Menus Connect the Tools

Send alarm PV to any other CSS PV tool


E-Log Entries

“Logbook” from context menu creates text w/ basic info about selected alarms. Edit, submit.

Pluggable implementation, not limited to Oracle-based SNS ELog


Online Configuration Changes

.. may require Authentication/Authorization

Log in/out while CSS is running


Add PV or Subsystem

1. Right-click on ‘parent’

2. “Add …”

3. Enter name

Online. No search for config files, no restarts.


Configure PV

Again online

Especially useful for operators to update guidance and related screens.


Logging

.. into the generic CSS log that is also used for error/warn/info/debug messages

Alarm Server: State transitions, Annunciations

Alarm GUI: Ack/Un-Ack requests, Config changes

Generic Message History Viewer
– Example w/ Filter on TEXT=CONFIG


Logging: Get timeline

Example: Filter on TYPE, PV

1. PV triggers, clears, triggers again

2. Alarm Server latches alarm

3. Alarm Server annunciates

4. Problem fixed

5. Ack’ed by operator

6. All OK
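The timeline above comes from filtering logged messages on a few properties. A minimal sketch of that kind of filter, assuming each message is a plain dictionary with `TIME`, `TYPE`, `PV` and `TEXT` keys (the key names mirror the filter names on the slides, but the data layout and the function name `filter_messages` are hypothetical, not the actual CSS message schema):

```python
def filter_messages(messages, **criteria):
    """Return messages whose properties equal all given criteria, oldest first.

    `messages` is a list of dicts; the keys TIME, TYPE, PV, TEXT are
    assumptions for this sketch, not the real message RDB columns.
    """
    selected = [
        m for m in messages
        if all(m.get(key) == value for key, value in criteria.items())
    ]
    # Sorting by time turns the matching messages into a timeline.
    return sorted(selected, key=lambda m: m["TIME"])


log = [
    {"TIME": 30, "TYPE": "alarm", "PV": "Vac:P1", "TEXT": "ACK"},
    {"TIME": 10, "TYPE": "alarm", "PV": "Vac:P1", "TEXT": "MAJOR"},
    {"TIME": 20, "TYPE": "talk", "PV": "Vac:P1", "TEXT": "Vacuum alarm"},
    {"TIME": 15, "TYPE": "alarm", "PV": "Cryo:T7", "TEXT": "MINOR"},
]
timeline = filter_messages(log, TYPE="alarm", PV="Vac:P1")
```

Here `timeline` holds the two `alarm` messages for `Vac:P1`, ordered by time.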


All Sorts of Web Reports


Technical View

[Architecture diagram. Components shown: IOCs; Alarm Server (Current Alarms: Acknowledged? Transient? Annunciated?); Alarm Cfg & State RDB; Alarm Client GUI and other CSS Applications; JMS; JMS2Speech; JMS2RDB; Message RDB; Tomcat Reports.]

JMS topics: ALARM_SERVER, ALARM_CLIENT, TALK, LOG

JMS traffic: Alarm Updates; Ack’ and Config Updates; Annunciations; Log Messages

PV Updates (Channel Access, …)


General Alarm Server Behavior

Latch highest severity, or non-latching
– like ALH “ack. transient”

Annunciate

Chatter filter à la ALH
– Alarm only if severity persists some minimum time
– .. or alarm happens >=N times within period

Optional formula-based alarm enablement:
– Enable if “(pv_x > 5 && pv_y < 7) || pv_z==1”
– … but we prefer to move that logic into IOC

When acknowledging MAJOR alarm, subsequent MINOR alarms not annunciated
– ALH would again blink/require ack’
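The latching and acknowledge rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the actual alarm server code: the class `LatchingAlarm`, the function `chatter_exceeded`, and the numeric severity ranking are all made up for the example, and the non-latching mode is left out.

```python
from dataclasses import dataclass

# Severity names follow the slides; the numeric ranking is an assumption.
SEVERITY_RANK = {"OK": 0, "MINOR": 1, "MAJOR": 2}


@dataclass
class LatchingAlarm:
    """Hypothetical model of one latching alarm PV inside an alarm server."""
    current: str = "OK"         # severity the PV reports right now
    latched: str = "OK"         # highest severity since the alarm last cleared
    acknowledged: bool = False

    def update(self, severity: str) -> bool:
        """Process a new PV severity; return True if it should be annunciated."""
        self.current = severity
        if SEVERITY_RANK[severity] > SEVERITY_RANK[self.latched]:
            # Escalation: latch the new highest severity, require a new ack.
            self.latched = severity
            self.acknowledged = False
            return True
        if severity == "OK" and self.acknowledged:
            # Problem fixed and already acknowledged: all OK again.
            self.latched = "OK"
            self.acknowledged = False
        # Same or lower severity (e.g. MINOR after MAJOR): stay quiet.
        return False

    def acknowledge(self) -> None:
        """Operator acknowledges; the alarm clears once the PV is OK."""
        self.acknowledged = True
        if self.current == "OK":
            self.latched = "OK"
            self.acknowledged = False


def chatter_exceeded(trigger_times, count, period):
    """True if at least `count` triggers fall within some `period`-second window."""
    times = sorted(trigger_times)
    return any(
        times[i + count - 1] - times[i] <= period
        for i in range(len(times) - count + 1)
    )
```

With this model, a PV that goes MAJOR, is acknowledged, and then drops to MINOR does not annunciate again; the latched MAJOR only clears once the PV returns to OK.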


Best Ever Alarm System Tools, Indeed

.. but Tools are only half the issue

Good configuration requires plan & follow-up.

B. Hollifield, E. Habibi, "Alarm Management: Seven (??) Effective Methods for Optimum Performance", ISA, 2007


Alarm Philosophy

Goal:

Help operators take correct actions

– Alarms with guidance, related displays

– Manageable alarm rate (<150/day)

– Operators will respond to every alarm (corollary to manageable rate)


What’s a valid alarm?

DOES IT REQUIRE IMMEDIATE OPERATOR ACTION?
– What action? Alarm guidance!
– Not “make elog entry”, “tell next shift”, …
– Consider consequence of no action

Is it the best alarm?
– Would other subsystems, with better PVs, alarm at the same time?


How are alarms added?

Alarm triggers: PVs on IOCs
– But more than just setting HIGH, HIHI, HSV, HHSV
– HYST is a good idea
– Dynamic limits, enable based on machine state, ...

Requires thought, communication, documentation

Added to alarm server with
– Guidance: How to respond
– Related screen: Reason for alarm (limits, …), link to screens mentioned in guidance
– Link to rationalization info (wiki)
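Why HYST helps can be shown with a few samples of a noisy signal near its limit. The sketch below uses a simplified deadband rule (raise at `value >= high`, clear only below `high - hyst`); that rule and the function name are assumptions made for illustration and are not a copy of the EPICS record implementation.

```python
def high_alarm_states(values, high, hyst):
    """Return one True/False alarm state per sample.

    Simplified deadband, assumed for this sketch: the alarm is raised when
    a value reaches `high` and is cleared only once a value drops below
    `high - hyst`.
    """
    in_alarm = False
    states = []
    for value in values:
        if not in_alarm and value >= high:
            in_alarm = True
        elif in_alarm and value < high - hyst:
            in_alarm = False
        states.append(in_alarm)
    return states


def count_triggers(states):
    """Count transitions from 'no alarm' into 'alarm'."""
    previous = [False] + states[:-1]
    return sum(1 for before, now in zip(previous, states) if now and not before)


# A temperature hovering around a limit of 10.0
samples = [9.0, 10.1, 9.9, 10.1, 9.4, 9.9]
without_hyst = count_triggers(high_alarm_states(samples, high=10.0, hyst=0.0))
with_hyst = count_triggers(high_alarm_states(samples, high=10.0, hyst=0.5))
```

For these samples the alarm triggers twice without a deadband and once with a deadband of 0.5.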


Impact/Consequence Grid

Columns: Category | So What | Minor Consequence | Major Consequence

Personnel Safety: PPS independent from EPICS?

Environment, Public: Can EPICS cause contained spill of mercury? | Uncontained spill??

Cost (Beam Production, Downtime, Beam Quality): No effect, beam off < 1 sec? | Beam off < 10 min, < $10000 | Beam off > 10 min, > $10000

Mostly: How long will beam be off?


.. combined with Response Time

Time to Respond    Minor Consequence    Major Consequence
> 30 minutes       NO_ALARM             MINOR
10..30 minutes     MINOR                MAJOR
< 10 minutes       MAJOR                MAJOR + Annunciate

– This part is still evolving…
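The response-time grid can be written down as a lookup. This is only a sketch of the table as shown: the function name and the handling of the exact boundaries (10 and 30 minutes both count as "10..30") are assumptions.

```python
def alarm_severity(response_minutes, major_consequence):
    """Map time-to-respond and consequence to (severity, annunciate).

    Encodes the grid from the slide; treating exactly 10 and exactly 30
    minutes as part of the middle row is an assumption.
    """
    if response_minutes > 30:
        return ("MINOR" if major_consequence else "NO_ALARM", False)
    if response_minutes >= 10:
        return ("MAJOR" if major_consequence else "MINOR", False)
    # Less than 10 minutes: always MAJOR, annunciated for major consequences.
    return ("MAJOR", major_consequence)


# Example: 10 minutes to prevent an interlock that takes the cryo cold box down
severity, annunciate = alarm_severity(response_minutes=10, major_consequence=True)
```

The example call yields a MAJOR alarm without annunciation under the boundary assumption above.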


Example: Elevated Temp/Press/Res.Err./…

Immediate action required?
– Do something to prevent interlock trip

Impact, Consequence?
– Beam off: Reset & OK, 5 minutes?
– Cryo cold box trip: Off for a day?

Time to respond?
– 10 minutes to prevent interlock?

MINOR? MAJOR?

Guidance: “Open Valve 47 a bit, …”

Related Displays: Screen that shows Temp, Valve, …


“Safety System” Alarms

Protection Systems not per se high priority
– Action is required, but we’re safe for now, it won’t get worse if we wait

Pick One
– “Mommy, I need to gooo!”
– “Mommy, I went”

(Does it require operator action? How much time is there?)


Avoid Multiple Alarm Levels


Bad Example: Old SNS ‘MEBT’ Alarms

Each amplifier trip: ≥ 3 ~identical alarms, no guidance

Rethought w/ subsystem engineer, IOC programmer and operators: 1 better alarm


Alarms for Redundant Pumps


Alarm Generation: Redundant Pumps the wrong way

Control System
– Pump1 on/off status
– Pump2 on/off status

Simple Config setting: Pump Off => Alarm:
– It’s normal for the ‘backup’ to be off
– Both running is usually bad as well, except during tests or switchover
– During maintenance, both can be off


Redundant Pumps

Control System
– Pump1 on/off status
– Pump2 on/off status
– Number of running pumps
– Configurable number of desired pumps

Alarm System: Running == Desired?
– … with delay to handle tests, switchover

Same applies to devices that are only needed on-demand

[Screenshot: Required Pumps: 1]
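The "Running == Desired?" check with a delay can be sketched as follows. On the slides this logic would live in the control system and alarm configuration; the Python class here, its name, and the use of plain seconds for time are illustrative assumptions.

```python
class PumpCountAlarm:
    """Alarm when the number of running pumps differs from the desired
    number for at least `delay_seconds` (hypothetical helper)."""

    def __init__(self, desired, delay_seconds):
        self.desired = desired
        self.delay = delay_seconds
        self.mismatch_since = None   # time at which the mismatch started

    def update(self, now, pump_states):
        """`pump_states` is a list of on/off booleans; returns True if in alarm."""
        running = sum(1 for is_on in pump_states if is_on)
        if running == self.desired:
            self.mismatch_since = None
            return False
        if self.mismatch_since is None:
            self.mismatch_since = now
        # Short mismatches (tests, switchover) stay below the delay.
        return now - self.mismatch_since >= self.delay


alarm = PumpCountAlarm(desired=1, delay_seconds=30)
history = [
    alarm.update(0, [True, False]),    # normal: one pump, backup off
    alarm.update(10, [True, True]),    # switchover starts: both running
    alarm.update(20, [False, True]),   # switchover done within the delay
    alarm.update(30, [False, False]),  # both off
    alarm.update(65, [False, False]),  # still off after 35 s
]
```

Only the last update reports an alarm: the switchover is shorter than the delay, while both pumps staying off is not. Setting `desired=0` would cover planned maintenance.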


Weekly Review: How Many? Top 10?


A lot of information available

How often did PV trigger?

For how long?

When?

Temporary issue? Or need HYST, alarm delay, fix to hardware?
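The "how often" and "for how long" questions can be answered from logged state transitions. A minimal sketch, assuming the log has already been read into time-ordered `(timestamp_seconds, pv, severity)` tuples; that layout and the function name are assumptions, and the real reports query the message RDB.

```python
from collections import Counter, defaultdict


def alarm_statistics(events):
    """Count triggers and total time in alarm per PV.

    `events` is a time-ordered list of (timestamp_seconds, pv, severity)
    tuples, an assumed layout for this sketch.
    """
    counts = Counter()
    durations = defaultdict(float)
    active_since = {}
    for timestamp, pv, severity in events:
        if severity != "OK":
            if pv not in active_since:
                # New trigger; a severity change while active is not recounted.
                counts[pv] += 1
                active_since[pv] = timestamp
        elif pv in active_since:
            durations[pv] += timestamp - active_since.pop(pv)
    return counts, dict(durations)


events = [
    (0, "A", "MAJOR"), (10, "A", "OK"),
    (20, "A", "MINOR"), (25, "A", "MAJOR"), (50, "A", "OK"),
    (60, "B", "MINOR"),
]
counts, durations = alarm_statistics(events)
top_10 = counts.most_common(10)
```

PVs that are still in alarm at the end of the data, like `B` here, are counted but have no completed duration yet.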


Weekly Check: Stale, Forgotten?


What about the DESY Alarm System?

[Architecture diagram. Labels shown: IOC, Interconnection Server, Filters, Filt. Alrm, JMS, ALARM, LOG, JMS2RDB, RDB, LDAP, CSS, Other.]

GUI: Similar to SNS GUI shown here

No Channel Access Monitor of selected alarm PVs!

IOCs push all alarms via new protocol into Interconn. Server.


Design Choices

Similar alarm table and tree GUIs

JMS for communication
– slightly different messages, though

DESY IOCs send all alarms, then filtered in AMS
– DESY: All IOC alarms should show up in AMS, zero additional configuration
– At SNS, how many of the 350000 PVs would send alarms? We want to make the addition of alarms simple, but not automatic, and encourage guidance, related displays.

DESY/SNS: LDAP vs. RDB for configuration/state
– Choice was based on available infrastructure.

JMS Listeners
– SNS: Logger, Annunciator
– DESY: Logger, Send SMS, EMail, Voice Mail


AMS – Alarm Message System: Configuration Views

- AMS is a JMS (Java Message Service) based information system.

- It offers different options for message distribution:
    - SMS
    - E-Mail
    - Voice Mail
    - Another JMS Topic

- Messages are sent on the basis of filtered PVs. (Filters can be combined: AND/OR – Sequence)

- The recipients are Users or User groups. User groups can be used in two ways:
    - Send to all Users
    - Send to one after another until a user confirms the message

Users, User groups as well as Filters and Actions are configured in the AMS configuration view.

Slide info from Helge Rickens, DESY
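The two user-group modes can be illustrated with a short sketch. This is not AMS code: the function `notify_group`, the mode names `"all"` and `"escalate"`, and the `send` callback that returns whether the user confirmed are all hypothetical.

```python
def notify_group(users, message, send, mode="all"):
    """Deliver `message` to a user group and return the users contacted.

    `send(user, message)` is a caller-supplied function that returns True
    if the user confirmed the message. mode="all" contacts everyone;
    mode="escalate" contacts users one after another and stops at the
    first confirmation. Both names are assumptions for this sketch.
    """
    contacted = []
    for user in users:
        confirmed = send(user, message)
        contacted.append(user)
        if mode == "escalate" and confirmed:
            break
    return contacted


# Stand-in for SMS / e-mail / voice mail delivery
confirmations = {"ann": False, "bob": True, "cy": True}

def fake_send(user, message):
    return confirmations[user]

everyone = notify_group(["ann", "bob", "cy"], "Cold box trip", fake_send, mode="all")
on_call = notify_group(["ann", "bob", "cy"], "Cold box trip", fake_send, mode="escalate")
```

In the escalating mode the third user is never contacted, because the second one confirms.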


AMS

Editor to configure a Filter

Different views to select User, User-Group, Filter condition, Filter and Alarm Topics

Slide info from Helge Rickens, DESY


Summary

BEAST operational since Feb’09
– Needs a logo
– For now without BEAUtY
– DESY AMS is similar and has been operational for longer

Pick either, but good configuration requires work in any case
– Started with previous “annunciated” alarms
    ~300, no guidance, no related displays
    Now ~330, all with guidance, rel. displays
– “Philosophy” helps decide what gets added and how
    Immediate Operator Action? Consequence? Response Time?
– Weekly review spots troubles and tries to improve configuration