exelon utilities cap & hp programs s/2017fall... · 2017-10-05 · 4 eu cap evolution 2010 - cr...
TRANSCRIPT
EEI Fall Occupational Safety and Health Committee Conference Sept 25-27, 2017
Portland, OR
Exelon Utilities CAP & HP Programs
Jeff Sword, Principal Specialist, Performance Assessment
ComEd
2
Genesis of CAP / HP at ComEd
2002 – 2003: ComEd’s new chief operations officer commissioned an incident investigation process in response to recent events:
- OSHA injuries and Transmission/Distribution impact events caused by human performance were investigated
- Tracking and trending process established for investigations, CA’s and basic lagging indicator data
- Performance indicators and basic reporting were established
- Criteria soon expanded to other events of consequence
- Started training managers in TapRooT®
- Human Performance Program basics established
  • Error traps, Event Free Performance tools (job brief, 3-part communication, self check, peer check, etc.)
3
EU CAP Evolution
2004
- Transitioned from reactive incident investigations only to Condition Reporting
  • Investigation process became CAP process
- Exelon Energy Delivery created from merger with PECO
- Minimal consequence and precursor reporting commenced
2008
- Condition Reporting moved to web-based application with trend codes for HP tool use and skills/work practices in addition to event and cause codes
2009
- Legacy Safety Audit/Observation programs consolidated into new web-based platform, common coding with CRs
4
EU CAP Evolution
2010
- CR and Safety Observation apps joined as front end web apps to the mainframe database with leading/lagging data
2011-2012
- Exelon Utilities created with Constellation (BGE) merger
2012
- Established ComEd SIF program with BST support
- Started investigating SIF-P near miss events
2013
- Accident/injury/illness moved from SHARE to OHM and CR
- MVA reporting became unique CR entry, additional coding
- SIF program now EU Wide
2016
- PEPCO Holdings Inc (PHI) joined EU. Adopted CAP / HP
5
HP / CAP 101
• Human Performance Theory
• Errors vs. Events
• Error Types
• Performance Modes
• Program Elements (Prevent-Detect-Correct)
6
6 Human Performance Principles
• People are not perfect and even the best make mistakes
• Goal is Event-Free performance, not Error-free. Otherwise, we’d have to remove humans from the workplace
• Error-likely situations are predictable, manageable and preventable
• Individual behavior occurs within the context of organizational processes and values, which serve as the principal influence on the choice of behaviors
• People achieve high levels of performance based largely on the encouragement and reinforcement received from leaders, peers and subordinates
• Events can be avoided by understanding the reasons mistakes occur and applying the lessons learned from past events
The event-free performance tool book is only ONE of many elements of human performance
Example:
• Average person makes 3-5 human errors per hour – generally no consequence due to no hazards or error-likely situations
• Average commercial airline pilot makes 10-15 errors per hour. Learnings in the form of work practices, checklists, equipment design, alarms, peer checks, etc., prevent them from becoming events.
7
7 Why We Need To Focus On The Management Systems
If we don’t actively seek opportunities to find and plug holes in our defenses, or improve process quality, we have a one-slice view of the world and are solely reliant on the individual. This is a problem because …
EVENT
HAZARD
The Individual as the sole
defense
… PEOPLE MAKE MISTAKES
8
8 Why did the event happen – “what did he do”?
What single act caused the event?
9
9 Improving Human Performance
90% of events are caused by something OTHER THAN JUST the individual
95% of people react very similarly to the same stimuli
People do what they do at the time that they do it for reasons that make sense to them at the time
It is NOT Common Sense
10
10 Anatomy of an Error - The Swiss Cheese Model
The goal is to identify and plug the holes before they allow an event to occur
EVENT
Undesired Act Adverse Condition Error
Flawed Defenses / Failed barriers
The Perfect Storm
• Several defenses fail
• The holes align
• Disaster happens
Management Systems
Solution
To prevent events, we need to plug the holes that cause near misses
To plug the holes, we need to find, report and fix the holes
To find the holes, they need to be recognized in practice or identified through analysis
To report the holes, employees have to trust the system
To fix the holes, we have to change something
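The "holes align" idea can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that each defense fails independently of the others; the defense names and failure probabilities are hypothetical and are not ComEd data.

```python
from math import prod

def event_probability(layer_failure_probs):
    """Chance that an error passes every defense, assuming the
    layers fail independently (an illustrative simplification)."""
    return prod(layer_failure_probs)

# Hypothetical failure probabilities for three defenses.
defenses = {"job brief": 0.1, "peer check": 0.1, "physical barrier": 0.05}

all_layers = event_probability(defenses.values())
single_layer = event_probability([defenses["peer check"]])

print(f"Three defenses in depth: {all_layers:.4f}")
print(f"One defense only:        {single_layer:.4f}")
```

Under these assumed numbers, stacking defenses cuts the chance of an event by orders of magnitude compared with relying on a single slice, which is the point the slide makes about plugging holes in more than one layer.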
11
11 What NOT To Do
Employees want to know that everyone will be treated fairly if expectations are not met
On the other hand, if a blind eye is seen turning away from intentional infractions, overall performance will suffer
Blame Cycle
Human Error → Individual counseled and/or disciplined unjustly → Reduced trust → Less communication → Management less aware of jobsite conditions → Latent organizational weaknesses persist → More flawed defenses & error precursors → Human Error
12
12 Causes of Adverse Events
Organizational & Programmatic
Failures
Exec Mngt
Failures
Human Failures
Equipment Failures
85 - 90% of causes reside here
10 - 15% of causes reside here
People want to do a good job
13
13 Recognize the Difference (industry definitions)
Human Performance Error
• An error is an action that unintentionally departs from an expected behavior. Errors include slips, lapses, mistakes (not violations).
  − Slips occur when the physical action fails to achieve the immediate objective.
  − Lapses involve a failure of one’s memory or recall.
  − Mistakes are a specific type of error that is a result of a faulty intention or plan (i.e., one does something believing it to be correct when it was, in fact, wrong).
    o “Deviations” are here (not strictly complying with a rule, standard, or expectation)
• Active errors are those errors that have immediate, observable, undesirable outcomes and can be either acts of commission or omission. The majority of initiating actions are active errors.
• Latent errors result in hidden organization-related weaknesses or equipment flaws that lie dormant and include deficient procedures or process guidance, design deficiencies, management practices, and at-risk common behaviors or practices.
14
Recognize the Difference (industry definitions)
Violation
• A deliberate, intentional (with forethought) act to evade or circumvent a known policy, rule, or procedure requirement and that deviates from sanctioned organizational practices
• Most violations are well intentioned, arising from a genuine desire to get a job done according to management’s wishes. Such actions may be acts of either omission or commission. Usually consequences are unintended; violations are rarely acts of sabotage. The deliberate decision to violate a rule is a motivational issue.
15
Generic Error Modeling System (GEMS)
• Levels of Performance
  • Skill Based
  • Rule Based
  • Knowledge Based
(Not Thinking)
(Thinking)
16
What is Skill Based?
• A low level of attention given to the performance of a task.
• The performance of the task is routine and accomplished with little or no thought.
A Skill-based error is an inappropriate act due to lack of attention and/or forethought.
17
What is Rule Based?
• A higher level of attention given to the performance of a task.
• The performance of the task is achieved with the use of learned or referenced rules.
A Rule-based error is an inappropriate action performed without the use of, or in opposition to, learned rules and/or procedures.
18
What is Knowledge Based?
• The highest level of attention given to the performance of a task.
• The performance of the task is achieved with an analysis of the information available. The learned or referenced rules do not apply to the situation at hand.
A Knowledge-based error is an inappropriate action performed due to improper analysis of the situation*. There were no learned rules or procedures to apply.
* aka: Inaccurate Mental Model
19
Error Rates
Errors are least likely during skill-based activities, more likely during rule-based activities, and most likely during knowledge-based activities, where there is a potential for inaccurate mental models.
Skill Based: 0.1% chance of error (1 in 1,000).
• Example: A driver on the way to work is distracted by a cell phone and causes an accident.
Rule Based: 1% chance of error (1 in 100).
• Example: Performing a surveillance using a procedure and taking shortcuts that are not part of the procedure.
Knowledge Based: 10-50% chance of error (1 in 10 to 1 in 2).
• Example: Execution of a task based on an inaccurate mental model.
Note: Deviations exist in skill and knowledge-based performance. Violations only exist in rule-based performance.
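To make the quoted rates concrete, here is a minimal sketch that converts them into expected error counts for a given number of actions. The rates are the ones stated on the slide; the function name, the dictionary, and the 1,000-action example are illustrative assumptions.

```python
# Per-action error rates as quoted on the slide, stored as (low, high).
ERROR_RATES = {
    "skill": (0.001, 0.001),     # 1 in 1,000
    "rule": (0.01, 0.01),        # 1 in 100
    "knowledge": (0.10, 0.50),   # 1 in 10 to 1 in 2
}

def expected_errors(mode, actions):
    """Return (low, high) expected error counts for `actions` actions.

    Hypothetical helper: simply multiplies the quoted rate by the count.
    """
    low, high = ERROR_RATES[mode]
    return (low * actions, high * actions)

for mode in ERROR_RATES:
    low, high = expected_errors(mode, 1000)
    print(f"{mode:>9}: {low:g} to {high:g} expected errors per 1,000 actions")
```

The comparison shows why moving a task out of knowledge-based mode (for example, by providing a procedure) matters more than further polishing an already routine skill-based task.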
20
20 Where We Strive to Be – HP Program Elements
• Prevention
  − Event-Free Performance Tool Book
    o Identify error-likely situations
    o Add or modify defenses
    o Add contingencies / strengthen defense-in-depth
• Detection
  − Management and Safety Observations/Audits
    o Real-time observations in the field
    o Report problems to identify weaknesses (Condition Reports (CR))
    o Analyze trends; identify areas needing attention
• Correction
  − Corrective Action Program (CAP)
    o Classify the event or condition based on significance
    o Conduct investigation
    o Identify effective corrective actions
    o Track to completion
    o Measure effectiveness
• Sharing Lessons Learned
21
21 PREVENTION – Event-Free Performance Tool Book
Top 10 Human Error Traps:
• Stress
• High work load
• Time pressure
• Poor communications
• Vague or poor work guidance
• Overconfidence in work and/or abilities
• First time performing a task
• Distractions
• First working day following time off
• 30 minutes after a meal or waking up
22
22 PREVENTION – Event-Free Performance Tool Book
1. Job Brief
   • Multi-Person
   • Single Person/Multiple Location
2. Questioning Attitude
   • 4 Key Questions
3. Verification Practices
   • Self-Check (STAR)
   • First Check
   • Peer Check
   • Independent Review
4. Three-Part Communication
   • 24-Hour Clock
   • Phonetic Alphabet
5. Procedure Adherence
   • Procedure Levels
   • Place-Keeping
   • Reader-Doer
   • Fundamentals of Operational Execution
6. Stop When Unsure/OOPS
7. Visual Cues
   • Flagging
   • Robust Operational Barriers
23
Error Types
Active Errors – Active errors are observable, physical actions that change equipment, system, or service state, resulting in immediate undesired consequences
• Tools to reduce active errors
  − Self-Check
  − First Check
  − Peer Check
  − Procedure Adherence
  − Stop When Unsure
  − Visual Cues
24
Error Types
Latent Errors – Latent errors result in hidden organization-related weaknesses or equipment flaws that lie dormant
• Tools to reduce latent errors
  − Job Brief
  − Questioning Attitude
  − Independent Review
  − Three-Part Communication
  − 24-Hour Clock
  − Phonetic Alphabet
25
25 Famous Last Words – Quotes from Actual Events
• I’ve done this a thousand times
• I’m just setting up
• I’m just cleaning up
• I’ll just scope it out first
• I’m not going to get that close
• I’m not going to touch anything
• Hey new guy, I’ll be right back, don’t touch anything
• It’s not that heavy
• Nobody’s going to drive through here
• I don’t have to secure it, I won’t drop it
• Put a little more a** in it
26
26 One of Many Real World Tragic Examples:
What’s the difference?
• None: 3 manholes, same crew, identical work, identical equipment
What about now?
• Almost none: same workers, same task, same requirements, same equipment, same tools – just not as deep.
That one little difference derailed the thought process and significantly changed the outcome!
27
Favorite Posters - # 1 of 2
27
28
Favorite Posters - # 2 of 2
28
29
29 DETECTION – Real-Time Observations
Management Safety Audit / Observations
• Two types
  − General work/critical task
  − Driver/vehicle
• Both contain safety behavior and condition observation topics plus lists of Event Free Human Performance Tools and Work Practices/Skills to be assessed as one of the following:
  − Exceeds
  − Acceptable
  − Below
• Free-form field for comments is also provided
• Audits provide large volumes of data for trending and analysis
• EFPTB and Skills trend codes are identical to those in CR’s for additional leading indicator data
30
30 CORRECTION - Purpose of the CAP Process
Promote continuous improvement through organizational learning
Build and hone a self-critical and compliance culture
• We can always improve
• We need to identify all weaknesses, regardless of significance, in order to improve safety and processes
• Make accountability about understanding “why compliance is important to me,” and it’s not the same as enforcement
− “punishment teaches people how to avoid punishment” – Aubrey Daniels
• People make mistakes; we need to learn why and how they become events in order to reduce the number and severity
• The CAP provides a structured approach to do this
31
31 Event Classification and Investigative Response
Highest Significance Level
- Examples: Serious injury; fatality; major equipment failure
- Response: Root Cause Investigation; VP Event Free Clock Reset
Medium Significance
- Examples: Injury; dropped customers; equipment damage; violation of Lock Out Tag Out; Near Hit
- Response: Apparent Cause Evaluation; VP Event Free Clock Reset
Lower Significance
- Examples: Injury requiring first aid; tool used not in good working condition; Near Hit
- Response: ACE (optional); Capture and repair; Department Level Event Free Clock Reset
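The classification above can be read as a simple lookup from significance level to response. The sketch below is one way to express that; it assumes the levels pair with the responses in the order they appear on the slide, and the names `Response`, `RESPONSES`, and `investigative_response` are hypothetical, not part of any Exelon system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    investigation: str
    clock_reset: str

# Assumption: levels and responses are paired in slide order.
RESPONSES = {
    "highest": Response("Root Cause Investigation",
                        "VP Event Free Clock Reset"),
    "medium": Response("Apparent Cause Evaluation",
                       "VP Event Free Clock Reset"),
    "lower": Response("ACE (optional) or capture and repair",
                      "Department Level Event Free Clock Reset"),
}

def investigative_response(significance):
    """Look up the response for a significance level (hypothetical helper)."""
    return RESPONSES[significance.lower()]

for level in RESPONSES:
    r = investigative_response(level)
    print(f"{level:>7}: {r.investigation} / {r.clock_reset}")
```

A plain dictionary is enough here because the slide defines only three levels; deciding which level an actual event belongs to remains a judgment made by people, not something this sketch attempts.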
32
32 Measuring Progress
VP Event Free Clock Resets = Real-time measure of human performance = how we are doing in HP day-by-day
Distribution/Transmission Incident: System impact, similar to some other T&D utilities’ “operating incident” definitions; matches current EPRI recommendation. For example:
• Customer interruption
• Functional damage to system equipment - repair / rework to restore
• Improper operation or misalignment of system devices
• Unintentional de-energizing or energizing lines / equipment
• Customer outages extended by human performance errors > 2 hours
• Improper Zone of Protection (lockout/tagout) established or ZOP violated
OSHA Recordable injury caused by human performance (Safety Incident)
Other HP VP EFC resets per procedure, generally consequence based
• RVA, flash (no injury), ZOP admin violation, property damage >$5K, plus more
33
Measuring Progress Continued
Non VP Resets:
• Department Level Resets
• Near Miss (Near Hits)
− EEI definition: An unintended event which has real potential to cause injury or illness to employees, contractors working on site, or the general public, but does not
− Another perspective that drives us to investigate these: An adverse event narrowly avoided because of luck or unplanned timely intervention. Otherwise known as a "close call". The causes and circumstances are the same as the serious incident that almost occurred.
• Precursor Conditions (which can lead to an event if not addressed)
• Any condition adverse to quality
34
34 Reactive Incident Investigation
35
Event Investigations
Event “Investigation” = Causal Analysis + CA (Event “Investigation” ≠ Criminal Investigation)
Apparent Cause Evaluations (ACE)
• Events of limited consequence that require structured analysis to reduce the likelihood of repeats, 1-2 people
Root Cause Investigations (RCI)
• Events of significant consequence that require deep, multi-department / multi-discipline team investigation
Common Cause Analysis (CCA)
• Analyzes multiple events to identify commonalities that may be indicative of programmatic or organizational problems
36
36 What is Causal Analysis? (AKA Root Cause Analysis)
The methodology of the incident investigation
Structured step-by-step methodology to:
• Identify the cause of a problem or event – why it happened
• Determine lasting actions to prevent recurrence or mitigate its consequences
RCA techniques provide a structured methodology to:
• Prevent “jumping to a conclusion”
• Rule out opinions, pre-conceived notions, off-target causes, emotion, etc.
• Find all relevant causes
• Force you past “what” and “how”
  − Prevents only treating symptoms
  − To get to the “cause”, you must get past what happened and how it happened and keep asking “why”…
37
37
A Causal Factor is any problem associated with the incident that, if non-existent, could have prevented the incident from occurring or would have significantly mitigated its consequences:
• Causal factors are “conditions” on the SnapChart
• Sometimes known as “Direct Cause” or failure mode (how not why)
• Ask “if this condition did not exist or was different, could it have affected the outcome?”
• Something incorrect, inappropriate, or undesired about the action; deviation from the desired action; or no action when we should have had action; etc. E.g., flawed barrier
• E.g., someone deviated from a procedure, or equipment operation deviated from design or intent
• Fix this and you may only address symptoms or the individual
Causal Factor vs. Cause
A Cause is “why” the causal factor existed.
• The Root Cause is the “most basic cause (or causes) that can reasonably be identified that management has control to fix and, when fixed or eliminated, will prevent (or significantly reduce the likelihood or consequences of) the problem's recurrence.”
38
38 Corrective Actions
The main purpose of all this effort: To prevent the event from happening again
Simply: Action to be taken to fix the problem or cause
Attributes of a quality Corrective Action:
• Have the end in mind: “How is the action going to result in improvement?”
• Clearly linked to the identified cause or problem
• S-Specific
• M-Measurable
• A-Accountable
• R-Reasonable
• T-Timely
• E-Effective
• R-Reviewed
Who should define the Corrective Action:
• The team suggests - the CA Owner refines and buys in
• The team must have full acceptance from the assignee for scope and due date
Importance of Corrective Actions:
• If the action is not the right one, is not implemented correctly, or is not timely, it will not get the needed results (No performance improvement, risk of repeat condition)
• “Ineffective actions = Repeat Conditions = Wasted time”
39
39 Corrective Actions
Strength of Corrective Actions
Type of CA, listed from Strongest to Weakest:
• Remove or substantially reduce the hazard
• Remove the Target (Human, equipment, environment, etc.)
• Guard the Target (Human, equipment, environment, etc.)
• Improve human performance via good Human Factors Design (Man-Machine interface)
• Improve human performance via Rules, Procedures, Signs … (Field Bulletins, Safety Alerts)
• Improve human performance via Training, Supervision …
Error rates by performance mode:
• Skill Based: 0.1% error
• Rule Based: 1% error
• Knowledge Based: 10-50% error
40
40 Has All This Made A Difference?
[Chart: 12 Month Rolling HP Incident Rate, Dec 2004 to April 2016. Vertical axis 0.00 to 10.00; horizontal axis Dec-04 to Apr-16 in four-month steps.]
Employee + Contractor
• Since inception ComEd has reduced the 12-month rolling VP Clock Reset rate by 75%
• The goal is event-free performance
2500!!
41
ComEd’s SIF Definitions
Serious Injury and Fatality (SIF)
• SIF Actual (SIF-A)
  − A fatality
  − A life-threatening injury or illness that, if not immediately addressed, will lead to the death of the affected individual, and usually requires the intervention of internal and/or external emergency response personnel to provide life-sustaining support
  − A life-altering injury or illness that results in permanent or long-term impairment or loss of use of an internal organ, body function, or body part
• SIF Potential (SIF-P)
  − Physical event where, if one or two circumstances/factors could have reasonably changed, there is a high probability that the outcome could have become a SIF-A
42
EU SIF Exposure Decision Tree Database
42
CR text is entered here
All four utilities use the same criteria (SA-EU-P006)
43
Event Response Summary – 2Q17
Investigative Response to SIF Events by Exposure:

Exposure     ACE   RCI   CR Only   CR w/CA's   Total
Vehicle        0     0         5           0       5
LOF            2     1         5           0       8
Flash          4     0         2           0       6
SFL            1     0         1           0       2
Contact        0     1         0           0       1
Fall           0     1         0           0       1
Fire/Expl      0     0         2           0       2
Attack         0     0         1           0       1
Unlisted       0     0         1           0       1
Total          7     3        17           0      27
%            26%   11%       63%          0%    100%
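The totals and percentages in the summary can be rechecked from the per-exposure counts. The sketch below uses the counts as they appear on the slide; the variable and function names are illustrative assumptions.

```python
COLUMNS = ["ACE", "RCI", "CR Only", "CR w/CA's"]

# Counts per exposure type, in COLUMNS order, as shown on the slide.
COUNTS = {
    "Vehicle":   [0, 0, 5, 0],
    "LOF":       [2, 1, 5, 0],
    "Flash":     [4, 0, 2, 0],
    "SFL":       [1, 0, 1, 0],
    "Contact":   [0, 1, 0, 0],
    "Fall":      [0, 1, 0, 0],
    "Fire/Expl": [0, 0, 2, 0],
    "Attack":    [0, 0, 1, 0],
    "Unlisted":  [0, 0, 1, 0],
}

def column_totals(counts):
    """Sum each response column across all exposure types."""
    return [sum(col) for col in zip(*counts.values())]

def percentages(totals):
    """Share of each column in the grand total, rounded to whole percent."""
    grand_total = sum(totals)
    return [round(100 * t / grand_total) for t in totals]

totals = column_totals(COUNTS)
print(dict(zip(COLUMNS, totals)), "grand total:", sum(totals))
print(dict(zip(COLUMNS, percentages(totals))))
```

The recomputed figures agree with the Total and % rows on the slide: 7, 3, 17 and 0 events out of 27, or 26%, 11%, 63% and 0%.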
44
2Q17 YTD SIF vs. Classic Triangle Ratios
Classic Pyramid (Classic Theory), 2Q17
  93    Significant Events (VP EFC & SINF)
  1295  DL                  Ratio = 14
  8262  PC / Audit Dev      Ratio = 88

SIF Exposure Pyramid (SIF Theory), 2Q17
  27    SIF Exposures
  153   DL *                Ratio = 6
  993   PC / Audit Dev *    Ratio = 37

* SIF Precursors (Vulnerabilities): High-risk situations in which management controls are either absent, ineffective, or not complied with, and which will result in a serious or fatal injury if allowed to continue.
  • Captured as PC / DL / Audit Dev
Desired Ratios: DL > 10; PC/Audit > 100
Counts are employee only
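The pyramid ratios are each lower tier's count divided by the count at the top of the pyramid. A minimal sketch using the slide's counts follows; the function names are illustrative assumptions. Note that 8262 / 93 is about 88.8, so the slide's 88 appears to be truncated rather than rounded.

```python
def pyramid_ratios(top_count, dl_count, pc_audit_count):
    """Return (DL ratio, PC/Audit ratio) relative to the top tier."""
    return dl_count / top_count, pc_audit_count / top_count

def meets_desired(dl_ratio, pc_audit_ratio):
    """Desired ratios from the slide: DL > 10 and PC/Audit > 100."""
    return dl_ratio > 10 and pc_audit_ratio > 100

classic = pyramid_ratios(93, 1295, 8262)   # significant events at the top
sif = pyramid_ratios(27, 153, 993)         # SIF exposures at the top

for name, (dl, pc) in {"Classic": classic, "SIF": sif}.items():
    print(f"{name}: DL ratio {dl:.1f}, PC/Audit ratio {pc:.1f}, "
          f"meets desired ratios: {meets_desired(dl, pc)}")
```

On these counts neither pyramid meets both desired ratios, which suggests that lower-tier reporting (precursors, audit deviations) was still below the stated targets for the period.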
45
What did we learn along the journey?
• It’s not about the person
• Program engagement from top down and bottom up
• Less is more. Fewer, better investigations & CA’s
• Plan obsolescence – don’t wait for paralysis
• Don’t build it from scratch or in the garage
• Have an expert/ambassador in every department
• Communicate but keep it real and simple
• 100-year-old culture can be changed
• Learn from success as well as failure (yours and others)
The Edison Electric Institute (EEI) is the association that represents the U.S. investor-owned electric industry. Our members provide electricity for 220 million Americans, operate in all 50 states and the District of Columbia, and directly employ more than 500,000 workers. Safe, reliable, affordable, and clean electricity powers the economy and enhances the lives of all Americans. The EEI membership also includes dozens of international electric companies as International Members, and hundreds of industry suppliers and related organizations as Associate Members. Since 1933, EEI has provided public policy leadership, strategic business intelligence, and essential conferences and forums for the energy industry. For more information, visit our Web site at www.eei.org.