staying alive: patterns for failure management from the bottom of the ocean - ronnie chen -...


Upload: codemotion

Post on 05-Apr-2017


STAYING ALIVE: PATTERNS FOR FAILURE MANAGEMENT FROM THE BOTTOM OF THE OCEAN

RONNIE CHEN, SLACK

1 — Ronnie Chen @rondoftw

WHY DID I BECOME A DIVER?

A WHOLE NEW WORLD

TECHNICAL DIVING

▸ longer dive times
▸ deeper dives
▸ overhead ceiling
▸ decompression obligations
▸ more gear. a lot more.
▸ higher pressure
▸ more risks

RISKS MAY INCLUDE...

1. hypoxia
2. hyperoxia
3. nitrogen narcosis
4. carbon dioxide buildup
5. oxygen sensor failure
6. deep tissue isobaric counterdiffusion (ICD)
7. high pressure nervous syndrome (HPNS)
8. software failure
9. exhausting your carbon dioxide scrubber
10. carbon dioxide channeling from a poorly packed scrubber
11. carbon buildup causing a spark leading to an oxygen fire. underwater.
12. flooding of breathing loop or circuitry
13. water mixing with the scrubbing agent to produce a toxic caustic soda that will give you chemical burns on your mouth, airway, and lungs
14. plain old decompression sickness

If you own a rebreather for five years, two percent of you are going to die on it.

— Jill Heinerth, underwater explorer


HOLD UP!

THIS IS A TALK ABOUT COMMUNICATION AND PROCESS

? ? ?

(it was a trap)

YOU CAME TO HEAR COOL STORIES...

BUT YOU'RE GETTING A MEANDERING MEDITATION ON BEST PRACTICES* WHEN DEALING WITH COMPLEX SYSTEMS INSTEAD

* These guidelines have only been shown to work for life or death situations under the ocean. They have not been proven to work for tech.

How failures really happen

Complex systems are designed to protect against simple failures.

But accidents still happen.

CATASTROPHES ARE CAUSED BY A FAILURE CASCADE

▸ you have a rebreather malfunction
▸ which you would have caught if you were testing your equipment on a regular basis
▸ your backup tank had a leak and is running low and that wasn't caught either
▸ and your buddy is too far away and isn't checking in with you
▸ and your dive light that you use to communicate at a distance is out of power
▸ and in the excitement you kick up silt and the visibility drops
▸ and in your panic your air consumption goes up and then you breathe through the last of the air in your tank
▸ so you swim for the surface even though you have a decompression obligation

A post-mortem that blames this incident on a simple mechanical malfunction would only cover 12.5% (one in eight) of the issues that led up to this accident.
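The 12.5% figure follows from counting the links in the cascade above: eight contributing factors, of which a single-cause post-mortem names one. A minimal sketch of that arithmetic, with the factor labels paraphrased from the slide and the `coverage` helper being a hypothetical name introduced only for illustration:

```python
# Contributing factors from the failure cascade above (paraphrased labels).
CASCADE = [
    "rebreather malfunction",
    "equipment not tested regularly",
    "leaking backup tank not caught",
    "buddy too far away, not checking in",
    "dive light out of power",
    "silt kicked up, visibility lost",
    "panic raised air consumption",
    "ascent despite decompression obligation",
]


def coverage(named_causes, all_factors):
    """Fraction of the contributing factors that a post-mortem names."""
    named = set(named_causes) & set(all_factors)
    return len(named) / len(all_factors)


single_cause = coverage(["rebreather malfunction"], CASCADE)
print(f"{single_cause:.1%}")  # 12.5%
```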

Complex system failures don't happen because a single part of the system fails. They happen because all the safety procedures that were supposed to protect the system from that simple failure didn't work.

CORE RULES OF SAFETY SYSTEMS

1. An unused safety system doesn't exist.


NORMALIZATION OF DEVIANCE

That natural human tendency, particularly in pressure circumstances, to take a safety shortcut. To accept a lower standard of performance.

— Colonel Mike Mullane, astronaut

FALSE FEEDBACK
the absence of something bad happening means that it was safe

ADAPTATION
experience is no longer a suitable gauge of risk

SOCIAL PRESSURE
this is just how we do things

CORE RULES OF SAFETY SYSTEMS

2. An untested safety system doesn't exist either!

CORE RULES OF SAFETY SYSTEMS

3. Unused or untested safety systems are more dangerous than not having one at all. Therefore, safety systems must be tested at regular intervals.

The length of this interval should be determined not only by how likely it is for this system to fail but also how great the impact will be if it does.

A QUICK SIDENOTE ON ASSESSING RISK

▸ Make assessments based on likelihood of occurrence.
▸ Make assessments based on magnitude of regret.

If you are only evaluating risk based on the chance of it happening, you must be prepared to experience the corresponding level of regret if it does.
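One way to see why both bullets matter is to rank the same risks twice. The sketch below is an assumption-laden illustration, not something from the talk: the risk names and numbers are made up, and multiplying likelihood by regret is just one simple way to combine the two assessments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float  # assumed chance of occurring, 0.0 to 1.0
    regret: float      # assumed cost if it occurs, arbitrary units


# Hypothetical risks with invented numbers, for illustration only.
RISKS = [
    Risk("flaky deploy script", likelihood=0.20, regret=1),
    Risk("slow dashboard", likelihood=0.50, regret=0.2),
    Risk("unrecoverable data loss", likelihood=0.01, regret=1000),
]


def by_likelihood(risks):
    """Rank by chance of occurrence alone."""
    return sorted(risks, key=lambda r: r.likelihood, reverse=True)


def by_likelihood_and_regret(risks):
    """Rank by likelihood weighted by magnitude of regret."""
    return sorted(risks, key=lambda r: r.likelihood * r.regret, reverse=True)


print(by_likelihood(RISKS)[0].name)             # slow dashboard
print(by_likelihood_and_regret(RISKS)[0].name)  # unrecoverable data loss
```

With these invented numbers, ranking by likelihood alone puts the rare, high-regret failure last, which is the trap the slide warns about.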

failures will happen

WHAT IS SAFETY?

FAILURE MANAGEMENT

▸ A framework for redundancy
▸ The training and judgment to use it

FAILURE MANAGEMENT FOR SYSTEMS

▸ Have redundancy for systems that you cannot survive without.
▸ Have a redundant pathway to success: a procedure for graceful degradation for systems that are important but not critical.
▸ Have a process for changing over from primary to redundant systems.
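In software terms, the three bullets above might look roughly like the sketch below. Everything here is hypothetical and chosen for illustration: the `Unavailable` exception, the two helper functions, and the stand-in `broken` backend are not from the talk.

```python
class Unavailable(Exception):
    """Raised by a hypothetical backend that cannot serve a request."""


def call_with_failover(primary, backup):
    """Critical system: change over to the redundant one if the primary fails."""
    try:
        return primary()
    except Unavailable:
        return backup()


def call_with_degradation(feature, fallback_value):
    """Important but not critical: degrade gracefully to a simpler answer."""
    try:
        return feature()
    except Unavailable:
        return fallback_value


def broken():
    """Stand-in for a primary system that is down."""
    raise Unavailable("primary is down")


print(call_with_failover(broken, lambda: "served by backup"))  # served by backup
print(call_with_degradation(broken, []))                       # []
```

The changeover logic lives in one small, named function so that it can be exercised on its own, rather than being discovered for the first time during a real failure.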

FAILURE MANAGEMENT FOR SYSTEMS (CONT)

▸ Keep failures contained so that they don't bring down other systems
▸ Make it easy to do the right thing and hard to do the dangerous things
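One common software technique for containing failures is a circuit breaker. The talk does not name it; it is swapped in here as an example, and the class and parameter names are assumptions. The sketch is deliberately minimal: it counts consecutive failures and has no timed recovery.

```python
class CircuitOpen(Exception):
    """Raised when calls are refused because the dependency looks unhealthy."""


class CircuitBreaker:
    """Stop calling a failing dependency after repeated consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, func):
        if self.failures >= self.max_failures:
            # Refuse immediately so callers are not dragged down too.
            raise CircuitOpen("dependency disabled after repeated failures")
        try:
            result = func()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success clears the failure count
        return result

    def reset(self):
        """Manually close the circuit once the dependency is repaired."""
        self.failures = 0
```

Once the breaker is open, callers get a fast, explicit `CircuitOpen` instead of waiting on a system that is already down, which keeps one failure from spreading to its neighbours.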

FAILURE MANAGEMENT FOR HUMAN SYSTEMS

TRAINING FOR PRESSURE

TRAINING: INEXPERIENCED PEOPLE TO THE FRONT

▸ Most inexperienced person leads
▸ Experienced person advises and intervenes only when necessary
▸ Team is invested in personal success to ensure mission success

TRAINING: INEXPERIENCED PEOPLE TO THE FRONT (CONT)

▸ Frees up more experienced people from micromanaging
▸ Opportunity to revise and improve problematic systems
▸ One of the best ways to equalize a gap in experience

GOOD JUDGMENT

Good judgment enables the reshaping of rules and frameworks to adapt to a changing environment.

REFINING JUDGMENT

▸ Post-Mortems
▸ Pre-Mortems
▸ Fire Drills
▸ Revisit Past Decisions

POST-MORTEMS

▸ Look at the safety procedures that failed to stop the cascade
▸ Look for opportunities to create new safety systems at critical points

PRE-MORTEMS

▸ Don't wait for failures to build safety frameworks
▸ Identify potential avenues of failure and make plans for them
▸ Include both likely failures and high regret failures

FIRE DRILLS

▸ Vet your plans and safety systems
▸ Perform targeted training
▸ Evaluate effectiveness of tools and documentation

REVISIT PAST DECISIONS

▸ Examine successful operations to see what key insights were helpful
▸ Identify any dependency on luck in previous projects
▸ Share rationale for decisions

RECOGNIZING SUCCESS


I WANT TO LEARN MORE!

1. Diane Vaughan - The Challenger Launch Decision
2. Richard I. Cook - How Complex Systems Fail
3. Mike Mullane - https://www.youtube.com/watch?v=Ljzj9Msli5o
4. Steve Lewis aka decodoppler - Staying Alive
5. Sidney Dekker - Drift into Failure

Any Questions?
