Learning from Incidents at Auto Trader
@4ndyHumphrey
Learning from Failure at Auto Trader
Learning from Incidents
What is a Learning Organisation?
What is the Reality?
What are my Choices?
Incident Reviews - things to Avoid
Incident Reviews - things to Encourage
What about holding people to Account?
A bit on Our process
Our Customers
Private Car Sellers: 30,000
Trade Car Dealers: 15,000
Our People
Auto Trader Staff: 850
Product & Tech Teams: 275
Our Technology Platform
1.2 billion page views per month
70 million peak page views per day
15 million unique visitors per month
Supported by 100 live applications
Further Reading up front
Links:
John Allspaw - The Infinite Hows
Steven Shorrock - If It Weren't for the People
EuroControl - Systems Thinking for Safety
Lindsay Holmwood - Blame-Language-Sharing
Sidney Dekker - Just Culture
Books:
Black Box Thinking – Matthew Syed
Field Guide to Understanding Human Error – Sidney Dekker
Beyond Blame – Dave Zwieback
Engineering a Safer World – Nancy Leveson
People:
Steven Shorrock
Erik Hollnagel
Sidney Dekker
Matthew Syed
John Allspaw
Lindsay Holmwood
Dave Zwieback
Nancy Leveson
What is a Learning Organisation?
The Loom
A Learning Organisation
Why should I want to learn?
Moral Responsibility
Job Satisfaction
Economic Imperative
What’s the reality?
Blame management
Blame - Fundamental Attribution Error
Blame - Justice
Blame - Hindsight
Blame – Bad Apple Theory
Blame – Ignoring context
Jonathan Caramanus/Green Renaissance/wwf.org.uk
Blame - It’s Easy
What are my choices?
Things will always go wrong
https://www.youtube.com/watch?v=EvegBo4TUdQ
You can blame people…
Or say it’s a one off…
Or you can look at the context…
…Learn and make changes
But it is a choice:
“Blame is the enemy of safety…” Nancy Leveson
“Whenever there is fear, you will get wrong figures.” W. Edwards Deming
Incident Reviews: Things to avoid
Culture of fear
Top down
Asking Why?
Questioning styles: Dilts Model
Environment: Contexts – WHERE?
Capabilities: Methods, Approaches – HOW?
Behavior: Skills and Actions – WHAT?
Values and Beliefs: Motivation and permission – WHY?
Identity: Sense of Self, Role – WHO?
Don’t go too Deep!
Dilts Model
Environment: Contexts – WHERE?
Capabilities: Methods, Approaches – HOW?
Behavior: Skills and Actions – WHAT?
Values and Beliefs: What is important/true – WHY?
Identity: Sense of Self – WHO?
Single Root Cause
Points scoring
Incident Reviews: How to encourage learning
Priming
Keep an open mind
Explore how events unfolded
Incident Review Prompts (from The Field Guide to Understanding Human Error, by Sidney Dekker)
At each juncture in the sequence of events (if that is how you want to structure this part of the accident story), you want to get to know:
• Which cues were observed (what did he or she notice/see or did not notice what he or she had expected to notice?)
• What knowledge was used to deal with the situation? Did participants have any experience with similar situations that was useful in dealing with this one?
• What expectations did participants have about how things were going to develop, and what options did they think they have to influence the course of events?
• How did other influences (operational or organizational) help determine how they interpreted the situation and how they would act?
Here are some questions Gary Klein and his researchers typically ask to find out how the situation looked to people on the inside at each of the critical junctures:
Cues: What were you seeing? What were you focusing on? What were you expecting to happen?
Interpretation: If you had to describe the situation to your colleague at that point, what would you have told?
Errors: What mistakes (for example in interpretation) were likely at this point?
Previous experience/knowledge: Were you reminded of any previous experience? Did this situation fit a standard scenario? Were you trained to deal with this situation? Were there any rules that applied clearly here? Did any other sources of knowledge suggest what to do?
Goals: What were you trying to achieve? Were there multiple goals at the same time? Was there time pressure or other limitations on what you could do?
Taking action: How did you judge you could influence the course of events? Did you discuss or mentally imagine a number of options or did you know straight away what to do?
Outcome: Did the outcome fit your expectation? Did you have to update your assessment of the situation?
Communications: What communication medium(s) did you prefer to use (phone, chat, email, video conf, etc.)? Did you make use of more than one communication channel at once?
Help: Did you ask anyone for help? What signal brought you to ask for support or assistance? Were you able to contact the people you needed to contact?
Debriefings need not follow such a scripted set of questions, of course, as the relevance of questions depends on the event. Also, the questions can come across to participants as too conceptual to make any sense. You may need to reformulate them in the language of the domain.
Timelines
1. Factual timeline entries can be filled in prior to the Review Meeting
14:00 Alert received from Site Confidence
15:15 Incident communication sent
16:00 Incident closure comms sent
2. As a group, overlay the basic timeline with key decisions and junctures
13:10 Slow server performance observed by Bill
14:20 Bill spoke to John about SC issues and decided to recover DB
15:50 John finished DB recovery
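The two timeline steps can be sketched in a few lines of Python. This is only an illustration, not Auto Trader's tooling: the function name `overlay` and the "fact"/"decision" tags are assumptions made for this example, and the entries are the ones from the slide.

```python
from datetime import time

# Step 1: factual entries, filled in before the review meeting.
factual = [
    (time(14, 0), "Alert received from Site Confidence"),
    (time(15, 15), "Incident communication sent"),
    (time(16, 0), "Incident closure comms sent"),
]

# Step 2: key decisions and junctures, added by the group in the review.
decisions = [
    (time(13, 10), "Slow server performance observed by Bill"),
    (time(14, 20), "Bill spoke to John about SC issues and decided to recover DB"),
    (time(15, 50), "John finished DB recovery"),
]


def overlay(factual, decisions):
    """Merge both lists into one timeline ordered by time.

    Each entry is tagged with where it came from (hypothetical labels),
    so the factual backbone stays visible once decisions are overlaid.
    """
    tagged = [(t, "fact", text) for t, text in factual]
    tagged += [(t, "decision", text) for t, text in decisions]
    return sorted(tagged, key=lambda entry: entry[0])


timeline = overlay(factual, decisions)
for t, kind, text in timeline:
    print(f"{t:%H:%M} [{kind}] {text}")
```

Keeping the source tag on each entry is a design choice for the sketch: it lets the group see at a glance which entries were known up front and which emerged from the conversation.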
One conversation
Actions
Impartial facilitator
Investigate what went well
Practice – make it habit
What about holding people to account?
Accountability
Our process:
Major Incidents
High Severity Incidents
Failed Releases (all)
Failed Changes (Large)
Our Process
Priming – Timeline - Actions
Why we are here:
We understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.
We are here to learn and find solutions to improve our ways of working.
Things that help us learn:
Open Minded
Go back in time
No single ‘Root Cause’
How not Why
Things that stop us learning:
Blaming people
Human Error
Arse Covering
Points scoring
‘Trying Harder’
Talking over people
After the review:
• Incident details recorded
• Actions (owners, dates) recorded
• Owned by Service Management Team
Questions?