CS5032 Lecture 10: Learning from failure 2
LEARNING FROM FAILURE 2
DR JOHN ROOKSBY
IN THIS LECTURE
This lecture will focus on how accidents and serious incidents are investigated and analysed
• When do investigations happen?
• How are they conducted?
• What analytical methods are used?
INVESTIGATIONS
Accidents and failures need to be investigated.
• Investigations enable you to identify the (most likely) causes of accidents and failures
• The causes and conditions leading up to accidents and incidents are often complex, and immediate reactions may be wrong
• Investigations seek to uncover underlying causes, not just the immediate causes
• Investigations should also address how future accidents can be avoided
• In this context, investigations are primarily intended to prevent future occurrences rather than to establish responsibility
INVESTIGATIONS
The basic steps of an investigation are
1. Collection phase: Evidence and facts are sought
2. Analysis phase: The evidence and facts are analysed and opinions invited from experts and other parties
3. Judgements phase: Judgements are made about the causes of an incident or accident and the associated responsibilities
4. Follow up: Recommendations should be made on how to stop similar problems happening again.
In practice, the process will be iterative.
INVESTIGATIONS
There are limitations on the investigation process. Investigations can be costly.
• We may never know all the facts. With complex systems, it can be very hard or impossible to know everything that happened in the run-up to an incident. In major accidents, sources of evidence may be damaged or lost.
• There will always be subjective views and uncertainties, especially around human actions.
Judgements need to be made about the extent to which an incident can be investigated.
Investigations often conclude with the “likely” or “probable” causes rather than a definitive version of events
WHO INVESTIGATES?
The scope and emphasis of an investigation are likely to reflect the position of the investigator
An investigator ought to be independent.
• In practice, this can be hard to achieve.
• Some industries have an official, independent investigation organisation.
• In the event of a major incident, a ‘public inquiry’ may be held, in which the evidence and investigative process are made public and so open to scrutiny.
ANALYSIS
The analysis phase of an investigation needs to explore and evaluate often complex information.
Experts and specialists may need to be involved at this point.
There is no standard method for analysing an accident, and there is continuing debate about how this is best done.
Approaches include:
• Narrative approaches
• Causal chains
• Systems approaches
NARRATIVE APPROACHES
All accident investigations will produce a narrative of some kind. Many reports are purely a narrative and a set of conclusions. A narrative is a written account of an incident or accident.
• Producing one can be non-trivial, because it can be difficult to structure events (many of which may have occurred simultaneously, and many of which may be ambiguous) into a linear document.
Producing a narrative is a key step in making sense of an incident
Narrative accounts have serious limitations, however. It is difficult to evaluate their depth and coverage, and they tend to ‘storify’ complex events.
“ROOT CAUSE” APPROACHES
Many approaches have been developed to systematically identify the root causes of an incident. These approaches are based on the idea that the immediate events in an incident are symptoms of a much deeper problem.
Root cause analysis techniques usually express events as a chain. The chains often branch, and multiple chains can be synchronised to represent parallel events.
• Examples: MORT (Management Oversight and Risk Tree), FMEA (Failure Mode and Effects Analysis), Barrier analysis, WBA (Why-Because Analysis)
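The causal-chain idea can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not an implementation of MORT, FMEA, barrier analysis or WBA: events are recorded as a graph mapping each event to its direct causes, so a chain branches wherever an event has more than one cause. Tracing runs backwards from the incident, and the assumed `max_depth` parameter makes the stopping rule explicit. The event names and the functions `trace_causes` and `stopping_points` are invented for illustration only.

```python
from collections import deque

# Hypothetical causal graph (illustrative only): each event maps to the
# events recorded as its direct causes. Several causes = a branching chain.
CausalGraph = dict[str, list[str]]

example_graph: CausalGraph = {
    "system outage": ["config error deployed", "monitoring alert missed"],
    "config error deployed": ["manual edit in production", "no review step"],
    "monitoring alert missed": ["alert fatigue"],
    "no review step": ["deadline pressure"],
    "alert fatigue": ["too many low-priority alerts"],
}


def trace_causes(graph: CausalGraph, incident: str, max_depth: int) -> dict[str, int]:
    """Walk backwards from `incident` breadth-first and return every event
    reached, with its distance (in causal steps) from the incident."""
    depth = {incident: 0}
    queue = deque([incident])
    while queue:
        event = queue.popleft()
        if depth[event] >= max_depth:
            continue  # explicit stopping rule: look no further back
        for cause in graph.get(event, []):
            if cause not in depth:  # a shared cause is only recorded once
                depth[cause] = depth[event] + 1
                queue.append(cause)
    return depth


def stopping_points(graph: CausalGraph, depth: dict[str, int], max_depth: int) -> list[str]:
    """Events where tracing stopped: either no further causes are recorded,
    or the depth limit was reached. These are the candidate 'root causes'."""
    return sorted(e for e, d in depth.items() if d == max_depth or not graph.get(e))


if __name__ == "__main__":
    for limit in (2, 3):
        reached = trace_causes(example_graph, "system outage", limit)
        print(limit, stopping_points(example_graph, reached, limit))
    # 2 ['alert fatigue', 'manual edit in production', 'no review step']
    # 3 ['deadline pressure', 'manual edit in production', 'too many low-priority alerts']
```

Raising `max_depth` from 2 to 3 changes which events are reported as candidate root causes, which is one concrete way to see that the answer depends on where the analyst decides to stop tracing.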
“ROOT CAUSE” APPROACHES - LIMITATIONS
The stopping problem
• A causal chain could in theory go backwards indefinitely.
The proximity problem
• A root cause is often found to be something proximal to the accident (often a human operator).
The causation problem
• Hindsight and investigative biases frame particular actions in terms of their contributions to an outcome.
However, this does not mean that it is wrong to try to identify underlying causes
Investigations usually refer to the “likely” or “probable” root causes
SYSTEMS METHODS
Systems methods for accident analysis have come into use over the last decade.
• From this perspective, accidents result from inadequate control or enforcement of safety-related constraints on the development, design, and operation of the systems.
Systems methods emphasise the controls on a system rather than the system itself. This recognises that no system is inherently safe, and that systems (particularly socio-technical systems) adapt and change over time.
• A key approach is STAMP (Systems-Theoretic Accident Model and Processes).
SYSTEMS METHODS
Key criticisms of systems models
• They are often used as a means of pursuing and attributing blame to high level people in an organisation
• They can turn attention too far away from the actual design and implementation of the technology
HINDSIGHT AND FORESIGHT
It is essential to learn from our mistakes, but we should not wait for accidents to happen before we try to improve the dependability of systems. How can we predict problems that may occur? How can we ensure systems are resilient to possible problems?
Several of the methods mentioned in this lecture can be used to follow through the consequences of possible problems or failures.
Predicting possible causes and consequences of failure, unless in very narrow circumstances, can involve many arbitrary decisions.
COLUMBIA
http://en.wikipedia.org/wiki/Space_Shuttle_Columbia
INVESTIGATION
The Columbia Accident Investigation Board was an independent board set up to analyse the Columbia disaster.
• 13 board members and many investigators
• Investigation took around 5 months
• Cost approximately 17 million dollars
• 230-page report produced
The proximal cause was fairly clear from the outset. The investigation sought to focus on underlying causes.
• The investigation focused on organisational, historical, budgetary and political factors in the shuttle programme
• The questions centred on the fact that foam strikes were routinely ignored
KEY POINTS
Investigations are important for learning from failures.
Investigations often show that initial assumptions about the cause of an incident are wrong or partial. They aim to find underlying or “root” causes.
All investigations involve some sort of judgement. Investigations should be as neutral as possible, but in practice this is difficult to achieve.
There are many methods for analysing an incident or accident, and no single right way to do this.
FURTHER READING
MAIB – Marine Accident Investigation Branch
• http://www.maib.gov.uk/home/index.cfm
AAIB – Air Accidents Investigation Branch
• http://www.aaib.gov.uk/home/index.cfm
Columbia Accident Investigation Board
• http://caib.nasa.gov/