about fmeas -asq(handout)ascendantconsulting.net/ftpdocs/pdfs/about fmeas... · of the three...
Post on 07-Jun-2020
1
Informational
Brief
2012, All Rights Reserved
Some Things You May Not Know about FMEAs
• Introduction (critique of UA Flight 232)
• Risk Management Approaches
• Issues with the FMEA Approach
• Suggestions for Improvement
“It is far better to grasp the universe as it really is than persist in delusion, however satisfying and reassuring.”
Carl Sagan, Astronomer and Writer
As quality professionals we are responsible for assuring the overall quality performance of the business we support. This responsibility covers not only the quality of the products, services, and processes in the business but also the risk management of the systems that are tightly linked to quality performance. It is imperative that we consider not only the improvement of the business systems we support, but also the tools and methods we use to guide our decisions about those improvements.
The purpose of this presentation is to give attendees some insight into the methods and tools used to manage risk in their organizations. One of the more popular methods of risk management, in use since the early 1960s, is Failure Mode and Effects Analysis, or FMEA. This presentation focuses on some of the well-documented shortcomings of the FMEA method and the ramifications of poorly assessed conditions of risk. At the close of the presentation we offer some suggestions for how the FMEA method might be improved to provide better risk determinations and the management decisions that follow from them.
2
Slide 2
Executive Summary:
Some Things You May Not Know About FMEAs*
• RPNs provide limited Risk Discrimination.
• Prediction overconfidence is common.
• Expert judgments and claims are not consistent.
• No empirical evidence that Risk Rating methods yield useful decision-making information.
*Failure Mode and Effects Analysis
Rather than hold the audience in suspense throughout the presentation, this slide lists four areas of concern that many users of the FMEA method may not know about. The viewer should be aware that there are other concerns about the FMEA method, but we felt the four shown in this slide cover the largest factors affecting risk assessment using the FMEA approach.
In this presentation we will explain the limitations of Risk Priority Numbers (RPNs) as indicators
of risk, discuss some foundational work associated with human predictions and the pitfall of
overconfidence, look at the use of Subject Matter Expert (SME) claims and the challenges
therein, and close with a brief discussion on the lack of informative feedback supporting risk
rating methods. We hope you find this presentation both thought provoking and insightful.
3
Introduction
“Everyone is perfectly willing to learn from unpleasant experiences—if only the damage of the first lesson could be repaired.”
Lichtenberg, Scientist and Satirist
4
Slide 4
Basic Definition of Risk
• In the beginning:
Chance occurrence beyond the realm of human control that could cause loss or harm… (initially concerned with games of chance)
• Long definition:
The probability (or chance) and magnitude of loss (severity) of an unplanned and/or undesirable event.
• Short definition:
  • The chance that something bad could happen.
Since the beginning, mankind has struggled to understand the nature of chance occurrences and their relationship to loss or harm. It is useful to understand that there are many definitions of risk. Some are better than others, and some are simply confusing. We use this slide to bound the definition of risk as a foundation for the discussion.
5
Slide 5
Definition of Risk Management
• Long definition:
The identification, assessment and prioritization of risk, and the structured and economical application of resources to minimize and control the probability and impact of undesirable events.
• Short definition:
  • Being smart about taking chances…
When combining the definition of risk with the definition of management we gain a sense of
our mission as quality professionals. We trust this definition does not surprise you, and hope that you carry it with you as reference throughout the remainder of this presentation.
6
Slide 6
A Difficult Story
• On July 19, 1989, UA Flight 232 departed Denver, CO, for Chicago, IL.
• In the early afternoon the plane lost use of the rear engine and control of ALL flight surfaces.
• All maneuvering of the plane had to be done with thrust changes on the wing-mounted engines.
• UA 232 was rerouted to Sioux City, Iowa, for an emergency landing.
We want to start the presentation with a problem in order to establish the importance of this discussion. In essence, this first section should answer the question, “What’s in it for me?”
Without dwelling on this horrific accident, we want to use it as a point of departure for discussing the role and potential peril of risk management efforts. The rest of the slides should be self-explanatory.
7
Slide 7
• The crash of UA 232 resulted in 111 fatalities, with 185 passengers and crew surviving.
• The apparent cause was a failure of the rear engine fan disk. The engine had 17 years of service on it…
• A comprehensive failure assessment found that shrapnel from the fan disk failure severed a key line linking all three redundant hydraulic control systems.
• The claimed cause was human error in the inspection of the fan disk during service maintenance of the engine.
Common-Mode Failure
Results of the US NTSB assessment of the UA 232 crash.
8
Slide 8
DC-10 Tail View
[Figure: DC-10 tail drawing, annotated “Common-Mode Failure” and “Common-Mode System Failure”]
It is instructive to recognize that hidden within this simple drawing is a disaster waiting to
happen.
The Common-Mode Failure of the hydraulic system was due to the severing of a section of
piping common to all three “redundant” hydraulic systems that control the flight surfaces.
How did this happen? Why wasn’t it addressed during the design stage of the DC-10 aircraft? Why was there a belief that the chance of this line being damaged was one in a billion? What factors contributed to reducing the chance of complete hydraulic failure? What is the role of risk management during the design stage?
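One way a one-in-a-billion belief can arise is by multiplying the failure probabilities of three redundant systems as if they were fully independent. The sketch below is only an illustration of that arithmetic: the probabilities `P_SYSTEM` and `P_SHARED` are hypothetical assumptions and are not taken from any DC-10 analysis.

```python
# Hypothetical numbers for illustration only; none come from the DC-10 analysis.
P_SYSTEM = 1e-3   # assumed chance that one hydraulic system fails on a flight
P_SHARED = 1e-5   # assumed chance that an element common to all three is severed


def independent_loss(p_system: float, n_systems: int = 3) -> float:
    """Chance that every system fails when the failures are truly independent."""
    return p_system ** n_systems


def loss_with_common_mode(p_system: float, p_shared: float, n_systems: int = 3) -> float:
    """Chance of total loss when one shared element can disable every system.

    Total loss occurs if the shared element fails, or if it survives and
    all systems then fail independently.
    """
    return p_shared + (1.0 - p_shared) * independent_loss(p_system, n_systems)


if __name__ == "__main__":
    print(f"independent only : {independent_loss(P_SYSTEM):.2e}")
    print(f"with common mode : {loss_with_common_mode(P_SYSTEM, P_SHARED):.2e}")
```

Under these assumed values the shared element dominates the result, which is the point of a common-mode failure: redundancy helps only as far as the systems are truly separate.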
9
Slide 9
Underestimation of Failure Likelihood
Do you think the risk management methods used by
McDonnell Douglas should have identified this design flaw?
Certainly there were indications of design problems prior to the initial production of the
DC-10. Here is an example of a similar failure that occurred about 4 years prior to the crash of UA Flight 232.
10
Slide 10
Causal Ladder of Failure
Plane Unable to Land Safely
Loss of Control of all Flight Surfaces
Debris from Fan Disk Damaged Key Control Hydraulics
Stress Cracks in Disk Blades Caused Failure of Fan Disk
Stress Cracks in Disk Blades Missed During Maintenance Inspection
Risk Assessment Methods used to Identify Limitations of Human Inspection of Fan Disk were Ineffective!
Likely Common Mode Failure
Not a One-in-a-billion likelihood!
Proposed Common-Mode Failure
Cause Attributed to Human Error!
What were the underlying causal elements that led to the failure of UA Flight 232? The US NTSB report focused on “human error” as the predominant cause, even though it admitted that the stress crack on the fan disk would have been difficult to observe.
Given the nature of the common-mode failure, shouldn’t a team of risk professionals have caught this flaw at the design stage? What do you think?
11
Slide 11
Other Common-Mode Failures
• Hurricane Katrina.
• The Financial Crisis of 2008/09.
• On-board microprocessors in automotive, aviation, and other transportation applications.
• Embedded software applications in all computer-controlled devices and equipment.
• Climate Change.
• Supply disruption of raw resources: food, fuel, and water.
• Poor government policies that impact society.
12
Slide 12
Dramatization of UA Flight 232
• Follow the DiscoveryChannel.ca story of UA Flight 232 via this link:
http://watch.discoverychannel.ca/mayday/season-11/mayday-impossible-landing/#clip662372
13
Risk Management Approaches
“There is perhaps no beguilement more insidious and dangerous than an elaborate and elegant mathematical process built upon unfortified premises.”
Thomas C. Chamberlin, Geologist and Writer (1899)
In this section we take a quick look at three approaches commonly used to manage risk: Failure Mode, Effects and Criticality Analysis (FMECA), Fault Tree Analysis (FTA), and Failure Mode and Effects Analysis (FMEA).
Our focus in this section is on the FMEA approach. Feel free to view the slides in the Appendix for details on the other two approaches.
14
Slide 14
Two General Ways to Manage Risk
• Choose the areas to “optimize” risk reduction, subject to various constraints:
  • Budget constraints
  • Dependencies and interactions among sources, targets, and consequences
  • Dependencies among countermeasures
• Identify, document, and rank risk concerns, then tackle the largest perceived risks first:
  • Using Rating Scales of risk as a guide
  • Using Risk Matrices to identify key concerns
Adapted from webinar of 8-Nov-2012 by Dr. Tony Cox of Cox-Associates
This slide describes the two general ways of conducting risk management today.
The first way seeks to choose actions, called countermeasures, that provide the greatest possible risk reduction for the money spent. This approach is an optimization problem, and as such requires calculating the size of each risk reduction and how much of the budget is required to achieve it. Using this approach a team has a wide variety of options for employing countermeasures. For example, the team can consider the effects of a given countermeasure on identified risks other than the one it targets, and in doing so is able to optimize both cost and risk reduction simultaneously.
The second way involves ranking the risks identified in the system and working on the largest ones first. This is done because there are simply too many potential risks for the budget to handle. In this approach each potential risk is treated independently of the others, without considering any dependencies between risk events or interactions among the sources or consequences of the countermeasures. This approach lacks any real optimization, but it is considered much easier and less complicated than the first approach.
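The contrast between the two approaches can be sketched with a toy budget problem. Everything in the example is a hypothetical assumption made for illustration: the countermeasure names, ratings, costs, and risk reductions are invented, and the reductions are assumed to be simply additive (a real Type 1 analysis would also model dependencies among countermeasures).

```python
from itertools import combinations

# Hypothetical countermeasures: (name, risk rating of the concern addressed,
# cost, expected risk reduction). All values are invented for illustration.
COUNTERMEASURES = [
    ("A", 9, 60, 30.0),
    ("B", 7, 30, 25.0),
    ("C", 6, 30, 20.0),
    ("D", 3, 10, 8.0),
]
BUDGET = 70


def rank_and_fund(items, budget):
    """Ranking style: fund the highest-rated risks first until money runs out."""
    chosen, remaining = [], budget
    for name, _rating, cost, _reduction in sorted(items, key=lambda i: -i[1]):
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


def optimize(items, budget):
    """Optimization style: affordable subset with the largest total reduction."""
    best, best_reduction = (), 0.0
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            cost = sum(i[2] for i in subset)
            reduction = sum(i[3] for i in subset)
            if cost <= budget and reduction > best_reduction:
                best, best_reduction = subset, reduction
    return [i[0] for i in best]


def total_reduction(names, items):
    """Sum of the assumed risk reductions for the named countermeasures."""
    return sum(i[3] for i in items if i[0] in names)


if __name__ == "__main__":
    for label, plan in (("ranked", rank_and_fund(COUNTERMEASURES, BUDGET)),
                        ("optimized", optimize(COUNTERMEASURES, BUDGET))):
        print(label, plan, total_reduction(plan, COUNTERMEASURES))
```

With these invented numbers the ranking plan spends most of the budget on the single highest-rated concern, while the optimized plan buys more total risk reduction for the same money. The brute-force search is only practical for a handful of countermeasures.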
15
Slide 15
Typical Risk Management Methods
• Typical Risk Management Methods in use today:
  • Failure Modes and Effects Criticality Analysis (FMECA)
  • Fault Tree Analysis (FTA)
  • Failure Modes and Effects Analysis (FMEA)
• Over 75% of applications today use the FMEA approach.
TYPE (shown at the far left, top to bottom): Combined, Type 1, Type 2
This slide provides a view of the three typical methods used in industry and government today to manage risk. Included in this slide is a cross-classification of each method with the two general types of risk management approaches, shown at the far left.
Of the three methods shown, FMEA embodies all of the characteristics of a Type 2 risk management approach, using a rank ordering of risk classifications as the basis for identifying the largest risks in a system. It is also the most popular of the three. This is due to its simplicity of use and its lack of complex probabilistic mathematics, which is often challenging for those untrained in probability theory.
That said, beneath the ranking categories lies an implicit sense of probability and uncertainty that is masked by the seemingly simple process of selecting a rating value. Despite the ranking system, the complexity of failure rates, hazard functions, and probability still prevails.
16
Slide 16
Risk as Defined by FMEA
Risk in FMEA is defined as:
• Severity (of failure effect),
• Occurrence (of cause or failure),
• Detection (of cause or failure)
The FMEA method of risk determination defines risk in three areas, as shown in this slide. In combination these three dimensions address the components of risk as defined by the FMEA method: the potential for a failure to occur (Occurrence), the relative level of hazard (Severity), and the ability to detect or prevent the failure before it happens (Detection). Ideally, risk is reduced by having a clear and accurate understanding of the failure mechanisms, or by reducing the uncertainty associated with each dimension.
17
Slide 17
AIAG Guidelines:
Severity Ranking Criteria
Reprinted from www.TheNewExcellence.com
This slide shows the ranking criteria supporting the Effect of a failure mode on the system or user. It ranks the lowest level of risk a “1” and the highest level of risk a “10.” These criteria were developed by the Automotive Industry Action Group (AIAG) to support the management of both supplier quality and process/product design by the organization via potential risk factors in a given system.
18
Slide 18
AIAG Guidelines:
Occurrence Ranking Criteria
Reprinted from www.TheNewExcellence.com
This slide shows an example table of risk ranking criteria for the Occurrence of a failure
mode or the cause of a failure mode. Again, a “1” is considered low risk and a “10” is
considered high risk.
Please note column 3 of this table, labeled Ppk. In reviewing these tables online I noticed a mixed use of the indices Ppk and Cpk in this column. Ppk is called the Process Performance Index and Cpk is called the Process Capability Index, and they are used interchangeably as a soft probability measure of failure rates. In my AIAG reference this table uses Cpk.
There is great confusion in the automotive and other industries about the value and use of these two process measures. The confusion is so great that they are often used interchangeably even though they measure entirely different things. In the next slide we try to explain these two estimates in an effort to minimize the confusion.
19
Slide 19
Basis for Computation:
Performance vs. Capability Indices
[Figure: distributions at Time 1 through Time 5 along a Time axis, annotated Within Group Variation, Between Group Variation, and Total Variation (Pp, Ppk)]
When making predictions of the future, the measures we use should be reliable. A reliable measure is one that is consistent over time. If measures are not reliable, then their utility in prediction is limited.
When we measure products, parts, services, or any array of items considered to be identical, we don’t usually obtain the same value for every piece of work. Instead, we obtain a range of values around a common aim for the process that produced the work. This range of values is called the observed or “total” variation. The components of total variation, as shown, are called “within” and “between” group variation. Others commonly refer to the average “within” group variation as short-term variation and the “total” variation as long-term variation. These are non-standard terms that serve only to confuse the purpose of breaking the variation into its components.
20
Slide 20
Basis for Computation:
Performance vs. Capability Indices
If process is in a state of control:
Pp ≡ Cp
[Figure: distributions at Time 1 through Time 5 along a Time axis; Total Variation is labeled Pp, Ppk, with a second label Cp, Cpk]
Typical practice is to compute the Process Performance Indices using the “total” variation and the Process Capability Indices using the average “within” group variation. As you can see, these are two different components of variation, which can yield two different results. Looking at the slide we notice a third component of variation called “between” group variation. If the “between” group variation is too great, the process is considered unreliable. In essence, large shifts between groups indicate that multiple isolated causal elements are present in the process. If the between group variation is too great, then the Process Performance Indices, Pp and Ppk, will be poor predictors of the future performance of the process.
Better predictors of future process performance are Cp and Cpk. These two indices require that the process first achieve a state of statistical or stationary control before they are computed. As such, all references to Pp and Ppk in an FMEA exercise should be exchanged for Cp and Cpk. A better approach would be to use failure rates and probability of failure directly if possible.
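The difference between the two families of indices can be sketched in a few lines. This is a minimal illustration under stated assumptions: the within-group sigma is estimated with a pooled subgroup standard deviation (one common choice; other estimators such as the average range exist), and the subgroup data and specification limits are hypothetical.

```python
import statistics


def overall_sigma(subgroups):
    """Standard deviation of all measurements together ("total" variation)."""
    values = [x for group in subgroups for x in group]
    return statistics.stdev(values)


def within_sigma(subgroups):
    """Pooled within-subgroup standard deviation (average "within" variation)."""
    ss = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    dof = sum(len(g) - 1 for g in subgroups)
    return (ss / dof) ** 0.5


def index_pair(mean, sigma, lsl, usl):
    """Return the (potential, actual) pair, i.e. (Cp, Cpk) or (Pp, Ppk)."""
    potential = (usl - lsl) / (6 * sigma)
    actual = min(usl - mean, mean - lsl) / (3 * sigma)
    return potential, actual


def capability_and_performance(subgroups, lsl, usl):
    """Compute Cp/Cpk from within variation and Pp/Ppk from total variation."""
    values = [x for group in subgroups for x in group]
    mean = statistics.fmean(values)
    cp, cpk = index_pair(mean, within_sigma(subgroups), lsl, usl)
    pp, ppk = index_pair(mean, overall_sigma(subgroups), lsl, usl)
    return {"Cp": cp, "Cpk": cpk, "Pp": pp, "Ppk": ppk}


if __name__ == "__main__":
    # Hypothetical subgroups whose averages shift from one time period to the next.
    shifting = [[9.9, 10.0, 10.1], [10.4, 10.5, 10.6], [9.4, 9.5, 9.6]]
    result = capability_and_performance(shifting, lsl=9.0, usl=11.0)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

When the subgroup averages shift, as in this invented data set, the total variation is much larger than the within-group variation and the two sets of indices diverge. When the process is in control the two estimates of sigma are close and the indices roughly agree.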
21
Slide 21
Type 2 RM Approach:
Example of a Process FMEA
PROCESS: Engage Transportation Mode → Take Route Over North Bridge → Cross Major Downstream Intersection → Arrive at Work On Time
October 10, 2010
This is a simple example used to illustrate the correct application of a Process FMEA.
Please note the correct use of this tool requires you to list the process steps in the far left
column, not process variables. The focus with this tool is on the potential failure modes of
each process step and the relative risk of each cause of a given failure mode indicated as “RPN” which stands for Risk Priority Number.
22
Slide 22
Example of a Design FMEA
PRODUCT:
Lid
Handle
Body
This is a simple example used to illustrate the correct application of a Design FMEA supporting a product.
Please note the correct use of this tool requires you to list the product components in the far left column, not process steps. Using this tool we focus on the potential failure modes of each
product component and on the component-to-component interactions with a goal of
establishing mitigating design controls or redesigning the product such that we can eliminate
high risk failure modes.
23
Some Issues with the FMEA Approach
“Quality improvement will result from people improving their processes and from management improving the system.” T. Pyzdek
24
Slide 24
Four Issues with FMEAs + a bonus
• RPNs provide limited Risk Discrimination.
• Prediction overconfidence is common.
• Expert judgments and claims are not consistent.
• No empirical evidence that Risk Rating methods yield useful decision-making information.
• FMEA risk claims are rarely verified with actual follow-up data.
Here again are the four issues cited earlier plus one additional issue for good measure. The next slides discuss each of these issues in some detail to give the reader some insight into the possible weak areas of a risk assessment using Failure Mode and Effects Analysis.
25
Slide 25
Calculation of Risk Priority Number (RPN)
RPN = Severity Rating * Occurrence * Likelihood of Detection
- How many total RPN values are available for the FMEA analysis?
- How many unique RPN values are available for the analysis?
For those unfamiliar with FMEAs we provide a view of this simple calculation used to
compute a Risk Priority Number or RPN.
Using a scale of 1 to 10 for each risk area how many RPN values do you believe are
available for the FMEA analysis? Now, don’t cheat and look ahead. Try instead to think this
answer through.
Of the total number of expected RPN values calculated from the previous question, how many of them do you believe are unique from all the others? This is a somewhat tougher question, but try to think it through…
26
Slide 26
Risk Ranking Scale of 1 to 10:
Enumerating RPN Classifications
S    O    D    Calculated RPN    RPN Value Order    Observed Number of Classifications
0    0    0    0                 0                  0
1    1    1    1                 1                  1
1    1    2    2                 2                  3
1    1    3    3                 3                  3
1    1    4    4                 4                  6
1    1    5    5                 5                  3
1    1    6    6                 6                  9
1    1    7    7                 7                  3
1    1    8    8                 8                  10
1    1    9    9                 9                  6
1    1    10   10                10                 9
1    2    1    2                 11                 0
1    2    2    4                 12                 15
1    2    3    6                 13                 0
1    2    4    8                 14                 6
1    2    5    10                15                 6
1    2    6    12                16                 12
1    2    7    14                17                 0
1    2    8    16                18                 15
1    2    9    18                19                 0
1    2    10   20                20                 15
1    3    1    3                 21                 6
1    3    2    6                 22                 0
1    3    3    9                 23                 0
To help answer the previous questions, let’s open up Excel or another spreadsheet program and enumerate all of the possible combinations of RPN. This slide illustrates how to set up this evaluation. If you have access to our spreadsheet, then open it up and look it over. Now, can you answer the previous questions?
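For readers without the spreadsheet, the same enumeration can be sketched in a few lines of Python. This is an assumed equivalent of the worksheet rather than a copy of it; the variable names are illustrative. Running it answers both questions posed on Slide 25.

```python
from collections import Counter
from itertools import product

SCALE = range(1, 11)  # ratings 1 through 10 for each of S, O, and D

# Count how many (S, O, D) combinations produce each RPN value.
rpn_counts = Counter(s * o * d for s, o, d in product(SCALE, repeat=3))

total_combinations = sum(rpn_counts.values())   # every S x O x D combination
unique_rpn_values = len(rpn_counts)             # distinct products that occur
values_above_500 = sum(1 for rpn in rpn_counts if rpn > 500)

if __name__ == "__main__":
    print("total combinations :", total_combinations)
    print("unique RPN values  :", unique_rpn_values)
    print("unique values > 500:", values_above_500)
    # Compare with the "Observed Number of Classifications" column above.
    for value in (8, 11, 12):
        print(f"combinations giving RPN {value}:", rpn_counts[value])
```

The `rpn_counts` tally corresponds to the last two columns of the slide's table: the RPN value and the number of rating combinations that produce it.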
27
Slide 27
Plot of RPN Class Counts
[Figure: Risk Priority Value Plot (based on a 1 to 10 scale). X-axis: RPN Values, 0 to 1000. Y-axis: Number of Classifications, 0 to 25.]
This slide shows a plot of the enumerated RPN values supporting a 1 to 10 scale for three
ranked classifications.
What do you see in this slide? Do you notice the greatest number of RPN values are clustered around an RPN of about 100? How many RPN values are available above 500?
The distribution of RPN values above shows a bias towards the lower range of all possible
values. Also note, each bar of this plot indicates the number of duplicate RPN classifications
available. These are non-unique classifications. Did you know that RPN values behaved this way prior to our discussion? How might this behaviour affect the risk assessment process?
28
Slide 28
Summary:
Enumerated RPN Classes with a 1 to 10 Scale
FMEAs have an extremely limited capability to discriminate Risk Classifications
This slide summarizes the plot shown previously. Now you are able to answer the earlier
questions we asked about the measurement used to quantify risk with the FMEA method.
Out of 1,000 possible risk classifications, 10 × 10 × 10, how many are actually available for use in the FMEA risk assessment method?
Out of the actual number available, how many RPNs provide unique classifications? This
information directly speaks to the ability of FMEA to discriminate between different risk
conditions. We illustrate this effect in the next slide.
29
Slide 29
Some FMEA Examples
• Suppose S=10, O=9, and D=4 for a given failure mode. What actions might you consider?
• Suppose S=4, O=9, and D=10 for a given failure mode. What actions might you consider?
• Suppose a Hazardous failure with a chance of Occurrence ≅ 30%, and the Ability to Detect in Production is Variable. What actions might you consider?
Look at each of the three risk classifications shown in this slide. Given the components of risk shown, try to form a sense of the actions you might take to reduce the potential risks.
Would you take the same actions for each of the three listed risk conditions? If so, why, when there are great differences among the individual rankings? If not, why, when the summarized risk in the form of RPN is the same for all three risk conditions?
Using RPN as the primary measure for risk management seems to present a few challenges. Let’s look at the entire range of possible risk classifications for RPN = 360.
30
Illustration:
Fifteen “Equivalent” Rankings of RPN=360
ID | Severity | Ranked Value | Likelihood of Occurrence | Ranked Value | Likelihood of Detection | Ranked Value
1 | Hazardous | 10 | Very High | 9 | Moderate High | 4
2 | Hazardous | 10 | Moderate | 6 | Low | 6
3 | Hazardous | 10 | Moderate | 4 | Very Remote | 9
4 | Hazardous | 9 | Very High | 10 | Mod. High | 4
5 | Hazardous | 9 | High | 8 | Moderate | 5
6 | Hazardous | 9 | Moderate | 5 | Remote | 8
7 | Hazardous | 9 | Moderate | 4 | Impossible | 10
8 | Hazardous | 8 | Very High | 9 | Moderate | 5
9 | Hazardous | 8 | Moderate | 5 | Very Remote | 9
10 | Moderate | 6 | Very High | 10 | Low | 6
11 | Moderate | 6 | Moderate | 6 | Impossible | 10
12 | Low | 5 | Very High | 9 | Remote | 8
13 | Low | 5 | High | 8 | Very Remote | 9
14 | Very Low | 4 | Very High | 10 | Very Remote | 9
15 | Very Low | 4 | Very High | 9 | Impossible | 10
Note in the table shown the range of Severity, Occurrence, and Detection values
observed. All 15 combinations support the same risk, measured using RPN.
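The fifteen combinations can be reproduced by filtering the full enumeration for a single RPN value. The snippet below is a small sketch of that check; the variable names are illustrative and the sort order (highest Severity first) is simply a convenient choice.

```python
from itertools import product

TARGET_RPN = 360

# Every (Severity, Occurrence, Detection) triple on a 1 to 10 scale whose
# product equals the target RPN.
matches = [
    (s, o, d)
    for s, o, d in product(range(1, 11), repeat=3)
    if s * o * d == TARGET_RPN
]

# List the highest Severity first, then the highest Occurrence.
matches.sort(key=lambda triple: (-triple[0], -triple[1]))

if __name__ == "__main__":
    print(f"{len(matches)} combinations give RPN = {TARGET_RPN}")
    for s, o, d in matches:
        print(f"S={s:2d}  O={o:2d}  D={d:2d}")
```

Both of the first two examples from Slide 29, (S=10, O=9, D=4) and (S=4, O=9, D=10), appear in the output even though they describe very different risk conditions.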
31
Slide 31
Poor Risk Discrimination
• Risk Priority Numbers are the products of three ordinal-scale values!
Scale | Characteristics | Example | Legal Mathematical Operations
Ratio Scale Data | Values that possess ordering, distance, and an absolute zero | Temperature scales with an absolute zero, i.e. Kelvin | Multiplication and Division Operations
Interval Scale Data | Values that possess both ordering and defined distance | Temperature scales of °F or °C w/o an absolute zero | Addition and Subtraction Operations
Ordinal Scale Data | Values ranked in a logical order | Places in a contest such as: 1st, 2nd, and 3rd | Ranking or Grouping to define an Ordering among Categories
Adapted from D.J. Wheeler, The Six Sigma Practitioner’s Guide to Data Analysis, 2005
So, what is the reason for the observed behaviour of RPNs? This slide provides a clue, as shown in the second column of the table.
What is the difference between Ordinal and Ratio scale data? How are RPNs calculated?
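One way to see why multiplying ordinal values is a problem: ordinal ranks carry only order, so any order-preserving relabeling of the ranks is an equally valid scale, yet the product does not survive the relabeling. The sketch below uses two hypothetical failure modes and an arbitrary relabeling (2 raised to the rank), both chosen purely for illustration.

```python
def rpn(s, o, d):
    """Product of the three ratings, as in the RPN calculation."""
    return s * o * d


def relabel(rank):
    """An order-preserving relabeling of a rank (hypothetical choice: 2**rank)."""
    return 2 ** rank


# Two hypothetical failure modes as (Severity, Occurrence, Detection) ranks.
mode_a = (10, 1, 1)
mode_b = (3, 2, 2)

original = (rpn(*mode_a), rpn(*mode_b))
relabeled = (rpn(*map(relabel, mode_a)), rpn(*map(relabel, mode_b)))

if __name__ == "__main__":
    print("original ranks : A =", original[0], " B =", original[1])
    print("relabeled ranks: A =", relabeled[0], " B =", relabeled[1])
```

On the original ranks mode B has the larger product; after the relabeling, which keeps every individual rank in the same order, mode A has the larger product. A priority that depends on the arbitrary labels of the scale is not measuring a property of the risk itself.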
32
Slide 32
Prediction Overconfidence
• Buried beneath the risk rankings for Occurrence and Detection is an estimation of probability.
• Most of us don’t understand how probabilities work, and instead relegate our estimates to ranked values.
• These mental gymnastics carry with them some hidden problems.
• For years research psychologists have known that everyone is naturally “overconfident” in their predictions.
• Let’s illustrate this effect in the next slide, where we will ask a few TRUE/FALSE trivia questions.
The claims on this slide are supported by the early groundbreaking work of psychologists Amos Tversky and Daniel Kahneman.
This work is so well known in the field of decision science that no one questions its validity. Unfortunately, few in industry have heard of this work or understand its ramifications.
If we have the time during the presentation we will attempt a limited calibration exercise. If we are unable to conduct this exercise due to time constraints, you can conduct it yourself. You can check the Appendix for the answers once you complete the first part of the exercise.
33
Simple Calibration Exercise
Exercise Instructions:
1. Read the statement.
2. Decide whether the statement is True or False.
3. Circle how confident you feel about your answer.
4. Complete all 10 statements.
No. | Statement | Answer (T or F) | Confidence that You are Correct
1 | The ancient Romans were conquered by the ancient Greeks. | | 50% 60% 70% 80% 90% 100%
2 | There is no species of three-humped camel. | | 50% 60% 70% 80% 90% 100%
3 | A liter of oil weighs less than a liter of water. | | 50% 60% 70% 80% 90% 100%
4 | Mars is always further away from the Earth than Venus. | | 50% 60% 70% 80% 90% 100%
5 | M is one of the three most commonly used letters. | | 50% 60% 70% 80% 90% 100%
6 | Napoleon was born on the island of Corsica. | | 50% 60% 70% 80% 90% 100%
7 | One meter equals 37.39 inches. | | 50% 60% 70% 80% 90% 100%
8 | In 2002, the price of a new desktop computer was under $1,500. | | 50% 60% 70% 80% 90% 100%
9 | Modern humans first appeared on the earth about 200,000 years ago. | | 50% 60% 70% 80% 90% 100%
10 | The first six values in the constant PI is 3.14139. | | 50% 60% 70% 80% 90% 100%
Find the answers in the Appendix
34
Slide 34
Results:
Prediction Calibration
• In subjective assessments an evaluator is considered calibrated if the proportion of true assessments equals the average weighted confidence assigned by the evaluator.
• As an example, suppose in our exercise you gave 6 correct answers out of a possible 10, or 60%, and the average confidence of the 6 correct answers was 75%.
• In the long run you claim to have 75% confidence in achieving correct answers, but you actually answered 60% correctly. Therefore:
  • If %Actual Correct < %Confidence: Overconfident
  • If %Actual Correct > %Confidence: Under-confident
Follow the guidance in this slide to compute the percent of correct answers and the average confidence for correct answers. Make the comparison shown at the bottom of the slide.
Please bear in mind this is a simple exercise containing a sample of only 10 questions. Its
ability to determine “subjective assessment” performance is extremely limited.
If you wanted to get a reasonable estimate of “assessment” performance you would need a
minimum of 50 calibration questions to start. So, don’t worry if you did not do well with this
exercise. Accept that it is just an indicator and realize that you may be capable of
overconfident responses.
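The comparison described on Slide 34 can be scored with a short function. This is a minimal sketch that follows the slide's wording (percent correct versus the average confidence stated for the correct answers); the function name and the input format are assumptions made for the example, and the sample answers simply reproduce the slide's 6-out-of-10, 75% illustration.

```python
def calibration(results):
    """Score a calibration exercise.

    results: list of (is_correct, stated_confidence) pairs, with the
    confidence expressed as a fraction between 0.5 and 1.0.
    Returns (percent correct, average confidence of correct answers, verdict).
    """
    correct_conf = [conf for ok, conf in results if ok]
    if not correct_conf:
        raise ValueError("no correct answers to average over")
    pct_correct = len(correct_conf) / len(results)
    avg_conf = sum(correct_conf) / len(correct_conf)
    if pct_correct < avg_conf:
        verdict = "overconfident"
    elif pct_correct > avg_conf:
        verdict = "under-confident"
    else:
        verdict = "calibrated"
    return pct_correct, avg_conf, verdict


if __name__ == "__main__":
    # Hypothetical answer sheet: 6 correct answers averaging 75% confidence.
    sheet = [(True, 0.7), (True, 0.8), (True, 0.7), (True, 0.8), (True, 0.7),
             (True, 0.8), (False, 0.9), (False, 0.6), (False, 0.8), (False, 0.5)]
    pct, conf, verdict = calibration(sheet)
    print(f"correct: {pct:.0%}  confidence: {conf:.0%}  verdict: {verdict}")
```

As the notes above caution, ten questions give only a rough indication; the same function can be applied to a longer answer sheet without change.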
35
Slide 35
Inconsistent Expert Claims*
• Like the rest of us, Subject Matter Experts tend to make “overconfident” claims.
• Unlike the rest of us, SMEs can often make overconfident claims outside their areas of experience and training.
• This overconfidence can present a problem to an assessment team when evaluating subjective risk.
• There is a great tendency among team members to give the SMEs far more latitude in making claims than other members.
• Additionally, SMEs provide expert advice on an inconsistent basis. Care should be taken not to accept uncalibrated expert advice without question…
*Thoroughly researched by Tversky, Kahneman, Lichtenstein, Fischhoff, and Phillips
This slide offers some additional insight from the work of the listed researchers. The take-away is to realize that SMEs are prone to the same judgement errors as the rest of us. Consider this possibility the next time you receive expert advice from anyone, including me!
36
Slide 36
Work Conducted by the US Navy in 1981
All of the previous information has been known since the mid-1970s and has been codified in
many US military references and guidance.
37
Slide 37
Limited Empirical Evidence for FMEAs
• A review of the literature, past and present, provides little quantitative empirical evidence of FMEA effectiveness.
• Most companies do not track or collect this information.
• There is substantial research showing the effectiveness of Probabilistic Risk Assessment (PRA) over Risk Ranking Methods (RR).
• Many US government agencies have returned to PRA over Risk Ranking methods since the late 70s. (see Appendix for additional details)
It is difficult to find any literature supporting empirical studies on the use and effectiveness of
the FMEA method of risk management.
38
Slide 38
Excerpt from 2009 IEEE Journal Article
• Meshkat, Leila PhD, Probabilistic Risk Assessment for Decision Making during Spacecraft Operations, IEEE, 2009. Page 1, Sect. 1.1 Quantitative Risk Assessment (QRA):
Additional support that US government agencies, once enamored by the FMEA method, are
moving back to more conventional Type 1 risk assessment methods.
39
Slide 39
FMEA Risk Claims Rarely Verified
• This comment is supported by the previous one.
• There is usually no closed-loop evaluation of FMEA risk claims against actual warranty returns, field issues, etc.
• Without this information it is difficult for an operation to know whether its risk assessment efforts actually manage product and process risk.
• Without this knowledge, the operation is unable to address any glaring issues with its risk assessment efforts and take the needed improvement actions.
An FMEA assessment is a predictive evaluation of the system(s) under study. In any empirical scientific endeavor we gain feedback from the systems we study and compare our predictions to the actual performance.
For some reason, this doesn’t seem to happen with FMEA work in most companies. I’m not sure why this is the case, but now that you understand this gap perhaps you might consider including a feedback loop in the risk management process at your company. There is no other way to uncover the shortcomings of this method and make the necessary corrections.
Please note the last bullet point on this slide, and feel free to give the timeline of risk management methods in the Appendix a look when you have a chance.
40
Suggestions for Improvement
41
Slide 41
Suggestions for Improvement of RAs
• Consider phasing out the use of RPNs when conducting FMEAs.
• Consider sorting the risk evaluations in FMEAs by Severity, then Occurrence, then Detection, and work with the ranked failure modes directly (this increases risk discrimination).
• Consider a move to Probabilistic Risk Assessment (PRA) methods in the future as rate data become available:
  - Use of Monte-Carlo Simulations (uses knowledge of input distributions)
  - Use of Bayesian Inversion Analysis (uses past reliability performance)
• If PRA methods are not viable for your work, then consider adjusting the RPN to a Corrective Priority Number by dividing the RPN by the unit cost to implement the corrective action or detection method (see the next slide for an example).
This is a subject of its own. If interest exists, we can work on a separate presentation covering the bullet points in this slide.
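The second bullet can be sketched in a few lines. The failure modes and rankings below are hypothetical, chosen so that every RPN equals 60; the point is only to show how sorting on Severity, then Occurrence, then Detection separates items that the RPN cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1-10, higher is worse
    occurrence: int  # 1-10, higher is more frequent
    detection: int   # 1-10, higher means harder to detect

    @property
    def rpn(self) -> int:
        """Risk Priority Number: Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes; all four share the same RPN of 60.
modes = [
    FailureMode("seal leak", 3, 4, 5),
    FailureMode("label misprint", 2, 10, 3),
    FailureMode("fan disk fracture", 10, 2, 3),
    FailureMode("connector corrosion", 6, 5, 2),
]

# Sort by Severity first, then Occurrence, then Detection (all descending).
ranked = sorted(
    modes,
    key=lambda m: (m.severity, m.occurrence, m.detection),
    reverse=True,
)

for m in ranked:
    print(f"{m.name:22s} S={m.severity:2d} O={m.occurrence:2d} D={m.detection:2d} RPN={m.rpn}")
```

Ranked by RPN alone these four items are indistinguishable, while the sorted list puts the most severe failure mode at the top.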
Slide 42
Estimating a Corrective Action Index
An example illustrating the last bullet point on the previous slide. The use of the CAI as shown on this slide is adapted from the recent work of Dr. Tony Cox.
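Since the slide graphic is not reproduced in this transcript, here is a minimal sketch of the calculation described on the previous slide: the RPN divided by the unit cost of the corrective action. The action names, RPNs, and costs are hypothetical, and the function name is ours.

```python
def corrective_action_index(rpn: float, unit_cost: float) -> float:
    """RPN divided by the unit cost of the corrective action or detection method."""
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    return rpn / unit_cost


# Hypothetical candidates: (corrective action, RPN addressed, unit cost in dollars).
candidates = [
    ("Redesign housing", 240, 12000.0),
    ("Add inline sensor", 120, 2000.0),
    ("Revise work instruction", 60, 500.0),
]

# Highest index first: the most risk ranking addressed per dollar spent.
prioritized = sorted(
    candidates,
    key=lambda c: corrective_action_index(c[1], c[2]),
    reverse=True,
)

for name, rpn, cost in prioritized:
    print(f"{name:24s} RPN={rpn:3d} cost=${cost:>8,.0f} index={corrective_action_index(rpn, cost):.3f}")
```

In this made-up case the item with the lowest RPN rises to the top because it is inexpensive to fix, which is the kind of reordering the index is meant to expose.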
Slide 43
Selected References
D. H. Stamatis, Failure Mode and Effect Analysis, (1995), copyright ASQ/ASQC Quality Press.
Automotive Industry Action Group, Potential Failure Mode and Effects Analysis – Reference Manual, (February 1995), Second Edition.
Automotive Industry Action Group, Statistical Process Control (SPC) – Reference Manual, (March 1995), Second Printing.
D. W. Hubbard, The Failure of Risk Management – Why It’s Broken and How to Fix It, (2009), copyright John Wiley & Sons.
D. J. Wheeler, The Six Sigma Practitioner’s Guide to Data Analysis, 311-315 (2005), copyright SPC Press.
A. Tversky and D. Kahneman, Judgment under Uncertainty: Heuristics and Biases, Science 185, 1124-1131 (1974), copyright 1974 AAAS.
L. Meshkat, Probabilistic Risk Assessment for Decision Making during Spacecraft Operations, (2009), IEEE Journal.
Louis Anthony (Tony) Cox, Improving Risk Management, Comparisons and Decisions, November 2012 SIRA Meeting Webinar, Web Link: http://vimeo.com/53151221
Slide 44
Selected References
US DoD Information Analysis Center, Failure Mode, Effects, and Criticality Analysis (FMECA), (1993), Reliability Analysis Center, Rome, NY.
The New Excellence, AIAG FMEA Severity, Occurrence, and Detection Ranking Guidelines, (2009), Web link: www.TheNewExcellence.com
S. Lichtenstein, B. Fischhoff, and L. D. Phillips, Calibration of Probabilities: The State of the Art to 1980, (1981), Perceptronics, Inc. sponsored by the US Office of Naval Research.
G. Keren, On the Calibration of Probability Judgments: Some Critical Comments and Alternative Perspectives, (1997), Journal of Behavioral Decision Making, Vol. 10, 269-278.
G. E. Apostolakis, How Useful is Quantitative Risk Assessment?, (2004), Risk Analysis, Vol. 24, No. 3, 515-520.
20 Worst Accidents Involving US (Aviation) Carriers, Web link: http://livingsta.hubpages.com/hub/20-Worst-Accidents-Involving-US-Carriers
Louis Anthony (Tony) Cox Jr., Risk Analysis of Complex and Uncertain Systems, (2009), copyright Springer.
Appendix
Slide 46
Other UA Flight 232 Photos
Slide 47
Timeline of Risk Management Methods
Legend: * Major changes in Risk Management Application

Use of FMECAs
1949 – Use of FMECA first standardized in MIL-P-1629
1980 – MIL-P-1629 revised to MIL-STD-1629A, supporting FMECA
1984 – US government support of FMECA MIL-STD-1629 canceled*

Use of FMEAs
1960s – Contractors of NASA developed and used variants of FMECA referred to as FMEA
1967 – SAE published ARP-926 supporting the FMEA approach
1967 – US civil aviation industry adopts the FMEA approach supported by SAE
1970 – Automotive industry began widespread use of FMEA
1973 – US EPA adopts use of the FMEA approach for risk assessment
1993 – AIAG publishes first FMEA standard
1994 – SAE publishes first FMEA standard

Use of FTAs and PRA
1962 – Bell Labs develops the Fault Tree Analysis approach
1970 – US FAA includes use of FTA in 14 CFR 25.1309 for all Transport Category aviation
1971 – Widespread use of Probabilistic Risk Assessment and FTA begins in the aviation industry*
1981 – Mandatory use of PRA and FTA by the US nuclear power industry*
1987 – Mandatory use of PRA and such tools as FTA and FMECA by NASA after the shuttle disaster in 1986*
2010 – Widespread use of PRA methods by the US Dept. of Homeland Security*
Slide 48
A Combined RM Approach:
Failure Modes and Effects Criticality Analysis
Hardware and Products, Processes, Product Applications, Service Systems, etc.
Failure Effects, Detection Methods, Compensating Provisions, Severity Class
Failure Rate, Mission Time, Modal Criticality Number
Slide 49
FMEA Part:
Failure Modes and Effects Criticality Analysis
Slide 50
Criticality Analysis:
Failure Modes and Effects Criticality Analysis
Slide 51
FMECA Criticality Matrix:
Failure Modes and Effects Criticality Analysis
Slide 52
Criticality Matrix:
Failure Modes and Effects Criticality Analysis
Some Potentially Useful Analysis Guidance
Slide 53
Type 2 RM Approach:
FTA - Fault Tree Analysis
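Fault tree diagram: basic events (faults, errors, malfunctions, etc.) with probabilities p1 through p8 feed the gates OR1, OR2, OR3, AND1, and AND2, giving:
Pa = p1 + p2
Pb = p3 * p4 * p5
PA2 = p7 * p8
Pc = p6 + (p7 * p8)
Psystem = Pa + Pb + Pc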
A Fault Tree Analysis allows a quantitative measure of process risk by bounding the uncertainty associated with complex undesirable events that are linked together logically in a process. If one can assess the frequency of occurrence for the components of each undesirable event, then one can estimate the probability that a fault or failure will occur.
This slide illustrates a simple example using Fault Tree Analysis to understand the logical structure associated with the error of excluding a buffer during one step in a biopharmaceutical process.
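The gate arithmetic on this slide can be sketched as follows. The event probabilities are hypothetical, and the assignment of each formula to a named gate (Pa to OR1, Pb to AND1, Pc to OR2, Psystem to OR3) is our reading of the formulas, not something the transcript confirms. The slide adds probabilities at OR gates, which is the usual rare-event approximation; for comparison the sketch also computes the exact OR-gate value under an independence assumption.

```python
from math import prod


def and_gate(*probs):
    """All inputs must occur: product of probabilities (independence assumed)."""
    return prod(probs)


def or_gate_approx(*probs):
    """Rare-event approximation used on the slide: sum of probabilities."""
    return sum(probs)


def or_gate_exact(*probs):
    """At least one input occurs, for independent inputs: 1 - prod(1 - p)."""
    return 1 - prod(1 - p for p in probs)


# Hypothetical basic-event probabilities p1..p8 (not taken from the slide).
p1, p2 = 0.001, 0.002
p3, p4, p5 = 0.1, 0.05, 0.02
p6, p7, p8 = 0.0005, 0.01, 0.03


def system_probability(or_gate):
    """Evaluate the tree bottom-up with the chosen OR-gate rule."""
    pa = or_gate(p1, p2)          # Pa = p1 + p2
    pb = and_gate(p3, p4, p5)     # Pb = p3 * p4 * p5
    pa2 = and_gate(p7, p8)        # PA2 = p7 * p8
    pc = or_gate(p6, pa2)         # Pc = p6 + (p7 * p8)
    return or_gate(pa, pb, pc)    # Psystem = Pa + Pb + Pc


p_system_approx = system_probability(or_gate_approx)
p_system_exact = system_probability(or_gate_exact)

print(f"Psystem (sum approximation): {p_system_approx:.6f}")
print(f"Psystem (exact, independent): {p_system_exact:.6f}")
```

With small probabilities the two results are close, and the sum is always the slightly larger (more conservative) of the two.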
Slide 54
AIAG Guidelines:
Detection Ranking Criteria
Reprinted from www.TheNewExcellence.com
Example of the risk ranking table supporting the Detection risk as provided by AIAG.
Answers to Simple Calibration Exercise
No.  Answer (T or F)  Statement
1    F    The ancient Romans were conquered by the ancient Greeks.
2    T    There is no species of three-humped camel.
3    T    A liter of oil weighs less than a liter of water.
4    F    Mars is always further away from the Earth than Venus.
5    F    M is one of the three most commonly used letters.
6    T    Napoleon was born on the island of Corsica.
7    F    One meter equals 37.39 inches.
8    T    In 2002, the price of a new desktop computer was under $1,500.
9    T    Modern humans first appeared on the earth about 200,000 years ago.
10   F    The first six values in the constant PI is 3.14139.
Confidence that You are Correct (marked for each statement): 50% 60% 70% 80% 90% 100%
Answers to the calibration questions provided in the body of this presentation.
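For anyone who wants to score their own sheet, the comparison behind the exercise is simple: average stated confidence versus the fraction actually answered correctly. The answer key below is the one on this slide; the responses are invented for illustration, and the function name is ours.

```python
# Answer key for statements 1 through 10, as given on this slide.
ANSWER_KEY = ["F", "T", "T", "F", "F", "T", "F", "T", "T", "F"]


def calibration_summary(responses):
    """Return (mean confidence, fraction correct, overconfidence).

    responses: list of (answer, confidence) pairs for statements 1-10,
    with answer "T" or "F" and confidence between 0.5 and 1.0.
    """
    if len(responses) != len(ANSWER_KEY):
        raise ValueError("one response per statement is required")
    hits = [answer == key for (answer, _), key in zip(responses, ANSWER_KEY)]
    mean_confidence = sum(conf for _, conf in responses) / len(responses)
    fraction_correct = sum(hits) / len(hits)
    return mean_confidence, fraction_correct, mean_confidence - fraction_correct


# Hypothetical response sheet: (answer, confidence) for statements 1..10.
responses = [
    ("F", 0.9), ("T", 0.8), ("F", 0.7), ("F", 1.0), ("T", 0.9),
    ("T", 1.0), ("T", 0.8), ("T", 0.9), ("T", 0.6), ("F", 0.9),
]

mean_conf, correct, gap = calibration_summary(responses)
print(f"Mean confidence:  {mean_conf:.0%}")
print(f"Fraction correct: {correct:.0%}")
print(f"Overconfidence:   {gap:+.0%}")
```

A positive gap means the respondent claimed more confidence than the results support; a well-calibrated respondent would show a gap near zero over many questions.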