
Informational

Some Things You May Not Know about FMEAs

• Introduction (critique of UA Flight 232)

• Risk Management Approaches

• Issues with the FMEA Approach

• Suggestions for Improvement

“It is far better to grasp the universe as it really is than persist in delusion, however satisfying and reassuring.”

Carl Sagan, Astronomer and Writer

As quality professionals we are responsible for assuring the overall quality performance of the business we support. This responsibility covers not only the quality of the products, services, and processes in the business, but also the risk management of the systems that are tightly linked to quality performance. It is imperative that we consider not only the improvement of the business systems we support, but also the tools and methods we use to guide our decisions about these improvements.

The purpose of this presentation is to give attendees some insight into the methods and tools used to manage risk in their organizations. One of the more popular methods of risk management, in use since the early 1960s, is Failure Mode and Effects Analysis, or FMEA. This presentation focuses on some of the well-documented shortcomings of the FMEA method and the ramifications of poorly assessed conditions of risk. At the close of this presentation we offer some suggestions on how the FMEA method might be improved to provide better risk determinations and better management decisions based on them.

Executive Summary:

Some Things You May Not Know About FMEAs*

• RPNs provide limited Risk Discrimination.

• Prediction overconfidence is common.

• Expert judgments and claims are not consistent.

• No empirical evidence that Risk Rating methods yield useful decision-making information.

*Failure Mode and Effects Analysis

Rather than hold the audience in suspense throughout the presentation, this slide presents the four areas of concern that many users of the FMEA method may not know about. The viewer should be aware that there are other concerns about the FMEA method, but we felt the four shown in this slide represent the largest factors affecting risk assessment using the FMEA approach.

In this presentation we will explain the limitations of Risk Priority Numbers (RPNs) as indicators

of risk, discuss some foundational work associated with human predictions and the pitfall of

overconfidence, look at the use of Subject Matter Expert (SME) claims and the challenges

therein, and close with a brief discussion on the lack of informative feedback supporting risk

rating methods. We hope you find this presentation both thought provoking and insightful.

Informational

Introduction

“Everyone is perfectly willing to learn from unpleasant experiences—if only the damage of the first lesson could be repaired.”

Lichtenberg, Scientist and Satirist

Basic Definition of Risk

• In the beginning:

Chance occurrence beyond the realm of human control that could cause loss or harm… (initially concerned with games of chance)

• Long definition:

The probability (or chance) and magnitude of loss (severity) of an unplanned and/or undesirable event.

• Short Definition:

  – The chance that something bad could happen.

Since the beginning, mankind has struggled to understand the nature of chance occurrences and their relationship to loss or harm. It is useful to understand that there are many definitions of risk. Some are better than others, and some are simply confusing.

We use this slide to bound the definition of risk as a foundation for the discussion.

Definition of Risk Management

• Long definition:

The identification, assessment, and prioritization of risk, and the structured and economical application of resources to minimize and control the probability and impact of undesirable events.

• Short Definition:

  – Being smart about taking chances…

When we combine the definition of risk with the definition of management, we gain a sense of our mission as quality professionals. We trust this definition does not surprise you, and we hope you carry it with you as a reference throughout the remainder of this presentation.

A Difficult Story

• On July 19, 1989, UA Flight 232 headed out from Denver, CO to Chicago, IL.

• In the early afternoon the plane lost use of the rear engine and control of ALL flight surfaces.

• All maneuvering of the plane had to be done using thrust changes on the wing-mounted engines.

• UA 232 was rerouted to Sioux City, Iowa, for an emergency landing.

We want to start the presentation with a problem in order to establish why this discussion matters. In essence, this first section should answer the question, “What’s in it for me?”

Without memorializing this horrific accident, we want to use it as a point of departure for discussing the role and potential peril of risk management efforts. The rest of the slides should be self-explanatory.

• The crash of UA 232 resulted in 111 fatalities, with 185 passengers and crew surviving.

• The apparent cause was a failure of the rear engine fan disk. The engine had 17 years of service on it…

• A comprehensive failure assessment found that shrapnel from the fan disk failure severed a key line linking all three redundant hydraulic control systems.

• The claimed cause was human error in the inspection of the fan disk during service maintenance of the engine.

Common-Mode Failure

Results of the US NTSB assessment of the UA 232 crash.

[Figure: DC-10 tail view showing the common-mode system failure]

It is instructive to recognize that hidden within this simple drawing is a disaster waiting to happen.

The common-mode failure of the hydraulic system was due to the severing of a section of piping common to all three “redundant” hydraulic systems that control the flight surfaces. How did this happen? Why wasn’t it addressed during the design stage of the DC-10 aircraft? Why was there a belief that the chance of this line being damaged was one in a billion? What factors contributed to reducing the chance of complete hydraulic failure? What is the role of risk management during the design stage?

Underestimation of Failure Likelihood

Do you think the risk management methods used by McDonnell Douglas should have identified this design flaw?

Certainly there were indications of design problems prior to the initial production of the DC-10. Here is an example of a similar failure that occurred about 4 years prior to the crash of UA Flight 232.

Causal Ladder of Failure (top = final effect, bottom = underlying cause)

Plane Unable to Land Safely

Loss of Control of All Flight Surfaces

Debris from Fan Disk Damaged Key Control Hydraulics

Stress Cracks in Disk Blades Caused Failure of Fan Disk

Stress Cracks in Disk Blades Missed During Maintenance Inspection

Risk Assessment Methods Used to Identify Limitations of Human Inspection of Fan Disk Were Ineffective!

Callouts: “Likely Common-Mode Failure: Not a One-in-a-Billion Likelihood!” and “Proposed Common-Mode Failure Cause Attributed to Human Error!”

What were the underlying causal elements that led to the failure of UA Flight 232? The US NTSB report focused on “human error” as the predominant cause, even though it acknowledged that the stress crack in the fan disk would have been difficult to observe.

Given the nature of the common-mode failure, shouldn’t a team of risk professionals have caught this flaw at the design stage? What do you think?

Other Common-Mode Failures

• Hurricane Katrina.

• The Financial Crisis of 2008/09.

• On-board microprocessors in automotive, aviation, and other transportation applications.

• Embedded software applications in all computer-controlled devices and equipment.

• Climate Change.

• Supply disruption of raw resources: food, fuel, and water.

• Poor government policies that impact society.

Dramatization of UA Flight 232

• Follow the DiscoveryChannel.ca story of UA Flight 232 via this link:

http://watch.discoverychannel.ca/mayday/season-11/mayday-impossible-landing/#clip662372

Informational Brief

Risk Management Approaches

“There is perhaps no beguilement more insidious and dangerous than an elaborate and elegant mathematical process built upon unfortified premises.”

Thomas C. Chamberlin, Geologist and Writer (1899)

In this section we take a quick look at three approaches commonly used to manage risk: Failure Mode, Effects, and Criticality Analysis (FMECA), Fault Tree Analysis (FTA), and Failure Mode and Effects Analysis (FMEA).

Our focus in this section is on the FMEA approach. Feel free to view the slides in the Appendix for details on the other two approaches.

Two General Ways to Manage Risk

• Choose the areas to “optimize” risk reduction, subject to various constraints:

  – Budget constraints
  – Dependencies and interactions among sources, targets, and consequences
  – Dependencies among countermeasures

• Identify, document, and rank risk concerns, then tackle the largest perceived risks first:

  – Using Rating Scales of risk as a guide
  – Using Risk Matrices to identify key concerns

Adapted from webinar of 8-Nov-2012 by Dr. Tony Cox of Cox-Associates

This slide describes the two general ways of conducting risk management today.

The first way of managing risk seeks to choose actions, called countermeasures, that provide the greatest risk reduction possible for the money spent. This approach is an optimization problem, and as such requires calculating the size of each risk reduction and how much of the budget is required to achieve it. Using this approach, a team has a wide variety of options for employing countermeasures. For example, the team can consider the effects of a given countermeasure on identified risks other than the one it targets, and in doing so can optimize cost and risk reduction simultaneously.

The second way of managing risk involves ranking the largest risks identified in the system from a universe of many. This is done because there are simply too many potential risks for the budget to handle. In this approach each potential risk is treated independently of the others, without considering any dependencies between risk events or interactions among their sources, consequences, or countermeasures. This approach to risk management lacks any real optimization heuristics, but is considered much easier and less complicated than the first approach.
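To make the contrast between the two approaches concrete, here is a minimal sketch in Python. The countermeasure labels, risk-reduction values, costs, and budget are all hypothetical, and the sketch ignores the dependencies and interactions that a real Type 1 analysis would model. The Type 2 function funds the largest risks first while money remains; the Type 1 function searches every affordable combination for the greatest total risk reduction.

```python
from itertools import combinations

# Hypothetical countermeasures: (label, risk_reduction, cost). Invented numbers.
COUNTERMEASURES = [
    ("A", 50, 80),
    ("B", 30, 40),
    ("C", 28, 35),
    ("D", 20, 25),
]
BUDGET = 100


def type2_rank_and_fund(items, budget):
    """Type 2: rank by size of risk, fund the largest first while money remains."""
    chosen, spent = [], 0
    for name, reduction, cost in sorted(items, key=lambda x: x[1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen


def type1_optimize(items, budget):
    """Type 1: pick the subset with the greatest total risk reduction within budget.

    Brute force over all subsets; only practical for a handful of items.
    Interactions between countermeasures are ignored in this sketch.
    """
    best, best_reduction = [], 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            cost = sum(c for _, _, c in subset)
            reduction = sum(rr for _, rr, _ in subset)
            if cost <= budget and reduction > best_reduction:
                best, best_reduction = [n for n, _, _ in subset], reduction
    return best


def total_reduction(items, names):
    return sum(r for n, r, _ in items if n in names)


if __name__ == "__main__":
    t2 = type2_rank_and_fund(COUNTERMEASURES, BUDGET)
    t1 = type1_optimize(COUNTERMEASURES, BUDGET)
    print("Type 2:", t2, total_reduction(COUNTERMEASURES, t2))  # Type 2: ['A'] 50
    print("Type 1:", t1, total_reduction(COUNTERMEASURES, t1))  # Type 1: ['B', 'C', 'D'] 78
```

With these invented numbers, funding the largest risk first spends most of the budget on A, while the optimized selection funds B, C, and D for a larger total reduction. Larger real problems are usually posed as integer programs rather than solved by brute force.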

Typical Risk Management Methods

• Typical Risk Management Methods in use today:

  – Failure Mode, Effects, and Criticality Analysis (FMECA)
  – Fault Tree Analysis (FTA)
  – Failure Mode and Effects Analysis (FMEA)

• Over 75% of applications today use the FMEA approach.

[Figure: cross-classification of each method as Type 1, Type 2, or Combined]

This slide provides a view of the three typical methods used in industry and government today to manage risk. Included in this slide is a cross-classification of each method against the two general types of risk management approaches, shown at the far left.

Of the three methods shown, FMEA embodies all of the characteristics of a Type 2 risk management approach, using a rank ordering of risk classifications as the basis for identifying the largest risks in a system. Of the three methods of risk management, FMEA is the most popular. This is due to its simplicity of use and its lack of complex probabilistic mathematics, which many people untrained in probability theory find challenging to grasp.

That said, beneath the ranking categories is an implicit sense of probability and uncertainty, which is masked by a seemingly simple process of selecting a rating value. Despite the ranking system, the complexity of failure rates, hazard functions, and probability still prevails.

Risk as Defined by FMEA

Risk in FMEA is defined as:

• Severity (of failure effect),

• Occurrence (of cause or failure),

• Detection (of cause or failure)

The FMEA method of risk determination defines risk in the three areas shown in this slide. In combination, these three dimensions address the components of risk as defined by the FMEA method: the potential for a failure to occur (Occurrence), the relative level of hazard (Severity), and the ability to detect or prevent the failure before it happens (Detection). Ideally, risk is reduced by gaining a clear and accurate understanding of the failure mechanisms, or by reducing the uncertainty associated with each dimension.

AIAG Guidelines:

Severity Ranking Criteria

Reprinted from www.TheNewExcellence.com

This slide shows the ranking criteria for the Effect of a failure mode on the system or user. It assigns a “1” to the lowest level of risk and a “10” to the highest. These criteria were developed by the Automotive Industry Action Group (AIAG) to support the organization’s management of both supplier quality and process/product design through the potential risk factors in a given system.

AIAG Guidelines:

Occurrence Ranking Criteria

This slide shows an example table of risk ranking criteria for the Occurrence of a failure

mode or the cause of a failure mode. Again, a “1” is considered low risk and a “10” is

considered high risk.

Please note column 3 of this table, labeled Ppk. In reviewing these tables online, I noticed a mixed use of the indices Ppk and Cpk in this column. Ppk is called the Process Performance Index and Cpk is called the Process Capability Index, and they are used interchangeably as a soft probability measure of failure rates. My AIAG reference uses Cpk in this table.

There is great confusion in the automotive and other industries about the value and use of these two process measures. The confusion is so great that they are often used interchangeably even though they measure different things. In the next slide we try to explain these two estimates in an effort to minimize the confusion.

Basis for Computation:

Performance vs. Capability Indices

[Figure: subgroups sampled at Time 1 through Time 5, showing within-group variation, between-group variation, and the total variation on which Pp and Ppk are based]

When making predictions about the future, the measures we use should be reliable. A reliable measure is one that is consistent over time. If measures are not reliable, their utility in prediction is limited.

When measuring products, parts, services, or any array of items considered to be identical, we don’t usually observe the same value for every piece of work. Instead, we observe a range of values around a common aim for the process that produced the work. This range of values is called the observed or “total” variation. The components of total variation, as shown, are called “within-group” and “between-group” variation. Common terminology used by others refers to the average within-group variation as short-term variation and the total variation as long-term variation. These are non-standard terms that obscure the purpose of breaking the variation into its components.

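Basis for Computation:

Performance vs. Capability Indices

[Figure: subgroups sampled at Time 1 through Time 5; total variation is the basis for Pp and Ppk, average within-group variation is the basis for Cp and Cpk]

If the process is in a state of control: Pp ≈ Cp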

Typical practice is to compute the Process Performance Indices using the total variation, and the Process Capability Indices using the average within-group variation. As you can see, these are two different components of variation, which can yield two different results. Looking at the slide, we notice a third component of variation called between-group variation. If the between-group variation is too great, the process is considered unreliable. In essence, large shifts between groups indicate that multiple isolated causal elements are present in the process. If the between-group variation is too great, then estimates of the Process Performance Indices, Pp and Ppk, will be poor predictors of the future performance of the process.

Better predictors of future process performance are Cp and Cpk. These two indices require the process to first achieve a state of statistical (stationary) control before they are computed. As such, all references to Pp and Ppk in an FMEA exercise should be replaced with Cp and Cpk. A better approach still would be to use failure rates and probabilities of failure directly, if possible.
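A minimal sketch of the two computations, assuming five hypothetical subgroups of five parts (Time 1 to Time 5) and hypothetical specification limits. Within-group sigma is estimated as R-bar/d2 and total sigma as the overall sample standard deviation; the d2 and A2 values are the standard Shewhart constants for subgroup sizes 2 to 5. Because the text says Cp and Cpk should only be computed once the process is in control, the sketch also applies a simple X-bar chart check.

```python
from statistics import mean, stdev

# Hypothetical measurements: five subgroups of five parts (Time 1 ... Time 5),
# with deliberate shifts between subgroups.
SUBGROUPS = [
    [9.8, 10.0, 10.2, 9.9, 10.1],
    [10.3, 10.5, 10.7, 10.4, 10.6],
    [9.6, 9.8, 10.0, 9.7, 9.9],
    [10.1, 10.3, 10.5, 10.2, 10.4],
    [9.9, 10.1, 10.3, 10.0, 10.2],
]
LSL, USL = 9.0, 11.0  # hypothetical specification limits

# Standard Shewhart constants, indexed by subgroup size n.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}


def capability_report(subgroups, lsl, usl):
    n = len(subgroups[0])
    data = [x for group in subgroups for x in group]
    grand_mean = mean(data)
    r_bar = mean(max(g) - min(g) for g in subgroups)

    sigma_within = r_bar / D2[n]  # average within-group variation (Cp, Cpk)
    sigma_total = stdev(data)     # total variation, within + between (Pp, Ppk)

    def indices(sigma):
        spread = (usl - lsl) / (6 * sigma)
        centred = min(usl - grand_mean, grand_mean - lsl) / (3 * sigma)
        return spread, centred

    cp, cpk = indices(sigma_within)
    pp, ppk = indices(sigma_total)

    # X-bar chart check: is every subgroup mean inside grand_mean +/- A2 * R-bar?
    lcl = grand_mean - A2[n] * r_bar
    ucl = grand_mean + A2[n] * r_bar
    in_control = all(lcl <= mean(g) <= ucl for g in subgroups)

    return {"Cp": cp, "Cpk": cpk, "Pp": pp, "Ppk": ppk, "in_control": in_control}


report = capability_report(SUBGROUPS, LSL, USL)
for name in ("Cp", "Cpk", "Pp", "Ppk"):
    print(f"{name}: {report[name]:.2f}")
print("in statistical control:", report["in_control"])
```

With the deliberate shifts in this invented data, Cpk (about 1.67) sits well above Ppk (about 1.00), and the X-bar check reports that the process is not in control, so by the reasoning above neither pair of indices should be trusted as a predictor.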

Type 2 RM Approach:

Example of a Process FMEA

[Figure: example Process FMEA worksheet; PROCESS steps: Engage Transportation → Take Route Over North Bridge → Cross Major Downstream Intersection → Arrive at Work On Time]

October 10, 2010

This simple example illustrates the correct application of a Process FMEA.

Please note that correct use of this tool requires you to list the process steps, not process variables, in the far-left column. The focus of this tool is on the potential failure modes of each process step and the relative risk of each cause of a given failure mode, indicated as “RPN,” which stands for Risk Priority Number.

Example of a Design FMEA

PRODUCT: Handle

This simple example illustrates the correct application of a Design FMEA supporting a product.

Please note that correct use of this tool requires you to list the product components, not process steps, in the far-left column. Using this tool we focus on the potential failure modes of each product component and on the component-to-component interactions, with the goal of establishing mitigating design controls or redesigning the product to eliminate high-risk failure modes.

Informational Brief

Some Issues with the FMEA Approach

“Quality improvement will result from people improving their processes and from management improving the system.”

T. Pyzdek

Four Issues with FMEAs + a bonus

• RPNs provide limited Risk Discrimination.

• Prediction overconfidence is common.

• Expert judgments and claims are not consistent.

• No empirical evidence that Risk Rating methods yield useful decision-making information.

• FMEA risk claims are rarely verified with actual follow-up data.

Here again are the four issues cited earlier, plus one additional issue for good measure.

The next slides discuss each of these issues in some detail to give the reader some insight into the possible weak areas of a risk assessment using Failure Mode and Effects Analysis.

Calculation of Risk Priority Number (RPN)

RPN = Severity Rating * Occurrence * Likelihood of Detection

- How many total RPN values are available for the FMEA analysis?

- How many unique RPN values are available for the analysis?

For those unfamiliar with FMEAs, this slide shows the simple calculation used to compute a Risk Priority Number, or RPN.

Using a scale of 1 to 10 for each risk area, how many RPN values do you believe are available for the FMEA analysis? Now, don’t cheat and look ahead. Try instead to think the answer through.

Of the total number of RPN values from the previous question, how many do you believe are unique? This is a tougher question, but try to think it through…

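Risk Ranking Scale of 1 to 10:

Enumerating RPN Classifications (first rows of the worksheet)

S | O | D | Calculated RPN || RPN Value | Observed Number of Classifications
1 | 1 | 1 | 1 || 1 | 1
1 | 1 | 2 | 2 || 2 | 3
1 | 1 | 3 | 3 || 3 | 3
1 | 1 | 4 | 4 || 4 | 6
1 | 1 | 5 | 5 || 5 | 3
1 | 1 | 6 | 6 || 6 | 9
1 | 1 | 7 | 7 || 7 | 3
1 | 1 | 8 | 8 || 8 | 10
1 | 1 | 9 | 9 || 9 | 6
1 | 1 | 10 | 10 || 10 | 9
1 | 2 | 1 | 2 || 11 | 0
1 | 2 | 2 | 4 || 12 | 15
1 | 2 | 3 | 6 || 13 | 0
1 | 2 | 4 | 8 || 14 | 6
1 | 2 | 5 | 10 || 15 | 6
1 | 2 | 6 | 12 || 16 | 12
1 | 2 | 7 | 14 || 17 | 0
1 | 2 | 8 | 16 || 18 | 15
1 | 2 | 9 | 18 || 19 | 0
1 | 2 | 10 | 20 || 20 | 15
1 | 3 | 1 | 3 || 21 | 6
1 | 3 | 2 | 6 || 22 | 0
1 | 3 | 3 | 9 || 23 | 0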

To help answer the previous questions, open Excel or another spreadsheet program and try to enumerate all of the possible combinations of RPN. This slide illustrates how to set up this evaluation. If you have access to our spreadsheet, open it and look it over. Now, can you answer the previous questions?
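The same enumeration can be sketched in a few lines of Python instead of a spreadsheet. It assumes the 1 to 10 scale used throughout and simply multiplies every (S, O, D) combination, counting how many combinations land on each RPN value.

```python
from collections import Counter
from itertools import product

SCALE = range(1, 11)  # 1-to-10 rating scale used for S, O and D


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: Severity * Occurrence * Detection."""
    return severity * occurrence * detection


# Count how many (S, O, D) combinations produce each RPN value.
rpn_counts = Counter(rpn(s, o, d) for s, o, d in product(SCALE, repeat=3))

total_combinations = sum(rpn_counts.values())
unique_values = len(rpn_counts)

print("total combinations:", total_combinations)          # 1000
print("unique RPN values:", unique_values)                # 120
print("combinations giving RPN = 12:", rpn_counts[12])    # 15
```

The count for RPN = 12 matches the worksheet excerpt above, and values such as 11 or 13 never occur because no combination of ratings produces them.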

Plot of RPN Class Counts

[Figure: Risk Priority Value Plot, number of combinations at each RPN value; x-axis: RPN Values from 0 to 1,000 (based on a 1 to 10 scale)]

This slide shows a plot of the enumerated RPN values for a 1 to 10 scale across the three ranked classifications.

What do you see in this slide? Do you notice that the greatest number of RPN values cluster around an RPN of about 100? How many RPN values are available above 500?

The distribution of RPN values shows a bias toward the lower range of all possible values. Also note that each bar of this plot indicates the number of duplicate RPN classifications available. These are non-unique classifications. Did you know that RPN values behaved this way before our discussion? How might this behaviour affect the risk assessment process?
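A short sketch, under the same 1 to 10 assumption, that quantifies the skew described above and answers the question about RPNs above 500. Because a full enumeration treats S, O, and D as independent and uniform over 1 to 10, the mean RPN is 5.5³ = 166.375, far below the midpoint of the 1 to 1,000 range.

```python
from itertools import product
from statistics import mean

SCALE = range(1, 11)  # 1-to-10 rating scale for S, O and D
rpns = [s * o * d for s, o, d in product(SCALE, repeat=3)]

above_500 = [r for r in rpns if r > 500]
share_at_or_below_100 = sum(r <= 100 for r in rpns) / len(rpns)

print("mean RPN:", mean(rpns))                            # 166.375
print("combinations with RPN > 500:", len(above_500))     # 60
print("distinct RPN values > 500:", len(set(above_500)))  # 17
print("share of combinations with RPN <= 100:", share_at_or_below_100)
```

Only 60 of the 1,000 combinations, and only 17 distinct RPN values, lie above 500.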

Summary:

Enumerated RPN Classes with a 1 to 10 Scale

FMEAs have an extremely limited capability to discriminate Risk Classifications

This slide summarizes the plot shown previously. Now you can answer the earlier questions we asked about the measurement used to quantify risk with the FMEA method.

Out of 1,000 possible risk combinations (10 × 10 × 10), how many are actually available for use in the FMEA risk assessment method?

Out of the actual number available, how many RPNs provide unique classifications? This information speaks directly to the ability of FMEA to discriminate between different risk conditions. We illustrate this effect in the next slide.

Some FMEA Examples

• Suppose S=10, O=9, and D=4 for a given failure mode. What actions might you consider?

• Suppose S=4, O=9, and D=10 for a given failure mode. What actions might you consider?

• Suppose a failure is Hazardous, its chance of Occurrence is ≅ 30%, and its ability to be detected in production is Variable. What actions might you consider?

Look at each of the three risk classifications shown in this slide. Given the components of risk shown, think about the actions you might take to reduce the potential risks.

Would you take the same actions for each of the three listed risk conditions? If so, why, when the individual Severity, Occurrence, and Detection ratings differ so greatly? If not, why not, when the summarized risk in the form of RPN is the same for all three risk conditions?

Using RPN as the primary measure of risk seems to present a few challenges. Let’s look at the entire range of possible risk classifications for RPN = 360.

Illustration: Fifteen “Equivalent” Rankings of RPN = 360

Likelihood of Detection (Ranked Value) | Likelihood of Occurrence (Ranked Value) | Severity (Ranked Value)
10 Impossible | 9 Very High | 4 Very Low
9 Very Remote | 10 Very High | 4 Very Low
9 Very Remote | 8 High | 5 Low
8 Remote | 9 Very High | 5 Low
10 Impossible | 6 Moderate | 6 Moderate
6 Low | 10 Very High | 6 Moderate
9 Very Remote | 5 Moderate | 8 Hazardous
5 Moderate | 9 Very High | 8 Hazardous
10 Impossible | 4 Moderate | 9 Hazardous
8 Remote | 5 Moderate | 9 Hazardous
5 Moderate | 8 High | 9 Hazardous
4 Moderately High | 10 Very High | 9 Hazardous
9 Very Remote | 4 Moderate | 10 Hazardous
6 Low | 6 Moderate | 10 Hazardous
4 Moderately High | 9 Very High | 10 Hazardous

Note the range of Severity, Occurrence, and Detection values in the table. All 15 combinations represent the same risk as measured by RPN.
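A brute-force search over all 1,000 combinations, sketched in Python, recovers exactly the fifteen (S, O, D) rankings in the table. The helper name `combos_for` is illustrative only.

```python
from itertools import product


def combos_for(target, scale=range(1, 11)):
    """All (S, O, D) ratings on the given scale whose product equals target."""
    return [(s, o, d) for s, o, d in product(scale, repeat=3) if s * o * d == target]


ties = combos_for(360)
print(len(ties))  # 15
for s, o, d in ties:
    print(f"S={s:2d}  O={o:2d}  D={d:2d}")
```

Only three sets of ratings, {4, 9, 10}, {5, 8, 9}, and {6, 6, 10}, multiply to 360; the fifteen rows are their orderings across the three dimensions.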

1. Poor Risk Discrimination

• Risk Priority Numbers are the products of three ordinal-scale values!

Scale | Legal Mathematical Operations | Characteristics | Example
Ratio Scale | Multiplication and Division Operations | Values that possess ordering, distance, and an absolute zero | Temperature scales with an absolute zero, e.g., Kelvin
Interval Scale Data | Addition and Subtraction Operations | Values that possess both ordering and defined distance | Temperature scales of °F or °C without an absolute zero
Ordinal Scale Data | Ranking or Grouping to define an Ordering among Categories | Values ranked in a logical order | Places in a contest such as 1st, 2nd, and 3rd

Adapted from D.J. Wheeler, The Six Sigma Practitioner’s Guide to Data Analysis, 2005

So, what is the reason for the observed behaviour of RPNs? This slide provides a clue in the legal mathematical operations listed for each scale in the table.

What is the difference between ordinal and ratio scale data? How are RPNs calculated?
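The consequence of multiplying ordinal values can be shown with two hypothetical failure modes. For ordinal data, any strictly increasing relabeling of the ranks carries exactly the same ordering information; the sketch uses the arbitrary choice x → 2^x, an assumption made only for illustration. The relabeling keeps every rank in the same order, yet it reverses which failure mode the RPN places first.

```python
# Two hypothetical failure modes, rated (Severity, Occurrence, Detection).
MODE_A = (10, 2, 2)
MODE_B = (4, 4, 3)


def rpn(ratings, relabel=lambda rank: rank):
    """Multiply the three ratings after an optional relabeling of each rank."""
    s, o, d = (relabel(r) for r in ratings)
    return s * o * d


def doubling(rank):
    """A hypothetical, strictly increasing relabeling of the 1-10 ranks."""
    return 2 ** rank


print(rpn(MODE_A), rpn(MODE_B))                      # 40 48
print(rpn(MODE_A, doubling), rpn(MODE_B, doubling))  # 16384 2048
```

Under the original labels B outranks A; under an equally valid ordinal labeling A outranks B. A product of ordinal ranks therefore says as much about the arbitrary numbering as about the risk.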

2. Prediction Overconfidence

• Buried beneath the risk rankings for Occurrence and Detection is an estimate of probability.

• Most of us don’t understand how probabilities work, and instead relegate our estimates to ranked values.

• These mental gymnastics carry some hidden problems.

• For years, research psychologists have known that people are naturally “overconfident” in their predictions.

• Let’s illustrate this effect in the next slide, where we ask a few TRUE/FALSE trivia questions.

The claims on this slide are supported by the early groundbreaking work of psychologists Amos Tversky and Daniel Kahneman.

This work is so well known in the field of decision science that few question its validity. Unfortunately, few in industry have heard of it or understand its ramifications.

If we have time during the presentation, we will attempt a limited calibration exercise. If we are unable to conduct this exercise due to time constraints, you can conduct it yourself. You can check the Appendix for the answers once you complete the first part of the exercise.

Simple Calibration Exercise

Statement | Answer (T or F) | Confidence that You are Correct
The first six digits of the constant pi are 3.14139. | ___ | 50% 60% 70% 80% 90% 100%
Modern humans first appeared on the earth about 200,000 years ago. | ___ | 50% 60% 70% 80% 90% 100%
In 2002, the price of a new desktop computer was under $1,500. | ___ | 50% 60% 70% 80% 90% 100%
One meter equals 37.39 inches. | ___ | 50% 60% 70% 80% 90% 100%
Napoleon was born on the island of Corsica. | ___ | 50% 60% 70% 80% 90% 100%
M is one of the three most commonly used letters. | ___ | 50% 60% 70% 80% 90% 100%
Mars is always further away from the Earth than Venus. | ___ | 50% 60% 70% 80% 90% 100%
A liter of oil weighs less than a liter of water. | ___ | 50% 60% 70% 80% 90% 100%
There is no species of three-humped camel. | ___ | 50% 60% 70% 80% 90% 100%
The ancient Romans were conquered by the ancient Greeks. | ___ | 50% 60% 70% 80% 90% 100%

Exercise Instructions:

1. Read the statement.

2. Decide whether the statement is True or False.

3. Circle how confident you feel about your answer.

4. Complete all 10 statements.

Find the answers in the Appendix

Results:

Prediction Calibration

• In subjective assessments, an evaluator is considered calibrated if the proportion of correct assessments equals the average weighted confidence assigned by the evaluator.

• As an example, suppose in our exercise you observed 6 correct answers out of 10 possible, or 60%, and the average confidence of the 6 correct answers was 75%.

• In the long run you claim 75% confidence in achieving correct answers, but only 60% of your answers were actually correct. Compare the two values:

  – If %Actual Correct < %Confidence: Overconfident
  – If %Actual Correct > %Confidence: Under-confident

Follow the guidance in this slide to compute the percentage of correct answers and the average confidence for correct answers, then make the comparison shown at the bottom of the slide.

Please bear in mind that this is a simple exercise containing a sample of only 10 questions. Its ability to measure “subjective assessment” performance is extremely limited.

To get a reasonable estimate of “assessment” performance, you would need a minimum of 50 calibration questions to start. So don’t worry if you did not do well with this exercise. Accept that it is just an indicator, and realize that you may be capable of overconfident responses.
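The comparison can be sketched in Python as follows. The ten answers and confidences are hypothetical, chosen to reproduce the slide’s example (6 of 10 correct, 75% average confidence on the correct answers). The slide averages confidence over the correct answers only; the more common convention in the calibration literature compares percent correct with the mean confidence over all answers, so the sketch offers both through a `correct_only` flag, a parameter introduced here for illustration.

```python
from statistics import mean

# Hypothetical results for the 10-statement exercise:
# (answered correctly?, confidence circled, in percent)
ANSWERS = [
    (True, 70), (True, 80), (True, 70), (True, 80), (True, 70), (True, 80),
    (False, 90), (False, 60), (False, 80), (False, 70),
]


def calibration(answers, correct_only=False):
    """Compare percent correct with average stated confidence.

    correct_only=False averages confidence over all answers;
    correct_only=True averages it over the correct answers only,
    as in the slide's worked example.
    """
    pct_correct = 100 * sum(ok for ok, _ in answers) / len(answers)
    pool = [conf for ok, conf in answers if ok or not correct_only]
    avg_conf = mean(pool)
    if pct_correct < avg_conf:
        verdict = "overconfident"
    elif pct_correct > avg_conf:
        verdict = "under-confident"
    else:
        verdict = "calibrated"
    return pct_correct, avg_conf, verdict


print(calibration(ANSWERS))                     # (60.0, 75, 'overconfident')
print(calibration(ANSWERS, correct_only=True))  # (60.0, 75, 'overconfident')
```

With only ten questions, treat the verdict as an indicator rather than a measurement, as the note above cautions.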

3. Inconsistent Expert Claims*

• Like the rest of us, Subject Matter Experts (SMEs) tend to make “overconfident” claims.

• Unlike the rest of us, SMEs can often make overconfident claims outside their areas of experience and training.

• This overconfidence can present a problem to an assessment team when evaluating subjective risk.

• Team members have a strong tendency to give SMEs far more latitude in making claims than other members.

• Additionally, SMEs provide expert advice inconsistently, so care should be taken before using uncalibrated expert advice without question…

*Thoroughly researched by Tversky, Kahneman, Lichtenstein, Fischhoff, and Phillips

Some additional insight from the work of the listed researchers. The takeaway is that SMEs are prone to the same judgement errors as the rest of us. Consider this possibility the next time you receive expert advice from anyone, including me!

Work Conducted by the US Navy in 1981

All of the previous information has been known since the mid-1970s and has been codified in

many US military references and guidance.

4. Limited Empirical Evidence for FMEAs

• A review of the literature, past and present, provides little quantitative empirical evidence of FMEA effectiveness.

• Most companies do not track or collect this information.

• There is substantial research showing the effectiveness of Probabilistic Risk Assessment (PRA) over Risk Ranking (RR) methods.

• Many US government agencies have returned to PRA in preference to Risk Ranking methods since the late 1970s. (See Appendix for additional details.)

It is difficult to find any literature supporting empirical studies on the use and effectiveness of

the FMEA method of risk management.

Excerpt from 2009 IEEE Journal Article

• Meshkat, Leila, PhD, Probabilistic Risk Assessment for Decision Making during Spacecraft Operations, IEEE, 2009. Page 1, Sect. 1.1, Quantitative Risk Assessment (QRA):

This excerpt provides additional support that US government agencies, once enamored with the FMEA method, are moving back to more conventional Type 1 risk assessment methods.

5. FMEA Risk Claims Rarely Verified

• This comment is supported by the previous one.

• There is usually no closed-loop evaluation of FMEA risk claims against actual warranty returns, field issues, etc.

• Without this information, it is difficult for an operation to know whether its risk assessment efforts actually manage product and process risk.

• Without this knowledge, the operation is unable to address any glaring issues with its risk assessment efforts and take the needed improvement actions.

An FMEA assessment is a predictive evaluation of the system(s) under study. In any empirical scientific endeavor we gain feedback from the systems we study and compare our predictions to actual performance.

For some reason, this doesn’t seem to happen with FMEA work in most companies. I’m not sure why this is the case, but now that you understand this gap, you might consider building the feedback loop into the risk management process at your company. There is no other way to uncover the shortcomings of this method and make the necessary corrections.

Please note the last bullet point on this slide, and feel free to look at the timeline of risk management methods in the Appendix when you have a chance.
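One way to close the loop is sketched below, under stated assumptions: the team has translated each Occurrence rating into a predicted per-unit failure probability (the occurrence criteria relate ratings to failure rates), and field or warranty records give failure counts. The sketch applies a one-sided exact binomial test and flags failure modes whose observed failures are implausibly high for the FMEA prediction. The failure modes, probabilities, counts, and the 0.05 threshold are all hypothetical.

```python
from math import comb

# Hypothetical closed-loop data:
# (failure mode, predicted per-unit probability, failures observed, units in field)
FIELD_DATA = [
    ("seal leak", 0.0005, 10, 5000),
    ("loose handle", 0.001, 4, 5000),
]


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def flag_underestimated(data, alpha=0.05):
    """Return failure modes whose field failures are unlikely under the prediction."""
    flagged = []
    for name, predicted_p, failures, units in data:
        if prob_at_least(failures, units, predicted_p) < alpha:
            flagged.append(name)
    return flagged


print(flag_underestimated(FIELD_DATA))  # ['seal leak']
```

Here the seal leak was expected to fail about 2.5 times in 5,000 units but failed 10 times, so its Occurrence rating deserves review; the loose handle’s 4 failures are consistent with its prediction of about 5.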

Informational

Suggestions for Improvement

Suggestions for Improvement of RAs

• Consider phasing out the use of RPNs when conducting FMEAs.

• Consider sorting the risk evaluations in FMEAs by Severity, then Occurrence, and then Detection, and work with the ranked failure modes directly (this increases risk discrimination).

• Consider a move to Probabilistic Risk Assessment (PRA) methods in the future as rate data become available:

  – Use of Monte-Carlo Simulations (uses knowledge of input distributions)
  – Use of Bayesian Inversion Analysis (uses past reliability performance)

• If PRA methods are not viable for your work, consider adjusting the RPN to a Corrective Priority Number by dividing the RPN by the unit cost to implement the corrective action or detection method; see the next slide for an example.

This is a subject of its own. If there is interest, we can work on a separate presentation covering the bullet points in this slide.
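A minimal sketch of the suggested Severity, then Occurrence, then Detection sort, assuming the usual FMEA convention that 10 is the highest-risk rating on every scale. The failure modes and their ratings are hypothetical; three of them deliberately share RPN = 360.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes; 10 is the highest-risk rating on every scale.
MODES = [
    FailureMode("seal leak", 10, 9, 4),
    FailureMode("label smear", 4, 9, 10),
    FailureMode("loose handle", 6, 6, 10),
    FailureMode("cracked housing", 9, 3, 2),
]

# RPN ranking: the three 360s tie and keep their input order (sorted is stable).
by_rpn = sorted(MODES, key=lambda m: m.rpn, reverse=True)

# Suggested ranking: Severity first, then Occurrence, then Detection.
by_sod = sorted(MODES, key=lambda m: (m.severity, m.occurrence, m.detection), reverse=True)

print([m.name for m in by_rpn])  # ['seal leak', 'label smear', 'loose handle', 'cracked housing']
print([m.name for m in by_sod])  # ['seal leak', 'cracked housing', 'loose handle', 'label smear']
```

The RPN sort cannot separate the three modes rated 360, and it ranks the high-severity cracked housing last; the lexicographic sort breaks the tie and moves it up to second.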

Estimating a Corrective Action Index

An example illustrating the last bullet point on the previous slide. The use of the CAI shown on this slide is adapted from recent work by Dr. Tony Cox.
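A minimal sketch of the adjustment as the previous slide describes it: divide each RPN by the unit cost of its corrective action or detection method, then rank. The failure modes, RPNs, and costs are hypothetical, and the CAI in Dr. Cox's work may be defined differently from this simple ratio.

```python
from typing import NamedTuple


class CorrectiveOption(NamedTuple):
    failure_mode: str
    rpn: int          # Risk Priority Number from the FMEA (S x O x D)
    unit_cost: float  # hypothetical cost to implement the action, per unit


def corrective_priority(option: CorrectiveOption) -> float:
    """RPN divided by unit cost: a rough 'risk ranking per unit cost' score."""
    if option.unit_cost <= 0:
        raise ValueError("unit cost must be positive")
    return option.rpn / option.unit_cost


# Hypothetical candidate actions.
options = [
    CorrectiveOption("Seal leak", 180, 12.0),
    CorrectiveOption("Connector crack", 240, 40.0),
    CorrectiveOption("Label smudge", 90, 1.5),
]

ranked = sorted(options, key=corrective_priority, reverse=True)
for opt in ranked:
    print(f"{opt.failure_mode:16s} RPN={opt.rpn:3d} cost={opt.unit_cost:6.2f} "
          f"priority={corrective_priority(opt):6.1f}")
```

Here the cheap fix for the lowest-RPN item moves to the top. Because the score is built on the RPN, it still inherits the RPN's limited risk discrimination.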

Selected References

D. H. Stamatis, Failure Mode and Effect Analysis, (1995), copyright ASQ/ASQC Quality Press.

Automotive Industry Action Group, Potential Failure Mode and Effects Analysis – Reference Manual, (February 1995), Second Edition.

Automotive Industry Action Group, Statistical Process Control (SPC) - Reference Manual, (March 1995), Second Printing.

D. W. Hubbard, The Failure of Risk Management – Why It’s Broken and How to Fix It, (2009), copyright John Wiley & Sons.

D. J. Wheeler, The Six Sigma Practitioner’s Guide to Data Analysis, (2005), pp. 311-315, copyright SPC Press.

L. Meshkat, Probabilistic Risk Assessment for Decision Making during Spacecraft Operations, (2009), IEEE Journal.

Louis Anthony (Tony) Cox, Improving Risk Management, Comparisons and Decisions, November 2012 SIRA Meeting Webinar, Web Link: http://vimeo.com/53151221


US DoD Information Analysis Center, Failure Mode, Effects, and Criticality Analysis (FMECA), (1993), Reliability Analysis Center, Rome, NY.

The New Excellence, AIAG FMEA Severity, Occurrence, and Detection Ranking Guidelines, (2009), Web link: www.TheNewExcellence.com

S. Lichtenstein, B. Fischhoff, and L. D. Phillips, Calibration of Probabilities: The State of the Art to 1980, (1981), Perceptronics, Inc. sponsored by the US Office of Naval Research.

G. Keren, On the Calibration of Probability Judgments: Some Critical Comments and Alternative Perspectives, (1997), Journal of Behavioral Decision Making, Vol. 10, 269-278.

G. E. Apostolakis, How Useful is Quantitative Risk Assessment?, (2004), Risk Analysis, Vol. 24, No. 3, 515-520.

20 Worst Accidents Involving US (Aviation) Carriers, Web link: http://livingsta.hubpages.com/hub/20-Worst-Accidents-Involving-US-Carriers

Louis Anthony (Tony) Cox Jr., Risk Analysis of Complex and Uncertain Systems, (2009), copyright Springer.


Appendix

Other UA Flight 232 Photos

Timeline of Risk Management Methods

1949 – Use of FMECA first standardized in MIL-P-1629

1960s – NASA contractors developed and used variants of FMECA referred to as FMEA

1962 – Bell Labs develops the Fault Tree Analysis (FTA) approach

1967 – SAE publishes ARP-926 supporting the FMEA approach

1967 – US civil aviation industry adopts the FMEA approach supported by SAE

1970 – Automotive industry begins widespread use of FMEA

1970 – US FAA incorporates FTA into 14 CFR 25.1309 for all transport category aircraft

1971 – Widespread use of Probabilistic Risk Assessment (PRA) and FTA begins in the aviation industry

1973 – US EPA adopts the FMEA approach for risk assessment

1980 – MIL-P-1629 revised to MIL-STD-1629A, supporting FMECA

1981 – Mandatory use of PRA and FTA by the US nuclear power industry*

1984 – US government support of FMECA (MIL-STD-1629) canceled*

1987 – Mandatory use of PRA and tools such as FTA and FMECA by NASA after the 1986 shuttle disaster

1993 – AIAG publishes its first FMEA standard

1994 – SAE publishes its first FMEA standard

2010 – Widespread use of PRA methods by the US Dept. of Homeland Security*

*Major changes in risk management application

A Combined RM Approach: Failure Mode, Effects, and Criticality Analysis (FMECA)

• Applies to: hardware and products, processes, product applications, service systems, etc.

• FMEA part: failure effects, detection methods, compensating provisions, severity

• Criticality analysis: failure rate, mission time, modal criticality number

• FMECA criticality matrix

Some Potentially Useful Analysis Guidance
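The criticality-analysis inputs listed above (failure rate, mission time, modal criticality number) combine, in the MIL-STD-1629A formulation as we understand it, as C_m = β·α·λ_p·t. Here β is the conditional probability that the failure mode produces the stated effect, α is the failure mode ratio, λ_p is the part failure rate, and t is the operating (mission) time. The item criticality C_r is the sum of C_m over the item's failure modes. A minimal sketch with a hypothetical pump and hypothetical values:

```python
from dataclasses import dataclass


@dataclass
class ModeData:
    name: str
    beta: float          # conditional probability the mode causes the failure effect
    alpha: float         # failure mode ratio: share of the item's failures in this mode
    failure_rate: float  # part failure rate lambda_p, failures per hour


def modal_criticality(mode: ModeData, mission_hours: float) -> float:
    """C_m = beta * alpha * lambda_p * t."""
    return mode.beta * mode.alpha * mode.failure_rate * mission_hours


def item_criticality(modes, mission_hours: float) -> float:
    """C_r = sum of the modal criticality numbers for one item."""
    return sum(modal_criticality(m, mission_hours) for m in modes)


# Hypothetical pump with two failure modes (the alphas sum to 1 for the item).
pump_modes = [
    ModeData("Seal leak (loss of function)", beta=1.0, alpha=0.6, failure_rate=2e-5),
    ModeData("Bearing noise (degraded)", beta=0.1, alpha=0.4, failure_rate=2e-5),
]
mission = 100.0  # hours
for m in pump_modes:
    print(f"{m.name}: C_m = {modal_criticality(m, mission):.2e}")
print(f"Item criticality C_r = {item_criticality(pump_modes, mission):.2e}")
```

Unlike an RPN, each C_m is an expected-frequency quantity built from rate data, so the results can be compared with field experience and plotted against severity on a criticality matrix.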

Type 2 RM Approach:

FTA - Fault Tree Analysis

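Figure: fault tree (gates OR1, OR2, AND1; basic events: faults, errors, malfunctions, etc.)

$P_A = p_1 + p_2$

$P_B = p_3 \, p_4 \, p_5$

$P_{A2} = p_7 \, p_8$

$P_C = p_6 + P_{A2} = p_6 + p_7 \, p_8$

$P_{\text{system}} = P_A + P_B + P_C$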
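The equations above add probabilities at OR gates. This is the rare-event approximation: it is close to exact when the basic-event probabilities are small, and it slightly overstates the result. The sketch below assumes independent basic events with hypothetical values for p1 through p8 and evaluates the same tree both ways. For independent inputs, the exact OR is 1 − ∏(1 − p_i).

```python
from math import prod


def or_gate_rare_event(probs):
    """OR gate under the rare-event approximation: sum of input probabilities."""
    return sum(probs)


def or_gate_exact(probs):
    """OR gate for independent inputs: 1 - product of (1 - p)."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """AND gate for independent inputs: product of input probabilities."""
    return prod(probs)


def top_event(p, or_gate):
    """Evaluate the slide's tree; p maps basic-event index (1-8) to probability."""
    p_a = or_gate([p[1], p[2]])
    p_b = and_gate([p[3], p[4], p[5]])
    p_a2 = and_gate([p[7], p[8]])
    p_c = or_gate([p[6], p_a2])
    return or_gate([p_a, p_b, p_c])


# Hypothetical basic-event probabilities.
p = {1: 1e-3, 2: 2e-3, 3: 0.1, 4: 0.05, 5: 0.02, 6: 5e-4, 7: 0.01, 8: 0.03}
approx = top_event(p, or_gate_rare_event)
exact = top_event(p, or_gate_exact)
print(f"rare-event approximation: {approx:.6e}")
print(f"exact (independent events): {exact:.6e}")
```

With probabilities this small, the two results differ by about 0.1%. The gap grows as the basic-event probabilities approach 0.1 or more.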

A Fault Tree Analysis allows a quantitative measure of process risk by bounding the uncertainty associated with complex undesirable events that are logically linked together in a process. If one can assess the frequency of occurrence of the components of each undesirable event, then one can estimate the chance that a fault or failure will occur.

This slide illustrates a simple example of using Fault Tree Analysis to understand the logical structure associated with the error of excluding a buffer during one step in a biopharmaceutical process.
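To bound the uncertainty rather than report a single point estimate, the Monte Carlo approach from the Suggestions for Improvement slide can be applied to the same tree. The sketch below assumes each basic-event probability is lognormally distributed around a hypothetical median with an assumed error factor of 3 (95th percentile / median). It uses the slide's rare-event equations and reports percentiles of the top-event probability.

```python
import math
import random
import statistics

# Hypothetical median estimates for the basic-event probabilities p1..p8.
MEDIANS = {1: 1e-3, 2: 2e-3, 3: 0.1, 4: 0.05, 5: 0.02, 6: 5e-4, 7: 0.01, 8: 0.03}
# Assumed error factor (95th percentile / median) for every lognormal input.
ERROR_FACTOR = 3.0
SIGMA = math.log(ERROR_FACTOR) / 1.645  # lognormal sigma that yields that error factor


def sample_inputs(rng: random.Random) -> dict:
    """Draw one set of basic-event probabilities, capped at 1."""
    return {k: min(1.0, rng.lognormvariate(math.log(m), SIGMA)) for k, m in MEDIANS.items()}


def top_event(p: dict) -> float:
    """Top-event probability from the slide's equations (rare-event approximation)."""
    p_a = p[1] + p[2]
    p_b = p[3] * p[4] * p[5]
    p_c = p[6] + p[7] * p[8]
    return p_a + p_b + p_c


def simulate(n_trials: int = 20000, seed: int = 1) -> dict:
    rng = random.Random(seed)
    results = [top_event(sample_inputs(rng)) for _ in range(n_trials)]
    cuts = statistics.quantiles(results, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return {"p05": cuts[0], "median": cuts[9], "mean": statistics.mean(results), "p95": cuts[-1]}


summary = simulate()
print({k: f"{v:.2e}" for k, v in summary.items()})
```

The spread between the 5th and 95th percentiles is the quantity a single RPN or point estimate hides. The fixed seed makes the run repeatable.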

AIAG Guidelines:

Detection Ranking Criteria

Example of the risk ranking table for Detection, as provided by AIAG.

Answers to Simple Calibration Exercise

Columns: Statement | Answer (T or F) | Confidence that You are Correct (50%, 60%, 70%, 80%, 90%, 100%)

• The first six digits of the constant pi are 3.14139.

• Modern humans first appeared on the earth about 200,000 years ago.

• In 2002, the price of a new desktop computer was under $1,500.

• One meter equals 37.39 inches.

• Napoleon was born on the island of Corsica.

• M is one of the three most commonly used letters.

• Mars is always further away from the Earth than Venus.

• A liter of oil weighs less than a liter of water.

• There is no species of three-humped camel.

• The ancient Romans were conquered by the ancient Greeks.

Answers to the calibration questions provided in the body of this presentation.
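A simple way to score an exercise like this is to compare each participant's average stated confidence with the fraction of answers they got right. A well-calibrated participant has the two roughly equal. The sketch below assumes each response is recorded as a (confidence, was_correct) pair; the ten responses are hypothetical. It also reports a Brier score on correctness, where lower is better.

```python
def calibration_summary(responses):
    """Summarize (stated confidence, was_correct) pairs from a true/false quiz.

    Confidence is the stated probability (0.5 to 1.0) that one's own answer is right.
    """
    if not responses:
        raise ValueError("need at least one response")
    n = len(responses)
    mean_conf = sum(conf for conf, _ in responses) / n
    hit_rate = sum(1 for _, correct in responses if correct) / n
    # Brier score on correctness: mean squared gap between confidence and outcome.
    brier = sum((conf - (1.0 if correct else 0.0)) ** 2 for conf, correct in responses) / n
    return {
        "mean_confidence": mean_conf,
        "hit_rate": hit_rate,
        "overconfidence": mean_conf - hit_rate,  # > 0 means overconfident
        "brier": brier,
    }


# Hypothetical answers from one participant on a ten-question exercise.
responses = [(0.9, True), (0.8, True), (1.0, False), (0.7, True), (0.9, False),
             (0.6, True), (1.0, True), (0.8, False), (0.9, True), (0.7, False)]
print(calibration_summary(responses))
```

In this hypothetical case, the participant claims 83% confidence on average but is right only 60% of the time. That is the prediction overconfidence described in the Executive Summary.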
