11Piero Baraldi
LECTURE 3
MAINTENANCE DECISION MAKING STRATEGIES
(RELIABILITY CENTERED MAINTENANCE)
Piero Baraldi
Politecnico di Milano, Italy
Types of maintenance approaches
Maintenance Intervention
• Planned
  • Scheduled: replacement or repair following a predefined schedule
  • Condition-based: monitor the health of the system and then decide on repair actions based on the degradation level assessed
  • Predictive: predict the Remaining Useful Life (RUL) of the system and then decide on repair actions based on the predicted RUL
• Unplanned
  • Corrective: replacement or repair of failed units
• Maintenance decision making strategies
• Risk-Based Maintenance
• Reliability Centered Maintenance
RELIABILITY-CENTRED MAINTENANCE
Reliability-Centred Maintenance (RCM)
• What is it?
  • A systematic approach for establishing maintenance programs
• Maintenance intervention approaches:
  • Corrective maintenance
  • Planned maintenance (scheduled, condition-based)
• Primary objective
  • Determine the combination of maintenance tasks which will significantly reduce the major contributors to unreliability and maintenance cost in light of the consequences of failures
The RCM Method
• Focus on system functionality
• Find the most important functions of the system
• Avoid and remove maintenance actions which are not strictly necessary
• When a maintenance plan already exists, the result of RCM is usually the elimination of inefficient preventive maintenance tasks
RCM Experience
• A wide range of companies have reported success by using RCM, that is, cost reductions while maintaining or improving operational regularity:
  • Aircraft industry. RCM is standard procedure for the development of new commercial aircraft
  • Military forces (especially in the US)
  • Nuclear power stations (especially in the US and in France)
  • Oil companies. Most of the oil companies in the North Sea are using RCM
  • Commercial shipping
Main Steps of a RCM Analysis
1. Study preparation
2. System selection and definition
3. Functional failure analysis (FFA)
4. Critical item selection
5. FMECA
6. Selection of maintenance actions
7. Determination of maintenance intervals
8. Preventive maintenance comparison analysis
9. In-service data collection and updating
1. Study Preparation
• Form RCM project group (Multi-disciplinarity)
• Define and clarify objectives and scope of work
• Identify requirements, policies, and acceptance criteria with respect to safety and environmental protection
• Provide drawings and process diagrams (P&ID,…)
• Check discrepancies between as-built documentation and the real plant
• Define limitations for the analysis
2. System Selection and Definition
•A standby valve is a maintainable item
•The valve actuator is not a maintainable item
RCM Step 3: Functional Failure Analysis
[Figure: RCM workflow. Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → perform FMECA on MSI → list of the dominant failure modes]
3. Functional Failure Analysis
Objectives:
• Identify and describe the system’s required functions and performance criteria
• Describe input interfaces required for the system to operate
• Identify the ways in which the system might fail to function
Example: functions of a pumping system
• To pump a fluid
• Fluid containment
3. Functional Failure Analysis
• The criticality of functional failures must be judged at plant level and should be ranked with respect to:
  • S = Safety of Personnel
  • E = Environment Impact
  • A = Production Availability
  • C = Material Loss
• The consequences may be ranked as:
  • H = High
  • M = Medium
  • L = Low
  • N = Negligible
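The ranking scheme can be sketched in a few lines of Python. This is only an illustration: the rule that the overall criticality equals the worst level over S, E, A and C is an assumption, and the rankings given to the two pumping-system failures are hypothetical.

```python
from dataclasses import dataclass

# Consequence classes from the slide: Safety, Environment, Availability, Material loss.
CONSEQUENCE_CLASSES = ("S", "E", "A", "C")
# Ranking levels from the slide, ordered from least to most severe.
LEVEL_ORDER = {"N": 0, "L": 1, "M": 2, "H": 3}


@dataclass
class FunctionalFailure:
    description: str
    ranking: dict  # maps each of S, E, A, C to one of H, M, L, N

    def worst_level(self) -> str:
        # Assumption: overall criticality is the worst level over S, E, A, C.
        levels = [self.ranking[c] for c in CONSEQUENCE_CLASSES]
        return max(levels, key=lambda level: LEVEL_ORDER[level])


def rank_by_criticality(failures):
    """Return the functional failures sorted from most to least critical."""
    return sorted(failures, key=lambda f: LEVEL_ORDER[f.worst_level()], reverse=True)


# Hypothetical rankings for the two pumping-system functions.
failures = [
    FunctionalFailure("Loss of fluid containment", {"S": "M", "E": "M", "A": "L", "C": "L"}),
    FunctionalFailure("Fails to pump fluid", {"S": "L", "E": "N", "A": "H", "C": "M"}),
]
ranked = rank_by_criticality(failures)
```

A real study may weight the four classes differently, for example by letting safety dominate regardless of the other rankings.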
RCM Step 4: Critical Item Selection
[Figure: RCM workflow. Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical Item Selection (Functional Significant Items (FSI) + Maintenance Cost Significant Items (MCSI) = Maintenance Significant Items (MSI)) → list of the dominant failure modes]
4. Critical Item Selection
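Critical item selection combines the functional significant items with the maintenance cost significant items to obtain the maintenance significant items (MSI). A minimal sketch of that union follows; the item names are hypothetical, and how each input list is compiled is left to the analyst.

```python
def maintenance_significant_items(functional_significant, cost_significant):
    """MSI = functional significant items + maintenance cost significant items."""
    # An item appearing in both lists is counted once.
    return sorted(set(functional_significant) | set(cost_significant))


# Hypothetical item lists for illustration only.
msi = maintenance_significant_items(
    ["pump", "shutdown valve"],
    ["shutdown valve", "filter"],
)
```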
RCM Step 5: FMECA
[Figure: RCM workflow. Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical item selection (Functional Significant Items (FSI) + Maintenance Cost Significant Items (MCSI) = Maintenance Significant Items (MSI)) → perform FMECA on MSI → list of the dominant failure modes]
5. Failure Modes, Effects and Criticality Analysis
• Objective: identify the dominant failure modes of the MSIs identified in step 4
• This step is performed by filling in an FMECA sheet
FAILURE MODES, EFFECTS AND CRITICALITY ANALYSIS (FMECA)
FMECA
• Qualitative
• Inductive
AIM: Identification of those component failure modes which could fail the item
FMECA: Procedure steps
1. For each item identify its operation modes (start-up, regime, shut-down, maintenance, etc.) and configurations (valves open or closed, pumps on or off, etc.);
2. For each item in each of its operation modes, compile a FMECA table
FMECA TABLE
FUNCTION:
OPERATION MODE:
Columns:
• Component: description
• Failure mode: failure modes relevant for the operational mode indicated
• Effect on functionality: effects on the functionality of the item
• Effects on other items: effects of the failure mode on adjacent items and the surrounding environment
• Effects on plant: effects on the functionality and availability of the entire plant
• Probability*: probability of failure occurrence (sometimes qualitative)
• Severity+: worst potential consequences (qualitative)
• Criticality: criticality rank of the failure mode on the basis of its effects and probability (qualitative estimation of risk)
• Detection methods: methods of detection of the occurrence of the failure event
• Protections and mitigation: protections and measures to avoid the failure occurrence
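One row of the FMECA table can be held in a small record type. The sketch below is illustrative only: the field names are a hypothetical rendering of the column headings, the example rows use made-up entries, and selecting the dominant failure modes by criticality rank is an assumption about how the sheet is used.

```python
from dataclasses import dataclass


@dataclass
class FmecaRow:
    component: str
    failure_mode: str
    effect_on_functionality: str
    effects_on_other_items: str
    effects_on_plant: str
    probability: str            # qualitative class
    severity: str               # worst potential consequences
    criticality: str            # qualitative estimation of risk
    detection_methods: str
    protections_and_mitigation: str


def dominant_failure_modes(rows, dominant_ranks=("High",)):
    """Assumption: a failure mode is dominant when its criticality rank is listed."""
    return [(r.component, r.failure_mode) for r in rows if r.criticality in dominant_ranks]


# Hypothetical rows for illustration only.
rows = [
    FmecaRow("Process shutdown valve", "Close too slowly", "Delayed shutdown",
             "Downstream overpressure", "Unplanned stop", "Occasional", "Critical",
             "High", "Periodic stroke test", "Redundant valve"),
    FmecaRow("Process shutdown valve", "Close too fast", "Pressure surge",
             "Pipe stress", "None", "Remote", "Marginal",
             "Low", "Periodic stroke test", "None"),
]
dominant = dominant_failure_modes(rows)
```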
FMECA TABLE
SUBSYSTEM:
OPERATION MODE:
• Component: Process shutdown valve
• Functions: Shut down the process (designed with a closing time of 10 s)
FMECA TABLE
SUBSYSTEM:
OPERATION MODE:
• Component: Process shutdown valve
• Functions: Shut down the process (designed with a closing time of 10 s)
• Failure modes:
  • Close too slowly (> 14 s)
  • Close too fast (< 6 s)
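The two failure modes of the shutdown valve are defined by limits on the measured closing time, which makes them easy to express as a check. A minimal sketch, using the 6 s and 14 s limits from the slide; the function name and the "No failure" label are hypothetical.

```python
TOO_FAST_LIMIT_S = 6.0    # from the slide: close too fast (< 6 s)
TOO_SLOW_LIMIT_S = 14.0   # from the slide: close too slowly (> 14 s)


def classify_closing_time(closing_time_s: float) -> str:
    """Map a measured closing time in seconds to a failure mode of the valve."""
    if closing_time_s > TOO_SLOW_LIMIT_S:
        return "Close too slowly"
    if closing_time_s < TOO_FAST_LIMIT_S:
        return "Close too fast"
    return "No failure"
```

A closing time equal to the 10 s design value, or anywhere between the two limits, counts as acceptable.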
FMECA TABLE
SUBSYSTEM:
OPERATION MODE:
Columns:
• Component: description
• Failure mode: failure modes relevant for the operational mode indicated
• Effects on other items: effects of the failure mode on adjacent components and the surrounding environment
• Effects on subsystem: effects on the functionality of the subsystem
• Effects on plant: effects on the functionality and availability of the entire plant
• Probability*: probability of failure occurrence (sometimes qualitative)
* Probability classes:
• Very unlikely: once per 1000 years or less often
• Remote: once per 100 years
• Occasional: once per 10 years
• Probable: once per year
• Frequent: once per month or more often
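The probability classes are anchored at reference frequencies, so an estimated failure frequency can be mapped to a class. The sketch below assumes that a frequency belongs to the class whose reference value is nearest on a logarithmic scale; the slide gives only the reference points, not the boundaries between them.

```python
import math

# Reference frequencies in events per year, taken from the class definitions.
PROBABILITY_CLASSES = [
    ("Very unlikely", 1 / 1000),
    ("Remote", 1 / 100),
    ("Occasional", 1 / 10),
    ("Probable", 1.0),
    ("Frequent", 12.0),
]


def probability_class(events_per_year: float) -> str:
    """Return the class whose reference frequency is nearest on a log scale."""
    if events_per_year <= 0:
        raise ValueError("events_per_year must be positive")
    log_f = math.log10(events_per_year)
    # Assumption: class boundaries lie midway (in log terms) between reference points.
    name, _ = min(PROBABILITY_CLASSES, key=lambda c: abs(log_f - math.log10(c[1])))
    return name
```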
FMECA TABLE
SUBSYSTEM:
OPERATION MODE:
Columns:
• Component: description
• Failure mode: failure modes relevant for the operational mode indicated
• Effects on other components: effects of the failure mode on adjacent components and the surrounding environment
• Effects on subsystem: effects on the functionality of the subsystem
• Effects on plant: effects on the functionality and availability of the entire plant
• Probability*: probability of failure occurrence (sometimes qualitative)
• Severity+: worst potential consequences (qualitative)
• Criticality: criticality rank of the failure mode on the basis of its effects and probability (qualitative estimation of risk)
+ Severity classes:
• Safe = no relevant effects
• Marginal = partially degraded system but no damage to humans
• Critical = system damage and damage also to humans. If no protective actions are undertaken, the accident could lead to loss of the system and serious consequences for humans
• Catastrophic = loss of the system and serious consequences for humans
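The criticality rank combines the probability class and the severity class into a qualitative estimate of risk. One possible way to do this is sketched below. The additive score and the Low/Medium/High thresholds are hypothetical; the slides do not prescribe a particular risk matrix.

```python
# Class names from the slides, ordered from least to most severe.
PROBABILITY = ["Very unlikely", "Remote", "Occasional", "Probable", "Frequent"]
SEVERITY = ["Safe", "Marginal", "Critical", "Catastrophic"]


def criticality_rank(probability: str, severity: str) -> str:
    """Qualitative risk rank from a probability class and a severity class."""
    score = PROBABILITY.index(probability) + SEVERITY.index(severity)  # 0 to 7
    # Hypothetical thresholds; a real study would agree on its own matrix.
    if score <= 2:
        return "Low"
    if score <= 4:
        return "Medium"
    return "High"
```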
FMECA Table
SUBSYSTEM:
OPERATION MODE:
Columns:
• Component: description
• Failure mode: failure modes relevant for the operational mode indicated
• Effects on other components: effects of the failure mode on adjacent components and the surrounding environment
• Effects on subsystem: effects on the functionality of the subsystem
• Effects on plant: effects on the functionality and availability of the entire plant
• Probability*: probability of failure occurrence (sometimes qualitative)
• Criticality+: criticality rank of the failure mode on the basis of its effects and probability (qualitative estimation of risk)
• Detection methods: methods of detection of the occurrence of the failure event
• Protections and mitigation: protections and measures to avoid the failure occurrence
• Remarks: remarks and suggestions on the need to consider the failure mode as an accident initiator
Detection of failures:
• Evident failure (detected instantaneously), e.g. spurious stop of a running pump
• Hidden failure (can be detected only during testing of the item), e.g. failure to start of a standby pump
Exercise: Domestic Hot Water
Example Boiler System: FMECA (1)
Component: Pressure relief valve (V04)
• Failure mode: Jammed open
  • Detection methods: Observe at pressure relief valve
  • Effect on whole system: ↑ operation of TS controller; gas flow due to hot water loss
  • Compensating provision and remarks: Shut off water supply, reseal or replace relief valve
  • Criticality class: Safe
  • Failure frequency: Likely
• Failure mode: Jammed closed
  • Detection methods: Manual testing
  • Effect on whole system: No consequences. If combined with other component failure: rupture of container or pipes
  • Compensating provision and remarks: Periodic inspection; replacement
  • Criticality class: Critical
  • Failure frequency: Rare
Component: Gas valve (V03)
• Failure mode: Jammed open
  • Detection methods: Water at faucet too hot; pressure relief valve open (observation)
  • Effect on whole system: Burner continues to operate, pressure relief valve opens
  • Compensating provision and remarks: Open hot water faucet to relieve pressure. Shut off gas supply. Pressure relief valve compensates. IE1
  • Criticality class: Critical
  • Failure frequency: Likely
• Failure mode: Jammed closed
  • Detection methods: Observe at output (water temperature too low)
  • Effect on whole system: Burner ceases to operate
  • Compensating provision and remarks: Replacement
  • Criticality class: Safe
  • Failure frequency: Negligible
Example Boiler System: FMECA (2)
Component: Temperature measuring and comparing device (Tsc01)
• Failure mode: Fail to react to temperature rise above preset level
  • Detection methods: Observe at output (water at faucet too hot); pressure relief valve opens
  • Effect on whole system: Controller, gas valve, burner continue to function “on”. Pressure relief valve opens
  • Compensating provision and remarks: Pressure relief valve compensates. Open hot water faucet to relieve pressure. Shut off gas supply. IE2
  • Criticality class: Critical
  • Failure frequency: Negligible
• Failure mode: Fail to react to temperature drop below preset level
  • Detection methods: Observe at output (water at faucet too cold)
  • Effect on whole system: Controller, gas valve, burner continue to function “off”
  • Compensating provision and remarks: Replacement
  • Criticality class: Safe
  • Failure frequency: Negligible
RCM Steps 3-5
[Figure: RCM workflow. Functional Failure Analysis (identify system functions → identify functional failures → judge functional failure criticality) → Critical item selection (Functional Significant Items (FSI) + Maintenance Cost Significant Items (MCSI) = Maintenance Significant Items (MSI)) → perform FMECA on MSI → list of the dominant failure modes]
6. RCM Decision Logic
Input to the RCM decision logic: the dominant failure modes identified in the previous step (FMECA)
[Figure: RCM decision logic diagram; outcome labels in order: Condition Based Maintenance, Scheduled Maintenance, Scheduled Maintenance, Condition Based Maintenance, Corrective Maintenance]
6. Scheduled On-Condition Task
There are three criteria that must be met for an on-condition task to be applicable:
1. It must be possible to detect reduced failure resistance for a specific failure mode (e.g., degradation index, d)
2. It must be possible to define a potential failure condition that can be detected by an explicit task (e.g., threshold for the detection, d_detection)
3. There must be a reasonably consistent age interval between the time the potential failure is detected (t_detect) and the time of functional failure (t_failure)
[Figure: degradation index d versus time t; d reaches d_detection at t_detect and d_failure at t_failure]
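The third criterion concerns the interval between the time the degradation crosses the detection threshold and the time it reaches the failure level. A minimal sketch of that interval follows, assuming a linear degradation d(t) = rate · t, which is a simplification not stated in the slides; the function names are hypothetical.

```python
def detection_to_failure_interval(d_detection, d_failure, degradation_rate):
    """Time between t_detect and t_failure under a linear degradation model."""
    if not 0 < d_detection < d_failure:
        raise ValueError("expected 0 < d_detection < d_failure")
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive")
    t_detect = d_detection / degradation_rate    # degradation reaches the detection threshold
    t_failure = d_failure / degradation_rate     # degradation reaches the failure level
    return t_failure - t_detect


def inspection_interval_ok(inspection_interval, d_detection, d_failure, degradation_rate):
    # Assumption: at least one inspection must fall between t_detect and t_failure.
    window = detection_to_failure_interval(d_detection, d_failure, degradation_rate)
    return inspection_interval <= window
```

If the interval varies widely from unit to unit, no single inspection period can be relied on, which is why the criterion asks for a reasonably consistent interval.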
6. RCM Decision Logic: Scheduled Overhaul
[Figure: RCM decision logic diagram (input: the dominant failure modes identified in the FMECA)]
6. Scheduled Overhaul
An overhaul task is considered applicable to an item only if the following criteria are met:
1. There must be an identifiable age at which there is a rapid increase in the item's failure rate function.
2. A large proportion of the items must survive to that age.
3. It must be possible to restore the original failure resistance of the item by reworking it.
[Figure: failure rate function λ(t) versus age t]
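The first two criteria can be explored numerically once a lifetime model is chosen. The sketch below assumes a Weibull model, which the slides do not prescribe; the shape and scale parameters and the 0.9 survival threshold are hypothetical.

```python
import math


def weibull_failure_rate(t, shape, scale):
    """Failure rate function of a Weibull lifetime model (an assumed model)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Fraction of items expected to survive to age t."""
    return math.exp(-((t / scale) ** shape))


def enough_items_survive(age, shape, scale, min_fraction=0.9):
    # Hypothetical threshold for "a large proportion of the items".
    return weibull_survival(age, shape, scale) >= min_fraction


# With shape > 1 the failure rate increases with age (wear-out behaviour).
rate_early = weibull_failure_rate(500.0, shape=3.0, scale=1000.0)
rate_late = weibull_failure_rate(1000.0, shape=3.0, scale=1000.0)
```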
6. RCM Decision Logic: Scheduled Replacement
[Figure: RCM decision logic diagram (input: the dominant failure modes identified in the FMECA)]
6. Scheduled replacement
A scheduled replacement task is applicable only under the following circumstances:
1. The item must be subject to a critical failure.
2. The item must be subject to a failure that has major potential consequences.
3. There must be an identifiable age at which the item shows a rapid increase in the failure rate function.
4. A large proportion of the items must survive to that age.
6. RCM Decision Logic: Scheduled Functional Test
[Figure: RCM decision logic diagram (input: the dominant failure modes identified in the FMECA)]
6. Scheduled function test
A scheduled function test task is applicable to an item under the following conditions:
1. The item must be subject to a functional failure that is not evident to the operating crew during the performance of normal duties.
2. The item must be one for which no other type of task is applicable and effective.
6. RCM Decision Logic: Run To Failure
[Figure: RCM decision logic diagram (input: the dominant failure modes identified in the FMECA)]
6. Run to failure
• Run to failure is a deliberate decision, taken because the other tasks are not possible or their economics are less favorable.
• Run to failure maintenance is generally considered to be the most expensive option, and should only be used on low-cost and easy to replace components that are not critical to operations.
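The five task types can be read as a cascade in which run to failure is the fallback. The sketch below assumes the tasks are tried in the order in which the slides present them; the class and field names are hypothetical, and each flag stands for the analyst's judgement on the applicability criteria of that task.

```python
from dataclasses import dataclass


@dataclass
class FailureModeAssessment:
    on_condition_applicable: bool   # the three on-condition criteria are met
    overhaul_applicable: bool       # the scheduled overhaul criteria are met
    replacement_applicable: bool    # the scheduled replacement criteria are met
    hidden_failure: bool            # not evident to the crew during normal duties


def select_maintenance_task(a: FailureModeAssessment) -> str:
    # Assumption: tasks are evaluated in the order used in the slides.
    if a.on_condition_applicable:
        return "Scheduled on-condition task"
    if a.overhaul_applicable:
        return "Scheduled overhaul"
    if a.replacement_applicable:
        return "Scheduled replacement"
    if a.hidden_failure:
        # Applicable only when no other type of task is applicable and effective.
        return "Scheduled function test"
    return "Run to failure"
```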
7. Determination of Maintenance Intervals
• Scheduled maintenance tasks are to be performed at regular intervals. Determining the optimal interval is a very difficult task that has to be based on information about:
  • the failure rate function,
  • the likely consequences and costs of the failure the PM task is supposed to prevent,
  • the cost and risk of the PM task
  • …
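One common way to combine these inputs is an age-replacement cost model: the item is replaced preventively at a fixed age or correctively at failure, and the interval is chosen to minimize the long-run cost per unit time. The sketch below is not the method of the lecture. It assumes a Weibull lifetime and uses hypothetical costs and parameters.

```python
import math


def survival(t, shape, scale):
    """Weibull survival function R(t); the Weibull model is an assumption."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(interval, shape, scale, cost_pm, cost_failure, steps=1000):
    """Long-run cost per unit time when replacing at age `interval` or at failure."""
    h = interval / steps
    # Expected cycle length = integral of R(t) from 0 to interval (trapezoidal rule).
    inner = sum(survival(i * h, shape, scale) for i in range(1, steps))
    r_end = survival(interval, shape, scale)
    expected_cycle = h * (0.5 * survival(0.0, shape, scale) + inner + 0.5 * r_end)
    expected_cost = cost_pm * r_end + cost_failure * (1.0 - r_end)
    return expected_cost / expected_cycle


def best_interval(candidates, shape, scale, cost_pm, cost_failure):
    """Candidate interval with the lowest cost rate."""
    return min(candidates, key=lambda t: cost_rate(t, shape, scale, cost_pm, cost_failure))


candidates = range(100, 2001, 100)
# Hypothetical data: a failure costs ten times as much as a PM task.
wear_out = best_interval(candidates, shape=3.0, scale=1000.0, cost_pm=1.0, cost_failure=10.0)
no_ageing = best_interval(candidates, shape=1.0, scale=1000.0, cost_pm=1.0, cost_failure=10.0)
```

With an increasing failure rate (shape 3) the search returns an interior optimum, whereas with a constant failure rate (shape 1) the longest candidate wins, because replacing an item that does not age brings no benefit.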
7. Determination of Maintenance Intervals
An opinion:
The RCM – Handbook; Naval Sea Systems Command, S9081-AB-GIB-010/MAINT, US Dept. of Defense, Washington DC 20301, 1983: “The best thing you can do if you lack good information about the effect of age on reliability is to pick a periodicity that seems right. Later, you can personally explore the characteristic of the hardware at hand by periodically increasing the periodicity and finding out what happens”
(Maintenance) Model Granularity
• The “granularity” of the model is determined by the problem and the availability / accuracy of the data
Prater's principle of "optimal sloppiness"
[Figure: predictive power versus level of detail]
7. Determination of Maintenance Intervals
• Scheduled maintenance tasks are to be performed at regular intervals. Determining the optimal interval is a very difficult task that has to be based on information about:
  • the failure rate function,
  • the likely consequences and costs of the failure the PM task is supposed to prevent,
  • the cost and risk of the PM task
  • …
• In practice the various maintenance tasks have to be grouped into maintenance packages that are carried out at the same time, or in a specific sequence
The maintenance intervals can therefore not be optimized for each single item. The whole maintenance package has, at least to some degree, to be treated as an entity
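A simple way to form packages is to align every task to a multiple of a common base interval. The sketch below rounds each ideal interval down to the nearest multiple, so that no task is performed later than its ideal interval. The rounding rule, the task names and the numbers are all hypothetical.

```python
from collections import defaultdict


def group_into_packages(ideal_intervals, base_interval):
    """Group tasks by their ideal interval rounded down to a multiple of base_interval."""
    packages = defaultdict(list)
    for task, interval in ideal_intervals.items():
        # Assumption: a task is never scheduled less often than its ideal interval,
        # except that it cannot be scheduled more often than the base interval.
        multiples = max(1, int(interval // base_interval))
        packages[multiples * base_interval].append(task)
    return dict(sorted(packages.items()))


# Hypothetical ideal intervals in operating hours.
packages = group_into_packages(
    {"lubricate": 700, "inspect seals": 1300, "change filter": 1100, "overhaul": 4500},
    base_interval=500,
)
```

The price of grouping is that some tasks are carried out earlier than their individually optimal interval, which is the sense in which the package has to be treated as an entity.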
8. Planned Maintenance (PM) Comparison Analysis
Each maintenance task selected must meet two requirements:
1. It must be applicable:
  • it can prevent a failure,
  • reduce the probability of the occurrence of a failure to an acceptable level
  • reduce the impact of a failure
2. It must be cost-effective (i.e., the task must not cost more than the failures it is going to prevent)
[Figure: Cost of Failure versus Cost of PM]
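The cost-effectiveness requirement amounts to comparing the cost of the PM task with the cost of the failures it is expected to prevent over the same period. A minimal sketch follows; the parameter names and the yearly basis are assumptions.

```python
def pm_is_cost_effective(pm_cost_per_year, cost_per_failure, failures_prevented_per_year):
    """True when the PM task does not cost more than the failures it prevents."""
    prevented_failure_cost = cost_per_failure * failures_prevented_per_year
    return pm_cost_per_year <= prevented_failure_cost


# Hypothetical figures: the task costs 2000 per year and prevents 0.5 failures
# per year, each of which would cost 10000.
worth_doing = pm_is_cost_effective(2000.0, 10000.0, 0.5)
```

Both cost figures should include the risk-related items as well as direct expenses, such as maintenance-induced failures on the PM side and lost production on the failure side.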
8. PM Comparison Analysis: ‘Cost’ of a PM Task
• The risk/cost related to maintenance induced failures
• The risk the maintenance personnel is exposed to during the task
• The risk of increasing the likelihood of failure of another item while the maintained item is out of service
• The use and cost of physical resources
• The unavailability of physical resources elsewhere while in use on this task
• Production unavailability during maintenance
• Unavailability of protective functions during maintenance
8. PM Comparison Analysis: ‘Cost’ of a Failure
• The consequences of the failure in terms of:
• loss of production
• possible violation of laws or regulations,
• reduction in plant or personnel safety
• damage to other equipment
• The consequences of not performing the PM task even if a failure does not occur (e.g., loss of warranty)
• Increased premiums for emergency repairs (such as overtime, expediting costs, or high replacement power cost)
Updating Process
• Short-term interval adjustments
• Medium-term task evaluation
• Long-term revision of the initial strategy
[Figure: Maintenance Reference Plan, goals, System-Maintenance activities, results]
RCM Comments
• General issues: maintenance people often rely on manufacturers' recommendations and end up with overly frequent maintenance
• Determining maintenance intervals is a difficult task that must be dynamically based on the information available at the time, e.g. the knowledge of the failure rate value, the probable consequences and costs of the failure that PM is supposed to prevent, the costs and risks of PM
• Most of the models require information that is not available. This calls for expert opinion elicitation, properly supported by sensitivity and uncertainty analysis