March 2013
Misconceptions of
Maintenance and
Reliability A Biopharmaceutical Industry Survival Guide
BPOG Reliability Team
Authors and Reviewers
Authors:
Gerard Clarke, Reliability Engineer at Pfizer
James Baillargeon, Instruments and Control Manager at MedImmune
Paul Boles, Senior Technical Manager GMP Manufacturing at Genentech
Rob Christman, Associate Director Global Reliability Engineering at Genzyme
Steve Jones, Director at BioPhorum Operations Group
Reviewers:
Ken Trotta, Maintenance and Reliability Engineering at Bayer Healthcare
Keith Scruggs, Director of Engineering at Baxter Healthcare
BioPhorum Operations Group
At the BioPhorum Operations Group our mission is to CONNECT biopharmaceutical
organizations, provide an effective environment for the community to COLLABORATE on
shared issues and ACCELERATE improvement across the industry. BPOG currently consists of
over 500 active participants from 18 member companies: Abbvie; Amgen; Baxter; Bayer;
Biogen Idec; BMS; Gallus Pharmaceuticals; Genentech; Genzyme; GSK; Janssen (J&J); Lonza;
MedImmune; Merck Inc; Novartis; Pfizer; Regeneron; Sanofi. Find out more
at www.biophorum.com
Foreword
Anyone who cares to run a Google search
on ‘Maintenance Excellence and
Reliability Engineering’ will get an
indication of how prominent the subject
has become within the corporate agenda
(more than six million results). This is
particularly true of the biopharmaceutical
industry – one of the most heavily
regulated – where such concepts are
becoming more widely adopted in
attempts to reduce risk and reduce costs.
Unfortunately, all is not smooth sailing.
Many techniques are still in their infancy
and, while leaders are pressing for wider
adoption, organizations are often slow to
adopt because many of the new concepts
are counter-cultural. Reliability Engineers
spearheading the change find themselves
constantly challenging existing mindsets,
having to educate the non-believers by
introducing sound reliability concepts.
Across a large organization this becomes a
difficult and time-consuming task.
In this brief Survival Guide, we go back to
basics, focus on common misconceptions
and introduce some of the key
concepts behind Reliability Engineering –
very much in layman’s language. So, if you
are not sure about the difference
between random failure and a bathtub
curve, preventive and corrective
maintenance, or the reasons why
increasing the frequency of maintenance
can be counterproductive, please read on.
Concise enough to be consumed in a short
commute, we believe it will form a
valuable discussion document. We do
hope you find it useful.
BPOG - BioPhorum Operations Group
“We start with a simple
definition of preventative
maintenance, and what we
mean by failure.”
Misconception –
preventative maintenance
can prevent all failures
Failure is an unfortunate fact of life.
Systems have a natural tendency to break
and wear out, and the components of any
asset are subject to the effects of wear
and tear. Eventually, components fail.
It is a common misconception that simply
because preventive maintenance is
employed, or the frequency of
maintenance is increased, the risk of
failure can be eliminated.
While preventive maintenance can reduce
the risk of failure, so long as the failure
mode exists (the way in which a system,
subsystem or component fails to meet
design or performance requirements), the
risk of failure remains.
To quantify failure rates, engineers
use the concept of ‘mean time between
failures’ (MTBF): the expected time
between inherent failures of a system
during operation. However, even when
MTBF is known, there is still uncertainty
about when a failure will occur.
Reliability is about managing the
probability of failure over time.
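To make the point about uncertainty concrete, here is a minimal sketch in Python. It assumes a constant failure rate (the exponential model), which is an assumption of this example and not something the guide prescribes. The function name and the 1,000-hour MTBF are hypothetical.

```python
import math

def survival_probability(hours: float, mtbf_hours: float) -> float:
    """Probability of running `hours` without failure, assuming a
    constant failure rate of 1 / MTBF (exponential model)."""
    return math.exp(-hours / mtbf_hours)

# Hypothetical asset with an MTBF of 1,000 operating hours.
mtbf = 1000.0
for hours in (100, 500, 1000):
    chance = survival_probability(hours, mtbf)
    print(f"{hours:>5} h: {chance:.0%} chance of no failure")
```

Under this assumption, only about 37% of such assets would still be running when the MTBF itself is reached, so knowing the MTBF does not tell you when a particular failure will occur.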
Look more closely at probabilities of
failure in the real world and it is possible
to construct a chart like the one shown
below, first suggested by John Moubray¹.
Note how, contrary to popular belief, only
a small percentage of equipment ages or
wears out at the end of its expected life.
In practice, most failures occur in early life
- infant mortality - or completely
randomly at any point in its life.
By understanding the failure mode,
appropriate maintenance strategies can
be established to help detect, prevent or
mitigate failure and improve the reliability
of a component. Nevertheless, 100%
reliability can never be guaranteed in
reality so long as the failure mode still
exists.
1 Reliability Centered Maintenance, 2nd edition,
John Moubray, Industrial Press Inc., 1997
[Chart: patterns of failure probability over equipment life. Source:
Reliability Centered Maintenance, 2nd edition, John Moubray,
Industrial Press Inc., 1997]
“Other industries have
improved the way maintenance
is delivered using predictive
and condition-based
techniques.”
Misconception – all
preventative maintenance
is time-based
Historically the biopharmaceutical
industry has adopted mainly time-based
maintenance but, in fact, other more
effective strategies can be used.
Increasingly the industry is improving the
way maintenance is delivered by using
predictive and condition-based
techniques. These techniques are used
extensively in the aerospace and
automotive industries to great effect.
Predictive and condition-based techniques
can be used to anticipate failure ahead of
time, enabling maintenance to perform
repairs in a planned and scheduled
manner, well before failure.
In summary, preventative maintenance
can be divided into three categories:
1. Time-based or age-related. This
type of preventive maintenance applies
where the failure rate increases over time,
so the component is replaced ahead of
expected life to prevent failure in service.
The earlier chart, however, showed that
this pattern applies only to a small
percentage of failures in the real world.
Clearly, it does not make sense for this to
be our primary approach to preventative
maintenance.
2. Run-based or usage-related. This
type of preventive maintenance applies
where the failure rate increases with
usage.
This strategy is a development of the
time-based or age-related approach. If a
component deteriorates only when in
service (i.e. no deterioration over time if
not used), then maintenance based on
usage is appropriate.
Examples falling into this category are
valve diaphragms replaced after a
predetermined number of process cycles
or stressed components needing
replacement after a number of duty
cycles.
3. Predictive or condition-based.
This type of preventive maintenance
applies to situations in which failures
occur randomly, where neither time nor
usage provides a good early indicator of
failure.
As shown in the earlier chart, this is the
most common pattern of failure and, to
be truly effective, preventative
maintenance programs should reflect this
fact.
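The three categories above can be read as a simple decision rule. The Python sketch below is illustrative only: the enum, the function name and the choice to let usage take precedence when both flags are set are assumptions of this example, not part of the guide.

```python
from enum import Enum

class PMCategory(Enum):
    TIME_BASED = "time-based or age-related"
    RUN_BASED = "run-based or usage-related"
    CONDITION_BASED = "predictive or condition-based"

def select_pm_category(rises_with_age: bool, rises_with_usage: bool) -> PMCategory:
    """Map an observed failure pattern to one of the three categories."""
    if rises_with_usage:
        # Deterioration happens in service, so count cycles or run-hours.
        return PMCategory.RUN_BASED
    if rises_with_age:
        # Failure rate climbs with age, so replace ahead of expected life.
        return PMCategory.TIME_BASED
    # Neither time nor usage predicts failure: monitor condition instead.
    return PMCategory.CONDITION_BASED

# Example: a valve diaphragm that wears only with process cycles.
print(select_pm_category(rises_with_age=False, rises_with_usage=True).value)
```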
In the biopharmaceutical industry,
vibration monitoring of bearings, motors
and gearboxes in plant and equipment is
increasingly common practice, where an
increase in detected vibration can be used
to indicate impending failure. Such systems provide a
step increase in reliability compared to
invasive time-based replacement.
Similarly, thermography can be used to
monitor the condition of electrical
controls to signal early onset of failure.
On the shop floor, visual inspections
carried out by operators provide early
signals as part of a structured Total
Productive Maintenance system.
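As a rough illustration of how a rise in vibration could trigger planned work, here is a minimal Python sketch. The baseline window, the 1.5x alert ratio, the units and the readings are all hypothetical; real alarm limits would come from the equipment vendor or a site vibration programme.

```python
from statistics import mean

def vibration_alert(readings_mm_s: list[float], baseline_count: int = 5,
                    alert_ratio: float = 1.5) -> bool:
    """Return True if the latest reading is at least `alert_ratio` times
    the average of the first `baseline_count` readings."""
    if len(readings_mm_s) <= baseline_count:
        return False  # not enough history to judge a trend
    baseline = mean(readings_mm_s[:baseline_count])
    return readings_mm_s[-1] >= alert_ratio * baseline

healthy = [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 2.2]
degrading = [2.0, 2.1, 1.9, 2.0, 2.0, 2.6, 3.4]
print(vibration_alert(healthy))    # False
print(vibration_alert(degrading))  # True
```

The point of the design is that the alert fires while the asset is still running, leaving time to plan and schedule the repair.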
Misconception – more
frequent maintenance
leads to increased
reliability
The idea that increasing frequency of
invasive maintenance leads to better
reliability is a fallacy.
In many situations, opening up a system
or removing a functioning asset from
production in order to perform invasive
maintenance may actually increase the
chances of failure.
The concept of iatrogenic (technician-caused)
failure, also referred to as ‘infant
mortality’, speaks to the danger of
introducing new failure modes to an
asset by performing invasive tasks on
components that were working
acceptably. The technician inadvertently
disturbs the operational integrity of the
equipment and leaves those components
in a compromised state.
Shortening maintenance intervals can also
introduce a high degree of waste when
the costs of extra wrench-time, added
materials and diverted resources are taken
into account. More frequent
maintenance also reduces the availability
of equipment for production.
Unfortunately, many preventive
maintenance (PM) programmes set
maintenance frequencies using generic
industry practices without consideration
of the asset and the operating
environment. Worse still, time-based
intervals are often arbitrarily tightened in
a knee-jerk response to failures and
deviations.
Such actions can, in fact, worsen the
situation by inadvertently introducing
infant mortality, leading to poor reliability.
A far more effective approach is to
understand the failure modes and develop
specific strategies to address them.
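A deliberately simplified Python sketch shows why tighter intervals can make things worse. It assumes the underlying failures are random, so invasive PM does not reduce them, and that each intervention carries a small, independent chance of introducing a failure. The model, the function name and all numbers are hypothetical.

```python
def expected_failures_per_year(random_failures_per_year: float,
                               interventions_per_year: int,
                               induced_failure_prob: float) -> float:
    """Expected yearly failures when invasive PM cannot lower the random
    failure rate but each intervention may introduce a new failure."""
    return random_failures_per_year + interventions_per_year * induced_failure_prob

# Hypothetical asset: 0.2 random failures a year, 5% induced-failure risk per PM.
for n in (1, 4, 12):
    total = expected_failures_per_year(0.2, n, 0.05)
    print(f"{n:>2} interventions/year -> {total:.2f} expected failures/year")
```

In this toy model, every extra intervention adds risk without removing any, which is the pattern the text warns about when intervals are tightened as a reflex.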
Misconception – asset
failure means the
maintenance strategy is
ineffective
Clearly, asset failure could signal a failing
in the maintenance strategy - but not
necessarily. Further analysis and
investigation are required before a
maintenance strategy is deemed to be
ineffective.
The effectiveness of a maintenance
strategy should be evaluated against
targets such as quality, health & safety,
environmental integrity, production
output, operating costs, etc.
For example:
Quality: Is the maintenance strategy
effective in meeting targets such as
equipment deviations?
Operating Costs: Is the maintenance
strategy effective in meeting targets such
as the MRO budget?
Production Output: Is the maintenance
strategy effective in meeting targets such
as units/month?
Remember, a preventive maintenance
strategy cannot completely eliminate the
risk of failure. Failure with a low
probability of occurrence may still occur,
even under the most robust maintenance
strategy.
An effective maintenance strategy
manages asset failure to a tolerable level of risk,
aligned with the business objectives. If
you are meeting your objectives, then the
asset maintenance strategy is effective
with respect to your business objectives.
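Evaluating a strategy against business targets, as described above, could be sketched in Python as follows. The target names, values and the all-targets-met rule are hypothetical illustrations, not figures or criteria from the guide.

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    actual: float
    target: float
    higher_is_better: bool

    def met(self) -> bool:
        """Compare the actual value with the target in the right direction."""
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target

def strategy_is_effective(items: list[Target]) -> bool:
    """Treat the strategy as effective only if every target is met."""
    return all(t.met() for t in items)

# Hypothetical annual figures for one asset group.
targets = [
    Target("equipment deviations per year", actual=3, target=5, higher_is_better=False),
    Target("MRO spend (k$)", actual=480, target=500, higher_is_better=False),
    Target("units per month", actual=1020, target=1000, higher_is_better=True),
]
for t in targets:
    print(f"{t.name}: {'met' if t.met() else 'missed'}")
print("effective:", strategy_is_effective(targets))
```

Note that a single asset failure does not appear anywhere in this check: the judgement is made against the targets, not against the absence of failure.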
Managing failure modes
The purpose of an asset maintenance
strategy is to identify those failure modes
which will be managed through
preventive maintenance and those which
will be managed through corrective
maintenance.
If a failure occurs and its failure mode is
not adequately addressed by the
maintenance strategy, the strategy may
need to be revised to address it.
If, however, the failure mode is already
addressed by the maintenance strategy, then a
review of the strategy may be necessary.
Does an effective maintenance strategy
equate to 100% reliability?
Despite a perfect world goal of 100%
reliability, all assets have failure modes
and all failure modes have a failure rate
and a failure pattern.
This is not to say that we give up trying to
improve reliability. On the contrary,
periodic maintenance effectiveness
reviews are used to identify root causes of
recurring failures and to drive continuous
improvements in reliability that are
quantifiable to the business.
“In practice we find that less
than 5% of maintenance tasks
are critical to product quality,
the rest are there for business
reasons.”
Misconception – all
biopharmaceutical
maintenance is critical
If failure impacts product quality, then yes
it’s critical, but if it doesn’t have product
impact, then it needn’t be. In practice we
find that only a small percentage of
maintenance tasks are critical to product
quality, the rest being there for business
reasons.
The ISPE Good Practice Guide on
Maintenance² states, “The maintenance
program should help to ensure that the
equipment is continually maintained in a
qualified state and is suitable for intended
use.”
The primary goal of maintenance in the
biopharma industry is to reduce the risk of
a failure that may impact drug product
quality. In this way, the qualified state of
the equipment is preserved through
planned activities with expected
outcomes.
2 ISPE Good Practice Guide on Maintenance,
March 2009
Not all functional failures of an asset,
however, impact drug quality.
Differentiating between those failure
modes that do and those that do not
enables effort to be focused where it is
needed most.
Having a maintenance strategy of run-to-
failure is perfectly acceptable when a
failure mode cannot be detected and the
equipment is deemed to be non-critical.
Conversely, monitoring the condition of
critical equipment provides constant
assurance that the equipment is safely
operating in its qualified state, whilst
providing early signals of wear that may
lead to a failure that affects product
quality.
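The two cases the text spells out can be written as a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and the fallback for combinations the guide does not discuss is a placeholder assumption.

```python
def suggested_approach(critical: bool, failure_detectable: bool) -> str:
    """Return the approach named in the text for the two cases it covers."""
    if not critical and not failure_detectable:
        # Non-critical equipment with an undetectable failure mode.
        return "run-to-failure"
    if critical and failure_detectable:
        # Critical equipment where wear can be monitored.
        return "condition monitoring"
    # Not covered by the text; placeholder for a risk-based assessment.
    return "assess case by case"

print(suggested_approach(critical=False, failure_detectable=False))
print(suggested_approach(critical=True, failure_detectable=True))
```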
Managing failure and risk in a complex
biopharmaceutical plant is a complicated
task that can best be handled using
risk-based maintenance methodologies, such
as Reliability Centered Maintenance
(RCM). Reliability-centered maintenance
is a process used to determine what must
be done to ensure that any physical asset
continues to do what its users want it to
do in its present operating context³.
3 Reliability Centered Maintenance, 2nd edition,
John Moubray, Industrial Press Inc., 1997
Misconception – any
deviation from a PM
schedule will lead to
equipment not fit for use
This is another misconception, and perhaps
the most dangerous.
Let’s begin with the most demanding
case: critical equipment. Performing
critical maintenance outside the optimum
time interval may increase the risk of a
functional failure that impacts the
qualified state.
However, execution of PM outside of the
optimum interval does not in itself cause
the asset to be no longer qualified or
suitable for intended use, unless the
qualified state or suitability for use is
dependent upon the execution of the PM
task at a specific point.
In the majority of circumstances, this
condition does not apply.
To further illustrate this important point,
consider this example. Forgetting to check
the brake fluid level in your vehicle does
not mean the brakes are about to fail.
What it does mean is a higher risk that the
brake fluid might be low, which in turn
might cause a brake failure. The increased
level of risk will depend upon the
condition of the brake system.
So, apart from a very small number of
specific exceptions, deviation from the PM
schedule increases risk, but does not
directly cause the asset to be no longer fit
for use. This is not to say that PM tasks
are unimportant; they are important
because they reduce risk and save money.
If an organization falls behind with its
maintenance schedule, it is important to
prioritize work so that the bigger risks are
still addressed and slippage is allowed
only on the lower risk items.
Schedule-adherence at an aggregate level,
therefore, provides a fantastic leading
indicator of the risks that the business is
running. When organizations fall behind,
the most important priority is to clear the
backlog to get back on track.
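Both ideas, an aggregate schedule-adherence figure and a backlog cleared in risk order, can be sketched in a few lines of Python. The task names, the 1 to 10 risk scale and the tie-break on days overdue are hypothetical choices made for this example.

```python
from dataclasses import dataclass

@dataclass
class PMTask:
    name: str
    risk_score: int    # hypothetical scale: higher means bigger risk
    days_overdue: int  # 0 means the task is on schedule

def schedule_adherence(tasks: list[PMTask]) -> float:
    """Fraction of tasks that are not overdue."""
    if not tasks:
        return 1.0
    on_time = sum(1 for t in tasks if t.days_overdue == 0)
    return on_time / len(tasks)

def backlog_by_risk(tasks: list[PMTask]) -> list[PMTask]:
    """Overdue tasks, biggest risk first; ties go to the most overdue."""
    overdue = [t for t in tasks if t.days_overdue > 0]
    return sorted(overdue, key=lambda t: (-t.risk_score, -t.days_overdue))

tasks = [
    PMTask("WFI pump seal inspection", risk_score=9, days_overdue=4),
    PMTask("Office HVAC filter change", risk_score=2, days_overdue=20),
    PMTask("Bioreactor agitator vibration check", risk_score=8, days_overdue=0),
    PMTask("Autoclave door gasket check", risk_score=7, days_overdue=10),
]
print(f"Schedule adherence: {schedule_adherence(tasks):.0%}")
for t in backlog_by_risk(tasks):
    print(t.name)
```

Sorting on risk first means the long-overdue but low-risk filter change is allowed to slip, while the higher-risk items are cleared first.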
It is paradoxical that, at this point, many
organizations choose instead to burden
technicians with unnecessary paperwork,
which in turn may cause further delay and
increase the very real risks that they are
busily documenting.
Conclusions
We hope that you find this survival guide
useful. We have deliberately set out to
make this document provocative to gain
your attention and create dialogue. We
encourage you to share it with your
colleagues to enable your organization to
more quickly recognize these
misconceptions whenever they arise and
to stay focused on the actions that will
accelerate the journey towards
maintenance effectiveness.