Prof. Enrico Zio
Reliability of Simple Systems
Prof. Enrico Zio
Politecnico di MilanoDipartimento di Energia
Prof. Enrico Zio
Background
n Definition under IEC 50 (191):n Summarising expression to describe availability and its influencing factors,
reliability and maintainability.n Note: Dependability is only used for general descriptions of non-quantitative
character.
n Broad definition:n Dependability is the methodical approach of estimating, analysing and
avoiding failures in the future.
RAMDependability
Reliability Availability Maintainability
MTBF R(t)
MTBFMTBF + MDTA=
MDT MTTR
Prof. Enrico Zio
Index
1. Probability theory: basic definitions
2. Reliability analysis: theory and examples
Prof. Enrico Zio
Index
1. Probability theory: basic definitions
2. Reliability analysis: theory and examples
Prof. Enrico Zio
Introduction: reliability and availability
• Reliability and availability: important performance parameters of a system, with respect to its ability to fulfill the required mission in a given period of time
• Two different system types:Ø Systems which must satisfy a specified mission within an
assigned period of time: reliability quantifies the ability to achieve the desired objective without failures
Ø Systems maintained: availability quantifies the ability to fulfill the assigned mission at any specific moment of the life time
Maintainablilty:Ability of a unit, under given circumstances, to maintain or respectively to reset its actualstate so that the desired requirements are met, provided that maintenance is carried outusing the specified resources and stated procedures.
Prof. Enrico Zio
Basic definitions (1)
Reliability is the ability of an item to perform a required function under stated conditions for a stated period of time.
Therefore….the failure is an eventwhereby a unit or componentunder consideration is no longercapable of fulfilling a required functionunder stated conditionsfor a stated the period of time.
Prof. Enrico Zio
Basic definitions (2)
The required function includes the specification of satisfactory operation as well as unsatisfactory operation. For a complex system, unsatisfactory operation may not be the same as failure.
The stated conditions are the total physical environmental including mechanical, thermal, and electrical conditions.
The stated period of time is the time during which satisfactory operation is desired, commonly referred to as service life.
Prof. Enrico Zio
Basic definitions (3)
• T = Time to failure of a component (random variable)Ø cdf = FT(t) = probability of failure before time t: P(T<t)
Ø pdf = fT(t) = probability density function at time t:
fT(t)dt = P(t<T<t+dt)
Ø ccdf = R(t) = 1- FT(t) = reliability at time t: P(T>t)
Ø hT(t) = hazard function or failure rate at time t
)()(
)()()|()(
tRdttf
tTPdttTtPtTdttTtPdtth T
T =>+£<
=>+£<=
Prof. Enrico Zio
Hazard function: the bath-tub curve
0 1 2 3 4 5 6 7 8 9 10 11 12 13Operating time t in years
Failu
re ra
te l
Three types of failures:
- Early failures (Infant mortality), caused by errors in design, defects in manufacturing, etc..
- Wear-out failures, caused by ageing.
(Both types are systematic failures and could be prevented by improvement in design, manufacturing, maintenance).
- Random failure: appear spontaneously and purely by chance.
Early failures
Characteristic: The failure rate is initially high, but rapidly decreases.
These types of failure rates result in the traditional bathtub curve
l total
Bathtub curve model of failure rate (Example)
Wear-out failure
Characteristic: The failure rate increases monotonically.
Random failure
Characteristic: Constant failure rate during the whole lifetime of the units.
Prof. Enrico Zio
Hazard function: the bath-tub curve
• The hazard function shows three distinct phases:i. Decreasing - infant mortality or burn in period
ii. Constant - useful life
iii. Increasing - ageing
(i) (iii)(ii)
lThe unit of the failure rate l is failure/time, often indicated as FIT (Failure in Time).e.g. 1 FIT = 1 Failure per 109h in FRU(Field Replaceable Unit) employed in the railway industry
Prof. Enrico Zio
In electronic hardware, particularly computer systems, a field-replaceable unit (FRU) is a circuit board or part that can be quicklyand easily removed and replaced by the user or by a technicianwithout having to send the entire product or system to a repairfacility. The defective unit is found by standard troubleshootingprocedures, removed, and either discarded or shipped back to the factory for repair. The new unit is installed directly in place of the defective one.
The FRU scheme is often the most cost-effective way to maintaincomplex systems, and is a major motivating factor behind the evolution of modular construction. When backed up by good parts availability, knowledgeable technical support, and reader-friendlydocumentation, this approach can minimize system downtime and optimize reliability.
Field-Replaceable Unit (FRU)
Prof. Enrico Zio
The exponential distribution (1)
• hT(t) = l, t ³ 0
• Only distribution characterized by a constant hazard rate
• Widely used in reliability practice to describe the constant part of the bath-tub curve
tT etTPtF l--=£= 1)()(
( ) 00 0
tTf t e t
t
ll -ì = ³í
= <îFT(t)
1
t t
fT(t)l
fT(t) =λe-λt
Prof. Enrico Zio
The exponential distribution (2)
• The expected value and variance of the distribution are:
• Failure process is memoryless
21 1[ ] ; [ ]E T MTTF Var Tl l
= = =
1 22 1
1
1 2 2 11 2 1
1 1
( )
( ) ( ) ( )( | )( ) 1 ( )
1
T T
Tt t
t tt
P t T t F t F tP t T t T tP T t F t
e e ee
l ll
l
- -- -
-
< < -< < > = = =
> -
-= = -
Prof. Enrico Zio
The exponential distribution (3)IEC 61709: Electronic components –Reliability –Reference conditions for failure rates and stress models for conversion
The failure rate under given operating conditions is calculated as follows:
TIUfRe p×p×p×l=lwherelref is the failure rate under reference conditions;pU is the voltage dependence factor;pI is the current dependence factor;pT is the temperature dependence factor.These Parameter are listed, e.g in the SN29000 library!
Reference conditions for climatic and mechanical stresses
Type of stress Reference condition 1)
qamb, ref = 40 °C
Climatic conditions Class 3K3 as per IEC 721-3-3Mechanical stress Class 3M3 as per IEC 721-3-3Special stresses 3) NoneFor details of notes (-1, -2,-3) please refer to IEC 61709
Ambient temperature 2)
The definitions, reference conditions and conversion models used in the IEC 61709 fullycorrespond with the already existing SIEMENS standard SN 29500 method.
Prof. Enrico Zio
The Weibull distribution
• In practice, the age of a component influences its failure process so that the hazard rate does not remain constant throughout the lifetime
altT etTPtF --=£= 1)()(
1( ) 00 0
tTf t t e t
t
aa lla - -ì = ³ïí
= <ïî
2
21 1 1 2 1[ ] 1 ; [ ] 1 1E T Var Tl a l a a
æ öæ ö æ ö æ ö= G + = G + -G +ç ÷ ç ÷ ç ÷ç ÷è ø è ø è øè ø
ò¥ -- >=G0
1 0)( kdxexk xk
Prof. Enrico Zio
Index
1. Probability theory: basic definitions
2. Reliability analysis: theory and examples
Prof. Enrico Zio
Definition of the problem
• Objective:Ø Computation of the system reliability R(t)
• Hypotheses:Ø N = number of system components
Ø The components’ reliabilities Ri(t), i = 1, 2, …, N are known
Ø The system configuration is known
Prof. Enrico Zio
Series system
EPROM Capa.
Resi.
Fan
Trans. Diode
DC/DC
Optic. Module
IC
PINs
LED
Solder joints
ASIC
field-replaceable HW-unit (Example)
EPROM
Optic. Module
PINs
LED
IC IC IC
Trans. DiodeDiode
Trans.Trans.
Fan
Capa. Capa. Capa. Capa. Capa.
Resi.Resi. Resi.Resi. Resi.Resi. Resi.
ASIC
Resistors 1 FIT 8 16 8 FITCapacitors 2 FIT 6 12 12 FITDiodes 8 FIT 3 6 24 FITTransistors 15 FIT 4 12 60 FITICs 25 FIT 4 64 100 FITEPROM 100 FIT 2 64 200 FITDC/DC 40 FIT 2 28 80 FITASIC 250 FIT 2 1016 500 FITFAN 150 FIT 2 10 300 FITOptical Module 800 FIT 2 32 1.600 FITSolder joints 0,1 FIT 1260 126 FIT
3.010 FIT
Failure rates ofl Components
Sum offailure rates
Name ofComponents
No.ofPins
No.ofComp.
Example: tot. failure rate of the HW-unit l unit =
Prof. Enrico Zio
Series system
• All components must function for the system to function
• For N exponential components:
)()(1
tRtRN
iiÕ
=
=
etR tl-=)([ ]
1
1
N
ii
E T
l l
l
=
ì=ïï
íï =ïî
å System failure rate
MTTF
Prof. Enrico Zio
Parallel system
• All components must fail for the system to fail
• For N exponential components:
[ ]1
( ) 1 1 ( )N
ii
R t tR=
= - -Õ
( )1
1 1N
ti
i
R t e l-
=
= - é - ùë ûÕ( )
1
1 1 1
2 11
1 1 1
1
1 1
1 11
N N N
i i j ii i j
N N NN
Ni j i k j i j k
ii
MTTFl l l
l l l l
-
= = = +
- --
= = + = +
=
ì= - +ï +é ùë ûïï
í+ - + -ï + +é ùë ûï
ïî
å åå
åå åå
!
Prof. Enrico Zio
Parallel system: an example
• Two exponential units with failure rates l1 and l2
• For N identical elements, compare series and parallel
! ! ( ) ( ) ( )1 2 1 2 1 2 1 2( ) ( )1 2
1 2
( ) 1 (1 )(1 ) ,t t t t t t tR t t t seriese e e e e e eR RR R
ll l l l l l l l- - - - - + - - += - - - = + - > > =
[ ]1 2 1 2
1 1 1MTTFl l l l
= + -+
1
1
1
N
n parallel MTTF
n
series MTTFN
l
l
=
ü= ïï
ýï= ïþ
å1
1 1N
series paralleln
MTTF MTTFN n
l l=
× = < = ×å
Prof. Enrico Zio
r-out-of-N system
• N identical components function in parallel but only r are needed (parallel system: r = 1)
• For N identical exponential components:
( )eekN
tR t kNktN
rk
ll - --
=
-÷÷ø
öççè
æ=å 1)( 1N
k rMTTF
kl=
=å
Prof. Enrico Zio
Standby system
• One component is functioning and when it fails it is replaced immediately by another component (sequential operation of one component at a time)
• The system configuration is time-dependent Þ the story of the system from t = 0 must be considered
• Two types of standby:Ø Cold: the standby unit cannot fail until it is switched on
Ø Hot: the standby unit can fail also while in standby
Prof. Enrico Zio
Cold standby (1)
• Since the components are operated sequentially, the
system fails at time , which is a random
variable sum of N independent random variables
• Convolution theorem
Example: 2 components
å= =Ni iTT 1
[ ] ò==
ò -==+=Þþýü
¥×-
¥
¥-
0
21212122
11
)()(~)(
)()()(*)()(,)(,)(,
dxxfesfxfL
dxxtfxftftftfTTTtfTtfT
xs
TTTTTT
T
[ ] )(~)(~)(*)()(~ 2121 sfsftftfLsf TTTTT ==
Prof. Enrico Zio
Cold standby (2)
Example: N components
ò-=t
T dxxftR0
)(1)(
Õ==
N
iTiT sfsf
1)(~)(~
Prof. Enrico Zio
Cold standby: an example
• Consider a “cold” standby system of two units
• The on-line unit has an MTTF of 2 years
• When it fails, the standby unit comes on line and its MTTFis 3 years
• Assume that each component has an exponential failure times distribution
(1) What is the probability density function of the system failure time? What is the MTTF of the system?
(2) Repeat assuming that the two components are in parallel in a one-out-of-two configuration
Prof. Enrico Zio
• T1 and T2 are independent random variables denoting the times when the on-line and standby units are operating, respectively
• The system failure time is also a random variable, T = T1+T2
Cold standby: an example – solution (1)
yrs21
1 =lyrs31
2 =l
τ t
On-line, T1 Standby, T2
1 2 1 2 2 2 1 2( ) ( ) ( )1 2 1 2 1 2
0 0 0
( )t t t
t t tsysf t e e d e e d e e dlt l t l l t l l l l tl l t l l t l l t- - - - - - - - -= = = =ò ò ò
( )2 1 2 1 2( ) /3 / 21 2 1 20
2 1 2 1( )tt t t t yrs t yrse e e e e el l l t l ll l l l
l l l l- - - - - - -= = - = -
- -
ò =ò==¥
-¥
0 0
1)(i
tiTii dtetdtttfMTTF i
ll l
)(tfT
Prof. Enrico Zio
Cold standby: an example – solution (2)
yrstu3
=yrst2
=x
( )22
0 0(3 ) 2[ ] [ ]u uMTTF yrs ue e yrs e ex xx
¥ ¥- - - -= - - - - - =
( ) ( )2 21 13 2 5yrs yrs yrsyr yr
æ ö æ ö= - =ç ÷ ç ÷
è ø è ø(3.8yrs for parallel!)
ò ò -==¥ ¥
--
o
yrstyrstT dttetedtttf
yrstu
0
2/3/ )()(3
Prof. Enrico Zio
Hot standby (1)
• The convolution theorem can no longer be used to calculate the reliability of the system, because there is no independence of failures any more
• Simple case of two components: the system will perform its task in the interval (0, t) in either of the two mutually exclusive ways:
Ø the online component 1 does not fail in (0, t)[probability = R1(t) ]
Ø the online component fails in (0, t) [probability = fT1(t)dt];the standby component 2 does not fail in (0, t) [probability Rs(t)]
and it operates successfully from t to t [probability R2(t-t)]
Prof. Enrico Zio
Hot standby (2)
• The system reliability is given by the sum of the probabilities of the two mutually exclusive events:
• For 2 exponential components:
1 1 2
1 2 1
( )1
0
1 ( )
1 2
( ) s
s
tt t
t t t
s
R t de e e e
e e e
t t tl l l l
l l l l
tl
ll l l
- - - - -
- - - +
= + =
= + é - ùë û+ -
ò
)()()()()( 20
11 tttt -+= ò tRRdftRtR s
t
Prof. Enrico Zio
Time-dependent systems: an example
• When both A and B are fully energized they share the total load and the failure densities are fA(t) and fB(t)
• If either one fails, the survivor must carry the full load and its failure density becomes gA(t) or gB(t)
A
B
Find the reliability R(t) of the system if
( ) ( ) tA Bf t f t e ll -= = ( ) ( ) 1k t
A Bg t g t k e kll -= = >
Prof. Enrico Zio
Time-dependent systems: an example - solution
• R(t) = P{system survives up to t} = P{neither component fails before t}+P{one fails at some time t < t, the other one survives up to t, with f(t), and from t to t with g(t)} =
( )( ) ( )( )2 2 (2 )
0 0
2 2t t
k tt t k t ke e d e e e e e dl tl lt lt l l l tl t l t- -- - - - - - -= + = + =ò ò
222
k t te kek
l l- --=
-