
INTRODUCTION TO ENGINEERING RELIABILITY

Lecture Notes for MEM 361

Albert S. D. Wang

Albert & Harriet Soffa Professor

Mechanical Engineering and Mechanics Department

Philadelphia, PA 19104

Drexel University


CHAPTER I. INTRODUCTION

Why Engineering Reliability? In a 1985 Gallup poll, 1000 customers were asked: what attributes are most important to you in selecting and buying a product? On a scale from 1 to 10, the following were the results from the poll:

* Performance quality 9.5
* Lasts a long time (dependability) 9.0
* Good service 8.9
* Easily repaired 8.8
* Warranty 8.4
* Operational safety 8.3
* Good looks 7.7
* Brand reputation 6.3
* Latest model 5.4
* Inexpensive price 4.0

It is interesting to note that the top five attributes are related to "Engineering Reliability", the study of which is the essence of this course.

I-1 Basic Notions in Engineering Reliability.

Engineering Product. An "engineering" product is designed, manufactured, tested and deployed in service. The product may be an individual part in a large operating system, or it may be the system itself. In either case, the product is expected to perform the designed functions and last beyond its designed service life.

Product Quality. It is a "quantitative" measure of the engineered product's ability to meet its designed functional requirements, including the designed service life.

Example 1-1: A bearing ball is supposed to have the designed diameter of 10 mm. But an inspection of a large sample off the production line finds that the diameters of the balls vary from 9.91 mm to 10.13 mm, although the average diameter over all the balls in the sample is very close to 10 mm. Now, the bearing ball is an engineering product; its diameter is one "quality measure". The diameter of any given ball off the production line is uncertain, although it is "likely" to be between 9.91 mm and 10.13 mm.

Example 1-2: A color TV tube is designed to have an operational life of 10,000 hours. After-sale data show that 3% of the tubes were burnt out within the first 1000 hours, and 6.5% were burnt out within 2000 hours. Here, the TV tube is an engineering product, and the operating life in service is one "quality measure". For any one given TV tube off the production line, its operating life (the time of failure during service) is uncertain; but there is a 3% chance that it may fail within 1000 hours of operation, and a 6.5% chance that it may fail within 2000 hours of operation.

Random Variable. A random variable (denoted by X) is one which can assume one or more possible values (denoted by x); but at any given instance, there is only a chance that X=x. In this context, the “quality measure” of an engineering product is best described by one or several random variables, say X, Y, Z, etc.

Discussion: The diameter of the bearing balls discussed in Example 1-1 can be described by the random variable X; according to the sample taken, X can assume any value between x=9.91mm and x=10.13mm. If a much larger sample is taken from the production line, some diameters may be smaller than 9.91mm and some diameters may be larger than 10.13mm. In the extreme case, X can thus assume any value between 0 and ∞. Similarly, the operating life of the TV tube discussed in Example 1-2 can also be described by a random variable, say Y. From the statement therein, for a given tube, the chance that Y ≤ 1000 hours is 0.03 and that Y ≤ 2000 hours is 0.065.

Probability Function. Let X be a random variable representing the quality measure of a product; and it can assume any value x in the range, say, 0 < x < ∞. Furthermore, for X=x, there is a certain associated "chance" or "probability"; this is denoted by f(X=x) or by f(x). Note that f(x) is the probability that the value of X is exactly x.

Discussion: In Example 1-1, the random variable X describes the diameter of a bearing ball. From the sample data, we see that the probability for X<9.91mm, or X>10.13mm, should be very small while the probability for X=10mm should be much higher; this is because the bearing ball is designed to have a target diameter of 10 mm. In this context, f(x) may be graphically displayed as follows:

[Figure: sketch of f(x) versus x (mm); the curve peaks at the design target x = 10.00 and spans roughly 9.91 to 10.13 mm, with the shaded area up to a value x* representing F(x*).]

If f(x) is known, a number of questions related to the quality of the bearing balls can be rationally answered. For instance, the percentage of the bearing balls having the diameters X ≤ x* is obtained as:

$F(X \le x^*) = F(x^*) = \int_{9.91}^{x^*} f(x)\,dx$

Here, F(x*) represents (1) the probability that the diameter of a given bearing ball is less than or equal to x*; or, equivalently, (2) the percentage of the bearing balls with diameters less than or equal to x*. Clearly, for a given bearing ball, the probability that its diameter is larger than x* is: R(x*) = 1 - F(x*) (see the sketch below). Note: Based on the sample given, F(x*) corresponds graphically to the area under the f(x) curve from 9.91 mm to x*, while F(X ≤ 10.13) corresponds to the total area under the entire f(x) curve. Since the diameter of any given bearing ball is at most 10.13 mm, the probability that the diameter of a given bearing ball is less than or equal to 10.13 mm is 100%; or F(X ≤ 10.13) = 1.
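As a minimal illustration, the following Python sketch estimates F(x*) and R(x*) by numerical integration. It is not part of the original notes: the normal form of the pdf, the mean 10.00 mm and the standard deviation 0.04 mm are all assumptions made here for illustration only.

import math

def f(x, mu=10.00, sigma=0.04):
    """Assumed (hypothetical) normal pdf of the ball diameter, in mm."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def F(x_star, lo=9.8, n=10_000):
    """Cumulative probability F(X <= x*) by the trapezoidal rule."""
    h = (x_star - lo) / n
    total = 0.5 * (f(lo) + f(x_star)) + sum(f(lo + i * h) for i in range(1, n))
    return h * total

x_star = 9.95
print(f"F({x_star}) = {F(x_star):.4f}")     # P(diameter <= x*)
print(f"R({x_star}) = {1 - F(x_star):.4f}")  # P(diameter > x*)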


Note the difference in notation between f(x) and F(x) and their meanings; f(x) is termed the “probability density function” while F(x) is termed the “cumulative distribution function”. We shall discuss these functions and their mathematical relationships in Chapter II.

Probability of Failure. If the quality of a product X is measured by its time of failure during service, then the value of X is in terms of time, t. Let the range of t be 0 < t < ∞; the associated probability for X=t is f(t).

Discussion: In Example 1-2, the operating life of the TV tube may be described by the probability density function f(t), sketched below:

[Figure: f(t) versus t (hours), peaking at the design-target life; the shaded area over 0 ≤ t ≤ t* represents F(t*).]

Here, the TV tube is designed for a life of 10,000 hours of operation; the chance for a given tube to last 10,000 hours is better than for any other t-value. Based on the sample data, there is a 3% chance of failure up to t = t* = 1000 hours; thus, we have

$F(X \le t^*) = F(t^*) = \int_0^{t^*} f(t)\,dt = 0.03$

The above is indicated graphically by the shaded area under the f(t) curve in the interval 0 ≤ t ≤ t*. Here, note the relation between f(t) and F(t).

Product Reliability. Let the quality of a product be measured by the time-to-failure probability density function, f(t); the probability of failure up to t = t* is given by the cumulative distribution function, F(t*). Then, the probability for “non failure” before t = t* is:

$F(X > t^*) = 1 - F(t^*) = \int_{t^*}^{\infty} f(t)\,dt$

The term F(X>t*), which is associated with the probability density function f(t), represents the probability of survival; it is known as the reliability function. A precise definition of the latter will be fully discussed in Chapter IV.

Discussion: In Example 1-2, the data show that 3% failed by t = 1000 hrs and 6.5% by t = 2000 hrs. Hence, in terms of the reliability function, we write:

$R(1000) = \int_{1000}^{\infty} f(t)\,dt = 0.97, \qquad R(2000) = \int_{2000}^{\infty} f(t)\,dt = 0.935$


If we want to know the service life t* for which there is no more than 5% failure (i.e., 95% or better reliability), we can determine t* from the following relation:

$R(t^*) = \int_{t^*}^{\infty} f(t)\,dt = 0.95$
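For instance (an assumption made here for illustration, not part of the notes): if f(t) were exponential, f(t) = λ e^(-λt), calibrated so that F(1000) = 0.03 as in Example 1-2, then t* follows in closed form:

import math

# Assumed exponential life model, calibrated from F(1000) = 0.03 (Example 1-2).
lam = -math.log(1 - 0.03) / 1000.0

# Solve R(t*) = exp(-lam * t*) = 0.95 for t*.
t_star = -math.log(0.95) / lam
print(f"t* = {t_star:.0f} hours")   # about 1680 hours under this assumed model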

Clearly, one attempts to answer all such questions regarding product reliability; and this can be done when the mathematical form of f(t) is known. In fact, the knowledge of f(t), or the pursuit of it, is one of the central elements in the study of engineering reliability.

I-2. Probability, Statistics and Random Variables. In the preceding section, we used terms such as "random variable", "probability" of occurrence (say, X=x), "sample data", etc., which form the basic notions of engineering reliability. How these notions, which are sometimes abstract in nature, all fit together in a real-world setting is a complicated matter, especially for beginners in the field. As in nature, the occurrence of some engineered event is often imperfect; indeed, it may seem to occur at random; but when it is observed over a large sample or over a long period of time, there may appear a definitive "mechanism" which causes the event to occur. If the mechanism is exactly known, the probability for the event to occur can be inferred exactly; if the mechanism is not known at all, sampling a set of relevant data (observations) can provide a statistical base from which at least the nature of the mechanism may become more evident. The latter is, of course, keenly dependent on the details of the data sampling and on how the sample is analyzed.

Example 1-3. The number obtained by rolling a die is a random variable, X. In this case, we know all the possible values of X (the integers from 1 to 6) and the exact mechanism that causes a number to occur. Thus, the associated probability density function f(x) is determined exactly as: f(x) = 1/6, for x = 1, 2, . . ., 6. Note, for example, the probability that the number from a throw is less than 3 is given by F(x<3) = f(1) + f(2) = 1/6 + 1/6 = 2/6 = 1/3. Similarly, the probability that the number from a throw is greater than 3 is given by F(x>3) = f(4) + f(5) + f(6) = 3/6 = 1/2. The probability that the number from a throw is any number x ≤ 6 is given by F(x ≤ 6) = f(1) + f(2) + . . . + f(6) = 1. Note: In this example, X is known as a discrete random variable, since all the possible values of X are distinct and the number of all the values is finite. The distribution of f(x) is said to be uniform since f(x) = 1/6 for all x = 1, 2, . . ., 6.

Example 1-4. Now, let us pretend that we do not know anything about the die. By conducting a sampling test in which the die is rolled N = 100 times and each time the integer "x" on the die is recorded, the following data are obtained from an actual experiment of N = 100 throws:

X-value, x:            1    2    3    4    5    6
# times x occurs, n:  17   14   16   20   15   18

The above is said to be a "sample" of size N = 100. It is, comparatively speaking, a rather small sample; in fact, even though only the integers from 1 to 6 are actually observed, we cannot be certain that integers other than 1 to 6 could not appear (remember: we pretended not to know anything about the die). However, we can infer from the sample quite closely what is actually happening: namely, we observe that the number "1" appears 17 times out of 100 throws; the number "2" appears 14 times out of 100 throws; and so on. Hence, we can estimate the probability density function f(x) as: f(1)=17/100; f(2)=14/100; f(3)=16/100; f(4)=20/100; f(5)=15/100; f(6)=18/100. We see that the estimated f(x) is not uniform over the range of X; rather, the values vary slightly about the theoretical value of 1/6. A graphical display of the above results is more revealing:

[Bar chart: estimated f(i) for i = 1, . . ., 6 (.17, .14, .16, .20, .15, .18) plotted against the theoretical value 1/6 ≈ .167.]

It is generally contended that the estimated f(x) would approach the theoretical value of 1/6 if N is sufficiently large, say N = 1000 (you may want to experiment on this; see the sketch after this paragraph). The relation between the sample size N and the theoretical probability function is another central element in "statistics". The above example illustrates the statistical relationship between a test sample and the theoretical probability distribution function f(x) for X (X being generated by a specific mechanism: rolling a die N times). This example carries an implication for engineering: in most cases, the theoretical f(x) is unknown, and the only recourse is to find (estimate) f(x) through a test sample, along with a proper statistical analysis of the sample.
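A minimal Python sketch of the sampling experiment: it rolls a simulated fair die N times and compares the estimated f(x) against the theoretical 1/6. The seed and the particular sample sizes are arbitrary choices, not from the notes.

import random
from collections import Counter

def estimate_pmf(n_rolls, seed=0):
    """Estimate f(x) for a fair die from n_rolls simulated throws."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {x: counts[x] / n_rolls for x in range(1, 7)}

for n in (100, 1000, 100_000):
    est = estimate_pmf(n)
    print(n, {x: round(p, 3) for x, p in est.items()})  # compare with 1/6 = 0.167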

Example 1-5. Suppose we roll two dice and take the sum of the two integers to be the random variable X. Here, we know the exact mechanism generating the values for X. First, there are exactly 36 ways to generate a value for X; and X can take any one of the following 11 distinct integers: i = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. For instance, the number "2" is generated by the sum of "1" and "1" in one throw, while the number "3" is generated by the sum of "1" and "2"; but there is only one way to obtain X=2, while there are two ways to obtain X=3 (1+2 and 2+1). Hence, f(2)=1/36 and f(3)=2/36. In fact, the probabilities associated with each of the 11 values are: f(2)=f(12)=1/36; f(3)=f(11)=2/36; f(4)=f(10)=3/36; f(5)=f(9)=4/36; f(6)=f(8)=5/36; f(7)=6/36. A graphical display of f(i) is shown below:

[Bar chart: f(i) for the sum of two dice, i = 2, . . ., 12, rising from 1/36 at i = 2 to the maximum 6/36 at i = 7 and falling back to 1/36 at i = 12.]

Here, X is also a discrete random variable, but its probability density function f(i) is not uniform. It is, however, a symmetric function with respect to X=7. For X=7, the theoretical probability is 6/36, the largest among the 11 numbers.

Discussion. Again, if we do a statistical sampling by actually rolling 2 dice N times, we may obtain an estimated f(i) function. In that case, we need a very large sample in order to approach the theoretical f(i) as displayed above.
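The theoretical f(i) above is easy to verify by enumeration; a minimal Python sketch tallying all 36 outcomes:

from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of two dice and tally the sums.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
for s, p in pmf.items():
    print(s, p)                   # e.g. f(7) = 6/36, the maximum
assert sum(pmf.values()) == 1     # total probability axiom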

Central Tendency. In most engineering settings, the random variable X often represents some measured quantity from nominally identical products; for example, the measured diameters of a lot of bearing balls off the production line constitute just such a case. In general, the measured values tend to cluster around the designed value (i.e. the diameter); hence the term central tendency. Evaluating this tendency is clearly important for design or quality control purposes. The following example illustrates the evaluation of such a tendency:

Example 1-6. A master plumber keeps repair-call records from customers in his service area for 72 consecutive weeks:

71 73 22 27 46 47 36 69 38 36 36 37
79 83 42 43 45 45 55 47 48 60 60 60
49 50 51 75 76 78 31 32 35 85 58 59
38 39 40 40 41 42 42 54 73 53 54 65
66 55 55 56 56 57 49 51 46 54 62 62
54 62 63 64 67 37 58 58 61 62 52 52

Here, let X be the number of repair calls per week, which seems uniformly random. While one sees that the smallest X value is 22 and the largest is 85, the sample really does not provide a definitive value range for X. Furthermore, since no definitive mechanism(s) could be identified as to how and why the values of X are generated, the true probability distribution of X could never be determined. Hence, instead of looking for the mechanism(s), the sample data can be analyzed in some way to show its central tendency, which may in turn be used to estimate the probability distribution function f(x) for X. Here, we follow a simple procedure, described as follows: First, we note that there are 72 data values in the sample, roughly within the range from 21 to 90; so we divide this range into 7 equal "intervals" of 10; namely, 21-30, 31-40, 41-50, etc. Second, for each interval, we count from the sample the number of X values that fall within the interval. For instance, in the first interval (21-30), there are 2 values (22, 27); in the second interval (31-40), there are 13 values (36, 38, 36, 36, 37, 31, 32, 35, 38, 39, 40, 40 and 37); and so on. After all 7 intervals are counted, the following result is obtained:

intervals:               21-30  31-40  41-50  51-60  61-70  71-80  81-90
# X values in interval:    2     13     15     22     11      7      2

In this manner, we can already observe that fewer values fall into the lowest interval (21-30) or the highest interval (81-90), while more values fall into the middle intervals, especially the central interval (51-60). With the above "interval grouping", we may estimate the probability for X to fall inside the intervals. Instead of treating X directly, we introduce a new variable I representing the value-intervals; the values of I are the integers 1 to 7, since there are 7 intervals. Thus, the probability density function of I, f(i), can be approximated as follows: f(1)=2/72; f(2)=13/72; f(3)=15/72; f(4)=22/72; f(5)=11/72; f(6)=7/72; f(7)=2/72. A bar chart of the above is constructed as shown below; it is termed a "histogram" for the sample data considered:

[Histogram: f(i) for the 7 class intervals (10 calls wide, from 21-30 to 81-90), peaking at 22/72 in the interval 51-60, with a smooth "fitted" f(x) curve overlaid.]

The above histogram displays some important features of the weekly repair calls. First, it suggests that the most probable number of repair calls occurs at i=4, the 4th value-interval, or 51-60 calls per week. Second, the shape of the histogram provides a clue as to the form of the estimated probability distribution function, f(x). Note that the "fitted" f(x) shown in the figure is just a qualitative illustration; details of sample fitting will be further discussed in Chapter III.

Discussions. We note that the "histogram" obtained above is not unique, for one may take more or fewer value intervals in the range from 21 to 90. In either case, one may obtain a somewhat different histogram for the same sample; often, one may even draw a quite different conclusion for the problem under consideration (see the sketch below). This aspect of handling sample data will be examined further in later chapters.
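A minimal Python sketch of the interval grouping: it bins the 72 repair-call values with class width 10, as in the text; changing the width to 6 reproduces the alternative histogram of Problem 1.4. The helper function histogram() is ours, not from the notes.

calls = [71, 73, 22, 27, 46, 47, 36, 69, 38, 36, 36, 37, 79, 83, 42, 43,
         45, 45, 55, 47, 48, 60, 60, 60, 49, 50, 51, 75, 76, 78, 31, 32,
         35, 85, 58, 59, 38, 39, 40, 40, 41, 42, 42, 54, 73, 53, 54, 65,
         66, 55, 55, 56, 56, 57, 49, 51, 46, 54, 62, 62, 54, 62, 63, 64,
         67, 37, 58, 58, 61, 62, 52, 52]

def histogram(data, width, start=21):
    """Count how many values fall in each class interval of the given width."""
    bins = {}
    for x in data:
        i = (x - start) // width           # index of the class interval
        lo = start + i * width
        bins[(lo, lo + width - 1)] = bins.get((lo, lo + width - 1), 0) + 1
    return dict(sorted(bins.items()))

for (lo, hi), n in histogram(calls, 10).items():
    print(f"{lo}-{hi}: {n}  ({n}/72)")     # matches the table above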

Sampling Errors. As has been seen, most engineering-related events are unlike rolling dice. Rather, the mechanisms that generate the random variable X are not exactly known. Moreover, the values of X may not be distinct, and the range of their values may not be defined exactly either. In such cases, the probability density f(x) can be estimated from test samples. However, questions arise as to the proper size of the sample, possible bias that may have been introduced in taking the sample, and the manner in which the sample is analyzed. This necessitates an evaluation of the confidence level in estimating f(x). We shall defer the discussion of this subject to Chapter III.

I-3. Concluding Remarks. This chapter provides an overview of the intrinsic elements that embody the subject of engineering reliability. At the heart is the interrelationship linking the random variable X, the mechanisms which generate the values of X, the statistics of the sample data related to X, and the determination and/or estimation of the probability distribution function f(x). The fundamentals of probability - i.e. the mechanisms that generate the random variable X and the properties of various probability functions - are investigated more critically in Chapter II. Sample statistics, methods for fitting sample data, some known probability distribution functions, sampling-error estimates and related subjects are discussed in Chapter III. The basics of reliability and failure rates are included in Chapter IV. Chapter V presents techniques for reliability testing, while Chapter VI discusses applications to some elementary product quality control issues. Throughout the text, simple but pertinent examples are used to illustrate the essence of the important points and/or concepts involved in the subject of concern. A modest number of homework problems are included in each chapter; the students are urged to do the problems with clear logical reasoning, rather than seeking the "formula" and "plugging in" just for an answer.


Summary. As a beginner, it is useful to be conceptually clear about the meaning of some of the key terminologies introduced in this chapter, and to distinguish their differences and interrelations:

• The random variable X is generated, as an event, by some mechanism that may or may not be known. When it occurs, X may assume a certain real value, denoted by x; and x can be any value inside a particular range, say 0 ≤ x < ∞.

• In one occurrence, there is a chance (probability) that X=x; that chance is denoted by f(x); here X=x means X equals exactly x.

• Since x denotes any value in the range of X values, f(x) is treated mathematically as a distribution function over the value range. Thus, f(x) is the mathematical representation of X; f(x) has several properties, and these will be discussed in Chapter II.

• In one occurrence, the probability that X ≤ x is denoted by P{X ≤ x}, or simply F(x); here, F(x) is the sum of all f(x) where X ≤ x. Similarly, P{X > x} denotes the sum of all f(x) where X > x; it is sometimes also denoted by R(x) and/or by 1 - F(x).

• In this course, we sometimes mix the use of the symbols P{X ≤ x} and F(x), and of P{X > x}, R(x) and 1 - F(x), etc. This can be a source of confusion at times.

• If the exact mechanism that generates X is known, it is theoretically possible that the exact value range of x, along with the theoretical f(x), is also known; if the mechanism is not known exactly, one can only rely upon statistical samples, along with a proper statistical analysis methodology, in order to determine f(x).

• A certain quality of an engineered product can be treated as a random variable (X); x is then the measure of that quality in one such product picked at random. More often than not, the exact mechanisms which generate the product quality (X) are not completely known; hence, sampling of the quality and statistical analysis of the samples become essential in determining the associated f(x) function for X.

• Engineering reliability is a special case where the random variable X represents time-to-failure, such as the service life-time of a light bulb; for obvious reasons, it is necessary to obtain the time-to-failure probability f(t) for, say, the light bulb.

Assigned Homework. 1.1 Let the random variable X be defined as the product of the two numbers when 2 dice are rolled.

• List all possible values of X by this mechanism;
• Determine the theoretical probability function, f(x);
• Sketch a bar-chart for f(x), similar to Example 1-5;
• Compute F(25); explain the meaning of F(25);
• Compute R(15); explain the meaning of R(15);
• Show that the sum of all possible values of f(x) equals one.

[Partial answer: there are 18 values for X; f(6)=1/9; f(25)=1/36]

1.2 A coin-bag contains 3 pennies, 2 nickels and 3 dimes. If 3 coins are taken from the bag each time, their sum is then a random variable, X.


• List all the possible values of X by this mechanism;
• Determine the associated probability distribution f(x);
• Plot the distribution in a graphical bar-chart;
• Show that the sum of all possible values of f(x) equals one.

[Partial answer: there are 56 combinations in drawing "three coins", but only 9 different values; $0.20 and $0.25 are among them.]

1.3 (Optional; for extra effort) Let the random variable X be the sum of the three numbers when 3 dice are rolled.

• Compute the theoretical probability distribution f(x) for X;
• Show your results in a bar-chart;
• Comment on the form of the bar-charts obtained by rolling 1, 2 and 3 dice, respectively.

[There are 216 possible outcomes in rolling 3 dice; they provide only 16 values, from 3 to 18; f(10)=27/216; f(13)=21/216; one die gives a uniform f(x); 2 dice yield a bi-linear f(x); . . . ]

1.4 In Example 1-6, the exact mechanism that generates repair calls (X) is not known; but the sample provided can be used to gain some insight into the probability distribution function f(x).

• Now, using a class-interval of 6 calls instead of 10 calls, re-do the histogram for the sample;
• Discuss the difference between your histogram and the one obtained in Example 1-6.

1.5 (Optional; for extra effort) The Boeing 777 is designed for a mean service life of 20 years in normal use. Let the service-life distribution be given by f(t), where t is in years; the form of f(t) looks like the one shown in Example 1-2.

• Sketch f(t) as a function of t (years), and locate the design life (20 years) on the t-axis;
• If a B-777 has been in service for 10 years already, what is the chance that the craft is still fit to fly for another 6 years?

[This is a case of "conditional" probability]


CHAPTER II. FUNDAMENTALS IN PROBABILITY

II-1 Some Basic Notions in Probability. Probability of an Event. Suppose that we perform an experiment in which we test a sample of N "identical" products, and that n of them fail the test. If N is "sufficiently" large (N → ∞), the following defines the probability that a randomly picked product would fail the test:

PX = p = n/N; 0 ≤ n ≤ N; N → ∞ (2.1)

Here, X denotes the event that a product fails the test; it is a random variable because the picked product may or may not fail the test; thus, the value of PX represents the probability that the event X (failing the test) occurs. Clearly, the value of PX is bounded by:

0 ≤ PX ≤ 1 (2.2)

The Non-Event. Let X be an event with probability PX. We define X' as the non-event of X, meaning that X does not occur. Then, the probability that X' occurs is given by:

PX’ = 1 - PX (2.3)

The relationship between PX and PX’ can be graphically illustrated by the so-called Venn Diagram as shown below:

[Venn diagram: a unit square containing a shaded circle X; the area of the circle is PX, and the area outside it is PX'.]

In the Venn diagram, the square has a unit area; the (shaded) circle is PX; and the area outside the circle is PX'. If PX is the probability of failure, PX' is then the probability of survival, or the reliability. Event and Non-Event Combination. In a situation where there are only two possible outcomes (such as in tossing a coin, one outcome a "head" and the other a "tail"), X is a random variable with two distinct values, say 1 and 0; the associated probabilities are then: f(1) = PX = p and f(0) = PX' = q. It follows from (2.3) that f(0) + f(1) = q + p = 1.

Page 13: Reliability engineering

Chapter- II Fundamentals II - 2

Example 2-1: In tossing a coin, the head will or will not appear; we know that the probability for the head to appear is PX=p =1/2, and that for the head not to appear is PX’=q =1- p =1/2. Similarly, in rolling a die, let X be the event that the number “1” occurs. Here, we also know that PX =p =1/6, and the probability that “1” will not occur is PX’= q = 1-p = 5/6. In the above, we know the exact mechanisms that generate the occurrence of the respective random variable X. In most engineering situations, one can determine PX or p from test samples instead. Example 2-2. In a QC (quality control) test of 500 computer chips, 14 chips fail the test. Here, we let X be the event that a chip fails the QC test; and from the QC result, we estimate using (2.1): PX = p ≅ n/N = 14/500 = 0.028. Within the condition of the QC test, we say that the computer chip has a probability of failure p = 0.028, or a survivability of q = 0.972.

Discussion. In theory, (2.1) is true only when N → ∞. The p = 0.028 value obtained above is based on a sample of size N = 500 only. Hence, it is only an estimate, and we do not know how good the estimate is. There is a way to evaluate the goodness of the estimate (the scatter is illustrated by the sketch below); this will be discussed in Chapter III.
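A small simulation illustrates the point; here the "true" failure probability 0.028 and the repetition counts are hypothetical choices, not data from the notes.

import random

def estimate(p_true=0.028, N=500, seed=None):
    """One simulated QC test of N chips; returns the estimate n/N."""
    rng = random.Random(seed)
    n = sum(rng.random() < p_true for _ in range(N))
    return n / N

# Repeat the N = 500 test ten times: the estimates scatter around 0.028.
estimates = [estimate(seed=s) for s in range(10)]
print([round(e, 3) for e in estimates])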

Combination of Two Events. Suppose that two different events X and Y can possibly occur in one situation, with respective probabilities PX and PY. The following are defined:

PX ∩ Y = the probability that both X and Y occur; and PX ∪ Y = the probability that either X, or Y, or both occur.

X ∩ Y is termed the intersection of X and Y; X ∪ Y is termed the union of X and Y; a graphical representation of these two cases is shown by means of the Venn diagrams:

[Venn diagrams: two unit squares containing circles X and Y; one shades the intersection X ∩ Y, the other the union X ∪ Y.]

In each diagram, the outlined square area is 1×1, representing the total probability; the circles X and Y represent the probabilities of the respective events. The shaded area on the left is X ∩ Y, in which both X and Y occur; the shaded area on the right is X ∪ Y, in which either X or Y or both occur. The union is mathematically expressible as:

PX ∪ Y = PX + PY - PX ∩ Y (2.4)

which can be inferred from the Venn diagram. Note that the blank area outside the circles in each case represents the probability of the "non-event", that is, that neither X nor Y occurs: P(X ∪ Y)' = 1 - PX ∪ Y.

Independent Events. If the occurrence of X does not depend on the occurrence of Y, or vice versa, X and Y are said to be mutually independent. Then,

PX ∩ Y = PX∗ PY (2.5)

Expression (2.5) is an axiom of probability; it cannot be shown on a Venn diagram.

Example 2-3. In rolling two dice, let the occurrence of #1 in the first die be X and that in the second be Y. In this case, the occurrence of Y does not depend on that of X; and we know PX = PY = 1/6. It follows from (2.5) and (2.4), respectively, that

PX ∩ Y = (probability that #1 appears in both dice) = (1/6)(1/6) = 1/36.
PX ∪ Y = (probability that #1 appears in either or both dice) = 1/6 + 1/6 - 1/36 = 11/36.

Discussion: The fact that PX ∪ Y = 11/36 can also be found as follows: there are in all 11 possible combinations in which #1 appears in either or both dice, namely (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (3,1), (4,1), (5,1), (6,1), out of a total of 36 possible outcomes.

Conditional Probability. If the occurrence of X depends on the occurrence of Y, or vice versa, then X and Y are said to be mutually dependent. We define

PX/Y = probability that X occurs, given the occurrence of Y; PY/X = probability that Y occurs, given the occurrence of X.

It follows from the axiom (2.5) that PX ∩ Y = PX/Y ∗ PY = PY/X ∗ PX (2.6)

Example 2-4. Inside a bag, there are 2 red and 3 black balls. The probability of drawing a red ball out is PX = 2/5, and that of then drawing another red ball from the rest of the balls in the bag is PY/X = 1/4. Thus, to draw both red balls consecutively, the probability is: PX ∩ Y = PY/X ∗ PX = (1/4)(2/5) = 1/10.

Example 2-5. An electrical system is protected by two circuit breakers that are arranged in series. When an electrical surge passes through, at least one breaker must break in order to protect the system; if both do not break, the system is damaged. In a QC test of the breakers individually, the probability for a breaker not to break is PX = 0.02; and if two are connected in series, the probability of failure to break of the second, given the failure to break of the first, is much higher: PY/X = PX/Y = 0.1. The probability that the system fails under an electric surge is the probability that both breakers fail to break: PX ∩ Y = PY/X ∗ PX = (0.1)(0.02) = 0.002. The probability that at least one fails to break (when they are in series) is: PX ∪ Y = PX + PY - PX ∩ Y = 0.02 + 0.02 - 0.002 = 0.038. Discussion. If failure-to-break of one breaker does not affect the other, the failure probability of the system (both failing to break) is: PX ∩ Y = PY ∗ PX = (0.02)(0.02) = 0.0004.

Mutually Exclusive Events. In a situation where if X occurs then Y cannot, and vice versa, then X and Y are said to be mutually exclusive. In the Venn diagram, X and Y do not intersect. So,

PX ∩ Y = 0. (2.7)

It follows from (2.4) and (2.6), respectively, that

PX ∪ Y = PX + PY (2.8)
PX/Y = PY/X = 0

Discussion: To illustrate mutually exclusive or non-exclusive events, consider the following example: for a deck of poker cards, the probability of drawing an "ace" of any suit is PX = 4/52, and that of drawing a "king" is PY = 4/52. These two events are mutually exclusive in a single draw, since if an "ace" is drawn, it is impossible to draw a "king". Thus, PX ∩ Y = 0; PX ∪ Y = PX + PY = 8/52. Alternatively, given the occurrence of an "ace", the probability of drawing a "king" (in the same single draw) is PY/X = 0. Now, in a single draw, the probability of getting a "heart" is PZ = 13/52; the chance of getting the "ace of hearts" is then PX ∩ Z = PX ∗ PZ = (4/52)(13/52) = 1/52. The chance of getting either an "ace" or a "heart" is the union of X and Z: PX ∪ Z = PX + PZ - PX ∩ Z = 4/52 + 13/52 - 1/52 = 16/52. Note that getting an "ace" (X) and a "heart" (Z) are not mutually exclusive events, while getting an "ace" and a "king" (Y) are mutually exclusive.

Combination of N Events. In a situation where N possible events X1, X2, X3, . . ., XN can occur, their intersection or union cannot, in general, be obtained without further information. However, if they are independent events, then their intersection is:

Page 16: Reliability engineering

Chapter- II Fundamentals II - 5

PX1 ∩ X2 ∩ X3 . . . ∩ XN = PX1 ∗ PX2 ∗ PX3 ∗ . . . ∗ PXN (2.9)

Note that (2.9) represents the probability that all N events occur. Similarly, X'1, X'2, X'3, . . X'N are the respective non-events; and their intersection is:

PX'1 ∩ X'2 ∩ X'3 . . . ∩ X'N = PX'1 ∗ PX'2 ∗ PX'3 ∗ . . . ∗ PX'N (2.10)

And the above is the probability that none of the N events occurs. As for the union of the N events, PX1 ∪ X2 ∪ X3 . . . ∪ XN, it represents the probability that one or more (or all) of the N events occur. Since Pone or more events occur + Pnone occurs = 1, we can write:

PX1 ∪ X2 ∪ X3 . . . ∪ XN + [PX'1 ∗ PX'2 ∗ PX'3 ∗ . . . ∗ PX'N] = 1 (2.11)

Note that (2.11) is the total probability of all possible outcomes; thus, it is a unity. The terms PX'i, i=1,2,...N in (2.11) can be replaced by

PX'i = 1 - PXi; i=1,2,..N (2.12)

Hence, the union of all the N-events can be expressed in the following alternate form:

PX1 ∪ X2 ∪ X3 . . . ∪ XN = 1 - [1-PX1] ∗ [1-PX2] ∗ . . . ∗ [1-PXN] (2.13)

A special case: if PXi = p for all i = 1, . . ., N, then the intersection in (2.9) becomes:

PX1 ∩ X2 ∩ X3 . . . ∩ XN = p^N

and the union in (2.13) becomes:

PX1 ∪ X2 ∪ X3 . . . ∪ XN = 1 - (1-p)^N

The example below illustrates such a special case.

Example 2-6: A structural splice consists of two panels connected by 28 rivets. QC finds that 18 out of 100 splices have at least one defective rivet. If we assume defective rivets occur independently and the probability of a rivet being defective is p, what can we say about the quality of the rivets? Here, let Xi, i = 1, . . ., 28, be the event that the ith rivet is found defective in one randomly chosen splice, with PXi = p. The probability for one or more (or all) rivets to be found defective in one splice is the union of all Xi: PX1 ∪ X2 ∪ X3 . . . ∪ X28. But QC finds the probability of a splice having at least one defective rivet to be 18/100; hence,

PX1 ∪ X2 ∪ X3 . . . ∪ X28 = 1 - (1-p)^28 = 0.18


Solving, we obtain p = 0.0071; that is, about 7 out of 1000 rivets may be found defective. Discussion. In this example, the QC rejection rate of the splice (0.18) is given, but the probability of a single rivet being defective is not. By using the definitions of intersection and union of multiple events, we can estimate the probability p of a single rivet being defective, as sketched below. Conversely, if p is given, we can use the same relations to estimate the rejection rate of the splice.
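A minimal Python check of this computation, inverting the union formula in closed form:

# Given the splice rejection rate 1 - (1-p)**28 = 0.18, solve for p.
p = 1 - (1 - 0.18) ** (1 / 28)
print(f"p = {p:.4f}")   # about 0.0071, i.e. roughly 7 defective rivets per 1000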

II-2 The Binomial Distribution. A special case of N events leads to the so-called "binomial distribution"; it is also referred to as the "Bernoulli trials". Specifically, suppose the events X1, X2, . . ., XN are independent as well as statistically identical:

PXn = p and PX'n = 1 - p = q, for n = 1, 2, . . ., N.

Then, we can always write:

PXn+ PX'n = p +q = 1 for n=1,2,...N (2.14)

It follows that

(q + p)^N = 1 (2.15)

Note that (2.15) is a binomial of power N; upon expansion, we have

CN0 q^N + CN1 q^(N-1) p + CN2 q^(N-2) p^2 + . . + CNi q^(N-i) p^i + . . + CNN p^N = 1 (2.16)

where

CNi = N! / [(N-i)! i!], i = 0, 1, 2, . . ., N (2.17)

It turns out that each term in the binomial expansion (2.16) has a distinct physical meaning:

CN0 q^N = q^N is the probability that none of the N events occurs: f(0);
CN1 q^(N-1) p is the probability that exactly one of the N events occurs: f(1);
CNN p^N = p^N is the probability that all of the N events occur: f(N); and, in general,
CNi q^(N-i) p^i is the probability that exactly i out of the N events occur: f(i).

The expression


f(i) = CNi q^(N-i) p^i (2.18)

is known as the binomial distribution; it represents the probability that exactly i of the N events (i = 0, 1, 2, . . ., N) occur. Note that (2.16) is the total probability for all possible outcomes:

f(0) + f(1) + f(2) + . . . + f(N) = 1 (2.19)

It can, in turn, also be rewritten as:

f(1) + f(2) + . . . + f(N) = 1 - f(0) = 1 - q^N (2.20)

which is the probability for one or more events to occur (or the union of all events).

The binomial distribution (2.18) is also known as the Bernoulli trial; the meaning of the trial is illustrated in the following examples:

Example 2-7: Suppose that a system is made of 2 identical units A and B. Under certain prescribed operational conditions, the failure probability of each unit is p. For the system, one of the following 4 situations may occur during the prescribed operation: A fails and B fails; A fails, B does not; A does not fail, B fails; neither fails. The associated probabilities for the four situations are, respectively, pp, pq, qp and qq. Thus, we have

f(0) = probability of no failure = qq
f(1) = probability of just one failure = pq + qp = 2pq
f(2) = probability of two failures = pp

Check that the total probability of all possible outcomes is 100%:

f(0) + f(1) + f(2) = q^2 + 2pq + p^2 = (q + p)^2 = 1

The above follows the binomial distribution for N=2: (q+p)^2 = 1.

Example 2-8: In rolling a die repeatedly, what is the probability that #1 appears at least once in 4 trials? Here, the same event (the appearance of #1) is observed in N=4 repeated "trials"; the random variable of interest is the number of times the observed event occurs, which can be any number i = 0, 1, 2, 3 or 4. This is the setting known as the Bernoulli trial. Now, for f(0), f(1), f(2), f(3) and f(4) we can write:

f(0): no no no no = q^4

f(1): yes no no no = p q^3
      no yes no no = p q^3
      no no yes no = p q^3
      no no no yes = p q^3

f(2): yes yes no no = p^2 q^2
      yes no yes no = p^2 q^2
      yes no no yes = p^2 q^2
      no yes yes no = p^2 q^2
      no yes no yes = p^2 q^2
      no no yes yes = p^2 q^2

f(3): yes yes yes no = p^3 q
      yes yes no yes = p^3 q
      yes no yes yes = p^3 q
      no yes yes yes = p^3 q

f(4): yes yes yes yes = p^4

The total probability of all outcomes is thus:

f(0) + f(1) + f(2) + f(3) + f(4) = q^4 + 4q^3 p + 6q^2 p^2 + 4q p^3 + p^4 = (q + p)^4 = 1

Again, this follows the binomial distribution. Discussion: For engineering products, a system or a single component may be under repeated and statistically identical demands. Say, in each demand, the failure probability is p = 1/6 while that for non-failure is q = 5/6. Then, the probability that failure of the system (or component) occurs at least once in 4 repeated demands is given by:

f(1) + f(2) + f(3) + f(4) = 1 - f(0) = 1 - q^4 = 1 - (5/6)^4 = 51.8%

The above result can be obtained by applying (2.18) through (2.20) directly, as in the sketch below.
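A minimal Python sketch of this Bernoulli-trial computation, using the binomial pmf (2.18) with N = 4 and p = 1/6:

from math import comb

N, p = 4, 1 / 6
# f(i) = C(N, i) * p**i * q**(N-i), the binomial pmf (2.18)
f = [comb(N, i) * p**i * (1 - p)**(N - i) for i in range(N + 1)]
print([round(fi, 4) for fi in f])   # f(0) .. f(4)
print(round(1 - f[0], 4))           # P(at least one failure) = 0.5177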

The Pascal Triangle. The coefficients in the binomial expansion of power N in (2.17) can be easily represented by a geometrical construction, known as the Pascal Triangle, if N is not very large:

N=0:  1
N=1:  1 1
N=2:  1 2 1
N=3:  1 3 3 1
N=4:  1 4 6 4 1
N=5:  1 5 10 10 5 1
N=6:  1 6 15 20 15 6 1
N=7:  1 7 21 35 35 21 7 1
N=8:  1 8 28 56 70 56 28 8 1
. . .

Example 2-9. Suppose the probability of a light bulb being burnt out whenever the switch is turned on is p. In a hallway, 8 light bulbs are controlled by one switch. Compute f(0), f(1), . . ., f(8) for when the switch is turned on.


Using the Pascal Triangle, for N=8, we can quickly write:

f(0) = q^8; f(1) = 8 q^7 p; f(2) = 28 q^6 p^2; f(3) = 56 q^5 p^3; f(4) = 70 q^4 p^4;
f(5) = 56 q^3 p^5; f(6) = 28 q^2 p^6; f(7) = 8 q p^7; f(8) = p^8.

The Poisson Distribution. While the Pascal triangle becomes cumbersome to use when N is large, say N>20, there is a simple expression for (2.18) when N is large and p is small (say N>20 and p <<1):

f(i) = (Np)^i exp[-Np] / i!, i = 0, 1, 2, . . ., N (2.21)

Expression (2.21) is known as the Poisson Distribution. It is an approximation of the binomial distribution (2.18) when N is large and p small.

Example 2-10: Suppose that the probability for a compressor to pass a QC test is q = 0.9. If 10 compressors are put through the QC test, compute the various probabilities of failure: f(i), i = 0, 1, . . ., 10. In this case, N = 10 and p = 0.1; the binomial distribution (2.18) and the simplified Poisson distribution (2.21) give the following respective results:

            f(0)    f(1)    f(2)    f(3)    f(4)    f(5)    f(6) . . . f(10)
by (2.18)   0.349   0.387   0.194   0.057   0.011   0.0015      ≈ 0
by (2.21)   0.368   0.368   0.184   0.061   0.015   0.0031      ≈ 0

Discussion: The exact binomial distribution (2.18) gives a maximum at f(1)=0.387, with f(0)=0.349; the Poisson approximation (2.21) gives f(0)=f(1)=0.368. For the rest, the two distributions are rather close. The Poisson approximation would yield better results if N were larger or p smaller. In this example, N (=10) is not large enough and p (=0.1) is not small enough, thus the difference; compare via the sketch below.
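A minimal Python sketch reproducing the table: exact binomial (2.18) versus the Poisson approximation (2.21) for N = 10, p = 0.1.

from math import comb, exp, factorial

N, p = 10, 0.1
for i in range(6):
    binom = comb(N, i) * p**i * (1 - p)**(N - i)        # (2.18)
    poisson = (N * p)**i * exp(-N * p) / factorial(i)   # (2.21)
    print(f"f({i}): binomial {binom:.4f}   Poisson {poisson:.4f}")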

II-3. Properties of Discrete Random Variables. The random variable X is discrete when its values xi (i = 1, . . ., N) are distinctly defined and N is finite. If the probability that X takes the value xi is f(xi), then f(xi) has the following important properties.

The axiom of total probability:

Σ f(xi) = 1; Σ sums over i = 1, N (2.22)

Here, the function f(xi) is formally termed the probability mass function of X, or pmf for short. It is called a "mass" function in the sense that the probability f(xi) acts exactly at X = xi, much like the lumped-mass representation of particle dynamics in physics; the expression in (2.22), which equals unity, resembles the "total" of all the "lumped masses".


The partial sum (for n ≤ N):

F(xn) = Σ f(xi); Σ sums over i = 1, n (2.23)

is termed the cumulative mass function of X, or CMF for short. Note that 0 ≤ F(xn) ≤ 1 for 1 ≤ n ≤ N.

The Mean of X: The mean of f(xi) is defined as:

µ = Σ xi f(xi); Σ sums over i =1, N (2.24)

The Variance of X: The variance of f(xi) is defined as:

σ² = Σ (xi - µ)² f(xi); Σ sums over i = 1, N (2.25)

By utilizing (2.24), the variance defined in (2.25) can be alternately expressed as:

σ² = Σ xi² f(xi) - µ²; Σ sums over i = 1, N (2.26)

The standard deviation: The standard deviation σ is the (positive) square root of the variance defined in (2.26). The above properties are illustrated by the following example:

Example 2-11: Suppose the values of X are given as (0, 1, 2, 3, 4, 5) and the associated pmf is: f(0)=0, f(1)=1/16, f(2)=1/4, f(3)=3/8, f(4)=1/4 and f(5)=1/16. Compute the mean, variance and standard deviation of X. We compute:

* the total probability, by checking the sum: 0 + 1/16 + 1/4 + 3/8 + 1/4 + 1/16 = 1;
* the mean of X, by applying (2.24): µ = 0∗0 + 1∗(1/16) + 2∗(1/4) + 3∗(3/8) + 4∗(1/4) + 5∗(1/16) = 3;
* the variance of X, by applying (2.26): σ² = [0²∗0 + 1²∗(1/16) + 2²∗(1/4) + 3²∗(3/8) + 4²∗(1/4) + 5²∗(1/16)] - 3² = 1;
* the standard deviation: σ = 1.

(A numerical check is sketched below.) Discussion: It is geometrically revealing if the pmf is represented by a bar chart versus x and the CMF by a stair-case chart, as shown in the figures below. Note that the sum of the pmf bars (lumped masses) is unity and that the center of gravity of the total mass is at x = µ = 3. Note that the mean is not the simple average of all the values of X.


It is also interesting to note that the mass moment of inertia about the mean is σ², which equals 1; the radius of gyration of the total mass about the mean is σ, which is the standard deviation. This analogy with physics is sometimes helpful.

[Figures: (upper) bar chart of the pmf f(xi) for xi = 0, . . ., 5; (lower) stair-case chart of the CMF, rising through 1/16, 5/16, 11/16, 15/16 to 1.]

Note: In the upper figure, the length of the bar at each xi represents the value of f(xi); in the lower figure, the stair-case rise at each xi equals the value of f(xi). The charts provide a geometric view of the pmf and CMF, respectively.

Example 2-12. For the binomial distribution f(i) given by (2.18), the mean can now be found by using (2.24):

µ = Σ (i) CNi (1-p)^(N-i) p^i = Np; Σ sums over i = 1, N

The variance is found by using (2.25):

σ² = Σ (i-µ)² CNi (1-p)^(N-i) p^i = Np(1-p); Σ sums over i = 1, N

The proof of the above results for µ and σ² requires manipulation of the binomial terms in (2.18); the details will not be included here; interested readers may consult the text by E. E. Lewis (2nd Edition, p. 23). Recall that the Poisson distribution (2.21) is a simplified version of the binomial distribution when N → ∞ and p << 1. The mean µ and the variance σ² for the Poisson distribution are readily obtained from the above:

µ = σ² = Np

Thus, the Poisson distribution is a "one-parameter" exponential function (see 2.21):

f(i) = µ^i e^(-µ) / i!

The Expected Value: Suppose that a weighting function g(xi) accompanies f(xi) whenever the random variable X assumes the value xi. Then, the expected value of X with respect to the weighting function g(xi) is defined as:

E(g) = Σ g(xi) f(xi); Σ sums over i = 1, N (2.27)

Here, the weighting function g(xi) may be understood as follows:

• Each time xi is taken by X, a value of g(xi) is realized. But, for X to assume xi, there is the probability f(xi); hence, the expected realization is g(xi) f(xi);
• E(g) is the cumulative realized value over all possible values of X.

Example 2-13. Suppose in Example 2-11 the accompanying weighting function is g(xi) = αxi, where i = 0 to 5, and α is a real constant. Then, the expected value E(g) is given by (2.27): E(g) = α[0∗0 + 1∗(1/16) + 2∗(1/4) + 3∗(3/8) + 4∗(1/4) + 5∗(1/16)] = αµ = 3α.

Applications of the expected-value formula (2.27) will be detailed in Chapter VI.

Summary of Sections II-2 and II-3. The following table summarizes the essential elements of the discrete random variable X discussed in Sections II-2 and II-3:

Function/Property        General Discrete             Special Case (Bernoulli Trials):
                                                      Binomial                            Poisson
values of X              xi, i = 1, N                 i = 0, 1, 2, . . ., N               i = 0, 1, 2, . . ., N
pmf                      f(xi)                        f(i) = N!/[(N-i)! i!] p^i q^(N-i)   f(i) = (Np)^i e^(-Np)/i!
CMF                      F(xn) = Σ f(xi), over 1, n   F(n) = Σ f(i), over 0, n            F(n) = Σ f(i), over 0, n
mean of X, µ             µ = Σ xi f(xi), over 1, N    µ = Np                              µ = Np
variance of X, σ²        σ² = Σ xi² f(xi) - µ²        σ² = Np(1-p)                        σ² = Np
non-event (reliability)  R(xn) = 1 - F(xn)            f(0) = (1-p)^N                      f(0) = e^(-Np)

II-4. Properties of Continuous Random Variables. A random variable X is continuous when its values x are continuously distributed over the range of X. For definiteness, let the range of X be -∞ < x < ∞; then the probability that X takes the value x in the range is a continuous function, f(x). Here, f(x) is formally termed the probability density function, or pdf for short. The pdf must satisfy the axiom of total probability:

$\int_{-\infty}^{\infty} f(x)\,dx = 1$ (2.28)

The cumulative distribution function, or CDF for short, is defined as:

$F(x) = \int_{-\infty}^{x} f(x)\,dx$ (2.29)

F(x) represents the probability that the value of X ≤ x.

The Mean µ and the Variance σ² of f(x) are:

$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ (2.30)

$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$ (2.31)

The variance in (2.31) can alternatively be expressed as:

$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$ (2.32)

The proof of (2.32) is left as an exercise problem at the end of this chapter.

Discussion: A pdf of the kind common to engineered-product quality variation is shown graphically below. This f(x) resembles a bell-like curve; the "head" of the curve has diminishing f(x) as x increases, while the "tail" has diminishing f(x) as x decreases. The maximum of f(x) occurs at x = xmode (which can be determined by setting df(x)/dx = 0).

The total area under f(x) is the integral in (2.28), which must equal 1; the centroid of the area under the f(x) curve, represented by the integral (2.30), is located at x = µ, the mean of X; the area moment of inertia with respect to the axis x = µ, represented by (2.31) or (2.32), is the variance σ². Finally, the radius of gyration of the area under f(x) about the axis x = µ is ±σ; it is the standard deviation of X.

[Figure: a bell-like pdf f(x) versus x over (-∞, ∞), marking the mode xmode, the mean µ, and the spread ±σ about the mean.]

The Expected Value: Given g(x) accompanying f(x), the expected value of X with respect to g(x) is given by:

$E(g) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ (2.33)

Note: when g(x) = x, E(g) = µ; when g(x) = x², E(g) = σ² + µ². Thus, the expected value of X is its mean; the expected value of X² is (σ² + µ²). The latter is readily seen from (2.32).

The Median of X: The median of X is the value X = xm such that

$F(x_m) = \int_{-\infty}^{x_m} f(x)\,dx = 1/2$ (2.34)

From the geometric point of view, x = xm separates the total area under the f(x) curve into two halves.

The Mode of X: The mode of X is the value X = xmode corresponding to the maximum of f(x), as discussed above.

Note: The mean µ, median xm and mode xmode are distinctly defined quantities; each has its own physical meaning. But the three quantities become the same if f(x) is symmetric with respect to the mean: xmode = xm = µ.

Skewness of f(x): The pdf is a skewed distribution when it is not symmetric with respect to the mean; then, in general, xmode ≠ xm ≠ µ. A measure of the skewness of f(x), also known as the skewness coefficient, is given by:

$sk = \frac{1}{\sigma^3} \int_{-\infty}^{\infty} (x-\mu)^3 f(x)\,dx$ (2.35)

If X is discrete, then (2.35) can be expressed as

sk = (1/σ³) Σ (xi - µ)³ f(xi); Σ sums over 1, N (2.36)

It may be shown that when

sk > 0, f(x) is a left-skewed curve: xmode < xm < µ;
sk < 0, f(x) is a right-skewed curve: xmode > xm > µ;
sk = 0, f(x) is a symmetric curve: xmode = xm = µ.

A graphical display of the left-skewed and right-skewed curves is shown below:

[Figure: two pdf curves illustrating left skew and right skew.]

Example 2-14. Let X be a continuous random variable with its values defined in the interval a ≤ x ≤ b. Suppose that the corresponding pdf is a constant: f(x) = k. This is a case of uniform distribution. Now, in order for f(x) to be a bona fide pdf, it must satisfy the total probability axiom:

$\int_a^b k\,dx = k(b-a) = 1$

This yields the value for k: k = 1/(b-a).



The cumulative function F(x) is obtained by integrating f(x) from a to x: F(x) = (x-a)/(b-a). The mean and variance are easily obtained:

µ = (a+b)/2, σ² = (b-a)²/12

Example 2-15. Let the random variable T be defined in the value range 0 ≤ t < ∞, with pdf of the form:

f(t) = λ e^(-λt)

Here, the pdf is an exponential function. Its CDF is easily integrated as:

F(t) = 1 - e^(-λt)

The mean and the standard deviation are also easily obtained: σ = µ = 1/λ.

Example 2-16. The life-time (time to failure) of a washing machine is a random variable. Suppose that its pdf is described by the function:

f(t) = A t e^(-0.5t), 0 ≤ t < ∞

where A is a constant and t is in years. We examine the following properties of this pdf: (a) The CDF:

$F(t) = \int_0^t A s e^{-0.5s}\,ds = 4A \int_0^t (0.5s) e^{-0.5s}\,d(0.5s) = 4A\,[1 - (1 + 0.5t) e^{-0.5t}]$

where the integration is carried out by integration by parts. (b). The pdf satisfies the axiom of total probability F(∝) = 1, we have

0∫ f(t) dt = ∝

0∫ A t e dt = 1∝

- 0.5t

Upon carrying out the integration, we obtain A = 1/4. (c) The mean of the pdf, which is also called "the mean time to failure", or MTTF for short:

$\mu = \int_0^{\infty} t\,(A t e^{-0.5t})\,dt = (1/4)\,[2!/(0.5)^3] = 4$


where the integration is carried out using an integration table. (d). The variance is given by:

0∫ ∝

] - 16 == σ = (1/4) [ 3!/(0.5)4- 0.5t t (A t e )dt - 2 µ 228

(e). The standard deviation is hence: σ = 8 . A graphical display of the pdf and the CDF in this example is shown below:

[Figure: the pdf (scale 0 to 0.20) and CDF (scale 0 to 1) of Example 2-16 plotted over 0 to 9 years, with µ and σ marked; the pdf peaks near t = 2 years.]

(f) The skewness coefficient, sk, of the pdf is given by the integral:

$sk = \frac{1}{\sigma^3} \int_0^{\infty} (t-\mu)^3\,(A t e^{-0.5t})\,dt$

With σ = √8 and µ = 4, it can be shown that sk > 0; so the pdf is a left-skewed curve, as can be seen in the plot. Discussion: Plots of the pdf and/or CDF provide a visual appreciation of the life distribution of the washing machine. We see that the failure probability rises during the first two years in service (about 25% of the machines will fail by the end of the second year; i.e. F(2) ≈ 0.25). Similarly, the mean time to failure (MTTF) is 4 years, and we find F(4) = 0.594; so nearly 60% of the machines will fail within 4 years. If the manufacturer offers a warranty for one full year (t = 1), then F(1) = 0.09; that is, 9% of the machines are expected to fail during the warranty period. These numbers are easily checked numerically, as in the sketch below.
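A minimal Python sketch checking these figures numerically; the integration step and cutoff are arbitrary choices:

import math

A = 0.25
f = lambda t: A * t * math.exp(-0.5 * t)               # pdf of Example 2-16
F = lambda t: 1 - (1 + 0.5 * t) * math.exp(-0.5 * t)   # CDF from part (a)

# crude numerical moments over a range long enough to capture the tail
h, T = 0.001, 60.0
ts = [i * h for i in range(int(T / h) + 1)]
mttf = h * sum(t * f(t) for t in ts)                   # ~ 4 years
var = h * sum(t**2 * f(t) for t in ts) - mttf**2       # ~ 8
print(round(mttf, 3), round(var, 3))
print(round(F(1), 3), round(F(2), 3), round(F(4), 3))  # 0.090, 0.264, 0.594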


Another Note: Often, one has to evaluate complicated integrals in order to obtain explicit results. An integration table at hand is always helpful. At times, an integration can become so tedious that it may not be expressible explicitly. Such difficulty can be circumvented by a suitable numerical method of integration. One should not view such difficulties as an inherent feature of this course.

Summary. This chapter introduces: (1) the basic notions in probability, and (2) the mathematical definitions of the properties of a probability distribution function. It is essential to be conceptually clear and mathematically proficient in dealing with these subjects. In the former, we should be clear about the following:

• The event X: the probability of X to occur is denoted by PX; the probability of X not to occur is PX’=1-PX.

• If the event X is a random variable, X can have a range of possible values, denoted by x if the value is continuous or by xi if the value is discrete. The probability of X = x is denoted by PX=x = f(x); similarly, the probability of X = xi is denoted by f(xi).

• f(x) is called the pdf when x is continuous; f(xi) is called the pmf when xi is discrete; f(x) is the probability that X is exactly equal to x.

• F(x) is called CDF and F(xi), CMF. F(x) is the probability when X ≤ x; it is the area under the f(x) curve up to the specified value of x.

• For multiple events X, Y, Z or X1, X2, X3, etc., make clear the physical meanings of their "intersection" and "union", as well as the meanings of "dependent", "independent" and "mutually exclusive" events.

• If occurrence of X depends on the occurrence of Y, the probability of X to occur is PX/Y; this is known as the conditional probability. But if X and Y are independent events, PX/Y=PX; and PY/X=PY as well.

• For identical and independent events, the binomial distribution applies. The simpler Poisson distribution is a degenerate binomial distribution for N large and p small. For identical but mutually dependent events, the binomial distribution will not apply.

As for the properties of a probability function f(x), note the definitions of:

• The total probability axiom; the distribution mean, variance and skewness.
• The value of X may be discrete or continuous; handle each with care.
• Elementary integration skills will be helpful.


Assigned Homework.

2.1. Suppose that PX = 0.32, PY = 0.44 and PX∪Y = 0.58. Answer the following questions with proof: (a) Are X and Y mutually exclusive? (b) Are X and Y independent events? (c) What is the value of PX/Y? (d) What is the value of PY/X? [Partial answer: (a): no; (b): no; PX/Y = 0.409; PY/X = 0.5625]

2.2. Suppose that PA = 0.4, PA∪B = 0.8 and PA∩B = 0.2. Compute: (a) PB; (b) PA/B; (c) PB/A. [Partial answer: PB = 0.6; PB/A = 0.5]

2.3. In a QC test of 126 computer chips from two different suppliers, the test results are:

             Pass the QC test   Do not pass
Supplier-1         80                4
Supplier-2         40                2

Let A denote the event that a chip is from supplier-1 and B the event that a chip passes the QC test. Then answer the following questions: (a) Are A and B independent events? (b) Are A’ and B independent events? (c) What is the meaning of PA∪B? (d) What is the value of PA∪B? [Hint: PA = 84/126; PB = 120/126 and PA∩B = 80/126]

2.4. Use the Venn diagram to show the following equalities:

• PY = PY ∩ X + PY ∩ X’, X and Y may be dependent events; • PY = PY/X PX + PY/X’ PX’, X and Y may be dependent events; • PY = PY PX + PY PX’, if X and Y are independent events.

2.5. An electric motor is used to power a cooling fan; an auxiliary battery is used in the event of a main power outage. Experiment indicates that the chance of a main power outage is 0.6%. When the main power is on, the chance that the motor itself fails is pm = 0.25x10⁻³; when the auxiliary battery is on, the chance of motor failure is pb = 0.75x10⁻³. Determine the probability that the cooling fan fails to function. [Hint: let X be the event of a main power outage and Y the event of the fan failing to operate; note that Y depends on X and X’, and PY = PY/X PX + PY/X’ PX’ applies. Note also PY/X = pb and PY/X’ = pm. Answer: PY = 0.253x10⁻³]

2.6. The values of the random variable X are (1,2,3); the associated pmf is f(xi) = C xi³, i = 1,3. (a) Find the value of C; (b) Write the expression for F(xi); (c) Determine µ, σ and sk; (d) Plot f(xi) and F(xi) graphically. [Partial answer: C = 1/36; µ = 2.722; σ = 0.506; sk = −1.62]


2.7. A single die is rolled 6 times; a #6 is the desired outcome on each roll. (a) Use (2.18) to compute the pmf f(i), where i = 0,...,6 is the number of times a #6 appears; (b) Plot a bar-chart for f(i), i = 0,...,6, and indicate the largest f(i) value; (c) Determine the mean and variance of f(i); (d) Re-do the computation of f(i) by the Poisson equation (2.21); comment on the difference.

2.8. Show in detail how Equation (2-25) is reduced to (2-26); similarly, show how Equation (2-31) is reduced to Equation (2-32).

2.9. In a QC test of a lot of engines, 3% failed the test. Now, 8 such engines are put into service; what is the probability of each of the following situations? (a) None will fail; (b) All will fail; (c) More than half will fail; (d) Less than half will fail. [Partial answer: (a): 0.784; (b): nearly 0]

2.10. The probability of a computer chip being defective is p = 0.002. A lot of 1000 such chips is inspected: (a) What is the probability that 0.1% or more of the chips are defective? (b) What is the probability that more than 0.5% of the chips are defective? (c) What is the mean (expected) number of defective chips? [Hint: Use the Poisson distribution for p << 1 and N >> 1. Partial answer: (a): Pn>1 = 1 − f(0) − f(1) = 0.594]

2.11. Suppose that the probability of finding a flaw of size x in a beam is described by the pdf f(x) = 4x exp(−2x), 0 ≤ x < ∞, where x is in microns (10⁻⁶ m). (a) Verify that f(x) satisfies the “total probability” axiom; (b) Determine the mean value of the flaw-size distribution; (c) If a flaw is less than 1.5 microns, the beam passes inspection; what is the chance that a beam is accepted? [Partial answer: (b) µ = 1 micron; (c) PX<1.5 = 0.8]

2.12. A computer board is made of 64 k-bit units; the board passes inspection only if each k-bit unit is perfect. On-line inspection of 1000 boards finds 60 of them unacceptable. What can you say about the quality of the k-bit unit? [Hint: let p be the probability that a k-bit unit is imperfect; then find the value of p.]

2.13. (Optional, for extra effort) A cell phone vendor assures that the reliability of the phone is 99% or better; the buyer will consider an order of 1000 phones only if the reliability is 98% or better. The two sides agree to inspect 50 phones; if no more than 1 phone fails the inspection, the deal will be made. (a) Estimate the vendor’s risk that the deal is off; (b) Estimate the buyer’s risk that the deal is on. [Hint: For the vendor, p = 0.01; estimate the chance that more than 1 in 50 inspected fail. For the buyer, p = 0.02; estimate the chance that no more than 1 in 50 inspected fails.]


CHAPTER III. DATA SAMPLING AND DISTRIBUTIONS

In the preceding chapters, some elementary concepts in probability and/or random variables are introduced; and these are illustrated using a number of examples whenever possible. We note that the central element in these examples is the pertinent probability distribution function (pmf or pdf) for the random variable X identified in the particular problem. In general, the pertinent pmf or pdf may be determined, or estimated, by one of the following two approaches:

(a) Probabilistic Approach. If the exact mechanisms by which the random variable X is generated are known, the underlying pmf or pdf for X can be determined on the basis of the appropriate probability theory. In Chapter II, we illustrated this approach using simple examples such as rolling dice or flipping coins. In addition, a somewhat more complex mechanism involving the so-called Bernoulli Trial was shown to lead to a class of pmf's known as the binomial distribution. (b) Data Sampling Approach. Engineering issues, such as the life-time of a product in service or the expected performance level of a machine, often involve intrinsic mechanisms that are not exactly known; the underlying pmf or pdf for the identified random variable then cannot be determined exactly. The alternative is to estimate the pmf or pdf using techniques involving data sampling. In Chapter I, we demonstrated briefly how a statistical sample could provide an estimate leading to the underlying pdf for the identified random variable; but that process involves certain techniques whose details need to be thoroughly discussed.

Thus, this Chapter discusses the basic elements in the data sampling approaches, along with their connection to some of the well-known probability distribution functions. III-1. Sample and Sampling. At the outset, let us introduce the following terms:

Population. A “population” includes all of its kind. Namely, when a random variable X is defined, all of its possible values constitute the “population”. The underlying pdf of X must be defined for each and every element in the population; it is referred to as the true pdf of X. Clearly, for a continuous random variable, the population size is infinite. Sample. A “sample” is a sub-set of the population. Thus, the size of the sample is finite even if the population is infinite. Elements in the sample represent only part of the “possible values” of X. In general, more than one sample may be taken from the same population. If a sample contains N elements, it is referred to as a “sample of size N”.


Sampling. This refers to the "creation" of sample data in order to estimate the underlying pdf of the random variable X. Depending on how the sample is taken, the estimated pdf may not be close to the true pdf of the population; this is especially the case when the sample size is small compared to that of the population. Hence, proper techniques in sampling become important. Moreover, one would also want a degree of confidence in the sampling technique as well as in the estimated pdf. Random Sampling. This refers to sampling techniques that guarantee each possible value in the population an equal chance of being sampled. Such techniques ensure closer agreement between the estimated pdf and the true pdf. Sampling Error. This refers to the difference between the estimated pdf (from a sample) and the true pdf of the population. Logically, the sampling error could easily be assessed if the true pdf of the population were known. But the true pdf may never be known; hence, the error in sampling can be estimated only from statistical reasoning.

III-2. Sample Statistics. Random Sample. Consider a random sample of size N; denote the data in the sample as xi, i = 1,N. Assume each xi is selected at random; the sample pdf is then a uniform distribution:

f(xi) = 1/N for i=1,N (3.1)

The sample Mean, according to (2.24), is then:

µs = Σ xi(1/N) = (1/N)Σ xi Σ sums over 1,N (3.2)

The sample variance and skewness are, respectively:

(σs)² = (1/N) Σ (xi − µs)²   Σ sums over 1,N   (3.3)

(sk)s = [1/(N σs³)] Σ (xi − µs)³   Σ sums over 1,N   (3.4)

Note that (3.2) is simply the average of the sample values xi; it is based on the assumption that each xi has the same chance of being sampled. However, if N is not large, this assumption can bias the variance: it is likely that at least one value outside the sample has been excluded in the sampling. To admit this possibility, the sample variance in (3.3) is often modified to the form:

(σs)² = [1/(N−1)] Σ (xi − µs)²   Σ sums over 1,N   (3.5)

The expression (3.5) will be used in all subsequent discussions and in all homework problems.
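These sample statistics are easily programmed; the following is a minimal sketch in Python (the helper name sample_stats is illustrative), using the (N−1) form (3.5) for the variance as adopted in these notes:

    import math

    def sample_stats(data):
        n = len(data)
        mean = sum(data) / n                                        # Eq. (3.2)
        var = sum((x - mean) ** 2 for x in data) / (n - 1)          # Eq. (3.5)
        sd = math.sqrt(var)
        skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)   # Eq. (3.4), with the (3.5)-based sd
        return mean, var, skew

Applied to the braking-distance data of Example 3-1 below, it should reproduce µs = 52.3 ft and σs = 12.98 ft.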


Note: In most engineering situations, one cannot assume a uniform pdf for a sample, e.g. the one shown in (3.1), without justification. The following example is a case in point: Example 3-1. Seventy (70) cars are picked randomly from the assembly line for QC inspection. For each car, the “braking distance” from a running speed of 35 mph to a complete stop is recorded. The following table lists the “raw data” obtained in the QC tests, referred to as the sample xi, i = 1,70:

Braking distance, in feet _____________________________________________________________________

39 54 21 42 66 50 56 62 59 40
41 75 63 58 32 43 51 60 65 48
61 27 46 60 73 36 38 54 60 36
35 76 54 55 45 71 54 46 47 42
52 47 62 55 49 39 40 69 58 52
78 56 55 62 32 57 45 84 36 58
64 67 62 51 36 73 37 42 53 49
______________________________________________________________________
Here, the braking distance is treated as the random variable X; theoretically, its possible values could range from 0 to infinity. But the "sample" contains only 70 data points (N = 70), so the underlying pdf can only be approximated. In the sample, the smallest value is 21 ft and the largest is 84 ft; the sample range is R = 84 − 21 = 63 ft. As a first-step estimate, we assume that the sample has the uniform pdf 1/N; then the sample mean and variance can be calculated from (3.2) and (3.5), respectively:

µs = 52.3 ft   (σs)² = 168.5 ft²   σs = 12.98 ft

Sample Histogram. If the sample size is large, say N > 50, a histogram can be constructed which may reveal the true distribution character of the sample. If constructed properly, the histogram may guide us to estimate the pdf of the sample. To illustrate, consider:

Example 3-2. Let us return to the sample data in Example 3-1. First, we group the 70 data points in 7 class intervals, numbered from i = 1,7: Class interval (i): 1 2 3 4 5 6 7 --------------------------------------------------------------------------------------------------------------- Value range 20-29 30-39 40-49 50-59 60-69 70-79 80-89 ---------------------------------------------------------------------------------------------------------------- Next, we do a tally by recording the number of the data that fall inside each of the class intervals. For instance, there are 2 data points in the (20-29) interval; 11 data points in the (30-39) interval, 16 in the (40-49) interval, etc. We then define the occurrence frequency by dividing that number by N (=70); resulting in the corresponding occurrence frequency: 2/70, 11/70, 16/70, etc. These results are tallied below:


i, Interval number: 1 2 3 4 5 6 7 --------------------------------------------------------------------------------------------------------------------------- Interval limits: 20-29 30-39 40-49 50-59 60-69 70-79 80-89 --------------------------------------------------------------------------------------------------------------------------- # data in interval: 2 11 16 20 14 6 1 --------------------------------------------------------------------------------------------------------------------------- Occurrence frequency: 2/70 11/70 16/70 20/70 14/70 6/70 1/70 ---------------------------------------------------------------------------------------------------------------------------

Now, we treat the class-interval number i, i=1,7 as a new random variable with the real integers 1 to 7; the unit of the integer is ∆, which equals 10 feet in this case. The underlying pmf, f(i) is then estimated as: f(1)=2/70; f(2)=11/70; f(3)=16/70; f(4)=20/70; f(5)=14/70; f(6)=6/70; f(7)=1/70. A graphical representation of the above in a bar chart, known as the histogram is shown below:

[Histogram, ∆ = 10 ft: occurrence frequencies 2/70, 11/70, 16/70, 20/70, 14/70, 6/70, 1/70 over the intervals 20-29, 30-39, ..., 80-89 ft of braking distance x.]

Discussion: The histogram is the pmf of the discrete random variable representing the class interval i, whose value range is (1,7). We can use (2.24) and (2.26) to compute the mean and the variance of the pmf f(i), and convert them in terms of the unit of x (feet): hence, µ = 52.86 ft and σ = 13.2 ft. Note that these results compare fairly well with µs = 52.3 ft and σs = 12.98 ft computed earlier in Example 3-1; but the histogram reveals a distribution that is non-uniform; in fact, it suggests a symmetric, bell-like distribution with the maximum at about x = 55 feet. Clearly, to assume a uniform distribution for f(x) would be quite imprudent. In general, the histogram can often provide a useful clue as to the mechanism by which the sample is generated. In this example, the mechanism stems probably from the engineering design of the brake; the targeted (designed) braking distance is probably around 55 feet or so. The histogram can also serve as a basis for fitting the sample with a continuous distribution function f(x). Note: The histogram obtained above is based on 7 class intervals, with interval width ∆ = 10 ft. The question is often asked: why 7 intervals? What if 5 intervals, say with ∆ = 15 ft, are used instead? Or what if 15 intervals, with ∆ = 5 ft, are used instead? The histogram corresponding to ∆ = 15 ft is shown as follows:


[Histogram, ∆ = 15 ft: occurrence frequencies 4/70, 25/70, 30/70, 10/70, 1/70 over intervals spanning 20 to 95 ft.]

The shape of the above looks like a half-sine curve rather than a bell-shaped curve; but its mean is 53.0 ft and its standard deviation is 12.75 ft, which are about the same as before. Similarly, if ∆ = 5 ft is used, the resulting histogram is shown below:

[Histogram, ∆ = 5 ft: a rugged bar chart over 10 to 90 ft; frequencies range from 1/70 and 3/70 at the extremes to about 9/70-10/70 near the middle.]

Here, the shape of the histogram is rather rugged, although the mean and standard deviation (52.93 ft and 12.75 ft, respectively) are again in good agreement with the earlier values. The question is: given a sample, what class-interval width should one select in order to obtain a proper histogram?

The Sturges Formula. From the preceding, we see that the selection of the class-interval width affects the shape of the resulting histogram; the influence is related to the sample size (N), the value range (R) and the spread of the sample data within the range. Yet a proper selection of the class-interval width is essential for obtaining a relevant histogram shape, and thus the distribution function to be fitted. In 1926, H. A. Sturges offered an empirical formula for the optimum selection of the class-interval width ∆:


∆ = R/[1 + 3.3 log₁₀(N)]   (3.6)

Discussion: Returning to Example 3-2, we have N = 70 and R = 84 − 21 = 63 ft. According to (3.6), we calculate ∆ ≈ 9 ft. Recall that we used ∆ = 10 ft as the first try, which yielded a good result.
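In program form, (3.6) is essentially one line; a short Python sketch (the function name sturges_width is illustrative):

    import math

    def sturges_width(data):
        r = max(data) - min(data)                        # sample range R
        return r / (1 + 3.3 * math.log10(len(data)))     # Eq. (3.6)

    # For Example 3-2 (N = 70, R = 63 ft) this gives ~8.9, i.e. a width of about 9 ft.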

Ranking Statistics. In many situations only a small sample is available, say N < 20; such a sample would be too small to yield a useful histogram. In that case, the method of ranking statistics is often employed instead. The essence of ranking statistics is to estimate the CDF instead of the pdf for the given sample. Of course, once the CDF is found the pdf follows. The procedures in the ranking statistics are illustrated by the example below:

Example 3-3. Proof-tests of 14 turbo engines provide the time-to-failure data (in hours): 103; 113; 72; 207; 82; 97; 126; 117; 139; 127; 154; 127; 199; 159. Here, the sample size is only N=14. It is not useful for histogram construction. Instead, the 14 data points are first ranked in ascending order as follows: Ranking i: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ------------------------------------------------------------------------------------------------------------------ Time-to-fail ti: 72 82 97 103 113 117 126 127 127 139 154 159 199 207

The above ranking of the data may be interpreted under one of the following three assumptions:

(1) The equal-rank assumption. Each of the N data points carries the same weight in the ranking. This implies that the underlying probability of occurrence for each data point is the same; that is, f(ti) = 1/N for all i = 1,N. It then follows that the CDF at each ranked data point is (see 2.23):

F(ti) = i/N for i = 1,N

The above states that the value of the CDF corresponding to the ith ranked data point ti is simply i/N. When paired with ti, the following is obtained:

Time-to-fail, ti: 72 82 97 103 113 117 126 127 127 139 154 159 199 207
F(ti) = i/N:     .071 .143 .214 .286 .357 .428 .50 .571 .643 .714 .786 .857 .929 1.0

Since the ti are ranked in ascending order, the paired data are readily plotted as shown below:


[Plot: the 14 pairs (ti, F(i)) as open circles, F from 0 to 1.0 against t = 25 to 225 hrs, with the fitted CDF F(t) drawn as a solid line.]

In the above plot, the solid line is the fitted CDF, F(t), based on the 14 data points. Discussion: It is said that the equal-rank equation is somewhat biased, especially when N is small. Note, from the above data, that the cumulative probability is 7.1% for t ≤ 72 hours, but 100% for t ≤ 207 hours. This implies that there is a 7.1% chance of some data falling below 72, while there is no chance of any data exceeding 207. Thus, for a small sample, the equal-rank assumption may over-estimate the CDF at the tail end of the population and under-estimate it at the head end.

(2) The mean-rank assumption. This assumption allows the possibility that one or more virtual data points may be present between the ith and (i+1)th data. Specifically, the existing ith data point is the mean of its two neighboring virtual data points. In particular, there is at least one datum below the lowest ranked data point and at least one above the highest. By statistical reasoning (the details of which are omitted here), the value of F corresponding to the ith data point is computed by:

F(ti) = i/(N+1) for i = 1,N

If we use the above formula in the previous example, we find F(t ≤ 72) = 6.67% (instead of 7.1%) and F(t ≤ 207) = 93.33% (instead of 100%). These results are deemed more realistic than those found using the equal-rank assumption. Consequently, a better fit for F(t) may be realized.

(3) The median-rank assumption. This assumption is similar to the mean-rank assumption, but each of the existing data points is regarded as the median (instead of the mean) of its two neighboring virtual data points. Again, by a similar statistical reasoning, the CDF for the ith data point is given by:

F(ti) = (i − 0.3)/(N + 0.4) for i = 1,N

This approximation gives F(ti) values lower than those of the mean-rank assumption near the tail portion of the F(ti) curve, but higher in the head portion of the curve.

Note: All three representations yield essentially the same results for samples of large size, say N > 50. However, for reasons of uniformity, the mean-rank assumption will be used exclusively in all future discussions and in all the homework problems.
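Since the mean-rank estimate is used throughout, it is convenient to generate F(ti) = i/(N+1) by program; a Python sketch using the engine data of Example 3-3:

    times = sorted([103, 113, 72, 207, 82, 97, 126, 117, 139, 127, 154, 127, 199, 159])
    n = len(times)
    for i, t in enumerate(times, start=1):
        print(f"t = {t:3d} hrs   F = {i / (n + 1):.4f}")   # mean-rank CDF estimate
    # First and last entries: F(72) = 1/15 = 0.0667 and F(207) = 14/15 = 0.9333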


III-3. Parametric Probability Distributions. In the preceding section, we introduced several methods for evaluating the distributional character of the underlying probability functions – f(x) or F(x) – given a data sample; for instance, the character of f(x) may be revealed by a histogram, or the character of F(x) may be revealed by a plot based on ranking statistics. But we have yet to fit a specific set of data with a specific probability function. In the statistics and probability literature, a number of known distribution functions for the pdf (or CDF) have been employed in various applications. A common feature of these functions is that their forms are mathematically explicit, and each may contain one or more free parameter(s); in particular, the properties of these functions (e.g. the mean, variance, skewness, etc.) are expressible in terms of the free parameter(s). This gives a multitude of choices for fitting a given data sample with one of these functions. In this section, a number of the widely used functions will be discussed in detail. The Normal Distribution. The normal distribution (also known as the Gaussian distribution) is arguably the best-known and most widely used distribution function in practice. The pdf contains two (2) free parameters and is expressed in the following form:

f(x) = [1/(b√(2π))] exp[−(1/2)((x − a)/b)²]   −∞ < x < ∞   (3.7)

where a and b are the two free parameters, of real value. For (3.7) to be a bona-fide pdf, we require f(x) ≥ 0 for all values of x; and it must satisfy the total probability condition:

F(∞) = ∫−∞^∞ [1/(b√(2π))] exp[−(1/2)((x − a)/b)²] dx = 1   (3.8)

It is easy to check that f(x) ≥ 0 is satisfied by (3.7); and the proof of (3.8) can be accomplished by standard techniques of advanced calculus (e.g., by squaring the integral and passing to polar coordinates; the reader is challenged to complete the proof!). The mean and variance of the pdf (3.7) are found using (2.30) and (2.31), respectively:

µ = ∫−∞^∞ x·[1/(b√(2π))] exp[−(1/2)((x − a)/b)²] dx   (3.9)

σ² = ∫−∞^∞ (x − µ)²·[1/(b√(2π))] exp[−(1/2)((x − a)/b)²] dx   (3.10)

Page 41: Reliability engineering

Chapter-III Data Sampling III-9

Integration of (3.9) and (3.10) can be carried out with the aid of standard integral tables; here we shall omit the details of the integration and state only the results: µ = a and σ = b. Consequently, (3.7) is in fact expressible explicitly in terms of its mean and standard deviation:

f(x) = [1/(σ√(2π))] exp[−(1/2)((x − µ)/σ)²]   −∞ < x < ∞   (3.11)

A graphical display of (3.11) as a function of x is shown below:

[Figure: two normal pdfs f(x) with the same mean µ; the curve with σ2 > σ1 is lower and wider.]

Note that f(x) is always symmetric with respect to its mean value µ; its overall shape resembles a bell-like curve whose spread (or band width) varies with the standard deviation σ. The figure above illustrates two normal functions having the same µ but different σ values. Note that a large σ value yields a large spread of f(x), and vice versa. The Standardized Normal Distribution. One shortcoming of the normal distribution function is that the pdf (3.11) cannot be integrated explicitly to obtain a closed-form expression for the CDF, F(x); the alternative is to integrate (3.11) numerically. To do so, however, the numerical values of µ and σ are needed. In situations where µ and σ are not yet known, the difficulty can be alleviated by a mathematical transformation of the variable x to the variable z: z = (x − µ)/σ (3.12) It then follows that dx = σ dz (3.13) From the above, (3.11) is transformed to a function of z, ϕ(z), satisfying the following equality:

Page 42: Reliability engineering

Chapter-III Data Sampling III-10

F(x) = ∫−∞^x f(x) dx = Φ(z) = ∫−∞^z(x) ϕ(z) dz   (3.14)

The above invokes the fact that the cumulative probability F(x) is the same as Φ(z), provided that x and z are one-to-one related by (3.12). Differentiation of the integral expressions in (3.14) yields: f(x) = ϕ(z)[dz/dx] = ϕ(z)[1/σ] (3.15) Combining the above with (3.11) and (3.12), we obtain:

ϕ(z) = [1/√(2π)] exp[−z²/2]   −∞ < z < ∞   (3.16)

Note that ϕ(z) does not explicitly contain any free parameters; nor does its CDF:

Φ(z) = ∫−∞^z [1/√(2π)] exp[−z²/2] dz   (3.17)

Consequently, a numerical evaluation of Φ(z) in (3.17) can be carried out through the value range of z, without the need of having the specific values of µ and σ. The function ϕ(z) in (3.16), or the function Φ(z) in (3.17), is known as the standardized normal distribution. By “standardization” it is meant that all normal functions in the regular form (3.11) can be transformed into (3.16); the functional shape of ϕ(z) in (3.16) is a bell-like curve, as shown below:

[Figure: the standardized normal pdf ϕ(z), a bell curve over −3 ≤ z ≤ 3 with peak ≈ 0.4 at z = 0.]

Page 43: Reliability engineering

Chapter-III Data Sampling III-11

Note that the standardized pdf is symmetric about z = 0, with µz = 0 and σz = 1; the maximum occurs at z = 0, with the numerical value ϕ(0) = 1/(2π)^(1/2) ≈ 0.4. Note also that ϕ(z) decays rapidly toward zero for z > 3 or z < −3.

As for the CDF Φ(z), it is obtained by integrating ϕ(z) numerically over the z-variable, say from -∞ to z. Note that Φ(z) rises from 0 to 1 as z increases from - ∞ to ∞ (see the figure shown below); practically, Φ(z) is nearly 0 for z ≤ -3 and nearly 1 for z ≥ 3; due to symmetry of ϕ(z), Φ(z) = 0.5 at z = 0.

[Figure: the standardized normal CDF Φ(z), rising from near 0 at z = −3 through 0.5 at z = 0 to near 1 at z = +3.]

A Conversion Table for Φ(z). As mentioned, the value of Φ(z) can be obtained by numerically integrating (3.17) for a given value of z. A comprehensive table, which lists values of Φ(z) in the domain −5.0 < z < 5.0, is included in Appendix III-A at the end of this chapter. The value of z varies in increments of 0.01, and for each z value four significant digits are retained in the value of Φ(z). This table can also be used to determine the inverse of Φ(z). Specifically, if the value of Φ(z) is given, say Φ(z) = F, then the inverse is defined as z = Φ⁻¹(F); the latter can be found using the same table in Appendix III-A. More extensive mathematical tables list values of Φ(z) in the range −6 ≤ z ≤ 6 with up to 15 significant digits; some electronic calculators have built-in statistical functions with options for Φ(z) as well as its inverse. Use of the conversion table in Appendix III-A is illustrated in the examples below:
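Where a computer is at hand, the table look-up can also be replaced by a library routine; Python's standard library provides both Φ(z) and its inverse through statistics.NormalDist (Python 3.8 or later). A minimal sketch:

    from statistics import NormalDist

    snd = NormalDist()             # standardized normal: mu = 0, sigma = 1
    print(snd.cdf(-2.167))         # Phi(-2.167) ~ 0.0151 (cf. Example 3-4 below)
    print(snd.inv_cdf(0.75))       # Phi^-1(0.75) ~ 0.6745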

Example 3-4. Let the time-of-wearout of a cutting knife be normally distributed, with the mean and standard deviation of 2.8 hrs and 0.6 hrs, respectively. Determine


(a) The probability of a knife wearing out within 1.5 hrs. Here, the random variable is the time-of-wearout, t; the CDF is given as a normal distribution with µ = 2.8 hrs and σ = 0.6 hrs. In the standardized form, F(t) = Φ(z) with z = (t − µ)/σ. Now, for t = 1.5 hrs, z = (1.5 − 2.8)/0.6 = −2.167. This means that F(t ≤ 1.5) = Φ(z ≤ −2.167), and we seek the value of F(t ≤ 1.5), i.e. Φ(−2.167). Using the table in Appendix III-A, we locate z in the first column at z = −2.1 and then read horizontally to z = −2.16; here we are in the “F-field” and find F = 0.01539 (at z = −2.16); similarly, we find F = 0.01500 at z = −2.17. By linear interpolation, we estimate F = 0.01512 at z = −2.167. Thus, F(t ≤ 1.5) = Φ(−2.167) = 0.01512; the probability of the cutting knife wearing out within 1.5 hrs is 1.512%. (b) The time to replace the cutting knife for a 75% probability of wearout. Here, Φ(z) = F(t) = 0.75. Again using Appendix III-A, we scan the F-field for the value closest to 0.75. We see that when z = 0.67, F = 0.7486, and when z = 0.68, F = 0.7517. By linear interpolation, we obtain z = 0.6745 for Φ(z) = 0.75. From z = (t − µ)/σ = (t − 2.8)/0.6 = 0.6745, we find t = 3.2047 hrs. Thus, the time to replace the knife is 3.2047 hrs.

Example 3-5. The tensile strength of a metal wire is known to be normally distributed, with µ = 1000 lbs and σ = 40 lbs. In a proof test, 100 wires are loaded to 950 lbs in tension; estimate: (a) How many wires will pass the test? For x = 950, we have z = (x − µ)/σ = (950 − 1000)/40 = −1.25; from Appendix III-A, we find Φ(−1.25) = 0.1056. Thus, the cumulative failure probability up to 950 lbs of load is F(950) = 10.56%: about 11 wires will fail the test, or 89 wires will pass. (b) If the 15 weakest wires are to be screened out, at what load should the wires be proof-tested? Here, the cumulative failure probability is 15 out of 100, or Φ(z) = 0.15. From Appendix III-A, we find the corresponding z = −1.037. Thus, the proof load should be x = zσ + µ = −1.037(40) + 1000 = 958.5 lbs.

Central Population. With the standardized normal function Φ(z), one can readily see the following relationship:

Φ(z) + Φ(−z) = 1   (3.18)

From the graphical display of the pdf ϕ(z) below, one can see that the area under the curve from −z to z is equal to [Φ(z) − Φ(−z)]; owing to symmetry, or by means of (3.18), this area is also equal to [1 − 2Φ(−z)]. This area is commonly called the “central population”, or the Yield, denoted as

Y = 1 − 2Φ(−z)   (3.19)

Check that for z=1, Y=68.3%; for z=2, Y=95.45%; and for z=3, Y=99.73%; etc. The meaning of


the “yield” will be further discussed in Chapter VI.

[Figure: ϕ(z) with the central population between −z and z shaded; each tail beyond ±z carries probability Φ(−z) = 1 − Φ(z).]

Example 3-6. The diameter of a bearing ball is normally distributed, with µ = 10 mm and σ = 0.2 mm. What is the central population whose diameter is between 9.9 mm and 10.1 mm? Solution: From z = (x − µ)/σ we find z = (10.1 − 10.0)/0.2 = 0.5, and −z = −0.5. From Appendix III-A, we find Φ(−0.5) = 0.3085. Hence, the yield is Y = 1 − 2Φ(−0.5) = 0.383; i.e., 38.3% of the balls will have a diameter between 9.9 mm and 10.1 mm.

Fitting a Sample to the Normal Distribution: Suppose a sample of size N is initially given and we wish to fit the sample with a proper distribution function. In general, there are two alternative routes to follow, depending on the size N of the given sample:

[Flow chart] Rank the sample data: xi < xi+1, i = 1,N. Then:
- If N > 50: construct a proper histogram; the histogram shape suggests the choice of distribution.
- If N < 50: experience suggests the choice of distribution; use the ranking method to fit the F(x) of the chosen (known) distribution.


From the above flow chart, we see that for a sample of size N>50 it is possible to construct a histogram whose distribution characteristics can then suggest a proper choice of the distribution function for the sample. But when N<50, it is impractical to do a histogram; then, experience often plays a role in the choice of a distribution function to be fitted with the sample. In either case, the general guideline is that (a) the fitted function should preserve the physical properties of the sample with minimum error; and (b) the fitted function is mathematically convenient to apply in subsequent reliability and related analyses. Here, we use one specific example to outline the steps in fitting a given sample to the normal distribution function.

Example 3-7. Consider the sample in Example 3-3, where the times-to-failure of 14 turbo engines are recorded. Here, the sample is not large enough for a histogram construction. We first rank the sample data in ascending order: Ranking, i: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 --------------------------------------------------------------------------------------------------------------- Time-to-fail, ti: 72 82 97 103 113 117 126 127 127 139 154 159 199 207

Before fitting the sample to the normal distribution function, let us first compute the sample “mean” and “variance” using (3.2) and (3.5):

µs = 130.14 hrs   σs² = 1551.2 hrs²   σs = 39.38 hrs

Normal Probability Plot: To fit the sample formally to the normal distribution, we apply the mean-rank formula to approximate the cumulative probability function F(ti):

F(ti) = i/(N+1)   i = 1,14

Since F(ti) can be put in the standardized form Φ(zi), with zi = (ti − µ)/σ, it follows that

zi = Φ⁻¹[i/(N+1)] = ti/σ − µ/σ

The above is a linear relation between Φ⁻¹[i/(N+1)] and ti, with µ and σ as yet unknown. But for each i = 1,N we know Φ(zi) = F(ti) = i/(N+1); and with the aid of Appendix III-A, we can determine the inverse zi = Φ⁻¹[i/(N+1)] for i = 1,14:

Φ(zi) = F(ti):   .07  .13  .20  .27  .33  .40  .47  .53  .60  .67  .73  .80  .87  .93
Φ⁻¹[i/(N+1)]:  −1.5 −1.11 −.84 −.62 −.44 −.25 −.07  .08  .26  .43  .62  .85 1.11  1.5

We then pair the values zi = Φ⁻¹[i/(N+1)] with ti and plot each pair in the zi versus ti frame, as shown below (the 14 open circles are the paired data).


A straight line labeled y = at + b is drawn through the paired data points. Here, y corresponds to zi and t corresponds to ti; it is readily identified that a = 1/σ and b = −µ/σ. Note: In the plot below, we have added a vertical axis labeled F(t), whose scale corresponds to Φ(z); the fitted straight line is not in the F(t) versus t frame!

[Normal probability plot: zi = Φ⁻¹[i/(N+1)] (arithmetic scale, −3 to 3) versus t (50 to 250 hrs), with a parallel F(ti) axis (.0014 to .999); the fitted line y = at + b passes F = 0.5 at t ≈ 132.5 hrs (= µ), F = 0.159 at t ≈ 93 hrs, F = 0.841 at t ≈ 172 hrs, and F ≈ 0.10 at t = 80 hrs.]

To find µ: we note that when t = µ, z = 0; from Appendix III-A, Φ(0) = 0.5, or F = 0.5. Hence, a horizontal line is drawn from F = 0.5 (i.e. from z = 0) to intersect the straight line y = at + b; from the intersection, a vertical line is drawn downward to meet the t-axis at t ≈ 132.5 hrs; hence µ = 132.5 hrs. To find σ: we note that when t = µ ± σ, z = ±1; from Appendix III-A, Φ(−1) = F(µ−σ) = 0.159 and Φ(1) = F(µ+σ) = 0.841. Thus, a horizontal line is drawn from F = 0.159 (z = −1) to intersect the straight line; from the intersection point, a vertical line is drawn downward to meet the t-axis at about 93 hrs. The distance from t = 93 to the distribution mean equals σ; namely, σ ≈ 132.5 − 93 = 39.5 hrs.


Alternatively, a horizontal line from F = 0.841 (z = +1) is drawn to intersect the straight line; from there a vertical line is drawn downward to meet the t-axis at about 172 hrs. The distance from t = 172 to the distribution mean (132.5) is again σ: σ = 172 − 132.5 = 39.5 hrs. Discussion: In this example, the values of µ and σ (132.5 and 39.5) found by fitting the sample to the normal function are fairly close to the sample mean µs = 130.14 hrs and standard deviation σs = 39.38 hrs obtained earlier.

The straight line in the graph can provide answers to many related questions. For instance, within 80 hours of operation, there is a 10% failure probability for the engine. We find this result from drawing a vertical line upward from t=80 to intersect the straight line, and then a horizontal line to intersect the F-axis at F ≈ 0.1.

In the graph, the vertical axes F(t) and Φ⁻¹ are placed side-by-side to show their transformation relationship; only Φ⁻¹ is linearly related to t, and both Φ⁻¹ and t are in arithmetic scale. F(t) is not linear in t; the F axis is in the transformed scale, obtained through the use of Appendix III-A. With the F axis in the transformed scale, the sample data can be plotted as F(ti) versus ti directly, without having to compute Φ⁻¹; this saves time and avoids mistakes in computation. Appendix III-B provides such a plotting paper, which can be copied for ready use.

In the probability plot, the straight line y=at+b was drawn through the 14 paired points by “eye-balling”. This approach is highly subjective and it can result in error. A more rigorous way to determine the straight line is by the method of linear regression or the least square. The method is analytical; no plotting is necessary. This is explained in detail in the next section.

The Least-Square Method. In the probability plot, the values yi = Φ⁻¹[i/(N+1)] are paired with ti. In a “perfect” situation, all the paired points (ti, yi) would fall onto the straight line y = at + b; if they do not, an error is present between the data and the fitted line. For each paired point, the error is expressed by [yi − y(ti)]. Thus, we define the “squared error function” as:

S = Σ [yi − y(ti)]²/N = Σ (yi − a·ti − b)²/N   Σ sums over i = 1,N   (3.20)

The idea here is to find the line y = at + b such that S is a minimum. Hence, we set:

∂S/∂a = ∂S/∂b = 0   (3.21)

Upon carrying out the minimization process, we find:

a = (x̄y − x̄·ȳ)/(x̄x − x̄²)   b = ȳ − a·x̄   (3.22)

where


x̄ = (1/N) Σ ti   ȳ = (1/N) Σ yi
x̄x = (1/N) Σ ti²   ȳy = (1/N) Σ yi²   x̄y = (1/N) Σ ti yi   (Σ sums over i = 1,N)

The minimum of S is then given by:

Smin = (1 − r²)(ȳy − ȳ²)   (3.23)

where

r² = (x̄y − x̄·ȳ)² / [(x̄x − x̄²)(ȳy − ȳ²)]   (3.24)

The quantity r² is known as the “correlation factor”, whose value lies between 0 and 1. When r² → 1, the paired data points are said to correlate well with a straight-line fit; when r² → 0, the correlation is not good at all. Since the straight line represents graphically the fitted normal function, the constants a and b are related to µ and σ by: µ = −b/a and σ = 1/a.

Example 3-8. We now return to Example 3-7 and treat the sample data there by the method of least squares. From the tabulated data in that example, we first calculate (with N = 14):

x̄ = (1/N) Σ ti = 130.14   ȳ = (1/N) Σ yi = 0.0
x̄x = (1/N) Σ ti² = 18377.57   ȳy = (1/N) Σ yi² = 0.69   x̄y = (1/N) Σ ti yi = 30.915

Substituting the above into (3.22), we obtain a = 0.02145 and b = −2.7917. Consequently, σ = 1/a = 46.62 and µ = −b/a = 130.15. The correlation factor computed from (3.24) is r² = 0.96; the correlation is rather good, as we can see from the plot. Discussion: We list the estimated values for the mean and the standard deviation by the various methods:

By the 1st-step sample calculation: µ = 130.14 hrs, σ = 39.38 hrs
By eye-balling the plot: µ = 132.5 hrs, σ = 39.5 hrs
By the least-square fit: µ = 130.15 hrs, σ = 46.62 hrs


We see that the mean is rather stable regardless of the method used, but the standard deviation can vary considerably. Clearly, the least-square method provides the best results; although it requires considerable computation, a computer routine is easily programmed.
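A possible form of such a routine is sketched below in Python (the name normal_lsq_fit is illustrative); it implements (3.22) and (3.24) with mean-rank plotting positions, and should reproduce values close to those of Example 3-8:

    from statistics import NormalDist

    def normal_lsq_fit(data):
        """Fit ranked data to a normal CDF by least squares, per (3.22)-(3.24)."""
        t = sorted(data)
        n = len(t)
        y = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]  # mean rank
        xb, yb = sum(t) / n, sum(y) / n
        xxb = sum(v * v for v in t) / n
        yyb = sum(v * v for v in y) / n
        xyb = sum(ti * yi for ti, yi in zip(t, y)) / n
        a = (xyb - xb * yb) / (xxb - xb * xb)                             # slope, Eq. (3.22)
        b = yb - a * xb                                                   # intercept, Eq. (3.22)
        r2 = (xyb - xb * yb) ** 2 / ((xxb - xb * xb) * (yyb - yb * yb))   # Eq. (3.24)
        return -b / a, 1.0 / a, r2                                        # mu, sigma, r^2

    engines = [72, 82, 97, 103, 113, 117, 126, 127, 127, 139, 154, 159, 199, 207]
    mu, sigma, r2 = normal_lsq_fit(engines)   # ~130.1 hrs, ~46.6 hrs, ~0.96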

Sampling Error. When a sample of finite size is taken randomly from a population of infinite size, there is uncertainty whether the sample has the same statistical characteristics as the population. But in practice we often use the sample to represent the population and to estimate the underlying probability distribution function of the population; the question thus arises: how much confidence do we have in the estimated function? To answer this question properly requires a considerable development in the theory of statistics. Here, we discuss only briefly some of the basic notions in sampling, sampling-error estimates and confidence levels in sampling.

Theoretical Error. Suppose a sample of finite size N is taken randomly from a population and is fitted to, say, a normal function; thus the sample mean and standard deviation µs and σs are obtained. Owing to the finite size of the sample, both µs and σs are likely to differ from the true µ and σ of the population. However, µs and σs will approach the true µ and σ under two extreme conditions: (1) when N is large, N → ∞; and (2) when samples of size N are repeatedly taken from the population. In the latter case, the estimated sample means and standard deviations are themselves random variables; for instance, the sample means µs may be described by a pdf f(µs). According to the central limit theorem in statistics, f(µs) tends to be normally distributed if the number of samples is large; moreover, the mean of f(µs) tends to the true mean of the population (µ), while the variance of f(µs) tends to σ²/N. Clearly, in the limit N → ∞, the variance of f(µs) → 0.

The term σ/(N)^(1/2) is a theoretical measure of how much the sample mean µs deviates from the true µ of the population. The difficulty here is, of course, that the true µ and σ of the population are in fact unknown. Estimated Error. When the true σ of the population is unknown, one can estimate the sampling error by using σs of the sample in place of σ. In that case, the meaning of σs/(N)^(1/2) may be best explained by the following example:

Take Example 3-7. The sample size is N = 14, and the computed (by the least-square method) sample mean is µs = 130.15 hrs with standard deviation σs = 46.62 hrs. The estimated sampling error is then σs/(N)^(1/2) = 46.62/(14)^(1/2) = 12.46 hrs. This error represents the range of deviation of the estimated sample mean. In other words, if samples of similar size are taken repeatedly from the population, the means from all the samples will tend to fall inside the range 130.15 ± 12.46 hrs. Thus, by extrapolation, the true mean µ of the population also falls inside this range.


Note that the smaller the sample size N, the larger the estimated sample error. In the above example, the sample error is rather large. This is due to the sample size, N=14, which is rather small.

Theoretical Confidence Level. A more refined way to assess sampling error is to establish a “confidence level”. Here, we return to the premise that, if multiple samples are taken from a population, the sample mean µs is a random variable whose distribution is normal-like according to the central limit theorem. So we normalize the random variable µs as in (3.12):

zs = (µs − µ)/[σ/(N)^(1/2)]   (3.25)

In the above, µ and σ are still the true mean and standard deviation of the population, both unknown. Nonetheless, the standardized normal pdf ϕ(zs) and CDF Φ(zs) can be evaluated without knowing the values of µ and σ. In particular, the mean of zs is zero and its standard deviation is 1. The pdf ϕ(zs) is shown below:

[Figure: ϕ(zs) with the central population between −zα and +zα equal to 1 − α; each shaded tail beyond ±zα carries probability α/2.]

Now, consider the central population between ±zα, where α is a small number less than 1. Here α represents the probability that zs falls outside the range ±zα; conversely, (1 − α) represents the probability that zs falls inside ±zα. Hence, α is called the “risk” factor that zs falls outside ±zα, and (1 − α) is called the “confidence” level that zs falls inside ±zα. The range between ±zα is called the “confidence interval” associated with α. Note that if α is specified, the value of zα = Φ⁻¹(α/2) can be found via Appendix III-A; there is no need to specify the true µ and σ of the population.

Example: For an 80% confidence level, we set (1 − α) = 0.8, so α = 0.2. The confidence interval is found via Appendix III-A: z0.2 = Φ⁻¹(α/2) = Φ⁻¹(0.1) = −1.28; i.e., ±z0.2 = ±1.28. For a 99% confidence level, we set α = 0.01; from Appendix III-A we find Φ⁻¹(α/2) = −2.58, and the confidence interval is ±z0.01 = ±2.58. We return to (3.25) to obtain the confidence interval for the sample mean µs:

µ± = µ ± zα σ/(N)^(1/2)

Estimated Confidence Level. Since the exact pdf of the population is unknown, we can still obtain a pair of “estimated” confidence limits by substituting µs and σs of the sample in place of the true µ and σ of the population:

µ± = µs ± zα σs/(N)^(1/2)   (3.26)

The above is read: with (1 − α) confidence, the mean of the population falls inside the estimated interval µ±. The confidence interval for the standard deviation σ of the population can be similarly established, by substituting µs and σs from the sample in place of the true µ and σ. Here we omit the mathematical details of the derivation and present only the final result:

σ± = σs ± zα σs/[2(N−1)]^(1/2)   (3.27)

Example 3-9. Again, take Example 3-7. Let α = 0.2 (80% confidence); then

µ± = µs ± zα σs/(N)^(1/2) = 130.15 ± 1.28 (46.62)/(14)^(1/2) = 130.15 ± 15.95 hrs
σ± = σs ± zα σs/[2(N−1)]^(1/2) = 46.62 ± 1.28 (46.62)/(26)^(1/2) = 46.62 ± 11.70 hrs

Similarly, for 99% confidence (α = 0.01), we have:

µ± = 130.15 ± 2.58 (46.62)/(14)^(1/2) = 130.15 ± 32.15 hrs
σ± = 46.62 ± 2.58 (46.62)/(26)^(1/2) = 46.62 ± 23.59 hrs

Discussion: For higher confidence, the ranges of µ± and σ± are wider.
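This interval arithmetic is easily scripted; a Python sketch (the name confidence_intervals is illustrative), with zα taken from the standard-normal inverse rather than the table:

    from statistics import NormalDist
    import math

    def confidence_intervals(mu_s, sigma_s, n, alpha):
        z = abs(NormalDist().inv_cdf(alpha / 2))           # z_alpha, e.g. ~1.28 for alpha = 0.2
        d_mu = z * sigma_s / math.sqrt(n)                  # Eq. (3.26)
        d_sigma = z * sigma_s / math.sqrt(2 * (n - 1))     # Eq. (3.27)
        return (mu_s - d_mu, mu_s + d_mu), (sigma_s - d_sigma, sigma_s + d_sigma)

    # 80% confidence for Example 3-9: mu in ~130.15 +/- 15.95 hrs, sigma in ~46.62 +/- 11.70 hrs
    print(confidence_intervals(130.15, 46.62, 14, 0.2))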

III-4. Other Parametric Distribution Functions. In many engineering applications, the normal distribution is not always the best choice for fitting a given sample. For example, the tensile strength distribution of some materials can be skewed to the right of its mean; the time-to-failure distribution for fatigue failures in machine parts can span from a few thousand hours to millions of hours, resulting in a very skewed pdf. Of course, if the sample size is sufficiently large, one can always construct a proper histogram and find a clue as to the skewness in the sample's distributional character; to fit a skewed distribution with the normal function can result in significant error. In the following, a few commonly known distribution functions are introduced; among them, we discuss in some detail the log-normal and the Weibull functions.

Log-Normal Distribution. The mathematical development of the log-normal distribution stems from the central limit theorem in statistics: if Xi, i = 1,N, is a set of N random variables, then their sum, X = ΣXi, is also a random variable; the pdf of X tends to be normally distributed if N is sufficiently large. This is so even if the Xi's are not all normally distributed.

Now, suppose that failure of a machine part is caused by the occurrence of N random events Ti, i = 1,N, where Ti represents the time-to-occur of the ith event. If all the Ti's are independent events, then the time-to-failure of the machine, T, is given by:

T = T1 ∗ T2 ∗ . . . ∗ TN

In general, T is not normally distributed even if all the Ti's are normally distributed. However, in light of the central limit theorem, the logarithm of T is normally distributed if N is sufficiently large:

ln(T) = Σ ln(Ti)   Σ sums over 1,N

Now, let g(t) be the pdf of T; and f(τ) be the pdf of ln(T), where τ=ln(t). In particular, f(τ) is normal and has the form:

f(τ) = [1/(στ√(2π))] exp[−(1/2)((τ − µτ)/στ)²]   (3.28)

Here, µτ and στ are the mean and standard deviation of ln(T); both remain free parameters. Let F(τ) and G(t) be the CDFs of f(τ) and g(t), respectively; and note that the variables τ and t are one-to-one related through τ = ln(t). Thus, we can write:

F(τ) = ∫−∞^τ f(τ) dτ = G(t) = ∫₀^t g(t) dt   (3.29)

Differentiating the above with respect to τ yields, f(τ)=g(t)[dt/dτ] (3.30) In turn, it leads to,

g(t) = f(τ)·[1/t] = [1/(στ√(2π))]·(1/t)·exp[−(1/2)((ln(t) − µτ)/στ)²]   (3.31)


At this point, we formally replace µτ and στ by two new parameters as follows:

µτ = ln(to) and στ = ωo   (3.32)

Consequently, the pdf g(t) of T in (3.31) is expressed as:

Consequently, the pdf g(t) of T in (3.31) is expressed as:

ωο ωο

1

√2π exp [12

− 2= ( ) (3.33) g(t) 1t

ln(t/t )]

o

Note that g(t) is not a normal function. Its mean and variance can be determined following (2.30) and (2.31); omitting the details of the derivation, these are given by:

µt = to·exp(ωo²/2)   σt² = to²·exp(ωo²)·[exp(ωo²) − 1]   (3.34)

The general characteristics of f(τ), which is normal, and g(t), which is not, are illustrated by the graphs shown below:

[Figures: the normal pdf f(τ) with mean µτ and spread στ on the logarithmic τ-axis; and the corresponding non-normal pdf g(t) on the real-time t-axis, with to, µt and σt marked.]

Note that g(t) is always skewed to the left of the mean, while f(τ) is symmetric about its mean, as it is normal. The physical difference between them is that the t-axis for g(t) is in real-time, arithmetic scale, while the τ-axis for f(τ) is in logarithmic time scale. Note, for instance, that the mean of f(τ) is located at µτ, which corresponds to to on the real-time t-axis (see 3.32); the mean of g(t) is located at to·exp(ωo²/2), according to (3.34).

The log-normal function g(t) is characterized by the two free parameters to and ωo. In essence, to shifts the distribution along the t-axis, while the value of ωo dictates the distribution's scatter band, or overall shape. It can be shown numerically that: (1) when ωo ≥ 1, g(t) approaches an exponential-like function; (2) when ωo → 0, g(t) becomes normal-like; in particular, g(t) coincides with f(τ), with their respective means and variances also coinciding; and (3) in the range 0 < ωo < 1, g(t) is always a left-skewed function. These features are schematically shown in the sketch below:

[Sketch: g(t) for ωo ≈ 1 (exponential-like), ωo ≈ 0.5 (left-skewed) and ωo ≈ 0.1 (normal-like).]

Discussion: The log-normal function offers a greater freedom than the normal function in fitting a set of data whose distribution may or may not be symmetric. The log-normal function can be exponential, left-skewed or normally distributed, depending only on the parameter ωο. Use of the log-normal function is as simple as that of the normal function; the extra step is to transform the variable t to τ=ln(t). As f(τ) is normal, it can then be expressed in the standardized form in order to facilitate numerical computations as it has been done before.

The Standardized Log-Normal Function. The log-normal function g(t) in (3.33) cannot be integrated explicitly to obtain a closed form for the CDF, G(t). But, through τ=ln(t), g(t) is related to f(τ) in (3.28), which is normal. Hence, the CDF of f(τ), F(τ), can be expressed in the standardized normal form, via the variable z: z=(τ -µτ)/στ = [ln(t/to)]/ωo (3.35)

Note that the pdf ϕ(z) has the standardized normal form in (3.16), and its CDF Φ(z) can be evaluated numerically as before. In short, the value of G(t) equals that of Φ(z), and the value of Φ(z) is obtained through the use of (3.35) in conjunction with Appendix III-A.

Example 3-10: Fatigue failure of a rotating shaft is fitted with the log-normal function g(t), as given in (3.33). The characterizing parameters are obtained as to= 5000 hrs and ωo= 0.2.


Observations: (a) The random variable here is t, the time-to-failure in hours; (b) we note that 0 < ωo < 1, so the pdf should be slightly left-skewed; (c) the standardized normal CDF is Φ(z), with z = ln(t/to)/ωo.

We may then compute the following quantities:

(1) The expected failure time (the mean-time-to-failure): MTTF = to exp(ωo²/2) = 5101 hrs;

(2) The variance of g(t): σt² = to² exp(ωo²)[exp(ωo²) − 1] = 1.0619x10⁶ hrs²; σt = 1030 hrs.

(3) The design time td, say for 1% probability of failure, i.e. F(td) = 0.01: for Φ(z) = 0.01, we find z = −2.32 from Appendix III-A; from ln(td/to)/ωo = −2.32, td = 3144 hrs.

(4) If the inspection period is set at 3000 hrs, what % of failures is expected before the inspection? We compute z = ln(3000/5000)/0.2 = −2.554; from Appendix III-A, Φ(−2.554) = 0.00539. Thus, we expect a 0.539% failure probability before inspection.
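The arithmetic of this example is compactly expressed in Python, with Φ supplied by statistics.NormalDist (parameter values taken from the example):

    from statistics import NormalDist
    import math

    t0, w0 = 5000.0, 0.2
    snd = NormalDist()

    mttf = t0 * math.exp(w0 ** 2 / 2)                                       # ~5101 hrs, Eq. (3.34)
    sigma_t = t0 * math.sqrt(math.exp(w0 ** 2) * (math.exp(w0 ** 2) - 1))   # ~1030 hrs
    t_design = t0 * math.exp(w0 * snd.inv_cdf(0.01))   # F(td) = 0.01 -> ~3140 hrs
                                                       # (the text's 3144 hrs uses the rounded z = -2.32)
    f_3000 = snd.cdf(math.log(3000 / t0) / w0)         # Eq. (3.35): ~0.0053, cf. 0.539% above
    print(mttf, sigma_t, t_design, f_3000)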

Fitting a Sample to the Log-Normal Distribution: Given the sample data xi, i = 1,N, let it be fitted with the log-normal function in the form of (3.33):

f(x) = [1/(ωo√(2π))]·(1/x)·exp[−(1/2)(ln(x/xo)/ωo)²]

Here, the quantities to be determined are the two free parameters xo and ωo. Using (3.35), we transform the variable x to z by

z = [ln(x) − ln(xo)]/ωo

Then the pdf ϕ(z) is the standardized normal; its CDF is Φ(z), which equals the CDF of f(x), F(x). The inverse of Φ(z) is:

Φ⁻¹(F) = z = [ln(x) − ln(xo)]/ωo   (3.36)

We see that Φ⁻¹(F) is a linear function of ln(x). Hence, in fitting the sample xi, i = 1,N, we follow the same procedures outlined earlier for fitting the normal function. Specifically:

(a) Rank xi in ascending order; (b) compute ln(xi); (c) compute F(xi) = i/(N+1); (d) find Φ⁻¹[i/(N+1)] via Appendix III-A; (e) plot Φ⁻¹[i/(N+1)] versus ln(xi); (f) obtain xo and ωo from the plot.

Example 3-11. The failure times xi (in days) for 20 heating devices are recorded and listed in ascending order from 1 to 20. The corresponding values of ln(xi), F(xi) = i/(N+1) and Φ⁻¹[i/(N+1)] are determined accordingly; these are summarized in the table below:

are then determined accordingly; these are summarized in the table below:

i   F(xi)=i/(N+1)   xi     ln(xi)   Φ⁻¹[i/(N+1)]
----------------------------------------------------
1    0.0476          2.6   0.9555   −1.6684
2    0.0952          3.2   1.1632   −1.3092
3    0.1429          3.4   1.2238   −1.0676
4    0.1905          3.9   1.3610   −0.8761
5    0.2381          5.6   1.7228   −0.7124
6    0.2857          7.1   1.9601   −0.5659
7    0.3333          8.4   2.1282   −0.4307
8    0.3810          8.8   2.1748   −0.3030
9    0.4286          8.9   2.1861   −0.1800
10   0.4762          9.5   2.2513   −0.0597
11   0.5238          9.8   2.2824    0.0597
12   0.5714         11.3   2.4248    0.1800
13   0.6190         11.8   2.4681    0.3030
14   0.6667         11.9   2.4765    0.4307
15   0.7143         12.3   2.5096    0.5659
16   0.7619         12.7   2.5416    0.7124
17   0.8095         16.0   2.7726    0.8761
18   0.8571         21.9   3.0865    1.0676
19   0.9048         22.4   3.1091    1.3092
20   0.9524         24.2   3.1864    1.6684
----------------------------------------------------

Plotting of the above data can be done in two different ways: (1) pair Φ⁻¹[i/(N+1)] versus ln(xi); or (2) pair F(xi) versus ln(xi). Of course, the scale for Φ⁻¹[i/(N+1)] is arithmetic while the scale for F(xi) is not. The following plot displays both axes.


[Log-normal probability plot: Φ⁻¹[i/(N+1)] (arithmetic, −3 to 3) and F(i) (transformed, .0014 to .999) versus ξ = ln(x), 0 to 4; the fitted line y = aξ + b has slope 1/ωo and crosses Φ⁻¹(F) = 0 at ln(x) = ln(xo).]

In the plot, a straight line is drawn through the paired points. This line, denoted by y = aξ + b, where y = Φ⁻¹[F] and ξ = ln(x), represents the fitted CDF Φ(z) graphically. The free parameters of g(t) are then determined from the plot as follows:

(1) The mean of Φ(z) is found by drawing a horizontal line from Φ⁻¹(F) = 0 to intersect the straight line; from the intersection point, a line is drawn vertically downward to meet the ln(x)-axis; in this case, at about ln(x) = 2.2. This yields the parameter xo = 9.025 (days). Alternatively, we can draw the horizontal line from F = 0.5; this is the same as drawing it from Φ⁻¹(F) = 0.

(2) The slope of the straight line y = aξ + b is then found in the Φ⁻¹(F) versus ln(x) frame: a = 1.36. According to (3.36), a = 1/ωo; so ωo = 0.735.

Here, ωo is greater than 0.5 but less than 1, so the pdf g(t) should be considerably left-skewed. As for the mean and the variance of g(t), they are computed by µ = xo exp(ωo²/2) = 11.82 days and σ² = xo² exp(ωo²)[exp(ωo²) − 1] = 100.15 (days)²; the standard deviation is thus σ = 10 days.

Discussion: The constants in y = aξ + b can, of course, be found by using the least-square method discussed earlier. In that case, we set yi = Φ⁻¹[Fi] and ξi = ln(xi). For this example, the least-square fit yields the following results: a = 1.35, b = −2.969 and r² = 0.955.


Consequently, we find xo = exp(−b/a) = 9.018 days and ωo = 1/a = 0.74. The correlation factor r² = 0.955 indicates a relatively good fit.
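Alternatively, the entire fit can be done by program: regress yi = Φ⁻¹[i/(N+1)] on ξi = ln(xi), exactly as in the least-square method of Section III-3. A Python sketch for the 20 failure times:

    from statistics import NormalDist
    import math

    days = [2.6, 3.2, 3.4, 3.9, 5.6, 7.1, 8.4, 8.8, 8.9, 9.5,
            9.8, 11.3, 11.8, 11.9, 12.3, 12.7, 16.0, 21.9, 22.4, 24.2]
    xi = [math.log(x) for x in days]          # the data are already ranked
    n = len(xi)
    yi = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

    xb, yb = sum(xi) / n, sum(yi) / n
    a = (sum(p * q for p, q in zip(xi, yi)) / n - xb * yb) / \
        (sum(p * p for p in xi) / n - xb * xb)           # slope = 1/omega_o
    b = yb - a * xb
    print(1 / a, math.exp(-b / a))   # omega_o ~ 0.74 and x_o ~ 9.0 days, as found above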

Note: A ready made log-normal plotting paper is provided in Appendix III-C; both F(xi) and Φ−1[F] axes are included. With this form, the plot can be made in the F versus ln(x) frame, without the need

of computing Φ−1[F]. This will save a considerable amount of time. Weibull Distribution Function. The Weibull distribution function is due to Walodie Weibull who in 1931 established a probability distribution function for the strength of textile yarns. Similar to the log-normal function, it is very versatile in fitting a wide range of engineering data including the skewed and normal like distributions. But, unlike the normal or the log-normal functions, the Weibull function can be expressed in close forms for the pdf and the CDF; and there is no need to perform numerical integration. This unique attribute provides a considerable mathematical ease in probability and/or reliability analysis of complex systems. The random variable X is said to be a Weibull distribution, if its pdf is a 2-parameter function of the form:

f(x) = (m/θ)(x/θ)^(m−1) exp[−(x/θ)^m],  0 ≤ x < ∞   (3.37)

Here, m is called the "shape parameter" and θ the "scale parameter"; both have values greater than 0. The characteristics of the Weibull pdf in (3.37) depend on the values of m and θ. In general, the role of m is similar to that of σ in the normal distribution, or of ωo in the log-normal distribution: it determines the "shape" of f(x). The role of θ is similar to that of µ in the normal, or of xo in the log-normal distribution: it shifts f(x) along the x-axis. In particular, it can be readily shown that:

(1) The Weibull becomes an exponential function when m=1; (2) It approaches the normal function when m ≈ 4; (3) It tends to be left-skewed when 1<m<4; and (4) It tends to be right-skewed when m>5.

The above features of the Weibull function are illustrated graphically below:


[Figure: Weibull pdf f(x) versus x for m = 1 (exponential), m = 2 (left-skewed), m ≈ 4 (normal-like), and m > 5 (right-skewed).]

Upon integrating (3.37), the CDF of the Weibull function takes the simple closed form:

F(x) = 1 − exp[−(x/θ)^m],  0 ≤ x < ∞   (3.38)

The mean and the variance of the Weibull function are found from using (2.30) and (2.31):

µ = θΓ(1 + 1/m);  σ² = θ²[Γ(1 + 2/m) − Γ²(1 + 1/m)]   (3.39)

Here, Γ(ζ) is the Gamma function, defined by the integral:

Γ(ζ) = ∫₀^∞ y^(ζ−1) exp(−y) dy   (3.40)

In (3.40), ζ is real and is always greater than 1; ζ is not necessarily an integer, however.

About the Gamma Function: Given the value of ζ, which is always greater than 1, the integral in (3.40) can only be integrated numerically. Numerical tables listing values of Γ(ζ) as a function of ζ are available in many mathematical handbooks; some calculators have such a function as well. The graph shown on the next page is obtained by numerically computing the Gamma function Γ(ζ) in the range of 1 ≤ ζ ≤ 2, with the increment of 0.1:


[Graph: Γ(ζ) for 1.0 ≤ ζ ≤ 2.0; the curve falls from Γ(1.0) = 1 to a minimum of about 0.886 near ζ ≈ 1.46, then rises back to Γ(2.0) = 1.]

Use of the Gamma function graph shown above: The graph is convenient for a quick estimate of Γ(ζ), given the value of ζ. For example, from the graph we find Γ(1.5) ≈ 0.886. For ζ > 2, the following recurrence relation of the Gamma function is useful:

Γ(ζ) = (ζ−1)Γ(ζ−1)

For example, the value of Γ(4.5) may then be computed as follows: Γ(4.5) = (4.5−1)Γ(3.5) = (4.5−1)(3.5−1)Γ(2.5) = (4.5−1)(3.5−1)(2.5−1)Γ(1.5) = 3.5 × 2.5 × 1.5 × 0.886 = 11.63.
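The recurrence is easy to check numerically; a minimal sketch in Python, where math.gamma evaluates the integral in (3.40) directly:

    from math import gamma

    print(gamma(1.5))                     # ~0.886, as read from the graph
    print(3.5 * 2.5 * 1.5 * gamma(1.5))   # recurrence product for Gamma(4.5)
    print(gamma(4.5))                     # direct evaluation, ~11.63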

Discussion: The Weibull function can fit data in a variety of forms: exponential, normal, left-skewed and right-skewed. Note: The parameter θ is neither the mean nor the median of the distribution; a case in point: when x = θ, F(θ) = 0.632.

Example 3-12. Suppose that the tensile strength distribution of a cotton yarn is described by a Weibull function, with m = 5 and θ = 1 kg. Then, from (3.37), the pdf is given by:

f(x) = 5x⁴ exp[−x⁵],  0 ≤ x < ∞

The mean and variance of f(x) are calculated using (3.39), along with the Gamma function and its recurrence formula:

µ = θΓ(1 + 1/5) = 0.918 kg;  σ² = θ²[Γ(1 + 2/5) − Γ²(1 + 1/5)] = 0.0443 (kg)²

The standard deviation is:


σ = 0.21 kg. The graphical form of f(x) is shown below:

[Figure: the Weibull pdf f(x) of Example 3-12 (m = 5, θ = 1 kg); the mode lies near x = θ = 1, the mean at µ = 0.918, with σ = 0.21.]

Discussions: As can be seen from the graph above, the mode (maximum) of f(x) occurs near x = θ = 1, while the mean is at x = µ = 0.918. The distribution is slightly skewed to the right of the mean, not quite a normal distribution. At x = θ, the CDF value is F(θ) = 0.632, which is greater than 0.5. Clearly, the Weibull parameter θ corresponds to neither the mean, the mode, nor the median of the distribution.
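The moments quoted in Example 3-12 follow directly from (3.39); a quick check, as a Python sketch:

    from math import gamma, sqrt, exp

    m, theta = 5.0, 1.0                                       # Example 3-12
    mu  = theta * gamma(1 + 1/m)                              # ~0.918 kg
    var = theta**2 * (gamma(1 + 2/m) - gamma(1 + 1/m)**2)     # ~0.0443 kg^2
    print(mu, var, sqrt(var))                                 # sigma ~0.21 kg
    print(1 - exp(-1.0))                                      # F(theta) = 0.632, for any m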

Fitting Samples to Weibull Distribution: Let the sample xi, i = 1, ..., N be fitted with the Weibull distribution function in the form of (3.37); it is then necessary to determine the parameters m and θ from the sample. The fitting procedure is similar to that for the log-normal distribution discussed previously. From (3.38), we first define the reliability function R(x) as:

R(x) = 1 − F(x) = exp[−(x/θ)^m]   (3.41)

Taking the logarithm of the reciprocal of (3.41) twice, we obtain:

ln[ln(1/R)] = m ln(x) − m ln(θ)   (3.42)

We see that ln[ln(1/R)] is a linear function of ln(x). We then return to the sample and do the following: (1) rank the values xi, i = 1, ..., N in ascending order; (2) approximate the CDF Fi = F(xi) by the mean-rank formula, i.e. Fi = i/(N+1);


(3) calculate, for each data point, the value of ln[ln 1/(1−Fi)] = ln[ln(1/Ri)]; (4) plot ln[ln(1/Ri)] versus ln(xi), either graphically or by means of linear regression.

If carried out graphically, the paired data points provide a linear fit in the ln[ln(1/Ri)] versus ln(xi) frame, and the slope of the line equals m. To determine θ, observe from (3.42) that at ln[ln(1/R)] = 0, ln(x) = ln(θ); hence a horizontal line is drawn from ln[ln(1/R)] = 0 to intersect the fitted line, and a vertical line is drawn downward to intersect the ln(x)-axis at ln(x) = ln(θ). The fitting procedure is illustrated by the example worked out below:

Example 3-13. Consider the sample of 20 data points given in Example 3-11; the sample is to be fitted with the Weibull function. Following the procedure outlined above, the sample and the related quantities are computed and tabulated as follows:

 Rank i   Fi=i/(N+1)   Ri=1−Fi     xi      ln(xi)    ln[ln(1/Ri)]
 ------------------------------------------------------------------
    1       0.0476      0.9524     2.6     0.9555      −3.0202
    2       0.0952      0.9048     3.2     1.1632      −2.3018
    3       0.1429      0.8571     3.4     1.2238      −1.8698
    4       0.1905      0.8095     3.9     1.3610      −1.5544
    5       0.2381      0.7619     5.6     1.7228      −1.3022
    6       0.2857      0.7143     7.1     1.9601      −1.0892
    7       0.3333      0.6667     8.4     2.1282      −0.9027
    8       0.3810      0.6190     8.8     2.1748      −0.7349
    9       0.4286      0.5714     8.9     2.1861      −0.5805
   10       0.4762      0.5238     9.5     2.2513      −0.4360
   11       0.5238      0.4762     9.8     2.2824      −0.2985
   12       0.5714      0.4286    11.3     2.4248      −0.1657
   13       0.6190      0.3810    11.8     2.4681      −0.0355
   14       0.6667      0.3333    11.9     2.4765       0.0940
   15       0.7143      0.2857    12.3     2.5096       0.2254
   16       0.7619      0.2381    12.7     2.5416       0.3612
   17       0.8095      0.1905    16.0     2.7726       0.5057
   18       0.8571      0.1429    21.9     3.0865       0.6657
   19       0.9048      0.0952    22.4     3.1091       0.8550
   20       0.9524      0.0476    24.2     3.1864       1.1133
 ------------------------------------------------------------------

We note that the fitting must be conducted in the ln[ln(1/Ri)] versus ln(xi) frame, i.e. using the paired data in the last two columns of the table. Note, however, the one-to-one relationship between Fi and ln[ln(1/Ri)]; the plot can in fact be made in the Fi versus ln(xi) frame as well. In particular, at ln[ln(1/Ri)] = 0, Fi = 0.632; this F value corresponds to ln(x) = ln(θ).

The plot shown below is done in both the ln[ln(1/Ri)] versus ln(xi) frame and the Fi versus ln(xi) frame. In the plot, the straight line y = aξ + b is drawn through the paired points by eye (here, y = ln[ln(1/R)] and ξ = ln(x)).


The slope of the line is found to be a = 1.65, which equals the parameter m. The parameter θ is found by drawing a horizontal line from y = 0 to intersect the fitted line, and then a vertical line downward to intersect the ξ-axis at ξ = ln(x) = 2.48; since ln(x) = ln(θ), we find θ = 11.94 days.

[Plot: the fitted straight line y = aξ + b in the y = ln[ln(1/R)] versus ξ = ln(x) frame, with the corresponding F scale alongside; the slope of the line equals m, and the horizontal line at y = 0 (F = 0.632) locates ξ = ln(θ).]

The mean and variance of the distribution are calculated from (3.39), with the aid of the Gamma function chart displayed on p. III-28. Thus, µ = θΓ(1 + 1/1.65) = 11.94 × 0.89 = 10.63 days, and σ² = θ²[Γ(1 + 2/1.65) − Γ²(1 + 1/1.65)] = 44.32; the standard deviation is σ = 6.65 days.

Discussions: In plotting the Weibull distribution, note that the parameters m and θ must be determined in the ln[ln(1/R)] versus ln(x) frame, where both axes are in arithmetic scale; only then is the slope of the fitted straight line the parameter m. The F-axis is added alongside the ln[ln(1/R)] axis only as a time-saving alternative: the fitting can also be conducted in the F versus ln(x) frame, without having to compute the values of ln[ln(1/Ri)].

A Weibull plotting paper is provided in Appendix III-D, with both the F and ln[ln(1/R)] ordinates. Again, one must be aware that the slope of the fitted straight line must be determined in the ln[ln(1/R)] versus ln(x) frame!

By the Method of Linear Regression: A better fit may be obtained if the least-squares procedure is used to determine the parameters m and θ. In that case, we set yi = ln[ln(1/Ri)] and ξi = ln(xi), and denote the fitted line as y = aξ + b; following the detailed steps outlined in Example 3-7, the values of the constants a and b are obtained. Then,


m = a; and ln(θ) = −b/a.

For the above example, the least-squares method yields m = 1.66 and θ = 12.36 days. The mean and the standard deviation are, respectively, µ = 11.12 days and σ = 6.89 days, and the correlation factor is r² = 0.964. The Weibull fit is thus slightly better than the log-normal fit (see Example 3-11): the correlation factor is 0.964 for the former, 0.955 for the latter.
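The regression just described can be reproduced with a short script; a sketch in Python, using the sample of Example 3-13:

    from math import log, exp

    x = [2.6, 3.2, 3.4, 3.9, 5.6, 7.1, 8.4, 8.8, 8.9, 9.5,
         9.8, 11.3, 11.8, 11.9, 12.3, 12.7, 16.0, 21.9, 22.4, 24.2]
    N = len(x)

    # y_i = ln[ln(1/R_i)] with R_i = 1 - i/(N+1); xi_i = ln(x_i)
    xi = [log(v) for v in x]
    y  = [log(log(1.0 / (1.0 - i / (N + 1.0)))) for i in range(1, N + 1)]

    # Least-squares line y = a*xi + b; then m = a and theta = exp(-b/a)
    Sx, Sy = sum(xi), sum(y)
    Sxx = sum(u*u for u in xi)
    Sxy = sum(u*v for u, v in zip(xi, y))
    a = (N*Sxy - Sx*Sy) / (N*Sxx - Sx*Sx)
    b = (Sy - a*Sx) / N
    print(a, exp(-b/a))        # expect roughly m = 1.66, theta = 12.4 days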

The Extreme-Value Distribution Functions. In addition to the normal, log-normal and Weibull distribution functions discussed previously, many distribution functions of different forms are in use for engineering reliability problems. For instance, for flaws in materials or noises in machines that occur at random and in large numbers, the extreme-value distributions have been very useful in practice. The pdf's of the maximum- and minimum-value distributions have the general form:

f(x) = (1/Θ) e^(±(x−u)/Θ) exp[−e^(±(x−u)/Θ)],  −∞ < x < ∞   (3.43)

Here, both functions contain the same two parameters, u and Θ. The maximum-value function takes the negative of the ambiguous sign in the exponents; the corresponding CDF is obtained by integrating (3.43):

F(x) = exp[−e^(−(x−u)/Θ)],  −∞ < x < ∞   (3.44)

The minimum-value function takes the positive of the ambiguous sign; the corresponding CDF is:

F(x) = 1 − exp[−e^((x−u)/Θ)],  −∞ < x < ∞   (3.45)

The distribution mean and variance of the respective functions are expressible in terms of the parameters u and Θ:

µ = u ± 0.57722 Θ;  σ² = π²Θ²/6   (3.46)

The maximum-value function is left-skewed; its mean is located at µ = u + 0.57722Θ, while the mode occurs at x_mode = u, resulting in F(u) = 0.3679. The minimum-value function is right-skewed; its mean is located at µ = u − 0.57722Θ, but the mode also occurs at x_mode = u, resulting in F(u) = 0.6321. Both distributions have the same standard deviation, given in (3.46), and their respective skewness coefficients are sk = ±1.1396; the skewness is independent of the values of u and Θ. Note that the pdf and CDF of the extreme-value functions are expressed in explicit form; no numerical integration scheme or tabulated tables are needed to evaluate the CDF once the parameters u and Θ are given.


The respective pdf's are shown below in the frame of f(x)Θ versus (x−u):

[Figure: pdf's of the minimum-value and maximum-value distributions, plotted as f(x)Θ versus (x−u) over −4Θ to 4Θ; both peak at x = u, where f(u)Θ = 1/e ≈ 0.368, with the max-value curve tailing off to the right and the min-value curve to the left.]

Discussions: The maximum-value distribution is commonly referred to as the Gumbel distribution; it is also known as the double-exponential distribution. The latter name stems from the fact that the CDF in (3.44) can be expressed in terms of a single variable ω = (x−u)/Θ, resulting in the double-exponential form F(ω) = exp[−exp(−ω)]. The minimum-value distribution in (3.45) can be expressed in the same double-exponential form by letting ω = −(x−u)/Θ. In addition, if x is replaced by ln(x), u by ln(θ) and Θ by 1/m, the CDF of the minimum-value function in (3.45) reduces to F(x) = 1 − exp[−(x/θ)^m], which is the form of the Weibull function. Thus, the minimum-value function is just another form of the Weibull function. Also note: the range of the random variable X in the minimum-value distribution is −∞ < x < ∞, while that of the transformed random variable y in the Weibull form is 0 < y < ∞.

Example 3-14. During aircraft landing, the maximum tensile force (in lbs) in a key fastener in the landing gear is described by the maximum-value function, with u = 8000 lbs and Θ = 1500 lbs. Let us examine the following issues:

(a) The mean and standard deviation of the distribution function, using (3.46):

µ = 8000 + 0.57722 × 1500 = 8866 lbs;  σ = πΘ/√6 = π × 1500/√6 = 1924 lbs.


(b) Suppose that the maximum-force distribution in the fastener during each landing is the same as that during the previous landing; then for N successive landings, the maximum-force distribution F_N(x) in the fastener is given by:

F_N(x) = [F(x)]^N = {exp[−e^(−(x−u)/Θ)]}^N = exp[−e^(−(x−η)/Θ)]

In the above expression, η = u + Θ ln(N).

Discussion: We see that the maximum-force distribution in the fastener over N successive landings is still a maximum-value function, with the same Θ but with u replaced by η = u + Θ ln(N); i.e. the parameter η depends on (and increases with) N. For example, if N = 10000 landings, then η = 8000 + 1500 ln(10000) = 21815 lbs. Thus, over 10000 landings the maximum-force distribution has the same functional form, with Θ still equal to 1500 lbs but with u increased from 8000 lbs to 21815 lbs. Furthermore, the mean of the latter distribution is µ_N = 22673 lbs, while the standard deviation σ_N is the same as that of a single landing, σ = 1924 lbs; i.e. the shape of the distribution remains unchanged with N. The fact that the mean maximum force increases with ln(N) implies that, in repeated applications, there is a greater chance for a higher tensile force to occur; the larger the N, the more so.

A Related Situation: In a unit volume of material, the detectable flaw size can be described by the maximum-value function; it implies that the size distribution of the detectable flaws scales with the material volume.

III-5. The Weakest-Link Model Involving Extreme-Value Functions. In the previous section, we saw that repeated applications (e.g. in N landings) of the maximum-value distribution result in another maximum-value distribution; the shape parameter Θ is preserved regardless of the number of applications, but the mode parameter η, and hence also the mean µ_N, shifts to the right as N increases. Similarly, repeated applications of the minimum-value distribution result in another minimum-value distribution whose shape parameter Θ is preserved, but whose mode and mean shift to the left as N increases. As the minimum-value distribution is just another form of the Weibull distribution, the same characteristics are observed with the Weibull distribution as well. The Weibull weakest-link model refers to a chain of N links, as sketched below:

[Sketch: a chain of N links pulled at both ends by a tensile force x.]

Suppose each link can fail under the applied tensile force x, with failure probability described by the CDF F(x).


Then, the reliability of the chain of N links is:

R_N(x) = [1 − F(x)]^N   (3.47)

Or, the failure probability of the chain is:

F_N(x) = 1 − [1 − F(x)]^N   (3.48)

Let F(x) be the CDF of the Weibull function, F(x) = 1 − exp[−(x/θ)^m]. It follows that

F_N(x) = 1 − exp[−(x/θ_N)^m]   (3.49)

where

θ_N = θ/N^(1/m)   (3.50)

Thus, the failure distribution of the chain, F_N(x), is also a Weibull function with the same shape parameter m as that of the individual link; only the scale parameter is now θ_N = θ/N^(1/m), which decreases as 1/N^(1/m). So N shifts the scale parameter to the left; i.e. it reduces the mean of f(x).

Discussions. Though the weakest-link model expressed in (3.48) can accommodate a link CDF F(x) of any distribution form, only the Weibull function preserves the distribution shape parameter for a chain of N links. In many engineering situations the Weibull distribution, which is very versatile itself, is therefore often the choice for reliability applications, as demonstrated in the example above. The weakest-link model finds application in many engineering problems: fibers of a longer length are weaker in tension than those of a shorter length; glass rods of a larger diameter are weaker under bending and/or torsion than those of a smaller diameter; N electrical circuits arranged in series have a shorter service life when N is large, a longer life when N is small; etc. A somewhat more complicated situation arises when N links are arranged in parallel. In that case, the resulting distribution does not retain the shape of the link's distribution, even if the latter is a Weibull function. In general, the distribution for a system of N parallel links spreads more than that of the individual link (i.e. the standard deviation increases), while the mean decreases from that of the single link.

Example 3-15. Suppose the tensile strength of a 10-inch long fiber is described by the Weibull function, with m = 7 and θ = 8 kg. Then the CDF for the 10-inch long fiber is F(x) = 1 − exp[−(x/8)⁷].

(a) If the design load x_d is defined by F(x ≤ x_d) = 0.01, it follows that x_d = 4.15 kg. That is, under 4.15 kg tension, there is a 1% chance that the 10-inch long fiber will fail.

(b) For a fiber of 20-inch length, assuming that the weakest-link theory applies, the resulting distribution is still a Weibull with m = 7 but with θ₂ = 8(2)^(−1/7) = 7.25 kg. That is, the CDF for two 10-inch long fibers arranged in series is F2s(x) = 1 − exp[−(x/7.25)⁷].


Then, under the same design load of 4.15 kg as before, the failure probability of the 20-inch long fiber is F2s(x ≤ 4.15) = 1 − exp[−(4.15/7.25)⁷] = 0.02. So there is a 2% chance that the 20-inch long fiber will fail before 4.15 kg tension.

(c) If two 10-inch fibers are bundled in parallel, we may assume that the load x applied to the bundle is shared equally by the two fibers before either of them fails. There are then three possible failure sequences: (1) both fibers fail at the strength x/2; (2) one fiber fails at the strength x/2, while the other fails at a strength between x/2 and x; and (3) the reverse roles of (2). Thus, the CDF for the bundle of two fibers, F2p, can be written as (the reader should independently verify the following):

F2p(x) = F(x/2)F(x/2) + 2F(x/2)[F(x) − F(x/2)]

In the above, F(x) is the CDF for the single 10-inch long fiber. If the bundle of two fibers is under the design load x = 4.15 kg, the probability of failure of the bundle is then:

F2p(4.15) = F(4.15/2)F(4.15/2) + 2F(4.15/2)[F(4.15) − F(4.15/2)]

We can easily compute:

F(4.15) = 1 − exp[−(4.15/8)⁷] = 0.01;  F(4.15/2) = 1 − exp[−(2.075/8)⁷] = 7.9×10⁻⁵

Hence,

F2p(4.15) = (7.9×10⁻⁵)² + 2(7.9×10⁻⁵)[0.01 − 7.9×10⁻⁵] = 1.574×10⁻⁶.

Note: The probability of failure for the 2-fiber bundle is much less than that of the single fiber under the same load, as it should be; i.e. the bundle is stronger than the single fiber. This is obvious. However, if the applied load is 2 × 4.15 = 8.3 kg, the failure probability of the bundle is:

F2p(8.3) = F(4.15)F(4.15) + 2F(4.15)[F(8.3) − F(4.15)] = 1.44×10⁻²

In this case, the load carried by each fiber before failure is 4.15 kg; but the probability of failure of the bundle under 8.3 kg is 0.0144, which is higher than that of the single fiber under 4.15 kg (which is 0.01). So the averaged fiber strength in a bundle of N fibers is weaker than that of the single fiber. Deriving the distribution function for a bundle of N fibers is mathematically tedious, however, especially for large N (say N > 3); those details are beyond the scope of this chapter.


Summary: The main purpose of this chapter is to introduce statistical methods alongside probability theory. In engineering settings, a random variable is almost always generated by some unknown mechanism. Hence, it is not possible to determine the exact values and/or value-range of X, nor the exact probability distribution of X. The alternative is to collect all, or nearly all, possible values of X in the field; this collection is called the population of X. From the population, one then uses techniques based on the theory of statistics to estimate the probability distribution function of X. In general, the population is large; for practical reasons, a smaller sample is taken instead. In this context, one uses the same statistical techniques to estimate the probability distribution of the sample, in the hope that the latter is a close approximation of that of the population. This chapter thus contains several working topics: (1) the statistical nature of a sample taken from a population; (2) use of the sample to approximate the underlying probability distribution of the population; (3) confidence evaluation of the probability distribution estimated from a sample; (4) the physical characteristics of some well-known probability distribution functions, including the normal, log-normal, Weibull and extreme-value functions; (5) techniques for fitting a sample to the normal, log-normal, and Weibull functions. In studying this chapter, it is essential to be conceptually clear and/or become proficient in the following areas:

• The meaning of the population of X: it contains all or nearly all of the possible values of X; it can be finite or infinite in size.

• The meaning of a sample: it is a subset of the population; the size of a sample (N) is almost always finite.

• Random sampling: a sampling technique that attempts to ensure that the sample taken possesses the same probabilistic characteristics as those of the population.

• The physical characteristics of the normal distribution; why do we standardize the normal function? How to use the numerical table in Appendix III-A? And how to fit a sample to the normal function?

• The physical characteristics of the log-normal distribution; what is the similarity and difference between the normal and the log-normal? How to fit a sample to the log-normal function?

• The physical characteristics of the Weibull distribution; the need to use the Gamma function. How to fit a sample to the Weibull function? Note the versatility and the mathematical ease of the Weibull function.

• Use of the graphical papers in Appendices III-B, C and D to fit the normal, log-normal and Weibull distributions, respectively. The meaning of the fitted straight line; also note the double vertical scales used in each of the plotting papers and know their relationships.

• Details of the linear regression (least-squares) method for fitting a sample analytically (without using the plotting paper). Keep in mind the correlation factor in this method.


Assigned Homework.

3.1. Data for the response time (in seconds) are collected from 90 electrical relays in a quality inspection program. These are listed below:

1.48 1.46 1.49 1.42 1.35 1.34 1.42 1.70 1.56 1.58
1.59 1.59 1.61 1.25 1.31 1.66 1.58 1.43 1.80 1.32
1.55 1.60 1.29 1.51 1.48 1.61 1.67 1.36 1.50 1.47
1.52 1.37 1.66 1.44 1.29 1.80 1.55 1.46 1.62 1.48
1.64 1.55 1.46 1.62 1.48 1.64 1.55 1.65 1.54 1.53
1.46 1.57 1.65 1.59 1.47 1.38 1.66 1.59 1.46 1.61
1.56 1.38 1.57 1.48 1.39 1.62 1.49 1.26 1.53 1.43
1.30 1.58 1.43 1.33 1.39 1.56 1.48 1.53 1.59 1.40
1.27 1.30 1.72 1.48 1.66 1.37 1.68 1.77 1.62 1.33

(a) Calculate the sample mean, sample variance, sample standard deviation and sample skewness.
(b) Use the Sturges formula and construct a proper histogram. Does the histogram suggest a normal function?

[(a): µs = 1.5086; σs = 0.1273; (sk)s = −0.0527; (b): the distribution is almost bell-like]

3.2. Fit the sample in the previous problem with the normal distribution, by means of the plotting paper provided in Appendix III-B (you may download a copy from the web site). Then provide answers to the following:

(a) The mean, variance and standard deviation;
(b) The probability that a relay has a relay-time of at least 1.5 sec;
(c) The probability that a relay's relay-time is outside the range of 1.5 ± 0.1 sec.

3.3. The following are data from testing 16 cutting knives for their useful life, in operating hours, on a cutting machine: 2.10 0.82 2.80 2.51 3.15 2.73 4.55 5.00 4.27 2.66 4.80 1.69 3.58 1.99 4.60 2.11

(a) Calculate the sample mean, standard deviation and skewness;
(b) Fit the sample to a normal function and determine the distribution parameters from the plot;
(c) If five (5) knives are used for a 120-minute mission on a cutting machine, what is the probability that none of the knives fails the mission?
(d) What is the probability that at least one of the knives fails the mission?

3.4. The dynamic compressive strength of an automobile shock absorber is normally distributed, with µ = 1000 kg and σ = 40 kg. Then,

(a) If a shock absorber is tested under a 900 kg compressive force, what is the probability of its failure?
(b) For 99% reliability, beyond what compressive load should a shock absorber not be loaded?

[(a): F(x ≤ 900 kg) = 0.0062]

3.5. Fit the sample in Problem 3.3 to a normal distribution by means of the least square method; list the values of the distribution parameters (µ and σ), and the correlation factor.

[µ=3.085; σ=1.466; r2= 0.96]

3.6. Fit the sample in Problem 3.3 to a log-normal distribution by means of


(a) The plotting paper provided in Appendix III-C; list the values of the distribution parameters; (b) The least square method; list the values of the distribution parameters and the correlation factor; (c) Compute the mean and standard deviation of the fitted function.

[(a): to=2.81; ωo=0.5737; (b): r2= 0.906; (c): µ=3.31; σ=2.07] 3.7. Repeat Problem 3.6(a), (b) and (c); this time fit the sample (in Problem 3.3) to a Weibull distribution.

[m=2.18; θ=3.56; r2= 0.95; µ=3.15; σ=1.53]

3.8. For problem 3.5, evaluate the confidence limits µ± and σ± for 80%, 90% and 99% confidence levels.

[for 80%: µ± = 3.085±0.39; σ± = 1.215±0.284]

3.9. Service life of a machine tool is described by the log-normal distribution, with to=100 hrs and ωo=0.5.

Determine:

(a) The mean-time-to-failure and the standard deviation of the distribution; (b) The service life at 10% failure probability; (c) The failure probability within the first hour of service.

[(a): 113.3 hrs and 60.4 hrs; (b): 52.73 hrs.; (c): 0]

3.10. The service life of a bearing ball follows a Weibull distribution with shape parameter m = 2.5. Field data show that 10% of the bearing balls fail within 2 years of service. Determine:

(a) The other parameter, θ;
(b) The mean-time-to-failure (MTTF) of the bearing ball;
(c) The failure probability of the bearing ball within 6 months of service; and
(d) The service life for 1% failure probability.

[(b): MTTF=4.37 yrs; (d): 0.78 yrs.]

3.11. The tensile strength of a 10"-long graphite fiber is described by the Weibull distribution, with mo = 7 and θo = 8 kg. Determine:

(a) The strength distribution for a 40"-long fiber;
(b) The strength distribution for two 40"-long fibers arranged in parallel;
(c) The probability of failure of the two 40"-long fibers arranged in parallel when loaded to 15 kg in tension.

[(a) for single 40"-fiber: set N=4]; [(b) for a bundle of 2-40"-long fiber: use F2b(x) = F2s(x/2)F2s(x/2) + 2F2s(x/2) [F2s(x) - F2s(x/2)];

[(c) substitute x =15kg into the above].


APPENDIX III-A

Values of Φ(z) in the Standardized Normal CDF, for -5.00 ≤ z ≤ 0.00

Table to be handed out in class


APPENDIX III-A (Continued)

Values of Φ(z) in the Standardized Normal CDF, for 0.00 ≤ z ≤ 5.00

Table to be handed out in class


APPENDIX III-B Plotting Paper for Normal Function

[Plotting paper: a linear Φ−1(Fi) ordinate from −3 to 3, with the corresponding F(xi) values (0.0014 to 0.999) marked alongside, against an arithmetic x-axis.]


APPENDIX III-C Plotting Paper for Log-Normal Function

[Plotting paper: a linear Φ−1(Fi) ordinate from −3 to 3, with the corresponding F(xi) values (0.0014 to 0.999) marked alongside, against a ln(x)-axis.]


APPENDIX III-D Plotting Paper for Weibull Function

[Plotting paper: a linear ln[ln(1/R)] ordinate from −4 to 2, with the corresponding F(xi) values (0.018 to 0.999) marked alongside, against a ln(x)-axis.]


CHAPTER IV. FAILURE RATES AND RELIABILITY MODELS

The term "reliability" in engineering refers to the probability that a product, a system or a particular component will perform without failure under specified conditions and for a specified period of time. Thus, it is also known as the "probability of survival". To quantify reliability, a test is usually conducted to obtain a set of "time-to-failure" sample data, say ti, i = 1, ..., N. The sample can then be fitted to a probability density function f(t), or to a cumulative probability function F(t). The "reliability function" is defined as R(t) = 1 − F(t); hence the behavior of R(t) is conjugate to that of F(t), the cumulative probability of failure in time. But failure of an engineering product or system may stem from such random factors as material defects, loss of precision, accidental overload, environmental corrosion, etc. The effects of these random factors on failure are only implicit in the collected data ti, i = 1, ..., N, and from F(t) alone it is difficult to ascertain which factor is predominant, and when. Another way to look at failure behavior in time is to examine the failure rate: the time rate of change of the probability of failure. Since the latter is a function of time, the failure rate is also a function of time. In terms of the failure rate, one can obtain physical information as to which factor controls the failure behavior and/or when it controls the failure behavior.

Example 4-1: A TV producer tested 1000 sets in an accelerated reliability evaluation program. In that program, each set is turned on and off 16 times each day to mimic a typical week of TV usage. Based on a failure-to-perform criterion, failure data are obtained for the first 10 days of the test:

 day-1  day-2  day-3  day-4  day-5  day-6  day-7  day-8  day-9  day-10
 ----------------------------------------------------------------------
  18     12     10      7      6      5      4      3      0      1

Here, we define the failure rate as the "probability of failure per day", denoted by λ(i), i = 1, ..., 10:

For the first day (i=1): λ(1) = 18/1000 per day;
For the second day (i=2): λ(2) = 12/(1000−18) = 12/982 per day;
For the third day (i=3): λ(3) = 10/(1000−18−12) = 10/970 per day;
. . . . . .

Note that the failure rate for day-1 is based on a total of 1000 TV sets, of which 18 failed during the day; the failure rate for day-2 is based on (1000−18) sets; for day-3, (1000−18−12) sets; etc. In this way, we can obtain λ(t) up to t = 10 days. A plot of λ(t) versus t is displayed below. Clearly, for this procedure to yield a reliable λ(t), the number of TV sets on test each day must be large relative to the number of failures that day. However, we also note that the time required to gather the data is only 10 days, a relatively short period compared to what might be needed to


generate a full set of time-to-failure data ti, i = 1, ..., N.

[Bar chart: the daily failure rate λ(i), in 10⁻³/day, versus t (days 1 through 10); the rate decreases from 18×10⁻³/day on day 1 to about 1×10⁻³/day on day 10.]

Physical significance of λ(t): The plot above shows that the TV failure rate is initially high but decreases with time. Such decaying behavior is known as infant mortality, or the wear-in phenomenon. It implies that early failures are caused by "birth defects" present in the product before it is put into service; those products that survive the wear-in period are deemed to have fewer defects at birth, statistically speaking.

The plot can easily be fitted by a smooth function λ(t) for t > 0. In fact, as will be shown next, λ(t) is analytically related to f(t), the time-to-failure probability density function. In many engineering situations, it may be more time-saving and less expensive to fit λ(t) than f(t).
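The day-by-day computation of λ(t) in Example 4-1 amounts to a short loop; a minimal sketch in Python:

    failures = [18, 12, 10, 7, 6, 5, 4, 3, 0, 1]   # failures on days 1..10
    at_risk = 1000                                  # sets on test at the start

    for day, f in enumerate(failures, start=1):
        print(day, f / at_risk)                     # lambda(day), per day
        at_risk -= f                                # failed sets leave the test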

IV-1. Relation Between Failure Rate and Failure Probability. Failure Rate. The relation between the failure rate function λ(t) and the failure probability density function f(t) can be readily established by examining f(t) graphically as shown below:

[Sketch: the pdf f(t); the area to the left of t is F(t), the area to the right is R(t) = 1 − F(t), and the shaded strip between t and t+∆t has area f(t)∆t.]

Since the area under f(t) from 0 to t is F(t), while the area from t to ∞ is R(t), the shaded area from t to (t+∆t) is f(t)∆t, which represents the fractional probability that failure occurs within ∆t, given that the product has actually survived the time period from 0 to t.


Hence, the probability that failure occurs within ∆t is a conditional one, f(t)∆t/R(t), and the rate of change of that probability at time t is:

λ(t) = [f(t)∆t/R(t)]/∆t = f(t)/R(t)   (4.1)

This establishes the relation between λ(t) and f(t). In fact, one can obtain f(t) from λ(t). To this end, we note that

f(t) = dF(t)/dt = d[1 − R(t)]/dt = −dR(t)/dt

Eq. (4.1) can then be written in the form λ(t) = −[dR(t)/dt]/R(t), which in turn yields the differential relationship:

λ(t) dt = −dR(t)/R(t)

Integrating the above from 0 to t and noting that R(0) = 1, we obtain R(t):

R(t) = exp[−∫₀ᵗ λ(τ) dτ]   (4.2)

Then, from (4.1),

f(t) = λ(t) exp[−∫₀ᵗ λ(τ) dτ]   (4.3)

Noting that R(∞) = 0, one can readily verify the following:

µ = MTTF = ∫₀^∞ t f(t) dt = ∫₀^∞ R(t) dt   (4.4)

Discussion: The failure rate data (the bar chart) in Example 4-1 can be fitted nicely by the function

λ(t) = 0.02 t^(−0.56)

From (4.3), the corresponding failure density f(t) is obtained as

f(t) = 0.02 t^(−0.56) exp[−0.04545 t^(0.44)]

Similarly, from (4.2), the reliability function R(t) is obtained as

R(t) = exp[−0.04545 t^(0.44)]

And the mean-time-to-failure is obtained using (4.4):

MTTF = µ = ∫₀^∞ exp[−0.04545 t^(0.44)] dt
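This integral has a closed form: with R(t) = exp(−c t^k), the substitution u = c t^k reduces (4.4) to MTTF = Γ(1 + 1/k)/c^(1/k) (this is just the Weibull MTTF, since R(t) here is of Weibull form). A minimal sketch evaluating it for the fitted TV data:

    from math import gamma

    c, k = 0.04545, 0.44                  # from R(t) = exp(-0.04545 * t**0.44)
    mttf = gamma(1 + 1/k) / c ** (1/k)
    print(mttf)                           # ~2.9e3 days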

Example 4-2. A company produces videocassette recorders. In order to formulate pricing, warranty and after-sale service policies, a reliability evaluation program is carried out; the test finds that the failure rate is essentially constant: λ(t) = 1/8750 per hour. Thus, from (4.2), the reliability function is obtained:

R(t) = exp[−t/8750],  t in hours

And from (4.3), the time-to-failure probability density function is obtained:

f(t) = exp[−t/8750]/8750

Using (4.4), the mean of f(t), or the MTTF, is given by µ = 8750 hours. We note that a constant failure rate leads to an exponential function for f(t); the significance of this relationship is discussed next.

The Bath-Tub Curve of λ(t). For many engineered products, the failure rate function λ(t) has a time-profile much like the cross-section of a bath tub, as shown below:

[Sketch: the bath-tub curve of λ(t) versus time, with three zones: infant (wear-in, decreasing rate), youth (constant rate), and aging (wear-out, increasing rate).]

This curve is a ubiquitous characteristic of all living things, as in human life expectancy; the service life of many engineered products has much in common with it. As illustrated above, the curve may be broadly divided into three time zones: infancy, youth and aging, each corresponding to a distinctive failure mode. The infancy or wear-in period is generally short, with a high but decreasing rate, as in the case discussed in Example 4-1. This mode of λ(t) may be due to defective parts, defects in materials, handling damage, parts out of manufacturing tolerance, etc.; such factors have ill effects early in life. To correct the situation, a number of measures can be taken: design improvement, stricter material selection, tightened quality control, just to mention a few. If such

Page 83: Reliability engineering

Chapter-IV Failure Rates IV-5

measures are insufficient, a proof-test may be instituted: products undergo a specified period of simulated service, in the hope that most early failures are weeded out. Another measure, known as redundancy, may also be instituted: it is built into the product to provide a fail-safe feature. The youth period is characterized by a constant rate; this occurs with products that do not contain fatal defects, or that have survived the infancy period. The rate value is generally the lowest, and in some cases it maintains a long and flat behavior, as shown below:

[Sketch: λ(t) versus time; a long, flat constant-rate period.]

This constant-rate mode is generally due to random events from without, rather than inherent factors from within; such events are beyond control during design, prototype development, manufacturing, etc. The constant-rate period is often used to formulate the pricing, warranty and servicing policies of the product, which are of particular importance in commerce. As will be shown later, a product with a constant failure rate has the unique attribute that its probability of failure is independent of the product's past service life; this aids mathematical ease in modeling repair frequency, spare-part inventory, maintenance schedules, etc. The aging or wear-out period is associated with an increasing failure rate; it is attributed to material fatigue, corrosion, contact wear, etc., modes often encountered in mechanical systems with moving parts such as valves, pumps, engines, cutting tools, bearing balls, wheels and tires, just to mention a few. In some products, the youth period is absent or relatively short while the wear-out period is long, as depicted below:

[Sketch: λ(t) versus time; a short or absent constant-rate period followed by a long, rising wear-out period.]


Products with rapidly increasing failure rates require corrective measures: regular inspection, maintenance, replacement, etc. Thus, the central concern in the wear-out period is to predict the probable service life with a suitable model, so that a prudent schedule for preventive maintenance can be formulated. Generally speaking, the wear-in mode is a quality-control issue, while the wear-out mode is a maintenance issue. The random-failure or constant-rate mode, on the other hand, is widely used as the basis for product reliability considerations; some of its key features are discussed in the next section.

IV-2. Reliability Models Based on Constant Failure Rates.

Reliability Model for a Single Unit. Let the failure rate of a certain product be constant, say λ(t) = λo; the corresponding reliability function is given by (4.2):

R(t) = exp(−λo t)   (4.5)

And the corresponding probability density function is given by (4.3):

f(t) = λo exp(−λo t)   (4.6)

The mean (MTTF) and the standard deviation of f(t) are given by:

µ = MTTF = 1/λo;  σ = µ = 1/λo   (4.7)

At the mean-time-to-failure (µ = 1/λo), the reliability value is R(µ) = exp(−1) = 0.368, or F(µ) = 0.632. Note that 0.632 is also the value of F(θ) in the Weibull distribution (see III-31); but θ is not the mean of the Weibull function.

Example 4-3. A device in continuous use has a constant failure rate λo = 0.02/hr. Then the following may be computed:

(a) The probability of failure within the first hour of usage: P(t ≤ 1) = F(1) = 1 − exp(−0.02×1) = 1.98%.
(b) The probability of failure within the first 10 hours: P(t ≤ 10) = F(10) = 1 − exp(−0.02×10) = 18.1%.
(c) The probability of failure within the first 100 hours:


P(t ≤ 100) = F(100) = 1 − exp(−0.02×100) = 86.5%.

(d) The probability of failure within the next 10 hours, given that the device has already been in use for 100 hours. This is a conditional-probability situation, since the device has already survived 100 hours. Let X be the event that the device survives 100 hours, with P(X) = 1 − F(100), and Y the event that it fails within 110 hours, with P(Y) = F(110) (see the sketch below). The answer to (d) is then the conditional probability of Y, given that X has occurred:

P(Y|X) = P(X ∩ Y)/P(X) = [F(110) − F(100)]/[1 − F(100)]

[Sketch: the pdf f(t) with the interval from 100 to 110 hours shaded; X = survival past 100 hours, Y = failure before 110 hours, and X ∩ Y is the shaded strip.]

Since F(110) = 1 − exp(−0.02×110) = 88.92% and F(100) = 86.5%, we find P(Y|X) = (0.8892 − 0.865)/(1 − 0.865) = 18.1%.

Discussion: The answer to (d) is identical to that in (b). Thus, having a constant failure rate, the device has no memory of prior usage.
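The memoryless property is easy to confirm numerically; a Python sketch of (b) and (d):

    from math import exp

    lam = 0.02                                     # per hour, Example 4-3
    F = lambda t: 1.0 - exp(-lam * t)

    print(F(10))                                   # (b): ~0.181
    print((F(110) - F(100)) / (1 - F(100)))        # (d): the same value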

Single Unit Under Repeated Demands. When a product is called to service by a "demand", there is a probability p of failing to respond to the demand. If N such demands are made during the time period t, we define the average number of demands per unit time as:

m = N/t   (4.8)

Assuming that failure of the unit at each demand is an independent event, the reliability of the unit subjected to N repeated demands is (see Chapter III, the binomial distribution): R_N = (1−p)^N. In particular, if p << 1 and N >> 1, the above reduces to the Poisson approximation:

R_N = e^(−Np) = e^(−mpt)   (4.9)

Now, let

λo = mp   (4.10)


Here, λo = mp represents the "equivalent" failure rate for the unit under repeated demands, and (4.9) becomes identical to (4.5), the constant-rate reliability function. Hence, the reliability of a product under repeated demands is a case of "constant failure rate" when p << 1 and N >> 1.

Example 4-4. Within the one-year warranty period, a cell-phone producer finds that 6% of the phones sold were returned due to damage incurred in accidentally dropping the phone on the ground. A simulated laboratory test determined that when a phone is dropped on a hard floor, the probability of failure is 1 in 5, or p = 0.2. Based on this information, the engineers at the phone manufacturer made the following interpretation:

(a) Let the time unit be "year". Then, for a single unit, F(1) = cumulative probability of being damaged within 1 year = 6%; or R(1) = 0.94. This can also be interpreted as: 6 phones out of every 100 were damaged per year; or 94 phones out of every 100 did not suffer any damage during the year.

(b) Let m = number of demands (drops on a hard floor) per phone per year. From (4.9): R(1) = 1 − F(1) = exp(−mpt) = exp(−m×0.2×1) = 0.94. Solving, we obtain m = 0.31 drops per phone per year. Interpretation: on average, there are 31 drops per 100 phones per year; alternatively, the mean time between drops for one phone is 1/0.31 = 3.23 years.

Discussion: Given m = 0.31, a factor stemming from customers' habits, the phone producer can only redesign the phone to make it more impact-resistant, or more robust; this means decreasing the value of p. If, for instance, p = 0.1, then R(1) = exp(−0.31×0.1×1) = 0.97, or F(1) = 0.03. This cuts the return rate from 6% to 3% per year.

Single Unit Under Step-Wise Constant Rates. A unit often operates at different levels of performance during a typical operating cycle. If at each performance level the unit fails at a constant rate, it can be treated as a "step-wise" constant failure rate problem. For instance, the electric motor used in a household heat-pump system is called when the room temperature is low, and is shut off when the room temperature reaches the preset high. During a typical service cycle, say 24 hours, the motor may be called N times; the failure rate profile may be depicted schematically as shown below:

[Sketch: λ(t) over one service cycle, alternating between a spike at each start, an elevated rate while running, and a low rate during stand-by.]


This profile provides the following information for an evaluation of the motor's reliability:

N = number of starts (demands) per service cycle (24 hours);
c = time fraction during which the motor is in the running state;
1−c = time fraction during which the motor is in the stand-by state;
p = probability of failure when the motor responds to an operation call (start);
λr = failure rate (per hour) when the motor is in the running state; and
λs = failure rate (per hour) when the motor is in the stand-by state.

Under the constant-rate condition, the "combined" or "equivalent" failure rate λc for the motor in service can be expressed as:

λc = λd + c λr + (1−c) λs   (4.11)

In the above, λd = mp, m being the number of calls per unit time (= N/24 calls per hour). And the reliability function of the motor is given by:

R(t) = exp(−λc t)   (4.12)

Clearly, (4.12) is accurate only if the service time t is much greater than the length of a single cycle (24 hours).

Example 4-5. An electric blower is used in a heating system. The manufacturer of the blower has provided the following failure-rate data:

p = failure probability on demand = 0.0005 per call;
λr = failure rate per hour when the blower is running = 0.0004/hr;
λs = failure rate per hour when the blower is in stand-by = 0.00001/hr.

During a typical 24-hour day in the winter months, the following data are obtained from the heater's operation record:

 # of call   time of call   time of stop   running time
 --------------------------------------------------------
     1          0:47 am        1:01 am        0.23 hr
     2          1:41 am        2:07 am        0.43 hr
     3          2:53 am        3:04 am        0.18 hr
     4          3:55 am        4:13 am        0.30 hr
     5          4:43 am        5:05 am        0.37 hr
     6          5:58 am        6:19 am        0.35 hr
     7          6:50 am        7:14 am        0.40 hr
     8          7:46 am        8:07 am        0.35 hr
     9          8:55 am        9:08 am        0.22 hr
    10          9:49 am       10:05 am        0.27 hr
    11         10:49 am       11:01 am        0.20 hr
    12         11:52 am       12:08 pm        0.27 hr
    13         12:59 pm        1:11 pm        0.20 hr
    14          1:49 pm        2:04 pm        0.25 hr
    15          2:52 pm        3:11 pm        0.32 hr
    16          3:58 pm        4:05 pm        0.12 hr
    17          4:41 pm        4:59 pm        0.30 hr
    18          5:43 pm        6:02 pm        0.32 hr
    19          6:37 pm        7:00 pm        0.38 hr
    20          7:37 pm        7:58 pm        0.35 hr
    21          8:37 pm        8:55 pm        0.30 hr
    22          9:29 pm        9:52 pm        0.38 hr
    23         10:35 pm       10:47 pm        0.20 hr
    24         11:37 pm       11:53 pm        0.27 hr
 --------------------------------------------------------
 Total calls: N = 24;  m = 24/24 = 1/hr
 Total running time = 6.96 hrs;  time fraction c = 6.96/24 = 0.29

Thus, the combined failure rate for the blower is:

λc = mp + cλr + (1−c)λs = 1×0.0005 + 0.29×0.0004 + 0.71×0.00001 = 6.23×10⁻⁴/hr

With λc, the reliability of the blower over one month of service (720 hours) is given by:

R(720) = exp(−0.000623×720) = 0.64.
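The bookkeeping of Example 4-5 is easily scripted; a sketch in Python, with the 24 running times taken from the log above:

    from math import exp

    run = [0.23, 0.43, 0.18, 0.30, 0.37, 0.35, 0.40, 0.35, 0.22, 0.27,
           0.20, 0.27, 0.20, 0.25, 0.32, 0.12, 0.30, 0.32, 0.38, 0.35,
           0.30, 0.38, 0.20, 0.27]                 # hours, calls 1..24

    p, lam_r, lam_s = 0.0005, 0.0004, 0.00001      # manufacturer's data
    m = len(run) / 24.0                            # calls per hour
    c = sum(run) / 24.0                            # running-time fraction, ~0.29

    lam_c = m*p + c*lam_r + (1 - c)*lam_s          # Eq. (4.11), ~6.23e-4/hr
    print(lam_c, exp(-lam_c * 720))                # one-month reliability, ~0.64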

Failures of a Maintained Unit. In many engineering situations, a single device in continuous use can be regularly maintained so that it functions indefinitely. If the unit has a constant failure rate, it is possible to estimate the number of repairs and/or replacements needed to maintain the device over a long period of continued service.

Now, let p(n/t) be the probability that exactly n repairs or replacements are needed over the time period t. Note that p(n/t) must satisfy the condition that at t = 0 there is no failure:

p(0/0) = 1;  p(n/0) = 0, for n > 0   (4.12)

At any time t > 0, however, p(n/t) must also satisfy the total probability condition:

Σ p(n/t) = 1, summed over n = 0, 1, 2, ..., ∞   (4.13)

Now, consider the time interval from t to t+∆t: the probability that zero repairs occur before t is p(0/t), and the probability that zero repairs occur before t+∆t is p(0/t+∆t); for the latter to occur, we must have p(0/t) in the first place. Since the device has a constant failure rate, say λo, the failure probability during ∆t is λo∆t, while


p(0/t+∆t) = p(0/t)(1 − λo∆t)

Rearranging, we obtain the differential relation:

dp(0/t)/dt = −λo p(0/t)

Integrating the above from 0 to t and noting the initial conditions in (4.12), we find the probability of zero repairs within the time period t:

p(0/t) = exp(−λo t)   (4.14)

The result in (4.14) is not surprising, for it means that there is no failure before t; it is simply the reliability of the unit for the time period t. To find the expression for p(n/t), however, we consider the probabilities that: (1) n repairs have already occurred by t, so no further repair occurs during ∆t; (2) n−1 repairs have occurred by t, and one repair occurs during ∆t (with ∆t → 0). In short, we write:

p(n/t+∆t) = p(n/t)(1 − λo∆t) + p(n−1/t) λo∆t

The above can be rewritten in differential form:

dp(n/t)/dt = −λo p(n/t) + λo p(n−1/t)   (4.15)

Integration of (4.15) from 0 to t yields the integral expression for p(n/t):

p(n/t) = λo exp(−λo t) ∫₀ᵗ p(n−1/τ) exp(λo τ) dτ   (4.16)

Equation (4.16) is a recursive relationship; it allows for the determination of p(n/t) successively for n = 1, 2, 3, .... For instance, for n = 1, we substitute the result in (4.14) into (4.16) and carry out the integration, obtaining:

p(1/t) = (λo t) exp(−λo t)   (4.17)

For n = 2, we in turn obtain:

p(2/t) = [(λo t)²/2] exp(−λo t)   (4.18)

A general expression for p(n/t) is given by:


p(n/t) = [(λo t)ⁿ/n!] exp(−λo t)   (4.19)

We note that (4.19) is in the form of the Poisson distribution for the random variable n (see Eq. 2-21), whose mean and variance have the same value (see Example 2.11):

µn = σn² = λo t   (4.20)

In the above, µn is known as the "mean number" of repairs needed over the time period t.

Mean-Time-Between-Failures. The "mean time between failures", or MTBF, of the maintained unit is defined as:

MTBF = t/µn = 1/λo   (4.21)

The above is identical to the MTTF of the unmaintained unit; see (4.6) and (4.7). With the distribution for n in (4.19), the cumulative probability that more than N repairs are needed over the designated time period t is:

P(n>N)/t = Σ from n=N+1 to ∞ of [(λo t)ⁿ/n!] exp(−λo t) = 1 − Σ from n=0 to N of [(λo t)ⁿ/n!] exp(−λo t)   (4.22)

A more precise interpretation of the terms in (4.22) is as follows: (a) the sum from n = 0 to n = N represents the probability that up to N repairs occur during the time period t; (b) the sum from n = N+1 to n → ∞ represents the probability that more than N repairs occur during the time period t. Together, their sum represents the total probability during the period t, covering n = 0, 1, 2, 3, ..., ∞ (see Eq. 4.13).

Example 4.6. A DC power pack (in a computer) is in continuous use; it has a constant failure rate λo = 0.4 per year. If one spare is kept on hand in case the power pack fails, what is the chance of running out of the replacement spare within a 3-month period?

Solution: Here, the designated time period is t = 1/4 year; the mean number of failures in 3 months is λo t = 0.1. The probability that more than 0 failures (i.e. at least 1) occur within t = 1/4 is calculated from (4.22) with N = 0:

P(n>0)/t = 1 − exp(−λo t) = 0.095

That is, there is roughly a 10% chance that the spare will be used within 3 months.

Discussion: If 2 spares are kept on hand, the chance of running out of spares within 3 months can also be calculated from (4.22). Setting N = 1 (i.e. n is more than 1, or at least 2), we obtain:


P(n>1)/t = 1 − (1 + λo t) exp(−λo t) = 0.00468

Hence, the chance of using both spares within 3 months is less than half of 1%.

Example 4.7. Field data have shown that truck tires fail due to random punctures on the road. If the mean-time-between-failures (MTBF) of a tire is 1500 km, then a truck with 10 wheels must carry some spare tires. (a) What is the chance that at least 1 spare will be used on a 100 km trip? (b) What is the chance that at least 2 spares are needed on a 100 km trip? (c) How many spares should be kept in order to have more than 99% assurance of not running out of spares on a 100 km trip?

Solution: Puncture of a tire is a random event; hence, it can be treated as a case of constant failure rate. In this example, the MTBF = 1500 km is known from field data; thus, we can approximate the constant failure rate of a single tire as λo = 1/MTBF = 1/1500 per km.

Since the truck has 10 wheels, the 10 tires accumulate a total of t = 10×100 = 1000 km over a 100 km trip. Hence, λo t = 1000/1500 = 0.667. Thus:

(a) The probability that at least 1 tire is punctured is: P(n>0)/t = 1 − exp(−λo t) = 0.487.

(b) The probability that at least 2 tires are punctured is: P(n>1)/t = 1 − (1 + λo t) exp(−λo t) = 0.144.

(c) For more than 99% reliability, or less than a 1% chance of failure, N spares should be kept such that the chance of running out of N spares is less than 1%: P(n>N)/t ≤ 1%.

If N = 2, it takes more than 2 punctures to fail the 100 km run, and that chance is:

P(n>2)/t = 1 − [1 + λo t + (λo t)²/2] exp(−λo t) = 0.03

The probability of running out of 2 spares is 3%, i.e. the reliability is 97%. If N = 3, more than 3 punctures would fail the 100 km run, and that chance is:

P(n>3)/t = 1 − [1 + λo t + (λo t)²/2 + (λo t)³/6] exp(−λo t) = 0.00486

The probability of running out of 3 spares is 0.486%. Hence, for 99% reliability or better, N = 3 spares should be kept.
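The spare-tire probabilities follow from the Poisson sum in (4.22); a minimal sketch in Python:

    from math import exp, factorial

    lam_t = 1000.0 / 1500.0                        # lambda_o * t, Example 4.7

    def p_more_than(N, lt):
        # P(n > N) = 1 - sum over n = 0..N of (lt)^n / n! * exp(-lt)   -- Eq. (4.22)
        return 1.0 - exp(-lt) * sum(lt**n / factorial(n) for n in range(N + 1))

    for N in range(4):
        print(N, p_more_than(N, lam_t))            # 0.487, 0.144, 0.030, 0.0049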

IV-3. Time-Dependent Failure Rates.


The general characteristics of the failure rate function λ(t) resemble the profile of a bathtub, consisting of three time periods, each associated with a distinct failure mode. In the infancy (wear-in) period, the rate is initially high but decreases with time; the youth period follows with a constant rate; it is in turn followed by the aging (wear-out) period, where the rate increases with time. The previous section discussed several reliability models based on the constant-rate, or youth-period, assumption; in this section, we examine cases where the failure rate is time-dependent.

The Wear-In Mode of Failure. The wear-in mode is characteristic of products with initial defects, stemming from product design, manufacture and handling. When put into service, such products may initially experience a high rate of failure caused mainly by the inherent defects; as the early failures occur, the failure rate then gradually decreases. The test data for the TV sets discussed in Example 4-1 exhibit just such wear-in behavior; in fact, the data can be fitted by a decreasing function of time:

λ(t) = a t^(−b)   (4.23)

In the above, a and b are positive real constants. Having found λ(t), one can obtain the reliability function R(t) and the failure probability density function f(t) from (4.2) and (4.3), respectively. Then, a number of questions related to product reliability may be rationally answered.

Example 4-8. A circuit has the failure rate λ(t) = 0.05/t^(1/2) per year, t in years. Here, λ(t) has the form of (4.23). The associated reliability function R(t) and failure probability density function f(t) are obtained by integrating (4.2) and (4.3), respectively:

R(t) = exp[−0.1 t^(1/2)];  f(t) = 0.05 exp[−0.1 t^(1/2)]/t^(1/2)

With the above, the following may be computed:

(a) The reliability for 1 year of use: R(1) = exp(−0.1) = 0.905; F(1) = 9.5%.

(b) The reliability for the first 6 months of use: R(0.5) = exp[−0.1(0.5)^(1/2)] = 0.93.

(c) The fraction of circuits failed within 3 years: F(3) = 1 − R(3) = 1 − exp[−0.1(3)^(1/2)] = 0.16; 84% are still in use.

(d) A circuit has been in service for 1 year; the reliability that it serves another 6 months:

R(0.5/1) = 1 − F(0.5/1);  F(0.5/1) = [F(1.5) − F(1)]/R(1) = [0.115 − 0.095]/0.905 = 0.022


R(0.5/1) = 1 − 0.022 = 0.98

Discussion: The result in (d) shows that the circuit in question had fewer defects and thus lasted 1 year without failure; this particular circuit, when used for another 6 months, has a higher reliability than a randomly picked new circuit, whose reliability is the answer in (b). The above also demonstrates that if early failures can be eliminated from a lot of circuits before putting them into service, the remaining circuits will have a higher reliability. This gives rise to the concept of the proof-test, a practice often used in off-line quality control.
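The four results of Example 4-8 can be reproduced directly; a Python sketch:

    from math import exp, sqrt

    R = lambda t: exp(-0.1 * sqrt(t))              # t in years
    F = lambda t: 1.0 - R(t)

    print(R(1.0), R(0.5), F(3.0))                  # (a) ~0.905, (b) ~0.93, (c) ~0.16
    print(1 - (F(1.5) - F(1.0)) / R(1.0))          # (d) conditional reliability, ~0.98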

The Concept of Proof-Test. A proof-test is usually conducted with products having an initially high failure rate followed by a relatively short wear-in period. The test is designed to subject the products to simulated service conditions for a short period of time; in the process, most products with fatal defects will be weeded out, while products that have passed the test will exhibit a higher reliability. Suppose that the failure rate function λ(t) of a product has a short wear-in period, as shown in the sketch below. Here, as is generally the case, the associated wear-in time period tp is not well defined, but it can be estimated. Beyond this period, it is assumed that the failure rate function settles into the constant-rate mode.

[Sketch: the failure rate λ(t) versus t, showing a short wear-in period that ends near tp, beyond which the rate is essentially constant.]

If a proof-test is conducted for the time period tp, the fraction of failure and the fraction of survival can be estimated from the R(t) curve, as sketched below. The upper shaded area represents the fraction of failure, while the lower shaded area represents the fraction of survival.

[Sketch: the reliability curve R(t), starting at 1.0 at t = 0, dropping through the early-failure region to R(tp) at the proof-test time tp, and to R(tp+τ) after a further service time τ.]


Now, for the product that survived the proof-test, let τ (=t-tp) be the actual service life. Thus, the associated reliability is given by:

R(τ/tp) = R(tp+τ)/R(tp) = exp[-∫(0 to tp+τ) λ(ξ)dξ] / exp[-∫(0 to tp) λ(ξ)dξ]

Upon rearranging,

R(τ/tp) = exp[-∫(tp to tp+τ) λ(ξ)dξ];  τ > 0  (4.24)

Since R(tp) < 1, it can be readily shown that R(τ/tp) is greater than R(t), t being (tp+τ).

Discussion: Example 4-8(d) is a case related to proof-test. We can now use (4.24) to compute the desired result. In fact, we find R(0.5/1)=0.98 using (4.24) which is the same as that found differently in Example 4-8(d).
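(4.24) can also be evaluated by numerical integration when λ(t) has no convenient closed form; the sketch below (Python, midpoint rule, reusing the wear-in rate of Example 4-8) is one minimal way to do it.

```python
import math

def cond_reliability(rate, tp, tau, steps=10000):
    # R(tau/tp) = exp[-integral of rate(x) from tp to tp+tau], Eq. (4.24)
    h = tau / steps
    integral = sum(rate(tp + (k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-integral)

rate = lambda t: 0.05 / math.sqrt(t)   # wear-in rate of Example 4-8
print(round(cond_reliability(rate, tp=1.0, tau=0.5), 3))   # ~0.978
```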

The Wear-Out Mode of Failure. The wear-out mode is characteristic of products that have been in service for some length of time, usually well into the constant-rate period. Occasional over-loading and adverse environments may have induced damage in the product, and the effect of "fatigue" then sets in. Consequently, the failure rate begins to rise. Fatigue failures in durable products are common in practice, and the problem must be adequately addressed within the context of reliability. In fact, time-to-failure probability distributions described by the normal, log-normal or Weibull functions are associated with the wear-out failure mode. This is briefly discussed below. Normal Function and Failure Mode. The time-to-failure probability described by the popular normal distribution is associated with a failure rate function of the wear-out mode. We can show this starting from the standardized normal CDF, Φ(z), where z is the transformed time: z = (t-µ)/σ, and µ and σ are the mean and standard deviation of the normal probability density function f(t); see (3.11) to (3.17) for details. The corresponding reliability function is then given by: R(t) = 1 - Φ(z). It follows from (4.1) that the failure rate function is:

λ(t) = [1/(σ√(2π))] exp[-z^2/2] / [1 - Φ(z)]  (4.25)

With the help of Appendix III-A, a plot of λ(t) versus time can be obtained using (4.25):


[Sketch: λ(t) versus t for the normal model, with time-axis marks at µ-2σ, µ-σ, µ, µ+σ and µ+2σ and rate levels of 5/σ, 10/σ and 15/σ; the curve rises steeply as t approaches and passes µ.]

It is seen that λ(t) rises sharply when the time t exceeds the MTTF, i.e. beyond t = µ.

Example 4-9. Field data for a certain brand of tires show that 90% of the tires on passenger cars fail to pass inspection between 20 and 30 k-miles. If the time-to-failure probability of the tires can be described by a normal distribution, (a) What is the failure rate when a tire has 20 k-miles on it? (b) What is the failure rate when a tire has 25 k-miles on it? Since 90% of the tires fail between 20 and 30 k-miles (i.e. the central population is 90%), we can write: Φ(z20k) = 0.05; Φ(z30k) = 0.95

Using Appendix III-A, we find:

z20k = (20-µ)/σ = -1.65;  z30k = (30-µ)/σ = 1.65

Solving the above yields µ = 25 k-miles and σ = 3.03 k-miles. Having found the values of µ and σ in the normal pdf f(t), the corresponding failure rate function λ(t) is computed from (4.25). Thus, (a) For a tire having 20 k-miles on it, λ(t=20) ≈ 0.036 per k-mile; and (b) For a tire having 25 k-miles on it, λ(t=25) ≈ 0.26 per k-mile. Since λ increases with t (miles), failure of the tire is of the wear-out mode. Discussion: For a product in service, knowledge of its failure rate helps to formulate a rational repair or replacement schedule. In the above example, a failure rate of 0.036 per k-mile at the 20 k-mile life may be better tolerated than one of 0.26 per k-mile at the 25 k-mile life.
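The normal-model failure rate (4.25) is straightforward to evaluate with the error function, which avoids the table lookup; a minimal Python sketch using the µ and σ found above:

```python
import math

def normal_hazard(t, mu, sigma):
    # lambda(t) = f(t)/[1 - CDF(t)] for the normal model, Eq. (4.25)
    z = (t - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / (1.0 - cdf)

mu, sigma = 25.0, 3.03   # k-miles, from Example 4-9
print(round(normal_hazard(20.0, mu, sigma), 3))   # ~0.036 per k-mile
print(round(normal_hazard(25.0, mu, sigma), 3))   # ~0.263 per k-mile
```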

Log-Normal Distribution and Its Failure Modes. The log-normal time-to-failure distribution is explicitly expressed in (3.33); it is a skewed function, the degree of skew dictated by the parameter ωo. When ωo → 1, the pdf g(t) approaches an exponential function; when ωo → 0.1, g(t) approaches the normal function. Thus, the failure rate function corresponding to ωo → 1 is of the "constant rate" mode, while that corresponding to ωo → 0.1 is of the "wear-out" mode. Hence, we see that ωo affects the failure mode.

According to (4.1) and with the substitution of (3.33), we obtain a general expression for the failure rate function:

λ(t) = [1/(ωo t √(2π))] exp[-(1/2)(ln(t/to)/ωo)^2] / [1 - Φ(z)]  (4.26)

In the above, Φ(z) is the standardized CDF of g(t) via the transformation (see Eq. 3.35):

z = [ln(t/to)]/ωo  (4.27)

Example 4-10. Failure of a brand of shock absorbers used in passenger cars is described by the log-normal function. Field data show that 90% of the shock absorbers fail between 120 k-miles and 180 k-miles. What is the failure rate of the shock absorbers at t = 150 k-miles? Solution: Given that g(t) is log-normal, the parameters to and ωo can be determined from the field data. The standardized CDF of g(t) is Φ(z), where z is expressed in (4.27). From the field data, we have Φ(z120) = 0.05 and Φ(z180) = 0.95.

Using Appendix III-A and then (4.27), we find:

z120 = ln(120/to)/ωo = -1.645
z180 = ln(180/to)/ωo = 1.645

From there, we solve for the log-normal parameters to and ωo:

to = 147 k-miles and ωo = 0.1232

Then, the failure rate function is given by (4.26). At t = 150 k-miles, λ(150) ≈ 0.049 per k-mile. Compare this value to λ(120) ≈ 0.007/k-mile and λ(to) = λ(147) ≈ 0.044/k-mile; we see that λ(t) increases rapidly once t > to. In this case, ωo = 0.1232, which is close to 0.1; so the distribution is almost normal-like. Discussion: In general, the failure rate for the log-normal is of the wear-out mode; the rate increases sharply once t is greater than to. But if ωo → 1, the log-normal reduces to the exponential, and the failure rate is then constant in time (see Chapter III, section III-4).
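The same check can be coded for the log-normal rate (4.26); the sketch below uses the to and ωo found above and reproduces the three quoted rates to rounding.

```python
import math

def lognormal_hazard(t, t0, w0):
    # lambda(t) for the log-normal model, Eq. (4.26), with z from Eq. (4.27)
    z = math.log(t / t0) / w0
    pdf = math.exp(-0.5 * z * z) / (w0 * t * math.sqrt(2.0 * math.pi))
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / (1.0 - cdf)

t0, w0 = 147.0, 0.1232   # k-miles, from Example 4-10
for t in (120.0, 147.0, 150.0):
    print(t, round(lognormal_hazard(t, t0, w0), 4))
# ~0.007, ~0.044 and ~0.049 per k-mile: the rate climbs quickly past t0
```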

Weibull Distribution and Failure Modes. Following the same procedures as before, the failure


rate function for the Weibull distribution can be obtained from using (3.37) and (4.1):

λ(t) = (m/θ)(t/θ)^(m-1)  (4.28)

The behavior of λ(t) thus depends on the values of the Weibull parameters θ and m. In particular, when 0<m<1, λ(t) is a decreasing function of t, representing the wear-in mode; when m=1, λ(t) is simply a constant, representing the random failure mode (or the youth mode); when m>1, λ(t) is an increasing function of time, representing the wear-out mode. We can see this in the following example.

Example 4-11. A hearing aid has the time-to-failure described by the Weibull distribution:

f(t) = (m/θ)(t/θ)^(m-1) exp[-(t/θ)^m]

The Weibull CDF is F(t) = 1 - exp[-(t/θ)^m]; the reliability function is R(t) = exp[-(t/θ)^m]. The corresponding failure rate function is then found from (4.1). Thus, we have

λ(t) = f(t)/R(t) = (m/θ)(t/θ)^(m-1)

Thus, if the shape parameter m = 0.5 and the scale parameter θ = 180 days, then λ(t) = 0.0373 t^(-0.5) per day. This is a decreasing function of time, representing the wear-in failure mode. If we increase the value of m to 1.5, we obtain λ(t) = 0.00062 t^(0.5). This is an increasing function, representing the wear-out mode. Similarly, if we set m = 1, λ(t) = 1/θ, which is a constant. Discussion: The parameter m in the Weibull function controls the shape of f(t), hence of the reliability function R(t) as well as the failure rate function λ(t). The figures below illustrate the shapes of f(t), R(t) and λ(t) for m = 0.5, 1.0, 2.0 and 4.0:

[Sketch: three panels showing the shapes of f(t), R(t) and λ(t) for m = 0.5, 1, 2 and 4; λ(t) decreases for m = 0.5, is constant for m = 1, and increases for m = 2 and 4.]
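The three modes governed by m are easy to see numerically; a small Python sketch of (4.28), with θ = 180 days as in Example 4-11:

```python
def weibull_hazard(t, m, theta):
    # lambda(t) = (m/theta)(t/theta)^(m-1), Eq. (4.28)
    return (m / theta) * (t / theta) ** (m - 1.0)

theta = 180.0   # days, as in Example 4-11
for m in (0.5, 1.0, 1.5):
    rates = [round(weibull_hazard(t, m, theta), 5) for t in (1.0, 10.0, 100.0)]
    print(m, rates)
# m=0.5: decreasing (wear-in); m=1: constant; m=1.5: increasing (wear-out)
```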

Example 4-12. For the hearing-aid considered in the previous example, the failure rate function is:

λ(t) = 0.0373 t^(-0.5)


where t is in days; the associated failure mode is that of wear-in. The plot below illustrates the behavior of λ(t) as a function of t:

[Sketch: the decreasing wear-in rate λ(t) versus t in days, passing through λ(1) ≈ 0.0373 per day.]

It is seen that the product is inherently defect-laden, as the failure rate during the first day of service is more than 3% per day. Suppose that the manufacturer conducts a proof-test on the hearing-aids so as to screen out the infancy failures, and the new pdf for the time-to-failure of the screened hearing-aids is again a Weibull function but with m = 1.5 and θ = 180 days. Then, according to (4.28), the new failure rate function is:

λ(t) = 6.21x10^(-4) t^(0.5)

The new failure rate is now an increasing function of t, representing the wear-out mode. And, it can be shown that the failure rate of the screened product at day 1 is only 0.0621%/day. However, the failure rate increases rapidly as it goes beyond 100 days:

[Sketch: the increasing wear-out rate λ(t) versus t in days, reaching about 0.00621 per day near t = 100 days.]

Model for Bath-Tub-Like Failure Rates. From the above, it is seen that the Weibull function can describe the wear-in (m<1), constant-rate (m=1) and wear-out (m>1) modes. Thus, it can be used to describe the entire bath-tub curve with a combined failure rate function in the general form:

λ(t) = (ma/θa)(t/θa)^(ma-1) + (mb/θb)(t/θb)^(mb-1) + (mc/θc)(t/θc)^(mc-1)  (4.29)

where 0 < ma < 1, mb = 1 and mc > 1; also θa < θb < θc.

Note that the first term in (4.29) decreases with t, representing the infantile (wear-in) mode; the second term is constant in t, hence the youth mode; and the last term increases with t, representing the wear-out mode. A proper choice of the Weibull parameters m's and θ's in (4.29) will yield a smooth bathtub curve for λ(t). This is illustrated by the example below:

Example 4-13. Suppose that the failure rate function for a cutting knife behaves like a bathtub curve; its wear-in period is described by λ(t) = (m/θ)(t/θ)^(m-1) with m = 0.2 and θ = 10 days; the constant-rate mode by λ(t) = 1/100 per day; and the wear-out mode by λ(t) = (m/θ)(t/θ)^(m-1) with m = 2.5 and θ = 120 days.


The combined failure rate function is thus:

λ(t) = 0.1262 t^(-0.8) + 0.01 + 1.58x10^(-5) t^(1.5) per day

[Plot: λ(t) per day versus t from 0 to 180 days, on a scale of 0 to 0.08 per day, showing the wear-in, constant-rate and wear-out components and the combined bathtub-shaped λ(t).]

Discussion: The failure rate behavior of the cutting knife does not have a significant constant-rate period; the wear-in period is short (about 25 days) while the wear-out period is long.
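For a numerical feel of the bathtub behavior, the combined rate of Example 4-13 can be tabulated directly; a minimal Python sketch:

```python
def bathtub_rate(t):
    # combined Weibull rates of Example 4-13, per day (t in days)
    return 0.1262 * t ** (-0.8) + 0.01 + 1.58e-5 * t ** 1.5

for t in (1, 10, 25, 60, 120, 180):
    print(t, round(bathtub_rate(t), 4))
# high during the first days (wear-in), flattens near mid-life,
# then climbs again past roughly 120 days (wear-out)
```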

IV-4. Failure Rates for Systems of Multiple Units. Almost all engineering systems are complex combinations of multiple units or sub-components; in general, each unit or sub-component has its own failure rate. When functioning in the "system", one or more of the units can fail, which may or may not cause system failure. Of course, failure of a unit or sub-component can reduce the reliability of the system even if the system remains safe. Analysis of the system's failure rate can often be conducted by representing the system as a combination of two elementary models: the "in-series" and "in-parallel" models. The In-Series Model. Let the system S contain N sub-components Xi, i = 1,N, connected in series as a chain:

X1 - X2 - X3 - . . . - XN-1 - XN

Let fi(t) be the time-to-failure pdf of Xi; then the corresponding failure rate is λi(t) = fi(t)/Ri(t), see (4.1), where the reliability function Ri(t) is given by (4.2):

Ri(t) = exp[-∫(0 to t) λi(ξ)dξ]  (4.30)

Here, the N sub-components are arranged in series; failure of one or more of the components will


cause failure of the whole system. Thus, according to (2.9), the failure probability of the system is represented by the union of all sub-component failure events Xi:

P{Ssy} = P{X1 ∪ X2 ∪ X3 ∪ . . . ∪ XN}

Alternatively, system reliability requires the non-failure (X'i), with reliability Ri(t), of each and every sub-component; thus, using (2.10):

Rsy = P{X'1 ∩ X'2 ∩ X'3 ∩ . . . ∩ X'N}

If each sub-component failure is independent of any other sub-component failure, the system reliability reduces to:

Rsy(t) = R1(t)·R2(t)·R3(t) . . . RN(t)  (4.31)

Upon substituting (4.30) into (4.31), we obtain:

Rsy(t) = exp[-∫(0 to t) λ1(ξ)dξ] · exp[-∫(0 to t) λ2(ξ)dξ] . . . exp[-∫(0 to t) λN(ξ)dξ]

       = exp[-∫(0 to t) (λ1 + λ2 + . . . + λN)dξ] = exp[-∫(0 to t) λsy(ξ)dξ]  (4.32)

In the above expression, λsy(t) is the "system failure rate", which is defined by

λsy (t) = Σλi(t); sum over i = 1,N (4.33)

Discussion: The in-series model is often referred to as the "weakest-link" model, as first discussed in Chapter III, Section III-5. In particular, if Xi = X for all i = 1,N, (4.33) reduces to:

λsy(t) = Nλ(t).

Most engineering systems are more complex than the in-series model; but the model can at least provide a lower-bound estimate for the system's reliability. Note that this lower bound cannot be better than the reliability of the poorest sub-component in the system.

Example 4-14. N identical sub-components are arranged in series; the time-to-failure pdf for each of the sub-components is described by the Weibull function, with the parameters m and θ. Determine (a) The system failure rate; and (b) The time-to-failure pdf for the system. Solution: The failure rate function of the sub-components is given by (4.28):

λ(t) = (m/θ)(t/θ)^(m-1)


Hence, the system failure rate follows from (4.33):

λsy(t) = N(m/θ)(t/θ)^(m-1) = (m/θN)(t/θN)^(m-1)

where

θN = θ/N^(1/m)

Note that the system failure rate function has the same form as that of the sub-component, so the system time-to-failure pdf is also a Weibull function, but with the parameters m and θN. In essence, the in-series arrangement of N units shifts the parameter θ of the unit to θN; the latter is much reduced, depending on the value of N, while the parameter m remains unchanged regardless of the value of N.

Example 4-15. A computer circuit board is made of a total of 67 components in 16 different categories. The failure rate of each of the 16 categories is listed in the table below: the first column lists the 16 categories; the second column indicates the number of components (n) in each category; the third column lists the component failure rate λ (= constant); and the fourth column is the cumulative failure rate nλ of the n components in each category.

Component type      Number of units, n   Unit failure rate, λ   Cumulative failure rate, nλ
-------------------------------------------------------------------------------------------
type-1 capacitor    1                    0.0027 x10^-6/hr       0.0027 x10^-6/hr
type-2 capacitor    19                   0.0025                 0.0475
resistor            5                    0.0002                 0.0010
flip-flop           9                    0.4667                 4.2003
nand gate           5                    0.2456                 1.2286
diff. receiver      3                    0.2738                 0.8214
dual gate           2                    0.2107                 0.4214
quad gate           7                    0.2738                 1.9166
hex inverter        5                    0.3196                 1.5980
shift register      4                    0.8847                 3.5388
quad buffer         1                    0.2738                 0.2738
4-bit shifter       1                    0.8035                 0.8035
± inverter          1                    0.3196                 0.3196
connector           1                    4.3490                 4.3490
wiring board        1                    1.5870                 1.5870
solder connector    1                    0.2328                 0.2328
-------------------------------------------------------------------------------------------
Total units: N = 67                      Sum of all failure rates = 21.672 x10^-6/hr

Here, the sum of all failure rates in the fourth column is the in-series system failure rate, which yields the lower-bound system reliability:

λsy = 21.672x10^-6/hr

Based on this lower-bound estimate, the system reliability is:

Rsy = exp[-21.672x10^-6 t]

And, the system mean-time-to-failure is: MTTF=1/λsy= 46142 hrs.


Since λsy is a constant, the system failure is of the random mode; and the time-to-failure probability

of the system is described by the exponential function.
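The in-series bookkeeping of (4.33) is trivially automated; the Python sketch below applies it to the first four rows of the table above as an illustration (the full table is handled the same way).

```python
import math

def series_system(units):
    # constant-rate in-series model: lambda_sy = sum of n_i * lambda_i, Eq. (4.33)
    lam = sum(n * lam_i for n, lam_i in units)
    return lam, 1.0 / lam

# (number of units, unit failure rate per hour): first four table rows only
units = [(1, 0.0027e-6), (19, 0.0025e-6), (5, 0.0002e-6), (9, 0.4667e-6)]
lam_sy, mttf = series_system(units)
print(lam_sy * 1e6, round(mttf))    # partial system rate (x10^-6/hr) and MTTF
print(math.exp(-lam_sy * 10000))    # reliability over 10000 hours
```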

The In-Parallel Model and the Bundle Theory. Suppose that the system S contains N sub-components, denoted by Xi, i = 1,N, which are arranged in-parallel as depicted below:

[Sketch: N sub-components X1, X2, ..., XN arranged in parallel between the input and the output.]

In this case, failure of one sub-component may or may not cause failure of the system. Indeed, this is a system with a degree of "redundancy", or with a "fail-safe" feature. In general, a certain "load-sharing mechanism" is built into the in-parallel system; namely, the load carried by a sub-component that fails will be "shared" by the un-failed ones according to the load-sharing mechanism. In addition, if the failure probability of the un-failed sub-components depends on that of the failed ones, a "conditional probability" situation results. Hence, in order to evaluate the system reliability of in-parallel models, the following input information is needed in addition to the reliability functions (Ri's) of the sub-components: (a) the load-sharing mechanism in the in-parallel arrangement; and (b) the conditional failure probability between any two sub-components. This can often result in mathematically complex formulations. However, if there is no specifically built-in load-sharing mechanism and if failure of one sub-component does not depend on that of the others, the mathematical complexity of the in-parallel model is greatly reduced. In fact, the system reliability can be readily found as:

Rsy(t) = 1 - [1-R1(t)][1-R2(t)][1-R3(t)] . . . [1-RN(t)]  (4.34)

The above result is based on the assumption that the system is safe as long as one sub-component is safe under the system loading. In fact, the term [1-R1(t)][1-R2(t)] . . . [1-RN(t)] in (4.34) is the probability that all N components fail. Note that Rsy in (4.34) is better than, or at least equal to, the best of the Ri's. Hence, (4.34) is the upper bound for systems of N sub-components regardless of their arrangement.

Example 4-16. The reliability of a pressure valve during a specific service period is Ro=0.8. If two such valves are arranged in parallel, evaluate the system reliability for the service period. Solution: The system reliability is readily given by (4.34), if failure of one valve does not affect the


failure of the other:

Rsy = 1 - (1-0.8)(1-0.8) = 0.96

Discussion: If three valves are arranged in parallel, the system reliability is improved further:

Rsy = 1 - (1-0.8)(1-0.8)(1-0.8) = 0.992

Thus, a parallel structure improves system reliability.

Example 4-17. Suppose the valves in the previous example have the time-dependent reliability given by R(t) = exp[-λοt]. Then, the reliability of the system with two valves in parallel is:

Rsy(t) = 1 - {1 - exp[-λοt]}^2 = 2exp[-λοt] - exp[-2λοt]

If N valves are arranged in parallel and at least M (M<N) of the valves must be reliable during a particular application, the system reliability can be determined from the "binomial distribution" (see section II-2 for reference to the binomial distribution):

Rsy(t) = Σ C(N,i) [R(t)]^i [1-R(t)]^(N-i);  sum over i = M, . . ., N
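The binomial sum is a one-liner in code; the sketch below evaluates the M-out-of-N reliability for a fixed-time R, reproducing the two-valve and three-valve results of Example 4-16 as special cases (M = 1).

```python
import math

def m_of_n(r, n, m):
    # reliability that at least m of n identical, independent units survive
    return sum(math.comb(n, i) * r**i * (1.0 - r)**(n - i)
               for i in range(m, n + 1))

r = 0.8                                   # per-valve reliability, Example 4-16
print(round(m_of_n(r, 2, 1), 3))          # 1-out-of-2 in parallel: 0.96
print(round(m_of_n(r, 3, 1), 3))          # 1-out-of-3 in parallel: 0.992
print(round(m_of_n(r, 5, 3), 4))          # at least 3 of 5 must hold: ~0.9421
```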

Series-Parallel Combination Model. Many engineering systems are configured as a network of in-series and in-parallel units; each unit may contain sub-components that are in turn arranged in-series and/or in-parallel. If the reliability function, or the failure rate function, of each component in the units is given, one can evaluate the following for the system:

(a) The lower bound for the system Rsy, using (4.32);
(b) The upper bound for the system Rsy, using (4.34); and/or
(c) The exact system Rsy, using the network reduction technique; this is discussed in Example 4-18 below.

Example 4-18. A system is made of 7 sub-components arranged in a network as shown below. For the specific service duration, the reliability value of each sub-component is known as indicated:

[Network diagram: from the input I, a 0.9 component leads to node A; from A, an upper branch runs through B with a 0.9 and a 0.8 component in series, followed by two 0.75 components in parallel between B and D; a lower branch runs from A through C with a 0.9 and a 0.8 component in series; both branches rejoin at the output O.]


The system is a relatively simple network of in-series and in-parallel units; the lower and upper bounds for the system reliability can be readily computed using (4.31) and (4.34), respectively; the "exact" system reliability Rsy can be evaluated by the network reduction technique.

(a) The lower bound, by (4.31):

(Rsy)lb = (0.9)^3 (0.8)^2 (0.75)^2 = 0.26244

(b) The upper bound, by (4.34):

(Rsy)ub = 1 - [(1-0.9)^3 (1-0.8)^2 (1-0.75)^2] = 0.9999975

(c) The exact Rsy, by the network reduction technique:
* The in-parallel unit from B to D is replaced by a single equivalent component with RBD = 1 - (1-0.75)^2 = 0.9375
* The in-series unit from A to B to D is replaced by a single equivalent component with RABD = (0.9)(0.8)(0.9375) = 0.675
* The in-series unit from A to C is replaced by a single equivalent component with RAC = (0.9)(0.8) = 0.72
* The in-parallel unit from A to O is replaced by a single equivalent component with RAO = 1 - (1-0.675)(1-0.72) = 0.909
* The system reliability is determined by the in-series unit from I to O: Rsy = (0.9)(0.909) = 0.8181.

Discussion: This network has 2 levels of in-parallel units: one from B to D and the other from A to O. Such a system is said to have a much greater degree of redundancy than the all-in-series system. The "exact" reliability in this case is much higher than the lower bound; yet it is also substantially lower than the upper bound.
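The network reduction of Example 4-18 can be scripted with two small helpers for independent series and parallel blocks; this is a sketch of the bookkeeping only, not a general network solver.

```python
def series(*rs):
    # reliability of independent components in series: product of the R's
    p = 1.0
    for r in rs:
        p *= r
    return p

def parallel(*rs):
    # reliability of independent components in parallel: 1 - product of (1-R)'s
    q = 1.0
    for r in rs:
        q *= (1.0 - r)
    return 1.0 - q

r_bd  = parallel(0.75, 0.75)      # B-to-D pair: 0.9375
r_abd = series(0.9, 0.8, r_bd)    # upper branch A-B-D: 0.675
r_ac  = series(0.9, 0.8)          # lower branch A-C: 0.72
r_ao  = parallel(r_abd, r_ac)     # branches rejoined: 0.909
print(series(0.9, r_ao))          # whole system, I to O: 0.8181
```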

The Bundle Theory. The "bundle" theory (due to Daniels, 1945) refers to a system of many sub-components arranged in-parallel. Originally, Daniels considered a loose bundle of N "identical" threads. If the bundle is loaded by a tensile force T, the tensile load on each thread is assumed to be x = T/N; when one of the threads fails, the remaining (N-1) threads share the bundle load (Nx) equally, so the tension on each thread increases to Nx/(N-1); the increased tension may in turn increase the failure probability of each surviving thread. If one or more of the threads fail again, the rest of the threads continue to share the bundle load equally; this, of course, may further increase the probability of failure of the surviving ones. Clearly, the failure probability of the bundle depends on the thread strength distribution and on the assumed load-sharing mechanism, namely that the applied bundle load is always shared equally by the surviving threads. This assumption helps to reduce the complexity of the problem (if the bundle of threads is twisted into a rope or cast into a binder matrix material, the load-sharing mechanism would be much more complex). Now, let the tensile strength of a thread be the random variable X. Then, the probability that a typical thread fails at or before X ≤ x is denoted by:

F(x) = P{X ≤ x}  (4.35)

Now, let the random variable TB be the failure load of the bundle; then XB = TB/N represents the "averaged" thread tension at the outset, when no thread has failed before TB. Note that X is the strength of a thread, while TB is the strength of the bundle. When XB reaches x, X also reaches x if no thread has failed; X reaches TB/(N-1) when one of the threads fails; similarly, X reaches TB/(N-i) when i of the threads fail. The probability that the bundle fails at the average thread strength XB is denoted by:

FB(x) = P{XB ≤ x}  (4.36)

Thus, given the thread strength CDF F(x), Daniels' theory leads to the bundle strength CDF FB(x):

FB(x) = Σ(n=1 to N) Σ(r) (-1)^(N-n) N! [F(Nx/r1)]^r1 [F(Nx/(r1+r2))]^r2 . . . [F(x)]^rn / (r1! r2! . . . rn!)  (4.37)

In the above, the inner sum is taken over r = (r1, r2, . . ., rn), where r1, r2, . . ., rn are integers equal to or greater than 1; their combination is subject to the condition:

Σ(i=1 to n) ri = N  (4.38)

For a bundle of only 2 threads (N=2), the running number n can only be 1 or 2. For n=1, there can only be r1 = N = 2; for n=2, there can only be r1 = 1 and r2 = 1. Accordingly, expanding (4.37) yields the following expression for the CDF of the bundle strength:

FB(x) = 2F(2x)F(x) - F(x)^2  (4.39)

Expansion of (4.37) for a bundle of 3 threads (N=3) is similar but considerably more tedious; details are left to one of the assigned exercises. Beyond N=3, expansion of (4.37) becomes unmanageable. However, when the value of N becomes large, say N>10, Daniels showed that FB in (4.37) reduces to the CDF of a normal distribution. In that case, the normal parameters in FB, namely µB and σB, can be expressed in terms of the parameters in the pdf of the individual threads (which need not be normal). That part of the derivation, however, is outside the scope of this chapter.


Example 4-19. Suppose the tensile strength X (in GPa) of a single fiber is given by the CDF

F(x) = P{X ≤ x} = 1 - exp[-(x/8)^7]

Then, at 1% failure probability, the maximum applied fiber stress is determined from

F(x) = 0.01 = 1 - exp[-(x/8)^7]

Solution of the above yields x = 4.15 GPa. Now, if two fibers are bundled together "loosely", the maximum applied bundle stress at 1% failure probability is determined via (4.39):

FB(x) = 0.01 = 2{1 - exp[-(2x/8)^7]}{1 - exp[-(x/8)^7]} - {1 - exp[-(x/8)^7]}^2

The above yields x ≈ 4.02 GPa.

Discussion: From this example, the strength of the "loose" bundle is lower than that of the single fiber. Alternatively, the probability of failure of a bundle of N loose fibers under T = Nx is actually greater than that of a single fiber under the load x. Note: The CDF in (4.39) for the bundle of 2 fibers was actually obtained earlier in Chapter III, Example 3.13, case (c). In the latter, the random variable X = x is the total load on the bundle; the load on each fiber is thus x/2 when no fiber fails, while the load on the bundle and on the surviving fiber is x when one fiber fails.
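Both threshold stresses of Example 4-19 can be recovered by bisection on the respective CDFs; the following Python sketch does just that (the bracket [0, 8] GPa is an assumption chosen simply to contain the roots).

```python
import math

def F(x):
    # single-fiber strength CDF of Example 4-19 (x in GPa)
    return 1.0 - math.exp(-((x / 8.0) ** 7))

def FB2(x):
    # two-fiber loose-bundle CDF, Eq. (4.39); x is the average fiber stress
    return 2.0 * F(2.0 * x) * F(x) - F(x) ** 2

def invert(cdf, p, lo=0.0, hi=8.0, iters=60):
    # bisection for the stress x at which cdf(x) = p
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)

print(round(invert(F, 0.01), 2))    # single fiber: ~4.15 GPa
print(round(invert(FB2, 0.01), 2))  # two-fiber bundle: ~4.0 GPa (cf. 4.02 above)
```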

Failure Rate of a Loose Bundle. If the random variable XB in (4.36) and (4.37) is a time variable, one can simply replace x by t to obtain fB(t) and FB(t); the corresponding failure rate function for the bundle, λB(t), is then obtained from (4.1):

λB(t) = [dFB(t)/dt]/[1 - FB(t)] = fB(t)/RB(t)  (4.40)


Summary: This chapter introduces the reliability function R(t) = 1 - F(t), where F(t) is the probability of failure during the time period 0 to t. An alternative expression for R(t) is through the failure rate function λ(t), as expressed in (4.2). The reasons for expressing R(t) in terms of λ(t) instead of f(t) or F(t) are several:

(a) The time-profile of λ(t) reveals the physical modes in which failure occurs: namely, the wear-in mode, the constant-rate mode and the wear-out mode; knowledge of the failure mode helps to devise proper reliability enhancement measures;

(b) It is generally less time-consuming to collect failure-rate test data than time-to-failure data; and

(c) It is mathematically simpler to formulate reliability models in terms of failure rates, especially for systems of multiple units.

In studying this chapter and when working on the assigned exercise problems, pay attention to the following areas:

• The physical meaning of and the mathematical relations between the time-to-failure pdf f(t) and the failure-rate function λ(t), see (4.1) to (4.4).

• The mode of λ(t), whether it is in the wear-in mode, constant-rate mode or wear-out mode. Failure

mode dictates the approach taken to address the reliability concern; for instance, the wear-in mode is mainly an off-line quality-control issue, the wear-out mode is a repair-replacement issue, and the constant-rate mode is a sale-warranty-inventory issue.

• The time-dependent wear-in and wear-out modes. Each is attributable to distinctive causes; briefly, the former is related to early failures due to "birth defects", while the latter is related to long-term failures due to accumulated damage (fatigue); both are intrinsic factors.

• The time-independent constant-rate mode. Physically, it implies that the product has no major inherent defects and has incurred no appreciable damage; failure occurs due to random extrinsic factors, such as accidental overloading, service misuse, etc.

• The reliability models discussed in this chapter. Most are based on constant-rate(s), only for reasons

of mathematical simplicity; these include “unit under repeated demands”, “unit of step-wise rates”, and “regularly maintained unit”.

• Models for time-dependent rates. This topic is narrowly discussed; but pay attention to the concept of

proof-test, and the failure modes implied in the familiar normal, log-normal and Weibull functions.

• Systems of multiple units of constant rates. These are modeled only by assuming failures of the units are independent events, again for reasons of simplicity; the bundle theory discussed briefly at the end of the chapter is an exception, only to illustrate the mathematical complexity involved in modeling such systems.


Assigned Homework. 4.1 Fit the test data given in example 4-1 to a failure rate function.

(a) Comment on the mode of failure; (b) Compute the value of R(1.5); (c) Compute the mean-time-to-failure.

4.2 Given λ(t) = kt, where k is a positive constant and t ≥ 0:

(a) What is the associated failure mode? (b) Determine the associated f(t), F(t) and R(t) in terms of k;

(c) Sketch f(t), F(t) and R(t) on graph paper, if t is in hours and k = 0.1/hr^2.

4.3 The pdf for the time-to-failure (in hours) of a system has the following form: f(t) = 0.001 exp[-0.001t]

(a) Determine the corresponding failure rate function; (b) What is the associated mode of failure? (c) What is the reliability R(t) when t=100 hours? (d) What is the value for the MTTF?

4.4 The reliability function of a machine is described by

R(t) = exp[-0.04t - 0.008t^2], t in years

(a) What is the corresponding failure rate function? (b) What is the associated mode of failure? (c) What is the design life for a reliability of at least 90%?

[(a) λ(t) = 0.04+0.016t (c) t=1.907 yrs.]

4.5 A device that controls a machine has a constant failure rate of 0.7/yr; in order for the machine to function normally for a long time, the device must be repaired at once upon failure. Then

(a) What is the probability that the device fails during the first year of operation? (b) What is the probability that the device fails during the second year of operation? (c) What is the probability that there will be at least one repair in 3 years of operation?

[(b) 50.34% (c) 87.75%]

4.6 The failure rate of a circuit is constant in time with the value of 0.04/day. If 10 circuits are put in continuous and independent use,

(a) What is the probability that there is no failure during the first day of use? (b) What is the probability that there is exactly one failure during the first day of use? (c) What is the probability that there is more than one failure during the first day of use? (d) If the circuits are repaired upon failure, what is the probability of more than one failure in 3 days?

[(a) 67% (c) 6.16% (d) 33.74%]

4.7 The landing gear of an airplane has the constant failure rate λp = 0.000001/hr when parked, λt = 0.00001/hr when taxiing on the runway, and λf = 0.0000001/hr when in flight. During take-off, it has the


failure probability p1 = 0.0001; during landing, it has the failure probability p2 = 0.0002. The typical flight schedule for this plane in a 24-hour day is: 4 take-offs and landings; the average taxiing time before take-off and after landing is 30 minutes; and the average flying time between take-off and landing is 2.5 hours.

(a) Develop an expression for the combined failure rate similar to (4.11). (b) Estimate the reliability of the landing gear in a 30-day service. (c) How often (in hours) should the landing gear be maintained as new for 99% reliability?

4.8 The time-to-failure pdf of an electric clock is normally distributed, with µ=5 years and σ=0.8 years.

(a) Determine the design life for 99% reliability; (b) What is the reliability that the clock will run for 3 years? (c) What is the failure rate if the clock has successfully served 3 years without failure? (d) What is the probability that the clock will run for another 3 years without failure? [(a) t = 3.144 yrs. (b) R(3) = 99.38%] [(c) λ(3) = 0.022/yr. (d) F(3/3) = 10.63%]

4.9 A motion-sensing floodlight is turned on when it senses motion in the dark; it stays lit for 5 minutes, then turns off if no further motion is detected. On average, the floodlight is turned on 12 times each night. It has a failure probability of p = 0.0005 each time it is turned on, a failure rate of 0.0001/hr while lit, and 0.00001/hr while off.

(a) What is the combined failure rate for a 24-hours cycle? (b) What is the reliability for 30 days of service? (c) How many spares should be kept for the period from November 1st to March 31st for 99%

reliability? [(a) λcomb = 0.0002638/hr (b) 17.3% (c) 4 spares]

4.10 A cutting tool has an MTBF of 2000 hours. Assume random failure:

(a) What is the reliability for a 500-hour mission? (b) What is the chance that at least one repair is needed during the 500-hr mission? (c) What is the probability that more than one repair is needed during the 500-hr mission? (d) What is the probability that more than two repairs are needed during the 500-hr mission? [(a) R(500) = 77.88% (b) P(n>0/500) = 22.12%] [(c) P(n>1/500) = 2.65% (d) P(n>2/500) = 0.22%]

4.11 A relay circuit has an MTBF of 0.8 years. Assuming random failure,

(a) What is the reliability for one year of service? (b) If two relays are arranged in parallel, what is the reliability for one year of service? (c) If five relays are arranged in parallel, what is the reliability for one year of service? Do this part by expanding (4.34) for N=5.

[(a) R(1) = 28.65%, or F1(1) = 71.35% (b) F2(1) = 50.9%]

4.12 A widget has the time-to-failure probability described by a Weibull distribution, with MTTF of 5 days and standard deviation of 1.2 days. If 10000 widgets are proof-tested for 1 day before they are put into actual service,

(a) Determine the failure rate function for the virgin widgets; plot the function for 0 < t < 10 days; (b) Determine the failure rate function for those that passed the proof-test;


(c) What is the expected number of failures during the proof-test? (d) What is the expected number of failures during the first day of the widgets in actual service? Hint: first determine numerically the parameters m and θ of the Weibull pdf. [(b) λ(1) = 0.0015/day (c) F(1) = 0.03%; 3 will fail the proof test] [(d) P{1<t<2} = 0.81%; or about 81 failures out of 10000.]

4.13 A designer assumes a 90% probability that a new machine will fail between 2 and 10 years in service.

(a) Fit a log-normal distribution to this design; (b) Compute the MTTF; (c) Obtain the associated failure rate function; (d) Plot the function for 0 < t < 10 years. [(a) to = 4.47; ωo = 0.49 (b) MTTF = 5 years]

4.14 A unit consists of three identical components, each with reliability R; they can be connected either in a "star" or in a "delta", as shown below:

[Sketch: the three components of reliability R connected among nodes A, B and C, in a "star" arrangement and in a "delta" arrangement.]

By means of the in-series and/or in-parallel models, show that the reliability of the "delta" is better than the "star".

[R∆ − R∗ = R - R^3]

4.15 A network of 4 identical components, each with reliability Ro, is arranged in configuration A or B, as shown below:

[Sketch: configurations A and B, each a different series-parallel arrangement of the four Ro components.]

Derive the system reliability for configurations A and B, respectively


[Configuration B is better than A: RB/RA = (2-Ro^2)/(2-Ro)^2]

4.16 (Optional) Derive the CDF FB(x) of the bundle strength for a bundle of 3 loose fibers, given the CDF F(x) of the fiber strength. Use (4.37) and let N=3.

[FB(x) = F(x)^3 - 3F(3x)F(x)^2 - 3F(x)F(3x/2)^2 + 6F(3x)F(3x/2)F(x)]


CHAPTER V. RELIABILITY TESTING

One principal reason for investigating product reliability is to ensure and/or improve the quality of the product at par with the expected service life; but a full-scale reliability investigation can be costly. As a minimum, an adequate time-to-failure and/or failure-rate database is needed in order to determine the pertinent reliability function for the product in question. In general, such a database is generated either by a test program involving new products randomly picked off-line, or by in-service field-tests, or both. In either instance, the data so gathered reflect the quality of the product after it is already made off the production line; furthermore, the cost or the time needed to complete such a full-scale testing program can be prohibitive. For this reason, product engineers have devised tests that may (1) improve the quality before the product is made; (2) require only limited off-line tests; and/or (3) accelerate the time of the tests, provided that the data so gathered still capture the reliability characteristics of the product.

V-1. Reliability Enhancement. Reliability enhancement is practiced during the product design and development stage, before full-scale production. In this stage, product prototypes are constructed based on initial designs and put through simulated test-runs. Generally, first-round prototypes tend to fail early and frequently; if so, diagnostic actions are taken to identify the modes and/or mechanisms of failure; other deficient factors inherent in the initial design may also be identified. Design modification is subsequently implemented in the next round of prototype construction, and it is expected to enhance the product quality as well as its reliability. Clearly, the enhancement can be more or less effective depending on the efficiency of the "diagnostic" and "design-modification" cycles. A frequently practiced approach is to test-run the prototypes under "over-load" and/or "severe environment" conditions. The idea of the former is similar to that of "proof-testing"; it helps to identify factors that cause product infantile mortality. The latter is related to product quality that may degrade faster in a severe service environment. Both are ways to accelerate the product's time-to-failure process. The specifics of a reliability enhancement program often depend on the individual product in question; besides, discussion of "diagnostic" or "design-modification" details is beyond the scope of this chapter. In what follows, we briefly describe some of the techniques used in reliability enhancement tests.

The Duane Plot. To a large extent, infantile product failure is traceable to defects introduced in the design and development stage; it is hence essential to "debug" any possible defect that can be corrected in the design-modification cycle. The Duane Plot is an empirical method to achieve enhanced reliability through the debugging-modification iteration; the general procedure is as follows:

In the first round, one or more prototypes are constructed and test-run to failure. Suppose that the causes of failure are diagnosed and traced to deficiencies in the design; a design modification then follows to eradicate or minimize the deficiencies; a new round of prototypes is built and test-run again. If additional failures are discovered and their causes are ascertained, a new round of design modification is carried out. The test-design-modification cycle can be repeated several rounds until the design is rid of all possible defects. The Duane plot is an analysis tool to minimize the number of test-design-modification cycles, or the number of prototypes required in each test cycle.

Now, let τ be the accumulated "test time" during successive prototype test-runs; and let η(τ) be the number of failures observed during the period from 0 to τ. Though the value of η increases with τ, the frequency of failure decreases with each design-modification cycle. Hence, in general, the quantity η(τ)/τ, representing the averaged failure rate during the period from 0 to τ, decreases with τ. In particular, η(τ)/τ and τ maintain a linear relationship on a log-log scale:

ln[η(τ)/τ] = α ln(τ) + β  (5.1)

In the above, α and β are the coefficients defining the straight line. The linear behavior depicted in (5.1), empirically found by J. J. Duane (1964), is quite pervasive over a variety of products in the design and development stage, though it is not clear why the behavior should be linear. Use of (5.1) is illustrated in the following example; in particular, the meaning of the coefficient α will become evident.

Example 5-1. A computer circuit is developed to interface with a complex mechanical device. Failures of the circuit occur during test-runs; debugging and design modification are made, and a new round of test-runs is conducted. In this case, 8 cycles are repeated with one prototype tested in each cycle. The following are the data for nearly 240 hours of test-runs, in which 8 failures occurred:

# failures η:                  1     2     3     4      5      6      7      8
cumulative test time τ (hrs):  1.1   3.9   8.2   17.8   79.7   113.1  208.4  239.1

[The associated Duane plot: ln[η(τ)/τ] versus ln(τ); the data points fall on a straight line of slope α.]


Significance of the plot: We observe that the data points in the plot follow a straight line, with correlation factor r^2 = 0.988. In fact, the slope of the line is found to be α = −0.654. The meaning of α is interesting; −α is known as the reliability growth coefficient. Graphically, α is the slope of the straight line, and its value lies within the interval (0,−1). Physically, the value of −α is a measure of the efficiency of the design-modification cycle. When α = 0, the line is flat, and the number of failures is proportional to the cumulative time τ; this means that no reliability enhancement is achieved. In that case, two possible explanations exist: (a) failure occurs not due to defects in the design; or (b) the design-modification cycle is inefficient, and defects are not eliminated. Usually, the latter is the case. When −α → 1, the number of failures approaches a constant, independent of the length of the test-run; hence the first round of design modification has already reached near perfection. In this example, the reliability growth coefficient is 0.654, which indicates that the design modification is relatively effective. Furthermore, the last two data points appear to be unnecessary, as the first 6 data points would have been sufficient to establish the same straight line, i.e. the same α value; additional growth test-runs would not add much improvement, but only more cost and time.
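The least-squares fit behind the Duane plot is short enough to write out in full; the Python sketch below recovers α ≈ −0.654 from the data of Example 5-1.

```python
import math

# Example 5-1: cumulative test time (hrs) at each successive failure
tau = [1.1, 3.9, 8.2, 17.8, 79.7, 113.1, 208.4, 239.1]
x = [math.log(t) for t in tau]                          # ln(tau)
y = [math.log((i + 1) / t) for i, t in enumerate(tau)]  # ln[eta(tau)/tau]

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
alpha = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / \
        sum((a - xm) ** 2 for a in x)
beta = ym - alpha * xm
print(round(alpha, 3), round(beta, 3))   # slope alpha ~ -0.654
```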

Environmental Stress Test. Engineering materials, such as polymer-based plastics, are prone to degradation by environmental agents; the latter can include temperature change, humidity and vapor invasion, long-term cyclic loading, rusting due to oxidation, ultra-violet light exposure, etc. Generally, environmental effects are insidious in nature and accumulate over a long period of time; product reliability enhancement against these environmental effects can raise the level of product quality, especially during the product infancy period. A procedure known as the environmental stress test is commonly employed during the product design and development stage. Usually, one or more selected environmental variables are introduced into the reliability growth test along with the normally applied mechanical loading. For instance, temperature conditioning may be superimposed onto mechanical stressing in prototype test-runs; time-to-failure can be accelerated by testing at a higher temperature. The latter is known as the time-temperature correspondence behavior. Thermal cycling and cyclic over-loading are other conditions that can be superimposed onto normal stressing during the test-runs. In addition to possible time savings, environmental stressing along with the normal loading provides a combined effect on product failure mechanisms and failure modes. Sometimes the combined effects are additive; more often they are coupled, depending on the specifics of the problem at hand.

V-2. Censored Reliability Test. Let us consider the following situations:

An electric motor is designed with a running life in the hundreds of millions of cycles. If the motor runs at 3600 rpm, it would take 19.3 days to reach 100 million cycles.


An automobile tire is designed for a life of about 40000 miles in normal use. In a simulated test-run, the tire accumulates 40 to 60 miles per hour; failure of a tire under such a test is not expected to occur for at least 600 to 1000 hours.

Censored reliability testing is a technique used to gather "time-to-failure" data from test specimens, some or most of which are suspended from the test before failure occurs. Sometimes, data from suspended tests may also come unexpectedly, due to a sudden interruption of the test apparatus or a power outage. Though a censored data point is not a time-to-failure point in the true sense, it nevertheless represents a time at which the specimen has lasted some amount of time without failure. Thus, censored data can be statistically significant if handled properly.

Censored Ungrouped Data. Let t1, t2, t3, . . ., ti+, . . ., tN be a sample of N "ranked" time-to-failure data. The ith datum ti+ with a "+" sign signifies that the test was censored (suspended) at ti without failure. Now, if no data were censored, we could fit the data with a probability distribution function, f(t) or F(t), using say the mean-rank method: F(ti) = i/(N+1). Since R(t) = 1 - F(t), we have:

R(ti) = (N+1-i)/(N+1)  (5.2)

From the above, we can write: R(ti-1) = (N+2-i)/(N+1) (5.3) Combining (5.2) and (5.3), we have R(ti) = [(N+1-i)/(N+2-i)]R(ti-1) (5.4)

In (5.4), R(ti-1) is the reliability at ti-1; R(ti) is that at ti. Hence, the quantity (N+1-i)/(N+2-i) is the

reliability between ti-1 and ti, given the reliability R(ti-1):

R(ti/ti-1) = (N+1-i)/(N+2-i) (5.5) With (5.5), (5.4) is just a statement based on (2.5): R(ti) = R(ti/ti-1)R(ti-1) (5.6)

Now, in the event that censoring takes place at ti (i.e. no failure takes place at ti), there is no change of reliability between ti-1 and ti. It then follows that R(ti) = R(ti-1); or


R(ti/ti-1) = 1  (5.7)

Between (5.5) and (5.7), we can state the following: if no censoring is taken at ti, R(ti/ti-1) is given by (5.5); if censoring is taken at ti, R(ti/ti-1) = 1, as shown in (5.7). If there is more than one censored datum taken before or at ti, the reliability R(ti) can be expressed in the general form:

R(ti) = R(ti/ti-1) R(ti-1/ti-2) R(ti-2/ti-3) . . . R(t1/0) R(0)  (5.8)

Note that the terms R(ti/ti-1), etc., must take the appropriate values, calculated from (5.5) if not censored or from (5.7) if censored. In (5.8), R(0) = 1, as it should be. In this way, one can obtain R(ti) versus ti from a sample with one or more censored data. Of course, from R(ti) versus ti one easily obtains F(ti) versus ti, which can be fitted with a properly selected parametric function, be it exponential, normal, log-normal or Weibull, etc.

Example 5-2. Ten electric motors underwent life testing; three of the motors were censored (indicated by a + sign, below). The failure times ti in hours are ranked as:

27, 39, 40+, 54, 69, 85+, 93, 102, 135+, 144

Solution: Here, i = 1,10 and censoring took place at i = 3, 6, 9. We first calculate the quantities R(ti/ti-1) using (5.5) or (5.7), depending on whether or not censoring takes place at ti. We then calculate R(ti) according to (5.8) for all i = 1,10. These results are summarized in the table below:

i     ti      R(ti/ti-1)   R(ti)
---------------------------------
1     27      0.909        0.909
2     39      0.900        0.818
3     40+     1.000        0.818
4     54      0.875        0.716
5     69      0.857        0.614
6     85+     1.000        0.614
7     93      0.800        0.491
8     102     0.750        0.368
9     135+    1.000        0.368
10    144     0.500        0.184
---------------------------------

Let us spot-check the calculations listed in the table: At i=2, t2 = 39; the datum is not censored, and we calculate using (5.5): R(t2/t1) = (10+1-2)/(10+2-2) = 0.9;


Using (5.8), we calculate: R(t2) = R(t2/t1)R(t1/t0) = 0.9 x 0.909 = 0.818.

At i=3, t3 = 40; the datum is censored; R(ti) does not change from t=39 to t=40, so R(t3) = R(t2) = 0.818.

At i=4, t4 = 54; the datum is not censored, so R(t4/t3) = (10+1-4)/(10+2-4) = 0.875, and R(t4) = 0.875 x 1.000 x 0.900 x 0.909 = 0.716.

A plot of the calculated R(ti) versus ti is shown below:

[Plot: R(t) versus t in hours, from 1.0 at t = 0 down to about 0.2 near t = 150 hours; the censored data points are marked but are not linked by the piece-wise linear line.]

Discussion: In the above plot, there exists an extra data point at to = 0 with R(0) = 1, and we have linked only the uncensored data points by drawing a piece-wise linear line. Note that the censored points are left off the line. The line represents the approximate R(t) function, which can always be fitted by an analytical function. Even though the censored points are not part of the line on the graph, the effect of censoring is inherent in the calculation of R(ti). For instance, if the three censored points were actually failures, the R(t) line would have gone down to intersect the t-axis before t = 150 hours; with the censored points, R(t) is still positive (≈ 0.2) at t = 150 hours. An alternative is to draw the line through all the points, censored or not; a slightly different R(t) function will then result. In practice, the former approach (not including the censored points) is preferred, since the fitted line tends to underestimate, rather than overestimate, the R(t) function (to underestimate the reliability is safer).
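The whole of Example 5-2 reduces to a dozen lines of code; the sketch below chains (5.5) and (5.7) through (5.8) and reproduces the R(ti) column of the table.

```python
def censored_reliability(times, censored):
    # mean-rank reliability with censoring: Eq. (5.5) or (5.7) per point,
    # chained by Eq. (5.8); censored[i] is True for a suspended test
    n = len(times)
    r, result = 1.0, []
    for i, (t, c) in enumerate(zip(times, censored), start=1):
        r *= 1.0 if c else (n + 1 - i) / (n + 2 - i)
        result.append((t, round(r, 3)))
    return result

times = [27, 39, 40, 54, 69, 85, 93, 102, 135, 144]
flags = [False, False, True, False, False, True, False, False, True, False]
for t, r in censored_reliability(times, flags):
    print(t, r)
```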

Censored Grouped Data. When the reliability test sample is large (say, N >> 100) and the failure times may extend over several decades of the time-unit used (say, ti ranges from 1 to 1000 hours), it is common practice to group the data into a finite number of time intervals, say 7 to 10 intervals


(refer to Sturges formula in Chapter III); and the grouped data are then fit to a probability distribution function. The question here is what to do if there are censored data in the sample. First, let the class intervals be selected in the following way: (to, t1, t2, . . ti, . . tn), where to is the initial time (say to=0), the first interval is from to to t1, the second interval is from t1 to t2, and so on. Suppose mi data points begin at the start of the ith interval (from ti-1 to ti), di data points fail

within the interval, and ci data points are randomly censored during the interval. Then, the

conditional probability of survival at the end of the interval is approximated by:

R(ti/ti-1) = 1 - di/(mi - 0.5ci)  (5.9)

Note that at the beginning of the (i+1)th interval there will be mi+1= mi-di-ci items under test.

With (5.9), valid for all i = 1,n, the estimated reliability R(ti) is again given by (5.8).

Example 5-3. 206 turbine disks were tested for failure. The time-to-failure data are tabulated in 16 intervals, each 100 to 300 hrs in length. During each of the intervals, failures may or may not occur; in some intervals, there may be censored data. The raw data are tabulated in the first 5 columns of the table below:

i     time-interval    mi     di    ci    ti      R(ti/ti-1)   R(ti)
---------------------------------------------------------------------
1     0 - 200 hrs      206    0     4     200     1.0000       1.0000
2     200 - 300        202    1     2     300     0.9950       0.9950
3     300 - 400        199    1     11    400     0.9948       0.9898
4     400 - 500        187    3     10    500     0.9835       0.9735
5     500 - 700        174    0     32    700     1.0000       - - -
6     700 - 800        142    1     10    800     0.9927       0.9664
7     800 - 900        131    0     11    900     1.0000       - - -
8     900 - 1000       120    1     9     1000    0.9913       0.9580
9     1000 - 1200      110    0     18    1200    1.0000       - - -
10    1200 - 1300      92     2     5     1300    0.9776       0.9366
11    1300 - 1400      85     1     13    1400    0.9873       0.9247
12    1400 - 1500      71     0     14    1500    1.0000       - - -
13    1500 - 1600      57     1     14    1600    0.9800       0.9062
14    1600 - 1700      42     1     14    1700    0.9714       0.8802
15    1700 - 2000      27     0     5     2000    1.0000       - - -
16    2000 - 2100      22     1     2     2100    0.9524       0.8384
---------------------------------------------------------------------

Note: The time intervals above are not all of the same length: the 1st, 5th and 9th intervals are 200 hours each; the 15th interval is 300 hours; the rest are 100 hours each. The reason for the uneven class intervals is probably due to limitations encountered during testing. Basis for calculation: the censored data in each interval are assumed to be taken randomly; hence, the calculation of R(ti/ti-1) is based on (5.9), and the calculation of R(ti) on (5.8). The calculated values of R(ti/ti-1) and R(ti) are listed in the last two columns of the table.

For a spot check: at i=10, ti = 1300 hours. From (5.9), R(t10/t9) = 1 - [2/(92 - 2.5)] = 0.9776; from (5.8), R(t10) = 0.9776 x R(t9) = 0.9776 x 0.958 = 0.9366.


A plot of the reliability function R(ti) versus ti is shown below:

[Plot: R(t) versus t in hours, from 1.0 at t = 0 down to about 0.84 at t = 2100; the data-fitted portion is shown solid, with a dotted extrapolation beyond 2100 hours.]

Discussion: Of the sample of 206 items, there are only 13 failures; 174 items were censored, and 19 remain under test at the end of 2100 hours. Thus, the fitted R(t) function is based on life data from t=0 to t=2100 hours only. The plot, however, can be extrapolated to longer times if needed (as shown by the dotted line in the plot). There may have been too many censored data in the early time intervals (see column ci in the table above); usually, the number of censored items should be more or less constant throughout the time intervals.
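The grouped bookkeeping of (5.9) and (5.8) is equally mechanical; the sketch below runs the first four intervals of Example 5-3 (the values agree with the table to small rounding differences).

```python
def grouped_reliability(intervals):
    # intervals: list of (end time, m_i, d_i, c_i); Eq. (5.9) per interval,
    # chained by Eq. (5.8) into the running reliability R(ti)
    r, result = 1.0, []
    for t_end, m, d, c in intervals:
        r *= 1.0 - d / (m - 0.5 * c)
        result.append((t_end, round(r, 4)))
    return result

data = [(200, 206, 0, 4), (300, 202, 1, 2),
        (400, 199, 1, 11), (500, 187, 3, 10)]
for t, r in grouped_reliability(data):
    print(t, r)   # 1.0, 0.995, ~0.9899, ~0.9736
```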

V-3. Accelerated Reliability Testing. Condensed-Time Test. Many products are not operated continuously in real time. Home appliances, for instance, are used only a fraction of the day but are expected to remain operational for many years. Actually, over the life of an appliance, its actual on-time is much smaller than its off-time. To assess the reliability of such products, life-data collected in the usual way can be expensive and time-consuming to obtain. In such cases, reliability tests are often conducted in condensed time, meaning that the test is run under continuous operation, without any off-time. Clearly, the condensed-time method is useful only if the "on-off" switching has little or no effect on product failure; thus, verification of this fact is important. Generally, the verification is an integral part of the reliability test itself. In this regard, some interesting insights may be gained from the following example:

Example 5-4. Time-to-failure data of a flashlight bulb is generated in the laboratory, where the bulb


is lit by a constant 6-volt DC source. Data from two sets of tests are collected:

(a) Condensed-time tests, where 26 bulbs are lit continuously until burnt out; and
(b) Cyclic on-off tests, where 27 bulbs are turned on for 30 minutes and off for 30 minutes, repeatedly, until burnt out.

The following is a tabulation of the respective data (in calendar hours) from the two tests:

Condensed-time test:  72, 82, 87, 97, 103, 111, 113, 117, 117, 118, 121, 121, 124, 125, 126, 127, 127, 128, 139, 140, 148, 154, 159, 177, 199, 207

Cyclic on-off test:  161, 177, 186, 186, 196, 208, 219, 224, 224, 232, 241, 243, 243, 258, 262, 266, 271, 272, 280, 284, 292, 300, 317, 332, 342, 355, 376

Statistical analysis of the above data using the quick "first-step estimation" technique (see Chapter III) provides the following results:

µ = 128.42 hours, σ = 31.32 hours for the condensed-time test;
µ = 257.11 hours, σ = 55.90 hours for the cyclic on-off test.

Since the on-time in the cyclic on-off test is just one-half of that in the condensed-time test, the parameters µ and σ of the on-off test should be reduced by half:

µ = 128.55 hrs, σ = 27.95 hrs for the cyclic on-off test.

Thus, the two sets of data have nearly the same "mean" (128.42 versus 128.55), but their respective standard deviations are somewhat different (31.32 versus 27.95); namely, the cyclic on-off data are slightly less scattered than the condensed-time data. It is not clear whether there is any physical reason for this difference. But, based on this quick analysis, it appears that cyclic on-off operation has little or no adverse effect on the flashlight bulb's operational life distribution; the condensed-time test can be used as an acceptable reliability test method.

Discussion: The above data may be better fitted by a known distribution function, such as the normal or the Weibull; use of the least-squares method would further improve the fit.
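For a quick numerical check, the above estimates can be reproduced with a few lines of Python. This is a minimal sketch that assumes the "first-step estimation" of Chapter III amounts to the ordinary sample mean and sample standard deviation; the exact Chapter III recipe may differ slightly, so the printed values should only be close to those quoted above.

```python
# Sketch of the "first-step" estimates for Example 5-4, assuming they are
# simply the sample mean and sample standard deviation of the data.
import statistics

condensed = [72, 82, 87, 97, 103, 111, 113, 117, 117, 118, 121, 121,
             124, 125, 126, 127, 127, 128, 139, 140, 148, 154, 159,
             177, 199, 207]                       # 26 bulbs, calendar hours
cyclic = [161, 177, 186, 186, 196, 208, 219, 224, 224, 232, 241, 243,
          243, 258, 262, 266, 271, 272, 280, 284, 292, 300, 317, 332,
          342, 355, 376]                          # 27 bulbs, calendar hours

for name, data in [("condensed-time", condensed), ("cyclic on-off", cyclic)]:
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)                # (n-1) sample estimate
    print(f"{name:15s}  mu = {mu:7.2f} h   sigma = {sigma:5.2f} h")

# The cyclic test is "on" only half of each calendar hour, so its parameters
# are halved before comparison with the condensed-time test:
print("cyclic, on-time basis: mu =", statistics.mean(cyclic) / 2,
      " sigma =", statistics.stdev(cyclic) / 2)
```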

Over-Stress Test. Many products are in continuous use without any significant off-time. Failure of such products is usually caused by long-term deterioration, known as fatigue, during their service life. To assess their reliability, life tests involving over-stressing may be employed in


order to shorten the time-to-failure. For the method to be useful, however, over-stressing must not significantly alter the product's failure modes and mechanisms. By over-stressing, it is meant that the severity of loading is elevated above the design load, so that failure may occur in a short time under load. Mechanical load is not the only over-stressing agent; for instance, electronic components may be tested at elevated temperatures in order to hasten the incidence of failure. Similarly, steel pipes in nuclear power plants may be exposed to extreme neutron irradiation, which increases the brittleness of the steel and can thus cause brittle failure. In applying over-stressing to reliability testing, it is not strictly necessary that the failure mode be identical at all stress levels; the key idea is to obtain a relationship between the stress level and the product lifetime, with some level of confidence. The following example illustrates the essence of over-stress testing:

Example 5-5. A steel crank-shaft is designed for the design stress σd and the design life td. Life tests are then conducted on the crank-shaft at three stress levels σ1, σ2 and σ3, all higher than the design stress σd. At each of the three test stress levels, life distribution data are gathered; the results are displayed in a σ vs. t plot as shown below:

[Figure: stress level σ versus time-to-failure t, showing life distributions at the three test stress levels σ1, σ2, σ3 and the projected distribution at the design load σd.]

Discussion: In the above, the life distributions (the shaded normal-like curves) at the three test stress levels are plotted from test data, while the life distribution at the design load σd is projected by extrapolation, without any test data. The point is that at the higher stress levels, the test time needed to generate data is relatively short compared to that needed at the design stress σd. In the plot, two parameters are of interest: the mean µ and the standard deviation σ of each life distribution. The distribution means at the three test stress levels are fitted by a decreasing function, shown by the solid line; the dashed line is the projection of that function toward the distribution mean at the design stress σd.


In the same plot, it is also noted that the life distribution is less scattered at the higher stress levels than at the lower ones. The physical reason is that at a higher stress level, the most dominant mechanism usually emerges and causes failure in a particular mode; at lower stress levels, more than one mechanism may be simultaneously in effect, and any one of them may ultimately become the dominant failure mechanism. Hence, among many specimens tested at a lower stress level, more than one failure mode may occur.

The S-N Curve: Fatigue tests on structural materials often take the form of cyclic loading about a mean-stress level. In that case, time-to-failure data are plotted as the mean stress (S) versus the number of load cycles to failure (N), most often the log of N; the line that connects the means at the different S levels is commonly known as the S-N curve.

Fatigue Limit: The rate of decrease of the S-N curve signifies the material's resistance to fatigue: a steep curve indicates a fatigue-sensitive material, while a relatively flat curve indicates otherwise. Extension of a relatively flat S-N curve establishes the so-called fatigue limit, meaning the material is essentially fatigue-free if stressed below that limit.

Example 5-6. The flashlight bulbs considered in Example 5-4 were designed to operate under 6-volt DC power. In an accelerated test, the bulbs were subjected to 9, 12 and 15 volts, respectively; 12 bulbs were tested at each voltage level, and the time-to-failure data (in hours) are tabulated below:

9 volts:  44 56 58 59 60 61 62 63 64 70 74 88
12 volts: 15 19 23 25 28 30 32 34 37 37 39 41
15 volts:  8  9  9 10 11 11 11 12 12 13 13 15

Solution: As in the previous example, the above three sets of data are analyzed using the "first-step" estimation for the respective distribution parameters µ and σ, tabulated as follows:

            9 volts   12 volts   15 volts
Mean         63.25      30.00      11.16
Std. Dev.    15.86       8.22       2.03

From the above, we see the trend that both the distribution mean and the standard deviation decrease with the test voltage; in fact, the rate of decrease in both parameters is rather sharp. Physically, a rapidly decreasing mean indicates that time-to-failure is accelerated at an increasing rate, while a decreasing standard deviation indicates that failure is increasingly dominated by a single mechanism.


In order to obtain the distribution mean and standard deviation for the flashlight bulb tested under 6 volts, we use the above over-stress test results to make an extrapolation for each of the parameters. To do so, we plot both µ and σ versus the three over-stress test voltages, as shown in the figure below:

[Figure: distribution mean µ (left scale, 0 to 150 hours) and standard deviation σ (right scale, -5 to 25 hours) plotted versus test voltage (6, 9, 12, 15 V); extrapolation to 6 V gives µ = 132 hours and σ = 26 hours.]

The above plots show that the mean is a slightly non-linear function of the test voltage, while the standard deviation is nearly linear. By extrapolating the plotted curves to the 6-volt level, the values of µ and σ under 6 volts are estimated as: µ = 132 hours and σ = 26 hours. Compare these with the respective values of µ = 128.42 hours and σ = 31.32 hours measured under 6 volts in Example 5-4; we see that the accelerated test provides a reasonable extrapolation for µ and σ at 6 volts. Discussion: The non-linear plot of the distribution mean can be re-plotted in terms of ln(µ) versus the test voltage:

voltage:   9 volts   12 volts   15 volts
ln(µ):      4.15       3.40       2.71

Then, a straight-line relation is obtained as shown in the plot below:


[Figure: ln(µ) versus test voltage (6 to 15 V); the fitted straight line extrapolated to 6 V gives ln(µ) = 4.9, i.e. µ = exp(4.9) = 134.3 hours.]

By extrapolating the straight line to 6 volts, we find ln(µ) = 4.9; or µ = 134.3 hours.
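The straight-line extrapolation can also be done numerically. The sketch below assumes an ordinary least-squares line through the tabulated ln(µ) values listed above, evaluated at 6 volts; the graphical extrapolation in the notes gives a slightly different value.

```python
# Least-squares fit of ln(mu) versus test voltage for Example 5-6,
# extrapolated back to the 6-volt design level.
import numpy as np

volts = np.array([9.0, 12.0, 15.0])
ln_mu = np.array([4.15, 3.40, 2.71])      # tabulated ln(mu) values from above

slope, intercept = np.polyfit(volts, ln_mu, 1)
ln_mu_6 = slope * 6.0 + intercept
print(f"ln(mu) at 6 V = {ln_mu_6:.2f}")              # ~4.86
print(f"mu at 6 V = {np.exp(ln_mu_6):.1f} hours")    # ~129 hours, close to
                                                     # the graphical ~134
```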

In practice, the condensed-time and over-stressing concepts are often combined in accelerated life tests; the combination usually provides a further reduction in test time.


Summary: When a product fails to perform its designed functions in service, it is often difficult, or at least time-consuming, to pin down the real cause or causes. Most often, a sufficiently large database is needed in order to render a rational estimate of the product's reliability. This chapter briefly introduced the concept of "reliability enhancement", an effective approach to eliminate, or at least minimize, the defects that are inherent in the product design-and-development stage. The methods of "censored testing" and "accelerated testing" are all aimed at saving time in gathering the needed statistical data.

• The Duane Plot is an empirical method that can be applied in a reliability enhancement program. The essence of the method is illustrated in Example 5-1.

• Distinguish between the "ungrouped censored" and the "grouped censored" tests. The former is illustrated in Example 5-2, the latter in Example 5-3. Be familiar with the way the test data are analyzed in each case.

• The "condensed-time" method is applied to products that are operated with some on-times and some off-times. The method is reliable only if the on-off switching does not itself cause failure or accelerate the failure rate. See Example 5-4 for details.

• "Over-stressing" is another approach to accelerating product failure. The general provision is that the failure modes and failure mechanisms caused by over-stressing should not differ much from those caused by the product's design stress. See Examples 5-5 and 5-6 for details.

Assigned Homework: 5.1. Use the test data provided in Example 5-1:

(a) Apply the least-squares fit to the Duane plot; (b) Verify that α = -0.654 and r² = 0.988.

5.2. In debugging computer software, failure-causing defects are found and then corrected at 1.4, 8.9, 24.3, 68.1, 117.2 and 229.3 hours. Make a Duane plot and estimate the reliability growth coefficient α. [α ≈ −0.65] 5.3. The wear-out times of 9 emergency road flares are: 17.0, 20.6, 21.3, 21.4+, 22.7, 25.6, 27.0+, 27.7, and 29.7+ minutes (numbers with a superscript "+" refer to flares extinguished by accident).

(a) Use the "ungrouped censored" method to make a plot of the reliability function. (b) Determine the probability of wear-out within 24 minutes for a randomly picked flare.

5.4. Grouped data (uncensored) for the time-to-failure (in units of 10³ hours) of an electrical circuit are given as follows:

Interval (10³ hours)   Failures
0 < t ≤ 6                  5
6 < t ≤ 12                19
12 < t ≤ 18               61
18 < t ≤ 24               27
24 < t ≤ 30               20
30 < t ≤ 36               17

(a) Make a plot for the reliability function. (b) Estimate the reliability of the circuit for the design life of 10,000 hours.

5.5. Repeat Example 5-6 by fitting the data using the normal distribution plotting paper; compare the results with that obtained in the example.


CHAPTER VI. PRODUCT QUALITY MEASURES

Quality and reliability are often synonymous in the view of consumers. For product engineers, however, there are distinct measures in each case, in terms of the design, development, manufacture and service performance of the product. As we learned in Chapters IV and V, reliability refers specifically to the probability of survival of a product or system over a certain service life, and that probability can be improved through reliability enhancement programs, including quality control. In broad terms, product quality is associated with (a) the ability to incorporate and optimize all the design parameters to meet the specified performance targets, including product reliability; and (b) the ability to reduce variability in meeting those targets. Generally speaking, the former is central to quality assurance in the early design and development stages, while the latter is essential to quality control during the manufacturing and in-field service stages. The flow-chart below illustrates the intertwined relationship between quality and reliability:

[Flow chart: the Reliability Improvement Cycle over the product life; product inception, product development (design & modification), manufacture (on-line QC), and in-service (field service).]

This chapter discusses only some basic notions pertaining to on-line quality control (QC). Other aspects of the reliability improvement cycle are covered in a subsequent course on quality control in much broader terms.

VI-1. On-Line Quality Control. Design Target and Quality Variability. In engineering, a product is almost always designed with one or more quality indices, Xi, each tied to a design target, τi. In the ideal situation, the actual product would meet the targets exactly: Xi = τi. However, in the process from design to manufacturing, many random factors may be introduced, and the quality of the product can exhibit various degrees of scatter; thus, each Xi becomes a random variable. For simplicity, let there be only one index X and one target: X = τ. On-line inspection of the product for the index X often yields a normal-like distribution f(x), with the mean (µ) near or around the target τ, and with some degree of scatter measured by the standard deviation (σ), as shown schematically in the figure below:


[Figure: quality distribution f(x) with mean µ, standard deviation σ, target τ, and specification limits LSL and USL.]

In the above figure, if the distribution mean coincides with the target, the quality index X is said to be on-target; otherwise, it is off-target. The variation of X about its mean is judged against a set of independent "specification limits", labeled USL (upper specification limit) and LSL (lower specification limit), respectively. The significance of the various elements shown in the figure is as follows: (a) we would want µ = τ, i.e. the mean quality on target; (b) we would want σ to be small, so that X is clustered around the target τ. Thus, through quality control, one attempts to place µ on target and to reduce σ to an acceptably small value. This, however, requires (a) identification of the physical factors that influence the values of µ and/or σ, and (b) implementation of corrective actions in order to meet the limits on these parameters of the quality index distribution.

Generally speaking, factors influencing µ are inherent in the design/development stage; most are easily identified, though some may be traceable to manufacturing precision or the lack of it. These factors must be identified accurately so that relevant corrective measures can be devised in the design modification cycle. On the other hand, factors that influence σ are more random in nature; some are traceable to material variability, others to variables encountered during manufacturing and handling. These variables are usually difficult to identify and not easily controlled. Methods and techniques for controlling the mean and the standard deviation belong to product robust design, to be briefly discussed later in this chapter.

Acceptance/Rejection Limits. In the previous figure, the acceptance limits USL and LSL are set for the purpose of accept/reject evaluation. Given the quality distribution f(x), or its CDF F(x), products with X below the LSL or above the USL are rejected; the fraction rejected is thus F(LSL) + [1 − F(USL)], and the fraction accepted, known as the yield, is:

Y = F(USL) − F(LSL) (6.1)

In particular, if the distribution mean is on target (µ = τ), the specification limits can be set in the following manner: USL = µ + ∆ and LSL = µ − ∆


[Figure: on-target distribution f(x) with µ = τ, acceptance gate of half-width ∆ between LSL and USL, and rejected tails outside the gate.]

Here, 2∆ is known as the "acceptance gate width", and the white area between the specification limits represents the fraction of product yield. If the normal f(x) is standardized (see Chapter III for details) through z = (x − µ)/σ, we readily obtain:

F(LSL) = Φ[(LSL − µ)/σ] = Φ(−∆/σ) and F(USL) = Φ[(USL − µ)/σ] = Φ(∆/σ)

Hence, the yield in (6.1) can be expressed as:

Y = 1 − 2Φ(−∆/σ) (6.2)

In this case, since µ = τ, the fraction of yield depends solely on the distribution standard deviation σ. Note that the setting of 2∆ is independent of the manufacturing process itself, so the on-line quality control is based solely on the value of ∆/σ.

The 3-σ Criterion. Conventional on-line quality control follows the so-called 3-σ criterion. Specifically, the product quality distribution f(x) meets the conditions:

µ = τ; and σ ≤ ∆/3 (6.3)

Under this criterion, the product passes QC with:

Fraction yield ≥ 1 − 2Φ(−3) = 99.73%; or Fraction rejected ≤ 2Φ(−∆/σ) ≤ 2Φ(−3) = 0.27%.

The 6-σ Criterion. Most modern electronic parts (e.g. computer chips) are required to meet more stringent quality specifications. The 6-σ criterion requires that:

µ = τ; and σ ≤ ∆/6 (6.4)

In this case, with Φ(−6) ≈ 1×10⁻⁹, the part passes QC only if:

Fraction rejected ≤ 2Φ(−6) = 0.002 ppm; or


Fraction yield ≥ 1 − 0.002×10⁻⁶.
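As a quick check of these numbers, the yield expression (6.2) can be evaluated directly with the standard normal CDF; a minimal sketch using scipy for Φ:

```python
# Yield Y = 1 - 2*Phi(-Delta/sigma) for the 3-sigma and 6-sigma criteria.
from scipy.stats import norm

for n in (3, 6):
    rejected = 2 * norm.cdf(-n)          # fraction outside mu +/- n*sigma
    print(f"{n}-sigma: reject = {rejected:.3e}  yield = {1 - rejected:.6%}")
# 3-sigma: reject ~ 2.7e-3 (0.27%); 6-sigma: reject ~ 2.0e-9 (0.002 ppm)
```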

Example 6-1. The manufacturer of a bearing ball performs an on-line inspection and finds that the ball diameter can be described by a normal distribution with mean µo and standard deviation σo. Although µo is on target, the value of σo is unacceptable. An on-line QC program is then instituted which screens out the bearing balls with diameters outside ±1.5σo from the mean. Now examine the following questions: (a) What is the yield after the QC screening? (b) For the bearing balls passing QC, what is the new diameter distribution? (c) What does the QC actually achieve?

Answers: (a) The yield after QC is obtained via (6.2): Y = 1 − 2Φ(−1.5σo/σo) = 1 − 2(0.06681) = 86.638%

(b) The diameter distribution of the 86.638% yield is the original normal pdf truncated at ±1.5σo about the mean and rescaled so that it integrates to 1:

f*(x) = A·f(x) for |x − µo| ≤ 1.5σo
f*(x) = 0 for |x − µo| > 1.5σo

where A = 1/0.86638 = 1.154. By symmetry, the mean remains µ = µo, while the standard deviation changes by a factor κ, i.e. σ = κσo. Evaluating the variance of f*(x) (the integral is carried out in the Discussion of Example 6-11 later in this chapter) yields:

κ = 0.74263; hence σ = 0.74263σo.

(c) The QC screening has narrowed the spread of the original diameter distribution; i.e. the original diameter scatter (σο) is reduced by about 25%.
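The reduction factor κ can be verified numerically. The sketch below uses the closed-form variance of a symmetric truncated normal, var = σo²[1 − 2aφ(a)/(2Φ(a) − 1)] for truncation at ±a·σo, a standard result that is assumed here rather than derived in these notes.

```python
# Standard deviation of a normal distribution truncated at mu +/- 1.5*sigma_o
# (Example 6-1), via the closed-form truncated-normal variance.
from math import sqrt
from scipy.stats import norm

a = 1.5
yield_frac = 2 * norm.cdf(a) - 1                  # fraction passing the screen
kappa = sqrt(1 - 2 * a * norm.pdf(a) / yield_frac)
print(f"yield = {yield_frac:.5f}")                # ~0.86639
print(f"kappa = {kappa:.5f}")                     # ~0.74263, so sigma = 0.743*sigma_o
```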

Short-Term Process Capability Index. The 3-σ criterion is often used as a standard index for measuring the effectiveness of the production process for a batch of products during on-line QC. The short-term "process capability index", Cp, is thus defined by:

Cp = ∆/(3σ) (6.5)

If the quality of the batch meets the 3-σ criterion exactly, that is µ = τ and σ = ∆/3, then Cp = 1; if the quality of the batch falls below the 3-σ criterion (σ > ∆/3), then Cp < 1; if the quality exceeds the 3-σ criterion (σ < ∆/3), then Cp > 1.

The index Cp is frequently used in industrial circles to measure short-term process capability, since the product's short-term yield can be readily expressed in terms of Cp:

Y = 1 − 2Φ(−3Cp) (6.6)

Note: If the quality of the product meets the 6-σ criterion exactly, then the short-term process capability index is Cp= 2.

Long-Term Process Capability Index. Product quality can vary from batch to batch; over a long period of time and over many batches of on-line QC, the variation in quality can be characterized by the shift of the distribution mean (µ) from the design target (τ), and sometimes also by the change of the standard deviation σ from one batch to another. As an example, suppose several batches of a product are inspected on-line; the quality distribution of each batch is sketched below:

[Figure: quality distributions of four successive batches relative to the target τ: on target; loss of precision (mean shifted left); over-calibration (mean shifted right); back on target.]

It is noted that: * The initial batch of the product is on target (µ = τ); * Loss of precision shifts the mean to the left of τ in the second batch; * Over-correction shifts the mean to the right of τ in the third batch; * Re-calibration brings the mean back on target in the fourth batch.


Suppose the above represents a quality variation cycle, and the cycle repeats itself again and again over a long period of processing; it is then considered a case of long-term processing, characterized by the long-term "process capability index", Cpk.

Note: Over a long period of time, it is possible that the overall quality distribution mean (µk) is still on the target τ but the overall standard deviation has increased to σk; or, worse, the overall µk is off-target and the standard deviation has increased to σk.

In the first case (µk = τ), the long-term process index is defined as:

Cpk = ∆/(3σk) (6.7)

In the second case (µk ≠ τ), the long-term process index is modified as:

Cpk = (1 − k)·∆/(3σk) (6.8)

where k = |µk − τ|/∆ (6.9)

In both cases, the long-term yield is given by:

Y = 1 − 2Φ(−3Cpk) (6.10)

The Taguchi Process Capability Index. Another form of the process capability index takes into account the long-term effects of the mean shifting from the target, as well as a possible increase of the standard deviation. Specifically, the "effective" long-term distribution standard deviation σm is defined by:

(σm)² = (σk)² + (µk − τ)² (6.11)

It follows that the corresponding process capability index is:

Cpm = ∆/(3σm) (6.12)

Note that in the definition of Cpm, the long-term quality distribution f(x) need not be normal. As will be discussed in Section VI-3, the expression in (6.11) is associated with the Taguchi loss axiom.

Example 6-2. The diameter of a drive shaft is designed to be 10 cm, with an acceptance tolerance of ±0.01 cm. From on-line QC over a long period of time, it is found that 1.5% of the shafts exceeded the USL while 0.04% fell below the LSL. Find the long-term process capability index. Solution: Here we have τ = 10.00 cm and ∆ = 0.01 cm. Assuming a normal distribution, we have:


Φ [(10.01−µκ)/σκ] = 1−1.5% = 0.985

Φ [(9.99−µκ)/σκ] = 0.04% = 0.0004

Using the table in Appendix III and solving for µκ and σκ, we obtain:

µκ = 10.002 cm σκ = 0.0036 cm

We see that the mean of the distribution is slightly off target (10.002 vs. 10.00 cm). For the long-term process index Cpk: using (6.9), we compute k = (10.002 − 10)/0.01 = 0.2; then the long-term process capability index is given by (6.8): Cpk = (1 − 0.2)[0.01/(3×0.0036)] = 0.741.

Since Cpk is less than 1, the long-term quality of the shafts does not meet the 3-σ criterion. For the Taguchi process index Cpm: the long-term variance is first calculated using (6.11):

(σm)² = (0.0036)² + (10.002 − 10)² = 1.696×10⁻⁵; σm = 0.00412

Then, from (6.12) we have Cpm = 0.01/(3x0.00412) = 0.8094

The Taguchi index is also less than 1.

Discussion: Both Cpk and Cpm measure the effect of long-term deviation of the product quality from the design target; in practical use, the two should not be compared with each other.

Example 6-3. A TV producer uses a large quantity of 50Ω resistors. The acceptance tolerance is ±2.5Ω. Two suppliers have provided samples of the resistors, and in-house QC on 30 randomly selected resistors from each supplier provided the following data:

Sample 1: 48.47 48.49 48.66 48.84 49.14 49.27 49.29 49.30 49.32 49.39 49.43 49.49 49.52 49.54 49.69 49.75 49.78 49.93 49.96 50.03 50.06 50.07 50.09 50.42 50.44 50.57 50.70 50.77 50.87 51.87 Ω

Sample 2: 47.67 47.70 48.00 48.41 48.42 48.44 48.64 48.65 48.68 48.85 49.17 49.72 49.85 49.87 50.07 50.75 50.60 50.63 50.90 51.02 51.05 51.28 51.33 51.38 51.43 51.60 51.70 51.74 52.06 52.33 Ω

Analysis of the above data provides the following points for consideration: (a) Both samples fall within the specification limits of 50 ± 2.5Ω; both are acceptable. (b) The sample data are fitted to a normal distribution via the least-squares method, yielding:

For sample 1: µ = 49.70Ω; σ = 0.84Ω r2 = 0.962

For sample 2: µ = 50.10Ω; σ = 1.59Ω r2 = 0.956


(c) Assuming the samples supplied were randomly selected from long-term stocks, the long-term process capability indices are: For sample 1: Cpk = 0.873; Cpm = 0.934

For sample 2: Cpk = 0.503; Cpm = 0.523

On the basis of the above analysis, the following considerations are relevant: (a) the resistors from both suppliers fit the normal distribution well; (b) both meet the acceptance tolerance (47.50 to 52.50Ω); (c) sample 1 is more off-target than sample 2; (d) sample 2 is more variant than sample 1; (e) supplier 1 is the better choice, based on the process indices Cpk and Cpm. We shall revisit this problem later in the chapter, based on a different set of considerations.

VI-2. Systems of Multiple Parts. The foregoing discussion focused mainly on a single item (part), subject to a single set of specification limits. Real engineering systems, however, may contain many parts, and each part may be subject to more than one set of specifications. The total number of specifications imposed on a system of multiple parts can thus grow rapidly. The following are some basic notions pertaining to on-line QC for systems of multiple parts.

System Yield. Suppose that a system is made of N parts, each subject to certain specification limits. Let Xi (i = 1, N) be the event that the ith part fails to meet its specification; then the probability that all N parts fail to meet their respective specifications is given by the intersection of all the Xi:

P{X1 ∩ X2 ∩ X3 ∩ . . . ∩ XN} (6.13)

For simplicity, assume that P{Xi} = p for all i = 1, N and that the events are independent of one another; then the probability that the system of N parts passes all the individual specifications is:

P{X'1 ∩ X'2 ∩ . . . ∩ X'N} = P{X'1}P{X'2} . . . P{X'N} = (1 − p)^N (6.14)

The expression in (6.14) is actually the system yield, based on the above assumptions:

Y = (1 − p)^N (6.15)

If N is large and p is small (p << 1), (6.15) can be reduced to:

Y = e^(−Np) (6.16)

Example 6-4. A computer manufacturer finds that the circuit boards used in its computers meet the 3-σ criterion. (a) What can we say about the quality of the board?


(b) If the board contains N chips, what can we say about the quality of the chips?

Analysis: (a) Based on the 3-σ criterion, the yield of the board is 99.73%. (b) Suppose there are N chips on the board, and each chip can fail to meet its specification with probability p. Then, for large N and small p, it follows from (6.16) and the result in (a) that the yield of the board is:

Y = e^(−Np) = 0.9973

Thus we find: p = −ln(0.9973)/N. Now, suppose each board contains N = 100 chips; then:

p = −ln(0.9973)/100 = 27.04×10⁻⁶

This implies 27 chip failures per million, or 27 ppm. Note: One may consider p to reflect the technology level of chip production; to improve the yield of the board, the value of p must be reduced. Consider the following situations:

* Suppose the same technology (i.e. p = 27.04×10⁻⁶) is used in making the chips; then a board that contains N = 10000 chips has a yield of only:

Y = e^(−10000×0.00002704) = 0.763

That is certainly not a desirable outcome.

* Suppose a circuit board contains N = 10000 chips, but the desired yield of the board must meet the 3-σ criterion, i.e. Y = 0.9973. This can be achieved only by improving the quality of the chips, i.e. by reducing the failure probability p, which is found by setting the yield of the board at:

Y = 0.9973 = e^(−10000p)

From that we find: p = −ln(0.9973)/10000 = 0.2704×10⁻⁶

This translates to 0.27 failures per million chips. Discussion: In modern electronic technology, a system (say, a computer) often contains millions of chips; consequently, the reliability of the chips must be very high in order to ensure a high yield of the system (computer).
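The arithmetic of Example 6-4 is easily scripted; the sketch below reproduces the three calculations using the approximation (6.16).

```python
# System-yield arithmetic of Example 6-4: Y = (1 - p)^N ~ exp(-N*p).
from math import exp, log

# Chip failure probability implied by a 99.73% board yield with N = 100 chips:
p = -log(0.9973) / 100
print(f"p = {p:.3e}")                                 # ~2.70e-5, i.e. ~27 ppm

# Same chip technology on a board with N = 10000 chips:
print(f"yield at N = 10000: {exp(-10000 * p):.3f}")   # ~0.763

# Chip quality needed for a 10000-chip board to reach Y = 0.9973:
print(f"required p: {-log(0.9973) / 10000:.3e}")      # ~2.70e-7, i.e. ~0.27 ppm
```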

The Significance of the 6-σ Criterion. Modern engineering systems such as electronic appliances can contain millions of parts. In order to assure such systems a high degree of reliability, the probability of any one part not meeting its specifications must be measured in a few ppm or less. In this context, the 6-σ criterion is often instituted in on-line QC: the short-term quality variation must be such that σ ≤ ∆/6. This also implies a short-term process capability index Cp ≥ 2.0. Alternatively, the fraction rejected on a short-term basis should be equal to or less than: p ≤ 2Φ(−6) = 0.002 ppm (6.17)


Thus, if a system contains N = 10⁴ parts and each part meets the 6-σ criterion, the short-term system yield is:

Y = e^(−Np) = e^(−0.00002) = 99.998%.

However, when long-term effects are taken into consideration, the system yield is expected to drop somewhat. Let the long-term process capability index for each part be Cpk = 1.5; the corresponding probability of failing to meet the specification increases, for each part, to: p = 2Φ(−3×1.5) = 6.796 ppm. Compared to (6.17), the long-term quality of the parts deviates considerably from the 6-σ criterion; consequently, the corresponding system yield is reduced to:

Y = e^(−Np) = e^(−0.06796) = 93.43%

Discussion: The above example illustrates the close relationship between system complexity and system yield, on both the short-term and the long-term basis. When the system is complex (i.e. when N is large), a tight acceptance specification such as the 6-σ criterion may be required for each part; then, and only then, can a high level of system reliability be achieved. Implementing the 6-σ methodology has become a trend in advanced manufacturing; the ISO standards, for example, are based on the 6-σ concept. A thorough coverage of these matters is beyond the scope of this chapter.

VI-3. Loss Due to Quality Variation. When a product does not meet the specification limits, it must be rejected; if it is not rejected, the chances are that it will not perform its designed functions well. In either case, a certain monetary loss is incurred by the producer. How the "loss" due to product quality variation is estimated is usually proprietary in business, and the model used is often a highly guarded secret. Nonetheless, the general approach to loss estimation is based on the product quality distribution and the rejection/acceptance criterion. In this connection, there are two commonly used models: the "apparent" estimation model and the "intangible" estimation model. A typical model of the former kind is the so-called "goal-post" model; one of the latter kind is the "Taguchi" model.

The Goal-Post Loss Estimate Model. Let Lo be the cost per product manufactured. Any product rejected then represents a loss of at least Lo. Now, if the pdf f(x) is a measure of the product quality, which must meet the specification limits LSL = µ − ∆ and USL = µ + ∆, the probability of a product being rejected is 2Φ(−∆/σ), according to (6.2). Hence, the expected loss per product manufactured is:

E(L) = Lo[2Φ(−∆/σ)] (6.18)

Alternatively, one can formally define a "loss function" L(x) for the product as follows:

L(x) = 0 for LSL < x < USL (6.19)


L(x) = Lo for x < LSL and x > USL

Then, weighted against the product quality distribution f(x), the expected product loss is given by the integral:

E(L) = ∫₋∞^∞ L(x) f(x) dx (6.20)

Substituting the loss function (6.19) into (6.20) and carrying out the integration yields exactly the result expressed in (6.18). The above can be visualized in the following sketch:

[Figure: top, quality pdf f(x) with mean µ, acceptance gate LSL to USL of half-width ∆, the yield in the middle and rejected tails on either side; bottom, goal-post loss function L(x): no loss inside the acceptance gate, loss Lo outside.]

Example 6-5. A bearing-ball producer adheres to the 3-σ criterion for on-line QC. The cost of each bearing ball produced is $0.25. Based on the goal-post model, what is the expected loss per bearing ball produced due to quality variation? In this case, the expected loss per ball is simply: E(L) = 0.25[2Φ(−3)] = 0.0675 cents. That is, the loss is about 0.27% of the product cost.

Example 6-6. A car manufacturer produces a drive shaft for $350 a piece. The length of the shaft is designed to be exactly 100 inches, but it is acceptable within a tolerance of ±0.1 inch. On-line


inspection of the shafts finds that 1.5% are rejected for exceeding the USL and 0.04% for falling below the LSL. Using the goal-post model, estimate the expected loss per shaft due to quality variation.

Solution: In this case the total rejection is 1.5% + 0.04% = 1.54%, so the expected loss is simply: E(L) = $350 × 0.0154 = $5.39 per shaft produced.

Discussion: The above estimate rests on the implied assumption of mass production of the shaft. Since the length of the shaft is targeted at 100 inches, we may infer that the shaft-length distribution is normal. Then, with the on-line inspection data, we can write:

Φ[(LSL − µ)/σ] = Φ[(99.9 − µ)/σ] = 0.0004
Φ[(USL − µ)/σ] = Φ[(100.1 − µ)/σ] = 1 − 0.015 = 0.985

Using Appendix III-A, we find: (99.9 − µ)/σ = −3.35 and (100.1 − µ)/σ = 2.17. From these we obtain: µ = 100.02" and σ = 0.036". The length of the manufactured shaft is off target by +0.02".
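Inferring µ and σ from two observed rejection fractions (used here and in Example 6-2) amounts to solving two quantile equations. A minimal sketch using the inverse normal CDF from scipy:

```python
# Recover mu and sigma of the shaft-length distribution in Example 6-6
# from the two observed rejection fractions.
from scipy.stats import norm

lsl, usl = 99.9, 100.1
z_low = norm.ppf(0.0004)          # ~-3.35, since P{X < LSL} = 0.04%
z_high = norm.ppf(0.985)          # ~+2.17, since P{X < USL} = 98.5%

sigma = (usl - lsl) / (z_high - z_low)
mu = lsl - z_low * sigma
print(f"mu = {mu:.3f} in, sigma = {sigma:.4f} in")   # ~100.02 in, ~0.036 in
```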

The Taguchi Loss Estimate Model. In the goal-post model, a product that is accepted contributes no loss; only the rejected ones do, and each incurs the same loss no matter how badly it fails the limits. In the case discussed in Example 6-6, even a shaft that is off the design target incurs no loss as long as it meets the acceptance limits. The Taguchi model (after Genichi Taguchi) proposes that loss is caused both by the quality scatter, measured by σ, and by the degree to which the quality is off target, measured by the difference between the distribution mean µ and the target τ; even a product that meets the specification limits can incur a loss simply because it is not on target. The rationale behind this model is that intangible effects cause an off-target product to perform below the designed level. A common form of the Taguchi loss function is:

L(x) = Lo[(x − τ)/∆]² (6.21)

Note that (6.21) contains the squared deviation (x − τ)², which measures the departure from the target, weighted with respect to the acceptance gate half-width ∆. Substituting (6.21) into (6.20) and denoting k = Lo/∆², we have:


E(L) = ∫ k(x − τ)² f(x) dx
     = k[∫(x − µ)² f(x) dx + 2(µ − τ)∫(x − µ) f(x) dx + (µ − τ)²∫f(x) dx]

Carrying out the integration over the range of x (typically from −∞ to ∞), the middle integral vanishes and the last equals 1; simplifying, we obtain:

E(L) = (Lo/∆²)[σ² + (µ − τ)²] (6.22)

It is noted that (6.22) is obtained without specifying the form of f(x); only the mean µ and the variance σ² of f(x) need be specified. Hence, f(x) need not be normally distributed. Note also that the quantity [σ² + (µ − τ)²]/∆² in (6.22) is related to the Taguchi process capability index Cpm defined in (6.12). The Taguchi loss function in (6.21) is a quadratic in x with its origin at the target τ, the quadratic range extending between LSL and USL. It implies that loss is incurred as soon as the product quality deviates from the target τ, even within the acceptance limits. A graphical representation of the loss function L(x) is shown below:

[Figure: Taguchi loss function L(x), a parabola with vertex at the target τ, rising to Lo at the LSL and USL (a distance ∆ on either side of τ).]

Example 6-7. Let us return to Example 6-6 and use the Taguchi model in (6.22) to estimate the loss. Here we have τ = 100", ∆ = 0.1", µ = 100.02" and σ = 0.036"; hence (6.22) yields:

E(L) = (350/0.01)[0.036² + (100.02 − 100)²] = $59.36

Point to ponder: $5.39 based on the goal-post model? Or, $59.36 based on the Taguchi model?


Which estimate is right?
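The two estimates can be compared side by side; the sketch below plugs the inferred µ and σ into (6.18) and (6.22).

```python
# Goal-post versus Taguchi expected loss for the drive shaft of
# Examples 6-6 and 6-7 (Lo = $350, tau = 100 in, Delta = 0.1 in).
Lo, tau, delta = 350.0, 100.0, 0.1
mu, sigma = 100.02, 0.036          # inferred from the on-line inspection data

goal_post = Lo * 0.0154                                   # 1.54% rejected, eq. (6.18)
taguchi = (Lo / delta**2) * (sigma**2 + (mu - tau)**2)    # eq. (6.22)
print(f"goal-post: ${goal_post:.2f}   Taguchi: ${taguchi:.2f}")   # $5.39 vs $59.36
```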

Example 6-8. Let us return to Example 6-3. The two samples of 30 resistors were each fitted by the normal function, with the following results:

For sample 1: µ = 49.70Ω; σ = 0.84Ω r2 = 0.962

For sample 2: µ = 50.10Ω; σ = 1.59Ω r2 = 0.956 The long-term processing capability indices were calculated as:

For sample 1: Cpk = 0.873; Cpm = 0.934

For sample 2: Cpk = 0.503; Cpm = 0.523

From the above, we see that both samples meet the specification, though sample 1 is deemed to have the better quality owing to its higher Cpk and Cpm. Now, if we use the goal-post model to calculate the loss estimate, neither sample would incur any loss. However, if the Taguchi model is used, the calculated loss per dollar of cost for each sample is: For sample 1: E(L) = $0.13 per $; For sample 2: E(L) = $0.41 per $.

Based on loss estimates, the Taguchi model also points to supplier 1 as the better choice.

Smaller-is-Better and Larger-is-Better Models. The Taguchi model degenerates into two special cases, known as "smaller is better" and "larger is better", respectively. In the former case, it is better if the product quality measure is smaller. An example is the noise level of automobile engines: the lower specification limit (LSL) for the noise should be as low as possible, ideally LSL → 0, while the upper specification limit (USL) is set at some finite acceptable level. Thus, setting τ = LSL → 0 and ∆ = USL, the Taguchi loss function (6.21) reduces to:

L(x) = Lo·x²/(USL)² (6.23)

Then, the associated loss for the smaller-is-better case is given by:

E(L) = [Lo/(USL)²] ∫₀^∞ x² f(x) dx (6.24)

Graphically, the smaller-is-better model is illustrated below:


[Figure: smaller-is-better case; the quality pdf f(x) and the quadratic loss L(x) rising from zero at x = 0 to Lo at the USL, with rejection beyond the USL.]

In the larger-is-better case, it is better if the product quality measure is as large as possible. An example is the impact resistance of an automobile bumper. Here the ideal target is τ → ∞, and the LSL is set at some finite acceptable level. In this case, the Taguchi loss function (6.21) reduces to:

L(x) = Lo(LSL)²/x² (6.25)

Then, the associated Taguchi loss is:

E(L) = Lo(LSL)² ∫₀^∞ x⁻² f(x) dx (6.26)

The larger-is-better model is illustrated graphically below:

[Figure: larger-is-better case; the quality pdf f(x) and the loss L(x) equal to Lo at the LSL, decaying toward zero as x → ∞, with rejection below the LSL.]

Example 6-9. The purity of a chemical solution is measured by the percentage of contaminants in the solution; thus, the smaller the contaminant content, the better. On-line inspection of a particular batch finds that 0.5% of the solution sampled exceeded the USL, and that the contaminant distribution can be described by the pdf:

f(x) = (1/α)e^(−x/α)

where α is a parameter characterizing the distribution function. Given the cost of the solution at $10.00 per pound, what is the expected Taguchi loss?


Discussion & Answer: Two parameters are not explicitly specified above: α and USL. The pdf is exponential, so the CDF is readily obtained as F(x) = 1 − e^(−x/α); the condition F(∞) = 1 is automatically satisfied regardless of the value of α. Now we use the inspection result that 0.5% of the solution exceeded the USL: F(USL) = 1 − e^(−USL/α) = 0.995. From this we obtain: USL = 5.298α. Then, applying (6.24) for smaller-is-better, we obtain: E(L) = $0.712 per pound. Note: In computing E(L), the parameter α cancels out automatically; the value of α itself remains undetermined.
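This result can be checked numerically. Since α cancels, the sketch below sets α = 1 and evaluates the integral in (6.24) by quadrature:

```python
# Smaller-is-better loss of Example 6-9: exponential contaminant pdf,
# USL = 5.298*alpha from the 0.5% rejection rate; alpha cancels, so alpha = 1.
from math import exp, log
from scipy.integrate import quad

Lo, alpha = 10.0, 1.0
usl = alpha * log(1 / 0.005)                     # = 5.298*alpha
f = lambda x: (1 / alpha) * exp(-x / alpha)      # exponential pdf

integral, _ = quad(lambda x: x**2 * f(x), 0, float("inf"))   # = 2*alpha^2
print(f"E(L) = ${Lo * integral / usl**2:.3f} per pound")     # ~$0.71
```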

Example 6-10. The tensile strength of a coating material is described by the Weibull pdf:

f(x) = (m/θ)(x/θ)^(m−1) exp[−(x/θ)^m]

where m = 4 and θ = 500 MPa. In a certain thermal-coating application, the lowest acceptable tensile strength is 100 MPa. The production cost is $30.00 per pound. Determine (a) the expected loss based on the goal-post model; and (b) the expected loss based on the Taguchi model.

Solution: This is a larger-is-better case, as coating with tensile strength lower than 100 MPa fails the specification. From the Weibull pdf given above, we first find the CDF:

F(x) = 1 − exp[−(x/θ)^m]

With m = 4 and θ = 500 MPa, the fraction of rejected material is:

F(100) = 1 − exp[−(100/500)⁴] = 0.0016

(a) Based on the goal-post model, the expected loss is: E(L) = 0.0016 × $30 = $0.048 (about 4.8 cents) per pound.

(b) The loss function for larger-is-better is given by (6.25): L(x) = Lo(LSL)²/x². The expected Taguchi loss is found from (6.26); with the substitution u = (x/θ)⁴, the integral reduces to Γ(1/2)/θ², so that E(L) = Lo(LSL/θ)²·Γ(1/2) ≈ $2.13 per pound.

Discussion: In the above two examples, integrals containing exponential functions had to be evaluated in closed form. In Example 6-9, integration by parts was needed; in Example 6-10, a transformation of the variable x was necessary. An integration table is often handy; otherwise, a numerical integration routine serves as well.

Page 142: Reliability engineering

Chapter-VI Quality Control VI-17

Example 6-11. Air bags used in automobiles are designed to inflate fully within 2.5 milliseconds (ms) of impact; full inflation either sooner or later than 2.5 ms is unsafe for the passenger. The Highway Safety Regulation stipulates permissible limits for full inflation of 2.5 ± 0.375 ms. A producer of air bags finds that the inflation times of their product are normally distributed, with a mean of 2.5 ms (on target) and a standard deviation of 0.5 ms. In order to meet the Safety Regulation, the producer institutes a screen test to remove air bags with inflation times outside the limits µ ± 1.5σ. (a) What is the yield of the air bags (those passing the screen test)? (b) What is the standard deviation of those that pass the screen test?

Solutions: The inflation times of the as-produced air bags are normally distributed, with the pdf:

f(t) = [1/(σ√(2π))] exp[−(t − µ)²/(2σ²)]

where σ = 0.5 ms and µ = 2.5 ms. Note that the mean full-inflation time is on target, but the standard deviation of 0.5 ms may be too high: against the regulation limits alone, the yield would be only Y = 1 − 2Φ(−0.375/0.5) = 0.5468. The screening test, however, removes the bags with inflation times outside the limits µ ± 1.5σ. Then, from (6.2), the yield of the screen test is: Y = 1 − 2Φ(−1.5) = 0.8664. For those passing the screen test, the inflation-time distribution is the truncated form:

f*(t) = A·f(t) for (µ − 1.5σ) < t < (µ + 1.5σ)

f*(t) = 0 for t < (µ − 1.5σ) and t > (µ + 1.5σ)

In f*(t), A is a scaling constant determined from the normalization condition:

∫₋∞^∞ f*(t) dt = 1 = ∫ from µ−1.5σ to µ+1.5σ of A·f(t) dt = 0.8664A

which yields A = 1.154.

Note that the mean of f*(t) remains µ = 2.5 ms, but the standard deviation of f*(t) is reduced to:

σ* = 0.7426σ = 0.3713 ms

Since the Safety Regulation limits are 2.5 ± 0.375 ms, the screened air bags provide a yield of Y = 1 − 2Φ(−0.375/0.3713) = 0.6876.

Discussion: The integral associated with the variance (σ*)² can only be evaluated numerically. To do so, we first transform the variable t to ζ by introducing ζ = (t − µ)/σ. It then follows that:

(σ*)² = [2A/√(2π)]·σ² ∫₀^1.5 ζ² exp(−ζ²/2) dζ

The integral in ζ is then evaluated numerically.
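The variance integral above is a one-line quadrature; a sketch:

```python
# Numerical evaluation of the truncated variance integral for Example 6-11:
# (sigma*)^2 = (2*A/sqrt(2*pi)) * sigma^2 * int_0^1.5 z^2 exp(-z^2/2) dz.
from math import sqrt, pi, exp
from scipy.integrate import quad

sigma, A = 0.5, 1.154
integral, _ = quad(lambda z: z**2 * exp(-z**2 / 2), 0, 1.5)
var_star = (2 * A / sqrt(2 * pi)) * sigma**2 * integral
print(f"sigma* = {sqrt(var_star):.4f} ms")       # ~0.3713 ms
```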

VI-4. Robust Design – A Brief Introduction. In product quality control, attention is usually paid to (a) the product yield and (b) the projected long-term loss. If the quality is on target, the product yield is determined by the parameter ∆/σ, as in Φ(−∆/σ); see (6.2). Since the value of ∆ is usually preset by the designer, along with the target τ, reducing σ is the only option left for increasing the product yield; at the same time, reducing σ also reduces the projected long-term Taguchi loss, see (6.22). Factors influencing σ are intrinsic in nature and can originate from many random sources; hence, it is usually difficult to reduce σ beyond a certain limit. On the other hand, factors that influence the bias (µ − τ) are relatively few, and they can be identified physically; these are known as noises. In most cases, noises can be controlled in the early design and development stages. The essence of robust design is to understand the physical nature of the noises and to correct for them in the product design/development stages. The desired outcome is to keep the quality mean on target and to reduce the quality variance to as small a value as possible.

Noise Behaviors. Let X be the product quality measure, and suppose that a noise factor A affects the outcome of X. If the mechanism by which A influences X is known, a deterministic relationship (a function) x(A) can be formulated. In general, there are three distinctive behaviors of x(A): linear, softening and hardening. The following example illustrates the key features of each.

Example 6-12. Torque is required to open the cap of a peanut-butter jar, and the required torque is related to the tightness of the cap. The cap tightness is a random variable stemming from a variety of factors: the cap's dimensions, the material used, the ambient temperature, etc. Let the torque required to open the jar be denoted by X, and let the noise be the tightness of the cap, denoted by A. For simplicity, assume that the tightness is influenced only by the stiffness of the material used. Then a certain relationship exists between X and A, denoted by the function x(A). In the following, we examine the behavior of this function in the context of robust design. (1) Linear Behavior. If x(A) is a linear function, the slope of the straight line depends only on the stiffness of the material used. Consider two cases: (a) the cap is made of a softer material; and (b) the cap is made of a stiffer material. The respective x(A) functions are illustrated graphically below:


[Figure: linear x(A) lines for a soft material (smaller slope) and a stiff material (larger slope); tightness distributions at positions 1 and 2 on the A-axis map to torque distributions at positions 3 through 6 on the x-axis, with the target τ marked.]

Based on the x(A) lines shown above, a robust design may begin as follows. The design target for the torque (needed to open the cap) is denoted by τ and marked on the x-axis by an arrow; it is undesirable for the torque to be much above or below the target. Let the design start by selecting a tightness distribution with its mean at position 1; the function x(A) then yields a torque distribution with its mean at position 3 if the softer material is used, or at position 5 if the stiffer material is used. Between these two designs, position 5 is on target while position 3 is below target; but the design at position 3 yields a smaller scatter than that at position 5. Now consider a different design: let the cap tightness be designed with its distribution at position 2 (made tighter); then the softer material yields a torque distribution at position 4, while the stiffer material lands at position 6. Between these two designs, the one at position 4 is on target, the one at position 6 is above target; furthermore, the design at position 4 yields a smaller scatter than that at position 6. Thus, the design at position 4 is the better design: it is on target and it has the smaller scatter. (2) Softening Behavior. If the dependence of X on A is a "softening" relationship, then x(A) is a concave-downward curve, as illustrated graphically below:


[Figure: concave-downward (softening) x(A) curves for soft and stiff materials; shifting the tightness distribution to the right raises the mean of X and narrows its spread.]

In this case, we note that shifting A horizontally affects both the mean and the variance of X in the vertical direction. Specifically, shifting A to the right increases the mean and decreases the variance of X, regardless of the material used, soft or stiff. However, the increase in the mean of X is more rapid if the stiffer material is used, while the decrease in the variance of X is more rapid if the softer material is used. A robust design in this case would look for a material and a setting of A that together give the optimum: the mean of X on target and the variance of X acceptably small. (3) Hardening Behavior. If the dependence of X on A is a hardening relationship, i.e. the x(A) curve is concave upward, then shifting A to the right increases both the mean and the variance of X; any increase in the variance is undesirable, and this undesirable effect is more pronounced if the stiffer material is used. A robust design would therefore avoid a hardening behavior of x(A). The effect of shifting A to the right on the mean and variance of X can be readily seen in the plot below:

[Figure: concave-upward (hardening) x(A) curves for soft and stiff materials; shifting the tightness distribution to the right raises both the mean and the spread of X.]


Discussion: In the above, we have demonstrated the essence of robust design using three different behavior functions of a single variable: the linear, softening and hardening forms of x(A). In real-world design situations, the behavior function of X may depend on multiple parameters, X(A, B, C, ...); there, robust design is a science in itself, involving linear and/or nonlinear dynamic programming.


Assigned Exercises: 6.1. A process is found to have a short-term Cp= 0.95 and a long-term Cpk=0.9. What is the short-term yield?

What is the long-term yield? [The long-term yield: Y = 99.31%] 6.2. A part must meet the 5-σ criterion in each of 10 independent specifications; determine the probability that the part fails to meet each of the specifications; then, estimate the final yield.

[Probability of failing each specification: p = 0.6038×10⁻⁶; final yield: Y = 99.99%] 6.3. The target quality of a product is set at 10. Off-line sampling finds that the product quality distribution (pdf) fits the following function:

f(x) = (0.04x)e^(−0.2x), 0 ≤ x < ∞

(a) Is the product quality on target? (b) If the specification limits are 10 ± 5, what is the probability of not meeting the specification? (c) What is the Taguchi loss if it costs $5.00 per product out of specification? [Ans.: (a) yes, µ = 10; (b) 46.33%; (c) σ = 7.07 and E(L) = $10 per product] 6.4. An off-line inspection of a sample of beer cans shows that 0.5% of the cans failed under a compression load of 101.37 lbs, while 0.3% sustained a load of 102.52 lbs. The design target of the can's compressive strength is 102.00 lbs. (a) Assuming the compressive strength is normally distributed, determine the mean and variance. (b) Is the quality of the cans (in terms of their compressive strength) on target? (c) If the acceptance criterion is 102 ± 0.25 lbs, what is the expected yield? (d) What is the Taguchi process capability index? (e) If it costs $0.05 to produce a can, what is the expected loss based on the goal-post criterion? (f) What is the expected long-term loss, based on the method of Taguchi? [(a) µ = 101.925 lbs; (c) Y = 72.42%; (d) Cpm = 0.364; (e) E(L) = 0.65 cent] 6.5. Voltage drift in a computer circuit is allowed within limits of ±0.8 volt. When the circuit is connected to a battery with a known voltage-drift distribution, the probability that the drift allowance is not exceeded must be evaluated. Now,

f(x) = (3/4)(1 − x²) for |x| < 1; f(x) = 0 for |x| > 1. (a) What is the probability that the circuit will encounter a voltage drift outside the allowable limits? (b) If, each time the drift exceeds the allowance, the battery must be replaced at a cost of $100 per replacement, what is the expected Taguchi loss per battery? [(a) p = 5.6%; (b) E(L) = $31.25] 6.6. The solder joints in a computer chip must have a diameter within 4 ± 0.01 µm. It is known that, for a batch of solder joints, the diameter distribution is normal and the mean is on target.


(a) What is the short-term standard deviation σ if the solder joints meet the 6-σ criterion? (b) If there are 10000 solder joints on a chip, what is the yield of the chips? [(a) σ = 0.00167 µm; (b) Y = 99.98%] In a long-term production period, µ moves off target by 0.005 µm and σ increases by 10%. (c) Determine the long-term process capability index for the solder joints; (d) determine the probability of failing to meet the specification for the solder joints; (e) determine the long-term yield of the chips (each containing 10000 solder joints).

[(c) Cpk = 1.724; (d) p = 2Φ(−5.17); (e) Y = e^(−10000p)] 6.7. The lifetime of an electric motor is normally distributed. There is a 5% chance that the motor fails before 2000 hours, and a 15% chance that it lasts beyond 4000 hours. In a device, the motor is used as a power source with a guarantee to provide at least 3000 hours of operation; should the motor fail before 3000 hours, it must be immediately replaced. The cost of replacing the motor is $1000. Determine the following, with relevant reasoning and calculations: (a) the mean (µ) and the standard deviation (σ) of the motor's operational life distribution; (b) the probability that the motor fails before 3000 hours of operation; (c) the expected Taguchi loss, by the "larger-is-better" model. [(a) µ = 3225 hrs; σ = 744.9 hrs; (b) p = 38.21%]