graphical analysis of repair data

Appendix M: Repair Data Analysis of Abernethy, R.B. (2006), The New Weibull Handbook, 5th ed., available from Dr. R.A. Abernethy, [email protected], 536 Oyster Road, North Palm Beach, FL 33408. May 5, 2006

AN APPLICATION OF GRAPHICAL ANALYSIS OF REPAIR DATA

Wayne Nelson, [email protected], 739 Huntingdon Drive, Schenectady, NY 12309, USA

SUMMARY. This expository article presents a simple and informative non-parametric plot of repair data on a sample of systems. The plot is illustrated with transmission repair data from cars on a preproduction road test.

KEY WORDS: repair data; reliability data; graphical analysis.

1. INTRODUCTION

Purpose. This article presents a simple and informative plot for analyzing data on numbers or costs of repeated repairs of a sample of systems. The plotting method provides a non-parametric graphical estimate of the population mean cumulative number or cost of repairs per system versus age. This estimate can be used to:

1. Evaluate whether the population repair (or cost) rate increases or decreases with age (this is useful for system retirement and burn-in decisions),

2. Compare two samples from different designs, production periods, maintenance policies, environments, operating conditions, etc.,

3. Predict future numbers and costs of repairs,

4. Reveal unexpected information and insight, an important advantage of plots.

Overview. Section 2 describes typical repair data. Section 3 defines the basic population model and its mean cumulative function (MCF) for the number or cost of repairs. Section 4 shows how to calculate and plot a sample estimate of the MCF from data from systems with a mix of ages. Section 5 explains how to use and interpret such plots.

2. REPAIR DATA

Purpose. This section describes typical repair data from a sample of systems.

Transmission data. Table 1 displays a small set of typical repair data on a sample of 34 cars in a preproduction road test. Information sought from the data includes (1) the mean cumulative number of repairs per car by 24,000 test miles (132,000 customer miles, design life) and (2) whether the population repair rate increases or decreases as the population ages. For each car the data consist of the car's mileage at each transmission repair and the latest observed mileage. For example, the data on car 024 are a repair at 7068 miles and its latest mileage 26,744+ miles; here + indicates how long the car has been observed. Nelson (1988,1990,1995, 2003) gives repair data on blood analyzers, residential heat pumps, window air-conditioners, power supplies, turbines, and other applications. The methods below apply to recurrence data from many fields.


Table 1. Transmission repair data

CAR   MILEAGE
024   7068   26744+
026   28     13809+
027   48     1440    29834+
029   530    25660+
031   21762+
032   14235+
034   1388   21133+
035   21401+
098   21876+
107   5094   18228+
108   21691+
109   20890+
110   22486+
111   19321+
112   21585+
113   18676+
114   23520+
115   17955+
116   19507+
117   24177+
118   22854+
119   17844+
120   22637+
121   375    19607+
122   19403+
123   20997+
124   19175+
125   20425+
126   22149+
129   21144+
130   21237+
131   14281+
132   8250   21974+
133   19250  21888+


Censoring. A system's latest observed age is called its "censoring age", because the system's repair history beyond that age is censored (unknown) at the time of the data analysis. Usually, system censoring ages differ. The different censoring ages complicate the data analysis and require the methods here. A system may have no repairs; then its data are just its censoring age. Other systems may have one, two, three, or more repairs before their censoring ages.
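The data layout just described can be mirrored in code. The following is a minimal Python sketch; the function name parse_history and the text row format are illustrative assumptions, not part of the article. It splits a row written in the style of Table 1 into the repair ages and the censoring age.

```python
def parse_history(row):
    """Split a row such as '027 48 1440 29834+' into (car id, repair ages, censoring age).

    Assumes the last field is the censoring age, marked with a trailing '+'.
    """
    fields = row.split()
    car_id, ages = fields[0], fields[1:]
    if not ages or not ages[-1].endswith("+"):
        raise ValueError(f"row for car {car_id} lacks a censoring age marked with '+'")
    repairs = [int(a) for a in ages[:-1]]      # zero or more repair ages
    censor_age = int(ages[-1].rstrip("+"))     # latest observed age
    return car_id, repairs, censor_age


# Three rows copied from Table 1.
rows = ["024 7068 26744+", "027 48 1440 29834+", "031 21762+"]
histories = [parse_history(r) for r in rows]
for car_id, repairs, censor_age in histories:
    print(car_id, repairs, censor_age)
```

A car with no repairs, such as car 031, yields an empty list of repair ages together with its censoring age.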

Age. Here "age" (or "time") means any useful measure of system usage, e.g., mileage, days, cycles, months, etc.

3. THE POPULATION AND ITS MEAN CUMULATIVE FUNCTION

Model. Needed information on the repair behavior of a population of systems is given by the population mean cumulative function (MCF) versus age t. This function is a feature of the following model for the population without censoring. At a particular age t each population system has accumulated a total cost (or number) of repairs. These cumulative system totals usually differ. Figure 1 depicts the population of such uncensored system cumulative cost history functions as smooth curves for ease of viewing. In reality the histories are staircase functions where the rise of each step is a system's cost or number of repairs at that age. However, staircase functions are hard to view in such a plot. At age t, there is a population distribution of the cumulative cost (or number) of repairs. It appears in Figure 1 as a continuous density. This distribution at age t has a population mean M(t). M(t) is plotted versus t as a heavy line in Figure 1. M(t) is called the population "mean cumulative function" (MCF) for the cost (or number) of repairs. It provides most information sought from repair data.

Figure 1. Population cumulative cost histories (uncensored), distribution
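The definition of M(t) as a mean over system histories can be illustrated with a short Python sketch. The four histories below are hypothetical values chosen only for illustration, and the function names are assumptions. With uncensored histories, the mean cumulative number at age t is simply the average of the systems' cumulative counts at that age.

```python
def cumulative_count(repair_ages, t):
    """Number of repairs a single system has accumulated by age t."""
    return sum(1 for age in repair_ages if age <= t)


def mean_cumulative(histories, t):
    """Average of the cumulative counts over all (uncensored) systems at age t."""
    return sum(cumulative_count(h, t) for h in histories) / len(histories)


# Hypothetical uncensored repair ages for four systems (illustrative values only).
histories = [[5, 40], [12], [], [3, 20, 35]]

for t in (10, 25, 50):
    print(t, mean_cumulative(histories, t))
```

With censored data this simple average is no longer available at every age, which is why Section 4 uses the number of systems at risk instead.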

Repair rate. When M(t) is for the number of repairs, the derivative

m(t) = dM(t)/dt


is assumed to exist and is called the population "instantaneous repair rate". It is also called the "recurrence rate" or "intensity function" when some other repeating occurrence is observed. It is expressed in repairs per unit time per system, e.g., transmission repairs per 1000 miles per car. Some mistakenly call m(t) the "failure rate", which causes confusion with the quite different failure rate (hazard function) of a life distribution for non-repaired units (usually components). The failure rate for a life distribution has an entirely different definition, meaning, and use, as explained by Ascher and Feingold (1984). Note that the hazard function for a time-to-repair distribution is also called a "repair rate" but is an entirely different concept.
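As a brief worked illustration of how the rate and the MCF relate, assuming M(0) = 0 (an assumption made here for the illustration), the MCF is the integral of the repair rate. The constant-rate case, with a hypothetical rate symbol \lambda, gives a straight-line MCF:

```latex
m(t) = \frac{dM(t)}{dt}, \qquad M(t) = \int_0^t m(u)\,du \quad \text{(assuming } M(0) = 0\text{)},
\qquad m(t) = \lambda \;\Rightarrow\; M(t) = \lambda t .
```

A decreasing m(t) thus corresponds to an MCF that flattens with age, and an increasing m(t) to one that steepens.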

4. ESTIMATE AND PLOT OF THE MCF

Steps. The following steps yield a non-parametric estimate M*(t) of the population MCF M(t) for the number of repairs from a sample of N systems; N = 34 cars.

1. List all repair and censoring ages in order from smallest to largest as in column (1) of Table 2. Denote each censoring age with a +. If a repair age of a system equals its censoring age, put the repair age first. If two or more systems have a common age, list them in a suitable order, possibly random.

2. For each sample age, write the number I of systems then in use ("at risk") in column (2) as follows. If the earliest age is a censoring age, write I = N −1; otherwise, write I = N. Proceed down column (2) writing the same I-value for each successive repair age. At each censoring age, reduce the I-value by one. For the last age, I = 0.

3. For each repair, calculate its observed mean number of repairs at that age as 1/I. For example, for the repair at 28 miles, 1/34 = 0.03, which appears in column (3). For a censoring age, the observed mean number is zero, corresponding to a blank in column (3). However, the censoring ages determine the I-values of the repairs and thus are properly taken into account.

4. In column (4), calculate the sample mean cumulative function M*(t) for each repair as follows. For the earliest repair age, this is the corresponding mean number of repairs, namely, 0.03 in Table 2. For each successive repair age this is the corresponding mean number of repairs (column (3)) plus the preceding mean cumulative number (column (4)). For example, at 19,250 miles, this is 0.04 + 0.27 = 0.31. (Table 2 sums the rounded column (3) entries; at full precision, 9/34 + 1/26 is about 0.303.) Censoring ages have no mean cumulative number.

5. For each repair, plot on graph paper its mean cumulative number (column (4)) against its age (column (1)) as in Figure 2. This plot displays the non-parametric estimate M*(t), which is a staircase function and is called the sample MCF. Censoring times are not plotted but are taken into account in the MCF estimate.
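Steps 1 through 4 can be sketched in Python as follows. This is a minimal illustration, not the MCFLIM program; the function name sample_mcf and the data layout are assumptions. Ties are broken by listing repairs before censoring ages, which is one "suitable order" in the sense of step 1.

```python
def sample_mcf(histories):
    """Return (repair age, number at risk I, increment 1/I, M*(t)) for each repair.

    histories: list of (repair_ages, censoring_age) pairs, one per system.
    """
    events = []
    for repairs, censor_age in histories:
        events += [(age, 0) for age in repairs]   # 0 marks a repair
        events.append((censor_age, 1))            # 1 marks censoring; sorts after a tied repair
    events.sort()                                 # step 1: order all ages

    at_risk = len(histories)                      # step 2: start with I = N
    mcf = 0.0
    rows = []
    for age, is_censoring in events:
        if is_censoring:
            at_risk -= 1                          # step 2: one fewer system in use
        else:
            increment = 1.0 / at_risk             # step 3: observed mean number 1/I
            mcf += increment                      # step 4: accumulate M*(t)
            rows.append((age, at_risk, increment, mcf))
    return rows


# Transmission data from Table 1: (repair mileages, censoring mileage) per car.
cars = [
    ([7068], 26744), ([28], 13809), ([48, 1440], 29834), ([530], 25660),
    ([], 21762), ([], 14235), ([1388], 21133), ([], 21401), ([], 21876),
    ([5094], 18228), ([], 21691), ([], 20890), ([], 22486), ([], 19321),
    ([], 21585), ([], 18676), ([], 23520), ([], 17955), ([], 19507),
    ([], 24177), ([], 22854), ([], 17844), ([], 22637), ([375], 19607),
    ([], 19403), ([], 20997), ([], 19175), ([], 20425), ([], 22149),
    ([], 21144), ([], 21237), ([], 14281), ([8250], 21974), ([19250], 21888),
]

rows = sample_mcf(cars)
for age, at_risk, increment, mcf in rows:
    print(f"{age:>6} {at_risk:>3} {increment:.3f} {mcf:.3f}")
```

The sketch carries full precision, so its final value at 19,250 miles is about 0.303, whereas Table 2 shows 0.31 because it sums increments already rounded to two decimals.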

Table 2. MCF Calculations

(1)        (2)           (3)            (4)
Mileage    No. I obs'd   Mean no. 1/I   MCF
28         34            0.03           0.03
48         34            0.03           0.06
375        34            0.03           0.09
530        34            0.03           0.12
1388       34            0.03           0.15
1440       34            0.03           0.18
5094       34            0.03           0.21
7068       34            0.03           0.24
8250       34            0.03           0.27
13809+     33
14235+     32
14281+     31
17844+     30
17955+     29
18228+     28
18676+     27
19175+     26
19250      26            0.04           0.31
19321+     25
19403+     24
19507+     23
19607+     22
20425+     21
20890+     20
20997+     19
21133+     18
21144+     17
21237+     16
21401+     15
21585+     14
21691+     13
21762+     12
21876+     11
21888+     10
21974+     9
22149+     8
22486+     7
22637+     6
22854+     5
23520+     4
24177+     3
25660+     2
26744+     1
29834+     0

Plot. Figure 2 was plotted by Nelson and Doganaksoy's (1994) program MCFLIM, which does the calculations above. The program also calculates non-parametric approximate 95% confidence limits for M(t); they are shown above and below each data point with a -. Nelson's (1995,2003) complex calculations of these limits require a computer program like MCFLIM, available from him.
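The limits of Nelson (1995) are not reproduced here. As a rough stand-in, the following Python sketch uses a different and simpler technique: naive limits that treat the repairs as coming from a Poisson process, so that the variance of M*(t) is approximated by the sum of 1/I^2 over the repairs up to age t. That Poisson assumption, the normal approximation, the clipping of the lower limit at zero, and the function name naive_limits are all assumptions of this sketch, and its limits will generally differ from those in Figure 2.

```python
import math


def naive_limits(repair_rows, z=1.96):
    """Return (age, M*, lower, upper) using the naive variance sum(1 / I**2).

    repair_rows: (repair age, number at risk I) pairs in increasing age order.
    z = 1.96 gives approximate 95% limits under a normal approximation.
    NOTE: this is a Poisson-process approximation, not Nelson's (1995) method.
    """
    mcf = 0.0
    var = 0.0
    out = []
    for age, at_risk in repair_rows:
        mcf += 1.0 / at_risk
        var += 1.0 / at_risk**2
        half_width = z * math.sqrt(var)
        out.append((age, mcf, max(0.0, mcf - half_width), mcf + half_width))
    return out


# Repair ages and numbers at risk from columns (1) and (2) of Table 2.
repair_rows = [(28, 34), (48, 34), (375, 34), (530, 34), (1388, 34),
               (1440, 34), (5094, 34), (7068, 34), (8250, 34), (19250, 26)]

limits = naive_limits(repair_rows)
for age, mcf, lower, upper in limits:
    print(f"{age:>6} {mcf:.3f} {lower:.3f} {upper:.3f}")
```

Such naive limits tend to be too narrow when repairs cluster within particular systems, which is one reason to prefer limits that do not rest on the Poisson assumption.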

[Plot: sample MCF (vertical axis, 0 to 0.50) versus MILEAGE (horizontal axis, 0 to 24,000 miles); each repair is plotted as 1, with its confidence limits marked - above and below.]

Figure 2. Transmission data MCF and 95% confidence limits (-).

Software. The following programs calculate and plot the MCF estimate and confidence limits from data with exact ages and right censoring. They handle the number and cost (or value) of recurrences and allow for positive or negative values.

• MCFLIM of Nelson and Doganaksoy (1994). This was used to obtain Figure 2.

• The RELIABILITY Procedure in the SAS/QC Software of the SAS Institute (2004).

• The JMP software of the SAS Institute (2005), 565-572.

• SPLIDA features developed by Meeker and Escobar (2004) for S-PLUS.

• A program developed for General Motors by Robinson (1995).

• The ReliaSoft (2005) Weibull++7 software has a Recurrence Data Analysis (RDA) add-on.

5. HOW TO INTERPRET AND USE A PLOT

MCF estimate. The plot displays a non-parametric estimate M*(t) of M(t). That is, the estimate involves no assumptions about the form of M(t) or the process generating the system histories. This non-parametric estimate is a staircase function that is flat between repair ages, but the flat portions need not be plotted. The MCF of a large population is usually regarded as a smooth curve, and one usually imagines a smooth curve through the plotted points. Interpretations of such plots appear below. See Nelson (1988,1995,2003) for more detail.

Mean cumulative number. An estimate of the population mean cumulative number of repairs by a specified age is read directly from such a curve through the plotted points. For example, from Figure 2 the estimate of this by 24,000 miles is 0.31 repairs per car, an answer to a basic question.
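Reading the estimate at a specified age can also be done numerically. The following Python sketch evaluates the staircase estimate M*(t) rather than a smooth curve imagined through the points; the function name mcf_at is an assumption, and the values are the repair rows of Table 2.

```python
from bisect import bisect_right


def mcf_at(repair_ages, mcf_values, t):
    """Value of the staircase estimate M*(t): the MCF at the last repair age <= t."""
    k = bisect_right(repair_ages, t)   # number of repair ages that are <= t
    return 0.0 if k == 0 else mcf_values[k - 1]


# Columns (1) and (4) of Table 2 (repair rows only).
repair_ages = [28, 48, 375, 530, 1388, 1440, 5094, 7068, 8250, 19250]
mcf_values = [0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.31]

print(mcf_at(repair_ages, mcf_values, 24000))   # estimate by 24,000 miles
print(mcf_at(repair_ages, mcf_values, 10000))
print(mcf_at(repair_ages, mcf_values, 10))
```

The estimate should not be read beyond the largest censoring age (29,834 miles here), since no system was observed past that age.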

Repair rate. The derivative of such a curve (imagined or fitted) estimates the repair rate m(t). If the derivative increases with age, the population repair rate increases as systems age. If the derivative decreases, the population repair rate decreases with age. The behavior of the rate is used to determine burn-in, overhaul, and retirement policies. In Figure 2 the repair rate (derivative) decreases as the transmission population ages, the answer to a basic question.
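A crude numerical check of this behavior compares average slopes of the sample MCF over two age intervals. In the Python sketch below, the split at 12,000 miles and the function name average_rate are arbitrary assumptions made for illustration; the values are the repair rows of Table 2.

```python
def average_rate(ages, mcf_values, start, end):
    """Average repair rate over (start, end]: the MCF rise divided by the age span."""
    def mcf_at(t):
        value = 0.0
        for age, m in zip(ages, mcf_values):
            if age <= t:
                value = m          # staircase value at the last repair age <= t
        return value
    return (mcf_at(end) - mcf_at(start)) / (end - start)


# Columns (1) and (4) of Table 2 (repair rows only).
ages = [28, 48, 375, 530, 1388, 1440, 5094, 7068, 8250, 19250]
mcf_values = [0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.31]

# Rates expressed in repairs per 1000 miles per car.
early = average_rate(ages, mcf_values, 0, 12000) * 1000
late = average_rate(ages, mcf_values, 12000, 24000) * 1000
print(f"early: {early:.4f}  late: {late:.4f}")
```

The early average rate is several times the later one, consistent with a repair rate that decreases with age. With only ten repairs, such interval averages are rough and sensitive to where the interval boundaries are placed.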

Burn-in. Some systems are subjected to a factory burn-in. Systems typically are run and repaired until the instantaneous (population) repair rate decreases to a desired value m'. An estimate of the suitable length t' of burn-in is obtained from the sample MCF as shown in Figure 3. A straight line segment with slope m' is moved until it is tangent to the MCF. The corresponding age t' at the tangent point is suitable, as shown in Figure 3.

Figure 3. To determine age for burn-in t'.
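The tangent construction can be sketched numerically once a smooth curve has been fitted to the sample MCF. The Python sketch below assumes a hypothetical power-law curve M(t) = a t^b with 0 < b < 1; that functional form, the parameter values, and the desired rate m' are all assumptions chosen for illustration, not taken from the article. For this curve the slope is a b t^(b-1), and setting it equal to m' gives t' in closed form.

```python
def burn_in_age(a, b, target_rate):
    """Age t' at which the slope of M(t) = a * t**b falls to target_rate (m').

    Requires a > 0, target_rate > 0, and 0 < b < 1, so that the repair rate
    a * b * t**(b - 1) decreases with age.
    """
    if a <= 0 or target_rate <= 0:
        raise ValueError("a and target_rate must be positive")
    if not 0 < b < 1:
        raise ValueError("a decreasing repair rate requires 0 < b < 1")
    return (target_rate / (a * b)) ** (1.0 / (b - 1.0))


a, b = 0.18, 0.5      # hypothetical fitted curve M(t) = 0.18 * sqrt(t)
m_prime = 0.01        # hypothetical desired repair rate per unit time
t_prime = burn_in_age(a, b, m_prime)

# Check: the slope of the assumed curve at t' equals the desired rate m'.
slope_at_t_prime = a * b * t_prime ** (b - 1.0)
print(round(t_prime, 6), round(slope_at_t_prime, 6))
```

If no parametric curve is fitted, the same idea applies graphically, as in the text: slide a line of slope m' until it touches the curve drawn through the plotted points.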

Other information. Nelson (2003) gives other applications and information on

1. Predicting future numbers or costs of repairs for a fleet,

2. Analyzing repair cost data or other numerical values associated with repairs,

3. Analyzing availability data, including downtime for repairs,

4. Analyzing data with more complex censoring where system histories have gaps with missing repair data,

5. Analyzing data with a mix of types of repairs,

6. The minimal assumptions on which the non-parametric estimate M*(t) and confidence limits depend.

Nelson and Doganaksoy (1994) and Nelson (2003) show how to statistically compare two data sets with respect to their MCFs. Such data sets may come from different designs, production periods, environments, maintenance policies, etc.

[Figure 3 plot: M(t) (vertical axis, 0.0 to 1.8) versus TIME t (horizontal axis, 0 to 100), marking the desired slope m' and the desired age t'.]

Literature. Most models and data analysis methods for repair data are parametric and involve more assumptions, which often are unrealistic. For example, Englehardt (1995) and Ascher and Feingold (1984) present such models, analyses, and assumptions for a single system, not for a sample of systems. The simplest such parametric model is the Poisson process. Also, the methods here extend to costs, downtimes, and other values associated with repairs, whereas previous methods apply only to simple counts of repairs.

Concluding remarks. The simple plot of the sample MCF is informative and widely useful. It requires minimal assumptions and is simple to make and present to others.

Acknowledgments. This updated version of Nelson (1998) appears here with the kind permission of Wiley, publisher of Quality and Reliability Engineering International. The author gratefully thanks Mr. Richard J. Rudy of Daimler-Chrysler, who generously granted permission to use the transmission data here.

Author. Dr. Wayne Nelson consults and trains privately on reliability data analysis. He is a Fellow of the Institute of Electrical and Electronics Engineers, the American Society for Quality, and the American Statistical Association. He has published 100+ articles, two Wiley classic books (Applied Life Data Analysis and Accelerated Testing), the SIAM book Nelson (2003), and various booklets.

REFERENCES

Abernethy, R.B. (2006), The New Weibull Handbook, 5th ed., available from Dr. R.A. Abernethy, [email protected], 536 Oyster Road, North Palm Beach, FL 33408.

Ascher, H. and Feingold, H. (1984), Repairable Systems Reliability, Marcel Dekker, New York.

Englehardt, M. (1995), "Models and Analyses for the Reliability of a Single Repairable System," in N. Balakrishnan (ed.), Recent Advances in Life-Testing and Reliability, CRC Press, Boca Raton, FL, 79-106.

Meeker, W.Q. and Escobar, L.A. (2004), “SPLIDA (S-PLUS Life Data Analysis) Software – Graphical User Interface,” available from Prof. Wm. Q. Meeker, Statistics Dept., Iowa State Univ., Ames, Iowa 50010 or www.public.iastate.edu/~splida.

Nelson, Wayne (1988), "Graphical Analysis of System Repair Data," J. Quality Technology 20, 2-35.

Nelson, Wayne (1990), "Hazard Plotting of Left Truncated Life Data," J. Quality Technology 22, 230-238.

Nelson, Wayne (1995), "Confidence Limits for Recurrence Data--Applied to Cost or Number of Repairs," Technometrics, 37, 147-157.

Nelson, Wayne (1998), "An Application of Graphical Analysis of Repair Data," Quality and Reliability Engineering International 14, 49-52.

Nelson, Wayne B. (2003), Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications, ASA-SIAM Series on Statistics and Applied Probability, www.siam.org/books/sa10 .

Nelson, Wayne and Doganaksoy, Necip (1994), Documentation for MCFLIM and MCFDIFF -- Programs for Recurrence Data Analysis, available from Wayne Nelson, [email protected].

ReliaSoft Corp. (2005) Weibull++7 Life Data Analysis Reference, ReliaSoft Publishing, 115 S. Sherwood Village Dr., ReliaSoft Plaza, Tucson, AZ 85710, www.reliasoft.com. RDA documentation at http://www.weibull.com/LifeDataWeb/recurrent_events_data_analysis.htm


Robinson, J.A. (1995), “Standard Errors for the Mean Cumulative Number of Repairs on Systems from a Finite Population,” in Recent Advances in Life-Testing and Reliability, 195-217, ed. N. Balakrishnan, CRC Press, Boca Raton, FL.

SAS Institute (2004), SAS/QC 9.1 User's Guide, Volumes 1, 2, and 3, SAS Publishing, SAS Campus Dr., Cary, NC 27513. This can be accessed free at (click on SAS/QC) http://support.sas.com/91doc/docMainpage.jsp

SAS Institute (2005), “JMP Statistical Discovery Software: Statistics and Graphics Guide,” Version 6, 565-572, SAS Campus Dr., Cary, NC 27513.
