10.2 Product-Limit (Kaplan-Meier) Method

• Let us use the term failure time to indicate the time of the event of interest in either a survival analysis or reliability analysis.

• Let T be a continuous random variable denoting failure time, and F (t) be the cdf of T .

• The reliability function R(t) or survival function S(t) is defined to be

R(t) = 1− F (t) or S(t) = 1− F (t).

• Thus, R(t) and S(t) represent the probability that the event will occur after time t.

• In the course notes, I will use R(t) for a reliability study. Just replace R(t) with S(t) for a survival study.

• If we have complete data (no censoring) in a study of n subjects or items, then estimates of F(t) and R(t) are the empirical cdf F̂(t) and the empirical reliability function R̂(t). That is,

$$\hat{F}(t) = \frac{\text{number of items failing by time } t}{n}$$

$$\hat{R}(t) = 1 - \hat{F}(t) = \frac{\text{number of items not failing (surviving) at time } t}{n}$$

• Consider the following complete data set where

t_i = the ith failure time

n_i = the number of surviving items just prior to t_i = the number of items at risk at time t_i

d_i = the number of failures at t_i

 i   t_i    n_i   d_i   R̂(t_i)       (n_i − d_i)/n_i   ∏_{k=0}^{i} (n_k − d_k)/n_k
 0   −∞     10    0     10/10 = 1     1                 10/10 = 1
 1   36.3   10    1     9/10 = .9     9/10              1*(9/10) = .9
 2   41.7    9    1     8/10 = .8     8/9               .8
 3   43.9    8    1     7/10 = .7     7/8               .7
 4   49.9    7    1     6/10 = .6     6/7               .6
 5   50.1    6    1     5/10 = .5     5/6               .5
 6   50.8    5    2     3/10 = .3     3/5               .3
 7   51.9    3    2     1/10 = .1     1/3               .1
 8   52.9    1    1     0/10 = 0      0                 0
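The agreement between the empirical reliability column and the running product column can be checked numerically. The following is a minimal Python sketch (not part of the SAS material in these notes); the function name `reliability_table` is hypothetical, and exact fractions are used so the two columns can be compared without rounding.

```python
from fractions import Fraction

# Complete data from the table: distinct failure times and failures at each time
times = [36.3, 41.7, 43.9, 49.9, 50.1, 50.8, 51.9, 52.9]
deaths = [1, 1, 1, 1, 1, 2, 2, 1]
n = sum(deaths)  # 10 items, none censored


def reliability_table(times, deaths, n):
    """Return rows (t_i, n_i, d_i, empirical R-hat, product-form R-hat)."""
    rows = []
    at_risk = n            # n_i: items surviving just prior to t_i
    failed = 0             # cumulative number of failures through t_i
    product = Fraction(1)  # running product of (n_k - d_k) / n_k
    for t, d in zip(times, deaths):
        failed += d
        empirical = Fraction(n - failed, n)        # items surviving past t_i, over n
        product *= Fraction(at_risk - d, at_risk)  # multiply by (n_i - d_i) / n_i
        rows.append((t, at_risk, d, empirical, product))
        at_risk -= d       # with no censoring, only failures leave the risk set
    return rows


rows = reliability_table(times, deaths, n)
for t, n_i, d_i, emp, prod in rows:
    print(f"{t:5.1f}  n_i={n_i:2d}  d_i={d_i}  "
          f"empirical={float(emp):.1f}  product={float(prod):.1f}")
```

With no censoring, the two computed columns coincide at every failure time, which is the identity the table illustrates.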


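• Note that $\hat{R}(t_i) = \prod_{k=0}^{i} \frac{n_k - d_k}{n_k}$. Why? By definition:

$$R(t_i) = P(T > t_i) = \frac{P(T > t_i)}{P(T > t_{i-1})}\, P(T > t_{i-1}) = \frac{P(T > t_i)}{P(T > t_{i-1})}\, R(t_{i-1})$$

• Note that $\frac{P(T > t_i)}{P(T > t_{i-1})}$ can be estimated by

$$\frac{\hat{P}(T > t_i)}{\hat{P}(T > t_{i-1})} = \frac{(\text{number of observations} > t_i)/n}{(\text{number of observations} > t_{i-1})/n} = \frac{\text{number of observations} > t_i}{\text{number of observations} > t_{i-1}} = \frac{n_i - d_i}{n_i}$$

Now we substitute to get

$$\hat{R}(t_i) = \frac{\hat{P}(T > t_i)}{\hat{P}(T > t_{i-1})}\, \hat{R}(t_{i-1}) = \frac{n_i - d_i}{n_i}\, \hat{R}(t_{i-1})$$

• Recursively applying the formula yields $\hat{R}(t_i) = \prod_{k=0}^{i} \frac{n_k - d_k}{n_k}$ for any event time t_i.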

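• To find the standard error se(R̂(t)), consider the distribution of X_t = the number of items surviving at time t.

• Then X_t ∼ Binomial(n, p) where p = R(t). Therefore,

$$\mathrm{Var}(X_t) = n\, R(t)\,(1 - R(t)) \;\longrightarrow\; \mathrm{Var}\!\left(\frac{X_t}{n}\right) = \frac{R(t)\,(1 - R(t))}{n}$$

and $\mathrm{sd}(\hat{R}(t)) = \sqrt{\dfrac{R(t)\,(1 - R(t))}{n}}$.

• Substitution of R̂(t) yields $\mathrm{se}(\hat{R}(t)) = \sqrt{\dfrac{\hat{R}(t)\,(1 - \hat{R}(t))}{n}}$, which can be used to generate approximate confidence intervals $\hat{R}(t) \pm z^{*}\, \mathrm{se}(\hat{R}(t))$ for R(t).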
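As a worked illustration with the complete data set above, take t = 50.1, where R̂(t) = .5 and n = 10. The value z* = 1.96 (an approximate 95% interval) is an assumption made for this example only.

$$\mathrm{se}(\hat{R}(50.1)) = \sqrt{\frac{(.5)(1 - .5)}{10}} = \sqrt{.025} \approx .158$$

$$\hat{R}(50.1) \pm z^{*}\, \mathrm{se}(\hat{R}(50.1)) = .5 \pm (1.96)(.158) \approx (.19,\ .81)$$

The interval is wide because only 10 items were tested.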


• The Kaplan-Meier Method (or product-limit method) generalizes the product form for

calculating R̂(ti) when censoring does exist.

• The goal is to estimate R(t) when we have censored observations.

• Consider a censored data set where

t_1 < t_2 < · · · < t_k are the failure times of the uncensored items.

d_i = the number of failures at time t_i (i = 1, 2, . . . , k).

n_i = the number of items at risk at time t_i = the number of censored and uncensored items surviving just prior to t_i

– By convention, t_0 = −∞, d_0 = 0, and n_0 = n.

• The Kaplan-Meier estimate of R(t) is

$$\hat{R}(t_i) = \prod_{j=0}^{i} \frac{n_j - d_j}{n_j} \quad \text{for } i = 1, 2, \ldots, k$$

• For any time t that is not an event time, find the largest t_i < t. Then R̂(t) = R̂(t_i). That is, R̂(t) remains constant between event times.

• An estimate of the variance of R̂(t) (Lawless 1982) is:

$$\widehat{\mathrm{Var}}(\hat{R}(t_i)) = \left[\hat{R}(t_i)\right]^2 \sum_{j:\, t_j \le t_i} \frac{d_j}{n_j\,(n_j - d_j)}$$

where the summation is taken over all uncensored times t_j such that t_j ≤ t_i.

• Taking the square root yields se(R̂(t_i)), which can be used to generate the following approximate confidence interval for R(t_i):

$$\hat{R}(t_i) \pm z^{*}\, \mathrm{se}(\hat{R}(t_i))$$
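The product-limit estimate and the variance estimate above can be sketched in a few lines of Python (a language chosen for illustration; the notes themselves use SAS). The data are the grinder lifetimes from the example later in this section. The function name `kaplan_meier` and the choice to report a missing standard error when R̂ reaches 0 are assumptions of this sketch.

```python
import math

# Grinder data from the example later in this section: (time, censored?)
data = [(42.1, True), (77.8, False), (83.3, True), (88.7, False), (101.1, False),
        (105.9, False), (117.0, False), (126.9, False), (138.7, False), (148.9, False),
        (151.3, True), (157.3, False), (163.8, False), (177.2, True), (194.3, True),
        (195.6, True), (207.0, False), (215.3, True), (217.4, False), (258.8, True)]


def kaplan_meier(data):
    """Return rows (t_i, n_i, d_i, R-hat, se) for each distinct failure time."""
    data = sorted(data)
    total = len(data)
    rows = []
    r_hat = 1.0
    var_sum = 0.0   # running sum of d_j / (n_j (n_j - d_j))
    idx = 0
    while idx < total:
        t = data[idx][0]
        at_risk = total - idx          # censored and uncensored items with time >= t
        tied = 0                       # number of observations recorded at time t
        d = 0                          # number of failures at time t
        while idx + tied < total and data[idx + tied][0] == t:
            if not data[idx + tied][1]:
                d += 1
            tied += 1
        if d > 0:
            r_hat *= (at_risk - d) / at_risk
            if at_risk > d:
                var_sum += d / (at_risk * (at_risk - d))
                se = r_hat * math.sqrt(var_sum)
            else:
                se = float("nan")      # assumption: the formula is undefined when R-hat = 0
            rows.append((t, at_risk, d, r_hat, se))
        idx += tied
    return rows


rows = kaplan_meier(data)
for t, n_i, d_i, r_hat, se in rows:
    print(f"{t:6.1f}  n_i={n_i:2d}  d_i={d_i}  R_hat={r_hat:.4f}  se={se:.4f}")
```

Censored items stay in the risk set until their censoring time and then drop out without changing R̂, which is why the estimate only moves at failure times.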

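• Let $\mu_T = E(T) = \int_0^{\infty} R(t)\, dt$ be the mean time to a failure event (or, the mean reliability or survival time). The estimated mean is

$$\hat{\mu}_T = \sum_{i=1}^{k} \hat{R}(t_{i-1})\,(t_i - t_{i-1})$$

where t_0 is defined to be 0 and R̂(t_0) = 1. This is the area under the step function R̂(t) from 0 to the largest failure time t_k.

• If the last observation is censored, then µ̂_T is biased and underestimates µ_T.

• The estimated variance of µ̂_T is

$$\widehat{\mathrm{Var}}(\hat{\mu}_T) = \frac{m}{m-1} \sum_{i=1}^{k} \frac{A_i^2\, d_i}{n_i\, s_i}$$

where $A_i = \sum_{j=i}^{k-1} \hat{R}(t_j)\,(t_{j+1} - t_j)$, $s_i = n_i - d_i$, and $m = \sum_{i=1}^{k} d_i$. Note that A_k = 0, so the last term contributes nothing.

• Thus, an approximate confidence interval for µ_T is

$$\hat{\mu}_T \pm z^{*} \sqrt{\widehat{\mathrm{Var}}(\hat{\mu}_T)}$$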
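A short Python sketch of the mean and its standard error follows, using the (t_i, n_i, d_i) values of the grinder example later in this section. It computes the mean as the area under the Kaplan-Meier step function from 0 to the largest failure time, and it assumes s_i = n_i − d_i in the variance formula. The function name `km_mean` is hypothetical.

```python
import math

# (t_i, n_i, d_i) at each distinct failure time of the grinder example
events = [(77.8, 19, 1), (88.7, 17, 1), (101.1, 16, 1), (105.9, 15, 1),
          (117.0, 14, 1), (126.9, 13, 1), (138.7, 12, 1), (148.9, 11, 1),
          (157.3, 9, 1), (163.8, 8, 1), (207.0, 4, 1), (217.4, 2, 1)]


def km_mean(events):
    """Return (estimated mean, standard error), restricted to the largest failure time."""
    times = [t for t, _, _ in events]
    k = len(events)

    # r_hat[i] is the Kaplan-Meier estimate just after the (i+1)-th failure time
    r_hat = []
    r = 1.0
    for _, n_i, d_i in events:
        r *= (n_i - d_i) / n_i
        r_hat.append(r)

    # pieces[j] = R-hat(t_j) * (t_{j+1} - t_j): area between consecutive failure times
    pieces = [r_hat[j] * (times[j + 1] - times[j]) for j in range(k - 1)]

    # R-hat is 1 on [0, t_1), so the first strip has height 1 and width t_1
    mean = 1.0 * times[0] + sum(pieces)

    m = sum(d_i for _, _, d_i in events)
    total = 0.0
    # The last failure time is skipped because its A_i is an empty sum (zero)
    for i, (_, n_i, d_i) in enumerate(events[:-1]):
        a_i = sum(pieces[i:])
        total += a_i ** 2 * d_i / (n_i * (n_i - d_i))  # assumes s_i = n_i - d_i
    variance = m / (m - 1) * total
    return mean, math.sqrt(variance)


mean, se = km_mean(events)
print(f"mean = {mean:.3f}, standard error = {se:.3f}")
```

Under these assumptions the result agrees with the mean (163.177) and standard error (12.355) reported in the SAS output for the grinder example.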


The following example contains data from a product lifetime study of industrial grinders. Twenty grinders were tested, and the time each grinder failed or the time it was removed due to censoring was recorded. A + indicates a censoring time.

  i   t_i      n_i   d_i   (n_i − d_i)/n_i   ∏_{k=0}^{i} (n_k − d_k)/n_k
  0   −∞       20    0     1                 20/20 = 1
  1   42.1+    20    —     —                 —
  2   77.8     19    1     18/19             0.9474
  3   83.3+    18    —     —                 —
  4   88.7     17    1     16/17             0.8916
  5   101.1    16    1     15/16             0.8359
  6   105.9    15    1     14/15             0.7802
  7   117.0    14    1     13/14             0.7245
  8   126.9    13    1     12/13             0.6687
  9   138.7    12    1     11/12             0.6130
 10   148.9    11    1     10/11             0.5573
 11   151.3+   10    —     —                 —
 12   157.3     9    1     8/9               0.4954
 13   163.8     8    1     7/8               0.4334
 14   177.2+    7    —     —                 —
 15   194.3+    6    —     —                 —
 16   195.6+    5    —     —                 —
 17   207.0     4    1     3/4               0.3251
 18   215.3+    3    —     —                 —
 19   217.4     2    1     1/2               0.1625
 20   258.8+    1    —     —                 —


Plot of Kaplan-Meier R̂(t) for the Grinder Data


SAS output for Grinder Example

RELIABILITY STUDY OF GRINDER LIFETIMES

The LIFETEST Procedure

Product-Limit Survival Estimates

                                     Survival
                                     Standard    Number    Number
 FAILTIME     Survival    Failure    Error       Failed    Left

    0.000     1.0000      0          0            0        20
   42.100*    .           .          .            0        19
   77.800     0.9474      0.0526     0.0512       1        18
   83.300*    .           .          .            1        17
   88.700     0.8916      0.1084     0.0724       2        16
  101.100     0.8359      0.1641     0.0867       3        15
  105.900     0.7802      0.2198     0.0972       4        14
  117.000     0.7245      0.2755     0.1050       5        13
  126.900     0.6687      0.3313     0.1108       6        12
  138.700     0.6130      0.3870     0.1147       7        11
  148.900     0.5573      0.4427     0.1170       8        10
  151.300*    .           .          .            8         9
  157.300     0.4954      0.5046     0.1193       9         8
  163.800     0.4334      0.5666     0.1194      10         7
  177.200*    .           .          .           10         6
  194.300*    .           .          .           10         5
  195.600*    .           .          .           10         4
  207.000     0.3251      0.6749     0.1297      11         3
  215.300*    .           .          .           11         2
  217.400     0.1625      0.8375     0.1320      12         1
  258.800*    .           .          .           12         0

NOTE: The marked survival times are censored observations.

Summary Statistics for Time Variable FAILTIME

Quartile Estimates

             Point       95% Confidence Interval
 Percent     Estimate    Transform    [Lower      Upper)

    75       217.400     LOGLOG       163.800     .
    50       157.300     LOGLOG       117.000     217.400
    25       117.000     LOGLOG        77.800     148.900

Mean Standard Error

163.177 12.355

NOTE: The mean survival time and its standard error were underestimated because the largest observation was censored and the estimation was restricted to the largest event time.

Summary of the Number of Censored and Uncensored Values

                                  Percent
 Total    Failed    Censored     Censored

    20        12           8        40.00


RELIABILITY STUDY OF GRINDER LIFETIMES

95% CONFIDENCE INTERVALS (Lower,Upper)

 Obs    FAILTIME    _CENSOR_    SURVIVAL    Lower      Upper

   1       0.0         .        1.00000     1.00000    1.00000
   2      42.1         1        1.00000     .          .
   3      77.8         0        0.94737     0.68119    0.99241
   4      83.3         1        0.94737     .          .
   5      88.7         0        0.89164     0.63146    0.97179
   6     101.1         0        0.83591     0.57266    0.94400
   7     105.9         0        0.78019     0.51479    0.91138
   8     117.0         0        0.72446     0.45914    0.87505
   9     126.9         0        0.66873     0.40592    0.83563
  10     138.7         0        0.61300     0.35510    0.79349
  11     148.9         0        0.55728     0.30664    0.74886
  12     151.3         1        0.55728     .          .
  13     157.3         0        0.49536     0.25274    0.69852
  14     163.8         0        0.43344     0.20302    0.64511
  15     177.2         1        0.43344     .          .
  16     194.3         1        0.43344     .          .
  17     195.6         1        0.43344     .          .
  18     207.0         0        0.32508     0.10502    0.57104
  19     215.3         1        0.32508     .          .
  20     217.4         0        0.16254     0.01275    0.46920
  21     258.8         1        .           .          .

SAS code for Grinder Example

**************************************;
*** Multiply censored data example ***;
**************************************;

DATA EXAMPLE2;
  DO ITEM = 1 TO 20;
    INPUT FAILTIME STATUS $ @@;
    * CENSORED = 1 for a censoring time (STATUS = Y), 0 for a failure;
    CENSORED = (STATUS='Y');
    OUTPUT;
  END;
  LABEL FAILTIME = 'TIME TO FAILURE IN HOURS';
  TITLE 'RELIABILITY STUDY OF GRINDER LIFETIMES';
  CARDS;
42.1 Y 77.8 N 83.3 Y 88.7 N 101.1 N
105.9 N 117.0 N 126.9 N 138.7 N 148.9 N
151.3 Y 157.3 N 163.8 N 177.2 Y 194.3 Y
195.6 Y 207.0 N 215.3 Y 217.4 N 258.8 Y
;

* The value 1 of CENSORED marks censored observations;
PROC LIFETEST DATA=EXAMPLE2 PLOTS=(S) OUTSURV=SURVIVE;
  TIME FAILTIME*CENSORED(1);

PROC PRINT DATA=SURVIVE;
RUN;


10.3 Using SAS Proc Reliability and Proc Lifetest


10.4 Plotting Methods for Exponential and Weibull Distributions

Plot to check for an exponential distribution:

Plot to check for a Weibull distribution:
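A common way to build such plots, offered here as an assumption since the axes are not spelled out above, uses the estimated reliability R̂(t). If T is exponential with R(t) = exp(−λt), then −ln R(t) = λt, so the points (t_i, −ln R̂(t_i)) should fall near a line through the origin. If T is Weibull with R(t) = exp(−(t/α)^β), then ln(−ln R(t)) = β ln t − β ln α, so the points (ln t_i, ln(−ln R̂(t_i))) should fall near a line with slope β. The Python sketch below only computes the plotting coordinates; the function names are hypothetical, and the input values are the first few Kaplan-Meier estimates from the grinder SAS output.

```python
import math

# (t_i, R-hat(t_i)) pairs: first four failure times from the grinder SAS output
km = [(77.8, 0.9474), (88.7, 0.8916), (101.1, 0.8359), (105.9, 0.7802)]


def exponential_plot_points(km):
    """Points (t, -ln R-hat): roughly a line through the origin if T is exponential."""
    return [(t, -math.log(r)) for t, r in km if 0.0 < r < 1.0]


def weibull_plot_points(km):
    """Points (ln t, ln(-ln R-hat)): roughly a straight line if T is Weibull."""
    return [(math.log(t), math.log(-math.log(r)))
            for t, r in km if t > 0.0 and 0.0 < r < 1.0]


exp_points = exponential_plot_points(km)
wei_points = weibull_plot_points(km)
for (t, y), (x, w) in zip(exp_points, wei_points):
    print(f"t={t:6.1f}  -ln R={y:.4f}   ln t={x:.4f}  ln(-ln R)={w:.4f}")
```

Estimates equal to 0 or 1 are dropped because the logarithms are undefined there.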

Example: Appliance Cycle Data Set: The following table shows the cycles (number of times used) to failure of a component in a small appliance. The engineering group wanted to estimate the percentage failing during the warranty (500 cycles) and the median life. For the 54 tested appliances, failure times are unmarked, while censored times are indicated by +.


Appliance Cycle Data Example

10.3.1 Example 1

[Figure: Check for Exponential Distribution; Check for Weibull Distribution]