lab session assessing the fit of cox models

Lab Session: Assessing Model Fit for Cox Models

Measuring model fit in Cox models is not analogous to measuring it in multiple regression. The term "residuals" also has a different meaning here, even though technically these residuals are still differences between two quantities.

The most straightforward approaches to assessing model fit involve graphing Cox-Snell residuals and computing concordance measures. Other approaches are discussed in the Cleves and Singer & Willett textbooks. Here we use the rearrest dataset that we used earlier for testing the proportional hazards (PH) assumption.

Cox-Snell residual analysis of model fit

We compute Cox-Snell residuals with the post-estimation command predict:

. quietly stcox personal property cage

. predict csn, csnell
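For reference, here is the textbook definition of the Cox-Snell residual for subject i. It uses observed time t_i, covariates x_i, estimated coefficients β̂, and estimated baseline cumulative hazard Ĥ_0 (the notation is ours, not Stata's).

If the model is correct, the true cumulative hazard evaluated at a subject's survival time, H_i(T_i), follows a unit exponential distribution, whose cumulative hazard function is H(s) = s. The second line shows why, assuming H_i is continuous and strictly increasing.

```latex
r_i^{CS} = \hat{H}_0(t_i)\,\exp\left(x_i'\hat{\beta}\right)

\Pr\{H_i(T_i) > s\} = \Pr\{T_i > H_i^{-1}(s)\}
  = \exp\left\{-H_i\left(H_i^{-1}(s)\right)\right\} = e^{-s}, \qquad s \ge 0
```

This is the basis for both graphical checks in this section: the residuals should look exponentially distributed, and their estimated cumulative hazard should lie near the 45-degree line.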

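If the model fits the data reasonably well, the Cox-Snell residuals should behave like a sample from a unit exponential distribution, so a kernel density graph of csn should look approximately exponential. This is only a rough check, because the residuals of censored observations are themselves censored. The graph below looks acceptable.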

. kdensity csn

[Figure: Kernel density estimate of the Cox-Snell residuals. x-axis: Cox-Snell residual; y-axis: Density. kernel = epanechnikov, bandwidth = 0.1419]
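To see why an exponential shape is the benchmark, the Python sketch below simulates the property the check relies on. It draws uncensored Weibull survival times and evaluates the true cumulative hazard at each time. The result should behave like a unit exponential sample: mean 1, with a share of about exp(-1) ≈ 0.368 above 1.

The Weibull choice, the parameter values and the absence of censoring are illustrative assumptions. The true H stands in for a correctly specified fitted model, so this is not a reproduction of predict, csnell.

```python
import math
import random


def simulated_cox_snell(n, scale, shape, seed=1):
    """Draw Weibull survival times and return the true H(T) for each one.

    random.weibullvariate(scale, shape) has CDF 1 - exp(-(t/scale)**shape),
    so its cumulative hazard is H(t) = (t / scale) ** shape.
    No censoring is simulated (an illustrative assumption).
    """
    rng = random.Random(seed)
    times = [rng.weibullvariate(scale, shape) for _ in range(n)]
    return [(t / scale) ** shape for t in times]


residuals = simulated_cox_snell(n=200_000, scale=12.0, shape=1.5)
mean_r = sum(residuals) / len(residuals)
share_above_1 = sum(r > 1.0 for r in residuals) / len(residuals)

# A unit exponential has mean 1 and P(R > 1) = exp(-1).
print(f"mean = {mean_r:.3f}, P(R > 1) = {share_above_1:.3f}, exp(-1) = {math.exp(-1):.3f}")
```

Changing the scale or shape leaves the result essentially unchanged: whatever the survival distribution, its own cumulative hazard maps it to a unit exponential.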


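If the model fits, the cumulative hazard of the CS residuals, estimated nonparametrically, should hug a 45-degree straight line when plotted against the residuals themselves. Here we use the Nelson-Aalen estimator of the cumulative hazard rather than one derived from the Kaplan-Meier estimate. To get it, we declare the CS residuals as the analysis time with stset, keeping the original failure indicator. Then sts gen produces the estimated cumulative hazard function of the CS residuals.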

. stset csn, fail(event)

     failure event:  event != 0 & event < .
obs. time interval:  (0, csn]
 exit on or before:  failure

------------------------------------------------------------------------------
      194  total obs.
        0  exclusions
------------------------------------------------------------------------------
      194  obs. remaining, representing
      106  failures in single record/single failure data
      106  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  2.634187
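The stset output above reports a total analysis time at risk of 106, the same as the 106 failures. Because csn is the analysis time, the total time at risk is simply the sum of the residuals.

When the baseline cumulative hazard is estimated with the Breslow estimator, the Cox-Snell residuals always sum to the number of failures. The match is therefore a built-in sanity check, not evidence of good fit. We assume predict, csnell uses a Breslow-type estimate here, and the matching totals are consistent with that.

The notation is as follows:
- η_i = x_i'β̂
- δ_i is the failure indicator
- d(t) is the number of failures at a failure time t
- R(t) = {j : t_j ≥ t} is the risk set

The sums over t run over the distinct failure times.

```latex
d\hat{H}_0(t) = \frac{d(t)}{\sum_{j \in R(t)} e^{\eta_j}},
\qquad
r_i^{CS} = e^{\eta_i}\,\hat{H}_0(t_i) = e^{\eta_i} \sum_{t \le t_i} d\hat{H}_0(t)

\sum_i r_i^{CS}
  = \sum_t d\hat{H}_0(t) \sum_{i \in R(t)} e^{\eta_i}
  = \sum_t d(t)
  = \sum_i \delta_i
```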

. sts gen H=na
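The sts gen H=na step computes the Nelson-Aalen estimator, treating csn as the time variable. Below is a minimal Python sketch of that estimator, assuming its standard form. At each distinct failure time t_j it adds d_j / n_j, where d_j is the number of failures at t_j and n_j the number still at risk. It also accumulates d_j / n_j^2, one common variance estimate.

Stata stores, for each observation, the estimate at that observation's own analysis time; the sketch just tabulates the step values. The function name and the toy data are hypothetical.

```python
from collections import Counter


def nelson_aalen(times, failed):
    """Nelson-Aalen cumulative hazard at each distinct failure time.

    times:  analysis times (here they would be the Cox-Snell residuals)
    failed: True if the observation ended in failure, False if censored
    Returns (t_j, H_hat(t_j), var_hat(t_j)) tuples; H_hat adds d_j / n_j
    and var_hat adds d_j / n_j**2 at each failure time.
    """
    deaths = Counter(t for t, f in zip(times, failed) if f)
    h_hat = 0.0
    var_hat = 0.0
    table = []
    for t in sorted(deaths):
        # Observations censored exactly at t still count as at risk at t.
        n_at_risk = sum(1 for s in times if s >= t)
        d = deaths[t]
        h_hat += d / n_at_risk
        var_hat += d / n_at_risk**2
        table.append((t, h_hat, var_hat))
    return table


# Made-up toy values, not the rearrest data.
resid = [0.2, 0.5, 0.5, 0.9, 1.4, 2.0]
event = [True, True, False, True, False, True]
na_table = nelson_aalen(resid, event)
for t, h, v in na_table:
    print(f"t = {t:.1f}   H = {h:.4f}   var = {v:.4f}")
```

In the toy output the variance jumps at the last failure time because only one observation is still at risk there. A shrinking risk set is the same reason the residual plot becomes noisy at large values.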


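We can graph the estimated cumulative hazard H against the CS residuals, together with a 45-degree reference line given by the residual values themselves. This is accomplished by listing the CS residual variable twice in the graphing command below. Because twoway line treats the last variable as the x variable, it plots H against csn and csn against csn (the line y = x).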


. twoway line H csn csn, sort

[Figure: Nelson-Aalen cumulative hazard of the Cox-Snell residuals and the 45-degree reference line, both plotted against the Cox-Snell residual; both axes run from 0 to 2.5]

The estimated cumulative hazard hugs the 45-degree line reasonably well until the larger residual values. Departures there are not unusual. CS residuals are defined through the estimated cumulative hazard, so their magnitude tends to increase with the value of the time duration variable. At high values of time, fewer cases are still at risk for the event, so the estimate has more sampling variation.

Harrell’s C Concordance Statistic

While I omit the details of how it is computed, Harrell's concordance statistic measures the agreement between the model's predictions and the observed failure order (which persons were rearrested sooner than others). It is the proportion of subject pairs, among those whose order can be compared, in which the predictions and the observed failure order are concordant.

Suppose you compare subjects 1 and 2 and find that subject 1 failed in a shorter time than subject 2. Then you compare the model predictions about expected survival. If the model predicts a longer expected survival time for subject 2 than for subject 1, the pair is concordant.

Theoretically C ranges from 0 to 1, but predictions no better than flipping a coin would produce a value of about 0.5. Hence model fit is assessed by the degree to which Harrell's C exceeds 0.50.
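The pairwise comparison just described can be sketched in Python. This is a minimal illustration, not Stata's implementation. It assumes the usual rule that a pair is usable only when the subject with the shorter observed time failed, and it scores a usable pair with tied predictions as 1/2, matching the (E + T/2) / P formula in the output below. Skipping pairs with tied observed times is our simplifying assumption. The function name and toy data are hypothetical.

```python
def harrell_c(times, failed, risk_scores):
    """Harrell's C plus the counts E, T and P that estat concordance reports.

    risk_scores: larger value = higher predicted hazard = shorter predicted
    survival (for example the linear predictor x'b from a Cox model).
    A pair is usable only if the subject with the shorter observed time
    failed; if that subject was censored, the true order is unknown.
    Simplifying assumption: pairs with tied observed times are skipped.
    """
    expected = 0   # E: usable pairs ordered as the model predicts
    tied_pred = 0  # T: usable pairs with tied predictions
    usable = 0     # P: number of usable comparison pairs
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue
            first, later = (i, j) if times[i] < times[j] else (j, i)
            if not failed[first]:
                continue
            usable += 1
            if risk_scores[first] > risk_scores[later]:
                expected += 1
            elif risk_scores[first] == risk_scores[later]:
                tied_pred += 1
    c = (expected + tied_pred / 2) / usable
    return c, expected, tied_pred, usable


# Made-up toy data: four subjects, the third one censored.
c, e, t, p = harrell_c(
    times=[2, 4, 6, 8],
    failed=[True, True, False, True],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
print(f"E = {e}, T = {t}, P = {p}, C = {c:.3f}, Somers' D = {2 * c - 1:.3f}")
```

Here the pair of subjects 2 and 3 is discordant: subject 2 failed first but has the lower risk score. The pair of subjects 3 and 4 is not usable because subject 3's shorter time is censored, leaving 4 of 5 usable pairs concordant.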

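The analysis time must first be reset, because the stset csn step above redefined it as the Cox-Snell residual. The stcox header below shows the original setting: failure variable event and analysis time months.

. quietly stset months, failure(event)

. stcox personal property cage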

         failure _d:  event
   analysis time _t:  months

Iteration 0: log likelihood = -494.7453


Iteration 1:   log likelihood = -477.21953
Iteration 2:   log likelihood = -475.33103
Iteration 3:   log likelihood = -475.29173
Iteration 4:   log likelihood = -475.29169
Refining estimates:
Iteration 0:   log likelihood = -475.29169

Cox regression -- Breslow method for ties

No. of subjects =          194                  Number of obs    =       194
No. of failures =          106
Time at risk    =  2678.455851
                                                LR chi2(3)       =     38.91
Log likelihood  =   -475.29169                  Prob > chi2      =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    personal |   1.765921   .3623642     2.77   0.006     1.181153    2.640198
    property |   2.548234   .8941334     2.67   0.008     1.281052    5.068879
        cage |   .9355462    .015692    -3.97   0.000     .9052904     .966813
------------------------------------------------------------------------------

. estat concordance


Harrell's C concordance statistic

  Number of subjects (N)              =      194
  Number of comparison pairs (P)      =    12351
  Number of orderings as expected (E) =     8566
  Number of tied predictions (T)      =        0

        Harrell's C = (E + T/2) / P = .6935
                          Somers' D = .3871
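As a worked check of the estat concordance output above, plug the reported counts into the displayed formula. Somers' D is the rescaling D = 2C - 1, which maps C from [0, 1] onto [-1, 1]. The unrounded C is needed to reproduce the last digit of D.

```latex
C = \frac{E + T/2}{P} = \frac{8566 + 0/2}{12351} \approx 0.69355,
\qquad
D = 2C - 1 \approx 2(0.69355) - 1 \approx 0.3871
```

Note that P = 12351 is smaller than the 194 × 193 / 2 = 18721 possible pairs. Pairs whose failure order cannot be determined, chiefly those in which the shorter observed time is censored, are not counted as usable.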

A shortcoming of Harrell's concordance measure is that its value is sensitive to the amount of censoring in the sample data, and the degree of bias increases with the level of censoring. An alternative is Gönen and Heller's concordance coefficient, requested with the gh option of estat concordance. Its value is not affected by the level of censoring.

. estat concordance, gh


Gonen and Heller's K concordance statistic

Number of subjects (N) = 194

  Gonen and Heller's K = .6765
             Somers' D = .353
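Gönen and Heller's K is computed only from the model's linear predictors, which is why censoring does not enter its value. The flip side is that it relies on the fitted proportional hazards model being correct.

The Python sketch below implements the published formula as we understand it. Under PH, the probability that the higher-risk subject in a pair fails first is 1 / (1 + exp(-|η_i - η_j|)), and K averages this over all pairs. It is an illustration only; we have not checked that it reproduces Stata's .6765 on the rearrest data. The tie handling and the function name are assumptions.

```python
import math


def gonen_heller_k(eta):
    """Gonen and Heller's K from Cox linear predictors eta_i = x_i'b.

    Under proportional hazards, in a pair with eta_i > eta_j the model-implied
    probability that subject i fails first is 1 / (1 + exp(-(eta_i - eta_j))).
    K averages that probability over all n(n-1)/2 pairs. Observed times and
    censoring indicators never appear. Pairs with tied predictors add 0,
    following the strict inequalities in the published formula (assumed
    tie handling; software may differ).
    """
    n = len(eta)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(eta[i] - eta[j])
            if gap > 0:
                total += 1.0 / (1.0 + math.exp(-gap))
    return 2.0 * total / (n * (n - 1))


# Made-up toy example: a hazard ratio of 3 between the two subjects.
k = gonen_heller_k([0.0, math.log(3.0)])
print(f"K = {k:.4f}, Somers' D = {2 * k - 1:.4f}")
```

With a hazard ratio of 3, the model says the higher-risk subject fails first with probability 3/4, so K is 0.75. The same Somers' D rescaling applies to the output above: 2(.6765) - 1 = .353.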
