Lab Session Assessing Model Fit for Cox Models
Assessing model fit for Cox models is not analogous to assessing fit in multiple regression. The term "residual" has a different meaning here, even though the residuals are still, technically, differences between two quantities.
The most straightforward approaches to assessing model fit involve graphing Cox-Snell residuals and computing concordance measures. Other approaches are discussed in the Cleves and the Singer & Willett textbooks. Here we use the rearrest dataset that we used earlier for testing the PH assumption.
Cox-Snell residual analysis of model fit
We compute Cox-Snell residuals with the post-estimation predict command:
. quietly stcox personal property cage
. predict csn, csnell
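For reference, a Cox-Snell residual is the model's estimated cumulative hazard evaluated at each subject's observed time. In standard textbook notation (the symbols are not from this handout and are introduced here only for illustration):

```latex
r_{C_i} \;=\; \hat{H}_0(t_i)\,\exp\!\left(x_i\hat{\beta}\right) \;=\; -\log \hat{S}(t_i \mid x_i)
```

If the model is correct, these residuals should behave like a censored sample from an exponential distribution with mean 1, whose cumulative hazard is H(r) = r. That is why the checks in this section compare the residuals with an exponential density and with a 45-degree line.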
If the model fits reasonably well, the Cox-Snell residuals should approximately follow an exponential distribution, so a kernel density (kdensity) graph of csn should look roughly like an exponential density. The graph below looks OK.
. kdensity csn
[Figure: Kernel density estimate of the Cox-Snell residuals. y-axis: Density; x-axis: Cox-Snell residual (0 to 3). kernel = epanechnikov, bandwidth = 0.1419]
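One way to make the visual check less subjective is to overlay the unit exponential density on the kernel density estimate. The following is a minimal sketch, assuming csn was created by the predict command above; the legend labels and the plotting range of 0 to 3 are illustrative choices, not part of the original lab.

```stata
* Overlay the unit exponential density exp(-x) on the kernel density of csn.
* Assumes csn already exists (predict csn, csnell).
* The /// line continuations require running this from a do-file.
twoway (kdensity csn) (function y = exp(-x), range(0 3)), ///
    legend(order(1 "Cox-Snell residuals" 2 "Unit exponential")) ///
    xtitle("Cox-Snell residual") ytitle("Density")
```

Treat the comparison as rough: residuals for censored cases are themselves censored, and kernel estimates tend to be biased near the boundary at zero.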
A graph of the Nelson-Aalen estimate of the cumulative hazard against the CS residuals should hug a 45-degree straight line. To produce it, we can use the survival-time commands to get an estimated cumulative hazard function based on the CS residuals, treating the residuals as the time variable:
. stset csn, fail(event)
     failure event:  event != 0 & event < .
obs. time interval:  (0, csn]
 exit on or before:  failure

------------------------------------------------------------------------------
        194  total obs.
          0  exclusions
------------------------------------------------------------------------------
        194  obs. remaining, representing
        106  failures in single record/single failure data
        106  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  2.634187
. sts gen H=na
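We can graph the cumulative hazard function based on the CS residuals together with a 45-degree reference line, which is simply the CS residuals plotted against themselves. This is accomplished by listing the CS residual variable twice in the graph command below; the last variable listed is the x-axis variable, and the variables before it are plotted against it: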
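. twoway line H csn csn, sort
[Figure: Nelson-Aalen cumulative hazard and the 45-degree reference line, both plotted against the Cox-Snell residual. Axes run from 0 to 2.5; legend: Nelson-Aalen cumulative hazard, Cox-Snell residual]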
The graph hugs the line well until we get to higher residual values. This is not unusual: because CS residuals are defined through the cumulative hazard, their magnitude tends to increase with the duration variable, and at long durations fewer cases remain at risk for the event, so there is more sampling variation.
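The Cox-Snell steps above can be collected into one short script. The sketch below is an assumption-laden stand-in: it uses the drugtr dataset that ships with Stata (not the rearrest data), with its variables drug, age, and died, so that it can be run without the course files.

```stata
* Self-contained sketch of the Cox-Snell check on a Stata example dataset.
* NOTE: drugtr is NOT the rearrest data; it is used only for illustration.
webuse drugtr, clear          // data arrive already stset (studytime, died)
quietly stcox drug age        // fit the Cox model
predict cs, csnell            // Cox-Snell residuals
stset cs, failure(died)       // re-declare the residuals as the time variable
sts generate H = na           // Nelson-Aalen cumulative hazard of the residuals
twoway line H cs cs, sort     // H against cs, plus cs against itself (45-degree line)
```

Because stset is re-declared on the residuals, re-run stset with the original time variable before fitting any further survival models.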
Harrell’s C Concordance Statistic
While I omit the details of how it is computed, Harrell’s concordance statistic measures the level of agreement between the model's predictions and the observed failure order (which persons were rearrested sooner than others). It is the proportion of comparable subject pairs in which the predictions and the observed failure order are concordant. Suppose you compare the failure order of subjects 1 and 2 and find that subject 1 failed sooner than subject 2. Then you compare the model's predictions of expected survival. If the model predicts a longer expected survival time for subject 2 than for subject 1, the pair is concordant. The theoretical minimum and maximum values are 0 and 1, but flipping a coin would produce a value of about 0.5. Hence model fit is assessed by the degree to which Harrell’s C exceeds 0.5.
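To make the pair counting concrete, here is a toy sketch with four invented subjects. The variable names time, event, and x, and all the data values, are hypothetical and exist only for this illustration.

```stata
* Toy data: four hypothetical subjects, the last one censored.
clear
input time event x
1 1 3
2 1 1
3 1 2
4 0 0
end
stset time, failure(event)
stcox x               // higher x should mean higher hazard in these data
estat concordance     // reports the pair counts and Harrell's C
```

Counting by hand, every pair is comparable here because the earlier time in each pair is an observed failure, giving 6 pairs. If the estimated coefficient on x is positive, the subject with the larger x is predicted to fail first, and that matches the observed order in 5 of the 6 pairs (the exception is the subjects with x = 1 and x = 2). A hand count therefore suggests a C of about 5/6.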
. stcox personal property cage
         failure _d:  event
   analysis time _t:  months

Iteration 0:   log likelihood =  -494.7453
Iteration 1:   log likelihood = -477.21953
Iteration 2:   log likelihood = -475.33103
Iteration 3:   log likelihood = -475.29173
Iteration 4:   log likelihood = -475.29169
Refining estimates:
Iteration 0:   log likelihood = -475.29169

Cox regression -- Breslow method for ties

No. of subjects =          194                  Number of obs    =         194
No. of failures =          106
Time at risk    =  2678.455851
                                                LR chi2(3)       =       38.91
Log likelihood  =   -475.29169                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    personal |   1.765921   .3623642     2.77   0.006     1.181153    2.640198
    property |   2.548234   .8941334     2.67   0.008     1.281052    5.068879
        cage |   .9355462    .015692    -3.97   0.000     .9052904     .966813
------------------------------------------------------------------------------
. estat concordance
         failure _d:  event
   analysis time _t:  months

  Harrell's C concordance statistic

    Number of subjects (N)              =      194
    Number of comparison pairs (P)      =    12351
    Number of orderings as expected (E) =     8566
    Number of tied predictions (T)      =        0

           Harrell's C = (E + T/2) / P =    .6935
                             Somers' D =    .3871
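The reported statistic can be reproduced from the pair counts in the output. The sketch below simply plugs those counts into the formula printed by Stata; the scalar names nE, nT, and nP are arbitrary choices made for this illustration.

```stata
* Values copied from the estat concordance output.
scalar nE = 8566      // orderings as expected (E)
scalar nT = 0         // tied predictions (T)
scalar nP = 12351     // comparison pairs (P)
display "Harrell's C = " %6.4f (nE + nT/2) / nP
```

The result should agree with the reported .6935 up to rounding and display format.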
A shortcoming of Harrell’s concordance measure is that its value is sensitive to the amount of censoring in the sample data: the degree of bias increases with the level of censoring. The estat concordance command has an option that instead reports Gönen and Heller's K concordance coefficient, whose value is not affected by the level of censoring.
. estat concordance, gh
         failure _d:  event
   analysis time _t:  months

  Gonen and Heller's K concordance statistic

    Number of subjects (N)              =      194

                  Gonen and Heller's K =    .6765
                             Somers' D =    .353
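In both sets of output, the Somers' D value is a rescaling of the concordance statistic onto a -1 to 1 scale, so that 0 rather than 0.5 corresponds to chance agreement. As a quick arithmetic check using the rounded values reported for this model:

```latex
D = 2C - 1:\qquad
D_{\text{Harrell}} \approx 2(0.6935) - 1 = 0.387,\qquad
D_{\text{GH}} = 2(0.6765) - 1 = 0.353
```

The small difference from the reported .3871 for Harrell's version comes from rounding C to four decimals before doubling.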