EHA Diagnostics
Sociology 229A: Event History Analysis
Class 5
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission

Announcements
• Class topics:
• Cox model: examining the baseline hazard
  – And hazard for various groups in your data
• Cox model diagnostics (part 1)
• Discussion of readings

Cox Model: Baseline Hazard
• Cox models involve a “baseline hazard”
• Note: baseline = when all covariates are zero
• Question: What does the baseline hazard look like?
  – Or the baseline survivor & integrated hazard?
  – Stata can estimate the baseline survivor, hazard, and integrated hazard. Two steps:
    • 1. You must ask Stata to save the info when you run the Cox model
      – Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
    • 2. Use the “stcurve” command to plot the curves
      – Ex: stcurve, hazard OR stcurve, survival
      – Note: by default, stcurve evaluates the curve at the means of the covariates; use at() to set covariates to zero (or other values)
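
For reference, the three baseline quantities can be written out. This is the standard Cox setup rather than anything specific to these slides; the symbols h_0, H_0, S_0, x and β are notation introduced here for illustration.

```latex
\[
\begin{aligned}
h(t \mid x) &= h_0(t)\,\exp(x\beta) && \text{hazard for covariate values } x \\
h(t \mid x = 0) &= h_0(t) && \text{baseline hazard (all covariates zero)} \\
H_0(t) &= \int_0^t h_0(u)\,du && \text{baseline integrated hazard} \\
S_0(t) &= \exp\{-H_0(t)\} && \text{baseline survivor function}
\end{aligned}
\]
```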

Cox Model: Baseline Hazard
• Baseline rate: Adoption of environmental law
[Figure: Smoothed hazard function from Cox proportional hazards regression. Y-axis: smoothed hazard function (0 to .08); x-axis: analysis time (1970 to 2000).]

Cox Model: Baseline Hazard
• Note: It may not always make sense to plot the baseline hazard
• Baseline shows hazard when X variables are zero
• Sometimes zero values aren’t very useful/interesting
  – Example: Does it make sense to plot hazard of countries adopting laws, if X vars = zero?
    • Hazard rate might be quite low
    • In some cases, you’ll just get a flat zero curve
      – Or extremely high values
  – Solutions:
    • 1. Rescale indep vars before running Cox model
    • 2. Use stcurve to choose relevant values of vars.
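
A minimal sketch of both solutions. It uses Stata's shipped cancer dataset as a stand-in for the lecture data, so the variables (studytime, died, age, drug) and the derived names age_c and treated are illustrative assumptions. It also assumes a recent Stata release, where baseline quantities come from predict after stcox rather than from an estimation option.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
stset studytime, failure(died)

* Hypothetical two-group indicator: 1 = active drug, 0 = placebo
generate byte treated = drug > 1

* Solution 1: center age so that "zero" means "average age"
summarize age, meanonly
generate age_c = age - r(mean)

stcox age_c treated, nohr

* Baseline hazard now refers to an average-age placebo patient
predict h0, basehc

* Solution 2: ask stcurve for specific covariate values
stcurve, hazard at(age_c=0 treated=0)
```

After centering, the baseline describes a case that actually exists in the data, so the plotted curve is easier to interpret.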

Cox Model: Estimated Hazards
• You can also use stcurve to plot estimated hazard rates based on values of indep vars
• Ex: What is the hazard curve if democracy = 1, 5, 10?
• Strategy: use the “at” option:
  – stcurve , hazard at(democ=1) at2(democ=10)
  – NOTE: All other variables are pegged at the mean…
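
One thing to keep in mind when reading these plots: the curves come from the fitted Cox model, so they are proportional by construction. A short worked equation, using notation introduced here for illustration (h_0 for the baseline hazard, β_democ for the democracy coefficient, x-bar for the other covariates held at their means):

```latex
\[
\frac{h(t \mid \text{democ}=10,\ \bar{x})}{h(t \mid \text{democ}=1,\ \bar{x})}
= \frac{h_0(t)\,\exp(10\,\beta_{\text{democ}} + \bar{x}\beta)}
       {h_0(t)\,\exp(1\,\beta_{\text{democ}} + \bar{x}\beta)}
= \exp(9\,\beta_{\text{democ}})
\]
```

The ratio does not depend on t, so these plots show the size of an effect but cannot reveal a violation of proportional hazards.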

Cox: Estimated Hazard Rate
• Hazard rate for adoption of environmental law
[Figure: Smoothed hazard functions from Cox proportional hazards regression, plotted for democracy=1 and democracy=10. Y-axis: smoothed hazard function (0 to .8); x-axis: analysis time (1970 to 2000).]

Cox Model Diagnostics
• Issues that you must deal with:
• 1. How to estimate results with “ties” in your data
  – Ties = cases that fail at the exact same time
• 2. How to identify violations of the proportional hazard assumption
• 3. Dealing with outliers/influential cases
• 4. Assessing model fit
  – Most of this also applies to parametric models
    • Ties are not a concern there
    • But, additional issues come up: choosing the right functional form (shape) to model the hazard.

Cox Model Issues: Ties
• How to handle ties in data
• It is mathematically complex to estimate models when there are tied failures
  – That is: two cases that have events at the exact same time
• Several mathematical approaches:
  – Breslow approximation – simplest approach
    • Stata default, but not the best choice!
  – Efron approximation – generally better
    • More computationally intensive, but given the power of modern computers it is not an issue
    • stcox var1 var2 var3, efron

Cox Model Issues: Ties
  – Exact marginal – “continuous time approximation”
  – Box-Steffensmeier & Jones: “Averaged Likelihood”
    • Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
  – Exact partial – “discrete”
  – Box-Steffensmeier & Jones: “exact discrete method”
    • Assumes ties happened EXACTLY at the same time
  – Advice:
    • Use Efron at a minimum
    • Exact methods are often more accurate
      – Exact marginal often makes most sense… events rarely occur at the EXACT same time… unless you have discrete data
      – But, exact methods can take a LONG time.
      – For big datasets with many ties, Efron is OK.
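
One way to see how much the choice matters is to fit the same model under each method and line up the results. The sketch below uses Stata's shipped cancer dataset as a stand-in for the lecture data (survival time is recorded in months, so it contains ties); the stored-estimate names are arbitrary.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
stset studytime, failure(died)

* Breslow approximation (Stata's default)
stcox age i.drug, nohr breslow
estimates store breslow

* Efron approximation
stcox age i.drug, nohr efron
estimates store efron

* Exact marginal ("averaged likelihood")
stcox age i.drug, nohr exactm
estimates store exactm

* Exact partial ("exact discrete")
stcox age i.drug, nohr exactp
estimates store exactp

* Coefficients and standard errors side by side
estimates table breslow efron exactm exactp, b(%9.4f) se(%9.4f)
```

If the columns are nearly identical, ties are not driving the results. If Breslow stands apart from the others, that is a reason to prefer Efron or an exact method.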

Proportional Hazard Assumption
• Key assumption: Proportional hazards
• Hazards for different groups are proportional to each other over time
• i.e., Estimates of a hazard ratio do NOT vary over time
  – Example: Effect of “abstinence” program on sexual behavior
    • Issue: Do abstinence programs lower the rate in a consistent manner across time?
      – Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
  – Groups are assumed to have “parallel” hazards
    • Rather than rates that diverge, converge (or cross).
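
The assumption can be stated in one line. The notation below (h_0 for the baseline hazard, x_1 and x_0 for the covariate values of two groups, β for the coefficients) is introduced here for illustration.

```latex
\[
\frac{h(t \mid x_1)}{h(t \mid x_0)}
= \frac{h_0(t)\,\exp(x_1\beta)}{h_0(t)\,\exp(x_0\beta)}
= \exp\{(x_1 - x_0)\beta\}
\qquad\Longrightarrow\qquad
\ln h(t \mid x_1) - \ln h(t \mid x_0) = (x_1 - x_0)\beta
\]
```

The baseline hazard cancels, so the ratio contains no t. "Parallel" therefore holds exactly on the log scale: the log hazards of the two groups stay a constant distance apart.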

Proportional Hazard Assumption
• Strategies:
• 1. Visually examine raw hazard plots for sub-groups in your data
  • Watch for non-parallel trends
  • A crude method… not the best approach… but often identifies big violations

Proportional Hazard Assumption
• Visual examination of raw hazard rate
[Figure: Smoothed hazard estimates, by west (west = 0 vs. west = 1). Y-axis: 0 to .15; x-axis: analysis time (1970 to 2000).]
• You want them to change proportionally
• If one doubles, so does the other…

Proportional Hazard Assumption
• 2. Plot -ln(-ln(survival)) versus ln(time) across values of X variables
  • What Stata calls “stphplot”
  • Parallel lines indicate proportional hazards
  • Again, convergence and divergence (or crossing) indicates violation
  – A less-common approach: compare the observed survivor plot to predicted values (for different values of X)
    • What Stata calls “stcoxkm”
    • If observed values are similar to predicted values, the assumption is not likely to be violated.
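
The graphical checks discussed so far can be run back to back. This sketch uses Stata's shipped cancer dataset as a stand-in for the lecture data; the grouping variable treated is a hypothetical name created here, playing the role that west plays in the lecture example.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
stset studytime, failure(died)

* Hypothetical two-group indicator: 1 = active drug, 0 = placebo
generate byte treated = drug > 1

* 1. Raw smoothed hazard estimates for each group
sts graph, hazard by(treated)

* 2. -ln(-ln(survival)) versus ln(time): look for parallel lines
stphplot, by(treated)

* 3. Observed Kaplan-Meier curves versus Cox-predicted curves
stcoxkm, by(treated)
```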

Proportional Hazard Assumption
• -ln(-ln(survivor)) vs. ln(time) – “stphplot”
[Figure: stphplot output for west = 0 and west = 1. Y-axis: -ln[-ln(Survival Probability)] (-1 to 4); x-axis: ln(analysis time) (7.585 to 7.605).]
• Parallel = good
• Convergence suggests violation of proportional hazard assumption
• (But, I’ve seen worse!)
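
Why parallel lines are the thing to look for: under the Cox model the transformed survivor curves of two groups differ only by a constant. A short derivation, using notation introduced here for illustration (S_0 and H_0 for the baseline survivor and integrated hazard, x for covariates, β for coefficients):

```latex
\[
\begin{aligned}
S(t \mid x) &= S_0(t)^{\exp(x\beta)} \\
-\ln S(t \mid x) &= H_0(t)\,\exp(x\beta) \\
-\ln\{-\ln S(t \mid x)\} &= -\ln H_0(t) - x\beta
\end{aligned}
\]
```

The first term is shared by every group and only the second depends on x. For two groups with covariates x_1 and x_0, the vertical gap between the curves is (x_1 - x_0)β at every value of ln(time), provided β does not change over time.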

Proportional Hazard Assumption
• Cox estimate vs. observed KM – “stcoxkm”
[Figure: Observed (Kaplan-Meier) and Cox-predicted survival curves for west = 0 and west = 1. Y-axis: survival probability (0.00 to 1.00); x-axis: analysis time (1970 to 2000).]
• Predicted differs from observed for countries in West

Proportional Hazard Assumption
• 3. Piecewise Models
• Piecewise = break model up into pieces (by time)
  – Ex: Split analysis into “early” vs “late” time
• If coefficients vary in different time periods, hazards are not proportional
  – Example:
    • stcox var1 var2 var3 if _t < 10
    • stcox var1 var2 var3 if _t >= 10
    • Look for large changes in coefficients!

Proportional Hazard Assumption
• In a piecewise model, coefficients would differ in non-proportional models
[Figure: two panels, “Proportional” and “Non-Proportional”, each divided into an Early and a Late period.]
• Proportional: Here, the effect is the same in both time periods
• Non-Proportional: Here, the effect is negative in the early period and positive in the late period

Piecewise Models
• Look at coefficients at 2 (or more) spans of time

EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr

| _t | Coef. | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| gdp | .4465818 | .4255587 | 1.05 | 0.294 | -.3874979 | 1.280661 |
| degradation | -.282548 | .1572746 | -1.80 | 0.072 | -.5908005 | .0257045 |
| education | -.0195118 | .0328195 | -0.59 | 0.552 | -.0838368 | .0448131 |
| democracy | .2295673 | .2625205 | 0.87 | 0.382 | -.2849634 | .744098 |
| ngo | .6792462 | .3110294 | 2.18 | 0.029 | .0696399 | 1.288853 |
| ingo | .6664661 | .4804229 | 1.39 | 0.165 | -.2751456 | 1.608078 |

LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr

| _t | Coef. | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| gdp | .4963942 | .357739 | 1.39 | 0.165 | -.2047613 | 1.19755 |
| degradation | -.5702894 | .2395257 | -2.38 | 0.017 | -1.039751 | -.1008277 |
| education | .0142118 | .0143762 | 0.99 | 0.323 | -.0139649 | .0423886 |
| democracy | .2541799 | .0981386 | 2.59 | 0.010 | .0618317 | .4465281 |
| ngo | .1742862 | .1448187 | 1.20 | 0.229 | -.1095532 | .4581256 |
| ingo | -.1134661 | .2104308 | -0.54 | 0.590 | -.5259028 | .2989707 |

Note: Effect of ngo is larger in early period

Proportional Hazard Assumption
• 4. Tests based on re-estimating model
• Try including time interactions in your model
• Recall: Interactions – effect of A on C varies with B
• If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
  – Recall example: Abstinence programs
    • Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on

Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels, “No time interaction” and “Positive time interaction”.]
• In non-proportional case, the effect of abstinence programs varies across time

Proportional Hazard Assumption
• Strategy: Create variables that reflect the interaction of X variables with time
• Significant effects of time interactions indicate non-proportional hazard
• Fortunately, inclusion of the interaction term in the model corrects the problem.
• Issue: X variables can interact with time in multiple ways…
  – Linearly
  – With “log time” or time squared
  – With time dummies
  – You may have to try a range of things…
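
A minimal sketch of time interactions using stcox's tvc() and texp() options, which build the interaction with analysis time inside the estimator. Stata's shipped cancer dataset stands in for the lecture data, and age is an arbitrary choice of variable to interact with time.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
stset studytime, failure(died)

* Linear time interaction: adds age * _t to the model
stcox age i.drug, nohr tvc(age) texp(_t)

* Log-time interaction: adds age * ln(_t) instead
stcox age i.drug, nohr tvc(age) texp(ln(_t))
```

The coefficient reported in the tvc equation is the interaction; a significant estimate suggests that the effect of age is not proportional. With one record per case, multiplying a variable by _t using generate would not give the same thing, because _t would then be fixed at each case's final time.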

Proportional Hazard Assumption
• Red = Abstinence group; green = control
[Figure: two panels illustrating different forms of time interaction.]
• Linear time interaction: Effect grows consistently over time. Try “Abstinence*time”
• Interaction with time-period: Effect differs early vs. late. Try “Abstinence*DLate”

Proportional Hazard Assumption
• 5. Grambsch & Therneau test
  – Ex: Stata “estat phtest”
• Test for non-zero slope of Schoenfeld residuals vs time
  – A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
• Can be applied to the general model, or for each variable

. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest

Test of proportional hazards assumption
Time: Time

| | chi2 | df | Prob>chi2 |
|---|---|---|---|
| global test | 18.14 | 6 | 0.0059 |

Significant chi-square indicates violation of proportional hazard assumption
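
The logic of the test can be written compactly. The notation is introduced here for illustration: β_j(t) is the possibly time-varying coefficient of variable j, g(t) is the chosen function of time, and s*_kj is the scaled Schoenfeld residual for variable j at the k-th failure time t_k.

```latex
\[
\begin{aligned}
\beta_j(t) &= \beta_j + \theta_j\, g(t) && \text{alternative: coefficient drifts with } g(t) \\
H_0 &: \theta_j = 0 && \text{proportional hazards for variable } j \\
E\!\left[s^{*}_{kj}\right] + \hat{\beta}_j &\approx \beta_j(t_k) && \text{why residuals reveal the drift}
\end{aligned}
\]
```

Because the scaled residuals track β_j(t) around the fitted coefficient, regressing them on g(t) and testing for a zero slope amounts to testing θ_j = 0.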

Proportional Hazard Assumption
• Variable-by-variable test “estat phtest”:

. estat phtest, detail

Test of proportional hazards assumption
Time: Time

| | rho | chi2 | df | Prob>chi2 |
|---|---|---|---|---|
| gdp | 0.09035 | 0.63 | 1 | 0.4277 |
| degradation | -0.22735 | 3.41 | 1 | 0.0646 |
| education | 0.06915 | 0.47 | 1 | 0.4950 |
| democracy | -0.04929 | 0.20 | 1 | 0.6560 |
| ngo | -0.18691 | 4.56 | 1 | 0.0327 |
| ingo | -0.03759 | 0.34 | 1 | 0.5609 |
| global test | | 18.14 | 6 | 0.0059 |

Note: Certain variables are especially problematic… here ngo (p = 0.033) and, marginally, degradation (p = 0.065)

Proportional Hazard Assumption
• Notes on estat phtest:
  – 1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
  – And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
  – 2. Test is based on identifying non-zero time trend… but how should we characterize time?
    • Options: normal/linear time, log time, time dummies, etc.
      – Results may differ depending on your choice
      – Ex: estat phtest, log – specifies “log time”
    • Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
      – Linear trend (not a curve) indicates that time is characterized OK
      – Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
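
The full sequence, as a sketch. Stata's shipped cancer dataset stands in for the lecture data. The sketch also assumes a recent Stata release, where estat phtest computes the residuals it needs on its own; in older releases the schoenfeld() and scaledsch() options must be added to the stcox command, as in the lecture example.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
stset studytime, failure(died)

stcox age i.drug, nohr efron

* Global test, time measured linearly
estat phtest

* Variable-by-variable tests
estat phtest, detail

* Same tests with time characterized as log time
estat phtest, log detail

* Smoothed scaled Schoenfeld residuals for one variable
estat phtest, plot(age)
estat phtest, log plot(age)
```

Comparing the linear-time and log-time results shows how sensitive the conclusion is to the way time is characterized.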

Proportional Hazard Assumption
• What if the assumption is violated?
• 1. Improve model specification
  • Add time interactions to address nonproportionality
  • Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
  • Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
• 2. Model groups separately
  • Split sample along variables that are non-proportional.

Proportional Hazard Assumption
• What if the assumption is violated?
• 3. Use a stratified Cox model
  • Allows a different baseline hazard for each group
    – But, you can’t estimate effect of stratifying variable!
  • Ex: stcox var1 var2 var3, strata(Dhighdemoc)
• 4. Use a piecewise model
  • Split time into chunks… in which PH assumption is met
    – Requires sufficient sample size in all time periods!
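
A sketch of options 3 and 4, again with Stata's shipped cancer dataset standing in for the lecture data. The names id, treated, period and late are hypothetical, and the cut point at 10 months is arbitrary. When each case has a single record, stsplit is used first so that cases surviving past the cut point still count as at risk (and censored) in the early period.

```stata
* Stand-in data: Stata's shipped cancer dataset (not the lecture data)
sysuse cancer, clear
generate id = _n
stset studytime, failure(died) id(id)

* Hypothetical two-group indicator: 1 = active drug, 0 = placebo
generate byte treated = drug > 1

* 3. Stratified Cox: separate baseline hazard per group,
*    so no coefficient is estimated for treated
stcox age, nohr efron strata(treated)

* 4. Piecewise: split each record at t = 10
stsplit period, at(10)
generate byte late = period == 10

* One model per time period; compare the coefficients
stcox age treated if late == 0, nohr efron
stcox age treated if late == 1, nohr efron
```

With a small sample like this one, the late-period estimates rest on few events, which is exactly the sample-size caution noted above.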

Proportional Hazard Assumption
• What if the assumption is violated?
• 5. Live with it (but temper your conclusions)
  • Violation of proportional hazard assumption tends to:
    – Overestimate the effect of variables whose hazard ratios are increasing over time
    – And, underestimate those whose hazard ratios are decreasing
  • However, Allison points out: Cox model is reasonably robust
    – Other issues (e.g., model misspecification) are bigger problems