More complex event history analysis
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Working
[Figure: timeline from Start of Study (0) to End of Study, with time point t1 marked.]
Spell or Episode
[Figure: timeline from Start of Study (0) to End of Study, with time points t1, t2 and t3 marked and spells labelled 1, 1 and 0.]
[Figure: timeline from Start of Study (0) to End of Study, with time point t1 marked and a spell labelled 1.]
Transition = movement from one state to another
Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).
Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.
Consider a binary outcome or two-state event
0 = Event has not occurred
1 = Event has occurred
In the cross-sectional situation we are used to modelling this with logistic regression.
UNEMPLOYMENT AND RETURNING TO WORK STUDY: A study for six months
0 = Unemployed; 1 = Working
Months   1 2 3 4 5 6
obs      0 0 0 0 0 0   Constantly unemployed
obs      1 1 1 1 1 1   Constantly employed
obs      1 0 0 0 0 0   Employed in month 1 then unemployed
obs      0 0 0 0 0 1   Unemployed but gets a job in month six
Here we have a binary outcome – so could we simply use logistic regression to model it?
Yes and No – We need to think about this issue.
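One way to think about the issue is to lay the data out with one record per person per month, which is the layout a pooled logistic regression would be fitted to. The sketch below is a minimal illustration in Python (not SABRE syntax); the case labels and function names are hypothetical, and only the four 0/1 sequences come from the six-month examples.

```python
# Hypothetical case labels; the 0/1 sequences are the four six-month patterns.
sequences = {
    "constantly_unemployed": [0, 0, 0, 0, 0, 0],
    "constantly_employed": [1, 1, 1, 1, 1, 1],
    "employed_then_unemployed": [1, 0, 0, 0, 0, 0],
    "job_in_month_six": [0, 0, 0, 0, 0, 1],
}


def to_person_period(seqs):
    """Return one record per case per month (long format)."""
    rows = []
    for case, ys in seqs.items():
        for month, y in enumerate(ys, start=1):
            rows.append({"case": case, "month": month, "y": y})
    return rows


def count_transitions(ys):
    """Count the months in which the state differs from the previous month."""
    return sum(1 for prev, cur in zip(ys, ys[1:]) if prev != cur)


rows = to_person_period(sequences)
transitions = {case: count_transitions(ys) for case, ys in sequences.items()}
```

Each case contributes six rows. A plain logistic regression would treat those six rows as if they came from six unrelated people, which is the point that needs thinking about.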
Appropriate Software
STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS (SABRE)
• Fits appropriate models for recurrent events.
• It is like GLIM.
• It can be downloaded free.
www.cas.lancs.ac.uk/software
SABRE fits two models that are appropriate to this analysis.
Model 1 = Pooled Cross-Sectional Logit Model
Think of this as being the same as a logistic regression in any software package.
POOLED CROSS-SECTIONAL LOGIT MODEL
$$L(\beta \mid x, y) = \frac{[\exp(\beta' x_{it})]^{y_{it}}}{1 + \exp(\beta' x_{it})}$$
$x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of parameter estimates.
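As a rough illustration of what "pooled" means, the sketch below evaluates the log of the pooled cross-sectional logit likelihood by summing the contribution of every person-month as if all observations were independent. It is written in Python rather than SABRE; the function name and the toy data are assumptions made purely for illustration.

```python
import math


def pooled_logit_loglik(beta, X, y):
    """Sum over all person-months of y*eta - log(1 + exp(eta)), eta = beta'x."""
    total = 0.0
    for x_row, y_it in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_row))
        # Numerically stable evaluation of log(1 + exp(eta)).
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y_it * eta - softplus
    return total


# Toy data (hypothetical): a constant plus one binary explanatory variable.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 1, 0, 1]

loglik = pooled_logit_loglik([0.0, 0.0], X, y)
deviance = -2.0 * loglik  # for ungrouped binary data
```

Nothing in this calculation records which rows belong to the same person, which is why the pooled model ignores the dependence between repeated observations.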
We could fit a pooled cross-sectional model to our recurrent events data.
This approach can be regarded as a naïve solution to our data analysis problem.
We need to consider a number of issues….
Months   Y1 Y2
obs      0  0
Pickles’ tip: In repeated measures analysis we would require something like a ‘paired’ t test rather than an ‘independent’ t test because we can assume that Y1 and Y2 are related.
SABRE fits two models that are appropriate to this analysis.
Model 2 = Random Effects Model
(or logistic mixture model)
Repeated measures data violate an important assumption of conventional regression models.
The responses of an individual at different points in time will not be independent of each other.
This problem has been overcome by the inclusion of an additional, individual-specific error term.
The random effects model extends the pooled cross-sectional model to include a case-specific random error term to account for residual heterogeneity.
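For a sequence of outcomes for the ith case, the basic random effects model has the integrated (or marginal) likelihood given by the equation:
$$L(\beta \mid x, y) = \int \prod_{t=1}^{T_i} \frac{[\exp(\beta' x_{it} + \varepsilon)]^{y_{it}}}{1 + \exp(\beta' x_{it} + \varepsilon)} \, f(\varepsilon) \, d\varepsilon$$
Here $\varepsilon$ denotes the case-specific random error term and $f(\varepsilon)$ its probability density.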
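To make the integrated likelihood for one case concrete, the sketch below approximates the integral numerically. It assumes a normally distributed random effect with standard deviation `sigma` and uses Gauss-Hermite quadrature; both are assumptions for illustration, the function name is hypothetical, and this is not a description of how SABRE itself performs the calculation.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def case_marginal_likelihood(beta, X_i, y_i, sigma, n_points=20):
    """Integrated likelihood for one case, assuming a N(0, sigma^2) random effect."""
    nodes, weights = hermgauss(n_points)      # rule for the weight exp(-u^2)
    eps = np.sqrt(2.0) * sigma * nodes        # change of variable to N(0, sigma^2)
    eta = X_i @ beta                          # linear predictor, one per occasion
    p = 1.0 / (1.0 + np.exp(-(eta[:, None] + eps[None, :])))
    # Conditional likelihood of the whole sequence at each quadrature point.
    conditional = np.prod(np.where(y_i[:, None] == 1, p, 1.0 - p), axis=0)
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))


# Toy case (hypothetical): constant-only model, two occasions, outcomes 1 then 0.
beta = np.array([0.0])
X_i = np.array([[1.0], [1.0]])
y_i = np.array([1, 0])

no_heterogeneity = case_marginal_likelihood(beta, X_i, y_i, sigma=0.0)
with_heterogeneity = case_marginal_likelihood(beta, X_i, y_i, sigma=2.0)
```

With `sigma=0.0` the result reduces to the pooled product of the two occasion probabilities. With `sigma=2.0` a sequence that changes state becomes less likely, because heterogeneity pushes individuals towards staying in one state.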
Davies and Pickles (1985) have demonstrated that the failure to explicitly model the effects of residual heterogeneity may cause severe bias in parameter estimates. With longitudinal data, the effects of omitted explanatory variables can be accounted for explicitly within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.
An example – see Davies, Elias & Penn (1992).
A study of wives’ employment status.
Y  (femp)    0 = wife unemployed;       1 = wife employed
X1 (fmune)   0 = husband employed;      1 = husband unemployed
X2 (fund1)   0 = no child under 1 year; 1 = child under 1 year
Results of various models
Model            X Vars          Deviance   d.f.
Pooled           -               2054       1579
Pooled           fmune           1970       1578
Pooled           fmune + fund1   1877       1577
Random effects   fmune + fund1   1344       1576
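The models in this table are nested, so they can be compared through the drop in deviance between successive rows. The sketch below does the arithmetic in Python using only the deviances and degrees of freedom from the table; the helper name is hypothetical, and 3.84 is the usual 5% critical value of chi-squared on 1 d.f.

```python
# (model, X vars, deviance, residual d.f.) copied from the results table.
results = [
    ("Pooled", "-", 2054, 1579),
    ("Pooled", "fmune", 1970, 1578),
    ("Pooled", "fmune + fund1", 1877, 1577),
    ("Random effects", "fmune + fund1", 1344, 1576),
]

CHI2_1DF_5PCT = 3.84  # 5% critical value of chi-squared on 1 d.f.


def successive_drops(rows):
    """Deviance and d.f. differences between each model and the one before it."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        out.append((cur[0], cur[1], prev[2] - cur[2], prev[3] - cur[3]))
    return out


drops = successive_drops(results)
all_significant = all(dev > CHI2_1DF_5PCT for _, _, dev, df in drops if df == 1)
```

Each step costs one degree of freedom, and the largest single drop comes from adding the random effect. The chi-squared reference for that last comparison is only approximate, since a zero scale lies on the boundary of the parameter space.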
Deviance = 1344.2363 on 1576 residual degrees of freedom
dis e
Parameter         Estimate         S. Error
___________________________________________________
int                1.5054          0.23772
fmune ( 1)         0.00000E+00     ALIASED [I]
fmune ( 2)        -2.2871          0.38153
fund1 ( 1)         0.00000E+00     ALIASED [I]
fund1 ( 2)        -2.5752          0.34447
scale              2.2524          0.16565      (Random effect)
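The estimates in this output are on the logit scale, so it can help to convert them to odds ratios and probabilities. The sketch below is a Python illustration using the displayed estimates and standard errors; the function names are hypothetical, the interval is an approximate Wald interval, and the probabilities are conditional on a random effect of zero rather than population averages.

```python
import math

# Estimates and standard errors copied from the random effects output.
intercept = 1.5054
estimates = {
    "fmune (2)": (-2.2871, 0.38153),
    "fund1 (2)": (-2.5752, 0.34447),
}


def inv_logit(eta):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio_with_interval(estimate, std_error, z=1.96):
    """Odds ratio with an approximate 95% Wald interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


odds_ratios = {
    name: odds_ratio_with_interval(b, se) for name, (b, se) in estimates.items()
}
# Probability that the wife is employed, evaluated at a random effect of zero.
p_reference = inv_logit(intercept)
p_husband_unemployed = inv_logit(intercept + estimates["fmune (2)"][0])
```

Both odds ratios and their upper limits fall below one, which matches the negative signs in the output: a wife is less likely to be employed when her husband is unemployed or when there is a child under one.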
STATE DEPENDENCE
[Figure: Past Behaviour leading to Current Behaviour.]
[Figure: state in APRIL (Unemployed or Employed) leading to state in MAY (Employed).]
Months   Y1 Y2
obs      0  0
Lag Model
$$\beta' x_{it} + \gamma y_{t-1}$$
ACCOUNTS FOR PREVIOUS OUTCOME ($y_{t-1}$)
This is called a lagged model.
A lagged model helps to control for a previous outcome (or behaviour).
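In practice a lagged model needs the previous outcome attached to each record as an extra explanatory variable, and the first observation for each case has no previous outcome to use. The sketch below shows that data step in Python; the function name, case labels and sequences are hypothetical.

```python
def add_lagged_response(sequences):
    """One row per case and occasion, starting from the second occasion."""
    rows = []
    for case, ys in sequences.items():
        for t in range(1, len(ys)):
            rows.append(
                {"case": case, "occasion": t + 1, "y": ys[t], "lag": ys[t - 1]}
            )
    return rows


# Hypothetical six-month sequences (0 = unemployed, 1 = working).
sequences = {"a": [0, 0, 0, 0, 0, 1], "b": [1, 0, 0, 0, 0, 0]}
rows = add_lagged_response(sequences)
```

Each case loses its first occasion, so two six-month sequences give ten usable rows rather than twelve.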
Results of models with state dependence
Model            X Vars          Deviance   d.f.
Random effects   fmune + fund1   1344       1576
Drop y           fmune + fund1   1160       1421
Lag              fmune + fund1   823        1420
Deviance = 823.21859 on 1420 residual degrees of freedom
Deviance decrease = 336.96811 on 1 residual degree of freedom
dis e
Parameter         Estimate         S. Error
___________________________________________________
int               -1.3695          0.17259
fmune ( 1)         0.00000E+00     ALIASED [I]
fmune ( 2)        -1.5287          0.39847
fund1 ( 1)         0.00000E+00     ALIASED [I]
fund1 ( 2)        -3.1227          0.35764
lag                4.3046          0.22885
scale              0.50379         0.28180
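The deviances and degrees of freedom reported so far can be cross-checked with a little bookkeeping, sketched below in Python. The parameter counts are an assumption: they count the non-aliased estimates shown in the outputs, and "Drop y" is assumed to have the same parameters as the random effects model. The source does not state the sample size, so the implied numbers of observations are inferences only.

```python
# (model, deviance, residual d.f., assumed number of estimated parameters)
models = [
    ("Pooled, no X", 2054, 1579, 1),
    ("Pooled, fmune + fund1", 1877, 1577, 3),
    ("Random effects", 1344, 1576, 4),
    ("Drop y", 1160, 1421, 4),
    ("Lag", 823, 1420, 5),
]

# Residual d.f. plus estimated parameters gives the implied number of observations.
implied_n = {name: df + k for name, _, df, k in models}
dropped = implied_n["Random effects"] - implied_n["Drop y"]

# The lag model deviance plus the reported decrease should recover "Drop y".
drop_y_deviance = 823.21859 + 336.96811
```

Under these assumptions the first three models all imply the same number of observations, and the lagged models use fewer, which would be consistent with one observation being dropped per case. The reported deviance decrease also agrees with the rounded "Drop y" deviance in the table.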
State dependence can be explored further by the estimation of a ‘two-state’ MARKOV model.
[Figure: state in APRIL (Unemployed or Employed), each with its own Explanatory Variables, leading to state in MAY (Employed).]
The model provides TWO sets of estimates.
Results of models with state dependence
Model            X Vars          Deviance   d.f.
Drop y           fmune + fund1   1160       1421
Lag              fmune + fund1   823        1420
Markov           fmune + fund1   803        1417
Parameter         Estimate         S. Error
___________________________________________________
Unemployed Women at t-1
_______
int               -1.5549          0.23159
fmune ( 1)         0.00000E+00     ALIASED [I]
fmune ( 2)        -1.9071          0.74901
fund1 ( 1)         0.00000E+00     ALIASED [I]
fund1 ( 2)        -1.4606          0.71256
scale              1.2392          0.29000
Employed Women at t-1
_______
int                3.0647          0.17575
fmune ( 1)         0.00000E+00     ALIASED [I]
fmune ( 2)        -1.3717          0.50228
fund1 ( 1)         0.00000E+00     ALIASED [I]
fund1 ( 2)        -3.4226          0.35791
scale              0.10000E-02     0.28111
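Because the Markov model gives one set of estimates for each state at t-1, the two sets can be combined into a table of transition probabilities. The sketch below does this in Python from the displayed estimates; the function names are hypothetical, and the probabilities are evaluated at a random effect of zero, so they are illustrative rather than population averages.

```python
import math


def inv_logit(eta):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Estimates copied from the Markov output, keyed by the state at t-1.
markov = {
    "unemployed": {"int": -1.5549, "fmune": -1.9071, "fund1": -1.4606},
    "employed": {"int": 3.0647, "fmune": -1.3717, "fund1": -3.4226},
}


def transition_matrix(fmune, fund1):
    """Rows: state at t-1 (unemployed, employed).
    Columns: P(unemployed at t), P(employed at t)."""
    rows = []
    for state in ("unemployed", "employed"):
        b = markov[state]
        p_emp = inv_logit(b["int"] + b["fmune"] * fmune + b["fund1"] * fund1)
        rows.append([1.0 - p_emp, p_emp])
    return rows


baseline = transition_matrix(fmune=0, fund1=0)   # husband employed, no child under 1
with_baby = transition_matrix(fmune=0, fund1=1)  # husband employed, child under 1
```

Reading across the two rows shows the state dependence directly: for the same explanatory variables, the chance of being employed at t is far higher for women who were employed at t-1. A child under one has a larger estimated effect on remaining in work than on entering it.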
SABRE – Good Points
• Fits appropriate models for recurrent events.
• It is like GLIM.
• It can be downloaded free.
• There is a users list.
• Uses the deviance to compare models (correct likelihood).
• Fits the Markov model.
• Fits a range of other models (e.g. loglinear + ordinal).
• Can do more advanced analysis (e.g. Mover/Stayers).
SABRE – Bad Points
• It is like GLIM: you need to understand a programming syntax.
• Data management and handling are poor.
• There are few users.
Alternatives to SABRE
• STATA: does not fit the full range of models.
• Multilevel model software: okay up to a point, but check that the likelihood is correct (complicated).
• No software other than SABRE fits a continuation ratio model (ordinal), Markov model or the mover/stayer.