11. duration modelingpeople.stern.nyu.edu/wgreene/discretechoice/2014/dc2014-11-durat… · build...

35
[Topic 11-Duration Models] 1/35 11. Duration Modeling

Upload: others

Post on 30-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 1/35

11. Duration Modeling

Page 2: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 2/35

Modeling Duration• Time until retirement• Time until business failure• Time until exercise of a warranty• Length of an unemployment spell• Length of time between children• Time between business cycles• Time between wars or civil insurrections• Time between policy changes• Etc.

Page 3: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 3/35

The Hazard Function≥

≤For the random variable t = time until an event occurs, t 0.f(t) = density; F(t) = cdf = Prob[time t]= S(t) = 1-F(t) = survivalProbability of an event occurring at or before time t is F(t)A condit ∆

∆∆

∆−

ional probability: for small > 0,h(t)= Prob(event occurs in time t to t+ | has not already occurred)h(t)= Prob(event occurs in time t to t+ | occurs after time t)

F(t+ )-F(t) =

1 F(t)Consider a ∆ →

∆λ →

∆ −

λ → = ∆λ ≈ ≤ ≤ ∆ ≥

λ

s 0, the functionF(t+ )-F(t) f(t)

(t) = (1 F(t)) S(t)

f(t)(t) the "hazard function" and (t) Prob[time t time+ | time t]

S(t)(t) is a characteristic of the distribution

Page 4: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 4/35

Hazard Function

λ

λ ≥ = − λ − λ = λ λ

t

0

t

0

t

0

Since (t) = f(t)/S(t) = -dlogS(t)/dt,

F(t) = 1 - exp - (s)ds , t 0.

dF(t) / dt exp - (s)ds ( 1) (t)

(t)exp - (s)ds (Leibnitz ' s Theorem)

Thus, F(t) is a function of the ha

λ

zard; S(t) = 1 - F(t) is also, and f(t) = S(t) (t)

Page 5: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 5/35

A Simple Hazard FunctionThe Hazard functionSince f(t) = dF(t)/dt and S(t) = 1-F(t),

f(t) h(t)= =-dlogS(t)/dt

S(t)Simplest Hazard Model - a function with no "memory"(t) = a constant,

f(t)dlogS(t) / dt.

S(t)The second

λ λ

= λ = −

simplest differential equation;dlogS(t) / dt S(t) K exp( t), K = constant of integrationParticular solution requires S(0)=1, so K=1 and S(t)=exp(- t)F(t) = 1-exp(- t) or f(t)= exp( t), t 0. Exponent

= −λ ⇒ = −λλ

λ λ −λ ≥ ial model.

Page 6: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 6/35

Duration Dependence

When d (t)/dt 0, there is 'duration dependence'λ ≠

Page 7: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 7/35

Parametric Models of Duration

λ λ

λ λ λ

λ λ λ + λλ

p-1

p-1 p

There is a large menu of parametric models for survival analysis:Exponential: (t)= ,

Weibull: (t)= p( t) ; p=1 implies exponential,

Loglogistic: (t)= p( t) / [1 ( t) ],Lognormal: (t φ λ Φ λ

λ λ)= [-plog( t)]/ [-plog( t)],

Gompertz: (t)=p exp( t),Gamma: Hazard has no closed form and must be numerically integrated,and so on.

Page 8: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 8/35

Censoring

Most data sets have incomplete observations.Observation is not t, but t* < t. I.e., it is known (expected) that failure takes place after t.

How to build censoring into a survival model?

Page 9: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 9/35

Accelerated Failure Time Models

λ

′λ

′ ′λλ

p-1

(.) becomes a function of covariates.=a set of covariates (characteristics) observed at baseline

Typically,(t| )=h[exp( ),t]

E.g., Weibull: (t| )=exp( )p[exp( )t]E.g., Exponential: (t| )=ex

x

x x

x x xx

β

β β′ ′′ ′

p( )[exp( )t]; f(t| )=exp( )exp[-exp( )t]

x xx x x

β ββ β

Page 10: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 10/35

Proportional Hazards Models

λ = λλ

′λ p p-1

(t | ) g( ) (t)(t) = the 'baseline hazard function'

Weibull: (t| )=pexp( ) (t)None of Loglogistic, F, gamma, lognormal, Gompertz, are proportional hazard models.

x x

x x β

Page 11: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 11/35

ML Estimation of Parametric Models

−d (1 d)

Maximum likelihood is essentially the same as for the tobit modelf(t| ) = densityS(t| ) = survivalFor observed t, combined density is

g(t| )=[f(t| )] [S(t | )]d = 1 if not censored, 0 if censored.

xx

x x x

=

= λ

= λ +∑

dd

n

i i i i ii 1

Rearrange

f(t| )g(t| )= [S(t | )] [ (t | )] S(t | )

S(t | )

logL d log (t | ) logS(t | )

xx x x xx

x x

Page 12: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 12/35

Time Varying Covariates

Hazard function must be defined as a function ofthe covariate path up to time t;(t,X(t)) = ...

Not feasible to model a continuous path of theindividual covariates. Data may be observed atspecific int

λ

1 1 2ervals, [0,t | x(0)),[t , t | x(1)),...

Treat observations as a sequence of observations.Build up hazard path piecewise, with time invariantcovariates in each segment. Treat each interval save for the last as a censored (at both ends) observation.Last observation (interval) might be censored, or not.

Page 13: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 13/35

Unobserved Heterogeneity

λ λ

′λ λ

′ ′λ λ = + ε λ

Typically multiplicative - (t| ,u)=u (t| )

Also typical:(t| ,u)=u [exp( ), t]

In proportional hazards models like Weibull,(t| ,u)=uexp( ) [t] exp( ) [t]

Approaches: Assume

variable with mean 1.x x

x x

x x x

β

β β

θ θ−

ε

θ −θ

f(u), then integrate u out of f(t| ,u). (1) (log)Normally distributed (u), amenable to quadrature(Butler/Moffitt) or simulation based estimation

exp( u)u(2) (very typical). Log-gamma u has f(u)=

x

θ −

θ

Γ θ

λ λ+ θ λ

1

+1 P 1

P 1/

( )

[A(t)] p( t)Produces f(t| )= ,

[1 ( t) ]A(t) = survival function without heterogeneity, for exponential or Weibull.

x

Page 14: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 14/35

Interpretation

• What are the coefficients?• Are there ‘marginal effects?’• What quantities are of interest in the

study?

Page 15: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 15/35

Cox’s Semiparametric Model

′λ = − λ

′= =

′∑s k

i i i 0 i

ii k k

sAll individuals with t T

Cox Proportional Hazard Model(t | ) exp( ) (t )

Conditional probability of exit - with K distinct exit times in the sample:

exp( )Prob[t T | ]

exp( )

(The set of

x x

xX

x

β

ββ

≥s k individuals with t T is the risk set.

Partial likelihood - simple to maximize.

Page 16: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 16/35

Nonparametric Approach• Based simply on counting observations

• K spells = ending times 1,…,K• dj = # spells ending at time tj• mj = # spells censored in interval [tj , tj+1)• rj = # spells in the risk set at time tj = Σ (dj+mj)

• Estimated hazard, h(tj) = dj/rj

• Estimated survival = Πj [1 – h(tj)] (Kaplan-Meier “product limit” estimator)

Page 17: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 17/35

Kennan’s Strike Duration Data

Page 18: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 18/35

Kaplan Meier Survival Function

Page 19: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 19/35

Hazard Rates

Page 20: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 20/35

Kaplan Meier Hazard Function

Page 21: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 21/35

Weibull Accelerated Proportional Hazard Model

+---------------------------------------------+| Loglinear survival model: WEIBULL || Log likelihood function -97.39018 || Number of parameters 3 || Akaike IC= 200.780 Bayes IC= 207.162 |+---------------------------------------------++---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+

RHS of hazard modelConstant 3.82757279 .15286595 25.039 .0000PROD -10.4301961 3.26398911 -3.196 .0014 .01102306

Ancillary parameters for survivalSigma 1.05191710 .14062354 7.480 .0000

Page 22: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 22/35

Weibull Model+----------------------------------------------------------------+| Parameters of underlying density at data means: || Parameter Estimate Std. Error Confidence Interval || ------------------------------------------------------------ || Lambda .02441 .00358 .0174 to .0314 || P .95065 .12709 .7016 to 1.1997 || Median 27.85629 4.09007 19.8398 to 35.8728 || Percentiles of survival distribution: || Survival .25 .50 .75 .95 || Time 57.75 27.86 11.05 1.80 |+----------------------------------------------------------------+

Page 23: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 23/35

Survival Function

Duration

.20

.40

.60

.80

1.00

.0010 20 30 40 50 60 70 800

Estimated Survival Function for LOGCT

Surv

ival

Page 24: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 24/35

Hazard Function with Positive Duration Dependence for All t

Duration

.0050

.0100

.0150

.0200

.0250

.0300

.0350

.0400

.000010 20 30 40 50 60 70 800

Estimated Hazard Function for LOGCT

Haza

rdFn

Page 25: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 25/35

Loglogistic Model+---------------------------------------------+| Loglinear survival model: LOGISTIC || Dependent variable LOGCT || Log likelihood function -97.53461 |+---------------------------------------------++---------+--------------+----------------+--------+---------+----------+|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+

RHS of hazard modelConstant 3.33044203 .17629909 18.891 .0000PROD -10.2462322 3.46610670 -2.956 .0031 .01102306

Ancillary parameters for survivalSigma .78385188 .10475829 7.482 .0000+---------------------------------------------+| Loglinear survival model: WEIBULL || Log likelihood function -97.39018 ||Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|+---------+--------------+----------------+--------+---------+----------+

RHS of hazard modelConstant 3.82757279 .15286595 25.039 .0000PROD -10.4301961 3.26398911 -3.196 .0014 .01102306

Ancillary parameters for survivalSigma 1.05191710 .14062354 7.480 .0000

Page 26: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 26/35

Loglogistic Hazard Model

Page 27: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 27/35

Page 28: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 28/35

Page 29: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 29/35

Page 30: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 30/35

Page 31: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 31/35

Page 32: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 32/35

Page 33: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 33/35

Page 34: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 34/35

Log Baseline Hazards

Page 35: 11. Duration Modelingpeople.stern.nyu.edu/wgreene/DiscreteChoice/2014/DC2014-11-Durat… · Build up hazard path piecewise, with time invariant covariates in each segment. Treat each

[Topic 11-Duration Models] 35/35

Log Baseline Hazards - Heterogeneity