local likelihood with time-varying additive hazards model


The Canadian Journal of Statistics, Vol. 35, No. 2, 2007, Pages 321-337
La revue canadienne de statistique

Local likelihood with time-varying additive hazards model
Hui LI, Guosheng YIN and Yong ZHOU

Key words and phrases: Additive model; asymptotic normality; censored data; local polynomial; maximum likelihood; nonparametric estimation.

MSC 2000: Primary 62N01; secondary 62N02.

Abstract: The authors propose the local likelihood method for the time-varying coefficient additive hazards model. They use the Newton-Raphson algorithm to maximize the likelihood into which a local polynomial expansion has been incorporated. They establish the asymptotic properties for the time-varying coefficient estimators and derive explicit expressions for the variance and bias. The authors present simulation results describing the performance of their approach for finite sample sizes. Their numerical comparisons show the stability and efficiency of the local maximum likelihood estimator. They finally illustrate their proposal with data from a laryngeal cancer clinical study.

French title and résumé (translated): Use of local likelihood in an additive hazards model with time-varying coefficients. The authors propose the use of local likelihood in the framework of an additive hazards model whose coefficients are functions of time. They use the Newton-Raphson algorithm to maximize the likelihood after incorporating a local polynomial expansion into it. They study the asymptotic properties of the coefficient estimators and express their bias and variance in explicit form. They also present simulation results describing the performance of their approach in small samples. Their numerical comparisons show the stability and efficiency of maximum local likelihood estimation. The authors illustrate their proposal with data from a clinical study of laryngeal cancer.

1. INTRODUCTION

In survival analysis, one is usually interested in exploring the relationship between survival time $T$ and a possibly time-dependent covariate vector $Z(t)$ through hazard regression. The proportional hazards model (Cox 1972) formulates a multiplicative association between the baseline hazard function $\lambda_0(t)$ and the covariates $Z(t)$ via an exponential link function, in the form of

$$\lambda(t \mid Z) = \lambda_0(t) \exp\{\beta^T Z(t)\}. \tag{1}$$

The Cox model is built upon the proportional hazards assumption, which might be violated in many biomedical studies. To study temporal covariate effects, one would allow the regression parameters in model (1) to vary with respect to time rather than remaining constant. Time-dependent coefficient models provide more flexibility in model fitting and have been extensively investigated by Zucker & Karr (1990); Murphy & Sen (1991); Hastie & Tibshirani (1993); Marzec & Marzec (1997); Cai & Sun (2003); Winnett & Sasieni (2003); Tian, Zucker & Wei (2005) and Martinussen & Scheike (2006), among others. Most of the work has focussed on a Cox-type regression model with time-varying coefficients, by replacing $\beta$ in (1) with a completely unspecified function $\beta(t)$. Murphy & Sen (1991) proposed a sieve estimator for the cumulative coefficient function $B(t) = \int_0^t \beta(u)\,du$. Winnett & Sasieni (2003) studied the residual-based nonparametric estimation under the time-varying coefficient Cox model. Cai & Sun (2003) and Tian, Zucker & Wei (2005) proposed a kernel-weighted partial likelihood estimation procedure for the time-varying coefficients and a Breslow-type estimator for the cumulative baseline hazard function.


In some situations, the multiplicative relationship between the baseline hazard and covariates in (1) might not be appropriate. As an alternative, the additive hazards model, which is capable of providing excessive risk information concerning survival, defines a linear formulation as

$$\lambda(t \mid Z) = \lambda_0(t) + \beta^T Z(t). \tag{2}$$

Lin & Ying (1994) proposed a semiparametric approach and derived a large-sample theory for (2). Both models (1) and (2) are biologically well motivated and have solid statistical justification, and together provide complementary estimates of covariate effects (Breslow & Day 1980, 1987). The risk difference yielded from (2) offers additional survival information beyond the risk ratio in (1), which is particularly important in epidemiology and public health studies.

To characterize the trend of the risk difference over time, we study the time-varying coefficient additive hazards model,

$$\lambda(t \mid Z) = \beta^T(t) Z(t), \tag{3}$$

where the first component of $\beta(t)$ corresponds to the baseline hazard $\lambda_0(t)$. Various forms of model (3) have been studied, including fully nonparametric and partially linear semiparametric models (Aalen 1980, 1989; Huffer & McKeague 1991; McKeague & Sasieni 1994, among others). The estimation is usually focussed on the cumulative coefficient function $B(t)$, based on the least squares method (Andersen, Borgan, Gill & Keiding 1993; Klein & Moeschberger 2003). If one is interested in estimating $\beta(t)$, a nonparametric kernel smoothing method needs to be applied to the least squares estimator $\hat B(t)$. A comprehensive overview of dynamic regression models involving the Cox and Aalen structures is given by Martinussen & Scheike (2006).

In the present paper, we propose a local likelihood method for estimating the time-dependent coefficient functions $\beta(t)$ using local linear techniques (Fan, Gijbels & King 1997; Cai & Sun 2003). Under the Cox proportional hazards model, Fan, Lin & Zhou (2006) developed a varying-coefficient model when regression coefficients depend on some exposure variable instead of time. Local polynomials and likelihood methods have been extensively discussed in Fan & Gijbels (1996) and Loader (1999). We take a first-order Taylor series expansion of $\beta(t)$ around each of the prespecified time points $t$. After the linearization, we maximize a kernel-weighted local likelihood function to estimate $\beta(t)$ directly.

The rest of this article is organized as follows. In Section 2, we introduce notation and the local likelihood function under the time-varying additive hazards model. In Section 3, we present the large-sample theory, including the consistency and asymptotic normality properties, for the maximum local likelihood estimator. In Section 4, we discuss simulation studies that we conducted to examine the finite sample properties of our method, numerically comparing its performance with that of the least squares estimator. We illustrate the new proposal with a real data example in Section 5, and provide concluding remarks in Section 6. Proofs are outlined in the Appendix.

2. LOCAL LIKELIHOOD

For the $i$th subject ($i = 1, \ldots, n$), we assume that the failure time $T_i$ is conditionally independent of the right-censoring time $C_i$ given the $p \times 1$ covariate vector $Z_i(t)$, which may include external time-dependent covariates as defined in Kalbfleisch & Prentice (2002). The first component of $Z_i(t)$ is 1, corresponding to the baseline hazard. Let $X_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$ denote the observed time and censoring indicator, respectively. Assume that the triplets $\{(X_i, \Delta_i, Z_i(t)), i = 1, \ldots, n\}$ are independent and identically distributed.

We define the counting process $N_i(t) = I(X_i \le t, \Delta_i = 1)$ and the at-risk process $Y_i(t) = I(X_i \ge t)$. Let the filtration $\mathcal{F}_{n,t}$ be all the data accrued up to time $t$: $\mathcal{F}_{n,t} = \sigma\{N_i(u), Y_i(u), Z_i(u), 0 \le u \le t, i = 1, \ldots, n\}$. Then

$$M_i(t) = N_i(t) - \int_0^t Y_i(u)\,\beta^T(u) Z_i(u)\,du$$

is an $\mathcal{F}_{n,t}$-martingale. The logarithm of the likelihood function under model (3) is given by

where $\tau > 0$ represents the end time of a study. The functional form of $\beta(t)$ is completely unspecified in our development, and thus it will be estimated nonparametrically. We employ local polynomial techniques to estimate $\beta(t)$ at prespecified time points. Suppose that each component of the vector $\beta(t)$ is smooth, so that the first and second derivatives $\beta'(t)$ and $\beta''(t)$ exist and are continuous. Then

$$\beta(u) \approx \beta(t) + \beta'(t)(u - t), \tag{5}$$

for $u$ in a small neighbourhood of $t$. Let $\xi(t) = (\beta^T(t), (\beta'(t))^T)^T$, and omit the dependence of $\xi(t)$ on $t$ whenever doing so will not cause ambiguity. For ease of exposition, let $\xi$ be the running parameter, $\xi_0 = \{\beta_0^T(t), (\beta_0'(t))^T\}^T$ be the true value, and $\hat\xi$ be the maximum local likelihood estimator. Substituting the local linearization equation (5) into (4), we then obtain the logarithm of the local likelihood by incorporating a kernel smoothing function as a weight,

(6)

where $K(\cdot)$ is a kernel function, $K_h(\cdot) = K(\cdot/h)/h$ and $h > 0$ is a bandwidth. Let $\tilde Z_i(u, u - t) = (1, u - t)^T \otimes Z_i(u)$, where $\otimes$ is the Kronecker product. Thus, (6) can be rewritten as

If we denote $\hat\xi(t)$ as the maximizer of (7), then the maximum local likelihood estimator $\hat\beta(t)$ is the vector of the first $p$ components of $\hat\xi(t)$.
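The displays (4), (6) and (7) are not legible in this copy. As a hedged sketch, the standard counting-process likelihood for the intensity $Y_i(u)\beta^T(u)Z_i(u)$ leads to the forms below. The normalizing constant (for example a factor $n^{-1}$) and the exact layout used by the authors are assumptions, not recovered text.

```latex
% (4): log-likelihood under model (3); normalization is an assumption.
\ell(\beta) = \sum_{i=1}^{n} \int_0^{\tau}
  \Bigl[ \log\{\beta^T(u) Z_i(u)\}\, dN_i(u) - Y_i(u)\,\beta^T(u) Z_i(u)\, du \Bigr]

% (6): substitute the local linearization (5) and weight by K_h(u - t).
\ell_n(\xi, t) = \sum_{i=1}^{n} \int_0^{\tau} K_h(u - t)
  \Bigl[ \log\bigl[\{\beta + \beta'(u - t)\}^T Z_i(u)\bigr]\, dN_i(u)
         - Y_i(u)\,\{\beta + \beta'(u - t)\}^T Z_i(u)\, du \Bigr]

% (7): the same expression written with \xi = (\beta^T, \beta'^T)^T and
% \tilde Z_i(u, u - t) = (1, u - t)^T \otimes Z_i(u), since
% \xi^T \tilde Z_i(u, u - t) = \{\beta + \beta'(u - t)\}^T Z_i(u).
\ell_n(\xi, t) = \sum_{i=1}^{n} \int_0^{\tau} K_h(u - t)
  \Bigl[ \log\{\xi^T \tilde Z_i(u, u - t)\}\, dN_i(u)
         - Y_i(u)\,\xi^T \tilde Z_i(u, u - t)\, du \Bigr]
```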

The Hessian matrix of $\ell_n(\xi, t)$ is given by

where $a^{\otimes 2} = aa^T$. Clearly, $\ell_n''(\xi, t)$ is negative definite as $n \to \infty$, and thus $\ell_n(\xi, t)$ is strictly concave, which guarantees a unique maximum for each fixed $t$. To obtain a consistent estimator for $\xi_0$, we maximize the local likelihood function $\ell_n(\xi, t)$ with respect to $\xi$ for fixed $t$, using the Newton-Raphson iterative algorithm.
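The Newton-Raphson maximization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes time-invariant covariates, a Gaussian kernel, the unnormalized kernel-weighted log-likelihood $\sum_i \int K_h(u-t)[\log\{\xi^T \tilde Z_i\}\,dN_i(u) - Y_i(u)\,\xi^T \tilde Z_i\,du]$, and step-halving to keep every $\xi^T \tilde Z_i$ positive. The function name `local_likelihood_fit`, the starting value and the toy data are hypothetical choices.

```python
import math
import numpy as np

def _phi(v):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(v, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def _Phi(v):
    """Standard normal distribution function."""
    v = np.asarray(v, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def local_likelihood_fit(t, h, X, delta, Z, max_iter=50, tol=1e-8):
    """Maximize the local log-likelihood at time t; returns (beta_hat, beta_prime_hat).

    Assumes time-invariant covariates Z (n x p, first column equal to 1).
    """
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    Z = np.asarray(Z, dtype=float)
    p = Z.shape[1]
    d = X - t
    ev = delta == 1
    # Event part: kernel weights and tilde-Z = (Z, (X_i - t) Z) at the event times.
    w_ev = _phi(d[ev] / h) / h
    Zt_ev = np.hstack([Z[ev], d[ev][:, None] * Z[ev]])
    # Compensator part: closed-form integrals over [0, X_i] of K_h(u - t)
    # and K_h(u - t)(u - t) for the Gaussian kernel.
    a = _Phi(d / h) - _Phi(-t / h)
    c = h * (_phi(-t / h) - _phi(d / h))
    comp = np.concatenate([(a[:, None] * Z).sum(axis=0), (c[:, None] * Z).sum(axis=0)])

    def loglik(xi):
        lin = Zt_ev @ xi
        if np.any(lin <= 0):
            return -np.inf          # outside the region where the hazard is positive
        return float(np.sum(w_ev * np.log(lin)) - comp @ xi)

    # Hypothetical starting value: constant hazard (events per unit exposure), zero slopes.
    xi = np.zeros(2 * p)
    xi[0] = delta.sum() / X.sum()
    for _ in range(max_iter):
        lin = Zt_ev @ xi
        grad = Zt_ev.T @ (w_ev / lin) - comp
        hess = -(Zt_ev * (w_ev / lin ** 2)[:, None]).T @ Zt_ev
        step = np.linalg.solve(hess, grad)
        s, cur = 1.0, loglik(xi)
        while loglik(xi - s * step) < cur and s > 1e-10:
            s /= 2.0                # step-halving keeps the iterate feasible
        xi_new = xi - s * step
        converged = np.max(np.abs(xi_new - xi)) < tol
        xi = xi_new
        if converged:
            break
    return xi[:p], xi[p:]

# Toy check with a constant hazard of 1 (so the true beta_0(t) is 1) and tau = 2.
rng = np.random.default_rng(0)
T = rng.exponential(1.0, 2000)
X = np.minimum(T, 2.0)
delta = (T <= 2.0).astype(int)
Z = np.ones((2000, 1))
beta_hat, beta_prime_hat = local_likelihood_fit(0.5, 0.3, X, delta, Z)
```

Because the Hessian is negative definite whenever enough events fall near $t$, each Newton step is an ascent direction, and the halving loop only matters when a full step would leave the region where the fitted hazard is positive.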

3. LARGE SAMPLE THEORY

Let $P(t \mid Z(t)) = P(X \ge t \mid Z(t))$, $\rho(t \mid Z(t)) = P(t \mid Z(t))/\lambda(t \mid Z(t))$, $\mu_j = \int u^j K(u)\,du$ and $\nu_j = \int u^j K^2(u)\,du$ for $j = 0, 1$ and $2$. For a matrix $A$ with elements $a_{ij}$, define $\|A\| = \sup_{i,j} |a_{ij}|$, and for a vector $a$, $|a| = \sup_i |a_i|$. Let $J(t, \varepsilon)$ be the $\varepsilon$-neighbourhood of $t$. Define $H = \mathrm{diag}(1, h) \otimes I_p$, where $I_p$ is a $p \times p$ identity matrix, $\Omega_\mu = \mathrm{diag}(1, \mu_2)$ and $\Omega_\nu = \mathrm{diag}(\nu_0, \nu_2)$.

In the Appendix we present the regularity conditions under which we establish the consistency and asymptotic normality of the local likelihood estimator.

THEOREM 1. Under Conditions (1)-(4), there exists a unique solution $\hat\xi(t)$ that maximizes the local likelihood function such that

$$H\{\hat\xi(t) - \xi_0(t)\} \xrightarrow{P} 0,$$

where $\xrightarrow{P}$ denotes convergence in probability. If, in addition, Condition (5) is satisfied, we have

We obtain a consistent estimator $\hat\beta(t)$ of $\beta_0(t)$ as the first $p$ elements of $\hat\xi(t)$. We can then estimate $\lambda(t \mid Z_i(t))$ by $\hat\lambda(t \mid Z_i(t)) = \hat\beta^T(t) Z_i(t)$. From Theorem 1, we can show that if Conditions (1)-(5) are satisfied, then

$$\sup_{t \in [0, \tau]} |\hat\lambda(t \mid Z_i(t)) - \lambda(t \mid Z_i(t))| \xrightarrow{P} 0, \quad i = 1, \ldots, n.$$

THEOREM 2. Suppose that Conditions (1)-(6) are satisfied. Then, for $t \in [h, \tau - h]$,

$$\sqrt{nh}\,\{H(\hat\xi(t) - \xi_0(t)) - D^{-1}(t) b(t)\} \xrightarrow{D} N(0, D^{-1}(t)\Gamma(t)D^{-1}(t)), \tag{9}$$

where $\xrightarrow{D}$ denotes convergence in distribution, $D(t) = \Omega_\mu \otimes \Sigma(t)$, $\Gamma(t) = \Omega_\nu \otimes \Sigma(t)$,

$$b(t) = \tfrac{1}{2} h^2 \mu_2 (1, 0)^T \otimes \{\Sigma(t)\beta_0''(t)\} \quad \text{and} \quad \Sigma(t) = E\{\rho(t \mid Z(t)) Z^{\otimes 2}(t)\}.$$

Under Conditions (1)-(6), for the first p components in (9), we have that
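The statement for the first $p$ components follows from (9) by block algebra. The lines below are a sketch of that step rather than the authors' display; they assume $\Omega_\mu = \mathrm{diag}(1, \mu_2)$ and $\Omega_\nu = \mathrm{diag}(\nu_0, \nu_2)$ as read from the definitions at the start of this section, and use the Kronecker identities $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ and $(A \otimes B)(C \otimes D) = AC \otimes BD$.

```latex
% Bias term: \Omega_\mu^{-1}(1,0)^T = (1,0)^T and \Sigma^{-1}(t)\Sigma(t) = I_p.
D^{-1}(t)\, b(t)
  = \tfrac{1}{2} h^2 \mu_2 \,\{\Omega_\mu^{-1} (1,0)^T\} \otimes \{\Sigma^{-1}(t)\Sigma(t)\beta_0''(t)\}
  = \tfrac{1}{2} h^2 \mu_2 \,(1,0)^T \otimes \beta_0''(t)

% Variance term.
D^{-1}(t)\,\Gamma(t)\,D^{-1}(t)
  = (\Omega_\mu^{-1}\Omega_\nu\Omega_\mu^{-1}) \otimes \Sigma^{-1}(t)
  = \mathrm{diag}(\nu_0,\; \nu_2/\mu_2^2) \otimes \Sigma^{-1}(t)

% First p components: H = diag(1,h) \otimes I_p acts as the identity on that block.
\sqrt{nh}\,\bigl\{\hat\beta(t) - \beta_0(t) - \tfrac{1}{2} h^2 \mu_2\, \beta_0''(t)\bigr\}
  \xrightarrow{D} N\bigl(0,\; \nu_0\, \Sigma^{-1}(t)\bigr)
```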

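We can obtain the asymptotic property of the estimator at the boundary as follows. Without loss of generality, we consider the left boundary $ch$, $0 < c < 1$, while the property of the right boundary $\tau - ch$ can be derived similarly. Let $\rho(0+ \mid Z(0+)) = P(0+ \mid Z(0+))/\lambda(0+ \mid Z(0+))$, $\mu_j^c = \int_{-c}^{+\infty} u^j K(u)\,du$, $\nu_j^c = \int_{-c}^{+\infty} u^j K^2(u)\,du$,

Under Conditions (1)-(6), we have that

$$\sqrt{nh}\,\{H(\hat\xi(ch) - \xi_0(ch)) - D_c^{-1} b_c\} \xrightarrow{D} N(0, D_c^{-1}\Gamma_c D_c^{-1}),$$

where $D_c = \Omega_\mu^c \otimes \Sigma_c$, $\Gamma_c = \Omega_\nu^c \otimes \Sigma_c$,

$$b_c = \tfrac{1}{2} h^2 \mu_2^c (1, 0)^T \otimes \{\Sigma_c \beta_0''(0+)\} \quad \text{and} \quad \Sigma_c = E\{\rho(0+ \mid Z(0+)) Z^{\otimes 2}(0+)\}.$$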

The proof of the boundary property of the estimator is omitted because it is similar to that of Theorem 2. The bias and variance of $H\hat\xi(t)$ can be consistently estimated by $\hat D_n^{-1}(t)\hat b_n(t)$ and $(nh)^{-1}\hat D_n^{-1}(t)\hat\Gamma_n(t)\hat D_n^{-1}(t)$, where

THEOREM 3. Under Conditions (1)-(6),

$$\hat D_n^{-1}(t)\hat b_n(t) \xrightarrow{P} D^{-1}(t) b(t),$$

and

$$\hat D_n^{-1}(t)\hat\Gamma_n(t)\hat D_n^{-1}(t) \xrightarrow{P} D^{-1}(t)\Gamma(t)D^{-1}(t).$$

The variance of $\hat\beta(t)$ can be consistently estimated by $(nh)^{-1}\nu_0\hat\Sigma_n^{-1}(t)$. Under Conditions (1)-(6), we have that

$$\hat\Sigma_n(t) \xrightarrow{P} \Sigma(t).$$

For varying-coefficient models, it is more informative to base inferences on simultaneous confidence bands rather than pointwise intervals. For the estimated coefficients, we choose a time interval, say $[d_1, d_2] \subset [h, \tau - h]$. Based on a resampling method, one can derive a large-sample approximation to the distribution of

$$S_j = \sup_{t \in [d_1, d_2]} \hat w_j(t)\,|\hat\xi_j(t) - \xi_{0j}(t)|, \quad j = 1, \ldots, 2p, \tag{10}$$

where $\hat w_j(t)$ is a possibly data-dependent, positive weight function that converges uniformly to a deterministic function. Let $c_{\alpha,j}$ ($j = 1, \ldots, 2p$) be the $100(1 - \alpha)$th percentile of this approximation distribution, with $0 < \alpha < 1$. Then, the $1 - \alpha$ confidence band for $\xi_{0j}(t)$ is given by

$$\hat\xi_j(t) \pm c_{\alpha,j}\,\hat w_j^{-1}(t), \quad d_1 \le t \le d_2, \quad j = 1, \ldots, 2p.$$

By the conclusion of Theorem 2, we can show that if $h = O(n^{-r})$ with $1/5 < r < 1$, then the first derivative of the likelihood function is

Following an argument similar to that of Tian, Zucker & Wei (2005), we consider a stochastic perturbation of (11) through replacing $M_i(u)$ by $N_i(u)G_i$,

where $\{G_i, i = 1, \ldots, n\}$ is a random sample from the standard normal distribution and is independent of the data $\{(X_i, \Delta_i, Z_i(t)), i = 1, \ldots, n\}$. Then, the standardized distribution

$$\tilde S = \sup_{t \in [d_1, d_2]} \bigl|\hat w(t)\{\ell_n''(\hat\xi, t)\}^{-1}\tilde\ell_n'(\hat\xi, t)\bigr|$$

can be used to approximate the distribution of its counterpart $S = (S_1, \ldots, S_{2p})^T$ defined by (10), where $\hat w(t) = \mathrm{diag}(\hat w_1(t), \ldots, \hat w_{2p}(t))$.
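Once resampled realizations of the perturbed process are available, the band construction around (10) reduces to taking a percentile of sup-statistics. The sketch below assumes those realizations have already been computed for one coefficient on a grid inside $[d_1, d_2]$; the function name `simultaneous_band`, the array layout and the toy numbers are hypothetical.

```python
import numpy as np

def simultaneous_band(est, weight, perturbed, alpha=0.05):
    """Simultaneous band for one coefficient on a time grid.

    est       : shape (m,), estimate on the grid
    weight    : shape (m,), positive weight function on the grid
    perturbed : shape (B, m), one resampled realization per row, approximating
                the estimation error on the grid
    Returns (lower, upper, c_alpha).
    """
    est = np.asarray(est, dtype=float)
    weight = np.asarray(weight, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    # One weighted sup-statistic per resample.
    sup_stats = np.max(weight * np.abs(perturbed), axis=1)
    # 100(1 - alpha)th percentile of the resampled sup-statistics.
    c_alpha = float(np.quantile(sup_stats, 1.0 - alpha))
    half_width = c_alpha / weight
    return est - half_width, est + half_width, c_alpha

# Toy numbers: three resamples on a two-point grid, unit weights, alpha = 0.5.
lower, upper, c_half = simultaneous_band(
    est=[0.0, 0.0],
    weight=[1.0, 1.0],
    perturbed=[[1.0, -2.0], [0.5, 0.5], [-3.0, 1.0]],
    alpha=0.5,
)
```

In practice the number of resamples would be in the hundreds or more, and the weight would typically be the reciprocal of an estimated standard error so that the band width adapts to the local variability.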

4. SIMULATION STUDIES

We carry out simulation studies to examine the performance of the proposed local likelihood estimators for finite sample sizes. We use the Gaussian kernel function, and evenly choose 16 points along the time axis, i.e., $n_{\text{grid}} = 16$. We take a local linear expansion of each time-varying coefficient function around these 16 prespecified time points, at which the coefficient functions are evaluated. First, we consider the model given by

$$\lambda(t \mid Z_1) = \lambda_0(t) + \beta_1(t) Z_1, \tag{12}$$

where $\lambda_0(t) = 1.2 - \sin(t)$ and $\beta_1(t) = 0.5 + (t - 1)^2$. The covariate $Z_1$ is generated from a uniform distribution on $[0, 1]$. The failure times are simulated from model (12), and the censoring time is $\min(\tau, C)$, where $C$ is independently generated from a uniform distribution on $[\tau/2, 3\tau/2]$. The study ending time $\tau$ is chosen to yield a desirable censoring rate, e.g., when $\tau = 1.7$, we obtain a censoring percentage of 25%. We take sample sizes $n = 300$ and $n = 500$, and perform 500 simulations for each bandwidth $h$. One method for bandwidth selection is based on $K$-fold cross-validation (see, for example, Hoover, Rice, Wu & Yang 1998). We divide the data into $K$ equal-sized subgroups denoted by $D_k$ ($k = 1, \ldots, K$). The $k$th prediction error is given by

where

$$\hat\Lambda_i^{(-k)}(t) = \int_0^t Y_i(u)\,\hat\beta_{(-k)}^T(u) Z_i(u)\,du,$$

and $\hat\beta_{(-k)}(u)$ is estimated using the data from all the subgroups other than $D_k$. The optimal bandwidth can be obtained by minimizing the total prediction error, $\mathrm{PE}(h) = \sum_{k=1}^K \mathrm{PE}_k(h)$, with respect to $h$.
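The data-generating step for model (12) can be sketched as follows. The text does not say how failure times were drawn; this illustration assumes inversion of the cumulative hazard, $\Lambda(T \mid z) = E$ with $E$ standard exponential, solved by bisection. The closed form of $\Lambda$ is obtained by integrating $\lambda_0(t) + \beta_1(t) z$, and the function names are hypothetical.

```python
import numpy as np

def cum_hazard(t, z):
    """Integral over [0, t] of (1.2 - sin u) + z * (0.5 + (u - 1)^2)."""
    return 1.2 * t + np.cos(t) - 1.0 + z * (0.5 * t + ((t - 1.0) ** 3 + 1.0) / 3.0)

def simulate_model12(n, tau=1.7, seed=0):
    """Draw (X, delta, z) from model (12) with censoring time min(tau, C)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, n)
    e = rng.exponential(1.0, n)
    # Vectorized bisection for cum_hazard(T, z) = e on [0, 2 * tau]. Failure times
    # beyond 2 * tau are left at the upper end; they are censored in any case
    # because every censoring time is at most tau.
    lo = np.zeros(n)
    hi = np.full(n, 2.0 * tau)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        below = cum_hazard(mid, z) < e
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    T = 0.5 * (lo + hi)
    C = np.minimum(tau, rng.uniform(tau / 2.0, 1.5 * tau, n))
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return X, delta, z

X, delta, z = simulate_model12(300)
censoring_rate = 1.0 - delta.mean()   # compare with the rate reported in the text
```

Bisection is safe here because the hazard is bounded below by $0.2$, so the cumulative hazard is strictly increasing in $t$.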

The least squares method cannot estimate $\lambda_0(t)$ and $\beta_j(t)$ directly; instead, it yields the estimates for the cumulative coefficient functions, $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ and $B_j(t) = \int_0^t \beta_j(u)\,du$. Thus, a nonparametric kernel smoothing estimation is needed to estimate the baseline hazard $\lambda_0(t)$ and the excessive hazard differences characterized by $\beta_j(t)$. If we let $b$ denote the bandwidth, then after obtaining the least squares estimators $\hat\Lambda_0(t)$ and $\hat B_j(u)$, we have
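The smoothing step referred to above is commonly written as a kernel-weighted sum of the increments of the cumulative estimator, $\hat\beta_j(t) = \sum_k K_b(t - t_k)\,\Delta\hat B_j(t_k)$. The sketch below assumes that form with a Gaussian kernel; it is not taken from the paper's missing display, and the function name `smooth_cumulative` and the toy input are hypothetical.

```python
import numpy as np

def smooth_cumulative(t, jump_times, increments, b):
    """Kernel-smoothed derivative of a step function: sum_k K_b(t - t_k) * dB(t_k)."""
    u = (t - np.asarray(jump_times, dtype=float)) / b
    kern = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * b)   # Gaussian K_b
    return float(np.sum(kern * np.asarray(increments, dtype=float)))

# Toy input: increments of B(t) = 2t on a fine grid, so the smoothed value
# in the interior should be close to the slope 2.
grid = np.arange(1, 2001) * 0.001
dB = np.full(grid.size, 2.0 * 0.001)
beta_at_1 = smooth_cumulative(1.0, grid, dB, b=0.05)
```

Near the ends of the time axis part of the kernel mass falls outside the observed range, so estimates there are biased toward zero unless a boundary correction is applied.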

In Table 1, we present the simulation results with $h = 0.06$ up to $0.15$, for $t_0 = 0.3$, $0.8$ and $1.3$, which approximately correspond to the 20th, 50th and 80th percentiles of the failure times. We show the true bias of $\hat\beta(t)$ in the column "bias", and the empirical bias of $\hat\beta(t) - \beta_0(t)$ in the column "b̂ias". We can see that the local likelihood estimators are biased and the bias generally increases as the bandwidth increases. The estimators $\hat\beta(t)$ are biased upward, and the empirical and the true biases match reasonably well, particularly for $n = 500$.

TABLE 1: Estimation for the time-varying coefficient $\beta(t)$ under model (12) based on the maximum local likelihood procedure with 25% censoring and sample sizes of 300 and 500.

                   $\lambda_0(t) = 1.2 - \sin(t)$             $\beta_1(t) = 0.5 + (t - 1)^2$
  n    h    t0     bias   b̂ias    SD     SE   CR(%)     bias   b̂ias    SD     SE   CR(%)
300  .06   0.3     .002   .000   .217   .224   95.2     .014   .036   .455   .452   95.2
           0.8     .005  -.008   .219   .203   91.4     .014   .027   .442   .437   94.6
           1.3     .007   .003   .217   .231   95.6     .014   .029   .488   .547   96.4
     .08   0.3     .004   .005   .190   .200   96.4     .026   .039   .397   .389   94.8
           0.8     .009   .001   .185   .176   93.0     .026   .038   .372   .378   94.8
           1.3     .012   .001   .194   .199   94.0     .026   .049   .435   .476   96.2
     .10   0.3     .006   .006   .177   .189   96.0     .040   .049   .370   .378   94.6
           0.8     .014   .008   .163   .157   94.2     .040   .056   .327   .337   95.2
           1.3     .019   .006   .182   .184   95.8     .040   .055   .411   .440   97.0
     .12   0.3     .009   .006   .172   .183   95.8     .057   .053   .360   .366   94.6
           0.8     .021   .016   .146   .142   93.8     .057   .076   .294   .304   94.6
           1.3     .028   .009   .175   .175   97.2     .057   .062   .398   .420   97.2
     .15   0.3     .013   .008   .167   .178   96.2     .090   .060   .349   .357   95.4
           0.8     .032   .024   .127   .125   95.8     .090   .111   .261   .267   94.0
           1.3     .043   .013   .170   .167   97.8     .090   .063   .385   .402   97.0
500  .06   0.3     .002   .002   .180   .174   93.6     .014   .019   .349   .349   94.6
           0.8     .005   .008   .164   .159   93.6     .014   .021   .349   .339   93.4
           1.3     .007  -.005   .164   .166   94.8     .014   .038   .374  [.a0]   96.6
     .08   0.3     .004   .004   .157   .155   94.6     .026   .027   .308   .310   94.8
           0.8     .009   .013   .141   .137   93.8     .026   .031   .302   .293   94.2
           1.3     .012   .001   .146   .145   95.4     .026   .050   .336   .352   96.4
     .10   0.3     .006   .004   .146   .146   95.6     .040   .039   .286   .291   94.8
           0.8     .014   .021   .124   .123   93.8     .040   .043   .265   .261   94.4
           1.3     .019   .006   .134   .134   96.6     .040   .055   .314   .327   95.8
     .12   0.3     .009   .006   .140   .142   96.2     .057   .042   .272   .282   95.2
           0.8     .021   .027   .112   .111   94.6     .057   .062   .239   .235   93.2
           1.3     .028   .010   .128   .127   96.2     .057   .057   .303   .311   95.8
     .15   0.3     .013   .008   .139   .138   95.8     .090   .060   .279   .277   95.2
           0.8     .032   .032   .101   .097   94.8     .090   .092   .213   .205   92.0
           1.3     .043   .013   .138   .120   92.2     .090   .055   .307   .293   95.4

The bracketed entry is illegible in the source and is reproduced as extracted.

The average estimated standard errors (SE) based on the asymptotic distribution are close to the standard deviations (SD), which indicates a good performance of the variance estimator. The 95% confidence interval coverage rates (CR%) are close to the nominal level. As expected, with a relatively large bandwidth, the bias is large and the variance is small, indicating a trade-off between bias and variance. As the sample size increases from 300 to 500, the variance decreases accordingly. In Figure 1, we present the average of the estimated time-varying coefficients over 500 simulations based on the maximum local likelihood (MLL) method, coupled with the 95% confidence intervals using the average of the estimated standard errors. We also exhibit the true functions and estimated curves based on the ordinary least squares (OLS) method. The MLL and OLS estimation methods produce very similar results. The estimated curves and the true curves are closely matched. The confidence intervals are narrow in the middle, and wide at the two tails due to a lack of data.

FIGURE 1: Estimation of $\lambda_0(t)$ and $\beta_1(t)$ with $n = 300$ and 25% censoring under model (12). Panel (a): estimated curves of $\lambda_0(t)$; panel (b): estimated curves of $\beta_1(t)$; the horizontal axes show $t$ from 0 to 1.5. The true functions are solid lines; the maximum local likelihood estimates are dash-dotted lines ($h = 0.06$); the 95% confidence intervals are dotted lines; and the least squares estimates are dashed lines ($b = 0.04$).

In the second part of the simulation studies, we consider a model with a mixture of a time-varying coefficient and a constant regression coefficient,

$$\lambda(t \mid Z_1, Z_2) = \lambda_0(t) + \beta_1(t) Z_1 + \beta_2 Z_2, \tag{13}$$

where $\lambda_0(t) = 0.7t$, $\beta_1(t) = (1 - t)^2$ and $\beta_2 = 0.8$. The covariate $Z_1$ is generated from a uniform distribution on $[0, 1]$ and $Z_2$ is a Bernoulli random variable with probability 0.5. We take $n = 300$ and generate censoring times from a uniform distribution to yield a 25% censoring rate. We conduct 500 simulations and summarize the results in Table 2 and Figure 2. The point estimates are reasonably close to the true values, particularly in the middle range of the time axis. The variance formula based on the asymptotic approximation provides a fairly good estimate of the variability. The estimated curves closely match the true curves, indicating the well-behaved empirical performance of our method.

FIGURE 2: Estimation of $\lambda_0(t)$, $\beta_1(t)$ and $\beta_2(t)$ with $n = 300$ and 25% censoring under model (13). Panel (a): estimated curves of $\lambda_0(t)$; panel (b): estimated curves of $\beta_1(t)$; panel (c): estimated curves of $\beta_2(t)$; the horizontal axes show $t$ from 0 to 1.5. The true functions are solid lines; the maximum local likelihood estimates are dash-dotted lines ($h = 0.06$); the 95% confidence intervals are dotted lines; and the least squares estimates are dashed lines ($b = 0.04$).

TABLE 2: Estimation for the time-varying coefficient $\beta(t)$ under model (13) based on the maximum local likelihood procedure with 25% censoring and $n = 300$.

$\lambda_0(t) = 0.7t$
  h    t0     bias    b̂ias      SD      SE   CR(%)
 .06   0.3      0   [-.M1]    .170    .140   92.0
       0.8      0     .011    .220    .210   92.6
       1.3      0     .017    .394    .378   91.2
 .08   0.3      0   [-.M4]    .154    .124   91.8
       0.8      0     .010    .188    .184   94.8
       1.3      0     .010    .344    .337   92.6
 .10   0.3      0    -.025    .145    .116   93.2
       0.8      0     .007    .168    .169   94.4
       1.3      0   [.M)l]    .321    .313   92.4
 .12   0.3      0    -.030    .143    .111   92.0
       0.8      0     .004    .150    .158   96.2
       1.3      0     .000   [3 9]    .296   93.0
 .15   0.3      0    -.032    .142    .108   91.0
       0.8      0     .003    .136    .147   96.4
       1.3      0    -.015    .283    .273   93.4

$\beta_1(t) = (1 - t)^2$
  h    t0     bias    b̂ias      SD      SE   CR(%)
 .06   0.3    .014    .056    .308    .278   94.2
       0.8    .014    .002    .402    .369   93.2
       1.3    .014   -.012    .751    .681   92.8
 .08   0.3    .026    .072    .281    .248   93.2
       0.8    .026    .017    .345    .324   94.2
       1.3    .026    .015    .654    .606   92.8
 .10   0.3    .040    .083    .265    .234   92.8
       0.8    .040    .038    .302    .297   94.8
       1.3    .040    .035    .593    .563   93.4
 .12   0.3    .057    .094    .259    .226   93.2
       0.8    .057    .057    .267    .278   96.0
       1.3    .057    .030    .738    .532   92.8
 .15   0.3    .090    .104    .256    .220   92.0
       0.8    .090    .078    .237    .258   96.6
       1.3    .090    .131    .532    .489   91.8

$\beta_2(t) = 0.8$
  h    t0     bias    b̂ias      SD      SE   CR(%)
 .06   0.3      0     .010    .201    .190   94.2
       0.8      0    [.OW]    .285  [.27a]   92.6
       1.3      0     .070    .614    .353   94.0
 .08   0.3      0     .015    .179    .170   92.6
       0.8      0     .012    .254  [.24a]   93.4
       1.3      0     .059    .517    .493   94.8
 .10   0.3      0     .018    .166    .159   93.0
       0.8      0     .013    .222    .228   95.2
       1.3      0     .049    .487    .457   94.4
 .12   0.3      0  [. O M]    .159    .152   93.0
       0.8      0     .011    .207    .213   95.0
       1.3      0     .041    .564    .430   94.0
 .15   0.3      0     .016    .153    .148   94.0
       0.8      0     .008    .190    .199   96.8
       1.3      0    -.022    .409    .398   93.8

Bracketed entries are illegible in the source and are reproduced as extracted.

To compare the estimation efficiency between the proposed local likelihood and the least squares methods, we compute the joint mean squared error (MSE) for $\hat\lambda_0(t)$ and $\hat\beta(t)$,

where $\{t_k, k = 1, \ldots, n_{\text{grid}}\}$ are the prespecified grid points. We also carry out the weighted least squares (WLS) estimation (Huffer & McKeague 1991), where the weights require nonparametric kernel smoothing estimation. Table 3 summarizes the numerical comparisons of the mean squared errors of the estimation according to the maximum local likelihood, ordinary and weighted least squares, with $n = 300$ under models (12) and (13), respectively. For a wide range of bandwidths, the MLL, OLS and WLS procedures yield similar MSEs. The simulation results indicate that WLS is generally more efficient than OLS. With small sample sizes, the kernel-based estimate of the baseline hazard in the WLS might not be numerically stable.

TABLE 3: Comparison of mean squared errors (MSE) for the maximum local likelihood (MLL), ordinary least squares (OLS) and weighted least squares (WLS) methods, with $n = 300$ and 25% censoring.

        MSE under Model (12)                        MSE under Model (13)
  h      MLL     b      OLS      WLS          h      MLL     b      OLS      WLS
 .06   .00232   .02   .00234   .00189        .06   .00346   .02   .00348   .00310
 .08   .00316   .04   .00444   .00395        .08   .00382   .04   .00400   .00377
 .10   .00356   .06   .01383   .01339        .10   .00463   .06   .01148   .01171

5. EXAMPLE

As an illustration, we apply our proposed method to a data set from a laryngeal cancer study (Klein & Moeschberger 2003). The larynx, or voice box, is an area of the throat that contains an intricate mixture of cartilage and muscles. It houses the vocal cords that act as the sound source for speech, and also performs other complex functions such as protecting the airway to the lungs during swallowing. Laryngeal cancer occurs when cells in the lining of the throat grow uncontrollably and form tumors, which can then invade normal tissue and spread to other parts of the body. The data in our analysis involve 90 male patients diagnosed with cancer of the larynx, with their survival times recorded in years from the initial treatment to death or loss to follow-up. The censoring percentage of the data is 44.4%. Covariates of interest include the stage of the disease (stage I: 37%, II: 19%, III: 30% and IV: 14%) and age (ranging from 41 to 86 with a mean of 64.6 years). We recategorize the tumor stage into three indicator variables, with stage I as the baseline, and subtract the mean from the ages of the patients.

In order to examine the time trend of the covariate effects, we fit the following time-varying additive hazards model,

We choose the bandwidth $h = 1.4$ based on the $K$-fold cross-validation method. We use the Gaussian kernel function and take $n_{\text{grid}} = 104$. Based on our local likelihood approach, we obtain the estimated baseline hazard function, time-varying coefficient functions, the corresponding 95% pointwise confidence intervals and the 95% simultaneous confidence bands, as given in Figure 3(a)-(e). In these plots, we also present the least squares estimates and the constant estimates of the coefficients when assuming the coefficients are time-invariant. The baseline hazard function has an interesting pattern: it increases during the first five years, becomes flat for two years, and then decreases afterwards, i.e., an umbrella-shaped baseline hazard. The three indicator covariates of the tumor stage add excessive risk to patients. Stage II of the disease does not significantly increase the hazard over stage I, even though it has an increasing pattern over the entire follow-up; stage III adds a U-shaped excessive risk trend over time (during the earlier follow-up, there are significant hazard differences); and stage IV shows a steady increment of hazard which slowly increases after five years. Although younger patients are more likely to survive, the age effect increases the hazard over time, and then starts dropping after seven years of follow-up. In other words, the risk difference between younger and older patients shrinks if older patients can survive long enough. Figure 3(f) shows the estimated survival functions under our model (both the MLL and OLS estimates) averaged over all the subjects, and the usual Kaplan-Meier estimator. Since the estimated marginal survival curves are quite close, it confirms that the proposed model fits the data adequately and the estimation methods behave well. The interpretations of the excessive risk differences are intuitive and meaningful, which may help us to understand the temporal effects of covariates.

6. DISCUSSION

We have proposed a maximum local likelihood method for the time-varying coefficient additive hazards model. The model incorporates varying coefficients to exhibit the trend of covariate effects over time. It is a very flexible modelling structure that can capture temporal patterns of covariate effects and provide excessive risk information. As opposed to a one-step procedure (Fan & Chen 1999), the local likelihood is maximized through the Newton-Raphson iterative algorithm until the prespecified convergence criteria are met. It is possible to generalize the current setting to the semiparametric model, where some covariate effects are time-varying and others are constant. To assess whether a coefficient is time-varying or invariant, a straightforward way is to examine whether the 95% confidence band completely covers the constant estimate of the coefficient. More objective model-checking methods and survival prediction can be developed along the lines of those in Tian, Zucker & Wei (2005). An alternative method for checking the constancy of the time-varying covariate effect is based on the maximal deviation test, which is available in the R package "timereg" by Martinussen & Scheike (2006).


[Figure 3: six panels, each plotted against t (years): (a) Estimated curves of stage II; (b) Estimated curves of stage III; (c) Estimated curves of stage IV; (d) Estimated curves of age; (e) Estimated curves of $\lambda_0(t)$; (f) Estimated curves of survival function.]

FIGURE 3: Estimation of time-varying coefficient functions for the laryngeal cancer data, using h = 1.4. Panels (a)-(e): MLL estimates (solid lines), corresponding pointwise 95% confidence intervals (dashed lines), simultaneous 95% confidence bands (dotted lines); OLS estimates (thick dashed lines); and estimated constant parameters (horizontal dash-dotted lines). Panel (f) contains the Kaplan-Meier estimator (the solid line), the MLL estimated survival function averaged over all the covariate values (the dashed line), and the OLS estimated average survival function (the dash-dotted line).

The traditional least squares method focuses on estimating the cumulative coefficient function over time. Hence, one needs to use the kernel smoothing method to estimate the coefficient itself, which can be adaptively obtained with different kernel functions and bandwidths for different coefficients. In contrast, by maximizing the local likelihood function, we can directly estimate the varying coefficients, which provides a coherent and efficient estimating procedure.


However, the time-varying coefficients in the local likelihood share a common kernel function and bandwidth. The least squares estimator is unbiased, while the bias of the local likelihood estimator is explicitly given. The direct comparisons of estimation efficiency between the maximum local likelihood and the least squares methods might not be completely fair, as the two approaches have different focuses of estimation. In real applications, if one is interested in the cumulative coefficient function $B(t)$, then the least squares method is desirable, while if the coefficient function $\beta(t)$ itself is of primary interest, then the MLL method may be recommended.
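The kernel smoothing step needed by the least squares approach can be sketched as follows. This is a minimal illustration, assuming the cumulative coefficient estimate is available as a step function with values `cum_values` at the ordered times `jump_times`; the function names and the choice of the Epanechnikov kernel are assumptions made for the example, not details taken from the paper.

```python
def epanechnikov(v):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def smooth_cumulative(jump_times, cum_values, t, h, kernel=epanechnikov):
    """Kernel-smoothed slope of a cumulative function B at time t.

    Computes sum_k K_h(t - t_k) * {B(t_k) - B(t_{k-1})}, taking B = 0 before
    the first jump time, with K_h(v) = K(v / h) / h.
    """
    estimate, previous = 0.0, 0.0
    for tk, bk in zip(jump_times, cum_values):
        estimate += kernel((t - tk) / h) / h * (bk - previous)
        previous = bk
    return estimate

# Toy check (hypothetical data): B(t) = 2t on a fine grid, so the slope is 2.
grid = [0.01 * k for k in range(1, 201)]
slope = smooth_cumulative(grid, [2.0 * u for u in grid], t=1.0, h=0.5)
```

Because the kernel and bandwidth enter only through this post-processing step, they can be chosen separately for each coefficient, whereas the local likelihood fit uses one kernel and one bandwidth for all coefficients.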

APPENDIX

Throughout the paper, we need to impose the following conditions:

1. The kernel function $K(\cdot) \ge 0$ is a bounded and symmetric density with compact support, e.g., $[-1, 1]$.

2. The coefficient functions $\beta_j(t)$ have continuous second derivatives for $t \in [0, \tau]$.

3. The conditional probability $P(t \mid Z(t))$ is continuous with respect to $t$ and $E|Z(t)|^2 < \infty$.

4. As $n \to \infty$, $nh/\log n \to \infty$, and $nh^5$ is bounded.

5. The function $\lambda(u \mid Z(u))$ is bounded away from 0 for $u \in [0, \tau]$, i.e., $\inf_{u \in [0,\tau]} \lambda(u \mid Z(u)) > 0$, and $\lambda(t \mid Z(t)) < \infty$. There exists a random variable $V$ such that $E(V^2) < \infty$ and $\sup_{u \in [0,\tau]} |Z(u)| \le V$.

6. For $t \in [0, \tau]$, $\Sigma(t)$ is positive definite and continuous with respect to $t$.
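As a worked check of Condition 4 (an illustration that is not part of the original argument), consider a bandwidth of the commonly used order $h = c\,n^{-1/5}$ with a hypothetical constant $c > 0$. Both requirements then hold:

```latex
h = c\,n^{-1/5}
\quad\Longrightarrow\quad
\frac{nh}{\log n} = \frac{c\,n^{4/5}}{\log n} \longrightarrow \infty
\qquad\text{and}\qquad
nh^{5} = c^{5} \ \text{(bounded)} .
```

A bandwidth shrinking more slowly than $n^{-1/5}$, for instance $h = c\,n^{-1/6}$, would give $nh^5 = c^5 n^{1/6} \to \infty$ and so would violate the second requirement.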

We first state a useful lemma.

The proof of Lemma 1 is omitted as it is analogous to that of Fan, Lin & Zhou (2006, Lemma A.1).

Proof of Theorem 1. Let $\alpha = H(\xi - \xi_0)$ and $\hat{\alpha} = H(\hat{\xi} - \xi_0)$. It follows from (7) that

It is easy to obtain $\hat{\alpha}$ that maximizes (14) with respect to $\alpha$. Noting that

$dM_i(t) = dN_i(t) - Y_i(t)\,\beta_0^{\mathsf T}(t) Z_i(t)\,dt,$


By Lemma 1 and Conditions (1)-(4), we have

The second partial derivative of $A(\alpha, t)$ with respect to $\alpha$ is

Clearly, (16) is negative definite, and thus $A(\alpha, t)$ is strictly concave with a maximum at $\alpha = 0$. The process $B_n(\alpha, t)$ is a local square integrable martingale with the predictable variation process, for $0 \le s \le \tau$,

$\langle B_n(\alpha, t), B_n(\alpha, t) \rangle (s) =$

By Lemma 1 and Conditions (1)-(4), we obtain

Therefore, (17) in conjunction with (15) implies that

Since $\hat{\alpha}$ maximizes the concave function $\ell_n(\alpha, t) - \ell_n(0, t)$, by the concavity lemma (Andersen & Gill 1982), we obtain

$\hat{\alpha} = H(\hat{\xi} - \xi_0) \to 0$ in probability,

thus $\hat{\beta}(t) - \beta_0(t) \to 0$ in probability. Furthermore, by applying the Lenglart inequality (Fleming & Harrington 1991), we can show that

where $\alpha = H(\xi - \xi_0)$ ranges over a convex and compact set of $\mathbb{R}^{2p}$. By Lemma A.3 of Fan, Lin & Zhou (2006), we can show that


Therefore, the proof of Theorem 1 is complete. □

Proof of Theorem 2. The first partial derivative of (14) with respect to $\alpha$ is

Taking $\alpha = 0$ and plugging in $N_i(t)$, we have $\ell_n'(0, t) = Q_n(t) + U_n(t)$, where

- n-lg JdT Kh(u - t)x(u)@i(u, u - t ) du, i= 1

Note that the process $U_n^*(t) = \sqrt{nh}\,U_n(t)$ is a local square integrable martingale with the predictable variation process, for $0 \le s \le \tau$,

By Lemma 1 and Conditions (1)-(6), we have

rn(t) = (u:(t),U:(t))(~) = 0, B E { p ( t I Z(t ) )Zm2( t ) } + o P ( l ) = R, B x(t) + oP(1). Denote the jrh element of U; (t) as

where

$Z_{ij}(u)$ is the $j$th component of $Z_i(u)$ and $I_j = 0$ if $1 \le j \le p$, and 1 if $p + 1 \le j \le 2p$. To prove the asymptotic normality, it is sufficient to examine the Lindeberg-type condition, for all $\varepsilon > 0$,

As $n$ becomes large enough, the set in the indicator in (18) is empty since $K_h(u - t) H_{ij}(u)$ is bounded. Thus, by the martingale central limit theorem (Fleming & Harrington 1991), we have

m U , ( t ) 5 N(0, R, B x(t)). (19)

Now, we establish the asymptotic property of Qn(t). Note that

1 P,'(u)Zi(u) = € , T % ~ ( u , u - t ) + ~ ( P i ( t ) ) ' Z i ( u ) ( ~ - t ) 2 + O p ( h 2 ) .


By Conditions (1)-(6) and Lemma 1, we have

By the Taylor expansion of $\ell_n''(\alpha, t)$ in the neighbourhood of 0, for any $\hat{\alpha}^* \to 0$ in probability,

$\ell_n''(\hat{\alpha}^*, t) = \ell_n''(0, t) + o_p(1) = -G_n(t) - D_n(t) + o_p(1),$

where

Obviously, $E\{G_n(t)\} = 0$ since $\{M_i(u), i = 1, \ldots, n\}$ are martingales. By Lemma 1, we obtain that $G_n(t) \to 0$ in probability, as $n \to \infty$. Also, by Conditions (1)-(6) and Lemma 1, we have

D n ( t ) = a p @ E ( t ) + ~ p ( l ) =D(t)+op(l)*

One can show that

$\ell_n''(\hat{\alpha}^*, t) = -D(t) + o_p(1).$

Since $\hat{\alpha}$ maximizes $\ell_n(\alpha, t)$, it follows from the Taylor expansion around 0 that

$0 = \ell_n'(\hat{\alpha}, t) = \ell_n'(0, t) + \ell_n''(\hat{\alpha}^*, t)\,\hat{\alpha},$

where $\hat{\alpha}^*$ lies between 0 and $\hat{\alpha}$, and hence $\hat{\alpha}^* \to 0$ in probability. Thus

$\hat{\alpha} = -\{\ell_n''(\hat{\alpha}^*, t)\}^{-1} \ell_n'(0, t).$

Since $\ell_n'(0, t) - b(t) = U_n(t) + o_p(1)$ and $\ell_n''(\hat{\alpha}^*, t) = -D(t) + o_p(1)$, it is easy to obtain

$\hat{\alpha} - D^{-1}(t) b(t) = -\{\ell_n''(\hat{\alpha}^*, t)\}^{-1} \{\ell_n'(0, t) - b(t)\} + o_p(1).$

Noting (19) and (20) and the conclusion of Theorem 1, one can then show that

&{li - D-l(t)b(t)} % N(O,D-l(t){a,~ E(t)}D-l(t)).

Thus the proof of Theorem 2 is complete. □

Proof of Theorem 3. Through some algebraic manipulations, we have

$\|\hat{\Sigma}_n(t) - \Sigma(t)\|$

It follows from (8) that the first term on the right-hand side of the above equation is $o_p(1)$. The second term is negligible by Lemma 1. Therefore, we have $\|\hat{\Sigma}_n(t) - \Sigma(t)\| \to 0$ in probability.


ACKNOWLEDGEMENTS

We would like to thank the Editor, an Associate Editor and two referees for their critical and insightful suggestions, which led to great improvements in our revised manuscript. This research was partially supported by funds from the Physician Referral Service at the University of Texas M. D. Anderson Cancer Center and a U.S. Department of Defense grant.

REFERENCES

O. O. Aalen (1980). A model for nonparametric regression analysis of counting processes. Mathematical Statistics and Probability Theory: Proceedings of the Sixth International Conference held in Wisla, December 7-13, 1978 (W. Klonecki, A. Kozek, and J. Rosiński, eds.), Lecture Notes on Mathematical Statistics and Probability 2, Springer, New York, pp. 1-25.

O. O. Aalen (1989). A linear regression model for the analysis of lifetimes. Statistics in Medicine, 8, 907-925.

P. K. Andersen & R. D. Gill (1982). Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10, 1100-1120.

P. K. Andersen, Ø. Borgan, R. D. Gill & N. Keiding (1993). Statistical Models Based on Counting Processes. Springer, New York.

N. E. Breslow & N. E. Day (1980). Statistical Methods in Cancer Research, Volume I: The Analysis of Case-Control Studies. IARC Scientific Publication 32, International Agency for Research on Cancer, Lyon.

N. E. Breslow & N. E. Day (1987). Statistical Methods in Cancer Research, Volume II: The Design and Analysis of Cohort Studies. IARC Scientific Publication 82, International Agency for Research on Cancer, Lyon. [Reprint edition, Oxford University Press: April 28, 1994.]

Z. Cai & Y. Sun (2003). Local linear estimation for time-dependent coefficients in Cox’s regression models. Scandinavian Journal of Statistics, 30, 93-111.

D. R. Cox (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B, 34, 187-220.

J. Fan & J. Chen (1999). One-step local quasi-likelihood estimation. Journal of the Royal Statistical Society Series B, 61, 927-943.

J. Fan & I. Gijbels (1996). Local Polynomial Modelling and its Applications. Chapman & Hall, London.

J. Fan, I. Gijbels & M. King (1997). Local likelihood and local partial likelihood in hazard regression. The Annals of Statistics, 25, 1661-1690.

J. Fan, H. Lin & Y. Zhou (2006). Local partial likelihood estimation for life time data. The Annals of Statistics, 34, 290-325.

T. R. Fleming & D. P. Harrington (1991). Counting Processes and Survival Analysis. Wiley, New York.

T. J. Hastie & R. J. Tibshirani (1993). Varying-coefficient models. Journal of the Royal Statistical Society Series B, 55, 757-796.

D. R. Hoover, J. A. Rice, C. O. Wu & L. P. Yang (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85, 809-822.

F. W. Huffer & I. W. McKeague (1991). Weighted least squares estimation for Aalen’s additive risk model. Journal of the American Statistical Association, 86, 114-129.

J. D. Kalbfleisch & R. L. Prentice (2002). The Statistical Analysis of Failure Time Data, Second edition. Wiley, New York.

J. P. Klein & M. L. Moeschberger (2003). Survival Analysis: Techniques for Censored and Truncated Data, Second edition. Springer, New York.

D. Y. Lin & Z. Ying (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61-71.

C. Loader (1999). Local Regression and Likelihood. Springer, New York.


T. Martinussen & T. H. Scheike (2006). Dynamic Regression Models for Survival Data. Springer, New York.

L. Marzec & P. Marzec (1997). On fitting Cox’s regression model with time-dependent coefficients. Biometrika, 84, 901-908.

I. W. McKeague & P. D. Sasieni (1994). A partly parametric additive risk model. Biometrika, 81, 501-514.

S. A. Murphy & P. K. Sen (1991). Time-dependent coefficients in a Cox-type regression model. Stochastic Processes and their Applications, 39, 153-180.

L. Tian, D. Zucker & L. J. Wei (2005). On the Cox model with time-varying regression coefficients. Journal of the American Statistical Association, 100, 172-183.

A. Winnett & P. Sasieni (2003). Iterated residuals and time-varying covariate effects in Cox regression. Journal of the Royal Statistical Society Series B, 65, 473-488.

D. M. Zucker & A. F. Karr (1990). Nonparametric survival analysis with time-dependent covariate effects: a penalized partial likelihood approach. The Annals of Statistics, 18, 329-353.

Received 26 April 2006
Accepted 18 December 2006

Hui LI: [email protected]
School of Mathematical Sciences
Beijing Normal University
Beijing 100875, China

Guosheng YIN: [email protected]
Department of Biostatistics
M. D. Anderson Cancer Center
Houston, TX 77030, USA

Yong ZHOU: [email protected]
Academy of Mathematics and Systems Science
Chinese Academy of Sciences
Beijing 100080, China