

Censoring-robust estimation in observational survival studies:

Assessing the relative effectiveness of vascular access type on

patency among end-stage renal disease patients

Vinh Q. Nguyen and Daniel L. Gillen

Department of Statistics, University of California, Irvine

September 20, 2012

Abstract

The proportional hazards model is commonly used in observational studies to estimate and test a pre-defined measure of association between a variable of interest and the time to some event $T$. For example, it has been used to investigate the effect of vascular access type on patency among end-stage renal disease patients (Gibson, Gillen, Caps, Kohler, Sherrard & Stehman-Breen 2001). The measure of association comes in the form of an adjusted hazard ratio, as additional covariates are often included in the model to adjust for potential confounding. Despite its flexibility, the model comes with a rather strong assumption that is often not met in practice: a time-invariant effect of the covariates on $T$. When the proportional hazards assumption is violated, it is well known in the literature that the maximum partial likelihood estimator is consistent for a parameter that depends on the observed censoring distribution. The resulting quantity is difficult to interpret and replicate, as censoring is usually not of scientific concern and generally varies from study to study. Solutions have been proposed to remove the censoring-dependence in the two-sample setting, but none have addressed the setting of multiple, possibly continuous, covariates. We propose a survival tree approach that identifies group-specific censoring based on adjustment covariates in the primary survival model and that fits naturally into the theory developed for the two-sample case. With this methodology, we propose to draw inference on a pre-defined marginal adjusted hazard ratio that is valid and independent of censoring regardless of whether model assumptions hold.

1 Introduction

The proportional hazards model (Cox 1972) is popular in the health sciences for the analysis of failure time data since it offers investigators a means to address scientific questions without assuming a full probability distribution on the event time and the ability to incorporate right-censored observations with ease. Although fully parametric assumptions are avoided, it nonetheless carries a particularly strong assumption: the presence of a time-invariant covariate effect on the hazard ratio scale. The assumption is seldom met in the real world for the two-sample case, and is even more susceptible to failure in the observational setting where multiple covariates are included in the model. To illustrate, consider the United States Renal Data System Dialysis Morbidity and Mortality Study Wave 2 as described in Gibson et al. (2001), hereafter referred to as the Vascular Access (VA) study.


The VA study was an observational prospective study involving 4,065 incident-hemodialysis and peritoneal-dialysis patients initiating dialysis in 1996 or early 1997 at 799 dialysis facilities across the United States. End stage renal disease (ESRD) is a condition where the filtration performed by the kidneys has been reduced to a point where life can no longer adequately be sustained. It is estimated that more than 300,000 persons in the United States have ESRD, and this number has been steadily rising over the past few decades. The standard of care for patients suffering from ESRD is renal replacement therapy in the form of dialysis or kidney transplantation. Hemodialysis is a technique for removing blood from the patient, cleansing toxins from the blood outside of the body, and then returning the blood to the patient. This process typically takes three to four hours to complete, and persons with ESRD typically undergo hemodialysis treatment three to four days a week. Given the frequency and duration of dialysis, it is infeasible to repeatedly insert a new access (needle) into the patient’s vein at each dialysis visit. Such frequency would quickly result in irreparable damage to the vein, eliminating a route to remove blood from the patient. As such, a “permanent” access is placed in the patient, which remains there until either the access becomes clogged and inoperable or the patient stops dialysis (typically due to transplantation or death).

Despite improvements in permanent access technology, access failure and repair remain a major problem in the care of dialysis patients. Repeated interventions to maintain a working access exact an economic toll on the Medicare system (all US citizens undergoing dialysis for treatment of ESRD are covered by Medicare), and the physical and emotional tolls on the patient are equally burdensome. Today, two main types of dialysis access are in use. The first, which has been in use the longest and is the cheapest to manufacture and easiest to insert in a patient, is the prosthetic graft. The other is the autogenous arteriovenous fistula (AVF), which can be placed in the patient in two different ways: standard attachment to a vein (SA fistula) or as a venous transposition (VT fistula). Venous transposition placements require greater skill to place and longer time to mature, and are needed when veins are small and hard to find (such scenarios often occur in diabetics, smokers and obese individuals). Of primary scientific interest in the VA study was an a priori test of the comparative effectiveness of the three access types for chronic hemodialysis on the time to access revision among patients with end-stage renal disease. Besides this predictor of interest, it was also necessary to adjust for additional covariates in the model as the study was observational in nature.

To address this question, one might pre-specify a proportional hazards model for the data and draw inference on the adjusted hazard ratios comparing groups with different access types; this is a common approach taken by many investigators with similar goals, as reflected by the ubiquity of the Cox model in applied research (Stigler 1994). However, as the crossing of the survival curves in Figure 2 suggests, the proportional hazards assumption might not be reasonable when comparing different access types. One might attempt to incorporate a time-dependent covariate or a shift point for the hazard into the model to address the nonproportionality after observing such curves, but doing so would necessitate data-driven modeling that would inherently alter the pre-conceived question of interest expressed in the original model. In this setting, Type I error is likely to be inflated and the generalizability of the results may be questionable, as the new model, and hence the scientific question, is chosen on the basis of the observed sample. On the other hand, basing inference on the misspecified model can lead to flawed results, as Struthers & Kalbfleisch (1986) have shown that the regression coefficients in this model are consistent for parameters that depend upon the observed censoring distribution. As a result, point estimates from a misspecified proportional hazards model are influenced by the censoring pattern, which is usually not of scientific interest. This may lead to results that are difficult to replicate (censoring patterns change from study to study) and to an interpretation that is not scientifically meaningful.

To address this deficiency, Xu & O’Quigley (2000) and Boyd, Kittelson & Gillen (2012) proposed estimators based on the weighted score that remove the censoring-dependence in the two-sample setting for the proportional hazards model. That is, a marginal hazard ratio can be pre-specified such that the resulting estimator from a modified estimating equation is robust to model misspecification, to the extent that the resulting estimand is scientifically meaningful (a weighted hazard ratio) and censoring-independent; the resulting estimand is identical to the one that would be targeted if a proportional hazards model were used for inference in the absence of censoring. Their methods consist of estimating the censoring distribution conditional on the binary group indicator, henceforth denoted as $S_C(\cdot \mid Z)$, and re-weighting observations by the inverse of these estimates at different locations in the score equation. The result is a censoring-robust estimator that is the solution to the weighted score equation. Inference can be based on the asymptotic distribution of the censoring-robust estimator where the weights are treated as fixed. When $Z$ is a group indicator, estimation of $S_C(\cdot \mid Z)$ is straightforward: apply the left-continuous Kaplan-Meier estimator to the censoring times of each group. However, when multiple covariates (possibly taking on continuous values) are included in the model, estimation of $S_C$ is nontrivial unless a parametric model is assumed for the censoring time. Similar to the proportional hazards model, parametric models are accompanied by assumptions that can fail and lead to consequences that may not be well understood, making them unattractive for our use.

This manuscript is concerned with censoring-robust estimation and inference for misspecified proportional hazards models involving multiple, possibly continuous, time-independent covariates as typically utilized in observational studies. The goal is to draw valid and censoring-robust inference on estimands that capture the scientific question of interest (marginal hazard ratios) regardless of whether the model assumptions hold. This is useful in a comparative effectiveness setting like the VA study, as the hypothesis conveyed through a probability model is a priori specified. Post-hoc modification of the model to conform with the observed data is not desired in order to control Type I error and preserve the reproducibility of the results. In Section 2, we describe previously proposed censoring-robust methods in more detail and consider their asymptotic distributions for the two-sample case. We then outline a survival tree approach based on the work of LeBlanc & Crowley (1993) to identify group-specific censoring and illustrate how censoring-robust estimation and inference can be achieved in cases with multiple adjustment covariates. We evaluate our proposed methodology using simulation in Section 3, and apply the proposed methodology to the Vascular Access data in Section 4. We end with some concluding remarks in Section 5.

2 Methods

2.1 Two-sample Censoring-robust Estimation for the Proportional Hazards Model

Let $T > 0$ be a continuous-time random variable with hazard function

$$\lambda(t \mid Z) = \lambda_0(t) \exp(Z^T \beta),$$

where $Z$ is a $p \times 1$ vector of covariates, $\beta$ is a $p \times 1$ vector of regression coefficients, and $\lambda_0$ is an unspecified baseline hazard corresponding to $Z = 0$. Let $C > 0$ be a random censoring variable with distribution $F_C(\cdot \mid Z)$, where $T$ is independent of $C$ given $Z$, and let $\tau$ be the maximum follow-up time. For a sample of size $n$, let $\hat\beta$ denote the maximum partial likelihood estimator corresponding to the root of the estimating equation (derivative of the log-partial-likelihood)

$$U(\beta) = \sum_{i=1}^{n} \int_0^\tau \left[ Z_i - \frac{\sum_{j=1}^{n} Y_j(t) Z_j \exp(Z_j^T \beta)}{\sum_{j=1}^{n} Y_j(t) \exp(Z_j^T \beta)} \right] dN_i(t),$$

where $Y_i(t) = I\{C_i \ge t, T_i \ge t\}$ and $N_i(t) = I\{T_i \le t, T_i < C_i\}$.

When $\lambda$ is misspecified, Struthers & Kalbfleisch (1986) showed that, under mild regularity conditions, $\hat\beta$ is consistent for the root of

$$g(\beta) = \int_0^\tau E_Z \left\{ f_T(t \mid Z) S_C(t \mid Z) \left[ Z - \frac{E_Z\{ Z \exp(Z^T \beta) S_T(t \mid Z) S_C(t \mid Z) \}}{E_Z\{ \exp(Z^T \beta) S_T(t \mid Z) S_C(t \mid Z) \}} \right] \right\} dt,$$

where $f_X$ and $S_X$ correspond to the true density and survival function of $X$ ($X = C$ or $T$). The estimator $\hat\beta$ is consistent for a quantity that depends on the distribution of $Z$, the true distribution of the failure times, the maximal follow-up time $\tau$, and the censoring distribution. To remove the censoring-dependence, Boyd et al. (2012) proposed a censoring-robust estimator $\hat\beta_{CR}$ based on the root of the weighted estimating equation

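$$U_W(\beta) = \sum_{i=1}^{n} \int_0^\tau W(t \mid Z_i) \left[ Z_i - \frac{S_W^{(1)}(\beta, t)}{S_W^{(0)}(\beta, t)} \right] dN_i(t),$$

where, for the two-sample case, $W(t \mid Z_i) = 1/\hat S_C(t \mid Z_i)$, $\hat S_C(\cdot \mid Z_i)$ is the left-continuous Kaplan-Meier estimator of the censoring time distribution for group $Z_i$, and where for $r = 0, 1, 2$,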

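$$S_W^{(r)}(\beta, t) = \frac{1}{n} \sum_{j=1}^{n} W(t \mid Z_j) Y_j(t) Z_j^{\otimes r} \exp(Z_j^T \beta),$$

such that for a vector $z$, $z^{\otimes 0} = 1$, $z^{\otimes 1} = z$, and $z^{\otimes 2} = z z^T$. Provided that $\hat S_C(\cdot \mid Z_i)$ converges to $S_C(\cdot \mid Z_i)$ in probability, they showed that $\hat\beta_{CR}$ is consistent for $\beta^*$, the root of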

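$$g_W(\beta) = \int_0^\tau E_Z \left\{ f_T(t \mid Z) \left[ Z - \frac{E_Z\{ Z \exp(Z^T \beta) S_T(t \mid Z) \}}{E_Z\{ \exp(Z^T \beta) S_T(t \mid Z) \}} \right] \right\} dt,$$

and that $n^{1/2}(\hat\beta_{CR} - \beta^*)$ converges to a zero-mean Gaussian distribution with variance that can be consistently estimated by $A^{-1} B A^{-1}$, where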

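$$A = -\frac{1}{n} \sum_{i=1}^{n} \int_0^\tau W(t \mid Z_i) \left[ \frac{S_W^{(2)}(\hat\beta_{CR}, t)}{S_W^{(0)}(\hat\beta_{CR}, t)} - \frac{S_W^{(1)}(\hat\beta_{CR}, t)^{\otimes 2}}{S_W^{(0)}(\hat\beta_{CR}, t)^2} \right] dN_i(t),$$

$$B = \frac{1}{n} \sum_{i=1}^{n} \left\{ \int_0^\tau W(t \mid Z_i) \left[ Z_i(t) - \frac{S_W^{(1)}(\hat\beta_{CR}, t)}{S_W^{(0)}(\hat\beta_{CR}, t)} \right] dN_i(t) - \int_0^\tau \frac{W(t \mid Z_i) Y_i(t) \exp\{Z_i(t)^T \hat\beta_{CR}\}}{S_W^{(0)}(\hat\beta_{CR}, t)} \left[ Z_i(t) - \frac{S_W^{(1)}(\hat\beta_{CR}, t)}{S_W^{(0)}(\hat\beta_{CR}, t)} \right] dF_W(t) \right\},$$

and $F_W(t) = \sum_{i=1}^{n} W(t \mid Z_i) N_i(t)/n$. The vector $\beta^*$ does not depend on the nuisance parameter $S_C$; it corresponds to the regression coefficient vector from a Cox model if the proportional hazards relationship was correctly specified, and corresponds to the estimand of the maximum partial likelihood estimator for a misspecified Cox model in the absence of censoring.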
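To make the weights concrete, the following Python sketch computes a left-continuous Kaplan-Meier estimate of the censoring survival function for one group and inverts it to obtain inverse-probability-of-censoring weights. The function name `km_censoring_left` and the toy data are hypothetical and introduced only for illustration; the sketch assumes continuous time, so ties between failures and censorings receive no special handling.

```python
import numpy as np

def km_censoring_left(time, event):
    """Return a function t -> S_C(t-) estimated by Kaplan-Meier.

    time  : observed follow-up times min(T, C)
    event : 1 if the failure was observed, 0 if censored
    Censoring plays the role of the "event" here, so the indicator is flipped.
    """
    time = np.asarray(time, dtype=float)
    cens = 1 - np.asarray(event, dtype=int)
    jump_times = np.unique(time[cens == 1])
    if jump_times.size == 0:
        # No censoring observed: the estimate is identically one.
        return lambda t: np.ones_like(np.asarray(t, dtype=float))
    # Kaplan-Meier factor 1 - d_k / n_k at each distinct censoring time.
    factors = np.array([
        1.0 - np.sum((time == s) & (cens == 1)) / np.sum(time >= s)
        for s in jump_times
    ])
    surv = np.cumprod(factors)

    def s_left(t):
        # Left-continuous version: only jumps strictly before t are counted.
        k = np.searchsorted(jump_times, t, side="left")
        return np.where(k > 0, surv[np.maximum(k - 1, 0)], 1.0)

    return s_left

# Toy data for a single group (hypothetical values).
s_hat = km_censoring_left([1, 2, 3, 4], [1, 0, 1, 0])
weights = 1.0 / s_hat(np.array([1.0, 3.0]))  # W(t | Z) at the two failure times
```

In the two-sample case one such function would be fitted per group, and each subject would be weighted using the estimate from its own group.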


2.2 Identifying Groups via Survival Trees

Censoring-robust estimation and inference in the proportional hazards model rely on consistent estimation of $S_C(\cdot \mid Z)$ and on incorporating covariate-specific inverse probability of censoring weights into the score equation. When $Z$ consists of group indicators, nonparametric estimation of $S_C$ can be easily obtained by separating the observations into groups and applying the Kaplan-Meier estimator to each group. When $Z$ consists of continuous variables, estimation of $S_C$ is not straightforward. One could assume covariate-independent censoring ($S_C(t \mid Z) = S_C(t)$) as in Xu & O’Quigley (2000) and estimate a single censoring distribution for the observed sample. However, Boyd et al. (2012) showed that this does not remove the censoring-dependence when censoring is dependent on $Z$. Thus, the key to extending previously developed censoring-robust methods is to obtain a consistent estimator for $S_C(\cdot \mid Z)$.

For our purpose, estimation of $S_C$ represents a prediction problem that is not of direct scientific interest but is necessary to remove the dependence of the usual Cox estimator on the censoring distribution under a misspecified model. Our goal for this prediction problem is to devise a reasonable set of procedures that can be easily implemented in order to flexibly estimate $S_C$ so as to be useful for the original inferential problem (scientific interest).

Based on our experience, a parametric relationship of $C$ on $Z$ is unrealistic in most situations, as censoring times often differ by groups, with groups usually defined based on low or high values of a certain variable or a certain covariate combination. For example, in the Vascular Access study, older subjects have higher mortality rates leading to shorter censoring times, and the relationship between time to death and age may be modified by a patient’s diabetic status. Thus, we prefer not to impose any strict assumptions on the relationship between $C$ and $Z$, opting for a highly flexible relationship between adjustment variables and the probability of censoring.

To that end, we propose to discretize the relationship between $Z$ and $C$ by identifying clusters of observations based upon their covariate values that share a “similar” censoring distribution. That is, we wish to arrive at a function $M_C$ that maps $Z$ to a censoring group, i.e., $M_C(Z) \to \{1, \ldots, m\}$, where $m$ is the number of censoring-specific groups. To identify clusters in a nonparametric fashion, we consider the survival tree approach of LeBlanc & Crowley (1993). Although tree approaches identify relationships by dichotomizing covariates or taking subsets of values, they are flexible enough to capture parametric and nonparametric relationships given sufficient sample size, at the cost of efficiency (more nodes). While this may lead to less precise estimates of $S_C$ at different time points if parametric relationships do exist, the tradeoff with flexibility is worthwhile in our view, as it allows for the investigation of all possible relationships within a population (even though censoring is probably strongly influenced by only a small subset of variables).

Once censoring-specific groups are identified based on $Z$, the Kaplan-Meier estimator can be used to provide a nonparametric estimate of the censoring distribution for each group. After the weights are obtained, they can be used for robust estimation and inference as described in Section 2.1, the only difference being the estimation of $m$ censoring distributions instead of two. The inferential algorithm for the proportional hazards model is outlined as follows.

1. Specify the a priori scientific model:

$$\lambda(t \mid Z) = \lambda_0(t) \exp(Z^T \beta).$$

2. Identify censoring-specific groups using survival trees:

$$M_C(Z) \to \{1, \ldots, m\}.$$

3. Estimate $S_C(\cdot \mid M_C(Z))$ using the left-continuous Kaplan-Meier estimator for each group $(1, \ldots, m)$:

$$\hat S_C(\cdot \mid M_C(Z)).$$

4. Plug in the inverse probability of censoring in the estimating equation to form a weighted estimating equation:

$$U_W(\beta) = \sum_{i=1}^{n} \int_0^\tau W(t \mid Z_i) \left[ Z_i - \frac{\sum_{j=1}^{n} W(t \mid Z_j) Y_j(t) Z_j \exp(Z_j^T \beta)}{\sum_{j=1}^{n} W(t \mid Z_j) Y_j(t) \exp(Z_j^T \beta)} \right] dN_i(t),$$

where $W(t \mid Z_i) = 1/\hat S_C(t \mid M_C(Z_i))$.

5. Solve $U_W(\beta) = 0$ to obtain the censoring-robust estimator, $\hat\beta_{CR}$.

6. Inference follows from the asymptotic distribution of $\hat\beta_{CR}$ as stated in Section 2.1.
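Steps 4 and 5 can be sketched in Python for the special case of a single scalar covariate. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `weighted_score` and `solve_weighted_score` are hypothetical, the weight is passed in as a user-supplied callable `weight(t, j)` returning $W(t \mid Z_j)$, and the root is found by plain bisection rather than any particular solver.

```python
import numpy as np

def weighted_score(beta, time, event, z, weight):
    """Evaluate U_W(beta) for one scalar covariate z.

    weight(t, j) must return W(t | Z_j), e.g. the inverse of a group-specific
    left-continuous Kaplan-Meier estimate of the censoring survival function.
    """
    time, event, z = map(np.asarray, (time, event, z))
    total = 0.0
    for i in np.flatnonzero(event == 1):       # dN_i jumps only at observed failures
        t = time[i]
        w = np.array([weight(t, j) for j in range(len(time))])
        at_risk = (time >= t).astype(float)    # Y_j(t)
        r = w * at_risk * np.exp(z * beta)
        # The 1/n factors in S_W^(1) and S_W^(0) cancel in the ratio.
        total += w[i] * (z[i] - np.sum(r * z) / np.sum(r))
    return total

def solve_weighted_score(time, event, z, weight, lo=-10.0, hi=10.0, tol=1e-10):
    """Find a root of U_W by bisection on [lo, hi] (assumed to bracket the root)."""
    f = lambda b: weighted_score(b, time, event, z, weight)
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("score does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Toy example with unit weights, which reduces to the usual partial likelihood score.
unit = lambda t, j: 1.0
beta_hat = solve_weighted_score([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 0, 1], unit)
```

With unit weights the sketch reproduces the ordinary Cox score, which gives a convenient sanity check before group-specific weights are plugged in.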

The target of inference is $\beta^*$, the root of $g_W$, which also corresponds to the estimand of the maximum partial likelihood estimator from a Cox model under zero censorship. If the proportional hazards relationship was correctly specified, then $\beta^*$ is the well-known constant log-hazard-ratio. If the model assumption from step 1 was misspecified, then $\beta^*$ represents a marginal or weighted hazard ratio over the support of observable time (Xu & O’Quigley 2000; Nguyen & Gillen 2012). The weighted estimating equation is required to estimate $\beta^*$ since Struthers & Kalbfleisch (1986) and Nguyen & Gillen (2012) have shown that the usual (naive) estimator from a Cox model is consistent for a quantity that depends on the censoring distribution. Thus, the current research focuses on the nonparametric estimation of $S_C$ in order to facilitate inference about $\beta^*$.

For step 2 of the inferential algorithm, we consider the survival tree approach of LeBlanc & Crowley (1993). They proposed growing a tree by considering splits based on the partitioning of the covariate space where the response is the time to some event; in our specific setting, the response is the censoring time. A standardized two-sample statistic that measures the “difference” between groups is computed for each potential split, and the split with the largest statistic is chosen next. Once a maximal-sized tree is grown, pruning takes place based on their goodness-of-split statistic. See Appendix A of the manuscript for a summary of their algorithm as implemented in the current context.

At each iteration of growing the tree, we are faced with selecting the split that separates the data into the two most heterogeneous groups with respect to censoring time. However, heterogeneity is difficult to summarize and is even more difficult to compare using a single statistic. The role of the standardized splitting statistic is to rank potential splits based on its measure of difference, but each two-sample statistic captures heterogeneity in a different way. For example, LeBlanc & Crowley (1993) illustrated their algorithm with the logrank statistic since it is commonly used for testing the equality of two distributions and is well understood. The logrank statistic is most powerful when the difference between the two groups can be characterized using a proportional hazards relationship and is generally powerful when the hazards are stochastically ordered. However, when hazards are stochastically ordered but are nonproportional, it is not clear how splits should be ranked, as the heterogeneity is not captured in a way that is comparable or even transitive. It is even more problematic when hazards cross, as the two groups are clearly different but the difference may not be reflected in the logrank statistic.


Thus, it is crucial to define precisely what heterogeneity means and to select a statistic that reflects this difference. We require a statistic that can capture heterogeneity under a variety of alternatives, not just under proportional hazards. Recall that the motivation behind a censoring-robust estimator is to be able to specify the question of interest a priori and to infer validly once data are available, even when the proportional hazards assumption fails. To this end, it would seem contradictory to implicitly assume or prefer proportional hazards when splitting censoring groups using the logrank statistic. As described previously, the logrank statistic is probably not an optimal candidate due to its drawbacks under nonproportional hazards. We thus require a splitting statistic that is able to detect deviations from the strong null of equal censoring over a wide range of alternatives. Moreover, the splitting statistic should capture differences across the support of observable censoring times with no preference for any particular time point or region.

Some previously proposed versatile testing procedures that may be considered as possible splitting statistics include the maximum or linear combinations of several weighted logrank statistics proposed and investigated in Fleming & Harrington (1991, Section 7.5), Lee (1996), and Wu & Gilbert (2002); Kolmogorov-Smirnov type or Rényi-type statistics explored in Fleming, Harrington & O’Sullivan (1987), Fleming, O’Fallon, O’Brien & Harrington (1980), and Fleming & Harrington (1981); Cramér-von Mises type statistics explored in Schumacher (1984) and Koziol (1978); weighted Kaplan-Meier statistics proposed and explored in Pepe & Fleming (1989) and Pepe & Fleming (1991); and statistics that capture overall survival differences based on squared differences in the hazard or absolute differences in the survival curves explored in Lin & Wang (2004) and Lin & Xu (2009), respectively.

For the purpose of identifying group-specific censoring in this manuscript, we consider the $KG^{\rho}$ statistic of Fleming et al. (1987). For a time-to-event random variable $T$ differentiated by two groups, the statistic is defined to be

$$KG^{\rho} = \frac{\sup_{t \ge 0} \int_0^t \hat S^{\rho + 1/2} \left( \frac{Y_1 Y_2}{Y_1 + Y_2} \right)^{1/2} \left( \frac{dN_1}{Y_1} - \frac{dN_2}{Y_2} \right)}{\left[ \int_0^\infty \hat S^{2\rho + 1} \left( 1 - \frac{\Delta N_1 + \Delta N_2 - 1}{Y_1 + Y_2 - 1} \right) \times I\{Y_1 Y_2 > 0\} \frac{d(N_1 + N_2)}{Y_1 + Y_2} \right]^{1/2}},$$

where the time arguments are dropped in the integrals for each function for simplicity, the subscripts indicate group, $\hat S$ is the pooled Kaplan-Meier estimator for the time of interest, and $\rho \ge 0$ is a parameter that affects power at different alternatives. This particular statistic utilizes the maximum difference in the hazard function between the two groups, is easy to compute, and does not depend on weights involving the “censoring” time (in our case, the failure time). Thus, it can detect heterogeneity when the difference between groups satisfies proportional hazards, when hazard functions are nonproportional but ordered, and when hazard functions cross. Based on the optimality results discussed in Fleming et al. (1987) and the simulation results of Lee (1996) that include a similar statistic, we believe the $KG^{\rho}$ statistic will be quite versatile for our use.
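The statistic can be sketched in Python as a running weighted sum over the distinct event times, standardized by the square root of the accumulated variance term. Several choices here are assumptions of the sketch rather than statements about the original definition: the pooled Kaplan-Meier estimate is evaluated just before each event time, the supremum is taken over the absolute value of the running sum so that the result does not depend on how the two groups are labeled, and the function name `kg_rho` is hypothetical. When the statistic is used to split on censoring time, `event` would flag censored observations.

```python
import numpy as np

def kg_rho(time, event, group, rho=0.0):
    """Supremum-type two-sample statistic; group must be coded 1 or 2."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=int)
    surv_left = 1.0                      # pooled Kaplan-Meier just before s
    running, sup_abs, variance = 0.0, 0.0, 0.0
    for s in np.unique(time[event == 1]):
        at_risk = time >= s
        y1 = np.sum(at_risk & (group == 1))
        y2 = np.sum(at_risk & (group == 2))
        here = (time == s) & (event == 1)
        d1 = np.sum(here & (group == 1))
        d2 = np.sum(here & (group == 2))
        if y1 * y2 > 0:                  # indicator I{Y1 Y2 > 0}
            running += (surv_left ** (rho + 0.5)
                        * np.sqrt(y1 * y2 / (y1 + y2))
                        * (d1 / y1 - d2 / y2))
            sup_abs = max(sup_abs, abs(running))
            variance += (surv_left ** (2 * rho + 1)
                         * (1 - (d1 + d2 - 1) / (y1 + y2 - 1))
                         * (d1 + d2) / (y1 + y2))
        surv_left *= 1 - (d1 + d2) / (y1 + y2)
    return sup_abs / np.sqrt(variance) if variance > 0 else 0.0

# Toy data: group 1 has events at times 1 and 3, group 2 at times 2 and 4.
stat = kg_rho([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 2, 2], rho=0.0)
```

A tree-growing routine would evaluate this function for every candidate split and keep the split with the largest value.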

To reiterate, once censoring groups are identified based on the algorithm of LeBlanc & Crowley (1993), as outlined in Appendix A of the manuscript, using the $KG^{\rho}$ statistic as the splitting criterion, the censoring distribution can be estimated within each group using the Kaplan-Meier estimator. The weights can be inserted by analogy with the two-sample case, and estimation and inference follow directly.


3 Numerical Studies

3.1 Simulation Setup

In this section, we use simulation to compare the performance of (1) the naive estimator $\hat\beta$, where censoring is not accounted for; (2) the censoring-robust estimator $\hat\beta_{CRC}$, where $S_C$ is estimated from a Cox proportional hazards model with all covariates included; and (3) the censoring-robust estimator $\hat\beta_{CR}$, where $S_C$ is nonparametrically estimated using the survival tree approach (our proposed methodology). The comparison is made for the continuous-time proportional hazards model as described in Section 2.2 at $n = 400$, $800$, and $2{,}000$. We evaluate the estimators based on bias, efficiency, mean squared error (MSE), and coverage probability of 95% confidence intervals under different data-generating mechanisms and as the censoring distribution varies.

Suppose $Z_1 \in \{0, 1\}$ is the variable of interest. Then $\exp(\beta_1)$ will be the corresponding parameter and focus of concern in our evaluations. Due to the lack of an analytic expression for $\beta_1^*$ from $g_W$, we take the “true” value of $\beta_1^*$ to be the Monte Carlo average of $\hat\beta_1$ (naive estimator) under censoring case 1 (no random censorship with administrative censoring/truncation at $\tau = 4$) for $n = 2{,}000$. That is, we use the average of the maximum partial likelihood estimator for large samples in the absence of intermittent censoring (administrative censoring is required to keep the support the same across comparisons) when evaluating the three estimators.
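The evaluation criteria can be computed from the replicated estimates with a few lines of Python. The sketch below is illustrative only: the function name `summarize`, its argument names, and the use of Wald-type intervals with a normal quantile are assumptions introduced here, and the toy inputs are not simulation results.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.959964):
    """Monte Carlo bias, variance, MSE and 95% Wald-interval coverage."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "bias": est.mean() - truth,
        "variance": est.var(ddof=1),           # empirical variance across replicates
        "mse": np.mean((est - truth) ** 2),
        "coverage": covered.mean(),
    }

# Two hypothetical replicates of an estimator whose target value is 1.0.
summary = summarize([0.9, 1.1], [0.1, 0.1], truth=1.0)
```

Here `truth` would be set to the Monte Carlo value of $\beta_1^*$ described above.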

We generate data for the null case according to

$$\lambda(t \mid Z) = \exp\{0 \times z_1 - 0.4 \times z_2 - 0.4 \times z_3\},$$

the non-null proportional hazards case according to

$$\lambda(t \mid Z) = \exp\{-0.4 \times z_1 - 0.4 \times z_2 - 0.4 \times z_3\},$$

and the nonproportional hazards case (late diverging hazard) according to

$$\lambda(t \mid Z) = 1 + 0.3 \times I\{t > 0.5, z_1 = 1\}.$$
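One way to draw failure times from these three hazards is to invert the cumulative hazard at a unit-exponential variate. The Python sketch below takes that route as an assumption (the generating code is not described in the text), and the names `failure_time` and `administratively_censor` are hypothetical. For the late-diverging case the cumulative hazard is $t$ up to $0.5$ and $0.5 + 1.3(t - 0.5)$ afterwards when $z_1 = 1$.

```python
import numpy as np

def failure_time(e, z1, z2, z3, scenario):
    """Map unit-exponential draws e to failure times by inverting the cumulative hazard."""
    e, z1, z2, z3 = (np.asarray(a, dtype=float) for a in (e, z1, z2, z3))
    if scenario == "null":
        return e / np.exp(-0.4 * z2 - 0.4 * z3)
    if scenario == "ph":
        return e / np.exp(-0.4 * z1 - 0.4 * z2 - 0.4 * z3)
    if scenario == "nonph":
        # Hazard is 1 until t = 0.5, then 1.3 for subjects with z1 = 1.
        late = (z1 == 1) & (e > 0.5)
        return np.where(late, 0.5 + (e - 0.5) / 1.3, e)
    raise ValueError("unknown scenario: " + str(scenario))

def administratively_censor(t, tau=4.0):
    """Truncate follow-up at tau; returns observed time and event indicator."""
    t = np.asarray(t, dtype=float)
    return np.minimum(t, tau), (t <= tau).astype(int)

rng = np.random.default_rng(2012)            # arbitrary seed for the illustration
e = rng.exponential(1.0, size=5)
t = failure_time(e, [1, 0, 1, 0, 1], np.zeros(5), np.zeros(5), "nonph")
obs_time, obs_event = administratively_censor(t)
```

Passing the exponential draws in explicitly keeps the transformation deterministic, which makes it easy to check against the closed-form inverse.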

For each data-generating scenario and for each sample size $n$, we study the properties of the three estimators as censoring varies. See Table 2 for a description of the censoring scenarios, which consist of a variety of cases: administrative censoring by truncation (case 1), covariate-independent censoring (case 2), censoring by grouping (cases 3–4), censoring by parametric relationships (cases 5–6), and crossing hazard censoring (case 7). We require administrative censoring at time $\tau = 4$ in all cases to keep the support of observable time constant. In most cases, censoring times are generated according to a power-function distribution with parameters $(b, r)$, where $0 < C < b$ and $r \ge 0$. When $r = 1$, $C$ follows the uniform distribution over $(0, b)$. When $r \to 0$, more probability is concentrated towards 0, and when $r \to \infty$, more probability is concentrated towards $b$. As is typical of observational studies, we generate covariates that are correlated by design: $Z_1 \sim \text{Bernoulli}(0.5)$, $Z_2 \sim \text{power-function}(1, 1 + 2 \times I\{Z_1 = 1\})$, and $Z_3 \sim 2 \times Z_2 + \text{Normal}(0, 1)$. We replicate each data-generating scenario and censoring case 1,000 times.
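The covariate design can be sketched in Python as follows. The sketch assumes the power-function distribution has distribution function $F(c) = (c/b)^r$ on $(0, b)$, which matches the limiting behavior described above but is not stated explicitly in the text; the names `rpower` and `draw_covariates` are hypothetical, and $r$ must be strictly positive for the inverse transform to be defined.

```python
import numpy as np

def rpower(n, b, r, rng):
    """Draw n values with CDF (c / b) ** r on (0, b) by inverse transform (r > 0)."""
    return b * rng.uniform(size=n) ** (1.0 / np.asarray(r, dtype=float))

def draw_covariates(n, rng):
    """Generate the correlated covariates (Z1, Z2, Z3) described in the text."""
    z1 = rng.binomial(1, 0.5, size=n)
    z2 = rpower(n, 1.0, 1.0 + 2.0 * (z1 == 1), rng)   # r = 3 when Z1 = 1, else 1
    z3 = 2.0 * z2 + rng.normal(0.0, 1.0, size=n)
    return z1, z2, z3

rng = np.random.default_rng(0)               # arbitrary seed for the illustration
z1, z2, z3 = draw_covariates(1000, rng)
```

Under this parametrization the mean of $Z_2$ is $r/(r+1)$, so $Z_2$ tends to be larger when $Z_1 = 1$, which induces the intended correlation.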

To find censoring-specific groups, we apply the algorithm described in Appendix A of the manuscript to the censoring times using the $KG^{0}$ statistic. The parameter $\rho$ affects power at different alternatives; since we are not using the statistic in a formal testing framework, we choose $\rho = 0$ for simplicity. In our tree-building algorithm, we restrict each node to have at least 20 events (censored observations) to ensure enough information on the censoring times for each group; we are not comfortable saying two groups are different with respect to censoring time when each group has, e.g., only 5 observations. We use 5-fold cross-validation to select the optimally sized tree. The results of the simulation study are presented in Table 1.

3.2 Simulation Results

In the subsequent paragraphs, we focus primarily on the comparison of the naive estimator (β) with the censoring-robust estimator based on survival trees (βCR). We defer the comparison with the censoring-robust estimator based on the Cox proportional hazards model for censoring (βCRC) to the end.

Under the null and PH (correct model specification) scenarios, we find that the naive estimator performs as expected: it is approximately unbiased and coverage is attained using the 95% confidence intervals regardless of the censoring mechanism. βCRC and βCR are also nearly unbiased, with coverage near 95% for all censoring mechanisms except for a few cases (4–7) where coverage is 90–94% and 91–93%, respectively. However, as the sample size grows, coverage improves to 93–95% for βCR. As expected when the correct model is specified, we observe that the naive estimator is more efficient.

In the case that the model is misspecified (non-PH scenario), we observe higher variability for βCRC and βCR as compared to β. However, comparing variability with β is inappropriate since the estimators are estimating different quantities; this is akin to comparing apples and oranges. In a misspecified setting, the estimand for the naive estimator is a quantity that depends on censoring, whereas the estimand for the censoring-robust estimator is a marginal adjusted hazard ratio that is independent of censoring, a quantity that can be meaningful to an investigator in the presence of a time-varying effect: a weighted average of hazard ratios. As can be seen, the naive estimator is severely biased and coverage can decrease to 0, both of which illustrate that what β is consistent for is a function of censoring. On the other hand, βCR is nearly unbiased and coverage is attained or nearly attained in all scenarios.

The improvement in bias and coverage probability of βCR might seem like a direct result of a bias-variance tradeoff induced by the weighted estimating equation. If we consider MSE, then β dominates for the null and proportional hazards settings under all censoring scenarios; this is expected as the model was correctly specified. However, for the nonproportional hazards setting, we find that the MSE of βCR dominates the MSE of β in all censoring scenarios. We note, however, that MSE is not an ideal summary measure for our comparisons as the estimators are consistent for different quantities. If we let n → ∞, then the variance component of MSE goes to 0 for all estimators, and what remains is the bias (consistency). In our case, the bias reflects the varying estimand for varying censoring mechanisms.
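The summary measures discussed here (bias, ESE, MSE, SE/ESE, and coverage probability) can be computed from a set of Monte Carlo replicates as in the sketch below. It relies on the standard decomposition of MSE into squared bias plus variance and on Wald-type 95% intervals. The function name `summarize_replicates` and its arguments are hypothetical, and any scaling applied to MSE for display in Table 1 is not reproduced.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, truth, z=1.96):
    """Monte Carlo summaries for one estimator in one scenario.

    estimates  : point estimates, one per replicated data set.
    std_errors : analytic standard errors, one per replicated data set.
    truth      : value of the target estimand.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - truth
    ese = est.std(ddof=1)                # empirical standard error
    mse = np.mean((est - truth) ** 2)    # bias**2 plus variance (up to the ddof convention)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "bias": bias,
        "ese": ese,
        "mse": mse,
        "se_over_ese": se.mean() / ese,  # average analytic SE relative to ESE
        "cp": covered.mean(),            # coverage probability of the Wald intervals
    }
```

As n grows the variance term vanishes and only the squared bias remains, which is why a censoring-dependent estimand shows up as persistent bias and collapsing coverage.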

If we consider coverage probability in more detail, we can explain the lack of nominal coverage attainment by either of two reasons: either the estimator was not estimating the estimand of interest (e.g., β in the nonproportional hazards setting) or the variability of the estimator was not adequately estimated (e.g., βCR as reflected by the average ratio of analytic standard error to empirical standard error, SE/ESE). As inference on β∗ is based on the asymptotic distribution of βCR, coverage probability improves as n → ∞. A further contributor may be that the variability induced by the survival tree algorithm and the estimation of SC was not incorporated in the variance formula; that is, the weights in UW were treated as fixed. To incorporate this source of variability, one can use the bootstrap to obtain a more accurate estimate of the standard error. However, as Figure 1 portrays, not much is lost when this source of variability is ignored. In most cases, the average ratio of analytic standard error to bootstrap standard error is about 0.95–1.00, with the worst case being 0.914 for the proportional hazards setting with censoring scenario 6 at n = 400. With respect to coverage probability, the use of bootstrap standard errors makes a negligible difference, with the biggest difference being a 1% improvement in the coverage probability.
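A bootstrap standard error of the kind mentioned above can be sketched generically as follows. The key point is that the whole pipeline (tree building, estimation of SC, and solving the weighted estimating equation) would be re-run on each resample, so that the variability of the weights is reflected in the result. The function `bootstrap_se` and the `estimator` callable are hypothetical; the number of resamples is an arbitrary choice, and the sample mean below only stands in for the real estimation pipeline.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error.

    data      : array whose first axis indexes subjects.
    estimator : callable mapping a resampled data array to a scalar estimate;
                in practice it would re-run every data-driven step.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        reps[b] = estimator(data[idx])
    return reps.std(ddof=1)

# Stand-in usage: bootstrap SE of a sample mean.
demo = np.random.default_rng(1).normal(size=500)
demo_se = bootstrap_se(demo, np.mean)
```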

When considering the average number of censoring groups obtained from the survival tree algorithm, we find that more groups are identified when there is a parametric relationship between the covariates and the censoring time (cases 5–6). We also observe that as the sample size is increased, the algorithm yields more groups. Both observations are expected for the survival tree algorithm. We also note that the KG0 statistic works reasonably well when hazards cross (case 7).

If we compare the results of βCRC with βCR from Table 1, we find that in nearly all settings, the survival tree approach (βCR) performs better with respect to bias, efficiency, MSE, and coverage probability. It is noteworthy that censoring scenario 4 leads to extremely large values for MSE. This is probably due to the fact that the Cox model for the censoring time is estimating extremely small probabilities caused by large or small covariate values combined with the linear relationship on the hazard ratio scale. Small probabilities of being not yet censored would lead to large weights and numerical instability in the estimates. This was probably not a problem for the survival tree approach since we required at least 20 censoring events in each node, reducing the possibility of small probabilities of being not yet censored.

Finally, we were not able to manually build good predictive models for C|Z using the proportional hazards model to obtain βCRC for each data set generated in the simulation study. Since only three covariates were part of the scientific model in our simulation study, we included all three covariates for the distribution of C|Z. As described in our inferential algorithm in Section 2.2, any prediction method could be used to estimate SC(·|Z). If one were to use a (semi-)parametric model, then transformations or discretizations of the covariates and interactions should be considered to obtain better models. The possibilities are limitless. However, exploring all functional forms and interactions is nearly equivalent to the survival tree approach, where covariates are repeatedly discretized and interactions are inherently incorporated due to the recursive nature of the tree approach. Thus, a survival tree approach may be preferred.

[Table 1 about here.]

[Figure 1 about here.]

4 Application

Recall the Vascular Access study introduced in Section 1. In this section, we focus on the subgroup of patients receiving hemodialysis; see Table 3 for a description of the sample. A scientific goal of the study was to compare the effect of different access types on the time to access revision. As noted before, using a proportional hazards model can lead to estimates that depend upon the censoring distribution since the data indicate a failure of the proportional hazards assumption (see Figure 2), potentially leading to results that are difficult to replicate across multiple studies. This is of particular relevance in the area of comparative effectiveness where one would seek to determine if the relative performance of access type is consistent across multiple studies and patient populations.


Table 1: Results of the simulation study. Each scenario is repeated 1,000 times. Event rate refers to the average proportion of the sample with an observed event. Truth refers to the Monte Carlo average of β at n = 2,000 and under censoring scenario 1. ESE refers to the empirical standard error as calculated through the replicated datasets. MSE refers to mean squared error times 1000. SE/ESE refers to the average ratio of analytic-calculated sandwich standard error to the ESE. CP refers to coverage probability of the 95% confidence intervals. Avg Groups refers to the average number of censoring-specific groups derived from the chosen tree used to obtain βCR.

Column groups: β = naive estimator; βCRC = robust estimator with Cox PH censoring; βCR = robust estimator with survival trees.

| n | Scenario | Censoring | Event rate | Truth | β Bias | β ESE | β MSE | β SE/ESE | β CP | βCRC Bias | βCRC ESE | βCRC MSE | βCRC SE/ESE | βCRC CP | βCR Bias | βCR ESE | βCR MSE | βCR SE/ESE | βCR CP | Avg Groups |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | Null | C:1 | 0.819 | 0.000 | 0.005 | 0.120 | 1.449 | 1.042 | 0.965 | -0.003 | 0.132 | 1.754 | 0.945 | 0.928 | 0.005 | 0.120 | 1.449 | 1.042 | 0.965 | 1.000 |
| 400 | Null | C:2 | 0.549 | 0.000 | 0.006 | 0.152 | 2.327 | 1.015 | 0.962 | 0.001 | 0.199 | 3.959 | 0.922 | 0.934 | 0.004 | 0.185 | 3.418 | 0.954 | 0.949 | 5.001 |
| 400 | Null | C:3 | 0.606 | 0.000 | 0.005 | 0.140 | 1.957 | 1.037 | 0.959 | -0.014 | 0.363 | 13.166 | 0.586 | 0.915 | 0.002 | 0.164 | 2.681 | 0.984 | 0.938 | 4.319 |
| 400 | Null | C:4 | 0.678 | 0.000 | 0.007 | 0.133 | 1.759 | 1.021 | 0.956 | -0.024 | 1.237 | 152.934 | 0.334 | 0.824 | 0.005 | 0.145 | 2.110 | 1.011 | 0.951 | 3.566 |
| 400 | Null | C:5 | 0.416 | 0.000 | 0.005 | 0.181 | 3.259 | 1.007 | 0.944 | -0.006 | 0.190 | 3.611 | 0.995 | 0.948 | 0.002 | 0.221 | 4.859 | 0.982 | 0.943 | 6.449 |
| 400 | Null | C:6 | 0.374 | 0.000 | 0.016 | 0.205 | 4.206 | 1.014 | 0.952 | 0.002 | 0.226 | 5.104 | 0.935 | 0.937 | 0.026 | 0.277 | 7.734 | 0.907 | 0.930 | 8.321 |
| 400 | Null | C:7 | 0.380 | 0.000 | 0.002 | 0.185 | 3.435 | 1.017 | 0.952 | -0.008 | 0.308 | 9.491 | 0.838 | 0.903 | 0.002 | 0.262 | 6.858 | 0.931 | 0.934 | 6.760 |
| 400 | PH | C:1 | 0.766 | -0.404 | -0.004 | 0.135 | 1.828 | 0.977 | 0.944 | -0.002 | 0.132 | 1.739 | 0.998 | 0.952 | -0.004 | 0.135 | 1.828 | 0.977 | 0.944 | 1.000 |
| 400 | PH | C:2 | 0.493 | -0.404 | -0.003 | 0.168 | 2.827 | 0.985 | 0.948 | 0.004 | 0.231 | 5.324 | 0.877 | 0.939 | -0.005 | 0.206 | 4.221 | 0.939 | 0.943 | 5.484 |
| 400 | PH | C:3 | 0.550 | -0.404 | -0.004 | 0.158 | 2.482 | 0.983 | 0.949 | -0.002 | 0.300 | 9.002 | 0.738 | 0.921 | -0.007 | 0.186 | 3.444 | 0.943 | 0.937 | 4.984 |
| 400 | PH | C:4 | 0.619 | -0.404 | -0.002 | 0.150 | 2.244 | 0.960 | 0.944 | -0.047 | 1.166 | 136.049 | 0.335 | 0.839 | -0.004 | 0.168 | 2.823 | 0.933 | 0.931 | 4.464 |
| 400 | PH | C:5 | 0.371 | -0.404 | -0.012 | 0.193 | 3.740 | 1.008 | 0.947 | -0.001 | 0.212 | 4.500 | 0.968 | 0.936 | -0.014 | 0.262 | 6.855 | 0.903 | 0.923 | 6.810 |
| 400 | PH | C:6 | 0.341 | -0.404 | -0.007 | 0.233 | 5.421 | 0.961 | 0.934 | -0.002 | 0.232 | 5.395 | 0.987 | 0.944 | -0.007 | 0.320 | 10.223 | 0.874 | 0.918 | 8.811 |
| 400 | PH | C:7 | 0.333 | -0.404 | 0.000 | 0.204 | 4.145 | 0.994 | 0.952 | -0.004 | 0.339 | 11.491 | 0.848 | 0.919 | -0.002 | 0.299 | 8.947 | 0.890 | 0.916 | 7.223 |
| 400 | Non-PH | C:1 | 0.885 | -0.647 | -0.003 | 0.124 | 1.536 | 1.002 | 0.944 | 0.009 | 0.122 | 1.504 | 1.013 | 0.955 | -0.003 | 0.124 | 1.536 | 1.002 | 0.944 | 1.000 |
| 400 | Non-PH | C:2 | 0.664 | -0.647 | 0.133 | 0.144 | 3.850 | 0.994 | 0.842 | 0.001 | 0.177 | 3.118 | 0.886 | 0.908 | 0.011 | 0.158 | 2.506 | 0.977 | 0.944 | 3.617 |
| 400 | Non-PH | C:3 | 0.721 | -0.647 | 0.095 | 0.138 | 2.821 | 0.992 | 0.898 | -0.032 | 0.204 | 4.253 | 0.795 | 0.924 | 0.003 | 0.146 | 2.144 | 0.985 | 0.939 | 3.029 |
| 400 | Non-PH | C:4 | 0.784 | -0.647 | 0.045 | 0.131 | 1.905 | 1.002 | 0.937 | -0.110 | 1.089 | 119.679 | 0.210 | 0.908 | -0.009 | 0.136 | 1.866 | 0.988 | 0.941 | 2.377 |
| 400 | Non-PH | C:5 | 0.508 | -0.647 | 0.202 | 0.163 | 6.719 | 1.001 | 0.763 | 0.106 | 0.173 | 4.121 | 0.971 | 0.898 | 0.023 | 0.190 | 3.641 | 0.971 | 0.942 | 5.642 |
| 400 | Non-PH | C:6 | 0.485 | -0.647 | 0.202 | 0.194 | 7.855 | 0.995 | 0.810 | 0.163 | 0.196 | 6.496 | 1.001 | 0.863 | 0.027 | 0.247 | 6.164 | 0.923 | 0.924 | 6.836 |
| 400 | Non-PH | C:7 | 0.500 | -0.647 | 0.263 | 0.165 | 9.638 | 0.983 | 0.615 | -0.004 | 0.270 | 7.283 | 0.793 | 0.901 | 0.029 | 0.202 | 4.161 | 0.966 | 0.941 | 5.571 |
| 800 | Null | C:1 | 0.819 | 0.000 | -0.003 | 0.090 | 0.812 | 0.983 | 0.937 | 0.000 | 0.092 | 0.851 | 0.958 | 0.941 | -0.003 | 0.090 | 0.812 | 0.983 | 0.937 | 1.000 |
| 800 | Null | C:2 | 0.547 | 0.000 | -0.001 | 0.109 | 1.189 | 1.001 | 0.952 | 0.002 | 0.152 | 2.310 | 0.885 | 0.933 | -0.006 | 0.136 | 1.856 | 0.948 | 0.940 | 9.378 |
| 800 | Null | C:3 | 0.606 | 0.000 | -0.002 | 0.101 | 1.026 | 1.010 | 0.943 | -0.009 | 0.193 | 3.712 | 0.820 | 0.931 | -0.002 | 0.118 | 1.393 | 0.997 | 0.949 | 7.888 |
| 800 | Null | C:4 | 0.677 | 0.000 | -0.003 | 0.096 | 0.913 | 1.001 | 0.948 | -0.060 | 1.353 | 183.228 | 0.307 | 0.842 | -0.008 | 0.107 | 1.154 | 0.981 | 0.944 | 7.599 |
| 800 | Null | C:5 | 0.415 | 0.000 | -0.001 | 0.132 | 1.733 | 0.973 | 0.945 | -0.001 | 0.138 | 1.908 | 0.973 | 0.948 | -0.003 | 0.171 | 2.919 | 0.930 | 0.923 | 13.725 |
| 800 | Null | C:6 | 0.374 | 0.000 | -0.002 | 0.148 | 2.185 | 0.990 | 0.951 | 0.003 | 0.149 | 2.226 | 1.006 | 0.946 | 0.002 | 0.197 | 3.875 | 0.953 | 0.941 | 17.677 |
| 800 | Null | C:7 | 0.380 | 0.000 | 0.002 | 0.131 | 1.719 | 1.011 | 0.956 | 0.007 | 0.215 | 4.607 | 0.914 | 0.944 | 0.006 | 0.193 | 3.740 | 0.941 | 0.944 | 12.784 |
| 800 | PH | C:1 | 0.767 | -0.404 | 0.001 | 0.093 | 0.873 | 0.996 | 0.947 | 0.006 | 0.096 | 0.924 | 0.968 | 0.935 | 0.001 | 0.093 | 0.873 | 0.996 | 0.947 | 1.000 |
| 800 | PH | C:2 | 0.494 | -0.404 | -0.004 | 0.120 | 1.451 | 0.968 | 0.942 | 0.004 | 0.162 | 2.613 | 0.901 | 0.933 | -0.007 | 0.152 | 2.322 | 0.922 | 0.932 | 10.281 |
| 800 | PH | C:3 | 0.548 | -0.404 | 0.001 | 0.113 | 1.265 | 0.970 | 0.933 | 0.003 | 0.218 | 4.748 | 0.761 | 0.931 | 0.001 | 0.137 | 1.881 | 0.935 | 0.944 | 9.250 |
| 800 | PH | C:4 | 0.618 | -0.404 | 0.001 | 0.103 | 1.064 | 0.984 | 0.944 | -0.051 | 1.055 | 111.532 | 0.371 | 0.859 | 0.001 | 0.124 | 1.531 | 0.930 | 0.939 | 8.751 |
| 800 | PH | C:5 | 0.370 | -0.404 | -0.003 | 0.138 | 1.895 | 0.995 | 0.956 | 0.008 | 0.149 | 2.212 | 0.981 | 0.945 | -0.008 | 0.180 | 3.241 | 0.967 | 0.945 | 14.329 |
| 800 | PH | C:6 | 0.342 | -0.404 | -0.004 | 0.156 | 2.435 | 1.008 | 0.954 | 0.002 | 0.167 | 2.770 | 0.975 | 0.938 | -0.009 | 0.208 | 4.327 | 0.993 | 0.943 | 18.525 |
| 800 | PH | C:7 | 0.332 | -0.404 | 0.005 | 0.144 | 2.081 | 0.989 | 0.942 | -0.001 | 0.248 | 6.128 | 0.879 | 0.931 | -0.001 | 0.216 | 4.670 | 0.927 | 0.930 | 14.440 |
| 800 | Non-PH | C:1 | 0.885 | -0.647 | -0.002 | 0.086 | 0.740 | 1.017 | 0.952 | -0.004 | 0.085 | 0.731 | 1.025 | 0.952 | -0.002 | 0.086 | 0.740 | 1.017 | 0.952 | 1.000 |
| 800 | Non-PH | C:2 | 0.664 | -0.647 | 0.132 | 0.100 | 2.725 | 1.013 | 0.739 | -0.003 | 0.122 | 1.495 | 0.925 | 0.932 | 0.003 | 0.112 | 1.260 | 0.985 | 0.941 | 6.989 |
| 800 | Non-PH | C:3 | 0.723 | -0.647 | 0.095 | 0.094 | 1.799 | 1.023 | 0.839 | -0.034 | 0.136 | 1.969 | 0.858 | 0.936 | 0.001 | 0.102 | 1.043 | 1.011 | 0.956 | 5.962 |
| 800 | Non-PH | C:4 | 0.784 | -0.647 | 0.047 | 0.090 | 1.036 | 1.021 | 0.912 | -0.035 | 0.344 | 11.954 | 0.472 | 0.906 | -0.010 | 0.096 | 0.933 | 1.002 | 0.951 | 5.071 |
| 800 | Non-PH | C:5 | 0.508 | -0.647 | 0.204 | 0.115 | 5.487 | 1.003 | 0.568 | 0.095 | 0.127 | 2.509 | 0.942 | 0.858 | 0.022 | 0.135 | 1.870 | 0.988 | 0.945 | 11.260 |
| 800 | Non-PH | C:6 | 0.486 | -0.647 | 0.208 | 0.139 | 6.244 | 0.977 | 0.650 | 0.153 | 0.136 | 4.188 | 1.016 | 0.802 | 0.041 | 0.178 | 3.322 | 0.930 | 0.925 | 14.517 |
| 800 | Non-PH | C:7 | 0.500 | -0.647 | 0.261 | 0.111 | 8.049 | 1.024 | 0.363 | -0.032 | 0.190 | 3.725 | 0.836 | 0.895 | 0.025 | 0.143 | 2.113 | 1.000 | 0.946 | 9.887 |
| 2,000 | Null | C:1 | 0.819 | 0.000 | -0.001 | 0.055 | 0.298 | 1.024 | 0.954 | -0.001 | 0.056 | 0.313 | 1.000 | 0.957 | -0.001 | 0.055 | 0.298 | 1.024 | 0.954 | 1.000 |
| 2,000 | Null | C:2 | 0.548 | 0.000 | -0.002 | 0.069 | 0.472 | 1.003 | 0.957 | -0.001 | 0.101 | 1.010 | 0.874 | 0.914 | -0.003 | 0.086 | 0.733 | 0.979 | 0.943 | 27.084 |
| 2,000 | Null | C:3 | 0.606 | 0.000 | -0.001 | 0.062 | 0.389 | 1.035 | 0.963 | 0.010 | 0.156 | 2.441 | 0.754 | 0.926 | -0.001 | 0.077 | 0.595 | 0.994 | 0.941 | 22.101 |
| 2,000 | Null | C:4 | 0.678 | 0.000 | -0.001 | 0.059 | 0.346 | 1.025 | 0.951 | 0.041 | 1.143 | 130.607 | 0.389 | 0.831 | -0.003 | 0.067 | 0.451 | 1.008 | 0.945 | 19.191 |
| 2,000 | Null | C:5 | 0.416 | 0.000 | -0.000 | 0.081 | 0.654 | 1.000 | 0.953 | -0.001 | 0.087 | 0.759 | 0.984 | 0.940 | -0.001 | 0.106 | 1.118 | 0.976 | 0.947 | 40.183 |
| 2,000 | Null | C:6 | 0.374 | 0.000 | -0.001 | 0.090 | 0.806 | 1.026 | 0.952 | -0.001 | 0.098 | 0.957 | 0.979 | 0.948 | 0.007 | 0.123 | 1.515 | 0.992 | 0.948 | 46.701 |
| 2,000 | Null | C:7 | 0.380 | 0.000 | -0.002 | 0.082 | 0.674 | 1.019 | 0.958 | -0.004 | 0.143 | 2.035 | 0.919 | 0.934 | 0.001 | 0.127 | 1.602 | 0.943 | 0.944 | 39.311 |
| 2,000 | PH | C:1 | 0.766 | -0.401 | 0.000 | 0.057 | 0.327 | 1.028 | 0.954 | -0.000 | 0.060 | 0.355 | 0.986 | 0.948 | -0.000 | 0.057 | 0.327 | 1.028 | 0.954 | 1.000 |
| 2,000 | PH | C:2 | 0.494 | -0.401 | -0.001 | 0.073 | 0.539 | 1.002 | 0.953 | 0.001 | 0.102 | 1.039 | 0.950 | 0.949 | -0.001 | 0.091 | 0.832 | 1.001 | 0.954 | 30.884 |
| 2,000 | PH | C:3 | 0.549 | -0.401 | -0.000 | 0.066 | 0.434 | 1.044 | 0.963 | 0.002 | 0.162 | 2.631 | 0.761 | 0.946 | -0.001 | 0.081 | 0.658 | 1.024 | 0.955 | 26.256 |
| 2,000 | PH | C:4 | 0.618 | -0.401 | 0.001 | 0.063 | 0.398 | 1.016 | 0.947 | -0.029 | 1.207 | 145.633 | 0.337 | 0.825 | -0.003 | 0.074 | 0.554 | 0.989 | 0.946 | 22.798 |
| 2,000 | PH | C:5 | 0.370 | -0.401 | 0.000 | 0.084 | 0.702 | 1.031 | 0.953 | -0.003 | 0.094 | 0.892 | 0.986 | 0.943 | 0.000 | 0.113 | 1.266 | 1.005 | 0.956 | 43.059 |
| 2,000 | PH | C:6 | 0.342 | -0.401 | -0.000 | 0.096 | 0.926 | 1.030 | 0.953 | -0.001 | 0.109 | 1.195 | 0.947 | 0.934 | 0.010 | 0.135 | 1.841 | 0.998 | 0.947 | 49.230 |
| 2,000 | PH | C:7 | 0.332 | -0.401 | -0.002 | 0.090 | 0.813 | 0.998 | 0.950 | 0.004 | 0.154 | 2.377 | 0.947 | 0.940 | -0.008 | 0.140 | 1.977 | 0.935 | 0.935 | 42.194 |
| 2,000 | Non-PH | C:1 | 0.885 | -0.646 | 0.000 | 0.058 | 0.335 | 0.954 | 0.945 | -0.004 | 0.056 | 0.319 | 0.980 | 0.947 | -0.000 | 0.058 | 0.335 | 0.954 | 0.945 | 1.000 |
| 2,000 | Non-PH | C:2 | 0.664 | -0.646 | 0.136 | 0.065 | 2.265 | 0.974 | 0.420 | -0.002 | 0.078 | 0.604 | 0.946 | 0.938 | 0.005 | 0.074 | 0.548 | 0.969 | 0.942 | 17.648 |
| 2,000 | Non-PH | C:3 | 0.723 | -0.646 | 0.098 | 0.063 | 1.367 | 0.963 | 0.622 | -0.029 | 0.085 | 0.816 | 0.883 | 0.929 | 0.003 | 0.070 | 0.493 | 0.950 | 0.931 | 14.312 |
| 2,000 | Non-PH | C:4 | 0.784 | -0.646 | 0.048 | 0.060 | 0.596 | 0.963 | 0.861 | -0.042 | 0.381 | 14.682 | 0.370 | 0.888 | -0.011 | 0.065 | 0.429 | 0.960 | 0.931 | 12.588 |
| 2,000 | Non-PH | C:5 | 0.508 | -0.646 | 0.202 | 0.073 | 4.636 | 0.989 | 0.210 | 0.096 | 0.080 | 1.574 | 0.947 | 0.737 | 0.016 | 0.086 | 0.759 | 1.008 | 0.945 | 33.533 |
| 2,000 | Non-PH | C:6 | 0.486 | -0.646 | 0.204 | 0.087 | 4.904 | 0.985 | 0.334 | 0.151 | 0.089 | 3.072 | 0.981 | 0.574 | 0.028 | 0.109 | 1.266 | 0.986 | 0.940 | 38.197 |
| 2,000 | Non-PH | C:7 | 0.500 | -0.646 | 0.263 | 0.073 | 7.475 | 0.983 | 0.051 | -0.032 | 0.129 | 1.769 | 0.855 | 0.906 | 0.027 | 0.096 | 0.999 | 0.968 | 0.937 | 30.439 |


We thus apply the censoring-robust methods outlined in Section 2.2 to obtain censoring-specific groups and apply the results outlined in Section 2.1 to obtain estimates and draw inference.

After applying the survival tree approach using the same restrictions on the nodes and number of cross-validations as in the simulation study, we arrive at 24 censoring groups, with ethnicity being the first covariate that a split occurs on. That is, there is heterogeneity in censoring time across ethnicity. The censoring curves for each node of the pruned tree can be seen in Figure 3. Estimates and 95% confidence intervals for the scientific model are presented in Table 4; we include results from both the naive estimator and the censoring-robust estimator. Clearly, the point estimates differ by a noticeable amount. The marginal adjusted hazard ratio comparing VT fistula to SA fistula is estimated to be 2.08 (95% CI: 1.449–2.985). The marginal adjusted hazard ratio comparing prosthetic graft to SA fistula is 1.640 (95% CI: 1.234–2.179). Thus VT fistula, the most complicated access type for insertion, is associated with longer time to access revision.
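For readers who want to move between the hazard-ratio scale and the log-hazard-ratio scale on which such models are usually fitted, the sketch below shows the conversion. It assumes the reported intervals are Wald intervals formed on the log scale, which the text does not state, and the standard error it uses is back-calculated from the reported limits rather than taken from the paper. The function names `hazard_ratio_ci` and `se_from_ci` are hypothetical.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and Wald confidence limits from a log-hazard-ratio estimate."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Log-scale standard error implied by Wald limits (assumes symmetry on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

# VT fistula versus SA fistula, using the figures quoted in the text.
se_vt = se_from_ci(1.449, 2.985)                      # back-calculated, not reported
hr, lo, hi = hazard_ratio_ci(math.log(2.08), se_vt)   # approximately recovers the quoted limits
```

Under that assumption the quoted point estimate sits at the geometric midpoint of its limits, which is what a symmetric interval on the log scale implies.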

One might argue that the same conclusions would have been obtained using the naive estimator. However, the estimates are clearly different, and if the study were repeated, it would not be surprising if the naive estimator resulted in noticeably different estimates. That is, even if the survival times remained the same in a new study but a different censoring mechanism were in effect, the results from the naive estimator would change whereas the results based on the censoring-robust estimator would not.

When real effects are near a hazard ratio of 1, the impact might lead to a different conclusion, from statistical significance to nonsignificance (or vice versa). For example, consider the estimate for gender. The naive estimator yields a censoring-dependent marginal adjusted hazard ratio of 1.150 (95% CI: 0.991–1.335), and the censoring-robust estimator yields a marginal adjusted hazard ratio of 1.248 (95% CI: 1.012–1.506). The important fact that should not be forgotten is that censoring is a nuisance that should be removed in order for results to bear scientific meaning.

[Figure 2 about here.]

[Table 2 about here.]

[Figure 3 about here.]

[Table 3 about here.]

5 Discussion

In this manuscript, we outlined a set of procedures that allow for robust estimation and inference of a pre-defined marginal adjusted hazard ratio using a proportional hazards model involving multiple covariates, as is commonly used for observational studies. The methodology is useful in scientific practice since the hypothesis (expressed using a measure of association from a probability model) is defined a priori using a pre-specified probability model. Since the model is chosen before data analysis, the model assumptions are often not met. Modifying the model to conform with assumptions alters the pre-conceived hypothesis, potentially leading to an inflation of Type I error and results that may be difficult to replicate. The proposed methodology facilitates the estimation of a marginal adjusted hazard ratio that is independent of censoring when the assumption of a time-invariant covariate effect is violated. In the case that the assumption does in fact hold, the methodology still yields a consistent estimate of the fixed adjusted hazard ratio. The tradeoff for robustness is a cost in the efficiency of the estimator when model assumptions do in fact hold. However, we believe the cost for robustness is worthwhile in the scientific setting where pre-specification of the hypothesis is required and model assumptions are not known to hold at the design phase.

The extension from the two-sample case to the observational study case is not immediately obvious when SC depends on multiple covariates as the relationship between C and Z is unknown. We treat the estimation of SC as a data-driven prediction problem since it is not of direct scientific interest but its accuracy is crucial in removing the nuisance censoring-dependence from the estimand of interest. As seen in Section 3, the algorithm of LeBlanc & Crowley (1993) combined with the KGρ statistic is sufficiently flexible to identify censoring-specific groups under a wide variety of censoring scenarios. Once the groups are identified, extension to the multiple groups case is trivial using developed theory for the two-sample case. The novelty of the method presented in this manuscript is the use of survival trees on the censoring time to facilitate the estimation goal of a scientific study. To draw proper inference, one could employ the bootstrap procedure to account for the variability from the tree-building algorithm and the estimation of SC. However, as our simulation results have shown, little is lost if this source of variability is unaccounted for.
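Once censoring-specific groups are available, one natural way to estimate SC within each group is a Kaplan-Meier estimator with the roles of event and censoring reversed. The sketch below assumes that approach for illustration; the text above does not spell out the within-group estimator. The function names `km_censoring_survival` and `km_by_group` are hypothetical, and ties between event times and censoring times are not given any special treatment.

```python
import numpy as np

def km_censoring_survival(time, delta):
    """Kaplan-Meier estimate of S_C(t) = P(C > t), treating censoring (delta == 0) as the event.

    Returns the distinct censoring times and the estimate of S_C just after each of them.
    """
    time = np.asarray(time, dtype=float)
    cens = np.asarray(delta) == 0
    grid = np.unique(time[cens])
    surv = np.empty(grid.size)
    s = 1.0
    for k, t in enumerate(grid):
        at_risk = np.sum(time >= t)          # subjects still under observation at t
        d = np.sum(cens & (time == t))       # censorings occurring at t
        s *= 1.0 - d / at_risk
        surv[k] = s
    return grid, surv

def km_by_group(time, delta, group):
    """Apply the estimator separately within each censoring-specific group."""
    time, delta, group = map(np.asarray, (time, delta, group))
    return {g: km_censoring_survival(time[group == g], delta[group == g])
            for g in np.unique(group)}
```

For example, with observed times 1, 2, 3, 4 and event indicators 0, 1, 0, 1 in a single group, the estimate drops to 0.75 at time 1 and to 0.375 at time 3.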

Our proposed procedures can be modified in a number of ways. Any method of prediction for SC is a possible candidate. However, based on our experiences with censoring, we believe censoring usually differs by the clustering of a small subset of covariates. Hence, the tree approach seems natural. Within the tree framework, a user could choose any other reasonable two-sample statistic that detects heterogeneity. We understand that there is often a time constraint on any given data analysis. Should computing be an issue, the use of readily available algorithms such as the tree algorithm based on an exponential model (LeBlanc & Crowley 1992) is probably reasonable in most situations. However, the flexibility in detecting heterogeneity in cases where the hazard functions cross might be lost.

Users should consider placing restrictions on nodes to have more confidence in the splitting of nodes. Examples of restrictions include the number of events or number of observations. The illustrations in Section 3 and Section 4 require 20 censored events in each node. This number should be dictated by the comfort level of the user. Also, instead of using cross-validation to select an optimal tree that takes into account the tradeoff of complexity and overfitting, users can pre-define the number of nodes that they are willing to accept. For example, four or five nodes are probably more than enough to account for most of the variability arising from differential censoring in most situations.

We also illustrated the estimation of SC with a Cox proportional hazards model in our simulation study and found that it performs worse than the survival tree approach. We do not discount this method immediately as our simulation study did not optimize the C|Z model with respect to prediction. In practice, where one spends more time on predicting SC, results may be improved. However, such an approach would require manual intervention and guidance in building the model to explore different transformations, combinations, and interactions. In this context, the survival tree approach is attractive due to its automatic nature and inherent design to explore interactions and functional forms (via discretization of the covariates).

We note that the proposed estimation procedure has particular relevance in health policy where a great deal of research and emphasis is being placed on the comparative effectiveness of existing interventions. Such analyses are often carried out in a meta-analysis framework where the relative performance of existing interventions is compared and contrasted across multiple observational and interventional studies. Along with the usual complications involved in performing meta-analyses (accounting for heterogeneity of data collection procedures, different adjustment covariates, different patient populations, publication bias, etc.), most studies will tend to have differential patient accrual and dropout patterns. The result could be perceived effect modification that is solely driven by differences in the censoring distribution across studies. The methods proposed here could alleviate this problem if each individual analysis reported a censoring-robust estimate of intervention effect. For example, other authors have also studied vascular access patency (Woods, Turenne, Strawderman, Young, Hirth, Port & Held 1997; Dixon, Novak, Fangman et al. 2002; Pisoni, Young, Dykstra, Greenwood, Hecking, Gillespie, Wolfe, Goodkin & Held 2002; Sheth, Brandt, Brewer, Nuchtern, Kale & Goldstein 2002; Ramage, Bailie, Tyerman, McColl, Pollard, Fitzpatrick et al. 2005). Even if we suppose that all inclusion and exclusion criteria and adjustment variables were identical in these studies, the individual (naively) estimated hazard ratios cannot be compared as the estimands are most likely different due to differential censoring. Only if censoring were constant could a fair comparison be made. For example, if a censoring-robust estimator were used, then a marginal adjusted hazard ratio under zero censorship could be used for comparison. Further utility of censoring-robust estimators in the context of meta-analyses remains an area of future research.

One limitation of our methodology is that it does not extend to the time-varying covariates setting, as it is unclear how the sample could be partitioned when each “subject” experiences different covariate values at different times. In such a setting, building a parametric model selected by cross-validation or validated by a hold-out sample may seem like a better approach.

Finally, the methodology proposed in this manuscript can easily be extended for censoring-robust estimation in the discrete survival setting as described by Nguyen & Gillen (2012). Again, survival trees can be used to identify censoring-specific groups, and weights can be incorporated into the estimating equation as in the two-sample case.


References

Boyd, A. P., Kittelson, J. M., & Gillen, D. L. (2012), “Estimation of treatment effect under non-proportional hazards and conditionally independent censoring,” Statistics in Medicine. URL: http://dx.doi.org/10.1002/sim.5440

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984), Classification and regression trees, Chapman & Hall/CRC.

Cox, D. R. (1972), “Regression Models and Life-Tables,” Journal of the Royal Statistical Society. Series B (Methodological), 34(2), pp. 187–220. URL: http://www.jstor.org/stable/2985181

Dixon, B., Novak, L., Fangman, J. et al. (2002), “Hemodialysis vascular access survival: upper-arm native arteriovenous fistula,” American Journal of Kidney Diseases: the official journal of the National Kidney Foundation, 39(1), 92.

Fleming, T., & Harrington, D. (1991), Counting processes and survival analysis, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics Section.

Fleming, T. R., & Harrington, D. P. (1981), “A Class of Hypothesis Tests for One and Two Samples of Censored Survival Data,” Commun. Stat. - Theory Meth., 10, 763–794.

Fleming, T. R., Harrington, D. P., & O’Sullivan, M. (1987), “Supremum Versions of the Log-Rank and Generalized Wilcoxon Statistics,” Journal of the American Statistical Association, 82(397), pp. 312–320. URL: http://www.jstor.org/stable/2289169

Fleming, T. R., O’Fallon, J. R., O’Brien, P. C., & Harrington, D. P. (1980), “Modified Kolmogorov-Smirnov Test Procedures with Application to Arbitrarily Right-Censored Data,” Biometrics, 36(4), pp. 607–625. URL: http://www.jstor.org/stable/2556114

Gibson, K., Gillen, D., Caps, M., Kohler, T., Sherrard, D., & Stehman-Breen, C. (2001), “Vascular access survival and incidence of revisions: A comparison of prosthetic grafts, simple autogenous fistulas, and venous transposition fistulas from the United States Renal Data System Dialysis Morbidity and Mortality Study,” Journal of Vascular Surgery, 34(4), 694–700.

Koziol, J. A. (1978), “A Two Sample Cramér-von Mises Test for Randomly Censored Data,” Biometrical Journal, 20(6), 603–608. URL: http://dx.doi.org/10.1002/bimj.4710200608

LeBlanc, M., & Crowley, J. (1992), “Relative Risk Trees for Censored Survival Data,” Biometrics, 48(2), pp. 411–425. URL: http://www.jstor.org/stable/2532300

LeBlanc, M., & Crowley, J. (1993), “Survival Trees by Goodness of Split,” Journal of the American Statistical Association, 88(422), pp. 457–467. URL: http://www.jstor.org/stable/2290325


Lee, J. W. (1996), “Some Versatile Tests Based on the Simultaneous Use of Weighted Log-Rank Statistics,” Biometrics, 52(2), pp. 721–725. URL: http://www.jstor.org/stable/2532911

Lin, X., & Wang, H. (2004), “A New Testing Approach for Comparing the Overall Homogeneity of Survival Curves,” Biometrical Journal, 46(5), 489–496. URL: http://dx.doi.org/10.1002/bimj.200310053

Lin, X., & Xu, Q. (2009), “A new method for the comparison of survival distributions,” Pharmaceut. Statist.

Mingers, J. (1989), “An Empirical Comparison of Pruning Methods for Decision Tree Induction,” Machine Learning, 4, 227–243. URL: http://dx.doi.org/10.1023/A:1022604100933

Nguyen, V., & Gillen, D. (2012), “Robust inference in discrete hazard models for randomized clinical trials,” Lifetime Data Analysis, pp. 1–24.

Pepe, M. S., & Fleming, T. R. (1989), “Weighted Kaplan-Meier Statistics: A Class of Distance Tests for Censored Survival Data,” Biometrics, 45(2), pp. 497–507. URL: http://www.jstor.org/stable/2531492

Pepe, M. S., & Fleming, T. R. (1991), “Weighted Kaplan-Meier Statistics: Large Sample and Optimality Considerations,” Journal of the Royal Statistical Society. Series B (Methodological), 53(2), pp. 341–352. URL: http://www.jstor.org/stable/2345745

Pisoni, R., Young, E., Dykstra, D., Greenwood, R., Hecking, E., Gillespie, B., Wolfe, R., Goodkin, D., & Held, P. (2002), “Vascular access use in Europe and the United States: results from the DOPPS,” Kidney international, 61(1), 305–316.

Ramage, I., Bailie, A., Tyerman, K., McColl, J., Pollard, S., Fitzpatrick, M. et al. (2005), “Vascular access survival in children and young adults receiving long-term hemodialysis,” American journal of kidney diseases: the official journal of the National Kidney Foundation, 45(4), 708.

Schumacher, M. (1984), “Two-Sample Tests of Cramér–von Mises- and Kolmogorov–Smirnov-Type for Randomly Censored Data,” International Statistical Review / Revue Internationale de Statistique, 52(3), pp. 263–281. URL: http://www.jstor.org/stable/1403046

Sheth, R., Brandt, M., Brewer, E., Nuchtern, J., Kale, A., & Goldstein, S. (2002), “Permanent hemodialysis vascular access survival in children and adolescents with end-stage renal disease,” Kidney international, 62(5), 1864–1869.

Stigler, S. M. (1994), “Citation Patterns in the Journals of Statistics and Probability,” Statistical Science, 9(1), pp. 94–108. URL: http://www.jstor.org/stable/2246292

Struthers, C. A., & Kalbfleisch, J. D. (1986), “Misspecified proportional hazard models,” Biometrika, 73(2), 363–369. URL: http://biomet.oxfordjournals.org/content/73/2/363.abstract


Woods, J., Turenne, M., Strawderman, R., Young, E., Hirth, R., Port, F., & Held, P. (1997), “Vascular access survival among incident hemodialysis patients in the United States,” American journal of kidney diseases, 30(1), 50–57.

Wu, L., & Gilbert, P. B. (2002), “Flexible Weighted Log-Rank Tests Optimal for Detecting Early and/or Late Survival Differences,” Biometrics, 58(4), pp. 997–1004. URL: http://www.jstor.org/stable/3068543

Xu, R., & O’Quigley, J. (2000), “Estimating average regression effect under non-proportional hazards,” Biostatistics, 1(4), 423–439. URL: http://biostatistics.oxfordjournals.org/content/1/4/423.abstract


Appendices

A Survival Trees Algorithm

The following summarizes the procedure proposed by LeBlanc & Crowley (1993). We added the component on selecting the right-sized tree based on cross-validation, as is typical of the standard CART algorithm (Breiman, Friedman, Olshen & Stone 1984).

A.1 Definitions

Definition 1. A binary tree consists of a finite non-empty set $T$ of positive integers, $1, 2, \ldots, q$, and two functions, left(·) and right(·), from $T$ to $T \cup \{0\}$, which satisfy the following properties for each $h \in T$: (1) either left($h$) > $h$ and right($h$) = left($h$) + 1, or left($h$) = right($h$) = 0, and (2) for all $h$ in $T$ there is at most one $u \in T$ such that $h$ = right($u$) or $h$ = left($u$).

Each element of $T$ is called a node. The definitions of mother, daughter, sibling, internal, and terminal nodes are obvious.

Definition 2. A tree $T_1$ is a subtree of $T$ if $T_1$ is a tree with the same root node as $T$, and for every $h \in T_1$, $h$ is in $T$. We denote this $T_1 \preceq T$.

Definition 3. A tree $T_h$ is called a branch of $T$ if $T_h$ is a tree with root node $h \in T$ and all descendants of $h$ in $T$ are descendants of $h$ in $T_h$.

Definition 4. Define the complexity of a tree $T$ as $|S|$, where $S = T - \tilde{T}$ is the set of internal nodes of $T$ and $\tilde{T}$ is the set of terminal nodes of $T$. Call $\alpha \ge 0$ the complexity parameter and define the split-complexity $G_\alpha(T)$ as

$$ G_\alpha(T) = G(T) - \alpha |S|, $$

where $G(T)$ is the sum over all standardized splitting statistics, $G(h)$, in the tree $T$:

$$ G(T) = \sum_{h \in S} G(h). $$

Call $G(T)$ the goodness of split statistic of $T$.
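To make the trade-off in Definition 4 concrete, consider a purely hypothetical tree (the numbers are illustrative and do not come from the paper) with three internal nodes whose standardized splitting statistics are 10, 6, and 1, and take a complexity parameter of 4:

```latex
\begin{align*}
G(T) &= 10 + 6 + 1 = 17, \\
G_\alpha(T) &= G(T) - \alpha |S| = 17 - 4 \cdot 3 = 5 .
\end{align*}
```

If the split with statistic 1 sits at a node whose daughters are both terminal, removing it gives a subtree with two internal nodes and split-complexity 16 − 4 · 2 = 8, so under this penalty the smaller tree scores higher.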

Definition 5. $T_1$ is an optimally pruned subtree of $T$ for complexity parameter $\alpha$ if

$$ G_\alpha(T_1) = \max_{T' \preceq T} G_\alpha(T'), $$

and it is the smallest optimally pruned subtree of $T$ if $T_1 \preceq T'$ for every optimally pruned subtree $T'$ of $T$. Let $T(\alpha)$ denote the smallest optimally pruned subtree of $T$ with respect to $\alpha$.

A.2 Splitting Algorithm

We consider splits on a single covariate, as opposed to the splits on linear combinations of predictors and boolean combination splits that are possible in the CART algorithm. Suppose we have $P$ covariates. Then $\{x_{ip} : i = 1, \ldots, n,\ p = 1, \ldots, P\}$ is the sample space over which we can perform our splits. If $X_i$ is an ordered variable, then potential splits are of the form $X_i \le c$, where $c$ can take on any possible value. Typically, the $c$'s are taken to be the midpoints between consecutive observed values of $X_i$ in the node under consideration. If $X_i$ is a nominal variable that takes on values in $B = \{b_1, \ldots, b_r\}$, then potential splits are of the form $X_i \in S$ where $S \subset B$.

Let $G(s, h)$ be a standardized two-sample statistic that measures the discrepancy between the two groups induced by split $s$ in node $h$. The best split, $s^*$, is the split such that

$$ G(s^*, h) = \max_{s \in S_h} G(s, h), $$

where $S_h$ is the set of all possible splits of node $h$. If $s^*$ is not unique, then one of the maximal splits is chosen arbitrarily. This procedure is repeated recursively until a maximal-sized tree is grown. Note that restrictions on the nodes may be imposed, such as the minimum number of observations at risk and/or the minimum number of events observed.
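The search for the best split of a single node can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two-sample log-rank statistic (in chi-square form) plays the role of the standardized statistic, handles one ordered covariate only, and uses hypothetical names (`logrank_statistic`, `best_split`, `min_node_size`).

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank statistic in chi-square form (assumed choice of G(s, h))."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                               # at risk overall
        n1 = sum(1 for u, g in zip(times, in_left) if u >= t and g)       # at risk, left group
        d = sum(1 for u, e in zip(times, events) if u == t and e)         # events overall
        d1 = sum(1 for u, e, g in zip(times, events, in_left)
                 if u == t and e and g)                                   # events, left group
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(x, times, events, min_node_size=1):
    """Return (cutpoint, statistic) maximizing the statistic over midpoint splits x <= c."""
    values = sorted(set(x))
    best_c, best_g = None, 0.0
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2                      # midpoint between consecutive observed values
        in_left = [xi <= c for xi in x]
        n_left = sum(in_left)
        if min(n_left, len(x) - n_left) < min_node_size:
            continue                           # node-size restriction
        g = logrank_statistic(times, events, in_left)
        if best_c is None or g > best_g:       # ties keep the first maximal split
            best_c, best_g = c, g
    return best_c, best_g


# Hypothetical toy node: covariate values, follow-up times, event indicators.
cut, stat = best_split([1, 2, 3, 4, 5, 6, 7, 8],
                       [1, 2, 3, 4, 10, 11, 12, 13],
                       [1, 1, 1, 1, 1, 1, 1, 1])
print(cut, round(stat, 3))
```

Growing a full tree would apply `best_split` to every covariate in a node, keep the overall maximum, and recurse into the two daughter nodes until the node restrictions stop further splitting.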

A.3 Pruning Algorithm

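After growing a maximal-sized tree, pruning is required to obtain a reasonably sized tree that does not overfit the data. For any nonterminal node $h$ of a nontrivial tree $T_0$, consider the branch $T_h$. Define the function $g(h)$, $h \in T_0$,

$$ g(h) = \begin{cases} G(T_h) / |S_h| & \text{if } h \in S_0, \\ +\infty & \text{otherwise,} \end{cases} $$

where $S_0 = T_0 - \tilde{T}_0$ is the set of internal nodes of $T_0$ and, here, $S_h$ denotes the set of internal nodes of the branch $T_h$. Then the weakest link $h_0$ in $T_0$ is the node for which

$$ g(h_0) = \min_{h \in T_0} g(h). $$

Let $\alpha_1 = g(h_0)$ and let $T_1$ be the tree obtained by pruning off branch $T_{h_0}$. Let $\alpha_2 = g(h_1) = \min_{h \in T_1} g(h)$ and let $T_2$ be the tree obtained by pruning off branch $T_{h_1}$. Repeat to obtain the nested sequence of subtrees $T_m \prec \cdots \prec T_1 \prec T_0$, where $T_m$ is the root node, and the sequence $\infty > \alpha_m > \cdots > \alpha_2 > \alpha_1 > 0$. Note that $T_k$ is optimal for $\alpha_k \le \alpha < \alpha_{k+1}$, $k = 1, \ldots, m$ (with $\alpha_{m+1} = \infty$); $T_0$ is considered optimal for $\alpha < \alpha_1$.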

The above pruning algorithm from LeBlanc & Crowley (1993) is one of many possible pruning algorithms. Mingers (1989) compares alternative pruning methods for general decision trees, such as pruning based on a specified critical value for the two-sample statistic.
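The weakest-link pruning of this section can be sketched in a few lines. The representation is an assumption made for illustration: a tree is a dictionary mapping each node to a tuple (left, right, G), with left = right = 0 for terminal nodes as in Definition 1, and the split statistics in `toy_tree` are hypothetical.

```python
def internal_nodes(tree, h):
    """Internal nodes of the branch rooted at h, in pre-order."""
    left, right, _ = tree[h]
    if left == 0:                      # terminal node: left(h) = right(h) = 0
        return []
    return [h] + internal_nodes(tree, left) + internal_nodes(tree, right)


def descendants(tree, h):
    """All nodes strictly below h."""
    left, right, _ = tree[h]
    if left == 0:
        return []
    return [left, right] + descendants(tree, left) + descendants(tree, right)


def weakest_link_sequence(tree, root=1):
    """Return [(alpha_k, internal nodes of T_k)] for k = 1, ..., m."""
    tree = dict(tree)                  # work on a copy
    sequence = []
    while tree[root][0] != 0:          # stop once only the root remains
        g = {}
        for h in internal_nodes(tree, root):
            branch = internal_nodes(tree, h)
            # g(h) = G(T_h) / |S_h|: average split statistic within the branch
            g[h] = sum(tree[u][2] for u in branch) / len(branch)
        h0 = min(g, key=g.get)         # weakest link (first one if tied)
        for u in descendants(tree, h0):
            del tree[u]                # prune off the branch below h0
        tree[h0] = (0, 0, 0.0)         # h0 becomes a terminal node
        sequence.append((g[h0], internal_nodes(tree, root)))
    return sequence


# Hypothetical seven-node tree with split statistics 10, 1, and 6.
toy_tree = {1: (2, 3, 10.0), 2: (4, 5, 1.0), 3: (6, 7, 6.0),
            4: (0, 0, 0.0), 5: (0, 0, 0.0), 6: (0, 0, 0.0), 7: (0, 0, 0.0)}
for alpha, kept in weakest_link_sequence(toy_tree):
    print(alpha, kept)
```

For this toy tree the complexity parameters come out as 1, 6, and 10, and the internal-node sets shrink from {1, 3} to {1} to the root alone, matching the nested sequence described above.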

A.4 Selection of a Pruned Subtree

Based on the above pruning algorithm, we obtain a sequence of optimally pruned subtrees $T_1, \ldots, T_m$ corresponding to complexity parameters $\alpha_1, \ldots, \alpha_m$. The goal is to obtain a “right-sized” tree that does not overfit yet still does a decent job of splitting observations into groups that are similar. One approach is to select the final tree based on the split-complexity. However, the sample's $G_\alpha(T_k)$ is a biased estimate of split-complexity since the data used to compute it are also used to grow and prune the trees. If data are readily available, the best approach to choosing the right-sized optimal subtree would be to run an independent validation sample down the optimally pruned subtrees. For each subtree, compute the split-complexity and choose the $\alpha$ that corresponds to the maximum split-complexity; the corresponding tree is the “right-sized” optimal tree. However, in most situations, an independent validation data set is not readily available. LeBlanc & Crowley (1993) suggest using the bootstrap to remove bias in the goodness of split statistic for each optimal subtree, and choosing a priori a meaningful penalty $\alpha_c$. For example, a penalty $\alpha_c = 4$ corresponds roughly to the 0.05 significance level for each split and $\alpha_c = 2$ corresponds to the AIC.
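As a rough check on the stated correspondence between the penalty and a significance level, the sketch below assumes (this is not stated in the text above) that a standardized split statistic behaves like a chi-square variable with one degree of freedom when the two groups do not differ. The function name `chi2_1df_tail` is hypothetical.

```python
import math


def chi2_1df_tail(x):
    """Upper tail probability P(X > x) for a chi-square variable with 1 degree of freedom."""
    # If Z is standard normal then Z**2 is chi-square(1), so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


for penalty in (4.0, 2.0):
    print(f"penalty {penalty}: tail probability {chi2_1df_tail(penalty):.4f}")
```

Under that assumption a penalty of 4 gives a tail probability of about 0.046, which is consistent with the "roughly 0.05" description.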

For the selection of a right-sized pruned subtree, we propose to follow the CART (Breiman et al. 1984) convention by using cross-validation to obtain an honest estimate of the $G_\alpha(T_k)$'s. That is, after growing and obtaining a sequence of optimally pruned subtrees based on the full sample, divide the data set into $V$ roughly equal-sized data sets. For each $v = 1, \ldots, V$, we grow and obtain a sequence of optimally pruned subtrees using all observations except those in the $v$th data set $L_v$, in the same manner as before. Let $T^v_1, \ldots, T^v_{m_v}$ be the optimally pruned subtrees with corresponding complexity parameters $\alpha^v_1, \ldots, \alpha^v_{m_v}$, grown and pruned using all observations not in $L_v$. Set $\alpha'_k = \sqrt{\alpha_k \alpha_{k+1}}$ for $k = 1, \ldots, m - 1$, the geometric midpoint of the complexity parameters corresponding to the optimally pruned subtrees grown using the full data set, and $\alpha'_m = \alpha_m$. Recall that $T_k$ is optimal for $\alpha_k \le \alpha < \alpha_{k+1}$. For each $v = 1, \ldots, V$, and for each $k = 1, \ldots, m$, select $T^v_{k_v}$ such that $\alpha^v_{k_v} \le \alpha'_k < \alpha^v_{k_v + 1}$; if $\alpha'_k < \alpha^v_1$, then select $T^v_0$. Compute the split-complexity $G_{\alpha'_k}(T^v_{k_v})$ using $L_v$; if any internal nodes of $T^v_{k_v}$ contain zero observations when running $L_v$ down the tree, let $G(s, h)$ for that node comparison be zero. Estimate $G_{\alpha'_k}(T_k)$ by taking the average of $G_{\alpha'_k}(T^v_{k_v})$ over $v$. Choose the $\alpha'_k$ that corresponds to the maximum estimate of $G_{\alpha'_k}(T_k)$ as the optimal complexity parameter, and choose the corresponding $T_k$ as the optimally pruned tree. The number of cross-validation folds $V$ is at the user's discretion and usually depends on the total sample size.

Note that the 1-SE (standard error) rule can also be applied to emphasize parsimony in tree selection, where the SE can be estimated based on the variability of $G_{\alpha'_k}(T^v_{k_v})$ over $v$.
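The bookkeeping in the cross-validation step above (geometric midpoints, matching each midpoint to a fold-specific subtree, and averaging over folds) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the fold-specific split-complexities are taken as already computed, and all complexity parameters are supplied in increasing order.

```python
import math
from bisect import bisect_right


def geometric_midpoints(alphas):
    """alpha'_k = sqrt(alpha_k * alpha_{k+1}) for k < m, and alpha'_m = alpha_m."""
    mids = [math.sqrt(a * b) for a, b in zip(alphas, alphas[1:])]
    return mids + [alphas[-1]]


def select_subtree_index(fold_alphas, alpha_prime):
    """Index k_v with alpha^v_{k_v} <= alpha' < alpha^v_{k_v + 1}; 0 means T^v_0."""
    # bisect_right counts how many fold-specific parameters are <= alpha_prime.
    return bisect_right(fold_alphas, alpha_prime)


def choose_alpha(alpha_primes, fold_estimates):
    """Average fold_estimates[v][k] over folds and pick the maximizing alpha'_k.

    Returns (k, alpha'_k, fold means) with k counted from 1.
    """
    n_folds = len(fold_estimates)
    means = [sum(fold[k] for fold in fold_estimates) / n_folds
             for k in range(len(alpha_primes))]
    k_best = max(range(len(means)), key=means.__getitem__)
    return k_best + 1, alpha_primes[k_best], means


# Hypothetical full-sample and fold-specific complexity parameters.
alpha_primes = geometric_midpoints([1.0, 4.0, 9.0])
picks = [select_subtree_index([1.5, 5.0, 8.0], a) for a in alpha_primes]
print(alpha_primes, picks)
```

A 1-SE variant would replace the final maximization with the largest midpoint whose fold mean lies within one standard error of the maximum.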


List of Figures

1 Bootstrap standard error comparison
2 Survival curves
3 Censoring curves


[Figure 1 plot: nine panels, labeled (a) through (i).]

Figure 1: Bootstrap standard error of βCR compared with the analytic standard error not accounting for the variability induced by the estimation of SC. For each scenario, the procedure was repeated 600 times (as opposed to 1,000 replicates), where at each iteration, 200 bootstrap samples were drawn. The analytic standard error and the bootstrap SE are drawn with different plotting symbols.


[Figure 2 plot: survival probability (vertical axis, 0.2 to 1.0) against time from study start in years (horizontal axis, 0.0 to 3.0), with one curve each for SA fistula, VT fistula, and prosthetic graft.]

Counts printed beneath the plot at four successive time points:

SA fistula          401 (0)     234 (99)    35 (170)    0 (183)
VT fistula          111 (0)     44 (42)     2 (62)      0 (63)
Prosthetic graft    1030 (0)    611 (265)   44 (594)    1 (617)
Total               1542 (0)    889 (406)   81 (826)    1 (863)

Figure 2: Survival curves.


[Figure 3 plot, titled “Censoring Curves”: “Probability of Not Censored” (vertical axis, 0.0 to 1.0) against time in years (horizontal axis, 0.0 to 3.0).]

Figure 3: Censoring curves.


List of Tables

1 Results of the simulation study
2 Censoring Scenarios
3 Patient characteristics by access type
4 Adjusted hazard ratio estimates and 95% confidence intervals based on the naive estimator and the censoring-robust estimator


Table 2: Censoring Scenarios. Scenario 1 is that of censoring by truncation. Scenario 2 is that of a single censoring mechanism. Scenarios 3–4 are those of two censoring mechanisms dictated by the covariates. Scenarios 5–6 have censoring depend on Z in a parametric relationship. Scenario 7 is that of crossing hazards.

Case   Censoring
1      C = 4
2      C ∼ power-function(4, 1)
3      C ∼ power-function(4, 1 + 0.5 × I{z1 = 1, z2 ≤ 0.5})
4      C ∼ power-function(4, 1 + 2.0 × I{z1 = 1, z2 ≤ 0.5})
5      C ∼ power-function(4, exp{−z2})
6      C ∼ power-function(4, exp{−0.5 × z1 − 0.5 × z2 − z1 × z2})
7      Z1 = 0: λC(t|Z) = 1 × I{t ≤ 0.5} + 0.360 × I{0.5 < t ≤ 1} + 1 × I{1 < t ≤ 4}
       Z1 = 1: λC(t|Z) = 0.516 × I{t ≤ 0.5} + 1 × I{0.5 < t ≤ 4}


Table 3: Patient characteristics by access type.

Variable                    SA fistula (n=401)   VT fistula (n=111)   Prosthetic graft (n=1030)
Age                         58.42 ± 16.60        65.15 ± 14.94        63.29 ± 14.49
BMI                         26.13 ± 14.82        25.10 ± 6.18         28.30 ± 21.27
Gender
  Male                      281 (70%)            68 (61%)             475 (46%)
  Female                    120 (30%)            43 (39%)             555 (54%)
Race
  Caucasian                 255 (64%)            76 (68%)             583 (57%)
  African-Americans         105 (26%)            25 (23%)             366 (36%)
  Others (mainly Asians)    38 (9%)              9 (8%)               74 (7%)
  NA                        3 (1%)               1 (1%)               7 (1%)
Smoking
  Nonsmoker                 193 (48%)            58 (52%)             536 (52%)
  Former smoker             110 (27%)            32 (29%)             317 (31%)
  Current smoker            69 (17%)             13 (12%)             116 (11%)
  NA                        29 (7%)              8 (7%)               61 (6%)
Diabetes
  No                        205 (51%)            56 (50%)             420 (41%)
  Yes                       188 (47%)            52 (47%)             582 (57%)
  NA                        8 (2%)               3 (3%)               28 (3%)
Serum albumin               3.58 ± 0.62          3.45 ± 0.54          3.45 ± 0.56


Table 4: Adjusted hazard ratio estimates and 95% confidence intervals based on the naive estimator and the censoring-robust estimator.

Covariate                                    Adj. Hazard Ratio (naive)   Adj. Hazard Ratio (censoring-robust)
Age                                          1.003 (0.997–1.008)         1.001 (0.994–1.008)
BMI                                          0.996 (0.991–1.002)         0.996 (0.990–1.002)
Female                                       1.150 (0.991–1.335)         1.248 (1.034–1.506)
Race: African-Americans                      1.131 (0.967–1.323)         1.206 (1.012–1.437)
Race: Others                                 0.808 (0.595–1.099)         0.537 (0.265–1.091)
Diabetes                                     1.060 (0.912–1.231)         1.077 (0.892–1.301)
Serum albumin                                0.823 (0.725–0.935)         0.849 (0.738–0.977)
Access type: Venous transposition fistula    1.857 (1.366–2.525)         2.080 (1.449–2.985)
Access type: Prosthetic graft                1.426 (1.183–1.720)         1.640 (1.234–2.179)
