UW Winter 07
IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION
Donald A. Pierce, Oregon Health Sciences Univ
Ruggero Bellio, Udine, Italy
These slides are at www.science.oregonstate.edu/~piercedo/
Nearly all survival analysis uses first-order asymptotics: limiting distributions of the MLE, LR, or scores; interest here is only in Cox regression and partial likelihood
Usually these approximations are quite good, but it is of interest to verify this or improve on them (Samuelsen, Lifetime Data Analysis, 2003)
We consider both higher-order asymptotics and more direct simulation of P-values
Primary issue: inference beyond first-order requires more than the likelihood function
This may lead to unreasonable dependence of methods on the censoring model and baseline hazard
Our approach involves forms of conditioning on censoring
Consider direct simulation of P-values without this (same issues arise in higher-order asymptotics)
One must estimate the baseline hazard, sample failure times according to this, then apply the censoring model which may involve estimating a censoring distribution
Quite unattractive in view of the essential nature of Cox regression
With suitable conditioning, and some further conventions regarding the censoring model, this can be avoided
Aim is to maintain the rank-based nature of inference in the presence of censoring (simulation: sample failures from exponential distn, apply censoring to ranks)
We provide convenient Stata and R routines for carrying out both the simulation and higher-order asymptotics.
COX REGRESSION: Hazards of form $\lambda(t; z, \beta) = \lambda_0(t)\,e^{z\beta}$, with $\lambda_0(t)$ unspecified. Interest parameter a scalar function of $\beta$, with remaining coordinates as nuisance parameters. Write $t_{(1)}, t_{(2)}, \ldots$ for the ordered failure times.
Risk set $R_i$: those alive at failure time $t_{(i)}$. Multinomial likelihood contribution

$L_i = e^{z_{(i)}\beta} \big/ \sum_{j \in R_i} e^{z_j\beta}$,

the probability that it is individual $(i)$ among these that fails.
Partial likelihood $L(\beta) = \prod_i L_i(\beta)$
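As a concrete illustration (ours, not from the slides), the partial likelihood above can be computed directly. This pure-Python sketch assumes a scalar covariate and no tied failure times; the function name is ours.

```python
import math

def partial_likelihood(beta, times, events, z):
    """Cox partial likelihood: product over failures of
    exp(z_(i)*beta) / sum_{j in R_i} exp(z_j*beta),
    for a scalar covariate z and no tied failure times.
    events[i] = 1 for an observed failure, 0 for a censored time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    logL = 0.0
    for pos, i in enumerate(order):
        if events[i]:
            risk_set = order[pos:]  # R_i: everyone still at risk at t_(i)
            denom = sum(math.exp(z[j] * beta) for j in risk_set)
            logL += z[i] * beta - math.log(denom)
    return math.exp(logL)
```

At beta = 0 each failure contributes 1/|R_i|, so with four failures and no censoring L(0) = (1/4)(1/3)(1/2)(1) = 1/24, a quick sanity check on the risk-set bookkeeping.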
Useful reference sets for interpreting a given dataset:
(i) data-production frame of reference
(ii) conditional on “censoring configuration”
(iii) treating all risk sets as fixed
Using (i) involves the censoring model and estimation of the baseline hazard and censoring distribution (see Dawid 1991 JRSS-A regarding data-production and inferential reference sets)
That of (ii) requires some development/explanation. By “censoring configuration” we mean the numbers of censorings between successive ordered failures
Approach (iii) is not really “conditional”, but many may feel this is the most appropriate reference set --- things are certainly simple from this viewpoint. Applies when risk sets arise in complicated ways, and to time-dependent covariables
EXTREME* EXAMPLE TO SHOW HOW THINGS ARE WORKING:
n = 40 with 30% random censoring; log(RR) interest parameter 1.0 with binary covariable; 5 nuisance params in RR involving exponential covariables. Hypotheses where one-sided Wald P-value is 0.05. (*Extreme: 6 covariables with < 30 failures)

Results typical for datasets:
                                          Lower   Upper
LR first order                            0.046   0.062
Data production, exact (simulation)       0.090   0.020
Conditional, exact (simulation)           0.103   0.024
Conditional, 2nd order asymptotics        0.096   0.025
Fixed risk sets, exact (simulation)       0.054   0.051
Fixed risk sets, 2nd order asymptotics    0.052   0.052
With fewer failures, and fewer nuisance parameters, adjustments are smaller and thus harder to summarize. However, the following for a typical dataset shows the essential nature of results. This is for n = 20 with 25% censoring, interest parameter as before, and only 1 nuisance parameter.

                                          Lower   Upper
LR first order                            0.042   0.065
Data production, exact (simulation)       0.053   0.040
Conditional, exact (simulation)           0.054   0.037
Conditional, 2nd order asymptotics        0.060   0.043
Fixed risk sets, 2nd order asymptotics    0.047   0.051

Samuelsen’s conclusion, that in small samples the Wald and LR confidence intervals are conservative, does not seem to hold up with any useful generality
CONDITIONING ON “CENSORING CONFIGURATION”
That is, on the vector $q = (q_0, q_1, \ldots, q_k)$, where $q_j$ is the number censored following the jth ordered failure
Seems easy to accept that this is “ancillary” information, for inference about relative risk when using partial likelihood. It could be that “ancillary” is not the best term for this (comments please!!)
The further convention involved in making this useful pertains to which individuals are censored
Our convention for this: in martingale fashion, sample from risk sets the $q_j$ to be censored, with probabilities possibly depending on covariables (comments please!!)
Unless these probabilities depend on covariables, a quite exceptional assumption, results of Kalbfleisch & Prentice (1973 Bka) apply: partial likelihood is the likelihood of “reduced ranks”
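To make the "censoring configuration" concrete, a small sketch (ours, not from the talk) that computes the vector q from observed times and failure indicators:

```python
def censoring_configuration(times, events):
    """Return q = [q_0, q_1, ..., q_k], where q_j is the number of
    censorings between the jth and (j+1)th ordered failures
    (q_0 counts censorings before the first failure).
    events[i] = 1 for an observed failure, 0 for a censored time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    q = [0]                  # start with q_0
    for i in order:
        if events[i]:
            q.append(0)      # open the counter following this failure
        else:
            q[-1] += 1       # one more censoring in the current gap
    return q
```

For example, with times (1, 2, 3, 4, 5) and events (0, 1, 0, 0, 1) there is one censoring before the first failure and two between the failures, giving q = [1, 2, 0].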
Recall that a probability model for censoring is often (but with notable exceptions) sort of a “fiction” concocted by the statistician, with the following aims
A common model is that for each individual there is a fixed, or random, latent censoring time and what is observed is the minimum of the failure and censoring time
Leads to usual likelihood function: product over individuals of

$f(t_i)^{c_i}\,\mathrm{pr}(T_i > t_i)^{1-c_i}$, with $c_i$ the failure indicator

The use of censoring models is usually only to consider whether this likelihood is valid (censoring is “uninformative”) --- model is not used beyond this
But usual models as above render the problem not to be one only involving ranks, whereas our conditioning and convention maintain the rank-based inferential structure
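To make that product concrete (our illustration, not from the slides): for exponential failure times with rate lam, the density is f(t) = lam*exp(-lam*t) and pr(T > t) = exp(-lam*t), so the log of the product over individuals is:

```python
import math

def censored_loglik(lam, times, events):
    """Log of prod_i f(t_i)^{c_i} * pr(T_i > t_i)^{1-c_i} for Exp(lam) failure times;
    events[i] = c_i = 1 for an observed failure, 0 for a censored time."""
    ll = 0.0
    for t, c in zip(times, events):
        if c:
            ll += math.log(lam) - lam * t   # density term for a failure
        else:
            ll += -lam * t                  # survival term for a censored time
    return ll
```

Maximizing this gives the familiar MLE (number of failures) / (total time at risk), illustrating how censored times enter only through their survival factors.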
“Reduced ranks”, or marginal distribution of ranks, concept – individual 3 is here censored

[figure: timeline of the four individuals, failures marked x, the censoring of individual 3 marked O]

Compatible ranks for uncensored data – the single “reduced ranks” outcome:
2, 3, 4, 1
2, 4, 3, 1
2, 4, 1, 3

Partial likelihood, as a function of the data, provides the distribution of these reduced ranks
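Under our reading of this example (individuals 2, 4, 1 fail in that order, and individual 3 is censored after the first failure), the compatible full orderings can be enumerated by brute force. This sketch and its function name are ours:

```python
from itertools import permutations

def compatible_orders(failure_order, censored_id, failures_before_censoring):
    """All full failure orders consistent with the censored data: observed
    failures keep their relative order, and the censored individual can fail
    only after the failures that preceded its censoring time."""
    everyone = failure_order + [censored_id]
    orders = []
    for p in permutations(everyone):
        observed = [x for x in p if x != censored_id]
        if observed != failure_order:
            continue  # must preserve the observed failure order
        if p.index(censored_id) >= failures_before_censoring:
            orders.append(p)
    return sorted(orders)
```

Called as compatible_orders([2, 4, 1], 3, 1), it returns exactly the three compatible orders listed above; the "reduced ranks" outcome is this set, and the partial likelihood is its probability.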
Thus with our conditioning and convention, and no direct dependence of censoring on covariates, the K&P result yields that the partial likelihood is the actual likelihood for the “reduced rank” data
Means that all the theory of higher-order likelihood inference applies to partial likelihood (subject to minor issues of discreteness) --- a more general argument exists for the data-production reference set
Higher-order asymptotics depend only on certain covariances of scores and loglikelihood
Either exact or asymptotic results can in principle be computed from the K&P result, but simulation is both simpler and more computationally efficient
Simulation for asymptotics is considerably simpler than for exact results (no need to fit models for each trial), but many will prefer the latter when it is not problematic
SIMULATION OF P-VALUES:
With conditioning, one may: (i) simulate failure times using constant baseline hazard since only the ranks matter (ii) apply censoring process to the rank data, and (iii) fit the two models
Our primary aim is to lay out assumptions justifying (i) and (ii). (comments please!!)
Highly tractable, except that the null and alternative models must be fitted for each trial
Quite often must allow for “infinite” MLEs, but even with this can be problematic for small samples
Primary advantage over asymptotics is the transparency
Stata procedure uses same syntax as the ordinary fitting routine, takes about a minute for 5,000 trials
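A pure-Python sketch (ours, not the Stata/R routines) of steps (i) and (ii) in the simplest case: one covariate, no nuisance parameters, and the score statistic at beta = 0 standing in for the likelihood-ratio fit of step (iii). Sampling failure times from a constant-hazard exponential means only ranks matter, so under the null each failure is uniform on the current risk set, and censoring is applied to the ranks via the configuration q.

```python
import random

def one_trial_score(z, q, rng):
    """One simulated trial: failures as if from Exp(1) (only ranks matter,
    so each failure is uniform on the current risk set), with q[j]
    individuals censored at random after the jth failure.
    Returns the partial-likelihood score statistic at beta = 0."""
    at_risk = list(range(len(z)))
    rng.shuffle(at_risk)                 # random order => random censoring draws
    del at_risk[:q[0]]                   # q_0 censored before any failure
    u = 0.0
    for qj in q[1:]:
        zbar = sum(z[j] for j in at_risk) / len(at_risk)
        i = at_risk.pop(rng.randrange(len(at_risk)))  # the next failure
        u += z[i] - zbar                 # score contribution at beta = 0
        del at_risk[:qj]                 # censor q_j at random from the risk set
    return u

def simulated_pvalue(z, q, observed_u, trials=5000, seed=1):
    """One-sided P-value: fraction of trials with score >= observed."""
    rng = random.Random(seed)
    hits = sum(one_trial_score(z, q, rng) >= observed_u for _ in range(trials))
    return hits / trials
```

With nuisance parameters one would instead refit both models per trial, as the slide says; this stripped-down version is only meant to show how the exponential sampling and rank-based censoring interact.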
SECOND-ORDER METHODS: This is for inference about scalar functions of the RR. It involves the quantity proposed by Barndorff-Nielsen,

$r^* = r + \mathrm{adj}$

where r is the signed-root maximum LR, and adj involves more than the likelihood function.
Insight into limitations of first-order methods derives from decomposing this adjustment as

$\mathrm{adj} = NP + INF$

where NP allows for fitting nuisance parameters and INF basically allows for moving from likelihood to frequency inference.
Generally, INF is only important for fairly small samples, but NP can be important for reasonable amounts of data when there are several nuisance parameters.
COMPUTATION OF THIS: Will not give (the fairly simple) formulas here, but they involve computing covariances of the form

$\mathrm{cov}_{\theta_0}\{\ell_\theta(\theta_0),\, \ell_\theta(\theta_1)\}$, $\qquad \mathrm{cov}_{\theta_0}\{\ell_\theta(\theta_0),\, \ell(\theta_1) - \ell(\theta_0)\}$

where the parameters $(\theta_0, \theta_1)$ are then evaluated at the constrained and full MLEs (formulas: Pierce & Bellio, Bka 2006, 425)
These must be computed by simulation, raising the same issues about reference sets, but this is far easier than the simulation of likelihood ratios
Quantities above pertain to statistical curvature, and at least in our setting the magnitude and direction of the NP adjustment relate to the extent and direction of the curvature introduced by variation in composition of risk sets
RISK SETS AS FIXED: Things simplify considerably for the inferential reference set where the risk sets are taken as fixed (and experiments on these as independent)
Use of this reference set often seems necessary when the risk sets arise in complex ways; it is mainly useful for inference about relative risk beyond the analysis of simple response-time data
It is also quite adequate for all needs when the numbers at risk are large in relation to the number of failures (rare events).
FORMULAS FOR FIXED RISK SETS: In this case the setting is one of independent multinomial experiments defined on the risk sets. Following is for loglinear RR
Formulas of Pierce & Peters 1992 JRSS-B apply, yielding

$INF = r^{-1}\log(w/r)$, $\qquad NP = r^{-1}\log(\rho^{1/2})$

where w is the Wald statistic, and $\rho$ is the ratio of determinants of the nuisance parameter information at the full and constrained MLEs
May be useful in exploring for what settings the NP adjustment is important: nuisance parameter information must “vary rapidly” with the value of the interest parameter
However, these adjustments are smaller than for our other reference sets
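One plausible reading of the Pierce & Peters formulas on this slide is INF = log(w/r)/r and NP = log(sqrt(rho))/r; a minimal sketch under that assumption (the reconstruction and the function names are ours, and should be checked against the 1992 paper):

```python
import math

def fixed_risk_set_adjustments(r, w, rho):
    """NP and INF pieces of adj in r* = r + adj for fixed risk sets,
    under the assumed forms INF = log(w/r)/r and NP = log(sqrt(rho))/r.
    r: signed-root LR statistic (taken > 0 here); w: Wald statistic;
    rho: ratio of nuisance-parameter information determinants
    at the full and constrained MLEs."""
    inf = math.log(w / r) / r
    np_adj = math.log(math.sqrt(rho)) / r
    return np_adj, inf

def r_star(r, w, rho):
    """Second-order corrected statistic r* = r + NP + INF."""
    np_adj, inf = fixed_risk_set_adjustments(r, w, rho)
    return r + np_adj + inf
```

Note the sanity check built into the form: when w = r and rho = 1 both adjustments vanish and r* = r, i.e. first-order inference is already adequate.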
SAME AS FIRST EXAMPLE (5 nuisance parameters) BUT WITH:
n = 500 with 97% random censoring (fewer failures than before, namely about 15) – rare disease case
Remainder of model specification as in first example, results when Wald P-value is 0.05
Typical results for a single dataset, lower limits:
LR first order                            0.057
Data-production refset                    0.059
Conditional, exact (direct simulation)    0.054
Conditional, second-order                 0.054
Fixed risk sets, exact (simulation)       0.055
Fixed risk sets, 2nd order                0.052
OVERALL RECOMMENDATIONS:
1. Seems that adjustments will usually be small, but it is at least worthwhile to verify this in many instances when convenient enough.
2. Will provide routines in Stata and R. The Stata one largely uses same syntax as basic fitting command.
3. When failures are a substantial fraction of those at risk, use conditional simulation of P-values unless problems with fitting are encountered
4. If those problems are likely or encountered, then use the 2nd-order methods. These also provide more insight.
5. When failures are a small fraction of those at risk, or when risk sets arise in some special way, use the asymptotic fixed risk set calculations