we’ll now look at the relationship between a survival variable y and an explanatory variable x;...


Upload: jessie-turner

Post on 19-Jan-2016


TRANSCRIPT

Page 1: We’ll now look at the relationship between a survival variable Y and an explanatory variable X; e.g., Y could be remission time in a leukemia study and

• We’ll now look at the relationship between a survival variable Y and an explanatory variable X; e.g., Y could be remission time in a leukemia study and X could be white blood cell count. X is sometimes called the covariate or the regressor variable.

• Often there is more than one explanatory variable, so we write $X^T = (X_1, \ldots, X_p)$ when there are p explanatory variables (T = transpose). We write $Y_x$ for the response Y when X = x.

• Def 8.1: Let $Y_x$ denote the response depending on an observed vector X = x. A proportional hazards model for $Y_x$ is $h_x(y) = h_0(y)\,g_1(x)$, where $g_1$ is a positive function of x and $h_0(y)$ is called the baseline hazard and represents the hazard function for an individual having $g_1(x) = 1$. Often $g_1(x) = \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$.

Page 2

• Note how the “proportional” enters the picture (see p. 144 for definitions):

$$\frac{h_{X_1}(y)}{h_{X_2}(y)} = \frac{h_0(y)\,g_1(X_1)}{h_0(y)\,g_1(X_2)} = \frac{g_1(X_1)}{g_1(X_2)}$$

• The two hazards are for two different individuals, distinguished by the values the explanatory variables take on for them. Note that the “baseline” hazard cancels out.

• In the simplest case, we work with $g_1(x) = \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$. It satisfies $g_1(x) > 0$ and $g_1(0) = 1$, so the baseline hazard occurs when x = 0. Fitting this model follows the usual process of finding the best estimates of the beta values…
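With the exponential choice of $g_1$, the hazard ratio for two individuals can be written out explicitly. The step below is a short worked sketch; the double subscripts $x_{1j}$ and $x_{2j}$ (covariate j for individuals 1 and 2) are notation introduced here for illustration only.

```latex
\[
\frac{h_{x_1}(y)}{h_{x_2}(y)}
  = \frac{\exp(\beta_1 x_{11} + \cdots + \beta_p x_{1p})}
         {\exp(\beta_1 x_{21} + \cdots + \beta_p x_{2p})}
  = \exp\bigl(\beta_1 (x_{11} - x_{21}) + \cdots + \beta_p (x_{1p} - x_{2p})\bigr)
\]
```

The ratio depends only on the differences in covariate values and on the betas, not on the time y.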

Page 3

• Then the standard proportional hazards model in Def. 8.1 becomes:

$$h_x(y) = h_0(y)\exp(\beta_1 x_1 + \cdots + \beta_p x_p)$$

• Then the baseline hazard is when x = 0 (all covariates = 0).

• We’ll then estimate the betas using the given responses and covariates…

• NOTE: The hazard on the left equals the product of two functions: the baseline hazard (which doesn’t involve the covariates) and the other factor (which doesn’t involve the survival time y).

• This is called the Cox proportional hazards model. Good estimates of the betas and of the hazard and survival curves can be obtained in many different situations; i.e., this model is very robust. It is called semiparametric since we don’t have to assume a particular parametric model for the baseline hazard (and hence for the survival function).

Page 4

• Let’s look at Example 8.1, where there is only one covariate, namely “group” (usually control and experimental are the only two values). The proportional hazard (or the hazard ratio) is

$$\frac{h_{X=1}(y)}{h_{X=0}(y)} = \frac{h_0(y)\exp(\beta \cdot 1)}{h_0(y)\exp(\beta \cdot 0)} = \exp(\beta - 0) = \exp(\beta)$$

• So, if we could get an estimate of $\beta$ (call it $\hat\beta$), we would have an estimate of the hazard ratio between two individuals in the two groups, namely $\exp(\hat\beta)$, so we could say that

$$h_{X=1}(y) = \exp(\hat\beta)\,h_{X=0}(y)$$

Page 5

• Note on page 145 in (8.3) that the proportional hazards model has a so-called “power” effect on the baseline survival function:

$$S_x(y) = [S_0(y)]^{\exp(\beta_1 x_1 + \cdots + \beta_p x_p)}$$

• Here

$$S_0(y) = \exp\left(-\int_0^y h_0(u)\,du\right)$$

• Example 8.1 shows the effect of a single covariate X = group:

$$S_1(y) = [S_0(y)]^{\exp(\beta)}$$

• Notice also that the ratio of two hazards cancels out the baseline hazard and leaves a function that is constant over time.
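The “power” relationship can be checked in a few lines. This is a sketch that assumes the usual identity between a survival function and its hazard, $S(y) = \exp(-\int_0^y h(u)\,du)$, holds for every covariate value x and not only for the baseline.

```latex
\begin{align*}
S_x(y) &= \exp\left(-\int_0^y h_x(u)\,du\right)
        = \exp\left(-g_1(x)\int_0^y h_0(u)\,du\right) \\
       &= \left[\exp\left(-\int_0^y h_0(u)\,du\right)\right]^{g_1(x)}
        = [S_0(y)]^{g_1(x)}
\end{align*}
```

Setting $g_1(x) = \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$ gives the form quoted from (8.3).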

Page 6

• SAS has a procedure that easily estimates the betas in the proportional hazards model - for example, in the remission times data:

proc phreg;
  model remtime*censor(0)=grp;
run;
/* or if we put a second covariate in */
proc phreg;
  model remtime*censor(0)=grp logWBC;
run;
/* note the use of the numeric variable grp defined as grp=1 if group="pl" and 0 otherwise… */
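The numeric indicator grp mentioned in the comment has to exist before proc phreg is called. A minimal sketch of one way to build it follows. It assumes the input data set is named remission and holds a character variable group with placebo patients coded "pl"; the output name remission2 is hypothetical.

```sas
/* Sketch: build the 0/1 indicator grp from the character variable group.
   Assumed names: remission (input), group (character), remission2 (output). */
data remission2;
  set remission;
  grp = (group = "pl");   /* a comparison evaluates to 1 if true, 0 otherwise */
run;

/* Model 1 fitted on the data set that now contains grp */
proc phreg data=remission2;
  model remtime*censor(0) = grp;
run;
```

Note that a missing or differently cased value of group would be coded as grp=0 by this comparison, so the coding should be checked against the raw data first.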

Page 7

• Now let’s consider the remission data example in more detail…get the SAS output for the 3 models:
– grp only (model 1)
– grp and logWBC (model 2)
– grp, logWBC, and interaction term grp*logWBC (model 3)

• For each model, we’ll do three things:
– do a statistical test of the null hypothesis beta=0
– get an estimate of the hazard ratio for each beta
– get a 95% confidence interval for the hazard ratio

• There are two statistics we can compute to do a significance test of the betas:
– the Wald statistic is the estimator (beta-hat) divided by the standard error of the estimator. Under the null hypothesis this statistic is approximately standard normal, and the p-value is obtained from the normal table.

Page 8

– the second statistic is the so-called likelihood ratio (LR) statistic and is used to compare the models

• Use these statistics to compare model 3 with model 2; i.e., is the interaction term significant?
– the Wald statistic is -.342/.520 = -.66. The null hypothesis being tested is that beta=0 (for the coefficient of the interaction term). Use the normal table to see that 2*P(Z<-.66)=2(.2554)=.5108
– the LR statistic is computed as the difference between the -2 log L values (written LR here) of the two models, LR(model 2) - LR(model 3) = 144.559 - 144.131 = .428. Now consider this as chi-square with 1 d.f. (one parameter difference between the two models) under the null hypothesis that the interaction term has coefficient zero, and we have P(chisq(1) > .428) = .513
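Both p-values can be reproduced without tables. The data step below is a sketch that plugs in the numbers quoted above and uses the SAS functions PROBNORM (standard normal CDF) and PROBCHI (chi-square CDF); the variable names are illustrative.

```sas
/* Sketch: p-values for the interaction term in model 3 */
data _null_;
  /* Wald test: estimate divided by its standard error */
  z      = -0.342 / 0.520;
  p_wald = 2 * probnorm(-abs(z));   /* two-sided normal p-value */

  /* LR test: difference of the two -2 log L values, 1 d.f. */
  lr   = 144.559 - 144.131;
  p_lr = 1 - probchi(lr, 1);        /* upper-tail chi-square probability */

  put z= p_wald= lr= p_lr=;         /* results are written to the SAS log */
run;
```

The two p-values should land close to the .5108 and .513 quoted above, and the two tests agree that the interaction term is not significant.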

Page 9

• Notice that in each of the three printouts, there is a section giving values of three test statistics testing the so-called “Global Null Hypothesis: BETA=0”. In this case, BETA=0 refers to the vector of all the betas:

$$\beta_1 = \beta_2 = \cdots = \beta_p = 0$$

The likelihood ratio chi-square statistic is obtained by subtracting the two -2LOG(L) statistics (the one without covariates {no x’s} minus the one with covariates). If the null hypothesis is true, then this chi-square will have d.f. equal to the number of covariates in the model.

• This same difference in -2 log(likelihoods) can be used to compare any two nested models: the statistic is chi-square with d.f. equal to the difference in the number of covariates, assuming the null hypothesis that the “extra” betas = 0 is true.
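In symbols, the comparison of two nested models can be sketched as follows. The labels $L_{\text{reduced}}$, $L_{\text{full}}$, and q are notation introduced here and do not appear in the SAS output.

```latex
\[
\mathrm{LR} = \bigl[-2\log L_{\text{reduced}}\bigr] - \bigl[-2\log L_{\text{full}}\bigr]
\;\sim\; \chi^2_{q} \quad \text{under } H_0,
\qquad q = \text{number of extra betas in the full model}
\]
```

The global test is the special case where the reduced model has no covariates, so that q = p.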

Page 10

• Now let’s look at the HRs in each of the three models…

• In model 1, the HR is estimated to be 4.523 (from SAS). Let’s see how this is done… we’ve seen that

$$\frac{h_{X=1}(y)}{h_{X=0}(y)} = \frac{h_0(y)\exp(\beta \cdot 1)}{h_0(y)\exp(\beta \cdot 0)} = \exp(\beta - 0) = \exp(\beta)$$

so if X=1 is the placebo group and the maximum likelihood estimate of beta is 1.50919 (from SAS), then exp(1.50919) = 4.523066 is the estimated hazard ratio. This means that the hazard for an individual in the placebo group is about 4.5 times the hazard for an individual in the treatment group (at all times), ignoring logWBC.

Page 11

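• Consider Model 2’s hazard ratios…

$$\frac{h_{X=1,\,\mathrm{logWBC}}(y)}{h_{X=0,\,\mathrm{logWBC}}(y)} = \frac{h_0(y)\exp(1.29405(1) + 1.60432\,\mathrm{logWBC})}{h_0(y)\exp(1.29405(0) + 1.60432\,\mathrm{logWBC})} = \exp(1.29405) = 3.647529$$

and

$$\frac{h_{X=1,\,\mathrm{logWBC}+1}(y)}{h_{X=1,\,\mathrm{logWBC}}(y)} = \frac{h_0(y)\exp(1.29405(1) + 1.60432(\mathrm{logWBC}+1))}{h_0(y)\exp(1.29405(1) + 1.60432(\mathrm{logWBC}))} = \exp(1.60432) = 4.974476$$

If we had a significant interaction term, the estimated HR would depend on logWBC:

$$\frac{h_{X=1,\,\mathrm{logWBC}+\mathrm{int}}(y)}{h_{X=0,\,\mathrm{logWBC}+\mathrm{int}}(y)} = \frac{h_0(y)\exp(2.35494(1) + 1.80279(\mathrm{logWBC}) - 0.34220 \cdot 1 \cdot \mathrm{logWBC})}{h_0(y)\exp(2.35494(0) + 1.80279(\mathrm{logWBC}) - 0.34220 \cdot 0 \cdot \mathrm{logWBC})} = \exp(2.35494 - 0.34220\,\mathrm{logWBC})$$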
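To see what the dependence on logWBC means in practice, the interaction-model ratio can be evaluated at specific values. The two logWBC values below (2.93 and 3.93) are chosen here purely for illustration, and the interaction term was not found to be significant, so this is a sketch of the mechanics rather than a reported result.

```latex
\begin{align*}
\mathrm{logWBC} = 2.93:&\quad \exp(2.35494 - 0.34220 \times 2.93) = \exp(1.352294) \approx 3.87 \\
\mathrm{logWBC} = 3.93:&\quad \exp(2.35494 - 0.34220 \times 3.93) = \exp(1.010094) \approx 2.75
\end{align*}
```

Each one-unit increase in logWBC multiplies the group hazard ratio by exp(-0.34220), roughly 0.71, whereas in model 2 the group hazard ratio is the same at every logWBC.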

Page 12

• To get confidence intervals around the estimated HRs, we use beta-hat +/- 1.96 * SE(beta-hat) to get confidence intervals for the betas, then exponentiate the endpoints of each interval to get CIs for the HRs.

• To get the adjusted survival curves for the two groups (adjusted for the covariates - i.e., use model 2), we use the baseline statement in proc phreg

proc phreg;
  model remtime*censor(0)=grp logWBC;
  title "Model 2";
  baseline out=a survival=s upper=ucl lower=lcl;
proc print data=a;
run;
quit;
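The confidence-interval recipe for a hazard ratio can be carried out by hand in a data step. The sketch below uses the interaction-term estimate and standard error quoted earlier for model 3 (-.342 and .520); the variable names are illustrative.

```sas
/* Sketch: Wald 95% confidence interval for a beta and for its hazard ratio */
data _null_;
  beta_hat = -0.342;                /* estimate quoted for the interaction term */
  se       =  0.520;                /* its standard error */

  lower = beta_hat - 1.96 * se;     /* interval for beta */
  upper = beta_hat + 1.96 * se;

  hr       = exp(beta_hat);         /* point estimate of the hazard ratio */
  hr_lower = exp(lower);            /* exponentiate the endpoints */
  hr_upper = exp(upper);

  put lower= upper= hr= hr_lower= hr_upper=;   /* written to the SAS log */
run;
```

By hand, the interval for beta is (-1.3612, 0.6772), which exponentiates to roughly 0.26 to 1.97. It contains 1, in line with the non-significant Wald test for this term. PROC PHREG can also print confidence limits for the hazard ratios itself through the RISKLIMITS option on the MODEL statement.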

Page 13

• To get the adjusted survival curves for specific values of the covariates, first create a dataset with the values you want to consider and then use the covariates= option as follows:

…
data b;
  grp=1; logWBC=2.93;
run;
…
proc phreg data=remission;
  model remtime*censor(0)=grp logWBC;
  baseline out=a survival=s upper=ucl lower=lcl covariates=b / nomean;
proc print data=a;
run;
quit;
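Printing the output data set gives the numbers, but a plot is often easier to read. The following is a possible sketch using PROC SGPLOT. It assumes a SAS release that includes SGPLOT and that the data set a holds a single covariate pattern (one row in b), with the variables remtime, s, lcl, and ucl named as in the baseline statement.

```sas
/* Sketch: step plot of the adjusted survival estimate and its limits
   from the BASELINE output data set a */
proc sgplot data=a;
  step x=remtime y=s   / legendlabel='Survival estimate';
  step x=remtime y=lcl / legendlabel='Lower confidence limit';
  step x=remtime y=ucl / legendlabel='Upper confidence limit';
  xaxis label='Remission time';
  yaxis label='Estimated survival probability' min=0 max=1;
run;
```

If b contained several rows, the curves for the different covariate patterns would need to be separated, for example with a GROUP= option on each STEP statement.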