Simulation Study for Extended AUC in Disease Risk Prediction in Survival Analysis

Gang Cui, Sr. Biostatistician, CSCC, Department of Biostatistics, UNC at Chapel Hill


Post on 09-Jan-2017


Page 1

Simulation Study for Extended AUC in Disease Risk Prediction in Survival Analysis

Gang Cui, Sr. Biostatistician, CSCC, Department of Biostatistics, UNC at Chapel Hill

Page 2

Background

Risk prediction applications in public health:
Framingham Heart Study: risk assessment tool for estimating the 10-year risk of developing CHD
ARIC (Atherosclerosis Risk in Communities Study): CHD, stroke, and diabetes risk calculator at CSCC

Risk prediction models are useful in intervention trials, in estimating the population burden of disease, and in designing prevention strategies.

The models need to be evaluated so that they can incorporate new data (clinical, environmental, and genetic).

Page 3

Introduction

The Cox proportional hazards model accounts for both censored and uncensored data.

Assumptions: the survival time of each member of a population is assumed to follow its own hazard function

$$\lambda_i(t) = \lambda(t \mid X_i) = \lambda_0(t)\exp(X_i'\beta),$$

where $\lambda_0(t)$ is a baseline hazard function, $X_i$ is the vector of explanatory variables for the i-th individual, and $\beta$ is the vector of unknown regression coefficients. The vector $\beta$ is assumed to be the same for all individuals.

The survivor function can be expressed as

$$S(t \mid X_i) = [S_0(t)]^{\exp(X_i'\beta)},$$

where $S_0(t) = \exp\left(-\int_0^t \lambda_0(u)\,du\right)$ is the baseline survival function.

SAS procedure: PROC PHREG
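As a small numerical illustration of the survivor function above, the following sketch evaluates S(t | X) under a constant baseline hazard. The constant hazard of 0.1, the coefficient 0.5, and the function name `cox_survival` are illustrative assumptions and do not come from the slides.

```python
import math

def cox_survival(t, x, beta, baseline_hazard=0.1):
    """Return S(t | x) = S0(t) ** exp(x'beta).

    Assumes a constant baseline hazard (hypothetical value), so that
    S0(t) = exp(-baseline_hazard * t).
    """
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    baseline_survival = math.exp(-baseline_hazard * t)
    return baseline_survival ** math.exp(linear_predictor)

# Survival at t = 10 for a reference subject and for a subject one unit higher in x.
s_reference = cox_survival(10.0, [0.0], [0.5])
s_higher_risk = cox_survival(10.0, [1.0], [0.5])
print(s_reference, s_higher_risk)
```

With a positive coefficient, the subject with the larger covariate value has the lower survival probability at every time point.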

Page 4

Introduction (Cont.)

However, a statistically significant association is NOT enough.

A measure is needed to assess and quantify the improvement in risk prediction by new models.

Measures of interest in evaluating a prediction model:
AUC (Area Under the ROC Curve; ROC stands for Receiver Operating Characteristic)
Extended AUC
Sensitivity/Specificity given $X_i'\beta$
Corr($X'\beta$, T)
Others

Page 5

Introduction (Cont.)

AUC(T) (area under the ROC curve at time T) is a measure of accuracy: the probability that a person with disease onset has a higher score than a person without such onset,

$$P(Z_i > Z_j \mid D_i(T) = 1,\ D_j(T) = 0),$$

where D(T) is the indicator of disease onset by time T and $Z = X'\beta$. (Note: AUC is usually defined without regard to T. The present estimator is calculated for any follow-up time T and accounts for censoring in the estimation.)

| AUC(T) | Prediction accuracy |
|---|---|
| 1 | perfect |
| 0.9-1 | excellent |
| 0.8-0.9 | good |
| 0.7-0.8 | fair |
| 0.6-0.7 | poor |
| 0.5-0.6 | fail |
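The pairwise probability behind AUC(T) can be sketched as a direct count over case/control pairs. This is a simplified illustration that assumes every subject is followed to the horizon without censoring, so it is not the censoring-adjusted estimator mentioned in the note above. The function name and the half credit for tied scores are assumptions.

```python
def auc_at_time(scores, event_times, horizon):
    """Share of (case, control) pairs in which the case has the higher score.

    Cases have an event by `horizon`; controls are still event free.
    Assumes no censoring before `horizon` (a simplification).
    """
    cases = [z for z, t in zip(scores, event_times) if t <= horizon]
    controls = [z for z, t in zip(scores, event_times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for z_case in cases:
        for z_control in controls:
            if z_case > z_control:
                concordant += 1.0
            elif z_case == z_control:
                concordant += 0.5  # tie convention, an assumption
    return concordant / (len(cases) * len(controls))

# Two cases (events at 3 and 8) and two controls (events at 12 and 20).
auc_10 = auc_at_time([2.0, 1.5, 0.3, -0.4], [3, 12, 8, 20], horizon=10)
print(auc_10)  # 0.75
```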

Page 6

Introduction (Cont.)

Our parameters of interest:

Extended AUC: the probability that the i-th person's score Z is greater than the j-th person's, given that the i-th person has event onset before a certain time T0 (the time of interest, say 10 years) and earlier than the j-th person,

$$P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0)$$

Correlation coefficient between the score and the event time, given that the event time is less than the time of interest (T0),

$$CORR(Z, T \mid T < T_0) = \frac{COV(Z, T \mid T < T_0)}{\sqrt{VAR(Z \mid T < T_0)\, VAR(T \mid T < T_0)}}$$

Page 7

Question

I. How to estimate Extended AUC and CORR(Z,T|T<T0)?
Two estimators for Extended AUC: the counting method and the survival analysis method.
One estimator for CORR(Z,T|T<T0).

II. How good are the estimators?
Answered by comparing the estimates with the true values from simulated data.

Page 8

Methods

Simulate data based on three model assumptions about the independent variable(s) and the event time distribution.

For each simulated data set, we calculate the estimates of Extended AUC and Corr(Z,T|T<T0) and the true values.

| Model assumption | Independent variable(s) | Event time distribution |
|---|---|---|
| Assumption 1 | X ~ N(0,1) | Exponential |
| Assumption 2 | X ~ N(0,1) | Weibull |
| Assumption 3 | X1 normal conditional on X2, with mean varying by X2; X2 binomial | Exponential |
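The three data-generating assumptions can be sketched with inverse-transform sampling from a proportional hazards model. The slides do not report the regression coefficients, baseline rate, Weibull shape, binomial probability, or censoring mechanism, so every numeric value below (and the function name `simulate`) is a hypothetical choice, and no censoring is generated.

```python
import numpy as np

def simulate(assumption, n, rng, beta=0.5, base_rate=0.05, weibull_shape=1.5):
    """Draw scores Z = X'beta and event times T under one of three assumptions.

    All parameter values are illustrative assumptions, not the study's settings.
    """
    if assumption in (1, 2):
        x = rng.standard_normal(n)            # X ~ N(0, 1)
        z = beta * x
    elif assumption == 3:
        x2 = rng.binomial(1, 0.5, size=n)     # X2 binomial (single trial assumed)
        x1 = rng.normal(loc=0.5 * x2, scale=1.0)  # mean of X1 varies with X2
        z = beta * x1 + beta * x2
    else:
        raise ValueError("assumption must be 1, 2, or 3")
    # Exponential is the Weibull special case with shape 1.
    shape = weibull_shape if assumption == 2 else 1.0
    # Invert S(t | z) = exp(-(base_rate * t) ** shape * exp(z)) at a uniform draw.
    u = rng.uniform(size=n)
    t = (-np.log1p(-u) / np.exp(z)) ** (1.0 / shape) / base_rate
    return z, t

rng = np.random.default_rng(0)
z_sim, t_sim = simulate(1, 2000, rng)
print(z_sim[:3], t_sim[:3])
```

Using `log1p(-u)` keeps the draw finite even when the uniform variate is exactly zero.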

Page 9

Methods - Extended AUC

Estimating Extended AUC (θ): counting and survival analysis methods

I. Counting Method

$$\hat\theta = \hat P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0)$$

Denominator: the number of pairs in which one person had the event at time Ti before T0 and the other had event time Tj greater than Ti.

Numerator: the number of pairs among the denominator for which the order of Zi and Zj is the reverse of the order of Ti and Tj.
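The counting method described above can be sketched as a double loop over ordered pairs. The slides do not spell out how censored subjects enter the count; the sketch assumes that subject i must have an observed event before T0 and that any subject j whose observed time exceeds Ti is comparable, which is a convention borrowed from concordance indices rather than something stated in the source. The function and argument names are illustrative.

```python
def extended_auc_counting(scores, times, events, horizon):
    """Counting estimate of P(Z_i > Z_j | T_i < T_j, T_i < T0).

    events[i] is truthy when times[i] is an observed event time.
    """
    usable = 0
    concordant = 0
    n = len(scores)
    for i in range(n):
        # Subject i must have an observed event before the horizon.
        if not events[i] or times[i] >= horizon:
            continue
        for j in range(n):
            if j == i or times[j] <= times[i]:
                continue
            usable += 1                      # denominator pair
            if scores[i] > scores[j]:
                concordant += 1              # score order reverses time order
    if usable == 0:
        raise ValueError("no usable pairs before the horizon")
    return concordant / usable

theta_count = extended_auc_counting(
    scores=[1.2, 0.4, -0.3, 0.9],
    times=[2, 5, 12, 7],
    events=[1, 1, 0, 1],
    horizon=10,
)
print(theta_count)
```

In this toy input there are six usable pairs, five of which are concordant.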

Page 10

Methods - Extended AUC

II. Survival Analysis Method

$$P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0) = \frac{P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0)}{P(T_i < T_j,\ T_i < T_0)} = \frac{P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0)}{P(T_i < T_j \mid T_i < T_0)\, P(T_i < T_0)}$$

Denominator: it can be derived that

$$P(T_i < T_j \mid T_i < T_0)\, P(T_i < T_0) = \frac{1}{2}\,[1 - S^2(T_0)],$$

where

$$S(T_0) = \int S(T_0 \mid Z_i)\, h(Z_i)\, dZ_i$$

Numerator: it can be derived that

$$P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0) = \iint_{Z_i > Z_j} \int_0^{T_0} S(T_i \mid Z_j)\, g(T_i \mid Z_i)\, h(Z_i)\, h(Z_j)\, dT_i\, dZ_i\, dZ_j$$

S(T|Z): survival function conditional on Z.
g(T|Z): conditional density function of the event time.
h(Z): density function of $Z = X'\beta$.

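Page 11

Methods - Extended AUC

II. Survival Analysis Method (cont.)

From the Riemann-Stieltjes integral, the numerator can be rewritten as

$$P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0) = \iint_{Z_i > Z_j} \Big\{ \int_0^{T_0} S(T_i \mid Z_j)\, dG(T_i \mid Z_i) \Big\}\, h(Z_i)\, h(Z_j)\, dZ_i\, dZ_j,$$

where $G(T \mid Z) = 1 - S(T \mid Z)$, so that the increment of G over $(T_{d-1}, T_d]$ is $S(T_{d-1} \mid Z) - S(T_d \mid Z)$.

EXTAUC can be estimated as

$$\hat\theta = \frac{2}{1 - \hat S^2(T_0)} \cdot \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j \ne i}^{n} \Big\{ \sum_{d=1}^{k} \hat S(T_d \mid Z_j)\, [\hat S(T_{d-1} \mid Z_i) - \hat S(T_d \mid Z_i)] \Big\}\, I(Z_i > Z_j)$$

The h(Z)dZ integral is an expectation over Z and is estimated by averaging over the sample of Z.

$\hat S(T \mid Z)$ is the fitted survival function, $\hat S(T_0)$ is the average of $\hat S(T_0 \mid Z)$, and $I(Z_i > Z_j)$ is the indicator function with value 1 if Zi > Zj and 0 otherwise. n is the sample size and k is the total number of events before T0.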
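A minimal sketch of the survival analysis estimator follows. It takes the fitted survival function as a callable `surv(t, z)`; in practice this would come from a fitted Cox model (PROC PHREG in the slides), but here the known exponential survival function stands in for the fit, which is a simplification. The normalisation by n squared, the convention that the time before the first event is 0, and all names and parameter values are assumptions.

```python
import numpy as np

def extended_auc_survival(z, event_times, horizon, surv):
    """Survival-analysis estimate of the extended AUC.

    surv(t, z) must broadcast over NumPy arrays and return S(t | z).
    """
    z = np.asarray(z, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    n = z.size
    grid = np.sort(event_times[event_times < horizon])   # T_1 < ... < T_k
    grid0 = np.concatenate(([0.0], grid))                 # prepend T_0th = 0
    s = surv(grid0[:, None], z[None, :])                  # shape (k + 1, n)
    jump = s[:-1] - s[1:]        # S(T_{d-1} | Z_i) - S(T_d | Z_i), shape (k, n)
    # inner[i, j] = sum over d of jump[d, i] * S(T_d | Z_j)
    inner = jump.T @ s[1:]
    indicator = z[:, None] > z[None, :]                   # I(Z_i > Z_j), False on diagonal
    numerator = (inner * indicator).sum() / n ** 2
    s_bar = surv(horizon, z).mean()                       # average of S(T0 | Z)
    return 2.0 * numerator / (1.0 - s_bar ** 2)

def known_surv(t, score, base_rate=0.05):
    # Stand-in for a fitted model: exponential proportional hazards survival.
    return np.exp(-base_rate * t * np.exp(score))

rng = np.random.default_rng(2017)
z_demo = 0.5 * rng.standard_normal(300)
t_demo = -np.log1p(-rng.uniform(size=300)) / (0.05 * np.exp(z_demo))
theta_surv = extended_auc_survival(z_demo, t_demo, horizon=10.0, surv=known_surv)
print(theta_surv)
```

The matrix product replaces the innermost sum over event times, so the whole estimate needs only one n-by-n array.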

Page 12

Methods - Extended AUC

Calculating the true EXTAUC by numerical integration

Assuming the survival function S(T|Z), the event time density function g(T|Z), and the xbeta density function h(Z) are known as specified in the simulation:

$$\theta = \frac{\iint_{Z_i > Z_j} \int_0^{T_0} S(T_i \mid Z_j)\, g(T_i \mid Z_i)\, h(Z_i)\, h(Z_j)\, dT_i\, dZ_i\, dZ_j}{\frac{1}{2}\,[1 - S^2(T_0)]}$$

The integrals are evaluated by numerical integration.
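For the exponential case the inner time integral has a closed form, so the true value can be approximated with a simple grid over the standard normal covariate. This is a sketch under assumed parameter values (coefficient 0.5, baseline rate 0.05, horizon 10); it does not reproduce the true values reported in the results because the study's settings are not given. It assumes a positive coefficient so that the order of Z matches the order of X.

```python
import numpy as np

def true_extended_auc(beta=0.5, base_rate=0.05, horizon=10.0, n_grid=1201):
    """Grid approximation of the true extended AUC for exponential event times.

    Assumes X ~ N(0, 1), Z = beta * X with beta > 0, and hazard base_rate * exp(Z).
    """
    x = np.linspace(-8.0, 8.0, n_grid)
    dx = x[1] - x[0]
    h = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi) * dx   # h(Z) dZ as point masses
    rate = base_rate * np.exp(beta * x)
    ri, rj = rate[:, None], rate[None, :]
    # Closed form of the integral over (0, T0) of S(t | Z_j) g(t | Z_i) dt.
    inner = ri / (ri + rj) * (1.0 - np.exp(-(ri + rj) * horizon))
    # I(Z_i > Z_j): strictly lower triangle, with half weight on the diagonal cells.
    indicator = np.tril(np.ones((n_grid, n_grid)), k=-1) + 0.5 * np.eye(n_grid)
    numerator = h @ (inner * indicator) @ h
    s_marginal = h @ np.exp(-rate * horizon)                # S(T0)
    return numerator / (0.5 * (1.0 - s_marginal ** 2))

theta_true = true_extended_auc()
print(theta_true)
```

When the coefficient shrinks toward zero the score carries no information about event order and the value approaches 0.5, which gives a quick sanity check of the grid.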

Page 13

Methods - Corr(Z,T|T<T0)

Estimating $\rho_{Z,T \mid T<T_0}$: Survival Analysis Method

By definition,

$$\rho_{Z,T \mid T<T_0} = \frac{COV(Z, T \mid T<T_0)}{\sqrt{VAR(Z \mid T<T_0)\, VAR(T \mid T<T_0)}} = \frac{E(ZT \mid T<T_0) - E(Z \mid T<T_0)\, E(T \mid T<T_0)}{\sqrt{\{E(Z^2 \mid T<T_0) - E^2(Z \mid T<T_0)\}\,\{E(T^2 \mid T<T_0) - E^2(T \mid T<T_0)\}}}$$

The conditional joint density function can be derived as

$$f^*(Z, T \mid T<T_0) = \frac{g(T \mid Z)\, h(Z)}{G(T_0)}$$

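Page 14

Methods - Corr(Z,T|T<T0)

Estimating $\rho_{Z,T \mid T<T_0}$: Survival Analysis Method

We can now estimate all the pieces needed to calculate $\rho_{Z,T \mid T<T_0}$.

$$E(Z \mid T<T_0) = \int \int_0^{T_0} Z\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \int \int_0^{T_0} Z\, \frac{g(T \mid Z)\, h(Z)}{G(T_0)}\, dT\, dZ$$

$$= \frac{1}{G(T_0)} \int \Big[ \int_0^{T_0} g(T \mid Z)\, dT \Big]\, Z\, h(Z)\, dZ = \frac{1}{G(T_0)} \int [1 - S(T_0 \mid Z)]\, Z\, h(Z)\, dZ,$$

where

$$G(T_0) = P(T < T_0) = \int \int_0^{T_0} g(T \mid Z)\, h(Z)\, dT\, dZ, \qquad \hat G(T_0) = 1 - \hat S(T_0).$$

Therefore $E(Z \mid T<T_0)$ can be estimated as

$$\hat E(Z \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} [1 - \hat S(T_0 \mid Z_j)]\, Z_j$$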

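Page 15

Methods - Corr(Z,T|T<T0)

Estimating $\rho_{Z,T \mid T<T_0}$: Survival Analysis Method

Similarly,

$$\hat E(Z^2 \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} [1 - \hat S(T_0 \mid Z_j)]\, Z_j^2$$

For $E(T \mid T<T_0)$ and $E(T^2 \mid T<T_0)$:

$$E(T \mid T<T_0) = \int \int_0^{T_0} T\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \frac{1}{G(T_0)} \int \Big[ \int_0^{T_0} T\, g(T \mid Z)\, dT \Big]\, h(Z)\, dZ$$

$$\hat E(T \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i\, [\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)]$$

and similarly

$$\hat E(T^2 \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i^2\, [\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)]$$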

Page 16

Methods - Corr(Z,T|T<T0)

Estimating $\rho_{Z,T \mid T<T_0}$: Survival Analysis Method

Similarly,

$$\hat E(ZT \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i\, Z_j\, [\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)]$$

Therefore,

$$r_{Z,T \mid T<T_0} = \frac{\hat E(ZT \mid T<T_0) - \hat E(Z \mid T<T_0)\, \hat E(T \mid T<T_0)}{\sqrt{\{\hat E(Z^2 \mid T<T_0) - \hat E^2(Z \mid T<T_0)\}\,\{\hat E(T^2 \mid T<T_0) - \hat E^2(T \mid T<T_0)\}}}$$
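The five estimated moments and the resulting correlation can be assembled as in the sketch below. As with the extended AUC, the fitted survival function is passed in as a callable `surv(t, z)`, and the known exponential survival function stands in for a fitted Cox model. That substitution, the parameter values, and the function names are assumptions made for illustration.

```python
import numpy as np

def conditional_corr_survival(z, event_times, horizon, surv):
    """Estimate Corr(Z, T | T < T0) from a fitted survival function surv(t, z)."""
    z = np.asarray(z, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    n = z.size
    grid = np.sort(event_times[event_times < horizon])    # T_1 < ... < T_k
    grid0 = np.concatenate(([0.0], grid))
    s = surv(grid0[:, None], z[None, :])                   # shape (k + 1, n)
    jump = s[:-1] - s[1:]          # S(T_{i-1} | Z_j) - S(T_i | Z_j), shape (k, n)
    w = 1.0 - surv(horizon, z)     # 1 - S(T0 | Z_j)
    norm = n * w.mean()            # n * [1 - S(T0)]
    t_mass = grid @ jump           # sum over i of T_i * jump, one value per subject
    t2_mass = (grid ** 2) @ jump
    e_z = (w * z).sum() / norm
    e_z2 = (w * z ** 2).sum() / norm
    e_t = t_mass.sum() / norm
    e_t2 = t2_mass.sum() / norm
    e_zt = (z * t_mass).sum() / norm
    cov = e_zt - e_z * e_t
    return cov / np.sqrt((e_z2 - e_z ** 2) * (e_t2 - e_t ** 2))

def known_surv(t, score, base_rate=0.05):
    # Stand-in for a fitted model: exponential proportional hazards survival.
    return np.exp(-base_rate * t * np.exp(score))

rng = np.random.default_rng(2017)
z_demo = 0.5 * rng.standard_normal(500)
t_demo = -np.log1p(-rng.uniform(size=500)) / (0.05 * np.exp(z_demo))
r_hat = conditional_corr_survival(z_demo, t_demo, horizon=10.0, surv=known_surv)
print(r_hat)
```

A higher score implies a higher hazard and therefore earlier events, so the estimate is expected to be negative, in line with the sign of the correlations in the results table.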

Page 17

Methods - Corr(Z,T|T<T0)

Calculating the true $\rho_{Z,T \mid T<T_0}$ by numerical integration

Assuming the survival function S(T|Z), the event time density function g(T|Z), and the xbeta density function h(Z) are known as specified in the model assumptions:

$$\rho_{Z,T \mid T<T_0} = \frac{E(ZT \mid T<T_0) - E(Z \mid T<T_0)\, E(T \mid T<T_0)}{\sqrt{\{E(Z^2 \mid T<T_0) - E^2(Z \mid T<T_0)\}\,\{E(T^2 \mid T<T_0) - E^2(T \mid T<T_0)\}}}$$

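Page 18

Methods - Corr(Z,T|T<T0)

Calculating the true $\rho_{Z,T \mid T<T_0}$ by numerical integration

$$E(T \mid T<T_0) = \int \int_0^{T_0} T\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \frac{1}{G(T_0)} \int \Big[ \int_0^{T_0} T\, g(T \mid Z)\, dT \Big]\, h(Z)\, dZ$$

$$E(T^2 \mid T<T_0) = \int \int_0^{T_0} T^2\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \frac{1}{G(T_0)} \int \Big[ \int_0^{T_0} T^2\, g(T \mid Z)\, dT \Big]\, h(Z)\, dZ$$

$$E(ZT \mid T<T_0) = \int \int_0^{T_0} Z\, T\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \frac{1}{G(T_0)} \int \Big[ \int_0^{T_0} T\, g(T \mid Z)\, dT \Big]\, Z\, h(Z)\, dZ$$

$$E(Z \mid T<T_0) = \frac{1}{G(T_0)} \int [1 - S(T_0 \mid Z)]\, Z\, h(Z)\, dZ$$

$$E(Z^2 \mid T<T_0) = \int \int_0^{T_0} Z^2\, f^*(Z, T \mid T<T_0)\, dT\, dZ = \frac{1}{G(T_0)} \int [1 - S(T_0 \mid Z)]\, Z^2\, h(Z)\, dZ$$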
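For exponential event times the inner time integrals above have closed forms, so the true conditional correlation can be approximated with a one-dimensional grid over the covariate. The sketch assumes X ~ N(0,1), Z = beta * X, hazard base_rate * exp(Z), and hypothetical parameter values; it is not expected to match the true values in the results table, whose settings are not reported.

```python
import numpy as np

def true_conditional_corr(beta=0.5, base_rate=0.05, horizon=10.0, n_grid=4001):
    """Grid approximation of Corr(Z, T | T < T0) for exponential event times."""
    x = np.linspace(-8.0, 8.0, n_grid)
    h = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi) * (x[1] - x[0])  # h(Z) dZ
    z = beta * x
    r = base_rate * np.exp(z)
    a = r * horizon
    decay = np.exp(-a)
    m0 = 1.0 - decay                                        # integral of g dT = 1 - S(T0 | Z)
    m1 = (1.0 - decay * (1.0 + a)) / r                      # integral of T g dT
    m2 = (2.0 - decay * (a ** 2 + 2.0 * a + 2.0)) / r ** 2  # integral of T^2 g dT
    g_t0 = h @ m0                                           # G(T0) = P(T < T0)
    e_z = h @ (z * m0) / g_t0
    e_z2 = h @ (z ** 2 * m0) / g_t0
    e_t = h @ m1 / g_t0
    e_t2 = h @ m2 / g_t0
    e_zt = h @ (z * m1) / g_t0
    cov = e_zt - e_z * e_t
    return cov / np.sqrt((e_z2 - e_z ** 2) * (e_t2 - e_t ** 2))

rho_true = true_conditional_corr()
print(rho_true)
```

The coefficient must be nonzero here; with beta equal to zero the score has no variance and the correlation is undefined.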

Page 19

Result

Extended AUC: Survival Analysis Method

| Model assumption* | Sample size | n_sim | Mean of estimated EXTAUC | STD of estimated EXTAUC | True EXTAUC | Bias (= Estimate - True) |
|---|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.604900 | 0.057083 | 0.607174 | -0.002274 |
| 1 | 500 | 800 | 0.606580 | 0.025622 | 0.607174 | -0.000594 |
| 2 | 100 | 1000 | 0.603647 | 0.047803 | 0.606972 | -0.003325 |
| 2 | 500 | 1000 | 0.608658 | 0.021580 | 0.606972 | 0.001685 |
| 3 | 100 | 850 | 0.641936 | 0.048242 | 0.621508 | 0.020428 |
| 3 | 500 | 850 | 0.641398 | 0.022373 | 0.621508 | 0.019890 |

*Model assumption: 1 - X ~ N(0,1), T exponential; 2 - X ~ N(0,1), T Weibull; 3 - X1 normal with mean varying with X2, X2 binomial, T exponential.

Page 20

Result

Extended AUC: Counting Method

| Model assumption* | Sample size | n_sim | Mean of estimated EXTAUC | True EXTAUC | Diff (= Estimate - True) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.613986 | 0.607174 | 0.006812 |
| 1 | 500 | 800 | 0.608614 | 0.607174 | 0.001439 |
| 2 | 100 | 1000 | 0.612707 | 0.606972 | 0.005735 |
| 2 | 500 | 1000 | 0.610470 | 0.606972 | 0.003498 |
| 3 | 100 | 850 | 0.649875 | 0.621508 | 0.028367 |
| 3 | 500 | 850 | 0.642890 | 0.621508 | 0.021382 |

*Model assumption: 1 - X ~ N(0,1), T exponential; 2 - X ~ N(0,1), T Weibull; 3 - X1 normal with mean varying with X2, X2 binomial, T exponential.

Page 21

Result

Extended AUC: Comparison of two methods

| Model assumption* | Sample size | n_sim | Mean of estimated EXTAUC (Survival) | Mean of estimated EXTAUC (Counting) | Diff (= Survival - Counting) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.604900 | 0.613986 | -0.009086 |
| 1 | 500 | 800 | 0.606580 | 0.608614 | -0.002033 |
| 2 | 100 | 1000 | 0.603647 | 0.612707 | -0.009060 |
| 2 | 500 | 1000 | 0.608658 | 0.610470 | -0.001812 |
| 3 | 100 | 850 | 0.641936 | 0.649875 | -0.007939 |
| 3 | 500 | 850 | 0.641398 | 0.642890 | -0.001492 |

*Model assumption: 1 - X ~ N(0,1), T exponential; 2 - X ~ N(0,1), T Weibull; 3 - X1 normal with mean varying with X2, X2 binomial, T exponential.

Page 22

Result

Extended AUC: Comparison of two methods

| Model assumption* | Sample size | n_sim | STD (Survival) | Bias (Survival) | MSE (Survival) | STD (Counting) | Bias (Counting) | MSE (Counting) |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.057083 | -0.002274 | 0.057088 | 0.065914 | 0.006812 | 0.065960 |
| 1 | 500 | 800 | 0.025622 | -0.000594 | 0.025622 | 0.028968 | 0.001439 | 0.028970 |
| 2 | 100 | 1000 | 0.047803 | -0.003325 | 0.047814 | 0.060650 | 0.005735 | 0.060683 |
| 2 | 500 | 1000 | 0.021580 | 0.001685 | 0.021583 | 0.027041 | 0.003498 | 0.027054 |
| 3 | 100 | 850 | 0.048242 | 0.020428 | 0.048659 | 0.056477 | 0.028367 | 0.057282 |
| 3 | 500 | 850 | 0.022373 | 0.019890 | 0.022768 | 0.026539 | 0.021382 | 0.026996 |

Note: the reported MSE values equal STD + Bias^2 in every row; the conventional mean squared error is STD^2 + Bias^2.

*Model assumption: 1 - X ~ N(0,1), T exponential; 2 - X ~ N(0,1), T Weibull; 3 - X1 normal with mean varying with X2, X2 binomial, T exponential.
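The mean squared error of an estimator is conventionally its variance plus its squared bias. The sketch below recomputes it from the STD and bias columns reported for the two methods, as a worked arithmetic check. The helper name `mse` and the tuple layout are illustrative, and the numbers are copied from the comparison table.

```python
# (assumption, sample size, STD survival, bias survival, STD counting, bias counting)
reported = [
    (1, 100, 0.057083, -0.002274, 0.065914, 0.006812),
    (1, 500, 0.025622, -0.000594, 0.028968, 0.001439),
    (2, 100, 0.047803, -0.003325, 0.060650, 0.005735),
    (2, 500, 0.021580, 0.001685, 0.027041, 0.003498),
    (3, 100, 0.048242, 0.020428, 0.056477, 0.028367),
    (3, 500, 0.022373, 0.019890, 0.026539, 0.021382),
]

def mse(std, bias):
    """Conventional mean squared error: variance plus squared bias."""
    return std ** 2 + bias ** 2

for assumption, n, sd_s, bias_s, sd_c, bias_c in reported:
    print(
        f"assumption {assumption}, n={n}: "
        f"MSE survival={mse(sd_s, bias_s):.6f}, MSE counting={mse(sd_c, bias_c):.6f}"
    )
```

Under this definition the survival analysis method has the smaller MSE in every row, and the square root of each value is on the same scale as the estimates.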

Page 23

Result

Corr(Z,T|T<T0)

| Model assumption* | Sample size | n_sim | Mean of estimated Corr. | True Corr. | Bias (= Estimate - True) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | -0.032053 | -0.028371 | -0.003682 |
| 1 | 500 | 800 | -0.028618 | -0.028371 | -0.000246 |
| 2 | 100 | 1000 | -0.038438 | -0.035328 | -0.003110 |
| 2 | 500 | 1000 | -0.035660 | -0.035328 | -0.000333 |
| 3 | 100 | 850 | -0.059517 | -0.052803 | -0.006715 |
| 3 | 500 | 850 | -0.053330 | -0.052803 | -0.000527 |

*Model assumption: 1 - X ~ N(0,1), T exponential; 2 - X ~ N(0,1), T Weibull; 3 - X1 normal with mean varying with X2, X2 binomial, T exponential.

Page 24

Conclusions

I. The biases of these estimators are small relative to the estimates: at most 0.028 in absolute value for Extended AUC (estimates of about 0.60 to 0.65) and at most 0.007 in absolute value for Corr(Z,T|T<T0).

II. A larger sample size provides better estimates than a smaller sample size.

III. For Extended AUC, the difference between the estimates from the two methods is small relative to the estimate (less than 0.01 in absolute value), while the counting method is slightly more biased than the survival analysis method.