

Survival Analysis

Sources:
• Slides: Kristin Sainani, Stanford, http://www.stanford.edu/~kcobb
• Johnson and Shih, An Introduction to Survival Analysis. Principles and Practice of Clinical Research, 2nd ed. (2007)
• Rich et al., A practical guide to understanding Kaplan-Meier curves. Otolaryngology – Head and Neck Surgery (2010)

ABDBM © Ron Shamir





Overview

• Intro, terminology
• Survival/hazard functions
• Kaplan-Meier curves
• The log-rank test
• Cox PH


Early example of survival analysis, 1669

Christiaan Huygens' 1669 curve showing how many out of 100 people survive until 86 years.

From: Howard Wainer. Statistical Graphics: Mapping the Pathways of Science. Annual Review of Psychology, Vol. 52: 305-335.


What is survival analysis?
• Statistical methods for analyzing longitudinal data on the occurrence of an event.
• Possible events:
– death, injury, onset of disease, recovery from illness, recurrence-free survival for 5 years (binary variables)
– transition above or below a clinical threshold of a continuous variable (e.g., blood glucose level)
• Accommodates data from randomized clinical trials or cohort study designs.


Randomized Clinical Trial (RCT)

[Figure: a disease-free, at-risk cohort drawn from the target population is randomly assigned to intervention or control; each arm is followed over time and classified as disease or disease-free.]


[Figure: a patient population drawn from the target population is randomly assigned to treatment or control; each arm is followed over time and classified as cured or not cured.]



[Figure: a patient population drawn from the target population is randomly assigned to treatment or control; each arm is followed over time and classified as dead or alive.]



Cohort study (prospective/retrospective)

[Figure: a disease-free cohort from the target population is classified as exposed or unexposed; each group is followed over time and classified as disease or disease-free.]



Examples of survival analysis in medicine


RCT: Women’s Health Initiative (JAMA, 2002)

[Figure: cumulative incidence over time, on hormones vs. on placebo]

Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.


Breast cancer and low-fat diet

[Figure: low-fat diet vs. control]

Prentice et al. JAMA, February 8, 2006; 295: 629-642.


Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study

Curtis et al. BMJ 2003;327:1322-1323.

Why survival analysis?
1. Why not compare mean time-to-event between groups using a t-test or linear regression?
-- For some patients we do not know whether or when the event occurred, because the study ended or we lost touch with them, so their true event times are unavailable.
2. Why not compare the proportion of events in each group using risk/odds ratios or logistic regression?
-- This ignores when the events occurred.

Terminology
• The event of interest: the outcome sought
• Time-to-event: the time from entry into a study until a subject has the outcome
• Censoring: subjects are said to be censored if they are lost to follow-up, drop out of the study, or the study ends before they have the outcome. They are counted as alive / disease-free for the time they were enrolled in the study.
– Censoring must be assumed to be independent of the outcome; otherwise it will create bias.

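An example

[Figure: follow-up of each subject from study entry; solid circles: uncensored (event observed), open circles: censored]

Moving all start times to 0 gives a better view, but only if time homogeneity holds (the risk does not depend on when a subject entered the study).

Data of a hypothetical study (Johnson and Shih)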

Data
Two-variable outcome:
• ti = time of the last disease-free observation, or time of the event
• ci = 1 if the subject had the event; ci = 0 if there was no event by time ti
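As a minimal sketch, the (ti, ci) encoding can be built from per-subject follow-up records. The record layout (entry time, exit time, whether the event was observed) and the names `Subject` and `to_time_event` are hypothetical; times are measured from each subject's own entry, i.e., all start times are moved to 0.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    # Hypothetical record in calendar time (e.g., months since the study opened)
    entry: float      # when the subject entered the study
    exit: float       # time of the event, or of the last event-free observation
    had_event: bool   # True if the event was observed at `exit`


def to_time_event(subjects: List[Subject]) -> List[Tuple[float, int]]:
    """Return (t_i, c_i) pairs: t_i is follow-up time measured from entry
    (start times shifted to 0); c_i = 1 for an event, 0 for a censored subject."""
    pairs = []
    for s in subjects:
        if s.exit < s.entry:
            raise ValueError("exit time precedes entry time")
        pairs.append((s.exit - s.entry, 1 if s.had_event else 0))
    return pairs


# Three hypothetical subjects: the second is lost to follow-up,
# the third is still event-free when the study closes at month 24.
cohort = [Subject(0, 10, True), Subject(3, 9, False), Subject(6, 24, False)]
print(to_time_event(cohort))  # [(10, 1), (6, 0), (18, 0)]
```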


Survival function
• S(t): the probability that an individual survives at least until time t
• Usually unknown; estimated from a sample
• Survival experience: the empirical survival function


[Figure: cumulative survival]

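Probability density function: f(t)
T: the event time for an individual (a random variable)

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$

f(t) is the rate (probability per unit time) at which the event occurs at time t; for a continuous T, the probability of the event occurring at exactly t is 0.

F(t) = P(T ≤ t), the CDF of f(t)

S(t) = 1 - F(t)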

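The hazard function

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

The instantaneous rate of the event at time t, given survival to t: h(t)Δt is approximately the probability that someone who has survived to t will have the event in the next short interval Δt.

Hazard from density and survival: h(t) = f(t)/S(t), since

$$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \text{ and } T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$$

(definition of conditional probability, i.e., Bayes’ rule)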
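The relations between f(t), F(t), S(t) and h(t) can be checked numerically. This is a sketch assuming a Weibull event-time distribution (shape `K`, scale `LAM`, both chosen only for illustration and not taken from the slides), with a small finite step `DT` standing in for the limit.

```python
import math

# Assumed example distribution (not from the slides): Weibull with shape K, scale LAM
K, LAM = 1.5, 10.0
DT = 1e-6  # small finite step standing in for the limit dt -> 0


def survival(t: float) -> float:
    """S(t) = P(T >= t) = exp(-(t / LAM) ** K)."""
    return math.exp(-((t / LAM) ** K))


def cdf(t: float) -> float:
    """F(t) = 1 - S(t)."""
    return 1.0 - survival(t)


def density(t: float) -> float:
    """Analytic Weibull density f(t) = (K / LAM) * (t / LAM) ** (K - 1) * S(t)."""
    return (K / LAM) * (t / LAM) ** (K - 1) * survival(t)


def hazard_from_definition(t: float) -> float:
    """P(t <= T < t + DT | T >= t) / DT: the hazard definition with a finite DT."""
    return (survival(t) - survival(t + DT)) / (survival(t) * DT)


t = 7.0
f_numeric = (cdf(t + DT) - cdf(t)) / DT   # density as P(t <= T < t + dt) / dt
print(f_numeric, density(t))              # agree to several decimal places
print(hazard_from_definition(t), density(t) / survival(t))  # h(t) = f(t) / S(t)
```

For this Weibull choice the hazard works out to (K/LAM)(t/LAM)^(K-1), which increases with t because K > 1.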


[Figure: a possible set of probability density f(t), cumulative failure F(t), cumulative survival S(t), and hazard h(t) functions, plotted against age]

The Kaplan-Meier curve
Sorted event times t1 < t2 < … < tn.
No censoring: the fraction surviving past ti is (n-i)/n; equivalently, (n-i+1)/n are still event-free just before ti.
What to do when some subjects are censored?
Sorted event times t1 < t2 < … < tn; di = number of events in (ti-1, ti]; ni = number of individuals at risk (remaining in the study) just before ti.

$$\Pr(\text{surviving past } t_i) = \Pr(\text{surviving past } t_{i-1}) \times \Pr(\text{surviving } (t_{i-1}, t_i]) = \Pr(\text{surviving past } t_{i-1}) \times \frac{n_i - d_i}{n_i}$$

K-M or product-limit estimator:

$$\hat S(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$
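The product-limit recursion can be written directly. This is a minimal sketch, not any particular package's implementation; it assumes the common convention that a subject censored at exactly an event time is still counted as at risk at that time.

```python
from typing import List, Tuple


def kaplan_meier(data: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Product-limit estimate from (t_i, c_i) pairs (c_i = 1 event, 0 censored).

    Returns (event time, estimated S just after that time) per distinct event time.
    Subjects censored at time t are treated as still at risk at t.
    """
    event_times = sorted({t for t, c in data if c == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        n_at_risk = sum(1 for ti, _ in data if ti >= t)        # n_i
        d = sum(1 for ti, c in data if ti == t and c == 1)     # d_i
        surv *= (n_at_risk - d) / n_at_risk                    # S(t_i) = S(t_{i-1}) (n_i - d_i) / n_i
        curve.append((t, surv))
    return curve


# Hypothetical data: subjects censored at t = 3 and t = 7
data = [(2, 1), (3, 0), (4, 1), (5, 1), (7, 0), (8, 1)]
print([(t, round(s, 3)) for t, s in kaplan_meier(data)])
# [(2, 0.833), (4, 0.625), (5, 0.417), (8, 0.0)]
```

The censored subjects never cause a drop in the curve; they only shrink the risk set ni for later event times.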


K-M estimate and curve
• Non-parametric estimate of the survival function
• Empirical probability of surviving past certain times in the sample (taking censoring into account)
• Describes the survivorship of the study population(s)
• Commonly used to compare two study populations
• Intuitive graphical presentation

Paul Meier 1924-2011


Edward L. Kaplan

Comparing two survival curves
• Two methods:
– Compare the curves at a pre-specified time point t
– Compare the overall curves over the entire time range


[Figure: hormones vs. placebo; Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.]

Drawback of comparing at a single time point: the result depends on the choice of t, and there is a tendency to pick the “best” t.

Comparing two curves: the log-rank test
• H0: S1(t) = S2(t) for all t
• The log-rank test uses only the order (ranks) of the event times, not their actual values.
Sorted event times t1 < t2 < … < tK. For each time tj, form a 2×2 table:

            Events     Surviving   Total
Group 1     aj         bj          aj+bj
Group 2     cj         dj          cj+dj
Total       aj+cj      bj+dj       nj

Under H0, E(aj) = (total events) × (# at risk in group 1) / (# at risk) = (aj+cj)(aj+bj)/nj.
The test statistic Z is approximately standard normal under H0, which gives the p-value.
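A sketch of the two-group log-rank test built from the per-time 2×2 tables. The exact form of Z is not preserved in these notes; the code assumes the standard (Mantel-Haenszel) form, Z = Σj (aj - E(aj)) / sqrt(Σj Var(aj)), with the hypergeometric variance Var(aj) = (aj+bj)(cj+dj)(aj+cj)(bj+dj) / (nj²(nj - 1)). The function name `logrank_test` is hypothetical.

```python
import math
from typing import List, Tuple


def logrank_test(group1: List[Tuple[float, int]],
                 group2: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Two-group log-rank test on (t_i, c_i) pairs (c_i = 1 event, 0 censored).

    Returns (Z, two-sided p-value), Z = sum_j (a_j - E(a_j)) / sqrt(sum_j Var(a_j)).
    """
    labelled = [(t, c, 1) for t, c in group1] + [(t, c, 2) for t, c in group2]
    event_times = sorted({t for t, c, _ in labelled if c == 1})
    o_minus_e, var = 0.0, 0.0
    for tj in event_times:
        n1 = sum(1 for t, _, g in labelled if g == 1 and t >= tj)             # a_j + b_j
        n2 = sum(1 for t, _, g in labelled if g == 2 and t >= tj)             # c_j + d_j
        a = sum(1 for t, c, g in labelled if g == 1 and t == tj and c == 1)   # a_j
        events = sum(1 for t, c, _ in labelled if t == tj and c == 1)         # a_j + c_j
        n = n1 + n2                                                           # n_j
        o_minus_e += a - events * n1 / n                                      # a_j - E(a_j)
        if n > 1:
            # Hypergeometric variance of a_j given the margins of the 2x2 table
            var += events * (n - events) * n1 * n2 / (n * n * (n - 1))
    if var == 0:
        raise ValueError("no information to compare the groups")
    z = o_minus_e / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p


# Hypothetical data: group 1 has its events earlier than group 2
z, p = logrank_test([(1, 1), (2, 1)], [(3, 1), (4, 1)])
print(z, p)  # z = 7/sqrt(17), about 1.698
```

Only the event ordering and the risk-set counts enter the statistic, which is why the test depends on the ranks of the event times rather than their values.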

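Example: breast cancer survival signature (van de Vijver et al., NEJM 2002)
• Caveats:
– No mention of mean survival
– Visual inspection can be misleading
– Groups must be predefined in advance
• Only small numbers of patients remain at risk at the right end of the curves, so the estimates there are imprecise.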

Certain characteristics (age, sex, ...) can be related to survival: such confounders / prognostic factors can change the apparent relation of treatment to outcome.

The test should then be stratified, comparing survival differences within each level of these factors.

Cox Proportional Hazards Model
• K-M curves and the log-rank test are univariate analyses: they describe survival using one categorical factor.
• Cox PH allows many prognostic factors, categorical or real-valued.
• Semi-parametric: models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified.
• Estimates relative rather than absolute hazard.

The model

$$h_i(t) = \lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}$$

Components:
• A baseline hazard function λ0(t) that is left unspecified but must be positive (the hazard when all covariates are 0). It can take on any form!
• The exponential of a linear function of a set of k fixed covariates.

Taking logs:

$$\log h_i(t) = \log \lambda_0(t) + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$

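The model

$$HR_{i,j} = \frac{h_i(t)}{h_j(t)} = \frac{\lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}}{\lambda_0(t)\, e^{\beta_1 x_{j1} + \dots + \beta_k x_{jk}}} = e^{\beta_1 (x_{i1} - x_{j1}) + \dots + \beta_k (x_{ik} - x_{jk})}$$

Numerator: hazard for person i (e.g., a smoker). Denominator: hazard for person j (e.g., a non-smoker). Their ratio is the hazard ratio.

Proportional hazards: the baseline hazard cancels, so the hazard ratio does not depend on t and the log-hazard functions of any two individuals are strictly parallel. The model produces covariate-adjusted hazard ratios.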

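The model

$$HR = \frac{h_1(t)}{h_2(t)} = \frac{\lambda_0(t)\, e^{\beta x_1}}{\lambda_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$$

The point is to compare the hazard rates of individuals who have different covariates. Hence the name proportional hazards: the hazard functions should be strictly parallel (on the log scale).

For binary x: β is the log of the hazard ratio between the two categories, so e^β is the hazard ratio.
For numerical x: β is the log hazard ratio per unit increase in x (e.g., per year), so e^β is the factor by which the hazard is multiplied per unit.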
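The cancellation of λ0(t) can be illustrated numerically. In this sketch the baseline hazard `baseline` (0.001·t²) is purely hypothetical, chosen only to show that any positive baseline gives the same ratio at every t; the coefficient 0.11149 is the age estimate reported in the Framingham example of these notes.

```python
import math
from typing import Callable


def hazard(t: float, x: float, beta: float, baseline: Callable[[float], float]) -> float:
    """One-covariate Cox model: h(t) = lambda_0(t) * exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)


def hazard_ratio(beta: float, x_i: float, x_j: float) -> float:
    """HR of individual i vs. j: exp(beta * (x_i - x_j)); lambda_0(t) cancels."""
    return math.exp(beta * (x_i - x_j))


def baseline(t: float) -> float:
    """Hypothetical baseline hazard, chosen only for illustration."""
    return 0.001 * t ** 2


beta_age = 0.11149  # age coefficient (per year) from the Framingham example
print(round(hazard_ratio(beta_age, 61, 60), 3))  # 1.118: one additional year of age
print(round(hazard_ratio(beta_age, 70, 60), 2))  # about 3.05: ten additional years

# The ratio of the two hazards is the same at every t (proportional hazards)
for t in (1.0, 5.0, 9.0):
    print(t, hazard(t, 70, beta_age, baseline) / hazard(t, 60, beta_age, baseline))
```

For a numerical covariate, the per-unit factor e^β compounds: a 10-year difference multiplies the hazard by e^(10β), not by 10·e^β.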

Cox PH - computation
• The coefficients β1, …, βk can be estimated using numerical optimization (details not shown).
• For a large enough sample, the estimate of each βi is approximately normally distributed, so its p-value and confidence interval can be computed.
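The slides leave the optimization details out. One common approach, sketched here for a single covariate only, is Newton-Raphson on Cox's partial likelihood, with tied event times handled by the Breslow approximation; real software may differ in tie handling, convergence criteria and numerical safeguards. The standard error comes from the inverse observed information, and the p-value from a Wald test. The function name `fit_cox_1d` and the data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_cox_1d(data: List[Tuple[float, int, float]],
               max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Newton-Raphson fit of a one-covariate Cox model.

    data: (t_i, c_i, x_i) triples, c_i = 1 for an event and 0 for censored.
    Ties use the Breslow approximation: each event is compared with
    everyone still at risk (t_j >= t_i).
    Returns (beta_hat, standard error, two-sided Wald p-value).
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, c_i, x_i in data:
            if c_i != 1:
                continue  # censored subjects enter only through the risk sets
            risk_x = [x for t, _, x in data if t >= t_i]
            w = [math.exp(beta * x) for x in risk_x]
            s0 = sum(w)
            mean_x = sum(wi * x for wi, x in zip(w, risk_x)) / s0
            mean_x2 = sum(wi * x * x for wi, x in zip(w, risk_x)) / s0
            score += x_i - mean_x            # derivative of the log partial likelihood
            info += mean_x2 - mean_x ** 2    # minus its second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the last iteration (~ at beta_hat)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Hypothetical data: subjects with x = 1 tend to have earlier events
data = [(1, 1, 1), (2, 1, 1), (4, 1, 1), (3, 1, 0), (5, 1, 0), (6, 0, 0)]
beta, se, p = fit_cox_1d(data)
print(f"beta = {beta:.3f}, HR = {math.exp(beta):.2f}, SE = {se:.3f}, p = {p:.3f}")
```

The baseline hazard never appears in the computation: each event only compares its own covariate with the weighted mean covariate of its risk set, which is what makes the model semi-parametric.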


Example: Framingham heart study
• Cohort of 5,180 people aged 45-82, followed until death or for up to 10 years; 46% male, 402 deaths.

                          Die (n=402)    Do not die (n=4778)
Age, years, mean (SD)     65.6 (8.7)     56.1 (7.5)
Male, N (%)               221 (55%)      2145 (45%)

• Cox PH model with age and sex as factors:

Risk factor     Parameter estimate    P-value
Age, years      0.11149               0.0001
Male sex        0.67958               0.0001

• Both factors increase risk.
– Age: exp(0.11149) = 1.118, so the hazard is 11.8% higher per additional year of age, holding sex constant.
– Male: exp(0.67958) = 1.973, so men have 1.973 times the hazard of women, holding age constant.

http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/

Model with more covariates
• Significant factors have 95% CIs for the HR that do not include 1 (the null value).

Risk factor                Parameter estimate   P-value   Hazard ratio (HR)   95% CI for HR
Age, years                  0.11691             0.0001    1.124               (1.111-1.138)
Male sex                    0.40359             0.0002    1.497               (1.215-1.845)
Systolic blood pressure     0.01645             0.0001    1.017               (1.012-1.021)
Current smoker              0.76798             0.0001    2.155               (1.758-2.643)
Total serum cholesterol    -0.00209             0.0963    0.998               (0.995-1.000)
Diabetes                   -0.20366             0.1585    0.816               (0.615-1.083)
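The HR and confidence-interval columns follow from the parameter estimates. This sketch assumes the reported p-values come from a Wald test (z = β/SE), so the standard error can be recovered as |β|/z. It is applied to the diabetes row (HR 0.816, p = 0.1585); rows reported as p = 0.0001 are likely truncated and cannot be inverted this way. The function name `hr_ci_from_p` is hypothetical.

```python
import math
from statistics import NormalDist


def hr_ci_from_p(beta: float, p: float, level: float = 0.95):
    """Recover SE from a two-sided Wald p-value, then return (HR, lower, upper).

    Assumes p came from z = beta / SE, so SE = |beta| / z.
    """
    z_obs = NormalDist().inv_cdf(1 - p / 2)          # |z| giving this p-value
    se = abs(beta) / z_obs
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


# Diabetes row: beta recovered as the log of the reported HR
hr, lo, hi = hr_ci_from_p(math.log(0.816), 0.1585)
print(f"{hr:.3f} ({lo:.3f}-{hi:.3f})")  # compare with the table: 0.816 (0.615-1.083)
```

Because the interval is symmetric in β and then exponentiated, it is asymmetric around the HR, and it contains 1 exactly when the p-value exceeds 0.05.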


Example 1: Study of publication bias

[Figure: Kaplan-Meier curves of time to publication]

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-645 (13 September)

Table 4. Risk factors for time to publication using univariate Cox regression analysis

Characteristic           # not published   # published   Hazard ratio (95% CI)
Null                     29                23            1.00
Non-significant trend    16                4             0.39 (0.13 to 1.12)
Significant              47                99            2.32 (1.47 to 3.66)

Interpretation (univariate Cox regression): significant results have a more than 2-fold higher rate (hazard) of publication than null results.

Example 2: Study of mortality in Academy Award winners for screenwriting

[Figure: Kaplan-Meier curves]

From: Longevity of screenwriters who win an academy award: longitudinal study. BMJ 2001;323:1491-1496 (22-29 December)

Table 2. Death rates for screenwriters who have won an Academy Award. Values are percentages (95% confidence intervals) and are adjusted for the factor indicated.

Relative increase in death rate for winners:
Basic analysis                   37 (10 to 70)
Adjusted analysis
  Demographic:
    Year of birth                32 (6 to 64)
    Sex                          36 (10 to 69)
    Documented education         39 (12 to 73)
    All three factors            33 (7 to 65)
  Professional:
    Film genre                   37 (10 to 70)
    Total films                  39 (12 to 73)
    Total four star films        40 (13 to 75)
    Total nominations            43 (14 to 79)
    Age at first film            36 (9 to 68)
    Age at first nomination      32 (6 to 64)
    All six factors              40 (11 to 76)
  All nine factors               35 (7 to 70)

HR = 1.37; interpretation: 37% higher death rate for winners compared with nominees.

HR = 1.35; interpretation: 35% higher death rate for winners compared with nominees, even after adjusting for potential confounders.

http://bmj.bmjjournals.com/cgi/content-nw/full/323/7327/1491/

Sir David Cox
• Born 1924
• Cambridge, Imperial College London, Oxford
• Books:
– Planning of experiments (1958)
– Queues (Methuen, 1961). With Walter L. Smith
– Renewal theory (Methuen, 1962)
– The theory of stochastic processes (1965). With Hilton David Miller
– Analysis of binary data (1969). With Joyce E. Snell
– Theoretical statistics (1974). With D. V. Hinkley
– Point processes (Chapman & Hall/CRC, 1980). With Valerie Isham
– Applied statistics, principles and examples (Chapman & Hall/CRC, 1981). With Joyce E. Snell
– Analysis of survival data (Chapman & Hall/CRC, 1984). With David Oakes
– Asymptotic techniques for use in statistics (1989). With Ole E. Barndorff-Nielsen
– Inference and asymptotics (Chapman & Hall/CRC, 1994). With Ole E. Barndorff-Nielsen
– Multivariate dependencies, models, analysis and interpretation (Chapman & Hall, 1995). With Nanny Wermuth
– The theory of design of experiments (Chapman & Hall/CRC, 2000). With Nancy M. Reid
– Complex stochastic systems (Chapman & Hall/CRC, 2000). With Ole E. Barndorff-Nielsen and Claudia Klüppelberg
– Components of variance (Chapman & Hall/CRC, 2003). With P. J. Solomon
– Principles of Statistical Inference (Cambridge University Press, 2006). ISBN 978-0-521-68567-2
– Selected Statistical Papers of Sir David Cox (2-volume set)
– Principles of Applied Statistics (CUP). With Christl A. Donnelly

