Survival Analysis
Sources:
• Slides: Kristin Sainani, Stanford, http://www.stanford.edu/~kcobb
• Johnson and Shih, An Introduction to Survival Analysis, in Principles and Practice of Clinical Research, 2E (2007)
• Rich et al., A practical guide to understanding Kaplan-Meier curves, Otolaryngology – Head and Neck Surgery (2010)
ABDBM © Ron Shamir
Overview
• Intro, terminology
• Survival/hazard functions
• Kaplan-Meier curves
• The log-rank test
• Cox PH
Early example of survival analysis, 1669
Christiaan Huygens' 1669 curve showing how many out of 100 people survive until 86 years.
From: Howard Wainer- STATISTICAL GRAPHICS: Mapping the Pathways of Science. Annual Review of Psychology. Vol. 52: 305-335.
What is survival analysis?
• Statistical methods for analyzing longitudinal data on the occurrence of events.
• Possible events:
  – death, injury, onset of disease, recovery from illness, recurrence-free survival for 5 years (binary variables)
  – transition above or below the clinical threshold of a continuous variable (e.g. blood glucose level).
• Accommodates data from randomized clinical trial or cohort study designs.
Randomized Clinical Trial (RCT)
[Diagram: target population → disease-free, at-risk cohort → random assignment to intervention or control; over time, each arm ends as disease or disease-free.]
Randomized Clinical Trial (RCT)
[Diagram: target population → patient population → random assignment to treatment or control; over time, each arm ends as cured or not cured.]
Randomized Clinical Trial (RCT)
[Diagram: target population → patient population → random assignment to treatment or control; over time, each arm ends as dead or alive.]
Cohort study (prospective/retrospective)
[Diagram: target population → disease-free cohort → exposed or unexposed; over time, each group ends as disease or disease-free.]
Examples of survival analysis in medicine
RCT: Women’s Health Initiative (JAMA, 2002)
[Figure: cumulative incidence over time, on hormones vs. on placebo.]
Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.
Breast cancer and low-fat diet
[Figure: control vs. low-fat diet groups.]
Prentice et al. JAMA, February 8, 2006; 295: 629-642.
Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study
Curtis et al. BMJ 2003;327:1322-1323.
Why survival analysis?
1. Why not compare mean time-to-event between groups using a t-test or linear regression?
   -- For some patients we may not know if and when an event occurred: the study terminated or we lost touch with them.
2. Why not compare the proportion of events in each group using risk/odds ratios or logistic regression?
   -- This ignores time.
Terminology
• The event of interest: the outcome sought
• Time-to-event: the time from entry into a study until a subject has the outcome
• Censoring: subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they have the outcome. They are counted as alive / disease-free for the time they were enrolled in the study.
  – Must assume censoring is independent of the outcome; otherwise censoring will create bias.
An example
[Figure] Solid circles: uncensored. Open circles: censored.
Moving all start times to 0
[Figure] A better view only if time homogeneity holds.
Data of a hypothetical study (Johnson and Shih)
Data
Two-variable outcome:
• t_i = time at last disease-free observation or time at event
• c_i = 1 if had the event; c_i = 0 if no event by time t_i
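The two-variable outcome can be sketched as a list of (time, event) pairs. This is a minimal illustration only: the `Observation` type and the five data points are hypothetical and do not come from the slides.

```python
from typing import NamedTuple

class Observation(NamedTuple):
    time: float  # t_i: time at event, or at last disease-free observation
    event: int   # c_i: 1 if the event occurred, 0 if censored

# Hypothetical data, invented for illustration only.
observations = [
    Observation(2.0, 1),
    Observation(3.5, 0),
    Observation(5.0, 1),
    Observation(7.0, 0),
    Observation(8.0, 1),
]

n_events = sum(o.event for o in observations)
n_censored = len(observations) - n_events

# Averaging all recorded times treats censoring times as event times.
# A censored subject's true event time is at least t_i, so this naive
# mean cannot exceed the true mean time-to-event in the sample.
naive_mean = sum(o.time for o in observations) / len(observations)
```

The naive mean is included only to show why a plain comparison of mean times is biased when some subjects are censored.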
Survival function
• S(t): the probability of an individual surviving at least until time t
• Usually unknown, evaluated based on a sample
• Survival experience – the empirical function
Cumulative survival
[Figure]
Probability density function: f(t)
T: the event time for an individual (a random variable)
The probability density of the event occurring at time t:
$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$
F(t) = CDF of f(t)
S(t) = 1-F(t)
The hazard function
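$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
The instantaneous rate (probability per unit time) at which a subject who has survived to t succumbs to the event in the next instant.
Hazard from density and survival:
$$h(t) = \frac{f(t)}{S(t)}$$
Derivation (Bayes’ rule):
$$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \;\&\; T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$$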
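As a numerical check of the relation between hazard, density and survival, the sketch below uses an exponential event-time model. That model and its rate of 0.5 are assumptions chosen for illustration, not something the slides specify. The ratio f(t)/S(t) and the limit definition evaluated with a small Δt should agree.

```python
import math

RATE = 0.5  # hypothetical rate parameter of an exponential event-time model

def f(t):
    """Density f(t) of the exponential distribution."""
    return RATE * math.exp(-RATE * t)

def S(t):
    """Survival function S(t) = 1 - F(t)."""
    return math.exp(-RATE * t)

def hazard_from_ratio(t):
    """h(t) = f(t) / S(t)."""
    return f(t) / S(t)

def hazard_from_limit(t, dt=1e-6):
    """P(t <= T < t + dt | T >= t) / dt, evaluated for a small dt."""
    return (S(t) - S(t + dt)) / (S(t) * dt)

h_ratio = hazard_from_ratio(2.0)
h_limit = hazard_from_limit(2.0)
```

For the exponential model the hazard is constant and equal to the rate, so both values should be close to 0.5 at any t.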
[Figure; axis label: AGE.]
A possible set of probability density, failure, survival, and hazard functions.
f(t) = density function
F(t) = cumulative failure
S(t) = cumulative survival
h(t) = hazard function
The Kaplan-Meier curve
No censoring: sorted events t_1 < t_2 < … < t_n.
Pr(surviving to t_i) = (n-i+1)/n
What to do when some subjects are censored?
Sorted events t_1 < t_2 < … < t_n;
d_i – number of events in (t_{i-1}, t_i];
n_i – number of individuals at risk (remaining in the study) in (t_{i-1}, t_i].
Pr(surviving to t_i) = Pr(surviving to t_{i-1}) x Pr(surviving interval (t_{i-1}, t_i]) = Pr(surviving to t_{i-1}) x (n_i - d_i)/n_i
K-M or product-limit estimator
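The product-limit recursion, Pr(surviving to t_{i-1}) x (n_i - d_i)/n_i, can be sketched in a few lines of Python. This is a minimal illustration: the function name `kaplan_meier` and the six-subject data set are hypothetical, and the sketch assumes the common convention that a subject censored at time t is still counted as at risk at t.

```python
def kaplan_meier(times, events):
    """Return [(t_i, estimated survival just after t_i)] at each event time.

    times[k]  -- follow-up time of subject k
    events[k] -- 1 if the event occurred at times[k], 0 if censored
    """
    event_times = sorted({t for t, c in zip(times, events) if c == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        n_i = sum(1 for s in times if s >= t)  # at risk just before t
        d_i = sum(1 for s, c in zip(times, events) if s == t and c == 1)
        survival *= (n_i - d_i) / n_i          # product-limit update
        curve.append((t, survival))
    return curve

# Hypothetical data: 6 subjects, two of them censored (event = 0).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored subjects never trigger a drop in the curve; they only leave the risk set, which makes later drops larger. With no censoring the estimate reduces to the fraction of subjects still alive after each event.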
K-M estimate and curve
• Non-parametric estimate of the survival function
• Empirical probability of surviving past certain times in the sample (taking into account censoring).
• Describes survivorship of study population/s.
• Commonly used to compare two study populations.
• Intuitive graphical presentation.
Paul Meier 1924-2011
Edward L. Kaplan
Comparing two survival curves
• Two methods:
  – Compare the curves at a pre-specified time point t
  – Compare the overall plots over the entire time range
Hormones vs Placebo
[Figure] Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.
Result depends on t; tendency to pick the “best” t
Comparing two curves: Log rank test
• H0: S1(t) = S2(t) for all t
• Log rank test: use the ranks of events, not times.
Sorted events t_1 < t_2 < … < t_K. For time t_j:

           Events     Surviving   Total
Group 1    a_j        b_j         a_j+b_j
Group 2    c_j        d_j         c_j+d_j
Total      a_j+c_j    b_j+d_j     n_j

Under H0, E(a_j) = total events x (# at risk in group 1 / # at risk) = (a_j+c_j) x (a_j+b_j)/n_j
Z is approximately standard normal – evaluate p-val
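The test can be sketched as follows. The expected count E(a_j) follows the formula above; the transcript does not show the formula for Z, so this sketch assumes the usual form Z = Σ(a_j − E(a_j)) / sqrt(Σ Var(a_j)) with the hypergeometric variance of a_j. The function name `log_rank` and the data are hypothetical.

```python
import math

def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test; returns (Z, two-sided p-value)."""
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    event_times = sorted({t for t, c in pooled if c == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for s in times1 if s >= t)  # a_j + b_j, at risk in group 1
        n2 = sum(1 for s in times2 if s >= t)  # c_j + d_j, at risk in group 2
        a = sum(1 for s, c in zip(times1, events1) if s == t and c == 1)
        c2 = sum(1 for s, c in zip(times2, events2) if s == t and c == 1)
        n, d = n1 + n2, a + c2                 # n_j and total events a_j + c_j
        observed += a
        expected += d * n1 / n                 # E(a_j) under H0
        if n > 1:
            # Assumed hypergeometric variance of a_j (not shown in the slides).
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    z = (observed - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return z, p

# Hypothetical follow-up data (event = 1, censored = 0).
g1_times, g1_events = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
g2_times, g2_events = [1, 2, 3, 5, 8, 11], [1, 1, 1, 1, 1, 1]
z, p = log_rank(g1_times, g1_events, g2_times, g2_events)
```

A negative Z here means group 1 had fewer events than expected under H0, that is, better survival than group 2.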
Example: breast cancer survival signature
[Figure: Van de Vijver NEJM 02. Annotation: small numbers left.]
• Caveats:
  – No mention of mean survival
  – Visual inspection can be misleading
  – Must predefine the groups in advance
Certain characteristics (age, sex, ...) can be related to survival – confounding / prognostic factors can change the relation of treatment to outcome.
Need to stratify the test and compare survival differences within each level of these factors.
WHI and breast cancer
[Figure] Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.
Cox Proportional Hazard Model
• K-M curves and Log Rank – univariate analysis; describe survival using one categorical factor
• Cox PH: allows many prognostic factors, categorical or real-valued
• Semi-parametric
• Models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified.
• Estimates relative rather than absolute hazard.
The model
$$h_i(t) = \lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}$$
Components:
• A baseline hazard function λ_0(t) that is left unspecified but must be positive (= the hazard when all covariates are 0). Can take on any form!
• The exponential of a linear function of a set of k fixed covariates.
$$\log h_i(t) = \log \lambda_0(t) + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
The model
Hazard ratio of person i (e.g. a smoker) to person j (e.g. a non-smoker):
$$HR_{i,j} = \frac{h_i(t)}{h_j(t)} = \frac{\lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}}{\lambda_0(t)\, e^{\beta_1 x_{j1} + \dots + \beta_k x_{jk}}} = e^{\beta_1 (x_{i1} - x_{j1}) + \dots + \beta_k (x_{ik} - x_{jk})}$$
Proportional hazards:
Hazard functions should be strictly parallel.
Produces covariate-adjusted hazard ratios.
The model
The point is to compare the hazard rates of individuals who have different covariates:
$$HR = \frac{h_1(t)}{h_2(t)} = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$$
(h_0(t) denotes the baseline hazard, written λ_0(t) on the previous slides.)
Hence, called proportional hazards: hazard functions should be strictly parallel.
For binary x: exp(β) is the hazard ratio between the two categories (β is the log of the increase in hazard).
For numerical x: exp(β) is the hazard ratio per unit increase (e.g. per year).
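The cancellation of the baseline hazard can be checked numerically. In the sketch below the coefficients, the baseline function and the two covariate vectors are all hypothetical values chosen for illustration; only the model form comes from the slides.

```python
import math

def cox_hazard(baseline_hazard, betas, x, t):
    """h(t) = baseline(t) * exp(beta_1*x_1 + ... + beta_k*x_k)."""
    linear = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(t) * math.exp(linear)

def hazard_ratio(betas, x_i, x_j):
    """HR = exp(sum_m beta_m * (x_im - x_jm)); the baseline cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x_i, x_j)))

# Hypothetical coefficients: [smoker (0/1), age in years].
betas = [0.7, 0.05]

def baseline(t):
    """Hypothetical baseline hazard; any positive function of t would do."""
    return 0.01 * (1.0 + t)

smoker, non_smoker = [1, 60], [0, 60]

# The ratio of the two hazards is the same at every time point.
ratios = [
    cox_hazard(baseline, betas, smoker, t)
    / cox_hazard(baseline, betas, non_smoker, t)
    for t in (0.0, 1.0, 10.0)
]
hr = hazard_ratio(betas, smoker, non_smoker)  # exp(0.7) for the binary covariate
```

Because the two subjects differ only in the binary covariate, the ratio equals exp(β) for that covariate, which is the interpretation given above.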
Cox PH - computation
• The coefficients β_1, …, β_k can be estimated using numerical optimization (details not shown)
• For a large enough sample, the estimate of each β_i is approximately normally distributed, so its p-val and confidence intervals can be computed.
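The slides do not show the optimization. One common approach, given here as an assumption rather than as the method the slides intend, is Newton-Raphson on the Cox partial log-likelihood. The sketch handles a single covariate and no tied event times; the function names and the eight-subject data set are hypothetical.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the partial
    log-likelihood for a single covariate (no tied event times)."""
    score = info = 0.0
    for t_i, c_i, x_i in zip(times, events, x):
        if c_i == 0:
            continue  # censored subjects contribute only through risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        mean = sum(wj * xj for wj, xj in zip(w, risk)) / s0
        mean_sq = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0
        score += x_i - mean           # observed minus expected covariate
        info += mean_sq - mean ** 2   # weighted variance in the risk set
    return score, info

def cox_newton(times, events, x, n_iter=25):
    """Estimate one Cox coefficient by Newton-Raphson, starting at 0."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_information(beta, times, events, x)
        beta += score / info
    return beta

# Hypothetical data: x = 1 marks the exposed group; event = 0 is censored.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 0]
x = [1, 1, 0, 1, 0, 0, 1, 0]
beta_hat = cox_newton(times, events, x)
hazard_ratio_hat = math.exp(beta_hat)
```

At the estimate the score is zero, and the reciprocal of the information approximates the variance of the estimate, which is what the normal-approximation p-values and confidence intervals rely on.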
Example: Framingham heart study
• Cohort of 5,180 aged 45-82 followed until time of death or up to 10 years. 46% males, 402 deaths.

                        Die (n=402)   Do Not Die (n=4778)
Mean (SD) Age, years    65.6 (8.7)    56.1 (7.5)
N (%) Male              221 (55%)     2145 (45%)

• Cox PH model for age and sex as factors:

Risk Factor   Parameter Estimate   P-Value
Age, years    0.11149              0.0001
Male Sex      0.67958              0.0001

• Both factors increase risk.
  – Age: exp(0.11149) = 1.118, so 11.8% higher risk per year of age, holding sex constant.
  – Male: exp(0.67958) = 1.973, so males have 1.973 times the risk of females, holding age constant.
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/
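The hazard ratios follow directly from the parameter estimates of the two-factor model. The sketch below recomputes them and, as an illustrative extrapolation that assumes the model's log-linear age effect holds over a wider range, the ratio for a 10-year age difference. The variable names are hypothetical.

```python
import math

# Parameter estimates of the age-and-sex Cox model in this example.
beta_age, beta_male = 0.11149, 0.67958

hr_per_year = math.exp(beta_age)    # one additional year of age
hr_male = math.exp(beta_male)       # male vs. female, same age

# Effects multiply on the hazard scale (they add on the log scale).
hr_per_decade = math.exp(10 * beta_age)                 # equals hr_per_year ** 10
hr_male_5y_older = math.exp(beta_male + 5 * beta_age)   # male and 5 years older
```

Rounded to three decimals, the first two values reproduce the 1.118 and 1.973 quoted in the example.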
Model with more covariates

Risk Factor               Parameter Estimate   P-Value   Hazard Ratio (HR) (95% CI for HR)
Age, years                0.11691              0.0001    1.124 (1.111-1.138)
Male Sex                  0.40359              0.0002    1.497 (1.215-1.845)
Systolic Blood Pressure   0.01645              0.0001    1.017 (1.012-1.021)
Current Smoker            0.76798              0.0001    2.155 (1.758-2.643)
Total Serum Cholesterol   -0.00209             0.0963    0.998 (0.995-1.000)
Diabetes                  -0.20366             0.1585    0.816 (0.615-1.083)

• Significant factors have CI that do not include 1 (the null)
Example 1: Study of publication bias
By Kaplan-Meier methods
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)
Table 4 Risk factors for time to publication using univariate Cox regression analysis

Characteristic          # not published   # published   Hazard ratio (95% CI)
Null                    29                23            1.00
Non-significant trend   16                4             0.39 (0.13 to 1.12)
Significant             47                99            2.32 (1.47 to 3.66)

Interpretation: Significant results have about a 2.3-fold higher rate of publication compared to null results.
Univariate Cox regression
Example 2: Study of mortality in academy award winners for screenwriting
Kaplan-Meier methods
From: Longevity of screenwriters who win an academy award: longitudinal study BMJ 2001;323:1491-1496 ( 22-29 December )
Table 2. Death rates for screenwriters who have won an academy award.* Values are percentages (95% confidence intervals) and are adjusted for the factor indicated

                              Relative increase in death rate for winners
Basic analysis                37 (10 to 70)
Adjusted analysis
  Demographic:
    Year of birth             32 (6 to 64)
    Sex                       36 (10 to 69)
    Documented education      39 (12 to 73)
    All three factors         33 (7 to 65)
  Professional:
    Film genre                37 (10 to 70)
    Total films               39 (12 to 73)
    Total four star films     40 (13 to 75)
    Total nominations         43 (14 to 79)
    Age at first film         36 (9 to 68)
    Age at first nomination   32 (6 to 64)
    All six factors           40 (11 to 76)
  All nine factors            35 (7 to 70)
HR=1.37; interpretation: 37% higher incidence of death for winners compared with nominees
HR=1.35; interpretation: 35% higher incidence of death for winners compared with nominees even after adjusting for potential confounders
Sir David Cox
• Born 1924
• Cambridge, Imperial College London, Oxford
• Books:
  – Planning of experiments (1958)
  – Queues (Methuen, 1961). With Walter L. Smith
  – Renewal Theory (Methuen, 1962).
  – The theory of stochastic processes (1965). With Hilton David Miller
  – Analysis of binary data (1969). With Joyce E. Snell
  – Theoretical statistics (1974). With D. V. Hinkley
  – Point processes (Chapman & Hall/CRC, 1980). With Valerie Isham
  – Applied statistics, principles and examples (Chapman & Hall/CRC, 1981). With Joyce E. Snell
  – Analysis of survival data (Chapman & Hall/CRC, 1984). With David Oakes
  – Asymptotic techniques for use in statistics (1989). With Ole E. Barndorff-Nielsen
  – Inference and asymptotics (Chapman & Hall/CRC, 1994). With Ole E. Barndorff-Nielsen
  – Multivariate dependencies, models, analysis and interpretation (Chapman & Hall, 1995). With Nanny Wermuth
  – The theory of design of experiments (Chapman & Hall/CRC, 2000). With Nancy M. Reid
  – Complex stochastic systems (Chapman & Hall/CRC, 2000). With Ole E. Barndorff-Nielsen and Claudia Klüppelberg
  – Components of variance (Chapman & Hall/CRC, 2003). With P. J. Solomon
  – Principles of Statistical Inference (Cambridge University Press, 2006). ISBN 978-0-521-68567-2
  – Selected Statistical Papers of Sir David Cox, 2 Volume Set
  – Principles of Applied Statistics (CUP). With Christl A. Donnelly