bms 617

Marshall University Genomics Core Facility

Marshall University School of Medicine, Department of Biochemistry and Microbiology

BMS 617

Lecture 10: Survival Curves

Survival Data

• The term survival data refers to the measurement of the time it takes for an event to happen
  – The event does not have to be death (when it is death, the measurement is the amount of time the subject survived)
  – Requirements are that the event is non-recurring
  – Either the time to the event is known, or the time to censoring is known
    • Censoring is explained shortly


Examples of survival data

• Time to first metastasis of a cancer
• Time to recovery from a disease (in some well-defined sense)
• Time to first adverse effect of a drug


Censored survival data

• Typical scenario for survival data is a long-running clinical study
• We want to measure the time to a particular event
• Often, we cannot measure this for all subjects:
  – Some subjects may choose to unenroll from the study
  – Subjects may die (from unrelated causes)
  – Some subjects may develop a comorbidity which rules them out of the study
  – Some subjects may require a medication which is not allowed by the study protocol
  – The study may end before the event in question occurs for some subjects
• We say these subjects are censored at the time at which they become unavailable to the study
• In these cases, we do not want to eliminate the subject from the study entirely
• We know they survived at least until the point of censoring


Survival Curves

• Survival data is usually presented in a survival curve
  – Should include censored data
  – Still provides useful information
  – We know the time to event is at least as much as the time to censoring for that subject
• Survival curves plot the probability of survival against time.


Kaplan-Meier Survival Curves

• The Kaplan-Meier method for plotting survival curves works as follows:
  – Consider all events, including censoring events
  – The survival rate for each event is the number at risk immediately after the event divided by the number at risk immediately before the event (both counts exclude anyone censored at that event)
  – The cumulative survival at any time is the product of the survival rates of all events up to that time
    • It’s the empirical probability of surviving to that point
  – The Kaplan-Meier curve plots cumulative survival against time
    • Censoring events are usually marked with a small vertical line on the curve
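
In symbols, the product described above is often written as follows. The notation is an assumption added here, not the lecture's: t_i are the distinct event times, n_i is the number at risk immediately before t_i, and d_i is the number of events at t_i, so that n_i - d_i is the number at risk immediately after.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}
```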


Simple example

• For a simple example, consider a small fictional study with seven patients
  – End point is death from a disease

Date entered study   End date     Event
2/7/1998             2/2/2002     Died
5/19/1998            11/30/2004   Moved and left study
11/14/1998           4/3/2000     Died
3/4/1999             5/4/2005     Study ended
6/15/1999            5/4/2005     Died
12/1/1999            9/4/2004     Died
12/15/1999           8/15/2003    Died in car crash


Data for survival curve

• To use these data for a survival curve, we convert dates to elapsed time and classify events as “died” or “censored”:

Time (years)   Event
4.07           Died
6.54           Censored
1.39           Died
6.17           Censored
5.89           Died
4.76           Died
3.67           Censored
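
The conversion can be sketched in a few lines of Python, shown here for three of the rows. The 365.25-day year and the helper name `to_survival_record` are assumptions made for illustration; the lecture does not say which year length was used.

```python
from datetime import date

# Three rows from the date table: (entry date, end date, recorded outcome).
rows = [
    (date(1998, 11, 14), date(2000, 4, 3), "Died"),
    (date(1999, 3, 4), date(2005, 5, 4), "Study ended"),
    (date(1999, 12, 15), date(2003, 8, 15), "Died in car crash"),
]

DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def to_survival_record(entered, ended, outcome):
    """Return (elapsed years rounded to 2 places, 'Died' or 'Censored')."""
    years = round((ended - entered).days / DAYS_PER_YEAR, 2)
    # Only death from the disease is the end point; anything else is censored,
    # including a death from an unrelated cause such as a car crash.
    status = "Died" if outcome == "Died" else "Censored"
    return years, status


records = [to_survival_record(*row) for row in rows]
for years, status in records:
    print(f"{years:.2f} {status}")
```

Note that "Died in car crash" is classified as censored, because the end point of the study is death from the disease.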


Cumulative survival

• Now sort the times and compute cumulative survival:

Time   Event    # at risk before   # at risk after   Survival rate   Cumulative survival
1.39   Death    7                  6                 6/7 = 0.857     0.857
3.67   Censor   5                  5                 1               0.857
4.07   Death    5                  4                 4/5 = 0.8       0.686
4.76   Death    4                  3                 3/4 = 0.75      0.514
5.89   Death    3                  2                 2/3 = 0.667     0.343
6.17   Censor   1                  1                 1               0.343
6.54   Censor   0                  0                 -               0.343
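
The cumulative survival column can be reproduced with a short function. This is a minimal sketch, not a replacement for a statistics package. The name `kaplan_meier` is illustrative, and if a death and a censoring shared the same time the code would count the censored subject as still at risk for that death (a common convention, assumed here; no such tie occurs in this example).

```python
def kaplan_meier(times, died):
    """Return [(time, cumulative survival)] at each time where a death occurs."""
    events = sorted(zip(times, died))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        censored = 0
        # Gather every subject whose time equals t.
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Survival rate for this event: at risk after / at risk before.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= deaths + censored
    return curve


# The seven-patient example: elapsed years and whether the subject died.
times = [4.07, 6.54, 1.39, 6.17, 5.89, 4.76, 3.67]
died = [True, False, True, False, True, True, False]

curve = kaplan_meier(times, died)
for t, s in curve:
    print(f"{t:.2f}  {s:.3f}")
```

Censoring times do not change the cumulative survival, so they do not appear in the returned curve; they only reduce the number at risk for later deaths.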


Sample Kaplan-Meier Curve


Confidence Intervals for Survival Curves

• Software can compute confidence intervals for survival curves
  – Can be shown as error bars at points where survival changes, or as dashed lines
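
One widely used way to get such intervals is Greenwood's formula for the standard error of the Kaplan-Meier estimate. The sketch below applies it to the death rows of the seven-patient example. The plain (untransformed) interval, the 1.96 multiplier, and the function name are assumptions made here; statistical packages often use a log or log-log transformed interval instead, so their numbers may differ.

```python
import math

# Death rows of the seven-patient example:
# (time in years, number at risk just before the death, number of deaths).
death_rows = [(1.39, 7, 1), (4.07, 5, 1), (4.76, 4, 1), (5.89, 3, 1)]


def greenwood_intervals(rows, z=1.96):
    """Return [(time, survival, lower, upper)] using Greenwood's formula."""
    survival = 1.0
    running_sum = 0.0
    result = []
    for time, n, d in rows:
        survival *= (n - d) / n
        # Greenwood: Var[S(t)] = S(t)^2 * sum of d / (n * (n - d)).
        running_sum += d / (n * (n - d))
        std_error = survival * math.sqrt(running_sum)
        # Clip to [0, 1], since a survival probability cannot leave that range.
        lower = max(0.0, survival - z * std_error)
        upper = min(1.0, survival + z * std_error)
        result.append((time, survival, lower, upper))
    return result


intervals = greenwood_intervals(death_rows)
for time, s, lower, upper in intervals:
    print(f"{time:.2f}  S={s:.3f}  95% CI [{lower:.3f}, {upper:.3f}]")
```

With only seven subjects the intervals are very wide, which is itself a useful reminder of how little a small study pins down.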


Summaries of Survival Data

• Two common summaries of survival data are often presented:
  – Median survival
    • Time at which the survival rate is 50%
    • Probability of surviving this long is 50%
    • This is the x-value for y=0.5 on the curve
      – 5.89 years in our example
  – 5-year survival
    • Often used in cancer studies
    • The proportion of subjects who survive 5 years
    • This is the y-value when x is 5 (years)
      – 51.4% in our example
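
Both summaries can be read off the step curve programmatically. The sketch below hard-codes the cumulative survival values from the example table. The function names are illustrative, and taking the median as the first time at which survival drops to 50% or below is an assumption (some software averages two times when survival is exactly 50%).

```python
# (time in years, cumulative survival) at each death in the example.
curve = [(1.39, 0.857), (4.07, 0.686), (4.76, 0.514), (5.89, 0.343)]


def survival_at(curve, t):
    """Cumulative survival at time t, read from the step curve."""
    survival = 1.0  # everyone is alive before the first death
    for time, value in curve:
        if time <= t:
            survival = value
        else:
            break
    return survival


def median_survival(curve):
    """First time at which cumulative survival is 50% or lower, else None."""
    for time, value in curve:
        if value <= 0.5:
            return time
    return None  # the curve never reaches 50%, so the median is undefined


print("Median survival (years):", median_survival(curve))
print("5-year survival:", survival_at(curve, 5))
```

If the curve never falls to 50%, the median cannot be estimated from the data, which is why the function returns `None` in that case.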


Assumptions for survival data

• Interpreting survival data relies on several assumptions:
  – Representative sample
  – Independent subjects
    • Survival of one subject does not depend on survival of another
  – Consistent criteria
    • Including criteria for being a part of the study
    • And criteria for determining end points
  – Clearly defined starting time
  – Censoring is unrelated to survival
    • Can’t censor because patient is too sick to come to a clinic, for example
  – Average survival is constant throughout study
    • Same no matter when patient enters the study
    • May be violated if standard of care improves through study period


Comparing survival data

• The usual use of survival analysis is to compare the survival rates of two (or more) groups under different conditions
  – Different treatment groups, for example
• Helpful to plot Kaplan-Meier curves for both groups on the same graph
• Can compute a p-value for the null hypothesis that the survival rate is equal in the two groups


Assumption of proportional hazards

• The hazard is essentially the slope of the survival curve
  – The rate at which subjects are dying
• The key statistic in the comparison of two groups is the hazard ratio
  – Hazard in one group divided by the hazard in the other
• Assumption of proportional hazards is the assumption that this ratio is constant over time
  – For example, if there is a high early risk in one group, there must be a high early risk in the other group
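
As a sketch of what a constant hazard ratio means, the following uses two exponential survival curves with made-up rates. The rates and function names are illustrative assumptions, not values from any study in this lecture. The hazard is computed as minus the slope of the log of the survival curve, which is the slope of the curve relative to the fraction still surviving, and the ratio is then checked at several times.

```python
import math

CONTROL_RATE = 0.02  # hypothetical deaths per subject per month
TREATED_RATE = 0.01  # hypothetical; half the control rate


def survival(rate, t):
    """Exponential survival curve S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def hazard(rate, t, dt=1e-6):
    """Numerical hazard: minus the slope of log S(t) at time t."""
    log_now = math.log(survival(rate, t))
    log_next = math.log(survival(rate, t + dt))
    return -(log_next - log_now) / dt


# Under proportional hazards the ratio is the same at every time point.
hazard_ratios = [
    hazard(TREATED_RATE, t) / hazard(CONTROL_RATE, t) for t in (10, 50, 100)
]
print([round(r, 3) for r in hazard_ratios])
```

Two curves that cross, or that separate early and then converge, would give a ratio that changes over time and so would violate the assumption.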


Example: Prednisolone as treatment for chronic active hepatitis

• Example from Motulsky (Kirk et al. 1980)
• Compared survival of patients with chronic active hepatitis, treated either with prednisolone or with a placebo
• 22 patients in each group
  – One in prednisolone group left the study after 56 months
  – 10 in prednisolone group and 6 in control group were still alive at end of study


Kaplan-Meier plots for prednisolone and control groups


Median Survival Times

• The median survival time for the prednisolone group is 146.0 months
• For the control group it is 40.5 months
• Ratio of these is 3.605
• 95% confidence interval of the ratio is [1.673, 7.768]
  – We are 95% confident that the range from 1.673 to 7.768 contains the true ratio of the median survival time of prednisolone-treated chronic active hepatitis patients to that of untreated patients


Statistical Tests

• Most common statistical test is the log-rank method, also called the Mantel-Cox method
• Another very similar test is the Mantel-Haenszel test
  – These differ only in how they handle two patients dying at the same timepoint
• Less common test is the Gehan-Breslow-Wilcoxon test, which gives more weight to deaths at early time points
• Don’t try to do any of these tests by hand…
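
For intuition only, here is a sketch of what a log-rank test computes: at each death time it compares the deaths observed in one group with the number expected if both groups shared the same hazard. The two groups below are made-up toy data, not the prednisolone study, and `logrank_test` is an illustrative name. This is one common formulation; real analyses should use a statistics package, which may also treat tied death times differently.

```python
import math


def logrank_test(times_a, died_a, times_b, died_b):
    """Return (chi-square statistic, p-value) for a two-group log-rank test."""
    subjects = [(t, d, "a") for t, d in zip(times_a, died_a)]
    subjects += [(t, d, "b") for t, d in zip(times_b, died_b)]
    death_times = sorted({t for t, d, _ in subjects if d})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        # Subjects still at risk just before time t, per group.
        n_a = sum(1 for time, _, g in subjects if g == "a" and time >= t)
        n_b = sum(1 for time, _, g in subjects if g == "b" and time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, d, g in subjects if g == "a" and d and time == t)
        d_b = sum(1 for time, d, g in subjects if g == "b" and d and time == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in group a under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("not enough information to compute the test")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up toy data (months); True means the subject died, False means censored.
toy_times_a, toy_died_a = [3, 5, 8, 12], [True, True, True, True]
toy_times_b, toy_died_b = [10, 14, 18, 20], [True, True, False, True]

chi2, p_value = logrank_test(toy_times_a, toy_died_a, toy_times_b, toy_died_b)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Comparing a group with an identical copy of itself gives a statistic of 0 and a p-value of 1, which is a quick sanity check on the arithmetic.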


Mantel-Cox (logrank) test for example

• Using the log-rank test for the prednisolone example yields a hazard ratio of 0.4456 with a 95% confidence interval of 0.1944 to 0.9107
  – Best estimate of the ratio of the hazard for the prednisolone group relative to the control group is 0.4456
  – We are 95% confident that the range 0.1944 to 0.9107 includes the true value of this ratio
• The p-value is 0.0305
  – If the treatment had no effect, the chance of random sampling giving survival curves this different (or more so) is just 3.05%
  – We have to assume that the hazard ratio is constant over time to make this interpretation
