ORIGINAL ARTICLE
Survival analysis
Robert Flynn
Aims and objectives. This paper describes when and why survival analysis is used and describes the use and interpretation of the
techniques most commonly encountered in medical literature. This is performed using examples taken from core medical
journals.
Background. Survival analysis is widely used in clinical and epidemiological research: in randomised clinical trials for comparing
the efficacy of treatments and in observational (non-randomised) research to determine and test the existence of epidemiological
association.
Design. This paper introduces the principles, practice and terminology of survival analysis.
Methods. References are made to examples from open-access medical journals.
Results. Survival analysis is a well-established series of methodologies that are widely encountered in medical literature for both
observational and randomised studies.
Conclusions. Survival analysis represents a more efficient use of clinical data than other forms of analysis which rely on fixed
time periods. One of the most widely used techniques is that developed by Kaplan and Meier. This involves the creation of life
tables and the plotting of survival curves with comparison made between two or more groups. The log-rank test is commonly
used to establish whether there is a statistically significant difference between these groups. The multivariate Cox proportional
hazards model extends this approach to give an estimate of effect size (the Hazard Ratio) and can adjust for any potential confounding
variables. In this model, the assumption of proportional hazards is of key importance and should always be checked. More
advanced techniques are the use of time-dependent variables and the less widely used parametric survival techniques. Care
should always be taken when considering the assumptions involved when using such methods.
Relevance to clinical practice. As survival analysis is widely used in clinical research, it is important that readers can critically
evaluate the use of this technique.
Key words: Cox proportional hazards models, Kaplan–Meier survival curves, statistics, survival analysis
Accepted for publication: 10 October 2011
Introduction
Survival analysis concerns the follow-up in time of individ-
uals from an initial experience or exposure until a discrete
event. It can be used to describe survival of a single group of
patients, but more interestingly, it can also be used to
compare the experience of different groups of patients or
subjects. Its use in contemporary medical literature is
widespread. This article describes when and why
survival analysis is used, the nature of the data required
for such analyses, and the use and interpretation of the
techniques most commonly encountered in medical literature.
Finally, it considers the strengths
and weaknesses of these techniques. This will all be illustrated
with reference to ‘open-access’ examples taken from core
medical journals.
Author: Robert Flynn, PhD, MSc, GradStat, Statistician and
Epidemiologist, Medicines Monitoring Unit, University of Dundee,
Ninewells Hospital & Medical School, Dundee, UK
Correspondence: Robert Flynn, Statistician and Epidemiologist,
Medicines Monitoring Unit, University of Dundee, Ninewells
Hospital & Medical School, Dundee DD1 9SY, UK. Telephone:
+44 01382 383119.
E-mails: [email protected]
© 2012 Blackwell Publishing Ltd
Journal of Clinical Nursing, doi: 10.1111/j.1365-2702.2011.04023.x
Background
Survival analysis is widely used in clinical and epidemiolog-
ical research. In randomised clinical trials, it is used to
compare the occurrence of outcomes in patients receiving
different treatments to establish which is the most effective
(Dumville et al. 2009, Severe et al. 2010). Observational
(non-randomised) research also makes extensive use of
survival models, to determine and test the existence of
epidemiological association (Versmissen et al. 2008, de
Oliveira et al. 2010). It is worth noting that although
originally developed for the purposes of analysing clinical
data, survival analysis is increasingly being used to analyse
non-medical data, for example, by the financial services
sector (to assess time to default of bank loans) and in
engineering (for example to assess time to failure of a
component). This article will focus on the use of survival
analysis in healthcare.
Why use survival analysis?
When considering the likelihood of an event in a cohort of
patients, an intuitive approach might be to calculate the risk
of an event occurring by measuring the proportion of patients
suffering a particular event after a fixed time period, for
example, the proportion deceased one year after starting a
given treatment. There are, however, several problems with
this simple approach. Some subjects will be known to be alive
after one year, others will be known to be deceased, but a
certain proportion will be ‘lost’ to follow-up, their where-
abouts unknown, perhaps because they have moved house or
emigrated (as discussed later, these are referred to as censored
observations). Because the status of these patients is
unknown, they cannot be included in the one-year analysis,
even though they may have been followed for several weeks
or months. Another issue is that patients who die after
one week will have an equal weighting in the analysis as
those dying after 51 weeks, whereas a patient dying after
53 weeks will not be included in the count of deceased
patients in the one-year analysis. Additionally, the reality of
clinical research is that patients tend to be followed up for
different periods of time: some patients who are enrolled in
the early stages of a study can be easily followed for longer
periods. In the above example of a one-year mortality study,
some patients could have been followed up for three years;
however, when calculating the one-year mortality rate, the
final two years of follow-up would be ignored. Other
patients, however, may be recruited at the end of a study
and might only be followed up for a short period, say
six months. These patients would have to be excluded from
the analysis as they would not have sufficient follow-up.
What is needed is a form of analysis that takes into account
these different follow-up times. This is what survival analysis
does.
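To make the problem concrete, the sketch below applies a fixed one-year analysis to the nine hypothetical patients listed in Table 2. The function name and the tuple layout are illustrative assumptions and do not come from any statistical package.

```python
# Hypothetical follow-up data from Table 2: (subject, months followed, outcome).
PATIENTS = [
    (1, 28.9, "death"), (2, 6.7, "censored"), (3, 4.3, "death"),
    (4, 34.5, "censored"), (5, 47.5, "censored"), (6, 16.0, "death"),
    (7, 5.3, "death"), (8, 16.8, "death"), (9, 42.7, "death"),
]

def fixed_time_risk(patients, horizon):
    """Naive risk of death by `horizon` months, using only patients whose
    status at the horizon is known."""
    deaths, alive, excluded = [], [], []
    for subject, months, outcome in patients:
        if outcome == "death" and months <= horizon:
            deaths.append(subject)      # died within the fixed period
        elif months >= horizon:
            alive.append(subject)       # known to be alive at the horizon
        else:
            excluded.append(subject)    # censored early: status unknown
    risk = len(deaths) / (len(deaths) + len(alive))
    return risk, deaths, excluded

risk, deaths, excluded = fixed_time_risk(PATIENTS, horizon=12)
print(f"12-month risk of death: {risk:.2f} (deaths: {deaths})")
print(f"Excluded despite partial follow-up: {excluded}")
```

Here subject 2 is dropped even though 6.7 months of event-free follow-up were observed, and every death after 12 months is counted as a survivor, however much later it occurred.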
Structure of survival data
Outcome data
Survival analysis is used when considering the occurrence in a
population of a binary (or dichotomous) outcome: that is,
one that may be either present or absent. This binary
outcome is often death, hence the label ‘survival analysis’;
however, it could also be any event like the onset of acute
illness (such as myocardial infarction or stroke) or chronic
illness (such as onset of diabetes). This is sometimes referred
to as the dependent variable.
Censoring
This important concept relates to subjects who form part of a
cohort but who never suffer the event of interest. This could
be because a patient is ‘lost to follow-up’ (leaves the
population prior to the end of a study), because the study is
completed before the patient suffers the event of interest, or
because the patient suffers another event which stops them
from suffering the principal event of interest (for example,
being hospitalised for infection in a study with hospitalisation
for myocardial infarction as the outcome). Although none of
these subjects suffer the event of interest, the fact that they
contribute time without suffering an event is vitally impor-
tant.
For all the methods of survival analysis described in this
article, there is an important assumption that the censoring is
non-informative. That is, censoring is not related to the
probability of an event occurring. This assumption would be violated, for
example, if patients left the study population shortly before
dying.
Explanatory variables
Typically two or more different groups of subjects are
considered with the survival experience compared between
these different categories. These groups could be patients
exposed to different medicines (placebo vs. active), but could
also be defined by other prognostically important factors such as
age or sex. These may also be referred to as predictors or
independent variables. As will be seen, observational studies
in particular tend to include data on a large number of such
variables. It is generally the case that attention is focused on
one explanatory variable in particular with the others being
referred to as covariates.
Survival time
The final key piece of information is the follow-up time.
This is the interval – usually in days, months or years –
between the start of follow-up for that subject and the
occurrence of the event of interest or censoring. A
summary of commonly used terms and definitions used in
survival analysis is shown in Table 1.
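As an illustration of how these pieces fit together, the sketch below stores one record per subject (start date, end date and whether the event occurred) and derives the follow-up time. The dates are those of the first three hypothetical subjects in Table 2. The conversion of days to months using an average month of 365.25/12 days is an assumption: it reproduces the tabulated durations, but the article does not state how they were calculated.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length; an assumed convention

def follow_up_months(start, end):
    """Follow-up time in months between two calendar dates."""
    return (end - start).days / DAYS_PER_MONTH

# One record per subject: start of follow-up, end of follow-up, outcome.
# event=True means the event of interest occurred; False means censored.
records = [
    {"subject": 1, "start": date(2006, 1, 6), "end": date(2008, 6, 4), "event": True},
    {"subject": 2, "start": date(2006, 4, 28), "end": date(2006, 11, 19), "event": False},
    {"subject": 3, "start": date(2006, 11, 23), "end": date(2007, 4, 2), "event": True},
]

for r in records:
    r["months"] = round(follow_up_months(r["start"], r["end"]), 1)
    status = "death" if r["event"] else "censored"
    print(r["subject"], r["months"], status)
```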
Survival analysis techniques
As will be seen, the output of survival analysis can take the
form of life tables, survival curves, formal hypothesis tests
and measures of relative risk. The use, implementation and
interpretation of these will be discussed in the following
sections.
Simple survival techniques
One of the most widely used techniques is that developed by
Kaplan and Meier (1958). This remains in wide use in
randomised controlled trials but also has a more limited role
to play in observational research. It is a simple technique that
considers at different points the number of patients remaining
in the cohort and the cumulative number of events that have
occurred up to that point. As an example, consider the
hypothetical data shown in Table 2. This shows data on nine
patients enrolled in a study in 2006 and 2007 and followed up
until the end of 2009, with a patient identifier, the dates
between which they were followed, the duration of follow-up
and the outcome (death or censored). These same data are
presented in Fig. 1. Panel a shows these data as timelines,
with an ‘X’ showing deaths and ‘O’ showing censored
observations. By considering the duration of follow-up from
the same baseline (Fig. 1 panel b) and by then putting the
timelines in the order of their duration (Fig. 1 panel c), it
starts to become clear how a survival curve might be
Table 1 Terms and definitions commonly encountered in survival analysis and referred to in this article
Randomised Controlled Trials (RCT) – a clinical trial whereby study subjects (typically patients) are randomly assigned to receive different
interventions, for example, treatment or placebo.
Observational (non-randomised) research – a clinical study where no intervention is instigated by those undertaking the research. The
investigators instead observe exposures and outcomes in groups of patients and draw conclusions from these. Such research may take one of several
recognised study types, for example, cohort study, case-control study or ecological study.
Epidemiological association – a measured relationship between two factors that may or may not be the result of a causal association. Such
associations are often identified in observational studies.
Censoring – This occurs in subjects who are included in the follow-up for a study but who never suffer the event of interest. This may occur if a
patient is lost to follow-up, if the study finishes before the event has occurred, or if the patient suffers another event which excludes them from
follow-up.
Explanatory variables (predictors or independent variables) – prognostically important factors that may be measured for subsequent analyses.
Generally, the focus is on one explanatory variable in particular (for example, exposure to a specific medicine) with the other variables being
referred to as covariates (such as age, sex, exposure to other medicines).
Survival time – the interval (measured in days, months or years) between start of follow-up and the occurrence of the event of interest or until
censored. Also referred to as follow-up time.
Life table – a summary table used to describe the survival experience of a population. These are seldom used because, where there are a large number of
events or a comparison is made between groups, the table rapidly becomes lengthy and complex.
Survival curves – curves that show the proportion of the population ‘surviving’ at successive points in time. These may include two or more
discrete groups that can easily be compared. The Kaplan–Meier survival plot usually compares two groups in a univariate analysis.
Log-rank test – a non-parametric hypothesis test that compares survival curves to see whether any difference that exists is likely to have arisen by
chance.
Confounding variable – a variable that can cause the outcome of interest and which is associated with the principal variable of interest. This can
cause misleading epidemiological associations between independent variables and outcomes.
Cox Proportional Hazard Model – a multivariate proportional hazards survival model that assumes that the impact of variables on the hazard
rate remains constant over time and is multiplicative. This technique is widely used in medical literature, in particular in observational studies.
Hazard Ratio (HR) – a type of relative risk derived from a Cox model. When comparing against a reference group (e.g. a placebo medicine) an
HR > 1 indicates increased risk of an event whilst an HR < 1 indicates reduced risk.
Parametric/non-parametric/semi-parametric survival models – these terms describe the assumptions made in the various survival analysis
methods. The simplest methods are the non-parametric techniques, which make no assumption about the underlying distribution (or shape) of the
hazard function (e.g. Kaplan–Meier method). Parametric survival models make assumptions about the impact of variables on outcomes and the
shape of the hazard function – at present these are not widely encountered in medical literature. Semi-parametric techniques include the Cox
model and make assumptions about the impact of variables on outcomes but not the shape of the hazard function.
constructed. The final Kaplan–Meier survival curve (shown in
panel d) is actually derived from the life table shown
in Table 3. This life table shows the survival function – that
is, the probability that an individual will survive until the
given time – which is recalculated as the cumulative
probability of survival since baseline, based on the number
of surviving patients at each point an event occurs.
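The calculation behind the life table can be sketched in a few lines. The example below is a minimal illustration that uses the hypothetical durations and outcomes from Table 2. The function name is an assumption, and the code is not drawn from any particular statistics package.

```python
# Hypothetical data from Table 2: (months of follow-up, died?).
FOLLOW_UP = [
    (28.9, True), (6.7, False), (4.3, True), (34.5, False), (47.5, False),
    (16.0, True), (5.3, True), (16.8, True), (42.7, True),
]

def kaplan_meier(observations):
    """Return life-table rows (time, at_risk, events, cumulative_survival),
    one row per distinct follow-up time, in increasing order of time."""
    rows = []
    survival = 1.0
    at_risk = len(observations)
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, died in observations if time == t and died)
        leaving = sum(1 for time, _ in observations if time == t)
        survival *= 1 - events / at_risk   # conditional survival at time t
        rows.append((t, at_risk, events, survival))
        at_risk -= leaving                 # deaths and censorings both leave
    return rows

for t, n, d, s in kaplan_meier(FOLLOW_UP):
    print(f"{t:5.1f}  at risk={n}  events={d}  S(t)={s:.3f}")
```

Censored subjects contribute to the number at risk up to the time they leave, but they do not reduce the survival estimate. This is why the cumulative proportion is unchanged at the censoring times.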
Although as shown in Table 3 the life table is easy to
comprehend, in studies where there are large numbers of
patients or outcomes, or where there are different groups of
patients being compared, the life table rapidly becomes too
large and unwieldy to be easily interpreted. For this reason,
the Kaplan–Meier survival plot is more commonly used. This
is simply a plot of the survival function against time (Fig. 1
panel d). Where two (or more) groups are involved, these are
displayed as two (or more) separate curves on the same axes.
The nearer the origin the curve is (i.e. the lower it is), the
worse the survival experience is for that group. The approach
Figure 1 Pictorial representation of survival data. Panel (a) shows the raw data as timelines as they would be collected in a clinical study, panel (b)
shows the same timelines but all originating at the same baseline, panel (c) shows these ordered by duration of follow-up and panel (d) shows a
survival curve generated using the Kaplan–Meier methodology.
Table 2 Structure of a typical data set for survival analysis
Subject   Start of follow-up   End of follow-up    Duration of follow-up (months)   Outcome
1         6 January 2006       4 June 2008         28.9                             Death
2         28 April 2006        19 November 2006    6.7                              Censored
3         23 November 2006     2 April 2007        4.3                              Death
4         6 February 2007      21 December 2009    34.5                             Censored
5         6 January 2006       21 December 2009    47.5                             Censored
6         20 December 2006     21 April 2008       16.0                             Death
7         23 July 2006         30 December 2006    5.3                              Death
8         31 March 2007        22 August 2008      16.8                             Death
9         28 March 2006        19 October 2009     42.7                             Death
of Kaplan and Meier is often referred to as non-parametric as
it makes no assumptions about the underlying distribution of
the data and makes no attempt to describe this numerically.
Whilst it may be clear that there is a difference between two
lines on a Kaplan–Meier survival plot, it might not be clear
whether such differences have arisen by chance or whether
there is actually a meaningful underlying difference. Several
statistics exist that formally test whether there is a statistically
significant difference between two (or more) survival curves.
The most commonly applied of these is the log-rank test. Where
this gives a p-value of <0.05, the difference would conventionally be considered
statistically significant. Other tests do exist but are
not widely used in medical literature.
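For readers curious about what the log-rank test computes, the following is a minimal two-group sketch. At each event time it compares the number of events observed in one group with the number expected if both groups shared the same risk, then refers the resulting statistic to a chi-square distribution with one degree of freedom. The function name and the follow-up data are invented for illustration. In practice, an established statistics package would be used.

```python
import math

def log_rank_test(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is True for an
    observed event and False for a censored observation.
    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)   # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)   # at risk in B
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n          # events expected in A under H0
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n ** 2 * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square tail, 1 df
    return chi_square, p_value

# Invented follow-up times (months) for two hypothetical treatment groups.
control = [(2, True), (3, True), (5, True), (7, True), (9, False)]
treated = [(6, True), (10, False), (12, True), (14, False), (15, False)]
chi_square, p_value = log_rank_test(control, treated)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```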
A good example is seen in a study published by Severe et al.
(2010). This randomised study compared early vs. standard
antiretroviral therapy in HIV-infected patients in Haiti.
In two separate survival plots considering time until death
and time until development of tuberculosis, the survival curve
for the standard therapy group is below that of the early
intervention group in each. For both of these, the log-rank
test gave a small p-value (0.001 and 0.01, respectively)
indicating that these are both statistically significant findings.
The authors concluded that the early intervention approach
resulted in better outcomes.
In another randomised study by Dumville et al. (2009), the
investigators studied the impact of larval therapy (maggots)
on leg ulcers to see whether there was an impact on time-
to-healing and time-to-debridement of the ulcer. In this case,
the time being modelled is that until the onset of a beneficial
outcome (healing of the wound), so the plots are inverted. In
the case of time-to-healing, the two survival curves run more
or less concurrently and there seems to be no difference
between the two treatment groups – this suspicion that there
is no difference in time-to-healing is supported by the log-rank
test, which gives a non-significant p-value of 0.331. For the
time-to-debridement of the wound, the larvae curve is
substantially higher than that of the control group, indicating a
shorter time until debridement of the wound, a finding that –
with a log-rank test p < 0.001 – is highly significant.
Whilst this technique is easy to use and interpret, it has its
limitations. Although differences between groups can be seen
and their statistical significance tested, no estimate of the
actual effect size is quantified. In addition where there are
imbalances between groups, as will very likely occur in non-
randomised observational studies, the findings will be prone
to confounding and bias. This cannot be adjusted for with
such a simple analysis. The next section will deal with
multivariate techniques that can address these issues.
Multivariate analysis
Multivariate survival models are widely used in medical
literature with examples of both randomised and observa-
tional studies (Versmissen et al. 2008, Dumville et al. 2009,
de Oliveira et al. 2010, Severe et al. 2010). In randomised
controlled trials, such methodologies can be used where there
is a desire to quantify the size of relative differences between
groups (Severe et al. 2010), where there is the existence of
strong prognostic indicators that could be biasing the results
(Dumville et al. 2009), or where there is a desire to quantify
the impact of other prognostic indicators (Dumville et al.
2009). However, it is in non-randomised observational
studies – where the nature of the data means there are likely
to be confounding variables – that multivariate survival
analyses have had the biggest impact.
The most widely used technique is the Cox Proportional
Hazard Model, developed by Sir David Cox, in a paper which
has become one of the most widely cited (and perhaps least
read) papers with almost 25,000 citations currently recorded
by ISI Web of Knowledge (Cox 1972). This technique works
by modelling the hazard function. This is most easily
interpreted as a subject’s risk of suffering the event of interest
at any given time during the follow-up. The model makes no
assumptions about the underlying shape of the hazard, and
no attempt is made to parameterise or describe this numer-
ically. However, it is assumed that the impact of the variables
included is proportional and this part of the model is
parameterised. For this reason, this technique is sometimes
referred to as a semi-parametric approach.
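The structure described above is conventionally written as follows. The notation is an assumption introduced here for illustration and does not appear in the article: h_0(t) is the unspecified baseline hazard, x_1, ..., x_p are the explanatory variables and beta_1, ..., beta_p are the coefficients estimated by the model.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

The baseline hazard h_0(t) is left unparameterised, while the exponential term is the parameterised part. A one-unit increase in x_j multiplies the hazard by exp(beta_j) at every time point, which is the proportionality assumption.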
An important consideration is whether there are sufficient
events in the cohort for the study to be valid. Where there are
a large number of covariates included in an analysis, it is
important to check that the analysis is not underpowered.
Typically, it is thought that there should be approximately 10
events per covariate for an analysis to be valid, although there
Table 3 Life table representation of the data from Table 2
Time (months)   Number at risk   Events   Proportion surviving until end of period   Cumulative proportion surviving
4.3             9                1        1 − 1/9 = 0.889                            0.889
5.3             8                1        1 − 1/8 = 0.875                            0.778
6.7*            7                0        1 − 0/7 = 1.000                            0.778
16.0            6                1        1 − 1/6 = 0.833                            0.648
16.8            5                1        1 − 1/5 = 0.800                            0.519
28.9            4                1        1 − 1/4 = 0.750                            0.389
34.5*           3                0        1 − 0/3 = 1.000                            0.389
42.7            2                1        1 − 1/2 = 0.500                            0.194
47.5*           1                0        1 − 0/1 = 1.000                            0.194
*Censored observations.
is some suggestion that this could be relaxed (Vittinghoff &
McCulloch 2007).
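The rule of thumb amounts to simple arithmetic, sketched below. The function name is hypothetical, and the threshold of 10 events per covariate is a default that can be changed to reflect the suggested relaxation.

```python
def max_covariates(n_events, events_per_covariate=10):
    """Largest number of covariates supported under an events-per-covariate rule."""
    return n_events // events_per_covariate

print(max_covariates(6))    # the six deaths in the hypothetical data of Table 2
print(max_covariates(250))  # an invented larger study
```

Note that the calculation is based on the number of events and not on the number of patients enrolled.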
Assumption of proportional hazards
The assumption of proportional hazards is of key impor-
tance, and there are some implications that should be
considered. If the subjects are divided up into four groups
based on their age, it is assumed that the relative impact on
outcome on going from age group 1 to age group 2 would be
the same as going from age group 3 to age group 4. This
assumption can easily be tested by plotting ‘log-minus-log’
plots, something that should always be performed and
reported when using this analysis. It also assumes that the
impact of combined variables is multiplicative, for example,
if men are at double the risk of an adverse event and if
patients with diabetes are also at double the risk, then male
patients with diabetes would have a quadrupled risk overall.
This assumption can easily be tested by modelling interaction
terms, which could – if found to be significant – allow male
patients with diabetes to have a disproportionately high or
low risk. There is also an implicit assumption that the hazard
ratio is constant over time. The instantaneous
risk of an event may itself rise or fall during follow-up, but it is
assumed that the relative difference between
categories remains constant throughout. Again this assumption can
easily be tested either by plotting log-minus-log plots or by
modelling interactions with time and should always be
reported.
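The reasoning behind the log-minus-log plot can be illustrated numerically. If hazards are proportional, the survival function of the comparison group equals that of the baseline group raised to the power of the hazard ratio, so the two curves are separated by a constant vertical distance, log(HR), after the log-minus-log transformation. The survival probabilities and the hazard ratio below are assumed values chosen purely for illustration.

```python
import math

# Assumed baseline survival probabilities at successive time points
# (illustrative values only).
baseline_survival = [0.95, 0.85, 0.70, 0.50, 0.30]
hazard_ratio = 2.0  # assumed constant hazard ratio for the comparison group

def log_minus_log(s):
    """Complementary log-log transform: log(-log(S(t)))."""
    return math.log(-math.log(s))

# Under proportional hazards, S1(t) = S0(t) ** HR at every time point.
comparison_survival = [s ** hazard_ratio for s in baseline_survival]

for s0, s1 in zip(baseline_survival, comparison_survival):
    gap = log_minus_log(s1) - log_minus_log(s0)
    print(f"S0={s0:.2f}  S1={s1:.3f}  vertical gap={gap:.3f}")
```

The gap is the same at every time point, so the transformed curves run parallel. Curves that converge, diverge or cross on such a plot suggest that the assumption does not hold.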
Interpretation of the Cox model
The output from the Cox model is the Hazard Ratio (HR)
which is interpreted in the same way as other relative risks
(see example discussed below). Where the HR is >1, this
indicates there is an increased risk of an event associated with
that variable. Where the HR is below 1, this indicates a
reduced risk. The associated 95% confidence interval (CI)
gives an indication of the statistical uncertainty of the HR
estimate: where these cross 1, this indicates that there is no
statistically significant difference. The Cox model also allows
the plotting of ‘adjusted’ Kaplan–Meier survival curves which
allow the comparison of curves that are balanced for the
other variables in the model.
In the randomised study considering the impact on wound
healing of larval therapy (Dumville et al. 2009), the authors
used a multivariate Cox model to investigate whether any
relationship between the treatment and known prognostic
indicators had an impact on time-to-healing and time-
to-debridement. For both outcomes, this produced results in
line with the unadjusted analysis: a non-significant HR of
1.13 (95% CI 0.76–1.68) for time-to-healing and a significant
2.31 (95% CI 1.65–3.24) for time-to-debridement of wound.
The advantage here is that the Cox model adjusts for the
impact of other covariates and provides an effect size (a 2.3
times increased ‘risk’ of early debridement of the ulcer),
something the Kaplan–Meier approach would not provide.
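The reading of these results can be expressed as a simple decision rule. The sketch below encodes it in a hypothetical helper function and applies it to the two hazard ratios quoted above from Dumville et al. (2009). It is an aid to interpretation only and is no substitute for the underlying model.

```python
def interpret_hazard_ratio(hr, ci_lower, ci_upper):
    """Describe a hazard ratio using its 95% confidence interval."""
    if ci_lower <= 1 <= ci_upper:
        return "no statistically significant difference"
    direction = "increased" if hr > 1 else "reduced"
    return f"statistically significant {direction} rate of the event"

# Values reported by Dumville et al. (2009), as quoted above.
print(interpret_hazard_ratio(1.13, 0.76, 1.68))  # time-to-healing
print(interpret_hazard_ratio(2.31, 1.65, 3.24))  # time-to-debridement
```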
An observational study using data from a Scotland-wide
health survey looked to see whether there was an association
between frequency of toothbrushing and incidence of car-
diovascular disease, seeking to confirm a biologically plau-
sible association that had previously been observed elsewhere
(de Oliveira et al. 2010). In an observational study like this,
there is a strong likelihood of confounding, particularly
relating to socioeconomic status and health behaviour – this
would mean that any association observed between worse
oral hygiene and cardiovascular outcomes might be because
subjects with lower socioeconomic status were less likely to
brush their teeth regularly and more likely to suffer cardio-
vascular events because of other associated unhealthy behav-
iour. To adjust for this, the authors used a multivariate Cox
model that adjusted for other prognostic indicators including
socioeconomic status. The results show that, for a simple
model which took into account only age and sex, there is a
strong association between cardiovascular disease and
toothbrushing less than once daily vs. twice daily: HR 2.3 (95% CI
1.8–3.1). However, when a more complex Cox model was
used that incorporated socioeconomic status as a variable, the
relationship remained but was weaker with an HR of 1.7 (95%
CI 1.3–2.3). This implies that the association between
toothbrushing and cardiovascular outcomes is partly con-
founded by socioeconomic status, but that an association
remains after this is taken into account.
Figure 2 shows an example of Kaplan–Meier survival
curves. In this study, the authors were attempting to establish
whether there was an association between thyroid-stimulat-
ing hormone (TSH) serum concentration and cardiovascular
morbidity and mortality in patients receiving long-term
thyroxine (Flynn et al. 2010). In this case, the TSH is an
indicator of whether the patient is being over or undertreated
with thyroxine – ‘high’ TSH indicating undertreatment,
‘normal’ TSH indicating appropriate treatment control and
‘low’ and ‘suppressed’ indicating overtreatment. The survival
curves shown were ‘adjusted’, that is, they were derived from
a Cox model so that the four groups were balanced for the
various covariates included in the analysis (see Fig. 2 for
details). They show increased risk of cardiovascular events
for patients with a high or suppressed TSH compared with a
normal TSH (p < 0.0001 for both), but no statistically significant difference between
patients with normal and low TSH (p = 0.084).
Time-dependent variables
Extensions and alternatives to the Cox proportional hazards
model are sometimes encountered in medical research. Data
from both clinical trials and observational studies sometimes
use the Cox model with a ‘time-dependent variable’ or ‘time
varying covariate’. This allows for the fact that during
follow-up, patients may spend some periods of time both
exposed and not exposed to the variable of interest. For
example, in an observational study considering the efficacy of
statins in familial hypercholesterolemia (Versmissen et al.
2008), the authors were aware that during a total of
16,792 years of follow-up amongst the 1950 individuals,
patients were likely to spend some periods of time taking
statins and other periods not taking statins. To allow for this
in their analysis, they entered statins as a time-dependent
variable so that events that occurred whilst exposed were
considered separately to events that occurred whilst not
exposed. In this situation, the resulting HR is interpreted the
same way as for any other Cox model (i.e. where the HR is
<1, the exposure is associated with a reduced risk and where it is >1,
with an increased risk). In the study mentioned
above, it was concluded that use of statins had an HR of 0.18,
indicating an 82% reduction in the rate of coronary heart
disease associated with statin use.
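One common way of preparing data for a time-dependent variable is to split each patient's follow-up into intervals during which exposure is constant. The sketch below illustrates this layout for a single invented patient. It is an assumption about how such data might be structured and does not describe the data handling used by Versmissen et al. (2008).

```python
def split_follow_up(total_years, event, exposure_changes, exposed_at_start=False):
    """Split one patient's follow-up into (start, stop, exposed, event) rows.

    exposure_changes: times (years from baseline) at which exposure switches.
    The event, if any, is attached only to the final interval.
    """
    rows = []
    start, exposed = 0.0, exposed_at_start
    for change in sorted(t for t in exposure_changes if 0 < t < total_years):
        rows.append((start, change, exposed, False))
        start, exposed = change, not exposed
    rows.append((start, total_years, exposed, event))
    return rows

# Invented patient: starts a statin at year 3, stops at year 7,
# and has a coronary event at year 9 while unexposed.
for row in split_follow_up(9.0, True, [3.0, 7.0]):
    print(row)
```

Each row contributes follow-up time to the exposed or unexposed state, and the event is counted against whichever state the patient was in when it occurred.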
It is worth noting that where a time-dependent variable is
used, it is not possible to plot survival curves: survival curves
are based on fixed categories of patients that are usually
defined at baseline – however, where a time-dependent
variable is used, patients may be switching between catego-
ries during follow-up.
Parametric survival models
Another extension of the techniques discussed above is the
use of fully parametric survival models such as the acceler-
ated failure time models. Here, assumptions are made about
both the impact of the variables considered and the shape of
the underlying survival function (i.e. the shape of the survival
curve). These assumptions allow stronger inferences to be
made and these have several implications. Whereas the Cox
proportional hazards approach models the impact of vari-
ables on the hazard function, parametric analyses typically
model the impact of variables directly on survival time. This
means the interpretation of the model differs from the Cox
model with the output relating directly to the duration of
survival, thus allowing an inference to be made about the
actual time until the event. This also means that for a given
set of variables, it is possible to calculate what would be the
‘mean’ survival time. In many ways, this might be easier to
interpret, especially for patients who may struggle with the
concept of relative risks. Another implication is that risk
estimates are calculated that typically have narrower CIs
implying a reduced level of statistical uncertainty; however, it
should be remembered that the assumptions made might lead
to a less stable or inaccurate representation of the data. A
further advantage of these parametric models is that the
shape of the hazard function can be described explicitly.
Whereas the Cox model leaves the underlying hazard unspecified and
assumes only that hazard ratios remain constant with time, the Weibull parametric
model allows the inclusion of a risk that, in time, may either
increase (e.g. increasing risk of cardiovascular disease) or
decrease (e.g. risk of death following a road traffic accident).
A model commonly encountered is that which assumes an
exponential function – this is essentially a constant hazard
function and should in many respects yield conclusions
similar to that from the Cox model. Other advantages of
these models are briefly, but clearly discussed by May et al.
(2003). Here, it is shown that by making basic assumptions,
it is possible to build a more useful model that allows an
estimate of survival functions to be made where very few or
no events occur. This paper also shows how parametric
methods give narrower CI limits.
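The difference between the exponential and Weibull models can be seen by evaluating the Weibull hazard function for different values of its shape parameter. The parameter values below are assumptions chosen for illustration. A shape of 1 corresponds to the exponential model.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard of a Weibull distribution at time t > 0.

    shape < 1: hazard decreases with time; shape == 1: constant hazard
    (the exponential model); shape > 1: hazard increases with time.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)

times = [0.5, 1.0, 2.0, 4.0]
for shape in (0.5, 1.0, 2.0):   # assumed shape parameters, for illustration
    hazards = [round(weibull_hazard(t, shape), 3) for t in times]
    print(f"shape={shape}: {hazards}")
```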
Despite the advantage of such models, they remain
relatively uncommon in medical literature, perhaps because
of the problem of easily interpreting models where risk is
changing throughout the course of the study or perhaps
because of the ubiquitous nature of the Cox model, which is
so well established and widely used. Readers wishing to
Figure 2 Adjusted survival curve derived from a Cox model showing
time to cardiovascular admission or death (adjusted for age, sex,
history of hyperthyroidism, history of cardiovascular disease,
socioeconomic status and diabetic status). There were increased events for
patients with a high thyroid-stimulating hormone (TSH)
(p < 0.0001) and those with a suppressed TSH (p < 0.0001)
compared with a normal TSH, but no difference between normal and low
TSH (p = 0.084) (Flynn et al. 2010). Copyright 2010, The
Endocrine Society.
Original article Survival analysis
� 2012 Blackwell Publishing Ltd
Journal of Clinical Nursing 7
explore these parametric survival models are directed to the
bibliography for further reading.
Strengths
There are some advantages in using survival analysis
techniques. As stated above, they enable a very efficient
use of data where there are varying durations of follow-up
(as is often the case with clinical data); to use any other
form of analysis where there is a dichotomous outcome
would represent a waste of potentially valuable informa-
tion.
The widespread use of survival analyses, especially those
involving Kaplan–Meier survival curves and Cox proportional
hazards models, means that these methodologies are widely
understood and that there is good knowledge of how to
interpret them. Another advantage is the ability to adjust
for potentially confounding variables, which has resulted in
the Cox model being commonly used in observational studies.
Weaknesses
The limitations of survival analysis stem from the nature of
the methodology and the assumptions involved when using
it. Survival analysis cannot be used where there is a
continuous outcome measure, for example, blood pressure or
serum creatinine concentration. Another potential issue is
with attrition of the cohort, and it should always be made
clear how many patients survive to the end of follow-up.
Although studies will have plenty of subjects in the early
stages of follow-up, towards the end there might be very few
subjects remaining, with the result that the tail of the
survival curve might be subject to a considerable degree of
statistical uncertainty. Ideally, the actual number of patients
still at risk should be shown under the x-axis of a
Kaplan–Meier survival plot, as in the paper by Severe et al.
(2010).
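To illustrate how the number at risk dwindles during follow-up, the following sketch computes a Kaplan–Meier life table together with the numbers at risk that might be printed beneath the x-axis. The follow-up times and event indicators are invented for illustration and the function names are hypothetical; a real analysis would normally rely on an established statistical package.

```python
def kaplan_meier(times, events):
    """Return life-table rows (time, number at risk, events, survival estimate).

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    """
    table = []
    survival = 1.0
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events (not censorings) occurring exactly at time t.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if d > 0:
            survival *= 1 - d / at_risk
            table.append((t, at_risk, d, survival))
    return table


def number_at_risk(times, checkpoints):
    """Number of subjects still at risk at each checkpoint on the x-axis."""
    return [sum(1 for u in times if u >= c) for c in checkpoints]


# Hypothetical data: eight subjects, follow-up in months.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]

for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.3f}")
print("At risk at 0, 4, 8, 12 months:", number_at_risk(times, [0, 4, 8, 12]))
```

In this invented example only one of the eight subjects remains at risk by 12 months, so the tail of the curve rests on very little information.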
The assumptions that are so important in underpinning the
model may often be violated or inadequately checked. In
particular, the assumption of proportional hazards (that the
ratio of hazards between groups remains constant over time)
is often not true, something that is especially likely
over a protracted time period. Where such a situation arises,
another type of model should be considered, for example, the
Weibull survival model, although interpretation will not be as
straightforward. As with other statistical modelling
techniques, where a continuous variable is included as a
potential predictor, there is an implicit assumption of
linearity, so that the difference in risk between individuals
aged 20 and 30 years will be the same as the difference
between those aged 75 and 85 years.
Although this might often be a reasonable assumption, there
will obviously be occasions where this assumption is not
valid. In such cases, it might be more appropriate to
categorise the continuous variables (for example, expressing
age in groups). Alternatively, a continuous variable could be
fitted as a fractional polynomial, which allows for the nonlinear
modelling of continuous variables (Sauerbrei et al. 1999);
however, this again can be hard to explain and interpret. The
ease and speed with which modern statistical software can
produce such a model means it is very easy for errors to be
unintentionally introduced!
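The linearity assumption can be made concrete with a short sketch. It assumes that age enters the model on the log-hazard scale, so that the hazard ratio between two ages is exp(beta x difference), and it uses the usual fractional polynomial convention in which a power of 0 denotes the natural logarithm. The coefficients and function names are hypothetical and chosen only for illustration; they are not estimates from any study.

```python
import math


def hazard_ratio_linear(beta, age_from, age_to):
    """Hazard ratio for age_to versus age_from when age is a linear term."""
    return math.exp(beta * (age_to - age_from))


def fp_transform(x, power):
    """Fractional polynomial transform: x ** power, with power 0 meaning log(x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log(x) if power == 0 else x ** power


def hazard_ratio_fp(beta, power, age_from, age_to):
    """Hazard ratio when age enters the model through a single FP term."""
    diff = fp_transform(age_to, power) - fp_transform(age_from, power)
    return math.exp(beta * diff)


# Linear term (hypothetical beta = 0.05 per year): the two ratios are identical.
print(hazard_ratio_linear(0.05, 20, 30), hazard_ratio_linear(0.05, 75, 85))

# Single FP term with power 2 (hypothetical beta): the same 10-year gap now
# carries a larger hazard ratio at older ages than at younger ages.
print(hazard_ratio_fp(0.0005, 2, 20, 30), hazard_ratio_fp(0.0005, 2, 75, 85))
```

Under the linear term a 10-year age gap always corresponds to the same hazard ratio, whereas the transformed term lets that ratio vary with age, at the cost of a coefficient that is harder to explain.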
Conclusion
In summary, survival analysis techniques are valuable and
widely used tools when analysing data that have a binary (or
dichotomous) outcome in subjects with uneven follow-up
time. These techniques are easy to use and are readily
interpreted by readers without the need for detailed under-
standing of statistical methods (see Table 1 for a list of
commonly encountered terminology). Although these techniques
are easy to use and widespread, there are several challenges,
assumptions and difficulties that may be encountered, and these
should always be reported by investigators and carefully
considered by readers. A variety of more advanced techniques
can be employed to enhance the analyses. Although not yet
widely employed, it seems likely that these techniques will
become increasingly widespread over the coming years.
Relevance to clinical practice
Survival analysis is widely encountered in clinical literature,
in both observational research and randomised controlled
trials. It is important that readers can
understand and critically evaluate the use of this technique.
Contributions
Study design: RF; data collection and analysis: RF and
manuscript preparation: RF.
Conflict of interest
The author has no conflicts of interest to declare.
References
Cox DR (1972) Regression models and life tables. Journal of the
Royal Statistical Society, Series B (Methodological) 34, 187–220.
Dumville JC, Worthy G, Bland JM, Cullum N, Dowson C, Iglesias C,
Mitchell JL, Nelson EA, Soares MO & Torgerson DJ (2009) Larval
therapy for leg ulcers (VenUS II): randomised controlled trial.
British Medical Journal 338, b773.
Flynn RW, Bonellie SR, Jung RT, MacDonald TM, Morris AD & Leese GP
(2010) Serum thyroid-stimulating hormone concentration and
morbidity from cardiovascular disease and fractures in patients
on long-term thyroxine therapy. Journal of Clinical Endocrinology
& Metabolism 95, 186–193.
Kaplan EL & Meier P (1958) Nonparametric estimation from
incomplete observations. Journal of the American Statistical
Association 53, 457–481.
May M, Sterne J & Egger M (2003) Parametric survival models may be
more accurate than Kaplan-Meier estimates. British Medical
Journal 326, 822.
de Oliveira C, Watt R & Hamer M (2010) Toothbrushing, inflammation
and risk of cardiovascular disease: results from Scottish health
survey. British Medical Journal 340, c2451.
Sauerbrei W, Royston P, Bojar H, Schmoor C & Schumacher M (1999)
Modelling the effects of standard prognostic factors in
node-positive breast cancer. German Breast Cancer Study Group
(GBSG). British Journal of Cancer 79, 1752–1760.
Severe P, Juste MA, Ambroise A, Eliacin L, Marchand C, Apollon S,
Edwards A, Bang H, Nicotera J, Godfrey C, Gulick RM, Johnson WD
Jr, Pape JW & Fitzgerald DW (2010) Early versus standard
antiretroviral therapy for HIV-infected adults in Haiti. New
England Journal of Medicine 363, 257–265.
Versmissen J, Oosterveer DM, Yazdanpanah M, Defesche JC, Basart DC,
Liem AH, Heeringa J, Witteman JC, Lansberg PJ, Kastelein JJ &
Sijbrands EJ (2008) Efficacy of statins in familial
hypercholesterolaemia: a long term cohort study. British Medical
Journal 337, a2423.
Vittinghoff E & McCulloch CE (2007) Relaxing the rule of ten
events per variable in logistic and Cox regression. American
Journal of Epidemiology 165, 710–718.
Recommended reading
Hosmer DW, Lemeshow S & May S (2008) Applied Survival
Analysis: Regression Modeling of Time-to-Event Data, 2nd
edn. Wiley, Chichester.
Collett D (2003) Modelling Survival Data in Medical
Research, 2nd edn. Chapman & Hall/CRC, Boca Raton, FL.