

STATISTICS IN MEDICINE

Statist. Med. 18, 2271-2286 (1999)

A UNIFIED THEORY FOR SEQUENTIAL CLINICAL TRIALS

JOHN WHITEHEAD*

Medical and Pharmaceutical Statistics Research Unit, The University of Reading, Reading, RG6 6FN, U.K.

SUMMARY

The theory underlying sequential clinical trials is now well developed, and the methodology is increasingly being implemented in practice, both by the pharmaceutical industry and in the public sector. The consequences of conducting interim analyses for frequentist interpretations of data are now well understood. A large number of approaches are available for the calculation of stopping boundaries and for the eventual terminal analysis. In this paper, the principles of the design and analysis of sequential clinical trials will be presented. Existing methods will be reviewed, and their relationships with the general principles will be clarified. Controversies and gaps within the methodology will be highlighted. It is intended that presentation of the subject as a single unified theory will allow the few essential underlying features to be better appreciated. Copyright © 1999 John Wiley & Sons, Ltd.

1. INTRODUCTION

The methodology of sequential clinical trials has come of age. More and more major clinical trials are being designed using formal sequential procedures, and more and more contributions are being made to the underlying statistical methodology. Unfortunately, it sometimes appears as if the many alternative approaches to sequential design and analysis are fundamentally different and irreconcilable. It is the purpose of this paper to show that this is not the case. The principles underlying sequential methodology are universal, and they are also simple and few. It is the technical details of realizing these principles which are complicated, approximate and incomplete. In the following sections, the key ingredients of the sequential approach will be identified and explained, alternative approaches to each will be described, and an attempt will be made to relate them to one another. The gaps in existing theory will be pointed out.

Throughout this paper, it will be assumed that two treatments, an experimental (E) and a control (C), are to be compared in a major, parallel group, phase III, randomized clinical trial. The primary objective of the trial is to determine whether E is more efficacious than C in terms of a single, specific patient response. A frequentist final analysis is sought, with a P-value and point and interval estimates of the advantage of E over C. The procedure is to satisfy a prespecified power requirement, and will be conducted as a series of interim analyses, each involving a comparison of the evidence of efficacy of E and C to date, with stopping occurring as soon as one of the interim analyses is in some sense sufficiently convincing.

* Correspondence to: John Whitehead, Medical and Pharmaceutical Statistics Research Unit, The University of Reading, Reading, RG6 6FN, U.K.

CCC 0277-6715/99/172271-16$17.50 Copyright © 1999 John Wiley & Sons, Ltd.

This narrow focus does exclude interesting problems concerning more than two treatments and more than one endpoint, but it has been the setting for many of the methodological contributions to the subject. Some generalizations will be mentioned in Section 6. Bayesian approaches to sequential clinical trials lie outside the scope of this paper, although the following items (i), (ii) and (iii) remain relevant.

For the frequentist, two treatment, single endpoint case, the key ingredients of any sequential approach which will allow the significance of the treatment difference to be evaluated and its magnitude to be estimated are as follows:

(i) A parameter which expresses the advantage of E over C in terms of efficacy. This is an unknown population characteristic about which hypotheses will be posed and of which estimates will be sought. Its value will be denoted by θ.

(ii) A statistic which expresses the advantage of E over C apparent from the sample of data available at an interim analysis, and a second statistic which expresses the amount of information about θ contained in the sample.

(iii) A stopping rule which determines whether the current interim analysis should be the last, and if so whether it is to be concluded that E is better than or worse than C, or that no treatment difference has been established.

(iv) A frequentist method of analysis, valid for the specific design used, giving a P-value and point and interval estimates for θ.

Different proposals have been given by various authors for each of these four ingredients. Often an author will present a choice for all four, and it sometimes appears that those four solutions must always be used together. In fact, for the most part, a mix-and-match strategy can be adopted, with any combination of choices for the four ingredients being permitted.

2. PARAMETERIZING THE ADVANTAGE OF THE EXPERIMENTAL TREATMENT

The necessity to express the advantage of E over C in terms of a single unknown parameter θ is, of course, not peculiar to sequential methodology. The authority of any clinical trial will be greatly enhanced if a single primary analysis is specified in the protocol, and is subsequently found to show significant benefit of E.

At the design stage, the role of θ is to provide a language in which the power requirement for the trial can be expressed. It will be required that, with probability (1 - β), E be shown to be significantly more efficacious than C, when a given degree of treatment advantage is present. Significance will be taken to mean that the P-value against the two-sided alternative is less than α (so that when θ = 0 significant advantage is erroneously claimed with probability ½α). The given degree of treatment advantage will be specified in terms of θ, a value denoted by θR (> 0) being chosen. This can be referred to as the reference improvement. At the analysis stage, hypotheses about θ will be tested, notably the null hypothesis H0: θ = 0 which represents no treatment difference, and point and interval estimates of θ will be calculated.

These roles imply that the primary criterion for choosing θ must be one of clinical interpretation. Investigators have to be able to express the reference improvement in a way which can be translated into a value for θR, and the final estimates of θ must be clinically meaningful.

In some cases, the choice of θ will depend on modelling assumptions. For example, in the case of survival data, θ might be chosen to be minus the log-hazard ratio, that is, θ = -log{hE(t)/hC(t)}, where hE(t) and hC(t) denote the hazard functions at time t since randomization, on E and C respectively. Implicit in this definition is the assumption that the hazards are proportional so that θ is constant in t. Such a choice should be made only if there is good reason to believe that hazards will be proportional, or that the form of averaging implied by the log-hazard ratio is appropriate. Alternative parameterizations for survival data are possible.1,2 Similar considerations apply to the assumption of proportional odds in the analysis of ordered categorical data.

Most approaches to sequential methodology make use of test statistics which, when plotted over the course of the study, resemble points on a Brownian motion. More details of the Brownian motion theory will be described in the next section. Here, it is sufficient to understand that in sequential analysis, a second criterion in the choice of parameterization is the consequent accuracy of the Brownian motion approximation. Central to this theory is the expression of the log-likelihood as a quadratic function in θ. When θ is the only unknown parameter, then at any interim analysis the log-likelihood depends on the known data and on θ only. Taylor's expansion guarantees that when θ is small, the log-likelihood can be expanded in terms of powers of θ, and that the cubic terms and above can be ignored. To this degree of accuracy, the log-likelihood resembles that obtained from a single sequence of independent normally distributed random variables with mean θ and variance 1. The theory in the latter case is simple, and naturally leads to a connection with Brownian motion. Because of the Taylor series approximation, this theory is inherited in the general case when θ is small. When nuisance parameters are present, they can be replaced by conditional maximum likelihood estimates which depend on the unknown value of θ. Putting these estimates into the log-likelihood yields the profile log-likelihood, which is a function of θ only, and can again be approximated by a Taylor series up to quadratic terms in θ.

The largest neglected terms in the Taylor expansion are those involving third derivatives of the log-likelihood. It follows that parameterizations in which these third derivatives are likely to be small will correspond to situations in which the Brownian motion theory is most accurate.3,4

As an example, consider the case of binary patient responses: success or failure. The probabilities of success are denoted by pE and pC, respectively. Possible parameterizations include:

(i) θ = pE - pC;
(ii) θ = log[pE(1 - pC)/{pC(1 - pE)}];
(iii) θ = g(pE) - g(pC)

where

g(p) = ∫₀^p dt/{t(1 - t)}^(2/3),  p ∈ (0, 1).

The first is clinically meaningful and easy to interpret. However, it does have certain fundamental limitations. For example, the choice θR = 0.1 has no interpretation if pE is less than 0.1. The value of θR may be set by asking for a likely value of pC and a desirable value of pE, but the power of a sequential procedure will be achieved for any pair pC and pE which give rise to the reference improvement θR, and is thus independent of the true value of pC. In some sense all such pairs should represent situations in which a probability of (1 - β) of observing a significant advantage of E over C is desirable. Parameterization (i) is likely to achieve this only for a limited range of pC. Consideration of the third derivative of the log-likelihood with respect to this form of θ shows that the Brownian motion approximation is likely to be poor except when θ is very small indeed. In turn this will mean that confidence intervals for θ which include values outside the permitted range (-1, 1) are quite likely to occur.


Parameterization (iii) is optimal in the sense of eliminating the third derivative of the log-likelihood completely and thus ensuring close adherence to the asymptotic theory. Unfortunately, it leads to virtually insuperable problems of interpretation and communication. In the case of sequential trials in which all patients receive the experimental treatment and the success rate is to be compared to some fixed standard value, the asymptotic theory can be poor unless parameterization (iii) is used, and in this non-comparative setting it is quite simple to express conclusions in terms of pE itself. This approach has been applied to monitoring the viability of accessions stored in seed banks,5 and has been compared with exact methods in the context of medical studies.6

Parameterization (ii) is often the most appropriate for comparative trials. Any value of θ between plus and minus infinity can be interpreted, and the asymptotic theory is usually adequate. Furthermore, this form leads on naturally to logistic regression methods when there are covariates to be adjusted for.

A more difficult case is that of normally distributed random variables. Suppose that the mean response is μE for patients randomized to E, and μC for those on C, and that large responses represent good outcomes, so that it is hoped that μE is larger than μC. The standard deviation of responses is σ for both treatment groups. Two natural choices for θ are as follows:


(iv) h"kE!kC;(v) h"(kE!kC)/p.

Parameterization (v) has some strong mathematical advantages. The accuracy of the Brownian motion theory is far greater than for parameterization (iv), as has been confirmed by simulation.7 Also, parameterization (v) gives a dimensionless quantity, and allows sample sizes to be fixed and sequential designs to be chosen in a way which is independent of the value of σ.

Inconveniently, clinicians invariably think in terms of the unstandardized quantity (iv). In order to overcome the problem of inaccuracy, a good strategy is to use an anticipated (guessed) value of σ to translate a specification in terms of parameterization (iv) into the standardized form given by (v), and to proceed using the latter version of θ. The issue of whether the initial guess should be modified in the light of interim data will be considered in Section 6 below. Problems remain at the final analysis if point and interval estimates of the standardized version of θ are to be transformed into statements about the unstandardized difference between mean responses.
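As a minimal sketch of this strategy (all numbers hypothetical), a clinically specified mean difference is standardized with a guessed σ at the design stage, and a final standardized estimate can only be translated back through some value of σ, here the same design guess:

```python
delta_clinical = 5.0   # clinically meaningful difference muE - muC (illustrative)
sigma_guess = 12.0     # anticipated standard deviation, guessed at design time

# standardized reference improvement, parameterization (v)
theta_R = delta_clinical / sigma_guess
print(round(theta_R, 4))  # 0.4167

# back-transforming a final standardized estimate for clinicians
# requires choosing a sigma; here the design guess is reused
theta_hat = 0.30
print(round(theta_hat * sigma_guess, 2))  # 3.6 on the original response scale
```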

3. TEST STATISTICS FOR USE IN INTERIM ANALYSES

A sequential trial consists of a series of interim analyses. At each of these it is necessary to calculate a statistic which expresses the current observed advantage of E over C and a statistic which represents the information about θ currently available. Here, the former will be denoted by Z and the latter by V, and it will be supposed that they are chosen in such a way that E(Z) = θV and var(Z) = V in large samples. The notation Z for a random variable which has variance other than one is unfortunate. However, I have used it so extensively in my own work that it would be confusing to change it now. The desire to plot a value with expectation θV against V is, on the other hand, perfectly natural. It means that in such plots we can look for straight line trends, and interpret the gradient of such a trend as the advantage θ of E over C. Other authors suggest the construction of plots of Z/√V against V, in which trends will show up as parabolic relationships, whereas the construction of plots so that relationships can be seen as straight lines is a fundamental principle of good graphical presentation.8


There are two general approaches to the construction of test statistics Z and V, although other choices can be made in specific cases. The first9 uses the score statistic evaluated under the null hypothesis for Z and the observed form of Fisher's information for V. Specifically, in the absence of nuisance parameters

Z = l′(0) and V = -l″(0)

where l(θ) denotes the log-likelihood of θ based on the data available, and l′ and l″ denote the first and second derivatives of l with respect to θ. Similar forms exist when a profile log-likelihood is used to overcome the problem of nuisance parameters; alternatively, the method can be applied to a conditional or marginal likelihood, or in the context of survival data to a partial likelihood. The approximate quadratic form of l,

l(θ) = l(0) + θZ - ½θ²V,

is central to the large sample theory. This methodology is a sequential form of the large sample likelihood ratio test of Rao10 and has been referred to as the Wu-test.11 It leads to sequential versions of many familiar tests, such as Pearson's χ² test for binary data, the Mann-Whitney test for ordered categorical data, the logrank test for survival data, and Armitage's trend test.12 Under conditions which are not too demanding, the score statistic l′(θ), where θ denotes the true parameter value, forms a martingale as new data are added at each interim analysis.13 This implies that the increments of the stochastic process formed by the score statistic are independent, and so, when θ is small, are the increments of Z. The Brownian motion properties of Z when plotted against V would appear to follow, both in the case without nuisance parameters and in the case with, although I know of no general formal proof in the literature. There are proofs of various special cases, including the logrank approach to survival data14 and a parametric survival model.15 The approach has roots in early work of Bartlett16 and aspects of it underlie more recent accounts of monitoring accumulating evidence.17,18 Minor variations can sometimes simplify the form of the test statistics or improve their accuracy. In particular, for ordinal data the full form of V is unwieldy, and an approximate version will usually suffice,19 and for grouped survival data with adjustment for covariates use of expected information rather than its observed form improves accuracy.20
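For a binary comparison on the log-odds scale, the score statistic and information reduce to simple closed forms. The sketch below uses one common large-sample form of V (an assumption; the counts are illustrative) and checks that Z²/V coincides with Pearson's χ² statistic for the 2x2 table:

```python
# Score statistic Z = l'(0) and information V = -l''(0) for a binary
# comparison on the log-odds ratio scale, with the nuisance parameter
# (the common success rate under H0) profiled out.
nE, SE = 50, 30   # patients and successes on E (illustrative counts)
nC, SC = 50, 20   # patients and successes on C

n = nE + nC
S = SE + SC       # total successes
F = n - S         # total failures

Z = SE - nE * S / n         # observed minus expected successes on E
V = nE * nC * S * F / n**3  # one common large-sample form of the information

# Z^2 / V reproduces Pearson's chi-squared for the 2x2 table
X2 = n * (SE * (nC - SC) - SC * (nE - SE))**2 / (nE * nC * S * F)
print(Z, V, Z**2 / V, X2)  # 5.0 6.25 4.0 4.0
```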

The second general approach* is to base each interim analysis on an estimate θ̂ of θ. Usually the maximum likelihood estimate is used. Information V is taken to be the inverse of the square of the standard error of θ̂. The statistic Z can then be taken to be θ̂V, so that its expectation has the required form. This approach is straightforward to implement because the necessary statistics are output directly from most standard analysis packages. When based on maximum likelihood estimates, the method is a sequential version of the large sample likelihood ratio test of Wald,21 which has been referred to as the We-test.11 Its use in the sequential context goes back to an original application to binary data.22

The justi"cation of the Brownian motion approximation for the sequential Wald test is madelargely by its asymptotic equivalence to the Rao test. When estimates not obtained by maximizing

* Since the &Burning issues' conference, Jennison and Turnbull have published an account of the use of this approachwhen covariates are to be adjusted for: Jennison, C and Turnbull, B. W. &Group sequential analyses incorporatingcovariate information', Journal of the American Statistical Association, 92, 1330}1341 (1997).

A UNIFIED THEORY FOR SEQUENTIAL CLINICAL TRIALS 2275

Copyright ( 1999 John Wiley & Sons, Ltd. Statist. Med. 18, 2271}2286 (1999)

the likelihood are used, the justi"cation is at best heuristic (for example, when the Kaplan}Meierestimate is used2). Calculation of the score test requires nuisance parameters to be estimatedunder the null hypothesis, whereas Wald's test uses unrestricted maximum likelihood estimates.These are more likely to fail to exist than under H

0. For example, consider a binary comparison.

As soon as at least one patient on each treatment has been observed, and at least one success andone failure recorded, Fisher's information will be positive and the plot of the score test can bestarted. However, for the Wald test to begin, at least one success and one failure are required oneach of the treatments. Monitoring would not even begin for a trial in which 50 per cent ofpatients on C were succeeding while all patients on E were failing, precisely the situation in whicha trial should be stopped.
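The existence problem can be seen numerically. In this sketch (counts hypothetical), the score statistic is computable and already points at harm from E, while the unrestricted maximum likelihood estimate of the log-odds ratio is minus infinity, so the Wald statistic cannot be formed:

```python
import math

nE, SE = 10, 0    # every patient on E fails
nC, SC = 10, 5    # half the patients on C succeed
n = nE + nC
S = SE + SC       # total successes
F = n - S         # total failures

# Score approach: defined, and negative (evidence against E)
Z = SE - nE * S / n
V = nE * nC * S * F / n**3

# Wald approach: the unrestricted MLE of the log-odds ratio is -infinity,
# so theta_hat (and hence Z = theta_hat * V) does not exist
try:
    theta_hat = math.log((SE * (nC - SC)) / (SC * (nE - SE)))
except (ValueError, ZeroDivisionError):
    theta_hat = None

print(Z, V, theta_hat)  # -2.5 0.9375 None
```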

4. STOPPING RULES

Denote by Z_i and V_i the values of Z and V calculated at the ith interim analysis. A stopping rule will involve a lower limit l_i and upper limit u_i for each i, where both l_i and u_i are functions of V_1, ..., V_i, the amounts of information available at this and previous interims. If Z_i ≤ l_i then the trial will stop; the conclusion is likely to be that E is relatively harmful, or perhaps just that E is no better than C. If Z_i ≥ u_i then the trial will stop; the conclusion is likely to be that E is better than C. If Z_i ∈ (l_i, u_i), then the trial will continue until the next interim analysis. Notice that above, conclusions are just indicated as being likely. The primary purpose of the stopping rule is to determine when sufficient data have been collected; the subsequent analysis will provide the conclusion to be drawn.

The designs of interest for clinical trials are closed, which means that stopping is certain provided that information continues to accrue. The closure of a trial can be achieved in one of two ways. First, the stopping limits may be constructed to converge, so that there is a value Vmax of V such that for the first interim analysis in which V_i is equal to or exceeds Vmax, l_i = u_i and the trial must stop. Second, a simple extra stopping rule can be introduced of the form: stop if V_i ≥ Vmax. These two cases will be referred to as having convergent and non-convergent boundaries, respectively. The designs shown in Figure 1(b) have convergent boundaries, while those in Figure 1(a) are non-convergent.
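A stopping rule of this form reduces to a simple loop over the interim analyses. The following sketch applies hypothetical limits l_i and u_i to hypothetical interim values of Z and V; all numbers are for illustration only:

```python
def monitor(zs, vs, lower, upper):
    """Apply the stopping rule: at interim i, stop if Z_i <= l_i or
    Z_i >= u_i; otherwise continue to the next interim analysis."""
    for i, (z, v, l, u) in enumerate(zip(zs, vs, lower, upper), start=1):
        if z <= l:
            return i, "stop: E looks no better than C (possibly harmful)"
        if z >= u:
            return i, "stop: E looks better than C"
    return len(zs), "no boundary crossed at the planned interims"

# hypothetical interim statistics and precomputed limits
zs = [1.0, 2.5, 6.0]
vs = [4.0, 8.0, 12.0]
lower = [-2.0, -1.0, 0.5]
upper = [6.5, 6.0, 5.5]
print(monitor(zs, vs, lower, upper))  # stops at the third interim, upper limit crossed
```

Note that the rule only decides *when* to stop; as the text stresses, the conclusion itself comes from the subsequent analysis.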

Also of interest are double tests, formed by combining two sequential procedures. The usual situation is symmetric; a procedure with stopping limits (l_i, u_i), i = 1, 2, ..., chosen so that |l_i| ≤ |u_i| for all i, being combined with its mirror image which has stopping limits (-u_i, -l_i), i = 1, 2, .... The first of these, denoted by T+, results in a claim that E is better than C if an upper limit is crossed, and that E is no better than C otherwise. The second is denoted by T- and concludes that E is worse than C if Z_i ≤ -u_i for some i, and that E is no worse otherwise. The double test runs until both T+ and T- have stopped: if T+ finds that E is better than C, then so does the double test; similarly, if T- finds that E is worse than C then so does the double test; otherwise no treatment difference is claimed. The restriction that |l_i| ≤ |u_i| ensures that the latter situation occurs when T+ has found E to be no better than C and T- has found E to be no worse than C. The properties of double tests can be found from those of the constituent 'single' procedures, and unless stated otherwise attention will be confined to single procedures in what follows. The designs shown in Figure 1(c) are double tests.

Once a sequence of stopping limits has been chosen, the properties of the resulting sequential design can be evaluated using recursive numerical integration.23 For example, the probability p(θ) that the sample path will exceed one of the upper stopping limits at or before the ith interim analysis is given by

p(θ) = Σ (k = 1 to i) P(V = V_k, Z ≥ u_k) = Σ (k = 1 to i) ∫ from u_k to ∞ of f_k(z_k, k) dz_k

where

f_k(z_k, k) = ∫ from l_{k-1} to u_{k-1} of (1/√I_k) φ((z_k - z_{k-1} - θI_k)/√I_k) f_{k-1}(z_{k-1}, k-1) dz_{k-1},  k = 2, ..., i,

f_1(z_1, 1) = (1/√V_1) φ((z_1 - θV_1)/√V_1), I_k = V_k - V_{k-1} for k = 2, ..., i, and φ denotes the standard normal density function.

Figure 1. Continuous boundaries for designs from PEST and EaSt

Usually crossing the upper boundary is taken to be significant evidence that E is better than C, and so for an interim analysis conducted when V_i is equal to Vmax the probability above should be equal to ½α when θ = 0, and to (1 - β) when θ = θR.
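The recursion for f_k lends itself to direct computation. The sketch below propagates the sub-density of Z over the continuation regions by trapezoidal quadrature; the boundaries, information levels and grid size are illustrative choices, not prescriptions from the paper:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapz(ys, xs):
    return sum(0.5 * (ys[j] + ys[j + 1]) * (xs[j + 1] - xs[j])
               for j in range(len(ys) - 1))

def upper_crossing_prob(theta, vs, lower, upper, m=801):
    # p(theta): probability of crossing an upper stopping limit at or
    # before the last interim in `vs`. `fs` holds the sub-density f_k of
    # Z_k on the continuation region (l_k, u_k), propagated as in the text.
    v1 = vs[0]
    p = 1.0 - norm_cdf((upper[0] - theta * v1) / math.sqrt(v1))
    xs = [lower[0] + (upper[0] - lower[0]) * j / (m - 1) for j in range(m)]
    fs = [phi((z - theta * v1) / math.sqrt(v1)) / math.sqrt(v1) for z in xs]
    for k in range(1, len(vs)):
        ik = vs[k] - vs[k - 1]      # information increment I_k
        sd = math.sqrt(ik)
        # P(no stop before look k, Z_k >= u_k)
        p += trapz([f * (1.0 - norm_cdf((upper[k] - z - theta * ik) / sd))
                    for z, f in zip(xs, fs)], xs)
        # propagate the sub-density onto the new continuation region
        new_xs = [lower[k] + (upper[k] - lower[k]) * j / (m - 1) for j in range(m)]
        new_fs = [trapz([f * phi((z_new - z - theta * ik) / sd) / sd
                         for z, f in zip(xs, fs)], xs)
                  for z_new in new_xs]
        xs, fs = new_xs, new_fs
    return p

# single-look sanity check: P(N(0, 4) >= 3.92), a 1.96-standard-deviation tail
print(upper_crossing_prob(0.0, [4.0], [-3.92], [3.92]))  # close to 0.025
```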

The stopping limits are chosen in order to satisfy the power requirement of the test, and subject to that, to achieve secondary design objectives. Considerations might be the minimization of expected sample size, or else ensuring that early stopping only occurs in the presence of overwhelming evidence of treatment difference. In general the sequences l_1, l_2, ... and u_1, u_2, ... cannot be specified in advance. They depend on the values of V at the current and previous interim analyses. Computation of the stopping limits l_i and u_i takes place at the time of the ith interim analysis. Although the properties of any sequence of stopping limits can be evaluated as shown above, no method exists that is both simple and accurate for reversing the process, and deriving the stopping limits from the required properties. It is the details of the derivation of the stopping limits which introduce much of the variety of sequential methodology. Most modern methods utilize one of two different approaches:

(i) An idealized continuous version of the monitoring process is envisaged, in which the plotted path becomes a continuous trace relating Z to V, and imagining that V increases smoothly. Continuous stopping boundaries l(V) and u(V) are constructed. Usually boundaries are chosen which have simple mathematical properties, and often some form of optimality is aimed for. Discrete versions are then constructed as approximations, in the hope that they will inherit some of the optimal characteristics.

(ii) An α-spending function is defined. This function gives the null probability of obtaining a significant positive result (that E is better than C) when or before the amount of information available reaches the value V = v. This is also a continuous formulation in which V increases smoothly. The symmetric constraint l(V) = -u(V) is usually imposed; if not, then a second form of spending function must be specified, which concerns the probability of stopping on the lower boundary when or before V = v, either under the null hypothesis, or under the alternative that θ = θR. From such functions the stopping limits for each interim analysis can be derived. In this context the value of V/Vmax is usually referred to as the information time.
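Two spending-function shapes often cited in this literature are the Lan-DeMets O'Brien-Fleming-type and Pocock-type functions. The sketch below (using Python's statistics.NormalDist for the normal quantile) shows how a cumulative spending function translates into the α spent incrementally at each look; the information times are illustrative:

```python
import math
from statistics import NormalDist

N = NormalDist()

def obf_spend(t, alpha=0.05):
    # O'Brien-Fleming-type spending (Lan & DeMets):
    # alpha*(t) = 2*(1 - Phi(z_{alpha/2} / sqrt(t)));
    # spends almost nothing early, nearly all of alpha near t = 1
    if t <= 0.0:
        return 0.0
    return 2.0 * (1.0 - N.cdf(N.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(min(t, 1.0))))

def pocock_spend(t, alpha=0.05):
    # Pocock-type spending: alpha * ln(1 + (e - 1) * t)
    return alpha * math.log(1.0 + (math.e - 1.0) * min(max(t, 0.0), 1.0))

# alpha spent incrementally at four equally spaced information times
times = [0.25, 0.5, 0.75, 1.0]
for spend in (obf_spend, pocock_spend):
    cum = [spend(t) for t in times]
    inc = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
    print(spend.__name__, [round(x, 5) for x in inc])
```

Both functions reach the full α = 0.05 at information time 1; the O'Brien-Fleming-type form spends far less at the first look, matching the "only stop early for overwhelming evidence" behaviour described below.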

These two approaches differ primarily in the way that the design is constructed. In the case of continuous monitoring, every pair of boundaries (l(V), u(V)) has associated spending functions, and every combination of spending functions has an associated pair of boundaries. The boundaries approach dates back to the sequential probability ratio test,24,25 and to the wide variety of designs evaluated by Anderson.26 Anderson studied straight line boundaries, including the triangular form also investigated by Lorden27 and by Hall.28 Double versions of the sequential probability ratio and triangular tests are given, respectively, by Sobel and Wald29 and by Whitehead and Brunier.30 The properties of straight line boundaries can be expressed in terms of closed form equations, most of which are given by Anderson,26 and derived again in a more concise manner by Hall.31 In particular, the associated spending functions can be derived. The attractions of the sequential probability ratio test and the triangular test are their minimization of the expected sample size under certain values of θ.32-34 For the triangular test this is an asymptotic property as α tends to zero, and although it is a precise result for the sequential probability ratio test, in practice this design is usually truncated so that optimality is again only valid asymptotically in the case of late truncation.

Discrete versions of straight line stopping boundaries can be found using the Christmas tree correction.9 Consider an interim analysis taking place when the information available is V_i. An obvious choice of stopping limits is to take the corresponding points on the continuous boundary: l_i = l(V_i) and u_i = u(V_i). However, these limits will achieve error probabilities smaller than specified, as the continuous sample path might wander outside the stopping boundaries between looks at the data, while coming back to lie within the continuation region when the interim analysis is performed. It is necessary to bring the stopping limits l_i and u_i closer together in compensation. The Christmas tree correction brings each of them in by the amount 0.583√(V_i - V_{i-1}) (taking V_0 = 0) and is based on a result of Siegmund.35 It is very accurate for the triangular test, although it can be less successful for other designs.36 A compromise is to use the Christmas tree correction in order to find a design which approximately satisfies the power requirement, and then to use recursive numerical integration to perform a more accurate final analysis. Otherwise a numerical search procedure will be necessary.
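The correction itself is a one-line adjustment. In this sketch the continuous boundaries and their constants are hypothetical stand-ins for a triangular design; each discrete limit is pulled inward by 0.583√(V_i - V_{i-1}):

```python
import math

def christmas_tree_limits(vs, l_cont, u_cont):
    """Discretize continuous boundaries l(V), u(V) by the Christmas tree
    correction: pull each limit inward by 0.583 * sqrt(V_i - V_{i-1})."""
    limits = []
    v_prev = 0.0
    for v in vs:
        shrink = 0.583 * math.sqrt(v - v_prev)
        limits.append((l_cont(v) + shrink, u_cont(v) - shrink))
        v_prev = v
    return limits

# hypothetical triangular-style boundaries with illustrative constants
a, c = 5.0, 0.25
u = lambda v: a + c * v          # upper boundary u(V)
l = lambda v: -a + 3.0 * c * v   # lower boundary l(V), converging on u(V)

for v, (li, ui) in zip([4.0, 8.0, 12.0],
                       christmas_tree_limits([4.0, 8.0, 12.0], l, u)):
    print(v, round(li, 3), round(ui, 3))
```

At V = 4 the continuous limits (-2, 6) become (-0.834, 4.834): the discrete rule stops slightly more readily, compensating for the looks it misses between interim analyses.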

The methods which underlie the program EaSt37 are defined in terms of continuous boundaries involving powers of V.38,39 These are curved, and no simple theory exists for their properties (although it is possible that the methods of Lai and Siegmund40 might be appropriate). Consequently, the spending functions cannot be derived; instead they are approximated using incomplete beta functions. Suitable designs have to be found using a search procedure based on recursive numerical integration.

In the literature on α-spending functions,41-43 stopping rules are defined in terms of spending functions which reflect their desired trial properties. For example, an α-spending function which is small at early looks, and then rises quickly towards the end, will lead to a procedure which will only stop early for overwhelming evidence of a treatment difference. In some of these papers, no power requirement is set, or else approximate fulfilment is achieved by equating Vmax to the amount of information required in an equivalent fixed sample trial. More accurate specification of power can be achieved by first choosing the proportion of information that will be available at each interim and then searching for the appropriate value of Vmax. In practice, the planned information proportions will not be achieved precisely, but the power properties are robust under a wide variety of inspection schedules.

The recursive numerical integration approach will evaluate the properties of any sequential procedure, however it has been defined. Thus it can be applied to designs formulated in terms of stochastic curtailment, or even to find the frequentist properties of Bayesian designs.

A concept related to sequential designs is that of a repeated confidence interval.44 A series of intervals of the form (θ_Li, θ_Ui), i = 1, 2, …, is called a 100(1 − γ) per cent repeated confidence interval if

P{(θ_Li, θ_Ui) ∋ θ for i = 1, 2, …} = 1 − γ.

The θ_Li and θ_Ui are test statistics calculated at the ith interim analysis. A suitable repeated confidence interval can be derived from any non-convergent sequential design in which the null


probabilities of crossing each boundary are ½γ. In this case, the limits

θ_Li = (Z_i − u_i)/V_i and θ_Ui = (Z_i − l_i)/V_i

can be used. Conversely, a repeated confidence interval can be transformed into a sequential design by requiring that the trial be stopped as soon as the interval (θ_Li, θ_Ui) fails to contain 0.

Repeated confidence intervals can be used without imposing a stopping rule, to indicate with growing precision the likely value of θ. The method is a discrete counterpart to continuous time confidence sequences.45
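Given the interim statistics and the stopping limits of a suitable design, the repeated confidence limits are simple arithmetic; a minimal sketch (function name hypothetical):

```python
def repeated_confidence_interval(z, v, lower, upper):
    """Repeated confidence limits at each interim look, from the score
    statistics Z_i, information V_i and stopping limits (l_i, u_i) of a
    non-convergent design whose null crossing probabilities are gamma/2
    per boundary: theta_Li = (Z_i - u_i)/V_i, theta_Ui = (Z_i - l_i)/V_i.
    """
    return [((zi - ui) / vi, (zi - li) / vi)
            for zi, vi, li, ui in zip(z, v, lower, upper)]
```

Stopping as soon as an interval excludes 0 recovers a sequential design, as described above.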

The continuous boundaries for a variety of useful designs are shown in Figure 1. Four pairs of designs are illustrated; those on the left being derived from the program PEST46 and those on the right from EaSt.37 The top two are symmetric designs featuring non-convergent boundaries chosen to avoid continuation of the trial in the presence of a major treatment difference. If the boundaries are not crossed, then the trial would continue to a maximum sample size increased relative to the fixed sample design of equal power to allow for early stopping. The second pair are asymmetric designs featuring convergent boundaries. Crossing the upper boundary is taken as significant evidence that E is better than C. It is possible to identify an early part of the lower boundary, which is reached with null probability ½α, and to consider crossing of this as significant evidence that E is worse than C. Thus a two-sided alternative is created, but with asymmetric power properties, in that whilst the probability that E is found to be significantly better than C when θ = θ_R is (1 − β), the probability that E is found to be significantly worse than C when θ = −θ_R will be far smaller than (1 − β). In PEST, this two-sided asymmetric approach is explicitly catered for, and it is indicated in the figure; the official formulation in EaSt is different. The third pair are double tests, combining the symmetry of pair (a) with the ability to stop early in the absence of a treatment effect possessed by pair (b). Such designs can be used in equivalence testing.47 The final pair are also asymmetric: that from PEST allows early stopping only when E appears to be worse than C and so provides safety monitoring; the EaSt design stops only when E appears to be better than C. It is interesting to note how similar the repertoires of these two packages are, despite their differing methodological backgrounds.

5. POST-TRIAL ANALYSIS

Once a sequential trial has terminated, an analysis must be performed. The interim analyses serve only to determine whether stopping should take place; they do not provide complete interpretations of the data. The final analysis seeks to express the degree of evidence that a treatment difference exists using a P-value, and to estimate its magnitude using point and interval estimates. In this paper, it will be assumed that the trial has stopped according to a formal stopping procedure; other possibilities are considered by Whitehead.48

Most analysis methods available are based on some ordering of the possible data sets which could be observed at the end of the trial, one potential data set being called more extreme than another if it supports the alternative hypothesis that E is better than C more strongly. The P-value function P(θ), defined as

P(θ) = P(observing data more extreme than that from the trial; θ),

is central to the final frequentist analysis. The P-values against the one- and two-sided alternatives are then P(0) and 2 min{P(0), 1 − P(0)}, respectively. A 100(1 − γ) per cent confidence


interval (θ_L, θ_U) can be found by solving P(θ_L) = ½γ and P(θ_U) = 1 − ½γ, and a median unbiased estimate θ_M of θ by solving P(θ_M) = ½.

When monitoring is continuous (in the idealized sense described in Section 4), each possible data set will have terminal values of Z and V on one of the boundaries, and an anti-clockwise ordering of data sets according to this final (V, Z) point appears to be appropriate. Hence, for any data set corresponding to the crossing of the upper boundary, more extreme data sets will be those leading to earlier crossing of the upper boundary. For any data set corresponding to the crossing of the lower boundary, later crossing of the lower boundary, or any crossing of the upper boundary or stopping with V = V_max, would be considered to be more extreme. The anti-clockwise ordering was introduced by Armitage49 and further developed by Siegmund.50

Fairbanks and Madsen51 introduced an ordering of terminal data sets which allows for the discrete form of monitoring used in practice, and this was further discussed by Tsiatis et al.52 For a data set in which the upper boundary is crossed, this ordering regards as more extreme those data sets leading to stopping at an earlier look, or to stopping at the same look with a larger terminal value of Z. Alternative orderings for the discrete case have been suggested.53–55 Some of these are not stochastic orderings, meaning that they result in P-value functions which are not strictly monotonic increasing. A consequence is that confidence regions are not necessarily intervals, but may be made up from two or more disjoint intervals with gaps in between. Another feature of such alternatives is that they might imply that early stopping is associated with only borderline significance, whereas late stopping leads to stronger conclusions. This is in contrast to the Fairbanks and Madsen ordering, for which early stopping is necessarily accompanied by very small P-values.
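For data sets crossing the upper boundary, the Fairbanks and Madsen comparison can be stated as a simple comparator; an illustrative helper (not code from the paper):

```python
def more_extreme(a, b):
    """Compare two terminal data sets that both crossed the upper
    boundary under the Fairbanks-Madsen (stage-wise) ordering. Each is
    (look, z): the look number at stopping and the terminal value of Z.
    Returns True if a is more extreme than b: a stopped at an earlier
    look, or at the same look with a larger terminal Z.
    """
    look_a, z_a = a
    look_b, z_b = b
    if look_a != look_b:
        return look_a < look_b
    return z_a > z_b
```

Because earlier stopping always ranks as more extreme, early stopping under this ordering is necessarily accompanied by small P-values.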

An analysis based on the Fairbanks and Madsen ordering depends on the final value of Z, and conditions on the values of V at the final and at all intermediate inspections of the data. Any two designs which share the stopping limits (l_i, u_i) for all inspections that were carried out will lead to the same analysis, regardless of what stopping limits might have been imposed had the trial continued. This property is intuitively appealing. In particular, it implies that if a trial stops at the first interim look, then the analysis is identical to a conventional fixed sample analysis. The P-value, confidence limits and median unbiased estimate can be calculated in the conventional way. The special analysis required for sequential clinical trials is, in effect, an allowance for previous interim looks, and in the case of stopping at the first interim, there are no previous looks to allow for. All other orderings suffer from the serious defect that the values of V at interim looks that would have been carried out had the trial not stopped have to be taken into consideration. As these potential interims were never carried out, the corresponding V-values are a matter of speculation; indeed, there is no guarantee that these subsequent looks would have taken place at all. It is true that the quantitative effect of different assumptions concerning the subsequent pattern of looks is very small, but the philosophical objection to such dependence remains.

A limitation of methods deduced from orderings is that they provide no guidance for the estimation of parameters other than θ. For example, after a clinical comparison of two streams of binary responses, estimation of the experimental and control success probabilities, p_E and p_C, might be required in addition to that of their log-odds ratio θ. In the case of a trial stopped early with a positive conclusion, not only will a conventional estimate of θ be too large, conventional estimates of p_E and p_C will be too far apart, and their associated confidence intervals will be too short. Woodroofe56 introduced a method for the calculation of confidence intervals based on an approximately normally distributed pivot, which can be used to obtain confidence intervals both for θ itself57 and for p_E and p_C.58 More general applications to other parametric functions appear


to be possible. For point estimation, Whitehead59 suggested an approximate method for removing the bias of a conventional maximum likelihood estimate of θ, which at the same time reduces its variance. Unfortunately, implementation of both this approach and that of Woodroofe requires the computation of expected values of test statistics at termination, which in turn requires knowledge of the complete schedule of interim inspections, including any which would have taken place had the trial not been stopped.

A method which does not require speculation on what might have happened has been suggested by Emerson,60 based on earlier work due to Ferebee.61 The underlying idea is simple. To the level of approximation at which Z is normally distributed with mean θV and variance V, Z_1/V_1 is an unbiased estimate of θ, where Z_1 and V_1 are the values of Z and V at the first interim analysis. Furthermore, the terminal values Z_T and V_T of Z and V are jointly sufficient for θ. The process of Rao–Blackwellization then guarantees that E(Z_1/V_1 | Z_T, V_T) will be unbiased for θ and have a smaller variance, and it is this point estimate which is recommended. Emerson and Kittelson62 suggest an efficient algorithm for evaluating the estimate, and Liu and Hall63 investigate its claims for optimality.
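To see why the construction starts from Z_1/V_1, the following simulation sketch contrasts the naive terminal estimate Z_T/V_T, which is biased after optional stopping, with the first-look estimate Z_1/V_1, which stays unbiased but is inefficient; the Rao–Blackwell step E(Z_1/V_1 | Z_T, V_T) combines the two virtues. All design constants here are illustrative, not from the paper.

```python
import math
import random

def simulate(theta=0.0, v1=1.0, v2=2.0, u1=1.5, n=200_000, seed=1):
    """Two-look one-sided design: stop at look 1 if Z_1 > u1, otherwise
    stop at look 2. Returns the average of the first-look estimate
    Z_1/V_1 and of the naive terminal estimate Z_T/V_T over n trials.
    """
    rng = random.Random(seed)
    sum_first, sum_mle = 0.0, 0.0
    for _ in range(n):
        z1 = rng.gauss(theta * v1, math.sqrt(v1))
        sum_first += z1 / v1                     # always unbiased for theta
        if z1 > u1:                              # early stop on upper limit
            zt, vt = z1, v1
        else:                                    # continue to the final look
            zt = z1 + rng.gauss(theta * (v2 - v1), math.sqrt(v2 - v1))
            vt = v2
        sum_mle += zt / vt                       # conventional MLE at stopping
    return sum_first / n, sum_mle / n
```

Under θ = 0 the first-look average sits near zero, while the terminal MLE is biased upwards (analytically by φ(u1)/2 ≈ 0.065 for these constants), illustrating the over-estimation after early stopping with a positive conclusion.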

6. ISSUES AND OPPORTUNITIES

The mathematical results of sequential analysis are clear; if certain stopping rules are followed rigorously, then certain risks of error will be achieved and certain inferences will be valid. Major issues involved in the practical implementation of sequential clinical trial designs are whether stopping rules should be followed rigorously, and if not, what degree of flexibility is appropriate.

Clinical trials occupy a special place in scientific research because the subjects under study are people, and in particular people who are particularly vulnerable. Consequently, the absolutes of scientific rigour have to be tempered with the ethical demands of respect and care for the patients. The activities of a Data and Safety Monitoring Board compromise the validity of any frequentist analysis, fixed-sample or sequential, because of their power to terminate a trial due to any undesirable treatment comparison which they notice. It is not possible, even conceptually, to allow for their behaviour in the analysis, because the whole point of human rather than automatic safety monitoring is to allow reactions to unexpected problems. In practice, most responsible safety monitoring has little effect on final analyses, and because the only available action is to stop due to a safety problem, which is likely to lead to the abandonment of the experimental treatment, the effect is usually in the conservative direction. Nevertheless, such considerations indicate that pure scientific rigour is not possible.

In a sequential study, the findings of formal interim analyses, including a plot of Z against V for the primary endpoint, will often be presented to the Data and Safety Monitoring Board for ratification. The Board will have the right to override the formal recommendation, and in deciding whether to do so will take into account secondary comparisons and possibly new evidence from other studies. Careful discussion of the formal rule and its consequences with the Board before the trial begins will reduce the likelihood of the Board departing from it, and thus will enhance the validity of the eventual analysis.

Many clinical trial designs are formulated under strong mathematical assumptions. For example, it might be supposed that the treatment difference θ is common to a number of strata. A linear relationship with some baseline prognostic covariate might be assumed. In survival analyses, the assumption of proportional hazards might be made, and for ordered categorical data, a proportional odds model might be adopted. When a sample size is fixed in advance, based


on such an assumption, then the first opportunity to check its validity comes when the trial is complete. If it is found to be invalid, then it might be realized that the sample size chosen was too small. A sequential design provides the opportunity to check model assumptions at each interim analysis, and to alter the basis of the design if the assumption is found to be inappropriate. Should this opportunity be taken? It further erodes the mathematical purity of the procedure. On the other hand, the alternative is to persist with a model in the face of evidence that it is wrong, and perhaps misleading. A certain amount of formal allowance for repeated model checking may be developed, but the existence of interim looks will always leave the dilemma of what to do if something untoward is found which is outside their formal remit, or whether to avoid the problem by not looking too closely in the first place. My own view is that the consequences of online model checking need to be investigated further through theory and simulation, but that in the meantime it should be performed at least in respect of the most crucial model assumptions.

A particular form of model checking can be made when patient responses are normally distributed, and the power specification has been made in terms of the absolute treatment difference δ = μ_E − μ_C. As discussed in Section 2, the most accurate procedure is to use the anticipated value of σ to recast the power requirement into terms of the standardized difference, θ = (μ_E − μ_C)/σ, and to design the study accordingly. Prior to the first interim analysis, the value of σ can be checked. To avoid unblinding the study, the EM algorithm can be used to estimate σ without identifying treatment allocations, as described in the context of sample size reviews by Gould and Shih.64 The reference value of θ can be recalculated using this new estimate, and a revised design considered. Having decided whether to use the revised design, the data can be unblinded (for the monitoring statistician) and the first interim analysis conducted.

Repeated model checking is just one example of making multiple judgements at each interim analysis. Other possibilities include comparing two treatments in respect of more than one endpoint, and comparing more than two treatments. The case of simultaneously monitoring endpoints for both safety and efficacy has been discussed in other papers,65,66 although further development for the more general bivariate and multivariate cases would still be worthwhile. Sequential χ² and F-tests for multiple treatments have been developed,67,68 but these lead to stopping as soon as it is established that not all treatments are equivalent, rather than going on to find which is the best. Most procedures adopted in practice are compromises which combine pairwise procedures, such as the three-treatment comparison described by Whitehead and Thomas.20

Sequential designs have now been applied to many areas of clinical research. The book edited by Peace69 and a special issue of Statistics in Medicine70 include accounts of a variety of sequential procedures. The triangular test has been applied to clinical trials in lung cancer,71,72 bone marrow transplantation,73 pneumonia in AIDS sufferers,74 stroke,75 renal cancer,76 and cardiovascular disease,77 among others. The restricted procedure has been applied to a trial in osteoarthritis20 and the open top design to a trial in amyotrophic lateral sclerosis.78 This level of implementation of the limited and simple methods which are already available should encourage the development of more sophisticated procedures to address the real problems which arise in application.

REFERENCES

1. Tarone, R. E. and Ware, J. 'On distribution-free tests for equality of survival distributions', Biometrika, 64, 156–160 (1977).


2. Sooriyarachchi, M. R. and Whitehead, J. 'A method for sequential analysis of survival data with nonproportional hazards', Biometrics, 54, 1072–1084 (1998).
3. Anscombe, F. J. 'Normal likelihood functions', Annals of the Institute of Statistical Mathematics, 16, 1–17 (1964).
4. Sprott, A. D. 'Normal likelihoods and their relation to large sample theory of estimation', Biometrika, 60, 457–465 (1973).
5. Whitehead, J. 'The use of the sequential probability ratio test for monitoring the percentage germination of accessions in seed banks', Biometrics, 37, 129–136 (1981).
6. Stallard, N. and Todd, S. 'Exact sequential methods for a single sample of binary responses', Statistics in Medicine, submitted.
7. Facey, K. M. 'A sequential procedure for a phase II efficacy trial in hypercholesterolemia', Controlled Clinical Trials, 13, 122–133 (1992).
8. Cox, D. R. 'Some remarks on the role in statistics of graphical methods', Applied Statistics, 27, 4–9 (1978).
9. Whitehead, J. The Design and Analysis of Sequential Clinical Trials, revised 2nd edn, Wiley, Chichester, 1997.
10. Rao, C. R. 'Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation', Proceedings of the Cambridge Philosophical Society, 44, 50–57 (1948).
11. Cox, D. R. and Hinkley, D. V. Theoretical Statistics, Chapman and Hall, London, 1974.
12. Armitage, P. 'Tests for linear trends in proportions and frequencies', Biometrics, 11, 375–386 (1955).
13. Silvey, S. D. 'A note on maximum-likelihood in the case of dependent random variables', Journal of the Royal Statistical Society, Series B, 23, 444–452 (1961).
14. Sellke, T. and Siegmund, D. 'Sequential analysis of the proportional hazards model', Biometrika, 70, 315–326 (1983).
15. Tsiatis, A. A., Boucher, H. and Kim, K. 'Sequential methods for parametric survival models', Biometrika, 82, 165–173 (1995).
16. Bartlett, M. S. 'The large sample theory of sequential tests', Proceedings of the Cambridge Philosophical Society, 42, 239–244 (1946).
17. Lan, K. K. G. and Wittes, J. 'The B-value: a tool for monitoring data', Biometrics, 44, 579–585 (1988).
18. Lan, K. K. G. and Zucker, D. M. 'Sequential monitoring of clinical trials: the role of information and Brownian motion', Statistics in Medicine, 12, 753–765 (1993).
19. Whitehead, J. 'Sample size calculations for ordered categorical data', Statistics in Medicine, 12, 2257–2271 (1993).
20. Whitehead, J. and Thomas, P. 'A sequential trial of pain killers in arthritis: issues of multiple comparisons with control and of interval-censored data', Journal of Biopharmaceutical Statistics, 7, 333–353 (1997).
21. Wald, A. 'Tests of statistical hypotheses concerning several parameters when the number of observations is large', Transactions of the American Mathematical Society, 54, 426–482 (1943).
22. Cox, D. R. 'Large sample sequential tests for composite hypotheses', Sankhya, 25, 5–12 (1963).
23. Armitage, P., McPherson, C. K. and Rowe, B. C. 'Repeated significance tests on accumulating data', Journal of the Royal Statistical Society, Series A, 132, 235–244 (1969).
24. Wald, A. Sequential Analysis, Wiley, New York, 1947.
25. Barnard, G. A. 'Sequential tests in industrial statistics', Journal of the Royal Statistical Society, Supplement, 8, 1–26 (1946).
26. Anderson, T. W. 'A modification of the sequential probability ratio test to reduce sample size', Annals of Mathematical Statistics, 31, 165–197 (1960).
27. Lorden, G. '2-SPRT's and the modified Kiefer–Weiss problem of minimising an expected sample size', Annals of Statistics, 4, 281–291 (1976).
28. Hall, W. 'Sequential minimum probability ratio tests', in Chakravarti, I. M. (ed.), Asymptotic Theory of Statistical Tests and Estimation, Academic Press, New York, 1980, pp. 325–350.
29. Sobel, M. and Wald, A. 'A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution', Annals of Mathematical Statistics, 20, 502–522 (1949).
30. Whitehead, J. and Brunier, H. 'The double triangular test: a sequential test for the two-sided alternative with early stopping under the null hypothesis', Sequential Analysis, 9, 117–136 (1990).
31. Hall, W. J. 'The distribution of Brownian motion on linear stopping boundaries', Sequential Analysis, 16, 345–352 (1996); addendum in 17, 123–124 (1997).


32. Wald, A. and Wolfowitz, J. 'Optimum character of the sequential probability ratio test', Annals of Mathematical Statistics, 19, 326–339 (1948).
33. Lai, T. L. 'Optimal stopping and sequential tests which minimise the maximum expected sample size', Annals of Statistics, 1, 659–673 (1973).
34. Hall, W. J. 'Sequential triangular tests for Brownian motion with minimax stopping times', presented at the 1996 International Biometric Conference, Amsterdam, 1996.
35. Siegmund, D. 'Corrected diffusion approximations in certain random walk problems', Advances in Applied Probability, 11, 701–719 (1979).
36. Stallard, N. and Facey, K. M. 'Comparison of the spending function method and the Christmas tree correction for group sequential trials', Journal of Biopharmaceutical Statistics, 6, 361–373 (1996).
37. CyTeL Software Corporation. EaSt: A Software Package for the Design and Interim Monitoring of Group Sequential Clinical Trials, Cytel, Cambridge MA, 1992.
38. Wang, S. K. and Tsiatis, A. A. 'Approximately optimal one-parameter boundaries for group sequential trials', Biometrics, 43, 193–199 (1987).
39. Pampallona, S. and Tsiatis, A. A. 'Group sequential designs for one-sided and two-sided hypothesis testing with provision for early stopping in favour of the null hypothesis', Journal of Statistical Planning and Inference, 42, 19–35 (1994).
40. Lai, T. L. and Siegmund, D. 'A nonlinear renewal theory with applications to sequential analysis I', Annals of Statistics, 5, 946–954 (1977).
41. Lan, K. K. G. and DeMets, D. L. 'Discrete sequential boundaries for clinical trials', Biometrika, 70, 659–663 (1983).
42. Kim, K. and DeMets, D. L. 'Design and analysis of group sequential tests based on the type I error spending rate function', Biometrika, 74, 149–154 (1987).
43. Hwang, I. K., Shih, W. J. and DeCani, J. S. 'Group sequential designs using a family of type I error probability spending functions', Statistics in Medicine, 9, 1439–1445 (1990).
44. Jennison, C. and Turnbull, B. W. 'Interim analysis: the repeated confidence interval approach', Journal of the Royal Statistical Society, Series B, 51, 305–361 (1989).
45. Robbins, H. 'Statistical methods related to the law of the iterated logarithm', Annals of Mathematical Statistics, 41, 1397–1409 (1970).
46. Brunier, H. and Whitehead, J. PEST 3.0 Operating Manual, Reading University, 1993.
47. Whitehead, J. 'Sequential designs for equivalence studies', Statistics in Medicine, 15, 2703–2715 (1996).
48. Whitehead, J. 'Overrunning and underrunning in sequential clinical trials', Controlled Clinical Trials, 13, 106–121 (1992).
49. Armitage, P. 'Restricted sequential procedures', Biometrika, 44, 9–26 (1957).
50. Siegmund, D. 'Estimation following sequential tests', Biometrika, 65, 341–349 (1978).
51. Fairbanks, K. and Madsen, R. 'P values for testing using a repeated significance test design', Biometrika, 69, 69–74 (1982).
52. Tsiatis, A. A., Rosner, G. L. and Mehta, C. R. 'Exact confidence intervals following a group sequential test', Biometrics, 40, 797–803 (1984).
53. Rosner, G. L. and Tsiatis, A. A. 'Exact confidence limits following group sequential test', Biometrika, 75, 723–729 (1988).
54. Chang, M. N. 'Confidence intervals for a normal mean following group sequential test', Biometrics, 45, 247–254 (1989).
55. Emerson, S. S. and Fleming, T. R. 'Parameter estimation following group sequential hypothesis testing', Biometrika, 77, 875–892 (1990).
56. Woodroofe, M. 'Estimation after sequential testing: a simple approach for a truncated sequential probability ratio test', Biometrika, 79, 347–353 (1992).
57. Todd, S., Whitehead, J. and Facey, K. M. 'Point and interval estimation following a sequential test', Biometrika, 83, 453–461 (1996).
58. Todd, S. and Whitehead, J. 'Confidence interval calculation for a sequential clinical trial of binary responses', Biometrika, 84, 737–743 (1997).
59. Whitehead, J. 'On the bias of maximum likelihood estimation following a sequential test', Biometrika, 73, 573–581 (1986).
60. Emerson, S. S. 'Computation of the uniform minimum variance unbiased estimator of a normal mean following a group sequential trial', Computers and Biomedical Research, 26, 68–73 (1993).


61. Ferebee, B. 'An unbiased estimator for the drift of a stopped Wiener process', Journal of Applied Probability, 20, 94–102 (1983).
62. Emerson, S. S. and Kittelson, J. M. 'A computationally simpler algorithm for the UMVUE of a normal mean following a group sequential trial', Biometrics, 53, 365–369 (1997).
63. Liu, A. and Hall, W. J. 'Unbiased estimation following a group sequential test', Biometrika, 86, 71–78 (1999).
64. Gould, A. L. and Shih, W. J. 'Sample size re-estimation without unblinding for normally distributed data with unknown variance', Communications in Statistics – Theory and Methods, 21, 2833–2853 (1992).
65. Jennison, C. and Turnbull, B. W. 'Group sequential tests for bivariate response: interim analysis of clinical trials with both efficacy and safety endpoints', Biometrics, 49, 741–752 (1993).
66. Cook, R. J. and Farewell, V. T. 'Guidelines for monitoring efficacy and toxicity responses in clinical trials', Biometrics, 50, 1146–1152 (1994).
67. Siegmund, D. 'Sequential χ² and F tests and the related confidence intervals', Biometrika, 67, 387–402 (1980).
68. Jennison, C. and Turnbull, B. W. 'Exact calculations for sequential t, χ² and F tests', Biometrika, 78, 133–141 (1991).
69. Peace, K. E. (ed.). Biopharmaceutical Sequential Statistical Applications, Dekker, New York, 1992.
70. Souhami, R. L. and Whitehead, J. (eds). 'Workshop on early stopping rules in cancer clinical trials', Statistics in Medicine, 13, 1289–1500 (1994).
71. Jones, D. R., Newman, C. E. and Whitehead, J. 'The design of a sequential clinical trial for comparison of two lung cancer treatments', Statistics in Medicine, 1, 73–82 (1982).
72. Whitehead, J., Jones, D. R. and Ellis, S. H. 'The analysis of a sequential clinical trial for the comparison of two lung cancer treatments', Statistics in Medicine, 2, 183–190 (1983).
73. Storb, R., Deeg, J., Whitehead, J., Appelbaum, F., Beatty, P., Bensinger, W. et al. 'Methotrexate and cyclosporine compared with cyclosporine alone for prophylaxis of acute graft versus host disease after marrow transplantation for leukemia', New England Journal of Medicine, 314, 729–735 (1986).
74. Montaner, J. S. G., Lawson, L. M., Belzberg, A., Schechter, M. T. and Ruedy, J. 'Corticosteroids prevent early deterioration in patients with moderately severe Pneumocystis carinii pneumonia and the acquired immunodeficiency syndrome (AIDS)', Annals of Internal Medicine, 113, 14–20 (1990).
75. Whitehead, J. 'Application of sequential methods to a phase III clinical trial in stroke', Drug Information Journal, 27, 733–740 (1993).
76. Fayers, P. M., Cook, P. A., Machin, D., Donaldson, N., Whitehead, J., Ritchie, A., Oliver, R. T. D. and Yuen, P. 'On the development of the Medical Research Council trial of α-interferon in metastatic renal carcinoma', Statistics in Medicine, 13, 2249–2260 (1994).
77. Moss, A. J., Hall, W. J., Cannom, D. S., Daubert, J. P., Higgins, S. L., Klein, H. et al. 'Improved survival with implanted defibrillator in patients with coronary disease at high risk for ventricular arrhythmia', New England Journal of Medicine, 335, 1933–1940 (1996).
78. Lacomblez, L., Bensimon, G., Leigh, P. N., Guillet, P. and Meininger, V. 'Dose-ranging study of riluzole in amyotrophic lateral sclerosis', Lancet, 347, 1425–1431 (1996).
