STATISTICS IN MEDICINE
Statist. Med. 2007; 26:237–252
Published online 1 December 2006 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.2763

Paper Celebrating the 25th Anniversary of Statistics in Medicine

A 25-year review of sequential methodology in clinical studies

Susan Todd∗,†

Medical and Pharmaceutical Statistics Research Unit, The University of Reading, PO Box 240, Earley Gate, Reading RG6 6FN, U.K.

SUMMARY

This paper explores the theoretical developments and subsequent uptake of sequential methodology in clinical studies in the 25 years since Statistics in Medicine was launched. The review examines the contributions which have been made to all four phases into which clinical trials are traditionally classified and highlights major statistical advancements, together with assessing application of the techniques. The vast majority of work has been in the setting of phase III clinical trials and so emphasis will be placed here. Finally, comments are given indicating how the subject area may develop in the future. Copyright © 2006 John Wiley & Sons, Ltd.

KEY WORDS: stopping rules; interim analyses; data monitoring; clinical development phases; clinical trials

1. INTRODUCTION

The Encyclopaedic Companion to Medical Statistics [1] defines sequential analysis as ‘a method allowing hypothesis tests to be conducted on a number of occasions as data accumulate through the course of a trial. A trial monitored in this way is usually called a sequential trial.’ In clinical studies there is great interest in sequential procedures for ethical, economic and administrative reasons and the methodology has a role to play in all phases of clinical research. The most compelling reason for monitoring trial data is that, ethically, it is desirable to terminate or modify a trial when evidence has emerged concerning the particular hypothesis of interest. In a sequential procedure the sample size is not fixed in advance of the trial. Instead, such studies implement stopping rules.

Armitage, in his book ‘Sequential Medical Trials’ [2] and second edition [3], first widely publicised the use of sequential testing in the field of medicine, although Bross [4], amongst others, had advocated such methods a little earlier. Armitage argued that ethical considerations demand a trial

∗Correspondence to: Susan Todd, Medical and Pharmaceutical Statistics Research Unit, The University of Reading, PO Box 240, Earley Gate, Reading RG6 6FN, U.K.

†E-mail: [email protected]

Received 19 September 2006. Accepted 11 October 2006.

be stopped as soon as there is clear evidence that one of the treatments is to be preferred and that this leads to the use of a sequential trial. He described a number of techniques and their application to trials comparing two alternative treatments. This was in the setting of what we would now term a phase III clinical trial. Much of his work stemmed from that of Wald [5] and Barnard [6]. Early sequential designs were expressed in terms of the preference for one or other of the treatments after each pair of patients, one individual assigned to each therapy. Monitoring was conducted after every pair. These early ideas provided the scope for further research. In particular, Pocock [7] and O’Brien and Fleming [8] developed designs with fewer inspections than Armitage’s original tests, with groups of patients responding between each look. These became known as group-sequential trials. The methodology was developed within the framework of two specific requirements. To facilitate the design, the maximum number of interim looks at the data had to be pre-specified. In addition, the interim looks, or interim analyses, had to be conducted at precisely equally spaced intervals. During the course of a trial, various factors may influence the number, frequency and timing of interim analyses. If the frequency or timing of the analyses were to be substantially altered from the planned schedule then the designs described above would become inappropriate.

This brief summary highlights the state of development of methodology in this area 25 years ago when Statistics in Medicine was launched. To complement this, there was a small stream of papers describing application of sequential methods in actual clinical trials. Kilpatrick and Oldham [9] give one of the earliest accounts, although there were others (for example [10]). In this paper the growth of the topic and its subsequent application over the last 25 years from the early 1980s to the present day is reviewed. Statistics in Medicine has published articles on the subject throughout the 25 years of the journal. In the early years, the emphasis was perhaps more on the application of techniques described elsewhere (for example [11–14]), but more recently the focus has shifted, with authors submitting more methodological manuscripts to the journal (for example [15–18]).

For this review, we will describe advances in sequential methodology which have taken place in each of the four phases commonly used to classify clinical trials. It is phase III where sequential procedures are most well developed and now well established and so the major emphasis will be placed here. In each section the aim is to outline key developments and to bring together references relevant to a study of the particular topic, pointing the reader to where further details can be found. The paper will conclude with some indication as to how the subject area may develop in the future.

2. PHASE I CLINICAL TRIALS

Phase I studies are usually the first studies in man. Approaches to testing are most naturally sequential, with data being reviewed after every patient or small group of patients to decide whether to carry on with the study and, if so, which dose of the experimental treatment to use next. However, in this setting, the use of formal statistical procedures has been very limited. Indeed, pre-Statistics in Medicine it was only mathematical statisticians, publishing in the more theoretical statistics journals, who were considering the problem (for example [19, 20]). The development of formal statistical methods has mostly been in trials for oncology where ethical considerations are paramount; however, more recent work has looked at the application of techniques to healthy volunteer studies.

Probably the best known formal procedure developed over the last 25 years is the continual reassessment method (CRM) introduced in a key paper in 1990 by O’Quigley et al. [21]. The paper

envisages a study in which patients are treated one at a time and subsequently evaluated, in order to detect the TD20, that is the dose associated with a probability of toxicity of 0.2. At the beginning of the trial the first patient is treated with the dose for which the probability of toxicity is thought to be closest to 0.2. Having observed whether a toxicity has occurred, the most likely value of the probability of toxicity for each dose is recalculated using a Bayesian approach. The second patient then receives the dose for which the probability of toxicity is now closest to 0.2 and the procedure continues until several successive patients receive the same dose safely. Since the appearance of this paper, the procedure has been the focus of much interest and debate. Authors such as Faries [22], Korn et al. [23] and Goodman et al. [24], some of them publishing in Statistics in Medicine, have suggested modifications and enhancements. O’Quigley himself provides an up-to-date review of the state of play [25].
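
To make the update step concrete, the following is a minimal sketch of a CRM-style calculation, assuming a one-parameter power model in which the probability of toxicity at dose d is skeleton[d]^exp(a), with a discrete prior for a. The skeleton, prior, target of 0.2 and use of the posterior mean are illustrative choices, not the specific recommendations of O’Quigley et al. [21].

```python
import numpy as np

# Illustrative skeleton: prior guesses of the toxicity probability at each dose.
skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])
target = 0.20

# Discrete prior for the model parameter a in p_tox(d) = skeleton[d] ** exp(a).
a_grid = np.linspace(-3, 3, 601)
prior = np.exp(-0.5 * a_grid**2 / 1.34**2)   # normal prior, a commonly used choice
prior /= prior.sum()

def posterior(a_grid, prior, doses, toxicities):
    """Posterior over a given observed (dose index, toxicity 0/1) pairs."""
    like = np.ones_like(a_grid)
    for d, y in zip(doses, toxicities):
        p = skeleton[d] ** np.exp(a_grid)
        like *= p**y * (1 - p)**(1 - y)
    post = prior * like
    return post / post.sum()

def next_dose(a_grid, post):
    """Dose whose posterior-mean toxicity probability is closest to the target."""
    p_hat = np.array([(skeleton[d] ** np.exp(a_grid) * post).sum()
                      for d in range(len(skeleton))])
    return int(np.argmin(np.abs(p_hat - target))), p_hat

# Example: three patients treated at dose index 2, one toxicity observed.
post = posterior(a_grid, prior, doses=[2, 2, 2], toxicities=[0, 0, 1])
dose, p_hat = next_dose(a_grid, post)
print("estimated toxicity probabilities:", np.round(p_hat, 3))
print("recommended next dose index:", dose)
```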

A second body of work considers the use of Bayesian decision theory as an alternative to the CRM. Gatsonis and Greenhouse [26], Whitehead and Brunier [27] and Babb et al. [28] have all published on the topic in Statistics in Medicine. In order to use this sequential approach a simple model needs to be assumed which describes the way in which the risk of toxicity increases with dose. Prior opinion concerning the unknown parameters of this model must be elicited and appropriately represented. Procedures are then put in place to determine how the prior information is modified in the light of accumulating data to decide which dose to administer to each subject in the next cohort. Two recently published multi-authored books provide excellent up-to-date reviews of this area [29, 30]. Unfortunately, these latter techniques have yet to be widely implemented in practice. Perhaps this is where the challenge now lies and comments relating to this will be outlined in the conclusions to this paper.

3. PHASE II CLINICAL TRIALS

A wide variety of investigative studies fall under the broad heading of phase II clinical trials. In the simplest case a phase II study may be a small single-arm trial to assess basic efficacy and safety of the experimental treatment. This form of design is found almost exclusively in oncology studies. Alternatively, the term may also describe a larger comparative study comparing a control treatment to an experimental treatment, possibly at several doses, incorporating an element of treatment selection. As with phase I, most of the development of sequential methods in phase II has been in the last 25 years and again, has been motivated by trials in serious diseases. Because such studies are generally small, there is limited scope for including many looks at the accumulating data. Work has concentrated on the development of sequential methodology with just one or two interim analyses.

In single-arm phase II studies all patients receive the experimental treatment. Two broad classes of designs have been proposed, those based on frequentist approaches with the specification of traditional error rates and those based on Bayesian designs. Within the frequentist setting Schoenfeld [31] suggested implementing a conventional hypothesis testing formulation in phase II trials, but advocated reducing the level of rigour required compared to a phase III design by moving the error rates away from the traditional figures of 5 per cent for type I error and 10 per cent for type II error. Since then, many authors have taken up this idea and proposed sequential phase II trials within this framework. Because of the basic nature of a single-arm trial, the data can be summarized simply. This means that even if the computations to derive appropriate stopping rules to preserve error rates may be somewhat complex, the rules themselves can be easily described. The trial

usually begins with testing a specific number of patients. If fewer than a pre-specified number are successfully treated then the trial stops. Otherwise, a further group of patients is treated. Again, the outcome from this group may be to abandon further development of the treatment, or to continue. Usually only one or two looks at the accumulating data will be conducted. At the last look, if an appropriate number of patients are successfully treated then development continues with a phase III study. The best known and most used design was developed by Simon [32]. Work in this area also includes that by Fleming [33], Chen [34] and Conaway and Petroni [35].
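
The operating characteristics of a stopping rule of this kind can be evaluated directly from binomial probabilities. The sketch below assumes a Simon-type two-stage structure with illustrative design parameters; they are not claimed to be the optimal design for any particular pair of response rates.

```python
from scipy.stats import binom

def two_stage_properties(n1, r1, n, r, p):
    """Probability that a Simon-type two-stage rule declares the treatment promising
    when the true response probability is p: continue past stage 1 only if more than
    r1 responses are seen in n1 patients, and declare success only if more than r
    responses are seen in all n patients."""
    prob = 0.0
    for x1 in range(r1 + 1, n1 + 1):                     # responses needed to continue
        prob += binom.pmf(x1, n1, p) * binom.sf(r - x1, n - n1, p)
    return prob

# Illustrative design parameters (not an optimal design for any specific p0, p1).
n1, r1, n, r = 19, 4, 54, 15
print("type I error (p0 = 0.20):", round(two_stage_properties(n1, r1, n, r, 0.20), 3))
print("power        (p1 = 0.40):", round(two_stage_properties(n1, r1, n, r, 0.40), 3))
```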

The alternative school of thought makes use of Bayesian theory. Two of the key references which have been widely cited since their publication are those of Thall and Simon [36] and Thall et al. [37]. Both papers describe methods for the construction of stopping rules for phase II trials using the Bayesian approach. The idea is that the trial continues with regular interim analyses until belief about the success of the treatment is sufficiently precise to be able to say that further development of the therapy is worthwhile. In many cases, it is possible to describe the stopping rule in a similar way to the frequentist approach described above.

Also under the umbrella of Bayesian analysis come sequential designs which utilize Bayesian decision theory. Such designs base decisions made during the trial on the possible consequences of the choices made. The methodology enables belief about a treatment’s success to be coupled with an analysis of the costs and benefits associated with different actions to allow a decision concerning the progress of the trial to be made. A number of authors including Hilden et al. [38], Cressie and Beale [39], Heitjan [40], Stallard [41, 42] and Stallard et al. [43] have all suggested designs based on this approach.

In comparative phase II studies patients are randomized between one or more experimental treatments and control. In the two-arm case of one experimental treatment and a control, the methods outlined above for single-arm studies can be extended. In the case of a selection study where patients are assigned to a number of treatment groups and a control, it is usually desirable to drop less effective or less safe formulations as the trial progresses. Construction of sequential procedures in this case is more complicated. Thall et al. [44] and Schaid et al. [45] both suggest two-stage designs whereby treatment selection occurs at the end of the first stage with a comparative analysis of the remaining treatment(s) and control at the end of the second. The Bayesian method proposed by Thall and Simon [36] for the single-arm study has been extended to selection studies by Thall and Estey [46] and Thall and Sung [47], both published in Statistics in Medicine.

In phase II studies, probably the most widely used stopping rules are those based on frequentist methodology, such as the Simon design [32]. As with the phase I studies presented in Section 2, some of the sequential Bayesian methodology developed for phase II studies has found only limited application. A strong collaboration between clinicians and statisticians is needed to increase the likelihood of implementation.

4. PHASE III CLINICAL TRIALS

The greatest impact of sequential methodology has been felt in phase III studies. Here, the sample sizes of the trials are likely to be larger than for phase I and phase II studies. Consequently, the potential exists to include more looks at the accumulating data and the methodology becomes more complex. Furthermore, strong regulatory input means that many of the sequential procedures developed for use in phase III are much more formal than those used at earlier phases. The impact of this can be seen throughout the conduct of such studies and will be commented upon below.

The conduct of a phase III sequential clinical trial involves various stages: specifically design, monitoring and final analysis. For this review, developments over the last 25 years under each of these broad headings will be presented.

4.1. Design

In the early 1980s the methodological literature concerning sequential clinical trials was very much focussed on the issue of how to design such studies appropriately. Modern sequential tests can be grouped into two very broad classes which have developed from contrasting backgrounds. The first class of tests stems from Wald’s [5] sequential probability ratio test (SPRT) and will be collectively defined as the boundaries approach. The second class of tests was facilitated by work developed by Armitage et al. [48], with the key development coming in 1983 when Lan and DeMets proposed the α-spending function [49]. Other approaches are possible, although they are less widely implemented. These will also be discussed.

To enable a brief discussion of these various approaches and the related literature, some basic background is now presented. Much of the advance in methodology for the first half of the last 25 years has concentrated upon drawing inferences in the case of comparing a single experimental treatment with a control treatment, often termed univariate inference. We will begin by describing the methods in this setting and discuss multivariate sequential methods in a separate section.

Suppose that inference is to be drawn about the value of some parameter, θ, which represents the treatment difference between the experimental and the control groups in a two-arm parallel group clinical trial. We assume that it is desired to test the null hypothesis H0: θ = 0. In sequential analyses when repeated tests of this hypothesis are conducted, the specialist methodology ensures that an overall type I error rate of α for the trial as a whole is maintained. This is a key requirement of the regulatory authorities in the phase III setting.

The sequential approach first requires the definition of suitable test statistics for testing H0. There are numerous alternative test statistics which can be used. Early work [2] in the area prescribed designs whereby traditional test statistics such as the t-statistic or the chi-squared statistic were monitored after each patient’s response was obtained. The progress of the trial was tracked in terms of the sample size. However, problems can arise as a result of nuisance parameters and consequently the power and type I error rate of a procedure will be affected. To maintain the correct properties of a test, authors including Whitehead [50] and Jennison and Turnbull [51], in their respective books on the subject of sequential analysis, now suggest monitoring trials in terms of information and not sample size. The appropriate translation from information to required sample size can then be implemented for different response types. Plotting directly against information helps guarantee power at a particular reference value of θ, regardless of the value of nuisance parameters.

Let us assume the following. Suppose that a maximum of K interim analyses are to be conducted. Choose for test statistics the score statistic and Fisher’s information [50], although alternatives are available [51]. At the kth interim analysis denote these by Sk and Ik, respectively, for k = 1, ..., K. Scharfstein et al. [52] show that, for a wide range of problems, conditional on the values I1, ..., IK, (S1, ..., SK) is asymptotically multivariate normally distributed with S1 ∼ N(θI1, I1) and increments Sj − Si ∼ N(θ(Ij − Ii), (Ij − Ii)) independent of Si, 1 ≤ i ≤ j ≤ K. In the sequential trial the observed values of Sk may be plotted against the values of Ik. At the kth interim analysis in a sequential phase III trial, the observed value Sk is then compared with critical values lk and uk. If Sk ≥ uk the trial would be stopped and the null hypothesis rejected in favour of the one-sided alternative hypothesis H1+: θ > 0. If Sk ≤ lk, the trial is stopped and, depending on the way in which the test is constructed, either it is concluded that the experimental treatment is no better than control, so H0 is accepted, or the null hypothesis is rejected in favour of the one-sided alternative hypothesis H1−: θ < 0. If Sk is between lk and uk, then the trial continues. The problem which has occupied so many authors in the field of sequential analysis is the calculation of the critical values l1, ..., lK and u1, ..., uK to give an overall type I error rate of α. The various different approaches to answering this question are now addressed.
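
Before turning to those approaches, note that the joint distribution above is easily simulated, which is convenient for checking the properties of any proposed set of critical values. The sketch below generates a score-statistic path from independent normal increments and applies a pair of boundaries; the information levels and critical values used are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_path(theta, info):
    """Simulate (S_1, ..., S_K) at information levels I_1 < ... < I_K using
    independent normal increments S_k - S_{k-1} ~ N(theta * dI_k, dI_k)."""
    dI = np.diff(np.concatenate(([0.0], info)))
    return np.cumsum(rng.normal(theta * dI, np.sqrt(dI)))

def monitor(S, lower, upper):
    """Return (look, decision) for the first boundary crossing, if any."""
    for k, (s, l, u) in enumerate(zip(S, lower, upper), start=1):
        if s >= u:
            return k, "reject H0 in favour of theta > 0"
        if s <= l:
            return k, "stop below the lower boundary"
    return None, "reached the final look without crossing"

# Illustrative values: five equally spaced looks and symmetric boundaries.
info = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
upper = 2.3 * np.sqrt(info)        # hypothetical critical values on the S scale
lower = -upper

S = simulate_path(theta=0.3, info=info)
print("path:", np.round(S, 2))
print(monitor(S, lower, upper))
```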

4.1.1. The α-spending functions approach. Armitage et al. in their key paper in 1969 [48] developed a recursive numerical integration method for calculation of the overall type I error rate for a sequential trial under the assumption that the test statistics S1, ..., SK follow an asymptotic normal distribution when critical values l1, ..., lK and u1, ..., uK are specified for the stopping rule. The method enables the effect of conducting interim analyses without adjusting for multiple testing to be demonstrated and allows the construction of critical values to maintain an overall type I error rate of α. This methodology was used by both Pocock [7] and O’Brien and Fleming [8]. Their results were tabulated to allow easy implementation without the need for additional computation. These methods, particularly the O’Brien and Fleming design, remain in use in clinical trials; however, a far more flexible design approach is provided by the α-spending function method. This advance can be regarded as one of the milestones in this discipline of the last quarter century. The earliest work of this nature can be traced back to Slud and Wei [53]. These authors present their method in terms of the number of interims to be conducted, rather than information, requiring that a maximum number of looks be pre-specified. In their approach, the overall significance level is portioned into probabilities α1, ..., αK such that α1 + ... + αK = α. The probabilities αi are selected before the trial starts. As the trial progresses, the symmetric stopping limits li and ui = −li are calculated to satisfy P(l1 < S1 < u1, l2 < S2 < u2, ..., Si ≤ li or Si ≥ ui; θ = 0) = αi. Thus, the value αi is the probability of stopping at the ith inspection and rejecting H0 when it is true and is referred to as the type I error spent at inspection i. Lan and DeMets [49] take the idea further and remove the problem of pre-specifying the number of inspections. With their method, again, the total type I error rate of α is considered to be spent through the course of the trial. However now, the rate at which it is spent is controlled by a specified function, known as the α-spending function, α∗(t). The method introduces flexibility in the choice of the shape of the stopping boundaries and also, in contrast to the tests of Pocock [7], O’Brien and Fleming [8] and Slud and Wei [53], allows construction of a test that maintains the type I error rate when inspections deviate from their planned course.
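
As an illustration of the spending idea, the sketch below calibrates one-sided critical values on the standardized scale from an O’Brien–Fleming-type spending function. It uses Monte Carlo simulation in place of the recursive numerical integration normally used for this calculation, so the resulting boundary values are approximate; the information fractions are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def obf_spending(t, alpha=0.025):
    """O'Brien-Fleming-type spending function (one-sided level alpha)."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1.0 - alpha / 2.0) / np.sqrt(t)))

def boundaries_by_simulation(t, alpha=0.025, nsim=1_000_000):
    """Calibrate one-sided critical values u_1..u_K on the standardized (Z) scale so
    that the cumulative crossing probability under H0 equals the spent error alpha*(t_k).
    The earliest looks spend very little error, so they need many simulations."""
    t = np.asarray(t)
    dt = np.diff(np.concatenate(([0.0], t)))
    S = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(nsim, len(t))), axis=1)
    Z = S / np.sqrt(t)                        # standardized statistics under H0
    spend = np.diff(np.concatenate(([0.0], obf_spending(t, alpha))))
    active = np.ones(nsim, dtype=bool)
    u = np.empty(len(t))
    for k in range(len(t)):
        # choose u_k so that the proportion of *all* paths stopping here is spend[k]
        frac = spend[k] * nsim / active.sum()
        u[k] = np.quantile(Z[active, k], 1.0 - frac)
        active &= Z[:, k] < u[k]
    return u

t = [0.25, 0.5, 0.75, 1.0]                    # information fractions of four looks
print(np.round(boundaries_by_simulation(t), 2))
```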

Lan and DeMets [49] present α-spending functions which result in designs that are similar to Pocock’s and O’Brien and Fleming’s tests. Kim and DeMets [54] suggest a family of α-spending functions, examples of which correspond closely to designs constructed under the boundaries approach described below. Pampallona et al. [55] introduce the concept of (1 − β) spending functions which can be used to construct asymmetric designs, for use when the interest in both treatments is not equal such as when comparing an active treatment with placebo. Investigators are likely to have less interest in finding out whether the active treatment is significantly worse than placebo. With the same objective in mind, Stallard and Facey [56] describe (1 − β) spending functions.

4.1.2. The boundaries approach. The boundaries approach is based on modelling (S1, ..., SK) as points on a Brownian motion with drift θ observed at times I1, ..., IK and has led to implementation of the abstract concept of continuous monitoring. It is assumed that the value of S is observed

at all times rather than at the discrete times I1, ..., IK and a plot of S against I then forms a continuous sample path. The path is compared with continuous boundaries, which may be expressed as functions of I. Many of the theoretical developments in sequential analysis have been based on consideration of this problem. A consequence of this formulation is that, since the sample path is continuous, the trial stops exactly on a boundary, whereas for a discretely monitored trial, there is some overshoot of the boundary when the trial stops.

The original design of this type was the SPRT proposed by Wald [5], which is an extension to the classical approach to hypothesis testing developed in the 1920s and 1930s by Neyman and Pearson. Unfortunately, Wald’s original test has two main weaknesses when applied to clinical trials. First, it is a test of two simple hypotheses. In clinical trials, the null hypothesis of no treatment difference usually has a two-sided alternative hypothesis of some treatment difference. Second, the SPRT is an open design, with no upper bound placed on the sample size of the trial. The boundaries approach evolved as a series of modifications which rectify one or both of these problems. The triangular test is one modification first suggested by Anderson [57], investigated further by Lai [58] and subsequently described by Whitehead et al. [59]. This is the most commonly used of all the boundaries approach designs; see, for example, Whitehead [60]. Other designs of the same class are described in detail by Whitehead [50].

The critical values obtained using the boundaries approach maintain the overall type I error rate for a continuously monitored test. In practice, monitoring is necessarily discrete, since even if an interim analysis is conducted after observation of each patient, information will increase in small steps. This means that if the critical values from the boundaries approach are used, then the type I error rate will be less than the desired level, α. Whitehead [50] has proposed a correction to modify the continuous boundaries to allow for the discretely monitored sample path. This correction, known as the ‘Christmas tree correction’, brings in the critical values by an amount equal to the expected overshoot of the discrete sample path.
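
The sketch below illustrates the form of such an adjustment, assuming straight-line boundaries of the triangular type and the commonly quoted correction of 0.583√(I_k − I_{k−1}); the intercept and slope used are placeholders rather than values from a real design.

```python
import numpy as np

def christmas_tree(upper, lower, info):
    """Adjust continuous-design boundaries for discrete monitoring: at each look the
    boundaries are brought in by 0.583 * sqrt(I_k - I_{k-1})."""
    dI = np.diff(np.concatenate(([0.0], info)))
    shrink = 0.583 * np.sqrt(dI)
    return upper - shrink, lower + shrink

# Placeholder straight-line (triangular-type) continuous boundaries on the S scale.
info = np.array([12.5, 25.0, 37.5, 50.0])
a, c = 8.0, 0.16                       # illustrative intercept and slope only
upper = a + c * info
lower = -a + 3.0 * c * info
u_adj, l_adj = christmas_tree(upper, lower, info)
# Near the apex of the triangle the adjusted boundaries meet or cross, which simply
# forces a decision at that look.
print("adjusted upper:", np.round(u_adj, 2))
print("adjusted lower:", np.round(l_adj, 2))
```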

4.1.3. Other approaches. So far, the discussion of methodology for designs has been restricted to the mainstream statistical procedures that are most widely implemented in practice. There are a number of other approaches which have been developed over the last 25 years and these are now highlighted.

Sequential methodology following the Bayesian framework has been developed by some authors for use in phase III clinical trials. As with the Bayesian approaches proposed for phase I and phase II trials, the concept is the same. Belief about the treatment effect is expressed before the trial begins, based on pre-existing data and clinical judgement. As the trial progresses, that belief is refined by combining the prior opinion with the observed data. Stopping rules are then developed based on terminating the trial when belief is sufficiently persuasive. There are several references describing this technique. These are all summarized in the book by Spiegelhalter et al. [61] which describes the methodology in detail. Related to this is an approach known as the predictive power approach, which allows specification of a rule to allow stopping for futility only, based on Bayesian calculations [62].

In 1982 Lan et al. [63] and Halperin et al. [64] proposed the idea of stochastic curtailment. The idea here is to stop a trial as soon as its outcome is determined with high probability. The approach makes use of the conditional power function, with a study being more likely to be abandoned if the conditional power is poor. Applications of stochastic curtailment have been described by Anderson [65], Halperin et al. [66] and Hunsberger et al. [67] amongst others.
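
A conditional power calculation of the kind used in stochastic curtailment can be sketched as follows under the Brownian motion model, assuming a simple fixed-sample test at the end of the trial and ignoring any further interim boundaries; the interim data, drift values and futility remark are illustrative.

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(S_k, I_k, I_max, theta, alpha=0.025):
    """P(final score statistic exceeds its fixed-sample critical value | data so far),
    under the Brownian motion model with drift theta, ignoring intermediate looks."""
    crit = norm.ppf(1.0 - alpha) * sqrt(I_max)      # reject at the end if S >= crit
    mean = S_k + theta * (I_max - I_k)
    sd = sqrt(I_max - I_k)
    return 1.0 - norm.cdf((crit - mean) / sd)

# Conditional power under the design value of theta and under the current estimate.
S_k, I_k, I_max = 4.0, 20.0, 50.0
print("under design theta = 0.3:", round(conditional_power(S_k, I_k, I_max, 0.3), 3))
print("under current estimate  :", round(conditional_power(S_k, I_k, I_max, S_k / I_k), 3))
# A stochastic-curtailment rule might consider abandoning the trial if both are small.
```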

The final approach highlighted in this section is the repeated confidence interval approach. For this method a sequence of confidence intervals is calculated, one at each inspection of the data.

The intervals are constructed such that the simultaneous coverage probability is maintained at some level, say 1 − α. The stopping rule is then of the form ‘stop the test when the current repeated confidence interval excludes 0’. The methodology is described in detail in a series of papers by Jennison and Turnbull [68–70]. An application of repeated confidence intervals to equivalence studies is described by Durrleman and Simon [71].
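
Given symmetric critical values u_k on the score scale that already have simultaneous coverage (for example, values calibrated as in the spending-function sketch above), a repeated confidence interval at look k is simply the set of values of θ not rejected by the group sequential test, namely S_k/I_k ± u_k/I_k. A minimal sketch, with illustrative numbers, is given below.

```python
import numpy as np

def repeated_ci(S, info, u):
    """Repeated confidence intervals S_k/I_k +/- u_k/I_k at each look, where the u_k
    are symmetric critical values on the score scale with simultaneous coverage."""
    theta_hat = S / info
    half = u / info
    return theta_hat - half, theta_hat + half

# Illustrative interim data and critical values.
info = np.array([10.0, 20.0, 30.0])
S = np.array([3.1, 5.8, 9.2])
u = np.array([2.8, 2.5, 2.2]) * np.sqrt(info)   # hypothetical standardized values * sqrt(I)
lo, hi = repeated_ci(S, info, u)
for k, (a, b) in enumerate(zip(lo, hi), start=1):
    print(f"look {k}: RCI = ({a:.3f}, {b:.3f})")
```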

The vast literature concerning sequential designs which has been written over the last 25 years, only some of which it is possible to mention in this paper, has prompted authors to try to unify the various approaches into general classes of designs. Kittelson and Emerson [72] propose a family of designs which unifies previous approaches and allows continuous movement amongst the various classes of designs. Whitehead [73] points out the relationships between the various approaches and draws out the essential underlying features which are important when designing a sequential trial in practice. Despite these efforts, agreement on a single approach has not been reached and authors still maintain preferences for one approach or another. However, it is now appreciated that the differences are largely in terms of the details of calculation and not the underlying concepts.

4.1.4. Adaptive designs. An alternative methodology, which is somewhat different from the sequential approaches described above, is the group of procedures that has become known collectively as adaptive designs. This work has received much attention in recent years (for example [74]). The technique is essentially based on the assumption of multivariate normality for S1, ..., SK and was first described by Bauer and Kohne [75]. These authors focus on a two-stage design and assume that the data from each stage are independent of those from the other stage. The methodology can be extended to trials with greater numbers of stages. The basic idea is as follows. Suppose that a standard hypothesis test of H0: θ = 0 is conducted based on the data obtained from each of the two stages, leading to two p-values, p1 and p2. Using Fisher’s combination method, Bauer and Kohne show that, under the null hypothesis, −2 log(p1 p2) follows a chi-square distribution on 4 degrees of freedom. This allows the data from the two stages to be combined in a single test. The only assumption made is the independence of data from the two stages. This means that the approach has the ability to allow great flexibility in the design and analysis of trials without inflating the type I error rate. The adaptations can be based on unblinded data collected so far in a trial, as well as external information. In addition, the adaptation rules need not be specified in advance. The most common change which is advocated is modification of the sample size of the second stage based on the predicted power of the trial at the end of the first stage [76]. However, the possibilities go far beyond this to include dropping or adding treatment arms, changing the primary endpoint, changing the patient population and even changing objectives (for example, switching from non-inferiority to superiority). Several authors have recently discussed the topic in detail and proposed related schemes [77–81].
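
The combination step itself is straightforward to code. The sketch below applies Fisher’s combination rule to two stage-wise p-values; it omits the early stopping bounds that a full Bauer–Kohne design would also specify, and the p-values shown are illustrative.

```python
from math import log
from scipy.stats import chi2

def fisher_combination_reject(p1, p2, alpha=0.025):
    """Combine independent stage-wise p-values: reject H0 if -2*log(p1*p2) exceeds
    the chi-square critical value on 4 degrees of freedom."""
    statistic = -2.0 * log(p1 * p2)
    return statistic, statistic > chi2.ppf(1.0 - alpha, df=4)

# Stage 1 p-value from the first cohort; the sample size of stage 2 may then be changed
# without affecting the validity of the combination test.
p1 = 0.10
p2 = 0.04          # p-value from the (possibly re-sized) second stage
print(fisher_combination_reject(p1, p2))
```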

It has been noted that the adaptive design approach makes use of a test statistic which is not a sufficient statistic for the treatment difference. This leads to a lack of power for the test, so that, if the flexibility of the adaptive design is not utilized, a sequential test can be found that is as powerful and has smaller expected sample size [82]. Nevertheless, the enhanced flexibility is extremely attractive and adaptive designs are likely to be the continued focus of more research.

4.2. Monitoring

The actual monitoring of trial data is the second step in the conduct of a sequential clinical trial. Although this process is present in all clinical development phases when a sequential approach is

implemented, it is most formalized in the phase III setting. Recent years have seen the appointment of Data and Safety Monitoring Boards (DSMBs) or Independent Data Monitoring Committees (IDMCs) for increasing numbers of phase III trials. These were originally constituted only in the setting of serious life-threatening diseases. Now, many large scale phase III trials in a wide variety of therapeutic areas will have a DSMB. Such a board may exist to monitor safety data only when trials are not planned to be sequential. If a stopping rule is to be implemented, a DSMB will look at both safety and efficacy data. Throughout its history Statistics in Medicine has been a forum for discussion on issues of monitoring clinical trials, and the format and structure of DSMBs. Proceedings of two workshops were published in special issues of the journal. The first workshop was on ‘Practical Issues in Data Monitoring of Clinical Trials’ held in 1992 and sponsored by the US National Institutes of Health [83]. The second was on early stopping rules in cancer clinical trials held at Cambridge University in 1993 [84]. As well as these individual volumes devoted entirely to the subject, a number of authors have published on the topic in the journal. Whitehead [85] discusses issues relating to being a statistician on a DSMB, whilst Facey and Lewis [86] discuss, in general terms, the management of interim analyses in drug development. An excellent book on DSMBs has been recently published by Ellenberg et al. [87].

One topic which deserves particular mention under the heading of monitoring is that of sample size reviews. These are interim looks at the data that can lead to resizing the clinical trial, usually increasing the sample size, but cannot lead to stopping it. Typically, only one sample size review would be conducted during the course of a trial. The concept of sample size reviews was originally introduced in the early 1990s, in the context of fixed sample designs, by Wittes and Brittain [88], Gould [89, 90] and Gould and Shih [91]. The use of similar reviews in the context of a sequential trial has been proposed by Gould and Shih [92] and Whitehead et al. [93]. It is worth noting that when information-based monitoring is implemented as part of a sequential trial, sample size reviews fit naturally within this framework and can be conducted easily.
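
As a sketch of such a review for a two-arm comparison of normal means, the required sample size can be recomputed from an interim estimate of the nuisance parameter (here the standard deviation) while holding the information target fixed; the design values below are illustrative and the formula is the usual fixed-sample approximation.

```python
from math import ceil
from scipy.stats import norm

def information_target(theta_R, alpha=0.025, beta=0.10):
    """Fixed-sample information needed to detect theta_R with one-sided level alpha
    and power 1 - beta."""
    return ((norm.ppf(1 - alpha) + norm.ppf(1 - beta)) / theta_R) ** 2

def n_per_arm(sigma, theta_R, alpha=0.025, beta=0.10):
    """For a normal response compared between two arms, information with n patients
    per arm is approximately n / (2 * sigma^2), so invert the information target."""
    return ceil(2 * sigma ** 2 * information_target(theta_R, alpha, beta))

theta_R = 2.0                  # clinically relevant difference in means (illustrative)
print("planned (assumed sigma = 5.0):", n_per_arm(5.0, theta_R))
print("revised (interim sigma = 6.4):", n_per_arm(6.4, theta_R))
```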

4.3. Analysis

Once authors had tackled the question of appropriately designing a sequential phase III study, attention turned to the question of analysis. The interim analyses conducted during the monitoring step determine only whether the trial should be stopped; they do not provide a complete interpretation of the data. On completion of a sequential trial, a final analysis will be performed. The use of a stopping rule means that standard analysis methods are no longer valid. To see why, suppose that a trial stops when the test statistic exceeds the upper stopping limit. This leads to the conclusion that the experimental treatment is better than the control treatment. The trial has stopped precisely because of the large observed value of the random test statistic and a standard estimate based on that observed value, for example, the maximum likelihood estimate, will, on average, overestimate the true value of the treatment difference. Likewise, a confidence interval calculated using traditional methods will, on average, be too narrow, that is, too precise, and the p-value will, on average, be too small, that is, it will overstate the evidence against the null hypothesis. After a sequential trial, the meaning and interpretation of data summaries such as significance levels, point estimates and confidence intervals remain as for fixed sample size trials, but methodology for their calculation needs to be redefined.
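
This overestimation is easy to demonstrate by simulation. The sketch below runs many trials under the Brownian motion model with a single, illustrative upper boundary and compares the average of the naive estimate S/I at termination with the true value of θ.

```python
import numpy as np

rng = np.random.default_rng(3)

def run_trial(theta, info, upper):
    """Return (naive estimate S/I at termination, look at which the trial stopped)."""
    dI = np.diff(np.concatenate(([0.0], info)))
    S = np.cumsum(rng.normal(theta * dI, np.sqrt(dI)))
    for k, (s, u) in enumerate(zip(S, upper)):
        if s >= u:                     # early stopping for benefit only, for simplicity
            return s / info[k], k
    return S[-1] / info[-1], len(info) - 1

theta_true = 0.20
info = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
upper = 2.3 * np.sqrt(info)            # hypothetical critical values

results = np.array([run_trial(theta_true, info, upper) for _ in range(20_000)])
estimates, looks = results[:, 0], results[:, 1]
early = looks < len(info) - 1
print("mean naive estimate, all trials    :", round(estimates.mean(), 3))
print("mean naive estimate, stopped early :", round(estimates[early].mean(), 3))
print("true theta                         :", theta_true)
```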

A concept central to the appropriate analysis of a sequential trial is that of an ‘ordering’. This considers how likely it would have been to observe stronger evidence of treatment benefit in a trial with the same sequence of interim analyses. The first orderings defined in the sequential setting

were proposed by Armitage [94] and Siegmund [95]. These made use of the continuous monitoring framework. Over the last 25 years many more authors have proposed orderings, this time for the more realistic case of discrete monitoring. These include Fairbanks and Madsen [96], Tsiatis et al. [97], Rosner and Tsiatis [98], Chang [99] and Emerson and Fleming [100]. An orderings analysis enables the construction of a p-value function and from this it is possible to obtain median unbiased estimates of the treatment effect and associated confidence intervals. Alternatively, it is possible to implement expectation-based methods to obtain adjusted estimates and confidence intervals. This approach has been considered by Whitehead [101], Woodroofe [102] and Todd et al. [103].

5. PHASE IV CLINICAL TRIALS

The term phase IV clinical trial is usually reserved for postmarketing surveillance of treatments after they have received regulatory approval. Once more, this is likely to be a sequential procedure with data accumulating on adverse events that occur following the widespread distribution of a treatment. On reviewing the literature this is an area which has received very little attention from statisticians developing formal sequential methods. Clearly monitoring in this setting is where sequential methodology could have an important role to play in the future.

Another form of investigation, which makes use of data on completed phase III trials, is that of meta-analysis. Of particular relevance to this article is the recent development of methodology for cumulative meta-analysis. This technique involves performing an updated meta-analysis every time a new trial is added to a series of similar trials. By definition this is a sequential procedure, bringing with it all the issues of multiple testing, and so sequential methodology is required. Pogue and Yusuf [104] and Lan et al. [105] both address this problem, proposing adaptations of classical sequential boundaries for use in this setting.
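
The cumulative part of such an analysis is sketched below for a fixed-effect (inverse-variance) meta-analysis with illustrative trial results; in a sequential cumulative meta-analysis the resulting z statistics would be compared against monitoring boundaries of the kind discussed in Section 4 rather than against a single fixed critical value.

```python
import numpy as np

def cumulative_z(effects, variances):
    """Inverse-variance fixed-effect estimate and z statistic after each new trial."""
    w = 1.0 / np.asarray(variances)
    cum_w = np.cumsum(w)
    cum_wy = np.cumsum(w * np.asarray(effects))
    est = cum_wy / cum_w
    return est, est * np.sqrt(cum_w)

# Illustrative trial-level log hazard ratios and their variances, in publication order.
effects = [-0.35, -0.10, -0.28, -0.22]
variances = [0.09, 0.06, 0.04, 0.03]
est, z = cumulative_z(effects, variances)
for k, (e, zz) in enumerate(zip(est, z), start=1):
    print(f"after trial {k}: pooled estimate {e:+.3f}, z = {zz:+.2f}")
```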

6. SOFTWARE AND APPLICATION

The design, monitoring and analysis of sequential tests based on any approach can be time-consuming and mathematically complex. In the last 25 years, this has prompted authors to take advantage of increasing computing power and develop suitable software. The computer package PEST4 [106] has been developed to make implementation of the boundaries approach both easier and quicker. The package evolved from earlier FORTRAN programs, which evaluated some of the more complicated aspects of the methodology. The software package EaSt 4 [107] produced by Cytel Software Corporation has been developed to allow the design and analysis of sequential trials based on the α-spending function methodology. Reboussin et al. [108] have developed a program specifically for calculating boundaries using the Lan and DeMets approach, which is available with their paper. Kittelson and Emerson’s [72] unified framework is the basis of the S-Plus module S+SeqTrial [109]. Finally, a recent addition to the available software is ADDPLAN [110] for the implementation of methodology developed under the specific heading of adaptive designs.

Early examples of the implementation of sequential procedures were in small scale trials, since inspections were frequent. The availability of easy-to-use software has meant that stopping rules are now simple to evaluate and apply. Consequently, their implementation has become more widespread. Peace [111] presents an entire volume of preclinical and clinical real-world applications of sequential procedures to drug research and development. Furthermore, individual manuscripts

reporting clinical trials which have been designed and analysed sequentially are appearing in greater numbers. Areas of application include anticancer [112], antiviral [113], cardiovascular [114], degenerative diseases [115] and gastrointestinal [116], amongst many others.

7. MULTIVARIATE ANALYSES

As computing power improved, authors turned their attention to more complex sequential problems. Of specific interest is the development of methodology for multivariate sequential analyses. Again, most research has occurred in the setting of phase III designs, although there are exceptions (for example [35]). In a multivariate sequential analysis the parameter of interest θ will be vector-valued. There are a number of scenarios in which such data can be envisaged. Perhaps the earliest work considered estimation of secondary parameters at the end of a trial in which a primary parameter has been monitored sequentially [117]. Alternatively interest could lie in considering repeated observations of the same parameter over time. Tang et al. [118] and Wei et al. [119] approach the monitoring of multiple responses of this type through the reduction to a single composite outcome. More recently the question of how to approach the multiple treatment problem in the sequential setting has been considered. Proschan et al. [120] and Liu [121] follow the route of repeated application of a global test procedure, while Vincent et al. [122] consider the case of comparing two experimental treatments with a control in a bivariate framework. Using this same bivariate approach Jennison and Turnbull [123] and Cook and Farewell [124] both consider the simultaneous monitoring of efficacy and safety responses. Recent work that can also be described under the heading of multivariate analyses is research into combining phases II and III into a single, seamless trial. Several authors have considered this question, taking a variety of different approaches [125–128].

Within the setting of multivariate analyses, the importance of defining a suitable power specification has been identified. In particular, it has become apparent that there is no well-defined and universally accepted methodology for even the fixed sample size case. The power specification might be marginal, relating to some or all of the parameters of interest separately. Alternatively, a global, experiment-wise error rate might be set. Stopping rules for multivariate trials might be specified in terms of error spending functions or in terms of boundaries. Multivariate recursive numerical integration (a generalization of the univariate approach of Armitage et al. [48]) might be implemented to compute specific critical boundaries in relatively simple cases. Alternatively, in cases where scenarios become more complex it may be that simulation is the more efficient tool for calculating stopping limits. Indeed, this was the approach taken by Proschan et al. [120] in the multiple treatment problem. Although it is relatively early in their development, these methods have the potential to address some of the key problems encountered in monitoring many of today’s clinical trials.
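
As a sketch of the simulation route, and not of the specific procedure of Proschan et al. [120], the code below calibrates a single critical value for repeatedly comparing the best of several experimental arms with a shared control, so that the overall type I error under the global null hypothesis is controlled; the number of arms and the information fractions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def calibrate_critical_value(m, t, alpha=0.05, nsim=100_000):
    """Find by simulation a common critical value c such that repeatedly testing the
    best of m experimental arms against a shared control at information fractions t
    has overall type I error alpha under the global null hypothesis."""
    t = np.asarray(t)
    dt = np.diff(np.concatenate(([0.0], t)))
    # per-arm cumulative 'score' processes; arm 0 is the control
    B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(nsim, m + 1, len(t))), axis=2)
    # standardized comparison of each experimental arm with control at each look;
    # sharing the control induces the usual 0.5 correlation between comparisons
    Z = (B[:, 1:, :] - B[:, :1, :]) / np.sqrt(2.0 * t)
    max_over_trial = Z.max(axis=(1, 2))       # largest statistic seen in the whole trial
    return np.quantile(max_over_trial, 1.0 - alpha)

t = [0.33, 0.67, 1.0]
print("3 arms vs control, 3 looks:", round(calibrate_critical_value(3, t), 2))
```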

8. DISCUSSION

The continuing development of sequential methodology carries with it the potential to change the design and analysis methods currently employed in many controlled clinical trials. Developments in the area during the past 25 years have provided more user-friendly avenues of handling the statistical problems arising from the analysis of such trials. Consequently, sequential methods now

receive a more prominent position in the design and conduct of medical trials than when they were first developed, both in the area of public health research and within the pharmaceutical industry. This can be borne out by the increasing number of journal articles on the subject, both in Statistics in Medicine and elsewhere.

Inevitably, many problems still need to be addressed, both in terms of further theoretical developments needed to tackle the ever increasing complexities inherent in modern clinical studies and consideration of issues involved in practical application of methodology. Areas where there is the largest potential for advancements to be made are those where methodology has been most recently developed, for example, in the fields of adaptive designs and multivariate analyses. It is also hoped that the theoretical advances in phases of clinical research other than phase III will find wider application. The take-up of such methods is likely to depend on the development and promotion of suitable software. Some progress has been made on this front, with several authors releasing code or producing commercial packages. An example is in the setting of phase I trials, where the concluding chapter of [30] presents details of available software. In summing up, although much has been achieved in the last 25 years, there still remains more to do.

REFERENCES

1. Everitt BS, Palmer CR (eds). Encyclopaedic Companion to Medical Statistics. Hodder Arnold: London, 2005.
2. Armitage P. Sequential Medical Trials. Blackwell: Oxford, 1960.
3. Armitage P. Sequential Medical Trials (2nd edn). Blackwell: Oxford, 1975.
4. Bross I. Sequential medical plans. Biometrics 1952; 8:188–205.
5. Wald A. Sequential Analysis. Wiley: New York, 1947.
6. Barnard GA. Sequential tests in industrial statistics. Journal of the Royal Statistical Society 1946; 8(Suppl.):1–26.
7. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika 1977; 64:191–199.
8. O’Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35:549–556.
9. Kilpatrick GS, Oldham PD. Calcium chloride and adrenaline as bronchial dilators compared by sequential analysis. British Medical Journal 1954; ii:1388–1391.
10. Snell ES, Armitage P. Clinical comparison of diamorphine and pholcodine as cough suppressants, by a new method of sequential analysis. Lancet 1957; 1:860–862.
11. Jones DR, Newman CE, Whitehead J. The design of a sequential clinical trial for comparison of two lung cancer treatments. Statistics in Medicine 1982; 1:73–82.
12. Rosner GL, Tsiatis AA. The impact that group sequential tests would have made on ECOG clinical trials. Statistics in Medicine 1989; 8:505–516.
13. Bellisant E, Benichou J, Chastang C. Application of the triangular test to phase II cancer clinical trials. Statistics in Medicine 1990; 9:907–917.
14. Pawitan Y, Hallstrom A. Statistical interim monitoring of the cardiac arrhythmia suppression trial. Statistics in Medicine 1990; 9:1081–1090.
15. Coad DS, Rosenberger WF. A comparison of the randomised play-the-winner rule and the triangular test for clinical trials with binary responses. Statistics in Medicine 1999; 18:761–769.
16. Kieser M, Friede T. Re-calculating the sample size in internal pilot study designs with control of the type I error rate. Statistics in Medicine 2000; 19:901–911.
17. Stallard N, Todd S. Exact sequential tests for single samples of discrete responses using spending functions. Statistics in Medicine 2000; 19:3051–3064.
18. Jennison C, Turnbull BW. Mid-course sample size modification in clinical trials based on the observed treatment effect. Statistics in Medicine 2003; 22:971–993.
19. Eichhorn BH, Zacks S. Sequential search of an optimal dosage. Journal of the American Statistical Association 1973; 68:594–598.
20. Eichhorn BH, Zacks S. Bayes sequential search of optimal dosage: linear regression with parameters unknown. Communications in Statistics—Theory and Methods 1981; A10:931–953.
21. O’Quigley J, Pepe M, Fisher L. Continual reassessment method: a practical design for phase I clinical trials in cancer. Biometrics 1990; 46:33–48.

22. Faries D. Practical modifications of the continual reassessment method for phase I cancer trials. Journal of Biopharmaceutical Statistics 1994; 4:147–164.
23. Korn EL, Midthune D, Chen TT, Rubintein LV, Christian MC, Simon RM. A comparison of two phase I designs. Statistics in Medicine 1994; 13:1799–1806.
24. Goodman SN, Zahurak ML, Piantadosi S. Some practical improvements in the continual reassessment method for phase I studies. Statistics in Medicine 1995; 14:1149–1161.
25. O’Quigley J. Dose finding studies using continual reassessment method. In Handbook of Statistics in Oncology, Crowley J (ed.). Dekker: New York, 2001.
26. Gatsonis C, Greenhouse JB. Bayesian methods for phase I clinical trials. Statistics in Medicine 1992; 11:1377–1389.
27. Whitehead J, Brunier H. Bayesian decision procedures for dose determining experiments. Statistics in Medicine 1995; 14:885–893.
28. Babb J, Rogatko A, Zacks S. Cancer phase I clinical trials: efficient dose escalation with overdose control. Statistics in Medicine 1998; 17:1103–1120.
29. Ting N (ed.). Dose Finding in Drug Development. Springer: New York, 2006.
30. Chevret S (ed.). Statistical Methods for Dose-Finding Experiments. Wiley: Chichester, 2006.
31. Schoenfeld D. Statistical considerations for pilot studies. International Journal of Radiation Oncology Biology and Physics 1980; 6:371–374.
32. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 1989; 10:1–10.
33. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982; 38:143–151.
34. Chen TT. Optimal three-stage designs for phase II cancer clinical trials. Biometrics 1997; 43:865–874.
35. Conaway M, Petroni G. Bivariate sequential designs for phase II trials. Biometrics 1995; 51:656–664.
36. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics 1994; 50:337–349.
37. Thall PF, Simon R, Estey EH. Bayesian sequential monitoring designs for single-arm clinical trials with multiple outcomes. Statistics in Medicine 1995; 14:357–379.
38. Hilden J, Bock JE, Andreasson B, Visfeldt J. Ethics and decision theory in a clinical trial involving severe disfigurement. Theoretical Surgery 1987; 1:183–189.
39. Cressie N, Beale J. A sample-size-optimal Bayesian decision procedure for sequential pharmaceutical trials. Biometrics 1994; 50:700–711.
40. Heitjan DF. Bayesian interim analysis of phase II cancer clinical trials. Statistics in Medicine 1997; 16:1791–1802.
41. Stallard N. Sample size determination for phase II clinical trials based on Bayesian decision theory. Biometrics 1998; 54:279–294.
42. Stallard N. Approximately optimal designs for phase II clinical studies. Journal of Biopharmaceutical Statistics 1998; 8:469–487.
43. Stallard N, Thall PF, Whitehead J. Decision theoretic designs for phase II clinical trials with multiple outcomes. Biometrics 1999; 55:971–977.
44. Thall PF, Simon R, Ellenberg SS. A two-stage design for choosing among several experimental treatments and a control in clinical trials. Biometrics 1989; 45:537–547.
45. Schaid DJ, Wieand S, Therneau TM. Optimal two-stage screening designs for survival comparisons. Biometrika 1990; 77:659–663.
46. Thall PF, Estey EH. A Bayesian strategy for screening cancer treatments prior to phase II clinical evaluation. Statistics in Medicine 1993; 12:1197–1211.
47. Thall PF, Sung H-G. Some extensions and applications of a Bayesian strategy for monitoring multiple outcomes in clinical trials. Statistics in Medicine 1998; 17:1563–1580.
48. Armitage P, McPherson CK, Rowe BC. Repeated significance tests on accumulating data. Journal of the Royal Statistical Society 1969; A132:235–244.
49. Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70:659–663.
50. Whitehead J. The Design and Analysis of Sequential Clinical Trials (revised 2nd edn). Wiley: Chichester, 1997.
51. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC: London, Boca Raton, FL, 2000.
52. Scharfstein DO, Tsiatis AA, Robins JM. Semiparametric efficiency and its implication on the design and analysis of group-sequential studies. Journal of the American Statistical Association 1997; 92:1342–1350.
53. Slud EV, Wei LJ. Two-sample repeated significance tests based on the modified Wilcoxon statistics. Journal of the American Statistical Association 1982; 77:862–868.
54. Kim K, DeMets DL. Design and analysis of group sequential tests based on the type I error spending rate function. Biometrika 1987; 74:149–154.

55. Pampallona S, Tsiatis AA, Kim K. Interim monitoring of group sequential trials using spending functions for the type I and type II error probabilities. Drug Information Journal 2001; 35:1113–1121.
56. Stallard N, Facey KM. Comparison of the spending function method and the Christmas tree correction for group sequential trials. Journal of Biopharmaceutical Statistics 1996; 6:361–373.
57. Anderson TW. A modification of the sequential probability ratio test to reduce sample size. Annals of Mathematical Statistics 1960; 31:165–197.
58. Lai TL. Optimal stopping and sequential tests which minimise the maximum expected sample size. Annals of Statistics 1973; 1:659–663.
59. Whitehead J, Stratton I. Group sequential clinical trials with triangular continuation regions. Biometrics 1983; 39:227–236.
60. Whitehead J. Use of the triangular test in sequential clinical trials. In Handbook of Statistics in Oncology, Crowley J (ed.). Dekker: New York, 2001.
61. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley: Chichester, 2004.
62. Spiegelhalter DJ, Freedman LS, Blackburn PR. Monitoring clinical trials: conditional or predictive power? Controlled Clinical Trials 1986; 7:8–17.
63. Lan KKG, Simon R, Halperin M. Stochastically curtailed tests in long-term clinical trials. Sequential Analysis 1982; 1:207–219.
64. Halperin M, Lan KKG, Ware JH, Johnson NJ, DeMets DL. An aid to data monitoring in long-term clinical trials. Controlled Clinical Trials 1982; 3:311–323.
65. Anderson PK. Conditional power calculations as an aid in the decision whether to continue a clinical trial. Controlled Clinical Trials 1987; 8:67–74.
66. Halperin M, Lan KKG, Wright EC, Foulkes MA. Stochastic curtailing for the comparison of slopes in longitudinal studies. Controlled Clinical Trials 1987; 8:315–326.
67. Hunsberger S, Sorlie P, Geller NL. Stochastic curtailing and conditional power in matched case-control studies. Statistics in Medicine 1994; 13:663–670.
68. Jennison C, Turnbull BW. Repeated confidence intervals for group sequential clinical trials. Controlled Clinical Trials 1984; 5:33–45.
69. Jennison C, Turnbull BW. Repeated confidence intervals for the median survival time. Biometrika 1985; 72:619–625.
70. Jennison C, Turnbull BW. Interim analyses: the repeated confidence interval approach (with Discussion). Journal of the Royal Statistical Society, Series B 1989; 51:305–361.
71. Durrleman S, Simon R. Planning and monitoring of equivalence studies. Biometrics 1990; 46:329–336.
72. Kittelson JM, Emerson SS. A unifying family of group sequential test designs. Biometrics 1999; 55:874–882.
73. Whitehead J. A unified theory for sequential clinical trials. Statistics in Medicine 1999; 18:2271–2286.
74. Phillips AJ, Keene ON on behalf of the PSI Adaptive Design Expert Group. Adaptive designs for pivotal trials: discussion points from the PSI Adaptive Design Expert Group. Pharmaceutical Statistics 2006; 5:61–66.
75. Bauer P, Kohne K. Evaluation of experiments with adaptive interim analyses. Biometrics 1994; 50:1029–1041. Correction in Biometrics 1996; 52:380.
76. Posch M, Bauer P. Interim analysis and sample size reassessment. Biometrics 2002; 56:1170–1176.
77. Proschan MA, Hunsberger SA. Designed extension of studies based on conditional power. Biometrics 1995; 51:1315–1324.
78. Fisher LD. Self-designing clinical trials. Statistics in Medicine 1998; 17:1551–1562.
79. Lehmacher W, Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics 1999; 55:1286–1290.
80. Muller HH, Schafer H. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics 2001; 57:886–891.
81. Posch M, Bauer P, Brannath W. Issues in designing flexible trials. Statistics in Medicine 2003; 22:953–969.
82. Tsiatis AA, Mehta CR. On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika 2003; 90:367–378.
83. Ellenberg S, Geller N, Simon R, Yusuf S (eds). Practical issues in data monitoring of clinical trials. Statistics in Medicine 1993; 12:414–616.
84. Souhami RL, Whitehead J (eds). Workshop on Early Stopping Rules in Cancer Clinical Trials. Statistics in Medicine 1994; 13:1289–1500.
85. Whitehead J. On being the statistician on a data and safety monitoring board. Statistics in Medicine 1999; 18:3425–3434.

86. Facey KM, Lewis JA. The management of interim analyses in drug development. Statistics in Medicine 1998; 17:1801–1809.
87. Ellenberg S, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials. Wiley: Chichester, 2002.
88. Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine 1990; 9:65–72.
89. Gould AL. Interim analyses for monitoring clinical trials that do not materially affect the type I error rate. Statistics in Medicine 1992; 11:55–66.
90. Gould AL. Planning and revising the sample size for a trial. Statistics in Medicine 1995; 14:1039–1051.
91. Gould AL, Shih WJ. Sample size re-estimation without unblinding for normally distributed data with unknown variance. Communications in Statistics—Theory and Methods 1992; 21:2833–2853.
92. Gould AL, Shih WJ. Modifying the design of ongoing trials without unblinding. Statistics in Medicine 1998; 17:89–100.
93. Whitehead J, Whitehead A, Todd S, Bolland K, Sooriyarachchi MR. Mid-trial design reviews for sequential clinical trials. Statistics in Medicine 2001; 20:165–176.
94. Armitage P. Restricted sequential procedures. Biometrika 1957; 44:9–26.
95. Siegmund D. Estimation following sequential tests. Biometrika 1978; 65:341–349.
96. Fairbanks K, Madsen R. P values for tests using the repeated significance test design. Biometrika 1982; 69:69–74.
97. Tsiatis AA, Rosner GL, Mehta CR. Exact confidence intervals following a group sequential test. Biometrics 1984; 40:797–803.
98. Rosner GL, Tsiatis AA. Exact confidence limits following group sequential tests. Biometrika 1988; 75:723–729.
99. Chang MN. Confidence intervals for a normal mean following a group sequential test. Biometrics 1989; 45:247–254.
100. Emerson SS, Fleming TR. Parameter estimation following group sequential hypothesis testing. Biometrika 1990; 77:875–892.
101. Whitehead J. On the bias of maximum likelihood estimation following a sequential test. Biometrika 1986; 73:573–581.
102. Woodroofe M. Estimation after sequential testing: a simple approach for a truncated sequential probability ratio test. Biometrika 1992; 79:347–353.
103. Todd S, Whitehead J, Facey KM. Point and interval estimation following a sequential trial. Biometrika 1996; 83:453–461.
104. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Controlled Clinical Trials 1997; 18:580–593.
105. Lan KKG, Hu MX, Cappelleri JC. Applying the law of iterated logarithm to cumulative meta-analysis of a continuous endpoint. Statistica Sinica 2003; 13:1135–1145.
106. MPS Research Unit. PEST 4: Operating Manual. The University of Reading: U.K., 2000.
107. Cytel Software Corporation. EaSt 4. A Software Package for the Design and Interim Monitoring of Group-Sequential Clinical Trials. Cytel Software Corporation: Cambridge, MA, 2005.
108. Reboussin DM, DeMets DL, Kim K, Lan KKG. Computations for group sequential boundaries using the Lan-DeMets spending function method. Controlled Clinical Trials 2000; 21:190–207.
109. Insightful Corporation. S-Plus 7. Insightful Corporation: Seattle, Washington, 2005.
110. ADDPLAN GmbH. ADDPLAN: Adaptive Designs—Plans and Analyses. ADDPLAN GmbH: Cologne, Germany, 2005.
111. Peace KE (ed.). Biopharmaceutical Sequential Statistical Applications. Dekker: New York, 1992.
112. Fayers PM, Cook PA, Machin D, Donaldson N, Whitehead J, Ritchie A, Oliver RTD, Yuen P. On the development of the Medical Research Council trial of α-interferon in metastatic renal carcinoma. Statistics in Medicine 1994; 13:2249–2260.
113. Montaner JSG, Lawson LM, Levitt N, Belzberg A, Schechter MT, Reudy J. Corticosteroids prevent early deterioration in patients with moderately severe Pneumocystis carinii pneumonia and the acquired immunodeficiency syndrome (AIDS). Annals of Internal Medicine 1990; 113:14–20.
114. Moss AJ, Hall WJ and 10 others. Improved survival with implanted defibrillator in patients with coronary disease at high risk for ventricular arrhythmia. New England Journal of Medicine 1996; 335:1933–1940.
115. Whitehead J, Thomas P. A sequential trial of pain killers in arthritis: issues of multiple comparisons with control and of interval-censored data. Journal of Biopharmaceutical Statistics 1997; 7:333–353.
116. Bellisant E, Duhamel J-F, Guillot M, Pariente-Khayat A, Olive G, Pons G. The triangular test to assess the efficacy of metoclopramide in gastroesophageal reflux. Clinical Pharmacology and Therapeutics 1997; 61:377–384.
117. Whitehead J. Supplementary analysis at the conclusion of a sequential clinical trial. Biometrics 1986; 42:461–471.
118. Tang D-I, Gnecco S, Geller NL. Design of group sequential clinical trials with multiple endpoints. Journal of the American Statistical Association 1989; 84:776–779.
119. Wei LJ, Su JQ, Lachin JM. Interim analysis with repeated measurements in a sequential clinical trial. Biometrika 1990; 77:359–364.
120. Proschan MA, Follman DA, Geller NL. Monitoring multi-armed trials. Statistics in Medicine 1994; 13:1441–1452.
121. Liu W. A group sequential procedure for all-pairwise comparisons of k treatments based on the range statistic. Biometrics 1995; 51:946–955.
122. Vincent E, Todd S, Whitehead J. A sequential procedure for comparing two experimental treatments with a control. Journal of Biopharmaceutical Statistics 2002; 12:249–265.
123. Jennison C, Turnbull BW. Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics 1993; 49:741–752.
124. Cook RJ, Farewell VT. Guidelines for monitoring efficacy and toxicity responses in clinical trials. Biometrics 1994; 50:1146–1152.
125. Stallard N, Todd S. Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine 2003; 22:689–703.
126. Todd S, Stallard N. A new combined clinical trial design combining phases 2 and 3: sequential designs with treatment selection and a change of endpoint.
127. Bretz F, Schmidli H, Konig F, Racine A, Maurer W. Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts. Biometrical Journal 2006; 48:623–634.
128. Schmidli H, Bretz F, Racine A, Maurer W. Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: applications and practical considerations. Biometrical Journal 2006; 48:635–643.