

CONTROLLING FOR EXPERIMENTWISE TYPE II ERROR: OPTIMUM PATIENT ALLOCATION FOR A THREE-ARMED, PARALLEL, RANDOMIZED, CONTROLLED CLINICAL TRIAL WITH BINARY OUTCOMES

Benjamin Duncan, Glaxo Wellcome Inc., Research Triangle Park, North Carolina

Introduction

In the drug development process, controlling for type I error (declaring the test drug is efficacious when in reality it is not) typically takes precedence over controlling for type II error (declaring the test drug is not efficacious when it is). Protecting the public from ineffective medicines is naturally of extreme importance. However, mistakenly withholding effective medicines from the public is also important. Either type of error could cause the loss of human life or lead to unnecessary suffering. Multiple outcomes, or multiplicity, complicate matters further when attempting to control for type I and type II errors. Multiplicity is a concern in each of the following situations: a clinical trial with two or more primary variables of interest, a clinical trial with three or more treatment groups, two or more similar studies testing the same test treatment, and any combination of these. Therefore, how and when to adjust for type I and type II errors is both non-trivial and controversial.

Issues of multiplicity abound in statistical journals. However, most research focuses on controlling for experimentwise type I error rates. In contrast, little attention is given to experimentwise type II error rates. This paper considers the multiplicity issue in randomized, controlled clinical trials with one parameter of interest and three or more treatment groups, primarily considering the experimentwise type II error rate. Controlled clinical trials use a standard therapy treatment group, a placebo group, or both as a control. For example, a controlled trial with three arms may have one test treatment group, one standard treatment group, and one placebo group (sometimes referred to as a gold standard trial), or may have two test treatment groups and one placebo or standard treatment group. In either case, at least two and in many cases all three possible comparisons are of interest. For the first example, normally the primary objective is to show that the test treatment group is superior to the placebo group and is equivalent or superior to the standard treatment group. In addition, to establish the validity of the trial, it is also desirable to show that the standard treatment group is superior to the placebo group. Therefore, all three comparisons are of interest. For the second example (e.g., different doses of the same test drug), at least two objectives are possible. One possibility is to show that at least one of the two test treatment groups is superior to placebo. Alternatively, the objective may be to show that the two test drugs differ or are equivalent while also showing that both test drugs are superior to placebo. In this second objective, all three comparisons are therefore of interest. In each of these scenarios, as in any randomized clinical trial, the experiment must be appropriately sized to detect "clinically meaningful" differences between each of the treatment groups, if such differences exist. Optimum sample size and patient allocation per treatment group are further examined for a parallel, randomized, controlled, three-armed clinical trial with binary outcomes, controlling for the overall or experimentwise type II error rate.

Experimentwise Power

Two possibilities exist when comparing two treatment groups: either the treatment groups are equal or they are unequal. With more than two treatment groups, many more possibilities exist. For example, five possibilities exist when comparing three treatment groups: either all three groups are equal, all three groups are unequal, or one of the three groups differs from the other two. Although there are five true states of existence, there are 2³ = 8 possible outcomes of the three pairwise comparisons. Three of the possible outcomes will always result in at least one comparisonwise type I or type II error (see Appendix A). The best experiment will maximize the likelihood of a "correct decision" while minimizing the probability of one or more comparisonwise type I or type II errors. Of course, care should be taken not to oversize the trial in such a way that non-clinically meaningful differences are found to be statistically significant.

The power of a test is the probability of rejecting the null hypothesis, given the alternative hypothesis is true, and is equal to one minus the probability of a type II error (β). Experimentwise power is defined differently, depending upon the objective of the experiment. For example, Hayter and Liu assessed experimentwise power in the case of comparing k ≥ 2 treatments versus one control. Power was based upon rejecting the overall null hypothesis that all k treatment groups equal the control group, given one or more of the k treatments differed from the control. In this type of problem, controlling for the probability of type I error (α) must be considered. However, in this paper, experimentwise power refers to the combined probabilities of rejecting the null hypotheses, given the alternative hypotheses are true, for each comparison.

Experimentwise power is a function of the individual comparisonwise powers. For the three-armed, controlled clinical trial where T > S > P (where T represents the test treatment, S represents the standard treatment, and P represents the placebo), positive, statistically significant results must occur for all three pairwise comparisons in order to correctly conclude the test treatment is superior to both controls and the standard treatment is superior to placebo. For this type of trial, the absolute upper bound for experimentwise type I error is the comparisonwise type I error rate (α). This result holds true because one statistically significant result is not sufficient to claim superiority. All three significant results are required to claim test treatment superiority, or, in the case of equivalence, both the standard and test treatments must significantly differ from placebo. Therefore, adjustments to the analyses are not required in order to control for experimentwise type I error. However, it is imperative to properly adjust the sample size of the study to take into account experimentwise type II error or power.

A lower bound (from the Bonferroni inequality) for experimentwise power for this type of trial is

(1 - βts - βtp - βsp)     (1)

where βts is the comparisonwise type II error for the test treatment versus standard treatment comparison, βtp is the comparisonwise type II error for the test treatment versus placebo comparison, and βsp is the comparisonwise type II error for the standard treatment versus placebo comparison. This is a very conservative measure of experimentwise power. For example, if the comparisonwise type II errors are all equal to 0.67, then the lower bound for experimentwise power is approximately zero. Additionally, the lower bound for the probability that all comparisons will result in a type II error is zero (see Appendix B). This result is true for any level of the comparisonwise powers and for as few as two multiple comparisons. An alternative method for computing experimentwise power assumes independence for each of the pairwise comparisons. A slightly less conservative estimate of experimentwise power is also obtained. For the previously described experiment, experimentwise power assuming independence is

(1 - βts)(1 - βtp)(1 - βsp).     (2)

For high values of comparisonwise power, the two methods produce similar experimentwise power. For lower values, the differences in experimentwise power become apparent. For example, if the power for each pairwise comparison is 0.7, methods (1) and (2) produce experimentwise powers of 0.10 and 0.34, respectively. In addition, from (2) the probability that all comparisons will result in a type II error equals (βts)(βtp)(βsp), which is greater than zero. True experimentwise power exceeds the estimate obtained from method (2) whenever a positive dependence (correlation) exists between the pairwise comparisons, in terms of the probability of committing type II errors. However, when a negative correlation exists between pairwise comparisons, method (2) overestimates true experimentwise power. For situations where comparisonwise power is not very high, true experimentwise power may be closer to the estimate obtained from method (2). However, when planning an experiment, most researchers plan for power of at least 0.80, in which case the differences obtained from methods (1) and (2) are minor. Therefore, in practical applications it makes little difference which method is used.
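As a quick numeric check of methods (1) and (2), the short SAS data step below (a minimal sketch, not part of the paper's original program; the dataset and variable names are illustrative) reproduces the 0.10 and 0.34 figures when each pairwise power is 0.7:

data ew_check;
   * comparisonwise type II errors when each pairwise power is 0.7;
   beta_ts = 0.3; beta_tp = 0.3; beta_sp = 0.3;
   * method (1): Bonferroni lower bound;
   bonferroni = 1 - beta_ts - beta_tp - beta_sp;
   * method (2): independence assumption;
   indep = (1 - beta_ts)*(1 - beta_tp)*(1 - beta_sp);
   put bonferroni= indep=;   * the log shows bonferroni=0.1 indep=0.343;
run;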

Optimum Patient Allocation

For two treatment groups, only one comparison is possible and the power of the test is maximized, given a fixed overall sample size (n), by randomly assigning an equal number of patients to each treatment group. For clinical trials with multiple comparisons, an equal number of patients in each treatment group (given a fixed overall n) maximizes the power of each pairwise comparison. Therefore, in clinical trials with more than two treatment groups, patients are typically equally allocated to each treatment group. However, an equal allocation of n patients does not necessarily maximize the experimentwise power. For the case of the controlled clinical trial with three or more treatment groups, an unequal allocation of the n patients to each treatment group may give a higher experimentwise power as compared to equal allocation. Consider the case where the parameter of interest, on which sample size and power calculations are based, is a binary outcome (e.g., success, failure). Furthermore, the hypothesized response rate for the placebo treatment group is assumed to be lower than that of the test treatment group(s) and the standard treatment group. For this scenario, an unequal allocation of n patients to each treatment group will always result in improved experimentwise power as compared to equally allocating the n patients. Utilizing this concept allows for a more efficient and cost-effective control of experimentwise type II error.

Calculating the ideal patient allocation ratio which maximizes experimentwise power is a non-trivial task. Several simultaneous equations would need to be solved, and the concepts of global maximum and minimum points of a multidimensional surface would need to be applied. A simple alternative involves the use of a computer. A SAS® program (see Appendix C) can quickly enumerate all possible combinations of patients per treatment group for any n and output the ideal ratio.

Example

For the controlled, three-armed clinical trial previously described, assume the alternative hypothesis is pP = 0.3, pS = 0.45, and pT = 0.55, where pP, pS, and pT are the respective response rates for the placebo (P), standard (S), and test (T) treatment groups. The investigator is interested in showing that both the standard and test treatments are superior to placebo. In addition, the investigator would like to show that the test treatment is superior to the standard treatment, although this result may not be required for the overall success of the trial. Therefore, all three comparisons are of interest. Separate pairwise sample size computations at the alpha = 0.05 level of significance and at 80% comparisonwise power give the following numbers:

P versus S - 176 patients per arm
P versus T - 69 patients per arm
S versus T - 412 patients per arm.

The S versus T comparison requires 412 patients per arm to achieve 80% power. If patients are equally allocated to each treatment group, then 412 patients will be allocated to each treatment group for a total of 1,236 patients. With 412 patients per group, the comparisonwise powers for the P versus S and P versus T comparisons are now 99.3% and >99.9%, respectively. The experimentwise power under (1) and (2) is nearly equal, 79.3% and 79.5% respectively. However, if the 1,236 patients are unequally allocated, an improvement in experimentwise power is possible. A maximum experimentwise power of 84.9%, using (2), is achieved for the 1,236 patients when 503 patients are allocated to each of the standard and test treatment groups and 230 patients are allocated to the placebo group. In addition, the comparisonwise power for S versus T improves to 87.6%, while the comparisonwise powers for P versus S (97.0%) and P versus T (>99.9%) are not seriously compromised. To keep the experimentwise power at 80%, only 1,117 patients are required, when 450 patients are allocated to each of the standard and test treatment groups and 217 patients are allocated to the placebo group. With 1,117 patients the comparisonwise powers are as follows: 95.8% for P versus S, 83.6% for S versus T, and >99.9% for P versus T. Figure (1) plots experimentwise power versus the S,T:P ratio for 1,117 patients. The maximum point on the curve represents the maximum experimentwise power (and the corresponding ratio) possible for 1,117 patients.
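The comparisonwise powers quoted above for the 217/450/450 allocation can be approximated with the same continuity-corrected Fleiss formula used in the Appendix C program. The data step below is a minimal sketch of that calculation (the dataset and variable names are illustrative, not the author's); the three resulting powers should closely match the 95.8%, 83.6%, and >99.9% figures, and their product under the independence assumption is roughly 80%.

data alloc_power;
   * one row per pairwise comparison: smaller group (n1), larger group (n2);
   input pair $ n1 n2 p1 p2;
   z_alpha = probit(1 - 0.05/2);
   r    = n2/n1;                        * allocation ratio, as in Appendix C;
   pbar = (p1 + r*p2)/(r + 1);
   * continuity-corrected normal approximation (Fleiss 3.19/3.20, solved for power);
   cb = (z_alpha*sqrt((r+1)*pbar*(1-pbar))
         - sqrt(r*(p2-p1)**2)*sqrt(n1 - (r+1)/(r*abs(p2-p1))))
        / sqrt(r*p1*(1-p1) + p2*(1-p2));
   power = 1 - probnorm(cb);            * comparisonwise power;
   datalines;
PvsS 217 450 0.30 0.45
PvsT 217 450 0.30 0.55
SvsT 450 450 0.45 0.55
;

* multiplying the three printed powers (independence assumption) gives the experimentwise power;
proc print data=alloc_power;
   var pair power;
run;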

Simulations

In order to further test the utility of the unequal allocation scheme, the parameters from the example were used to simulate 10,000 trials of 1,117 unequally allocated patients using the RANBIN function from SAS®. In order to compare observed versus expected experimentwise power when experimentwise power is low, and to compare the two methods of experimentwise power computation, an additional 10,000 trials were simulated for 750 patients (250 in each treatment group) with the following parameters: P = 0.25, S = 0.35, and T = 0.45. Observed power is calculated from the frequencies of type II errors occurring in the simulated trials. The Mantel-Haenszel chi-square test is used at the alpha = 0.05 level of significance for all comparisons. Results for each of the simulations are presented below:

Simulation (1)

Number of simulated trials: 10,000

Number of patients per trial: 1,117 (217 in P, 450 in S, 450 in T)

True response rates: P = 0.30, S = 0.45, T = 0.55

Observed response rates (mean over all trials): p = 0.30, s = 0.45, t = 0.55.

Observed comparisonwise powers: 84.2% (S versus T), 100% (P versus T).

Prespecified experimentwise power: 80.0% (79.3%)¹
Observed experimentwise power: 80.6% (79.8% - 81.3%)²

(1) Powers computed with independence assumption (and Bonferroni inequality)

(2) 95% confidence interval

Simulation (2)

Number of simulated trials: 10,000

Number of patients per trial: 750 (250 in each treatment group)

True response rates: P = 0.25, S = 0.35, T = 0.45

Observed response rates (mean over all trials): p = 0.25, s = 0.35, t = 0.45.

Prespecified experimentwise power: 38.2% (23.6%)¹
Observed experimentwise power: 37.0% (36.1% - 38.0%)²

(1) Powers computed with independence assumption (and Bonferroni inequality)

(2) 95% confidence interval

In simulation (1) the observed experimentwise power is very close to the prespecified experimentwise power. This supports the validity of the unequal allocation scheme. Recomputing experimentwise power from the observed comparisonwise powers gives 80.5% (Bonferroni) and 81.1% (independence assumption). In simulation (2) the observed experimentwise power is very close to the prespecified experimentwise power computed with the independence assumption. The Bonferroni approach appears to substantially underestimate the true experimentwise power. The prespecified comparisonwise powers are slightly lower than the observed comparisonwise powers. Recomputing experimentwise power from the observed comparisonwise powers gives 31.5% (Bonferroni) and 43.2% (independence assumption). The observed experimentwise power of 37.0% lies within this interval.
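The simulation code is not reproduced in the appendices. The fragment below is only a hypothetical sketch of how a single trial might be generated and analyzed, using the RANBIN function for the binomial responses and PROC FREQ for the Mantel-Haenszel chi-square; the seed and dataset names are illustrative and not taken from the paper.

%let seed = 123457;   * illustrative seed, not the one used in the paper;

data onetrial;
   * one simulated trial under the alternative: 217 placebo, 450 standard, 450 test patients;
   do i = 1 to 217; trt = 'P'; resp = ranbin(&seed, 1, 0.30); output; end;
   do i = 1 to 450; trt = 'S'; resp = ranbin(&seed, 1, 0.45); output; end;
   do i = 1 to 450; trt = 'T'; resp = ranbin(&seed, 1, 0.55); output; end;
   drop i;
run;

* one of the three pairwise comparisons (S versus T); repeat for P versus S and P versus T;
proc freq data=onetrial;
   where trt in ('S', 'T');
   tables trt*resp / chisq;   * CHISQ output includes the Mantel-Haenszel chi-square;
run;

Repeating this over 10,000 trials and counting the trials in which all three pairwise p-values fall below 0.05 yields the observed experimentwise power.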

Summary

Both experimentwise type I and type II errors need to be considered in planning clinical trials with multiple comparisons. For the particular case of the controlled, randomized clinical trial described here, controlling for the experimentwise type I error rate is not needed, because each of the pairwise comparisons is of interest and must achieve significance in order to claim the trial a success. However, an adjustment is required to maintain the experimentwise type II error rate. The Bonferroni method for computing experimentwise power is conservative when experimentwise power is low. However, when experimentwise power is high (i.e., ≥ 80%), little difference in estimated experimentwise power exists between the Bonferroni method and the independence assumption method. An equal allocation of patients in controlled, randomized clinical trials with three or more arms is not always desirable in terms of experimentwise type II error. Unequal allocation proves to be more efficient and powerful. The structure of the alternative hypothesis dictates the allocation ratio which maximizes experimentwise power. The treatment group with the lowest hypothesized response rate (usually a placebo or standard treatment group) should have fewer patients allocated in comparison to the remaining treatment groups. A benefit of unequal allocation when compared to the equal allocation scheme is that typically fewer patients are required to maintain a desired level of experimentwise power. Also, fewer patients are randomized to the hypothesized inferior treatment arm, which is ethically appealing. Finally, the primary comparison of interest (test versus standard, or test dosage 1 versus test dosage 2) has increased power, without additional cost.

References

(1) PC O'Brien, Comment to Multiple Comparisons in Over-the-Counter Drug Clinical Trials with Both Positive and Placebo Controls, Statistics in Medicine, Vol 10, 9-11 (1991).

(2) S Senn, Crossover Trials in Clinical Research, John Wiley & Sons Ltd, West Sussex, England (1993).

(3) AJ Hayter and W Liu, Communications in Statistics A, Vol 21, 1871-1889 (1992).

(4) GG Koch, Comment to Multiple Comparisons in Over-the-Counter Drug Clinical Trials with Both Positive and Placebo Controls, Statistics in Medicine, Vol 10, 13-16 (1991).

(5) JL Fleiss, Statistical Methods For Rates And Proportions, John Wiley & Sons, New York, New York (1981).

(6) SAS® Language Reference Version 6 First Edition, SAS Institute Inc., Cary, NC (1990).

(7) ME Stokes, CS Davis, GG Koch, Categorical Data Analysis Using the SAS® System, SAS Institute Inc., Cary, NC (1995).


Appendix A

Decision Matrix for a three-armed, three-comparison trial

DECISION                                    TRUE STATE OF NATURE
A vs B  A vs C  B vs C   A=B=C             A=B, C≠A,B        A=C, B≠A,C        B=C, A≠B,C        A≠B≠C
  E       E       E      Correct Decision  β2, β3            β1, β3            β1, β2            β1, β2, β3
  E       N       N      α2, α3            Correct Decision  β1, α2            β1, α3            β1
  N       E       N      α1, α3            α1, β2            Correct Decision  β2, α3            β2
  N       N       E      α1, α2            α1, β3            α2, β3            Correct Decision  β3
  N       N       N      α1, α2, α3        α1                α2                α3                Correct Decision
  E       E       N      α3                β2                β1                β1, β2, α3        β1, β2
  E       N       E      α2                β3                β1, α2, β3        β1                β1, β3
  N       E       E      α1                α1, β2, β3        β3                β2                β2, β3

Where:
N = Reject H0 (not equal)
E = Do not reject H0 (equal)
α1 = type I error for the A vs B comparison
α2 = type I error for the A vs C comparison
α3 = type I error for the B vs C comparison
β1 = type II error for the A vs B comparison
β2 = type II error for the A vs C comparison
β3 = type II error for the B vs C comparison


Appendix B

Experimentwise Power

Let the events A, B, and C represent the following:

A - the S versus P comparison results in a statistically significant p-value (< 0.05), given that S and P truly differ.

B - the T versus P comparison results in a statistically significant p-value (< 0.05), given that T and P truly differ.

C - the T versus S comparison results in a statistically significant p-value (< 0.05), given that T and S truly differ.

P(A) = 1 - βsp

P(B) = 1 - βtp

P(C) = 1 - βts

Experimentwise Power = P(ABC) = probability that all three comparisons will result in statistically significant p-values (< 0.05), given T ≠ S ≠ P.

Bonferroni Inequality

P(ABC) = 1 - P(A' ∪ B' ∪ C'), where A', B', and C' denote the complements of A, B, and C

       ≥ 1 - [P(A') + P(B') + P(C')]   (since P(A' ∪ B' ∪ C') ≤ P(A') + P(B') + P(C'))

       = 1 - βsp - βtp - βts

Independence Assumption

P(ABC) = P(A)P(B)P(C)

= [1 - βts][1 - βtp][1 - βsp]


Appendix C

/*=============================================================
Program: POWER.SAS

Purpose: Create program to compute overall power (for three individual hypotheses) and to output the sample size per group to maintain a given overall power, for population proportions.

SAS Version: 6.09

Created by: Ben Duncan    Date: 17JAN96

Description: Optimal sample sizes for randomized, controlled clinical studies with three treatment groups are computed. The computations are made assuming that all three possible comparisons are of interest and that the outcome or primary response is a proportion. An additional characteristic of the computations is that one treatment group is designated as a standard or control treatment (i.e., placebo, lowest dose of a particular study drug, or a different drug utilized as a standard treatment) and the other two treatment groups are designated as the study or experimental drug, where each drug has a different dose. The computations fix the two study drug groups to have the same sample size and compare overall power when varying the ratio of the sample size for the standard treatment versus the sample size for the study drug treatment groups. Power computations are based upon Fleiss (equations 3.19 and 3.20, pg. 45, Statistical Methods for Rates and Proportions). Equation 3.20 is used for the continuity correction.

This program creates two macros, POWER1 and POWER2. The POWER1 macro computes individual and overall power for the three comparisons of interest. Based upon a pre-selected overall sample size, the sample allocation that gives the best overall power is output. Also a printout and power curve are output for all of the possible combinations for the pre-selected overall sample size. Options for the POWER1 macro are as follows:

STANDARD = The hypothesized response level for the standard or placebo treatment group (where 0 < STANDARD < 1)

S_DRUG1 = The hypothesized response level for the 1st study drug treatment group (where 0 < S_DRUG1 < 1)

S_DRUG2 = The hypothesized response level for the 2nd study drug treatment group (where 0 < S_DRUG2 < 1)

ALPHA = The desired level of alpha (where 0 < ALPHA < 1). The default level of alpha is 0.05

N = The desired overall sample size for the three combined treatment groups

PRT_ALL = The default value is 'YES', which prints all the possible combinations considered in the computations. 'NO' will print only the sample allocation that gives the best overall power for the total sample size.

The POWER2 macro outputs the optimum allocation of sample size per group to maximize overall power for a pre-specified range of overall sample sizes.

STANDARD = The hypothesized response level for the standard or placebo treatment group (where 0 < STANDARD < 1)

S_DRUG1 = The hypothesized response level for the 1st study drug treatment group (where 0 < S_DRUG1 < 1)

S_DRUG2 = The hypothesized response level for the 2nd study drug treatment group (where 0 < S_DRUG2 < 1)

ALPHA = The desired level of alpha (where 0 < ALPHA < 1). The default level of alpha is 0.05

N_LOW = The desired sample size starting point of the computations

N_HIGH = The desired sample size ending point of the computations

STEP = The desired increment between N_LOW and N_HIGH
=============================================================*/

OPTIONS LS=80 PS=60;

*----- POWER1 Macro -----;

%macro power1(standard=, s_drug1=, s_drug2=, alpha=0.05, prt_all=YES, n=);

data power1(keep=N p1 p2 p3 alpha n1 n2 n3 r12 r13 r23 power12 power13 power23 o_power prt_all);

   prt_all = "&prt_all";

   N = &n;

   low = round(N/3); high = N/2 - 1;

   ** Do loops are used to compute all combinations from the low to high specifications. **;

   do n2 = low to high;
      n3 = n2;
      n1 = N - (n2 + n3);

      r12 = n2/n1; r13 = n3/n1; r23 = n3/n2;

      p1 = &standard; p2 = &s_drug1; p3 = &s_drug2; alpha = &alpha;

      ** The PROBIT function returns the z-score for the corresponding level of alpha and stores it in the variable z_alpha. **;

      z_alpha = probit((1 - (&alpha)/2));

      ** Formulas are derived from Fleiss (3.19 and 3.20), using algebra. The z_beta score is computed for each of the three pairwise comparisons. **;

      cb12 = (z_alpha * sqrt((r12+1) * ((p1+r12*p2)/(r12+1)) * (1-(p1+r12*p2)/(r12+1)))
              - sqrt(r12*(p2-p1)**2) * sqrt(n1 - (r12+1)/(r12*abs(p2-p1))))
             / sqrt(r12*p1*(1-p1) + p2*(1-p2));

      cb13 = (z_alpha * sqrt((r13+1) * ((p1+r13*p3)/(r13+1)) * (1-(p1+r13*p3)/(r13+1)))
              - sqrt(r13*(p3-p1)**2) * sqrt(n1 - (r13+1)/(r13*abs(p3-p1))))
             / sqrt(r13*p1*(1-p1) + p3*(1-p3));

      cb23 = (z_alpha * sqrt((r23+1) * ((p2+r23*p3)/(r23+1)) * (1-(p2+r23*p3)/(r23+1)))
              - sqrt(r23*(p3-p2)**2) * sqrt(n2 - (r23+1)/(r23*abs(p3-p2))))
             / sqrt(r23*p2*(1-p2) + p3*(1-p3));

      ** The corresponding type II error probabilities are computed for each pairwise comparison using the PROBNORM function. **;

      beta12 = probnorm(cb12); beta13 = probnorm(cb13); beta23 = probnorm(cb23);

      ** The pairwise power and experimentwise power are computed from the type II error probabilities. **;

      power12 = 1 - beta12; power13 = 1 - beta13; power23 = 1 - beta23;

      ** Experimentwise power using the independence assumption. **;

      o_power = power12 * power13 * power23;

      output;

   end;

run;

proc sort data=power1 out=bpower1;
   by n descending o_power;
run;

data bpower1;
   set bpower1;
   by n;
   if first.n then output;
run;

proc print data=bpower1 split='*';
   var n1 n2 n3 r12 p1 p2 p3 alpha o_power power12 power13 power23;
   label r12 = 'RATIO';
   label o_power = 'OVERALL*POWER';
   title1 'Treatment group sample sizes that give the best overall power for';
   title2 "a total sample size of &N";
   footnote1 'n1 = standard/placebo, n2 = study drug 1, n3 = study drug 2';
   footnote2 'Computations made where the treatment group sample size for the';
   footnote3 'two high dose ondansetron groups are fixed to be equal';
run;

proc print data=power1(where=(prt_all="YES")) split='*';
   var n1 n2 n3 r12 p1 p2 p3 alpha power12 power13 power23 o_power;
   label r12 = 'Ratio';
   label o_power = 'OVERALL*POWER';
   title1 'Power Calculations of varying treatment group ratios';
   title2 "for a total sample size of &N";

run;

** Create a SAS Graph **;

filename fig1 'BIO$STAT_S3ACNS:[BD16806]POWER.PS1';

goptions rotate=landscape device=ps300 nodisplay hsize=9 vsize=6 gsflen=130
         ftext=none ctext=gray00 cback=grayff hpos=90 vpos=60 horigin=1 vorigin=1;

goptions gsfname=fig1 gsfmode=replace;

symbol1 interpol=join value=none repeat=1 c=red;

proc gplot data=power1;
   title1 'Power Curve';
   footnote1 'N1 is the standard treatment group sample size';
   plot o_power*r12;
run;

%mend power1;

*----- POWER2 Macro -----;

%macro power2(standard=, s_drug1=, s_drug2=, alpha=0.05, n_low=, n_high=, step=);

data power2(keep=N n1 n2 n3 r12 power12 power13 power23 o_power);

   do n = &n_low to &n_high by &step;
      low = round(N/3); high = round(N/2 - 1);

      do n2 = low to high;

         n3 = n2; n1 = N - (n2 + n3);

         r12 = n2/n1; r13 = n3/n1; r23 = n3/n2;

         p1 = &standard; p2 = &s_drug1; p3 = &s_drug2; alpha = &alpha;

         z_alpha = probit((1 - (&alpha)/2));

         cb12 = (z_alpha * sqrt((r12+1) * ((p1+r12*p2)/(r12+1)) * (1-(p1+r12*p2)/(r12+1)))
                 - sqrt(r12*(p2-p1)**2) * sqrt(n1 - (r12+1)/(r12*abs(p2-p1))))
                / sqrt(r12*p1*(1-p1) + p2*(1-p2));

         cb13 = (z_alpha * sqrt((r13+1) * ((p1+r13*p3)/(r13+1)) * (1-(p1+r13*p3)/(r13+1)))
                 - sqrt(r13*(p3-p1)**2) * sqrt(n1 - (r13+1)/(r13*abs(p3-p1))))
                / sqrt(r13*p1*(1-p1) + p3*(1-p3));

         cb23 = (z_alpha * sqrt((r23+1) * ((p2+r23*p3)/(r23+1)) * (1-(p2+r23*p3)/(r23+1)))
                 - sqrt(r23*(p3-p2)**2) * sqrt(n2 - (r23+1)/(r23*abs(p3-p2))))
                / sqrt(r23*p2*(1-p2) + p3*(1-p3));

         beta12 = probnorm(cb12); beta13 = probnorm(cb13); beta23 = probnorm(cb23);

         power12 = 1 - beta12; power13 = 1 - beta13; power23 = 1 - beta23;

         o_power = power12 * power13 * power23;

output;

end;

end;

run;

proc sort data=power2;
   by n descending o_power;

run;

data power2;
   set power2;
   by n;
   if first.n then output;

run;

proc print data=power2;
   title1 'Treatment group sample sizes that give the best overall power for';
   title2 'a selected range of total sample size.';
   footnote1 'n1 = standard/placebo, n2 = study drug 1, n3 = study drug 2';
   footnote2 'Computations made where the treatment group sample size for the';
   footnote3 'two ondansetron groups are fixed to be equal';

run;

** Create a SAS Graph **;

filename fig2 'BIO$STAT_S3ACNS:[BD16806]POWER.PS2';

goptions rotate=landscape device=ps300 nodisplay hsize=9 vsize=6 gsflen=130
         ftext=none ctext=gray00 cback=grayff hpos=90 vpos=60 horigin=1 vorigin=1;

goptions gsfname=fig2 gsfmode=replace;

symbol1 interpol=join value=none repeat=1 c=red;

proc gplot data=power2;
   title1 'Power Curve';


   footnote1 'N is the total sample size with the optimum ratio of the study drug';
   footnote2 'treatment groups to the standard treatment group to obtain maximum';
   footnote3 'power';
   plot o_power*n;

run;

%mend power2;

*----- Macro calls -----;

%power1(standard=.3, s_drug1=.45, s_drug2=.55, alpha=0.05, prt_all=YES, n=1117);

%power2(standard=.3, s_drug1=.45, s_drug2=.55, alpha=0.05, n_low=800, n_high=1400, step=10);


Figure (1): Power Curve. O_POWER, the experimentwise power (approximately 0.73 to 0.81 on the vertical axis), is plotted against R12, the ratio of the S or T treatment group sample size to the P treatment group sample size (approximately 0 to 4 on the horizontal axis).