STATISTICS IN MEDICINE
Statist. Med. 2004; 23:711–722 (DOI: 10.1002/sim.1658)

Test for qualitative interaction in equivalence trials when the number of centres is large

Xin Yan∗,†

Clinical Biostatistics, Merck Research Laboratories, Jolly Road, P.O. Box 4, Blue Bell, PA 19422, U.S.A.

SUMMARY

We propose a generalized testing procedure to test for qualitative interaction in equivalence trials when the number of centres is large. The proposed testing procedure allows for an adaptable definition of qualitative interaction that can take into account the total number of centres. A tuning parameter $k$ ($k \ge 0$) is introduced to quantify qualitative interaction. The testing procedure is proposed for equivalence trials with symmetric or asymmetric margins. In addition to the test procedure, we also provide explicit formulae for the power calculation. The proposed test is relatively easy to implement using any statistical software. Examples for detecting qualitative interaction are given to illustrate the method. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: qualitative interaction; equivalence trials; equivalence margins; power function; test size

1. INTRODUCTION

Assessment of the consistency of between-treatment differences over investigational centres often occurs in equivalence trials. There are two categories of treatment-by-centre interaction: quantitative and qualitative. Quantitative interaction implies that the between-treatment differences over centres have the same direction, either all positive or all negative, and change only in magnitude. The overall treatment effect is less difficult to interpret in this situation. Qualitative interaction may exist when the between-treatment differences over centres change not only in magnitude but also in direction. In other words, if one treatment is superior to the other in some centres and inferior in others, this may imply qualitative interaction, which often causes difficulties in the assessment of the treatment effect.

Detection of qualitative interaction has received much attention in the literature [1–3]. Gail and Simon [4] published a seminal paper that proposed a likelihood ratio-based test for detecting qualitative interaction. Piantadosi and Gail [5] suggested a range test for detecting qualitative interaction. Weins and Heyse [6] introduced a testing procedure for qualitative interaction in non-inferiority trials. Pan and Wolfe [7] developed a test procedure along with an explicit power function, which allowed for a small, clinically negligible difference $d > 0$ in the formulation for detecting qualitative interaction and calculating the corresponding power. This clinically insignificant margin $d > 0$ was used to determine whether or not the confidence interval of the between-treatment difference was completely outside of the equivalence region $[-d, d]$. Their approach is particularly useful in dealing with equivalence trials, since simply dividing between-treatment differences into two categories, positive or negative, may lead to an overly liberal conclusion on qualitative interaction which may be both clinically and statistically inappropriate. Pan and Wolfe's method provided a practical approach, which is novel and not a simple modification of the test proposed by Gail and Simon, by removing the between-treatment differences of clinical insignificance from the body of evidence of qualitative interaction. However, in Pan and Wolfe's method, qualitative interaction is always affirmed when one pair of confidence intervals of the between-treatment difference is entirely outside of the equivalence margins, regardless of the total number of centres. Therefore, their method is more appropriate for clinical trials with a small or medium number of centres. When the number of centres is large, Pan and Wolfe's method may be overly conservative.

For a trial with a large number of centres, there may be more than one pair of confidence intervals of the between-treatment difference falling outside of the equivalence margins, but they are still a small portion of the total number of confidence intervals. As long as a dominant portion (generally $> 95$ per cent) of the confidence intervals of the between-treatment difference overlap the pre-defined equivalence region $[-d, d]$, it could imply no significant qualitative interaction.

For example, in a trial of four centres where two out of four confidence intervals of the between-treatment difference fall entirely outside of the equivalence margins, the evidence suggests the presence of severe qualitative interaction, whereas in a trial with 50 clinical centres, two out of 50 confidence intervals falling entirely outside of the equivalence margins may not suggest severe qualitative interaction. Therefore, a sound testing procedure should take the total number of clinical centres, the between-treatment difference and the equivalence margins into account in detecting qualitative interaction. Unfortunately, this is not an option in the existing methods. Hence, it is imperative to develop a more adaptable testing procedure that takes into account the total number of centres in detecting qualitative interaction. This is essentially the goal of this paper.

The paper is arranged as follows. In Section 2, we define the hypotheses along with a three-step adaptive testing procedure. The power functions for both symmetric and asymmetric equivalence margins are presented in Section 3. In Section 4, two examples are given to illustrate the proposed method for both symmetric and asymmetric equivalence margins. Section 5 includes a brief discussion. Finally, derivations of the test size and power function are outlined in Appendix A.

∗Correspondence to: Xin Yan, Clinical Biostatistics, Merck Research Laboratories, Jolly Road, PO Box 4, Blue Bell, PA 19422, U.S.A.

†E-mail: xin [email protected]

Received September 2002; Accepted July 2003

2. THE TESTING PROCEDURE

Consider a clinical trial of two treatment groups in which we are interested in the consistency of the between-treatment differences over $I$ centres. Let $-d$ and $d$ be the fixed equivalence margins, which often have clinical relevance. Let $D_i$ be the between-treatment difference for centre $i$. Assume that the $D_i$'s are independent and each is normally distributed with a mean $\delta_i$ and a known standard deviation $\sigma_i$. When $\sigma_i$ is unknown, a consistent estimate of $\sigma_i$ should be used. We then propose the following procedure to test for qualitative interaction.

(1) Define the null and alternative hypotheses

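$$H_0(d,k):\ \{\, I-k \text{ of } \delta_i > -d \ \text{ or } \ I-k \text{ of } \delta_i < d \ \text{ for } k \ge 0,\ i \in \{1,2,\ldots,I\} \,\} \tag{1}$$

$$H_1(d,k):\ \{\, \text{at least } k+1 \text{ of } \delta_i < -d \text{ and another } k+1 \text{ of } \delta_i > d,\ k \ge 0,\ i \in \{1,2,\ldots,I\} \,\} \tag{2}$$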

Note that the null hypothesis above is identical to that of Pan and Wolfe's method when $k=0$. However, the null hypothesis above allows for $k>0$. This is novel in that it allows a fixed number of centres whose confidence intervals of the between-treatment difference lie outside of the equivalence margins without indicating qualitative interaction. In fact, $k$ can be considered as a tuning parameter to quantify qualitative interaction and should be carefully pre-determined based on the nature of the clinical trial and the total number of centres. This point needs emphasis, so that the proposed testing procedure is not rendered suspect by the selection of a value of $k$. We recommend that $k$ be less than 5 per cent of the total number of centres when the design is balanced. Based on the above hypotheses, qualitative interaction is confirmed if at least $k+1$ confidence intervals of the between-treatment difference are entirely above the upper equivalence margin and at least another $k+1$ confidence intervals of the between-treatment difference are completely below the lower equivalence margin. We will discuss how to determine the level of each confidence interval and calculate the power for the proposed testing procedure.

(2) Three-step testing procedure $C(d,k)$

Step 1: Determine a level of the simultaneous confidence intervals for the parameters $\delta_i$, $i=1,2,\ldots,I$, by selecting

$$P_E = 2(1-\alpha)^{1/(I-k-1)} - 1 \tag{3}$$

where $\alpha$ is the test level.

Step 2: Calculate the confidence intervals $(L_i, U_i)$ of level $P_E$ for the parameters $\delta_i$, $i=1,2,\ldots,I$:

$$L_i = D_i - z\!\left(\frac{1-P_E}{2}\right)\sigma_i \tag{4}$$

$$U_i = D_i + z\!\left(\frac{1-P_E}{2}\right)\sigma_i \tag{5}$$

where $z((1-P_E)/2)$ is the upper $(1-P_E)/2$ quantile of the standard normal distribution, and $D_i$ is the observed between-treatment mean difference at centre $i$. The calculation of these confidence intervals is straightforward, and they are independent of the choice of the pre-defined equivalence margin $d$.

Step 3: Perform the test based on the equivalence margins and the simultaneous confidence intervals $(L_i, U_i)$. Do not reject the null hypothesis if at most $2k$ pairs of the confidence intervals of the between-treatment differences are outside of the equivalence margins. Reject the null hypothesis if at least $k+1$ confidence intervals of $\delta_i$ are entirely above the upper equivalence margin $d$ and at least another $k+1$ confidence intervals of $\delta_i$ are completely below the lower equivalence margin $-d$.

The proposed three-step testing procedure is relatively easy to implement. Note that the type I error for the proposed testing procedure is controllable and is directly related to the level of the simultaneous confidence intervals of the between-treatment difference. We show in Appendix A that the proposed test $C(d,k)$ is better than a test of size $\alpha$ (that is, its size does not exceed $\alpha$) in the hypothesis space $H(d,k,\delta)$, i.e.

$$\sup_{H(d,k,\delta)} \operatorname{Prob}\bigl(C(d,k) \text{ rejects } H_0\bigr) \le \alpha \tag{6}$$

Note that $k$ could be greater than 0 so that the null hypothesis may be confirmed at a higher power when the total number of centres is large. For a trial with asymmetric equivalence margins $d_1$ and $d_2$, the proposed testing procedure may be denoted by $C(d_1,d_2,k)$ and the above three-step testing procedure remains valid, except for substituting $d_1$ and $d_2$ for $-d$ and $d$, respectively.
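The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`interval_level`, `confidence_intervals`, `rejects_null`, `qualitative_interaction_test`) are hypothetical, `sds` is assumed to hold the known or estimated standard deviation $\sigma_i$ of each $D_i$, and the margins are passed as `lower` and `upper` so that the same code covers $(-d, d)$ and $(d_1, d_2)$.

```python
from statistics import NormalDist


def interval_level(alpha, n_centres, k):
    """Step 1, equation (3): P_E = 2 (1 - alpha)^(1 / (I - k - 1)) - 1."""
    return 2.0 * (1.0 - alpha) ** (1.0 / (n_centres - k - 1)) - 1.0


def confidence_intervals(diffs, sds, level):
    """Step 2, equations (4)-(5): (L_i, U_i) = D_i -/+ z((1 - P_E)/2) * sigma_i."""
    # The upper (1 - P_E)/2 point equals the (1 + P_E)/2 quantile.
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return [(x - z * s, x + z * s) for x, s in zip(diffs, sds)]


def rejects_null(intervals, lower, upper, k):
    """Step 3: reject H0 when at least k + 1 intervals lie entirely above
    `upper` and at least another k + 1 lie entirely below `lower`."""
    n_above = sum(1 for lo, _ in intervals if lo > upper)
    n_below = sum(1 for _, hi in intervals if hi < lower)
    return n_above >= k + 1 and n_below >= k + 1


def qualitative_interaction_test(diffs, sds, lower, upper, k, alpha):
    """Run steps 1-3 and return True when H0 is rejected."""
    level = interval_level(alpha, len(diffs), k)
    intervals = confidence_intervals(diffs, sds, level)
    return rejects_null(intervals, lower, upper, k)


# Hypothetical data: one centre far above d = 0.5, one far below -d.
diffs = [2.0, -2.0, 0.0, 0.0]
sds = [0.1, 0.1, 0.1, 0.1]
print(qualitative_interaction_test(diffs, sds, -0.5, 0.5, k=0, alpha=0.1))  # True
print(qualitative_interaction_test(diffs, sds, -0.5, 0.5, k=1, alpha=0.1))  # False
```

With $k=0$ a single opposing pair of intervals is enough to reject, whereas $k=1$ requires at least two intervals on each side, which is the behaviour the tuning parameter is meant to provide.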

3. POWER FUNCTION

3.1. Power function for symmetric equivalence margins

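For a trial with symmetric equivalence margins, the proposed test $C(d,k)$ has the following power function:

$$
\begin{aligned}
\text{Power function}
={}& 1-\prod_{i\in A}\Phi\!\left(\frac{d+\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)
\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi^{\omega(j)}\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&-\prod_{i\in B}\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)
\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi^{\omega(j)}\!\left(\frac{-d+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&+\prod_{i\in C}\left\{\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)-\Phi\!\left(\frac{-d-\delta_i}{\sigma_i}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\right\}\\
&\quad\times\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi^{\omega(j)}\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&\quad\times\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi^{\omega(j)}\!\left(\frac{-d+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
\end{aligned}
\tag{7}
$$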

where set $A$ is the index set for the confidence intervals $(L_i,U_i)$ whose upper bounds satisfy $U_i>-d$, i.e. $A=\{i \mid U_i>-d, \text{ for all } i\}$. Set $B$ is the index set for the confidence intervals $(L_i,U_i)$ whose lower bounds satisfy $L_i\le d$, i.e. $B=\{i \mid L_i\le d, \text{ for all } i\}$. Sets $A^c$ and $B^c$ are the complementary sets of $A$ and $B$, respectively. The indices in sets $A^c$ or $B^c$ are the indices whose corresponding confidence intervals $(L_j,U_j)$ are entirely outside of the equivalence margins.


Set $C$ is the index set for the confidence intervals whose upper bounds satisfy $U_i>-d$ and whose lower bounds satisfy $L_i\le d$, i.e. $C=A\cap B$. The number of selected indices from either $A^c$ or $B^c$ should not be greater than $k$ for the power calculation. When the number of indices in set $A^c$ is greater than $k$, we should select indices $j_1,j_2,\ldots,j_k$ such that

$$
(\delta_{j_1},\delta_{j_2},\ldots,\delta_{j_k})=\arg\max_{j_1,j_2,\ldots,j_k\in A^c}\ \prod_{j=j_1,j_2,\ldots,j_k}\Phi^{\omega(j)}\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
$$

Similarly, when the number of indices in set $B^c$ is greater than $k$, the indices $j'_1,j'_2,\ldots,j'_k\in B^c$ should be selected such that

$$
(\delta_{j'_1},\delta_{j'_2},\ldots,\delta_{j'_k})=\arg\max_{j'_1,j'_2,\ldots,j'_k\in B^c}\ \prod_{j=j'_1,j'_2,\ldots,j'_k}\Phi^{\omega(j)}\!\left(\frac{-d+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
$$

According to the proposed testing procedure, the null hypothesis $H_0$ will not be rejected if the total number of indices in set $A$ or $B$ is greater than or equal to $I-k$. The null hypothesis will be rejected if the numbers of indices in both sets $A$ and $B$ are simultaneously less than $I-k$. In the expression of the power function, $\omega(j)$ is an indicator function for confidence interval $j$, i.e. $\omega(j)=1$ if index $j$ is in set $A^c$ or set $B^c$; otherwise, $\omega(j)=0$. The function $\Phi$ is the cumulative distribution function of the standard normal distribution, $z(\gamma)$ is the $1-\gamma$ quantile (the upper $\gamma$ point) of the standard normal distribution, and

$$\gamma=1-(1-\alpha)^{1/(I-k-1)} \tag{8}$$

so that $\gamma=(1-P_E)/2$ by equation (3).

The power can be calculated using the observed differences $D_i$ in place of $\delta_i$, the confidence intervals, and the pre-defined equivalence margins. The standard deviation $\sigma_i$ can be estimated by the square root of the sample variance using the data from centre $i$. The derivation of the power function is outlined in Appendix A.
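A literal transcription of equation (7) into Python may help when checking a hand calculation. The sketch below is illustrative only and rests on several assumptions: the function name `power_symmetric` and its arguments are hypothetical, `deltas` stands in for the means $\delta_i$ (the text plugs in the observed $D_i$), `intervals` holds the $(L_i, U_i)$ from Step 2, and the "select at most $k$ indices maximizing the product" rule is implemented by keeping the $k$ largest factors, which is equivalent because every factor lies between 0 and 1.

```python
from math import prod
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def power_symmetric(deltas, sds, intervals, d, k, alpha):
    """Evaluate equation (7) for symmetric margins (-d, d)."""
    n = len(deltas)
    level = 2.0 * (1.0 - alpha) ** (1.0 / (n - k - 1)) - 1.0  # equation (3)
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)             # z((1 - P_E)/2)

    idx = range(n)
    set_a = [i for i in idx if intervals[i][1] > -d]      # U_i > -d
    set_b = [i for i in idx if intervals[i][0] <= d]      # L_i <= d
    set_c = [i for i in set_a if intervals[i][0] <= d]    # C = A intersect B
    a_comp = [i for i in idx if intervals[i][1] <= -d]    # A^c
    b_comp = [i for i in idx if intervals[i][0] > d]      # B^c

    def below(j):
        """Phi((-d - delta_j) / sigma_j - z), the factor used for j in A^c."""
        return PHI((-d - deltas[j]) / sds[j] - z)

    def above(j):
        """Phi((-d + delta_j) / sigma_j - z), the factor used for j in B^c."""
        return PHI((-d + deltas[j]) / sds[j] - z)

    # Keep at most k factors from each complement; the k largest maximize the product.
    below_part = prod(sorted((below(j) for j in a_comp), reverse=True)[:k])
    above_part = prod(sorted((above(j) for j in b_comp), reverse=True)[:k])

    term_a = prod(PHI((d + deltas[i]) / sds[i] + z) for i in set_a) * below_part
    term_b = prod(PHI((d - deltas[i]) / sds[i] + z) for i in set_b) * above_part
    term_c = prod(
        PHI((d - deltas[i]) / sds[i] + z) - PHI((-d - deltas[i]) / sds[i] - z)
        for i in set_c
    ) * below_part * above_part
    return 1.0 - term_a - term_b + term_c
```

When every interval overlaps the equivalence region, $A^c$ and $B^c$ are empty, the two complement products equal 1, and the expression reduces to $1-\prod_A-\prod_B+\prod_C$.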

3.2. Power function for asymmetric equivalence margins

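The power function for symmetric equivalence margins can be readily extended to that with asymmetric equivalence margins:

$$
\begin{aligned}
\text{Power function}
={}& 1-\prod_{i\in A}\Phi\!\left(\frac{-d_1+\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)
\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi^{\omega(j)}\!\left(\frac{d_1-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&-\prod_{i\in B}\Phi\!\left(\frac{d_2-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)
\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi^{\omega(j)}\!\left(\frac{-d_2+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&+\prod_{i\in C}\left\{\Phi\!\left(\frac{d_2-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)-\Phi\!\left(\frac{d_1-\delta_i}{\sigma_i}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\right\}\\
&\quad\times\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi^{\omega(j)}\!\left(\frac{d_1-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&\quad\times\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi^{\omega(j)}\!\left(\frac{-d_2+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
\end{aligned}
\tag{9}
$$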


where $d_1$ and $d_2$ are the lower and upper equivalence margins, respectively. The index sets $A$–$C$ and the normal percentile $z(\gamma)$ are the counterparts of those in the power function for symmetric equivalence margins. Examples of the power calculation for both symmetric and asymmetric equivalence margins are given in the next section.
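As a quick consistency check (a sketch added for illustration, not part of the original derivation), write $\delta_i$ for the centre means and $\sigma_i$ for their standard deviations. Setting $d_1=-d$ and $d_2=d$ in the standardized distances to the asymmetric margins recovers the distances that appear in the symmetric case:

$$
\frac{-d_1+\delta_i}{\sigma_i}=\frac{d+\delta_i}{\sigma_i},\qquad
\frac{d_1-\delta_j}{\sigma_j}=\frac{-d-\delta_j}{\sigma_j},\qquad
\frac{d_2-\delta_i}{\sigma_i}=\frac{d-\delta_i}{\sigma_i},\qquad
\frac{-d_2+\delta_j}{\sigma_j}=\frac{-d+\delta_j}{\sigma_j}.
$$

The first pair measures the distance to the lower margin (sets $A$ and $A^c$) and the second pair the distance to the upper margin (sets $B$ and $B^c$).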

4. EXAMPLE

The first example has asymmetric equivalence margins. The proposed test $C(d_1,d_2,k)$ is applied to a vaccine trial to compare the consistency of the experimental vaccine from three different lots over 20 clinical centres. The primary endpoint for the experimental vaccine is the geometric mean of the immune antibody titre in initially seronegative subjects. The criterion to show the equivalence of the three lots requires that the two-sided 90 per cent confidence interval for the logarithm of immune antibody titre difference be entirely within the asymmetric interval $(0.67, 1.5)$. An ANOVA model is applied to perform pairwise comparisons, with a response of the logarithm of the immune antibody titre and fixed effects of treatment, study centre and treatment-by-centre interaction. After a treatment-by-centre interaction is detected between lots 2 and 3, it is desirable to know whether or not the interaction is qualitative. To this end, the observed between-treatment difference $D_i$ and its standard deviation are calculated for each centre. The test level is $\alpha=0.1$ and the confidence levels are $P_E=0.989$ and $0.988$ for $k=0$ and $1$, respectively. The pre-specified asymmetric equivalence margins are $d_1=0.67$ and $d_2=1.5$. The confidence intervals of the between-treatment difference of the immune antibody titres in a log-scale over 20 pooled centres are shown in Figures 1 and 2 for $k=0$ and $1$. The figures clearly show that there is no confidence interval entirely outside of the equivalence margins. Applying test $C(0.67,1.5,0)$ or $C(0.67,1.5,1)$, we confirm no statistically significant qualitative interaction for the immune antibody titre between lots 2 and 3. In this example, both tests ($k=0,1$) give the same conclusion.

Furthermore, the proposed power function is used to calculate the powers for the tests $C(d_1,d_2,0)$, $C(d_1,d_2,1)$ and $C(d_1,d_2,2)$. For $k=0$, our test is identical to the test proposed by Pan and Wolfe (1997). The powers are 77.5, 78.5 and 79.6 per cent for $k=0$, $1$ and $2$, respectively. It is worth noting that, with identical statistical conclusions and testing levels, a larger value of $k$ tends to have a higher testing power. Since different values of $k$ actually define different null hypotheses, the comparison of powers under different values of $k$ is less meaningful. However, if the total number of centres is large, we may select a reasonable value of $k>0$ to reach a preferred power. We also calculated the p-value based on Gail and Simon's test (p-value = 0.732). In this example, Gail and Simon's test, Pan and Wolfe's test and the proposed test ($k=1$) yield identical statistical conclusions. However, when $k=1$ the proposed test has higher power than Pan and Wolfe's test. Given the total number of centres (20), we may select $k=1$ and apply $C(0.67,1.5,1)$ to confirm no qualitative interaction.

Figure 1. Mean differences (log-scale) and their 98.9 per cent confidence intervals across 20 centres ($k=0$); horizontal axis: clinical centre, vertical axis: log mean difference. Two horizontal lines represent the asymmetric equivalence margins $d_1=0.67$ and $d_2=1.5$. The corresponding power is 77.5 per cent and the p-value is 0.05.

Figure 2. Mean differences (log-scale) and their 98.8 per cent confidence intervals across 20 centres ($k=1$); horizontal axis: clinical centre, vertical axis: log mean difference. Two horizontal lines represent the asymmetric equivalence margins $d_1=0.67$ and $d_2=1.5$. The corresponding power is 78.5 per cent and the p-value is 0.05.

The second example is a simulated setting with symmetric equivalence margins, where $k=0$ may not be an appropriate choice when the total number of centres is large. In this example, the treatment differences $D_i$ over 50 centres are generated from the normal distribution $N(0, 0.25)$. For each centre, normal data of size 40 are generated from the normal distribution $N(D_i, 0.04)$. We then calculate the estimated standard deviations and confidence intervals of the mean difference as defined by testing procedure $C(0.5, 1)$. For $k=1$, $I=50$ and $\alpha=0.1$, the required confidence level is $P_E=2\times 0.9^{1/48}-1=0.996$ and the corresponding $Z((1+P_E)/2)=2.849$. The confidence intervals of the mean difference over 50 centres are presented in Figure 3, where two horizontal lines represent the symmetric equivalence margins $-0.5$ and $0.5$, respectively. We further calculate the powers using the proposed power function for $k=0, 1, 2, 3, 4$ and a test level $\alpha=0.1$. The resultant powers are 79.0, 79.4, 79.9, 80.3 and 80.8 per cent, respectively. As expected, the testing power tends to be higher when $k$ increases, which is a typical feature of the proposed test. However, power comparison between different values of $k$ should be avoided. As shown in Figure 3, two confidence intervals are completely above the upper equivalence margin $d=0.5$ and one confidence interval is entirely below the lower equivalence margin $-d=-0.5$. In both cases $k=1$ and $2$, the existence of qualitative interaction is rejected at a power of 79.4 and 79.9 per cent, respectively. Note that qualitative interaction could be confirmed if we select $k=0$. However, since only 6 per cent of the confidence intervals fall entirely outside of the equivalence margins, and a great portion (94 per cent) of the confidence intervals fall into or overlap the equivalence margins, there is strong evidence that we could accept the conclusion of no qualitative interaction. Thus, we may pre-determine $k=1$, which is less than 5 per cent of the total number of centres. Applying test $C(0.5, 1)$, we accept the conclusion that there is no significant evidence of qualitative interaction at a level of 0.1 and a power of 79.4 per cent. This conclusion is quite reasonable since the mean differences over 50 centres are from $N(0, 0.25)$ and the equivalence region is between $-0.5$ and $0.5$; the mean differences therefore have a high chance of falling into the equivalence region, which is evidence of no qualitative interaction.
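The confidence levels quoted in both examples follow directly from equation (3), and they can be recomputed with the Python standard library. The snippet below is a small check rather than part of the original analysis; the helper names `interval_level` and `critical_value` are hypothetical.

```python
from statistics import NormalDist


def interval_level(alpha, n_centres, k):
    """Equation (3): P_E = 2 (1 - alpha)^(1 / (I - k - 1)) - 1."""
    return 2.0 * (1.0 - alpha) ** (1.0 / (n_centres - k - 1)) - 1.0


def critical_value(level):
    """Z((1 + P_E) / 2): the normal quantile that scales each interval."""
    return NormalDist().inv_cdf((1.0 + level) / 2.0)


# (I, k) pairs used in the two examples, all at test level alpha = 0.1.
for n_centres, k in [(20, 0), (20, 1), (50, 1)]:
    level = interval_level(0.1, n_centres, k)
    print(n_centres, k, round(level, 3), round(critical_value(level), 3))
```

Up to rounding, the printed levels should match the values 0.989, 0.988 and 0.996 given in the text, and the last critical value should agree with the quoted 2.849.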

5. DISCUSSION

A trial with more than 50 centres may be considered as one having a 'large' number of centres. The choice of $k$ often depends not only on the total number of centres, but also on the nature of the trial. In general, a value of $k$ should be approximately less than 5 per cent of the total number of centres. On the other hand, we are inclined to select a smaller value of $k$ in a trial for patients with fatal or severe disease. Since the existing method uses $k=0$ regardless of the total number of centres, the proposed test is more attractive and useful when the total number of centres is large. The power for the proposed test usually increases with a larger value of $k$. However, power comparison between different values of $k$ is less meaningful, since different values of $k$ correspond to different hypotheses. Although we gave the corresponding powers for different values of $k$, this is only to illustrate how the power changes with $k$. It is mandatory to pre-determine a fixed value of $k$ when designing a trial. Moreover, the proposed test is more suitable for a balanced design. For a trial with an unbalanced design, it may be appropriate to apply a sample size weighting scheme in the calculation of confidence intervals.

Figure 3. Mean differences and their 98.8 per cent confidence intervals across 50 centres ($k=1$); horizontal axis: clinical centre, vertical axis: mean difference. Two horizontal lines represent the symmetric equivalence margins $-0.5$ and $0.5$. The corresponding power is 79.4 per cent and the p-value is 0.1.

APPENDIX A

A.1. Test $C(d,k)$ is better than a test of size $\alpha$

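To show this we calculate the probability that the proposed test $C(d,k)$ rejects $H_0$ when $H_0$ is true. Consider the scenario where both sets $A^c$ and $B^c$ are non-empty. With the events $U$, $U_a$, $L$ and $L_a$ as defined in Section A.2, the size of test $C(d,k)$ is

$$
\begin{aligned}
P(C(d,k)\text{ rejects }H_0\mid H_0)
&=1-P(C(d,k)\text{ accepts }H_0\mid H_0)\\
&=1-P(U\cap U_a\text{ or }L\cap L_a\mid H_0)\\
&=1-P(U\cap U_a\mid H_0)-P(L\cap L_a\mid H_0)+P(U\cap U_a\cap L\cap L_a\mid H_0)\\
&=1-\prod_{i\in A}\Phi\!\left(\frac{d+\delta_i}{\sigma_i}+z(\gamma)\right)\prod_{j=j_1,j_2,\ldots,j_k\in A^c}\Phi\!\left(\frac{-d-\delta_j}{\sigma_j}-z(\gamma)\right)\\
&\quad-\prod_{i\in B}\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z(\gamma)\right)\prod_{j=j'_1,j'_2,\ldots,j'_k\in B^c}\Phi\!\left(\frac{-d+\delta_j}{\sigma_j}-z(\gamma)\right)\\
&\quad+\prod_{i\in C}\left\{\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z(\gamma)\right)-\Phi\!\left(\frac{-d-\delta_i}{\sigma_i}-z(\gamma)\right)\right\}\\
&\qquad\times\prod_{j=j_1,j_2,\ldots,j_k\in A^c}\Phi\!\left(\frac{-d-\delta_j}{\sigma_j}-z(\gamma)\right)\prod_{j=j'_1,j'_2,\ldots,j'_k\in B^c}\Phi\!\left(\frac{-d+\delta_j}{\sigma_j}-z(\gamma)\right)
\end{aligned}
$$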

The above probability reaches its maximum when one $\delta_i=+\infty$, $i\in A$, $\delta_j=-\infty$, $j=j_1,j_2,\ldots,j_k\in A^c$, and the remaining $\delta_l$'s are all equal to $-d$, or when one $\delta_i=-\infty$, $i\in B$, $\delta_j=+\infty$, $j=j'_1,j'_2,\ldots,j'_k\in B^c$, and the remaining $\delta_l$'s are all equal to $d$. Choosing the first scenario, for example, the third and fourth terms in the above expression vanish entirely and the result is

$$
\sup_{H(d,k,\delta)}\Pr\bigl(C(d,k)\text{ rejects }H_0\mid H_0\bigr)\le 1-\prod_{l\in A,\ l\ne i}\Phi(z(\gamma))=1-(1-\gamma)^{I-k-1} \tag{A1}
$$

Recall that $\gamma=1-(1-\alpha)^{1/(I-k-1)}$; hence $1-(1-\gamma)^{I-k-1}=\alpha$ and the size of the proposed test does not exceed $\alpha$. The above product terms with indices in set $A^c$ or $B^c$ become 1 when $A^c$ or $B^c$ is empty. Therefore, the above inequality still holds, which implies that the test $C(d,k)$ is better than a test of size $\alpha$.
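For a concrete instance of this bound (a worked illustration using the settings of the second example, $I=50$, $k=1$ and test level $0.1$, with $\gamma$ denoting the per-interval tail probability and $\alpha$ the test level):

$$
\gamma=1-(1-0.1)^{1/48}\approx 0.00219,\qquad
1-(1-\gamma)^{48}=1-0.9=0.1=\alpha .
$$

The per-interval tail probability is therefore about 0.22 per cent, while the bound on the overall size stays at the nominal 10 per cent.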

A.2. Proof of the power calculation formula for the test $C(d,k)$

Let $(L_i,U_i)$, $i=1,2,\ldots,I$, be the confidence intervals proposed in Section 2. Let $A$ be the index set of the confidence intervals whose upper bounds satisfy $U_i>-d$, i.e. $A=\{i \mid U_i>-d \text{ for all } i\}$. Let $B$ be the index set of the confidence intervals whose lower bounds satisfy $L_i\le d$, i.e. $B=\{i \mid L_i\le d \text{ for all } i\}$. Denote $C=A\cap B$, i.e. the index set of the confidence intervals whose lower bounds are at most $d$ and whose upper bounds are greater than $-d$. Under $H_0$ there are at most $I-k$ indices in the set $A$. Let $A^c$ and $B^c$ be the complementary sets of $A$ and $B$, respectively. Let $U=\{U_i>-d,\ i\in A\}$ and $U_a=\{U_j\le -d,\ j=j_1,j_2,\ldots,j_l\in A^c\}$. Let $L=\{L_i\le d,\ i\in B\}$ and $L_a=\{L_j>d,\ j=j'_1,j'_2,\ldots,j'_m\in B^c\}$, where $l\le k$ and $m\le k$. The function $\omega(j)$ is an indicator function for confidence interval $j$, i.e. $\omega(j)=1$ if index $j$ is in set $A^c$ or set $B^c$; otherwise, $\omega(j)=0$. The function $\Phi(z)$ is the cumulative distribution function of the standard normal distribution. The proposed test $C(d,k)$ rejects $H_0$ if at least $k+1$ pairs of the confidence intervals $(L_i,U_i)$ fall on the opposite sides of the equivalence margins. Consider the scenario where both $A^c$ and $B^c$ are non-empty and the number of indices in each set is less than or equal to $k$. Since the $D_i$'s are mutually independent across the centres, we have

$$
\begin{aligned}
\text{Power function}
&=1-P(C(d,k)\text{ rejects }H_1\mid H_1)\\
&=1-P\{U\cap U_a\text{ or }L\cap L_a\}\\
&=1-P\{U_i>-d,\ i\in A\}\,P\{U_j<-d,\ j=j_1,j_2,\ldots,j_l\in A^c\}\\
&\quad-P\{L_i\le d,\ i\in B\}\,P\{L_j>d,\ j=j'_1,j'_2,\ldots,j'_m\in B^c\}\\
&\quad+P\{U_i>-d\text{ and }L_i\le d,\ i\in C\}\\
&\qquad\times P\{U_j<-d,\ j=j_1,j_2,\ldots,j_l\in A^c\}\,P\{L_j>d,\ j=j'_1,j'_2,\ldots,j'_m\in B^c\}
\end{aligned}
$$

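where $l$ and $m$ are not greater than $k$. Next, the second term is

$$
\begin{aligned}
&P\{U_i>-d,\ i\in A\}\,P\{U_j<-d,\ j=j_1,j_2,\ldots,j_l\in A^c\}\\
&\quad=\prod_{i\in A}P\{U_i>-d\}\prod_{j=j_1,j_2,\ldots,j_l\in A^c}P\{U_j<-d\}\\
&\quad=\prod_{i\in A}P\!\left\{D_i+z\!\left(\tfrac{1-P_E}{2}\right)\sigma_i>-d\right\}\prod_{j=j_1,j_2,\ldots,j_l\in A^c}P\!\left\{D_j+z\!\left(\tfrac{1-P_E}{2}\right)\sigma_j<-d\right\}\\
&\quad=\prod_{i\in A}\Phi\!\left(\frac{d+\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
\end{aligned}
\tag{A2}
$$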

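and the third term is

$$
\begin{aligned}
&P\{L_i\le d,\ i\in B\}\,P\{L_j>d,\ j=j'_1,j'_2,\ldots,j'_m\in B^c\}\\
&\quad=\prod_{i\in B}\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi\!\left(\frac{-d+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
\end{aligned}
\tag{A3}
$$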

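and the fourth term is

$$
\begin{aligned}
&P\{U\text{ and }L\}\,P\{U_a\}\,P\{L_a\}\\
&\quad=P\{U_i>-d\text{ and }L_i\le d,\ i\in C\}\\
&\qquad\times P\{U_j<-d,\ j=j_1,j_2,\ldots,j_l\in A^c\}\,P\{L_j>d,\ j=j'_1,j'_2,\ldots,j'_m\in B^c\}\\
&\quad=\prod_{i\in C}P\!\left\{\frac{-d-\delta_i}{\sigma_i}-z\!\left(\tfrac{1-P_E}{2}\right)\le\frac{D_i-\delta_i}{\sigma_i}\le\frac{d-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right\}\\
&\qquad\times\prod_{j=j_1,j_2,\ldots,j_l\in A^c}P(U_j<-d)\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}P(L_j>d)\\
&\quad=\prod_{i\in C}\left\{\Phi\!\left(\frac{d-\delta_i}{\sigma_i}+z\!\left(\tfrac{1-P_E}{2}\right)\right)-\Phi\!\left(\frac{-d-\delta_i}{\sigma_i}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\right\}\\
&\qquad\times\prod_{j=j_1,j_2,\ldots,j_l\in A^c}\Phi\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)\\
&\qquad\times\prod_{j=j'_1,j'_2,\ldots,j'_m\in B^c}\Phi\!\left(\frac{-d+\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
\end{aligned}
\tag{A4}
$$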

Combining (A2)–(A4) yields the desired power function when both sets $A^c$ and $B^c$ are non-empty. If sets $A^c$ and $B^c$ are both empty, or one of them is empty, we introduce an indicator function $\omega(j)$ and use it as a power term in the expression of $P(L_a)$ or $P(U_a)$. Finally, when the number of indices in $A^c$ is greater than $k$ we should select $k$ indices such that the corresponding $\delta_j$'s maximize

$$
\prod_{j=j_1,j_2,\ldots,j_k\in A^c}\Phi\!\left(\frac{-d-\delta_j}{\sigma_j}-z\!\left(\tfrac{1-P_E}{2}\right)\right)
$$

A similar approach can be applied to the scenario where the number of indices in set $B^c$ is greater than $k$.
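The single-centre identities behind (A2) and (A3) can be checked numerically. The sketch below is an illustration under assumptions, not part of the original proof: the function name `check_identities` is hypothetical, `delta` and `sigma` denote the mean and standard deviation of one $D_i$, and `z` is any positive critical value. It compares each probability computed directly from the normal distribution of $D_i$ with the closed form written in terms of the standard normal distribution function.

```python
from statistics import NormalDist


def check_identities(delta, sigma, d, z):
    """Return the differences between direct and closed-form probabilities
    for P(U > -d) and P(L > d), where U = D + z*sigma and L = D - z*sigma."""
    std = NormalDist()                          # standard normal
    centre = NormalDist(mu=delta, sigma=sigma)  # distribution of D_i

    # P(U > -d) = P(D > -d - z*sigma); closed form Phi((d + delta)/sigma + z).
    direct_upper = 1.0 - centre.cdf(-d - z * sigma)
    closed_upper = std.cdf((d + delta) / sigma + z)

    # P(L > d) = P(D > d + z*sigma); closed form Phi((-d + delta)/sigma - z).
    direct_lower = 1.0 - centre.cdf(d + z * sigma)
    closed_lower = std.cdf((-d + delta) / sigma - z)

    return direct_upper - closed_upper, direct_lower - closed_lower


# Hypothetical settings; both differences should be zero up to rounding error.
for delta in (-0.8, 0.0, 0.6):
    print(delta, check_identities(delta, sigma=0.3, d=0.5, z=2.0))
```

The remaining factors in (A2) and (A3), $P(U_j<-d)$ and $P(L_i\le d)$, are the complements of these two probabilities, so the same check covers them.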

ACKNOWLEDGEMENTS

The author is grateful to an anonymous referee for insightful review and constructive comments. The author is also grateful to colleagues in Merck Research Laboratories for beneficial discussions.

REFERENCES

1. Zelterman D, Kersher RP. Test for qualitative interaction. Proceedings of the Biopharmaceutical Section, American Statistical Association, San Francisco, California, 1987.

2. Wellek S. Testing for absence of qualitative interactions between risk factors and treatment effects. Biometrical Journal 1997; 39:809–821.

3. Hsu JC. Multiple Comparisons, Theory and Methods. Chapman & Hall: London/CRC Press: Boca Raton, FL, 1999.

4. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 1985; 41:361–372.

5. Piantadosi S, Gail MH. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12:1239–1248.

6. Weins BL, Heyse JF. Testing for interaction in studies of non-inferiority. Proceedings of the Annual Meeting of the American Statistical Association, Atlanta, Georgia, 2001.

7. Pan G, Wolfe DA. Test for qualitative interaction of clinical significance. Statistics in Medicine 1997; 16:1645–1652.
