TRANSCRIPT
![Page 1: Power and Non-Inferiority Richard L. Amdur, Ph.D. Chief, Biostatistics & Data Management Core, DC VAMC Assistant Professor, Depts. of Psychiatry & Surgery](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649da25503460f94a8e957/html5/thumbnails/1.jpg)
Power and Non-Inferiority
Richard L. Amdur, Ph.D.
Chief, Biostatistics & Data Management Core, DC VAMC
Assistant Professor, Depts. of Psychiatry & Surgery
Georgetown University Medical Center
Power and Non-Inferiority in Clinical Trials
If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect.
Why?
Frequency Distribution for One Variable

Scores by subject (subjects 1 to 32, in order): 9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21

Frequency Table

| Score | Subject Count | % of total |
|---|---|---|
| 9 | 1 | 3.13 |
| 10 | 2 | 6.25 |
| 11 | 3 | 9.38 |
| 12 | 4 | 12.50 |
| 13 | 5 | 15.63 |
| 14 | 4 | 12.50 |
| 15 | 3 | 9.38 |
| 16 | 3 | 9.38 |
| 17 | 2 | 6.25 |
| 18 | 2 | 6.25 |
| 19 | 1 | 3.13 |
| 20 | 1 | 3.13 |
| 21 | 1 | 3.13 |
| total | 32 | 100.00 |

[Figure: histogram of Count (y-axis, 0 to 6) by Score (x-axis, 9 to 21), with the mean and sd marked.]
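The mean and sd marked on the histogram can be recomputed from the frequency table. The sketch below is illustrative only: the variable names are mine, and it assumes the sd on the slide is the sample SD (n - 1 denominator), which the slide does not state.

```python
from statistics import mean, stdev

# Frequency table from the slide: score -> number of subjects
counts = {9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 4, 15: 3,
          16: 3, 17: 2, 18: 2, 19: 1, 20: 1, 21: 1}

# Expand the table back into one score per subject
scores = [score for score, n in counts.items() for _ in range(n)]

n_total = len(scores)
# Percentage of all subjects at each score
pct = {score: 100 * n / n_total for score, n in counts.items()}

m = mean(scores)
sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
print(n_total, round(m, 2), round(sd, 2))
```

For these 32 scores the mean works out to about 14.2 and the SD to about 3.0, which is the SD used in the effect size examples that follow.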
Compare the outcomes of treatment vs. control groups
$$\text{Effect Size} = \frac{\text{mean}_{TX} - \text{mean}_{CON}}{SD_{CON}}$$

[Figure: overlaid distributions of Outcome Score (Level of Functioning) (x-axis, 10 to 200) by number of subjects (y-axis, 0 to 12) for the Control and Test Tx groups, with the CON mean and TX mean marked.]
| Mean difference | SD | ES |
|---|---|---|
| 1 | 3 | 1/3 = 0.33 |
| 2 | 3 | 2/3 = 0.67 |
| 3 | 3 | 3/3 = 1.0 |
| 4 | 3 | 4/3 = 1.33 |
| 4 | 1.94 | 2.1 |
| 4 | 1.1 | 3.6 |
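The effect sizes on these slides can be reproduced with a few lines of Python. This is a minimal sketch: the function name and the control mean of 100 are hypothetical, chosen only so that the mean differences and SDs match the slides.

```python
def effect_size(mean_tx, mean_con, sd_con):
    """Standardized mean difference, scaled by the control-group SD."""
    return (mean_tx - mean_con) / sd_con

# Hypothetical control mean; only the difference from it matters
con = 100.0

# (mean difference, SD) pairs taken from the slides
for diff, sd in [(1, 3), (2, 3), (3, 3), (4, 3), (4, 1.94), (4, 1.1)]:
    print(diff, sd, round(effect_size(con + diff, con, sd), 2))
```

The same mean difference gives a larger ES as the SD shrinks, which is why the last three slides show the two distributions separating even though the means do not move.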
Type-I and Type-II Errors

| In your experiment, you observe that TX & Placebo are: | In fact, TX & Placebo are: Different | In fact, TX & Placebo are: The Same |
|---|---|---|
| Different | 1 − β (Power) | α |
| The Same | β | 1 − α |

α: the rate of false positives (Type I error rate)
β: the rate of false negatives (Type II error rate)
Power = 1 − β: the rate of true positives
Plot of Score Distribution under the Null and Alternative Hypotheses
Using 2-tailed independent-groups t-test with alpha=.05, and power = .80
[Figure: distributions of t under H0 and H1 (x-axis: t, from -3 to 5; y-axis: p, from 0 to 0.3), with the α/2 and β regions marked. Critical t = 1.97897.]
N needed per group is 64
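One way to see what α and power mean here is to simulate many trials and count how often the t statistic exceeds the critical value. The sketch below is a rough illustration, not the calculation behind the slide. It takes N = 64 per group and critical t = 1.97897 from the slide, and it assumes an effect size of d = 0.5 with unit-SD normal outcomes. The function names, seed and number of repetitions are arbitrary choices of mine.

```python
import random

def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def t_stat(a, b):
    """Pooled-variance independent-groups t statistic (equal group sizes)."""
    n = len(a)
    mean_a, var_a = mean_var(a)
    mean_b, var_b = mean_var(b)
    pooled_var = (var_a + var_b) / 2
    return (mean_a - mean_b) / (2 * pooled_var / n) ** 0.5

def rejection_rate(d, n=64, crit_t=1.97897, reps=2000, seed=1):
    """Fraction of simulated trials with |t| above the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        con = [rng.gauss(0.0, 1.0) for _ in range(n)]
        tx = [rng.gauss(d, 1.0) for _ in range(n)]
        if abs(t_stat(tx, con)) > crit_t:
            hits += 1
    return hits / reps

alpha_hat = rejection_rate(d=0.0)  # TX & placebo truly the same: false positives
power_hat = rejection_rate(d=0.5)  # TX & placebo truly different: true positives
print(alpha_hat, power_hat)
```

With enough repetitions the first rate should settle near .05 and the second near .80, up to simulation noise.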
Plot of Score Distribution under the Null and Alternative Hypotheses
Using 2-tailed independent-groups t-test with alpha=.05, and power = .95
[Figure: distributions of t under H0 and H1 (x-axis: t, from -2 to 6; y-axis: p, from 0 to 0.3), with the α/2 and β regions marked. Critical t = 1.97143.]
N needed per group is 105
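The per-group N values on these two slides can be approximated without special software. The sketch below uses the usual normal-approximation formula, n = 2(z_{1-α/2} + z_{power})² / d², plus a commonly used small-sample adjustment of z²_{1-α/2}/4 for the t-test. Both the formula and the adjustment are my additions rather than the method behind the slides, and d = 0.5 is assumed.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, correction=True):
    """Approximate N per group for a 2-tailed independent-groups t-test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical z for a two-tailed test
    z_power = z(power)           # z corresponding to the desired power
    n = 2 * (z_alpha + z_power) ** 2 / d ** 2
    if correction:
        # Small-sample adjustment for using t rather than z (an assumption)
        n += z_alpha ** 2 / 4
    return ceil(n)

for target_power in (0.80, 0.95):
    print(target_power,
          n_per_group(0.5, power=target_power, correction=False),
          n_per_group(0.5, power=target_power))
```

Without the adjustment the approximation comes out one subject per group short (63 and 104). With it, the results agree with the 64 and 105 per group shown on the slides.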
[Figure: Power (1-β err prob) (y-axis, 0.1 to 0.9) as a function of Total sample size (x-axis, 20 to 200).]
t tests - Means: Difference between two independent means (two groups)
Tail(s) = Two, α err prob = 0.05, Allocation ratio N2/N1 = 1, Effect size d = 0.5
[Figure: Power (1-β err prob) (y-axis, 0.3 to 1) as a function of Total sample size (x-axis, 20 to 200).]
t tests - Means: Difference between two independent means (two groups)
Tail(s) = Two, α err prob = 0.05, Allocation ratio N2/N1 = 1, Effect size d = 1
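The shape of these two power curves can be sketched with a normal approximation to the t-test. This is only an approximation (it slightly overstates power at small N) and is not how the plotted curves were produced. The function name is hypothetical, and equal allocation (N2/N1 = 1) is assumed as on the slides.

```python
from statistics import NormalDist

def approx_power(d, total_n, alpha=0.05):
    """Normal-approximation power of a 2-tailed two-group comparison."""
    nd = NormalDist()
    n = total_n / 2                       # per-group N under equal allocation
    shift = d * (n / 2) ** 0.5            # expected value of the test statistic
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

for total_n in range(20, 201, 20):
    print(total_n,
          round(approx_power(0.5, total_n), 2),
          round(approx_power(1.0, total_n), 2))
```

Power rises with total sample size for both effect sizes, and the d = 1 curve approaches 1 far sooner than the d = 0.5 curve.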
Power is reduced by:
• Measurement error
- This will tend to ‘muddy’ the outcome scores, making the tx effect harder to distinguish; i.e., it increases the SD of both the CON & TX groups, reducing the ES.
• Intent-to-treat analysis
- Reduces power if subjects drop out because they see no progress.
- Subjects rarely drop out because they get cured early, but if they did, then completer-analysis would reduce power.
• Low disease severity
- Less room for improvement
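The measurement-error point can be made concrete with a small calculation. The sketch below assumes classical measurement error, meaning independent noise added to each true score so that the variances add. That model, the function name and the numbers are illustrative assumptions, not something stated on the slide.

```python
def observed_es(mean_diff, true_sd, error_sd):
    """Effect size after adding independent measurement noise to the scores."""
    # Variances add when the noise is independent of the true score
    observed_sd = (true_sd ** 2 + error_sd ** 2) ** 0.5
    return mean_diff / observed_sd

# Hypothetical case: mean difference 3, true SD 3 (ES = 1.0 with no noise)
for error_sd in (0.0, 1.0, 2.0, 3.0):
    print(error_sd, round(observed_es(3.0, 3.0, error_sd), 2))
```

The mean difference is unchanged, but the ES falls as the noise grows, so a larger sample is needed to reach the same power.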
If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect.
Why?
Because your power to detect an effect might have been low.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149-1160.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register
Equivalence & Non-inferiority trials
How do you show that a new treatment is not inferior to a standard treatment?
Quality of the evidence base
There should be several 2-arm trials of ‘old’ tx vs. placebo, in order to get a range of effect sizes and response rates (% who improve).
Ideally, there is at least one 3-arm double-blind placebo-controlled comparison (‘old’ tx vs. new tx vs. placebo).
New Tx Beats Placebo
• Effect size vs. placebo is clinically significant.
• Mean difference on the primary outcome is statistically significant.
• Response rate (% responders) is higher than placebo [how much higher is determined by prior studies]
New Tx not substantially worse than established tx
• New tx mean on primary outcome is closer to the established tx mean than to the placebo mean.
• New tx is not significantly different from the established tx.
• New tx responder rate is not much lower than that of the established tx. [should be just within the range seen in prior studies]
• Lower bound of the 95% confidence interval for primary outcome falls above ∆.
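The confidence-interval criterion can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure. It reads the primary outcome as the new tx's improvement over placebo on a symptom score (placebo mean minus new tx mean) and uses a normal-approximation CI. All numbers and function names are hypothetical.

```python
from statistics import NormalDist

def ci_for_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Normal-approximation confidence interval for mean_a - mean_b."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean_a - mean_b
    se = (sd_a ** 2 / n_a + sd_b ** 2 / n_b) ** 0.5
    return diff - z * se, diff + z * se

def is_non_inferior(lower_bound, delta):
    """Non-inferiority is claimed only if the whole CI lies above delta."""
    return lower_bound > delta

DELTA = 9  # hypothetical margin on the mean difference vs. placebo

# Hypothetical trial A: placebo mean 24, new tx mean 13, SD 6, 100 per arm
lower_a, upper_a = ci_for_difference(24.0, 6.0, 100, 13.0, 6.0, 100)
# Hypothetical trial B: same, but new tx mean 14
lower_b, upper_b = ci_for_difference(24.0, 6.0, 100, 14.0, 6.0, 100)

print(round(lower_a, 2), is_non_inferior(lower_a, DELTA))
print(round(lower_b, 2), is_non_inferior(lower_b, DELTA))
```

In trial A the lower bound sits above ∆, so the criterion is met. In trial B the observed difference of 10 still exceeds ∆, but the lower bound does not, so non-inferiority cannot be claimed.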
How to select ∆
• It is lower than the range of outcome differences seen in prior studies of established tx vs. placebo.
• The smallest value that could be considered a clinically meaningful effect (vs. placebo).
• The mean difference that corresponds to an x% difference in responder rates. [x is determined by prior studies of the established tx vs. placebo]
Other Criteria
• Dosing & duration of each tx are within the range of known efficacy.
• No confounds (despite randomization).
• Sample size provides adequate power to detect a clinically significant difference.
• Subjects have moderate disease severity.
• ‘Per protocol’ set of subjects may be best (most conservative).
Other Criteria
• Tx compliance should be similar in both groups.
• Low measurement error. If the outcome is an interview or rating, there is careful training & inter-rater reliability testing. If using a survey, the test is psychometrically sound.

These threats all create bias in favor of finding equivalence, unlike a superiority trial, where they bias the study against finding an effect.
Summary
• Evidence base adequate (for established tx).
• New Tx beats placebo.
• New Tx not substantially worse than established tx.
• Study design features do not bias the results toward equivalence.
Hypothetical Example

Previous studies

| Study | Mean diff | Mean placebo sx score | Mean Est. Tx sx score | Placebo SD | ES | Placebo % responder | Est. tx % responder |
|---|---|---|---|---|---|---|---|
| 1 | 10 | 20 | 10 | 6.0 | 1.7 | 25% | 40% |
| 2 | 12 | 22 | 10 | 5.8 | 2.1 | 20% | 48% |
| 3 | 14 | 30 | 16 | 6.0 | 2.3 | 15% | 49% |
| 4 | 16 | 25 | 9 | 6.4 | 2.5 | 22% | 60% |
| avg | 13 | 24.25 | 11.25 | 6.05 | 2.14 | 20.5% | 49.3% |

[Figure: Effect size x % responders. Scatter plot of % Treatment Responders (y-axis, 0.3 to 0.65) against Effect size (x-axis, 1.2 to 2.6), with fitted line f(x) = 0.209287575839711 x + 0.0441554948606184.]

Possible values for ∆

| Mean Difference v. placebo | Mean sx score | ES vs. placebo | ES vs. Est. tx | Estimated New tx % responder |
|---|---|---|---|---|
| 8 | 16 | 1.33 | -1.00 | 32.3% |
| 9 | 15 | 1.50 | -0.80 | 35.8% |
| 10 | 14 | 1.67 | -0.60 | 39.3% |

If ∆ = 9, the lower bound of the 95% CI for the new tx primary outcome score must be < 15 in order to claim non-inferiority. This is equivalent to a 36% responder rate and ES of 1.5 vs. placebo and -0.8 vs. established tx.
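The fitted line and the estimated responder rates for each candidate ∆ can be reproduced from the four previous studies. The sketch below is a plain least-squares fit of the established tx responder rate on effect size. It assumes the ∆ rows use a rounded placebo mean of 24 and a placebo SD of 6, which is my reading of the numbers rather than something the slide states. The helper names are hypothetical.

```python
# (mean diff vs. placebo, placebo SD, est. tx responder rate) per previous study
studies = [(10, 6.0, 0.40), (12, 5.8, 0.48), (14, 6.0, 0.49), (16, 6.4, 0.60)]

es = [diff / sd for diff, sd, _ in studies]   # unrounded effect sizes
resp = [rate for _, _, rate in studies]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(es, resp)
print(round(slope, 4), round(intercept, 4))

# Assumed rounded values behind the "Possible values for delta" rows
PLACEBO_MEAN, PLACEBO_SD = 24, 6

def estimated_responder_rate(delta):
    """Predicted responder rate for a new tx that beats placebo by delta."""
    return intercept + slope * (delta / PLACEBO_SD)

for delta in (8, 9, 10):
    print(delta, PLACEBO_MEAN - delta, round(delta / PLACEBO_SD, 2),
          round(100 * estimated_responder_rate(delta), 1))
```

The slope and intercept agree with the plotted trend line to about four decimal places, and the predicted rates for ∆ = 8, 9 and 10 match the 32.3%, 35.8% and 39.3% listed above.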
Hypothetical Results
A) New tx mean = 16: New tx is not inferior to Est. tx
B) New tx mean = 17: New tx is inferior to Est. tx