1
Controlling False Positive Rate Due to Multiple Analyses
Unstratified vs. Stratified Logrank Test
Peiling Yang, Gang Chen, George Y.H. Chi
DBI/OB/OPaSS/CDER/FDA
The views expressed in this talk are those of the authors and do not necessarily represent those of the Food and Drug Administration.
2
Motivation: Example of Drug X
Primary endpoint: Survival
Hypothesis: overall constant H.R. ≤ 1 vs. H.R. > 1
Primary analysis: unstratified logrank
Results        Observed statistic   P-value (1-sided)
Unstratified   1.762                0.039
Stratified     2.228                0.013
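Assuming the reported p-values are upper-tail standard normal probabilities of the observed statistics (matching the one-sided alternative), the table can be reproduced with Python's standard library. The helper name `one_sided_p` is hypothetical. Read at face value against the one-sided 0.025 level used in the Drug X example, only the stratified p-value falls below the threshold, which is exactly the tension the question raises.

```python
from statistics import NormalDist


def one_sided_p(z: float) -> float:
    """Upper-tail p-value P(Z > z) for a standard normal Z."""
    return 1.0 - NormalDist().cdf(z)


# Observed logrank statistics for Drug X, taken from the table above
for label, z in [("Unstratified", 1.762), ("Stratified", 2.228)]:
    print(f"{label}: statistic = {z:.3f}, one-sided p = {one_sided_p(z):.3f}")
```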
Q: Is this finding statistically significant?
3
Issues to Explore
• Implication of these tests/analyses.
• Eligibility of efficacy claim based on these tests/analyses.
• Practicability of multiple testing/analyses.
4
Outline
• Notations / Settings
• Introduction to logrank test
– Unstratified, stratified
• Comparisons
– Hypotheses, test statistic, test procedure, inference
• Practicability of hypothesis testing
• Multiple testing/analyses
• Example of Drug X
• Summary
5
Settings / Notations
• 2 arms (control: $j = 1$; experimental: $j = 2$).
• $K$ strata: $k = 1, \dots, K$
• Patients randomized within strata
• $t_1 < t_2 < \dots < t_D$: distinct death times
• $d_{ijk}$: # of deaths & $Y_{ijk}$: # of patients at risk at death time $t_i$, in the $j$th arm & $k$th stratum.
6
Settings / Notations
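$d$: # of deaths at time $t_i$; $Y$: # of patients at risk at time $t_i$
$$\text{In stratum } k: \quad d_{i\cdot k} = \sum_{j=1}^{2} d_{ijk}, \qquad Y_{i\cdot k} = \sum_{j=1}^{2} Y_{ijk}$$
$$\text{In arm } j: \quad d_{ij\cdot} = \sum_{k=1}^{K} d_{ijk}, \qquad Y_{ij\cdot} = \sum_{k=1}^{K} Y_{ijk}$$
$$\text{Total:} \quad d_{i\cdot\cdot} = \sum_{j=1}^{2} d_{ij\cdot}, \qquad Y_{i\cdot\cdot} = \sum_{j=1}^{2} Y_{ij\cdot}$$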
7
Settings / Notations
• Hazard ratio (ctrl./exper.), constant over time:
– Across strata: $c$
– Within stratum $k$: $c_k$
• Non-informative censoring
8
Introduction: Unstratified Logrank
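$$H_0^u: c \le 1 \quad \text{vs.} \quad H_1^u: c > 1$$
Test statistic:
$$W_u = \frac{d_{\cdot 1\cdot} - E_u[d_{\cdot 1\cdot}]}{\sqrt{\mathrm{VAR}_u[d_{\cdot 1\cdot}]}},$$
where $d_{\cdot 1\cdot} = \sum_i d_{i1\cdot}$ (total deaths in the control arm) and
$$E_u[d_{\cdot 1\cdot}] = \sum_i d_{i\cdot\cdot}\,\frac{Y_{i1\cdot}}{Y_{i\cdot\cdot}}, \qquad \mathrm{VAR}_u[d_{\cdot 1\cdot}] = \sum_i \frac{Y_{i1\cdot}\,Y_{i2\cdot}\,d_{i\cdot\cdot}\,(Y_{i\cdot\cdot} - d_{i\cdot\cdot})}{Y_{i\cdot\cdot}^{2}\,(Y_{i\cdot\cdot} - 1)}.$$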
9
Introduction: Unstratified Logrank
• $W_u \sim N(0, 1)$ (asymptotically) under the least favorable parameter configuration ($c = 1$) in $H_0^u$.
• Reject $H_0^u$ if $W_u > z_\alpha$.
• Type I error rate is controlled at level $\alpha$.
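A minimal sketch of the unstratified statistic, assuming the counts have already been pooled over strata at each distinct death time. The function name and the tuple layout `(d1, d2, y1, y2)` are illustrative assumptions, not part of the talk; the data are hypothetical.

```python
from math import sqrt


def unstratified_logrank(rows):
    """W_u from pooled counts at each distinct death time t_i.

    rows: iterable of (d1, d2, y1, y2) = (d_{i1.}, d_{i2.}, Y_{i1.}, Y_{i2.}),
    the deaths and numbers at risk in control (j=1) and experimental (j=2),
    already summed over strata.
    """
    observed = expected = variance = 0.0
    for d1, d2, y1, y2 in rows:
        d = d1 + d2  # d_{i..}
        y = y1 + y2  # Y_{i..}
        observed += d1  # accumulates d_{.1.}
        expected += d * y1 / y  # hypergeometric mean
        if y > 1:  # with one patient at risk the term is 0/0; it contributes 0
            variance += y1 * y2 * d * (y - d) / (y * y * (y - 1))
    return (observed - expected) / sqrt(variance)


# Hypothetical pooled counts at three death times (illustration only)
rows = [(2, 1, 10, 10), (1, 0, 8, 9), (1, 1, 7, 9)]
print(f"W_u = {unstratified_logrank(rows):.3f}")
```

Swapping the two arms flips the sign of $W_u$, which makes a convenient sanity check; large positive values point toward $c > 1$ because control deaths then exceed their expectation.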
10
Introduction: Stratified Logrank
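$$H_0^s: c_k \le 1 \text{ for all } k \quad \text{vs.} \quad H_1^s: c_k > 1 \text{ for at least one } k$$
Test statistic:
$$W_s = \frac{d_{\cdot 1\cdot} - E_s[d_{\cdot 1\cdot}]}{\sqrt{\mathrm{VAR}_s[d_{\cdot 1\cdot}]}},$$
where
$$E_s[d_{\cdot 1\cdot}] = \sum_{k=1}^{K} \sum_i d_{i\cdot k}\,\frac{Y_{i1k}}{Y_{i\cdot k}}, \qquad \mathrm{VAR}_s[d_{\cdot 1\cdot}] = \sum_{k=1}^{K} \sum_i \frac{Y_{i1k}\,Y_{i2k}\,d_{i\cdot k}\,(Y_{i\cdot k} - d_{i\cdot k})}{Y_{i\cdot k}^{2}\,(Y_{i\cdot k} - 1)}.$$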
11
Introduction: Stratified Logrank
• $W_s \sim N(0, 1)$ (asymptotically) under the least favorable parameter configuration ($c_k = 1$ for all $k$) in $H_0^s$.
• Reject $H_0^s$ if $W_s > z_\alpha$.
• Type I error rate is controlled at level $\alpha$.
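The stratified version differs only in where the expectation and variance are computed: within each stratum, then summed over strata. A minimal sketch, assuming per-stratum counts are supplied as a mapping from a stratum label to rows `(d1, d2, y1, y2)`; the function name, layout and data are hypothetical.

```python
from math import sqrt


def stratified_logrank(strata):
    """W_s from per-stratum counts.

    strata: dict mapping stratum k -> list of (d1, d2, y1, y2) =
    (d_{i1k}, d_{i2k}, Y_{i1k}, Y_{i2k}) at each death time t_i in stratum k.
    """
    observed = expected = variance = 0.0
    for rows in strata.values():
        for d1, d2, y1, y2 in rows:
            d = d1 + d2  # d_{i.k}
            y = y1 + y2  # Y_{i.k}
            observed += d1  # summing d_{i1k} over i and k gives d_{.1.}
            expected += d * y1 / y  # within-stratum hypergeometric mean
            if y > 1:  # 0/0 term when one patient is at risk; contributes 0
                variance += y1 * y2 * d * (y - d) / (y * y * (y - 1))
    return (observed - expected) / sqrt(variance)


# Hypothetical counts in two strata (illustration only)
strata = {
    "A": [(1, 0, 5, 1), (1, 1, 4, 1)],
    "B": [(0, 1, 1, 5), (1, 0, 1, 4)],
}
print(f"W_s = {stratified_logrank(strata):.3f}")
```

With a single stratum the result coincides with the unstratified statistic; the two diverge when strata with different arm mixes share death times, because pooling then changes each hypergeometric term.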
12
Comparison of Hypotheses
• Different hypothesis formulations:
– Unstratified: $H_0^u: c \le 1$ vs. $H_1^u: c > 1$
– Stratified: $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$.
13
Comparison of Test Statistics
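• Corr($W_u$, $W_s$) = 1 because, with the expectations and variances held fixed, both are increasing affine functions of the same random variable $d_{\cdot 1\cdot}$.
• $W_s = a W_u + b$, where
$$a = \sqrt{\frac{\mathrm{VAR}_u[d_{\cdot 1\cdot}]}{\mathrm{VAR}_s[d_{\cdot 1\cdot}]}} \quad\text{and}\quad b = \frac{E_u[d_{\cdot 1\cdot}] - E_s[d_{\cdot 1\cdot}]}{\sqrt{\mathrm{VAR}_s[d_{\cdot 1\cdot}]}}.$$
• $W_u \sim N(0, 1) \Rightarrow W_s \sim N(b, a^2)$.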
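The affine link can be checked numerically from the four summary quantities alone. A minimal sketch, assuming $E_u$, $\mathrm{VAR}_u$, $E_s$, $\mathrm{VAR}_s$ and the observed $d_{\cdot 1\cdot}$ are already available; the function name `affine_link` and all numbers below are hypothetical.

```python
from math import sqrt


def affine_link(e_u, var_u, e_s, var_s):
    """Return (a, b) such that W_s = a * W_u + b.

    e_u, var_u: E_u[d_{.1.}], VAR_u[d_{.1.}]  (unstratified)
    e_s, var_s: E_s[d_{.1.}], VAR_s[d_{.1.}]  (stratified)
    """
    a = sqrt(var_u / var_s)
    b = (e_u - e_s) / sqrt(var_s)
    return a, b


# Hypothetical summary values, for illustration only
d_obs = 40.0                 # observed control-arm deaths d_{.1.}
e_u, var_u = 33.0, 15.0
e_s, var_s = 31.5, 14.0

w_u = (d_obs - e_u) / sqrt(var_u)
w_s = (d_obs - e_s) / sqrt(var_s)
a, b = affine_link(e_u, var_u, e_s, var_s)
print(f"W_s = {w_s:.4f}, a*W_u + b = {a * w_u + b:.4f}")
```

The identity holds algebraically for any inputs, so the two printed values agree up to floating-point rounding.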
14
Comparison of Test Procedure
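To test $H_0^u: c \le 1$ vs. $H_1^u: c > 1$:
– Use $W_u$ and reject $H_0^u$ if $W_u > z_\alpha$.
– If $W_s$ is used instead, the adjusted critical value $a z_\alpha + b$ is required for a valid level-$\alpha$ test (since $W_s \sim N(b, a^2)$ at $c = 1$).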
15
Comparison of Test Procedure
To test $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$:
– Use $W_s$ and reject $H_0^s$ if $W_s > z_\alpha$.
– If $W_u$ is used instead, the adjusted critical value $(z_\alpha - b)/a$ is required for a valid level-$\alpha$ test (since $W_u = (W_s - b)/a$).
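Both adjusted critical values follow from the affine link: $W_s > a z_\alpha + b$ holds exactly when $W_u > z_\alpha$, and $W_u > (z_\alpha - b)/a$ holds exactly when $W_s > z_\alpha$ (using $a > 0$). The Monte Carlo sketch below treats $a$ and $b$ as fixed known constants, a simplifying assumption since in practice they are computed from the observed risk sets, and uses hypothetical values $a = 1.05$, $b = 0.3$ to contrast the unadjusted and adjusted rejection rates of $W_s$ for $H_0^u$. All function names are illustrative.

```python
import random
from statistics import NormalDist


def crit_ws_for_h0u(alpha, a, b):
    """Critical value for W_s when testing H_0^u (W_s ~ N(b, a^2) at c = 1)."""
    return a * NormalDist().inv_cdf(1 - alpha) + b


def crit_wu_for_h0s(alpha, a, b):
    """Critical value for W_u when testing H_0^s (W_u = (W_s - b) / a, W_s ~ N(0, 1))."""
    return (NormalDist().inv_cdf(1 - alpha) - b) / a


def simulated_size(alpha, a, b, n=200_000, seed=1):
    """Monte Carlo rejection rates of W_s for H_0^u: (unadjusted, adjusted)."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    crit = crit_ws_for_h0u(alpha, a, b)
    unadjusted = adjusted = 0
    for _ in range(n):
        w_u = rng.gauss(0.0, 1.0)  # W_u ~ N(0, 1) at c = 1
        w_s = a * w_u + b          # affine link, a and b treated as fixed
        unadjusted += w_s > z_alpha
        adjusted += w_s > crit
    return unadjusted / n, adjusted / n


unadj, adj = simulated_size(alpha=0.025, a=1.05, b=0.3)  # hypothetical a, b
print(f"unadjusted rate: {unadj:.4f}, adjusted rate: {adj:.4f}")
```

The adjusted rate should land near $\alpha$, while the unadjusted rate exceeds $\alpha$ whenever $b > (1 - a) z_\alpha$.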
16
Comparison of Inference
• Rejection of $H_0^u$:
– Infer an overall positive treatment effect in the entire population.
• Rejection of $H_0^s$:
– Can only infer a positive treatment effect in "at least one stratum".
– Further testing is required to identify those strata before making a claim, and the error rate for identifying the wrong strata also needs to be controlled.
17
Practicability of Hypotheses Testing
• Unstratified hypotheses are tested when the goal is to infer an overall positive treatment effect in the entire population.
• Stratified hypotheses are tested when the goal is to infer a positive treatment effect in certain strata.
• Multiple testing of both unstratified & stratified hypotheses is acceptable when it is unclear whether the treatment is effective in the entire population or only in certain strata (but both nulls need to be prespecified in the protocol).
18
Multiple Testing/Analyses
• Multiple testing of unstratified (using $W_u$) & stratified (using $W_s$) hypotheses.
• Error to control: strong familywise error (SFE), including the following:
– When $c \le 1$ & all $c_k \le 1$: falsely inferring $c > 1$ or some $c_k$'s $> 1$.
– When $c \le 1$ & some $c_k$'s $> 1$: falsely inferring $c > 1$ or the wrong $c_k$'s $> 1$.
Note: the parameter space "all $c_k \le 1$ but $c > 1$" is impossible.
19
Multiple Testing/Analyses
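[Figure: parameter space partitioned into "$c \le 1$ & all $c_k \le 1$", "$c > 1$ & at least one $c_k > 1$", "$c \le 1$ & at least one $c_k > 1$", and an "impossible space"; annotated with "FE", "Which $c_k > 1$?", and "Nested FE".]
Property of SFE: one familywise error (FE) can be nested in another FE.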
20
Example -- Drug X
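• $W_s = a W_u + b$, where $a = 1.039$, $b = 0.409$.
• The critical value for $W_s$ should be adjusted to $a z_\alpha + b$.
• False positive error rate using $W_s$ without adjustment = 0.066;
– Inflation = 0.066 - 0.025 = 0.041.
• Answer: this finding is not statistically significant ($W_u = 1.762 < z_{0.025} \approx 1.960$; likewise, $W_s = 2.228 < a z_{0.025} + b \approx 2.445$).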
Logrank test         Observed statistic   P-value (1-sided)
Unstratified $W_u$   1.762                0.039
Stratified $W_s$     2.228                0.013 (for $H_0^s$)
Hypotheses tested: $H_0^u: c \le 1$ vs. $H_1^u: c > 1$
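A minimal sketch of the decision for Drug X, using the rounded $a$ and $b$ from the slide and taking $\alpha = 0.025$, the one-sided level implied by the inflation calculation. Neither $W_u$ against $z_\alpha$ nor $W_s$ against the adjusted critical value $a z_\alpha + b$ leads to rejection of $H_0^u$.

```python
from statistics import NormalDist

alpha = 0.025            # one-sided level implied by "0.066 - 0.025" on the slide
a, b = 1.039, 0.409      # rounded Drug X values from the slide
w_u, w_s = 1.762, 2.228  # observed logrank statistics

z_alpha = NormalDist().inv_cdf(1 - alpha)
adjusted_crit = a * z_alpha + b  # critical value for W_s when testing H_0^u

print(f"z_alpha = {z_alpha:.3f}; W_u = {w_u} exceeds it: {w_u > z_alpha}")
print(f"a*z_alpha + b = {adjusted_crit:.3f}; W_s = {w_s} exceeds it: {w_s > adjusted_crit}")
```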
21
Figure 1: False positive rate vs. desired level (w/o adjustment)
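Assuming Figure 1 plots $P(W_s > z_\alpha)$ under $W_s \sim N(b, a^2)$ against the desired level $\alpha$, and assuming it uses the Drug X values of $a$ and $b$ (the slide does not say which values were used), the curve can be tabulated as below. The function name `unadjusted_fpr` is hypothetical. With the rounded $a = 1.039$ and $b = 0.409$, the value at $\alpha = 0.025$ comes out near 0.068, close to but not exactly the 0.066 quoted for Drug X.

```python
from statistics import NormalDist


def unadjusted_fpr(alpha, a, b):
    """P(W_s > z_alpha) when W_s ~ N(b, a^2): W_s used without adjustment for H_0^u."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist(mu=b, sigma=a).cdf(z_alpha)


# Assumption: the figure uses the Drug X values a = 1.039, b = 0.409
for alpha in (0.005, 0.01, 0.025, 0.05, 0.1):
    rate = unadjusted_fpr(alpha, 1.039, 0.409)
    print(f"alpha = {alpha:.3f}  false positive rate = {rate:.3f}  inflation = {rate - alpha:.3f}")
```

When $a = 1$ and $b = 0$ the two statistics coincide and the rate equals $\alpha$, so the gap above the diagonal is driven entirely by $a$ and $b$.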
22
Summary
• Hypotheses (unstratified, stratified, or both)
– should reflect what is intended to be claimed;
– need to be prespecified in the protocol.
• If the stratified null is rejected, further testing is required to identify the strata in which the treatment effect is positive.
• The strong familywise error rate needs to be controlled regardless of whether single or multiple testing is performed.