bayesian hypothesis testing

Clinical trial monitoring with Bayesian hypothesis testing

John D. Cook, Valen E. Johnson

August 6, 2008

Estimation and Testing

- Bayesians typically approach a clinical trial as an estimation problem, not a test.

- Possible explanation: poor operating characteristics . . .

- Unless you choose your alternative prior well.

Local prior operating characteristics

- Point null hypothesis versus alternative prior that assigns positive probability to the null

- When simulating from the alternative, the Bayes factor in favor of the alternative grows like e^n.

- When simulating from the null, the Bayes factor in favor of the null grows like n^(1/2).

- Hard to ever accept the null: evidence in its favor accumulates too slowly.
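The two growth rates can be seen in a textbook case. The sketch below assumes a model that is not in the slides: observations from N(θ, 1), H0: θ = 0, and a local alternative prior θ ∼ N(0, 1). The function name and the chosen sample means are illustrative only.

```python
import math

def log_bf01_normal(n, xbar, tau2=1.0):
    """Log Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau2),
    given the mean xbar of n observations from N(theta, 1)."""
    # Under H0, xbar ~ N(0, 1/n); under H1, xbar ~ N(0, tau2 + 1/n).
    var0 = 1.0 / n
    var1 = tau2 + 1.0 / n
    return 0.5 * math.log(var1 / var0) - 0.5 * xbar**2 * (1.0 / var0 - 1.0 / var1)

for n in (100, 400, 1600):
    null_evidence = log_bf01_normal(n, xbar=0.0)   # sample mean sitting on the null
    alt_evidence = -log_bf01_normal(n, xbar=0.5)   # sample mean sitting at theta = 0.5
    print(n, round(null_evidence, 2), round(alt_evidence, 2))
```

With the sample mean at the null, the log Bayes factor for the null is (1/2) log(n + 1), so quadrupling n adds only about log 2. With the sample mean at 0.5, the log Bayes factor for the alternative is roughly linear in n.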

Inverse moment priors (iMOM)

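[Figure: iMOM prior plot; horizontal axis from 0.0 to 1.0]

\[
\pi_1(\theta) \propto (\theta - \theta_0)^{-\nu-1} \exp\left(-\lambda (\theta - \theta_0)^{-2k}\right) [\theta > \theta_0]
\]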
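A short sketch of the iMOM kernel and of how λ can be set from a desired mode. Setting the derivative of log π1 to zero gives (θ − θ0)^(2k) = 2kλ/(ν + 1). The function names and the values ν = 1, k = 1 are assumptions for illustration, not taken from the slides.

```python
import math

def imom_unnormalized(theta, theta0, nu, lam, k):
    """Unnormalized one-sided iMOM density; zero for theta <= theta0."""
    x = theta - theta0
    if x <= 0.0:
        return 0.0
    return x ** (-nu - 1.0) * math.exp(-lam * x ** (-2.0 * k))

def lam_for_mode(mode, theta0, nu, k):
    """Value of lambda that places the density's mode at `mode`."""
    x = mode - theta0
    return (nu + 1.0) * x ** (2.0 * k) / (2.0 * k)

theta0, mode, nu, k = 0.2, 0.3, 1.0, 1.0   # nu and k are illustrative choices
lam = lam_for_mode(mode, theta0, nu, k)

# Check the mode numerically on a grid over (theta0, 1).
grid = [theta0 + i * 1e-4 for i in range(1, 8000)]
numeric_mode = max(grid, key=lambda t: imom_unnormalized(t, theta0, nu, lam, k))
print(lam, numeric_mode)
```

The density vanishes, with all its derivatives, as θ approaches θ0, which is what separates it from a local alternative prior.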

iMOM Convergence rates

When simulating from the alternative,

\[
\operatorname*{plim}_{n \to \infty} \; n^{-1} \log BF_n(1 \mid 0) = c > 0.
\]

(Well-known result.)

When simulating from the null,

\[
\operatorname*{plim}_{n \to \infty} \; n^{-k/(k+1)} \log BF_n(1 \mid 0) = c < 0.
\]

(New result.)
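Read heuristically, the null-side rate says how fast evidence for a true null accumulates. Writing BF_n(0|1) = 1/BF_n(1|0), and using c_1, c_2 > 0 as illustrative labels for the constants (they are not named in the slides), the iMOM rates compare with the local-prior rate n^(1/2) as follows:

```latex
\begin{align*}
\text{iMOM, } k = 1: &\quad \log BF_n(0 \mid 1) \approx c_1\, n^{1/2} \\
\text{iMOM, } k = 2: &\quad \log BF_n(0 \mid 1) \approx c_2\, n^{2/3} \\
\text{local prior}: &\quad \log BF_n(0 \mid 1) \approx \tfrac{1}{2} \log n
\end{align*}
```

So under an iMOM prior the Bayes factor for the null grows exponentially in a power of n rather than polynomially in n.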

Thall-Simon method

- Historical standard: θS ∼ Beta(aS, bS). Parameters aS and bS large.

- Experimental treatment: θE ∼ Beta(aE, bE) a priori, aE and bE small.

- Stop for inferiority if P(θE < δ + θS | data) is large.

- Stop for superiority if P(θE > θS | data) is large.

- Operating characteristics degrade without δ > 0.

- Inconsistent in the limit: both stopping rules could apply.
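A minimal Monte Carlo sketch of the two posterior probabilities, using the conjugate Beta update for the experimental arm. The function name, the data values, and the number of draws are illustrative assumptions, and the estimates carry simulation error.

```python
import random

def thall_simon_probs(a_s, b_s, a_e, b_e, responses, n, delta, draws=20000, seed=1):
    """Monte Carlo estimates of the two Thall-Simon posterior probabilities.

    Returns (P(theta_E < delta + theta_S | data), P(theta_E > theta_S | data)).
    """
    rng = random.Random(seed)
    inferior = superior = 0
    for _ in range(draws):
        theta_s = rng.betavariate(a_s, b_s)  # historical standard, not updated
        theta_e = rng.betavariate(a_e + responses, b_e + n - responses)  # conjugate update
        inferior += theta_e < delta + theta_s
        superior += theta_e > theta_s
    return inferior / draws, superior / draws

# Illustrative inputs: Beta(200, 800) standard, Beta(0.6, 1.4) experimental prior, delta = 0.1.
print(thall_simon_probs(200, 800, 0.6, 1.4, responses=12, n=40, delta=0.1))

# Hypothetical very large trial with a 25% response rate, between theta_S and theta_S + delta.
print(thall_simon_probs(200, 800, 0.6, 1.4, responses=2500, n=10000, delta=0.1))
```

The second call illustrates the inconsistency: when the true response rate lies between θS and θS + δ, both probabilities approach 1 as data accumulate, so both stopping rules can be met.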

Thall-Simon plot

[Figure: Beta(60, 140) historical and Beta(12, 18) experimental distributions; horizontal axis from 0.0 to 1.0]

Comparing Bayes factor with Thall-Simon

Historical response 20%, alternative 30%. Fifty patients maximum.

Bayes factor design:

- H0: θ = 0.2

- H1: iMOM prior with mode 0.3.

- Stop for inferiority if P(H0 | data) > 0.9.

- Stop for superiority if P(H1 | data) > 0.9.
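A sketch of how the stopping rule could be evaluated for binomial data by numerical integration. Several ingredients are assumptions rather than details from the slides: ν = 1 and k = 1, equal prior probabilities for H0 and H1, truncation of the iMOM prior to (θ0, 1), the midpoint rule, and all function names.

```python
import math

def imom_kernel(theta, theta0, nu, lam, k):
    """Unnormalized one-sided iMOM density on theta > theta0."""
    x = theta - theta0
    return x ** (-nu - 1.0) * math.exp(-lam * x ** (-2.0 * k)) if x > 0.0 else 0.0

def posterior_prob_h1(responses, n, theta0=0.2, mode=0.3, nu=1.0, k=1.0,
                      prior_h1=0.5, grid=4000):
    """P(H1 | data) for H0: theta = theta0 versus a truncated iMOM alternative."""
    lam = (nu + 1.0) * (mode - theta0) ** (2.0 * k) / (2.0 * k)  # puts the mode at `mode`
    h = (1.0 - theta0) / grid
    norm = marginal = 0.0
    for i in range(grid):  # midpoint rule on (theta0, 1); the step h cancels in the ratio
        theta = theta0 + (i + 0.5) * h
        prior = imom_kernel(theta, theta0, nu, lam, k)
        norm += prior
        marginal += prior * theta ** responses * (1.0 - theta) ** (n - responses)
    m1 = marginal / norm  # marginal likelihood under H1
    m0 = theta0 ** responses * (1.0 - theta0) ** (n - responses)
    bf10 = m1 / m0  # the binomial coefficient cancels
    return prior_h1 * bf10 / (prior_h1 * bf10 + (1.0 - prior_h1))

def decision(responses, n, cutoff=0.9):
    """Apply the two stopping rules after n patients."""
    p1 = posterior_prob_h1(responses, n)
    if p1 > cutoff:
        return "stop for superiority"
    if 1.0 - p1 > cutoff:
        return "stop for inferiority"
    return "continue"

for responses in (2, 8, 14):
    print(responses, decision(responses, 25))
```

Because P(H0 | data) + P(H1 | data) = 1, the two rules cannot both be met for any cutoff above 0.5.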

Comparing Bayes factor with Thall-Simon, cont.

Thall-Simon design:

- θS ∼ Beta(200, 800)

- θE ∼ Beta(0.6, 1.4) a priori

- Stop for inferiority if P(θS > 0.1 + θE | data) > 0.976.

- Stop for superiority if P(θE > θS | data) > 0.99.

Calibrated to match probability of stopping for wrong reason at null and alternative.

Stopping for inferiority

[Figure: probability of concluding inferiority versus true response probability (0.0 to 1.0); curves for Thall-Simon and Bayes factor]

Stopping for superiority

[Figure: probability of concluding superiority; horizontal axis from 0.0 to 1.0; curves for Thall-Simon and Bayes factor]

Thall-Wooten time-to-event method

- Analogous to Thall-Simon method for binary outcomes.

- t | θ ∼ exponential with mean θ, θ ∼ inverse gamma

- Stop for inferiority if P(θS + 0.1 > θE | data) is large . . .

- Stop for superiority if P(θE > θS | data) is large.
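A sketch of the conjugate update behind this method. With exponential event times of mean θ and an Inverse Gamma(a, b) prior, the posterior is Inverse Gamma(a + number of events, b + total time on study). The function names, prior parameters, data, and δ below are illustrative assumptions, and the probabilities are Monte Carlo estimates.

```python
import random

def inverse_gamma_draw(rng, shape, scale):
    """Draw from Inverse Gamma(shape, scale): reciprocal of a Gamma with rate `scale`."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

def thall_wooten_probs(prior_s, prior_e, events, total_time, delta, draws=20000, seed=1):
    """Monte Carlo estimates of P(theta_S + delta > theta_E | data) and
    P(theta_E > theta_S | data) for exponential event times with mean theta."""
    rng = random.Random(seed)
    a_s, b_s = prior_s
    a_e, b_e = prior_e
    # Conjugate update: shape gains the number of events, scale gains total time on study.
    post_a, post_b = a_e + events, b_e + total_time
    inferior = superior = 0
    for _ in range(draws):
        theta_s = inverse_gamma_draw(rng, a_s, b_s)
        theta_e = inverse_gamma_draw(rng, post_a, post_b)
        inferior += theta_s + delta > theta_e
        superior += theta_e > theta_s
    return inferior / draws, superior / draws

# Hypothetical inputs: standard arm centered near 6 months, 20 events over 200 patient-months.
print(thall_wooten_probs((100, 600), (3, 12), events=20, total_time=200, delta=2))
```

Censored patients add their follow-up time to the scale parameter without adding an event to the shape parameter.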

Comparing Bayes factor and Thall-Wooten method

Standard treatment: 6 months progression-free survival (PFS); alternative: 8 months; maximum 50 patients.

Bayes factor design:

- H0: θ = 6

- H1: iMOM prior with mode 8.

- Stop for inferiority if P(H0 | data) > 0.9.

- Stop for superiority if P(H1 | data) > 0.9.

Comparing Bayes factor and Thall-Wooten method, cont.

Thall-Wooten design:

- θS ∼ Inverse Gamma(20, 1200)

- θE ∼ Inverse Gamma(3, 12) a priori

- Stop for inferiority if P(θS + 2 > θE | data) > 0.976.

- Stop for superiority if P(θE > θS | data) > 0.93.

Calibrated to match probability of stopping for wrong reason at null and alternative.

Stopping for inferiority

[Figure: probability of early stopping for inferiority versus true mean survival time (2 to 12); curves for Thall-Wooten and Bayes factor]

Stopping for superiority

[Figure: probability of early stopping for superiority versus true mean survival time (2 to 12); curves for Thall-Wooten and Bayes factor]

Comparison with Simon two-stage design

Simon two-stage design to test null response rate 0.20 versus alternative rate 0.40.

Rejects the treatment 95% of the time under the null and 20% of the time under the alternative.

Maximum of 43 patients: 13 in first stage, 30 in second stage.
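A sketch of how the rejection probability and expected number of patients for such a design can be computed exactly from binomial probabilities. The stage sizes come from the slide. The cutoffs r1 = 3 and r = 12 are an assumption, taken from Simon's published optimal design for these error rates rather than from the slides, and the function names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); zero when x < 0."""
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))

def simon_two_stage(p, r1, n1, r, n):
    """Operating characteristics at true response rate p.

    The treatment is rejected if stage 1 has at most r1 responses among n1
    patients, or if the whole trial has at most r responses among n patients.
    Returns (probability of rejecting the treatment, expected sample size).
    """
    n2 = n - n1
    early_stop = binom_cdf(r1, n1, p)
    late_reject = sum(
        binom_pmf(x1, n1, p) * binom_cdf(r - x1, n2, p)
        for x1 in range(r1 + 1, n1 + 1)
    )
    return early_stop + late_reject, n1 + (1.0 - early_stop) * n2

# r1 = 3 and r = 12 are assumed cutoffs (not stated in the slides).
for p in (0.20, 0.40):
    reject, expected_n = simon_two_stage(p, r1=3, n1=13, r=12, n=43)
    print(p, round(reject, 3), round(expected_n, 1))
```

The expected sample size returned here assumes the trial stops only at the end of stage 1 or at the maximum sample size, with no other interim looks.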

Comparison with Simon two-stage design: rejection probability

[Figure: probability of rejecting treatment; horizontal axis from 0.0 to 1.0; curves for Simon and Bayes factor]

Comparison with Simon two-stage design: patients used

[Figure: number of patients used (0 to 40); horizontal axis from 0.0 to 1.0; curves for Simon, Bayes factor, and Naive Simon]

References

- Valen E. Johnson, John D. Cook. Bayesian Design of Single-Arm Phase II Clinical Trials with Continuous Monitoring. Clinical Trials 2009; 6(3):217-26.

- Software: http://biostatistics.mdanderson.org

- http://www.JohnDCook.com
