Ken Cheung, Columbia Biostatistics
TRANSCRIPT
Simple benchmark for planning and evaluating complex dose finding designs
Ken Cheung Columbia Biostatistics
Cheung (2014). Biometrics 70, 389–397
Agenda
• Dose Finding Trials
  – General background
  – Example: a phase 1/2, Eff-Tox design
• Dose Finding Benchmark
  – Applications: design diagnostic (method comparison; sample size calculation)
  – Discussion
Dose Finding Trials
• Phase I and phase I/II
• Not parallel randomized
• Small-group-sequential: adapt after every small cohort (e.g. 3)
• General design and analysis strategy:
  – Observe a few patients
  – Estimate a “good” dose (model-based, myopic or not)
  – Treat at the good dose, and observe
Dose Finding Trials
Challenge in planning: complexity
• The programming must be assumed correct, without theoretical guidance
• Pathological properties may not be detected by simulation
• Difficult for another statistician to reproduce, and to review the plausibility of the simulation results
Some generality and notation
• A pre-specified set of test levels {1, …, K}
• Multinomial outcome Y:
  – Yi(k) = outcome for patient i at dose level k
  – Takes values on L+1 possible values {w0, w1, …, wL}
  – Tail distribution πl(k) = Pr{Y(k) ≥ wl} for l = 1, …, L
• Objective: estimate the target dose d(π) in {1, …, K}
• Example 1: phase I trial with binary toxicity Y = 0, 1
  – π1(k) denotes the toxicity probability at dose k
  – d(π) = arg mink |π1(k) − p| for some target p
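The Example 1 objective is simple enough to state in code; a minimal sketch, assuming a hypothetical toxicity curve and target rate p = 0.25:

```python
# Hypothetical toxicity probabilities pi_1(k) for K = 5 dose levels,
# with target toxicity rate p = 0.25 (both are illustrative values).
pi1 = [0.05, 0.12, 0.22, 0.35, 0.55]
p = 0.25

# d(pi) = arg min_k |pi_1(k) - p|; dose levels are 1-indexed
d = min(range(len(pi1)), key=lambda k: abs(pi1[k] - p)) + 1  # dose 3
```

With these values, dose 3 (toxicity 0.22) is closest to the 25% target.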
Example 2: Thrombolytic agent for acute stroke
• Phase 1/2 study
• Trinary outcome (efficacy-toxicity):
  – Intracranial hemorrhage (Toxicity; Y = 2)
  – Reperfusion without hemorrhage (Response; Y = 1)
  – Neither (Y = 0)
• Thall and Cook (2004):
  – Define desirability δ(πE, πT) as a function of response rate πE and toxicity rate πT
  – Aim to find a dose that maximizes δ(πE, πT): dTC(π) = arg maxk δk
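The maximization can be sketched as follows. The linear trade-off used here is an illustrative stand-in for the Thall-Cook desirability contour, not their actual definition, and the efficacy/toxicity curves are hypothetical:

```python
# Hypothetical efficacy and toxicity curves over K = 5 doses. The
# linear trade-off delta = piE - lam * piT is an illustrative stand-in
# for the Thall-Cook desirability delta(piE, piT).
piE = [0.10, 0.25, 0.40, 0.55, 0.65]
piT = [0.02, 0.05, 0.12, 0.25, 0.45]
lam = 1.5

delta = [e - lam * t for e, t in zip(piE, piT)]
d_TC = max(range(5), key=lambda k: delta[k]) + 1  # arg max_k delta_k
```

Here higher doses gain efficacy but are penalized for toxicity, so the maximizer sits at an intermediate dose.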
Example 2: Thrombolytic agent for acute stroke
Thall and Cook (2004):
• Outcome-adaptive, Bayesian, model-based dose finding method
  – Assign patients to the dose with maximum desirability based on interim data, subject to acceptability criteria
  – Consider two dose-response-toxicity models: proportional odds (PO) and continuation ratio (CR)
• Use simulation at planning to compare models
Simulation results: Which model to use?
Scenario 3
Model          Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability    -0.48    -0.13     0.22     0.32    -0.26
PO ✔                0        0       20       72        7
CR                  0        2       32       49       16

Scenario 4
Model          Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability     0.12     0.29     0.45     0.58     0.69
PO                  0        2       10       34       54
CR ✔                0        0        1        5       94

(Entries: % of simulated trials selecting each dose; ✔ marks the better-performing model in each scenario.)
Which model to use?
• Motivation: numerical performance from simulation can be difficult to interpret without a benchmark
• Proposal: dose finding benchmark design
Dose Finding Benchmark
• Goal: a theoretical dose finding design that provides an upper limit of accuracy for any dose finding method, for a given design objective under a given scenario
• Definition:
  – Recall d(π) is the target dose (estimand)
  – Benchmark design: d(π*), where π* is a nonparametric optimal estimate of π based on the complete outcome profile
Complete outcome profile: Example 1
• In an actual trial, we observe a partial outcome profile, e.g., a patient at dose 3 with toxicity
• In computer simulation, we can observe a complete profile by generating a uniform tolerance for each patient
• The nonparametric optimal estimate π*(k) is the proportion of toxicity at dose k in a simulated trial
Partial profile (observed in the trial):
Dose 1        Dose 2        Dose 3     Dose 4     Dose 5
?             ?             Toxicity   Toxicity   Toxicity

Complete profile (available in simulation):
Dose 1        Dose 2        Dose 3     Dose 4     Dose 5
No toxicity   Toxicity      Toxicity   Toxicity   Toxicity
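The Example 1 benchmark can be sketched in a few lines; the toxicity curve, target rate, and sample size below are hypothetical:

```python
import random

# Benchmark simulation for Example 1 (binary toxicity). The true
# toxicity curve pi1, target rate p, and sample size n are hypothetical.
pi1 = [0.05, 0.12, 0.22, 0.35, 0.55]
p, n, K = 0.25, 20, 5
random.seed(1)

# One uniform tolerance per patient gives a complete outcome profile:
# patient i has toxicity at dose k whenever U_i <= pi1[k], so a patient
# with toxicity at dose 3 also has toxicity at doses 4 and 5.
U = [random.random() for _ in range(n)]
pi_star = [sum(u <= pi1[k] for u in U) / n for k in range(K)]

# Benchmark selection: d(pi*) = dose whose estimated rate is closest to p
d_star = min(range(K), key=lambda k: abs(pi_star[k] - p)) + 1
```

Because the same tolerance U_i is compared against every dose, each simulated patient's profile is automatically monotone in dose.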
Complete outcome profile: General (inc. Example 2)
• Ordinal outcome Y: takes values on L+1 possible values {w0, w1, …, wL} with tail distribution π(k) at dose k
• Yi(k) = outcome for patient i at dose level k
• In simulation, randomly draw a tolerance profile: Ui1, Ui2, …, UiL iid Uniform(0, 1)
• Generate the complete outcome profile Yi(k) for patient i at dose level k as follows:
  – Yi(k) = wl if Ui,j ≤ rj(k) for all j = 1, …, l and Ui,l+1 > rl+1(k)
  – where rj(k) = πj(k) / πj−1(k), with π0(k) = 1
• Nonparametric optimal estimate: πl*(k) = average of I{Yi(k) ≥ wl} over patients i
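The tail-ratio construction above can be sketched as follows. The trinary scenario (K = 3 doses, L = 2, as in a response/toxicity outcome) and all probabilities are hypothetical:

```python
import random

# General complete-profile construction via tail ratios
# r_j(k) = pi_j(k) / pi_{j-1}(k), with pi_0(k) = 1.
K, L, n = 3, 2, 1000
# pi[k][j-1] = Pr{Y(k) >= w_j}; each row is nonincreasing in j.
pi = [[0.30, 0.05],
      [0.50, 0.12],
      [0.70, 0.25]]

def outcome_level(pi_k, U):
    """Return l such that Y(k) = w_l for one patient's tolerances U."""
    level, pi_prev = 0, 1.0
    for j in range(len(pi_k)):
        r = pi_k[j] / pi_prev        # r_{j+1}(k)
        if U[j] <= r:                # tolerance passed: climb to w_{j+1}
            level, pi_prev = j + 1, pi_k[j]
        else:                        # U_{i,l+1} > r_{l+1}(k): stop
            break
    return level

random.seed(0)
# One tolerance profile per patient, reused at every dose level
profiles = []
for _ in range(n):
    U = [random.random() for _ in range(L)]
    profiles.append([outcome_level(pi[k], U) for k in range(K)])

# Nonparametric optimal estimate: pi*_l(k) = mean of I{Y_i(k) >= w_l}
pi_star = [[sum(prof[k] >= l for prof in profiles) / n
            for l in range(1, L + 1)] for k in range(K)]
```

Since Pr{Y(k) ≥ w1} = r1(k) and Pr{Y(k) ≥ w2} = r1(k)·r2(k) = π2(k), the empirical tail frequencies in pi_star recover the scenario probabilities as n grows.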
Thall and Cook (2004), revisit
Scenario 3
Model          Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability    -0.48    -0.13     0.22     0.32    -0.26
PO ✔                0        0       20       72        7
CR                  0        2       32       49       16
d(π*)               0        0       13       85        1

Scenario 4
Model          Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability     0.12     0.29     0.45     0.58     0.69
PO                  0        2       10       34       54
CR ✔                0        0        1        5       94
d(π*)               0        0        0        5       95
Benchmark as “effect size”
• The benchmark d(π*) performs better in S4 than in S3, suggesting S4 is an “easier” scenario than S3; analogous to a large effect size in hypothesis testing
• Eff-Tox using the proportional odds model is idiosyncratic in that it does comparatively poorly in an easy scenario (S4)
• The continuation ratio model wins in this example
Benchmark for Method Comparison
[Figure: Thall and Cook (2004) Eff-Tox. Eff-Tox method accuracy vs. benchmark accuracy, both on the logit scale, for the proportional odds and continuation ratio models, with a least squares fit.]
[Figure: Multiple toxicity. Method accuracy vs. benchmark accuracy, both on the logit scale, for Method A and Method B, with a least squares fit.]
Summary & Discussion
• The proposed benchmark is applicable to general early phase dose finding settings:
  – Discrete test levels, including combination therapy
  – Multinomial outcome (multiple toxicities; bivariate; etc.)
• Applications: effect size; method comparison; power calculation
• Features of a good benchmark:
  – Easy and quick to compute (not error prone)
  – Nonparametric: does not favor one model over another
  – Upper bound of accuracy for parametric methods