

Simple benchmark for planning and evaluating complex dose finding designs

Ken Cheung, Columbia Biostatistics

Cheung (2014). Biometrics 70, 389–397

Agenda

•  Dose Finding Trials
   –  General background
   –  Example: a phase 1/2 Eff-Tox design

•  Dose Finding Benchmark
   –  Applications: design diagnostic (method comparison; sample size calculation)
   –  Discussion


Dose Finding Trials

•  Phase I and phase I/II
•  Not parallel randomized
•  Small-group-sequential: adapt after every small cohort (e.g., 3)
•  General design and analysis strategy:
   –  Observe a few
   –  Estimate a “good” dose (model-based, myopic or not)
   –  Treat at the good dose, and observe

Dose Finding Trials

Challenge in planning: Complexity
•  Programming is assumed correct without theoretical guidance
•  Pathological properties may not be detected by simulation
•  Difficult for another statistician to reproduce, and to review the plausibility of the simulation results


Some generality and notation

•  A pre-specified set of test levels {1, …, K}
•  Multinomial outcome Y:
   –  Yi(k) = outcome for patient i at dose level k
   –  Takes values in L+1 possible values {w0, w1, …, wL}
   –  Tail distribution πl(k) = Pr{Y(k) ≥ wl} for l = 1, …, L
•  Objective: estimate the target dose d(π) in {1, …, K}
•  Example 1: Phase I trial with binary toxicity Y = 0, 1
   –  π1(k) denotes the toxicity probability at dose k
   –  d(π) = arg mink |π1(k) − p| for some target p (see the sketch below)
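A tiny sketch of Example 1's target-dose rule, in Python; the toxicity curve and target rate below are hypothetical, not taken from the slides.

```python
# Example 1 sketch: target dose for binary toxicity, d(pi) = arg min_k |pi_1(k) - p|.
# The toxicity curve pi1 and target p are hypothetical.

def target_dose_binary(pi1, p):
    """Return the 1-based dose level k minimizing |pi_1(k) - p|."""
    return min(range(len(pi1)), key=lambda k: abs(pi1[k] - p)) + 1

pi1 = [0.05, 0.10, 0.25, 0.40, 0.55]   # assumed toxicity probabilities for K = 5 doses
p = 0.25                               # assumed target toxicity rate
print(target_dose_binary(pi1, p))      # -> 3
```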

Example 2: Thrombolytic agent for acute stroke

•  Phase 1/2 study
•  Trinary outcome (efficacy-toxicity):
   –  Intracranial hemorrhage (Toxicity; Y = 2)
   –  Reperfusion without hemorrhage (Response; Y = 1)
   –  Neither (Y = 0)
•  Thall and Cook (2004):
   –  Define desirability δ(πE, πT) as a function of response rate πE and toxicity rate πT
   –  Aim to find a dose that maximizes δ(πE, πT): dTC(π) = arg maxk δk (see the sketch below)
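To make the objective concrete, here is a minimal Python sketch of desirability-based selection. The linear trade-off δ = πE − w·πT is only a stand-in for the Thall and Cook (2004) contour-based desirability, and the rates and weight are hypothetical.

```python
# Example 2 sketch: d_TC(pi) = arg max_k delta_k for a desirability function.
# A simple linear trade-off stands in for the Thall-Cook contour-based
# desirability; pi_e, pi_t, and the weight w are hypothetical.

def target_dose_efftox(pi_e, pi_t, w=1.5):
    """Return the 1-based dose maximizing delta(pi_E, pi_T) = pi_E - w * pi_T."""
    delta = [e - w * t for e, t in zip(pi_e, pi_t)]
    return max(range(len(delta)), key=lambda k: delta[k]) + 1

pi_e = [0.20, 0.35, 0.50, 0.60, 0.65]  # assumed response (reperfusion) rates
pi_t = [0.02, 0.05, 0.10, 0.20, 0.40]  # assumed toxicity (hemorrhage) rates
print(target_dose_efftox(pi_e, pi_t))  # -> 3 under these assumed curves
```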


Example 2: Thrombolytic agent for acute stroke

[Figure: Thall and Cook (2004), K = 5 dose levels]

Example 2: Thrombolytic agent for acute stroke

Thall and Cook (2004):
•  Outcome-adaptive
•  Bayesian, model-based dose finding method
   –  Assign patients to the dose with maximum desirability based on interim data, subject to acceptability criteria
   –  Consider two dose-response-toxicity models: proportional odds (PO) and continuation ratio (CR)
•  Use simulation at the planning stage to compare the two models


Simulation results: Which model to use?

Scenario 3 (PO and CR rows: % of simulated trials selecting each dose)

                Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability     -0.48    -0.13     0.22     0.32    -0.26
PO ✔                 0        0       20       72        7
CR                   0        2       32       49       16

Scenario 4 (PO and CR rows: % of simulated trials selecting each dose)

                Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability      0.12     0.29     0.45     0.58     0.69
PO                   0        2       10       34       54
CR ✔                 0        0        1        5       94

Which model to use?

•  Motivation:
   –  Numerical performance from simulation can be difficult to interpret without a benchmark
•  Proposal:
   –  Dose finding benchmark design


Dose Finding Benchmark

•  Goal: a theoretical dose finding design that provides an upper limit of accuracy for any dose finding method, for a given design objective under a given scenario
•  Definition:
   –  Recall d(π) is the target dose (estimand)
   –  Benchmark design: d(π*), where π* is a nonparametric optimal estimate of π based on the complete outcome profile


Complete outcome profile: Example 1

•  In an actual trial, we observe a partial outcome profile, e.g., a patient at dose 3 with toxicity
•  In computer simulation, we can observe a complete profile by generating a uniform tolerance
•  The nonparametric optimal estimate π*(k) is the proportion of toxicities at dose k in a simulated trial (see the sketch after the illustration below)


Partial outcome profile (toxicity observed at dose 3):
   Dose 1: ?   Dose 2: ?   Dose 3: Toxicity   Dose 4: Toxicity   Dose 5: Toxicity

Complete outcome profile (revealed by the simulated tolerance):
   Dose 1: No toxicity   Dose 2: Toxicity   Dose 3: Toxicity   Dose 4: Toxicity   Dose 5: Toxicity
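A minimal sketch (Python) of the Example 1 construction: one uniform tolerance per simulated patient yields a complete toxicity profile across all doses, and π*(k) is the proportion of toxicities at dose k. The toxicity curve is a hypothetical, monotone example.

```python
import random

# Example 1 sketch: complete toxicity profiles from uniform tolerances.
# pi1 holds a hypothetical, monotone toxicity curve for K = 5 doses.
pi1 = [0.05, 0.10, 0.25, 0.40, 0.55]

def complete_profile(u, pi1):
    """A patient with tolerance u has toxicity at dose k iff u <= pi_1(k)."""
    return [int(u <= pk) for pk in pi1]

def pi_star(n, pi1, rng=random.random):
    """Nonparametric optimal estimate: proportion of toxicities at each dose."""
    profiles = [complete_profile(rng(), pi1) for _ in range(n)]
    return [sum(col) / n for col in zip(*profiles)]

print(pi_star(20, pi1))   # one simulated trial of n = 20 complete profiles
```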

Complete outcome profile: General (incl. Example 2)

•  Ordinal outcome Y: takes values in L+1 possible values {w0, w1, …, wL} with tail distribution πl(k) at dose k
•  Yi(k) = outcome for patient i at dose level k
•  In simulation, randomly draw a tolerance profile: Ui1, Ui2, …, UiL iid Uniform(0, 1)
•  Generate the complete outcome profile Yi(k) for patient i at dose level k as follows:
   –  Yi(k) = wl if Ui,l+1 > rl+1(k) and Uij ≤ rj(k) for all j = 1, …, l
   –  where rj(k) = πj(k) / πj−1(k) and π0(k) = 1
•  Nonparametric optimal estimate: πl*(k) = average of I{Yi(k) ≥ wl} over patients i (see the sketch below)
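The sketch below (Python) follows the generation rule above for a general ordinal outcome, forms the nonparametric optimal estimate πl*(k), and applies the benchmark selection d(π*). The dose-outcome curve and sample size in the usage lines are hypothetical.

```python
import random

# General sketch: complete ordinal outcome profiles and the benchmark d(pi*).
# pi[k][l-1] = pi_l(k) = Pr{Y(k) >= w_l}; r_j(k) = pi_j(k) / pi_{j-1}(k), pi_0(k) = 1.

def complete_outcome(u, pi_k):
    """Return l such that Y = w_l, given tolerances u = (U_1, ..., U_L)."""
    prev, l = 1.0, 0
    for j, pj in enumerate(pi_k):      # index j corresponds to level j + 1
        r = pj / prev                  # r_{j+1}(k)
        if u[j] > r:                   # first exceeded ratio stops the outcome at w_l
            break
        l, prev = l + 1, pj
    return l

def benchmark_pi_star(pi, n, rng=random.random):
    """Simulate n complete profiles; return pi*_l(k) = mean of I{Y_i(k) >= w_l}."""
    K, L = len(pi), len(pi[0])
    counts = [[0] * L for _ in range(K)]
    for _ in range(n):
        u = [rng() for _ in range(L)]  # one tolerance profile per patient, shared across doses
        for k in range(K):
            for l in range(complete_outcome(u, pi[k])):
                counts[k][l] += 1      # event {Y_i(k) >= w_{l+1}}
    return [[c / n for c in row] for row in counts]

# Usage (Example 1, L = 1): hypothetical toxicity curve, target p = 0.25, n = 20 patients
pi = [[0.05], [0.10], [0.25], [0.40], [0.55]]
ps = benchmark_pi_star(pi, 20)
d_star = min(range(len(pi)), key=lambda k: abs(ps[k][0] - 0.25)) + 1   # benchmark d(pi*)
print(ps, d_star)
```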


Thall and Cook (2004), revisited

Scenario 3 (PO, CR, and d(π*) rows: % of simulated trials selecting each dose)

                Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability     -0.48    -0.13     0.22     0.32    -0.26
PO ✔                 0        0       20       72        7
CR                   0        2       32       49       16
d(π*)                0        0       13       85        1

Scenario 4 (PO, CR, and d(π*) rows: % of simulated trials selecting each dose)

                Dose 1   Dose 2   Dose 3   Dose 4   Dose 5
Desirability      0.12     0.29     0.45     0.58     0.69
PO                   0        2       10       34       54
CR ✔                 0        0        1        5       94
d(π*)                0        0        0        5       95

Thall and Cook (2004), revisited

Benchmark as “effect size”

•  The benchmark d(π*) performs better in S4 than in S3, suggesting S4 is an “easier” scenario than S3; analogous to a large effect size in a hypothesis test
•  Eff-Tox with the proportional odds model is idiosyncratic in that it does comparatively poorly in an easy scenario (S4)
•  The continuation ratio model wins in this example


Benchmark for Method Comparison


[Figure: Thall and Cook (2004) Eff-Tox. Eff-Tox method accuracy (logit scale) plotted against benchmark accuracy (logit scale) for the proportional odds and continuation ratio models, with a least squares fit on the logit scale.]

[Figure: Multiple toxicity example. Method accuracy (logit scale) for Method A and Method B plotted against benchmark accuracy (logit scale), with a least squares fit on the logit scale.]
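The plot axes suggest the comparison summary: for each scenario, transform benchmark accuracy and method accuracy to the logit scale and summarize with a least squares line. A minimal sketch of that fit in Python, with hypothetical accuracy values standing in for the plotted points:

```python
import math

# Sketch: least squares fit of method accuracy on benchmark accuracy,
# both on the logit scale, across scenarios. The accuracy values are hypothetical.

def logit(p):
    return math.log(p / (1 - p))

def ls_fit(x, y):
    """Ordinary least squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

bench  = [0.50, 0.73, 0.88, 0.95]   # hypothetical benchmark accuracies (one per scenario)
method = [0.35, 0.60, 0.80, 0.90]   # hypothetical accuracies of one design
print(ls_fit([logit(p) for p in bench], [logit(p) for p in method]))
```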

Benchmark for “Power” Calculation

Cheung (2013): Sample size formulae for CRM

Summary & Discussion

•  The proposed benchmark is applicable to general early phase dose finding settings:
   –  Discrete test levels, including combination therapy
   –  Multinomial outcome (multiple toxicities; bivariate; etc.)
•  Applications: effect size; method comparison; power calculation
•  Features of a good benchmark:
   –  Easy and quick to compute (not error prone)
   –  Nonparametric: does not favor one model over another
   –  Upper bound of accuracy for parametric methods
