trial design and statistics for beginnerscmpath.ncri.org.uk/wp-content/uploads/2018/02/bliss.pdf ·...

CMPath Training 9 Feb 2018

in partnership with

Trial design and statistics for beginners

Professor Judith Bliss

Director ICR-Clinical Trials & Statistics Unit (ICR-CTSU), The Institute of Cancer

Research

NCRI Cancer CTU Group – Chair of Directors Group

https://www.youtube.com/watch?v=Hz1fyhVOjr4

(courtesy of Sue Hilsenbeck)

What is a clinical trial?

“A clinical trial may be defined as a carefully designed,

prospective medical study which attempts to answer a

precisely defined set of questions with respect to the effects of

a particular treatment or treatments.”

R. Sylvester 1984

Where is the laboratory?

The NHS & cancer centres worldwide

Phases of clinical trials 3

Phase 1: TOXICITY

What is the maximum tolerated dose (MTD)? safety, 3+3 vs. more complex dose escalation procedures eg

continual reassessment methods (CRM), size of expansion cohorts

Phase 2: ACTIVITY

Does it do anyone any good? establishing sufficient evidence of activity to justify phase III, formal

stop/go criteria, single group or randomised

Phase 3 THERAPEUTIC BENEFIT

Is it any better than existing treatment? efficacy comparison with standard of care, robust results with the

potential to change practice, choice of endpoints, risks & benefits

Ultimate goal is to change routine clinical practice & target

treatment towards those patients with the most to gain

Phase I Clinical Trials 4

Dose-finding: to determine Maximum Tolerated Dose (MTD)

Conduct

Common designs include 3+3, rolling six, continual reassessment

method (CRM)

Endpoints

Tolerability, pharmacodynamics, pharmacokinetics, bioavailability

First experiments in humans

For cancer treatment - usually in patients whose standard therapy has

failed i.e. those with late stage advanced disease

Phase II Clinical Trials 5

• To determine if drug has therapeutic activity

• To identify promising new agents / interventions / techniques for further

investigation / potential predictive biomarkers

Conduct

• Historically, single arm studies of 20-80 patients

• Single stage Fleming or A’Hern design

• Two stage - Simon, Gehan designs - allow trial to be terminated at end

of first stage if drug is clearly inactive

Endpoints

• Use quickly-available endpoints

• Usually “tumour response” (pCR or ORR) and/or safety

• Emergence of “biomarker” endpoints (molecular & imaging)

Phase II – Simon two-stage design 6

Identify boundaries of treatment utility:

p0: response level at which the treatment does not warrant further investigation

p1: expected (true) response level where the treatment warrants further

investigation (i.e. development of a phase III)

Stage 1:

Recruit N1 patients

if ≥n1/N1 responses seen,

continue to stage 2

if <n1/N1 responses seen,

Stage 2:

Recruit and additional N2

patients so that N1 + N2 = N

if ≥n/N responses seen,

treatment warrants

further investigation

if <n/N responses seen,

Phase II – Example: CORAL 7

Phase II Clinical Trials 8

Randomised phase II…

Move towards introduction of early randomised comparison against

current standard to assess feasibility + tolerability + activity - complex trial

designs: integrated phase II / phase III

Why not single group studies?

Single group trials prone to selection bias

No real allowance for imprecision in historical estimate of response used

to base phase II sample size on

“Modest” treatment effects may be lost in study bias or imprecision of

historical information

But….can be useful when cancer or subgroup of interest is uncommon or

for proof of principle studies

Phase III Clinical Trials 9

Reliable

• Large enough to give precision to estimate treatment benefit

• Study a group of patients who are broadly representative

• Experimental treatment differs sufficiently from control to have

reasonable chance of influencing chosen endpoint

• Trial may test superiority or non-inferiority of new treatment

• to determine how new treatment compares with existing therapy

Conduct

• Unbiased, reliable, clinically informative, randomised comparison

Endpoints

• Direct measure of patient benefit

• Disease-free survival, Progression-free survival, Overall Survival

• Identify adverse risk vs benefit profile (& cost effectiveness)

• Translational research to identify patients with most/least to gain

Treatment is allocated at random

Why randomise?

Randomisation avoids bias

Each subject has an equal chance

of receiving the experimental or

control treatment, regardless of

clinical / demographic

characteristics.

Treatment groups similar overall -

when sample size large enough to

avoid random errors

Randomisation – to balance for

the factors which cannot be

measured.

Only systematic difference between

groups is the treatment

If outcomes for group A better than B,

likely due to treatment, as two groups

similar

Results = improvement in treatment

Treatment A Treatment B

Common phase III trial designs 11

Parallel groups: “between-

patient” comparisons - each

patient receives 1 intervention

Compare A vs B

Compare A vs B and

C vs D

Interactions

Compare A vs B within

groups of patients

Interaction w/marker

A C B C

Factorial: ≥ 2 treatment

comparisons in same trial

without necessarily increasing

Stratified designs:

1 intervention,

comparison within

subgroup Str

Marker +

Marker - R A

Biomarker designs

Modified marker based design

Marker based strategy design

Marker based

strategy

Non-marker

based strategy

Marker based

strategy

Non-marker

based strategy

Targeted design (enrichment)

Marker +

Marker -

Off study

Marker stratified design (marker-treatment interaction)

fy Marker +

Marker - R A

Biomarker trial - Example

Off study

FOCUS4: Molecularly stratified, multi-centre randomised trial programme

for patients with advanced/metastatic colorectal cancer

14 A randomised trial utilising ctDNA mutation tracking to detect minimal residual disease and trigger intervention in patients with moderate and high risk early stage triple negative breast cancer

c-TRAK TN

Background: • Circulating tumour DNA (ctDNA) is

released when tumour cells die. ctDNA detection in patients who have completed treatment for early BC may be highly predictive for future relapse.

• Pembrolizumab = anti PD-1 immunotherapy. Blocks interaction between PD-1 and it’s ligands which are often overexpressed in TNBC and therefore prevents immune system suppression.

• c-TRAK TN aims to show whether ctDNA tracking can predict relapse, and whether use of pembrolizumab may result in sustained clearance of ctDNA.

Recruitment duration: 18 months Number of centres: 17

Recruitment target:

ctDNA screening: 200 patients (~150 with trackable

mutations)

Randomised: 36-68 (dependant on rate of ctDNA positivity)

What are adaptive trials?

“use accumulating data to decide on how to modify aspects of the study

without undermining the validity or integrity of the trial” (Dragalin 2007)

• Adaptive by design not a license for adhoc changes of trial conduct &

analysis nor a remedy for poor planning

• Platform trials / umbrella protocols / basket trials

• Change trial based on early safety, activity or efficacy data / decision

• Changes could include

Change dose of treatment (phase I)

Change allocation ratio of control:research

Early stopping for benefit

Early stopping for lack-of-(sufficient)-benefit

Adding in new treatments – via randomisation or as additional cohorts

Multi-Arm Multi-Stage designs 16

Figure courtesy of STAMPEDE website

The MAMS design rolls the

phase 2 assessment of the

activity of combination

therapy into the same trial as

the phase 3 assessment of

effectiveness

Thus saving time and

resources

MAMS designs - Example 17

Hormone therapy (HT) aloneA

HT + zoledronic acidB

HT + docetaxelC

HT + celecoxibD

HT + zoledronic acid + docetaxelE

HT + zoledronic acid + celecoxibF

Control

Eligible patient with prostate cancer

Hormone therapy (HT) aloneA

HT + zoledronic acidB

HT + docetaxelC

HT + celecoxibD

HT + zoledronic acid + docetaxelE

HT + zoledronic acid + celecoxibF

Control

Eligible patient with prostate cancer

Figure from STAMPEDE trial website, MRC CTU

STAMPEDE: Systemic Therapy in Advancing or Metastatic

Prostate cancer: Evaluation of Drug Efficacy

Advantages and disadvantages of this design?

Actionable mutation identified in ctDNA screening

Blood sample sent to central laboratory for analysis by digital PCR using

ctDNA assays for hotspot mutations in ESR1, HER2, AKT1 and PIK3CA,

with HER2 copy number assessment

Platform trial: plasmaMATCH trial 18

PIK3CA mutation status

also reported to facilitate

entry into trials outside of

the context of

plasmaMATCH

Cohort A

ESR1 mutation

Treat with extended-

dose fulvestrant

Cohort B

HER2 mutation

Treat with neratinib

(plus fulvestrant in

ER+ BC)

Cohort C

AKT1 mutation in

ER+ BC

Treat with AZD5363

+ fulvestrant

Cohort D

AKT activation basket

mutation identified in ctDNA or

in tumour tissue sequencing

outside of plasmaMATCH

Treat with AZD5363

Future Cohorts

May be added by substantial

amendment in the case of

diagnostic/

therapeutic targets as the trial

progresses

Patients with metastatic or recurrent locally advanced breast cancer

eligible for ctDNA screening within plasmaMATCH are registered for

screening component

Eligible patients enter therapeutic component for treatment with targeted

agent until disease progression

Number of centres ~50 UK Screening Sites, of which ~25 sites will also be Treatment Sites

Recruitment target • ctDNA screening component: ~1000 patients with metastatic or recurrent locally

advanced BC who have received prior systemic treatment in the advanced setting

• Therapeutic component: Cohort A – 40 patients; Cohorts B–D – 16 patients in each

Trial considerations: Types of data 19

Clinical

• Baseline characteristics

• Treatment details

• Adverse events/side effects

• Disease related outcomes

• QoL questionnaires

• Patient diaries

Translational

• Biomarkers using blood

(e.g. BRCA, ctDNA, CTCs)

• Biomarkers using tumour

tissue (e.g. Ki67, HRD)

• Gene expression data

• SNP arrays

• Whole exome sequencing

• PK/PD

• Photographs

• Physics/dosimetry

• Imaging

• Health economics

• Qualitative

Trial considerations: effect size

Superiority

what is the minimum clinically important improvement in efficacy with new

treatment compared with standard treatment?

• e.g. treatment A is at least 6% better than treatment B

Non-inferiority

show that new treatment is not worse than standard by more than pre-

specified, small amount (non-inferiority margin)

• e.g. treatment A is no more than 3% worse than treatment B

Smaller effect size → larger sample size

Trial considerations: Analysis populations - Intention to treat (ITT) As a general rule (phase III), all patients randomised should be analysed by randomly allocated treatment.

• Avoids bias. People who receive non-allocated treatment likely to be a selected subset (e.g. side effects, more severe disease etc.) Ignoring them excludes this type of person from one treatment arm and not the other, which affects the comparison of treatment effect.

• Is more pragmatic. Gives an idea of what would happen if the treatment were to be adopted as standard practice in the real world.

‘What happens to the population on average if new drug becomes policy’ not ‘What happens to the average patient who gets the drug’

Intention to treat analysis - Example 22

• RCT comparing nurse-led vs. conventional medical follow-up for patients

with lung cancer

• Primary outcome measures: patient satisfaction & quality of life

• Followed-up to 1 year after randomisation

• In practice, many patients allocated to nurse-led follow-up were referred

back to medical team for symptom control or further palliative treatment

What would be the result of not analysing these data according to ITT?

Conventional medical follow-up group included severely ill patients who had

been referred from nurse-led group → patients who had only received nurse-

led follow-up would generally be those with better prognosis

Trial considerations: Analysis populations – Per Protocol (PP)

Usually excludes patients who have any major protocol violations and

analysis is by treatment actually received

Often used for safety analyses – better estimate of absolute risks

Often used for non-inferiority trials because

including data for patients who did not receive protocol treatment (ITT)

• tends to bias results toward equivalence

• could make a truly inferior treatment appear non-inferior

But bias! – excluding patients fails to maintain the integrity of randomisation

Non-inferiority trials should be analysed both by ITT & PP

• Want to see the same result in both analyses

Statistical considerations in clinical trials

At the concept/design stage (pre-funding application)

Trial design:

Treatment allocation method – randomisation / blinding

Stratification variables - centre / biomarkers

Protecting against other sources of bias

Endpoints – clinically informative, reliable & valid measurement?

Sample size – study appropriately powered random error

• Power (1-β) = probability of detecting a difference if such a difference truly exists

• Significance level (α) = probability of concluding there is a treatment effect when

no true effect exists

Power = 80%- 90% α = 0.05 (usual)

CMPath Training 9 Feb 2018 Clinical Trials - Lucy Kilburn & Holly Tovey – 12/01/2018

Statistical considerations in clinical trials

Statistical Analysis Plan defines plans and scope for During the running of the trial Trial monitoring • Data quality & completeness Interim analyses (for review by Independent Data Monitoring Committee)

• Review of emerging data - safe & ethical to continue? • Futility assessment Analysis

Analysis of primary endpoint

• maturity of data, ITT or PP populations

Estimate of treatment effect & of precision of estimate

• 95% confidence interval

Subgroups/exploratory or hypothesis generating analyses

• Multiplicity Adjustment

Trial considerations: Null hypothesis 26

• It is simpler to set out to disprove a hypothesis than to prove it

e.g. in a metastatic breast cancer trial of A vs B:

Response rate A = 53% Response rate B = 20%

The null hypothesis is that the treatments are equally effective in the

population of all metastatic breast cancer patients (there is no true

difference in response rates)

The alternative hypothesis is that there is a true difference in response

rates for A & B.

Note: difference could be in either direction; alternative hypothesis is “2-

sided”

• The declaration of a null hypothesis (often a statement that there is no

difference) is the first step in any statistical test of significance

Statistical fundamentals: Significance test 27

• After defining the null hypothesis, the main question is:

If the null hypothesis were true, what are the chances of getting a

difference at least as big as that observed?

e.g. in the breast cancer trial, if there really is no true difference

between the 2 drugs in terms of tumour response, what is the

probability of observing a treatment difference as large (or even

larger than) 53% versus 20%?

• This probability (the p-value) is determined by applying an

appropriate statistical significance test

• There are different significance tests for different types of data, but

the principle is the same

Statistical fundamentals: What is a p-value? 28

• p-value: probability of obtaining data as extreme (or more extreme)

than that observed, if null hypothesis is true

• The smaller the p-value, the less likely the sample results could have

occurred by chance if the null hypothesis were true (greater statistical

significance)

• Statistical tables provide p-values for statistical test values

“Degrees of freedom“ relates

to number of observations,

number of categories being

compared etc (depends on the

Statistical fundamentals: Significant or not significant?

Arbitrary cut-off of p<0.05 often used to indicate statistical significance,

but better to present exact p-values & interpret accordingly

e.g. would you interpret p=0.04 very differently from p=0.06?

Note!!!

“Not significant” does not automatically mean that there is no actual

difference (we can’t prove the null hypothesis), but merely that we have

been unable to show evidence of a difference with certainty

i.e. “No evidence of an effect” is NOT the same as “evidence of no effect”

– this is subtle but important

Reasons for non-significant results include: no true difference in the

population, sample size may be too small, estimates too imprecise, bias

Statistical fundamentals: Statistical versus clinical significance

Size of the p-value depends on observed difference & sample size

• If sample size is small, results may produce a p-value which is not

statistically significant, even if there is actually a large true difference

• If sample size is large, small observed differences (which may be

clinically irrelevant) may achieve statistical significance

• Need to think about what size differences are clinically important in

order to interpret statistical significance results sensibly

e.g. supposing we found a mean difference in age of 5 years between 2

groups of patients

In a small study, this difference might not be statistically significant, but in

a much larger study might be highly statistically significant. So what?!

Need to use clinical judgement to decide whether 5 years is clinically

important (not a statistical decision)

Statistical fundamentals: Confidence intervals & hypothesis testing (1)

Significance tests help us decide whether or not study results are compatible

with a hypothesis

BUT they provide no information on the size of the difference

e.g. in the breast cancer trial, the 33% difference in tumour response rates

was statistically significant with p<0.001

Confidence intervals help us to estimate the size of the difference with some

measure of precision

e.g. 95% CI for the 33% difference in response rates in the breast cancer

trial is:

95% Confidence Interval (20.5% to 45.5%)

i.e. we are 95% confident that real difference between A & B tumour

response is between 20.5% & 45.5%

Statistical fundamentals: Confidence intervals & hypothesis testing (2)

• There is a link between p-values & CIs

• If 95% CI for a difference does not include the null hypothesis value of 0,

p<0.05

• If this CI includes the null hypothesis value, p>0.05

In the e.g., the null hypothesis is that there is no difference between the

tumour response rates in the population (i.e. the null hypothesis value = 0)

Does the 95% CI for the 33% difference in response rates include 0?

No (20.5% to 45.5%), so we can infer that p<0.05

Statistics is …about understanding data

It is NOT just about hypothesis testing and p-values - a statistically significant result may not be clinically important or vice versa

Confidence Intervals (95%) give information on the precision and clinical significance of an observed effect

Subgroup analyses

– open to abuse and mis-interpretation “the more you look the more you find” – adjustment for multiple testing, biological plausibility,

– quantitative vs qualitative treatment interactions - if overall trial no effect, identification of sensitive subgroup implies subgroup where treatment confers harm

And finally…..

Correlation is a measure of association – not of agreement (Bland-Altman methods required for Method Comparison analyses)

Statistics – the fundamentals

trial design and statistics for beginnerscmpath.ncri.org.uk/wp-content/uploads/2018/02/bliss.pdf ·...

Documents

tatung's superiority

barca's superiority

non-inferiority trials.tct-korea 2009.show 2 · increasing...

issues on design of clinical trials · superiority...

bliss - united notions · bliss.pdf 6bliss2.pdf 6 5/4/2010...

superiority of nava

trial objectives superiority, non-inferiority, and...

chapter 11. union advantages numerical superiority better...

the superiority of sayyids

training superiority & training surprise

optical superiority

visualization for decision superiority

ground-up superiority

george therapondos, mb chb, frcpc, mph transplant … ·...

clinical trials - ucla ctsi · clinical trial designs •...

nutritional superiority of organic foods

the superiority of economists

superlative of superiority

mobi-c demonstrated superiority in overall trial success...

trial title: azithromycin versus usual care in ambulatory...