statistical methods in clinical trials

Statistical Methods in Clinical Trials

Ziad Taib

Biostatistics

AstraZeneca

February 22, 2012

http://www.astrazeneca.se/

Types of Data

ContinuousBlood pressureTime to event

Ordered CategoricalPain level

DiscreteNo of relapses

Categoricalsex

quantitative qualitative

Types of data analysis (Inference)

ParametricVs

Non parametric

FrequentistVs

Bayesian

Model basedVs

Data driven

Steps in data analysis and presentation of results

• Data cleaning.

• Descriptive analyses.

• Statistical Inference.

• Reporting and presenting the results.

Continuous Data

Part I Classical InferencePart II Bayesian Inference

Part IClassical Inference

1. One sample

2. Paired data

3. Two samples

4. Many samples» One way anova» Two way anova» Ancova

One sample

From a population of patients we draw a sample of size n of patients and measure some response on each of these. We want to test whether the average response in this population is equal to (or is larger or smaller than) some predefined value μ0 (e.g. the corresponding value in the healthy population).

Estimation• Assume is a sample (i.i.d) of

size n from some distribution F with mean μ and standard deviation σ. Then

and

• Are (i) unbiased and (ii) consistent estimators of μ and σ² i.e.

(i)

(ii) (LLN)

Estimation in general

• The maximum likelihood method

• The moment method

• Uniformly Minimum Variance Unbiased estimators

• Least squares

• etc

Hypothesis testing

We can e.g. test the null hypothesis

using the test statistic

when n is large, T follows, under the null hypothesis, the standard normal distribution (CLT).

against

Hypothesis testing

For moderate values of n assuming

follow a normal distribution, then

follows, under the null hypothesis, the t (student) distribution with (n-1) degrees of freedom.

The true distribution among the patients(exponential with mean 0.5.)

The distribution of the average whenn is large

Confidence intervals• A (1-α)100% confidence interval for μ has

the form

(i)

(ii)

when n is large

when n moderate but sample normally distributed.

A hypothesis (or a confidence interval) can be one sided or two sided

95% confidence interval

1-α=0.95

The normaldistribution

The tdistribution

1-α=0.95

Correspondence TheoremHypothesis testing and confidence intervals

are two sides of the same coin!

Parameter value according to null hypothesis not included in

confidence interval (α)

Reject null hypothesis (α)

Paired data

• Sometimes we measure a response at baseline and at the end of the treatment. A suitable analysis can be to consider the endpoint changes from baseline.

• and to use the (one sample) test statistic

Two independent samples

From two populations/distributions we draw two samples of sizes n1 and n2 of patients and measure some response on each of these. We want to test whether the average responses, μ1 and μ2, in these two populations are equal. The population variances are denoted by σ1² and σ2² respectively.

H )null(0 H )alternative(A

1 2 1 2

1 2

1 2

1 2

1 2

Two samples (continued)

• When both sample sizes are large we can use the central limit theorem according to which the test statistic below follows the standard normal distribution

• When the sample sizes are moderate but normal with equal variances we can use the t-statistic (having n1+n2-2 d.f.) and the pooled variance.

• The remaining case can be handled using a method known as Satterwaite’s confidence intervals.

Many samplesOne way analysis of variance

• Here we want to compare several (k) groups (treatments) with respect to the average response in each group. The following model is assumed

Anova1

Anova1

• Main point: individuals only differ with respect to one criterion: group belonging.

• Intuitively: Looking at the pictures we see that the variation says something about the difference between the cases where the groups are different and when they are similar.

Anova1 continued

• The deviation of an individual observation from the overall mean can be described by

• The null hypothesis and the alternative hypothesis can be formulated as

SST/SST SSA/SSB SSE/SSW

Anova 1 continued

• Manipulating these gives:

MSA

• When the null hypothesis is false we expect the ratio MSA/MSE to be large. To calculate exact significance levels we use the fact that it F-distributed with (k-1,N-k) d.f.

MSE MSA

Anova table

Source of variation

Sum of squares

d.f. Mean

squares

F/p-value

Treatment SSA K-1 MSA F0=MSA/MSE /p

Error SSE N-k MSE

Total SST N-1

• Example: 3 groups with the same mean 20 and the same standard deviation 2. We reject for large values of F0.

Source of variation

Sum of squares

d.f. Mean squares

F0/p-value

Treatment 9.56 2 4.78 1.22/ 0.29

Error 4671.1 1197 3.9

Total 4680.65 1199

F is F-distributed and F0 the value of MSA/MSE

We cannot reject

• Example: 3 groups with means 20, 20 and 20.5 and the same standard deviation 3.

Source of variation

Sum of squares

d.f. Mean squares

F/p-value

Treatment

59.4 2 29.7 3.25/ 0.039

Error 10942.8 1197 9.14

Total 11002.3 1199We can reject

We can reject

Multiple comparisons

• When we reject the null hypothesis we only know that the groups are not equal but some of them might still be. To find out more we have to consider all the pairwise comparisons between the groups. With k groups this gives m=k(k-1)/2 such comparisons. How do we do this and still have a reasonable overall significance level? The simplest way to deal with this is using Bonferroni’s inequality. This implies that when performing m tests if each test is at the 1-α/m level then the tests taken simultaneously will be on the 1- α level. We will deal with problem in detail later on.

Two way anova

• Here we assume that individuals can differ with respect to two factors (two drugs, treatment and centre). The following model is usually suitable for this situation

• As before the deviation of an individual observation from the overall average can be decomposed into terms related to the effects of treatment, center, treatment by center as well as to a pure random component.

Anova2

• We use a similar notation as before

• Tests of treatment effect, center effect or treatment by center interaction can be performed by using the appropriate ratio between mean square values.

Anova2

• As an example assume we want to test if there is a treatment effect:

• This can be tested using the test statistic

• Which has under the null hypothesis an F distribution with (a-1) and ab(n-1) degrees of freedom.

• The various tests are summarized in the following table

Two-way anova

Analysis of covariance

• Here we assume that individuals can differ with respect baseline values. It is sometimes desirable to adjust the model for the endpoint measures Y so baseline values X are taken into account. The following model can then be useful

The analysis uses an F distribution based onMean sums of squares (cf. p.319)

39

Various forms of models and relation between them

LM: Assumptions:

1. independence,

2. normality,

3. constant parameters

GLM: assumption 2) Exponential family

LMM: Assumptions 1) and 3) are modified

GLMM: Assumption 2) Exponential family and assumptions 1) and 3) are modified

Repeated measures: Assumptions 1) and 3) are modified

Longitudinal dataMaximum likelihood

Classical statistics )Observations are random, parameters are unknown constants(

Bayesian statistics, parameters are random

LM - Linear model

GLM - Generalised linear model

LMM - Linear mixed model

GLMM - Generalised linear mixed model

Non-linear models

Mixed models, both random and constant parameters

40

The Linear Mixed-effects Model

• The linear mixed effects model is quite flexible and does not need balance, independence etc. Usually some version of maximum l likelihood is used for the inference

Average evolution Subject specific

Part IIBayesian Inference

What is a probability?

1. the limit of a relative frequencyProblem: Not all events all repeatable!

2. the degree of plausibility Or of beliefProblem: subjective?

P=1/6P=?

Thomas Bayes (1702 to 1761)

Bayes in brief

• Bayes’ conclusions were accepted by Laplace in 1781

• Rediscovered by Condorcet, and remained unchallenged until

• Boole questioned them. • Since then Bayes' techniques have

been subject to controversy.

Is Classical Inference Really Classical?

• Only in the sense that it started a realtively long time ago with Fisher around 1920.

• Bayesian Statistics is even more classical since it was dominating until then.

• Classical inference was introduced as a way to introduce objectivity in the scientific process.

Simplified scientific Process

1. A theory or a hypothesis needs to be tested

2. We perform an experience and obtain some data

3. Are our data in agreement with our theory?

4. If the answer to the above question is no, we reject the theory. Otherwise we cannot reject it!

Classical Paradigmvs

The Bayesian paradigm

• The classical paradigm is based on the consideration of

P[ Data | Theory ] (1)

• How likely is the data if the theory was to be true?

Classical vs Bayesian (cont’d)

• The Bayesian paradigm is based on the consideration of

P[ Theory | Data](2)

• How much support or belief is there in the theory given the data?

Bayes’ Formula

Where

T = Theory

and

D = Data

Simple formula with many interesting implications.

D] P[

] T P[ T] | D P[ D] | T P[

States that

(3)

Implications of Bayes’ Formula

1. (1) and (2) are not equivalent.

2. To work out (2) we have to estimate P[ T ] i.e. we need to put a probability on our belief in the theory we are testing.

3. We cannot make P[ T ] disappear. (Similar to the uncertainty principle in quantum mechanics?)

Is subjective always bad?

In our everyday life we are far from being always objective. Not even when it comes to scientific issues. Nevertheless we are afraid of incorporating subjectivity into the scientific process.

This is the reason why Bayesian inference is perceived as being controversial.

How to deal with subjectivity?

1. Sometimes we do have some apriori information (earlier studies, similar drugs, expert opinions etc).

2. But even if we do not have any prior knowledge, we can study the effect of having various degrees of belief . (D. Spiegelhalter)

3. We can also invert the problem and ask what prior belief in the theory is required to make our data coherent with that theory?

How to deal …

4. The role of the apriori information becomes negligible as data accumulates. It can be shown mathematically that ultimately all investigators will reach the same conclusion regardless of what prior information they had from the beginning. (for a mathematical argument cf. O’hagan, 1994). In the meanwhile the sceptic will just need more time/convincing/data!

A textbook example

• Let X be a sample from N(,) of size n.• Let the apriori distribution (be N(,) which

is the so called conjugate distribution.• Then the aposterior distribution is of the

following form

22

2

22

2

),)1((

)dt | (t)P(

) ( ) | P( ) | (

n

n

n

nxN

t

x

xx

Comments

• There are problems in choosing the apriori distribution.

• The importance of the apriori distribution decreases as the sample size increases.

• Some toxin is associated with certain symptoms. denotes the toxin level in patients having the symptoms and 0 the toxin level in healthy individuals. We want to test if there is a difference. A test can be based on a sample of size n through

Example: p-values

Under the null hypothesis

H0: = 0,

z is an observation from a t-distribution with n-1 degrees of freedom.

If, moreover, the p-value is less than 0.05 it is then customary to consider the result as significant, i.e. we reject the null hypothesis.

We return to this in moment!

Preliminaries

• Consider two Hypothesis H0 and H1

D] P[

] H P[ ]H | D P[ D] | H P[ 00

0

D] P[

] H P[ ]H | D P[ D] | H P[ 11

1

• We consider the odds ratio between the two

]H | D P[

]H | D P[

] H P[

] H P[

D] | H P[

D] | H P[

1

0

1

0

1

0

Odds ratio = prior odds x likelihood ratio

New old Bayes factor

Result

1

0

00 BF] H P[

] H P[-11 D] | H P[

]H | P[D

]H | P[D BF

1

0

where

Back to the t-test

P. M. Lee Bayesian Statistics: An Introduction

2nd Ed. London: Arnold, p. 131 (1997)

shows that in the case of the t-est and under quite

general conditions:

2

2

BFz

e

Consequences

• Assume that there is no agreement on the effect of the toxin so we take P(H0)=0.5. Then:

1

2

z-1

2

z-0

2

2 e1

e

11 D] | P[H

• Assume further that the value of z in the experiment turns out to be 2. Since this leads to a p-value of 0.044 we conclude that the result is significant at the 0.05 level. Setting z=2 in the formula above leads to:

12.0e1 D] | P[H

1

2

2-

0

2

Z=2P

[ H

0| D

]

P[ H0]

P[

H0|

D]

P[ H0]

Conclusions

• By introducing the element of degree of belief about a theory, we arrive at conclusions that do not agree with those obtained using the frequentist approach, i.e. prior knowledge matters

• Prior knowledge is part of the Bayesian approach.

Backup slidesBayesian Analysis in Clinical Trials

Number of Bayes Articles in Medical Journals

Robert A.J. Mathhews (1998) The use and abuse of subjectivity in scientific research – Part 2. Working paper for the European Science and Environment Forum

In short, Bayesian inference provides a coherent, comprehensive and strikingly intuitive alternative to the flawed frequentist methods of statistical inference. It leads to results that are more easily interpreted , more useful, and which more accurately reflect the way science actually proceeds. It is, moreover, unique in its ability to deal explicitly and reliably with the provably ineluctable presence of subjectivity in science.

These features alone should motivate many working scientists to find out more about Bayesian inference in their own research.

What for?

• The greatest virtue of frequentist clinical trials is extreme rigor and focus on the experiment at hand.

• But they tend to become large and expensive and some patients receive inferior experimental therapies.

• The Bayesian approach is gaining terrain and has more and more advocates hoping to address these and other challenges

• The Advantages:– Formal system for incorporating existing information– Natural approach to inference– Generally more efficient– Well suited for decision making

• The Challenges:– Determining appropriate prior probabilities– Computational complexity– Lack of familiarity– Lack of software tools

Bayesian Framework

What is the difference?

Approach Traditional Bayesian

Question How likely are the trial results given there really is no difference among treatments?

How likely is it that there is a true difference among treatments given the trial data?

Approval based on Pivotal trial (Hypothesis testing)

Weight of evidence (revising beliefs in light of new evidence)

Design Single stage Adaptive

In what settings have Bayesian desings been used?

• Pharmacokinetics

• Phase I (CRM)

• Phase II (Thall/Simon)

• Phase III (Spiegelhalter, Parmar, others)

• Meta-analysis

• Medical device clinical trials (guidelines from 2006)

Summary of applications

• Methods for incomplete data

• Adaptive designs (including adaptive sample size)

• Confirmatory trials

• Evaluating a modified version of an approaved product

• Synthesising data in post market surveillance

Recommended Reading

1. Berry DA (2006). Bayesian clinical trials. Nature Reviews: Drug Discovery. 5: 27-36; 2006.

2. Goodman, S.N. Toward Evidence-Based Medical Statistics: The P Value Fallacy. Annals of Internal Medicine. 130:995-1021; 1999.

3. Spiegelhalter DJ, Keith R, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, Ltd. 2004.

4. Winkler, R.L. Why Bayesian Analysis Hasn’t Caught on in Healthcare Decision Making. International Journal of Technology Assessment in Health Care. 17:1, 56-66; 2001.

Questions or Comments?

Karl Popper (1902-1994)

A theory can be called scientific if it can be falsified!

We recognise this pirnciple from the foundation of statistical hypothesis testing .

Falsifiability (or refutability or testability) is the logical possibility that an assertion can be shown false by an observation or a physical experiment.

For example, "all men are mortal" is unfalsifiable, since no amount of observation could ever demonstrate its falsehood. "All men are immortal," by contrast, is falsifiable, by the presentation of just one dead man

A white mute swan, common to Eurasia and North America. Two black swans, native to Australia

http://upload.wikimedia.org/wikipedia/en/b/b6/Mute.swan.slimb.750pix.jpg

http://upload.wikimedia.org/wikipedia/commons/4/41/Black.swans.slimb.750pix.jpg

1

0

00

1

1

00

0

11

0

00

0

11

0

1

00

1

0

1

0

1

0

BF] H P[

] H P[-11D] | H P[

]H | P[D]H | P[D

] H P[

] H P[-11

C

11

C

C1

C1

CD] | H P[

CD] | H P[CD] | H P[

D] | H P[-1C

D] | H P[]H | P[D

]H | P[D

] H P[

] H P[D] | H P[

]H | P[D

]H | P[D

] H P[

] H P[

D] | H P[

D] | H P[

Proof

statistical methods in clinical trials

Documents

sample i

sample test statistic

n moderate

sample sizes

sample of size n of

distribution f

true distribution

confidence interval1