Outline
Preliminaries
- A typical scenario
- Why randomize? Bias; Advantages; Disadvantages
- Blinding
- Primary and secondary questions
- Clinical goals, objectives, hypotheses
- Control groups
Clinical study design
- Parallel
- Crossover
- Fixed sequence rising dose
Allocation procedures
- Fixed probability
- Variable probability
Questions and comments

Upload: robert-bailey

Post on 28-Dec-2015


AN INTRODUCTION TO CLINICAL STATISTICS AT MERCK RESEARCH LABORATORIES
Where are we going?
The role of clinical study design, randomization, and allocation and component ID schedules in the scientific process
Scientific
Questions
Study Results and Conclusions
Which study design?
Resources (time, money, manpower, drug availability)
How to interpret results?
Some prior evidence that MK works
MD wants to make “right” decision
Experiment (clinical trial)
*
What is the primary question?
Most interest; clearly defined; state in advance, do not modify; realistic; all study planners must agree
Example
Condition: Hypertension
Drug: MK
*
From active controls?
To active controls?
Secondary questions
2. Subgroup investigation
*
Example
Goal
To observe the effect of maintenance level oral dosing of MK in patients with hypertension.
Objective
*
Primary Hypothesis
The proportion of hypertensive patients whose supine diastolic blood pressure is reduced to 80 mmHg following 12 weeks of oral dosing with MK is at least 20 percentage points greater than that of Drug A.
Secondary Hypothesis
The proportion of Black hypertensive patients whose supine diastolic blood pressure is reduced to 80 mmHg following 12 weeks of oral dosing with MK is within 10 percentage points of the proportion for non-Black patients.
Example (cont’d)
What study design should I use?
Hypothesis Testing
*
Observed differences in treatments: patients, treatment, or chance?
Examples: Placebo (negative control), vehicle control, active control
Placebo control
Like study therapy (shape, weight, color, taste, odor, packaging, route of administration, dosing regimen)
Placebo effect
*
[Figure: target population (hypertensive patients in general; Drug A; Drug B) with plotted values from -1 to -14 and the “highest” probability values marked. Sample estimates: mean x̄ = -2.4, -10.8, -6.2, -6.6, -6.4, -6.0, -6.0; variance s² = 1.3, 6.7, 0.2, 40.3, 7.3, 2.5, 0.0.]
*
Drug A vs. Drug B
Difference of 5 mmHg in mean change from baseline SuDBP is important.
Examples (mean change from baseline SuDBP, mmHg):
Drug A    Drug B
-15       -10
-5        -10
Clinical Hypothesis: The true mean change in SuDBP following treatment with Drug A IS 5 mmHg different from the true mean change following treatment with Drug B.
Conjecture about truth (parameters)
*
Each patient receives one treatment
Treatments (e.g., drug, dose, formulation) are indicated by the comparison(s) of interest
Crossover
May receive all treatments
*
Disease Condition
Carryover Effects?
Pharmacokinetics: Plasma Concentration vs. Time Curve
*
Design and Randomization Scheme
[Figure: AUC and Cmax plotted against reference lines at 0.80, 1.00, and 1.25. Labeled values: (1.08), (1.03), (0.98); (0.97), (0.86), (0.77); (1.01), (0.92), (0.84).]
Drug Interaction Study
*
[Figure: AUC and Cmax for parent compound and metabolite. Axis ticks: AUC 0 to 1600, Cmax 0 to 400; x-axis values 25, 50, and 100.]
*
Cmax: Active Treatments vs. No Treatment
All Subjects
4 Treatments, 2 Periods
Dose strengths are in mg.
*
[Figure: Supine Diastolic Blood Pressure (mmHg), changes vs. baseline at each time point over 24 hours, for groups P, 50, 100, 150, and A (N=14 to 20). Axis ticks from -400 to 0; reference labels: Pbo, Baseline.]
Design and Randomization Scheme
Factorial Arrangement of Treatments
Each patient simultaneously receives one level of each of two or more treatment factors
Treatment factor (e.g., Patch Type)
Treatment factor level
Treatment: combination of treatment factor levels (e.g., groin + patch B; armpit + patch C)
Number of treatments equals product of numbers of treatment factor levels (e.g., 3 x 3 = 9)
Example Allocation Scheme
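The product rule for a factorial arrangement can be sketched in a few lines. This is a minimal illustration, not the deck's allocation scheme: the slide names only "groin", "armpit", "patch B", and "patch C", so the third site and "patch A" are placeholder labels assumed here to make a 3 x 3 layout.

```python
from itertools import product

# Hypothetical factor levels: only "groin", "armpit", "patch B" and
# "patch C" appear on the slide; "site 3" and "patch A" are placeholders.
application_sites = ["groin", "armpit", "site 3"]
patch_types = ["patch A", "patch B", "patch C"]

# Each treatment is one level of every factor taken together.
treatments = list(product(application_sites, patch_types))

for number, (site, patch) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {site} + {patch}")

# Number of treatments = product of the numbers of factor levels.
print(len(treatments))
```

Adding a further factor, or a level to an existing factor, multiplies the number of treatments, which is why factorial arrangements grow quickly.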
*
Design and Randomization Scheme
50 = 50 mg Drug A; 150 = 150 mg Drug A
Randomization: Within a panel, to either placebo or that panel’s dose. (Sometimes randomization to a panel.)
Design and Randomization Scheme
Example Allocation Scheme
[Figure: supine blood pressure in period baseline values; axis ticks 0, 6, 18, 30, 42 and 1, 7, 14, 21, 28, 35.]
Design and Randomization Scheme with placebo substituted for one of the doses.
Design and Randomization Scheme with placebo inserted between two of the doses.
Randomization: To one sequence of rising dose / placebo treatments within each panel. (Could also randomize subjects to a panel.)
Dosing Day / Period
In concert with
Conscious or subconscious factors
Bias never completely controlled
Ethics
Bias - Allocation of Patients to Treatment
Conscious or unconscious bias in the allocation of patients to new therapy based upon ethics and/or drug favoritism
Situation Compare MK to placebo
Belief MK is effective
Drug Favoritism
Sicker patients receive placebo (“hopeless”)
*
“Beating” the randomization scheme
Tell-tale lab results or AEs
Small block size in randomization scheme
*
Asymptotically (“really large samples”), tends to balance unknown baseline prognostic/concomitant variables among treatments
Supports blinding procedures
Lessens bias in assessment of response
Supports statistical design
Guides statistical analysis
Randomization
Disadvantages
For small or moderate sample sizes, may not balance unknown baseline prognostic/concomitant variables among treatments
Most appropriate randomization scheme may be difficult or costly to administer
Patient recruitment problems
*
Candidates for blinding
Placebo control
Active control
Vaccines
Dummy vaccinations often unacceptable
*
Fixed probabilities
Procedure     Patient:  1  2  3  4  5  6  7  8  9  10
Systematic              M  M  M  M  H  H  H  H  C  C
Simple(a)               M  H  M  C  M  H  C  H  H  M
Blocked                 M  C  M  H  H  C  H  M  M  H
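The three fixed-probability procedures in the table (systematic, simple, blocked) can be sketched as follows. This is an illustrative sketch only: the 2:2:1 ratio of M:H:C and the block size of 5 are assumptions inferred from the counts in the systematic and blocked rows (4 M, 4 H, 2 C over 10 patients), and the function names are hypothetical.

```python
import random

def systematic(counts):
    """Assign treatments in a fixed order: all of one, then the next."""
    return [trt for trt, n in counts.items() for _ in range(n)]

def simple(n_patients, weights, rng):
    """Draw each patient's treatment independently with fixed probabilities."""
    treatments = list(weights)
    return rng.choices(treatments,
                       weights=[weights[t] for t in treatments],
                       k=n_patients)

def blocked(n_blocks, block_ratio, rng):
    """Shuffle within each block so every block holds the exact ratio."""
    schedule = []
    for _ in range(n_blocks):
        block = [trt for trt, n in block_ratio.items() for _ in range(n)]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

rng = random.Random(2015)            # fixed seed so the run is repeatable
ratio = {"M": 2, "H": 2, "C": 1}     # assumed 2:2:1 allocation ratio

print("Systematic:", systematic({"M": 4, "H": 4, "C": 2}))
print("Simple:    ", simple(10, ratio, rng))
print("Blocked:   ", blocked(2, ratio, rng))
```

Only the blocked procedure guarantees the planned ratio after every fifth patient; simple randomization reaches it only on average, which is why the simple row in the table can end up unbalanced.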
*
Treatment assignment bias minimized
*
Constant vs. variable block size
Allocations at end of block may be deterministic
Need to use full blocks of allocation numbers
*
Separate randomization within levels of prognostic factor
May not achieve intended treatment group ratios
In small samples, tends to improve efficiency of estimators and power of statistical tests; less so for large samples
Need to limit number of strata and number of levels in each stratum; focus on most important covariates
Better understanding of disease process
Generate new clinical hypotheses
*
*
Yes
Yes
Yes
No
Treatment assignment depends upon previous patients’ responses
Useful for allocating more (or fewer) patients to treatments achieving desired (not desired) outcomes
Some use in Clinical Pharmacology for panels of subjects
Covariate Adaptive
*
General Reading
Cox, D.R., Planning of Experiments, New York: John Wiley and Sons, 1958.
Friedman, L.M., Furberg, C.D., DeMets, D.L., Fundamentals of Clinical Trials, Littleton, MA.: PSG Publishing Company, 1985.
Hicks, C.R., Fundamental Concepts in the Design of Experiments, Fort Worth: Saunders College Publishing, 1982.
Peterson, R.G., Design and Analysis of Experiments, New York: Marcel Dekker, 1985.
Piantadosi, S., Clinical Trials - A Methodological Perspective, New York: John Wiley and Sons, 1997.
Pocock, S.J., Clinical Trials: A Practical Approach, New York: John Wiley and Sons, 1983.
Rosenberger, W.F., Lachin, J.M., Randomization in Clinical Trials - Theory and Practice, New York: John Wiley and Sons, 2002.
*
*
Experimental planning
Sample size
Blocking Factors and Covariates
Inadequate Sample Sizes and Scientific Inference
*
The Question
*
Hypothesis Testing
Experimental Outcome vs. Truth:
- Power (1 - β)
- False Negative (β)
- False Positive (α)
Caution: Failing to reject the null hypothesis does not mean that you have proven it to be true.
Calculation
where
and
Tst
Appropriate sample sizes: nA, nB
Appropriate power: 1 - β
δCM = Clinically meaningful difference
δMD = Minimally detectable difference
Ideally: δMD ≤ δCM ≤ δ
Example
Mean change from baseline
δCM ≤ δ
Statistically significant, p < α
Accepted power, 1 - β
Variability (estimated), σ² (s²)
Real difference, δ = μA - μB, between Drug A and Drug B
Completing patients with useable data
*
Power: 1 - β = .80
Estimated standard deviation of mean change from baseline: s = 9 mmHg
True difference δ = μA - μB = ±5 mmHg between Drugs A and B in population
Sample size n = 52 (completing) patients per treatment group (with useable data)
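The sample size of 52 per group can be reproduced approximately with a short calculation. The sketch below is not the deck's own formula, which did not survive extraction; it uses the familiar normal-approximation formula plus a small-sample adjustment (adding one quarter of the squared two-sided critical value) that is commonly used to mimic the t-test, and it assumes a two-sided false positive rate of .05, which this slide does not state. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    n_normal = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    # Assumed small-sample adjustment for using t rather than z critical values.
    return ceil(n_normal + z_alpha ** 2 / 4)

# Values from the example: difference 5 mmHg, standard deviation 9 mmHg, power .80.
print(n_per_group(delta=5, sigma=9))  # 52
```

Because n scales with (sigma / delta) squared, halving the difference to be detected roughly quadruples the required sample size.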
Example
*
Sample size calculations
False
Positive
52
86
92
41
*
Statistically significant, p < α
Fixed sample size per treatment, n
Variability (estimated), σ² (s²)
*
*
Sample size calculations
False
Positive
52
20
52
52
*
Statistically significant, p < α
Accepted power, 1 - β
Variability (estimated), σ² (s²)
*
*
Detectable outcome calculations
Sample size per treatment group, n: 52, 20, 52, 52
Detectable difference: 5 mmHg, 8.2 mmHg, 6.5 mmHg, 6.7 mmHg
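The reverse calculation, from a fixed sample size to the difference that can be detected, can be sketched the same way. This is a normal-approximation sketch under assumed inputs (standard deviation 9 mmHg, two-sided false positive rate .05, power .80); the slide's values presumably come from a t-based calculation, so they run slightly higher than what this function returns. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(n, sigma, alpha=0.05, power=0.80):
    """Smallest true difference detectable with the given per-group n."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard error of a difference in means with n patients per group.
    standard_error = sigma * sqrt(2 / n)
    return (z_alpha + z_beta) * standard_error

for n in (52, 20):
    print(n, round(detectable_difference(n, sigma=9), 1))
```

With these assumptions the approximation gives about 4.9 mmHg at n = 52 and about 8.0 mmHg at n = 20, close to the 5 and 8.2 mmHg listed.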
*
Must accommodate and/or take advantage of:
Final ANOVA model degrees-of-freedom (d.f.)
Effectiveness of blocking factors and covariates
Dropouts
Blocking factors
Ineffective: use up d.f. without reducing MSE; can reduce power; may increase completing sample size.
Effective: use up d.f. but reduce SSE (MSE) at a proportionally faster rate; can increase power; may reduce completing sample size.
Covariates
Reduce SSE (MSE) while spending few d.f.; can increase power; may reduce completing sample size.
Carefully consider original source of variance estimate.
*
*
Sample Size Given
In Most Books:
Dropouts, other missing data
*
No true difference (δ = 0) between therapies?
Lack of power, 1 - β?
Underestimation of variance, σ²
Overestimation of true difference (δMD ≥ δCM ≥ δ)
Dropouts, missing data, patient drift, other issues not accounted for
Lack of power
Adequate power
Probably no clinically important true difference between therapies (δMD ≤ δCM ≤ δ)
*
Three Questions of Interest
Sample Size: What sample size, n, per treatment group do I need to estimate the true difference, δ = μA - μB, in mean change from baseline SuDBP to within δP mmHg with 100(1 - α)% confidence?
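For the estimation question, one common normal-approximation answer sets the half-width of the confidence interval for the difference in means equal to the precision target in mmHg and solves for n. The sketch below assumes equal group sizes and a known common standard deviation; the precision target of 3 mmHg is a hypothetical value chosen for illustration, and the function name is not from the deck.

```python
from math import ceil
from statistics import NormalDist

def n_for_precision(half_width, sigma, alpha=0.05):
    """Per-group n so the CI for a difference in means has the given half-width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Half-width = z * sigma * sqrt(2 / n), solved for n.
    return ceil(2 * (z * sigma / half_width) ** 2)

# Hypothetical target: estimate the difference to within 3 mmHg with 95% confidence.
print(n_for_precision(half_width=3, sigma=9))  # 70
```

Unlike the hypothesis-testing calculation, no power term appears here: the sample size is driven only by the desired interval width and confidence level.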
*
*
*
Bioequivalence (and Drug Interaction)
NOT same as usual frequentist testing
H0: μA/μB ≤ 0.80 or μA/μB ≥ 1.25
H1: 0.80 < μA/μB < 1.25
Algebraically equivalent to 90% confidence interval (CI) approach
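The confidence interval form of the decision rule is simple enough to state as code. This is a minimal sketch: the function name is hypothetical, the interval endpoints in the usage lines are made-up values, and computing the 90% CI itself (typically on the log scale) is outside what the slide shows.

```python
def concludes_bioequivalence(ci_lower, ci_upper, bounds=(0.80, 1.25)):
    """True when the whole 90% CI for the ratio of means lies within the bounds."""
    low, high = bounds
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return low <= ci_lower and ci_upper <= high

# Hypothetical 90% confidence intervals for the ratio of means.
print(concludes_bioequivalence(0.92, 1.10))  # True: interval inside [0.80, 1.25]
print(concludes_bioequivalence(0.77, 0.97))  # False: lower limit below 0.80
```

Note that the burden of proof is reversed relative to a usual difference test: a wide interval from a small study fails to show equivalence even when the point estimate is close to 1.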
*
Case 2: μA/μB ≠ 1, 0.80 < μA/μB < 1.25 (equivalence), sample size
Three Questions:
*
Similarity
Probability of Concluding Bioequivalence: Given a fixed total sample size, n, what probability do I have that the 90% CI for the ratio of the true means, μA/μB, is contained in [0.80, 1.25] given that μA/μB = 1 (or some other value specified between 0.80 and 1.25)?
*
Straightforward
Assumptions are everything
Normality:
Two Sample Pooled Variance t-Test
Statistical Assumptions
Original Situation and Calculation:
*
Drug A (μA) vs. Drug B (μB), δCM = δ = μA - μB = ±5 mmHg, σ = σA = σB = 9 mmHg, α = .05, 1 - β = .80, 2-tailed pooled variance t-test: 52 per treatment group (sample size formula).
Situation:  1   2   3   4   5   6   7   8   9   10
n:          10  20  30  40  50  60  55  53  51  52
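To see how power behaves across the n values listed, a normal approximation can stand in for the pooled variance t-test. This is only a sketch under the original situation's inputs (difference 5 mmHg, standard deviation 9 mmHg, two-sided false positive rate .05); it ignores the negligible chance of rejecting in the wrong direction, and because it uses z rather than t critical values it runs slightly optimistic, so the n at which power first reaches .80 may differ by a patient or so from a t-based or simulated result. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, delta=5.0, sigma=9.0, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the true difference is delta.
    noncentrality = delta / (sigma * sqrt(2 / n))
    return NormalDist().cdf(noncentrality - z_alpha)

for n in (10, 20, 30, 40, 50, 60, 55, 53, 51, 52):
    print(n, round(approx_power(n), 3))
```

Power rises steeply at small n and flattens near the target, which is why the later candidate values cluster tightly around the low fifties.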
Sample Size: Virtual Clinical Trials
Allows for evaluation of competing trial designs and their corresponding sample size requirements (power and detectable outcomes, too).
Generate virtual patients with virtual responses according to a proposed trial design taking into account:
Covariate distribution models
Input-output models
Disease progression
PK/PD relationships
Execution models
Multiple hypothesis testing (estimation) within a study
Multiple endpoints
Multiple timepoints
Multiple statistical methods
Number of Tests (k)
When to increase sample size?
Multiple hypothesis tests (estimations) are joined by “At Least” or “Or”
Example: As compared to Placebo, Drug A is effective for at least one of the five endpoints.
*
A single hypothesis test (estimation)
Multiple hypothesis tests (estimations) are joined by “All” or “And”.
Example: As compared to Placebo, all 5 doses of Drug A are effective.
Example: Both Drug A and Drug B are effective as compared to Placebo.
Closed testing procedures where the multiplicity issues can be ordered/ranked.
Example: Dose response studies
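The reason "at least one" claims call for a larger sample size can be illustrated numerically. The sketch below assumes k independent tests, each run at the same false positive rate, which is a simplification (endpoints in one trial are usually correlated); the Bonferroni adjustment shown is one common correction and is not named on the slides. Function names are hypothetical.

```python
def familywise_error_rate(k, alpha=0.05):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(k, alpha=0.05):
    """Per-test false positive rate that keeps the overall rate at or below alpha."""
    return alpha / k

for k in (1, 2, 5, 10):
    print(k, round(familywise_error_rate(k), 3), bonferroni_alpha(k))
```

With five endpoints tested at .05 each, the chance of at least one false positive is roughly 23%; testing each at a stricter level restores control but lowers power per test, and recovering that power is what drives the sample size up.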
*
General Reading
Brush, G.G. (1988). How to Choose the Proper Sample Size, Volume 12, Milwaukee, WI: American Society for Quality Control.
Cohen, J. (1988). Statistical Power Analyses for the Behavioral Sciences, 2nd edition, Hillsdale, NJ: Lawrence Erlbaum Associates.
Parker, R.A. and N.G. Berman (2003). “Sample Size: More Than Calculations”, The American Statistician, 57(3):166-170.
Senn, S. (1997). Statistical Issues in Drug Development, New York, John Wiley and Sons, Chapter 13.
*
Uncertainty and variability
Experimentation
Parameter
Summary
Mean
Variance
Sample statistic estimates population parameter (realizations observed in study)
Summary
Mean
Variance
Standard
Deviation
Example: H0: μA = μB
Reject / fail to reject null hypothesis
Cannot prove true
Alternative hypothesis, H1:
Statement about target population to assume once null hypothesis rejected
Example: H1: μA ≠ μB
*
In truth, no difference between Drug A (μA) and Drug B (μB) in target population, μA = μB
Falsely assume true difference between Drug A (μA) and Drug B (μB)
False positive (Type I error) rate, α
Accepted risk of false positive
Traditionally, α = .05
Sample results fail to reject null hypothesis, H0:
In truth, difference between Drug A (μA) and Drug B (μB) in target population, μA ≠ μB
Falsely assume no true difference between Drug A (μA) and Drug B (μB)
False negative (Type II error) rate, β
Accepted risk of false negative
Traditionally, β = .05, .10, or .20
*
Probability of rejecting null hypothesis, H0:
In truth, difference between Drug A (μA) and Drug B (μB) in target population, μA ≠ μB
Correctly conclude difference between Drug A (μA) and Drug B (μB)
Power = complement of false negative rate, 1 - β
Traditionally, 1 - β = .95, .90, .80
*
If nonnormally distributed populations:
Should not use small (α ≤ .01) Type I error rates
Power
Platykurtic: (less central) less power
Leptokurtic: (more central) more power
*
Two Sample Pooled Variance t-Test
In general, penalties for nonnormality decrease as the sample sizes, nA and nB, increase
Distribution-free methodology
Wilcoxon Rank Sum Test
Does not require normality
Different, but related hypothesis
*
Variance population A ≠ variance population B
Type I error rates not maintained at α
Inflated (too high)
Conservative (too low)
Smaller Sample Sizes
More Moderate Sample Sizes
Larger Sample Sizes
Allocation Schedules and IVRS
Block randomization: procedure whereby randomization occurs within subsets of the total number of allocation numbers (ANs).
Block: range of consecutive ANs guaranteed to contain the exact ratio of treatment regimens specified in the protocol.
Block Size*: number of ANs in a block.
*also called Blocking Factor
1:1 planned trt. regimen ratio
Block Size: 4
Allocation Schedule
Note that each block has a 1:1 ratio of A to B.
AN
00001
00002
00003
00004
00005
00006
00007
00008
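The treatment column of this allocation schedule did not survive extraction, so the sketch below shows how a schedule of the same shape could be generated: a 1:1 ratio, block size 4, and five-digit ANs starting at 00001. The function name, the seed, and the resulting A/B order are illustrative assumptions, not the schedule from the slide.

```python
import random

def block_schedule(n_blocks, block_size=4, treatments=("A", "B"), seed=None):
    """Build (AN, block, treatment) rows with an equal ratio inside every block."""
    if block_size % len(treatments) != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_treatment = block_size // len(treatments)
    schedule = []
    for block in range(1, n_blocks + 1):
        assignments = list(treatments) * per_treatment
        rng.shuffle(assignments)                 # random order within the block
        for trt in assignments:
            an = f"{len(schedule) + 1:05d}"      # consecutive allocation number
            schedule.append((an, block, trt))
    return schedule

for an, block, trt in block_schedule(n_blocks=2, seed=1):
    print(an, block, trt)
```

Whatever the seed, ANs 00001 to 00004 and 00005 to 00008 each contain two A and two B assignments, which is the property the note about the 1:1 ratio in each block describes.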
Why Use Block Randomization?
Ensure (approximate) proper treatment regimen balance at any point in the trial
Allow valid interim analyses while the trial is ongoing
Account for patient drift in long trials (patients who enroll early in the trial often differ from those who enroll late)
Support blinding
each study center
important subgroups, e.g.,
.
*
Why Use Block Randomization?
Note: Technically, all randomized studies are block randomized. If no smaller block size is specified, then the block size = the total number of ANs.
*
How to Choose the Block Size
*
Example
Trt. regimen ratio: 1:1
Allocation Schedule
If block size is too small, it can lead to unblinding
*
How to Choose the Block Size
If block size is too large, it can lead to trt. regimen imbalance.
Example
Trt. regimen ratio: 1:1
Block 1: B B A B B A A A
*
Stratified Randomization
*
that can influence the response
Strengthen the validity of the comparison between trt. regimens
shipping
Stratification factor: Variable potentially related to the outcome of interest.
Two types of stratification factors:
Biological
Race
Clearly, biological factors may be related to a biological outcome.
Why do we often stratify on study center?
Study conduct may differ among centers
Centers often enter and/or leave the study at different times
Factors that are unknown or difficult to measure may differ in different populations
*
Examples
Specific combination of stratification factor levels if there are multiple stratification factors
*
- Diabetic, Non-Diabetic
Medical
Condition
Diabetic
Non-Diabetic
One or more complete blocks of ANs are assigned to each stratum
in each stratum
have the same number of blocks of ANs
Trt. regimen ratio 1:1
Levels Mild, Mod., Sev.
Number of ANs 8
*
*
*
All potential randomizations
Trt. regimen ratio 1:1
Ratio of blocks (Center 1:Center 2) 2:1
Stratification factor 2* Disease Severity
Levels Mild, Mod., Sev.
Number of ANs 24
Center 2: 02001 to 02008
*2 centers x 3 severity categories = 6 strata
There are 4096 possible randomizations!
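The count of possible randomizations follows from the number of distinct orderings within a block raised to the number of blocks. The block size is not recoverable from the text here, so the sketch below only shows that one assumption, 24 ANs split into 12 blocks of size 2 at a 1:1 ratio, reproduces the stated 4096; the function names are hypothetical.

```python
from math import factorial

def arrangements_per_block(counts):
    """Distinct orderings of one block, given per-treatment counts in the block."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def possible_randomizations(n_blocks, counts):
    """Blocks are shuffled independently, so the per-block counts multiply."""
    return arrangements_per_block(counts) ** n_blocks

# Assumed layout: 24 ANs, 1:1 ratio, block size 2 (one A and one B per block).
print(possible_randomizations(n_blocks=12, counts=(1, 1)))  # 4096
```

Under a different assumption, such as blocks of size 4 (two A and two B), each block would have 6 orderings and the total would be much larger, so the figure of 4096 depends on the block size actually used.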
*
The allocation schedule you receive might contain the following randomization
Strat. Factor
*
5,5,10
1
treatment of
4,8,12, ...
1:2:2:1
6,12,18, ...
4
1:3:3:1
8,16,24, ...
Issue                               Parallel    Crossover
Precision (Positive Correlation)    Less        More
Preference Evaluation               No          Yes
Duration,
treatment of hypertension?
[Figure: (variable) dosing period; axis: dosing day / period, ticks 0 to 30.]
Race White Black