542-02-#1
STATISTICS 542
Introduction to Clinical Trials
“What’s the Question?”
Quote by Dr. Max Halperin
NHLBI, National Institutes of Health
542-02-#2
Primary vs. Secondary Question
• Primary
– most important, central question
– ideally, only one
– stated in advance
– basis for design and sample size
• Secondary
– related to primary
– stated in advance
– limited in number
542-02-#3
Examples (1)
• Physicians' Health Study (PHS)
– aspirin vs. placebo
– primary: total mortality
– secondary: fatal + nonfatal myocardial infarction (MI)
542-02-#4
Examples (2)
• Eastern Cooperative Oncology Group (ECOG - 1178)
– tamoxifen vs. placebo
– primary: tumor recurrence/relapse, disease-free survival
– secondary: total mortality
542-02-#5
Examples (3)
• Multicenter Investigation of Limitation of Infarction Size (MILIS)
– propranolol vs. placebo
– primary: ultimate size of an acute myocardial infarction
– secondary: left ventricular ejection fraction
542-02-#6
Examples (4)
• Chronic Study of Intermittent Positive Pressure Breathing (IPPB)
– long-term intermittent positive pressure breathing vs. nebulizer
– primary: forced expiratory volume (FEV1)
– secondary: quality of life
542-02-#7
A-HEFT
• Ref: NEJM, Nov 11, 2004
• 1050 African Americans with Class III-IV CHF
• Isosorbide dinitrate + hydralazine vs. placebo
• Composite outcome (death, HF hospitalizations, change in QoL)
• DMC terminated trial early
542-02-#8
A-HEFT
542-02-#9
A-HEFT
542-02-#10
2. Subgroup Questions
• Questions about effect of therapy in a sub-population of subjects entered into the trial
• Assess internal consistency of results
• Confirm previous hypothesis
• Generate new hypotheses
542-02-#11
Subgroup Analyses
Examples:
Breast Cancer: Does the benefit of treatment depend on menopausal status, stage of disease, age, etc.?
AIDS: Does the benefit of treatment depend on gender, age, initial CD4 count, race, etc.?
Analyzing a trial by subgroup means running a separate statistical test for each subgroup. As a result, the probability of a false positive conclusion arising somewhere in the analysis increases.
542-02-#12
False Positive Rates
The greater the number of subgroups analyzed separately, the larger the probability of making false positive conclusions.
No. of Subgroups    At Least One False Positive
1                   .05
2                   .097
3                   .143
4                   .186
5                   .226
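The table values follow from a simple calculation, sketched below under the assumption that each subgroup is tested independently at the 0.05 level (the slide does not state this, but 1 - 0.95^k reproduces its figures to within about 0.001). The function name is illustrative only.

```python
def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives across k independent tests,
    each run at significance level alpha, when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** k


# Print the probability for 1 through 5 subgroups, as in the table above.
for k in range(1, 6):
    print(k, round(prob_at_least_one_false_positive(k), 4))
```

With correlated subgroups the inflation is smaller than this, so the figures are best read as an upper-end illustration.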
542-02-#13
Example - Subgroup Concern
• Second International Study of Infarct Survival (ISIS-2)
– 2 x 2 factorial design (aspirin vs. placebo and streptokinase vs. placebo)
– vascular and total mortality in patients with an acute myocardial infarction (MI)
– patients born under the astrological signs Gemini or Libra did somewhat worse on aspirin, while all other signs and the overall results showed an impressive and highly significant benefit from aspirin
542-02-#14
Subgroup Considerations
• Rules for Subgroups
1. Stated in advance (in protocol)
2. Limited in number
3. Interpreted cautiously, qualitatively
4. Look for consistency of results
• May be used to
1. Confirm or answer specific questions generated in a previous trial (e.g. metoprolol, age <65 vs. >65, total mortality)
2. Generate new hypotheses to be tested in some future trial
3. Check consistency of primary outcomes
542-02-#15
MERIT-HF Study Design
• Chronic heart failure patients
• Randomized placebo controlled
• Metoprolol vs. placebo
• Two-week placebo run in (compliance)
• Entered 3991 patients
• Terminated early
• Mean follow-up approximately one year
The International Steering Committee on Behalf of the MERIT-HF Study Group,
Am J Cardiol 1997; 80(9B):54J-58J. The MERIT-HF Study Group, ACC, March 1999.
542-02-#16
MERIT Total Mortality
542-02-#17
MERIT
542-02-#18
MERIT (AHJ, 2001)
542-02-#19
Interaction Tests Not Unique
• Model Choice
– Cox
– Logistic
• Test Statistic
– Wald (regression coefficient)
– Score (likelihood)
• Definition of Subgroups
– US vs. World
– All Countries Separately
542-02-#20
Subgroup x Treatment Interaction
• Qualitative Interaction: treatment effect differs in direction between two subgroups
• Quantitative Interaction: treatment effect is in the same direction but of different magnitude
• Statistical tests for interaction are not very powerful
• Even if statistically significant, must be cautious in interpretation (PRAISE)
542-02-#21
PRAISE I (Ref: NEJM, 1996)
• Amlodipine vs. placebo
• NYHA class II-III
• Randomized double-blind
• Mortality/hospitalization outcomes
• Stratified by etiology (ischemic/non-ischemic)
• 1153 patients
542-02-#22
PRAISE I
542-02-#23
PRAISE I - Interaction
• Overall P = 0.07
• Etiology by Trt Interaction P = 0.004
• Ischemic P = NS
• Non-Ischemic P < 0.001
542-02-#24
PRAISE I - Ischemic
542-02-#25
PRAISE I - Non-Ischemic
542-02-#26
PRAISE II
• Repeated non-ischemic stratum
• Amlodipine vs. placebo
• Randomized double-blind
• 1653 patients
• Mortality outcome
• RR 1.0
542-02-#27
Three Views:
• Ignore subgroups and analyze only by treatment groups.
• Plan for subgroup analyses in advance. Do not “mine” data.
• Do subgroup analyses --- However view all results with caution.
542-02-#28
3. Adverse Effects
• Any intervention should do more good than harm
• Not always easy to specify in advance - many variables will be measured (clinical, laboratory)
• Usually not willing or interested in demonstrating an intervention to be harmful
• May be known adverse effects from earlier trials
542-02-#29
Serious Adverse Events (SAEs)
• Death
• Irreversible event
• Requires hospitalization
542-02-#30
Serious Adverse Events (SAEs)
Must be reported to regulatory agencies and IRBs
542-02-#31
Adverse Events
• Challenges
– Short term vs. longer term
– Longer term follow-up in face of early benefit
– Rare AEs may be seen only with very large numbers of exposed patients and long term follow-up
• Recent Example – COX-2 inhibitors
– Immediate pain reduction vs. longer term increase in cardiovascular risk
– Vioxx & Celebrex
542-02-#32
What’s the Question?
4. “Natural History”
• Question not related to intervention
• Control group, often a “placebo,” may be used to describe how prognostic factors relate to eventual subject outcome (predictive, not causative)
e.g. Coronary Drug Project: Aided greatly in defining natural history of patients following a heart attack
5. Ancillary
• Questions not related at all but still of scientific interest
• Usually piggy-backed onto trial
• Must not interfere with trial!
542-02-#33
What’s the Question?
6. Exploratory
• Most studies conducted to test some hypothesis
• Most studies can generate new hypotheses
• Multiple analyses often conducted → increased false positive (Type I) error rate
• Could demand reduced significance level (or p-value) for each test
– e.g. α/K (assuming independent variables)
– α = .05, K = 10 → α/K = .005
• But can’t afford this usually
• Could be selective in number of primary hypotheses
• Should state key comparisons in advance
• Relegate other comparisons to either Confirmatory or Exploratory
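The per-test adjustment mentioned above (dividing the overall significance level by the number of tests, commonly called the Bonferroni correction) can be sketched as follows. The function names are illustrative, and the family-wise error calculation assumes independent tests, as the slide does.

```python
def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test significance level when alpha is split evenly over k tests."""
    return alpha / k


def familywise_error(per_test_level: float, k: int) -> float:
    """Probability of at least one false positive in k independent tests."""
    return 1.0 - (1.0 - per_test_level) ** k


alpha, k = 0.05, 10
per_test = bonferroni_level(alpha, k)

print("per-test level:", per_test)
print("family-wise error, unadjusted:", familywise_error(alpha, k))
print("family-wise error, adjusted:", familywise_error(per_test, k))
```

The adjusted family-wise rate stays at or below the nominal level, but each individual test becomes much harder to pass, which is why the slide notes that this price usually cannot be afforded.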
542-02-#34
Outcome Assessment
542-02-#35
What’s the Response Variable?
• Used to answer primary/secondary questions
• Characteristics for primary/secondary outcomes
1. Well defined & stable
2. Ascertained in all subjects
3. Unbiased
4. Reproducible
5. Specificity to question
542-02-#36
Response Variable
• Examples
1. MILIS: Infarct size measurement?
- Enzymes (area under curve or peaks)
- Radionuclide imaging
- EKG
Issues of definition, ascertainment, reproducibility
2. NOTT: Quality of Life?
- POMS (Profile of Mood States)
- SIP (Sickness Impact Profile)
- Pulmonary Function
- Survival
542-02-#37
Response Variable
3. Cardiovascular Disease Trials
- Total mortality
- CHD mortality
- Non-fatal MI
- PVCs
4. Diabetes
- Mortality
- Blindness
- Visual impairment
- Retinopathy
- Microaneurysms
542-02-#38
Surrogate Response Variables
• Used as alternative to desired or ideal clinical response
• Examples
– Suppression of arrhythmia (sudden death)
– T4 cell counts (AIDS or ARC)
• Used often - therapeutic exploratory (Phase I, Phase II)
• Use with caution - therapeutic confirmatory (Phase III)
542-02-#39
Surrogate Response Variables (2)
• Frequent Criticism of Clinical Trials
– Too long
– Too large
– Too expensive
• Advantages
– Perhaps smaller sample size
– Detect earlier effect → shorter trial
– Easier
542-02-#40
Examples of FDA Approval of Drugs Using Surrogates (1)
• Lower cholesterol without evidence of survival benefit
• Lower blood pressure without evidence of benefit for stroke, MI, congestive heart failure, or survival
• Increase bone density without evidence of decreased fractures in osteoporosis
542-02-#41
Examples of FDA Approval of Drugs Using Surrogates (2)
• Increase cardiac function in congestive heart failure without evidence of survival benefit
• Decrease rate of arrhythmias (VPBs) without evidence of survival benefit
• Lower blood glucose and glycosylated hemoglobin without evidence about diabetic complications or survival benefit
542-02-#42
Surrogate Response Variables
• Requirements (Prentice, 1989)
T = True clinical endpoint
S = Surrogate
Z = Treatment
• H0: P(T|Z) = P(T) ⇔ P(S|Z) = P(S)
• Sufficient Conditions
1. S is informative about T (predictive)
P(T|S) ≠ P(T)
2. S fully captures effect of Z on T
P(T|S,Z) = P(T|S)
542-02-#43
Concerns About Surrogates
1. Relationship between surrogate and true endpoint may not be causal, but coincidental to a third factor
2. Other unfavorable effects of the drug
3. Effect on surrogate may correlate with one clinical endpoint, but not others
542-02-#44
[Figure: diagram along a Time axis relating Disease, Intervention, Surrogate End Point, and True Clinical Outcome]
The setting that provides the greatest potential for the surrogate endpoint to be valid. Reprinted from Ann Intern Med 1996; 125:605-13.
542-02-#45
[Figure: four panels (A-D) along a Time axis, each relating Disease, Intervention, Surrogate End Point, and True Clinical Outcome]
Reasons for failure of surrogate end points.
A. The surrogate is not in the causal pathway of the disease process.
B. Of several causal pathways of disease, the intervention affects only the pathway mediated through the surrogate.
C. The surrogate is not in the pathway of the intervention’s effect or is insensitive to its effect.
D. The intervention has mechanisms for action independent of the disease process.
Dotted lines = mechanisms of action that might exist.
542-02-#46
Examples Using “Surrogates”
• Chronic Obstructive Pulmonary Disease
• Cardiac Arrhythmias
• Heart Failure
• AIDS
• Osteoporosis
542-02-#47
Nocturnal Oxygen Therapy Trial (NOTT)
• Hypothesis
– Is continuous oxygen therapy better than nocturnal oxygen therapy in chronic obstructive lung disease patients?
• Surrogates
• Survival
• Design
– 203 patients
– Two-sided 0.05 Type I error
– Randomized
– Multicenter
– Sequential data monitoring
542-02-#48
Possible NOTT Surrogates
• PaO2
• Hematocrit
• FEV1 % Predicted
• FVC % Predicted
• Maximum Workload
• Heart Rate
• Mean Pulmonary Artery Pressure
• Cardiac Index
• Pulmonary Vascular Resistance
• Neuropsychiatric Impairment
• Quality of Life
542-02-#49
The Nocturnal Oxygen Therapy Trial
NOTT Survival Experience for 102 Patients on Nocturnal Oxygen (NOT) and 101 Patients on Continuous Oxygen Therapy (COT)
542-02-#50
Cardiac Arrhythmias
• Cardiac arrhythmias associated with sudden death
• Class of drugs developed to suppress arrhythmias
• FDA approved for high risk patients
• “Off-label” use increased
542-02-#51
Cardiac Arrhythmia Suppression Trial
Hypothesis
Does suppression of arrhythmia following an MI reduce incidence of:
1. Sudden death
2. Total mortality
542-02-#52
Cardiac Arrhythmia Suppression Trial
Design
• Randomized Double Blind
• Three Drug Arms vs. Placebo
• Multicenter Study
• Group Sequential Data Monitoring
• One Sided (0.025 Type I Error) for Benefit
• Advisory One Sided (0.025) for Harm
• Run-in Period (Arrhythmia Suppression)
542-02-#53
Cardiac Arrhythmia Suppression Trial
Early Termination in Two Drug Arms
Drugs Placebo
Sudden Death 33 9
Total Mortality 56 22
542-02-#54
CAST Sequential Boundaries
Early Termination in Two Drug Arms
Drugs Placebo
Sudden Death 33 9
Total Mortality 56 22
542-02-#55
Chronic Heart Failure (CHF)
• CHF is a serious problem
• Patients have reduced cardiac function & reduced ability to conduct daily activities
• Severity stages: NYHA Class I-IV
• Mortality rates increase with severity class
• Improving cardiac function, exercise capacity & quality of life desirable
• Drugs developed/approved on that basis
542-02-#56
PROMISE (Packer et al. NEJM 1991)
• Problem
– Patients with advanced (Class IV) congestive heart failure have 40% one year mortality
– Milrinone (a phosphodiesterase inhibitor) enhances cardiac contractility
– Milrinone improved cardiac output, exercise tolerance, and symptoms
• Hypothesis
Does milrinone increase survival in severe (Class III or IV) congestive heart failure patients?
542-02-#57
PROMISE
Design
• Randomized multicenter double-blind, placebo-control trial
• Patients with Class III or IV congestive heart failure for 3 months
• Two-sided 5% significance level, 90% power for 25% reduction in mortality
• 1088 patients entered
• Milrinone (10 mg/4 times per day) vs. matched placebo
• Standard therapy of digoxin, diuretics, and a converting enzyme inhibitor
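The design parameters above (two-sided 5% significance, 90% power, 25% reduction in mortality) can be turned into a rough sample-size figure. The sketch below uses the common normal-approximation formula for comparing two proportions. The 40% control-group mortality is borrowed from the Class IV figure quoted in the problem statement and is an assumption here, as is the formula itself, so the result is only an order-of-magnitude check and is not claimed to be the calculation the investigators actually used.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control: float, rel_reduction: float,
                alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate patients per arm to detect a relative reduction in an
    event rate, using a two-sided normal-approximation test of proportions."""
    p_treat = p_control * (1.0 - rel_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the power
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# Hypothetical inputs: 40% control mortality, 25% relative reduction.
per_arm = n_per_group(p_control=0.40, rel_reduction=0.25)
print("per arm:", per_arm, "total:", 2 * per_arm)
```

Under these assumed inputs the total comes out somewhat below the 1088 patients entered, which is plausible for a trial that also allows for a lower event rate, dropouts, or interim monitoring.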
542-02-#58
PROMISE Mortality Results
542-02-#59
AIDS Clinical Trials
• Clinical Outcomes
– Death
– Progression to AIDS
– Progression to ARC
• Surrogate Outcome
– CD4 Cell Count
542-02-#60
State-of-the-Art Conference
• Results
– AIDS/Death
* 8 trials positive
• 7/8 had positive CD4 cell changes
* 8 trials negative
• 6/8 had positive CD4 cell changes
– Death
* 4 trials positive
• 2/4 CD4 positive
* 7 trials negative
• 6/7 CD4 positive
542-02-#61
Osteoporosis (Riggs et al. NEJM, 1990)
• Bone loss in postmenopausal women leads to increased risk of fracture
• Sodium fluoride stimulates bone formation and increases bone mass (double)
• Hypothesis
– Will fluoride treatment decrease the rate of vertebral fractures?
• Design
– Randomized, double-blind, placebo-controlled
– 202 postmenopausal women randomized
– All received calcium supplementation
542-02-#62
Osteoporosis Fluoride Trial
Results
• Fluoride increased bone density
– 35% (p = 0.0001) in spine
– 12% (p = 0.0001) in femoral neck
• Fluoride decreased bone density by 4% in wrist (p = 0.02)
• Vertebral fractures higher on fluoride (F 163, P 136, p < 0.05)
• Non-vertebral fractures higher on fluoride (72 vs. 24; p = 0.01)
• Fluoride concluded not effective as a treatment for postmenopausal osteoporosis
542-02-#63
Concluding Remarks on Surrogates
• Surrogates play an important role in the development of Phase I, II, and pilot Phase III studies
• Treatments may affect more than one mechanism
• “Surrogates” do not reliably predict treatment effect on clinical outcome
• Even continued success in a given field is not guaranteed
• Reliance on “surrogates” should be minimized
542-02-#64
Study Population
542-02-#65
What Is The Study Population? (1)
• Subset of the general population determined by the eligibility criteria
GENERAL POPULATION
eligibility criteria
STUDY POPULATION
enrollment
STUDY SAMPLE (observed)
542-02-#66
The General Flow of Statistical Inference
Patient Population
Sample* Protocol
Patients On Study
Observed Results
Inference about Population
*Sample of Opportunity: random or non-random?
542-02-#67
What Is The Study Population? (2)
Defined by Eligibility Criteria
–Define in advance
–Characterize population• Impact of results• Replication of study
–Biased sample does not imply biased trial!
542-02-#68
Who Should Be Studied?
Homogeneous                        vs.   Heterogeneous
1. Well defined                          Can’t specify easily
2. Mechanism of action known             Don’t know if one group will respond differently
3. Don’t dilute results                  Easier subject recruitment
4. Infer results specifically            Easier to generalize
542-02-#69
Eligibility Criteria
• Need to describe who we intend to study
– State in advance
– Precision related to importance
• Consider
– Potential for benefit
• Homogeneous population
• Heterogeneous population
– Ability to detect benefit: high risk but not too high
– No contraindications
– No competing risk
– Compliance likely
• Impact
– Generalization
– Ease of recruitment
– Risk or event rates
542-02-#70
Recruitment
• More difficult than anticipated
• Yield not 100%
– Eligibility criteria (age, prior history, prior treatment, etc.)
– Exclusion criteria
– Physician refusal
– Patient refusal
• Many trials randomize only 10-15% of those screened
• Must be a team effort
– Physicians
– Nurses
– Data Manager or Coordinator
• Health screening effect → lower risk than expected!
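The screening yield quoted above translates directly into a screening workload. The sketch below is a back-of-the-envelope planning aid; the function name and the 1000-patient target are hypothetical and do not come from any trial discussed here.

```python
from math import ceil


def number_to_screen(target_randomized: int, yield_fraction: float) -> int:
    """Patients who must be screened to randomize the target number,
    given the fraction of screened patients who end up randomized."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return ceil(target_randomized / yield_fraction)


# Hypothetical target of 1000 randomized patients at the quoted yields.
for y in (0.10, 0.15):
    print(f"yield {y:.0%}: screen about {number_to_screen(1000, y)} patients")
```

A plan built on an optimistic yield will underestimate both the screening effort and the recruitment period.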
542-02-#71
Accrual Tracking
542-02-#72
Measures of Efficacy from Clinical Trials
542-02-#73
Characteristics of a Good Summary Measure
• Easy to compute
• Easily understood by all (non-technical)
• Minimal variance across baseline characteristics
• Statistically sound
542-02-#74
Purpose and Limitations of Clinical Trials
• Clinical trials are designed to detect differences between treatment groups
– relative risk (or relative risk reduction)
– mean absolute risk reduction (relative to placebo)
• In clinical trials, the method of assessing the primary endpoints is usually pre-specified and stated in terms of RRR or RHR.
• Clinical trials are not designed to directly estimate the incidence in the population at risk.
• The population in a clinical trial may not completely represent the population to be treated
542-02-#75
Measures Currently Used
• Relative Risk (RR) and Relative Risk Reduction (RRR)
• Odds ratio (OR)
• Relative Hazard (RH) and Relative Hazard Reduction (RHR)
• Absolute Risk Reduction (ARR)
542-02-#76
Outcome Measures
Relative Risk (RR)
RR = P1/P2
Relative Risk Reduction
RRR = 1 - RR
Odds Ratio (OR)
OR = P1(1 - P2) / [P2(1 - P1)]
Absolute Risk Reduction (ARR)
ARR = P1 - P2
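A small worked example may help fix these definitions. The sketch below assumes the ratio measures put the treated-group event rate in the numerator and the control-group rate in the denominator, so that a beneficial treatment gives RR < 1. The slide does not say which group is P1 and which is P2, so the group labels, and the choice to report ARR as control minus treated (positive when treatment helps), are assumptions. The event rates are made up for illustration.

```python
def efficacy_measures(p_treated: float, p_control: float) -> dict:
    """Summary measures of treatment effect from two event proportions."""
    rr = p_treated / p_control
    return {
        "RR": rr,                        # relative risk
        "RRR": 1.0 - rr,                 # relative risk reduction
        "OR": (p_treated * (1.0 - p_control))
              / (p_control * (1.0 - p_treated)),  # odds ratio
        "ARR": p_control - p_treated,    # absolute risk reduction (assumed sign)
    }


# Hypothetical rates: 10% of treated and 20% of control patients have the event.
measures = efficacy_measures(p_treated=0.10, p_control=0.20)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the odds ratio is further from 1 than the relative risk here; the two only agree closely when events are rare.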
542-02-#77
[Figure: annual incidence rate (%), axis 0 to 50, by study (first author of paper); legend: Low, Medium, High, ARR]
Placebo incidence rates of vertebral fracture from several studies
Efficacy as measured by relative risk reduction was reasonably stable over studies
Absolute risk reduction varied across studies
542-02-#78
Study      Incidence in placebo (%)   Incidence in risedronate (%)   RRR    ARR
VERT NA    16.3                       11.3                           59%    5.0%
VERT MN    29.0                       18.1                           51%    10.9%
There is a danger in using ARR to compare efficacy
The drug used is the same in both studies
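The dependence of ARR on the placebo event rate can be seen with a short calculation. The sketch below holds a hypothetical relative risk reduction fixed at 40% and varies the baseline (placebo) risk; none of these numbers are taken from the VERT studies. It relies only on the identity ARR = baseline risk × RRR, which follows from the definitions when ARR is measured as control minus treated.

```python
def arr_from_rrr(baseline_risk: float, rrr: float) -> float:
    """Absolute risk reduction implied by a relative risk reduction
    applied to a given control-group (baseline) event rate."""
    treated_risk = baseline_risk * (1.0 - rrr)
    return baseline_risk - treated_risk


# Same hypothetical drug effect (RRR = 40%) in low, medium, and high risk populations.
for baseline in (0.05, 0.15, 0.30):
    arr = arr_from_rrr(baseline, rrr=0.40)
    print(f"baseline {baseline:.0%}: ARR = {arr:.1%}")
```

An identical relative effect therefore produces a larger absolute benefit in a higher-risk population, which is why comparing ARR across studies with different placebo rates can mislead.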
542-02-#79
We Need Both Measures
• Effectiveness
– Related to RR
• Benefits
– Related to absolute risk
542-02-#80
RRR
• Usually
– Constant over baseline characteristics
– Constant over study time
– Easy test of interaction
– When not constant it is usually piecewise constant
– Differences seen among different studies can be viewed as random
– Good statistical models are available
542-02-#81
Absolute Risk
• Unlikely
– To be constant over time
– To be constant over baseline characteristics
– To be able to describe with simple models
• Consequences
– Patient characteristics can change with study time
– Differences among studies cannot be ignored
542-02-#82
Conclusion
• If the RRR is constant and detailed information about the AR is provided, both summary measures provide useful information about the effectiveness and benefits of treatment
• ARR is not a simple index of therapeutic effectiveness. It is a function of the incidence rate for the event of interest in the population studied and may not be reflective of the true ARR for the patient sitting before you.
• There are concerns about using the rate in the placebo group from a clinical trial as a surrogate for the true baseline risk for an individual patient.
• Before making a recommendation, one needs to know the risk profile of the patients to be treated
542-02-#83
Summary
Defining the Question
• Defined carefully in advance
• Must be clinically relevant
• Prioritize into primary, secondary, …
• Design built around primary question(s)
• Eligibility criteria define population studied and inferences to be made
• Surrogates desirable but risky
• Need the relevant measure of efficacy