Lecture 2: Statistical Overview Elizabeth Garrett esg@jhu.edu Child Psychiatry Research Methods Lecture Series

Posted on 19-Dec-2015


Page 1

Lecture 2: Statistical Overview

Elizabeth Garrett

esg@jhu.edu

Child Psychiatry Research Methods Lecture Series

Page 2

Two Types of Statistics

• Descriptive Statistics

– Uses sample statistics (e.g. mean, median, standard deviation) to describe the sample and the population from which it was drawn.

– Not “decision” oriented

– Pilot studies are descriptive

• Statistical Inference

– Inference: The act of passing from statistical sample data to generalizations … usually with calculated degrees of certainty.

– Key elements: sample, generalizations, certainty

– Often used for making decisions: the drug works or it doesn’t; ADHD is genetically inherited or it isn’t.
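A minimal Python sketch of the descriptive side: it only summarizes a sample with the statistics named above and makes no decision. The function name `describe` and the response times are hypothetical, invented for illustration.

```python
import statistics


def describe(sample):
    """Descriptive summary of a sample: no hypothesis test, no decision."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        # statistics.stdev uses the n - 1 denominator (sample standard deviation)
        "sd": statistics.stdev(sample),
    }


if __name__ == "__main__":
    # Hypothetical pilot-study response times in ms (invented for illustration)
    response_times = [412, 385, 450, 398, 430, 405]
    print(describe(response_times))
```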

Page 3

Example 1: “Viral Exposure and Autism” (Deykin and MacMahon, 1979)

• Hypothesis: Direct exposure to, or clinical illness with, measles, mumps, or chicken pox may play a causal role in autism.

Page 4

Example 2: “Neurobiology of Attention in Fetal Alcohol Syndrome” (Lockhart, 2001?)

Hypotheses:

(1) The neurobiological basis of problems in response inhibition and motor impersistence in children with FAS is related to abnormalities in the “anterior” frontostriatal network.

(2) The neurobiological basis of problems in orienting/shifting attention in children with FAS is related to abnormalities in the “posterior” parietal network.

Page 5

4 Statistical Plan

4.1 Primary outcome(s)

4.2 Statistical analysis

4.3 Sample size justification

Page 6

4 Statistical Plan

4.1 Primary outcome(s)

Common Problem: Primary outcome variable not defined!

Page 7

Defining Primary Outcome Variables

Continuous

– MRI volumes

– fMRI activation levels

– blood pressure

– response time

– number of voxels activated

– cost of hospital visit

– neurobehavioral test score

Page 8

Categorical

Nominal

• Binary (two categories)

– gene carrier status (as diagnosed by….)

– measles (as diagnosed by….)

– ADHD (as diagnosed by….)

• Polychotomous (more than two unordered categories)

– region of activation

Ordinal

– severity score (see BPI)

– symptom rating

– “on a scale of 1 to 5….”

Page 9

Example 1: Primary outcomes

Disease history of

• measles

• mumps

• chicken pox

Example 2: Primary outcomes

MRI volumes of

• corpus callosum

• caudate

• cerebellar vermis

• parietal lobes

• frontal lobe

Page 10

4 Statistical Plan

4.1 Primary outcomes

- Be clear about each variable and how it is measured.

- NOT okay to say “our primary outcome variable is cognition.”

- It IS okay to say “our primary outcome variable is cognition as measured by the WISC-III.”

- Multiple outcomes are okay: e.g. MRI volumes and cognitive tests can both be primary outcomes.

Page 11

4 Statistical Plan

4.1 Primary outcome(s)

4.2 Statistical analysis

- How are you going to answer the specific aims using the primary outcome variable?

Page 12

Commonly seen statistical methods in analysis plans:

– t-test

– confidence interval

– chi-square test

– Fisher’s exact test

– linear regression

– logistic regression

– Wilcoxon rank sum test

– ANOVA

– GEE (generalized estimating equations)

Page 13

Key Idea: Data Reduction

• Statistics is the art/science of summarizing a large amount of information by just a few numbers and/or statements.

• Examples: p-value = 0.01

OR = 5.0

prevalence = 0.20 ± 0.05
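As a sketch of data reduction, the snippet below collapses 200 hypothetical yes/no observations (invented counts) into a prevalence and its binomial standard error. The slide does not say whether its ± term is a standard error or a confidence-interval half-width; this sketch reports a standard error, and the function name `prevalence_summary` is hypothetical.

```python
import math


def prevalence_summary(observations):
    """Reduce a list of 0/1 indicators (1 = has the condition) to a prevalence and its SE."""
    n = len(observations)
    p = sum(observations) / n
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of a proportion
    return p, se


if __name__ == "__main__":
    # 200 hypothetical children, 40 of whom have the condition
    data = [1] * 40 + [0] * 160
    p, se = prevalence_summary(data)
    print(f"prevalence = {p:.2f} ± {se:.3f} (SE)")
```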

Page 14

Example 1:

• Recall aim: To compare measles history in autistic versus non-autistic kids.

• Methods:

– Odds ratio: compares the odds of disease in the two exposure groups

– Confidence interval: answers “What is a reasonable range for the true odds ratio?”

– Fisher’s exact test: answers “Is the risk the same in the two exposure groups?”

Page 15

Statistical Analysis

“We will measure the risk of autism associated with measles using an odds ratio. Significance will be assessed by Fisher’s exact test and a 95% confidence interval will be calculated.”
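A stdlib-only Python sketch of the three quantities in this plan, assuming a 2×2 table of cases and controls by exposure. The confidence interval uses Woolf's log-scale (Wald) method, one common choice that the lecture does not specify; the Fisher p-value sums every table with the observed margins that is no more probable than the observed table. Function names and counts are hypothetical.

```python
from math import comb, exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Woolf (log-scale Wald) confidence interval.

    Assumed 2x2 layout (all cells must be > 0):
                  exposed  unexposed
        cases        a         b
        controls     c         d
    """
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_hat, exp(log(or_hat) - z * se_log), exp(log(or_hat) + z * se_log)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the same 2x2 layout.

    Sums the hypergeometric probabilities of all tables with the observed margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):  # probability that the top-left cell equals k, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    ks = range(max(0, col1 - row2), min(row1, col1) + 1)
    # the small relative tolerance treats floating-point near-ties as ties
    return min(1.0, sum(prob(k) for k in ks if prob(k) <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical counts (NOT the Deykin and MacMahon data)
    a, b, c, d = 6, 194, 4, 396
    or_hat, lo, hi = odds_ratio_ci(a, b, c, d)
    print(f"OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
    print(f"Fisher's exact p = {fisher_exact_two_sided(a, b, c, d):.3f}")
```

If SciPy is available, `scipy.stats.fisher_exact` is a convenient cross-check for the p-value.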

Page 16

Example 2:

• Recall aim: To compare MRI volumes in FAS kids and controls.

• Methods:

– Two-sample t-test: answers “Are the mean volumes in the two groups different?”

– 95% confidence interval: answers “Approximately how large is the difference in mean volumes between the two groups?”

Page 17

Statistical Analysis

“To answer the specific aims, we will compare the caudate volumes in the FAS group to those in the control group using a two sample t-test. We will also estimate a 95% confidence interval to provide a reasonable range of the difference in mean volumes in the two groups.”
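A stdlib-only sketch of the pooled-variance (equal-variance) two-sample t-test and its 95% confidence interval. To avoid a SciPy dependency, the Student t CDF is computed by Simpson's rule and inverted by bisection; that numerical approach, the function names, and the volumes are assumptions for illustration. Welch's unequal-variance t-test is a common alternative when the group standard deviations differ.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0 (steps is even)."""
    x = abs(t)
    h = x / steps
    area = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3  # integral of the density from 0 to |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def t_quantile(p, df):
    """Inverse of t_cdf for 0.5 <= p < 1, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_t_test(x, y, level=0.95):
    """Pooled-variance two-sample t-test for mean(y) - mean(x), with a CI."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    diff = mean(y) - mean(x)
    t_stat = diff / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))
    half_width = t_quantile(0.5 + level / 2, df) * se
    return t_stat, p_value, (diff - half_width, diff + half_width)


if __name__ == "__main__":
    # Hypothetical caudate volumes (invented for illustration)
    fas = [385, 402, 371, 410, 395, 388]
    controls = [430, 445, 418, 452, 437, 425]
    t_stat, p, (lo, hi) = two_sample_t_test(fas, controls)
    print(f"t = {t_stat:.2f}, p = {p:.4f}, 95% CI for control - FAS: ({lo:.1f}, {hi:.1f})")
```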

Page 18

4 Statistical Plan

4.1 Primary outcome(s)

4.2 Statistical analysis

- Data reduction is key: How are you going to combine information from all patients to answer the scientific question?

- Specific methods need to be designated.

- Study design often changes after statistical issues are considered!

Page 19

4 Statistical Plan

4.1 Primary outcome(s)

4.2 Statistical analysis

4.3 Sample size justification

- Do you have enough subjects to answer the question, but not so many that you are inefficient (in terms of money and risks)?

Page 20

Power and Sample Size Considerations

• All about precision! (Recall Craig’s lecture last time.)

• Intuition:

– the more individuals, the better your estimate

– the more individuals, the less variability in your estimate

– the more individuals, the more precise your estimate

– But how precise does your estimate need to be?

• Example 1:

– Odds ratio of measles for autism: 3.7

– Interpretation: Babies exposed to measles prenatally or in early infancy have 3.7 times the odds of autism of unexposed children (roughly 3.7 times the risk, because autism is rare).

– Strong result?
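To make “more individuals, more precise” concrete for the odds ratio: the Woolf standard error of the log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d). This sketch assumes a hypothetical 2×2 table and multiplies every cell by the same factor, so the estimate is unchanged while the standard error shrinks; quadrupling the sample halves it. The function name is hypothetical.

```python
from math import sqrt


def se_log_odds_ratio(a, b, c, d):
    """Woolf standard error of the log odds ratio (all four cells must be > 0)."""
    return sqrt(1 / a + 1 / b + 1 / c + 1 / d)


if __name__ == "__main__":
    # Hypothetical table: exposed cases, unexposed cases, exposed controls, unexposed controls
    base = (6, 194, 2, 198)
    for k in (1, 4, 16):  # enroll k times as many subjects with the same proportions
        table = tuple(k * cell for cell in base)
        print(f"N = {sum(table)}: SE(log OR) = {se_log_odds_ratio(*table):.3f}")
```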

Page 21

Three Theoretical Outcomes

[Figure: three theoretical 95% confidence intervals for the odds ratio, shown on a linear axis (0 to 30) and on a log axis (0.05 to 20).]

Page 22

Actual Result from Study

[Figure: the observed 95% confidence interval for the odds ratio, shown on a linear axis (0 to 30) and on a log axis (0.05 to 20).]

95% confidence interval: (0.97, 14.2)

Fisher’s exact p-value = 0.12
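A short check on the reported interval, assuming it is a Wald-type interval computed on the log scale (the lecture does not say how it was computed). On the linear scale (0.97, 14.2) is lopsided around 3.7, but on the log scale it is nearly symmetric, one reason to plot odds ratios on a log axis. Its geometric midpoint is about 3.71, and the interval includes 1, consistent with p = 0.12 > 0.05. The function name is hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def log_scale_summary(lower, upper, level=0.95):
    """Midpoint and SE implied by a ratio CI that is symmetric on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    midpoint = exp((log(lower) + log(upper)) / 2)  # geometric midpoint of the interval
    se_log = (log(upper) - log(lower)) / (2 * z)   # implied SE of the log odds ratio
    return midpoint, se_log


if __name__ == "__main__":
    midpoint, se_log = log_scale_summary(0.97, 14.2)
    print(f"geometric midpoint = {midpoint:.2f}, implied SE(log OR) = {se_log:.3f}")
    print("CI includes 1:", 0.97 < 1 < 14.2)
```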

Page 23

Magnitude versus Significance

• Magnitude of the finding: How big is the odds ratio?

• Statistical significance of the finding: Is the odds ratio different from 1?

• Clinical significance of the finding: Is the size of the estimated odds ratio worth worrying about?

• Autism and measles:

– exposure to measles is rare

– need a lot of subjects to show a significant difference!

Page 24

Justifying sample size in a study design

Hypothesis testing:

Ho: OR = 1

Ha: OR = 3

Which is a more reasonable conclusion?

Issues: type I error (α), type II error (β)

[Figure: distributions of the log odds ratio under Ho and Ha; axis marked at -log(3), log(1), log(3).]

Page 25

Type I and II Errors

• Type I error (α):

– The probability that we reject Ho given that it is true

– The probability that we find an association between measles and autism when, in truth, one does not exist.

• Type II error (β):

– The probability that we fail to reject Ho given that Ha is true

– The probability that we find no association between measles and autism when, in truth, one does exist.

• Note: Power = 1 - β

Page 26

Sample size dictates overlap

[Figure: distributions of the log odds ratio under Ho and Ha, axis from -log(9) to log(12). Scenario 1: small samples. Scenario 2: large samples.]

Page 27

Decision Rule

• Before the study is completed, you know what you need to observe to find evidence for OR = 1 or OR = 3.

• Scenario 1: If observed OR > 3.6, then conclude that there IS an association

• Scenario 2: If observed OR > 1.6, then conclude that there IS an association.

Page 28

Type I Error: alpha

Alpha is usually predetermined: α = 0.05.

[Figure: Ho and Ha distributions of the log odds ratio, axis from -log(9) to log(12), with the decision cutoff at log(3.6) in Scenario 1 and at log(1.63) in Scenario 2.]

Page 29

Type II Error: beta

Beta is determined conditional on alpha.

If the sample size is small, beta will be big (Scenario 1: β = 0.60).

If the sample size is big, beta will be small (Scenario 2: β = 0.02).

[Figure: Ho and Ha distributions of the log odds ratio, with cutoffs at log(3.6) (Scenario 1) and log(1.63) (Scenario 2).]

Page 30

Power: 1 - beta

Power is 1 - beta.

If the sample size is small, power will be small (Scenario 1: power = 0.40).

If the sample size is large, power will be large (Scenario 2: power = 0.98).

[Figure: Ho and Ha distributions of the log odds ratio, with cutoffs at log(3.6) (Scenario 1) and log(1.63) (Scenario 2).]
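The decision-rule, beta, and power numbers on these slides are consistent with a normal approximation for the log odds ratio and a one-sided test at alpha = 0.05. Under that assumption (ours, not stated in the lecture), the standard error can be back-calculated from each cutoff and beta follows. The sketch gives beta ≈ 0.59 (power ≈ 0.41) for Scenario 1 and beta ≈ 0.02 (power ≈ 0.98) for Scenario 2, matching the slides to within rounding. The function name is hypothetical.

```python
from math import log
from statistics import NormalDist

Z = NormalDist()


def beta_and_power(cutoff_or, alt_or=3.0, alpha=0.05):
    """Normal approximation for the log odds ratio, one-sided test of Ho: OR = 1.

    Assumes the rule "reject Ho if log(OR_hat) > log(cutoff_or)" was set so that
    P(reject | Ho) = alpha, i.e. log(cutoff_or) = z_(1-alpha) * SE.
    """
    se = log(cutoff_or) / Z.inv_cdf(1 - alpha)  # back-calculated SE of log(OR_hat)
    beta = Z.cdf((log(cutoff_or) - log(alt_or)) / se)  # P(fail to reject | OR = alt_or)
    return se, beta, 1 - beta


if __name__ == "__main__":
    for name, cutoff in (("Scenario 1", 3.6), ("Scenario 2", 1.63)):
        se, beta, power = beta_and_power(cutoff)
        print(f"{name}: SE = {se:.3f}, beta = {beta:.2f}, power = {power:.2f}")
```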

Page 31

Power/Sample Size Estimate

• Kids with Autism: N = 608

• Kids without Autism: N = 1216

“Using Fisher’s exact test, we have 80% power with alpha = 0.05 to detect an odds ratio of 3 if we enroll 608 children with autism and 1216 normal controls. This assumes that 3% of autistic children have been exposed to measles and 1% of the controls have been exposed.”

Page 32

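Sample Size Table (80% power, alpha = 0.05)

P1 = proportion of controls exposed; P2 = proportion of children with autism exposed; N1 = number of controls; N2 = number of children with autism.

Odds ratio   P1     P2     N1     N2    Total N
3            0.01   0.03   1216   608   1824
4            0.01   0.04    678   339   1017
5            0.01   0.05    458   229    687
7            0.01   0.07    270   135    405
10           0.01   0.09    189    95    284
15           0.01   0.13    116    58    174
20           0.01   0.17     82    41    123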
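The table does not state its formula. As a sketch, the normal approximation for comparing two proportions with Fleiss' continuity correction (two controls per case, two-sided alpha = 0.05, 80% power, each group rounded up separately, hence 189 rather than 190 controls in one row) reproduces every row when checked. That this is what the original software did is an assumption; the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fleiss_sample_size(p_controls, p_cases, ratio=2.0, alpha=0.05, power=0.80):
    """(cases, controls) needed to compare two exposure proportions.

    Normal approximation with Fleiss' continuity correction; two-sided alpha;
    `ratio` controls per case. Each group size is rounded up separately.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    diff = abs(p_cases - p_controls)
    p_bar = (p_cases + ratio * p_controls) / (ratio + 1)  # pooled exposure proportion
    n = (z_a * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
         + z_b * sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls) / ratio)) ** 2 / diff ** 2
    n_cc = n / 4 * (1 + sqrt(1 + 2 * (ratio + 1) / (n * ratio * diff))) ** 2  # continuity correction
    return ceil(n_cc), ceil(ratio * n_cc)


if __name__ == "__main__":
    for p_cases in (0.03, 0.04, 0.05, 0.07, 0.09, 0.13, 0.17):
        cases, controls = fleiss_sample_size(0.01, p_cases)
        print(p_cases, controls, cases, cases + controls)
```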

Page 33

Example 2: FAS and controls

• How many FAS children and controls do we need to detect a significant difference in MRI volumes?

• From previous research we can estimate (i.e. guess):

– Volumes of cerebellar vermis in FAS kids are approximately 400.

– It would be interesting if FAS kids had volumes at least 10% smaller than those of normal controls (i.e. 400 versus 450).

Page 34

Sample size needed depends on overlap between FAS and control kids.

[Figure: two panels showing the FAS and control distributions of MRI volumes (axis 250 to 650).]

Page 35

Two-sample t-test

• Same general approach as the odds ratio

• Define δ = difference in mean volumes = control mean - FAS mean

• H0: δ = 0

• Ha: δ = 50

• Same thing: which hypothesis is more reasonable based on our data?

• Note: Based on previous research, we can estimate that the standard deviation of volumes is 70.

Page 36

What if N = 100 (50 per group)?

Alpha = 0.05

Beta = 0.06

[Figure: distributions of Delta (difference in mean MRI volumes) under H0 and Ha, axis from -60 to 120.]

Page 37

Power/Sample Size Options

• For power = 80%, alpha = 0.05: 32 FAS and 32 controls

• For power = 90%, alpha = 0.05: 43 FAS and 43 controls

“To achieve 80% power with a type I error of 5%, we require 32 FAS kids and 32 controls. This will allow us to detect a 10% difference in mean MRI volumes of cerebellar vermis (400 versus 450, respectively) assuming standard deviations of 70 in each group.”
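A sketch of the two-sample calculation, assuming equal group sizes and a two-sided test. It uses the normal-approximation formula plus Guenther's correction term z²/4, an approximation to the exact t-based calculation (which the lecture presumably used); under these assumptions it gives 32 per group for 80% power and 43 for 90%. The normal approximation for beta with 50 per group gives about 0.054, slightly below the slide's 0.06. Function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample t-test with equal group sizes.

    Normal-approximation formula plus Guenther's small-sample correction
    (z_(1-alpha/2)^2 / 4), which approximates the exact t-based answer.
    """
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    n = 2 * ((z_a + z_b) * sd / delta) ** 2 + z_a ** 2 / 4
    return ceil(n)


def approx_beta(delta, sd, n, alpha=0.05):
    """Normal-approximation type II error with n per group (ignores the far tail)."""
    se = sd * (2 / n) ** 0.5
    return Z.cdf(Z.inv_cdf(1 - alpha / 2) - delta / se)


if __name__ == "__main__":
    print(n_per_group(50, 70), n_per_group(50, 70, power=0.90))
    print(round(approx_beta(50, 70, 50), 3))
```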

Page 38

4 Statistical Plan

4.1 Primary outcome(s)

4.2 Statistical analysis

4.3 Sample size justification

- Explain the justification in statistical terms. Saying “we are confident that 10 subjects will provide …” is not sufficient.
