introduction to biostatistics/hypothesis testing brian healy, phd


Upload: joseph-obrien

Post on 12-Jan-2016


Introduction to Biostatistics/Hypothesis Testing

Brian Healy, PhD

Course objectives

Introduction to concepts of biostatistics
– Type of data
– Hypothesis testing
– p-value
– Choosing the best statistical test
– Study design
– When you should get help

Statistical thinking, not math proofs

Office hour

Tuesday 9-11 in Room 2.140 of the Simches building

If you plan to come, please email me ([email protected]) with a brief description of your data so that I can prepare

Beyond the scope

Tutorial for a specific statistical package
– I will show output from some packages (STATA, SAS, GraphPad)

Topics that will be mentioned, but not focused on
– Mixed models
– Principal components analysis
– ROC curves

Class objectives

Introduction to biostatistics
– Stages of a research study
– Types of data
– Hypothesis test
– t-test
– Wilcoxon test

Questions and requests for next time

Research study

I. Study design
• Experimental question: What are you trying to learn? How will you prove this?
• Sample selection: Who are you going to study?

II. Data collection
• What should be collected?

III. Analysis of data
• Results: Was there any effect?
• Conclusions: What does this all mean? To whom do results apply?

How is statistics related to each stage?

I. Study design
• Experimental question: Define outcome, sources of variability, unit and analysis plan
• Sample selection: Sample size, type of sample

II. Data collection
• What to collect?

III. Analysis of data
• Results: Hypothesis test
• Conclusion: Significance of effect/generalizability

Experimental question: What? How?

Sample selection: Who? How many?

Collect Data

Analysis: Is there an effect?

Conclusion: To whom?

Example

Multiple sclerosis is a progressive neurological disorder

We would like to find treatments that help patients

Unfortunately, it is very difficult to determine a patient's disease course because there are many things going on

How do we measure the change in the disease?

What is the outcome?

Outcome variables

An outcome variable is the dependent variable of interest

The common outcome variables in MS experiments are:
– Expanded disability status scale (EDSS): ordinal measure of disease severity
– Presence/absence of disease progression
– Expression of a cytokine of interest (e.g. an interferon, IFN)
– Time to next relapse

Types of variables

Continuous variable: Age, expression level

Dichotomous variable: Dead/alive, Wild type/mutant

Categorical variable: Race, nominal scales

Ordinal variable: Mild/Moderate/Severe, level of stat knowledge

Count outcomes: Number of lesions

Time to event outcome: Time to death

Continuous variables

Summary statistics
– Location: mean, median
– Variability: standard deviation

Graphs
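The location and variability summaries listed above can be computed with Python's standard library. This is a minimal sketch; the expression values are made up for illustration and do not come from the lecture.

```python
import statistics

# Hypothetical expression levels (illustrative values, not lecture data).
expression = [28.1, 31.4, 33.0, 35.2, 36.7, 39.9, 58.3]

# Location: the mean is pulled toward the one large value, the median is not.
mean_value = statistics.mean(expression)
median_value = statistics.median(expression)

# Variability: sample standard deviation (n - 1 in the denominator).
sd_value = statistics.stdev(expression)

print(f"mean={mean_value:.1f}, median={median_value:.1f}, sd={sd_value:.1f}")
```

Comparing the mean with the median is a quick way to see whether a few extreme values are influencing the summary.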

Dichotomous variables

Summary statistics
– Table
– Proportion

Graph

          Male   Female
Number     20      30
Percent    40      60

Categorical variables

Summary statistics:
– Table
– Proportion

Graphs

[Figure: "Provider of mental health", with categories Medical professional, Mental health professional, Other]

Is this the correct interpretation?

Ordinal variable

Summary statistics
– Mean: may be appropriate for scales or questionnaires
– Ordered table: appropriate for ordered categories with uncertain difference in magnitude
– Rank

          Mild   Moderate   Severe
Number     14       15         4
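An ordered table like the one above can be summarized with proportions and a median category. The sketch below uses the counts from the slide (14 mild, 15 moderate, 4 severe); the variable names are illustrative.

```python
# Counts from the slide, listed in order of severity.
counts = {"Mild": 14, "Moderate": 15, "Severe": 4}
total = sum(counts.values())

# Proportion of subjects in each ordered category.
proportions = {level: n / total for level, n in counts.items()}

# Median category: the level that contains the middle-ranked subject.
middle_rank = (total + 1) / 2
running = 0
median_level = None
for level, n in counts.items():
    running += n
    if running >= middle_rank:
        median_level = level
        break

for level, share in proportions.items():
    print(f"{level}: {counts[level]} ({share:.0%})")
print("Median category:", median_level)
```

The median category relies only on the ordering of the levels, so it does not assume that the step from mild to moderate is the same size as the step from moderate to severe.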

Time to event

Survival time
– Median

Graph
– Kaplan-Meier curve
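To make the Kaplan-Meier curve and median survival time concrete, here is a small hand-rolled estimator. It is a sketch under stated assumptions: the follow-up times and censoring flags are hypothetical, and the function names `kaplan_meier` and `median_survival` are illustrative, not taken from any package shown in class.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True if the event was observed at times[i],
    False if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who share this time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 9, 12, 14]
events = [True, True, False, True, True, False, True, True]

km_curve = kaplan_meier(times, events)
km_median = median_survival(km_curve)
print(km_curve)
print("Median survival time:", km_median)
```

Censored subjects still count in the number at risk up to the time they leave the study, which is why the median is used rather than the mean of the observed times.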

Description vs. comparison

In many instances, description of the outcome variable is the focus
– Estimate and confidence interval

Based on results from survey, description is not enough, rather comparison is of interest

What do we need for comparison?
– Second variable, usually called explanatory variable
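As an illustration of "estimate and confidence interval", the sketch below computes a proportion and an approximate 95% interval. It assumes the 20-out-of-50 count from the dichotomous-variable table and uses the simple normal-approximation (Wald) interval, which is only one of several possible choices.

```python
from math import sqrt
from statistics import NormalDist

# Counts assumed from the earlier table: 20 males out of 50 subjects.
successes, n = 20, 50

p_hat = successes / n                  # point estimate
z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
se = sqrt(p_hat * (1 - p_hat) / n)     # standard error of the proportion

ci_low, ci_high = p_hat - z * se, p_hat + z * se
print(f"estimate={p_hat:.2f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```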

Explanatory variables

Explanatory variables are the independent variables that we believe affect the outcome variables in some way

In MS clinical studies, this can be
– Presence of disease
– Intervention/treatment (clinical trial)
– Genotype
– Expression of another cytokine
– Time

Types of analysis: independent samples

Outcome         Explanatory    Analysis
Continuous      Dichotomous    t-test, Wilcoxon test
Continuous      Categorical    ANOVA, linear regression
Continuous      Continuous     Correlation, linear regression
Dichotomous     Dichotomous    Chi-square test, logistic regression
Dichotomous     Continuous     Logistic regression
Time to event   Dichotomous    Log-rank test
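The table above works as a lookup from the pair (outcome type, explanatory type) to candidate analyses. A minimal sketch of that lookup follows; the names `TESTS_INDEPENDENT` and `suggest_tests` are hypothetical, and combinations not listed on the slide simply return an empty list.

```python
# (outcome type, explanatory type) -> analyses, copied from the table.
TESTS_INDEPENDENT = {
    ("continuous", "dichotomous"): ["t-test", "Wilcoxon test"],
    ("continuous", "categorical"): ["ANOVA", "linear regression"],
    ("continuous", "continuous"): ["correlation", "linear regression"],
    ("dichotomous", "dichotomous"): ["chi-square test", "logistic regression"],
    ("dichotomous", "continuous"): ["logistic regression"],
    ("time to event", "dichotomous"): ["log-rank test"],
}


def suggest_tests(outcome, explanatory):
    """Return the analyses listed for this pair, or [] if it is not in the table."""
    key = (outcome.strip().lower(), explanatory.strip().lower())
    return TESTS_INDEPENDENT.get(key, [])


print(suggest_tests("Continuous", "Dichotomous"))
print(suggest_tests("Time to event", "Dichotomous"))
```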

Comparison of two groups

Question: Is the expression of CD-26 different in relapsing MS patients compared to progressive MS patients?

What is the outcome?
– We measure CD-26 using flow cytometry
– Continuous variable

What is the explanatory variable?
– Group membership (relapsing vs. progressive)
– Dichotomous variable

How would you answer this question?
– Collect a sample from each group

Results

Mean values:
– Relapsing patients = 34.6
– Progressive patients = 41.8

The progressive patients had greater production, but are we certain that there is a difference between these?
– Statistically significant
– Clinically meaningful

What is the variability in the data?

Means in two groups are the same in both experiments

Is there a difference in Experiment 1?

In Experiment 2?

Hypothesis test

[Figure: two panels, Experiment 1 and Experiment 2]

Reasons for differences between groups

Actual effect: when there is a difference between the two groups

Chance

Bias

Confounding

Statistical tests are designed to determine if the observed difference between the groups was likely due to chance

Chance experiment

Experiment: I flip a coin
– If heads, I win $1
– If tails, you win $1

What if the following happened?
– 2 heads in a row
– 5 heads in a row
– 15 heads in a row

Are you suspicious?
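How suspicious to be can be put in numbers: if the coin is fair, the chance of a run of n heads is 0.5 raised to the power n. A short sketch for the three runs in the experiment, with an illustrative variable name:

```python
# Probability of n heads in a row from a fair coin, for the runs on the slide.
run_probability = {n: 0.5 ** n for n in (2, 5, 15)}

for n, prob in run_probability.items():
    print(f"{n} heads in a row: probability {prob:.6f} (about 1 in {round(1 / prob)})")
```

Two heads in a row is unremarkable, five is already rarer than 1 in 20, and fifteen would happen by chance only about 3 times in 100,000.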

Null hypothesis

In all experiments, we have an initial belief
– In coin example, you believed that there was a 50/50 chance of heads

We always set up our null hypothesis so that we can reject the null hypothesis.

For our study, the null hypothesis is that the mean in the relapsing MS patients is the same as the mean in the progressive MS patients.

What is rare enough?

This curve is the distribution of the statistic under the null hypothesis

If the observed value is sufficiently rare under the null, we reject the null hypothesis

0.05 corresponds to a 1 out of 20 chance

[Figure: curve of the null distribution with regions labeled 0.05]

P-value

Definition: the probability of the observed result or something more extreme under the null hypothesis

If the probability of the event is sufficiently small, we say that the difference is likely not due simply to chance and we have an actual effect.

If p-value is small enough, we call the effect statistically significant
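The definition above can be applied directly to the coin example. The sketch below computes a one-sided p-value: the probability of seeing at least the observed number of heads if the coin is fair. The 9-heads-in-10-flips scenario and the function name `upper_tail_p` are assumptions made for illustration.

```python
from math import comb


def upper_tail_p(n_flips, n_heads, p_heads=0.5):
    """P(at least n_heads heads in n_flips flips) under the null value p_heads."""
    return sum(
        comb(n_flips, k) * p_heads ** k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


# Hypothetical observation: 9 heads in 10 flips.
p_value = upper_tail_p(10, 9)
print(f"one-sided p-value = {p_value:.4f}")
```

"Something more extreme" here means 9 or 10 heads; a two-sided version would also count results that are equally extreme in the direction of tails.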

What if p>0.05?

In this case, the difference between the groups is not statistically significant (at the 0.05 level).

"If two values are not significantly different, then by definition are they not identical?"
– No
– The two groups are not significantly different, but we cannot say that they are the same
– We fail to reject the null hypothesis; we do not accept that the null is true
– Bayesian statistics

Bias

Is there something in my design that led to my result?

Steps for hypothesis testing

1) State null hypothesis
2) State type of data for explanatory and outcome variable
3) Determine appropriate statistical test
4) State summary statistics if possible
5) Calculate p-value (stat package)
6) Decide whether to reject or not reject the null hypothesis
   • NEVER accept null
7) Write conclusion

Example

1) H0: mean_relapsing = mean_progressive
2) Explanatory: group membership (dichotomous)
   Outcome: cytokine production (continuous)

• What test can we use to compare a continuous outcome with a dichotomous explanatory variable?

Two sample t-test

A two sample t-test is a test for differences in means in two samples.

Assumption: Underlying population distribution is normal

The method of calculating the p-value is beyond the scope of this class, but it is easily found on-line

Can get p-value from statistical package

Results

4) mean_relapsing = 34.6, mean_progressive = 41.8
5) Calculate p-value:
   Two Sample t-test
   t = -1.19, df = 22.8, p-value = 0.25
   95 percent confidence interval: (-5.3, 19.7)
6) Fail to reject the null hypothesis because p-value is greater than 0.05
7) Conclusion: The difference between the groups is not statistically significant.
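The reported numbers can be checked against each other with a little arithmetic. The worked equations below recover the standard error implied by the t statistic and then rebuild the confidence interval. Two points are assumptions rather than statements from the slides: the critical value of about 2.07 is the usual 97.5th percentile of a t distribution with roughly 23 degrees of freedom, and the non-integer df suggests an unequal-variance (Welch) version of the test.

```latex
\[
t = \frac{\bar{x}_{\text{relapsing}} - \bar{x}_{\text{progressive}}}{\mathrm{SE}}
\quad\Longrightarrow\quad
\mathrm{SE} \approx \frac{34.6 - 41.8}{-1.19} = \frac{-7.2}{-1.19} \approx 6.05
\]
\[
\left(\bar{x}_{\text{progressive}} - \bar{x}_{\text{relapsing}}\right) \pm t^{*}\,\mathrm{SE}
\approx 7.2 \pm 2.07 \times 6.05
\approx 7.2 \pm 12.5
= (-5.3,\ 19.7)
\]
```

The interval is stated for progressive minus relapsing, while the t statistic has the sign of relapsing minus progressive. The interval contains 0, which agrees with a p-value above 0.05.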

[Figure: statistical package output, with the summary statistics and the p-value labeled]

Significant difference in experiment 1

Added variance in experiment 2 led to non-significant result

What does this mean?

Experiment 1: p<0.0001

Experiment 2: p=0.25
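The effect of added variance can be reproduced with a small calculation. In this sketch both experiments have group means of 34.6 and 41.8, as in the lecture, but the individual values and their spreads are hypothetical. The helper names `welch_t` and `make_group` are illustrative, and the statistic is the unequal-variance (Welch) form.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def make_group(center, spread):
    """Five hypothetical values centered on `center`, scaled by `spread`."""
    return [center + spread * offset for offset in (-2, -1, 0, 1, 2)]


# Same means in both experiments; only the spread differs.
t1, df1 = welch_t(make_group(34.6, 1), make_group(41.8, 1))
t2, df2 = welch_t(make_group(34.6, 10), make_group(41.8, 10))

print(f"Experiment 1 (small spread): t = {t1:.2f}, df = {df1:.1f}")
print(f"Experiment 2 (large spread): t = {t2:.2f}, df = {df2:.1f}")
```

With the same 7.2-unit difference in means, a tenfold increase in spread shrinks the t statistic tenfold, which is why the second experiment no longer reaches significance.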

Types of analysis: independent samples

Outcome         Explanatory    Analysis
Continuous      Dichotomous    t-test, Wilcoxon test
Continuous      Categorical    ANOVA, linear regression
Continuous      Continuous     Correlation, linear regression
Dichotomous     Dichotomous    Chi-square test, logistic regression
Dichotomous     Continuous     Logistic regression
Time to event   Dichotomous    Log-rank test

Example

Experimental Autoimmune Encephalomyelitis (EAE) in mice is the animal model for multiple sclerosis (MS)

The effects of various interventions are first tested in mice

A common hypothesis is that treating mice with a specific intervention will either inhibit or promote the disease

How do we measure the change in the disease?

What is the outcome?

Monkey wrench

What if underlying data is not normal?

An outcome in an EAE study is the disease grade, which is an ordinal scale

[Figure: "Disease severity scores"; x-axis: Score (0 to 4); y-axis: Frequency (0 to 7); groups: KO, Wild-type]

Wilcoxon rank sum test

Wilcoxon rank sum test is a nonparametric test that allows group comparison if
– Ordinal data
– Rank data
– Underlying data are non-normal
– Outliers

Steps for hypothesis test using a Wilcoxon test are exactly the same

Hypothesis test

1) H0: median_KO = median_Wild type
2) Predictor: dichotomous
   Outcome: ordinal
3) Test: Wilcoxon rank sum test
4) Median_KO = 1; Median_Wild type = 2
5) Calculate p-value: p = 0.19
6) Fail to reject null hypothesis
7) There is not significant evidence of a difference between the two groups
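To show what the rank sum test computes, here is a sketch that ranks the pooled scores (ties share their average rank), sums the ranks in one group, and converts the sum to a two-sided p-value with a normal approximation. The disease scores are hypothetical, so the result will not match the p = 0.19 reported above. The function names are illustrative, no continuity correction is applied, and the sketch does not handle the degenerate case where every score is identical. A statistical package may use an exact method instead.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Map each distinct value to its average rank; also return tie counts."""
    counts = Counter(values)
    ranks = {}
    position = 1
    for value in sorted(counts):
        c = counts[value]
        ranks[value] = position + (c - 1) / 2
        position += c
    return ranks, counts


def rank_sum_test(x, y):
    """Rank sum of x, its z score, and a two-sided normal-approximation p-value."""
    ranks, counts = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(ranks[v] for v in x)
    mean_w = n1 * (n + 1) / 2
    # Tie correction reduces the variance when many scores are equal.
    tie_term = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (w - mean_w) / sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical disease grades (0 to 4) for knockout and wild-type mice.
ko = [0, 1, 1, 1, 2, 2, 3]
wild_type = [1, 2, 2, 2, 3, 3, 4]

w_ko, z_ko, p_ko = rank_sum_test(ko, wild_type)
print(f"rank sum (KO) = {w_ko}, z = {z_ko:.2f}, p = {p_ko:.3f}")
```

Because only the ranks enter the calculation, the test does not depend on the numeric spacing between disease grades, which is what makes it suitable for ordinal outcomes.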

Dependent observations

Up to now we have assumed that observations are independent

What if we have related observations?
– On and off treatment on the same subject
– Left and right eye from the same subject
– Multiple observations over time

The big advantage of dependent observations is the same subject is observed under multiple conditions

Independent tests fail to account for correlation

Example

In MS patients, the intensity of areas of the brain on T1-weighted MRI is of interest to determine if there is damage

In particular, the intensity of the putamen on the left and right side of the brain was measured in 35 MS patients

We believed that there would be more significant hypointensity in the left side

There may be a difference between the groups

Are we interested just in the mean at each time point?

The difference between the time points is the outcome

Is the difference significantly different from 0?

Hypothesis test

1) H0: mean_left = mean_right
2) Paired continuous data with side as explanatory variable
3) Paired t-test
4) Mean difference = 0.063
5) p-value = 0.046
6) Since the p-value is less than 0.05, we can reject the null hypothesis
7) We conclude that the intensity is unequal in the two sides of the brain
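The paired analysis works on the within-subject differences rather than on the two sets of measurements separately. The sketch below computes the paired t statistic and, because the standard library has no t distribution, obtains a p-value by flipping the sign of each difference in every possible way. This sign-flip test is a swapped-in technique, not the package calculation behind the p = 0.046 above, and the eight left/right intensities are hypothetical (the study had 35 patients).

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

# Hypothetical left and right putamen intensities for 8 subjects.
left = [0.82, 0.79, 0.91, 0.85, 0.77, 0.88, 0.80, 0.86]
right = [0.78, 0.80, 0.84, 0.79, 0.75, 0.80, 0.81, 0.78]

# The outcome is the within-subject difference.
diffs = [l - r for l, r in zip(left, right)]
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Under H0 each difference is equally likely to be positive or negative,
# so compare the observed total with all 2**n sign assignments.
observed = abs(sum(diffs))
tolerance = 1e-12  # guards against floating-point rounding in the comparison
as_extreme = sum(
    1
    for signs in product((1, -1), repeat=n)
    if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - tolerance
)
p_perm = as_extreme / 2 ** n

print(f"mean difference = {mean(diffs):.3f}, t = {t_stat:.2f}, p = {p_perm:.3f}")
```

Analyzing the differences removes the subject-to-subject variation in overall intensity, which is the source of variability that an independent-samples test would fail to account for.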

Types of analysis: dependent samples

Outcome       Predictor      Analysis
Continuous    Dichotomous    Paired t-test, Wilcoxon signed rank test
Continuous    Categorical    Repeated measures ANOVA
Continuous    Continuous     Mixed model
Dichotomous   Dichotomous    McNemar's test
Dichotomous   Continuous     Repeated measures logistic regression

Other dependent samples

Continuous outcome/categorical explanatory variable
– Subject is measured under three conditions
– Subject is measured at three time points

Each dot represents an observation for a mouse at each of the markers

There was a negative control in this experiment (Group = 0)

What should we do?

What is the hypothesis?
– Is the expression of any of the markers different than the control?

Repeated measures ANOVA/mixed model
– Can proceed with normal hypothesis test

Must always think about assumptions of model
– Do we have equal variance?

Consult a statistician

Why use dependent samples?

Sometimes it is required based on the study

Often can increase power depending on the outcome because one major source of variability is accounted for
– Changes over time

Consult a statistician if you want to determine the best study design

Helpful website

http://www.ats.ucla.edu/stat/stata/whatstat/default.htm

Shows how to complete many of these analyses in various statistical packages

What we learned (hopefully)

Using your outcome and predictor to determine the correct analysis

p-value

T-test

Wilcoxon test