nemours biomedical research statistics april 16, 2009 tim bunnell, ph.d. & jobayer hossain,...

34
Nemours Biomedical Research Statistics April 16, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Post on 19-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Nemours Biomedical Research

Statistics

April 16, 2009Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.

Nemours Bioinformatics Core Facility

Nemours Biomedical Research

Experimental Design Terminology

• An Experimental Unit is the entity on which measurement or an

observation is made. For example, subjects are experimental units

in most clinical studies.

• Homogeneous Experimental Units: Units that are as uniform as

possible on all characteristics that could affect the response.

• Randomization is the process of assigning experimental units

randomly to different experimental groups.

• Replication is the repetition of an entire experiment or portion of an experiment under two or more sets of conditions.

Nemours Biomedical Research

Experimental Design Terminology

• A Factor is a controllable independent variable that is being

investigated to determine its effect on a response. E.g.

treatment group is a factor.

• Factors can be fixed or random

– Fixed -- the factor can take on a discrete number of values

and these are the only values of interest.

– Random -- the factor can take on a wide range of values

and one wants to generalize from specific values to all

possible values.

• Each specific value of a factor is called a level. E.g. treatment

group: A, B, and placebo. Then all these are three levels.

Nemours Biomedical Research

Experimental Design Terminology

• Effect is the change in the average response

between two factor levels.

• That is, factor effect = average response at one level

– average response at a second level.

Nemours Biomedical Research

Experimental Design Terminology

• Interaction is the joint factor effects in which the effect of

one factor depends on the levels of the other factors.

No interaction effect of factor A and B

Interaction effect of factor A and B

Nemours Biomedical Research

Analysis of Variance (ANOVA)

• The analysis of variance (ANOVA) is a technique of decomposing the total

variability of a response variable into:

• Variability due to the experimental factor(s) and…

• Variability due to error (i.e., factors that are not accounted for in the

experimental design).

• The basic purpose of ANOVA is to test the equality of several means.

• A fixed effect model includes only fixed factors in the model.

• A random effect model includes only random factors in the model.

• A mixed effect model includes both fixed and random factors in the model.

Nemours Biomedical Research

The basic ANOVA situation

Type of variables: Quantitative response and Categorical (factor) predictors (independent variable).

Main Question: Are mean response measures of different groups are equal?

One categorical variable with only 2 levels (groups): 2-sample t-test

One categorical variable with more than two levels (groups): One way ANOVA

Two or more categorical variable, each with at least two or more levels (groups) of each:

Factorial ANOVA

Nemours Biomedical Research

Graphical Investigation

Graphical investigation:

• side-by-side box plots

• multiple histograms

Side by Side Boxplots

Nemours Biomedical Research

One-way analysis of Variance

• One factor of k levels or groups. E.g., 3 treatment groups in our

default data

• Total variation of observations (SST) can be split in two components:

variation between groups (SSG) and variation within groups (SSE).

• Variation between groups is due to the difference in different groups.

E.g. different treatment groups or different doses of the same

treatment.

• Variation within groups is the inherent variation among the

observations within each group.

• Completely randomized design (CRD) is an example of one-way

analysis of variance.

Nemours Biomedical Research

One-way analysis of Variance

• Model:– yij = µ + ai + eij

– Where yij is the ith observation of the jth group– ai is the effect of the ith group– µ is the grand mean and eij is the error.

• Assumptions:– Observations yij are independent.

– eij are normally distributed with mean zero and constant standard deviation.

– The second assumption implies that response variable for each group is normal (Check using q-q plot, histogram, or test for normality) and standard deviations for all groups are equal (rule of thumb: ratio of largest to smallest are approximately 2:1).

Nemours Biomedical Research

One-way analysis of Variance

• Hypothesis:Ho: Means of all groups are equal.Ha: At least one of them is not equal to other.

– doesn’t say how or which ones differ.– Can follow up with “multiple comparisons”

• ANOVA Table for one way classified data

Sources of Variation

Sum of Squares

df Mean Sum of Squares

F-Ratio

Group SSG k-1 MSG=SSG/k-1 F=MSG/MSE

Error SSE n-k MSE=SSE/n-k

Total SST n-1

Note: Large F means that MSG is large compared to MSE

Nemours Biomedical Research

Summary One-way ANOVA

Significance of the differences between the groups depends on

• the difference in the means

• the standard deviations of each group

• the sample sizes

• A useful web (thanks Bette for sending this website for the class):

www.psych.utah.edu/stat/introstats/anovaflash.html

ANOVA determines P-value from the F statistic

Nemours Biomedical Research

Multiple comparisons

• If the F test is significant in ANOVA table, then we intend to find the pairs of groups are significantly different. Following are the commonly used procedures:

– Fisher’s Least Significant Difference (LSD)

– Tukey’s HSD method

– Bonferroni’s method

– Scheffe’s method

– Dunn’s multiple-comparison procedure

– Dunnett’s Procedure

Nemours Biomedical Research

One-way ANOVA – Rcmdr Demo

• Make sure that group variables are in factor form.

• Data->Manage variables in active data set -> Convert numeric

variables to factor -> pick variables, select supply level names or Use

numbers for Factor levels, Give a new name or leave as the same

name.

• To run one-way ANOVA:

• Statistics -> Means-> One-way ANOVA -> Model Name, pick

response variable (e.g, PLUC.post), pick group variable (e.g. grp),

select pairwise comparisons of means

Nemours Biomedical Research

One-way ANOVA output: Pluc.pre on treatment groups

> Model1 <- aov(PLUC.post ~ grp, data=data)

> summary(Model1)

Df Sum Sq Mean Sq F value Pr(>F)

grp 2 157.282 78.641 18.844 5.225e-07 ***

Residuals 57 237.880 4.173

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> numSummary(data$PLUC.post , groups=data$grp, statistics=c("mean", "sd"))

mean sd n

1 10.15657 2.227334 20

2 12.15339 1.874268 20

3 14.12242 2.011497 20

Nemours Biomedical Research

One-way ANOVA output: Pluc.pre on treatment groups

> .Pairs <- glht(Model1, linfct = mcp(grp = "Tukey"))

> confint(.Pairs)

Simultaneous Confidence Intervals

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = PLUC.post ~ grp, data = data)

Estimated Quantile = 2.4064

95% family-wise confidence level

Linear Hypotheses:

Estimate lwr upr

2 - 1 == 0 1.9968 0.4422 3.5514

3 - 1 == 0 3.9658 2.4113 5.5204

3 - 2 == 0 1.9690 0.4144 3.5236

Nemours Biomedical Research

One-way ANOVA output: Pluc.pre on treatment groups

1 2 3 4 5

3 - 2

3 - 1

2 - 1 (

(

(

)

)

)

95% family-wise confidence level

Linear Function

Nemours Biomedical Research

Analysis of variance of factorial experiment (Two or

more factors)• Factorial experiment:

– The effects of the two or more factors including their

interactions are investigated simultaneously.

– For example, consider two factors A and B. Then total

variation of the response will be split into variation for A,

variation for B, variation for their interaction AB, and variation

due to error.

Nemours Biomedical Research

Analysis of variance of factorial experiment (Two or

more factors)• Model with two factors (A, B) and their interactions:

• Assumptions: The same as in One-way ANOVA.

Nemours Biomedical Research

Analysis of variance of factorial experiment (Two or

more factors)• Null Hypotheses:

• Hoa: Means of all groups of the factor A are equal.

• Hob: Means of all groups of the factor B are equal.

• Hoab:(αβ)ij = 0, i. e. two factors A and B are independent

Nemours Biomedical Research

Two Factor ANOVA

• ANOVA for two factors A and B with their interaction AB.

Kpr-1SSTTotal

MSE=SSE/

kp(r-1)

kp(r -1)SSEError

MSAB/MSEMSAB=SSAB/ (k-1)(p-1)

(k-1)(p-1)SSABInteraction Effect AB

MSB/MSEMSB=SSB/p -1P-1SSBMain Effect B

MSA/MSEMSA=SSA/k -1k-1SSAMain Effect A

F-RatioMean Sum of Squares

dfSum of Squares

Sources of Variation

Kpr-1SSTTotal

MSE=SSE/

kp(r-1)

kp(r -1)SSEError

MSAB/MSEMSAB=SSAB/ (k-1)(p-1)

(k-1)(p-1)SSABInteraction Effect AB

MSB/MSEMSB=SSB/p -1P-1SSBMain Effect B

MSA/MSEMSA=SSA/k -1k-1SSAMain Effect A

F-RatioMean Sum of Squares

dfSum of Squares

Sources of Variation

Nemours Biomedical Research

Two-factor ANOVA - Rcmdr Demo

Statistics -> Means-> One-way ANOVA -> Model Name, pick response variable (e.g, PLUC.post), pick group variable (e.g. grp), select pairwise comparisons of means

Nemours Biomedical Research

Two- way ANOVA output: Pluc.pre on treatment groups

and pedModel2 <- (lm(PLUC.post ~ grp*Ped, data=data))

> Anova(Model2)

Anova Table (Type II tests)

Response: PLUC.post

Sum Sq Df F value Pr(>F)

grp 157.282 2 19.2016 5.024e-07 ***

Ped 0.842 1 0.2055 0.6521

grp:Ped 15.880 2 1.9387 0.1538

Residuals 221.159 54

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), mean, na.rm=TRUE)

+ # means

Ped

grp 1 2

1 9.591563 10.72158

2 12.397521 11.90926

3 14.798619 13.44621

Nemours Biomedical Research

> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), sd, na.rm=TRUE)

+ # std. deviations

Ped

grp 1 2

1 2.656100 1.645896

2 1.850990 1.964046

3 2.034403 1.840354

> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), function(x)

+ sum(!is.na(x))) # counts

Ped

grp 1 2

1 10 10

2 10 10

3 10 10

Two- way ANOVA output: Pluc.pre on treatment groups

and ped

Nemours Biomedical Research

Repeated Measures

• The term repeated measures refers to data sets with

multiple measurements of a response variable on the same

experimental unit or subject.

• In repeated measurements designs, we are often concerned

with two types of variability:

– Between-subjects - Variability associated with different groups of subjects

who are treated differently (equivalent to between groups effects in one-

way ANOVA)

– Within-subjects - Variability associated with measurements made on an

individual subject.

Nemours Biomedical Research

Repeated Measures

• Examples of Repeated Measures designs:

A. Two groups of subjects treated with different drugs for

whom responses are measured at six-hour increments for

24 hours. Here, DRUG treatment is the between-subjects

factor and TIME is the within-subjects factor.

B. Students in three different statistics classes (taught by

different instructors) are given a test with five problems and

scores on each problem are recorded separately. Here

CLASS is a between-subjects factor and PROBLEM is a

within-subjects factor.

Nemours Biomedical Research

Repeated Measures

• When measures are made over time as in example A we want to assess: how the dependent measure changes over time independent of treatment

(i.e. the main effect of time) how treatments differ independent of time (i.e., the main effect of

treatment) how treatment effects differ at different times (i.e. the treatment by time

interaction).

• Repeated measures require special treatment because: Observations made on the same subject are not independent of each

other. Adjacent observations in time are likely to be more correlated than non-

adjacent observations

Nemours Biomedical Research

Res

pons

e

Time

Repeated Measures

Nemours Biomedical Research

Repeated Measures

• Methods of repeated measures ANOVA

Univariate - Uses a single outcome measure.

Multivariate - Uses multiple outcome measures.

Mixed Model Analysis - One or more factors (other than subject)

are random effects.

• We will discuss only univariate approach

Nemours Biomedical Research

Repeated Measures

• Assumptions:

Subjects are independent.

The repeated observations for each subject follows a

multivariate normal distribution

The correlation between any pair of within subjects

levels are equal. This assumption is known as

sphericity.

Nemours Biomedical Research

Repeated Measures

• Test for Sphericity:

Mauchley’s test

• Violation of sphericity assumption leads to inflated F statistics and hence inflated

type I error.

• Three common corrections for violation of sphericity:

Greenhouse-Geisser correction

Huynh-Feldt correction

Lower Bound correction

• All these three methods adjust the degrees of freedom using a correction factor

called Epsilon.

• Epsilon lies between 1/k-1 to 1, where k is the number of levels in the within

subject factor.

Nemours Biomedical Research

Repeated Measures - R Demo

• The following R commands will arrange the data in default.csv for the ANOVA of repeated measures (you can also use R functions gl() for this manipulation), -

attach(x)

sid1<- c(sid,sid) grp1<-c(grp,grp)plc<- c(PLUC.pre, Pluc.post)time<-c(rep(1,60),rep(2,60))

• The following code is for repeated measures ANOVAsummary(aov(plc~factor(grp1)*factor(time) +

Error(factor(sid1))))

Nemours Biomedical Research

Repeated Measures R output:

Error: factor(sid1) Df Sum Sq Mean Sq F value Pr(>F) factor(grp1) 2 121.640 60.820 16.328 2.476e-06 ***Residuals 57 212.313 3.725 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(time) 1 172.952 172.952 33.8676 2.835e-07 ***factor(grp1):factor(time) 2 46.818 23.409 4.5839 0.01426 * Residuals 57 291.083 5.107 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Nemours Biomedical Research

Thank you