one-way anova one-way analysis of variance. “did you wash your hands?” a student decided to...

50
One-Way ANOVA One-Way ANOVA One-Way Analysis of One-Way Analysis of Variance Variance

Upload: teresa-anthony

Post on 04-Jan-2016

219 views

Category:

Documents


0 download

TRANSCRIPT

One-Way ANOVAOne-Way ANOVA

One-Way Analysis of VarianceOne-Way Analysis of Variance

““Did you wash your hands?”Did you wash your hands?”

A student decided to investigate just A student decided to investigate just how effective washing with soap is in how effective washing with soap is in eliminating bacteria.eliminating bacteria.

She tested 4 different methods – She tested 4 different methods – washing with water only, washing washing with water only, washing with regular soap, washing with with regular soap, washing with antibacterial soap, and spraying antibacterial soap, and spraying hands with antibacterial spray.hands with antibacterial spray.

This experiment consisted of one This experiment consisted of one factor (washing method) at 4 levels.factor (washing method) at 4 levels.

““Did you wash your hands?”Did you wash your hands?”

““Did you wash your hands?”Did you wash your hands?”

Ho: All the group means are equal.Ho: All the group means are equal. We want to see whether differences We want to see whether differences

as large as those observed in the as large as those observed in the experiment could naturally occur experiment could naturally occur even if the groups actually have even if the groups actually have equal means.equal means.

Look at the following two sets of Look at the following two sets of boxplots. Are the means different boxplots. Are the means different enough to reject Ho?enough to reject Ho?

““Did you wash your hands?”Did you wash your hands?”

““Did you wash your hands?”Did you wash your hands?”

Believe it or not, the means in both Believe it or not, the means in both sets are the same. (Look carefully at sets are the same. (Look carefully at the values of the medians in the the values of the medians in the boxplots.)boxplots.)

But why do they look so different? But why do they look so different? In the 2In the 2ndnd figure, the variation figure, the variation withinwithin

each group is so small that the each group is so small that the differences differences betweenbetween the means the means stand out.stand out.

One-Way ANOVAOne-Way ANOVA

The one-way analysis of variance is used The one-way analysis of variance is used to test the claim that three or more to test the claim that three or more population means are equalpopulation means are equal

This is an extension of the two This is an extension of the two independent samples t-testindependent samples t-test

One-Way ANOVAOne-Way ANOVA

The The responseresponse variable is the variable variable is the variable you’re comparingyou’re comparing

The The factorfactor variable is the categorical variable is the categorical variable being used to define the groupsvariable being used to define the groups We will assume We will assume kk samples (groups) samples (groups)

The The one-wayone-way is because each value is is because each value is classified in exactly one wayclassified in exactly one way Examples include comparisons by gender, Examples include comparisons by gender,

race, political party, color, etc.race, political party, color, etc.

One-Way ANOVAOne-Way ANOVA

Conditions or AssumptionsConditions or Assumptions The data are randomly sampledThe data are randomly sampled The variances of each sample are assumed The variances of each sample are assumed

equalequal The residuals are normally distributedThe residuals are normally distributed

One-Way ANOVAOne-Way ANOVA

The null hypothesis is that the means are all The null hypothesis is that the means are all equalequal

The alternative hypothesis is that at least one The alternative hypothesis is that at least one of the means is differentof the means is different Think about the Sesame StreetThink about the Sesame Street®® game where game where

three of these things are kind of the same, but one three of these things are kind of the same, but one of these things is not like the other. They don’t all of these things is not like the other. They don’t all have to be different, just one of them.have to be different, just one of them.

kOH μμμμ ==== . . . : 321

One-Way ANOVAOne-Way ANOVA

The statistics classroom is divided into The statistics classroom is divided into three rows: front, middle, and backthree rows: front, middle, and back

The instructor noticed that the further the The instructor noticed that the further the students were from him, the more likely students were from him, the more likely they were to miss class or use an instant they were to miss class or use an instant messenger during classmessenger during class

He wanted to see if the students further He wanted to see if the students further away did worse on the examsaway did worse on the exams

One-Way ANOVAOne-Way ANOVA

The ANOVA doesn’t test that one mean is less The ANOVA doesn’t test that one mean is less than another, only whether they’re all equal or than another, only whether they’re all equal or at least one is different.at least one is different.

BMOH μμμ ==F :

One-Way ANOVAOne-Way ANOVA

A random sample of the students in each A random sample of the students in each row was takenrow was taken

The score for those students on the The score for those students on the second exam was recordedsecond exam was recorded Front:Front: 82, 83, 97, 93, 55, 67, 5382, 83, 97, 93, 55, 67, 53 Middle:Middle: 83, 78, 68, 61, 77, 54, 69, 51, 6383, 78, 68, 61, 77, 54, 69, 51, 63 Back:Back: 38, 59, 55, 66, 45, 52, 52, 6138, 59, 55, 66, 45, 52, 52, 61

One-Way ANOVAOne-Way ANOVA

The summary statistics for the grades of each row The summary statistics for the grades of each row are shown in the table beloware shown in the table below

RowRow FrontFront MiddleMiddle BackBack

Sample sizeSample size 77 99 88

MeanMean 75.7175.71 67.1167.11 53.5053.50

St. DevSt. Dev 17.6317.63 10.9510.95 8.968.96

VarianceVariance 310.90310.90 119.86119.86 80.2980.29

One-Way ANOVAOne-Way ANOVA

VariationVariation Variation is the sum of the squares of the Variation is the sum of the squares of the

deviations between a value and the mean of deviations between a value and the mean of the valuethe value

Sum of Squares is abbreviated by SS and Sum of Squares is abbreviated by SS and often followed by a variable in parentheses often followed by a variable in parentheses such as SS(B) or SS(W) so we know which such as SS(B) or SS(W) so we know which sum of squares we’re talking aboutsum of squares we’re talking about

One-Way ANOVAOne-Way ANOVA

Are all of the values identical?Are all of the values identical? No, so there is some variation in the dataNo, so there is some variation in the data This is called the total variationThis is called the total variation Denoted SS(Total) for the total Sum of Denoted SS(Total) for the total Sum of

Squares (variation)Squares (variation) Sum of Squares is another name for variationSum of Squares is another name for variation

One-Way ANOVAOne-Way ANOVA

Are all of the sample means identical?Are all of the sample means identical? No, so there is some variation between the No, so there is some variation between the

groupsgroups This is called the This is called the between groupbetween group variation variation Sometimes called the variation due to the Sometimes called the variation due to the

factor (this is what your calculator calls it)factor (this is what your calculator calls it) Denoted SS(B) for Sum of Squares (variation) Denoted SS(B) for Sum of Squares (variation)

between the groupsbetween the groups

One-Way ANOVAOne-Way ANOVA

Are each of the values within each group Are each of the values within each group identical?identical? No, there is some variation within the groupsNo, there is some variation within the groups This is called the This is called the within groupwithin group variation variation Sometimes called the error variation (which is Sometimes called the error variation (which is

what your calculator calls it)what your calculator calls it) Denoted SS(W) for Sum of Squares Denoted SS(W) for Sum of Squares

(variation) within the groups(variation) within the groups

One-Way ANOVAOne-Way ANOVA

There are two sources of variationThere are two sources of variation the variation between the groups, SS(B), or the variation between the groups, SS(B), or

the variation due to the factorthe variation due to the factor the variation within the groups, SS(W), or the the variation within the groups, SS(W), or the

variation that can’t be explained by the factor variation that can’t be explained by the factor so it’s called the error variationso it’s called the error variation

One-Way ANOVAOne-Way ANOVA

Here is the basic one-way ANOVA tableHere is the basic one-way ANOVA table

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween

WithinWithin

TotalTotal

One-Way ANOVAOne-Way ANOVA

Grand MeanGrand Mean The grand mean is the average of all the The grand mean is the average of all the

values when the factor is ignoredvalues when the factor is ignored It is a weighted average of the individual It is a weighted average of the individual

sample meanssample means

x =

nix ii=1

k

ni

i=1

k

k

kk

nnn

xnxnxnx

++++++

= . . . . . .

21

2211

One-Way ANOVAOne-Way ANOVA

Grand Mean for our example is 65.08Grand Mean for our example is 65.08

x =7(75.71) + 9(67.11) + 8(53.50)

7 + 9 + 8

x =1562

24x = 65.08

One-Way ANOVAOne-Way ANOVA

Between GroupBetween Group Variation, SS(B) Variation, SS(B) The The Between GroupBetween Group Variation is the variation Variation is the variation

between each sample mean and the grand between each sample mean and the grand meanmean

Each individual variation is weighted by the Each individual variation is weighted by the sample sizesample size

SS(B) = ni x i − x ( )2

i=1

k

SS(B) = n1 x 1 − x ( )2

+ n2 x 2 − x ( )2

+ ... + nk x k − x ( )2

One-Way ANOVAOne-Way ANOVA

The The Between GroupBetween Group Variation for our example is Variation for our example is SS(B)=1902SS(B)=1902

(Ok, so it doesn’t really round to 1902, but if I (Ok, so it doesn’t really round to 1902, but if I hadn’t rounded everything from the beginning it hadn’t rounded everything from the beginning it would have. Bear with me.)would have. Bear with me.)

SS(B) = 7 75.71− 65.08( )2

+ 9 67.11− 65.08( )2

+ 8 53.50 − 65.08( )2

SS(B) =1900.8376 ≈1902

One-Way ANOVAOne-Way ANOVA

Within Group Variation, SS(W)Within Group Variation, SS(W) The The Within GroupWithin Group Variation is the weighted total of Variation is the weighted total of

the individual variationsthe individual variations The weighting is done with the degrees of freedomThe weighting is done with the degrees of freedom The df for each sample is one less than the sample The df for each sample is one less than the sample

size for that sample.size for that sample.

One-Way ANOVAOne-Way ANOVA

Within GroupWithin Group Variation Variation

SS(W ) = df isi2

i=1

k

SS(W ) = df1s12 + df2s2

2 + ...+ dfksk2

One-Way ANOVAOne-Way ANOVA

The The within groupwithin group variation for our example variation for our example is 3386is 3386

SS(W ) = 6(310.90) + 8(119.86) + 7(80.29)

SS(W ) = 3386.31 ≈ 3386

One-Way ANOVAOne-Way ANOVA

After filling in the sum of squares, we have …After filling in the sum of squares, we have …

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902

WithinWithin 33863386

TotalTotal 52885288

One-Way ANOVAOne-Way ANOVA

Degrees of FreedomDegrees of Freedom The The between groupbetween group df is one less than the df is one less than the

number of groupsnumber of groups We have three groups, so df(B) = 2We have three groups, so df(B) = 2

The The within groupwithin group df is the sum of the individual df is the sum of the individual df’s of each groupdf’s of each group The sample sizes are 7, 9, and 8The sample sizes are 7, 9, and 8 df(W) = 6 + 8 + 7 = 21df(W) = 6 + 8 + 7 = 21

The The totaltotal df is one less than the sample size df is one less than the sample size df(Total) = 24 – 1 = 23df(Total) = 24 – 1 = 23

One-Way ANOVAOne-Way ANOVA

Filling in the degrees of freedom gives this …Filling in the degrees of freedom gives this …

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902 22

WithinWithin 33863386 2121

TotalTotal 52885288 2323

One-Way ANOVAOne-Way ANOVA

VariancesVariances The variances are also called the Mean of the The variances are also called the Mean of the

Squares and abbreviated by MS, often with an Squares and abbreviated by MS, often with an accompanying variable MS(B) or MS(W)accompanying variable MS(B) or MS(W)

They are an average squared deviation from the They are an average squared deviation from the mean and are found by dividing the variation by the mean and are found by dividing the variation by the degrees of freedomdegrees of freedom

MS = SS / dfMS = SS / df

Variance =Variation

df

One-Way ANOVAOne-Way ANOVA

MS(B)MS(B) = 1902 / 2= 1902 / 2 = 951.0= 951.0 MS(W)MS(W) = 3386 / 21= 3386 / 21 = 161.2= 161.2 MS(T)MS(T) = 5288 / 23= 5288 / 23 = 229.9= 229.9

Notice that the MS(Total) is NOT the sum of Notice that the MS(Total) is NOT the sum of MS(Between) and MS(Within).MS(Between) and MS(Within).

This works for the sum of squares SS(Total), This works for the sum of squares SS(Total), but not the mean square MS(Total)but not the mean square MS(Total)

The MS(Total) isn’t usually shown, as it isn’t The MS(Total) isn’t usually shown, as it isn’t used in the analysis.used in the analysis.

One-Way ANOVAOne-Way ANOVA

Completing the MS gives …Completing the MS gives …

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902 22 951.0951.0

WithinWithin 33863386 2121 161.2161.2

TotalTotal 52885288 2323 229.9229.9

One-Way ANOVAOne-Way ANOVA

Special VariancesSpecial Variances The MS(Within) is also known as the pooled The MS(Within) is also known as the pooled

estimate of the variance since it is a weighted estimate of the variance since it is a weighted average of the individual variancesaverage of the individual variances Sometimes abbreviated Sometimes abbreviated

The MS(Total) is the variance of the response The MS(Total) is the variance of the response variable.variable. Not technically part of ANOVA table, but useful none the Not technically part of ANOVA table, but useful none the

lessless

2ps

One-Way ANOVAOne-Way ANOVA

F test statisticF test statistic An F test statistic is the ratio of two sample An F test statistic is the ratio of two sample

variancesvariances The MS(B) and MS(W) are two sample The MS(B) and MS(W) are two sample

variances and that’s what we divide to find F.variances and that’s what we divide to find F. F = MS(B) / MS(W)F = MS(B) / MS(W)

For our data, F = 951.0 / 161.2 = 5.9For our data, F = 951.0 / 161.2 = 5.9

One-Way ANOVAOne-Way ANOVA

Adding F to the table …Adding F to the table …

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902 22 951.0951.0 5.95.9

WithinWithin 33863386 2121 161.2161.2

TotalTotal 52885288 2323 229.9229.9

The The FF Distribution Distribution Named for Sir Ronald Fisher.Named for Sir Ronald Fisher. The F-models depend on the two df The F-models depend on the two df

for the two groups.for the two groups.

One-Way ANOVAOne-Way ANOVA

The F test is a right tail testThe F test is a right tail test The F test statistic has an F distribution The F test statistic has an F distribution

with df(B) numerator df and df(W) with df(B) numerator df and df(W) denominator dfdenominator df

The p-value is the area to the right of the The p-value is the area to the right of the test statistictest statistic

P(FP(F2,212,21 > 5.9) = Fcdf(5.9,999,2,21) = .0093 > 5.9) = Fcdf(5.9,999,2,21) = .0093

One-Way ANOVAOne-Way ANOVA

Completing the table with the p-valueCompleting the table with the p-value

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902 22 951.0951.0 5.95.9 0.0090.009

WithinWithin 33863386 2121 161.2161.2

TotalTotal 52885288 2323 229.9229.9

One-Way ANOVAOne-Way ANOVA

The p-value is 0.009, which is less than The p-value is 0.009, which is less than the significance level of 0.05, so we reject the significance level of 0.05, so we reject the null hypothesis.the null hypothesis.

The null hypothesis is that the means of The null hypothesis is that the means of the three rows in class were the same, but the three rows in class were the same, but we reject that, so at least one row has a we reject that, so at least one row has a different mean.different mean.

One-Way ANOVAOne-Way ANOVA

There is enough evidence to support the There is enough evidence to support the claim that there is a difference in the mean claim that there is a difference in the mean scores of the front, middle, and back rows scores of the front, middle, and back rows in class.in class.

The ANOVA doesn’t tell which row is The ANOVA doesn’t tell which row is different, you would need to look at different, you would need to look at confidence intervals or run post hoc tests confidence intervals or run post hoc tests to determine thatto determine that

Let’s Practice…Let’s Practice…

Fill in an ANOVA table.

ANOVA on TI-83ANOVA on TI-83

1)1) Enter the data into LEnter the data into L11, L, L22, L, L33, etc., etc.2)2) Press Press STATSTAT and move the cursor to and move the cursor to

TESTSTESTS3)3) Select choice F:Select choice F: ANOVA(ANOVA(4)4) Type each list followed by a comma. End Type each list followed by a comma. End

with ) and press with ) and press ENTER.ENTER.

Front:Front: 82, 83, 97, 93, 55, 67, 5382, 83, 97, 93, 55, 67, 53 Middle:Middle: 83, 78, 68, 61, 77, 54, 69, 51, 6383, 78, 68, 61, 77, 54, 69, 51, 63 Back:Back: 38, 59, 55, 66, 45, 52, 52, 6138, 59, 55, 66, 45, 52, 52, 61

After You Reject HoAfter You Reject Ho

When the null hypothesis is rejected, When the null hypothesis is rejected, we may want to know where the we may want to know where the difference among the means is.difference among the means is.

The are several procedures to do this. The are several procedures to do this. Among the most commonly used tests:Among the most commonly used tests: Scheffe test – use when the samples differ Scheffe test – use when the samples differ

in sizein size Tukey test – use when the samples are Tukey test – use when the samples are

equal in sizeequal in size

Scheffe TestScheffe Test Compare the means two at a time, Compare the means two at a time,

using all possible combinations of using all possible combinations of means.means.

To find the critical value To find the critical value F’F’ for the for the Scheffe test:Scheffe test:

⎟⎟⎠

⎞⎜⎜⎝

⎛+

−=

jiW

jiS

nns

XXF

11

)(

2

2

)C.V.(*)(' BdfF =

One-Way ANOVAOne-Way ANOVA

Summary statistics for the grades of each row:Summary statistics for the grades of each row:

RowRow FrontFront MiddleMiddle BackBack

Sample sizeSample size 77 99 88

MeanMean 75.7175.71 67.1167.11 53.5053.50

St. DevSt. Dev 17.6317.63 10.9510.95 8.968.96

VarianceVariance 310.90310.90 119.86119.86 80.2980.29

One-Way ANOVAOne-Way ANOVA

SourceSource SSSS dfdf MSMS FF pp

BetweenBetween 19021902 22 951.0951.0 5.95.9 0.0090.009

WithinWithin 33863386 2121 161.2161.2

TotalTotal 52885288 2323 229.9229.9

Scheffe TestScheffe Test

42.11

8

1

7

12.161

)50.5371.75(

11

)(

: versusFor

87.4

8

1

9

12.161

)50.5311.67(

11

)(

: versusFor

81.1

9

1

7

12.161

)11.6771.75(

11

)(

: versusFor

2

2

2

2

2

2

2

2

2

=⎟⎠

⎞⎜⎝

⎛ +

−=

⎟⎟⎠

⎞⎜⎜⎝

⎛+

−=

=⎟⎠

⎞⎜⎝

⎛ +

−=

⎟⎟⎠

⎞⎜⎜⎝

⎛+

−=

=⎟⎠

⎞⎜⎝

⎛ +

−=

⎟⎟⎠

⎞⎜⎜⎝

⎛+

−=

BFW

BFS

BF

BMW

BMS

BM

MFW

MFS

MF

nns

XXF

XX

nns

XXF

XX

nns

XXF

XX

Scheffe TestScheffe Test

Use F-Table to find the critical value (C.V.)Use F-Table to find the critical value (C.V.) alpha level = 0.05alpha level = 0.05

df(B)=2df(B)=2df(W)=21df(W)=21

F’ F’ = df(B)*(C.V.) = (2)*(3.47) = 6.94= df(B)*(C.V.) = (2)*(3.47) = 6.94

Scheffe TestScheffe Test

The only The only FF test value that is greater than test value that is greater than the critical value (6.94) is for the “front the critical value (6.94) is for the “front versus back” comparison.versus back” comparison.

Thus, the only significant difference is Thus, the only significant difference is between the mean exam score for the between the mean exam score for the front row and the mean exam score for the front row and the mean exam score for the back row.back row.