Copyright © 2012, 2008, 2005 Pearson Education, Inc.

Chapter 29

Multifactor Analysis of Variance


Two Factors at Once?!

Unlike the experiments we have seen thus far, the dart-throwing experiment mentioned in the text has not one but two factors (distance and hand used).

Two factors do not confuse things; rather, they improve the experiment and the analysis.

With two factors we have two hypothesis tests. Each test asks whether the mean of the response variable is the same across all treatment levels of that factor.

Back in Chapter 13, we considered ways to remove or avoid extra variation in designing experiments. A two-factor experiment does just that.

Slide 29 - 3


An ANOVA Model

A model for one-way ANOVA represented each observation as the sum of simple components. It broke each observation into the sum of three effects: the grand mean, the treatment effect, and an error:

$$y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$

Now we have two factors, Hand and Distance. Our model should reflect the effects of both factors. Now we write:

$$y_{ijk} = \mu + \tau_j + \gamma_k + \varepsilon_{ijk}$$

Slide 29 - 4


An ANOVA Model (cont)

When you write out a model, you’ll usually find it clearer to name the factors rather than using Greek letters:

$$y_{ijk} = \mu + \text{Hand effect}_j + \text{Distance effect}_k + \text{Error}_{ijk}$$

In general, we would like to know whether the mean dart accuracy changes with changes in the levels of either factor. Our null hypothesis on each factor is that the effects of that treatment are all zero. Writing $\tau_j$ for the Hand effects and $\gamma_k$ for the Distance effects:

$$H_0\colon \tau_1 = \tau_2 \quad \text{and} \quad H_0\colon \gamma_1 = \gamma_2 = \gamma_3$$

Slide 29 - 5


An ANOVA Model (cont)

The alternative hypotheses are that the treatment effects are not all equal.

We want to compare the differences across the treatment effects with the underlying variability within the treatment.

Now there are two factors and the underlying variability has the effect of both factors removed from it.

The error term holds the variability that’s left after removing the effects of both factors.
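The decomposition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the accuracy scores are made up (they are not the data from the text), only two distance levels are used to keep it short, the design is balanced, and the helper name `level_effects` is hypothetical.

```python
from statistics import mean

# Made-up accuracy scores (not from the text): (hand, distance) -> two throws each.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "far"): [2.0, 3.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "far"): [7.0, 8.0],
}


def level_effects(data, position, grand_mean):
    """Effect of each level = mean of all throws at that level minus the grand mean."""
    pooled = {}
    for key, throws in data.items():
        pooled.setdefault(key[position], []).extend(throws)
    return {level: mean(ys) - grand_mean for level, ys in pooled.items()}


grand_mean = mean(y for throws in data.values() for y in throws)
hand_effect = level_effects(data, 0, grand_mean)
distance_effect = level_effects(data, 1, grand_mean)

# Residual = observation minus the additive fit (grand mean + both effects).
residuals = {
    (hand, dist): [
        y - (grand_mean + hand_effect[hand] + distance_effect[dist]) for y in throws
    ]
    for (hand, dist), throws in data.items()
}

print("grand mean:", grand_mean)
print("hand effects:", hand_effect)
print("distance effects:", distance_effect)
print("residuals:", residuals)
```

Each observation is then exactly the grand mean plus its Hand effect plus its Distance effect plus its residual, and the residuals are what the error term of the model has to absorb.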

Slide 29 - 6


Assumptions and Conditions

Plot the Data …

Start by using side-by-side boxplots across levels of each factor.

Look for outliers and correct them or omit them if you can’t correct them.

The problem with looking at the boxplots is that the responses at each level of the factor contain all levels of the other factor.

A better alternative would be to make boxplots for each factor level after removing the effects of the other factor.

We could compute a one-way ANOVA on one factor and find the residuals. Then, make boxplots of those residuals for each level of the other factor. We might call this display a partial boxplot.
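The values behind a partial boxplot can be sketched as follows. The scores are made up for illustration, the function names are hypothetical, and the range (max minus min) is used here only as a crude stand-in for the box lengths one would read off an actual plot.

```python
from collections import defaultdict
from statistics import mean

# Made-up accuracy scores (not from the text): (hand, distance) -> two throws each.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "far"): [2.0, 3.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "far"): [7.0, 8.0],
}


def group_by_hand(values):
    """Pool per-cell lists into one list per hand."""
    pooled = defaultdict(list)
    for (hand, _), ys in values.items():
        pooled[hand].extend(ys)
    return dict(pooled)


def remove_distance_means(data):
    """Residuals from a one-way fit on Distance: subtract each distance's mean."""
    by_distance = defaultdict(list)
    for (_, dist), ys in data.items():
        by_distance[dist].extend(ys)
    dist_mean = {dist: mean(ys) for dist, ys in by_distance.items()}
    return {
        (hand, dist): [y - dist_mean[dist] for y in ys]
        for (hand, dist), ys in data.items()
    }


def spread(ys):
    """Range of a list of values."""
    return max(ys) - min(ys)


raw = group_by_hand(data)
partial = group_by_hand(remove_distance_means(data))
for hand in raw:
    print(hand, "raw range:", spread(raw[hand]), "partial range:", spread(partial[hand]))
```

In this toy data the spread within each hand shrinks once the Distance means are removed, which is the reason the partial display makes the Hand effect easier to see.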

Slide 29 - 7


Assumptions and Conditions (cont)

Boxplots of Accuracy by Hand show the effect of changing Hand less clearly than the corresponding partial boxplots on the right, which show the effects of Hand after the effect of Distance has been removed. The effect of changing hands is much easier to see without the unwanted variation caused by changing the other factor, Distance.

Slide 29 - 8



Additivity

Our model assumes that we can just add the effects of these two factor levels together:

$$y_{ijk} = \mu + \text{Hand effect}_j + \text{Distance effect}_k + \text{Error}_{ijk}$$

Just like linearity, additivity is an assumption. We can’t know for sure, but we can check the Additive Enough Condition.

For the effects of Hand and Distance to be additive, changing hands must make the same difference in accuracy no matter what distance you throw from.
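One informal way to look at additivity is to compare the gap between the two hands at each distance. The sketch below does this on made-up scores; the function name `hand_gap_by_distance` is hypothetical, and how much variation in the gaps is "too much" is not something this code decides.

```python
from statistics import mean

# Made-up accuracy scores (not from the text): (hand, distance) -> two throws each.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "far"): [2.0, 3.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "far"): [7.0, 8.0],
}


def hand_gap_by_distance(data, first="left", second="right"):
    """Difference in cell means (second hand minus first hand) at each distance."""
    distances = []
    for _, dist in data:
        if dist not in distances:
            distances.append(dist)
    return {
        dist: mean(data[(second, dist)]) - mean(data[(first, dist)])
        for dist in distances
    }


gaps = hand_gap_by_distance(data)
print(gaps)
# Under perfect additivity every gap would be identical.
print("range of gaps:", max(gaps.values()) - min(gaps.values()))
```

Here the right-hand advantage is larger at the far distance than at the near one, so the effects in this toy data are not perfectly additive.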

Slide 29 - 9



Boxplots of Accuracy by Distance conditional on each Hand. Changing the Distance seems to have a greater effect on left Hand accuracies.

Slide 29 - 10



When the effects of one factor change for different levels of another factor, we say there is an interaction.

To show the interaction, we use an interaction plot.

How parallel is enough? We will see later.

Slide 29 - 11



Independence Assumptions

The Independence Assumption for the two-factor model is the same as in one-way ANOVA: the observations within each treatment group must be independent of each other. No test can verify that assumption.

Check the Randomization Condition. Were the data collected with suitable randomization? This applies to both surveys and experiments.

Slide 29 - 12


Assumptions and Conditions (cont)

Equal Variance Assumption

Like the one-way ANOVA, the two-factor ANOVA requires that the variances of all treatment groups be equal. It is the residuals after fitting both effects that we will pool for the Error Mean Square. We check this assumption by checking the Similar Variance Condition. We need to check for equal spread across all treatment groups.

Look at the residuals plotted against the predicted values. If the plot thickens (to one side or the other), it’s a sign that the variance is changing systematically. Consider re-expressing the response variable. If the variance is constant, the plot should be patternless. See figure next slide.

Slide 29 - 13



Scatterplot of residuals vs. predicted values from the two-way ANOVA model.

This figure doesn’t show changing variance, but it’s certainly not patternless. It shows a U-shaped pattern. This suggests that the Additive Enough Condition is violated.

Slide 29 - 14



You can also plot the residuals grouped by each factor. Remember, we can’t use the original boxplots to check changing spread.

Boxplots of the residuals from the two-way ANOVA of the dart accuracy experiment show roughly equal variability when plotted for each factor.

Slide 29 - 15


Assumptions and Conditions (cont)

Normal Error Assumption

As with one-way ANOVA, the F-tests require that the underlying errors follow a Normal model. We’ll check a corresponding Nearly Normal Condition with a Normal probability plot or histogram of the residuals.

A Normal probability plot of the residuals from the two-way ANOVA model for dart accuracy seems reasonably straight.

Slide 29 - 16


Two-Factor Analysis of Variance (cont)

Step-by-Step

State what we want to know and the null hypotheses we wish to test. For two-factor ANOVA, the null hypotheses are that all the treatment groups have the same mean for each factor. The alternatives are that the means are not all equal.

Plot: Examine the side-by-side partial boxplots of the data.

Slide 29 - 17



Step-by-Step

Plan: Think about the assumptions and check the appropriate conditions.

Before testing, check conditions:

Additive Enough Condition.

Randomization Condition.

Independence Assumption.

Nearly Normal Condition, Outlier Condition.

Similar Variance Condition.

Show the ANOVA table.
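The numbers that go into an ANOVA table for the additive model can be sketched as below. The data are made up, the design is assumed balanced, and `additive_anova` is a hypothetical helper rather than a library function. Turning the F-ratios into P-values requires the F distribution, which the Python standard library does not provide, so that step is left out.

```python
from statistics import mean

# Made-up accuracy scores (not from the text): (hand, distance) -> two throws each.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "far"): [2.0, 3.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "far"): [7.0, 8.0],
}


def additive_anova(data):
    """Sum of squares, df and F-ratio per source for a balanced additive model."""
    all_y = [y for ys in data.values() for y in ys]
    grand = mean(all_y)

    def effects(position):
        # For each level: (effect, number of observations at that level).
        pooled = {}
        for key, ys in data.items():
            pooled.setdefault(key[position], []).extend(ys)
        return {lvl: (mean(ys) - grand, len(ys)) for lvl, ys in pooled.items()}

    hand, dist = effects(0), effects(1)
    ss_hand = sum(count * eff ** 2 for eff, count in hand.values())
    ss_dist = sum(count * eff ** 2 for eff, count in dist.values())
    # Error sum of squares: squared residuals after removing both effects.
    ss_error = sum(
        (y - (grand + hand[h][0] + dist[d][0])) ** 2
        for (h, d), ys in data.items()
        for y in ys
    )
    df_hand, df_dist = len(hand) - 1, len(dist) - 1
    df_error = len(all_y) - 1 - df_hand - df_dist
    ms_error = ss_error / df_error
    return {
        "Hand": (ss_hand, df_hand, (ss_hand / df_hand) / ms_error),
        "Distance": (ss_dist, df_dist, (ss_dist / df_dist) / ms_error),
        "Error": (ss_error, df_error, None),
    }


table = additive_anova(data)
for source, (ss, df, f_ratio) in table.items():
    print(source, "SS =", ss, "df =", df, "F =", f_ratio)
```

Each F-ratio divides a factor's mean square by the Error Mean Square, which is pooled from the residuals left after fitting both effects.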

Slide 29 - 18


Step-by-Step

Mechanics: Discuss the tests.

Interpretation: Tell what the F-tests mean.

Display the effects for each level of the significant factors. Remember, significance does not guarantee that the differences are meaningful.

Slide 29 - 19


Blocks

Back in Chapter 13, we thought about randomized block designs.

The two-way ANOVA takes advantage of the blocking to remove unwanted variation. Removing unwanted variation makes the effects of the other factor easier to see.

Slide 29 - 20


Interactions

Interaction plot of Accuracy by Hand and Distance. The effect of Distance appears to be greater for the left hand.

Up to now, we have assumed that whatever effects the two factors have on the response can be modeled by simply adding the separate effects together. What if that’s not good enough?

Slide 29 - 21


Interactions (cont)

When the effect of one factor changes depending on the levels of the other factor, the factors are said to interact. We can model this interaction by adding a term to our model and testing whether the adjustment is worthwhile with an F-test.

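The model can now be written:

$$y_{ijk} = \mu + \tau_j + \gamma_k + \omega_{jk} + \varepsilon_{ijk}$$

Here $\tau_j$ and $\gamma_k$ are the effects of the two factors, and the new term $\omega_{jk}$ represents the interaction effect of level j of factor 1 and level k of factor 2.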
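A rough sketch of how the interaction effects and the F-ratio for the interaction term could be computed in a balanced, replicated design follows. The scores are made up, `interaction_test` is a hypothetical helper, and no P-value is computed because the standard library has no F distribution.

```python
from statistics import mean

# Made-up accuracy scores (not from the text): (hand, distance) -> two throws each.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "far"): [2.0, 3.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "far"): [7.0, 8.0],
}


def interaction_test(data):
    """Interaction effects per cell and the F-ratio for the interaction term."""
    grand = mean(y for ys in data.values() for y in ys)

    def level_means(position):
        pooled = {}
        for key, ys in data.items():
            pooled.setdefault(key[position], []).extend(ys)
        return {lvl: mean(ys) for lvl, ys in pooled.items()}

    hand_mean, dist_mean = level_means(0), level_means(1)
    # Interaction effect: cell mean minus what the additive model predicts.
    omega = {
        (h, d): mean(ys) - (hand_mean[h] + dist_mean[d] - grand)
        for (h, d), ys in data.items()
    }
    n = len(next(iter(data.values())))  # throws per cell (balanced design assumed)
    ss_interaction = n * sum(w ** 2 for w in omega.values())
    # With the interaction fitted, residuals are deviations from the cell means.
    ss_error = sum((y - mean(ys)) ** 2 for ys in data.values() for y in ys)
    df_interaction = (len(hand_mean) - 1) * (len(dist_mean) - 1)
    df_error = len(data) * (n - 1)
    f_ratio = (ss_interaction / df_interaction) / (ss_error / df_error)
    return omega, f_ratio


omega, f_ratio = interaction_test(data)
print("interaction effects:", omega)
print("F for interaction:", f_ratio)
```

Note that the error sum of squares here comes from the spread within each cell, which is why at least two trials per treatment combination are needed for this calculation.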

Slide 29 - 22


Inference When Variables are Related

When significant interaction is present, the best way to display the results is with an interaction plot.

When we have the interaction term in our model, we recalculate the residuals. The residuals are the values that are left over after accounting for the overall mean, the effect of each factor, and the interaction effect.

Slide 29 - 23


Inference When Variables are Related (cont)

Fitting the interaction term succeeded in removing structure from the error. The new model seems to satisfy the assumptions more successfully, and so our inferences are likely to be closer to the truth.

Residuals for the two-way ANOVA of dart accuracy with an interaction term included. Now there is no U-shaped pattern.

Slide 29 - 24


Why Not Always Start by Including the Interaction Term?

With two factors in the model, there is always the possibility that they interact.

If the interaction plot shows parallel lines and the F-test for the interaction term is not significant, you may proceed with a two-way ANOVA model without interaction.

If the interaction is significant, the model must contain more than just the main effects. In this case, the interpretation of the analysis depends crucially on the interaction term, and the interaction plot shows most of what to Tell about the data.

Slide 29 - 25


Why Not Always Start by Including the Interaction Term? (cont)

When there’s a significant interaction term, the effect of each factor depends on the level of the other factor, so it may not make sense to talk about the effects of a factor by itself.

However, we may still be able to talk about the main effects.

A significant interaction effect tells us more.

But if the lines in the interaction plot cross, you need to be careful.

Slide 29 - 26


You won’t want to Tell anything about the main effect.

Its F-test is irrelevant.

Whether the lines cross or not, including an interaction term in our model is always a good way to start.

But sometimes we just can’t.

An experiment that includes only one trial at each treatment is called an unreplicated two-factor design.



Slide 29 - 27


We can’t distinguish an interaction effect from error unless we replicate the experiment.

Without replication, if we try to fit an interaction term, there are exactly 0 degrees of freedom left for the error.

That makes any inference or testing impossible.

When you do replicate, it’s best to replicate all treatment conditions equally.
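The degrees-of-freedom bookkeeping behind this statement can be checked with a small calculation. This sketch assumes a balanced design with `a` levels of one factor, `b` levels of the other, and `n` trials per treatment combination; the function name `two_factor_df` is hypothetical.

```python
def two_factor_df(a, b, n):
    """Degrees of freedom for a balanced two-factor design with an interaction term.

    a, b: number of levels of each factor; n: trials per treatment combination.
    """
    total = a * b * n - 1
    factor_a = a - 1
    factor_b = b - 1
    interaction = (a - 1) * (b - 1)
    # Whatever is left over belongs to the error; algebraically this is a*b*(n-1).
    error = total - factor_a - factor_b - interaction
    return {
        "factor A": factor_a,
        "factor B": factor_b,
        "interaction": interaction,
        "error": error,
        "total": total,
    }


# Two hands, three distances, one throw per combination: nothing is left for error.
print(two_factor_df(2, 3, 1))
# The same layout with four throws per combination.
print(two_factor_df(2, 3, 4))
```

With `n = 1` the error line has 0 degrees of freedom, so the Error Mean Square, and with it every F-ratio, cannot be computed.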



Slide 29 - 28


Two-Factor ANOVA with Interaction

Step-by-Step

State what we want to know and the null hypotheses we wish to test. For two-factor ANOVA with interaction, the null hypotheses are that all the treatment groups have the same mean for each factor and that the interaction effect is 0. The alternatives are that at least one effect is not 0.

Slide 29 - 29


Two-Factor ANOVA with Interaction (cont)

Step-by-Step

Plot: Examine the side-by-side partial boxplots of the data.

Plan: Think about the assumptions and check the appropriate conditions.

Show the ANOVA table.

Slide 29 - 30


Two-Factor ANOVA with Interaction (cont)

Step-by-Step

Mechanics: Show the table of means.

Interpretation: Tell what the F-tests mean.

Slide 29 - 31


What Can Go Wrong?

Beware of unreplicated designs unless you are sure there is no interaction.

Without replicating the experiment for each treatment combination, there is no way to distinguish the interaction terms from the residuals.

Don’t attempt to fit an interaction term to an unreplicated two-factor design.

If you have an unreplicated two-factor experiment or observational study and you try to fit an interaction term, you will get a strange ANOVA table: no degrees of freedom are left for the error, so no F-tests can be carried out.

Slide 29 - 32


What Can Go Wrong? (cont)

Be sure to fit an interaction term when it exists.

When the interaction effect is significant, don’t interpret the main effects.

Main effects can be very misleading in the presence of interaction terms. Look at the interaction plot. (See next slide.)

Slide 29 - 33



An interaction plot of Yield by Temperature and Pressure. The main effects are misleading. There is no (main) effect of Pressure because the average Yield at the two pressures is the same. That doesn’t mean that Pressure has no effect on the Yield. In the presence of an interaction effect, be careful when interpreting the main effects.
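A tiny numerical illustration of how this can happen is sketched below. The cell means are invented for the purpose (they are not read off the plot), and the level labels are placeholders.

```python
from statistics import mean

# Made-up cell means (not from the text): (temperature, pressure) -> mean Yield.
cell_mean = {
    ("low T", "low P"): 10.0,
    ("low T", "high P"): 20.0,
    ("high T", "low P"): 30.0,
    ("high T", "high P"): 20.0,
}

# Main-effect view: average Yield at each pressure, ignoring temperature.
pressure_average = {
    p: mean(v for (_, pp), v in cell_mean.items() if pp == p)
    for p in ("low P", "high P")
}

# Conditional view: change in Yield from low to high pressure at each temperature.
pressure_change = {
    t: cell_mean[(t, "high P")] - cell_mean[(t, "low P")]
    for t in ("low T", "high T")
}

print("average Yield by pressure:", pressure_average)
print("pressure change at each temperature:", pressure_change)
```

The two pressure averages coincide, so the main effect of Pressure is zero, yet raising the pressure raises the Yield at one temperature and lowers it at the other. The lines in the corresponding interaction plot would cross.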

Slide 29 - 34


Always check for outliers.

Outliers can distort your conclusions.

Check for skewness.

If the underlying data distributions are skewed, consider a transformation to make them more symmetric.

Beware of unbalanced designs and designs with empty cells.

Empty cells and other more serious violations of balance require different methods and additional assumptions.


Slide 29 - 35


What Have We Learned?

We can extend the Analysis of Variance to designs with more than one factor.

Partial boxplots are a good way to examine the effect of each factor on the response.

Sometimes factors can interact with each other, and we can add an interaction term to our model to account for the possible interaction.

We need to check the appropriate assumptions and conditions as we did for the simple ANOVA.

Slide 29 - 36
