Copyright © 2012, 2008, 2005 Pearson Education, Inc.
Chapter 29
Multifactor Analysis of
Variance
Two Factors at Once?!
Unlike experiments we have seen thus far, the dart
throwing experiment mentioned in the text has not
one but two factors (distance and hand used).
Two factors do not confuse things, but rather improve
the experiment and analysis.
With two factors we have two hypothesis tests. Each
test asks whether the mean of the response variable
is the same at every level of one of the factors.
Back in Chapter 13, we considered ways to remove
or avoid extra variation in designing experiments. A
two-factor experiment does just that.
An ANOVA Model
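A model for one-way ANOVA represented each observation as the sum of simple components. It broke each observation into the sum of three terms: the grand mean, the treatment effect, and an error:
$y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
Now we have two factors, Hand and Distance. Our model should reflect the effects of both factors. Now we write:
$y_{ijk} = \mu + \tau_j + \gamma_k + \varepsilon_{ijk}$
Here $\tau_j$ is the effect of level j of Hand and $\gamma_k$ is the effect of level k of Distance.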
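As a rough illustration of the two-factor model, the sketch below splits each observation into a grand mean, a Hand effect, a Distance effect, and a residual. The accuracy values are made up for illustration (they are not the textbook's dart data), and the names `data`, `hand_effect`, and `distance_effect` are hypothetical. It assumes a balanced design, where each effect is simply a level mean minus the grand mean.

```python
# Hypothetical accuracy scores (made-up numbers, NOT the textbook's data).
# Keys are (hand, distance); each cell holds two replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}


def mean(values):
    return sum(values) / len(values)


def level_mean(position, level):
    """Mean of every observation whose key has `level` at `position`."""
    return mean([y for key, ys in data.items() if key[position] == level for y in ys])


grand_mean = mean([y for ys in data.values() for y in ys])

# In a balanced design, a level's effect is its mean minus the grand mean.
hand_effect = {h: level_mean(0, h) - grand_mean for h in ("left", "right")}
distance_effect = {d: level_mean(1, d) - grand_mean for d in ("near", "middle", "far")}

# The residual is whatever the additive model leaves unexplained.
residuals = {
    (h, d): [y - (grand_mean + hand_effect[h] + distance_effect[d]) for y in ys]
    for (h, d), ys in data.items()
}

print("grand mean:", round(grand_mean, 3))
print("hand effects:", {h: round(e, 3) for h, e in hand_effect.items()})
print("distance effects:", {d: round(e, 3) for d, e in distance_effect.items()})
```

Within each factor the estimated effects sum to zero, which is why the null hypothesis can be stated as "all effects are zero".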
An ANOVA Model (cont)
When you write out a model, you’ll usually
find it clearer to name the factors rather
than using Greek letters:
$y_{ijk} = \mu + \text{Hand effect}_j + \text{Distance effect}_k + \text{Error}_{ijk}$
In general, we would like to know whether
the mean dart accuracy changes with
changes in the levels of either factor. Our
null hypothesis on each factor is that the
effects of that factor’s levels are all zero.
Writing $\tau_j$ for the Hand effects and $\gamma_k$ for the Distance effects:
$H_0: \tau_1 = \tau_2 = 0$ and $H_0: \gamma_1 = \gamma_2 = \gamma_3 = 0$
An ANOVA Model (cont)
The alternative hypotheses are that the treatment effects are not all zero.
We want to compare the differences across the treatment effects with the underlying variability within the treatments.
Now there are two factors, and the underlying variability has the effects of both factors removed from it.
The error term holds the variability that’s left after removing the effects of both factors.
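The comparison of treatment differences with leftover variability can be sketched as an F-ratio for each factor. This is a minimal sketch for a balanced design with made-up numbers (not the textbook's data); the variable names are hypothetical, and it stops at the F-statistics rather than computing P-values.

```python
# Hypothetical data (made-up numbers): (hand, distance) -> replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}
hands = ("left", "right")
distances = ("near", "middle", "far")


def mean(values):
    return sum(values) / len(values)


def level_obs(position, level):
    """All observations at one level of one factor."""
    return [y for key, ys in data.items() if key[position] == level for y in ys]


all_obs = [y for ys in data.values() for y in ys]
grand_mean = mean(all_obs)
hand_mean = {h: mean(level_obs(0, h)) for h in hands}
dist_mean = {d: mean(level_obs(1, d)) for d in distances}

# Factor sums of squares: (observations per level) * (level effect)^2.
ss_hand = sum(len(level_obs(0, h)) * (hand_mean[h] - grand_mean) ** 2 for h in hands)
ss_dist = sum(len(level_obs(1, d)) * (dist_mean[d] - grand_mean) ** 2 for d in distances)

# Error sum of squares: what is left after removing BOTH factor effects.
ss_error = sum(
    (y - (hand_mean[h] + dist_mean[d] - grand_mean)) ** 2
    for (h, d), ys in data.items()
    for y in ys
)

df_hand = len(hands) - 1
df_dist = len(distances) - 1
df_error = len(all_obs) - 1 - df_hand - df_dist

ms_error = ss_error / df_error
f_hand = (ss_hand / df_hand) / ms_error
f_dist = (ss_dist / df_dist) / ms_error
print("F for Hand:", round(f_hand, 2), "on", df_hand, "and", df_error, "df")
print("F for Distance:", round(f_dist, 2), "on", df_dist, "and", df_error, "df")
```

Each F-ratio would then be compared with an F distribution on the corresponding degrees of freedom.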
Assumptions and Conditions
Plot the Data …
Start by using side-by-side boxplots across levels of each factor.
Look for outliers; correct them if you can, and omit them if you can’t.
The problem with looking at the boxplots is that the responses at each level of the factor contain all levels of the other factor.
A better alternative would be to make boxplots for each factor level after removing the effects of the other factor.
We could compute a one-way ANOVA on one factor and find the residuals. Then, make boxplots of those residuals for each level of the other factor. We might call this display a partial boxplot.
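The values behind a partial boxplot can be sketched in two steps: fit a one-way ANOVA on Distance (the fitted value is just the Distance-level mean), then group the residuals by Hand. The data are made up for illustration, and the names `distance_mean` and `partial` are hypothetical.

```python
# Hypothetical data (made-up numbers): (hand, distance) -> replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}


def mean(values):
    return sum(values) / len(values)


# Step 1: one-way ANOVA on Distance alone; the fit is the Distance-level mean.
distance_mean = {
    d: mean([y for (_, dd), ys in data.items() if dd == d for y in ys])
    for d in ("near", "middle", "far")
}

# Step 2: residuals from that fit, grouped by the OTHER factor (Hand).
partial = {"left": [], "right": []}
for (h, d), ys in data.items():
    partial[h].extend(y - distance_mean[d] for y in ys)

# These grouped residuals are what a partial boxplot would display.
for h, res in partial.items():
    print(h, "min", min(res), "mean", round(mean(res), 3), "max", max(res))
```

With these made-up numbers the left-hand scores span 6 units before the Distance effect is removed and 1.5 units afterward, so the difference between hands stands out more clearly.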
Assumptions and Conditions (cont)
Boxplots of Accuracy by Hand show the effect of changing Hand less clearly than the corresponding partial boxplots on the right, which show the effects of Hand after the effect of Distance has been removed. The effect of changing hands is much easier to see without the unwanted variation caused by changing the other factor, Distance.
Assumptions and Conditions (cont)
Additivity
Our model assumes that we can just add the
effects of these two factor levels together:
$y_{ijk} = \mu + \text{Hand effect}_j + \text{Distance effect}_k + \text{Error}_{ijk}$
Just like linearity, additivity is an assumption. We
can’t know for sure, but we can check the Additive
Enough Condition.
For the effects of Hand and Distance to be additive,
changing hands must make the same difference in
accuracy no matter what distance you throw from.
Assumptions and Conditions (cont)
Boxplots of Accuracy by Distance conditional on each Hand. Changing the Distance seems to have a greater effect on left Hand accuracies.
Assumptions and Conditions (cont)
When the effects of one factor change for different
levels of another factor, we say there is an interaction.
To show the interaction, we use an interaction plot. If the
effects are additive, the lines in the plot are parallel.
How parallel is enough? We will see later.
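One rough numerical companion to an interaction plot is to compute the cell means and see whether the gap between hands is the same at every distance. This sketch uses made-up numbers (not the textbook's data); `cell_mean` and `gap` are hypothetical names.

```python
# Hypothetical data (made-up numbers): (hand, distance) -> replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}

# The cell means are the points an interaction plot would connect.
cell_mean = {cell: sum(ys) / len(ys) for cell, ys in data.items()}

# If Hand and Distance were perfectly additive, this gap would be constant.
gap = {
    d: cell_mean[("right", d)] - cell_mean[("left", d)]
    for d in ("near", "middle", "far")
}

for d, g in gap.items():
    print(d, "right minus left:", g)
```

Unequal gaps correspond to non-parallel lines in the plot. Whether the departure is large enough to matter is a question for the F-test on an interaction term.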
Assumptions and Conditions (cont)
Independence Assumptions
The Independence Assumption for the two-factor
model is the same as in one-way ANOVA: the
observations within each treatment group must be
independent of each other. No test can
verify that assumption.
Check the Randomization Condition. Were the data
collected with suitable randomization? This applies to
both surveys and experiments.
Assumptions and Conditions (cont)
Equal Variance Assumption
Like the one-way ANOVA, the two-factor ANOVA requires that the variances of all treatment groups be equal. It is the residuals after fitting both effects that we will pool for the Error Mean Square.
We check this assumption by checking the Similar Variance Condition: we need equal spread across all treatment groups.
Look at the residuals plotted against the predicted values. If the variance is constant, the plot should be patternless. If the plot thickens (to one side or the other), it’s a sign that the variance is changing systematically; consider re-expressing the response variable. See the figure on the next slide.
Assumptions and Conditions (cont)
Scatterplot of residuals vs. predicted values from the two-way ANOVA model.
This figure doesn’t show changing variance, but it’s certainly not patternless. It shows a U-shaped pattern. This suggests the condition is violated.
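The points in a residuals-versus-predicted plot can be computed directly from the additive fit. The sketch below uses made-up numbers (not the textbook's data) and hypothetical names. It also averages the residuals within each cell, since cell averages that stray from zero are one sign of structure the additive model has missed.

```python
# Hypothetical data (made-up numbers): (hand, distance) -> replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}


def mean(values):
    return sum(values) / len(values)


def level_mean(position, level):
    return mean([y for key, ys in data.items() if key[position] == level for y in ys])


grand_mean = mean([y for ys in data.values() for y in ys])

pairs = []  # (predicted value, residual) points for the scatterplot
cell_mean_residual = {}
for (h, d), ys in data.items():
    # Additive prediction: grand mean + Hand effect + Distance effect.
    predicted = level_mean(0, h) + level_mean(1, d) - grand_mean
    res = [y - predicted for y in ys]
    pairs.extend((predicted, r) for r in res)
    cell_mean_residual[(h, d)] = mean(res)

for predicted, r in sorted(pairs):
    print(round(predicted, 2), round(r, 2))
print({cell: round(m, 3) for cell, m in cell_mean_residual.items()})
```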
Assumptions and Conditions (cont)
You can also plot the residuals grouped by each factor.
Remember, we can’t use the original boxplots to check
changing spread.
Boxplots of the residuals from the two-way ANOVA of the dart accuracy experiment show roughly equal variability when plotted for each factor.
Assumptions and Conditions (cont)
Normal Error Assumption
As with one-way ANOVA, the F-tests require that the
underlying errors follow a Normal model. We’ll check
a corresponding Nearly Normal Condition with a
Normal probability plot or histogram of the residuals.
A Normal probability plot of the residuals from the two-way ANOVA model for dart accuracy seems reasonably straight.
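A Normal probability plot pairs the sorted residuals with Normal scores. The sketch below builds those pairs and summarizes straightness with a correlation. The residuals are made-up values, and the plotting positions (i - 0.5)/n are one common convention among several, chosen here as an assumption.

```python
from statistics import NormalDist

# Hypothetical residuals (made-up values for illustration only).
residuals = [-0.9, -0.5, -0.4, -0.1, 0.0, 0.1, 0.3, 0.4, 0.5, 0.6]

ordered = sorted(residuals)
n = len(ordered)

# Normal scores at plotting positions (i - 0.5) / n (an assumed convention).
normal_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation, written out to stay within the standard library."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# A value near 1 means the plotted points fall close to a straight line.
straightness = correlation(normal_scores, ordered)
print("correlation with Normal scores:", round(straightness, 3))
```

The correlation is only a summary. Looking at the plot itself is still the way to spot outliers or a bend.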
Two-Factor Analysis of Variance (cont)
Step-by-Step
State what we want to know and the null hypotheses we wish to test. For two-factor ANOVA, the null hypotheses are that, for each factor, all the treatment groups have the same mean. The alternatives are that the means are not all equal.
Plot: Examine the side-by-side partial boxplots of the data.
Two-Factor Analysis of Variance (cont)
Step-by-Step
Plan: Think about the assumptions and check the appropriate conditions. Before testing, check conditions:
Additive Enough Condition.
Randomization Condition.
Independence Assumption.
Nearly Normal Condition, Outlier Condition.
Similar Variance Condition.
Show the ANOVA table.
Two-Factor Analysis of Variance (cont)
Step-by-Step
Mechanics: Discuss the tests. Display the effects for each level of the significant factors. Remember, significance does not guarantee that the differences are meaningful.
Interpretation: Tell what the F-tests mean.
Blocks
Back in Chapter 13 we thought about randomized block designs.
The two-way ANOVA takes advantage of the blocking to remove unwanted variation. Removing unwanted variation makes the effects of the other factor easier to see.
Interactions
Up to now, we have assumed that whatever effects the two factors have on the response can be modeled by simply adding the separate effects together. What if that’s not good enough?
Interaction plot of Accuracy by Hand and Distance. The effect of Distance appears to be greater for the left hand.
Interactions (cont)
When the effect of one factor changes depending on the levels of the other factor, the factors are said to interact. We can model this interaction by adding a term to our model and testing whether the adjustment is worthwhile with an F-test.
The model can now be written:
$y_{ijk} = \mu + \tau_j + \gamma_k + \omega_{jk} + \varepsilon_{ijk}$
Here $\tau_j$ and $\gamma_k$ are the effects of the two factors. The new term $\omega_{jk}$ represents the interaction effect of
level j of factor 1 and level k of factor 2.
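For a balanced, replicated design, the interaction effect for a cell can be estimated as the cell mean minus the two marginal means plus the grand mean. The sketch below computes those estimates and the F-ratio for the interaction term. The data are made up (not the textbook's), and all names are hypothetical.

```python
# Hypothetical data (made-up numbers): (hand, distance) -> replicate throws.
data = {
    ("left", "near"): [6.0, 7.0],
    ("left", "middle"): [4.0, 5.0],
    ("left", "far"): [1.0, 2.0],
    ("right", "near"): [9.0, 10.0],
    ("right", "middle"): [8.0, 7.0],
    ("right", "far"): [5.0, 6.0],
}
hands = ("left", "right")
distances = ("near", "middle", "far")


def mean(values):
    return sum(values) / len(values)


def level_mean(position, level):
    return mean([y for key, ys in data.items() if key[position] == level for y in ys])


grand_mean = mean([y for ys in data.values() for y in ys])
cell_mean = {cell: mean(ys) for cell, ys in data.items()}

# Interaction effect: what the cell mean adds beyond the two main effects.
interaction = {
    (h, d): cell_mean[(h, d)] - level_mean(0, h) - level_mean(1, d) + grand_mean
    for h in hands
    for d in distances
}

ss_interaction = sum(len(data[cell]) * w ** 2 for cell, w in interaction.items())
# With the interaction fitted, residuals are deviations from the cell means.
ss_error = sum((y - cell_mean[cell]) ** 2 for cell, ys in data.items() for y in ys)

df_interaction = (len(hands) - 1) * (len(distances) - 1)
df_error = sum(len(ys) - 1 for ys in data.values())

f_interaction = (ss_interaction / df_interaction) / (ss_error / df_error)
print("interaction effects:", {c: round(w, 3) for c, w in interaction.items()})
print("F for interaction:", round(f_interaction, 3))
```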
Inference When Variables are Related
When significant interaction is present, the best
way to display the results is with an interaction
plot.
When we have the interaction term in our model,
we recalculate the residuals. The residuals are
the values that are left over after accounting for
the overall mean, the effect of each factor, and the
interaction effect.
Inference When Variables are Related (cont)
Fitting the interaction term succeeded in removing structure from the error. The new model seems to satisfy the assumptions more successfully and so our inferences are likely to be closer to the truth.
Residuals for the two-way ANOVA of dart accuracy with an interaction term included. Now there is no U-shaped pattern.
Why Not Always Start by Including the
Interaction Term?
With two factors in the model, there is always the possibility that they interact.
If the interaction plot shows parallel lines and the F-test for the interaction term is not significant, you may proceed with a two-way ANOVA model without interaction.
If the interaction is significant, the model must contain more than just the main effects. In this case, the interpretation of the analysis depends crucially on the interaction term, and the interaction plot shows most of what to Tell about the data.
Why Not Always Start by Including the
Interaction Term? (cont)
When there’s a significant interaction term, the
effect of each factor depends on the level of the
other factor, so it may not make sense to talk
about the effects of a factor by itself.
However, we may still be able to talk about the
main effects.
A significant interaction effect tells us more.
But if the lines in the interaction plot cross, you
need to be careful.
Why Not Always Start by Including the
Interaction Term? (cont)
When the lines cross, you won’t want to Tell
anything about the main effect.
Its F-test is irrelevant.
Whether the lines cross or not, including an
interaction term in our model is always a good
way to start.
But sometimes we just can’t.
An experiment that includes only one trial at each
treatment is called an unreplicated two-factor
design.
Why Not Always Start by Including the
Interaction Term? (cont)
We can’t distinguish an interaction effect from an
error unless we replicate the experiment.
Without replication, if we try to fit an interaction
term, there are exactly 0 degrees of freedom left
for the error.
That makes any inference or testing impossible.
When you do replicate, it’s best to replicate all
treatment conditions equally.
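The degrees-of-freedom bookkeeping behind this point can be sketched for a balanced design with an interaction term. The function name `two_way_df` and its parameters are hypothetical.

```python
def two_way_df(levels_a, levels_b, replicates):
    """Degrees of freedom for a balanced two-factor design with interaction."""
    cells = levels_a * levels_b
    return {
        "factor A": levels_a - 1,
        "factor B": levels_b - 1,
        "interaction": (levels_a - 1) * (levels_b - 1),
        "error": cells * (replicates - 1),
        "total": cells * replicates - 1,
    }


# One trial per treatment: nothing is left over to estimate the error.
print(two_way_df(2, 3, replicates=1))
# Two trials per treatment: the error term gets degrees of freedom back.
print(two_way_df(2, 3, replicates=2))
```

With two hands, three distances, and a single trial per treatment, the five total degrees of freedom are used up by the two factors and the interaction, so the Error Mean Square cannot be computed.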
Two-Factor ANOVA with Interaction
Step-By-Step
State what we want to know and the
null hypotheses we wish to test. For
two-factor ANOVA with interaction, the
null hypotheses are that all the
treatment groups have the same mean
for each factor and that the interaction
effect is 0. The alternatives are that at
least one effect is not 0.
Two-Factor ANOVA with Interaction (cont)
Step-By-Step
Plot: Examine the side-by-side partial boxplots of the data.
Plan: Think about the assumptions and check the appropriate conditions.
Show the ANOVA table.
Two-Factor ANOVA with Interaction (cont)
Step-By-Step
Mechanics: Show the table of means.
Interpretation: Tell what the F-tests mean.
What Can Go Wrong?
Beware of unreplicated designs unless you are sure there is no interaction.
Without replicating the experiment for each treatment combination, there is no way to distinguish the interaction terms from the residuals.
Don’t attempt to fit an interaction term to an unreplicated two-factor design.
If you have an unreplicated two-factor experiment or observational study and you try to fit an interaction term, you will get a strange ANOVA table, because no degrees of freedom are left for the error.
What Can Go Wrong? (cont)
Be sure to fit an interaction term when it exists.
When the interaction effect is significant, don’t
interpret the main effects.
Main effects can be very misleading in the
presence of interaction terms. Look at the
interaction plot. (See next slide.)
What Can Go Wrong? (cont)
An interaction plot of Yield by Temperature and Pressure. The main effects are misleading. There is no (main) effect of Pressure because the average Yield at the two pressures is the same. That doesn’t mean that Pressure has no effect on the Yield. In the presence of an interaction effect, be careful when interpreting the main effects.
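A small numerical sketch shows how a main effect can vanish while the factor still matters. The cell means are invented for illustration (they are not the values behind the slide's plot), and the level labels are hypothetical.

```python
# Hypothetical cell means of Yield (made-up numbers chosen so the lines cross).
yield_mean = {
    ("low temp", "low pressure"): 10.0,
    ("low temp", "high pressure"): 20.0,
    ("high temp", "low pressure"): 20.0,
    ("high temp", "high pressure"): 10.0,
}
temps = ("low temp", "high temp")
pressures = ("low pressure", "high pressure")

# Main effect of Pressure: compare the averages over both temperatures.
pressure_avg = {
    p: sum(yield_mean[(t, p)] for t in temps) / len(temps) for p in pressures
}
main_effect_pressure = pressure_avg["high pressure"] - pressure_avg["low pressure"]

# Effect of Pressure at each temperature separately.
simple_effect = {
    t: yield_mean[(t, "high pressure")] - yield_mean[(t, "low pressure")]
    for t in temps
}

print("main effect of Pressure:", main_effect_pressure)
print("effect of Pressure at each temperature:", simple_effect)
```

The averaged effect is zero because raising the pressure helps at one temperature and hurts by the same amount at the other, which is the pattern of crossing lines in an interaction plot.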
What Can Go Wrong? (cont)
Always check for outliers.
Outliers can distort your conclusions.
Check for skewness.
If the underlying data distributions are skewed,
consider a transformation to make them more
symmetric.
Beware of unbalanced designs and designs with
empty cells.
Empty cells and other more serious violations of
balance require different methods and additional
assumptions.
What Have We Learned?
We can extend the Analysis of Variance to
designs with more than one factor.
Partial boxplots are a good way to examine the
effect of each factor on the response.
Sometimes factors can interact with each other
and we can add an interaction term to our model
to account for the possible interaction.
We need to check the appropriate assumptions
and conditions as we did for the simple ANOVA.