notes anova


Upload: vignanaraj

Post on 18-Jul-2016


Lecture Notes Chapter Ten: Analysis of Variance

Randall Miller


1. Elements of a Designed Experiment

Definition 10.1 The response variable is the variable of interest to be measured in the experiment. We also refer to the response as the dependent variable.

Definition 10.2 Factors are those variables whose effect on the response is of interest to the experimenter. Quantitative factors are measured on a numerical scale, whereas qualitative factors are not (naturally) measured on a numerical scale.

Definition 10.3 Factor levels are the values of the factor utilized in the experiment.

Definition 10.4 The treatments of an experiment are the factor-level combinations utilized.

Definition 10.5 An experimental unit is the object on which the response and factors are observed or measured.

Definition 10.6 A designed experiment is an experiment in which the analyst controls the specification of the treatments and the method of assigning the experimental units to each treatment. An observational experiment is an experiment in which the analyst simply observes the treatments and the response on a sample of experimental units.
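As a small illustration of Definitions 10.3 and 10.4, the sketch below enumerates the treatments of an experiment as factor-level combinations. The factor names and levels (`brand`, `temperature`) are hypothetical and chosen only for the example; they do not come from the notes.

```python
from itertools import product

# Hypothetical factors and their levels (illustrative only).
factors = {
    "brand": ["A", "B", "C"],     # qualitative factor with 3 levels
    "temperature": [20, 40],      # quantitative factor with 2 levels
}

# Each treatment is one factor-level combination (Definition 10.4).
treatments = [
    dict(zip(factors, combo)) for combo in product(*factors.values())
]

for number, treatment in enumerate(treatments, start=1):
    print(f"Trt. {number}: {treatment}")
```

With 3 levels of one factor and 2 of the other, the list holds 3 x 2 = 6 treatments.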


2. The Completely Randomized Design

Definition 10.7 The completely randomized design is a design in which treatments are randomly assigned to the experimental units or in which independent random samples of experimental units are selected for each treatment.

ANOVA F-Test to Compare k Treatment Means: Completely Randomized Design

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a:$ At least two treatment means differ.

Test statistic: $F = \dfrac{\text{MST}}{\text{MSE}}$

Rejection region: $F > F_\alpha$, where $F_\alpha$ is based on $\nu_1 = k - 1$ numerator degrees of freedom (associated with MST) and $\nu_2 = n - k$ denominator degrees of freedom (associated with MSE).

Conditions Required for a Valid ANOVA F-Test: Completely Randomized Design
1. The samples are randomly selected in an independent manner from the k treatment populations. (This can be accomplished by randomly assigning the experimental units to the treatments.)
2. All k sampled populations have distributions that are approximately normal.
3. The k population variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$).

General ANOVA Summary Table for a Completely Randomized Design

Source      df     SS         MS                 F
Treatments  k - 1  SST        MST = SST/(k - 1)  MST/MSE
Error       n - k  SSE        MSE = SSE/(n - k)
Total       n - 1  SS(Total)

What Do You Do When the Assumptions Are Not Satisfied for the Analysis of Variance for a Completely Randomized Design?
Answer: Use a nonparametric statistical method such as the Kruskal-Wallis H-test of Section 14.5.
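The random assignment named in Definition 10.7 can be sketched in a few lines. This is only one possible way to randomize, assuming equal group sizes are wanted; the function name `completely_randomized_assignment`, the unit labels, and the treatment labels are hypothetical.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Randomly assign units to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    assignment = {t: [] for t in treatments}
    for position, unit in enumerate(shuffled):
        assignment[treatments[position % len(treatments)]].append(unit)
    return assignment

units = list(range(1, 13))    # 12 hypothetical experimental units
plan = completely_randomized_assignment(units, ["T1", "T2", "T3"], seed=0)
for treatment, assigned in plan.items():
    print(treatment, sorted(assigned))
```

Every unit lands in exactly one treatment group, and no block or matching structure is used, which is what distinguishes this design from the randomized block design.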


Steps for Conducting an ANOVA for a Completely Randomized Design
1. Make sure that the design is truly completely randomized, with independent random samples for each treatment.
2. Check the assumptions of normality and equal variances.
3. Create an ANOVA summary table that specifies the variabilities attributable to treatments and error, making sure that those variabilities lead to the calculation of the F-statistic for testing the null hypothesis that the treatment means are equal in the population. Use a statistical software package to obtain the numerical results. If no such package is available, use the calculation formulas in Appendix B.
4. If the F-test leads to the conclusion that the means differ,
   a. Conduct a multiple-comparisons procedure for as many of the pairs of means as you wish to compare. (See Section 10.3.) Use the results to summarize the statistically significant differences among the treatment means.
   b. If desired, form confidence intervals for one or more individual treatment means.
5. If the F-test leads to the nonrejection of the null hypothesis that the treatment means are equal, consider the following possibilities:
   a. The treatment means are equal; that is, the null hypothesis is true.
   b. The treatment means really differ, but other important factors affecting the response are not accounted for by the completely randomized design. These factors inflate the sampling variability, as measured by MSE, resulting in smaller values of the F-statistic. Either increase the sample size for each treatment, or use a different experimental design (as in Section 10.4) that accounts for the other factors affecting the response.

[Note: Be careful not to automatically conclude that the treatment means are equal, since the possibility of a Type II error must be considered if you accept $H_0$.]


Formulas for the Calculations in the Completely Randomized Design

CM = Correction for mean

$$\text{CM} = \frac{(\text{Total of all observations})^2}{\text{Total number of observations}} = \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

SS(Total) = Total sum of squares = (Sum of squares of all observations) - CM

$$\text{SS(Total)} = \sum_{i=1}^{n} y_i^2 - \text{CM}$$

SST = Sum of squares for treatments = (Sum of squares of treatment totals, with each square divided by the number of observations for that treatment) - CM

$$\text{SST} = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k} - \text{CM}$$

SSE = Sum of squares for error = SS(Total) - SST

MST = Mean square for treatments = $\dfrac{\text{SST}}{k - 1}$

MSE = Mean square for error = $\dfrac{\text{SSE}}{n - k}$

F = Test statistic = $\dfrac{\text{MST}}{\text{MSE}}$

Where
$n$ = Total number of observations
$k$ = Number of treatments
$T_i$ = Total for treatment $i$ ($i = 1, 2, \ldots, k$)
$n_i$ = Number of observations for treatment $i$
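The calculation formulas above translate directly into code. The following is a minimal sketch in plain Python, not a replacement for a statistical package; the function name `crd_anova` and the small data set are made up for illustration.

```python
def crd_anova(samples):
    """One-way ANOVA for a completely randomized design.

    samples: list of lists, one list of responses per treatment.
    """
    k = len(samples)                        # number of treatments
    n = sum(len(s) for s in samples)        # total number of observations
    all_obs = [y for s in samples for y in s]

    cm = sum(all_obs) ** 2 / n                              # correction for mean
    ss_total = sum(y * y for y in all_obs) - cm             # SS(Total)
    sst = sum(sum(s) ** 2 / len(s) for s in samples) - cm   # T_i^2 / n_i terms
    sse = ss_total - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"CM": cm, "SS(Total)": ss_total, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse,
            "df": (k - 1, n - k)}

# Hypothetical responses for k = 3 treatments, 3 observations each.
result = crd_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(name, value)
```

For this data the treatment totals are 6, 9, and 18, so CM = 33²/9 = 121, SS(Total) = 153 - 121 = 32, SST = 12 + 27 + 108 - 121 = 26, SSE = 6, and F = 13/1 = 13 on (2, 6) degrees of freedom. Comparing F with $F_\alpha$ still requires an F table or a statistical package.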


3. Multiple Comparisons of Means

Determining the Number of Pairwise Comparisons of Treatment Means
In general, if there are k treatment means, there are

$$c = \frac{k(k - 1)}{2}$$

pairs of means that can be compared.

Guidelines for Selecting a Multiple-Comparison Method in ANOVA

Method      Treatment sample sizes  Types of comparisons
Tukey       Equal                   Pairwise
Bonferroni  Equal or unequal        Pairwise
Scheffé     Equal or unequal        General contrasts
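The count of pairwise comparisons can be checked by listing the pairs explicitly. The sketch below also shows the usual Bonferroni idea of dividing an overall error rate evenly across the c comparisons; that adjustment is not spelled out in these notes, so treat it as an assumption. The treatment labels and `alpha = 0.05` are illustrative.

```python
from itertools import combinations

treatments = ["T1", "T2", "T3", "T4"]    # hypothetical treatment labels
k = len(treatments)

# Every unordered pair of treatments is one possible comparison.
pairs = list(combinations(treatments, 2))
c = k * (k - 1) // 2
print(c, pairs)

# Assumed Bonferroni-style adjustment: overall alpha split across c comparisons.
alpha = 0.05
per_comparison_alpha = alpha / c
print(per_comparison_alpha)
```

With k = 4 treatments there are c = 6 pairs, so the number of comparisons grows quickly as treatments are added.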

4. The Randomized Block Design

Definition 10.8 The randomized block design consists of a two-step procedure:
1. Matched sets of experimental units, called blocks, are formed, with each block consisting of k experimental units (where k is the number of treatments). The b blocks should consist of experimental units that are as similar as possible.
2. One experimental unit from each block is randomly assigned to each treatment, resulting in a total of $n = bk$ responses.

ANOVA F-Test to Compare k Treatment Means: Randomized Block Design

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a:$ At least two treatment means differ.

Test statistic: $F = \dfrac{\text{MST}}{\text{MSE}}$

Rejection region: $F > F_\alpha$, where $F_\alpha$ is based on $(k - 1)$ numerator degrees of freedom and $(n - b - k + 1)$ denominator degrees of freedom.

Conditions Required for a Valid ANOVA F-Test: Randomized Block Design
1. The b blocks are randomly selected, and all k treatments are applied (in random order) to each block.
2. The distributions of observations corresponding to all bk block-treatment combinations are approximately normal.
3. The bk block-treatment distributions have equal variances.
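Step 2 of Definition 10.8 (random assignment within each block) can be sketched as follows. The function name `randomized_block_assignment`, the block labels, and the unit labels are hypothetical; the only assumption is that each block already holds exactly k matched units.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Assign each treatment to exactly one unit in every block, at random.

    blocks: dict mapping a block label to its list of k units.
    """
    rng = random.Random(seed)
    plan = {}
    for label, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly k units")
        order = list(treatments)
        rng.shuffle(order)            # fresh random order within every block
        plan[label] = dict(zip(units, order))
    return plan

blocks = {
    "block 1": ["u1", "u2", "u3"],
    "block 2": ["u4", "u5", "u6"],
}
plan = randomized_block_assignment(blocks, ["T1", "T2", "T3"], seed=1)
for label, mapping in plan.items():
    print(label, mapping)
```

Unlike the completely randomized design, the randomization here is restricted: it happens separately inside each block, so every treatment appears once per block and the total number of responses is n = bk.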


General ANOVA Summary Table for a Randomized Block Design

Source      df             SS         MS   F
Treatments  k - 1          SST        MST  MST/MSE
Blocks      b - 1          SSB        MSB
Error       n - k - b + 1  SSE        MSE
Total       n - 1          SS(Total)
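The error degrees of freedom in the summary table are what remains of the total after treatments and blocks are removed. A short worked check, using only $n = bk$ from Definition 10.8:

```latex
\begin{align*}
(n - 1) - (k - 1) - (b - 1) &= n - k - b + 1 \\
  &= bk - k - b + 1 \qquad (\text{since } n = bk) \\
  &= (b - 1)(k - 1)
\end{align*}
```

For example, with b = 4 blocks and k = 3 treatments, n = 12 and the error degrees of freedom are 12 - 3 - 4 + 1 = 6 = (4 - 1)(3 - 1).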

Steps for Conducting an ANOVA for a Randomized Block Design
1. Be sure that the design consists of blocks (preferably, blocks of homogeneous experimental units) and that each treatment is randomly assigned to one experimental unit in each block.
2. If possible, check the assumptions of normality and equal variances for all block-treatment combinations. [Note: This may be difficult to do, since the design will likely have only one observation for each block-treatment combination.]
3. Create an ANOVA summary table that specifies the variability attributable to treatments, blocks, and error, and that leads to the calculation of the F-statistic to test the null hypothesis that the treatment means are equal in the population. Use a statistical software package or the calculation formulas in Appendix B to obtain the necessary numerical ingredients.
4. If the F-statistic leads to the conclusion that the means differ, employ the Bonferroni or Tukey procedure, or a similar procedure, to conduct multiple comparisons of as many of the pairs of means as you wish. Use the results to summarize the statistically significant differences among the treatment means. Remember that, in general, the randomized block design cannot be employed to form confidence intervals for individual treatment means.
5. If the F-test leads to the nonrejection of the null hypothesis that the treatment means are equal, several possibilities exist:
   a. The treatment means are equal; that is, the null hypothesis is true.
   b. The treatment means really differ, but other important factors affecting the response are not accounted for by the randomized block design. These factors inflate the sampling variability, as measured by MSE, resulting in smaller values of the F-statistic. Either increase the sample size for each treatment, or conduct an experiment that accounts for the other factors affecting the response (as is to be done in Section 10.5).
   Do not automatically reach the former conclusion (a), since the possibility of a Type II error must be considered if you accept $H_0$.
6. If desired, conduct the F-test of the null hypothesis that the block means are equal. Rejection of this hypothesis lends statistical support to the utilization of the randomized block design.

What Do You Do When the Assumptions Are Not Satisfied for the Analysis of Variance for a Randomized Block Design?
Answer: Use a nonparametric statistical method such as the Friedman $F_r$-test of Section 14.6.


Formulas for the Calculations in the Randomized Block Design

CM = Correction for mean

$$\text{CM} = \frac{(\text{Total of all observations})^2}{\text{Total number of observations}} = \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

SS(Total) = Total sum of squares = (Sum of squares of all observations) - CM

$$\text{SS(Total)} = \sum_{i=1}^{n} y_i^2 - \text{CM}$$

SST = Sum of squares for treatments = (Sum of squares of treatment totals, with each square divided by b, the number of observations for that treatment) - CM

$$\text{SST} = \frac{T_1^2}{b} + \frac{T_2^2}{b} + \cdots + \frac{T_k^2}{b} - \text{CM}$$

SSB = Sum of squares for blocks = (Sum of squares of block totals, with each square divided by k, the number of observations for that block) - CM

$$\text{SSB} = \frac{B_1^2}{k} + \frac{B_2^2}{k} + \cdots + \frac{B_b^2}{k} - \text{CM}$$

SSE = Sum of squares for error = SS(Total) - SST - SSB

MST = Mean square for treatments = $\dfrac{\text{SST}}{k - 1}$

MSB = Mean square for blocks = $\dfrac{\text{SSB}}{b - 1}$

MSE = Mean square for error = $\dfrac{\text{SSE}}{n - k - b + 1}$

F = Test statistic = $\dfrac{\text{MST}}{\text{MSE}}$


Where
$n$ = Total number of observations
$b$ = Number of blocks
$k$ = Number of treatments
$T_i$ = Total for treatment $i$ ($i = 1, 2, \ldots, k$)
$B_i$ = Total for block $i$ ($i = 1, 2, \ldots, b$)
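The randomized block formulas can be sketched the same way as the completely randomized ones. This is a minimal illustration under the assumption of exactly one observation per block-treatment combination; the function name `rbd_anova` and the data are invented for the example.

```python
def rbd_anova(table):
    """ANOVA for a randomized block design.

    table[i][j] is the response for block i under treatment j.
    """
    b = len(table)          # number of blocks
    k = len(table[0])       # number of treatments
    n = b * k               # total number of observations
    all_obs = [y for row in table for y in row]

    cm = sum(all_obs) ** 2 / n
    ss_total = sum(y * y for y in all_obs) - cm

    treatment_totals = [sum(row[j] for row in table) for j in range(k)]
    block_totals = [sum(row) for row in table]
    sst = sum(t ** 2 for t in treatment_totals) / b - cm
    ssb = sum(t ** 2 for t in block_totals) / k - cm
    sse = ss_total - sst - ssb

    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return {"CM": cm, "SS(Total)": ss_total, "SST": sst, "SSB": ssb,
            "SSE": sse, "MST": mst, "MSB": msb, "MSE": mse, "F": mst / mse}

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
result = rbd_anova([[1, 2, 6],
                    [2, 4, 6],
                    [3, 3, 9]])
for name, value in result.items():
    print(name, value)
```

Here the treatment totals are 6, 9, 21 and the block totals are 9, 12, 15, giving CM = 144, SS(Total) = 52, SST = 42, SSB = 6, SSE = 4, and F = 21/1 = 21 on (2, 4) degrees of freedom.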

5. Factorial Experiments

Definition 10.9 A complete factorial experiment is a factorial experiment in which every factor-level combination is utilized. That is, the number of treatments in the experiment equals the total number of factor-level combinations.

                        Factor B at b levels
             Level  1                2                3                ...  b
Factor A     1      Trt. 1           Trt. 2           Trt. 3           ...  Trt. b
at a levels  2      Trt. b + 1       Trt. b + 2       Trt. b + 3       ...  Trt. 2b
             3      Trt. 2b + 1      Trt. 2b + 2      Trt. 2b + 3      ...  Trt. 3b
             ...
             a      Trt. (a-1)b + 1  Trt. (a-1)b + 2  Trt. (a-1)b + 3  ...  Trt. ab
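The numbering pattern in the layout above, where level i of factor A and level j of factor B give treatment (i - 1)b + j, can be reproduced with a short sketch. The function name `treatment_number` and the sizes a = 3, b = 4 are illustrative choices.

```python
def treatment_number(i, j, b):
    """Treatment number for level i of factor A and level j of factor B (1-based)."""
    return (i - 1) * b + j

a, b = 3, 4    # hypothetical numbers of levels for factors A and B
layout = [
    [treatment_number(i, j, b) for j in range(1, b + 1)]
    for i in range(1, a + 1)
]
for row in layout:
    print(row)
```

The last cell is treatment ab, confirming that a complete factorial uses all ab factor-level combinations.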

Procedure for Analysis of Two-Factor Factorial Experiment
1. Partition the total sum of squares into the treatment and error components (stage 1 of Figure 10.21). Use either a statistical software package or the calculation formulas in Appendix C to accomplish the partitioning.
2. Use the F-ratio of the mean square for treatments to the mean square for error to test the null hypothesis that the treatment means are equal.
   a. If the test results in nonrejection of the null hypothesis, consider refining the experiment by increasing the number of replications or introducing other factors. Also, consider the possibility that the response is unrelated to the two factors.
   b. If the test results in rejection of the null hypothesis, then proceed to step 3.
3. Partition the treatment sum of squares into the main effect and the interaction sum of squares (stage 2 of Figure 10.21). Use either a statistical software package or the calculation formulas in Appendix B to accomplish the partitioning.


4. Test the null hypothesis that factors A and B do not interact to affect the response by comparing the F-ratio of the mean square for interaction to the mean square for error.
   a. If the test results in nonrejection of the null hypothesis, proceed to step 5.
   b. If the test results in rejection of the null hypothesis, conclude that the two factors interact to affect the mean response. Then proceed to step 6a.
5. Conduct tests of two null hypotheses that the mean response is the same at each level of factor A and factor B. Compute two F-ratios by comparing the mean square for each factor main effect with the mean square for error.
   a.
6. Compare the means:
   a. If the test for interaction (step 4) is significant, use a multiple-comparison procedure to compare any or all pairs of the treatment means.
   b. If the test for one or both main effects (step 5) is significant, use a multiple-comparison procedure to compare the pairs of means corresponding to the levels of the significant factor(s).

Tests Conducted in Analyses of Factorial Experiments: Factorial Experiments, r Replicates per Treatment

Test for Treatment Means

$H_0:$ No difference among the ab treatment means
$H_a:$ At least two treatment means differ

Test statistic: $F = \dfrac{\text{MST}}{\text{MSE}}$

Rejection region: $F \ge F_\alpha$, based on $(ab - 1)$ numerator and $(n - ab)$ denominator degrees of freedom [Note: $n = abr$.]

Test for Factor Interaction

$H_0:$ Factors A and B do not interact to affect the response mean
$H_a:$ Factors A and B do interact to affect the response mean

Test statistic: $F = \dfrac{\text{MS}(AB)}{\text{MSE}}$

Rejection region: $F \ge F_\alpha$, based on $(a - 1)(b - 1)$ numerator and $(n - ab)$ denominator degrees of freedom


Test for Main Effect of Factor A

$H_0:$ No difference among the a mean levels of factor A
$H_a:$ At least two factor A mean levels differ

Test statistic: $F = \dfrac{\text{MS}(A)}{\text{MSE}}$

Rejection region: $F \ge F_\alpha$, based on $(a - 1)$ numerator and $(n - ab)$ denominator degrees of freedom

Test for Main Effect of Factor B

$H_0:$ No difference among the b mean levels of factor B
$H_a:$ At least two factor B mean levels differ

Test statistic: $F = \dfrac{\text{MS}(B)}{\text{MSE}}$

Rejection region: $F \ge F_\alpha$, based on $(b - 1)$ numerator and $(n - ab)$ denominator degrees of freedom

Conditions Required for Valid F-Tests in Factorial Experiments
1. The response distribution for each factor-level combination (treatment) is normal.
2. The response variance is constant for all treatments.
3. Random and independent samples of experimental units are associated with each treatment.

General ANOVA Summary Table for a Two-Factor Factorial Experiment with r Replicates, where Factor A has a Levels and Factor B has b Levels

Source  df              SS         MS    F
A       a - 1           SSA        MSA   MSA/MSE
B       b - 1           SSB        MSB   MSB/MSE
AB      (a - 1)(b - 1)  SSAB       MSAB  MSAB/MSE
Error   ab(r - 1)       SSE        MSE
Total   n - 1           SS(Total)

Note that A + B + AB = Treatments from a completely randomized experiment.
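The statement that A + B + AB together make up the treatments can be checked on the degrees of freedom in the summary table. A short worked verification, using only $n = abr$:

```latex
\begin{align*}
(a - 1) + (b - 1) + (a - 1)(b - 1) &= ab - 1
  && \text{(treatments)} \\
(ab - 1) + ab(r - 1) &= abr - 1 = n - 1
  && \text{(total, since } n = abr\text{)}
\end{align*}
```

The error degrees of freedom $ab(r - 1)$ are the same quantity as the $(n - ab)$ denominator degrees of freedom quoted in the tests, because $n - ab = abr - ab$.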


Formulas for the Calculations for a Two-Factor Factorial Experiment

CM = Correction for mean

$$\text{CM} = \frac{(\text{Total of all observations})^2}{\text{Total number of observations}} = \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

SS(Total) = Total sum of squares = (Sum of squares of all observations) - CM

$$\text{SS(Total)} = \sum_{i=1}^{n} y_i^2 - \text{CM}$$

SS(A) = Sum of squares for main effects, factor A = (Sum of squares of the totals $A_1, A_2, \ldots, A_a$, divided by the number of measurements in a single total, namely br) - CM

$$\text{SS}(A) = \frac{\sum_{i=1}^{a} A_i^2}{br} - \text{CM}$$

SS(B) = Sum of squares for main effects, factor B = (Sum of squares of the totals $B_1, B_2, \ldots, B_b$, divided by the number of measurements in a single total, namely ar) - CM

$$\text{SS}(B) = \frac{\sum_{i=1}^{b} B_i^2}{ar} - \text{CM}$$

SS(AB) = Sum of squares for AB interaction = (Sum of squares of the cell totals $AB_{11}, AB_{12}, \ldots, AB_{ab}$, divided by the number of measurements in a single total, namely r) - SS(A) - SS(B) - CM

$$\text{SS}(AB) = \frac{\sum_{j=1}^{b} \sum_{i=1}^{a} AB_{ij}^2}{r} - \text{SS}(A) - \text{SS}(B) - \text{CM}$$


Where
$n$ = Total number of observations
$a$ = Number of levels of factor A
$b$ = Number of levels of factor B
$r$ = Number of replicates (observations per treatment)
$A_i$ = Total for level $i$ of factor A ($i = 1, 2, \ldots, a$)
$B_i$ = Total for level $i$ of factor B ($i = 1, 2, \ldots, b$)
$AB_{ij}$ = Total for treatment $ij$, i.e., for the $i$th level of factor A and the $j$th level of factor B
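The two-factor formulas can be sketched in plain Python as well. The notes do not write out SSE for this design; the sketch assumes SSE = SS(Total) - SS(A) - SS(B) - SS(AB), which follows from the partition in the summary table. The function name `factorial_anova` and the data are invented for illustration.

```python
def factorial_anova(cells):
    """Two-factor factorial ANOVA with r replicates per treatment.

    cells[i][j] is the list of r responses at level i of A and level j of B.
    """
    a = len(cells)              # levels of factor A
    b = len(cells[0])           # levels of factor B
    r = len(cells[0][0])        # replicates per treatment
    n = a * b * r
    all_obs = [y for row in cells for cell in row for y in cell]

    cm = sum(all_obs) ** 2 / n
    ss_total = sum(y * y for y in all_obs) - cm

    ab_totals = [[sum(cell) for cell in row] for row in cells]
    a_totals = [sum(row) for row in ab_totals]
    b_totals = [sum(row[j] for row in ab_totals) for j in range(b)]

    ss_a = sum(t ** 2 for t in a_totals) / (b * r) - cm
    ss_b = sum(t ** 2 for t in b_totals) / (a * r) - cm
    ss_ab = sum(t ** 2 for row in ab_totals for t in row) / r - ss_a - ss_b - cm
    sse = ss_total - ss_a - ss_b - ss_ab      # assumed partition (see lead-in)

    mse = sse / (a * b * (r - 1))
    return {"SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SSE": sse,
            "F_A": ss_a / (a - 1) / mse,
            "F_B": ss_b / (b - 1) / mse,
            "F_AB": ss_ab / ((a - 1) * (b - 1)) / mse}

# Hypothetical 2 x 2 factorial with r = 2 replicates per treatment.
result = factorial_anova([[[1, 3], [2, 4]],
                          [[5, 7], [10, 12]]])
for name, value in result.items():
    print(name, value)
```

For this data CM = 44²/8 = 242, SS(Total) = 106, SS(A) = 72, SS(B) = 18, SS(AB) = 8, and SSE = 8 on 4 degrees of freedom, so MSE = 2 and the F-ratios are 36 for A, 9 for B, and 4 for the interaction.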