
Statistics 502 Lecture Notes

Peter D. Hoff

© December 6, 2006


Contents

1 Research Design Principles
  1.1 Induction
  1.2 Model of a process or system
  1.3 Experiments and observational studies
  1.4 Steps in designing an experiment

2 Comparing Two Treatments
  2.1 Summaries of sample populations
  2.2 Hypothesis testing via randomization
  2.3 Essential nature of a hypothesis test
  2.4 Basic decision theory, or “What use is a p-value?”
  2.5 Relating samples to (super)populations
  2.6 Normal distribution
  2.7 Introduction to the t-test
  2.8 Two sample tests
  2.9 Power and Sample Size Determination
    2.9.1 The non-central t-distribution
    2.9.2 Computing the Power of a test
  2.10 Checking Assumptions of a two-sample t-test
    2.10.1 Two-sample t-test with unequal variances
    2.10.2 Which two-sample t-test to use? tdiff(yA,yB) vs. t(yA,yB)

3 Comparing Several Treatments
  3.1 Introduction to ANOVA
    3.1.1 A model for treatment variation
    3.1.2 Model Fitting
    3.1.3 Testing hypotheses with MSE and MST
    3.1.4 Partitioning sums of squares
    3.1.5 The ANOVA table
    3.1.6 More sums of squares geometry
    3.1.7 Unbalanced Designs
    3.1.8 Normal sampling theory for ANOVA
    3.1.9 Sampling distribution of the F-statistic
    3.1.10 Comparing group means
    3.1.11 Power calculations for the F-test
  3.2 Treatment Comparisons
    3.2.1 Contrasts
    3.2.2 Orthogonal Contrasts
    3.2.3 Multiple Comparisons
  3.3 Model Diagnostics
    3.3.1 Detecting violations with residuals
    3.3.2 Variance stabilizing transformations

4 Multifactor Designs
  4.1 Factorial Designs
    4.1.1 Data analysis
    4.1.2 Additive effects model
    4.1.3 Evaluating additivity
    4.1.4 Inference for additive treatment effects
  4.2 Randomized complete block designs
  4.3 Unbalanced designs
    4.3.1 Non-orthogonal sums of squares
  4.4 Analysis of covariance

5 Nested Designs
  5.1 Nested Designs
    5.1.1 Mixed-effects approach
    5.1.2 Repeated measures analysis


List of Figures

2.1 Wheat yield distributions
2.2 Randomization distribution for the wheat example
2.3 The superpopulation model
2.4 The χ2 distribution
2.5 The t-distribution
2.6 A t8 null distribution and α = 0.05 rejection region
2.7 The t-based null distribution for the wheat example
2.8 Randomization distribution for the t-statistic
2.9 The non-central t-distribution
2.10 Critical regions and the non-central t-distribution
2.11 Null and alternative distributions for another wheat example, and power versus sample size
2.12 Normal scores plots

3.1 Bacteria data
3.2 Randomization distribution of the F-statistic
3.3 Coagulation data
3.4 F-distributions
3.5 Normal-theory and randomization distributions of the F-statistic
3.6 Power
3.7 Power
3.8 Yield-density data
3.9 Normal scores plots of normal samples, with n ∈ {20, 50, 100}
3.10 Crab data
3.11 Crab residuals
3.12 Fitted values versus residuals
3.13 Data and log data
3.14 Diagnostics after the log transformation
3.15 Mean variance relationship of the transformed data

4.1 Marginal Plots
4.2 Conditional Plots
4.3 Cell plots
4.4 Mean-variance relationship
4.5 Mean-variance relationship for transformed data
4.6 Plots of transformed poison data
4.7 Comparison between types I and II, without respect to delivery
4.8 Comparison between types I and II, with delivery in color
4.9 Marginal plots of the data
4.10 Three datasets exhibiting non-additive effects
4.11 Experimental material in need of blocking
4.12 Results of the experiment
4.13 Marginal plots and residuals
4.14 Marginal plots for pain data
4.15 Interaction plots for pain data
4.16 Oxygen uptake data
4.17 ANOVA and ANCOVA fits to the oxygen uptake data

5.1 Potato data
5.2 Diagnostic plots for potato ANOVA
5.3 Potato data
5.4 Potato data


Chapter 1

Research Design Principles

1.1 Induction

In our efforts to acquire knowledge about processes or systems, much scientific knowledge is gained via induction: reasoning from the specific to the general.

Example (survey): Do you favor increasing the gas tax for public transportation?

• Specific cases: 200 people called for a telephone survey

• Inferential goal: get information on the opinion of the entire city.

Example (Women’s Health Initiative): Does hormone replacement improve health status in post-menopausal women?

• Specific cases: Health status monitored in 16,608 women over a 5-year period. Some took hormones, others did not.

• Inferential goal: Determine if hormones improve the health of women not in the study.

1.2 Model of a process or system

Inputs (x) → Process → Output or Response (y)


Input variables consist of

controllable factors: measured and determined by scientist

uncontrollable factors: measured but not determined by scientist

noise factors: unmeasured, uncontrolled factors (experimental variability or “error”)

For any interesting process, there are inputs such that:

variability in input → variability in output

If variability in an input factor x leads to variability in output y, we say x is a source of variation. In this class we will discuss methods of designing and analyzing experiments to determine important sources of variation.

1.3 Experiments and observational studies

Information on how inputs affect output can be gained from:

• Observational studies: Input and output variables are observed from a pre-existing population. It may be hard to say what is input and what is output.

• Controlled experiments: (some) Input variables are controlled and manipulated by the experimenter to determine their effect on the output.

Example (Women’s Health Initiative, WHI):

• Population: Healthy, post-menopausal women in the U.S.

• Input variables:

1. estrogen treatment, yes/no

2. demographic variables (age, race, diet, family history, . . . )

3. unmeasured variables (?)

• Output variables

1. coronary heart disease (e.g. MI)


2. invasive breast cancer

3. . . .

• Scientific question: How does estrogen treatment affect health outcomes?

Observational Study:

1. Observational population: 93,676 women enlisted starting in 1991, tracked over eight years on average. Data consist of x = input variables and y = health outcomes, gathered concurrently on existing populations.

2. Results: good health/low rates of CHD generally associated with estrogen treatment.

3. Conclusion: Estrogen treatment is positively associated with good health outcomes, such as a low prevalence of CHD.

Experimental Study (WHI randomized controlled trial):

1. Experimental population:

373,092 women determined to be eligible

↪→ 18,845 provided consent to be in experiment

↪→ 16,608 included in the experiment

16,608 women randomized to either

x = 1 (estrogen treatment)
x = 0 (control, i.e. no estrogen treatment)

using a randomized block design: Women were treated at different clinics, and were of different ages.

                       age group
            1 (50-59)   2 (60-69)   3 (70-79)
clinic 1      n11         n12         n13
clinic 2      n21         n22         n23
  ...         ...         ...         ...

ni,j = # of women in study, in clinic i and in age group j
     = # of women in block i, j


Randomization scheme: For each block, 50% of the women in that block were randomly assigned to treatment (x = 1) and the remaining assigned to control (x = 0).

Question: Why did they randomize within a block?

2. Results: JAMA, July 17, 2002. Women on treatment had higher incidence of

• CHD

• breast cancer

• stroke

• pulmonary embolism

and lower incidence of

• colorectal cancer

• hip fracture

3. Conclusion: Estrogen treatment is not a viable preventative measure for CHD in the general population. That is, our inductive inference is

(specific) Higher CHD rate in treatment population than control

suggests

(general) if everyone in the population were treated, the incidence rate of CHD would increase.

Question: Why the different conclusions? Consider the following possible explanation: Let

x1 = estrogen treatment

x2 = “health consciousness” (not directly measured)

y = health outcomes


Observational study:

    X2 → X1 and X2 → Y (X2 causes both X1 and Y),
    inducing a correlation between X1 and Y.

Randomized experiment:

    randomization → X1 and X2 → Y:
    randomization, not X2, determines X1, breaking the causal link from X2 to X1.

Observational studies can suggest good experiments to run, but can’t definitively show causation.

Randomization can eliminate correlation between x1 and y due to a different cause x2, a.k.a. a confounder.

“No causation without randomization”


1.4 Steps in designing an experiment

1. Identify research hypotheses to be tested.

2. Choose a set of experimental units, which are the units to which treatments will be randomized.

3. Choose a response/output variable.

4. Determine potential sources of variation in response:

(a) factors of interest

(b) nuisance factors

5. Decide which variables to measure and control:

(a) treatment variables

(b) potential large sources of variation/blocking variables

6. Decide on the experimental procedure and how treatments are to be randomly assigned.

These factors are often constrained by budgets, ethics, time,. . .

Three principles in Experimental Design

1. Replication: Repetition of an experiment. Replicates are runs of an experiment or sets of experimental units that have the same values of the control variables.

More replication → more precise inference. Let

    yA,i = response of the ith unit assigned to treatment A
    yB,i = response of the ith unit assigned to treatment B,  i = 1, . . . , n.

Then ȳA ≠ ȳB provides evidence that treatment affects response, i.e. treatment is a source of variation (larger n → more evidence).

2. Randomization: Random assignment of treatments to experimental units. This removes any pre-experimental source of systematic bias and makes confounding unlikely.


3. Blocking: Randomization within blocks of homogeneous experimental units. The goal is to evenly distribute treatments across large potential sources of variation.

Example (Crop study):

Factor of interest: Tomato type, A or B.

Experimental units: three plots of land, each to be divided into a 2×2 grid.

Nuisance factor: Soil quality.

good soil

Plot 1 Plot 2 Plot 3

bad soil
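One plausible way to randomize within blocks for this layout is to treat each soil row within each plot as a block of two homogeneous subplots, and give each block one A and one B subplot in random order. A sketch in R (the block structure here is an assumption, since the layout above does not fully specify it):

```r
# Randomization within blocks (hypothetical scheme): in each of the three
# plots, the good-soil pair and the bad-soil pair of subplots each receive
# one A and one B, in random order.
set.seed(1)
design <- lapply(1:3, function(plot) {
  rbind(good.soil = sample(c("A", "B")),   # random order within the good-soil block
        bad.soil  = sample(c("A", "B")))   # random order within the bad-soil block
})
design
```

Every block then contains both treatments, so soil quality cannot be confounded with tomato type.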


Chapter 2

Comparing Two Treatments

Example: Wheat yield

Factor of interest: Fertilizer type, A or B. One factor of interest, having two levels.

Question: Is one fertilizer better than another, in terms of yield?

Experimental material: One plot of land to be divided into 2 rows of 6 subplots.

1. Design question: How to assign treatments/factor levels to the plots? We want to avoid confounding a treatment effect with another potential source of variation.

2. Potential sources of variation: Fertilizer, soil, sun, water.

3. Implementation: If we assign treatments randomly, we can avoid any pre-experimental bias in results: 12 playing cards, 6 red and 6 black, were shuffled and dealt.

1st card red → 1st plot gets A
2nd card red → 2nd plot gets A
3rd card black → 3rd plot gets B
...

This is our first design, a completely randomized design.
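The card-shuffling scheme amounts to a random permutation of six A’s and six B’s across the 12 subplots; a minimal R sketch:

```r
# Completely randomized design: randomly permute six A's and six B's
# across the 12 subplots (the R analogue of dealing 6 red and 6 black cards).
set.seed(502)   # arbitrary seed, for reproducibility
treatments <- sample(rep(c("A", "B"), each = 6))
treatments
```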


4. Results:

 A      A      B      B      A      B
26.9   11.4   26.6   23.7   25.3   28.5
 B      B      A      A      B      A
14.2   17.9   16.5   21.1   24.3   19.6

How much evidence is there that fertilizer type is a source of yield variation? Evidence about differences between two populations is generally measured by comparing summary statistics across the two sample populations. (Recall, a statistic is any computable function of known, observed data.)

2.1 Summaries of sample populations

Distribution:

• Empirical distribution: Pr(a, b] = #(a < yi ≤ b)/n

• Empirical CDF (cumulative distribution function)

F(y) = #(yi ≤ y)/n = Pr(−∞, y]

• Histograms

• Kernel density estimates

Note that these summaries more or less retain all the information in the data except the unit labels.
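These summaries are all available directly in R; for instance, the empirical CDF for the wheat data (values from the example above) can be computed with the built-in ecdf():

```r
# Empirical CDF of the wheat yields: Fhat(y0) = #(yi <= y0)/n.
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
Fhat <- ecdf(y)            # built-in empirical CDF
Fhat(20)                   # proportion of yields at or below 20
sum(y <= 20) / length(y)   # the same quantity, computed directly
```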

Location:

• sample mean or average: ȳ = (1/n) ∑i=1..n yi

• sample median: q(1/2) is a/the value y(1/2) such that

    #(yi ≤ y(1/2))/n ≥ 1/2 and #(yi ≥ y(1/2))/n ≥ 1/2.

To find the median, sort the data in increasing order, and call these values y(1), . . . , y(n). If there are no ties, then

    if n is odd, then y((n+1)/2) is the median;
    if n is even, then all numbers between y(n/2) and y(n/2+1) are medians.


[Figure 2.1 shows the empirical CDF, histogram, and kernel density estimate of the pooled yields y (n = 12), and the corresponding plots for the separate samples yA and yB (n = 6 each).]

Figure 2.1: Wheat yield distributions

Scale:

• sample variance and standard deviation:

    s² = (1/(n − 1)) ∑i=1..n (yi − ȳ)²,   s = √s²

• interquantile range:

    [y(1/4), y(3/4)] (interquartile range);
    [y(.025), y(.975)]

Example: Wheat yield

> yA<-c(26.9,11.4,25.3,16.5,21.1,19.6)

> yB<-c(26.6,23.7,28.5,14.2,17.9,24.3)


> mean(yA)

[1] 20.13333

> mean(yB)

[1] 22.53333

> median(yA)

[1] 20.35

> median(yB)

[1] 24

> sd(yA)

[1] 5.712676

> sd(yB)

[1] 5.432004

> quantile(yA,prob=c(.25,.75))

25% 75%

17.275 24.250

> quantile(yB,prob=c(.25,.75))

25% 75%

19.350 26.025

So there is a difference in yield for these wheat fields. Would you recommend B over A for future plantings? Do you think these results generalize to a larger population?

2.2 Hypothesis testing via randomization

Questions:

• Could the observed differences be due to fertilizer type?

• Could the observed differences be due to plot-to-plot variation?

Hypothesis tests:

• H0 (null hypothesis): Fertilizer type does not affect yield.


• H1 (alternative hypothesis): Fertilizer type does affect yield.

A statistical hypothesis test evaluates the plausibility of H0 in light of the data.

Suppose we are interested in mean wheat yields. We can evaluate H0 by answering the following questions:

• Is a mean difference of 2.4 plausible/probable if H0 is true?

• Is a mean difference of 2.4 large compared to experimental noise?

To answer the above, we need to compare

    |ȳB − ȳA| = 2.4, the observed difference in the experiment,

to

    values of |ȲB − ȲA| that could have been observed if H0 were true.

Hypothetical values of |ȲB − ȲA| that could have been observed under H0 are referred to as samples from the null distribution.

Finding a null distribution: Let

g(YA,YB) = g({Y1,A, . . . , Y6,A}, {Y1,B, . . . , Y6,B}) = |ȲB − ȲA|.

This is a function of the outcome of the experiment. It is a statistic. Since we will use it to perform a hypothesis test, we will call it a test statistic.

Observed test statistic: g(26.9, 11.4, . . . , 24.3) = 2.4 = gobs

Hypothesis testing procedure: Compare gobs to g(YA,YB) for values of YA and YB that could have been observed, if H0 were true.

Recall the outcome of the experiment:

1. Cards were shuffled and dealt R, R, B, B, . . . and fertilizer types planted in subplots:

A A B B A B

B B A A B A


2. Crops were grown and wheat yields obtained:

 A      A      B      B      A      B
26.9   11.4   26.6   23.7   25.3   28.5
 B      B      A      A      B      A
14.2   17.9   16.5   21.1   24.3   19.6

Now imagine re-doing the experiment in a universe where “H0: no treatment effect” is true:

1. Cards are shuffled and dealt B, R, B, B, . . . and wheat types planted in subplots:

B A B B A A

A B B A A B

2. Crops were grown and wheat yields obtained:

 B      A      B      B      A      A
26.9   11.4   26.6   23.7   25.3   28.5
 A      B      B      A      A      B
14.2   17.9   16.5   21.1   24.3   19.6

Under this hypothetical treatment assignment,

(YA,YB) = {11.4, 25.3, . . . , 19.6}
|ȲB − ȲA| = 1.07

This represents an outcome of the experiment in a universe where

• The treatment assignment is B, A, B, B, A, A, A, B, B, A, A, B;

• H0 is true.


IDEA: To consider what types of outcomes we would see in universes where H0 is true, compute g(YA,YB) under every possible treatment assignment, assuming H0 is true.

Under our randomization scheme, there were

    12! / (6! 6!) = (12 choose 6) = 924

equally likely ways the treatments could have been assigned. For each one of these, we can calculate the value of the test statistic that would’ve been observed under H0:

{g1, g2, . . . , g924}

This enumerates all potential pre-randomization outcomes of our test statistic, assuming no treatment effect. Along with the fact that each treatment assignment is equally likely, these values give a null distribution, a probability distribution of possible experimental results, if H0 is true.

Pr(g(YA,YB) ≤ x | H0) = #{gk ≤ x} / 924

This distribution is sometimes called the randomization distribution, because it is obtained by the randomization scheme of the experiment.
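For an experiment this small, the full enumeration is feasible in R (yields from the example above):

```r
# Exact randomization distribution: enumerate all choose(12, 6) = 924
# equally likely assignments of six subplots to A, and compute the test
# statistic for each, assuming no treatment effect.
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
idx <- combn(12, 6)    # each column gives one possible set of A-subplots
g <- apply(idx, 2, function(a) abs(mean(y[-a]) - mean(y[a])))
length(g)              # the 924 values {g1, ..., g924}
mean(g >= 2.4)         # exact Pr(g(YA,YB) >= 2.4 | H0)
```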

Is there any contradiction between H0 and our data?

Pr(g(YA,YB) ≥ 2.4|H0) = 0.47

According to this calculation, observing a mean difference of 2.4 or more is not unlikely under the null hypothesis. This probability calculation is called a p-value. Generically, a p-value is

“The probability, under the null hypothesis, of obtaining a result as or moreextreme than the observed result.”

The basic idea:

small p-value → evidence against H0

large p-value → no evidence against H0


[Figure 2.2 shows the randomization distributions of ȲB − ȲA and of |ȲB − ȲA|.]

Figure 2.2: Randomization distribution for the wheat example

Approximating a randomization distribution: We don’t want to have to enumerate all (n choose n/2) possible treatment assignments. Instead, repeat the following Nsim times:

(a) randomly simulate a treatment assignment from the population of possible treatment assignments, under the randomization scheme.

(b) compute the value of the test statistic, given the simulated treatment assignment and under H0.

The empirical distribution of {g1, . . . , gNsim} approximates the null distribution:

    #(|gk| ≥ 2.4) / Nsim ≈ Pr(g(YA,YB) ≥ 2.4 | H0)

The approximation improves as Nsim is increased.

Here is some R code:

y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("A","A","B","B","A","B","B","B","A","A","B","A")

g <- numeric(5000)                    # storage for the simulated test statistics
for(nsim in 1:5000) {
  xsim <- sample(x)                   # a random reassignment of treatments
  g[nsim] <- abs( mean(y[xsim=="B"]) - mean(y[xsim=="A"]) )
}
mean(g >= 2.4)                        # approximate p-value

2.3 Essential nature of a hypothesis test

Given H0, H1 and data y = {y1, . . . , yn}:

1. From the data, compute a relevant test statistic g(y). g(y) should be chosen so that it can differentiate between H0 and H1. For example,

g(y) is probably small under H0 and large under H1.

2. Obtain a null distribution: A probability distribution over the possible outcomes of g(Y) under H0. Here, Y = {Y1, . . . , Yn} are potential experimental results that could have happened under H0.

3. Compute the p-value: The probability under H0 of observing a test statistic g(Y) as or more extreme than the observed statistic g(y).

p-value = Pr(g(Y) ≥ g(y)|H0)

If the p-value is small ⇒ evidence against H0.

If the p-value is large ⇒ no evidence against H0.

Even if we follow these guidelines, we must be careful in our specification of H0, H1 and g(Y) for the hypothesis testing procedure to be useful.

Questions:

• When is a small p-value evidence in favor of H1?

• Give a scenario where you might have evidence against H1, but a small p-value.

• What does the p-value say about the probability that the null hypothesis is true? Try using Bayes’ rule to figure this out.


2.4 Basic decision theory, or “What use is a p-value?”

Task: Accept or reject H0 based on data.

                        truth
action         H0 true             H0 false
accept H0      correct decision    type II error
reject H0      type I error        correct decision

As we discussed

• The p-value can be a measure of evidence against H0;

• The smaller the p-value, the stronger the evidence against H0.

Decision procedure:

1. Compute the p-value by comparing the observed test statistic to the null distribution;

2. Reject H0 if p-value ≤ α, otherwise accept H0.

This procedure is called a level-α test. It controls the pre-experimental probability of a type I error, or for a series of experiments, controls the type I error rate.

Pr(type I error | H0) = Pr(reject H0 | H0) = Pr(p-value ≤ α | H0) = α

Single Experiment Interpretation: If you use a level-α test for your experiment where H0 is true, then before you run the experiment there is probability α that you will erroneously reject H0.

Many Experiments Interpretation: If level-α tests are used in a large population of experiments, then H0 will be declared false in (100 × α)% of the experiments in which H0 is true.
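The many-experiments interpretation can be checked by simulation. A sketch, using R's built-in t.test for brevity even though the t-test itself is not introduced until Section 2.7, and an arbitrary normal population:

```r
# Level-alpha property by simulation: when H0 is true (here, both samples
# drawn from the same normal population), a level-0.05 test rejects H0 in
# about 5% of experiments.
set.seed(1)
alpha <- 0.05
pvals <- replicate(10000, t.test(rnorm(6), rnorm(6), var.equal = TRUE)$p.value)
mean(pvals <= alpha)   # close to alpha
```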


Pr(H0 rejected|H0 true) = α

Pr(H0 accepted|H0 true) = 1− α

Pr(H0 rejected|H0 false) = ?

Pr(H0 accepted|H0 false) = ?

Pr(H0 rejected|H0 false) is the power. Typically we need to be more specific than “H0 false” to calculate the power.

2.5 Relating samples to (super)populations

If the experiment is

• large

• complicated

• non- or partially randomized

then a null distribution based on randomization may be difficult to obtain. An alternative approach is based on formulating a sampling model.

Consider the following model for our experiment:

• There is a large/infinite population of plots of similar size/shape/composition as the plots in our experiment.

• When A is applied to these plots, the distribution of plot yields can be represented by a probability distribution pA with

    mean = E(YA) = ∫ y pA(y) dy = µA;
    variance = V(YA) = E[(YA − µA)²] = σ²A.

• When B is applied to these plots, the distribution of plot yields can be represented by a probability distribution pB with

    mean = E(YB) = ∫ y pB(y) dy = µB;
    variance = V(YB) = E[(YB − µB)²] = σ²B.


• The plots that received A in our experiment can be viewed as independent samples from pA, and likewise for plots receiving B:

    Y1,A, . . . , YnA,A ∼ i.i.d. pA
    Y1,B, . . . , YnB,B ∼ i.i.d. pB

Recall from intro stats:

E(ȲA) = E( (1/nA) ∑i=1..nA Yi,A )
      = (1/nA) ∑i=1..nA E(Yi,A)
      = (1/nA) ∑i=1..nA µA = µA

We say that ȲA is an unbiased estimator of µA. Furthermore, if Y1,A, . . . , YnA,A are independent samples from population A, then

    V(ȲA) = V( (1/nA) ∑i Yi,A )
          = (1/n²A) ∑i V(Yi,A)
          = (1/n²A) ∑i σ²A = σ²A / nA.

As nA → ∞,

    V(ȲA) = E[(ȲA − µA)²] → 0,

which, with unbiasedness, implies ȲA → µA, and we say that ȲA is a consistent estimator for µA. In general, as the sample size increases,

    ȳA → µA
    s²A → σ²A
    F̂A(x) = #{Yi,A ≤ x}/nA → FA(x) = ∫−∞..x pA(y) dy.
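These limits are easy to see by simulation. A sketch with a hypothetical normal superpopulation (µA = 20, σA = 5 are arbitrary illustrative values, not tied to the wheat data):

```r
# Consistency by simulation: draw samples of increasing size from a
# hypothetical superpopulation with mu.A = 20 and sigma.A = 5, and watch
# the sample mean and sample SD approach the population values.
set.seed(1)
for (nA in c(10, 1000, 100000)) {
  yA <- rnorm(nA, mean = 20, sd = 5)
  cat("nA =", nA, " ybar.A =", round(mean(yA), 3), " s.A =", round(sd(yA), 3), "\n")
}
```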


[Figure 2.3 shows the superpopulation model: the experimental samples, with ȳA = 20.13, sA = 5.72 and ȳB = 22.53, sB = 5.43, are viewed as random samples from the ‘all possible’ A and B wheat yield distributions, with means µA and µB.]

Figure 2.3: The superpopulation model


Back to hypothesis testing: We can formulate null and alternative hypotheses in terms of population quantities. For example, if µB > µA, we would recommend B over A, and vice versa.

• H0 : µA = µB

• H1 : µA ≠ µB

The experiment is performed and it is observed that

22.53 = ȳB > ȳA = 20.13

This is some evidence that µB > µA. How much evidence is it? Should we reject the null hypothesis? Consider our test statistic:

g(YA, YB) = ȲA − ȲB

To decide whether to reject or not, we need to know the distribution of g(YA, YB) under H0. Consider the following setup:

Assume:

• Y1,A, . . . , YnA,A ∼ i.i.d. pA

• Y1,B, . . . , YnB ,B ∼ i.i.d. pB

Evaluate: H0 : µA = µB versus H1 : µA ≠ µB

To obtain a p-value, we need the distribution of g(YA, YB) under µA = µB. This will involve assumptions about/approximations to pA and pB.

2.6 Normal distribution

The normal distribution is useful because in many cases,

• our data are approximately normally distributed, and/or

• our sample means are approximately normally distributed.


These are both due to the central limit theorem:

X1 ∼ P1(µ1, σ1)
X2 ∼ P2(µ2, σ2)
...
Xm ∼ Pm(µm, σm)

⇒  Σ_{j=1}^{m} Xj ≈ normal( Σ µj , √(Σ σj²) )

Sums of varying quantities are approximately normally distributed.
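As a quick illustration of this statement (not from the notes; a minimal numpy sketch with arbitrarily chosen component distributions), we can add up many independent, differently-distributed quantities and check that the sum has mean Σ µj and standard deviation √(Σ σj²):

```python
# CLT illustration: a sum of heterogeneous independent variables is
# approximately normal(sum mu_j, sqrt(sum sigma_j^2)).
import numpy as np

rng = np.random.default_rng(1)
m, nsim = 30, 20000

sums = np.zeros(nsim)
mu_tot, var_tot = 0.0, 0.0
for j in range(m):
    if j % 3 == 0:        # uniform(0, 2): mean 1, variance 4/12
        x = rng.uniform(0, 2, nsim); mu_j, var_j = 1.0, 4 / 12
    elif j % 3 == 1:      # exponential(1): mean 1, variance 1
        x = rng.exponential(1.0, nsim); mu_j, var_j = 1.0, 1.0
    else:                 # Bernoulli(0.3): mean 0.3, variance 0.21
        x = rng.binomial(1, 0.3, nsim); mu_j, var_j = 0.3, 0.21
    sums += x
    mu_tot += mu_j
    var_tot += var_j

print(sums.mean(), mu_tot)             # close
print(sums.std(), np.sqrt(var_tot))    # close
```

A histogram of `sums` is bell-shaped even though none of the components is normal.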

Normally distributed data: Consider crop yields from plots of land.

Yi = a1 × seedi + a2 × soili + a3 × wateri + a4 × suni + · · ·

The empirical distribution of crop yields from a population of fields with varying quantities of seed, soil, water, sun, . . . will be approximately normal(µ, σ), where µ and σ depend on the effects a1, a2, a3, a4, . . . and the variability of seed, soil, water, sun, . . . .

Additive effects ⇒ normally distributed data

Normally distributed means: Consider the following scenario:

Experiment 1: sample y_1^(1), . . . , y_n^(1) ∼ i.i.d. p and compute ȳ^(1);
Experiment 2: sample y_1^(2), . . . , y_n^(2) ∼ i.i.d. p and compute ȳ^(2);
...
Experiment m: sample y_1^(m), . . . , y_n^(m) ∼ i.i.d. p and compute ȳ^(m).

A histogram of {ȳ^(1), . . . , ȳ^(m)} will look approximately normally distributed, with

sample mean of {ȳ^(1), . . . , ȳ^(m)} ≈ µ
sample variance of {ȳ^(1), . . . , ȳ^(m)} ≈ σ²/n

i.e. the sampling distribution of the mean is approximately normal(µ, σ²/n), even if the sampling distribution of the data is not normal.
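The repeated-experiment scenario above is easy to simulate. A hedged sketch in Python (numpy assumed available), deliberately using a skewed population p to emphasize that the data need not be normal:

```python
# Sampling distribution of the mean: m repeated experiments, each a sample
# of size n from an exponential(1) population (mean 1, variance 1).
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 30
mu, sigma2 = 1.0, 1.0

# ybar[k] plays the role of ybar^(k) in the notes
ybar = rng.exponential(1.0, size=(m, n)).mean(axis=1)

print(ybar.mean())   # close to mu
print(ybar.var())    # close to sigma2 / n
```

A histogram of `ybar` looks approximately normal even though the individual observations are strongly right-skewed.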


Basic properties of the normal distribution:

• Y ∼ normal(µ, σ) ⇒ aY + b ∼ normal(aµ + b, |a|σ).
• Y1 ∼ normal(µ1, σ1), Y2 ∼ normal(µ2, σ2), Y1, Y2 uncorrelated ⇒ Y1 + Y2 ∼ normal(µ1 + µ2, √(σ1² + σ2²)).
• If Y1, . . . , Yn ∼ i.i.d. normal(µ, σ), then Ȳ is statistically independent of s².

How does this help with hypothesis testing? Consider testing H0 : µA = µ = µB (treatment doesn't affect the mean). Then regardless of the distribution of the data, under H0:

ȲA is approximately normal(µ, σA/√nA)
ȲB is approximately normal(µ, σB/√nB)
ȲB − ȲA is approximately normal(0, σAB)

where σAB² = σA²/nA + σB²/nB. So if we knew the variances, we'd have a null distribution.

2.7 Introduction to the t-test

Consider a simple one-sample hypothesis test:

Y1, . . . , Yn ∼ i.i.d. P , with mean µ and variance σ2.

H0 : µ = µ0

H1 : µ 6= µ0

To test H0, we need a test statistic and its distribution under H0.

• Y would make a good test stat: it is sensitive to deviations from H0.

• Its distribution is approximately known:

– E(Y ) = µ

– V (Y ) = σ2/n


– Y is approximately normal.

Therefore, under H0,

f(Y) = (Ȳ − µ0) / (σ/√n)

is approximately standard normal, and we write f(Y) ∼ normal(0, 1). Is f(Y) a statistic?

• Problem: Don’t usually know σ2.

• Solution: Approximate σ2 with s2.

One-sample t-statistic:

t(Y) = (Ȳ − µ0) / (s/√n)

What is the null distribution of t(Y)? It seems that s ≈ σ, so

(Ȳ − µ0)/(s/√n) ≈ (Ȳ − µ0)/(σ/√n),

so t(Y) is approximately normally distributed. However, if the approximation s ≈ σ is poor, as when n is small, we need to take account of our uncertainty in the estimate of σ.

χ² distribution

Z1, . . . , Zn ∼ i.i.d. normal(0, 1) ⇒ Σ Zi² ∼ χ²_n, the chi-squared distribution with n degrees of freedom, and Σ (Zi − Z̄)² ∼ χ²_{n−1}.

Y1, . . . , Yn ∼ i.i.d. normal(µ, σ) ⇒ (Y1 − µ)/σ, . . . , (Yn − µ)/σ ∼ i.i.d. normal(0, 1)

⇒ (1/σ²) Σ (Yi − µ)² ∼ χ²_n
  (1/σ²) Σ (Yi − Ȳ)² ∼ χ²_{n−1}

Some intuition: (Z1 − Z̄, . . . , Zn − Z̄) is a vector of length n but lies in an (n − 1)-dimensional space. In fact, it has a singular multivariate normal

Figure 2.4: The χ² distribution, with n = 9, 10, and 11 degrees of freedom.

distribution, with a rank-(n − 1) covariance matrix. Which vector do you expect to be bigger: (Z1, . . . , Zn) or (Z1 − Z̄, . . . , Zn − Z̄)?

Getting back to the problem at hand,

((n − 1)/σ²) s² = ((n − 1)/σ²) × (1/(n − 1)) Σ (Yi − Ȳ)² ∼ χ²_{n−1},

a known distribution we can look up in a table.

t-distribution: If

• Z ∼ normal(0, 1);
• X ∼ χ²_m;
• Z, X statistically independent,

then

Z / √(X/m) ∼ t_m, the t-distribution with m degrees of freedom.

How does this help us? Recall that if Y1, . . . , Yn ∼ i.i.d. normal(µ, σ),

Figure 2.5: The t-distribution, for n = 3, 6, 12, and ∞ degrees of freedom.

• √n (Ȳ − µ)/σ ∼ normal(0, 1);
• ((n − 1)/σ²) s² ∼ χ²_{n−1};
• Ȳ, s² are independent.

Let Z = √n (Ȳ − µ)/σ and X = ((n − 1)/σ²) s². Then

Z / √(X/(n − 1)) = [ √n (Ȳ − µ)/σ ] / √( ((n − 1)/σ²) s² / (n − 1) ) = (Ȳ − µ)/(s/√n) ∼ t_{n−1}

This is still not a statistic, as µ is unknown, but under a specific hypothesis, like H0 : µ = µ0, it is a statistic:

t(Y) = (Ȳ − µ0)/(s/√n) ∼ t_{n−1} if E(Y) = µ0.

It is called the t-statistic.
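As a sanity check of this definition (an illustrative sketch with made-up data; scipy assumed available), the hand-computed t-statistic and two-sided p-value agree with scipy's one-sample t-test:

```python
# One-sample t-statistic computed from its definition, checked against
# scipy.stats.ttest_1samp. The data values below are made up.
import numpy as np
from scipy import stats

y = np.array([19.1, 22.4, 20.7, 18.2, 24.0, 21.5, 20.1, 23.3])
mu0 = 20.0

n = len(y)
t_manual = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))   # t(Y)
p_manual = 2 * (1 - stats.t.cdf(abs(t_manual), df=n - 1))    # two-sided

t_scipy, p_scipy = stats.ttest_1samp(y, popmean=mu0)
print(t_manual, t_scipy)   # identical
print(p_manual, p_scipy)   # identical
```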

Some questions for discussion:


• What happens to the distribution of t(Y) when n → ∞? Why?

• What if the data are not normal?

– What is the distribution of √n (Ȳ − µ)/σ for large and small n?

– What is the distribution of ((n − 1)/σ²) s² for large and small n?

– Are √n (Ȳ − µ) and s² independent for small n? What about for large n?

Two-sided, one-sample t-test:

1. Sampling model: Y1, . . . , Yn ∼ i.i.d. normal(µ, σ)

2. Null hypothesis: H0 : µ = µ0

3. Alternative hypothesis: H1 : µ 6= µ0

4. Test statistic: t(Y) = √n (Ȳ − µ0)/s

• Pre-experiment we think of this as a random variable, an unknown quantity that is to be randomly sampled from a population.

• Post-experiment this is a fixed number.

5. Null distribution: Under the sampling model and H0, the sampling distribution of t(Y) is the t-distribution with n − 1 degrees of freedom:

t(Y) ∼ t_{n−1}

If H0 is not true, then t(Y) has a non-central t-distribution, which we will use later to calculate power, or the type II error rate.

6. p-value: Let y be the observed data.

p-value = Pr( |t(Y)| ≥ |t(y)| | H0 )
        = Pr( |T_{n−1}| ≥ |t(y)| )
        = 2 × Pr( T_{n−1} ≥ |t(y)| )
        = 2 * (1 - pt(t.obs, n-1))   in R, or directly: t.test(y, mu = mu0)

7. Level-α decision procedure: Reject H0 if

Figure 2.6: A t8 null distribution and α = 0.05 rejection region.

• p-value ≤ α or equivalently

• |t(y)| ≥ t(n−1),1−α/2 (for α = .05, t(n−1),1−α/2 ≈ 2 ).

Question: Suppose our procedure is to reject H0 only when t(y) ≥ t_{(n−1),1−α}. Is this a level-α test?

Confidence intervals via hypothesis tests: Recall that

• H0 : E(Y) = µ0 is rejected if √n |(ȳ − µ0)/s| ≥ t_{1−α/2}

• H0 : E(Y) = µ0 is not rejected if √n |(ȳ − µ0)/s| ≤ t_{1−α/2}, i.e. if

|ȳ − µ0| ≤ (s/√n) × t_{1−α/2}

ȳ − (s/√n) × t_{1−α/2} ≤ µ0 ≤ ȳ + (s/√n) × t_{1−α/2}


If µ0 satisfies this last line, then it is in the acceptance region. Otherwise it is in the rejection region. In other words, "plausible" values of µ are in the interval

ȳ ± (s/√n) × t_{1−α/2}

We say this interval is a "100 × (1 − α)% confidence interval" for µ. This interval contains only those values of µ that are not rejected by this level-α test.

Main property of a confidence interval: Suppose you are going to

1. gather data;

2. compute a 100× (1− α)% confidence interval.

Further suppose H0 : E(Y ) = µ0 is true.

Question: What is the probability that µ0 will be in your to-be-sampled/random interval?

Pr(µ0 in interval | E(Y) = µ0) = 1 − Pr(µ0 not in interval | E(Y) = µ0)
                               = 1 − Pr(reject H0 | E(Y) = µ0)
                               = 1 − α

The quantity 1− α is called the coverage probability. It is

• the pre-experimental probability that your confidence interval will cover the true value;

• the large-sample fraction of experiments in which the confidence interval covers the true mean.
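The coverage property can be checked by simulation. A sketch (illustrative; numpy and scipy assumed available, with made-up values of µ, σ, and n):

```python
# Coverage simulation: the fraction of experiments in which the interval
# ybar +/- t_{1-alpha/2} * s/sqrt(n) covers the true mean should be close
# to 1 - alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, alpha = 5.0, 2.0, 15, 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)

nsim, covered = 4000, 0
for _ in range(nsim):
    y = rng.normal(mu, sigma, n)
    half = tcrit * y.std(ddof=1) / np.sqrt(n)
    if y.mean() - half <= mu <= y.mean() + half:
        covered += 1

print(covered / nsim)   # close to 0.95
```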

2.8 Two sample tests

Recall the wheat example:

B     A     B     B     A     A
26.9  11.4  26.6  23.7  25.3  28.5
A     B     B     A     A     B
14.2  17.9  16.5  21.1  24.3  19.6


Sampling model:

Y1A, . . . , YnAA ∼ i.i.d. normal(µA, σ)

Y1B, . . . , YnBB ∼ i.i.d. normal(µB, σ).

In addition to normality we assume for now that both variances are equal.

Hypotheses: H0 : µA = µB; HA : µA ≠ µB

Recall that

ȲB − ȲA ∼ normal( µB − µA , σ √(1/nA + 1/nB) ).

Hence if H0 is true then

ȲB − ȲA ∼ normal( 0 , σ √(1/nA + 1/nB) ).

How should we estimate σ2 ?

s2

pooled =

∑nA

i=1(yiA − yA)2 +∑nB

i=1(yiB − yB)2

(nA − 1) + (nB − 1)

=nA − 1

(nA − 1) + (nB − 1)s2

A +nB − 1

(nA − 1) + (nB − 1)s2

B

This gives us the following two-sample t-statistic:

t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA+nB−2}

Self-check exercises:

1. Show that (nA + nB − 2) s²p / σ² ∼ χ²_{nA+nB−2} (recall how the χ² distribution was defined).

2. Show that the two-sample t-statistic has a t-distribution with nA + nB − 2 d.f.
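As a partial numerical check of the statistic just defined (an illustrative Python sketch; scipy assumed available, data values taken from the wheat table in this section), the pooled-variance statistic computed from its definition agrees with scipy's equal-variance t-test:

```python
# Pooled-variance two-sample t-statistic computed by hand, checked against
# scipy.stats.ttest_ind with equal_var=True.
import numpy as np
from scipy import stats

yA = np.array([11.4, 25.3, 28.5, 14.2, 21.1, 24.3])
yB = np.array([26.9, 26.6, 23.7, 17.9, 16.5, 19.6])
nA, nB = len(yA), len(yB)

sp2 = (((nA - 1) * yA.var(ddof=1) + (nB - 1) * yB.var(ddof=1))
       / (nA + nB - 2))                       # pooled variance
t_manual = (yB.mean() - yA.mean()) / np.sqrt(sp2 * (1 / nA + 1 / nB))
p_manual = 2 * (1 - stats.t.cdf(abs(t_manual), df=nA + nB - 2))

t_scipy, p_scipy = stats.ttest_ind(yB, yA, equal_var=True)
print(t_manual, t_scipy)   # agree
print(p_manual, p_scipy)   # agree
```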


Numerical Example: Wheat again

Let α = 0.05, so if the null hypothesis is true, the probability of a type I error is 0.05.

• ȳA = 20.13, s²A = 32.63, nA = 6;  ȳB = 22.53, s²B = 29.51, nB = 6

• s²p = 31.07, so t(yA, yB) = 2.4/(5.57 √(1/6 + 1/6)) = 0.75

• Hence the p-value = Pr(|T10| ≥ 0.75) = 0.47

• Hence H0 : µA = µB is not rejected at level α = 0.05

> t.test( y[x=="A"],y[x=="B"], var.equal=T)

Two Sample t-test

data:  y[x == "A"] and y[x == "B"]
t = -0.7458, df = 10, p-value = 0.473
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.570623  4.770623
sample estimates:
mean of x mean of y
 20.13333  22.53333

Always keep in mind where this comes from: See Figure 2.7

Comparison to the randomization test: We could also use the two-sample t-statistic in a randomization test. To do this we would

• generate a null distribution of potential outcomes of the experiment

{(YA, YB)^(1), . . . , (YA, YB)^(nsim)}

that we could have seen under H0 and different treatment assignments

• compare the null distribution of the outcomes to the observed outcome.

Figure 2.7: The t-based null distribution for the wheat example.

The comparison can be made using any statistic we like. We could use |ȲA − ȲB|, or we could use t(YA, YB).

Recall, to obtain a sample of t(YA, YB) from the null distribution, we

1. Sample a treatment assignment according to the randomization scheme;

2. Compute the value of t(YA, YB) under this treatment assignment, assuming the null hypothesis.

The randomization distribution of the t-statistic is then approximated by the empirical distribution of

t^(1), . . . , t^(nsim)

To obtain the p-value, we compare t_obs to this empirical distribution:

p-value = #( |t^(j)| ≥ |t_obs| ) / nsim

t.stat.sim <- numeric()   # numeric() replaces the old S function real()
for(nsim in 1:5000) {

Figure 2.8: Randomization distribution for the t-statistic.

  xsim <- sample(x)
  t.stat.sim[nsim] <- t.test(y[xsim=="B"], y[xsim=="A"], var.equal=TRUE)$stat

}

mean( abs(t.stat.sim) >= abs(t.stat.obs) )

When I ran this, I got

#( |t^(j)| ≥ 0.75 ) / nsim = 0.48 ≈ 0.47 = Pr( |T_{nA+nB−2}| ≥ 0.75 )

Is this surprising? These two p-values were obtained via two completely different ways of looking at the problem!
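The randomization calculation above can be sketched outside R as well. A hedged Python version (numpy and scipy assumed available), using the wheat data and treatment assignment as printed in Section 2.8:

```python
# Randomization test with the two-sample t-statistic: permute treatment
# labels, recompute t, and compare the permutation p-value to the
# t-distribution p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = np.array([26.9, 11.4, 26.6, 23.7, 25.3, 28.5,
              14.2, 17.9, 16.5, 21.1, 24.3, 19.6])
x = np.array(list("BABBAA" "ABBAAB"))   # field-by-field assignment

def tstat(labels):
    return stats.ttest_ind(y[labels == "B"], y[labels == "A"],
                           equal_var=True)[0]

t_obs = tstat(x)
t_sim = np.array([tstat(rng.permutation(x)) for _ in range(5000)])
p_perm = np.mean(np.abs(t_sim) >= abs(t_obs))
p_t = stats.ttest_ind(y[x == "B"], y[x == "A"], equal_var=True)[1]

print(p_perm, p_t)   # the two p-values are typically close
```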

Comparison:

• Assumptions:

– Randomization Test: (1) Treatments are randomly assigned

– t-test:


(1) Data are independent samples

(2) Each population is normally distributed

(3) The two populations have the same variance

• Imagined Universes:

– Randomization Test: Numerical responses remain fixed; we imagine only hypothetical treatment assignments.

– t-test: Treatment assignments remain fixed; we imagine a population of different experimental units and/or conditions, giving different numerical responses.

• Inferential Context / Type of generalization

– Randomization Test: inference is specific to our particular experimental units and conditions.

– t-test: under our assumptions, inference claims to be generalizable to other units/conditions, i.e. to a larger population.

So, why are the resulting null distributions so similar?

Some history:

de Moivre (1733): Approximating binomial distributions: T = Σ_{i=1}^{n} Yi, Yi ∈ {0, 1}.

Laplace (1800s): Used the normal distribution to model measurement error in experiments.

Gauss (1800s): Justified least squares estimation by assuming normally distributed errors.

Gosset/Student (1908): Derived the t-distribution.

Gosset/Student (1925): “testing the significance”

Fisher (1925): “level of significance”

Fisher (1920s?): Fisher’s exact test.


Fisher (1935): "It seems to have escaped recognition that the physical act of randomisation, which, as has been shown, is necessary for the validity of any test of significance, affords the means, in respect of any particular body of data, of examining the wider hypothesis in which no normality of distribution is implied."

Box and Andersen (1955): Approximation of randomization null distributions (and a long discussion).

"If Fisher's ANOVA had been invented 30 years later or computers had been available 30 years sooner, our statistical procedures would probably be less tied to theoretical distributions than they are today" (Rodgers, 1999)

Confidence Interval for a difference between treatments: Recall that we may construct a 95% confidence interval by finding those null hypotheses that would not be rejected at the 0.05 level.

Sampling model:

Y1A, . . . , YnAA ∼ i.i.d. normal(µA, σ²)
Y1B, . . . , YnBB ∼ i.i.d. normal(µB, σ²).

Consider evaluating whether δ is a reasonable value for the mean difference:

H0 : µA − µB = δ
H1 : µA − µB ≠ δ

Under H0, you should be able to show that

( (ȲA − ȲB) − δ ) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA+nB−2}

Thus a given difference δ is accepted at level α iff

|ȳA − ȳB − δ| / ( sp √(1/nA + 1/nB) ) ≤ t_{1−α/2, nA+nB−2}

(ȳA − ȳB) − sp √(1/nA + 1/nB) t_{1−α/2, nA+nB−2}  ≤  δ  ≤  (ȳA − ȳB) + sp √(1/nA + 1/nB) t_{1−α/2, nA+nB−2}

Wheat example: A 95% C.I. for µA − µB is (−9.57, 4.77).
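This interval can be reproduced from the summary statistics alone. A sketch (scipy assumed available), using the values ȳA, s²A, ȳB, s²B reported in the numerical example:

```python
# 95% CI for mu_A - mu_B from the wheat summary statistics.
import numpy as np
from scipy import stats

ybarA, s2A, nA = 20.13, 32.63, 6
ybarB, s2B, nB = 22.53, 29.51, 6

sp = np.sqrt(((nA - 1) * s2A + (nB - 1) * s2B) / (nA + nB - 2))
tcrit = stats.t.ppf(0.975, df=nA + nB - 2)
half = tcrit * sp * np.sqrt(1 / nA + 1 / nB)

lo, hi = (ybarA - ybarB) - half, (ybarA - ybarB) + half
print(lo, hi)   # approximately (-9.57, 4.77)
```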


Questions:

• What does the fact that 0 is in the interval say about H0 : µA = µB?

• What is the interpretation of this interval?

• Could we have constructed an interval via randomization tests?

2.9 Power and Sample Size Determination

Suppose that we plan to gather data, and then perform a hypothesis test.

Two sample t-test:

• H0 : µA = µB;  H1 : µA ≠ µB

• Gather data

• Perform a level α hypothesis test: reject H0 if

|tobs| ≥ t1−α/2,nA+nB−2

Recall, if α = 0.05 and nA, nB are large then t1−α/2,nA+nB−2 ≈ 2

We know that the type I error rate is α = 0.05, or more precisely:

Pr(type I error|H0 true) = Pr(reject H0|H0 true) = 0.05

What about

Pr(type II error|H0 false) = Pr(accept H0|H0 false)

= 1− Pr(reject H0|H0 false)

This is not yet a well-defined problem: there are many different ways in which the null hypothesis may be false, e.g. µB − µA = 0.0001 and µB − µA = 10,000 are both instances of the alternative hypothesis. However, clearly we have

Pr(reject H0|µB − µA = .0001) < Pr(reject H0|µB − µA = 10, 000)


To make the question concerning the type II error rate better defined, we need to be able to refer to a specific alternative hypothesis. For example, in the case of the two-sample test, for a specific difference δ we may ask: what is

1 − Pr(type II error | µB − µA = δ) = Pr(reject H0 | µB − µA = δ)?

We define the power of a two-sample hypothesis test against a specific alternative to be:

Power(δ, σ, nA, nB) = Pr( reject H0 | µB − µA = δ )
                    = Pr( |t(YA, YB)| ≥ t_{1−α/2, nA+nB−2} | µB − µA = δ ).

Remember, the "critical" value t_{1−α/2, nA+nB−2} above which we reject the null hypothesis was computed from the null distribution.

However, now we want to work out the probability of getting a value of the t-statistic greater than this critical value when a specific alternative hypothesis is true. Thus we need to compute the distribution of our t-statistic under the specific alternative hypothesis.

If we suppose YA1, . . . , YAnA ∼ i.i.d. normal(µA, σ) and YB1, . . . , YBnB ∼ i.i.d. normal(µB, σ), where µB − µA = δ, then we need to know the distribution of

t(YA, YB) = (ȲB − ȲA) / ( sp √(1/nA + 1/nB) ).

We know that if µB − µA = δ then

( ȲB − ȲA − δ ) / ( sp √(1/nA + 1/nB) ) ∼ t_{nA+nB−2}

but unfortunately

t(YA, YB) = ( ȲB − ȲA − δ ) / ( sp √(1/nA + 1/nB) )  +  δ / ( sp √(1/nA + 1/nB) ) ,   (∗)


hence this has a distribution we have not seen before: the non-central t-distribution. More specifically, if µB − µA = δ then

t(YA, YB) ∼ t*_{nA+nB−2}( γ ),  where  γ = δ / ( σ √(1/nA + 1/nB) )

is the non-centrality parameter.

2.9.1 The non-central t-distribution

A non-central t-distributed random variable can be represented as

T = (Z + γ) / √(X/ν)

where

• γ is a constant;
• Z is standard normal;
• X is χ² with ν degrees of freedom, independent of Z.

The quantity γ is called the non-centrality parameter.
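The representation above can be checked by simulation against a library implementation. An illustrative sketch (numpy and scipy assumed available; ν and γ chosen arbitrarily):

```python
# Simulate T = (Z + gamma)/sqrt(X/nu) and compare with scipy's
# non-central t distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
nu, gamma = 10, 2.0
nsim = 200000

Z = rng.standard_normal(nsim)
X = rng.chisquare(nu, nsim)
T = (Z + gamma) / np.sqrt(X / nu)

print(T.mean(), stats.nct.mean(nu, gamma))   # agree closely
```

Note that the simulated mean is not γ and the histogram of `T` is not symmetric, as the text observes.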

Self-check question: Derive the above result about the distribution of t(YA, YB).

What does a non-central t-distribution look like? Note that

• the mean is not zero;

• the distribution is not symmetric.

In fact, it can be shown that

E( t(YA, YB) | µB − µA = δ ) = δ / ( σ √(1/nA + 1/nB) ) × √(ν/2) Γ((ν − 1)/2) / Γ(ν/2)

where ν = nA + nB − 2, the degrees of freedom, and Γ(x) is the Gamma function, a generalization of the factorial:

Figure 2.9: The non-central t-distribution, for γ = 0, 1, 2.

• For integer n, Γ(n + 1) = n!
• Γ(r + 1) = r Γ(r).
• Γ(1) = 1, Γ(1/2) = √π.

Anyway, you can show that for large ν, √(ν/2) Γ((ν − 1)/2) / Γ(ν/2) ≈ 1, so

E( t(YA, YB) | µB − µA = δ ) ≈ δ / ( σ √(1/nA + 1/nB) )

This isn't really such a big surprise, because we know that:

ȲB − ȲA ∼ normal( δ , σ √(1/nA + 1/nB) ).

Hence

( ȲB − ȲA ) / ( σ √(1/nA + 1/nB) ) ∼ normal( δ / ( σ √(1/nA + 1/nB) ) , 1 ).


We also know that for large nA, nB, s ≈ σ, so the non-central t-distribution will (for large enough nA, nB) look approximately normal with

• mean δ / ( σ √(1/nA + 1/nB) );
• standard deviation 1.

Another way of seeing the same thing: looking back at the t-statistic in (∗), the first term has a t-distribution and becomes standard normal as n → ∞, while the second term becomes a constant, taking value δ / ( σ √(1/nA + 1/nB) ).

2.9.2 Computing the Power of a test

Recall our level-α testing procedure:

1. Sample data, compute tobs = t(YA,YB) .

2. Compute the p-value, Pr(|TnA+nB−2| > |tobs|).

3. Reject H0 if p-value ≤ α ⇔ |tobs| ≥ t1−α/2,nA+nB−2.

For this procedure, you have shown in the homework that

Pr( reject H0|µB − µA = 0) = Pr( p-value ≤ α|µB − µA = 0)

= Pr(|t(YA,YB)| ≥ t1−α/2,nA+nB−2|H0)

= Pr(|TnA+nB−2| ≥ t1−α/2,nA+nB−2)

= α

But what is the probability of rejection under Hδ : µB − µA = δ? (Hopefully this is bigger than α!)

Pr( reject H0 | µB − µA = δ ) = Pr( |t(YA, YB)| > t_{1−α/2, nA+nB−2} | µB − µA = δ )
                              = Pr( |T*_{nA+nB−2, γ}| > t_{1−α/2, nA+nB−2} )
                              = Pr( T*_{nA+nB−2, γ} > t_{1−α/2, nA+nB−2} ) + Pr( T*_{nA+nB−2, γ} < −t_{1−α/2, nA+nB−2} )
                              = [ 1 − Pr( T*_{nA+nB−2, γ} < t_{1−α/2, nA+nB−2} ) ] + Pr( T*_{nA+nB−2, γ} < −t_{1−α/2, nA+nB−2} )

Figure 2.10: Critical regions and the non-central t-distribution (γ = 0 and γ = 1).

where T*_{nA+nB−2, γ} has the non-central t-distribution with d.f. = nA + nB − 2 and non-centrality parameter

γ = δ / ( σ √(1/nA + 1/nB) )

Often we want to make this calculation in order to see whether or not we need a larger sample size in order to have a reasonable chance of rejecting the null hypothesis. If we have a rough idea of δ and σ, we can evaluate the power using this formula.

t.rcrit <- qt( 1-alpha/2, nA+nB-2 )
t.gamma <- delta/( sigma*sqrt(1/nA + 1/nB) )
t.power <- pt(-t.rcrit, nA+nB-2, ncp=t.gamma ) +
           1 - pt(t.rcrit, nA+nB-2, ncp=t.gamma )

The normal approximation is given by using:

t.norm.power <- pnorm(-t.rcrit, mean=t.gamma ) +
                1 - pnorm(t.rcrit, mean=t.gamma )

This will be very reasonable for large nA, nB. It may be an over-estimate or an under-estimate of the power obtained from the t-distribution.
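The same power calculation can be sketched in Python (scipy assumed available; qt and pt with ncp correspond to stats.t.ppf and stats.nct.cdf). Using α = 0.05, δ = 2.4, σ² = 31.07, nA = nB = 6, as in the wheat example:

```python
# Power of the level-alpha two-sample t-test against mu_B - mu_A = delta,
# via the non-central t-distribution.
from scipy import stats

alpha, delta = 0.05, 2.4
sigma = 31.07 ** 0.5      # pooled-variance estimate from the wheat data
nA = nB = 6
df = nA + nB - 2

t_crit = stats.t.ppf(1 - alpha / 2, df)
gamma = delta / (sigma * (1 / nA + 1 / nB) ** 0.5)   # non-centrality
power = (stats.nct.cdf(-t_crit, df, gamma)
         + 1 - stats.nct.cdf(t_crit, df, gamma))
print(power)   # about 0.104
```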


Note: in most cases where there is reasonable power, only one of the terms

pt(-t.rcrit, nA+nB-2, ncp=t.gamma),  1 - pt(t.rcrit, nA+nB-2, ncp=t.gamma)

is likely to be appreciably different from 0. However, including both terms is not only correct, it also means that we can use the same R code for positive and negative δ values. Finally, observe that in our calculations we have assumed that the variances of the two populations are equal.

Example: Suppose

• µB − µA = 2.4

• σ2 = 31.07, σ = 5.57

• nA = nB = 6

What is the probability we will reject H0 at level α = 0.05 for such an experiment?

> t.gamma <- delta/(sigma*sqrt(1/nA + 1/nB))
> t.gamma
[1] 0.7457637
> t.rcrit <- qt(1-alpha/2, nA+nB-2)
> t.rcrit
[1] 2.228139
> t.power <- pt(-t.rcrit, nA+nB-2, ncp=t.gamma) +
+            1 - pt(t.rcrit, nA+nB-2, ncp=t.gamma)
> t.power
[1] 0.1038563

What if we double the sample size?

> nA <- nB <- 12
> t.gamma <- delta/(sigma*sqrt(1/nA + 1/nB))
> t.gamma
[1] 1.054669
> t.rcrit <- qt(1-alpha/2, nA+nB-2)
> t.rcrit
[1] 2.073873
> t.power <- pt(-t.rcrit, nA+nB-2, ncp=t.gamma) +
+            1 - pt(t.rcrit, nA+nB-2, ncp=t.gamma)
> t.power
[1] 0.1723301

Example: Selecting a sample size. Suppose an agricultural company wants to know whether or not there are any effects of a soil additive on crop yield. The design will be to randomly assign n plots to A = additive and n plots to B = control = no additive. In practice, a difference in yield of less than 3 units is considered inconsequential. If there is a difference of 3 or greater, though, the researchers would like to know about it. How many plots should be in their study?

Procedure:

• CRD with n plots assigned to each of the two treatments;

• Declare a difference between A and B if |t(YA, YB)| > t_{.975, 2n−2}.

Question: What is the probability that a difference is declared, assuming

• δ = 3;

• σ = 6;

• various values of n.

Answer: See Figure 2.11.
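The power-versus-sample-size computation behind a figure like Figure 2.11 can be sketched as follows (scipy assumed available; δ = 3, σ = 6 as specified above):

```python
# Power of the level-0.05 two-sample t-test as a function of the per-group
# sample size n, for delta = 3 and sigma = 6.
from scipy import stats

def power(n, delta=3.0, sigma=6.0, alpha=0.05):
    df = 2 * n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    gamma = delta / (sigma * (2 / n) ** 0.5)   # non-centrality parameter
    return (stats.nct.cdf(-t_crit, df, gamma)
            + 1 - stats.nct.cdf(t_crit, df, gamma))

for n in (5, 20, 80):
    print(n, round(power(n), 3))
# power increases with n; around n = 64 per group it reaches roughly 0.80
```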

More Questions:

• What if the true effect size is larger? smaller?

• What if the true variance is larger? smaller?

• What if the data are not normal?

2.10 Checking Assumptions of a two-sample t-test

Recall that in a two sample t-test we make three assumptions:

Figure 2.11: Null and alternative distributions for another wheat example (nA = nB = 5, 20, 80), and power versus sample size.


(1) Observations are independent samples

(2) Samples are from populations that are normally distributed.

(3) Variances of the two groups are equal

(Strictly speaking we only require this to hold under the null hypothesis of no difference. Having an alternative hypothesis under which the variances are different does not affect our p-value calculation or the type I error rate of our hypothesis test (why?). However, we would need to do something else if we wanted to compute the power, e.g. simulation.)

Assumption (1) will generally hold if the experiment is properly randomized (and blocks are accounted for; more on this later).

Assumption (2) can be checked by a normal probability plot

Idea: Order the observations within each group:

yA(1) ≤ yA(2) ≤ · · · ≤ yA(nA)

and compare these sample quantiles to the quantiles of a standard normal distribution:

z_{(1−1/2)/nA} ≤ z_{(2−1/2)/nA} ≤ · · · ≤ z_{(nA−1/2)/nA}

where Pr(Z ≤ z_{(k−1/2)/nA}) = (k − 1/2)/nA. The −1/2 is a continuity correction.

If the data are normal, the relationship should be approximately linear. Thus we plot the pairs

( z_{(1−1/2)/nA}, yA(1) ), . . . , ( z_{(nA−1/2)/nA}, yA(nA) ).

Assumption (3) may be checked roughly by fitting straight lines to the probability plots and examining their slopes (why?).

Formal hypothesis tests for equal variances may also be performed. We will return to this later.
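The normal-scores construction above can be sketched numerically (an illustrative Python version of what R's qqnorm does; numpy and scipy assumed available). For normal data the plotted points are nearly linear, with slope close to the population standard deviation, which is why comparing slopes gives a rough check of assumption (3):

```python
# Normal probability plot coordinates: (z_{(k-1/2)/n}, y_(k)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = np.sort(rng.normal(10, 2, size=40))      # ordered sample
n = len(y)
q = (np.arange(1, n + 1) - 0.5) / n          # (k - 1/2)/n levels
z = stats.norm.ppf(q)                        # standard normal quantiles

r = np.corrcoef(z, y)[0, 1]                  # linearity of the plot
slope = np.polyfit(z, y, 1)[0]               # rough sd estimate
print(r)       # near 1 for normal data
print(slope)   # near the population sd (2 here)
```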

Figure 2.12: Normal scores plots.


2.10.1 Two-sample t-test with unequal variances

In the case where the sample variances appear unequal, and it is unreasonableto assume this under the null hypothesis, an alternate procedure may be usedin which the following statistic:

t_diff(yA, yB) = ( ȳB − ȳA ) / √( s²A/nA + s²B/nB )

is compared to a t-distribution on

df = ( s²A/nA + s²B/nB )² / [ (1/(nA − 1)) (s²A/nA)² + (1/(nB − 1)) (s²B/nB)² ].

This is known as Welch’s approximation; it may not give an integer d.f.

Note: This t-distribution is not, in fact, the exact sampling distribution of t_diff(yA, yB) under the null hypothesis that µA = µB and σ²A ≠ σ²B. This is because the null distribution depends on the ratio of the unknown variances, σ²A and σ²B. So we are stuck if we want an exact answer! (See the Behrens-Fisher problem.)

However, Welch’s approximation is good in most contexts.
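Welch's statistic and approximate degrees of freedom are easy to compute from their definitions. An illustrative sketch with made-up data (scipy assumed available), checked against scipy's unequal-variance t-test:

```python
# Welch's t-statistic and Welch-Satterthwaite degrees of freedom, checked
# against scipy.stats.ttest_ind with equal_var=False.
import numpy as np
from scipy import stats

yA = np.array([12.1, 15.3, 14.2, 13.8, 16.0, 12.9, 14.7])
yB = np.array([18.4, 25.1, 11.2, 30.3, 21.8])

vA = yA.var(ddof=1) / len(yA)     # s_A^2 / n_A
vB = yB.var(ddof=1) / len(yB)     # s_B^2 / n_B
t_diff = (yB.mean() - yA.mean()) / np.sqrt(vA + vB)
df = (vA + vB) ** 2 / (vA ** 2 / (len(yA) - 1) + vB ** 2 / (len(yB) - 1))
p_manual = 2 * (1 - stats.t.cdf(abs(t_diff), df))

t_scipy, p_scipy = stats.ttest_ind(yB, yA, equal_var=False)
print(df)                  # non-integer d.f.
print(p_manual, p_scipy)   # agree
```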

2.10.2 Which two-sample t-test to use? t_diff(yA, yB) vs. t(yA, yB)

• If the sample sizes are the same (nA = nB), then the test statistics t_diff(yA, yB) and t(yA, yB) are the same; however, the degrees of freedom used in the null t-distribution will be different unless the sample standard deviations are the same.

• If nA > nB but σA < σB, and µA = µB, then the two-sample test based on comparing t(yA, yB) to a t-distribution on nA + nB − 2 d.f. will reject more than 5% of the time.

– If the null hypothesis that both the means and variances are equal, i.e.

H0: µA = µB and σA = σB,


is scientifically relevant, then we are computing a valid p-value, and this higher rejection rate is a good thing, since when the variances are unequal the null hypothesis is false.

– If, however, the hypothesis that is most scientifically relevant is that H0: µA = µB,

without placing any restrictions on the variances, then the higher rejection rate in the test that assumes the variances are the same could be very misleading, since p-values may be smaller than they are under the correct null distribution (in which σA ≠ σB).

Likewise we will underestimate the probability of Type I error.

• If nA > nB and σA > σB, then the p-values obtained from the test using t(yA, yB) will tend to be conservative (= larger) compared to those obtained with tdiff(yA, yB).

In short: one should be careful about applying the test based on t(yA, yB) if the sample standard deviations appear very different, and it is not reasonable to assume equal means and variances under the null hypothesis.


Chapter 3

Comparing Several Treatments

3.1 Introduction to ANOVA

Example (Bacterial growth):

• Background: Bacteria grow on meat products stored in plastic wrap.

• Hypothesis: Alternative packing methods result in reduced bacterial growth.

• Treatments: Meat wrapped in

1. ambient air;

2. a vacuum;

3. 1% CO, 40% O2, 59% N2;

4. 100%CO2.

• Experimental design: (CRD) Twelve 75g beef steaks, 3 randomly assigned to each of the 4 treatments. Bacterial count was measured for each steak after 9 days of storage at 4◦C.

• Results:

Treatment  sample mean  sample sd
    1          7.48        .19
    2          5.50        .08
    3          7.26        .04
    4          3.36        .16



Figure 3.1: Bacteria data (log bacteria count/cm² by treatment).

• Question: Is there significant variation due to treatment?

Possible analysis method: Perform a two-sample t-test of

H0ij : µi = µj versus H1ij : µi ≠ µj

for each of the $\binom{4}{2} = 6$ possible pairwise comparisons. Reject a hypothesis if the associated p-value is ≤ α.

Problem: If there is no treatment effect at all, then

Pr(reject H0ij | H0ij true) = α

Pr(reject H0ij for some i, j | H0ij true for all i, j) = 1 − Pr(accept H0ij for all i, j | H0ij true for all i, j)
≈ 1 − ∏_{i<j} (1 − α)

If α = 0.05, then

Pr(reject H0ij for some i, j | H0ij true for all i, j) ≈ 1 − .95⁶ = 0.26

Page 56: Statistics 502 Lecture Notes...Chapter 1 Research Design Principles 1.1 Induction In our efforts to acquire knowledge about processes or systems, much scien-tific knowledge is gained

CHAPTER 3. COMPARING SEVERAL TREATMENTS 51

So, even though the pairwise error rate is 0.05, the experiment-wise error rate is 0.26.
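The experiment-wise error rate is just arithmetic on the approximation above; it can be reproduced directly:

```python
alpha = 0.05
m = 6  # number of pairwise comparisons among 4 treatments: C(4,2) = 6

# Experiment-wise error rate, assuming the 6 tests are (approximately) independent
experimentwise = 1 - (1 - alpha) ** m
print(round(experimentwise, 2))  # 0.26
```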

This issue is called the problem of multiple comparisons and will be discussed further later in this chapter. For now, we will discuss a method of testing the global hypothesis of no variation due to treatment:

H0 : µi = µj for all i ≠ j versus H1 : µi ≠ µj for some i ≠ j

To do this, we will compare treatment variability to experimental variability. First we need to have a way of quantifying these things.

3.1.1 A model for treatment variation

Data: yi,j = measurement from the jth replicate under the ith treatment.

i = 1, . . . , t indexes treatments

j = 1, . . . , r indexes observations or replicates.

Treatment means model:

yi,j = µi + εi,j

E[εi,j] = 0

V [εi,j] = σ2

• µi is the ith treatment mean,

• εi,j represents within treatment variation/error/noise.

Treatment effects model:

yi,j = µ + τi + εi,j

E[εi,j] = 0

V [εi,j] = σ2

• µ is the grand mean;

• τ1, . . . , τt are the treatment effects, representing between-treatment variation


• εi,j represents within treatment variation.

In this model, we typically restrict $\sum \tau_i = 0$. (Why?) The above represent two parameterizations of the same model:

$$ \mu_i = \mu + \tau_i \iff \tau_i = \mu_i - \mu $$

Reduced model:

yi,j = µ + εi,j

E[εi,j] = 0

V [εi,j] = σ2

This is a special case of the above two models with

• µ = µ1 = · · · = µt, or

• τi = 0 for all i.

In this model, there is no variation due to treatment.

3.1.2 Model Fitting

What are good estimates of the parameters? One criterion used to evaluate different values of $\mu = \{\mu_1, \ldots, \mu_t\}$ is the least squares criterion

$$ SSE(\mu) = \sum_{i=1}^t \sum_{j=1}^r (y_{i,j} - \mu_i)^2 $$

The value of $\mu$ that minimizes $SSE(\mu)$ is called the least-squares estimate, and will be denoted $\hat\mu$.

Estimating treatment means: SSE is quadratic → convex. The global minimizer $\hat\mu$ satisfies $\nabla SSE(\hat\mu) = 0$.

$$ \frac{\partial}{\partial \mu_i} SSE(\mu) = \frac{\partial}{\partial \mu_i} \sum_{j=1}^r (y_{i,j} - \mu_i)^2 = -2 \sum_{j=1}^r (y_{i,j} - \mu_i) = -2r(\bar y_{i\cdot} - \mu_i) $$


Therefore, the global minimum occurs at $\hat\mu = \{\bar y_{1\cdot}, \ldots, \bar y_{t\cdot}\}$. Interestingly, $SSE(\hat\mu)$ provides a measure of experimental variability:

$$ SSE(\hat\mu) = \sum\sum (y_{i,j} - \hat\mu_i)^2 = \sum\sum (y_{i,j} - \bar y_{i\cdot})^2 $$

Recall $s_i^2 = \sum_j (y_{i,j} - \bar y_{i\cdot})^2/(r-1)$ estimates $\sigma^2$ using data from group i. If we have more than one group, we want to pool our estimates to be more precise:

$$ s^2 = \frac{(r-1)s_1^2 + \cdots + (r-1)s_t^2}{(r-1) + \cdots + (r-1)} = \frac{\sum_j (y_{1,j} - \bar y_{1\cdot})^2 + \cdots + \sum_j (y_{t,j} - \bar y_{t\cdot})^2}{t(r-1)} = \frac{\sum\sum (y_{i,j} - \hat\mu_i)^2}{t(r-1)} = \frac{SSE(\hat\mu)}{t(r-1)} \equiv MSE $$

Consider the following assumptions:

A0: Data are independently sampled from their respective populations

A1: Populations have the same variance

A2: Populations are normally distributed

• A0 → $\hat\mu$ is an unbiased estimator of $\mu$;

• A0+A1 → $s^2$ is an unbiased estimator of $\sigma^2$;

• A0+A1+A2 → $(\hat\mu, s^2)$ are the minimum variance unbiased estimators of $(\mu, \sigma^2)$;

• A0+A1+A2 → $(\hat\mu, \frac{n-t}{n} s^2)$, where $n = tr$, are the maximum likelihood estimators of $(\mu, \sigma^2)$.

Within-treatment variability: $SSE(\hat\mu) \equiv SSE$ is a measure of within-treatment variability:

$$ SSE = \sum_{i=1}^t \sum_{j=1}^r (y_{i,j} - \bar y_{i\cdot})^2 $$

It is the sum of squared deviations of the observations from their group means.


Between-treatment variability: The analogous measure of between-treatment variability is

$$ SST = \sum_{i=1}^t \sum_{j=1}^r (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = r \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 $$

We call this the treatment sum of squares. We also define $MST = SST/(t-1)$, the treatment mean squares or mean squares (due to) treatment.

3.1.3 Testing hypotheses with MSE and MST

Consider evaluating the null hypothesis of no treatment effect:

H0 : {µ1, . . . , µt} all equal versus H1 : {µ1, . . . , µt} not all equal

Note that

$$ \{\mu_1, \ldots, \mu_t\} \text{ not all equal } \iff \sum_{i=1}^t (\mu_i - \mu)^2 > 0 $$

Probabilistically,

$$ \sum_{i=1}^t (\mu_i - \mu)^2 > 0 \;\Rightarrow\; \text{a large } \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 \text{ will probably be observed} $$

Inductively,

$$ \text{a large } \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 \text{ observed} \;\Rightarrow\; \sum_{i=1}^t (\mu_i - \mu)^2 > 0 \text{ is plausible} $$

So a large value of SST or MST gives evidence that there are differences between the true treatment means. But how large is large?

MST under the null: Suppose H0 : µ1 = · · · = µt = µ is true. Then

$$ E(\bar Y_{i\cdot}) = \mu \;\rightarrow\; E(\sqrt{r}\,\bar Y_{i\cdot}) = \sqrt{r}\,\mu $$
$$ V(\bar Y_{i\cdot}) = \sigma^2/r \;\rightarrow\; V(\sqrt{r}\,\bar Y_{i\cdot}) = \sigma^2 $$

So under the null, $\sqrt{r}\,\bar Y_{1\cdot}, \ldots, \sqrt{r}\,\bar Y_{t\cdot}$ are t independent random variables having the same mean and variance. Recall that if $X_1, \ldots, X_n \sim$ i.i.d. P, then an


unbiased estimate of the variance of the population P is given by the sample variance $\sum (X_i - \bar X)^2/(n-1)$. Therefore,

$$ \frac{\sum (\sqrt{r}\,\bar Y_{i\cdot} - \sqrt{r}\,\bar Y_{\cdot\cdot})^2}{t-1} \text{ is an unbiased estimate of } V(\sqrt{r}\,\bar Y_{i\cdot}) = \sigma^2. $$

But

$$ \frac{\sum (\sqrt{r}\,\bar Y_{i\cdot} - \sqrt{r}\,\bar Y_{\cdot\cdot})^2}{t-1} = \frac{r \sum (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2}{t-1} = \frac{SST}{t-1} = MST, $$

so E(MST | H0) = σ².

MST under an alternative: We can show that under a given value of $\mu$,

$$ E(MST|\mu) = \sigma^2 + \frac{r \sum_{i=1}^t (\mu_i - \bar\mu)^2}{t-1} \equiv \sigma^2 + r v_\tau^2 $$

So $E(MST|\mu) \geq \sigma^2$, with equality only if there is no variability in the treatment means.

Let's summarize our potential estimators of σ²:

• Under H0 :

– E(MSE|H0) = σ2

– E(MST |H0) = σ2

• Under H1 :

– E(MSE|H1) = σ2

– E(MST |H1) = σ2 + rv2τ

Here is an idea for a test statistic:

• If H0 is true:


– MSE ≈ σ2

– MST ≈ σ2

• If H0 is false

– MSE ≈ σ2

– MST ≈ σ2 + rv2τ > σ2

so

under H0, MST/MSE should be around 1,

under H1, MST/MSE should be bigger than 1.

Thus the test statistic F(Y) = MST/MSE is sensitive to deviations from the null, and can be used to measure evidence against H0. Now all we need is a null distribution.
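Before turning to a formal null distribution, a quick Monte Carlo check of the two expectations above. This is a sketch with arbitrary simulation settings (t = 4, r = 3, σ = 2, all means equal): under H0, both MSE and MST should average to σ² = 4, so MST/MSE should hover around 1.

```python
import random

random.seed(1)
t, r, sigma = 4, 3, 2.0   # arbitrary settings; all group means equal under H0
mse_vals, mst_vals = [], []

for _ in range(2000):
    # Simulate one balanced experiment under H0
    groups = [[random.gauss(0, sigma) for _ in range(r)] for _ in range(t)]
    gmeans = [sum(g) / r for g in groups]
    grand = sum(gmeans) / t
    sse = sum((y - gm) ** 2 for g, gm in zip(groups, gmeans) for y in g)
    sst = r * sum((gm - grand) ** 2 for gm in gmeans)
    mse_vals.append(sse / (t * (r - 1)))
    mst_vals.append(sst / (t - 1))

# Both averages should be near sigma^2 = 4
print(sum(mse_vals) / len(mse_vals), sum(mst_vals) / len(mst_vals))
```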

Example (Bacterial Growth):

y1<-c(7.66,6.98,7.80)
y2<-c(5.26,5.44,5.80)
y3<-c(7.41,7.33,7.04)
y4<-c(3.51,2.91,3.66)

r<-3
t<-4

ybar.t<- c(mean(y1),mean(y2),mean(y3),mean(y4))
s2.t<- c(var(y1),var(y2),var(y3),var(y4))

SSE<- sum( (r-1)*s2.t )
SST<- r*sum( (ybar.t-mean(ybar.t))^2 )

MSE<-SSE/(t*(r-1))
MST<-SST/(t-1)

Source of variation  Sums of Squares  Mean Squares  F-ratio
Treatment                32.8728         10.958       94.58
Noise                     0.9268          0.116
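The same arithmetic as the R computation above, translated into Python using only the standard library:

```python
from statistics import mean, variance

# Bacteria data from the example: 4 treatments, 3 replicates each
groups = [[7.66, 6.98, 7.80],
          [5.26, 5.44, 5.80],
          [7.41, 7.33, 7.04],
          [3.51, 2.91, 3.66]]
r, t = 3, 4

ybar = [mean(g) for g in groups]       # group means
s2 = [variance(g) for g in groups]     # group sample variances

SSE = sum((r - 1) * v for v in s2)                  # within-treatment SS
SST = r * sum((m - mean(ybar)) ** 2 for m in ybar)  # between-treatment SS

MSE = SSE / (t * (r - 1))
MST = SST / (t - 1)
print(round(SST, 4), round(SSE, 4), round(MST / MSE, 2))  # 32.8728 0.9268 94.58
```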

The F-ratio seems pretty far away from 1. Possible explanations:

• H1: The F-value is a result of actual differences between treatments.


• H0: The F-value is a result of a chance assignment of treatment 1 to the "susceptible" meats, treatment 4 to the "resistant" meats, etc.

To rule out H0, we need to look at how likely such a treatment assignment is.

Randomization test:

> y
 [1] 7.66 6.98 7.80 5.26 5.44 5.80 7.41 7.33 7.04 3.51 2.91 3.66
> x
 [1] 1 1 1 2 2 2 3 3 3 4 4 4

F.obs<-anova(lm(y~as.factor(x)))$F[1]

> F.obs
[1] 94.58438

F.null<-NULL
for(nsim in 1:1000) {
  x.sim<-sample(x)
  F.null<-c(F.null, anova(lm(y~as.factor(x.sim)))$F[1] )
}

> mean(F.null>=F.obs)
[1] 0

> max(F.null)
[1] 92.8799

The observed between-group variation is much larger than the observed within-group variation, and much larger than we'd expect to see if the null hypothesis were true.
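The randomization test can be sketched in Python as well, mirroring the R loop above; the seed and number of shuffles are arbitrary choices.

```python
import random

# Bacteria data and treatment labels
y = [7.66, 6.98, 7.80, 5.26, 5.44, 5.80, 7.41, 7.33, 7.04, 3.51, 2.91, 3.66]
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
t, r = 4, 3

def f_stat(y, x):
    """One-way ANOVA F-statistic for balanced groups labelled by x."""
    means = {g: sum(yi for yi, xi in zip(y, x) if xi == g) / r for g in set(x)}
    grand = sum(y) / len(y)
    sst = r * sum((m - grand) ** 2 for m in means.values())
    sse = sum((yi - means[xi]) ** 2 for yi, xi in zip(y, x))
    return (sst / (t - 1)) / (sse / (t * (r - 1)))

F_obs = f_stat(y, x)                      # about 94.58

rng = random.Random(0)
F_null = []
for _ in range(1000):
    x_sim = x[:]
    rng.shuffle(x_sim)                    # re-randomize the treatment labels
    F_null.append(f_stat(y, x_sim))

p_value = sum(f >= F_obs for f in F_null) / len(F_null)
print(F_obs, p_value)
```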

3.1.4 Partitioning sums of squares

Intuition:

total variability = variability due to treatment + variability due to other things
                  = between-treatment variability + within-treatment variability

Proposition:

$$ SSTotal \equiv \sum_{i=1}^t \sum_{j=1}^r (y_{i,j} - \bar y_{\cdot\cdot})^2 = SST + SSE $$


Figure 3.2: Randomization distribution of the F-statistic.

Proof:

$$ \sum_{i=1}^t \sum_{j=1}^r (y_{i,j} - \bar y_{\cdot\cdot})^2 = \sum_i \sum_j [(y_{i,j} - \bar y_{i\cdot}) + (\bar y_{i\cdot} - \bar y_{\cdot\cdot})]^2 $$
$$ = \sum_i \sum_j \{ (y_{i,j} - \bar y_{i\cdot})^2 + 2(y_{i,j} - \bar y_{i\cdot})(\bar y_{i\cdot} - \bar y_{\cdot\cdot}) + (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 \} $$
$$ = \sum_i \sum_j (y_{i,j} - \bar y_{i\cdot})^2 + \sum_i \sum_j 2(y_{i,j} - \bar y_{i\cdot})(\bar y_{i\cdot} - \bar y_{\cdot\cdot}) + \sum_i \sum_j (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 $$
$$ = (1) + (2) + (3) $$

(1) = $\sum_i \sum_j (y_{i,j} - \bar y_{i\cdot})^2$ = SSE

(3) = $\sum_i \sum_j (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = r \sum_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$ = SST

(2) = $2 \sum_i \sum_j (y_{i,j} - \bar y_{i\cdot})(\bar y_{i\cdot} - \bar y_{\cdot\cdot}) = 2 \sum_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) \sum_j (y_{i,j} - \bar y_{i\cdot})$,

but note that

$$ \sum_j (y_{i,j} - \bar y_{i\cdot}) = \left( \sum_j y_{i,j} \right) - r \bar y_{i\cdot} = r \bar y_{i\cdot} - r \bar y_{i\cdot} = 0 $$


for all i. Therefore (2) = 0 and we have

total sum of squared deviations from the grand mean = between-trt sum of squares + within-trt sum of squares

SSTotal = SST + SSE
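The partition can also be checked numerically; the following sketch verifies SSTotal = SST + SSE on the bacteria data from earlier in this chapter.

```python
from statistics import mean

# Bacteria data: 4 treatments, 3 replicates each
groups = [[7.66, 6.98, 7.80], [5.26, 5.44, 5.80],
          [7.41, 7.33, 7.04], [3.51, 2.91, 3.66]]
r = len(groups[0])

all_y = [y for g in groups for y in g]
grand = mean(all_y)
gmeans = [mean(g) for g in groups]

sstotal = sum((y - grand) ** 2 for y in all_y)
sst = r * sum((m - grand) ** 2 for m in gmeans)
sse = sum((y - m) ** 2 for g, m in zip(groups, gmeans) for y in g)

# The two sides of the partition agree (up to floating point)
print(round(sstotal, 4), round(sst + sse, 4))
```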

Putting it all together:

H0 : µi1 = µi2 for all i1, i2 ⇒ Reduced model : yi,j = µ + εi,j
H1 : µi1 ≠ µi2 for some i1 ≠ i2 ⇒ Full model : yi,j = µi + εi,j

If we believe H1,

• our fitted value of $y_{i,j}$ is $\bar y_{i\cdot}$, i.e. our predicted value of another observation in group i is $\hat y_i = \hat\mu_i$.

• the residual for observation {i, j} is $(y_{i,j} - \bar y_{i\cdot})$.

• the model lack-of-fit is measured by the residual sum of squares = sum of squared residuals = $\sum_i \sum_j (y_{i,j} - \bar y_{i\cdot})^2$ = SSE.

If we believe H0,

• our fitted value of $y_{i,j}$ is $\bar y_{\cdot\cdot}$, i.e. our predicted value of another observation in any group is $\hat y = \hat\mu$.

• the residual for observation {i, j} is $(y_{i,j} - \bar y_{\cdot\cdot})$.

• the model lack-of-fit is measured by the residual sum of squares = sum of squared residuals = $\sum_i \sum_j (y_{i,j} - \bar y_{\cdot\cdot})^2$ = SSTotal.

The improvement in model fit from including treatment parameters is

error in reduced model − error in full model = improvement in model fit
SSTotal − SSE = SST

The main idea: Variance can be partitioned into parts representing different sources. The variance explained by the different sources can be compared and analyzed. This gives rise to the ANOVA table:


3.1.5 The ANOVA table

ANOVA = Analysis of Variance

Source of variation  Degrees of Freedom  Sums of Squares  Mean Squares         F-ratio
Treatment                 t − 1                SST        MST = SST/(t−1)      MST/MSE
Noise                    t(r − 1)              SSE        MSE = SSE/(t(r−1))
Total                    tr − 1              SSTotal

Bacteria example:

Source of variation  Degrees of Freedom  Sums of Squares  Mean Squares  F-ratio
Treatment                    3               32.873          10.958      94.58
Noise                        8                0.927           0.116
Total                       11               33.800

Using the ANOVA table:

• No model assumptions → the table gives a descriptive decomposition of variation.

• Assuming V(Yi,j) = σ² in all groups →

– E(MSE) = σ²
– E(MST) = σ² + r v²τ, where v²τ = variance of the group means.

• Assuming treatments were randomly assigned → hypothesis tests, p-values.

• Assuming data are random samples from a normal population → hypothesis tests, p-values, confidence intervals for parameters.

Understanding Degrees of Freedom: Consider an experiment with t treatments/groups:

Data                    Group means
y1,1 . . . , y1,r          ȳ1·
y2,1 . . . , y2,r          ȳ2·
  ...                      ...
yt,1 . . . , yt,r          ȳt·

$$ \bar y = \frac{r\bar y_1 + \cdots + r\bar y_t}{tr} = \frac{\bar y_1 + \cdots + \bar y_t}{t} $$


We can "decompose" each observation as follows:

$$ y_{i,j} = \bar y + (\bar y_i - \bar y) + (y_{i,j} - \bar y_i) $$

This leads to

(yi,j − ȳ) = (ȳi − ȳ) + (yi,j − ȳi)
total variation = between-group variation + within-group variation

All data can be decomposed this way, leading to the following vectors of length tr:

Total               Treatment           Error
y1,1 − ȳ·· = (ȳ1· − ȳ··) + (y1,1 − ȳ1·)
y1,2 − ȳ·· = (ȳ1· − ȳ··) + (y1,2 − ȳ1·)
   ...             ...              ...
y1,r − ȳ·· = (ȳ1· − ȳ··) + (y1,r − ȳ1·)
y2,1 − ȳ·· = (ȳ2· − ȳ··) + (y2,1 − ȳ2·)
   ...             ...              ...
y2,r − ȳ·· = (ȳ2· − ȳ··) + (y2,r − ȳ2·)
   ...             ...              ...
yt,1 − ȳ·· = (ȳt· − ȳ··) + (yt,1 − ȳt·)
   ...             ...              ...
yt,r − ȳ·· = (ȳt· − ȳ··) + (yt,r − ȳt·)

SSTotal = SSTrt + SSE
tr − 1 = (t − 1) + t(r − 1)

We've seen degrees of freedom before, in the definition of a χ² random variable:

• dof = the number of statistically independent elements in a vector

In the ANOVA table, the dof have a geometric interpretation:

• dof = the number of components of a vector that can vary independently


To understand this latest definition, consider x1, x2, x3 and x̄ = (x1 + x2 + x3)/3:

$$ \begin{pmatrix} x_1 - \bar x \\ x_2 - \bar x \\ x_3 - \bar x \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} $$

How many degrees of freedom does this vector have? How many components can vary independently?

c1 + c2 = x1 + x2 − 2(x1 + x2 + x3)/3
        = x1/3 + x2/3 − 2x3/3
        = x1/3 + x2/3 + x3/3 − x3
        = x̄ − x3
        = −c3

So c1, c2, c3 can't all be independently varied; only two at a time can be arbitrarily changed. This vector thus lies in a two-dimensional subspace of R³, and has 2 degrees of freedom. In general,

$$ \begin{pmatrix} x_1 - \bar x \\ \vdots \\ x_m - \bar x \end{pmatrix} $$

is an m-dimensional vector in an (m − 1)-dimensional subspace, having m − 1 degrees of freedom.

Exercise: Return to the vector decomposition of the data and obtain the degrees of freedom of each component. Note that

• dof = dimension of the space the vector lies in

• SS = squared length of the vector


3.1.6 More sums of squares geometry

Consider the following vectors:

$$ \mathbf{y} = \begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,r} \\ \vdots \\ y_{t,1} \\ \vdots \\ y_{t,r} \end{pmatrix} \qquad \mathbf{y}_{\rm trt} = \begin{pmatrix} \bar y_{1\cdot} \\ \vdots \\ \bar y_{1\cdot} \\ \vdots \\ \bar y_{t\cdot} \\ \vdots \\ \bar y_{t\cdot} \end{pmatrix} \qquad \bar{\mathbf{y}}_{\cdot\cdot} = \mathbf{1}\bar y_{\cdot\cdot} = \begin{pmatrix} \bar y_{\cdot\cdot} \\ \vdots \\ \bar y_{\cdot\cdot} \end{pmatrix} $$

We can express our decomposition of the data as follows:

(y − ȳ··) = (y_trt − ȳ··) + (y − y_trt)
    a     =      b        +      c

This is just vector addition/subtraction on vectors of length tr. Recall that two vectors u and v are orthogonal/perpendicular/at right angles if

$$ \mathbf{u}\cdot\mathbf{v} \equiv \sum_{l=1}^m u_l v_l = 0 $$

We have already shown that b and c are orthogonal:

$$ \mathbf{b}\cdot\mathbf{c} = \sum_{l=1}^{tr} b_l c_l = \sum_{i=1}^t \sum_{j=1}^r (\bar y_{i\cdot} - \bar y_{\cdot\cdot})(y_{i,j} - \bar y_{i\cdot}) = \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) \sum_{j=1}^r (y_{i,j} - \bar y_{i\cdot}) = \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) \times 0 = 0 $$

So the vector a is the vector sum of two orthogonal vectors. We can draw this:


[Diagram: a right triangle with hypotenuse a = (y − ȳ··) and legs b = (y_trt − ȳ··) and c = (y − y_trt).]

Now recall

• ||a||² = ||y − ȳ··||² = SSTotal

• ||b||² = ||y_trt − ȳ··||² = SST

• ||c||² = ||y − y_trt||² = SSE

What do we know about right triangles?

||a||² = ||b||² + ||c||²
SSTotal = SST + SSE

So the ANOVA decomposition is an application of Pythagoras' Theorem.

Final observation: recall that

• df(y_trt − ȳ··) = t − 1

• df(y − y_trt) = t(r − 1)

• (y_trt − ȳ··) and (y − y_trt) are orthogonal.

This last bit means the degrees of freedom must add, so

df(y − ȳ··) = (t − 1) + t(r − 1) = tr − 1


3.1.7 Unbalanced Designs

If the number of replications is constant across all levels of the treatment/factor then the design is called balanced. Otherwise it is said to be unbalanced. Unbalanced data:

• yi,j, i = 1, . . . , t, j = 1, . . . , ri

• let $N = \sum_{i=1}^t r_i$ be the total sample size.

How might the ANOVA decomposition work in this case?

Partitioning sources of variation, unequal sample sizes:

Total                Treatment            Error
y1,1 − ȳ·· = (ȳ1· − ȳ··) + (y1,1 − ȳ1·)
y1,2 − ȳ·· = (ȳ1· − ȳ··) + (y1,2 − ȳ1·)
   ...              ...              ...
y1,r1 − ȳ·· = (ȳ1· − ȳ··) + (y1,r1 − ȳ1·)
y2,1 − ȳ·· = (ȳ2· − ȳ··) + (y2,1 − ȳ2·)
   ...              ...              ...
y2,r2 − ȳ·· = (ȳ2· − ȳ··) + (y2,r2 − ȳ2·)
   ...              ...              ...
yt,1 − ȳ·· = (ȳt· − ȳ··) + (yt,1 − ȳt·)
   ...              ...              ...
yt,rt − ȳ·· = (ȳt· − ȳ··) + (yt,rt − ȳt·)

SSTotal = SSTrt + SSE
N − 1 = (t − 1) + $\sum_{i=1}^t (r_i - 1)$

Q: In the above, $\bar y_{\cdot\cdot} = \sum\sum y_{i,j}/N$. Might we take a different average?


Sums of squares and degrees of freedom for unbalanced data: We define the sums of squares as the squared lengths of these vectors:

• SSTotal = ||a||² = $\sum_{i=1}^t \sum_{j=1}^{r_i} (y_{i,j} - \bar y_{\cdot\cdot})^2$

• SSTrt = ||b||² = $\sum_{i=1}^t \sum_{j=1}^{r_i} (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = \sum_{i=1}^t r_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$

• SSE = ||c||² = $\sum_{i=1}^t \sum_{j=1}^{r_i} (y_{i,j} - \bar y_{i\cdot})^2$

Let's see if things add in a nice way. First, let's check orthogonality:

$$ \mathbf{b}\cdot\mathbf{c} = \sum_{i=1}^t \sum_{j=1}^{r_i} (\bar y_{i\cdot} - \bar y_{\cdot\cdot})(y_{i,j} - \bar y_{i\cdot}) = \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) \sum_{j=1}^{r_i} (y_{i,j} - \bar y_{i\cdot}) = \sum_{i=1}^t (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) \times 0 = 0 $$

Note: c must be orthogonal to any vector that is constant "within a group." So we have

$$ \left.\begin{array}{c} \mathbf{a} = \mathbf{b} + \mathbf{c} \\ \mathbf{b}\cdot\mathbf{c} = 0 \end{array}\right\} \rightarrow ||\mathbf{a}||^2 = ||\mathbf{b}||^2 + ||\mathbf{c}||^2 $$

and so SSTotal = SST + SSE as before. What about degrees of freedom?

• dof(c) = $\sum_{i=1}^t (r_i - 1) = N - t$ should be clear

• dof(a) = N − 1 should be clear

• dof(b) = ?

Note that in general,

$$ \frac{1}{t} \sum_{i=1}^t \bar y_{i\cdot} \neq \bar y_{\cdot\cdot} $$

But in the vector b we have $r_i$ copies of $(\bar y_{i\cdot} - \bar y_{\cdot\cdot})$, and

$$ \frac{1}{\sum r_i} \sum r_i \bar y_{i\cdot} = \bar y_{\cdot\cdot}, $$

and so the components of the vector b sum to zero. Another way of looking at it: b is made up of t distinct values which don't themselves sum to zero, but whose ri-weighted sum is zero, so only t − 1 of them can vary freely, and the degrees of freedom are t − 1.


ANOVA table for unbalanced data:

Source      Deg. of Freedom  Sum of Squares  Mean Square        F-Ratio
Treatment        t − 1            SST        MST = SST/(t−1)    MST/MSE
Noise            N − t            SSE        MSE = SSE/(N−t)
Total            N − 1          SSTotal

Now suppose the following model is correct:

$$ y_{i,j} = \mu_i + \epsilon_{i,j}, \qquad V(\epsilon_{i,j}) = \sigma^2 $$

Does MSE still estimate σ²?

$$ MSE = SSE/(N-t) = \frac{\sum_{j=1}^{r_1} (y_{1,j} - \bar y_{1\cdot})^2 + \cdots + \sum_{j=1}^{r_t} (y_{t,j} - \bar y_{t\cdot})^2}{(r_1-1) + \cdots + (r_t-1)} = \frac{(r_1-1)s_1^2 + \cdots + (r_t-1)s_t^2}{(r_1-1) + \cdots + (r_t-1)} $$

So MSE is a weighted average of a collection of unbiased estimates of σ², and so it is still unbiased.
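A small numeric check of this identity, using hypothetical unbalanced groups (not from the notes): MSE computed from the residual sum of squares equals the weighted average of the group sample variances.

```python
from statistics import mean, variance

# Hypothetical unbalanced groups (illustration only)
groups = [[4.1, 5.0, 4.6], [6.2, 5.8, 6.5, 6.1, 5.9], [3.0, 3.4]]
t = len(groups)
N = sum(len(g) for g in groups)

# MSE via the residual sum of squares
sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
mse_direct = sse / (N - t)

# MSE as a weighted average of the group sample variances
mse_weighted = sum((len(g) - 1) * variance(g) for g in groups) / (N - t)

print(abs(mse_direct - mse_weighted) < 1e-12)  # True
```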

Is the F-statistic still sensitive to deviations from H0? Note that a group with more observations contributes more to the grand mean, but it also contributes more terms to the SST. One can show

• E(MSE) = σ²

• E(MST) = σ² + v²τ, where

– $v_\tau^2 = \frac{\sum_{i=1}^t r_i \tau_i^2}{t-1}$
– $\tau_i = \mu_i - \mu$
– $\mu = \frac{\sum_{i=1}^t r_i \mu_i}{\sum r_i}$

So yes, MST/MSE will still be sensitive to deviations from the null.


3.1.8 Normal sampling theory for ANOVA

Example (Blood coagulation): From a large population of farm animals, 24 animals were selected and randomly assigned to one of four different diets A, B, C, D. However, due to resource constraints, rA = 4, rB = 6, rC = 6, rD = 8.

Questions:

• Does diet have an effect on coagulation time?

• If a given diet were assigned to all the animals in the population, what would the distribution of coagulation times be?

• If there is a diet effect, how do the mean coagulation times differ?

The first question we can address with a randomization test. For the second and third we need a sampling model:

yi,j = µi + εi,j

ε1,1 . . . εt,rt ∼ i.i.d. normal(0, σ)

This model implies

• independence of errors

• constant variance

• normally distributed data

Another way to write it:

yA,1, . . . , yA,4 ∼ i.i.d. normal(µA, σ)

yB,1, . . . , yB,6 ∼ i.i.d. normal(µB, σ)

yC,1, . . . , yC,6 ∼ i.i.d. normal(µC , σ)

yD,1, . . . , yD,8 ∼ i.i.d. normal(µD, σ)

So we are viewing the 4 samples under A as a random sample from the population of coagulation times that would be present if all animals got A (and similarly for the samples under B, C and D).


Figure 3.3: Coagulation data (coagulation time by diet).

> anova(lm(ctime~diet))
Analysis of Variance Table

Response: ctime
          Df Sum Sq Mean Sq F value
diet       3  228.0    76.0  13.571
Residuals 20  112.0     5.6

The F-stat is large, but how unlikely is it under H0?

Two viewpoints on null distributions:

• The 24 animals we selected are fixed. Other possible outcomes of the experiment correspond to different assignments of the treatments.

• The 24 animals we selected are random samples from a larger population of animals. Other possible outcomes of the experiment correspond to different experimental units.

Randomization tests correspond to the first viewpoint, sampling theory to the second.


3.1.9 Sampling distribution of the F-statistic

Recall the χ² distribution:

$$ Y_1, \ldots, Y_n \sim \text{i.i.d. normal}(\mu, \sigma) \;\Rightarrow\; \frac{1}{\sigma^2} \sum (Y_i - \bar Y)^2 \sim \chi^2_{n-1} $$

Also,

$$ X_1 \sim \chi^2_{k_1},\; X_2 \sim \chi^2_{k_2},\; X_1 \perp X_2 \;\Rightarrow\; X_1 + X_2 \sim \chi^2_{k_1+k_2} $$

(Null) distribution of SSE:

$$ \frac{\sum\sum (Y_{i,j} - \bar Y_{i\cdot})^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_j (Y_{1,j} - \bar Y_{1\cdot})^2 + \cdots + \frac{1}{\sigma^2} \sum_j (Y_{t,j} - \bar Y_{t\cdot})^2 \sim \chi^2_{r_1-1} + \cdots + \chi^2_{r_t-1} \sim \chi^2_{N-t} $$

(recall $\sum (r_i - 1) = N - t$). So SSE/σ² ∼ χ²_{N−t}.

Null distribution of SST: Under H0,

$$ \bar Y_{i\cdot} \sim \text{normal}(\mu, \sigma/\sqrt{r_i}), \qquad \sqrt{r_i}\,\bar Y_{i\cdot} \sim \text{normal}(\sqrt{r_i}\,\mu, \sigma) $$

$$ \frac{1}{\sigma^2} SST = \frac{1}{\sigma^2} \sum r_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 = \frac{1}{\sigma^2} \sum_{i=1}^t (\sqrt{r_i}\,\bar Y_{i\cdot} - \sqrt{r_i}\,\bar Y_{\cdot\cdot})^2 \sim \chi^2_{t-1} $$

Results so far:

• SSE/σ2 ∼ χ2N−t

• SST/σ2 ∼ χ2t−1

• SSE⊥ SST (why?)


Introducing the F-distribution: If

$$ X_1 \sim \chi^2_{k_1},\; X_2 \sim \chi^2_{k_2},\; X_1 \perp X_2 \;\Rightarrow\; \frac{X_1/k_1}{X_2/k_2} \sim F_{k_1,k_2} $$

$F_{k_1,k_2}$ is the "F-distribution with k₁ and k₂ degrees of freedom."

Application: Under H0,

$$ \frac{(SST/\sigma^2)/(t-1)}{(SSE/\sigma^2)/(N-t)} = \frac{MST}{MSE} \sim F_{t-1,N-t} $$

A large value of F is evidence against H0, so reject H0 if F > Fcrit. How to determine Fcrit?

Level-α testing procedure:

1. Gather data;

2. Construct ANOVA table;

3. Reject "H0 : µi = µ for all i" if F > qf(1-alpha, dof.trt, dof.err).

So Fcrit is the 1 − α quantile of an F_{t−1,N−t} distribution. Under this procedure (and a host of assumptions),

Pr(reject H0 | H0 true) = α
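Without R's qf at hand, Fcrit can be approximated by Monte Carlo directly from the definition of the F-distribution as a ratio of scaled χ² variables. This is a stdlib-only Python sketch; the simulation size and seed are arbitrary choices.

```python
import random

rng = random.Random(1)

def chi2(df):
    """Draw a chi-squared variate as a sum of df squared standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

k1, k2 = 3, 20                 # e.g. t - 1 = 3 and N - t = 20
draws = sorted((chi2(k1) / k1) / (chi2(k2) / k2) for _ in range(50000))
f_crit = draws[int(0.95 * len(draws))]   # estimated 95th percentile

print(round(f_crit, 2))        # close to qf(.95, 3, 20) = 3.10
```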

Exercise: Study these plots of F-distributions until you know why they look the way they do.

Back to the example:

> anova(lm(ctime~diet))
Analysis of Variance Table

Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***


Figure 3.4: F-distributions: densities and CDFs of the F(3,20), F(3,10), F(3,5) and F(3,2) distributions.


Figure 3.5: Normal-theory and randomization distributions of the F-statistic.

Residuals 20  112.0     5.6

Fobs<-anova(lm(ctime~diet))$F[1]
Fsim<-NULL
for(nsim in 1:1000) {
  diet.sim<-sample(diet)
  Fsim<-c(Fsim, anova(lm(ctime~diet.sim))$F[1] )
}

> mean(Fsim>=Fobs)
[1] 2e-04

> 1-pf(Fobs,3,20)
[1] 4.658471e-05

3.1.10 Comparing group means

If H0 is rejected, there is evidence that some population means are different from others. We can explore this further by making treatment comparisons.


If H0 is rejected, we

• estimate µi with ȳi·;

• estimate σ²i with

– s²i : if the variances are very unequal, this might be the better estimate;
– MSE : if the variances are close and r is small, this might be the better estimate.

Standard practice: Unless there is strong evidence to the contrary, we typically assume V(Yi,j) = V(Yk,l) = σ², and use s² ≡ MSE to estimate σ². In this case,

$$ V(\hat\mu_i) = V(\bar Y_{i\cdot}) = \sigma^2/r_i \approx s^2/r_i $$

The "standard error of the mean," $SE(\hat\mu_i) = \sqrt{s^2/r_i}$, is an estimate of the standard deviation $\sqrt{V(\bar Y_{i\cdot})} = \sigma/\sqrt{r_i}$.

Standard error: The usual definition of the standard error of an estimator $\hat\theta$ of a parameter θ is an estimate of its sampling standard deviation:

$$ \hat\theta = \hat\theta(\mathbf{Y}), \qquad V(\hat\theta) = \gamma^2, \qquad SE(\hat\theta) = \hat\gamma, $$

where $\hat\gamma^2$ is an estimate of $\gamma^2$.

Confidence intervals for treatment means: Similar to the one-sample case.

$$ \frac{\bar Y_{i\cdot} - \mu_i}{SE(\bar Y_{i\cdot})} = \sqrt{r_i}\,\frac{\bar Y_{i\cdot} - \mu_i}{\sqrt{MSE}} = \frac{\bar Y_{i\cdot} - \mu_i}{s/\sqrt{r_i}} \sim t_{N-t} $$

Note: the degrees of freedom are those associated with MSE, NOT ri − 1. As a result,

$$ \bar Y_{i\cdot} \pm SE(\bar Y_{i\cdot}) \times t_{1-\alpha/2,\, N-t} $$

is a 100 × (1 − α)% confidence interval for µi.


A handy rule of thumb: If $\hat\theta$ is an estimator of θ, then in many situations

$$ \hat\theta \pm 2 \times SE(\hat\theta) $$

is an approximate 95% CI for θ.

Coagulation Example:

$$ \bar Y_{i\cdot} \pm SE(\bar Y_{i\cdot}) \times t_{1-\alpha/2,\, N-t} $$

For 95% confidence intervals,

• $t_{1-\alpha/2,\, N-t} = t_{.975,\,20}$ = qt(.975, 20) ≈ 2.1

• $SE(\bar Y_{i\cdot}) = \sqrt{s^2/r_i} = \sqrt{5.6/r_i} = 2.37/\sqrt{r_i}$

Diet  µ̂diet  ri  SE(µ̂diet)  95% CI
 C     68    6     0.97     (65.9, 70.0)
 B     66    6     0.97     (63.9, 68.0)
 A     61    4     1.18     (58.5, 63.5)
 D     61    8     0.84     (59.2, 62.8)

3.1.11 Power calculations for the F-test

Recall, the power of a test is the probability of rejecting H0 when it is false. Of course, this depends on how the null hypothesis is false.

Power(µ, σ, r) = Pr(reject H0 | µ, σ, r)
              = Pr(F_obs > F_{1−α, t−1, N−t} | µ, σ, r)

Sampling distributions:

Y1,1, . . . , Y1,r ∼ i.i.d. normal(µ1, σ)
   ...
Yt,1, . . . , Yt,r ∼ i.i.d. normal(µt, σ)

Under this sampling model, F = F(Y) has a non-central F distribution with

• degrees of freedom t − 1, N − t

• noncentrality parameter λ,

$$ \lambda = r \sum \tau_i^2 / \sigma^2, $$

where τi = µi − µ is the ith treatment effect. In many texts, power is expressed as a function of the quantity Φ:

$$ \Phi = \sqrt{\frac{r \sum \tau_i^2}{\sigma^2 t}} = \sqrt{\lambda/t} $$

Let's try to understand what Φ represents:

$$ \Phi^2 = \frac{\sum \tau_i^2 / t}{\sigma^2 / r} = \frac{\text{treatment variation}}{\text{experimental uncertainty}} = \text{treatment variation} \times \text{experimental precision} $$

Note that "treatment variation" means "average squared treatment effect size." We can gain some more intuition by rewriting λ as follows:

$$ \lambda = r \sum \tau_i^2 / \sigma^2 = r \times t \times \left( \frac{\sum \tau_i^2}{t} \right) \frac{1}{\sigma^2} = N \times \frac{\text{between-treatment variation}}{\text{within-treatment variation}} $$

So presumably power is increasing in Φ and λ.

Power(µ, σ, r) = Pr(reject H0 | µ, σ, r)
              = Pr(F_obs > F_{1−α, t−1, N−t} | µ, σ, r)
              = 1-pf( qf(1-alpha,t-1,N-t), t-1, N-t, ncp=lambda )
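The power calculation can also be approximated by simulating the experiment directly, without a noncentral-F quantile function. This is a stdlib Python sketch; the means, σ, and the critical value (qf(.95, 3, 16) ≈ 3.24) are illustrative assumptions, not values from the notes.

```python
import random

rng = random.Random(7)
mu = [0.0, 0.0, 0.0, 1.0]   # hypothetical treatment means
sigma, r, t = 1.0, 5, 4
f_crit = 3.24               # approx qf(.95, 3, 16), for t-1 = 3 and N-t = 16 d.f.

def f_stat():
    """Simulate one experiment and return its F-statistic."""
    groups = [[rng.gauss(m, sigma) for _ in range(r)] for m in mu]
    gmeans = [sum(g) / r for g in groups]
    grand = sum(gmeans) / t
    mst = r * sum((gm - grand) ** 2 for gm in gmeans) / (t - 1)
    mse = sum((y - gm) ** 2 for g, gm in zip(groups, gmeans) for y in g) / (t * (r - 1))
    return mst / mse

nsim = 2000
power = sum(f_stat() > f_crit for _ in range(nsim)) / nsim
print(power)   # modest power for these settings; here lambda = r*sum(tau^2)/sigma^2 = 3.75
```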


[Figure 3.6: Power of the F-test as a function of r, for t = 4, alpha = 0.05, var.between/var.within = 1; panels show F.crit, lambda, and power versus r.]

[Figure 3.7: Power of the F-test as a function of r, for t = 4, alpha = 0.05, var.between/var.within = 2; panels show F.crit, lambda, and power versus r.]


3.2 Treatment Comparisons

Analysis of Variance Table

Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3  228.0    76.0  13.571 4.658e-05 ***
Residuals 20  112.0     5.6

Conclusion: There are substantial differences between the treatments. But what are the differences?

3.2.1 Contrasts

Differences between sets of means can be evaluated by estimating contrasts. A contrast is a linear function of the means such that the coefficients sum to zero:

Ck = ∑i=1..t ki µi,   with ∑i=1..t ki = 0

Examples:

• diet 1 vs diet 2 : C = µ1 − µ2

• diet 1 vs diet 2,3 and 4 : C = µ1 − (µ2 + µ3 + µ4)/3

• diets 1 or 2 vs diets 3 or 4 : C = (µ1 + µ2)/2− (µ3 + µ4)/2

Contrasts are functions of the unknown parameters. We can estimate them, and obtain standard errors for them. This leads to confidence intervals and hypothesis tests.

Parameter estimates: Let

Ĉ = ∑i=1..t ki µ̂i = ∑i=1..t ki ȳi·

Then E[Ĉ] = C, so Ĉ is an unbiased estimator of C.


Standard errors:

V(Ĉ) = ∑i=1..t V(ki ȳi·)
     = ∑i=1..t ki² σ²/ri
     = σ² ∑i=1..t ki²/ri

So an estimate of V (C) is

s²Ĉ = s² ∑i=1..t ki²/ri

The standard error of a contrast is an estimate of its standard deviation

SE(Ĉ) = s √( ∑ ki²/ri )

t-distributions for contrasts: Consider

Ĉ / SE(Ĉ) = ( ∑i=1..t ki ȳi· ) / ( s √( ∑ ki²/ri ) )

If the data are normally distributed, then under H0 : C = ∑ kiµi = 0,

Ĉ / SE(Ĉ) ∼ tN−t

Exercise: Prove this result

Hypothesis test:

• H0 : C = 0 versus H1 : C ≠ 0.

• Level-α test: Reject H0 if |Ĉ/SE(Ĉ)| > t1−α/2,N−t.

• p-value: Pr( |tN−t| > |Ĉ(y)/SE(Ĉ(y))| ) = 2*(1-pt( abs(c_hat/se_c_hat), N-t )).


Example: Recall in the coagulation example µ̂A = 61, µ̂B = 66, and their 95% confidence intervals were (58.5, 63.5) and (63.9, 68.0). Let C = µA − µB.

Hypothesis test: H0 : C = 0.

Ĉ / SE(Ĉ) = ( ȳA· − ȳB· ) / ( s √(1/6 + 1/4) ) = −5/1.53 = −3.27

p-value = 2*(1-pt(3.27,20)) = 0.004.

Confidence intervals: Based on the normality assumptions, a 100 × (1 − α)% confidence interval for C is given by

Ĉ ± t1−α/2,N−t × SE(Ĉ)
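The coagulation contrast test above can be sketched in R; the means, group sizes, and s² = 5.6 (df = 20) are from the example, and the object names are illustrative.

```r
# A sketch of the contrast test above; means, group sizes, and s^2 = 5.6
# (df = 20) are from the coagulation example; object names are illustrative.
k     <- c(A = 1, B = -1, C = 0, D = 0)    # contrast coefficients, sum to zero
ybar  <- c(A = 61, B = 66, C = 68, D = 61)
r     <- c(A = 4, B = 6, C = 6, D = 8)
s2    <- 5.6; df <- 20
c.hat <- sum(k * ybar)                     # estimated contrast: -5
se.c  <- sqrt(s2 * sum(k^2 / r))           # standard error, about 1.53
t.obs <- c.hat / se.c                      # about -3.27
p.val <- 2 * (1 - pt(abs(t.obs), df))      # two-sided p-value, about 0.004
c(c.hat = c.hat, se = se.c, t = t.obs, p = p.val)
```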

3.2.2 Orthogonal Contrasts

What use are contrasts? Consider the data in Figure 3.8.

> anova(lm(y~as.factor(x)))
             Df Sum Sq Mean Sq F value    Pr(>F)
as.factor(x)  4 87.600  21.900  29.278 1.690e-05 ***
Residuals    10  7.480   0.748

In this case, the treatment levels have an ordering to them (this is not always the case). Consider the following t − 1 = 4 contrasts:

Contrast  k1  k2  k3  k4  k5
C1        -2  -1   0   1   2
C2         2  -1  -2  -1   2
C3        -1   2   0  -2   1
C4         1  -4   6  -4   1

Note:

• These are all actually contrasts (the coefficients sum to zero).

• They are orthogonal: Ci · Cj = 0 for i ≠ j.


[Figure 3.8: Yield-density data — grain yield versus plant density.]

What do these contrasts represent? What would make them large?

• If all µi’s are the same, then they will all be close to zero. This is the “sum to zero” part, i.e. Ci · 1 = 0 for each contrast.

• C1 will be big if µ1, . . . , µ5 are monotonically increasing or monotonically decreasing. This contrast is measuring the linear component of the relationship between density and yield.

• C2 is big if “there is a bump”, up or down, in the middle of the treatments, i.e. C2 is measuring the quadratic component of the relationship.

• Similarly, C3 and C4 are measuring the cubic and quartic parts of the relationship between density and yield.

You can produce these contrast coefficients in R:

> contr.poly(5)
                .L         .Q            .C         ^4
[1,] -6.324555e-01  0.5345225 -3.162278e-01  0.1195229
[2,] -3.162278e-01 -0.2672612  6.324555e-01 -0.4780914
[3,] -1.481950e-18 -0.5345225 -3.893692e-16  0.7171372
[4,]  3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
[5,]  6.324555e-01  0.5345225  3.162278e-01  0.1195229

Here each contrast has been normalized so that ∑ ki² = 1. This doesn’t change their sum, and it doesn’t change the orthogonality. Estimating the contrasts is easy:

> trt.means<-tapply(y,x,mean)
> trt.means
10 20 30 40 50
12 16 19 18 17
> c.hat<-trt.means%*%contr.poly(5)
> c.hat
           .L        .Q        .C      ^4
[1,] 3.794733 -3.741657 0.3162278 0.83666
> 3*c.hat^2
       .L .Q  .C  ^4
[1,] 43.2 42 0.3 2.1
> sum(3*c.hat^2)
[1] 87.6

Coincidence? I think not. Recall, we represented treatment variation as a vector that lived in a t − 1 dimensional space. This variation can be further decomposed into t − 1 orthogonal parts:

Source      df    SS    MS     F
Linear trt   1 43.20 43.20 57.75
Quad trt     1 42.00 42.00 56.15
Cubic trt    1  0.30  0.30  0.40
Quart trt    1  2.10  2.10  2.81
Total trt    4 87.60 21.90 29.28
Error       10  7.48  0.75
Total       14 95.08

The useful idea behind orthogonal contrasts is that the treatment variation can be decomposed into orthogonal parts.


3.2.3 Multiple Comparisons

An experiment with t treatment levels has (t choose 2) = t(t − 1)/2 pairwise comparisons, i.e. contrasts of the form C = µi − µj. Should we perform hypothesis tests for all comparisons?

The more hypotheses we test, the higher the probability that at least one of them will be rejected, regardless of their validity.

Two levels of error: Define the hypotheses

• H0 : µi = µ for all i

• H0ij : µi = µj

We can associate error rates to both of these types of hypotheses

• Experiment-wise type I error rate: Pr(reject H0|H0 is true ).

• Comparison-wise type I error rate : Pr(reject H0ij|H0ij is true ).

Consider the following procedure:

1. Gather data

2. Compute all pairwise contrasts and their t-statistics

3. Reject each H0ij for which |tij| > t1−αC/2,N−t

The comparison-wise type I error rate is of course

P (|tij| > t1−αC/2,N−t|H0ij) = αC

The experiment-wise type I error rate is the probability that we say differences between treatments exist when no differences exist:

P(|tij| > tcrit for some i, j | H0) ≥ αC

with equality only if there are two treatments total. The fact that the experiment-wise error rate is larger than the comparison-wise rate is called the issue of multiple comparisons. What is the experiment-wise rate in this analysis procedure?


P(one or more H0ij rejected | H0) = 1 − P(none of the H0ij rejected | H0)
                                  ≲ 1 − ∏i,j P(H0ij not rejected | H0)
                                  = 1 − (1 − αC)^(t choose 2)

Discussion: Why “≲”? Are the tests independent, positively dependent or negatively dependent?

We can approximate this with a Taylor series expansion: let f(x) = 1 − (1 − αC)^x. Then f(0) = 0, f′(0) = −log(1 − αC), and

f(x) ≈ f(0) + x f′(0)
1 − (1 − αC)^x ≈ 0 − x log(1 − αC)
             ≈ x αC for small αC.

So

P(one or more H0ij rejected | H0) ≲ (t choose 2) αC

Another way to derive this bound is to recall that

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
          ≤ P(A1) + P(A2), and that

P(A1 ∪ · · · ∪ Ak) ≤ P(A1) + · · · + P(Ak)

(subadditivity). Therefore

P(one or more H0ij rejected | H0) ≤ ∑i,j P(H0ij rejected | H0)
                                  = (t choose 2) αC
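A quick numeric check of the two bounds just derived, e.g. with t = 4 treatments and per-comparison level αC = 0.05:

```r
# A numeric check of the bounds derived above, for t = 4 treatments
# and per-comparison level alpha.C = 0.05.
t.trt   <- 4
alpha.C <- 0.05
m       <- choose(t.trt, 2)          # number of pairwise comparisons: 6
indep   <- 1 - (1 - alpha.C)^m       # experiment-wise rate if tests were independent
u.bound <- m * alpha.C               # the subadditivity (union) bound
c(independent = indep, union.bound = u.bound)
```

The experiment-wise rate under independence (about 0.26) is well above αC = 0.05, and slightly below the union bound of 0.30.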

Bonferroni: The Bonferroni procedure for controlling the experiment-wise type I error rate is as follows:

1. Perform pairwise t-tests on all (t choose 2) pairs.

2. Reject H0ij if |tij| > t1−αC/2,N−t, where αC = αE/(t choose 2).

• the experiment-wise error rate is less than αE

• the comparison-wise error rate is αC .

Fisher’s Protected LSD (Fisher’s least-significant-difference method): Another approach to controlling type I error makes use of the F-test.

1. Perform ANOVA and compute F -statistic

2. If F(y) < F1−αE,t−1,N−t then don’t reject H0 and stop.

3. If F(y) > F1−αE,t−1,N−t then reject H0 and reject all H0,i,j for which

|Ĉij/SE(Ĉij)| > t1−αC/2,N−t.

Under this procedure,

• Experiment-wise type I error rate is αE

• Comparison-wise type I error rate is

P(reject H0ij | H0ij) = P(F > Fcrit and |tij| > tcrit | H0ij)
                     = P(|tij| > tcrit | F > Fcrit, H0ij) P(F > Fcrit | H0ij)

– If H0 is also true, this is less than αE, and is generally between αC × αE and αE, depending on how many treatments there are.

– If H0 is not true, we don’t know what this is, but some simulation work suggests it is approximately αC × power, again depending on how many treatments there are and the true treatment variation.
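A minimal sketch of Bonferroni-adjusted pairwise comparisons using R's built-in pairwise.t.test, which pools the within-group standard deviation as in the ANOVA; the data below are simulated purely for illustration.

```r
# A sketch of Bonferroni-adjusted pairwise comparisons using R's built-in
# pairwise.t.test, which pools the within-group sd as in the ANOVA.
# The data below are simulated purely for illustration.
set.seed(1)
g <- factor(rep(1:4, each = 6))             # t = 4 treatments, r = 6 reps
y <- rnorm(24, mean = c(0, 0, 0, 2)[g])     # group 4 differs from the rest
ph <- pairwise.t.test(y, g, p.adjust.method = "bonferroni")
ph                                          # table of adjusted p-values
```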

3.3 Model Diagnostics

Model: yij = µi + εij

If

A1: {εi,j}’s are independent;

A2: V (εi,j) = σ2 for all i, j;


A3: {εi,j}’s are normally distributed.

then F = MST/MSE ∼ Ft−1,N−t,λ

where λ is the noncentrality parameter. If in addition

H0: µi = µ for all i = 1, . . . , t.

then the noncentrality parameter is zero and F = MST/MSE ∼ Ft−1,N−t. We make these assumptions to

• do power calculations when designing a study,

• test hypotheses after having gathered the data, and

• make confidence intervals for parameters/contrasts of interest.

What should we do if the model assumptions are not correct?

We could have written

A0 : {εi,j} ∼ i.i.d. normal(0, σ)

to describe all of A1–A3. We don’t do this because some violations of assumptions are more serious than others. Statistical folklore says the order of importance is A1, A2, then A3. We will discuss A1 when we talk about blocking. For now we will talk about A2 and A3.

3.3.1 Detecting violations with residuals

Violations of assumptions can be checked via residual analysis.

Parameter estimates:

yij = ȳ·· + (ȳi· − ȳ··) + (yij − ȳi·)
    = µ̂ + τ̂i + ε̂ij

Our estimate of any value in group i is ŷij = µ̂ + τ̂i = ȳi·.

Our estimate of the error is ε̂ij = yij − ȳi·.

ε̂ij is called the residual for observation i, j.

Assumptions about εij can be checked by examining the values of the ε̂ij’s:


Checking Normality Assumptions:

• Histogram:

Make a histogram of the ε̂ij’s. This should look approximately bell-shaped if the (super)population is really normal and there are enough observations. If there are enough observations, graphically compare the histogram to a N(0, s²) distribution.

In small samples, the histograms need not look particularly bell-shaped.

• Normal probability, or qq-plot:

If εij ∼ N(0, σ²) then the ordered residuals (ε̂(1), . . . , ε̂(N)) should correspond linearly with quantiles of a standard normal distribution.

How non-normal can a sample from a normal population look? You can always check yourself by simulating data in R. See Figure 3.9.

Example (Hermit Crab Data): Is there variability in the hermit crab population across six different coastline sites? A researcher sampled the population in 25 randomly sampled transects in each of the six sites.

Data: yij = population total in transect j of site i.

Model: Yij = µ + τi + εij

Note that the data are counts so they cannot be exactly normally distributed.

Data description:

site  sample mean  sample median  sample std dev
1     33.80        17              50.39
2     68.72        10             125.35
3     50.64         5             107.44
4      9.24         2              17.39
5     10.00         2              19.84
6     12.64         4              23.01

ANOVA:


[Figure 3.9: Histograms and normal Q–Q plots of simulated normal samples, with n ∈ {20, 50, 100}.]


[Figure 3.10: Crab data — histograms of transect counts for sites 1–6.]


[Figure 3.11: Crab residuals — histogram and normal Q–Q plot of fit1$res.]

> anova(lm(crab[,2]~as.factor(crab[,1])))
Analysis of Variance Table

Response: crab[, 2]
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  76695   15339  2.9669 0.01401 *
Residuals            144 744493    5170

Residuals: ε̂ij = yij − µ̂i = yij − ȳi·.

Residual diagnostic plots: See Figure 3.11.

Conclusion: Data are clearly not normally distributed.

Checking for non-constant variance (heteroscedasticity): The null distribution in the F-test is based on V(εij) = σ² for all groups i.

(a) Tabulate the residual variance in each treatment:

Trt  Sample var. of (ε̂i1, . . . , ε̂in)
1    s²1
⋮    ⋮
t    s²t


Rule of Thumb:

– If s²largest/s²smallest < 3: don’t worry;

– If s²largest/s²smallest > 7: worry; you need to account for the non-constant variance, especially if sample sizes are different. Consider a randomization test, or converting the data to ranks for the overall test of treatment effects.

(b) Residual vs. Fitted Value plots

For many types of data there is a mean-variance relationship: typically, groups with large means tend to have large variances.

To check this: plot ε̂ij (residual) vs. ŷij = ȳi· (fitted value).

Example: Crab data

Site   Mean      SD
4      9.24   17.39
5     10.00   19.84
6     12.64   23.01
1     33.80   50.39
3     50.64  107.44
2     68.72  125.35

Here we have s²largest/s²smallest ≈ 50.

(c) Statistical tests of equality of variance:

Levene’s Test: Let dij = |yij − ỹi|, where ỹi is the sample median from group i.

These differences will be ‘large’ in group i if group i has a ‘large’ variance, and ‘small’ in group i if group i has a ‘small’ variance.

We compute:

F0 = [ n ∑i=1..t (d̄i· − d̄··)² / (t − 1) ] / [ ∑i=1..t ∑j=1..n (dij − d̄i·)² / (t(n − 1)) ]

which is the ratio of the between-group variability of the dij to the within-group variability of the dij.


[Figure 3.12: Fitted values versus residuals for the crab data.]

Reject H0: V(εij) = σ² for all i, j if F0 > Ft−1,t(n−1),1−α.

Crab data:

F0 = 14,229/4,860 = 2.93 > F5,144,0.95 = 2.28,

hence we reject the null hypothesis of equal variances at the 0.05 level.
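Levene's test as described above can be sketched in R by computing the dij and running a one-way ANOVA on them; the data below are simulated for illustration (they are not the crab data).

```r
# A sketch of Levene's test as described above: compute d_ij = |y_ij - median_i|
# and run a one-way ANOVA on the d_ij. Data are simulated for illustration.
set.seed(1)
g <- factor(rep(1:3, each = 10))          # t = 3 groups, n = 10 each
y <- rnorm(30, sd = c(1, 1, 3)[g])        # group 3 has a larger variance
d <- abs(y - ave(y, g, FUN = median))     # absolute deviations from group medians
lev <- anova(lm(d ~ g))                   # F0 = between / within variability of the d_ij
lev
```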

See also

– the F-max test and Bartlett’s test;

– the equality test for two variances (F-test, Montgomery Chapter 5).

Crab data: So the assumptions that validate the use of the F-test are violated. Now what?

3.3.2 Variance stabilizing transformations

Recall that one justification of the normal distribution model,

Yij = µi + εij,


was that if the noise εij = Xij1 + Xij2 + · · · was the result of the addition of unobserved, independent additive effects, then by the central limit theorem εij will be approximately normal.

However, if the effects are multiplicative, so that in fact

Yij = µi × εij = µi × (Xij1 × Xij2 × · · · ),

then the Yij will not be normal, and the variances will not be constant:

Var(Yij) = µi² Var(Xij1 × Xij2 × · · · )

Log transform:

log Yij = log µi + (log Xij1 + log Xij2 + · · · )

V(log Yij) = V(log µi + log Xij1 + log Xij2 + · · · )
          = V(log Xij1 + log Xij2 + · · · )
          = σ²log y

So the variance of the log-data does not depend on the mean µi. Also note that by the central limit theorem the errors should be approximately normally distributed.

Crab data: Let Yi,j = log(Y^raw_i,j + 1/6)

Site  Mean    SD
6     0.82  2.21
4     0.91  1.87
5     1.01  1.74
3     1.75  2.41
1     2.16  2.27
2     2.30  2.44


[Figure 3.13: Crab population by site, on the raw and log scales.]


[Figure 3.14: Diagnostics after the log transformation — normal Q–Q plot of the residuals and residuals versus fitted values.]

> anova(lm(log(crab[,2]+1/6)~as.factor(crab[,1])))
Analysis of Variance Table

Response: log(crab[, 2] + 1/6)
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  54.73   10.95  2.3226 0.04604 *
Residuals            144 678.60    4.71

Other transformations: For data having multiplicative effects, we showed that

σi ∝ µi,

and taking the log stabilized the variances. In general, we may observe σi ∝ µi^α, i.e. the standard deviation of a group depends on the group mean.

The goal of a variance stabilizing transformation is to find a transformation of yi,j to y*i,j such that σy*ij ∝ (µ*i)^0 = 1, i.e. the standard deviation doesn’t depend on the mean.

Consider the class of power transformations, transformations of the form Y*i,j = Yi,j^λ. Based on a Taylor series expansion of gλ(Y) = Y^λ around µi, we have

Y*i,j = gλ(Yi,j) ≈ µi^λ + (Yi,j − µi) λ µi^(λ−1)
E(Y*i,j) ≈ µi^λ
V(Y*i,j) ≈ E[(Yi,j − µi)²] (λ µi^(λ−1))²
SD(Y*i,j) ∝ µi^α µi^(λ−1) = µi^(α+λ−1)

So if we observe σi ∝ µi^α, then σ*i ∝ µi^(α+λ−1). So if we take λ = 1 − α then we will have stabilized the variances to some extent. Of course, we typically don’t know α, but we could try to estimate it from data.

Estimation of α:

σi ∝ µi^α ⇔ σi = c µi^α
log σi = log c + α log µi,
so log si ≈ log c + α log ȳi·

Thus we may use the following procedure:

(1) Plot log si vs. log ȳi·.

(2) Fit a least squares line: lm( log si ∼ log ȳi· ).

(3) The slope α̂ of the line is an estimate of α.

(4) Analyze y*ij = yij^(1−α̂).
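Steps (1)–(3) can be sketched in R; here the six (mean, sd) pairs are the crab-data site summaries given earlier, so the fitted slope comes out near 1, pointing to a log transform.

```r
# A sketch of steps (1)-(3): regress log sample sd on log sample mean.
# The six (mean, sd) pairs are the crab-data site summaries given earlier.
log_mean <- log(c(9.24, 10.00, 12.64, 33.80, 50.64, 68.72))
log_sd   <- log(c(17.39, 19.84, 23.01, 50.39, 107.44, 125.35))
fit      <- lm(log_sd ~ log_mean)
alpha.hat <- unname(coef(fit)["log_mean"])   # slope: comes out near 1
lambda    <- 1 - alpha.hat                   # the power to use in step (4)
c(alpha.hat = alpha.hat, lambda = lambda)
```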


Here are some common transformations:

Mean-var. relation   α    λ = 1 − α   transform             y*ij
σy ∝ const.          0    1           no transform!         yij
σy ∝ µi^(1/2)        1/2  1/2         square root           √yij
σy ∝ µi^(3/4)        3/4  1/4         quarter power         yij^(1/4)
σy ∝ µi              1    0           log                   log yij
σy ∝ µi^(3/2)        3/2  −1/2        reciprocal sq. root   yij^(−1/2)
σy ∝ µi²             2    −1          reciprocal            1/yij

• All the mean-variance relationships here are examples of power laws. Not all mean-variance relations are of this form.

• α = 1 is the multiplicative model discussed previously.

More about the log transform

How did α = 1 give us y*ij = log yij? Shouldn’t it be yij^(1−α) = yij^λ = yij^0 = 1?

Everything will make sense if we define, for any λ ≠ 0,

y*(λ) = (y^λ − 1)/λ = (1/λ) y^λ − 1/λ,

a linear function of y^λ. For λ = 0, it’s natural to define the transformation as the limit, which by l’Hôpital’s rule is

y*(0) = lim(λ→0) y*(λ) = lim(λ→0) (y^λ − 1)/λ = [ y^λ ln y / 1 ] |λ=0 = ln y

Note that for a given λ ≠ 0 it will not change the results of the ANOVA on the transformed data if we transform using

y* = y^λ   or   y*(λ) = (y^λ − 1)/λ = a y^λ + b.


To summarize the procedure: If the data present strong evidence of nonconstant variance,

(1) Plot log si vs. log ȳi·. If the relationship looks linear, then

(2) Fit a least squares line: lm( log si ∼ log ȳi· ).

(3) The slope α̂ of the line is an estimate of α.

(4) Analyze y*ij = yij^(1−α̂).

This procedure is called the “Box-Cox” transformation (George Box and David Cox first proposed the method in a paper in 1964). If the relationship between log si and log ȳi· does not look linear, then a Box-Cox transformation will probably not help.

A few words of caution:

• For variance stabilization via the Box-Cox procedure to do much good, the linear relationship between means and variances should be quite tight.

• Remember the rule of thumb which says not to worry if the ratio of the largest to smallest variance is less than 3, i.e. don’t use a transform unless there are drastic differences in variances.

• Don’t get too carried away with the Box-Cox transformations. If α̂ = 0.53, don’t analyze y*ij = yij^0.47; just use yij^0.5. Remember, α̂ is just an estimate anyway (see Box and Cox ‘Rebuttal’, JASA 1982).

• Remember to make sure that you describe the units of the transformed data, and make sure that readers of your analysis will be able to understand that the model is additive in the transformed data, but not in the original data. Also always include a descriptive analysis of the untransformed data, along with the p-value for the transformed data.

• Try to think about whether the associated non-linear model for yij makes sense.


• Don’t assume that the transformation is a magical fix: remember to look at residuals and diagnostics after you do the transform. If things haven’t improved much, don’t transform.

• Remember that the mean-variance relationship might not be cured by a transform in the Box-Cox class.

– (e.g. if the response is a binomial proportion (= proportion of successes out of n), we have mean = p, s.d. = √(p(1 − p)); the variance stabilizing transformation in this case is y* = arcsin √y.)

• Keep in mind that statisticians disagree on the usefulness of transformations: some regard them as a ‘hack’ more than a ‘cure’:

– It can be argued that if the scientist who collected the data had a good reason for using certain units, then one should not just transform the data in order to bang it into an ANOVA-shaped hole. (Given enough time and thought we could instead build a non-linear model for the original data.)

• The sad truth: as always you will need to exercise judgment while performing your analysis.

These warnings apply whenever you might reach for a transform, whether in an ANOVA context or a linear regression context.

Example (Crab data): Looking at the plot of means vs. sds suggests α ≈ 1, implying a log transformation. However, the zeros in our data lead to problems, since log(0) = −∞.

Instead we can use y*ij = log(yij + 1/6). (See plots.) For the transformed data this gives a ratio of the largest to smallest sample variance of approximately 2, which is acceptable by the rule of 3.

site  sample sd  sample mean  log(sample sd)  log(sample mean)
4      17.39       9.24        2.86            2.22
5      19.84      10.00        2.99            2.30
6      23.01      12.64        3.14            2.54
1      50.39      33.80        3.92            3.52
3     107.44      50.64        4.68            3.92
2     125.35      68.72        4.83            4.23


[Figure 3.15: Mean-variance relationship of the transformed data — log_sd versus log_mean.]


lm(formula = log_sd ~ log_mean)

Coefficients:
(Intercept)    log_mean
     0.6652      0.9839


Chapter 4

Multifactor Designs

4.1 Factorial Designs

Example (Insecticide): In developing methods of pest control, researchers are interested in the efficacy of different types of poison and different delivery methods.

• Treatments:

– Type ∈ {I, II, III}

– Delivery ∈ {A, B, C, D}

• Response: time to death, in minutes.

Possible experimental design: Perform two separate CRD experiments, one testing for the effects of Type and the other for Delivery. But,

which Delivery to use for the experiment testing for Type effects?

which Type to use for the experiment testing for Delivery effects?

To compare different Type × Delivery combinations, we need to do experiments under all 12 treatment combinations.

Experimental design: 48 insects randomly assigned to treatments: 4 to each treatment combination, i.e.

4 assigned to (I, A),


4 assigned to (I, B),

...

It might be helpful to visualize the design as follows:

                Delivery
Type      A       B       C       D
I       yI,A    yI,B    yI,C    yI,D
II      yII,A   yII,B   yII,C   yII,D
III     yIII,A  yIII,B  yIII,C  yIII,D

This type of design is called a factorial design. Specifically, this design is a 3 × 4 two-factor design with 4 replications.

Factors: Categories of treatments

Levels of a factor: the different treatments in a category

So in this case, Type and Delivery are both factors. There are 3 levels of Type and 4 levels of Delivery.

4.1.1 Data analysis:

Let's first look at a series of plots:

Marginal Plots: Based on these marginal plots, it looks like (III, A) would be the most effective combination. But are the effects of Type consistent across levels of Delivery?

Conditional Plots: Type III looks best across delivery types. But the difference between types I and II seems to depend on delivery.

Cell Plots: Another way of looking at the data is to just view it as a CRD with 3 × 4 = 12 different groups. Sometimes each group is called a cell.

Notice that there seems to be a mean-variance relationship. Let's take care of this before we go any further: plotting means versus standard deviations on both the raw and log scale gives the relationship in Figure 4.4. Computing the least squares line gives


[Figure 4.1: Marginal plots — time to death by Type (I, II, III) and by Delivery (A, B, C, D).]

> lm(log(sds)~log(means))

Coefficients:
(Intercept)  log(means)
     -3.203       1.977

So σi ≈ c µi². This suggests a reciprocal transformation, i.e. analyzing yi,j = 1/(time to death). Does this reduce the relationship between means and variances? Figure 4.5 shows the mean-variance relationship for the transformed data.

So we select this as our data scale and start from scratch, beginning with plots (Figure 4.6).

Possible analysis methods: Let's first try to analyze these data using our existing tools:

• Two one-factor ANOVAs: Just looking at Type, for example, the experiment is a one-factor ANOVA with 3 treatment levels and 16 reps per treatment. Conversely, looking at Delivery, the experiment is a one-factor ANOVA with 4 treatment levels and 12 reps per treatment.

> anova(lm(1/dat$y~dat$type))


[Boxplots of survival time by Type (I, II, III), one panel for each delivery A, B, C, D.]

Figure 4.2: Conditional Plots.


[Boxplots of survival time for the Type × Delivery cells: I.A, II.A, I.B, II.B, I.C, II.C, I.D, II.D.]

Figure 4.3: Cell plots.

[Scatterplots of cell standard deviations versus cell means, on the raw scale and on the log-log scale.]

Figure 4.4: Mean-variance relationship.


[Scatterplots of cell standard deviations versus cell means for the transformed data, raw and log-log scales.]

Figure 4.5: Mean-variance relationship for transformed data.

             Df  Sum Sq Mean Sq F value    Pr(>F)
dat$type      2 0.34877 0.17439  25.621 3.728e-08 ***
Residuals    45 0.30628 0.00681
---

> anova(lm(1/dat$y~dat$delivery))

             Df  Sum Sq Mean Sq F value    Pr(>F)
dat$delivery  3 0.20414 0.06805  6.6401 0.0008496 ***
Residuals    44 0.45091 0.01025
---

• A one-factor ANOVA: There are 3× 4 = 12 different treatments:

> anova(lm(1/dat$y~dat$type:dat$delivery))

                       Df  Sum Sq Mean Sq F value    Pr(>F)
dat$type:dat$delivery  11 0.56862 0.05169  21.531 1.289e-12 ***
Residuals              36 0.08643 0.00240
---


[Marginal boxplots of the transformed response 1/y by Type and by Delivery, conditional plots of 1/y by Type within each delivery A–D, and cell plots for all 12 Type × Delivery combinations.]

Figure 4.6: Plots of transformed poison data


Why might these methods be insufficient?

• What are the SSE and MSE representing in the first two ANOVAs? Why are they bigger than the values in the third ANOVA?

• In the third ANOVA, can we assess the effects of Type and Delivery separately?

• Can you think of a situation where the F-stats in the first and secondANOVAs would be “small”, but the F-stat in the third ANOVA “big”?

Basically, the first and second ANOVAs may mischaracterize the data and sources of variation. The third ANOVA is “valid,” but we’d like a more specific result: we’d like to know which factors are sources of variation, and the relative magnitude of their effects. Also, if the effects of one factor are consistent across levels of the other, maybe we don’t need to have a separate parameter for each of the 12 treatment combinations, i.e. a simpler model may suffice.

4.1.2 Additive effects model

Yi,j,k = µ + ai + bj + εi,j,k, i = 1, . . . , t1, j = 1, . . . , t2, k = 1, . . . , r

µ = overall mean;

a1, . . . , at1 = additive effects of factor 1;

b1, . . . , bt2 = additive effects of factor 2.

Notes:

1. Side conditions: As with the treatment effects model in the one-factor case, we only need t1 − 1 parameters to differentiate between t1 things, so we usually

• restrict a1 = 0, b1 = 0 (set-to-zero side conditions), OR

• restrict Σi ai = 0, Σj bj = 0 (sum-to-zero side conditions).

2. The additive model is a reduced model: There are t1 × t2 groups or treatment combinations, and a full model fits a different population mean separately to each treatment combination, requiring t1 × t2 parameters. In contrast, the additive model only has


1 parameter for µ
t1 − 1 parameters for the ai’s
t2 − 1 parameters for the bj’s
t1 + t2 − 1 parameters total.

Parameter estimation and ANOVA decomposition:

yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (yijk − ȳi·· − ȳ·j· + ȳ···)
     = µ̂ + âi + b̂j + ε̂ijk

These are the least-squares parameter estimates, under the sum-to-zero side conditions:

Σi âi = Σi (ȳi·· − ȳ···) = t1 ȳ··· − t1 ȳ··· = 0

To obtain the set-to-zero side conditions, add â1 and b̂1 to µ̂, subtract â1 from the âi’s, and subtract b̂1 from the b̂j’s. Note that this does not change the fitted value in each group:

fitted(yijk) = µ̂ + âi + b̂j
             = (µ̂ + â1 + b̂1) + (âi − â1) + (b̂j − b̂1)
             = µ̂∗ + â∗i + b̂∗j
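This invariance of the fitted values is easy to check numerically. A small Python sketch with made-up effect estimates (not the poison fit):

```python
import numpy as np

# Sum-to-zero estimates (made up for illustration): mu, a_i, b_j.
mu = 10.0
a = np.array([-1.0, 0.5, 0.5])        # sums to zero
b = np.array([-2.0, 0.0, 1.0, 1.0])   # sums to zero

fitted = mu + a[:, None] + b[None, :]  # fitted value mu + a_i + b_j per cell

# Convert to set-to-zero side conditions: fold a_1 and b_1 into the mean.
mu_star = mu + a[0] + b[0]
a_star = a - a[0]                      # now a*_1 = 0
b_star = b - b[0]                      # now b*_1 = 0
fitted_star = mu_star + a_star[:, None] + b_star[None, :]

print(np.allclose(fitted, fitted_star))  # True: fitted values unchanged
```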

As you might have guessed, we can write this decomposition out as vectors of length t1 × t2 × r:

y − ȳ··· = a + b + ε̂
   vT    = v1 + v2 + ve

The columns represent

vT  variation of the data around the grand mean;
v1  variation of factor 1 means around the grand mean;
v2  variation of factor 2 means around the grand mean;
ve  variation of the data around the fitted values.

You should be able to show that these vectors are orthogonal, and so

Σi Σj Σk (yijk − ȳ···)² = Σi Σj Σk âi² + Σi Σj Σk b̂j² + Σi Σj Σk ε̂ijk²

SSTotal = SSA + SSB + SSE
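The orthogonality, and hence the sum-of-squares decomposition, can be verified numerically. A Python sketch with simulated data of the same shape as the poison experiment (t1 = 3, t2 = 4, r = 4):

```python
import numpy as np

rng = np.random.default_rng(0)
t1, t2, r = 3, 4, 4
y = rng.normal(size=(t1, t2, r))       # simulated responses, one cell per (i, j)

gm = y.mean()                          # grand mean
ai = y.mean(axis=(1, 2)) - gm          # factor-1 effect estimates
bj = y.mean(axis=(0, 2)) - gm          # factor-2 effect estimates
fit = gm + ai[:, None, None] + bj[None, :, None]
res = y - fit                          # additive-model residuals

ss_total = ((y - gm) ** 2).sum()
ssa = t2 * r * (ai ** 2).sum()         # each a_i appears t2*r times
ssb = t1 * r * (bj ** 2).sum()         # each b_j appears t1*r times
sse = (res ** 2).sum()
print(np.isclose(ss_total, ssa + ssb + sse))  # True
```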


Degrees of Freedom:

• a contains t1 different numbers but sums to zero → t1 − 1 dof

• b contains t2 different numbers but sums to zero → t2 − 1 dof

ANOVA table

Source  SS       df                              MS        F
A       SSA      t1 − 1                          SSA/dfA   MSA/MSE
B       SSB      t2 − 1                          SSB/dfB   MSB/MSE
Error   SSE      (t1 − 1)(t2 − 1) + t1t2(r − 1)  SSE/dfE
Total   SSTotal  t1t2r − 1

> anova(lm(1/dat$y ~ dat$type + dat$delivery))
Analysis of Variance Table

Response: 1/dat$y
             Df  Sum Sq Mean Sq F value
dat$type      2 0.34877 0.17439  71.708
dat$delivery  3 0.20414 0.06805  27.982
Residuals    42 0.10214 0.00243
---

This ANOVA has decomposed the variance in the data into the variance of additive Type effects, additive Delivery effects, and residuals. Does this adequately represent what is going on in the data? What do we mean by additive? Assuming the model is correct, we have:

E[Y |type=I, delivery=A] = µ + a1 + b1

E[Y |type=II, delivery=A] = µ + a2 + b1

This says the difference between Type I and Type II is a1 − a2 regardless of Delivery. Does this look right based on the plots? Consider the following table:

Effect of Type I vs II, given Delivery

Delivery  full model   additive model
A         µIA − µIIA   (µ + a1 + b1) − (µ + a2 + b1) = a1 − a2
B         µIB − µIIB   (µ + a1 + b2) − (µ + a2 + b2) = a1 − a2
C         µIC − µIIC   (µ + a1 + b3) − (µ + a2 + b3) = a1 − a2
D         µID − µIID   (µ + a1 + b4) − (µ + a2 + b4) = a1 − a2


• The full model allows differences between Types to vary across levels of Delivery.

• The reduced/additive model says differences are constant across levels of Delivery.

Therefore, the reduced model is appropriate if

(µIA − µIIA) = (µIB − µIIB) = (µIC − µIIC) = (µID − µIID)

How can we test for this? Consider the following parameterization of the full model:

Interaction model:

Yijk = µ + ai + bj + (ab)ij + εijk

µ = overall mean;

a1, . . . , at1 = additive effects of factor 1;

b1, . . . , bt2 = additive effects of factor 2.

(ab)ij = interaction terms = deviations from additivity.

The interaction term is a correction for non-additivity of the factor effects. This is a full model: it fits a separate mean for each treatment combination:

E(Yijk) = µij = µ + ai + bj + (ab)ij

Parameter estimation and ANOVA decomposition:

yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (âb)ij + ε̂ijk

Deciding between the additive/reduced model and the interaction/full model is tantamount to deciding whether the variance explained by the (âb)ij’s is large or not.


4.1.3 Evaluating additivity:

Recall the additive model

yijk = µ + ai + bj + εijk

• i = 1, . . . , t1 indexes the levels of factor 1

• j = 1, . . . , t2 indexes the levels of factor 2

• k = 1, . . . , r indexes the replications for each of the t1 × t2 treatment combinations.

Parameter estimates are obtained via the following decomposition:

yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (yijk − ȳi·· − ȳ·j· + ȳ···)
     = µ̂ + âi + b̂j + ε̂ijk

Fitted value:

ŷijk = µ̂ + âi + b̂j
     = ȳi·· + ȳ·j· − ȳ···

Note that in this model the fitted value in one cell depends on data from the others.

Residual:

ε̂ijk = yijk − ŷijk
     = yijk − ȳi·· − ȳ·j· + ȳ···

As we discussed last time, this model is “adequate” if the differences across levels of one factor don’t depend on the level of the other factor, i.e.

• ȳ1j· − ȳ2j· ≈ â1 − â2 for all j = 1, . . . , t2, and

• ȳi1· − ȳi2· ≈ b̂1 − b̂2 for all i = 1, . . . , t1.

This adequacy can be assessed by looking at differences between the sample means from each cell and the fitted values under the additive model:

ȳij· − (µ̂ + âi + b̂j) = ȳij· − ȳi·· − ȳ·j· + ȳ··· ≡ (âb)ij

The term (âb)ij measures the deviation of the cell means from the estimated additive model. It is called an interaction.
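A Python sketch of this computation: the function below forms the table of estimated interactions from a (t1, t2, r) data array; for data simulated from an additive model (as here) the estimates should be near zero.

```python
import numpy as np

def interaction_estimates(y):
    """y: array of shape (t1, t2, r).
    Returns the t1 x t2 table of (ab)_ij = ybar_ij - ybar_i - ybar_j + ybar."""
    cell = y.mean(axis=2)                               # cell means ybar_ij.
    return (cell - cell.mean(axis=1, keepdims=True)     # subtract row means
                 - cell.mean(axis=0, keepdims=True)     # subtract column means
                 + cell.mean())                         # add back grand mean

rng = np.random.default_rng(0)
a = np.array([-1.0, 0.0, 1.0])
b = np.array([-0.5, -0.5, 0.5, 0.5])
# Additive truth plus small noise, 50 reps per cell.
y = 5 + a[:, None, None] + b[None, :, None] + 0.01 * rng.normal(size=(3, 4, 50))
print(np.abs(interaction_estimates(y)).max())  # near zero: the data are additive
```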


Interactions and the full model: The interaction terms can also be derived by taking the additive decomposition above one step further: the residual in the additive model can be written

ε̂Aijk = yijk − ȳi·· − ȳ·j· + ȳ···
      = (yijk − ȳij·) + (ȳij· − ȳi·· − ȳ·j· + ȳ···)
      = ε̂Iijk + (âb)ij

This suggests the following full model, or interaction model

yijk = µ + ai + bj + (ab)ij + εijk

with parameter estimates obtained from the following decomposition:

yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)
     = µ̂ + âi + b̂j + (âb)ij + ε̂ijk

Fitted value:

ŷijk = µ̂ + âi + b̂j + (âb)ij
     = ȳij· = µ̂ij

This is a full model for the treatment means: the estimate of the mean in each cell depends only on data from that cell. Contrast this to the additive model.

Residual:

ε̂ijk = yijk − ŷijk
     = yijk − ȳij·

Thus the full model ANOVA decomposition partitions the variability among the cell means ȳ11·, ȳ12·, . . . , ȳt1t2· into

• the overall mean µ̂ and additive treatment effects âi, b̂j;

• what is “leftover”: (âb)ij = ȳij· − (µ̂ + âi + b̂j).


Perhaps someone can come up with a better table than this:

var explained by additive model + error in additive model
var explained by full model + error in full model

Total SS = SSA + SSB + SSAB + SSE

Example: Poison 3 × 4 two-factor CRD with 4 reps per treatment combination.

             Df  Sum Sq Mean Sq F value
pois$deliv    3 0.20414 0.06805  27.982
pois$type     2 0.34877 0.17439  71.708
Residuals    42 0.10214 0.00243

                      Df  Sum Sq Mean Sq F value
pois$deliv             3 0.20414 0.06805 28.3431
pois$type              2 0.34877 0.17439 72.6347
pois$deliv:pois$type   6 0.01571 0.00262  1.0904
Residuals             36 0.08643 0.00240

So notice

• 0.10214 = 0.01571 + 0.08643, that is, SSEadd = SSABint + SSEint

• 42 = 6 + 36, that is, dof(SSEadd) = dof(SSABint) + dof(SSEint)

• SSA, SSB, dof(A), dof(B) are unchanged in the two models

• MSEadd ≈ MSEint, but the degrees of freedom are larger in the additive model. Which do you think is a better estimate of the within-group variance?

Expected sums of squares:

• If H0 : (ab)ij = 0 is true, then

– E(MSE) = σ2

– E(MSAB) = σ2

• If H0 : (ab)ij = 0 is not true, then

– E(MSE) = σ2


– E(MSAB) = σ² + rτ²AB > σ².

This suggests

• The adequacy of the additive model can be assessed by comparing MSAB to MSE. Under H0 : (ab)ij = 0,

FAB = MSAB/MSE ∼ F(t1−1)×(t2−1), t1t2(r−1)

Evidence against H0 can be evaluated by computing the p-value.

• If the additive model is adequate, then MSEint and MSAB are two independent estimates of roughly the same thing (why independent?). We may then want to combine them to improve our estimate of σ².

                      Df  Sum Sq Mean Sq F value    Pr(>F)
pois$deliv             3 0.20414 0.06805 28.3431 1.376e-09 ***
pois$type              2 0.34877 0.17439 72.6347 2.310e-13 ***
pois$deliv:pois$type   6 0.01571 0.00262  1.0904    0.3867
Residuals             36 0.08643 0.00240

For these data, there is strong evidence of both treatment effects, and little evidence of non-additivity. We may want to use the additive model.
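The interaction F-test can be reproduced directly from the sums of squares reported above (t1 = 3, t2 = 4, r = 4, so 6 and 36 degrees of freedom). A Python sketch, with scipy supplying the F tail probability:

```python
from scipy.stats import f

# Sums of squares and dof from the interaction-model ANOVA table above.
ss_ab, df_ab = 0.01571, 6     # interaction
sse, dfe = 0.08643, 36        # residual

F_ab = (ss_ab / df_ab) / (sse / dfe)   # MSAB / MSE
p = f.sf(F_ab, df_ab, dfe)             # upper-tail p-value
print(round(F_ab, 2), round(p, 2))     # F near 1.09; p large, so little
                                       # evidence of non-additivity
```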

4.1.4 Inference for additive treatment effects

Consider a two-factor experiment in which it is determined that the effects of factors F1 and F2 are large. Now we want to compare means across levels of one of the factors.

Recall that in the pesticide example we had 4 reps for each of 3 levels of Type and 4 levels of Delivery. So we have 4 × 4 = 16 observations for each level of Type.

The wrong approach: The two-sample t-statistic is

(ȳ1·· − ȳ2··) / (s12 √(2/(4 × 4)))

For the above example,

• y1·· − y2·· = 0.047


[Dotplot of rate (0.1–0.5) for all Type I and Type II observations, pooled over delivery.]

Figure 4.7: Comparison between types I and II, without respect to delivery.

• s12 = 0.081, r × t2 = 4 × 4 = 16.

• t-statistic = -1.638, df=30, p-value=0.112.

Questions:

• What is s²12 estimating?

• What should we be comparing the factor level differences to?

If Delivery is a known source of variation, we should compare differences between levels of Type to the variability within a treatment combination, i.e. σ². For the above example, s12 = 0.081, whereas sMSE = √0.00240 ≈ 0.05, a ratio of about 1.65.

(ȳ1·· − ȳ2··) / (sMSE √(2/(4 × 4))) = 2.7,  p-value ≈ 0.01
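A Python sketch reproducing both test statistics from the numbers above (the difference 0.047, s12 = 0.081, and MSE = 0.0024 are taken from the text):

```python
import math

diff = 0.047                # |ybar_1.. - ybar_2..|, from the text
n = 4 * 4                   # observations per Type (r * t2)

# Pooled two-sample sd ignores the Delivery effect...
t_wrong = diff / (0.081 * math.sqrt(2 / n))
# ...while sqrt(MSE) from the two-factor fit does not.
t_right = diff / (math.sqrt(0.0024) * math.sqrt(2 / n))

print(round(t_wrong, 2), round(t_right, 2))  # about 1.64 vs about 2.71
```

The inflated denominator in the first statistic is exactly the Delivery variation that the two-factor model removes.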

Testing additive effects: Let µij be the population mean in cell ij. The relationship between the cell means model and the parameters in the interaction model is as follows:

µij = µ·· + (µi· − µ··) + (µ·j − µ··) + (µij − µi· − µ·j + µ··)
    = µ + ai + bj + (ab)ij

and so


[The same dotplot of rate for types I and II, with points colored by delivery.]

Figure 4.8: Comparison between types I and II, with delivery in color.

• Σi ai = 0

• Σj bj = 0

• Σi (ab)ij = 0 for each j, and Σj (ab)ij = 0 for each i

Suppose we are interested in the difference between F1 = 1 and F1 = 2. With the above representation we can show that

(1/t2) Σj µ1j − (1/t2) Σj µ2j
  = (1/t2) Σj (µ1j − µ2j)
  = (1/t2) Σj ([µ + a1 + bj + (ab)1j] − [µ + a2 + bj + (ab)2j])
  = (1/t2) [Σj (a1 − a2) + Σj (ab)1j − Σj (ab)2j]
  = a1 − a2

and so the “effect of F1 = 1 versus F1 = 2” = a1 − a2 can be viewed as a contrast of cell means. Note that a1 − a2 is a contrast of treatment (population) means:

          F2 = 1  F2 = 2  F2 = 3  F2 = 4   sum
F1 = 1    µ11·    µ12·    µ13·    µ14·     4µ1··
F1 = 2    µ21·    µ22·    µ23·    µ24·     4µ2··
F1 = 3    µ31·    µ32·    µ33·    µ34·     4µ3··
sum       3µ·1·   3µ·2·   3µ·3·   3µ·4·    12µ···

So

a1 − a2 = µ1·· − µ2··
        = (µ11· + µ12· + µ13· + µ14·)/4 − (µ21· + µ22· + µ23· + µ24·)/4

Like any contrast, we can estimate and make inference for it using contrasts of sample means:

â1 − â2 = ȳ1·· − ȳ2·· is an unbiased estimate of a1 − a2.

Note that this estimate is the corresponding contrast among the t1 × t2 sample means:

          F2 = 1  F2 = 2  F2 = 3  F2 = 4   sum
F1 = 1    ȳ11·    ȳ12·    ȳ13·    ȳ14·     4ȳ1··
F1 = 2    ȳ21·    ȳ22·    ȳ23·    ȳ24·     4ȳ2··
F1 = 3    ȳ31·    ȳ32·    ȳ33·    ȳ34·     4ȳ3··
sum       3ȳ·1·   3ȳ·2·   3ȳ·3·   3ȳ·4·    12ȳ···

So

â1 − â2 = ȳ1·· − ȳ2··
        = (ȳ11· + ȳ12· + ȳ13· + ȳ14·)/4 − (ȳ21· + ȳ22· + ȳ23· + ȳ24·)/4

Hypothesis tests and confidence intervals can be made using the standard assumptions:

• E(â1 − â2) = a1 − a2

• Under the assumption of constant variance:

V(â1 − â2) = V(ȳ1·· − ȳ2··)
           = V(ȳ1··) + V(ȳ2··)
           = σ²/(r × t2) + σ²/(r × t2)
           = 2σ²/(r × t2)


• Under the assumption that the data are normally distributed,

â1 − â2 ∼ normal(a1 − a2, σ√(2/(r × t2)))

and is independent of MSE (why?).

• So

[(â1 − â2) − (a1 − a2)] / √(MSE × 2/(r × t2)) = [(â1 − â2) − (a1 − a2)] / SE(â1 − â2) ∼ tν

where ν is the degrees of freedom associated with our estimate of σ², i.e. the residual degrees of freedom in our model.

– ν = t1t2(r − 1) under the full/interaction model

– ν = t1t2(r − 1) + (t1 − 1)(t2 − 1) under the reduced/additive model

t-test: Reject H0 : a1 = a2 if

|â1 − â2| / √(MSE × 2/(r × t2)) > t1−αC/2, ν

i.e. if

|â1 − â2| > t1−αC/2, ν × √(MSE × 2/(r × t2))

So the quantity

LSD1 = t1−αC/2, ν × SE(â1 − â2)
     = t1−αC/2, ν × √(MSE × 2/(r × t2))

is a “yardstick” for comparing levels of factor 1. It is sometimes called the least significant difference for comparing levels of factor 1. It is analogous to the LSD we used in the one-factor ANOVA.
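A Python sketch of the LSD1 computation for the poison data, using the additive-model MSE and residual dof reported in the text:

```python
import math
from scipy.stats import t

# Additive-model fit for the poison data: MSE = 0.00243 on nu = 42 dof;
# r = 4 reps, t2 = 4 delivery levels, so 16 observations per Type.
mse, nu, r, t2 = 0.00243, 42, 4, 4
alpha = 0.05

lsd1 = t.ppf(1 - alpha / 2, nu) * math.sqrt(mse * 2 / (r * t2))
print(round(lsd1, 3))  # about 0.035: Type means farther apart than this
                       # are declared different at level alpha
```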

Important note: The LSD depends on which factor you are looking at: the comparison of levels of factor 2 depends on

V(ȳ·1· − ȳ·2·) = V(ȳ·1·) + V(ȳ·2·)
              = σ²/(r × t1) + σ²/(r × t1)
              = 2σ²/(r × t1)


So the LSD for factor 2 differences is

LSD2 = t1−αC/2, ν × √(MSE × 2/(r × t1))

Back to the example:

                      Df  Sum Sq Mean Sq F value    Pr(>F)
pois$deliv             3 0.20414 0.06805 28.3431 1.376e-09 ***
pois$type              2 0.34877 0.17439 72.6347 2.310e-13 ***
pois$deliv:pois$type   6 0.01571 0.00262  1.0904    0.3867
Residuals             36 0.08643 0.00240

There is not very much evidence that the effects are non-additive. Let's assume there is no interaction term. If we are correct, then we will have increased the precision of our variance estimate.

             Df  Sum Sq Mean Sq F value    Pr(>F)
pois$deliv    3 0.20414 0.06805  27.982 4.192e-10 ***
pois$type     2 0.34877 0.17439  71.708 2.865e-14 ***
Residuals    42 0.10214 0.00243

So there is strong evidence against the hypothesis that the additive effects are zero for either factor. Which treatments within a factor are different from each other?

effects of poison type: At αC = 0.05,

LSDtype = t.975,42 × SE(â1 − â2)
        = 2.018 × √(0.0024 × 2/(4 × 4)) = 2.018 × 0.0173 = 0.035

Type  Mean  LSD Grouping
I     0.18  1
II    0.23  2
III   0.38  3


[Boxplots of rate (0.1–0.5) by Type (I, II, III) and by Delivery (A, B, C, D).]

Figure 4.9: Marginal plots of the data.

effects of poison delivery: At αC = 0.05,

LSDdeliv = t.975,42 × SE(b̂1 − b̂2)
         = 2.018 × √(0.0024 × 2/(4 × 3)) = 2.018 × 0.02 = 0.040

Delivery  Mean  LSD Grouping
B         0.19  1
D         0.21  1
C         0.29  2
A         0.35  3

Note the differences between these comparisons and those from two-sample t-tests.

Interpretation of estimated additive effects: If the additive model is clearly wrong, can we still interpret additive effects? The full model is

yijk = µij + εijk


A reparameterization of this model is the interaction model:

yijk = µ + ai + bj + (ab)ij + εijk

where

• µ = (1/(t1t2)) Σi Σj µij = µ··

• ai = (1/t2) Σj (µij − µ··) = µi· − µ··

• bj = (1/t1) Σi (µij − µ··) = µ·j − µ··

• (ab)ij = µij − µi· − µ·j + µ·· = µij − (µ + ai + bj)

The terms {a1, . . . , at1}, {b1, . . . , bt2} are sometimes called “main effects”. The additive model is

yijk = µ + ai + bj + εijk

Sometimes this is called the “main effects” model. If this model is correct, it implies that

(ab)ij = 0 ∀ i, j ⇔
µi1j − µi2j = ai1 − ai2 ∀ i1, i2, j ⇔
µij1 − µij2 = bj1 − bj2 ∀ i, j1, j2

What are the estimated additive effects estimating, in either case?

â1 − â2 = (ȳ1·· − ȳ···) − (ȳ2·· − ȳ···)
        = ȳ1·· − ȳ2··
        = (1/t2) Σj ȳ1j· − (1/t2) Σj ȳ2j·
        = (1/t2) Σj (ȳ1j· − ȳ2j·)

Now

E[(1/t2) Σj (ȳ1j· − ȳ2j·)] = (1/t2) Σj (µ1j − µ2j) = a1 − a2,

so â1 − â2 is estimating a1 − a2 regardless of whether additivity is correct. Now, how do we interpret this effect?


• Regardless of additivity, â1 − â2 can be interpreted as the estimated difference in response between having factor 1 = 1 and factor 1 = 2, averaged over the experimental levels of factor 2.

• If additivity is correct, â1 − â2 can further be interpreted as the estimated difference in response between having factor 1 = 1 and factor 1 = 2, for every level of factor 2 in the experiment.

Statistical folklore suggests that if there is significant non-additivity, then you can’t interpret main/additive effects. As we can see, this is not true: the additive effects have a very definite interpretation under the full model. In some cases (block designs, coming up), we may be interested in additive effects even if there is significant interaction.

Dissecting the interaction: Sometimes if there is an interaction, we might want to go in and compare individual cell means. Consider the following table of means from a 2 × 4 two-factor ANOVA:

ȳ11·  ȳ12·  ȳ13·  ȳ14·
ȳ21·  ȳ22·  ȳ23·  ȳ24·

A large interaction SS in the ANOVA table gives us evidence, for example, that µ1j − µ2j varies across levels j of factor 2. It may be useful to dissect this variability further, and understand how the non-additivity is manifested in the data. For example, consider the three plots in Figure 4.10. These all would give a large interaction SS, but imply very different things about the effects of the factors.

Contrasts for examining interactions: Suppose we want to compare the effect of (factor 1 = 1) to (factor 1 = 2) across levels of factor 2. This involves contrasts of the form

C = (µ1j − µ2j) − (µ1k − µ2k)

This contrast can be estimated with the sample contrast

Ĉ = (ȳ1j· − ȳ2j·) − (ȳ1k· − ȳ2k·)


[Three panels of cell plots over the treatment combinations 1.1, 2.1, 1.2, 2.2, 1.3, 2.3, 1.4, 2.4.]

Figure 4.10: Three datasets exhibiting non-additive effects.


As usual, the standard error of this contrast is the estimate of its standard deviation:

Var(Ĉ) = σ²/r + σ²/r + σ²/r + σ²/r = 4σ²/r

SE(Ĉ) = 2√(MSE/r)

Confidence intervals and t-tests for C can be made in the usual way.
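A Python sketch of this contrast calculation; MSE, r, and the error dof are taken from the poison interaction fit, while the four cell means are hypothetical values for illustration:

```python
import math
from scipy.stats import t

mse, r, dfe = 0.0024, 4, 36     # from the poison interaction-model fit

# Hypothetical cell means (not from the notes): compare the Type 1 vs 2
# difference at delivery j against the same difference at delivery k.
chat = (0.18 - 0.25) - (0.30 - 0.28)   # sample contrast Chat
se = 2 * math.sqrt(mse / r)            # SE(Chat) = 2 * sqrt(MSE / r)

tcrit = t.ppf(0.975, dfe)
ci = (chat - tcrit * se, chat + tcrit * se)
print(round(se, 3), tuple(round(x, 3) for x in ci))
```

If the interval covers 0, there is no strong evidence that the Type effect differs between the two delivery levels.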

4.2 Randomized complete block designs

Recall our first design:

CRD: To assess the effects of a single factor, say F1, on response, we randomly allocate levels of F1 to experimental units.

Typically, one hopes the e.u.’s are homogeneous or nearly so:

• scientifically: If the units are nearly homogeneous, then any observed variability in response can be attributed to variability in factor levels.

• statistically: If the units are nearly homogeneous, then MSE will be small, confidence intervals will be precise, and hypothesis tests powerful.

But what if the e.u.’s are not homogeneous?

Example: Let F2 be a factor that is a large source of variation/heterogeneity, but is not recorded (age of animals in the experiment, gender, field plot conditions, soil conditions).

ANOVA
Source      SS        MS                     F-ratio
F1          SS1       MS1 = SS1/(t1 − 1)     MS1/MSE
(F2+Error)  SS2+SSE   (SS2+SSE)/(N − t1)

If SS2 is large, the F-stat for F1 may be small. If a factor

1. affects response

2. varies across experimental units


then it will increase the variance in response, and also the experimental error variance/MSE if unaccounted for. If F2 is a known, potentially large source of variation, we can control for it pre-experimentally with a block design.

Blocking: The stratification of experimental units into groups that are more homogeneous than the whole.

Objective: To have less variation among units within blocks than between blocks.

Typical blocking criteria:

• location

• physical characteristics

• time

Example (nitrogen fertilizer timing): How does the timing of nitrogen additive affect nitrogen uptake?

• Treatment: Six different timing schedules 1, . . . , 6; level 4 is “standard”.

• Response: Nitrogen uptake (ppm×10−2 )

• Experimental material: One irrigated field

Soil moisture is thought to be a source of variation in response.

Design:

1. Field is divided into a 4× 6 grid.

2. Within each row, or block, each of the 6 treatments is randomly allocated.

1. The experimental units are blocked into presumably more homogeneous groups.

2. The blocks are complete, in that each treatment appears in each block.


[Diagram of the field: soil moisture runs from wet to dry across the irrigation gradient.]

Figure 4.11: Experimental material in need of blocking.

[Grid of the 4 × 6 field layout (rows × columns) showing, for each plot, the assigned treatment and the observed nitrogen uptake.]

Figure 4.12: Results of the experiment


3. The blocks are balanced, in that there are

• t1 = 6 observations for each level of block,

• t2 = 4 observations for each level of trt,

• r = 1 observation for each trt × block combination.

This design is a randomized complete block design.

Analysis of the RCB design with one rep: Analysis proceeds just as in the two-factor ANOVA:

yij − ȳ·· = (ȳi· − ȳ··) + (ȳ·j − ȳ··) + (yij − ȳi· − ȳ·j + ȳ··)
SSTotal = SSTrt + SSB + SSE

ANOVA table
Source  SS       dof               MS                      F-ratio
Trt     SST      t1 − 1            SST/(t1 − 1)            MST/MSE
Block   SSB      t2 − 1            SSB/(t2 − 1)            (MSB/MSE)
Error   SSE      (t1 − 1)(t2 − 1)  SSE/[(t1 − 1)(t2 − 1)]
Total   SSTotal  t1t2 − 1

ANOVA for nitrogen example:
Source  SS      dof  MS     F-ratio  p-value
Trt     201.32   5   40.26  5.59     0.004
Block   197.00   3   65.67  9.12
Error   108.01  15    7.20
Total   506.33  23

Discussion:

• Discuss the following three ANOVA decompositions:

#######
> anova(lm(c(y) ~ as.factor(c(trt))))
                  Df  Sum Sq Mean Sq F value  Pr(>F)
as.factor(c(trt))  5 201.316  40.263  2.3761 0.08024 .
Residuals         18 305.012  16.945
#######


[Boxplots of uptake c(y) by treatment (1–6) and by row (1–4), and a grid of treatment and residual versus field location.]

Figure 4.13: Marginal plots and residuals


#######
> anova(lm(c(y) ~ as.factor(c(trt)) + as.factor(c(rw))))
                  Df  Sum Sq Mean Sq F value   Pr(>F)
as.factor(c(trt))  5 201.316  40.263  5.5917 0.004191 **
as.factor(c(rw))   3 197.004  65.668  9.1198 0.001116 **
Residuals         15 108.008   7.201
#######

#######
> anova(lm(c(y) ~ as.factor(c(trt)):as.factor(c(rw))))
                                    Df Sum Sq Mean Sq F value Pr(>F)
as.factor(c(trt)):as.factor(c(rw))  23 506.33   22.01
Residuals                            0   0.00
#######

• Can we test for interaction? Do we care about interaction in this case, or just main effects? Suppose it were true that “in row 2, timing 6 is significantly better than timing 4, but in row 3, treatment 3 is better.” Is this relevant for recommending a timing treatment for other fields?

Did blocking help? Consider a CRD as an alternative:

block 1:  2 4 3 2 1 4
block 2:  5 5 3 4 1 4
block 3:  6 3 4 2 6 5
block 4:  1 2 6 2 5 6

• Advantages:

– more possible treatment assignments, so power is increased in a randomization test.

– If we don’t estimate block effects, we’ll have more dof for error.

• Disadvantages:

– It is possible (but unlikely) that some treatment level will get assigned many times to a “good” row, leading to post-experimental bias.

– If “row” is a big source of variation, then ignoring it may lead to an overly large MSE.


Consider comparing the F-stat from a CRD with that from an RCB. According to Cochran and Cox (1957),

MSEcrd = [SSB + r(t − 1)MSErcbd] / (rt − 1)
       = MSB × (r − 1)/(rt − 1) + MSErcbd × r(t − 1)/(rt − 1)

In general, the effectiveness of blocking is a function of MSEcrd/MSErcb. If this is large, it is worthwhile to block. For the nitrogen example, this ratio is about 2.
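Plugging the nitrogen numbers into the Cochran and Cox formula reproduces the claimed ratio of about 2. A Python sketch:

```python
# Nitrogen example: r = 4 blocks, t = 6 treatments,
# MSB = 65.67 and MSE_rcbd = 7.20 from the RCB ANOVA table above.
r, t = 4, 6
msb, mse_rcbd = 65.67, 7.20

# Estimated MSE had the experiment been run as a CRD (Cochran & Cox).
mse_crd = msb * (r - 1) / (r * t - 1) + mse_rcbd * r * (t - 1) / (r * t - 1)
efficiency = mse_crd / mse_rcbd
print(round(efficiency, 2))  # about 2.06: blocking roughly halved the error variance
```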

4.3 Unbalanced designs

Example: Observational study of 20 fatal accidents.

• Response: y = speed in excess of speed limit

• Recorded sources of variation:

1. R = rainy (rainy/not rainy)

2. I = interstate (interstate/two-lane highway)

cell means, cell counts, and marginal means:

            interstate       two-lane        sum    marginal mean
rainy       15 (r11 = 8)     5 (r12 = 2)     130    13   (r1· = 10)
not rainy   20 (r21 = 2)     10 (r22 = 8)    120    12   (r2· = 10)
sum         160 (r·1 = 10)   90 (r·2 = 10)   250    ȳ··· = 12.5
marg. mean  16               9

Let's naively compute sums of squares based on the decompositions:

yijk = ȳ··· + (ȳij· − ȳ···) + (yijk − ȳij·)
yijk = ȳ··· + (ȳi·· − ȳ···) + (ȳ·j· − ȳ···) + (ȳij· − ȳi·· − ȳ·j· + ȳ···) + (yijk − ȳij·)

Page 138: Statistics 502 Lecture Notes...Chapter 1 Research Design Principles 1.1 Induction In our efforts to acquire knowledge about processes or systems, much scien-tific knowledge is gained

CHAPTER 4. MULTIFACTOR DESIGNS 133

SSM = Σi Σj Σk (yij· − y···)² = Σi Σj rij (yij· − y···)²
    = 8(15 − 12.5)² + 2(5 − 12.5)² + 2(20 − 12.5)² + 8(10 − 12.5)² = 325

SSR = 10 × (13 − 12.5)² + 10 × (12 − 12.5)² = 5

SSI = 10 × (16 − 12.5)² + 10 × (9 − 12.5)² = 245

Thus the interaction sum of squares, SSRI, must be

SSM − SSR − SSI = 325 − 5 − 245 = 75

This suggests some interaction. But look at the cell means:

• "no rain" − "rain" = 5, regardless of road type

• "interstate" − "two-lane" = 10, regardless of rain

There is absolutely no interaction! The problem is that SSR, SSI, SSRI are not orthogonal (as computed this way), and so SSM ≠ SSR + SSI + SSRI. There are other things to be careful about as well:

Unbalanced marginal means: Suppose someone asked, "are the accident speeds higher on rainy days or on clear days?"

• yrain − yclear = 13 − 12 = 1: Marginal means suggest speeds were slightly higher on average during rainy days than on clear days.

• Cell means show that, for both road types, speeds were higher, by 5 mph on average, on clear days.

Explanation: Cell frequencies are varying.

• Marginal means for rain are dominated by interstate accidents

• Marginal means for no rain are dominated by two-lane accidents

We say that the marginal effects are unbalanced. How can we make sense of marginal effects in such a situation? How can we test for non-additivity?


Solution: Least-squares means. Base inference on the cell means model:

yijk = µij + εijk

µij = (1/rij) Σk yijk ,    σ² = s² = [1/(N − t1t2)] Σi Σj Σk (yijk − yij·)²

The least-squares marginal means are:

µi· = (1/t2) Σ_{j=1}^{t2} µij  ≢  yi··

µ·j = (1/t1) Σ_{i=1}^{t1} µij  ≢  y·j·

The idea:

1. get estimates for each cell

2. average cell estimates to get marginal estimates.

        1       2      · · ·    t2
 1     y11·    y12·    · · ·   y1t2·   | µ1·
 2     y21·    y22·    · · ·   y2t2·   | µ2·
 ...
 t1    yt11·   yt12·   · · ·   yt1t2·  | µt1·
       µ·1     µ·2     · · ·   µ·t2

Accident example:

                interstate   two-lane   marginal mean   LS mean
 rainy              15           5           13           10
 not rainy          20          10           12           15
 marginal mean      16           9
 LS mean           17.5         7.5
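The arithmetic in this example can be verified with a short script (Python for illustration; the cell counts and cell means are exactly those given in the text):

```python
# Accident example: cell counts r_ij and cell means from the text.
r = {("rain", "int"): 8, ("rain", "two"): 2,
     ("dry",  "int"): 2, ("dry",  "two"): 8}
m = {("rain", "int"): 15, ("rain", "two"): 5,
     ("dry",  "int"): 20, ("dry",  "two"): 10}

N = sum(r.values())
grand = sum(r[c] * m[c] for c in r) / N          # 12.5

def wmean(level, pos):
    """Weighted (sample) marginal mean for one level of one factor."""
    cells = [c for c in r if c[pos] == level]
    return sum(r[c] * m[c] for c in cells) / sum(r[c] for c in cells)

SSM = sum(r[c] * (m[c] - grand) ** 2 for c in r)                     # 325
SSR = sum(10 * (wmean(a, 0) - grand) ** 2 for a in ("rain", "dry"))  # 5
SSI = sum(10 * (wmean(b, 1) - grand) ** 2 for b in ("int", "two"))   # 245
SSRI = SSM - SSR - SSI   # 75, even though the cell means are exactly additive

# marginal vs least-squares (unweighted) means for the rain factor:
marg = (wmean("rain", 0), wmean("dry", 0))                   # (13, 12)
ls = ((m[("rain", "int")] + m[("rain", "two")]) / 2,
      (m[("dry", "int")] + m[("dry", "two")]) / 2)           # (10, 15)
```

Note the reversal: the weighted marginal means are higher on rainy days, while the least-squares means are higher on clear days, exactly as discussed above.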

Comparison of means can be made in the standard way:


Standard errors: µi· = (1/t2) Σj yij· , so

V(µi·) = (1/t2²) Σj σ²/rij

SE(µi·) = (1/t2) √( Σj MSE/rij )

and similarly

SE(µ·j) = (1/t1) √( Σi MSE/rij )

Now use the SE to obtain confidence intervals, t-tests, etc.
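For instance, the SE of the rainy-row LS mean in the accident example would be computed as below (a sketch; the MSE value is made up, since the example reports only cell means and counts):

```python
import math

# SE(mu_i.) = (1/t2) * sqrt( sum_j MSE / r_ij ), using the rainy-row cell
# counts from the accident example; MSE = 4.0 is a hypothetical value.
MSE = 4.0
t2 = 2
r_rain = {"interstate": 8, "two-lane": 2}

se_rain = (1 / t2) * math.sqrt(sum(MSE / n for n in r_rain.values()))
```

The small cell (r12 = 2) dominates the sum inside the square root, so the LS mean for the rainy row is estimated less precisely than the balanced-looking marginal count of 10 might suggest.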

Testing for interaction: Consider

H0 : Yijk = µ + ai + bj + εijk

H1 : Yijk = µ + ai + bj + (ab)ij + εijk

How do we evaluate H0 when the data are unbalanced? I'll first outline the procedure, then explain why it works.

1. Compute SSEf = min_{µ,a,b,(ab)} Σi Σj Σk (yijk − [µ + ai + bj + (ab)ij])²

2. Compute SSEr = min_{µ,a,b} Σi Σj Σk (yijk − [µ + ai + bj])²

3. Compute SSI = SSEr − SSEf. Note that this is always positive.

Allowing for interaction improves the fit, and reduces error variance. SSI measures the improvement in fit. If SSI is large, i.e. SSEr is much bigger than SSEf, this suggests the additive model does not fit well and the interaction term should be included in the model.

Testing:

F = MSI/MSE = [ SSI/((t1 − 1)(t2 − 1)) ] / [ SSE/(N − t1t2) ]

Under H0, F ∼ F_{(t1−1)(t2−1), N−t1t2}, so a level-α test of H0 is

reject H0 if F > F_{1−α, (t1−1)(t2−1), N−t1t2}.


Note:

• SSI is the change in fit in going from the additive to the full model;

• (t1 − 1)(t2 − 1) is the change in number of parameters in going from the additive to the full model.

A painful example: A small-scale clinical trial was done to evaluate the effect of painkiller dosage on pain reduction for cancer patients in a variety of age groups.

• Factors of interest:

– Dose ∈ { Low, Medium, High }
– Age ∈ { 41-50, 51-60, 61-70, 71-80 }

• Response = change in pain index ∈ [−5, 5]

• Design: CRD, each treatment level was randomly assigned to ten patients, not blocked by age.

> table(trt,ageg)
   ageg
trt 50 60 70 80
  1  1  2  3  4
  2  1  3  3  3
  3  2  1  4  3

> tapply(y,trt,mean)
    1     2     3
 0.38 -0.95 -2.13

> tapply(y,ageg,mean)
       50        60        70        80
-2.200000 -1.266667 -0.920000 -0.140000

Do these marginal plots/means misrepresent the data? To evaluate this possibility,

• compare the marginal plots to the interaction plot in Figure 4.15;


Figure 4.14: Marginal plots for pain data

• compute the LS means and compare to marginal means.

> cellmeans<- tapply(y,list(trt,ageg),mean)
> cellmeans
     50   60        70        80
1 -4.90  1.4  1.066667  0.675000
2  2.40 -3.0 -1.266667  0.300000
3 -3.15 -1.4 -2.150000 -1.666667

> trt_lsm<- apply(cellmeans,1,mean)
> age_lsm<- apply(cellmeans,2,mean)

> trt_lsm
         1          2          3
-0.4395833 -0.3916667 -2.0916667

> age_lsm
        50         60         70         80
-1.8833333 -1.0000000 -0.7833333 -0.2305556


Figure 4.15: Interaction plots for pain data

What are the differences between LS means and marginal means? Not as extreme as in the accident example, but the differences can be explained by looking at the interaction plot, and the slight imbalance in the design:

• The youngest patients (ageg=50) were imbalanced towards the higher dose, so we might expect their marginal mean to be too low. Observe the change from the marginal mean = -2.2 to the LS mean = -1.883.

• The oldest patients (ageg=80) were imbalanced towards the lower dose, so we might expect their marginal mean to be too high. Observe the change from the marginal mean = -0.14 to the LS mean = -0.23.

Let's look at the main effects in our model:

> trt_coef<- trt_lsm-mean(cellmeans)
> age_coef<- age_lsm-mean(cellmeans)

> trt_coef
        1         2          3
0.5347222 0.5826389 -1.1173611

> age_coef
         50          60          70          80
-0.90902778 -0.02569444  0.19097222  0.74375000

What linear modeling commands in R will get you the same thing?

> options(contrasts=c("contr.sum","contr.poly"))
> fit_full<-lm( y~as.factor(ageg)*as.factor(trt))

> fit_full$coef[2:4]
as.factor(ageg)1 as.factor(ageg)2 as.factor(ageg)3
     -0.90902778      -0.02569444       0.19097222
> fit_full$coef[5:6]
as.factor(trt)1 as.factor(trt)2
      0.5347222       0.5826389

Note that the coefficients in the reduced/additive model are not the same:

> fit_add<-lm( y~as.factor(ageg)+as.factor(trt))

> fit_add$coef[2:4]
as.factor(ageg)1 as.factor(ageg)2 as.factor(ageg)3
      -0.7920935       -0.3607554        0.3070743
> fit_add$coef[5:6]
as.factor(trt)1 as.factor(trt)2
    1.207447717    -0.001899274
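The sum-to-zero coefficients of the full model can be reproduced by hand from the cell means (a check in Python; the cell means are copied from the R output above, and coefficient = LS mean minus the overall mean of cell means):

```python
# Rows are trt 1-3, columns are ageg 50, 60, 70, 80 (from the R output).
cellmeans = [
    [-4.90,  1.4,  1.066667,  0.675000],
    [ 2.40, -3.0, -1.266667,  0.300000],
    [-3.15, -1.4, -2.150000, -1.666667],
]
overall = sum(sum(row) for row in cellmeans) / 12

trt_lsm = [sum(row) / 4 for row in cellmeans]
trt_coef = [m - overall for m in trt_lsm]     # ~ (0.535, 0.583, -1.117)

age_lsm = [sum(row[j] for row in cellmeans) / 3 for j in range(4)]
age_coef = [m - overall for m in age_lsm]     # ~ (-0.909, -0.026, 0.191, 0.744)
```

This matches the `contr.sum` coefficients from the full (interaction) model; as noted above, the additive-model coefficients do not satisfy this relation because the design is unbalanced.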

4.3.1 Non-orthogonal sums of squares:

Consider the following ANOVA table obtained from R:

                Df  Sum Sq Mean Sq F value  Pr(>F)
as.factor(ageg)  3  13.355   4.452  0.9606 0.42737
as.factor(trt)   2  28.254  14.127  3.0482 0.06613 .
Residuals       24 111.230   4.635

It might be somewhat unsettling that R also produces the following table:

> anova( lm( y~as.factor(trt)+as.factor(ageg)) )
                Df  Sum Sq Mean Sq F value Pr(>F)
as.factor(trt)   2  31.588  15.794  3.4079 0.0498 *
as.factor(ageg)  3  10.021   3.340  0.7207 0.5494
Residuals       24 111.230   4.635


Where do these sums of squares come from? What do the F-tests represent? By typing "?anova.lm" in R we see that anova() computes

"a sequential analysis of variance table for that fit. That is, the reductions in the residual sum of squares as each term of the formula is added in turn are given in as the rows of a table, plus the residual sum of squares."

Sequential sums of squares: Suppose in a linear model we have three sets of parameters, say A, B and C. For example, let

• A be the main effects of factor 1

• B be the main effects of factor 2

• C be their interaction.

Consider the following calculation:

0. Calculate SS0 = residual sum of squares from the model

(0) yijk = µ + εijk

1. Calculate SS1 = residual sum of squares from the model

(A) yijk = µ + ai + εijk

2. Calculate SS2 = residual sum of squares from the model

(AB) yijk = µ + ai + bj + εijk

3. Calculate SS3 = residual sum of squares from the model

(ABC) yijk = µ + ai + bj + (ab)ij + εijk

We can assess the importance of A, B and C sequentially as follows:

• SSA = improvement in fit in going from (0) to (A) = SS0-SS1

• SSB|A = improvement in fit in going from (A) to (AB) = SS1-SS2

• SSC|AB = improvement in fit in going from (AB) to (ABC) = SS2-SS3


This is actually what R presents in an ANOVA table:

> ss0<-sum( lm( y~1 )$res^2 )
> ss1<-sum( lm( y~as.factor(ageg) )$res^2 )
> ss2<-sum( lm( y~as.factor(ageg)+as.factor(trt) )$res^2 )
> ss3<-sum( lm( y~as.factor(ageg)*as.factor(trt) )$res^2 )

> ss0-ss1
[1] 13.3554
> ss1-ss2
[1] 28.25390
> ss2-ss3
[1] 53.75015

> ss3
[1] 57.47955

> anova( lm( y~as.factor(ageg)*as.factor(trt)) )
                               Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(ageg)                 3 13.355   4.452  1.3941 0.27688
as.factor(trt)                  2 28.254  14.127  4.4239 0.02737 *
as.factor(ageg):as.factor(trt)  6 53.750   8.958  2.8054 0.04167 *
Residuals                      18 57.480   3.193
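The F statistics in this sequential table can be reproduced from the printed sums of squares alone (Python for illustration; the numbers are copied from the R output above):

```python
# Sequential sums of squares and their F statistics for the pain data.
ss_ageg, ss_trt, ss_int, sse = 13.3554, 28.25390, 53.75015, 57.47955
df_ageg, df_trt, df_int, df_err = 3, 2, 6, 18

mse = sse / df_err                   # full-model MSE
F_ageg = (ss_ageg / df_ageg) / mse   # ~ 1.39
F_trt = (ss_trt / df_trt) / mse      # ~ 4.42
F_int = (ss_int / df_int) / mse      # ~ 2.81
```

Note that each F uses the full-model MSE in the denominator, which is why these values differ from the 0.9606 and 3.0482 in the additive-model table earlier.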

Why does order of the variables matter?

• In a balanced design, the parameters are orthogonal, and SSA = SSA|B, SSB = SSB|A and so on, so the order doesn't matter.

• In an unbalanced design, the estimates of one set of parameters depend on whether or not you are estimating the others, i.e. they are not orthogonal, and in general SSA ≠ SSA|B, SSB ≠ SSB|A.

I will try to draw a picture of this on the board.

The bottom line: For unbalanced designs, there is no "variance due to factor 1" or "variance due to factor 2". There is only "extra variance due to factor 1, beyond that explained by factor 2", and vice versa. This is essentially because of the non-orthogonality: the part of the variance that can be explained by factor 1 overlaps with the part that can be explained by factor 2. This will become more clear when you get to regression next quarter.

4.4 Analysis of covariance

Example (Oxygen ventilation): Researchers are interested in measuring the effects of exercise type on maximal oxygen uptake.

Experimental units: 12 healthy males between 20 and 35 who didn't exercise regularly.

Design: Six subjects were randomly assigned to a 12-week step aerobics program, the remaining six to 12 weeks of outdoor running on flat terrain.

Response: Before and after the 12-week program, each subject's O2 uptake was tested while on an inclined treadmill.

y = change in O2 uptake

Initial analysis: CRD with one two-level factor. The first thing to do is plot the data. The first panel of Figure 4.16 indicates a moderately large difference in the two sample populations. The second thing to do is a two-sample t-test:

tobs = | (yA − yB) / (sp √(2/6)) | = | (7.705 − (−2.767)) / (6.239 × 0.577) | = 2.907,   Pr(|t10| ≥ 2.907) = 0.0156
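The observed statistic can be reproduced from the reported summaries (a quick check in Python; the group means and pooled SD are those given above):

```python
import math

# Two-sample t statistic from the reported summaries; n = 6 per group.
ybar_A, ybar_B = 7.705, -2.767
sp, n = 6.239, 6

t_obs = abs(ybar_A - ybar_B) / (sp * math.sqrt(2 / n))   # ~ 2.91
```

The p-value Pr(|t10| ≥ tobs) then comes from the t distribution with 10 degrees of freedom, e.g. 2*(1-pt(t_obs,10)) in R.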

So we have reasonably strong evidence of a difference in treatment effects. However, a plot of residuals versus age of the subject indicates a couple of causes for concern with our inference:

1. The residual plot indicates that response increases with age (why?), regardless of treatment group.

2. Just due to chance, the subjects assigned to group A were older than the ones assigned to B.


Figure 4.16: Oxygen uptake data

Question: Is the observed difference in yA, yB due to

• treatment?

• age?

• both?

A linear model for ANCOVA: Let yi,j be the response of the jth subject in treatment i:

yi,j = µ + ai + b× xi,j + εi,j

This model gives a linear relationship between age and response for each group:

                   intercept     slope       error
if i = A,  Yi,j =  (µ + aA)   + b × xi,j   + εi,j
if i = B,  Yi,j =  (µ + aB)   + b × xi,j   + εi,j

Unbiased parameter estimates can be obtained by minimizing the residual sum of squares:

(µ, a, b) = arg min_{µ,a,b} Σ_{i,j} (yi,j − [µ + ai + b × xi,j])²


Figure 4.17: ANOVA and ANCOVA fits to the oxygen uptake data

The fitted model is shown in the second panel of Figure 4.17.

Variance decomposition: Consider the following two ANOVAs:

> anova(lm(o2_change~grp))
          Df Sum Sq Mean Sq F value  Pr(>F)
grp        1 328.97  328.97  8.4503 0.01565 *
Residuals 10 389.30   38.93

> anova(lm(o2_change~grp+age))
          Df Sum Sq Mean Sq F value    Pr(>F)
grp        1 328.97  328.97  42.062 0.0001133 ***
age        1 318.91  318.91  40.776 0.0001274 ***
Residuals  9  70.39    7.82

The second one decomposes the variation in the data that is orthogonal to treatment (SSE from the first ANOVA) into a part that can be ascribed to age (SS age in the second ANOVA), and everything else (SSE from the second ANOVA). I will try to draw some triangles that describe this situation.

Now consider two other ANOVAs:


> anova(lm(o2_change~age))
          Df Sum Sq Mean Sq F value    Pr(>F)
age        1 576.09  576.09  40.519 8.187e-05 ***
Residuals 10 142.18   14.22

> anova(lm(o2_change~age+grp))
          Df Sum Sq Mean Sq F value    Pr(>F)
age        1 576.09  576.09 73.6594 1.257e-05 ***
grp        1  71.79   71.79  9.1788   0.01425 *
Residuals  9  70.39    7.82
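One thing that does not change with the ordering is the total sum of squares; only its split among terms does (a quick check, with values copied from the two pairs of tables above):

```python
# Both orderings decompose the same total sum of squares for the O2 data.
grp_first = 328.97 + 318.91 + 70.39   # SS grp + SS age|grp + SSE
age_first = 576.09 + 71.79 + 70.39    # SS age + SS grp|age + SSE
```

Both sums equal about 718.27: the age and grp pieces overlap, so the "extra" SS for each factor depends on what was fit first.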

Again, I will try to draw some triangles describing this situation.

Blocking, ANCOVA and unbalanced designs: Suppose we have a factor of interest (say F1) that we will randomly assign to experimental material, but it is known that there is some nuisance factor (say F2) that is suspected to be a large source of variation. Our options are:

• Block according to F2: This allows an even allocation of treatments across levels of F2. As a result, we can separately estimate the effects of F1 and F2.

• Do a CRD with respect to F1, but account for F2 somehow in the analysis:

– For a standard two-factor ANOVA, F2 must be treated as a categorical variable ("old", "young").

– ANCOVA allows for more precise control of variation due to F2.

However, in either of these approaches the design is unlikely to be completely balanced, and so the effects of F1 can't be estimated separately from those of F2. This is primarily a problem for experiments with small sample sizes: The probability of design imbalance decreases as a function of sample size (as does the correlation between the parameter estimates) as long as F1 is randomly assigned.

Chapter 5

Nested Designs

5.1 Nested Designs

Example (Potato): Sulfur added to soil kills bacteria, but too much sulfur can damage crops. Researchers are interested in comparing two levels of sulfur additive (low, high) on the damage to two types of potatoes.

Factors of interest:

1. Potato type ∈ {A, B}
2. Sulfur additive ∈ {low, high}

Experimental material: Four plots of land.

Design constraints:

• It is easy to plant different potato types within the same plot

• It is difficult to have different sulfur treatments in the same plot, due to leaching.

Experimental Design: A Split-plot design

1. Each sulfur additive was randomly assigned to two of the four plots.

2. Each plot was split into four subplots. Each potato type was randomly assigned to two subplots per plot.



   L         H         H         L
  A B       B A       B A       B A
  A B       A B       A B       A B

Randomization:

Sulfur type was randomized to whole plots;

Potato type was randomized to subplots.

Initial data analysis: Sixteen responses, 4 treatment combinations.

• 8 responses for each potato type

• 8 responses for each sulfur type

• 4 responses for each potato×sulfur combination

> fit.full<-lm(y~type*sulfur) ; fit.add<-lm(y~type+sulfur)

> anova(fit.full)
            Df  Sum Sq Mean Sq F value   Pr(>F)
type         1 1.48840 1.48840 13.4459 0.003225 **
sulfur       1 0.54022 0.54022  4.8803 0.047354 *
type:sulfur  1 0.00360 0.00360  0.0325 0.859897
Residuals   12 1.32835 0.11070

> anova(fit.add)
          Df  Sum Sq Mean Sq F value  Pr(>F)
type       1 1.48840 1.48840 14.5270 0.00216 **
sulfur     1 0.54022 0.54022  5.2727 0.03893 *
Residuals 13 1.33195 0.10246


Figure 5.1: Potato data.


Figure 5.2: Diagnostic plots for potato ANOVA.

Randomization test: Consider comparing the observed outcome to the population of other outcomes that could've occurred under

• different treatment assignments;

• no treatment effects.

Treatment assignment     Field 1    Field 2    Field 3    Field 4
 1                       low        low        high       high
                         A A B B    A A B B    A A B B    A A B B
 2                       low        low        high       high
                         A B A B    A A B B    A A B B    A A B B
 3                       low        low        high       high
                         A B A B    A B A B    A A B B    A A B B
 4                       low        low        high       high
                         A B A B    A A B B    A B A B    A A B B
 5                       low        high       low        high
                         A B A B    A A B B    A A B B    A A B B
 ...                     ...        ...        ...        ...


Number of possible treatment assignments:

• (4 choose 2) = 6 ways to assign sulfur

• For each sulfur assignment there are (4 choose 2)^4 = 1296 type assignments

So we have 7776 possible treatment assignments. It probably wouldn't be too hard to write code to go through all possible treatment assignments, but it's very easy to obtain a Monte Carlo approximation to the null distribution:

F.t.null<-F.s.null<-NULL

for(ns in 1:1000){
  s.sim<-rep( sample(c("low","low","high","high")), rep(4,4) )
  t.sim<-c( sample(c("A","A","B","B")), sample(c("A","A","B","B")),
            sample(c("A","A","B","B")), sample(c("A","A","B","B")) )

  fit.sim<-anova( lm(y~as.factor(t.sim)+as.factor(s.sim)) )

  F.t.null<-c(F.t.null,fit.sim[1,4])
  F.s.null<-c(F.s.null,fit.sim[2,4])
}

> mean(F.t.null>=F.t.obs)
[1] 0.001
> mean(F.s.null>=F.s.obs)
[1] 0.352
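The same Monte Carlo idea can be sketched without the original data (Python here for illustration; the field means are made-up stand-ins for the potato responses, and the statistic is a simple absolute difference of "low" and "high" field means rather than the ANOVA F). The key feature is that sulfur labels are permuted at the whole-plot level, so only (4 choose 2) = 6 distinct assignments exist and the sulfur p-value can never be small:

```python
import random
import statistics
from math import comb

random.seed(1)
# Hypothetical whole-plot (field) means; a real analysis would use the data.
field_mean = [4.90, 5.18, 4.50, 5.00]

def sulfur_stat(labels):
    """|mean of 'low' fields - mean of 'high' fields| for one assignment."""
    lo = [m for m, s in zip(field_mean, labels) if s == "low"]
    hi = [m for m, s in zip(field_mean, labels) if s == "high"]
    return abs(statistics.mean(lo) - statistics.mean(hi))

obs = sulfur_stat(["low", "low", "high", "high"])    # the actual assignment

null = [sulfur_stat(random.sample(["low", "low", "high", "high"], 4))
        for _ in range(1000)]
pval = sum(f >= obs for f in null) / len(null)

# randomization counts: 6 sulfur assignments, 6^4 type assignments per one
n_sulfur = comb(4, 2)
n_total = n_sulfur * comb(4, 2) ** 4   # 7776 total assignments
```

With only six whole-plot assignments, the null distribution for sulfur has very few distinct values, which is why the randomization p-value for sulfur (0.352 above) is so much larger than the one for type.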

What happened?

F^rand_type ≈ F1,13   ⇒   p^rand_type ≈ p^anova_type

F^rand_sulfur ≉ F1,13   ⇒   p^rand_sulfur ≉ p^anova_sulfur

The F-distribution approximates a null randomization distribution if treatments are randomly assigned to units. But here, the sulfur treatment is being assigned to groups of four units.

• The precision of an effect is related to the number of independent treatment assignments made.


Figure 5.3: Potato data (null distributions of F.t.null and F.s.null)

• We have 16 assignments of type, but only 4 assignments of sulfur. It is difficult to tell the difference between sulfur effects and field effects. Our estimates of sulfur effects are less precise than those of type.

Note:

• From the point of view of type alone, the design is a RCB.

– Each whole-plot(field) is a block. We have 2 observations of eachtype per block.

– We compare MSType to the MSE from residuals left from a modelwith type effects, block effects and possibly interaction terms.

– Degrees of freedom breakdown for the sub-plot analysis:Source dof

block=whole plot 3type 1

subplot error 11subplot total 15

• From the point of view of sulfur alone, the design is a CRD.

Page 157: Statistics 502 Lecture Notes...Chapter 1 Research Design Principles 1.1 Induction In our efforts to acquire knowledge about processes or systems, much scien-tific knowledge is gained

CHAPTER 5. NESTED DESIGNS 152

– Each whole plot is an experimental unit.

– We want to compare MSSulfur to the variation in whole plots, which are fields.

– Degrees of freedom breakdown for the whole-plot analysis:

  Source              dof
  sulfur                1
  whole plot error      2
  whole plot total      3

• Recall, degrees of freedom quantify the precision of our estimates and the shape of the null distribution.

The basic idea is:

• We have different levels of experimental units (fields/whole-plots for sulfur, subfields/sub-plots for type).

• We compare the MS of a factor to the variation among the experimental units of that factor's level. In general, this variation is represented by the interaction between the experimental-unit label and all factors assigned at the given level.

• The level of an interaction is the level of the smallest experimental unit involved.

RCB/Split-plot analysis for type and type×sulfur interaction:

> anova(lm( y~type+type:sulfur+as.factor(field) ) )
                 Df  Sum Sq Mean Sq F value    Pr(>F)
type              1 1.48840 1.48840 54.7005 2.326e-05 ***
as.factor(field)  3 1.59648 0.53216 19.5575 0.0001661 ***
type:sulfur       1 0.00360 0.00360  0.1323 0.7236270
Residuals        10 0.27210 0.02721

Thus there is strong evidence for type effects, and little evidence that the effects of type vary among levels of sulfur.


Whole-plot analysis for sulfur:

> anova(lm(y~sulfur+as.factor(field)))
                 Df  Sum Sq Mean Sq F value  Pr(>F)
sulfur            1 0.54022 0.54022  3.6748 0.07936 .
as.factor(field)  2 1.05625 0.52813  3.5925 0.05989 .
Residuals        12 1.76410 0.14701

The F-test here is not appropriate: We need to compare MSSulfur to the variation among whole-plots, after accounting for the effects of sulfur, i.e. MSField (note that including or excluding type here has no effect on this comparison).

MSS<-anova(lm(y~sulfur+as.factor(field)))[1,3]
MSWPE<-anova(lm(y~sulfur+as.factor(field)))[2,3]
F.sulfur<-MSS/MSWPE

> F.sulfur
[1] 1.022911
> 1-pf(F.sulfur,1,2)
[1] 0.4182903
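From the rounded mean squares printed in the table, this whole-plot F is just their ratio (a quick check; small differences from the R value reflect rounding in the printed table):

```python
# Whole-plot F for sulfur: MS(sulfur) / MS(whole-plot error), df = (1, 2).
MS_sulfur = 0.54022
MS_wholeplot_error = 0.52813

F_sulfur = MS_sulfur / MS_wholeplot_error   # ~ 1.02
```

Comparing this to an F distribution with (1, 2) degrees of freedom gives the p-value of about 0.42 shown above, far from the misleading 0.079 of the naive test.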

This is more in line with the analysis using the randomization test.

5.1.1 Mixed-effects approach

What went wrong with the normal sampling model approach? What is wrong with the following model?

yijk = µ + ai + bj + (ab)ij + εijk

where

• i indexes sulfur level, i ∈ {1, 2};

• j indexes type level, j ∈ {1, 2};

• k indexes reps, k ∈ {1, . . . , 4};

• εijk are i.i.d. normal.


Figure 5.4: Potato data (residuals by field)

We checked the normality and constant variance assumptions for this model previously, and they seemed ok. What about independence? Figure 5.4 plots the residuals as a function of field. The figure indicates that residuals are more alike within a field than across fields, and so observations within a field are positively correlated. Statistical dependence of this sort is common to split-plot and other nested designs.

Dependence within whole-plots:

• affects the amount of information we have about factors applied at the whole-plot level: within a given plot, we can't tell the difference between plot effects and whole-plot factor effects.

• doesn't affect the amount of information we have about factors applied at the sub-plot level: we can tell the difference between plot effects and sub-plot factor effects.

If residuals within a whole-plot are positively correlated, the most intuitively straightforward way to analyze such data (in my opinion) is with a hierarchical mixed-effects model:

yijkl = µ + ai + bj + (ab)ij + γik + εijkl


where things are as before except

• γik represents error variance at the whole-plot level, i.e. variance in whole-plot experimental units (fields or blocks in the above example). The index k represents whole-plot reps, k = 1, . . . , r1:

{γik} ∼ normal(0, σw)

• εijkl represents error variance at the sub-plot level, i.e. variance in sub-plot experimental units. The index l represents sub-plot reps, l = 1, . . . , r2:

{εijkl} ∼ normal(0, σs)

Now every subplot within the same whole plot has something in common, i.e. γik. This models the positive correlation within whole plots:

Cov(y_{i,j1,k,l1}, y_{i,j2,k,l2}) = E[ (y_{i,j1,k,l1} − E[y_{i,j1,k,l1}]) × (y_{i,j2,k,l2} − E[y_{i,j2,k,l2}]) ]
  = E[ (γ_{i,k} + ε_{i,j1,k,l1}) × (γ_{i,k} + ε_{i,j2,k,l2}) ]
  = E[ γ_{i,k}² + γ_{i,k} × (ε_{i,j1,k,l1} + ε_{i,j2,k,l2}) + ε_{i,j1,k,l1} ε_{i,j2,k,l2} ]
  = E[γ_{i,k}²] + 0 + 0 = σw²
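The covariance calculation can be checked by simulation (a sketch; σw = σs = 1 are arbitrary choices, not estimates from the potato data):

```python
import random

# Two subplot responses that share a whole-plot effect gamma should have
# covariance sigma_w^2; check by Monte Carlo.
random.seed(0)
sigma_w, sigma_s, n = 1.0, 1.0, 100_000

y1, y2 = [], []
for _ in range(n):
    gamma = random.gauss(0, sigma_w)        # shared whole-plot effect
    y1.append(gamma + random.gauss(0, sigma_s))   # two subplots in the
    y2.append(gamma + random.gauss(0, sigma_s))   # same whole plot

m1, m2 = sum(y1) / n, sum(y2) / n
cov = sum((a - b1) * (b - b2) for a, b, b1, b2 in
          ((y1[i], y2[i], m1, m2) for i in range(n))) / n   # ~ sigma_w^2
```

The empirical covariance lands near σw², matching the derivation above.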

This and more complicated random-effects models can be fit using the lme command in R. To use this command, you need the nlme package:

library(nlme)
fit.me<-lme(fixed=y~type+sulfur, random=~1|as.factor(field))

> summary(fit.me)

Fixed effects: y ~ type + sulfur
              Value Std.Error DF   t-value p-value
(Intercept)  4.8850 0.2599650 11 18.790991  0.0000
typeB        0.6100 0.0791575 11  7.706153  0.0000
sulfurlow   -0.3675 0.3633602  2 -1.011393  0.4183

> anova(fit.me)
            numDF denDF  F-value p-value
(Intercept)     1    11 759.2946  <.0001
type            1    11  59.3848  <.0001
sulfur          1     2   1.0229  0.4183