The One-Factor Model · fisher.utstat.toronto.edu/~hadas/STA305/Lecture notes/week2.pdf · STA305 week2

STA305 week2 1

The One-Factor Model

• A statistical model is used to describe data. It is an equation that shows how the response variable depends on the levels of the treatment factors.

• Let Yij be a random variable that represents the response obtained on the j-th observation of the i-th treatment.

• Let μ denote the overall expected response.

• The expected response for an experimental unit in the i-th treatment group is μi = μ + τi

• τi is the deviation of the i-th treatment mean from the overall mean; it is referred to as the effect of treatment i.

STA305 week2 2

• The model is

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$

where εij is the deviation of the individual's response from the treatment group mean.

• εij is known as the random or experimental error.

STA305 week2 3

Fixed Effects versus Random Effects

• In some cases the treatments are specifically chosen by the experimenter from all possible treatments.

• The conclusions drawn from such an experiment apply only to these treatments and cannot be generalized to other treatments not included in the experiment.

• This is called a fixed effects model.

• In other cases, the treatments included in the experiment can be regarded as a random selection from the set of all possible treatments.

• In this situation, conclusions based on the experiment can be generalized to other treatments.

• When the treatments are a random sample, the treatment effects τi are random variables.

• This model is called a random effects model or a components of variance model.

• The random effects model will be studied after the fixed effects model.

STA305 week2 4

More about the Fixed Effects Model

• As specified on slide 2, the model is

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$

where the εij are i.i.d. with distribution N(0, σ²).

• It follows that the response of experimental unit j in treatment group i, Yij, is normally distributed with

$$E(Y_{ij}) = \mu + \tau_i, \qquad \operatorname{Var}(Y_{ij}) = \operatorname{Var}(\varepsilon_{ij}) = \sigma^2.$$

• In other words,

$$Y_{ij} \sim N(\mu + \tau_i,\ \sigma^2).$$
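To make the model concrete, the sketch below is a minimal Python illustration (standard library only) of how data arise under Yij = μ + τi + εij with εij i.i.d. N(0, σ²). The function name `simulate_one_factor` and all parameter values are hypothetical. The effects are chosen so that Σ ri τi = 0, which is the constraint used with the treatment effects in these notes.

```python
import random

def simulate_one_factor(mu, taus, reps, sigma, seed=0):
    """Simulate Y_ij = mu + tau_i + eps_ij with eps_ij i.i.d. N(0, sigma^2).

    taus[i] is the effect of treatment i and reps[i] is r_i, the number of
    observations in treatment group i. Returns one list of responses per group.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    groups = []
    for tau, r in zip(taus, reps):
        groups.append([mu + tau + rng.gauss(0.0, sigma) for _ in range(r)])
    return groups

# Hypothetical parameters: a = 3 groups, r_i = 5, and sum(r_i * tau_i) = 0.
data = simulate_one_factor(mu=75.0, taus=[3.0, -1.0, -2.0], reps=[5, 5, 5], sigma=4.0)
for i, g in enumerate(data, start=1):
    print(f"group {i}: " + ", ".join(f"{y:.1f}" for y in g))
```

With sigma = 0, every simulated response equals its treatment mean μ + τi, which gives a quick way to check the code.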

STA305 week2 5

Treatment Effects

• Recall that treatment effects have been defined as deviations from the overall mean, so the model can be parameterized so that

$$\sum_{i=1}^{a} r_i \tau_i = 0,$$

where ri is the number of observations (replications) in treatment group i.

• In the special case where r1 = r2 = · · · = ra = r, this condition reduces to

$$\sum_{i=1}^{a} \tau_i = 0.$$

• The hypothesis that there is no treatment effect can be expressed mathematically as:

H0 : μ1 = μ2 = · · · = μa
Ha : not all μi are equal

• This can be expressed equivalently in terms of the τi:

H0 : τ1 = τ2 = · · · = τa = 0
Ha : not all τi are equal to 0

STA305 week2 6

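'Dot' Notation

• "Dot" notation will be used to denote treatment and overall totals, as well as treatment and overall means.

• The sum of all observations in the i-th treatment group will be denoted as

$$Y_{i\cdot} = \sum_{j=1}^{r_i} Y_{ij} = Y_{i1} + Y_{i2} + \cdots + Y_{ir_i}.$$

• Similarly, the sum of all responses in all treatment groups is denoted:

$$Y_{\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{r_i} Y_{ij}.$$

• The treatment and overall means are:

$$\bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{r_i} = \frac{1}{r_i} \sum_{j=1}^{r_i} Y_{ij}, \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n} = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{r_i} Y_{ij},$$

where n = r1 + r2 + · · · + ra is the total number of observations.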
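As a quick check of the dot notation, the following Python sketch computes Yi·, Y··, Ȳi· and Ȳ·· for a small, made-up data set stored as one list per treatment group. The function name `dot_notation` is illustrative and does not come from the course.

```python
def dot_notation(groups):
    """Return (Y_i. totals, Y.., Ybar_i. means, Ybar..) for a list of groups.

    groups[i] holds the r_i responses Y_i1, ..., Y_ir_i of treatment i.
    """
    totals = [sum(g) for g in groups]                      # Y_i. = sum over j of Y_ij
    grand_total = sum(totals)                              # Y.. = sum over i and j of Y_ij
    means = [t / len(g) for t, g in zip(totals, groups)]   # Ybar_i. = Y_i. / r_i
    n = sum(len(g) for g in groups)                        # n = r_1 + ... + r_a
    grand_mean = grand_total / n                           # Ybar.. = Y.. / n
    return totals, grand_total, means, grand_mean

totals, grand_total, means, grand_mean = dot_notation([[1, 2, 3], [4, 5]])
print(totals, grand_total, means, grand_mean)  # [6, 9] 15 [2.0, 4.5] 3.0
```

In this unbalanced toy example Ȳ·· = 3.0, which is not the unweighted average of the group means, (2.0 + 4.5)/2 = 3.25. The overall mean weights each group mean by ri/n.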

Rationale for Analysis of Variance

• Consider all of the data from the a treatment groups as a whole.

• The variability in the data may come from two sources:
1) the treatment means differ from the overall mean; this is called between-group variability.
2) within a given treatment group, individual observations differ from the group mean; this is called within-group variability.

STA305 week2 7

Total Sum of Squares

• The total variation in the data set as a whole is measured by the total sum of squares, given by

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{r_i} \left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^2.$$

• Each deviation from the overall sample mean can be expressed as the sum of 2 parts:
1) the deviation of the observation from its group mean;
2) the deviation of the group mean from the overall mean.

• In other words…

• The SST can then be written as…

STA305 week2 8
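If you split each deviation this way and sum the squares, you get the standard one-way identity SST = SSTreat + SSE. Here SSTreat = Σ ri (Ȳi· − Ȳ··)² and SSE = ΣΣ (Yij − Ȳi·)². The cross-product term drops out because the deviations within each group sum to zero. The Python sketch below checks the identity numerically on hypothetical toy data; `sums_of_squares` is an illustrative name.

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_Treat, SS_E) for a one-factor layout."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Total: squared deviations of every observation from the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # Between groups: r_i times the squared deviation of group mean from overall mean.
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations of observations from their own group mean.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_total, ss_treat, ss_error

ss_t, ss_treat, ss_e = sums_of_squares([[1, 2, 3], [4, 5]])
print(ss_t, ss_treat, ss_e)  # 10.0 7.5 2.5
```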

Expected Sums of Squares

• Finding the expected value of the sums of squares for error and treatment will lead us to a test of the hypothesis of no treatment effect, i.e., H0 : τ1 = τ2 = · · · = τa = 0

• We start by finding the expected value of SSE….

• We continue with the expected value of SSTreat

STA305 week2 9

Mean Squares

• As shown in the calculation above, MSE = SSE/(n − a) is an unbiased estimator of σ².

• The MSE is called the mean square for error.

• The degrees of freedom associated with SSE are n − a, and it follows that E(MSE) = σ².

• The mean square for treatment is defined to be MSTreat = SSTreat/(a − 1).

• The expected value of MSTreat is

$$E(MS_{Treat}) = \sigma^2 + \frac{1}{a-1}\sum_{i=1}^{a} r_i \tau_i^2.$$

STA305 week2 10

Hypothesis Testing

• Recall that our goal is to test whether there is a treatment effect.

• The hypothesis of interest is

H0 : τ1 = τ2 = · · · = τa = 0
Ha : not all τi are equal to 0

• Notice that if H0 is true, then

$$E(MS_{Treat}) = \sigma^2 + \frac{1}{a-1}\sum_{i=1}^{a} r_i \tau_i^2 = \sigma^2 + \frac{1}{a-1}\sum_{i=1}^{a} r_i \cdot 0^2 = \sigma^2 = E(MS_E).$$

• On the other hand, if H0 is false, then at least one τi ≠ 0, in which case

$$\sum_{i=1}^{a} r_i \tau_i^2 > 0,$$

and so E(MSTreat) > E(MSE).

• On average, then, the ratio MSTreat/MSE should be close to 1 if H0 is true, and larger otherwise.

• We use this to develop a formal test.

STA305 week2 11

Cochran’s Theorem

• Let Z1, Z2, . . . , Zn be i.i.d. N(0, 1).

• Suppose that

$$\sum_{i=1}^{n} Z_i^2 = Q_1 + Q_2 + \cdots + Q_s,$$

where Qj has vj d.f.

• A necessary and sufficient condition for the Qj to be independent of one another, and for Qj ~ χ²(vj), is that

$$\sum_{j=1}^{s} v_j = n.$$

• Cochran's theorem implies that SSE/σ² and, when H0 is true, SSTreat/σ² have independent χ² distributions with n − a and a − 1 d.f., respectively.

• Recall: if X1 and X2 are two independent random variables with χ²(v1) and χ²(v2) distributions, then

$$\frac{X_1/v_1}{X_2/v_2} \sim F(v_1, v_2).$$

STA305 week2 12

Hypothesis Test for Treatment Effects

• Cochran's theorem and the result just stated provide the tools to construct a formal hypothesis test of no treatment effects.

• The hypotheses again are:

H0 : τ1 = τ2 = · · · = τa = 0
Ha : not all τi are equal to 0

• The test statistic is Fobs = MSTreat/MSE.

• Note that if H0 is true, then Fobs ~ F(a − 1, n − a).

• So the P-value = P(F(a − 1, n − a) > Fobs).

• We reject H0 in favor of Ha if P-value < α.

• Alternatively, reject H0 in favor of Ha if Fobs > Fα(a − 1, n − a), where Fα(a − 1, n − a) is the upper 100α% point of the F(a − 1, n − a) distribution.

STA305 week2 13

Analysis of Variance Table

STA305 week2 14

• The results of the calculations and the hypothesis test are best summarized in an analysis of variance (ANOVA) table.

• The ANOVA Table is given below
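Here is a minimal Python sketch of the usual one-way ANOVA table layout (source, d.f., SS, MS, F). It assumes the formulas for SSTreat, SSE, MSTreat, MSE and Fobs from the earlier slides. The function names and data are hypothetical, and no P-value is computed. Fobs would be compared with Fα(a − 1, n − a) from an F table. If SciPy is available, `scipy.stats.f.sf(F, a - 1, n - a)` returns P(F(a − 1, n − a) > Fobs).

```python
def one_way_anova(groups):
    """Return ANOVA table rows {source: (df, SS, MS, F)} for a one-factor design."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_treat = ss_treat / (a - 1)   # MS_Treat = SS_Treat / (a - 1)
    ms_error = ss_error / (n - a)   # MSE = SS_E / (n - a)
    f_obs = ms_treat / ms_error     # F_obs = MS_Treat / MSE
    return {
        "Treatment": (a - 1, ss_treat, ms_treat, f_obs),
        "Error": (n - a, ss_error, ms_error, None),
        "Total": (n - 1, ss_treat + ss_error, None, None),
    }

def print_table(table):
    """Print the rows in the conventional column order."""
    print(f"{'Source':<10}{'d.f.':>5}{'SS':>10}{'MS':>10}{'F':>8}")
    for source, (df, ss, ms, f) in table.items():
        ms_txt = "" if ms is None else f"{ms:.3f}"
        f_txt = "" if f is None else f"{f:.3f}"
        print(f"{source:<10}{df:>5}{ss:>10.3f}{ms_txt:>10}{f_txt:>8}")

table = one_way_anova([[1, 2, 3], [4, 5]])  # hypothetical toy data
print_table(table)
```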

Estimable Functions of Parameters

• A function of the model parameters is estimable if and only if it can be written as the expected value of a linear combination of the response variables.

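• In other words, every estimable function is of the form

$$E\left(\sum_{i=1}^{a} \sum_{j=1}^{r_i} c_{ij} Y_{ij}\right),$$

where the cij are constants.

• From the previous sections, it can be shown that μ and μi are estimable; σ² is not a linear function of the parameters, but the MSE is an unbiased estimator of it.

STA305 week2 15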

Example - Effectiveness of Three Methods for Teaching a Programming Language

• A study was conducted to determine whether there is any difference in the effectiveness of 3 methods of teaching a particular programming language.

• The factor levels (treatments) are the three teaching methods:
1) on-line tutorial
2) personal attention of instructor plus hands-on experience
3) personal attention of instructor, but no hands-on experience

• Replication and Randomization: 5 volunteers were randomly allocated to each of the 3 teaching methods, for a total of 15 study participants.

• Response Variable: After the programming instruction, a test was administered to determine how well the students had learned the programming language.

• Research Question: Do the data provide any evidence that the instruction methods differ with respect to test score?

• The data and the solutions are…

STA305 week2 16

Conducting an ANOVA in SAS

• There are several procedures in SAS that can be used to do an analysis of variance.

• PROC GLM (general linear model) will be used in this course.

• To do the analysis for the example on slide 16, start by creating a SAS dataset:

```sas
data teach ;
  input method score ;
  cards ;
1 73
1 77
.....
3 71
;
run ;
```

STA305 week2 17

• Use this dataset to conduct an ANOVA using the following SAS code:

```sas
proc glm data = teach ;
  class method ;
  model score = method / ss3 ;
run ;
quit ;
```

• The output produced by this procedure is given in the next slide.

STA305 week2 18

STA305 week2 19

Page 20: The One-Factor Modelfisher.utstat.toronto.edu/~hadas/STA305/Lecture notes/week2.pdf · STA305 week2 1 The One-Factor Model • Statistical model is used to describe data. It is an

Estimating Model Parameters

• The ANOVA indicates whether there is a treatment effect; however, it doesn't provide any information about individual treatments or how treatments compare with each other.

• To better understand the outcome of the experiment, it is useful to estimate the mean response for each treatment group.

• Also, it is useful to obtain an estimate of how much variability there is within each treatment group.

• This involves estimating model parameters.

STA305 week2 20

Variability

• Recall that on slides 9 and 10 we showed that the MSE is an unbiased estimator of σ².

• Further, Cochran's theorem was used to show that SSE/σ² ~ χ²(n − a).

• We can use this result to calculate a 100 × (1 − α)% confidence interval for σ².

• The CI is given by

$$\left( \frac{SS_E}{\chi^2_{1-\alpha/2}(n-a)},\ \frac{SS_E}{\chi^2_{\alpha/2}(n-a)} \right),$$

where $\chi^2_{1-\alpha/2}(n-a)$ and $\chi^2_{\alpha/2}(n-a)$ are the upper and lower percentage points of the χ² distribution with n − a d.f., respectively.

STA305 week2 21
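Once the two χ² percentage points are known, the interval is a direct calculation. This hypothetical Python helper takes them as arguments, looked up from a table or from other software. It follows the convention on this slide, where $\chi^2_{1-\alpha/2}(n-a)$ is the larger, upper point. SSE = 30 and n − a = 12 are invented values; 23.337 and 4.404 are the standard table values of the 0.975 and 0.025 quantiles for 12 d.f.

```python
def sigma2_ci(ss_error, chi2_upper, chi2_lower):
    """100(1 - alpha)% CI for sigma^2, using SS_E / sigma^2 ~ chi^2(n - a).

    chi2_upper: upper percentage point chi^2_{1-alpha/2}(n - a) (the larger value)
    chi2_lower: lower percentage point chi^2_{alpha/2}(n - a) (the smaller value)
    """
    if not 0 < chi2_lower < chi2_upper:
        raise ValueError("need 0 < chi2_lower < chi2_upper")
    # Dividing by the larger point gives the lower endpoint, and vice versa.
    return ss_error / chi2_upper, ss_error / chi2_lower

# Hypothetical: SS_E = 30 with n - a = 12 and alpha = 0.05.
low, high = sigma2_ci(30.0, chi2_upper=23.337, chi2_lower=4.404)
print(f"95% CI for sigma^2: ({low:.3f}, {high:.3f})")
```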

Overall Mean

• As discussed at the beginning, the overall expected value is μ.

• Show that Ȳ·· is an unbiased estimator of μ…

• The variance of Ȳ·· is σ²/n.

• So a 100 × (1 − α)% confidence interval for μ is:

$$\bar{Y}_{\cdot\cdot} \pm t_{\alpha/2}(n-a)\sqrt{\frac{MS_E}{n}}$$

• Further, a 100 × (1 − α)% confidence interval for μi is:

$$\bar{Y}_{i\cdot} \pm t_{\alpha/2}(n-a)\sqrt{\frac{MS_E}{r_i}}$$

• It follows that Ȳi· − Ȳ·· is an unbiased estimator of the effect of treatment i, τi.

STA305 week2 22
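Both intervals on this slide have the form estimate ± tα/2(n − a) × sqrt(MSE / count), with count = n for μ and count = ri for μi. The Python helper below is a hypothetical sketch. It takes the t percentage point as an input rather than computing it (2.179 is the standard table value of t0.025(12)), and the example numbers are illustrative only.

```python
import math

def mean_ci(ybar, ms_error, count, t_crit):
    """Ybar +/- t_{alpha/2}(n - a) * sqrt(MSE / count).

    Use ybar = Ybar.. with count = n for mu, or ybar = Ybar_i. with
    count = r_i for mu_i. t_crit is t_{alpha/2}(n - a), looked up separately.
    """
    half_width = t_crit * math.sqrt(ms_error / count)
    return ybar - half_width, ybar + half_width

# Hypothetical numbers: Ybar_i. = 78.0, MSE = 20.0, r_i = 5, n - a = 12.
print(mean_ci(78.0, 20.0, 5, 2.179))
```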

Differences between Treatment Groups

• Differences between specific treatment groups will be important from the researcher's point of view.

• The expected difference in response between treatment groups i and j is μi − μj = τi − τj.

• Since the treatment groups are independent of each other, it follows that

$$\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot} \sim N\!\left(\tau_i - \tau_j,\ \sigma^2\left(\frac{1}{r_i} + \frac{1}{r_j}\right)\right).$$

• Therefore, a 100 × (1 − α)% confidence interval for τi − τj is:

$$\left(\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}\right) \pm t_{\alpha/2}(n-a)\sqrt{MS_E\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$

STA305 week2 23
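Similarly, here is a hypothetical Python helper for the interval for τi − τj. The MSE, the replications and tα/2(n − a) are all inputs, and the example values are invented.

```python
import math

def diff_ci(ybar_i, ybar_j, ms_error, r_i, r_j, t_crit):
    """CI for tau_i - tau_j: (Ybar_i. - Ybar_j.) +/- t * sqrt(MSE * (1/r_i + 1/r_j))."""
    estimate = ybar_i - ybar_j
    half_width = t_crit * math.sqrt(ms_error * (1.0 / r_i + 1.0 / r_j))
    return estimate - half_width, estimate + half_width

# Hypothetical group means, MSE, replications and t_{0.025}(12) = 2.179.
print(diff_ci(82.0, 75.0, 20.0, 5, 5, 2.179))
```

If the interval excludes 0, that single comparison indicates a difference between the two treatments at level α.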

Example - Methods for Teaching Programming Language Cont’d

• Back to the example of three teaching methods and their effect on programming test score.

• Based on the ANOVA developed earlier, we found a significant difference among the three methods.

• Which method had the highest average?

• What is a 95% CI for mean difference in test scores for the 2 instructor-based methods?

STA305 week2 24

Comparisons Among Treatment Means

• As mentioned above, the ANOVA indicates whether there is a significant overall effect of treatments, but it does not indicate which treatments are significantly different from each other.

• There are a number of methods available for making pairwise comparisons of treatment means.

STA305 week2 25

Least Significant Difference (LSD)

• This method tests the hypothesis that all treatment pairs have the same mean against the alternative that at least one pair differs; that is, the hypotheses are:

H0 : μi − μj = 0 for all i, j
Ha : μi − μj ≠ 0 for at least one pair i, j

• In testing the difference between any two specific means, reject the null hypothesis if:

$$\left|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}\right| > t_{\alpha/2}(n-a)\sqrt{MS_E\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$

• In the case where the design is balanced and ri = r for all i, the condition above becomes:

$$\left|\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}\right| > t_{\alpha/2}(n-a)\sqrt{\frac{2\,MS_E}{r}}$$

STA305 week2 26

• In other words, the smallest difference between the means that would be considered statistically significant is:

$$LSD = t_{\alpha/2}(n-a)\sqrt{\frac{2\,MS_E}{r}}$$

• This quantity, LSD, is called the least significant difference.

• The LSD method requires that the difference between each pair of means be compared to the LSD.

• In cases where the difference is greater than the LSD, we conclude that the treatment means differ.

STA305 week2 27
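In a balanced design, the LSD procedure comes down to computing one threshold and comparing every pair of treatment means with it. The sketch below is an illustrative Python version. The function name and inputs are hypothetical, and tα/2(n − a) must be supplied. It shares the multiple-testing issue discussed on the next slide.

```python
import math
from itertools import combinations

def lsd_comparisons(means, ms_error, r, t_crit):
    """Balanced-design LSD: flag pairs whose mean difference exceeds the LSD.

    means[k] is the sample mean of treatment k + 1; r is the common number of
    replicates; t_crit is t_{alpha/2}(n - a), looked up separately.
    Returns (LSD, list of (i, j, |difference|, significant)) with 1-based labels.
    """
    lsd = t_crit * math.sqrt(2.0 * ms_error / r)
    results = []
    for i, j in combinations(range(len(means)), 2):
        diff = abs(means[i] - means[j])
        results.append((i + 1, j + 1, diff, diff > lsd))
    return lsd, results

lsd, results = lsd_comparisons([10.0, 12.0, 15.0], ms_error=2.0, r=4, t_crit=2.0)
print(f"LSD = {lsd:.3f}")
for i, j, diff, sig in results:
    verdict = "differ" if sig else "no evidence of a difference"
    print(f"groups {i} vs {j}: |diff| = {diff:.2f}, {verdict}")
```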

Important Notes

• As in any situation where a large number of significance tests is conducted, the possibility of finding a large difference due to chance alone increases.

• Therefore, when the number of treatment groups is large, the probability of making this type of error is relatively large.

• In other words, the probability of committing a Type I error will be increased above α.

• Further, although the ANOVA F-test might find a significant treatment effect, the LSD method might conclude that no 2 treatment means are significantly different from each other.

• This is because the ANOVA F-test considers the overall effect of treatment on the outcome and is not restricted to pairwise comparisons.

STA305 week2 28
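The sketch below shows how quickly the number of comparisons grows. It counts the a(a − 1)/2 pairs and evaluates 1 − (1 − α)^m, which would be the probability of at least one Type I error among m independent tests, each at level α. Pairwise tests share the MSE and the group means, so they are not independent, and this figure is only a rough illustration. The last column is the per-test level α/m used by the Bonferroni approach listed on the next slide, which keeps the overall Type I error rate at or below α. Function names are hypothetical.

```python
def n_pairwise(a):
    """Number of distinct pairs among a treatment means: a(a - 1)/2."""
    return a * (a - 1) // 2

def fwer_if_independent(alpha, m):
    """P(at least one Type I error) in m tests at level alpha, IF the tests were independent."""
    return 1.0 - (1.0 - alpha) ** m

alpha = 0.05
for a in (3, 5, 10):
    m = n_pairwise(a)
    print(f"a = {a:2d}: {m:2d} pairs, familywise error ~ {fwer_if_independent(alpha, m):.3f}, "
          f"Bonferroni per-test level = {alpha / m:.4f}")
```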

Other Methods for Pairwise Comparisons

• Other methods for conducting pairwise comparisons are available.

• The methods that are implemented in PROC GLM in SAS include:
– Bonferroni
– Duncan's Multiple Range Test
– Dunnett's procedure
– Scheffé's method
– Tukey's test
– several others

• Chapter 4 of Dean & Voss discusses some of these methods.

STA305 week2 29

Pairwise Comparisons in SAS

• Pairwise comparisons can be requested by including a means statement.

• The code below requests means with LSD comparisons:

```sas
proc glm data = teach ;
  class method ;
  model score = method / ss3 ;
  means method / lsd cldiff ;
run ;
quit ;
```

• The part of the output containing the pairwise comparisons is shown in the next slide.

STA305 week2 30

STA305 week2 31

STA305 week2 32

Contrasts

• The ANOVA F-test indicates only whether there is an overall tendency for the treatment means to differ; it does not indicate specifically which treatments are the same, which are different, etc.

• In the last few slides, we looked at pairwise comparisons between treatment means.

• However, the comparisons that are of interest to the researcher may involve more than just two groups; they can be linear combinations of means.

STA305 week2 33

Example - Does Food Decrease Effectiveness of Pain Killers?

• Researchers at a pain clinic want to know whether the effectiveness of two leading painkillers is the same when taken on an empty stomach as when taken with food.

• A study with four treatment groups was designed:
1. aspirin with no food
2. aspirin with food
3. Tylenol with no food
4. Tylenol with food

• In addition to determining whether there is a difference between the four treatment groups, researchers want to determine whether there is a difference between taking medication with food and taking it without.

• This second hypothesis can be expressed statistically as:

H0 : μ1 + μ3 = μ2 + μ4
Ha : μ1 + μ3 ≠ μ2 + μ4

STA305 week2 34

• The point estimate of the difference between the fed and not-fed conditions is based on the sample means:

$$\left(\bar{Y}_{1\cdot} + \bar{Y}_{3\cdot}\right) - \left(\bar{Y}_{2\cdot} + \bar{Y}_{4\cdot}\right)$$

STA305 week2 35

Hypothesis Tests Using Contrasts

• As in the example on the previous slide, the comparison of treatment means that is of interest might be a linear combination of means. That is, the hypothesis of interest would be of the form

H0 : c1μ1 + c2μ2 + · · · + caμa = 0
Ha : c1μ1 + c2μ2 + · · · + caμa ≠ 0

• The ci are constants subject to the constraints: (i) ci ≠ 0 for at least one i, and (ii)

$$\sum_{i=1}^{a} c_i = 0.$$

• A test of this hypothesis can be constructed using the sample means for each treatment group.

• The linear combination c1μ1 + c2μ2 + · · · + caμa is called a contrast.

STA305 week2 36

• If the assumptions of the model are satisfied, then:

$$\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot} \sim N\!\left(\sum_{i=1}^{a} c_i \mu_i,\ \sigma^2 \sum_{i=1}^{a} \frac{c_i^2}{r_i}\right)$$

• If σ² were known, a test of H0 could be done using:

$$Z = \frac{\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}}{\sigma\sqrt{\sum_{i=1}^{a} c_i^2/r_i}},$$

which has a N(0, 1) distribution when H0 is true.

• Since σ² is unknown, we use its unbiased estimate, the MSE, and conduct a t-test with n − a d.f. The test statistic is

$$t_{obs} = \frac{\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}}{\sqrt{MS_E \sum_{i=1}^{a} c_i^2/r_i}}.$$

• Recall: if X is a random variable with a t(v) distribution, then X² has an F(1, v) distribution.

STA305 week2 37

• So an equivalent test statistic is:

$$F_{obs} = t_{obs}^2 = \frac{\left(\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}\right)^2}{MS_E \sum_{i=1}^{a} c_i^2/r_i}.$$

• At level α, reject H0 in favor of Ha if Fobs > Fα(1, n − a), or equivalently if |tobs| > tα/2(n − a).

• The sum of squares for a contrast is:

$$SS_{contrast} = \frac{\left(\sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}\right)^2}{\sum_{i=1}^{a} c_i^2/r_i}.$$

• Each contrast has 1 d.f., so the mean square for the contrast is MScontrast = SScontrast/1.

STA305 week2 38
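The contrast formulas translate directly into code. This Python sketch computes Σ ci Ȳi·, SScontrast, tobs and Fobs. It uses the painkiller contrast c = (1, −1, 1, −1), which corresponds to H0 : μ1 + μ3 = μ2 + μ4. The function name, group means and MSE are hypothetical, and critical values are not computed.

```python
import math

def contrast_test(coeffs, means, reps, ms_error):
    """Estimate, SS, t and F for the contrast sum(c_i mu_i)."""
    if not math.isclose(sum(coeffs), 0.0, abs_tol=1e-12):
        raise ValueError("coefficients of a contrast must sum to 0")
    estimate = sum(c * m for c, m in zip(coeffs, means))   # sum c_i Ybar_i.
    weight = sum(c * c / r for c, r in zip(coeffs, reps))  # sum c_i^2 / r_i
    t_obs = estimate / math.sqrt(ms_error * weight)        # t with n - a d.f.
    ss_contrast = estimate ** 2 / weight                   # SS_contrast, 1 d.f.
    f_obs = ss_contrast / ms_error                         # MS_contrast / MSE = t_obs^2
    return estimate, ss_contrast, t_obs, f_obs

# Hypothetical means for groups 1-4 (no food / food alternating), r_i = 4, MSE = 2.0.
est, ss, t, f = contrast_test([1, -1, 1, -1], [10.0, 8.0, 12.0, 9.0], [4, 4, 4, 4], ms_error=2.0)
print(f"estimate = {est}, SS = {ss}, t = {t:.3f}, F = {f:.3f}")
```

Fobs is then compared with Fα(1, n − a), or |tobs| with tα/2(n − a).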

Summary

• The hypotheses:

H0 : c1μ1 + c2μ2 + · · · + caμa = 0
Ha : c1μ1 + c2μ2 + · · · + caμa ≠ 0

• Test statistic:

$$F_{obs} = \frac{MS_{contrast}}{MS_E}$$

• Decision rule: reject H0 if Fobs > Fα(1, n − a).

STA305 week2 39

Orthogonal Contrasts

• Very often more than one contrast will be of interest. Further, one research question may require more than one contrast, e.g., H0 : μ1 = μ3 and μ2 = μ4.

• Ideally, we want tests about different contrasts to be independent of each other.

• Suppose that the two contrasts of interest are c1μ1 + c2μ2 + · · · + caμa and d1μ1 + d2μ2 + · · · + daμa.

• These two contrasts are orthogonal to each other iff they satisfy

$$\sum_{i=1}^{a} \frac{c_i d_i}{r_i} = 0,$$

which, when all ri are equal, reduces to

$$\sum_{i=1}^{a} c_i d_i = 0.$$

• If there are a treatments, then SSTreat can be decomposed into a set of a − 1 orthogonal contrasts, each with 1 d.f., as follows:

SSTreat = SScontrast1 + SScontrast2 + · · · + SScontrasta−1.

• Unless a = 2, there will be more than one set of orthogonal contrasts.

STA305 week2 40
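Here is a small Python check of orthogonality and of the decomposition of SSTreat. It uses invented balanced data with a = 3 and the contrasts (1, −1, 0) and (1, 1, −2). For contrasts in the treatment means, the general orthogonality condition is Σ ci di / ri = 0, which makes the contrast estimates uncorrelated. With equal ri this is the same as Σ ci di = 0. The last line shows that the same coefficient vectors need not be orthogonal when the ri differ. Function names are hypothetical.

```python
import math

def are_orthogonal(c, d, reps):
    """Contrasts in means are orthogonal iff sum(c_i * d_i / r_i) == 0."""
    total = sum(ci * di / r for ci, di, r in zip(c, d, reps))
    return math.isclose(total, 0.0, abs_tol=1e-12)

def ss_contrast(c, means, reps):
    """SS for contrast c: (sum c_i Ybar_i.)^2 / sum(c_i^2 / r_i)."""
    est = sum(ci * m for ci, m in zip(c, means))
    return est ** 2 / sum(ci * ci / r for ci, r in zip(c, reps))

groups = [[1, 3], [4, 6], [8, 10]]          # hypothetical balanced data, r_i = 2
reps = [len(g) for g in groups]
means = [sum(g) / len(g) for g in groups]
grand = sum(sum(g) for g in groups) / sum(reps)
ss_treat = sum(r * (m - grand) ** 2 for r, m in zip(reps, means))

c1, c2 = [1, -1, 0], [1, 1, -2]             # a - 1 = 2 contrasts
print(are_orthogonal(c1, c2, reps))         # True
# The two single-d.f. sums of squares add up to SS_Treat.
print(ss_contrast(c1, means, reps) + ss_contrast(c2, means, reps), ss_treat)
print(are_orthogonal(c1, c2, [2, 3, 3]))    # False: unequal r_i change the condition
```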

Example - Food / Pain Killers Continued

• Refer back to the example on slide 34. The study was designed with 4 treatment groups.

• The treatment sum of squares can be decomposed into 3 orthogonal contrasts.

• Since the researchers are interested in the difference between the fed and unfed conditions, it makes sense to use the following contrasts:

STA305 week2 41

• Exercise: verify that each is in fact a contrast.

• Exercise: verify that the contrasts are orthogonal.

• Note that there is more than one way to decompose the treatment sum of squares into a set of orthogonal contrasts.

• For example, instead of comparing aspirin and Tylenol, one might be interested in comparing food with no food.

• In this case, compare (i) aspirin with food and Tylenol with food, (ii) aspirin without food and Tylenol without food, and (iii) the 2 food groups to the 2 no-food groups.

STA305 week2 42

ANOVA Table for Orthogonal Contrasts

• Contrasts to be used in an experiment must be chosen at the beginning of the study.

• The hypotheses to be tested should not be selected after viewing the data.

• Once the treatment SS has been decomposed using preplanned orthogonal contrasts, the ANOVA table can be expanded to show the decomposition, as shown on the next slide.

STA305 week2 43

STA305 week2 44

Example - Pressure on a Torsion Spring

STA305 week2 45

• The figure above shows a diagram of a torsion spring.

• Pressure is applied to the arms to close the spring.

• A study has been designed to examine the pressure on a torsion spring.

• Five different angles between the arms of the spring will be studied to determine their impact on the pressure: 67°, 71°, 75°, 79°, and 83°.

• Researchers are interested in whether there is an overall difference among the different angle settings.

• In addition, they would like to study a set of orthogonal contrasts that compares the 2 smallest angles to each other and the 2 largest angles to each other.

• The data collected are shown in the following slide.

STA305 week2 46

Torsion Spring Data

STA305 week2 47

Solution

STA305 week2 48

Contrasts in SAS

• To do the analysis for the last example, start by creating a SAS dataset:

```sas
data torsion ;
  input angle pressure ;
  cards ;
67 83
67 85
71 87
71 84
...........
79 90
83 90
83 92
;
run ;
```

STA305 week2 49

• Here is the additional code that is required to specify the contrasts of interest:

```sas
proc glm data = torsion ;
  class angle ;
  model pressure = angle / ss3 ;
  /* coefficients follow the sorted angle levels 67, 71, 75, 79, 83 */
  contrast '67-71'      angle 1 -1  0  0  0 ;
  contrast '79-83'      angle 0  0  0  1 -1 ;
  contrast 'sm vs lg'   angle 1  1  0 -1 -1 ;
  contrast 'mid vs oth' angle 1  1 -4  1  1 ;
run ;
quit ;
```

STA305 week2 50
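As a sanity check on the CONTRAST statements, the Python snippet below verifies two things: each coefficient vector sums to zero, and every pair satisfies Σ ci di = 0. It assumes the coefficients follow the sorted angle levels 67°, 71°, 75°, 79°, 83°. This is the equal-replication form of orthogonality. The data listing is abbreviated, so it does not show whether every angle has the same number of observations. With unequal ri, the weighted condition Σ ci di / ri = 0 would be the relevant one.

```python
from itertools import combinations

# Coefficients copied from the CONTRAST statements (angle order 67, 71, 75, 79, 83).
contrasts = {
    "67-71": [1, -1, 0, 0, 0],
    "79-83": [0, 0, 0, 1, -1],
    "sm vs lg": [1, 1, 0, -1, -1],
    "mid vs oth": [1, 1, -4, 1, 1],
}

# Each vector must sum to zero to be a contrast.
for name, c in contrasts.items():
    print(f"{name}: sum of coefficients = {sum(c)}")

# Equal-replication orthogonality check: sum(c_i * d_i) == 0 for every pair.
for (n1, c), (n2, d) in combinations(contrasts.items(), 2):
    dot = sum(ci * di for ci, di in zip(c, d))
    print(f"{n1} . {n2} = {dot}")
```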

• The ANOVA part of the output is not shown here.

• The part of the output generated by the contrast statements looks like this:

| Contrast | DF | Contrast SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| 67-71 | 1 | 3.37500000 | 3.37500000 | 2.92 | 0.1031 |
| 79-83 | 1 | 1.33333333 | 1.33333333 | 1.15 | 0.2958 |
| sm vs lg | 1 | 93.35294118 | 93.35294118 | 80.70 | <0.0001 |
| mid vs oth | 1 | 0.20796354 | 0.20796354 | 0.18 | 0.6761 |

STA305 week2 51