introduction to anova (single factor) part 1 2019 · 5 diatoms & heavy metals • effect of...


Post on 02-Jun-2020


Page 1: Introduction to ANOVA (single Factor) Part 1 2019 · 5 Diatoms & heavy metals • Effect of heavy metals on species diversity of diatoms in streams in Colorado • Response variable:


Reduced slides

Introduction to Analysis of Variance (ANOVA) – Part 1

Single factor


The logic of Analysis of Variance

• Is the variance explained by the model much greater than (>>) the residual variance?

• In regression models:
– Variance explained by the regression model vs. unexplained variance

• In ANOVA models:
– Is the variance explained by the factors >> the unexplained variance?
– In common language: is the variability among treatments greater than the variability within treatments?

ANOVA vs regression

• One factor ANOVA:
– 1 continuous response variable and 1 categorical predictor variable (factor)

• Compare with regression:
– 1 continuous response variable and 1 continuous predictor variable


Aims

• Measure relative contribution of different sources of variation (factors or combination of factors) to total variation in response variable

• Test hypotheses about group (treatment) population means for response variable

Data layout

Factor level (group)    1              2              …    i
Replicates              $y_{11}$       $y_{21}$       …    $y_{i1}$
                        $y_{1j}$       $y_{2j}$       …    $y_{ij}$
                        …              …              …    …
                        $y_{1n}$       $y_{2n}$       …    $y_{in}$
Population means        $\mu_1$        $\mu_2$        …    $\mu_i$
Sample means            $\bar{y}_1$    $\bar{y}_2$    …    $\bar{y}_i$

Grand mean $\bar{y}$ estimates $\mu$


Types of predictors (factors)

• Fixed factor:
– all levels or groups of interest are used in study
– conclusions are restricted to those groups

• Random factor:
– random sample of all groups of interest is used in study
– typically individual groups are not of interest
– conclusions extrapolate to all possible groups

Linear model

Linear model for 1 factor ANOVA:

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

where

$\mu$ = overall population mean

$\alpha_i$ = effect of ith treatment or group ($\mu_i - \mu$)

$\varepsilon_{ij}$ = random or unexplained error (variation not explained by treatment effects)
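A minimal sketch of this linear model, using sample estimates in place of the population parameters. The data and the names `groups`, `grand_mean`, `effects` and `residuals` are hypothetical and are not taken from the slides; a balanced design is assumed.

```python
from statistics import mean

# Hypothetical data: 3 groups with 4 replicates each (balanced design)
groups = {
    "A": [12.0, 14.0, 13.0, 13.0],
    "B": [16.0, 15.0, 17.0, 16.0],
    "C": [10.0, 11.0, 9.0, 10.0],
}

# Grand mean estimates mu (with equal n, the mean of all observations)
grand_mean = mean(y for ys in groups.values() for y in ys)

# Estimated effect of each group: alpha_i = group mean - grand mean
effects = {g: mean(ys) - grand_mean for g, ys in groups.items()}

# Residuals: e_ij = y_ij - (grand mean + alpha_i)
residuals = {
    g: [y - (grand_mean + effects[g]) for y in ys] for g, ys in groups.items()
}

# Each observation is rebuilt as mu + alpha_i + e_ij
rebuilt = {
    g: [grand_mean + effects[g] + e for e in residuals[g]] for g in groups
}
print(grand_mean, effects)
```

With equal replication the estimated effects sum to zero, which is why they can be read as deviations of each group mean above or below the grand mean.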


Diatoms & heavy metals

• Effect of heavy metals on species diversity of diatoms in streams in Colorado

• Response variable:
– species diversity of diatoms

• Predictor variable:
– heavy metal level
– categorical with 4 groups (background, low, medium, high)

• Replicates are “stations”

Null hypothesis

• $H_0$: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu$

• No difference between population group (treatment) means

• Mean species diversity of diatoms is the same for the 4 heavy metal levels


H0 - fixed factor

• No effects of specific groups (treatments)

$H_0$: $\alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = 0$, where $\alpha_i = \mu_i - \mu$

• No effect of 4 heavy metal levels on diatom species diversity

Inference is only to these 4 heavy metal levels

Streams and diatoms

Does diatom diversity vary by stream?


H0 - random factor

• No variation among means of all possible groups (treatments)

$H_0$: $\sigma_A^2 = 0$

$H_0$: $\sum_{i=1}^{N} (\bar{y}_i - \mu)^2 / (N - 1) = 0$

where groups i = 1 to N (streams) are chosen randomly

• Test: No variation in diatom species diversity between randomly chosen streams

Inference is to all streams (within ??? Region), sampled by N streams

Partitioning variation

• Variation in response variable partitioned into:
– variation explained by difference among groups (or treatments)
– variation not explained (residual variation, within group)


Regression: Analysis of variance in Y

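$\sum (y_i - \bar{y})^2$: Total variation (Sum of Squares) in Y

$\sum (\hat{y}_i - \bar{y})^2$: Variation in Y explained by regression (SSRegression)

$\sum (y_i - \hat{y}_i)^2$: Variation in Y unexplained by regression (SSResidual)

[Figure: scatterplot of Y against X with the least squares regression line; for one observation $(x_i, y_i)$ the deviations $(y_i - \bar{y})$, $(\hat{y}_i - \bar{y})$ and $(y_i - \hat{y}_i)$ are marked]

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$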
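The regression partition can be checked numerically. The sketch below fits a least squares line to hypothetical (x, y) values, which are not from the slides, and confirms that the total sum of squares equals the regression plus residual sums of squares.

```python
from statistics import mean

# Hypothetical (x, y) data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]  # fitted values

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# The two printed values agree up to floating point rounding
print(ss_total, ss_regression + ss_residual)
```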


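Partitioning the Variance

[Figure: observations $y_{ij}$ plotted by Group (1, 2, 3), four replicates per group ($y_{11}$ to $y_{34}$); vertical axis y]

Partitioning the Variance

[Figure: the same plot with the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and the grand mean $\bar{y}$ marked; observations $y_{21}$ to $y_{24}$ of group 2 labelled]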


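Partitioning the Variance: Sum of squares

[Figure: plot of y by Group (1, 2, 3) with group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and grand mean $\bar{y}$; for observation $y_{21}$ the deviation from the grand mean is split into a within-groups and a between-groups part]

$(y_{ij} - \bar{y}) = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$

Within Groups: $(y_{ij} - \bar{y}_i)$

Between Groups: $(\bar{y}_i - \bar{y})$

Partitioning the Variance: Sum of squares

[Figure: the same plot with observations $y_{21}$ to $y_{24}$ labelled, highlighting the within-group deviations]

$\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$

Within Group (unexplained): $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$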


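Partitioning the Variance: Sum of squares

[Figure: plot of y by Group (1, 2, 3) with group means and grand mean, highlighting the between-group deviations $(\bar{y}_i - \bar{y})$]

$\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$

Between Groups (explained): $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$, with n = 4 in this example

ANOVA

SS Total = SS Between groups + SS Within groups (Residual)

$\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$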


Mean squares

• Average sum-of-squared deviations

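• Degrees of freedom:
– number of components minus 1
– df total [pn-1] = df groups [p-1] + df residual [p(n-1)]

• Mean square is a variance:
– SS divided by df

ANOVA table

Source      SS                                                          df        MS
Groups      $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$                  p-1       $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
Residual    $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$      p(n-1)    $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$
Total       $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2$        pn-1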


Treatments (= groups) explain nothing, i.e. SSGroups equals zero

Replicate    Group1    Group2    Group3    Group4
1            16.0      15.0      16.0      17.0
2            15.0      17.0      16.0      16.0
3            17.0      16.0      17.0      15.0
4            16.0      16.0      15.0      16.0
Mean         16.0      16.0      16.0      16.0

Grand mean = 16.0

Treatments (= groups) explain everything, i.e. SSResidual equals zero

Replicate    Group1    Group2    Group3    Group4
1            19.5      15.0      16.5      13.0
2            19.5      15.0      16.5      13.0
3            19.5      15.0      16.5      13.0
4            19.5      15.0      16.5      13.0
Mean         19.5      15.0      16.5      13.0

Grand mean = 16.0
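The two extreme cases can be verified directly from the tabled values. This is a minimal sketch; the helper name `sums_of_squares` is illustrative, and it assumes equal replication in every group.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SS total, SS groups, SS residual) for equal-sized groups."""
    n = len(groups[0])  # replicates per group (balanced design assumed)
    grand = mean(y for g in groups for y in g)
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)
    ss_groups = n * sum((mean(g) - grand) ** 2 for g in groups)
    ss_residual = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_groups, ss_residual

# Columns (Group1 to Group4) of the first table
explain_nothing = [
    [16.0, 15.0, 17.0, 16.0],
    [15.0, 17.0, 16.0, 16.0],
    [16.0, 16.0, 17.0, 15.0],
    [17.0, 16.0, 15.0, 16.0],
]
# Columns (Group1 to Group4) of the second table
explain_everything = [[19.5] * 4, [15.0] * 4, [16.5] * 4, [13.0] * 4]

print(sums_of_squares(explain_nothing))     # (8.0, 0.0, 8.0)
print(sums_of_squares(explain_everything))  # (90.0, 90.0, 0.0)
```

In the first case all of SS Total sits in the residual; in the second all of it sits between groups.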


Testing ANOVA H0

• All population group means the same: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu_a = \mu$

• Fixed factor: $H_0$: $\alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = 0$
– Means that there is no variability across a fixed set of group means (limited inference)

• Random factor (A): $H_0$: $\sigma_A^2 = 0$
– Means that there is no variability across all possible group means (broad inference)

Remember: Linear model for 1 factor ANOVA:

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ and $\alpha_i = \mu_i - \mu$, where $\alpha_i$ can be positive or negative

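ANOVA table

Source      SS                                                          df        MS                                                                        F
Groups      $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$                  p-1       $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$                   MSGroups / MSResidual
Residual    $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$      p(n-1)    $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$
Total       $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2$        pn-1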
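The ANOVA table can be assembled in a few lines. This is a sketch under stated assumptions: the function name `one_way_anova` and the data are hypothetical, the design is balanced, and MSResidual is non-zero (otherwise the F-ratio is undefined).

```python
from statistics import mean

def one_way_anova(groups):
    """ANOVA table for a balanced single factor design (p groups, n replicates)."""
    p, n = len(groups), len(groups[0])
    grand = mean(y for g in groups for y in g)
    ss_groups = n * sum((mean(g) - grand) ** 2 for g in groups)
    ss_residual = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_groups, df_residual = p - 1, p * (n - 1)
    ms_groups = ss_groups / df_groups        # mean square = SS / df
    ms_residual = ss_residual / df_residual
    return {
        "Groups": (ss_groups, df_groups, ms_groups),
        "Residual": (ss_residual, df_residual, ms_residual),
        "Total": (ss_groups + ss_residual, p * n - 1),
        "F": ms_groups / ms_residual,  # fails if MSResidual is zero
    }

# Hypothetical data: 3 groups, 4 replicates each
data = [
    [12.0, 14.0, 13.0, 13.0],
    [16.0, 15.0, 17.0, 16.0],
    [10.0, 11.0, 9.0, 10.0],
]
table = one_way_anova(data)
for source, row in table.items():
    print(source, row)
```

For these values SS Groups is 72 on 2 df and SS Residual is 6 on 9 df, so the F-ratio is 36 / (6/9) = 54.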


F-ratio statistic

• F-ratio statistic is ratio of 2 sample variances (i.e. 2 mean squares)

• Probability distribution of F-ratio known:
– different distributions depending on df of 2 variances

• If homogeneity of variances holds, F-ratio follows F distribution

F distribution – a null distribution

[Figure: probability density P(F) against F (0 to 5) for the F distribution with 3, 24 df]
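The idea of a null distribution can be illustrated by simulation. The sketch below assumes 4 groups of 7 replicates (giving 3 and 24 df), normally distributed data with a common mean and variance, and an arbitrary seed; none of these settings come from the slides.

```python
import random

def f_ratio(groups):
    """F-ratio = MSGroups / MSResidual for a balanced single factor design."""
    p, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / p  # equals the overall mean when n is equal
    ms_groups = n * sum((m - grand) ** 2 for m in means) / (p - 1)
    ms_residual = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (p * (n - 1))
    return ms_groups / ms_residual

random.seed(1)  # fixed seed so the sketch is repeatable
p, n = 4, 7     # 4 groups, 7 replicates: df = 3 and 24
null_f = []
for _ in range(2000):
    # H0 true: every group is drawn from the same normal population
    groups = [[random.gauss(16.0, 1.0) for _ in range(n)] for _ in range(p)]
    null_f.append(f_ratio(groups))

mean_f = sum(null_f) / len(null_f)
prop_above_3 = sum(f > 3.0 for f in null_f) / len(null_f)
print(round(mean_f, 2), round(prop_above_3, 3))
```

When H0 is true the simulated F-ratios average close to 1, and only a small proportion (roughly 5% for 3, 24 df) exceed about 3.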


Expected mean squares

• If factor is fixed and homogeneity of variance assumption holds:
– MSGroups estimates $\sigma^2 + \dfrac{n \sum_{i=1}^{p} \alpha_i^2}{p - 1}$
– MSResidual estimates $\sigma^2$

F-ratio = MSGroups / MSResidual

Testing H0 - fixed factor

• If H0 is true:
– all $\alpha_i$'s = 0
– MSGroups and MSResidual both estimate $\sigma^2$
– so F-ratio ≈ 1

• If H0 is false:
– at least one $\alpha_i \neq 0$
– MSGroups estimates $\sigma^2$ + treatment effects
– so F-ratio > 1


• If factor is fixed and homogeneity of variance assumption holds:

              Expected                                                        Calculated
MSGroups      $\sigma^2 + \dfrac{n \sum_{i=1}^{p} \alpha_i^2}{p - 1}$         $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
MSResidual    $\sigma^2$                                                      $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$

F-ratio = MSGroups / MSResidual

Expected mean squares (random factor)

• If factor is random and homogeneity of variance assumption holds:
– MSGroups estimates $\sigma^2 + n \sigma_A^2$
– MSResidual estimates $\sigma^2$

F-ratio = MSGroups / MSResidual


Testing H0 - random factor

• If H0 is true:
– $\sigma_A^2 = 0$
– MSGroups and MSResidual both estimate $\sigma^2$
– so F-ratio ≈ 1

• If H0 is false:
– $\sigma_A^2 > 0$
– MSGroups estimates $\sigma^2$ plus added variance due to groups or treatments
– so F-ratio > 1

• If factor is random and homogeneity of variance assumption holds:

              Expected                       Calculated
MSGroups      $\sigma^2 + n \sigma_A^2$      $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
MSResidual    $\sigma^2$                     $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$

F-ratio = MSGroups / MSResidual
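Because MSGroups estimates $\sigma^2 + n \sigma_A^2$ and MSResidual estimates $\sigma^2$, subtracting and dividing by n gives a simple estimate of the added variance $\sigma_A^2$. The slides do not show this step; the sketch below is an illustration for a balanced design, with a hypothetical function name and input values, and it truncates negative estimates to zero as one common convention.

```python
def variance_component(ms_groups, ms_residual, n):
    """Estimate sigma_A^2 as (MSGroups - MSResidual) / n for a balanced design."""
    estimate = (ms_groups - ms_residual) / n
    # A variance cannot be negative, so negative estimates are set to zero
    # (an assumed convention, not taken from the slides)
    return max(estimate, 0.0)

# Hypothetical mean squares with n = 4 replicates per group
print(variance_component(36.0, 0.5, 4))  # 8.875
print(variance_component(1.0, 2.0, 4))   # 0.0
```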


Full set of slides


Introduction to Analysis of Variance (ANOVA) – Part 1

Single factor

The logic of Analysis of Variance

• Is the variance explained by the model much greater than (>>) the residual variance?

• In regression models:
– Variance explained by the regression model vs. unexplained variance

• In ANOVA models:
– Is the variance explained by the factors >> the unexplained variance?
– In common language: is the variability among treatments greater than the variability within treatments?


ANOVA vs regression

• One factor ANOVA:
– 1 continuous response variable and 1 categorical predictor variable (factor)

• Compare with regression:
– 1 continuous response variable and 1 continuous predictor variable

Aims

• Measure relative contribution of different sources of variation (factors or combination of factors) to total variation in response variable

• Test hypotheses about group (treatment) population means for response variable


Terminology

• Factor (predictor variable):
– usually designated factor A
– number of levels/groups/treatments = p

• Number of replicates within each group:
– n

• Each observation:
– y

Data layout

Factor level (group)    1              2              …    i
Replicates              $y_{11}$       $y_{21}$       …    $y_{i1}$
                        $y_{1j}$       $y_{2j}$       …    $y_{ij}$
                        …              …              …    …
                        $y_{1n}$       $y_{2n}$       …    $y_{in}$
Population means        $\mu_1$        $\mu_2$        …    $\mu_i$
Sample means            $\bar{y}_1$    $\bar{y}_2$    …    $\bar{y}_i$

Grand mean $\bar{y}$ estimates $\mu$


Types of predictors (factors)

• Fixed factor:
– all levels or groups of interest are used in study
– conclusions are restricted to those groups

• Random factor:
– random sample of all groups of interest is used in study
– typically individual groups are not of interest
– conclusions extrapolate to all possible groups

Linear model

Linear model for 1 factor ANOVA:

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

where

$\mu$ = overall population mean

$\alpha_i$ = effect of ith treatment or group ($\mu_i - \mu$)

$\varepsilon_{ij}$ = random or unexplained error (variation not explained by treatment effects)


Compare with regression model

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

• intercept is replaced by $\mu$

• slope is replaced by $\alpha_i$ (treatment effect):
– predictor variable is categorical rather than continuous
– still measures “effect” of predictor variable

Diatoms & heavy metals

• Effect of heavy metals on species diversity of diatoms in streams in Colorado

• Response variable:
– species diversity of diatoms

• Predictor variable:
– heavy metal level
– categorical with 4 groups (background, low, medium, high)

• Replicates are “stations”


Null hypothesis

• $H_0$: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu$

• No difference between population group (treatment) means

• Mean species diversity of diatoms is the same for the 4 heavy metal levels

H0 - fixed factor

• No effects of specific groups (treatments)

$H_0$: $\alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = 0$, where $\alpha_i = \mu_i - \mu$

• No effect of 4 heavy metal levels on diatom species diversity

Inference is only to these 4 heavy metal levels


Streams and diatoms

Does diatom diversity vary by stream?

H0 - random factor

• No variation among means of all possible groups (treatments)

$H_0$: $\sigma_A^2 = 0$

$H_0$: $\sum_{i=1}^{N} (\bar{y}_i - \mu)^2 / (N - 1) = 0$

where groups i = 1 to N (streams) are chosen randomly

• Test: No variation in diatom species diversity between randomly chosen streams

Inference is to all streams (within ??? Region), sampled by N streams


Basic assumption of ANOVA (single factor)

$\sigma_1^2 = \sigma_2^2 = \dots = \sigma_i^2 = \dots = \sigma^2$

where $\sigma_i^2$ = population variance of dependent variable ($y_i$) in each group (this is the within group variation)

Each group (or treatment) population has similar variance: the homogeneity of variance assumption
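One informal way to look at this assumption is to compare the sample variances of the groups. The sketch below uses hypothetical data and reports the ratio of the largest to the smallest sample variance; this ratio is only a descriptive check chosen for illustration, not a formal test and not a procedure given in the slides.

```python
from statistics import variance

# Hypothetical data: 3 groups with 4 replicates each
groups = {
    "A": [12.0, 14.0, 13.0, 13.0],
    "B": [16.0, 15.0, 17.0, 16.0],
    "C": [10.0, 12.0, 8.0, 10.0],
}

# Sample variance within each group (estimates sigma_i^2)
sample_vars = {g: variance(ys) for g, ys in groups.items()}

# Ratio of largest to smallest variance; values near 1 suggest similar spread
ratio = max(sample_vars.values()) / min(sample_vars.values())
print(sample_vars, ratio)
```

Here groups A and B have the same spread while group C is four times as variable, so the ratio is about 4.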

Partitioning variation

• Variation in response variable partitioned into:
– variation explained by difference among groups (or treatments)
– variation not explained (residual variation, within group)


Regression: Analysis of variance in Y

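$\sum (y_i - \bar{y})^2$: Total variation (Sum of Squares) in Y

$\sum (\hat{y}_i - \bar{y})^2$: Variation in Y explained by regression (SSRegression)

$\sum (y_i - \hat{y}_i)^2$: Variation in Y unexplained by regression (SSResidual)

[Figure: scatterplot of Y against X with the least squares regression line; for one observation $(x_i, y_i)$ the deviations $(y_i - \bar{y})$, $(\hat{y}_i - \bar{y})$ and $(y_i - \hat{y}_i)$ are marked]

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$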


ANOVA

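SS Total = SS Between groups + SS Within groups (Residual)

$\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$

Partitioning the Variance

[Figure: observations $y_{ij}$ plotted by Group (1, 2, 3), four replicates per group ($y_{11}$ to $y_{34}$); vertical axis y]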


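Partitioning the Variance

[Figure: plot of y by Group (1, 2, 3) with the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and the grand mean $\bar{y}$ marked; observations $y_{21}$ to $y_{24}$ of group 2 labelled]

Partitioning the Variance

[Figure: the same plot; for observation $y_{21}$ the deviation from the grand mean is split into a within-groups and a between-groups part]

$(y_{ij} - \bar{y}) = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$

Within Groups: $(y_{ij} - \bar{y}_i)$

Between Groups: $(\bar{y}_i - \bar{y})$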


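Partitioning the Variance

[Figure: plot of y by Group (1, 2, 3) with group means and grand mean; observations $y_{21}$ to $y_{24}$ labelled, highlighting the within-group deviations]

$\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2 = n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$

Within Group: $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$

Partitioning the Variance

[Figure: the same plot, highlighting the between-group deviations $(\bar{y}_i - \bar{y})$]

Between Groups: $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$, with n = 4 in this example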


Mean squares

• Average sum-of-squared deviations

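• Degrees of freedom:
– number of components minus 1
– df total [pn-1] = df groups [p-1] + df residual [p(n-1)]

• Mean square is a variance:
– SS divided by df

ANOVA table

Source      SS                                                          df        MS
Groups      $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$                  p-1       $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
Residual    $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$      p(n-1)    $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$
Total       $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2$        pn-1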


Treatments (= groups) explain nothing, i.e. SSGroups equals zero

Replicate    Group1    Group2    Group3    Group4
1            16.0      15.0      16.0      17.0
2            15.0      17.0      16.0      16.0
3            17.0      16.0      17.0      15.0
4            16.0      16.0      15.0      16.0
Mean         16.0      16.0      16.0      16.0

Grand mean = 16.0

Treatments (= groups) explain everything, i.e. SSResidual equals zero

Replicate    Group1    Group2    Group3    Group4
1            19.5      15.0      16.5      13.0
2            19.5      15.0      16.5      13.0
3            19.5      15.0      16.5      13.0
4            19.5      15.0      16.5      13.0
Mean         19.5      15.0      16.5      13.0

Grand mean = 16.0


Testing ANOVA H0

• All population group means the same: $\mu_1 = \mu_2 = \dots = \mu_i = \dots = \mu_a = \mu$

• Fixed factor: $H_0$: $\alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = 0$
– Means that there is no variability across a fixed set of group means (limited inference)

• Random factor (A): $H_0$: $\sigma_A^2 = 0$
– Means that there is no variability across all possible group means (broad inference)

Remember: Linear model for 1 factor ANOVA:

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ and $\alpha_i = \mu_i - \mu$, where $\alpha_i$ can be positive or negative

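ANOVA table

Source      SS                                                          df        MS                                                                        F
Groups      $n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2$                  p-1       $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$                   MSGroups / MSResidual
Residual    $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$      p(n-1)    $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$
Total       $\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2$        pn-1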


F-ratio statistic

• F-ratio statistic is ratio of 2 sample variances (i.e. 2 mean squares)

• Probability distribution of F-ratio known:
– different distributions depending on df of 2 variances

• If homogeneity of variances holds, F-ratio follows F distribution

F distribution

[Figure: probability density P(F) against F (0 to 5) for the F distribution with 3, 24 df]


Expected mean squares

• If factor is fixed and homogeneity of variance assumption holds:
– MSGroups estimates $\sigma^2 + \dfrac{n \sum_{i=1}^{p} \alpha_i^2}{p - 1}$
– MSResidual estimates $\sigma^2$

F-ratio = MSGroups / MSResidual

Testing H0 - fixed factor

• If H0 is true:
– all $\alpha_i$'s = 0
– MSGroups and MSResidual both estimate $\sigma^2$
– so F-ratio ≈ 1

• If H0 is false:
– at least one $\alpha_i \neq 0$
– MSGroups estimates $\sigma^2$ + treatment effects
– so F-ratio > 1


• If factor is fixed and homogeneity of variance assumption holds:

              Expected                                                        Calculated
MSGroups      $\sigma^2 + \dfrac{n \sum_{i=1}^{p} \alpha_i^2}{p - 1}$         $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
MSResidual    $\sigma^2$                                                      $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$

F-ratio = MSGroups / MSResidual

Expected mean squares (random factor)

• If factor is random and homogeneity of variance assumption holds:
– MSGroups estimates $\sigma^2 + n \sigma_A^2$
– MSResidual estimates $\sigma^2$

F-ratio = MSGroups / MSResidual


Testing H0 - random factor

• If H0 is true:
– $\sigma_A^2 = 0$
– MSGroups and MSResidual both estimate $\sigma^2$
– so F-ratio ≈ 1

• If H0 is false:
– $\sigma_A^2 > 0$
– MSGroups estimates $\sigma^2$ plus added variance due to groups or treatments
– so F-ratio > 1

• If factor is random and homogeneity of variance assumption holds:

              Expected                       Calculated
MSGroups      $\sigma^2 + n \sigma_A^2$      $\dfrac{n \sum_{i=1}^{p} (\bar{y}_i - \bar{y})^2}{p-1}$
MSResidual    $\sigma^2$                     $\dfrac{\sum_{i=1}^{p} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}{p(n-1)}$

F-ratio = MSGroups / MSResidual