
Stat 529 (Winter 2011)

The Spock analysis, model checks and robustness

Reading: Sections 5.3–5.5.

• Introduction to the Spock dataset (handout)

– The exploratory data analysis

– Performing the basic one-way ANOVA

• Robustness considerations

– Checking assumptions – revisiting the ANOVA model

– Fitted values and residuals

– Producing residual plots

– Residual plots for the Spock ANOVA analysis

– The all-in-one graph from the ANOVA command

• Comparing different models using the F test

• Diagnostic plots for the finally chosen model

• Completing the inference


The Spock dataset

• See the handout for a description of the data, the question of interest, and the exploratory data analysis.

• A discussion of the summaries:


Performing the basic one-way ANOVA

• Stat → ANOVA → One-Way.

– Response: Percentage of women.

– Factor: Judge.

– Click Store residuals and Store fits.

– Click OK.

• The column RESI1 contains the residuals.

– Rename it to residuals.

• The column FITS1 contains the fitted values.

– Rename it to fitted values.

(We will need the residuals and fitted values to diagnose the fit of the model graphically.)


ANOVA output

One-way ANOVA: Percentage of women versus Judge

Source  DF      SS     MS     F      P
Judge    6  1927.1  321.2  6.72  0.000
Error   39  1864.4   47.8
Total   45  3791.5

S = 6.914   R-Sq = 50.83%   R-Sq(adj) = 43.26%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean   StDev  --------+---------+---------+---------+-
A      5  34.120  11.942                         (-------*------)
B      6  33.617   6.582                         (------*------)
C      9  29.100   4.593                     (----*-----)
D      2  27.000   3.818           (------------*-----------)
E      6  26.967   9.010                 (------*------)
F      9  26.800   5.969                  (-----*----)
Spock  9  14.622   5.039  (-----*-----)
                          --------+---------+---------+---------+-
                                16.0      24.0      32.0      40.0

Pooled StDev = 6.914
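The sums of squares in this table can be recovered from the per-judge summaries alone: $SS_{\text{Error}} = \sum_i (n_i - 1)s_i^2$ and $SS_{\text{Judge}} = \sum_i n_i(\bar{Y}_i - \bar{Y})^2$, where $\bar{Y}$ is the overall mean. The Python sketch below is a hypothetical helper (`anova_from_summaries` is not a MINITAB command) that does this arithmetic from the rounded N, Mean, and StDev columns above, so its results should match MINITAB only up to rounding.

```python
import math

def anova_from_summaries(groups):
    """One-way ANOVA table from (n, mean, sd) summaries of each group.

    groups: dict mapping a group label to a tuple (n, mean, sd).
    Returns sums of squares, degrees of freedom, mean squares,
    the F statistic, R-squared, and the pooled SD.
    """
    total_n = sum(n for n, _, _ in groups.values())
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups.values()) / total_n
    # Between-group (Judge) SS: sum of n_i * (ybar_i - ybar)^2
    ss_model = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-group (Error) SS: sum of (n_i - 1) * s_i^2
    ss_error = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_model, df_error = k - 1, total_n - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "SS_model": ss_model, "SS_error": ss_error,
        "SS_total": ss_model + ss_error,
        "df_model": df_model, "df_error": df_error,
        "MS_model": ms_model, "MS_error": ms_error,
        "F": ms_model / ms_error,
        "R_sq": ss_model / (ss_model + ss_error),
        "pooled_sd": math.sqrt(ms_error),
    }

# Per-judge summaries copied from the MINITAB output above.
spock_summaries = {
    "A": (5, 34.120, 11.942), "B": (6, 33.617, 6.582),
    "C": (9, 29.100, 4.593), "D": (2, 27.000, 3.818),
    "E": (6, 26.967, 9.010), "F": (9, 26.800, 5.969),
    "Spock": (9, 14.622, 5.039),
}

table = anova_from_summaries(spock_summaries)
for key, value in table.items():
    print(f"{key:>10}: {value:.3f}")
```

Note that the Error SS only needs each judge's SD and sample size, which is why the pooled SD (6.914) is a weighted combination of the individual StDev values.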


Comments on the ANOVA output


Robustness considerations

Taken from the textbook, Section 5.5.1:

• Normality is not crucial as long as the experiment is balanced and there are no long-tailed or highly skewed distributions.

• Independence within and across groups is critical. If independence is lacking, different analyses should be attempted.

• The assumption of equal standard deviations is crucial (e.g., see Display 5.13).

• The tools are not resistant to severely outlying observations.


Checking assumptions – revisiting the ANOVA model

• Remember the additive model for our data:

$Y_{ij} = \mu_i + \varepsilon_{ij}; \quad i = 1, \ldots, I, \; j = 1, \ldots, n_i.$

• One way to check that the model fits well is to check the assumptions made for the errors, $\varepsilon_{ij}$. We usually assume:

1. The errors have mean zero and constant variance $\sigma^2$.

2. The errors are (usually) normally distributed.

3. The errors are independent across i and j.

• We will estimate the errors using the residuals.


Fitted values and residuals

• For any model (reduced or full) that we consider, let $\hat{\mu}_i$ be the estimate of the mean in the $i$th population.

• Then the fitted value for case $j$ in sample $i$ is:

$\hat{Y}_{ij} = \hat{\mu}_i$.

• The residual for individual $j$ in sample $i$ is:

$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \hat{\mu}_i$.

• Example: In a model in which the mean is different for each population:

$\hat{Y}_{ij} =$

$e_{ij} =$
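For the model with a separate mean for each population, the estimate $\hat{\mu}_i$ is the sample mean $\bar{Y}_i$ of group $i$. The short Python sketch below computes fitted values and residuals under that model; the function name `fit_separate_means` and the toy numbers are made up for illustration and are not the Spock data.

```python
from statistics import mean

def fit_separate_means(samples):
    """Fitted values and residuals for the model Y_ij = mu_i + eps_ij.

    samples: dict mapping a group label to a list of observations.
    Under this model mu_i is estimated by the sample mean of group i,
    so every case in group i receives that mean as its fitted value.
    """
    fitted, residuals = {}, {}
    for label, ys in samples.items():
        mu_hat = mean(ys)
        fitted[label] = [mu_hat] * len(ys)
        residuals[label] = [y - mu_hat for y in ys]
    return fitted, residuals

# Hypothetical toy numbers (NOT the Spock data), only to show the mechanics.
toy = {"g1": [10.0, 12.0, 14.0], "g2": [20.0, 26.0]}
fits, resids = fit_separate_means(toy)
print(fits)    # {'g1': [12.0, 12.0, 12.0], 'g2': [23.0, 23.0]}
print(resids)  # {'g1': [-2.0, 0.0, 2.0], 'g2': [-3.0, 3.0]}
```

Under this model the residuals within each group sum to zero, so they are centered at zero by construction; the diagnostic plots are really checking their spread and shape.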


Properties of the residuals

• If the model fits well, the residuals have the following properties.

1. Residuals are centered around zero with constant spread.

2. The residuals are normally distributed about zero.

3. There should be no obvious patterns in the residuals across i and j. In particular, there should be no relationship between the residuals and the fitted values in a well-fitting model.


Some example residual plots

• Plot the residuals versus the fitted values:

– Check for appropriateness of the fit.

– Do we need to transform the response?

– Check for constancy of the variance of errors.

– Look for outliers.

• Plot the residuals versus the population identifier.

– Check adequacy of fit for each population.

– Curvature may indicate the need to transform.

• Normal Q-Q plot of residuals.

– Check that normality is reasonable for the residuals.

• Residuals versus time or collection order.

– Check for systematic problems in the residuals (e.g., serial correlation).
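A normal Q-Q plot pairs the ordered residuals with standard normal quantiles; a roughly straight-line pattern supports the normality assumption. The sketch below computes the plot coordinates with Python's `statistics.NormalDist` (Python 3.8+). The plotting positions $(k - 0.5)/n$ are an assumption (one common choice; MINITAB's probability plot may use a different convention), and the residual values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a normal Q-Q plot.

    Uses plotting positions p_k = (k - 0.5) / n, one common convention;
    other software may use slightly different positions.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((k - 0.5) / n), r)
            for k, r in enumerate(ordered, start=1)]

# Hypothetical residuals (NOT from the Spock fit), only for illustration.
pts = normal_qq_points([-2.0, 3.0, 0.0, -3.0, 2.0])
for q, r in pts:
    print(f"{q:+.3f}  {r:+.1f}")
```

Plotting the second coordinate against the first gives the Q-Q plot; curvature suggests skewness, and an S-shape suggests tails that are longer or shorter than normal.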


Producing residual plots for the Spock analysis

• The following slides show four residual plots:

– The 1st plot was created using Graph → Scatterplot.

– The 2nd plot used Graph → Individual Value Plot.

– The 3rd plot is from Graph → Boxplot.

– The last figure is from Graph → Probability Plot.

• I added reference lines at y = 0 as needed.

• We would need some time variable (e.g., the day the venire was compiled) to check for serial dependence in the residuals.


Residual plots for the Spock ANOVA analysis


Residual plots, continued


Comments on the residual diagnostic plots


The all-in-one graph from the ANOVA command

• The ANOVA command can produce graphs of its own.

– In Stat → ANOVA → One-Way, select Graphs and then Four in one.

• These graphs are not very customizable.

– Advice for a good analysis: do not use these graphs; make your own!


Comparing different models using the F test

• Here are some models we could consider for the Spock dataset:

1. One population mean explains all the judges.

2. One mean for Spock’s judge, and another mean for all the other judges.

3. Each judge has its own mean.

• Let us compare these models using F tests.


A mean for each judge

One-way ANOVA: Percentage of women versus Judge

Source  DF      SS     MS     F      P
Judge    6  1927.1  321.2  6.72  0.000
Error   39  1864.4   47.8
Total   45  3791.5

S = 6.914   R-Sq = 50.83%   R-Sq(adj) = 43.26%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean   StDev  --------+---------+---------+---------+-
A      5  34.120  11.942                         (-------*------)
B      6  33.617   6.582                         (------*------)
C      9  29.100   4.593                     (----*-----)
D      2  27.000   3.818           (------------*-----------)
E      6  26.967   9.010                 (------*------)
F      9  26.800   5.969                  (-----*----)
Spock  9  14.622   5.039  (-----*-----)
                          --------+---------+---------+---------+-
                                16.0      24.0      32.0      40.0

Pooled StDev = 6.914


A mean for Spock’s judge and a mean for all the other judges

One-way ANOVA: Percentage of women versus Is Spock

Source    DF      SS      MS      F      P
Is Spock   1  1600.6  1600.6  32.15  0.000
Error     44  2190.9    49.8
Total     45  3791.5

S = 7.056   R-Sq = 42.22%   R-Sq(adj) = 40.90%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean   StDev  ----+---------+---------+---------+-----
No    37  29.492   7.431                               (---*---)
Yes    9  14.622   5.039  (-------*-------)
                          ----+---------+---------+---------+-----
                            12.0      18.0      24.0      30.0

Pooled StDev = 7.056
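The "No" row above can be checked by pooling the summaries for judges A–F from the seven-group fit. The combined sum of squares is the within-judge part $\sum_i (n_i - 1)s_i^2$ plus the between-judge part $\sum_i n_i(\bar{Y}_i - \bar{Y}_{\text{No}})^2$, and that between-judge part is exactly what model 2 adds to the Error SS relative to model 3 (2190.9 − 1864.4 = 326.5, up to rounding). The helper `combine_groups` below is a hypothetical Python sketch that uses the rounded values from the output.

```python
import math

def combine_groups(summaries):
    """Combine (n, mean, sd) summaries of several groups into one group.

    The combined SS about the combined mean equals the within-group part,
    sum (n_i - 1) s_i^2, plus the between-group part,
    sum n_i (mean_i - combined_mean)^2.
    """
    n_tot = sum(n for n, _, _ in summaries)
    m_tot = sum(n * m for n, m, _ in summaries) / n_tot
    ss_within = sum((n - 1) * s ** 2 for n, _, s in summaries)
    ss_between = sum(n * (m - m_tot) ** 2 for n, m, _ in summaries)
    sd_tot = math.sqrt((ss_within + ss_between) / (n_tot - 1))
    return n_tot, m_tot, sd_tot, ss_between

# Judges A-F from the one-way ANOVA output (rounded values).
other_judges = [(5, 34.120, 11.942), (6, 33.617, 6.582), (9, 29.100, 4.593),
                (2, 27.000, 3.818), (6, 26.967, 9.010), (9, 26.800, 5.969)]
n, m, sd, ss_between = combine_groups(other_judges)
print(f"N = {n}, Mean = {m:.3f}, StDev = {sd:.3f}")
print(f"Between-judge SS among A-F: {ss_between:.1f}")
```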


The F tests (exercise!)

Compare models 1 versus 2, 1 versus 3, and 2 versus 3.

• 1 versus 2:

$F_{\text{obs}} = 32.15$

The p-value is less than 0.001 (reject $H_0$).

Conclusion:

• 1 versus 3:

$F_{\text{obs}} = 6.72$

The p-value is less than 0.001 (reject $H_0$).

Conclusion:

• 2 versus 3:

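$$F_{\text{obs}} = \frac{(2190.9 - 1864.4)/(44 - 39)}{47.8} = \frac{326.5/5}{47.8} = \frac{65.3}{47.8} = 1.37,$$

where $47.8 = 1864.4/39$ is the error mean square of the full model (model 3), and the reference distribution is F with 5 and 39 degrees of freedom.

The p-value is $1 - 0.743 = 0.257$ (fail to reject $H_0$).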

Conclusion:
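All three comparisons use the same extra-sum-of-squares formula, with the full model's error mean square in the denominator. The Python sketch below (`extra_ss_f` is a hypothetical helper, not a MINITAB command) applies it to the Error lines of the two ANOVA fits, treating the Total line as the Error line of model 1, which has a single mean and 45 df. Because the inputs are rounded SS values, the F statistics agree with MINITAB's only to about the second decimal place.

```python
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for comparing nested models.

    F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full),
    referred to an F distribution with (df_reduced - df_full, df_full) df.
    """
    num_df = df_reduced - df_full
    f_stat = ((sse_reduced - sse_full) / num_df) / (sse_full / df_full)
    return f_stat, num_df, df_full

# Residual (Error) SS and df for each model, from the MINITAB output.
# Model 1 (one common mean) uses the Total line: SS = 3791.5 on 45 df.
models = {1: (3791.5, 45), 2: (2190.9, 44), 3: (1864.4, 39)}

for reduced, full in [(1, 2), (1, 3), (2, 3)]:
    f, d1, d2 = extra_ss_f(*models[reduced], *models[full])
    print(f"Model {reduced} vs {full}: F = {f:.2f} on ({d1}, {d2}) df")
```

The first two comparisons reproduce the F statistics printed in the two ANOVA tables, which is a useful check that the one-way ANOVA F test is itself a comparison against the one-mean model.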


The F tests, continued

• We used the following MINITAB output:

F distribution with 5 DF in numerator and 39 DF in denominator

   x  P(X<=x)
1.37  0.743405
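Without MINITAB, $P(X \le x)$ for $X \sim F(d_1, d_2)$ can be computed from the regularized incomplete beta function, since $d_1X/(d_1X + d_2)$ has a Beta$(d_1/2, d_2/2)$ distribution. The stdlib-only Python sketch below evaluates the incomplete beta function with the standard continued-fraction expansion (the approach used by the Numerical Recipes routines betacf/betai); the function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.f.cdf` would be the usual choice.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, d1, d2):
    """P(X <= x) for X ~ F(d1, d2), using d1*X / (d1*X + d2) ~ Beta(d1/2, d2/2)."""
    if x <= 0.0:
        return 0.0
    z = d1 * x / (d1 * x + d2)
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, z)

cdf = f_cdf(1.37, 5, 39)
print(f"P(X <= 1.37) = {cdf:.6f}, p-value = {1.0 - cdf:.3f}")
```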


Diagnostic plots for Model 2

• Now we check the model assumptions using diagnostic plots based on the residuals of model 2.

– Important: The model you choose determines the estimated means, $\hat{\mu}_i$, that are used in calculating the residuals.

• Comments on the fit of model 2:


Diagnostic plots for Model 2


Diagnostic plots for Model 2, continued


Completing the inference

• Do not use this part of the ANOVA output as a final inference for comparing the group means:

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean   StDev  ----+---------+---------+---------+-----
No    37  29.492   7.431                               (---*---)
Yes    9  14.622   5.039  (-------*-------)
                          ----+---------+---------+---------+-----
                            12.0      18.0      24.0      30.0

• What is an appropriate inference to use instead?

(side issue: should we pool variances?)
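One possible approach, sketched here as an assumption rather than as the intended answer, is a t-based confidence interval for the difference between the two means, computed either with a pooled SD (44 df) or with Welch's unpooled standard error and Satterthwaite df. The Python sketch below computes both standard errors and t statistics from the summaries above; the function name `two_sample_summary` is hypothetical. As a consistency check, the squared pooled t statistic should reproduce F = 32.15 from the Is Spock ANOVA.

```python
import math

def two_sample_summary(n1, m1, s1, n2, m2, s2):
    """Difference in means with pooled and Welch standard errors.

    Returns (difference, pooled SE, pooled df, Welch SE, Welch-Satterthwaite df).
    """
    diff = m1 - m2
    # Pooled SD: square root of the weighted average of the two variances
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)
    # Welch: separate variances, approximate (Satterthwaite) df
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return diff, se_pooled, n1 + n2 - 2, se_welch, df_welch

# "No" (other judges) versus "Yes" (Spock's judge), from the output above.
diff, se_p, df_p, se_w, df_w = two_sample_summary(37, 29.492, 7.431, 9, 14.622, 5.039)
print(f"difference = {diff:.3f}")
print(f"pooled: SE = {se_p:.3f}, t = {diff / se_p:.2f}, df = {df_p}")
print(f"Welch:  SE = {se_w:.3f}, t = {diff / se_w:.2f}, df = {df_w:.1f}")
```

A 95% interval would then be the difference ± t* × SE, with t* the 0.975 quantile of the t distribution on the corresponding df, taken from a t table or software. The two standard errors differ noticeably here because the group SDs (7.431 versus 5.039) and sample sizes (37 versus 9) are unequal, which is what the pooling question is about.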
