
Stat479 Assignment #6 Solution Key Fall 2013

Problem 1

(a)

Source          d.f.          SS          MS       F   p-value
Regression         1  2059.78145  2059.78145   29.59    <.0001
Error             14   974.65605    69.61829
Corrected Total   15  3034.43750
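As a quick arithmetic check, the MS and F columns follow from the SS and d.f. columns alone (MS = SS/d.f., F = MS(Regression)/MS(Error)). The short Python sketch below uses only the numbers in the table; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table in part (a)
ss_reg, df_reg = 2059.78145, 1
ss_err, df_err = 974.65605, 14

ms_reg = ss_reg / df_reg      # mean square for regression
ms_err = ss_err / df_err      # mean square error (MSE)
f_value = ms_reg / ms_err     # F statistic on (1, 14) d.f.
ss_total = ss_reg + ss_err    # corrected total SS

print(f"MSE = {ms_err:.5f}, F = {f_value:.2f}, SST = {ss_total:.5f}")
# MSE = 69.61829, F = 29.59, SST = 3034.43750
```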

(b) $\hat{\beta}_0 = 148.05068$, s.e.$(\hat{\beta}_0) = 11.56292$; $\hat{\beta}_1 = -1.02359$, s.e.$(\hat{\beta}_1) = 0.18818$

$\hat{y} = 148.05068 - 1.02359x$

(c) The expected loss in mean muscle mass E(y) for a 1-year increase in age is $-\hat{\beta}_1 = 1.02359$. Thus the expected loss in mean muscle mass for a 5-year increase in age is $5 \times 1.02359 = 5.11795$.
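The same arithmetic can be written out with the fitted line from part (b). In this minimal Python sketch, `predicted_mass` is a hypothetical helper name; because the line is straight, the difference between any two ages five years apart equals $5\hat{\beta}_1$.

```python
# Estimates from part (b)
b0, b1 = 148.05068, -1.02359

def predicted_mass(age):
    """Fitted mean muscle mass at a given age (hypothetical helper)."""
    return b0 + b1 * age

# Change in the fitted mean over a 5-year increase in age; for a straight
# line this equals 5 * b1 regardless of the starting age.
change_5yr = predicted_mass(65) - predicted_mass(60)
print(f"change over 5 years: {change_5yr:.5f}")  # change over 5 years: -5.11795
```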

(d) $R^2 = 0.6788 = 67.88\%$

This means that 67.88% of the variability in muscle mass is explained by the linear regression model with age as the explanatory variable.
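$R^2$ is SSR/SST from the ANOVA table in part (a). The sketch below also computes the adjusted $R^2$, which SAS reports as Adj R-Sq, as a cross-check; it assumes n = 16 observations used and p = 2 parameters (intercept and slope).

```python
ss_reg, ss_err = 2059.78145, 974.65605
n, p = 16, 2  # observations used; parameters in the model

ss_total = ss_reg + ss_err
r_squared = ss_reg / ss_total
# Adjusted R^2 replaces SS ratios with mean squares: 1 - MSE / (SST / (n - 1))
adj_r_squared = 1 - (ss_err / (n - p)) / (ss_total / (n - 1))

print(f"R^2 = {r_squared:.4f}, adj R^2 = {adj_r_squared:.4f}")
# R^2 = 0.6788, adj R^2 = 0.6559
```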

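(e) 95% C.I. for $\beta_1$: (−1.42720, −0.61998)

We have 95% confidence that the expected change in mean muscle mass E(y) for a 1-year increase in age lies inside the above interval. Because both limits are negative, mean muscle mass is estimated to decrease by between 0.61998 and 1.42720 per additional year of age.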
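The interval is $\hat{\beta}_1 \pm t_{0.975,14}\,\text{s.e.}(\hat{\beta}_1)$. This sketch hard-codes $t_{0.975,14} \approx 2.144787$ from a t table instead of calling a statistics library, so it runs with the Python standard library only.

```python
b1, se_b1 = -1.02359, 0.18818   # slope estimate and its standard error
t_crit = 2.144787               # t_{0.975, 14}, taken from a t table (14 error d.f.)

half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"({lower:.5f}, {upper:.5f})")  # (-1.42720, -0.61998)
```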

(f) A t-test of $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ gives

t-value $= \hat{\beta}_1 / \text{s.e.}(\hat{\beta}_1) = -1.02359/0.18818 = -5.44$, for which the p-value is < .0001; thus we reject the null hypothesis at $\alpha = .05$.
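The test statistic is the estimate divided by its standard error, and in simple linear regression its square equals the ANOVA F statistic from part (a). A minimal check, again assuming the tabled critical value $t_{0.975,14} \approx 2.144787$:

```python
b1, se_b1 = -1.02359, 0.18818
t_value = b1 / se_b1        # statistic for H0: beta1 = 0, with 14 d.f.
t_crit = 2.144787           # two-sided 5% critical value, t_{0.975, 14}
reject = abs(t_value) > t_crit

print(f"t = {t_value:.2f}, reject H0: {reject}")  # t = -5.44, reject H0: True
# t_value ** 2 is about 29.59, matching the F value in the ANOVA table
```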

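(g) From the SAS output, the point estimate of E(y) at x = 60, i.e. $\hat{\mu}(60)$, is 86.6353 (observation 17, which has a missing y value and is therefore used only for prediction).

A 95% confidence interval for the mean muscle mass E(y) at x = 60 is (82.1579, 91.1127).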
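The interval is $\hat{\mu}(60) \pm t_{0.975,14}\,\text{s.e.}(\hat{\mu}(60))$, where the standard error 2.0876 is the Std Error Mean Predict printed for observation 17. Because the inputs below are the rounded printed values, the last digit of the limits can differ slightly from the SAS output.

```python
b0, b1 = 148.05068, -1.02359
se_fit = 2.0876      # "Std Error Mean Predict" for obs 17 (x = 60) in the SAS output
t_crit = 2.144787    # t_{0.975, 14}, from a t table

y_hat = b0 + b1 * 60                 # estimated mean muscle mass at age 60
margin = t_crit * se_fit
lower, upper = y_hat - margin, y_hat + margin
print(f"{y_hat:.4f} ({lower:.4f}, {upper:.4f})")
```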

(h)

I. See the attached SAS output.

II. See the attached plot: the assumption of constant variance as x increases appears to be satisfied, since the residuals are evenly spread around the zero line as x increases.

III. See the attached plots: the same is true of the plot of residuals against the predicted values. The normal probability plot of the studentized residuals shows no pattern indicating that the distribution of the errors deviates from a normal distribution.
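The studentized residuals behind these plots are the residuals divided by their standard errors, both of which appear in the Output Statistics table. The sketch below recomputes them from the printed values and applies a common |r| > 2 screen; that threshold is a rule of thumb assumed here, not part of the assignment.

```python
# (Residual, Std Error Residual) for observations 1-16 from the Output Statistics
# table; observation 17 has no y value and is skipped.
rows = [
    (6.6242, 7.830), (8.4590, 8.051), (-4.0363, 7.382), (-11.4702, 7.984),
    (-3.7297, 8.036), (-0.3287, 7.725), (-0.4466, 7.952), (-10.7297, 8.036),
    (-5.2579, 7.529), (2.4826, 8.033), (14.0108, 7.538), (-12.6825, 8.066),
    (-4.9892, 7.538), (6.1996, 7.957), (7.1052, 7.787), (8.7893, 7.372),
]

# Studentized residual r_i = e_i / (s * sqrt(1 - h_i)); SAS prints the
# denominator as "Std Error Residual".
student = [e / se for e, se in rows]

# Assumed rule-of-thumb screen: flag observations with |r_i| > 2.
flagged = [obs for obs, r in enumerate(student, start=1) if abs(r) > 2]
largest = max(range(len(student)), key=lambda i: abs(student[i])) + 1

print(f"largest |r| at obs {largest}: {student[largest - 1]:.3f}; flagged: {flagged}")
# largest |r| at obs 11: 1.859; flagged: []
```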

Simple Linear Regression of Horsepower on Speed

The REG Procedure Model: MODEL1

Dependent Variable: y Muscle Mass

05:47 Monday, December 02, 2013 1

Number of Observations Read 17

Number of Observations Used 16

Number of Observations with Missing Values 1

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1  2059.78145  2059.78145      29.59    <.0001
Error             14   974.65605    69.61829
Corrected Total   15  3034.43750

Root MSE 8.34376 R-Square 0.6788

Dependent Mean 86.18750 Adj R-Sq 0.6559

Coeff Var 9.68094

Parameter Estimates

                           Parameter    Standard                      95% Confidence Limits
Variable   Label      DF    Estimate       Error  t Value  Pr > |t|       Lower       Upper
Intercept  Intercept   1   148.05068    11.56292    12.80    <.0001   123.25067   172.85068
x          Age         1    -1.02359     0.18818    -5.44    <.0001    -1.42720    -0.61998


Output Statistics

Obs  Dependent  Predicted  Std Error  Lower 95  Upper 95   Residual  Std Error   Student  Cook's
      Variable      Value  Mean Pred   CL Mean   CL Mean              Residual  Residual       D
  1    82.0000    75.3758     2.8813   69.1960   81.5556     6.6242      7.830     0.846   0.048
  2    91.0000    82.5410     2.1910   77.8417   87.2402     8.4590      8.051     1.051   0.041
  3   100.0000   104.0363     3.8883   95.6968  112.3759    -4.0363      7.382    -0.547   0.041
  4    68.0000    79.4702     2.4241   74.2710   84.6694   -11.4702      7.984    -1.437   0.095
  5    87.0000    90.7297     2.2469   85.9106   95.5488    -3.7297      8.036    -0.464   0.008
  6    73.0000    73.3287     3.1527   66.5667   80.0906    -0.3287      7.725   -0.0425   0.000
  7    78.0000    78.4466     2.5252   73.0307   83.8625    -0.4466      7.952   -0.0562   0.000
  8    80.0000    90.7297     2.2469   85.9106   95.5488   -10.7297      8.036    -1.335   0.070
  9    65.0000    70.2579     3.5955   62.5463   77.9695    -5.2579      7.529    -0.698   0.056
 10    84.0000    81.5174     2.2557   76.6793   86.3554     2.4826      8.033     0.309   0.004
 11   116.0000   101.9892     3.5764   94.3186  109.6597    14.0108      7.538     1.859   0.389
 12    76.0000    88.6825     2.1358   84.1017   93.2633   -12.6825      8.066    -1.572   0.087
 13    97.0000   101.9892     3.5764   94.3186  109.6597    -4.9892      7.538    -0.662   0.049
 14   100.0000    93.8004     2.5120   88.4128   99.1881     6.1996      7.957     0.779   0.030
 15   105.0000    97.8948     2.9973   91.4663  104.3233     7.1052      7.787     0.912   0.062
 16    77.0000    68.2107     3.9082   59.8285   76.5929     8.7893      7.372     1.192   0.200
 17          .    86.6353     2.0876   82.1579   91.1127          .          .         .       .

Sum of Residuals 0

Sum of Squared Residuals 974.65605

Predicted Residual SS (PRESS) 1277.86656


Problem 2

The case statistics and plots in the attached SAS output for this part show clearly that:

(a) Car O is an x-outlier. Its Hat Diag value, 0.273, is markedly larger than those of the other cars and also exceeds the cutoff 2p/n = 4/16 = 0.25. It stands well away from the other cars in the x-direction in the plot of MPG vs. Weight, and several plots in the diagnostics panel are clearly affected by this case.
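The leverage screen in part (a) can be reproduced from the Hat Diag H column. This sketch uses the 2p/n cutoff stated above and also checks that the leverages sum to p = 2, since the trace of the hat matrix equals the number of parameters.

```python
# Hat Diag H values for the 16 cars, copied from the SAS Output Statistics
hat = {
    "A": 0.0780, "B": 0.0628, "C": 0.1254, "D": 0.0782, "E": 0.1109,
    "F": 0.1247, "G": 0.1379, "H": 0.0653, "I": 0.0721, "J": 0.0810,
    "K": 0.1628, "L": 0.1794, "M": 0.1532, "N": 0.0985, "O": 0.2730,
    "P": 0.1967,
}
n, p = len(hat), 2   # 16 cars; p = 2 parameters (intercept and slope)

cutoff = 2 * p / n   # 4/16 = 0.25, the cutoff used in part (a)
high_leverage = [car for car, h in hat.items() if h > cutoff]

# The leverages sum to p (the trace of the hat matrix), up to rounding.
total = sum(hat.values())

print(cutoff, high_leverage, round(total, 3))  # 0.25 ['O'] 2.0
```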

(b) Car A is a possible y-outlier: its observed MPG (16.0) is much smaller than the value predicted by the fitted line (23.74). Its RStudent is −3.915, whose absolute value exceeds the 5% critical value of 3.62 from Table B.10.
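RStudent can be recovered from the printed Student Residual without refitting, using $t_i = r_i\sqrt{(n-p-1)/(n-p-r_i^2)}$. The sketch compares |RStudent| for cars A and O with the 3.62 critical value quoted from Table B.10; the small difference from the printed −3.9150 comes from starting with the rounded residual.

```python
import math

n, p = 16, 2
critical = 3.62  # 5% critical value quoted from Table B.10

# Student Residual (internally studentized) for cars A and O, from the SAS output
student = {"A": -2.752, "O": 1.102}

# Externally studentized residual (RStudent), computed without refitting:
#   t_i = r_i * sqrt((n - p - 1) / (n - p - r_i**2))
rstudent = {car: r * math.sqrt((n - p - 1) / (n - p - r**2))
            for car, r in student.items()}
outliers = [car for car, t in rstudent.items() if abs(t) > critical]

for car, t in rstudent.items():
    print(f"{car}: RStudent = {t:.3f}")
print("exceeds critical value:", outliers)
```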

(c) The two largest Cook's D values belong to cars A (0.321) and O (0.228). For Car O, this statistic is large primarily because it is a high-leverage case (its Hat Diag is large), not because it is a y-outlier: its studentized residual is only 1.102. Thus it fits the model well but has high influence. For Car A, Cook's D is large because it is a y-outlier (studentized residual −2.752, with a Hat Diag of only 0.078), so it does not fit the model well at all.
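Cook's D factors into a lack-of-fit part and a leverage part, $D_i = (r_i^2/p)\,(h_i/(1-h_i))$, which is the distinction drawn above between Cars A and O. A short sketch using the printed Student Residual and Hat Diag values (car M, with the third-largest D, is included for comparison):

```python
p = 2  # parameters: intercept and slope

# (Student Residual, Hat Diag H) from the SAS case statistics
cases = {"A": (-2.752, 0.0780), "O": (1.102, 0.2730), "M": (1.412, 0.1532)}

def cooks_d(r, h, p):
    """Cook's distance from studentized residual r and leverage h."""
    return (r**2 / p) * (h / (1 - h))

d = {car: cooks_d(r, h, p) for car, (r, h) in cases.items()}

for car, (r, h) in cases.items():
    # r**2 / p reflects lack of fit; h / (1 - h) reflects leverage
    print(f"{car}: D = {d[car]:.3f}  fit part = {r**2 / p:.3f}  "
          f"leverage part = {h / (1 - h):.3f}")
```

Compared with car A, car O has a much smaller fit part (about 0.61 versus 3.79) but a much larger leverage part (about 0.38 versus 0.08).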

(d) The following is a summary of statistics resulting from fitting the model to three different data sets:

Model        Estimated β0  Estimated β1    MSE    R²
All data            41.57      -0.00681   8.57  0.69
A deleted           43.38      -0.00725   4.24  0.84
O deleted           39.21      -0.00608   8.43  0.60

• Deleting Car A improves the model significantly: the MSE falls from 8.57 to 4.24 and R² rises from 0.69 to 0.84. The case statistics for the model fitted with A deleted indicate that Car A is still influential but not a y-outlier, and the plots also support this.

• Deleting Car O does not give a better-fitting model overall: the MSE barely changes (8.57 to 8.43) and R² falls from 0.69 to 0.60.

(e) The case statistics in parts (a), (b), and (c) for the model fitted to the complete data set anticipated the outcome of part (d): removing a highly influential case changes the fit of the model, and if the influential case is also a y-outlier, the fit is expected to “improve.” Thus, instead of refitting models with cases deleted, the user can draw similar conclusions from the case statistics of the original fit; this is how these statistics are meant to be used. Other statistics, such as DFFITS and DFBETAS, can also be used to determine how each suspect case affects the overall model fit.
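DFFITS can likewise be computed from the original fit as $\text{DFFITS}_i = t_i\sqrt{h_i/(1-h_i)}$, where $t_i$ is RStudent. The sketch below uses the printed values for cars A, M, and O; the $2\sqrt{p/n}$ cutoff is a common rule of thumb assumed here, not one used in the assignment.

```python
import math

n, p = 16, 2
# (RStudent, Hat Diag H) copied from the SAS case statistics
cases = {"A": (-3.9150, 0.0780), "M": (1.4689, 0.1532), "O": (1.1110, 0.2730)}

# DFFITS_i = t_i * sqrt(h_i / (1 - h_i)): scaled change in the i-th fitted
# value when case i is deleted, computed here without refitting.
dffits = {car: t * math.sqrt(h / (1 - h)) for car, (t, h) in cases.items()}

cutoff = 2 * math.sqrt(p / n)   # assumed size-adjusted rule of thumb, about 0.707
flagged = [car for car, v in dffits.items() if abs(v) > cutoff]

for car, v in dffits.items():
    print(f"{car}: DFFITS = {v:.4f}")
print(f"cutoff = {cutoff:.3f}, flagged: {flagged}")
```

With these inputs only car A exceeds the cutoff; car O's DFFITS (about 0.68) falls just below it.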

The SAS System

The REG Procedure Model: MODEL1

Dependent Variable: y MPG

05:51 Monday, December 02, 2013 1

Number of Observations Read 16

Number of Observations Used 16

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1   262.30984   262.30984      30.60    <.0001
Error             14   120.01453     8.57247
Corrected Total   15   382.32437

Root MSE 2.92788 R-Square 0.6861

Dependent Mean 21.71875 Adj R-Sq 0.6637

Coeff Var 13.48087

Parameter Estimates

                             Parameter    Standard
Variable   Label        DF    Estimate       Error  t Value  Pr > |t|
Intercept  Intercept     1    41.57248     3.66300    11.35    <.0001
x          Weight(lbs.)  1    -0.00681     0.00123    -5.53    <.0001


Output Statistics

                                              DFBETAS
Obs  Car  Hat Diag H  Cov Ratio    DFFITS  Intercept         x
  1  A        0.0780     0.2649   -1.1390    -0.7017    0.5082
  2  B        0.0628     1.2155   -0.0886    -0.0237    0.0062
  3  C        0.1254     1.1112   -0.4149    -0.3465    0.2938
  4  D        0.0782     1.1924    0.1734    -0.0452    0.0777
  5  E        0.1109     1.2972    0.0672    -0.0334    0.0444
  6  F        0.1247     1.2746    0.1905    -0.1049    0.1346
  7  G        0.1379     1.1256   -0.4404     0.2599   -0.3257
  8  H        0.0653     1.1686    0.1662     0.0664   -0.0346
  9  I        0.0721     1.0950    0.2629    -0.0452    0.0961
 10  J        0.0810     1.2597   -0.0323     0.0095   -0.0154
 11  K        0.1628     1.3843   -0.0301     0.0194   -0.0236
 12  L        0.1794     1.3776    0.1912    -0.1287    0.1544
 13  M        0.1532     1.0074    0.6248     0.5508   -0.4807
 14  N        0.0985     1.2746    0.0812     0.0611   -0.0491
 15  O        0.2730     1.3306    0.6808     0.6509   -0.5978
 16  P        0.1967     1.3895   -0.2480    -0.2286    0.2048

Output Statistics

Obs  Car  Dependent  Predicted  Std Error  Residual  Std Error   Student  Cook's  RStudent
           Variable      Value  Mean Pred             Residual  Residual       D
  1  A      16.0000    23.7375     0.8179   -7.7375      2.811    -2.752   0.321   -3.9150
  2  B      21.0000    22.0017     0.7338   -1.0017      2.834    -0.353   0.004   -0.3421
  3  C      22.8000    25.7797     1.0367   -2.9797      2.738    -1.088   0.085   -1.0960
  4  D      21.4000    19.6872     0.8189    1.7128      2.811     0.609   0.016    0.5951
  5  E      18.7000    18.1556     0.9750    0.5444      2.761     0.197   0.002    0.1903
  6  F      19.1000    17.6791     1.0340    1.4209      2.739     0.519   0.019    0.5047
  7  G      14.3000    17.2706     1.0874   -2.9706      2.718    -1.093   0.096   -1.1010
  8  H      24.4000    22.5803     0.7484    1.8197      2.831     0.643   0.014    0.6288
  9  I      22.8000    20.1297     0.7863    2.6703      2.820     0.947   0.035    0.9431
 10  J      19.2000    19.5170     0.8332   -0.3170      2.807    -0.113   0.001   -0.1089
 11  K      16.4000    16.5899     1.1813   -0.1899      2.679   -0.0709   0.000   -0.0683
 12  L      17.3000    16.1815     1.2401    1.1185      2.652     0.422   0.019    0.4090
 13  M      30.4000    26.5966     1.1460    3.8034      2.694     1.412   0.180    1.4689
 14  N      25.5000    24.7926     0.9190    0.7074      2.780     0.254   0.004    0.2458
 15  O      31.9000    29.1493     1.5298    2.7507      2.496     1.102   0.228    1.1110
 16  P      26.3000    27.6517     1.2985   -1.3517      2.624    -0.515   0.032   -0.5011


The SAS System

The REG Procedure Model: MODEL1

Dependent Variable: y MPG

02:44 Wednesday, November 06, 2013 1

Number of Observations Read 15

Number of Observations Used 15

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1   292.36215   292.36215      69.01    <.0001
Error             13    55.07785     4.23676
Corrected Total   14   347.44000

Root MSE 2.05834 R-Square 0.8415

Dependent Mean 22.10000 Adj R-Sq 0.8293

Coeff Var 9.31375

Parameter Estimates

                             Parameter    Standard
Variable   Label        DF    Estimate       Error  t Value  Pr > |t|
Intercept  Intercept     1    43.37935     2.61617    16.58    <.0001
x          Weight(lbs.)  1    -0.00725  0.00087239    -8.31    <.0001


The SAS System

The REG Procedure Model: MODEL1

Dependent Variable: y MPG

02:46 Wednesday, November 06, 2013 1

Number of Observations Read 15

Number of Observations Used 15

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1   162.14910   162.14910      19.23    0.0007
Error             13   109.60690     8.43130
Corrected Total   14   271.75600

Root MSE 2.90367 R-Square 0.5967

Dependent Mean 21.04000 Adj R-Sq 0.5656

Coeff Var 13.80071

Parameter Estimates

                             Parameter    Standard
Variable   Label        DF    Estimate       Error  t Value  Pr > |t|
Intercept  Intercept     1    39.20810     4.21015     9.31    <.0001
x          Weight(lbs.)  1    -0.00608     0.00139    -4.39    0.0007
