Stat479 Assignment #6 Solution Key Fall 2013
Problem 1
(a)
Source            d.f.   SS           MS           F       p-value
Regression        1      2059.78145   2059.78145   29.59   <.0001
Error             14     974.65605    69.61829
Corrected Total   15     3034.43750
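The entries of the ANOVA table are linked by simple arithmetic. As a quick check, the following Python sketch recomputes the mean squares and the F statistic from the sums of squares and degrees of freedom reported above. The variable names are illustrative only and are not part of the SAS output.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_reg, df_reg = 2059.78145, 1
ss_err, df_err = 974.65605, 14

# A mean square is a sum of squares divided by its degrees of freedom.
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err

# The F statistic is the ratio of the two mean squares.
f_value = ms_reg / ms_err

# The corrected total sum of squares is the sum of the two components.
ss_total = ss_reg + ss_err

print(f"MS(Error) = {ms_err:.5f}")
print(f"F         = {f_value:.2f}")
print(f"SS(Total) = {ss_total:.5f}")
```

The printed values can be compared directly with the MS, F and Corrected Total entries in the table.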
(b) $\hat{\beta}_0 = 148.05068$, s.e.$(\hat{\beta}_0) = 11.56292$; $\hat{\beta}_1 = -1.02359$, s.e.$(\hat{\beta}_1) = 0.18818$
$\hat{y} = 148.05068 - 1.02359\,x$
(c) The expected loss in mean muscle mass, E(y), for a 1-year increase in age is 1.02359. Thus the
expected loss in mean muscle mass for a 5-year increase in age is 5 × 1.02359 = 5.11795.
(d) $R^2 = 0.6788 = 67.88\%$
This means that 67.88% of the variability in muscle mass is explained by the linear regression model with age as the explanatory variable.
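The value in part (d) follows directly from the ANOVA sums of squares in part (a). The short Python sketch below recomputes it, together with the adjusted version that SAS reports as Adj R-Sq. The variable names are illustrative, and n = 16 is the number of observations used in the fit.

```python
# Sums of squares from the ANOVA table in part (a).
ss_reg = 2059.78145      # regression (model) sum of squares
ss_total = 3034.43750    # corrected total sum of squares

n = 16   # observations used in the fit
p = 2    # estimated parameters: intercept and slope

# R-square: share of the total variation accounted for by the regression.
r_square = ss_reg / ss_total

# Adjusted R-square penalises for the number of parameters.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - p)

print(f"R-square     = {r_square:.4f}")
print(f"Adj R-square = {adj_r_square:.4f}")
```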
(e) 95% C.I. for 𝛽1: (−1.42720,−0.61998)
We are 95% confident that the expected change in mean muscle mass, E(y), for a 1-year increase in age lies inside the above interval.
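The interval in part (e) has the usual form: estimate ± t critical value × standard error. The Python sketch below reproduces it from the estimates in part (b). The critical value t(0.975; 14) = 2.144787 is an assumption taken from a standard t table, not from the SAS output.

```python
# Slope estimate and its standard error from part (b).
b1 = -1.02359
se_b1 = 0.18818

# Assumed table value: 97.5th percentile of the t distribution with 14 d.f.
T_CRIT = 2.144787

# Two-sided 95% confidence limits for the slope.
margin = T_CRIT * se_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% C.I. for the slope: ({lower:.5f}, {upper:.5f})")
```

Both limits are negative, which is consistent with a loss in mean muscle mass as age increases.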
(f) A t-test for 𝐻0: 𝛽1 = 0 against 𝐻1: 𝛽1 ≠ 0 is:
t-value = −5.44, for which the p-value is <.0001; thus we reject the null hypothesis at 𝛼 = .05.
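The t-value in part (f) is the slope estimate divided by its standard error, and in simple linear regression its square equals the F statistic from part (a). The Python sketch below checks both facts. The critical value 2.144787 for t(0.975; 14) is an assumed table value, and the names are illustrative.

```python
# Slope estimate and its standard error from part (b).
b1 = -1.02359
se_b1 = 0.18818

# Test statistic for H0: beta1 = 0.
t_value = b1 / se_b1

# Assumed table value: 97.5th percentile of the t distribution with 14 d.f.
T_CRIT = 2.144787

# Two-sided test at alpha = .05: reject when |t| exceeds the critical value.
reject_h0 = abs(t_value) > T_CRIT

# With a single predictor, t squared matches the ANOVA F statistic.
f_from_t = t_value ** 2

print(f"t = {t_value:.2f}, reject H0: {reject_h0}")
print(f"t squared = {f_from_t:.2f}")
```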
(g) From the SAS output, the point estimate of E(y) at x = 60, i.e. of 𝜇(60), is 86.6353 (observation 17 in the Output Statistics).
A 95% confidence interval for the mean muscle mass, E(y), at x = 60 is (82.1579, 91.1127).
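The numbers in part (g) can be recovered by hand from the fitted line. The Python sketch below evaluates the fitted equation at x = 60 and forms the interval from the standard error of the mean prediction that SAS lists for observation 17 (2.0876). The critical value 2.144787 for t(0.975; 14) is an assumed table value.

```python
# Fitted coefficients from part (b).
b0 = 148.05068
b1 = -1.02359

x_new = 60          # age at which the mean response is estimated
se_mean = 2.0876    # "Std Error Mean Predict" for observation 17 in the SAS output

# Assumed table value: 97.5th percentile of the t distribution with 14 d.f.
T_CRIT = 2.144787

# Point estimate of the mean response at x_new.
y_hat = b0 + b1 * x_new

# 95% confidence limits for the mean response.
lower = y_hat - T_CRIT * se_mean
upper = y_hat + T_CRIT * se_mean

print(f"Estimated mean at x = {x_new}: {y_hat:.4f}")
print(f"95% C.I. for the mean: ({lower:.4f}, {upper:.4f})")
```

The limits agree with the SAS values up to rounding of the inputs.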
(h)
I. See SAS output attached.
II. See attached plot: the assumption of constant variance as x increases appears to be satisfied, as the residuals are evenly spread around the zero line as x increases.
III. See attached plots: the above is also true of the plot of residuals against the predicted values. The normal probability plot of the studentized residuals does not show a pattern to indicate that the distribution of the errors deviates from a normal distribution.
Simple Linear Regression of Horsepower on Speed
The REG Procedure Model: MODEL1
Dependent Variable: y Muscle Mass
05:47 Monday, December 02, 2013 1
Number of Observations Read 17
Number of Observations Used 16
Number of Observations with Missing Values 1
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    2059.78145       2059.78145    29.59     <.0001
Error             14   974.65605        69.61829
Corrected Total   15   3034.43750
Root MSE 8.34376 R-Square 0.6788
Dependent Mean 86.18750 Adj R-Sq 0.6559
Coeff Var 9.68094
Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept   Intercept   1    148.05068            11.56292         12.80     <.0001     123.25067   172.85068
x           Age         1    -1.02359             0.18818          -5.44     <.0001     -1.42720    -0.61998
Output Statistics
Obs  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Mean (lower, upper)  Residual  Std Error Residual  Student Residual  -2-1 0 1 2  Cook's D
1 82.0000 75.3758 2.8813 69.1960 81.5556 6.6242 7.830 0.846 | |* | 0.048
2 91.0000 82.5410 2.1910 77.8417 87.2402 8.4590 8.051 1.051 | |** | 0.041
3 100.0000 104.0363 3.8883 95.6968 112.3759 -4.0363 7.382 -0.547 | *| | 0.041
4 68.0000 79.4702 2.4241 74.2710 84.6694 -11.4702 7.984 -1.437 | **| | 0.095
5 87.0000 90.7297 2.2469 85.9106 95.5488 -3.7297 8.036 -0.464 | | | 0.008
6 73.0000 73.3287 3.1527 66.5667 80.0906 -0.3287 7.725 -0.0425 | | | 0.000
7 78.0000 78.4466 2.5252 73.0307 83.8625 -0.4466 7.952 -0.0562 | | | 0.000
8 80.0000 90.7297 2.2469 85.9106 95.5488 -10.7297 8.036 -1.335 | **| | 0.070
9 65.0000 70.2579 3.5955 62.5463 77.9695 -5.2579 7.529 -0.698 | *| | 0.056
10 84.0000 81.5174 2.2557 76.6793 86.3554 2.4826 8.033 0.309 | | | 0.004
11 116.0000 101.9892 3.5764 94.3186 109.6597 14.0108 7.538 1.859 | |*** | 0.389
12 76.0000 88.6825 2.1358 84.1017 93.2633 -12.6825 8.066 -1.572 | ***| | 0.087
13 97.0000 101.9892 3.5764 94.3186 109.6597 -4.9892 7.538 -0.662 | *| | 0.049
14 100.0000 93.8004 2.5120 88.4128 99.1881 6.1996 7.957 0.779 | |* | 0.030
15 105.0000 97.8948 2.9973 91.4663 104.3233 7.1052 7.787 0.912 | |* | 0.062
16 77.0000 68.2107 3.9082 59.8285 76.5929 8.7893 7.372 1.192 | |** | 0.200
17 . 86.6353 2.0876 82.1579 91.1127 . . . .
Sum of Residuals 0
Sum of Squared Residuals 974.65605
Predicted Residual SS (PRESS) 1277.86656
Problem 2
The case statistics and the plots shown (see attached SAS outputs for this part) show clearly that
(a) Car O is an x-outlier. The Hat Diag for this case is 0.273, which is markedly larger than the other hat values and also exceeds the cutoff 4/16 = 0.25. It stands well away from the other cars in the x-direction in the plot of MPG vs. Weight. Clearly, several plots shown in the diagnostics panel are affected by this case.
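The leverage screen in part (a) can be sketched in a few lines of Python. The hat values are copied from the Hat Diag H column of the SAS output. Reading the cutoff 4/16 as 2p/n with p = 2 parameters and n = 16 cars is an assumption about how the key arrived at it.

```python
# Hat (leverage) values copied from the "Hat Diag H" column of the SAS output.
hat = {
    "A": 0.0780, "B": 0.0628, "C": 0.1254, "D": 0.0782,
    "E": 0.1109, "F": 0.1247, "G": 0.1379, "H": 0.0653,
    "I": 0.0721, "J": 0.0810, "K": 0.1628, "L": 0.1794,
    "M": 0.1532, "N": 0.0985, "O": 0.2730, "P": 0.1967,
}

n = len(hat)   # number of cars
p = 2          # estimated parameters: intercept and slope

# Assumed rule of thumb: flag cases whose leverage exceeds 2p/n.
cutoff = 2 * p / n
flagged = [car for car, h in hat.items() if h > cutoff]

# Sanity check: the hat values sum to p (up to rounding in the listing).
hat_total = sum(hat.values())

print(f"cutoff = {cutoff}, flagged = {flagged}")
print(f"sum of hat values = {hat_total:.4f}")
```

Only Car O exceeds the cutoff, which matches the conclusion in part (a).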
(b) Car A is a possible y-outlier. Its observed value is much smaller than the value predicted by the fitted line. The absolute value of RStudent for case A is 3.91, which is larger than the 5% critical value of 3.62 from Table B.10.
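RStudent (the studentized deleted residual) can be obtained from the ordinary studentized residual without refitting the model. The Python sketch below applies the standard conversion to the Student Residual values listed by SAS for Cars A and O and compares the results with the critical value 3.62 quoted above. The function name is illustrative, and small differences from the SAS column come from rounding of the inputs.

```python
import math

def rstudent(student_resid, n, p):
    """Convert an internally studentized residual to the deleted (external) one."""
    r2 = student_resid ** 2
    return student_resid * math.sqrt((n - p - 1) / (n - p - r2))

n, p = 16, 2        # cars in the full fit, parameters in the model
CRITICAL = 3.62     # 5% critical value quoted in part (b)

# "Student Residual" values copied from the SAS output.
student = {"A": -2.752, "O": 1.102}

for car, r in student.items():
    t = rstudent(r, n, p)
    print(f"Car {car}: RStudent = {t:.3f}, y-outlier: {abs(t) > CRITICAL}")
```

Car A is flagged and Car O is not, in line with parts (a) and (b).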
(c) The two largest Cook's D values belong to cases A and O above. For Car O, this statistic is large primarily because it is a high-leverage case (i.e. the Hat Diag is large) and not because it is a y-outlier. Thus it fits the model well but has very high influence. For Car A, Cook's D is large clearly because it is a y-outlier, and it therefore does not fit the model well at all.
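The argument in part (c) can be made concrete by splitting Cook's D into a residual factor and a leverage factor, using the standard identity D = (r²/p) · h/(1 − h), where r is the studentized residual and h the hat value. The Python sketch below does this for Cars A and O with values copied from the SAS output. The function and variable names are illustrative.

```python
def cooks_d_factors(student_resid, hat, p):
    """Return (residual factor, leverage factor, Cook's D) for one case."""
    resid_factor = student_resid ** 2 / p
    leverage_factor = hat / (1 - hat)
    return resid_factor, leverage_factor, resid_factor * leverage_factor

p = 2  # estimated parameters: intercept and slope

# (Student Residual, Hat Diag H) copied from the SAS output.
cases = {"A": (-2.752, 0.0780), "O": (1.102, 0.2730)}

for car, (r, h) in cases.items():
    resid_f, lev_f, d = cooks_d_factors(r, h, p)
    print(f"Car {car}: residual factor = {resid_f:.3f}, "
          f"leverage factor = {lev_f:.3f}, Cook's D = {d:.3f}")
```

For Car A the residual factor dominates, while for Car O the leverage factor is the larger contributor relative to the other cars, which is the distinction drawn in part (c).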
(d) The following is a summary of statistics resulting from fitting the model to three different data sets:
Model       Estimated $\beta_0$   Estimated $\beta_1$   MSE    $R^2$
All data    41.57                 -.00681               8.57   .69
A deleted   43.38                 -.00725               4.24   .84
O deleted   39.21                 -.00608               8.43   .60
• Fitting the model with Car A deleted improves the model significantly: MSE drops from 8.57 to 4.24 and $R^2$ rises from .69 to .84. The case statistics indicate that Car A is still influential but not a y-outlier, and the plots also support this.
• The model fitted with Car O deleted does not give a better-fitting model overall: MSE barely changes (8.43 versus 8.57) and $R^2$ falls from .69 to .60.
(e) Clearly, the case statistics in parts (a), (b) and (c) for the model fitted to the complete data set anticipated the outcome of part (d). That is, removing a highly influential case affects the fit of the model. If the influential case is a y-outlier, the model fit is expected to “improve.” Thus, instead of refitting models with cases deleted, the user can use the case statistics from the original fit to reach similar conclusions. This is the way these statistics are meant to be used. Other statistics such as DFFITS and DFBETAS can also be used to determine how each of the suspected cases affects the overall model fit.
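As an illustration of the point about DFFITS, this statistic can itself be computed from the original fit alone, as RStudent × sqrt(h/(1 − h)). The Python sketch below reproduces the SAS DFFITS values for the three cars with the largest magnitudes from the RStudent and Hat Diag H columns. The function and variable names are illustrative.

```python
import math

def dffits(rstudent, hat):
    """DFFITS from the deleted studentized residual and the leverage."""
    return rstudent * math.sqrt(hat / (1 - hat))

# (RStudent, Hat Diag H) copied from the SAS output for the full data set.
cases = {
    "A": (-3.9150, 0.0780),
    "M": (1.4689, 0.1532),
    "O": (1.1110, 0.2730),
}

values = {car: dffits(t, h) for car, (t, h) in cases.items()}

for car, value in values.items():
    print(f"Car {car}: DFFITS = {value:.4f}")

# Case with the largest absolute DFFITS among those listed.
most_influential = max(values, key=lambda car: abs(values[car]))
print("Largest |DFFITS|:", most_influential)
```

No model is refitted here; every input comes from the case statistics of the original fit.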
The SAS System
The REG Procedure Model: MODEL1
Dependent Variable: y MPG
05:51 Monday, December 02, 2013 1
Number of Observations Read 16
Number of Observations Used 16
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    262.30984        262.30984     30.60     <.0001
Error             14   120.01453        8.57247
Corrected Total   15   382.32437
Root MSE 2.92788 R-Square 0.6861
Dependent Mean 21.71875 Adj R-Sq 0.6637
Coeff Var 13.48087
Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept      1    41.57248             3.66300          11.35     <.0001
x           Weight(lbs.)   1    -0.00681             0.00123          -5.53     <.0001
Output Statistics
Obs  Car  Hat Diag H  Cov Ratio  DFFITS  DFBETAS (Intercept)  DFBETAS (x)
1 A 0.0780 0.2649 -1.1390 -0.7017 0.5082
2 B 0.0628 1.2155 -0.0886 -0.0237 0.0062
3 C 0.1254 1.1112 -0.4149 -0.3465 0.2938
4 D 0.0782 1.1924 0.1734 -0.0452 0.0777
5 E 0.1109 1.2972 0.0672 -0.0334 0.0444
6 F 0.1247 1.2746 0.1905 -0.1049 0.1346
7 G 0.1379 1.1256 -0.4404 0.2599 -0.3257
8 H 0.0653 1.1686 0.1662 0.0664 -0.0346
9 I 0.0721 1.0950 0.2629 -0.0452 0.0961
10 J 0.0810 1.2597 -0.0323 0.0095 -0.0154
11 K 0.1628 1.3843 -0.0301 0.0194 -0.0236
12 L 0.1794 1.3776 0.1912 -0.1287 0.1544
13 M 0.1532 1.0074 0.6248 0.5508 -0.4807
14 N 0.0985 1.2746 0.0812 0.0611 -0.0491
15 O 0.2730 1.3306 0.6808 0.6509 -0.5978
16 P 0.1967 1.3895 -0.2480 -0.2286 0.2048
Output Statistics
Obs  Car  Dependent Variable  Predicted Value  Std Error Mean Predict  Residual  Std Error Residual  Student Residual  -2-1 0 1 2  Cook's D  RStudent
1 A 16.0000 23.7375 0.8179 -7.7375 2.811 -2.752 | *****| | 0.321 -3.9150
2 B 21.0000 22.0017 0.7338 -1.0017 2.834 -0.353 | | | 0.004 -0.3421
3 C 22.8000 25.7797 1.0367 -2.9797 2.738 -1.088 | **| | 0.085 -1.0960
4 D 21.4000 19.6872 0.8189 1.7128 2.811 0.609 | |* | 0.016 0.5951
5 E 18.7000 18.1556 0.9750 0.5444 2.761 0.197 | | | 0.002 0.1903
6 F 19.1000 17.6791 1.0340 1.4209 2.739 0.519 | |* | 0.019 0.5047
7 G 14.3000 17.2706 1.0874 -2.9706 2.718 -1.093 | **| | 0.096 -1.1010
8 H 24.4000 22.5803 0.7484 1.8197 2.831 0.643 | |* | 0.014 0.6288
9 I 22.8000 20.1297 0.7863 2.6703 2.820 0.947 | |* | 0.035 0.9431
10 J 19.2000 19.5170 0.8332 -0.3170 2.807 -0.113 | | | 0.001 -0.1089
11 K 16.4000 16.5899 1.1813 -0.1899 2.679 -0.0709 | | | 0.000 -0.0683
12 L 17.3000 16.1815 1.2401 1.1185 2.652 0.422 | | | 0.019 0.4090
13 M 30.4000 26.5966 1.1460 3.8034 2.694 1.412 | |** | 0.180 1.4689
14 N 25.5000 24.7926 0.9190 0.7074 2.780 0.254 | | | 0.004 0.2458
15 O 31.9000 29.1493 1.5298 2.7507 2.496 1.102 | |** | 0.228 1.1110
16 P 26.3000 27.6517 1.2985 -1.3517 2.624 -0.515 | *| | 0.032 -0.5011
The SAS System
The REG Procedure Model: MODEL1
Dependent Variable: y MPG
02:44 Wednesday, November 06, 2013 1
Number of Observations Read 15
Number of Observations Used 15
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    292.36215        292.36215     69.01     <.0001
Error             13   55.07785         4.23676
Corrected Total   14   347.44000
Root MSE 2.05834 R-Square 0.8415
Dependent Mean 22.10000 Adj R-Sq 0.8293
Coeff Var 9.31375
Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept      1    43.37935             2.61617          16.58     <.0001
x           Weight(lbs.)   1    -0.00725             0.00087239       -8.31     <.0001
The SAS System
The REG Procedure Model: MODEL1
Dependent Variable: y MPG
02:46 Wednesday, November 06, 2013 1
Number of Observations Read 15
Number of Observations Used 15
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    162.14910        162.14910     19.23     0.0007
Error             13   109.60690        8.43130
Corrected Total   14   271.75600
Root MSE 2.90367 R-Square 0.5967
Dependent Mean 21.04000 Adj R-Sq 0.5656
Coeff Var 13.80071
Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept      1    39.20810             4.21015          9.31      <.0001
x           Weight(lbs.)   1    -0.00608             0.00139          -4.39     0.0007