An Observational Study of the Predictors of Mileage per Gallon for a Car

Lehigh University
Bethlehem, PA 18015
Department of Mathematics

An Observational Study of Predictors for Mileage per Gallon of a Car

Collaborators: James Patounas, Zhengang Xu, Ali Yeager
April 22, 2013



Abstract

The globalization of production has forced businesses to invest in their products on a large scale. In light of current consumer trends, car manufacturers must choose which qualities of their product to emphasize while maintaining reasonable gas consumption. Through CAFE standards, the United States government has placed an artificial floor on the fuel economy that a car must achieve. To meet this standard, car manufacturers must improve technologies that allow the car to operate more efficiently. To do so, however, one must know which characteristic will improve efficiency the most. We sought to analyze the interplay between various characteristics of a car and its gas consumption, and in doing so to establish a relationship between these characteristics and mileage per gallon. We retrieved data from the United States Environmental Protection Agency, which provided 414 observations of cars for the years 1975 through 2011. We chose 4 characteristics as our regressors to observe how each affects the response variable, mileage per gallon. Our results should form a foundation on which a larger study can be conducted to analyze which characteristics affect efficiency. An increase in efficiency would decrease demand for gas and, in turn, for oil, almost half of the U.S. supply of which is imported. This imported oil is subject to various price shocks, which have a great impact on the U.S. economy.

Contents

Introduction
Purpose
Limitations
Analysis
Results
Conclusions
Appendix
    Data Set: EPA
    Data Set: ML
    SAS Code
        regression.sas
        datastep.sas
        inspect.sas
        assumptions.sas
        linearize.sas
        Lassumptions.sas
        model.sas
        rr.sas
Works Cited
Introduction

According to the U.S. Department of Energy, the United States' dependence on imported oil has risen dramatically in the past four decades. Approximately one half of the oil used in the United States is imported. At the same time, United States oil production has steadily decreased. A majority of the imported oil is controlled by OPEC (Organization of the Petroleum Exporting Countries). OPEC is able to manipulate prices, which has a large effect on how much the United States spends annually on oil. This price manipulation, combined with large price shocks to oil in general, has had a great impact on the United States economy in the last decade.
Nearly $2 trillion was spent on oil alone from 2004 to 2008. Each shock was later followed by a recession. Unfortunately, the United States is not able to cut oil imports completely. However, if we find alternative ways to reduce our use of oil, we can reduce the amount imported. One way to reduce the use of oil is to reduce demand by implementing automotive manufacturing requirements. For instance, according to Light Duty Automotive Technology and Fuel Economy Trends: 1975 Through 2011, "CAFE (Corporate Average Fuel Economy) standards have been in place since 1978. NHTSA (National Highway Traffic Safety Administration) has the responsibility for setting and enforcing CAFE standards. For MY (model year) 2011, the footprint-based CAFE standards are projected to achieve average industry-wide compliance levels of 30.4 mpg for cars (including a 27.8 mpg alternative minimum standard for domestic cars for all manufacturers)" (United States Environmental Protection Agency). Hence, the United States government has placed an artificial floor on the fuel economy that a car must achieve. According to an article from the U.S. Department of Energy on fuel economy, Congress recently passed legislation to decrease our dependence on oil by increasing corporate average fuel economy (CAFE) standards on new cars and trucks to 35 mpg by model year 2020. This could reduce our petroleum use by 25 billion gallons by 2030 (United States Department of Energy).

Purpose

An alternative approach for achieving the aforementioned goals would be to implement a public education program about fuel economy while simultaneously creating programs that enforce structured reforms to current automotive production. Of course, in developing government initiatives one must know which automotive factors should be considered.
By knowing these factors, the government could develop and implement the most cost-effective initiatives, such as rebate programs and laws, so that car manufacturers are more likely to invest in research and development of the most fuel-efficient technologies. This initial analysis should provide a basis for car manufacturers and the government to investigate further in order to decrease demand for foreign oil. It is imperative that the United States take action to reduce this demand because the price shocks and manipulations from OPEC have been increasingly detrimental to the U.S. economy. With this information, it is our intention to encourage the government and vehicle manufacturers to examine this issue further. At the same time, they could invest in educational and rebate programs in an attempt to direct consumer spending primarily towards vehicles with these new technologies. This will provide evidence that helps society recognize what to look for when purchasing a new vehicle. If society as a whole becomes more aware of the characteristics that help decrease this demand, and of the long-run benefits, it is likely to react accordingly. The long-run effects for society include better mileage per gallon on average with these new technologies, as well as lower prices at the pump if we continue to decrease the demand for foreign oil. Together, these should influence society to react to this issue because it will affect them directly and monetarily. The decrease in the annual budget spent on gas will cause society's disposable income to rise, allowing consumers to spend their money elsewhere, helping the economy become more stable and thus reducing the probability of future economic turmoil.
Using a data set published annually by the United States Environmental Protection Agency, our analysis establishes a relationship between certain characteristics of a car and its gas consumption. In particular, by analyzing the data obtained we observed the relationship of vehicular weight, engine displacement, engine horsepower, and 0-60 acceleration time with mileage per gallon. We hope that this analysis can provide a basis for a larger experiment that can be developed by the government to help further reduce the demand for oil, and thus reduce the probability of future recessions.

Limitations

We obtained copious data covering model years 1975 through 2011, including mileage per gallon, weight, interior cab volume in cubic feet, engine displacement, horsepower, and zero-to-sixty acceleration times. However, in order to properly analyze this data, we ran into the problem of working with time series data. When we plotted the residuals, we observed that the errors were correlated. To overcome this problem, we normalized the data within each model year so that we could better compare across time. Although we were able to acquire a data set consisting of 414 observations across almost 4 decades, we were still very limited by the data we did find. There are many other factors that affect the mileage per gallon of a car. Many new technologies are being developed that intend to increase efficiency. According to the U.S. Department of Energy and the Light-Duty Automotive Technology articles, the factors that affect a car's efficiency include the number of cylinders in the engine, variable valve timing, cylinder deactivation, turbochargers and superchargers, direct fuel injection, and new technologies for manual transmissions. According to the U.S. Department of Energy, on average, variable valve timing, cylinder deactivation, turbochargers and superchargers, direct fuel injection, and manual transmission technologies increase efficiency by 5%, 7.5%, 7.5%, 11-13%, and 6-7% respectively. When and how long the valves open (timing) and how much the valves move (lift) both affect engine efficiency. Cylinder deactivation merely deactivates certain cylinders when they are not in use, turning an 8-cylinder engine into a 4-cylinder and a 6-cylinder engine into a 3-cylinder. Turbochargers and superchargers allow more compressed air and fuel to be injected into the cylinders, generating extra power from each explosion. This allows manufacturers to use smaller engines without sacrificing performance. With direct fuel injection, fuel is injected directly into the cylinder so that the timing and shape of the fuel mist can be precisely controlled. This allows higher compression ratios and more efficient fuel intake, which deliver higher performance with lower fuel consumption (United States Department of Energy). Thus it is evident that these factors also play a large part in the technologies necessary to achieve the CAFE standards floor for mileage per gallon. If a larger study were to develop from our analysis, the government and vehicle manufacturers would have sufficient funds and information to include these other important characteristics and run a regression analysis that takes them into consideration (United States Department of Energy).

Analysis

We observed that our data was stratified according to model year. To avoid dealing with time-series data, we found the average of each variable within each model year and then divided each observation by the average of its respective year, thus normalizing the data for cross-generational comparison. Upon analysis of our raw data in the scatterplots and residual plots, we found problems of heteroskedasticity and multicollinearity.
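The year-by-year normalization described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' SAS data step; the function name `normalize_by_year` and the dictionary layout are assumptions made here, while the "N" prefix mirrors the report's normalized variable names (NMPG, NWT) and the four sample rows are taken from the EPA data set in the Appendix.

```python
from collections import defaultdict

def normalize_by_year(rows, year_key, value_keys):
    """Divide each value by the mean of that variable within the row's model year."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row[year_key]] += 1
        for key in value_keys:
            sums[row[year_key]][key] += row[key]

    normalized = []
    for row in rows:
        year = row[year_key]
        out = {year_key: year}
        for key in value_keys:
            year_mean = sums[year][key] / counts[year]
            out["N" + key] = row[key] / year_mean  # e.g. MPG -> NMPG
        normalized.append(out)
    return normalized

# Observations 1, 2, 13 and 14 from the EPA data set in the Appendix (MPG and WT only).
rows = [
    {"MY": 1975, "MPG": 15.1, "WT": 3016},
    {"MY": 1975, "MPG": 17.6, "WT": 2721},
    {"MY": 1976, "MPG": 14.9, "WT": 3032},
    {"MY": 1976, "MPG": 20.2, "WT": 2688},
]
normalized = normalize_by_year(rows, "MY", ["MPG", "WT"])
for row in normalized:
    print(row)
```

By construction, each normalized variable averages to 1 within a model year, so a value above 1 means "above that year's average" regardless of the decade.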
To address the problem of heteroskedasticity, we applied a power transformation to the response values and the logarithmic function to two of the predictor variables. While our intent had been to perform cross-validation, we were unable to eliminate the multicollinearity issues by any method known to the experimenters. As such, we performed ridge regression in order to ascertain a model using the predictors. In particular, this was done via the step-wise selection method of regression. We also executed the principal component regression method.

Results

We began our analysis by subsetting the data obtained from the EPA (United States Environmental Protection Agency) so that it only included the variables weight (NWT), engine displacement (NDISP), horsepower (NHP), and zero to sixty acceleration time (NACC), along with mileage per gallon (NMPG). After completing the aforementioned transformation of the stratified raw data, we completed a graphical inspection of the result. All subsequent computational analysis was performed with Statistical Analysis Software (SAS). It follows from Figure 1 that there is a significant nonlinear relationship between weight and mileage per gallon as well as between displacement and mileage per gallon.

Figure 1

Furthermore, it is clear that weight, displacement, and horsepower each have a negative relationship with mileage per gallon. It is important to note that Figure 1 shows what appears to be an extremely strong linear relationship between engine displacement and vehicular weight, engine displacement and engine horsepower, and vehicular weight and engine horsepower. There also appears to be a slight linear relationship between acceleration time and the other three regressors. This suggests that multicollinearity is present in the model.
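The pairwise linear relationships read off the scatterplot matrix can also be checked numerically with a correlation coefficient. The sketch below is an illustration only: the helper `pearson` is written here for the example, and it uses the raw (not normalized) 1977 observations, IDs 25 through 36 of the EPA data set in the Appendix, rather than the full normalized data the report analyzes.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Model year 1977 (IDs 25-36) from the EPA data set in the Appendix.
wt = [3062, 2544, 2966, 4129, 4474, 4482, 2801, 4410, 4713, 4000, 4405, 4500]
disp = [183, 116, 174, 291, 332, 366, 134, 297, 365, 258, 324, 342]
hp = [134, 83, 98, 131, 151, 166, 85, 135, 161, 125, 145, 147]

r_wt_disp = pearson(wt, disp)
r_wt_hp = pearson(wt, hp)
r_disp_hp = pearson(disp, hp)
print(f"WT-DISP: {r_wt_disp:.3f}  WT-HP: {r_wt_hp:.3f}  DISP-HP: {r_disp_hp:.3f}")
```

Correlations close to 1 among the regressors are the numerical counterpart of the near-linear point clouds described for Figure 1.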
Figure 2

Figure 3

Performing an initial regression we see from Figure 3, specifically the QQ-plot, that our data is approximately normal. That is, for each value of $X$ there is a normal distribution of $Y$ values (Dowdy). However, by observing Figure 3 we see that for multiple regressor variables there appears to be a nonlinear relationship, and hence the errors are not randomly scattered. As regression assumes that the distribution of $Y$ for each $X$ has the same variance, this suggests heteroskedasticity is inherent in the data; that is, the distribution of $Y$ for each $X$ does not have the same variance (Dowdy). Furthermore, as none of the residual plots appear to follow a wave-like pattern, it is unlikely that autocorrelation exists. In order to ascertain further information about the potential multicollinearity issues, the variance inflation factors (VIF) for each variable were calculated. Upon inspecting Figure 4 we see that for all four variables the VIF is larger than, or very close to, 10.

Figure 4

In order to verify this assessment we also examined the condition index. By inspecting Figure 5, one can see that for 3 of our 4 variables the condition index is significantly larger than 15. Thus we determined that a multicollinearity problem existed within the data (Chatterjee).

Figure 5

Reiterating, we normalized the data across each model year by dividing each variable for a particular observation by the average of that variable within the stratum (model year) to which the observation belonged. Running the Durbin-Watson test on this normalized set of data, we observe that the statistic is 2.188. We define $d$ to be the Durbin-Watson statistic and we estimate the first-order autocorrelation $\rho$ of the errors with $\hat{\rho}$. The approximate relationship between the two values is $d \approx 2(1-\hat{\rho})$. If $\hat{\rho}$ is close to 0, $d$ will be close to 2; since 2.188 is close to 2, we concluded that there is no autocorrelation present in our model.
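To make the Durbin-Watson reasoning above concrete, the following Python sketch computes the statistic from a residual sequence and inverts the approximation to get the implied autocorrelation. The function names and the symbols `d` and `rho_hat` are notation assumed here; the report obtained its value of 2.188 from SAS, and its residuals are not reproduced, so only the reported statistic is plugged in.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def implied_rho(d):
    """Invert the approximation d = 2 * (1 - rho_hat)."""
    return 1 - d / 2

d_reported = 2.188                 # value reported in the text
rho_hat = implied_rho(d_reported)  # about -0.094
print(f"implied first-order autocorrelation: {rho_hat:.3f}")

# Toy residual sequences (made up for illustration) showing the two extremes.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs push d above 2
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical residuals give d = 0
```

An implied autocorrelation of roughly -0.094 is small in magnitude, which is consistent with the conclusion drawn in the text.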
Figure 4

It should be noted that we ran an analysis to determine the influential points and leverage points via the studentized residuals and leverage values. One can readily obtain this information using the code located in the appendix of this report. While many points were deemed to be either influential or leverage points, we opted to leave these points in the data: we suspect that the EPA uses very strict testing methods, and as such we deemed that the points were not a result of experimental error.

Figure 5

We elected to address the heteroskedasticity found in this data by using a power transformation on the response variable. We maximized the log-likelihood function by means of the Box-Cox transformation. One can see from Figure 5 that the Box-Cox procedure gave an optimal $\lambda$ of 0.25. That is, for $\lambda = 0.25$ R-squared and the log-likelihood function are both maximized, and so we transformed each response value using this power. While the Box-Cox transformation improved the heteroskedasticity issues, it did not remove them. We subsequently applied the logarithmic function to both NHP and NDISP. The resultant residuals plot is Figure 6.

Figure 6

Here we have included the results from regression analysis of the transformed data. It should be noted that all of the assumptions for linear regression are satisfactorily met with this modified data. Furthermore, the Durbin-Watson statistic suggests that autocorrelation is not present in our model. However, analysis of the data still results in large VIFs and condition indices, and as such we determined that multicollinearity issues are still present in our model.

Figure 7

Figure 8

Figure 9

Figure 10

It should be noted that when there is multicollinearity among independent variables, the variance of the estimated coefficients will be very large and consequently the estimated slopes are unstable.
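The instability noted above can be stated with the standard ordinary least squares variance formula. The notation is assumed here rather than taken from the report: $\sigma^2$ is the error variance, $x_{ij}$ is the value of predictor $j$ for observation $i$, and $R_j^2$ is the R-squared obtained by regressing predictor $j$ on the remaining predictors.

```latex
\[
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}.
\]
```

As predictor $j$ becomes nearly a linear combination of the others, $R_j^2$ approaches 1 and the variance of its coefficient grows without bound. A VIF of 10, the threshold used in this report, corresponds to $R_j^2 = 0.9$.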
As we were unable to eliminate multicollinearity via transformations, and the predictor variables were deemed to be theoretically important to the model, we applied the ridge regression method.

Figure 11

After using SAS to develop Figure 11, we used this graph to determine a ridge parameter. We see that the VIFs in the model for NMPG decrease as the ridge parameter increases. When the ridge parameter is larger than approximately 0.01, the VIF is less than 10 and so the multicollinearity problem is not serious. However, since ridge regression increases the bias of the estimates in exchange for reducing the large variance of the coefficients, we elected to use a small ridge parameter. In particular, we elected to use k=0.02 as the ridge parameter.

Figure 12

From Figure 12 we see that the VIFs are all acceptable in our final model. Furthermore, the final model output by SAS can be seen in Figure 13.

Figure 13

Alternatively, we completed a principal-component regression on the modified data set. That is, instead of regressing the dependent variable on the independent variables directly, we used principal components of the independent variables. By inspecting Figure 15 one can see the principal components that we used in determining the final model.

Figure 14

We next ran an ordinary least squares regression on the selected components. The components that are most correlated with the dependent variable were selected and introduced into our model. The results of the principal components can be seen in Figure 15 and Figure 16.

Figure 15

Figure 16

Ultimately we arrived at a final model, the parameter estimates of which are located in Figure 17.

Figure 17

Conclusions

In summary, in order to avoid autocorrelation issues, we normalized the data so that we could compare across time periods. We then ran tests on our raw data and determined that there were issues with both heteroskedasticity and multicollinearity.
While we were able to address the heteroskedasticity issues in a satisfactory manner through transformations of the data, we were unable to eliminate the multicollinearity without dropping predictor variables. As doing this would be contradictory to the widely established theory on vehicular mileage, we conducted both ridge regression and principal-component regression in order to ascertain models based upon our data set. We begin addressing our determinations with a side-by-side comparison of the estimated regression parameters for both regression methods utilized. See Figure 18. First and foremost, each predictor estimate is negative, as we anticipated. It is interesting to note that the estimate for WT was very close in both regression techniques. We see that the conclusions for engine displacement, engine horsepower, and 0-60 acceleration are not in close agreement. Ultimately, our results are in agreement with the theory in that all of the predictors used do have an effect on mileage per gallon. However, the importance of each factor is debatable. In retrospect, this was to be expected. From a theoretical standpoint the engine displacement, horsepower, and 0-60 acceleration all have a significant relationship to the capabilities of a car's engine.

Figure 18

One interesting future study could use bootstrap sampling to ascertain the standard error of these model estimates. Furthermore, for our study we considered the adjusted city mileage per gallon of a car. Potential future adaptations of this study might consider adjusted highway mileage per gallon or adjusted combined mileage per gallon of a vehicle. Also, due to unforeseen issues and time constraints we did not implement vehicular volume in our model. This would be an interesting addition to the model as it likely has an impact on the mileage per gallon.
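The bootstrap idea suggested above as future work could look roughly like the following. This is a simplified sketch under stated assumptions: it resamples observations with replacement and refits a one-predictor least squares slope of MPG on WT, not the report's ridge or principal-component models, and it uses only the raw 1977 rows (IDs 25 through 36) from the EPA data set in the Appendix. The function names, the 1000 resamples, and the seed are choices made for this example.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope for a single predictor."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slope_se(x, y, n_boot=1000, seed=0):
    """Standard deviation of the slope over case-resampled data sets."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip degenerate resamples with no spread in x
            continue
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    return statistics.stdev(slopes)

# Model year 1977 (IDs 25-36) from the EPA data set in the Appendix.
wt = [3062, 2544, 2966, 4129, 4474, 4482, 2801, 4410, 4713, 4000, 4405, 4500]
mpg = [15.5, 21.7, 18.8, 14.2, 12.9, 12.4, 20.2, 12.9, 12.2, 10.7, 12.6, 12.9]

slope = ols_slope(wt, mpg)
slope_se = bootstrap_slope_se(wt, mpg)
print(f"slope: {slope:.5f} mpg per lb, bootstrap SE: {slope_se:.5f}")
```

The same resampling loop would apply to the ridge or principal-component fits by swapping `ols_slope` for a function that refits the chosen model on each resample.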
It would be interesting to significantly expand the breadth of this study by conducting an experiment. First and foremost, some of the issues with multicollinearity could be reduced, but not eliminated, by significantly increasing the sample size. Second, due to the inherent restrictions of an observational study, we have only considered major theoretical factors in car design for predicting mileage per gallon. For our stated purposes one should expand the study to include other car variables as well as some driver-specific variables, such as transmission type, type of fuel used (hybrid, diesel, oil), and classification of major driving (highway, city, mixed).

Appendix

Data Set: EPA

ID   MY    MPG   WT    VOL    DISP  HP   ACC
1    1975  15.1  3016          173  125  12.6
2    1975  17.6  2721          128   82  15.2
3    1975  15.0  3243          207  107  14.2
4    1975  12.9  3958          288  129  14.6
5    1975  10.5  4630          353  158  13.9
6    1975  10.2  5142          410  180  13.5
7    1975  17.4  2834          128   87  14.8
8    1975  10.2  4791          361  162  13.8
9    1975   9.3  5453          410  181  14.0
10   1975   9.0  4000          257  110  15.7
11   1975  11.8  4362          328  148  13.7
12   1975  10.7  4500          349  160  13.1
13   1976  14.9  3032          187  136  12.1
14   1976  20.2  2688          125   80  15.1
15   1976  17.4  3033          181   97  14.7
16   1976  13.9  4025          289  125  15.2
17   1976  12.2  4558          343  156  13.8
18   1976  11.1  5156          410  180  13.4
19   1976  18.7  2902          136   86  15.2
20   1976  12.6  4555          315  142  14.8
21   1976  10.6  5444          416  185  13.7
22   1976   8.1  4073          258  125  14.4
23   1976  12.3  4348          328  139  14.4
24   1976  11.1  4500          346  144  14.3
25   1977  15.5  3062   50.0   183  134  12.4
26   1977  21.7  2544   78.8   116   83  14.0
27   1977  18.8  2966   90.7   174   98  14.2
28   1977  14.2  4129  107.3   291  131  14.6
29   1977  12.9  4474  112.9   332  151  13.9
30   1977  12.4  4482  128.1   366  166  12.8
31   1977  20.2  2801  108.0   134   85  14.9
32   1977  12.9  4410  143.6   297  135  15.1
33   1977  12.2  4713  163.1   365  161  13.6
34   1977  10.7  4000  100.0   258  125  14.2
35   1977  12.6  4405  125.0   324  145  14.0
36   1977  12.9  4500  150.0   342  147  14.0
37   1978  15.0  3079   50.0   187  143  11.8
38   1978  21.7  2584   79.1   120   83  14.1
39   1978  19.4  2842   89.7   159   94  14.4
40   1978  15.8  3552  105.1   236  113  14.5
41   1978  14.4  3820  113.0   292  135  13.4
42   1978  13.0  4394  128.5   357  162  12.8
43   1978  19.2  2805  108.0   134   90  14.3
44   1978  14.6  3836  140.0   258  123  14.4
45   1978  12.4  4664  162.4   354  162  13.4
46   1978   9.8  4000  100.0   258  125  14.2
47   1978  11.8  4409  125.0   315  149  13.7
48   1978  12.2  4500  150.0   338  159  13.1
49   1979  15.3  3026   50.0   180  128  12.2
50   1979  21.7  2450   79.7   113   77  14.4
51   1979  19.0  2847   90.4   155   94  14.2
52   1979  15.4  3624  105.3   246  117  14.4
53   1979  15.0  3710  113.1   272  128  13.6
54   1979  13.7  4210  130.0   339  154  12.9
55   1979  20.3  2711  105.1   123   81  15.1
56   1979  15.2  3758  139.7   249  118  14.7
57   1979  12.8  4467  162.5   333  155  13.4
58   1979  14.8  3127  100.0   162   94  15.0
59   1979  10.2  4385  125.0   314  140  14.2
60   1980  15.6  2954   50.0   180  121  12.3
61   1980  22.1  2459   82.8   116   78  14.4
62   1980  21.3  2640   89.9   128   82  14.7
63   1980  17.4  3185  106.2   186  102  14.4
64   1980  16.8  3362  113.2   229  113  13.8
65   1980  14.7  4130  130.9   314  137  14.0
66   1980  22.6  2591  108.2   113   75  15.4
67   1980  16.5  3535  139.7   228  107  15.0
68   1980  14.7  4423  161.5   324  134  15.2
69   1980  13.1  4457  125.0   307  145  14.1
70   1980  15.9  4500  150.0   240   94  19.8
71   1981  17.1  3005   50.0   202  142  10.6
72   1981  27.6  2164   82.6    92   67  14.5
73   1981  23.3  2604   90.3   124   81  14.7
74   1981  20.7  2825  104.3   142   91  14.2
75   1981  17.8  3346  113.9   220  107  14.2
76   1981  15.4  4108  131.0   304  133  14.3
77   1981  23.6  2531  110.6   108   79  14.4
78   1981  18.1  3285  136.2   193  103  14.5
79   1981  15.0  4394  161.4   313  131  15.3
80   1981  13.4  4458  125.0   292  137  14.8
81   1982  19.7  2726   50.0   147  106  13.0
82   1982  28.5  2193   82.7    95   70  14.6
83   1982  22.8  2657   91.8   133   86  14.5
84   1982  22.2  2794  102.9   128   87  14.6
85   1982  18.4  3321  113.9   211  107  14.2
86   1982  15.6  4034  131.0   292  135  13.9
87   1982  23.9  2580  112.2   109   75  15.3
88   1982  18.5  3384  136.1   205  108  14.3
89   1982  14.4  4396  161.3   306  139  14.6
90   1982  20.7  2500  100.0   151   84  13.7
91   1982  14.8  4242  125.0   284  146  13.5
92   1983  18.4  2756   50.0   146  115  11.8
93   1983  28.4  2273   82.0   100   77  14.2
94   1983  23.4  2688   93.3   136   91  14.0
95   1983  22.2  2844  103.2   141   91  14.4
96   1983  18.4  3316  113.8   212  111  13.8
97   1983  15.2  4041  131.3   293  140  13.4
98   1983  25.3  2565  108.2   105   74  15.3
99   1983  19.1  3348  136.2   200  109  14.1
100  1983  14.7  4380  161.6   307  142  14.1
101  1983  20.4  2500  100.0   151   87  13.3
102  1983  18.6  3550  125.0   178  110  14.5
103  1984  20.3  2886   50.0   174  121  12.1
104  1984  19.5  2855   75.9   151  141  10.5
105  1984  22.9  2737   93.3   140   97  13.5
106  1984  23.0  2798  103.1   137   91  14.3
107  1984  18.5  3318  113.7   210  113  13.6
108  1984  15.5  4022  130.9   294  140  13.4
109  1984  24.9  2620  116.5   107   77  15.2
110  1984  19.4  3298  135.9   172  107  14.1
111  1984  14.9  4371  161.7   305  144  13.9
112  1984  19.7  2500  100.0   150   93  12.6
113  1984  16.8  3617  125.0   192  112  14.6
114  1985  20.5  2826   50.1   158  123  11.6
115  1985  28.0  2300   79.2   114   95  13.1
116  1985  23.3  2734   93.9   136  100  13.2
117  1985  23.1  2804  102.9   138   98  13.5
118  1985  19.0  3319  113.6   205  117  13.3
119  1985  16.6  3841  129.3   279  143  12.7
120  1985  25.4  2579  117.7   107   78  15.0
121  1985  19.4  3356  134.8   170  112  13.9
122  1985  15.6  4354  161.7   305  154  13.2
123  1985  16.3  3633  125.0   195  116  14.3
124  1986  21.3  2916   50.0   166  130  11.7
125  1986  23.3  2408   80.5   113  105  12.8
126  1986  23.6  2764   94.7   136  100  13.4
127  1986  23.0  2819  103.2   137   98  13.5
128  1986  19.6  3241  113.8   194  119  13.0
129  1986  17.6  3719  127.4   260  147  12.1
130  1986  24.2  2648  118.4   113   82  14.7
131  1986  19.8  3355  137.8   162  115  13.6
132  1986  16.2  4381  161.4   304  146  13.9
133  1986  18.0  3500  100.0   179  139  11.9
134  1986  16.3  3612  125.0   191  125  13.4
135  1987  20.5  2920   50.0   167  134  11.5
136  1987  23.3  2636   77.2   128  131  11.3
137  1987  23.9  2728   92.7   128   99  13.7
138  1987  22.7  2836  102.8   135  101  13.3
139 1987 19.5 3247 113.7 189 123 12.7
140 1987 17.4 3696 127.0 260 149 11.8
141 1987 23.8 2795 120.0 116 91 14.2
142 1987 19.3 3434 140.2 173 125 13.0
143 1987 16.3 4348 161.8 304 143 14.0
144 1987 17.5 3500 100.0 179 139 11.9
145 1987 16.3 3606 125.0 196 131 13.0
146 1988 20.3 2940 50.0 171 138 11.2
147 1988 23.7 2619 77.9 128 129 11.7
148 1988 24.6 2668 93.2 122 99 13.5
149 1988 22.6 2889 103.6 136 107 13.0
150 1988 19.7 3293 113.4 183 129 12.3
151 1988 17.6 3730 128.1 262 150 11.9
152 1988 24.3 2757 118.7 113 93 13.7
153 1988 19.4 3378 139.4 171 126 12.7
154 1988 16.8 4349 161.7 304 143 14.0
155 1988 17.5 3500 100.0 179 139 11.9
156 1988 16.3 3594 125.0 221 143 12.1
157 1989 19.9 3031 50.0 199 160 10.2
158 1989 20.2 2866 77.2 146 148 10.7
159 1989 23.6 2748 93.1 129 104 13.0
160 1989 22.5 2886 102.8 130 108 12.7
161 1989 19.5 3314 113.6 181 131 12.2
162 1989 17.3 3721 127.4 262 158 11.4
163 1989 24.2 2766 118.6 112 96 13.5
164 1989 18.8 3436 139.9 176 132 12.4
165 1989 16.5 4334 161.8 304 143 13.9
166 1989 17.5 3500 100.0 175 135 12.2
167 1989 16.2 3613 125.0 254 152 11.5
168 1990 20.5 2979 50.0 164 160 10.3
169 1990 19.6 3019 80.7 148 145 10.5
170 1990 23.9 2762 94.5 117 108 12.5
171 1990 21.6 2985 103.3 138 118 12.3
172 1990 19.0 3450 113.7 187 141 12.0
173 1990 16.9 3799 126.7 258 161 11.4
174 1990 22.4 3026 122.2 120 110 12.9
175 1990 18.3 3499 141.6 181 139 12.0
176 1990 16.5 4337 161.6 304 143 13.9
177 1990 17.7 3444 100.0 166 128 12.6
178 1990 15.8 3692 125.0 244 159 11.3
179 1991 21.4 2890 50.0 149 150 10.8
180 1991 21.3 2912 75.5 127 137 10.8
181 1991 23.4 2779 94.8 124 111 12.3
182 1991 21.4 2984 104.5 136 120 12.1
183 1991 18.8 3412 113.5 191 147 11.5
184 1991 16.8 3893 129.0 264 178 10.7
185 1991 22.8 3005 123.3 120 106 13.3
186 1991 18.6 3506 142.3 181 140 12.0
187 1991 16.3 4403 169.1 304 177 11.9
188 1991 18.5 3241 100.0 136 90 15.9
189 1991 15.9 3873 125.0 235 159 11.7
190 1992 19.4 3022 50.0 187 183 10.0
191 1992 23.2 2691 81.0 110 118 11.2
192 1992 23.5 2806 93.8 122 117 12.0
193 1992 20.9 3034 104.3 143 127 11.8
194 1992 18.4 3515 113.9 191 151 11.5
195 1992 16.8 3872 129.6 262 186 10.4
196 1992 22.3 3076 123.7 122 123 12.4
197 1992 18.6 3504 142.6 188 143 11.8
198 1992 15.9 4500 170.3 338 177 12.1
199 1992 18.7 3076 100.0 126 96 14.5
200 1992 15.5 3879 125.0 229 166 11.4
201 1993 20.6 2975 50.0 169 178 9.6
202 1993 22.1 2819 79.5 119 130 10.9
203 1993 23.3 2797 94.4 124 120 11.8
204 1993 21.5 2982 104.1 139 121 12.0
205 1993 18.6 3515 113.9 192 153 11.4
206 1993 17.0 3831 128.9 256 181 10.5
207 1993 23.7 2882 123.0 118 99 13.5
208 1993 18.7 3498 137.7 188 144 11.7
209 1993 15.6 4500 169.3 340 174 12.2
210 1993 18.4 3088 100.0 127 95 14.5
211 1993 14.7 3937 125.0 249 187 10.4
212 1994 18.7 3134 50.0 207 198 8.8
213 1994 20.2 3146 81.0 132 163 10.3
214 1994 22.7 2879 94.4 134 125 11.7
215 1994 21.5 3023 103.9 138 124 11.9
216 1994 18.3 3529 113.5 190 155 11.4
217 1994 16.8 3859 128.3 249 188 10.2
218 1994 23.6 2908 122.9 121 103 13.3
219 1994 18.4 3533 137.4 180 151 11.4
220 1994 15.9 4500 169.2 350 250 9.1
221 1994 19.4 3018 100.0 117 100 13.8
222 1994 15.5 3900 125.0 230 178 10.8
223 1995 18.7 3177 50.0 204 199 8.9
224 1995 19.6 3218 79.9 147 166 9.9
225 1995 22.5 2949 93.6 141 134 11.3
226 1995 21.8 2998 103.9 134 133 11.2
227 1995 18.2 3546 114.3 189 166 10.7
228 1995 16.9 3830 127.9 248 207 9.4
229 1995 23.9 2859 122.1 118 100 13.4
230 1995 18.8 3482 135.9 167 149 11.4
231 1995 15.8 4500 169.3 350 250 9.1
232 1995 22.3 2617 100.0 95 82 14.4
233 1995 15.0 4049 125.0 227 170 11.5
234 1996 18.2 3168 50.0 216 205 8.6
235 1996 19.0 3270 76.8 158 179 9.8
236 1996 23.3 2875 94.9 136 132 11.2
237 1996 21.4 3007 103.1 134 133 11.1
238 1996 18.5 3527 114.1 181 162 10.9
239 1996 16.7 3895 128.1 250 217 9.2
240 1996 22.5 2952 118.0 122 114 12.5
241 1996 18.5 3661 136.9 161 154 11.6
242 1996 15.9 4500 170.2 350 260 8.9
243 1996 21.8 2857 100.0 108 106 12.7
244 1996 14.9 4128 125.0 235 176 11.4
245 1996 13.9 4500 150.0 350 250 9.1
246 1997 18.7 3057 50.0 178 198 8.7
247 1997 18.7 3338 76.6 154 185 9.7
248 1997 23.1 2909 95.5 126 128 11.5
249 1997 21.2 3004 102.6 135 134 11.1
250 1997 18.4 3551 114.5 184 170 10.6
251 1997 16.8 3821 127.4 242 217 9.1
252 1997 22.7 2901 119.5 125 118 11.8
253 1997 18.5 3666 136.5 160 158 11.3
254 1997 20.5 2989 100.0 134 130 11.2
255 1997 15.0 4136 125.0 228 178 11.3
256 1997 13.2 4500 150.0 302 195 11.2
257 1998 17.0 3458 50.0 244 255 7.7
258 1998 17.9 3522 78.9 169 195 9.6
259 1998 22.1 2984 96.4 139 135 11.1
260 1998 21.4 3010 102.7 131 134 11.0
261 1998 18.7 3534 114.0 177 169 10.5
262 1998 16.8 3784 127.4 240 218 8.9
263 1998 22.3 2874 116.9 124 119 11.7
264 1998 18.5 3669 135.3 155 169 10.7
265 1998 18.6 3380 100.0 144 148 11.3
266 1998 15.5 3943 125.0 232 175 11.0
267 1998 14.8 4500 150.0 217 213 10.4
268 1999 17.9 3179 50.0 208 225 7.9
269 1999 17.3 3602 79.3 181 220 9.0
270 1999 21.8 2973 96.4 137 137 11.0
271 1999 20.8 3135 103.2 139 141 11.0
272 1999 18.5 3540 114.0 175 171 10.5
273 1999 16.8 3854 127.0 235 222 8.9
274 1999 21.8 2923 117.9 125 126 11.3
275 1999 18.3 3691 136.4 157 169 10.8
276 1999 19.8 3214 100.0 130 135 11.5
277 1999 15.7 3953 125.0 221 176 11.0
278 1999 12.7 4461 150.0 317 218 10.2
279 2000 17.9 3124 50.0 208 232 7.7
280 2000 16.6 3729 79.6 193 244 8.3
281 2000 21.3 3026 96.5 145 144 10.7
282 2000 21.0 3097 103.6 132 138 11.1
283 2000 18.4 3550 113.6 174 178 10.2
284 2000 17.3 3782 124.9 221 215 9.2
285 2000 20.0 3107 119.7 127 143 10.7
286 2000 18.9 3572 134.0 153 162 11.0
287 2000 17.0 3563 100.0 154 147 11.7
288 2000 15.4 3973 125.0 225 181 10.8
289 2000 12.3 4471 150.0 324 228 9.8
290 2001 18.0 3207 50.0 200 239 7.6
291 2001 16.9 3711 80.5 178 228 8.8
292 2001 19.4 3226 93.5 173 173 9.9
293 2001 21.8 3056 103.0 130 139 11.0
294 2001 18.5 3566 113.7 175 180 10.1
295 2001 17.1 3774 124.8 221 204 9.5
296 2001 19.0 3470 119.6 145 148 11.3
297 2001 18.5 3775 133.6 155 176 10.7
298 2001 19.4 3281 100.0 127 138 11.5
299 2001 15.9 4026 125.0 213 194 10.4
300 2001 14.7 4272 150.0 236 211 10.1
301 2002 16.9 3397 50.0 224 251 7.6
302 2002 17.4 3649 81.5 188 238 8.5
303 2002 19.1 3293 95.5 183 186 9.5
304 2002 21.6 3065 103.6 128 140 11.0
305 2002 18.8 3549 114.8 177 185 9.8
306 2002 17.4 3768 124.3 218 209 9.3
307 2002 18.0 3504 118.2 145 156 11.0
308 2002 18.9 3732 133.6 153 172 10.7
309 2002 19.1 3247 100.0 130 146 10.9
310 2002 15.9 3946 125.0 200 193 10.2
311 2002 14.3 4450 150.0 249 231 9.8
312 2003 16.9 3440 50.0 229 266 7.3
313 2003 18.7 3408 82.1 152 199 9.8
314 2003 18.4 3438 94.9 176 185 9.8
315 2003 21.7 3088 104.0 131 146 10.6
316 2003 18.9 3567 114.6 174 189 9.7
317 2003 17.3 3841 124.8 224 208 9.5
318 2003 20.8 3262 115.2 127 146 11.0
319 2003 18.6 3745 133.5 151 174 10.7
320 2003 20.4 3056 100.0 125 144 10.4
321 2003 16.3 3941 125.0 197 192 10.3
322 2003 14.6 4403 150.0 252 255 8.9
323 2004 16.4 3540 50.0 235 271 7.4
324 2004 18.4 3491 82.3 149 185 10.2
325 2004 18.3 3365 95.9 174 192 9.5
326 2004 21.3 3122 103.6 135 153 10.4
327 2004 19.0 3577 114.0 174 190 9.7
328 2004 17.2 3858 124.7 225 211 9.4
329 2004 21.6 3235 117.5 126 166 10.1
330 2004 17.9 3860 135.0 158 189 10.3
331 2004 14.9 4769 165.0 215 249 9.6
332 2004 20.4 3091 100.0 150 162 9.6
333 2004 16.2 3998 125.0 206 201 10.0
334 2004 15.0 4369 150.0 233 245 9.2
335 2005 16.3 3502 50.0 241 291 7.1
336 2005 19.0 3368 81.7 143 187 9.9
337 2005 18.7 3352 97.3 171 189 9.4
338 2005 21.8 3084 103.8 131 150 10.4
339 2005 19.7 3545 114.5 168 185 9.8
340 2005 17.4 3933 125.0 225 216 9.5
341 2005 22.2 3160 115.9 122 156 10.4
342 2005 17.4 3839 133.3 183 213 9.3
343 2005 14.9 4791 165.0 217 244 9.8
344 2005 20.7 3049 100.0 145 161 9.5
345 2005 16.8 3959 125.0 191 197 10.1
346 2005 15.8 4220 150.0 220 228 9.5
347 2006 16.8 3378 50.0 226 282 7.1
348 2006 19.1 3354 81.4 150 197 9.4
349 2006 20.7 3223 98.0 149 171 9.9
350 2006 21.2 3256 104.8 145 167 10.0
351 2006 19.6 3568 114.0 169 191 9.6
352 2006 17.1 4014 124.7 230 238 8.9
353 2006 21.3 3255 118.4 127 160 10.5
354 2006 17.7 3827 135.6 183 221 9.1
355 2006 14.8 4806 164.4 219 254 9.5
356 2006 17.2 3991 125.0 188 196 10.1
357 2006 15.9 4182 150.0 221 225 9.5
358 2007 16.9 3398 50.0 227 290 7.0
359 2007 19.7 3380 80.8 159 215 9.0
360 2007 21.1 3190 98.2 147 172 9.8
361 2007 21.6 3251 105.4 137 162 10.2
362 2007 20.7 3581 113.8 165 191 9.6
363 2007 17.1 4026 123.8 229 236 9.0
364 2007 21.5 3264 112.0 127 160 10.4
365 2007 18.4 3727 135.4 161 195 9.8
366 2007 14.5 4785 159.2 237 247 9.7
367 2007 15.0 4408 100.0 232 190 11.2
368 2007 18.0 3908 125.0 181 201 9.9
369 2007 15.5 4289 150.0 222 238 9.3
370 2008 18.3 3187 50.4 207 270 7.7
371 2008 20.5 3342 81.0 147 203 9.4
372 2008 21.4 3261 97.7 147 177 9.8
373 2008 21.1 3311 104.6 142 170 10.0
374 2008 20.9 3564 113.3 163 191 9.5
375 2008 17.8 3966 123.2 213 232 9.0
376 2008 21.9 3300 115.0 130 160 10.5
377 2008 17.9 3845 134.6 165 195 10.0
378 2008 14.6 5017 160.1 227 256 9.6
379 2008 14.9 4500 100.0 231 190 11.4
380 2008 18.5 3870 125.0 175 199 9.8
381 2008 15.8 4353 150.0 221 249 9.0
382 2009 17.9 3272 50.0 222 290 7.6
383 2009 22.0 3260 80.5 140 200 9.5
384 2009 22.4 3182 97.7 138 169 9.9
385 2009 22.5 3297 104.9 137 166 10.2
386 2009 21.5 3539 113.9 156 184 9.8
387 2009 18.6 3883 122.6 200 225 9.0
388 2009 22.8 3263 114.8 125 150 10.7
389 2009 19.0 3653 133.7 153 186 9.9
390 2009 14.8 5500 161.7 209 261 10.0
391 2009 14.9 4500 100.0 231 190 11.4
392 2009 19.0 3844 125.0 171 204 9.6
393 2009 16.6 4289 150.0 208 239 9.2
394 2010 17.5 3309 50.0 240 320 6.8
395 2010 22.3 3168 81.1 131 191 9.6
396 2010 22.2 3261 97.0 145 180 9.7
397 2010 23.0 3272 104.9 143 173 9.9
398 2010 22.7 3577 114.3 158 188 9.5
399 2010 18.5 3923 122.8 202 232 8.9
400 2010 23.2 3269 117.9 125 150 10.7
401 2010 18.9 3814 135.1 154 223 8.9
402 2010 15.1 4500 100.0 232 200 10.9
403 2010 19.6 3820 125.1 172 206 9.5
404 2010 18.2 4277 141.1 190 236 9.3
405 2011 20.2 3149 50.0 193 257 8.0
406 2011 26.4 2857 79.6 116 171 9.7
407 2011 22.3 3293 96.8 148 197 9.5
408 2011 23.2 3349 104.9 141 175 9.8
409 2011 22.6 3601 113.5 153 189 9.6
410 2011 19.8 3833 121.9 190 232 8.7
411 2011 23.6 3281 116.5 123 154 10.5
412 2011 16.7 4409 136.2 204 261 8.7
413 2011 19.9 3807 125.1 169 205 9.5
414 2011 18.2 4293 150.0 199 257 8.8

Data obtained from: Appendix F. (n.d.). Retrieved March 7, 2013, from the United States Environmental Protection Agency: http://www.epa.gov/otaq/fetrends.htm.
Data Set: ML

ID MPG DISP HP WT ACC MY
1 18.0 307.0 130.0 3504.0 12.0 1970
2 15.0 350.0 165.0 3693.0 11.5 1970
3 18.0 318.0 150.0 3436.0 11.0 1970
4 16.0 304.0 150.0 3433.0 12.0 1970
5 17.0 302.0 140.0 3449.0 10.5 1970
6 15.0 429.0 198.0 4341.0 10.0 1970
7 14.0 454.0 220.0 4354.0 9.0 1970
8 14.0 440.0 215.0 4312.0 8.5 1970
9 14.0 455.0 225.0 4425.0 10.0 1970
10 15.0 390.0 190.0 3850.0 8.5 1970
11 15.0 383.0 170.0 3563.0 10.0 1970
12 14.0 340.0 160.0 3609.0 8.0 1970
13 15.0 400.0 150.0 3761.0 9.5 1970
14 14.0 455.0 225.0 3086.0 10.0 1970
15 24.0 113.0 95.0 2372.0 15.0 1970
16 22.0 198.0 95.0 2833.0 15.5 1970
17 18.0 199.0 97.0 2774.0 15.5 1970
18 21.0 200.0 85.0 2587.0 16.0 1970
19 27.0 97.0 88.0 2130.0 14.5 1970
20 26.0 97.0 46.0 1835.0 20.5 1970
21 25.0 110.0 87.0 2672.0 17.5 1970
22 24.0 107.0 90.0 2430.0 14.5 1970
23 25.0 104.0 95.0 2375.0 17.5 1970
24 26.0 121.0 113.0 2234.0 12.5 1970
25 21.0 199.0 90.0 2648.0 15.0 1970
26 10.0 360.0 215.0 4615.0 14.0 1970
27 10.0 307.0 200.0 4376.0 15.0 1970
28 11.0 318.0 210.0 4382.0 13.5 1970
29 9.0 304.0 193.0 4732.0 18.5 1970
30 27.0 97.0 88.0 2130.0 14.5 1971
31 28.0 140.0 90.0 2264.0 15.5 1971
32 25.0 113.0 95.0 2228.0 14.0 1971
33 19.0 232.0 100.0 2634.0 13.0 1971
34 16.0 225.0 105.0 3439.0 15.5 1971
35 17.0 250.0 100.0 3329.0 15.5 1971
36 19.0 250.0 88.0 3302.0 15.5 1971
37 18.0 232.0 100.0 3288.0 15.5 1971
38 14.0 350.0 165.0 4209.0 12.0 1971
39 14.0 400.0 175.0 4464.0 11.5 1971
40 14.0 351.0 153.0 4154.0 13.5 1971
41 14.0 318.0 150.0 4096.0 13.0 1971
42 12.0 383.0 180.0 4955.0 11.5 1971
43 13.0 400.0 170.0 4746.0 12.0 1971
44 13.0 400.0 175.0 5140.0 12.0 1971
45 18.0 258.0 110.0 2962.0 13.5 1971
46 22.0 140.0 72.0 2408.0 19.0 1971
47 19.0 250.0 100.0 3282.0 15.0 1971
48 18.0 250.0 88.0 3139.0 14.5 1971
49 23.0 122.0 86.0 2220.0 14.0 1971
50 28.0 116.0 90.0 2123.0 14.0 1971
51 30.0 79.0 70.0 2074.0 19.5 1971
52 30.0 88.0 76.0 2065.0 14.5 1971
53 31.0 71.0 65.0 1773.0 19.0 1971
54 35.0 72.0 69.0 1613.0 18.0 1971
55 27.0 97.0 60.0 1834.0 19.0 1971
56 26.0 91.0 70.0 1955.0 20.5 1971
57 24.0 113.0 95.0 2278.0 15.5 1972
58 25.0 97.5 80.0 2126.0 17.0 1972
59 23.0 97.0 54.0 2254.0 23.5 1972
60 20.0 140.0 90.0 2408.0 19.5 1972
61 21.0 122.0 86.0 2226.0 16.5 1972
62 13.0 350.0 165.0 4274.0 12.0 1972
63 14.0 400.0 175.0 4385.0 12.0 1972
64 15.0 318.0 150.0 4135.0 13.5 1972
65 14.0 351.0 153.0 4129.0 13.0 1972
66 17.0 304.0 150.0 3672.0 11.5 1972
67 11.0 429.0 208.0 4633.0 11.0 1972
68 13.0 350.0 155.0 4502.0 13.5 1972
69 12.0 350.0 160.0 4456.0 13.5 1972
70 13.0 400.0 190.0 4422.0 12.5 1972
71 19.0 70.0 97.0 2330.0 13.5 1972
72 15.0 304.0 150.0 3892.0 12.5 1972
73 13.0 307.0 130.0 4098.0 14.0 1972
74 13.0 302.0 140.0 4294.0 16.0 1972
75 14.0 318.0 150.0 4077.0 14.0 1972
76 18.0 121.0 112.0 2933.0 14.5 1972
77 22.0 121.0 76.0 2511.0 18.0 1972
78 21.0 120.0 87.0 2979.0 19.5 1972
79 26.0 96.0 69.0 2189.0 18.0 1972
80 22.0 122.0 86.0 2395.0 16.0 1972
81 28.0 97.0 92.0 2288.0 17.0 1972
82 23.0 120.0 97.0 2506.0 14.5 1972
83 28.0 98.0 80.0 2164.0 15.0 1972
84 27.0 97.0 88.0 2100.0 16.5 1972
85 13.0 350.0 175.0 4100.0 13.0 1973
86 14.0 304.0 150.0 3672.0 11.5 1973
87 13.0 350.0 145.0 3988.0 13.0 1973
88 14.0 302.0 137.0 4042.0 14.5 1973
89 15.0 318.0 150.0 3777.0 12.5 1973
90 12.0 429.0 198.0 4952.0 11.5 1973
91 13.0 400.0 150.0 4464.0 12.0 1973
92 13.0 351.0 158.0 4363.0 13.0 1973
93 14.0 318.0 150.0 4237.0 14.5 1973
94 13.0 440.0 215.0 4735.0 11.0 1973
95 12.0 455.0 225.0 4951.0 11.0 1973
96 13.0 360.0 175.0 3821.0 11.0 1973
97 18.0 225.0 105.0 3121.0 16.5 1973
98 16.0 250.0 100.0 3278.0 18.0 1973
99 18.0 232.0 100.0 2945.0 16.0 1973
100 18.0 250.0 88.0 3021.0 16.5 1973
101 23.0 198.0 95.0 2904.0 16.0 1973
102 26.0 97.0 46.0 1950.0 21.0 1973
103 11.0 400.0 150.0 4997.0 14.0 1973
104 12.0 400.0 167.0 4906.0 12.5 1973
105 13.0 360.0 170.0 4654.0 13.0 1973
106 12.0 350.0 180.0 4499.0 12.5 1973
107 18.0 232.0 100.0 2789.0 15.0 1973
108 20.0 97.0 88.0 2279.0 19.0 1973
109 21.0 140.0 72.0 2401.0 19.5 1973
110 22.0 108.0 94.0 2379.0 16.5 1973
111 18.0 70.0 90.0 2124.0 13.5 1973
112 19.0 122.0 85.0 2310.0 18.5 1973
113 21.0 155.0 107.0 2472.0 14.0 1973
114 26.0 98.0 90.0 2265.0 15.5 1973
115 15.0 350.0 145.0 4082.0 13.0 1973
116 16.0 400.0 230.0 4278.0 9.5 1973
117 29.0 68.0 49.0 1867.0 19.5 1973
118 24.0 116.0 75.0 2158.0 15.5 1973
119 20.0 114.0 91.0 2582.0 14.0 1973
120 19.0 121.0 112.0 2868.0 15.5 1973
121 15.0 318.0 150.0 3399.0 11.0 1973
122 24.0 121.0 110.0 2660.0 14.0 1973
123 20.0 156.0 122.0 2807.0 13.5 1973
124 11.0 350.0 180.0 3664.0 11.0 1973
125 20.0 198.0 95.0 3102.0 16.5 1974
126 19.0 232.0 100.0 2901.0 16.0 1974
127 15.0 250.0 100.0 3336.0 17.0 1974
128 31.0 79.0 67.0 1950.0 19.0 1974
129 26.0 122.0 80.0 2451.0 16.5 1974
130 32.0 71.0 65.0 1836.0 21.0 1974
131 25.0 140.0 75.0 2542.0 17.0 1974
132 16.0 250.0 100.0 3781.0 17.0 1974
133 16.0 258.0 110.0 3632.0 18.0 1974
134 18.0 225.0 105.0 3613.0 16.5 1974
135 16.0 302.0 140.0 4141.0 14.0 1974
136 13.0 350.0 150.0 4699.0 14.5 1974
137 14.0 318.0 150.0 4457.0 13.5 1974
138 14.0 302.0 140.0 4638.0 16.0 1974
139 14.0 304.0 150.0 4257.0 15.5 1974
140 29.0 98.0 83.0 2219.0 16.5 1974
141 26.0 79.0 67.0 1963.0 15.5 1974
142 26.0 97.0 78.0 2300.0 14.5 1974
143 31.0 76.0 52.0 1649.0 16.5 1974
144 32.0 83.0 61.0 2003.0 19.0 1974
145 28.0 90.0 75.0 2125.0 14.5 1974
146 24.0 90.0 75.0 2108.0 15.5 1974
147 26.0 116.0 75.0 2246.0 14.0 1974
148 24.0 120.0 97.0 2489.0 15.0 1974
149 26.0 108.0 93.0 2391.0 15.5 1974
150 31.0 79.0 67.0 2000.0 16.0 1974
151 19.0 225.0 95.0 3264.0 16.0 1975
152 18.0 250.0 105.0 3459.0 16.0 1975
153 15.0 250.0 72.0 3432.0 21.0 1975
154 15.0 250.0 72.0 3158.0 19.5 1975
155 16.0 400.0 170.0 4668.0 11.5 1975
156 15.0 350.0 145.0 4440.0 14.0 1975
157 16.0 318.0 150.0 4498.0 14.5 1975
158 14.0 351.0 148.0 4657.0 13.5 1975
159 17.0 231.0 110.0 3907.0 21.0 1975
160 16.0 250.0 105.0 3897.0 18.5 1975
161 15.0 258.0 110.0 3730.0 19.0 1975
162 18.0 225.0 95.0 3785.0 19.0 1975
163 21.0 231.0 110.0 3039.0 15.0 1975
164 20.0 262.0 110.0 3221.0 13.5 1975
165 13.0 302.0 129.0 3169.0 12.0 1975
166 29.0 97.0 75.0 2171.0 16.0 1975
167 23.0 140.0 83.0 2639.0 17.0 1975
168 20.0 232.0 100.0 2914.0 16.0 1975
169 23.0 140.0 78.0 2592.0 18.5 1975
170 24.0 134.0 96.0 2702.0 13.5 1975
171 25.0 90.0 71.0 2223.0 16.5 1975
172 24.0 119.0 97.0 2545.0 17.0 1975
173 18.0 171.0 97.0 2984.0 14.5 1975
174 29.0 90.0 70.0 1937.0 14.0 1975
175 19.0 232.0 90.0 3211.0 17.0 1975
176 23.0 115.0 95.0 2694.0 15.0 1975
177 23.0 120.0 88.0 2957.0 17.0 1975
178 22.0 121.0 98.0 2945.0 14.5 1975
179 25.0 121.0 115.0 2671.0 13.5 1975
180 33.0 91.0 53.0 1795.0 17.5 1975
181 28.0 107.0 86.0 2464.0 15.5 1976
182 25.0 116.0 81.0 2220.0 16.9 1976
183 25.0 140.0 92.0 2572.0 14.9 1976
184 26.0 98.0 79.0 2255.0 17.7 1976
185 27.0 101.0 83.0 2202.0 15.3 1976
186 17.5 305.0 140.0 4215.0 13.0 1976
187 16.0 318.0 150.0 4190.0 13.0 1976
188 15.5 304.0 120.0 3962.0 13.9 1976
189 14.5 351.0 152.0 4215.0 12.8 1976
190 22.0 225.0 100.0 3233.0 15.4 1976
191 22.0 250.0 105.0 3353.0 14.5 1976
192 24.0 200.0 81.0 3012.0 17.6 1976
193 22.5 232.0 90.0 3085.0 17.6 1976
194 29.0 85.0 52.0 2035.0 22.2 1976
195 24.5 98.0 60.0 2164.0 22.1 1976
196 29.0 90.0 70.0 1937.0 14.2 1976
197 33.0 91.0 53.0 1795.0 17.4 1976
198 20.0 225.0 100.0 3651.0 17.7 1976
199 18.0 250.0 78.0 3574.0 21.0 1976
200 18.5 250.0 110.0 3645.0 16.2 1976
201 17.5 258.0 95.0 3193.0 17.8 1976
202 29.5 97.0 71.0 1825.0 12.2 1976
203 32.0 85.0 70.0 1990.0 17.0 1976
204 28.0 97.0 75.0 2155.0 16.4 1976
205 26.5 140.0 72.0 2565.0 13.6 1976
206 20.0 130.0 102.0 3150.0 15.7 1976
207 13.0 318.0 150.0 3940.0 13.2 1976
208 19.0 120.0 88.0 3270.0 21.9 1976
209 19.0 156.0 108.0 2930.0 15.5 1976
210 16.5 168.0 120.0 3820.0 16.7 1976
211 16.5 350.0 180.0 4380.0 12.1 1976
212 13.0 350.0 145.0 4055.0 12.0 1976
213 13.0 302.0 130.0 3870.0 15.0 1976
214 13.0 318.0 150.0 3755.0 14.0 1976
215 31.5 98.0 68.0 2045.0 18.5 1977
216 30.0 111.0 80.0 2155.0 14.8 1977
217 36.0 79.0 58.0 1825.0 18.6 1977
218 25.5 122.0 96.0 2300.0 15.5 1977
219 33.5 85.0 70.0 1945.0 16.8 1977
220 17.5 305.0 145.0 3880.0 12.5 1977
221 17.0 260.0 110.0 4060.0 19.0 1977
222 15.5 318.0 145.0 4140.0 13.7 1977
223 15.0 302.0 130.0 4295.0 14.9 1977
224 17.5 250.0 110.0 3520.0 16.4 1977
225 20.5 231.0 105.0 3425.0 16.9 1977
226 19.0 225.0 100.0 3630.0 17.7 1977
227 18.5 250.0 98.0 3525.0 19.0 1977
228 16.0 400.0 180.0 4220.0 11.1 1977
229 15.5 350.0 170.0 4165.0 11.4 1977
230 15.5 400.0 190.0 4325.0 12.2 1977
231 16.0 351.0 149.0 4335.0 14.5 1977
232 29.0 97.0 78.0 1940.0 14.5 1977
233 24.5 151.0 88.0 2740.0 16.0 1977
234 26.0 97.0 75.0 2265.0 18.2 1977
235 25.5 140.0 89.0 2755.0 15.8 1977
236 30.5 98.0 63.0 2051.0 17.0 1977
237 33.5 98.0 83.0 2075.0 15.9 1977
238 30.0 97.0 67.0 1985.0 16.4 1977
239 30.5 97.0 78.0 2190.0 14.1 1977
240 22.0 146.0 97.0 2815.0 14.5 1977
241 21.5 121.0 110.0 2600.0 12.8 1977
242 21.5 80.0 110.0 2720.0 13.5 1977
243 43.1 90.0 48.0 1985.0 21.5 1978
244 36.1 98.0 66.0 1800.0 14.4 1978
245 32.8 78.0 52.0 1985.0 19.4 1978
246 39.4 85.0 70.0 2070.0 18.6 1978
247 36.1 91.0 60.0 1800.0 16.4 1978
248 19.9 260.0 110.0 3365.0 15.5 1978
249 19.4 318.0 140.0 3735.0 13.2 1978
250 20.2 302.0 139.0 3570.0 12.8 1978
251 19.2 231.0 105.0 3535.0 19.2 1978
252 20.5 200.0 95.0 3155.0 18.2 1978
253 20.2 200.0 85.0 2965.0 15.8 1978
254 25.1 140.0 88.0 2720.0 15.4 1978
255 20.5 225.0 100.0 3430.0 17.2 1978
256 19.4 232.0 90.0 3210.0 17.2 1978
257 20.6 231.0 105.0 3380.0 15.8 1978
258 20.8 200.0 85.0 3070.0 16.7 1978
259 18.6 225.0 110.0 3620.0 18.7 1978
260 18.1 258.0 120.0 3410.0 15.1 1978
261 19.2 305.0 145.0 3425.0 13.2 1978
262 17.7 231.0 165.0 3445.0 13.4 1978
263 18.1 302.0 139.0 3205.0 11.2 1978
264 17.5 318.0 140.0 4080.0 13.7 1978
265 30.0 98.0 68.0 2155.0 16.5 1978
266 27.5 134.0 95.0 2560.0 14.2 1978
267 27.2 119.0 97.0 2300.0 14.7 1978
268 30.9 105.0 75.0 2230.0 14.5 1978
269 21.1 134.0 95.0 2515.0 14.8 1978
270 23.2 156.0 105.0 2745.0 16.7 1978
271 23.8 151.0 85.0 2855.0 17.6 1978
272 23.9 119.0 97.0 2405.0 14.9 1978
273 20.3 131.0 103.0 2830.0 15.9 1978
274 17.0 163.0 125.0 3140.0 13.6 1978
275 21.6 121.0 115.0 2795.0 15.7 1978
276 16.2 163.0 133.0 3410.0 15.8 1978
277 31.5 89.0 71.0 1990.0 14.9 1978
278 29.5 98.0 68.0 2135.0 16.6 1978
279 21.5 231.0 115.0 3245.0 15.4 1979
280 19.8 200.0 85.0 2990.0 18.2 1979
281 22.3 140.0 88.0 2890.0 17.3 1979
282 20.2 232.0 90.0 3265.0 18.2 1979
283 20.6 225.0 110.0 3360.0 16.6 1979
284 17.0 305.0 130.0 3840.0 15.4 1979
285 17.6 302.0 129.0 3725.0 13.4 1979
286 16.5 351.0 138.0 3955.0 13.2 1979
287 18.2 318.0 135.0 3830.0 15.2 1979
288 16.9 350.0 155.0 4360.0 14.9 1979
289 15.5 351.0 142.0 4054.0 14.3 1979
290 19.2 267.0 125.0 3605.0 15.0 1979
291 18.5 360.0 150.0 3940.0 13.0 1979
292 31.9 89.0 71.0 1925.0 14.0 1979
293 34.1 86.0 65.0 1975.0 15.2 1979
294 35.7 98.0 80.0 1915.0 14.4 1979
295 27.4 121.0 80.0 2670.0 15.0 1979
296 25.4 183.0 77.0 3530.0 20.1 1979
297 23.0 350.0 125.0 3900.0 17.4 1979
298 27.2 141.0 71.0 3190.0 24.8 1979
299 23.9 260.0 90.0 3420.0 22.2 1979
300 34.2 105.0 70.0 2200.0 13.2 1979
301 34.5 105.0 70.0 2150.0 14.9 1979
302 31.8 85.0 65.0 2020.0 19.2 1979
303 37.3 91.0 69.0 2130.0 14.7 1979
304 28.4 151.0 90.0 2670.0 16.0 1979
305 28.8 173.0 115.0 2595.0 11.3 1979
306 26.8 173.0 115.0 2700.0 12.9 1979
307 33.5 151.0 90.0 2556.0 13.2 1979
308 41.5 98.0 76.0 2144.0 14.7 1980
309 38.1 89.0 60.0 1968.0 18.8 1980
310 32.1 98.0 70.0 2120.0 15.5 1980
311 37.2 86.0 65.0 2019.0 16.4 1980
312 28.0 151.0 90.0 2678.0 16.5 1980
313 26.4 140.0 88.0 2870.0 18.1 1980
314 24.3 151.0 90.0 3003.0 20.1 1980
315 19.1 225.0 90.0 3381.0 18.7 1980
316 34.3 97.0 78.0 2188.0 15.8 1980
317 29.8 134.0 90.0 2711.0 15.5 1980
318 31.3 120.0 75.0 2542.0 17.5 1980
319 37.0 119.0 92.0 2434.0 15.0 1980
320 32.2 108.0 75.0 2265.0 15.2 1980
321 46.6 86.0 65.0 2110.0 17.9 1980
322 27.9 156.0 105.0 2800.0 14.4 1980
323 40.8 85.0 65.0 2110.0 19.2 1980
324 44.3 90.0 48.0 2085.0 21.7 1980
325 43.4 90.0 48.0 2335.0 23.7 1980
326 36.4 121.0 67.0 2950.0 19.9 1980
327 30.0 146.0 67.0 3250.0 21.8 1980
328 44.6 91.0 67.0 1850.0 13.8 1980
329 33.8 97.0 67.0 2145.0 18.0 1980
330 29.8 89.0 62.0 1845.0 15.3 1980
331 32.7 168.0 132.0 2910.0 11.4 1980
332 23.7 70.0 100.0 2420.0 12.5 1980
333 35.0 122.0 88.0 2500.0 15.1 1980
334 32.4 107.0 72.0 2290.0 17.0 1980
335 27.2 135.0 84.0 2490.0 15.7 1981
336 26.6 151.0 84.0 2635.0 16.4 1981
337 25.8 156.0 92.0 2620.0 14.4 1981
338 23.5 173.0 110.0 2725.0 12.6 1981
339 30.0 135.0 84.0 2385.0 12.9 1981
340 39.1 79.0 58.0 1755.0 16.9 1981
341 39.0 86.0 64.0 1875.0 16.4 1981
342 35.1 81.0 60.0 1760.0 16.1 1981
343 32.3 97.0 67.0 2065.0 17.8 1981
344 37.0 85.0 65.0 1975.0 19.4 1981
345 37.7 89.0 62.0 2050.0 17.3 1981
346 34.1 91.0 68.0 1985.0 16.0 1981
347 34.7 105.0 63.0 2215.0 14.9 1981
348 34.4 98.0 65.0 2045.0 16.2 1981
349 29.9 98.0 65.0 2380.0 20.7 1981
350 33.0 105.0 74.0 2190.0 14.2 1981
351 33.7 107.0 75.0 2210.0 14.4 1981
352 32.4 108.0 75.0 2350.0 16.8 1981
353 32.9 119.0 100.0 2615.0 14.8 1981
354 31.6 120.0 74.0 2635.0 18.3 1981
355 28.1 141.0 80.0 3230.0 20.4 1981
356 30.7 145.0 76.0 3160.0 19.6 1981
357 25.4 168.0 116.0 2900.0 12.6 1981
358 24.2 146.0 120.0 2930.0 13.8 1981
359 22.4 231.0 110.0 3415.0 15.8 1981
360 26.6 350.0 105.0 3725.0 19.0 1981
361 20.2 200.0 88.0 3060.0 17.1 1981
362 17.6 225.0 85.0 3465.0 16.6 1981
363 28.0 112.0 88.0 2605.0 19.6 1982
364 27.0 112.0 88.0 2640.0 18.6 1982
365 34.0 112.0 88.0 2395.0 18.0 1982
366 31.0 112.0 85.0 2575.0 16.2 1982
367 29.0 135.0 84.0 2525.0 16.0 1982
368 27.0 151.0 90.0 2735.0 18.0 1982
369 24.0 140.0 92.0 2865.0 16.4 1982
370 36.0 105.0 74.0 1980.0 15.3 1982
371 37.0 91.0 68.0 2025.0 18.2 1982
372 31.0 91.0 68.0 1970.0 17.6 1982
373 38.0 105.0 63.0 2125.0 14.7 1982
374 36.0 98.0 70.0 2125.0 17.3 1982
375 36.0 120.0 88.0 2160.0 14.5 1982
376 36.0 107.0 75.0 2205.0 14.5 1982
377 34.0 108.0 70.0 2245.0 16.9 1982
378 38.0 91.0 67.0 1965.0 15.0 1982
379 32.0 91.0 67.0 1965.0 15.7 1982
380 38.0 91.0 67.0 1995.0 16.2 1982
381 25.0 181.0 110.0 2945.0 16.4 1982
382 38.0 262.0 85.0 3015.0 17.0 1982
383 26.0 156.0 92.0 2585.0 14.5 1982
384 22.0 232.0 112.0 2835.0 14.7 1982
385 32.0 144.0 96.0 2665.0 13.9 1982
386 36.0 135.0 84.0 2370.0 13.0 1982
387 27.0 151.0 90.0 2950.0 17.3 1982
388 27.0 140.0 86.0 2790.0 15.6 1982
389 44.0 97.0 52.0 2130.0 24.6 1982
390 32.0 135.0 84.0 2295.0 11.6 1982
391 28.0 120.0 79.0 2625.0 18.6 1982
392 31.0 119.0 82.0 2720.0 19.4 1982

Data obtained from: Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
SAS Code

regression.sas

ods html close;
ods html;

%let n=414;
%let k=4;

*ods pdf file='C:\Users\James\Dropbox\Statistical Computing\standreg\regression.pdf' style=journal;
options nodate;

*input data into SAS and perform standardization;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\datastep.sas';

*inspecting data;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\inspect.sas';

*check for assumptions of regression;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\assumptions.sas';

*linearize the model;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\linearize.sas';

*check for assumptions of regression in the linearized model;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\Lassumptions.sas';

*determine "optimal" model;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\model.sas';

*perform ridge regression;
%include 'C:\Users\James\Dropbox\Statistical Computing\standreg\substeps\rr.sas';

ods pdf close;

datastep.sas

*create a standardized (by year) for epa data set;
proc import datafile='C:\Users\James\Dropbox\Statistical Computing\standreg\epa data.xls'
        out=epa dbms=xls replace;
    sheet='subset';
    getnames=yes;
run;

proc means data=epa noprint nway;
    class MY;
    var MPG WT DISP HP ACC;
    output out=tempepa mean= MPG_mean WT_mean DISP_mean HP_mean ACC_mean;
run;

data epaR;
    merge epa tempepa;
    by MY;
    NMPG=MPG/MPG_mean;
    NWT=WT/WT_mean;
    NDISP=DISP/DISP_mean;
    NHP=HP/HP_mean;
    NACC=ACC/ACC_mean;
    keep ID NMPG NWT NDISP NHP NACC;
run;

*create a standardized (by year) for ml data set;
proc import datafile='C:\Users\James\Dropbox\Statistical Computing\standreg\ml data.xls'
        out=ml dbms=xls replace;
    sheet='Sheet1';
    getnames=yes;
run;
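The by-year standardization in datastep.sas divides each variable by its mean within the same model year (for example, NMPG = MPG / MPG_mean). As a rough illustration only, the Python sketch below reproduces that arithmetic on three rows copied from the ML data set (IDs 1, 2, and 30). The function name `normalize_by_year` and the dictionary layout are hypothetical and are not part of the authors' code. Because the group means here come from the sample rows alone, the values will not match those computed from the full data set.

```python
from collections import defaultdict
from statistics import mean

FIELDS = ("MPG", "WT", "DISP", "HP", "ACC")


def normalize_by_year(rows, year_key="MY", fields=FIELDS):
    """Divide each field by the mean of that field within the same model year.

    Hypothetical helper mirroring the PROC MEANS + MERGE steps in datastep.sas.
    """
    # Group rows by model year so each group mean can be computed.
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_key]].append(row)

    normalized = []
    for row in rows:
        group = by_year[row[year_key]]
        scaled = {"ID": row["ID"]}
        for field in fields:
            group_mean = mean(r[field] for r in group)
            scaled["N" + field] = row[field] / group_mean
        normalized.append(scaled)
    return normalized


# Three rows taken from the ML data set (IDs 1, 2, and 30).
sample = [
    {"ID": 1, "MY": 1970, "MPG": 18.0, "WT": 3504.0, "DISP": 307.0, "HP": 130.0, "ACC": 12.0},
    {"ID": 2, "MY": 1970, "MPG": 15.0, "WT": 3693.0, "DISP": 350.0, "HP": 165.0, "ACC": 11.5},
    {"ID": 30, "MY": 1971, "MPG": 27.0, "WT": 2130.0, "DISP": 97.0, "HP": 88.0, "ACC": 14.5},
]
normalized = normalize_by_year(sample)
```

A value of 1 means the car sits exactly at its model-year average, which is why the single 1971 row in the sample normalizes to 1 on every variable. Note that the SAS MERGE with a BY statement expects both inputs to be ordered by MY; the listed data appear to be in year order already.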
proc means data=ml noprint nway;
    class MY;
    var MPG WT DISP HP ACC;
    output out=tempml mean= MPG_mean WT_mean DISP_mean HP_mean ACC_mean;
run;

data mlR;
    merge ml tempml;
    by MY;
    NMPG=MPG/MPG_mean;
    NWT=WT/WT_mean;
    NDISP=DISP/DISP_mean;
    NHP=HP/HP_mean;
    NACC=ACC/ACC_mean;
    keep ID NMPG NWT NDISP NHP NACC;
run;

inspect.sas

*analyze covariance;
title 'Scatterplot Matrix for Car Factors';
title2 '-- with 99% prediction ellipses --';
proc sgscatter data=epaR;
    matrix NMPG NWT NDISP NHP NACC /
        ellipse=(alpha=0.01 type=predicted)
        diagonal=(histogram normal);
run;

*Pearson correlation coefficient between each variable;
title 'Correlation Car Factors';
proc corr data=epaR plots=matrix(histogram);
    var NMPG NWT NDISP NHP NACC;
run;

*regression to analyze residuals of "non-linearized" data;
title 'Graphical Display of "Normalized" Data';
proc reg data=epaR;
    model NMPG = NWT NDISP NHP NACC;
run;

assumptions.sas

*Testing for Assumptions;
title 'Testing for Assumptions:';
title2 'Graphical Test of Linearity Assumption';
proc reg data=epaR;
    model NMPG = NWT NDISP NHP NACC / partial dw vif collin;
    output out=influence(keep = ID r lev cd dffit)
        rstudent=r h=lev cookd=cd dffits=dffit;
run;
quit;

*print influential points / leverage points;
title 'Studentized residuals: abs(r)>2';
proc print data=influence;
    where abs(r)>2;
    var ID r;
run;

title 'Leverage: lev > (2*k+2)/n';
proc print data=influence;
    where lev > (2*&k+2)/&n;
    var ID lev;
run;

*look at points simultaneously with large influence and leverage;
title 'Inspect points that simultaneously have large influence and leverage';
proc sql;
    create table tempinfluence as
    select *, r**2/abs(sum(r)) as rsquared
    from influence;
quit;

goptions reset=all;
axis1 label=(r=0 a=400);
symbol1 pointlabel=("#id") font=simplex value=none;
proc gplot data=tempinfluence;
    plot lev*rsquared / vaxis=axis1;
run;
quit;

linearize.sas

*perform Box-Cox transformation;
ods graphics off;
proc transreg data=epaR;
    model boxcox(NMPG) = identity(NDISP NHP NWT NACC);
run;
ods graphics on;

*linearize;
data epaRln;
    set epaR;
    NMPG = (NMPG ** 0.25 - 1) / 0.25;
    NDISP = log(NDISP);
    NHP = log(NHP);
run;

data mlRlin;
    set mlR;
    NMPG = (NMPG ** 0.25 - 1) / 0.25;
    NDISP = log(NDISP);
    NHP = log(NHP);
run;

proc reg data=epaRln;
    model NMPG = NWT NDISP NACC NHP / partial dw vif collin;
    output out=Linf(keep = ID r lev cd dffit)
        rstudent=r h=lev cookd=cd dffits=dffit;
run;
quit;

Lassumptions.sas

*Testing for Assumptions;
title 'Testing for Assumptions:';

*analyze covariance;
title2 'Scatterplot Matrix for Car Factors';
title3 '-- with 99% prediction ellipses --';
proc sgscatter data=epaRln;
    matrix NMPG NWT NDISP NHP NACC /
        ellipse=(alpha=0.01 type=predicted)
        diagonal=(histogram normal);
run;

*Pearson correlation coefficient between each variable;
title2 'Correlation Car Factors';
title3;
proc corr data=epaRln plots=matrix(histogram);
    var NMPG NWT NDISP NHP NACC;
run;

title 'Regression Diagnostics';
proc reg data=epaRln;
    model NMPG = NWT NDISP NHP NACC / partial dwprob vif collin spec;
    output out=Linf(keep = ID r lev cd dffit)
        rstudent=r h=lev cookd=cd dffits=dffit;
    OUTPUT OUT=DELIVERYR R=RESID STUDENT=ST;
run;
quit;

/* USING NORMAL OPTION IN UNIVARIATE PROCEDURE */
/* WILL GENERATE NORMALITY TESTS RESULTS */
PROC UNIVARIATE DATA=DELIVERYR NORMAL;
    VAR RESID;
    HISTOGRAM RESID/KERNEL(C=1 2 3 4 L=1 20 2 34 COLOR=RED)
        NORMAL(MU=EST SIGMA=EST);
    * PROBPLOT RESID;
    QQPLOT RESID/NORMAL(MU=EST SIGMA=EST);
RUN;
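The dw and dwprob options requested in the PROC REG steps report the Durbin-Watson statistic, which checks the residuals for first-order autocorrelation. As a minimal sketch of what that statistic computes, and not a reproduction of SAS output, the Python function below (the name `durbin_watson` is hypothetical) takes residuals in row order and returns the ratio of summed squared successive differences to summed squared residuals.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals given in row order.

    d = sum((e[t] - e[t-1])**2 for t >= 1) / sum(e[t]**2)
    Values near 2 suggest little first-order autocorrelation; values toward 0
    suggest positive autocorrelation and values toward 4 negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Illustrative inputs only; these are not residuals from the study.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```

Because the statistic depends on the order of the observations, its reading for these data depends on how the rows are sorted (here by ID, which follows model year).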
PROC AUTOREG DATA=DELIVERYR ALL;
    MODEL RESID=/DWPROB;
RUN;

PROC ARIMA DATA=DELIVERYR;
    IDENTIFY VAR=RESID NLAG=10;
RUN;
QUIT;

*print influential points / leverage points;
title 'Studentized residuals: abs(r)>2';
proc print data=Linf;
    where abs(r)>2;
    var ID r;
run;

title 'Leverage: lev > (2*k+2)/n';
proc print data=Linf;
    where lev > (2*&k+2)/&n;
    var ID lev;
run;

*look at points simultaneously with large influence and leverage;
title 'Inspect points that simultaneously have large influence and leverage';
proc sql;
    create table LTinf as
    select *, r**2/abs(sum(r)) as rsquared
    from Linf;
quit;

goptions reset=all;
axis1 label=(r=0 a=400);
symbol1 pointlabel=("#id") font=simplex value=none;

model.sas

title 'Modeling';
title2 'GLMSELECT stepwise selection';
proc glmselect data=epaRln testData=mlRlin plot=CriterionPanel;
    partition fraction(validate=0.40);
    model NMPG = NWT NHP NACC / selection=stepwise stats=all;
run;
quit;

rr.sas

title 'Ridge Regression to Address ViFs';

***** CODE ADAPTED FROM PING-SHI WUS COURSESITE *****;

/* PILOT RUN FOR THE DETERMINATION OF LAMBDA BY RIDGEPLOT*/
PROC REG DATA=eparln OUTEST=TMP plots(only)=ridge(unpack VIFaxis=log);
    MODEL NMPG = NDISP NHP NWT NACC / RIDGE=(0.001 TO 0.1 BY .001) OUTVIF vif;
    PLOT/RIDGEPLOT VREF=0;
RUN;
QUIT;

ods graphics off;
PROC REG DATA=eparln OUTEST=TEMP OUTSTB NOPRINT;
    MODEL NMPG = NDISP NHP NWT NACC / RIDGE=0.02 OUTVIF;
RUN;
QUIT;
ods graphics on;

PROC PRINT DATA=TEMP;
    WHERE _TYPE_='RIDGE';
    VAR INTERCEPT NDISP NHP NWT NACC;
    TITLE 'RIDGE REGRESSION ESTIMATE';
RUN;
TITLE;

/* DOUBLE CHECK WITH MANUAL CALCULATION */
/*
PROC IML;
USE eparln;
READ ALL VAR {NMPG} INTO OY;
READ ALL VAR {NDISP NHP NWT NACC} INTO OX;
CLOSE eparln;
N=NROW(OY);
P=NCOL(OX);
MY=OY[:,];
MX=OX[:,];
STDY=(1/(N-1)*OY`*(I(N)-1/N*J(N,N))*OY)**0.5;
VARX=1/(N-1)*OX`*(I(N)-1/N*J(N,N))*OX;
*/
/* STANDARDIZE Y */
/*
STY=(OY-MY)/STDY/(N-1)**0.5;
D=T(VECDIAG(VARX));
D=D##(0.5);
*/
/* STANDARDIZE X */
/*
STX=(OX-REPEAT(MX,N,1))/REPEAT(D,N,1)/(N-1)**0.5;
PRINT STX;
KI=0.02*I(P);
*/
/* RIDGE REGRESSION */
/*
BETAR=GINV(STX`*STX+KI)*STX`*STY;
PRINT BETAR;
*/
/* TRANSFORM BACK */
/*
B=STDY*BETAR/T(D);
B0=MY-MX*B;
BB=B0//B;
PRINT BB;
QUIT;
*/

ODS GRAPHICS ON;
PROC PRINCOMP DATA=eparln OUT=PCSCORE OUTSTAT=PCLOADING;
  VAR NDISP NHP NWT NACC;
RUN;
ODS GRAPHICS OFF;

PROC PRINT DATA=PCSCORE;
PROC PRINT DATA=PCLOADING;
RUN;

/* KEEP ONLY THE EIGENVECTOR (SCORE) ROWS; _TYPE_ IS CHARACTER, SO THE VALUE IS QUOTED */
DATA PCLOADING;
  SET PCLOADING;
  WHERE _TYPE_='SCORE';
RUN;

PROC IML;
USE eparln;
READ ALL VAR {NMPG} INTO OY;
READ ALL VAR {NDISP NHP NWT NACC} INTO OX;
CLOSE eparln;
N=NROW(OY);
P=NCOL(OX);
MY=OY[:,];
MX=OX[:,];
STDY=(1/(N-1)*OY`*(I(N)-1/N*J(N,N))*OY)**0.5;
VARX=1/(N-1)*OX`*(I(N)-1/N*J(N,N))*OX;
/* STANDARDIZE Y */
STY=(OY-MY)/STDY;
D=T(VECDIAG(VARX));
D=D##(0.5);
/* STANDARDIZE X */
STX=(OX-REPEAT(MX,N,1))/REPEAT(D,N,1);
/* PC REGRESSION */
PRINT STX;
LAMBDA=EIGVAL(STX`*STX);
VP=CUSUM(LAMBDA)/SUM(LAMBDA)*100;
PRINT VP;
/* CHOOSE 2 COMPONENTS: SMALLEST NUMBER WITH VP>=95 (VP IS IN PERCENT) */
P=EIGVEC(STX`*STX)[,1:2];
PRINT P;
PCX=STX*P;
PRINT PCX;
BETAR=GINV(PCX`*PCX)*PCX`*STY;
PRINT BETAR;
/* TRANSFORM BACK */
B=STDY*P*BETAR/T(D);
PRINT B;
B0=MY-MX*B;
BB=B0//B;
PRINT BB;
QUIT;

PROC PLS DATA=eparln METHOD=PCR;
  MODEL NMPG = NDISP NHP NWT NACC;
RUN;

PROC PLS DATA=eparln METHOD=PCR NFAC=4 DETAILS;
  MODEL NMPG = NDISP NHP NWT NACC;
RUN;
QUIT;

Works Cited

Chatterjee, Samprit, and Ali S. Hadi. "Regression Analysis by Example." 5th ed. Hoboken: Wiley, 2012.

Dowdy, Shirley, and Stanley Wearden. "Statistics for Research." Wiley, 1983.

Quinlan, R. "Combining Instance-Based and Model-Based Learning." Proceedings of the Tenth International Conference on Machine Learning. University of Massachusetts, Amherst: Morgan Kaufmann, 1993. 236-243.

United States Department of Energy. "Energy Efficient Technologies." 19 April 2013. 20 April 2013.

---. "Engine Technologies." 19 April 2013. 20 April 2013.

---. "Reduce Oil Dependence Costs." 19 April 2013. 20 April 2013.

United States Environmental Protection Agency. "Light Duty Automotive Technology and Fuel Economy Trends: 1975 Through 2011." 2011.

---. "Light-Duty Automotive Technology, Carbon Dioxide Emissions, and Fuel Economy Trends: 1975 Through 2012." 07 March 2013. 07 March 2013.