Page 1: Predicting Housing Sales Price in the Year 2008 and...Accurately predict Sales Price in 2008 via House ... Predicting Housing Sales Price in the Year 2008 Author: wildcat Created Date:

By: Shivani Choudhary & Emily Phillips

Predicting Housing Sales Price in the Year 2008

Objective

Accurately predict Sales Price in 2008 via House characteristics

Which of these characteristics are important in this prediction?

Dataset obtained from the United States Census Bureau website: http://www.census.gov/construction/nrc/index.html

Data are collected through the Survey of Construction

Funded by the Department of Housing and Urban Development

Methodology

Split data into training (75%) and test (25%)

Complete Univariate Analysis of variables

Check for Heteroscedasticity, multicollinearity, etc.

Step-wise Model Selection

Test significance, residual analysis, etc.

Check model on test dataset

Re-run on full dataset
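
As a minimal sketch of the first step, the 75/25 split can be done by shuffling the rows and cutting at three quarters of the length. The function name, the seed, and the stand-in rows are assumptions for illustration; the slides do not say how the split was actually drawn.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # truncate, so 7,042 rows give 5,281
    return shuffled[:cut], shuffled[cut:]


# Stand-in records; the real rows would be the survey observations.
rows = list(range(7042))
train, test = train_test_split(rows)
print(len(train), len(test))  # 5281 1761
```

Fixing the seed keeps the split reproducible, so the model checked on the test rows never sees them during selection.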

Data Distribution

7,042 observations in the whole dataset: 5,281 in training, 1,761 in test

1 continuous response, 1 continuous regressor, and 6 categorical regressors

Variable                         Type
Sales Price                      Continuous (Response)
Square Foot Area of the House    Continuous (Regressor)
Bedrooms                         Categorical (Regressor)
Full Bathrooms                   Categorical (Regressor)
Half Bathrooms                   Categorical (Regressor)
Stories                          Categorical (Regressor)
Parking Facility                 Categorical (Regressor)
Metropolitan Area                Categorical (Regressor)

Scatterplot

Checking Heteroscedasticity

Spread vs Level

Box-Cox Transformation

Reducing Heteroscedasticity

Creating Linear Relationships

Scatterplot of Re-expressed Values

Problem- Interpretation

Our final Box-Cox Transformation gave a lambda of -0.333 (the reciprocal cube root)

This is hard to interpret, and thus not optimal.

-0.333 is close to 0, and a lambda of 0 corresponds to the log transformation

The log is easier to explain
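
For reference, the standard Box-Cox family is (y^lambda - 1) / lambda for lambda != 0, with log(y) as the limiting case at lambda = 0. The sketch below implements that textbook definition; the prices are hypothetical and are not taken from the dataset.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lam approaches 0
    return (y ** lam - 1.0) / lam


prices = [100_000, 200_000, 400_000]  # hypothetical sale prices
for lam in (-1 / 3, 0):
    print(lam, [round(box_cox(p, lam), 4) for p in prices])
```

Both choices are increasing in price, so they rank houses the same way; the log has the advantage that differences on the transformed scale read as approximate percentage changes.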

Proof of Similarity of Transformation

Proof of Similarity of Transformation

Outliers: Hat Matrix

Leverage cutoff: 2p/n ~ 0.003 (p = number of model parameters, n = number of observations)
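
To illustrate the 2p/n rule, the sketch below computes hat-matrix diagonals for the simplest case, a regression on an intercept and one regressor, where the leverage has the closed form 1/n + (x_i - mean)^2 / Sxx. This one-regressor setup and the x values are assumptions made to keep the example short; the model in these slides has more terms.

```python
def leverages(x):
    """Hat-matrix diagonal for a regression of y on an intercept and x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage(x, p=2):
    """Indices whose leverage exceeds the 2p/n rule of thumb."""
    cutoff = 2.0 * p / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Hypothetical log square footage; the last house is unusually large.
x = [7.0, 7.2, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 9.5]
print(high_leverage(x))  # [9]
```

The leverages always sum to p, so 2p/n is twice the average leverage.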

Final Model

All three selection procedures (forward, backward, and stepwise), using log transforms, agreed on the final model

Metropolitan Area is not in the final model

Test data confirmed this model as a good fit

R^2 = 0.5371 for test

R^2 = 0.5228 for training

Refit this model on the entire dataset for more accuracy
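
A held-out R^2 like the test figure above can be computed as 1 - SSE/SST from the actual and predicted responses on the test rows. The sketch below shows that calculation; the values are hypothetical log prices, and the slides do not show how the authors computed their figure.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


# Hypothetical log sale prices and model predictions for four test houses.
actual = [12.1, 12.4, 12.0, 12.9]
predicted = [12.2, 12.3, 12.2, 12.6]
print(round(r_squared(actual, predicted), 3))
```

A test R^2 close to the training R^2 suggests the model is not overfit to the training rows.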

R^2 = 0.5277 for the model refit on the entire dataset

X1 = Log Square Foot Area of House
X2 = 1 if 2 full bathrooms
X3 = 1 if 3 full bathrooms
X4 = 1 if 4 or more full bathrooms
X5 = 1 if 1 half bathroom
X6 = 1 if 2 or more half bathrooms
X7 = 1 if 3 bedrooms
X8 = 1 if 4 bedrooms
X9 = 1 if 5 or more bedrooms
X10 = 1 if 2 car garage
X11 = 1 if 3 or more car garage
X12 = 1 if other parking
X13 = 1 if 2 or more stories
X14 = 1 if split-level

Variable    Coefficient   Std. error   t-statistic   p-value     Meaning
Intercept    7.195195     0.134570      53.468       < 2e-16
X1           0.673273     0.018480      36.432       < 2e-16     Log Square Foot Area of House
X2           0.014853     0.031145       0.477       0.633       = 1 if 2 full bathrooms
X3           0.203439     0.033720       6.033       1.69e-09    = 1 if 3 full bathrooms
X4           0.421026     0.039484      10.663       < 2e-16     = 1 if 4 or more full bathrooms
X5           0.113380     0.011296      10.037       < 2e-16     = 1 if 1 half bathroom
X6           0.182157     0.031005       5.875       4.42e-09    = 1 if 2 or more half bathrooms
X7          -0.164756     0.015939     -10.337       < 2e-16     = 1 if 3 bedrooms
X8          -0.185899     0.018351     -10.130       < 2e-16     = 1 if 4 bedrooms
X9          -0.266615     0.025485     -10.462       < 2e-16     = 1 if 5 or more bedrooms
X10         -0.003572     0.018385      -0.194       0.846       = 1 if 2 car garage
X11          0.145093     0.021573       6.726       1.88e-11    = 1 if 3 or more car garage
X12         -0.075753     0.026314      -2.879       0.004       = 1 if other parking
X13          0.067372     0.011786       5.716       1.13e-08    = 1 if 2 or more stories
X14         -0.002316     0.061797      -0.037       0.970       = 1 if split-level house
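
The coefficient table above can be turned into a rough point prediction. The sketch below assumes the response is the natural log of Sales Price and that X1 is the natural log of square footage; the slides say only "Log", so the base is an assumption, as are the function names. Passing no indicators gives the baseline house, whose categories are not listed in the slides.

```python
import math

INTERCEPT = 7.195195
SLOPE_LOG_SQFT = 0.673273  # coefficient on X1
INDICATOR_COEFFICIENTS = {
    "X2": 0.014853, "X3": 0.203439, "X4": 0.421026,
    "X5": 0.113380, "X6": 0.182157,
    "X7": -0.164756, "X8": -0.185899, "X9": -0.266615,
    "X10": -0.003572, "X11": 0.145093, "X12": -0.075753,
    "X13": 0.067372, "X14": -0.002316,
}


def predict_log_price(sqft, active=()):
    """Fitted log price for a house with the given indicators set to 1."""
    total = INTERCEPT + SLOPE_LOG_SQFT * math.log(sqft)
    return total + sum(INDICATOR_COEFFICIENTS[name] for name in active)


def predict_price(sqft, active=()):
    """Back-transform to dollars by exponentiating (assumes natural log)."""
    return math.exp(predict_log_price(sqft, active))


# Hypothetical house: 2,000 sq ft, 3 full baths, 4 bedrooms, 2 or more stories.
print(round(predict_price(2000, ["X3", "X8", "X13"])))
```

Under these assumptions a coefficient b on an indicator multiplies the predicted price by exp(b), and the 0.673 on X1 means that 1% more floor area goes with roughly 0.67% higher predicted price.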

Testing a Subset of Regression Coefficients

Full Model: F-statistic= 560.9, p-value < 2.2e-16

We can conclude that the equation as a whole has predictive value

Variable taken out           F-statistic   P-value
Square Foot Area of House    1327.3        < 2.2e-16
Full Bathrooms               118.3         < 2.2e-16
Half Bathrooms               55.972        < 2.2e-16
Bedrooms                     45.704        < 2.2e-16
Parking Facility             55.833        < 2.2e-16
Stories                      16.48         7.24e-08
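
The F-statistics above follow the usual nested-model formulas. As a sketch, the overall F can be recovered from R^2, and a partial F compares the error sum of squares with and without a group of terms. The function names and the SSE values in the partial example are hypothetical; only R^2 = 0.5277, n = 7,042, and the 14 regressors come from the slides.

```python
def overall_f(r2, n, k):
    """Overall F-statistic for a model with k regressors plus an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def partial_f(sse_reduced, sse_full, q, n, p):
    """F-statistic for dropping q terms from a full model with p parameters."""
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p))


# Full-dataset fit reported in the slides: R^2 = 0.5277, n = 7,042, 14 regressors.
print(round(overall_f(0.5277, 7042, 14), 1))

# Hypothetical sums of squares for dropping a 3-level factor (q = 3).
print(partial_f(sse_reduced=1200.0, sse_full=1170.0, q=3, n=7042, p=15))
```

The first value lands within rounding of the reported 560.9, which is consistent with the full-model F-test being run on the refit over all 7,042 observations.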

Example of Whole-Variable vs Individual-Level Significance

Variable            Level of Var   t-statistic   P-value     Signif. code
Parking Facility    Level 2        -0.194        0.846
Parking Facility    Level 3         6.726        1.88e-11    ***
Parking Facility    Level 4        -2.879        0.004       **

Parking Facility as a whole: F-statistic = 55.833, P-value < 2.2e-16

Residuals vs Fitted

Normal Q-Q Plot of Residuals

Problems we faced

Necessary transformations for variables

Missing data (chose to exclude)

Low Level of Multicollinearity

Categorical Data

Outliers

Possible overfitting (huge dataset)?

Conclusion

We were able to develop a model that predicts the Sales Price of houses in 2008 moderately well (R^2 of about 0.53)

We found variables that appear to be important in this prediction