rpa307 p2

41
RPA 307 Project #2 Valuation Modelling Case Study Name: Eun Ji (Lindsay) Lee Student Id : 055-693-147 Date Submitted: December 1 2014 Professor: Anthony Percaccio

Upload: lindsay-eun-ji-lee

Post on 10-Aug-2015

258 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: RPA307  P2

RPA 307 Project #2

Valuation Modelling Case Study

Name: Eun Ji (Lindsay) Lee

Student Id : 055-693-147

Date Submitted: December 1 2014

Professor: Anthony Percaccio

Page 2: RPA307  P2

P a g e | 1

Table of Contents Introduction 2 Data Analysis 2 Transformations 9 Variable selection 10 Model Calibration 11 Final analysis and conclusions 17 Appendix A: Graphs and analysis 22 Appendix B: Syntax for transformation 39

Page 3: RPA307  P2

P a g e | 2

Introduction

This report is intended to provide the most accurate predicted market value based on given data set, ‘Midsize_T2’ through multiple linear regression equation. SPSS software was used for the calculation and the calculation is derived from the nine steps by the course manual.

The equation will provide predicted house price information for single-family residential properties in Midsize City. The intended users are lenders, private vendors, homebuyers, and property tax agents in the market. The valuation date is December 31, 2008.

Data Analysis Step 0. Preliminary data screening

The purpose of preliminary data screening is to filter out ‘bad’ data that was incorrectly put(such as one digit square foot in total finished area, 100 washrooms, etc.), filter out any duplicated data(s), and quickly review the outliers.

The raw data has 726 sales with 43 variables. Quick screening by frequency statistics show there were three ‘bad’ data; 72 Mount View, 46 South Cove, and 46 South Cove (the latter two are duplicated) have one-digit total finished area (1,1,3 accordingly), they were removed. Also there were 9 duplicate sales, which were removed. So far 714 sales were left.

Checking for outliers by scatter plot (sales price vs. continuous variables) showed there were no specific outliers.

Step 1. Specifying the model

The basic model is;

MV= LV + BV

where MV = Estimated Market value of sales price;

LV = land value and BV = building value

This is the base model which will be developed in multivariate (multi variable) linear model.

Step 2. Examine the variables

The 46 variables could be distinguished into structure (building-related), land-related, and information only (user-friendly purpose). The detail is as follow:

Category Variables Structure (Candidate for building value) YearBuilt, EffectiveYear,

Foundation, FinishedArea, First_Floor,

Page 4: RPA307  P2

P a g e | 3

Second_Floor, BasementTotalArea, BasementFinishedArea, Stories, DeckCoveredArea, DeckUncoveredArea, FullBath, ThreeQtrBath,HalfBath, Bedrooms, Fireplcs, MultiCarGarage, SingleCarGarage, CarPort, Pool, Outbuildings

Land (Candidate for land value) LotSizeSqft, CornerLot, OutstandingView, ExcellentView, GoodView,

Information only PID, Nbhd, Neigh, ActualUse, ActualUseTypeDese, OutstandingView, ExcellengView, Good View SalesYear, SalesMonth, SalesDay, Sale Price, Adj_Price AssessedLand, AssessedImprov, AssessedTotal ManualClass, Lin_mancls, ManualClassDesc, Random

Elimination of Non-meaningful Variables

1. ActualUse (all 0), ActualUseTypeDesc(all 'SINGLE FAMILY DWELLING),

2. SaleYear(all 2008) will not be considered in any further steps since there is no distinction finding.

3. Random variable does not imply anything hence will not be considered in further steps.

4. AssessedLand, AssessedImprov, AssessedTotal do not reflect current market price hence will not be considered as well.

5. Sales date will be not used because it is too specific.

Step 3. Examining the variables

Check for time adjustment

Note that all sales were made in 2008 and sales day was not considered since it is too specific. There is a span of 7 months (May 2008 to December 2012) of sales, which shows fairly even distributions. Since the target date is Dec 31 2008, in this rapid moving market, time adjustment is required to make all the data in the same baseline. However, the detail of time adjustment will not be discussed here because it is already included in the data.

Page 5: RPA307  P2

P a g e | 4

Note that the market movement seems stable throughout the seven months but slightly more active during non-winter seasons. The typical time adjustment steps are 1) study graphically the general trend with time; 2) confirm the trend using K-W test; 3)

Sales price and time adjusted sale price has high R-squared = 95.8%.

Summary of the relationship of variables with adjusted sales price

• All 714 sales are open market sales of single family detached dwellings • the land area for all properties is measured in square foot

Neighbourhood characteristics

• there are five neighbourhoods; 55 properties of Alexandra Heights (nbhd code 005), 101 properties on Cairn Wood (nbhd code 014),

Page 6: RPA307  P2

P a g e | 5

• 87 properties on Lillian Lake (nbhd code 015), 164 properties on Mount View (nbhd code 072),

• 175 properties on South Cove (nbhd code 046), and 132 properties on West Hill (nbhd code 036)

House characteristics - general

• the lot size of the properties is from 2311 sq ft to 24803 sq ft, mean of 7572.8 sq ft and median of 7127.5 sq ft.

• *year built ranges from 1930 to 2007, median 1993 anad mean 1991.67 • effective year ranges from 1941 to 2007, median 1993 • There are four foundation types; 375 are basement, 247 are crawl space, 56 are 33 are

slab on-grade • Total finished area varies from 616 sq ft to 7477 sq ft. the range(6861 sq ft) is quite huge • *house characteristics-inside • the size of the first floor varis from 546 sq ft to 3609 sq ft, median of 1292 sq ft and mean

of 1322.39 sq ft • 476 properties have second floor; ranges from 309 sq ft to 2115 sq ft. mean of 1057.31

and median of 1065.5 • 432 properties have basement, total basement area ranges from 140 to 2625 sq ft.and

total finished basement area ranges from 0 to 2625 • bathrooms/ • 4 properties have 0 full bathroom; 4 properties have 6 full bathrooms; most properties

have 1 or 2 full bathrooms • most properties have 0 three-quarter bathrooms • 1 property have 3 half bathrooms; most properties have 2 full bathrooms • *most properties have 1 or 2 fireplaces • only 4 properties have 2 multi-car garages; most properties have 1 multi-car garage • most properties have 0 single car garage • only 1 properties have 2 carports; most properties have 0 carport. • *average number of bedrooms are 3.87. varies from 1 to 8. • *only one property has one bedroom • only two properties has eight bedrooms • *the properties has mostly 1 or 2 fireplaces, 16 properties has three fireplaces. • *most houses have one multi-car garage. only four properties has two multi-car garages • most houses have zero single car garages. • *most houses have zero carport. only one property has two carports.

House characteristics - binary variables

• 94.6% of the properties do not have pool • 96.6% of the properties do not have outbuilding • 6.9% of the properties are located around the corner • 2.8% of the properties have outstanding view • 5.5% of the properties have excellent view • 3.7% of the properties have good view.

Page 7: RPA307  P2

P a g e | 6

• all properties were sold at 2008. • properties were quite evenly sold between May to Dec, • but more properties were sold on Aug to Nov; about 20-30 more properties than other

months. • the majority of the properties, 33.5%, are second storey single family dwelling, modern

style, builted after 1930. • the quality of the house ranges from • the condition of the homes is either • 235 properties are first floor ; 20 properties have 1.5 storey; 456 2-floor properties • number of full bathrooms are varied from 0 to 6; mostly 1 or 2 full bathrooms

According to rule of 5, • 0 full bathrooms; four properties • 6 full bathrooms; four properties • 3 half bathrooms; one property

• 1 bedroom; one property • 8 bedrooms; two properties

• 2 multi-car garages: four properties • 2 carports; one property

Were eliminated because there is less than 5 data, which does not represent the average market condition.

So far, there were 712 remaining properties.

Graph analysis of the variables

a) scatter plot analysis : adjusted time sales price vs. areas

Adjusted time sales and continuous variables (LotSizeSqft, FinishedAreaFirst_Floor, BasementTotalArea, BasementFinishedArea , second_floor , basementfinishedarea ) were examined. Nothing significant relationship was observed. The highest R-squared value was 0.638, which was Finished area and Adjusted sales price.

The graphical analysis between basementtotalarea and basementfinishedarea showed R-squared of 0.447.

Page 8: RPA307  P2

P a g e | 7

b) box plot analysis: adjusted sales price vs. discrete variables

Adjusted sales price and discrete variables such as Neigh, Foundation, Stories, Bedrooms, Fireplcs, Pool, Outbuildings, CornerLot, Views, and ManualClass, were examined.

Page 9: RPA307  P2

P a g e | 8

The Noticeable distinctions were the Neighbourhoods - Alexander and Southcode had slightly higher adjusted sales price than other neighbourhoods, second story houses are higher than 1 or 1.5 stories, number of bedrooms are clearly distinct with adj sales price, number of fireplaces are clearly distinct with adj sales price, and manual class are distinct with adj sales price.

Corrleation analysis showed total finishe area (79.9%), assessment improved (68.6$) are closely corrleated with adjusted sales price.

Hence, the variables difficult or not worth to use were;

sales price will not be used; instead, time adjusted sales price will be used because it represents the data better year built; effective year is more efficient information storeys- finished area will include this information sales date is too specific hence not needed manual class will be transformed to linearized manual class random variable does not attribute any value to the price; hence will be not used sales year will not be used because it does not differentiate the data (all are in 2008) will only use deck covered area because simply deck uncovered area is area - deck covered area

Transformations Step 4. Transformations

Variables need to be transformed were

1) Number of bathrooms to be combined into one variable, Bathrooms = FullBath + 0.75*ThreeQtrBath + 0.5*HalfBath.

2) Corner lot, outbuildings, Excellent_view, Outstanding_view, Good_view and pool to be transformed to binary variable, i.e. Yes/ No to 1 and 0

3) All the square footage variables were linearized; they were multiplied by Lin_mancls,

Lin_LotSizeSqft = Lin_mancls * LotSizeSqft.

Lin_FinishedArea =Lin_mancls * FinishedArea.

Lin_First_Floor = Lin_mancls* First_Floor.

Lin_Second_Floor = Lin_mancls * Second_Floor.

Lin_BasementTotal = Lin_mancls* BasementTotalArea.

Lin_BasementFinished = Lin_mancls* BasementFinishedArea.

Lin_DeckCovered = Lin_mancls * DeckCoveredArea.

4) Foundations and Neighbourhoods were transformed as well; into six different binary variables.

The transformed variables were double checked by cross-tabulation by comparing to pre-transformed data, in case of something went wrong.

(The detailed syntax is available in Appendix B)

Page 10: RPA307  P2

P a g e | 9

Variable Selection Step 5. Re-examing new variables

The transformed variables were re-examined through scatter plot, box plot, and correlation matrix.

Adjusted sales price and discrete variables (bathrooms, CornorLot2, Outbuildings2, Pool2, Outstanding_view, Excellent_view, Good_view, AlexandraHeights, CairnWood, LillianLake, MountView, SouthCove, WestHill, CrawlSpace, PartialBsmt, and SlabFn) were examined through box plot.

Outstanding view, Excellent view, South Cove, West Hill showed clear distinction. (all moderate to low distinctive results could be found in Appendix A).

Page 11: RPA307  P2

P a g e | 10

Scatter plot analysis examined the relationship between adjusted sales price and continuous variables (Lin_LotSizeSqft, Lin_FinisheArea, Lin_First_Floor, Lin_Second_Floor, Lin_BasementTotal, Lin_BasementFinished, Lin_DeckCovered). All of the area variables had positive corrleationwith the adjusted sales price. However mulicolliniearity should be considered hence correlation matrix was examined as well.

According to the correlation matrix, lot size, finished area, first floor area, and second floor area had at least moderate relationship (correlation of 0.6 or high) with the adjusted sales price. However, lot size is also closely related to finished area and first floor, finished area is closely related to first floor, second floor, and finished basement area, etc. This “multicollinearity” is simply double-counting and potentially decreases the effectiveness of the equation. The multicollinearity could be also determined by VIF and significance level on regression analysis.

Model Calibration Step 6: List the Variables for Calibration

*The following 16 variables will be common to each of combinations

bathrooms CornerLot2

outbuildings2 PartialBsmt

SlabFn

pool2 outstanding_view excelleng_view

good_view alexandra heights

southcove westhill

CairnWood LillianLake crawlspace

Fireplcs Stepwise

Regression

In this step, multivariate linear regression (enter) was performed to build the model, which will be further adjusted.

Page 12: RPA307  P2

P a g e | 11

Three possible combinations of area variables are designed based on their low correlation.

Any area variables with their correlations greater than 60% were not put in the same model to avoid multicollinearity.

First Test: the sixteen common variables plus lot size, first floor, second floor

Second Test: the sixteen common variables plus total finished area

Third Test: the sixteen common variables plus lot size and finished area

First Run

First test includes the 16 common variables in addition to (Linearized) Lot Size, First Floor, and Second Floor.

Note that no variable has VIF greater than 3.3, Adjusted R^squared = 0.819, and the Standard Error of Estimate is 40395.853.

Second Run

Page 13: RPA307  P2

P a g e | 12

The second test includes 16 common variables and Total Finished Area.

VIF of 3.591 was observed in bathroom variable, which is not acceptable because VIF should be less than 3.591. Hence, will exclude bathroom and re-run

Second run - 2

All VIF was under 2, adjusted R square was 0.74 and SEE was 48408.249.

Page 14: RPA307  P2

P a g e | 13

Third Run

The third test considered the 16 common variables plus Lin-Lotsize sqft and Finished Area

Adjusted R-Square was 0.812, SEE was 41194.271, and All VIF were less than 3.3

Hence, first test was taken as the best representative based on the highes R-square and lowest SEE, as well as all VIF was less than 3.3.

Hence, the "Best” set of variables was:

Lin_LotSizeSqft Lin_First_Floor

Lin_Second_Floor AlexandraHeights

CairnWood LillianLake MountView SouthCove

WestHill Crawlspace

PartialBsmt SlabFn Bsmt

bathrooms CornerLot2

Outbuildings2 Pool2

Outstanding_view Excellent_view

Good_view.

Page 15: RPA307  P2

P a g e | 14

Step 7: model calibration

The model calibration step is calculating the coefficients for each of the variables in the model.

First, comparing R squared, SEE, constant, and coefficients in enter- and stepwise- regressions.

Showed that the R squared, SEE, constant, and coefficients were all the same.

Page 16: RPA307  P2

P a g e | 15

Residuals against the predicted values indicated sales 168, 196, and 565 were the outliers based on standard residual 3.3. Sales 168, 196, and 565 were filtered out.

When the regression was re-run, adjusted R-square and SEE were improved but crawl space and basement had VIF greater than 3.3. Hence Crawl space was removed because basement had the higher VIF, and sales number 157 was an outlier.

Page 17: RPA307  P2

P a g e | 16

After removing crawl space and sales number 157, the new regression model looked okay. The

following is the final result.

Page 18: RPA307  P2

P a g e | 17

The final model has adjusted R-square of 0.826, which is strong because it is greater than 0.8, SEE of 38941.812, and COV is (mean of adj.sales divided by SEE) which is 530985.62/38941.812 is 13.6%, which is acceptable range of 10~20% but ideally it should be lower than 10%, All t-statistics are not within the range of positive and negative 1.6, all VIF were less than 3.3 which means there is no major multicollinearity, and all significance level was less than 0.1, F-statistics was 262.557 and signicance level of the model was 0.000.

The final model is:

Predval = 327643.789 + 24181.094 * AlexandraHeights - 12570.719 * MountView

+ 20877.339 * SouthCove - 23705.343 * WestHill + 24663.88 * Bsmt - 17051.539 * SlabFn

+ 17787.747 * bathrooms - 15875.994 * CornerLot2 + 48134.742 * Outstanding_view

+ 28127.906 *Excellent_view + 7.613 * Lin_LotSizeSqft + 28.421 * Lin_First_Floor +

10.930 * Lin_Second_Floor.

Final Analysis and Conclusions STEP8: Test and Evaluate the Model

In this step, we will test and evaluate the model based on PAR = Predval/Adj_Price.

a) Testing Neighbourhood

Testing Property characteristics

Both scatter and box plot analysis showed there is no relationship between PAR and the variables in the model.

The mean ranks should be approximately 349 (half of 698). All of the neighbourhoods appear to be close to the target value.

Page 19: RPA307  P2

P a g e | 18

The Chi-square test indicates the acceptable range of the fair distribution of neighbourhoods.

Examination of Property Characteristics

All of them showed unrelated relationships with PAR.

Correlation matrix

The Correlation is quite low. The variable most strongly related to PAR is Lin_LotSizeSqft, -0.03.

Page 20: RPA307  P2

P a g e | 19

b) Stepwise Regression

14 variables are chosen in this model.

This model is acceptable because all of their t-statistics is not within the range of 1.6. It

has adjusted R square of 0.819, the mean value of adjusted sale is 530255.62. The COV is 530255.62 / 40330.79 = 13.15%, which is good but COV less than 10% would

be excellent representation of the data. F statistics is 228.297 with significance level

0.00. Lower significance level indicates good representation. All of the VIF were less than 3.3 and Tolerance greater than 0.3.

Hence, the “best” model has fourteen variables including;

Lin_LotSizeSqft

Lin_First_Floor

Lin_Second_Floor AlexandraHeights

CairnWood LillianLake

MountView SouthCove

WestHill

Crawlspace PartialBsmt

CornerLot2

Outbuildings2

Pool2 Outstanding_view

Excellent_view

Good_view SlabFn

Bsmt

bathrooms

Neighbourhood test

Page 21: RPA307  P2

P a g e | 20

The result is quite good.

The overall mean is 1.006 (should be within 1.1%) and the overall median is 1.000 (should be within 2.7%). The COD is 6.1% which is very good indication because it is below than 10%.

The lowest median PAR is 0.993 while the highest is 1.009. All of the 95% interval contains the median if 1.000. Which means there is less than 5% chance the medial PAR is not equal to 1.000.

Hence, there are no further adjustments are needed.

Step 9: Conclusion

The regression model quite well represents the market value for single family dwellings in Midsize City. Based on model test in Step 7 and 8, there is no further adjustment needed. However, this model represents 82% of the market data. Which is fairly good indication, however, one should research further to improve the goodness of the equation by gathering more data. Also all of the houses in the data set were sold in 2008 hence more comprehensive sales date is needed to understand the general price trend of sales price. Also, one should consider the subjective of the views (excellent, outstanding, and good) and the manual class, which was subjectively decided by one appraiser. Overall, the valuation model is credible and could be used by lenders, private vendors, homebuyers, and property tax agents.

Appendix A: Graphs and analysis Step 0:

Checking outliers

Page 22: RPA307  P2

P a g e | 21

Step 3: graphical analysis of variables

Page 23: RPA307  P2

P a g e | 22

Step 5: Re-examining the variables

Page 24: RPA307  P2

P a g e | 23

Page 25: RPA307  P2

P a g e | 24

Scatter plot analysis

Page 26: RPA307  P2

P a g e | 25

.

Page 27: RPA307  P2

P a g e | 26

Step 6: Regression Analysis First run

Page 28: RPA307  P2

P a g e | 27

Second run

Page 29: RPA307  P2

P a g e | 28

Second run-2

Page 30: RPA307  P2

P a g e | 29

Third run

Page 31: RPA307  P2

P a g e | 30

Step 7: model calibration

Re-run

Page 32: RPA307  P2

P a g e | 31

Final model

Page 33: RPA307  P2

P a g e | 32

Page 34: RPA307  P2

P a g e | 33 PAR

N Valid 698

Missing 0

Mean 1.0056

Median 1.0001

Range .40

Minimum .81

Maximum 1.21

Percentiles 10 .9129

20 .9384

25 .9488

30 .9573

40 .9790

50 1.0001

60 1.0241

70 1.0493

75 1.0601

80 1.0740

90 1.1059

Model Summary

Model R R Square

Adjusted R

Square

Std. Error of the

Estimate

1 .907a .823 .819 40330.790

a. Predictors: (Constant), Lin_Second_Floor, MountView, SlabFn,

CornerLot2, Outstanding_view, Excellent_view, Crawlspace,

AlexandraHeights, WestHill, Lin_LotSizeSqft, SouthCove, bathrooms,

Lin_First_Floor, Bsmt

Page 35: RPA307  P2

P a g e | 34

Step 8:

Neighbourhood analysis

Step Variable Added Adjusted R Squared Standard Error 1 Lin_First_Floor 0.612 59106.45 2 Lin_LotSizeSqft 0.670 54500.135 3 Bathrooms 0.746 47800.484 4 Bsmt 0.769 45627.034 5 SouthCove 0.784 44087.197 6 AlexandraHeights 0.796 42910.725

Linear Regression based on PAR

Model Summaryb

Model R R Square

Adjusted R

Square

Std. Error of the

Estimate

1 .086a .007 -.013 .07434

a. Predictors: (Constant), Lin_Second_Floor, MountView, SlabFn,

CornerLot2, Outstanding_view, Excellent_view, Crawlspace,

AlexandraHeights, WestHill, Lin_LotSizeSqft, SouthCove, bathrooms,

Lin_First_Floor, Bsmt

b. Dependent Variable: PAR

Page 36: RPA307  P2

P a g e | 35

7 WestHill 0.802 42276.472 8 Outstanding_View 0.807 41696.602 9 Lin_Second_Floor 0.812 41131.194 10 SlabFn 0.814 40883.494 11 Excellent_View 0.816 40717.662 12 MountView 0.818 40532.234 13 CrawlSpace 0.819 40424.299 14 CornerLot2 0.819 40330.790

Stepwise regression

Page 37: RPA307  P2

P a g e | 36

Page 38: RPA307  P2

P a g e | 37

Page 39: RPA307  P2

P a g e | 38

APPENDIX B: Syntax for Transformation

***step 4 transformation

*variables need to be transfomed

- number of bathrooms are needed to be combine into one variable.

USE ALL.

COMPUTE Bathrooms = FullBath + 0.75*ThreeQtrBath + 0.5*HalfBath.

*- corner lot, outbuildings, and pool will be transformed to binary variable.

RECODE CornerLot ('Y'=1) (ELSE=0) INTO CornerLot2.

RECODE Outbuildings ('Y'=1) (ELSE=0) INTO Outbuildings2.

RECODE Pool ('Y'=1) (ELSE=0) INTO Pool2.

*- views (outstanding, excelleng, good) will be combine into one variable and will be linearized.

RECODE OutstandingView ('Y'=1) (ELSE=0) INTO Outstanding_view.

RECODE ExcellentView ('Y'=1) (ELSE=0) INTO Excellent_view.

RECODE GoodView ('Y'=1) (ELSE=0) INTO Good_view.

*Manual classic description was already linearized.

Page 40: RPA307  P2

P a g e | 39

*cross tab

CROSSTABS

/TABLES=FullBath ThreeQtrBath HalfBath BY bathrooms

/FORMAT=AVALUE TABLES

/CELLS=COUNT

/COUNT ROUND CELL.

*linearize square footage variables

COMPUTE Lin_LotSizeSqft = Lin_mancls * LotSizeSqft.

COMPUTE Lin_FinishedArea =Lin_mancls * FinishedArea.

COMPUTE Lin_First_Floor = Lin_mancls* First_Floor.

COMPUTE Lin_Second_Floor = Lin_mancls * Second_Floor.

COMPUTE Lin_BasementTotal = Lin_mancls* BasementTotalArea.

COMPUTE Lin_BasementFinished = Lin_mancls* BasementFinishedArea.

COMPUTE Lin_DeckCovered = Lin_mancls * DeckCoveredArea.

EXECUTE.

*transform neighbourhoods

*Note: changed Nbhd from String to Numeric.

COMPUTE AlexandraHeights = 0.

COMPUTE CairnWood = 0.

COMPUTE LillianLake = 0.

COMPUTE MountView = 0.

COMPUTE SouthCove = 0.

COMPUTE WestHill = 0.

IF (Nbhd = 005) AlexandraHeights = 1.

IF (Nbhd = 014) CairnWood = 1.

IF (Nbhd = 015) LillianLake = 1.

Page 41: RPA307  P2

P a g e | 40

IF (Nbhd = 072) MountView = 1.

IF (Nbhd = 046) SouthCove = 1.

IF (Nbhd = 036) WestHill = 1.

EXECUTE.

*transform foundation.

COMPUTE Crawlspace = 0.

COMPUTE PartialBsmt = 0.

COMPUTE SlabFn = 0.

COMPUTE Bsmt = 0.

IF (Foundation = 'CRAWL') Crawlspace = 1.

IF (Foundation = 'PBSMT') PartialBsmt = 1.

IF (Foundation = 'SLAB') SlabFn = 1.

IF (Foundation = 'BSMT') Bsmt = 1.

EXECUTE.