CANADA MORTGAGE AND HOUSING CORPORATION
Toronto Housing Market Segmentation using GWR
Xiongbing Jin
Canada Mortgage and Housing Corporation
Oct. 12th, 2017 @ Esri User Conference Ottawa
Organization overview
• The mission of Canada Mortgage and Housing Corporation (CMHC) is to help Canadians meet their housing needs
• Business areas
• Mortgage loan insurance
• Affordable housing
• First Nations housing
• Policy and research
• Securitization
• CMHC uses GIS and Esri products across many sections and business areas
Why market segmentation
• Location is one of the most important factors influencing housing prices
• Large census metropolitan areas (CMAs) like Toronto, Montreal and Vancouver contain areas with significantly different locational factors
• Market segmentation divides a study area into submarkets. Within each submarket the influence of location is relatively homogeneous, so hedonic models can better capture local market dynamics
• Manual delineation of submarkets is often arbitrary, error-prone, and time-consuming
• A GWR-based automated approach is proposed (Borst, 2007)
Location! Location! Location!
Housing submarkets in the City of Toronto
Source: Toronto Star
What is Geographically Weighted Regression (GWR)
Source: Fotheringham et al. (2003)
Key features of GWR
• GWR fits a separate local regression for each subject property, using the properties in its neighbourhood
• Bandwidth determines sample size
• Fixed bandwidth: all properties within a fixed distance (e.g. 2,000 metres)
• Adaptive bandwidth: a fixed number of nearest properties (e.g. the 2,000 nearest), regardless of distance
• Kernel determines how weight decreases over distance
• Only the Gaussian kernel is implemented in ArcGIS
• Each GWR local regression captures the localized contributions of price predictors
Source: Borst (2007)
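The local regression idea above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the ArcGIS implementation: the function names, the synthetic coordinates and prices, and the choice to use the bandwidth as the scale of the Gaussian kernel (without truncating distant points) are all assumptions made for the example.

```python
import numpy as np

def gaussian_weights(distances, bandwidth):
    """Gaussian kernel: weight decays smoothly as distance grows."""
    return np.exp(-0.5 * (distances / bandwidth) ** 2)

def gwr_local_fit(coords, X, y, target, bandwidth=2000.0, adaptive_k=None):
    """Fit one weighted least-squares regression centred on `target`.

    Returns [intercept, slope_1, ...] for that location only.
    """
    d = np.linalg.norm(coords - target, axis=1)
    if adaptive_k is not None:
        # Adaptive bandwidth: use the distance to the k-th nearest property
        bandwidth = max(np.sort(d)[min(adaptive_k, len(d)) - 1], 1e-9)
    w = gaussian_weights(d, bandwidth)
    Xd = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)  # weighted least squares via row scaling
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic city (hypothetical data): the value of floor area rises from west to east
rng = np.random.default_rng(0)
n = 500
coords = rng.uniform(0, 10_000, size=(n, 2))          # metres
floor_area = rng.uniform(80, 300, size=n)             # square metres
true_slope = 0.002 + 0.002 * coords[:, 0] / 10_000
log_price = 12.0 + true_slope * floor_area + rng.normal(0, 0.05, n)

west = gwr_local_fit(coords, floor_area, log_price, np.array([1000.0, 5000.0]))
east = gwr_local_fit(coords, floor_area, log_price, np.array([9000.0, 5000.0]))
adaptive = gwr_local_fit(coords, floor_area, log_price,
                         np.array([5000.0, 5000.0]), adaptive_k=100)
print("floor-area coefficient, west vs east:", west[1], east[1])
```

Repeating the fit at every property location yields one coefficient vector per property, which is what allows a coefficient such as the value of floor area to be mapped.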
GWR reveals locational influences – value of floor area
Use GWR for market segmentation
• Select best independent variables for use in GWR (Exploratory Regression)
$\log(\mathit{price}) = f(\mathit{bathrooms},\ \mathit{fireplaces},\ \mathit{floor/basement\ area},\ \mathit{age},\ \mathit{month\ of\ sale})$
• Determine GWR parameters and run GWR (GWR with cross validation/AICc)
• With over 130,000 points, ArcGIS was the only software found able to run GWR on a machine with 16 GB of RAM
• GWR captures how the contribution of each variable to housing prices differs across the region. Using GWR to predict the price of an average house at every location across the city reveals the overall influence of location on property prices
Benchmark house
• A benchmark house (or market basket house) is a “typical” house whose characteristics are the median values of the property characteristics of all houses in the region
• For Toronto, a benchmark house has
• 2 full bathrooms,
• 1 fireplace,
• 173 square metres (1,862 square feet) of floor area,
• 48 square metres (516 square feet) of finished basement area,
• was built 18 years ago, and
• was sold in September 2013
(values based on properties sold in Toronto CMA between Nov 2010 and Oct 2015)
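As a rough sketch of how a benchmark house could be derived and priced, the example below takes the median of each characteristic over a handful of made-up sales records and then applies two sets of local coefficients to the same house. The records, field names, and coefficients are hypothetical, and month of sale is left out for brevity.

```python
from math import exp
from statistics import median

# Hypothetical sales records; the values are illustrative only
sales = [
    {"bathrooms": 1, "fireplaces": 0, "floor_area_m2": 120, "basement_area_m2": 0, "age_years": 60},
    {"bathrooms": 2, "fireplaces": 1, "floor_area_m2": 150, "basement_area_m2": 30, "age_years": 35},
    {"bathrooms": 2, "fireplaces": 1, "floor_area_m2": 173, "basement_area_m2": 48, "age_years": 18},
    {"bathrooms": 3, "fireplaces": 1, "floor_area_m2": 210, "basement_area_m2": 60, "age_years": 12},
    {"bathrooms": 4, "fireplaces": 2, "floor_area_m2": 260, "basement_area_m2": 90, "age_years": 5},
]

def benchmark_house(records):
    """Median of every characteristic across all records."""
    return {key: median(r[key] for r in records) for key in records[0]}

def predict_price(coefficients, house):
    """Log-linear model: log(price) = intercept + sum(b_k * x_k)."""
    log_price = coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in house.items()
    )
    return exp(log_price)

benchmark = benchmark_house(sales)

# Hypothetical local coefficients, standing in for GWR output at two locations
local_coefficients = {
    "location_a": {"intercept": 12.0, "bathrooms": 0.05, "fireplaces": 0.02,
                   "floor_area_m2": 0.0030, "basement_area_m2": 0.001, "age_years": -0.002},
    "location_b": {"intercept": 12.4, "bathrooms": 0.05, "fireplaces": 0.02,
                   "floor_area_m2": 0.0035, "basement_area_m2": 0.001, "age_years": -0.002},
}
predicted = {name: predict_price(c, benchmark) for name, c in local_coefficients.items()}
print(benchmark)
print(predicted)
```

Because the house is held constant, any difference between the predicted prices comes from the local coefficients, that is, from location alone.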
Benchmark house price prediction
Use GWR for market segmentation: clustering
• Group points based on the predicted benchmark house value (Grouping Analysis, which uses k-means clustering)
• k-means clustering partitions observations into k clusters where each observation belongs to the cluster with the nearest mean
• Animated example of k-means clustering, from David Kauchak (Pomona College) http://www.cs.pomona.edu/~dkauchak/classes/f13/cs451-f13/lectures/lecture31-kmeans.pptx
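For readers without access to the animation, here is a minimal NumPy sketch of the k-means loop (random initial centers, assign each point to the nearest center, readjust centers, repeat until nothing changes). It is not the ArcGIS Grouping Analysis tool; the function names and the synthetic benchmark values are assumptions, and a single random start like this one can settle on a poor local optimum.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center for every point."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(values, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with random initial centers."""
    rng = np.random.default_rng(seed)
    points = np.asarray(values, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    # Initialize centers at k randomly chosen points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(points, centers)
        # Readjust each center to the mean of its points (keep it if empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # no changes: done
        centers = new_centers
    return nearest_center(points, centers), centers

# Hypothetical predicted benchmark house values (dollars) at 30 locations
rng = np.random.default_rng(1)
benchmark_values = np.concatenate([
    rng.normal(400_000, 20_000, 10),
    rng.normal(700_000, 20_000, 10),
    rng.normal(1_200_000, 20_000, 10),
])
labels, centers = kmeans(benchmark_values, k=3)
print(labels)
print(centers.ravel())
```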
Use GWR for market segmentation: number of segments
• k values between 2 and 11 (i.e. 2 to 11 submarkets) are tested.
• For each k value:
• After k-means clustering groups the points into k groups, the boundaries between groups are re-aligned to census tract boundaries
• An ordinary least squares (OLS) model is estimated for each submarket (using the same model specification as the GWR model)
• The overall performance of all submarket models is summarized
• Performance is then compared between the segmentation scenarios to identify the optimal number of submarkets
• The 7-submarket scenario is selected as the optimum, balancing performance and model complexity
• When model performance is similar, the simpler model is always preferred
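The selection loop above can be sketched as follows. The slides do not state which performance metric or similarity threshold was used, so the pooled in-sample RMSE and the 2% tolerance below are assumptions. The segments are also produced by a simple quantile split of a made-up location score, as a stand-in for k-means plus census tract re-alignment, and the data are synthetic with three true regimes.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept; returns coefficients and residuals."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, y - Xd @ beta

def pooled_rmse(X, y, labels):
    """Fit one OLS model per segment and pool the squared residuals."""
    sq_err = 0.0
    for seg in np.unique(labels):
        mask = labels == seg
        _, resid = fit_ols(X[mask], y[mask])
        sq_err += np.sum(resid ** 2)
    return np.sqrt(sq_err / len(y))

def quantile_segments(score, k):
    """Stand-in segmentation: split a location score into k equal-count bins."""
    edges = np.quantile(score, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, score)

def pick_k(rmse_by_k, tolerance=0.02):
    """Smallest k whose RMSE is within `tolerance` of the best (assumed rule)."""
    best = min(rmse_by_k.values())
    return min(k for k, rmse in rmse_by_k.items() if rmse <= best * (1 + tolerance))

# Synthetic data with three true submarkets (hypothetical)
rng = np.random.default_rng(42)
n = 3000
location_score = rng.uniform(0, 1, n)
floor_area = rng.normal(0, 1, n)            # standardized floor area
regime = quantile_segments(location_score, 3)
intercepts = np.array([12.0, 12.5, 13.0])
slopes = np.array([0.5, 1.0, 1.5])
log_price = intercepts[regime] + slopes[regime] * floor_area + rng.normal(0, 0.3, n)

rmse_by_k = {
    k: pooled_rmse(floor_area, log_price, quantile_segments(location_score, k))
    for k in range(2, 12)                   # 2 to 11 submarkets
}
chosen_k = pick_k(rmse_by_k)
print(rmse_by_k)
print("chosen number of submarkets:", chosen_k)
```

In-sample error tends to keep shrinking as segments are added, which is why the rule prefers the smallest k among scenarios with similar performance rather than the lowest error outright.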
Comparing different numbers of submarkets
Market segmentation results for the Toronto CMA
Submarkets:
1. Mississauga/Oakville
2. Scarborough/Durham Region
3. Toronto (excluding Don Valley and Scarborough)
4. Don River Valley
5. South York Region
6. North York Region
7. Brampton
The submarkets have distinct market dynamics
Note: Submarket names are for demonstration only, and do not correspond to the actual administrative areas.
Comparing single and submarket models
• Model quality and performance are compared between:
• A single-market model covering the entire Toronto CMA, which includes all previously mentioned variables plus census tract dummy variables (to capture locational influences)
• 7 submarket models, one for each identified submarket, using the same model specification as the single-market model
• Model quality
• Submarket models greatly reduce spatial autocorrelation in the residuals (Spatial Autocorrelation – Global Moran’s I)
Comparing single and submarket models’ performance
Conclusion
• The GWR and k-means clustering based method is able to detect distinct housing submarkets
• Market segmentation improves model quality and prediction accuracy
References
• Borst, R. (2007). Discovering and Applying Location Influence Patterns in the Mass Valuation of Domestic Real Property. PhD thesis. University of Ulster.
• Fotheringham, et al. (2003). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley & Sons.
• Kauchak, D. (2013). Machine Learning and Big Data (Course material). http://www.cs.pomona.edu/~dkauchak/classes/f13/cs451-f13/lectures/lecture31-kmeans.pptx
• Radil, S. (2011). Spatializing social networks: making space for theory in spatial analysis. PhD thesis. University of Illinois at Urbana-Champaign.
• Yew, M. (2013). Homes in GTA see big price gain. Toronto Star. https://www.thestar.com/business/real_estate/2013/07/18/homes_in_gta_see_big_price_gain.html
Connecting R and ArcGIS
• Using ArcGIS in R
• arcgisbinding: An R library released by Esri to read/write/convert ArcGIS data formats
• reticulate: An R library to run Python/ArcPy code from within R
Demo: ArcGIS API for Python
Additional slides
k-means clustering: an example
Source: David Kauchak (Pomona College)
k-means clustering: initialize centers randomly
Source: David Kauchak (Pomona College)
k-means clustering: assign points to nearest center
Source: David Kauchak (Pomona College)
k-means: readjust centers
Source: David Kauchak (Pomona College)
K-means: assign points to nearest center
Source: David Kauchak (Pomona College)
K-means: readjust centers
Source: David Kauchak (Pomona College)
K-means: assign points to nearest center
Source: David Kauchak (Pomona College)
K-means: readjust centers
Source: David Kauchak (Pomona College)
K-means: assign points to nearest center
No changes: Done
Source: David Kauchak (Pomona College)
Spatial autocorrelation (Moran’s I)
• Moran’s I measures spatial autocorrelation, here in the residual errors. Values closer to 0 mean more spatial randomness, i.e. less spatial autocorrelation
• Example patterns: Moran’s I = 1 (clustered), Moran’s I = 0 (random), Moran’s I = -1 (dispersed)
Source: Steven Radil (2011)
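To make the statistic concrete, the sketch below computes global Moran's I, I = (n / S0) * (sum over i, j of w_ij z_i z_j) / (sum over i of z_i^2), where z are deviations from the mean and S0 is the sum of all weights. The 4 x 4 grid, the binary rook-contiguity weights, and the two test patterns are assumptions chosen for illustration; the ArcGIS tool offers other ways of defining spatial relationships.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary weights: cells sharing an edge on a regular grid are neighbours."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:                      # neighbour to the right
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < n_rows:                      # neighbour below
                w[i, i + n_cols] = w[i + n_cols, i] = 1
    return w

def morans_i(values, weights):
    """Global Moran's I for one variable and a spatial weights matrix."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    w = np.asarray(weights, dtype=float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)

# Dispersed pattern: checkerboard of 0s and 1s
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
# Clustered pattern: left half 1, right half 0
clustered = np.repeat([[1, 1, 0, 0]], 4, axis=0).ravel()

i_checker = morans_i(checkerboard, w)
i_clustered = morans_i(clustered, w)
print("checkerboard:", i_checker)
print("clustered:", i_clustered)
```

Applied to model residuals, a value near 0 suggests that the model has absorbed most of the locational signal, which is the sense in which the submarket models are said to reduce spatial autocorrelation.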