comparing estimation methods for spatial econometrics ml model ﬁtting comparison gmm model...

BackgroundML model fitting comparison

GMM model fitting comparisonImpacts

Comparing estimation methods for spatialeconometrics

Recent Advances in Spatial Econometrics (in honor of JamesLeSage), ERSA 2012

Roger Bivand Gianfranco Piras

NHH Norwegian School of Economics

Regional Research Institute at West Virginia University

Thursday, 23 August 2012

Roger Bivand, Gianfranco Piras Comparing estimation methods

Outline

Recent advances in spatial econometrics model fittingtechniques have made it more desirable to be able to compareresults

Results should correspond between implementations usingdifferent applications

A broad range of model fitting techniques are provided by thecontributed R packages for spatial econometrics

These model fitting techniques are associated with methodsfor estimating impacts and some tests, which will also bepresented and compared

Outline

Background

The use of spatial econometrics tools was widened by the easewith which methods and examples presented in Anselin (1988)could be reproduced using SpaceStatTM, written in GaussTM

It was rapidly complemented by the Spatial Econometricstoolbox for MatlabTM, provided as source code together withextensive documentation

A suite of commands for spatial data analysis for use withStataTM was provided by Maurizio Pisati, and macros forMinitabTM and SASTM were also made available

The thrust of SpaceStatTM has largely been taken over byGeoDa (Anselin et al. 2006), and more recently byOpenGeoDa

Background

Today’s software

There is now much more software available for spatialeconometrics

StataTM with sppack and MatlabTM with SpatialEconometrics Toolbox are mainstream programmes; theMatlabTM toolbox remains in the public domain, and has acommunity of contributors

OpenGeoDa and PySAL are open source, with code hosted onGoogle, binary versions for common platforms, and acommunity of users

R with spdep, sphet, McSpatial and other contributedpackages is open source, and the packages are cross-platform;the packages also have a community of users and developers

Today’s software

Why compare?

In the spirit of Rey (2009), this comparison will attempt toexamine some features of the implementation of functions forfitting spatial econometrics models

Firstly, it may be useful to show which kinds of functions forcreating spatial weights, for diagnostics, and for model fittingare available

Next, it is comforting when one can show that fitting thesame model on the same data using different implementationsgives the same results

Finally, if the results are not the same, it is helpful to be ableto show why they vary, possibly because of different designchoices in implementation

Why compare?

Data set: Ward and Gleditsch 2008

The data set used for comparison here is taken from a Sagevolume Spatial Regression Models by Ward and Gleditsch(2008), with political science data for 158 countries

The model they explore is the relationship between democracyscores (POLITY IV indicators) and the logarithm of countryGDP per capita in 2002

They treat countries as neighbours with non-zero spatialweights if their borders are closer than 200km from each other;data and weights are available for download from their site

They were among the first to examine the effects of feedbackin spatial regression models

Spatial weights

Creating spatial weights is a necessary step in using areal data,perhaps just to check that there is no remaining spatial patterningin residuals

The Matlab SE toolbox provides functions for creating nearestneighbour and triangulation contiguity weights, and for reading GALfiles

A number of functions are included in the R spdep package tocreate neighbours, and from them weights; GAL and GWT files maybe read and written

Weights may be constructed using functions in GeoDa, and in Pysalusing Python programming; GAL and GWT files may be read andwritten

The Stata spmat command provides for the creation of a number ofdifferent kinds of weights, and for file import and export

Spatial weights

The symptom — residual autocorrelationA map of the residuals of a least squares regression may help us to see whether

neighbouring observations appear to have residuals od similar value, indicating

positive autocorrelation, or possibly dissimilar values, indicating negative

autocorrelation:

OLS residuals

Spatial lag and Durbin models

The spatial lag model is not dissimilar to traditional time series models,with the autocorrelation process in the dependent variable controlled bythe exogenous matrix of spatial weights W (Ord 1975):

y = ρWy + Xβ + ε,

where y is an (N × 1) vector of observations on a dependent variabletaken at each of N locations, X is an (N × k) matrix of exogenousvariables, β is an (k × 1) vector of parameters, ε is an (N × 1) vector ofdisturbances and ρ is a scalar spatial error parameter (but called λ inStata)

The spatial Durbin model is the lag model augmented by spatially laggedright hand side variables:

y = ρWy + Xβ + WXθ + ε,

where θ is an ((k − 1)× 1) vector of parameters where W isrow-standardised, and a (k × 1) vector otherwise

Spatial lag and Durbin models

The spatial lag model is not dissimilar to traditional time series models,with the autocorrelation process in the dependent variable controlled bythe exogenous matrix of spatial weights W (Ord 1975):

y = ρWy + Xβ + ε,

where y is an (N × 1) vector of observations on a dependent variabletaken at each of N locations, X is an (N × k) matrix of exogenousvariables, β is an (k × 1) vector of parameters, ε is an (N × 1) vector ofdisturbances and ρ is a scalar spatial error parameter (but called λ inStata)

The spatial Durbin model is the lag model augmented by spatially laggedright hand side variables:

y = ρWy + Xβ + WXθ + ε,

where θ is an ((k − 1)× 1) vector of parameters where W isrow-standardised, and a (k × 1) vector otherwise

Spatial lag model log-likelihood function

`(β, ρ, σ2) = −N

2ln 2π − N

2lnσ2 + ln |I− ρW|

[y′(I− ρW)′(I− X(X′X)−1X′)(I− ρW)y

]and β = (X′X)−1(I− ρW)y, where ρ is the ML estimate.Unlike the time series case, the logarithm of the determinant of the(N × N) asymmetric matrix (I− ρW) does not tend to zero withincreasing sample size; it constrains the parameter values to theirfeasible range between the inverses of the smallest and largesteigenvalues of W, since for positive autocorrelation, as ρ → 1,ln |I− ρW| → −∞

ML spatial lag model results

The SE toolbox uses a pre-computed grid of log determinant values,

choosing the nearest rather than computing exactly at each call to the

log likelihood function; this accounts for slightly different coefficient

estimates compated to other implementations. There are two

implementations of the spatial lag in R in spdep and McSpatial

respectively:SE toolbox R spdep R McSpatial Stata spatreg OpenGeoDa

(Intercept) -6.2168 -6.2034 -6.2034 -6.2034 -6.2034 -6.2034log GDP pc 1.0014 0.9988 0.9988 0.9988 0.9988 0.9988ρ 0.5610 0.5632 0.5632 0.5632 0.5632 0.5632SE ρ 0.0760 0.0758 0.0703 0.0717 0.0717 0.0758

σ2 27.1795 27.1605 27.1605 27.1605 27.1605 27.1605LL -436.3166 -491.0994 -436.3408 -491.0994 -491.0994 -491.0994

ML spatial lag model results

The SE toolbox uses a pre-computed grid of log determinant values,

choosing the nearest rather than computing exactly at each call to the

log likelihood function; this accounts for slightly different coefficient

estimates compated to other implementations. There are two

implementations of the spatial lag in R in spdep and McSpatial

respectively:SE toolbox R spdep R McSpatial Stata spatreg OpenGeoDa

(Intercept) -6.2168 -6.2034 -6.2034 -6.2034 -6.2034 -6.2034log GDP pc 1.0014 0.9988 0.9988 0.9988 0.9988 0.9988ρ 0.5610 0.5632 0.5632 0.5632 0.5632 0.5632SE ρ 0.0760 0.0758 0.0703 0.0717 0.0717 0.0758

σ2 27.1795 27.1605 27.1605 27.1605 27.1605 27.1605LL -436.3166 -491.0994 -436.3408 -491.0994 -491.0994 -491.0994

ML spatial lag model differences I

There are two major discrepancies in the table of results: the first is that

the log-likelihood values at the optimimum differ between R McSpatial

and the SE toolbox and the rest.

SE toolbox R spdep R McSpatial Stata spatreg OpenGeoDa(Intercept) -6.2168 -6.2034 -6.2034 -6.2034 -6.2034 -6.2034log GDP pc 1.0014 0.9988 0.9988 0.9988 0.9988 0.9988ρ 0.5610 0.5632 0.5632 0.5632 0.5632 0.5632SE ρ 0.0760 0.0758 0.0703 0.0717 0.0717 0.0758

σ2 27.1795 27.1605 27.1605 27.1605 27.1605 27.1605LL -436.3166 -491.0994 -436.3408 -491.0994 -491.0994 -491.0994

The reason appears to be that π in the log likelihood calculation is not

multiplied by 2 in these cases, but is in the remainder. If we convert the

R McSpatial value of −436.3408 by subtracting n2 log(π) (line 65 in file

McSpatial/R/sarml.R), and adding n2 log(2π), we get −491.0752.

Similarly, correcting the SE toolbox value of −436.3166, we get

−491.0752 (line 453 file spatial/sar models/sar.m). The same kind of

difference appears in other reported SE toolbox log likelihood valuesRoger Bivand, Gianfranco Piras Comparing estimation methods

ML spatial lag model differences II

The other discrepancy is in the coefficient standard errors, given here in

full:SE toolbox R spdep R McSpatial Stata spatreg OpenGeoDa

SE (Intercept) 2.0831 2.0823 2.0579 2.0597 2.0598 2.0823SE log GDP pc 0.2784 0.2783 0.2729 0.2734 0.2734 0.2783SE ρ 0.0760 0.0758 0.0703 0.0717 0.0717 0.0758

We see that R spdep and OpenGeoDa agree, with the SE toolbox

function very close — the code in spatial/sar models/sar.m following line

202 indicates that for N ≤ 500, asymptotic calculations will be used

based on Anselin, (1980, 1988). The same source is used in R spdep and

OpenGeoDa (probably after line 861, Regression/smile2.cpp), but what

about the others?

ML spatial lag model differences II

The other discrepancy is in the coefficient standard errors, given here in

full:SE toolbox R spdep R McSpatial Stata spatreg OpenGeoDa

We see that R spdep and OpenGeoDa agree, with the SE toolbox

function very close — the code in spatial/sar models/sar.m following line

202 indicates that for N ≤ 500, asymptotic calculations will be used

based on Anselin, (1980, 1988). The same source is used in R spdep and

OpenGeoDa (probably after line 861, Regression/smile2.cpp), but what

about the others?

ML spatial lag model differences II — more

The standard errors reported by R McSpatial are taken from the Hessianreturned by the optimization function nlm. The R spdep function lagsarlmcan return a similar numerical Hessian, computed by as a finite differenceHessian by fdHess in nlme, or as the Hessian output by numericaloptimization function optim, using by default a quasi-Newton method due toBroyden, Fletcher, Goldfarb and Shanno (BFGS).

Stata spreg ml can also use a bfgs technique, but the default is a modified

Newton-Raphson method nr; spatreg uses optimization method lf for easy

fitting of maximum likelihood models. Since the R and Stata BFGS standard

errors agree, it is confirmed that Stata is reporting standard errors taken from

the Hessian returned by the optimization function, rather than the analytical

calculations even for small n.R nlm R fdHess R BFGS Stata BFGS Stata NR Stata lf

Maximum likelihood fitting differences

The differences identified in the spatial lag case follow throughfor the other model specifications examined here.

The Matlab SE toolbox uses a grid rather than a linesearch/optimization to fit the spatial coefficient(s), so theyusually agree only for the first few digits.

The Matlab SE toolbox also reports a log likelihood valueusing π rather than 2π.

Stata (both spreg ml and spatreg) reports coefficient standarderrors taken from the coefficient covariance matrix (Hessian)used in optimization, rather than analytical values reported forsmall n by R spdep functions, Matlab SE toolbox functions,and OpenGeoDa.

ML spatial Durbin model results

Only R and the Matlab SE toolbox provide the spatial Durbin model

directly, Stata and OpenGeoDa can fit it after creating the lagged

right-hand side variable(s) by hand. Here, all applications agree except

SE toolbox, again because of the gridded log determinant value causing

the numerical optimization to exit before finding the exact optimum. This

is justified because the SE toolbox is intended to provide for Bayesian

methods, and in that context the distribution of values is more important

than point estimates:

SE toolbox R spdep Stata spatreg OpenGeoDa(Intercept) -5.4303 -5.4143 -5.4143 -5.4143 -5.4143log GDP pc 1.1639 1.1641 1.1641 1.1641 1.1641lag log GDP pc -0.2726 -0.2755 -0.2755 -0.2755 -0.2755ρ 0.5720 0.5734 0.5734 0.5734 0.5734SE ρ 0.0775 0.0773 0.0741 0.0741 0.0773σ2 27.0420 27.0296 27.0296 27.0296 27.0296LL -436.1960 -490.9801 -490.9801 -490.9801 -490.9801

Spatial error model

There are a number of alternative forms of spatial regression models; herewe will also consider the spatial error model (also known as thesimultaneous autoregressive (SAR) model); the model may be written as(Ord 1975):

y = Xβ + u, u = λWu + ε,

where y is an (N × 1) vector of observations on a dependent variabletaken at each of N locations, X is an (N × k) matrix of exogenousvariables, β is an (k × 1) vector of parameters, ε is an (N × 1) vector ofdisturbances and λ is a scalar spatial error parameter (except by Kelejianand Prucha, who term it ρ),

and u is a spatially autocorrelated disturbance vector with constantvariance and covariance terms specified by a fixed spatial weights matrixand a single coefficient λ:

u ∼ N(0, σ2(I− λW)−1(I− λW′)−1)

Spatial error model

There are a number of alternative forms of spatial regression models; herewe will also consider the spatial error model (also known as thesimultaneous autoregressive (SAR) model); the model may be written as(Ord 1975):

y = Xβ + u, u = λWu + ε,

where y is an (N × 1) vector of observations on a dependent variabletaken at each of N locations, X is an (N × k) matrix of exogenousvariables, β is an (k × 1) vector of parameters, ε is an (N × 1) vector ofdisturbances and λ is a scalar spatial error parameter (except by Kelejianand Prucha, who term it ρ),

and u is a spatially autocorrelated disturbance vector with constantvariance and covariance terms specified by a fixed spatial weights matrixand a single coefficient λ:

u ∼ N(0, σ2(I− λW)−1(I− λW′)−1)

Spatial error model log-likelihood function

The log-likelihood function for the spatial error model:

`(β, λ, σ2) = −n

2ln(2π)− n

2ln(σ2) + ln(|I− λW|)

[(y − Xβ)′(I− λW)′(I− λW)(y − Xβ)

As we can see, the problem is one of balancing the logdeterminant term ln(|I− λW|) against the sum of squaresterm. When λ approaches the ends of its feasible range, thelog determinant term may swamp the sum of squares term

Spatial error model log-likelihood function

The log-likelihood function for the spatial error model:

`(β, λ, σ2) = −n

2ln(2π)− n

2ln(σ2) + ln(|I− λW|)

[(y − Xβ)′(I− λW)′(I− λW)(y − Xβ)

]As we can see, the problem is one of balancing the logdeterminant term ln(|I− λW|) against the sum of squaresterm. When λ approaches the ends of its feasible range, thelog determinant term may swamp the sum of squares term

ML spatial error model results

Once again, the line search with exact calculation of the log determinant

for R, Stata and OpenGeoDa agrees fully. There are minor differences in

the standard errors between R and OpenGeoDa on the one hand and

Stata on the other, because of the use of analytical standard errors for

small n in OpenGeoDa and R, with Stata using a numerical Hessian. The

SE toolbox estimates differ somewhat, with λ truncated to its gridded

value:

SE toolbox R spdep Stata spatreg OpenGeoDa(Intercept) -7.4860 -7.4865 -7.4865 -7.4865 -7.4865log GDP pc 1.3870 1.3871 1.3871 1.3871 1.3871λ 0.5820 0.5819 0.5819 0.5819 0.5819SE λ 0.0765 0.0765 0.0737 0.0737 0.0765σ2 27.1388 27.1399 27.1399 27.1399 27.1399LL -436.7681 -491.5267 -491.5267 -491.5267 -491.5267

ML general spatial model results

The general model includes two spatial processes and faces identification issues:

y = ρWy + Xβ + u, u = λWu + ε,

It is however used in some analyses, often in GM estimators, so the ML version

is useful for comparison, but does need care in selecting starting values for

numerical optimization; here R spdep and Stata spreg ml agree in all but

coefficient standard errors (spdep and SE toolbox use analytical calculations,

spreg uses the optimization Hessian):SE toolbox R spdep Stata

(Intercept) -3.9367 -4.0211 -4.0210log GDP pc 0.6161 0.6301 0.6301ρ 0.7810 0.7735 0.7735SE ρ 0.0742 0.0771 0.0804λ -0.4650 -0.4479 -0.4479SE λ 0.1846 0.1880 0.1937σ2 23.1479 23.3251 23.3251LL -434.7130 -489.5175 -489.5175

ML general spatial model results

The general model includes two spatial processes and faces identification issues:

y = ρWy + Xβ + u, u = λWu + ε,

It is however used in some analyses, often in GM estimators, so the ML version

is useful for comparison, but does need care in selecting starting values for

numerical optimization; here R spdep and Stata spreg ml agree in all but

coefficient standard errors (spdep and SE toolbox use analytical calculations,

spreg uses the optimization Hessian):SE toolbox R spdep Stata

(Intercept) -3.9367 -4.0211 -4.0210log GDP pc 0.6161 0.6301 0.6301ρ 0.7810 0.7735 0.7735SE ρ 0.0742 0.0771 0.0804λ -0.4650 -0.4479 -0.4479SE λ 0.1846 0.1880 0.1937σ2 23.1479 23.3251 23.3251LL -434.7130 -489.5175 -489.5175

ML general model standard error differences

The R spdep function sacsarlm can return a similar numerical Hessian,computed by as a finite difference Hessian by fdHess in nlme.

Stata spreg ml can use a bfgs technique, but the default is a modified

Newton-Raphson method nr. Since the R finite difference values and Stata

BFGS standard errors agree closely, it appears that Stata is reporting standard

errors taken from the Hessian returned by the optimization function, rather

than the analytical calculations even for small n.R asymptotic R fdHess Stata BFGS Stata NR

SE (Intercept) 1.5972 1.6659 1.6660 1.6651SE log GDP pc 0.2246 0.2350 0.2350 0.2348SE ρ 0.0771 0.0805 0.0805 0.0804SE λ 0.1880 0.1939 0.1939 0.1937

ML general model standard error differences

The R spdep function sacsarlm can return a similar numerical Hessian,computed by as a finite difference Hessian by fdHess in nlme.

Stata spreg ml can use a bfgs technique, but the default is a modified

Newton-Raphson method nr. Since the R finite difference values and Stata

BFGS standard errors agree closely, it appears that Stata is reporting standard

errors taken from the Hessian returned by the optimization function, rather

than the analytical calculations even for small n.R asymptotic R fdHess Stata BFGS Stata NR

SE (Intercept) 1.5972 1.6659 1.6660 1.6651SE log GDP pc 0.2246 0.2350 0.2350 0.2348SE ρ 0.0771 0.0805 0.0805 0.0804SE λ 0.1880 0.1939 0.1939 0.1937

GMM model fitting comparison

The recent introduction of StataTM and GeoDaSpacefunctions makes it helpful to compare them with SE toolboxand R functions

Within R, some functions have been contributed to spdep byLuc Anselin, and modified by the authors, and others are insphet, which now uses the function wrapper spreg

The functions use different parts of the literature as bases forimplementation, and the consequences of these choices will bemade clear here

Once again, we examine spatial lag, spatial error, and generalspatial models

GMM spatial lag models

Using two stage least squares with Wy instrumented by [WX,WWX], all

the functions yield the same coefficient estimates. In the two R and SE

toolbox functions, the error variance is calculated as σ2 = e′en−k , while in

the other two implementations is simply calculated as e′en :

SE toolbox R spdep R sphet spreg Stata GeoDaSpace(Intercept) -5.7466 -5.7466 -5.7466 -5.7466 -5.7466

(2.4576) (2.4576) (2.4576) (2.4341) (2.4341)log GDP pc 0.9097 0.9097 0.9097 0.9097 0.9097

(0.3783) (0.3783) (0.3783) (0.3747) (0.3747)ρ 0.6370 0.6370 0.6370 0.6370 0.6370

(0.2283) (0.2283) (0.2283) (0.2261) (0.2261)

GMM spatial lag models

Using two stage least squares with Wy instrumented by [WX,WWX], all

the functions yield the same coefficient estimates. In the two R and SE

toolbox functions, the error variance is calculated as σ2 = e′en−k , while in

the other two implementations is simply calculated as e′en :

SE toolbox R spdep R sphet spreg Stata GeoDaSpace(Intercept) -5.7466 -5.7466 -5.7466 -5.7466 -5.7466

(2.4576) (2.4576) (2.4576) (2.4341) (2.4341)log GDP pc 0.9097 0.9097 0.9097 0.9097 0.9097

(0.3783) (0.3783) (0.3783) (0.3747) (0.3747)ρ 0.6370 0.6370 0.6370 0.6370 0.6370

(0.2283) (0.2283) (0.2283) (0.2261) (0.2261)

The heteroskedastic error case I

White standard errors may be calculated in most of the functions directly,

in which the asymptotic VC matrix can be estimated consistently by the

sandwich form: (Z ′Z )−1(Z ′ΣZ )(Z ′Z )−1, where Σ is a diagonal matrix

whose elements are the e2i ; the results are the same in all cases:

R spdep R sphet Stata GeoDaSpace(Intercept) 2.4417 2.4417 2.4417 2.4417log GDP pc 0.3829 0.3829 0.3829 0.3829ρ 0.2239 0.2239 0.2239 0.2239

The heteroskedastic error case II

GeoDaSpace and sphet also implement the Kelejian and Prucha (2007)

HAC estimator of the variance covariance matrix. Here we compare

standard error estimates using a Triangular kernel with a variable

bandwidth of the six nearest neighbours. The available options for the

kernel function in R are the Epanechnikov, Triangular, Bisquare, Parzen,

Tukey-Hanning and Quadratic Spectral. The options available in GeoDa

space are the Uniform, Triangular, Epanechnikov, Quartic and Gaussian.

GeoDa space only allows for the implementation of adaptive kernel.

R sphet spreg GeoDaSpace(Intercept) 2.7512 2.7512log GDP pc 0.4445 0.4445ρ 0.2314 0.2314

GMM spatial error models

In the GMM spatial error model, we depend on the first stage residuals,

the implementation of the moment coment conditions, and the tuning of

the optimiser finding the spatial parameter λ, as well as defintions for

finding the standard error of λ. We see that there are three cases, the

first for the SE toolbox, spdep, and GeoDaSpace-2, using the Kelejian

and Prucha (1999) moment conditions (OLS first stage), and the

standard error of λ from Pruch (2004):

SE toolbox R spdep R sphet Stata GeoDaSpace-1 GeoDaSpace-2(Intercept) -7.5798 -7.5798 -7.5858 -8.8787 -7.5664 -7.5798

(3.0403) (3.0403) (3.0319) (3.2603) (3.0788) (3.0403)log GDP pc 1.3993 1.3993 1.4001 1.5688 1.3976 1.3993

(0.3767) (0.3767) (0.3758) (0.4072) (0.3796) (0.3767)λ 0.5621 0.5621 0.5604 0.5586 0.5850 0.5621

(0.1820) (0.1820) (0.0661) (0.0668) (0.0660)

The next case is for the sphet and GeoDaSpace-1 implementations, using

Drukker et al., which use a TSLS first stage. The results are close, and

differences seem to come from numerical optimization. The final case is

Stata, which appears to use a restricted instrument set — work is

continuing to establish why this is chosen:SE toolbox R spdep R sphet Stata GeoDaSpace-1 GeoDaSpace-2

(Intercept) -7.5798 -7.5798 -7.5858 -8.8787 -7.5664 -7.5798(3.0403) (3.0403) (3.0319) (3.2603) (3.0788) (3.0403)

log GDP pc 1.3993 1.3993 1.4001 1.5688 1.3976 1.3993(0.3767) (0.3767) (0.3758) (0.4072) (0.3796) (0.3767)

λ 0.5621 0.5621 0.5604 0.5586 0.5850 0.5621(0.1820) (0.1820) (0.0661) (0.0668) (0.0660)