JOURNAL OF HOUSING ECONOMICS 7, 304–327 (1998)
ARTICLE NO. HE980236

Spatial Autocorrelation: A Primer

Robin A. Dubin

Case Western Reserve University

Received September 17, 1998

Regression error terms are likely to be spatially autocorrelated in any situation in which location matters. While both the precision of the estimates and the reliability of hypothesis testing can be improved by making a correction for spatial autocorrelation, the techniques for making such a correction are not widely understood. The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. One of the two most common methods of handling spatial autocorrelation is the weight matrix approach, in which the process generating the errors is modeled. The resulting correlation structure is then derived from the hypothesized process. The second method models the correlation structure itself, rather than the underlying process. The bulk of this paper is concerned with comparing these two methods and their resulting correlation structures. Other issues are discussed at the end of the paper. © 1998 Academic Press

1. INTRODUCTION

While autocorrelation in a time series context is well understood, and researchers routinely test and correct for this problem, the same cannot be said of autocorrelation in a cross-sectional context. The standard rule of thumb is that autocorrelation is a problem in time series data and heteroscedasticity is a problem with cross-sectional data. However, there are many instances in which an entity's location affects its behavior. Housing prices are a prime example: clearly the location of the house will have an effect on its selling price. If the location of the house influences its price, then the possibility arises that nearby houses will be affected by the same location factors. Any error in measuring these factors will cause their error terms to be correlated. Spatial autocorrelation is likely to be present in any situation in which location matters.

Although spatial autocorrelation can occur in many contexts, in this paper I will focus on housing prices. In the case of housing prices, the location factors are called neighborhood effects. There are at least two reasons to suspect that neighborhood effects are measured with errors. First, neighborhood is unobservable. This means that researchers wishing to account for neighborhood must use proxies. Crime rates and socioeconomic characteristics of residents are examples of variables which are commonly used. Second, to make the use of proxies operational, a set of geographic boundaries must be assumed. Typically, the researcher uses the same set of boundaries as the data collector: census tracts are generally the boundaries when socioeconomic data are used, and crime reporting areas are commonly used when crime rates are needed. Of course, the geographic boundaries that should be used are the (unknown) neighborhood boundaries. To the extent that neighborhood boundaries differ from the data gathering boundaries, the proxies themselves will contain error. These two problems, unobservability and boundaries, make it virtually certain that neighborhood variables will be measured with error, with the result that the regression error terms will be autocorrelated.

The consequences of spatial autocorrelation are the same as those of time series autocorrelation: the OLS estimators are unbiased but inefficient, and the estimates of the variance of the estimators are biased. Thus the precision of the estimates as well as the reliability of hypothesis testing can be improved by making a correction for autocorrelation. Once the structure of the autocorrelation has been estimated, this information can be incorporated into any predictions, thereby improving their accuracy.¹

Just as with time series autocorrelation, maximum likelihood (ML) techniques are commonly used to estimate the autocorrelation parameters and the regression coefficients.² Despite the similarities, spatial autocorrelation is conceptually more difficult to model than time series autocorrelation, because of the ordering issue. In a time series context, the researcher typically assumes that earlier observations can influence later ones, but not the reverse. In the spatial context, an ordering assumption such as this is not possible: if A affects B, it is likely that the reverse is also true. Also, the direction of influence is not limited to one dimension as in time series, but can occur in any direction (although we generally restrict the problem, at least in the case of housing, to two dimensions).

The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. I use hedonic regression as the example problem, although the techniques discussed here are applicable to a wide variety of problems. I discuss the basic issues involved in modeling the autocorrelation structure and compare and contrast the most commonly used techniques. My purpose in doing so is to promote a better understanding of these techniques, which I hope will encourage their use.

¹ This technique is known as kriging in the geostatistics literature and best linear unbiased prediction (BLUP) in the econometrics literature. Dubin (1992) and Basu and Thibodeau (1998) use this technique to predict housing prices. Also, Dubin (1998) and Dubin et al. (1998) discuss the issues involved in kriging.

² Although other techniques are used in the literature for estimating models with spatially autocorrelated error terms, ML will be the only technique discussed here.

2. MODELS

There are two commonly used methods of modeling the autocorrelation structure. The first is to model the process itself. This approach is based on the work of geographers (Cliff and Ord, 1981) and requires the use of a weight matrix. This approach is probably the more common of the two in the real estate literature (see Can (1992) and Pace and Gilley (1998) for examples). The second approach is to model the covariance matrix of the error terms directly. This approach is based on the work of geologists (Matheron, 1963) and has also been used in the real estate literature (see Dubin (1988) and Basu and Thibodeau (1998) for examples).

2.a. First Approach: Weight Matrix

In this approach, the process generating the error terms is modeled explicitly. The model is

$$Y = X\beta + u \tag{1.a}$$

$$u = \rho W u + \varepsilon. \tag{1.b}$$

In a hedonic regression, Y is an ($N \times 1$) vector containing the selling prices of the houses, X is an ($N \times K$) matrix of the characteristics of the houses, u is an ($N \times 1$) vector of the correlated error terms, and $\beta$ is a ($K \times 1$) vector of unknown regression coefficients. The process generating the correlations is shown in Eq. (1.b). Here, $\varepsilon$ is an ($N \times 1$) vector of normally distributed and independent error terms (with mean zero and variance $\sigma^2$) and $\rho$ is an unknown autocorrelation parameter (note that $\rho$ is a scalar). W is the weight matrix, which represents the spatial structure of the data. By far, the most common practice is to treat W as nonstochastic; that is, the researcher takes W as known a priori, and therefore, all results are conditional upon the specification of W (see Pace et al. (1998) for an exception). Note the similarity of this model to the time series AR(1) model. Also, just as in time series, the model can be expanded by using various spatial lags (see Anselin (1988, pp. 22–24) for details).
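To make the data-generating process in Eqs. (1.a) and (1.b) concrete, the following is a minimal sketch of drawing one sample from it, taking W as given. The weight matrix, the regressors, the coefficient values, and the function name `simulate_spatial_errors` are all hypothetical choices made for illustration; they do not come from the paper.

```python
import numpy as np

def simulate_spatial_errors(W, rho, sigma, rng):
    """Draw u satisfying u = rho * W u + eps, i.e. u = (I - rho W)^{-1} eps."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)   # independent N(0, sigma^2) shocks
    A = np.eye(n) - rho * W
    u = np.linalg.solve(A, eps)            # solve the linear system instead of inverting A
    return u, eps

# Hypothetical weight matrix for 4 observations: zero diagonal, rows sum to one.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
rho, sigma = 0.67, 1.0
rng = np.random.default_rng(0)
u, eps = simulate_spatial_errors(W, rho, sigma, rng)

# Hypothetical regressors (intercept plus one characteristic) and coefficients for Eq. (1.a).
X = np.column_stack([np.ones(4), rng.uniform(1.0, 3.0, size=4)])
beta = np.array([10.0, 2.0])
Y = X @ beta + u
```

The simulated u can be checked against Eq. (1.b) directly: `u - rho * W @ u` should return the independent shocks `eps`.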

In view of its centrality in this approach, a digression on W is in order. W is an $N \times N$ matrix with zeros on its main diagonal. The off-diagonal elements, $W_{ij}$, represent the spatial relationship between observations i and j. A common method of forming W is to use nearest neighbors. Under this scheme, $W_{ij} = 1$ if i and j are such that there is no observation closer to either i or j, and zero otherwise. This scheme can easily be extended to n nearest neighbors. Another popular approach is to set $W_{ij} = 1$ if i and j are separated by a distance less than some prespecified limit. Rather than making the elements of W binary, another approach is to set $W_{ij} = 1/D_{ij}^{P}$, where D is an $N \times N$ matrix showing the distances separating the observations, and P is a constant. All of these approaches have been used in the real estate literature; there does not appear to be any consensus regarding which scheme represents the best realization of the correlation structure appearing in the housing market. This is problematic because all of the results are conditional on the researcher's a priori specification of the spatial structure.
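The three weighting schemes just described can be sketched in a few lines. This is an illustrative implementation, not code from the paper: the coordinates and function names are hypothetical, and the nearest neighbor rule follows the convention visible in Table II (row i receives a one in the column of the observation closest to i, so W need not be symmetric).

```python
import numpy as np

def distance_matrix(coords):
    """Euclidean distances between all pairs of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest_neighbor_weights(D, k=1):
    """W_ij = 1 if j is one of the k observations closest to i."""
    n = D.shape[0]
    Dm = D.copy()
    np.fill_diagonal(Dm, np.inf)               # an observation is not its own neighbor
    nearest = np.argsort(Dm, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nearest[i]] = 1.0
    return W

def distance_band_weights(D, limit):
    """W_ij = 1 if D_ij < limit."""
    W = (D < limit).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def inverse_distance_weights(D, power):
    """W_ij = 1 / D_ij**power off the diagonal, zero on it."""
    W = np.zeros_like(D)
    off = ~np.eye(D.shape[0], dtype=bool)
    W[off] = 1.0 / D[off] ** power
    return W

# Hypothetical locations of four observations.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
D = distance_matrix(coords)
W_nn = nearest_neighbor_weights(D, k=1)
W_band = distance_band_weights(D, limit=2.5)
W_inv = inverse_distance_weights(D, power=2)
```

In this small example the third point's nearest neighbor is the second point, but not the reverse, so `W_nn` is asymmetric, while the distance-band and inverse-distance matrices are symmetric by construction. The last point has no neighbor within the band, so its row of `W_band` is all zeros.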

Solving (1.b) for u gives

$$u = (I - \rho W)^{-1}\varepsilon \tag{2}$$

and thus

$$V = E[uu'] = \sigma^2 (I - \rho W)^{-1}(I - \rho W')^{-1} \tag{3}$$

where V is the variance/covariance matrix of u. Note that V typically will not have a constant on the main diagonal. Thus, in this type of model, u is heteroskedastic, even though $\varepsilon$ is not.
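A small worked case may help in reading Eq. (3). Suppose, as an illustrative assumption, that there are only two observations and each is the other's sole neighbor. Then the implied correlation can be written in closed form:

```latex
W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
(I - \rho W)^{-1} = \frac{1}{1 - \rho^{2}}
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
V = \frac{\sigma^{2}}{(1 - \rho^{2})^{2}}
\begin{pmatrix} 1 + \rho^{2} & 2\rho \\ 2\rho & 1 + \rho^{2} \end{pmatrix},
\qquad
\mathrm{Corr}_{12} = \frac{V_{12}}{\sqrt{V_{11} V_{22}}} = \frac{2\rho}{1 + \rho^{2}} .
```

For $\rho = 0.67$ this gives approximately 0.925, and for $\rho = 0.33$ approximately 0.595. The first value agrees with the 0.92 reported in Table II for observations 1 and 7, which are each other's nearest neighbors and are not the nearest neighbor of any other point. In this symmetric two-point case the diagonal of V happens to be constant; the heteroskedasticity noted above appears once observations differ in how many others are linked to them.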

The fact that V involves the product of two inverted matrices makes it difficult to visualize. In what follows, I show the correlations implied by the various choices of W, given a set of locations. Because housing data are not typically located on a regular grid, I use 10 observations, randomly located in a $10 \times 10$ square. These locations are shown in Fig. 1. Once the locations are known, the distance matrix, D, can be calculated; all of the weight matrices discussed here are based on D (see Table I). Once W is calculated, the population variance/covariance matrix is given by (3). In the illustration, I generate the correlations³ implied by the choice of W for each of the commonly used methods of specifying it: nearest neighbors, $W_{ij} = 1$ if $D_{ij} < L$, and $W_{ij} = 1/D_{ij}^{P}$.

In addition to choosing the spatial weighting scheme, the researcher must also choose a parameter pertaining to it. For example, if the researcher chooses nearest neighbors, he must also decide the number of neighbors to use. For $W_{ij} = 1$ if $D_{ij} < L$, the researcher must decide the distance limit (L). And for the inverse distance weight matrix, the researcher must decide the power to which the denominator is raised (P). These choices (the form of the weight matrix and the value of the parameter) are made a priori by the researcher; the resulting weight matrix is taken as given. As the illustration below shows, these choices change the nature of the implied correlations considerably.

³ The correlations are derived from (3) as follows: $\mathrm{Corr}_{ij} = V_{ij}/\sqrt{V_{ii}V_{jj}}$.

FIG. 1. Locations.

A useful tool for representing spatial dependencies is the correlogram. The correlogram shows the correlations between points, graphed as a function of the distance separating them. Although not necessary, a nice property for correlograms to exhibit is that the correlations decline as separation distance increases. This is in accordance with Tobler's (1970) first law of geography: "everything is related to everything else, but near things are more related than distant things." In the illustration, I show the correlograms for each of the spatial weighting schemes for different values of the parameters (which are normally chosen by the researcher) and for different values of $\rho$ (the autocorrelation parameter, which is normally estimated). I use three values of the parameters and two values of $\rho$, which gives six correlograms for each weighting scheme. These are shown in Figs. 2 through 4. Note that these correlograms are not based on simulated data, but are the population correlograms, given the locations and the choice of W.⁴ I also present one weight matrix and one correlation matrix for each scheme; these are shown in Tables II through IV. Finally, the distance matrix for the data is presented in Table I.

TABLE I
Distance Matrix

       1     2     3     4     5     6     7     8     9    10
 1  0.00  2.80  5.64  3.45  3.43  3.76  1.53  3.14  5.67  4.42
 2  2.80  0.00  7.70  6.11  2.06  6.01  4.32  0.55  8.45  2.32
 3  5.64  7.70  0.00  5.82  9.00  7.26  4.61  7.71  4.14  7.82
 4  3.45  6.11  5.82  0.00  5.93  1.52  2.33  6.52  3.33  7.86
 5  3.43  2.06  9.00  5.93  0.00  5.31  4.84  2.55  8.84  4.26
 6  3.76  6.01  7.26  1.52  5.31  0.00  3.20  6.50  4.76  8.05
 7  1.53  4.32  4.61  2.33  4.84  3.20  0.00  4.62  4.14  5.72
 8  3.14  0.55  7.71  6.52  2.55  6.50  4.62  0.00  8.72  1.77
 9  5.67  8.45  4.14  3.33  8.84  4.76  4.14  8.72  0.00  9.61
10  4.42  2.32  7.82  7.86  4.26  8.05  5.72  1.77  9.61  0.00
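A population correlogram of the kind described here is just the set of off-diagonal correlations paired with the corresponding separation distances. The sketch below extracts those pairs from any distance matrix and correlation matrix; the function name and the tiny three-point inputs are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def correlogram_points(D, corr):
    """Return (distance, correlation) for every distinct pair, sorted by distance."""
    i, j = np.triu_indices_from(D, k=1)     # each unordered pair once, diagonal excluded
    order = np.argsort(D[i, j])
    return D[i, j][order], corr[i, j][order]

# Hypothetical 3-point example.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
corr = np.array([[1.0, 0.8, 0.1],
                 [0.8, 1.0, 0.4],
                 [0.1, 0.4, 1.0]])
dist, rho_pairs = correlogram_points(D, corr)
```

Plotting `rho_pairs` against `dist` gives the correlogram; with 10 observations there are 45 such points.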

2.a.1. Nearest neighbors. Figure 2.a. shows the correlograms for three choices of the number of nearest neighbors (1, 2, or 3) when the spatial dependencies are strong ($\rho = 0.67$). Note that because the weight matrices are row standardized,⁵ the range of $\rho$ is $-1$ to 1.
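The link between row standardization and the admissible range of $\rho$ can be checked numerically. The sketch below is illustrative only: the binary matrix and the helper name `row_standardize` are hypothetical. It relies on the fact that a nonnegative matrix whose rows sum to one has spectral radius one, so that $I - \rho W$ is invertible whenever $|\rho| < 1$.

```python
import numpy as np

def row_standardize(W):
    """Divide each row by its sum; rows with no neighbors are left as zeros."""
    sums = W.sum(axis=1, keepdims=True)
    out = np.zeros_like(W, dtype=float)
    np.divide(W, sums, out=out, where=sums > 0)
    return out

# Hypothetical binary weight matrix for four observations.
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Ws = row_standardize(W)

# Largest eigenvalue modulus of the row-standardized matrix.
spectral_radius = np.max(np.abs(np.linalg.eigvals(Ws)))

# I - rho * Ws remains nonsingular for values of rho close to -1 and 1.
dets = [np.linalg.det(np.eye(4) - rho * Ws) for rho in (-0.99, 0.99)]
```

Without row standardization the bound on $\rho$ depends on the eigenvalues of the particular W, which is one practical reason the standardized form is convenient.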

Two observations can be drawn from an examination of this figure. First, while the correlations implied by this choice of W tend to fall with separation distance, the correlations do not fall monotonically. For example, in Fig. 2.a.1, there are zeros interspersed with positive correlations. This means that points separated by the same distances can have very different correlations. This occurs for two reasons: (a) the definition of W itself and (b) the formulation of the variance/covariance matrix as the product of two inverted matrices. The definition comes into play because $W_{ij} = 1$ only for nearest neighbors. Consider a case where points A and B are 0.5 units apart and points A and C are 0.6 units apart. For one nearest neighbor, only A and B are neighbors, and therefore, $W_{AC} = 0$.

The presence of the inverted matrices is important, because it means that the locations of all points are taken into consideration when calculating the correlations. For example, consider row 2 of Table II (this is the correlation matrix for one nearest neighbor, when $\rho = 0.67$). $\mathrm{Corr}_{2,8}$ is the highest in this row because 2 and 8 are nearest neighbors. The other correlations are not zero, however. $\mathrm{Corr}_{2,5}$ is 0.826. This is because 2 is 5's nearest neighbor.⁶ Also, $\mathrm{Corr}_{2,10} = 0.764$. This illustrates a three-way interaction: 8 is nearest neighbor to both 10 and 2, therefore 10 is correlated with 2 because of 8. Although the correlations are not monotonic in terms of separation distance, they are with respect to the strength of the relationships. $\mathrm{Corr}_{2,8}$ is the highest because 2 and 8 are each other's nearest neighbors. $\mathrm{Corr}_{2,5}$ is smaller because 2 is 5's nearest neighbor, but not the reverse. $\mathrm{Corr}_{2,10}$ is smaller yet because 10 and 2 are related only indirectly, through 8.

⁴ These correlograms were generated by graphing the values in the population correlation matrix (obtained from Eq. (3)) against the values in the distance matrix.

⁵ Row standardized means that W is transformed so that the rows sum to one.

⁶ Note that the reverse is not true: 8 (and not 5) is 2's nearest neighbor. Thus W is not symmetric for the nearest neighbor model.

FIG. 2A. Nearest neighbor correlations: $\rho = 0.67$. (A1) One nearest neighbor; (A2) two nearest neighbors; (A3) three nearest neighbors.

FIG. 2B. Nearest neighbor correlations: $\rho = 0.33$. (B1) One nearest neighbor; (B2) two nearest neighbors; (B3) three nearest neighbors.

FIG. 3A. Correlograms for $W_{ij} = 1$ if $D_{ij} < L$: $\rho = 0.67$. (A1) $L = 2$; (A2) $L = 3$; (A3) $L = 4$.

FIG. 3B. Correlograms for $W_{ij} = 1$ if $D_{ij} < L$: $\rho = 0.33$. (B1) $L = 2$; (B2) $L = 3$; (B3) $L = 4$.

FIG. 4A. Correlograms for $W_{ij} = 1/D_{ij}^{P}$: $\rho = 0.67$. (A1) $P = 1$; (A2) $P = 2$; (A3) $P = 3$.

FIG. 4B. Correlograms for $W_{ij} = 1/D_{ij}^{P}$: $\rho = 0.33$. (B1) $P = 1$; (B2) $P = 2$; (B3) $P = 3$.

TABLE II
One Nearest Neighbor

A. Weight Matrix

     1  2  3  4  5  6  7  8  9 10
 1   0  0  0  0  0  0  1  0  0  0
 2   0  0  0  0  0  0  0  1  0  0
 3   0  0  0  0  0  0  0  0  1  0
 4   0  0  0  0  0  1  0  0  0  0
 5   0  1  0  0  0  0  0  0  0  0
 6   0  0  0  1  0  0  0  0  0  0
 7   1  0  0  0  0  0  0  0  0  0
 8   0  1  0  0  0  0  0  0  0  0
 9   0  0  0  1  0  0  0  0  0  0
10   0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: $\rho = 0.67$

       1     2     3     4     5     6     7     8     9    10
 1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
 2  0.00  1.00  0.00  0.00  0.83  0.00  0.00  0.92  0.00  0.76
 3  0.00  0.00  1.00  0.63  0.00  0.58  0.00  0.00  0.76  0.00
 4  0.00  0.00  0.63  1.00  0.00  0.92  0.00  0.00  0.83  0.00
 5  0.00  0.83  0.00  0.00  1.00  0.00  0.00  0.76  0.00  0.63
 6  0.00  0.00  0.58  0.92  0.00  1.00  0.00  0.00  0.76  0.00
 7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
 8  0.00  0.92  0.00  0.00  0.76  0.00  0.00  1.00  0.00  0.83
 9  0.00  0.00  0.76  0.83  0.00  0.76  0.00  0.00  1.00  0.00
10  0.00  0.76  0.00  0.00  0.63  0.00  0.00  0.83  0.00  1.00

The second observation to be drawn from Fig. 2 is that the choice of the number of nearest neighbors changes the implied correlation structure considerably. For one nearest neighbor, the correlations decline with distance. For two nearest neighbors, there are no zero correlations, because all of the points are related, either directly or indirectly. For three nearest neighbors, all of the correlations are about the same, because all of the points are related to each other (recall that there are only 10 observations). This is a potential weakness of this approach, because the researcher generally chooses (rather than estimates) the number of neighbors.
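The nearest neighbor illustration can be retraced numerically from the distances in Table I. The sketch below is a reconstruction under stated assumptions rather than the author's code: the function names are hypothetical, ties in distance are broken by observation order, and row standardization is done by dividing by the number of neighbors k. For one nearest neighbor and $\rho = 0.67$ it should return values close to the row 2 entries quoted in the text (0.826 and 0.764), and it counts how many pairs have zero correlation for one and for two neighbors.

```python
import numpy as np

# Distance matrix from Table I (10 observations).
D = np.array([
    [0.00, 2.80, 5.64, 3.45, 3.43, 3.76, 1.53, 3.14, 5.67, 4.42],
    [2.80, 0.00, 7.70, 6.11, 2.06, 6.01, 4.32, 0.55, 8.45, 2.32],
    [5.64, 7.70, 0.00, 5.82, 9.00, 7.26, 4.61, 7.71, 4.14, 7.82],
    [3.45, 6.11, 5.82, 0.00, 5.93, 1.52, 2.33, 6.52, 3.33, 7.86],
    [3.43, 2.06, 9.00, 5.93, 0.00, 5.31, 4.84, 2.55, 8.84, 4.26],
    [3.76, 6.01, 7.26, 1.52, 5.31, 0.00, 3.20, 6.50, 4.76, 8.05],
    [1.53, 4.32, 4.61, 2.33, 4.84, 3.20, 0.00, 4.62, 4.14, 5.72],
    [3.14, 0.55, 7.71, 6.52, 2.55, 6.50, 4.62, 0.00, 8.72, 1.77],
    [5.67, 8.45, 4.14, 3.33, 8.84, 4.76, 4.14, 8.72, 0.00, 9.61],
    [4.42, 2.32, 7.82, 7.86, 4.26, 8.05, 5.72, 1.77, 9.61, 0.00],
])

def knn_weights(D, k):
    """Row-standardized k-nearest-neighbor weights (each row has k entries of 1/k)."""
    Dm = D.copy()
    np.fill_diagonal(Dm, np.inf)                       # exclude self-distances
    idx = np.argsort(Dm, axis=1, kind="stable")[:, :k]
    W = np.zeros_like(D)
    np.put_along_axis(W, idx, 1.0, axis=1)
    return W / k

def implied_correlations(W, rho):
    """Correlations implied by Eq. (3); sigma^2 cancels, so it is set to one."""
    n = W.shape[0]
    B = np.linalg.inv(np.eye(n) - rho * W)
    V = B @ B.T                                        # (I - rho W)^{-1} (I - rho W')^{-1}
    s = np.sqrt(np.diag(V))
    return V / np.outer(s, s)

def count_zero_pairs(corr, tol=1e-10):
    """Number of distinct pairs whose correlation is zero up to rounding error."""
    i, j = np.triu_indices_from(corr, k=1)
    return int((np.abs(corr[i, j]) < tol).sum())

corr1 = implied_correlations(knn_weights(D, 1), 0.67)
corr2 = implied_correlations(knn_weights(D, 2), 0.67)

# Row 2 of the one-nearest-neighbor case: Corr(2,5), Corr(2,8), Corr(2,10).
print(np.round(corr1[1, [4, 7, 9]], 3))
print(count_zero_pairs(corr1), count_zero_pairs(corr2))
```

With one neighbor the ten points fall into three disconnected groups, which is why many pairs have exactly zero correlation; with two neighbors every pair shares at least one directly or indirectly linked point.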

2.a.2. $W_{ij} = 1$ if $D_{ij} < L$. This spatial weighting scheme is similar in concept to nearest neighbors. The weight matrix is still binary; however, rather than specifying the number of ones in each row, this number is determined by setting a maximum distance within which points can influence each other. Unlike the nearest neighbor scheme, this W is always symmetric. Despite the similarities, the two schemes produce different correlation patterns (see Table III). When one nearest neighbor is used, each row of W contains exactly one 1. When $L = 2$, the number of 1s contained in each row of W is zero, one, or two, with the average being 0.75. Thus, $L = 2$ is somewhat comparable to one nearest neighbor. However, $L = 2$ gives much different correlations: there are only four off-diagonal nonzero correlations and these are very large. When $L = 3$, the rows of W contain between zero and four 1s each, with the average being 1.8. But again the correlations are very different from two nearest neighbors: the correlations fall off more markedly with distance and some pairs exhibit zero correlation. When $L = 4$, the average number of ones is 3, but again the pattern of correlations is much different from three nearest neighbors. Here, the correlations fall off with separation distance, rather than being approximately constant, as for three nearest neighbors. However, this case is similar to nearest neighbors, in that the choice of the parameter greatly affects the correlation pattern.

TABLE III
$W_{ij} = 1$ if $D_{ij} < 2$

A. Weight Matrix

     1  2  3  4  5  6  7  8  9 10
 1   0  0  0  0  0  0  1  0  0  0
 2   0  0  0  0  0  0  0  1  0  0
 3   0  0  0  0  0  0  0  0  0  0
 4   0  0  0  0  0  1  0  0  0  0
 5   0  0  0  0  0  0  0  0  0  0
 6   0  0  0  1  0  0  0  0  0  0
 7   1  0  0  0  0  0  0  0  0  0
 8   0  1  0  0  0  0  0  0  0  1
 9   0  0  0  0  0  0  0  0  0  0
10   0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: $\rho = 0.67$

       1     2     3     4     5     6     7     8     9    10
 1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
 2  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.87  0.00  0.72
 3  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 4  0.00  0.00  0.00  1.00  0.00  0.92  0.00  0.00  0.00  0.00
 5  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00
 6  0.00  0.00  0.00  0.92  0.00  1.00  0.00  0.00  0.00  0.00
 7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
 8  0.00  0.87  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.87
 9  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00
10  0.00  0.72  0.00  0.00  0.00  0.00  0.00  0.87  0.00  1.00

TABLE IV
$W_{ij} = 1/D_{ij}^{3}$

A. Weight Matrix

       1     2     3     4     5     6     7     8     9    10
 1  0.00  0.36  0.18  0.29  0.29  0.27  0.65  0.32  0.18  0.23
 2  0.36  0.00  0.13  0.16  0.48  0.17  0.23  1.81  0.12  0.43
 3  0.18  0.13  0.00  0.17  0.11  0.14  0.22  0.13  0.24  0.13
 4  0.29  0.16  0.17  0.00  0.17  0.66  0.43  0.15  0.30  0.13
 5  0.29  0.48  0.11  0.17  0.00  0.19  0.21  0.39  0.11  0.23
 6  0.27  0.17  0.14  0.66  0.19  0.00  0.31  0.15  0.21  0.12
 7  0.65  0.23  0.22  0.43  0.21  0.31  0.00  0.22  0.24  0.17
 8  0.32  1.81  0.13  0.15  0.39  0.15  0.22  0.00  0.11  0.56
 9  0.18  0.12  0.24  0.30  0.11  0.21  0.24  0.11  0.00  0.10
10  0.23  0.43  0.13  0.13  0.23  0.12  0.17  0.56  0.10  0.00

B. Correlation Matrix

       1     2     3     4     5     6     7     8     9    10
 1  1.00  0.32  0.45  0.40  0.36  0.37  0.77  0.31  0.41  0.30
 2  0.32  1.00  0.28  0.13  0.73  0.12  0.23  0.92  0.20  0.75
 3  0.45  0.28  1.00  0.42  0.29  0.39  0.48  0.28  0.54  0.27
 4  0.40  0.13  0.42  1.00  0.18  0.82  0.48  0.12  0.57  0.13
 5  0.36  0.73  0.29  0.18  1.00  0.18  0.28  0.72  0.23  0.61
 6  0.37  0.12  0.39  0.82  0.18  1.00  0.44  0.12  0.51  0.13
 7  0.77  0.23  0.48  0.48  0.28  0.44  1.00  0.22  0.47  0.23
 8  0.31  0.92  0.28  0.12  0.72  0.12  0.22  1.00  0.19  0.78
 9  0.41  0.20  0.54  0.57  0.23  0.51  0.47  0.19  1.00  0.19
10  0.30  0.75  0.27  0.13  0.61  0.13  0.23  0.78  0.19  1.00

2.a.3. $W_{ij} = 1/D_{ij}^{P}$. In this formulation, the elements of W are fractions. This is a departure from the earlier cases, both of which resulted in binary weight matrices. When this case is compared to the previous cases, it is important to remember that the larger P, the smaller the band of influence. Thus, $P = 3$ is closest to one nearest neighbor and to $L = 2$. The correlations for this case tend to fall with separation distance, particularly for the smaller bandwidths (see Table IV). The variation in the correlations is largest when the bandwidth is small, because this allows the indirect relations to show up. As the bandwidth increases, more of the points share neighbors, and so the correlations become more uniform. Finally, note that the pattern of correlations produced by this scheme is markedly different from those produced by the other spatial weighting schemes.

2.a.4. Discussion. This illustration has demonstrated that different spatial weighting schemes produce markedly different implied correlation patterns. Furthermore, the choice of the parameter, which must be set once the family of weighting schemes is specified, also affects the implied correlations. This is problematic for a number of reasons. First, the weighting schemes discussed here are all plausible, and yet they imply different things for the data. Second, most tests of the presence of spatial autocorrelation are conditional on the choice of W. For example, Moran's I statistic, which is one of the most commonly used tests of spatial autocorrelation, is given by the formula

$$I = \frac{N(e'We)}{S(e'e)}, \tag{4}$$

where N is the number of observations, e is a vector of regression residuals, S is a standardization factor, and W is the weight matrix. Clearly the results of this test will be conditional on the researcher's choice of W.⁷ This problem is illustrated by a recent article by Can (1992). In this paper, Can uses three weighting schemes: $W1_{ij} = 1$ if $D_{ij} < 5$ miles, $W2_{ij} = 1/D_{ij}$, and $W3_{ij} = 1/D_{ij}^{2}$. She also uses two functional forms of the hedonic regression: linear and semilog. This gives six combinations. Three of these combinations show significant spatial autocorrelation and three do not. Can has no way of knowing which is the correct specification and therefore whether the errors are spatially correlated or not.⁸ It would seem that users of this approach to modeling spatial autocorrelation should move in the direction of estimating the parameters of the weight matrix.
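The dependence of Moran's I on W is easy to see in a toy calculation. The sketch below implements Eq. (4) under the assumption that the standardization factor S is the sum of all elements of W, which is a common convention but is not specified in the text; the residuals and both weight matrices are hypothetical.

```python
import numpy as np

def morans_i(e, W):
    """Moran's I = N (e'We) / (S (e'e)), with S assumed to be the sum of all weights."""
    n = e.shape[0]
    S = W.sum()
    return n * (e @ W @ e) / (S * (e @ e))

# Four observations along a line; W_chain links only adjacent observations.
W_chain = np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
# W_all links every pair of observations.
W_all = np.ones((4, 4)) - np.eye(4)

e_clustered = np.array([1.0, 1.0, -1.0, -1.0])    # like residuals sit next to each other
e_alternating = np.array([1.0, -1.0, 1.0, -1.0])  # neighbors always have opposite signs

i_chain = morans_i(e_clustered, W_chain)
i_all = morans_i(e_clustered, W_all)
i_alt = morans_i(e_alternating, W_chain)
```

The same clustered residuals give a positive statistic under the adjacency matrix and a negative one when every pair is treated as neighbors, which is the kind of sensitivity to W that the Can (1992) example illustrates.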

2.b. Second Approach: Direct Specification of the Covariance Structure

In this approach, rather than starting with the process and deriving the covariance matrix, a functional form for the covariance structure is assumed. The parameters of this function are then estimated, along with the regression coefficients, using maximum likelihood methods. Functions are chosen which cause the correlations to fall as separation distance increases. The following are all permissible functions:

⁷ Kelejian and Robinson (1982) provide a test of spatial autocorrelation that does not use a weight matrix.

⁸ It is possible that the likelihood values could give some guidance as to which model best fits the data. However, this requires that the models be nested.

Negative Exponential

$$K_{ij} = b_1 \exp\left(-\frac{D_{ij}}{b_2}\right) \tag{5}$$

Gaussian

$$K_{ij} = b_1 \exp\left(-\frac{D_{ij}^{2}}{b_2}\right) \tag{6}$$

Spherical

$$K_{ij} = \begin{cases} b_1\left(1 - \dfrac{3D_{ij}}{2b_2} + \dfrac{D_{ij}^{3}}{2b_2^{3}}\right) & \text{if } 0 < D_{ij} \le b_2 \\ 0 & \text{if } D_{ij} > b_2, \end{cases} \tag{7}$$

where K is the correlation matrix for the error terms (and $\sigma^2 K = V$). The correlograms for these models for the simulated data are shown in Figs. 5 through 7. These figures differ from those for the weight matrix method. For example, in Fig. 2.a., the three panels represent different choices made by the researcher: the number of nearest neighbors to consider. In Fig. 5, the three panels represent different values of $b_1$, where $b_1$ is estimated. Once the researcher picks the functional form, the data determine which of the nine functions shown is best (of course, values of $b_1$ and $b_2$ other than those shown in the figure are possible).

The three functions result in similar graphs. The Gaussian correlogram falls off faster with separation distance than does the Negative Exponential. The Gaussian also has somewhat more weight at very small separation distances. This is difficult to see from these figures, however, because of the lack of observations with small separation distances (there is only one pair separated by a distance smaller than 1.5).⁹ As depicted in Fig. 7, the Spherical Correlogram looks very much like the Negative Exponential. In reality, the functions differ in their behavior near the origin, where the Spherical model produces higher correlations.
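The three correlation functions can be written down directly. The following sketch uses the forms suggested by the captions of Figs. 5 and 6 and the usual spherical model; the function names, the helper that sets the diagonal to one (an assumption consistent with K being a correlation matrix), and the sample distances are all illustrative.

```python
import numpy as np

def negative_exponential(D, b1, b2):
    """K_ij = b1 * exp(-D_ij / b2)."""
    return b1 * np.exp(-D / b2)

def gaussian(D, b1, b2):
    """K_ij = b1 * exp(-D_ij^2 / b2)."""
    return b1 * np.exp(-D ** 2 / b2)

def spherical(D, b1, b2):
    """K_ij = b1 * (1 - 3 D_ij / (2 b2) + D_ij^3 / (2 b2^3)) inside the range b2, else 0."""
    inside = b1 * (1.0 - 1.5 * D / b2 + 0.5 * (D / b2) ** 3)
    return np.where(D < b2, inside, 0.0)

def correlation_matrix(D, func, b1, b2):
    """Apply a correlation function to a distance matrix and set the diagonal to one."""
    K = func(D, b1, b2)
    np.fill_diagonal(K, 1.0)   # assumption: each error is perfectly correlated with itself
    return K

# Hypothetical separation distances and parameter values.
d = np.array([0.0, 1.0, 2.0, 4.0])
b1, b2 = 0.67, 2.0
k_exp = negative_exponential(d, b1, b2)
k_gau = gaussian(d, b1, b2)
k_sph = spherical(d, b1, b2)

# A small distance matrix turned into a correlation matrix.
Dmat = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0]])
K = correlation_matrix(Dmat, negative_exponential, b1, b2)
```

Under these forms every pair separated by the same distance receives the same correlation, and $b_1$ is the limiting correlation as the separation distance approaches zero. In the spherical model the correlation reaches exactly zero at the distance $b_2$.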

These functions are much smoother than the implied correlograms for the various weight matrices. This is because the correlations are modeled directly, and thus, all points separated by a given distance will have the same correlation, regardless of the location of other points. This is not the case for the weight matrix correlograms. For example, in the case of one nearest neighbor and $\rho = 0.95$, points 5 and 8 have a correlation of 0.629 and are separated by a distance of 2.549. Points 4 and 7 are closer (separation distance equals 2.327), but have a correlation of zero. This seeming anomaly occurs because point 2 provides the link between points 5 and 8 (as described earlier), while points 4 and 7 have other points which are closer to them and therefore are not nearest neighbors.

⁹ This turns out to be a problem in empirical work as well. Typically, there are many pairs with large separation distances but a much smaller number with small separation distances. This can make it difficult to fit the beginning of the curve.

FIG. 5. Correlograms for $K_{ij} = b_1 \exp(-D_{ij}/b_2)$. (A) $b_1 = 0.95$; (B) $b_1 = 0.67$; (C) $b_1 = 0.33$.

FIG. 6. Correlograms for $K_{ij} = b_1 \exp(-D_{ij}^{2}/b_2)$. (A) $b_1 = 0.95$; (B) $b_1 = 0.67$; (C) $b_1 = 0.33$.

FIG. 7. Correlograms for spherical case. (A) $b_1 = 0.95$; (B) $b_1 = 0.67$; (C) $b_1 = 0.33$.

2.c. Discussion

As pointed out above, there are two main approaches to modeling spatial autocorrelation: the weight matrix approach and the direct approach. Within each approach, there are alternatives available to the researcher (i.e., the method of forming the weight matrix or the functional form for the direct approach). As shown by the figures, each alternative implies a different assumption about the spatial relationships in the data. The literature currently provides little guidance about which models work best in which situations. However, two points seem clear. First, to the extent possible, it is probably better to estimate the parameters of the model, rather than choosing them a priori. Second, any spatial modeling of the error terms, in a situation when autocorrelation is likely to be present, will dominate a model which ignores the problem completely.

3. ESTIMATION

Once a model (weight matrix or direct approach) has been chosen to represent the covariance structure of the error terms, it can be estimated via Maximum Likelihood.¹⁰ In Maximum Likelihood estimation, the following log likelihood function is maximized with respect to the unknown parameters:

$$\ln(L) = -\frac{1}{2}\ln|V| - \frac{n}{2}\ln\left[(Y - X\beta)'V^{-1}(Y - X\beta)\right]. \tag{8}$$

The unknowns are the regression coefficients ($\beta$), the error variance ($\sigma^2$), and the parameters of V ($b_1$ and $b_2$ or $\rho$, depending on which approach is used). One nice byproduct of the ML approach is that a likelihood ratio test can be used to determine the presence of spatial autocorrelation: two times the difference between the log likelihood values of the unrestricted and restricted models is distributed as a $\chi^2$ random variable. Here the restricted model is OLS, i.e., restricting V to be the identity matrix. The degrees of freedom are 1 for the weight matrix approach and 2 for the direct approach.

¹⁰ Other techniques are available. For example, in the direct approach, one technique is to fit (usually by eye) the parameters to an empirical correlogram (which is the average correlation among all points in a given separation distance range, plotted against separation distance). Once the parameters of the correlation function have been estimated, EGLS (estimated generalized least squares) can be used to obtain the regression coefficients. These techniques will not be discussed further here.
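The estimation and testing steps can be sketched for the direct approach with the negative exponential function. This is a rough illustration under several assumptions, not the author's procedure: the log likelihood is taken in the concentrated form $-\tfrac{1}{2}\ln|K| - \tfrac{n}{2}\ln[e'K^{-1}e]$ (in which $\sigma^2$ cancels, so K can stand in for V), $\beta$ is set to its GLS value for each candidate K, the maximization is a coarse grid search rather than a proper optimizer, and the data, seed, and function names are hypothetical.

```python
import numpy as np

def neg_exp_K(D, b1, b2):
    """Negative exponential correlation matrix with ones on the diagonal."""
    K = b1 * np.exp(-D / b2)
    np.fill_diagonal(K, 1.0)
    return K

def concentrated_loglik(Y, X, K):
    """Return (-1/2 ln|K| - n/2 ln(e'K^{-1}e), beta) with beta at its GLS value given K."""
    n = Y.shape[0]
    KinvX = np.linalg.solve(K, X)
    KinvY = np.linalg.solve(K, Y)
    beta = np.linalg.solve(X.T @ KinvX, X.T @ KinvY)   # GLS coefficients
    e = Y - X @ beta
    _, logdet = np.linalg.slogdet(K)
    quad = e @ np.linalg.solve(K, e)
    return -0.5 * logdet - 0.5 * n * np.log(quad), beta

# Hypothetical data: 30 houses at random locations in a 10 x 10 square.
rng = np.random.default_rng(1)
n = 30
coords = rng.uniform(0.0, 10.0, size=(n, 2))
D = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.cholesky(neg_exp_K(D, 0.67, 2.0)) @ rng.normal(size=n)  # correlated errors
Y = X @ np.array([1.0, 2.0]) + u

# Restricted model: K = I, which is OLS.
ll_ols, beta_ols = concentrated_loglik(Y, X, np.eye(n))

# Unrestricted model: coarse grid over (b1, b2); b1 = 0 reproduces the restricted model.
grid = [(b1, b2) for b1 in (0.0, 0.33, 0.67, 0.95) for b2 in (1.0, 2.0, 4.0)]
ll_best, (b1_hat, b2_hat) = max(
    (concentrated_loglik(Y, X, neg_exp_K(D, b1, b2))[0], (b1, b2)) for b1, b2 in grid
)

# Likelihood ratio statistic; the 5% critical value of chi-square with 2 df is about 5.99.
lr = 2.0 * (ll_best - ll_ols)
```

In practice a numerical optimizer over $b_1$ and $b_2$ (or over $\rho$ in the weight matrix approach) would replace the grid, but the structure of the calculation is the same: evaluate the log likelihood for each candidate covariance matrix and compare the maximized value with the OLS value.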

4. OTHER ISSUES

4.a. Sample Size

V is an $N \times N$ matrix, where N is the sample size. The log likelihood function contains both the determinant and the inverse of this matrix. Thus, the computational burden increases with sample size. However, since the accuracy of the estimates also increases with the sample size, it is important to use a large sample size in these problems.

Pace (1997) has suggested the use of sparse matrix techniques to facilitate the use of large samples. If V is specified so that the number of nonzero elements is relatively small, these methods can reduce the computational burden considerably.
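One way sparsity helps can be sketched for the weight matrix approach, where Eq. (3) implies $\ln|V| = N\ln\sigma^2 - 2\ln|I - \rho W|$, so only the determinant of a sparse matrix is needed. The example below uses SciPy's sparse LU factorization as a stand-in; it is not a reproduction of Pace's (1997) method, and the ring-shaped neighbor structure and function name are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def sparse_logdet_I_minus_rhoW(W, rho):
    """ln|det(I - rho W)| from a sparse LU factorization."""
    n = W.shape[0]
    A = (sparse.identity(n, format="csc") - rho * W).tocsc()
    lu = splu(A)
    # L has a unit diagonal and the permutations have determinant +/-1,
    # so |det A| is the product of the absolute diagonal entries of U.
    return np.sum(np.log(np.abs(lu.U.diagonal())))

# Hypothetical structure: 200 observations on a ring, each with two neighbors,
# row standardized so that each neighbor receives weight 0.5.
n = 200
rows = np.repeat(np.arange(n), 2)
cols = np.column_stack([(np.arange(n) - 1) % n, (np.arange(n) + 1) % n]).ravel()
W = sparse.csr_matrix((np.full(2 * n, 0.5), (rows, cols)), shape=(n, n))

rho = 0.67
logdet_A = sparse_logdet_I_minus_rhoW(W, rho)
logdet_V = -2.0 * logdet_A        # ln|V| when sigma^2 = 1
```

Here W has only 2N nonzero elements out of N², so storage and factorization cost grow far more slowly with N than they would for a dense matrix.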

4.b. Measurement of Separation Distance

Urban areas vary in the density of development. Therefore, it is possible that neighborhood size varies with the location of the neighborhood within the city: dense areas may have neighborhoods which are more compact, while suburban areas may have geographically larger neighborhoods. Researchers may wish to account for this by using separation measures other than geographic distance. For example, Dubin (1992) measures separation distance in terms of houses.

4.c. Functional Form of the Regression

Hedonic regressions are reduced form, and economic theory has little to say about the proper functional form of such an equation. Most of this paper addresses the issue of the assumed form of the covariance structure. Clearly the functional form of the regression itself is of even greater importance: the regression residuals will not reflect the true error structure if the wrong functional form is used.

5. FURTHER READING

Below, I provide an annotated list of sources which the interested reader may wish to consult. Some of these are cited elsewhere in this paper.

Texts

1. Anselin (1988). This book provides an extremely complete presentation of the weight matrix approach.

2. Ripley (1981). Chapter 4 of this book provides an excellent discussion of Kriging (prediction incorporating the spatially autocorrelated errors).

3. Upton and Fingleton (1985). In Chapter 5, the authors discuss regression with autocorrelated errors, using the weight matrix approach. This book is particularly nice because data and solutions are provided for most of the techniques discussed.

4. Anselin and Florax (1995). This book is an edited collection of many interesting papers on spatial econometrics.

Papers

1. Dubin (1988). This is probably the first application of these techniques to estimating a hedonic regression. This paper uses the direct approach.

2. Can (1992). An example of a hedonic estimation using the weight matrix technique.

3. Pace et al. (1998). Provides an example of estimating the number of nearest neighbors in the weight matrix.

4. Basu and Thibodeau (1998). Kriges housing prices in Dallas.

5. Dubin (1998). Discusses the issues involved in Kriging housing prices.

6. Pace (1997). Uses sparse matrix techniques to facilitate the estimation.

REFERENCES

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.

Anselin, L., and Florax, R. (1995). New Directions in Spatial Econometrics. Berlin: Springer-Verlag.

Basu, S., and Thibodeau, T. (1998). Analysis of Spatial Autocorrelation in House Prices, J. Real Estate Finance Econ. 17, 61–86.

Can, A. (1992). Specification and Estimation of Hedonic Housing Price Models, Reg. Sci. Urban Econ. 22, 453–474.

Cliff, A. D., and Ord, J. K. (1981). Spatial Processes: Models and Applications. London: Pion.

Dubin, R. A. (1988). Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms, Rev. Econ. Statist., 168–173.

Dubin, R. A. (1992). Spatial Autocorrelation and Neighborhood Quality, Reg. Sci. Urban Econ. 22, 433–452.

Dubin, R. A. (1998). Predicting House Prices Using Multiple Listings Data, J. Real Estate Finance Econ. 17, 35–60.

Dubin, R. A., Pace, K., and Thibodeau, T. (forthcoming). Spatial Autoregression Techniques for Real Estate Data.

Kelejian, H. H., and Robinson, D. P. (1982). Spatial Autocorrelation: A New Computationally Simple Test with an Application to Per Capita County Policy Expenditures, Reg. Sci. Urban Econ. 22, 317–332.

Matheron, G. (1963). Principles of Geostatistics, Econ. Geol. 58, 1246–1266.

Pace, K. (1997). Performing Large Spatial Regressions and Autoregressions, Econ. Lett., 283–291.

Pace, K., and Gilley, O. (1998). Generalizing the OLS and Grid Estimators, Real Estate Econ., 331–347.

Pace, K., Barry, R., Clapp, J. M., and Rodriguez, M. (1998). Spatiotemporal Autoregressive Models of Neighborhood Effects, J. Real Estate Finance Econ., 15–34.

Ripley, B. D. (1981). Spatial Statistics. New York: Wiley.

Tobler, W. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region, Econ. Geog. Supplement 46, 234–240.

Upton, G., and Fingleton, B. (1985). Spatial Data Analysis by Example. New York: Wiley.
