Modeling Spatial Relationships
We have discussed in previous chapters why OLS is inappropriate to use when variables exhibit spatial autocorrelation. We have also covered how to detect if a variable exhibits spatial autocorrelation. This chapter covers the next and most significant step in the spatial regression process: estimating the actual spatial regression models. At this point, a researcher should have visualized their spatial data and tested for spatial autocorrelation and now must make the decision of model selection. The tools discussed in this section will equip researchers to estimate a number of candidate models and select the one that best accounts for the type and degree of spatial autocorrelation in their data. We will cover the two dominant models used to estimate spatial regressions: the spatial error model and the spatial lag model. Both models have analogs, like most aspects of spatial regression, to time series analysis. By the end of this chapter, readers will be comfortable with the theoretical underpinnings and modeling techniques for both spatial error and spatial lag models. We will also cover more complex modeling choices. These models, and by extension this chapter, are the most important aspect of spatial regression. By following the steps discussed here, and shown in a detailed applied context in chapter four, researchers can estimate models with confidence that their standard errors are accurate and any and all spatial autocorrelation is purged from their model’s error term.
The main problem when analyzing spatial data is that residuals have a high probability of being autocorrelated due to the similarity of nearby neighbors. For example, we can expect census tracts nearby to one another in Los Angeles, CA to have more similar crime rates than census tracts across the city from one another. We need a specification that accounts for this, lest our standard errors will be incorrect. We have two main options when it comes to modeling choices that handle global spatial autocorrelation: spatial error models and spatial lag models. We begin first by discussing spatial error models.
Spatial Error Models
As we have said many times, the main issue with modeling spatially correlated variables is that our standard errors will be incorrect. Spatial error models account for the spatial autocorrelation in our residuals, giving us more accurate standard errors (Hurtado 2016 and DGES 2022). The central theory of spatial error models is that spatial autocorrelation can be modeled as a weighted mean of the residuals of a unit’s neighbors (DGES 2022). Spatial error models specifically model the spatial autocorrelation in the error term. Once we have modeled the spatial autocorrelation in our error term, the new error term is free of autocorrelation. In spatial error models, we weight based on the errors of an observation’s neighbors, giving us the following model:
\[Y_{i}=\alpha + \beta\ X_{i} + \delta \Lambda' + \nu_{i}\] \[\nu_i=\lambda_{Err}\ W_i\nu_i + \epsilon_i\]
Where:
- \(\lambda\) is the spatial constant
- \(W\) is the weighted error of a unit’s neighbors
- \(\beta\ X_i\) is our main independent variable its coefficient
- \(\delta \Lambda'\) is a matrix of covariates and a vector of their coefficients
- \(\nu\) is the spatially autocorrelated errors
- \(\epsilon\) is the now-purged of spatial autocorrelation error term
The key takeaway from the equations above is that once we weight the neighbor’s residuals, our new error term, \(\epsilon_i\), is purged of spatial autocorrelation. Now we can move forward estimating our spatial relationship and confidently know that spatial autocorrelation will not be affecting our standard errors. We should also notice a change, sometimes slight, in our variables’ standard errors, t-values, and \(p\)-values compared to OLS. Using spatialreg::errorsarlm()
, which takes arguments for a model and weights, estimates a spatial error model in R. We can see evidence that there is, in fact, spatial autocorrelation in our models by looking at the estimate of \(\lambda\) that is given when running summary()
on our model. If the \(p\)-value of \(\lambda\) is statistically significant, there was global spatial autocorrelation that has now been accounted for.
Another method to handle globally spatial autocorrelated data is a spatial lag model.
Spatial Lag Models
As opposed to spatial error models that model the part of the error term that is spatially autocorrelated, in a spatial lag model, we include a lagged term as an independent variable, similar to time series lagged DV models. A spatial lag variable averages the weighted neighboring values of a unit, \(i\). Spatial lag models compare \(i\)’s values with its neighbors values (DGES 2022). The resulting weighted matrix defines what units are \(i\)’s neighbors and how much to weight them. We can use standardized weights from library(spdep)
. Usually, the unit at the center of its defined list of neighbors is not included in the definition of neighbors and the weight of that unit is set to zero. The weight matrix is the same as the weight matrix for global Moran’s \(I\) test.
A spatial lag model looks like:
\[Y_i=\rho_{lag}\ W_iY_i + \beta\ X_i + \delta \Lambda' + \epsilon_i\]
Where:
- \(Y_i\) is our dependent variable
- \(\rho_{lag}\) is the degree of autocorrelation
- \(W_i\) is an observations weighted by its neighbors
- \(\beta\ X_i\) is our main independent variable its coefficient
- \(\delta \Lambda'\) is a matrix of covariates and a vector of their coefficients
- \(\epsilon_i\) is the now-purged of autocorrelation error term
Including \(\rho_{lag}\) as an independent variable accounts for the spatial autocorrelation that was in \(\epsilon_i\). Similar to the result of the spatial error model above, our new error term, \(\epsilon_i\) is entirely purged of spatial autocorrelation. In R, we use spatialreg::lagsarlm()
for estimating spatial lag models. The resulting output will give us coefficient estimates, standard errors, t-values, and \(p\)-values for each variable in our model. We can also check, after running this model, that there was, in fact, spatial autocorrelation by examining the summary output with summary()
. If \(\rho\) is statistically significant, then there was global spatial autocorrelation that has now been accounted for.
We can use a goodness-of-fit metric like \(R^2\) values, Akiake Information Criterion, or another metric of choice for model selection between OLS, spatial lag, and spatial error models. While using either spatial lag or spatial error models is always preferable to relying on basic OLS when you have spatially autocorrelated data, it is a good idea to present results from all three models to show readers that the results are robust to differing model specifications. The main weakness of both spatial error and spatial lag models is that they do not account for localized spatial autocorrelation and operate under the same assumption of global Moran’s \(I\): spatial autocorrelation is homogeneous across space. This assumption is often violated when working with actual data from the “real world,” so we need more advanced techniques to handle this.
Advanced Spatial Regression Modeling
Spatial lag and spatial error models are the workhorses of spatial regression, but neither of those models account for localized autocorrelation in spatial units and may be inadequate for handling severe spatial autocorrelation. Understanding spatial lag and spatial error models equips researchers to reliably model spatial relationships with well-behaving data, but other methods may be needed for severe or heterogeneous spatial autocorrelation. While implementing either a spatial error or a spatial lag model will give researchers far more accurate standard errors, and therefore statistical inferences, than a basic OLS model, we still want to be familiar with more advanced models to better account for the specific type and severity of spatial autocorrelation in our data.
SAC/SARAR Models
A spatial simultaneous autoregressive model, or a “SAC/SARAR” model combines spatial error and spatial lag models together (DGES 2022). We include both a lagged term and model the spatial autocorrelation in the error term. The most basic SAC/SARAR model looks like:
\[Y_i=\rho_{lag}\ W_iY_i + \beta\ X_i + \delta \Lambda' + \nu_i\] \[\nu_i=\lambda_{Err}\ W_i\nu_i + \epsilon_i\]
- \(\rho_{lag}\) is the degree of autocorrelation
- \(W_i\) is an observations weighted by its neighbors
- \(\beta\ X_i\) is our main independent variable and its coefficient
- \(\delta \Lambda'\) is a matrix of covariates and a vector of their coefficients
- \(\nu_i\) is the spatially autocorrelated errors
- \(\lambda\) is the spatial constant
- \(W_i\) is the weighted error of an observations neighbors
- \(\epsilon\) is the now-purged of spatial autocorrelation error term
We can use spdep::sacsarlm()
in R to calculate the maximum likelihood estimation of SAC/SARAR models. When we summarize the results in R, we can check the statistical significance of both \(\rho\) and \(\lambda\). As mentioned above, model selection should depend upon some goodness-of-fit metric. In the applied section we rely on AIC values, but researchers can select the goodness-of-fit metric they believe best suits their analysis. All things equal, the SAC/SARAR model should provide a better fit than either the spatial lag or spatial error models individually. One interesting statistical artifact of SAC/SARAR models to keep in mind is that we can skip all of the steps above and directly estimate a SAC/SARAR model from the beginning of our analysis as a test for spatial autocorrelation. If both \(\rho=0\) and \(\lambda=0\), then we have no spatial autocorrelation and we can proceed by simply using OLS and conclude that our data is not spatially autocorrelated (Hurtado 2016). We do not even have to consider Moran’s \(I\).
While a SAC/SARAR can handle sever autocorrelation, it does not account for local spatial autocorrelation. For that, we need even more advanced methods.
Geographically Weighted Regression
When spatial heterogeneity exists in our data, we can still see heteroscedastic and correlated errors even when utilizing a spatial lag, spatial error, or SAC/SARAR model. In these cases, our first choice should be to use a geographically weighted regression model. We can use the tools already discussed in this guide to estimate a geographically weighed regression model easily in R. Of all the models discussed in this chapter, geographically weighted regression models are the most powerful and flexible and allow for the richest results when modeling spatial relationships.
Geographically Weighted Regression (GWR) is most prominently used in geography and public health scholarship, but it has applications to the social sciences, namely when our data exhibits heterogeneous spatial autocorrelation. GWR is a method of spatial regression that accounts for non-stationary variables and models local spatial relationships (DGES 2022). GWR models are an expansion of OLS models that allow the relationship between our independent and dependent variables to vary depending upon location. GWR estimates local models by fitting a regression model to every spatial unit in the dataset. GWR estimates separate models by including the dependent and independent variables of the units falling either within the neighborhood of each unit or the \(k\) nearest neighbors of each unit. The model results are not single point estimates, but distributions of point estimates that can be summarized and visualized in engaging ways. Accordingly, GWR runs many regression models, estimating a different but related regression equation for each spatial unit, so the computational time can be considerably longer than with the other methods mentioned above.
Recall from introductory statistics that a basic linear regression model is:
\[Y_{i}=\alpha + \beta_1\ X_{1i} + \beta_2\ X_{2i}\ +...+\ \beta_n\ X_{ni} + \epsilon_{i}\]
Where:
\[\beta=(X'X)^{-1}X'y\]
In GWR models, \(\beta\) is calculated as:
\[\beta=(X'W_iX)^{-1}X'W_iy\]
Where \(W_i\) is the weight matrix, discussed several times before, that weights nearby units more than distant units. Now the \(\hat{\beta}\) estimator is accounting for spatial autocorrelation.
GWR models provide a powerful tool to handle heterogeneous spatial autocorrelation. GWR is, however, not without its limitations and it requires the use of different packages than the models above. The main limitation of GWR is that you need a high number of spatial units, or else the weights may over weight some units and GWR can also not accommodate multipoint data. There are further problems with multicollinearity that can arise due to the estimation procedure (Wheeler & Tiefelsdorf 2005). You can estimate a GWR model in R with library(GWmodel)
, but the process takes several more steps than the spatial error, spatial lag or SAC/SARAR models, and, depending on the number of independent variables and spatial units, can take a long time to run.
To estimate a GWR model, we begin by estimating the optimal bandwidth to define local neighborhoods of each unit in our analysis with GWmodel::bw.gwr()
. We can define our neighborhoods with the bandwidth
argument. R uses a first set of basic OLS models to determine the best bandwidth; this is called the “adaptive” approach. Our results are heavily dependent upon our choice in bandwidth, so it is best to use the default values for gwr.baisc()
, which is the adaptive approach (Fotheringham et al. 2002 & Goovaerts 2008). Once we have estimated our optimal bandwidth, we can run a GWR model with GWmodel::gwr.basic()
. Now we will have our regression results accounting for heterogeneity in our data’s spatial autocorrelation and we can map localized coefficients and goodness-of-fit metrics (Mennis 2006).
Bayesian Hierarchical Spatial Models, SARAR, GS2SLS, and the Spatial Durbin Model
While we have covered the most common and flexible approaches to estimating spatial relationships, there are numerous other models that we can consider. Many of the techniques described below require outside software such as Stata or have asymptotic behavior that has not been formally proved yet. So, while we do not go into detail about any of the following models, we present them quickly to give readers an idea about the current literature revolving around estimating spatial relationships in econometrics, statistics, & public health that may be of interest to researchers looking to innovate in the field of spatial regression.
The first additional model we consider is the Bayesian hierarchical spatial model. The Bayesian hierarchical model relies on the use of WinBUGs or GeoBugs. Bayesian hierarchical spatial models have become quite widely used in epidemiology and public health departments, but they are not used much in the social sciences. The key strength of Bayesian hierarchical models is that they model the complex levels inherent in spatial data and flexibly model different types of spatial autocorrelation. Since they are Bayesian models, they offer an approach to spatial regression consistent with the Bayesian philosophical perspective of statistics, not the frequentist paradigm inherent in other models discussed here. Similar to GWR models, Bayesian hierarchical models provide an estimate for each spatial unit, but they further provide an estimate for different levels in spatial data. This is especially useful when researchers are working with multi-level spatial data (Zhu et al. 2006). Another key strength is that Bayesian models allow researchers to incorporate prior beliefs about the spatial distribution and autocorrelation of the data in their model (Zhu et al. 2006). Zhu et al. (2006) overview this technique and apply it to analyzing the relationship between areas with high drug use and violent crime.
The second and third additional models that we consider have related uses. Our presentation here follows Hurtado’s (2016) lecture notes and Jin & Lee (2013). GS2SLS, or the generalized spatial two-stage least squares estimator, applies the logic of instrumental variables to spatial data analysis. GS2SLS can be implemented in R with spatialreg::stsls()
. GS2SLS models work with researchers first obtaining consistent estimates of \(\beta\) and \(\lambda\) from a Spatial AutoRegressive with additional AutoRegressive error structure, or SARAR model. SARAR models are similar to SAC/SARAR models and are the most basic way to account for spatial autocorrelation. A SARAR model looks like:
\[Y_i = \lambda WY_i + \beta_1X_i + W\beta_2X_i + u\]
Where:
\[u = \rho Wu + \epsilon_i\]
The next step in GS2SLS is to use the consistent estimates of \(\lambda\) to obtain an estimate of \(u\). With the estimate of \(u\), we can then obtain an estimate of \(\rho\). Once we have \(\rho\), we can create a \(\rho\) transformed equation of our main model and then proceed normally with a 2SLS model. Our result should now be free of spatial autocorrelation (Jin & Lee 2013).
The final additional model that we will cover briefly is the Spatial Durbin model (SDM) which is strikingly similar to the SARAR model. Our discussion here follows the presentation of Spatial Durbin models in Eilers (2019). SDM includes both a lagged dependent and independent variable in the main model. Both a \(WY_i\) and a \(WX_i\) term are included. While geographically weighted regression methods are preferred by public health scholars, SDM is one of the most common methods in econometrics. SDM looks like:
\[Y_i = \lambda WY_i + \beta_1X_i + W\beta_2X_i + \rho Wu_i + \epsilon_i\]
Concluding Remarks on Modeling Spatial Relationships
As one final note, we have covered several types of models but many more exist. It is always a good idea to run an analysis on a series of different models to ensure consistent results and robustness of findings across many model specifications. Model selection for spatial regression can be decided by comparing psudeo-\(R^2\)s, Akiake Information Criterion (\(AIC\)), or another goodness-of-fit statistic. An applied example of this is discussed in the next chapter.
This chapter introduced the core models needed to implement spatial regression and overviewed more advanced methods when spatial lag and spatial error models still exhibit some degree of spatial autocorrelation. In the next chapter, we leave the ivory tower of well-behaving statistical theory behind and enter the basement of data analysis to conduct a full spatial study of the effect of college education rates on poverty in Brooklyn, New York.