Geostatistics and Spatial Hierarchical Modeling
Presented for the Workshop on Spatial Analysis in Social Research
Sponsored by:
Interuniversity Consortium on Social and Political Research, University of Michigan
Center for Spatially Integrated Social Science, University of California, Santa Barbara
May 17-20, 2001
Carol A. Gotway Crawford
National Center for Environmental Health
Centers for Disease Control and Prevention
Mail Stop E70
1600 Clifton Road NE
Atlanta, GA 30333
Tel: (404) 639-2504; Fax: (404) 639-1677; E-Mail: [email protected]
******************************************************************
Outline
I. Geostatistics
A. Introduction. What is Geostatistics? When could geostatistical techniques be useful? Stress
continuous, fixed, spatial index (contrast lattice and point processes).
B. The Semivariogram. Definition. Relationship to “autocorrelation.” Estimation. Models and
modeling. Example.
C. Kriging. Overview and rationale. Development of ordinary kriging predictor. Search neigh-
borhoods. Other types of kriging (universal, block kriging, indicator, co-kriging definitions).
Example.
D. Software
II. Spatial Hierarchical Modeling
A. What is a hierarchical model? Advantages and disadvantages.
B. Estimation in single-stage model. Likelihood-based inference and Bayesian Estimation.
C. Example. Model specification. Notes on estimation.
D. Overview of more complex models. Bayesian modeling. Gibbs sampling.
E. Software
******************************************************************
Geostatistics
Geostatistics is a field of statistics concerned with the study of spatial data that have a continuous
spatial index (i.e., data can be observed at any point within a domain of interest, at least
conceptually).
Data locations are assumed to be fixed and known.
These attributes distinguish geostatistics from point patterns and lattice data, although geosta-
tistical concepts have been useful in these areas.
Geostatistics assumes the data are a partial realization of a random process (a collection of random variates)
{Z(s) : s ∈ D},
where D is a fixed subset of ℝ^d (d is usually 2), and the spatial index, s, varies continuously
throughout D.
Data are denoted Z(s_1), Z(s_2), . . . , Z(s_n).
******************************************************************
Replication needed for inference is provided by an assumption of intrinsic stationarity.
$$E[Z(s)] = \mu \quad \text{for all } s \in D,$$
$$\mathrm{Var}(Z(s_i) - Z(s_j)) = 2\gamma(s_i - s_j), \quad s_i, s_j \in D.$$
The function 2γ(·) is called the variogram. The function γ(·) is called the semivariogram.
If γ(si − sj) is a function of just the distance between si and sj and not direction, then the
process is called isotropic. If γ(si − sj) depends on both distance and direction, then the spatial
process is called anisotropic.
******************************************************************
The Semivariogram
The semivariogram is a function of the spatial process and as such satisfies certain properties.
Let $\mathbf{h} = s - u$ be the spatial lag, or vector between spatial locations s and u.
i) $\gamma(-\mathbf{h}) = \gamma(\mathbf{h})$
ii) $\gamma(\mathbf{0}) = 0$
iii) $\gamma(\mathbf{h})/\|\mathbf{h}\|^2 \to 0$ as $\|\mathbf{h}\| \to \infty$, i.e., $\gamma(\mathbf{h})$ cannot increase too fast with $\|\mathbf{h}\|$.
iv) γ(·) must be conditionally negative-definite, i.e.,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \gamma(s_i - s_j) \le 0$$
for any finite number of locations $\{s_i : i = 1, \ldots, n\}$ and real numbers $\{a_1, \ldots, a_n\}$ satisfying
$\sum_{i=1}^{n} a_i = 0$.
v) If the spatial process is isotropic, then $\gamma(\mathbf{h}) \equiv \gamma(h)$, where $h = \|\mathbf{h}\|$ (Euclidean distance).
******************************************************************
A graph of the semivariogram plotted against separation distance conveys information about
the continuity and spatial variability of the process.
This graph starts at zero and then, if observations close together are more alike than those
farther apart, increases as the separation distance increases.
It may level off to nearly a constant value (called the sill) at a large separation distance (called
the range). Beyond this distance, observations are spatially uncorrelated.
The shape of the semivariogram near the origin indicates the degree of smoothness or spatial
continuity of the spatial variable under study. A parabolic shape near the origin arises with a
very smooth spatial variable that is both continuous and differentiable. A linear shape near the
origin reflects a variable that is continuous, but not differentiable, and hence less regular.
A discontinuity, or vertical jump, at the origin indicates that the spatial variable is not even
continuous and has highly irregular spatial variability. This discontinuity is called the nugget
effect.
******************************************************************
[Figure: Idealized Semivariogram]
******************************************************************
Estimating the Semivariogram
$$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{N(h)} (Z(s_i) - Z(s_j))^2, \quad h \in \mathbb{R}^2,$$
where N(h) is the set of pairs of locations separated by h, i.e., $N(h) = \{(s_i, s_j) : s_i - s_j = h\}$, and
|N(h)| is the number of distinct pairs in N(h).
Irregularly spaced data: need to define distance classes, called tolerance intervals, and group
the sample pairs into these classes prior to averaging. This is analogous to the procedure used
in making a histogram.
Estimates of the semivariogram at large lags are based only on points at the opposite ends of
the domain. Thus, in practice, we usually take the maximum lag distance to be about half the
maximum separation distance.
The empirical semivariogram is a picture of your data spatially: the sill and the range, if they
exist, provide estimates of the process variance and the zone of influence of the observations, and
information at larger lags can indicate large-scale trends that may be important to interpret.
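The estimator and the grouping into distance classes can be sketched in Python. This is a minimal illustration, not the software used for the slides: the function name, the omnidirectional (distance-only) grouping, and the toy data are assumptions made for the example.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, values, lag_spacing, n_lags):
    """Method-of-moments semivariogram estimate using distance classes.

    Class k collects pairs whose separation distance lies in
    [k*lag_spacing - tol, k*lag_spacing + tol), with tol = lag_spacing / 2.
    Returns a list of (mean distance, gamma_hat, number of pairs).
    """
    tol = lag_spacing / 2.0
    sq_sums = [0.0] * n_lags    # sum of squared differences per class
    d_sums = [0.0] * n_lags     # sum of pair distances per class
    counts = [0] * n_lags       # |N(h)| per class
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        d = math.dist(p, q)
        k = int((d + tol) // lag_spacing)
        if k < n_lags:
            sq_sums[k] += (zp - zq) ** 2
            d_sums[k] += d
            counts[k] += 1
    return [
        (d_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]

# Toy data (hypothetical): four points on a line.
toy_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
toy_values = [1.0, 2.0, 4.0, 7.0]
for dist, gam, n_pairs in empirical_semivariogram(toy_coords, toy_values, 1.0, 3):
    print(f"distance={dist:.2f}  gamma={gam:.3f}  pairs={n_pairs}")
```

Each pair is counted once, and the factor 2 in the denominator is what makes this a semivariogram rather than a variogram.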
******************************************************************
Example: Dioxin Contamination
In 1971, a truck transporting dioxin-contaminated residues dumped an unknown quantity of
waste in a rural area of Missouri to avoid being ticketed for being overweight.
In November 1983, the U.S. EPA collected soil samples in several areas and measured the
TCDD (tetrachlorodibenzo-p-dioxin) concentration in each sample. Following Zirschy and Harris
(1986), we will analyze the logarithm of the TCDD data.
For simplicity, transform the study domain by dividing the x-coordinate by 50 to produce a
region that is almost square.
******************************************************************
Tolerance intervals:
• maximum lag distance = maximum separation distance/2 = √(71.1² + 60²)/2 = 93.0/2 = 46.5.
• number of lags=11 (arbitrary)
• Many locations are about 4 ft. apart, so try a lag spacing slightly greater than 4, say 4.6.
• lag tolerance=lag spacing/2.
• defines distance intervals 4.6i ± 2.3, i = 0, 1, 2, . . . 10.
Lag   Distance   γ(h)   N(h)
 1      1.47     0.55     41
 2      4.37     0.93    155
 3      9.85     1.75    404
 4     14.02     2.22    533
 5     19.25     2.58    497
 6     22.93     3.27    534
 7     27.40     2.84    504
 8     31.87     3.19    786
 9     36.82     3.37    422
10     41.22     3.83    870
11     45.73     3.86    607
Experiment with different tolerance intervals. The goal is accurate estimation (> 30 pairs for
each estimated value), a clear structure, and at least a total of 10-20 lags for modeling and
inference.
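The tolerance-interval arithmetic above can be reproduced in a few lines of Python. This is a small sketch, assuming the two numbers under the square root (71.1 and 60) are the extents of the transformed study region; the variable names are illustrative only.

```python
import math

# Extents of the (transformed) study region, taken from the square-root expression.
x_extent, y_extent = 71.1, 60.0

max_sep = math.hypot(x_extent, y_extent)   # maximum separation distance
max_lag = max_sep / 2                      # rule of thumb: half the maximum separation

lag_spacing = 4.6                          # slightly more than the ~4 ft sample spacing
tol = lag_spacing / 2                      # lag tolerance
n_lags = 11

# Distance classes 4.6*i +/- 2.3 for i = 0, ..., 10 (lower bound cannot be negative).
intervals = [
    (max(0.0, i * lag_spacing - tol), i * lag_spacing + tol)
    for i in range(n_lags)
]

print(f"max separation = {max_sep:.1f}, max lag = {max_lag:.1f}")
for i, (lo, hi) in enumerate(intervals):
    print(f"class {i}: [{lo:.1f}, {hi:.1f})")
```

Changing `lag_spacing` or `n_lags` and re-running is a quick way to carry out the experimentation suggested above before recomputing the empirical semivariogram.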
******************************************************************
Modeling the Semivariogram
The empirical semivariogram, $\hat{\gamma}(\cdot)$, is not guaranteed to satisfy the properties of a valid
semivariogram (in particular, conditional negative-definiteness).
This can lead to inconsistent results when used for spatial prediction. To solve this problem,
parametric functions are used to model the structure of the empirical semivariogram.
There are several commonly-used parametric models. Models may be added together for complex
semivariogram structures.
******************************************************************
• SPHERICAL:
$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_s \{(3/2)(h/a_s) - (1/2)(h/a_s)^3\}, & 0 < h \le a_s \\ c_0 + c_s, & h > a_s, \end{cases}$$
$\theta = (c_0, c_s, a_s)'$, $c_0 \ge 0$, $c_s \ge 0$, $a_s \ge 0$. The spherical semivariogram is nearly linear near
the origin. The parameter $c_0$ measures the nugget effect, $c_s$ is the partial sill (so $c_0 + c_s$ is the
sill), and $a_s$ is the range.
• EXPONENTIAL:
$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_e \{1 - \exp(-h/a_e)\}, & h > 0, \end{cases}$$
$\theta = (c_0, c_e, a_e)'$, $c_0 \ge 0$, $c_e \ge 0$, $a_e \ge 0$. The exponential semivariogram rises more slowly from
the origin than the spherical. As with the spherical model, $c_0$ measures the nugget effect and $c_e$
is the partial sill (so $c_0 + c_e$ is the sill). However, this model approaches the sill asymptotically,
and the effective range is $3a_e$.
• GAUSSIAN:
$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_g \{1 - \exp[-(h/a_g)^2]\}, & h > 0, \end{cases}$$
$\theta = (c_0, c_g, a_g)'$, $c_0 \ge 0$, $c_g \ge 0$, $a_g \ge 0$. The Gaussian semivariogram model is parabolic
near the origin, indicative of a very smooth spatial process. As with the previous models, $c_0$
measures the nugget effect and $c_g$ is the partial sill (so $c_0 + c_g$ is the sill). The effective range is
$\sqrt{3}\,a_g$.
• POWER:
$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + b_\ell h^p, & h > 0, \end{cases}$$
$\theta = (c_0, b_\ell, p)'$, $c_0 \ge 0$, $b_\ell \ge 0$, $0 \le p < 2$. The power model is used to model processes with
large-scale trend by taking p ≥ 1. It also plays an important role in fractal processes and the
estimation of the fractal dimension.
• HOLE-EFFECT:
$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_w \{1 - a_w \sin(h/a_w)/h\}, & h > 0, \end{cases}$$
$\theta = (c_0, c_w, a_w)'$, $c_0 \ge 0$, $c_w \ge 0$, $a_w \ge 0$. The hole-effect model is useful for processes with
negative spatial autocorrelation arising from a cyclical or periodic variability. It reaches a global
maximum and then continues to oscillate around the sill with a period determined by $a_w$ (the sine
term repeats every $2\pi a_w$).
There are many different ways to parameterize these models and different computer software
programs use slightly different forms. Check to be sure what the parameters mean!
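As a concrete reference for one parameterization, the spherical, exponential, and Gaussian models above can be written as short Python functions. This is a sketch using the forms given on this slide; the function and argument names are illustrative, and other software may define the range parameter differently.

```python
import math

def spherical(h, c0, cs, a):
    """Spherical model: nugget c0, partial sill cs, range a."""
    if h == 0:
        return 0.0
    if h <= a:
        return c0 + cs * (1.5 * (h / a) - 0.5 * (h / a) ** 3)
    return c0 + cs

def exponential(h, c0, ce, a):
    """Exponential model: reaches about 95% of the partial sill at h = 3a."""
    if h == 0:
        return 0.0
    return c0 + ce * (1.0 - math.exp(-h / a))

def gaussian(h, c0, cg, a):
    """Gaussian model: reaches about 95% of the partial sill at h = sqrt(3)*a."""
    if h == 0:
        return 0.0
    return c0 + cg * (1.0 - math.exp(-((h / a) ** 2)))

# Compare the three models with the same nugget, partial sill, and range parameter.
for h in (0.0, 2.0, 5.0, 10.0, 30.0):
    print(h, spherical(h, 0.5, 3.0, 10.0),
          exponential(h, 0.5, 3.0, 10.0),
          gaussian(h, 0.5, 3.0, 10.0))
```

Evaluating the functions at the same range parameter makes the warning above concrete: the spherical model has reached its sill at h = a, while the exponential and Gaussian models have not.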
******************************************************************
Fitting Semivariogram Models
Use the empirical semivariogram to estimate the parameters of the semivariogram model (θ).
• ML and REML (Gaussian data), Composite Likelihood (non-Gaussian data)
• Nonlinear least squares regression, weighted and generalized
• By eye
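To illustrate the least-squares option, the sketch below fits an exponential model to the dioxin lag table by weighted least squares. Several choices here are assumptions made for the example, not the method used in the original analysis: the weights are simply the pair counts N(h), the range is chosen by a grid search over hypothetical candidate values, and for each candidate range the nugget and partial sill are obtained from the weighted normal equations.

```python
import math

# (mean lag distance, empirical semivariogram, number of pairs) from the lag table
lags = [
    (1.47, 0.55, 41), (4.37, 0.93, 155), (9.85, 1.75, 404), (14.02, 2.22, 533),
    (19.25, 2.58, 497), (22.93, 3.27, 534), (27.40, 2.84, 504), (31.87, 3.19, 786),
    (36.82, 3.37, 422), (41.22, 3.83, 870), (45.73, 3.86, 607),
]

def exp_model(h, c0, ce, a):
    return c0 + ce * (1.0 - math.exp(-h / a))

def wss(c0, ce, a):
    """Weighted sum of squares with weights equal to the pair counts."""
    return sum(n * (g - exp_model(h, c0, ce, a)) ** 2 for h, g, n in lags)

def fit_given_range(a):
    """For a fixed range the model is linear in (c0, ce)."""
    f = [1.0 - math.exp(-h / a) for h, _, _ in lags]
    g = [gam for _, gam, _ in lags]
    w = [n for _, _, n in lags]
    sw = sum(w)
    swf = sum(wi * fi for wi, fi in zip(w, f))
    swff = sum(wi * fi * fi for wi, fi in zip(w, f))
    swg = sum(wi * gi for wi, gi in zip(w, g))
    swfg = sum(wi * fi * gi for wi, fi, gi in zip(w, f, g))
    det = sw * swff - swf ** 2
    c0 = (swg * swff - swf * swfg) / det
    ce = (sw * swfg - swf * swg) / det
    if c0 < 0:                       # enforce the constraint c0 >= 0
        c0, ce = 0.0, swfg / swff
    return c0, ce

candidates = [0.5 * k for k in range(2, 201)]    # trial ranges 1.0, 1.5, ..., 100.0
fits = [(a,) + fit_given_range(a) for a in candidates]
a_hat, c0_hat, ce_hat = min(fits, key=lambda t: wss(t[1], t[2], t[0]))

print(f"nugget={c0_hat:.3f}  partial sill={ce_hat:.3f}  range parameter={a_hat:.1f}")
```

Plotting the fitted curve over the empirical values remains the most useful check; a fit that minimizes the criterion can still look poor near the origin, where the model matters most for prediction.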
******************************************************************
Kriging
Kriging is a method of spatial prediction or interpolation that has optimal statistical properties.
There are many types of kriging:
• Simple kriging: Process mean known
• Ordinary kriging: Process mean unknown but constant
• Universal kriging: Process mean is a parametric function of covariates
• Indicator kriging: For binary or nonlinear prediction
• Block kriging: Areal prediction. Geostatistical solution to the modifiable areal unit problem (MAUP)
• Co-kriging: Spatial prediction in a multivariate framework
There are more.
We will discuss ordinary kriging here. Development of other kriging predictors is similar.
******************************************************************
Ordinary Kriging
Data Z(s1), . . . , Z(sn) observed at locations s1, . . . , sn.
Goal: Predict Z(s0) at location s0 where no data value is observed.
Consider predictors that are a weighted average of the data values, i.e.,
$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i).$$
Assume E(Z(s)) = µ, with µ unknown and independent of location s, and choose the weights
$\{\lambda_i\}$ so that $\hat{Z}(s_0)$ is unbiased for $Z(s_0)$ and has the smallest prediction error variance out of all
linear unbiased predictors.
Such a predictor is called the best linear unbiased predictor, or BLUP.
******************************************************************
The method of Lagrange multipliers from calculus is used to minimize the prediction error
variance subject to the unbiasedness constraint.
This gives the ordinary kriging equations
$$\sum_{j=1}^{n} \lambda_j \gamma(s_i - s_j) - m = \gamma(s_i - s_0), \quad i = 1, \ldots, n,$$
$$\sum_{i=1}^{n} \lambda_i = 1.$$
This is a system of (n+1) equations (n λ’s and one Lagrange multiplier m) which must be solved
simultaneously.
Once these equations are solved for the λ’s, the ordinary kriging predictor is
$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i).$$
******************************************************************
These equations are often written in matrix form:
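$$\begin{bmatrix} 0 & \gamma(s_1 - s_2) & \cdots & \gamma(s_1 - s_n) & 1 \\ \gamma(s_2 - s_1) & 0 & \cdots & \gamma(s_2 - s_n) & 1 \\ \vdots & & \ddots & & \vdots \\ \gamma(s_n - s_1) & \cdots & \gamma(s_n - s_{n-1}) & 0 & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ -m \end{bmatrix} = \begin{bmatrix} \gamma(s_1 - s_0) \\ \gamma(s_2 - s_0) \\ \vdots \\ \gamma(s_n - s_0) \\ 1 \end{bmatrix}$$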
which can be written succinctly as
Γλ = γ.
This development assumes the semivariogram γ(·) is known. In practice, the semivariogram
is estimated and modeled from the data (as described earlier), and then the fitted parametric
model is used to specify the entries in Γ and γ.
******************************************************************
The minimized prediction error variance, also called the kriging variance, is given by
$$\sigma_k^2(s_0) = 2 \sum_{i=1}^{n} \lambda_i \gamma(s_i - s_0) - \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \gamma(s_i - s_j).$$
The square root of this quantity, called the kriging standard error, gives a measure of the
uncertainty in our prediction of Z(s0). Prediction intervals (similar to confidence intervals for a
fixed parameter) can be constructed as
$$\hat{Z}(s_0) \pm z_{\alpha/2}\, \sigma_k(s_0),$$
where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
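The ordinary kriging system, predictor, and kriging variance can be put together in a short Python sketch. It uses only the standard library, so the linear system is solved with a small Gaussian elimination routine. The exponential semivariogram with default parameters, the function names, and the four-point data set are all assumptions chosen for illustration.

```python
import math

def gamma_exp(h, c0=0.0, ce=1.0, a=1.0):
    """Exponential semivariogram (illustrative default parameters)."""
    return 0.0 if h == 0 else c0 + ce * (1.0 - math.exp(-h / a))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(coords, z, s0, gamma):
    """Return (prediction, kriging variance, weights) at location s0."""
    n = len(z)
    # Bordered semivariogram matrix: gamma values plus a column and row of ones.
    G = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    G.append([1.0] * n + [0.0])
    rhs = [gamma(math.dist(c, s0)) for c in coords] + [1.0]
    sol = solve(G, rhs)
    lam = sol[:n]                    # the last unknown, sol[n], is -m
    pred = sum(l * zi for l, zi in zip(lam, z))
    cross = sum(l * g for l, g in zip(lam, rhs[:n]))
    quad = sum(lam[i] * lam[j] * G[i][j] for i in range(n) for j in range(n))
    return pred, 2.0 * cross - quad, lam

# Hypothetical data at the corners of a unit square; predict at the center.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
zs = [1.0, 2.0, 3.0, 4.0]
pred, var, lam = ordinary_kriging(pts, zs, (0.5, 0.5), gamma_exp)
half_width = 1.96 * math.sqrt(var)   # approximate 95% prediction interval
print(pred, var, (pred - half_width, pred + half_width))
```

With no nugget effect, predicting at an observed location returns the observed value with zero kriging variance, which is a convenient sanity check on any implementation. For large data sets the same function could be applied to the subset of points in a search neighborhood.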
******************************************************************
Search Neighborhoods
Inversion of matrices in kriging can be cumbersome, particularly for large data sets.
Construct a neighborhood (usually a circle or ellipse) around each prediction location, s0. Only
data observed at locations within this neighborhood are used to predict the data value at s0.
For more localized prediction, the neighborhood is further restricted by retaining only a specified
number of points for prediction.
Care must be taken, particularly on the edges of the domain, to ensure enough data for stable
and accurate predictions.
******************************************************************
Geostatistical Software
• S-PLUS Spatial Statistics
• SAS (Vario, Krige2d)
• GSLIB (Fortran)
• ArcView/ArcInfo (Spatial and Geostatistical Analyst)
• GS+ (Windows)
http://www.ai.geostats.org
******************************************************************
Hierarchical Modeling
A hierarchical model is one that is specified in stages.
Hierarchical models are developed as a sequence of conditional distributions.
Advantages:
1. Can build complicated models by layering simple pieces
2. Can integrate data from different sources
Disadvantages:
1. Computational complexity
2. Model checking and diagnostics
******************************************************************
Simple Example
Research study: Investigate the effect of paper color (blue, green, red ) on response rates for
questionnaires distributed by the “windshield method” in supermarket parking lots. (Example
14.11 from Neter, Wasserman, and Kutner, 1990).
Data yij are response rates for the jth supermarket using the ith paper, i = 1, 2, 3; j = 1, 2, . . . , 5.
Let αi be the effect of paper color on response rate.
Color    Supermarket
          1    2    3    4    5
Green    28   26   31   27   35
Blue     34   29   25   31   29
Red      31   25   27   29   28
Hierarchical model:
$$y_{ij} \mid \alpha_i, \sigma^2 \sim N(\alpha_i, \sigma^2)$$
$$\alpha_i \mid \mu, \tau^2 \sim N(\mu, \tau^2)$$
Need estimates of $\alpha_i$, $\sigma^2$, $\mu$, and $\tau^2$.
******************************************************************
Depending on the complexity, inference can be done in one of two ways:
1. Likelihood-based methods. Maximize joint likelihood (or approximate joint likelihood) of
the data with respect to unknown parameters.
• Gaussian data: ML and REML
• Binary or Count data: Penalized Quasi-Likelihood; Pseudo-Likelihood
2. Bayesian methods. Use Bayes’ rule to construct the posterior distribution of the unknown
parameters given the data. Let f(y|θ) be the distribution of the data, given the unknown
parameter θ. Let θ also be a random variable with (prior) distribution π(θ). Then the
posterior distribution of θ given the data is
$$h(\theta \mid y) = \frac{f(y \mid \theta)\,\pi(\theta)}{\int f(y \mid \theta)\,\pi(\theta)\,d\theta}.$$
Typically the mean of this distribution is taken to be the Bayes estimate of θ.
******************************************************************
In order to obtain closed-form expressions for the posterior mean, conjugate priors are used.
Conjugate family: If the prior belongs to a family of distributions, so does the posterior.
Suppose
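$$f(y \mid \theta) \sim N(\theta, \sigma^2) \quad \text{(data)}$$
$$\theta \sim N(\mu, \tau^2) \quad \text{(prior)}$$
Assume both $\tau^2$ and $\sigma^2$ are known, and let $\bar{y}$ denote the mean of the n observations.
Then,
$$h(\theta \mid y) \sim N\left( \frac{\bar{y}\, n\tau^2 + \mu\sigma^2}{n\tau^2 + \sigma^2},\; \frac{\sigma^2 \tau^2}{n\tau^2 + \sigma^2} \right) \quad \text{(posterior)}$$
and
$$\hat{\theta} = \frac{\bar{y}\, n\tau^2 + \mu\sigma^2}{n\tau^2 + \sigma^2}.$$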
More on Bayesian estimation and modeling later....
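The conjugate normal update can be checked numerically. In the sketch below, the data are the five green-paper response rates from the simple example, while the values of σ², µ, and τ² are hypothetical numbers chosen only to make the arithmetic concrete.

```python
def normal_posterior(ybar, n, sigma2, mu, tau2):
    """Posterior mean and variance of theta for normal data with a normal prior.

    ybar   : mean of the n observations
    sigma2 : known data variance
    mu     : prior mean
    tau2   : known prior variance
    """
    denom = n * tau2 + sigma2
    post_mean = (ybar * n * tau2 + mu * sigma2) / denom
    post_var = sigma2 * tau2 / denom
    return post_mean, post_var

green = [28, 26, 31, 27, 35]            # response rates for green paper
ybar = sum(green) / len(green)

# sigma2, mu, and tau2 below are assumed values, not estimates from the data.
post_mean, post_var = normal_posterior(ybar, len(green), sigma2=9.0, mu=29.0, tau2=4.0)
print(ybar, post_mean, post_var)
```

The posterior mean is a weighted average of the sample mean and the prior mean, so it always lies between them; the weight on the data grows with n and with τ².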
******************************************************************
Spatial Hierarchical Modeling
For use in a spatial setting, hierarchical models must include some sort of neighborhood depen-
dence or autocorrelation.
This can be done in many different ways. Specification of spatial structure in these models is
an area of recent and ongoing statistical research.
The key to these models is the conditional specification with the assumption of conditional
independence.
******************************************************************
Lip Cancer Example
Observed (C) and expected (E) numbers of lip cancer cases in males in the 56 districts of
Scotland
Outcome variable (Y): standardized morbidity ratio (SMR) = 100 × observed/expected
Hypothesis: Occupational exposure to sunlight might contribute to the incidence of lip cancer
in males.
Covariate (X): Percentage of the work force engaged in Agriculture, Fishing or Forestry (%AFF)
(divided by 10).
Clayton and Kaldor (1987, Biometrics), Breslow and Clayton (1993, JASA)
******************************************************************
Two-Level Hierarchical Model
Let θi represent the district-specific log-relative risks, i = 1, . . . , 56.
Assume that, conditional on θi, the data are mutually independent Poisson random variables
with mean
$$\mu_i \equiv E(y_i \mid \theta_i) = n_i \exp(\beta_0 + \beta_1 x_i + \theta_i).$$
Assume {θi} arise from a stationary Gaussian random process with mean 0 and covariance
function σ2ρ(||i − j||).
Thus, the joint distribution of θ = (θ1, θ2, . . . , θ56)′ is MV N(0, Σ), where Σ(i, j) = σ2ρ(||i− j||).
******************************************************************
In hierarchical modeling notation:
$$y_i \mid \theta_i \sim \text{Poisson}(n_i \exp(\beta_0 + \beta_1 x_i + \theta_i))$$
$$\theta \sim MVN(0, \Sigma), \quad \text{where } \Sigma(i, j) = \sigma^2 \rho(\|i - j\|).$$
Note that the spatial structure (and any autocorrelation) is induced in the data by the assump-
tions on θi.
We would like estimates of β0 and β1 and their standard errors, estimates of the spatial au-
tocorrelation parameters that parameterize ρ(||i − j||), and a p-value for testing the covariate
effect.
******************************************************************
Notes
1. This model is also called a generalized linear mixed model.
2. If, given θ, the data were assumed Gaussian, this would be a linear mixed model. This is
one spatial extension of the random factors ANOVA.
3. Inference can be done using likelihood-based techniques or Bayesian estimation.
4. With likelihood-based techniques, a Taylor series expansion is used to transform the problem
to a linear one, and then well-known methods for inference in linear models are used. (A common
approach is Pseudo-Likelihood (PL).) However, this can be done only with Gaussian priors.
5. Obtaining p-values per se with Bayesian approaches is somewhat problematic. Theoret-
ically, must fit 2 models: the null and one alternative of choice. It is difficult to get
something that can be compared to the hallowed 0.05.
******************************************************************
Estimation with Pseudo-Likelihood
Due to Wolfinger and O’Connell (1993). Implemented in SAS by the GLIMMIX macro.
1. Obtain an estimate of $\mu_i \equiv E(y_i \mid \theta_i) = \exp(\log(n_i) + \beta_0 + \beta_1 x_i + \theta_i)$.
2. Compute “pseudo” data obtained by linearization,
$$\nu = g(\mu) + \Delta_\mu (y - \mu),$$
where g is the link function (here $g(\mu) = \log(\mu) = \log(n) + \beta_0 + \beta_1 x + \theta$, with log(n)
acting as an offset), and $\Delta_\mu$ is an n × n diagonal matrix with elements $[\partial g(\mu)/\partial \mu]$ evaluated
at the current estimate of µ. In this example, the (i, i)th element of $\Delta_\mu$ is $1/\mu_i$.
3. Using ML or REML, fit a weighted linear mixed model using the pseudo data. In this
case the weights (in W) are equal to µ. More generally, the weights are taken to be the
variance function of the generalized linear model. The likelihood has a familiar form:
$$l(\nu; \beta) = -\tfrac{1}{2} \log|V| - \tfrac{1}{2} (\nu - X\beta)' V^{-1} (\nu - X\beta) - (n/2) \log(2\pi),$$
$$V = W^{-1} + \Sigma.$$
Σ contains the spatial autocorrelation parameters.
4. Obtain a new estimate of µ and iterate until convergence.
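Step 2 is the only part of the loop that is specific to the Poisson model with a log link, and it is short enough to sketch in Python. The function names and the numbers are hypothetical, and the mixed-model fit in step 3 is not shown; this is not the GLIMMIX macro itself.

```python
import math

def pseudo_data(y, mu):
    """Linearized pseudo data for a log link.

    nu_i = g(mu_i) + (y_i - mu_i) * g'(mu_i), with g = log and g'(mu) = 1/mu.
    """
    return [math.log(m) + (yi - m) / m for yi, m in zip(y, mu)]

def pl_weights(mu):
    """Weights for the Poisson case: the variance function evaluated at mu."""
    return list(mu)

# Hypothetical observed counts and current estimates of their means.
y = [3, 10, 0]
mu = [3.0, 5.0, 1.5]

nu = pseudo_data(y, mu)
w = pl_weights(mu)
print(nu)
print(w)
```

When an observed count equals its current mean the pseudo value is simply log(µ), so the pseudo data move away from the linear predictor only where the data disagree with the current fit.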
******************************************************************
To implement this in practice, a parametric form is usually assumed for the spatial autocor-
relation function, ρ(||i − j||). Any of the familiar models discussed previously (e.g., spherical,
exponential) can be chosen. The empirical semivariogram gives a crude indication of the func-
tional form of this model.
For the lip cancer data, a spatial power covariance model (equivalent to the exponential semi-
variogram model) was assumed. This model is
$$\Sigma(i, j) = \mathrm{cov}(\theta_i, \theta_j) = \sigma^2 \rho^{\|i - j\|}.$$
The parameter ρ measures the strength of spatial autocorrelation.
This function specifies Σ parametrically in the likelihood function.
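A short Python sketch shows how the spatial power model fills in Σ and why it is equivalent to the exponential model: ρ raised to the distance d equals exp(−d/a) with a = −1/log(ρ). The district centroids below are hypothetical coordinates, and the function names are illustrative.

```python
import math

def power_cov(coords, sigma2, rho):
    """Covariance matrix with entries sigma2 * rho ** distance(i, j)."""
    n = len(coords)
    return [
        [sigma2 * rho ** math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]

def equivalent_exp_range(rho):
    """Range parameter a such that rho ** d == exp(-d / a) for all d."""
    return -1.0 / math.log(rho)

# Hypothetical centroids for three districts (arbitrary distance units).
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
Sigma = power_cov(centroids, sigma2=2.0, rho=0.44)
a = equivalent_exp_range(0.44)

for row in Sigma:
    print([round(v, 4) for v in row])
print("equivalent exponential range parameter:", round(a, 3))
```

Because the covariance depends on the units of distance, the same fitted ρ implies a different strength of dependence if the coordinates are rescaled.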
Results: $\hat{\beta}_0 = 0.47$, s.e.$(\hat{\beta}_0) = 0.36$;
$\hat{\beta}_1 = 0.29$, s.e.$(\hat{\beta}_1) = 0.13$;
$\hat{\rho} = 0.44$;
p-value = 0.03.
Can do many different types of spatial regression.
Can also do prediction and smoothing.
Implemented with the SAS GLIMMIX macro available at http://www.sas.com/techsup/download/stat/
******************************************************************
A More Complex Model
Separate heterogeneity and spatial similarity.
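$$y_i \mid \theta_i, \phi_i \sim \text{Poisson}(n_i \exp(\beta_0 + \beta_1 x_i + \theta_i + \phi_i))$$
$$\theta_i \overset{ind}{\sim} N(0, 1/\eta) \quad \text{(heterogeneity)}$$
$$\phi \sim MVN(0, C^{-1}) \quad \text{(spatial autocorrelation)}$$
where $C(i, i) = \lambda \sum_{j \ne i} c_{ij}$ and $C(i, j) = -\lambda c_{ij}$ for $i \ne j$ (Gaussian intrinsic autoregression).
Adjacency: $c_{ij} = 1$ if region j is adjacent to region i, and $c_{ij} = 0$ otherwise.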
η and λ are called hyperparameters and are assigned distributions called hyperpriors.
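The matrix C is easy to build from an adjacency matrix, as the following Python sketch shows. The four-region map (regions arranged in a row) and the value of λ are hypothetical.

```python
def car_precision(adjacency, lam):
    """Build C with C(i,i) = lam * sum_j c_ij and C(i,j) = -lam * c_ij."""
    n = len(adjacency)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                C[i][j] = -lam * adjacency[i][j]
                C[i][i] += lam * adjacency[i][j]
    return C

# Hypothetical map: four regions in a row, each adjacent to its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
C = car_precision(adjacency, lam=2.0)
for row in C:
    print(row)
print("row sums:", [sum(row) for row in C])
```

Every row of C sums to zero, so C is singular and the inverse in the model statement has to be read in a generalized sense; this is what makes the autoregression intrinsic, and it is usually handled by a constraint on φ or by working with the conditional distributions directly.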
******************************************************************
Bayesian Model Fitting
Approximations to posterior distribution
Numerical integration
Markov Chain Monte Carlo
• Tool to sample from posterior distribution
• Markov property: distribution of θn|θ1, θ2, . . . θn−1 depends only on most recent value,
θn−1.
• Implemented with Gibbs sampling and Metropolis-Hastings algorithms
******************************************************************
Gibbs Sampling
Gibbs sampling is a numerical algorithm that allows one to evaluate a complex marginal or joint
distribution using conditional distributions.
Allows evaluation of distributions that cannot be explicitly formed. Requires finite densities.
Example: Suppose we want to make inferences about h(θ1, θ2). Assume that the conditional
distributions g(θ1|θ2) and f(θ2|θ1) can be sampled from easily.
Given a starting value, draw a sample from each conditional density and repeat the loop
$$\theta_1^{(i+1)} \sim g(\theta_1 \mid \theta_2^{(i)})$$
$$\theta_2^{(i+1)} \sim f(\theta_2 \mid \theta_1^{(i+1)})$$
Eventually we will be sampling from the joint distribution h(θ1, θ2).
Burn-in: How long until the chain reaches the target distribution, h.
******************************************************************
Gibbs Sampling Continued
The result is
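$$\theta_1^{(1)}, \theta_1^{(2)}, \ldots, \theta_1^{(N)} \sim g(\theta_1) \quad \text{(marginal)}$$
$$\theta_2^{(1)}, \theta_2^{(2)}, \ldots, \theta_2^{(N)} \sim g(\theta_2) \quad \text{(marginal)}$$
Could use these distributions to estimate $\theta_1$ and $\theta_2$. However, it is better to use
$$\frac{1}{N} \sum_{i=1}^{N} g(\theta_1 \mid \theta_2^{(i)}) \quad \text{and} \quad \frac{1}{N} \sum_{i=1}^{N} f(\theta_2 \mid \theta_1^{(i)}).$$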
Mixing: How close are these to independent draws? Values are likely to be autocorrelated.
See references for rationale and theory behind Gibbs sampling as well as ideas pertaining to
burn-in, chain length, number of chains, and convergence issues and diagnostics.
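A small Python sketch makes the loop concrete. The target here is an assumption chosen because its conditionals are known in closed form: a standard bivariate normal with correlation r, for which θ1 given θ2 is N(rθ2, 1 − r²) and likewise for θ2 given θ1. The starting values, chain length, and burn-in are arbitrary.

```python
import math
import random

def gibbs_bivariate_normal(r, n_draws, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - r * r)
    t1, t2 = 5.0, -5.0                    # deliberately poor starting values
    draws = []
    for it in range(burn_in + n_draws):
        t1 = rng.gauss(r * t2, sd)        # theta1 drawn given the current theta2
        t2 = rng.gauss(r * t1, sd)        # theta2 drawn given the new theta1
        if it >= burn_in:
            draws.append((t1, t2))
    return draws

r = 0.6
draws = gibbs_bivariate_normal(r, n_draws=20000, burn_in=1000)
n = len(draws)

# Direct estimate of E(theta1): average the sampled theta1 values.
mean1 = sum(t1 for t1, _ in draws) / n

# Estimate based on the conditionals: average the conditional means E(theta1 | theta2).
rb_mean1 = sum(r * t2 for _, t2 in draws) / n

print(mean1, rb_mean1)
```

Both estimates target the same quantity (zero in this example). Averaging the conditional means is the mean of the averaged conditional densities shown above, and it typically varies less from run to run than the direct average. Successive draws are autocorrelated, more strongly as r approaches 1, which is the mixing issue noted above.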
******************************************************************
Software
• SAS. 2-level models only. Uses Likelihood-based methods. PROC MIXED (Gaussian lin-
ear mixed models), GLIMMIX (generalized linear mixed models), and PROC NLMIXED
(nonlinear mixed models).
• MLwiN. Uses likelihood-based methods. Best suited to clustered data applications.
• S-PLUS. 2-level Gaussian mixed models
• BUGS and WinBUGS (Bayesian inference Using Gibbs Sampling). Can be downloaded free from
www.mrc-bsu.cam.ac.uk/bugs
******************************************************************
References
Geostatistics
Armstrong, M. 1999. Basic Linear Geostatistics. Springer-Verlag: New York.
Chiles, J. P. and Delfiner, P. 1999. Geostatistics: Modeling Spatial Uncertainty. John Wiley:
New York.
Cressie, N. 1985. Fitting variogram models by weighted least squares. Journal of the Interna-
tional Association for Mathematical Geology, 17: 563-586.
Deutsch, C.V. and Journel, A.G. 1992. GSLIB: Geostatistical Software Library and User’s
Guide. Oxford University Press: New York.
Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press:
New York.
Gotway, C.A. 1991. Fitting semivariogram models by weighted least squares. Computers and
Geosciences, 17: 171-172.
Isaaks, E. H. and Srivastava, R. M. 1989. An Introduction to Applied Geostatistics. Oxford
University Press: New York.
Journel, A.G. 1989. Fundamentals of Geostatistics in Five Lessons. American Geophysical
Union: Washington, D.C.
Journel, A. G. and Huijbregts, C. J. 1978. Mining Geostatistics. Academic Press: London.
Zirschy, J.H. and Harris, D. J. 1986. Geostatistical analysis of hazardous waste site data. Journal
of Environmental Engineering, ASCE, 112: 770-784.
Spatial Statistics
Bailey, T.C. and Gatrell, A.C. 1995. Interactive Spatial Data Analysis. Addison Wesley
Longman: Essex.
Cressie, N. 1993. Statistics for Spatial Data. John Wiley: New York.
Haining, R. 1990. Spatial Data Analysis in the Social and Environmental Sciences. Cambridge
University Press: New York.
Upton, G.J.G. and Fingleton, B. 1985. Spatial Data Analysis by Example. Volume I. Point
Pattern and Quantitative Data. John Wiley: Chichester.
Upton, G.J.G. and Fingleton, B. 1985. Spatial Data Analysis by Example. Volume II. Categorical
and Directional Data. John Wiley: Chichester.
Webster, R. and Oliver, M. 1990. Statistical Methods in Soil and Land Resource Survey. Oxford
University Press: Oxford.
Bayesian Analysis and Computation
Besag, J., Green, P., Higdon, D., and Mengersen, K. 1995. Bayesian computation and stochastic
systems (with discussion). Statistical Science, 10: 3-66.
Carlin, B.P. and Louis, T.A. 1996. Bayes and Empirical Bayes Methods for Data Analysis.
Chapman & Hall: New York.
Clayton, D. G. and Kaldor, J. 1987. Empirical Bayes estimates of age-standardized relative
risks for use in disease mapping. Biometrics, 43: 671-682.
Chib, S. and Greenberg, E. 1995. Understanding the Metropolis-Hastings algorithm. The
American Statistician, 49: 327-335.
Cowles, M.K. and Carlin, B.P. 1996. Markov Chain Monte Carlo convergence diagnostics: A
comparative review. Journal of the American Statistical Association, 91: 883-904.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. 1995. Bayesian Data Analysis. Chapman
& Hall: London.
Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. 1996. Markov Chain Monte Carlo in
Practice. Chapman & Hall: London.
Smith, A.F.M. and Gelfand, A.E. 1992. Bayesian statistics without tears: A sampling-resampling
perspective. The American Statistician, 46: 84-88.
Spatial Hierarchical Modeling
Besag, J., York, J.C., and Mollié, A. 1991. Bayesian image restoration, with two applications
in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics, 43:
1-59.
Clayton, D. G. and Bernardinelli, L. 1992. Bayesian methods for disease mapping. Pages 205-
220 in Geographical and Environmental Epidemiology, Elliott, P., Cuzick, J., English, D.,
and Stern, R. (eds.). Oxford Medical Publications: Oxford.
Breslow, N. E. and Clayton, D. G. 1993. Approximate inference in generalized linear mixed
models. Journal of the American Statistical Association, 88: 9-25.
Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. 1996. The SAS
System for Linear Models. SAS Institute: Cary, NC.
Mugglin, A. S., Carlin, B. P., Zhu, L., and Conlon, E. 1999. Bayesian areal interpolation,
estimation, and smoothing: an inferential approach for geographic information systems.
Environment and Planning A, 31: 1337-1352.
Royle, J.A. and Berliner, L.M. 1999. A hierarchical approach to multivariate spatial modeling
and prediction. Journal of Agricultural, Biological, and Environmental Statistics, 4: 1-28.
Waller, L.A., Carlin, B.P., Xia, H. and Gelfand, A.E. 1997. Hierarchical spatio-temporal map-
ping of disease rates. Journal of the American Statistical Association, 92: 607-617.
Wakefield, J.C., Best, N.G., and Waller, L. 2000. Bayesian approaches to disease mapping.
Pages 104-127 in Spatial Epidemiology Methods and Applications. Elliott, P., Wakefield, J.C.,
Best, N.G., and Briggs, D.J. (eds.). Oxford University Press: Oxford.
Wikle, C. K., Berliner, L. M., and Cressie, N. 1998. Hierarchical Bayesian space-time models.
Environmental and Ecological Statistics, 5: 117-154.
Wolfinger, R. D. and O’Connell, M. 1993. Generalized linear mixed models: a pseudo-likelihood
approach. Journal of Statistical Computation and Simulation, 48: 233-243.