geostatistics and spatial hierarchical modeling · 2004-12-13 · geostatistics geostatistics a eld...

Geostatistics and Spatial Hierarchical Modeling

Presented for the Workshop on Spatial Analysis in Social Research

Sponsored by:

Inter-university Consortium for Political and Social Research, University of Michigan

Center for Spatially Integrated Social Science, University of California, Santa Barbara

May 17-20, 2001

Carol A. Gotway Crawford

National Center for Environmental Health

Centers for Disease Control and Prevention

Mail Stop E70

1600 Clifton Road NE

Atlanta, GA 30333

Tel: (404) 639-2504; Fax: (404) 639-1677

******************************************************************

Outline

I. Geostatistics

A. Introduction. What is Geostatistics? When could geostatistical techniques be useful? Stress

continuous, fixed, spatial index (contrast lattice and point processes).

B. The Semivariogram. Definition. Relationship to “autocorrelation.” Estimation. Models and

modeling. Example.

C. Kriging. Overview and rationale. Development of ordinary kriging predictor. Search neighborhoods. Other types of kriging (universal, block kriging, indicator, co-kriging definitions).

Example.

D. Software

II. Spatial Hierarchical Modeling

A. What is a hierarchical model? Advantages and disadvantages.

B. Estimation in single-stage model. Likelihood-based inference and Bayesian Estimation.

C. Example. Model specification. Notes on estimation.

D. Overview of more complex models. Bayesian modeling. Gibbs sampling.

E. Software

******************************************************************


Geostatistics

Geostatistics is a field of statistics concerned with the study of spatial data that have a continuous spatial index (i.e., data can be observed at any point within a domain of interest, at least conceptually).

Data locations are assumed to be fixed and known.

These attributes distinguish geostatistics from point patterns and lattice data, although geostatistical concepts have been useful in these areas.

The data are assumed to be a partial realization of a random process (a collection of random variates)

{Z(s) : s ∈ D}

where D is a fixed subset of $\mathbb{R}^d$ (d is usually 2), and the spatial index, s, varies continuously throughout D.

Data are denoted Z(s1), Z(s2), . . . , Z(sn).

******************************************************************

Replication needed for inference is provided by an assumption of intrinsic stationarity.

$$E[Z(s)] = \mu \quad \text{for all } s \in D,$$

$$\mathrm{Var}(Z(s_i) - Z(s_j)) = 2\gamma(s_i - s_j), \quad s_i, s_j \in D.$$

The function 2γ(·) is called the variogram. The function γ(·) is called the semivariogram.

If γ(si − sj) is a function of just the distance between si and sj and not direction, then the

process is called isotropic. If γ(si − sj) depends on both distance and direction, then the spatial

process is called anisotropic.

******************************************************************


The Semivariogram

The semivariogram is a function of the spatial process and as such satisfies certain properties.

Let h = s − u be the spatial lag, or vector between spatial locations s and u.

i) γ(−h) = γ(h)

ii) γ(0) = 0

iii) $\gamma(h)/\|h\|^2 \to 0$ as $\|h\| \to \infty$, i.e., γ(h) cannot increase too fast with ‖h‖.

iv) γ(·) must be conditionally negative-definite, i.e.,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \gamma(s_i - s_j) \le 0$$

for any finite number of locations $\{s_i : i = 1, \ldots, n\}$ and real numbers $\{a_1, \ldots, a_n\}$ satisfying $\sum_{i=1}^{n} a_i = 0$.

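v) If the spatial process is isotropic, then γ depends on the lag vector h only through its length ‖h‖ (Euclidean distance), so it can be written as a function of the scalar separation distance alone.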

******************************************************************

A graph of the semivariogram plotted against separation distance conveys information about

the continuity and spatial variability of the process.

This graph starts at zero and then, if observations close together are more alike than those

farther apart, increases as the separation distance increases.

It may level off to nearly a constant value (called the sill) at a large separation distance (called

the range). Beyond this distance, observations are spatially uncorrelated.

The shape of the semivariogram near the origin indicates the degree of smoothness or spatial

continuity of the spatial variable under study. A parabolic shape near the origin arises with a

very smooth spatial variable that is both continuous and differentiable. A linear shape near the

origin reflects a variable that is continuous, but not differentiable, and hence less regular.

A discontinuity, or vertical jump, at the origin indicates that the spatial variable is not even continuous and has highly irregular spatial variability. This discontinuity is called the nugget effect.

******************************************************************


Idealized Semivariogram

******************************************************************


Estimating the Semivariogram

$$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{N(h)} \left(Z(s_i) - Z(s_j)\right)^2, \quad h \in \mathbb{R}^2,$$

where N(h) is the set of pairs of locations separated by h, i.e., $N(h) = \{(s_i, s_j) : s_i - s_j = h\}$, and |N(h)| is the number of distinct pairs in N(h).

Irregularly spaced data: need to define distance classes, called tolerance intervals, and group

the sample pairs into these classes prior to averaging. This is analogous to the procedure used

in making a histogram.

Estimates of the semivariogram at large lags are based only on points at the opposite ends of

the domain. Thus, in practice, we usually take the maximum lag distance to be about half the

maximum separation distance.

The empirical semivariogram is a picture of your data spatially: the sill and the range, if they

exist, provide estimates of the process variance and the zone of influence of the observations, and

information at larger lags can indicate large-scale trends that may be important to interpret.
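The estimator and the grouping into tolerance intervals can be sketched in a few lines of Python. This is only an illustration under the assumption of isotropy (pairs are grouped by distance, not direction); the function name `empirical_semivariogram` and its arguments are hypothetical and do not come from any of the packages mentioned in these notes.

```python
import math

def empirical_semivariogram(coords, values, lag_spacing, n_lags):
    """Isotropic empirical semivariogram with lag tolerance = lag_spacing / 2.

    Bin k collects the pairs whose separation distance lies in
    [k*lag_spacing - lag_spacing/2, k*lag_spacing + lag_spacing/2).
    Returns one (mean distance, gamma_hat, number of pairs) tuple per non-empty bin.
    """
    sums = [0.0] * n_lags      # running sum of squared differences
    dists = [0.0] * n_lags     # running sum of pair distances
    counts = [0] * n_lags      # |N(h)| for each bin
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):               # visit each distinct pair once
            d = math.dist(coords[i], coords[j])
            k = int(d / lag_spacing + 0.5)      # index of the nearest lag centre
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                dists[k] += d
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append((dists[k] / counts[k],
                           sums[k] / (2 * counts[k]),
                           counts[k]))
    return result
```

Reporting the mean distance of the pairs in each bin, rather than the bin centre, is a design choice; either is used in practice.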

******************************************************************

Example: Dioxin Contamination

In 1971, a truck transporting dioxin-contaminated residues dumped an unknown quantity of waste in a rural area of Missouri to avoid citations for being overweight.

In November 1983, the U.S. EPA collected soil samples in several areas and measured the TCDD (tetrachlorodibenzo-p-dioxin) concentration in each sample. Following Zirschy and Harris (1986), we will analyze the logarithm of the TCDD data.

For simplicity, transform the study domain by dividing the x-coordinate by 50 to produce a

region that is almost square.

******************************************************************


******************************************************************

Tolerance intervals:

• maximum lag distance = maximum separation distance/2, where the maximum separation distance is $\sqrt{71.1^2 + 60^2} = 93.0$, so the maximum lag distance is 93.0/2 = 46.5.

• number of lags=11 (arbitrary)

• Many locations are about 4 ft. apart, so try a lag spacing slightly greater than 4, say 4.6.

• lag tolerance=lag spacing/2.

• defines distance intervals 4.6i ± 2.3, i = 0, 1, 2, . . . 10.

Lag   Distance   γ(h)   N(h)
  1      1.47    0.55     41
  2      4.37    0.93    155
  3      9.85    1.75    404
  4     14.02    2.22    533
  5     19.25    2.58    497
  6     22.93    3.27    534
  7     27.40    2.84    504
  8     31.87    3.19    786
  9     36.82    3.37    422
 10     41.22    3.83    870
 11     45.73    3.86    607

Experiment with different tolerance intervals. The goal is accurate estimation (> 30 pairs for

each estimated value), a clear structure, and at least a total of 10-20 lags for modeling and

inference.
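The tolerance-interval arithmetic above can be checked with a short Python sketch. The domain extents 71.1 and 60 are read from the square-root expression in the notes; the variable names are illustrative only.

```python
import math

# Diagonal of the (transformed) study domain: the maximum separation distance.
max_separation = math.hypot(71.1, 60.0)
max_lag = max_separation / 2          # rule of thumb: half the maximum separation

lag_spacing = 4.6                     # slightly more than the typical 4 ft spacing
lag_tolerance = lag_spacing / 2
n_lags = 11

# Interval i is centred at 4.6 * i; the first interval is truncated at zero.
intervals = [(max(0.0, lag_spacing * i - lag_tolerance),
              lag_spacing * i + lag_tolerance)
             for i in range(n_lags)]
```

With these choices the last interval ends slightly beyond the maximum lag distance, which is why the number of lags and the lag spacing are usually tuned together.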

******************************************************************


Modeling the Semivariogram

The empirical semivariogram, $\hat{\gamma}(\cdot)$, is not guaranteed to satisfy the properties of the semivariogram.

This can lead to inconsistent results when used for spatial prediction. To solve this problem,

parametric functions are used to model the structure of the empirical semivariogram.

There are several commonly-used parametric models. Models may be added together for complex

semivariogram structures.

******************************************************************


• SPHERICAL:

$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_s \left\{ \tfrac{3}{2}(h/a_s) - \tfrac{1}{2}(h/a_s)^3 \right\}, & 0 < h \le a_s \\ c_0 + c_s, & h > a_s, \end{cases}$$

$\theta = (c_0, c_s, a_s)'$, $c_0 \ge 0$, $c_s \ge 0$, $a_s \ge 0$. The spherical semivariogram is nearly linear near the origin. The parameter $c_0$ measures the nugget effect, $c_s$ is the partial sill (so $c_0 + c_s$ is the sill), and $a_s$ is the range.

• EXPONENTIAL:

$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_e \left\{ 1 - \exp(-h/a_e) \right\}, & h > 0, \end{cases}$$

$\theta = (c_0, c_e, a_e)'$, $c_0 \ge 0$, $c_e \ge 0$, $a_e \ge 0$. The exponential semivariogram rises more slowly from the origin than the spherical. As with the spherical model, $c_0$ measures the nugget effect and $c_e$ is the partial sill (so $c_0 + c_e$ is the sill). However, this model approaches the sill asymptotically, and the effective range is $3a_e$.

• GAUSSIAN:

$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_g \left\{ 1 - \exp[-(h/a_g)^2] \right\}, & h > 0, \end{cases}$$

$\theta = (c_0, c_g, a_g)'$, $c_0 \ge 0$, $c_g \ge 0$, $a_g \ge 0$. The Gaussian semivariogram model is parabolic near the origin, indicative of a very smooth spatial process. As with the previous models, $c_0$ measures the nugget effect and $c_g$ is the partial sill (so $c_0 + c_g$ is the sill). The effective range is $\sqrt{3}\,a_g$.

• POWER:

$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + b_\ell h^p, & h > 0, \end{cases}$$

$\theta = (c_0, b_\ell, p)'$, $c_0 \ge 0$, $b_\ell \ge 0$, $0 \le p < 2$. The power model is used to model processes with large-scale trend by taking $p \ge 1$. It also plays an important role in fractal processes and the estimation of the fractal dimension.

• HOLE-EFFECT:

$$\gamma(h; \theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_w \left\{ 1 - a_w \sin(h/a_w)/h \right\}, & h > 0, \end{cases}$$

$\theta = (c_0, c_w, a_w)'$, $c_0 \ge 0$, $c_w \ge 0$, $a_w \ge 0$. The hole-effect model is useful for processes with negative spatial autocorrelation arising from a cyclical or periodic variability. It reaches a global maximum and then continues to oscillate around the sill, with a period of $2\pi a_w$ set by the parameter $a_w$.

There are many different ways to parameterize these models and different computer software

programs use slightly different forms. Check to be sure what the parameters mean!
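As a concrete reading of the first three models, here is a minimal Python sketch using the parameterization given above (nugget, partial sill, range parameter). The function names are illustrative; other software may scale the range parameter differently, as the preceding note warns.

```python
import math

def spherical(h, c0, cs, a):
    """Spherical model: reaches the sill c0 + cs exactly at h = a."""
    if h == 0:
        return 0.0
    if h <= a:
        r = h / a
        return c0 + cs * (1.5 * r - 0.5 * r ** 3)
    return c0 + cs

def exponential(h, c0, ce, a):
    """Exponential model: approaches the sill c0 + ce asymptotically."""
    if h == 0:
        return 0.0
    return c0 + ce * (1.0 - math.exp(-h / a))

def gaussian(h, c0, cg, a):
    """Gaussian model: parabolic near the origin."""
    if h == 0:
        return 0.0
    return c0 + cg * (1.0 - math.exp(-(h / a) ** 2))
```

Evaluating `exponential` at h = 3a, or `gaussian` at h = sqrt(3)·a, gives the nugget plus about 95% of the partial sill, which is the usual sense in which those distances are called effective ranges.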

******************************************************************


Fitting Semivariogram Models

Use the empirical semivariogram to estimate the parameters of the semivariogram model (θ).

• ML and REML (Gaussian data), Composite Likelihood (non-Gaussian data)

• Nonlinear least squares regression, weighted and generalized

• By eye
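To make the weighted least squares option concrete, the sketch below fits an exponential model to the empirical semivariogram values tabulated for the dioxin example. It is an illustration only: the criterion is the weighted form associated with Cressie (1985), the crude grid search stands in for a proper nonlinear optimizer, and the grid limits are arbitrary assumptions rather than values from the notes.

```python
import math

# Lag distance, empirical semivariogram and pair count from the dioxin table.
lags = [1.47, 4.37, 9.85, 14.02, 19.25, 22.93, 27.40, 31.87, 36.82, 41.22, 45.73]
gammas = [0.55, 0.93, 1.75, 2.22, 2.58, 3.27, 2.84, 3.19, 3.37, 3.83, 3.86]
pairs = [41, 155, 404, 533, 497, 534, 504, 786, 422, 870, 607]

def exponential(h, c0, ce, a):
    return 0.0 if h == 0 else c0 + ce * (1.0 - math.exp(-h / a))

def wls_criterion(c0, ce, a):
    """Sum over lags of N(h) * (gamma_hat(h) / gamma(h; theta) - 1)^2."""
    return sum(n * (g / exponential(h, c0, ce, a) - 1.0) ** 2
               for h, g, n in zip(lags, gammas, pairs))

def grid_search():
    """Return (criterion value, (c0, ce, a)) minimizing the criterion on a grid."""
    best = None
    for i in range(0, 11):            # nugget c0 in 0.0, 0.1, ..., 1.0
        for j in range(1, 61):        # partial sill ce in 0.1, 0.2, ..., 6.0
            for k in range(1, 61):    # range parameter a in 1, 2, ..., 60
                theta = (i / 10, j / 10, float(k))
                value = wls_criterion(*theta)
                if best is None or value < best[0]:
                    best = (value, theta)
    return best
```

The weights give more influence to lags with many pairs and to small lags, where the semivariogram matters most for kriging.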

******************************************************************


Kriging

Kriging is a method of spatial prediction or interpolation that has optimal statistical properties.

There are many types of kriging:

• Simple kriging: Process mean known

• Ordinary kriging: Process mean unknown but constant

• Universal kriging: Process mean is a parametric function of covariates

• Indicator kriging: For binary or nonlinear prediction

• Block kriging: Areal prediction. Geostatistical solution to MAUP

• Co-kriging: Spatial prediction in a multivariate framework

There are others.

We will discuss ordinary kriging here. Development of other kriging predictors is similar.

******************************************************************

Ordinary Kriging

Data Z(s1), . . . , Z(sn) observed at locations s1, . . . , sn.

Goal: Predict Z(s0) at location s0 where no data value is observed.

Consider predictors that are a weighted average of the data values, i.e.,

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i).$$

Assume E(Z(s)) = µ, with µ unknown and independent of location s, and choose the weights $\{\lambda_i\}$ so that $\hat{Z}(s_0)$ is unbiased for $Z(s_0)$ and has the smallest prediction error variance among all linear unbiased predictors.

Such a predictor is called the best linear unbiased predictor, or BLUP.

******************************************************************


The method of Lagrange multipliers from calculus is used to minimize the prediction error

variance subject to the unbiasedness constraint.

This gives the ordinary kriging equations

$$\sum_{j=1}^{n} \lambda_j \gamma(s_i - s_j) - m = \gamma(s_i - s_0), \quad i = 1, \ldots, n,$$

$$\sum_{i=1}^{n} \lambda_i = 1.$$

This is a system of (n+1) equations in (n+1) unknowns (the n λ’s and one Lagrange multiplier m), which must be solved simultaneously.

Once these equations are solved for the λ’s, the ordinary kriging predictor is

$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i).$$

******************************************************************

These equations are often written in matrix form:

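$$\begin{bmatrix} 0 & \gamma(s_1 - s_2) & \cdots & \gamma(s_1 - s_n) & 1 \\ \gamma(s_2 - s_1) & 0 & \cdots & \gamma(s_2 - s_n) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma(s_n - s_1) & \cdots & \gamma(s_n - s_{n-1}) & 0 & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ -m \end{bmatrix} = \begin{bmatrix} \gamma(s_1 - s_0) \\ \gamma(s_2 - s_0) \\ \vdots \\ \gamma(s_n - s_0) \\ 1 \end{bmatrix}$$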

which can be written succinctly as

Γλ = γ.

This development assumes the semivariogram γ(·) is known. In practice, the semivariogram is estimated and modeled from the data (as described earlier), and then the fitted parametric model is used to specify the entries in Γ and γ.
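The ordinary kriging system can be assembled and solved directly. The following Python sketch assumes an isotropic semivariogram model supplied as a function of distance; the names `solve` and `ordinary_kriging` are hypothetical helpers, and the plain Gaussian elimination is only meant for small illustrative systems.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ordinary_kriging(coords, values, s0, gamma):
    """Return (prediction, weights, m) for the system with unknowns (lambda, -m)."""
    n = len(values)
    # Rows 1..n: semivariogram values between data locations, then a column of ones.
    G = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    G.append([1.0] * n + [0.0])                 # unbiasedness constraint
    rhs = [gamma(math.dist(coords[i], s0)) for i in range(n)] + [1.0]
    sol = solve(G, rhs)
    weights, m = sol[:n], -sol[n]               # the last unknown is -m
    prediction = sum(w * z for w, z in zip(weights, values))
    return prediction, weights, m
```

Partial pivoting matters here because the diagonal of the coefficient matrix is zero. When the prediction location coincides with a data location and the model has no nugget, the weights collapse onto that observation, so the predictor interpolates the data exactly.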

******************************************************************


The minimized prediction error variance, also called the kriging variance, is given by

$$\sigma_k^2(s_0) = 2 \sum_{i=1}^{n} \lambda_i \gamma(s_i - s_0) - \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \gamma(s_i - s_j).$$

The square root of this quantity, called the kriging standard error, gives a measure of the

uncertainty in our prediction of Z(s0). Prediction intervals (similar to confidence intervals for a

fixed parameter) can be constructed as

$$\hat{Z}(s_0) \pm z_{\alpha/2}\, \sigma_k(s_0),$$

where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
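A small Python sketch of the kriging variance and the resulting prediction interval follows. It takes the kriging weights and the semivariogram values as given; the function names are illustrative, and the interval relies on the normality assumption stated above.

```python
import math
from statistics import NormalDist

def kriging_variance(weights, gamma_data, gamma_0):
    """2 * sum_i lam_i * gamma_0[i] - sum_i sum_j lam_i * lam_j * gamma_data[i][j].

    gamma_data[i][j] holds gamma(s_i - s_j); gamma_0[i] holds gamma(s_i - s_0).
    """
    n = len(weights)
    first = 2.0 * sum(weights[i] * gamma_0[i] for i in range(n))
    second = sum(weights[i] * weights[j] * gamma_data[i][j]
                 for i in range(n) for j in range(n))
    return first - second

def prediction_interval(prediction, variance, alpha=0.05):
    """Normal-theory interval: prediction +/- z_{alpha/2} * kriging standard error."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 point
    half_width = z * math.sqrt(variance)
    return prediction - half_width, prediction + half_width
```

For example, with two observations weighted equally, `kriging_variance([0.5, 0.5], [[0, 1], [1, 0]], [0.4, 0.4])` evaluates the formula term by term.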

******************************************************************

Search Neighborhoods

Inversion of matrices in kriging can be cumbersome, particularly for large data sets.

Construct a neighborhood (usually a circle or ellipse) around each prediction location, s0. Only

data observed at locations within this neighborhood are used to predict the data value at s0.

For more localized prediction, the neighborhood is further restricted by retaining only a specified

number of points for prediction.

Care must be taken, particularly on the edges of the domain, to ensure enough data for stable

and accurate predictions.
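A search neighborhood of this kind can be sketched in Python as follows. The sketch assumes a circular neighborhood and a cap on the number of retained points; `search_neighborhood` is a hypothetical helper name, and an elliptical neighborhood would need a different distance rule.

```python
import math

def search_neighborhood(coords, s0, radius, max_points):
    """Indices of at most max_points data locations nearest to s0 within radius."""
    inside = []
    for i, c in enumerate(coords):
        d = math.dist(c, s0)
        if d <= radius:
            inside.append((d, i))
    inside.sort()                         # nearest first
    return [i for _, i in inside[:max_points]]
```

Near the edge of the domain the returned list can be short, so it is worth checking its length before kriging with it.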

******************************************************************


Geostatistical Software

• S-PLUS Spatial Statistics

• SAS (Vario, Krige2d)

• GSLIB (Fortran)

• ArcView/ArcInfo (Spatial and Geostatistical Analyst)

• GS+ (Windows)

http://www.ai.geostats.org

******************************************************************

Hierarchical Modeling

A hierarchical model is one that is specified in stages.

Hierarchical models are developed as a sequence of conditional distributions.

Advantages:

1. Can build complicated models by layering simple pieces

2. Can integrate data from different sources

Disadvantages:

1. Computational complexity

2. Model checking and diagnostics

******************************************************************


Simple Example

Research study: Investigate the effect of paper color (blue, green, red) on response rates for

questionnaires distributed by the “windshield method” in supermarket parking lots. (Example

14.11 from Neter, Wasserman, and Kutner, 1990).

Data yij are response rates for the jth supermarket using the ith paper, i = 1, 2, 3; j = 1, 2, . . . , 5.

Let αi be the effect of paper color on response rate.

Color Supermarket

Green 28 26 31 27 35

Blue 34 29 25 31 29

Red 31 25 27 29 28

Hierarchical model:

$$y_{ij} \mid \alpha_i, \sigma^2 \sim N(\alpha_i, \sigma^2)$$

$$\alpha_i \mid \mu, \tau^2 \sim N(\mu, \tau^2)$$

Need estimates of $\alpha_i$, $\sigma^2$, $\mu$ and $\tau^2$.

******************************************************************

Depending on the complexity, inference can be done in one of two ways:

1. Likelihood-based methods. Maximize joint likelihood (or approximate joint likelihood) of

the data with respect to unknown parameters.

• Gaussian data: ML and REML

• Binary or Count data: Penalized Quasi-Likelihood; Pseudo-Likelihood

2. Bayesian methods. Use Bayes’ rule to construct the posterior distribution of the unknown parameters given the data. Let f(y|θ) be the distribution of the data, given the unknown parameter θ. Let θ also be a random variable with (prior) distribution π(θ). Then the posterior distribution of θ given the data is

$$h(\theta \mid y) = \frac{f(y \mid \theta)\, \pi(\theta)}{\int f(y \mid \theta)\, \pi(\theta)\, d\theta}.$$

Typically the mean of this distribution is taken to be the Bayes estimate of θ.

******************************************************************


In order to obtain closed-form expressions for the posterior mean, conjugate priors are used.

Conjugate family: If the prior belongs to a family of distributions, so does the posterior.

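Suppose

$$y_1, \ldots, y_n \mid \theta \sim N(\theta, \sigma^2) \ \text{independently} \quad \text{(data)}$$

$$\theta \sim N(\mu, \tau^2) \quad \text{(prior)}$$

Assume both $\tau^2$ and $\sigma^2$ are known, and let $\bar{y}$ denote the mean of the n observations.

Then,

$$\theta \mid y \sim N\left( \frac{\bar{y}\, n \tau^2 + \mu \sigma^2}{n \tau^2 + \sigma^2},\ \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2} \right) \quad \text{(posterior)}$$

and

$$\hat{\theta} = \frac{\bar{y}\, n \tau^2 + \mu \sigma^2}{n \tau^2 + \sigma^2}.$$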
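The conjugate update is easy to compute directly. The Python sketch below applies the posterior formulas to one row of the questionnaire data (the green-paper response rates); the prior mean, prior variance and data variance used in the usage lines are hypothetical values chosen only for illustration, since the formulas assume they are known.

```python
def normal_posterior(ys, mu, tau2, sigma2):
    """Posterior mean and variance of theta for normal data with a normal prior."""
    n = len(ys)
    ybar = sum(ys) / n
    denom = n * tau2 + sigma2
    mean = (ybar * n * tau2 + mu * sigma2) / denom
    var = sigma2 * tau2 / denom
    return mean, var

# Green-paper response rates from the table; mu, tau2, sigma2 are assumed values.
green = [28, 26, 31, 27, 35]
post_mean, post_var = normal_posterior(green, mu=29.0, tau2=4.0, sigma2=9.0)
```

The posterior mean is a weighted average of the sample mean and the prior mean, so it always falls between the two; a larger prior variance pulls it toward the sample mean.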

More on Bayesian estimation and modeling later....

******************************************************************

Spatial Hierarchical Modeling

For use in a spatial setting, hierarchical models must include some sort of neighborhood dependence or autocorrelation.

This can be done in many different ways. Specification of spatial structure in these models is

an area of recent and ongoing statistical research.

The key to these models is the conditional specification with the assumption of conditional

independence.

******************************************************************


Lip Cancer Example

Observed (C) and expected (E) numbers of lip cancer cases in males in the 56 districts of

Scotland

Outcome variable (Y): standardized morbidity ratio (SMR) = 100 × observed/expected

Hypothesis: Occupational exposure to sunlight might contribute to the incidence of lip cancer

in males.

Covariate (X): Percentage of the work force engaged in Agriculture, Fishing or Forestry (%AFF)

(divided by 10).

Clayton and Kaldor (1987, Biometrics), Breslow and Clayton (1993, JASA)

******************************************************************


Two-Level Hierarchical Model

Let θi represent the district-specific log-relative risks, i = 1, . . . , 56.

Assume that, conditional on θi, the data are mutually independent Poisson random variables

with mean

$$\mu_i \equiv E(y_i \mid \theta_i) = n_i \exp(\beta_0 + \beta_1 x_i + \theta_i)$$

Assume {θi} arise from a stationary Gaussian random process with mean 0 and covariance

function σ2ρ(||i − j||).

Thus, the joint distribution of θ = (θ1, θ2, . . . , θ56)′ is MV N(0, Σ), where Σ(i, j) = σ2ρ(||i− j||).

******************************************************************

In hierarchical modeling notation:

$$y_i \mid \theta_i \sim \text{Poisson}(n_i \exp(\beta_0 + \beta_1 x_i + \theta_i))$$

$$\theta \sim MVN(0, \Sigma), \quad \text{where } \Sigma(i, j) = \sigma^2 \rho(\|i - j\|).$$

Note that the spatial structure (and any autocorrelation) is induced in the data by the assumptions on θi.

We would like estimates of β0 and β1 and their standard errors, estimates of the spatial autocorrelation parameters that parameterize ρ(||i − j||), and a p-value for testing the covariate effect.

******************************************************************


Notes

1. This model is also called a generalized linear mixed model.

2. If, given θ, the data were assumed Gaussian, this would be a linear mixed model. This is one spatial extension of the random factors ANOVA.

3. Inference can be done using likelihood-based techniques or Bayesian estimation.

4. With likelihood-based techniques, a Taylor series expansion is used to transform the problem to a linear one, and then well-known methods for inference in linear models are used. (A common approach is Pseudo-Likelihood (PL).) However, this can be done only with Gaussian priors.

5. Obtaining p-values per se with Bayesian approaches is somewhat problematic. Theoretically, one must fit two models: the null and one alternative of choice. It is difficult to get something that can be compared to the hallowed 0.05.

******************************************************************


Estimation with Pseudo-Likelihood

Due to Wolfinger and O’Connell (1993). Implemented in SAS by the GLIMMIX macro.

1. Obtain an estimate of µi ≡ E(yi|θi) = exp(log(ni) + β0 + β1xi + θi).

2. Compute “pseudo” data obtained by linearization

$$\nu = g(\mu) + \Delta_\mu (y - \mu),$$

where $g(\mu) = \beta_0 + \beta_1 x + \theta$ (g is called the link function), and $\Delta_\mu$ is an n × n diagonal matrix with elements $[\partial g(\mu)/\partial \mu]$ evaluated at µ. In this example, the (i, i)th element of $\Delta_\mu$ is $1/\mu_i$.

3. Using ML or REML, fit a weighted linear mixed model using the pseudo data. In this case the weights (in W) are equal to µ. More generally, the weights are taken to be the variance function of the generalized linear model. The log-likelihood has a familiar form:

$$l(\nu; \beta) = -\tfrac{1}{2} \log|V| - \tfrac{1}{2} (\nu - X\beta)' V^{-1} (\nu - X\beta) - \tfrac{n}{2} \log(2\pi),$$

$$V = W^{-1} + \Sigma.$$

Σ contains the spatial autocorrelation parameters.

4. Obtain a new estimate of µ and iterate until convergence.
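The linearization in step 2 can be written out for the Poisson model with a log link. The Python sketch below computes the pseudo data and the weights for one iteration, taking current values of the coefficients and random effects as given; it follows the notes in writing g(µ) as the linear predictor, and the function and argument names are illustrative. Fitting the weighted linear mixed model in step 3 is not shown.

```python
import math

def pseudo_data(y, n_i, x, beta0, beta1, theta):
    """One linearization step: returns a list of (nu_i, weight_i) pairs.

    mu_i = n_i * exp(beta0 + beta1 * x_i + theta_i)
    nu_i = g(mu_i) + (y_i - mu_i) / mu_i, with weight w_i = mu_i.
    """
    out = []
    for yi, ni, xi, ti in zip(y, n_i, x, theta):
        eta = beta0 + beta1 * xi + ti     # g(mu) as written in step 2
        mu = ni * math.exp(eta)           # current estimate of E(y_i | theta_i)
        nu = eta + (yi - mu) / mu         # Delta_mu has (i, i) element 1 / mu_i
        out.append((nu, mu))              # the weight equals mu for this model
    return out
```

When an observed count equals its current fitted mean, the pseudo observation reduces to the linear predictor itself, so the adjustment term vanishes as the iterations converge toward a good fit.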

******************************************************************

To implement this in practice, a parametric form is usually assumed for the spatial autocorrelation function, ρ(||i − j||). Any of the familiar models discussed previously (e.g., spherical, exponential) can be chosen. The empirical semivariogram gives a crude indication of the functional form of this model.

For the lip cancer data, a spatial power covariance model (equivalent to the exponential semivariogram model) was assumed. This model is

$$\mathrm{cov}(y_i, y_j) = \sigma^2 \rho^{\|i - j\|}.$$

The parameter ρ measures the strength of spatial autocorrelation.

This function specifies Σ parametrically in the likelihood function.
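As an illustration of how this covariance function fills in Σ, the Python sketch below builds the matrix from a set of point locations. Representing each district by a single point (for example a centroid) is an assumption made here for the sketch; the notes do not say how the distance between districts was measured. The helper names are hypothetical.

```python
import math

def power_covariance(locations, sigma2, rho):
    """Sigma(i, j) = sigma2 * rho ** d_ij, with d_ij the distance between locations."""
    n = len(locations)
    return [[sigma2 * rho ** math.dist(locations[i], locations[j])
             for j in range(n)] for i in range(n)]

def equivalent_exponential_range(rho):
    """Range parameter a such that rho ** d equals exp(-d / a), for 0 < rho < 1."""
    return -1.0 / math.log(rho)
```

The second helper makes the stated equivalence explicit: a power model with parameter ρ is an exponential correlation model with range parameter −1/log ρ.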


Results: $\hat{\beta}_0 = 0.47$, s.e.($\hat{\beta}_0$) = 0.36;

$\hat{\beta}_1 = 0.29$, s.e.($\hat{\beta}_1$) = 0.13;

$\hat{\rho} = 0.44$;

p-value = 0.03.

Can do many different types of spatial regression.

Can also do prediction and smoothing.

Implemented with the SAS GLIMMIX macro available at http://www.sas.com/techsup/download/stat/

******************************************************************

A More Complex Model

Separate heterogeneity and spatial similarity.

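$$y_i \mid \theta_i, \phi_i \sim \text{Poisson}(n_i \exp(\beta_0 + \beta_1 x_i + \theta_i + \phi_i))$$

$$\theta_i \overset{ind}{\sim} N(0, 1/\eta) \quad \text{(heterogeneity)}$$

$$\phi \sim MVN(0, C^{-1}) \quad \text{(spatial autocorrelation)}$$

where $C(i, i) = \lambda \sum_{j \ne i} c_{ij}$ and $C(i, j) = -\lambda c_{ij}$ (Gaussian intrinsic autoregression).

Adjacency: $c_{ij} = 1$ if region j is adjacent to region i, and $c_{ij} = 0$ otherwise.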

η and λ are called hyperparameters and are assigned distributions called hyperpriors.
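The matrix C for the intrinsic autoregression can be built directly from an adjacency matrix. The Python sketch below does this for a toy map of three regions in a row; the function name and the toy adjacency are illustrative assumptions.

```python
def car_precision(adjacency, lam):
    """C(i, i) = lam * sum_{j != i} c_ij and C(i, j) = -lam * c_ij for i != j."""
    n = len(adjacency)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                C[i][j] = -lam * adjacency[i][j]
                C[i][i] += lam * adjacency[i][j]
    return C

# Three regions in a row: region 1 touches 0 and 2; regions 0 and 2 do not touch.
toy_adjacency = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]
C = car_precision(toy_adjacency, lam=2.0)
```

Each row of C sums to zero, so C is singular and its inverse in the model statement exists only in a generalized sense; this is the sense in which the autoregression is called intrinsic.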

******************************************************************


Bayesian Model Fitting

Approximations to posterior distribution

Numerical integration

Markov Chain Monte Carlo

• Tool to sample from posterior distribution

• Markov property: the distribution of $\theta_n \mid \theta_1, \theta_2, \ldots, \theta_{n-1}$ depends only on the most recent value, $\theta_{n-1}$.

• Implemented with Gibbs sampling and Metropolis-Hastings algorithms

******************************************************************

Gibbs Sampling

Gibbs sampling is a numerical algorithm that allows one to evaluate a complex marginal or joint

distribution using conditional distributions.

Allows evaluation of distributions that cannot be explicitly formed. Requires finite densities.

Example: Suppose we want to make inferences about h(θ1, θ2). Assume that the conditional

distributions g(θ1|θ2) and f(θ2|θ1) can be sampled from easily.

Given a starting value, draw a sample from each conditional density and repeat the loop

$$\theta_1^{i+1} \sim g(\theta_1 \mid \theta_2^{i})$$

$$\theta_2^{i+1} \sim f(\theta_2 \mid \theta_1^{i+1})$$

Eventually we will be sampling from the joint distribution h(θ1, θ2).

Burn-in: How long until the chain reaches the target distribution, h.
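A toy Gibbs sampler makes the loop concrete. The Python sketch below targets a standard bivariate normal distribution with correlation ρ, chosen here only because both conditional distributions are known normals; it is not one of the models from these notes, and the function name, starting value and burn-in length are arbitrary assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for (theta1, theta2) ~ bivariate normal, unit variances, corr rho.

    Conditionals: theta1 | theta2 ~ N(rho * theta2, 1 - rho^2), and symmetrically.
    Returns the draws kept after discarding the first burn_in iterations.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)        # conditional standard deviation
    theta1, theta2 = 0.0, 0.0             # starting value
    draws = []
    for it in range(n_iter):
        theta1 = rng.gauss(rho * theta2, sd)   # draw theta1 given current theta2
        theta2 = rng.gauss(rho * theta1, sd)   # draw theta2 given the new theta1
        if it >= burn_in:
            draws.append((theta1, theta2))
    return draws
```

Successive draws are correlated, more strongly so as ρ approaches 1, which is one reason chain length and diagnostics need attention in practice.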

******************************************************************


Gibbs Sampling Continued

The result is

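$$\theta_1^{1}, \theta_1^{2}, \ldots, \theta_1^{N} \sim g(\theta_1) \quad \text{(marginal)}$$

$$\theta_2^{1}, \theta_2^{2}, \ldots, \theta_2^{N} \sim g(\theta_2) \quad \text{(marginal)}$$

Could use these distributions to estimate θ1 and θ2. However, it is better to use

$$\frac{1}{N} \sum_{i=1}^{N} g(\theta_1 \mid \theta_2^{i}) \quad \text{and} \quad \frac{1}{N} \sum_{i=1}^{N} f(\theta_2 \mid \theta_1^{i}).$$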

Mixing: How close are these to independent draws? Values are likely to be autocorrelated.
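The averaged-conditional estimate can be illustrated on a toy target. The Python sketch below estimates the marginal density of θ1 at a point by averaging the conditional density of θ1 given each sampled θ2, for a standard bivariate normal with correlation ρ. The target distribution, function names and default chain settings are assumptions made only for this illustration.

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def averaged_conditional_density(t, rho, n_iter=6000, burn_in=1000, seed=0):
    """Estimate the marginal density of theta1 at t as the mean of g(t | theta2^i)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)
    theta1 = theta2 = 0.0
    total, kept = 0.0, 0
    for it in range(n_iter):
        theta1 = rng.gauss(rho * theta2, sd)
        theta2 = rng.gauss(rho * theta1, sd)
        if it >= burn_in:
            total += normal_pdf(t, rho * theta2, sd)   # g(t | theta2^i)
            kept += 1
    return total / kept
```

For this target the true marginal of θ1 is standard normal, so the estimate at t = 0 can be compared with 1/sqrt(2π) as a check.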

See references for rationale and theory behind Gibbs sampling as well as ideas pertaining to

burn-in, chain length, number of chains, and convergence issues and diagnostics.

******************************************************************

Software

• SAS. 2-level models only. Uses likelihood-based methods. PROC MIXED (Gaussian linear mixed models), GLIMMIX (generalized linear mixed models), and PROC NLMIXED (nonlinear mixed models).

• MLWIN. Uses likelihood-based methods. Best suited to clustered data applications.

• S-PLUS. 2-level Gaussian mixed models

• BUGS and winBUGS (Bayesian inference Using Gibbs Sampling) Downloaded free from

www.mrc-bsu.cam.ac.uk/bugs

******************************************************************


References

Geostatistics

Armstrong, M. 1999. Basic Linear Geostatistics. Springer-Verlag: New York.

Chiles, J. P. and Delfiner, P. 1999. Geostatistics: Modeling Spatial Uncertainty. John Wiley:

New York.

Cressie, N. 1985. Fitting variogram models by weighted least squares. Journal of the International Association for Mathematical Geology, 17: 563-586.

Deutsch, C.V. and Journel, A.G. 1992. GSLIB: Geostatistical Software Library and User’s

Guide. Oxford University Press: New York.

Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press:

New York.

Gotway, C.A. 1991. Fitting semivariogram models by weighted least squares. Computers and

Geosciences, 17: 171-172.

Isaaks, E. H. and Srivastava, R. M. 1989. An Introduction to Applied Geostatistics. Oxford

University Press: New York.

Journel, A.G. 1989. Fundamentals of Geostatistics in Five Lessons. American Geophysical

Union: Washington, D.C.

Journel, A. G. and Huijbregts, C. J. 1978. Mining Geostatistics. Academic Press: London.

Zirschy, J.H. and Harris, D. J. 1986. Geostatistical analysis of hazardous waste site data. Journal

of Environmental Engineering, ASCE, 112: 770-784.

Spatial Statistics

Bailey, T.C. and Gatrell, A.C. 1995. Interactive Spatial Data Analysis. Addison Wesley

Longman: Essex.

Cressie, N. 1993. Statistics for Spatial Data. John Wiley: New York.

Haining, R. 1990. Spatial Data Analysis in the Social and Environmental Sciences. Cambridge

University Press: New York.


Upton, G.J.G. and Fingleton, B. 1985. Spatial Data Analysis by Example. Volume I. Point

Pattern and Quantitative Data. John Wiley: Chichester

Upton, G.J.G. and Fingleton, B. 1985. Spatial Data Analysis by Example. Volume II. Categorical and Directional Data. John Wiley: Chichester

Webster, R. and Oliver, M. 1990. Statistical Methods in Soil and Land Resource Survey. Oxford

University Press: Oxford.

Bayesian Analysis and Computation

Besag, J., Green, P., Higdon, D., and Mengersen, K. 1995. Bayesian computation and stochastic

systems (with discussion). Statistical Science, 10: 3-66.

Carlin, B.P. and Louis, T.A. 1996. Bayes and Empirical Bayes Methods for Data Analysis.

Chapman & Hall: New York.

Clayton, D. G. and Kaldor, J. 1987. Empirical Bayes estimates of age-standardized relative

risks for use in disease mapping. Biometrics, 43: 671-682.

Chib, S. and Greenberg, E. 1995. Understanding the Metropolis-Hastings algorithm. The

American Statistician, 49: 327-335.

Cowles, M.K. and Carlin, B.P. 1996. Markov Chain Monte Carlo convergence diagnostics: A

comparative review. Journal of the American Statistical Association, 91: 883-904.

Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. 1995. Bayesian Data Analysis. Chapman

& Hall: London.

Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. 1996. Markov Chain Monte Carlo in

Practice. Chapman & Hall: London.

Smith, A.F.M. and Gelfand, A.E. 1992. Bayesian statistics without tears: A sampling-resampling

perspective. The American Statistician, 46: 84-88.

Spatial Hierarchical Modeling

Besag, J., York, J.C., and Mollié, A. 1991. Bayesian image restoration, with two applications

in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics, 43:

1-59.


Clayton, D. G. and Bernardinelli, L. 1992. Bayesian methods for disease mapping. Pages 205-220 in Geographical and Environmental Epidemiology, Elliott, P. and Cuzick, J. and English, D. and Stern, R. (eds.) Oxford Medical Publications: Oxford.

Breslow, N. E. and Clayton, D. G. 1993. Approximate inference in generalized linear mixed

models. Journal of the American Statistical Association, 88: 9-25.

Littell, R. C. and Milliken, G. A. and Stroup, W. W. and Wolfinger, R. D. 1996. The SAS

System for Linear Models. SAS Institute: Cary, NC.

Mugglin, A. S. and Carlin, B. P. and Zhu, L. and Conlon, E. 1999. Bayesian areal interpolation, estimation, and smoothing: an inferential approach for geographic information systems. Environment and Planning A, 31: 1337-1352.

Royle, J.A. and Berliner, L.M. 1999. A hierarchical approach to multivariate spatial modeling

and prediction. Journal of Agricultural, Biological, and Environmental Statistics, 4: 1-28.

Waller, L.A., Carlin, B.P., Xia, H. and Gelfand, A.E. 1997. Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association, 92: 607-617.

Wakefield, J.C., Best, N.G., and Waller, L. 2000. Bayesian approaches to disease mapping.

Pages 104-127 in Spatial Epidemiology Methods and Applications. Elliott, P., Wakefield, J.C.,

Best, N.G., and Briggs, D.J. (eds.). Oxford University Press: Oxford.

Wikle, C. K. and Berliner, L. M. and Cressie, N. 1998. Hierarchical Bayesian space-time models.

Environmental and Ecological Statistics, 5: 117-154.

Wolfinger, R. D. and O’Connell, M. 1993. Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation, 48: 233-243.
