model based geostatistics - leg-ufprpaulojus/mbg/slides/mbg.slides.pdf · 2003-06-09 · model...

126
MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamentode Estat´ıstica Universidade Federal do Paran´ a, Brasil 1 Correspondence Addrees: Departamento de Estat´ ıstica, Universidade Federal do Paran´ a, Cx. Pos- tal 19.081, 81.531-990, Curitiba, PR, Brasil. E-mail: [email protected]

Upload: others

Post on 18-Mar-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

MODEL BASED GEOSTATISTICS(Course Slides)

Presented by:Paulo Justiniano Ribeiro Jr 1

Departamento de EstatısticaUniversidade Federal do Parana, Brasil

1Correspondence Addrees: Departamento de Estatıstica, Universidade Federal do Parana, Cx. Pos-tal 19.081, 81.531-990, Curitiba, PR, Brasil. E-mail: [email protected]

Page 2: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Acknowledgements

The course notes and slides are result of joint work with:

• Professor Peter J. DiggleDepartment of Mathematics and StatisticsLancaster University, UK

• Ole ChristensenCenter for Bioinformatik, Datalogisk Institut, Aarhus Universitet,Denmarkformlely at: Department of Mathematics and Statistics, LancasterUniversity, UKand Department of Mathematical Sciences, Aalborg University, Den-mark

Page 3: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART I:

INTRODUCTION

1. Basic Examples of Spatial Data

2. A Taxonomy for Spatial Statistics

3. Further Examples of Geostatistical Problems

4. Characteristic Features of Geostatistical Problems

5. Some History

6. Core Geostatistical Problems

7. Model Based Geostatistics

Page 4: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

1. Basic Examples of Spatial Data

(a) Cancer rates in administrative regionsGrey-scale corresponds to estimated varia-tion in relative risk of colorectal cancer in the36 electoral wards of the city of Birmingham,UK.

395000 400000 405000 410000 415000 420000

2750

0028

0000

2850

0029

0000

2950

0030

0000

Eastings (meters)

Nor

thin

gs (

met

ers)

0.9 1.0 1.1 1.2 1.3

2

Page 5: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) Rainfall in Parana State, Brasil

Rainfall measurements at 143 recording sta-tions.Average for the May-June period (dry sea-son).

200 300 400 500 600 700 800

010

020

030

040

050

060

0

E−W (km)

N−

S (

km)

3

Page 6: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) Campylobacter cases in southern En-gland

Residential locations of 651 cases of campylo-bacter reported over a one-year period in cen-tral southern England.

•••••••••••••••••••••

•••••••••••••••••••••••••••••

•••••• •• •••• ••••••••••••••••

••••••

••

•••

••••••••••

••••• •

•••••••••••••••

•••••••

••• ••• ••••••

••••••• ••

••••••••••••

•••

••••• •••••••••••••••••••••••••••••••••••

••••••••

•••

•••••• •••• ••••••••

•••••••••••••••••••••

•••••••••

••••

••••••

•••••• •

•••••••••••••••••••••

••••••••••••

••••• •••••••••••••••••••••

• •••••••••••••

•• •

••••

•••

••••••

•••••••••••••••••••••••••••••••

•••

••••••••

••••

•••••••

• ••••

• • •••••••

•••• •••

•••••••

••

•••••••

••

•••••••••••••• •••

••••

••• •••••

•••••••••••••••

••••••••

••••••••••• •••

••

•••

• •• ••••

•••••

•••

••

•• •••

•••

••

••

••••

••••

••

•• •

••••••

E-W (km)

N-S

(km

)

380 400 420 440 460

8010

012

014

0

4

Page 7: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

2. A Taxonomy of Spatial Statistics(a) Discrete spatial variation

Basic structure. Yi : i = 1, ..., n

• rarely arises naturally

• but often useful as a pragmatic strategy

• models typically defined indirectly fromfullconditionals, [Yi|Yj,∀j 6= i]

(b) Continuous spatial variation

Basic structure. Y (x) : x ∈ IR2

• data (yi, xi) : i = 1, ..., n, locations xi may be:– non-stochastic (eg lattice to cover obser-

vation region A)– or stochastic, but independent of the

process Y (x)

(c) Spatial point processes

Basic structure. Countable set of points xi ∈IR2, generated stochastically.• data sometimes converted to apparently

discrete spatial variation by aggregationover sub-regions

5

Page 8: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Spatial statistics is the collection of statisti-cal methods in which spatial locations play anexplicit role in the analysis of data.

Two strategic issues:

• don’t confuse the data-format with the un-derlying process

• the choice of model may be influenced by thescientific objectives of the study – analyseproblems, not data

6

Page 9: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

3. Further Examples of GeostatisticalProblems

(a) Swiss rainfall data

E-W (km)

N-S

(km

)

0 100 200 300

0

50

100

150

200

Locations shown as points with size proportional tothe value of the observed rainfall.

• 467 locations in Switzerland

• daily rainfall measurements on 8th of May1986

• data from:Spatial Interpolation Comparison 97(ftp://ftp.geog.uwo.ca/SIC97/ ).

7

Page 10: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) Rongelap Island

• study of residual contamination, followingnuclear weapons testing programme du-ring 1950’s

• island evacuated in 1985, is it now safe forre-settlement?

• survey yields noisy measurements Yi of ra-dioactive caesium concentrations

• initial grid of locations xi at 200m spacinglatersupplemented by in-fill squares at 40mspacing.

• particular interest in maximum caesiumconcentration

E-W

N-S

-6000 -5000 -4000 -3000 -2000 -1000 0

-500

0-4

000

-300

0-2

000

-100

00

1000

•• ••

••

••••••••••••••••••••••••••••••••••••••••••••••••••••• •

• •• ••

•• • • • • • • • • • • • • • • • •• •

•• ••••••••••

••••••••••••••••••••••••••

•••••••••••••••••••

•••••••••••

•••

••

8

Page 11: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) Gambia malaria

• survey of villages in Gambia• in village i, data Yij = 0/1 denotes ab-

sence/presence of malarial parasites inblood sample from child j

• village-level covariates:– village locations– public health centre in village?– satellite-derived vegetation green-ness

index• child-level covariates:

– age, sex, bed-net use• interest in effects of covariates, and pat-

tern of residual spatial variation

E-W (km)

N-S

(km

)

300 400 500 600

1400

1500

1600

o

o

o

o

oo

oo

oo

o

ooo ooo

ooo o

oo

o

o ooo

oo

o

oo

oooo

o o

o

o oo

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

o

o

oo

oo

Western Eastern

300 400 500 600

1400

1500

1600

o

o

o

oo

ooo ooo

oo

o

oo o oo

o

oooo

o

300 400 500 600

1400

1500

1600

Central

ooo

oo

o

oo

o

o

oo

oo

o

300 400 500 600

1400

1500

1600

o

o

o

oo

o

oooo

ooooo

o ooo

oo

ooo

o

9

Page 12: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

4. Characteristic Features of Geosta-tistical Problems

• data consist of responses Yi associated withlocations xi

• in principle, Y could be determined from anylocation x within a continuous spatial regionA

• it is reasonable to behave as if {Y (x) : x ∈ A}is a stochastic process

• xi is typically fixed. If the locations xi are ge-nerated by a stochastic point process, it is re-asonable to behave as if this point process isindependent of the Y (x) process

• scientific objectives include prediction of oneor more functionals of a stochastic process{S(x) : x ∈ A} which is dependent on the Y (x)process.

10

Page 13: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

5. Some History

• Origins in problems connected with esti-mation of ore reserves in mineral explora-tion/mining (Krige, 1951).

• Subsequent development largely indepen-dent of “mainstream” spatial statistics, initi-ally by Matheron and colleagues at Ecole desMines, Fontainebleau.

• Parallel developments by Matern (1946,1960),Whittle (1954, 1962, 1963)

• Ripley (1981) re-casts kriging in terminologyof stochastic process prediction

• Significant cross-fertilization during 1980’sand 1990’s (eg variogram is now a standardstatistical tool for analysing correlated datain space and/or time).

• But still vigorous debate on practical issues:– prediction vs inference– role of explicit probability models

11

Page 14: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

6. Core Geostatistical Problems

• Design– how many locations?– how many measurements?– spatial layout of the locations?– what to measure at each location?

• Modelling– probability model for the signal, [S]

– conditional probability model for the mea-surements, [Y |S]

• Estimation– assign values to unknown model parame-

ters– make inferences about (functions of) model

parameters

• Prediction– evaluate [T |Y ], the conditional distribution

of the target given the data

12

Page 15: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

7. Model-Based Geostatistics

• declares explicit stochastic model

• apply general statistical principles for infe-rence

Notation

(Yi, xi) : i = 1, ..., n

• {xi : i = 1, ..., n} is the sampling design

• {Y (x) : x ∈ A} is the measurement process

• {S(x) : x ∈ A} is the signal process

• T = F(S) is the target for prediction

• [S, Y ] = [S][Y |S] is the geostatistical model

13

Page 16: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Traditional geostatistics:

• avoids explicit references to the parametricspecification of the model

• inference via variograms(Matheron’s “estimating and choosing”)

• complex variogram structures are often used

• concentrates on linear estimators

• specific methods/paradigms for:– point prediction (SK, OK, KTE, UK)

– prediction of non-linear functionals (IK,DK)

– predictive estimation (IK, DK)

– simulations from the predictive distribu-tion(SGSIM, SISIM, ...)

• the kriging menu

14

Page 17: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART II:

SPATIAL PREDICTIONAND

GAUSSIAN MODELS

1. The Gaussian Model

2. Specification of the Correlation Function

3. Stochastic Process Prediction

4. Linear Geostatistics

5. Prediction under the Gaussian Model

6. What does Kriging Actually do to the Data

7. Prediction of Functionals

8. Directional Effects

9. Non-stationary Gaussian Models

Page 18: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

1. The Gaussian Spatial Model

(a) S(·) is a stationary Gaussian process withi. E[S(x)] = µ,

ii. Var{S(x)} = σ2

iii. ρ(u) = Corr{S(x), S(x− u)};

(b) the conditional distribution of Yi given S(·) isGaussian with mean S(xi) and variance τ 2;

(c) Yi : i = 1, ..., n are mutually independent, con-ditional on S(·).

0.0 0.2 0.4 0.6 0.8 1.0

4.0

4.5

5.0

5.5

6.0

6.5

7.0

locations

data

simulated data in 1-D illustrating the elements of the mo-del: data Y (xi) (dots), signal S(x) (curve line) and mean µ.(horizontal line).

16

Page 19: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

An Equivalent Formulation:

Yi = S(xi) + εi : i = 1, ..., n.

where εi : i = 1, ..., n are mutually independent,identically distributed with εi ∼ N(0, τ 2).

Then, the joint distribution of Y is multivariateNormal,

Y ∼ MVN(µ1, σ2R + τ 2I)

where:1 denotes an n-element vector of ones,I is the n× n identity matrixR is the n × n matrix with (i, j)th element ρ(uij)whereuij = ||xi − xj||, the Euclidean distance betweenxi and xj.

17

Page 20: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

2. Specification of the CorrelationFunction

Differentiability of Gaussian processes

• A formal mathematical description of thesmoothness of a spatial surface S(x) is its de-gree of differentiability.

• A process S(x) is mean-square continuous if,for all x, E[{S(x + h)− S(x)}2] → 0 as h→ 0.

• S(x) is mean square differentiable if thereexists a process S ′(x) such that, for all x,

E

[{S(x + h)− S(x)

h− S ′(x)

}2]→ 0 as h→ 0

• the mean-square differentiability of S(x) isdirectly linked to the differentiability of itscovariance function

Theorem 3 Let S(x) be a stationary Gaussianprocess with correlation function ρ(u) : u ∈ IR.Then:

• S(x) is mean-square continuous iff ρ(u) is con-tinuous at u = 0;

• S(x) is k times mean-square differentiable iffρ(u) is (at least) 2k times differentiable at u =0.

18

Page 21: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(a) The Matern family

The correlation function is given by:

ρ(u) = {2κ−1Γ(κ)}−1(u/φ)κKκ(u/φ)

• κ and φ are parameters

• Kκ(·) denotes modified Bessel function oforder κ

• valid for φ > 0 and κ > 0.

• κ = 0.5: exponential correlation function

• κ→∞: Gaussian correlation function

• S(x) is mean-square dκ − 1 times differen-tiable

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

distance

corr

elat

ion

Three examples of the Matern correlation functionwith φ = 0.2 and κ = 1 (solid line), κ = 1.5 (dashedline) and κ = 2 (dotted line).

19

Page 22: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) The powered exponential family

ρ(u) = exp{−(u/φ)κ}

• defined for φ > 0 and 0 < κ ≤ 2

• φ and κ are parameters

• mean-square continuous (but non-differentiable)if κ < 2

• mean-square infinitely differentiable if κ =2

• ρ(u) very ill-conditioned when κ = 2

• κ = 1: exponential correlation function

• κ = 2: Gaussian correlation functionConclusion: not as flexible as it looks

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

distance

corr

elat

ion

Three examples of the powered exponential correla-tion function with φ = 0.2 and κ = 1 (solid line), κ = 1.5

(dashed line) and κ = 2 (dotted line).

20

Page 23: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) The spherical family

ρ(u;φ) =

{1− 3

2(u/φ) + 12(u/φ)3 : 0 ≤ u ≤ φ

0 : u > φ

• φ > 0 is parameter

• finite range

• non-differentiable at the origin

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

distance

corr

elat

ion

The spherical correlation function with φ = 0.6.

21

Page 24: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comparable Simulations (same seed)

0.0 0.2 0.4 0.6 0.8 1.0

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

x

y

simulations with Matern correlation functions with φ = 0.2 and κ =

0.5 (solid line), κ = 1 (dashed line) and κ = 2 (dotted line).

0.0 0.2 0.4 0.6 0.8 1.0

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

x

y

simulations with powered exponential correlation function with φ =

0.2 and κ = 1 (solid line), κ = 1.5 (dashed line) and κ = 2 (dotted line).

0.0 0.2 0.4 0.6 0.8 1.0

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

x

y

simulations with spherical correlation function (φ = 0.6).

22

Page 25: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

3. Stochastic Process Prediction

General results for prediction

goal: predict the realised value of a (scalar)r.v. T , using data y a realisation of a (vector)r.v. Y .

predictor: of T is any function of Y , T = t(Y )

best choice: needs a criterion

MMSPE: the best predictor minimises

MSPE(T ) = E[(T − T )2]

Theorem 1.The minimum mean square error predictor of Tis

T = E(T |Y ).

Theorem 2.(a) The prediction mean square error of T is

E[(T − T )2] = EY [Var(T |Y )],

(the prediction variance is an estimate of theMSPE).(b) E[(T − T )2] ≤ Var(T ), with equality if T and Yare independent random variables.

23

Page 26: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comments

• We call T the least squares predictor for T ,and Var(T |Y ) its prediction variance

• Var(T ) − Var(T |Y ) measures the contributionof the data (exploiting dependence between Tand Y )

• point prediction, prediction variance aresummaries

• complete answer is the distribution [T |Y ]

• not transformation invariant:T the best predictor for T does NOT necessa-rily imply that g(T ) is the best predictor forg(T ).

24

Page 27: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

4. Linear Gaussian Geostatistics

Suppose the target for prediction is T = S(x)

A predictor for T is a function T = T (Y )

The mean square prediction error (MSPE)is

MSPE(T ) = E[(T − T )2]

The the predictor which minimises MSPE is

T = E[S(x)|Y ]

Two approaches:

• Model-based geostatistics:– specify a probability model for [Y, T ]

– choose T to minimise MSPE(T ) amongstall functions T (Y )

• Traditional (linear) geostatistics:– Assume that T is linear in Y , so that

T = b0(x) +

n∑i=1

bi(x)Yi

– Choose bi to minimise MSPE(T ) withinthe class of linear predictors

Coincident results under Gaussian assumpti-ons

25

Page 28: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

5. Prediction Under The GaussianModel

• assume that the target for prediction is T =S(x)

• [T, Y ] are jointly multivariate Gaussian.

• T = E(T |Y ), Var(T |Y ) and [T |Y ] can be easilyderived from a standard result:

Theorem 4. Let X = (X1, X2) be jointly multi-variate Gaussian, with mean vector µ = (µ1, µ2)and covariance matrix

Σ =

[Σ11 Σ12

Σ21 Σ22

],

ie X ∼ MVN(µ,Σ). Then, the conditional distri-bution of X1 given X2 is also multivariate Gaus-sian, X1|X2 ∼ MVN(µ1|2,Σ1|2), where

µ1|2 = µ1 + Σ12Σ−122 (X2 − µ2)

and

Σ1|2 = Σ11 − Σ12Σ−122 Σ21.

26

Page 29: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

For the geostatistical model:

[T, Y ] is multivariate Gaussian with mean vec-tor µ1 and variance matrix

[σ2 σ2r′

σ2r τ 2I + σ2R

]

where r is a vector with elements ri = ρ(||x −xi||) : i = 1, ..., n.

Hence, using Theorem 4 with X1 = T and X2 =Y , we find that the minimum mean square errorpredictor for T = S(x) is

T = µ + σ2r′(τ 2I + σ2R)−1(Y − µ1) (1)

with prediction variance

Var(T |Y ) = σ2 − σ2r′(τ 2I + σ2R)−1σ2r. (2)

27

Page 30: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Notes

1. Because the conditional variance does not de-pend on Y , the prediction mean square error isequal to the prediction variance.

2. Equality of prediction mean square error andprediction variance is a special property of themultivariate Gaussian distribution, not a gene-ral result.

3. In conventional geostatistical terminology,construction of the surface S(x), where T = S(x)is given by (1), is called simple kriging. Thisname is a reference to D.G. Krige, who pionee-red the use of statistical methods in the SouthAfrican mining industry (Krige, 1951).

28

Page 31: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

6. What Does Kriging Actually Do tothe Data?

The minimum mean square error predictor forS(x) is given by

T = S(x) = µ +

n∑i=1

wi(x)(Yi − µ)

= {1−n∑i=1

wi(x)}µ +

n∑i=1

wi(x)Yi

• the predictor S(x) compromises between itsunconditional mean µ and the observed dataY

• the nature of the compromise depends on thetarget location x, the data-locations xi andthe values of the model parameters.

• call the wi(x) the prediction weights.

29

Page 32: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

6.1 Effects on predictions

(a) Varying the correlation function

0.0 0.2 0.4 0.6 0.8 1.0

−2

−1

0

1

2

locations

pred

icte

d si

gnal

Predictions from 10 equally spaced data-points using expo-nential (solid line) or Matern of order 2 (dashed line) correla-tion functions.

0.0 0.2 0.4 0.6 0.8 1.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

0.5

locations

pred

icte

d si

gnal

Predictions from 10 randomly spaced data-points using expo-nential (solid line) or Matern of order 2 (dashed line) correla-tion functions.

30

Page 33: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) Varying the correlation parameter

0.0 0.2 0.4 0.6 0.8 1.0

−0.5

0.0

0.5

1.0

1.5

2.0

locations

pred

icte

d si

gnal

Predictions from 10 randomly spaced data-points using theMatern (κ = 2) correlation function and different values of φ:0.05 (solid line), 0.1 (dashed line) and 0.5 (thick dashed line).

31

Page 34: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) Varying the noise-to-signal ratio

0.0 0.2 0.4 0.6 0.8 1.0

−0.5

0.0

0.5

1.0

1.5

2.0

locations

pred

icte

d si

gnal

Predictions from 10 randomly spaced data-pointsusing the Matern correlation function and differentvalues of τ 2: 0 (solid line), 0.25 (dashed line) and 0.5

(thick dashed line).

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.1

0.2

0.3

0.4

locations

pred

ictio

n va

rianc

e

Prediction variances from 10 randomly spaced data-points using the Matern correlation function and dif-ferent values of τ 2: 0 (solid line), 0.25 (dashed line) and0.5 (thick dashed line).

32

Page 35: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

6.2 Effects on kriging weights

(a) The prediction weights: varying φ

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6φ = 0

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

φ = 0.05

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

φ = 0.15

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

φ = 0.3

data locations

pred

ictio

n w

eigh

ts

Prediction weights for 10 equally spaced data-points with tar-get location x = 0.50.

i. varying parameter φ = 0, 0.05, 0.15, 0.30

ii. locations: equally spaced xi = −0.05 + 0.1i :i = 1, ..., 10

iii. prediction location: x = 0.50

iv. correlation function: Matern with κ = 2

v. nugget: τ 2 = 0

33

Page 36: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) The prediction weights: varying κ

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

κ = 0.5

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

κ = 1

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

κ = 2

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

−0.

20.

00.

20.

40.

6

κ = 5

data locations

pred

ictio

n w

eigh

ts

Prediction weights for 10 equally spaced data-points with tar-get location x = 0.50.

i. varying parameter κ = 0.5, 1, 2, 5

ii. locations: equally spaced xi = −0.05 + 0.1i :i = 1, ..., 10

iii. prediction location: x = 0.50

iv. correlation function: Matern with φ = 0.1

v. Nugget: τ 2 = 0

34

Page 37: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) The prediction weights: varying τ 2

0.2 0.4 0.6 0.8

0.0

0.4

0.8

τ2 = 0

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

0.0

0.4

0.8

τ2 = 0.1

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

0.0

0.4

0.8

τ2 = 0.25

data locations

pred

ictio

n w

eigh

ts

0.2 0.4 0.6 0.8

0.0

0.4

0.8

τ2 = 0.5

data locations

pred

ictio

n w

eigh

ts

Prediction weights for 10 equally spaced data-points with tar-get location x = 0.45.

i. varying parameter τ 2 = 0, 0.1, 0.25, 0.5

ii. locations: equally spaced xi = −0.05 + 0.1i :i = 1, ..., 10

iii. prediction location: x = 0.45

iv. correlation function: Matern with κ = 2and φ = 0.1

35

Page 38: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

7. Prediction of Functionals

Let T be any linear functional of S,

T =

A

w(x)S(x)dx

for some prescribed weighting function w(x).

Under the Gaussian model:

• [T, Y ] is multivariate Gaussian;

• [T |Y ] is univariate Gaussian;

• the conditional mean and variance are:

E[T |Y ] =

A

w(x)E[S(x)|Y ]dx

Var[T |Y ] =

A

A

w(x)w(x′)Cov{S(x), S(x′)}dxdx′

Note in particular that

T =

A

w(x)S(x)dx

36

Page 39: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

In words:

• given a predicted surface S(x), it is legitimatesimply to calculate any linear property of thissurface and to use the result as the predictorfor the corresponding linear property of thetrue surface S(x)

• it is NOT legitimate to do this for predictionof non-linear properties

• for example, the maximum of S(x) is a verybad predictor for the maximum of S(x) (thisproblem will be addressed later)

37

Page 40: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

8. Directional Effects

• Environmental conditions can induce directi-onal effects (wind, soil formation, etc).

• As a consequence the spatial correlation mayvary with the direction.

0.1

0.2

0.3

0.4

0.5

0.6 0.7

0.8

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0 0.1

0.2

0.3

0.4

0.5

0.6

0.7

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

−1.0 −0.5 0.0 0.5 1.0

−1.0

−0.5

0.0

0.5

1.0

correlation contours for a isotropic model (left) andtwo anisotropic models (center and right).

• a possible approach: geometric anisotropy.

• two more parameters: the anisotropy angleψA and the anisotropy ratio ψR.

• rotation and stretching of the original coordi-nates:

(x1′, x2′) = (x1, x2)

(cos(ψA) − sin(ψA)sin(ψA) cos(ψA)

)(1 00 1

ψR

)

38

Page 41: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Geometric Anisotropy Correction

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Original Space, ψ1 = 0°, ψ2 = 2

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

31 32 33 34 35 36

x

0.0 0.2 0.4 0.6 0.8 1.0

−0.

20.

00.

20.

40.

6

Isotropic Space

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

31 32 33 34 35 36

x

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Original Space, ψ1 = 120°, ψ2 = 2

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

31 32 33 34 35 36

x

−1.4 −1.2 −1.0 −0.8 −0.6 −0.4 −0.2 0.0

−0.

6−

0.4

−0.

20.

00.

20.

40.

60.

8

Isotropic Space

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

x

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Original Space, ψ1 = 45°, ψ2 = 2

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

31 32 33 34 35 36

x

−0.6 −0.4 −0.2 0.0 0.2 0.4 0.6

−0.

4−

0.2

0.0

0.2

0.4

0.6

0.8

1.0

Isotropic Space

12

34

56

78

910

1112

1314

1516

1718

1920

2122

2324

2526

2728

2930

3132

3334

3536

x

39

Page 42: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

9. Non-Stationary Gaussian Models

Stationarity is a convenient working assump-tion, which can be relaxed in various ways.

• Functional relationship between meanand variance?Can sometimes be resolved by a transforma-tion of the data.

• Non-constant mean?Replace constant µ by

µ(x) = Fβ =

k∑j=1

βjfj(x)

for measured covariates fj(x) (or non-linearversions).

Note: sometimes called universal krigingor kriging with external trend.

40

Page 43: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

• Non-stationary random variation?

Intrinsic variation a weaker hypothesisthan stationarity (process has stationary in-crements, cf random walk model in timeseries), widely used as default model fordiscrete spatial variation (Besag, York andMolie, 1991).

Spatial deformation methods (Sampsonand Guttorp, 1992) seek to achieve statio-narity by transformation of the geographicalspace, x.

• as always, need to balance increased fle-xibility of general modelling assumptionsagainst over-modelling of sparse data, lea-ding to poor identifiability of model parame-ters.

41

Page 44: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART III:

PARAMETRIC ESTIMATION

1. Second-Moment Properties

2. Variogram Analysis

3. Likelihood Inference

4. Plug-in Prediction

5. Gaussian Transformed Models

6. A Case Study

7. Anisotropic Models

8. Model Validation

Page 45: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

1. Second-Moment Properties

• the variogram of a process Y (x) is the func-tion

V (x, x′) =1

2Var{Y (x)− Y (x′)}

• for the spatial Gaussian model, with u = ||x−x′||,

V (u) = τ 2 + σ2{1− ρ0(u)}

• basic structural parameters of the spatialGaussian model are:– the nugget variance: τ 2

– the sill: σ2 = Var{S(x)}

– the total sill: τ 2 + σ2 = Var{Y (x)}

– the range: φ, such ρ0(u) = ρ(u/φ)

• practical implications:– any reasonable version of the (linear) spa-

tial Gaussian model has at least three pa-rameters

– but you need a lot of data (or contextualknowledge) to justify estimating more thanthree parameters

• the Matern family uses a fourth parameterto determine the differentiability of S(x)

43

Page 46: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Paradigmas for parameter estimation

• Ad hoc (variogram based) methods– compute the empirical variogram

– fit a theoretical covariance model

• Likelihood-based methods– typically under Gaussian assumptions

– Optimal under stated assumptions, butcomputationally expensive and may lackrobustness?

• Bayesian implementation,combining estimation with prediction, beco-ming more widely accepted (amongst statis-ticians!)

44

Page 47: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

2. Variogram Analysis

• The variogram is defined by

V (x, x′) =1

2Var{Y (x)− Y (x′)}

• if Y (x) is stationary,

V (x, x′) = V (u) =1

2E[{Y (x)− Y (x′)}2]

where u = ||x− x′||

• suggests an empirical estimate of V (u):

V (u) = average{[y(xi)− y(xj)]2}

where each average is taken over all pairs[y(xi), y(xj)] such that ||xi − xj|| ≈ u

• for a process with non-constant mean (cova-riates), trend-removal can be used as follows:– define ri = Yi − µ(xi)

– define V (u) = average{(ri − rj)2},

where each average is taken over all pairs(ri, rj)

45

Page 48: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(a) The variogram cloud

• define the quantities

ri = Yi − µ(xi)

uij = ||xi − xj||vij =

(ri − rj)2

2

• the variogram cloud is a scatterplot ofthe points (uij, vij)

Example: Swiss rainfall data

0 50 100 150 200

050000

100000

150000

distance

sem

ivariance

• under the spatial Gaussian model:– vij ∼ V (uij)χ

21

– the vij are correlated

• the variogram cloud is therefore unstable,both pointwise and in its overall shape

46

Page 49: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) The empirical variogram

• derived from the variogram cloud by ave-raging within bins: u−h/2 ≤ uij < u+h/2

• forms k bins, each averaging over nk pairs

• removes the first objection to the vario-gram cloud, but not the second

• is sensitive to mis-specification of µ(x)

Example: Swiss rainfall data.

0 50 100 150 200

05000

10000

15000

distance

sem

ivariance

Empirical binned variogram

47

Page 50: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(c) The fitted variogramEstimate θ to minimise a particular crite-riumeg, the weighted least squares (Cressie, 1993)

S(θ) =∑

k

nk{[Vk − V (uk; θ)]/V (uij; θ)}2

where Vk is average of nk variogram ordinatesvij.Other criteria: OLS, WLS with weights givennk only, ...• Corresponds to biased estimating equation

for θ, although still widely used in practice.• Potentially misleading because of inherent

correlations amongst successive Vk

Example: Swiss rainfall data

0 50 100 150 200

020

40

60

80

100

distance

sem

ivariance

WLS 1WLS 2OLS

Empirical variogram with WLS with nk only (thickline), Cressie’s WLS (full line) and OLS (dashed line)

48

Page 51: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Further comments on empirical vario-grams(a) Variograms of raw data and residuals can be

very different

Example: Parana rainfall data.

•• •

• •

• ••

••

••

• •

0 100 200 300 400

0

1000

2000

3000

4000

5000

6000

••

• •

• • •

•• •

••

••

• •

••

0 100 200 300 400

0

200

400

600

800

1000

empirical variograms of raw data (left-hand panel)and of residuals after linear regression on latitude,longitude and altitude (right-hand panel).

• variogram of raw data includes variationdue tolarge-scale geographical trend

• variogram of residuals eliminates thissourceof variation

49

Page 52: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) How unstable are empirical variograms?

• •• • •

•• •

Distance

Sem

i-var

ianc

e

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

••

• •

• •

••

••

• thick solid line shows true underlying va-riogram

• fine lines show empirical variograms fromthreeindependent simulations of the same mo-del

• high autocorrelations amongst V (u) forsuccessivevalues of u imparts misleading smoothness

50

Page 53: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

3. Likelihood Inference

(a) Maximum likelihood estimation (ML)The Gaussian model is given by:

Yi|S ∼ N(S(xi), τ2)

• S(xi) = µ(xi) + Sc(xi)

• Sc(·) is a zero mean stationary Gaus-sian process with covariance parameters(σ2, φ, κ),

• µ(xi) = Fβ =∑k

j=1 fk(xi)βk, where fk(xi) isa vector of covariates at location xi

Which allows us to write:

Yi = µ(xi) + Sc(xi) + εi : i = 1, ..., n

ThenY ∼ MVN(Fβ, σ2R + τ 2I)

and the likelihood function is

L(β, τ, σ, φ, κ) ∝ −0.5{log |(σ2R + τ 2I)| +(y − Fβ)′(σ2R + τ 2I)−1(y − Fβ)}.

which maximisation yields the maximum li-kelihood estimates of the model parameters.

51

Page 54: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Maximum likelihood estimation (cont.)

Computational details:• reparametrise ν2 = τ2

σ2 and denote:σ2V = σ2(R + ν2I)

• the log-likelihood function is maximisedforβ(V ) = (F ′V −1F )−1F ′V −1y

σ2 = n−1(y − Fβ)′V −1(y − Fβ)

• then substituting (β, σ2) by (β, σ2) in 3 themaximisation reduces to

L(τr, φ, κ) ∝ −0.5{n log |σ2| + log |(R + ν2I)|}

• For the Matern correlation function wesuggest to take κ in a discrete set{0.5, 1, 2, 3, ..., N}

52

Page 55: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

(b) Restricted Maximum Likelihood Esti-mation (REML)

The REML principle is as follows:• under the assumed model for E[Y ] = Fβ,

transform the data linearly to Y ∗ = AYsuch that the distribution of Y ∗ does notdepend on β;

• estimate θ = (ν2, σ2, φ, κ) by maximum like-lihood applied to the transformed data Y ∗

We can always find a suitable matrix Awithout knowing the true values of β or θ, forexample

A = I − F (F ′F )−1F ′

The REML estimators for θ can be computedby maximising

L∗(θ) ∝ −1

2

{log |σ2V | + log |F ′{σ2V }−1F |

+(y − Fβ)′{σ2V }−1(y − Fβ))},

where β = β(θ).

53

Page 56: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comments on REML

• introduced in the context of variance com-ponents estimation in designed experi-ments (Patterson and Thompson, 1971)

• leads to less biased estimators in smallsamples

• L∗(θ) depends on F , and therefore on a cor-rect specification of the model for µ(x),

• L∗(θ) does not depend on the choice of A.

• Given the model for µ(·), the method enjoysthe same objectivity as does maximum li-kelihood estimation.

• widely recommended for geostatistical mo-dels.

• REML is more sensitive than ML to mis-specification of the model for µ(x).

• for designed experiments the mean µ(x) isusually well specified

• however in the spatial setting there is nosharp distinction between µ(x) and Sc(x).

54

Page 57: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Profile likelihood

Variability of the parameter estimators can beinspected by looking at the log-likelihood sur-face defined by (3).Surface reflects the information contained inthe data about the model parameters.Dimension of the log-likelihood surface does notallow direct inspection.

Computing profile likelihoods

• Suppose a model with parameters (α, ψ)

• denote its likelihood by L(α, ψ)

• To inspect the likelihood for α replace thenuisance parameters ψ by their ML estima-tors ψ(α), for each value of α

• This gives the profile likelihood:

Lp(α) = L(α, ψ(α)) = maxψ

(L(α, ψ)).

55

Page 58: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

4. Plug-In Prediction – Kriging

Quite often the interest is to predict

• the realised value of the process S(·) at apoint,

• or the average of S(·) over a region,

T = |B|−1

B

S(x)dx

where |B| denotes the area of the region B.

For the Gaussian model we’ve seen that the mi-nimum MSPE predictor for T = S(x) is

T = µ + σ2r′(τ 2I + σ2R)−1(Y − µ1)

with prediction variance

Var(T |Y ) = σ2 − σ2r′(τ 2I + σ2R)−1σ2r

where the only unknowns are the model para-meters.

The plug-in prediction consists of replacingthe true parameters by their estimates.

56

Page 59: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comments

• ML estimates and simple kriging

• REML estimates and ordinary kriging

• The ad-hoc prediction(a) Estimate β by OLS, β = (F ′F )−1F ′Y , and

construct residuals Z = Y − Fβ.(b) Calculate the empirical variogram of Z

and use it for model formulation and pa-rameter estimation.

(c) Re-estimate β by GLS and use the fittedmodel for prediction.

• the role of empirical variograms:– diagnostics (model-based approach)– inferential tool (ad-hoc approach)

• The conventional approach to kriging (M-Bor ad-hoc) is to plug-in estimated parametervalues and proceed as if the estimates werethe truth.This approach:– usually gives good point predictions when

predicting T = S(x)

– but often under-estimates prediction vari-ance

– and can produce poor results when predic-ting other targets T

57

Page 60: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

5. Gaussian Transformed Models

The Gaussian model might be clearly inappro-priate for variables with asymmetric distributi-ons.Some flexibility with an extra parameter λ defi-ning a Box-Cox transformation.Terminology: Gaussian-transformed model.

The model is defined as follows:

• assume a variable Y ∗ ∼MVN(Fβ, σ2V )

• the data, denoted y = (y1, ..., yn), are genera-ted by a transformation of the linear Gaus-sian model Y = h−1

λ (Y ∗) such that:

Y ∗i = hλ(Y ) =

{(yi)

λ−1λ if λ 6= 0

log(yi) if λ = 0

The log-likelihood is:

`(β, θ, λ) = − 1

2{log |σ2V |+(hλ(y)− Fβ)′{σ2V }−1(hλ(y)− Fβ)}

+

n∑i=1

log((yi)

λ−1)

58

Page 61: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Notes:

• λ = 0 corresponds to a particular case: log-Gaussian model

• Inference strategies:(a) λ as a random parameter (De Oliveira, Ke-

dem and Short, 1997). Prediction averagesover a range of models.

(b) An alternative approach is to estimate λand then fix it equal to the estimate whenperforming prediction (Christensen, Dig-gle and Ribeiro, 2000).i. find the best transformation maximising

the profile likelihood for λ

ii. fix the transformation, transform data

iii. inferences on transformed scale

iv. back-transform resultsDifficulties with negative values and back-transformation.

59

Page 62: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

6. A Case Study: Swiss rainfall data

E-W (km)

N-S

(km

)

0 100 200 300

0

50

100

150

200

Locations of the data points with points size proportionalto the value of the observed data. Distances are in kilo-metres.

• 467 locations in Switzerland

• daily rainfall measurements on 8th of May1986

• The data values are integers where the unitof measurement is 1/10 mm

• 5 locations where the value is equal to zero.

60

Page 63: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall data (cont.)

Estimating the transformation parameter andusing the Matern model for the correlation func-tion.

κ λ log L0.5 0.514 -2464.246

1 0.508 -2462.4132 0.508 -2464.160

Maximum likelihood estimate λ and the corresponding valueof the log-likelihood function log L for different values of theMatern parameter κ.

•• • •

0.40 0.50 0.60

-2466

-2465

-2464

-2463

••• •

0.40 0.50 0.60

-2466

-2465

-2464

-2463

••• •

0.40 0.50 0.60

-2466

-2465

-2464

-2463

Profile likelihoods for λ. Left: κ = 0.5, middle: κ = 1, right:κ = 2. The two lines correspond to 90% and 95% percentquantiles for a 1

2χ2(1)-distribution.

Neither untransformed nor log-transformed areindicated.

61

Page 64: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall data (cont.)

Estimates for the model with λ = 0.5

κ β σ2 φ τ 2 log L0.5 18.36 118.82 87.97 2.48 -2464.315

1 20.13 105.06 35.79 6.92 -2462.4382 21.36 88.58 17.73 8.72 -2464.185

Maximum likelihood estimates β, φ, σ, τ and the correspon-ding value of the likelihood function log L for different valuesof the Matern parameter κ, for λ = 0.5

••• •

••

••50 100 150 200 250 300

-2464.0

-2463.5

-2463.0

-2462.5

••• •

••

•20 30 40 50 60 70

-2464.0

-2463.5

-2463.0

-2462.5

••••

••

5 6 7 8 9

-2464.0

-2463.5

-2463.0

-2462.5

Profile likelihood for covariance parameters with κ = 1 andλ = 0.5. Left: σ2, middle: φ, right: τ 2.

62

Page 65: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall data (cont.)

• •• • • •

••

••

Distance

Sem

i-var

ianc

e

0 50 100 150 200

0

20

40

60

80

100

120

Estimated semivariances for square-root transformed data(dot-dashed line), compared with the theoretical semivario-gram model, with parameters equal to the maximum like-lihood estimates. κ = 0.5 (dashed line), κ = 1 (thick solidline), κ = 2 (thin solid line).

63

Page 66: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall data (cont.)

0 100 200 300

-100

0

100

200

0 100 200 300 400 500

0 100 200 300

-100

0

100

200

0 5000 10000 15000

Maps with predictions (left) and prediction variances(right).

Prediction of the percentage of the area whereY (x) ≥ 200: A200 is 0.4157

0.410 0.415 0.420

0

100

200

300

400

500

600

Samples from the predictive distribution of A200.

64

Page 67: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

7. Estimating Anisotropic ModelsLikelihood based methods

• Two more parameters for the correlationfunction

• Increases the dimension of the numerical mi-nimisation problem

• In practice a lot of data might be needed

Variogram based methods

• Compute variograms for different directions

• Tolerance angles, in particular for irregularlydistributed data

• Fit variogram model using directional vario-grams

0 50 100 150 200

050

0010

000

1500

020

000

2500

0

yy

omnid.0°45°90°135°

0 50 100 150 200

050

0010

000

1500

020

000

2500

0

yy

omnid.30°75°120°165°

Directional variograms for the Swiss rainfall data.

65

Page 68: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

8. Model Validation and Comparison:

Using validation dataData divided in two groups: data for model fit-ting and data for validationFrequently in practice data are scarce and to ex-pensive to be left out

“Leaving-one-out”

• One by one, for each datum:(a) remove the datum from the data-set

(b) (re-estimate model parameters)

(c) predict at the datum location

• Compare original data with predicted values.

66

Page 69: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

100 fitting data + 367 validation data

0 100 200 300 400 500

010

020

030

040

050

0

data

pred

icte

d

+x +x +x ++ +

+

x+x

+

x

++++

x

+x

+

+

+x

++

x

+x +

x

+

+x

x

+

+

+++x

x

x+

+

x+

x

x

x

+

x

+x

x

+

+

x

+

++

x

+

+

x

x

+

x

x

+

x+

x

x +

xx +

+xx

+

x

xx

x

x

+

+

+

+

xx

+

+

+

+

x

+

x

x

x

+

++x +

+

x x+

x

+x

xx

+x+x+

x

+

x+xx

++

x

+

+

+x

+

x

+xx

+ ++

+

x

x

+

x

+

xx

+

x

+

+

x

x+

+

+

xx

+

xx

x+

+

+

+

x+

++x

+

+

xx

+

+

+

+

x++

+

+

+

x + ++ +

+

+

+

x

xx

x

+x

+x

++xx

x

+

+xx +++x

+x

x +++

+

xx

+x

xx

+

x

xx+x+

x

+x

x

x

++x

x

x+x

++

++x

+

x

+

x

x +

+

+++

x+x

x

+xx

+

xx

+

x+

+

xx

x

+

++ +

x

xx

xx

+

x ++ +

x

x

x

x+x

++

+

x

x

x

xx

++

x xx+x

+xxx

+

xx

+

+x

x+x

+xx

xx +

x

x

x

x++

x +

+

+

x x

+

xx

++

x

x++

+

++xxx xxx

0 50 150 250 350

−50

050

150

250

Coord X

Coo

rd Y

+

+

+x

xx

+

+x x

+

x

x

x

+

+

xx

+

xx

x

+++

+ x

+

x

x

x

x

+

x

x+

++

x+

+

x

+

+

+

x +

+

x

x

+

x

x

x

+x

+

+

+

+x

x+

x

+

+x

+

+x

+

+

x

+

xx

+

x

x++ x

xx+

x x

x

x

+ x+

x

x+

x

+

+x

+

+

xx

+

++

++

+ +

x

++

x

++

x

x

+

+

x+ x

+

+

++

x

x+

+

+ xx x

x+x

+x

x

+

+

x

++

+

++

x+

x x

x

x+

x

x

x+

+

+

x

x+x

+ x

+

+

x

+

+

x

x

++

x

x

x

x

x

+

+x+

+

++

x+

++

x+xx

+

+

x

+

x

x

x

+xx

x

+ xx

+

+

x

+

x

x

x

+

x

+x

x

+x

+

+

xx

x+

x++

x

+

+

x

x

xxx

x

x

+xx

++

+

xx

+

x

+x

x

+x

x

x

+

x

x

+

++

x

x

x

++

x

+

+

+

x+

x

xx

xx

+

+

x

xx

x

x x+

+

+

+

+x x+

x

x

x+

x

xx+

x

x

++

+

+

x

+x

x x +x +

+x+

+

x+ ++xx++

++

+x

+

++

+x

+

+ x+

x

++

+

++

+xxx

+

+ ++xx +

+

+x+

data − predicted

Den

sity

−200 −100 0 100 200

0.00

00.

002

0.00

40.

006

0.00

8

0 100 200 300 400 500

−20

00

100

200

predicted

data

− p

redi

cted

+x

+

x

+

x++ +

+ x+

x

+

x

+

+++ x+

x+

+

+

x

++

x

+

x

+x

+

+x

x+ +

+

++

xx

x

+

+

x

+

x

x x

+

x +

x

x

+ +

x

++ +

x

++x

x

+

x

x

+

x+

xx

+

x x

++x

x

+

xx

x x

x

++

+

+

xx

+

+ ++

x

+x xx +++

x

++

xx

+

x

+

xxx

+x+

x+

x

+

x+x x

++

x+

+

+x

+

x

+

x x+

+

++

xx

+x

+

xx

+

x+

+

xx

+

++

x x

+x xx

+ +++

x

++

+x

+ +xx

+++

+x

++

++ +x

++

+

+

+

++

xx

xx+

x

+x

++

xx x

++

xx+

+ +x

+

x

x

+++

+

xx

+

x

x

x

+

xx x+

x+x

+x

xx

+ +x

xx

+x

+ +++

x +x+

x

x

+ ++ ++x+x x

+x

x

+

xx

+

x++

xxx

+++

+

xxx

xx

+

x++

+xxx x

+

x

++

+

x xx

x x+

+

xx

x

+x

+

xx x

+x x

++

x

x

+

x

+

xx xx

+

x

xxx

+

+

x

+

++

x

x

+

xx

+

+

xx

++

+++

xx xx

xx

0 100 200 300 400 500

−20

00

100

200

data

data

− p

redi

cted

+x

+

x

+

x++

++ x

+x

+

x

+

+ + +x

+

x+

+

+

x

++

x

+

x

+x

+

+x

x+ +

+

+ +

xx

x

+

+

x

+

x

x x

+

x +

x

x

+ +

x

++ +

x

++x

x

+

x

x

+

x+

xx

+

x x

++x

x

+

xx

x x

x

++

+

+

xx

+

+ ++

x

+x xx +++

x

++

xx

+

x

+

xxx

+x+

x+

x

+

x+

x x+

+

x+

+

+x

+

x

+

x x+

+

++

xx

+x

+

xx

+

x+

+

xx

+

++

x x

+x xx

+ +++

x

++

+x

+ +x x

+++

+x

++

++ +x

++

+

+

+

++

xx

xx+

x

+x

++xx x

++

xx+

++x

+

x

x

+++

+

xx

+

x

x

x

+

xx x

+x+ x

+x

xx

+ +x

xx+

x++

++x +x

+

x

x

+ +++ +x+x x

+x

x

+

xx

+

x++

xxx

+++

+

xxx

xx

+

x++

+xxx x

+

x

++

+

x xx

xx+

+

xx

x

+x

+

xxx

+xx

++

x

x

+

x

+

xx xx

+

x

xxx

+

+

x

+

++

x

x

+

xx

+

+

xx

++

+++

xxx x

x x

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

theoretical prob

obse

rved

pro

b

0 50 150 250 350

−50

050

150

250

Coord X

Coo

rd Y +

x

+

+

x+

+

+x

xx

x

x

x

x

+

x

+ x+

+x

+

+x

x

x

x ++

+

+

x x

x

++

x

x

+

x

+

+

+ x

x

+

+

+

+

+xx

x

+

x

xxx

+

+

x

+

+ +

++

+

+

+

+

+

x

+

+

x

+

+

x+

x

x

x+

x

x

++

+

x

x

+xx xx

+

+

+

x

x+

x

+

+

+

+xx

+

x

+

xx

+

x+

+

x

+

x

+

++

++

+

+

+

x x

x

x

+x

xx

x

x

+

x

x+

+x

x

+

+

+

xx

++

+

x

x

+

+

+

++

xx

x

x

+

+

x

+

x

x

x

+

x

+

+x

+

+

+

+

x+

xx+

+

x

++

+

x

+x

x

x

+x

x

x

+ x

x

x

+

+x

+

x

+

x

xx

x

xx

+

+

x+x

+

x

x

+

x

x

x+

x

xx

+

x+

x

x

+

x

x

x

+x

x

x +

xx

+

x

x

x x+

x

+

+

+

+

x

x+

x

x

x

+

+

x

++

+

xxx

++

x

++

x

xx

x+ +

x

x

+

x

+

xx+

x

x+

x

++ x

+ x

+

x

x

+xx

x

++

+

+

x

x

+

+

x

x

+x x

xx

x

+

+

x

xx

++

+

+x

x

+

+x

+x

++

++

+x

+

+

++

+

++

x+

++ ++

x+

+

++++

std error

Den

sity

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

0 100 200 300 400 500

−4

−2

02

4

predicted

std

erro

r

+x+x

+

x ++ ++ x+x

+

x

+

+++

x+

x+

+

+

x

+

+

x

+

x+

x

+

+ x x+ ++

++

xxx

++

x

+

x

xx

+x +

x

x

+ +

x

++ +

x

++x

x+

xx

+x +

xx

+

xx

++xx

+xx

x xx

++

+

+

xx

+

+ ++

x+

xxx +++

x

+ +

x

x

+

x+

xxx

+x

+

x+

x

+

x

+x x

++

x+

+

+

x

+

x

+

x x+

+

+ +

xx

+

x

+

xx

+

x

+

+

xx

+

+

+

xx

+x

xx

++++

x

++

+x

+ +

xx

++ ++

x+ +

+

+ +x

+

+

+

+

+

+

+

x

xxx+

x

+x

++

xx x

+

+

xx

++ +

x

+

xx

++

++

xx

+

x

xx

+

xx x

+

x+

x+

x xx

+ +x

xx

+

x+ +++

x +x+

x

x

++

+ ++x+x x+

xx

+

x

x

+

x+ +

xxx

++++

xxx

xx

+

x

++ +xxx x

+

x

+++

x x

xx x

++

xx

x

+x

+

xxx

+

x x

++

x

x

+

x

+

xxxx

+

x

xx

x

+

+

x

+

++

x

x

+

xx

+

+

x

x

+

+++

+

xx xxxx

0 100 200 300 400 500

−4

−2

02

4

data

std

erro

r

+x

+x

+

x ++ ++ x+x

+

x

+

+ ++

x+

x+

+

+

x

+

+

x

+

x+

x

+

+x x+ ++

+ +

xxx

++

x

+

x

xx

+x +

x

x

+ +

x

++ +

x

++x

x+

xx

+x +

xx

+

xx

++xx

+xx

x xx

++

+

+

xx

+

+ ++

x+

xxx +++

x

+ +

x

x

+

x+

xxx

+x+

x+

x

+

x

+x x

++

x+

+

+

x

+

x

+

x x+

+

+ +

xx

+

x

+

x x

+

x

+

+

xx

+

+

+

xx

+x

xx

++++

x

++

+x

+ +

xx

++ ++

x++

+

+ +x

+

+

+

+

+

+

+

x

xxx+

x

+x

++x

x x

+

+

xx

+++

x

+

x x

++

++

xx

+

x

xx

+

xx x

+

x+

x+

x xx

+ +x

xx

+

x++++

x +x+

x

x

++

++ +x+x x+

xx

+

x

x

+

x+ +xx

x

++++

xx

xxx

+

x

++ +xxx x

+

x

+++

x x

xxx

+ +

x x

x

+x

+

xxx

+

xx

++

x

x

+

x

+

xxxx

+

x

xx

x

+

+

x

+

++

x

x

+

xx

+

+

x

x

+

+++

+

xxx xx x

67

Page 70: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

all data – “leaving-one-out”

0 100 300 500

010

030

050

0

data

pred

icte

d+xx+ +xxxx+++xx+++xxxx

+x

xx

x++x

+

+x +x

x

x++++

+

+

+

++x

++

x +xx

++

+x

xxx

xx

++

+

+

+

x

x

x ++

x+x

+x

x +

+xx

+

x

xx

x

+

+

x

+

x

xx

+

x+

+

+x

x

+

x

+

xx +++x

x

+++

xx

+

x+

x

+

x

xxx

+

x+

x

+

x

++x

xx

x

+x

x

x

++

+

x

xx+

++

xx

+

x

x

x+

+x

+

x

+x

x

x

+

xxx+

x

+

x

x

x

x

+

+

xx

+

+

x

x

+

xx

+

+

++x

+

+x

x

x

x

x +

xx

x

+x

x

+x

+x

x

+

+x

+

+

+

++

+

xx

++

x+ +

+

+

+ +x

xx

+

x

+

x

++

x

+xx

x+

+

xxx

+

+

x

x

+

+

+x

+

+x+

x

+x

xx

+

x

x

x

xxx

xx

+

+

x

+

x

++

+

++

+

x

+

x

+

xx

x

++

xx

x

x

++

+

x

+ +

x

x

xx

xx

x

x

x

xx +

x+x

+

+

+

x

+x

+

+

x

++

x

++

+

x

xx

+

x+

x

xx

+

+

+x

xx

++

+x

+

+

+

+

x

xx

x

+

x

+

+

+

x

+x

+++

xx

+

+

+

++

+

x

+

x

x

x

x

+

+

+

+

x

x

+

x

xx

x+

x

+

x

+

x ++

+x

x

+x

+

x

x

x

x

+

++

++

x

xx+

+x ++x

++

x

x

x

x

+

x +x

+

+x

x+

+

+

x

x

x

xx +xx

xxx

+x ++

+xxxx

0 50 150 250 350

−50

050

150

250

Coord X

Coo

rd Y

+

+

x

x

x

x x+

x

x ++

x

+

xx

x

+

x

+

x

+

+

x

++

+

+x

x

+

+

x x+

x

+

x xx

+

x

+x

x++

x

+ +

+

x

+

+

x++

+

+

x

x +

x

x

+ x x

x x

x

x

x

+

x

xx

x

x

+

+

+

x

x x

x

x

x

x

+

+x

+

+

x

+

+

+

+

+

+

x

x

++

x

x

x

xxx

x

+

++

x

x

+

+

x

xx

+

+

x

x

x

+

x

+x

+++ x

x

++

x

+x

++

x

+

+

++

+

+x

+

x+x

x

x+

+

x

x

+

x

+

x

+ x

x

xx

+

xx

xx

++

x

x

+

x

+x

x

+x+

x

x+

+x

xx

x

+

++

x

+

x

+xx

x

x+

+x

+

x

+

+

x

x

+x

x xx

xx

x

x+

x

+

xx

+x+

x+

+

+

+

x

x

+

x

x

+x

x

++

xxxx

+

+

+++

+x

x

xxx

xx

x+ x

+

+x x

+

+

+

x

x

x

x

+x

x

+

+ x

+ x

x

x+

+

++

+

+

+

x

x

x+

+

xx

x

x+ +

xx+x

+

+

+

x

+

+xx

+

x

+x

x

x

x+

+

x

x x

+

x

xx

+

++

+x

xx

+

+

++

x

+ x

++

++

+

+

+

+

+

x

x

x

x

+

xx

+x

+

+

+

x

x++

+

x

xx

x

x+

x+

+

x+

+

+ xx

+

x

x

xx

x

+x

xx

x

x

+x

+ x

x

+

x

+

++

+

+

xx

x

+ xx x+

++x

+

++

+ +

x

+

++ x

+

+

+x

+ ++

+ x

+

+x xx+xx

x+

x

+

x

+x

+

++x

x

x+

+

+

x

data − predicted

Den

sity

−200 −100 0 100 200

0.00

00.

004

0.00

8

0 100 200 300 400

−20

00

100

200

predicted

data

− p

redi

cted

+

x x+ +xx xx+++

x x+++xxx x

+x x x

x

++x ++

x+x

xx++++

++ +++ x++

x

+xx

+++

xx

xx xx+ + ++ +

x

xx

+

+

x

+ x+

xx+ +

xx+

x

xxx

++x

+

x

xx

+

x

+ + +x

x

+

x

+

xx

+++x x

+++xx

+x + x+

xx

x x+

x+x +x

++

xx

x x

+

x x x

+ ++

xxx

++ +

x x

+

x xx +

+

x

+

x

+xx

x

+xxx

+x

+

xxx

x+

+

x

x

++

xx

+ x

x

++

++x

++x

x

xx x

+x

x

x

+

x x

+

x+

x x+

+

x

++ ++

++

xx

+

+

x

+

+

+++

+xxx

+

x+

x

++ x

+

x xx

+

+x

xx

++x

x

++

+

x

+

+

x+

x

+

xx

x

+xx

x

x xx xx

++

x+

x

+ ++

++

+

x

+

x

+

xx x

+

+

xx xx

+ ++

x+

+

xx x

x

x x x

x x

xx

+x+

x

+

++

x +

x

+

+

x

++x

++

+

xxx

+

x

+

x

xx

+++

x

x

x

++

+

x

++

+

+

x

xx

x+

x

+

+

+

x+

x++ +

xx

+

++ + ++

x+

xxxx

+

+ +

+

xx

+

xxx

x

+x

+

x

+

x

+

++

xx

+x

+

xx

x

x

+ +++

+

x

xx

+ +x

++x

+

+

xx

x x

+

x

+

x

++

x x+

+

+

xx xx

x

+

xxxxx

+

x+

+ +xxxx

0 100 200 300 400 500 600

−20

00

100

200

data

data

− p

redi

cted

+

x x+ +

x xxx+++xx

+++

xxx x+

x x xx

++x ++

x+x

xx+ +++

++ +++x+

+

x

+xx

+++

xx

x x xx+ + ++ +

x

xx

+

+

x

+x+

xx+ +

xx+

x

xxx

++x

+

x

xx

+

x

+ + +x

x

+

x

+

xx

+++x x

+++xx

+x + x+

xx

x x+

x+x +x

++x

xx x

+

x x x

+ ++

xxx

++ +

x x

+

x xx +

+

x

+

x

+xx

x

+xxx

+x

+

xxx

x+

+

x

x

++

xx

+ x

x

++

++x

++x

x

xx x

+x

x

x

+

x x

+

x+

x x+

+

x

++ ++

++

xx

+

+

x

+

+

+++

+xxx

+

x+

x

++ x

+

x xx

+

+xxx

++x

x

++

+

x

+

+

x+

x

+

xx

x

+xx

x

xxx xx

++

x+

x

+ ++

++

+

x

+

x

+

xx x

+

+

xx xx

+ ++

x+

+

xx x

x

x x x

x x

xx

+x+

x

+

++

x +

x

+

+

x

++x

++

+

xx

x

+

x

+

x

xx

+++

x

x

x

++

+

x

++

+

+

x

xx

x+

x

+

+

+

x+

x+ + +

xx

+

++ + ++

x+

xxxx

+

+ +

+

xx

+

xxx

x

+x

+

x

+

x

+

++

xx

+x

+

xx

x

x

+ +++

+

x

xx+ +

x

++x

+

+

xx

x x

+

x

+

x

++

x x+

+

+

xx xx

x

+

xxxxx

+

x++ +

xxxx

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

theoretical prob

obse

rved

pro

b

0 50 150 250 350

−50

050

150

250

Coord X

Coo

rd Y

x

+

+

x

x

x+ x

x

x ++

+

x+ +

+x

x

x

xx

x

x++

++

+x

x

+

+

xx

+

+

xx

+x

x+

x

+

++

+x

+

+

+

++

x x

x x

+

+x

x

x

++ x

+x

x+

x

+

x

x+

+

x

+

xx

x

x

x

x

+

x

x

x+

x

+

++x

+

x

x +x

+ +

+ +

x

+

+

+

x

+

x

+

xxx

+x

+

x

+

+

+

+

x

x+

++

x

x

xx

+x

x+

x

+

+

x

xx

+

x

x

+

x

+

x

x+

+ +

x

++

+

+

++

+

+

x

+

x

+x

+

+ +

x

x

x

x

x

+

+x

x

x

x

x

x ++

x

+

+

xx

xx x

xx

x x

+x

x

x

+

+

x

+

x

x+

xx

+ x++

++

+x x

+

x

+

x

+

x

+

+

+x

+

+

x

x++

+

x

x

+x

x

+x

x

+x

x

xxxx

x

xx

+

x

x

+x

x+

+

+

xx

x

+

+

+

x

x

+

x++

xx

+x

+

+

+x +

+

x

+

x

+

x

x

x

+

x

x

+

+ xx

+x

+

x

x

+

x

+

x

+x

x

x

x

x

x xx

x

+ x

x

x

x

++x

x

x

+

x

+x

x

xx+ +

+ +

xx

x

x

x

+

x+

x

x

x+

x

x

x

x

x

x+ x

+ ++x

x

+

x

+

x+x

+

+

+

+

x x+

x +

x

+

++x

+ x

+x

x

x

+

++

++ x

x+

+

++ +

+

x

+

++x

x

x

x+

x

x

+

x

x

x

x+

+

+ x

++

+

+

x

+x x

+

x x

x

++

+

+ ++

+

x+

+

+

++

x+

x+

x

+x

xxx+ xx +

+

++

+x

+

std error

Den

sity

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

0 100 200 300 400

−4

−2

02

4

predicted

std

erro

r +

x x+ +

xx x

x+++

x x+++xxx x

+

x x xx

++x

++x

+x

xx+++

+++ +++

x++

x

+xx

++

+x

xxx xx

+ + ++ +

x

xx

+

+

x

+ x+

xx

+ +

xx

+x

xxx

++

x+

x

xx

+

x

+ + +

xx

+

x

+

xx

+++

x x

+++

xx

+

x+

x+x

xx x

+x

+x +x

+

+x

xx x

+

x x x

++

+

xx

x

+

+ +x x

+

xxx +

+

x

+

x

+

xxx

+xxx

+x

+

x xxx

+

+

x

x

++

x x

+ x

x

++

++x

++

xx

xx x

+x

x

x

+

x x

+

x+

xx+

+

x

++

+++ +

xx

+

+

x

+

+

+++

+

xxx

+

x

+x

++ x

+

x xx

+

+x

xx

++x

x

+

+

+

x

+

+

x+

x

+

xx

x

+x

x

x

xxx

xx

++

x

+

x

+ +

+ + +

+

x

+

x

+

x

x x

+

+

xx xx

+ ++

x+

+

xx

x

x

x x x

xx

xx

+x+

x

+

+ +

x+

x

++

x

++

x

++

+

x

xx

+

x

+

xx

x

++

+

x

xx

++

+

x

++

+

+

x

xxx

+

x

+

+

+

x+

x ++ +

xx

+

++ + + +

x+

xxx

x

+

+ ++

xx

+

xxx

x

+x

+

x

+

x

+

+

+

x x

+

x

+

xx

x

x

+ +++

+

x

xx

++

x

++x

+

+

x

x

x x+

x

+

x

++

x x+

++

xx xx

x

+

xxxx

x

+

x

+

+

+

xxxx

0 100 200 300 400 500 600

−4

−2

02

4

data

std

erro

r +

x x+ +

xxxx

+++xx++

+xxx x

+

x x xx

++x

++x

+x

xx+ +++

++ +++x+

+

x

+xx

++

+x

xx x xx+ + ++ +

x

xx

+

+

x

+x+

xx

+ +

xx

+x

xxx

++

x+

x

xx

+

x

+ + +

xx

+

x

+

xx

+++

x x

+++

xx

+

x+

x+x

xx x

+x+x +

x

+

+x

xx x

+

x x x

++

+

xx

x

+

+ +x x

+

xxx +

+

x

+

x

+

xxx

+xxx

+x

+

x xxx

+

+

x

x

++

x x

+ x

x

++

++x

++

xx

xx x

+x

x

x

+

x x

+

x+

xx+

+

x

++

+++ +

xx

+

+

x

+

+

+++

+

xxx

+

x

+x

++ x

+

x xx

+

+xxx

++x

x

+

+

+

x

+

+

x+

x

+

xx

x

+x

x

x

xxx

xx

++

x

+

x

+ +

+ + +

+

x

+

x

+

x

x x

+

+

xx xx

+ ++

x+

+

xx

x

x

x x x

xx

xx

+x+

x

+

+ +

x+

x

++

x

++

x

++

+

x

xx

+

x

+

xx

x

++

+

x

xx

++

+

x

++

+

+

x

xxx

+

x

+

+

+

x+

x + + +

xx

+

++ + + +

x+

xxx

x

+

+ ++

xx

+

xxx

x

+x

+

x

+

x

+

+

+

x x

+

x

+

xx

x

x

+ +++

+

x

xx

++

x

++x

+

+

x

x

x x+

x

+

x

++

x x+

++

xx xx

x

+

xxxx

x

+

x

+

+

+

xxxx

68

Page 71: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART IV:

BAYESIAN INFERENCEFOR THE

GAUSSIAN MODEL

1. Basic Concepts

2. Bayesian Analysis of the Gaussian Model

3. A Case Study: Swiss Rainfall data

69

Page 72: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

1. Bayesian Analysis - Basic Concepts

Bayesian inference deals with parameter uncer-tainty by treating parameters as random vari-ables, and expressing inferences about parame-ters in terms of their conditional distributions,given all observed data.For inference about model parameters, the fullmodel specification now should include the mo-del parameters:

[Y, θ] = [θ][Y |θ]

Bayes’ Theorem allows us to calculate:

[Y, θ] = [Y |θ][θ] = [Y ][θ|Y ]

Thus,

[θ|Y ] = [Y |θ][θ]/[Y ]

is the posterior distribution where

[Y ] =

∫[Y |θ][θ]dθ.

70

Page 73: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

The Bayesian paradigm:

(a) Model• the full model specification consists of

[Y, θ] = [Y |θ][θ].• formulate a model for the observable vari-

able Y .

• this model defines [Y |θ] (and hence an ex-pression for the log-likelihood `(θ;Y ))

(b) Prior• before we observe Y , the marginal [θ] ex-

presses our uncertainty about θ

• call [θ] prior distribution for θ

(c) Posterior• having observed Y , it is no longer an unk-

nown (randomly varying) quantity

• therefore revise uncertainty about θ byconditioning on the observed value of Y

• call [θ|Y ] posterior distribution for θ, anduse it to make inferential statements

NOTE: the likelihood function occupies a cen-tral role in both classical and Bayesian infe-rence

71

Page 74: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Example: Swiss rainfall data

0 50 100 150 200

020

40

60

80

100

distance

sem

ivariance

WLSMLposterior mode

Fitted variograms (100 data), using three differentmethods of estimation: (a) curve-fitting (thin line); (b) ML(dashed line); (c) posterior mode (thick line).

72

Page 75: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Prediction

Because Bayesian inference treats θ as a ran-dom variable, it makes no formal distinctionbetween parameter estimation problems andprediction problems, and thereby provides a na-tural means of allowing for parameter uncer-tainty in predictive inference.The general idea for prediction is to formulate amodel for

[Y, T, θ] = [Y, T |θ][θ]

and make inferences based on the conditionaldistribution

[T |Y ] =

∫[T, θ|Y ]dθ

=

∫[θ|Y ][T |Y, θ]dθ

73

Page 76: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comparing plug-in and Bayesian

• the plug-in prediction corresponds to inferen-ces about [T |Y, θ]

• Bayesian prediction is a weighted average ofplug-in predictions, with different plug-in va-lues of θ weighted according to their conditi-onal probabilities given the observed data.

Bayesian prediction is usually more cautiousthan plug-in prediction, or in other words:

• allowance for parameter uncertainty usuallyresults in wider prediction intervals

Notes:

(a) Until recently, the need to evaluate the in-tegral which defines [Y ] represented a majorobstacle to practical application.

(b) Development of Markov Chain Monte Carlo(MCMC) methods has transformed the situa-tion.

(c) BUT, for geostatistical problems, reliableimplementation of MCMC is not straight-forward. Geostatistical models don’t havea natural Markovian structure for the algo-rithms work well.

(d) in particular for the Gaussian model other al-gorithms can be implemented.

74

Page 77: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

2. Results for the Gaussian Model

Uncertainty only in the mean parameterAssume for now that only the mean parameterβ is regarded as random with (conjugate) prior:

β ∼ N(mβ ; σ2Vβ

)

The posterior is given by

[β|Y ] ∼ N((V −1β + F ′R−1F )−1(V −1

β mβ + F ′R−1y) ;

σ2 (V −1β + F ′R−1F )−1)

∼ N(β ; σ2 Vβ

)

The predictive distribution is

p(S∗|Y, σ2, φ) =

∫p(S∗|Y, β, σ2, φ) p(β|Y, σ2, φ) dβ.

with mean and variance given by

E[S∗|Y ] = (F0 − r′V −1F )(V −1β + F ′V −1F )−1V −1

β mβ +[r′V −1 + (F0 − r′V −1F )(V −1

β + F ′V −1F )−1F ′V −1]Y

Var[S∗|Y ] = σ2[V0 − r′V −1r+

(F0 − r′V −1F )(V −1β + F ′V −1F )−1(F0 − r′V −1F )′

].

The predictive variance has three interpreta-ble components: a priori variance, the reduc-tion due to the data and the uncertainty in themean.Vβ → ∞ corresponds to universal (or ordinary)kriging.

75

Page 78: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Uncertainty for all model parametersAssume (w.l.g.) a model without measurementerror and the prior p(β, σ2, φ) ∝ 1

σ2p(φ).The posterior distribution:

p(β, σ2, φ|y) = p(β, σ2|y, φ) p(φ|y)

pr(φ|y) ∝ pr(φ) |Vβ|12 |Ry|−1

2 (S2)−n−p

2 .

Algorithm 1:(a) Discretise the distribution [φ|y], i.e. choose a

range of values for φ which is sensible for theparticular application, and assign a discreteuniform prior for φ on a set of values span-ning the chosen range.

(b) Compute the posterior probabilities on thisdiscrete support set, defining a discreteposterior distribution with probability massfunction pr(φ|y), say.

(c) Sample a value of φ from the discrete distri-bution pr(φ|y).

(d) Attach the sampled value of φ to the distribu-tion [β, σ2|y, φ] and sample from this distribu-tion.

(e) Repeat steps (3) and (4) as many times asrequired; the resulting sample of triplets(β, σ2, φ) is a sample from the joint posteriordistribution.

76

Page 79: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

The predictive distribution is given by:

p(S∗|Y ) =

∫ ∫ ∫p(S∗, β, σ2, φ|Y ) dβ dσ2 dφ

=

∫ ∫ ∫p(s∗, β, σ2|y, φ)

dβ dσ2 pr(φ|y) dφ

=

∫p(S∗|Y, φ) p(φ|y) dφ.

To sample from this distribution:

Algorithm 2:

(a) Discretise [φ|Y ], as in Algorithm 1.

(b) Compute the posterior probabilities on thediscrete support set. Denote the resultingdistribution pr(φ|y).

(c) Sample a value of φ from pr(φ|y).

(d) Attach the sampled value of φ to [s∗|y, φ] andsample from it obtaining realisations of thepredictive distribution.

(e) Repeat steps (3) and (4) as many times as re-quired to generate a sample from the requi-red predictive distribution.

77

Page 80: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Note:

(a) The algorithms are of the same kind to treatτ and/or κ as unknown parameters.

(b) We specify a discrete prior distribution on amulti-dimensional grid of values.

(c) This implies extra computational load (butno new principles)

78

Page 81: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

3. A Case Study: Swiss rainfall, 100data

60 80 120

−563

.5−5

62.5

σ2

prof

ile lo

g−lik

eliho

od

15 20 25

−563

.5−5

62.5

φ

prof

ile lo

g−lik

eliho

od

Profile likelihoods for covariance parameters: σ2; φ

σ2

Dens

ity

0 200 400 600 800

0.00

00.

005

0.01

00.

015

φ

Dens

ity

0 20 40 60 80 100

0.00

0.05

0.10

0.15

Posterior distributions for covariance parameters: σ2; φ

σ2

φ

50 100 150 200

1015

2025

3035

200 400 600 800

1020

3040

5060

7080

σ2

φ

2D profile log-likelihood (left) and samples from posteriordistributions (right) for parameters σ2 and φ

79

Page 82: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall: prediction results

0 50 100 150 200 250 300 350

−50

050

100

150

200

250

Coords X

Coo

rds

Y

7 150 294 437 581

0 50 100 150 200 250 300 350

−50

050

100

150

200

250

Coords X

Coo

rds

Y

8 5297 10587 15876 21165

Predicted signal surfaces and associated measures of pre-cision for the rainfall data: (a) posterior mean; (b) poste-rior variance

0 50 100 150 200 250 300 350

−50

050

100

150

200

250

Coords X

Coo

rds

Y

0 0.1 0.5 0.9 1

Posterior probability contours for levels 0.10, 0.50 and0.90 for the random set T = {x : S(x) < 150}

80

Page 83: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Swiss rainfall: prediction results (cont.)

0 50 100 150 200 250 300 350

−50

050

100

150

200

250

Coords X

Coo

rds

Y

1

23 4

Recording stations and selected prediction locations (1 to4)

0 100 200 300 400 500 600

0.0

00

0.0

05

0.0

10

0.0

15

rainfall

density

Loc. 1Loc. 2Loc. 3Loc. 4

Bayesian predictive distributions for average rainfall atselected locations.

81

Page 84: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Comments

• nugget effect

• other covariates (altitude, temperature, etc)

• data “peculiarities”

• (variogram) estimation: REML vs Bayesian

• priors

• Bayesian implementation algorithm

• spatial-temporal modelling

• vector random fields

• ad hoc vs(?) model-based

82

Page 85: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART V:

GENERALIZED LINEAR SPATIALMODELS

1. Generalized linear mixed models

2. Inference for the generalized linear geostatisticalmodel

3. Application of MCMC to Generalized Linear Pre-diction

4. Case-study: Rongelap Island

5. Case-study: Gambia Malaria

Page 86: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Non-Gaussian data

Positive data:

0 1 2 3 4 5

0.0

0.3

0.6

Count data:

0 1 2 3 4 5 6 7

0.00

0.15

0.30

Binomial data:

0 1 2 3 4

0.00

0.15

0.30

Positive data with zeros:

0.0

0.3

0.6

84

Page 87: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Towards a model specification

The linear model

Y = Fβ + ε

can be written as:

Yi ∼ N(µi, σ2)

µi =∑k

j=1 fijβj

and generalized in two ways

Yi ∼ Q(µ, ...)

h(µi) =∑k

j=1 fijβj,

where Q is distribution in the exponential fa-mily and h(·): known link function

• Generalized Linear Models (GLM)

• no longer requires: normality, homocedasticity,continuous scale

• linear model: particular case

85

Page 88: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Generalized Linear Mixed Models

Classical generalized linear model has

• Yi : i = 1, ..., nmutually independent, with µi = E[Yi]

• h(µi) =∑k

j=1 fijβj,for known link function h(·)

Generalized linear mixed model has

• Yi : i = 1, ..., nmutually independent, with µi = E[Yi], conditio-nal on realised values of a set of latent randomvariables Ui

• h(µi) = Ui +∑k

j=1 fijβj,for known link function h(·)

Generalized linear geostatistical model has

• Yi : i = 1, ..., nmutually independent, with µi = E[Yi], conditio-nal on realised values of a set of latent randomvariables Ui

• h(µi) = Ui +∑p

j=1 fijβj,for known link function h(·)

• Ui = S(xi)where {S(x) : x ∈ IR2} is a spatial stochasticprocess

86

Page 89: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Examples

x1, . . . , xn locations with observations

Poisson-log

• [Y (xi) | S(xi)] is Poisson with density

f (z;µ) = exp(−µ)µz/z! z = 0, 1, 2, . . .

• link: E[Y (xi) | S(xi)] = µi = exp(S(xi))

Binomial-logit

• [Y (xi) | S(xi)] is binomial with density

f (z;µ) =

(r

z

)(µ/r)z(1− µ/r)r−µ z = 0, 1, . . . , r

• link: µi = E[Y (xi) | S(xi)] , S(xi) = log(µi/(r − µi))

Likelihood function

L(θ) =

IRn

n∏i

f (yi;h−1(si))f (s | θ)ds1, . . . , dsn

High-dimensional integral !!!

87

Page 90: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Inference For The Generalized LinearGeostatistical Model

• likelihood evaluation involves high-dimensionalnumerical integration

• approximate methods (eg Breslow and Clayton,1993) are of uncertain accuracy

• MCMC is feasible, although not routine.

88

Page 91: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Application of MCMC to GeneralizedLinear Prediction

• Ingredients

· Prior distributions for regression parametersβ and covariance parameters θ

· Data: Y = (Y1, ..., Yn)

· S = (S(x1), ..., S(xn))

· S∗ = all other S(x)

• Conditional independence structure

Y S

θ

S*

• Use output from chain to construct posteriorprobability statements about [T |Y ], where T =F(S∗)

89

Page 92: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Case-study: Rongelap Island

This case-study illustrates a model-based geosta-tistical analysis combining:

• a Poisson log-linear model for the sampling dis-tribution of the observations, conditional on alatentGaussian process which represents spatial vari-ationin the level of contamination

• Bayesian prediction of non-linear functionals ofthe latent process

• MCMC implementation

Details are in Diggle, Moyeed and Tawn (1998).

90

Page 93: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Radiological survey of Rongelap Island

• Rongelap Island– approximately 2500 miles south-west of

Hawaii

– contaminated by nuclear weapons testing du-ring 1950’s

– evacuated in 1985

– now safe for re-settlement?

• The statistical problem– field-survey of 137Cs measurements

– estimate spatial variation in 137Cs radioacti-vity

– compare with agreed safe limits

91

Page 94: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Poisson Model for Rongelap Data

• Basic measurements are nett counts Yi overtime-intervals ti at locations xi (i = 1, ..., n)

• Suggests following model:

· S(x) : x ∈ R2 stationary Gaussian process(local radioactivity)

· Yi|{S(·)} ∼ Poisson(µi)

· µi = tiλ(xi) = ti exp{S(xi)}.

• Aims:

· predict λ(x) over whole island

· max λ(x)

· arg(max λ(x))

92

Page 95: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Predicted radioactivity surface using log-Gaussian kriging

-6000 -5000 -4000 -3000 -2000 -1000 0

-500

0-4

000

-300

0-2

000

-100

00

1000

0 2 4 6 8 10

93

Page 96: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Predicted radioactivity surface using Pois-son log-linear model with latent Gaussianprocess

-6000 -5000 -4000 -3000 -2000 -1000 0

-500

0-4

000

-300

0-2

000

-100

00

1000

0 5 10 15

• The two maps above show the differencebetween:

– log-Gaussian kriging of observed counts perunit time

– log-linear analysis of observed counts

• the principal visual difference is in the extentof spatial smoothing of the data, which in turnstems from the different treatments of the nug-get variance

94

Page 97: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Bayesian prediction of non-linear functio-nals of the radioactivity surface

The left-hand panel shows the predictive distribu-tion of maximum radioactivity, contrasting the ef-fects of allowing for (solid line) or ignoring (dottedline) parameter uncertainty; the right-hand panelshows 95% pointwise credible intervals for the pro-portion of the island over which radioactivity exce-eds a given threshold.

Intensity level

Den

sity

10 20 30 40 50 60

0.0

0.02

0.04

0.06

0.08

0.10

0.12

0.14

Intensity level

Sur

vivo

r fu

nctio

n

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

95

Page 98: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

• The two panels of the above diagram illustrateBayesian prediction of non-linear functionals ofthe latentGaussian process in the Poisson log-linear mo-del

• the left-hand panel contrasts posterior distribu-tions of the maximum radioactivity based on:(i) the fully Bayesian analysis incorporating theeffects of parameter uncertainty in addition touncertainty in the latent process (solid line)(ii) fixing the model parameters at their estima-ted values, ie allowing for uncertainty only inthe latent process

• the right-hand panel gives posterior estimateswith 95% point-wise credible intervals for theproportion of the island over which radioactivityexceeds a given threshold (dotted line).

96

Page 99: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Case-study: Gambia malaria

• In this example, the spatial variation is of se-condary scientific importance.

• The primary scientific interest is to describehow the prevalence of malarial parasites de-pends on explanatory variables measured:

– on villages

– on individual children

• There is a particular scientific interest inwhether a vegetation index derived from satel-lite data is a useful predictor of malaria preva-lence, as this would help health workers to de-cide how to make best use of scarce resources.

97

Page 100: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Data-structure

• 2039 children in 65 villages

• test each child for presence/absence of malariaparasites

Covariate information at child level:

• age (days)

• sex (F/M)

• use of mosquito net (none, untreated, treated)

Covariate information at village level:

• location

• vegetation index, from satellite data

• presence/absence of public health centre

98

Page 101: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Logistic regression model

Logistic model for presence/absence in each child:

• Yij = 0/1 for absence/presence of malaria para-sites in jth child in ith village

• fij = child-specific covariates

• wi = village-specific covariate

• logitP (Yij = 1|S(·)) = f ′ijβ1 + wi′β2 + S(xi)

Is it reasonable to assume conditionally indepen-dent infections within same village?If not, we might wish to extend the model to allowfor non-spatial extra-binomial variation:

• Ui ∼ N(0, ν2)

• logitP (Yij = 1|S(·), U) = f ′ijβ1 + wi′β2 + Ui + S(xi)

99

Page 102: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Exploratory analysis

• fit standard logistic linear model, ignoring S(x)and/or U

• compute for each village:

Ni =∑ni

j=1 Yij

µi =∑ni

j=1 Pij

σ2i =

∑nij=1 Pij(1− Pij)

• compute village-residuals, ri = (Ni − µi)/σi

• apply conventional geostatistics to derived datari

• variogram indicates residual spatial structure

100

Page 103: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Variogram of residuals

0 5 10 15 20 25 30

01

23

4

distance (km)

semi−v

ariance

101

Page 104: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Model-based geostatistical analysisα = intercept term in linear predictorβ1 = regression coefficient for ageβ2 = regression coefficient for bed-net useβ3 = regression coefficient for treated bed-netβ4 = regression coefficient for green-ness indexβ5 = regression coefficient for presence of public he-alth centre in villageν2 = variance of non-spatial random effects Uiσ2 = variance of spatial process S(x)

φ = rate of decay of spatial correlation with dis-tanceκ = shape parameter for Matern correlation func-tion

Param. 2.5% Qt. 97.5% Qt. Mean Medianα -4.232073 1.114734 -1.664353 -1.696228β1 0.000442 0.000918 0.000677 0.000676β2 -0.684407 -0.083811 -0.383750 -0.385772β3 -0.778149 0.054543 -0.355655 -0.355632β4 -0.039706 0.071505 0.018833 0.020079β5 -0.791741 0.180737 -0.324738 -0.322760ν2 0.000002 0.515847 0.117876 0.018630σ2 0.240826 1.662284 0.793031 0.740790φ 1.242164 53.351207 11.653717 7.032258κ 0.150735 1.955524 0.935064 0.830548

• note concentration of posterior for ν2 close tozero

102

Page 105: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Map of the predicted surface S(x)(posterior mean)

Kilometres

Kilo

met

res

300 400 500 600

1400

1500

1600

-1.5 0.0 1.0

Western Eastern

300 400 500 600

1400

1500

1600

300 400 500 600

1400

1500

1600

Central

300 400 500 600

1400

1500

1600

103

Page 106: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Posterior density estimates for S(x) at two se-lected locations.

S(x)

Den

sity

-4 -2 0 2 4

0.0

0.2

0.4

0.6

0.8

S(x)

Den

sity

-4 -2 0 2 4

0.0

0.2

0.4

0.6

0.8

x = (452, 1493)x = (520, 1497)

• solid curve – remote location (452, 1493),

• dashed curve – location (520, 1497), close to ob-served sites in central region.

104

Page 107: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Empirical posterior distributions for regres-sion parameters

0.0004 0.0008

050

010

0015

0020

0025

0030

00

beta_1

Dens

ity

-1.0 -0.6 -0.2 0.2

0.0

0.5

1.0

1.5

2.0

beta_2

Dens

ity

-1.0 -0.5 0.0

0.0

0.5

1.0

1.5

beta_3

Dens

ity

o

o

o

o

o

o

o

oo

o

o

ooo

o

oo

oooo

ooo

o

o

o

o

o

oooo

o

o

o

ooo

o

ooo

ooo

o o

o

o

o

o

oo

o

o

oo

o

oo o

ooo

oo

o

o

oo

o

o

o

oo

o

o

ooo

oo

oo

o

o o

oo

oo

o

oo

o

oo

oo

o

o o

o

o

o

oo

o

ooo

oo

o

o

o

o

o

ooo

o

o

ooooo

ooo

o

o

o

o

o

o

oo

ooo

o

o

o

ooo

o

o

o

o

o

oo

o o

o

o

o

o

oo o

oo

o

o

o

oo

o

o

o

o

oo

o

oo

o

ooo

o

o

o

o

o

o

oooo

o

o

ooo

o

o

oo

oo

o

oo

o

ooo

ooooooo

o o o

o

o

oo

oo

ooo

o

oo

o

ooo

oo

o

o

o

oo

o

o

o

o

o

o oo

oo

o

ooo

oo

o

o

oo

o

oo

o

o

o o

o o

o

ooo

o

o

o

o

o

o

oo

o o

o

o

o

o

ooo

o o

o

o

oo

o

oo

o

o

o

oooo

oo

o

o

oo

o

o

oo

oo

o

o

ooo

oo

o

oo

o

o

o

o oo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

oo

o

o

oo

o

o

oo

o

oo

oo

o

o

o

ooo

o

oo

oo

o

o

ooo

o

oo

o

o

o

ooo

oo oo

o

o

o

o

oo

o

o

oo

oo

o

o

oo

o

o

o

o

o

o

o

o o

o

oo

o

o

oo

o

o

oo

o

o

oo

o

o

o oo

o

o

oo

ooo

o

o

o

oo

o

o

o

o

o

oo

o

o o

ooo

oo

o

o

o

o

o

oo

o oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

oooo

o

o

oo

o

o

oo

oo

ooo

oo

oo

o

o

o

oo

oo

o

o

o

o

o

o

oo o

o

o

oo

o

oo

oo

oo

o

o oo

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

ooo

o

oo

oo

oo

o

o

o

oo

o

o

o

o o

o

o

o

o

o

oooo

o

o

o

o

oo

o

o

ooo o

o

ooo

o

o

o

o

oo

oo

oo

o

o

oo

o

o

o

ooooo

oooo

ooo

o

oooo

o

o

oo

o

o

o o

o

oooo

o

o

o

oo

o

o

o

o oo

o

o

o

o o

o

o

o

oo

o

oo

o

o

o

oo

o

o

o

o

oo

o

o

oo

oo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o oo

o

o

o

o

o

o

oo

oo

oo

o

o

o

o

o

o

oo

o

o

oo

o

ooo

ooo

o o

oo

o

oo

o

o

oo ooo

ooo

o

o

o

o

o

ooo

o

o ooo

o

o

oo

oo

o

o

oo

oo

o

o

o

oo

o ooo

o o

o

oooo

oo

o

oo

o

ooo oo ooo

o

o

o

ooo

oo

oo

o

o

o

o

o oo

oo o

o

ooo

o

oo

o

ooo

o

o

oo

o

o

o

ooo

o

o

o

oo

o

o

oo

o

o

o

o

o

o oo

o

o

o

o

o

ooo

oo

o

o

o

ooo

o

o

oo

o

o

o

o

ooo

o

o

oo

o

o

o

o

oo

o

o

ooo

o

o

ooo

o

o

o

o

oo

oo

ooo

oooo

oo

o

o

o

o

o

o

o

o

ooo

o

o

oo

o

o

ooo

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o o

ooo

oo

o

o

o

o oo

ooo

oo

ooo

o

o

oo

o

o

o

o ooo

o

oo

o o

o oo

o

o

ooo

ooo oo

o

oo

ooooo

oo

oo

o

o

o

oo

o o

o o

ooo

oo

oo

o

o

o

o

oo

oo

o

o

oo

o

oo

oo

o

o

oo

o

o

oo

o

o

o

o

ooo

oo

o

o o

ooo

o

ooo

o

o

oo

o

oo

oo

o

o

o

o

o

oo

o

oo

o

o

oo

o

oo

o o

o

oo

o

o

oo

o

o oo

oo

o o

oo

o

o

o

o

o

o

o

o

o

oo

oo

oooo

o

oo

ooo

o

ooo

o

o

o

ooo

oo o

o o o

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

oo

o

o

oo

oo

o

o

oo

o

oo

o

o

o o

o

o

o o

o

o

o

o

o

oo

o oo

o

o

o

oo

oo

o

o

o

oooo

o

o

oo

oooo o

o

o o

o

o

o

oo

o

o

o

o

oo

o o

oo

oo

ooo

o

ooo

o

o

oo

o

o

oo

o

o

ooo

o

o oo

oo

o

o

o

oo

o

o

ooooo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oo

oo

oo

o

ooo

o

oo

o

o

oo

oo

o

o

o

o

oo

o

o

o

o

o

o oooo

oooo

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

oo

oo

ooo

o

o

o

ooo

o

o

o

o

oo oo

o

o

ooo

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oooo

o

o

o

o o

ooo

o

oo

oo

o

o

o

o

o

o

o

o

o oo o

oo

oo

o

o

o

o

ooo

oo

oo

o

o

oo o

o

oo

o

ooo

o

oo

o

o

o

o

o

ooo

o

o

o ooo

o

o

o

o

o

o

oo

o

o

o

o o oo

o

o

o

oo

o

oo

o

o

o

o

oo

o

o

ooo

oo

oo

o

o

o

o

o

oo

o

o

oo

o

o

oo

oo

o

o

o oo

oo

o

oo

o o

oo

o

oo

oooo

o

o

oo

o

ooooo o

o

o

ooo

oo

oo

o

o o

oo

o oo

o

o

o

oooo

o

o

o o

o

o

oo

oo

oo

o

o

o

oo

o

o

o

o oo

o

o

oo

o

o

o

oo

oo

o ooo

o

o

o

ooo

o

oo

oo

o

oo

ooo

o

o

oo

o

o

ooo

o o

oo o

oo o

oo

o

o

o

o

ooo

o

o

oo

o

o

o

oo

o

o

oo

oo

oo

oo

o

o

o

o

o

o

oo o

oo

o

o

o

o

o

ooo

o

oo

o oo

o

o oo o

o

oo

o

o

o

o

ooo

o

oo

o

oo

o

oo

o

o

o

oo

ooooo o

oo

o

o

o

o

oo

o

o

o

oo

oo

o

o

oo oo

oo

o

o o

o

oo

oooo

oo

o

ooo ooo

o

oo

o

o

oo

o

oo

o

o

o

oo

oo

oo

o

o

o

o

oo o

ooo

ooo o

o

oooo

ooo

ooo

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

oo

oo

o

o

o

oo

o

oo

o

o

o

oo

o

o

oooo

o

o

o

o

oo

oo

o

o

oo o

oo

o

oo

o

o

oo

o

o

o

o

o

o

o o

o

o o

o

o

o o

o

o

o

o

o

o

o

oo

o

oo

o

oo

o

o

o

o

oo

ooo

o

o

o

ooo

oo

o

o

o

o

o

o

o

o

o

oo

ooo

o

o

o

o

o

o

o

o

o oo

o

ooo

oo

o oo

o

oo

o

o

o

o

o

o o

o

ooo

o

oo

o oo

ooo

o

o

ooo

o oo

o

o

oooo

oo

oo

o o

o

ooo

o

o

o oo

o

o

ooo

oo

oo

o

oo

oo

o

oo

o

o

oo

o

oooo

oo

o

oo

o oo

o

o

oo

o oo

oo

o

o

o

o

o

o o

o

o

o

o

oo

o

o

oo

o

oo

oo

o

o

o

o o

oo

ooo

oo

o

oo

o

o

o

oo o

o

o

o

ooooo o o o

oo

o

o

oo

oo

o o

o

o

o

oo

oo

o

o

o

o

oo

oo

o

o

oo

o

oooo

o o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

oo

o

o

o

oo

oo

o

o

o

o

o

oo

ooo

oo

oo

ooo o

o

oo

o

oo

o

oo

o

o

o

oo

ooo

o

o

o

o

o

ooo

o o

o

o

oo

o

oo

o

o

o

o

oo

o

o

o

o

oooo

oo

o

oo o

o

o

o

o o

o

oo

o

oooo

oo

oo

oo

oo o

o

o

ooo

oo

oo

o

o

oo

o

oo

o

o

oo

o

o o

o

oo

o

o

o

ooo

o

ooo

o

o o

oo

oo

oo

o

oo

o

o

o

o ooo

o

o

o

o

o

o

oo o

o

oo

o

ooo o

o

o

o

o

o

o

oo o

oo

o

o

o

o

oo

o

oooo

o

o

o

ooo

oo o oo

o

o

o

o

oo

oo

o

o o

o

oo

o

o

oo

o o

o

o

o

ooo

o

o

o

o

oo

oo o

oo

oo o

o

o

o

ooo

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

oo

oo

o

o

o

oo

o

o

o

o

oo

o

o

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

oo

o

o

oo

oo

o

o

o

oo

o

o

ooo o

oo

o

oooo

oo

oo

o

oo

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

oo

oo

o

o

o

o

oo

o

o

oooo

o

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

oo

o

o o

o

o

o

o

o

oo

ooo

oo

o

o

o

o

o o

o

o

o

o

o

oo

o

o

o

oo

o

oo

o

o o

o

oo

oo

o

o

o

o

oo

o

o

o

oo

o

o

o

o

o

oo

ooo

oo

oo

o

o

o

o

oo

o

o

oo

o

o

ooo

oo

o oo

o

oooo

o

o

o

oo

o

oo

oo

o

o

o

o o

o

o

oo

ooo

o

o

oo

o

o

o

o o

oo

o

o

o

o

o

o

o

o

ooo

oo

o

o

oo

o oo

o

ooo

oo

o

o

o

o

o

o

oo

ooooo

ooo

o

oo

o

o

o

oo

o

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

o

ooo

o

o oo

o

oo

oo

o

oo

o

o

o

o

o

o

o

oo

oo

oo

o

oo

oo

o

o

ooo

o

oo

ooo o

ooo

oo

ooo

o

o

o

oo

oo o o

oo

o

o

oo

o

o

oo

oo

o

o

o

o

o

o

oo

o

o

oooo

o

oo

o

o

o

o

o

o

oo

o

o o

o

o

oo

o

oo

oo

oo

ooo

oo

o

oo

oo

o

o

oo

o

o

o

oo

o

oo o

o

ooo

oo

o

o

o

o

o

o

oo

o

oo

oo o

o

o

o

o

o

o

o ooo

oo

o

oo

o

o o

oo

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o o

o

o o

o

o

o

o

o

o

o ooo

oo

o

oo

o

o

o

ooo

o

o oo

o

oo

o o

o oo

oo

oo

oo

o

o

o

o

o

o

o

o

o

ooo

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

oooo

oo

o

o

o

o

o

oo

o

o

o

oo

oo

o

oo

ooo oo

o

o

oo o

ooo

oo

o

o

o

o

o

o

o

oo

o

oo

o

o

o

o

o

oo

oo

o

oo

o

o

o

o

o

o

oo

oo

oo

o

o

oo

o

ooo

o

oooo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

ooo

o

ooo

oo

o

oo

o

oo

o

o

oo

o

o

oo

ooo

o

ooo

oo

o

ooo

o

oo

o

o

o

o

o

o

ooo

o

oooo o

o

o

o

o

o

o o o

o

o

o

o

o

ooo

o

ooo

o

o oooooo

oo

o

oo

o

o

o

o

o

o

o

o

o

ooooo

o

oo

oooo

o

o

o

oo

oo

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

oo

o

o

ooo

oo

o

oo

oooooo

o

o

oo o

o

o

o

o

o

oo

o o

o

o

o

o

o

o

o

ooo

oo

o

oo

o

oo

o

o

ooo

o oo

oo

o

o

o

oo

o

oo

o

o

oo

oo

o

o

oo

o

o

o

o

o

o

o

o

o

o o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

oo

oo

o

oo oo

o

o

o

o o

o

oo

oo o

o

o

o

o

o ooo

ooo

o

o

ooo

o

o

o

o

o

o

o

o

o

o

oooo o

o

o

o

oo

oo

o

o

o

o

ooo

o

oooo

o

o

o

o

o

oo

o o

o

o

o

o

oo

o

oo

o

ooo

o

o

o

oo

oo

o

o

o

o

o

o

o

o

o

oooo

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

ooo

o

ooo

o

o

o

oo

oo

oo

oo

oooo

o ooo

ooooo

o

oo

oo

o

o

ooo

o

o

o

o

oo

o

o

oo

o

oooo

o

o

oo

o

o

ooo

oo

o

o

o

o

o

oo

o

o

oo

o

o

oooo

o oo

o

o

o

oo

o

o

o

o

o

oo

ooo

o

o o

o o

o oo

o

o

o

oo

o

o

o oo

ooo

o

o ooo

o

o

o

o

o

o

oo

o

o

o oo

o

o

o o

o o

o

o

o

o

o

o

ooo

oo

oo

o

oo

o

o

o

o

oo

o

o

o

ooo

o

o

o

o

oo

oo

o

o

o

oo

o

o

o

oo

o

o

o

oo

oo

o

o

oo

o o

o

o

oo

ooo

o

o

oo

o

oo

ooo

o

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

o

o

oo

o

o

o

oo

oo o

o

ooo

ooo

o

o

o

oo

oo

o

oo

oo

o

o

o

o

o

oo

ooo

ooo

o

o

o

o

o

o

ooo

o

o

o

o

o

o

oo

o

ooo

o o

o

o

o

oo

oo

o

o

oooo

oo

oo

o

o

o

o

oo

o

o

oo

oo

o

o

o

o

o

o

o

o

o

o

o

o

o o

o

o

o

o

o

o

o

o

o

o o

o

oo

oo

o

ooo o

oo

o

oo

o

o

oo

o

o

o

ooo

o

oo

o

oo

o

oo

o

o

ooo

o

o

o

o

o

o

o

o

oo

oo

oo o

oo

o

o

o

ooo

o

oo

o

o

o

o o

oo

o o

oo ooo

o o

oo

oo

o

ooo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oooo

ooo

o

o

o

o

o

o

o

oo

o

o

oooo

oo oo

oo o

oo

o

o

o

o

oo

o

o

o

oo

oo

oo

o oo

oo

oo

o

o

oo

o

oooo

o

ooo

o

o

oo

o

o

o

o

o

o

ooo

o

o

o

oo

oo

oo

ooo

oo

o

o

o

oo

o

o

oo

o

o

o

o

o

o

o

o

oo o

o

o

o

o

o

o oo

o

o

o o

o

o

oo o

o

o

o

o o

o

oo

o

o

o

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

oooooo

ooo

oo

oooo

oo

o

oo o

oo

o

o

o

o

oo

oo

o

o

o

o

oo

ooo

o

oo

oo

ooo

o

o

o

o

o

o

o

o

o

oo

o

oo

o

oo oo

o

o

o

o

o

o

o o

oo

ooo

oo o

o

o

o

o

o

o

oo

o

o

o

o

o

oo

o

o

ooo

o

o

o

o

o

o

o o

o

oo

o o oo

o o

o o

ooo

oooo

o

oo

o

o

o

o

o

o

o

o

ooo

o

o

o

o

oooo

o

o o

o

o

o

o

o

o

o

o

o

o o

oo

ooo o

o

ooo

o

o

oo

ooo

o

oo

o

o

o

o

ooo

oo

oo

o

o

o

o

oo

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

ooo

o

oo

o

oo

o

o

ooo

o

oo

o

o

o

o

o

ooo

ooo

o

oo

ooo o

oooooo

o

o

oo

o

oo

o

o

o

o

oo

o

o

o

o

o

o

o

oo

oo

o

oo

ooo

ooo

o

o

o

o

o

o

oo

oooo

o

oo

oo

o

o

oo

ooo

ooo

oo

o

ooo

ooo

oo

o

o

o

o

oo

o

o

o

oo o

o

o

oo

o

o

o

o

o

oo

oo

ooo

o

oo

o

o

oo

o

o

o

oo

o

oo

o

o

o

oo

o

ooo

o

o

o

oo

oo

o

o o

o

o

o

o

oo

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

o

o

oo oo

oo

oo

o

ooo

oo

o

oo

o o

o

o

o

o

o

o

ooo

o

oo

o

o

o

o

o o

o

o

o

o

o

o

o

o

o

o

oo

oo

oo

o

oo

o

o

oo

o

o

o

o

oo

oo

oo

oo

o

o

o

ooo

o

o

o

o o

o

oo

o

o

o

oo

o

o o

oo

o

o

o

o

oo o

o

oo

oo

o

o

oo

o

ooooo

o

o

oo

o

oo

oo

o

ooo

ooo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

oo

o

oo

ooo

ooo

o

o

o

o

o

oo ooo

oo

oo o

oo

o

o

o

o

oo

ooo

o

oo

o

o

o

o

o

o

oo

oo

oo

o

o

oo

o

oo

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

o

o

oo

o ooo

oo

o

o

o

o

o

o

o

o

oo

o o

o

oo

o

oo

oo

o

o

o

o

o

oo

o

oo

o o

o

o o

oo

o

oo

o

o

o

o

oo

o

oo

oo

o

o

o

o

o

o

o

o

o

oo

oo

o

beta_2

beta

_3

-1.0 -0.6 -0.2 0.2

-1.0

-0.5

0.0

• β1 = effect of age

• β2 = effect of untreated bed-nets

• β3 = additional effect of treated bed-nets

105

Page 108: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Goodness-of-fit for Gambia malaria model

o

o

o

o

o

o

o

o

oo

o

oo

oo

o

o

o

o o

o

o

o

o

o

o o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

Fitted value

Resid

ual

0.25 0.30 0.35 0.40 0.45 0.50 0.55

-4-2

02

4

Village-level residuals against fitted values.

• rij = (Yij − pij)/√{pij(1− pij)}

• ri =∑rij/

√ni

• intended to check adequacy of model for pij

106

Page 109: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Distance (km)

Vario

gram

0 10 20 30 40 50 60

0.5

1.0

1.5

2.0

2.5

Standardised residual empirical variogram plot (village-leveldata and pointwise 95% posterior intervals constructed fromsimulated realisations of fitted model).

• rij = (Yij − p∗ij)/√{p∗ij(1− p∗ij)}

• ri =∑rij/

√ni

• logitp∗ij = α + f ′ijβ + S(xi)

• intended to check adequacy of model for S(x)

107

Page 110: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Is a geostatistical model necessary?

oo

o o

o

o

oo

o

o

o

o

o oo

oo

o

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

oo

o

o

o

E[S(x)|data]

E[U

|dat

a]

-2 -1 0 1 2

-2-1

01

2

Plot of estimated posterior means of random effects Ui fromnon-spatial GLMM against estimated posterior means S(xi)

at observed locations in geostatistical model.

• high correlation represents strong empiricalevidence of spatial dependence

• but explicit modelling of spatial dependence hassmall effect on inferences about regression pa-rameters

108

Page 111: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Generalised linear Spatial ModelsDiggle, Tawn and Moyeed (1998).

Error distribution [Y (x) | S(x)]

Link function E[Y (x) | S(x)] = h−1(S(x))

Gaussian random field {S(x) : x ∈ A}• E[S(x)] = d(x)Tβ

• Cov (S(x), S(x′)) = σ2ρ(‖x− x′‖;φ) + τ 21{x=x′}.

Structure:

Y S

θ

S*

• Data : Y = (Y (x1), ..., Y (xn))

• S = (S(x1), ..., S(xn))

• S∗ = some other S(x)

• θ parameters

109

Page 112: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Prediction with known parameters

From figure, prediction is separated into threesteps.

• Simulate s(1), . . . , s(m) from [S|y] (usingMCMC).

• Simulate s∗(j) from [S∗|s(j)], j = 1, . . . ,m(multivariate Gaussian)

• Approximate E[T (S∗)|y] by

1

m

m∑j=0

T (s∗(j))

If possible: instead calculate E[T (S∗)|s(j)],j = 1, . . . ,m directly, and estimate E[T (S∗)|y] by

1

m

m∑j=0

E[T (S∗)|s(j)]

Advantage: smaller Monte Carlo error

110

Page 113: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

MCMC for conditional simulation

Let S = DTβ + Σ1/2Γ, Γ ∼ Nn(0, I).

Conditional density of [Γ |Y = y]

f (γ|y) ∝ f (y|γ)f (γ)

Langevin-Hastings algorithm

Proposal: γ′ from a Nn(ξ(γ), hI) where ξ(γ) = γ +h2∇ log f (γ | y).

Poisson-log Spatial model:∇ log f (γ|y) = −γ + (Σ1/2)T(y − exp(s)) where s =Σ1/2γ.

Expression generalises to other generalised linearspatial models.

MCMCM output γ1, . . . , γm. Multiply by Σ1/2 andobtain: s(1), . . . , s(m) from [S|y].

111

Page 114: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Random walk Metropolis algorithm

• Current state: γ.

• Proposal: γ′ from a Nn(γ, hI).

• Accept: γ′ with probability

a(γ, γ′) = 1 ∧ π(γ′)π(γ)

.

Langevin-Hastings algorithm

• Current state: γ.

• Proposal: γ′ from a Nn(ξ(γ), hI) whereξ(γ) = γ + h

2∇ log π(γ).

• Accept: γ′ with probability

a(γ, γ′) = 1 ∧ π(γ′)q(γ′, γ)

π(γ)q(γ, γ′).

Models considered here:

∇ log f (γ|y) = −γ+(Σ1/2)T{

(yi − g−1(si))gc(g

−1(si))

g(g−1(si))

}n

i=1.where s = Dβ + Σ1/2γ.

112

Page 115: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Geometric ErgodicityConvergence rate to equilibrium is geometricallyfast.

Geometrical ergodicity ⇒ Central Limit Theorem:

√m

1

m

m∑j=1

ψ(Γ(j))− E[ψ(Γ)|y] → N(0, σ2

ψ).

when ψ(·)2 ≤ V (·)

Random walk Metropolis:Conditions (Jarner and Hansen, 2000):

lim supµ→m1,m2

1

g′(µ)

∣∣∣∣1

µ− yi+g′′c (µ)

g′c(µ)− g′′(µ)

g′(µ)

∣∣∣∣ <∞,

lim supµ→m1,m2

yi − µ

g(µ)

g′c(µ)

g′(µ)≤ 0,

imply geometric ergodicity, with V (γ) = f (γ|y)−1/2.

Langevin-Hastings:(Robert and Tweedie,1996)Poisson-log model (∇(γ) = −γ + QT(y − exp(s))) notgeom. ergodic.

Truncated Langevin-Hastings

∇(γ)trunc = −γ +QTR(γ)

where R(γ) is a bounded function. Geometricallyergodic with V (γ) = exp(t‖γ‖).

113

Page 116: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

MCMC for Bayesian inference

Parameters in underlying Gaussian model:S ∼MVN(DTβ, σ2κ(φ)).S = DTβ + σκ(φ)1/2Γ, Γ ∼ Nn(0, I).Metropolis-within-Gibbs algorithm:

• Update Γ from [Γ|y, β, log(σ), log(φ)](Langevin-Hasting described earlier)

• Update β from [β|Γ, log(σ), log(φ)] (RW-Metropolis)

• Update log(σ) from [log(σ)|Γ, β, log(φ)] (RW-Metropolis)

• Update log(φ) from [log(φ)|Γ, β, log(σ)] (RW-Metropolis)

Prediction:

• Simulate (s(j), β(j), σ2(j), φ(j)), j = 1, . . . ,m(using MCMC)

• Simulate s∗(j) from [S∗|s(j), β(j), σ2(j), φ(j)],j = 1, . . . ,m (multivariate Gaussian)

Discrete prior for φ is an advantage (reduced com-puting time).Thinning not to store a large sample of high-dimensional quantities.

114

Page 117: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Marginalisation Integrating out β and σ2.

Conjugate priors:

β | σ2 ∼ N(mb ; σ2Vb

)

1/σ2 ∼ 1

S2c

χ2 (nc)

[scaled-inverse-χ2(nc, S2c )]

Marginalisation

• Posterior distribution is f (s|y) = f (y|s)fI(s),where fI is the marginalised density for S.

• Marginalisation makes [S∗|S] a multivariate-tdistribution.

Procedure:

• Simulate s(1), . . . , s(m) from [S|y] (usingMCMC).

• Simulate s∗(j) from [S∗|s(j)], j = 1, . . . ,m(multivariate-t distribution)

• Approximate E[T (S∗)|y] by

1

m

m∑j=0

T (s∗(j))

115

Page 118: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Parameter estimation for generalizedlinear spatial model

• Methods of moments (empirical vario-gram/covariogram)

• Approximate methods: pseudo-likelihood (accu-racy unknown).

Monte Carlo approximation to likelihoodGeyer and Thompson, 1992 ; Geyer, 1994.

Define f (y) = f (y | s)f (s) for some density f (s).

L(θ) =

IRnf (y | s)f (s | θ)ds ∝

IRn

f (y | s)f (s | θ)f (y | s)f (s)

f (s | y)ds

= E[f (S | θ)f (S)

| y],

where E[· | y] is expectation w.r.t f (s | y).• How to chose f (s) ?

– Could chose f (s) = f (s | θ0) for some θ0.– Instead we use f (s) = fI(s | ψ0) for someψ0, where θ = (β, σ2, ψ), and fI(s | ψ0) is themarginalised density fI(s | ψ0) =

∫ ∫fI(s |

β, σ2, ψ0)π(β, σ2)dβdσ2 (using conjugate prioron (β, σ2)).

• Very useful as exploratory tool; exploring othercorrelation functions, anisotropy, etc.

116

Page 119: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Priors

S ∼ Nn(Dβ, σ2R(φ))

Non-informative priors:

• π(σ2) ∝ 1/σ2 commonly used, but gives improperposterior !

• π(φ) must be proper.

• Thesis gives conditions for proper posterior.

Conjugate priors:

β | σ2 ∼ N(mb ; σ2Vb

)

1/σ2 ∼ 1

S2σ

χ2 (nσ)

[scaled-inverse-χ2(nσ, S2σ)]

Marginalisation

• Conditional distribution becomes f (s|y) ∝f (y|s)fI(s),where fI is the marginalised density of S.

• Marginalisation makes [S∗|S] a multivariate-tdistribution.

Computationally advantageous in MCMC-algorithm.

Conjugate prior for φ ?117

Page 120: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

PART VI:

Further topics and conclusions

1. Multivariate methods

2. Non-linear methods

3. Space-time models

4. Marked point processes

5. Closing remarks

Page 121: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Multivariate methods

• within linear Gaussian setting, extension tomultivariate data is straightforward in princi-ple

• but specification of a useful class of defaultmodels for cross-covariance structure is notstraightforward

• detailed implementation should be problem-specific: for example,

– (Y1, Y2) qualitatively different, and measuredat a common set of locations?

– Y2 a low-cost surrogate for Y1 of primary inte-rest?

119

Page 122: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Non-linear methods

• non-linear mean response µ(x) with Gaussianerrors – straightforward in principleExample: plume model for dispersal of pollu-tant

• intrinsically non-linear systems specified bysystem of stochastic differential equations – achallenging problemExample: soil conductivity/transmissivity

120

Page 123: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Space-time models

• Emerging space-time data-sets are big, andpresent severe computational challenges.

• Specific models are best defined in context.

• Some examples:

1. Calibration of radar reflectance againstground-truth rainfall intensity (Brown, Dig-gle, Lord and Young, 2001).

(a) Yit : i = 1, ..., n – ground-truth log-rainfallintensity at small number of sites xi

(b) U(x, t) : x ∈ A – log-radar reflectance mea-sured effectively continuously over a studyregion A

(c) Empirical model,

Yit = α +B(xi, t)U(xi, t)

where B(x, t) is continuous space-timeGaussian process

(d) Spatial prediction,

Y (x, t) = α + B(x, t)U(x, t)

121

Page 124: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

2. On-line disease surveillance (Brix and Dig-gle, 2001)

(a) Data give population density λ0(x) (appro-ximately), plus locations of daily incidentcases

(b) Model space-time point process of incidentcases as Cox process:– Poisson process with intensity

λ(x, t) = λ0(x) exp{α + Z(x, t)}

– Space-time Gaussian process Z(x, t) mo-dels variation in disease risk

– Interested in early detection of changesin the risk surface,

λ(x, t)/λ(x, t−1) = exp{Z(x, t)−Z(x, t−1)}

122

Page 125: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Marked point processes

Definition: a joint probability model for a stochas-tic point process P , and an associated set of ran-dom variables, or marks, M

Different possible structural assumptions:

• [P,M ] = [P ][M ]

The random field model – often assumedimplicitly in geostatistical work.

• [P,M ] = [P |M ][M ]

Preferential sampling – sampling locations aredetermined by partial knowledge of the un-derlyingmark processExample. deliberate siting of air pollution mo-nitors in badly polluted areas.

• [P,M ] = [P ][M |P ]

Often appropriate when the mark process isonlydefined at the sampling locations.Example. Presence/absence of disease amongstindividual members of a population.

Implications of ignoring violations of the randomfield model are not well understood.

123

Page 126: MODEL BASED GEOSTATISTICS - LEG-UFPRpaulojus/MBG/slides/mbg.slides.pdf · 2003-06-09 · MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1 Departamento

Closing remarks

• all models are wrong, but some models are use-ful

• whatever model is adopted, inferential procedu-res which respect general statistical principlesare likely to out-perform ad hoc procedures

• ignoring parameter uncertainty can seriouslyprejudice nominal prediction intervals

• the Bayesian paradigm gives a workable inte-gration of parameter estimation and stochasticprocess prediction, but results can be sensitiveto joint prior specifications.

• the best models are developed by statisticiansand subject-matter scientists working in colla-boration.