
    Chapter 5

    Objective Mapping and Kriging

Most of you are familiar with topographic contour maps. Those squiggly lines represent locations on the map of equal elevation. Many of you have probably seen a similar mode of presentation for scientific data; the "iso-ness" of those lines is comparable. What many of you are probably not familiar with are the mathematics that lie behind the creation of those maps and their uses.

    5.1 Contouring and gridding concepts

This chapter covers the question: what do you do when your data are not on a regular grid? This question comes up frequently because computers can only draw, for example, contour lines if they know where to draw them. Often a contouring package will grid your data for you using a default method of generating a grid, and this will be acceptable. But there's more to it than making it easy for computers to draw contour lines: different gridding methods will produce different looking maps. How, then, can we objectively decide between these different results? The answer to this question is, of course, dependent upon the problem you are working on.

    5.1.1 Data on a regular grid

There is a straightforward way to contour irregularly spaced data: Delaunay triangularization. The individual data points are connected in a network of triangles that have the following properties: the triangles formed are nearly equiangular, and the longest side of each triangle is as short as possible. We surround each of our irregularly spaced data points with an irregularly shaped polygon such that every point inside the polygon is closer to the enclosed data point, and every point outside the polygon is closer to some other data point. These irregular polygons are known as Thiessen polygons, and the surrounding Thiessen polygons also enclose data points. Straight lines drawn between the data points of neighboring Thiessen polygons create a Delaunay triangular network. The location of the contour line along these triangularization lines is then computed by a simple linear interpolation (see Fig. 5.1). This approach is OK for producing contour maps, but is difficult to use for derived


    products (gradients, etc.) and is compute intensive. Furthermore, if you were to sample at different

    locations you would get a different contour map.

    Figure 5.1: An example of what a triangularization grid looks like. Choosing the optimal way

    to draw the connecting lines is a form of the Delaunay triangularization problem.

Better, then, to put your data onto a regular, rectangular grid. A regular grid is easier for the computer to use, but is more difficult for the user to generate. But the benefits to be gained for this extra trouble are large.

    5.1.2 Methods: nearest neighbor, bilinear, inverse square of distance, etc.

Nearest Neighbor: is a method that works in a way you might expect from the name. The grid value ($\hat{Z}_i$) is estimated from the value of the nearest neighbor data point. The distance from a grid point to the actual data points is given by:

$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$   (5.1)

where the index $i$ is a number indicating the grid point (in a sequential sense) and $j$ in this case refers to the sequential numbers identifying the actual data points. The grid value is then taken from the closest data point:

$\hat{Z}_i = Z_j, \quad \text{where } d_{ij} = \min_k d_{ik}$   (5.2)

  • 7/28/2019 Chapter Kri

    3/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 105

Equations 5.1 and 5.2 are the bare bones of the nearest neighbor formulation. Sometimes this method is augmented by the N-nearest neighbors (see Fig 5.2):

$\hat{Z}_i = \dfrac{\sum_{j=1}^{N} Z_j / d_{ij}}{\sum_{j=1}^{N} 1 / d_{ij}}$   (5.3)

This method of generating grids is of particular use for filling in gaps in data already on a regular grid, or very nearly so.
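As a minimal sketch (the function and variable names are ours, not from the course m-files), Eqns 5.1-5.3 in MATLAB might look like the following; setting N = 1 recovers the plain nearest neighbor of Eqn 5.2.

```matlab
function zg = nnearest(xd, yd, zd, xg, yg, N)
% N-nearest-neighbor grid estimate (Eqns 5.1-5.3), a minimal sketch.
% xd, yd, zd: column vectors of irregularly spaced data;
% xg, yg: grid coordinate matrices (e.g. from meshgrid); N: neighbors used.
zg = zeros(size(xg));
for i = 1:numel(xg)
    d = sqrt((xg(i) - xd).^2 + (yg(i) - yd).^2);   % Eqn 5.1
    [ds, k] = sort(d);                             % closest points first
    ds = max(ds(1:N), eps);                        % guard against d = 0
    zg(i) = sum(zd(k(1:N)) ./ ds) / sum(1 ./ ds);  % Eqn 5.3
end
end
```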

Bilinear Interpolation: is a method that is frequently referred to as a "good enough for government work" method. The value at the grid point is an interpolation product ($\hat{Z}$) from the following formulas for a 2-dimensional case, where the grid point at ($x$, $y$) falls inside the cell spanning ($x_1$, $y_1$) to ($x_2$, $y_2$) with corner values $Z_1$ (lower left), $Z_2$ (lower right), $Z_3$ (upper left), and $Z_4$ (upper right):

$t = \dfrac{x - x_1}{x_2 - x_1}$   (5.4)

$u = \dfrac{y - y_1}{y_2 - y_1}$   (5.5)

and:

$Z_a = Z_1 + t\,(Z_2 - Z_1)$   (5.6)

and:

$Z_b = Z_3 + t\,(Z_4 - Z_3)$   (5.7)

$\hat{Z} = Z_a + u\,(Z_b - Z_a)$   (5.8)

where $Z_1 \ldots Z_4$ are the actual data points surrounding the grid point (sometimes called a node). But this method is best used for interpolating between data already on a grid. This method can be augmented, and there are logical extensions such as bicubic interpolation, which yields higher order accuracy but suffers from over- and under-shooting the target more frequently. A sketch of the bilinear case follows.

    grid point by the inverse of the distance between the grid point and data point (sometimes this

    weight is raised to a power, 2, 3, or even higher if there is a reason). This is basically Eqn 5.3

    with the raised to the power mentioned. This method is fast, but has a tendency to generate

    bulls-eyes around the actual data points.

Kriging: is a method to determine the best linear unbiased estimate of the grid points. We will discuss this in greater detail in section 5.4. This method is very flexible, but requires the user

  • 7/28/2019 Chapter Kri

    4/24

    106 Modeling Methods for Marine Science


Figure 5.2: An example of a regular grid. The lines connecting the $Z$-points are the distances that could be calculated; the dashed lines indicate distances too large for the data to be expected to have any significant influence on the grid point value. For example, the N-nearest neighbors method ($N = 4$) estimation of grid point $\hat{Z}$ would use the points $Z_1$, $Z_2$, $Z_3$, and $Z_4$; the simpler nearest neighbor method would use only the closest point. This is a two dimensional example with axes $x_1$ and $x_2$.

to bring a priori information about the data to the problem. This information takes the form of a variogram of the semivariances, and there are several models of variograms that can be used. Typically, real data are best dealt with using a linear variogram unless there is a reasonable amount of data to derive a robust variogram (more in sections 5.3 and 5.4).

    5.1.3 Weighted averaging

Several of the above methods can also have weighting added to improve the fidelity of the grid to the actual data. Consider Eqn 5.3 (N-nearest neighbors); this equation can have a weighting factor added to the numerator to increase the influence of some data points over others on the value of the grid points. Typically the weighting is done with some idea of the uncertainty in the individual data points themselves (such as $1/\sigma_i^2$), for example:

  • 7/28/2019 Chapter Kri

    5/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 107

$\hat{Z}_j = \dfrac{\sum_{i=1}^{N} (1/\sigma_i^2)\, Z_i / d_{ij}}{\sum_{i=1}^{N} (1/\sigma_i^2) / d_{ij}}$   (5.9)

    5.1.4 Splines

We don't plan on covering splines per se. Like many of the topics covered in this course, splines are a course unto themselves. But we would be remiss if we did not mention them here. Splines got their start as long flexible pieces of wood or metal. They were used to fit curvilinearly smooth shapes when the mathematics and/or the tools were not available to machine the shapes directly (i.e. hull shapes and the curvature of airplane wings).

Since then, a mathematical equivalent has grown up around their use, and they are extremely useful in fitting a smooth line or surface to irregularly spaced data points. They are also useful for interpolating between data points. They exist as piecewise polynomials constrained to have continuous derivatives at the joints between segments. By piecewise we mean: if you don't know how/what to do for the entire data array, then fit pieces of it one at a time. Essentially then, splines are piecewise functions for connecting points in 2 or 3 dimensions. They are not analytical functions, nor are they statistical models; they are purely empirical and devoid of any theoretical basis.

The most common spline (there are many of them) is the cubic spline. A cubic polynomial can pass through any four points at once. To make sure that the result is continuously smooth, a cubic spline is fit to only two of the data points at a time. This allows for the use of the other information to maintain this smoothness.

If you consider Fig. 5.3, there are four data points ($z_1$, $z_2$, $z_3$, and $z_4$). Cubic polynomials are fit to only two data points at a time ($z_1$ to $z_2$, $z_2$ to $z_3$, etc.). By requiring the tangent of the first segment at $z_2$ to be equal to the tangent of the second segment at $z_2$, we can write a series of simultaneous equations and solve for the unknown coefficients. See Davis (1986) for more details and MATLAB's spline toolbox (based on deBoor, 1978).
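As a quick illustration, MATLAB's built-in spline function (which the toolbox extends) will fit and evaluate such a piecewise cubic; the numbers here are made up:

```matlab
% Fit a cubic spline through four irregularly spaced points and
% evaluate it on a fine, regular grid.
x  = [0.0 1.3 2.1 4.0];           % irregular sample locations
z  = [1.0 2.7 2.2 3.5];           % measured values
xi = linspace(min(x), max(x), 101);
zi = spline(x, z, xi);            % piecewise cubics, continuous derivatives
plot(x, z, 'o', xi, zi, '-')      % data points versus the smooth fit
```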

There are a number of known problems with splines. Extrapolating beyond the edges of the data domain quite often yields wildly erratic results. This is because there is no information beyond the data domain to constrain the extrapolation, and splines are essentially higher order polynomials, which will grow to large values (positive or negative). Closely spaced data points can develop "aneurysms": in an attempt to squeeze a higher order polynomial into a tight space, large over- and under-shoots of the true function can occur. These problems also occur in 3-D applications of splines. However, if a smooth surface is what you are looking for, frequently a spline (see spline relaxation in other texts) will give you a good, usable smooth fit to your data.

  • 7/28/2019 Chapter Kri

    6/24

    108 Modeling Methods for Marine Science

Figure 5.3: A cubic polynomial is fit piecewise from $z_1$ to $z_2$, $z_2$ to $z_3$, etc. Because only two points are used at any one time, the additional information from the other points can be used to constrain the tangents to be equal at the intersections of the piecewise polynomials, for example at $z_2$.

    5.2 Moving Averages

Sometimes it is possible to put your data onto a regular grid through various averaging schemes. One of the most common is the moving average. These averaging schemes are an outgrowth of a school of thought largely credited to mining operations in France and South Africa and are precursors to kriging, the main topic of this segment. Each averaging scheme applies some variant of the following mathematical equation:

$\hat{Z} = \sum_{i=1}^{n} W_i Z_i$   (5.10)

where the grid estimate ($\hat{Z}$) is the sum of a weighting scheme ($W_i$) times the actual observations ($Z_i$). The nature of $W_i$ varies, as we have seen in the first part of this segment ($N$-nearest neighbors, inverse of the distance, inverse of the square of the distance, etc.).

  • 7/28/2019 Chapter Kri

    7/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 109

    5.2.1 Block Means

The first and simplest of these averaging techniques is the block mean. This technique involves dividing your field area (containing somewhat randomly located samples) into equal area/volume blocks. Consider a two-dimensional field divided into nine sub-areas, or blocks, of equal area.


Figure 5.4: A hypothetical study area divided into nine equal sub-areas or blocks. The red points represent actual data sampling locations and the blue cross represents just one out of several grid points at which an estimate is desired. As shown in Eqn 5.11, the value at the blue cross can be estimated as the weighted sum of the means of the surrounding blocks.

An estimator for the center of this design is then given by equation 5.11, and each sub-area can be estimated by making it the center of its own 3-by-3 block.

$\hat{Z} = \sum_{k=1}^{9} W_k \bar{Z}_k$   (5.11)

Here the $W_k$'s are the weights applied to the block means $\bar{Z}_k$. These weights are determined by a number of methods, some of which are outlined in Section 5.1.3, or from field data that allows the inversion of the system of equations in Eqn 5.11.
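A minimal sketch of the block-mean estimator with equal weights ($W_k = 1/9$); the variable names and bounding box are our own assumptions:

```matlab
% Block means over a 3-by-3 division of the study area (Eqn 5.11 with
% equal weights). x, y, z are data vectors; xlo, xhi, ylo, yhi bound it.
nb = 3;                                       % blocks per side
xe = linspace(xlo, xhi, nb + 1);              % block edges in x
ye = linspace(ylo, yhi, nb + 1);              % block edges in y
Zbar = nan(nb);                               % mean of each block
for i = 1:nb
    for j = 1:nb
        in = x >= xe(i) & x < xe(i+1) & y >= ye(j) & y < ye(j+1);
        if any(in), Zbar(j,i) = mean(z(in)); end
    end
end
Zhat = mean(Zbar(:), 'omitnan');              % equal-weight estimate
```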

    One drawback to this approach is that although the mean of the block is relatively independent

    of the size of the block (once the block is above a certain, data dependent, size), the variance of the

  • 7/28/2019 Chapter Kri

    8/24

    110 Modeling Methods for Marine Science

block estimate tends to increase with increasing block size. It is quite possible that the variance of the block estimate may be too large to make the estimate of much use in your investigation. Your block size can go either way: smaller, and there is not enough data to be realistic; larger, and all the structure is averaged out (see discussion of stationarity below).

    5.2.2 Moving average design

To produce estimates with lower variance and increase the reliability of the estimate, we can use a variation of the above block mean called a moving average. This moving average is a variation upon the design of the block averaging. Once the study area has been divided into blocks, these same blocks can be re-divided to give you more block means ($\bar{Z}_k$'s) in Eqn 5.11; consider Fig 5.5.


Figure 5.5: The same study area as in Fig 5.4, but with the area surrounding the point to be estimated divided into four new areas indicated by the shaded areas. Instead of having only nine block averages to work with, Eqn 5.11 will have 13. The red data points have been removed from the figure for clarity and the new blocks have not been numbered.

It is left to the reader's imagination as to how other geometries could be used to divide and re-divide the study area into blocks for estimating the blue cross. Keep in mind that only one blue cross was shown for demonstration purposes, but each block has its own blue cross that is estimated in a fashion similar to the one we just discussed.

  • 7/28/2019 Chapter Kri

    9/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 111

The averages from the blocks, the $\bar{Z}_k$'s, can also be weighted, or windowed. The results, in histogram form, are shown in Fig 5.6. Now in Eqn 5.11 the $W_k$'s are not equal over the entire block, but rather are a function of how distant the block centroid is from the point to be estimated.

Figure 5.6: The effects of two kinds of windowing on averaging (there are many forms of windowing; these are just examples of two). In a) a simple boxcar type of windowing was applied to the data; in this case the windows are all of an equal width, producing a classical looking histogram. In b) the windows are tapered, like a Gaussian, making the data points closer to the point of estimation more important (of greater weight) than points farther away. As an example compare the shaded areas, which enclose the data and their weights used to estimate the point being estimated.

5.2.3 Trend surface analysis

There are two aspects of trend surface analysis that are important for gridding your data and kriging (which may sound redundant, but there are subtle differences between the two). In the first case, by fitting a trend surface to your data you can use the fit function to re-sample your data field on a regular grid. This reflects an interest in the trend surface itself. In the second case, you may want to remove a trend surface from your data before proceeding with the kriging operation.

Sometimes it is desirable or just convenient to have a function that represents your data in terms of the coordinate system of your study area (e.g. in terms of the longitude and latitude).

  • 7/28/2019 Chapter Kri

    10/24

    112 Modeling Methods for Marine Science

In these cases it is possible to make your study variable a function of your coordinate system. You are, in fact, fitting a trend surface to your data in terms of the coordinates that you use to locate your samples. It can be a trend surface of any order and rank, meaning that the trend can be 1st order (a straight line, a flat plane) or 2nd order (quadratic curve, surface or hyper-surface). The order refers to the highest power any independent variable is raised to; the rank refers to the dimensionality. You can set up the equations and solve them with either the normal equations or the design matrix; in certain advanced cases you may need to apply the non-linear fitting technique of Levenberg-Marquardt. Or, in most cases, you can use a handy little m-file Bill Jenkins wrote up called surfit.m. It uses the repetitious nature of higher and higher order polynomials and the SVD solution to the normal equations to fit surfaces to your data of the form:

$Z(x_1, x_2) = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_1 x_2 + a_5 x_2^2 + \cdots$   (5.12)

Most grid generation schemes work best when the data contain no large-scale trend; in order to accomplish this it is important to remove any trend surface from your data first. At the very least you should remove a first order, $n$-dimensional surface ($n$ refers to the rank of your coordinate system) from your data before proceeding to run your grid generation routine. You can always add it back in to your grid estimation points, because you now have an analytical equation that relates your property to the coordinate system of your study area. Higher order, $n$-dimensional surfaces can also be fitted. The higher order you go, the better your fit will be, regardless of what you use as a goodness-of-fit parameter. But keep in mind the better fit may not be statistically significant, and you can use ANOVA to test for this (see also Davis, 1986, pp 419-425).

    5.3 Variograms

At the heart of kriging is the semivariogram or structure function of the regionalized variables that you are trying to estimate. This amounts to the a priori information that you must supply to the software in order to make a regular grid out of your irregularly spaced data. Basically the idea is to have an estimate of the distance one would need to travel before data points separated by that much distance are uncorrelated. This information is usually presented in the form of the variogram, in which the semivariance is a function of distance, or lag ($h$).

    5.3.1 Regionalized variables

Simply put, a regionalized variable is a variable that can be said to be distributed in space. This space is not limited to the three-dimensional kind of space that we move around in every day, but can be extended to include time, parameter space, property space, etc. This definition, "distributed in space," is purely descriptive and makes no probabilistic assumptions. It merely recognizes the fact that properties measured in space follow an irregular pattern that cannot be described by a mathematical function. Nevertheless, at every point in a space it has a value ($Z(x_1, \ldots, x_n)$,

  • 7/28/2019 Chapter Kri

    11/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 113

where $n$ is equal to the dimensionality of your space). A regionalized variable is typically represented as $Z$ and the grid point estimate of it as $\hat{Z}$. A regionalized variable, then, seems to have two contradictory characteristics:

- a local, random, erratic aspect which calls to mind the notion of a random variable;
- a general (or average) structured aspect which requires a certain functional representation.

Hence we are dealing with a naturally occurring property (variable) that has characteristics intermediate between a truly random variable and a completely deterministic variable. In addition, this variable (property) can have what is known as a drift associated with it. These drifts are generally handled with trend surface analysis and can be analyzed for and subtracted out of the data much the same way an offset can be subtracted out of a data set.

    5.3.2 Semivariance

First remember the definition of variance:

$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(Z_i - \bar{Z}\right)^2$   (5.13)

in most cases the variance of a data set is a number (scalar). The semivariance is a curve (vector) derived from the data according to:

$\gamma^*(h) = \dfrac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[Z(x_i) - Z(x_i + h)\right]^2$   (5.14)

where the asterisk indicates an experimental variogram computed from the data and $h$ is the lag distance between data point pairs. There also are theoretical semivariograms which model the structure of the underlying correlation between data points, such as the exponential model:

$\gamma(h) = c_0 + c\left(1 - e^{-h/a}\right)$   (5.15)

where $c_0$ equals the nugget, $c$ equals the sill, and $a$ equals the range of the semivariogram model.

    5.3.3 The nugget, range and sill

    These three parameters define the semivariogram:

Nugget ($c_0$): Represents unresolved, sub-grid scale variation or measurement error and is seen as the intercept of the variogram.

  • 7/28/2019 Chapter Kri

    12/24

    114 Modeling Methods for Marine Science

Range ($a$): The scalar that controls the degree of correlation between data points, usually represented as a distance.

Sill ($c$): The value of the semivariance as the lag ($h$) goes to infinity; it is equal to the total variance of the data set.

Given the two parameters range and sill and the appropriate model of semivariogram, the semivariances can be calculated for any $h$. These quantities can be best visualized in Fig 5.7, a simple exponential model of semivariance.


    Figure 5.7: A simple exponential semivariogram with a range of 5 and a sill of 10.

The constant offset ($c_0$) added to the theoretical semivariance models is known as the nugget effect. This constant accounts for the influence of high concentration centers in the data that prevent the experimental semivariogram from passing through the origin. This model has its beginnings with mining geologists who were looking for nuggets of gold, which were rarely sampled directly, hence the unresolved or sub-sampling grid scale variability.

There are several models of semivariance to pick from; the trick is to pick the one that best fits your data. We will mention, later on in our discussions of kriging and cokriging, that if you are estimating the semivariogram experimentally (i.e. from actual data) often the linear model seems to give the best results. But there seems to be quite a bit of debate over what is the universal model. You have already seen the exponential model; there are also the:

  • 7/28/2019 Chapter Kri

    13/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 115

spherical model - which rises to the sill value more quickly than the exponential model; the general equation for it looks like:

$\gamma(h) = c_0 + c\left(\dfrac{3h}{2a} - \dfrac{h^3}{2a^3}\right)$ for $h \le a$; $\gamma(h) = c_0 + c$ for $h > a$   (5.16)

Gaussian model - is a semivariogram model that displays parabolic behavior near the origin (unlike the previous models, which display linear behavior near the origin). The formula that describes a gaussian model is:

$\gamma(h) = c_0 + c\left(1 - e^{-h^2/a^2}\right)$   (5.17)

linear model - in this model the data do not support any evidence for a sill or a range, and rather appear to have increasing semivariance as the lag increases. This is a key sign that the proper choice is the linear model. In these cases the linear model is concerned with the slope and intercept of the experimental semivariogram. It is given simply as:

$\gamma(h) = c_0 + b\,h$   (5.18)

and the slope ($b$) is nothing more than the ratio of the sill ($c$) to the range ($a$).
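A minimal sketch of the experimental semivariogram of Eqn 5.14, binned by lag, to which one of the models above can then be fit (the function and variable names are ours):

```matlab
function [hc, gam] = semivar(x, y, z, nbins)
% Experimental semivariogram (Eqn 5.14); x, y, z are column vectors.
n = numel(z);
[i, j] = find(triu(true(n), 1));                  % all unique data pairs
d   = sqrt((x(i) - x(j)).^2 + (y(i) - y(j)).^2);  % pairwise lag distances
dz2 = (z(i) - z(j)).^2;                           % squared differences
edges = linspace(0, max(d), nbins + 1);
hc  = 0.5 * (edges(1:end-1) + edges(2:end));      % lag bin centers
gam = nan(1, nbins);
for k = 1:nbins
    in = d >= edges(k) & d < edges(k+1);          % pairs in this lag bin
    if any(in)
        gam(k) = mean(dz2(in)) / 2;               % Eqn 5.14 for this bin
    end
end
end
```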

5.3.4 2nd Order Stationarity

Data fields are said to be first order stationary when there is no trend, i.e. the mean of the field is the same in all sub-regions. This is easily accomplished by fitting and removing a trend surface to/from the data (if you know what the trend is in the first place). Second order stationary data fields are realized when the variance is constant from one sub-region to the next. We say the data (actually, really the residuals) are homoscedastic, that is to say, equally scattered about a mean of zero.

5.3.5 Isotropic and anisotropic data

The easiest semivariance model to envision for your data is one in which the sill and range values are always the same, regardless of the direction being considered. But that is not always the case, and it is often found that data display anisotropic behavior in their range. Nevertheless, if the data are second order stationary, the sill will be the same in all directions. If it is not, then this is a warning that not all the large-scale structure has been removed from the data. Consider again an exponential model, but now look at the difference revealed when the semivariances are calculated only in the north-south direction compared to only in the east-west direction (Fig 5.8). Knowledge of these anisotropies is necessary when designing an appropriate semivariogram model of your data prior to kriging.

  • 7/28/2019 Chapter Kri

    14/24

    116 Modeling Methods for Marine Science


Figure 5.8: Two semivariograms showing the presence of anisotropies in the data. In this case the range and sill for the east-west direction are 5 and 8, but in the north-south direction they are 3 and 8.

    5.3.6 Robust semivariogram

There will be times when you will hear references to a robust semivariance estimator. This idea was championed by Noel Cressie and is dealt with in some detail in his book (Statistics for Spatial Data). Basically it is a variant on Eqn 5.14 that accounts for the effects of outliers in your data. Outliers (data in the tails of your data distribution that fall outside Gaussian expectations) have a tendency to distort the results of Eqn 5.14. Cressie has put forward the following equation to make the experimentally determined semivariogram less sensitive to these outliers (hence, robust):

$\bar{\gamma}(h) = \dfrac{\left[\dfrac{1}{n(h)} \sum_{i=1}^{n(h)} \left|Z(x_i) - Z(x_i + h)\right|^{1/2}\right]^4}{2\left(0.457 + 0.494/n(h)\right)}$   (5.19)

While somewhat overwhelming looking, upon inspection we see that this is just Eqn 5.14 modified. By taking the absolute value of the difference between two data points separated by a distance $h$, then taking its square root, dividing by the number of data pairs separated by the

  • 7/28/2019 Chapter Kri

    15/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 117

distance $h$, and then raising the result to the fourth power, we diminish the effects of these outliers. The denominator is nothing more than a normalization to make gamma unbiased. This form of the experimental semivariogram is very useful in cases where we have a lot of data to estimate the semivariogram from and outliers can become an irksome problem, although this equation also works on lower data densities.
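The robust estimator drops into the same lag-binning loop as the sketch in section 5.3.3; for a single bin, assuming adz (our name) holds the absolute differences $|Z(x_i) - Z(x_i + h)|$ for the pairs in that bin:

```matlab
% Cressie's robust estimator (Eqn 5.19) for one lag bin; adz is a
% vector of |Z(xi) - Z(xi+h)| for the n(h) pairs in the bin.
nh   = numel(adz);
gbar = mean(sqrt(adz))^4 / (2 * (0.457 + 0.494/nh));
```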

    5.4 Kriging

Kriging is a method devised by geostatisticians to provide the best local estimate of the mean value of a regionalized variable. Typically this was an ore grade, and the work was motivated by the desire to extract the most value from the ore deposit with the minimum amount of capital investment. The technique and theory of geostatistics has grown since those early days into a field dedicated to finding the best linear unbiased estimator (BLUE) of the unknown characteristic being studied.

    5.4.1 Variogram criticality

While one can use ANOVA to determine at what level a trend surface is significant, there is still something of an art to determining the correct variogram model to use. Using ANOVA as a guide, one can fit trend surfaces to the data of ever increasing order; eventually your ANOVA will tell you that, at some specified level of significance, a trend surface of that order does not provide a statistically significant increase in the fit to the data. That's where you stop: at that order minus one. The weights and neighborhood of the trend surface analysis are dependent upon the semivariance of the data, i.e. dependent upon the structure function the data displays. This interdependency between trend surface and semivariance means that there is no unique solution/combination of trend and semivariance, hence the art. The degree of spatial continuity of the data (regionalized variable) is given by the semivariogram (see section 5.3), and some of the types of models used are covered in section 5.3.2.

    5.4.2 Punctual kriging

To explain what kriging is, we are going to concentrate on the simplest form of kriging: punctual kriging. Consider that you want to find:

$\hat{Z}_P = \sum_{i=1}^{n} W_i Z_i$   (5.20)

That is to say, find the best linear weighted (unbiased) estimate of the property $Z$ at point $P$ (note this is a capital $P$). In addition, suppose you also want to know what the estimation error is as well:

$\varepsilon_P = \hat{Z}_P - Z_P$   (5.21)

  • 7/28/2019 Chapter Kri

    16/24

    118 Modeling Methods for Marine Science

That is to say, you want to know the difference between what you estimated $\hat{Z}_P$ to be and what it really is, a quantity we usually don't know ($Z_P$). There is a way to do this by requiring that the weights sum to one; this will result in an unbiased estimate if there is no trend. You can then calculate the error variance as:

$s_\varepsilon^2 = E\left[\left(\hat{Z}_P - Z_P\right)^2\right]$   (5.22)

It seems only logical that the closer a data point is to the grid point you wish to estimate, the more weight it should carry. The weights used ($W_i$) and the error of estimate ($s_\varepsilon^2$) are related to the separation distances through the semivariogram. So, if we had three data points from which to estimate one grid point (as in Fig 5.9), we would have:

$\hat{Z}_P = W_1 Z_1 + W_2 Z_2 + W_3 Z_3$   (5.23)

for the estimate and:

$W_1 + W_2 + W_3 = 1$   (5.24)

for the weights. The question that remains is: how do we find the best set of $W$'s? Consider Fig 5.9; here we have three data (control) points, and from them we wish to make a best linear unbiased estimate of the $Z$-field at grid point $P$.

Using the semivariogram we can create the following sets of equations:

$\sum_{j=1}^{3} W_j\, \gamma(d_{ij}) = \gamma(d_{iP}), \quad i = 1, 2, 3$   (5.25)

$W_1 + W_2 + W_3 = 1$   (5.26)

where $\gamma(d_{ij})$ is the semivariance over the distance $d_{ij}$ between control points $i$ and $j$, and $\gamma(d_{iP})$ is the semivariance over the distance between the control point $i$ and the grid point $P$. With Eqn 5.24 we have three unknowns and four equations (remember Eqn 5.24), and to force Eqn 5.24 to always be true we add a slack variable $\lambda$, resulting in a matrix set of equations like:

$\begin{bmatrix} \gamma(d_{11}) & \gamma(d_{12}) & \gamma(d_{13}) & 1 \\ \gamma(d_{21}) & \gamma(d_{22}) & \gamma(d_{23}) & 1 \\ \gamma(d_{31}) & \gamma(d_{32}) & \gamma(d_{33}) & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} \gamma(d_{1P}) \\ \gamma(d_{2P}) \\ \gamma(d_{3P}) \\ 1 \end{bmatrix}$   (5.27)

This yields the $W$'s and $\lambda$, and one more equation:

$s_\varepsilon^2 = W_1\,\gamma(d_{1P}) + W_2\,\gamma(d_{2P}) + W_3\,\gamma(d_{3P}) + \lambda$   (5.28)

  • 7/28/2019 Chapter Kri

    17/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 119


Figure 5.9: Showing the layout of three control points and the grid point to be estimated in example 5.1. The distances ($d_{ij}$) between control points (dashed lines) used to calculate the left hand side of Eqn 5.29 ($d_{12} = 3.4$, $d_{13} = 2.9$, $d_{23} = 4.8$) and the distances from the control points to $\hat{Z}_P$ used to calculate the right hand side ($d_{1P} = 1.0$, $d_{2P} = 3.3$, $d_{3P} = 2.0$) are given.

    yields the error of estimate.

Now we want to point out that something really cool is happening here. If you stop and think about it, you may wonder why the weights should apply to both the data points and the semivariances. We shouldn't have any problem considering Eqn 5.23; after all, it's just the best linearly weighted combination of the surrounding data points. But what about Eqn 5.25? Why should these also be true? Well, strictly they aren't, not until you add the slack variable ($\lambda$) that allows Eqn 5.24 to always be true. What insight does this give you into the nature of regionalized variables? We'll let you ponder that for a while.

Now it sometimes happens that you don't want to or can't remove the trend surface prior to kriging. It is still possible to come up with a best linear unbiased estimate of your grid points using Universal Kriging; the matrix you form is even more complicated than the one in Eqn 5.27 and is covered in Davis, Chapter 5.

  • 7/28/2019 Chapter Kri

    18/24

    120 Modeling Methods for Marine Science

Table 5.1: Example 5.1

            $x_1$ Coordinate (km)   $x_2$ Coordinate (km)   Water Elevation (m)
Well 1              3.0                    4.0                    120
Well 2              6.3                    3.4                    103
Well 3              2.0                    1.3                    142
Your site           3.0                    3.0                    ???

    5.4.3 An example

This example is taken from Davis, Chapter 5 and addresses only the concept of punctual kriging. Suppose you wanted to dig a well and wanted a good estimate of the elevation of the water table before you began digging. Suppose further that you had three wells already dug, distributed about your proposed site ($\hat{Z}_P$) much in the same fashion as are the control points in Fig 5.9. Given the data in Table 5.1, you can use punctual kriging to make a best linear unbiased estimate of the water table elevation at your proposed site.

From this information and a structure analysis (semivariogram) you can fill out the equations in Eqn 5.27 and then solve for the water table elevation at your site. The semivariogram analysis revealed a linear semivariogram out to 20 km with an intercept of zero and a slope of 4 m²/km. So

the matrices in Eqn 5.27 look like:

$\begin{bmatrix} 0 & 13.4 & 11.5 & 1 \\ 13.4 & 0 & 19.1 & 1 \\ 11.5 & 19.1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} 4.0 \\ 13.3 \\ 7.9 \\ 1 \end{bmatrix}$   (5.29)

The numbers on the left-hand side of the equation come from the semivariances between control points, calculated by knowing the distance between them and the linear semivariogram model. The numbers on the right-hand side of the equation come from knowing the distance between the proposed site and each control point and, again, the linear semivariogram model.

If the condition number of the matrix on the left-hand side isn't too bad, you can invert directly to solve for the $W$'s and $\lambda$; otherwise you can use SVD to solve for the answer. Either way, in this case you get a column vector of:

$\begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} 0.6039 \\ 0.0867 \\ 0.3093 \\ -0.7261 \end{bmatrix}$   (5.30)

which when multiplied through Eqn 5.23 gives an estimate of 125.3 m, and using Eqn 5.28 yields an error estimate of 5.28 m². The square root of this number represents one standard deviation (2.30

  • 7/28/2019 Chapter Kri

    19/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 121

m), which represents the bounds of 68% confidence. So plus or minus two times this standard deviation yields the elevation of the water table at your proposed site with 95% confidence.

MATLAB's answers are a little different from the ones in Davis (1986), but we attribute that to the fact that Davis does not use singular value decomposition to invert his matrices (see Davis, 1986, Chap 3).
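A minimal MATLAB sketch of this example, with the numbers from Table 5.1 and the linear model $\gamma(h) = 4h$ (the variable names are ours):

```matlab
% Punctual kriging of the water table example (Eqns 5.27, 5.23, 5.28).
xy  = [3.0 4.0; 6.3 3.4; 2.0 1.3];   % well coordinates (km), Table 5.1
z   = [120; 103; 142];               % water table elevations (m)
p   = [3.0 3.0];                     % proposed site (km)
gam = @(h) 4*h;                      % linear semivariogram, 4 m^2/km
n = size(xy, 1);
D = zeros(n);
for i = 1:n                          % distances between control points
    for j = 1:n
        D(i,j) = norm(xy(i,:) - xy(j,:));
    end
end
dP = sqrt(sum((xy - repmat(p, n, 1)).^2, 2));  % distances to the site
A  = [gam(D) ones(n,1); ones(1,n) 0];          % left-hand side of Eqn 5.27
b  = [gam(dP); 1];                             % right-hand side of Eqn 5.27
w  = A \ b;                                    % weights W1..W3 and lambda
Zhat = w(1:n)' * z;                            % Eqn 5.23: 125.3 m
s2   = w' * b;                                 % Eqn 5.28: 5.28 m^2
```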

    5.5 Cokriging with MATLAB

We have been very fortunate to obtain from Denis Marcotte (via e-mail) copies of the m-files published in Marcotte (1991). This section of the lecture notes covers material on how to use this program. Although not covered in lecture, this very powerful program will be extremely useful to any of you who must make objective grids of your data during your careers.

The concept of cokriging is nothing more than a multivariate extension of the kriging technique we went over in class, covered in lecture notes section 5.4. Instead of going through all of the machinations necessary for kriging one property at a time, we do all of the properties we wish to grid in one calculation. In addition, covariance information about the way properties relate to each other is used to improve the grid estimation and reduce the error associated with the grid estimates.

5.5.1 Estimating the variogram

Along with the types of variograms estimated in lecture notes section 5.3, cross-variograms are also necessary. These are logical extensions of the variograms we have already dealt with. Remember the semivariance is provided by:

$\gamma_u^*(h) = \dfrac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[u(x_i) - u(x_i + h)\right]^2$   (5.31)

The cross-semivariance is given by:

$\gamma_{uv}^*(h) = \dfrac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[u(x_i) - u(x_i + h)\right]\left[v(x_i) - v(x_i + h)\right]$   (5.32)

where $n(h)$ refers to the number of data pairs that are separated by the same distance $h$; when $u = v$ you have the definition of the semivariogram. One interesting thing about the cross-semivariance is that it can take on negative values. The semivariance must, by definition, always be positive; the cross-semivariance can be negative because the value of one property may be increasing while the other in the pair is decreasing.
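The cross-semivariance drops into the same lag-binning loop as the semivariogram sketch in section 5.3.3; for one bin, with iu and ju (our names) indexing the paired points and u, v the two property vectors:

```matlab
% Cross-semivariance (Eqn 5.32) for the pairs in one lag bin.
du    = u(iu) - u(ju);                      % differences in property u
dv    = v(iu) - v(ju);                      % differences in property v
gamuv = sum(du .* dv) / (2 * numel(iu));    % Eqn 5.32; can be negative
```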

  • 7/28/2019 Chapter Kri

    20/24

    122 Modeling Methods for Marine Science

    5.5.2 The coregionalized model

As we discussed earlier, a regionalized variable is a variable that is distributed in space, where the meaning of space can be extended to include phenomena that are generally thought of as occurring in time. A regionalized phenomenon can be represented by several inter-correlated variables, for example, lead-zinc deposits or nutrients in the ocean. There may then be some advantage to studying them simultaneously; this is an extension of regionalized variable theory to multivariate space and is what amounts to a coregionalized model. We can see from Eqn 5.32 that the cross-variogram is symmetric in ($u$, $v$) and in ($x$, $x + h$), which is not always the case in the covariance matrix formed from the data.

    5.5.3 Using cokri.m

In this section we will try to give you our best understanding of the program cokri.m. In this way we hope to make the simplest and most straightforward application of this program available to you, while opening the possibility of future, more complicated uses as well.

Sometimes the easiest way to understand a program is to understand, as best as possible, what the input and output variables are. But first let's define some of the indices we will be using when talking about the parts of cokri.m: $n$ represents the number of data points (real ones, not estimated ones); $d$ represents the number of dimensions you are working with (in the water table example above you have $x_1$ and $x_2$ coordinates, hence $d = 2$; remember, the elevation of the water table was your regionalized variable); lowercase $p$ represents the number of properties you are working with (again, in the example above there was only water table elevation, so $p = 1$); $m$ represents the total number of grid points (nodes) that you are working with; and $k$ represents the number of variogram models you are working with. Now for the input and output variables; in the case of cokri.m they are as follows:

Input:

x this is the $n$ by ($d + p$) matrix of data points. In this program $n$ refers to the total number of sample locations (stations, well locations, etc.), $p$ refers to the number of properties you are estimating, and $d$ to the dimensionality of the problem being studied (1-D, 2-D, 3-D, etc.).

x0 is the $m$ by $d$ matrix of points on your grid that you will be kriging (cokriging) onto. In this program $m$ is the number of grid points; e.g. if you are working a 2-D problem and decide to put your estimates onto a 21 by 57 point grid, $m$ will be equal to 1197. x0 is, however, represented as a 1197 by 2 matrix of coordinate doublets.

model is perhaps one of the trickiest variables in the program. It represents half of the coregionalization model to be used. If you are using only one model in 3-D, then model is a 1 by 4 matrix wherein the first column is the code of the model (1=nugget, 2=exponential, 3=gaussian, 4=spherical, 5=linear). The remaining columns of the variable represent the range of your model in the $x$, $y$, and $z$ directions. A special note should be made here

  • 7/28/2019 Chapter Kri

    21/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 123

about the use of a linear model. As stated in the help information of cokri.m, the ranges in a linear model are arbitrary, so that when they are divided into the also arbitrary sill values in c they produce the slope of the linear semivariogram model being used for that linear model in that direction.

c is the variable containing the ($p \cdot k$) by $p$ sills of the coregionalization model in use. In this program $k$ is used to represent the number of variogram models being applied to the problem. For example, one might wish to combine the effects of a nugget model with a linear model for three properties in 3-dimensions; then $k = 2$, model is a 2 by 4 matrix, and c is a 6 by 3 matrix of numbers. A nugget model is indicated when the intercept of a semivariogram model is not zero, and that intercept value is put in the first $p$ by $p$ sub-matrix of c to correspond to the first model row of the model variable.

itype is a scalar variable indicating the type of cokriging to be done. In five different values just about everything is covered, from simple cokriging to universal cokriging with a trend surface of order 2. In general, simple cokriging should be used when the mean of the data is known and the data field is globally stationary in its mean as well as locally stationary in its variance.

avg: Marcotte, in his paper (Marcotte, 1991), states that this variable is not used, but later in one of his examples he uses it. We cannot get the program to run unless we provide a 1 by $p$ matrix of the averages of the individual properties being cokriged when doing simple cokriging.

block is a 1 by $d$ vector of the size of the blocks to be cokriged. If we were certain of the volume of our individual samples we could use something other than point kriging; i.e. any positive values will work in that case.

nd is a 1 by $d$ vector of the discretization grid for cokriging; if using point cokriging make them all ones.

ival is a scalar describing whether or not cross-validation should be done and how. We find it easier and quicker to run the program with ival set to zero for no cross-validation.

nk is a scalar indicating the number of nearest neighbors of the input x matrix to use in estimating the cokriged grid point. This is a difficult parameter to give hard and fast rules for. You may wish cokri.m to use all of the data points and set this scalar to a very large number; on the other hand, you may wish for only local effects to factor into the weighted estimates for the grid point. If you don't get satisfactory results the first time around, increase or decrease this number.

rad is a scalar that describes the radius of search for the nearest neighbors in nk; clearly they are interrelated and one helps constrain the other. Additionally, it is clear here that the coordinates all need to be in the same units; if not, standardization helps.

  • 7/28/2019 Chapter Kri

    22/24

    124 Modeling Methods for Marine Science

ntok is a scalar describing how many grid points in x0 will be cokriged as one group. When ntok is greater than one, the points inside the search radius will be found from the centroid of the ntok grid point locations.

Output:

x0s is, of course, your answer. It is an $m$ by ($d + p$) matrix of the grid point estimates. The first $d$ columns correspond to the grid point coordinates given in x0 and the remaining $p$ columns correspond to the estimates of the properties at those grid point coordinates.

s is an $m$ by ($d + p$) matrix of the error estimates of the grid points. This is the big benefit to kriging, in that it provides you with not only an estimate of a property's value at a grid point, but also an estimate of the uncertainty in that estimate.

sv is a 1 by $p$ vector of the variances of points in the universe.

id is a two-column matrix of the identifiers of the $\lambda$ (or, in Davis, $W$) weights for the last cokriging system solved (i.e. the last grid point system of equations).

l is a matrix with the $\lambda$ (or $W$) weights and Lagrange multipliers of the last cokriging system solved; the extra rows correspond to the number of constraints applied to the simple cokriging system.

A word of caution: for some reason, Marcotte has set up cokri.m to turn off case sensitivity. When the program is finished running, variables Axb and axb are considered the same, and making reference to a variable such as Axb will generate a "variable or function not found" error. Simply issue the command casesen and case sensitivity will be restored. We have modified the code we provide to you by simply commenting out the casesen off command with %casesen off, so you needn't worry about this at first (but it is available; just remove the % sign).

    5.5.4 Things to Remember

When using cokri.m it may be helpful to remember the following three insights as to how the program works.

1. If the data have been properly detrended (rendered second order stationary), then it is only logical to assume that the nuggets will be equal regardless of direction. As the lag goes to zero, the subgrid scale noise (composed of both real geophysical noise and measurement error) will converge to the same value for all directions.

2. In a similar argument, if the data have been properly detrended, then the sills have to be equal regardless of direction for each property (and cross-property). Think about the mostly gridded example in class; the sill represents the total variance in the anomalies (residuals

  • 7/28/2019 Chapter Kri

    23/24

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 125

= data minus trend). Just because you calculated the semivariances in different directions doesn't mean you haven't used all of the data points; since you've used all of the data and the sill represents the total variance contained in the data (anomalies), the sills will also be equal regardless of direction.

3. This last one may seem a little odd. The ranges should all be the same in a given direction, regardless of the property or cross-property. Think of it this way: the decorrelation scale length is always the same in a given direction; the medium (seawater, granitic batholith, etc.) doesn't change even though the property might.

Now, of course, we've told you how difficult it is to render your data second order stationary, and the above insights might not be strictly, numerically true. Your options are to return to your trend surface analysis and see if you can't find a better filter to remove the large scale trend that is contaminating your anomalies. Or, if the fitted parameters are close in value (remember that nlleasqr.m gives you error estimates of these parameters), averaging them can still yield useful results. Remember, the sill and nugget are averaged over directions, but the ranges are averaged over properties.

    5.6 Problems

All of your problem sets are served from the web page:

http://eos.whoi.edu/12.747/problem_sets.html

which can be reached via a number of links from the main course web page. In addition, the date the problem set comes out, the date it is due, and the date the answers will be posted are also available in a number of locations (including the one above) on the course web page.

    References

    Clark, I., 1979, Practical Geostatistics, Elsevier, New York, 129 p.

    Cressie, N.A., 1993, Statistics for Spatial Data, Wiley-Interscience, New York, 900 p.

Davis, J.C., 1986, Statistics and Data Analysis in Geology, 2nd Edition, John Wiley and Sons, New York, 646 p.

    deBoor, C., 1978, A Practical Guide to Splines, Springer-Verlag, New York, 392 p.

Marcotte, D., 1991, Cokriging with MATLAB, Comp. and Geosci., 17(9): 1265-1280.

  • 7/28/2019 Chapter Kri

    24/24

    126 Modeling Methods for Marine Science