

A General Framework for Error Analysis in Measurement-Based GIS, Part 2: The Algebra-Based Probability Model for Point-in-Polygon Analysis

Yee Leung Department of Geography and Resource Management, Center for Environmental Policy

and Resource Management, and Joint Laboratory for Geoinformation Science, The Chinese University of Hong Kong, Hong Kong

E-mail: [email protected]

Jiang-Hong Ma Faculty of Science, Xi’an Jiaotong University and Chang’an University, Xi’an, P.R. China

E-mail: [email protected]

Michael F. Goodchild Department of Geography, University of California, Santa Barbara, California, U.S.A.

E-mail: [email protected]

Abstract. This paper is Part 2 of a four-part series of our research on the development of a general

framework for error analysis in measurement-based geographic information systems (MBGIS). In this

paper, we discuss the problem of point-in-polygon analysis under randomness, i.e., with random

measurement error (ME). It is well known that overlay is one of the most important operations in GIS,

and point-in-polygon analysis is a basic class of overlay and query problems. Although it is a classic

problem, it has not been addressed appropriately. With ME in the location of the vertices of

a polygon, the resulting random polygons may undergo complex changes, so that the point-in-polygon

problem may become theoretically and practically ill-defined. That is, there is a possibility that we

cannot answer whether a random point is inside a random polygon if the polygon is not simple and

cannot form a region. For the point-in-triangle problem, however, such a case need not be considered

since any triangle always forms an interior or region. To formulate the general point-in-polygon

problem in a suitable way, a conditional probability mechanism is first introduced in order to

accurately characterize the nature of the problem and establish the basis for further analysis. For the

point-in-triangle problem, four quadratic forms in the joint coordinate vectors of a point and the

vertices of the triangle are constructed. The probability model for the point-in-triangle problem is then

established by the identification of signs of these quadratic form variables. Our basic idea for solving a

general point-in-polygon (concave or convex) problem is to convert it into several point-in-triangle


problems under a certain condition. By solving each point-in-triangle problem and summing the

solutions up, the probability model for a general point-in-polygon analysis is constructed. The

simplicity of the algebra-based approach lies in the fact that, by using these quadratic forms, we can circumvent

the complex geometrical relations between a random point and a random polygon (convex or concave)

that one has to deal with in any geometric method when the probability is computed. The theoretical

arguments are substantiated by simulation experiments.

Keywords: algebra-based probability model, approximate covariance-based error band, point-in-

triangle, point-in-polygon, quadratic form

1. Introduction

A traditional method of geographical analysis is to lay a map of one theme over a map of

another. Such a process is commonly called overlay (or overlay functions). Overlay is the operation of

comparing variables among multiple coverages, requiring both graphic and attribute comparisons, and

is one of the most powerful features of the GIS. Vector overlays are methodologically and technically

more complex than raster overlays and usually produce complex output files with more nodes, arcs

and polygons than the original files. Geometry is used to define new objects in a topological sense.

Overlay does not always involve comparisons between polygons, since points and lines are often

involved. In terms of point, line and polygon, we usually have three types of problems: (1) point-in-

polygon, (2) line-in-polygon, and (3) polygon-on-polygon. When performing overlays, it is often

important to know whether a point of the first layer lies within a polygon of the second layer. Point-in-

polygon operations are used to compare a map of a point distribution with a map of regions, or to

entertain a query of whether a point is in a polygon. The point-in-polygon query in GIS has been

formally discussed in Leung and Yan (1997) under nine basic situations where points and polygons

can be precise, fuzzy (imprecise), and random (with error).

Blakemore (1984) has discussed the point-in-polygon relation under the epsilon band concept, i.e.,

a precise point in an epsilon-band. Based on the geometric relation of points and polygons, Stanfel et

al. (1995) have developed two computationally simple approximation schemes to calculate the


probability that a precise point is inside an uncertain polygon having as vertices a set of points with

coordinates determined by various measurement schemes. Monte Carlo simulation, on the other hand,

is a technique for the modeling of uncertainty in spatial data. It can quantify uncertainty in spatial data

and applications by determining the possible range of an application result. A spatial statistical model

that represents multiple statistics simultaneously and weighted against each other is proposed in

Ehlschlaeger (2002). With the exception of the conceptual sketch made in Leung and Yan (1997), it

appears that in-depth theoretical analysis of the point-in-polygon issue when points and polygons both

have random errors has not been dealt with in the literature (Rigaux et al. 2002).

Testing whether a point is inside a polygon is not only a common operation in spatial and GIS

applications, but it is also a basic operation in computer graphics (Rigaux et al., 2002). Algorithms

such as the crossings test (or ray-polygon intersection test), the angle summation method, the bins

method and the grid method have been developed over the years. The triangle test is another method

by which a polygon is treated as a fan of triangles emanating from one vertex and the point is tested

against each triangle by computing its barycentric coordinates. A faster triangle fan test is to store a set

of half-plane equations for each triangle and test each in turn. Most of these methods developed in

computer science are fast but need much larger memory and higher initialization times (see Haines,

1994 for details). However, these methods are generally not suitable for error analysis in GIS because

they usually do not deal with the uncertainty caused by locational errors of the vertices, and it is not so

easy to perform probability analysis in their formulations.
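For orientation, the deterministic crossings test mentioned above can be sketched as follows. This is a minimal illustration for error-free coordinates, not code from the paper; the function name `point_in_polygon_crossings` and the example polygons are assumptions made here. It shows why such tests carry no probabilistic information: the answer is a plain Boolean computed from fixed vertex positions.

```python
def point_in_polygon_crossings(point, vertices):
    """Crossings (ray casting) test for error-free coordinates.

    Casts a horizontal ray from `point` towards +x and toggles `inside`
    each time the ray crosses a polygon edge.
    """
    px, py = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside


# Hypothetical example polygons: a convex square and a concave L-shape.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
```

The test works for convex and concave simple polygons alike, but it assumes that the vertex coordinates are exact.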

To appropriately solve the point-in-polygon problem when polygons have random errors,

a fundamental concept should first be clarified. When locations of the vertices of a polygon are

random and when the polygon is formed by vertices listed in a specific order (this is a common

representation of a polygon), the randomly generated polygon can undergo complex changes.

Theoretically, even if the variances of the locational errors are not large, there is a possibility that we

have no basis to answer whether a random point is inside a random ‘polygon’ if the ‘polygon’

has no well-defined interior or region. Under this situation, the point-in-polygon

problem is ill-defined in the strict sense. It complicates the point-in-polygon problem and makes it


more difficult to solve. The complication and difficulty are largely due to (1) the complex relations

between random points and random polygons, and (2) the convexity/non-convexity of polygons.

Fortunately, for random triangles the point-in-triangle problems can always be well-defined since any

triangle has its interior or region. This sheds light on solving the problem through polygon

triangulation.

To formulate appropriately the general point-in-polygon problem under randomness, and to set up

the corresponding probability model, we will first introduce a conditional probability mechanism to

accurately characterize the nature of the problem and to use it as the basis for further analysis. We will

try to address the point-in-polygon problem from the algebra-based point of view so that we can

circumvent the complications arising from polygon convexity, which have to be carefully dealt with in

any geometric method. Since any polygon, concave or convex, can be triangulated, we will also

employ a decomposition approach like that used in computer graphics to decompose the point-in-

polygon problem into several elementary point-in-triangle problems. The advantages of the above

approach are that (1) it legitimizes the point-in-polygon analysis under randomness; (2) it greatly

simplifies the problem by avoiding the issues of polygon convexity; and (3) it renders a probability

estimation for the problem.

In this part of the series, we first discuss the relationship between a point and a triangle in section

2. Four quadratic forms in the joint coordinate vectors of a point and the vertices of a triangle are

introduced in order to identify whether the point is inside the triangle. The probability that the point is

inside the triangle is then computed by the identification of signs of the four quadratic forms, giving a

concise expression for the probability model. Accordingly, a triangle model and a general polygon

model for the point-in-polygon problem under ME are constructed. To substantiate the theoretical

arguments, several simulation experiments are discussed in section 3. We then conclude the paper with

a summary in section 4.


2. An Algebra-Based Probability Model for Point-in-Polygon Analysis

In general, a polygon consists of an ordered series of vertices linked by edges. An n-sided polygon is a closed plane figure with n sides. Polygons can be classified into simple and non-simple (or complex). A simple polygon satisfies two conditions: (1) adjacent edges share only a single point; and (2) non-adjacent edges do not intersect. A simple polygon has a well-defined bounded interior (surrounded by edges) and an unbounded exterior. Since simple polygons are basic building blocks in vector-based GIS, the polygons we consider henceforth will be simple, unless specified otherwise. Vertices of a simple n-sided polygon can always be arranged into a certain ordered series $V_i^0$, $i = 1, 2, \ldots, n$ (clockwise or counter-clockwise); conversely, the polygon is uniquely determined by the order of the n vertices when their locations are given. Let the polygon be denoted as $\mathrm{Poly}(V_1^0 \cdots V_n^0)$ and the region formed by its interior and boundaries be denoted as $R(V_1^0 \cdots V_n^0)$. For convenience, the coordinates of a point are written as a column vector throughout, and the vector is denoted in bold face, e.g. $\mathbf{x}$, $\mathbf{x}_1$. The random vector corresponding to $\mathbf{x}$ is denoted in bold uppercase, e.g. $\mathbf{X}$, $\mathbf{X}_1$. A point V with coordinate vector $\mathbf{x}$ is denoted by $V(\mathbf{x})$, sometimes written as $V(x_1, x_2)$.

The general point-in-polygon problem when points and polygons are both random amounts to addressing whether an uncertain point V is in the region of an uncertain polygon $R \equiv R(V_1 V_2 \cdots V_n)$, where $V_i$, $i = 1, 2, \ldots, n$, are the vertices of the polygon R. (For simplicity and without confusion, we henceforth use $V_i$ for both the singular and plural form, rather than writing $V_i$’s for the plural, and the same applies to all other relevant symbols.) Within the probability framework, this is equivalent to computing the probability that point V is in the region of polygon R. This concept implies an underlying condition, namely that a random polygon must have a region (area). If this condition does not hold, the statement “point V is in the region of polygon R” is ill-defined. The probability description then becomes problematic and we cannot answer the point-in-polygon query. We know that in vector-based data, the randomness of a polygon is characterized by the randomness of its vertices, whose coordinates are randomly generated (usually from a certain distribution). For


an underlying polygon, the number and the order of the vertices are fixed. Accordingly, it cannot be theoretically guaranteed that every random sample or variation of the polygon ($n > 3$) will form a region or will be simple, even if the underlying polygon is simple. For example, the underlying polygon $\mathrm{Poly}(V_1^0 \cdots V_4^0)$ shown in Fig. 2.1(a) is a simple quadrilateral. However, the random polygon $\mathrm{Poly}(V_1 \cdots V_4)$ generated from four random points is not simple since two of its edges intersect (the intersection is nevertheless not a vertex!). Moreover, the random point V corresponding to the original point $V^0$ makes the point-in-polygon problem ill-defined. However, such a problem does not exist in point-in-triangle analysis, since a triangle always has a region although the relative topological relationship among vertices may change under randomness.
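The loss of simplicity described above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes independent, isotropic Gaussian errors with a common standard deviation `sigma` at each vertex, and the names `segments_cross`, `quad_is_simple` and `fraction_non_simple` are hypothetical. It estimates how often a perturbed quadrilateral has two non-adjacent edges that cross.

```python
import random


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p, q, r, s):
    """True if segments pq and rs intersect at a point interior to both."""
    d1, d2 = orient(p, q, r), orient(p, q, s)
    d3, d4 = orient(r, s, p), orient(r, s, q)
    return d1 * d2 < 0 and d3 * d4 < 0


def quad_is_simple(v):
    """A quadrilateral is simple if neither pair of opposite edges crosses."""
    return not (segments_cross(v[0], v[1], v[2], v[3])
                or segments_cross(v[1], v[2], v[3], v[0]))


def fraction_non_simple(true_vertices, sigma, trials=20000, seed=0):
    """Share of perturbed quadrilaterals that are not simple.

    Assumption: each coordinate receives independent N(0, sigma^2) error.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        sample = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                  for x, y in true_vertices]
        if not quad_is_simple(sample):
            bad += 1
    return bad / trials
```

For a well-conditioned quadrilateral and small `sigma` the estimated share is close to zero, whereas for a thin quadrilateral whose width is comparable to `sigma` it becomes substantial.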

Fig. 2.1 A simple polygon and its randomly generated version, a non-simple polygon: (a) a simple polygon; (b) a random polygon

Therefore, in order to study the probability that a random point is inside a random polygon, the

concept that a point is inside a random polygon must be well-defined. In other words, we have to

guarantee that the region of a random polygon can be formed before the point-in-polygon problem

under randomness can be discussed. This is quite different from other cases of the point-in-polygon

analysis discussed in Leung and Yan (1997). This problem will be theoretically ill-defined without the

determination of the region.

In practice, the above case may seldom occur if random error is sufficiently small. If we can

almost surely confirm (which may be difficult to do) that a random polygon has its region or is simple,

the probability that a random point is inside a random polygon can be viewed as absolute probability.


It should be a conditional probability otherwise. This is thus a salient theoretical and practical issue of

fundamental importance.

Let $\mathbf{X} \equiv (X_1, X_2)^{\mathrm T}$ and $\mathbf{X}_i \equiv (X_{i1}, X_{i2})^{\mathrm T}$ denote respectively the random coordinate vectors of the point V and the vertices $V_i$, $i = 1, 2, \ldots, n$. Thus the above absolute probability can be expressed as

$$
\begin{aligned}
P[V \in R] &= \int \cdots \int P\big[\, V(\mathbf{X}) \in R(V_1(\mathbf{X}_1) V_2(\mathbf{X}_2) \cdots V_n(\mathbf{X}_n)) \,\big|\, (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n) = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) \big] \, f_R(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) \, d\mathbf{x}_1 \, d\mathbf{x}_2 \cdots d\mathbf{x}_n \\
&= \int \cdots \int f_R(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) \Big[ \iint_{V(\mathbf{x}) \in R(V_1(\mathbf{x}_1) V_2(\mathbf{x}_2) \cdots V_n(\mathbf{x}_n))} f_v(\mathbf{x}) \, d\mathbf{x} \Big] \, d\mathbf{x}_1 \, d\mathbf{x}_2 \cdots d\mathbf{x}_n ,
\end{aligned}
$$

where $f_R$ is the joint probability density of all vertex coordinates of R, and $f_v$ is the joint probability density of the coordinates $(X_1, X_2)$ of point V. It can be observed that the inner integral on the right-hand side involves the region $R(V_1(\mathbf{x}_1) V_2(\mathbf{x}_2) \cdots V_n(\mathbf{x}_n))$. Obviously, if it is not a region, the probability cannot be computed. Even if it is computable, the computation may be very difficult and complex because the multiple integrals involved may have complex domains.

Taking all of the above into consideration, in order to compute this probability, we need a new representation of the probability $P[V \in R]$, and a theoretically and practically sound probability model to solve the point-in-polygon problem under randomness.

2.1 Point-in-triangle probability model

In the point-in-polygon problem, only the point-in-triangle problem may have an absolute

probability model. We first describe the relationship between a point and a triangle and then formulate

the corresponding probability model.

2.1.1 Identification of the relationship between a point and a triangle

In this subsection, we only focus on the establishment of a relationship between a point and a

triangle without random errors. In other words, the variables or vectors involved are not random. This

will form a basis for the point-in-polygon analysis under ME.

We start with the relationship between a point and a triangle. Let the three vertex positions of the underlying triangle $\triangle V_1V_2V_3$ be $V_i$, $i = 1, 2, 3$, let the corresponding coordinate column vectors be $\mathbf{x}_i = (x_{i1}, x_{i2})^{\mathrm T} \in \mathbb{R}^2$, $i = 1, 2, 3$, and let $\mathbf{x} = (x_1, x_2)^{\mathrm T} \in \mathbb{R}^2$ be the coordinate column vector of any point V in the plane. First, we define two basic functions $f_{\det}(\cdot\,, \cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$ as follows:

Definition 2.1 For any column vectors $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^2$, we define

$$f_{\det}(\mathbf{x}, \mathbf{y}) \equiv \det(\mathbf{x}, \mathbf{y}) = |\,\mathbf{x} \;\; \mathbf{y}\,| , \tag{2.1}$$

$$h(\mathbf{x}, \mathbf{y}, \mathbf{z}) \equiv f_{\det}(\mathbf{x} - \mathbf{y}, \mathbf{y} - \mathbf{z}) . \tag{2.2}$$

According to the properties of a determinant, it is easy to check the following properties:

(i) $f_{\det}(\mathbf{x} - \mathbf{y}, \mathbf{y} - \mathbf{z}) = f_{\det}(\mathbf{x} - \mathbf{z}, \mathbf{y} - \mathbf{z})$,

(ii) $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = h(\mathbf{y}, \mathbf{z}, \mathbf{x}) = h(\mathbf{z}, \mathbf{x}, \mathbf{y})$,

(iii) $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{x}, \mathbf{z}, \mathbf{y})$, $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{z}, \mathbf{y}, \mathbf{x})$, $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{y}, \mathbf{x}, \mathbf{z})$.
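Definition 2.1 and properties (i) to (iii) are easy to check numerically. The sketch below is an illustration added here, with the helper name `sub` and the sample vectors chosen arbitrarily; it uses integer coordinates so that the comparisons are exact.

```python
def f_det(x, y):
    """Determinant of the 2x2 matrix with columns x and y, as in (2.1)."""
    return x[0] * y[1] - x[1] * y[0]


def sub(a, b):
    """Component-wise difference a - b of two 2-vectors."""
    return (a[0] - b[0], a[1] - b[1])


def h(x, y, z):
    """h(x, y, z) = f_det(x - y, y - z), as in (2.2)."""
    return f_det(sub(x, y), sub(y, z))


# Arbitrary sample vectors (hypothetical values).
x, y, z = (1, 2), (4, -1), (0, 5)

property_i = f_det(sub(x, y), sub(y, z)) == f_det(sub(x, z), sub(y, z))
property_ii = h(x, y, z) == h(y, z, x) == h(z, x, y)
property_iii = (h(x, y, z) == -h(x, z, y)
                and h(x, y, z) == -h(z, y, x)
                and h(x, y, z) == -h(y, x, z))
```

Property (ii) says that h is invariant under cyclic permutations of its arguments, and property (iii) says that swapping any two arguments flips the sign.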

Lemma 2.1 A point V is located in a triangle $\triangle V_1V_2V_3$ if and only if there are three non-negative numbers $\lambda_i$, $i = 1, 2, 3$, such that

$$\mathbf{x} = \lambda_1 \mathbf{x}_1 + \lambda_2 \mathbf{x}_2 + \lambda_3 \mathbf{x}_3 , \tag{2.3}$$

$$\lambda_1 + \lambda_2 + \lambda_3 = 1, \quad \lambda_i \ge 0, \quad i = 1, 2, 3 . \tag{2.4}$$

That is, the coordinate column vector of the point V is a convex combination of the coordinate column vectors of the three vertices of the triangle $\triangle V_1V_2V_3$. (The proof is given in Appendix 1.)

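The conclusion can be generalized. In general, the convex hull of a set of n points $S_n = \{ V_i = V_i(\mathbf{x}_i) : i = 1, \ldots, n \}$ can be expressed as the set of points whose coordinate column vectors $\mathbf{x}$ satisfy

$$\mathbf{x} = \sum_{i=1}^{n} \lambda_i \mathbf{x}_i , \quad \sum_{i=1}^{n} \lambda_i = 1 , \quad \lambda_i \ge 0 , \quad \mathbf{x}_i = (x_{i1}, x_{i2})^{\mathrm T} , \quad i = 1, \ldots, n .$$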

It should be noted that equation (2.3) can be expressed as

$$x_1 = \lambda_1 x_{11} + \lambda_2 x_{21} + (1 - \lambda_1 - \lambda_2) x_{31} , \quad x_2 = \lambda_1 x_{12} + \lambda_2 x_{22} + (1 - \lambda_1 - \lambda_2) x_{32} ,$$

or

$$
\begin{aligned}
(x_{11} - x_{31}) \lambda_1 + (x_{21} - x_{31}) \lambda_2 &= x_1 - x_{31} , \\
(x_{12} - x_{32}) \lambda_1 + (x_{22} - x_{32}) \lambda_2 &= x_2 - x_{32} .
\end{aligned}
\tag{2.5}
$$

This is a linear system with respect to the unknown parameters $\lambda_1$ and $\lambda_2$. Since the area of the triangle $\triangle V_1V_2V_3$ is equal to the absolute value of the following determinant

$$
\frac{1}{2} \begin{vmatrix} x_{11} & x_{12} & 1 \\ x_{21} & x_{22} & 1 \\ x_{31} & x_{32} & 1 \end{vmatrix}
= \frac{1}{2} \begin{vmatrix} x_{11} - x_{31} & x_{12} - x_{32} \\ x_{21} - x_{31} & x_{22} - x_{32} \end{vmatrix}
= \frac{1}{2} f_{\det}(\mathbf{x}_1 - \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_3) , \tag{2.6}
$$

the determinant will not be equal to zero if the triangle is formed. Thus we can obtain the unique solution according to Cramer’s rule:

$$
\lambda_1^{*} = \frac{f_{\det}(\mathbf{x} - \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_3)}{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_3)} , \quad
\lambda_2^{*} = \frac{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3, \mathbf{x} - \mathbf{x}_3)}{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3, \mathbf{x}_2 - \mathbf{x}_3)} . \tag{2.7}
$$
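As a worked illustration of (2.5) to (2.7), the following sketch computes the coefficients by Cramer’s rule for error-free coordinates. The function name `barycentric` and the example triangle are assumptions made here for illustration only.

```python
def f_det(x, y):
    """Determinant of the 2x2 matrix with columns x and y, as in (2.1)."""
    return x[0] * y[1] - x[1] * y[0]


def sub(a, b):
    """Component-wise difference a - b of two 2-vectors."""
    return (a[0] - b[0], a[1] - b[1])


def barycentric(x, x1, x2, x3):
    """Solve (2.5) by Cramer's rule, returning (lambda1, lambda2, lambda3)."""
    denom = f_det(sub(x1, x3), sub(x2, x3))  # twice the signed area, see (2.6)
    if denom == 0:
        raise ValueError("degenerate triangle: the three vertices are collinear")
    lam1 = f_det(sub(x, x3), sub(x2, x3)) / denom
    lam2 = f_det(sub(x1, x3), sub(x, x3)) / denom
    return lam1, lam2, 1.0 - lam1 - lam2


# Hypothetical example: a right triangle and an interior point.
coefficients = barycentric((1, 1), (0, 0), (4, 0), (0, 4))
```

For this example the coefficients are 0.5, 0.25 and 0.25, and the convex combination 0.5·(0, 0) + 0.25·(4, 0) + 0.25·(0, 4) reproduces the point (1, 1), as Lemma 2.1 requires.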

Corollary 2.1 The coefficients in the expression (2.3) of Lemma 2.1 are unique whenever they

exist.

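For an arbitrary point V and a given triangle $\triangle V_1V_2V_3$, we may define the following discriminant functions:

Definition 2.2 For a given triangle $\triangle V_1V_2V_3$ with vertex coordinate vectors $\mathbf{x}_i$, $i = 1, 2, 3$, we define

$$h_1(\mathbf{x}) \equiv h(\mathbf{x}, \mathbf{x}_2, \mathbf{x}_3) , \tag{2.8}$$

$$h_2(\mathbf{x}) \equiv h(\mathbf{x}_1, \mathbf{x}, \mathbf{x}_3) , \tag{2.9}$$

$$h_3(\mathbf{x}) \equiv h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}) . \tag{2.10}$$

Accordingly, (2.7) can be rewritten as

$$\lambda_1^{*} = \frac{h_1(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)} , \quad \lambda_2^{*} = \frac{h_2(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)} . \tag{2.11}$$

In addition, we have

$$\lambda_3^{*} = 1 - \lambda_1^{*} - \lambda_2^{*} = \frac{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) - h_1(\mathbf{x}) - h_2(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)} = \frac{h_3(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)} .$$

In fact,

$$
\begin{aligned}
h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) - h_1(\mathbf{x}) - h_2(\mathbf{x})
&= |\,\mathbf{x}_1 - \mathbf{x}_2 \;\; \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x} - \mathbf{x}_2 \;\; \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x}_1 - \mathbf{x} \;\; \mathbf{x} - \mathbf{x}_3\,| \\
&= |\,\mathbf{x}_1 - \mathbf{x} \;\; \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x}_1 - \mathbf{x} \;\; \mathbf{x} - \mathbf{x}_3\,|
= |\,\mathbf{x}_1 - \mathbf{x} \;\; \mathbf{x}_2 - \mathbf{x}\,| = h_3(\mathbf{x}) .
\end{aligned}
$$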

Thus, in terms of Lemma 2.1 and Corollary 2.1 we have

Proposition 2.1 Let $R(V_1V_2V_3)$ be the triangle region formed by the vertices $V_i = V_i(\mathbf{x}_i)$, $i = 1, 2, 3$. Then

“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$\lambda_i^{*} \ge 0$, $i = 1, 2, 3$”

⇔ “$h_i(\mathbf{x})$ and $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ have the same sign, $i = 1, 2, 3$”,

where the symbol “⇔” denotes “if and only if”.

In particular, when the vertices of the triangle $\triangle V_1V_2V_3$ are listed in counter-clockwise order, its area (see (2.6)) is $\tfrac{1}{2} h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \ge 0$. In this case, we have

“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$h_i(\mathbf{x}) \ge 0$, $i = 1, 2, 3$”.

Remark 1. It is obvious that $h_i(\mathbf{x})$ is a linear function in the components of $\mathbf{x}$, and $h_i(\mathbf{x}) = 0$ represents a straight line through the points $V_j$ ($j \ne i$). For example, $h_3(\mathbf{x}) = 0$ represents a straight line through the points $V_1$ and $V_2$ since $h_3(\mathbf{x}_1) = h_3(\mathbf{x}_2) = 0$. Thus $h_i(\mathbf{x}) \ge 0$ represents a half-plane. Since each edge corresponds to such a half-plane, the point V is inside the triangle region $R(V_1V_2V_3)$ if and only if it is inside the intersection of the three half-planes $h_i(\mathbf{x}) \ge 0$, $i = 1, 2, 3$. Fig. 2.2 gives such a geometric interpretation.

It should be noted that the sign of $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ is fully decided by the triangle $\triangle V_1V_2V_3$ and is independent of the point V. Once the triangle is given, the sign of $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ is settled. The signs of $h_i(\mathbf{x})$ ($i = 1, 2, 3$) are then sufficient to test whether the point $V(\mathbf{x})$ is inside the triangle $\triangle V_1V_2V_3$.

Remark 2. In this paper, when we say that a set of values has the same sign, it means that all of them are non-negative or all of them are non-positive. In such a sense, we can say that 1, 2, 0.5, and 0 have the same sign.
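Proposition 2.1, together with the sign convention of Remark 2, translates directly into a test for error-free coordinates. The sketch below is illustrative only; `same_sign` and `point_in_triangle` are names introduced here, and boundary points count as inside because zero is compatible with either sign.

```python
def f_det(x, y):
    """Determinant of the 2x2 matrix with columns x and y, as in (2.1)."""
    return x[0] * y[1] - x[1] * y[0]


def h(x, y, z):
    """h(x, y, z) = f_det(x - y, y - z), as in (2.2)."""
    return f_det((x[0] - y[0], x[1] - y[1]), (y[0] - z[0], y[1] - z[1]))


def same_sign(values):
    """Remark 2: all values non-negative, or all values non-positive."""
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def point_in_triangle(x, x1, x2, x3):
    """Proposition 2.1: h_1(x), h_2(x), h_3(x) share the sign of h(x1, x2, x3)."""
    values = [h(x, x2, x3), h(x1, x, x3), h(x1, x2, x), h(x1, x2, x3)]
    return same_sign(values)
```

Because the sign of h(x1, x2, x3) is included in the comparison, the test gives the same answer whether the vertices are listed clockwise or counter-clockwise.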

2.1.2 Quadratic form representations for the discriminant functions

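Now we will give quadratic form representations for the discriminant functions $h_i(\cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$. We first rewrite $f_{\det}(\cdot\,, \cdot)$ as

$$
f_{\det}(\mathbf{x}, \mathbf{y}) \equiv \det(\mathbf{x}, \mathbf{y}) = |\,\mathbf{x} \;\; \mathbf{y}\,|
= \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}
= x_1 y_2 - x_2 y_1
= (x_1 \;\; x_2) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \mathbf{x}^{\mathrm T} \mathbf{H}_0 \mathbf{y} , \tag{2.12}
$$

where the matrix

$$\mathbf{H}_0 \equiv \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \tag{2.13}$$

which has the properties that $\mathbf{H}_0^{\mathrm T} = -\mathbf{H}_0$ and $\mathbf{H}_0^2 = -\mathbf{I}_2$. That is, $\mathbf{H}_0$ is a skew-symmetric and orthogonal matrix, and it has no real eigenvalues and eigenvectors. Its geometric meaning is a 90° rotation transformation, i.e., the vector $\mathbf{x}' \equiv \mathbf{H}_0 \mathbf{x}$ can be obtained by rotating $\mathbf{x}$ by 90° about the origin in clockwise fashion (see Fig. 2.3) since

$$\mathbf{H}_0 \equiv \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} , \quad \theta = -90^{\circ} .$$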
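The identities in (2.12) and (2.13) can be checked with a few lines of code. This is an illustrative sketch with hypothetical helper names (`mat_vec`, `mat_mul`, `bilinear`, `det2`) and arbitrary integer test vectors.

```python
H0 = [[0, 1], [-1, 0]]


def mat_vec(m, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def mat_mul(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def bilinear(x, m, y):
    """The bilinear form x^T m y."""
    my = mat_vec(m, y)
    return sum(x[i] * my[i] for i in range(len(x)))


def det2(x, y):
    """Determinant of the 2x2 matrix with columns x and y."""
    return x[0] * y[1] - x[1] * y[0]


x, y = (3, 1), (2, 5)
agrees_with_determinant = bilinear(x, H0, y) == det2(x, y)  # (2.12)
rotated_east = mat_vec(H0, [1, 0])   # unit vector along +x1
rotated_north = mat_vec(H0, [0, 1])  # unit vector along +x2
H0_squared = mat_mul(H0, H0)         # should equal -I_2
```

The unit vector along the first axis is mapped to the negative second axis, and the unit vector along the second axis is mapped to the first axis, which is a clockwise quarter turn.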

Fig. 2.2 A geometric interpretation of the h-functions (the line $h_3 = 0$ through $V_1$ and $V_2$ separates the half-planes $h_3 > 0$ and $h_3 < 0$)

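We then derive the quadratic form representation for $h(\cdot\,, \cdot\,, \cdot)$ in the joint vector. For any vectors $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^2$, define the pooled vector and unit coordinate vectors in $\mathbb{R}^3$ as follows:

$$
\mathbf{x}_{(3)} \equiv \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \\ \mathbf{z} \end{pmatrix}_{6 \times 1} , \quad
\mathbf{e}_1 \equiv \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \quad
\mathbf{e}_2 \equiv \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} , \quad
\mathbf{e}_3 \equiv \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} .
$$

Then we have

$$\mathbf{x} = (\mathbf{e}_1^{\mathrm T} \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)} , \quad \mathbf{y} = (\mathbf{e}_2^{\mathrm T} \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)} , \quad \mathbf{z} = (\mathbf{e}_3^{\mathrm T} \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)} ,$$

where the symbol “$\otimes$” denotes the Kronecker product of matrices (see Turkington, 2002), and $\mathbf{I}_2$ is the $2 \times 2$ identity matrix. It follows that

$$
\begin{aligned}
h(\mathbf{x}, \mathbf{y}, \mathbf{z}) \equiv f_{\det}(\mathbf{x} - \mathbf{y}, \mathbf{y} - \mathbf{z})
&= (\mathbf{x} - \mathbf{y})^{\mathrm T} \mathbf{H}_0 (\mathbf{y} - \mathbf{z}) \\
&= \mathbf{x}_{(3)}^{\mathrm T} [(\mathbf{e}_1 - \mathbf{e}_2) \otimes \mathbf{I}_2]\, \mathbf{H}_0\, [(\mathbf{e}_2 - \mathbf{e}_3)^{\mathrm T} \otimes \mathbf{I}_2]\, \mathbf{x}_{(3)} \\
&= \mathbf{x}_{(3)}^{\mathrm T} \{ [(\mathbf{e}_1 - \mathbf{e}_2)(\mathbf{e}_2 - \mathbf{e}_3)^{\mathrm T}] \otimes \mathbf{H}_0 \}\, \mathbf{x}_{(3)} \\
&= \mathbf{x}_{(3)}^{\mathrm T} [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)} ,
\end{aligned}
$$

where $\tilde{\boldsymbol{\Delta}}_0 \equiv (\mathbf{e}_1 - \mathbf{e}_2)(\mathbf{e}_2 - \mathbf{e}_3)^{\mathrm T}$. It should be noted that

$$
\boldsymbol{\Delta}_0 \equiv \tilde{\boldsymbol{\Delta}}_0 - \tilde{\boldsymbol{\Delta}}_0^{\mathrm T}
= (\mathbf{e}_1 \mathbf{e}_2^{\mathrm T} - \mathbf{e}_2 \mathbf{e}_1^{\mathrm T}) + (\mathbf{e}_2 \mathbf{e}_3^{\mathrm T} - \mathbf{e}_3 \mathbf{e}_2^{\mathrm T}) + (\mathbf{e}_3 \mathbf{e}_1^{\mathrm T} - \mathbf{e}_1 \mathbf{e}_3^{\mathrm T})
= \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} . \tag{2.14}
$$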

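Therefore,

$$
\begin{aligned}
h(\mathbf{x}, \mathbf{y}, \mathbf{z})
&= \tfrac{1}{2}\, \mathbf{x}_{(3)}^{\mathrm T} \{ [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0] + [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0]^{\mathrm T} \}\, \mathbf{x}_{(3)}
= \tfrac{1}{2}\, \mathbf{x}_{(3)}^{\mathrm T} [(\tilde{\boldsymbol{\Delta}}_0 - \tilde{\boldsymbol{\Delta}}_0^{\mathrm T}) \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)} \\
&= \tfrac{1}{2}\, \mathbf{x}_{(3)}^{\mathrm T} [\boldsymbol{\Delta}_0 \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)}
= \tfrac{1}{2}\, \mathbf{x}_{(3)}^{\mathrm T} \mathcal{H}_0\, \mathbf{x}_{(3)} ,
\end{aligned}
\tag{2.15}
$$

where the $6 \times 6$ matrix $\mathcal{H}_0 \equiv \boldsymbol{\Delta}_0 \otimes \mathbf{H}_0$ is symmetric since $\mathcal{H}_0^{\mathrm T} = \boldsymbol{\Delta}_0^{\mathrm T} \otimes \mathbf{H}_0^{\mathrm T} = \boldsymbol{\Delta}_0 \otimes \mathbf{H}_0 = \mathcal{H}_0$ ($\boldsymbol{\Delta}_0^{\mathrm T} = -\boldsymbol{\Delta}_0$, see (2.14)).

The expression (2.15) indicates that $h(\mathbf{x}, \mathbf{y}, \mathbf{z})$ is a quadratic form in the joint vector $\mathbf{x}_{(3)}$.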
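The quadratic form in (2.15) can be verified numerically. The sketch below builds the 6 × 6 matrix as the Kronecker product of the 3 × 3 matrix in (2.14) and the 2 × 2 matrix in (2.13), using a small hand-written `kron` helper so that no external library is assumed. The names `H0_POOLED`, `kron` and `quad_form` are introduced here for illustration.

```python
H0 = [[0, 1], [-1, 0]]                           # 2x2 matrix of (2.13)
DELTA0 = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]    # 3x3 matrix of (2.14)


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def quad_form(v, m):
    """The quadratic form v^T m v."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def h(x, y, z):
    """h(x, y, z) = det(x - y, y - z), as in (2.2)."""
    a = (x[0] - y[0], x[1] - y[1])
    b = (y[0] - z[0], y[1] - z[1])
    return a[0] * b[1] - a[1] * b[0]


H0_POOLED = kron(DELTA0, H0)  # the 6x6 matrix appearing in (2.15)

x, y, z = (1, 2), (4, -1), (0, 5)
pooled = list(x) + list(y) + list(z)  # the pooled vector x_(3)
is_symmetric = all(H0_POOLED[i][j] == H0_POOLED[j][i]
                   for i in range(6) for j in range(6))
# (2.15) multiplied through by 2, so that integer arithmetic stays exact
identity_holds = quad_form(pooled, H0_POOLED) == 2 * h(x, y, z)
```

Working with twice the value of h avoids the factor 1/2 and keeps the comparison exact for integer inputs.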

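Furthermore, the functions $h_i(\cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$ can be expressed as quadratic forms in $\mathbf{x}_i$ ($i = 1, 2, 3$) and $\mathbf{x}$. Let the $8 \times 1$ pooled vector $\mathbf{x}_{(4)}$ be defined by

$$\mathbf{x}_{(4)}^{\mathrm T} \equiv (\mathbf{x}_1^{\mathrm T} \;\; \mathbf{x}_2^{\mathrm T} \;\; \mathbf{x}_3^{\mathrm T} \;\; \mathbf{x}^{\mathrm T}) . \tag{2.16}$$

Fig. 2.3 A geometric interpretation of $\mathbf{H}_0$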


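Then

$$
\mathbf{x}_{1(3)} \equiv \begin{pmatrix} \mathbf{x} \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix} = \mathbf{C}_1 \mathbf{x}_{(4)} , \quad \text{where} \quad
\mathbf{C}_1 \equiv \begin{pmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_2 \\ \mathbf{0} & \mathbf{I}_2 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}_2 & \mathbf{0} \end{pmatrix} .
$$

In terms of (2.15), we have $h_1(\mathbf{x}) = h(\mathbf{x}, \mathbf{x}_2, \mathbf{x}_3) = \tfrac{1}{2}\, \mathbf{x}_{1(3)}^{\mathrm T} \mathcal{H}_0\, \mathbf{x}_{1(3)} = \tfrac{1}{2}\, \mathbf{x}_{(4)}^{\mathrm T} \mathbf{C}_1^{\mathrm T} \mathcal{H}_0 \mathbf{C}_1 \mathbf{x}_{(4)}$. It can be derived from simple computation that $\mathbf{C}_1^{\mathrm T} \mathcal{H}_0 \mathbf{C}_1 = \mathcal{H}_1$, where

$$
\mathcal{H}_1 \equiv \boldsymbol{\Delta}_1 \otimes \mathbf{H}_0 , \quad
\boldsymbol{\Delta}_1 \equiv \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix} . \tag{2.17}
$$

Thus we get

$$h_1(\mathbf{x}) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^{\mathrm T} \mathcal{H}_1 \mathbf{x}_{(4)} . \tag{2.18}$$

In general, the following expressions can similarly be derived:

$$h_i(\mathbf{x}) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^{\mathrm T} \mathcal{H}_i\, \mathbf{x}_{(4)} , \quad i = 1, 2, 3 , \tag{2.19}$$

$$h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^{\mathrm T} \mathcal{H}_4\, \mathbf{x}_{(4)} , \tag{2.20}$$

where

$$\mathcal{H}_i \equiv \boldsymbol{\Delta}_i \otimes \mathbf{H}_0 , \quad i = 1, 2, 3, 4 , \tag{2.21}$$

$$
\boldsymbol{\Delta}_2 \equiv \begin{pmatrix} 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \end{pmatrix} , \quad
\boldsymbol{\Delta}_3 \equiv \begin{pmatrix} 0 & 1 & 0 & -1 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \end{pmatrix} , \quad
\boldsymbol{\Delta}_4 \equiv \begin{pmatrix} 0 & 1 & -1 & 0 \\ -1 & 0 & 1 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} . \tag{2.22}
$$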

At the same time, we may check that $\mathcal{H}_i$ ($i = 1, 2, 3, 4$) are symmetric (see Remark 3). So all $h_i(\cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$ can be expressed as quadratic forms in the joint vector $\mathbf{x}_{(4)}$ and have a unified expression (see (2.19) and (2.20)). From this point of view, Proposition 2.1 can be rewritten in a more concise way as follows:

Proposition 2.2 Let $z_i \equiv \mathbf{x}_{(4)}^{\mathrm T} \mathcal{H}_i\, \mathbf{x}_{(4)}$, where $\mathbf{x}_{(4)}$ and $\mathcal{H}_i$ are defined by (2.16) and (2.21) respectively. Then

“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$z_i$, $i = 1, 2, 3, 4$, have the same sign”.
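Proposition 2.2 can be exercised directly by building the four 8 × 8 matrices of (2.21) from the 4 × 4 sign patterns in (2.17) and (2.22). The sketch below is an illustration only, with hypothetical function names, a hand-written Kronecker product, and an arbitrary integer triangle. Each quadratic form should equal twice the corresponding h-function of (2.19) and (2.20).

```python
H0 = [[0, 1], [-1, 0]]
DELTAS = [
    [[0, 0, 0, 0], [0, 0, 1, -1], [0, -1, 0, 1], [0, 1, -1, 0]],   # Delta_1
    [[0, 0, -1, 1], [0, 0, 0, 0], [1, 0, 0, -1], [-1, 0, 1, 0]],   # Delta_2
    [[0, 1, 0, -1], [-1, 0, 0, 1], [0, 0, 0, 0], [1, -1, 0, 0]],   # Delta_3
    [[0, 1, -1, 0], [-1, 0, 1, 0], [1, -1, 0, 0], [0, 0, 0, 0]],   # Delta_4
]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


H_MATRICES = [kron(delta, H0) for delta in DELTAS]  # the four 8x8 matrices


def z_values(x1, x2, x3, x):
    """z_i = x_(4)^T H_i x_(4) with pooled vector (x1, x2, x3, x), see (2.16)."""
    v = list(x1) + list(x2) + list(x3) + list(x)
    return [sum(v[i] * m[i][j] * v[j] for i in range(8) for j in range(8))
            for m in H_MATRICES]


def point_in_triangle_quadratic(x, x1, x2, x3):
    """Proposition 2.2: inside if and only if z_1, ..., z_4 have the same sign."""
    zs = z_values(x1, x2, x3, x)
    return all(z >= 0 for z in zs) or all(z <= 0 for z in zs)


# Hypothetical example triangle with an interior and an exterior point.
tri = ((1, 2), (5, 1), (2, 6))
z_inside = z_values(*tri, (3, 3))
z_outside = z_values(*tri, (0, 0))
```

For the interior point (3, 3) all four values are positive, whereas for the exterior point (0, 0) the values have mixed signs.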

Remark 3. The matrices $\mathcal{H}_i$ ($i = 1, 2, 3, 4$) in (2.21) are symmetric. Each of them has rank 4, i.e., $\mathrm{rank}(\mathcal{H}_i) = 4$, and the eight eigenvalues of each matrix are $\sqrt{3}$, $\sqrt{3}$, $-\sqrt{3}$, $-\sqrt{3}$, 0, 0, 0, 0. (The proof is given in Appendix 2.)
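The spectrum stated in Remark 3 can be cross-checked in exact integer arithmetic without an eigenvalue solver. The sketch below is an independent check added here, not the proof in Appendix 2, and its helper names are hypothetical. For a symmetric matrix H, the identity H³ = 3H forces every eigenvalue to be 0, √3 or −√3. A zero trace then forces equally many positive and negative eigenvalues, and a trace of H² equal to 12 forces exactly four non-zero eigenvalues, hence rank 4.

```python
H0 = [[0, 1], [-1, 0]]
DELTAS = [
    [[0, 0, 0, 0], [0, 0, 1, -1], [0, -1, 0, 1], [0, 1, -1, 0]],   # Delta_1
    [[0, 0, -1, 1], [0, 0, 0, 0], [1, 0, 0, -1], [-1, 0, 1, 0]],   # Delta_2
    [[0, 1, 0, -1], [-1, 0, 0, 1], [0, 0, 0, 0], [1, -1, 0, 0]],   # Delta_3
    [[0, 1, -1, 0], [-1, 0, 1, 0], [1, -1, 0, 0], [0, 0, 0, 0]],   # Delta_4
]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def mat_mul(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def spectrum_checks(m):
    """Return (is_symmetric, cube_equals_3m, trace_m, trace_m_squared)."""
    n = len(m)
    m2 = mat_mul(m, m)
    m3 = mat_mul(m2, m)
    is_symmetric = all(m[i][j] == m[j][i] for i in range(n) for j in range(n))
    cube_ok = all(m3[i][j] == 3 * m[i][j] for i in range(n) for j in range(n))
    return is_symmetric, cube_ok, trace(m), trace(m2)


results = [spectrum_checks(kron(delta, H0)) for delta in DELTAS]
```

Each entry of `results` is expected to be `(True, True, 0, 12)`, which is consistent with eigenvalues √3 and −√3 (each twice) and 0 (four times).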

2.1.3 Probability model for point-in-triangle analysis under measurement error

Assume that the three true vertex positions of the underlying triangle are $V_i^0(\boldsymbol{\mu}_i)$, where $\boldsymbol{\mu}_i \equiv (\mu_{i1}, \mu_{i2})^{\mathrm T}$ are the corresponding coordinate vectors, $i = 1, 2, 3$, and the true position of a given point is $V^0(\boldsymbol{\mu}_v)$, where $\boldsymbol{\mu}_v \equiv (\mu_{v1}, \mu_{v2})^{\mathrm T}$. Due to random errors, we cannot observe the true values of these coordinates; instead, what we can observe are the coordinates of the random points $V_i(\mathbf{X}_i)$ and $V(\mathbf{X}_v)$ with random errors:

$$\mathbf{X}_i = \boldsymbol{\mu}_i + \boldsymbol{\varepsilon}_i , \quad i = 1, 2, 3 , \quad \text{and} \quad \mathbf{X}_v = \boldsymbol{\mu}_v + \boldsymbol{\varepsilon}_v , \tag{2.23}$$

where $\mathbf{X}_i \equiv (X_{i1}, X_{i2})^{\mathrm T}$, $i = 1, 2, 3$, and $\mathbf{X}_v \equiv (X_{v1}, X_{v2})^{\mathrm T}$; $\boldsymbol{\varepsilon}_i \equiv (\varepsilon_{i1}, \varepsilon_{i2})^{\mathrm T}$ and $\boldsymbol{\varepsilon}_v \equiv (\varepsilon_{v1}, \varepsilon_{v2})^{\mathrm T}$ are respectively the random error vectors associated with the corresponding random vertices and the point V, with zero mean and variance-covariance matrices (for simplicity, called covariance matrices henceforth) $\boldsymbol{\Sigma}_i$ and $\boldsymbol{\Sigma}_v$, denoted as $\boldsymbol{\varepsilon}_i \sim (\mathbf{0}, \boldsymbol{\Sigma}_i)$, $i = 1, 2, 3$, and $\boldsymbol{\varepsilon}_v \sim (\mathbf{0}, \boldsymbol{\Sigma}_v)$. Since the model (2.23) can be viewed as the direct measurement error (ME) model in Leung et al. (2003a), the random errors $\boldsymbol{\varepsilon}_i$ and $\boldsymbol{\varepsilon}_v$ can be called the MEs.

It is obvious that the triangle with vertices $V_i(\mathbf{X}_i)$ ($i = 1, 2, 3$) and the point $V(\mathbf{X}_v)$ are random because of the randomness of ME. Therefore, whether V is inside the triangle region $R(V_1V_2V_3)$ will be a random event. According to Proposition 2.2, the probability of such an event can be computed as:

$$P[V \in R(V_1V_2V_3)] = P[Z_i,\ i = 1, 2, 3, 4,\ \text{having the same sign}] , \tag{2.24}$$

where the quadratic forms $Z_i$, $i = 1, 2, 3, 4$, are random variables and the joint random vector $\mathbf{X}_{(4)}$ consists of the coordinate vectors of $V_i$ ($i = 1, 2, 3$) and V; they are respectively defined by

$$Z_i \equiv \mathbf{X}_{(4)}^{\mathrm T} \mathcal{H}_i\, \mathbf{X}_{(4)} , \quad i = 1, 2, 3, 4 , \quad \mathbf{X}_{(4)}^{\mathrm T} \equiv (\mathbf{X}_1^{\mathrm T} \;\; \mathbf{X}_2^{\mathrm T} \;\; \mathbf{X}_3^{\mathrm T} \;\; \mathbf{X}_v^{\mathrm T}) . \tag{2.25}$$

It should be pointed out that as the distributions of ME are usually continuous, the probability that the determinant in (2.6) is zero will be zero, i.e., $P[h(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3) \neq 0] = 1$. We therefore consider only the case in which the triangle can be formed.

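In addition, if we define the new random variables

$$Z_{\min} \equiv \min_{1 \le i \le 4} \{Z_i\}, \qquad Z_{\max} \equiv \max_{1 \le i \le 4} \{Z_i\}, \qquad Z_{\Delta} \equiv Z_{\min} Z_{\max}^{-1}, \tag{2.26}$$

we then have

“$Z_i$, $i = 1, 2, 3, 4$, having the same sign” ⇔ “$Z_{\min}$ and $Z_{\max}$ having the same sign” ⇔ “$Z_{\Delta} \equiv Z_{\min} Z_{\max}^{-1} \ge 0$”.

Thus (2.24) becomes

$$P[V \in R(V_1V_2V_3)] = P[Z_{\Delta} \ge 0]. \tag{2.27}$$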

Remark 3. It can be checked that the random variable $Z_{\Delta}$ is invariant with respect to the scale of the coordinates, i.e., if the joint vector $\mathbf{X}_{(4)}$ in (2.25) is transformed into $c\mathbf{X}_{(4)}$, where $c > 0$ is the scale parameter, the value of $Z_{\Delta}$ is unchanged. In other words, $Z_{\Delta}$ is independent of the measurement scale. Furthermore, when the point $V$ is fixed and the triangle is random, (2.27) still holds.

Therefore, whether we use one random variable $Z_{\Delta}$ or four random variables $Z_i$ ($i = 1, 2, 3, 4$), the probability $P[V \in R(V_1V_2V_3)]$ can always be expressed in a concise way (see (2.24) and (2.27)). This is our probability model for the point-in-triangle problem under ME.
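The sign test behind (2.24) to (2.27) can be illustrated with a minimal Python sketch. The matrices $\mathbf{H}_i$ are defined in an earlier subsection that is not reproduced here, so the sketch assumes that the four quantities are the usual orientation determinants (the determinant of the triangle and the three determinants obtained by replacing one vertex with the point); the function names are hypothetical. Boundary cases, where some determinant is exactly zero, have probability zero under continuous errors and are handled only by a guard.

```python
def signed_area2(p, q, r):
    """Twice the signed area of triangle p, q, r (positive if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def sign_statistics(v1, v2, v3, v):
    """Assumed stand-ins for Z_1..Z_4: the triangle determinant and the three
    determinants obtained by replacing one vertex with the point v."""
    return (
        signed_area2(v1, v2, v3),
        signed_area2(v, v2, v3),
        signed_area2(v1, v, v3),
        signed_area2(v1, v2, v),
    )


def z_delta(zs):
    """Z_min / Z_max, which is non-negative when all entries share a sign."""
    return min(zs) / max(zs)


def point_in_triangle(v1, v2, v3, v):
    """Decide membership through the sign of Z_delta, as in (2.27)."""
    zs = sign_statistics(v1, v2, v3, v)
    # Guard against division by zero; this case has probability zero under ME.
    return max(zs) != 0 and z_delta(zs) >= 0


if __name__ == "__main__":
    triangle = ((0.0, 0.0), (1.0, 2.0), (3.0, -1.0))
    print(point_in_triangle(*triangle, (1.8, 0.6)))  # point of case (ii) in Example 3.1
    print(point_in_triangle(*triangle, (2.0, 1.0)))  # point of case (i) in Example 3.1
```

Because every determinant is multiplied by $c^2$ when all coordinates are scaled by $c > 0$, the ratio returned by `z_delta` does not change, which is the scale invariance stated in Remark 3.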

2.2 The algebra-based probability model for point-in-polygon analysis

Without loss of generality, we now deduce, on the basis of the triangle model discussed above, a probability model for general simple polygons (convex or concave). Since triangles are the simplest and most essential geometric objects that we can use to form any polygon, we transform the polygon problem into a multiple-triangle problem. The proposed model circumvents the discussion of the convexity of a polygon that is needed in geometric procedures.

Let the true vertices of the underlying simple $n$-sided polygon be $V_i^0(\boldsymbol{\mu}_i)$, and the true position of a given point be $V^0(\boldsymbol{\mu}_v)$, where $\boldsymbol{\mu}_i \equiv (\mu_{i1}, \mu_{i2})^{\mathrm{T}}$ and $\boldsymbol{\mu}_v \equiv (\mu_{v1}, \mu_{v2})^{\mathrm{T}}$ are the true coordinate vectors of the corresponding points, $i = 1, \ldots, n$. Under the effect of random errors, the polygon $R(V_1V_2 \cdots V_n)$ and the point $V$ are random. From the theoretical and practical points of view, the approaches to solving the ill-defined point-in-polygon ($n > 3$) problem may be as follows:

Approach (1): We discuss the point-in-polygon problem only when the random polygons can form

their regions. For random polygons without regions, there is no logical basis to discuss the problem.

Under this required condition, we can perform the point-in-polygon analysis when points and

polygons are random and compute the corresponding probability (strictly speaking, it is a conditional

probability).

Approach (2): We can discuss the point-in-polygon problem only when we can guarantee that, at a high enough confidence level, the generated random polygons are simple or the relative topological relationships among the vertices of the original polygon are preserved. In this situation, it is natural to invoke the concept of the covariance-based error bands for the boundaries of the true polygon. Using the restriction imposed by these bands, we can state that randomly generated polygons will form regions at a certain confidence level.

Although the probability obtained by approach (2) is still conditional, its advantage over approach (1) is that it gives a stricter and finer analysis. So, we employ the second approach to solve the point-in-polygon problem.

First, we need to compute the confidence level of the covariance-based error band of a line

segment. As discussed in Part 1, it is generally difficult if not impossible to determine this probability.

With reasonable adjustment in the delimitation process, we can however consider an approximate

covariance-based error band so that its confidence level can be described. As depicted in Fig. 2.4, we

replace the region formed by the varying covariance ellipses along a line segment with the region

formed by two new line segments $A_1B_1$ and $A_2B_2$ that are tangent respectively to the two sides of the confidence ellipses of the endpoints (see Fig. 2.4). The region, denoted by $\tilde{R}^{(\alpha)}_{\{1,2\}}$, surrounded by the two tangent line segments and the two end-point confidence ellipses is thus an approximate covariance-based error band (denoted as Cov-error band in short). Adopt the notations in Leung et al. (2003a) and let $R_i^{(\alpha)}$, $i = 1, 2$, denote the confidence-ellipse regions of the endpoints with the confidence level $(1 - \alpha)$. Assume the error vectors $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ corresponding to the coordinate vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ of the endpoints are normal and independent. Then $P[V_1(\mathbf{X}_1) \in R_1^{(\alpha)}] = 1 - \alpha$ and $P[V_2(\mathbf{X}_2) \in R_2^{(\alpha)}] = 1 - \alpha$. The random line segment $L$ generated from $\mathbf{X}_1$ and $\mathbf{X}_2$ is the random set

$$L(\mathbf{X}_1, \mathbf{X}_2) \equiv \{V(\mathbf{X}') :\ \mathbf{X}' = (1 - t)\mathbf{X}_1 + t\mathbf{X}_2,\ 0 \le t \le 1\}.$$

Define the random events $D_1 \equiv \{\mathbf{X}_1 \in R_1^{(\alpha)}\}$, $D_2 \equiv \{\mathbf{X}_2 \in R_2^{(\alpha)}\}$, and $D_{\{1,2\}} \equiv \{L(\mathbf{X}_1, \mathbf{X}_2) \subseteq \tilde{R}^{(\alpha)}_{\{1,2\}}\}$. Note that $D_1D_2 \subseteq D_{\{1,2\}}$. The probability that the random line segment $L$ is inside the region $\tilde{R}^{(\alpha)}_{\{1,2\}}$ can thus be obtained as:

$$\begin{aligned}
P[L(\mathbf{X}_1, \mathbf{X}_2) \subseteq \tilde{R}^{(\alpha)}_{\{1,2\}}] = P[D_{\{1,2\}}]
&= P[D_{\{1,2\}} \mid D_1D_2] \cdot P[D_1D_2] + P[D_{\{1,2\}} \mid \overline{D_1D_2}] \cdot P[\overline{D_1D_2}] \\
&> P[D_{\{1,2\}} \mid D_1D_2] \cdot P[D_1D_2] = P[D_1D_2] = P[D_1] \cdot P[D_2] \\
&= (1 - \alpha)^2.
\end{aligned}$$

Therefore, when the confidence levels of the endpoints of a line segment are $(1 - \alpha)$, the confidence level of the resulting approximate Cov-error band is at least $(1 - \alpha)^2$. The band $\tilde{R}^{(\alpha)}_{\{1,2\}}$ thus trades a small loss in the precision of the actual covariance-based error band for the ability to assign the region a probability, namely the confidence level. It is not as accurate as the actual covariance-based error band since its area is larger. Nevertheless, it is simpler and it possesses the following advantages:

(1) Unlike the covariance-based error band discussed in Part 1, it has a lower-bound description of

the confidence level which can be determined in advance.

(2) It is effective since it is an approximation to the covariance-based error band. In general, it is

conceptually and practically more appropriate than the conventional epsilon error band which

possesses no probability arguments.

(3) Its geometrical shape is simpler than the covariance-based error band and it can easily be

formed.

Fig. 2.4. An approximation to the covariance-based error band

(a) a covariance-based error band

(b) the approximate Cov-error band

(c) difference of bands in (a) and (b)


(4) More importantly, it gives us a unified probability description of GIS operations when

covariance-based error bands are involved. Particularly, it provides a sound basis for the point-

in-polygon analysis under ME.

However, we must stress that the above advantages can be realized only when the error vectors of

the endpoints are independent and normally distributed. The independence condition guarantees the

derivation of the lower bound of the confidence level. The normality condition determines the level

and the elliptical shapes of the confidence regions of the endpoints. Therefore, when the error vectors

of the endpoints are not independent or not normal, the approximate Cov-error band defined above

will not possess the above said properties.
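The role of the independence and normality conditions in the lower bound $(1 - \alpha)^2$ can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes circular normal errors with covariance $\sigma^2 \mathbf{I}_2$, uses the fact that the squared Mahalanobis distance of a bivariate normal vector is chi-square with 2 degrees of freedom (so its $(1 - \alpha)$ quantile is $-2\ln\alpha$), and estimates $P[D_1D_2]$ by simulation. All function names are hypothetical.

```python
import math
import random


def inside_confidence_circle(err, sigma2, alpha):
    """True if a circular-normal error vector lies in its (1 - alpha) confidence region.

    The squared Mahalanobis distance is chi-square with 2 degrees of freedom,
    whose (1 - alpha) quantile equals -2 * ln(alpha).
    """
    return (err[0] ** 2 + err[1] ** 2) / sigma2 <= -2.0 * math.log(alpha)


def estimate_both_endpoints_inside(sigma2, alpha, n_samples, seed=0):
    """Relative frequency of the event D1 D2 for two independent endpoints."""
    rng = random.Random(seed)
    s = math.sqrt(sigma2)
    hits = 0
    for _ in range(n_samples):
        e1 = (rng.gauss(0.0, s), rng.gauss(0.0, s))
        e2 = (rng.gauss(0.0, s), rng.gauss(0.0, s))
        if (inside_confidence_circle(e1, sigma2, alpha)
                and inside_confidence_circle(e2, sigma2, alpha)):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    alpha = 0.10
    estimate = estimate_both_endpoints_inside(0.05, alpha, 20000)
    print(estimate, (1.0 - alpha) ** 2)  # the two values should be close
```

If the two error vectors were correlated, $P[D_1D_2]$ would no longer factor into $P[D_1] \cdot P[D_2]$, which is why the bound depends on independence.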

Extending this concept, we can define the uncertainty of the boundary of a polygon as a collective entity (called the approximate Cov-error region) consisting of the approximate Cov-error bands of all relevant edges, under the condition that the confidence ellipses of all vertices have the same confidence level $(1 - \alpha)$. Obviously, for an $n$-sided polygon, the confidence level of such a confidence region is at least $(1 - \alpha)^n$. Although this bound may appear rather conservative, it enables us to give a probability description of a random polygon. Thus, when the locational error vectors corresponding to the vertices of a polygon are normal and independent, the approximate Cov-error region of its boundary has a lower-bound specification of the confidence level. If the error variance is not larger than the maximal allowable limit, MAL (see Leung et al. (2003a)), the approximate Cov-error region preserves the topological structure of the original polygon, and the point-in-polygon analysis under ME makes sense. Thus the probability that a randomly generated polygon is simple (i.e., it is inside the approximate Cov-error region) is at least $(1 - \alpha)^n$.
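As a small numerical illustration of the bound $(1 - \alpha)^n$, the sketch below computes the lower bound for a given per-vertex level and, conversely, the per-vertex level that would be needed to reach a target bound. The function names are hypothetical and the inversion is simple algebra on the bound, not a procedure taken from the paper.

```python
def polygon_confidence_lower_bound(alpha, n):
    """Lower bound (1 - alpha)**n for an n-sided polygon with per-vertex level 1 - alpha."""
    return (1.0 - alpha) ** n


def vertex_level_for_target(target, n):
    """Per-vertex level 1 - alpha such that (1 - alpha)**n equals the target bound."""
    return target ** (1.0 / n)


if __name__ == "__main__":
    print(polygon_confidence_lower_bound(0.10, 5))  # bound for a pentagon at level 0.90
    print(vertex_level_for_target(0.90, 5))         # level needed for a bound of 0.90
```

Raising the per-vertex level widens the confidence ellipses and bands, so the gain in the bound has to be weighed against the MAL restriction discussed in the text.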

Remark 4. Our answers to questions (b) “Can probability be assigned to an error band?” and (c) “What should the error band for a line segment be?” raised in Subsection 2.3 of Part 1 are therefore in the affirmative: yes, we can compute probability for the error band under the concept of an approximate Cov-error band; and the approximate Cov-error band is probably an appropriate construct that enables us to compute probability for the uncertainty about a line segment, about polygons thus constructed, and about basic GIS operations such as point-in-polygon analysis.


Finally, under the condition that random polygons are simple, to compute the probability that a

random point V is inside a random polygon )...( 21 nVVVR , we can perform the triangulation of simple

polygons. The basic idea is to first decompose the polygon into triangles, and then apply the triangle

probability model formulated in subsection 2.1.3 to each triangle. By summing these probabilities, we

can obtain the probability for the point-in-polygon problem under ME. Triangulation of simple

polygons is a basic topic in computational geometry (Berg et al., 2000) and spatial database operations

(Rigaux, 2002). It can simply be done by drawing diagonals between pairs of vertices (see Fig. 2.5). A

diagonal is an open line segment that connects two vertices of the polygon and lies in the interior of

the polygon. Strictly speaking, a decomposition of a polygon into triangles by a maximal set of non-intersecting diagonals is called a triangulation of the polygon (Berg et al., 2000). Triangulations are

usually not unique and can be suitably selected on the basis of need. The following is a basic

conclusion:

Fig. 2.5. A possible triangulation of a polygon: (a) for a concave polygon; (b) for a polygon with a hole.

Lemma 2.2 (Berg et al., 2000) Every simple polygon admits a triangulation, and any triangulation

of a simple polygon with n vertices consists of exactly n −2 triangles.
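Lemma 2.2 can be made concrete with a short ear-clipping sketch in Python. Ear clipping is one standard way to triangulate a simple polygon; it is used here only for illustration and is not claimed to be the method used in the paper. The sketch assumes a simple polygon with no three collinear vertices, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def polygon_area2(poly):
    """Twice the signed area of the polygon (positive for counter-clockwise order)."""
    n = len(poly)
    return sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
               for i in range(n))


def strictly_inside(p, a, b, c):
    """True if p lies strictly inside the counter-clockwise triangle a, b, c."""
    return cross(a, b, p) > 0 and cross(b, c, p) > 0 and cross(c, a, p) > 0


def triangulate(poly):
    """Ear-clipping triangulation; returns n - 2 triples of vertex indices."""
    idx = list(range(len(poly)))
    if polygon_area2(poly) < 0:  # work in counter-clockwise order
        idx.reverse()
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = poly[i], poly[j], poly[l]
            if cross(a, b, c) <= 0:  # reflex or degenerate corner: not an ear
                continue
            if any(strictly_inside(poly[m], a, b, c) for m in idx if m not in (i, j, l)):
                continue  # another vertex blocks the diagonal
            triangles.append((i, j, l))
            del idx[k]  # clip the ear and rescan
            break
        else:
            raise ValueError("no ear found; the polygon may not be simple")
    triangles.append(tuple(idx))
    return triangles


if __name__ == "__main__":
    # Concave pentagon of Example 3.3, indices 0..4 standing for V1..V5.
    pentagon = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0), (2.0, -2.0), (1.6, -0.5)]
    print(triangulate(pentagon))
```

For the concave pentagon of Example 3.3, every triangle returned contains the reflex vertex $V_5$, so the result is a fan around $V_5$ with $n - 2 = 3$ triangles.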

According to this lemma, the region $R(V_1^0V_2^0 \cdots V_n^0)$ of a simple polygon $\mathrm{Poly}(V_1^0V_2^0 \cdots V_n^0)$ can be triangulated into the following general form:

$$R(V_1^0V_2^0 \cdots V_n^0) = R(V_{11}^0V_{12}^0V_{13}^0) + R(V_{21}^0V_{22}^0V_{23}^0) + \cdots + R(V_{m1}^0V_{m2}^0V_{m3}^0), \qquad m = n - 2,$$

where $R(V_{i1}V_{i2}V_{i3})$ are sub-regions of triangles $\Delta V_{i1}V_{i2}V_{i3}$, in which $V_{i1}, V_{i2}, V_{i3}$ are the corresponding vertices, and the $R(V_{i1}V_{i2}V_{i3})$ are mutually exclusive. For a given approximate Cov-error region of a polygon, a chosen triangulation of the polygon should guarantee that the confidence ellipse region of each vertex in each partitioned triangle does not intersect the approximate Cov-error band of the edge determined by the other two points, so that the topological structure of each triangle is preserved. Once such a partition is determined, any random polygon $\mathrm{Poly}(V_1V_2 \cdots V_n)$ falling into the approximate Cov-error region of $\mathrm{Poly}(V_1^0V_2^0 \cdots V_n^0)$ satisfies

$$R(V_1V_2 \cdots V_n) = R(V_{11}V_{12}V_{13}) + R(V_{21}V_{22}V_{23}) + \cdots + R(V_{m1}V_{m2}V_{m3}). \tag{2.28}$$

It is apparent from the previous derivation that the probability that (2.28) holds is still at least $(1 - \alpha)^n$. If such a partition cannot be found, the widths of the approximate Cov-error bands should be decreased, that is, the confidence level $(1 - \alpha)$ should be decreased. By adjusting $(1 - \alpha)$, the required partition can be found.

In general, let

$$A \equiv \{\text{partition (2.28) holds}\} \tag{2.29}$$

be a random event. By (2.27) and (2.28), the following (conditional) probability formula for the point-in-polygon problem holds:

$$\begin{aligned}
P[V \in R(V_1V_2 \cdots V_n) \mid A] &= P[V \in R(V_{11}V_{12}V_{13}) \mid A] + P[V \in R(V_{21}V_{22}V_{23}) \mid A] + \cdots + P[V \in R(V_{m1}V_{m2}V_{m3}) \mid A] \\
&= P[Z_{\Delta 1} \ge 0 \mid A] + P[Z_{\Delta 2} \ge 0 \mid A] + \cdots + P[Z_{\Delta m} \ge 0 \mid A],
\end{aligned} \tag{2.30}$$

where $Z_{\Delta i} \equiv Z_{i,\min} Z_{i,\max}^{-1}$ is the random variable corresponding to $\Delta V_{i1}V_{i2}V_{i3}$. This is the algebra-based probability model for point-in-polygon analysis when points and polygons are random. A remarkable feature of the model is that it is a conditional probability model. This is dictated by the characteristics of the random polygons: if we do not use the conditional probability to restrict the possibility of singular behavior of random polygons, the point-in-polygon problem when points and polygons are random will be ill-defined.
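The additivity used in (2.30) can be spelled out in one more step. The following is a sketch under the assumption, already made in the text, that the error distributions are continuous, so that the shared edges of the triangles carry zero probability and the events $\{V \in R(V_{i1}V_{i2}V_{i3})\}$ are mutually exclusive up to a null set when $A$ occurs:

```latex
\begin{align*}
P[V \in R(V_1 V_2 \cdots V_n) \mid A]
  &= P\Big[\,\bigcup_{i=1}^{m} \{V \in R(V_{i1}V_{i2}V_{i3})\} \,\Big|\, A\Big]
     && \text{by (2.28), which holds on } A \\
  &= \sum_{i=1}^{m} P[V \in R(V_{i1}V_{i2}V_{i3}) \mid A]
     && \text{sub-regions mutually exclusive} \\
  &= \sum_{i=1}^{m} P[Z_{\Delta i} \ge 0 \mid A]
     && \text{sign test (2.27) applied to each triangle.}
\end{align*}
```

The last step relies only on the equivalence between the event $\{V \in R(V_{i1}V_{i2}V_{i3})\}$ and the event $\{Z_{\Delta i} \ge 0\}$, which holds outcome by outcome and therefore also under conditioning on $A$.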

Remark 5. The conditions required by the probability model (2.30) are: (1) random polygons are simple, in order to guarantee that the point-in-polygon problem is not ill-defined; (2) the partition (2.28) holds. These two conditions are captured by the probability $P[A]$. When a legitimate partition of $\mathrm{Poly}(V_1^0V_2^0 \cdots V_n^0)$ on the basis of its approximate Cov-error region is found, the corresponding probability $P[A]$ is larger than $(1 - \alpha)^n$, i.e., $P[A] > (1 - \alpha)^n$. Furthermore, although (2.30) is derived for a simple polygon, we can see from the above derivation that it still holds for deformed polygons, e.g., polygons with holes, as long as (2.28) holds. Therefore, for polygons with holes or region objects consisting of several non-connected polygons, when the confidence ellipse region of each vertex in each partitioned triangle does not intersect the approximate Cov-error band of the edge determined by the other two points at a certain confidence level, the conditional probability model (2.30) holds. Such regions can occur in GIS. For example, there may be a lake in a forest or there may be islands in an ocean.

Theoretically, the larger the number of edges of a polygon, the larger the number of triangles obtained from triangulation. In this case, choosing a required triangulation of the polygon may be difficult. From the practical point of view, the number of triangles needed for a point-in-polygon problem can usually be reduced, since some probabilities on the right-hand side of (2.30) may be approximately zero, i.e., the corresponding triangles contribute very little to the left-hand side of (2.30). As an illustration, Fig. 2.6(a) shows the approximate Cov-error regions and the partitioned triangles of the polygon in Fig. 2.5(a). It can be observed (in Fig. 2.6(b)) that the confidence ellipse of the point $V^0$ at confidence level $(1 - \alpha)$ does not intersect the approximate Cov-error regions of some of the triangles. Recall that “the random event $A$ in (2.29) occurs” means that random polygons fall into the approximate Cov-error regions of the polygon and of the corresponding triangles. Therefore, under this condition, the probability that a random point $V$ is inside the random triangles in Fig. 2.6(b) is not larger than $1 - (1 - \alpha)^2$ (the proof is simple and is given in Appendix 3). Accordingly, the point-in-polygon problem for this particular case reduces approximately to the two point-in-triangle problems shown in Fig. 2.6(c) and (d), although the true polygon has six partitioned triangles.

Fig. 2.6. A reduction of the polygon model: (a) approximate Cov-error regions of the polygon in Fig. 2.5(a); (b) some Cov-error regions and the error ellipse do not intersect; (c) a Cov-error band and the error ellipse intersect; (d) another Cov-error band and the error ellipse intersect.

3. Simulation Experiments

In this section, we give some simulation experiments to show the applications of the algebra-based probability models in point-in-polygon analysis.

Example 3.1. Point-in-triangle problem. Let the three true vertices of the triangle be $V_1^0(0, 0)$, $V_2^0(1, 2)$ and $V_3^0(3, -1)$. We consider two cases: (i) the true point $V^0(2, 1)$ is outside the triangle region (see Fig. 3.1(a)); (ii) the true point $V^0(1.8, 0.6)$ is inside the triangle region (see Fig. 3.2(a)). Then for case (i), we have $\boldsymbol{\mu}_{(4)}^{\mathrm{T}} = (0, 0, 1, 2, 3, -1, 2, 1)$, and for case (ii), we have $\boldsymbol{\mu}_{(4)}^{\mathrm{T}} = (0, 0, 1, 2, 3, -1, 1.8, 0.6)$. Assume that the random error vectors $\boldsymbol{\varepsilon}_i$, $i = 1, 2, 3, v$, in (2.23) are independently and identically distributed as a circular normal distribution, where the covariance matrix is

$$\boldsymbol{\Sigma}_{\sigma} \equiv \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} = \sigma^2 \mathbf{I}_2. \tag{3.1}$$

Based on the above, we can perform the point-in-triangle analysis by a simulation experiment. The point-in-triangle relation becomes uncertain under ME; for illustration purposes, the results of only 100 simulations are shown in Fig. 3.1(b) and Fig. 3.2(b), respectively. For various values of the error variance $\sigma^2$, we can use the point-in-triangle probability model (2.27) to compute the probability that the random point $V$ is in the region of the random triangle $R(V_1V_2V_3)$. For each case, we actually run the simulation experiment 10 times with a sample size of 1000 each. The mean results are listed in Table 3.1 and plotted in Fig. 3.3. We can observe that the smaller the variance $\sigma^2$ is, the smaller is the uncertainty about whether the point $V$ is in the triangle. As $\sigma^2$ increases, the uncertainty increases accordingly.
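The design of this experiment (10 runs of 1000 samples under circular normal errors) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes that the sign test of (2.24) can be carried out with the four orientation determinants of the triangle and the point, and all function names, the seed, and the random number generator are hypothetical choices, so individual numbers will differ from those reported in Table 3.1.

```python
import math
import random
import statistics


def same_sign_inside(v1, v2, v3, v):
    """Point-in-triangle via the signs of four determinants (assumed stand-ins for Z_1..Z_4)."""
    def det(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    zs = (det(v1, v2, v3), det(v, v2, v3), det(v1, v, v3), det(v1, v2, v))
    return min(zs) > 0 or max(zs) < 0


def perturb(p, sigma, rng):
    """Add a circular normal error with standard deviation sigma to point p."""
    return (p[0] + rng.gauss(0.0, sigma), p[1] + rng.gauss(0.0, sigma))


def estimate(tri, point, sigma2, n_samples, rng):
    """Relative frequency of 'random point inside random triangle' in one run."""
    sigma = math.sqrt(sigma2)
    hits = 0
    for _ in range(n_samples):
        v1, v2, v3 = (perturb(p, sigma, rng) for p in tri)
        hits += same_sign_inside(v1, v2, v3, perturb(point, sigma, rng))
    return hits / n_samples


def repeated_estimate(tri, point, sigma2, runs=10, n_samples=1000, seed=0):
    """Mean and standard deviation of the estimates over several runs."""
    rng = random.Random(seed)
    values = [estimate(tri, point, sigma2, n_samples, rng) for _ in range(runs)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    triangle = ((0.0, 0.0), (1.0, 2.0), (3.0, -1.0))
    for sigma2 in (0.001, 0.05, 0.50):
        print(sigma2,
              repeated_estimate(triangle, (2.0, 1.0), sigma2),   # case (i)
              repeated_estimate(triangle, (1.8, 0.6), sigma2))   # case (ii)
```

The printed means and standard deviations can be compared with the corresponding columns of Table 3.1.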

Fig. 3.1. Case (i): the true point $V^0$ is located outside the triangle region. (a) A point is outside a triangle; (b) random points and random triangles. Axes: X.1 (horizontal) and X.2 (vertical).

Fig. 3.2. Case (ii): the true point $V^0$ is located inside the triangle region. (a) A point is inside a triangle; (b) random points and random triangles. Axes: X.1 (horizontal) and X.2 (vertical).

Fig. 3.3. The mean estimates of probability with respect to changes in the error variance. Horizontal axis: square of sigma (0.05 to 1.55); vertical axis: probability (0.0 to 1.0); curves: case (i) and case (ii).

Table 3.1. Probability estimates for different values of the error variance in Example 3.1*

| $\sigma^2$ | 0.001 | 0.005 | 0.01 | 0.05 | 0.25 | 0.50 | 0.75 | 1.00 | 1.25 | 1.50 |
|---|---|---|---|---|---|---|---|---|---|---|
| (i) mean | 0.0000 | 0.0010 | 0.0109 | 0.1524 | 0.3068 | 0.2830 | 0.2478 | 0.2331 | 0.1990 | 0.1902 |
| (i) std. dev. | 0.0000 | 0.0013 | 0.0023 | 0.0127 | 0.0085 | 0.0158 | 0.0088 | 0.0126 | 0.0110 | 0.0114 |
| (ii) mean | 0.9977 | 0.8960 | 0.8144 | 0.6479 | 0.5242 | 0.4011 | 0.3159 | 0.2625 | 0.2342 | 0.2075 |
| (ii) std. dev. | 0.0013 | 0.0101 | 0.0105 | 0.0154 | 0.0116 | 0.0144 | 0.0126 | 0.0125 | 0.0146 | 0.0191 |

* For each case, the first line gives the mean estimated probability and the second line gives the corresponding standard deviation.

Example 3.2 (continuation of Example 3.1). We examine the probability estimation in the point-in-triangle problem under case (i) when the structures of the error covariance matrices of the four points are made different. The distributions of the error vectors $\boldsymbol{\varepsilon}_i$ of all points are independent and normal in each case. Define

$$\boldsymbol{\Sigma}_{\sigma_1, \sigma_2, \rho} \equiv \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \tag{3.2}$$

and choose $\boldsymbol{\Sigma}_{\sigma}$ in (3.1) with $\sigma^2 = 0.05$ and $\boldsymbol{\Sigma}_{\sigma_1, \sigma_2, \rho}$ in (3.2) with $\rho = 0.6$, $\sigma_1 = 0.1$ and $\sigma_2 = 0.3$. As an illustration, for the triangle $\Delta V_1^0V_2^0V_3^0$ and the point $V^0$, we carried out the simulation experiments with respect to case (i) in Example 3.1 only. The five cases of this example are listed in the first column of Table 3.2 and coded under the common symbol “1, 2, 3, V”, where the first three characters “1, 2, 3” denote the corresponding vertices $V_1$, $V_2$, and $V_3$ of the triangle and the fourth character $V$ denotes the point. For each case, we run the simulation experiment 10 times with a sample size of 10000 in each run. The point-in-triangle probability model (2.27) is used again. The simulation results of each run, together with the mean of the 10 runs, are tabulated in Table 3.2.
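Error vectors with the elliptical covariance structure (3.2) can be generated from two independent standard normal draws. The sketch below uses the Cholesky factor of the $2 \times 2$ covariance matrix written out by hand; it is one common way to obtain correlated normal errors and is not claimed to be the generator used by the authors. The function names are hypothetical.

```python
import math
import random


def sample_elliptical_error(sigma1, sigma2, rho, rng):
    """Draw one error vector with covariance
    [[sigma1^2, rho*sigma1*sigma2], [rho*sigma1*sigma2, sigma2^2]]."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e1 = sigma1 * z1
    # Second row of the Cholesky factor: (rho*sigma2, sigma2*sqrt(1 - rho^2)).
    e2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return e1, e2


def sample_covariance(samples):
    """Sample variances and covariance of a list of 2-D vectors."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / (n - 1)
    v2 = sum((s[1] - m2) ** 2 for s in samples) / (n - 1)
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / (n - 1)
    return v1, v2, c12


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_elliptical_error(0.1, 0.3, 0.6, rng) for _ in range(50000)]
    print(sample_covariance(draws))  # should be close to (0.01, 0.09, 0.018)
```

Replacing the circular draws of a vertex or of the point by such elliptical draws gives the “−” entries of the cases in Table 3.2.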

Table 3.2. Probability estimates for different error structures of points in Example 3.2*

| Case | 1, 2, 3, V | Sample estimates (10 runs) | Mean |
|---|---|---|---|
| (i) | + + + + | 0.1652 0.1564 0.1498 0.1615 0.1632 0.1548 0.1516 0.1529 0.1670 0.1635 | 0.1586 |
| (ii) | + + + − | 0.1648 0.1646 0.1540 0.1567 0.1601 0.1621 0.1572 0.1607 0.1577 0.1594 | 0.1597 |
| (iii) | + + − − | 0.1622 0.1615 0.1664 0.1602 0.1636 0.1669 0.1640 0.1594 0.1658 0.1666 | 0.1637 |
| (iv) | + − − − | 0.1498 0.1595 0.1574 0.1582 0.1625 0.1587 0.1564 0.1504 0.1581 0.1540 | 0.1565 |
| (v) | − − − − | 0.1586 0.1572 0.1517 0.1620 0.1576 0.1565 0.1586 0.1650 0.1575 0.1590 | 0.1584 |

* The symbol “+” denotes circular errors as in (3.1) and the symbol “−” denotes elliptical errors as in (3.2).

It can be observed that the mean estimates differ very little between cases (i) and (ii). Using the two different error structures (3.1) and (3.2) for the random point can nevertheless affect, in individual runs, the probability that the random point $V$ is inside the random triangle whose vertex coordinates have the same error covariance matrices. After averaging the 10 runs, the difference between the mean estimates of the two cases is still discernible: 0.1597 − 0.1586 = 0.0011. Once the error covariance matrix $\boldsymbol{\Sigma}_3$ of a vertex ($V_3$) is changed and the others are kept unchanged (i.e., case (iii) versus case (ii)), the effect of the change in the error covariance structure of the vertex becomes larger (as indicated in the rows corresponding to cases (ii) and (iii) in Table 3.2), with a difference of the means of 0.1637 − 0.1597 = 0.0040. Starting from case (iii) and making one more change, in the error covariance matrix $\boldsymbol{\Sigma}_2$ of the vertex $V_2$ (i.e., case (iv)), the effect is even more apparent. The difference of the mean estimates, 0.1637 − 0.1565 = 0.0072, is indeed the largest. Comparing the results obtained from cases (iv) and (v), however, the change in the error covariance $\boldsymbol{\Sigma}_1$ of point $V_1$ has very little effect on the probability estimate. This is natural since $V_1^0$ is far away from point $V^0$; its effect should be small whether or not its error covariance matrix varies. One last observation: when the errors of the four points are homogeneous (such as in cases (i) and (v)), the means of the probability estimates are almost the same. Even though the differences in terms of mean probabilities may not seem large, it should be noted that by taking the mean value we actually reduce the randomness among individual samples, which may be large. This example shows the applicability of the point-in-triangle probability model under different error structures, and confirms that the model can convey the effect of such differences. Fig. 3.4 depicts the covariance-based error band and 100 simulation results for case (v).

Fig. 3.4. The Cov-error band and simulation results when the covariance matrices are (3.2): (a) Cov-error band with level 0.9; (b) 100 simulated random triangles. Axes: X.1 (horizontal) and X.2 (vertical).

Example 3.3. Point-in-polygon problem. Consider two cases: (i) the true point $V^0(1.5, -1)$ is outside the region of a concave polygon (see Fig. 3.5(a)); (ii) the true point $V^0(-0.4, -1.8)$ is outside the region of a convex polygon (see Fig. 3.5(b)). The vertices of the concave polygon are $V_1^0(0, 0)$, $V_2^0(1, 2)$, $V_3^0(3, -1)$, $V_4^0(2, -2)$, and $V_5^0(1.6, -0.5)$, and the vertices of the convex polygon are the same except for $V_5^0$, which becomes $V_5^0(-0.2, -1.5)$. Using the polygon probability model (2.30), we can estimate the probability that the random point $V$ is in the region of a random polygon.

For simplicity, we choose the error structure (3.1) as the error covariance matrices of all vertices and the point $V^0$. First, we can give the empirical values of the maximal allowable limits (MAL) $\overset{\frown}{\sigma}^2$, discussed in Part 1, for vertices on the polygons with different confidence levels $1 - \alpha$. Since only $\sigma^2$ varies under the condition that the regions are given, the corresponding MAL values can be obtained by adjusting $\sigma^2$ (Table 3.3). It can be observed that for both the concave and the convex polygon of this example, the MAL $\overset{\frown}{\sigma}^2$ decreases as the confidence level increases, and the MALs of the convex polygon are much larger than those of the concave polygon at the same confidence level. Moreover, if the confidence region of the boundaries of the concave polygon is determined by the Cov-error band, the MAL values are all larger than those obtained with the approximate Cov-error band (see Fig. 3.5(c) and (e)). For the convex polygon, however, this does not occur, since its MALs are completely determined by the confidence regions of the vertices and there is more room for adjusting the Cov-error band (as evident in Fig. 3.5(d) and (f)). Furthermore, after the triangulation has been made, the MALs corresponding to the triangulation usually need to be adjusted further in order to guarantee that the confidence region of each vertex in each partitioned triangle does not intersect the confidence band of the opposite edge. For the concave polygon in this example, there is no such problem. For the convex polygon, the new MALs (see Fig. 3.6(a) and (b)) with respect to the approximate Cov-error band and the Cov-error band become smaller than the MALs with respect to the polygon. That is, for a fixed $1 - \alpha$, the widths of the corresponding error bands become smaller. A comparison can be made between Fig. 3.6 and Fig. 3.5(d) and (f). The MAL values listed in Table 3.3 for the convex polygon are the adjusted results.

Fig. 3.5. A comparison of MAL in concave and convex polygons ($1 - \alpha = 0.90$): (a) the point is outside the concave polygon; (b) the point is outside the convex polygon; (c) approximate Cov-error band and MAL; (d) approximate Cov-error band and MAL; (e) covariance-based error band and MAL; (f) covariance-based error band and MAL. Axes: X.1 (horizontal) and X.2 (vertical).

Fig. 3.6. Confidence regions of triangulations for the convex polygon ($1 - \alpha = 0.90$): (a) confidence region using approximate Cov-error bands; (b) confidence region using covariance-based error bands. Axes: X.1 (horizontal) and X.2 (vertical).

In Table 3.3, it can be observed that the larger the confidence level $1 - \alpha$ is, the smaller is the MAL $\overset{\frown}{\sigma}^2$. Since the confidence region with a larger confidence level is wider, there is much less room to change the error variance before non-adjacent error bands intersect. Consequently, the corresponding MAL $\overset{\frown}{\sigma}^2$ is smaller.

Table 3.3. MAL values $\overset{\frown}{\sigma}^2$ for vertices of the corresponding polygons with different confidence levels $1 - \alpha$*

| Polygon | Error band | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|
| Concave | (1) | 0.064 | 0.05 | 0.042 | 0.032 | 0.021 |
| Concave | (2) | 0.084 | 0.075 | 0.054 | 0.044 | 0.028 |
| Convex | (1) | 0.158 | 0.134 | 0.108 | 0.084 | 0.054 |
| Convex | (2) | 0.158 | 0.134 | 0.108 | 0.084 | 0.054 |

* (1) when using the approximate Cov-error band and (2) when using the covariance-based error band.

To use the model (2.30), we choose $1 - \alpha = 0.90$. Thus the probability $P[A]$ of the conditioning event for the following results is at least $(1 - \alpha)^5 = 0.59$. It should be noted that this is a conservative lower bound and not the true value, which may indeed be much larger. Of course, to boost the probability, we can choose a larger $1 - \alpha$, but from Table 3.3 the MAL will decrease accordingly and the error variances may no longer be within the range bounded by the MAL. For example, if the $\sigma^2$ of concern is 0.04 in case (1) and we choose $1 - \alpha = 0.99$, this $\sigma^2$ is larger than the corresponding MAL $\overset{\frown}{\sigma}^2 = 0.021$. The validity of the obtained probability estimates then cannot be guaranteed, and we must make a tradeoff. This shows the practical value of MAL.

For the concave polygon, we first partition it into three triangles: $\Delta V_1V_2V_5$, $\Delta V_2V_3V_5$, and $\Delta V_3V_4V_5$. The probability estimates for each triangle are then obtained from the triangle probability model (2.24) with a sample size of 1000. The results are tabulated in Table 3.4, where the ‘Sum’ column is the result obtained by applying (2.30), and the ‘Whole’ column is the estimate of the probability $P[V \in R(V_1V_2V_3V_4V_5) \mid A]$ obtained by directly simulating the whole polygon.
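The comparison between the two columns can be sketched as a small Monte Carlo experiment. This is an illustration under stated assumptions, not the authors' implementation: the polygon coordinates are hypothetical (the coordinates of Fig. 3.5 are not reproduced here), every coordinate receives independent normal error with the same variance $\sigma^2$, a plain signed-area test stands in for the quadratic-form model (2.24), and even-odd ray casting stands in for the direct simulation of the whole polygon.

```python
import random

# Hypothetical concave pentagon V1..V5 (reflex vertex at V5) and the fan
# triangulation V1V2V5, V2V3V5, V3V4V5, given as vertex indices.
POLY = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (1.5, 2.0)]
TRIANGLES = [(0, 1, 4), (1, 2, 4), (2, 3, 4)]


def signed_area2(p, a, b):
    """Twice the signed area of the triangle (p, a, b)."""
    return (a[0] - p[0]) * (b[1] - p[1]) - (b[0] - p[0]) * (a[1] - p[1])


def in_triangle(p, a, b, c):
    """Point-in-triangle test: the three signed areas must not differ in sign."""
    d = (signed_area2(p, a, b), signed_area2(p, b, c), signed_area2(p, c, a))
    return not (min(d) < 0 and max(d) > 0)


def in_polygon(p, poly):
    """Even-odd ray-casting test against the polygon as a whole."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > p[1]) != (y2 > p[1]):
            if p[0] < x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def perturb(pt, sigma, rng):
    """Add independent N(0, sigma^2) error to both coordinates (assumed model)."""
    return (pt[0] + rng.gauss(0.0, sigma), pt[1] + rng.gauss(0.0, sigma))


def estimate(point, poly, triangles, sigma2, n=1000, seed=0):
    """Return (per-triangle estimates, their sum, whole-polygon estimate)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    tri_hits = [0] * len(triangles)
    whole_hits = 0
    for _ in range(n):
        p = perturb(point, sigma, rng)
        v = [perturb(q, sigma, rng) for q in poly]
        for k, (i, j, l) in enumerate(triangles):
            tri_hits[k] += in_triangle(p, v[i], v[j], v[l])
        whole_hits += in_polygon(p, v)
    tri_probs = [h / n for h in tri_hits]
    return tri_probs, sum(tri_probs), whole_hits / n


if __name__ == "__main__":
    for s2 in (0.001, 0.01, 0.05, 0.12):
        print(s2, estimate((3.0, 1.0), POLY, TRIANGLES, s2))
```

With small variances the summed triangle estimates and the whole-polygon estimate should stay close; with large variances the perturbed polygon may stop being simple and the two can drift apart, which is the effect the MAL is meant to guard against.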

After the approximate Cov-error regions of the polygons in cases (1) and (2) with confidence level 0.90 are respectively determined, the corresponding triangulations are selected as shown in Fig. 3.5(a), (b). For the concave polygon and its triangulation, since the confidence ellipse region of each vertex in each partitioned triangle does not intersect the approximate Cov-error band of the opposite edge, the probability model (2.30) can be applied and $P[A] > 0.59$. From Table 3.4, we find that the result under ‘Sum’, obtained by applying (2.30), is almost the same as the one under ‘Whole’, obtained by directly simulating the polygon as a whole. When $\sigma^2$ is not larger than the MAL $\overset{\frown}{\sigma}^2$, the difference between the results of the last two columns is very small and may well be due to randomness. When $\sigma^2 = 0.06$, the difference is bigger but still relatively small, which shows that $\overset{\frown}{\sigma}^2$ is conservative. However, when $\sigma^2 = 0.07$, $2\overset{\frown}{\sigma}^2$, and $3\overset{\frown}{\sigma}^2$, the difference becomes progressively bigger. This is because $\sigma^2$ is much larger than the MAL, which changes the topological properties of the random polygons. Therefore, the introduction of the MAL $\overset{\frown}{\sigma}^2$ is necessary in order to make the corresponding probability legitimate.

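Table 3.4. Probability estimations for the point-in-polygon example in Fig. 3.5(a)

| $\sigma^2$ | Triangle 1.2.5 | Triangle 2.3.5 | Triangle 3.4.5 | Sum | Whole |
|---|---|---|---|---|---|
| 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.005 | 0.000 | 0.000 | 0.008 | 0.008 | 0.005 |
| 0.010 | 0.000 | 0.000 | 0.033 | 0.033 | 0.035 |
| 0.020 | 0.000 | 0.002 | 0.104 | 0.106 | 0.106 |
| 0.030 | 0.004 | 0.006 | 0.141 | 0.151 | 0.152 |
| 0.040 (= $\overset{\frown}{\sigma}^2$ (1)) | 0.007 | 0.024 | 0.154 | 0.185 | 0.195 |
| 0.050 (= $\overset{\frown}{\sigma}^2$ (2)) | 0.019 | 0.031 | 0.179 | 0.229 | 0.230 |
| 0.060 | 0.016 | 0.042 | 0.195 | 0.253 | 0.257 |
| 0.070 | 0.018 | 0.060 | 0.186 | 0.264 | 0.279 |
| 0.080 (= $2\overset{\frown}{\sigma}^2$ (1)) | 0.023 | 0.059 | 0.190 | 0.272 | 0.296 |
| 0.120 (= $3\overset{\frown}{\sigma}^2$ (1)) | 0.049 | 0.082 | 0.187 | 0.318 | 0.346 |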
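The size of the discrepancy between the last two columns can be read off directly. The short sketch below only re-uses the ‘Sum’ and ‘Whole’ values of Table 3.4; the names `TABLE_3_4` and `gaps` are illustrative.

```python
# Rows of Table 3.4 as (sigma^2, Sum, Whole).
TABLE_3_4 = [
    (0.001, 0.000, 0.000), (0.005, 0.008, 0.005), (0.010, 0.033, 0.035),
    (0.020, 0.106, 0.106), (0.030, 0.151, 0.152), (0.040, 0.185, 0.195),
    (0.050, 0.229, 0.230), (0.060, 0.253, 0.257), (0.070, 0.264, 0.279),
    (0.080, 0.272, 0.296), (0.120, 0.318, 0.346),
]


def gaps(rows):
    """Absolute difference |Sum - Whole| for each error variance, to 3 decimals."""
    return [(s2, round(abs(total - whole), 3)) for s2, total, whole in rows]


if __name__ == "__main__":
    for s2, gap in gaps(TABLE_3_4):
        print(f"sigma^2 = {s2:.3f}   |Sum - Whole| = {gap:.3f}")
```

The three largest gaps occur at the three largest variances (0.070, 0.080 and 0.120), all of which exceed the tabulated MAL values.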

Without further elaboration, a similar experiment can be performed for the convex polygon and

the results can likewise be interpreted.

4. Conclusion

We have argued in this part of the series that point-in-polygon analysis under ME, i.e., when points and polygons are random, is in fact an ill-defined problem. The legitimacy of the problem hinges entirely on whether the polygon concerned is simple and forms a region. To solve the ill-defined problem, we have made use of the MAL and introduced the concept of an approximate covariance-based error band, from which we can give a probabilistic description of the problem (a conditional probability, to be precise). This approach clarifies the issues surrounding point-in-polygon analysis and enables a more accurate investigation that forms a basis for further study. Analytically, we adopt the idea of polygon triangulation and decompose the point-in-polygon problem into multiple point-in-triangle problems whose solutions are obtained by the proposed algebra-based probability model. Instead of tackling the problem with geometry, which would involve a complex discussion of polygon convexity and nonconvexity, we have proposed to use four concise quadratic form variables, which have a unified structure, to compute the probability that a random point is inside a random polygon. The validity and effectiveness of the proposed model have also been demonstrated with simulation experiments. An outstanding problem for further study is to obtain the analytical distribution of these quadratic form variables under a certain condition. This will make the approach theoretically more complete.

In Part 3 of the present series, we will employ the quadratic forms to analyze the intersections and

polygon-on-polygon problems. It will be demonstrated that the quadratic forms serve as a unified basis

to solve the point-in-polygon, line-in-polygon and polygon-on-polygon problems under ME.

Appendix 1 Proof of Lemma 2.1

Assume that there are three non-negative numbers $\lambda_1$, $\lambda_2$, $\lambda_3$ such that equations (2.3) and (2.4) hold. If $\lambda_1 = 1$, then $x = x_1$. Consequently, the point $V$ is located in the triangle $\Delta V_1V_2V_3$. If $\lambda_1 \neq 1$, then $1-\lambda_1 > 0$. Thus equation (2.3) can be expressed as

$$x = \lambda_1 x_1 + (1-\lambda_1)\left(\frac{\lambda_2}{1-\lambda_1}\,x_2 + \frac{\lambda_3}{1-\lambda_1}\,x_3\right) \equiv \lambda_1 x_1 + (1-\lambda_1)\,x^*, \qquad \text{(A.1)}$$

where

$$x^* = \frac{\lambda_2}{1-\lambda_1}\,x_2 + \frac{\lambda_3}{1-\lambda_1}\,x_3 .$$

Obviously, the point $V^*$ (with coordinate vector $x^*$) is located on the edge $V_2V_3$ of the triangle $\Delta V_1V_2V_3$ and thus it is in the triangle. According to (A.1), the point $V$ is located on the line segment $V_1V^*$ and therefore belongs to the region of the triangle $\Delta V_1V_2V_3$.

Conversely, if the point $V$ is located in the interior of the triangle $\Delta V_1V_2V_3$, then the line segment from $V_1$ passing through $V$ will intersect the edge $V_2V_3$ at a point $V^*$. We have

$$x = \lambda_1 x_1 + (1-\lambda_1)\,x^* = \lambda_1 x_1 + (1-\lambda_1)\,[\gamma_2 x_2 + (1-\gamma_2)\,x_3]$$

$$\phantom{x} = \lambda_1 x_1 + [(1-\lambda_1)\gamma_2]\,x_2 + [(1-\lambda_1)(1-\gamma_2)]\,x_3 \equiv \lambda_1' x_1 + \lambda_2' x_2 + \lambda_3' x_3 .$$

It is easy to check that $\lambda_1'$, $\lambda_2'$, and $\lambda_3'$ satisfy the specific conditions. Therefore, equation (2.3) holds. □
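The two directions of the proof can be checked numerically. The sketch below assumes that (2.3) is the representation $x = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3$ and that (2.4) is the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$, which is how the proof uses them; the function names and the sample triangle are hypothetical.

```python
def barycentric(p, v1, v2, v3):
    """Solve p = l1*v1 + l2*v2 + l3*v3 with l1 + l2 + l3 = 1 (Cramer's rule)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, v1, v2, v3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("degenerate triangle")
    lam2 = ((x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)) / det
    lam3 = ((x2 - x1) * (y - y1) - (x - x1) * (y2 - y1)) / det
    return 1.0 - lam2 - lam3, lam2, lam3


def in_triangle(p, v1, v2, v3):
    """The point lies in the closed triangle iff all three weights are non-negative."""
    return all(lam >= 0 for lam in barycentric(p, v1, v2, v3))


def edge_point(p, v1, v2, v3):
    """Point x* on the line through V2 and V3 used in (A.1); requires l1 != 1."""
    lam1, lam2, lam3 = barycentric(p, v1, v2, v3)
    w2, w3 = lam2 / (1.0 - lam1), lam3 / (1.0 - lam1)
    return (w2 * v2[0] + w3 * v3[0], w2 * v2[1] + w3 * v3[1])


if __name__ == "__main__":
    tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))  # hypothetical triangle
    print(barycentric((1.0, 1.0), *tri))
    print(edge_point((1.0, 1.0), *tri))
    print(in_triangle((5.0, 5.0), *tri))
```

For an interior point, `edge_point` returns a point on the edge $V_2V_3$, and recombining it with $V_1$ using the weights $\lambda_1$ and $1-\lambda_1$ recovers the original point, mirroring (A.1).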

Appendix 2 Proof of Remark 3

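Note that $\Delta_i^{\mathrm{T}} = -\Delta_i$ and $H_0^{\mathrm{T}} = -H_0$. According to the formula $(A \otimes B)^{\mathrm{T}} = A^{\mathrm{T}} \otimes B^{\mathrm{T}}$, we have

$$H_i^{\mathrm{T}} \equiv (\Delta_i \otimes H_0)^{\mathrm{T}} = (-\Delta_i) \otimes (-H_0) = H_i .$$

It is easy to show that $\operatorname{rank}(\Delta_i) = 2$ and $\operatorname{rank}(H_0) = 2$, thus we have $\operatorname{rank}(H_i) = \operatorname{rank}(\Delta_i)\,\operatorname{rank}(H_0) = 4$. Finally, as the eigenvalues of $H_0$ are $\mathrm{i}$, $-\mathrm{i}$, and the eigenvalues of $\Delta_i$ are $\frac{\sqrt{3}}{2}\mathrm{i}$, $-\frac{\sqrt{3}}{2}\mathrm{i}$, 0, 0, where $\mathrm{i}$ is the imaginary unit, their products are the eigenvalues of $H_i \equiv \Delta_i \otimes H_0$. □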
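The symmetry argument can be illustrated with explicit matrices. The sketch below is built on assumptions: `H0` is taken to be the 2×2 skew-symmetric matrix with rows (0, 1) and (−1, 0), and `DELTA` is a hypothetical 4×4 skew-symmetric matrix chosen so that the quadratic form reproduces the cross product of $x_1 - x$ and $x_2 - x$ for the joint vector $(x, x_1, x_2, x_3)$. The paper's own definitions of $\Delta_i$ and $H_0$ are not restated in this appendix and may differ.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def quad_form(m, z):
    """Quadratic form z^T M z."""
    return sum(z[r] * m[r][c] * z[c]
               for r in range(len(z)) for c in range(len(z)))


# Assumed 2x2 skew-symmetric matrix: u^T H0 v is the 2-D cross product of u and v.
H0 = [[0, 1], [-1, 0]]

# Hypothetical 4x4 skew-symmetric matrix coupling the point (index 0)
# with the vertices V1 and V2 (indices 1 and 2); V3 (index 3) is not used.
DELTA = [[0.0, 0.5, -0.5, 0.0],
         [-0.5, 0.0, 0.5, 0.0],
         [0.5, -0.5, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

H1 = kron(DELTA, H0)  # 8x8 matrix of the quadratic form

if __name__ == "__main__":
    z = [1, 1, 0, 0, 4, 0, 0, 4]  # joint vector (x, x1, x2, x3), hypothetical values
    print(H1 == transpose(H1))    # the product of two skew-symmetric factors is symmetric
    print(quad_form(H1, z))       # equals the cross product of (x1 - x) and (x2 - x)
```

Because both factors are skew-symmetric, the two sign changes cancel in the transpose, so the Kronecker product is a symmetric matrix and defines a genuine quadratic form.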

Appendix 3 Proof of a probability inequality in the discussion of (2.30)

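Let $R_0^{(\alpha)}$ be the confidence ellipse of a point $V_0$ with confidence level $(1-\alpha)$ and $R_{-}$ be the union of the approximate Cov-error regions (with the same confidence level) of all partitioned triangles which do not intersect $R_0^{(\alpha)}$. Thus, $R_0^{(\alpha)} \cap R_{-} = \emptyset$ (an empty set). For an independent random point $V$ and random triangle $\Delta V_{i1}V_{i2}V_{i3}$, define the random events

$$A \equiv \{V \in R_0^{(\alpha)}\}, \quad B \equiv \{\Delta V_{i1}V_{i2}V_{i3} \subseteq R_{-}\}, \quad C \equiv \{V \in \Delta V_{i1}V_{i2}V_{i3}\}.$$

Then $P[A] = 1-\alpha$, $P[B] = 1-\alpha$, $P[C \mid AB] = 0$, and

$$P[V \in \Delta V_{i1}V_{i2}V_{i3}] = P[C] = P[C \mid AB]\cdot P[AB] + P[C \mid \overline{AB}]\cdot P[\overline{AB}] \le P[\overline{AB}] = 1 - P[AB] = 1 - P[A]\cdot P[B] = 1-(1-\alpha)^2 . \quad □$$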
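As a numerical illustration of the bound $1-(1-\alpha)^2$, evaluated at two of the confidence levels listed in Table 3.3 (the choice of levels is only for illustration):

```latex
\[
1-\alpha = 0.90:\quad 1-(1-\alpha)^2 = 1 - 0.90^2 = 0.19,
\qquad
1-\alpha = 0.99:\quad 1-(1-\alpha)^2 = 1 - 0.99^2 = 0.0199 .
\]
```

The bound tightens quickly as the confidence level approaches one.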

References

Berg, M. de, M. van Kreveld, M. Overmars, and O. Schwarzkopf. 2000. Computational Geometry: Algorithms and Applications (2nd ed.). Berlin: Springer-Verlag.

Blakemore, M. 1984. Generalization and error in spatial data bases. Cartographica, 21, 131-139.

Bolstad, P.V., P. Gessler, and T.M. Lillesand. 1990. Positional uncertainty in manually digitized map object. Int. J. Geographical Information Systems, 4(4), 399-412.

Egenhofer, M.J., and R.D. Franzosa. 1991. Point-set topological spatial relations. International Journal of Geographical Information Systems, 5, 161-174.

Ehlschlaeger, C.R. 2002. Representing multiple spatial statistics in generalized elevation uncertainty models: moving beyond the variogram. Int. J. Geographical Information Systems, 16(3), 259-285.

Goodchild, M.F., and S. Gopal (Eds). 1989. Accuracy of Spatial Databases. London: Taylor & Francis.

Goodchild, M.F., B.O. Parks, and L.T. Steyaert (Eds). 1993. Environmental Modelling with GIS. New York: Oxford University Press.

Haines, E. 1994. Point in Polygon Strategies. In Graphics Gems IV, ed. Paul Heckbert. Boston: Academic Press, pp. 24-46.

Heuvelink, G.B.M. 1998. Error Propagation in Environmental Modelling with GIS. London: Taylor & Francis.

Leung, Y., and J.P. Yan. 1997. Point-in-polygon analysis under certainty and uncertainty. GeoInformatica, 1, 93-114.

Leung, Y., and J.P. Yan. 1998. A locational error model for spatial features. Int. J. Geographical Information Science, 12, 607-620.

Leung, Y., J.H. Ma, and M.F. Goodchild. 2003a. A general framework for error analysis in measurement-based GIS, Part 1: the basic measurement-error model and related concepts. (unpublished paper)

Leung, Y., J.H. Ma, and M.F. Goodchild. 2003b. A general framework for error analysis in measurement-based GIS, Part 2: the algebra-based probability model for point-in-polygon analysis. (unpublished paper)

Leung, Y., J.H. Ma, and M.F. Goodchild. 2003c. A general framework for error analysis in measurement-based GIS, Part 3: error analysis for intersections and overlays. (unpublished paper)

Leung, Y., J.H. Ma, and M.F. Goodchild. 2003d. A general framework for error analysis in measurement-based GIS, Part 4: error analysis for length and area measurements. (unpublished paper)

Rigaux, P., M. Scholl, and A. Voisard. 2002. Spatial Databases with Application to GIS. San Francisco: Morgan Kaufmann Publishers.

Stanfel, L.E., M. Conerly, and C. Stanfel. 1995. Reliability of polygonal boundary of land parcel. Journal of Surveying Engineering, 121(4), 163-176.

Turkington, D.A. 2002. Matrix Calculus and Zero-One Matrices. Cambridge, UK: Cambridge University Press.