A general framework for error analysis in measurement-based GIS ------Part 2
A General Framework for Error Analysis in Measurement-Based
GIS, Part 2: The Algebra-Based Probability Model for Point-in-Polygon Analysis
Yee Leung Department of Geography and Resource Management, Center for Environmental Policy
and Resource Management, and Joint Laboratory for Geoinformation Science, The Chinese University of Hong Kong, Hong Kong
E-mail: [email protected]
Jiang-Hong Ma Faculty of Science, Xi’an Jiaotong University and Chang’an University, Xi’an, P.R. China
E-mail: [email protected]
Michael F. Goodchild Department of Geography, University of California, Santa Barbara, California, U.S.A.
E-mail: [email protected]
Abstract. This paper is Part 2 of a four-part series of our research on the development of a general
framework for error analysis in measurement-based geographic information systems (MBGIS). In this
paper, we discuss the problem of point-in-polygon analysis under randomness, i.e., with random
measurement error (ME). It is well known that overlay is one of the most important operations in GIS,
and point-in-polygon analysis is a basic class of overlay and query problems. Although it is a classic
problem, it has not been addressed appropriately. With ME in the location of the vertices of
a polygon, the resulting random polygons may undergo complex changes, so that the point-in-polygon
problem may become theoretically and practically ill-defined. That is, there is a possibility that we
cannot answer whether a random point is inside a random polygon if the polygon is not simple and
cannot form a region. For the point-in-triangle problem, however, such a case need not be considered
since any triangle always forms an interior or region. To formulate the general point-in-polygon
problem in a suitable way, a conditional probability mechanism is first introduced in order to
accurately characterize the nature of the problem and establish the basis for further analysis. For the
point-in-triangle problem, four quadratic forms in the joint coordinate vectors of a point and the
vertices of the triangle are constructed. The probability model for the point-in-triangle problem is then
established by the identification of signs of these quadratic form variables. Our basic idea for solving a
general point-in-polygon (concave or convex) problem is to convert it into several point-in-triangle
problems under a certain condition. By solving each point-in-triangle problem and summing the
solutions, the probability model for a general point-in-polygon analysis is constructed. The
advantage of the algebra-based approach is that, by using these quadratic forms, we can circumvent
the complex geometrical relations between a random point and a random polygon (convex or concave)
that one has to deal with in any geometric method when the probability is computed. The theoretical
arguments are substantiated by simulation experiments.
Keywords: algebra-based probability model, approximate covariance-based error band, point-in-
triangle, point-in-polygon, quadratic form
1. Introduction
A traditional method of geographical analysis is to lay a map of one theme over a map of
another. Such a process is commonly called overlay (or overlay functions). Overlay is the operation of
comparing variables among multiple coverages, requiring both graphic and attribute comparisons, and
is one of the most powerful features of the GIS. Vector overlays are methodologically and technically
more complex than raster overlays and usually produce complex output files with more nodes, arcs
and polygons than the original files. Geometry is used to define new objects in a topological sense.
Overlay does not always involve comparisons between polygons, since points and lines are often
involved. In terms of point, line and polygon, we usually have three types of problems: (1) point-in-
polygon, (2) line-in-polygon, and (3) polygon-on-polygon. When performing overlays, it is often
important to know whether a point of the first layer lies within a polygon of the second layer. Point-in-
polygon operations are used to compare a map of a point distribution with a map of regions, or to
entertain a query of whether a point is in a polygon. The point-in-polygon query in GIS has been
formally discussed in Leung and Yan (1997) under nine basic situations where points and polygons
can be precise, fuzzy (imprecise), and random (with error).
Blakemore (1984) has discussed the point-in-polygon relation under the epsilon band concept, i.e.,
a precise point in an epsilon-band. Based on the geometric relation of points and polygons, Stanfel et
al. (1995) have developed two computationally simple approximation schemes to calculate the
probability that a precise point is inside an uncertain polygon having as vertices a set of points with
coordinates determined by various measurement schemes. Monte Carlo simulation, on the other hand,
is a technique for the modeling of uncertainty in spatial data. It can quantify uncertainty in spatial data
and applications by determining the possible range of an application result. A spatial statistical model
that represents multiple statistics simultaneously and weighted against each other is proposed in
Ehlschlaeger (2002). With the exception of the conceptual sketch made in Leung and Yan (1997), it
appears that in-depth theoretical analysis of the point-in-polygon issue when points and polygons both
have random errors has not been dealt with in the literature (Rigaux et al. 2002).
Testing whether a point is inside a polygon is not only a common operation in spatial and GIS
applications, but it is also a basic operation in computer graphics (Rigaux et al., 2002). Algorithms
such as the crossings test (or ray-polygon intersection test), the angle summation method, the bins
method and the grid method have been developed over the years. The triangle test is another method
by which a polygon is treated as a fan of triangles emanating from one vertex and the point is tested
against each triangle by computing its barycentric coordinates. A faster triangle fan test is to store a set
of half-plane equations for each triangle and test each in turn. Most of these methods developed in
computer science are fast but need much larger memory and higher initialization times (see Haines,
1994 for details). However, these methods are generally not suitable for error analysis in GIS because
they usually do not deal with the uncertainty caused by locational errors of the vertices, and it is not so
easy to perform probability analysis in their formulations.
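For orientation, the deterministic crossings test mentioned above can be sketched as follows. This is a minimal illustration for error-free coordinates only, not the probability model developed in this paper; the function name `point_in_polygon_crossings` and the sample square are assumptions made for the example.

```python
def point_in_polygon_crossings(point, vertices):
    """Even-odd crossings test for a deterministic simple polygon.

    `vertices` is an ordered list of (x, y) tuples; the polygon is closed
    implicitly by joining the last vertex to the first.
    """
    px, py = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x-coordinate where the edge meets the horizontal line y = py.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon_crossings((2.0, 2.0), square))  # True
print(point_in_polygon_crossings((5.0, 2.0), square))  # False
```

The test returns a yes/no answer for fixed coordinates, which illustrates why such formulations do not lend themselves directly to probability statements when the vertices carry locational error.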
To appropriately solve the point-in-polygon problem when polygons have random errors,
a fundamental concept should first be clarified. When locations of the vertices of a polygon are
random and when the polygon is formed by vertices listed in a specific order (this is a common
representation of a polygon), the randomly generated polygon can undergo complex changes.
Theoretically, even if the variances of the locational errors are not large, there is a possibility that we
have no basis to answer whether a random point is inside a random ‘polygon’ if such a ‘polygon’
has no well-defined interior or region. Under this situation, the point-in-polygon
problem is ill-defined in the strict sense. It complicates the point-in-polygon problem and makes it
more difficult to solve. The complication and difficulty are largely due to (1) the complex relations
between random points and random polygons, and (2) the convexity/non-convexity of polygons.
Fortunately, for random triangles the point-in-triangle problems can always be well-defined since any
triangle has its interior or region. This sheds light on solving the problem through polygon
triangulation.
To formulate appropriately the general point-in-polygon problem under randomness, and to set up
the corresponding probability model, we will first introduce a conditional probability mechanism to
accurately characterize the nature of the problem and to use it as the basis for further analysis. We will
try to address the point-in-polygon problem from the algebra-based point of view so that we can
circumvent the complications arising from polygon convexity, which have to be carefully dealt with in
any geometric method. Since any polygon, concave or convex, can be triangulated, we will also
employ the decomposition approach like that in computer graphics, to decompose the point-in-
polygon problem into several elementary point-in-triangle problems. The advantages of the above
approach are (1) it legitimizes the point-in-polygon analysis under randomness; (2) it greatly
simplifies the problem by avoiding the issues of polygon convexity, and (3) it renders a probability
estimation for the problem.
In this part of the series, we first discuss the relationship between a point and a triangle in section
2. Four quadratic forms in the joint coordinate vectors of a point and the vertices of a triangle are
introduced in order to identify whether the point is inside the triangle. The probability that the point is
inside the polygon is then computed by the identification of signs of the four quadratic forms, giving a
concise expression for the probability model. Accordingly, a triangle model and a general polygon
model for the point-in-polygon problem under ME are constructed. To substantiate the theoretical
arguments, several simulation experiments are discussed in section 3. We then conclude the paper with
a summary in section 4.
2. An Algebra-Based Probability Model for Point-in-Polygon Analysis
In general, a polygon consists of an ordered series of vertices linked by edges. An n-sided polygon
is a closed plane figure with n sides. Polygons can be classified into simple and non-simple (or
complex). A simple polygon satisfies these two conditions: (1) all adjacent edges have only a single
shared point; and (2) all non-adjacent edges do not intersect. There is a well-defined bounded interior
(surrounded by edges) and unbounded exterior for a simple polygon. Since simple polygons are basic
building blocks in vector-based GIS, the polygons we consider henceforth will be simple, unless
specified otherwise. Vertices of a simple n-sided polygon can always be arranged into a certain
ordered series $V_1^0, V_2^0, \ldots, V_n^0$ (clockwise or counter-clockwise); conversely, the polygon
can be uniquely determined by the order of the n vertices when their locations are given. Let the
polygon be denoted as $\mathrm{Poly}(V_1^0 \cdots V_n^0)$ and the region formed by its interior and boundaries be
denoted as $R(V_1^0 \cdots V_n^0)$. For convenience, the coordinates of a point are written as a column vector
throughout, and the vector is denoted in bold face, e.g. $\mathbf{x}$, $\mathbf{x}_1$. The random vector corresponding to $\mathbf{x}$ is
denoted in bold uppercase, e.g. $\mathbf{X}$, $\mathbf{X}_1$. A point V with coordinate vector $\mathbf{x}$ is denoted by $V(\mathbf{x})$,
sometimes written as $V(x_1, x_2)$.
The general point-in-polygon problem when points and polygons are both random amounts to
addressing whether an uncertain point V is in the region of an uncertain polygon $R \equiv R(V_1 V_2 \cdots V_n)$,
where $V_i$, $i = 1, 2, \ldots, n$, are the vertices of the polygon R. (For simplicity and without confusion, we
henceforth use $V_i$ for both the singular and the plural form, rather than using $V_i$’s for the plural,
and the same applies to all other relevant symbols.) Within the probability framework, it is equivalent
to computing the probability that point V is in the region of polygon R. This concept actually implies
an underlying condition, that is, a random polygon must have its region (area). If this condition
does not hold, the statement “point V is in the region of polygon R” is ill-defined. Naturally, the
probability description becomes problematic and we cannot answer the point-in-polygon query. We
know that in vector-based data, the randomness of a polygon is characterized by the randomness of
its vertices, whose coordinates are randomly generated (usually from a certain distribution). For
an underlying polygon, the number and the order of the vertices are fixed. Accordingly, it cannot be
theoretically guaranteed that any random samples or variations of the polygon ($n > 3$) will form a
region or will be simple even if the underlying polygon is simple. For example, the underlying
polygon $\mathrm{Poly}(V_1^0 \cdots V_4^0)$ shown in Fig. 2.1(a) is a simple quadrilateral. However, the
random polygon $\mathrm{Poly}(V_1 \cdots V_4)$ generated from four random points is not simple since two of its edges
intersect (the intersection is nevertheless not a vertex!). Moreover, the random point V corresponding
to the original point $V^0$ makes the point-in-polygon problem ill-defined. However, such a problem
does not exist in point-in-triangle analysis since a triangle always has its region although the relative
topological relationship among vertices may change under randomness.
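The possibility that a simple quadrilateral becomes non-simple under vertex error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: independent, isotropic normal errors with a common standard deviation `sigma`, hypothetical function names, and an arbitrary thin test quadrilateral. A quadrilateral is treated as non-simple when a pair of opposite edges cross; degenerate collinear configurations, which occur with probability zero for continuous errors, are ignored.

```python
import random


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    d1 = orient(q1, q2, p1)
    d2 = orient(q1, q2, p2)
    d3 = orient(p1, p2, q1)
    d4 = orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def quadrilateral_is_simple(v):
    """A quadrilateral V1V2V3V4 is non-simple when two opposite edges cross."""
    return not (segments_cross(v[0], v[1], v[2], v[3]) or
                segments_cross(v[1], v[2], v[3], v[0]))


def fraction_non_simple(true_vertices, sigma, trials=20000, seed=0):
    """Share of randomly perturbed quadrilaterals that are not simple."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                 for x, y in true_vertices]
        if not quadrilateral_is_simple(noisy):
            bad += 1
    return bad / trials


# A thin quadrilateral whose short sides are comparable to the error size.
thin = [(0.0, 0.0), (4.0, 0.0), (4.0, 0.3), (0.0, 0.3)]
print(fraction_non_simple(thin, sigma=0.15))
```

Even with a simple underlying polygon, a non-negligible share of realizations can fail to form a region when the error is large relative to the polygon's narrowest dimension, which is the situation the conditional probability mechanism is meant to handle.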
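Fig. 2.1 A simple polygon and its randomly generated version, a non-simple polygon: (a) a simple polygon with vertices $V_1^0, \ldots, V_4^0$ and point $V^0$; (b) a random polygon with vertices $V_1, \ldots, V_4$ and point $V$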
Therefore, in order to study the probability that a random point is inside a random polygon, the
concept that a point is inside a random polygon must be well-defined. In other words, we have to
guarantee that the region of a random polygon can be formed before the point-in-polygon problem
under randomness can be discussed. This is quite different from other cases of the point-in-polygon
analysis discussed in Leung and Yan (1997). This problem will be theoretically ill-defined without the
determination of the region.
In practice, the above case may seldom occur if the random error is sufficiently small. If we can
almost surely confirm (which may be difficult to do) that a random polygon has its region or is simple,
the probability that a random point is inside a random polygon can be viewed as an absolute probability.
It should be a conditional probability otherwise. This is thus a salient theoretical and practical issue of
fundamental importance.
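Let $\mathbf{X} \equiv (X_1, X_2)^T$ and $\mathbf{X}_i \equiv (X_{i1}, X_{i2})^T$ denote respectively the random coordinate vectors of
the point V and the vertices $V_i$, $i = 1, 2, \ldots, n$. Thus the above absolute probability can be expressed as
$$\begin{aligned}
P[V \in R] &= \int \cdots \int P\big[\,V(\mathbf{X}) \in R(V_1(\mathbf{X}_1) V_2(\mathbf{X}_2) \cdots V_n(\mathbf{X}_n)) \,\big|\, (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n) = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)\,\big] \cdot f_R(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)\, d\mathbf{x}_1\, d\mathbf{x}_2 \cdots d\mathbf{x}_n \\
&= \int \cdots \int \Big[ \iint_{V(\mathbf{x}) \in R(V_1(\mathbf{x}_1) V_2(\mathbf{x}_2) \cdots V_n(\mathbf{x}_n))} f_v(\mathbf{x})\, d\mathbf{x} \Big]\, f_R(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)\, d\mathbf{x}_1\, d\mathbf{x}_2 \cdots d\mathbf{x}_n ,
\end{aligned}$$
where $f_R$ is the joint probability density of all vertex coordinates of R, and $f_v$ is the joint probability
density of the coordinates $(X_1, X_2)$ of point V. It can be observed that the multiple integral in the
middle of the expression on the right-hand side involves the region $R(V_1(\mathbf{x}_1) V_2(\mathbf{x}_2) \cdots V_n(\mathbf{x}_n))$.
Obviously, if it is not a region, the probability cannot be computed. Even if it is computable, the computation may be
very difficult and complex because the multiple integrals involved may have complex domains.
Taking all of the above into consideration, in order to compute this probability, we need a new
representation of the probability $P[V \in R]$, and a theoretically and practically sound probability
model to solve the point-in-polygon problem under randomness.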
2.1 Point-in-triangle probability model
In the point-in-polygon problem, only the point-in-triangle problem may have an absolute
probability model. We first describe the relationship between a point and a triangle and then formulate
the corresponding probability model.
2.1.1 Identification of the relationship between a point and a triangle
In this subsection, we only focus on the establishment of a relationship between a point and a
triangle without random errors. In other words, the variables or vectors involved are not random. This
will form a basis for the point-in-polygon analysis under ME.
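We start with the relationship between a point and a triangle. Let the three vertices of the
underlying triangle $\Delta V_1V_2V_3$ be $V_i$, $i = 1, 2, 3$, let the corresponding coordinate column vectors be
$\mathbf{x}_i = (x_{i1}, x_{i2})^T \in \mathbb{R}^2$, $i = 1, 2, 3$, and let $\mathbf{x} = (x_1, x_2)^T \in \mathbb{R}^2$ be the coordinate column vector of any point V
in the plane. First, we define two basic functions $f_{\det}(\cdot\,, \cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$ as follows:
Definition 2.1 For any column vectors $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^2$, we define
$$f_{\det}(\mathbf{x}, \mathbf{y}) \equiv \det(\mathbf{x}, \mathbf{y}) = |\,\mathbf{x}\ \ \mathbf{y}\,|, \tag{2.1}$$
$$h(\mathbf{x}, \mathbf{y}, \mathbf{z}) \equiv f_{\det}(\mathbf{x} - \mathbf{y},\, \mathbf{y} - \mathbf{z}). \tag{2.2}$$
According to the properties of a determinant, it is easy to check the following properties:
(i) $f_{\det}(\mathbf{x} - \mathbf{y},\, \mathbf{y} - \mathbf{z}) = f_{\det}(\mathbf{x} - \mathbf{z},\, \mathbf{y} - \mathbf{z})$,
(ii) $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = h(\mathbf{y}, \mathbf{z}, \mathbf{x}) = h(\mathbf{z}, \mathbf{x}, \mathbf{y})$,
(iii) $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{x}, \mathbf{z}, \mathbf{y})$, $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{z}, \mathbf{y}, \mathbf{x})$, $h(\mathbf{x}, \mathbf{y}, \mathbf{z}) = -h(\mathbf{y}, \mathbf{x}, \mathbf{z})$.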
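Properties (ii) and (iii) can be checked numerically on a small example. The sketch below is a direct transcription of (2.1) and (2.2) into plain Python; the helper `sub` and the sample triangle are assumptions made for the illustration.

```python
def f_det(x, y):
    """Determinant |x y| of the 2x2 matrix with columns x and y, as in (2.1)."""
    return x[0] * y[1] - x[1] * y[0]


def sub(a, b):
    """Componentwise difference a - b of two 2-vectors."""
    return (a[0] - b[0], a[1] - b[1])


def h(x, y, z):
    """h(x, y, z) = f_det(x - y, y - z), as in (2.2)."""
    return f_det(sub(x, y), sub(y, z))


x, y, z = (0, 0), (4, 0), (0, 3)
print(h(x, y, z))              # 12: twice the area of the 3-4-5 right triangle
print(h(y, z, x), h(z, x, y))  # 12 12: cyclic permutations leave h unchanged
print(h(x, z, y))              # -12: swapping two arguments flips the sign
```

The value of h is twice the signed area of the triangle with vertices x, y, z, positive when the vertices are listed counter-clockwise.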
Lemma 2.1 A point V is located in a triangle $\Delta V_1V_2V_3$ if and only if there are three non-negative
numbers $\lambda_i$, $i = 1, 2, 3$, such that
$$\mathbf{x} = \lambda_1 \mathbf{x}_1 + \lambda_2 \mathbf{x}_2 + \lambda_3 \mathbf{x}_3, \tag{2.3}$$
$$\lambda_1 + \lambda_2 + \lambda_3 = 1, \quad \lambda_i \ge 0, \quad i = 1, 2, 3. \tag{2.4}$$
That is, the coordinate column vector of the point V is a convex combination of the coordinate column
vectors of the three vertices of the triangle $\Delta V_1V_2V_3$. (The proof is given in Appendix 1.)
The conclusion can be generalized. In general, the convex hull of a set of n points
$S_n = \{V_i = V_i(\mathbf{x}_i) : i = 1, \ldots, n\}$ can be expressed as the set of points whose coordinate column vectors
$\mathbf{x}$ satisfy
$$\mathbf{x} = \sum_{i=1}^{n} \lambda_i \mathbf{x}_i, \quad \sum_{i=1}^{n} \lambda_i = 1, \quad \lambda_i \ge 0, \quad \mathbf{x}_i = (x_{i1}, x_{i2})^T, \quad i = 1, \ldots, n.$$
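It should be noted that equation (2.3) can be expressed as
$$x_1 = \lambda_1 x_{11} + \lambda_2 x_{21} + (1 - \lambda_1 - \lambda_2)\, x_{31}, \qquad x_2 = \lambda_1 x_{12} + \lambda_2 x_{22} + (1 - \lambda_1 - \lambda_2)\, x_{32},$$
or
$$\begin{aligned}
\lambda_1 (x_{11} - x_{31}) + \lambda_2 (x_{21} - x_{31}) &= x_1 - x_{31}, \\
\lambda_1 (x_{12} - x_{32}) + \lambda_2 (x_{22} - x_{32}) &= x_2 - x_{32}.
\end{aligned} \tag{2.5}$$
This is a linear system with respect to the unknown parameters $\lambda_1$ and $\lambda_2$. Since the area of the
triangle $\Delta V_1V_2V_3$ is equal to the absolute value of the following determinant
$$\frac{1}{2} \begin{vmatrix} x_{11} & x_{12} & 1 \\ x_{21} & x_{22} & 1 \\ x_{31} & x_{32} & 1 \end{vmatrix}
= \frac{1}{2} \begin{vmatrix} x_{11} - x_{31} & x_{12} - x_{32} \\ x_{21} - x_{31} & x_{22} - x_{32} \end{vmatrix}
= \frac{1}{2} f_{\det}(\mathbf{x}_1 - \mathbf{x}_3,\, \mathbf{x}_2 - \mathbf{x}_3), \tag{2.6}$$
the determinant will not be equal to zero if the triangle is formed. Thus we can obtain the unique
solution according to Cramer’s Rule:
$$\lambda_1^* = \frac{f_{\det}(\mathbf{x} - \mathbf{x}_3,\, \mathbf{x}_2 - \mathbf{x}_3)}{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3,\, \mathbf{x}_2 - \mathbf{x}_3)}, \qquad
\lambda_2^* = \frac{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3,\, \mathbf{x} - \mathbf{x}_3)}{f_{\det}(\mathbf{x}_1 - \mathbf{x}_3,\, \mathbf{x}_2 - \mathbf{x}_3)}. \tag{2.7}$$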
Corollary 2.1 The coefficients in the expression (2.3) of Lemma 2.1 are unique whenever they
exist.
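For an arbitrary point V and a given triangle $\Delta V_1V_2V_3$, we may define the following discriminant
functions:
Definition 2.2 For a given triangle $\Delta V_1V_2V_3$ with vertex coordinate vectors $\mathbf{x}_i$, $i = 1, 2, 3$,
we define
$$h_1(\mathbf{x}) \equiv h(\mathbf{x}, \mathbf{x}_2, \mathbf{x}_3), \tag{2.8}$$
$$h_2(\mathbf{x}) \equiv h(\mathbf{x}_1, \mathbf{x}, \mathbf{x}_3), \tag{2.9}$$
$$h_3(\mathbf{x}) \equiv h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}). \tag{2.10}$$
Accordingly, by property (i), (2.7) can be rewritten as
$$\lambda_1^* = \frac{h_1(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)}, \qquad \lambda_2^* = \frac{h_2(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)}. \tag{2.11}$$
In addition, we have
$$\lambda_3^* = 1 - \lambda_1^* - \lambda_2^* = \frac{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) - h_1(\mathbf{x}) - h_2(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)} = \frac{h_3(\mathbf{x})}{h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)}.$$
In fact,
$$\begin{aligned}
h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) - h_1(\mathbf{x}) - h_2(\mathbf{x})
&= |\,\mathbf{x}_1 - \mathbf{x}_2 \ \ \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x} - \mathbf{x}_2 \ \ \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x}_1 - \mathbf{x} \ \ \mathbf{x} - \mathbf{x}_3\,| \\
&= |\,\mathbf{x}_1 - \mathbf{x} \ \ \mathbf{x}_2 - \mathbf{x}_3\,| - |\,\mathbf{x}_1 - \mathbf{x} \ \ \mathbf{x} - \mathbf{x}_3\,| = |\,\mathbf{x}_1 - \mathbf{x} \ \ \mathbf{x}_2 - \mathbf{x}\,| = h_3(\mathbf{x}).
\end{aligned}$$
Thus, in terms of Lemma 2.1 and Corollary 2.1 we have
Proposition 2.1 Let $R(V_1V_2V_3)$ be the triangle region formed by the vertices $V_i = V_i(\mathbf{x}_i)$,
$i = 1, 2, 3$. Then
“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$\lambda_i^* \ge 0$, $i = 1, 2, 3$”
⇔ “$h_i(\mathbf{x})$ and $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ have the same sign, $i = 1, 2, 3$”,
where the symbol “⇔” denotes “if and only if”.
In particular, when the vertices of the triangle $\Delta V_1V_2V_3$ are listed in counter-clockwise order,
its area (see (2.6)) is $\frac{1}{2} h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \ge 0$. In this case, we have
“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$h_i(\mathbf{x}) \ge 0$, $i = 1, 2, 3$”.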
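Proposition 2.1 translates directly into a sign test. The following sketch evaluates the discriminant functions of Definition 2.2 and the coefficients of (2.11) for error-free coordinates; the function names and the sample triangle are assumptions for illustration, and "same sign" is taken in the non-strict sense (all non-negative or all non-positive).

```python
def h(x, y, z):
    """h(x, y, z) = det(x - y, y - z): twice the signed area of triangle xyz."""
    return (x[0] - y[0]) * (y[1] - z[1]) - (x[1] - y[1]) * (y[0] - z[0])


def barycentric(x, x1, x2, x3):
    """Coefficients (lambda1*, lambda2*, lambda3*); requires h(x1, x2, x3) != 0."""
    area2 = h(x1, x2, x3)
    return (h(x, x2, x3) / area2, h(x1, x, x3) / area2, h(x1, x2, x) / area2)


def same_sign(values):
    """True if all values are non-negative or all are non-positive."""
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def point_in_triangle(x, x1, x2, x3):
    """Sign test: h_1(x), h_2(x), h_3(x) must agree in sign with h(x1, x2, x3)."""
    return same_sign([h(x, x2, x3), h(x1, x, x3), h(x1, x2, x), h(x1, x2, x3)])


x1, x2, x3 = (0, 0), (4, 0), (0, 3)
print(point_in_triangle((1, 1), x1, x2, x3))   # True
print(point_in_triangle((4, 3), x1, x2, x3))   # False
print(point_in_triangle((1, 1), x1, x3, x2))   # True: clockwise order also works
print([h((1, 1), x2, x3), h(x1, (1, 1), x3), h(x1, x2, (1, 1))])  # [5, 3, 4]
```

Because the comparison is made against the sign of $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$, the test does not depend on whether the vertices are listed clockwise or counter-clockwise.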
Remark 1. It is obvious that $h_i(\mathbf{x})$ is a linear function in the components of $\mathbf{x}$, and $h_i(\mathbf{x}) = 0$
represents a straight line through the points $V_j$ ($j \ne i$). For example, $h_3(\mathbf{x}) = 0$ represents a straight
line through the points $V_1$ and $V_2$ since $h_3(\mathbf{x}_1) = h_3(\mathbf{x}_2) = 0$. Thus $h_i(\mathbf{x}) \ge 0$ represents a half-plane.
Since each edge corresponds to such a half-plane, the point V is inside the triangle region $R(V_1V_2V_3)$ if
and only if it is inside the intersection of the three half-planes $h_i(\mathbf{x}) \ge 0$, $i = 1, 2, 3$. Fig. 2.2 gives such a
geometric interpretation.
It should be noted that the sign of $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ is fully decided by the triangle $\Delta V_1V_2V_3$ and is
independent of the point V. Once the triangle is given, the sign of $h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$ is settled. The
signs of $h_i(\mathbf{x})$ ($i = 1, 2, 3$) are then sufficient to test whether the point $V(\mathbf{x})$ is inside the triangle $\Delta V_1V_2V_3$.
Remark 2. In this paper, when we say that a set of values has the same sign, it means that all of
them are non-negative or all of them are non-positive. In such a sense, we can say that 1, 2, 0.5, and 0
have the same sign.
2.1.2 Quadratic form representations for the discriminant functions
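Now we will give quadratic form representations for the discriminant functions $h_i(\cdot)$ and
$h(\cdot\,, \cdot\,, \cdot)$. We first rewrite $f_{\det}(\cdot\,, \cdot)$ as
$$f_{\det}(\mathbf{x}, \mathbf{y}) \equiv \det(\mathbf{x}, \mathbf{y}) = \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} = x_1 y_2 - x_2 y_1 = (x_1 \ \ x_2) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \mathbf{x}^T \mathbf{H}_0\, \mathbf{y}, \tag{2.12}$$
where the matrix
$$\mathbf{H}_0 \equiv \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \tag{2.13}$$
has the properties that $\mathbf{H}_0^T = -\mathbf{H}_0$ and $\mathbf{H}_0^2 = -\mathbf{I}_2$. That is, $\mathbf{H}_0$ is a skew-symmetric and
orthogonal matrix, and it has no real eigenvalues and eigenvectors. Its geometric meaning is a 90°
rotation transformation, i.e., the vector $\mathbf{x}' \equiv \mathbf{H}_0 \mathbf{x}$ can be obtained by rotating $\mathbf{x}$ by 90° about the origin
in clockwise fashion (see Fig. 2.3) since
$$\mathbf{H}_0 \equiv \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \quad \theta = -90^\circ.$$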
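The role of $\mathbf{H}_0$ can be verified by hand on a small example. The sketch below uses plain Python lists; the helper names `matvec` and `bilinear` and the sample vectors are assumptions for the illustration.

```python
H0 = [[0.0, 1.0], [-1.0, 0.0]]


def matvec(m, v):
    """Product of a 2x2 matrix m (list of rows) with a 2-vector v."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def bilinear(x, m, y):
    """x^T m y for 2-vectors x, y and a 2x2 matrix m."""
    my = matvec(m, y)
    return x[0] * my[0] + x[1] * my[1]


x, y = [3.0, 1.0], [1.0, 2.0]
print(bilinear(x, H0, y))          # 5.0 = 3*2 - 1*1, the determinant |x y|
print(matvec(H0, x))               # [1.0, -3.0]: x rotated clockwise by 90 degrees
print(matvec(H0, matvec(H0, x)))   # [-3.0, -1.0] = -x, since H0^2 = -I
```

Applying the rotation twice reverses the vector, which is the property $\mathbf{H}_0^2 = -\mathbf{I}_2$ stated above.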
Fig. 2.2 A geometric interpretation of the h-functions: the line $h_3 = 0$ through $V_1$ and $V_2$ separates the half-planes $h_3 > 0$ and $h_3 < 0$
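We then derive the quadratic form representation for $h(\cdot\,, \cdot\,, \cdot)$ in the joint vector. For any vectors
$\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^2$, define the pooled vector and the unit coordinate vectors in $\mathbb{R}^3$ as follows:
$$\mathbf{x}_{(3)} \equiv \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \\ \mathbf{z} \end{pmatrix}_{6 \times 1}, \quad
\mathbf{e}_1 \equiv \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
\mathbf{e}_2 \equiv \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad
\mathbf{e}_3 \equiv \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
Then we have
$$\mathbf{x} = (\mathbf{e}_1^T \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)}, \quad \mathbf{y} = (\mathbf{e}_2^T \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)}, \quad \mathbf{z} = (\mathbf{e}_3^T \otimes \mathbf{I}_2)\, \mathbf{x}_{(3)},$$
where the symbol “$\otimes$” denotes the Kronecker product of matrices (see Turkington, 2002), and $\mathbf{I}_2$ is the
$2 \times 2$ identity matrix. Hence
$$\begin{aligned}
h(\mathbf{x}, \mathbf{y}, \mathbf{z}) \equiv f_{\det}(\mathbf{x} - \mathbf{y},\, \mathbf{y} - \mathbf{z}) &= (\mathbf{x} - \mathbf{y})^T \mathbf{H}_0 (\mathbf{y} - \mathbf{z}) \\
&= \mathbf{x}_{(3)}^T\, [(\mathbf{e}_1 - \mathbf{e}_2) \otimes \mathbf{I}_2]\, \mathbf{H}_0\, [(\mathbf{e}_2 - \mathbf{e}_3)^T \otimes \mathbf{I}_2]\, \mathbf{x}_{(3)} \\
&= \mathbf{x}_{(3)}^T\, \{ [(\mathbf{e}_1 - \mathbf{e}_2)(\mathbf{e}_2 - \mathbf{e}_3)^T] \otimes \mathbf{H}_0 \}\, \mathbf{x}_{(3)} \\
&= \mathbf{x}_{(3)}^T\, [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)},
\end{aligned}$$
where $\tilde{\boldsymbol{\Delta}}_0 \equiv (\mathbf{e}_1 - \mathbf{e}_2)(\mathbf{e}_2 - \mathbf{e}_3)^T$. It should be noted that
$$\boldsymbol{\Delta}_0 \equiv \tilde{\boldsymbol{\Delta}}_0 - \tilde{\boldsymbol{\Delta}}_0^T
= \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}
= (\mathbf{e}_1 \mathbf{e}_2^T - \mathbf{e}_2 \mathbf{e}_1^T) + (\mathbf{e}_2 \mathbf{e}_3^T - \mathbf{e}_3 \mathbf{e}_2^T) + (\mathbf{e}_3 \mathbf{e}_1^T - \mathbf{e}_1 \mathbf{e}_3^T). \tag{2.14}$$
Therefore,
$$\begin{aligned}
h(\mathbf{x}, \mathbf{y}, \mathbf{z}) &= \tfrac{1}{2}\, \mathbf{x}_{(3)}^T \{ [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0] + [\tilde{\boldsymbol{\Delta}}_0 \otimes \mathbf{H}_0]^T \}\, \mathbf{x}_{(3)}
= \tfrac{1}{2}\, \mathbf{x}_{(3)}^T [(\tilde{\boldsymbol{\Delta}}_0 - \tilde{\boldsymbol{\Delta}}_0^T) \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)} \\
&= \tfrac{1}{2}\, \mathbf{x}_{(3)}^T [\boldsymbol{\Delta}_0 \otimes \mathbf{H}_0]\, \mathbf{x}_{(3)}
= \tfrac{1}{2}\, \mathbf{x}_{(3)}^T\, \tilde{\mathbf{H}}_0\, \mathbf{x}_{(3)},
\end{aligned} \tag{2.15}$$
where the $6 \times 6$ matrix $\tilde{\mathbf{H}}_0 \equiv \boldsymbol{\Delta}_0 \otimes \mathbf{H}_0$ is symmetric since
$\tilde{\mathbf{H}}_0^T = \boldsymbol{\Delta}_0^T \otimes \mathbf{H}_0^T = \boldsymbol{\Delta}_0 \otimes \mathbf{H}_0 = \tilde{\mathbf{H}}_0$ ($\boldsymbol{\Delta}_0^T = -\boldsymbol{\Delta}_0$, see (2.14)).
Expression (2.15) indicates that $h(\mathbf{x}, \mathbf{y}, \mathbf{z})$ is a quadratic form in the joint vector $\mathbf{x}_{(3)}$.
Furthermore, the functions $h_i(\cdot)$ and $h(\cdot\,, \cdot\,, \cdot)$ can be expressed as quadratic forms in $\mathbf{x}_i$
($i = 1, 2, 3$) and $\mathbf{x}$. Let the $8 \times 1$ pooled vector $\mathbf{x}_{(4)}$ be defined by
$$\mathbf{x}_{(4)}^T \equiv (\mathbf{x}_1^T \ \ \mathbf{x}_2^T \ \ \mathbf{x}_3^T \ \ \mathbf{x}^T). \tag{2.16}$$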
Fig. 2.3 A geometric interpretation of 0H
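Then
$$\mathbf{x}_{(3)1} \equiv \begin{pmatrix} \mathbf{x} \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix} = \mathbf{C}_1 \mathbf{x}_{(4)}, \quad \text{where} \quad
\mathbf{C}_1 \equiv \begin{pmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_2 \\ \mathbf{0} & \mathbf{I}_2 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}_2 & \mathbf{0} \end{pmatrix}.$$
In terms of (2.15), with $\tilde{\mathbf{H}}_0 \equiv \boldsymbol{\Delta}_0 \otimes \mathbf{H}_0$ denoting the $6 \times 6$ matrix of (2.15), we have
$h_1(\mathbf{x}) = h(\mathbf{x}, \mathbf{x}_2, \mathbf{x}_3) = \tfrac{1}{2}\, \mathbf{x}_{(3)1}^T \tilde{\mathbf{H}}_0\, \mathbf{x}_{(3)1} = \tfrac{1}{2}\, \mathbf{x}_{(4)}^T \mathbf{C}_1^T \tilde{\mathbf{H}}_0 \mathbf{C}_1 \mathbf{x}_{(4)}$. It can be derived
from simple computation that $\mathbf{C}_1^T \tilde{\mathbf{H}}_0 \mathbf{C}_1 = \mathbf{H}_1$, where
$$\mathbf{H}_1 \equiv \boldsymbol{\Delta}_1 \otimes \mathbf{H}_0, \qquad
\boldsymbol{\Delta}_1 \equiv \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix}. \tag{2.17}$$
Thus we get
$$h_1(\mathbf{x}) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^T \mathbf{H}_1 \mathbf{x}_{(4)}. \tag{2.18}$$
In general, the following expressions can similarly be derived:
$$h_i(\mathbf{x}) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^T \mathbf{H}_i\, \mathbf{x}_{(4)}, \quad i = 1, 2, 3, \tag{2.19}$$
$$h(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) = \tfrac{1}{2}\, \mathbf{x}_{(4)}^T \mathbf{H}_4\, \mathbf{x}_{(4)}, \tag{2.20}$$
where
$$\mathbf{H}_i \equiv \boldsymbol{\Delta}_i \otimes \mathbf{H}_0, \quad i = 1, 2, 3, 4, \tag{2.21}$$
$$\boldsymbol{\Delta}_2 \equiv \begin{pmatrix} 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 1 & 0 \end{pmatrix}, \quad
\boldsymbol{\Delta}_3 \equiv \begin{pmatrix} 0 & 1 & 0 & -1 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 \end{pmatrix}, \quad
\boldsymbol{\Delta}_4 \equiv \begin{pmatrix} 0 & 1 & -1 & 0 \\ -1 & 0 & 1 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{2.22}$$
At the same time, we may check that $\mathbf{H}_i$ ($i = 1, 2, 3, 4$) are symmetric (see Remark 3). So all $h_i(\cdot)$ and
$h(\cdot\,, \cdot\,, \cdot)$ can be expressed as quadratic forms in the joint vector $\mathbf{x}_{(4)}$ and have a unified
expression (see (2.19) and (2.20)). From this point of view, Proposition 2.1 can be rewritten in a more
concise way as follows:
Proposition 2.2 Let $z_i \equiv \mathbf{x}_{(4)}^T \mathbf{H}_i\, \mathbf{x}_{(4)}$, where $\mathbf{x}_{(4)}$ and $\mathbf{H}_i$ are defined by (2.16) and (2.21)
respectively. Then
“$V = V(\mathbf{x}) \in R(V_1V_2V_3)$” ⇔ “$z_i$, $i = 1, 2, 3, 4$, have the same sign”.
Remark 3. The matrices $\mathbf{H}_i$ ($i = 1, 2, 3, 4$) in (2.21) are symmetric. Each of them has rank 4,
i.e., $\mathrm{rank}(\mathbf{H}_i) = 4$, and the eight eigenvalues of each matrix are $\sqrt{3}$, $\sqrt{3}$, $-\sqrt{3}$, $-\sqrt{3}$,
0, 0, 0, 0. (The proof is given in Appendix 2.)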
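The quadratic form representation can be checked against the direct definition of the h-functions. The sketch below builds $\mathbf{H}_i = \boldsymbol{\Delta}_i \otimes \mathbf{H}_0$ with a hand-written Kronecker product and compares $z_i$ with $2h_i$. The entries of the $\boldsymbol{\Delta}_i$ matrices are written out as implied by Definition 2.2 and the pooled order $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x})$ of (2.16); helper names and the sample coordinates are assumptions for the illustration.

```python
H0 = [[0, 1], [-1, 0]]

# Delta_1 .. Delta_4; rows and columns follow the pooled order (x1, x2, x3, x).
DELTAS = [
    [[0, 0, 0, 0], [0, 0, 1, -1], [0, -1, 0, 1], [0, 1, -1, 0]],
    [[0, 0, -1, 1], [0, 0, 0, 0], [1, 0, 0, -1], [-1, 0, 1, 0]],
    [[0, 1, 0, -1], [-1, 0, 0, 1], [0, 0, 0, 0], [1, -1, 0, 0]],
    [[0, 1, -1, 0], [-1, 0, 1, 0], [1, -1, 0, 0], [0, 0, 0, 0]],
]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def quad_form(m, v):
    """v^T m v for a square matrix m and a vector v."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def h(x, y, z):
    """h(x, y, z) = det(x - y, y - z)."""
    return (x[0] - y[0]) * (y[1] - z[1]) - (x[1] - y[1]) * (y[0] - z[0])


H = [kron(d, H0) for d in DELTAS]

x1, x2, x3, x = (0, 0), (4, 0), (0, 3), (1, 1)
pooled = [*x1, *x2, *x3, *x]
z = [quad_form(Hi, pooled) for Hi in H]
direct = [h(x, x2, x3), h(x1, x, x3), h(x1, x2, x), h(x1, x2, x3)]
print(z)       # [10, 6, 8, 24]
print(direct)  # [5, 3, 4, 12]
```

Each $z_i$ equals twice the corresponding h-value, so the four $z_i$ share a sign exactly when the point lies in the triangle, as stated in Proposition 2.2.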
2.1.3 Probability model for point-in-triangle analysis under measurement error
Assume that the three true vertex positions of the underlying triangle are $V_i^0(\boldsymbol{\mu}_i)$, where
$\boldsymbol{\mu}_i \equiv (\mu_{i1}, \mu_{i2})^T$ are the corresponding coordinate vectors, $i = 1, 2, 3$, and the true position of a given
point is $V^0(\boldsymbol{\mu}_v)$, where $\boldsymbol{\mu}_v \equiv (\mu_{v1}, \mu_{v2})^T$. Due to random errors, we cannot observe the true values of
these coordinates; instead, what we can observe are the coordinates of the random points $V_i(\mathbf{X}_i)$ and
$V(\mathbf{X}_v)$ with random errors:
$$\mathbf{X}_i = \boldsymbol{\mu}_i + \boldsymbol{\varepsilon}_i, \quad i = 1, 2, 3, \quad \text{and} \quad \mathbf{X}_v = \boldsymbol{\mu}_v + \boldsymbol{\varepsilon}_v, \tag{2.23}$$
where $\mathbf{X}_i \equiv (X_{i1}, X_{i2})^T$, $i = 1, 2, 3$, and $\mathbf{X}_v \equiv (X_{v1}, X_{v2})^T$, and where $\boldsymbol{\varepsilon}_i \equiv (\varepsilon_{i1}, \varepsilon_{i2})^T$ and $\boldsymbol{\varepsilon}_v \equiv (\varepsilon_{v1}, \varepsilon_{v2})^T$
are respectively the random error vectors associated with the corresponding random vertices and the
point V, with zero mean and variance-covariance matrices (for simplicity, called covariance matrices
henceforth) $\boldsymbol{\Sigma}_i$, $\boldsymbol{\Sigma}_v$, denoted as $\boldsymbol{\varepsilon}_i \sim (\mathbf{0}, \boldsymbol{\Sigma}_i)$, $i = 1, 2, 3$, and $\boldsymbol{\varepsilon}_v \sim (\mathbf{0}, \boldsymbol{\Sigma}_v)$. Since the model (2.23) can
be viewed as the direct measurement error (ME) model in Leung et al. (2003a), the random errors $\boldsymbol{\varepsilon}_i$
and $\boldsymbol{\varepsilon}_v$ can be called the MEs.
It is obvious that the triangle with vertices $V_i(\mathbf{X}_i)$ ($i = 1, 2, 3$) and the point $V(\mathbf{X}_v)$ are random
because of the randomness of ME. Therefore, whether V is inside the triangle region $R(V_1V_2V_3)$ will
be a random event. According to Proposition 2.2, the probability of such an event can be computed as:
$$P[V \in R(V_1V_2V_3)] = P[Z_i,\ i = 1, 2, 3, 4, \text{ having the same sign}], \tag{2.24}$$
where the quadratic forms $Z_i$, $i = 1, 2, 3, 4$, are random variables and the joint random vector $\mathbf{X}_{(4)}$ consists
of the coordinate vectors of $V_i$ ($i = 1, 2, 3$) and V; they are respectively defined by
$$Z_i \equiv \mathbf{X}_{(4)}^T \mathbf{H}_i\, \mathbf{X}_{(4)}, \quad i = 1, 2, 3, 4, \qquad \mathbf{X}_{(4)}^T \equiv (\mathbf{X}_1^T \ \ \mathbf{X}_2^T \ \ \mathbf{X}_3^T \ \ \mathbf{X}_v^T). \tag{2.25}$$
It should be pointed out that as the distributions of ME are usually continuous, the probability that
the determinant in (2.6) is zero will be zero, i.e., $P[h(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3) \ne 0] = 1$. We consider only the case
in which the triangle can be formed.
In addition, if we define the new random variables
$$Z_{\min} \equiv \min_{1 \le i \le 4} \{Z_i\}, \quad Z_{\max} \equiv \max_{1 \le i \le 4} \{Z_i\}, \quad Z_\Delta \equiv Z_{\min} Z_{\max}^{-1}, \tag{2.26}$$
we then have
“$Z_i$, $i = 1, 2, 3, 4$, having the same sign” ⇔ “$Z_{\min}$ and $Z_{\max}$ having the same sign”
⇔ “$Z_\Delta \equiv Z_{\min} Z_{\max}^{-1} \ge 0$”.
Thus (2.24) becomes
$$P[V \in R(V_1V_2V_3)] = P[Z_\Delta \ge 0]. \tag{2.27}$$
Remark 4. It can be checked that the random variable $Z_\Delta$ is invariant with respect to the scale of
the coordinates, i.e., if the joint vector $\mathbf{X}_{(4)}$ in (2.25) is transformed into $c\,\mathbf{X}_{(4)}$, where $c > 0$ is the
scale parameter, the value of $Z_\Delta$ is unchanged. In other words, $Z_\Delta$ is independent of the
measurement scale. Furthermore, when the point V is fixed and the triangle is random, (2.27) still
holds.
Therefore, whether we use one random variable $Z_\Delta$ or four random variables $Z_i$ ($i = 1, 2, 3, 4$),
the probability $P[V \in R(V_1V_2V_3)]$ can always be expressed in a concise way (see (2.24) and (2.27)).
This is our probability model for the point-in-triangle problem under ME.
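The probability in (2.27) can be approximated by simulation. The following Monte Carlo sketch is an illustration under stated assumptions rather than the computational procedure of this paper: the MEs are taken to be independent, isotropic normal with a common standard deviation `sigma`, the $Z_i$ are evaluated as twice the corresponding h-functions (see (2.19) and (2.20)), and all names and test coordinates are hypothetical.

```python
import random


def h(x, y, z):
    """h(x, y, z) = det(x - y, y - z)."""
    return (x[0] - y[0]) * (y[1] - z[1]) - (x[1] - y[1]) * (y[0] - z[0])


def z_values(x1, x2, x3, xv):
    """Z_1 .. Z_4 of (2.25); each equals twice the corresponding h-function."""
    return [2 * h(xv, x2, x3), 2 * h(x1, xv, x3),
            2 * h(x1, x2, xv), 2 * h(x1, x2, x3)]


def estimate_probability(mu, mu_v, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of P[Z_delta >= 0] in (2.27)."""
    rng = random.Random(seed)

    def perturb(p):
        # Add an independent N(0, sigma^2) error to each coordinate.
        return (p[0] + rng.gauss(0.0, sigma), p[1] + rng.gauss(0.0, sigma))

    hits = 0
    for _ in range(trials):
        x1, x2, x3 = (perturb(p) for p in mu)
        z = z_values(x1, x2, x3, perturb(mu_v))
        z_min, z_max = min(z), max(z)
        # Z_min / Z_max >= 0 exactly when Z_min and Z_max share a sign.
        if z_max != 0 and z_min / z_max >= 0:
            hits += 1
    return hits / trials


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(estimate_probability(triangle, (1.0, 1.0), sigma=0.1))  # well inside
print(estimate_probability(triangle, (2.0, 0.1), sigma=0.1))  # close to an edge
```

For a point well inside the triangle the estimate is close to one, while for a point near an edge it drops toward intermediate values, reflecting the uncertainty of the query.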
2.2 The algebra-based probability model for point-in-polygon analysis
Without loss of generality, we now deduce, on the basis of the triangle model discussed above, a
probability model for general simple polygons (convex or concave). Since triangles are the most
essential and simplest geometric objects that we can use to form any polygon, we will transform the
polygon problem into a multiple-triangle problem. The proposed model can circumvent the discussion
on the convexity of a polygon needed in geometric procedures.
Let the true vertices of the underlying simple n-sided polygon be $V_i^0(\boldsymbol{\mu}_i)$, and the true position of
a given point be $V^0(\boldsymbol{\mu}_v)$, where $\boldsymbol{\mu}_i \equiv (\mu_{i1}, \mu_{i2})^T$ and $\boldsymbol{\mu}_v \equiv (\mu_{v1}, \mu_{v2})^T$ are the true coordinate vectors
of the corresponding points, $i = 1, \ldots, n$. Under the effect of random errors, the polygon $R(V_1V_2 \cdots V_n)$
and the point V are random. From the theoretical and practical points of view, the approaches to solve
the ill-defined point-in-polygon ($n > 3$) problem may be as follows:
Approach (1): We discuss the point-in-polygon problem only when the random polygons can form
their regions. For random polygons without regions, there is no logical basis to discuss the problem.
Under this required condition, we can perform the point-in-polygon analysis when points and
polygons are random and compute the corresponding probability (strictly speaking, it is a conditional
probability).
Approach (2): We can discuss the point-in-polygon problem only when we can guarantee that, at a
high enough confidence level, the generated random polygons are simple or the relative topological
relationships among the vertices of the original polygon are preserved. Under this situation, it is
natural to invoke the concept of the covariance-based error bands for the boundaries of the
true polygon. By the restriction of the bands, we will be able to ensure that randomly generated
polygons form their regions at a certain confidence level.
Although the probability obtained by approach (2) is still conditional, its advantage over approach
(1) is that it will give a stricter and finer analysis. So, we employ the second approach to solve the
point-in-polygon problem.
First, we need to compute the confidence level of the covariance-based error band of a line
segment. As discussed in Part 1, it is generally difficult if not impossible to determine this probability.
With reasonable adjustment in the delimitation process, we can however consider an approximate
covariance-based error band so that its confidence level can be described. As depicted in Fig. 2.4, we
replace the region formed by the varying covariance ellipses along a line segment with the region
formed by two new line segments $A_1B_1$ and $A_2B_2$ that are tangent respectively to the two sides of the
confidence ellipses of the endpoints. The region, denoted by $\tilde{R}^{(\alpha)}_{\{1,2\}}$, surrounded by the
two tangent line segments and the two end-point confidence ellipses is thus an approximate
covariance-based error band (denoted as Cov-error band in short). Adopt the notations in Leung et al.
(2003a) and let $R_i^{(\alpha)}$, $i = 1, 2$, denote the confidence-ellipse regions of the endpoints with the
confidence level $(1 - \alpha)$. Assume the error vectors $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ corresponding to the coordinate
vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ of the endpoints are normal and independent. Then $P[V_1(\mathbf{X}_1) \in R_1^{(\alpha)}] = 1 - \alpha$ and
$P[V_2(\mathbf{X}_2) \in R_2^{(\alpha)}] = 1 - \alpha$. The random line segment L generated from $\mathbf{X}_1$ and $\mathbf{X}_2$ is the random set
$$L(\mathbf{X}_1, \mathbf{X}_2) \equiv \{ V(\mathbf{X}') : \mathbf{X}' = t\,\mathbf{X}_1 + (1 - t)\,\mathbf{X}_2,\ 0 \le t \le 1 \}.$$
Define the random events $D_1 \equiv \{\mathbf{X}_1 \in R_1^{(\alpha)}\}$, $D_2 \equiv \{\mathbf{X}_2 \in R_2^{(\alpha)}\}$, and $D_{\{1,2\}} \equiv \{L(\mathbf{X}_1, \mathbf{X}_2) \subseteq \tilde{R}^{(\alpha)}_{\{1,2\}}\}$.
Note that $D_1 D_2 \subseteq D_{\{1,2\}}$. The probability that the random line segment L is inside the region $\tilde{R}^{(\alpha)}_{\{1,2\}}$ can
thus be obtained as:
$$\begin{aligned}
P[L(\mathbf{X}_1, \mathbf{X}_2) \subseteq \tilde{R}^{(\alpha)}_{\{1,2\}}] = P[D_{\{1,2\}}]
&= P[D_{\{1,2\}} \mid D_1 D_2] \cdot P[D_1 D_2] + P[D_{\{1,2\}} \mid \overline{D_1 D_2}] \cdot P[\overline{D_1 D_2}] \\
&\ge P[D_{\{1,2\}} \mid D_1 D_2] \cdot P[D_1 D_2] = P[D_1 D_2] = P[D_1] \cdot P[D_2] \\
&= (1 - \alpha)^2 .
\end{aligned}$$
Therefore, when the confidence levels of the endpoints of a line segment are $(1-\alpha)$, the confidence
level of the resulting approximate Cov-error band is at least $(1-\alpha)^2$. The band $\tilde{R}^{(\alpha)}_{\{1,2\}}$ thus
sacrifices a little of the precision of the actual covariance-based error band in exchange for being able
to give the region a probability, the confidence level. It is not as accurate as the actual
covariance-based error band since its area is made larger. Nevertheless, it is simpler and it possesses
the following advantages:
(1) Unlike the covariance-based error band discussed in Part 1, it has a lower-bound description of
the confidence level which can be determined in advance.
(2) It is effective since it is an approximation to the covariance-based error band. In general, it is
conceptually and practically more appropriate than the conventional epsilon error band which
possesses no probability arguments.
(3) Its geometrical shape is simpler than the covariance-based error band and it can easily be
formed.
Fig. 2.4. An approximation to the covariance-based error band
(a) a covariance-based error band
(b) the approximate Cov-error band
(c) difference of bands in (a) and (b)
(4) More importantly, it gives us a unified probability description of GIS operations when
covariance-based error bands are involved. Particularly, it provides a sound basis for the point-
in-polygon analysis under ME.
However, we must stress that the above advantages can be realized only when the error vectors of
the endpoints are independent and normally distributed. The independence condition guarantees the
derivation of the lower bound of the confidence level. The normality condition determines the level
and the elliptical shapes of the confidence regions of the endpoints. Therefore, when the error vectors
of the endpoints are not independent or not normal, the approximate Cov-error band defined above
will not possess the above-mentioned properties.
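The role of the independence and normality assumptions in the bound $(1-\alpha)^2$ can be checked numerically. The following is a minimal sketch, not part of the original experiments: it assumes circular normal errors with a common variance `sigma2` (so each confidence ellipse is a disk whose squared radius is $-2\sigma^2 \ln \alpha$, from the chi-square distribution with two degrees of freedom), and the function name `endpoint_joint_coverage` is hypothetical. It estimates $P[D_1D_2]$, the probability that both endpoints fall inside their own confidence regions.

```python
import math
import random


def endpoint_joint_coverage(alpha, sigma2, trials=20000, seed=0):
    """Monte Carlo estimate of P[D1 D2] for two independent circular normal errors.

    Assumption (for illustration only): both endpoints have covariance sigma2 * I,
    so the (1 - alpha) confidence ellipse is a disk of squared radius
    -2 * sigma2 * ln(alpha).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    radius_sq = -2.0 * sigma2 * math.log(alpha)
    hits = 0
    for _ in range(trials):
        inside_both = True
        for _endpoint in range(2):
            # Error vector of one endpoint, measured from its true position.
            ex = rng.gauss(0.0, sigma)
            ey = rng.gauss(0.0, sigma)
            if ex * ex + ey * ey > radius_sq:
                inside_both = False
        hits += inside_both
    return hits / trials


if __name__ == "__main__":
    alpha = 0.10
    print("estimated P[D1 D2]:", endpoint_joint_coverage(alpha, sigma2=0.05))
    print("(1 - alpha)^2     :", (1.0 - alpha) ** 2)
```

Under these assumptions the estimate should be close to $(1-\alpha)^2$; since $D_1D_2 \subseteq D_{\{1,2\}}$, the confidence level of the approximate Cov-error band itself can only be larger.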
Extending this concept, we can define the uncertainty of the boundary of a polygon as a
collective entity (called the approximate Cov-error region) consisting of the approximate Cov-error
bands of all relevant edges under the condition that the confidence ellipses of all vertices have the
same confidence level $(1-\alpha)$. Obviously, for an $n$-sided polygon, the confidence level of such a
confidence region is at least $(1-\alpha)^n$. Although this bound may appear rather conservative, it enables us
to give a probability description of a random polygon. Thus, when the locational error vectors
corresponding to the vertices of a polygon are normal and independent, the approximate Cov-error
region of its boundary has a lower-bound specification of the confidence level. If the error variance is
not larger than the maximal allowable limit, MAL (see Leung et al. (2003a)), the approximate
Cov-error region preserves the topological structure of the original polygon, and the point-in-polygon
analysis under ME will make sense. Thus the probability that a randomly generated polygon is simple
(i.e., it is inside the approximate Cov-error region) is at least $(1-\alpha)^n$.
Remark 4. Our answers to questions (b) “Can probability be assigned to an error band?”
and (c) “What should the error band for a line segment be?” raised in Subsection 2.3 of Part 1 are
therefore in the affirmative: yes, we can compute a probability for the error band under the concept of an
approximate Cov-error band; and the approximate Cov-error band is probably an appropriate
construct that enables us to compute probability for the uncertainty about a line segment, about
polygons thus constructed, and about basic GIS operations such as point-in-polygon analysis.
Finally, under the condition that random polygons are simple, to compute the probability that a
random point $V$ is inside a random polygon $R(V_1V_2 \ldots V_n)$, we can perform the triangulation of simple
polygons. The basic idea is to first decompose the polygon into triangles, and then apply the triangle
probability model formulated in Subsection 2.1.3 to each triangle. By summing these probabilities, we
can obtain the probability for the point-in-polygon problem under ME. Triangulation of simple
polygons is a basic topic in computational geometry (Berg et al., 2000) and spatial database operations
(Rigaux et al., 2002). It can simply be done by drawing diagonals between pairs of vertices (see Fig. 2.5). A
diagonal is an open line segment that connects two vertices of the polygon and lies in the interior of
the polygon. Strictly speaking, a decomposition of a polygon into triangles by a maximal set of
non-intersecting diagonals is called a triangulation of the polygon (Berg et al., 2000). Triangulations are
usually not unique and can be suitably selected on the basis of need. The following is a basic
conclusion:
Fig. 2.5. A possible triangulation of a polygon: (a) for a concave polygon; (b) for a polygon with a hole.
Lemma 2.2 (Berg et al., 2000) Every simple polygon admits a triangulation, and any triangulation
of a simple polygon with n vertices consists of exactly n −2 triangles.
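Lemma 2.2 can be illustrated with a small triangulation routine. The sketch below uses ear clipping, which is one standard way of drawing non-intersecting diagonals; it is an assumption made here for illustration and not necessarily the procedure used by the authors. It handles simple polygons without holes only, and the function names are hypothetical. The test polygon is the concave polygon of Example 3.3.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def signed_area(poly):
    """Signed area of a polygon (positive if its vertices are counter-clockwise)."""
    n = len(poly)
    return 0.5 * sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )


def in_closed_triangle(p, a, b, c):
    """True if p lies in the closed counter-clockwise triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def triangulate(poly):
    """Ear-clipping triangulation; returns triples of vertex indices into poly."""
    idx = list(range(len(poly)))
    if signed_area(poly) < 0:
        idx.reverse()  # work in counter-clockwise order
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = poly[i], poly[j], poly[l]
            if cross(a, b, c) <= 0:
                continue  # reflex or degenerate corner: not an ear
            if any(in_closed_triangle(poly[m], a, b, c)
                   for m in idx if m not in (i, j, l)):
                continue  # another vertex blocks the diagonal
            triangles.append((i, j, l))
            del idx[k]  # clip the ear
            break
        else:
            raise ValueError("no ear found: polygon is not simple")
    triangles.append(tuple(idx))
    return triangles


if __name__ == "__main__":
    concave = [(0, 0), (1, 2), (3, -1), (2, -2), (1.6, -0.5)]
    tris = triangulate(concave)
    print(len(tris), "triangles for", len(concave), "vertices:", tris)
```

For the five-vertex concave polygon the routine returns $n - 2 = 3$ triangles, in agreement with the lemma.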
According to this lemma, the region $R(V_1^0V_2^0 \ldots V_n^0)$ of a simple polygon $\mathrm{Poly}(V_1^0V_2^0 \ldots V_n^0)$ can
be triangulated into the following general form:
$$R(V_1^0V_2^0 \ldots V_n^0) = R(V_{11}^0V_{12}^0V_{13}^0) + R(V_{21}^0V_{22}^0V_{23}^0) + \cdots + R(V_{m1}^0V_{m2}^0V_{m3}^0), \quad m = n-2,$$
where $R(V_{i1}V_{i2}V_{i3})$ are sub-regions of triangles $\Delta V_{i1}V_{i2}V_{i3}$, in which $V_{i1}, V_{i2}, V_{i3}$ are the corresponding
vertices, and the $R(V_{i1}V_{i2}V_{i3})$ are mutually exclusive. For a given approximate Cov-error region of a
polygon, a chosen triangulation of the polygon should guarantee that the confidence ellipse region of
each vertex in each partitioned triangle does not intersect the approximate Cov-error band of the edge
determined by the other two points so that the topological structures of each triangle are preserved.
Once such a partition is determined, any random polygon $\mathrm{Poly}(V_1V_2 \ldots V_n)$ falling into the approximate
Cov-error region of $\mathrm{Poly}(V_1^0V_2^0 \ldots V_n^0)$ satisfies
$$R(V_1V_2 \ldots V_n) = R(V_{11}V_{12}V_{13}) + R(V_{21}V_{22}V_{23}) + \cdots + R(V_{m1}V_{m2}V_{m3}). \qquad (2.28)$$
It is apparent from the previous derivation that the probability that (2.28) holds is still at least
$(1-\alpha)^n$. If such a partition cannot be found, the widths of the approximate Cov-error bands should be
decreased, that is, the confidence level $(1-\alpha)$ should be decreased. By adjusting $(1-\alpha)$, the required
partition can be found.
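In general, let
$$A \equiv \{\text{partition (2.28) holds}\} \qquad (2.29)$$
be a random event. By (2.27) and (2.28), the following (conditional) probability formula for the
point-in-polygon problem holds:
$$
\begin{aligned}
P[V \in R(V_1V_2 \ldots V_n) \mid A] &= P[V \in R(V_{11}V_{12}V_{13}) \mid A] + P[V \in R(V_{21}V_{22}V_{23}) \mid A] + \cdots + P[V \in R(V_{m1}V_{m2}V_{m3}) \mid A] \\
&= P[Z_{\Delta 1} \ge 0 \mid A] + P[Z_{\Delta 2} \ge 0 \mid A] + \cdots + P[Z_{\Delta m} \ge 0 \mid A], \qquad (2.30)
\end{aligned}
$$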
where 1max,min,Δ−≡ iii ZZZ is the random variable corresponding to 321 iii VVV∆ . This is the algebra-based
probability model for point-in-polygon analysis when points and polygons are random. A remarkable
feature of the model is that it is a conditional probability model. It is decided by the characteristics of
the random polygons. If we do not use the conditional probability to restrict the possibility of having
the singular behavior of random polygons, the point-in-polygon problem when points and polygons
are random will be ill-defined.
Remark 5. The conditions required by the probability model (2.30) are: (1)
random polygons are simple in order to guarantee that the point-in-polygon problem is not ill-defined;
(2) the partition (2.28) holds. These two conditions are given by the probability $P[A]$. When a
legitimate partition of $\mathrm{Poly}(V_1^0V_2^0 \ldots V_n^0)$ on the basis of its approximate Cov-error region is found,
the corresponding probability $P[A]$ is larger than $(1-\alpha)^n$, i.e., $P[A] > (1-\alpha)^n$. Furthermore,
although (2.30) is derived for a simple polygon, we can see from the above derivation that it
still holds for deformed polygons, e.g., polygons with holes, as long as (2.28) holds. Therefore, for
polygons with holes or region objects consisting of several non-connected polygons, when the
confidence ellipse region of each vertex in each partitioned triangle does not intersect the approximate
Cov-error band of the edge determined by the other two points at a certain confidence level, the
conditional probability model (2.30) holds. Such regions can occur in GIS. For example, there may
be a lake in a forest or there may be islands in an ocean.
Theoretically, the more edges a polygon has, the more triangles its triangulation contains, and
choosing a suitable triangulation of the polygon may then be difficult. From the practical point of view,
the number of triangles needed for a point-in-polygon problem can usually be reduced, since some
probabilities on the right-hand side of (2.30) may be approximately zero, i.e., the corresponding
triangles contribute very little to the left-hand side of (2.30).
As an illustration, Fig. 2.6(a) shows the approximate Cov-error regions and the partitioned
triangles of the polygon in Fig. 2.5(a). It can be observed (in Fig. 2.6(b)) that the confidence ellipse of
the point $V^0$ at confidence level $(1-\alpha)$ does not intersect the approximate Cov-error regions of
some of the triangles. Recall that “the random event $A$ in (2.29) occurs” means that random
polygons fall into the approximate Cov-error regions of the polygon and the corresponding triangles.
Therefore, under this condition, the probability that a random point $V$ is inside the random triangles in
Fig. 2.6(b) is not larger than $1-(1-\alpha)^2$ (the proof is simple and is given in Appendix 3). Accordingly, the
point-in-polygon problem for this particular case reduces approximately to the two point-in-triangle
problems shown in Fig. 2.6(c) and (d), although the true polygon has six partitioned triangles.
Fig. 2.6. A reduction of the polygon model: (a) approximate Cov-error regions of the polygon in Fig. 2.5(a);
(b) some Cov-error regions and the error ellipse do not intersect; (c) a Cov-error band and the error
ellipse intersect; (d) another Cov-error band and the error ellipse intersect.
3. Simulation Experiments
In this section, we give some simulation experiments to show the applications of the algebra-based
probability models in point-in-polygon analysis.
Example 3.1. Point-in-triangle problem. Let the three true vertices of the triangle be $V_1^0(0, 0)$,
$V_2^0(1, 2)$ and $V_3^0(3, -1)$. We consider two cases: (i) the true point $V^0(2, 1)$ is outside the triangle region
(see Fig. 3.1(a)); (ii) the true point $V^0(1.8, 0.6)$ is inside the triangle region (see Fig. 3.2(a)). Then for
case (i), we have $\mu_{(4)}^T = (0, 0, 1, 2, 3, -1, 2, 1)$, and for case (ii), we have
$\mu_{(4)}^T = (0, 0, 1, 2, 3, -1, 1.8, 0.6)$. Assume that the random error vectors $\varepsilon_i$, $i = 1, 2, 3, v$, in (2.23) are
independently and identically distributed as a circular normal distribution, where the covariance matrix
is
$$\Sigma_\sigma \equiv \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} = \sigma^2 I_2 . \qquad (3.1)$$
Based on the above, we can perform the point-in-triangle analysis by a simulation experiment.
This relation under ME becomes uncertain, and only the results of 100 simulations are respectively
shown in Fig. 3.1(b) and Fig. 3.2(b) for illustration purposes. For various values of the error
variance $\sigma^2$, we can use the point-in-triangle probability model (2.27) to compute the probability that
the random point $V$ is in the region of the random triangle $R(V_1V_2V_3)$. For each case, we actually run
the simulation experiment 10 times with a sample size of 1000 each. The mean results are listed in
Table 3.1 and plotted in Fig. 3.3. We can observe that the smaller the variance $\sigma^2$ gets, the smaller is
the uncertainty about whether the point $V$ is in the triangle. As $\sigma^2$ increases, the uncertainty increases
accordingly.
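The setup of Example 3.1 can be approximated with a short simulation. The sketch below is an illustration under stated assumptions rather than the authors' procedure: membership of the perturbed point in the perturbed triangle is decided with an ordinary orientation (sign) test instead of the quadratic-form model (2.27), errors follow the circular normal structure (3.1), and the function names are hypothetical. Individual estimates will differ from Table 3.1 because of sampling variation.

```python
import math
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """Orientation-independent test: p is inside abc if it is never on
    opposite sides of two edges (boundary counted as inside)."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)


def perturb(point, sigma, rng):
    """Add an independent circular normal error to a true location."""
    return (point[0] + rng.gauss(0.0, sigma), point[1] + rng.gauss(0.0, sigma))


def estimate_point_in_triangle(point, triangle, sigma2, trials=1000, seed=0):
    """Relative frequency with which the random point falls in the random triangle."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    hits = 0
    for _ in range(trials):
        a, b, c = (perturb(v, sigma, rng) for v in triangle)
        p = perturb(point, sigma, rng)
        hits += in_triangle(p, a, b, c)
    return hits / trials


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0)]
    for sigma2 in (0.001, 0.01, 0.05, 0.25, 1.0):
        outside = estimate_point_in_triangle((2.0, 1.0), triangle, sigma2)
        inside = estimate_point_in_triangle((1.8, 0.6), triangle, sigma2)
        print(f"sigma^2={sigma2:<6} case (i): {outside:.3f}   case (ii): {inside:.3f}")
```

With a small variance the estimates stay near 0 for case (i) and near 1 for case (ii), which is the qualitative pattern described above.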
Fig. 3.1. Case (i): the true point $V^0$ is located outside the triangle region. (a) A point is outside a triangle;
(b) random points and random triangles. (Axes: X.1, X.2.)
Fig. 3.2. Case (ii): the true point $V^0$ is located inside the triangle region. (a) A point is inside a triangle;
(b) random points and random triangles. (Axes: X.1, X.2.)
Fig. 3.3. The mean estimates of probability with respect to changes in the error variance (horizontal axis:
square of sigma, from 0.05 to 1.55; vertical axis: probability, from 0.0 to 1.0; curves for case (i) and case (ii)).
Table 3.1. Probability estimates for different values of the error variance in Example 3.1*
$\sigma^2$        0.001   0.005   0.01    0.05    0.25    0.50    0.75    1.00    1.25    1.50
(i)  mean         0.0000  0.0010  0.0109  0.1524  0.3068  0.2830  0.2478  0.2331  0.1990  0.1902
(i)  std. dev.    0.0000  0.0013  0.0023  0.0127  0.0085  0.0158  0.0088  0.0126  0.0110  0.0114
(ii) mean         0.9977  0.8960  0.8144  0.6479  0.5242  0.4011  0.3159  0.2625  0.2342  0.2075
(ii) std. dev.    0.0013  0.0101  0.0105  0.0154  0.0116  0.0144  0.0126  0.0125  0.0146  0.0191
* For each case, the first line gives the mean estimated probability and the second line the corresponding standard deviation.
Example 3.2 (continuation of Example 3.1). We examine the probability estimation in the
point-in-triangle problem under case (i) in which the structures of the error covariance matrices for the four
points are made different. The distributions of the error vectors $\varepsilon_i$ of all points are independent and
normal for each case. Define
$$\Sigma_{\sigma_1,\sigma_2,\rho} \equiv \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad (3.2)$$
and choose $\Sigma_\sigma$ in (3.1) with $\sigma^2 = 0.05$ and $\Sigma_{\sigma_1,\sigma_2,\rho}$ in (3.2) with $\rho = 0.6$, $\sigma_1 = 0.1$ and $\sigma_2 = 0.3$. As
an illustration, for the triangle $\Delta V_1^0V_2^0V_3^0$ and the point $V^0$, we carried out the simulation experiments
with respect to case (i) in Example 3.1 only. The five cases of this example are listed in the first
column coded under the common symbol “1,2,3,V” in Table 3.2, where the first three alphanumeric
characters “1,2,3” denote the corresponding vertices $V_1$, $V_2$, and $V_3$ of the triangle and the fourth
character V denotes the point. For each case, we run the simulation experiment 10 times with a sample
size of 10000 in each run. The point-in-triangle probability model (2.27) is used again. Simulation
results in each run, together with the mean of the 10 runs, are tabulated in Table 3.2.
Table 3.2. Probability estimates for different error structures of points in Example 3.2*
Case    1 2 3 V   Sample estimate (10 runs)                                                Mean
(i)     + + + +   0.1652 0.1564 0.1498 0.1615 0.1632 0.1548 0.1516 0.1529 0.1670 0.1635   0.1586
(ii)    + + + −   0.1648 0.1646 0.1540 0.1567 0.1601 0.1621 0.1572 0.1607 0.1577 0.1594   0.1597
(iii)   + + − −   0.1622 0.1615 0.1664 0.1602 0.1636 0.1669 0.1640 0.1594 0.1658 0.1666   0.1637
(iv)    + − − −   0.1498 0.1595 0.1574 0.1582 0.1625 0.1587 0.1564 0.1504 0.1581 0.1540   0.1565
(v)     − − − −   0.1586 0.1572 0.1517 0.1620 0.1576 0.1565 0.1586 0.1650 0.1575 0.1590   0.1584
* The symbol “+” denotes circular errors as in (3.1) and the symbol “−” denotes elliptical errors as in (3.2).
It can be observed that the estimation results differ very little between cases (i) and (ii). This shows
that using two different error structures, (3.1) and (3.2), for the random point can remarkably affect,
in each run, the probability that the random point $V$ is inside the random triangle with the
same error covariance matrices for its vertex coordinates. Averaging the 10 runs, the difference of the
mean estimates of the two cases is still noticeable: 0.1597 − 0.1586 = 0.0011. Once the error covariance
matrix $\Sigma_3$ of a vertex ($V_3$) is changed and the others kept unchanged (i.e., case (iii) versus case (ii)),
the effect of the change in error covariance structure of the vertex becomes larger (as indicated in the
rows corresponding to cases (ii) and (iii) in Table 3.2), with the difference of the means: 0.1637 − 0.1597
= 0.0040. From case (iii), by making one more change in the error covariance matrix $\Sigma_2$ of the vertex
$V_2$ (i.e., case (iv)), the effect is even more apparent. The difference of the mean estimates, 0.1637 −
0.1565 = 0.0072, is indeed the largest. Comparing the results obtained from cases (iv) and (v), however,
the change in the error covariance $\Sigma_1$ of point $V_1$ has very little effect on the probability
estimate. This is natural since $V_1^0$ is far away from point $V^0$. Its effect should be small whether or
not its error covariance matrix varies. One last observation: when the errors of the four points are
homogeneous (such as cases (i) and (v)), the means of the probability estimates are almost the same.
Even though the difference in terms of mean probabilities may not seem to be outstanding, it should
be noted that by taking the mean value, we actually reduce the randomness among individual
samples, which may be large. This example shows the applicability of the point-in-triangle probability
model under different error structures, and confirms that the model can convey the effect of such
differences. Fig. 3.4 depicts the covariance-based error band and 100 simulation results for case (v).
Fig. 3.4. The Cov-error band and simulation results when covariance matrices are (3.2): (a) Cov-error band
with level 0.9; (b) 100 simulated random triangles. (Axes: X.1, X.2.)
Example 3.3. Point-in-polygon problem. Consider two cases: (i) the true point $V^0(1.5, -1)$ is
outside the region of a concave polygon (see Fig. 3.5(a)); (ii) the true point $V^0(-0.4, -1.8)$ is outside
the region of a convex polygon (see Fig. 3.5(b)). The vertices of the concave polygon are $V_1^0(0, 0)$,
$V_2^0(1, 2)$, $V_3^0(3, -1)$, $V_4^0(2, -2)$, and $V_5^0(1.6, -0.5)$, and the vertices of the convex polygon are the
same except for $V_5^0$, which becomes $V_5^0(-0.2, -1.5)$. Using the polygon probability model (2.30),
we can estimate the probability that the random point $V$ is in the region of a random polygon.
For simplicity, we choose the error structure (3.1) as the error covariance matrices of all vertices
and the point $V^0$. First, we can give the empirical values of the maximal allowable limits (MAL) $\overset{\frown}{\sigma}^2$,
discussed in Part 1, for vertices on the polygons with different confidence levels $1-\alpha$. Since only $\sigma^2$
varies under the condition that the regions are given, by adjusting $\sigma^2$ the corresponding MAL values
can be obtained (Table 3.3). It can be observed that for the concave and convex polygons in this
example, the MAL $\overset{\frown}{\sigma}^2$ decreases as the confidence level increases, and the MALs of the convex
polygon are much larger than those of the concave polygon at the same confidence level. Moreover, if
the confidence region of the boundaries of the concave polygon is determined by the Cov-error band,
the MAL values are all larger than those obtained with the approximate Cov-error band (see Fig. 3.5(c) and (e)).
However, for the convex polygon, a similar case does not occur since its MALs are completely
determined by the confidence regions of the vertices and there is more room for adjusting the
Cov-error band (as evident in Fig. 3.5(d) and (f)). Furthermore, after the triangulation has been made, the
MALs corresponding to these triangulations usually need to be further adjusted in order to guarantee
that the corresponding confidence region of each vertex in each partitioned triangle does not intersect
the confidence band of the opposite edge. For the concave polygon in this example, there is no such
problem. For the convex polygon, the new MALs (see Fig. 3.6(a) and (b)) with respect to the
approximate Cov-error band and the Cov-error band become smaller than the MALs with respect to
the polygon. That is, under the condition that $1-\alpha$ is invariant, the widths of the corresponding error
bands become smaller. A comparison can be observed from Fig. 3.6 and Fig. 3.5(d) and (f). The
MAL values listed in Table 3.3 for the convex polygon are the adjusted results.
Fig. 3.5. A comparison of MAL in concave and convex polygons ($1-\alpha = 0.90$): (a) the point is outside the
concave polygon; (b) the point is outside the convex polygon; (c), (d) approximate Cov-error band and MAL;
(e), (f) covariance-based error band and MAL. (Axes: X.1, X.2.)
Fig. 3.6. Confidence regions of triangulations for the convex polygon ($1-\alpha = 0.90$): (a) confidence region
using approximate Cov-error bands; (b) confidence region using covariance-based error bands. (Axes: X.1, X.2.)
In Table 3.3, it can be observed that the larger the confidence level $1-\alpha$, the smaller the
MAL $\overset{\frown}{\sigma}^2$. Since the confidence region with a larger confidence level is wider, there is much less room
to change the error variance before non-adjacent error bands intersect. Consequently, the
corresponding MAL $\overset{\frown}{\sigma}^2$ is smaller.
Table 3.3. MAL values for vertices of the corresponding polygons with different confidence levels*
                                   Concave polygon                       Convex polygon
$1-\alpha$                         0.80   0.85   0.90   0.95   0.99      0.80   0.85   0.90   0.95   0.99
$\overset{\frown}{\sigma}^2$ (1)   0.064  0.05   0.042  0.032  0.021     0.158  0.134  0.108  0.084  0.054
$\overset{\frown}{\sigma}^2$ (2)   0.084  0.075  0.054  0.044  0.028     0.158  0.134  0.108  0.084  0.054
* (1) when using the approximate Cov-error band and (2) when using the covariance-based error band
To use the model (2.30), we choose $1-\alpha = 0.90$. Thus the probability $P[A]$ of the condition underlying the
following results is at least $(1-\alpha)^5 = 0.59$. It should be noted that this is a conservative lower bound
and not the true value, which may indeed be much larger. Of course, to boost the probability, we can
choose a larger $1-\alpha$, but from Table 3.3 the MAL will decrease accordingly and the error variances
may not be within the range bounded by the MAL. For example, if the $\sigma^2$ of concern is 0.04 in case (1)
and we choose $1-\alpha = 0.99$, this $\sigma^2$ is then larger than the corresponding MAL $\overset{\frown}{\sigma}^2 = 0.021$. Thus the
validity of the obtained probability estimates cannot be guaranteed. We must then make a
trade-off. This shows the practical value of MAL.
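The trade-off described above can be made concrete with a small lookup. The sketch below simply transcribes row (1) of Table 3.3 (approximate Cov-error band) and reports, for a chosen confidence level, the lower bound $(1-\alpha)^n$ on $P[A]$ together with whether a given $\sigma^2$ stays within the MAL. The dictionary and function names are hypothetical and serve only as an illustration.

```python
# MAL values transcribed from Table 3.3, row (1): approximate Cov-error band.
MAL_APPROX_BAND = {
    "concave": {0.80: 0.064, 0.85: 0.05, 0.90: 0.042, 0.95: 0.032, 0.99: 0.021},
    "convex": {0.80: 0.158, 0.85: 0.134, 0.90: 0.108, 0.95: 0.084, 0.99: 0.054},
}


def check_confidence_choice(polygon, confidence, sigma2, n_vertices=5):
    """Report the lower bound on P[A] and whether sigma2 respects the MAL.

    polygon    -- "concave" or "convex" (the two polygons of Example 3.3)
    confidence -- the level 1 - alpha, one of the tabulated values
    sigma2     -- error variance of the vertices and of the point
    """
    mal = MAL_APPROX_BAND[polygon][confidence]
    return {
        "lower_bound_P_A": confidence ** n_vertices,
        "mal": mal,
        "within_mal": sigma2 <= mal,
    }


if __name__ == "__main__":
    for level in (0.90, 0.99):
        print(level, check_confidence_choice("concave", level, sigma2=0.04))
```

For $\sigma^2 = 0.04$ on the concave polygon, the level 0.90 keeps the variance within the tabulated MAL while the level 0.99 does not, which is the situation discussed in the text.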
For the concave polygon, we first partition it into three triangles: ∆V1V2V5, ∆V2V3V5, and ∆V3V4V5.
The probability estimations are then made for each triangle by the triangle probability model (2.24)
with a sample size of 1000. The results are tabulated in Table 3.4, where the ‘Sum’ column is the
result obtained by applying (2.30), and the ‘Whole’ column is the result obtained by simulating
directly (i.e., simulating the whole polygon) the probability P[V∈R(V1V2V3V4V5)|A].
After the approximate Cov-error regions of the polygons in cases (1) and (2) with confidence level
0.90 are respectively determined, the corresponding triangulations are selected as shown in Fig. 3.5(a),
(b). For the concave polygon and the triangulation, since the confidence ellipse region of each vertex
in each partitioned triangle does not intersect the approximate Cov-error band of the opposite edge, the
probability model (2.30) can be applied and $P[A] > 0.59$. From Table 3.4, we can find that the result,
under “Sum”, obtained by applying (2.30) is almost the same as the one, under “Whole”, obtained by
directly simulating the polygon as a whole. When $\sigma^2$ is not larger than the MAL $\overset{\frown}{\sigma}^2$, the difference between
the results of the last two columns is very small, and may well be due to randomness. When
$\sigma^2 = 0.06$, the difference is bigger but is still relatively small. This shows that $\overset{\frown}{\sigma}^2$ is conservative.
However, when $\sigma^2 = 0.07$, $2\overset{\frown}{\sigma}^2$, $3\overset{\frown}{\sigma}^2$, the difference becomes bigger and bigger. This is due to
$\sigma^2$ being much larger than the MAL, which changes the topological properties of the
random polygons. Therefore, the introduction of the MAL, $\overset{\frown}{\sigma}^2$, is necessary in order to make the
corresponding probability legitimate.
Table 3.4. Probability estimates for the point-in-polygon example in Fig. 3.5(a)
$\sigma^2$                                       Triangles                Sum     Whole
                                                 1.2.5   2.3.5   3.4.5
0.001                                            0.000   0.000   0.000   0.000   0.000
0.005                                            0.000   0.000   0.008   0.008   0.005
0.010                                            0.000   0.000   0.033   0.033   0.035
0.020                                            0.000   0.002   0.104   0.106   0.106
0.030                                            0.004   0.006   0.141   0.151   0.152
0.040 (= $\overset{\frown}{\sigma}^2$ (1))       0.007   0.024   0.154   0.185   0.195
0.050 (= $\overset{\frown}{\sigma}^2$ (2))       0.019   0.031   0.179   0.229   0.230
0.060                                            0.016   0.042   0.195   0.253   0.257
0.070                                            0.018   0.060   0.186   0.264   0.279
0.080 (= 2$\overset{\frown}{\sigma}^2$ (1))      0.023   0.059   0.190   0.272   0.296
0.120 (= 3$\overset{\frown}{\sigma}^2$ (1))      0.049   0.082   0.187   0.318   0.346
Without further elaboration, a similar experiment can be performed for the convex polygon and
the results can likewise be interpreted.
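The comparison between the “Sum” and “Whole” columns of Table 3.4 can be imitated as follows. This is a rough sketch under assumptions that differ from the original experiment: the triangle test is an orientation (sign) test rather than the quadratic-form model, the polygon test is even-odd ray casting, the conditioning on the event $A$ is ignored, and all names are hypothetical. The fan of triangles is the partition $\Delta V_1V_2V_5$, $\Delta V_2V_3V_5$, $\Delta V_3V_4V_5$ of the concave polygon.

```python
import math
import random

CONCAVE = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0), (2.0, -2.0), (1.6, -0.5)]
FAN = [(0, 1, 4), (1, 2, 4), (2, 3, 4)]  # triangles V1V2V5, V2V3V5, V3V4V5


def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """Orientation-independent point-in-triangle test."""
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (min(d) < 0 and max(d) > 0)


def in_polygon(p, poly):
    """Even-odd ray casting test for a polygon given as a vertex list."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def sum_vs_whole(point, sigma2, trials=4000, seed=0):
    """Return (per-triangle frequencies, their sum, whole-polygon frequency)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)

    def jitter(q):
        return (q[0] + rng.gauss(0.0, sigma), q[1] + rng.gauss(0.0, sigma))

    tri_hits = [0] * len(FAN)
    whole_hits = 0
    for _ in range(trials):
        poly = [jitter(v) for v in CONCAVE]
        p = jitter(point)
        for k, (i, j, l) in enumerate(FAN):
            tri_hits[k] += in_triangle(p, poly[i], poly[j], poly[l])
        whole_hits += in_polygon(p, poly)
    per_triangle = [h / trials for h in tri_hits]
    return per_triangle, sum(per_triangle), whole_hits / trials


if __name__ == "__main__":
    for sigma2 in (0.001, 0.01, 0.04, 0.12):
        per_triangle, total, whole = sum_vs_whole((1.5, -1.0), sigma2)
        print(sigma2, [round(f, 3) for f in per_triangle], round(total, 3), round(whole, 3))
```

Because both quantities are computed on the same random samples here, they agree whenever the perturbed polygon stays simple and the fan remains a valid partition; discrepancies can appear only once $\sigma^2$ is large enough to change the topology.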
4. Conclusion
We have argued in this part of the series that point-in-polygon analysis under ME, i.e., when points
and polygons are random, is in fact an ill-defined problem. The legitimacy of the problem hinges
on whether the polygon concerned is simple and forms a region. To solve the ill-defined problem, we
have made use of the MAL and introduced the concept of an approximate covariance-based error band
from which we can render a probability description of the problem (a conditional probability, to be
precise). This approach clarifies the issues surrounding point-in-polygon analysis and enables a more
accurate investigation that forms a basis for further study. Analytically, we adopt the idea of polygon
triangulation and decompose the point-in-polygon problem into multiple point-in-triangle problems
whose solutions are obtained by the proposed algebra-based probability model. Instead of tackling the
problem with geometry, which would involve a complex discussion of polygon convexity and
nonconvexity, we have proposed to use four concise quadratic form variables, which have a unified
structure, to compute the probability that a random point is inside a random polygon. The validity and
effectiveness of the proposed model have also been demonstrated with simulation experiments. An
outstanding problem for further study is to obtain the analytical distribution of these quadratic form
variables under a certain condition. This will make the approach theoretically more complete.
In Part 3 of the present series, we will employ the quadratic forms to analyze the intersections and
polygon-on-polygon problems. It will be demonstrated that the quadratic forms serve as a unified basis
to solve the point-in-polygon, line-in-polygon and polygon-on-polygon problems under ME.
Appendix 1 Proof of Lemma 2.1
Assume that there are three non-negative numbers such that equations (2.3) and (2.4) hold. If
$\lambda_1 = 1$, then $x = x_1$. Consequently, the point $V$ is located in the triangle $\Delta V_1V_2V_3$. If $\lambda_1 \ne 1$, then
$1 - \lambda_1 > 0$. Thus equation (2.3) can be expressed as
$$x = \lambda_1 x_1 + (1-\lambda_1)\left(\frac{\lambda_2}{1-\lambda_1}x_2 + \frac{\lambda_3}{1-\lambda_1}x_3\right) \equiv \lambda_1 x_1 + (1-\lambda_1)x^*, \qquad (A.1)$$
where $x^* = \frac{\lambda_2}{1-\lambda_1}x_2 + \frac{\lambda_3}{1-\lambda_1}x_3$.
Obviously, the point $V^* = V^*(x^*)$ is located on the edge $V_2V_3$ of the triangle $\Delta V_1V_2V_3$ and thus it
is in the triangle. According to (A.1), the point $V$ is located on the line segment $V_1V^*$ and of course it
belongs to the region of the triangle $\Delta V_1V_2V_3$. Conversely, if the point $V$ is located in the interior of the triangle
$\Delta V_1V_2V_3$, then the line segment from $V_1$ passing through $V$ will intersect the edge $V_2V_3$ at the point
$V^*$. We have
$$
\begin{aligned}
x &= \lambda_1 x_1 + (1-\lambda_1)x^* = \lambda_1 x_1 + (1-\lambda_1)[\gamma_2 x_2 + (1-\gamma_2)x_3] \\
&= \lambda_1 x_1 + [(1-\lambda_1)\gamma_2]x_2 + [(1-\lambda_1)(1-\gamma_2)]x_3 \equiv \lambda_1' x_1 + \lambda_2' x_2 + \lambda_3' x_3 .
\end{aligned}
$$
It is easy to check that $\lambda_1'$, $\lambda_2'$, and $\lambda_3'$ satisfy the specific conditions. Therefore, equation (2.3)
holds. □
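The statement proved above can be illustrated numerically. The sketch below assumes that (2.3) and (2.4) are the usual convex-combination conditions $x = \lambda_1x_1 + \lambda_2x_2 + \lambda_3x_3$ with $\lambda_1 + \lambda_2 + \lambda_3 = 1$ (these equations are not reproduced in this part of the text), and the function name `barycentric` is hypothetical. It computes the coefficients for the two test points of Example 3.1.

```python
def barycentric(p, a, b, c):
    """Coefficients (l1, l2, l3) with p = l1*a + l2*b + l3*c and l1 + l2 + l3 = 1."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    l2 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    l3 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - l2 - l3, l2, l3


if __name__ == "__main__":
    v1, v2, v3 = (0.0, 0.0), (1.0, 2.0), (3.0, -1.0)
    # Case (ii) of Example 3.1: all coefficients are non-negative (point inside).
    print(barycentric((1.8, 0.6), v1, v2, v3))
    # Case (i) of Example 3.1: one coefficient is negative (point outside).
    print(barycentric((2.0, 1.0), v1, v2, v3))
```

A point lies in the closed triangle exactly when all three coefficients are non-negative, which is the criterion the proof establishes.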
Appendix 2 Proof of Remark 3
Note that $\Delta_i^T = -\Delta_i$ and $H_0^T = -H_0$. According to the formula $(A \otimes B)^T = A^T \otimes B^T$, we have
$H_i^T \equiv (\Delta_i \otimes H_0)^T = (-\Delta_i) \otimes (-H_0) = H_i$. It is easy to show that $\mathrm{rank}(\Delta_i) = 2$ and $\mathrm{rank}(H_0) = 2$; thus
we have $\mathrm{rank}(H_i) = \mathrm{rank}(\Delta_i)\,\mathrm{rank}(H_0) = 4$. Finally, as the eigenvalues of $H_0$ are $i$, $-i$, and the
eigenvalues of $\Delta_i$ are $\frac{\sqrt{3}}{2}i$, $-\frac{\sqrt{3}}{2}i$, 0, 0, where $i$ is the imaginary unit, their products are the
eigenvalues of $H_i \equiv \Delta_i \otimes H_0$. □
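Appendix 3 Proof of a probability inequality in the discussion of (2.30)
Let $R_0^{(\alpha)}$ be the confidence ellipse of a point $V^0$ with confidence level $(1-\alpha)$ and $R^-$ be the
union of the approximate Cov-error regions (with the same confidence level) of all partitioned triangles
which do not intersect $R_0^{(\alpha)}$. Thus, $R_0^{(\alpha)} \cap R^- = \emptyset$ (an empty set). For an independent random
point $V$ and random triangle $\Delta V_{i1}V_{i2}V_{i3}$, define the random events
$A \equiv \{V \in R_0^{(\alpha)}\}$, $B \equiv \{\Delta V_{i1}V_{i2}V_{i3} \subseteq R^-\}$, $C \equiv \{V \in \Delta V_{i1}V_{i2}V_{i3}\}$.
Then $P[\bar{A}] = \alpha$, $P[\bar{B}] = \alpha$, $P[C \mid AB] = 0$, and
$$
\begin{aligned}
P[V \in \Delta V_{i1}V_{i2}V_{i3}] = P[C] &= P[C \mid AB] \cdot P[AB] + P[C \mid \overline{AB}] \cdot P[\overline{AB}] \\
&\le P[\overline{AB}] = 1 - P[AB] = 1 - P[A] \cdot P[B] = 1 - (1-\alpha)^2 . \quad \Box
\end{aligned}
$$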
References
Berg, M. de, M. van Kreveld, M. Overmars, and O. Schwarzkopf. 2000. Computational Geometry: Algorithms and Applications (2nd ed.), Berlin: Springer-Verlag.
Blakemore, M. 1984. Generalization and error in spatial data bases, Cartographica, 21, 131-139.
Bolstad, P.V., P. Gessler, and T.M. Lillesand. 1990. Positional uncertainty in manually digitized map object, Int. J. Geographical Information Systems, 4(4), 399-412.
Egenhofer, M.J. and R.D. Franzosa. 1991. Point-set topological spatial relations, International Journal of Geographical Information Systems, 5, 161-174.
Ehlschlaeger, C.R. 2002. Representing multiple spatial statistics in generalized elevation uncertainty models: moving beyond the variogram, Int. J. Geographical Information Systems, 16(3), 259-285.
Haines, E. 1994. Point in Polygon Strategies. In Graphics Gems IV, ed. Paul Heckbert, Boston: Academic Press, p. 24-46.
Heuvelink, G.B.M. 1998. Error Propagation in Environmental Modelling with GIS, London: Taylor & Francis.
Goodchild, M.F. and S. Gopal (Eds). 1989. Accuracy of Spatial Databases, London: Taylor & Francis.
Goodchild, M.F., B.O. Parks, and L.T. Steyaert (Eds). 1993. Environmental Modelling with GIS, New York: Oxford University Press.
Leung, Y., and J.P. Yan. 1998. A locational error model for spatial features. Int. J. Geographical Information Science, 12, 607-620.
Leung, Y., and J.P. Yan. 1997. Point-in-polygon analysis under certainty and uncertainty. GeoInformatica, 1, 93-114.
Leung, Y., J.H. Ma, and M.F. Goodchild. 2003a. A general framework for error analysis in measurement-based GIS---Part 1: the basic measurement-error model and related concepts. (unpublished paper)
Leung, Y., J.H. Ma, and M.F. Goodchild. 2003b. A general framework for error analysis in measurement-based GIS---Part 2: the algebraic-based probability model for point-in-polygon analysis. (unpublished paper)
Leung, Y., J.H. Ma, and M.F. Goodchild. 2003c. A general framework for error analysis in measurement-based GIS---Part 3: error analysis for intersections and overlays. (unpublished paper)
Leung, Y., J.H. Ma, and M.F. Goodchild. 2003d. A general framework for error analysis in measurement-based GIS---Part 4: error analysis for length and area measurements. (unpublished paper)
Rigaux, P., M. Scholl, and A. Voisard. 2002. Spatial Databases with Application to GIS. San Francisco: Morgan Kaufmann Publishers.
Stanfel, L.E., M. Conerly, and C. Stanfel. 1995. Reliability of polygonal boundary of land parcel. Journal of Surveying Engineering, 121(4), 163-176.
Turkington, D.A. 2002. Matrix Calculus and Zero-One Matrices. Cambridge, UK: Cambridge University Press.