Components of Spatial Data Quality in GIS

Post on 13-Jul-2015


Lecture 5, Wednesday 17th September 2014

DEPARTMENT OF GEOGRAPHY AND ENVIRONMENT

UNIVERSITY OF DHAKA

According to the NCDCDS (US National Committee for Digital Cartographic Data Standards), geographic data quality has five dimensions (1 to 5 below). The ICA (International Cartographic Association) proposed two more (6 and 7).

1. Lineage of Geographic data

2. Positional Accuracy of Geographic data

3. Attribute Accuracy of Geographic data

4. Logical consistency

5. Completeness of Geographic data

6. Temporal accuracy

7. Semantic accuracy

1. Lineage of Geographic data

Lineage refers to the sources of materials from which a specific set of geographic data was derived.

Lineage answers the following questions for a user about the data:

1. Who collected data?

2. When were the data collected?

3. How were the data collected?

4. How were the data converted?

5. What algorithms were used to process the data?

6. What was the precision of computation?

2. Positional Accuracy of Geographic data

Positional accuracy is the “closeness” of coordinate values to the “true” positions of features in the real world.

Generally, maps are accurate to roughly one line width, or 0.5 mm at map scale. This is known as the minimum mapping unit. A 0.5 mm resolution is equivalent to 5 m on the ground on a 1:10000 scale map and 125 m on a 1:250000 scale map.

Positional accuracy of data can be measured in two ways:

1. Planimetric accuracy

2. Height accuracy

Scale          Effective resolution (m)
1:2500         1.25
1:10000        5
1:24000        12
1:50000        25
1:100000       50
1:250000       125
1:500000       250
1:1000000      500
1:10000000     5000
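The table follows directly from the 0.5 mm rule of thumb: multiply the line width by the scale denominator and convert millimetres to metres. A minimal Python sketch, assuming a fixed 0.5 mm line width (the function name `effective_resolution_m` is illustrative, not from any library):

```python
# Rule of thumb from the text: a map is accurate to about one line width,
# taken here as 0.5 mm on the printed map (an assumption you can change).
LINE_WIDTH_MM = 0.5


def effective_resolution_m(scale_denominator, line_width_mm=LINE_WIDTH_MM):
    """Ground distance (m) covered by one line width on a 1:scale_denominator map."""
    # Map distance (mm) times the scale denominator gives ground mm; /1000 gives metres.
    return line_width_mm * scale_denominator / 1000.0


for denom in (2500, 10000, 24000, 50000, 100000, 250000, 500000, 1000000, 10000000):
    print(f"1:{denom}  ->  {effective_resolution_m(denom):g} m")
```

Each printed value should match the corresponding row of the table above.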

3. Attribute Accuracy of Geographic data

Attribute accuracy is defined as the “closeness” of the descriptive data in the geographic database to the true or assumed values of the real-world features they represent.

Attribute accuracy is measured in different ways depending on the type of attribute:

For metric attributes (e.g. elevations in a DEM or TIN), accuracy can simply be expressed as measurement error.

For categorical attributes (e.g. a land use classification), accuracy is much more difficult to measure. In such cases, attribute accuracy is usually evaluated in terms of other factors, such as:

1. The classification scheme

2. The amount of gross error

3. The degree of heterogeneity of the polygons
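For metric attributes, one common way to summarise measurement error over a set of check points is the root mean square error (RMSE). The sketch below is illustrative only: the function name and the DEM and survey values are hypothetical, and RMSE is one choice of error statistic, not the only one.

```python
import math


def rmse(measured, reference):
    """Root mean square error between measured values and reference values."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - r) ** 2 for m, r in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check points: DEM elevations vs. surveyed elevations (metres).
dem_values = [12.4, 15.1, 9.8, 20.3]
surveyed = [12.0, 15.5, 10.0, 20.0]
print(f"RMSE = {rmse(dem_values, surveyed):.2f} m")
```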

Error Matrix

The error matrix is a square array of values, denoted C, that cross-tabulates the number of sample spatial data units assigned to a particular category against the actual category as verified by the reference data.

It is constructed to show the frequency of discrepancies between the encoded values and the corresponding reference values of the sample.

In the error matrix, the rows represent the categories of the database classification produced by the user.

The columns represent the classification of the reference data, obtained from source data or field visits.

Diagonal elements represent correctly classified spatial data.

Off-diagonal elements represent the frequencies of misclassification of the various categories.

If all the non-zero entries of an error matrix lie on the diagonal, no misclassification has occurred at the sample locations and the overall accuracy is 100%.

When misclassifications occur, they are termed either errors of commission (errors of inclusion), which are assessed through user's accuracy, or errors of omission (errors of exclusion), which are assessed through producer's accuracy.

Overall Accuracy

Overall accuracy (OA) is computed by dividing the total number of correctly classified pixels by the total number of reference pixels.

The maximum value of the overall accuracy is 100%, when there is perfect agreement between the database and the reference data. The minimum value is 0%.

OA is also termed PCC (Percent Correctly Classified) and can be computed with the following equation:

$$\mathrm{PCC} = \mathrm{OA} = \frac{S_d}{n} \times 100\%$$

where

$S_d$ = sum of the values along the diagonal

$n$ = total number of sample locations

Example error matrix (rows: sample data; columns: reference data)

Sample data      Exposed soil  Cropland  Range  Sparse woodland  Forest  Water  Total
Exposed soil     1             2         0      0                0       0      3
Cropland         0             5         0      2                3       0      10
Range            0             3         5      1                0       0      9
Sparse woodland  0             0         4      4                0       0      8
Forest           0             0         0      0                4       0      4
Water            0             0         0      0                0       1      1
Total            1             10        9      7                7       1      35
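As a worked check of the PCC formula on this example: the diagonal sums to 1 + 5 + 5 + 4 + 4 + 1 = 20 out of n = 35 samples, so OA = 20/35 × 100% ≈ 57.1%. The same calculation as a minimal Python sketch (the variable and function names are illustrative):

```python
# Example error matrix: rows = sample (classified) data,
# columns = reference data, both in this category order.
CATEGORIES = ["Exposed soil", "Cropland", "Range", "Sparse woodland", "Forest", "Water"]
C = [
    [1, 2, 0, 0, 0, 0],
    [0, 5, 0, 2, 3, 0],
    [0, 3, 5, 1, 0, 0],
    [0, 0, 4, 4, 0, 0],
    [0, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 1],
]


def overall_accuracy(matrix):
    """OA (PCC) in percent: sum of the diagonal divided by the total number of samples."""
    s_d = sum(matrix[i][i] for i in range(len(matrix)))  # correctly classified samples
    n = sum(sum(row) for row in matrix)                  # all sample locations
    return s_d / n * 100.0


print(f"OA = {overall_accuracy(C):.1f}%")  # prints: OA = 57.1%
```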

Producer's Accuracy

Producer's accuracy (PA) is computed by dividing the number of correctly classified pixels in each category (on the major diagonal) by the number of reference pixels for that category (the column total). It is a measure of omission error.

$$\mathrm{PA} = \frac{C_i}{C_t} \times 100\%$$

where

$C_i$ = correctly classified sample locations in the column

$C_t$ = total number of sample locations in the column

Error of omission: $\mathrm{EO} = 100\% - \mathrm{PA}$

Calculation of PA

Exposed soil =1/1 =100%

Cropland =5/10 =50%

Range =5/9 =55.6%

Sparse woodland =4/7 =57.1%

Forest =4/7 =57.1%

Water body =1/1 =100%
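The PA values above, and the corresponding errors of omission, can be reproduced with a short sketch that reads the column totals of the example matrix. Names such as `producers_accuracy` are illustrative; a category with no reference samples is reported as NaN, which is a choice made here rather than in the text.

```python
# Example error matrix: rows = sample (classified) data,
# columns = reference data, both in this category order.
CATEGORIES = ["Exposed soil", "Cropland", "Range", "Sparse woodland", "Forest", "Water"]
C = [
    [1, 2, 0, 0, 0, 0],
    [0, 5, 0, 2, 3, 0],
    [0, 3, 5, 1, 0, 0],
    [0, 0, 4, 4, 0, 0],
    [0, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 1],
]


def producers_accuracy(matrix):
    """Per-category PA (%): diagonal value divided by its column (reference) total."""
    k = len(matrix)
    result = []
    for j in range(k):
        column_total = sum(matrix[i][j] for i in range(k))
        # A category with no reference samples has no defined PA.
        result.append(matrix[j][j] / column_total * 100.0 if column_total else float("nan"))
    return result


for name, pa in zip(CATEGORIES, producers_accuracy(C)):
    eo = 100.0 - pa  # error of omission
    print(f"{name:<16} PA = {pa:5.1f}%   EO = {eo:5.1f}%")
```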

User's Accuracy

User's accuracy (UA) is computed by dividing the number of correctly classified pixels in each category by the total number of pixels that were classified into that category (the row total).

This figure is a measure of commission error and indicates the probability that a pixel classified into a given category actually represents that category on the ground.

$$\mathrm{UA} = \frac{R_i}{R_t} \times 100\%$$

where

$R_i$ = correctly classified sample locations in the row

$R_t$ = total number of sample locations in the row

Error of commission: $\mathrm{EC} = 100\% - \mathrm{UA}$

Calculation of UA

Exposed soil =1/3 =33.3%

Cropland =5/10 =50%

Range =5/9 =55.6%

Sparse woodland =4/8 =50%

Forest =4/4 =100%

Water body =1/1 =100%
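The UA values and errors of commission follow the same pattern, using row totals instead of column totals. A minimal sketch on the example matrix (function name illustrative; a category never assigned in the database is reported as NaN, a choice made here):

```python
# Example error matrix: rows = sample (classified) data,
# columns = reference data, both in this category order.
CATEGORIES = ["Exposed soil", "Cropland", "Range", "Sparse woodland", "Forest", "Water"]
C = [
    [1, 2, 0, 0, 0, 0],
    [0, 5, 0, 2, 3, 0],
    [0, 3, 5, 1, 0, 0],
    [0, 0, 4, 4, 0, 0],
    [0, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 1],
]


def users_accuracy(matrix):
    """Per-category UA (%): diagonal value divided by its row (classified) total."""
    result = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        # A category that was never assigned has no defined UA.
        result.append(row[i] / row_total * 100.0 if row_total else float("nan"))
    return result


for name, ua in zip(CATEGORIES, users_accuracy(C)):
    ec = 100.0 - ua  # error of commission
    print(f"{name:<16} UA = {ua:5.1f}%   EC = {ec:5.1f}%")
```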

4. Logical consistency

Logical consistency describes the fidelity of the relationships between the real world and the encoded geographic data.

In GIS, the topological data model is an example of ensuring logical consistency. Logical consistency covers:

>> consistency of the data model

>> consistency of the positional and attribute data

>> consistency between data files
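As an illustration of the kinds of rule a GIS might check, the sketch below tests two simple conditions on hypothetical data: that every polygon boundary is a closed ring (a data-model rule) and that the same polygon IDs appear in the geometry and attribute files (consistency between data files). The data and function names are hypothetical; real topology checks in a GIS cover many more rules.

```python
def unclosed_rings(polygons):
    """IDs of polygons whose boundary is not a closed ring: fewer than four
    vertices, or first vertex different from last vertex."""
    return [pid for pid, ring in polygons.items() if len(ring) < 4 or ring[0] != ring[-1]]


def unmatched_ids(geometry_ids, attribute_ids):
    """IDs that appear in only one of the two files."""
    return sorted(set(geometry_ids) ^ set(attribute_ids))


# Hypothetical parcels: polygon 2 lacks its closing vertex.
polygons = {
    1: [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    2: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
# Hypothetical attribute file keyed by polygon ID: 2 is missing, 3 has no geometry.
attributes = {1: "Cropland", 3: "Forest"}

print(unclosed_rings(polygons))                           # [2]
print(unmatched_ids(polygons.keys(), attributes.keys()))  # [2, 3]
```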

5. Completeness of Geographic data

Are all possible objects included within the database?

A. Spatial completeness

B. Thematic completeness
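One simple way to quantify completeness, sketched here under the assumption that a reference list of real-world objects is available, is to compare feature IDs in the database with that list: missing IDs are omissions, and IDs with no real-world counterpart are excess items (ISO 19157 treats these as omission and commission). The function name and well IDs below are hypothetical.

```python
def completeness(database_ids, reference_ids):
    """Return (percent of reference objects present, omitted IDs, excess IDs)."""
    db, ref = set(database_ids), set(reference_ids)
    if not ref:
        raise ValueError("reference list is empty")
    present = db & ref
    omitted = sorted(ref - db)  # real-world objects missing from the database
    excess = sorted(db - ref)   # database objects with no real-world counterpart
    return len(present) / len(ref) * 100.0, omitted, excess


# Hypothetical wells: W3 was never digitised, W9 does not exist on the ground.
reference_wells = ["W1", "W2", "W3", "W4"]
database_wells = ["W1", "W2", "W4", "W9"]
print(completeness(database_wells, reference_wells))  # (75.0, ['W3'], ['W9'])
```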

6. Temporal accuracy

A measure of data quality with respect to the representation of time in the geographic database. Two kinds of time are distinguished:

A. World time

B. Database time

7. Semantic accuracy

>> how correctly spatial objects are labeled or named

>> correct encoding in accordance with a set of features

Datum

A geodetic datum (plural datums, not data) is a reference from which measurements are made.

In surveying and geodesy, a datum is a set of reference points on the Earth's surface against which position measurements are made.

Horizontal datums are used for describing a point on the earth's surface, in latitude and longitude or another coordinate system.

Vertical datums are used to measure elevations or underwater depths.

A coordinate system defines the location of a point on a planar or spherical surface.

Types of coordinate systems

A. Based on Nature

B. Based on Extent

A. Based on Nature

1. Plane coordinate system

2. Geographic coordinate system

B. Based on Extent

1. Global coordinate system

2. Local coordinate system

Some coordinate systems

1. Cartesian coordinate system

2. Universal Transverse Mercator (UTM)

3. WGS 84

The World Geodetic System 1984 (WGS84) is the datum used by the Global Positioning System (GPS). The datum is defined and maintained by the United States National Geospatial-Intelligence Agency (NGA).

Coordinates computed from GPS receivers are likely to be provided in terms of the WGS84 datum, and heights in terms of the WGS84 ellipsoid.
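To show how a datum's ellipsoid turns latitude, longitude and ellipsoidal height into positions, the sketch below converts geodetic coordinates to Earth-centred Cartesian (X, Y, Z) coordinates on the WGS84 ellipsoid, using its defining semi-major axis (6378137 m) and inverse flattening (298.257223563). The function name is illustrative and the sample point near Dhaka is approximate.

```python
import math

# WGS84 defining parameters.
WGS84_A = 6378137.0          # semi-major axis (m)
WGS84_INV_F = 298.257223563  # inverse flattening


def geodetic_to_ecef(lat_deg, lon_deg, h_m, a=WGS84_A, inv_f=WGS84_INV_F):
    """Convert latitude/longitude (degrees) and ellipsoidal height (m) to
    Earth-centred Cartesian X, Y, Z (m) on the given ellipsoid."""
    f = 1.0 / inv_f
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h_m) * math.sin(lat)
    return x, y, z


# Approximate point near Dhaka (illustrative values only).
print(geodetic_to_ecef(23.8, 90.4, 10.0))
```

Passing a different `a` and `inv_f` gives the same conversion on another ellipsoid, which is one reason coordinates are meaningless without stating their datum.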

4. Everest 1830

India and other countries of the world made their own measurements and defined reference surfaces to serve as datums for mapping.

In India the reference surface was defined by Sir George Everest, who was Surveyor General of India from 1830 to 1843.

It has served as the reference for all mapping in India. The Indian system can be called the Indian Geodetic System, as all coordinates are referred to it. The reference surface was called the Everest Spheroid.

Geoid

The geoid is a hypothetical surface that coincides with mean sea level over the oceans and extends at the same level beneath the continents.

The geoid is used as a reference surface for astronomical measurements and for the accurate measurement of elevation on the Earth's surface.

Ellipsoid

A geometric surface, symmetrical about the three coordinate axes, whose plane sections are ellipses or circles
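Because GPS heights refer to the ellipsoid while elevations are normally referred to the geoid, the two are commonly related through the geoid undulation N (the height of the geoid above the ellipsoid): H ≈ h − N, where h is the ellipsoidal height and H the height above the geoid. A minimal sketch with hypothetical values; in practice N is read from a geoid model, and the relation is an approximation.

```python
def orthometric_height(h_ellipsoidal_m, geoid_undulation_m):
    """Approximate height above the geoid: H = h - N.

    h_ellipsoidal_m    -- height above the ellipsoid (e.g. from a GPS receiver)
    geoid_undulation_m -- N, height of the geoid above the ellipsoid at that point
                          (negative where the geoid lies below the ellipsoid)
    """
    return h_ellipsoidal_m - geoid_undulation_m


# Hypothetical values for illustration only.
print(orthometric_height(-35.0, -45.0))  # 10.0
```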
