
Post on 19-Dec-2015


Lecture 24: More on Data Quality and Metadata

By Austin Troy

Using GIS: Introduction to GIS

©2005 Austin Troy

Random and Systematic Error

•Error can be systematic or random

•Systematic error can be rectified if discovered, because its source is understood

•A common example is a remote sensing instrument that consistently measures data erroneously because of bad calibration; if the calibration problem can be understood and accounted for, the error is systematic

•Another example: projecting map data using the wrong zone would result in consistently wrong coordinates

Introduction to GIS


Random and Systematic Error

•Random error cannot be controlled for because its source is not understood

•Random errors are often introduced in little bits at each stage of data collection and processing

•Sources can range from slight air turbulence when an airplane is collecting RS data to a file getting corrupted in data transfer


Random and Systematic Error

•Systematic errors affect accuracy, but are usually independent of precision; data collected with highly precise methods can still be inaccurate because of systematic error


Accurate and precise: no systematic error, little random error

Inaccurate and precise: little random error, but significant systematic error

Accurate and imprecise: no systematic error, but considerable random error

Inaccurate and imprecise: both types of error
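
The four cases above can be illustrated with a small simulation. This is a minimal sketch that is not part of the lecture: it assumes systematic error behaves like a constant bias and random error like Gaussian noise, and the function name `simulate` and every numeric value are hypothetical.

```python
import random
import statistics

def simulate(true_value, bias, noise_sd, n=1000, seed=0):
    """Return (mean error, spread) for n simulated measurements.

    bias     -- assumed constant offset, standing in for systematic error
    noise_sd -- assumed Gaussian standard deviation, standing in for random error
    """
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    mean_error = statistics.mean(readings) - true_value  # indicator of accuracy
    spread = statistics.stdev(readings)                  # indicator of precision
    return mean_error, spread

# Hypothetical settings for the four combinations
cases = {
    "accurate, precise": simulate(100.0, bias=0.0, noise_sd=0.1),
    "inaccurate, precise": simulate(100.0, bias=5.0, noise_sd=0.1),
    "accurate, imprecise": simulate(100.0, bias=0.0, noise_sd=3.0),
    "inaccurate, imprecise": simulate(100.0, bias=5.0, noise_sd=3.0),
}
for name, (mean_error, spread) in cases.items():
    print(f"{name:22s} mean error={mean_error:6.2f} spread={spread:5.2f}")
```

In this sketch the bias shows up only in the mean error and the noise only in the spread, which is one way to see why precise methods do not protect against systematic error.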


Types of Error Sources

•Burrough (1986) divides error sources into:

1. Obvious sources of error

2. Errors resulting from natural variations or from original measurements

3. Errors arising through processing

The third type is the least obvious


1. Obvious Error Sources

•Areal cover:

•E.g. cloud cover in remote sensing or missing/incomplete data at coverage boundaries

•Older maps where parts are unknown


•Currency/timeliness: data that are obviously out of date

•Timeliness, or currency, is not always easy to judge; it depends on the type of data being examined


2. Measurement Errors

•Positional accuracy:

•Random or systematic equipment malfunction or misuse

•Bad GPS measurements

•Map digitizing errors or other input errors

•Location of imprecise boundaries (e.g. vegetation stands, soil zones, flood zones, wetlands, climatic zones) can be compromised by the criteria used to define and classify these zones, as well as by errors in measurement

•Interpolation error
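
Positional accuracy is often summarized with a root-mean-square error (RMSE) between measured points and higher-accuracy reference points. The lecture does not name a metric, so this is only an illustrative sketch; the function name `positional_rmse` and the check points are hypothetical.

```python
import math

def positional_rmse(measured, reference):
    """Root-mean-square horizontal error between paired (x, y) points."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal-length, non-empty point lists")
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points, in metres
reference = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
measured = [(3.0, 4.0), (103.0, 4.0), (103.0, 104.0)]
rmse = positional_rmse(measured, reference)
print(rmse)
```

Every measured point here is displaced by the same (3, 4) offset, so the RMSE reflects a purely systematic shift; with random digitizing or GPS noise the individual displacements would differ from point to point.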


2. Measurement Errors

•Attribute accuracy

•Misclassification of categorical data (automated or manual)

•The chance for misclassification grows as the number of possible classes increases

•Quantitative measurement errors: e.g. truncation

•A common error is to measure a phenomenon in only one phase of a temporal cycle: bird counts, river flows, average weather metrics, soil moisture
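
Misclassification of categorical data is commonly checked by comparing mapped classes against reference ("ground truth") classes at sample locations. The sketch below is an assumption-laden illustration, not the lecture's method: the function name `overall_accuracy` and the sample labels are hypothetical.

```python
from collections import Counter

def overall_accuracy(reference_labels, mapped_labels):
    """Return (share of samples classified correctly, confusion counts)."""
    if len(reference_labels) != len(mapped_labels) or not reference_labels:
        raise ValueError("need equal-length, non-empty label lists")
    # Count each (reference class, mapped class) pair
    confusion = Counter(zip(reference_labels, mapped_labels))
    correct = sum(n for (ref, mapped), n in confusion.items() if ref == mapped)
    return correct / len(reference_labels), confusion

# Hypothetical sample of six locations
reference_labels = ["forest", "forest", "water", "urban", "urban", "forest"]
mapped_labels = ["forest", "urban", "water", "urban", "forest", "forest"]
accuracy, confusion = overall_accuracy(reference_labels, mapped_labels)
print(round(accuracy, 3), dict(confusion))
```

The off-diagonal pairs in `confusion` (reference class differs from mapped class) show which classes are being mistaken for each other.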


3. Processing Errors

•Numerical processing (math operations, data type, rounding, etc.)

•Geocoding (e.g. rural address matching and street interpolation)

•Topological errors from digitizing (overshoots, dangling nodes, slivers, etc.)

•Automated classification steps, like unsupervised or supervised land cover classification in remote sensing, can result in processing errors
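
The numerical processing item in the list above can be made concrete with two small cases: rounding error that accumulates in floating-point sums, and loss of information when a value is stored in an integer data type. This is a minimal sketch; the elevation value is hypothetical.

```python
import math

# Floating-point accumulation: 0.1 has no exact binary representation
values = [0.1] * 10
naive_total = sum(values)        # left-to-right sum accumulates rounding error
exact_total = math.fsum(values)  # math.fsum tracks the lost low-order bits

# Data type conversion: storing a measurement as an integer
elevation_m = 152.96             # hypothetical elevation in metres
truncated = int(elevation_m)     # int() drops the fractional part
rounded = round(elevation_m)     # round() goes to the nearest integer

print(naive_total == 1.0, exact_total == 1.0, truncated, rounded)
```

Neither effect is large on its own, but both are applied to every cell or record in a layer, so the choice of data type and operation order can matter.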


Error Propagation and Cascading

•Errors can accumulate and cascade through processing steps; each succeeding layer that uses the erroneous processing method compounds the error

•Propagation: where one error leads to another

•Example: if a key reference point was mis-digitized in layer A and that point was used to “register” layer B to layer A, then the error is propagated in layer B and all subsequent layers based on either of them; this error can propagate additively or multiplicatively
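
The registration example above can be sketched in a few lines. This is a deliberately simplified illustration that assumes registration is a pure translation based on a single control point (real registration normally uses several points and a more general transform); the function name `register_by_offset` and all coordinates are hypothetical.

```python
def register_by_offset(layer_b, control_b, control_a):
    """Shift every point in layer B so that control_b lands on control_a."""
    dx = control_a[0] - control_b[0]
    dy = control_a[1] - control_b[1]
    return [(x + dx, y + dy) for x, y in layer_b]

layer_b = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0)]
control_b = (10.0, 10.0)
true_control_a = (110.0, 210.0)  # where the reference point really is in layer A
bad_control_a = (112.0, 209.0)   # the same point, mis-digitized by (+2, -1)

good = register_by_offset(layer_b, control_b, true_control_a)
bad = register_by_offset(layer_b, control_b, bad_control_a)

# Error inherited by each point of layer B
errors = [(bx - gx, by - gy) for (bx, by), (gx, gy) in zip(bad, good)]
print(errors)
```

In this translation-only case the single digitizing mistake is copied additively to every feature in layer B, and to anything later derived from it.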


Error Propagation and Cascading

•Cascading: when errors are allowed to propagate unchecked from one layer to the next and on to the final set of products or recommendations

•Cascading error can be managed to a certain extent by conducting “sensitivity analysis” on different data layers to see how slight changes in one or several layers would affect the final outcome or product

•Cascading can occur with positional as well as with attribute errors; e.g. errors in the z value of a raster layer would yield cascading errors in map algebra
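
One simple form of sensitivity analysis is to perturb an input layer many times and watch how much the final product changes. The sketch below assumes Gaussian error in the z values of a tiny raster and a map-algebra style threshold test; the names `flood_cells` and `sensitivity`, the grid, and the error size are all hypothetical.

```python
import random

def flood_cells(elevation, water_level):
    """Map-algebra style test: count cells at or below the water level."""
    return sum(1 for row in elevation for z in row if z <= water_level)

def sensitivity(elevation, water_level, z_error_sd, runs=500, seed=1):
    """Re-run the analysis with Gaussian noise added to every z value."""
    rng = random.Random(seed)
    counts = []
    for _ in range(runs):
        noisy = [[z + rng.gauss(0, z_error_sd) for z in row] for row in elevation]
        counts.append(flood_cells(noisy, water_level))
    return min(counts), max(counts)

# Hypothetical 3x3 elevation raster, in metres
elevation = [
    [1.0, 1.9, 3.0],
    [2.1, 2.0, 5.0],
    [4.0, 2.2, 6.0],
]
baseline = flood_cells(elevation, water_level=2.0)
low, high = sensitivity(elevation, water_level=2.0, z_error_sd=0.2)
print(baseline, low, high)
```

A wide gap between `low` and `high` relative to `baseline` suggests the final product is sensitive to z error, most of all in cells whose values sit close to the threshold.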


Documentation and Metadata

•To avoid many of these errors, good documentation of source data is needed

•Metadata is data documentation, or “data about data”

•Ideally, the metadata describes the data according to federally recognized standards of accuracy

•Almost all state, local and federal agencies are required to provide metadata with the geodata they produce


Documentation and Metadata

•Metadata usually include sections similar to these


Documentation and Metadata

•The Federal Geographic Data Committee (FGDC) is a federal entity that developed a “Content Standard for Digital Geospatial Metadata” in 1998, which is a model for all spatial data users to follow

•Purpose is: “to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements” (FGDC)

•All federal agencies are required to use these standards


Documentation and Metadata

•The information requirements in FGDC metadata were chosen based on four roles that the FGDC sees metadata playing:

•“availability -- data needed to determine the sets of data that exist for a geographic location.

•fitness for use -- data needed to determine if a set of data meets a specific need.

•access -- data needed to acquire an identified set of data.

•transfer -- data needed to process and use a set of data.” (FGDC)


Documentation and Metadata

•Critical components usually break down into:

•Dataset identification

•Administrative information

•Dataset overview

•Data quality

•Data definition

•Spatial reference information
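
A checklist like the one above lends itself to a simple automated completeness check. The sketch below is illustrative only: the dictionary keys are hypothetical labels chosen to mirror the list and are not FGDC element names, and the sample record is invented.

```python
# Hypothetical section keys mirroring the list of critical components
REQUIRED_SECTIONS = [
    "dataset_identification",
    "administrative_information",
    "dataset_overview",
    "data_quality",
    "data_definition",
    "spatial_reference",
]

def missing_sections(record):
    """Return the required sections that are absent or empty in a record."""
    return [name for name in REQUIRED_SECTIONS if not record.get(name)]

# Invented metadata record with an empty data quality section
record = {
    "dataset_identification": "County parcels, 2004",
    "administrative_information": "GIS office contact details",
    "dataset_overview": "Polygon layer of tax parcels",
    "data_quality": "",
    "data_definition": "See attribute table description",
    "spatial_reference": "State Plane, NAD 83",
}
missing = missing_sections(record)
print(missing)
```

A check like this only confirms that a section exists; whether its content is adequate still needs a human reviewer.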


Documentation and Metadata

•Data identification, overview and administrative info:

•Often combined

•General info: name and brief ID of dataset and owner organization, geographic domain, general description/summary of content, data model used to represent spatial features, intent of production, language used, reference to more detailed documents, if applicable

•Contact info, constraints on access and use

•This is usually where info on currency is found


Documentation and Metadata

•Data quality should address:

•Positional accuracy

•Attribute accuracy

•Logical consistency

•Completeness

•Lineage

•Processing steps


Documentation and Metadata

•Spatial reference should include:

•Horizontal coordinate system (e.g. State Plane)

•Includes projection used, scale factors, longitude of central meridian, latitude of projection origin, distance units

•Geodetic model (e.g. NAD 83), ellipsoid, semi-major axis


Documentation and Metadata

•Data definition, also known as “Entity and Attribute Information,” should include:

•Entity types (e.g. polygon, raster)

•Information about each attribute, including label, definition, domain of values

•Sometimes this section includes a data dictionary, or description of attribute codes; sometimes it instead references a separate document with those codes if they are too long and complex
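
A data dictionary with a label, definition and domain of values for each attribute can also be used to screen attribute tables for invalid entries. This is a minimal sketch under stated assumptions: the attribute names, codes, domains and rows are hypothetical, and `out_of_domain` is an illustrative helper, not part of any GIS package.

```python
# Hypothetical data dictionary: label -> definition and domain of values
data_dictionary = {
    "LANDUSE": {
        "definition": "Land use code",
        "domain": {"RES", "COM", "AGR"},
    },
    "POP": {
        "definition": "Population count",
        "domain": range(0, 1_000_000),
    },
}

def out_of_domain(rows, dictionary):
    """Return (row index, attribute, value) for values outside their domain."""
    problems = []
    for index, row in enumerate(rows):
        for field, spec in dictionary.items():
            if row.get(field) not in spec["domain"]:
                problems.append((index, field, row.get(field)))
    return problems

# Invented attribute table with two bad values in the second row
rows = [
    {"LANDUSE": "RES", "POP": 120},
    {"LANDUSE": "IND", "POP": -5},
]
problems = out_of_domain(rows, data_dictionary)
print(problems)
```

Values flagged this way may be genuine attribute errors or may indicate that the data dictionary itself is out of date, so both should be checked.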


Documentation and Metadata

•Data distribution info usually includes:

•Name, address, phone, email of contact person and organization

•Liability information

•Ordering information, including online and ordering by other media; usually includes fees


Documentation and Metadata

•Metadata reference, or meta-metadata

•This is data about the metadata

•Contains information on:

•When the metadata were updated

•Who made it

•What standard was used

•What constraints apply to the metadata
