Lecture 18: Data Quality Issues
Ch. 14
Introduction
• Spatial data and analysis standards are important because of the range of organizations producing and using spatial data, and the amount of data transferred between these organizations.
• There are several types of standards:
– Data standards
– Interoperability standards
– Analysis standards
– Professional and certification standards
Introduction (continued)
• National and international standards organizations are important in defining and maintaining geospatial standards:
– Federal Geographic Data Committee (FGDC), which focuses on the national spatial data infrastructure (www.fgdc.gov)
– International Spatial Data Standards Commission, which is a clearinghouse and gateway for international standards
– Open Geospatial Consortium (OGC), which is developing interoperability standards. Web Map Service (WMS) standards are an example.
GIS Certification
• What kind of certification is available?
• Two primary options:
– Geographic Information Systems Professional (GISP) is based on your work and volunteering experience.
– ESRI Technical Certifications are test based.
• The third option is a university based certification.
The Geospatial Competency Model
GIS Professional Certification
URISA is the founding member of the GIS Certification Institute, the organization that administers professional certification for the field and is dedicated to advancing the industry.
The minimum number of points needed to become a certified GIS Professional as detailed in the three point schedules given below is 150 points. Thus, all applicants are expected to document achievements valued at a minimum of 150 points. To ensure that applicants have a broad foundation, specific minimums in each of the three achievement categories must be met or exceeded. These minimums are as follows:
Education: 30 Points
Experience: 60 Points
Contributions: 8 Points
The additional 52 points can be counted from any of the three categories.
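The point arithmetic on this slide can be sketched as a small check. This is only an illustration of the numbers quoted here (150 total; minimums of 30, 60, and 8); the function name and the applicant's scores are made up, and the real application process involves more than this sum.

```python
# Category minimums and overall threshold as quoted on the slide.
MINIMUMS = {"education": 30, "experience": 60, "contributions": 8}
TOTAL_REQUIRED = 150


def meets_point_requirements(points):
    """Return True if every category minimum and the overall total are met.

    `points` is a hypothetical dict of category name -> documented points.
    """
    for category, minimum in MINIMUMS.items():
        if points.get(category, 0) < minimum:
            return False
    return sum(points.get(c, 0) for c in MINIMUMS) >= TOTAL_REQUIRED


# The minimums add up to 98, so 52 points may come from any category.
flexible_points = TOTAL_REQUIRED - sum(MINIMUMS.values())

applicant = {"education": 40, "experience": 95, "contributions": 15}  # made-up scores
print(flexible_points, meets_point_requirements(applicant))
```

Meeting only the three minimums is not enough on its own, because they total 98 points rather than 150.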
A Sample of University Certificates
• UMM – undergraduate
• USM – undergraduate/graduate
• UM – graduate
• Penn State
• University of Denver
• University of Southern California
• George Mason University
Spatial Data Standards
• Data – measurements and observations
• Data quality – a measure of the fitness for use of data for a particular task (Chrisman, 1994).
• It is the responsibility of the user to ensure that the data are fit for the task.
• Metadata – data about the data
Spatial Data Standards
• Spatial Data Standards – methods for structuring, describing and delivering spatially-referenced data.
• Media Standards – the physical form of the data (CD/download etc).
• Format Standards – specify data file components and structures. These standards aid in data transfer.
• Spatial Data Accuracy Standards – document the quality of the positional and attribute accuracy.
• Document Standards – define how we describe spatial data.
GIS Is Not Perfect
A GIS cannot perfectly represent the world for many reasons, including:
• The world is too complex and detailed.
• The data structures or models (raster, vector, or TIN) used by a GIS to represent the world are not discriminating or flexible enough.
• We make decisions (how to categorize data, how to define zones) that are not always fully informed or justified, and are always biased.
• It is impossible to make a perfect representation of the world, so uncertainty is inevitable.
• Uncertainty degrades the quality of a spatial representation.
Concepts Related to Data Quality
• Related to individual data sets:
– Errors – flaws in data.
– Accuracy – the extent to which an estimated value approaches the true value.
– Precision – the recorded level of detail of your data.
– Bias – the systematic variation of the data from reality.
  • Personal bias
  • Instrument bias
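The difference between precision, bias, and accuracy can be illustrated with a few lines of Python. This is a minimal sketch: the readings and the true value are invented, bias is taken as the mean error, and root-mean-square error (RMSE) is used as one common accuracy summary, which is a choice made here rather than something the slide prescribes.

```python
import statistics


def bias_and_rmse(measurements, true_value):
    """Return (bias, rmse) for repeated measurements of one known value."""
    errors = [m - true_value for m in measurements]
    bias = statistics.mean(errors)  # systematic offset from the true value
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5  # overall accuracy
    return bias, rmse


# Hypothetical readings recorded to three decimals (high precision)
# of a distance whose true value is assumed to be 100.0 m.
readings = [102.137, 102.142, 102.139]
bias, rmse = bias_and_rmse(readings, 100.0)
print(f"bias = {bias:.3f} m, RMSE = {rmse:.3f} m")
```

The readings are recorded in fine detail, yet all of them sit about two meters from the assumed true value, so high precision does not imply high accuracy.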
Concepts Related to Data Quality
• Related to source data:
– Resolution – the smallest feature in the data set that can be displayed.
– Generalization – simplification of objects in the real world to produce scale models and maps.
Resolution and generalization of raster datasets
Figure 10.3 Scale-related generalization
Data Sets Used for Analysis
Must be:
– Complete – spatially and temporally
– Compatible – same scale, units of measure, measurement level
– Consistent – both within and between data sets
– Applicable for the analysis being performed
Sources of Error (Uncertainty) in GIS
A Conceptual View of Uncertainty
Real World → Conception → Source Data, Measurement & Representation → Data Conversion and Analysis → Result
Error propagates from each stage to the next.
Uncertainty in The Conception of Geographic Phenomena
Many spatial objects are not well defined or their definition is to some extent arbitrary, so that people can reasonably disagree about whether a particular object is x or not. There are at least four types of conceptual uncertainty:
• Spatial uncertainty
• Vagueness
• Ambiguity
• Regionalization problems
Spatial uncertainty
• Spatial uncertainty occurs when objects do not have a discrete, well defined extent.
• They may have indistinct boundaries.
• They may have impacts that extend beyond their boundaries.
• They may simply be statistical entities.
• The attributes ascribed to spatial objects may also be subjective.
Vagueness (obscureness)
• Vagueness occurs when the criteria that define an object as x are not explicit or rigorous.
• For example:
– In a land cover analysis, how many oaks (or what proportion of oaks) must be found in a tract of land to qualify it as oak woodland?
– What incidence of crime (or resident criminals) defines a high crime neighborhood?
Ambiguity
Ambiguity occurs when y is used as a substitute, or indicator, for x because x is not available.
• The link between direct indicators and the phenomena for which they substitute is straightforward and fairly unambiguous.
• Indirect indicators tend to be more ambiguous and opaque.
• Of course, indicators are not simply direct or indirect; they occupy a continuum. The more indirect they are, the greater the ambiguity.
Regionalization problems
• Regional geography is largely founded on the creation of a mosaic of zones that make it easy to portray spatial data distributions.
• A uniform zone is defined by the extent of a common characteristic, such as climate, landform, or soil type.
• Functional zones are areas that delimit the extent of influence of a facility or feature, for example, how far people travel to a shopping center or the geographic extent of support for a football team.
• Regionalization problems occur because zones are artificial.
Uncertainty in the measurement of geographic phenomena
Error occurs in physical measurement of objects. This error creates further uncertainty about the true nature of spatial objects.
• Physical measurement error
• Digitizing error
• Error caused by combining data sets with different lineages
Physical measurement error
Instruments and procedures used to make physical measurements are not perfectly accurate. For example, a survey of Mount Everest might find its height to be 8,850 meters, with an accuracy of plus or minus 5 meters.
• In addition, the earth is not a perfectly stable platform from which to make measurements. Seismic motion, continental drift, and the wobbling of the earth's axis cause physical measurements to be inexact. (Examples: GPS error, remote sensing error.)
Digitizing error
• A great deal of spatial data has been digitized from paper maps.
• Digitizing, or the electronic tracing of paper maps, is prone to human error.
– Lines may be drawn too far, not far enough, or missed entirely. Errors caused by digitizing mistakes can be partially, but not completely, fixed by software.
– Additional error occurs because adjacent data digitized from different maps may not align correctly. This problem can also be partially corrected through a software technique called rubbersheeting.
Digitizing Error
Any digitized map requires considerable post-processing:
• Check for missing features
• Connect lines
• Remove spurious polygons
Some of these steps can be automated.
Error caused by combining data sets with different lineages
• Data sets produced by different agencies or vendors may not match because different processes were used to capture or automate the data.
– For example, buildings in one data set may appear on the opposite side of the street in another data set.
– Error may also be caused by combining sample and population data or by using sample estimates that are not robust at fine scales.
– "Lifestyle" data are derived from shopping surveys and provide business and service planners with up-to-date socioeconomic data not found in traditional data sources like the census. Yet the methods by which lifestyle data are gathered and aggregated to zones or are compared to census data may not be scientifically rigorous.
Uncertainty in the representation of geographic phenomena
• Representation is closely related to measurement.
• Representation is not just an input to analysis, but sometimes also the outcome of it. For this reason, we consider representation separately from measurement.
– The world is infinitely complex, but computer systems are finite.
– Representation is all about the choices that are made in capturing knowledge about the world.
• Uncertainty in earth model: ellipsoid models, datum, projection types
• Uncertainty in the raster data model (structure)
• Uncertainty in the vector data model (structure)
Uncertainty in the raster data structure
• The raster structure partitions space into square cells of equal size (also called pixels).
• Spatial objects x, y, and z emerge from cell classification, in which Cell A1 is classified as x, Cell A2 as y, Cell A3 as z, and so on, until all cells are evaluated.
• A spatial object x can be defined as a set of contiguous cells classified as x.
• Commonly, a cell is not purely one thing or another, but might contain some x, some y, and maybe a bit of z within its area.
• These impure cells are termed mixed pixels or "mixels."
• Because a cell can hold only one value, a mixel must be classified as if it were all one thing or another. Therefore, the raster structure may distort the shape of spatial objects.
Error in raster
• Because of the distortions due to flattening, cells in a raster can never be perfectly equal in size on the Earth's surface.
• When information is represented in raster form, all detail about variation within cells is lost, and the cell is given a single value instead. Three common rules for choosing that value are largest share, central point (e.g. USGS DEM), and mean value (e.g. remote sensing imagery).
[Figure: cells covering areas with values 8 and 6, with the cell value assigned by largest share, central point, and mean value]
Mean value examples from the figure (area-weighted):
$8 \times (1/6) + 6 \times (5/6) = 6.33$
$8 \times (3/4) + 6 \times (1/4) = 7.5$
$8 \times (1/7) + 6 \times (6/7) = 6.29$
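The largest-share and mean-value rules can be sketched in a few lines. This is an illustrative sketch only: a cell is represented here as a hypothetical list of (value, area fraction) pairs, and the central-point rule is left out because it needs the cell geometry rather than just the area shares.

```python
def largest_share(parts):
    """Return the value that covers the largest fraction of the cell.

    `parts` is a list of (value, area_fraction) pairs whose fractions sum to 1.
    """
    return max(parts, key=lambda part: part[1])[0]


def mean_value(parts):
    """Return the area-weighted mean of the values inside the cell."""
    return sum(value * fraction for value, fraction in parts)


# Area shares taken from the three worked examples on the slide.
cells = [
    [(8, 1 / 6), (6, 5 / 6)],
    [(8, 3 / 4), (6, 1 / 4)],
    [(8, 1 / 7), (6, 6 / 7)],
]
for parts in cells:
    print(largest_share(parts), round(mean_value(parts), 2))
```

Whichever rule is used, the within-cell mixture is discarded once a single value is stored, which is the loss of detail the slide describes.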
Figure 10.8 Problems with remotely sensed imagery: (left) example of a satellite image with cloud cover (A), shadows from topography (B), and shadows from cloud cover (C); (right) an urban area showing a building leaning away from the camera. Source: Ian Bishop (left) and Google UK (right)
Uncertainty in the vector data structure
• Socioeconomic data (facts about people, houses, and households) are often best represented as points.
• For various reasons (to protect privacy, to limit data volume), data are usually aggregated and reported at a zonal level, such as census tracts or ZIP Codes.
• This distorts the data in two ways:
– First, it gives them a spatially inappropriate representation (polygons instead of points);
– Second, it forces the data into zones whose boundaries may not respect natural distribution patterns.
Map representation error

| Map scale | Ground distance, accuracy, or resolution (corresponding to 0.5 mm map distance) |
| --- | --- |
| 1:1,250 | 0.625 m |
| 1:2,500 | 1.25 m |
| 1:5,000 | 2.5 m |
| 1:10,000 | 5 m |
| 1:24,000 | 12 m |
| 1:50,000 | 25 m |
| 1:100,000 | 50 m |
| 1:250,000 | 125 m |
| 1:1,000,000 | 500 m |
| 1:10,000,000 | 5 km |
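Each ground distance in the table is the 0.5 mm map distance multiplied by the scale denominator. A minimal sketch of that conversion follows; the function name and the default argument are illustrative choices.

```python
def ground_distance_m(scale_denominator, map_distance_mm=0.5):
    """Ground distance in meters for a given map distance at scale 1:scale_denominator."""
    return scale_denominator * map_distance_mm / 1000.0


for denominator in (1_250, 24_000, 250_000, 10_000_000):
    print(f"1:{denominator:,} -> {ground_distance_m(denominator):g} m")
```

For instance, 0.5 mm on a 1:24,000 map corresponds to 12 m on the ground, so features smaller than that cannot be positioned reliably from such a map.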
Uncertainty in the data conversion and analysis of geographic phenomena
Uncertainties in data lead to uncertainties in the results of analysis. Data conversion and spatial analysis methods can create further uncertainty.
• Data conversion error
• Georeferencing and resampling
• Projection and datum conversions
• The ecological fallacy
• The modifiable areal unit problem (MAUP)
• Classification errors
• The ecological fallacy
The ecological fallacy is the mistake of assuming that an overall characteristic of a zone is also a characteristic of any location or individual within the zone.
• The Modifiable Areal Unit Problem (MAUP)
The results of data analysis are influenced by the number and sizes of the zones used to organize the data. The Modifiable Areal Unit Problem has at least three aspects:
1. The number, sizes, and shapes of zones affect the results of analysis.
2. The number of ways in which fine-scale zones can be aggregated into larger units is often great.
3. There are usually no objective criteria for choosing one zoning scheme over another.
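A toy example can make the MAUP concrete. In this sketch the eight fine-scale values and the three zoning schemes are invented; each scheme simply lists which fine-scale units fall into each zone, and the zone value is the mean of its members.

```python
def zone_means(values, zones):
    """Aggregate fine-scale values into zones by averaging.

    `zones` is a list of zones, each a list of indices into `values`.
    """
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Invented fine-scale observations along a transect.
values = [10, 10, 0, 0, 10, 10, 0, 0]

scheme_a = [[0, 1], [2, 3], [4, 5], [6, 7]]   # four zones of two units
scheme_b = [[0, 1, 2, 3], [4, 5, 6, 7]]       # two zones of four units
scheme_c = [[0], [1, 2], [3, 4], [5, 6], [7]]  # same data, shifted boundaries

for name, scheme in (("A", scheme_a), ("B", scheme_b), ("C", scheme_c)):
    print(name, zone_means(values, scheme))
```

Scheme A shows strong contrast between zones, scheme B shows none, and scheme C shows a gradient, although the underlying data never change. Any characteristic read off a zone mean also need not hold for the individual units inside it, which is the ecological fallacy.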
Ecological Fallacy Example
Source: http://www.gistutor.com/concepts/24-intermediate-concept-tutorials/57-ecological-fallacy-in-gis.html
MAUP Example
Source: http://www.google.com/imgres?um=1&hl=en&client=firefox-a&sa=N&rls=org.mozilla:en-US:official&biw=1257&bih=845&tbm=isch&tbnid=ghU6S5VuksC-8M:&imgrefurl=http://www.indiana.edu/~gisci/courses/g438/lectures/gis_census.html&docid=VCO84JSYMIBN2M&imgurl=http://w
Classification error and quality check
Selecting ROIs
Alfalfa
Cotton
Grass
Fallow
Classification Result
Background: ETM+, 7/15/01
Top image: IKONOS, Oct. 2000
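Confusion Matrix
Rows are classification results; columns are ground truth.

| Classified as | Grass | Alfalfa | Cotton | Chili | Fallow (corn) | Total | User accuracy (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Grass | 110 | 22 | 0 | 0 | 0 | 132 | 83.3 |
| Alfalfa | 5 | 105 | 0 | 0 | 0 | 110 | 95.5 |
| Cotton | 0 | 0 | 945 | 5 | 0 | 950 | 99.5 |
| Chili | 0 | 0 | 50 | 42 | 0 | 92 | 45.7 |
| Fallow | 0 | 0 | 0 | 0 | 484 | 484 | 100 |
| Total | 115 | 127 | 995 | 47 | 484 | 1768 | |
| Producer accuracy (%) | 95.6 | 82.7 | 95.0 | 89.4 | 100 | | |

Correctly classified pixels (sum of the diagonal): 1686

$$\text{Overall accuracy} = \frac{1686}{1768} = 95.4\%$$

$$\text{Kappa index} = \frac{1686 - (132 \times 115 + 110 \times 127 + 950 \times 995 + 92 \times 47 + 484 \times 484)/1768}{1768 - (132 \times 115 + 110 \times 127 + 950 \times 995 + 92 \times 47 + 484 \times 484)/1768} = 92.4\%$$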
Bases of Confusion Matrix
• Producer accuracy is a measure indicating the probability that the classifier has labeled an image pixel into Class A given that the ground truth is Class A.
• User accuracy is a measure indicating the probability that a pixel is Class A given that the classifier has labeled the pixel into Class A.
• Overall accuracy is total classification accuracy.
• Kappa index (another parameter for overall accuracy) is a more useful index for evaluating accuracy because it discounts the agreement expected by chance.
– Errors of commission represent pixels that belong to another class but are labeled as belonging to the class.
– Errors of omission represent pixels that belong to the ground truth class but that the classification technique has failed to classify into the proper class.
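These measures can all be derived from the counts in the confusion matrix of the crop classification example. The sketch below assumes rows are classification results and columns are ground truth, as in that matrix, and that every class has at least one pixel in both its row and its column; the function name is illustrative.

```python
# Counts from the crop classification example
# (order: Grass, Alfalfa, Cotton, Chili, Fallow).
matrix = [
    [110, 22, 0, 0, 0],
    [5, 105, 0, 0, 0],
    [0, 0, 945, 5, 0],
    [0, 0, 50, 42, 0],
    [0, 0, 0, 0, 484],
]


def accuracy_report(m):
    """Return (overall, user, producer, kappa) for a square confusion matrix."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]                        # pixels labeled as each class
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]  # ground-truth pixels
    diagonal = sum(m[i][i] for i in range(k))                   # correctly classified pixels
    overall = diagonal / n
    user = [m[i][i] / row_totals[i] for i in range(k)]          # 1 - commission error
    producer = [m[j][j] / col_totals[j] for j in range(k)]      # 1 - omission error
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - chance) / (1 - chance)
    return overall, user, producer, kappa


overall, user, producer, kappa = accuracy_report(matrix)
print(f"overall = {overall:.3f}, kappa = {kappa:.3f}")
print("user:", [round(100 * u, 1) for u in user])
print("producer:", [round(100 * p, 1) for p in producer])
```

The kappa expression here is algebraically the same as the slide's form: dividing the numerator and denominator of (N·diagonal − Σ row×column)/(N² − Σ row×column) by N² gives (overall − chance)/(1 − chance).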
Error Propagation
Real World → Conception → Measurement & Representation → Data Conversion and Analysis → Result
Error propagates from each stage to the next.
• The errors in the input will propagate to the output of the operation.
• Error propagation measures the impacts of error (uncertainty) in data on the results of GIS operations.
Finding and Modeling Errors
• Checking for errors
– Visual inspection during data editing and cleaning.
– Attributes can be checked by using annotation, line colors and patterns.
– Double digitizing.
– Statistical analysis may identify extreme values of attributes.
Finding and Modeling Errors
• Error modeling
– 1. Epsilon modeling
  • Based on a method of line generalization, and adapted by Blakemore.
  • It places an error band around a digitized line, describing the probable distribution of error.
  • Error distribution is subject to debate:
    – Normal curve
    – Piecewise quartile distribution
    – Bimodal
  • The epsilon band can be used in analyses to improve the confidence of the user in the result.
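One way to use an epsilon band is in a point-in-polygon test: a point that lies within epsilon of the digitized boundary is only "possibly" inside or outside. The sketch below is a simplified illustration with four labels chosen here for convenience; the band width, the test polygon, and the function names are assumptions, and the error inside the band is treated as uniform rather than following any of the distributions listed above.

```python
import math


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _inside(p, polygon):
    """Ray-casting point-in-polygon test against the digitized boundary."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(p, polygon, epsilon):
    """Label a point relative to a polygon whose boundary has an epsilon band."""
    n = len(polygon)
    d = min(_distance_to_segment(p, polygon[i], polygon[(i + 1) % n]) for i in range(n))
    inside = _inside(p, polygon)
    if d > epsilon:
        return "definitely in" if inside else "definitely out"
    return "possibly in" if inside else "possibly out"


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical digitized polygon
for point in [(5, 5), (9.5, 5), (10.5, 5), (15, 5)]:
    print(point, classify(point, square, epsilon=1.0))
```

Reporting the "possibly" cases separately tells the user which results depend on the digitized line being exactly right.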
Figure 10.17 Point-in-polygon categories of containment. Source: Blakemore (1984)
Finding and Modeling Errors
• Error modeling
– 2. Monte Carlo simulation – used in overlays.
  • Simulates input data error by adding random noise to the line coordinates of the map data.
  • Each input is assumed to be characterized by an estimate of positional error.
  • This changes the shape of the line.
  • The process is repeated multiple times and the randomized data put through the GIS analyses.
  • Output:
    – A number
    – A map
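The procedure above can be sketched for a single polygon, with area standing in for the GIS analysis. This is a minimal illustration under stated assumptions: the noise is Gaussian and independent for each coordinate, `sigma` plays the role of the positional error estimate, and the polygon and parameter values are invented.

```python
import random
import statistics


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area(vertices, sigma, runs=1000, seed=0):
    """Perturb every vertex with Gaussian noise and summarize the resulting areas."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in vertices]
        areas.append(polygon_area(noisy))
    return statistics.mean(areas), statistics.stdev(areas)


parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]  # hypothetical 100 m x 100 m parcel
mean_area, sd_area = simulate_area(parcel, sigma=1.0, runs=500, seed=42)
print(f"area = {mean_area:.0f} +/- {sd_area:.0f} square meters")
```

The output here is "a number" with a spread; keeping each perturbed geometry instead would give the map form of the output.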
Figure 10.18 Simulating effects of DEM error and algorithm uncertainty on derived stream networks
Managing GIS Error
• To manage errors we must track and document them.
• The concepts introduced earlier provide a checklist of quality indicators:
– Accuracy, Precision, Resolution, Generalization, Bias, Compatibility, Completeness and Consistency
• These should be documented for each data layer.
Managing GIS Error
• Data quality information can be used to create a data lineage: a record of the data history that presents essential information about the development of the data.
• This becomes the metadata.
Living with uncertainty
• Uncertainty is inevitable and easier to find,
• use metadata to document the uncertainty,
• use sensitivity analysis to find the impacts of input uncertainty on output,
• rely on multiple sources of data,
• be honest and informative in reporting the results of GIS analysis.
• The US Federal Geographic Data Committee lists five components of data quality: attribute accuracy, positional accuracy, logical consistency, completeness, and lineage (for details see www.fgdc.gov)
Basics of FGDC
• Federal Geographic Data Committee (FGDC) metadata answers the who, what, where, when, how and why questions of geospatial data.
• The data structure and elements defined for FGDC metadata are described fully in the “Content Standard for Digital Geospatial Metadata” (CSDGM).
SEVEN SECTIONS OF FGDC
The Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) organizes a metadata record into seven main sections:
– Identification Information
– Data Quality Information
– Spatial Data Organization Information
– Spatial Reference Information
– Entity and Attribute Information
– Distribution Information
– Metadata Reference Information
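As a rough illustration of how the seven sections map onto a metadata record, the sketch below builds an empty XML skeleton with Python's standard library and checks a record for missing sections. The short tag names (`idinfo`, `dataqual`, and so on) are the ones commonly seen in XML encodings of CSDGM records, but treat them as an assumption and confirm them against the standard itself. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# (assumed XML tag, section title) for the seven main CSDGM sections.
CSDGM_SECTIONS = [
    ("idinfo", "Identification Information"),
    ("dataqual", "Data Quality Information"),
    ("spdoinfo", "Spatial Data Organization Information"),
    ("spref", "Spatial Reference Information"),
    ("eainfo", "Entity and Attribute Information"),
    ("distinfo", "Distribution Information"),
    ("metainfo", "Metadata Reference Information"),
]


def empty_record():
    """Build a skeleton record with one empty element per section."""
    root = ET.Element("metadata")
    for tag, title in CSDGM_SECTIONS:
        section = ET.SubElement(root, tag)
        section.append(ET.Comment(f" {title} "))  # label for human readers
    return root


def missing_sections(root):
    """Return the titles of sections that are absent from a record."""
    return [title for tag, title in CSDGM_SECTIONS if root.find(tag) is None]


record = empty_record()
print(ET.tostring(record, encoding="unicode"))
print("Missing:", missing_sections(record))
```

A real record would fill each section with the elements the standard defines. The skeleton only shows the top-level layout.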
http://www.maine.gov/megis/policies/megisfgdc.rtf
Identification Information
• What is the name of the dataset?
• What is the subject or theme of the information included?
• What is the scale of the dataset?
• What are the attributes of the dataset?
• Where is the geographic location of the dataset?
• Who developed the dataset?
• Who provided the source material for the dataset?
• Who will publish the dataset?
• When were the features of the dataset identified?
• How are the features of the dataset depicted?
• Why was the dataset created?
• Are there restrictions on accessing or using the data?
• Are external files available that are related to the dataset?
Data Quality Information
• How reliable are the data?
• What are their limitations or inconsistencies?
• What is the positional and attribute accuracy?
• Is the dataset complete?
• Were the consistency and content of the data verified?
• Where can the sources of the data be located?
• What processes were applied to these sources and by whom?
Spatial Data Organization
• What spatial data model was used to encode the spatial data?
• How many and what kind of spatial objects are included in the dataset?
• Are methods other than coordinates, such as street addresses, used to encode locations?
Spatial Reference
• Are coordinate locations encoded using longitude and latitude?
• What map projection is used?
• What horizontal datum and/or vertical datum are used?
• What parameters should be used to convert the data to another coordinate system?
Entity and Attribute Information
• What geographic information (roads, houses, elevation, temperature, etc.) is described?
• How is this information coded?
• What do the codes mean?
• What source was used for defining the attributes or codes, e.g. the Cowardin classification?
Distribution
• From whom can the data be obtained?
• What formats are available?
• What media are available?
• Are the data available online?
• What is the price of the data?
Metadata Reference
• When were the metadata compiled, and by whom?
• When was the metadata record created?
• Who is the responsible party?
• When were the metadata last updated?