TRANSCRIPT
Error & Uncertainty: II
CE / ENVE 424/524
Handling Error
Methods for measuring and visualizing error and uncertainty vary for nominal/ordinal and interval/ratio data types.
Uncertainty associated with ‘classification’ data types is usually expressed in terms of a probability of being correctly classified.
Uncertainty associated with quantitative values is usually expressed as a deviation from the true value.
Classification Uncertainty
Example: A satellite image or aerial photograph is processed and some pixels are inaccurately classified.
Confusion Matrix
A confusion matrix contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix.
The entries in the confusion matrix have the following meaning:
• a is the number of correct predictions of class A,
• b is the number of incorrect predictions of class A,
• c is the number of incorrect predictions of class B, and
• d is the number of correct predictions of class B.
                      Actual
                Class A   Class B
Predicted Class A    a        b
          Class B    c        d
Confusion Matrix Example
Overall map accuracy = total on diagonal / grand total
                    Ground Classification
                      A     B     C
Map              A   10     2     3
Classification   B    0    20     0
                 C    4     1    10
Confusion Matrix Example
Overall accuracy (percent correctly classified): (10+20+10)/(10+2+3+0+20+0+4+1+10)= 40/50 = 80%
Error of commission for class A: (2+3)/(10+2+3) = 5/15 = 33% error
Error of omission for class A: (0+4)/(10+0+4) = 4/14 = 29% error
                    Ground Classification
                      A     B     C   Total
Map              A   10     2     3     15
Classification   B    0    20     0     20
                 C    4     1    10     15
          Total      14    23    13     50
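The computations above can be sketched in Python. This is a minimal illustration using the slide's 3-class matrix (rows = map/predicted class, columns = ground/actual class); the function names are my own, not part of any GIS package.

```python
# Confusion matrix from the slides: rows = map class, columns = ground class.
matrix = [
    [10, 2, 3],   # map class A
    [0, 20, 0],   # map class B
    [4, 1, 10],   # map class C
]

n = len(matrix)
grand_total = sum(sum(row) for row in matrix)
diagonal = sum(matrix[i][i] for i in range(n))

# Overall accuracy: total on the diagonal over the grand total.
overall = diagonal / grand_total

def commission_error(m, i):
    # Off-diagonal share of row i: mapped as class i but actually something else.
    row_total = sum(m[i])
    return (row_total - m[i][i]) / row_total

def omission_error(m, j):
    # Off-diagonal share of column j: actually class j but mapped as something else.
    col_total = sum(m[i][j] for i in range(len(m)))
    return (col_total - m[j][j]) / col_total

print(overall)                      # 0.8
print(commission_error(matrix, 0))  # 5/15, about 33%
print(omission_error(matrix, 0))    # 4/14, about 29%
```

The 80%, 33%, and 29% figures on the slide fall out directly.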
User and Producer Perspective
                      A     B     C   Total   User's Accuracy
Map              A   10     2     3     15        0.67
Classification   B    0    20     0     20        1.00
                 C    4     1    10     15        0.67
          Total      14    23    13     50
Producer's Accuracy 0.71  0.87  0.77
Overall Accuracy: 0.80
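As a sketch, both perspectives can be computed from the matrix: user's accuracy is the diagonal entry over its row total, producer's accuracy the diagonal entry over its column total. The matrix is the slide's example; the variable names are my own.

```python
# Confusion matrix: rows = map class, columns = ground class.
matrix = [
    [10, 2, 3],
    [0, 20, 0],
    [4, 1, 10],
]
n = len(matrix)

# User's accuracy (reliability from the map user's perspective): row-wise.
users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]

# Producer's accuracy (from the map maker's perspective): column-wise.
producers = [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]

print([round(u, 2) for u in users])      # [0.67, 1.0, 0.67]
print([round(p, 2) for p in producers])  # [0.71, 0.87, 0.77]
```

Note that user's accuracy is 1 minus the commission error and producer's accuracy is 1 minus the omission error for each class.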
Cohen's Kappa
• A measure of agreement that compares the observed agreement to the agreement expected by chance if the observer ratings were independent.
• Expresses the proportionate reduction in error generated by a classification process, compared with the error of a completely random classification.
– For perfect agreement, kappa = 1.
– A value of 0.82 would imply that the classification process was avoiding 82% of the errors that a completely random classification would generate.
\kappa = \frac{c_{..}\sum_{i=1}^{n} c_{ii} - \sum_{i=1}^{n} c_{i.}\, c_{.i}}{c_{..}^{2} - \sum_{i=1}^{n} c_{i.}\, c_{.i}}

where c_{i.} = the sum over all columns for row i (row total), c_{.j} = the sum over all rows for column j (column total), and c_{..} = the grand total over all rows (or all columns).

The numerator compares the sum of the diagonal entries (observed agreements) with the number of agreements between prediction and actual that would occur by chance.
Kappa is 1 for perfectly accurate data (all N cases on the diagonal) and zero for accuracy no better than chance.
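The kappa formula above can be sketched directly in Python. This is a minimal, self-contained implementation following the slide's notation; the function name is my own.

```python
def cohens_kappa(m):
    """Cohen's kappa from a square confusion matrix (list of rows)."""
    n = len(m)
    c_total = sum(sum(row) for row in m)                            # c..
    diag = sum(m[i][i] for i in range(n))                           # sum of diagonal
    row_sums = [sum(row) for row in m]                              # c_i.
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]   # c_.j
    chance = sum(row_sums[i] * col_sums[i] for i in range(n))       # chance agreements term
    return (c_total * diag - chance) / (c_total ** 2 - chance)

# Applied to the example matrix from the earlier slides:
matrix = [[10, 2, 3], [0, 20, 0], [4, 1, 10]]
print(round(cohens_kappa(matrix), 2))  # 0.69
```

For the 80%-accurate example map, kappa is about 0.69: the classification avoids roughly 69% of the errors a random classification with the same marginals would make.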
Interval/Ratio Data Type Error
Error = Estimated Value – True Value
These errors are often referred to as residuals.
For a set of values, the magnitude of errors is described by the root mean square error (RMSE):
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} x_i^{2}}{n}}

where x_i = the error for observation i and n = the number of observations/values.
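The formula translates into a few lines of Python. This is a minimal sketch; the residual values below are made up purely for illustration.

```python
import math

def rmse(errors):
    """Root mean square error of a list of residuals."""
    n = len(errors)
    return math.sqrt(sum(x * x for x in errors) / n)

# Hypothetical residuals (estimated value minus true value):
residuals = [1.0, -2.0, 0.5, 1.5, -1.0]
print(round(rmse(residuals), 3))  # sqrt(8.5 / 5) ≈ 1.304
```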
Positional Accuracy Assessment Summary Table
Error Scatterplots
The plot on the right is preferable, since its points generally fall closer to the diagonal on which perfect estimates would fall.
Error Distributions
[Figure: three error distributions illustrating negative bias, no bias, and positive bias]
Error Distribution Variance (Spread)
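Bias and spread can be summarized numerically: the mean of the residuals measures bias (its sign distinguishes negative from positive bias), and their standard deviation measures spread. The residuals below are hypothetical values for illustration.

```python
from statistics import mean, stdev

# Hypothetical residuals (estimated minus true values):
residuals = [0.4, -0.1, 0.6, 0.2, 0.5, -0.2, 0.3, 0.7]

bias = mean(residuals)     # a clearly positive mean indicates positive bias
spread = stdev(residuals)  # sample standard deviation measures the spread

print(round(bias, 3), round(spread, 3))
```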
Error Propagation
No data stored in a GIS is truly error-free. When data that are stored in a GIS database are used as input to a GIS operation, then the errors in the input will propagate to the output of the operation. Moreover, the error propagation continues when the output from one operation is used as input to an ensuing operation. Consequently, when no record is kept of the accuracy of intermediate results, it becomes extremely difficult to evaluate the accuracy of the final result.
Although users may be aware that errors propagate through their analyses, in practice they rarely pay attention to this problem. No professional GIS currently in use can present the user with information about the confidence limits that should be associated with the results of an analysis.
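One simple way to attach confidence limits to a result is Monte Carlo error propagation: perturb the inputs according to their assumed error distributions, repeat the operation many times, and examine the spread of the outputs. The sketch below uses a made-up GIS-style operation (area = length × width) with assumed Gaussian input errors; the numbers are illustrative, not from the lecture.

```python
import random
from statistics import mean, stdev

random.seed(42)  # reproducible run

def propagate(n=10000):
    """Monte Carlo propagation of assumed input errors through area = length * width."""
    areas = []
    for _ in range(n):
        length = random.gauss(100.0, 2.0)  # assumed: 100 m with 2 m standard error
        width = random.gauss(50.0, 1.0)    # assumed: 50 m with 1 m standard error
        areas.append(length * width)
    return mean(areas), stdev(areas)

m, s = propagate()
# Mean near 5000 m^2; spread near sqrt((100*1)^2 + (50*2)^2) ≈ 141 m^2
print(round(m, 1), round(s, 1))
```

The output spread is the kind of intermediate-accuracy record the paragraph above says is usually missing.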
Living with It (Error)
• As with any inherent problem, the first step in dealing with it is to admit it is there.
• Document the data quality (metadata)
• Conduct error propagation analysis (ex.: sensitivity analysis)
• Use multiple sources of data
– The more data sources tell you the same story, the more reliable your story (weight of evidence)
Visualization
Overview
• The techniques of effective data display
• How mapping can mislead
• How displays are customised to the requirements of particular applications
Visualization Definitions
“It is a human ability to develop mental representations that allow us to identify patterns and create or impose order” (MacEachren, 1992)
“Visualization is the process of representing information synoptically for the purpose of recognizing, communicating and interpreting pattern and structure. Its domain encompasses the computational, cognitive, and mechanical aspects of generating, organizing, manipulating and comprehending such representations.” (Buttenfield and Mackaness, 1999)
Visualization Principles
• Role of visualization in spatial analysis is not limited to maps but extends to numeric and statistical analysis as well.
• The interpretation of a graph or chart is often more efficient than interpretation based on a string of numbers representing the same data.
• “It is abstraction, not realism that give maps their unique power” (Muehrcke, 1990)
• Visualization is needed to:
– access pertinent information from large volumes of data
– communicate complex patterns effectively
– formalize sound principles for data presentation
– guide analysis, modeling and interpretation
Visualizing Continuous and Discrete Variation
Graphic Variables