TRANSCRIPT
Error & Uncertainty: II
CE / ENVE 424/524
Handling Error
Methods for measuring and visualizing error and uncertainty vary for nominal/ordinal and interval/ratio data types.
Uncertainty associated with ‘classification’ data types is usually expressed in terms of a probability of being correctly classified.
Uncertainty associated with quantitative values is usually expressed as a deviation from the true value.
Classification Uncertainty
Example: A satellite image or aerial photograph is processed and some pixels are inaccurately classified.
Confusion Matrix
A confusion matrix contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix.
The entries in the confusion matrix have the following meaning:
• a is the number of correct predictions of class A,
• b is the number of incorrect predictions of class A,
• c is the number of incorrect predictions of class B, and
• d is the number of correct predictions of class B.
                      Actual
                Class A   Class B
Predicted Class A    a        b
          Class B    c        d
Confusion Matrix Example
Overall map accuracy = total on diagonal / grand total
                    Ground Classification
                      A     B     C
Map              A   10     2     3
Classification   B    0    20     0
                 C    4     1    10
Confusion Matrix Example
Overall accuracy (percent correctly classified): (10+20+10)/(10+2+3+0+20+0+4+1+10)= 40/50 = 80%
Error of commission for class A: (2+3)/(10+2+3) = 5/15 = 33% error
Error of omission for class A: (0+4)/(10+0+4) = 4/14 = 29% error
                    Ground Classification
                      A     B     C   Total
Map              A   10     2     3     15
Classification   B    0    20     0     20
                 C    4     1    10     15
          Total      14    23    13     50
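The computations above can be sketched in Python. This is a minimal illustration using the slide's 3-class matrix (rows = map/predicted class, columns = ground/actual class); the function names are my own, not part of any GIS package.

```python
# Confusion matrix from the slides: rows = map class, columns = ground class.
matrix = [
    [10, 2, 3],   # map class A
    [0, 20, 0],   # map class B
    [4, 1, 10],   # map class C
]

n = len(matrix)
grand_total = sum(sum(row) for row in matrix)
diagonal = sum(matrix[i][i] for i in range(n))

# Overall accuracy: total on the diagonal over the grand total.
overall = diagonal / grand_total

def commission_error(m, i):
    # Off-diagonal share of row i: mapped as class i but actually something else.
    row_total = sum(m[i])
    return (row_total - m[i][i]) / row_total

def omission_error(m, j):
    # Off-diagonal share of column j: actually class j but mapped as something else.
    col_total = sum(m[i][j] for i in range(len(m)))
    return (col_total - m[j][j]) / col_total

print(overall)                      # 0.8
print(commission_error(matrix, 0))  # 5/15, about 33%
print(omission_error(matrix, 0))    # 4/14, about 29%
```

The 80%, 33%, and 29% figures on the slide fall out directly.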
User and Producer Perspective
                      A     B     C   Total   User's Accuracy
Map              A   10     2     3     15        0.67
Classification   B    0    20     0     20        1.00
                 C    4     1    10     15        0.67
          Total      14    23    13     50
Producer's Accuracy 0.71  0.87  0.77
Overall Accuracy: 0.80
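As a sketch, both perspectives can be computed from the matrix: user's accuracy is the diagonal entry over its row total, producer's accuracy the diagonal entry over its column total. The matrix is the slide's example; the variable names are my own.

```python
# Confusion matrix: rows = map class, columns = ground class.
matrix = [
    [10, 2, 3],
    [0, 20, 0],
    [4, 1, 10],
]
n = len(matrix)

# User's accuracy (reliability from the map user's perspective): row-wise.
users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]

# Producer's accuracy (from the map maker's perspective): column-wise.
producers = [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]

print([round(u, 2) for u in users])      # [0.67, 1.0, 0.67]
print([round(p, 2) for p in producers])  # [0.71, 0.87, 0.77]
```

Note that user's accuracy is 1 minus the commission error and producer's accuracy is 1 minus the omission error for each class.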
Cohen's Kappa
• A measure of agreement that compares the observed agreement to the agreement expected by chance if the observer ratings were independent.
• Expresses the proportionate reduction in error generated by a classification process, compared with the error of a completely random classification.
– For perfect agreement, kappa = 1.
– A value of 0.82 would imply that the classification process was avoiding 82% of the errors that a completely random classification would generate.
\kappa = \frac{c_{..}\sum_{i=1}^{n} c_{ii} - \sum_{i=1}^{n} c_{i.}\, c_{.i}}{c_{..}^{2} - \sum_{i=1}^{n} c_{i.}\, c_{.i}}

where c_{i.} = the sum over all columns for row i (row total), c_{.j} = the sum over all rows for column j (column total), and c_{..} = the grand total over all rows (or all columns).

The numerator compares the sum of the diagonal entries (observed agreements) with the number of agreements between prediction and actual that would occur by chance.
Kappa is 1 for perfectly accurate data (all N cases on the diagonal) and zero for accuracy no better than chance.
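The kappa formula above can be sketched directly in Python. This is a minimal, self-contained implementation following the slide's notation; the function name is my own.

```python
def cohens_kappa(m):
    """Cohen's kappa from a square confusion matrix (list of rows)."""
    n = len(m)
    c_total = sum(sum(row) for row in m)                            # c..
    diag = sum(m[i][i] for i in range(n))                           # sum of diagonal
    row_sums = [sum(row) for row in m]                              # c_i.
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]   # c_.j
    chance = sum(row_sums[i] * col_sums[i] for i in range(n))       # chance agreements term
    return (c_total * diag - chance) / (c_total ** 2 - chance)

# Applied to the example matrix from the earlier slides:
matrix = [[10, 2, 3], [0, 20, 0], [4, 1, 10]]
print(round(cohens_kappa(matrix), 2))  # 0.69
```

For the 80%-accurate example map, kappa is about 0.69: the classification avoids roughly 69% of the errors a random classification with the same marginals would make.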
Interval/Ratio Data Type Error
Error = Estimated Value – True Value
These errors are often referred to as residuals.
For a set of values, the magnitude of errors is described by the root mean square error (RMSE):
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} x_i^{2}}{n}}

where x_i = the error for observation i and n = the number of observations/values.
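The formula translates into a few lines of Python. This is a minimal sketch; the residual values below are made up purely for illustration.

```python
import math

def rmse(errors):
    """Root mean square error of a list of residuals."""
    n = len(errors)
    return math.sqrt(sum(x * x for x in errors) / n)

# Hypothetical residuals (estimated value minus true value):
residuals = [1.0, -2.0, 0.5, 1.5, -1.0]
print(round(rmse(residuals), 3))  # sqrt(8.5 / 5) ≈ 1.304
```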
Positional Accuracy Assessment Summary Table
Error Scatterplots
The plot on the right is preferable, since its points generally fall closer to the diagonal on which perfect estimates would fall.
Error Distributions
[Figure: three error distributions illustrating negative bias, no bias, and positive bias]
Error Distribution Variance (Spread)
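Bias and spread can be summarized numerically: the mean of the residuals measures bias (its sign distinguishes negative from positive bias), and their standard deviation measures spread. The residuals below are hypothetical values for illustration.

```python
from statistics import mean, stdev

# Hypothetical residuals (estimated minus true values):
residuals = [0.4, -0.1, 0.6, 0.2, 0.5, -0.2, 0.3, 0.7]

bias = mean(residuals)     # a clearly positive mean indicates positive bias
spread = stdev(residuals)  # sample standard deviation measures the spread

print(round(bias, 3), round(spread, 3))
```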
Error Propagation
No data stored in a GIS is truly error-free. When data that are stored in a GIS database are used as input to a GIS operation, then the errors in the input will propagate to the output of the operation. Moreover, the error propagation continues when the output from one operation is used as input to an ensuing operation. Consequently, when no record is kept of the accuracy of intermediate results, it becomes extremely difficult to evaluate the accuracy of the final result.
Although users may be aware that errors propagate through their analyses, in practice they rarely pay attention to this problem. No professional GIS currently in use can present the user with information about the confidence limits that should be associated with the results of an analysis.
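One simple way to attach confidence limits to a result is Monte Carlo error propagation: perturb the inputs according to their assumed error distributions, repeat the operation many times, and examine the spread of the outputs. The sketch below uses a made-up GIS-style operation (area = length × width) with assumed Gaussian input errors; the numbers are illustrative, not from the lecture.

```python
import random
from statistics import mean, stdev

random.seed(42)  # reproducible run

def propagate(n=10000):
    """Monte Carlo propagation of assumed input errors through area = length * width."""
    areas = []
    for _ in range(n):
        length = random.gauss(100.0, 2.0)  # assumed: 100 m with 2 m standard error
        width = random.gauss(50.0, 1.0)    # assumed: 50 m with 1 m standard error
        areas.append(length * width)
    return mean(areas), stdev(areas)

m, s = propagate()
# Mean near 5000 m^2; spread near sqrt((100*1)^2 + (50*2)^2) ≈ 141 m^2
print(round(m, 1), round(s, 1))
```

The output spread is the kind of intermediate-accuracy record the paragraph above says is usually missing.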
Living with It (Error)
• As with any inherent problem, the first step in dealing with it is to admit it is there.
• Document the data quality (metadata)
• Conduct error propagation analysis (ex.: sensitivity analysis)
• Use multiple sources of data
– The more data sources tell you the same story, the more reliable your story (weight of evidence)
Visualization
Overview
• The techniques of effective data display
• How mapping can mislead
• How displays are customised to the requirements of particular applications
Visualization Definitions
“It is a human ability to develop mental representations that allow us to identify patterns and create or impose order” (MacEachren, 1992)
“Visualization is the process of representing information synoptically for the purpose of recognizing, communicating and interpreting pattern and structure. Its domain encompasses the computational, cognitive, and mechanical aspects of generating, organizing, manipulating and comprehending such representations.” (Buttenfield and Mackaness, 1999)
Visualization Principles
• Role of visualization in spatial analysis is not limited to maps but extends to numeric and statistical analysis as well.
• The interpretation of a graph or chart is often more efficient than interpretation based on a string of numbers representing the same data.
• “It is abstraction, not realism that give maps their unique power” (Muehrcke, 1990)
• Visualization is needed to:
– access pertinent information from large volumes of data
– communicate complex patterns effectively
– formalize sound principles for data presentation
– guide analysis, modeling and interpretation
Visualizing Continuous and Discrete Variation
Graphic Variables