digital image analysis

DIGITAL IMAGE ANALYSIS

Image Classification: Supervised Classification

• Image classification
  – Quantitative analysis used to automate the identification of features
  – Spectral pattern recognition
• Unsupervised classification
• Supervised classification
• Object-based classification

DIGITAL IMAGE ANALYSIS STEPS

• Fundamental difference:
  – Unsupervised classification: assigning meaningful names to spectrally-similar clusters
  – Supervised classification: assigning spectral clusters to meaningful names
  – Object-based classification: a mash-up of both
• Basic problem:
  – Spectrally-similar clusters can be associated with several distinct land covers
  – Land covers can be composed of several spectrally-distinct areas

THE YIN YANG OF CLASSIFICATION

• Unsupervised classification does not require the analyst to know anything about the area being classified prior to performing the classification.

– The difficult part comes after, when names have to be associated with each cluster.

• Supervised classification requires the analyst to define, a priori, the classes (or features) that will be identified in the image.

– The difficult part occurs in the beginning, when the analyst has to identify (relatively) homogeneous areas in the image that correspond to the classes.

THE YIN YANG OF CLASSIFICATION

• Information classes are those categories of interest that the analyst is actually trying to identify in the imagery, such as different kinds of crops, different forest types or tree species, different geologic units or rock types, etc.

• Spectral classes are groups of pixels that are uniform (or near-similar) with respect to their digital numbers in the different bands of the data.

• The objective is to match the spectral classes in the data to the information classes of interest.

• Rarely is there a simple one-to-one match between these two types of classes.

SUMMARY: THE DIFFERENT CLASSES

• Object-based classification:
  – Start by doing an unsupervised (but highly parameterized) classification that creates ‘segments’
    • Uses both spectral similarity as well as spatial contiguity in creating the segments
    • You control how ‘similar’ the spectral and spatial classes have to be.
  – Then, using training sites, perform a supervised classification on the segments.

OBJECT-BASED CLASSIFICATION

[Figure: spectral classes versus information classes]

• The process typically involves four main steps:
  1. Identifying training areas for each land cover class
  2. Creating signatures for those training areas (spectral response patterns associated with each land cover class)
  3. Classifying the image
     – Simplification of the classified image may occur
  4. Determining the classification accuracy

SUPERVISED CLASSIFICATION

Named classes

Training sites

Classified image

Accuracy Assessment

• The first step in a supervised classification is to decide: What classes or features am I interested in?

• The analyst may only be interested in identifying broad (level 1) classes (e.g., forest, urban, water, soil), or may need to delineate fine (level 2) classes (e.g., deciduous forests, coniferous forests, mixed forests, recent clearcuts, high-density residential, low-density residential, commercial, industrial).

STEP ONE: WHAT DO I WANT?

• The greater the number of classes, the less accurate most classifications become, so the need for fine classes must be balanced against the increasing uncertainty associated with finer classes.

• Once the analyst has decided which classes are required, identifying areas in the image that represent the range of spectral responses typically associated with that class is the next step.

• How to do this?

STEP ONE: WHAT DO I WANT?

http://soils.cals.uidaho.edu/soil205-90/Lecture%202.htm


• Identifying training sites:
  – Use higher resolution imagery (e.g., an aerial photograph, imagery from a higher resolution sensor)
  – Use an existing map (e.g., GIS layers)
  – Field observations (using a GPS to provide georeferenced data)
• Problems?
  – Temporal mismatch may occur
  – Spatial mismatch might occur
  – Expensive to conduct field studies
  – Variation in spectral response may occur because of topographic complexity, etc.

STEP ONE: HOW DO I IDENTIFY SITES?

http://www.intechopen.com/books/biomass-and-remote-sensing-of-biomass/introduction-to-remote-sensing-of-biomass


• Identify training sites: Selecting homogeneous areas in the image that correspond to the land cover classes that you are interested in.

• This involves digitizing polygons (areas) that delimit the training sites--each training site should contain pixels that belong to one of the land cover classes, and should not contain pixels that belong to another class.

• Being too conservative or restrictive in selecting pixels for a given class can lead to problems, because most classes encompass a range of spectral responses. A balance is required.

STEP ONE: DELIMITING TRAINING SITES

For example, in creating training sites for an urban class, selecting only the very dense urban core may leave out areas that are less dense but still urban.

STEP ONE: SELECTING TRAINING SITES

• Problems also arise when attempting to delimit, for example, forested areas, when the land cover exhibits variable density.

• Where to ‘draw the line’ is often not easily decided.



Training sites for each class should, in total, encompass at least 10 times as many pixels as there are bands in the image being classified.

STEP TWO: SIGNATURE DEVELOPMENT

STEP TWO: EXAMINING SIGNATURES

A scatterplot of training sites’ DNs can show you how much overlap there may be amongst the classes.

• Creating signatures: once the training areas have been delineated, the statistical characteristics of the digital numbers for each class must be determined.

• This question must be answered:
  – Are the DNs (i.e., spectral response patterns) of the different classes sufficiently distinct that the DNs of an unknown pixel can be confidently assigned to one class?

STEP TWO: CREATING SIGNATURES

[Figure: an unknown pixel plotted among the Forest, Corn, and Hay training classes]


I have digitized some training sites on the Houston image.


Some features are well differentiated, while others may have very similar spectral response patterns.

If, after examining the histograms, you observe that two classes completely overlap, you may need to create new training sites that better differentiate between the two classes, or decide that the two classes should be combined into one.

Using the Training Samples Manager you can create scatter plots for your training sites.

SCATTER PLOTS

TRAINING SITE STATISTICS

http://kingfish.coastal.edu/marine/Animations/


• If the spectral response patterns associated with the training sites cannot be clearly differentiated in the histograms / scattergrams, then it will be difficult if not impossible to assign unknown pixels to the appropriate class in the next stage (classification of the image).

• Therefore, creating good training sites and examining the statistical outputs is a very important determinant of the quality of the final output.

STEP TWO: INITIAL QUALITY CONTROL

http://www.crcpress.com/product/isbn/9781566704434


• There are a wide variety of image classification techniques available, some of which are known to produce poor quality classifications, but none of which will consistently provide the ‘best’ classification.

– That is, the quality of a classification depends on factors such as the image quality itself (e.g., was there a lot of haze in the air when the image was taken?) and the selection of the training sites, which might not adequately represent the spectral responses of the class. No one method can therefore be guaranteed to produce a consistently accurate result.

STEP THREE: CLASSIFICATION

• The basic process that every image classification routine follows is this:

– Use the training sites (‘known’ pixels) and the spectral response patterns derived from them to create ‘idealized’ signatures for each class.

– Determine how closely each idealized signature matches the spectral response pattern of an unassigned (unknown) pixel.

– Either assign that pixel to the class associated with the best match (a ‘hard’ classifier), or assign to the pixel a membership value for each class, ranging from 0 to 100% (e.g., 45%) (a ‘soft’ classifier).


RECALLING: SIGNATURE DEVELOPMENT


The different classification methods generally differ in how they compare the unknown pixel DN values to the idealized signatures.

[Figure: an unknown pixel plotted among the Forest, Corn, and Hay training classes]

• Some of the hard supervised classification methods that have been developed include:
  – Parallelepiped
  – Minimum-distance-to-means
  – Gaussian Maximum Likelihood

CLASSIFICATION METHODS


Parallelepiped

The parallelepiped classification strategy is computationally simple and takes into account the variance in training classes, but problems may arise from parallelepiped overlap due to correlation amongst classes. In this classifier, multidimensional boxes are constructed for each class using the class mean and standard deviation. Each pixel is tested against each box to determine its membership.

Pixel 1: hay; Pixel 2: urban
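The box test described above can be sketched in a few lines of Python. This is a minimal illustration, not ArcMap's implementation: the training DNs are made up, and the box half-width of two standard deviations (`k=2.0`) is an assumption chosen for the example.

```python
from statistics import mean, stdev

def build_boxes(training, k=2.0):
    """For each class, return per-band (low, high) limits: mean -/+ k * std dev."""
    boxes = {}
    for name, pixels in training.items():
        bands = list(zip(*pixels))  # regroup the DNs band by band
        boxes[name] = [(mean(b) - k * stdev(b), mean(b) + k * stdev(b)) for b in bands]
    return boxes

def classify_parallelepiped(pixel, boxes):
    """Return every class whose box contains the pixel (empty list = unclassified)."""
    return [name for name, limits in boxes.items()
            if all(lo <= dn <= hi for dn, (lo, hi) in zip(pixel, limits))]

# Hypothetical two-band training pixels (band 1 DN, band 2 DN)
training = {
    "water": [(10, 5), (12, 6), (11, 4), (13, 7)],
    "forest": [(30, 60), (32, 64), (28, 58), (34, 62)],
}
boxes = build_boxes(training)
inside_water = classify_parallelepiped((11, 5), boxes)   # falls in the water box
outside_all = classify_parallelepiped((90, 90), boxes)   # falls in no box
print(inside_water, outside_all)
```

A pixel that falls inside two overlapping boxes comes back with two class names, which is exactly the overlap problem noted above; a real classifier needs a tie-breaking rule.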


Minimum distance to mean

The minimum-distance-to-means strategy is mathematically simple and computationally efficient, but it is insensitive to different degrees of variance or covariance in the spectral response of training pixels. In this classifier, the pixel is allocated to the class whose mean is closest. You can specify a threshold: how close a pixel’s DNs must be to a class mean before it can be assigned to that class. The larger (wider) the threshold, the ‘fuzzier’ or less precise the results will be (but fewer pixels will remain unclassified).

Pixel 1: corn; Pixel 2: sand
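A minimal Python sketch of the minimum-distance-to-means rule follows. It assumes Euclidean distance in DN space and expresses the threshold in the same DN units; the training values are hypothetical.

```python
import math

def class_means(training):
    """Mean DN per band for each class."""
    return {name: [sum(band) / len(band) for band in zip(*pixels)]
            for name, pixels in training.items()}

def classify_min_distance(pixel, means, threshold=None):
    """Assign the pixel to the class with the nearest mean.

    Returns None (unclassified) if the nearest mean is farther than threshold.
    """
    best, best_dist = None, math.inf
    for name, centre in means.items():
        dist = math.dist(pixel, centre)  # Euclidean distance
        if dist < best_dist:
            best, best_dist = name, dist
    if threshold is not None and best_dist > threshold:
        return None
    return best

# Hypothetical two-band training pixels (band 1 DN, band 2 DN)
training = {
    "water": [(10, 5), (12, 6), (11, 4), (13, 7)],
    "forest": [(30, 60), (32, 64), (28, 58), (34, 62)],
}
means = class_means(training)
no_threshold = classify_min_distance((15, 10), means)
tight_threshold = classify_min_distance((15, 10), means, threshold=3.0)
print(no_threshold, tight_threshold)
```

The same pixel is labelled water when no threshold is set and left unclassified under the tight threshold, which shows the trade-off between coverage and precision.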

• The Gaussian Maximum Likelihood classifier quantitatively evaluates both the variance (internal variability in a band) and covariance (the similarity between bands) of training class pixels and assumes a normal distribution for training classes (viewable in ArcMap’s Show Statistics—Training Sample Manager). This classifier has been widely used in remote sensing.

• This method typically produces an accurate classification, but assumes the most about how the data (the DNs summarized from the training sites) is distributed.



Maximum likelihood

Takes into account the mean and the standard deviation (variability) of the DNs associated with each land cover class (and the variability in the DNs associated with each training site between the different bands).

Pixel 1: corn; Pixel 2: urban

MAXIMUM-LIKELIHOOD CLASSIFICATION

The standard deviation ellipses associated with two classes. One class (B) displays far greater variability than the other (A), and therefore pixels that are physically closer to A are nonetheless associated with the broader class B, since they are statistically closer to B.

[Figure: standard deviation ellipses for Class A and Class B, plotted on Band 1 and Band 2 axes]
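The Class A / Class B effect can be reproduced with a small two-band sketch. It assumes equal prior probabilities and a Gaussian model per class, uses made-up training pixels, and hard-codes the 2x2 covariance inverse so that no external library is needed.

```python
import math

def fit_signature(pixels):
    """Mean vector and sample covariance terms for two-band training pixels."""
    n = len(pixels)
    m1 = sum(p[0] for p in pixels) / n
    m2 = sum(p[1] for p in pixels) / n
    s11 = sum((p[0] - m1) ** 2 for p in pixels) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in pixels) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pixels) / (n - 1)
    return (m1, m2), (s11, s12, s22)

def log_likelihood(pixel, signature):
    """Gaussian log-likelihood, omitting the constant shared by all classes."""
    (m1, m2), (s11, s12, s22) = signature
    det = s11 * s22 - s12 ** 2
    d1, d2 = pixel[0] - m1, pixel[1] - m2
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * (math.log(det) + maha)

def classify_ml(pixel, signatures):
    """Assign the pixel to the class with the highest log-likelihood."""
    return max(signatures, key=lambda name: log_likelihood(pixel, signatures[name]))

# Hypothetical training pixels: A is tight, B is broad
training = {
    "A": [(19, 20), (21, 20), (20, 19), (20, 21), (20, 20)],
    "B": [(30, 40), (50, 40), (40, 30), (40, 50), (40, 40)],
}
signatures = {name: fit_signature(pixels) for name, pixels in training.items()}

pixel = (28, 28)
nearest_mean = min(signatures, key=lambda name: math.dist(pixel, signatures[name][0]))
ml_class = classify_ml(pixel, signatures)
print(nearest_mean, ml_class)
```

The pixel is nearer to the mean of A in DN space, yet the maximum-likelihood rule assigns it to B because B's much larger variance makes that pixel far more plausible under B.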

• Soft or fuzzy classifiers assign each pixel a degree of membership in each class, based on its spectral reflectance values.

• So, if you have identified seven land cover classes, a soft classifier will create one image for each land cover class.

• Within each image, the pixels will have a value ranging from 0.0 to 1.0, reflecting the degree to which that pixel matches the spectral reflectance pattern of the training site for that class.

• Roughly equivalent to ArcMap’s Class Probability

SUPERVISED CLASSIFICATION: SOFTLY

Hard classification

Soft classification

Class probabilities

• An uncertainty image can also be produced--indicating the degree to which no single class stands out above the others.

• That is, if a pixel appears to belong to every class with equal likelihood, then the uncertainty as to which single class it should be assigned to is at a maximum (14). (We are not confident as to which class that pixel should belong to.)

• However, if the pixel appears to belong to only one class, then the uncertainty will be a minimum (1). (We are confident that the pixel should be assigned to a specific class.)

• ArcMap’s Output Confidence raster produces this output.

SOFTLY, WITH CONFIDENCE?

ARCMAP’S OUTPUT CONFIDENCE RASTER

1 – most confident
14 – least confident


Consider, for example, the spectral reflectance values for pixel 2, which would contain the spectral reflectances of grass, trees, concrete (the patio) and roofing tiles (asphalt).

A hard classifier might assign that pixel to grass, since that occupies the largest proportion of the pixel.

A soft classifier might identify that 30% of the pixel is asphalt, 10% is concrete, 20% is tree, and 40% is grass.

The uncertainty associated with pixel 2 would be very high. However, for pixel A the uncertainty would be lower, since a clear majority of the pixel is grass.

• As with hard supervised classification methods, there are several different approaches to assigning an unknown pixel to an ‘idealized’ spectral response pattern.

• The different approaches vary in how they handle the uncertainty (that is, some assume that a pixel must belong to at least one of the land cover classes, while others assume that a pixel may, in fact, belong to a land cover class not identified by the analyst).

• Typically the next step would be to ‘harden’ the results of a ‘soft’ classification and produce a hard (i.e., single) image.
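Hardening can be sketched as picking the class with the largest membership. The example below reuses the mixed-pixel proportions given earlier (30% asphalt, 10% concrete, 20% tree, 40% grass). The `uncertainty` function is an illustrative measure only, scaled from 0 to 1 and assuming the memberships sum to 1; it is not ArcMap's 1 to 14 confidence scale.

```python
def harden(memberships):
    """Return the class with the highest membership value."""
    return max(memberships, key=memberships.get)

def uncertainty(memberships):
    """0.0 when one class holds all the membership, 1.0 when all classes are equal.

    Illustrative measure; assumes the memberships sum to 1.
    """
    n = len(memberships)
    top = max(memberships.values())
    return 1.0 - (top - 1.0 / n) / (1.0 - 1.0 / n)

mixed_pixel = {"asphalt": 0.30, "concrete": 0.10, "tree": 0.20, "grass": 0.40}
pure_pixel = {"asphalt": 0.0, "concrete": 0.0, "tree": 0.0, "grass": 1.0}

hard_class = harden(mixed_pixel)
mixed_u = uncertainty(mixed_pixel)
pure_u = uncertainty(pure_pixel)
print(hard_class, round(mixed_u, 2), round(pure_u, 2))
```

Both pixels harden to grass, but the uncertainty value records how much weaker the evidence is for the mixed pixel.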


http://www.clarklabs.org/applications/upload/CS_LCM_SpatialPriors1-6-2.pdf


SOFT CLASSIFIERS

Pixels can belong to more than one class.

Often after classifying an image (and assuming that 100% of the pixels have been assigned to a land cover class) we find that there are many ‘orphan’ or non-connected pixels. These pixels may reflect reality (that is, there may be, in fact, a bare patch of concrete in the midst of a forest), or they may reflect a misclassified [mixed] pixel.

REMOVING ‘DUST’: GENERALIZATION

• Generalization works by reassigning a pixel to the most commonly occurring class (i.e., the modal class) surrounding the isolated pixel.

• The result is a ‘smoother’ image that will be easier to work with (especially if the classified image will be exported to a GIS).
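A modal (majority) filter can be sketched as follows. This version assumes a 3x3 window that includes the centre cell and is clipped at the image edges; ties go to whichever class appears first in the window. These are choices made for the illustration, not a description of any particular ArcMap tool.

```python
from collections import Counter

def modal_filter(grid):
    """Replace each cell with the most common class in its 3x3 neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # write to a copy so reads use original values
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out

# Hypothetical classified image: forest (F) with one isolated concrete (C) pixel
classified = [
    ["F", "F", "F", "F", "F"],
    ["F", "F", "F", "F", "F"],
    ["F", "F", "C", "F", "F"],
    ["F", "F", "F", "F", "F"],
    ["F", "F", "F", "F", "F"],
]
smoothed = modal_filter(classified)
print(smoothed[2][2])
```

The isolated pixel is absorbed into the surrounding forest class, which is the ‘dust removal’ effect described above. If the concrete patch is real, the filter removes it just the same.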

GENERALIZATION: MODAL FILTERS

• Filtering is used for a variety of purposes.
  – Mean and Gaussian filters are commonly used to generalize an image (pre-classification). (Why pre?)
  – The median filter is excellent for random noise removal.
  – Mode filters are good for filling gaps between polygons after a vector-to-raster conversion.
  – Edge enhancement filters accentuate areas of change in continuous surfaces.
  – High-pass filters emphasize areas of abrupt change relative to those of gradual change.

GENERALIZATION: FILTERING THE DATA

SOME OF THE FILTERS AVAILABLE IN ARCMAP

http://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/what-are-the-functions-used-by-a-raster-or-mosaic-dataset.htm




SOME OF THE FILTERS AVAILABLE IN ARCMAP

5 10 5

10 30 10

5 10 5

5 10 5

10 50 10

5 10 5





• How to measure the accuracy of a classified image?

• By comparing the classified image to an independent source of information.

• The confusion or misclassification matrix compares recorded classes (that is, the classified pixels) with data obtained by a more accurate process, or from a more accurate source (e.g., an aerial photograph).

STEP FOUR: ACCURACY ASSESSMENT

• The Situation
  – You’ve just created a classified map for your clients.
  – You need to tell them how well it actually represents what’s out there (how confident you are about the results).

• Without an accuracy assessment, a classified map is just a pretty picture. Why?

ACCURACY ASSESSMENT

• Overview
  – Collect reference data: “ground truth” (similar to training sites, but should not have been used in the classification)
    • Determination of class types at specific locations
  – Compare reference data to classified map
    • Does class type on classified map = class type determined from reference data?

ACCURACY ASSESSMENT

• Issue 1: Choosing an appropriate reference source
  – Ensure that you can extract from the reference source the specific information needed to confirm the accuracy of the classification scheme
    • For example, aerial photos may not be good reference sources if your classification scheme distinguishes four species of grass. You may need GPS’d ground data in order to distinguish the different species.

AA: REFERENCE DATA

• Issue 2: Determining size of reference plots
  – Match spatial scale of reference plots and remotely-sensed data
    • e.g., GPS’d ground plots 5 meters on a side may not be as relevant if remotely-sensed cells are 1 km on a side. You may need aerial photos or even other satellite images.

AA: REFERENCE DATA

• Issue 3: Determining position and number of samples
  – Make sure to adequately sample the landscape
  – Variety of sampling schemes
    • Random, stratified random, systematic, etc.
  – The more reference plots, the better
    • You can estimate how many you need statistically
    • In reality, you rarely get enough (time and $)
    • Lillesand and Kiefer suggest 50 per class as a rule of thumb

AA: REFERENCE DATA

• Examining every pixel is not practical.
• Therefore, a random sample of pixels is identified and the ‘true’ land covers associated with them are determined, either through field work (best) or by examining higher-resolution imagery.
• Rarer classes should be sampled more in order to reliably assess their accuracy
  – sampling is often stratified by class

COLLECTING THE REFERENCE DATA


SAMPLING METHODS

Simple Random Sampling: observations are randomly placed.

Stratified Random Sampling: a minimum number of observations are randomly placed in each category.

SAMPLING METHODS

Systematic Sampling: observations are placed at equal intervals according to a strategy.

Systematic Non-Aligned Sampling: a grid provides even distribution of randomly placed observations.
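Stratified random sampling of a classified image can be sketched as below: cells are grouped by mapped class and the same number is drawn at random from each group. The grid, the `per_class` count, and the fixed seed are all hypothetical choices for the illustration.

```python
import random

def stratified_random_sample(classified, per_class, seed=0):
    """Pick up to per_class random (row, col) cells from each mapped class."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    cells_by_class = {}
    for r, row in enumerate(classified):
        for c, label in enumerate(row):
            cells_by_class.setdefault(label, []).append((r, c))
    return {label: rng.sample(cells, min(per_class, len(cells)))
            for label, cells in cells_by_class.items()}

# Hypothetical classified image: forest (F), water (W), urban (U)
classified = [
    ["F", "F", "W"],
    ["F", "U", "W"],
    ["F", "F", "F"],
]
sample = stratified_random_sample(classified, per_class=2)
print({label: len(cells) for label, cells in sample.items()})
```

Under simple random sampling the single urban cell would probably be missed altogether; stratifying by class guarantees that rare classes appear in the reference sample.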

• Having chosen the appropriate reference source, plot size, and locations:
  – Determine class types from reference source
  – Identify class type present in classified map

• Compare them!

AA: REFERENCE DATA

• Example:

AA: COMPARE

Reference plot ID   Class determined from reference source   Class claimed on classified map   Agreement?
1                   Conifer                                  Conifer                           Yes
2                   Hardwood                                 Conifer                           No
3                   Water                                    Water                             Yes
4                   Hardwood                                 Hardwood                          Yes
5                   Grass                                    Hardwood                          No
6                   Etc.

Summarize using an error matrix

AA: ERROR MATRIX

Error matrix (number of plots). Rows: class types determined from classified map. Columns: class types determined from reference source.

# Plots    Conifer   Hardwood   Water   Totals
Conifer         50          5       2       57
Hardwood        14         13       0       27
Water            3          5       8       16
Totals          67         23      10      100

• Quantifying accuracy
  – Total Accuracy: Number of correct plots / total number of plots

AA: TOTAL ACCURACY


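Rows: class types determined from classified map. Columns: class types determined from reference source.

# Plots    Conifer   Hardwood   Water   Totals
Conifer         50          5       2       57
Hardwood        14         13       0       27
Water            3          5       8       16
Totals          67         23      10      100

$$\text{Total Accuracy} = \frac{50 + 13 + 8}{100} \times 100 = 71\%$$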

Diagonals represent sites classified correctly according to the reference data.

Off-diagonals were mis-classified pixels.
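The total accuracy calculation can be checked with a short Python sketch. The matrix values are those of the error matrix above (rows from the classified map, columns from the reference source); the function name is illustrative.

```python
# Rows: classified map; columns: reference source (Conifer, Hardwood, Water)
error_matrix = [
    [50, 5, 2],
    [14, 13, 0],
    [3, 5, 8],
]

def total_accuracy(matrix):
    """Sum of the diagonal (correct plots) divided by the total number of plots."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

accuracy = total_accuracy(error_matrix)
print(f"{accuracy:.0%}")
```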

• Problem with total accuracy:
  – Summary value is an average
    • Does not reveal if error was evenly distributed between classes or if some classes were very poorly identified and others very accurately identified
• Therefore, include other forms of quantifying accuracy:
  – User’s accuracy
  – Producer’s accuracy
  – Kappa coefficient

AA: TOTAL ACCURACY

• User’s accuracy corresponds to error of commission (inclusion):
  – e.g., 5 hardwood and 2 water sites erroneously included in the conifer category
• Producer’s accuracy corresponds to error of omission (exclusion):
  – e.g., 14 conifer sites mapped as hardwood and 3 mapped as water, and so omitted from the conifer category

USER’S AND PRODUCER’S ACCURACY AND TYPES OF ERROR


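Rows: class types determined from classified map. Columns: class types determined from reference source.

# Plots    Conifer   Hardwood   Water   Totals
Conifer         50          5       2       57
Hardwood        14         13       0       27
Water            3          5       8       16
Totals          67         23      10      100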

• From the perspective of the user (the ‘client’) of the classified map, how accurate is the map?

– For a given class, how many of the pixels on the map are actually what they say they are? Conversely, in a given class, how many pixels were classified erroneously (commission)?

– Calculated as:

$$\text{User's accuracy} = \frac{\text{Number correctly identified in a given map class}}{\text{Number in the classified map assigned to that map class}}$$

AA: USER’S ACCURACY

• From the perspective of the maker (producer) of the classified map, how accurate is the map?

– For a given class in the reference plots, how many of the field sample plots are labeled correctly in the map? Conversely, how many of the field samples were, on the map, classified as a different class (omission)?

– Calculated as:

$$\text{Producer's accuracy} = \frac{\text{Number of ref. plots that matched their mapped class}}{\text{Number of ref. plots actually collected for that class}}$$

AA: PRODUCER’S ACCURACY
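Both measures can be computed from the example error matrix with the sketch below. It assumes the layout used in these slides (rows from the classified map, columns from the reference source), so user's accuracy divides by row totals and producer's accuracy by column totals. Function names are illustrative.

```python
classes = ["Conifer", "Hardwood", "Water"]
# Rows: classified map; columns: reference source
error_matrix = [
    [50, 5, 2],
    [14, 13, 0],
    [3, 5, 8],
]

def users_accuracy(matrix):
    """Diagonal divided by row total: of the plots mapped as a class, how many are right?"""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]

def producers_accuracy(matrix):
    """Diagonal divided by column total: of the reference plots in a class, how many were found?"""
    return [matrix[i][i] / sum(row[i] for row in matrix) for i in range(len(matrix))]

users = users_accuracy(error_matrix)
producers = producers_accuracy(error_matrix)
for name, u, p in zip(classes, users, producers):
    print(f"{name}: user's {u:.0%}, producer's {p:.0%}")
```

For conifer the two measures differ noticeably: 50 of the 57 plots mapped as conifer are correct, but only 50 of the 67 reference conifer plots were mapped as conifer.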

$$\hat{K} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} \, x_{+i}}{N^2 - \sum_{i=1}^{r} x_{i+} \, x_{+i}}$$

• where $r$ is the number of rows in the confusion (error) matrix, $x_{ii}$ is the number of observations in row $i$ and column $i$ (on the major diagonal), $x_{i+}$ is the total of observations in row $i$, $x_{+i}$ is the total of observations in column $i$, and $N$ is the total number of observations included in the matrix.

KAPPA COEFFICIENT

$$\hat{K} = \frac{\text{observed accuracy} - \text{chance agreement}}{1 - \text{chance agreement}}$$

http://justusrandolph.net/kappa/


ACCURACY ASSESSMENT: KAPPA


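Rows: class types determined from classified map. Columns: class types determined from reference source.

# Plots                Conifer   Hardwood   Water   Totals   User’s Accuracy
Conifer                     50          5       2       57   88%
Hardwood                    14         13       0       27   48%
Water                        3          5       8       16   50%
Totals                      67         23      10      100
Producer’s Accuracy        75%        57%     80%            Total: 71%

K̂ = 46%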

ACCURACY ASSESSMENT: KAPPA

K̂ = 46%

Kappa = [ 100 * (50 + 13 + 8) - ((67*57) + (23*27) + (10*16)) ] / [ 100*100 - ((67*57) + (23*27) + (10*16)) ]

Kappa = [ 7100 - (3819 + 621 + 160) ] / [ 10000 - (3819 + 621 + 160) ]

Kappa = 2500 / 5400 = 0.463 (46.3%)
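The same arithmetic can be written as a small Python function, which makes it easy to recompute Kappa for any error matrix. The function name is illustrative; the formula is the one worked through above.

```python
# Rows: classified map; columns: reference source (Conifer, Hardwood, Water)
error_matrix = [
    [50, 5, 2],
    [14, 13, 0],
    [3, 5, 8],
]

def kappa(matrix):
    """Kappa from an error matrix: (N * diagonal - chance) / (N^2 - chance)."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)                # total observations, N
    diagonal = sum(matrix[i][i] for i in range(size))  # correctly classified plots
    # Sum over classes of (row total * column total)
    chance = sum(sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size))
    return (n * diagonal - chance) / (n * n - chance)

k_hat = kappa(error_matrix)
print(f"{k_hat:.3f}")
```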

• Supervised classification:
  – Know the information classes (land cover classes) desired.
  – Identify in the image areas that contain the land covers (training sites digitized).
  – Develop idealized spectral response patterns for each information class.

SUMMARY

• Supervised classification:
  – Based on the idealized spectral responses, assign each pixel to a single class (hard), or determine the degree of membership to each class (soft). Also need to consider thresholds (all pixels -> information classes, or some pixels -> unknown class)

– ‘Clean’ the image, if necessary (modal filters, boundary clean)

– Determine the accuracy of the classification.

SUMMARY
