
Reclassification Methods

From an important research topic to trivial computer functions

Is it too easy?

In the past

• Important cartographic process: map-makers only had one chance to do it right. It is a ratio/interval (R/I) to nominal/ordinal (N/O) transformation.

• The goal was to classify attribute information into categories that best preserved the distribution of the data and conveyed its meaning, based on the objective of the map composition.

• Researchers developed many reclassification techniques with different advantages. Each will give a different representation.

• Today, within a GIS, it is easy to classify data, and it is sometimes done with little thought (e.g. by accepting the default).

• The importance of creating meaningful visualizations that convey information to stakeholders has not changed.

Factors to Consider

• Distribution of the data (Uniform, Gaussian, Gamma, etc.).

• Audience (e.g. scientific vs. lay)

• Goals and Objectives

– Highlight Rare

– Highlight Common

– Highlight areas of importance

– Best preserves the distribution

Data Distribution

• Always look at the distribution of your data. Histograms are useful.

• You can always change the representation by changing the number of classes. However, most people cannot make distinctions beyond 10 categories.

• 2000 Census Tract Data, Tucson

– Proportion of the population aged 18-29.
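A quick way to look at a distribution before choosing a scheme is a text histogram. The sketch below is only an illustration: the function name `text_histogram`, the bin count, and the sample values are assumptions, not the Tucson census data.

```python
def text_histogram(values, n_bins=5, width=40):
    """Count values in equal-width bins and draw one bar of '#' per bin."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        index = min(int((v - low) / span * n_bins), n_bins - 1)
        counts[index] += 1
    peak = max(counts)
    lines = []
    for b, count in enumerate(counts):
        left_edge = low + span * b / n_bins
        bar = "#" * round(width * count / peak)
        lines.append(f"{left_edge:8.2f} | {bar} {count}")
    return counts, lines


# Hypothetical sample values.
counts, lines = text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5])
print("\n".join(lines))
```

A skewed or multi-peaked shape in this output is a hint that the default scheme may not be the best choice.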

Data Distribution

• 2000 Census Data, Tucson

– Average Household Size

Manual

• User Defined – You create the class breaks

• If there is a logical way to reclassify the data based on original research, literature, prior work, traditional values, or common sense – DO IT

• Importantly, you should be able to write a justification of your procedure.

Equal Interval

• Scheme divides the range of attribute values into equal-sized subranges:

– Class Interval = Data Range (high – low) / # of intervals

• This method emphasizes the amount of an attribute value relative to other values. For example, you can show that a store is part of a group of stores that make up the upper 1/3 of all sales.

• Best to apply on familiar data ranges such as percentages or temperature.

• Advantages: Easy to understand the concept, compute, and read the legend.

• Disadvantages: Does not consider data distribution, not acceptable for ordinal data.
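The class interval formula above can be sketched in a few lines. The function names and sample values here are hypothetical, and the breaks are reported as the upper bound of each class, which is one common convention rather than a documented ArcGIS behavior.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for an equal interval scheme."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes  # Class Interval = Data Range / # of intervals
    # The last break is set to `high` exactly to avoid floating-point drift.
    return [low + width * k for k in range(1, n_classes)] + [high]


def classify(value, breaks):
    """Return the 0-based index of the first class whose upper bound covers value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


# Hypothetical percentages, not the census tract values.
sample = [0.0, 5.0, 12.0, 18.0, 25.0, 40.0, 60.0]
breaks = equal_interval_breaks(sample, 3)
print(breaks)                                  # [20.0, 40.0, 60.0]
print([classify(v, breaks) for v in sample])   # [0, 0, 0, 0, 1, 1, 2]
```

Note how four of the seven values fall in the first class: the scheme ignores where the data actually cluster.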

[Map panels: Prop_18-29, Ave_HH_SZ]

Defined Interval

• Defined Interval: you specify an interval to divide the range of cell values, and ArcMap determines the number of classes.

• Has similar characteristics to Equal Interval.
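A minimal sketch of the idea, with the interval as the input and the class count as the output. It assumes the breaks start at the data minimum; ArcMap may align its breaks differently, and the function name is hypothetical.

```python
import math


def defined_interval_breaks(values, interval):
    """Return (number of classes, upper bounds) for a user-chosen class width."""
    low, high = min(values), max(values)
    # Enough classes of the given width to cover the whole data range.
    n_classes = max(1, math.ceil((high - low) / interval))
    # Assumption: breaks are counted from the minimum; the last class ends at `high`.
    breaks = [low + interval * k for k in range(1, n_classes)] + [high]
    return n_classes, breaks


print(defined_interval_breaks([0, 10, 60], 25))  # (3, [25, 50, 60])
```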

Quantile

• Each class contains an equal number of features (or cells in a raster).

– # Observations per class = Total Obs. / # of Classes

• With raster data, quantile and equal area are the same.

• Rules must be applied to keep like values together, so classes may not be equal in size and, in some cases, may be missing.

• Maps may be misleading, since similar features may be placed in different classes.

• Better for uniformly or normally distributed data.

• Advantages: Easy to understand concept and compute. Acceptable for ordinal data.

• Disadvantages: Does not consider data distribution, hard to understand legend.
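The equal-count rule and the effect of tied values can be sketched as follows. The tie rule used here (a repeated break is dropped, so identical values always share a class) is one simple assumption, not necessarily the rule a given GIS applies, and the function name is hypothetical.

```python
def quantile_breaks(values, n_classes):
    """Return class upper bounds so each class holds about n / n_classes values."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes + 1):
        # Index of the last observation in class k: ceil(k * n / n_classes) - 1.
        last = (k * n + n_classes - 1) // n_classes - 1
        upper = ordered[last]
        # Keep like values together: a break equal to the previous one is dropped,
        # which is how a class can end up missing.
        if not breaks or upper > breaks[-1]:
            breaks.append(upper)
    return breaks


print(quantile_breaks([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [2, 4, 6, 8]
print(quantile_breaks([1, 1, 1, 1, 1, 2, 3, 4], 4))  # [1, 2, 4]
```

In the second call, four classes were requested but only three survive, because five of the eight values are identical.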

[Map panels: Prop_18-29, Ave_HH_SZ]

Standard Deviation

• Shows you the amount a cell’s value varies from the mean.

• In this method you compute the mean value and then generate class breaks by successively adding and subtracting the standard deviation from the mean.

• Advantages: Considers data distribution, easy to understand concept, compute and understand legend, highlight outliers.

• Disadvantages: Best for a Gaussian distribution, requires an understanding of statistics to interpret results, may not be good for lay audiences.
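A minimal sketch of the procedure, under a few assumptions: the population standard deviation is used, the mean itself is treated as a break, and breaks outside the data range are discarded. The function name and the `step` parameter (class width in standard deviations) are hypothetical.

```python
import math
import statistics


def std_dev_breaks(values, step=1.0):
    """Return class upper bounds at mean + k * step * SD that fall inside the data range."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    low, high = min(values), max(values)
    if sd == 0:
        return [high]  # all values identical, so there is a single class
    interior = []
    # Start at the multiple of step * SD at or just below the minimum and walk upward.
    k = math.floor((low - mean) / (step * sd))
    while mean + k * step * sd < high:
        candidate = mean + k * step * sd
        if candidate > low:
            interior.append(candidate)
        k += 1
    return interior + [high]


# Hypothetical values with mean 5 and standard deviation 2.
print(std_dev_breaks([2, 4, 4, 4, 5, 5, 7, 9]))  # [3.0, 5.0, 7.0, 9]
```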

[Map panels: Prop_18-29, Ave_HH_SZ]

Jenks Natural Breaks (Optimal)

• Determines the best arrangement of values into classes by minimizing the within-class sum of squared differences of values from the means of their class. The “optimal” arrangement is determined through an iterative process by looking at different sets of breaks in the data.

$$SSD_{i..j} = \sum_{n=i}^{j} \left(A[n] - Mean_{i..j}\right)^2$$

Where A = set of values that have been ordered from 1 to N, with $1 \le i < j \le N$.

$Mean_{i..j}$ = Mean of the class bounded by i and j.

Jenks Natural Breaks

• Advantages: Considers data distribution, can be used to determine best number of classes, relatively easy to understand concept and compute.

• Disadvantages: Hard to understand legend, cannot be used for ordinal data.

• Current ESRI default.
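The minimization can be illustrated on a small sample. The sketch below swaps the iterative search for an exhaustive dynamic-programming search over all break positions, which reaches the same kind of optimum (minimum total within-class SSD) but is not ESRI's implementation. It runs in roughly cubic time, so it is only meant for short lists; the function names are hypothetical.

```python
def ssd(ordered, i, j):
    """Sum of squared deviations from the class mean for ordered[i..j], inclusive."""
    segment = ordered[i:j + 1]
    mean = sum(segment) / len(segment)
    return sum((a - mean) ** 2 for a in segment)


def natural_breaks(values, n_classes):
    """Return class upper bounds that minimize the total within-class SSD."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    inf = float("inf")
    # cost[k][j]: minimum total SSD when ordered[0..j] is split into k + 1 classes.
    cost = [[inf] * n for _ in range(n_classes)]
    # start[k][j]: first index of the last class in that best split.
    start = [[0] * n for _ in range(n_classes)]
    for j in range(n):
        cost[0][j] = ssd(ordered, 0, j)
    for k in range(1, n_classes):
        for j in range(k, n):
            for i in range(k, j + 1):
                total = cost[k - 1][i - 1] + ssd(ordered, i, j)
                if total < cost[k][j]:
                    cost[k][j] = total
                    start[k][j] = i
    # Walk back from the last class to recover each class's upper bound.
    breaks = []
    j = n - 1
    for k in range(n_classes - 1, -1, -1):
        breaks.append(ordered[j])
        j = start[k][j] - 1
    return breaks[::-1]


# Hypothetical values with three obvious clusters.
print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))  # [3, 12, 52]
```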

[Map panels: Prop_18-29, Ave_HH_SZ]

Geometrical Interval

• Class breaks are based on class intervals that form a geometric series.

• The geometric coefficient in this classifier can change once (to its inverse) to optimize the class ranges.

• The algorithm creates these geometrical intervals by minimizing the sum of squares of the number of elements per class. This ensures that each class range has approximately the same number of values within each class and that the change between intervals is fairly consistent.

• Advantages: Relatively easy to compute and understand legend, considers data distribution

• Disadvantages: Hard to understand concept, cannot be used for ordinal data
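To make the idea of a geometric series of breaks concrete, here is a deliberately simplified sketch: the class bounds form a plain geometric progression between the minimum and the maximum. It does not perform the optimization described above (no coefficient switching, no minimization), it only works for strictly positive values, and the function name is hypothetical.

```python
def geometric_breaks(values, n_classes):
    """Return class upper bounds that grow by a constant ratio from min to max."""
    low, high = min(values), max(values)
    if low <= 0:
        raise ValueError("this simplified scheme needs strictly positive values")
    # Constant ratio between consecutive class bounds.
    ratio = (high / low) ** (1 / n_classes)
    # The last break is set to `high` exactly to avoid floating-point drift.
    return [low * ratio ** k for k in range(1, n_classes)] + [high]


# Hypothetical skewed values: breaks land near 10, 100 and exactly at 1000.
print(geometric_breaks([1, 5, 20, 1000], 3))
```

Narrow classes at the low end and wide classes at the high end are what make this family of schemes useful for heavily skewed data.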

[Map panels: Prop_18-29, Ave_HH_SZ]

Percentiles

• Uses the percentile breaks to determine class breaks.

• Order data (low to high); each of the n values represents 1/n of the total. Must break on unique values.

• Advantages: Relatively easy to compute and understand legend, considers data distribution, highlights outliers

• Disadvantages: Relatively hard to understand concept, not in ArcGIS.

From GeoDa
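A percentile classification can be sketched with the standard library. The default cut points (1, 10, 50, 90, 99) and the inclusive interpolation method are assumptions chosen for illustration, and the function name is hypothetical. Repeated breaks are dropped so that every break falls on a unique value.

```python
import statistics


def percentile_breaks(values, percentiles=(1, 10, 50, 90, 99)):
    """Return class upper bounds at the given percentiles, plus the maximum."""
    # 99 cut points: cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    breaks = []
    for p in percentiles:
        candidate = cuts[p - 1]
        # Must break on unique values: skip a break that repeats the previous one.
        if not breaks or candidate > breaks[-1]:
            breaks.append(candidate)
    top = max(values)
    if not breaks or top > breaks[-1]:
        breaks.append(top)
    return breaks


# Hypothetical data: the integers 0 through 100.
print(percentile_breaks(list(range(101))))  # [1.0, 10.0, 50.0, 90.0, 99.0, 100]
```

The narrow first and last classes are what let this scheme highlight outliers at both ends of the distribution.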

Box Map

• Shows outliers as a function of quartiles.

IQR = Q75 – Q25

Lower Outlier = Q25 – Hinge * IQR

Upper Outlier = Q75 + Hinge * IQR

Hinge is commonly either 1.5 or 3

• Primarily used to highlight outliers.

• Not in ArcGIS

From GeoDa
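The box map fences follow directly from the IQR formulas. In this sketch the quartiles are computed with the standard library's inclusive method, which is an assumption; other software may interpolate quartiles differently and so give slightly different fences. The function names are hypothetical.

```python
import statistics


def box_map_fences(values, hinge=1.5):
    """Return (lower fence, upper fence) from the quartiles and the hinge."""
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25  # IQR = Q75 - Q25
    return q25 - hinge * iqr, q75 + hinge * iqr


def flag_outliers(values, hinge=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = box_map_fences(values, hinge)
    return [v for v in values if v < lower or v > upper]


# Hypothetical values with one extreme observation.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_map_fences(sample))  # (-3.5, 14.5)
print(flag_outliers(sample))   # [100]
```

Raising the hinge to 3 widens the fences, so only the most extreme observations are flagged.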
