USING A KNOWLEDGE-BASED SYSTEM TO TEST THE TRANSFERABILITY OF A SOIL-LANDSCAPE MODEL IN NORTHEASTERN VERMONT

By

JESSICA MCKAY

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2008

© 2008 Jessica McKay

To my parents, especially my dad, the only person other than my advisors who even tried to read this whole thesis. Also to my husband, because even though he has no idea what this is about, he

did cook me dinner many nights while I was in between work and school.

ACKNOWLEDGMENTS

I thank my advisory committee: Dr. Sabine Grunwald and Dr. Willie Harris of the

University of Florida, and Dr. Xun Shi of Dartmouth College, who all offered important insight

into what needed to be in this document. I also thank Roger DeKett and Tom Burke, two

members of our team at the NRCS who dug and described many of the holes for this study.

Finally, I thank Robert Long, who I work next to every day. Not only did he help me dig holes

and describe soils for this project, he has been a valuable source of knowledge and support since

day one.

TABLE OF CONTENTS page

ACKNOWLEDGMENTS ...............................................................................................................4

LIST OF TABLES...........................................................................................................................7

ABSTRACT...................................................................................................................................10

CHAPTER

1 INTRODUCTION ..................................................................................................................12

Traditional Soil Mapping........................................................................................................12

Predictive Modeling................................................................................................................12

Digital Soil Mapping .......................................................................................................14

Fuzzy Logic .....................................................................................................................15

Digital Elevation Models.................................................................................................16

Digital Modeling Approaches and Methods....................................................................16

Knowledge-Based Models...............................................................................................18

Soil Inference Engine ......................................................................................................19

Model Transferability .............................................................................................................19

2 OBJECTIVES AND HYPOTHESIS......................................................................................21

3 METHODOLOGY .................................................................................................................22

Study Area ..............................................................................................................................22

Field Sampling........................................................................................................................25

Model Development ...............................................................................................................30

Data Preparation .....................................................................................................................32

Rules .......................................................................................................................................39

Evaluation ...............................................................................................................................41

4 RESULTS AND DISCUSSION.............................................................................................44

Final Predictions .....................................................................................................................44

Evaluation of Predicting Soil Series .......................................................................................53

Fuzzy Drainage Class .............................................................................................................55

Discussion...............................................................................................................................57

5 SUGGESTIONS FOR FURTHER RESEARCH ...................................................................61

APPENDIX

A DOCUMENTATION EXAMPLES .......................................................................................63

B VEGETATIVE ARTIFACTS IN DIGITAL ELEVATION DATA ......................................66

C FUZZY DRAINAGE CLASS DESIGNATIONS..................................................................67

D PREDICTION RESULTS FROM W1 (MULTIPLE SAMPLE CONFIGURATIONS).......74

LIST OF REFERENCES...............................................................................................................80

BIOGRAPHICAL SKETCH .........................................................................................................83

LIST OF TABLES

Table page

3-1 Study area comparison.......................................................................................................23

3-2 Soil series modeled in W1 and W2....................................................................................25

3-3 Rules for Cabot, Colonel, and Dixfield soils. ....................................................................39

3-4 Evaluation criteria for fuzzy drainage class.......................................................................42

3-5 Matrix of fuzzy membership designations comparing SIE results and fuzzy drainage classes. ...............................................................................................................................43

4-1 Confusion table that compares calibration prediction results based on SIE to observed soil series including most similar soil series using 90 model development sites in W1..........................................................................................................................53

4-2 Confusion table that compares validation prediction results based on SIE to observed soil series including most similar soil series using 38 independent evaluation sites in W1......................................................................................................................................53

4-3 Confusion table that compares validation prediction results based on SIE to observed soil series including most similar soil series using 42 independent evaluation sites in W2........................................................................................................54

4-4 Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 using 9 calibration runs.....................................................................................................................................54

4-5 Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 using 9 validation runs ..............55

4-6 Confusion table that compares calibration prediction results based on SIE to observed drainage classes using 90 model development sites in W1................................55

4-7 Confusion table that compares validation prediction results based on SIE to observed drainage classes using 38 independent evaluation sites in W1..........................................56

4-8 Confusion table that compares validation prediction results based on SIE to observed drainage classes using 42 independent evaluation sites in W2..........................................56

4-9 Percent accuracy overall based on fuzzy drainage class membership (Validation) ..........56

C-1 Study area W1 fuzzy drainage class designations (validation)..........................................67

C-2 Study area W2 fuzzy drainage class designations (validation)..........................................68

C-3 Study Area W-1 Fuzzy drainage class designations (calibration) .....................................70

D-1 Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 2)..............74

D-2 Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 2)........................74

LIST OF FIGURES

Figure page

3-1 Essex County, Vermont and study areas W1 and W2. ......................................................24

3-2 Study area W1 sample points.............................................................................................27

3-3 Study area W2 sample points.............................................................................................29

3-4 Elevation for study area W1 ..............................................................................................33

3-5 Elevation for study area W2 ..............................................................................................34

3-6 Slope for study area W1.....................................................................................................35

3-7 Slope for study area W2.....................................................................................................36

3-8 Wetness index for study area W1 ......................................................................................37

3-9 Wetness index for study area W2 ......................................................................................38

3-10 Inference interface for Colonel (ArcSIE). A) bell-shaped curve for wetness index, B) Z-shaped curve for slope....................................................................................................40

4-1 Fuzzy prediction map of Cabot soil series for study area W1 ...........................................45

4-2 Fuzzy prediction map of Colonel soil series for study area W1........................................46

4-3 Fuzzy prediction map of Dixfield soil series for study area W1 .......................................47

4-4 Fuzzy prediction map of Cabot soil series for study area W2 ...........................................48

4-5 Fuzzy prediction map of Colonel soil series for study area W2........................................49

4-6 Fuzzy prediction map of Dixfield soil series for study area W2 .......................................50

4-7 Final prediction maps of soil series for study area W1......................................................51

4-8 Final prediction map of soil series for study area W2. ......................................................52

A-1 Sample point 127 description.............................................................................................63

A-2 Sample point 127 profile photo..........................................................................................64

A-3 Sample point 127 landscape photo ....................................................................................65

B-1 Vegetative artifacts in digital elevation data......................................................................66

Abstract of Thesis Presented to the Graduate School

of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

USING A KNOWLEDGE-BASED SYSTEM TO TEST THE TRANSFERABILITY OF A

SOIL-LANDSCAPE MODEL IN NORTHEASTERN VERMONT

By

Jessica McKay

December 2008

Chair: Sabine Grunwald

Major: Soil and Water Science

Knowledge-based digital soil mapping has been used extensively to predict soil taxonomic

and physico-chemical soil characteristics. Fuzzy logic knowledge-based models allow explicit

integration of knowledge and expertise from soil mappers familiar with a region. Questions

remain about the transferability of soil-landscape models developed in one region to other

regions.

Objectives of this study were to develop and evaluate a knowledge-based model to predict

soil series and fuzzy drainage classes and assess its transferability potential between similar soil

landscapes in Essex County, Vermont.

Two study areas, W1 (3.5 km² in size) and W2 (1.9 km² in size),

were sampled at 128 and 42 sites, respectively. Both study areas are located in Essex County,

Vermont. The bedrock in the area is phyllite and schist. Vegetation is spruce-fir and mixed

northern-hardwood forests. The topography of the study areas is a series of hills and narrow

valleys. Deep, loamy basal till covers the modeled area.

Rule-based fuzzy inference, built on fuzzy membership functions that characterize

soil-environment relationships, was used to create a model from the expert knowledge of soil scientists

using 70 percent of sampled sites in W1. The model was implemented using the Soil Inference

Engine (SIE), which provides tools and a user-friendly interface for soil scientists to prepare

environmental data, define soil-environment models, run soil inference, and compile final map

products. The soil prediction model was created and evaluated in W1 using 38 validation sites

and transferred and validated in W2 using 42 validation sites. Defuzzified raster predictions were

compared to field mapped soil series and fuzzy drainage class properties to assess their accuracy.

The model was found to be highly transferable between the two areas. In W1 the model

was 73.7 and 88.8 percent accurate in predicting soil series and fuzzy drainage classes using an

independent validation set, respectively. In W2, similar results were achieved, with 71.4 and 89.9

percent accuracy in predicting soil series and drainage class.

With more research into pre-processing tools to enhance the knowledge being fed into the

inference engine, these accuracy numbers may be improved in the future. It was shown that the

prediction model was transferable to a landscape with similar soil characteristics; however, it is

critical to identify constraints and thresholds that limit transferability of prediction models to

other soil-landscapes.

CHAPTER 1 INTRODUCTION

Traditional Soil Mapping

Conventional soil survey methods are relatively expensive in terms of time and cost

required to complete them. There are three main steps that make up soil survey according to

Cook et al. (1996). The first step consists of observing ancillary data such as aerial photography,

geology, and vegetation, along with soil profile characteristics. The second step requires these

observations to be incorporated into an implicit conceptual model that is used to infer the

variation of soils. The third step is the practice of applying the conceptual model to the survey

area in order to predict the soil variation and occurrence at unobserved sites. Commonly, soil

scientists develop soil-landscape relationships using site-specific information that is translated to

unsampled locations across a landscape. The survey process relies on tacit knowledge that is

passed from surveyor to surveyor through training and experience and is never fully captured in

documentation.

Traditional soil mapping products utilize polygons, or crisp map units, which suggest

abrupt changes from one map unit or soil type to another. This only allows each location on the

landscape to fit into the constraints of one map unit, which does not accurately reflect the soil

landscape. One way that scientists attempt to remedy this is to use a continuous field model,

which uses pixels or voxels rather than polygons to reflect the gradual change of soil attributes

across the landscape (Grunwald, 2006).

Predictive Modeling

For years, soil scientists have been working to build quantitative predictive models to a

large extent based on the five factors of soil formation as described by Jenny (1941):

\[ S = f(\mathit{Cl}, O, R, P, T) \tag{1-1} \]

where S = soil, Cl = climate, O = organisms, R = relief, P = parent material, and T = time.

The most prominent soil-landscape model that underlies research studies was set forth by

McBratney et al. (2003) and is known as the SCORPAN model. This model can be written as

either:

\[ S_c = f(s, c, o, r, p, a, n) \tag{1-2} \]

or

\[ S_a = f(s, c, o, r, p, a, n) \tag{1-3} \]

where $S_c$ is soil class and $S_a$ is a soil attribute. The SCORPAN model is unique in that it includes

s, soil, and n, spatial position, as factors. McBratney et al. (2003) pointed out that soil can be

predicted from its properties, and that soil properties can be predicted from soil classes or from

other soil properties. The reason s can be part of the model is the fact that soil properties and

classes are correlated (linked) with each other. For example, drainage class is dependent on other

soil properties such as soil texture, porosity, organic matter content, and others. Soil properties

can be derived from remote or proximal sensing or from expert knowledge. Also implicit in the

SCORPAN model are the spatial coordinates x,y and an approximate time coordinate ~ t

(McBratney et al., 2003).

Often, soil variability is primarily controlled by topography (Thompson et al., 2006), while in

some landscapes other factors such as land use and land cover control soil variability. The

predictive models are generally based on this concept, or, more specifically, the catena concept

(Milne, 1935), which indicates that soil profiles that occur on topographically associated

landscapes will be repeated on similar landscapes. A catena is a sequence of soils that are about

the same age, which are derived from similar parent material and which occur under similar

climatic conditions, but which have different characteristics due to variation in relief and

drainage (Grunwald, 2006). A catena model developed on one hillslope has the potential to be

transferred to adjacent hillslopes with similar landscape characteristics.

Digital Soil Mapping

Digital soil mapping techniques are rapidly being developed that take advantage of the vast

quantity of information technologies available to the soils discipline. Digital soil mapping is

defined by the International Working Group on Digital Soil Mapping as the creation and

population of spatial soil information systems by the use of field and laboratory observational

methods coupled with spatial and non-spatial soil inference systems (McBratney, 2006).

The concept of soil inference systems was introduced by McBratney et al. (2002) as a way

of using pedotransfer functions as knowledge rules for inference engines. Soil inference systems

take information that is known with a given level of (un)certainty and use pedotransfer functions

to infer data that is unknown. A pedotransfer function (PTF), according to Bouma (1989), is a

process of translating data we have into what we need. There are two types of PTFs based on the

amount of information that is available. Class PTFs predict soil properties based on the class to

which the soil sample belongs (such as textural class, or any other class that the soil scientist

defines). Continuous PTFs, on the other hand, predict certain soil properties as a continuous

function of one or more measured variables (Wösten et al., 1995). Another classification of

PTFs has been given by McBratney et al. (2002) as single point regressions, parametric and

physico-empirical PTFs. Single point PTFs predict a single soil property, while parametric PTFs

predict parameters of a model. This is similar to the idea of a soil inference engine that creates a

soil-landscape model.
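
The difference between class and continuous PTFs can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of bulk density as the predicted property, and every numeric value below are hypothetical placeholders, not published coefficients or values used in this study.

```python
# Class PTF: the predicted property depends only on the class a sample belongs to.
# All values are hypothetical placeholders (assumed for illustration).
CLASS_PTF_BULK_DENSITY = {"sand": 1.6, "loam": 1.4, "clay": 1.3}  # g/cm^3


def class_ptf(texture_class: str) -> float:
    """Look up a property from the textural class of the sample."""
    return CLASS_PTF_BULK_DENSITY[texture_class]


def continuous_ptf(clay_pct: float, organic_matter_pct: float) -> float:
    """Predict a property as a continuous function of measured variables.

    The linear form and its coefficients are assumptions, not a fitted model.
    """
    intercept, b_clay, b_om = 1.60, -0.004, -0.05
    return intercept + b_clay * clay_pct + b_om * organic_matter_pct


estimate_from_class = class_ptf("loam")
estimate_from_measurements = continuous_ptf(clay_pct=20.0, organic_matter_pct=2.0)
```

The class PTF needs only a class label, while the continuous PTF needs measured inputs but can distinguish samples that fall within the same class.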

These soil inference systems are tools used for environmental soil-landscape modeling,

which Grunwald (2006) describes as a science devoted to understanding the spatial distribution

of soils and coevolving landscapes as part of ecosystems that change dynamically through time.

McSweeney et al. (2004) describe the methodology of soil-landscape modeling based on (i)

characterization of the local physiographic domain through analysis of digital elevation model

(DEM) data, (ii) collection of georeferenced soil samples and compiling desired soil property

data, and (iii) development of explicit, quantitative, and usually simple empirical models. As

Grunwald (2006) points out, soil-landscape modeling depends greatly on soil and ancillary

variables. There are multiple factors which impact soil-landscape modeling, which include:

attribute type (Boolean, categorical, ordinal, interval, or continuous); content of attributes (soil

attributes, topographic attributes and classes, parent material, land cover and land use, or time);

sample support; geographic extent of observations; total number of observations; density of

observations; and sampling design.

Fuzzy Logic

Soil landscape models may include some form of fuzzy logic. Zadeh (1965) introduced the

idea of fuzzy sets, which set out to quantify the imprecision and uncertainty that is an inherent

part of soil mapping. While soils are traditionally mapped with crisp borders between map units

and there are technically specific boundaries defined between soil series in terms of the attributes

that make each specific series unique, it is commonly understood among soil scientists that

each soil “type” has a range of characteristics and soils vary constantly across the landscape.

Fuzzy logic can be used to try to show the variation of soils as they actually occur while possibly

moving away from the soil series concept employed in traditional soil survey. McBratney and

Odeh (1997) point out that fuzzy set theory can be useful in dealing with uncertainty that arises

due to imprecise boundaries between categories. Zhu (1999) explains that under fuzzy logic,

each unit on a map (be it defined as a soil or simply as a pixel) can be assigned to more than one

class with varying degrees of class assignment, or differing membership values.
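
As a minimal sketch of this idea, a single pixel can carry a membership value for each soil class instead of one crisp label. The series names are those modeled in this study, but the membership values and the helper names are hypothetical, chosen only for illustration.

```python
# Hypothetical fuzzy membership values for one pixel (not data from this study).
pixel_memberships = {"Cabot": 0.15, "Colonel": 0.80, "Dixfield": 0.45}


def harden(memberships):
    """Return the class with the highest membership (a crisp assignment)."""
    return max(memberships, key=memberships.get)


def ranked(memberships):
    """Return (class, membership) pairs ordered from most to least similar."""
    return sorted(memberships.items(), key=lambda item: item[1], reverse=True)


crisp = harden(pixel_memberships)          # the single label a polygon map would show
runner_up = ranked(pixel_memberships)[1]   # information a crisp map discards
```

A crisp map would record only the hardened class, whereas the fuzzy representation also retains how similar the pixel is to the other classes.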

Digital Elevation Models

Bishop and Minasny (2006) pointed out that historically, a major limitation to soil

landscape modeling has been the quality of available elevation data. In recent years, much more

detailed elevation data has become available and the use of geographic information systems

(GIS) as a tool for modeling has risen dramatically.

McBratney et al. (2003) found that a DEM was the most common source of secondary

information in published soil mapping studies. They also found that a terrain attribute was used

in 80% of the studies as part of the final prediction model. This, as Bishop and Minasny (2006)

articulate, illustrates the importance of ensuring the accuracy of the DEM. If the DEM is

inaccurate, it likely leads to uncertainty in the model output.

The availability of high quality DEMs has vastly improved the outlook for soil-landscape

modeling. For example, Thompson et al. (2006) used a high resolution DEM and resulting

empirical quantitative models to predict patterns of soil properties.

Digital Modeling Approaches and Methods

Considerable research has been done on the topic of predictive mapping of soil properties,

while little has been done on the topic of mapping broad soil types or map units. Lagacherie and

Voltz (2000) pointed out that mapping of soil properties in large areas is challenging to

accomplish with acceptable precision and cost. Therefore, methods must be employed that utilize

available information and minimize sampling. They also mention that predictions are often

refined using secondary data, such as attributes derived from DEMs. In multiple case studies in

Southern France, Lagacherie and Voltz (2000) and Voltz et al. (1997) used a method of first

modeling the soil-landscape relationships in the area, and then using those to improve spatial

predictions.

To model the soil landscape relationships, a conditional probability approach was used as

described in Lagacherie et al. (1995). This approach is used to represent the soil patterns and

how they depend on landform features by computing the probability of a soil class occurring at a

site given the soil classes, the geographical location, and the relative elevation of neighboring

sites (Lagacherie and Voltz, 2000).

Scull et al. (2003), McBratney et al. (2000) and Grunwald (2006) provide an overview of

predictive soil mapping methods, including geostatistical methods, statistical methods (such as

decision tree analysis), and knowledge-based models. Geostatistics has emerged as an especially

popular approach to mapping soil properties because all soil and landscape properties show more

or less spatial autocorrelation. Kriging is the geostatistical method of spatial interpolation

(McBratney et al., 2000). According to McBratney et al. (2000), there are some major limitations

to kriging, due to the assumptions of stationarity and spatial autocorrelation, which can be a

problem in complex terrain such as in northeastern Vermont, because there are many areas where

abrupt changes in soil-forming factors occur. Zhu (1999) also pointed out that these techniques

require a large amount of field data in order to extract the relationships between soil properties

and landscapes, which is a limiting factor when aiming to increase efficiency in a survey area.

Moreover, as McBratney et al. (2002) pointed out, the most difficult and expensive step

in environmental modeling is the collection of data.
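
Spatial autocorrelation, on which kriging depends, is commonly summarized with an empirical semivariogram. The sketch below is a generic textbook calculation, not a method used in this study; the function name, the binning scheme, and the sample points are assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width):
    """Compute semivariance per distance bin.

    points: list of (x, y, value) tuples.
    Returns {bin_index: semivariance}, where bin_index = floor(distance / lag_width).
    """
    squared_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            distance = math.hypot(xi - xj, yi - yj)
            bin_index = int(distance // lag_width)
            squared_diff_sums[bin_index] += (zi - zj) ** 2
            pair_counts[bin_index] += 1
    # Semivariance is half the mean squared difference within each bin.
    return {b: squared_diff_sums[b] / (2 * pair_counts[b]) for b in sorted(pair_counts)}


# Hypothetical observations along a short transect: (x, y, measured value).
transect = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
gamma = empirical_semivariogram(transect, lag_width=1.0)
```

Every bin needs enough point pairs to give a stable estimate, which is one reason geostatistical methods demand the large field data sets noted above.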

Statistical methods can also be used to describe the relationships between quantifiable

landscape indices and soil properties and regression analysis has been successfully performed to

account for variation in various soil characteristics using multiple predictor variables (Scull et

al., 2003). There have been many advances in spatial statistics which provide multiple tools for

pedologists to quantify and model the nature of soils in the landscape (Pennock and Veldkamp,

2006). The main drawback of any statistical method is that standard statistical procedures are not

flexible enough to allow much integration with new data sources, such as expert knowledge

(Scull et al., 2003).

Decision tree analysis is new to the field of soil science. In essence, it uses soil-landscape

correlation in model development by deriving a set of predictive rules from training data;

the rules are then applied to a geographic database to predict the value of a

response variable (Michaelsen et al., 1994; Scull et al., 2003).

Three main goals of predictive soil mapping are defined by Scull et al. (2003): (1) to

exploit the relationship between environmental variables and soil properties in order to more

efficiently collect soil data; (2) produce and present models that better represent soil landscape

continuity; and (3) explicitly incorporate expert knowledge in model design. Knowledge-based

models have the potential to satisfy all three of these goals and, until recently, have been

underrepresented in the research.

Knowledge-Based Models

Knowledge-based models are composed of three main elements: environmental data, a

knowledge base, and an inference engine which combines the data and the knowledge base to

infer logically valid conclusions about the soil (Skidmore et al., 1996). Davis (1993) reviewed

knowledge-based models and their applications to environmental modeling research and found

that while a possible absence of fundamental knowledge for rule generation would be a

constraint on the application of the systems, they were becoming more widely accepted as a

technique, even over a decade ago. Traditional soil survey has been the most popular form of soil

mapping for many years incorporating knowledge of soil surveyors with extensive soil mapping

experience. Incorporating soil mappers' expertise into knowledge-based soil models has the

potential to improve soil predictive models.

Soil Inference Engine

The Soil Inference Engine (SIE) is an expert knowledge-based inference engine designed

for creating soil maps under fuzzy logic. There are two main types of knowledge that SIE uses:

rules, which are defined in parametrical space, and cases, which are defined in geographical

space. Both rule-based reasoning (RBR) and case-based reasoning (CBR) can be used to perform

inference. Case-based reasoning aims to use the knowledge represented in specific cases to help

solve a problem in a different area (Shi et al., 2004). The Soil Inference Engine also provides

tools for result validation, terrain analysis, pre- and post-processing for raster data, and data

format conversion (Shi, 2006).

The Soil Inference Engine performs fuzzy soil mapping based on the concept of fuzzy

soil classification, which assigns fuzzy membership values for different soil types to each

location. Rule-based reasoning and CBR are used by SIE to calculate these fuzzy membership

values. The values are meant to represent the similarities of a given soil to be predicted to those

soil types defined within the inference engine (Shi, 2006).
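
Rule-based fuzzy inference of this kind can be sketched as follows. This is not the SIE implementation: the curve equations, the use of the minimum to combine variables, and all rule parameters are assumptions for illustration. Only the series names and the pairing of a bell-shaped curve for wetness index with a Z-shaped curve for slope come from this document (Figure 3-10 shows that pairing for Colonel; applying it to all three series is an assumption).

```python
import math


def bell_curve(x, optimum, half_width):
    """Membership is 1.0 at the optimum and 0.5 at optimum +/- half_width (assumed form)."""
    return math.exp(math.log(0.5) * ((x - optimum) / half_width) ** 2)


def z_curve(x, optimum, half_width):
    """Full membership at or below the optimum; membership declines above it."""
    return 1.0 if x <= optimum else bell_curve(x, optimum, half_width)


# Hypothetical rules: (optimum, half_width) for each environmental variable.
RULES = {
    "Cabot": {"wetness": (12.0, 2.0), "slope": (3.0, 3.0)},
    "Colonel": {"wetness": (9.0, 2.0), "slope": (8.0, 5.0)},
    "Dixfield": {"wetness": (6.0, 2.0), "slope": (15.0, 8.0)},
}


def series_membership(wetness_index, slope_pct, rule):
    """Combine per-variable memberships with the minimum (assumed limiting-factor rule)."""
    m_wetness = bell_curve(wetness_index, *rule["wetness"])
    m_slope = z_curve(slope_pct, *rule["slope"])
    return min(m_wetness, m_slope)


def infer(wetness_index, slope_pct):
    """Return fuzzy memberships for every series and the defuzzified (hardened) series."""
    memberships = {
        name: series_membership(wetness_index, slope_pct, rule)
        for name, rule in RULES.items()
    }
    return memberships, max(memberships, key=memberships.get)


memberships, hardened = infer(wetness_index=9.0, slope_pct=5.0)
```

Applied to every cell of a raster, the per-series memberships would form fuzzy prediction maps, and the hardened value would form a single defuzzified soil series map.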

Model Transferability

One major question that remains in the field of soil landscape modeling is that of model

transferability, especially when it comes to modeling of soil types and not just one or two soil

properties.

It has been speculated by Lagacherie and Voltz (2000) that predictive capabilities are

limited, especially over large areas, because the relationships between soil properties and

landscapes are either nonlinear or unknown. Prediction becomes even more difficult when

factors other than topography begin to play more of a role, such as different parent materials or

changes in climate (Thompson et al., 2006). These other factors influence the soil environment

as soils get further and further apart from each other spatially, especially in a varied landscape

such as the glaciated region of northeastern Vermont. Pedotransfer functions that are developed

in one geomorphic region and applied to another region may show larger uncertainties due to

extrapolation (McBratney et al., 2002). This is likely true also for soil inference models.

This study aims to take a soil prediction model developed for a relatively small study area

in a complex landscape and test how well it transfers to another, similar study area a few

kilometers away.

CHAPTER 2 OBJECTIVES AND HYPOTHESIS

This study had two main objectives, the first of which was to develop a model to predict

soils occurring in dense till in a study area in Essex County, Vermont. The second objective was

to test the transferability of that model to a second study area with similar landscape

characteristics in the same county.

Specific steps were:

(1) To predict which soils (soil series; drainage classes) occur across the landscape in the study

area (W1) using the SIE model.

(2) To evaluate the completed soil model within W1 using an independent validation set.

(3) To run the same model in study area W2.

(4) To assess the transferability of the model by running transects in W2 similar to a random

catena sampling strategy and comparing the field results with the SIE results.

The hypothesis was that the model would transfer well between similar landscapes to

predict soil series and drainage classes.
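
The comparison of field results with model results in steps (2) and (4) can be sketched as a confusion table and an overall percent accuracy. The function names and the four example sites below are hypothetical and are not observations from this study.

```python
def confusion_table(observed, predicted):
    """Count sites for each (observed class, predicted class) pair."""
    table = {}
    for obs, pred in zip(observed, predicted):
        table[(obs, pred)] = table.get((obs, pred), 0) + 1
    return table


def overall_accuracy(observed, predicted):
    """Percent of sites where the predicted class equals the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    correct = sum(obs == pred for obs, pred in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical validation sites (illustration only).
observed = ["Cabot", "Colonel", "Colonel", "Dixfield"]
predicted = ["Cabot", "Colonel", "Dixfield", "Dixfield"]

table = confusion_table(observed, predicted)
accuracy = overall_accuracy(observed, predicted)
```

By the same arithmetic, 28 matching sites out of 38 would give 73.7 percent, which is consistent with the W1 validation accuracy reported in the abstract, although that count is an inference and is not stated in this section.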

CHAPTER 3 METHODOLOGY

Study Area

The W1 study area is in Essex County, Vermont (Figure 3-1). It is about 3.5 km² and the

elevation ranges from 479 m at the outlet of the East Branch of the Nulhegan River to 853 m at

the summit of Sable Mountain. The study area lies within the U.S. Geological Survey (USGS)

Averill Lake topographic quadrangle. The bedrock is mainly phyllite and schist of the Gile

Mountain formation, with some granite on the upper elevations of Sable Mountain. Vegetation is

mainly spruce-fir forests on the mountain summit and poorly drained lower slopes and mixed

northern-hardwood and spruce-fir forests on middle slopes. The general topography of the area is

a series of hills and narrow valleys. Deep loamy basal till covers most of the middle and low

elevations of the study area, while some very poorly drained organic materials occur on broad

flats and in depressions.

The W2 study area is also in Essex County, Vermont. It covers 1.9 km² surrounding an

unnamed stream and the elevation ranges from 373 m to 619 m. The study area is completely

within the USGS Bloomfield topographic quadrangle. The bedrock and vegetation are similar to

that of the W1 study area, and the soil landscapes are also alike.

The two study areas share a comparable climate, with a mean annual temperature of about

6 degrees Celsius and total annual precipitation equaling about 97 centimeters. The land use is

also exactly the same, with both study areas (Table 3-1) being managed long-term by a large

timber company.


Table 3-1. Study area comparison

                            W1                                                      W2
USGS Quad                   Averill Lake                                            Bloomfield
Size                        3.5 km²                                                 1.9 km²
Elevation (m)               Min: 468; Max: 833; Mean: 664; Std. Dev.: 51.9          Min: 375; Max: 618; Mean: 475; Std. Dev.: 49.67
Geology                     Phyllite and schist (Gile Mountain formation)           Phyllite and schist (Gile Mountain formation)
Vegetation                  Mixed northern-hardwood and spruce-fir forests          Mixed northern-hardwood and spruce-fir forests
Topography                  Hills and narrow valleys                                Hills and narrow valleys
Slope (percent)             Min: 0.02; Max: 86.08; Mean: 15.42; Std. Dev.: 12.02    Min: 0.10; Max: 54.82; Mean: 12.93; Std. Dev.: 7.38
Mean Annual Temperature     6 degrees Celsius                                       6 degrees Celsius
Mean Annual Precipitation   97 cm                                                   97 cm
Land use                    Long-term timber management                             Long-term timber management
Soils (general knowledge)   Deep, loamy basal till; some very poorly drained        Deep, loamy basal till; some very poorly drained
                            organic materials in depressions                        organic materials in depressions

Figure 3-1. Essex County, Vermont and study areas W1 and W2.

Essex County is the last county in Vermont to be undergoing an initial soil survey, and

therefore, there is no official soils data available for the county. However, Essex County is part

of a larger region known as the Northeast Kingdom, which also includes Orleans and Caledonia

counties. These areas have already been mapped and the data is available through the USDA-

Natural Resources Conservation Service Web Soil Survey and Soil Data Mart. Given the

experience in the rest of the Northeast Kingdom, it is reasonable to assume that in these mainly

wooded areas, the basal till areas will be dominated by one catena of soils, and the model for this

study reflects this assumption. The three soil series that dominate these and similar areas are

known as Cabot, Colonel, and Dixfield (Table 3-2). In general, Dixfield soils are found highest

on the landscape and on the steepest and most convex slopes, and Cabot soils are found lowest

on the landscape and on the flattest and most concave slopes. Colonel soils occur in between

Cabot and Dixfield in terms of both hillslope position and slope shape. Other soils occur to a

lesser extent on the landscapes evaluated in this study as well. These soils, for the purpose of

validation, were designated based on which of the three dominant series they most closely

resembled morphologically.

Table 3-2. Soil series modeled in W1 and W2.

Series Name   Drainage Class    Taxonomic Class
Cabot         Poorly            Loamy, mixed, active, nonacid, frigid, shallow Typic Humaquepts
Colonel       Somewhat poorly   Loamy, isotic, frigid, shallow Aquic Haplorthods
Dixfield      Moderately well   Coarse-loamy, isotic, frigid Aquic Haplorthods

Field Sampling

The field sampling in W1 consisted of 157 soil pits dug as part of a separate (related)

project. The 157 sites were laid out in a 150 m grid design throughout the entire study area.

Detailed profile descriptions were written at each site (including documentation on soil series


and drainage class properties), and landscape and profile photographs were also taken for later

use. For use in this study, those 157 sample points were pared down in a few ways. First, areas of

W1 that are known to be bedrock-controlled were masked out, because this particular model is

not designed to map bedrock-controlled soils. This process left 128 sample points. Of those

points, seventy percent (90 points) were used to aid model development and thirty percent (38

points) were used for model validation within W1. The seventy-thirty distribution within the W1

study area is random (Figure 3-2).
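The random seventy-thirty division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the seed, and the placeholder point IDs 1 through 128 are hypothetical and are not the tools or point numbers actually used in the study.

```python
import random


def split_calibration_validation(point_ids, calibration_fraction=0.7, seed=None):
    """Randomly divide sample point IDs into calibration and validation sets."""
    rng = random.Random(seed)
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_calibration = round(calibration_fraction * len(shuffled))
    return sorted(shuffled[:n_calibration]), sorted(shuffled[n_calibration:])


# 128 points remained after masking the bedrock-controlled areas.
# The IDs 1..128 are placeholders, not the real point numbers.
calibration, validation = split_calibration_validation(range(1, 129), seed=0)
print(len(calibration), len(validation))
```

Calling the function again with a different seed gives a different arrangement of calibration and validation points, which is the idea behind the repeated runs reported later.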

Figure 3-2. Study area W1 sample points

In order to validate the model in W2, a sampling design similar to a random catena

sampling strategy was used. Six sampling points along seven catenas were dug, for a total of 42

sampling points (Figure 3-3). Detailed profile descriptions were written at each site and

photographs were taken of both the landscape and the soil profiles.

Figure 3-3. Study area W2 sample points

See Appendix A for examples of the documentation gathered during the course of this

study.

Model Development

There are eight basic steps included in the RBR-CBR process used by SIE (Shi et al.,

2007), and these are the steps that were followed to develop the model for the study area. They

are as follows:

(1) The soil scientist (myself, with guidance from two senior soil scientists) provided global

knowledge. This includes the soils expected to be found in the area as well as the typical

environmental conditions in which these soils occur. This global knowledge was

supplemented in this study by the data obtained from the 90 sample sites in W1. There

were also several hundred other known points that were not a part of the study but are in

the same county and are the same soil types. These were not formally used in this study

but are considered to be supplemental knowledge previously gained by the soil scientists.

Environmental conditions that are defined by environmental values are formalized into

rules, while those represented by geographical locations are formalized into cases.

(2) The soil scientist prepared data layers such as slope and wetness index from the DEM to

be used for characterizing the previously defined environmental conditions.

(3) The Soil Inference Engine was used to perform RBR or global CBR, using both the

global knowledge and the GIS layers. An output map was generated which shows the

general pattern of soils on the landscape, based on the input information.

(4) The soil scientist verified the initial round of output maps by comparing the results to

knowledge of the area and any known points (in this case the 90 points), and adjusted

them. This can be done by either adjusting the rules or global cases, or by fine-tuning the

maps by using the following steps. For this study, the rules were adjusted multiple times


in an attempt to gain inference results that showed high accuracy matches to the 90 points

within the first study area. This turned out to be a challenge, but looking at the study area

as a whole, it seemed the results were reasonable and therefore the process was moved

forward to validation.

(5) The soil scientist could have provided local knowledge, in the form of cases, to address

local exceptions. These occur when the results make sense from the inputs, but for some

reason it is known that a different soil may actually occur at a specific location. This

knowledge can only be gained by either a.) field sampling or b.) extensive experience and

knowledge of landforms. In this study, there were no local exceptions that were

addressed.

(6) The Soil Inference Engine would then be used to perform local CBR using the local

knowledge and the GIS layers.

(7) The soil scientist verifies the next round of output maps. The cases can be adjusted and

the CBR can be run again. Running the inference is a very quick (a matter of seconds)

process that can be repeated easily until the results are satisfactory.

(8) The soil scientist used the post-processing tools and other GIS tools (in this study, ArcGIS

(Environmental Systems Research Institute, Redlands, CA) was used extensively, specifically

Spatial Analyst) to integrate the results and generate final maps.

Once the model was fully developed for W1, it was run on the W2 study area as well. The

model developed by the soil scientist was then evaluated for the purpose of this study using an

independent validation dataset consisting of 42 sample points from W2.


Data Preparation

In basal till soils, the two main factors that have proven to provide a good basis for rules

are slope and compound topographic wetness index. Other layers, such as vegetation, landform,

and relative position were investigated and ultimately not used in this study. Both of the layers

used in the study are derived from a DEM (Figures 3-4 and 3-5) that was generated from Light Detection

and Ranging (LiDAR) data. The LiDAR data was originally provided at 1 m resolution, which

was too fine a resolution for this purpose due partly to vegetative artifacts (see Appendix B) that

affect inference results. The data was therefore filtered using a 9 x 9 rectangular neighborhood,

then resampled to a 5 m pixel size using the resample tool in ArcToolbox. The software used for

this process was ArcGIS. The DEM used for this study has this resulting 5 m pixel size as well as

approximately 30 cm vertical accuracy.
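The smoothing and resampling steps can be illustrated with a small sketch. The text above does not state which statistic the 9 x 9 filter computed or which resampling method was chosen, so the mean filter and nearest-neighbor resampling used here are assumptions, as are the function names and the synthetic elevation grid. The actual processing was done in ArcGIS.

```python
def mean_filter(grid, size=9):
    """Replace each cell with the mean of the size x size window centered on it.

    Windows are clipped at the grid edges.
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def resample_nearest(grid, factor=5):
    """Coarsen the grid by keeping the cell at the center of each factor x factor block."""
    offset = factor // 2
    return [row[offset::factor] for row in grid[offset::factor]]


# Synthetic 1 m grid: a tilted plane with a 3 m spike standing in for a
# vegetative artifact. These values are invented for illustration.
dem_1m = [[500.0 + 0.1 * c for c in range(20)] for _ in range(20)]
dem_1m[10][10] += 3.0

smoothed = mean_filter(dem_1m, size=9)
dem_5m = resample_nearest(smoothed, factor=5)
print(len(dem_5m), len(dem_5m[0]))
print(round(dem_1m[10][10], 3), round(smoothed[10][10], 3))
```

In this sketch the spike is spread across the 81 cells of its window, so it contributes only a few centimeters to the smoothed surface before the grid is coarsened.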

The terrain attributes (slope and wetness index) were derived using SIE. The tools for

deriving both layers are found under the Terrain Attributes menu of SIE. The slope layer (Figures

3-6 and 3-7) was created using the Evans-Young algorithm (Pennock et al., 1987), a

neighborhood size of 30, and a square neighborhood shape. The wetness index (Figures 3-8 and

3-9) is calculated as

w = ln(Flow Accumulation / Slope Gradient) (3-1)

with the DEM as the input. This study used a multi-path wetness index algorithm (Shi, 2007),

which routes water from the center pixel into all neighboring pixels that are lower than it. The

amount of flow to each lower pixel is proportional to the steepness in that direction. This is in

contrast to a uni-path wetness index algorithm, which only allows flow in the steepest

direction.
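The multi-path idea and Equation 3-1 can be sketched for a single 3 x 3 window. This is a simplified illustration and not the SIE implementation: the function names, the 5 m cell size default, the use of rise over run as the gradient, and the small floor on the gradient (added only to avoid division by zero on flat cells) are all assumptions.

```python
import math


def multipath_flow_fractions(window, cell_size=5.0):
    """Split flow from the center of a 3 x 3 elevation window among lower neighbors.

    Each lower neighbor receives a share proportional to the downhill
    gradient (drop divided by distance) toward it. Keys are (row, column)
    offsets from the center cell.
    """
    center = window[1][1]
    gradients = {}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            distance = cell_size * math.hypot(dr, dc)
            drop = center - window[1 + dr][1 + dc]
            if drop > 0:
                gradients[(dr, dc)] = drop / distance
    total = sum(gradients.values())
    return {key: g / total for key, g in gradients.items()} if total > 0 else {}


def wetness_index(flow_accumulation, slope_gradient, min_gradient=0.001):
    """Equation 3-1: w = ln(flow accumulation / slope gradient)."""
    return math.log(flow_accumulation / max(slope_gradient, min_gradient))


# Invented elevations (m): the center cell drains east, south, and southeast.
window = [[10.0, 10.0, 10.0],
          [10.0, 9.0, 8.0],
          [10.0, 8.0, 7.0]]
fractions = multipath_flow_fractions(window)
print({key: round(value, 3) for key, value in fractions.items()})
print(round(wetness_index(100.0, 0.1), 3))
```

A uni-path algorithm would instead send all of the flow to the single neighbor with the largest gradient, which in this window is the southeast cell.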

Figure 3-4. Elevation for study area W1

Figure 3-5. Elevation for study area W2

Figure 3-6. Slope for study area W1

Figure 3-7. Slope for study area W2

Figure 3-8. Wetness index for study area W1

Figure 3-9. Wetness index for study area W2

Rules

The rules developed for the three soil series in this study are relatively straightforward and

represent the understanding of the soils as they occur on the landscape in relation to one another.

The final rules are shown in Table 3-3, below. Figure 3-10 illustrates an example of the inference

interface which shows the membership function.

Table 3-3. Rules for Cabot, Colonel, and Dixfield soils.

           Full Membership at       0.5 Membership at        Curve Shape               P Function
Series     Slope %   Wetness Index  Slope %   Wetness Index  Slope      Wetness        Slope             Wetness
Cabot      8         6.3            20        4.8            Z-shaped   S-shaped       Limiting Factor   Limiting Factor
Colonel    15        3.9            35        2.4, 5.4       Z-shaped   Bell-shaped    Limiting Factor   Limiting Factor
Dixfield   15        3.4            8         4.9            S-shaped   Z-shaped       Limiting Factor   Limiting Factor

Figure 3-10. Inference interface for Colonel (ArcSIE). A) Bell-shaped curve for wetness index,

B) Z-shaped curve for slope.


Evaluation

The original output maps are fuzzy maps, with each pixel having an assigned fuzzy value

for each soil series. In order to have a concrete way to validate results, a specific value must be

assigned to each pixel, which is what a hardened (defuzzified) map accomplishes. Using the

post-processing tools from SIE, hardened maps of the W1 and W2 study areas were created.

The results were evaluated in two ways. First, a simple, one-to-one comparison of the

hardened map and the soil series name at the validation points in each study area was done in the

form of confusion matrices. To account for bias introduced by splitting the whole dataset into

calibration and validation sets, the procedure was repeated a total of 9 times to capture some of

the uncertainty in predictions associated with selecting calibration/validation samples. Prediction

performance for the multiple model runs is presented in the form of confusion matrices.
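A confusion matrix of the kind used for this comparison can be computed as in the sketch below. Percentages are taken across each prediction row, which is how the confusion tables in Chapter 4 are laid out. The function names are hypothetical and the example pairs are invented for illustration; they are not study data.

```python
from collections import Counter

SERIES = ("Cabot", "Colonel", "Dixfield")


def confusion_percent(pairs):
    """Row percentages for (predicted, observed) pairs, one row per predicted series."""
    counts = Counter(pairs)
    table = {}
    for predicted in SERIES:
        row_total = sum(counts[(predicted, observed)] for observed in SERIES)
        table[predicted] = {
            observed: (100.0 * counts[(predicted, observed)] / row_total if row_total else 0.0)
            for observed in SERIES
        }
    return table


def overall_accuracy(pairs):
    """Percent of sites where the hardened map matches the field observation."""
    pairs = list(pairs)
    return 100.0 * sum(predicted == observed for predicted, observed in pairs) / len(pairs)


# Invented (predicted, observed) pairs for eight hypothetical validation sites.
example_pairs = (
    [("Cabot", "Cabot")] * 3
    + [("Cabot", "Colonel")]
    + [("Colonel", "Colonel")] * 2
    + [("Dixfield", "Dixfield")] * 2
)
table = confusion_percent(example_pairs)
for predicted in SERIES:
    print(predicted, table[predicted])
print(overall_accuracy(example_pairs))
```

Repeating this calculation for each of the 9 calibration/validation arrangements gives the ranges and means reported for the multiple runs.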

Second, a process was developed for evaluating the results based on fuzzy drainage class.

One of the questions that came up during the course of this study was that of “typical” soils

versus soils that remain in a series but that are not so typical of that series. This led to the

development of fuzzy boundaries for soil series based on drainage class. For example, the

Dixfield series falls into the ‘moderately well drained’ drainage class, which has a range of

characteristics defined that allows all soils that have redoximorphic features between 41 and 102

cm to be grouped in the same category. Some soils that are classified as Dixfield are more typical

of Dixfield while some are still Dixfield but are on the dry fringe and others are on the wet

fringe. A set of criteria (Table 3-4) was developed which allows the illustration of this

differentiation between what is ‘typical’ in a soil series (based on drainage class) and what is not.

Since this model was developed for three soils in one catena, each belonging to a different drainage

class, and the properties measured in the field were consistent with those that can be used to

determine drainage class, this was deemed a reasonable evaluation characteristic.


Table 3-4. Evaluation criteria for fuzzy drainage class.

Drainage Class (Soil Series)         Typical Characteristics      Wetter Fringe Characteristics   Drier Fringe Characteristics
Poorly Drained (Cabot)               O horizon 0-15 cm,           O horizon 15-20 cm              Chroma 3 within 76 cm of top of
                                     chroma 2 in profile                                          mineral soil; must be chroma 2
                                                                                                  somewhere
Somewhat Poorly Drained (Colonel)    Redox between 23 and 36 cm
Moderately Well Drained (Dixfield)
It may be noted here that the wetter fringe characteristics of Colonel are outside the range in

characteristics listed in the Official Series Description for the Colonel Series (N.C.S.S., 2008).

This is because this study was developed to test the transferability of a simple model with only

three soils, and once the study was underway, it was discovered that in places in both study areas

there are soils occurring between Cabot and Colonel on the drainage class profile. These soils are

Spodosols that are morphologically more similar to Colonel than to Cabot, so they were counted

as “Colonel” (most like Colonel) for the purpose of this study. Also, drainage class evaluation

criteria were defined such that they captured these intermediate soils as somewhat poorly

drained. Specifically, a reduced matrix was made a requirement for poorly drained soils.

Every validation point was then assigned a fuzzy value (Table 3-5) based on a

comparison of the SIE results and the evaluation of whether the field results were typical for the

series’ drainage class. For example, if SIE predicted Colonel, and field results yielded a wet-

fringed Dixfield, a fuzzy membership value of 0.75 was assigned. A high fuzzy membership

number means the field results more closely match the central concept of the drainage class

associated with the predicted soil.


Table 3-5. Matrix of fuzzy membership designations comparing SIE results and fuzzy drainage classes.

             Field Results
             Cabot                             Colonel                           Dixfield
             (Poorly Drained)                  (Somewhat Poorly Drained)         (Moderately Well Drained)
SIE Output   Wet Fringe  Typical  Dry Fringe   Wet Fringe  Typical  Dry Fringe   Wet Fringe  Typical  Dry Fringe
Cabot        1           1        1            .75         .5       .25          0           0        0
Colonel      .25         .5       .75          1           1        1            .75         .5       .25
Dixfield     0           0        0            .25         .5       .75          1           1        1

Accuracy numbers were then determined based on these fuzzy membership designations

by adding up all the fuzzy drainage class memberships in a given drainage class set and dividing

by the number of sample sites in that set.
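The lookup in Table 3-5 and the accuracy calculation described above can be sketched as follows. The membership values are copied from Table 3-5; the function names and the four example sites are hypothetical and serve only to show the arithmetic.

```python
# FUZZY[sie_output][field_series] = (wet fringe, typical, dry fringe), from Table 3-5.
FUZZY = {
    "Cabot": {
        "Cabot": (1, 1, 1),
        "Colonel": (0.75, 0.5, 0.25),
        "Dixfield": (0, 0, 0),
    },
    "Colonel": {
        "Cabot": (0.25, 0.5, 0.75),
        "Colonel": (1, 1, 1),
        "Dixfield": (0.75, 0.5, 0.25),
    },
    "Dixfield": {
        "Cabot": (0, 0, 0),
        "Colonel": (0.25, 0.5, 0.75),
        "Dixfield": (1, 1, 1),
    },
}
FRINGE = {"wet": 0, "typical": 1, "dry": 2}


def fuzzy_membership(sie_output, field_series, fringe):
    """Fuzzy value for one validation point."""
    return FUZZY[sie_output][field_series][FRINGE[fringe]]


def fuzzy_accuracy(sites):
    """Sum of fuzzy memberships divided by the number of sites, as a percent."""
    values = [fuzzy_membership(*site) for site in sites]
    return 100.0 * sum(values) / len(values)


# The example from the text: SIE predicted Colonel, field result was a wet-fringed Dixfield.
print(fuzzy_membership("Colonel", "Dixfield", "wet"))

# Four hypothetical sites given as (SIE output, field series, fringe).
example_sites = [
    ("Colonel", "Colonel", "typical"),
    ("Cabot", "Colonel", "wet"),
    ("Dixfield", "Colonel", "typical"),
    ("Cabot", "Colonel", "wet"),
]
print(fuzzy_accuracy(example_sites))
```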


CHAPTER 4 RESULTS AND DISCUSSION

Final Predictions

The initial output maps from SIE show the fuzzy results for each soil series (Figures 4-1

through 4-6). On each of these maps, darker colors mean higher fuzzy memberships for that soil.

The final prediction maps (Figures 4-7 and 4-8) for each study area are hardened maps of

the SIE results, and also serve as a proxy for drainage class maps, because each soil type has a

drainage class associated with it. The hardened maps are created by aggregating all three of the

fuzzy membership maps for each study area using SIE to assign, at each pixel, the soil series

with the highest fuzzy membership.
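The hardening step amounts to taking, at each pixel, the series with the largest fuzzy membership. A minimal sketch is given below; the function name and the tiny example grids are hypothetical, and the handling of ties (the first series listed wins) is an assumption, since the tie rule used by the SIE post-processing tools is not described here.

```python
def harden(fuzzy_maps):
    """Return a grid of series names, choosing the highest membership at each pixel.

    fuzzy_maps maps each series name to a 2D list of memberships; all grids
    must have the same shape.
    """
    series = list(fuzzy_maps)
    first = fuzzy_maps[series[0]]
    return [
        [max(series, key=lambda s: fuzzy_maps[s][r][c]) for c in range(len(first[0]))]
        for r in range(len(first))
    ]


# Invented memberships for a strip of three pixels running downslope to upslope.
example_maps = {
    "Cabot": [[0.9, 0.3, 0.0]],
    "Colonel": [[0.4, 0.8, 0.3]],
    "Dixfield": [[0.0, 0.2, 0.7]],
}
hardened = harden(example_maps)
print(hardened)
```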

Figure 4-1. Fuzzy prediction map of Cabot soil series for study area W1

Figure 4-2. Fuzzy prediction map of Colonel soil series for study area W1

Figure 4-3. Fuzzy prediction map of Dixfield soil series for study area W1

Figure 4-4. Fuzzy prediction map of Cabot soil series for study area W2

Figure 4-5. Fuzzy prediction map of Colonel soil series for study area W2

Figure 4-6. Fuzzy prediction map of Dixfield soil series for study area W2

Figure 4-7. Final prediction map of soil series for study area W1.

Figure 4-8. Final prediction map of soil series for study area W2.

Evaluation of Predicting Soil Series

The one-to-one comparison of the hardened map to the soil series as found in the field

yielded low (42.6 percent) accuracy for the calibration sites in W1.

The one-to-one comparison of the hardened map to the soil series as found in the field

yielded 73.7 percent accuracy overall in W1 (validation sites) and 71.4 percent accuracy overall

in W2. The confusion tables below show the breakdown of percent accuracy results by series

name.

Table 4-1. Confusion table that compares calibration prediction results based on SIE to observed soil series including most similar soil series using 90 model development sites in W1

Calibration sites (n:90), percent

              Observations
Predictions   Cabot   Colonel   Dixfield
Cabot         42      25        33
Colonel       21      47        33
Dixfield      9       52        39

Table 4-2. Confusion table that compares validation prediction results based on SIE to observed soil series including most similar soil series using 38 independent evaluation sites in W1

Validation sites (n:38), percent

              Observations
Predictions   Cabot   Colonel   Dixfield
Cabot         73      27        0
Colonel       15      77        8
Dixfield      0       30        70


Table 4-3. Confusion table that compares validation prediction results based on SIE to observed soil series including most similar soil series using 42 independent evaluation sites in W2

Validation sites (n:42), percent

              Observations
Predictions   Cabot   Colonel   Dixfield
Cabot         69      31        0
Colonel       11      63        26
Dixfield      0       10        90

Since the accuracy for the calibration points was so low compared to the validation points

in W1, the statistics were recalculated for multiple different assignments of points to the

calibration and validation sets within W1 (Tables 4-4 and 4-5). A breakdown

of these results can be seen in Appendix D.

Table 4-4. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 using 9 calibration runs

Calibration sites (n:90), percent

              Observations
Predictions   Cabot                  Colonel                Dixfield
Cabot         42 to 56 (mean: 51)    18 to 32 (mean: 26)    18 to 33 (mean: 23)
Colonel       18 to 27 (mean: 22)    40 to 58 (mean: 50)    21 to 40 (mean: 28)
Dixfield      0 to 10 (mean: 7)      36 to 55 (mean: 47)    35 to 55 (mean: 47)


Table 4-5. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 using 9 validation runs

Validation sites (n:38), percent

              Observations
Predictions   Cabot                  Colonel                Dixfield
Cabot         50 to 73 (mean: 60)    9 to 45 (mean: 24)     0 to 27 (mean: 16)
Colonel       6 to 31 (mean: 19)     38 to 77 (mean: 52)    8 to 40 (mean: 29)
Dixfield      0 to 18 (mean: 5)      30 to 64 (mean: 43)    36 to 70 (mean: 52)

Fuzzy Drainage Class

The fuzzy drainage class results show an overall average between classes of 88.8 percent

accuracy in W1 and 89.9 percent accuracy in W2 (validation sets). The calibration points were

62.6 percent accurate overall when comparing fuzzy drainage class prediction results. While the

calibration points still had lower accuracy numbers than the validation points, the drainage class

results show higher accuracy (Tables 4-6, 4-7, and 4-8) than the one-to-one soil series

comparison seen in the above confusion tables.

Table 4-6. Confusion table that compares calibration prediction results based on SIE to observed drainage classes using 90 model development sites in W1

Percent

                          Observations
Predictions               Poorly Drained   Somewhat Poorly Drained   Moderately Well Drained
Poorly Drained            68               32                        0
Somewhat Poorly Drained   17               54                        29
Moderately Well Drained   0                33                        66


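Table 4-7. Confusion table that compares validation prediction results based on SIE to observed drainage classes using 38 independent evaluation sites in W1

Percent

                          Observations
Predictions               Poorly Drained   Somewhat Poorly Drained   Moderately Well Drained
Poorly Drained
Somewhat Poorly Drained   7                87                        6
Moderately Well Drained   0                15                        85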

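Table 4-8. Confusion table that compares validation prediction results based on SIE to observed drainage classes using 42 independent evaluation sites in W2

Validation sites (n:42), percent

                          Observations
Predictions               Poorly Drained   Somewhat Poorly Drained   Moderately Well Drained
Poorly Drained
Somewhat Poorly Drained   6                73                        21
Moderately Well Drained   0                5                         95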

The overall accuracy ratings for each study area, distributed by drainage class, are

presented in Table 4-9.

Table 4-9. Percent accuracy overall based on fuzzy drainage class membership (Validation)

Drainage Class (Soil Series)         W1   W2
Poorly Drained (Cabot)               93   90
Somewhat Poorly Drained (Colonel)    88   83
Moderately Well Drained (Dixfield)   87   95

Poorly drained and somewhat poorly drained soils had field results that more closely

matched the central concept of the predicted drainage class in W1 than in W2. However,


moderately well drained soils showed higher accuracy in W2 than in W1. Overall, assigning

fuzzy memberships to the validation point field data brought accuracy ratings up from the raw

comparison between the hardened SIE results and soil series.

Discussion

The results from both the direct comparison between the hardened map and field results

and the fuzzy drainage class comparison show that the model is highly transferable between the

two study areas, specifically looking at the validation points. The calibration points showed

lower accuracy compared to the validation points, which could be a result of the fact that the

calibration set is so much bigger than the validation set in W1 and thus captures more variability

in the landscape.

The results from different configurations of points show that the model is sensitive to the

selection of sample and observation sites for calibration and validation. This is illustrated by the

fact that the accuracy numbers change, at times dramatically, between soil series and point

selections. It must be considered that the validation set is small relative to the calibration set and

a random selection of points can skew the results one way or another.

Although the calibration points resulted in low accuracy numbers, the overall results for W1

looked reasonable according to expert soil scientists (myself included), so the study was pushed

forward to the validation stage. The results for the validation sets showed high accuracy

numbers and thus good transferability between similar areas. The model should therefore

transfer well to other areas that are similar to these study areas. As one or more

environmental factors change, the transferability of the model will

go down.


Assigning fuzzy drainage class memberships not only brings accuracy numbers up, but it

points to the concept of a continuous field model, with soils changing gradually across the

landscape rather than having discrete boundaries between one another.

This model represented the basic soil landscape relationship of drier soils occurring

higher on the landscape and wetter soils occurring lower on the landscape. The curves (rules)

were designed with the two environmental variables (slope and wetness index) to reflect this

relationship. As the slope increased and wetness decreased, drier soils took over. Wetness index

served as a proxy for landscape position because its value depends on landscape position.

The resulting maps reflected the modeled soil landscape relationships in that the driest

soil, Dixfield, generally occurred highest on the landscape and the wettest soil, Cabot, occurred

lowest on the landscape, in the drainageways and flat areas. Colonel, which is between Cabot

and Dixfield both in relative slope position and drainage class, occurred on middle slopes,

generally in between Cabot and Dixfield soils.

Knowledge-based prediction models have previously been compared to traditional soil

mapping (Zhu et al., 2001). The model that was tested in that study was SoLIM, and SoLIM was

found to be correct 81 percent of the time compared to the soil map being correct 61 percent of

the time at one site, and the corresponding numbers at another site were 83.8 percent and 66.7

percent, respectively. The areas in this study have not been mapped traditionally, so such a

comparison cannot be made; however, the SIE accuracy numbers were slightly lower, at 73.7

percent in W1 and 71.4 percent in W2. The fact that the county has never been mapped could be

one reason for the lower accuracy numbers; it is reasonable to assume that a soil scientist

creating a model for an area that has already been worked in extensively would create a more

accurate model.


There is discussion in the soil mapping community (undocumented meeting discussions)

about raster versus vector mapping. The output from SIE is pixel, or raster, based, and this could

have some benefits for users of the soil information. Traditionally, soil maps have been given out

in polygon format, with each polygon representing a map unit labeled with one or two named

soil types, and a customer would have to look at metadata to find out that there is actually the

possibility of finding multiple other soils within that polygon. With raster data output, it is much

easier to create a map that shows the continuous distribution of those so-called “inclusions” of

soils within the map units. The accuracy of the raster soils data depends on the accuracy of the

inputs, right down to the DEM. For this study, there was a very accurate one meter DEM

available, which is not the case in most places. This raster resolution could affect the spatial

resolution of the soil prediction maps. For this study, as for the rest of the county that is currently

being mapped, the scale is 1:24,000.

There are constraints to this model. Three soil series were modeled, with accuracies

between 70 and 80 percent. That leaves 20 to 30 percent unexplained. Of the five CLORPT

factors (climate, organisms, relief, parent material, and time), the one that most likely plays the

biggest role in variability in this region is relief. This corresponds to the environmental factors

that were used to create the model (slope and wetness), in that the catena concept shows that as

topography varies, so does drainage and wetness. Variability in topography can lead to

variability in drainage, though the catena concept would suggest that if the topography varies in

the same manner, the drainage would change accordingly. Sudden changes in land surface occur

indiscriminately across the landscape in both study areas. Tied in to this is the fact that

evaluation of results relied partly on accuracy of GPS readings. Most of the study areas are


forested and even with extra backpack antennas, there is the possibility that sample holes were

dug outside of the correct pixel, on a slightly different landscape position.

Many more than three soils will need to be modeled at a time in the future. This model

was limited to three soils as a test of one catena. If a soil scientist can conceptualize a soil

landscape model and has the available data layers to transfer that concept into a rule, SIE can be

used to model that soil type or class. For this model, multiple other data layers were tested and

ultimately not used because it was found that they did not add any benefit to the model's outcome

and only served to complicate things. This is not always the case, and as more, similar soils get

added to the mix, it becomes necessary to add more environmental layers in order to differentiate

between soil types.

Soil properties are of interest to consultants, researchers, and agencies for multiple uses.

A knowledge-based model such as SIE has the potential to predict continuous soil properties in

the same manner as described above for soil classes. Zhu et al. (2001) modeled soil properties in

two study areas using fuzzy logic knowledge-based modeling. If a soil property model can be

conceptualized and environmental data layers are available that allow the transference of that

knowledge to the model, inference should be able to be performed. However, SIE has not been

tested as a tool for modeling soil properties, so more research and development would need to be

invested in order to investigate the question of continuous soil property prediction.


CHAPTER 5 SUGGESTIONS FOR FURTHER RESEARCH

There are multiple related issues that directly impact soil scientists working with SIE. The

first is that of data manipulation: at what point has the DEM been manipulated sufficiently to

accurately reflect what is on the ground and also allow for relatively flawless rule

development? There are practically unlimited possibilities for data manipulation built not only into

the SIE software, but also into the other GIS software packages that soil scientists use every day.

One other issue is the method of evaluation of results. For this study, the results were

evaluated on a pixel level, which is important on a very basic level, and must be done before a

model can be considered useful. However, the soil scientists who use SIE for soil mapping are

more concerned with an end product that fits the concept of map units. The new concept of map

units could be raster data, though there is still currently a need for vectors due to SSURGO (Soil

Survey Geographic) Database standards. It becomes important to know whether the level of detail

that SIE provides is not only accurate but also translates to map unit composition concepts. High

resolution soil maps are easy to understand and appealing to soil scientists, but other users such as

conservation planners and farmers find them daunting and wonder if the detail is really how soils

occur across the landscape. Questions remain on the validity of creating vector maps for map

units from the raster data, while preserving the raster data for later use. The results of this study

can be built upon to move into a map unit discussion, where applicable.

A third question is that of the limits of transferability. This study demonstrates that models

are transferable between similar landscapes, but there is sure to come a point when they are not

transferable. When is this point? Can it be defined within certain types of landscapes? Soil

variability is linked to variability within CLORPT factors (climate, organisms, relief, parent


material, time, as well as spatial position), and if the CLORPT factors in two soil regions differ,

it is reasonable to believe that transferability will be limited.

All of these questions are important to the study of soils and soil-landscape analysis, and all

can readily be investigated.


APPENDIX A DOCUMENTATION EXAMPLES

Figure A-1. Sample point 127 description


Figure A-2. Sample point 127 profile photo


Figure A-3. Sample point 127 landscape photo


APPENDIX B VEGETATIVE ARTIFACTS IN DIGITAL ELEVATION DATA

Figure B-1. Vegetative artifacts in digital elevation data


APPENDIX C FUZZY DRAINAGE CLASS DESIGNATIONS

Table C-1. Study area W1 fuzzy drainage class designations (validation)

Point # | Described as? | SIE result (1 = Cabot, 2 = Colonel, 3 = Dixfield) | Drainage class features | Typical of described drainage class? | Fuzzy membership
8 | Colonel | 2 | Redox at 34 cm. | Yes | 1
9 | Colonel | 2 | Redox at 31 cm. | Yes | 1
14 | Colonel | 1 | redox at 6 cm | fringe toward pd | 0.75
18 | Dixfield | 2 | redox at 54 cm | fringe toward spd | 0.75
21 | Colonel | 1 | redox at 19 cm | fringe toward pd | 0.75
23 | Colonel | 1 | redox at 0 cm | fringe toward pd | 0.75
29 | Cabot | 1 | O horizon 12 cm., depleted matrix with redox | Yes | 1
30 | Dixfield | 3 | redox at 62 cm. | Yes | 1
31 | Dixfield | 3 | redox at 44 cm. | fringe toward spd | 1
32 | Dixfield | 3 | redox at 46 cm. | fringe toward spd | 1
38 | Cabot | 1 | O horizon 3 cm, depleted matrix with redox | Yes | 1
42 | Colonel | 2 | Redox at 17 cm | fringe toward pd | 1
47 | Cabot | 1 | O horizon 4 cm, depleted matrix with redox | Yes | 1
51 | Colonel | 2 | Redox at 37 cm. | fringe toward mwd | 1
54 | Cabot | 1 | O horizon 16 cm, depleted matrix with redox | fringe toward vpd | 1
56 | Cabot | 1 | O horizon 2 cm, depleted matrix with redox | Yes | 1
59 | Cabot | 1 | O horizon 10 cm, chroma 4 above 76 cm | fringe toward spd | 1
64 | Colonel | 2 | redox at 25 cm. | Yes | 1
66 | Colonel | 2 | redox at 15 cm. | fringe toward pd | 1
71 | Colonel | 2 | redox at 3 cm. | fringe toward pd | 1
80 | Colonel | 2 | redox at 9 cm. | fringe toward pd | 1
87 | Dixfield | 3 | redox at 58 cm. | Yes | 1
90 | Colonel | 2 | redox at 20 cm. | fringe toward pd | 1
91 | Cabot | 1 | O horizon 4 cm, chroma 3 within 76 cm | fringe toward spd | 1
98 | Dixfield | 3 | redox at 45 cm | fringe toward spd | 1
121 | Colonel | 3 | redox at 36 cm | Yes | 0.5
126 | Cabot | 2 | O horizon 15 cm, depleted matrix with redox | Yes | 0.5
127 | Colonel | 2 | Redox at 34 cm. | Yes | 1
128 | Colonel | 1 | redox at 0 cm | fringe toward pd | 0.75
135 | Cabot | 2 | O horizon 20 cm, depleted matrix with redox | fringe toward vpd | 0.25
137 | Dixfield | 3 | redox at 68 cm | Yes | 1
141 | Colonel | 3 | redox at 7 cm | fringe toward pd | 0.25
144 | Dixfield | 3 | redox at 48 cm | fringe toward spd | 1
154 | Cabot | 1 | O horizon 19 cm | Yes | 1
158 | Colonel | 3 | redox at 27 cm | Yes | 0.5
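The fuzzy membership column in Table C-1 can be contrasted with a strict series match. The sketch below is illustrative only and is not necessarily how the statistics in this study were computed: it takes five rows copied from Table C-1 and compares the share of points where the SIE result equals the described series (a crisp match) with the mean fuzzy membership. The function names and the choice of a simple mean are assumptions.

```python
# Series codes as given in the Table C-1 header
SERIES_CODE = {"Cabot": 1, "Colonel": 2, "Dixfield": 3}

# (point, described series, SIE result code, fuzzy membership), copied from Table C-1
rows = [
    (8, "Colonel", 2, 1.0),
    (14, "Colonel", 1, 0.75),
    (18, "Dixfield", 2, 0.75),
    (121, "Colonel", 3, 0.5),
    (135, "Cabot", 2, 0.25),
]


def crisp_agreement(rows):
    """Fraction of points where the SIE result equals the described series."""
    matches = sum(
        1 for _, described, result, _ in rows if SERIES_CODE[described] == result
    )
    return matches / len(rows)


def mean_fuzzy_membership(rows):
    """Average of the fuzzy membership column (an assumed summary statistic)."""
    return sum(membership for *_, membership in rows) / len(rows)


print("crisp agreement:", crisp_agreement(rows))
print("mean fuzzy membership:", mean_fuzzy_membership(rows))
```

For these five points only one strict match occurs, yet the mean fuzzy membership is considerably higher, which illustrates how the fuzzy designation gives partial credit to points that fringe toward a neighboring drainage class.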

Table C-2. Study area W2 fuzzy drainage class designations (validation)




Point # | Described as? | SIE result (1 = Cabot, 2 = Colonel, 3 = Dixfield) | Drainage class features | Typical of described drainage class? | Fuzzy membership
1 | Colonel | 1 | redox at 24 cm. | Yes | 0.5
2 | Dixfield | 3 | only faint redox at 70 cm. | fringe toward wd | 1
3 | Dixfield | 2 | redox at 48 cm | fringe toward spd | 0.75
4 | Colonel | 1 | redox at 16 cm | fringe toward pd | 0.75
6 | Colonel | 2 | redox at 12 cm | fringe toward pd | 1
7 | Dixfield | 3 | redox at 30 cm but really Sunapee | for model | 1
8 | Dixfield | 3 | redox at 63 cm | Yes | 1
9 | Dixfield | 3 | redox at 82 cm | Yes | 1
10 | Colonel | 2 | redox at 35 cm | Yes | 1
12 | Dixfield | 2 | redox at 63 cm | Yes | 0.5
13 | Dixfield | 3 | redox at 33 cm but really Monadnock | for model | 1
14 | Colonel | 2 | redox at 23 cm | Yes | 1
17 | Dixfield | 3 | no redox | fringe toward wd | 1
19 | Colonel | 2 | redox at 24 cm. | Yes | 1
20 | Cabot | 1 | O horizon 12 cm, depleted matrix with redox | Yes | 1
21 | Colonel | 2 | redox at 32 cm | Yes | 1
23 | Colonel | 3 | redox at 35 cm | Yes | 0.5
24 | Dixfield | 3 | redox at 53 cm | fringe toward spd | 1
25 | Cabot | 1 | O horizon 15 cm, depleted matrix with redox | Yes | 1
28 | Cabot | 1 | O horizon 16 cm | fringe toward vpd | 1
29 | Cabot | 2 | O horizon 4 cm, depleted matrix with redox | Yes | 0.5
37 | Cabot | 1 | O horizon 12 cm, chroma 3 above 76 cm | fringe toward spd | 1
39 | Cabot | 1 | O horizon 13 cm, reduced matrix with redox | Yes | 1
41 | Cabot | 2 | O horizon 14 cm, reduced matrix with redox | Yes | 0.5
42 | Cabot | 1 | O horizon 9 cm, reduced matrix with redox | Yes | 1

Table C-3. Study area W1 fuzzy drainage class designations (calibration)

Point # | Described as? | SIE result (1 = Cabot, 2 = Colonel, 3 = Dixfield) | Drainage class features | Typical of described drainage class? | Fuzzy membership
1 | Cabot | 1 | O horizon 20 cm., depleted matrix with redox | fringe toward vpd | 1
2 | Cabot | 1 |  | fringe toward vpd | 1
4 | Dixfield | 2 | redox at 59 cm. | Yes | 0.5
5 | Colonel | 3 | redox at 7 cm. | fringe toward pd | 0.25
6 | Cabot | 2 |  | fringe toward vpd | 0.25
10 | Dixfield | 1 | redox at 12 cm. | fringe toward pd | 0
11 | Cabot | 1 | O horizon 27 cm but really Peacham | fringe toward vpd | 1
13 | Cabot | 2 | O horizon 17 cm, depleted matrix with redox |  | 
20 | Colonel | 2 | redox at 39 cm | fringe toward mwd | 1
22 | Colonel | 2 | redox at 28 cm. | yes | 1
24 | Colonel | 1 | redox at 28 cm. | Yes | 0.5
28 | Cabot | 2 | O horizon 8 cm., depleted matrix with redox | Yes | 0.5
39 | Dixfield | 2 | redox at 57 cm | yes | 0.5
41 | Dixfield | 2 | redox at 0 cm but really Lyman | for model | 0.5
43 | Colonel | 3 | redox at 37 cm | fringe toward mwd | 0.75
45 | Cabot | 3 | chroma 3 | fringe toward swpd | 0
46 | Cabot | 2 | O horizon 38 cm, but really Peacham |  | 
52 | Dixfield | 2 | redox at 50 | fringe toward spd | 0.75
53 | Dixfield | 2 | redox at 45 | fringe toward spd | 0.75
55 | Colonel | 2 | redox at 25 cm | yes | 1
62 | Dixfield | 1 | redox at 59 cm. | yes | 0
65 | Colonel | 3 | redox at 25 cm | yes | 0.5
68 | Cabot | 2 | O horizon 5 cm, depleted matrix with redox | yes | 0.5
69 | Colonel | 1 | redox at 5 cm | fringe toward pd | 0.75
72 | Dixfield | 3 | redox at 12 cm. but really Tunbridge | for model | 1
76 | Colonel | 2 | redox at 40 cm | fringe toward mwd | 1
77 | Cabot | 2 | chroma 3 | fringe toward spd | 0.75
79 | Cabot | 1 | O horizon 1 cm, depleted matrix with redox | yes | 1
86 | Dixfield | 2 | redox at 23 cm, but really Tunbridge | for model | 0.5
92 | Dixfield | 2 | No redox, but really Tunbridge | for model | 0.5
95 | Dixfield | 3 | redox at 0 cm, but really Tunbridge | for model | 1
97 | Cabot | 3 | O horizon 9 cm, depleted matrix with redox | yes | 0
106 | Cabot | 1 | chroma 3 | fringe toward spd | 1
107 | Dixfield | 3 | no redox but really Abram | for model | 1
113 | Cabot | 2 | chroma 3 | fringe toward spd | 0.75
114 | Dixfield | 3 | no redox but really Tunbridge | for model | 1
120 | Dixfield | 2 | redox at 49 cm | fringe toward spd | 0.75
122 | Cabot | 1 | chroma 3 | fringe toward spd | 1
123 | Dixfield | 2 | no redox but really Berkshire | for model | 0.5
130 | Dixfield | 1 | redox at 37 cm but really Sunapee | fringe toward spd | 0
131 | Colonel | 2 | redox at 28 cm | yes | 1
132 | Dixfield | 3 | redox at 74 cm | yes | 1
133 | Colonel | 3 | redox at 26 cm | yes | 0.5
134 | Colonel | 3 | redox at 36 cm | yes | 0.5
142 | Dixfield | 3 | redox at 9 cm but really Tunbridge | for model | 1
 |  |  |  | fringe toward spd | 1
155 | Dixfield | 1 | no redox but really Abram | for model | 0
 |  |  |  | fringe toward spd | 1
159 | Dixfield | 2 | redox at 23 but really Sheepscot | fringe toward spd | 0.75


APPENDIX D PREDICTION RESULTS FROM W1 (MULTIPLE SAMPLE CONFIGURATIONS)

Table D-1. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 2)



Predictions
Cabot | 50 | 27 | 23
Colonel | 18 | 53 | 30
Dixfield | 10 | 55 | 35
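A confusion table such as Table D-1 can be summarized with a few lines of code. The sketch below is a hedged illustration rather than the evaluation procedure of this study. It assumes that each row is a predicted series, that the three value columns are the observed series in the same order as the rows (Cabot, Colonel, Dixfield), and that the values are row percentages; the column headings did not survive in this copy, so that ordering is an assumption. The function names are hypothetical.

```python
SERIES = ["Cabot", "Colonel", "Dixfield"]

# Rows copied from Table D-1; column order is ASSUMED to match SERIES
table_d1 = {
    "Cabot": [50, 27, 23],
    "Colonel": [18, 53, 30],
    "Dixfield": [10, 55, 35],
}


def diagonal_agreement(table):
    """Percent of each predicted series that falls in the matching column."""
    return {series: table[series][SERIES.index(series)] for series in SERIES}


def largest_confusion(table):
    """Return (predicted, observed, percent) for the largest off-diagonal cell."""
    best = None
    for predicted in SERIES:
        for column, observed in enumerate(SERIES):
            if observed == predicted:
                continue  # skip the diagonal (agreement) cell
            cell = (predicted, observed, table[predicted][column])
            if best is None or cell[2] > best[2]:
                best = cell
    return best


agreement = diagonal_agreement(table_d1)
mean_agreement = sum(agreement.values()) / len(agreement)
worst = largest_confusion(table_d1)

print(agreement)
print("mean diagonal agreement:", mean_agreement)
print("largest confusion:", worst)
```

Under these assumptions the largest off-diagonal cell pairs adjacent drainage classes, which is the kind of pattern the fuzzy designations in Appendix C are meant to capture.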

Table D-2. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 2)

Observations: Validation sites (n = 38)
Predictions
Cabot | 67 | 22 | 11
Colonel | 31 | 38 | 31
Dixfield | 0 | 31 | 69

Table D-3. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 3)

Observations: Calibration sites (n = 90)
Predictions
Cabot | 52 | 28 | 20
Colonel | 21 | 48 | 31
Dixfield | 9 | 48 | 43

Table D-4. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 3)

Predictions
Cabot | 57 | 21 | 21
Colonel | 21 | 50 | 29
Dixfield | 0 | 40 | 60




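Table D-5. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 4)

Predictions
Cabot | 50 | 32 | 18
Colonel | 27 | 45 | 28
Dixfield | 9 | 36 | 55

Table D-6. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 4)

Predictions
Cabot | 64 | 9 | 27
Colonel | 6 | 56 | 38
Dixfield | 0 | 64 | 36

Table D-7. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 5)

Predictions
Cabot | 54 | 18 | 29
Colonel | 26 | 43 | 31
Dixfield | 10 | 45 | 45

Table D-8. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 5)

Predictions
Cabot | 55 | 45 | 0
Colonel | 7 | 64 | 29
Dixfield | 0 | 46 | 54

Table D-9. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 6)

Predictions
Cabot | 55 | 23 | 22
Colonel | 22 | 57 | 21
Dixfield | 0 | 45 | 55

Table D-10. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 6)

Predictions
Cabot | 53 | 29 | 18
Colonel | 20 | 40 | 40
Dixfield | 18 | 45 | 36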

Table D-11. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 7)

Observations: Calibration sites (n = 90)
Predictions
Cabot | 54 | 25 | 21
Colonel | 21 | 55 | 24
Dixfield | 5 | 45 | 50

Table D-12. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 7)

Observations: Validation sites (n = 38)
Predictions
Cabot | 55 | 27 | 18
Colonel | 21 | 50 | 29
Dixfield | 8 | 46 | 46

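Table D-13. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 8)

Predictions
Cabot | 56 | 26 | 19
Colonel | 18 | 58 | 24
Dixfield | 8 | 48 | 44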

Table D-14. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 8)

Observations: Validation sites (n = 38)
Predictions
Cabot | 50 | 25 | 25
Colonel | 28 | 44 | 28
Dixfield | 0 | 37 | 63

Table D-15. Confusion table that compares calibration prediction results based on SIE to observed soil series using 90 model development sites in W1 (configuration 9)

Observations: Calibration sites (n = 90)
Predictions
Cabot | 50 | 32 | 18
Colonel | 20 | 55 | 25
Dixfield | 0 | 45 | 55

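Table D-16. Confusion table that compares validation prediction results based on SIE to observed soil series using 38 independent evaluation sites in W1 (configuration 9)

Predictions
Cabot | 64 | 9 | 27
Colonel | 25 | 50 | 25
Dixfield | 18 | 45 | 36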

79

LIST OF REFERENCES

Bishop, T.F.A., and B. Minasny. 2006. Digital soil-terrain modeling: the predictive potential and uncertainty. p. 185-213. In Grunwald, S. (ed.) Environmental Soil Landscape Modeling, Geographic Information Technologies and Pedometrics. Taylor and Francis, New York. Bouma, J. 1989. Using soil survey data for quantitative land evaluation. Advances in Soil Science 9:177-213. Cook, S.E., R.J. Corner, G. Grealish, P. E. Gessler, and C. J. Chartress. 1996. A rule-based system to map soil properties. Soil Sci. Soc. Am. J. 60:1893-1900. Grunwald, S. 2006. What do we really know about the space-time continuum of soil- landscapes? p. 3-36. In Grunwald, S. (ed.) Environmental Soil Landscape Modeling, Geographic Information Technologies and Pedometrics. Taylor and Francis, New York. Jenny, H. 1941. Factors of Soil Formation, A System of Quantitative Pedology. McGraw-Hill, New York. Lagacherie, P., J.P. Legros, and P.A. Burrough. 1995. A soil survey procedure using the knowledge on soil pattern of a previously mapped reference area. Geoderma 65:283- 301. Lagacherie, P. and M. Voltz. 2000. Predicting soil properties over a region using sample information from a mapped reference area and digital elevation data: a conditional probability approach. Geoderma 97:187-208. Lamsal S., S. Grunwald, G.L. Bruland, C.M. Bliss and N.B. Comerford. 2006. Regional hybrid geospatial modeling of soil nitrate-nitrogen in the Santa Fe River Watershed. Geoderma 135:233-247. McBratney, A.B. 2006. Background to digital soil mapping. International Working Group on Digital Soil Mapping. Retrieved April 23, 2007 from http://www.digitalsoilmapping.org/DSM_Background.html McBratney, A.B., and I.O.A. Odeh. 1997. Application of fuzzy sets in soil science: fuzzy logic, fuzzy measurements and fuzzy decisions. Geoderma 77:85-113. McBratney, A.B., I.O.A. Odeh, T.F.A. Bishop, M.S. Dunbar, and T.M. Shatar. 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97:293-327. McBratney, A.B., B. 
Minasny, S.R. Cattle, and R. W. Vervoort. 2002. From pedotransfer functions to soil inference systems. Geoderma 109:41-73. McBratney, A.B., M.L. Mendonca Santos, and B. Minasny. 2003. On digital soil mapping. Geoderma 117:3-52.

80

http://www.digitalsoilmapping.org/DSM_Background.html

McSweeney, K., P.E. Gessler, B.K. Slater, R.D. Hammer, J.C. Bell, and G. W. Petersen. 1994. Towards a new framework for modeling the soil-landscape continuum. p. 127-145. In R.G. Amundson et al (ed.) Factors of soil formation: A fiftieth anniversary retrospective. SSSA Spec. Publ. no. 33. SSSA, Madison, WI. Michaelsen, J., D.S. Schimel, M.A. Friedl, F.W. Davis, and R.C. Dubayah. 1994. Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J. of Vegetation Science 5:673-686. Milne, G. 1935. Some suggested units of classification and mapping particularly for East African soils. Soil Res. 4:183 -198. National Cooperative Soil Survey. 2008. Retrieved July 24, 2008 from http://www2.ftw.nrcs.usda.gov/osd/dat/C/COLONEL.html Pennock, D.J., and A. Veldkamp. 2006. Advances in landscape-soil research. Geoderma 133:1-5. Pennock, D.J., B.J. Zebarth, and E. De Jong. 1987. Landform classification and soil distribution in Hummocky terrain, Saskatchewan, Canada. Geoderma 40:297-315. Scull, P., J. Franklin, O.A. Chadwick, and D. McArthur. 2003. Predictive soil mapping: a review. Progress in Physical Geography 27:171-197. Shi, X. 2006. Soil inference engine user’s guide. In progress, unpublished. Shi, X. 2007. ArcSIE Help Document. In progess, unpublished. Shi, X., R. Long, R. DeKett, and J. Philippe. 2008. Integrating different types of knowledge for digital soil mapping. Soil Sci. Soc. Am. J. Accepted. Shi, X., A. X. Zhu, J. E. Burt, F. Qi, and D. Simonson. 2004. A case-based reasoning approach to fuzzy soil mapping. Soil Sci. Soc. Am. J 68:885-894. Skidmore, A.K., F. Watford, P. Luckananurug, and P. J. Ryan. 1996. An operational GIS expert system for mapping forest soil. Photogrammetric Engineering and Remote Sensing 62:501-511. Thompson, J.A., E.M. Pena-Yewtukhiq, and J.H. Grove. 2006. Soil-landscape modeling across a physiographic region: topographic patterns and model transportability. Geoderma 133:57-70. Voltz, M., P. 
Lagacherie, and X. Louchart. 1997. Predicting soil properties over a region using sample information from a mapped reference area. Eur. J. Soil Sci. 48:19-30. Wösten, J.H.M., P.S. Finke, and M.J.W. Jansen. 1995. Comparison of class and continuous pedotransfer functions to generate soil hydraulic characteristics. Geoderma 66:227-237.

81

Zadeh, L.A. 1965. Fuzzy sets. Information and Control 8:338-353.

82

BIOGRAPHICAL SKETCH

Jessica (McKay) Philippe received a Bachelor of Science degree in 2005 from the

University of Vermont in natural resources planning with a minor in plant and soil science. She

is employed as a soil scientist with the USDA-Natural Resources Conservation Service in Saint

Johnsbury, Vermont. She lives with her husband and two cats in Newport, Vermont.
