IMPACT OF HEURISTIC KNOWLEDGE DISCOVERY TECHNIQUES ON MULTI-AGENT SIMULATION OF CULTURAL EVOLUTION
by
ALINA LAZAR
DISSERTATION
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
2002
MAJOR: COMPUTER SCIENCE
Approved by:
Advisor Date
-
© COPYRIGHT BY
ALINA LAZAR
2002
All Rights Reserved
-
DEDICATION
To Mircea and to my parents.
ACKNOWLEDGMENTS
I would like to offer my thanks to my advisor Professor Robert Reynolds for his
continuing support and guidance. It has been a privilege to have him as a professor
as well as a thesis advisor.
Most of this research was supported by NSF IIS Grant #9907257.
Appreciation goes to all my dissertation committee members: Dr. William Grosky,
Dr. George Yin and Dr. Sherali Zeadally for accepting this task.
Thanks to the Department of Computer Science, Wayne State University, for giv-
ing me the opportunity to study here and making this work possible.
Finally, I wish to express my grateful acknowledgments to the professors in the
Department of Mathematics and Computer Science, West University of Timisoara.
They taught me the background in computer science and mathematics during my
undergraduate studies and encouraged me to pursue a doctoral degree.
TABLE OF CONTENTS
Chapter Page
DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
CHAPTERS
CHAPTER - 1 INTRODUCTION . . . . . . . . . . . . . . . . . 1
SECTION - 1.1 Simulating Culture in Algorithms . . . . . . . . 1
SECTION - 1.2 Archaeological Knowledge Discovery Problem . 2
SECTION - 1.3 Heuristics . . . . . . . . . . . . . . . . . . . . . 3
SECTION - 1.4 Multi-agent Based Simulation . . . . . . . . . . 6
CHAPTER - 2 EVOLUTIONARY LEARNING UNDER UNCER-
TAINTY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
SECTION - 2.1 Uncertainty . . . . . . . . . . . . . . . . . . . . 8
SUBSECTION - 2.1.1 Fuzzy Sets . . . . . . . . . . . . . . . . . 9
SUBSECTION - 2.1.2 Rough Sets . . . . . . . . . . . . . . . . . 10
SECTION - 2.2 Evolutionary Computation . . . . . . . . . . . . 10
CHAPTER - 3 EXTRACTING PATTERNS FROM ARCHAE-
OLOGICAL DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
SECTION - 3.1 Data Material . . . . . . . . . . . . . . . . . . . 13
SECTION - 3.2 Large-Scale Data . . . . . . . . . . . . . . . . . 15
SECTION - 3.3 Uncertainty in Data . . . . . . . . . . . . . . . . 15
SECTION - 3.4 Redundant Data . . . . . . . . . . . . . . . . . 17
SECTION - 3.5 Dynamic Data . . . . . . . . . . . . . . . . . . . 18
SECTION - 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . 18
CHAPTER - 4 ROUGH SETS FORMALIZATION . . . . . . . 19
SECTION - 4.1 Formal Definitions and Properties . . . . . . . . 19
SECTION - 4.2 Discernibility Matrix . . . . . . . . . . . . . . . 21
SECTION - 4.3 Indiscernibility Relations . . . . . . . . . . . . . 21
SECTION - 4.4 Rough Sets . . . . . . . . . . . . . . . . . . . . 22
SECTION - 4.5 Reducts and the Core of Knowledge . . . . . . . 23
SECTION - 4.6 Reducts, Hitting Sets, Approximate Hitting Sets 25
SECTION - 4.7 Decision Rules, Decision Systems . . . . . . . . 26
SECTION - 4.8 Decision System Construction . . . . . . . . . . 27
SECTION - 4.9 Steps for Model Construction . . . . . . . . . . 30
CHAPTER - 5 EVOLUTION-BASED LEARNING OF ONTO-
LOGICAL KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT
SIMULATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
SECTION - 5.1 Introduction . . . . . . . . . . . . . . . . . . . . 33
SECTION - 5.2 Decision Tree . . . . . . . . . . . . . . . . . . . 37
SECTION - 5.3 Decision Trees and Rough Set Rules . . . . . . . 40
SECTION - 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . 45
SECTION - 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . 47
CHAPTER - 6 RULE SET GENERATION FOR ALL REGIONS
AND PERIODS OF THE VALLEY . . . . . . . . . . . . . . . . . . . 49
SECTION - 6.1 Introduction . . . . . . . . . . . . . . . . . . . . 49
SECTION - 6.2 Reducts and Their Rules for the Tierras Largas
Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
SECTION - 6.3 San Jose Phase Reducts and Related Rules . . . 54
SECTION - 6.4 Reducts for the Guadalupe Phase and Related
Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
SECTION - 6.5 Reducts and Related Rules for the Rosario Phase 55
SECTION - 6.6 Reducts and Related Rules for Monte Alban Ia 58
SECTION - 6.7 Reducts and Related Rules of Monte Alban Late I 64
SECTION - 6.8 Monte Alban II Reducts and Related Rules . . . 67
SECTION - 6.9 Monte Alban IIIa Reducts and Related Rules . . 71
SECTION - 6.10 Comparison of the Rough Set Results with Deci-
sion Trees Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 75
SECTION - 6.11 Conclusions . . . . . . . . . . . . . . . . . . . . 77
CHAPTER - 7 PROTOTYPE OF THE MODEL . . . . . . . . 78
SECTION - 7.1 Introduction . . . . . . . . . . . . . . . . . . . . 78
SECTION - 7.2 Modeling the Environment in the Valley of Oaxaca 82
SECTION - 7.3 Modeling the Impact of Changing Social Rela-
tions on Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
SECTION - 7.4 Acquiring the Rules Used to Constrain Agent Be-
haviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
SECTION - 7.5 Modeling Agent Interaction Using Evolutionary
Game Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
SECTION - 7.6 Conclusions . . . . . . . . . . . . . . . . . . . . 91
CHAPTER - 8 THE IMPACT OF RULE ONTOLOGIES ON
MODEL PERFORMANCE . . . . . . . . . . . . . . . . . . . . . . . 92
SECTION - 8.1 Introduction . . . . . . . . . . . . . . . . . . . . 92
SECTION - 8.2 Experimental Setup . . . . . . . . . . . . . . . . 93
SECTION - 8.3 Simulating Social Evolution from Tierras Largas
Phase into San Jose Phase . . . . . . . . . . . . . . . . . . . . . . . . 93
SECTION - 8.4 San Jose to Guadalupe Results . . . . . . . . . 101
SECTION - 8.5 Simulating from Guadalupe into Rosario . . . . 109
SECTION - 8.6 Simulation of Rosario into Monte Alban I . . . . 110
SECTION - 8.7 Simulating Social Evolution from Monte Alban Ia
to Monte Alban Ib . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
SECTION - 8.8 Simulation from Monte Alban Ib to Monte Alban II . . 125
SECTION - 8.9 Conclusions . . . . . . . . . . . . . . . . . . . . 133
CHAPTER - 9 CONCLUSION AND FUTURE RESEARCH . . 138
SECTION - 9.1 Summary and Conclusions . . . . . . . . . . . . 138
SECTION - 9.2 Future Work . . . . . . . . . . . . . . . . . . . . 140
APPENDICES
APPENDIX - A DECISION TREE RULES . . . . . . . . . . . 141
APPENDIX - B ROUGH SET REDUCTS . . . . . . . . . . . . 146
APPENDIX - C ROUGH SET RULES . . . . . . . . . . . . . . 151
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
AUTOBIOGRAPHICAL STATEMENT . . . . . . . . . . . . . . . . . . . 176
LIST OF TABLES
4.1 Example of Decision Table . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Full Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Core Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Two Minimal Discernibility Matrix . . . . . . . . . . . . . . . . . . . 25
4.5 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 29
5.1 Number of Terminal Nodes for each Phase . . . . . . . . . . . . . . . 38
5.2 Decision Rule Set Induced from the Decision Tree . . . . . . . . . . . 40
5.3 A Comparison of Decision Trees versus Rough Sets . . . . . . . . . . 42
5.4 First Exhaustive Decision System for the Rosario Phase in Etla Region 43
5.5 Second Exhaustive Decision System for the Rosario Phase in Etla Region 44
5.6 A Comparison of the Rules Produced by Using Decision Trees (DT)
and Rough Set (RS) Constraint Representations. . . . . . . . . . . . 47
6.1 Reducts for Each of the First Three Phases in the Etla Region . . . . 52
6.2 Reducts for Each of the Last Five Phases in the Etla Region . . . . . 52
6.3 Reducts for Each of the Eight Phases in the Central Valley Region . . 52
6.4 Reducts for Each of the Eight Phases in the Valle Grande Region . . 53
6.5 Reducts for Each of the Eight Phases in the Tlacolula Region . . . . 53
6.6 Reducts for Each of the Eight Phases in the Ocatlan Region . . . . . 53
6.7 Comparison between DT and RS in Terms of Rules Length . . . . . 75
6.8 Comparison between DT and RS in Terms of Average Rules Length . 75
6.9 Sensitivity and Specificity for DT and RS . . . . . . . . . . . . . . . . 76
7.1 Environmental Variables . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 Land Use Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.3 Payoff Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
LIST OF FIGURES
1.1 Knowledge Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1 Complexity of the Data . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 A Description of the Oaxaca Valley . . . . . . . . . . . . . . . . . . . 14
3.3 Visual Representation of Rough Sets . . . . . . . . . . . . . . . . . . 16
3.4 Visual Representation of Fuzzy Sets . . . . . . . . . . . . . . . . . . . 17
4.1 Example of Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Model Construction Phases . . . . . . . . . . . . . . . . . . . . . . . 32
5.1 The decision tree for the locations of sites with evidence of raiding in
Rosario phase in the Etla region of the valley . . . . . . . . . . . . . . 39
7.1 Oaxaca Valley . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 Catchment Area for Sites in The Subgrid 7 . . . . . . . . . . . . . . . 84
8.1 Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . . . 94
8.2 DT Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 95
8.3 RS Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 96
8.4 Real Sites; Phase San Jose . . . . . . . . . . . . . . . . . . . . . . . . 97
8.5 Relationships Network; Phase San Jose . . . . . . . . . . . . . . . . . 98
8.6 DT Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 99
8.7 RS Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 100
8.8 Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . 102
8.9 DT Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . 103
8.10 RS Occupied cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . 104
8.11 Real Sites; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . . . . 105
8.12 Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . . . 106
8.13 DT Relationships Network; Phase Guadalupe . . . . . . . . . . . . . 107
8.14 RS Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . 108
8.15 Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . 110
8.16 DT Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . 111
8.17 RS Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . 112
8.18 Real Sites; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.19 Relationships Network; Phase Rosario . . . . . . . . . . . . . . . . . . 114
8.20 DT Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 115
8.21 RS Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 116
8.22 Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . . . 118
8.23 DT Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 119
8.24 RS Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 120
8.25 Real Sites; Phase Monte Alban Early I . . . . . . . . . . . . . . . . . 121
8.26 Relationship Network; Phase Monte Alban Early I . . . . . . . . . . . 122
8.27 DT Relationship Network; Phase Monte Alban Early I . . . . . . . . 123
8.28 RS Relationship Network; Phase Monte Alban Early I . . . . . . . . . 124
8.29 Relationship Network; Phase Monte Alban Late I . . . . . . . . . . . 126
8.30 DT Relationship Network - Big Sites; Phase Monte Alban Early I . . 127
8.31 RS Relationship Network - Big Sites; Phase Monte Alban Early I . . 128
8.32 DT Relationship Network - Monte Alban; Phase Monte Alban Early I 129
8.33 RS Relationship Network - Monte Alban; Phase Monte Alban Early I 130
8.34 DT Relationship Network - Big Sites; Phase Monte Alban Late I . . . 131
8.35 RS Relationship Network - Big Sites; Phase Monte Alban Late I . . . 132
8.36 DT Relationship Network - Monte Alban; Phase Monte Alban II . . . 133
8.37 RS Relationship Network - Monte Alban; Phase Monte Alban II . . . 134
8.38 DT Relationship Network - Big Sites; Phase Monte Alban II . . . . . 135
8.39 RS Relationship Network - Big Sites; Phase Monte Alban II . . . . . 136
CHAPTER 1
INTRODUCTION
1.1 Simulating Culture in Algorithms
Culture is a complex system of general customs, ideas, values, and beliefs of a
particular civilization, society, or social group, especially at a particular time. In
human societies, culture can be viewed as a system which stores information generated
by groups of one or more individuals, in a symbolic form that is potentially accessible
to all the members of the society. This model of a cultural system is helpful in
guiding the problem-solving activities and the social interactions of the society's
members. Knowledge such as general customs, ideas, values, and beliefs is stored
and manipulated at a symbolic level as opposed to a genetic level. This knowledge
can be used to direct and control the self-adaptation and the evolution of the social
population. Thus, cultural systems are viewed as multiple-inheritance systems where,
at each time step, knowledge at both the population level and the symbolic or belief
level is transmitted to the next generation.
Cultural Algorithms were developed by Reynolds [37] to model evolution in
systems based upon principles of human social evolution. Cultural Algorithms
are population-based algorithms that provide a framework for the integration of an
evolution-based search engine (e.g., a genetic algorithm) with a knowledge repository, or
belief space. The goal of the coevolutionary search engine is to identify individuals in
a population relevant to the problem at hand. Next, relevant knowledge is extracted
from these individuals and reasoned about in the belief space. This knowledge is then
used to guide the individuals in the next time step.
Application areas to which Cultural Algorithms have been applied include resource
scheduling, the evolution of agriculture, concept learning, real-valued function
optimization, genetic programming, fraud detection and, more recently, data mining
and knowledge discovery, and the modeling of dynamic environments.
1.2 Archaeological Knowledge Discovery Problem
Anthropologists interested in the ancient societies of Highland Mesoamerica, in
particular the Valley of Oaxaca, have used intensive archaeological survey to study
state formation. Since these surveys began in the 1960s, the computer has been an
essential tool because of the large quantity of data they produced. After the data
was collected, it was placed on punch cards, and the results were published in several
books [4, 5, 19] along with extensive site maps. The reason behind
this archaeological survey was to find answers to the following questions: What were
the characteristics of Mesoamerican agricultural systems? What role did hydraulic
agriculture play in prompting or facilitating the growth of large population centers?
When was irrigation first introduced? What was the nature of these population
centers? When and where did urbanism first arise? What decision making structures
and adaptations were necessary to facilitate these changes? [5].
Our goal for the proposed research is to integrate evolutionary learning tools into
the knowledge discovery process and to apply them to the large-scale, archaeological
spatial-temporal data produced by the surveys. The heuristic-based approach used
here will employ rough set concepts to represent the domain knowledge and
the hypotheses.
Answers to the questions above may be found by investigating the large-scale
database resulting from the archaeological survey. However, this database contains
over 2700 regional sites and over 2000 residential sites at the Monte Alban urban
center. Each site is comprised of one or more components and can be occupied
in one or more archaeological periods, spanning a period from approximately 9000
B.C. to A.D. 1500. Thus, the total spatial and temporal scope is so vast as to make
manual interpretation a difficult, if not impossible, task. In addition, each temporal and
spatial instance of a site component can be described in terms of several hundred
variables of differing types. We can clearly see a gap between data generation and
data understanding here. Tools and techniques from Artificial Intelligence can be
used to fill this gap and aid in the extraction of emergent patterns hidden in the
data, as is shown by Reynolds [35, 38, 39].
Classical tools for generating database queries are good when you know what to
look for, and can answer questions of the type "How many sites have ceramics from a
given time period, and where are they located?" The goal here is to interrogate this data
in order to answer questions about archaic systems, systems that existed thousands
of years ago. To answer these questions we can use a data-driven search: each record
in the data set is accessed sequentially and associated with a node in the search
space. Data-driven searches minimize data accesses and can result in time-efficient
query solving, but they are not good for complicated questions like the ones above,
which require reasoning about spatial and temporal patterns and may
require exhaustive search in the worst case. To be able to deal with such advanced
queries, more intelligent search tools are needed. The problem now is how to find the
right intelligent search strategy in order to reach our goal.
1.3 Heuristics
Uninformed, or blind, search, which processes and evaluates all nodes of a search
space in the worst case, is not realistic here because time constraints are closely
related to the dimension of the data. Generally, the search space increases exponen-
tially with problem size thereby limiting the size of problems which can realistically
be solved using exact techniques such as exhaustive search. An alternative solution
is represented by heuristic techniques, which can provide much help in areas where
classical search methods have failed.
The word "heuristic" comes from the Greek and means to know, to find, to
discover, or to guide an investigation. Specifically, "heuristics are techniques which
seek good (near-optimal) solutions at a reasonable computational cost without being
able to guarantee either feasibility or optimality, or even in many cases to state how
close to optimality a particular feasible solution is" [42].
"Heuristic" refers to any technique that improves the average-case performance on
a problem-solving task but does not necessarily improve the worst-case performance.
Heuristic techniques search the problem space intelligently using knowledge of pre-
viously tried solutions to guide the search into fruitful areas of the search space.
Often, search spaces are so large that only heuristic search can produce a solution in
reasonable time. These techniques improve the efficiency of a search process, some-
times by sacrificing the completeness or the optimality of the solution. Heuristics are
estimates of the distance remaining to the goal, computed on the basis of domain
knowledge.
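The idea of using such distance estimates can be sketched in a few lines of code. The following Python fragment is a toy illustration, not part of this thesis: the graph, the heuristic values, and the function name are all invented. It performs greedy best-first search, ordering the frontier by the heuristic estimate h(n) of the remaining distance to the goal.

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best-first search: always expand the frontier node whose
    heuristic estimate h(n) of the remaining distance is smallest."""
    frontier = [(h[start], start, [start])]
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(frontier, (h[neighbor], neighbor, path + [neighbor]))
    return None

# Toy state space; h estimates the distance to goal 'G'.
graph = {'S': ['A', 'B'], 'A': ['G'], 'B': ['A', 'G']}
h = {'S': 3, 'A': 1, 'B': 2, 'G': 0}
print(best_first_search(graph, h, 'S', 'G'))
```

Note that, as the text observes, such a search sacrifices optimality: it follows the most promising-looking node without reconsidering path cost.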
The domain knowledge provides help to heuristics in guiding the search and can
be represented in a variety of knowledge formats. These formats include patterns,
networks, trees, graphs, version spaces, rule sets, equations, and contingency tables.
With regard to heuristics, there are a number of generic approaches, such as greedy
search, A* search, tabu search, simulated annealing, and population-based heuristics.
These heuristic methods can be applied to a wide class of problems in optimization,
classification, statistics, recognition, planning, and design.
Of special interest is the integration of heuristic search principles with dynamic
processes in which data becomes available in successive stages, where data and
inputs are subject to uncertainty, or where data sets are large-scale. The integration
is a vehicle to generate data-driven hypotheses. The process is shown in Figure 1.1.
Our goal is to generate hypotheses in terms of the archaeological data in order to help
anthropologists answer their questions.
[Figure 1.1: Knowledge Discovery. Diagram labels: Data, Model/Hypothesis, Heuristic Learning, Generating, Testing, Patterns, Acceptable Solutions, Dual Interaction.]
The kind of knowledge produced, and the heuristic search algorithm selected, will
reflect the nature of the data analysis task. In this thesis, the hypotheses will be
represented as sets of decision rules, and the extracted rules will be represented in
terms of rough sets. Rough sets were selected because of the nature of our data sets,
as we will discuss in Chapter 3.
From a mathematical point of view, the problems that we want to solve can
be formulated in terms of the well-known minimal set cover problem, which is a
combinatorial optimization problem.
Traditional methods for combinatorial optimization are not appropriate here for
several reasons. The problem is NP-hard, so exact methods would be costly to use
given the size of the data set. Also, since archaeological efforts in the valley are
ongoing, new sites can be added to the database, which would require the traditional
combinatorial approach to be restarted from scratch. The techniques used to solve
these difficult optimization problems have slowly evolved from constructive methods,
like uninformed search, to local search techniques and population-based algorithms.
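The constructive end of that spectrum can be made concrete with the classical greedy approximation for minimal set cover. The Python sketch below is an illustrative baseline only, not the method used in this thesis; the universe and subsets are invented for the example.

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation for minimal set cover: repeatedly pick the
    subset that covers the most still-uncovered elements. This yields a
    cover at most ~ln(n) times larger than the optimum."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        if not uncovered & set(best):
            return None  # the remaining elements cannot be covered
        cover.append(best)
        uncovered -= set(best)
    return cover

universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {4, 5}, {3, 5}]
print(greedy_set_cover(universe, subsets))
```

A drawback noted in the text applies here as well: when new elements (sites) arrive, the greedy construction must be rerun from scratch, whereas a population-based search can adapt its existing candidate solutions.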
Cultural Algorithms, as population-based algorithms, are good vehicles in which
to build meta-level heuristics to guide the search more efficiently. That knowledge
(here we will use rough set concepts, or rules) can be employed to direct the evolu-
tionary search. The rules can reflect spatial and temporal patterns that will guide the
generation of new candidate search objects by the evolutionary engine. The spatial
and temporal continuity of the data will facilitate this process.
1.4 Multi-agent Based Simulation
After obtaining decision rules through evolutionary techniques, we want to use them
in a multi-agent model based upon Cultural Algorithms that simulates the evolution
of complex social systems in the Valley of Oaxaca, Mexico. Here, we operationalize
the "evolution without stages" model of state evolution described by Marcus and
Flannery [23]. Agents are based at sites and have plans for production and resource
exchange with other agents. Agents interact via an evolving game matrix that begins
as a prisoner's dilemma model and can evolve to a warden's dilemma model.
Cultural Algorithms are used to model agent learning as the environment evolves.
The organization of the thesis is as follows. Chapter 2 briefly describes evolu-
tionary learning and gives a short literature review of related approaches. Chapter
3 presents the archaeological data sets and the data-related problems. Chapter 4 is
dedicated to the theory of Rough Sets. In Chapter 5 we make a comparison between
the Decision Tree approach and the Rough Set approach in building a set of rules for a
multi-agent simulation. Chapter 6 presents the Rough Set rules for all five regions of
the Valley using the environmental variables, including land use type. While the site
data identified just 75 sites as targets of warfare, both decision trees and Rough Sets
generalized on this data to identify 987 and 242 sites, respectively, that are predicted
to be targets of warfare based upon their similarities to the known sites. Chapter 7
describes the large-scale multi-agent simulation model. Chapter 8 describes the runs
of the model using the Rough Set rules and the Decision Tree rules to see the semantic
impact of the two different rule sets. The resultant set of associations between sites
in terms of trade and warfare demonstrated that the Rough Set rules produced a
hierarchical structure of associations that was a better fit to the hierarchical structure
observed in the data than that for Decision Trees. It is suggested that a lack of
consideration of uncertainty in the generation of decision trees led to an
overgeneralization of the rules that describe warfare activities. Chapter 9 concludes
with a discussion of the impact that the rules used to determine agent interaction
and expertise have on the emergence of phenomena in multi-agent systems, and the
impact that the rule-learning process has on the rules formed. It is suggested that,
based upon this model, warfare is an important device for the structuring of emergent
complexity in the social organization of the valley.
CHAPTER 2
EVOLUTIONARY LEARNING UNDER UNCERTAINTY
Population-based heuristic methods are iterative solution techniques that handle
a population of individuals evolving according to a given search strategy. At each
iteration, periods of self-adaptation (mutation) alternate with periods of cooperation
(crossover) and periods of competition (selection). Population-based heuristic search
[8] depends on the following components: the knowledge representation for the
specific problem we want to solve, and the search strategy, or the evolution process.
The adaptability of an individual represents its ability to
survive in an uncertain environment [7]. Artificial Intelligence researchers have ex-
plored different ways to represent uncertainty [42]: belief networks, default reasoning,
Dempster-Shafer theory, Fuzzy Set theory, and Rough Set theory.
For the problems we want to solve, the learning task will require a representation
that explicitly deals with uncertainty. The evolutionary learning methods that are
employed must be able to work with such a representation. In this chapter we first
look at basic ways to represent uncertainty in developing rules, and then investigate
how that uncertain knowledge can be used to direct evolutionary search and
learning.
2.1 Uncertainty
Uncertainty, as well as evolution, is a part of nature. When humans describe
complex environments, they use linguistic descriptors of cognised real-world circum-
stances that are often not precise, but rather fuzzy. The theory of fuzzy sets
[55] provides an effective method of describing the behavior of a system which is too
complex to be handled with classical, precise mathematical analysis. The theory
of rough sets [33] emerged as another mathematical approach for dealing with uncer-
tainty that arises from inexact, noisy, or incomplete information. Fuzzy Set theory
assumes that the membership of an object in a set is defined as a degree ranging
over the interval [0, 1]. Rough Set theory focuses on the ambiguity caused by the
limited discernibility between objects in a given domain.
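The contrast between the two theories can be made concrete with a small sketch. In the hypothetical Python fragment below (the objects, attribute values, and membership function are invented for illustration), a fuzzy set assigns each object a membership degree in [0, 1], while a rough set brackets a target set between a lower approximation (objects certainly inside) and an upper approximation (objects possibly inside), induced by classes of indiscernible objects.

```python
def fuzzy_tall(height_cm):
    """Fuzzy membership for 'tall': a linear ramp from 160 cm to 190 cm."""
    return max(0.0, min(1.0, (height_cm - 160) / 30))

def rough_approximations(objects, attrs, target):
    """Lower/upper approximations of `target` under the equivalence
    relation 'identical attribute vectors'."""
    classes = {}
    for obj in objects:
        classes.setdefault(attrs[obj], set()).add(obj)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:
            lower |= cls   # the whole class lies inside the target
        if cls & target:
            upper |= cls   # the class overlaps the target
    return lower, upper

# Fuzzy: graded membership.
print(fuzzy_tall(175))   # 0.5

# Rough: 'b' and 'c' share attribute values, but only 'b' is in the
# target, so neither can be placed in the lower approximation.
objects = ['a', 'b', 'c']
attrs = {'a': (1, 0), 'b': (0, 1), 'c': (0, 1)}
target = {'a', 'b'}
print(rough_approximations(objects, attrs, target))
```

The difference between the upper and lower approximations (the boundary region) is exactly the ambiguity that Rough Set theory quantifies.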
2.1.1 Fuzzy Sets
Fuzzy Sets have been employed to represent rules generated by evolutionary learn-
ing systems.
Using fuzzy concepts, Valenzuela-Rendon [50] tried to overcome the limitations
of the conventional rule-based classifier system [14] when representing continuous
variables. He used fuzzy logic to represent the results of the genetic-based search of
the classifier system.
Likewise, fuzzy functions have been used to describe and update knowledge in
Cultural Algorithms. First, Reynolds et al. [40] employed a fuzzy acceptance and
influence function in the solution of real-valued constrained optimization problems.
Following the same idea, Zhu designed a fully fuzzy Cultural Algorithm [57] which
included a fuzzy knowledge representation scheme in order to deal with the continuous
variables [56] in the belief space, as well as a fuzzy acceptance and influence function.
All these approaches were tested on real-valued function optimization problems. More
recently, Jin [16] used a fuzzy knowledge representation for normative knowledge in
the belief space of Cultural Algorithms to solve real-valued constrained function
optimization problems.
The design of a fuzzy representation system is not an easy job, because the
membership functions must be carefully chosen and the procedures that use these
functions must be specified precisely. The problem is to optimize the fuzzy
membership functions for a given problem and to find optimal plans related to the
fuzzy performance measures. It is a natural approach to use heuristics (i.e.,
evolutionary algorithms) to solve this task.
2.1.2 Rough Sets
Another approach to representing uncertainty is with Rough Sets. Rough Sets are
based on equivalence relations and set approximations, and the algorithms for com-
puting rough set properties are combinatorial in nature. Wroblewski [53] implemented
a Genetic Algorithm for computing reducts, based on a permutation code, as well as
a greedy algorithm. Another approach for building reducts is described by Vinterbo
in [51, 52]; it is based on the set cover problem, in particular on finding minimal
hitting sets using a classical Genetic Algorithm. Finding a minimal set of decision
rules, or a satisfactory set, is an NP-complete problem. Agotnes [1, 2] used a Genetic
Algorithm to build an optimal set of decision rules, where the fitness function was
based on the quality of each rule. In conclusion, there are many hybrid methods that
integrate evolutionary algorithms with other methods from soft computing, such
as Rough Sets.
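As a non-evolutionary point of reference for the hitting-set formulation mentioned above, the following Python sketch greedily picks, at each step, the attribute that appears in the most not-yet-hit entries of a discernibility matrix. The matrix entries here are invented; the genetic approaches cited above search the same space globally instead of committing to one attribute at a time.

```python
def greedy_hitting_set(families):
    """Greedy hitting set: each entry of `families` is one cell of a
    discernibility matrix (a set of attributes); the result is a set of
    attributes that intersects every entry."""
    remaining = [set(f) for f in families]
    hitting = set()
    while remaining:
        # Count how often each attribute occurs in the unhit entries.
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Pick the attribute hitting the most remaining entries
        # (ties broken alphabetically, for determinism).
        best = max(sorted(counts), key=counts.get)
        hitting.add(best)
        remaining = [e for e in remaining if best not in e]
    return hitting

# Invented discernibility entries over attributes a..d.
matrix = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}, {'b', 'd'}]
print(greedy_hitting_set(matrix))
```

A hitting set of the full discernibility matrix corresponds to a reduct candidate; a minimal one corresponds to a reduct, which is what the cited Genetic Algorithms search for.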
2.2 Evolutionary Computation
Evolution can be defined in one phrase: adaptation in an uncertain environment.
Nature has a robust way of dealing with the adaptation of organisms to all kinds of
changes and of evolving successful organisms. According to the principles of natural
selection, the organisms that perform well in a given environment survive and
reproduce, whereas the others die off. After reproduction, a new generation of
offspring, derived from the members of the previous generation, is formed. The
selection of parents from these offspring is often based upon fitness. Changes in the
environment will affect the population of organisms through random mutations.
Mayr said that evolution is "a dynamic, two-step process of random variation and
selection" [11]. Using examples from natural systems and theories of adaptive behavior,
researchers have been trying to build heuristic evolutionary learning systems.
Evolutionary algorithms are heuristic optimization methods inspired by natural
evolution processes. Currently there are three basic population-only mechanisms
that model evolution: genetic algorithms, evolutionary strategies, and evolutionary
programming. Each of these methods models the evolution of a population of indi-
viduals at a different scale and applies selection and reproduction operators to find
an individual that is fit with regard to the fitness function. The Genetic Algorithm
models evolution at the gene scale, while evolutionary strategies and evolutionary
programming model evolution at the species level.
The Cultural Algorithms [37] approach adds another level to the evolutionary
process, inspired by human societies and cultural evolution: it adds a belief space
to the population space. The belief space is a collection of symbolic knowledge
that is used to guide the evolution of the population.
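The dual population/belief structure can be sketched in a few lines of code. The sketch below is only meant to illustrate the acceptance and influence interplay between the two spaces; the toy objective function, the interval-based normative belief, and all parameter choices are illustrative assumptions, not the algorithm from [37]:

```python
import random

def cultural_algorithm(fitness, lo, hi, pop_size=30, generations=60, top_k=5, seed=1):
    """Minimal Cultural Algorithm sketch: the population space evolves candidate
    solutions, while the belief space stores normative knowledge (here simply a
    promising interval) distilled from the best individuals and used to guide
    the variation of the next generation."""
    rng = random.Random(seed)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:top_k]
        # Acceptance function: the best individuals update the belief space.
        belief = (min(elite), max(elite))
        # Influence function: offspring are sampled mostly inside the believed
        # interval, with some Gaussian noise to keep exploring.
        width = max(belief[1] - belief[0], 1e-6)
        offspring = [min(hi, max(lo, rng.uniform(belief[0], belief[1])
                                 + rng.gauss(0, 0.1 * width)))
                     for _ in range(pop_size - top_k)]
        population = elite + offspring          # selection keeps the elite
    return max(population, key=fitness)

# Toy problem: maximize -(x - 3)^2 over [0, 10]; the optimum is x = 3.
best = cultural_algorithm(lambda x: -(x - 3.0) ** 2, 0.0, 10.0)
```

The belief space here is deliberately simple (one normative interval); richer belief structures, such as the symbolic knowledge discussed below, follow the same accept/influence loop.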
These evolutionary learning approaches have been used to perform the general
rule extraction task that we are interested in. For example, Holland [15] proposed the
learning classifier system, which uses genetic search as the primary discovery heuristic
for rules. Classifier systems are inductive models capable of learning, in a reinforcement
learning regime, by using the evolutionary process schema. The reward is given to
individual rules, not to a complete rule set.
Besides rule-based methods, Decision Trees are well known for their inductive
learning capabilities, and any decision tree can be reformulated as a set of rules. One
of the problems related to Decision Trees is finding the smallest decision tree, for
which heuristics provide approximate solutions. Researchers have tried to integrate
Genetic Algorithms with Decision Tree learning in order to solve complex classification
problems [3, 38, 39]. Bala et al. [3, 49] applied this methodology to difficult visual
recognition problems involving satellite and facial image data. Other researchers
combined Genetic Algorithms or Evolutionary Strategies with neural networks; Yang
et al. [54] used a Genetic Algorithm for the feature subset selection problem for neural
network classifiers.
Reynolds et al. [35, 38, 39] investigated the use of Cultural Algorithms to guide
decision tree learning. The data was taken from a real-world archaeological database,
a collection of sites found in the Valley of Oaxaca, Mexico. The problem was to
identify the sites that present evidence of warfare as opposed to those that do not.
Cultural Algorithms were also used to generate rule-based classifiers for the fraud
detection problem [44]. The task is hard because of the dynamic environment, but
Cultural Algorithms provide self-adaptive capabilities to deal with the dynamic
changes.
CHAPTER 3
EXTRACTING PATTERNS FROM ARCHAEOLOGICAL DATA
Here we will work with the archaeological data from [19], a survey of the Valley of
Oaxaca, Mexico. These volumes describe and analyze the data collected during the
Valley of Oaxaca Settlement Project. The project consisted of a detailed mapping
and surface collection of the region's sites, plus the major archaeological site, Monte
Alban, part of the Oaxaca valley, as shown in figure 3.1.
[Figure 3.1 diagram: the Oaxaca Valley region divides into valleys (Valle Grande, Tlacolula, Ocatlan), which divide into squares, sites, and components]

Figure 3.1: Complexity of the Data
3.1 Data Material
In the Valley of Oaxaca the basic descriptive unit is the site, of which there are around
2700 in an area of about 1700 km². The detailed description for each terrace contains
100 attributes.

Figure 3.2: A Description of the Oaxaca Valley

This data format was obtained after scanning the data from
[19]. First, a series of maps was drawn showing the occupation of the Valley of Oaxaca
at various periods of time, by plotting those categories specific to each period. Based
on these maps, population estimates were made for each period. Second, patterns
of differential distribution of ceramic categories within each site were illustrated by
plotting their densities of occurrence by site division. Little statistical analysis [5],
[19] or data analysis was done previously. In our project the terrace data and the
ceramic categories are put together for the first time in order to discover the reasons
for the growth and decline of the Valley of Oaxaca and Monte Alban over time.
3.2 Large-Scale Data
One of the most important problems in data analysis relates to the dimensionality
of the data, because many data analysis techniques involve exhaustive search over
the object space. They are very sensitive to the size of the data in terms of time
complexity, and it is hard to generate compact rules. The solution is to reduce the
search space horizontally (in terms of records or objects) and vertically (in terms of
fields, attributes, or variables), and to use heuristics to guide the search through the
large space of possible combinations of attribute values and classes. Our data set,
for example, contains over 6000 records and over 100 attributes.
3.3 Uncertainty in Data
Uncertainty in a data set can appear for different reasons. One reason is noise.
Errors that occur during data collection or data entry are referred to as noise in
the data. It is also possible that the data set has missing attribute values. In
this case, the objects containing missing attribute values can be discarded, or the
missing values can be replaced with the most common values. Another problem is
that the available knowledge in many situations is incomplete and imprecise. This
means that sometimes the attribute values for a set of objects are not sufficient and
precise enough to differentiate between classes of objects. In the case of the Oaxaca
Data Set, errors and noise may have occurred for many reasons. The ancient sites
are damaged because of plowing, erosion, pot hunting and grazing. Also, human
perception is subjective, and many people worked on the collection of the data.
Some errors are possible due to the scanning process, since much of the data was
available from printed text only.
Many different ways of representing and reasoning about uncertainty have been
developed in Artificial Intelligence. These theories include: belief networks, non-monotonic
logic, Fuzzy Sets along with Fuzzy Logic, and Rough Sets. The well-known
Fuzzy Set theory [55] characterizes a concept approximately, using a set membership
function with a range of values between 0 and 1. Another approach, based on rough
set theory [33], characterizes a concept by a lower and an upper approximation,
defined in terms of set membership, depending on how the relationship between two
partitions of a finite universe is defined.
Figure 3.3: Visual Representation of Rough Sets
Fuzzy Sets are a good approach for problems with multiple membership grade
requirements (figure 3.4), where judgment on set membership grades is possible and
where the ability to deal with vague predicates is required. They are very good for
real-valued data.

Figure 3.4: Visual Representation of Fuzzy Sets

On the other hand, Rough Sets, with their three-valued simplicity of lower, upper,
and boundary approximation sets (figure 3.3), work well on discrete and categorical
data. Rough Sets can be useful even with missing data, changes of scale, problems
where membership grades are hard to define, and problems requiring changes in the
partition. Checking the attributes table for the Oaxaca Data Set, we can see that
only very few attributes are integer in type, and there are no real-valued data types.
All the other attributes are of a categorical data type: nominal, ordinal, or binary.
We want to find which sites were occupied in each period of time, so we have to deal
with data partitions. These facts suggest that rough set methods are more
appropriate here.
3.4 Redundant Data
In addition to incomplete data, a data set may contain redundant or insignificant
attributes with respect to the problem, or variables that are obscure and have
uninteresting relationships. This case might arise in several situations; for example,
redundant attributes may result from combining relational data tables. Solutions to
this problem exist in the form of feature selection algorithms, such as the reduct
computation in the rough sets case. After discarding the irrelevant attributes we
may still have horizontal, object-related redundancy. This can be solved by applying
horizontal pruning methods or by merging identical objects.
For our data set, looking at the attributes table, we can see first that not all of the
attributes are relevant to our problem. Fields like Recorder, Published information,
and Written comments are useful for detecting noise, but are less important for the
overall analysis and can be removed after the preliminary stage. Second, we will
apply the reduct algorithm in order to find a subset of significant attributes.
3.5 Dynamic Data
A fundamental characteristic of real-world data sets is that they are dynamic,
which means that their contents are often changing over time. There are two important
aspects of this problem. First, the run-time efficiency of the knowledge model
becomes very important. Second, the knowledge model will no longer be static, but
should have the capability of evolving as data changes over time. Our data sets change
when new sites or ceramic types are discovered. One important characteristic of our
data sets is their temporal and spatial dimensions. The time periods span from
1400 B.C. to 500 A.D. Spatially speaking, we work with the Valley of Oaxaca data,
covering 1700 km². In order to find the changes between different periods of time our
models have to change, and we will gain insight into what a dynamic data set is.
3.6 Conclusion
In this chapter a motivation for explicitly representing uncertainty in our knowledge
base was presented, along with the advantages of a Rough Set approach for
representing uncertain knowledge. Then, a methodological framework for applying
rough sets to this data set was discussed. In the next chapter we describe the formal
properties of rough sets.
CHAPTER 4
ROUGH SETS FORMALIZATION
Pawlak [33] introduced Rough Set theory in the early 1980s as a tool for
representing imprecise or uncertain information, and for reasoning about it. Based on
the notion of indiscernibility, rough set theory deals with the approximation of sets
using equivalence relations. These approximations can form model hypotheses. Many
different applications can be found in the literature, but here we focus on applications
to the classification problem, in order to answer the questions posed in the
first chapter.
4.1 Formal Definitions and Properties
      c1  c2  c3  c4  c5   d
o1     3   2   2   0   2   0
o2     1   3   3   1   2   0
o3     0   1   0   1   2   0
o4     2   1   3   0   0   0
o5     3   2   2   0   1   0
o6     1   1   2   0   2   0
o7     3   0   0   3   2   0
o8     0   0   3   2   1   0
o9     2   1   1   1   2   0
o10    0   3   2   0   0   0
o11    2   0   2   1   2   1
o12    1   3   2   2   2   1
o13    0   1   1   0   2   1
o14    3   1   0   1   2   1
o15    3   3   0   2   2   1
o16    1   2   3   2   0   1
o17    2   1   3   2   2   1
o18    1   2   3   1   0   1
o19    2   2   1   1   2   1
o20    2   1   1   1   2   1

Table 4.1: Example of Decision Table
An information system can be defined as a pair S = (U, A), where U is a finite
set of objects and A is a finite set of attributes. Each attribute a ∈ A is a function
that maps elements of U into a set Va, called the attribute domain of attribute a:

    a : U → Va                    (4.1)
Let S = (U, A) be an information system and let C, D ⊆ A be two subsets of
attributes, called the condition and the decision attributes respectively. A condition
attribute is an attribute that is thought to influence another attribute, the decision
attribute. An information system with distinguished condition and decision attributes
is called a decision table and is denoted by T = (U, A, C, D). Because a table
with more than one decision attribute can easily be transformed into a similar table
with only one decision attribute, usually the set of decision attributes contains
only one decision attribute, denoted D = {d}, and T = (U, C, {d}). The decision
attribute d determines a partition of the object space U. The partition's elements
are named decision classes. In our example from table 4.1, U is the set of objects in
the decision table, U = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13, o14,
o15, o16, o17, o18, o19, o20}, C is the set of condition attributes relevant to the
current classification problem, C = {c1, c2, c3, c4, c5}, and d is the variable whose
values define the classification task.
In this example we have two decision classes:
D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10}
D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}
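For concreteness, the decision table and its decision classes can be encoded directly; the dictionary layout below is just one convenient representation of table 4.1, not a format used in the original study:

```python
# The decision table from table 4.1: each object maps to its condition
# attribute values (c1..c5) followed by the decision value d.
TABLE = {
    "o1":  (3, 2, 2, 0, 2, 0), "o2":  (1, 3, 3, 1, 2, 0),
    "o3":  (0, 1, 0, 1, 2, 0), "o4":  (2, 1, 3, 0, 0, 0),
    "o5":  (3, 2, 2, 0, 1, 0), "o6":  (1, 1, 2, 0, 2, 0),
    "o7":  (3, 0, 0, 3, 2, 0), "o8":  (0, 0, 3, 2, 1, 0),
    "o9":  (2, 1, 1, 1, 2, 0), "o10": (0, 3, 2, 0, 0, 0),
    "o11": (2, 0, 2, 1, 2, 1), "o12": (1, 3, 2, 2, 2, 1),
    "o13": (0, 1, 1, 0, 2, 1), "o14": (3, 1, 0, 1, 2, 1),
    "o15": (3, 3, 0, 2, 2, 1), "o16": (1, 2, 3, 2, 0, 1),
    "o17": (2, 1, 3, 2, 2, 1), "o18": (1, 2, 3, 1, 0, 1),
    "o19": (2, 2, 1, 1, 2, 1), "o20": (2, 1, 1, 1, 2, 1),
}

def decision_classes(table):
    """Group the objects by the value of the decision attribute d (the last
    component of each row), yielding the decision classes D0, D1, ..."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(row[-1], set()).add(obj)
    return classes

classes = decision_classes(TABLE)
# classes[0] is D0 = {o1, ..., o10}; classes[1] is D1 = {o11, ..., o20}
```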
With every x ∈ U we associate a function dx which, for a given attribute c, gives
the value dx(c) = c(x) ∈ Vc for every c ∈ C ∪ {d} ([33]). dx|C and dx|d denote the
restrictions of dx to C and to d, respectively.

For every x, y ∈ U, we say that the object x is discernible if for every y such
that x ≠ y, dx|C = dy|C implies dx|d = dy|d; otherwise the object is indiscernible.
A decision table is consistent if all of its objects are discernible; otherwise it is
inconsistent.
4.2 Discernibility Matrix
A decision table T = (U,C, {d}), defines a matrix MdC called the discernibilitymatrix modulo decision attribute d [26]. For x, y U with the condition that x andy are from different decision classes dx|d 6= dy|d.
MdC(x, y) = {c C|c(x) 6= c(y) and d(x) 6= d(y)} (4.2)
Since objects are divided into decision classes, we do not have to discern between
      o1        o2        o3          o4          o5          o6          o7          o8          o9          o10
o11   1,2,4     1,2,3     1,2,3       2,3,4,5     1,2,4,5     1,2,4       1,3,4       1,3,4,5     2,3         1,2,4,5
o12   1,2,4     3,4       1,2,3,4     1,2,3,4,5   1,2,4,5     2,4         1,2,3,4     1,2,3,5     1,2,3,4     1,4,5
o13   1,2,3     1,2,3,4   3,4         1,3,5       1,2,3,5     1,3         1,2,3,4     2,3,4,5     1,4         2,3,5
o14   2,3,4     1,2,3     1           1,3,4       2,3,4,5     2,3,4,5     2,4         1,2,3,4,5   1,3         1,2,3,4,5
o15   2,3,4     1,3,4     1,2,4       1,2,3,4,5   2,3,4,5     1,2,3,4     2,4         1,2,3,5     1,2,3,4     1,3,4,5
o16   1,3,4,5   2,4,5     1,2,3,4,5   1,2,4       1,3,4,5     2,3,4,5     1,2,3,4,5   1,2,5       1,2,3,4,5   1,2,3,4
o17   1,2,3,4   1,2,4     1,3,4       4,5         1,2,3,4,5   1,3,4       1,2,3,4     1,2,5       3,4         1,2,3,4,5
o18   1,3,4,5   2,5       1,2,3,5     1,2,4       1,3,4,5     2,3,4,5     1,2,3,4,5   1,2,4,5     1,2,3,5     1,2,3,4
o19   1,3,4     1,2,3     1,2,3       2,3,4,5     1,3,4,5     1,2,3,4     1,2,3,4     1,2,3,4,5   2           1,2,3,4,5
o20   1,2,3,4   1,2,3     1,3         3,4,5       1,2,3,4,5   1,2,3,4     1,2,3,4     1,2,3,4,5   -           1,2,3,4

Table 4.2: Full Discernibility Matrix
objects that belong to the same class. Table 4.2 shows the discernibility matrix
for our decision table from table 4.1. The entries in the table represent sets of
condition attributes. For example, for row o11 and column o1, the entry 1,2,4
identifies the condition attribute set {c1, c2, c4}.
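Equation (4.2) translates directly into code. The following sketch (an illustrative implementation, with attributes identified by 1-based indices) computes the matrix entries for the table 4.1 data:

```python
# Decision table 4.1: condition attribute values c1..c5, then the decision d.
TABLE = {
    "o1":  (3, 2, 2, 0, 2, 0), "o2":  (1, 3, 3, 1, 2, 0),
    "o3":  (0, 1, 0, 1, 2, 0), "o4":  (2, 1, 3, 0, 0, 0),
    "o5":  (3, 2, 2, 0, 1, 0), "o6":  (1, 1, 2, 0, 2, 0),
    "o7":  (3, 0, 0, 3, 2, 0), "o8":  (0, 0, 3, 2, 1, 0),
    "o9":  (2, 1, 1, 1, 2, 0), "o10": (0, 3, 2, 0, 0, 0),
    "o11": (2, 0, 2, 1, 2, 1), "o12": (1, 3, 2, 2, 2, 1),
    "o13": (0, 1, 1, 0, 2, 1), "o14": (3, 1, 0, 1, 2, 1),
    "o15": (3, 3, 0, 2, 2, 1), "o16": (1, 2, 3, 2, 0, 1),
    "o17": (2, 1, 3, 2, 2, 1), "o18": (1, 2, 3, 1, 0, 1),
    "o19": (2, 2, 1, 1, 2, 1), "o20": (2, 1, 1, 1, 2, 1),
}

def discernibility_matrix(table):
    """M^d_C(x, y) (eq. 4.2): for every pair of objects from different decision
    classes, the set of condition attributes on which the two objects differ."""
    M = {}
    for x, rx in table.items():
        for y, ry in table.items():
            if rx[-1] != ry[-1]:                   # different decision classes only
                M[(x, y)] = {i + 1 for i in range(len(rx) - 1) if rx[i] != ry[i]}
    return M

M = discernibility_matrix(TABLE)
# M[("o11", "o1")] is {1, 2, 4}; M[("o20", "o9")] is the empty set.
```

Note that the empty entry for (o20, o9) reflects the inconsistency of this decision table: the two objects agree on every condition attribute but carry different decisions.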
4.3 Indiscernibility Relations
A discernibility matrix M^d_C defines a binary relation R^d_C ⊆ U². The relation R^d_C
is called an indiscernibility relation with respect to C and d, and reveals the pairs of
objects from different classes that we cannot discern between. For x, y ∈ U, under
the condition that x and y are from different decision classes, dx|d ≠ dy|d,

    x R^d_C y ⟺ M^d_C(x, y) = ∅.                    (4.3)
R^d_C is an equivalence relation that has three basic properties:

1. Reflexivity. For all x ∈ U, x R^d_C x.

2. Symmetry. For all x, y ∈ U, x R^d_C y ⟹ y R^d_C x.

3. Transitivity. For all x, y, z ∈ U, if x R^d_C y and y R^d_C z, then x R^d_C z.

The equivalence relation R^d_C induces a partition over the universe U, meaning that
the resultant equivalence classes are disjoint and their union equals the universe U.
The indiscernibility set of an object x ∈ U is denoted as

    R^d_C(x) = {y ∈ U | x R^d_C y}.                    (4.4)

In our example, R^d_C(o9) = R^d_C(o20) = {o9, o20}.
4.4 Rough Sets
The idea behind Rough Sets is to approximate a set of interest in terms of other
sets. With each subset X ⊆ U and an equivalence relation R^d_C defined over U we
can associate the two subsets

    R̲^d_C X = {x ∈ U | R^d_C(x) ⊆ X}                    (4.5)

    R̄^d_C X = {x ∈ U | R^d_C(x) ∩ X ≠ ∅}                    (4.6)

called the lower and upper approximations of X, respectively. The set R̲^d_C X is
the set of all elements of U which can be classified with certainty as elements of X,
in terms of the equivalence relation R^d_C; this set is also called the positive region
of X. The set R̄^d_C X is the set of elements of U which can be classified, with some
uncertainty, as elements of X. The set R̄^d_C X − R̲^d_C X is called the R^d_C-boundary
of X.
We consider two subsets D0, D1 ⊆ U, D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10}
and D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}, and the equivalence
relation R^d_C over U defined from the decision table in table 4.1 and the discernibility
matrix in table 4.2. Then we compute the lower and the upper approximations for
the two subsets D0, D1 (figure 4.1):

Figure 4.1: Example of Rough Sets

    R̲^d_C D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o10}
    R̄^d_C D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o20}
    R̲^d_C D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19}
    R̄^d_C D1 = {o9, o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}
    R̄^d_C D0 − R̲^d_C D0 = R̄^d_C D1 − R̲^d_C D1 = {o9, o20}
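These approximations can be reproduced mechanically from the decision table. The sketch below is an illustrative implementation of equations (4.4)-(4.6) for table 4.1, exploiting the fact that in this table two objects are indiscernible exactly when their condition values coincide:

```python
# Decision table 4.1: condition attribute values c1..c5, then the decision d.
TABLE = {
    "o1":  (3, 2, 2, 0, 2, 0), "o2":  (1, 3, 3, 1, 2, 0),
    "o3":  (0, 1, 0, 1, 2, 0), "o4":  (2, 1, 3, 0, 0, 0),
    "o5":  (3, 2, 2, 0, 1, 0), "o6":  (1, 1, 2, 0, 2, 0),
    "o7":  (3, 0, 0, 3, 2, 0), "o8":  (0, 0, 3, 2, 1, 0),
    "o9":  (2, 1, 1, 1, 2, 0), "o10": (0, 3, 2, 0, 0, 0),
    "o11": (2, 0, 2, 1, 2, 1), "o12": (1, 3, 2, 2, 2, 1),
    "o13": (0, 1, 1, 0, 2, 1), "o14": (3, 1, 0, 1, 2, 1),
    "o15": (3, 3, 0, 2, 2, 1), "o16": (1, 2, 3, 2, 0, 1),
    "o17": (2, 1, 3, 2, 2, 1), "o18": (1, 2, 3, 1, 0, 1),
    "o19": (2, 2, 1, 1, 2, 1), "o20": (2, 1, 1, 1, 2, 1),
}

def indiscernible_with(table, x):
    """R^d_C(x): x itself, plus every object from another decision class whose
    condition values are identical to those of x (eqs. 4.3-4.4)."""
    rx = table[x]
    return {x} | {y for y, ry in table.items()
                  if ry[-1] != rx[-1] and ry[:-1] == rx[:-1]}

def lower(table, X):
    """Lower approximation (eq. 4.5): indiscernibility set lies inside X."""
    return {x for x in table if indiscernible_with(table, x) <= X}

def upper(table, X):
    """Upper approximation (eq. 4.6): indiscernibility set meets X."""
    return {x for x in table if indiscernible_with(table, x) & X}

D0 = {o for o, row in TABLE.items() if row[-1] == 0}
D1 = {o for o, row in TABLE.items() if row[-1] == 1}
boundary = upper(TABLE, D0) - lower(TABLE, D0)     # the boundary region
```

Running this reproduces the sets above, with {o9, o20} as the boundary.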
4.5 Reducts and the Core of Knowledge
One problem is whether some of the attributes in a decision system are redundant
with respect to the object classifications. If an attribute set B ⊆ C preserves the
indiscernibility relation R^d_C, then the attributes that form the set C − B are said
to be dispensable. All subsets B of attributes that are minimal, in terms of size, and
preserve the relation R^d_C are called reducts, and we denote the set of reducts by
Red(T).
The intersection of all reducts of the attribute set C is called the core of C, and it
is denoted by Core(C). The core can be defined in terms of the discernibility matrix
as the set of all attributes occurring in single-element entries of the discernibility
matrix:

    Core(C) = {c ∈ C | M^d_C(x, y) = {c} for some x, y ∈ U}                    (4.7)

In our case, looking at table 4.2, we can conclude that Core(C) = {c1, c2}.
      o1    o2    o3    o4     o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2      1,2   1,2   1     1     2     1,2
o12   1,2   3,4   1,2   1,2    1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3,4   1      1,2   1     1,2   2     1     2
o14   2     1,2   1     1      2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2    2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4,5    1,2   1     1,2   1,2   3,4   1,2
o18   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2      1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,4,5  1,2   1,2   1,2   1,2   -     1,2

Table 4.3: Core Discernibility Matrix
Now we can define the full set of reducts in terms of the discernibility matrix.
A set B ⊆ C is a reduct of C if B is a minimal, with respect to inclusion, subset of
C such that B ∩ M^d_C(x, y) ≠ ∅ for every nonempty entry M^d_C(x, y). Looking at
table 4.3, it can easily be seen that there are two reducts in our example,
Red(T) = {{c1, c2, c4}, {c1, c2, c3, c5}}, shown in table 4.4.

Besides the full reducts defined above, we can define reducts that are relative to
a particular object in the decision table. We call these reducts object-related reducts.
If indiscernibility is relative to an object x, two other objects y and z are considered
to be indiscernible in comparison with x. Reducts that are related to a particular
object x are called x-relative reducts, Red(T, x), since they contain the minimum
information needed to distinguish that particular object from the other objects in
the decision table.
Reduct {c1, c2, c4}:

      o1    o2    o3    o4    o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2     1,2   1,2   1     1     2     1,2
o12   1,2   4     1,2   1,2   1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   4     1     1,2   1     1,2   2     1     2
o14   2     1,2   1     1     2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2   2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2   1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4     1,2   1     1,2   1,2   4     1,2
o18   1     2     1,2   1,2   1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2     1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     4     1,2   1,2   1,2   1,2   -     1,2

Reduct {c1, c2, c3, c5}:

      o1    o2    o3    o4    o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2     1,2   1,2   1     1     2     1,2
o12   1,2   3     1,2   1,2   1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3     1     1,2   1     1,2   2     1     2
o14   2     1,2   1     1     2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2   2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2   1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     5     1,2   1     1,2   1,2   3     1,2
o18   1     2     1,2   1,2   1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2     1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,5   1,2   1,2   1,2   1,2   -     1,2

Table 4.4: Two Minimal Discernibility Matrices
There are several algorithms for computing reducts or reduct approximations.
These algorithms assume that any attribute subset of C can be an approximation
to a reduct. The Exhaustive Reducer algorithm [28, 29, 30] computes all the reducts
by brute force, through exhaustive search. The algorithm takes exponential time in
the number of objects, so it is not suitable for very large decision systems, as it
may be very time consuming. Another algorithm is the Johnson Reducer [28, 29, 30],
which invokes a simple greedy algorithm to compute only a single reduct. Because
of the NP-completeness of the problem, heuristic methods can be more effective.
Wroblewski [53] proposed a variation of a Genetic Algorithm to search for reducts,
either until the search space is exhausted or until a given maximum number of reducts
has been found. Another heuristic approach was proposed by Vinterbo [51, 52]. It is
based on minimal hitting sets. We present this method in the following section.
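For a table as small as 4.1, the full set of reducts can be recovered by brute force. The sketch below is an illustrative exhaustive search (not the Exhaustive Reducer of [28, 29, 30]): it collects the nonempty discernibility-matrix entries and scans attribute subsets by increasing size, keeping each hitting set that contains no smaller one:

```python
from itertools import combinations

# Decision table 4.1: condition attribute values c1..c5, then the decision d.
TABLE = {
    "o1":  (3, 2, 2, 0, 2, 0), "o2":  (1, 3, 3, 1, 2, 0),
    "o3":  (0, 1, 0, 1, 2, 0), "o4":  (2, 1, 3, 0, 0, 0),
    "o5":  (3, 2, 2, 0, 1, 0), "o6":  (1, 1, 2, 0, 2, 0),
    "o7":  (3, 0, 0, 3, 2, 0), "o8":  (0, 0, 3, 2, 1, 0),
    "o9":  (2, 1, 1, 1, 2, 0), "o10": (0, 3, 2, 0, 0, 0),
    "o11": (2, 0, 2, 1, 2, 1), "o12": (1, 3, 2, 2, 2, 1),
    "o13": (0, 1, 1, 0, 2, 1), "o14": (3, 1, 0, 1, 2, 1),
    "o15": (3, 3, 0, 2, 2, 1), "o16": (1, 2, 3, 2, 0, 1),
    "o17": (2, 1, 3, 2, 2, 1), "o18": (1, 2, 3, 1, 0, 1),
    "o19": (2, 2, 1, 1, 2, 1), "o20": (2, 1, 1, 1, 2, 1),
}

def reducts(table):
    """Exhaustive reduct search: a reduct is a minimal attribute subset that
    intersects every nonempty discernibility-matrix entry."""
    n = len(next(iter(table.values()))) - 1
    entries = {frozenset(i + 1 for i in range(n) if rx[i] != ry[i])
               for rx in table.values() for ry in table.values()
               if rx[-1] != ry[-1]}
    entries.discard(frozenset())       # drop the inconsistent pair (o9, o20)
    minimal = []
    for size in range(1, n + 1):       # smaller subsets are tried first, so
        for B in map(set, combinations(range(1, n + 1), size)):
            if all(B & e for e in entries) and not any(m <= B for m in minimal):
                minimal.append(B)      # ...every kept hitting set is minimal
    return minimal
```

On this table the search returns exactly Red(T) = {{c1, c2, c4}, {c1, c2, c3, c5}}; the exponential cost of the subset scan is what makes the approach unusable on the real survey data.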
4.6 Reducts, Hitting Sets, Approximate Hitting Sets
Multisets are unordered collections of elements in which an element can occur as a
member more than once. A hitting set [51, 52] for a given multiset MS of elements
from 2^C is a set B ⊆ C such that the intersection between B and every set in
MS is non-empty:

    HS(MS) = {B ⊆ C | B ∩ MSi ≠ ∅ for all MSi ∈ MS}                    (4.8)

The set B ∈ HS(MS) is a minimal hitting set of MS if B is no longer a hitting set
whenever any of its elements is removed. The set of minimal hitting sets is denoted
by minHS(MS).
An approximation to the hitting set is a set that covers a fraction of the elements of
the multiset MS given by a constant ε. The set of ε-approximate hitting sets of MS is
denoted by HS(MS, ε), where the parameter ε controls the degree of approximation:

    HS(MS, ε) = {B ⊆ C | |{MSi ∈ MS | MSi ∩ B ≠ ∅}| / |MS| ≥ ε}                    (4.9)

The set B ∈ HS(MS, ε) is a minimal ε-approximate hitting set if it is no longer
an ε-approximate hitting set when any of its elements is removed.
The problem of computing the minimal hitting set, like the reducts computation,
is an NP-hard problem. Again, it is necessary to use heuristics in order to find reducts
using hitting sets, but we still cannot guarantee the minimality of the reducts.
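A cheap heuristic in the spirit of the Johnson Reducer illustrates the idea. The sketch below is an illustrative greedy construction of an ε-approximate hitting set, not Vinterbo's algorithm; the small multiset at the bottom is a made-up example, not entries from the Oaxaca data:

```python
def greedy_hitting_set(MS, attrs, epsilon=1.0):
    """Greedily build B: repeatedly add the attribute occurring in the most
    not-yet-hit sets, until at least a fraction epsilon of MS is hit (eq. 4.9).
    With epsilon = 1.0 the result is a full, though possibly non-minimal,
    hitting set."""
    B, unhit = set(), list(MS)
    while (len(MS) - len(unhit)) < epsilon * len(MS):
        counts = {a: sum(a in s for s in unhit) for a in attrs}
        choice = max(counts, key=counts.get)     # most frequent attribute wins
        B.add(choice)
        unhit = [s for s in unhit if choice not in s]
    return B

# Illustrative multiset of discernibility entries (attribute index sets):
MS = [{1, 2, 4}, {3, 4}, {1}, {2}, {4, 5}]
B = greedy_hitting_set(MS, attrs={1, 2, 3, 4, 5})
# B intersects every set in MS, but minimality is not guaranteed.
```

Lowering `epsilon` below 1.0 trades coverage of the discernibility entries for a smaller attribute set, which is the approximation the next chapters exploit.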
4.7 Decision Rules, Decision Systems
A decision rule is an assertion of the form "if p then s", denoted by p → s,
where p and s are logical formulas in first-order logic. For each object, certain
values of the condition attributes determine the value of the decision attribute. We
define a decision system as a finite collection, or set, of decision rules. In order to
obtain a decision system with a minimum number of rules, superfluous decision rules
associated with the same decision class can be eliminated without disturbing the
decision-making process.
Let D be a decision system and let T = (U, C, {d}) be an information system.
The set of all decision rules in D having the same successor s as a decision attribute
value is denoted by Ds. We can say that Ds is the decision subsystem associated
with class s. The set of all predecessor formulas of the decision rules belonging to Ds
is denoted by Ps. We say that a decision rule p → s from D is dispensable in D if
the disjunction of all the formulas in Ps is equivalent to the disjunction of the formulas
in Ps − {p}. Otherwise, the rule is indispensable. If every decision rule belonging
to Ds is indispensable, the decision subsystem Ds is said to be independent. For a
decision system, if all of its decision subsystems Ds are independent, we say that the
decision system is minimal.
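The dispensability test can be read as a statement about rule coverage: p → s is dispensable when deleting p leaves the set of objects covered by the predecessors of Ds unchanged. The sketch below uses an illustrative encoding (a predecessor as a set of (attribute, value) descriptors, attributes 1-based) and three class-0 rules from table 4.6 that all cover the objects {o1, o5}:

```python
def covers(pred, row):
    """True when every descriptor (attribute index, value) of a rule's
    predecessor matches the object's condition values."""
    return all(row[a - 1] == v for a, v in pred)

def dispensable(pred, preds, objects):
    """A predecessor is dispensable when the objects covered by the whole set
    of predecessors equal the objects covered once it is removed."""
    covered = {o for o, row in objects.items()
               if any(covers(q, row) for q in preds)}
    without = {o for o, row in objects.items()
               if any(covers(q, row) for q in preds if q != pred)}
    return covered == without

# Condition values of o1 and o5 (from table 4.1), and three rules from
# table 4.6 whose supporting objects are exactly {o1, o5}:
objects = {"o1": (3, 2, 2, 0, 2), "o5": (3, 2, 2, 0, 1)}
rule3  = frozenset({(2, 2), (4, 0)})   # C2(2) AND C4(0) => D(0)
rule9  = frozenset({(1, 3), (4, 0)})   # C1(3) AND C4(0) => D(0)
rule15 = frozenset({(1, 3), (2, 2)})   # C1(3) AND C2(2) => D(0)
preds = [rule3, rule9, rule15]
# Within this subsystem, rule3 is dispensable: rule9 and rule15 already
# cover {o1, o5}. Alone, rule3 would of course be indispensable.
```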
4.8 Decision System Construction
The problem of decision system construction is to induce a set of rule descriptors
of decision classes from the input set of objects. These sets of descriptors, named
decision systems, consist of sets of decision rules. We can classify decision systems
as follows:

1. Decision systems with a minimum set of rules. They are focused on describing
the input objects using a minimum number of necessary rules.

2. Decision systems with an exhaustive set of rules (tables 4.5, 4.6). These decision
systems contain all possible decision rules.

3. Decision systems with a satisfactory set of rules. This category represents sets
of decision rules which satisfy given a priori user requirements for an acceptable
decision system.
One strategy for finding a simple decision system with good classificatory capabilities
is to first induce an exhaustive set of rules, and then to prune away those rules
that do not lower the decision system's performance significantly. An exhaustive
decision system can be generated from the object-related reducts [26, 27], as described
in section 4.5.
Rule                                           1   2     3        4     5        6        7  8
1   C2(0) AND C4(2) => D(0)                    1   1     1        0.05  0.1      1        2  1
2   C2(3) AND C4(1) => D(0)                    1   1     1        0.05  0.1      1        2  1
3   C2(2) AND C4(0) => D(0)                    2   2     1        0.1   0.2      1        2  1
4   C2(3) AND C4(0) => D(0)                    1   1     1        0.05  0.1      1        2  1
5   C1(0) AND C4(2) => D(0)                    1   1     1        0.05  0.1      1        2  1
6   C1(1) AND C4(0) => D(0)                    1   1     1        0.05  0.1      1        2  1
7   C1(2) AND C4(0) => D(0)                    1   1     1        0.05  0.1      1        2  1
8   C1(0) AND C4(1) => D(0)                    1   1     1        0.05  0.1      1        2  1
9   C1(3) AND C4(0) => D(0)                    2   2     1        0.1   0.2      1        2  1
10  C4(3) => D(0)                              1   1     1        0.05  0.1      1        1  1
11  C1(0) AND C2(3) => D(0)                    1   1     1        0.05  0.1      1        2  1
12  C1(0) AND C2(0) => D(0)                    1   1     1        0.05  0.1      1        2  1
13  C1(3) AND C2(0) => D(0)                    1   1     1        0.05  0.1      1        2  1
14  C1(1) AND C2(1) => D(0)                    1   1     1        0.05  0.1      1        2  1
15  C1(3) AND C2(2) => D(0)                    2   2     1        0.1   0.2      1        2  1
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)  2   1,1   0.5,0.5  0.1   0.1,0.1  1.0,1.0  3  2
17  C2(1) AND C4(2) => D(1)                    1   1     1        0.05  0.1      1        2  1
18  C2(2) AND C4(1) => D(1)                    2   2     1        0.1   0.2      1        2  1
19  C1(0) AND C2(1) AND C4(0) => D(1)          1   1     1        0.05  0.1      1        3  1
20  C2(2) AND C4(2) => D(1)                    1   1     1        0.05  0.1      1        2  1
21  C2(3) AND C4(2) => D(1)                    2   2     1        0.1   0.2      1        2  1
22  C2(0) AND C4(1) => D(1)                    1   1     1        0.05  0.1      1        2  1
23  C1(2) AND C4(2) => D(1)                    1   1     1        0.05  0.1      1        2  1
24  C1(3) AND C4(2) => D(1)                    1   1     1        0.05  0.1      1        2  1
25  C1(3) AND C4(1) => D(1)                    1   1     1        0.05  0.1      1        2  1
26  C1(1) AND C4(2) => D(1)                    2   2     1        0.1   0.2      1        2  1
27  C1(2) AND C2(2) => D(1)                    1   1     1        0.05  0.1      1        2  1
28  C1(1) AND C2(2) => D(1)                    2   2     1        0.1   0.2      1        2  1
29  C1(3) AND C2(3) => D(1)                    1   1     1        0.05  0.1      1        2  1
30  C1(3) AND C2(1) => D(1)                    1   1     1        0.05  0.1      1        2  1
31  C1(2) AND C2(0) => D(1)                    1   1     1        0.05  0.1      1        2  1

Table 4.5: Exhaustive Decision System
Pruning can be done by identifying and removing components of the decision
system that explain only small parts of the data, thereby preserving general trends in
the underlying data material. In order to find a minimal decision system we can use
the simple greedy heuristic algorithm described by Lazar and Sethi [20]. This algorithm
computes only one decision system. If more than one minimal decision system is
required, we can use a Genetic Algorithm that solves the minimal set cover problem.
Agotnes [1, 2] proposed two algorithms for generating satisfactory decision systems:
a quality-based rule filtering algorithm and a genetic rule filtering algorithm. Rule
Rule                                           9
1   C2(0) AND C4(2) => D(0)                    {o2}
2   C2(3) AND C4(1) => D(0)                    {o2}
3   C2(2) AND C4(0) => D(0)                    {o1,o5}
4   C2(3) AND C4(0) => D(0)                    {o10}
5   C1(0) AND C4(2) => D(0)                    {o8}
6   C1(1) AND C4(0) => D(0)                    {o6}
7   C1(2) AND C4(0) => D(0)                    {o4}
8   C1(0) AND C4(1) => D(0)                    {o3}
9   C1(3) AND C4(0) => D(0)                    {o1,o5}
10  C4(3) => D(0)                              {o7}
11  C1(0) AND C2(3) => D(0)                    {o10}
12  C1(0) AND C2(0) => D(0)                    {o8}
13  C1(3) AND C2(0) => D(0)                    {o7}
14  C1(1) AND C2(1) => D(0)                    {o6}
15  C1(3) AND C2(2) => D(0)                    {o1,o5}
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)  {o9,o20}
17  C2(1) AND C4(2) => D(1)                    {o17}
18  C2(2) AND C4(1) => D(1)                    {o18,o19}
19  C1(0) AND C2(1) AND C4(0) => D(1)          {o13}
20  C2(2) AND C4(2) => D(1)                    {o16}
21  C2(3) AND C4(2) => D(1)                    {o12,o15}
22  C2(0) AND C4(1) => D(1)                    {o11}
23  C1(2) AND C4(2) => D(1)                    {o17}
24  C1(3) AND C4(2) => D(1)                    {o15}
25  C1(3) AND C4(1) => D(1)                    {o14}
26  C1(1) AND C4(2) => D(1)                    {o12,o16}
27  C1(2) AND C2(2) => D(1)                    {o19}
28  C1(1) AND C2(2) => D(1)                    {o16,o18}
29  C1(3) AND C2(3) => D(1)                    {o15}
30  C1(3) AND C2(1) => D(1)                    {o14}
31  C1(2) AND C2(0) => D(1)                    {o11}

Table 4.6: Exhaustive Decision System
filtering operates on an existing exhaustive decision system, pruning it while retaining
high performance. Neither of the above solutions makes any assumption about the
minimal set cover condition; as a result, the decision system may not be minimal.
We propose a new solution, based on the Cultural Algorithm, which addresses the
minimal set cover problem explicitly.
4.9 Steps for Model Construction
The typical steps (figure 4.2) for constructing model hypotheses for classification
problems using the Rough Sets methodology are as follows:
Figure 4.2: Model Development
1. Sampling and selection: This step implies the construction of a research file. Usually
this is done by selecting a random sample of prospects from the database.
Also, the irrelevant attributes should be removed. Then, the selected data is
represented as a two-dimensional table.
2. Preprocessing: If the selected table contains missing values or empty cell entries,
the table must be preprocessed in order to remove some of the incompleteness.
Statistics should be run to obtain more information about the data.
In the initial data set, missing values were coded differently for each variable.
To make the table consistent, we replaced all the values assigned to missing
data with the SPSS convention, system-missing data. We did not replace the
missing values with a default value, since the Rough Sets representation can deal
with uncertainty, and the algorithms treat them as undetermined. Descriptive
statistics, using SPSS 8.0, were run for all of the attributes and used to identify
outliers for modification.
3. Transformation: For example, measurement attributes should be discretized
and used instead of exact observations. Categorical data may be recoded to
provide a consistent interpretation. The Rough Sets procedure is potentially
sensitive to large differences in the number of categories across variables. Thus,
we transformed the attributes with many categories to consolidate them, when
possible, into smaller sets of categories. This was done in a systematic
way, after checking the statistics, and carefully, without losing much information.
Also, the decision variables should be identified. After this step the data
becomes more qualitative than quantitative.
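This kind of recoding amounts to mapping raw values onto a small set of ordinal labels. The sketch below is a made-up illustration of such a consolidation (the bins are hypothetical, not the actual coding scheme used for the Oaxaca data); note how missing values are left undetermined rather than filled in:

```python
def consolidate(value, bins=((0, 0), (1, 5), (6, None))):
    """Map a raw count onto the index of the first matching (lo, hi) bin;
    hi=None means unbounded above. None (missing) values stay None, matching
    the decision to leave missing data undetermined for Rough Sets."""
    if value is None:
        return None
    for label, (lo, hi) in enumerate(bins):
        if value >= lo and (hi is None or value <= hi):
            return label
    raise ValueError(f"value {value} falls in no bin")

# Hypothetical ceramic-count attribute recoded into 3 ordinal categories:
recoded = [consolidate(v) for v in [0, 3, 17, None]]   # -> [0, 1, 2, None]
```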
4. Training and validation sample: The initial table is divided into at least two
subtables. One will be used in the training step, the other in the validation or
testing step.
5. Develop the model: Knowledge discovery techniques are applied to the training
data in order to generate a set of hypothesized relations. Following the Rough
Set methodology, the full set of reducts is computed, a set of minimal reducts
is chosen, and the data table is vertically pruned. Then the object-related
reducts are computed and the exhaustive decision rule system is generated. At
the end, a pruning method is applied to the decision rule set in order to obtain a
good decision system, with a good balance between the number of rules and the
accuracy of the classifications. The process is shown in figure 4.3.

[Figure 4.3 flowchart: Data → compute full reducts → compute object-related reducts → generate exhaustive decision-rule system → find minimal or satisfactory decision-rule system]

Figure 4.3: Model Construction Phases

The above
procedure was followed exactly and the results are shown in chapter 6.
6. Interpretation and evaluation: The validation or test data set is then used
to test the classificatory performance of the new model. Also, if it is a rule-based
model, it can be checked by specialists, in order to understand the data
sets and to understand and explain dependencies between the values of attributes
and the definitions of decision classes. The expert will check the decision rule
system.
CHAPTER 5
EVOLUTION-BASED LEARNING OF ONTOLOGICAL
KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT SIMULATION
5.1 Introduction
The results of the data mining process can be used in many different ways. There-
fore, the form of the knowledge collected will have a major impact on the efficiency
and effectiveness of its use in a given application. In this chapter we examine the
problem of extracting knowledge for use by agents in a large-scale multi-agent system
[42]. Here, the knowledge is ontological knowledge representing the constraints that
the physical and social environments place upon the agents and their interactions.
The ontological knowledge provides the semantic building blocks around which the
world models are formed. For an agent in a particular model, only the things in
its ontology can exist, and it cannot perceive things that are not represented in the
ontology. An ontology (Fox et al. [12]) is a basic knowledge representation
scheme: a formal definition of entities and their properties, interactions, behaviors,
and constraints. Each agent's decisions need to be checked against these constraints
prior to their execution. In a complex multi-agent system, hundreds of thousands of
agents may need to check these constraints regularly, which means that a successful
Data Mining activity will need to produce a relatively small set of syntactically simple
rules for the process to be efficient. Fox et al. [12] have used Data Mining techniques
to produce corporate ontologies.
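The efficiency requirement above can be made concrete with a small sketch of constraint checking: before executing a proposed action, an agent tests it against every ontological constraint. The constraints, attribute names, and data below are hypothetical illustrations, not taken from the actual simulation.

```python
# Hedged sketch: how a small set of syntactically simple ontological
# constraints might be checked by many agents each simulation step.
# The constraint predicates and attribute names are illustrative.

CONSTRAINTS = [
    ("settlements need a water source",
     lambda action: action.get("water_source") is not None),
    ("no settlement on swampy soil",
     lambda action: action.get("soil") != "swampy"),
]

def permitted(action):
    """A proposed action executes only if every constraint holds."""
    return all(pred(action) for _, pred in CONSTRAINTS)

# Usage: each agent filters its candidate actions before acting.
proposal = {"water_source": "Arroyo", "soil": "loam"}
assert permitted(proposal)
assert not permitted({"water_source": None, "soil": "loam"})
```

Because each constraint is a simple predicate, the check is linear in the (small) number of rules, which is what makes frequent checking by many agents feasible.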
There are several factors that can influence the nature of the ontological con-
straints that are produced. First, the nature of the data collection and measurement
process, and the uncertainty induced in the data set by the presence of noise. Second,
the nature of the representation used to express the extracted patterns; e.g., whether
it allows for uncertainty or not. Third, the data mining technique employed and the
assumptions that it makes about the collected data. Fourth, how these constraints
will be stored, accessed, and used by the agents involved.
For a given data set one can compare the different Data Mining techniques in
terms of the syntax and semantics of the induced constraints. In this application
we are interested in simulating the emergence of the archaic state in the Valley of
Oaxaca, Mexico. A state is among the most sophisticated and powerful structures
to have emerged from the process of social evolution. In the modern world these are
termed nation states, with a government composed of a hierarchical decision-making
structure in which the decision-makers are either elected or appointed. States are
supported by various economies and are able to interact with each other via warfare,
trade, etc. Most states in the ancient world, often called archaic states, were ruled by
hereditary royal families. These archaic states exhibited much internal diversity, with
populations numbering from tens of thousands to millions. They had a bureaucracy,
organized religion, a military presence, large urban centers, public buildings, public
works, and services provided by various professional specialists. The state itself could
enter into warfare and trade-based relationships with other states and less complex
neighbors.
The process by which complex social entities such as the state emerged from lower
level structures and other supporting economies has long been of prime interest to
anthropologists and other disciplines as well. This is because the emergence of such
a social structure can have a profound impact on the society's physical and social
environment. However, the task of developing realistic computational models that
aid in the understanding and explanation of state emergence has been a difficult one.
This is the result of two basic factors:

- The process of state formation inherently takes place on a variety of temporal and spatial scales.
- The emergence of hierarchical decision-making [35] can be viewed as an adaptation that allows decision-makers to specialize their decisions to particular spatial and temporal scales.

The formation of the state is a complex process that is fundamentally directed by
social variables but requires dynamic interaction between the emergent system
and its environment. Identifying the nature of these interactions is one of the reasons
why the process of state formation is of such interest.
The goal of this project is to produce a large-scale knowledge-based computational
model of the origins of the Zapotec State (Marcus and Flannery 1996), centered at
Monte Alban, in the Valley of Oaxaca, Mexico. State formation took place between
1400 B.C. and 300 B.C. While archaic states have emerged in various parts of the
world, the relative isolation of the valley allowed the processes of social evolution
to be more visible there. Extensive surveys [4, 5, 19] of the 1700 km2 valley were
undertaken by the Oaxaca Settlement Pattern Project in the 1970s and 1980s. The
location and features of over 2,700 sites dating from the archaic period (8000 B.C.)
to Late Monte Alban V (just prior to the arrival of the Spaniards) were documented.
Several hundred variables were recorded for each site. In addition, they surveyed the
6.5 square kilometer urban center of Monte Alban, a site that contained over 2,000
residential terraces. This site was the focus for early state formation in the valley.
Both surveys provided the knowledge needed to create our multi-agent simulation
model. We then produced a spatio-temporal database that contained the results
of both surveys and used data mining techniques from Artificial Intelligence [42] to
produce knowledge about site location, warfare, trade, and economic decisions to be
used for the construction of the multi-agent model. However, in order to do this
we needed to add more data about the spatial and temporal context to both the
regional and urban center surveys. Specifically, we had to add variables that allowed
us to locate each site spatially and temporally to a level of precision consistent with
the scale of our simulation. For example, temporal periods are characterized by the
presence of pottery of different styles. That data was available only in text form. All
of this pottery data, over 130 variables for each residential terrace, was scanned into
the computer, corrected for errors, and added to the Monte Alban data set. This
data allowed us to identify the periods that each terrace was occupied. Pottery data
was also integrated into the regional data set.
In addition, the survey had produced hundreds of pages of hand drawn maps
for both the Monte Alban and regional surveys that contained the spatial context
for the location of each site. Since our goal was to ask specific questions about the
spatial and temporal context, we needed to tie each site into its mapped location. We
then proceeded to digitize each of the maps and to associate each site object with its
corresponding data record. This allowed us to produce a geographical information
system (GIS) that serves as our virtual valley of Oaxaca. This acts as a vehicle
for our data mining activities and as a knowledge base for the multi-agent simulation
and allows the results of the simulation to be displayed and compared with the actual
data in a spatial context. It is envisioned that the resultant GIS system will be a
useful tool for researchers and students from various fields to study the emergence of
complexity in the future.
In order to perform the data mining activities, we extended traditional data mining
techniques and developed new ones to deal with the complexities inherent
in the Oaxaca database. At the regional level we used Utgoff's incremental Decision
Tree Algorithm (IDTI) [48] to generate the decision trees for each region and phase
of the valley. The approach was used to generate decision trees that discriminated
between sites that were targets for warfare and those that were not for a given period
[39, 38].
However, given the many disparate steps under which the data was collected and
organized, it was felt that perhaps some improvements might be made by using a
technique that took into account the presence of uncertainty in the data, especially
in regions and periods when the social and settlement patterns were complex and
prone to data collection error. To test this hypothesis we selected a period of time
just before the emergence of the state, Rosario, where there was evidence of increased
social strife brought about in part by increased population growth. The part of the
valley that exhibited the greatest population at the time, Etla, was also selected.
Since the majority of the data was discrete rather than continuous in nature we
selected Rough Sets as a vehicle for representing uncertainty here. We employed an
evolutionary technique, Genetic Algorithms [13, 14], to control the search in this case
because Genetic Algorithms had been successfully used with Rough Sets previously.
The decision systems or rule sets produced by both approaches were then compared in
terms of their ability to decide about the location of sites that are targets for warfare
in this period. We then compared the two approaches over all relevant phases of
social evolution in the valley.
In section 2 we begin with an overview of Decision Trees and their generation.
Section 3 provides a comparison of a specific decision system for the Etla region in
the Rosario phase as produced by Decision Trees and Rough Sets respectively. Section
4 compares the two approaches together over all phases in the Etla region. The Rough
Set approach appears most useful in those phases with the most evidence for change
in social complexity. Section 5 gives our conclusions.
5.2 Decision Tree
Decision Tree induction is a very good method for high-dimensional applications. It
is a fast, non-linear prediction method and employs dynamic feature selection.
Solution complexity is expressed in terms of the number of terminal nodes: the most
complex tree covers all cases in the training data, and less complex solutions are
found by pruning the tree and measuring the error of progressively smaller trees.
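The complexity measure and the pruning idea can be illustrated with a toy tree. The nested-tuple encoding and the tiny data set are assumptions made for this sketch; they are not the dissertation's implementation.

```python
# Illustrative sketch: tree complexity measured in terminal nodes, and
# pruning a subtree back to a single leaf. A node is either
# ("leaf", label) or ("split", attribute, value, left, right).

def n_leaves(tree):
    if tree[0] == "leaf":
        return 1
    _, _, _, left, right = tree
    return n_leaves(left) + n_leaves(right)

def predict(tree, record):
    while tree[0] != "leaf":
        _, attr, value, left, right = tree
        tree = left if record[attr] == value else right
    return tree[1]

def error(tree, data):
    return sum(predict(tree, r) != label for r, label in data) / len(data)

# A toy tree over the kinds of variables recorded in the survey data.
tree = ("split", "slope", "Flat",
        ("split", "water", "Spring",
         ("leaf", "Attacked"), ("leaf", "Not Attacked")),
        ("leaf", "Not Attacked"))

# Pruning: collapse the inner water-source split into a single leaf.
pruned = ("split", "slope", "Flat",
          ("leaf", "Not Attacked"),
          ("leaf", "Not Attacked"))

data = [({"slope": "Flat", "water": "Spring"}, "Attacked"),
        ({"slope": "Flat", "water": "Arroyo"}, "Not Attacked"),
        ({"slope": "Shallow", "water": "Well"}, "Not Attacked")]
```

The full tree has three terminal nodes and fits every training case; the pruned tree has fewer terminal nodes but misclassifies one case, which is exactly the complexity/accuracy trade-off measured during pruning.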
Period              Approximate Date
Tierras Largas      1400 - 1150 BC
San Jose            1150 - 850 BC
Guadalupe           850 - 700 BC
Rosario             700 - 500 BC
Monte Alban Ia      500 - 300 BC
Monte Alban Ic      300 - 150/100 BC
Monte Alban II      150/100 BC - AD 200
Monte Alban IIIa    AD 200 - 500
Monte Alban IIIb    AD 500 - 700/750
Monte Alban IV      AD 700/750
Monte Alban V       AD 1000 - 1521

Table 5.1: Periods of Social Evolution in the Valley of Oaxaca
Any decision tree can be rewritten as a set of decision rules; however, there are
sets of rules which cannot be expressed as any decision tree. An implied decision rule
in a tree is a complete path to a terminal node. Because these rules are not mutually
exclusive, the size of the decision rule set can be much larger than the logic needed
for overlapping rules. One rule can combine a large number of terms or true-false
conditions, which takes considerable time to evaluate when the rule set is used in an
agent-based simulation.
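The path-to-rule correspondence just described can be sketched in a few lines; the nested-tuple tree encoding is an illustrative assumption.

```python
# Sketch of the tree-to-rules conversion: every complete path from the
# root to a terminal node yields one decision rule. A node is either
# ("leaf", label) or ("split", attribute, value, left, right).

def tree_to_rules(tree, conditions=()):
    if tree[0] == "leaf":
        return [(list(conditions), tree[1])]
    _, attr, value, left, right = tree
    return (tree_to_rules(left, conditions + ((attr, "==", value),))
            + tree_to_rules(right, conditions + ((attr, "!=", value),)))

# A toy tree over illustrative survey variables.
tree = ("split", "env_zone", "High Alluvium",
        ("leaf", "Not Attacked"),
        ("split", "slope", "Flat",
         ("leaf", "Attacked"), ("leaf", "Not Attacked")))

rules = tree_to_rules(tree)
# One rule per terminal node; deeper leaves accumulate more conditions,
# which is why such rules can be slow to evaluate in a simulation.
```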
One of the advantages of logic-based solutions such as decision trees and their
corresponding decision rules is their powerful explanatory capability. Table 5.1 gives all of the
relevant periods of social evolution in the valley. Tierras Largas marks the beginning
of early village settlement there. The state emerged at Monte Alban in period Monte
Alban Ia. The valley came under the control of the state by Monte Alban II, and Monte
Alban IIIa signaled the decline of the state and its succession by a collection of city-
states localized in different parts of the valley.
For some periods there were several hundred rules produced just for the warfare
constraint alone. Adding in the other constraints would produce a knowledge base of
several thousand rules at minimum. However, since the data were collected over a ten-
year period, over varying landscape, and using different surveyors and recording equipment,
[Figure 5.1: Decision Tree for the Rosario Phase (root node: env_zone)]
it was felt that a significant amount of noise might be present in the data. The
original technique did not account explicitly for uncertainty in the measurement data.
Thus, it was possible that certain rules or rule conditions were present only to deal
with the specific noise introduced into the process by the survey methods and data
transformation activities. If this was the case, then by using a representation and
associated learning technique that dealt explicitly with uncertainty it might be that
fewer rules would be needed in periods where the noise is most pronounced.
[Table 5.2: Decision System for the Rosario Phase generated from the decision tree (rules truncated; rule 1 begins: env zone ...)]
The decision distinguishes between the sites that are attacked and those that are
not. Three variables were used in order to compute the decision: the presence of
burnt daub at the site, other evidence of burning, and the presence of defensive
walls. The variables used to predict these decisions from positive and negative
examples in the training set were: environmental zone, slope, hilltop or ridge top,
soil character, position on the boundary between the loam and the swampy region,
water source, depth of the water table, type of irrigation, and land use type, among
others.
In section 2 we presented a decision tree (Figure 5.1) and a corresponding decision
system (Table 5.2) for the Rosario phase (700-500 B.C.), generated by the Decision
Tree approach. It is the fourth phase of occupation in the study, and at that time
population size and warfare increased substantially (Marcus et al. [23]). For example,
it was observed that chunks of burnt daub appear on the surface of the villages seven
times more frequently than in the previous phases. There are 36 sites in the Rosario
phase. The archaic state emerged in the period following this phase of increased
warfare.
Further, we consider another approach for extracting logical patterns within data
sets: rule induction using Rough Sets. Although trees and rules may seem similar,
they are in fact very different, as can be seen in Table 5.3.
In essence, Decision Trees may be viewed as a simplistic approach to rule discovery.
In order to extract rules, we performed a feature selection using the Rough Set
approach guided by a Genetic Algorithm with the variables above. The Rough Set
approach selected four variables in each of two reducts: {environmental zone, slope,
water source, land use} and {slope, soil character, water source, land use}. The
corresponding decision systems are given in Tables 5.4 and 5.5.
These tables represent the exhaustive sets of rules produced. While it is clear that
several of the rules are so simple that they could easily be combined to produce a
smaller rule set overall, the exhaustive form is sufficient for comparative purposes here.
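A minimal sketch of a Genetic Algorithm search for rough-set reducts of the kind used here: a chromosome is a bit-mask over the condition attributes, and fitness rewards masks that keep objects with different decisions distinguishable while using few attributes. The decision table and the GA parameters below are toy assumptions, not the actual Oaxaca data or implementation.

```python
# Hedged sketch of GA-guided reduct search over a toy decision table.
import random

ATTRS = ["env_zone", "slope", "soil", "water_source", "land_use"]

# (condition attribute values..., decision) - a toy decision table.
TABLE = [
    (("Piedmont", "Flat", "Bedrock", "Spring", "Class III"), "Attacked"),
    (("Piedmont", "Shallow", "Bedrock", "Spring", "Class I"), "Not Attacked"),
    (("Alluvium", "Flat", "Alluvium", "Well", "Class I"), "Not Attacked"),
]

def consistent(mask):
    """True if objects with different decisions stay distinguishable
    using only the attributes selected by `mask` (a super-reduct)."""
    seen = {}
    for values, decision in TABLE:
        key = tuple(v for v, m in zip(values, mask) if m)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def fitness(mask):
    if not consistent(mask):
        return 0.0
    return 1.0 + (len(mask) - sum(mask)) / len(mask)  # prefer fewer attrs

def ga_search(generations=40, pop_size=20, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in ATTRS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(ATTRS))    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # bit-flip mutation
                i = rng.randrange(len(ATTRS))
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = ga_search()
subset = [a for a, m in zip(ATTRS, best) if m]
```

On the real data the chromosome is much longer and the fitness function is costlier, which is why a guided search is preferred over enumerating all attribute subsets.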
Our focus here is on the impact of using a technique, such as Rough
                 Decision Trees                      Rough Sets

Advantages       Easy to understand                  Very expressive
                                                     Modular knowledge
                                                     Good with missing data
                                                     They handle imprecise data

Disadvantages    May be difficult to use with        Can be memory intensive
                 continuous data                     Can be computationally
                 They look at simple                 intensive
                 combinations of attributes          Cannot easily handle large
                 They need to break numeric          data sets
                 fields into fixed ranges
                 Not very good with inexact data
                 Not flexible
                 No way to handle missing data
                 May have overfitting
                 Less accurate predictions

Table 5.3: A Comparison of Decision Trees versus Rough Sets
Sets, that is explicitly able to deal with uncertainty in the recognition decision. From
this standpoint there are two basic points of comparison. First, how many of the
rules identify a site for attack unambiguously, and what percentage of the rules that
select sites for attack do they comprise? Second, in those cases in which a rule
produces a split decision, we need to resolve the tie by other means. The
question is, how much effort must we spend to discover that we need to
consult another source to resolve the question?
In answer to the first question, explicitly dealing with uncertainty using the rough
set representation produced five rules that can identify sites for attack, as opposed to
just three rules in the decision tree approach. Of these five rules, three (7, 10
and 12 in Table 5.4, or 6, 9 and 11 in Table 5.5) result in unambiguous decisions.
That is, 60% of the rules that can conclude that a site can be attacked do so
unambiguously, whereas the other two need further clarification. The Decision Tree
approach produces 3 rules that can
Rules

1   env zone(High Alluvium) => Not Attacked
2   env zone(Middle Piedmont) => Not Attacked
3   env zone(Lower Piedmont) AND slope(Shallow) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
4   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
5   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
6   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
7   water source(Spring) AND land use(Class III 100%) => Attacked
8   water source(Spring) AND land use(Class I) => Not Attacked
9   slope(Shallow) AND land use(Class III 100%) => Not Attacked
10  slope(Flat) AND land use(Class III 100%) => Attacked
11  slope(Flat) AND land use(Class I) => Not Attacked
12  slope(Flat) AND water source(Spring) => Attacked
13  slope(Flat) AND water source(Arroyo) => Not Attacked
14  slope(Shallow) AND water source(Spring) => Not Attacked
15  water source(Well) => Not Attacked
16  water source(Main River) => Not Attacked

Table 5.4: First Exhaustive Decision System for the Rosario Phase in Etla Region
Rules

1   soil(Alluvium) => Not Attacked
2   slope(Shallow) AND soil(Bedrock) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
3   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
4   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
5   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
6   water source(Spring) AND land use(Class III 100%) => Attacked
7   water source(Spring) AND land use(Class I) => Not Attacked
8   slope(Shallow) AND land use(Class III 100%) => Not Attacked
9   slope(Flat) AND land use(Class III 100%) => Attacked
10  slope(Flat) AND land use(Class I) => Not Attacked
11  slope(Flat) AND water source(Spring) => Attacked
12  slope(Flat) AND water source(Arroyo) => Not Attacked
13  slope(Shallow) AND water source(Spring) => Not Attacked
14  water source(Well) => Not Attacked
15  water source(Main River) => Not Attacked

Table 5.5: Second Exhaustive Decision System for the Rosario Phase in Etla Region
conclude that a site can be attacked, with only one of them (rule 3) being conclusive.
Thus, only 33% of the rules that identify a site for attack are conclusive, as opposed
to 60% for the Rough Set approach. By taking data uncertainty into account, the
Rough Set approach not only produced more rules for the identification of the target
concept, but also a higher percentage of unambiguous ones.
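The tallies above can be verified mechanically from Table 5.4; the dictionary below simply transcribes each rule's possible decisions.

```python
# Transcription of the decision classes of the 16 rules in Table 5.4,
# used to count attack-concluding and unambiguous rules.
RULE_DECISIONS = {
    1: {"Not Attacked"}, 2: {"Not Attacked"},
    3: {"Not Attacked", "Attacked"}, 4: {"Not Attacked"},
    5: {"Not Attacked", "Attacked"}, 6: {"Not Attacked"},
    7: {"Attacked"}, 8: {"Not Attacked"}, 9: {"Not Attacked"},
    10: {"Attacked"}, 11: {"Not Attacked"}, 12: {"Attacked"},
    13: {"Not Attacked"}, 14: {"Not Attacked"}, 15: {"Not Attacked"},
    16: {"Not Attacked"},
}

# Rules that can conclude "Attacked", and those that do so unambiguously.
can_attack = [r for r, d in RULE_DECISIONS.items() if "Attacked" in d]
unambiguous = [r for r in can_attack if RULE_DECISIONS[r] == {"Attacked"}]
```

Rules 3, 5, 7, 10 and 12 can conclude an attack, and three of them (7, 10, 12) are unambiguous, giving the proportion discussed in the text.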
The other question concerns the relative amount of effort expended to produce
an uncertain conclusion. In the decision system produced using Rough Sets, the
inconclusive rules have fewer conditions to be checked than those from the Decision
Tree approach. Specifically, the inconclusive Rough Set rules have 4 and 2 conditions
respectively, for a total of 6 conditions, one of which is shared between them (land
use type = Class I). In the Decision Tree system, 8 conditions must be checked in
each of the two inconclusive rules, for a total of 16. However, each rule shares the
same 8 conditions, so the total number of unique conditions to be tested is 8, as
opposed to 5 for the Rough
Set approach. More effort must then be expended to check the inconclusive
rules in the Decision Tree approach than for Rough Sets.
Since both approaches extracted the same set of condition variables, the differences
are likely to reflect the impact that