
IMPACT OF HEURISTIC KNOWLEDGE DISCOVERY TECHNIQUES ON MULTI-AGENT SIMULATION OF CULTURAL EVOLUTION

    by

    ALINA LAZAR

    DISSERTATION

    Submitted to the Graduate School

    of Wayne State University,

    Detroit, Michigan

    in partial fulfillment of the requirements

    for the degree of

    DOCTOR OF PHILOSOPHY

    2002

    MAJOR: COMPUTER SCIENCE

    Approved by:

    Advisor Date

© COPYRIGHT BY

    ALINA LAZAR

    2002

    All Rights Reserved

DEDICATION

    To Mircea and to my parents.


ACKNOWLEDGMENTS

    I would like to offer my thanks to my advisor Professor Robert Reynolds for his

    continuing support and guidance. It has been a privilege to have him as professor as

    well as thesis advisor.

Most of this research was supported by NSF IIS Grant #9907257.

    Appreciation goes to all my dissertation committee members: Dr. William Grosky,

    Dr. George Yin and Dr. Sherali Zeadally for accepting this task.

    Thanks to the Department of Computer Science, Wayne State University for giv-

    ing me the opportunity to study here and make this work possible.

    Finally, I wish to express my grateful acknowledgments to the professors in the

Department of Mathematics and Computer Science, West University of Timisoara.

    They taught me the background in computer science and mathematics during my

    undergraduate studies and encouraged me to pursue a doctoral degree.


TABLE OF CONTENTS

    Chapter Page

    DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

    ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

    LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

    LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

    CHAPTERS

CHAPTER - 1 INTRODUCTION . . . . . . . . . . . . . . . . . 1

    SECTION - 1.1 Simulating Culture in Algorithms . . . . . . . . 1

    SECTION - 1.2 Archaeological Knowledge Discovery Problem . 2

    SECTION - 1.3 Heuristics . . . . . . . . . . . . . . . . . . . . . 3

    SECTION - 1.4 Multi-agent Based Simulation . . . . . . . . . . 6

    CHAPTER - 2 EVOLUTIONARY LEARNING UNDER UNCER-

    TAINTY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    SECTION - 2.1 Uncertainty . . . . . . . . . . . . . . . . . . . . 8

    SUBSECTION - 2.1.1 Fuzzy Sets . . . . . . . . . . . . . . . . . 9

    SUBSECTION - 2.1.2 Rough Sets . . . . . . . . . . . . . . . . . 10

    SECTION - 2.2 Evolutionary Computation . . . . . . . . . . . . 10

    CHAPTER - 3 EXTRACTING PATTERNS FROM ARCHAE-

    OLOGICAL DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    SECTION - 3.1 Data Material . . . . . . . . . . . . . . . . . . . 13

    SECTION - 3.2 Large-Scale Data . . . . . . . . . . . . . . . . . 15


SECTION - 3.3 Uncertainty in Data . . . . . . . . . . . . . . . . 15

    SECTION - 3.4 Redundant Data . . . . . . . . . . . . . . . . . 17

    SECTION - 3.5 Dynamic Data . . . . . . . . . . . . . . . . . . . 18

    SECTION - 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . 18

    CHAPTER - 4 ROUGH SETS FORMALIZATION . . . . . . . 19

    SECTION - 4.1 Formal Definitions and Properties . . . . . . . . 19

    SECTION - 4.2 Discernibility Matrix . . . . . . . . . . . . . . . 21

    SECTION - 4.3 Indiscernibility Relations . . . . . . . . . . . . . 21

    SECTION - 4.4 Rough Sets . . . . . . . . . . . . . . . . . . . . 22

    SECTION - 4.5 Reducts and the Core of Knowledge . . . . . . . 23

    SECTION - 4.6 Reducts, Hitting Sets, Approximate Hitting Sets 25

    SECTION - 4.7 Decision Rules, Decision Systems . . . . . . . . 26

    SECTION - 4.8 Decision System Construction . . . . . . . . . . 27

    SECTION - 4.9 Steps for Model Construction . . . . . . . . . . 30

    CHAPTER - 5 EVOLUTION-BASED LEARNING OF ONTO-

    LOGICAL KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT

    SIMULATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    SECTION - 5.1 Introduction . . . . . . . . . . . . . . . . . . . . 33

    SECTION - 5.2 Decision Tree . . . . . . . . . . . . . . . . . . . 37

    SECTION - 5.3 Decision Trees and Rough Set Rules . . . . . . . 40

    SECTION - 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . 45

    SECTION - 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . 47

    CHAPTER - 6 RULE SET GENERATION FOR ALL REGIONS

    AND PERIODS OF THE VALLEY . . . . . . . . . . . . . . . . . . . 49

    SECTION - 6.1 Introduction . . . . . . . . . . . . . . . . . . . . 49

    SECTION - 6.2 Reducts and Their Rules for the Tierras Largas

    Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


SECTION - 6.3 San Jose Phase Reducts and Related Rules . . . 54

    SECTION - 6.4 Reducts for the Guadalupe Phase and Related

    Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    SECTION - 6.5 Reducts and related rules for the Rosario Phase 55

    SECTION - 6.6 Reducts and Related Rules for Monte Alban Ia 58

    SECTION - 6.7 Reducts and Related Rules of Monte Alban Late I 64

    SECTION - 6.8 Monte Alban II Reducts and Related Rules . . . 67

    SECTION - 6.9 Monte Alban IIIa reducts and Related Rules . . 71

    SECTION - 6.10 Comparison of the Rough Set Results with Deci-

    sion Trees Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    SECTION - 6.11 Conclusions . . . . . . . . . . . . . . . . . . . . 77

    CHAPTER - 7 PROTOTYPE OF THE MODEL . . . . . . . . 78

    SECTION - 7.1 Introduction . . . . . . . . . . . . . . . . . . . . 78

    SECTION - 7.2 Modeling the Environment in the Valley of Oaxaca 82

    SECTION - 7.3 Modeling the Impact of Changing Social Rela-

    tions on Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

    SECTION - 7.4 Acquiring the Rules Used to Constrain Agent Be-

    haviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

    SECTION - 7.5 Modeling Agent Interaction Using Evolutionary

    Game Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    SECTION - 7.6 Conclusions . . . . . . . . . . . . . . . . . . . . 91

    CHAPTER - 8 THE IMPACT OF RULE ONTOLOGIES ON

    MODEL PERFORMANCE . . . . . . . . . . . . . . . . . . . . . . . 92

    SECTION - 8.1 Introduction . . . . . . . . . . . . . . . . . . . . 92

    SECTION - 8.2 Experimental Setup . . . . . . . . . . . . . . . . 93

    SECTION - 8.3 Simulating Social Evolution from Tierras Largas

    Phase into San Jose Phase . . . . . . . . . . . . . . . . . . . . . . . . 93


SECTION - 8.4 San Jose to Guadalupe Results . . . . . . . . . 101

    SECTION - 8.5 Simulating from Guadalupe into Rosario . . . . 109

    SECTION - 8.6 Simulation of Rosario into Monte Alban I . . . . 110

    SECTION - 8.7 Simulating social evolution from Monte Alban Ia

    to Monte Alban Ib . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    SECTION - 8.8 Simulation from Monte Alban Ib to Monte Alban II125

    SECTION - 8.9 Conclusions . . . . . . . . . . . . . . . . . . . . 133

    CHAPTER - 9 CONCLUSION AND FUTURE RESEARCH . . 138

    SECTION - 9.1 Summary and Conclusions . . . . . . . . . . . . 138

    SECTION - 9.2 Future Work . . . . . . . . . . . . . . . . . . . . 140

APPENDICES

    APPENDIX - A DECISION TREE RULES . . . . . . . . . . . 141

    APPENDIX - B ROUGH SET REDUCTS . . . . . . . . . . . . 146

    APPENDIX - C ROUGH SET RULES . . . . . . . . . . . . . . 151

    BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    AUTOBIOGRAPHICAL STATEMENT . . . . . . . . . . . . . . . . . . . 176


LIST OF TABLES

    4.1 Example of Decision Table . . . . . . . . . . . . . . . . . . . . . . . . 19

    4.2 Full Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 21

    4.3 Core Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . 24

    4.4 Two Minimal Discernibility Matrix . . . . . . . . . . . . . . . . . . . 25

    4.5 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 28

    4.6 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 29

    5.1 Number of Terminal Nodes for each Phase . . . . . . . . . . . . . . . 38

    5.2 Decision Rule Set Induced from the Decision Tree . . . . . . . . . . . 40

    5.3 A Comparison of Decision Trees versus Rough Sets . . . . . . . . . . 42

    5.4 First Exhaustive Decision System for the Rosario Phase in Etla Region 43

    5.5 Second Exhaustive Decision System for the Rosario Phase in Etla Region 44

    5.6 A Comparison of the Rules Produced by Using Decision Trees (DT)

    and Rough Set (RS) Constraint Representations. . . . . . . . . . . . 47

    6.1 Reducts for Each of the First Three Phases in the Etla Region . . . . 52

    6.2 Reducts for Each of the Last Five Phases in the Etla Region . . . . . 52

    6.3 Reducts for Each of the Eight Phases in the Central Valley Region . . 52

    6.4 Reducts for Each of the Eight Phases in the Valle Grande Region . . 53

    6.5 Reducts for Each of the Eight Phases in the Tlacolula Region . . . . 53

    6.6 Reducts for Each of the Eight Phases in the Ocatlan Region . . . . . 53

    6.7 Comparison between DT and RS in Terms of Rules Length . . . . . 75

    6.8 Comparison between DT and RS in Terms of Average Rules Length . 75


6.9 Sensitivity and Specificity for DT and RS . . . . . . . . . . . . . . 76

    7.1 Environmental Variables . . . . . . . . . . . . . . . . . . . . . . . . . 85

    7.2 Land Use Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

    7.3 Payoff Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90


LIST OF FIGURES

    1.1 Knowledge Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    3.1 Complexity of the Data . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    3.2 A Description of the Oaxaca Valley . . . . . . . . . . . . . . . . . . . 14

    3.3 Visual Representation of Rough Sets . . . . . . . . . . . . . . . . . . 16

    3.4 Visual Representation of Fuzzy Sets . . . . . . . . . . . . . . . . . . . 17

    4.1 Example of Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 23

    4.2 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    4.3 Model Construction Phases . . . . . . . . . . . . . . . . . . . . . . . 32

    5.1 The decision tree for the locations of sites with evidence of raiding in

    Rosario phase in the Etla region of the valley . . . . . . . . . . . . . . 39

    7.1 Oaxaca Valley . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    7.2 Catchment Area for Sites in The Subgrid 7 . . . . . . . . . . . . . . . 84

    8.1 Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . . . 94

    8.2 DT Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 95

    8.3 RS Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 96

    8.4 Real Sites; Phase San Jose . . . . . . . . . . . . . . . . . . . . . . . . 97

    8.5 Relationships Network; Phase San Jose . . . . . . . . . . . . . . . . . 98

    8.6 DT Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 99

    8.7 RS Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 100

    8.8 Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . 102

    8.9 DT Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . 103


8.10 RS Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . 104

    8.11 Real Sites; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . . . . 105

    8.12 Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . . . 106

    8.13 DT Relationships Network; Phase Guadalupe . . . . . . . . . . . . . 107

    8.14 RS Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . 108

    8.15 Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . 110

    8.16 DT Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . 111

    8.17 RS Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . 112

    8.18 Real Sites; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . . . . 113

    8.19 Relationships Network; Phase Rosario . . . . . . . . . . . . . . . . . . 114

    8.20 DT Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 115

    8.21 RS Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 116

    8.22 Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . . . 118

    8.23 DT Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 119

    8.24 RS Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 120

    8.25 Real Sites; Phase Monte Alban Early I . . . . . . . . . . . . . . . . . 121

    8.26 Relationship Network; Phase Monte Alban Early I . . . . . . . . . . . 122

    8.27 DT Relationship Network; Phase Monte Alban Early I . . . . . . . . 123

    8.28 RS Relationship Network; Phase Monte Alban Early I . . . . . . . . . 124

    8.29 Relationship Network; Phase Monte Alban Late I . . . . . . . . . . . 126

    8.30 DT Relationship Network - Big Sites; Phase Monte Alban Early I . . 127

    8.31 RS Relationship Network - Big Sites; Phase Monte Alban Early I . . 128

    8.32 DT Relationship Network - Monte Alban; Phase Monte Alban Early I 129

    8.33 RS Relationship Network - Monte Alban; Phase Monte Alban Early I 130

    8.34 DT Relationship Network - Big Sites; Phase Monte Alban Late I . . . 131

    8.35 RS Relationship Network - Big Sites; Phase Monte Alban Late I . . . 132

    8.36 DT Relationship Network - Monte Alban; Phase Monte Alban II . . . 133


8.37 RS Relationship Network - Monte Alban; Phase Monte Alban II . . . 134

    8.38 DT Relationship Network - Big Sites; Phase Monte Alban II . . . . . 135

    8.39 RS Relationship Network - Big Sites; Phase Monte Alban II . . . . . 136



    CHAPTER 1

INTRODUCTION

    1.1 Simulating Culture in Algorithms

    Culture is a complex system of general customs, ideas, values and beliefs, of a

    particular civilization, society or social group, especially at a particular time. In

    human societies, culture can be viewed as a system which stores information generated

    by groups of one or more individuals, in a symbolic form that is potentially accessible

    to all the members of the society. This model of a cultural system is helpful in

guiding the problem-solving activities and the social interactions of the society's

    members. Knowledge such as general customs, ideas, values, and beliefs are stored

    and manipulated at a symbolic level as opposed to a genetic level. This knowledge

    can be used to direct and control the self-adaptation and the evolution of the social

population. Thus, cultural systems are viewed as multiple-inheritance systems where,

    at each time step, knowledge at both the population level and the symbolic or belief

    level is transmitted to the next generation.

Cultural Algorithms were developed by Reynolds [37] in order to model evolution in systems based upon principles of human social evolution. Cultural Algorithms are population-based algorithms that provide a framework for the integration of an evolution-based search engine (e.g., a genetic algorithm) with a knowledge repository or

    belief space. The goal of the coevolutionary search engine is to identify individuals in

    a population relevant to the problem at hand. Next, relevant knowledge is extracted

    from these individuals and reasoned about in the belief space. This knowledge is then

    used to guide the individuals in the next time step.

    Basic application areas to which cultural algorithms have been applied include re-

    source scheduling, evolution of agriculture, concept learning applications, real-valued


    function optimization, genetic programming, fraud detection and recently data min-

    ing and knowledge discovery, and modeling dynamic environments.

    1.2 Archaeological Knowledge Discovery Problem

    Anthropologists interested in ancient societies of Highland Mesoamerica, Valley

of Oaxaca, have used intensive archaeological survey in order to study state formation. Since these archaeological surveys began in the 1960s, the computer has been an essential tool because of the large quantity of data resulting from the surveys.

    After the data was collected, it was placed on punch cards and the additional results

    published in several books [4, 5, 19] along with extensive site maps. The reason behind

    this archaeological survey was to find answers to the following questions: What were

    the characteristics of Mesoamerican agricultural systems? What role did hydraulic

    agriculture play in prompting or facilitating the growth of large population centers?

    When was irrigation first introduced? What was the nature of these population

    centers? When and where did urbanism first arise? What decision making structures

    and adaptations were necessary to facilitate these changes? [5].

    Our goal for the proposed research is to integrate evolutionary learning tools into

    the knowledge discovery process and to apply them to the large-scale, archaeological

    spatial-temporal data produced by the surveys. This heuristic based approach used

    here will employ rough set concepts in order to represent the domain knowledge and

    the hypotheses.

    While answers to the questions above can possibly be found by investigating the

    large-scale database resulting from the archaeological survey, this database contains

    over 2700 regional sites and over 2000 residential sites at the Monte Alban urban

    center. Each site is comprised of one or more components and can be occupied

in one or more archaeological periods, spanning a period from approximately 9000 B.C. to 1500 A.D. Thus, the total spatial and temporal scope is so vast as to make manual interpretation a difficult if not impossible task. In addition, each temporal and

    spatial instance of a site component can be described in terms of several hundred

    variables of differing types. We can clearly see a gap between data generation and

    data understanding here. Tools and techniques from Artificial Intelligence can be

    used to fill this gap and aid in the extraction of emergent patterns hidden in the

    data, as is shown by Reynolds [35, 38, 39].

    Classical tools for generating database queries are good when you know what to

look for, and can answer questions of the type "How many sites have ceramics from a given time period, and where are their locations?" The goal here is to interrogate this data

    in order to answer questions about archaic systems, systems that existed thousands

    of years ago. To answer these questions we can use a data driven search. Each record

    in the data set is accessed sequentially and associated with a node in the search

space. Data-driven searches minimize data accesses and can result in time-efficient query solving, but they are not good for complicated questions, like the ones above, questions that require reasoning about spatial and temporal patterns. They may

    require exhaustive search in the worst case. To be able to deal with such advanced

    queries, more intelligent search tools are needed. The problem now is how to find the

    right intelligent search strategy in order to reach our goal.

    1.3 Heuristics

    Uninformed or blind search, which processes and evaluates all nodes of a search

space in the worst case, is not realistic here because the time constraints are closely related to the dimension of the data. Generally, the search space increases exponen-

    tially with problem size thereby limiting the size of problems which can realistically

    be solved using exact techniques such as exhaustive search. An alternative solution

    is represented by heuristic techniques, which can provide much help in areas where

    classical search methods failed.


The word heuristic comes from Greek and means "to know, to find, to discover or to guide an investigation." Specifically, "heuristics are techniques which

    seek good (near-optimal) solutions at a reasonable computational cost without being

    able to guarantee either feasibility or optimality, or even in many cases to state how

close to optimality a particular feasible solution is" [42].

Heuristic refers to any technique that improves the average-case performance on a problem-solving task but does not necessarily improve the worst-case performance.

    Heuristic techniques search the problem space intelligently using knowledge of pre-

    viously tried solutions to guide the search into fruitful areas of the search space.

    Often, search spaces are so large that only heuristic search can produce a solution in

    reasonable time. These techniques improve the efficiency of a search process, some-

    times by sacrificing the completeness or the optimality of the solution. Heuristics are

    estimates of the distance remaining to the goal, estimates computed based on the

    domain knowledge.

    The domain knowledge provides help to heuristics in guiding the search and can

    be represented in a variety of knowledge formats. These formats include patterns,

    networks, trees, graphs, version spaces, rule sets, equations, and contingency tables.

With regard to heuristics there are a number of generic approaches such as greedy search, A* search, tabu search, simulated annealing, and population-based heuristics. The

    heuristic methods can be applied to a wide class of problems in optimization, classi-

    fication, statistics, recognition, planning and design.

    Of special interest is the integration of heuristic search principles with the dynamic

    processes in which data becomes available in successive stages, or where data and

inputs are subject to uncertainty, or with large-scale data sets. The integration is a vehicle to generate data-driven hypotheses. The process is shown in figure 1.1. Our goal is to generate hypotheses in terms of the archaeological data in order to help anthropologists answer their questions.


[Figure 1.1 depicts the dual interaction between data and model hypotheses, linking heuristic learning, the generating and testing of patterns, and acceptable solutions.]

Figure 1.1: Knowledge Discovery

    The kind of knowledge produced, and the heuristic search algorithm selected, will

    reflect the nature of the data analysis task. In this thesis, the hypotheses will be

    represented as sets of decision rules and the extracted rules will be represented in

    terms of rough sets. Rough sets were selected because of the nature of our data sets

    as we will discuss later, in chapter 3.

From a mathematical point of view, the problems that we want to solve can be formulated in terms of the well-known minimal set cover problem, which is a

    combinatorial optimization problem.

    Traditional methods for combinatorial optimization problems are not appropriate

here for several reasons. The underlying problems are NP-hard in the worst case, so exact methods would be costly to use given the size of the data set. Also, since archaeological efforts in the

    valley are ongoing, new sites can be added to the database which would require the

    traditional combinatorial approach to be restarted from scratch. The techniques used

    to solve these difficult optimization problems have slowly evolved from constructive

methods, like uninformed search, to local search techniques and to population-based

    algorithms.

    Cultural Algorithms, as population-based algorithms, are good vehicles in which

to build meta-level heuristics to guide the search more efficiently. That knowledge (here we will use rough set concepts, or rules) can be employed to direct the evolu-

    tionary search. The rules can reflect spatial and temporal patterns that will guide the

    generation of new candidate search objects by the evolutionary engine. The spatial

    and temporal continuity of the data will facilitate this process.

    1.4 Multi-agent Based Simulation

After obtaining decision rules through evolutionary techniques, we want to use them in a multi-agent model based upon Cultural Algorithms that simulates the evolution of complex social systems in the Valley of Oaxaca, Mexico. Here, we operationalize the "evolution without stages" model of state evolution described by Marcus and Flannery [23]. Agents are based at sites and have plans for production and resource exchange with other agents. Agents interact via an evolving game matrix that begins as a prisoner's dilemma model and can evolve to a warden's dilemma model.

    Cultural Algorithms are used to model agent learning as the environment evolves.

The organization of the thesis is as follows. Chapter 2 briefly describes evolutionary learning and gives a short literature review of related approaches. Chapter

    3 presents the archeological data sets and the data related problems. Chapter 4 is

    dedicated to the theory of Rough Sets. In Chapter 5 we make a comparison between

    the Decision Tree approach and Rough Set approach in building a set of rules for a

    multi-agent simulation. Chapter 6 presents the Rough Set rules for all five regions of

    the Valley using the environmental variables, including land use type. While the site

    data identified just 75 sites as targets of warfare, both decision trees and Rough Sets

    generalized on this data to identify 987 and 242 sites respectively that are predicted

    to be targets of warfare based upon their similarities to the known sites. Chapter 7

    describes the large-scale multi-agent simulation model. Chapter 8 describes the runs

    of the model using the Rough Set rules and the Decision Tree rules to see the semantic


impact of the two different rule sets. The resultant set of associations between sites

    in terms of trade and warfare demonstrated that the Rough Set rules produced a hi-

    erarchical structure of associations that was a better fit to the hierarchical structure

    observed in the data than that for Decision Trees. It is suggested that lack of consid-

    eration of uncertainty in the generation of decision trees led to an overgeneralization

    of the rules that describe warfare activities. Chapter 9 concludes with a discussion

of the impact that the rules used to determine agent interaction and expertise have on the emergence of phenomena in multi-agent systems, and the impact that the rule learning process has on the rules formed. It is suggested that, based upon this model, warfare is an important device for the structuring of emergent complexity in the social organization of the valley.


    CHAPTER 2

    EVOLUTIONARY LEARNING UNDER UNCERTAINTY

    Population-based heuristic methods are iterative solution techniques that handle

    a population of individuals which are evolving according to a given search strat-

    egy. At each iteration, periods of self-adaptation (mutations) alternate with periods

    of cooperation (crossover), and periods of competition (selection). The population-

    based heuristic search [8] is dependent of the following components: the knowledge

    representation for the specific problem we want to solve and the search strategy or

    the evolution process. The adaptability of an individual represents its ability to

    survive in an uncertain environment [7]. Artificial Intelligence researchers have ex-

    plored different ways to represent uncertainty [42]: belief networks, default reasoning,

    Dempster-Shafer theory, Fuzzy Set theory, Rough Set theory.

    For the problems we want to solve, the learning task will require a representation

    that explicitly deals with uncertainty. The evolutionary learning methods that are

    employed must be able to work with such a representation. In this chapter we look

first at basic ways to represent uncertainty in developing rules. Then we will investigate how that uncertain knowledge can be used to direct evolutionary search

    and learning.

    2.1 Uncertainty

    Uncertainty, as well as evolution, is a part of nature. When humans describe

    complex environments, they use linguistic descriptors of cognised real-world circum-

stances that are often not precise, but rather fuzzy. The theory of fuzzy sets [55] provides an effective method of describing the behavior of a system which is too complex to be handled with classical precise mathematical analysis. The theory


    of rough sets [33] emerged as another mathematical approach for dealing with uncer-

    tainty that arises from inexact, noisy or incomplete information. Fuzzy Set theory

    assumes that the membership of the objects in some set is defined as a degree rang-

    ing over the interval [0, 1]. Rough set theory focuses on the ambiguity caused by the

    limited distinction between objects in a given domain.

    2.1.1 Fuzzy Sets

    Fuzzy Sets have been employed to represent rules generated by evolutionary learn-

    ing systems.

    Using fuzzy concepts, Valenzuela-Rendon [50] tried to overcome the limitations

    of the conventional rule-based classifier system [14] when representing continuous

    variables. He used fuzzy logic to represent the results of the genetic-based search of

    the classifier system.

    Likewise, fuzzy functions have been used to describe and update knowledge in

    Cultural Algorithms. First, Reynolds et al. [40] employed a fuzzy acceptance and

    influence function in the solution of real-valued constrained optimization problems.

    Following the same idea Zhu designed a fully fuzzy Cultural Algorithm [57] which

    included a fuzzy knowledge representation scheme in order to deal with the continuous

    variables [56] in the belief space, as well as a fuzzy acceptance and influence function.

    All these approaches were tested on real-values function optimization problems. More

    recently, Jin [16] used a fuzzy knowledge representation for normative knowledge in

    the belief space of Cultural Algorithms, to solve the real-valued constrained function

    optimization.

The design of a fuzzy representation system is not an easy job, because the membership functions should be carefully chosen and the procedures that use these functions should be specified precisely. The problem is to optimize the fuzzy membership

    functions for a problem and to find optimum plans related to the fuzzy performance

measures. It is a natural approach to use heuristics (e.g., evolutionary algorithms) to

    solve this task.

    2.1.2 Rough Sets

    Another approach to represent uncertainty is with Rough Sets. Rough Sets are

    based on equivalence relations and set approximations, and the algorithms for com-

puting rough set properties are combinatorial in nature. Wroblewski [53] implemented a Genetic Algorithm for computing reducts, based on a permutation encoding, as well as a

    greedy algorithm. Another approach for building reducts is described by Vinterbo

    in [51, 52] and it is based on the set cover problem, in particular on finding minimal

    hitting sets using a classical Genetic Algorithm. Finding a minimal set of decision

rules or a satisfactory set is an NP-complete problem. Agotnes [1, 2] used a Genetic Algorithm to build an optimal set of decision rules, where the fitness function was

    based on the quality of each rule. In conclusion, there are many hybrid methods that

    integrate evolutionary algorithms and other methods from soft computing, methods

    such as Rough Sets.

    2.2 Evolutionary Computation

Evolution can be defined in a few words: adaptation in an uncertain environment. Nature has a robust way of dealing with the adaptation of organisms to all kinds of changes and of evolving successful organisms. According to the principles of natural

    selection, the organisms that have a good performance in a given environment, sur-

    vive and reproduce, whereas the others die off. After reproduction, a new generation

    of offspring, derived from the members of the previous generation is formed. The

    selection of parents from these offspring is often based upon fitness. Changes in the

environment will affect the population of organisms through random mutations. Mayr said that "evolution is a dynamic, two-step process of random variation and selection" [11]. Using examples from natural systems and theories of adaptive behavior

    researchers have been trying to build heuristic evolutionary learning systems.

Evolutionary algorithms are heuristic optimization methods inspired by natural evolution processes. Currently there are three basic population-only mechanisms that model evolution: genetic algorithms, evolutionary strategies and evolutionary programming. Each of the methods models the evolution of a population of individuals at a different scale and applies selection and reproduction operators to find an individual that is fit with regard to the fitness function. The Genetic Algorithm models evolution at the gene scale, while evolutionary strategies and evolutionary programming model evolution at the species level.

The Cultural Algorithms [37] approach adds another level to the evolutionary process, inspired by human societies and cultural evolution. It adds a belief space to the population space. The belief space will be a collection of symbolic

    knowledge that will be used to guide the evolution of the population.
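To make the interaction between the population space and the belief space concrete, the following Python sketch runs a minimal Cultural Algorithm on a toy real-valued objective. The objective function, the interval-based normative knowledge, and the simple acceptance and influence rules are illustrative assumptions only, not the representation developed later in this thesis.

    import random

    def fitness(x):
        # Toy objective (an assumption for illustration): maximize -sum(x_i^2).
        return -sum(v * v for v in x)

    DIM, POP_SIZE, GENERATIONS = 5, 30, 50
    population = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP_SIZE)]
    # Belief space: normative knowledge, i.e. one promising interval per dimension.
    belief = [(-5.0, 5.0)] * DIM

    for _ in range(GENERATIONS):
        # Acceptance function: the top 20% of the population updates the belief space.
        elites = sorted(population, key=fitness, reverse=True)[: POP_SIZE // 5]
        belief = [(min(e[i] for e in elites), max(e[i] for e in elites)) for i in range(DIM)]
        # Influence function: variation is biased toward the normative intervals.
        offspring = []
        for parent in population:
            child = []
            for i, v in enumerate(parent):
                lo, hi = belief[i]
                width = (hi - lo) or 1e-6
                child.append(v + random.gauss(0, width / 2) if lo <= v <= hi
                             else random.uniform(lo, hi))
            offspring.append(child)
        # Selection: keep the best individuals from parents and offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:POP_SIZE]

    print("best fitness:", max(map(fitness, population)))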

    These evolutionary learning approaches have been used to perform the general

    rule extraction task that we are interested in. For example, Holland [15] proposed the

learning classifier system that uses genetic search as the primary discovery heuristic for

    rules. Classifier systems are inductive models capable of learning, in a reinforcement

    learning regime, by using the evolutionary process schema. The reward is given to

    individual rules not to a complete rule set.

    Besides the rule based methods, Decision Trees are well known for their inductive

    learning capabilities. Any decision tree can be reformulated as a set of rules. One

of the problems related to Decision Trees is finding the smallest decision tree; simple greedy heuristics are typically used to approximate it. Researchers have tried to integrate Genetic

    Algorithms with Decision Tree learning in order to solve complex classification prob-

lems [3, 38, 39]. Bala et al. [3, 49] applied the above methodology to difficult visual

    recognition problems involving satellite and facial image data. Other researchers com-


    bined the Genetic Algorithms or Evolutionary Strategies with neural networks. Yang

et al. [54] used a Genetic Algorithm for the feature subset selection problem for neural

    network classifiers.

    Reynolds et al. [35, 38, 39] investigated the use of Cultural Algorithms to guide

    decision tree learning. The data was taken from a real world archeological database,

    with a collection of sites found in Valley of Oaxaca, Mexico. The problem was to

identify the sites that present evidence of warfare as opposed to those that did not.

    Cultural Algorithms were also used to generate rule-based classifiers for the fraud

    detection problem [44]. The task is hard because of the dynamic environments, but

    the Cultural Algorithms provide self-adaptive capabilities to deal with the dynamic

    changes.


    CHAPTER 3

    EXTRACTING PATTERNS FROM ARCHAEOLOGICAL DATA

Here we will work with the archaeological data from [19], a survey of the Valley of Oaxaca, Mexico. These volumes describe and analyze the data collected during the Valley of Oaxaca Settlement Project. The project consisted of a detailed mapping and surface collection of the region's sites plus the major archaeological site, Monte Alban, part of the Oaxaca valley, as shown in figure 3.1.

[Figure 3.1 depicts the hierarchical structure of the data: the Oaxaca Valley is divided into regions (Etla, Central Valley, Valle Grande, Tlacolula, Ocatlan), regions into squares, squares into sites, and sites into components.]

Figure 3.1: Complexity of the Data

    3.1 Data Material

Figure 3.2: A Description of the Oaxaca Valley

In the Valley of Oaxaca the basic descriptive unit is the site, of which there are around 2700 in an area of about 1700 km2. The detailed description for each terrace contains 100 attributes. This data format was obtained after scanning the data from

    [19]. First, a series of maps were drawn that show Valley of Oaxaca occupation at

    various periods of time by plotting those categories specific to each period. Based

    on these maps, population estimates were made for each period. Second, patterns

    of differential distribution of ceramic categories were illustrated, within the site by

    plotting their densities of occurrence by site division. Little statistical analysis [5],

    [19] or data analysis was done previously. In our project the terrace data and the

    ceramic categories are put together for the first time in order to discover the reasons

    for growth and decline of Valley of Oaxaca and Monte Alban over time.

    3.2 Large-Scale Data

    One of the most important problems in data analysis relates to the dimensionality

    of the data, because many data analysis techniques involve exhaustive search over

    the object space. They are very sensitive to the size of the data in terms of time

    complexity and it is hard to generate compact rules. The solution is to reduce the

    search space horizontally (in terms of records or objects) and vertically (in terms of

    fields or attributes or variables), and to use heuristics to guide the search through the

large space of possible combinations of attribute values and classes. Our data set,

    for example, contains over 6000 records and over 100 attributes.

    3.3 Uncertainty in Data

    Uncertainty in a data set can appear for different reasons. One reason is noise.

Errors which can occur during data collection or data entry are referred to as noise in the data. It is also possible that the data set can have missing attribute values. In this case, the objects containing missing attribute values can be discarded or the

    missing values can be replaced with the most common values. Another problem is

    that the available knowledge in many situations is incomplete and imprecise. This


    means that sometimes the attribute values for a set of objects are not sufficient and

precise enough to differentiate between classes of objects. When we are talking about the Oaxaca Data Set, errors and noise may have occurred for many reasons. The ancient sites are damaged because of plowing, erosion, pot hunting and grazing. Also,

    human perception is subjective, and many people worked on the collection of the

    data. Some errors are possible due to the scanning process since much of the data

    was available from printed text only.

    Many different ways of representing and reasoning about uncertainty have been

developed in Artificial Intelligence. These theories include: belief networks, non-

    monotonic logic, Fuzzy Sets along with Fuzzy Logic and Rough Sets. The well-known

    Fuzzy Set theory [55] characterizes a concept approximately using a set membership

    function with a range of values between 0 and 1. Another approach based on the rough

    set theory [33] provides a lower and upper approximation in terms of set belonging

    of a concept depending on how the relationship between two partitions of a finite

    universe is defined.

    Figure 3.3: Visual Representation of Rough Sets

    Fuzzy Sets are good approaches for problems with multiple membership grade

    requirements (figure 3.4), where judgment on set membership grades is possible and


    Figure 3.4: Visual Representation of Fuzzy Sets

    where the ability to deal with vague predicates is required. They are very good for

real-valued data. On the other hand, Rough Sets, with their three-valued simplicity (lower, upper, and boundary approximation sets, figure 3.3), work well on discrete and

    categorical data. Rough Sets can be useful even with missing data, changes of scale,

    and problems where membership grades are hard to define, and problems requiring

    changes in the partition. Checking the attributes table for the Oaxaca Data Set we

    can see that out of all attributes only very few attributes are integer in type with

    no real-valued data types. All the other attributes are of the categorical data type,

nominal, ordinal, or binary. We want to find which sites were occupied in each period of time, so we have to deal with data partitions. These facts suggest that rough set methods are more appropriate here.

    3.4 Redundant Data

    In addition to incomplete data, a data set may contain redundant or insignificant

attributes with respect to the problem, or variables that are obscure and have uninteresting relationships. This case might arise in several situations; for example, redundant attributes may result from combining relational data tables. A solution to this problem exists in the form of feature selection algorithms, such as the reduct computation in the rough set case. After discarding the irrelevant attributes we


    may have horizontal or object related redundancy. This can be solved by applying

    horizontal pruning methods, or merging identical objects.

    For our data set, looking in the attributes table, we can see first that not all of the

    attributes are relevant to our problem. Fields like Recorder, Published information,

    and Written comments are useful for detecting noise, but are less important for the

    overall analysis and can be removed after the preliminary stage. Secondly, we will

    apply the reduct algorithm in order to find a subset of significant attributes.

    3.5 Dynamic Data

    A fundamental characteristic of real world data sets is that they are dynamic,

    which means that their contents are often changing over time. There are two impor-

tant aspects of this problem. First, the run-time efficiency of the knowledge model

    becomes very important. Second, the knowledge model will no longer be static, but

    should have the capability of evolving as data changes over time. Our data sets change

    when new sites or ceramic types are discovered. One important characteristic of our

data sets is their temporal and spatial dimensions. The time periods span from 1400 B.C. to 500 A.D. Spatially speaking, we work with the Valley of Oaxaca data, covering 1700 km2. In order to find the changes between different periods of time our models have to change, and we will gain insight into what a dynamic data set is.

    3.6 Conclusion

    In this chapter a motivation for explicitly representing uncertainty in our knowl-

edge base was presented. The advantages of a Rough Set approach for representing uncertain knowledge were outlined. Then, a methodological framework for applying rough sets to this data set was discussed. In the next chapter we describe the formal

    properties of rough sets.


    CHAPTER 4

    ROUGH SETS FORMALIZATION

    Pawlak [33] introduced Rough Set theory in the early 1980s as a tool for rep-

    resenting imprecise or uncertain information, and for reasoning about it. Based on

the notion of indiscernibility, rough set theory deals with the approximation of sets,

    using equivalence relations. These approximations can form model hypotheses. Many

    different applications can be found in the literature, but here we focus on the appli-

    cations to the classification problem, in order to answer the questions posed in the

    first chapter.

    4.1 Formal Definitions and Properties

      c1  c2  c3  c4  c5   d
o1     3   2   2   0   2   0
o2     1   3   3   1   2   0
o3     0   1   0   1   2   0
o4     2   1   3   0   0   0
o5     3   2   2   0   1   0
o6     1   1   2   0   2   0
o7     3   0   0   3   2   0
o8     0   0   3   2   1   0
o9     2   1   1   1   2   0
o10    0   3   2   0   0   0
o11    2   0   2   1   2   1
o12    1   3   2   2   2   1
o13    0   1   1   0   2   1
o14    3   1   0   1   2   1
o15    3   3   0   2   2   1
o16    1   2   3   2   0   1
o17    2   1   3   2   2   1
o18    1   2   3   1   0   1
o19    2   2   1   1   2   1
o20    2   1   1   1   2   1

Table 4.1: Example of Decision Table

    An information system can be defined as a pair S = (U,A), where U is a finite

set of objects and A is a finite set of attributes. Each attribute a ∈ A is a function that maps elements of U into a set Va, called the attribute domain of attribute a:

a : U → Va    (4.1)

Let S = (U,A) be an information system and let C,D ⊆ A be two subsets of attributes, called the condition and the decision attributes respectively. A condition attribute is an attribute that is thought to influence another attribute, the decision attribute. An information system with distinguished condition and decision attributes is called a decision table and is denoted by T = (U,A,C,D). Because a table with more than one decision attribute can be easily transformed into a similar table with only one decision attribute, usually the set of decision attributes contains only one decision attribute, denoted D = {d}, and T = (U,C, {d}). The decision attribute d determines a partition of the object space U. The partition's elements are named decision classes. In our example from table 4.1, U is the set of objects in the decision table, U = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}, C is the set of condition attributes relevant to the current classification problem, C = {c1, c2, c3, c4, c5}, and d is the variable whose values define the classification task.

In this example we have two decision classes:

D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10}

D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}

With every x ∈ U we associate a function dx which gives the value from Vc for a given attribute c, dx : C ∪ {d} → V ([33]), such that dx(c) = c(x) for every c ∈ C ∪ {d}. dx|C and dx|d are the restrictions of dx to C and d, respectively.

For every x, y ∈ U, we say that the object x is discernible if for every y such that x ≠ y, dx|C = dy|C implies dx|d = dy|d; otherwise the object is indiscernible. A decision table is consistent if all of the objects are discernible; otherwise it is inconsistent.
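As an illustration, the decision table of Table 4.1 and the notions defined above can be computed directly; in the Python sketch below the data values are those of Table 4.1, while the variable and function names are chosen only for this example.

    # Decision table from Table 4.1: object -> ((c1, c2, c3, c4, c5), d).
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o2":  ((1, 3, 3, 1, 2), 0),
        "o3":  ((0, 1, 0, 1, 2), 0), "o4":  ((2, 1, 3, 0, 0), 0),
        "o5":  ((3, 2, 2, 0, 1), 0), "o6":  ((1, 1, 2, 0, 2), 0),
        "o7":  ((3, 0, 0, 3, 2), 0), "o8":  ((0, 0, 3, 2, 1), 0),
        "o9":  ((2, 1, 1, 1, 2), 0), "o10": ((0, 3, 2, 0, 0), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o12": ((1, 3, 2, 2, 2), 1),
        "o13": ((0, 1, 1, 0, 2), 1), "o14": ((3, 1, 0, 1, 2), 1),
        "o15": ((3, 3, 0, 2, 2), 1), "o16": ((1, 2, 3, 2, 0), 1),
        "o17": ((2, 1, 3, 2, 2), 1), "o18": ((1, 2, 3, 1, 0), 1),
        "o19": ((2, 2, 1, 1, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    # Decision classes: the partition of U induced by the decision attribute d.
    classes = {}
    for obj, (_, d) in TABLE.items():
        classes.setdefault(d, []).append(obj)
    print(classes[0])  # D0 = o1 .. o10
    print(classes[1])  # D1 = o11 .. o20

    # The table is consistent iff equal condition vectors always imply equal decisions.
    consistent = all(dx == dy
                     for cx, dx in TABLE.values()
                     for cy, dy in TABLE.values()
                     if cx == cy)
    print("consistent:", consistent)  # False: o9 and o20 agree on c1..c5 but not on d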


    4.2 Discernibility Matrix

A decision table T = (U,C, {d}) defines a matrix MdC, called the discernibility matrix modulo decision attribute d [26]. For x, y ∈ U, with the condition that x and y are from different decision classes, dx|d ≠ dy|d,

MdC(x, y) = {c ∈ C | c(x) ≠ c(y) and d(x) ≠ d(y)}    (4.2)

Since objects are divided into decision classes, we do not have to discern between objects that belong to the same class. Table 4.2 shows the discernibility matrix for our decision table from table 4.1. The entries in the table represent sets of condition attributes. For example, for the row o11 and column o1, the entry 1,2,4 identifies the condition attribute set {c1, c2, c4}.

      o1         o2        o3         o4          o5          o6         o7         o8          o9        o10
o11   1,2,4      1,2,3     1,2,3      2,3,4,5     1,2,4,5     1,2,4      1,3,4      1,3,4,5     2,3       1,2,4,5
o12   1,2,4      3,4       1,2,3,4    1,2,3,4,5   1,2,4,5     2,4        1,2,3,4    1,2,3,5     1,2,3,4   1,4,5
o13   1,2,3      1,2,3,4   3,4        1,3,5       1,2,3,5     1,3        1,2,3,4    2,3,4,5     1,4       2,3,5
o14   2,3,4      1,2,3     1          1,3,4       2,3,4,5     2,3,4,5    2,4        1,2,3,4,5   1,3       1,2,3,4,5
o15   2,3,4      1,3,4     1,2,4      1,2,3,4,5   2,3,4,5     1,2,3,4    2,4        1,2,3,5     1,2,3,4   1,3,4,5
o16   1,3,4,5    2,4,5     1,2,3,4,5  1,2,4       1,3,4,5     2,3,4,5    1,2,3,4,5  1,2,5       1,2,3,4,5 1,2,3,4
o17   1,2,3,4    1,2,4     1,3,4      4,5         1,2,3,4,5   1,3,4      1,2,3,4    1,2,5       3,4       1,2,3,4,5
o18   1,3,4,5    2,5       1,2,3,5    1,2,4       1,3,4,5     2,3,4,5    1,2,3,4,5  1,2,4,5     1,2,3,5   1,2,3,4
o19   1,3,4      1,2,3     1,2,3      2,3,4,5     1,3,4,5     1,2,3,4    1,2,3,4    1,2,3,4,5   2         1,2,3,4,5
o20   1,2,3,4    1,2,3     1,3        3,4,5       1,2,3,4,5   1,2,3,4    1,2,3,4    1,2,3,4,5   -         1,2,3,4

Table 4.2: Full Discernibility Matrix
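The discernibility matrix of equation (4.2) can be computed directly from the decision table. The sketch below does so for a few rows of Table 4.1; attribute indices 1 to 5 stand for c1 to c5, as in Table 4.2.

    # A few rows of Table 4.1 (abridged): object -> ((c1..c5), d).
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o2":  ((1, 3, 3, 1, 2), 0),
        "o9":  ((2, 1, 1, 1, 2), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o12": ((1, 3, 2, 2, 2), 1),
    }

    def discernibility_matrix(table):
        """M(x, y): indices of condition attributes on which x and y differ,
        computed only for pairs of objects from different decision classes."""
        matrix = {}
        for x, (cx, dx) in table.items():
            for y, (cy, dy) in table.items():
                if dx != dy:
                    matrix[(x, y)] = {i + 1 for i in range(len(cx)) if cx[i] != cy[i]}
        return matrix

    for (x, y), attrs in sorted(discernibility_matrix(TABLE).items()):
        print(x, y, sorted(attrs))
    # e.g. ('o11', 'o1') -> [1, 2, 4], the entry "1,2,4" in Table 4.2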

    4.3 Indiscernibility Relations

A discernibility matrix MdC defines a binary relation RdC ⊆ U². The relation RdC is called an indiscernibility relation with respect to C and d, and reveals the pairs of objects from different classes that we cannot discern between. For x, y ∈ U, under the condition that x and y are from different decision classes, dx|d ≠ dy|d,

x RdC y ⟺ MdC(x, y) = ∅    (4.3)


RdC is an equivalence relation that has three basic properties:

1. Reflexivity. For all x ∈ U, x RdC x.

2. Symmetry. For all x, y ∈ U, x RdC y ⟹ y RdC x.

3. Transitivity. For all x, y, z ∈ U, if x RdC y and y RdC z, then x RdC z.

The equivalence relation RdC induces a partition over the universe U, meaning that the resultant equivalence classes are disjoint and their union equals the universe U. The indiscernibility set of an object x ∈ U is denoted as

RdC(x) = {y ∈ U | x RdC y}.    (4.4)

In our example, RdC(o9) = RdC(o20) = {o9, o20}.
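Equivalently, the indiscernibility classes can be obtained by grouping objects with identical condition vectors, as in the sketch below, which uses an abridged subset of Table 4.1.

    # Abridged rows of Table 4.1, including the indiscernible pair o9 / o20.
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o9":  ((2, 1, 1, 1, 2), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    def indiscernibility_classes(table):
        """Group objects that share the same condition-attribute vector;
        each group is one equivalence class of the indiscernibility relation."""
        groups = {}
        for obj, (conds, _) in table.items():
            groups.setdefault(conds, set()).add(obj)
        return list(groups.values())

    for eq_class in indiscernibility_classes(TABLE):
        print(sorted(eq_class))
    # ['o1'], ['o20', 'o9'], ['o11']: o9 and o20 cannot be discerned, as in the text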

    4.4 Rough Sets

    The idea behind Rough Sets is to approximate a set of interest in terms of other

    sets.

With each subset X ⊆ U and an equivalence relation RdC defined over U we can associate two subsets:

R̲dC X = {x ∈ U | RdC(x) ⊆ X}    (4.5)

R̄dC X = {x ∈ U | RdC(x) ∩ X ≠ ∅}    (4.6)

called the lower and upper approximations of X, respectively. The set R̲dC X is the set of all elements of U which can be classified with certainty as elements of X, in terms of the equivalence relation RdC. This set is also called the positive region of X. The set R̄dC X is the set of elements of U which can be classified, with some uncertainty, as elements of X. The set R̄dC X − R̲dC X will be called the RdC boundary of X.

Figure 4.1: Example of Rough Sets

We consider two subsets D0, D1 ⊆ U, D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10} and D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}, and the equivalence relation RdC over U defined from the decision table from table 4.1 and the discernibility matrix from table 4.2. Then we compute the lower and the upper approximations for the two subsets D0, D1 (figure 4.1):

R̲dC D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o10}
R̄dC D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o20}
R̲dC D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19}
R̄dC D1 = {o9, o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}
R̄dC D0 − R̲dC D0 = R̄dC D1 − R̲dC D1 = {o9, o20}
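The lower and upper approximations of equations (4.5) and (4.6) follow directly from the indiscernibility classes. The sketch below uses only a subset of Table 4.1, which is enough to reproduce the boundary {o9, o20}.

    # Abridged rows of Table 4.1; o9 and o20 share condition values but differ in d.
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o5":  ((3, 2, 2, 0, 1), 0),
        "o9":  ((2, 1, 1, 1, 2), 0), "o11": ((2, 0, 2, 1, 2), 1),
        "o17": ((2, 1, 3, 2, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    # Equivalence classes of the indiscernibility relation (equal condition vectors).
    ind_classes = {}
    for obj, (conds, _) in TABLE.items():
        ind_classes.setdefault(conds, set()).add(obj)

    def approximations(target):
        """Lower approximation: union of classes entirely inside the target set.
        Upper approximation: union of classes that intersect the target set."""
        lower = {o for cls in ind_classes.values() if cls <= target for o in cls}
        upper = {o for cls in ind_classes.values() if cls & target for o in cls}
        return lower, upper

    D0 = {o for o, (_, d) in TABLE.items() if d == 0}
    lower, upper = approximations(D0)
    print("lower:", sorted(lower))             # ['o1', 'o5']
    print("upper:", sorted(upper))             # ['o1', 'o20', 'o5', 'o9']
    print("boundary:", sorted(upper - lower))  # ['o20', 'o9']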

    4.5 Reducts and the Core of Knowledge

    One problem is whether some of the attributes in a decision system are redundant

with respect to the object classifications. If an attribute set B ⊆ C preserves the indiscernibility relation RdC, then the attributes that form the set C − B are said to be dispensable. All minimal subsets, in terms of size, of attributes B that preserve the relation RdC are called reducts, and we denote the set of them by Red(T).


The intersection of all minimal subsets of attributes (reducts) is called the core of the attribute set C, and it is denoted by Core(C). The core can be defined in terms of the discernibility matrix as the set of all single-element entries of the discernibility matrix:

Core(C) = {c ∈ C | |MdC(x, y)| = 1 for some x, y}    (4.7)

In our case, looking at table 4.2, we can conclude that Core(C) = {c1, c2}.

      o1    o2    o3    o4     o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2      1,2   1,2   1     1     2     1,2
o12   1,2   3,4   1,2   1,2    1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3,4   1      1,2   1     1,2   2     1     2
o14   2     1,2   1     1      2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2    2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4,5    1,2   1     1,2   1,2   3,4   1,2
o18   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2      1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,4,5  1,2   1,2   1,2   1,2   -     1,2

Table 4.3: Core Discernibility Matrix

Now, we can define the full set of reducts in terms of the discernibility matrix. The set B, such that B ⊆ C, is a reduct of C if B is a minimal, with respect to inclusion, subset of C such that B ∩ MdC(x, y) ≠ ∅ for every nonempty entry MdC(x, y). Looking at table 4.3, it can easily be seen that there are two reducts in our example, Red(T) = {{c1, c2, c4}, {c1, c2, c3, c5}}, shown in table 4.4.

Besides the full reducts defined above, we can define reducts that are relative to a particular object in the decision table. We call these reducts object-related reducts. If indiscernibility is relative to an object x, two other objects y and z are considered to be indiscernible in comparison with x. Reducts that are related to a particular object x are called x-relative reducts, Red(T, x), since they contain the minimum information needed to select that particular object from other objects in the decision table.


Reduct {c1, c2, c4}:
      o1    o2    o3    o4   o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2    1,2   1,2   1     1     2     1,2
o12   1,2   4     1,2   1,2  1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   4     1    1,2   1     1,2   2     1     2
o14   2     1,2   1     1    2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2  2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4    1,2   1     1,2   1,2   4     1,2
o18   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2    1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     4    1,2   1,2   1,2   1,2   -     1,2

Reduct {c1, c2, c3, c5}:
      o1    o2    o3    o4   o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2    1,2   1,2   1     1     2     1,2
o12   1,2   3     1,2   1,2  1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3     1    1,2   1     1,2   2     1     2
o14   2     1,2   1     1    2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2  2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     5    1,2   1     1,2   1,2   3     1,2
o18   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2    1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,5  1,2   1,2   1,2   1,2   -     1,2

Table 4.4: Two Minimal Discernibility Matrix

    There are several algorithms for computing reducts or reduct approximations.

These algorithms assume that any attribute subset of C can be an approximation

    to a reduct. The Exhaustive Reducer algorithm [28, 29, 30] computes all the reducts

    by brute force, by exhaustive search. The algorithm takes exponential time in terms

    of the number of objects, so it is not suitable for very large decision systems as it

    may be very time consuming. Another algorithm is the Johnson Reducer [28, 29, 30]

    which invokes a simple greedy algorithm to compute only a single reduct. Because

    of the NP-completeness of the problem, heuristic methods can be more effective.

    Wroblewski [53] proposed a variation of a Genetic Algorithm to search for reducts,

    either until the search space is exhausted or until a given maximum number of reducts

    has been found. Another heuristic approach was proposed by Vinterbo [51, 52]. It is

    based on minimal hitting sets. We will present this method in the following section.
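As an illustration of the greedy strategy behind the Johnson Reducer mentioned above, the sketch below repeatedly selects the attribute that covers the most remaining discernibility-matrix entries; the entries themselves are illustrative, and the result is only a single reduct approximation.

    from collections import Counter

    # Non-empty discernibility-matrix entries, as sets of attribute indices.
    # These few entries are illustrative, not the full matrix of Table 4.2.
    ENTRIES = [{1, 2, 4}, {3, 4}, {2, 3}, {1, 3, 4}, {2, 4}, {1, 2, 3, 5}]

    def johnson_style_reduct(entries):
        """Greedy heuristic in the spirit of the Johnson Reducer: repeatedly pick
        the attribute that occurs in the most still-uncovered entries until every
        entry is hit. The result is a single reduct approximation, not guaranteed
        to be minimal."""
        uncovered = [set(e) for e in entries if e]
        reduct = set()
        while uncovered:
            counts = Counter(a for e in uncovered for a in e)
            best, _ = counts.most_common(1)[0]
            reduct.add(best)
            uncovered = [e for e in uncovered if best not in e]
        return reduct

    print(johnson_style_reduct(ENTRIES))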

    4.6 Reducts, Hitting Sets, Approximate Hitting Sets

    Multisets are unordered collections of elements where an element can occur as a

    member more than once. A hitting set [51, 52] for a given multiset, MS, of elements

from 2^C is a set B, B ⊆ C, such that the intersection between B and every set in MS is non-empty:

HS(MS) = {B ⊆ C | B ∩ MSi ≠ ∅ for all MSi ∈ MS}    (4.8)

A set B ∈ HS(MS) is a minimal hitting set of MS if B is no longer a hitting set whenever any of its elements is removed. The set of minimal hitting sets is denoted

    by minHS(MS).

An approximation to the hitting set is a set that covers enough elements of the multiset MS, as specified by a constant ε. The set of ε-approximate hitting sets of MS is denoted by HS(MS, ε), where the parameter ε controls the degree of approximation,

HS(MS, ε) = {B ⊆ C | |{MSi ∈ MS | MSi ∩ B ≠ ∅}| / |MS| ≥ ε}    (4.9)

The set B ∈ HS(MS, ε) is a minimal ε-approximate hitting set if it is no longer an ε-approximate hitting set when any of its elements is removed.

    The problem of computing the minimal hitting set, like the reducts computation,

    is an NP-hard problem. Again, it is necessary to use heuristics in order to find reducts

    using hitting sets, but we still cannot guarantee the minimality of the reducts.
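The following small Python sketch, with toy sets, shows how the ε-approximate hitting set condition of equation (4.9) can be tested for a candidate attribute set B; it is illustrative only and not part of the heuristics cited above.

```python
def hit_fraction(B, MS):
    """Fraction of the multiset's elements intersected by candidate attribute set B."""
    B = set(B)
    hit = sum(1 for ms in MS if B & set(ms))
    return hit / len(MS)

def is_approx_hitting_set(B, MS, eps):
    """Epsilon-approximate hitting set test from Eq. (4.9)."""
    return hit_fraction(B, MS) >= eps

MS = [{0, 1}, {1, 3}, {0, 2, 3}, {2}]          # a toy multiset of attribute sets
print(hit_fraction({1}, MS))                   # 0.5: hits 2 of the 4 sets
print(is_approx_hitting_set({1}, MS, 0.75))    # False
print(is_approx_hitting_set({1, 2}, MS, 0.75)) # True: hits all 4 sets
```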

    4.7 Decision Rules, Decision Systems

A decision rule is an assertion of the form "if p then s", denoted by p → s, where p and s are logical formulas in first-order logic. For each object, certain

    values of the condition attributes determine the value of the decision attribute. We

    define a decision system as a finite collection or set of decision rules. In order to

    obtain a decision system with a minimum number of rules, superfluous decision rules

    associated with the same decision class can be eliminated without disturbing the

    decision making process.

Let D be a decision system and let T = (U, C, {d}) be an information system. The set of all decision rules in D having the same successor s as a decision attribute

value is denoted by Ds. We can say that Ds is the decision subsystem associated


    with class s. The set of all predecessor formulas of the decision rules belonging to Ds,

is denoted by Ps. We say that a decision rule p → s from D is dispensable in D if the disjunction of all the formulas in Ps is equivalent to the disjunction of the formulas

in Ps \ {p}. Otherwise, the rule is indispensable. If every decision rule belonging to Ds is indispensable, the decision subsystem Ds is said to be independent. For a

    decision system, if all its decision subsystems Ds are independent, we say that the

    decision system is minimal.
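A toy Python sketch of this dispensability test is given below. Rules are represented as attribute-value conjunctions, and equivalence of the disjunctions is checked by comparing the sets of objects they cover; all names and values are hypothetical.

```python
def covers(rule, obj):
    """rule: dict attribute -> required value; obj: dict attribute -> value."""
    return all(obj.get(a) == v for a, v in rule.items())

def coverage(rules, objects):
    """Objects matched by the disjunction of the rules' predecessors."""
    return {name for name, obj in objects.items() if any(covers(r, obj) for r in rules)}

def dispensable(rule, rules, objects):
    """A rule is dispensable if removing it leaves the covered set unchanged."""
    others = [r for r in rules if r is not rule]
    return coverage(rules, objects) == coverage(others, objects)

objects = {"o1": {"c1": 0, "c2": 1}, "o2": {"c1": 1, "c2": 1}, "o3": {"c1": 0, "c2": 0}}
subsystem = [{"c2": 1}, {"c1": 0}]          # both rules conclude the same class s
print(dispensable(subsystem[0], subsystem, objects))  # False: only {"c2": 1} covers o2
```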

    4.8 Decision System Construction

    The problem of decision system construction is to induce a set of rule descriptors

    of decision classes from the input set of objects. These sets of descriptors, named

    decision systems, consist of a set of decision rules. We can classify the decision

systems as follows:

    1. Decision systems with a minimum set of rules. They are focused on describing

    input objects using a minimum number of necessary rules.

2. Decision systems with an exhaustive set of rules (Tables 4.5 and 4.6). These decision

    systems contain all possible decision rules.

    3. Decision systems with a satisfactory set of rules. This category represents sets

of decision rules which satisfy given a priori user requirements for an acceptable

    decision system.

    One strategy for finding a simple decision system with good classificatory capabil-

    ities is to first induce an exhaustive set of rules, and then to prune away those rules

whose removal does not significantly lower the decision system's performance. An exhaustive de-

    cision system can be generated from the object-related reducts [26, 27], as described

    in section 4.5.


Rule                                              1   2     3        4     5        6        7   8
1   C2(0) AND C4(2) => D(0)                       1   1     1        0.05  0.1      1        2   1
2   C2(3) AND C4(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
3   C2(2) AND C4(0) => D(0)                       2   2     1        0.1   0.2      1        2   1
4   C2(3) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
5   C1(0) AND C4(2) => D(0)                       1   1     1        0.05  0.1      1        2   1
6   C1(1) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
7   C1(2) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
8   C1(0) AND C4(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
9   C1(3) AND C4(0) => D(0)                       2   2     1        0.1   0.2      1        2   1
10  C4(3) => D(0)                                 1   1     1        0.05  0.1      1        1   1
11  C1(0) AND C2(3) => D(0)                       1   1     1        0.05  0.1      1        2   1
12  C1(0) AND C2(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
13  C1(3) AND C2(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
14  C1(1) AND C2(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
15  C1(3) AND C2(2) => D(0)                       2   2     1        0.1   0.2      1        2   1
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)     2   1,1   0.5,0.5  0.1   0.1,0.1  1.0,1.0  3   2
17  C2(1) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
18  C2(2) AND C4(1) => D(1)                       2   2     1        0.1   0.2      1        2   1
19  C1(0) AND C2(1) AND C4(0) => D(1)             1   1     1        0.05  0.1      1        3   1
20  C2(2) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
21  C2(3) AND C4(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
22  C2(0) AND C4(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
23  C1(2) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
24  C1(3) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
25  C1(3) AND C4(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
26  C1(1) AND C4(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
27  C1(2) AND C2(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
28  C1(1) AND C2(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
29  C1(3) AND C2(3) => D(1)                       1   1     1        0.05  0.1      1        2   1
30  C1(3) AND C2(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
31  C1(2) AND C2(0) => D(1)                       1   1     1        0.05  0.1      1        2   1

    Table 4.5: Exhaustive Decision System

    Pruning can be done by identifying and removing components of the decision

    system that only explain small parts of the data, thereby preserving general trends in

    the underlying data material. In order to find a minimal decision system we can use a

    simple greedy heuristic algorithm described by Lazar and Sethi [20]. This algorithm

    computes only one decision system. If more than one minimal decision system is

    required we can use a Genetic Algorithm which solves the minimal cover set problem.

    Agotnes [1, 2] proposed two algorithms for generating satisfactory decision systems,

    a quality-based rule filtering algorithm and a genetic rule filtering algorithm. Rule


Rule                                              9
1   C2(0) AND C4(2) => D(0)                       {o2}
2   C2(3) AND C4(1) => D(0)                       {o2}
3   C2(2) AND C4(0) => D(0)                       {o1,o5}
4   C2(3) AND C4(0) => D(0)                       {o10}
5   C1(0) AND C4(2) => D(0)                       {o8}
6   C1(1) AND C4(0) => D(0)                       {o6}
7   C1(2) AND C4(0) => D(0)                       {o4}
8   C1(0) AND C4(1) => D(0)                       {o3}
9   C1(3) AND C4(0) => D(0)                       {o1,o5}
10  C4(3) => D(0)                                 {o7}
11  C1(0) AND C2(3) => D(0)                       {o10}
12  C1(0) AND C2(0) => D(0)                       {o8}
13  C1(3) AND C2(0) => D(0)                       {o7}
14  C1(1) AND C2(1) => D(0)                       {o6}
15  C1(3) AND C2(2) => D(0)                       {o1,o5}
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)     {o9,o20}
17  C2(1) AND C4(2) => D(1)                       {o17}
18  C2(2) AND C4(1) => D(1)                       {o18,o19}
19  C1(0) AND C2(1) AND C4(0) => D(1)             {o13}
20  C2(2) AND C4(2) => D(1)                       {o16}
21  C2(3) AND C4(2) => D(1)                       {o12,o15}
22  C2(0) AND C4(1) => D(1)                       {o11}
23  C1(2) AND C4(2) => D(1)                       {o17}
24  C1(3) AND C4(2) => D(1)                       {o15}
25  C1(3) AND C4(1) => D(1)                       {o14}
26  C1(1) AND C4(2) => D(1)                       {o12,o16}
27  C1(2) AND C2(2) => D(1)                       {o19}
28  C1(1) AND C2(2) => D(1)                       {o16,o18}
29  C1(3) AND C2(3) => D(1)                       {o15}
30  C1(3) AND C2(1) => D(1)                       {o14}
31  C1(2) AND C2(0) => D(1)                       {o11}

    Table 4.6: Exhaustive Decision System

    filtering operates on an existing exhaustive decision system, pruning it while retaining

high classification performance. Both of the above solutions make no assumptions about the

    minimal set cover condition. As a result, the decision system may not be minimal.

    We will propose a new solution based on the Cultural Algorithm which addresses the

    minimal set cover problem explicitly.
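For illustration, the following Python sketch applies a plain greedy set-cover heuristic to rule supports of the kind listed in Table 4.6. It conveys the flavor of pruning a rule set toward a minimal cover, but it is not the algorithm of Lazar and Sethi [20], nor the Cultural Algorithm proposed later; the rule identifiers and supports are toy values.

```python
def greedy_rule_cover(rule_support):
    """rule_support: dict rule_id -> set of supporting objects.
    Repeatedly keep the rule that covers the most still-uncovered objects."""
    universe = set().union(*rule_support.values())
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(rule_support, key=lambda r: len(rule_support[r] & uncovered))
        if not rule_support[best] & uncovered:
            break                      # remaining rules add nothing new
        chosen.append(best)
        uncovered -= rule_support[best]
    return chosen

# Supports shaped like Table 4.6 (rule -> objects it explains), toy values:
support = {1: {"o2"}, 3: {"o1", "o5"}, 16: {"o9", "o20"},
           18: {"o18", "o19"}, 21: {"o12", "o15"}}
print(greedy_rule_cover(support))   # e.g. [3, 16, 18, 21, 1]
```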


    4.9 Steps for Model Construction

    The typical steps (figure 4.2) for constructing model hypotheses for classification

    problems using the Rough Sets methodology are as follows:

    Figure 4.2: Model Development

1. Sampling and selection: This step implies the construction of a research file. Usu-

    ally this is done by selecting a random sample of prospects from the database.

    Also, the irrelevant attributes should be removed. Then, the selected data is

    represented as a two-dimensional table.

    2. Preprocessing: If the selected table contains missing values or empty cell entries,

    the table must be preprocessed in order to remove some of the incompleteness.

    Statistics should be run to obtain more information about the data.


    In the initial data set, missing values were coded differently for each variable.

    To make the table consistent we replaced all the values assigned to missing

    data with the SPSS convention, system missing data. We did not replace the

    missing values with a default value since the Rough Sets representation can deal

    with uncertainty, and the algorithms define them as undetermined. Descriptive

statistics, using SPSS 8.0, were run for all of the attributes and used to identify

    outliers for modification.

    3. Transformation: For example, measurement attributes should be discretized,

    and used instead of exact observations. Categorical data may be recoded to

    provide a consistent interpretation. The Rough Sets procedure is potentially

    sensitive to large differences in the number of categories for variables. Thus,

    we performed transformations of the data with many categories to consolidate

    them when possible into smaller set of categories. This was done in a systematic

    way, after checking the statistics, and carefully without loosing much informa-

    tion. Also, the decision variables should be identified. After this step the data

    becomes more qualitative than quantitative.

    4. Training and validation sample: The initial table is divided into at least two

    subtables. One will be used in the training step, the other in the validation or

    testing step.

    5. Develop the model: Knowledge discovery techniques are applied to the training

    data in order to generate a set of hypothesized relations. Following the Rough

Set methodology, the full set of reducts is computed, a set of minimal reducts

    is chosen, and the data table is vertically pruned. Then the object related

    reducts are computed and the exhaustive decision rule system is generated. At

    the end a pruning method for the decision rule set is applied in order to obtain a

good decision system, with a good balance between the number of rules and the accuracy of the classifications. The process is shown in Figure 4.3; a condensed code sketch of these steps is given after this list. The above procedure was followed exactly and the results are shown in section 6.

[Figure 4.3: Model Construction Phases: data, compute full reducts, compute object-related reducts, generate exhaustive decision-rule system, find minimal or satisfactory decision-rule system]

    6. Interpretation and evaluation: The validation or test data set is then used

    to test the classificatory performance of the new model. Also, if it is a rule-

based model, it can be reviewed by domain specialists in order to understand the data sets and to explain dependencies between values of attributes and definitions of decision classes. The expert will check the decision rule system.
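The condensed Python sketch below, referred to in step 5, strings together hypothetical versions of the transformation and sampling steps (discretization and a training/validation split) and marks where the reduct and rule computations of Figure 4.3 would be invoked. The attribute names and cut points are illustrative only and do not come from the survey data.

```python
import random

def discretize(value, cuts=(10.0, 50.0)):
    """Map a measurement to an ordinal category using fixed cut points."""
    return sum(value > c for c in cuts)          # yields 0, 1 or 2

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the coded table and split it into training and validation parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    k = int(len(rows) * (1 - test_fraction))
    return rows[:k], rows[k:]

raw = [{"slope": 3.2, "attacked": 0}, {"slope": 75.0, "attacked": 1},
       {"slope": 22.5, "attacked": 0}, {"slope": 60.1, "attacked": 1}]
coded = [{"slope": discretize(r["slope"]), "attacked": r["attacked"]} for r in raw]
train, test = train_test_split(coded)
# ... compute reducts on `train`, induce and prune the rules, then score on `test`
print(train, test)
```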


    CHAPTER 5

    EVOLUTION-BASED LEARNING OF ONTOLOGICAL

    KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT SIMULATION

    5.1 Introduction

    The results of the data mining process can be used in many different ways. There-

    fore, the form of the knowledge collected will have a major impact on the efficiency

    and effectiveness of its use in a given application. In this chapter we examine the

    problem of extracting knowledge for use by agents in a large-scale multi-agent system

    [42]. Here, the knowledge is ontological knowledge that represents constraints that

    the physical and social environments placed upon the agents and their interactions.

    The ontological knowledge represents the semantic building blocks around which the

    world models are formed. For an agent in a particular model, only the things in

its ontology can exist, and it cannot perceive things that are not represented in the

ontology. An ontology (Fox et al. [12]) is a basic level of knowledge representation

    scheme, a formal definition of entities and their properties, interactions, behaviors

and constraints. Each agent's decisions need to be checked against these constraints

    prior to their execution. In a complex multi-agent system, hundreds of thousands of

    agents may need to check these constraints regularly which means that a successful

    Data Mining activity will need to produce a relatively small set of syntactically simple

rules for the process to be efficient. Fox et al. [12] have used Data Mining techniques

to produce corporate ontologies.
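As a purely illustrative sketch of how such ontological constraints might be consulted, the Python fragment below checks an agent's proposed action against a small rule list before the action is executed. The constraint entries, action names, and site attributes are hypothetical and are not taken from the simulation described in this chapter.

```python
# Hypothetical ontological constraints: (required conditions, forbidden action).
CONSTRAINTS = [
    ({"slope": "Steep", "water_source": "None"}, "farm"),
    ({"land_use": "Class III"}, "irrigate"),
]

def allowed(action, site):
    """Return False if any constraint forbidding `action` matches the site description."""
    for conditions, forbidden in CONSTRAINTS:
        if action == forbidden and all(site.get(k) == v for k, v in conditions.items()):
            return False
    return True

site = {"slope": "Steep", "water_source": "None", "land_use": "Class I"}
print(allowed("farm", site))      # False: the first constraint matches this site
print(allowed("settle", site))    # True: no constraint forbids settling here
```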

    There are several factors that can influence the nature of the ontological con-

    straints that are produced. First, the nature of the data collection and measurement

    process and the uncertainty induced into the data set by the presence of noise. Second,

the nature of the representation used to express the extracted patterns, e.g. whether


    it allows for uncertainty or not. Third, the data mining technique employed and the

    assumptions that it makes about the collected data. Fourth, how these constraints

    will be stored, accessed, and used by the agents involved.

    For a given data set one can compare the different Data Mining techniques in

terms of the syntax and semantics of the induced constraints. In this application

    we are interested in simulating the emergence of the archaic state in the Valley of

    Oaxaca, Mexico. A state is among the most sophisticated and powerful structures

that have emerged from the social evolution process. In the modern world these are

    termed nation states with a government composed of a hierarchical decision-making

    structure where the decision-makers are either elected or appointed. States are sup-

    ported by various economies and are able to interact with each other via warfare,

trade, etc. Most states in the ancient world, often called archaic states, were ruled by

    hereditary royal families. These archaic states exhibited much internal diversity with

    populations numbering from tens of thousands to millions. They had a bureaucracy,

    organized religion, a military presence, large urban centers, public buildings, public

    works, and services provided by various professional specialists. The state itself could

    enter into warfare and trade-based relationships with other states and less complex

    neighbors.

    The process by which complex social entities such as the state emerged from lower

    level structures and other supporting economies has long been of prime interest to

    anthropologists and other disciplines as well. This is because the emergence of such

a social structure can have a profound impact on the society's physical and social

    environment. However, the task of developing realistic computational models that

    aid in the understanding and explanation of state emergence has been a difficult one.

    This is the result of two basic factors:

The process of state formation inherently takes place on a variety of temporal and spatial scales.


The emergence of hierarchical decision-making [35] can be viewed as an adaptation that allows decision-makers to specialize their decisions to particular spatial

    and temporal scales.

    The formation of the state is a complex process that is fundamentally directed by

the social variables but requires dynamic interaction between the emergent system

    and its environment. Identifying the nature of these interactions is one of the reasons

    why the process of state formation is of such interest.

    The goal of this project is to produce a large-scale knowledge-based computational

    model of the origins of the Zapotec State (Marcus and Flannery 1996), centered at

    Monte Alban, in the Valley of Oaxaca, Mexico. State formation took place between

    1400 B.C. and 300 B.C. While archaic states have emerged in various parts of the

    world, the relative isolation of the valley allowed the processes of social evolution

    to be more visible there. Extensive surveys [4, 5, 19] of the 1700 km2 valley, were

    undertaken by the Oaxaca Settlement Pattern Project in the 1970s and 1980s. The

    location and features of over 2,700 sites dating from the archaic period (8000 B.C.)

    to Late Monte Alban V (just prior to the arrival of the Spaniards) were documented.

    Several hundred variables were recorded for each site. In addition, they surveyed the

    6.5 square kilometer urban center of Monte Alban, a site that contained over 2,000

    residential terraces. This site was the focus for early state formation in the valley.

    Both surveys provided the knowledge needed to create our multi-agent simulation

    model. We then produced a spatial temporal database that contained the results

    of both surveys and used data mining techniques from Artificial Intelligence [42] to

    produce knowledge about site location, warfare, trade, and economic decisions to be

    used for the construction of the multi-agent model. However, in order to do this

    we needed to add more data about the spatial and temporal context to both the

    regional and urban center surveys. Specifically, we had to add variables that allowed

    us to locate each site spatially and temporally to a level of precision consistent with


    the scale of our simulation. For example, temporal periods are characterized by the

    presence of pottery of different styles. That data was available only in text form. All

    of this pottery data, over 130 variables for each residential terrace, was scanned into

    the computer, corrected for errors, and added to the Monte Alban data set. This

    data allowed us to identify the periods that each terrace was occupied. Pottery data

    was also integrated into the regional data set.

    In addition, the survey had produced hundreds of pages of hand drawn maps

    for both the Monte Alban and regional surveys that contained the spatial context

    for the location of each site. Since our goal was to ask specific questions about the

    spatial and temporal context we needed to tie each site into its mapped location. We

    then proceeded to digitize each of the maps and to associate each site object with its

    corresponding data record. This allowed us to produce a geographical information

    system (GIS) that serves as our virtual valley of Oaxaca. This acts as a vehicle

    for our data mining activities and as a knowledge base for the multi-agent simulation

    and allows the results of the simulation to be displayed and compared with the actual

    data in a spatial context. It is envisioned that the resultant GIS system will be a

    useful tool for researchers and students from various fields to study the emergence of

    complexity in the future.

    In order to perform the data mining activities, we extended traditional data mining

    techniques and developed new ones in order to deal with the complexities inherent

in the Oaxaca database. At the regional level we used Utgoff's incremental Decision

    Tree Algorithm (IDTI) [48] to generate the Decision Trees for each region and phase

    of the valley. The approach was used to generate decision trees that discriminated

    between sites that were targets for warfare and those that were not for a given period

    [39, 38].

    However, given the many disparate steps under which the data was collected and

    organized it was felt that perhaps some improvements might be made by using a


    technique that took into account the presence of uncertainty in the data, especially

    in regions and periods when the social and settlement patterns were complex and

    prone to data collection error. To test this hypothesis we selected a period of time

    just before the emergence of the state, Rosario, where there was evidence of increased

    social strife brought about in part by increased population growth. The part of the

    valley that exhibited the greatest population at the time, Etla, was also selected.

    Since the majority of the data was discrete rather than continuous in nature we

    selected Rough Sets as a vehicle for representing uncertainty here. We employed an

    evolutionary technique, Genetic Algorithms [13, 14], to control the search in this case

    because Genetic Algorithms had been successfully used with Rough Sets previously.

    The decision systems or rule sets produced by both approaches were then compared in

    terms of their ability to decide about the location of sites that are targets for warfare

    in this period. We then compared the two approaches over all relevant phases of

    social evolution in the valley.

    In section 2 we begin with an overview of Decision Trees and their generation.

    Section 3 provides a comparison of a specific decision system for the Etla region in

    the Rosario phase as produced by Decision Trees and Rough Sets respectively. Section

    4 compares the two approaches together over all phases in the Etla region. The Rough

    Set approach appears most useful in those phases with the most evidence for change

    in social complexity. Section 5 gives our conclusions.

    5.2 Decision Tree

    Decision Tree induction is a very good method for high-dimensional applications. It

    is a fast non-linear prediction method and employs dynamic feature selection. The

    solution complexity is expressed in terms of the number of terminal nodes. The most

    complex tree covers all cases in the training data. Pruning the tree and measuring

the errors in progressively smaller trees finds less complex solutions.
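The following minimal example illustrates tree induction and complexity control by cost-complexity pruning, using scikit-learn's batch CART learner. It is only an illustration of the general idea, not the incremental algorithm used in this study, and the data are synthetic.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 1], [2, 1], [0, 0], [1, 0], [2, 0]]   # two coded condition attributes
y = [0, 0, 1, 1, 1, 0]                                 # attacked / not attacked labels

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.1).fit(X, y)

# Solution complexity measured by the number of terminal nodes, as in the text:
# the pruned tree has no more leaves than the fully grown tree.
print(full.get_n_leaves(), pruned.get_n_leaves())
```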


Period              Approximate Date
Tierras Largas      1400 - 1150 BC
San Jose            1150 - 850 BC
Guadalupe           850 - 700 BC
Rosario             700 - 500 BC
Monte Alban Ia      500 - 300 BC
Monte Alban Ic      300 - 150/100 BC
Monte Alban II      150/100 BC - AD 200
Monte Alban IIIa    AD 200 - 500
Monte Alban IIIb    AD 500 - 700/750
Monte Alban IV      AD 700/750
Monte Alban V       AD 1000 - 1521

Table 5.1: Periods of Social Evolution in the Valley of Oaxaca and Their Approximate Dates

Any decision tree can be rewritten in the form of a decision rule set; however, there are sets of rules which cannot be expressed as any decision tree. An implied decision rule in a tree is a complete path to a terminal node. Because these rules are mutually exclusive, the size of the decision rule set can be much larger than the logic needed for overlapping rules. One rule can combine a large number of terms or true-false conditions, which takes a long time to evaluate when the rule set is used in an agent-based simulation.

One of the advantages of logic-based solutions such as decision trees and corresponding

    decision rules is their powerful explanatory capabilities. Table 5.1 gives all of the

    relevant periods of social evolution in the valley. Tierras Largas marks the beginning

    of early village settlement there. The state emerged at Monte Alban in period Monte

    Alban Ia. The valley came under control of the state by Monte Alban II, and Monte

Alban IIIa signaled the decline of the state and its succession by a collection of city-

    states localized in different parts of the valley.

    For some periods there were several hundred rules produced just for the warfare

    constraint alone. Adding in the other constraints would produce a knowledge base of

    several thousand rules at minimum. However, since the data were collected over a ten-

year period over varying landscape using different surveyors and recording equipment,


    it was felt that a significant amount of noise might be present in the data. The

    original technique did not account explicitly for uncertainty in the measurement data.

    Thus, it was possible that certain rules or rule conditions were present only to deal

    with the specific noise introduced into the process by the survey methods and data

    transformation activities. If this was the case, then by using a representation and

    associated learning technique that dealt explicitly with uncertainty it might be that

    fewer rules would be needed in periods where the noise is most pronounced.

[Table 5.2: Decision system for the Rosario phase produced by the Decision Tree approach]

The decision discriminated between the sites that were targets for warfare and those that were not. Three variables were used in order to compute the

decision: the presence of burnt daub at the site, other evidence of burning, and the

    presence of defensive walls. The variables used to predict these decisions from positive

    and negative examples in the training set were: Environmental zone, Slope, Hilltop

    or Ridge top, Soil character, On the boundary between the loam and the swampy

    region, Water source, Depth of Water table, Type of irrigation, and Land use type

    among others.

    In section 2 we presented a decision tree 5.1 and a corresponding decision system

    5.2 for the Rosario phase (700-500 B.C.) generated by the Decision Tree approach.

    It is the fourth phase of occupation in the study and at that time population size

and warfare increased substantially (Marcus et al. [23]). For example, it was observed

    that chunks of burnt daub appear on the surface of the villages seven times more

    frequently than in the previous phases. There are 36 sites in the Rosario phase. The

    archaic state emerged in the period following this phase of increased warfare.

    Further, we consider another approach for extracting logical patterns within data

sets: rule induction using Rough Sets. Although trees and rules may seem similar,

they are in fact very different, as can be seen in Table 5.3.

    In essence, Decision Trees may be viewed as a simplistic approach to rule discovery.

In order to extract rules, we performed feature selection using the Rough Set approach guided by a Genetic Algorithm with the variables above. The Rough Set approach selected four variables in each of two reducts: {environmental zone, slope, water source, land use} and {slope, soil character, water source, land use}. The corresponding decision systems

    are given in tables 5.4 and 5.5.

These tables represent the exhaustive sets of rules produced. While it is clear that

    several of the rules are so simple that they can be easily combined to produce a

    smaller set of rules overall, it is sufficient for comparative purposes here.
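To indicate how a Genetic Algorithm can guide this kind of reduct search, the sketch below evolves bit-mask attribute subsets against a toy set of discernibility entries, rewarding subsets that hit every entry with as few attributes as possible. The population size, operators, and data are illustrative choices written for this text, not those used in the study.

```python
import random

rng = random.Random(0)
ENTRIES = [{0, 1}, {1, 3}, {0, 2, 3}, {2, 4}]   # toy discernibility entries
N_ATTRS = 5

def fitness(mask):
    """Reward hitting every entry first, then prefer smaller attribute subsets."""
    subset = {i for i in range(N_ATTRS) if mask[i]}
    hits = sum(1 for e in ENTRIES if subset & e)
    return hits * 10 - len(subset)

def evolve(pop_size=20, generations=40):
    pop = [[rng.randint(0, 1) for _ in range(N_ATTRS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_ATTRS)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # mutation: flip one bit
                i = rng.randrange(N_ATTRS)
                child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return {i for i in range(N_ATTRS) if best[i]}

print(evolve())    # typically a small subset hitting all entries, e.g. {1, 2}
```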

Decision Trees
  Advantages: Easy to understand.
  Disadvantages: May be difficult to use with continuous data; they look at simple combinations of attributes; they need to break numeric fields into fixed ranges; not very good with inexact data; not flexible; no way to handle missing data; cannot easily approach large data sets; may have overfitting; less accurate predictions.

Rough Sets
  Advantages: Very expressive; modular knowledge; good with missing data; they handle imprecise data.
  Disadvantages: Can be memory intensive; can be computationally intensive.

Table 5.3: A Comparison of Decision Trees versus Rough Sets

Our focus here is on the impact of using a technique, such as Rough Sets, that is explicitly able to deal with uncertainty in the recognition decision. From

    this standpoint there are two basic points of comparison. Firstly, how many of the

rules identify a site for attack unambiguously, and what percentage of the rules that

    select sites for attack do they comprise? Secondly, in those cases in which the rule

    produces a split decision we will need to resolve the tie using other means. The

    question is, how much effort do we need to spend in order to find out that we must

    contact another source to resolve the question?

    In answer to the first question, explicitly dealing with uncertainty using the rough

set representation produced five rules that can identify sites for attack, as opposed to just three rules in the decision tree approach. Of these five rules, three (rules 7, 10 and 12 in Table 5.4, or 6, 9 and 11 in Table 5.5) result in unambiguous decisions. That is, 60% of the rules that can conclude that a site can be attacked do so unambiguously, whereas the other two need further clarification. The Decision Trees approach produces 3 rules that can


Rules
1   env zone(High Alluvium) => Not Attacked
2   env zone(Middle Piedmont) => Not Attacked
3   env zone(Lower Piedmont) AND slope(Shallow) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
4   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
5   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
6   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
7   water source(Spring) AND land use(Class III 100%) => Attacked
8   water source(Spring) AND land use(Class I) => Not Attacked
9   slope(Shallow) AND land use(Class III 100%) => Not Attacked
10  slope(Flat) AND land use(Class III 100%) => Attacked
11  slope(Flat) AND land use(Class I) => Not Attacked
12  slope(Flat) AND water source(Spring) => Attacked
13  slope(Flat) AND water source(Arroyo) => Not Attacked
14  slope(Shallow) AND water source(Spring) => Not Attacked
15  water source(Well) => Not Attacked
16  water source(Main River) => Not Attacked

    Table 5.4: First Exhaustive Decision System for the Rosario Phase in Etla Region


Rules
1   soil(Alluvium) => Not Attacked
2   slope(Shallow) AND soil(Bedrock) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
3   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
4   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
5   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
6   water source(Spring) AND land use(Class III 100%) => Attacked
7   water source(Spring) AND land use(Class I) => Not Attacked
8   slope(Shallow) AND land use(Class III 100%) => Not Attacked
9   slope(Flat) AND land use(Class III 100%) => Attacked
10  slope(Flat) AND land use(Class I) => Not Attacked
11  slope(Flat) AND water source(Spring) => Attacked
12  slope(Flat) AND water source(Arroyo) => Not Attacked
13  slope(Shallow) AND water source(Spring) => Not Attacked
14  water source(Well) => Not Attacked
15  water source(Main River) => Not Attacked

    Table 5.5: Second Exhaustive Decision System for the Rosario Phase in Etla Region

    conclude that a site can be attacked with only one of them (rule 3) being conclusive.

    Thus, only 33% of the rules that identify a site for attack are conclusive as opposed

to 60% for the Rough Set approach. By taking data uncertainty into account the

    Rough Set approach not only produced more rules for the identification of the target

    concept, but also a higher percentage of unambiguous ones.

    The other question concerns the relative amount of effort expended to produce

    an uncertain conclusion. In the decision system produced using Rough Sets the

    inconclusive rules have fewer conditions to be checked than for those from the Decision

Trees approach. Specifically, the inconclusive Rough Set rules have 2 and 4 conditions respectively, for a total of 6 conditions, one of which is shared between them (land use type = Class I), leaving 5 unique conditions. In the Decision Tree system, 8 conditions must be checked in each of the two inconclusive rules, for a total of 16. However, each shares the same 8, so that the total number of unique conditions to be tested is 8, as opposed to 5 for the Rough


    Set approach. More effort must then be expended in order to check the inconclusive

    rules in the Decision Tree approach as opposed to that for Rough Sets.

    Since both approaches extracted the same set of condition variables, the differences

    are likely to reflect the impact that