
IMPACT OF HEURISTIC KNOWLEDGE DISCOVERY TECHNIQUES ON MULTI-AGENT SIMULATION OF CULTURAL EVOLUTION

    by

    ALINA LAZAR

    DISSERTATION

    Submitted to the Graduate School

    of Wayne State University,

    Detroit, Michigan

    in partial fulfillment of the requirements

    for the degree of

    DOCTOR OF PHILOSOPHY

    2002

    MAJOR: COMPUTER SCIENCE

    Approved by:

    Advisor Date

© COPYRIGHT BY

    ALINA LAZAR

    2002

    All Rights Reserved

DEDICATION

    To Mircea and to my parents.


ACKNOWLEDGMENTS

    I would like to offer my thanks to my advisor Professor Robert Reynolds for his

    continuing support and guidance. It has been a privilege to have him as professor as

    well as thesis advisor.

Most of this research was supported by NSF IIS Grant #9907257.

    Appreciation goes to all my dissertation committee members: Dr. William Grosky,

    Dr. George Yin and Dr. Sherali Zeadally for accepting this task.

    Thanks to the Department of Computer Science, Wayne State University for giv-

    ing me the opportunity to study here and make this work possible.

    Finally, I wish to express my grateful acknowledgments to the professors in the

Department of Mathematics and Computer Science, West University of Timisoara.

    They taught me the background in computer science and mathematics during my

    undergraduate studies and encouraged me to pursue a doctoral degree.


TABLE OF CONTENTS

    Chapter Page

    DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

    ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

    LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

    LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

    CHAPTERS

CHAPTER - 1 INTRODUCTION . . . . . . . . . . . . . . . . . 1

    SECTION - 1.1 Simulating Culture in Algorithms . . . . . . . . 1

    SECTION - 1.2 Archaeological Knowledge Discovery Problem . 2

    SECTION - 1.3 Heuristics . . . . . . . . . . . . . . . . . . . . . 3

    SECTION - 1.4 Multi-agent Based Simulation . . . . . . . . . . 6

    CHAPTER - 2 EVOLUTIONARY LEARNING UNDER UNCER-

    TAINTY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    SECTION - 2.1 Uncertainty . . . . . . . . . . . . . . . . . . . . 8

    SUBSECTION - 2.1.1 Fuzzy Sets . . . . . . . . . . . . . . . . . 9

    SUBSECTION - 2.1.2 Rough Sets . . . . . . . . . . . . . . . . . 10

    SECTION - 2.2 Evolutionary Computation . . . . . . . . . . . . 10

    CHAPTER - 3 EXTRACTING PATTERNS FROM ARCHAE-

    OLOGICAL DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    SECTION - 3.1 Data Material . . . . . . . . . . . . . . . . . . . 13

    SECTION - 3.2 Large-Scale Data . . . . . . . . . . . . . . . . . 15


SECTION - 3.3 Uncertainty in Data . . . . . . . . . . . . . . . . 15

    SECTION - 3.4 Redundant Data . . . . . . . . . . . . . . . . . 17

    SECTION - 3.5 Dynamic Data . . . . . . . . . . . . . . . . . . . 18

    SECTION - 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . 18

    CHAPTER - 4 ROUGH SETS FORMALIZATION . . . . . . . 19

    SECTION - 4.1 Formal Definitions and Properties . . . . . . . . 19

    SECTION - 4.2 Discernibility Matrix . . . . . . . . . . . . . . . 21

    SECTION - 4.3 Indiscernibility Relations . . . . . . . . . . . . . 21

    SECTION - 4.4 Rough Sets . . . . . . . . . . . . . . . . . . . . 22

    SECTION - 4.5 Reducts and the Core of Knowledge . . . . . . . 23

    SECTION - 4.6 Reducts, Hitting Sets, Approximate Hitting Sets 25

    SECTION - 4.7 Decision Rules, Decision Systems . . . . . . . . 26

    SECTION - 4.8 Decision System Construction . . . . . . . . . . 27

    SECTION - 4.9 Steps for Model Construction . . . . . . . . . . 30

    CHAPTER - 5 EVOLUTION-BASED LEARNING OF ONTO-

    LOGICAL KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT

    SIMULATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    SECTION - 5.1 Introduction . . . . . . . . . . . . . . . . . . . . 33

    SECTION - 5.2 Decision Tree . . . . . . . . . . . . . . . . . . . 37

    SECTION - 5.3 Decision Trees and Rough Set Rules . . . . . . . 40

    SECTION - 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . 45

    SECTION - 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . 47

    CHAPTER - 6 RULE SET GENERATION FOR ALL REGIONS

    AND PERIODS OF THE VALLEY . . . . . . . . . . . . . . . . . . . 49

    SECTION - 6.1 Introduction . . . . . . . . . . . . . . . . . . . . 49

    SECTION - 6.2 Reducts and Their Rules for the Tierras Largas

    Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


SECTION - 6.3 San Jose Phase Reducts and Related Rules . . . 54

    SECTION - 6.4 Reducts for the Guadalupe Phase and Related

    Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    SECTION - 6.5 Reducts and related rules for the Rosario Phase 55

    SECTION - 6.6 Reducts and Related Rules for Monte Alban Ia 58

    SECTION - 6.7 Reducts and Related Rules of Monte Alban Late I 64

    SECTION - 6.8 Monte Alban II Reducts and Related Rules . . . 67

    SECTION - 6.9 Monte Alban IIIa reducts and Related Rules . . 71

    SECTION - 6.10 Comparison of the Rough Set Results with Deci-

    sion Trees Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    SECTION - 6.11 Conclusions . . . . . . . . . . . . . . . . . . . . 77

    CHAPTER - 7 PROTOTYPE OF THE MODEL . . . . . . . . 78

    SECTION - 7.1 Introduction . . . . . . . . . . . . . . . . . . . . 78

    SECTION - 7.2 Modeling the Environment in the Valley of Oaxaca 82

    SECTION - 7.3 Modeling the Impact of Changing Social Rela-

    tions on Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

    SECTION - 7.4 Acquiring the Rules Used to Constrain Agent Be-

    haviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

    SECTION - 7.5 Modeling Agent Interaction Using Evolutionary

    Game Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    SECTION - 7.6 Conclusions . . . . . . . . . . . . . . . . . . . . 91

    CHAPTER - 8 THE IMPACT OF RULE ONTOLOGIES ON

    MODEL PERFORMANCE . . . . . . . . . . . . . . . . . . . . . . . 92

    SECTION - 8.1 Introduction . . . . . . . . . . . . . . . . . . . . 92

    SECTION - 8.2 Experimental Setup . . . . . . . . . . . . . . . . 93

    SECTION - 8.3 Simulating Social Evolution from Tierras Largas

    Phase into San Jose Phase . . . . . . . . . . . . . . . . . . . . . . . . 93


SECTION - 8.4 San Jose to Guadalupe Results . . . . . . . . . 101

    SECTION - 8.5 Simulating from Guadalupe into Rosario . . . . 109

    SECTION - 8.6 Simulation of Rosario into Monte Alban I . . . . 110

    SECTION - 8.7 Simulating social evolution from Monte Alban Ia

    to Monte Alban Ib . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    SECTION - 8.8 Simulation from Monte Alban Ib to Monte Alban II125

    SECTION - 8.9 Conclusions . . . . . . . . . . . . . . . . . . . . 133

    CHAPTER - 9 CONCLUSION AND FUTURE RESEARCH . . 138

    SECTION - 9.1 Summary and Conclusions . . . . . . . . . . . . 138

    SECTION - 9.2 Future Work . . . . . . . . . . . . . . . . . . . . 140

APPENDICES

    APPENDIX - A DECISION TREE RULES . . . . . . . . . . . 141

    APPENDIX - B ROUGH SET REDUCTS . . . . . . . . . . . . 146

    APPENDIX - C ROUGH SET RULES . . . . . . . . . . . . . . 151

    BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    AUTOBIOGRAPHICAL STATEMENT . . . . . . . . . . . . . . . . . . . 176


LIST OF TABLES

    4.1 Example of Decision Table . . . . . . . . . . . . . . . . . . . . . . . . 19

    4.2 Full Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 21

    4.3 Core Discernibility Matrix . . . . . . . . . . . . . . . . . . . . . . . . 24

    4.4 Two Minimal Discernibility Matrix . . . . . . . . . . . . . . . . . . . 25

    4.5 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 28

    4.6 Exhaustive Decision System . . . . . . . . . . . . . . . . . . . . . . . 29

    5.1 Number of Terminal Nodes for each Phase . . . . . . . . . . . . . . . 38

    5.2 Decision Rule Set Induced from the Decision Tree . . . . . . . . . . . 40

    5.3 A Comparison of Decision Trees versus Rough Sets . . . . . . . . . . 42

    5.4 First Exhaustive Decision System for the Rosario Phase in Etla Region 43

    5.5 Second Exhaustive Decision System for the Rosario Phase in Etla Region 44

    5.6 A Comparison of the Rules Produced by Using Decision Trees (DT)

    and Rough Set (RS) Constraint Representations. . . . . . . . . . . . 47

    6.1 Reducts for Each of the First Three Phases in the Etla Region . . . . 52

    6.2 Reducts for Each of the Last Five Phases in the Etla Region . . . . . 52

    6.3 Reducts for Each of the Eight Phases in the Central Valley Region . . 52

    6.4 Reducts for Each of the Eight Phases in the Valle Grande Region . . 53

    6.5 Reducts for Each of the Eight Phases in the Tlacolula Region . . . . 53

    6.6 Reducts for Each of the Eight Phases in the Ocatlan Region . . . . . 53

    6.7 Comparison between DT and RS in Terms of Rules Length . . . . . 75

    6.8 Comparison between DT and RS in Terms of Average Rules Length . 75


6.9 Sensitivity and Specificity for DT and RS . . . . . . . . . . . . . . 76

    7.1 Environmental Variables . . . . . . . . . . . . . . . . . . . . . . . . . 85

    7.2 Land Use Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

    7.3 Payoff Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90


LIST OF FIGURES

    1.1 Knowledge Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    3.1 Complexity of the Data . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    3.2 A Description of the Oaxaca Valley . . . . . . . . . . . . . . . . . . . 14

    3.3 Visual Representation of Rough Sets . . . . . . . . . . . . . . . . . . 16

    3.4 Visual Representation of Fuzzy Sets . . . . . . . . . . . . . . . . . . . 17

    4.1 Example of Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 23

    4.2 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    4.3 Model Construction Phases . . . . . . . . . . . . . . . . . . . . . . . 32

    5.1 The decision tree for the locations of sites with evidence of raiding in

    Rosario phase in the Etla region of the valley . . . . . . . . . . . . . . 39

    7.1 Oaxaca Valley . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    7.2 Catchment Area for Sites in The Subgrid 7 . . . . . . . . . . . . . . . 84

    8.1 Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . . . 94

    8.2 DT Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 95

    8.3 RS Occupied Cells; Phase San Jose . . . . . . . . . . . . . . . . . . . 96

    8.4 Real Sites; Phase San Jose . . . . . . . . . . . . . . . . . . . . . . . . 97

    8.5 Relationships Network; Phase San Jose . . . . . . . . . . . . . . . . . 98

    8.6 DT Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 99

    8.7 RS Relationships Network; Phase San Jose . . . . . . . . . . . . . . . 100

    8.8 Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . 102

    8.9 DT Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . . . 103


8.10 RS Occupied Cells; Phase Guadalupe . . . . . . . . . . . . . . . . 104

    8.11 Real Sites; Phase Guadalupe . . . . . . . . . . . . . . . . . . . . . . . 105

    8.12 Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . . . 106

    8.13 DT Relationships Network; Phase Guadalupe . . . . . . . . . . . . . 107

    8.14 RS Relationships Network; Phase Guadalupe . . . . . . . . . . . . . . 108

    8.15 Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . 110

    8.16 DT Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . 111

    8.17 RS Occupied Cells; Phase Rosario . . . . . . . . . . . . . . . . . . . . 112

    8.18 Real Sites; Phase Rosario . . . . . . . . . . . . . . . . . . . . . . . . . 113

    8.19 Relationships Network; Phase Rosario . . . . . . . . . . . . . . . . . . 114

    8.20 DT Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 115

    8.21 RS Relationships Network; Phase Rosario . . . . . . . . . . . . . . . 116

    8.22 Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . . . 118

    8.23 DT Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 119

    8.24 RS Occupied Cells; Phase Monte Alban Early I . . . . . . . . . . . . 120

    8.25 Real Sites; Phase Monte Alban Early I . . . . . . . . . . . . . . . . . 121

    8.26 Relationship Network; Phase Monte Alban Early I . . . . . . . . . . . 122

    8.27 DT Relationship Network; Phase Monte Alban Early I . . . . . . . . 123

    8.28 RS Relationship Network; Phase Monte Alban Early I . . . . . . . . . 124

    8.29 Relationship Network; Phase Monte Alban Late I . . . . . . . . . . . 126

    8.30 DT Relationship Network - Big Sites; Phase Monte Alban Early I . . 127

    8.31 RS Relationship Network - Big Sites; Phase Monte Alban Early I . . 128

    8.32 DT Relationship Network - Monte Alban; Phase Monte Alban Early I 129

    8.33 RS Relationship Network - Monte Alban; Phase Monte Alban Early I 130

    8.34 DT Relationship Network - Big Sites; Phase Monte Alban Late I . . . 131

    8.35 RS Relationship Network - Big Sites; Phase Monte Alban Late I . . . 132

    8.36 DT Relationship Network - Monte Alban; Phase Monte Alban II . . . 133


8.37 RS Relationship Network - Monte Alban; Phase Monte Alban II . . . 134

    8.38 DT Relationship Network - Big Sites; Phase Monte Alban II . . . . . 135

    8.39 RS Relationship Network - Big Sites; Phase Monte Alban II . . . . . 136



    CHAPTER 1

INTRODUCTION

    1.1 Simulating Culture in Algorithms

    Culture is a complex system of general customs, ideas, values and beliefs, of a

    particular civilization, society or social group, especially at a particular time. In

    human societies, culture can be viewed as a system which stores information generated

    by groups of one or more individuals, in a symbolic form that is potentially accessible

    to all the members of the society. This model of a cultural system is helpful in

guiding the problem-solving activities and the social interactions of the society's

    members. Knowledge such as general customs, ideas, values, and beliefs are stored

    and manipulated at a symbolic level as opposed to a genetic level. This knowledge

    can be used to direct and control the self-adaptation and the evolution of the social

population. Thus, cultural systems are viewed as multiple-inheritance systems where,

    at each time step, knowledge at both the population level and the symbolic or belief

    level is transmitted to the next generation.

Cultural Algorithms were developed by Reynolds [37] in order to model evolution in systems based upon principles of human social evolution. Cultural Algorithms are population-based algorithms that provide a framework for the integration of an evolution-based search engine (e.g., a genetic algorithm) with a knowledge repository or

    belief space. The goal of the coevolutionary search engine is to identify individuals in

    a population relevant to the problem at hand. Next, relevant knowledge is extracted

    from these individuals and reasoned about in the belief space. This knowledge is then

    used to guide the individuals in the next time step.

    Basic application areas to which cultural algorithms have been applied include re-

    source scheduling, evolution of agriculture, concept learning applications, real-valued


    function optimization, genetic programming, fraud detection and recently data min-

    ing and knowledge discovery, and modeling dynamic environments.

    1.2 Archaeological Knowledge Discovery Problem

    Anthropologists interested in ancient societies of Highland Mesoamerica, Valley

of Oaxaca, have used intensive archaeological survey in order to study state formation. Since these archaeological surveys began in the 1960s, the computer has been an essential tool because of the large quantity of data resulting from the surveys.

    After the data was collected, it was placed on punch cards and the additional results

    published in several books [4, 5, 19] along with extensive site maps. The reason behind

    this archaeological survey was to find answers to the following questions: What were

    the characteristics of Mesoamerican agricultural systems? What role did hydraulic

    agriculture play in prompting or facilitating the growth of large population centers?

    When was irrigation first introduced? What was the nature of these population

    centers? When and where did urbanism first arise? What decision making structures

    and adaptations were necessary to facilitate these changes? [5].

    Our goal for the proposed research is to integrate evolutionary learning tools into

    the knowledge discovery process and to apply them to the large-scale, archaeological

    spatial-temporal data produced by the surveys. This heuristic based approach used

    here will employ rough set concepts in order to represent the domain knowledge and

    the hypotheses.

    While answers to the questions above can possibly be found by investigating the

    large-scale database resulting from the archaeological survey, this database contains

    over 2700 regional sites and over 2000 residential sites at the Monte Alban urban

    center. Each site is comprised of one or more components and can be occupied

in one or more archaeological periods, spanning a period from approximately 9000 B.C. to 1500 A.D. Thus, the total spatial and temporal scope is so vast as to make manual interpretation a difficult if not impossible task. In addition, each temporal and

    spatial instance of a site component can be described in terms of several hundred

    variables of differing types. We can clearly see a gap between data generation and

    data understanding here. Tools and techniques from Artificial Intelligence can be

    used to fill this gap and aid in the extraction of emergent patterns hidden in the

    data, as is shown by Reynolds [35, 38, 39].

    Classical tools for generating database queries are good when you know what to

look for, and can answer questions of the type "How many sites have ceramics from a given time period, and where are their locations?" The goal here is to interrogate this data

    in order to answer questions about archaic systems, systems that existed thousands

    of years ago. To answer these questions we can use a data driven search. Each record

    in the data set is accessed sequentially and associated with a node in the search

space. Data-driven searches minimize data accesses and can result in time-efficient query solving, but they are not good for complicated questions, like the ones above, questions that require reasoning about spatial and temporal patterns. They may

    require exhaustive search in the worst case. To be able to deal with such advanced

    queries, more intelligent search tools are needed. The problem now is how to find the

    right intelligent search strategy in order to reach our goal.

    1.3 Heuristics

    Uninformed or blind search, which processes and evaluates all nodes of a search

space in the worst case, is not realistic here because the time constraints are closely related to the dimension of the data. Generally, the search space increases exponen-

    tially with problem size thereby limiting the size of problems which can realistically

    be solved using exact techniques such as exhaustive search. An alternative solution

    is represented by heuristic techniques, which can provide much help in areas where

    classical search methods failed.


The word heuristic comes from Greek and means "to know, to find, to discover or to guide an investigation." Specifically, "heuristics are techniques which

    seek good (near-optimal) solutions at a reasonable computational cost without being

    able to guarantee either feasibility or optimality, or even in many cases to state how

close to optimality a particular feasible solution is" [42].

Heuristic refers to any technique that improves the average-case performance on a problem-solving task but does not necessarily improve the worst-case performance.

    Heuristic techniques search the problem space intelligently using knowledge of pre-

    viously tried solutions to guide the search into fruitful areas of the search space.

    Often, search spaces are so large that only heuristic search can produce a solution in

    reasonable time. These techniques improve the efficiency of a search process, some-

    times by sacrificing the completeness or the optimality of the solution. Heuristics are

    estimates of the distance remaining to the goal, estimates computed based on the

    domain knowledge.

    The domain knowledge provides help to heuristics in guiding the search and can

    be represented in a variety of knowledge formats. These formats include patterns,

    networks, trees, graphs, version spaces, rule sets, equations, and contingency tables.

With regard to heuristics there are a number of generic approaches such as greedy search, A* search, tabu search, simulated annealing, and population-based heuristics. The

    heuristic methods can be applied to a wide class of problems in optimization, classi-

    fication, statistics, recognition, planning and design.

    Of special interest is the integration of heuristic search principles with the dynamic

    processes in which data becomes available in successive stages, or where data and

inputs are subject to uncertainty, or with large-scale data sets. The integration is a vehicle to generate data-driven hypotheses. The process is shown in figure 1.1. Our goal is to generate hypotheses in terms of the archaeological data in order to help anthropologists answer their questions.


[Figure 1.1 depicts the dual interaction between data and model hypotheses, linking heuristic learning, the generating and testing of patterns, and acceptable solutions.]

Figure 1.1: Knowledge Discovery

    The kind of knowledge produced, and the heuristic search algorithm selected, will

    reflect the nature of the data analysis task. In this thesis, the hypotheses will be

    represented as sets of decision rules and the extracted rules will be represented in

    terms of rough sets. Rough sets were selected because of the nature of our data sets

    as we will discuss later, in chapter 3.

From a mathematical point of view, the problems that we want to solve can be formulated in terms of the well-known minimal set cover problem, which is a

    combinatorial optimization problem.

    Traditional methods for combinatorial optimization problems are not appropriate

here for several reasons. The underlying problems are NP-hard in the worst case, so exact methods would be costly to use given the size of the data set. Also, since archaeological efforts in the

    valley are ongoing, new sites can be added to the database which would require the

    traditional combinatorial approach to be restarted from scratch. The techniques used

    to solve these difficult optimization problems have slowly evolved from constructive

methods, like uninformed search, to local search techniques and to population-based

    algorithms.

    Cultural Algorithms, as population-based algorithms, are good vehicles in which

to build meta-level heuristics to guide the search more efficiently. That knowledge (here we will use rough set concepts, or rules) can be employed to direct the evolu-

    tionary search. The rules can reflect spatial and temporal patterns that will guide the

    generation of new candidate search objects by the evolutionary engine. The spatial

    and temporal continuity of the data will facilitate this process.

    1.4 Multi-agent Based Simulation

After obtaining decision rules through evolutionary techniques, we want to use them in a multi-agent model based upon Cultural Algorithms that simulates the evolution of complex social systems in the Valley of Oaxaca, Mexico. Here, we operationalize the "evolution without stages" model of state evolution described by Marcus and Flannery [23]. Agents are based at sites and have plans for production and resource exchange with other agents. Agents interact via an evolving game matrix that begins as a prisoner's dilemma model and can evolve to a warden's dilemma model.

    Cultural Algorithms are used to model agent learning as the environment evolves.

The organization of the thesis is as follows. Chapter 2 briefly describes evolutionary learning and gives a short literature review of related approaches. Chapter

    3 presents the archeological data sets and the data related problems. Chapter 4 is

    dedicated to the theory of Rough Sets. In Chapter 5 we make a comparison between

    the Decision Tree approach and Rough Set approach in building a set of rules for a

    multi-agent simulation. Chapter 6 presents the Rough Set rules for all five regions of

    the Valley using the environmental variables, including land use type. While the site

    data identified just 75 sites as targets of warfare, both decision trees and Rough Sets

    generalized on this data to identify 987 and 242 sites respectively that are predicted

    to be targets of warfare based upon their similarities to the known sites. Chapter 7

    describes the large-scale multi-agent simulation model. Chapter 8 describes the runs

    of the model using the Rough Set rules and the Decision Tree rules to see the semantic


impact of the two different rule sets. The resultant set of associations between sites

    in terms of trade and warfare demonstrated that the Rough Set rules produced a hi-

    erarchical structure of associations that was a better fit to the hierarchical structure

    observed in the data than that for Decision Trees. It is suggested that lack of consid-

    eration of uncertainty in the generation of decision trees led to an overgeneralization

    of the rules that describe warfare activities. Chapter 9 concludes with a discussion

of the impact that the rules used to determine agent interaction and expertise have on the emergence of phenomena in multi-agent systems, and the impact that the rule learning process has on the rules formed. It is suggested that, based upon this model, warfare is an important device for the structuring of emergent complexity in the social organization of the valley.


    CHAPTER 2

    EVOLUTIONARY LEARNING UNDER UNCERTAINTY

    Population-based heuristic methods are iterative solution techniques that handle

    a population of individuals which are evolving according to a given search strat-

    egy. At each iteration, periods of self-adaptation (mutations) alternate with periods

    of cooperation (crossover), and periods of competition (selection). The population-

    based heuristic search [8] is dependent of the following components: the knowledge

    representation for the specific problem we want to solve and the search strategy or

    the evolution process. The adaptability of an individual represents its ability to

    survive in an uncertain environment [7]. Artificial Intelligence researchers have ex-

    plored different ways to represent uncertainty [42]: belief networks, default reasoning,

    Dempster-Shafer theory, Fuzzy Set theory, Rough Set theory.

    For the problems we want to solve, the learning task will require a representation

    that explicitly deals with uncertainty. The evolutionary learning methods that are

    employed must be able to work with such a representation. In this chapter we look

first at basic ways to represent uncertainty in developing rules. Then we will investigate how that uncertain knowledge can be used to direct evolutionary search

    and learning.

    2.1 Uncertainty

    Uncertainty, as well as evolution, is a part of nature. When humans describe

    complex environments, they use linguistic descriptors of cognised real-world circum-

stances that are often not precise, but rather fuzzy. The theory of fuzzy sets [55] provides an effective method of describing the behavior of a system which is too complex to be handled with classical precise mathematical analysis. The theory


    of rough sets [33] emerged as another mathematical approach for dealing with uncer-

    tainty that arises from inexact, noisy or incomplete information. Fuzzy Set theory

    assumes that the membership of the objects in some set is defined as a degree rang-

    ing over the interval [0, 1]. Rough set theory focuses on the ambiguity caused by the

    limited distinction between objects in a given domain.

    2.1.1 Fuzzy Sets

    Fuzzy Sets have been employed to represent rules generated by evolutionary learn-

    ing systems.

    Using fuzzy concepts, Valenzuela-Rendon [50] tried to overcome the limitations

    of the conventional rule-based classifier system [14] when representing continuous

    variables. He used fuzzy logic to represent the results of the genetic-based search of

    the classifier system.

    Likewise, fuzzy functions have been used to describe and update knowledge in

    Cultural Algorithms. First, Reynolds et al. [40] employed a fuzzy acceptance and

    influence function in the solution of real-valued constrained optimization problems.

    Following the same idea Zhu designed a fully fuzzy Cultural Algorithm [57] which

    included a fuzzy knowledge representation scheme in order to deal with the continuous

    variables [56] in the belief space, as well as a fuzzy acceptance and influence function.

    All these approaches were tested on real-values function optimization problems. More

    recently, Jin [16] used a fuzzy knowledge representation for normative knowledge in

    the belief space of Cultural Algorithms, to solve the real-valued constrained function

    optimization.

The design of a fuzzy representation system is not an easy job, because the membership functions should be carefully chosen and the procedures that use these functions should be specified precisely. The problem is to optimize the fuzzy membership

    functions for a problem and to find optimum plans related to the fuzzy performance

measures. It is a natural approach to use heuristics (e.g., evolutionary algorithms) to

    solve this task.

    2.1.2 Rough Sets

    Another approach to represent uncertainty is with Rough Sets. Rough Sets are

    based on equivalence relations and set approximations, and the algorithms for com-

puting rough set properties are combinatorial in nature. Wroblewski [53] implemented a Genetic Algorithm for computing reducts, based on a permutation encoding, as well as a

    greedy algorithm. Another approach for building reducts is described by Vinterbo

    in [51, 52] and it is based on the set cover problem, in particular on finding minimal

    hitting sets using a classical Genetic Algorithm. Finding a minimal set of decision

rules or a satisfactory set is an NP-complete problem. Agotnes [1, 2] used a Genetic Algorithm to build an optimal set of decision rules, where the fitness function was

    based on the quality of each rule. In conclusion, there are many hybrid methods that

    integrate evolutionary algorithms and other methods from soft computing, methods

    such as Rough Sets.

    2.2 Evolutionary Computation

Evolution can be defined in a few words: adaptation in an uncertain environment. Nature has a robust way of dealing with the adaptation of organisms to all kinds of changes and of evolving successful organisms. According to the principles of natural

    selection, the organisms that have a good performance in a given environment, sur-

    vive and reproduce, whereas the others die off. After reproduction, a new generation

    of offspring, derived from the members of the previous generation is formed. The

    selection of parents from these offspring is often based upon fitness. Changes in the

environment will affect the population of organisms through random mutations. Mayr said that "evolution is a dynamic, two-step process of random variation and selection" [11]. Using examples from natural systems and theories of adaptive behavior

    researchers have been trying to build heuristic evolutionary learning systems.

Evolutionary algorithms are heuristic optimization methods inspired by natural evolution processes. Currently there are three basic population-only mechanisms that model evolution: genetic algorithms, evolutionary strategies and evolutionary programming. Each of the methods models the evolution of a population of individuals at a different scale and applies selection and reproduction operators to find an individual that is fit with regard to the fitness function. The Genetic Algorithm models evolution at the gene scale, while evolutionary strategies and evolutionary programming model evolution at the species level.

The Cultural Algorithms [37] approach adds another level to the evolutionary process, inspired by human societies and cultural evolution. It adds a belief space to the population space. The belief space will be a collection of symbolic

    knowledge that will be used to guide the evolution of the population.
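To make the interaction between the population space and the belief space concrete, the following Python sketch runs a minimal Cultural Algorithm on a toy real-valued objective. The objective function, the interval-based normative knowledge, and the simple acceptance and influence rules are illustrative assumptions only, not the representation developed later in this thesis.

    import random

    def fitness(x):
        # Toy objective (an assumption for illustration): maximize -sum(x_i^2).
        return -sum(v * v for v in x)

    DIM, POP_SIZE, GENERATIONS = 5, 30, 50
    population = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP_SIZE)]
    # Belief space: normative knowledge, i.e. one promising interval per dimension.
    belief = [(-5.0, 5.0)] * DIM

    for _ in range(GENERATIONS):
        # Acceptance function: the top 20% of the population updates the belief space.
        elites = sorted(population, key=fitness, reverse=True)[: POP_SIZE // 5]
        belief = [(min(e[i] for e in elites), max(e[i] for e in elites)) for i in range(DIM)]
        # Influence function: variation is biased toward the normative intervals.
        offspring = []
        for parent in population:
            child = []
            for i, v in enumerate(parent):
                lo, hi = belief[i]
                width = (hi - lo) or 1e-6
                child.append(v + random.gauss(0, width / 2) if lo <= v <= hi
                             else random.uniform(lo, hi))
            offspring.append(child)
        # Selection: keep the best individuals from parents and offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:POP_SIZE]

    print("best fitness:", max(map(fitness, population)))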

    These evolutionary learning approaches have been used to perform the general

    rule extraction task that we are interested in. For example, Holland [15] proposed the

learning classifier system that uses genetic search as the primary discovery heuristic for

    rules. Classifier systems are inductive models capable of learning, in a reinforcement

    learning regime, by using the evolutionary process schema. The reward is given to

    individual rules not to a complete rule set.

    Besides the rule based methods, Decision Trees are well known for their inductive

    learning capabilities. Any decision tree can be reformulated as a set of rules. One

of the problems related to Decision Trees is finding the smallest decision tree; simple greedy heuristics are typically used to approximate it. Researchers have tried to integrate Genetic

    Algorithms with Decision Tree learning in order to solve complex classification prob-

lems [3, 38, 39]. Bala et al. [3, 49] applied the above methodology to difficult visual

    recognition problems involving satellite and facial image data. Other researchers com-


    bined the Genetic Algorithms or Evolutionary Strategies with neural networks. Yang

et al. [54] used a Genetic Algorithm for the feature subset selection problem for neural

    network classifiers.

    Reynolds et al. [35, 38, 39] investigated the use of Cultural Algorithms to guide

    decision tree learning. The data was taken from a real world archeological database,

    with a collection of sites found in Valley of Oaxaca, Mexico. The problem was to

identify the sites that present evidence of warfare as opposed to those that did not.

    Cultural Algorithms were also used to generate rule-based classifiers for the fraud

    detection problem [44]. The task is hard because of the dynamic environments, but

    the Cultural Algorithms provide self-adaptive capabilities to deal with the dynamic

    changes.


    CHAPTER 3

    EXTRACTING PATTERNS FROM ARCHAEOLOGICAL DATA

Here we will work with the archaeological data from [19], a survey of the Valley of Oaxaca, Mexico. These volumes describe and analyze the data collected during the Valley of Oaxaca Settlement Project. The project consisted of a detailed mapping and surface collection of the region's sites plus the major archaeological site, Monte Alban, part of the Oaxaca valley, as shown in figure 3.1.

[Figure 3.1 depicts the hierarchical structure of the data: the Oaxaca Valley is divided into regions (Etla, Central Valley, Valle Grande, Tlacolula, Ocatlan), regions into squares, squares into sites, and sites into components.]

Figure 3.1: Complexity of the Data

    3.1 Data Material

Figure 3.2: A Description of the Oaxaca Valley

In the Valley of Oaxaca the basic descriptive unit is the site, of which there are around 2700 in an area of about 1700 km2. The detailed description for each terrace contains 100 attributes. This data format was obtained after scanning the data from

    [19]. First, a series of maps were drawn that show Valley of Oaxaca occupation at

    various periods of time by plotting those categories specific to each period. Based

    on these maps, population estimates were made for each period. Second, patterns

    of differential distribution of ceramic categories were illustrated, within the site by

    plotting their densities of occurrence by site division. Little statistical analysis [5],

    [19] or data analysis was done previously. In our project the terrace data and the

    ceramic categories are put together for the first time in order to discover the reasons

    for growth and decline of Valley of Oaxaca and Monte Alban over time.

    3.2 Large-Scale Data

    One of the most important problems in data analysis relates to the dimensionality

    of the data, because many data analysis techniques involve exhaustive search over

    the object space. They are very sensitive to the size of the data in terms of time

    complexity and it is hard to generate compact rules. The solution is to reduce the

    search space horizontally (in terms of records or objects) and vertically (in terms of

    fields or attributes or variables), and to use heuristics to guide the search through the

large space of possible combinations of attribute values and classes. Our data set,

    for example, contains over 6000 records and over 100 attributes.

    3.3 Uncertainty in Data

    Uncertainty in a data set can appear for different reasons. One reason is noise.

Errors which can occur during data collection or data entry are referred to as noise in the data. It is also possible that the data set can have missing attribute values. In this case, the objects containing missing attribute values can be discarded or the

    missing values can be replaced with the most common values. Another problem is

    that the available knowledge in many situations is incomplete and imprecise. This


    means that sometimes the attribute values for a set of objects are not sufficient and

precise enough to differentiate between classes of objects. When we are talking about the Oaxaca Data Set, errors and noise may have occurred for many reasons. The ancient sites are damaged because of plowing, erosion, pot hunting and grazing. Also,

    human perception is subjective, and many people worked on the collection of the

    data. Some errors are possible due to the scanning process since much of the data

    was available from printed text only.

    Many different ways of representing and reasoning about uncertainty have been

developed in Artificial Intelligence. These theories include: belief networks, non-

    monotonic logic, Fuzzy Sets along with Fuzzy Logic and Rough Sets. The well-known

    Fuzzy Set theory [55] characterizes a concept approximately using a set membership

    function with a range of values between 0 and 1. Another approach based on the rough

    set theory [33] provides a lower and upper approximation in terms of set belonging

    of a concept depending on how the relationship between two partitions of a finite

    universe is defined.

    Figure 3.3: Visual Representation of Rough Sets

    Fuzzy Sets are good approaches for problems with multiple membership grade

    requirements (figure 3.4), where judgment on set membership grades is possible and


    Figure 3.4: Visual Representation of Fuzzy Sets

    where the ability to deal with vague predicates is required. They are very good for

real-valued data. On the other hand, Rough Sets, with their three-valued simplicity (lower, upper, and boundary approximation sets, figure 3.3), work well on discrete and

    categorical data. Rough Sets can be useful even with missing data, changes of scale,

    and problems where membership grades are hard to define, and problems requiring

    changes in the partition. Checking the attributes table for the Oaxaca Data Set we

    can see that out of all attributes only very few attributes are integer in type with

    no real-valued data types. All the other attributes are of the categorical data type,

nominal, ordinal, or binary. We want to find which sites were occupied in each period of time, so we have to deal with data partitions. These facts suggest that rough set methods are more appropriate here.

    3.4 Redundant Data

    In addition to incomplete data, a data set may contain redundant or insignificant

attributes with respect to the problem, or variables that are obscure and have uninteresting relationships. This case might arise in several situations; for example, redundant attributes may result from combining relational data tables. A solution to this problem exists in the form of feature selection algorithms, such as the reduct computation in the rough set case. After discarding the irrelevant attributes we


    may have horizontal or object related redundancy. This can be solved by applying

    horizontal pruning methods, or merging identical objects.

    For our data set, looking in the attributes table, we can see first that not all of the

    attributes are relevant to our problem. Fields like Recorder, Published information,

    and Written comments are useful for detecting noise, but are less important for the

    overall analysis and can be removed after the preliminary stage. Secondly, we will

    apply the reduct algorithm in order to find a subset of significant attributes.

    3.5 Dynamic Data

    A fundamental characteristic of real world data sets is that they are dynamic,

    which means that their contents are often changing over time. There are two impor-

tant aspects of this problem. First, the run-time efficiency of the knowledge model

    becomes very important. Second, the knowledge model will no longer be static, but

    should have the capability of evolving as data changes over time. Our data sets change

    when new sites or ceramic types are discovered. One important characteristic of our

data sets is their temporal and spatial dimensions. The time periods span from 1400 B.C. to 500 A.D. Spatially speaking, we work with the Valley of Oaxaca data, covering 1700 km2. In order to find the changes between different periods of time our models have to change, and we will gain insight into what a dynamic data set is.

    3.6 Conclusion

    In this chapter a motivation for explicitly representing uncertainty in our knowl-

edge base was presented. The advantages of a Rough Set approach for representing uncertain knowledge were outlined. Then, a methodological framework for applying rough sets to this data set was discussed. In the next chapter we describe the formal

    properties of rough sets.


    CHAPTER 4

    ROUGH SETS FORMALIZATION

    Pawlak [33] introduced Rough Set theory in the early 1980s as a tool for rep-

    resenting imprecise or uncertain information, and for reasoning about it. Based on

the notion of indiscernibility, rough set theory deals with the approximation of sets,

    using equivalence relations. These approximations can form model hypotheses. Many

    different applications can be found in the literature, but here we focus on the appli-

    cations to the classification problem, in order to answer the questions posed in the

    first chapter.

    4.1 Formal Definitions and Properties

      c1  c2  c3  c4  c5   d
o1     3   2   2   0   2   0
o2     1   3   3   1   2   0
o3     0   1   0   1   2   0
o4     2   1   3   0   0   0
o5     3   2   2   0   1   0
o6     1   1   2   0   2   0
o7     3   0   0   3   2   0
o8     0   0   3   2   1   0
o9     2   1   1   1   2   0
o10    0   3   2   0   0   0
o11    2   0   2   1   2   1
o12    1   3   2   2   2   1
o13    0   1   1   0   2   1
o14    3   1   0   1   2   1
o15    3   3   0   2   2   1
o16    1   2   3   2   0   1
o17    2   1   3   2   2   1
o18    1   2   3   1   0   1
o19    2   2   1   1   2   1
o20    2   1   1   1   2   1

Table 4.1: Example of Decision Table

    An information system can be defined as a pair S = (U,A), where U is a finite

set of objects and A is a finite set of attributes. Each attribute a ∈ A is a function that maps elements of U into a set Va, called the attribute domain of attribute a:

a : U → Va    (4.1)

Let S = (U,A) be an information system and let C,D ⊆ A be two subsets of attributes, called the condition and the decision attributes respectively. A condition attribute is an attribute that is thought to influence another attribute, the decision attribute. An information system with distinguished condition and decision attributes is called a decision table and is denoted by T = (U,A,C,D). Because a table with more than one decision attribute can be easily transformed into a similar table with only one decision attribute, usually the set of decision attributes contains only one decision attribute, denoted D = {d}, and T = (U,C, {d}). The decision attribute d determines a partition of the object space U. The partition's elements are named decision classes. In our example from table 4.1, U is the set of objects in the decision table, U = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}, C is the set of condition attributes relevant to the current classification problem, C = {c1, c2, c3, c4, c5}, and d is the variable whose values define the classification task.

In this example we have two decision classes:

D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10}

D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}

With every x ∈ U we associate a function dx which gives the value from Vc for a given attribute c, dx : C ∪ {d} → V ([33]), such that dx(c) = c(x) for every c ∈ C ∪ {d}. dx|C and dx|d are the restrictions of dx to C and d, respectively.

For every x, y ∈ U, we say that the object x is discernible if for every y such that x ≠ y, dx|C = dy|C implies dx|d = dy|d; otherwise the object is indiscernible. A decision table is consistent if all of the objects are discernible; otherwise it is inconsistent.
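As an illustration, the decision table of Table 4.1 and the notions defined above can be computed directly; in the Python sketch below the data values are those of Table 4.1, while the variable and function names are chosen only for this example.

    # Decision table from Table 4.1: object -> ((c1, c2, c3, c4, c5), d).
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o2":  ((1, 3, 3, 1, 2), 0),
        "o3":  ((0, 1, 0, 1, 2), 0), "o4":  ((2, 1, 3, 0, 0), 0),
        "o5":  ((3, 2, 2, 0, 1), 0), "o6":  ((1, 1, 2, 0, 2), 0),
        "o7":  ((3, 0, 0, 3, 2), 0), "o8":  ((0, 0, 3, 2, 1), 0),
        "o9":  ((2, 1, 1, 1, 2), 0), "o10": ((0, 3, 2, 0, 0), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o12": ((1, 3, 2, 2, 2), 1),
        "o13": ((0, 1, 1, 0, 2), 1), "o14": ((3, 1, 0, 1, 2), 1),
        "o15": ((3, 3, 0, 2, 2), 1), "o16": ((1, 2, 3, 2, 0), 1),
        "o17": ((2, 1, 3, 2, 2), 1), "o18": ((1, 2, 3, 1, 0), 1),
        "o19": ((2, 2, 1, 1, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    # Decision classes: the partition of U induced by the decision attribute d.
    classes = {}
    for obj, (_, d) in TABLE.items():
        classes.setdefault(d, []).append(obj)
    print(classes[0])  # D0 = o1 .. o10
    print(classes[1])  # D1 = o11 .. o20

    # The table is consistent iff equal condition vectors always imply equal decisions.
    consistent = all(dx == dy
                     for cx, dx in TABLE.values()
                     for cy, dy in TABLE.values()
                     if cx == cy)
    print("consistent:", consistent)  # False: o9 and o20 agree on c1..c5 but not on d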


    4.2 Discernibility Matrix

A decision table T = (U,C, {d}) defines a matrix MdC, called the discernibility matrix modulo decision attribute d [26]. For x, y ∈ U, with the condition that x and y are from different decision classes, dx|d ≠ dy|d,

MdC(x, y) = {c ∈ C | c(x) ≠ c(y) and d(x) ≠ d(y)}    (4.2)

Since objects are divided into decision classes, we do not have to discern between objects that belong to the same class. Table 4.2 shows the discernibility matrix for our decision table from table 4.1. The entries in the table represent sets of condition attributes. For example, for the row o11 and column o1, the entry 1,2,4 identifies the condition attribute set {c1, c2, c4}.

      o1         o2        o3         o4          o5          o6         o7         o8          o9        o10
o11   1,2,4      1,2,3     1,2,3      2,3,4,5     1,2,4,5     1,2,4      1,3,4      1,3,4,5     2,3       1,2,4,5
o12   1,2,4      3,4       1,2,3,4    1,2,3,4,5   1,2,4,5     2,4        1,2,3,4    1,2,3,5     1,2,3,4   1,4,5
o13   1,2,3      1,2,3,4   3,4        1,3,5       1,2,3,5     1,3        1,2,3,4    2,3,4,5     1,4       2,3,5
o14   2,3,4      1,2,3     1          1,3,4       2,3,4,5     2,3,4,5    2,4        1,2,3,4,5   1,3       1,2,3,4,5
o15   2,3,4      1,3,4     1,2,4      1,2,3,4,5   2,3,4,5     1,2,3,4    2,4        1,2,3,5     1,2,3,4   1,3,4,5
o16   1,3,4,5    2,4,5     1,2,3,4,5  1,2,4       1,3,4,5     2,3,4,5    1,2,3,4,5  1,2,5       1,2,3,4,5 1,2,3,4
o17   1,2,3,4    1,2,4     1,3,4      4,5         1,2,3,4,5   1,3,4      1,2,3,4    1,2,5       3,4       1,2,3,4,5
o18   1,3,4,5    2,5       1,2,3,5    1,2,4       1,3,4,5     2,3,4,5    1,2,3,4,5  1,2,4,5     1,2,3,5   1,2,3,4
o19   1,3,4      1,2,3     1,2,3      2,3,4,5     1,3,4,5     1,2,3,4    1,2,3,4    1,2,3,4,5   2         1,2,3,4,5
o20   1,2,3,4    1,2,3     1,3        3,4,5       1,2,3,4,5   1,2,3,4    1,2,3,4    1,2,3,4,5   -         1,2,3,4

Table 4.2: Full Discernibility Matrix
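The discernibility matrix of equation (4.2) can be computed directly from the decision table. The sketch below does so for a few rows of Table 4.1; attribute indices 1 to 5 stand for c1 to c5, as in Table 4.2.

    # A few rows of Table 4.1 (abridged): object -> ((c1..c5), d).
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o2":  ((1, 3, 3, 1, 2), 0),
        "o9":  ((2, 1, 1, 1, 2), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o12": ((1, 3, 2, 2, 2), 1),
    }

    def discernibility_matrix(table):
        """M(x, y): indices of condition attributes on which x and y differ,
        computed only for pairs of objects from different decision classes."""
        matrix = {}
        for x, (cx, dx) in table.items():
            for y, (cy, dy) in table.items():
                if dx != dy:
                    matrix[(x, y)] = {i + 1 for i in range(len(cx)) if cx[i] != cy[i]}
        return matrix

    for (x, y), attrs in sorted(discernibility_matrix(TABLE).items()):
        print(x, y, sorted(attrs))
    # e.g. ('o11', 'o1') -> [1, 2, 4], the entry "1,2,4" in Table 4.2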

    4.3 Indiscernibility Relations

A discernibility matrix MdC defines a binary relation RdC ⊆ U². The relation RdC is called an indiscernibility relation with respect to C and d, and reveals the pairs of objects from different classes that we cannot discern between. For x, y ∈ U, under the condition that x and y are from different decision classes, dx|d ≠ dy|d,

x RdC y ⟺ MdC(x, y) = ∅    (4.3)


RdC is an equivalence relation that has three basic properties:

1. Reflexivity. For all x ∈ U, x RdC x.

2. Symmetry. For all x, y ∈ U, x RdC y ⟹ y RdC x.

3. Transitivity. For all x, y, z ∈ U, if x RdC y and y RdC z, then x RdC z.

The equivalence relation RdC induces a partition over the universe U, meaning that the resultant equivalence classes are disjoint and their union equals the universe U. The indiscernibility set of an object x ∈ U is denoted as

RdC(x) = {y ∈ U | x RdC y}.    (4.4)

In our example, RdC(o9) = RdC(o20) = {o9, o20}.
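Equivalently, the indiscernibility classes can be obtained by grouping objects with identical condition vectors, as in the sketch below, which uses an abridged subset of Table 4.1.

    # Abridged rows of Table 4.1, including the indiscernible pair o9 / o20.
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o9":  ((2, 1, 1, 1, 2), 0),
        "o11": ((2, 0, 2, 1, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    def indiscernibility_classes(table):
        """Group objects that share the same condition-attribute vector;
        each group is one equivalence class of the indiscernibility relation."""
        groups = {}
        for obj, (conds, _) in table.items():
            groups.setdefault(conds, set()).add(obj)
        return list(groups.values())

    for eq_class in indiscernibility_classes(TABLE):
        print(sorted(eq_class))
    # ['o1'], ['o20', 'o9'], ['o11']: o9 and o20 cannot be discerned, as in the text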

    4.4 Rough Sets

    The idea behind Rough Sets is to approximate a set of interest in terms of other

    sets.

With each subset X ⊆ U and an equivalence relation RdC defined over U we can associate two subsets:

R̲dC X = {x ∈ U | RdC(x) ⊆ X}    (4.5)

R̄dC X = {x ∈ U | RdC(x) ∩ X ≠ ∅}    (4.6)

called the lower and upper approximations of X, respectively. The set R̲dC X is the set of all elements of U which can be classified with certainty as elements of X, in terms of the equivalence relation RdC. This set is also called the positive region of X. The set R̄dC X is the set of elements of U which can be classified, with some uncertainty, as elements of X. The set R̄dC X − R̲dC X will be called the RdC boundary of X.

Figure 4.1: Example of Rough Sets

We consider two subsets D0, D1 ⊆ U, D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10} and D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}, and the equivalence relation RdC over U defined from the decision table from table 4.1 and the discernibility matrix from table 4.2. Then we compute the lower and the upper approximations for the two subsets D0, D1 (figure 4.1):

R̲dC D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o10}
R̄dC D0 = {o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o20}
R̲dC D1 = {o11, o12, o13, o14, o15, o16, o17, o18, o19}
R̄dC D1 = {o9, o11, o12, o13, o14, o15, o16, o17, o18, o19, o20}
R̄dC D0 − R̲dC D0 = R̄dC D1 − R̲dC D1 = {o9, o20}
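The lower and upper approximations of equations (4.5) and (4.6) follow directly from the indiscernibility classes. The sketch below uses only a subset of Table 4.1, which is enough to reproduce the boundary {o9, o20}.

    # Abridged rows of Table 4.1; o9 and o20 share condition values but differ in d.
    TABLE = {
        "o1":  ((3, 2, 2, 0, 2), 0), "o5":  ((3, 2, 2, 0, 1), 0),
        "o9":  ((2, 1, 1, 1, 2), 0), "o11": ((2, 0, 2, 1, 2), 1),
        "o17": ((2, 1, 3, 2, 2), 1), "o20": ((2, 1, 1, 1, 2), 1),
    }

    # Equivalence classes of the indiscernibility relation (equal condition vectors).
    ind_classes = {}
    for obj, (conds, _) in TABLE.items():
        ind_classes.setdefault(conds, set()).add(obj)

    def approximations(target):
        """Lower approximation: union of classes entirely inside the target set.
        Upper approximation: union of classes that intersect the target set."""
        lower = {o for cls in ind_classes.values() if cls <= target for o in cls}
        upper = {o for cls in ind_classes.values() if cls & target for o in cls}
        return lower, upper

    D0 = {o for o, (_, d) in TABLE.items() if d == 0}
    lower, upper = approximations(D0)
    print("lower:", sorted(lower))             # ['o1', 'o5']
    print("upper:", sorted(upper))             # ['o1', 'o20', 'o5', 'o9']
    print("boundary:", sorted(upper - lower))  # ['o20', 'o9']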

    4.5 Reducts and the Core of Knowledge

    One problem is whether some of the attributes in a decision system are redundant

with respect to the object classifications. If an attribute set B ⊆ C preserves the indiscernibility relation RdC, then the attributes that form the set C − B are said to be dispensable. All minimal subsets, in terms of size, of attributes B that preserve the relation RdC are called reducts, and we denote the set of them by Red(T).


The intersection of all minimal subsets of attributes (reducts) is called the core of the attribute set C, and it is denoted by Core(C). The core can be defined in terms of the discernibility matrix as the set of all single-element entries of the discernibility matrix:

Core(C) = {c ∈ C | |MdC(x, y)| = 1 for some x, y}    (4.7)

In our case, looking at table 4.2, we can conclude that Core(C) = {c1, c2}.

      o1    o2    o3    o4     o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2      1,2   1,2   1     1     2     1,2
o12   1,2   3,4   1,2   1,2    1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3,4   1      1,2   1     1,2   2     1     2
o14   2     1,2   1     1      2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2    2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4,5    1,2   1     1,2   1,2   3,4   1,2
o18   1     2     1,2   1,2    1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2      1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,4,5  1,2   1,2   1,2   1,2   -     1,2

Table 4.3: Core Discernibility Matrix

Now, we can define the full set of reducts in terms of the discernibility matrix. The set B, such that B ⊆ C, is a reduct of C if B is a minimal, with respect to inclusion, subset of C such that B ∩ MdC(x, y) ≠ ∅ for every nonempty entry MdC(x, y). Looking at table 4.3, it can easily be seen that there are two reducts in our example, Red(T) = {{c1, c2, c4}, {c1, c2, c3, c5}}, shown in table 4.4.

Besides the full reducts defined above, we can define reducts that are relative to a particular object in the decision table. We call these reducts object-related reducts. If indiscernibility is relative to an object x, two other objects y and z are considered to be indiscernible in comparison with x. Reducts that are related to a particular object x are called x-relative reducts, Red(T, x), since they contain the minimum information needed to select that particular object from other objects in the decision table.


Reduct {c1, c2, c4}:
      o1    o2    o3    o4   o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2    1,2   1,2   1     1     2     1,2
o12   1,2   4     1,2   1,2  1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   4     1    1,2   1     1,2   2     1     2
o14   2     1,2   1     1    2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2  2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     4    1,2   1     1,2   1,2   4     1,2
o18   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2    1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     4    1,2   1,2   1,2   1,2   -     1,2

Reduct {c1, c2, c3, c5}:
      o1    o2    o3    o4   o5    o6    o7    o8    o9    o10
o11   1,2   1,2   1,2   2    1,2   1,2   1     1     2     1,2
o12   1,2   3     1,2   1,2  1,2   2     1,2   1,2   1,2   1
o13   1,2   1,2   3     1    1,2   1     1,2   2     1     2
o14   2     1,2   1     1    2     2     2     1,2   1     1,2
o15   2     1     1,2   1,2  2     1,2   2     1,2   1,2   1
o16   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o17   1,2   1,2   1     5    1,2   1     1,2   1,2   3     1,2
o18   1     2     1,2   1,2  1     2     1,2   1,2   1,2   1,2
o19   1     1,2   1,2   2    1     1,2   1,2   1,2   2     1,2
o20   1,2   1,2   1     3,5  1,2   1,2   1,2   1,2   -     1,2

Table 4.4: Two Minimal Discernibility Matrix

    There are several algorithms for computing reducts or reduct approximations.

These algorithms assume that any attribute subset of C can be an approximation

    to a reduct. The Exhaustive Reducer algorithm [28, 29, 30] computes all the reducts

    by brute force, by exhaustive search. The algorithm takes exponential time in terms

    of the number of objects, so it is not suitable for very large decision systems as it

    may be very time consuming. Another algorithm is the Johnson Reducer [28, 29, 30]

    which invokes a simple greedy algorithm to compute only a single reduct. Because

    of the NP-completeness of the problem, heuristic methods can be more effective.

    Wroblewski [53] proposed a variation of a Genetic Algorithm to search for reducts,

    either until the search space is exhausted or until a given maximum number of reducts

    has been found. Another heuristic approach was proposed by Vinterbo [51, 52]. It is

    based on minimal hitting sets. We will present this method in the following section.
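As an illustration of the greedy strategy behind the Johnson Reducer mentioned above, the sketch below repeatedly selects the attribute that covers the most remaining discernibility-matrix entries; the entries themselves are illustrative, and the result is only a single reduct approximation.

    from collections import Counter

    # Non-empty discernibility-matrix entries, as sets of attribute indices.
    # These few entries are illustrative, not the full matrix of Table 4.2.
    ENTRIES = [{1, 2, 4}, {3, 4}, {2, 3}, {1, 3, 4}, {2, 4}, {1, 2, 3, 5}]

    def johnson_style_reduct(entries):
        """Greedy heuristic in the spirit of the Johnson Reducer: repeatedly pick
        the attribute that occurs in the most still-uncovered entries until every
        entry is hit. The result is a single reduct approximation, not guaranteed
        to be minimal."""
        uncovered = [set(e) for e in entries if e]
        reduct = set()
        while uncovered:
            counts = Counter(a for e in uncovered for a in e)
            best, _ = counts.most_common(1)[0]
            reduct.add(best)
            uncovered = [e for e in uncovered if best not in e]
        return reduct

    print(johnson_style_reduct(ENTRIES))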

    4.6 Reducts, Hitting Sets, Approximate Hitting Sets

    Multisets are unordered collections of elements where an element can occur as a

    member more than once. A hitting set [51, 52] for a given multiset, MS, of elements

from 2^C is a set B, B ⊆ C, such that the intersection between B and every set in MS is non-empty:

HS(MS) = {B ⊆ C | B ∩ MSi ≠ ∅ for all MSi ∈ MS}    (4.8)

A set B ∈ HS(MS) is a minimal hitting set of MS if B is no longer a hitting set whenever any of its elements is removed. The set of minimal hitting sets is denoted

    by minHS(MS).

An approximation to the hitting set is a set that covers enough elements of the multiset MS, as specified by a constant ε. The set of ε-approximate hitting sets of MS is denoted by HS(MS, ε), where the parameter ε controls the degree of approximation,

HS(MS, ε) = {B ⊆ C | |{MSi ∈ MS | MSi ∩ B ≠ ∅}| / |MS| ≥ ε}    (4.9)

The set B ∈ HS(MS, ε) is a minimal ε-approximate hitting set if it is no longer an ε-approximate hitting set when any of its elements is removed.

    The problem of computing the minimal hitting set, like the reducts computation,

    is an NP-hard problem. Again, it is necessary to use heuristics in order to find reducts

    using hitting sets, but we still cannot guarantee the minimality of the reducts.
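The following small Python sketch, with toy sets, shows how the ε-approximate hitting set condition of equation (4.9) can be tested for a candidate attribute set B; it is illustrative only and not part of the heuristics cited above.

```python
def hit_fraction(B, MS):
    """Fraction of the multiset's elements intersected by candidate attribute set B."""
    B = set(B)
    hit = sum(1 for ms in MS if B & set(ms))
    return hit / len(MS)

def is_approx_hitting_set(B, MS, eps):
    """Epsilon-approximate hitting set test from Eq. (4.9)."""
    return hit_fraction(B, MS) >= eps

MS = [{0, 1}, {1, 3}, {0, 2, 3}, {2}]          # a toy multiset of attribute sets
print(hit_fraction({1}, MS))                   # 0.5: hits 2 of the 4 sets
print(is_approx_hitting_set({1}, MS, 0.75))    # False
print(is_approx_hitting_set({1, 2}, MS, 0.75)) # True: hits all 4 sets
```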

    4.7 Decision Rules, Decision Systems

A decision rule is an assertion of the form "if p then s", denoted by p → s, where p and s are logical formulas in first-order logic. For each object, certain

    values of the condition attributes determine the value of the decision attribute. We

    define a decision system as a finite collection or set of decision rules. In order to

    obtain a decision system with a minimum number of rules, superfluous decision rules

    associated with the same decision class can be eliminated without disturbing the

    decision making process.

Let D be a decision system and let T = (U, C, {d}) be an information system. The set of all decision rules in D having the same successor s as a decision attribute

value is denoted by Ds. We can say that Ds is the decision subsystem associated


    with class s. The set of all predecessor formulas of the decision rules belonging to Ds,

is denoted by Ps. We say that a decision rule p → s from D is dispensable in D if the disjunction of all the formulas in Ps is equivalent to the disjunction of the formulas

in Ps \ {p}. Otherwise, the rule is indispensable. If every decision rule belonging to Ds is indispensable, the decision subsystem Ds is said to be independent. For a

    decision system, if all its decision subsystems Ds are independent, we say that the

    decision system is minimal.
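A toy Python sketch of this dispensability test is given below. Rules are represented as attribute-value conjunctions, and equivalence of the disjunctions is checked by comparing the sets of objects they cover; all names and values are hypothetical.

```python
def covers(rule, obj):
    """rule: dict attribute -> required value; obj: dict attribute -> value."""
    return all(obj.get(a) == v for a, v in rule.items())

def coverage(rules, objects):
    """Objects matched by the disjunction of the rules' predecessors."""
    return {name for name, obj in objects.items() if any(covers(r, obj) for r in rules)}

def dispensable(rule, rules, objects):
    """A rule is dispensable if removing it leaves the covered set unchanged."""
    others = [r for r in rules if r is not rule]
    return coverage(rules, objects) == coverage(others, objects)

objects = {"o1": {"c1": 0, "c2": 1}, "o2": {"c1": 1, "c2": 1}, "o3": {"c1": 0, "c2": 0}}
subsystem = [{"c2": 1}, {"c1": 0}]          # both rules conclude the same class s
print(dispensable(subsystem[0], subsystem, objects))  # False: only {"c2": 1} covers o2
```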

    4.8 Decision System Construction

    The problem of decision system construction is to induce a set of rule descriptors

    of decision classes from the input set of objects. These sets of descriptors, named

    decision systems, consist of a set of decision rules. We can classify the decision

systems as follows:

    1. Decision systems with a minimum set of rules. They are focused on describing

    input objects using a minimum number of necessary rules.

2. Decision systems with an exhaustive set of rules (Tables 4.5 and 4.6). These decision

    systems contain all possible decision rules.

    3. Decision systems with a satisfactory set of rules. This category represents sets

of decision rules which satisfy given a priori user requirements for an acceptable

    decision system.

    One strategy for finding a simple decision system with good classificatory capabil-

    ities is to first induce an exhaustive set of rules, and then to prune away those rules

whose removal does not significantly lower the decision system's performance. An exhaustive de-

    cision system can be generated from the object-related reducts [26, 27], as described

    in section 4.5.


Rule                                              1   2     3        4     5        6        7   8
1   C2(0) AND C4(2) => D(0)                       1   1     1        0.05  0.1      1        2   1
2   C2(3) AND C4(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
3   C2(2) AND C4(0) => D(0)                       2   2     1        0.1   0.2      1        2   1
4   C2(3) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
5   C1(0) AND C4(2) => D(0)                       1   1     1        0.05  0.1      1        2   1
6   C1(1) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
7   C1(2) AND C4(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
8   C1(0) AND C4(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
9   C1(3) AND C4(0) => D(0)                       2   2     1        0.1   0.2      1        2   1
10  C4(3) => D(0)                                 1   1     1        0.05  0.1      1        1   1
11  C1(0) AND C2(3) => D(0)                       1   1     1        0.05  0.1      1        2   1
12  C1(0) AND C2(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
13  C1(3) AND C2(0) => D(0)                       1   1     1        0.05  0.1      1        2   1
14  C1(1) AND C2(1) => D(0)                       1   1     1        0.05  0.1      1        2   1
15  C1(3) AND C2(2) => D(0)                       2   2     1        0.1   0.2      1        2   1
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)     2   1,1   0.5,0.5  0.1   0.1,0.1  1.0,1.0  3   2
17  C2(1) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
18  C2(2) AND C4(1) => D(1)                       2   2     1        0.1   0.2      1        2   1
19  C1(0) AND C2(1) AND C4(0) => D(1)             1   1     1        0.05  0.1      1        3   1
20  C2(2) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
21  C2(3) AND C4(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
22  C2(0) AND C4(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
23  C1(2) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
24  C1(3) AND C4(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
25  C1(3) AND C4(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
26  C1(1) AND C4(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
27  C1(2) AND C2(2) => D(1)                       1   1     1        0.05  0.1      1        2   1
28  C1(1) AND C2(2) => D(1)                       2   2     1        0.1   0.2      1        2   1
29  C1(3) AND C2(3) => D(1)                       1   1     1        0.05  0.1      1        2   1
30  C1(3) AND C2(1) => D(1)                       1   1     1        0.05  0.1      1        2   1
31  C1(2) AND C2(0) => D(1)                       1   1     1        0.05  0.1      1        2   1

    Table 4.5: Exhaustive Decision System

    Pruning can be done by identifying and removing components of the decision

    system that only explain small parts of the data, thereby preserving general trends in

    the underlying data material. In order to find a minimal decision system we can use a

    simple greedy heuristic algorithm described by Lazar and Sethi [20]. This algorithm

    computes only one decision system. If more than one minimal decision system is

    required we can use a Genetic Algorithm which solves the minimal cover set problem.

    Agotnes [1, 2] proposed two algorithms for generating satisfactory decision systems,

    a quality-based rule filtering algorithm and a genetic rule filtering algorithm. Rule


Rule                                              9
1   C2(0) AND C4(2) => D(0)                       {o2}
2   C2(3) AND C4(1) => D(0)                       {o2}
3   C2(2) AND C4(0) => D(0)                       {o1,o5}
4   C2(3) AND C4(0) => D(0)                       {o10}
5   C1(0) AND C4(2) => D(0)                       {o8}
6   C1(1) AND C4(0) => D(0)                       {o6}
7   C1(2) AND C4(0) => D(0)                       {o4}
8   C1(0) AND C4(1) => D(0)                       {o3}
9   C1(3) AND C4(0) => D(0)                       {o1,o5}
10  C4(3) => D(0)                                 {o7}
11  C1(0) AND C2(3) => D(0)                       {o10}
12  C1(0) AND C2(0) => D(0)                       {o8}
13  C1(3) AND C2(0) => D(0)                       {o7}
14  C1(1) AND C2(1) => D(0)                       {o6}
15  C1(3) AND C2(2) => D(0)                       {o1,o5}
16  C1(2) AND C2(1) AND C4(1) => D(0) OR D(1)     {o9,o20}
17  C2(1) AND C4(2) => D(1)                       {o17}
18  C2(2) AND C4(1) => D(1)                       {o18,o19}
19  C1(0) AND C2(1) AND C4(0) => D(1)             {o13}
20  C2(2) AND C4(2) => D(1)                       {o16}
21  C2(3) AND C4(2) => D(1)                       {o12,o15}
22  C2(0) AND C4(1) => D(1)                       {o11}
23  C1(2) AND C4(2) => D(1)                       {o17}
24  C1(3) AND C4(2) => D(1)                       {o15}
25  C1(3) AND C4(1) => D(1)                       {o14}
26  C1(1) AND C4(2) => D(1)                       {o12,o16}
27  C1(2) AND C2(2) => D(1)                       {o19}
28  C1(1) AND C2(2) => D(1)                       {o16,o18}
29  C1(3) AND C2(3) => D(1)                       {o15}
30  C1(3) AND C2(1) => D(1)                       {o14}
31  C1(2) AND C2(0) => D(1)                       {o11}

    Table 4.6: Exhaustive Decision System

    filtering operates on an existing exhaustive decision system, pruning it while retaining

high classification performance. Both of the above solutions make no assumptions about the

    minimal set cover condition. As a result, the decision system may not be minimal.

    We will propose a new solution based on the Cultural Algorithm which addresses the

    minimal set cover problem explicitly.
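For illustration, the following Python sketch applies a plain greedy set-cover heuristic to rule supports of the kind listed in Table 4.6. It conveys the flavor of pruning a rule set toward a minimal cover, but it is not the algorithm of Lazar and Sethi [20], nor the Cultural Algorithm proposed later; the rule identifiers and supports are toy values.

```python
def greedy_rule_cover(rule_support):
    """rule_support: dict rule_id -> set of supporting objects.
    Repeatedly keep the rule that covers the most still-uncovered objects."""
    universe = set().union(*rule_support.values())
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(rule_support, key=lambda r: len(rule_support[r] & uncovered))
        if not rule_support[best] & uncovered:
            break                      # remaining rules add nothing new
        chosen.append(best)
        uncovered -= rule_support[best]
    return chosen

# Supports shaped like Table 4.6 (rule -> objects it explains), toy values:
support = {1: {"o2"}, 3: {"o1", "o5"}, 16: {"o9", "o20"},
           18: {"o18", "o19"}, 21: {"o12", "o15"}}
print(greedy_rule_cover(support))   # e.g. [3, 16, 18, 21, 1]
```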


    4.9 Steps for Model Construction

    The typical steps (figure 4.2) for constructing model hypotheses for classification

    problems using the Rough Sets methodology are as follows:

    Figure 4.2: Model Development

1. Sampling and selection: This step implies the construction of a research file. Usu-

    ally this is done by selecting a random sample of prospects from the database.

    Also, the irrelevant attributes should be removed. Then, the selected data is

    represented as a two-dimensional table.

    2. Preprocessing: If the selected table contains missing values or empty cell entries,

    the table must be preprocessed in order to remove some of the incompleteness.

    Statistics should be run to obtain more information about the data.


    In the initial data set, missing values were coded differently for each variable.

    To make the table consistent we replaced all the values assigned to missing

    data with the SPSS convention, system missing data. We did not replace the

    missing values with a default value since the Rough Sets representation can deal

    with uncertainty, and the algorithms define them as undetermined. Descriptive

statistics, using SPSS 8.0, were run for all of the attributes and used to identify

    outliers for modification.

    3. Transformation: For example, measurement attributes should be discretized,

    and used instead of exact observations. Categorical data may be recoded to

    provide a consistent interpretation. The Rough Sets procedure is potentially

    sensitive to large differences in the number of categories for variables. Thus,

    we performed transformations of the data with many categories to consolidate

    them when possible into smaller set of categories. This was done in a systematic

    way, after checking the statistics, and carefully without loosing much informa-

    tion. Also, the decision variables should be identified. After this step the data

    becomes more qualitative than quantitative.

    4. Training and validation sample: The initial table is divided into at least two

    subtables. One will be used in the training step, the other in the validation or

    testing step.

    5. Develop the model: Knowledge discovery techniques are applied to the training

    data in order to generate a set of hypothesized relations. Following the Rough

Set methodology, the full set of reducts is computed, a set of minimal reducts

    is chosen, and the data table is vertically pruned. Then the object related

    reducts are computed and the exhaustive decision rule system is generated. At

    the end a pruning method for the decision rule set is applied in order to obtain a

good decision system, with a good balance between the number of rules and the accuracy of the classifications. The process is shown in Figure 4.3; a condensed code sketch of these steps is given after this list. The above procedure was followed exactly and the results are shown in section 6.

[Figure 4.3: Model Construction Phases: data, compute full reducts, compute object-related reducts, generate exhaustive decision-rule system, find minimal or satisfactory decision-rule system]

    6. Interpretation and evaluation: The validation or test data set is then used

    to test the classificatory performance of the new model. Also, if it is a rule-

based model, it can be reviewed by domain specialists in order to understand the data sets and to explain dependencies between values of attributes and definitions of decision classes. The expert will check the decision rule system.
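The condensed Python sketch below, referred to in step 5, strings together hypothetical versions of the transformation and sampling steps (discretization and a training/validation split) and marks where the reduct and rule computations of Figure 4.3 would be invoked. The attribute names and cut points are illustrative only and do not come from the survey data.

```python
import random

def discretize(value, cuts=(10.0, 50.0)):
    """Map a measurement to an ordinal category using fixed cut points."""
    return sum(value > c for c in cuts)          # yields 0, 1 or 2

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the coded table and split it into training and validation parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    k = int(len(rows) * (1 - test_fraction))
    return rows[:k], rows[k:]

raw = [{"slope": 3.2, "attacked": 0}, {"slope": 75.0, "attacked": 1},
       {"slope": 22.5, "attacked": 0}, {"slope": 60.1, "attacked": 1}]
coded = [{"slope": discretize(r["slope"]), "attacked": r["attacked"]} for r in raw]
train, test = train_test_split(coded)
# ... compute reducts on `train`, induce and prune the rules, then score on `test`
print(train, test)
```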


    CHAPTER 5

    EVOLUTION-BASED LEARNING OF ONTOLOGICAL

    KNOWLEDGE FOR A LARGE-SCALE MULTI-AGENT SIMULATION

    5.1 Introduction

    The results of the data mining process can be used in many different ways. There-

    fore, the form of the knowledge collected will have a major impact on the efficiency

    and effectiveness of its use in a given application. In this chapter we examine the

    problem of extracting knowledge for use by agents in a large-scale multi-agent system

    [42]. Here, the knowledge is ontological knowledge that represents constraints that

    the physical and social environments placed upon the agents and their interactions.

    The ontological knowledge represents the semantic building blocks around which the

    world models are formed. For an agent in a particular model, only the things in

its ontology can exist, and it cannot perceive things that are not represented in the

ontology. An ontology (Fox et al. [12]) is a basic level of knowledge representation

    scheme, a formal definition of entities and their properties, interactions, behaviors

and constraints. Each agent's decisions need to be checked against these constraints

    prior to their execution. In a complex multi-agent system, hundreds of thousands of

    agents may need to check these constraints regularly which means that a successful

    Data Mining activity will need to produce a relatively small set of syntactically simple

rules for the process to be efficient. Fox et al. [12] have used Data Mining techniques

to produce corporate ontologies.
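As a purely illustrative sketch of how such ontological constraints might be consulted, the Python fragment below checks an agent's proposed action against a small rule list before the action is executed. The constraint entries, action names, and site attributes are hypothetical and are not taken from the simulation described in this chapter.

```python
# Hypothetical ontological constraints: (required conditions, forbidden action).
CONSTRAINTS = [
    ({"slope": "Steep", "water_source": "None"}, "farm"),
    ({"land_use": "Class III"}, "irrigate"),
]

def allowed(action, site):
    """Return False if any constraint forbidding `action` matches the site description."""
    for conditions, forbidden in CONSTRAINTS:
        if action == forbidden and all(site.get(k) == v for k, v in conditions.items()):
            return False
    return True

site = {"slope": "Steep", "water_source": "None", "land_use": "Class I"}
print(allowed("farm", site))      # False: the first constraint matches this site
print(allowed("settle", site))    # True: no constraint forbids settling here
```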

    There are several factors that can influence the nature of the ontological con-

    straints that are produced. First, the nature of the data collection and measurement

    process and the uncertainty induced into the data set by the presence of noise. Second,

the nature of the representation used to express the extracted patterns, e.g. whether


    it allows for uncertainty or not. Third, the data mining technique employed and the

    assumptions that it makes about the collected data. Fourth, how these constraints

    will be stored, accessed, and used by the agents involved.

    For a given data set one can compare the different Data Mining techniques in

terms of the syntax and semantics of the induced constraints. In this application

    we are interested in simulating the emergence of the archaic state in the Valley of

    Oaxaca, Mexico. A state is among the most sophisticated and powerful structures

that have emerged from the social evolution process. In the modern world these are

    termed nation states with a government composed of a hierarchical decision-making

    structure where the decision-makers are either elected or appointed. States are sup-

    ported by various economies and are able to interact with each other via warfare,

trade, etc. Most states in the ancient world, often called archaic states, were ruled by

    hereditary royal families. These archaic states exhibited much internal diversity with

    populations numbering from tens of thousands to millions. They had a bureaucracy,

    organized religion, a military presence, large urban centers, public buildings, public

    works, and services provided by various professional specialists. The state itself could

    enter into warfare and trade-based relationships with other states and less complex

    neighbors.

    The process by which complex social entities such as the state emerged from lower

    level structures and other supporting economies has long been of prime interest to

    anthropologists and other disciplines as well. This is because the emergence of such

a social structure can have a profound impact on the society's physical and social

    environment. However, the task of developing realistic computational models that

    aid in the understanding and explanation of state emergence has been a difficult one.

    This is the result of two basic factors:

The process of state formation inherently takes place on a variety of temporal and spatial scales.


The emergence of hierarchical decision-making [35] can be viewed as an adaptation that allows decision-makers to specialize their decisions to particular spatial

    and temporal scales.

    The formation of the state is a complex process that is fundamentally directed by

the social variables but requires dynamic interaction between the emergent system

    and its environment. Identifying the nature of these interactions is one of the reasons

    why the process of state formation is of such interest.

    The goal of this project is to produce a large-scale knowledge-based computational

    model of the origins of the Zapotec State (Marcus and Flannery 1996), centered at

    Monte Alban, in the Valley of Oaxaca, Mexico. State formation took place between

    1400 B.C. and 300 B.C. While archaic states have emerged in various parts of the

    world, the relative isolation of the valley allowed the processes of social evolution

    to be more visible there. Extensive surveys [4, 5, 19] of the 1700 km2 valley, were

    undertaken by the Oaxaca Settlement Pattern Project in the 1970s and 1980s. The

    location and features of over 2,700 sites dating from the archaic period (8000 B.C.)

    to Late Monte Alban V (just prior to the arrival of the Spaniards) were documented.

    Several hundred variables were recorded for each site. In addition, they surveyed the

    6.5 square kilometer urban center of Monte Alban, a site that contained over 2,000

    residential terraces. This site was the focus for early state formation in the valley.

    Both surveys provided the knowledge needed to create our multi-agent simulation

    model. We then produced a spatial temporal database that contained the results

    of both surveys and used data mining techniques from Artificial Intelligence [42] to

    produce knowledge about site location, warfare, trade, and economic decisions to be

    used for the construction of the multi-agent model. However, in order to do this

    we needed to add more data about the spatial and temporal context to both the

    regional and urban center surveys. Specifically, we had to add variables that allowed

    us to locate each site spatially and temporally to a level of precision consistent with


    the scale of our simulation. For example, temporal periods are characterized by the

    presence of pottery of different styles. That data was available only in text form. All

    of this pottery data, over 130 variables for each residential terrace, was scanned into

    the computer, corrected for errors, and added to the Monte Alban data set. This

    data allowed us to identify the periods that each terrace was occupied. Pottery data

    was also integrated into the regional data set.

    In addition, the survey had produced hundreds of pages of hand drawn maps

    for both the Monte Alban and regional surveys that contained the spatial context

    for the location of each site. Since our goal was to ask specific questions about the

    spatial and temporal context we needed to tie each site into its mapped location. We

    then proceeded to digitize each of the maps and to associate each site object with its

    corresponding data record. This allowed us to produce a geographical information

    system (GIS) that serves as our virtual valley of Oaxaca. This acts as a vehicle

    for our data mining activities and as a knowledge base for the multi-agent simulation

    and allows the results of the simulation to be displayed and compared with the actual

    data in a spatial context. It is envisioned that the resultant GIS system will be a

    useful tool for researchers and students from various fields to study the emergence of

    complexity in the future.

    In order to perform the data mining activities, we extended traditional data mining

    techniques and developed new ones in order to deal with the complexities inherent

in the Oaxaca database. At the regional level we used Utgoff's incremental Decision

    Tree Algorithm (IDTI) [48] to generate the Decision Trees for each region and phase

    of the valley. The approach was used to generate decision trees that discriminated

    between sites that were targets for warfare and those that were not for a given period

    [39, 38].

    However, given the many disparate steps under which the data was collected and

    organized it was felt that perhaps some improvements might be made by using a


    technique that took into account the presence of uncertainty in the data, especially

    in regions and periods when the social and settlement patterns were complex and

    prone to data collection error. To test this hypothesis we selected a period of time

    just before the emergence of the state, Rosario, where there was evidence of increased

    social strife brought about in part by increased population growth. The part of the

    valley that exhibited the greatest population at the time, Etla, was also selected.

    Since the majority of the data was discrete rather than continuous in nature we

    selected Rough Sets as a vehicle for representing uncertainty here. We employed an

    evolutionary technique, Genetic Algorithms [13, 14], to control the search in this case

    because Genetic Algorithms had been successfully used with Rough Sets previously.

    The decision systems or rule sets produced by both approaches were then compared in

    terms of their ability to decide about the location of sites that are targets for warfare

    in this period. We then compared the two approaches over all relevant phases of

    social evolution in the valley.

    In section 2 we begin with an overview of Decision Trees and their generation.

    Section 3 provides a comparison of a specific decision system for the Etla region in

    the Rosario phase as produced by Decision Trees and Rough Sets respectively. Section

    4 compares the two approaches together over all phases in the Etla region. The Rough

    Set approach appears most useful in those phases with the most evidence for change

    in social complexity. Section 5 gives our conclusions.

    5.2 Decision Tree

    Decision Tree induction is a very good method for high-dimensional applications. It

    is a fast non-linear prediction method and employs dynamic feature selection. The

    solution complexity is expressed in terms of the number of terminal nodes. The most

    complex tree covers all cases in the training data. Pruning the tree and measuring

the errors in progressively smaller trees finds less complex solutions.
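The following minimal example illustrates tree induction and complexity control by cost-complexity pruning, using scikit-learn's batch CART learner. It is only an illustration of the general idea, not the incremental algorithm used in this study, and the data are synthetic.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 1], [2, 1], [0, 0], [1, 0], [2, 0]]   # two coded condition attributes
y = [0, 0, 1, 1, 1, 0]                                 # attacked / not attacked labels

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.1).fit(X, y)

# Solution complexity measured by the number of terminal nodes, as in the text:
# the pruned tree has no more leaves than the fully grown tree.
print(full.get_n_leaves(), pruned.get_n_leaves())
```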


Period              Approximate Date
Tierras Largas      1400 - 1150 BC
San Jose            1150 - 850 BC
Guadalupe           850 - 700 BC
Rosario             700 - 500 BC
Monte Alban Ia      500 - 300 BC
Monte Alban Ic      300 - 150/100 BC
Monte Alban II      150/100 BC - AD 200
Monte Alban IIIa    AD 200 - 500
Monte Alban IIIb    AD 500 - 700/750
Monte Alban IV      AD 700/750
Monte Alban V       AD 1000 - 1521

Table 5.1: Periods of Social Evolution in the Valley of Oaxaca and Their Approximate Dates

Any decision tree can be rewritten in the form of a decision rule set; however, there are sets of rules which cannot be expressed as any decision tree. An implied decision rule in a tree is a complete path to a terminal node. Because these rules are mutually exclusive, the size of the decision rule set can be much larger than the logic needed for overlapping rules. One rule can combine a large number of terms or true-false conditions, which takes a long time to evaluate when the rule set is used in an agent-based simulation.

One of the advantages of logic-based solutions such as decision trees and corresponding

    decision rules is their powerful explanatory capabilities. Table 5.1 gives all of the

    relevant periods of social evolution in the valley. Tierras Largas marks the beginning

    of early village settlement there. The state emerged at Monte Alban in period Monte

    Alban Ia. The valley came under control of the state by Monte Alban II, and Monte

Alban IIIa signaled the decline of the state and its succession by a collection of city-

    states localized in different parts of the valley.

    For some periods there were several hundred rules produced just for the warfare

    constraint alone. Adding in the other constraints would produce a knowledge base of

    several thousand rules at minimum. However, since the data were collected over a ten-

year period over varying landscape using different surveyors and recording equipment,


    it was felt that a significant amount of noise might be present in the data. The

    original technique did not account explicitly for uncertainty in the measurement data.

    Thus, it was possible that certain rules or rule conditions were present only to deal

    with the specific noise introduced into the process by the survey methods and data

    transformation activities. If this was the case, then by using a representation and

    associated learning technique that dealt explicitly with uncertainty it might be that

    fewer rules would be needed in periods where the noise is most pronounced.

[Table 5.2: Decision system for the Rosario phase produced by the Decision Tree approach]

The decision discriminated between the sites that were targets for warfare and those that were not. Three variables were used in order to compute the

decision: the presence of burnt daub at the site, other evidence of burning, and the

    presence of defensive walls. The variables used to predict these decisions from positive

    and negative examples in the training set were: Environmental zone, Slope, Hilltop

    or Ridge top, Soil character, On the boundary between the loam and the swampy

    region, Water source, Depth of Water table, Type of irrigation, and Land use type

    among others.

    In section 2 we presented a decision tree 5.1 and a corresponding decision system

    5.2 for the Rosario phase (700-500 B.C.) generated by the Decision Tree approach.

    It is the fourth phase of occupation in the study and at that time population size

and warfare increased substantially (Marcus et al. [23]). For example, it was observed

    that chunks of burnt daub appear on the surface of the villages seven times more

    frequently than in the previous phases. There are 36 sites in the Rosario phase. The

    archaic state emerged in the period following this phase of increased warfare.

    Further, we consider another approach for extracting logical patterns within data

sets: rule induction using Rough Sets. Although trees and rules may seem similar,

they are in fact very different, as can be seen in Table 5.3.

    In essence, Decision Trees may be viewed as a simplistic approach to rule discovery.

In order to extract rules, we performed feature selection using the Rough Set approach guided by a Genetic Algorithm with the variables above. The Rough Set approach selected four variables in each of two reducts: {environmental zone, slope, water source, land use} and {slope, soil character, water source, land use}. The corresponding decision systems

    are given in tables 5.4 and 5.5.

These tables represent the exhaustive sets of rules produced. While it is clear that

    several of the rules are so simple that they can be easily combined to produce a

    smaller set of rules overall, it is sufficient for comparative purposes here.
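To indicate how a Genetic Algorithm can guide this kind of reduct search, the sketch below evolves bit-mask attribute subsets against a toy set of discernibility entries, rewarding subsets that hit every entry with as few attributes as possible. The population size, operators, and data are illustrative choices written for this text, not those used in the study.

```python
import random

rng = random.Random(0)
ENTRIES = [{0, 1}, {1, 3}, {0, 2, 3}, {2, 4}]   # toy discernibility entries
N_ATTRS = 5

def fitness(mask):
    """Reward hitting every entry first, then prefer smaller attribute subsets."""
    subset = {i for i in range(N_ATTRS) if mask[i]}
    hits = sum(1 for e in ENTRIES if subset & e)
    return hits * 10 - len(subset)

def evolve(pop_size=20, generations=40):
    pop = [[rng.randint(0, 1) for _ in range(N_ATTRS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_ATTRS)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # mutation: flip one bit
                i = rng.randrange(N_ATTRS)
                child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return {i for i in range(N_ATTRS) if best[i]}

print(evolve())    # typically a small subset hitting all entries, e.g. {1, 2}
```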

Decision Trees
  Advantages: Easy to understand.
  Disadvantages: May be difficult to use with continuous data; they look at simple combinations of attributes; they need to break numeric fields into fixed ranges; not very good with inexact data; not flexible; no way to handle missing data; cannot easily approach large data sets; may have overfitting; less accurate predictions.

Rough Sets
  Advantages: Very expressive; modular knowledge; good with missing data; they handle imprecise data.
  Disadvantages: Can be memory intensive; can be computationally intensive.

Table 5.3: A Comparison of Decision Trees versus Rough Sets

Our focus here is on the impact of using a technique, such as Rough Sets, that is explicitly able to deal with uncertainty in the recognition decision. From

    this standpoint there are two basic points of comparison. Firstly, how many of the

rules identify a site for attack unambiguously, and what percentage of the rules that

    select sites for attack do they comprise? Secondly, in those cases in which the rule

    produces a split decision we will need to resolve the tie using other means. The

    question is, how much effort do we need to spend in order to find out that we must

    contact another source to resolve the question?

    In answer to the first question, explicitly dealing with uncertainty using the rough

set representation produced five rules that can identify sites for attack, as opposed to just three rules in the decision tree approach. Of these five rules, three (rules 7, 10 and 12 in Table 5.4, or 6, 9 and 11 in Table 5.5) result in unambiguous decisions. That is, 60% of the rules that can conclude that a site can be attacked do so unambiguously, whereas the other two need further clarification. The Decision Trees approach produces 3 rules that can


Rules
1   env zone(High Alluvium) => Not Attacked
2   env zone(Middle Piedmont) => Not Attacked
3   env zone(Lower Piedmont) AND slope(Shallow) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
4   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
5   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
6   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
7   water source(Spring) AND land use(Class III 100%) => Attacked
8   water source(Spring) AND land use(Class I) => Not Attacked
9   slope(Shallow) AND land use(Class III 100%) => Not Attacked
10  slope(Flat) AND land use(Class III 100%) => Attacked
11  slope(Flat) AND land use(Class I) => Not Attacked
12  slope(Flat) AND water source(Spring) => Attacked
13  slope(Flat) AND water source(Arroyo) => Not Attacked
14  slope(Shallow) AND water source(Spring) => Not Attacked
15  water source(Well) => Not Attacked
16  water source(Main River) => Not Attacked

    Table 5.4: First Exhaustive Decision System for the Rosario Phase in Etla Region


Rules
1   soil(Alluvium) => Not Attacked
2   slope(Shallow) AND soil(Bedrock) AND water source(Arroyo) AND land use(Class I) => Not Attacked OR Attacked
3   water source(Tributary Stream) AND land use(Class III 100%) => Not Attacked
4   water source(Tributary Stream) AND land use(Class I) => Not Attacked OR Attacked
5   water source(Arroyo) AND land use(Class III 100%) => Not Attacked
6   water source(Spring) AND land use(Class III 100%) => Attacked
7   water source(Spring) AND land use(Class I) => Not Attacked
8   slope(Shallow) AND land use(Class III 100%) => Not Attacked
9   slope(Flat) AND land use(Class III 100%) => Attacked
10  slope(Flat) AND land use(Class I) => Not Attacked
11  slope(Flat) AND water source(Spring) => Attacked
12  slope(Flat) AND water source(Arroyo) => Not Attacked
13  slope(Shallow) AND water source(Spring) => Not Attacked
14  water source(Well) => Not Attacked
15  water source(Main River) => Not Attacked

    Table 5.5: Second Exhaustive Decision System for the Rosario Phase in Etla Region

    conclude that a site can be attacked with only one of them (rule 3) being conclusive.

    Thus, only 33% of the rules that identify a site for attack are conclusive as opposed

to 60% for the Rough Set approach. By taking data uncertainty into account the

    Rough Set approach not only produced more rules for the identification of the target

    concept, but also a higher percentage of unambiguous ones.

    The other question concerns the relative amount of effort expended to produce

    an uncertain conclusion. In the decision system produced using Rough Sets the

    inconclusive rules have fewer conditions to be checked than for those from the Decision

Trees approach. Specifically, the inconclusive Rough Set rules have 2 and 4 conditions respectively, for a total of 6 conditions, one of which is shared between them (land use type = Class I), leaving 5 unique conditions. In the Decision Tree system, 8 conditions must be checked in each of the two inconclusive rules, for a total of 16. However, each shares the same 8, so that the total number of unique conditions to be tested is 8, as opposed to 5 for the Rough


    Set approach. More effort must then be expended in order to check the inconclusive

    rules in the Decision Tree approach as opposed to that for Rough Sets.

    Since both approaches extracted the same set of condition variables, the differences

    are likely to reflect the impact that