[ieee international conference on expert systems for development - bangkok, thailand (28-31 march...

IN SEARCH FOR HIGH PERFORMANCE

Pierre A.I. Wijkman

Royal Institute of Technology & University of Stockholm, Department of Computer and Systems Sciences

Electrum 230, 164 40 Kista, Sweden e-mail: [email protected]

ABSTRACT

In this paper we present a framework for machine learning that is based on, or more accurately inspired by, concepts taken from the natural process of biological evolution. Many such evolutionary computational approaches to machine learning already exist. The approach presented here differs radically from them: by taking a very abstract and general view of the process of evolutionary computation, we arrive at some remarkable and general results. This new approach and its accompanying results are presented in this paper.

1 INTRODUCTION

Looking in a new direction, in a different area, for the solution to some problem is often a very fruitful methodology for solving problems. This is called reasoning by analogy. In this paper we look to the area of evolutionary biology to solve problems within the area of machine learning. Earlier approaches in the same direction are Fraser's work on "Genetic System Simulation" 1957 [2], Fogel, Owens and Walsh's work on "Evolutionary Programming" 1966 [1], Rechenberg's work on "Evolution Strategies" 1973 [5], and Holland's work on "Genetic Algorithms" 1975 [4]. Recently a more general approach was taken by Spears, De Jong, Back, Fogel and Garis in their work on "Evolutionary Computation" [6], which compared the most important approaches. The approach presented in this paper distinguishes itself significantly from previous work in that it looks at the problems involved from a new, possibly higher, viewpoint by introducing a new, more general and abstract, framework as its basis. The results of this introduction are a number of very interesting and general conclusions regarding the nature of the process of evolution. These conclusions are of interest both for evolutionary biology and for machine learning.

2 DEFINITION OF PROBLEM

It is often true that an intelligible problem definition can give you part of, or indeed the whole, answer to some specific problem. Can we find such an intelligible problem definition for the problem dealt with in the field of machine learning or evolutionary computation? Yes, we can. Before we give such a definition, we must (informally) define the general concept of a system.

• Definition of system. A system is described by two complementary types of descriptions: one focusing on the system's static properties, the other focusing on the system's dynamic properties:

1. The system's static properties are its composition and structure. The composition is composed of a finite number of sub systems and the structure is composed of a finite number of connections. The connections in the structure of a system determine how the sub systems in the composition of the system are connected. The connections determine the pattern of communication between the sub systems.

2. The system's dynamic properties are its behaviour. A system has some specific finite behaviour. A system can in this case be seen as an entity that transforms signals from and to the environment. This transformation is dependent on the internal state and the history of the system.
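The static half of this definition can be sketched as a small data structure. This is only an illustration under assumptions that are not in the paper: the names `System` and `Connection` are hypothetical, sub systems are identified by strings, and a connection is taken to be a directed pair. The dynamic half (behaviour) is deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical encoding: a connection is a directed pair of sub system names.
Connection = Tuple[str, str]


@dataclass(frozen=True)
class System:
    """Declarative description only; behaviour is not modelled here."""
    composition: FrozenSet[str]       # the finite set of sub systems
    structure: FrozenSet[Connection]  # the finite set of connections

    def __post_init__(self) -> None:
        # Every connection must join two sub systems of the composition.
        for source, target in self.structure:
            if source not in self.composition or target not in self.composition:
                raise ValueError(f"connection {(source, target)} leaves the composition")


example = System(
    composition=frozenset({"a", "b", "c"}),
    structure=frozenset({("a", "b"), ("b", "c")}),
)
```

Making the class immutable is a design choice of the sketch: a structural change then always yields a new system instead of modifying an existing one.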

We will refer to the first type of description as the declarative description and the second type as the procedural description. The next step to an intelligible problem definition is to distinguish two kinds of systems; the developmental system and the performance system. The developmental system can, simply put, construct or develop performance systems. The intelligible problem definition can now be given as follows:

• Define the composition and the structure of a developmental system that has a behaviour that accords with the following, so-called, minimal description:

Given the following as input:

1. a specified composition and an unspecified structure for some performance system, and

2. a minimal description of some wished-for behaviour for some performance system,

the developmental system should output:

3. a performance system with a specified composition and a specified structure that has the behaviour described in (2),

in a maximally space-time effective way.

0-8186-5780-4/94 $03.00 © 1994 IEEE

We will refer to this problem as the main problem. The concept of a minimal description can informally be defined as a description consisting of two parts, where one part describes a system's short-term behaviour and the other part describes a system's long-term goal behaviour. The concept of maximally space-time effective behaviour can informally be defined as a behaviour that is both close to maximally time effective and close to maximally space effective. Without the constraint on space-time efficiency, the problem above would not be a problem at all: we could simply use an easily defined developmental system having a space-ineffective, time-ineffective or space-time ineffective behaviour. To be able to solve the problem defined above, we first have to expand our understanding of the behaviour of the developmental system.

3 THE BEHAVIOUR OF THE DEVELOPMENTAL SYSTEM

The main problem defined above can be stated simply: find a general way of constructing, in a space-time effective manner, a performance system so that, from a given finite set of sub systems and a given finite set of connections, the constructed performance system behaves according to a given minimal behaviour description. In this section we will look into why this problem is difficult and also show in more detail what kind of behaviour the developmental system should have in order to solve it.

3.1 THE DIFFICULTY OF THE MAIN QUESTION

The reason for the main problem being difficult is twofold. The first reason is simple to state: the number of performance systems that can be constructed grows exponentially with the size of the structure, i.e. with the number of connections. In contrast, the number of performance systems that can be constructed grows only polynomially with the size of the composition, i.e. with the number of sub systems. We will therefore, in the following, focus on the structure of the performance system.

The second reason needs more elaboration to be understood. Assume that we order the set of all possible performance system constructions into a metric space with the help of a distance function over this space that is dependent on the concept of minimal structural change, where a minimal structural change is equal to an addition or a deletion of a single connection between any two sub systems composing the performance system. We will refer to this space as the search space. Assume next that we can measure how close a performance system's behaviour is to some goal behaviour. We will refer to the function that measures this closeness between two performance systems' behaviours as the performance function, and to its result as the performance value. To each performance system construction in the search space a performance value is thus associated. If we draw this function over the whole search space we can imagine a very large and very complicated landscape of hills and valleys taking form. We will refer to this landscape as the fitness landscape and to its actual form as the fitness topography. The fitness topography is evidently dependent on the a priori given composition, the a priori given structure and the a priori given minimal behaviour description for the performance system. It is also, however, dependent on the distance function introduced above, which defines the order of the search space.
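The search space and its distance function can be made concrete with a short sketch. It assumes, beyond what the paper states, that a structure is encoded as a frozenset of directed connections without self-connections; the function names are hypothetical. Under this encoding the number of minimal structural changes between two structures is the size of their symmetric difference.

```python
from itertools import permutations


def structural_distance(a, b):
    """Number of single-connection additions or deletions that turn a into b."""
    return len(a ^ b)


def neighbours(structure, composition):
    """Yield every structure exactly one minimal structural change away."""
    for connection in permutations(sorted(composition), 2):
        # Toggling a connection adds it if absent and deletes it if present.
        yield structure ^ {connection}


def count_structures(possible_connections):
    """Each possible connection is either present or absent."""
    return 2 ** possible_connections


a = frozenset({("A", "B"), ("B", "C")})
b = frozenset({("A", "B"), ("C", "A")})
print(structural_distance(a, b))               # one deletion plus one addition
print(len(list(neighbours(a, {"A", "B", "C"}))))
print(count_structures(6))
```

The last line illustrates the exponential growth with the number of connections: six possible connections already give 64 distinct structures.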

The main problem can, with this new vision in mind, be reformulated as follows: define a general method that can find the highest peaks in an arbitrary fitness landscape in a maximally space-time effective way. We are, evidently, dealing with a search problem. The fact that the lateral size of the fitness landscape is very large excludes the possibility of exhaustively searching the whole space. The only possibility left is to somehow use the information inherent in the fitness topography. If, for example, the fitness topography were formed as one single large mountain, it would be easy to find the mountain's top by an ordinary hill climbing search method. This method does, however, only work in strictly monotonic fitness topographies and will accordingly not perform well in arbitrary fitness topographies. The second reason for the main question being difficult is thus due to the generality of the main question: the search method must be able to deal with arbitrary fitness topographies.
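The limitation of hill climbing can be illustrated with a toy landscape. Everything in this sketch is an assumption made for illustration: structures are frozensets of connection indices, and the hypothetical performance function `two_peaks` depends only on the number of connections, with a low peak at zero connections and a high peak at six.

```python
# Toy performance values indexed by the number of connections (assumed).
LEVELS = [3, 2, 1, 2, 4, 6, 8]


def two_peaks(structure):
    return LEVELS[len(structure)]


def hill_climb(start, possible, performance):
    """Steepest-ascent hill climbing over single-connection changes."""
    current = frozenset(start)
    while True:
        candidates = [current ^ {c} for c in possible]
        best = max(candidates, key=performance)
        if performance(best) <= performance(current):
            return current  # no neighbour is strictly better
        current = best


stuck = hill_climb({0}, range(6), two_peaks)
found = hill_climb({0, 1, 2}, range(6), two_peaks)
print(two_peaks(stuck), two_peaks(found))
```

Started from a structure with one connection, the climber walks to the empty structure and stops on the lower peak; started from three connections it reaches the higher one. The outcome depends entirely on the starting point, which is why the method fails in arbitrary topographies.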

3.2 THE OPTIMAL SEARCH BEHAVIOUR

We will now, with the discussion in the previous sections in mind, define the concept of optimal search behaviour.

Definition of neutral-positive structural change. If we make a number c of structural changes, i.e. a number of additions and/or deletions of connections, to a system and the resulting new system is a system with equal or higher performance value than the original system, we say that we have made a degree c of neutral-positive structural change to the original system. We will denote the degree c of neutral-positive structural change between (1) two systems and between (2) two sets of systems with the following notation:

(1) $s_i \xrightarrow{sc(c,=+)} s_j$    (2) $S_i \xrightarrow{sc(c,=+)} S_j$

In case (2), all systems in $S_i$ can be transformed to all systems in $S_j$ by a degree c of neutral-positive structural change.
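The definition can be checked mechanically once a performance function is available. The sketch below is hedged in two ways: structures are assumed to be frozensets of connections, and the degree c is read as the net number of connections in which the two structures differ. Both function names are hypothetical.

```python
def is_neutral_positive_change(original, changed, c, performance):
    """True if `changed` differs from `original` in exactly c connections
    and performs at least as well."""
    return (len(original ^ changed) == c
            and performance(changed) >= performance(original))


def sets_neutral_positive(sources, targets, c, performance):
    """Case (2): every system in `sources` reaches every system in `targets`."""
    return all(is_neutral_positive_change(s, t, c, performance)
               for s in sources for t in targets)


# Toy performance function (assumed): more connections is better.
small = frozenset({1})
large = frozenset({1, 2, 3})
print(is_neutral_positive_change(small, large, 2, len))
print(is_neutral_positive_change(large, small, 2, len))
```

The first call succeeds because two additions raise the toy performance; the second fails because the same two changes in the opposite direction lower it.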

Definition of connected cluster. Assume that we have a set of systems

$S = S_1 \cup S_2 \cup \dots \cup S_n$, where $S_1 \xrightarrow{sc(c,=+)} S_2 \xrightarrow{sc(c,=+)} \dots \xrightarrow{sc(c,=+)} S_n$.

We then say that the set S forms a degree c connected cluster, and we will denote a set of systems that form such a cluster by the following notation:

$S^{sc(c,=+)}$

Definition of structure cumulative search. Assume that we are searching for a performance system with the performance value v. A structure cumulative search proceeds in a number n of steps. During each step the structure cumulative search tries to transform some performance system into another performance system by a degree of neutral-positive structural change in the following manner:

$s_1 \xrightarrow{sc(c_1,=+)} s_2 \xrightarrow{sc(c_2,=+)} \dots \xrightarrow{sc(c_{n-1},=+)} s_n$

where

$\mathrm{performance}(s_1) < \mathrm{performance}(s_2) < \dots < \mathrm{performance}(s_n) = v$,

and the path that this series of transformations takes is the one that minimises the following sum:

[formula not recoverable from the source]

where $s_1 \in S_1^{sc(c_1,=+)}$, $s_2 \in S_2^{sc(c_2,=+)}$, ..., $s_n \in S_n^{sc(c_n,=+)}$, and $c_1, c_2, \dots, c_n$ are set optimally.

Simply put, the fitness landscape can be thought of as partitioned into several clusters with appropriate connectedness values set a priori. The structure cumulative search proceeds by climbing from one cluster to another, higher, cluster. There exist, in general, several alternative paths that a search can take. The structure cumulative search takes the path that requires the least total search effort. The reason that the structure cumulative search is maximally space-time effective is that it makes optimal use of the information implicit in the fitness topography.

4 THE COMPOSITION AND STRUCTURE OF THE DEVELOPMENTAL SYSTEM

In this section we will show how the structure cumulative search that we defined in the previous section can be implemented. There exist, however, a number of aspects of this behaviour that are not straightforward to implement:

• The fitness landscape can be partitioned into many different clusters and each cluster can have a different degree of connectedness.

i. How does the structure cumulative search set the connectedness parameters $c_1, c_2, \dots, c_n$ for each cluster?

ii. How does the structure cumulative search know which path it should take among all these different clusters? If it takes the wrong path it could, for example, become stuck on a local maximum.

iii. Once the structure cumulative search has reached some specific cluster, how does it know the localisation of the whole cluster in the fitness landscape?

• The fitness landscape can be partitioned into many different performance levels.

iv. How can the structure cumulative search differentiate between different performance levels? We previously assumed that we had an explicit performance function. Even if we could, in principle, define a very space-time ineffective one, there would be no use in doing so. The reason is that one of the two arguments to the performance function is only partially given: the behaviour of the performance system is given as a minimal description of some behaviour. Using this as input to a performance function would result in a performance function that could only differentiate between a number of smaller hills, together with a differentiation between the highest hills and all other hills in the fitness landscape. The performance function could not differentiate between all the levels in between these two extremes.

We will now present the details of the composition and structure of the developmental system that solves these problems. The resulting developmental system will be shown to implement the wished-for structure cumulative search behaviour to an arbitrarily close degree. Thus far, we know that the composition and structure of the developmental system should, in essence, implement a special kind of search behaviour. The characteristic components of a general search are:

• the generator, that defines the scope of the search, i.e. the search space, and

• the tester, that defines what is being searched for in the search space.

The characteristic structure of a general search is:

• an alternating cycle, where the generator generates a candidate and the tester validates this candidate. If the test succeeds, the search is finished; otherwise the system returns to the generator again.
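The alternating cycle of a general search can be sketched as a plain generate-and-test loop. The sketch assumes an exhaustive generator that enumerates structures from smallest to largest; the function names and the string labels for connections are hypothetical and only serve the illustration.

```python
from itertools import combinations


def generate(possible):
    """Generator: enumerate every structure, smallest first."""
    possible = list(possible)
    for size in range(len(possible) + 1):
        for chosen in combinations(possible, size):
            yield frozenset(chosen)


def generate_and_test(possible, test):
    """Alternate between generator and tester until the test succeeds."""
    tested = 0
    for candidate in generate(possible):
        tested += 1
        if test(candidate):
            return candidate, tested
    return None, tested


goal = frozenset({"bc", "ca"})
result, tested = generate_and_test(["ab", "bc", "ca"], lambda s: s == goal)
print(result == goal, tested)
```

With three possible connections the goal is reached after seven candidates. The same loop needs up to 2^k candidates for k possible connections, which is why the view has to be extended.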

For our purposes, we must extend this view.

The composition and the structure of the developmental system. In essence, the composition of the developmental system consists of three sub systems:

• The population of performance systems, that consists of a finite set of performance systems.

• The adder, that adds performance systems to the population of performance systems, where the adder consists, in turn, of two sub systems:

• The generator, that generates possible candidate performance systems from the performance systems selected by the parent selector (described below). This generation can be random or biased by the structure and/or the composition of the selected performance systems. The generated performance system is added to the population of performance systems. The generator belongs to one of the following three categories:

• Absolute. An absolute generator does not take any performance system as input in the construction of new performance systems. This type of generator cannot be part of an implementation of a cumulative search behaviour, and will not be considered further.

• Semi-relative. A semi-relative generator takes exactly one performance system as input in the construction of new performance systems. To generate a new performance system from a single performance system involves the following:

• Make a specific number of randomly chosen deletions and/or additions of connections in the structure of the input performance system.

• Relative. A relative generator takes more than one performance system as input in the construction of new performance systems. This means that the generator has the possibility to employ much more complex schemata of generation than (the random change schemata that) the semi-relative generator can use. To generate a new performance system from several input performance systems involves the following:

• Generate a new performance system by choosing a number of connections in the new performance system randomly from the input performance systems.

There are three fundamental differences between the semi-relative generator and the relative generator that make the relative generator superior to the semi-relative generator.

1. The semi-relative generator cannot adaptively bias the location of the random change to be made to the generated performance system. The higher the performance value of the performance system, the more sensitive the performance system is to random change. The relative generator can, on the other hand, adaptively bias the location of the random change to be made to the generated performance system. There is, for example, no room for any change where the input performance systems have identical connections. The whole population of performance systems together statistically determines the bias of the location of the change to be made to the generated performance system.

2. The semi-relative generator cannot adaptively set the degree of the random change to be made to the generated performance system.

• If we set the degree of change to be constantly low, then the search process will be very sensitive to the problem of local maxima. The search process can locate connected (possibly smaller) clusters but it cannot locate more unconnected (possibly larger) clusters.

• If we set the degree of change to be constantly high, then the search process will perform increasingly poorly as the performance level rises. The search process can locate unconnected (possibly larger) clusters but it cannot locate more connected (possibly smaller) clusters.

The relative generator can, on the other hand, adaptively set the degree of the random change to be made to the generated performance system, because the more structurally similar two, or more, input performance systems are, the lower the degree of change made to the generated performance system, and the less structurally similar they are, the higher the degree of change. The whole population of performance systems together statistically determines the bias of the degree of change to be made to the generated performance system.

3. The semi-relative generator cannot adaptively bias the range of the random change to be made to the generated performance system. The relative generator can, on the other hand, adaptively bias the range of the random change to be made to the generated performance system because the range is now determined by the input performance systems. The whole population of performance systems together statistically determines the bias of the range of the change to be made to the generated performance system.

• The parent selector, that selects a specific number of performance systems from the population of performance systems. This selection can be random or biased by (1) the structure and/or the composition of the performance systems, and/or (2) the behaviour of the performance systems in the population of performance systems. It is preferable to have a parent selector that is, what we call, positively structure biased. A positively structure biased parent selector selects performance systems that are more structurally similar with higher probability than performance systems that are less structurally similar. The reason for this preference has to do with the relative generator. If the fitness topography has more than one cluster at some specific level, it would be inefficient to choose the input performance systems randomly. If the selection of the performance systems is random, then the probability of selecting two, or more, performance systems from a cluster containing only a small number of performance systems is small. A positively structure biased parent selector eliminates this problem.

In summary, by using a population of performance systems, a positive structure similarity biased parent selector and a relative generator we get a method that solves the two problems (i) of cluster fragmentation and (iii) of cluster localisation. A set of performance systems can in parallel approximately determine the fragmentation and localisation of the clusters in the topography of the search space if the performance systems are "connected" by a positive structure similarity biased parent selector and a relative generator. The degree of approximation is determined by the size of the population of systems and the size and topography of the search space.
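The two generator categories and the parent selector can be sketched as follows. The sketch rests on assumptions that the paper does not fix: structures are frozensets of connection labels, the relative generator takes each connection from a randomly chosen input system, and the structure bias uses the hypothetical weight 1 / (1 + structural distance). All function names are illustrative.

```python
import random


def semi_relative_generate(parent, possible, degree, rng):
    """Toggle `degree` randomly chosen connections of a single parent."""
    flipped = rng.sample(sorted(possible), degree)
    return frozenset(parent) ^ frozenset(flipped)


def relative_generate(parents, rng):
    """Decide each connection by copying it from a randomly chosen parent."""
    pool = sorted(frozenset().union(*parents))
    return frozenset(c for c in pool if c in rng.choice(parents))


def select_parents(population, rng):
    """Positively structure biased: similar structures are paired more often."""
    i = rng.randrange(len(population))
    first = population[i]
    others = population[:i] + population[i + 1:]
    # Assumed weighting: closer structures get larger weights.
    weights = [1.0 / (1 + len(first ^ other)) for other in others]
    second = rng.choices(others, weights=weights, k=1)[0]
    return first, second


rng = random.Random(0)
a = frozenset({1, 2, 3})
b = frozenset({2, 3, 4})
child = relative_generate((a, b), rng)
mutant = semi_relative_generate(a, range(6), 2, rng)
```

In `relative_generate` the connections shared by all parents are always inherited and connections absent from all parents never appear, so change is confined to the places where the parents differ. This mirrors the location, degree and range arguments above: identical parents yield an identical child, while dissimilar parents leave more room for change.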

• The deleter, that deletes performance systems from the population of performance systems, where the deleter consists, in turn, of two sub systems:

• The tester, that tests the performance systems selected by the competitor selector (described below). This testing can be random or biased by the behaviour of the selected performance systems. The tested and poorly performing performance systems are then deleted from the population of performance systems. The tester belongs to one of the following three categories:

• Absolute. An absolute tester does not take any performance systems as input in the validation of performance systems. This type of tester is meaningless, and will not be considered further.

• Semi-relative. A semi-relative tester takes exactly one performance system as input in the validation of performance systems.

• Relative. A relative tester takes more than one performance system as input in the validation of performance systems.

There is one fundamental difference between the semi-relative tester and the relative tester that makes the relative tester superior to the semi-relative tester.

• The problem with a semi-relative tester is that it must use information given a priori to validate the performance system taken as input. The source of this kind of information is the minimal description of the wished-for behaviour of the performance system. This information is, however, only partial. A semi-relative tester validates the performance system taken as input by an absolute measure. A relative tester need not, on the other hand, use information given a priori to validate the performance systems taken as input. The relative tester can use the two, or more, systems to validate each other. This is done by setting up a competition between the performance systems taken as input. A relative tester does not evaluate the performance systems taken as input by an absolute measure; the evaluation is totally relative.

• The competitor selector, that selects a specific number of performance systems from the population of performance systems. This selection can be random or biased by (1) the structure and/or the composition of the performance systems, and/or (2) the behaviour of the performance systems in the population of performance systems. It is preferable to have a competitor selector that is, what we call, negatively behaviour biased. A negatively behaviour biased competitor selector selects performance systems that are less behaviourally similar with higher probability than performance systems that are more behaviourally similar. The reason for this preference has to do with the relative tester. If we have:

1. a large number of performance systems that have a low performance value, and

2. a small number of performance systems with a high performance value,

it would be inefficient to choose the input performance systems randomly. If the selection of the performance systems is random, then the probability of selecting two, or more, performance systems from different performance levels can, in some cases, be small. This implies that the cumulative search would be inefficient when there are only relatively few performance systems on a higher performance level.

In summary, by using a population of performance systems, a negative behaviour similarity biased competitor selector and a relative tester we get a method that solves the problem (iv) of performance level differentiation. A set of performance systems can in parallel approximately determine the (current local) differentiation of the performance levels in the topography of the search space if they are connected by a negative behaviour similarity biased competitor selector and a relative tester. The degree of approximation is determined by the size of the population of systems and the size and topography of the search space.
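A deleter built from a competitor selector and a relative tester might look like the sketch below. The paper does not specify how the competition is run, so `compete` and `behaviour_distance` are hypothetical callbacks supplied by the caller, and the weight 1 + distance is an assumed form of the negative behaviour bias. In the toy usage, performance systems are stood in for by plain numbers.

```python
import random


def select_competitors(population, behaviour_distance, rng):
    """Negatively behaviour biased: dissimilar behaviours meet more often."""
    i = rng.randrange(len(population))
    first = population[i]
    others = population[:i] + population[i + 1:]
    # Assumed weighting: larger behavioural distance gives a larger weight.
    weights = [1 + behaviour_distance(first, other) for other in others]
    second = rng.choices(others, weights=weights, k=1)[0]
    return first, second


def relative_test(first, second, compete):
    """Return the loser of a head-to-head competition, or None on a draw."""
    outcome = compete(first, second)  # > 0: first wins, < 0: second wins
    if outcome > 0:
        return second
    if outcome < 0:
        return first
    return None


def delete_step(population, behaviour_distance, compete, rng):
    """One deleter step: select two competitors and remove the loser."""
    first, second = select_competitors(population, behaviour_distance, rng)
    loser = relative_test(first, second, compete)
    survivors = list(population)
    if loser is not None:
        survivors.remove(loser)
    return survivors


rng = random.Random(0)
systems = [1, 1, 1, 5]  # many low performers, one high performer
for _ in range(20):
    if len(systems) < 2:
        break
    systems = delete_step(systems, lambda a, b: abs(a - b), lambda a, b: a - b, rng)
```

No absolute performance scale is consulted anywhere: the tester only learns which of two systems did better. The bias makes the single high performer a likely opponent for the low performers, which is the situation described by items 1 and 2 above.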

Finally, the above described composition of the developmental system is structured in the following manner: the adder and the deleter work in parallel on the population of systems. By having an adder and a deleter structured like this, we get a method that solves the problem (ii) of finding the right path. A set of performance systems can together, more or less globally, try several different paths simultaneously by a kind of "path competition": systems following bad paths are simply deleted from the population of systems in the long run. The degree of approximation is determined by the size of the population of systems and the size and topography of the search space. For formal proofs regarding this section, see [7].
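How the adder and the deleter share one population can be sketched in a single loop. This is a simplification under stated assumptions, not the paper's method: the two sub systems are interleaved sequentially instead of running in parallel, both selectors are unbiased, and the toy `compete` function compares hidden distances to an assumed `TARGET` structure purely to have something to compete on.

```python
import random


def evolve(population, compete, steps, rng):
    """Interleave one adder step and one deleter step per iteration."""
    population = list(population)
    for _ in range(steps):
        # Adder: a relative generator applied to two selected parents.
        mother, father = rng.sample(population, 2)
        pool = sorted(mother | father)
        child = frozenset(c for c in pool if c in rng.choice((mother, father)))
        population.append(child)
        # Deleter: a relative tester applied to two selected competitors.
        i, j = rng.sample(range(len(population)), 2)
        loser = j if compete(population[i], population[j]) >= 0 else i
        del population[loser]
    return population


# Toy competition (assumed): the structure closer to TARGET wins.
TARGET = frozenset(range(8))


def score(structure):
    return -len(structure ^ TARGET)


def compete(a, b):
    return score(a) - score(b)


rng = random.Random(1)
start = [frozenset(rng.sample(range(8), 4)) for _ in range(10)]
end = evolve(start, compete, 200, rng)
```

Because each iteration adds one system and deletes one, the population size stays constant. The deleted system never outperforms its opponent, so the best score in the population cannot decrease, while poorly performing lines of descent disappear over time.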

5 CONCLUSION

This paper has defined and answered the main question in machine learning. We started out by making an analogy between the area of evolutionary biology and the area of machine learning. The result of this was a formalisation of the process of evolution that has helped us to solve the main problem in machine learning. This formalisation can also, in addition to being a very beautiful and compact framework for biology, return something to the area of evolutionary biology. The importance of a relative tester has been known since the time of Darwin. This is clearly demonstrated in some of the common alternative names for the theory of evolution: survival of the fittest and the theory of natural selection. This paper has, however, shown that the concept of a relative generator is of equal importance. The concept of a relative generator gives us a solution to the long-standing problem of why sexual reproduction is so common, or even exists at all, in nature. The common view has been that sexual reproduction is an important strategy for higher organisms to employ in the battle between higher organisms and lower organisms like, for example, viruses and bacteria [3]. This is probably true, but this paper has shown that this is not the main reason. The main reason for sexual reproduction being useful is its capacity to bias (1) the location, (2) the degree, and (3) the range of the change to be made in the construction of a new performance system. These capacities are, as we have seen previously, very important, or indeed essential, for space-time effective performance system development. The function of having two separately constructed sexes is of course to remove the possibility of self-fertilization. Contrast this with the concept of a semi-relative generator or, equivalently, asexual reproduction. Asexual reproduction cannot do any of the tasks (1), (2) and (3). Species that rely on the strategy of asexual reproduction are doomed to a very slow process of evolution.

REFERENCES

1. Fogel, L. J., Owens, A. J., Walsh, M. J., "Artificial Intelligence Through Simulated Evolution", Wiley Publishing, New York, 1966.

2. Fraser, A. S., "Simulation of genetic systems by automatic digital computers", Australian Journal of Biological Science, Australia, 1957.

3. Futuyma, D. J., "Evolutionary Biology", Sinauer Associates, Inc, Massachusetts, 1986.

4. Holland, J. H., "Adaptation in Natural and Artificial Systems", Ann Arbor, Michigan: The University of Michigan Press, 1975.

5. Rechenberg, I., "Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution", Frommann-Holzboog, Stuttgart, 1973.

6. Spears, W. M., De Jong, K. A., Back, T., Fogel, D. B., Garis, H., "An Overview of Evolutionary Computation", Machine Learning ECML-93, Springer-Verlag, Berlin Heidelberg, 1993.

7. Wijkman, P. A. I., "A Formalised Model of The Evolution", Adaptive and Learning Systems II, SPIE, Orlando, 1993.
