
Ch. Eick: What EC-Algorithm Designers can Learn from Genetics

Part 1 - Natural Genetics

Ben Paechter

with thanks to the EvoNet Training Committee and its “Flying Circus”

Natural Genetics

The information required to build a living organism is coded in the DNA and other genetic material found in the cells of that organism

Within a species, most of the genetic material is the same

Small changes in the genetic material give rise to small changes in the organism, e.g. height, hair colour

DNA and Genes

DNA is a large molecule made up of fragments. There are four fragment types (nucleotides), each one acting like a letter in a long coded message:

-A-B-A-D-C-B-B-C-C-A-D-B-C-C-A-

Certain groups of letters are meaningful together, a bit like words. These groups are called genes. The DNA is made up of genes and “rubbish” (non-coding material).

Example: Human Reproduction

Human DNA is organised into chromosomes. Most human cells contain 23 pairs of chromosomes, which together define the physical attributes of the person.

Reproductive Cells

Sperm and egg cells contain 23 individual chromosomes rather than 23 pairs

Reproductive cells are formed by one cell splitting into two

During this process the pairs of chromosomes undergo an operation called crossover

Crossover

During crossover the chromosome pairs link up and swap parts of themselves:

[Figure: a chromosome pair before and after crossover]

After crossover, one chromosome of each pair goes into each reproductive cell

Fertilisation

A sperm cell from the father and an egg cell from the mother fuse into a new person cell, which again contains 23 pairs of chromosomes
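The crossover, gamete-formation and fertilisation steps from the last three slides can be mimicked in a few lines. This is a simplified sketch under stated assumptions: chromosomes are strings, each pair exchanges tails at a single random cut point (real meiosis can involve several crossover points), and the names `one_point_crossover`, `make_gamete` and `fertilise` are our own.

```python
import random

def one_point_crossover(chrom_a, chrom_b, rng):
    """Swap the tails of two equally long chromosomes after a random cut point.
    (Assumption: a single cut point per pair.)"""
    assert len(chrom_a) == len(chrom_b) >= 2
    cut = rng.randint(1, len(chrom_a) - 1)  # cut strictly inside the chromosome
    return chrom_a[:cut] + chrom_b[cut:], chrom_b[:cut] + chrom_a[cut:]

def make_gamete(pairs, rng):
    """Form a reproductive cell: cross over each chromosome pair,
    then keep one of the two resulting chromosomes."""
    gamete = []
    for chrom_a, chrom_b in pairs:
        child_a, child_b = one_point_crossover(chrom_a, chrom_b, rng)
        gamete.append(rng.choice([child_a, child_b]))
    return gamete

def fertilise(sperm, egg):
    """Pair the single chromosomes of two gametes back into chromosome pairs."""
    return list(zip(sperm, egg))

rng = random.Random(42)
father = [("AAAAAA", "BBBBBB"), ("CCCC", "DDDD")]  # two chromosome pairs
mother = [("aaaaaa", "bbbbbb"), ("cccc", "dddd")]
child = fertilise(make_gamete(father, rng), make_gamete(mother, rng))
print(child)
```

Each pair in `child` holds one (recombined) chromosome from the father and one from the mother, just as in the diagram.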

Mutation

Occasionally some of the genetic material changes very slightly during this process

This means that the child might have genetic information not inherited from either parent

This is most likely to be catastrophic
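In a GA, this kind of occasional small change is usually modelled as bit-flip mutation. A minimal sketch, assuming binary chromosomes stored as lists of 0/1; the per-bit rate is a free parameter (a value around 1/L for strings of length L is a common choice), and `bit_flip_mutation` is our own name.

```python
import random

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

rng = random.Random(0)
parent = [0] * 20
child = bit_flip_mutation(parent, rate=0.05, rng=rng)
print(sum(child), "bit(s) changed")
```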

Theory of Evolution

From time to time, reproduction, crossover and mutation produce new genetic material or new combinations of genes

Usually this reduces the organism’s ability to survive and so reproduce

Occasionally the new genetic material increases the organism’s ability to survive and so reproduce

If it allows the organism to reproduce more, then this leads to more and more organisms having the “new improved” genetic make-up

“Good” sets of genes get reproduced more; “bad” sets of genes get reproduced less
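In a GA, “good sets of genes get reproduced more” is typically implemented as fitness-proportional (“roulette-wheel”) selection. A minimal sketch using Python’s standard `random.choices`, assuming non-negative fitness values that are not all zero; `roulette_select` and the fitness numbers are illustrative.

```python
import random

def roulette_select(population, fitnesses, k, rng):
    """Fitness-proportional selection with replacement: each draw picks
    individual j with probability fitnesses[j] / sum(fitnesses).
    Assumes non-negative fitness values that are not all zero."""
    return rng.choices(population, weights=fitnesses, k=k)

rng = random.Random(1)
population = ["good", "average", "bad"]
fitnesses = [6.0, 3.0, 1.0]  # illustrative values
parents = roulette_select(population, fitnesses, k=10_000, rng=rng)
counts = {name: parents.count(name) for name in population}
print(counts)  # expected shares: 60% / 30% / 10%
```

An individual with fitness 0 is never chosen, which is the GA counterpart of an organism that does not reproduce.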

Theory of Evolution (2)

The organisms as a whole get better and better at surviving in their environment

Evolutionists claim that all the species of plants and animals have been produced by this slow changing of genetic material - with organisms becoming better and better at surviving in their niche, and new organisms evolving to fill any vacant niche

They agree that evolution requires reproduction, selection and mutation

Some say evolution also requires crossover

Evolution as Search

We can think of evolution as a search through the enormous genetic parameter space for the genetic make-up that best allows an organism to reproduce in its changing environment

Since evolution seems pretty good at doing this job, we can borrow ideas from nature to help us solve problems that have equally large search spaces or similarly changing environments
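Putting reproduction, selection, crossover and mutation together gives a basic genetic algorithm. The sketch below searches the toy OneMax problem (maximise the number of 1-bits), chosen purely for illustration; tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values are our assumptions, not something the slides prescribe.

```python
import random

def onemax(bits):
    """Toy fitness: number of 1-bits (optimum = all ones)."""
    return sum(bits)

def tournament(pop, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=onemax)

def evolve(length=30, pop_size=40, generations=60, p_mut=None, seed=0):
    rng = random.Random(seed)
    p_mut = p_mut if p_mut is not None else 1.0 / length
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=onemax)
        children = [best[:]]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, length - 1)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=onemax)

best = evolve()
print(onemax(best), "of", len(best))
```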

Dr. Eick’s Transparencies:

Genetics and What EC Algorithm Designers can Learn from It

More Genetics: Diploidy and Dominance

Diploidy: the chromosomes of most higher organisms come in pairs (diploid, sometimes called “double-stranded”) rather than singly (haploid); both chromosomes of a pair carry information for the same functions.

The primary mechanism that selects which genotypic information is expressed in the phenotype is dominance:

– AbCDe + aBCde → ABCDe (uppercase letters denote dominant alleles)

Diploidy provides a mechanism for remembering alleles and allele combinations that were previously useful; dominance provides a mechanism to shield those remembered alleles from harmful selection in a currently hostile environment (implicitly increasing the richness of the gene pool of the current population by providing a shield against overselection).
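The dominance example above can be reproduced locus by locus. A minimal sketch, assuming (as the example suggests) that uppercase letters are dominant alleles and lowercase letters recessive ones; `express` is our own name.

```python
def express(strand_a, strand_b):
    """Phenotype of a diploid genotype, locus by locus: an uppercase
    (dominant) allele wins over a lowercase (recessive) one."""
    assert len(strand_a) == len(strand_b)
    phenotype = []
    for x, y in zip(strand_a, strand_b):
        assert x.lower() == y.lower(), "both strands must code the same gene"
        phenotype.append(x.upper() if (x.isupper() or y.isupper()) else x)
    return "".join(phenotype)

print(express("AbCDe", "aBCde"))  # ABCDe
```

The recessive alleles a, b and d are still carried by the genotype even though they do not show in the phenotype; this is the “memory” that diploidy provides.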

Dominance relationships frequently adapt in biological systems when the need arises.

Hollstien (1971) simulated dominance using a three-letter rather than a binary alphabet, consisting of a dominant 1, a recessive 1, and 0, with

$1_{\mathrm{dom}} > 0$ and $1_{\mathrm{rec}} < 0$, where $>$ reads “is dominant over”.
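The expression rule implied by this dominance order can be written as a small lookup. A minimal sketch; the string labels "1D", "1R" and "0" for the three letters are our own encoding.

```python
# Hollstien's triadic alphabet; the labels below are our own encoding.
DOM_ONE, REC_ONE, ZERO = "1D", "1R", "0"

def express_locus(x, y):
    """Expressed bit for one locus: 1D dominates 0, and 0 dominates 1R."""
    if DOM_ONE in (x, y):
        return 1
    if ZERO in (x, y):
        return 0
    return 1  # both alleles are recessive 1s, so a 1 is expressed

def express(chrom_a, chrom_b):
    return [express_locus(x, y) for x, y in zip(chrom_a, chrom_b)]

print(express(["1R", "0", "1D"], ["1R", "1R", "0"]))  # [1, 0, 1]
```

The middle locus shows the shielding effect: a recessive 1 is carried but hidden behind a 0, and it can reappear in an offspring that inherits two recessive 1s.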

Dominance and Diploidy (Continued)

Other research represents the dominance information separately from the gene and lets it undergo evolution --- a kind of co-evolution approach.

In the late 1980s, Smith and Goldberg explored the use of redundancy for the normal knapsack problem with dynamic weight changes:

– Hollstien’s triadic scheme showed improvement over a static dominance scheme.

– it turned out that the diploid approach coped better with oscillations in the weight function.

– diploidy decreases the probability that desired schemas are lost “forever”.

In summary, there seems to be some evidence that exploiting diploidy can be beneficial for GAs in dynamically changing environments, especially if scenarios encountered in the past tend to recur in the future; on the other hand, diploidy is quite expensive, and not much research has been performed in the last 15 years that explores its use for GAs.

What can GA-designers learn from plant genetics and horticulture?

– polyploidy and dominance
– gametogenesis is used as the crossover operator
– use of selfing
– unusual ways to prevent self-fertilization
– use of intercrossing (create Cartesian products of good initial solutions)
– preference for heterozygous sources and rich gene pools
– plant breeders employ complex search strategies to breed the best possible plant (such as recurrent selection, which will be the topic of this talk)
– mutation is not very important, because it is hard to control; large population sizes are difficult to handle for pragmatic reasons.
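Intercrossing can be read as mating every good initial solution with every other one. A minimal sketch under stated assumptions: solutions are bit strings, each unordered pair of distinct parents is crossed exactly once (no selfing), and uniform crossover is our swapped-in recombination operator; `intercross` and `uniform_cross` are hypothetical names.

```python
import itertools
import random

def uniform_cross(a, b, rng):
    """Uniform crossover (our choice): each gene from either parent with p = 0.5."""
    return "".join(rng.choice(pair) for pair in zip(a, b))

def intercross(elite, cross, rng):
    """Cross every unordered pair of distinct elite solutions exactly once."""
    return [cross(a, b, rng) for a, b in itertools.combinations(elite, 2)]

elite = ["11110000", "00001111", "10101010", "01010101"]  # illustrative
children = intercross(elite, uniform_cross, random.Random(3))
print(len(children), "children from", len(elite), "parents")  # 6 children from 4 parents
```

With k good solutions this yields k(k-1)/2 offspring, so it is only practical for the small elite sets a plant breeder would start from.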

Polyploidy

Polyploidy: using two or more complete sets of chromosomes; the phenotype of an organism is determined through dominance of alleles.

Advantages: adaptation to changing environments, “memorizing” alleles that worked successfully in the past, a richer gene pool.

Previous research on polyploidy: there are two major approaches to simulating polyploidy in GAs:

– using an extra chromosome to represent dominance information [Brindle, this talk]

– extending the alphabet to distinguish between dominant and recessive elements [Hollstien; Smith & Goldberg; Ng & Wong]

Features of our Approach

– uses at least 2 sets of chromosomes
– uses a dominance vector as a tie breaker
– uses a crossover control vector to restrict possible crossover points
– dominance vectors and crossover control vectors take part in the evolution
– gametogenesis is used as the crossover operator
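The slide does not spell out how the dominance vector breaks ties. The sketch below assumes one plausible reading, not the exact method of Hadad and Eick: the phenotype at each locus is the majority value across all chromosome sets, the dominance vector decides only when that vote is tied, and the crossover control vector marks permitted cut points with 1 (a cut at position i falls just before gene i). All function names are hypothetical.

```python
import random

def express(chromosome_sets, dominance):
    """Assumed rule: majority vote per locus over all chromosome sets;
    a tied vote is broken by the dominance vector entry for that locus."""
    phenotype = []
    for locus, dom in enumerate(dominance):
        ones = sum(chrom[locus] for chrom in chromosome_sets)
        zeros = len(chromosome_sets) - ones
        if ones != zeros:
            phenotype.append(1 if ones > zeros else 0)
        else:
            phenotype.append(dom)  # tie: dominance vector decides
    return phenotype

def restricted_crossover(a, b, control, rng):
    """Assumed rule: one-point crossover whose cut may only fall before
    a gene i with control[i] == 1."""
    cuts = [i for i in range(1, len(a)) if control[i] == 1]
    if not cuts:
        return a[:], b[:]  # no permitted cut point: parents are copied
    cut = rng.choice(cuts)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

sets = [[1, 0, 1, 0], [1, 1, 0, 0]]  # diploid: two chromosome sets
dominance = [0, 1, 0, 1]
print(express(sets, dominance))  # [1, 1, 0, 0]
```

With an odd number of sets (e.g. triploidy) a binary vote can never tie, so under this reading the dominance vector only matters for even ploidy.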

3. Experiments

Benchmarks:
– knapsack problem with dynamically changing weight constraints
– Schwefel function

Evaluation is performed with respect to the following measure:

$M_2 = \frac{1}{G}\sum_{i=1}^{G} (T_i - X_i)^2$

where $T_i$ is the true optimum for generation $i$, $X_i$ is the best solution found in generation $i$, and $G$ is the number of generations.
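The measure is the mean squared gap between the true optimum and the best value found, averaged over generations. A minimal sketch; the function name `m2` and the numbers in the example are made up for illustration.

```python
def m2(true_optima, best_found):
    """Mean over generations of (T_i - X_i)^2, where T_i is the true optimum
    and X_i the best value found in generation i."""
    assert len(true_optima) == len(best_found) > 0
    G = len(true_optima)
    return sum((t - x) ** 2 for t, x in zip(true_optima, best_found)) / G

print(m2([10, 10, 12], [10, 8, 11]))  # (0 + 4 + 1) / 3, about 1.667
```

Lower is better; because the optimum $T_i$ may change from generation to generation, the measure rewards tracking a moving target, not just finding one good solution.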

4. Summary

– proposed an approach to support polyploidy that uses dominance vectors.
– demonstrated the benefits of the approach in oscillating environments that cycle among several different states.
– crossover control vectors are employed to provide linkage between the dominance vector and the chromosomes themselves.
– the approach facilitates maintaining diversity in relatively small populations.
– our experiments at least partially explain why diploidy and polyploidy exist in biological systems.

Literature

Ben S. Hadad and Christoph F. Eick: Using Recurrent Selection to Improve GA-performance, ISMIS, Charlotte, October 1997.

Ben S. Hadad and Christoph F. Eick: Supporting Polyploidy in Genetic Algorithms Using Dominance Vectors, EP’97, Indianapolis, April 1997.

Ben S. Hadad: Extending Genetic Algorithms Using Ideas Borrowed from Plant Genetics and Horticulture, Master’s Thesis, University of Houston, December 1996.

Inversion and Other Reordering Operators

Reordering operators change the position/location of genes in a chromosome, but do not change the composition of the chromosome:
– consequently, reordering operators do not directly affect the fitness.
– however, crossover is affected: applying a reordering operator changes the defining length of a schema, which increases or decreases the probability that crossover disrupts instances of that schema, and hence the probability that they recur in the future.
– reordering means that genes are no longer lined up correctly, which, in many applications, causes problems for the crossover operator: necessary genes might be missing, so incomplete gene combinations can occur, and duplicated genes can occur, which is usually not desirable.
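The defining length $\delta(H)$ of a schema $H$ is the distance between its first and last fixed positions. With one-point crossover and a uniformly chosen cut point on a string of length $l$, the cut falls between those positions with probability $\delta(H)/(l-1)$, which bounds the chance of disrupting the schema. The sketch below (function names are ours, '#' is the don't-care symbol) shows how moving two fixed genes next to each other, which a reordering operator can do, shrinks that probability.

```python
def defining_length(schema):
    """Distance between the first and last fixed (non-'#') positions."""
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return fixed[-1] - fixed[0] if fixed else 0

def disruption_bound(schema):
    """Probability that a uniformly chosen one-point cut falls between the
    schema's outermost fixed positions: delta(H) / (l - 1)."""
    return defining_length(schema) / (len(schema) - 1)

# Two fixed genes far apart, then moved next to each other by reordering.
print(defining_length("1######0"), disruption_bound("1######0"))  # 7 1.0
print(defining_length("#10#####"), disruption_bound("#10#####"))  # 1 and about 0.143
```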

The most popular reordering operators are inversion and swapping:

1 2 3 | 4 5 6 7 | 8
inversion: 1 2 3 7 6 5 4 8
swap: 1 2 3 7 5 6 4 8

Empirical evidence seems to indicate that, at least in some applications, reordering operators are useful “secondary” operators whose employment induces slight improvements in the overall performance.
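Both operators from the example can be reproduced directly. In this sketch positions are 0-based, the marked segment covers the 4th to 7th genes, and the swap exchanges the two genes at the segment boundaries, which is what yields the result shown on the slide; `invert` and `swap` are our own names. The genes are kept as identifiers (1 to 8), so reordering changes only their positions, not the decoded solution.

```python
def invert(genes, i, j):
    """Inversion: reverse the segment genes[i:j]."""
    return genes[:i] + genes[i:j][::-1] + genes[j:]

def swap(genes, i, j):
    """Swap: exchange the genes at positions i and j."""
    out = list(genes)
    out[i], out[j] = out[j], out[i]
    return out

chrom = [1, 2, 3, 4, 5, 6, 7, 8]
print(invert(chrom, 3, 7))  # [1, 2, 3, 7, 6, 5, 4, 8]
print(swap(chrom, 3, 6))    # [1, 2, 3, 7, 5, 6, 4, 8]
```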

Niche and Speciation

We can view a niche as an organism’s job or role in an environment, and we can think of a species as a class of organisms with common characteristics.

Niche methods in genetic search:
– crowding (De Jong (1975)) and sharing functions (Goldberg (1987)).
– external schemata (Perry (1984)): similarity templates, provided by the GA developer, that define species membership.
– mating restrictions in genetic search:
line breeding (breed the champion repeatedly with others);
Hollstien’s inbreeding with intermittent crossbreeding (close individuals continue to breed as long as their family’s average fitness keeps improving; otherwise, crossbreeding between different families is used).

Booker introduced mating templates: mate-selection mechanisms that become part of the individual (and thus themselves undergo evolution). He proposed different mating rules:

– bidirectional match

– unidirectional match

– best partial matches

– disallowing breeding of similar individuals (e.g. incest)
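Of the niche methods listed, sharing is easy to show in code: each individual’s fitness is divided by a niche count, so crowded regions stop dominating. A minimal sketch of the common triangular sharing function $sh(d) = 1 - (d/\sigma_{share})^{\alpha}$ for $d < \sigma_{share}$ (0 otherwise), assuming bit strings and Hamming distance as the similarity measure; parameter values are illustrative.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, fitness, sigma_share, alpha=1.0):
    """Divide each raw fitness by its niche count sum_j sh(d(i, j))."""
    def sh(d):
        return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0
    result = []
    for ind in population:
        # the sum includes ind itself, since sh(0) = 1
        niche_count = sum(sh(hamming(ind, other)) for other in population)
        result.append(fitness(ind) / niche_count)
    return result

pop = ["1111", "1110", "0000"]
print(shared_fitness(pop, lambda s: 10.0, sigma_share=2))
```

Here the two neighbouring strings share a niche (niche count 1.5 each) and drop to 10/1.5, while the isolated "0000" keeps its full fitness of 10.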

Example of a Booker Mating Template

Assume we have chromosomes over an alphabet $A$ with chromosome length $n$, and let $A' = A \cup \{\#\}$.

Extend the chromosomes, tripling their length, to

$ind = a_1 \dots a_n\, b_1 \dots b_n\, c_1 \dots c_n$ with $a_i \in A$ and $b_i, c_i \in A'$ ($i = 1, \dots, n$),

with the meaning: $ind$ is allowed to mate with $ind'$ if $a'_1 \dots a'_n \in \mathrm{Schema}(b_1 \dots b_n)$ or $a'_1 \dots a'_n \in \mathrm{Schema}(c_1 \dots c_n)$, where $a'_1 \dots a'_n$ is the functional part of $ind'$.

Example: Let $n = 4$ and $A$ be the binary alphabet:

ind1 = 0010 0000 1111
ind2 = 0000 1### 0111
ind3 = 0111 001# 1111

Bidirectional match requires that “a must want b” and “b must want a”, whereas in unidirectional match it is sufficient that one partner wants the other. Many other matching schemes are possible, e.g. more complicated ones that operate on scores and thresholds.
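The example can be checked mechanically. This sketch assumes each individual is stored as one string of length $3n$ (functional part, then the two templates) with '#' as the don't-care symbol; `matches`, `wants`, `bidirectional` and `unidirectional` are our own names.

```python
from itertools import combinations

def matches(schema, string):
    """True if `string` is an instance of `schema` ('#' = don't care)."""
    return len(schema) == len(string) and all(s in ("#", c) for s, c in zip(schema, string))

def wants(ind, other, n):
    """ind = a|b|c; ind wants other if other's functional part
    is an instance of template b or of template c."""
    b, c = ind[n:2 * n], ind[2 * n:3 * n]
    a_other = other[:n]
    return matches(b, a_other) or matches(c, a_other)

def bidirectional(x, y, n):
    return wants(x, y, n) and wants(y, x, n)

def unidirectional(x, y, n):
    return wants(x, y, n) or wants(y, x, n)

ind1 = "0010" + "0000" + "1111"
ind2 = "0000" + "1###" + "0111"
ind3 = "0111" + "001#" + "1111"
named = [("ind1", ind1), ("ind2", ind2), ("ind3", ind3)]
for (nx, x), (ny, y) in combinations(named, 2):
    print(nx, ny, "bidirectional:", bidirectional(x, y, 4),
          "unidirectional:", unidirectional(x, y, 4))
```

In this example ind1 wants ind2, ind2 wants ind3 and ind3 wants ind1, but none of these wishes is reciprocated: no pair passes bidirectional match, while every pair passes unidirectional match.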

Artificial Mating Tags

The problem with Booker’s approach is that mating templates have the same length as the chromosomes themselves, producing a significant overhead. To reduce this overhead, Holland proposed using three-part strings consisting of:
– a short mating template (used to test the suitability of other mates)
– a short mating tag (used by others to match; characterizes the string)
– the functional substring

Example:
#10#:1010:111111000011
#0##:1100:011111110001

– mating tags affect the compatibility with other strings, but do not affect the fitness.
– usually, the three-part string is evolved.
– Holland’s scheme of using artificial mating tags can also be used to define mating niches abstractly, similar to Perry’s external schema approach, by freezing particular positions in templates and tags. For example, mating can easily be restricted to particular subsets of the population. Mating tags can also be used to simulate distributed GAs.
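A minimal sketch of Holland’s three-part strings, assuming ':' separates template, tag and functional part as in the example above and '#' is the don’t-care symbol; `parse` and `accepts` are hypothetical names. Only the short template and tag are compared, and the functional part (which alone would determine fitness) is never looked at.

```python
def matches(template, tag):
    """True if `tag` is an instance of `template` ('#' = don't care)."""
    return len(template) == len(tag) and all(t in ("#", c) for t, c in zip(template, tag))

def parse(s):
    template, tag, functional = s.split(":")
    return template, tag, functional

def accepts(x, y):
    """x accepts y as a mate if x's mating template matches y's mating tag."""
    return matches(parse(x)[0], parse(y)[1])

s1 = "#10#:1010:111111000011"
s2 = "#0##:1100:011111110001"
print(accepts(s1, s2), accepts(s2, s1))  # True True
```

Freezing a position restricts mating to a subset of the population: a template such as "1###" accepts only partners whose tag starts with 1, which is one way to carve out mating niches.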