

Page 1: GENETIC ALGORITHMS (eugennc/teaching/nim/res/min2.pdf)

GENETIC

ALGORITHMS

Slides following

Michalewicz’s monograph on Genetic Algorithms

Eiben&Smith’s monograph on Evolutionary Computing

1

Local search (maximization):

Iterated Hillclimbing

begin
  t ← 0;
  repeat
    local ← FALSE;
    select a candidate solution (bitstring) vc at random;
    evaluate vc;
    repeat
      generate the length strings Hamming-neighbors_1 of vc
        {or select some among them};
      select vn, the one among these which gives the largest value for the objective function f;
      if f(vc) < f(vn) then vc ← vn else local ← TRUE
    until local;
    t ← t + 1;
  until t = Max
end.

Mix of random search and deterministic search.

New idea: small mutation on the representation.

2
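The loop above can be written out as a minimal Python sketch; the `hillclimb` helper and the one-max fitness below are illustrative choices, not part of the slides:

```python
import random

def hillclimb(f, length=12, max_restarts=20, seed=0):
    """Iterated hillclimbing over bitstrings: restart at random points,
    greedily move to the best Hamming-distance-1 neighbor until stuck."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_restarts):          # t = 0 .. Max-1
        vc = tuple(rng.randint(0, 1) for _ in range(length))
        local = False
        while not local:
            # the `length` Hamming-neighbors_1 of vc (one bit flipped each)
            neighbors = [vc[:i] + (1 - vc[i],) + vc[i + 1:] for i in range(length)]
            vn = max(neighbors, key=f)     # largest objective value
            if f(vc) < f(vn):
                vc = vn
            else:
                local = True               # local optimum reached
        if best is None or f(vc) > f(best):
            best = vc
    return best

# one-max (sum of bits) has no strict local optima, so hillclimbing solves it
print(hillclimb(sum))   # (1, 1, ..., 1)
```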

Local search (maximization): Simulated Annealing

begin
  t ← 0;
  initialize temperature T;
  select a current candidate solution (bitstring) vc at random;
  evaluate vc;
  repeat
    repeat
      select at random vn, one of the length strings Hamming-neighbors_1 of vc;
      if f(vc) < f(vn) then vc ← vn
      else if random[0,1) < exp((f(vn) − f(vc))/T) then vc ← vn
    until (termination condition);
    T ← g(T,t);  {g(T,t) < T, for all t}
    t ← t + 1
  until (halting criterion)
end.

Little determinism remains. Diversity encouraged.

New ideas: the worse solution has a chance to survive – a smaller and smaller chance. Escape local optima!

3
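The annealing loop above, as a small Python sketch for bitstring maximization; the cooling schedule, temperatures, and one-max fitness in the demo are illustrative assumptions:

```python
import math
import random

def anneal(f, length=12, t0=2.0, cooling=0.95, steps_per_t=30, t_min=0.01, seed=1):
    """Simulated annealing (maximization) over bitstrings.
    Worse neighbors are accepted with probability exp((f(vn) - f(vc)) / T)."""
    rng = random.Random(seed)
    vc = tuple(rng.randint(0, 1) for _ in range(length))
    T = t0
    while T > t_min:                       # halting criterion
        for _ in range(steps_per_t):       # inner "termination condition"
            i = rng.randrange(length)      # a random Hamming-neighbor_1 of vc
            vn = vc[:i] + (1 - vc[i],) + vc[i + 1:]
            if f(vc) < f(vn) or rng.random() < math.exp((f(vn) - f(vc)) / T):
                vc = vn                    # worse moves survive less and less often
        T *= cooling                       # g(T, t) < T: geometric cooling schedule
    return vc

print(anneal(sum))   # a near-optimal bitstring for one-max
```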

A non-classical approach

• Self-adaptation: optimisation as discovery (rather than

invention)

• Fewer approximations

• More than one good solution

• Processing directly computer representations of solutions

(not those of intermediary mathematical objects)

• Turning the combinatorial explosion to the user’s benefit

• Meta-optimisation (of the algorithm)

4


What are we dealing with ?

• “General-purpose” search methods

• Probabilistic algorithms

• Inspired from natural evolution

• Soft-computing paradigm

• Useful in

– optimisation

– machine learning

– design of complex systems

5

Optimisation

• Constraints

• Discrete or continuous variables

• Dynamic (moving) optimum

• Multicriterial

• Procedural view:

– Step by step improvement

– “Needle in the haystack”

• Criteria: accuracy and precision

6

Computing (1)

• Hard computing

– Rigid, “static” models: no adaptation, or

adaptation driven from outside

– The model for a problem is completely

described beforehand, not to be changed as

solving goes on

• Same “roadmap” for each instance of the problem

– Calculus, Numerical Analysis (Newton

method), …

7

Computing (2)

• Soft computing

– Self-adapting models

– Only the frame for auto-adaptation is specified

– Properties of the instance of the problem – fully exploited

– Probabilistic modelling and reasoning

• Evolutionary Computing

• Artificial neural networks

• Fuzzy sets

8


Glossary (1)

• Chromosome (individual) – a sequence of genes (…)

• Gene – atomic information in a chromosome

• Locus – position of a gene

• Allele – all possible values for a locus

• Mutation – elementary random change of a gene

• Crossover (recombination) – exchange of genetic information between parents

• Fitness function – actual function to optimise; environment

9

Glossary (2)

• Population – set of candidate solutions

• Generation – one iteration of the evolutionary process

• Parent – chromosome which “creates” offspring via a genetic

operator

• Descendant (offspring) – chromosome obtained after

applying a genetic operator to one or more parents

• Selection – (random) process of choosing parents and / or survivors

• Genotype – representation at the individual level

• Phenotype – (theoretical) object represented through genotype

• Solution given by a GA – best candidate in the last generation

10

Evolutionary computing

• Evolution Strategies

• Evolutionary Programming

• Genetic Algorithms

– Genetic Programming

11

The first two major directions

• Evolution strategies

– A method to optimise real-valued parameters

– Random mutation and selection of the fittest

– Ingo Rechenberg (1965,1973); Schwefel (1975,1977)

• Evolutionary programming

– Fogel, Owens, Walsh (1966)

– Random mutations on the state-transition diagrams of finite state machines; selection of the fittest

12


A Breakthrough: GAs

• John Holland – 60’s and 70’s.

• Goal: not solving specific problems, but

rather study adaptation

• “Adaptation in Natural and Artificial

Systems” (1975)

• Abstraction of biological evolution

• Theoretical study (schema theorem).

13

Further ideas

– Messy Genetic Algorithms (Goldberg)

– Genetic Programming (Koza)

– Co-evolution (Hillis)

– Memetic / cultural algorithms

– Micro-economic algorithms

• Metaphor-based algorithms

• Statistical study of “suitability” between representation and fitness function, operators, etc.

14

Evolutionary Computing - Shared

Features

• maintain a population of representations of candidate solutions

• which are improved over successive generations

• by means of mutation (, crossover,…)

• and selection based on individual merit

• assigned to individuals through evaluation against the fitness function (“environment”)

15

Data Structures

• Genetic Algorithms

– standard: bitstring representations (integers, real numbers, permutations etc.);

– others: bidimensional string structures (graphs, matrices etc.); neighborhood-based (“island”); “geographically”-distributed etc.

– Genetic Programming: trees (programs)

• Evolution Strategies: floating point representations (real numbers)

• Evolutionary Programming: representations of finite state machines

16


Theoretical Models

• Schema model (Holland)

– Scheme;

– Building-blocks hypothesis

– Implicit parallelism

• Markov-chain models (linked processes)

• Vose models (statistical mechanics)

• Bayesian models (Muehlenbein, …)

17

The Elements of a

Genetic Algorithm

• A population of pop_size individuals (chromosomes)

– An initialisation procedure

• A representation (genotype) of candidate solutions (phenotype)

• An evaluation (fitness) function

• A selection scheme

• Genetic operators (mutation, crossover, …)

• Parameters (size, rates, scalability, …)

• A halting condition

18

Meta-optimisation

• Design of a GA for a given problem:

– choose a representation (out of a few possible);

– choose a fitness function (various models; scalable);

– choose a selection scheme (around 10 “classical”);

– choose operators (or invent them);

– set the parameter values (tens/hundreds).

• A moderate evaluation: 5·10·10·10·100 = 500,000:

hundreds of thousands of GAs for a given problem.

• Most of them probably perform poorly, but usually a few of them outperform other kinds of algorithms

• Finding the “good” GA for a problem is an optimisation problem in itself

– Meta-GA

19

General Scheme of a GA

begin
  t ← 0;
  Initialise P(t);
  Evaluate P(t);
  While not (halting condition) do begin
    t ← t + 1;
    Select P(t) from P(t-1);
    Apply operators to P(t);
    Evaluate P(t)
  end
end.

• Except possibly for the Evaluation step, the scheme is polynomial.

• This scheme is valid for (most) evolutionary algorithms.

20
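The general scheme above, written out as a minimal Python sketch; the one-max fitness, adjacent-pair one-point crossover, and all parameter values are illustrative choices, not prescribed by the slides:

```python
import random

def one_max(v):                 # toy fitness: count of ones (assumed example)
    return sum(v)

def roulette(fits, rng):
    """Fitness-proportional ("fortune wheel") selection: return an index."""
    r = rng.random() * sum(fits)
    acc = 0.0
    for i, fi in enumerate(fits):
        acc += fi
        if r <= acc:
            return i
    return len(fits) - 1

def run_ga(f=one_max, length=20, pop_size=30, p_c=0.7, p_m=0.02, max_gen=60, seed=3):
    rng = random.Random(seed)
    # Initialise P(0) and evaluate
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=f)                          # best_so_far
    for t in range(1, max_gen + 1):                 # halting condition: budget
        fits = [f(v) for v in pop]
        # Select P(t) from P(t-1)
        pop = [pop[roulette(fits, rng)][:] for _ in range(pop_size)]
        # Apply operators: one-point crossover on adjacent pairs, then mutation
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_c:
                pos = rng.randrange(1, length)
                pop[i][pos:], pop[i + 1][pos:] = pop[i + 1][pos:], pop[i][pos:]
        for v in pop:
            for j in range(length):
                if rng.random() < p_m:
                    v[j] = 1 - v[j]
        # Evaluate P(t), track best_so_far
        best = max(pop + [best], key=f)
    return best

print(one_max(run_ga()))   # close to the optimum (20) on one-max
```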


How Do Genetic Algorithms Work?

21

Implementation and Use of a GA

1. Define its elements

• Representation

• Fitness function

• Selection mechanism

• Genetic operators

• Parameters

2. Design the experiment (includes optimising the GA)

• Stand-alone

• Comparison to non-GA algorithms

3. Actually perform the experiment

4. Interpret the results

22

What about the Problem ?

1. State the problem as optimisation of a real function (clustering?)

2. A step-by-step solution improvement strategy should be possible

• Otherwise, create a frame for this (e.g., #_of_minterms)

3. Uni-criterial optimisation. Otherwise:

• build one global criterion (linear – timetable – / non-linear) or

• work in a Pareto (Nash…) setting

4. Optimisation as maximisation. Otherwise:

• use −f or 1/f or analogous, instead of f

5. Optimisation of positive functions

• Otherwise, translate the function by a constant

6. Restrictions:

• Encoded (included in the representation);

• Penalties (non-admissible solutions accepted, with lower fitness);

• Repair (find closest admissible solution; costly time-wise).

23

The Satisfiability Problem

• Straightforward to encode solutions (a string of k bits is a truth assignment for k variables)

• Straightforward evaluation (the truth value of the resulting expression)

• But with only two possible values of the fitness function, impossible to “learn” / improve step-by-step.

• However: fitness for conjunctive normal form: the number of clauses evaluated as “true”

– step-by-step improvement possible.

24
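The clause-counting CNF fitness can be sketched directly; the signed, 1-based literal encoding used here is an assumption of this example:

```python
def clauses_satisfied(cnf, assignment):
    """CNF fitness: the number of clauses evaluating to true.
    A clause is a list of literals: literal i means "variable i is true",
    -i means "variable i is false" (variables are 1-based)."""
    def lit(l):
        v = assignment[abs(l) - 1]
        return v if l > 0 else 1 - v
    return sum(any(lit(l) for l in clause) for clause in cnf)

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
cnf = [[1, -2], [2, 3], [-1, -3]]
print(clauses_satisfied(cnf, [1, 1, 0]))  # 3: a satisfying assignment
print(clauses_satisfied(cnf, [0, 0, 0]))  # 2: one clause still false
```

Unlike the two-valued satisfied/unsatisfied fitness, this count changes by small steps, so step-by-step improvement is possible.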


A GA Frame for

Numerical Optimisation

• f: R^k → R, with restrictions xi ∈ Di = [ai; bi], f > 0.

• Optimisation using 6 decimal positions.

• Each Di is to be cut into equal subintervals, where

  (bi − ai)·10^6 ≤ 2^mi − 1, with mi minimal.

• Each xi would then be represented by a substring of mi bits:

  xi = ai + decimal(d(mi−1) … d1 d0)·(bi − ai)/(2^mi − 1)

• Decoding: a candidate solution is represented by a bitstring of length m = m1 + m2 + … + mk.

25

Elements of a GA for

Numerical Optimisation

• Initialization:

– population v1 , v2 , …, vpop_size .

– random bitwise or

– heuristic (initiate with some found solutions).

• Evaluation:

– decode the bitstring v into a vector x of numbers

– calculate eval(v)=f(x) .

• Halting condition: a maximum number of generations

(iterations) may be enforced.

26

One More Element of the GA:

Selection for Survival (1)

• Probability field based on fitness values

• Best known procedure is the “fortune wheel”:

– Slots proportional to fitness.

– Calculate (steps 1, 2, 3, 4, for i = 1..pop_size)

1. Individual fitnesses: eval(vi)

2. Total fitness: F = Σ(i=1..pop_size) eval(vi)

3. Individual selection probabilities: pi = eval(vi)/F

Probability field already built!

Other options: deterministic, rank-based, tournament...

4. Cumulative selection probabilities: qi = Σ(j=1..i) pj; q0 = 0.

Procedure: iterated fortune wheel.

27

Probability field and cumulative

probabilities

28

[Figure: the interval [0, 1] partitioned by the cumulative probabilities 0 = q0 ≤ q1 ≤ q2 ≤ … ≤ q_pop_size = 1; chromosome i owns the slot (q(i−1), qi].]


One More Element of the GA: Selection for Survival (2)

• Idea: create an intermediate population

– Spin the fortune wheel pop_size times. Each time:

• select for survival one chromosome from the current population

• put it into the intermediate population.

• Implementation:

– generate a random (float) number r in [0, 1];

– if (q(i−1) < r ≤ qi) then select vi.

• With higher probability:

– best chromosomes get more and more copies;

– average stay even;

– worst die off.

29
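Building the cumulative probabilities and spinning the wheel once, as a small Python sketch; the fitness values in the demo are made up:

```python
def make_cumulative(fit_values):
    """Cumulative selection probabilities q_0..q_pop_size, with q_0 = 0."""
    F = sum(fit_values)                       # total fitness
    q, acc = [0.0], 0.0
    for fi in fit_values:
        acc += fi / F                         # p_i = eval(v_i) / F
        q.append(acc)
    q[-1] = 1.0                               # guard against rounding drift
    return q

def spin(q, r):
    """One spin with random number r: return i such that q[i-1] < r <= q[i]."""
    for i in range(1, len(q)):
        if q[i - 1] < r <= q[i]:
            return i
    return 1                                  # r == 0.0 edge case

q = make_cumulative([1, 2, 3, 4])             # q = [0, 0.1, 0.3, 0.6, 1.0]
print(spin(q, 0.25), spin(q, 0.05), spin(q, 0.99))   # 2 1 4
```

Fitter chromosomes own wider slots of [0, 1], so they are drawn more often across the pop_size spins.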

Operators for a Numerical

Optimisation GA: Crossover (1)

• pc – the probability of crossover – gives the expected number of chromosomes which undergo crossover: pc·pop_size.

• Selection for crossover (parents).

– In the intermediate population

for (each chromosome) do

generate a random (float) number r in [0 , 1];

if r < pc then select the current chromosome.

mate the parents randomly.

30

Operators for a Numerical Optimisation GA: Crossover (2)

• for (each pair of parents) do

– generate a random number pos ∈ {1,..,m−1}

– replace the two parents by their offspring:

P1 = (a1 a2 … apos apos+1 … am)
P2 = (b1 b2 … bpos bpos+1 … bm)

O1 = (a1 a2 … apos bpos+1 … bm)
O2 = (b1 b2 … bpos apos+1 … am)

• Uniform distribution?

31
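The tail swap above in a few lines of Python; the example strings are illustrative:

```python
import random

def one_point_crossover(p1, p2, pos=None, rng=random):
    """One-point crossover at pos in {1,..,m-1}: swap the two tails."""
    assert len(p1) == len(p2)
    if pos is None:
        pos = rng.randrange(1, len(p1))
    return p1[:pos] + p2[pos:], p2[:pos] + p1[pos:]

o1, o2 = one_point_crossover("10001100", "11101110", pos=3)
print(o1, o2)   # 10001110 11101100
```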

Operators for a Numerical Optimisation GA: Mutation

• pm – probability of mutation – gives the expected number of mutated bits: pm·m·pop_size

• Each bit in the population has the same chance to undergo mutation.

for (each chromosome after crossover) do
  for (each bit within the chromosome) do
    generate a random number r in [0, 1];
    if (r < pm) then mutate the current bit.

32


A Detailed Example (Michalewicz)

max f

f(x1, x2) = 21.5 + x1·sin(4πx1) + x2·sin(20πx2),

with restrictions: −3.0 ≤ x1 ≤ 12.1; 4.1 ≤ x2 ≤ 5.8,

with 4 decimal places for each variable.

• Representation:

– 2^17 < 151000 < 2^18; 2^14 < 17000 < 2^15: 18 + 15 = 33 bits.

– v = (010001001011010000 | 111110010100010)

– (010001001011010000) → x1 = 1.0524

– (111110010100010) → x2 = 5.7553

– eval(v) = f(1.0524, 5.7553) = 20.2526.

• Set: pop_size = 20; pc = 0.25; pm = 0.01.

33
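The decoding of the 33-bit chromosome above can be checked with a few lines of Python:

```python
def decode(bits, a, b):
    """x = a + decimal(bits) * (b - a) / (2^m - 1): map a bitstring into [a, b]."""
    m = len(bits)
    return a + int(bits, 2) * (b - a) / (2 ** m - 1)

v = "010001001011010000" + "111110010100010"   # the chromosome from the slide
x1 = decode(v[:18], -3.0, 12.1)
x2 = decode(v[18:], 4.1, 5.8)
print(round(x1, 4), round(x2, 4))   # 1.0524 5.7553, matching the slide
```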

First Generation

Random initialisation

Evaluation (best_so_far: v15; worst: v2)

• F = Σ eval(vi) = 387.7768

• Field of probabilities (pi, i=1..pop_size).

• Cumulative probabilities (qi, i=0..pop_size).

• Spinning the fortune wheel pop_size times.

• New population.

34

First Generation - Crossover

• Selection for crossover.

– Some chromosomes selected (pc·pop_size ?).

– If (odd number) then randomly decide {drop / add}.

– Randomly couple together selected parents.

• Example – one pair. Generate pos ∈ {1..32}. pos = 9.

v2’ = (100011000 | 101101001111000001110010)
v11’ = (111011101 | 101110000100011111011110)

v2’’ = (100011000 | 101110000100011111011110)
v11’’ = (111011101 | 101101001111000001110010)

35

First Generation – Mutation

• Selection for mutation.

– 660 times generate a random number in [0;1]

– Distribution? (uniform / targeted)

– Expected number of mutated bits: pm·m·pop_size=6.6

r          Bit position (population)   Chromosome number   Bit position (chromosome)
0.000213   112                         4                   13
0.009945   349                         11                  19
0.008809   418                         13                  22
0.005425   429                         13                  33
0.002836   602                         19                  8

36


Best_so_far

• Best chromosome in the final, 1000th generation:

v11=(110101100000010010001100010110000)

eval (v11) = f (9.6236 , 4.4278) = 35.4779.

• But, in the 396th generation:

• eval (best_of generation) = 38.8275 !

• So, store best_so_far as the current solution given by the GA (not best_of_generation).

• Reason: stochastic errors of sampling (pseudo-random numbers, finite populations, finite number of generations).

37

Genetic Algorithms for Various

Types of Problems

38

39

Four Sample GAs

1. Numerical optimisation

• Representing numbers

• The fitness function is the studied function

2. Combinatorial optimisation

• Representing graphs / permutations

• The objective function becomes fitness

3. Machine learning

• Representing strategies

• Optimisation of the “overall gain”.

4. Optimisation with restrictions

• Restrictions treated through penalties.

A Simple Numerical Example (1)

• Find, with 6-digit precision, x0 ∈ [−1; +2] maximizing

f(x) = x·sin(10πx) + 1.

• Representation.

[−1; +2] is to be divided into 3·10^6 equal subintervals. 22 bits are required:

2097152 = 2^21 < 3·10^6 < 2^22 = 4194304

• Decoding.

(b21 b20 … b0) → x′ = Σ bi·2^i ; x = −1 + x′·(3/(2^22 − 1))

40


A Simple Numerical Example (2)

• (0000000000000000000000) → −1.

• (1111111111111111111111) → +2.

• v1 = (1000101110110101000111) → x1 = +0.637197

• v2 = (0000001110000000010000) → x2 = −0.958973

• v3 = (1110000000111111000101) → x3 = +1.627888

• Evaluation: eval(v1) = f(x1) = 1.586245; eval(v2) = f(x2) = 0.078878; eval(v3) = f(x3) = 2.250650

• Mutation. Possible improvement with no arithmetic calculation!

• v3’ = (1110000001111111000101) → x3’ = +1.630818; eval(v3’) = f(x3’) = 2.343555 > f(x3) = eval(v3)!

41

A Simple Numerical Example (3)

• Crossover.

• Suppose v2 , v3 were selected as parents.

• v2 = (00000 | 01110000000010000)

• v3 = (11100 | 00000111111000101)

• Offspring:

• v2’ = (00000 | 00000111111000101) → x2’ = −0.99811

• v3’ = (11100 | 01110000000010000) → x3’ = +1.66602

• eval(v’2) = f(x’2) = 0.940865 > eval(v2)

• eval(v’3) = f(x’3) = 2.459245 > eval(v3) > eval(v2)

42

A Simple Numerical Example (4)

• (Main) parameters.

• pop_size = 50; pc = 0.25; pm = 0.01.

• Experiments.

• Solution after 150 generations:

• vmax = (1111001101000100000101)

• xmax = 1.850773; eval(vmax) = f(xmax)= 2.850227

• Evolution of f(xmax_so_far) over generations:

Generation:       1          10         40         100
f(xmax_so_far):   1.441942   2.250363   2.345087   2.849246

43

A Combinatorial Example

• Travelling Salesperson Problem. Given the cost of

travelling between every two cities, find the least expensive

itinerary which passes exactly once through each city and

returns to the starting one.

• An evolutionary approach.

– Representation: integer vector for permutations.

– Evaluation: sum of costs of successive edges in the tour.

– Mutation: switch two genes.

– Crossover (OX): choose a subsequence from a parent,

for the rest preserve the relative order from the other one

P: (1 2 3 4 5 6 7 8 9 10 11 12) (7 3 1 11 4 12 5 2 10 9 6 8)

O: (3 1 11 4 5 6 7 12 2 10 9 8) (1 2 3 11 4 12 5 6 7 8 9 10)

44
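The OX variant shown above (copy a segment from one parent in place, fill the remaining positions left to right in the other parent's relative order) reproduces the slide's example when the copied segment is positions 4–7; the segment choice is an assumption read off the example:

```python
def ox(p1, p2, i, j):
    """Order crossover, as described above: keep p1[i:j] in place and fill
    the other positions left-to-right with the missing values, in the
    relative order they have in p2."""
    segment = set(p1[i:j])
    rest = iter(x for x in p2 if x not in segment)
    return [p1[k] if i <= k < j else next(rest) for k in range(len(p1))]

P1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
P2 = [7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8]
print(ox(P1, P2, 3, 7))   # [3, 1, 11, 4, 5, 6, 7, 12, 2, 10, 9, 8]
print(ox(P2, P1, 3, 7))   # [1, 2, 3, 11, 4, 12, 5, 6, 7, 8, 9, 10]
```

Both offspring are valid permutations, which is why permutation-preserving operators like OX are used instead of plain one-point crossover for the TSP.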


Iterated Prisoner’s Dilemma (1)

• Gain table (one iteration):

Player1     Player2     Points1   Points2
Defect      Defect      1         1
Defect      Cooperate   5         0
Cooperate   Defect      0         5
Cooperate   Cooperate   3         3

• Maximize the number of points; # points = 5 − penalty

• Always defect? Overall, in many iterations, worse than “always cooperate”.

45

Iterated Prisoner’s Dilemma (2)

• Axelrod (1987).

• Deterministic strategies, based on the previous three moves: 4^3 = 64 different histories.

• Strategy: what move to make against each possible history: 64 bits (one “C/D” decision per history).

– Each bit address is a “history”.

• 6 extra bits for the history of the actual iterated game.

• 70 bits represent a chromosome

• The actual search space has cardinality 2^64

– All possible strategies taking into consideration the previous 3 moves.

46

Iterated Prisoner’s Dilemma (3)

• Random initialisation of the population;

• One generation:

– Evaluate each player (chromosome) against

other players on the actual games;

– Select chromosomes for applying operators:

(average score → breed once;

one SD above average → breed twice;

one SD below average → no breeding);

• SD stands for standard deviation

– Mutation and crossover on bitstrings are used.

47

Human Tournaments and the First Round of GA Experiments

• TIT FOR TAT won two human tournaments:

– cooperate;

– while (not halt) do what the opponent did in the previous step.

• GA experiments – first round: fixed environment.

– pop_size = 20; max_gen = 50; no_runs = 40;

– evaluation of each chromosome: average of iterated games against 8 best human strategies;

– results: TIT FOR TAT or better in the static fitness landscape (the 8 strategies)

• Explanation of “better”: TIT FOR TAT is general; some GA solutions were over-adapted to the specific environment.

• The global optimum was found evaluating at most 20·50 = 1000 ≈ 2^10 strategies out of 2^64.

– only about 5·10^−17 of the search space!

48


Second Round of GA Experiments

• Experiments – second round: changing environment.

– each individual evaluated against all 20 individuals;

– environment changes as individuals evolve;

– fitness: average score of the individual;

– early generations: uncooperative strategies;

– after 10-20 generations: more and more cooperation in the TIT FOR TAT manner: reciprocation!

• Further ideas:

– no crossover (Axelrod 1987);

– expandable memory (Lindgren 1992).

49

Fourth Example: the 8 Queens Problem

Place 8 queens on an 8x8 chessboard in such a way that they cannot check each other.

50

The 8 queens problem: representation

[Figure: a board configuration and the corresponding permutation of the numbers 1–8.]

Genotype:

a permutation of the numbers 1 - 8

Phenotype:

a board configuration

Obvious mapping

A non-optimal solution

51

The 8 Queens Problem: Fitness Evaluation

• Penalty of one queen: the number of queens she can check.

• Penalty of a configuration: the sum of the penalties of all queens.

• Note: penalty is to be minimized.

• Fitness of a configuration: inverse penalty, to be maximized.

52
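The penalty above can be computed directly; since the genotype is a permutation (one queen per column, one per row), only diagonal attacks are possible:

```python
def penalty(perm):
    """Total penalty of a board: for each queen, the number of queens it checks.
    perm[i] = row of the queen in column i; as a permutation, rows and
    columns never clash, so only diagonals are tested."""
    n = len(perm)
    p = 0
    for i in range(n):
        for j in range(n):
            if i != j and abs(perm[i] - perm[j]) == abs(i - j):
                p += 1          # queen i checks queen j along a diagonal
    return p

print(penalty([1, 5, 8, 6, 3, 7, 2, 4]))  # 0: a known 8-queens solution
print(penalty([1, 2, 3, 4, 5, 6, 7, 8]))  # 56: all queens on one diagonal
```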


The 8 queens problem: Mutation

Small variation in one permutation, e.g.:

• swapping the values of two randomly chosen positions.

[Figure: a permutation before and after swapping the values of two positions.]

53

The 8 queens problem: Crossover

Combining two permutations into two new permutations:

• choose a random crossover point

• copy first parts into children

• create second part by inserting values from the other parent:

– in the order they appear there

– beginning after the crossover point

– skipping values already in the child

Example (crossover point after position 3):

P1 = (1 3 5 | 2 4 6 7 8)  →  O1 = (1 3 5 4 2 8 7 6)
P2 = (8 7 6 | 4 2 5 3 1)  →  O2 = (8 7 6 2 4 1 3 5)

54
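The "cut-and-crossfill" steps above, sketched in Python; the cut point and parent permutations in the demo are illustrative:

```python
def cut_and_crossfill(p1, p2, cut):
    """Crossover for permutations, as described above: copy p1[:cut], then
    fill with p2's values in the order they appear starting after the cut
    (wrapping around), skipping values already present."""
    child = list(p1[:cut])
    for v in p2[cut:] + p2[:cut]:      # scan p2 from the cut point, wrapping
        if v not in child:
            child.append(v)
    return child

P1 = [1, 3, 5, 2, 4, 6, 7, 8]
P2 = [8, 7, 6, 4, 2, 5, 3, 1]
print(cut_and_crossfill(P1, P2, 3))   # [1, 3, 5, 4, 2, 8, 7, 6]
print(cut_and_crossfill(P2, P1, 3))   # [8, 7, 6, 2, 4, 1, 3, 5]
```

The skip step guarantees each offspring is again a permutation of 1–8.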

The 8 queens problem: Selection

• A peculiar parent selection procedure:

– Pick 5 parents and take the best two to undergo crossover.

• Survivor selection (replacement)

– When inserting a new child into the population, choose an existing member to replace by:

– sorting the whole population by decreasing fitness;

– enumerating this list from high to low;

– replacing the first member with a fitness lower than the given child.

55

8 Queens Problem: summary

Note that this is only one possible set of choices of operators and parameters.

56


Typical behaviour of a GA

Early phase:

quasi-random population distribution

Mid-phase:

population arranged around/on hills

Late phase:

population concentrated on high hills

Phases in optimising on a 1-dimensional fitness landscape

A GA enforces a topology on the search space, through its operators.

57

Typical run: progression of fitness

Typical run of an EA shows so-called “anytime behavior”

[Figure: best fitness in the population vs. time (number of generations) – steep early gains, then diminishing returns.]

58

[Figure: best fitness in the population vs. time, contrasting the large progress in the 1st half of the run with the small progress in the 2nd half.]

Are long runs beneficial?

• Answer:

– it depends how much you want the last bit of progress;

– it may be better to do more, shorter runs (“iterated”).

59

[Figure: best fitness in the population vs. time; F marks the fitness reached immediately by smart initialisation, T the time a randomly initialised run needs to reach level F.]

Is it worth expending effort on smart

initialisation?

Answer: possibly, if good, fast solutions/methods exist.

60


GAs as problem solvers: Goldberg’s 1989 view

[Figure: performance of methods over the scale of “all” problems – random search performs uniformly poorly; a method tailored (problem-adapted) for one/few problem(s) peaks sharply on those problems; an evolutionary algorithm performs moderately well across the whole scale.]

This was before the No-Free-Lunch Theorem (1996): it wrongly presumed unequal average behaviours.

61

Michalewicz’ 1996 view

[Figure: performance of methods over the scale of “all” problems – several problem-adapted EAs (EA 1 … EA 4), each peaking on a different region of problems around its target problem P.]

This is compatible with the No-Free-Lunch Theorem: it looks like equal average behaviours.

62

63

The Need for Theoretical Models

64

• Inspired from Nature, Genetic Algorithms have a behaviour of their own

– which may or may not help explain natural evolution

• Which laws describe the macroscopic behaviour of

Genetic Algorithms?

• How do low-level operators (selection, crossover,

mutation) result in the macroscopic behaviour of

Genetic Algorithms?

• What makes a problem suited for GAs to solve it?

• What makes a problem unsuited for GAs?

• Which performance criteria are relevant?

• See “Foundations of Genetic Algorithms” series

(started in 1991).


65

Existing Theoretical Models

• Schema Theorem:

– “The two-armed bandit problem” (Holland 1975);

– “Royal Roads” (Forrest, Holland, Mitchell, 1992)

• Exact mathematical models of simple GAs (Vose 1991, Goldberg 1987, Davis 1991, Horn 1993, Whitley 1993)

• Statistical-mechanics approaches (Prügel-Bennett 1994)

66

The Schema Approach

• Focuses on the genotype

– Relates to the phenotype only through fitness;

– Missing: homomorphism

• “Schema” (“schemata”) is a formalization

of the informal notion of “building blocks”.

• A schema is a set of bitstrings (hyperplane)

which is described by a template made up

of ones, zeroes and asterisks (wild cards).

• Example: m = 10.

– H1 = {(0011000111), (0011000110)}; S1 = (001100011*)

– H2 = {(0011000111), (0001000111), (0011001111), (0001001111)}; S2 = (00*100*111)

67

Schemata

• A schema with a wildcards describes a hyperplane containing 2^a strings.

• A string of length m is matched by (is represented by, is an instance of) 2^m schemata.

– m = 2: 01 is an instance of 01, 0*, *1, **.

• For all the 2^m strings of length m there exist exactly 3^m schemata.

• Between 2^m and pop_size·2^m of them may be represented in one generation.

• Homework: considering schemas as sets, when is the union / difference of two schemas a schema?

68

Characteristics of Schemata (1)

1. Defined bits of a given schema S: all non-wildcard positions of S.

2. Order of a schema S, o(S): the number of defined bits in S

• how specific is S?

• o(S2) = 8.

3. Defining length of a schema S, δ(S): the distance between its outermost defined bits.

• how compact is the information in the schema?

• δ(S2) = 10 − 1 = 9.

• if S has only one defined bit, then δ(S) = 0.


69

Characteristics of Schemata (2)

4. Average (“static”) fitness of a schema S in a Genetic Algorithm: the arithmetic mean of the evaluations of all instances stri of S:

eval(S) = (1/2^a)·Σi eval(stri)

• a – the number of wildcards in S.

5. Average fitness of a schema S in generation n of a specific Genetic Algorithm (dynamic fitness):

eval(S,n) = (1/k)·Σj eval(strj)

• k – the number of instances of S in the population P(n).

• If k = 0, then eval(S,n) = 0.

70

Schema Dynamics

• Notation: “x∈H” for “x is an instance of H”

• Let H be a schema with

– η(H,t) instances in P(t), generation t;

– eval(H,t) – the observed average fitness in generation t.

E(η(H,t+1)) = ?

• What is the expected number of instances of H in generation t+1, under the Schema Theorem hypothesis:

• the theorem deals with the standard Genetic Algorithm

– “fortune wheel” selection procedure

– one-point crossover

– standard mutation.

71

The Effect of Selection

• One chromosome x∈P(t) will have in P(t+1) an expected number of copies equal to f(x)/f̄(t)

– which comes from: pop_size · { f(x) / Σ(y∈P(t)) f(y) }

• (remember how the probability field for selection is built)

– Above, f(x) = eval(x); f̄(t) – the average fitness in generation t.

• Since eval(H,t) = Σ(x∈H) eval(x) / η(H,t), we get:

E(η(H,t+1)) = η(H,t) · eval(H,t) / f̄(t)

• The GA does not calculate eval(H,t) explicitly; however, this quantity “decides” the expected number of instances of H in subsequent generations.

• Schemas with higher dynamic fitness will have more individuals in the next generation.

72

The Effect of Crossover

• Effect: destroy or create instances of H.

• Consider the disruptive effect only → a lower bound for E(η(H,t+1)).

• Suppose an instance of H is a parent.

• A lower bound for the probability Sc(H) that H survives crossover:

Sc(H) ≥ 1 − pc·(δ(H)/(m−1))

• The probability of survival under crossover is higher for more compact (“shorter”) schemas.


73

The Effect of Mutation

• Again, consider the destructive effect only.

• The probability of survival under mutation:

Sm(H) = (1 − pm)^o(H) ≈ 1 − pm·o(H)

– the probability that no defined bit is mutated.

• The probability of survival under mutation is higher for lower-order schemas.

74

Schema Theorem• For the “standard GA” described in the hypothesis,

• This theorem summarizes the calculations above.

• It describes the (minimal) expected growth of a schema from one generation to the next.

• A consequence for schemas constantly above average:

– “Short,

– low-order,

– constantly above average schemas

– receive exponentially increasing numbers of instances in successive generations of a GA”

• because the (constant) geometric ratio is eval(H,t)/f̄(t).

E(η(H,t+1)) ≥ η(H,t) · (eval(H,t)/f̄(t)) · (1 − pc·δ(H)/(m−1)) · (1 − pm)^o(H)
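The lower bound of the Schema Theorem can be assembled from the three effects discussed above (selection, one-point crossover, bitwise mutation). A minimal numerical sketch, with toy parameter values not taken from the slides:

```python
def schema_bound(n_H, eval_H, f_avg, delta, order, m, pc, pm):
    """Lower bound on E[eta(H,t+1)] for the standard GA."""
    selection = eval_H / f_avg              # growth factor from selection
    crossover = 1 - pc * delta / (m - 1)    # survival bound, one-point crossover
    mutation = (1 - pm) ** order            # survival under bitwise mutation
    return n_H * selection * crossover * mutation

# A short, low-order, above-average schema keeps growing: bound > eta(H,t).
print(schema_bound(n_H=10, eval_H=1.2, f_avg=1.0,
                   delta=2, order=3, m=20, pc=0.7, pm=0.01))
```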

75

Operators as “Builders”

• The Schema Theorem gives a lower bound because it

neglects the “creativity” of operators.

• It is believed that crossover is a major source of GA power: it recombines instances of good, short schemas to create instances of larger schemas that are at least as good until, hopefully, the representation of the global optimum is obtained.

• “Building Block Hypothesis”: the supposition that this is how GAs work.

• Mutation provides diversity even when the population tends to converge

– if in the population a bit becomes 0 in all chromosomes, then only the mutation gives a chance to still try 1.

76

Implicit Parallelism

• In one generation, a Genetic Algorithm acts as if it:

– estimates the average fitnesses of all schemas which have

instances in that generation;

– increases / decreases the representation of these schemas in

the next generation accordingly,

– despite the fact that such calculations are not included in the

algorithm.

• Implicit evaluation of an exponentially large number

of schemas, using only pop_size chromosomes.

• This is implicit parallelism (Holland 1975).

• Different from inherent parallelism (GAs also lend

themselves to parallel implementations).


77

Adaptation Revisited

• Holland: an adaptive system should identify, test and

incorporate structural properties which are likely to

give better performance in some environment.

• Exploration vs. Exploitation:

– search for new, useful adaptations vs.

– use and propagation of adaptations.

• Neglecting one of them may lead to:

– overadaptation (inflexibility to novelty) → “stuck”

– underadaptation (few/no gained properties).

• A good GA should have (or find…) a proper balance

between them.

78

Schema Approach: Drawbacks

• Explains how convergence happens, if it happens.

• How does a non-represented schema appear in a future generation?

• Which new schemas are more likely to be discovered?

• How are the (not calculated) estimated schema average fitnesses implicitly used in the selection for survival ?

• What about situations where the observed fitness of a schema and its actual fitness are not correlated?

» Topology!!! (Route to optimum through neighbourhoods defined by operators)

• What about the speed of convergence?

• Schemas are fit for the likely building blocks of one-point crossover. Other structures deal better with other operators.

• One-step process only; unrealistic assumptions for longer-time predictions.

Further Topics in the

Implementation of

Genetic Algorithms

79

80

Design of Genetic Algorithms

• Huge number of design choices.

• Little theoretical guidance as to which ones are

better in specific cases.

• Bitstring representation, fitness proportionate

selection, simple operators may not be (and

definitely are not!) the choice in every

particular case.

• There are as many best choices for GAs as

there are GA projects.


81

When Should a GA Be Used?

• Many successes, but there are failures as well.

• A comparatively good GA is likely to exist if:

1. the search space is large (otherwise: exhaustive search);

2. the function to optimize

– is “non-standard” and / or noisy

– is multi-modal (otherwise: gradient-ascent)

– lacks any regularity properties / theoretical results

– requires prohibitive “classical” algorithms (complexity-wise)

3. a sufficiently good solution, not necessarily the global optimum, is required.

82

Data Structures (1)

Binary Encodings

• The representation of candidate solutions is believed to

be the central factor of success / failure of a GA.

• Widespread option: fixed length, fixed order bitstrings.

• Advantages:

– more schemas for the same amount of information

– better developed theory and parameter-setting techniques (e.g., operator rates).

• Disadvantages: unnatural, uneasy to use for many problems (e.g.: evolving rules; optimising weights for artificial neural networks).

83

Data Structures (2)

• Diploid encodings (recessive genes).

• Many-character and real-valued encodings are more

natural and easy to handle. For some problems, better

performance with larger encoding alphabets.

• Tree encodings led to Genetic Programming (Koza). The search space is open-ended: any size.

• Multi-dimensional encodings (e.g., for clustering).

• Non-homogenous encodings (e.g., instructions to build a solution –

timetable).

• Sometimes, decodings performed with problem-specific heuristics.

• Davis (1991): choose the natural encoding (“data structure”),

then devise the GA!

84

Evolution of Encoding

• If a less understood problem is fitted for

using GAs, how could one know the

“natural” encoding ahead of time?

• Technical example:

– The linkage problem: which are the important

loci for useful schemata? (in order to prevent

them from being disrupted by crossover)

• Why not let the encoding itself evolve?


85

Encoding-adapting Techniques

• Adaptation via length evolution (variable-

length chromosomes)

• Inversion (representation-adapting operator)

• Identifying crossover “hot spots” (where to

cut in order to obtain the best-fitted

offspring?)

• Messy Genetic Algorithms (incomplete /

contradictory representations).

86

Inversion (1)

• Holland (1975).

• An operator to handle the linkage problem

in fixed-length strings.

• It is a reordering operator inspired from real

genetics.

• The interpretation of an allele does not

depend on its position.

87

Inversion (2)

• Acts for order similarly to mutation for genes

• (00011010001) →

( (1,0) (2,0) (3,0) (4,1) (5,1) (6,0) (7,1) (8,0) (9,0) (10,0) (11,1) )

• If inversion points are randomly generated to be after

loci 3 and 7, then after inversion the chromosome

becomes:

( (1,0) (2,0) (3,0)|(7,1) (6,0) (5,1) (4,1)|(8,0) (9,0) (10,0) (11,1) )

• Schemas like (10*********01) can be preserved after

crossover, if successful inversion finds the building

block ( (1,1) (2,0) (13,1) (12,0) (11,*)…(3,*) )
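The reordering step can be sketched directly on position-tagged genes: since each gene carries its locus, reversing a segment changes the order without changing the interpretation. A minimal sketch reproducing the slide's example:

```python
def invert(chrom, i, j):
    """Reverse the genes in positions i..j-1 (0-based); loci travel
    with their alleles, so the chromosome's meaning is unchanged."""
    return chrom[:i] + chrom[i:j][::-1] + chrom[j:]

c = [(1,0), (2,0), (3,0), (4,1), (5,1), (6,0), (7,1),
     (8,0), (9,0), (10,0), (11,1)]
# Inversion points after loci 3 and 7 reverse positions 3..6:
print(invert(c, 3, 7))
```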

88

Inversion (3)

• Main drawback: crossing over two parents may result in ill-defined offspring.

• This is because permutations are involved.

• Solution: master/slave approach.

– one parent is chosen to be the master;

– the other one is temporarily reordered to the same permutation as the master.


89

Co-evolving Crossover “Hot Spots”

• Schaffer, Morishima: a dual approach to inversion

– which is: avoid crossover in certain places.

• Find places where crossover is allowed to occur.

• In each chromosome, the crossover points are

marked (say, with “!”):

( 1 0 0 1! 1 1 1! 1 ) ( 1 0 0 1! 0 0! 1! 0 )

( 0 0 0 0 0 0! 0 0 ) ( 0 0 0 0 1 1 0 1 )

• Mutation may change 0’s and 1’s, but also it may

erase a “!” or insert a new “!”.

• Evaluation does not take into consideration the !’s.90

Messy Genetic Algorithms

• Goldberg, Deb – 1989.

• Inspiration: (human) genome did not start with strings of length 5.9·10⁹, but rather from very simple structures.

• Representation:

– bits tagged with locus

– under-specification (schemata);

– over-specification (diploidy);

– {(1,0),(2,0),(4,1),(4,0)} → 00*1

– {(3,1),(3,0),(3,1),(4,0),(4,1),(3,1)} → **10

91

The Messy Genetic Algorithm (2)

• Evaluation:

– overspecified genes: left to right;

– underspecified genes:

• estimation of static fitness (randomly generated

chromosomes representing the scheme) OR

• template: given by a local optimum found before

running the GA

• Phases:

– primordial phase (exploration);

– juxtapositional phase (exploitation).

92

The Messy Genetic Algorithm (3)

• Primordial phase:

– guess k – the shortest order for useful schemas;

– enumerate all these schemas.

– for k=3, m=8, this enumeration is: {(1,0),(2,0),(3,0)}; {(1,0),(2,0),(3,1)};… {(1,1),(2,1),(3,1)}; {(1,0),(2,0),(4,0)}; {(1,0),(2,0),(4,1)}; … {(6,1),(7,1),(8,1)}.

– apply selection:

• copies in proportion to fitness;

• delete half of the population at regular intervals.


93

The Messy Genetic Algorithm (4)

• Juxtapositional phase:

– fixed population size;

– selection continues;

– two operators:

• cut : one messy chromosome is cut to give birth to two messy

chromosomes

• splice : two messy chromosomes are spliced into one.

• Test function: m=30; eval(chr) = Σ_{i=0..9} eval(b_{3i+1} b_{3i+2} b_{3i+3}), with:

• 000→28; 001→26; 010→22; 011→0;

100→14; 101→0; 110→0; 111→30.
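The 30-bit test function above (a deceptive subfunction summed over consecutive 3-bit blocks) can be sketched as:

```python
# Values of the 3-bit deceptive subfunction from the slide.
SUB = {'000': 28, '001': 26, '010': 22, '011': 0,
       '100': 14, '101': 0, '110': 0, '111': 30}

def eval_chr(chrom):
    """Sum the subfunction over consecutive 3-bit blocks of the string."""
    assert len(chrom) % 3 == 0
    return sum(SUB[chrom[i:i + 3]] for i in range(0, len(chrom), 3))

print(eval_chr('111' * 10))   # global optimum: 10 * 30 = 300
print(eval_chr('000' * 10))   # deceptive attractor: 10 * 28 = 280
```

The deceptiveness is visible in the table: 000 scores almost as well as the optimum 111, while every string one bit away from 111 scores 0.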

94

The Messy Genetic Algorithm (5)

• Problems:

– k must be guessed – no a priori knowledge;

– combinatorial explosion: the number of

schemas to be generated is 2^k · C(m,k)

– population size grows exponentially with k (is k

always small?)

• “probabilistically-complete initialisation”:

initial chromosomes of length between k

and m (implicit parallelism helps).

95

Sampling Mechanism

1. Fitness-proportionate with “fortune wheel”

- already discussed in “Implementation”

2. Stochastic Universal Sampling

3. Sigma Scaling

4. Elitism

5. Boltzmann Selection

6. Rank-based selection

7. Tournament Selection

8. Steady-State selection

In the sequel, Exp_Val(i,t) = pop_size · p_i,

the expected value of the number of copies of chromosome i in generation t,

where p_i, i = 1…pop_size, is the selection probability field in generation t.

96

Stochastic Universal Sampling (1)

• “Fortune Wheel” selection only occasionally

results in the expected number of copies

– relatively small populations

– there is a non-zero probability to have all

offspring allocated to the worst individual!

• Baker (1987): spin the roulette only once, not

pop_size times, but with pop_size equally

spaced pointers on it.

– “enforced” uniform distribution.

Page 25: GENETIC ALGORITHMSeugennc/teaching/nim/res/min2.pdf · •A representation (genotype) of candidate solutions (phenotype) •An evaluation (fitness) function •A selection scheme

25

97

Remember the probability field

and the cumulative probabilities

[Figure: the interval [0,1] partitioned by the cumulative probabilities 0 = q0 ≤ q1 ≤ q2 ≤ … ≤ q_{pop_size−1} ≤ q_{pop_size} = 1; chromosome i owns the subinterval (q_{i−1}, q_i].]

98

Stochastic Universal Sampling (2)

• t is fixed: the discussion below is for

generation t.

• Each individual i is guaranteed to be

selected, at generation t:

at least ⌊Exp_Val(i,t)⌋ times and

at most ⌈Exp_Val(i,t)⌉ times.

• Once the values Exp_Val(i,t) are calculated

for all i, the procedure on the next slide is

applied.

ptr = Rand();                       /* uniform in [0,1): the only spin */
sum = 0;
for (i = 1; i <= pop_size; i++)
    for (sum += Exp_Val(i,t); sum > ptr; ptr++)
        select chromosome i;

• The only random number generation happens at the start; afterwards ptr increases by 1 for each selection, which is won by the current chromosome i as long as ptr remains below the running sum of expected values.

– ptr increases by 1, not by 1/pop_size, because the expected values Exp_Val(i,t), which sum to pop_size, are accumulated rather than the probabilities p_i – see how the probability field for “fortune wheel” selection is built.

99

Stochastic Universal Sampling procedure
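The procedure can be made runnable as below (a sketch; the `rand` parameter is only there to make the single spin reproducible):

```python
import random

def sus(expected_values, rand=random.random):
    """Stochastic Universal Sampling: one spin, then equally spaced
    pointers over expected values assumed to sum to pop_size."""
    selected, cum, ptr = [], 0.0, rand()   # ptr uniform in [0, 1)
    for i, ev in enumerate(expected_values):
        cum += ev
        while cum > ptr:
            selected.append(i)
            ptr += 1.0
    return selected

# With Exp_Val = [2.2, 1.0, 0.5, 0.3], individual 0 always gets 2 or 3
# copies, whatever the spin - the floor/ceiling guarantee above.
print(sus([2.2, 1.0, 0.5, 0.3], rand=lambda: 0.5))   # [0, 0, 1, 2]
```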

• The rate of evolution depends on the variance of fitnesses in the population.

• Can this influence be smoothed?

– too large variances σ² lead to premature convergence

– small variances lead to a nearly random algorithm

• Forrest (1985): making variance less influential, by scaling.

• Calculation of the expected value Exp_Val in two steps, using the standard deviation σ(t) and an auxiliary variable E_V(i,t):

100

Sigma Selection


101

Sigma Selection (2)

E_V(i,t) = if (σ(t) ≠ 0)

then (1 + (f(i) − f_med(t))/(2σ(t))) else 1.0

Exp_Val(i,t) =

if (E_V(i,t)>=0) then E_V(i,t) else c1

(for example, c1 = 0.1).

• Expected number of copies in the next generation (Exp_Val) is scaled to the number of standard deviations from the mean.

• All chromosomes with fitness under −2σ from the mean are assigned a small Exp_Val, say 0.1.

– Of course, if all fitnesses are equal then the selection is akin to random search.

102
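The two-step sigma scaling rule above can be sketched as follows (population standard deviation is used here; a choice not fixed by the slides):

```python
from statistics import mean, pstdev

def sigma_scaled(fitnesses, c1=0.1):
    """Exp_Val via sigma scaling; negative values floored at c1."""
    f_med, sigma = mean(fitnesses), pstdev(fitnesses)
    ev = [1.0 if sigma == 0 else 1 + (f - f_med) / (2 * sigma)
          for f in fitnesses]
    return [e if e >= 0 else c1 for e in ev]

print(sigma_scaled([10, 12, 14, 16]))   # mild spread around 1.0
print(sigma_scaled([5, 5, 5, 5]))       # all equal -> all 1.0 (random drift)
```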

Elitism

• Ken DeJong (1975).

• A possible addition to any selection

mechanism.

• Retain the best k individuals at each

generation; all the others, compete for the

remaining pop_size - k positions in the next

generation.

• k is a parameter of the algorithm.

• Often, it significantly improves the GA

performance.

103

Boltzmann Selection

• Unlike what happens under Sigma scaling: often, it makes sense to vary the selection pressure during the run of a GA: different rates may be needed at different moments.

• Boltzmann selection: a continuously varying parameter, temperature T, controls the selection pressure.

• Starting temperature high → selection pressure low (exploration)

• Then temperature lowers → selection pressure increases (exploitation)

– Some analogy with Simulated Annealing.

• Typical implementation: for each individual i , the expected survival value in generation t is:

Exp_Val(i,t) = e^(f(i)/T) / «e^(f(i)/T)»_t ,

where

– f(i) is the fitness of chromosome i and

– «·»t means “average over the whole generation t”.

• As T decreases, differences in Exp_Val increase (exploitation).
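The temperature effect is easy to demonstrate with the formula above (a sketch with toy fitness values):

```python
from math import exp

def boltzmann_ev(fitnesses, T):
    """Exp_Val(i,t) = e^(f(i)/T) / average of e^(f/T) over the generation."""
    w = [exp(f / T) for f in fitnesses]
    avg = sum(w) / len(w)
    return [x / avg for x in w]

# High T: nearly uniform; low T: the best individual dominates.
print(boltzmann_ev([1.0, 2.0, 3.0], T=10.0))
print(boltzmann_ev([1.0, 2.0, 3.0], T=0.5))
```

By construction the expected values average to 1 over the generation, so lowering T redistributes copies toward the fitter individuals without changing the population size.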

104

Boltzmann Selection (2)


105

Rank Selection

• Purpose: to prevent premature convergence.

• Baker (1985):

– individuals ranked according to fitness

– expected values depend on rank, not on

fitnesses

• No fitness-scaling is necessary.

• Rank selection decreases the selection

pressure if the fitness variance is high; the

opposite happens if the fitness variance is

low. 106

Rank Selection: Another Fortune Wheel

• Ranking individuals (decreasing order of

fitness): 1, 2, …, pop_size;

choose q;

for each rank i, set the probability to select

chromosome i: prob(i) = q·(1−q)^(i−1).

• i=1 → best chromosome

• Example. pop_size=50, q=0.04:

prob(1)=0.04; prob(2)=0.0384; prob(3)=0.036864; etc.

Σ_{i=1}^{pop_size} prob(i) = Σ_{i=1}^{pop_size} q(1−q)^(i−1) = 1 − (1−q)^pop_size.
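The geometric probability field above, reproducing the slide's example (pop_size=50, q=0.04):

```python
def rank_probs(pop_size, q):
    """prob(i) = q(1-q)^(i-1), for ranks i = 1 (best) .. pop_size."""
    return [q * (1 - q) ** (i - 1) for i in range(1, pop_size + 1)]

p = rank_probs(50, 0.04)
print(p[:3])    # 0.04, 0.0384, 0.036864 (the slide's example)
print(sum(p))   # 1 - 0.96**50: the field sums to slightly below 1
```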

107

Tournament Selection

• Similar to rank selection in terms of selection pressure.

Two individuals are chosen at random from the

population;

A random number r ∈ [0,1) is generated;

if (r < k) then (the fitter of the two individuals

is selected) else (the less fit is selected);

The two are returned to the sampled population

• k is a parameter (e.g., k=0.9).

• Deb and Goldberg analysed this selection mechanism.
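The steps above translate directly into code (a sketch; the integer individuals and identity fitness are toy choices):

```python
import random

def tournament(population, fitness, k=0.9, rng=random):
    """Pick two at random; with probability k the fitter one wins,
    otherwise the less fit one is selected."""
    a, b = rng.sample(population, 2)
    fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return fitter if rng.random() < k else weaker

random.seed(1)
pop = list(range(10))                       # toy individuals, fitness = value
picks = [tournament(pop, lambda x: x) for _ in range(1000)]
print(sum(picks) / len(picks))              # well above the uniform mean 4.5
```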

108

Steady State Selection

• Generational GAs: new generation consists only of offspring.

• No, few or more parents may survive unchanged.

• Generational gap: the fraction of new individuals in the new generation (DeJong).

• Steady state selection: only a few individuals are replaced in each generation.

• Several of the least fit individuals are replaced by offspring of the fittest ones.

• Useful in evolving rule-based systems (classifier systems – Holland), as well as for problems where the proportion of admissible solutions is very low.

• Analysed by DeJong and Sarma.


109

Selection Mechanisms –

A Brief Comparative Discussion

• Fitness proportionate selection mechanism is used

traditionally (as Holland’s original proposal and because of its use in the schema theorem).

• Alternative selection mechanisms have been shown to improve convergence in many cases.

• Fitness-proportionate methods require two passes through each generation (one for mean fitness, one for expected values);

• Rank selection requires sorting the population – time consuming.

• Tournament selection is computationally more efficient and amenable to parallelisation.

110

Taxonomies for

Selection Mechanisms

• By dynamics of the field of probabilities:

– dynamic selections (expected value of the number of copies in the next generation for any chromosome, varies over generations: “fortune wheel”);

– static selections (fixed expected values: ranking).

• By survival probabilities:

– extinctive – survival probabilities may be zero

• left-extinctive – best chromosomes have survival probability 0;

• right-extinctive – worst chromosomes have survival probability 0.

– non-extinctive – all survival probabilities have non-zero values.

• Elitist / non-elitist

111

The Island Model

• The population is made of subsets of chromosomes.

• These subsets evolve separately

– selection and operators applied only inside each subset of the population

• At times, subsets exchange chromosomes with a certain probability.

• Advantage: more evolution histories in one run.