Building Blocks
CS 5764
Evolutionary Computation
Hod Lipson
Unifying ideas
• Knowledge represented as a population of
solutions containing building blocks
• Progress is driven by two key processes:
– Incremental progress (refinement), e.g. mutation, as in traditional optimization
– Recombination of solutions (e.g. crossover): discovering new areas of the search space, possibly initially inferior
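Terminology
• “Chromosome”: a complete encoded solution, e.g. 01010100111001010101010010110
• “Gene”: a segment of the chromosome (one or more positions)
• Allele: one of two or more alternative forms of a gene or a genetic locus (for bit strings, 0 or 1)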
A GA Schema
• A “template”
– a string of symbols taken from the alphabet
{0,1,*}
– 010*1, *110*, *****, 10101
• The character “*” means “don’t care”
– *10*1 represents 01001, 01011, 11001,
and 11011
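A minimal Python sketch of the template idea: `matches` tests whether a bit string is an instance of a schema, and `expand` lists every string a schema represents. Both function names are illustrative, not taken from the course materials.

```python
from itertools import product

def matches(schema: str, s: str) -> bool:
    """True if bit string s is an instance of schema ('*' matches either bit)."""
    return len(schema) == len(s) and all(h in ("*", b) for h, b in zip(schema, s))

def expand(schema: str) -> list[str]:
    """Every bit string the schema represents, in lexicographic order."""
    # each '*' can be 0 or 1; each fixed symbol allows only itself
    choices = [("0", "1") if h == "*" else (h,) for h in schema]
    return ["".join(bits) for bits in product(*choices)]

print(expand("*10*1"))
print(matches("*10*1", "11011"), matches("*10*1", "11111"))
```

For `*10*1` this reproduces the four strings listed above.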
Geometric Interpretation
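A schema is a hyperplane in the search space: the bit strings of length N are the corners of an N-dimensional hypercube, and a schema of order O is an (N-O)-dimensional sub-cube of it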
Order of a schema
• Number of specified (non-*) positions in a schema, i.e. its fixed alleles
• How many different strings of length N does a schema of order “O” represent?
– A schema of order O represents $2^{N-O}$ different strings of length N
Schema   Order   Represented strings
***      0       000 001 010 011 100 101 110 111
*1*      1       010 011 110 111
*10      2       010 110
101      3       101
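To check the $2^{N-O}$ count against the table, the sketch below computes a schema's order and counts its instances by brute force over all $2^N$ strings. The helper names `order` and `count_instances` are hypothetical.

```python
from itertools import product

def order(schema: str) -> int:
    """Order O: number of specified (non-'*') positions."""
    return sum(ch != "*" for ch in schema)

def count_instances(schema: str) -> int:
    """Brute-force count of length-N bit strings that match the schema."""
    n = len(schema)
    return sum(
        all(h in ("*", b) for h, b in zip(schema, bits))
        for bits in product("01", repeat=n)
    )

for h in ["***", "*1*", "*10", "101"]:
    n, o = len(h), order(h)
    assert count_instances(h) == 2 ** (n - o)  # the 2^(N-O) rule
    print(h, o, count_instances(h))
```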
Destructive Dynamics
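• Probability of surviving mutation, when each bit flips independently with probability $p_m$:
$S_m(H) = (1 - p_m)^{O(H)}$, where $O(H)$ is the order of schema H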
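The mutation survival probability can be checked numerically. This sketch assumes independent per-bit mutation with rate `pm`: a schema survives only if none of its O fixed bits flips, which gives $(1 - p_m)^{O}$, and a seeded Monte Carlo run should land close to that value. Function names, the example schema and the trial count are illustrative choices.

```python
import random

def order(schema: str) -> int:
    """Order O(H): number of specified (non-'*') positions."""
    return sum(ch != "*" for ch in schema)

def survival_mutation(schema: str, pm: float) -> float:
    """S_m(H) = (1 - p_m)^O(H): H survives only if none of its fixed bits flips."""
    return (1 - pm) ** order(schema)

def simulate(schema: str, pm: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate: mutate an instance of H and count how often it still matches H."""
    rng = random.Random(seed)
    x = schema.replace("*", "0")  # any instance of H works; '*' positions are irrelevant
    survived = 0
    for _ in range(trials):
        # flip each bit independently with probability pm
        y = "".join(("1" if b == "0" else "0") if rng.random() < pm else b for b in x)
        survived += all(h in ("*", c) for h, c in zip(schema, y))
    return survived / trials

print(survival_mutation("1*0*1", 0.1), simulate("1*0*1", 0.1))
```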
Defining Length
• “D” = the distance between the first and last specified (non-*) positions (D = 0 when fewer than two positions are specified)
Schemata            D
****  *1**          0
*10*  10**          1
1*1*                2
1*11  0**1  1001    3
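A direct translation of the definition, checked against every row of the defining-length table (`defining_length` is an illustrative name):

```python
def defining_length(schema: str) -> int:
    """D: distance between first and last specified (non-'*') positions; 0 if fewer than two."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

table = {"****": 0, "*1**": 0, "*10*": 1, "10**": 1,
         "1*1*": 2, "1*11": 3, "0**1": 3, "1001": 3}
for h, d in table.items():
    assert defining_length(h) == d
    print(h, defining_length(h))
```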
Why is the defining length important?
Destructive Dynamics
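• Probability of surviving single-point crossover (applied with probability $p_c$, cut point chosen uniformly among the N-1 positions between bits):
$S_c(H) \ge 1 - p_c \frac{D(H)}{N-1}$, where $D(H)$ is the defining length of H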
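Why the defining length matters for single-point crossover: with the cut placed between positions c-1 and c for c in 1..N-1, a schema can only be split when the cut falls between its outermost fixed positions, and there are exactly D such cuts out of N-1. This sketch counts those cuts and evaluates the bound $1 - p_c D/(N-1)$. It assumes a uniformly chosen cut point and treats every such cut as destructive, which is why the formula is only a lower bound on survival. Function names and example schemata are illustrative.

```python
def fixed_positions(schema: str) -> list[int]:
    return [i for i, ch in enumerate(schema) if ch != "*"]

def defining_length(schema: str) -> int:
    fixed = fixed_positions(schema)
    return fixed[-1] - fixed[0] if fixed else 0

def disruptive_cuts(schema: str) -> list[int]:
    """Cut points c in 1..N-1 (child = A[:c] + B[c:]) lying between H's outermost fixed positions."""
    fixed = fixed_positions(schema)
    if not fixed:
        return []
    return [c for c in range(1, len(schema)) if fixed[0] < c <= fixed[-1]]

def survival_crossover_bound(schema: str, pc: float) -> float:
    """Lower bound S_c(H) >= 1 - p_c * D(H) / (N - 1)."""
    return 1 - pc * defining_length(schema) / (len(schema) - 1)

for h in ["1***0*", "*10***", "1****1"]:
    # number of disruptive cuts equals the defining length D
    print(h, len(disruptive_cuts(h)), defining_length(h), survival_crossover_bound(h, pc=0.7))
```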
Strings containing schemata
• A bit string represented by a schema is
said to “contain” the schema
Bit string   Contained schemata
1            1  *
00           00  0*  *0  **
110          110  11*  1*0  1**  *10  *1*  **0  ***
1011         1011  101*  10*1  10**  1*11  1*1*  1**1  1***
             *011  *01*  *0*1  *0**  **11  **1*  ***1  ****
How many schemata does a string of length N include?
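The contained-schemata table can be generated mechanically: each position either keeps its bit or becomes `*`, so a string of length N contains $2^N$ schemata. A small sketch (illustrative function name):

```python
from itertools import product

def contained_schemata(s: str) -> list[str]:
    """Every schema that bit string s contains: each position keeps its bit or becomes '*'."""
    return ["".join(choice) for choice in product(*[(b, "*") for b in s])]

for s in ["1", "00", "110", "1011"]:
    hs = contained_schemata(s)
    print(s, len(hs), " ".join(hs))
```

The output lists the schemata in the same order as the table above.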
How many schemata in a population?
• There are $3^N$ different schemata (potential genes) of length N
• A population of P bit-strings, each of length N, contains between $2^N$ and $\min(P \cdot 2^N, 3^N)$ schemata
N      P      Number of schemata
3      100    8 - 27
6      20     64 - 729
20     50     1048576 - 52428800
40     100    $\approx 1.10 \times 10^{12}$ - $\approx 1.10 \times 10^{14}$
100    300    $\approx 1.27 \times 10^{30}$ - $\approx 3.80 \times 10^{32}$
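A sketch that counts the distinct schemata in a population and evaluates the bounds $2^N$ and $\min(P \cdot 2^N, 3^N)$; the random population and its seed are arbitrary illustration choices, and the function names are hypothetical.

```python
import random
from itertools import product

def contained_schemata(s: str) -> set[str]:
    """All 2^N schemata that bit string s contains."""
    return {"".join(c) for c in product(*[(b, "*") for b in s])}

def schemata_in_population(pop: list[str]) -> int:
    """Number of distinct schemata contained in at least one member of pop."""
    found: set[str] = set()
    for s in pop:
        found |= contained_schemata(s)
    return len(found)

def schema_count_bounds(n: int, p: int) -> tuple[int, int]:
    """Bounds from the slide: 2^N <= count <= min(P * 2^N, 3^N)."""
    return 2 ** n, min(p * 2 ** n, 3 ** n)

rng = random.Random(1)
n, p = 6, 20
pop = ["".join(rng.choice("01") for _ in range(n)) for _ in range(p)]
low, high = schema_count_bounds(n, p)
count = schemata_in_population(pop)
assert low <= count <= high
print(f"N={n}, P={p}: {count} distinct schemata, bounds {low} - {high}")

# bounds for every row of the table
for n, p in [(3, 100), (6, 20), (20, 50), (40, 100), (100, 300)]:
    low, high = schema_count_bounds(n, p)
    print(n, p, f"{low:.3g} - {high:.3g}")
```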
Estimating Fitness associated with a gene (N=3)
Population   f
101          5
100          1
010          2
110          3
Schema   Estimated f
***      (5+1+2+3) / 4 = 2.75
**0      (1+2+3) / 3 = 2
**1      5 / 1 = 5
*0*      (5+1) / 2 = 3
*00      1 / 1 = 1
*01      5 / 1 = 5
*1*      (2+3) / 2 = 2.5
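Estimation uncertainty: each estimate averages only the instances of the schema present in the population; its standard error (sample standard deviation divided by the square root of the number of instances) shrinks as more instances are sampled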
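The estimates in the fitness table are plain means over the matching population members. This sketch recomputes them and attaches a standard error (sample standard deviation divided by the square root of the number of instances). Reporting infinity when a schema has a single instance is an arbitrary convention of this sketch, chosen to flag that the spread cannot be estimated from one sample; the helper names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# population and fitness values from the table above
population = {"101": 5, "100": 1, "010": 2, "110": 3}

def matches(schema: str, s: str) -> bool:
    return all(h in ("*", b) for h, b in zip(schema, s))

def estimate(schema: str, pop: dict[str, float]):
    """Mean fitness of the members matching schema, and its standard error."""
    fs = [f for s, f in pop.items() if matches(schema, s)]
    if not fs:
        return None, None
    se = stdev(fs) / sqrt(len(fs)) if len(fs) > 1 else float("inf")
    return mean(fs), se

for h in ["***", "**0", "**1", "*0*", "*00", "*01", "*1*"]:
    u, se = estimate(h, population)
    print(f"{h}  u={u:.2f}  SE={se:.2f}")
```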
Observations
• If only fitness-proportionate selection is applied (no crossover or
mutation), schemata with above (below) average fitness are
sampled, generation after generation, by an increasing
(decreasing) number of chromosomes.
• Schemata with a long defining length have a higher probability of being disrupted by crossover
• Schemata with high order have a higher probability of being
disrupted by mutation
• Schemata with a low order and a short defining length are called
building blocks
• Because building blocks are processed with minimal disruption, GAs can combine building blocks of relatively high fitness into entire solutions
• GAs will be successful insofar as the
problem has been encoded in a way
that can be solved with compact
building blocks (low order, low defining
lengths)
• What is the easiest problem you can
think of?
Dynamics
• H is a schema present in the population at time t
• m(H,t) is the number of instances of H at time t
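• u(H,t) is the observed average fitness of the instances of H at time t
• Under fitness-proportionate selection, the expected number of offspring of x is $f(x)/f_{avg}(t)$, where $f_{avg}(t)$ is the average fitness of the population at time t
• If x is an instance of H, then summing over all instances of H gives (selection only)
$E[m(H,t+1)] = \sum_{x \in H} \frac{f(x)}{f_{avg}(t)} = m(H,t)\frac{u(H,t)}{f_{avg}(t)}$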
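Under fitness-proportionate selection alone, summing $f(x)/f_{avg}(t)$ over the instances of H gives $E[m(H,t+1)] = m(H,t)\,u(H,t)/f_{avg}(t)$. The sketch below checks that identity on the four-string population from the fitness-estimation example; the helper names are illustrative.

```python
def matches(schema: str, s: str) -> bool:
    return all(h in ("*", b) for h, b in zip(schema, s))

def expected_next_count(schema: str, pop: list[str], f: dict[str, float]) -> float:
    """E[m(H,t+1)] under fitness-proportionate selection alone (no crossover or mutation)."""
    f_avg = sum(f[x] for x in pop) / len(pop)            # f_avg(t)
    instances = [x for x in pop if matches(schema, x)]
    m = len(instances)                                    # m(H,t)
    if m == 0:
        return 0.0
    u = sum(f[x] for x in instances) / m                  # u(H,t)
    # summing each instance's expected offspring gives the same value as m * u / f_avg
    assert abs(sum(f[x] / f_avg for x in instances) - m * u / f_avg) < 1e-9
    return m * u / f_avg

pop = ["101", "100", "010", "110"]
fitness = {"101": 5, "100": 1, "010": 2, "110": 3}
for h in ["**1", "**0", "*1*"]:
    print(h, round(expected_next_count(h, pop, fitness), 3))
```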
Combining Effects
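Multiplying the selection term by the two survival probabilities gives one common textbook form of Holland's schema theorem: $E[m(H,t+1)] \ge m(H,t)\frac{u(H,t)}{f_{avg}(t)}\left[1 - p_c\frac{D(H)}{N-1}\right](1-p_m)^{O(H)}$. It is a lower bound because instances of H newly created by crossover or mutation are ignored. The sketch evaluates it for two hypothetical schemata of equal order and fitness; all parameter values are made up for illustration.

```python
def order(schema: str) -> int:
    return sum(ch != "*" for ch in schema)

def defining_length(schema: str) -> int:
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def schema_theorem_bound(m: float, u: float, f_avg: float, schema: str,
                         pc: float, pm: float) -> float:
    """Lower bound on E[m(H,t+1)]: selection x crossover survival x mutation survival."""
    n = len(schema)
    selection = m * u / f_avg
    crossover = 1 - pc * defining_length(schema) / (n - 1)
    mutation = (1 - pm) ** order(schema)
    return selection * crossover * mutation

# Two hypothetical schemata: same order (2), count and fitness, different defining length
compact, spread = "11********", "1********1"
for h in (compact, spread):
    print(h, round(schema_theorem_bound(m=10, u=1.2, f_avg=1.0, schema=h, pc=0.7, pm=0.01), 2))
```

With these numbers the compact schema is expected to gain instances while the spread-out one loses them, which is the sense in which a large defining length means poor linkage.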
[Figure: best fitness vs. generation / evaluations (x100) for Random Search, Parallel Hillclimber, Parallel Simulated Annealing, GA (Roulette, Tight Linkage), GA (Diversity, Poor Linkage) and GA (Diversity, Tight Linkage)]
Large defining length and small order = poor linkage
The Building Block Hypothesis
• GAs perform adaptation by identifying and recombining “building blocks”, i.e. low-order, short-defining-length schemata with above-average fitness.
• GAs perform adaptation by implicitly
and efficiently implementing this
heuristic.
Caveats
• Model assumes particular form of
representation:
– Bit strings, single point crossover, mutation
• Assumes fitness-proportionate selection
• Assumes fixed fitness criterion
• Assumes fixed population size
Many variations of this model have been published