the$rubik$cube$and$gp$temporal$ …mheywood/x-files/publications/... · 2010-06-18 · away from...

27
The Rubik Cube and GP Temporal Sequence Learning: An ini<al study Peter Lichodzijewski and Malcolm Heywood Dalhousie University, Computer Science

Upload: others

Post on 19-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

The  Rubik  Cube  and  GP  Temporal  Sequence  Learning:  An  ini<al  study  

Peter  Lichodzijewski  and  Malcolm  Heywood    

Dalhousie  University,  Computer  Science  

Page 2: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Serial  Primary  Endosymbiosis  Kutschera  (2009)  

organelles was based on several lines of evidence—data

from the literature, and novel microscopic observations bythe author (Mereschkowsky 1905). In his preface to a

subsequent article (signed 11 January 1909), which was

published in the following year, Mereschkowsky (1910)wrote that his intention was to introduce a new theory on

the origin of the organisms on Earth. He explicitly pointed

out that the attempts of Charles Darwin, Ernst Haeckel andothers to solve this problem have been unsuccessful,

because not all pertinent facts were available when thesenaturalists published their books (Darwin 1859, 1872;

Haeckel 1877). He argued that so many novel findings in

the fields of cytology, biochemistry and physiology haveaccumulated, notably with respect to lower (unicellular)

organisms since Darwin’s era, the time has come to pro-

pose a new theory.Based on the principle of symbiosis (i.e., the union of

two different organisms whereby both partners mutually

benefit), Mereschkowsky (1910) coined the term symbio-genesis theory. The principle of symbiogenesis, as envi-

sioned by Mereschkowsky, is based on an analogy between

phagocytosis of extant unicellular eukaryotes such asamoebae (Fig. 1a) and hypothetical processes that may

have occurred millions of years ago in the oceans of the

young Earth. Mereschkowsky’s symbiogenesis hypothesisexplained the origin of the chloroplasts from ancient cya-

nobacteria and hence gave insight into the first steps in the

evolution of the plant kingdom (Sapp et al. 2002).Six years after Mereschkowsky’s death, the Russian

cytologist Ivan E. Wallin (1883–1969) proposed that

the mitochondria of eukaryotic cells are descendants ofancient, once free-living bacteria (Wallin 1927). In

addition, he suggested that the primary source of genetic

novelty for speciation events was a periodic, repeatedfusion of bacterial endosymbionts with eukaryotic host

cells. However, this second hypothesis of Wallin is not

supported by empirical data (Kutschera and Niklas 2005).

Evolutionary origin of multicellular organisms

Endosymbiotic events that occurred ca. 2,200–1,500 mil-

lion years ago (mya) in the oceans gave rise to the firsteukaryotic cells (Fig. 1b). Today, they are described within

the framework of the serial primary endosymbiosis theory

for cell evolution, which is supported by a solid body ofempirical data (see Margulis 1993, 1996 for classic reviews

and Kutschera and Niklas 2004, 2005 for more recent

accounts).The capture of an ancient a-proteobacterium by a

nucleus-containing (eukaryotic) host cell that resembled

extant amitochondriate protists occurred probably onlyonce in evolution. After a subsequent intracellular

domestication process, the once free-living a-proteobac-terium was reduced to an organelle for the production

and export of energy-rich adenosine triphosphate (ATP).In a subsequent primary endosymbiotic event an ancient

cyanobacterium was engulfed, domesticated and finally

enslaved to become a photosynthetic, green organelle(chloroplast). These acquisitions of an a-proteobacterialand a cyanobacterial endosymbiont (i.e., the ancestral

mitochondrion and chloroplast, respectively), organellesthat multiply in the cytoplasm by binary fission and are

inherited via the egg cell, were momentous events in the

history of life on Earth. Molecular phylogenetic analysesof plastid and nuclear genes led to the conclusion that

only a single, ancient primary plastid endosymbiosis

had persisted (Cavalier-Smith 2000; Reyes-Prieto et al.2007).

Fig. 1 Phagocytosis and primary serial endosymbiosis. In phagocy-tosis, which is a kind of cellular eating, an eukaryotic cell such as anamoeba engulfs prey organisms (bacteria), packages it in a foodvacuole and digests it (a). Diagrammatic rendering of the symbio-genetic origin of mitochondria and chloroplasts during the proterozoic(ca. 2,200–1,500 million years ago, see Fig. 7) according to the serialprimary endosymbiosis theory (b). A free-living a-proteobacterialancestor (Bacterium) is engulfed by an eukaryotic host cell withnucleus (N) and reduced (domesticated) to an organelle (mitochon-drion, Mito). Millions of years later, this heterotrophic cell engulfedand domesticated an ancient cyanobacterium, which evolved into achloroplast (Chloro). The phylogeny of the multicellular Fungi,Animalia and Plantae via unicellular ancestors (Protozoa, Chloro-phyta) is indicated. Mitochondria and chloroplasts multiply in thecytoplasm of their host cell and are inherited from the parent cell

192 Theory Biosci. (2009) 128:191–203

123

May  2010   2  GPTP  2010  

Page 3: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Abstract  Model  of  Symbiosis:  Maynard  Smith  (1991)  

Ecological

Coexistence

A

B

C

Subsets

of individuals

coexist

`Compartmentalization’

of the subsets

Synchronized

replication

Increasing complexity

May  2010   3  GPTP  2010  

Page 4: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Mul<-­‐Popula<on/  Coevolu<on  –  GP:  Lichodzijewski  and  Heywood  (2008)  

Point

Population

Host

Population

Host

Population

Symbiont

Population

GA!GP interaction

Symbiotic Coevolution

Competitive Coevolution GA!GA interaction

Host    Compe<<on  

(fitness  sharing)  

Bid-­‐based  GP  (context)  

May  2010   4  GPTP  2010  

Symbiont  Coopera<on  

Page 5: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Achieving  Context  –  Bid-­‐based  GP:  Lichodzijewski  and  Heywood  (2007)  

Ac<on   Bid  

Scalar   Program  

Sta<c   Evolved  

Instruc<on  Set  

Domain  Dependent  

Phenotypic  responses  

Domain  knowledge  

May  2010   5  GPTP  2010  

Page 6: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

May  2010   GPTP  2010   6  

6 GENETIC PROGRAMMING THEORY AND PRACTICE VIII

Algorithm 1 The core SBB training algorithm. Pt, H

t, and St refer to the

point, host, and symbiont populations at time t.1: procedure Train2: t = 0 � Initialization3: initialize point population P

t

4: initialize host population Ht (and symbiont population L

t)5: while t ≤ tmax do � Main loop6: create new points and add to P

t

7: create new hosts and add to Ht (add new symbionts to S

t)8: for all hi ∈ H

t do9: for all pk ∈ P

t do10: evaluate hi on pk

11: end for12: end for13: remove points from P

t

14: remove hosts from Ht (remove symbionts from S

t)15: t = t + 116: end while17: end procedure

suming a breeder style mode of replacement in which the worst Pgap pointsare removed (Step 13) and a corresponding number of new points are intro-duced (Step 6) at each generation. Parent points are therefore selected withfrequency pgenp under fitness proportional selection, resulting in a child pointdefined through genewise mutation relative to the parent. Discussion of thepoint population variation operators is necessarily application dependent, andis therefore discussed later in terms of the truck backer-upper problem domain(Section 3).

The evaluation function of Step 10 assumes the application of a real-valuedreward function that is a function of the interaction between point (pk) and host(hi) individuals, or G(hi, pk). This is also necessarily application dependentand is therefore defined later (Equation (1.6), Section 3) as a function of theideal target state for the truck backer-upper problem domain.

Initial or ‘raw’ point fitness, fk, may now be defined relative to the count ofhosts, ck, within a neighbourhood (?), or

fk =�

1 + 1−ckHsize

if ck > 00 otherwise

(1.1)

where Hsize is the host population size, and the count ck is the relative to thearithmetic mean of outcome µk on point pk or,

Page 7: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Asexual  Varia<on  Operators  Cannot  assume  genotypic  context  

May  2010   GPTP  2010   7  

Step  0:  Copy  parent  host  

Step  1:  Remove  index(es)  

Step  2:  Add  index(es)  

Step  3:  Create  new  symbiont(s)  

Page 8: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Hidden  State  Truck  Backer-­‐upper:  Lichodzijewski  and  Heywood  (2009)  

Wall  obstacle    

Out  of  b

ound

s  

(0,0)  

Y-­‐axis  

X-­‐axis   trailer  

Start  configura<on  Defined  by  point  

popula<on  

<x,y,θ(cab),θ(trailer)>   GP  <θ(steering>  

Goal  

May  2010   8  GPTP  2010  

Page 9: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Hidden  State  Truck  Backer-­‐upper:  Case  for  layered  learning  

Impact  of  specia<on  Across  teaming  popula<on  

May  2010   9  

Page 10: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Case  for  layered  Learning:  Layer  (i  –  1)  as  ac<ons  for  layer(i)  

May  2010   10  GPTP  2010  

Page 11: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Truck  backer-­‐upper:  Layered  SBB  Lichodzijewski  and  Heywood  (2010)  

May  2010   11  

!

!

!!

SBB 0 SBB 0 pop SBB 1 SBB 1 pop SBB!2k SBB!2k pop NEAT NEAT pop

0

200

400

600

800

1000

num

ber

of te

st cases s

olv

ed

Page 12: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Truck  backer-­‐upper:  Layered  SBB  

May  2010   GPTP  2010   12  

4 51 2 3

0

1 x -30!1 x -30! 3 x 30!4 x 30!1 x -30! 3 x 30!5 x 30!

2 x -30!6 x 30!1 x -30!

1 x 30!2 x -30!

level 1

level 0

32 2

Page 13: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Rubik  Cube:  Base  Representa<on  (54  cell  vector)  

May  2010   GPTP  2010   13  

36  

37  

38  

39  

40  

41  

42  

43  

44  

0  

1  

2  

3  

4  

5  

6  

7  

8  

9  

10  

11  

12  

13  

14  

15  

16  

17  

27  

28  

29  

30  

31  

32  

33  

34  

35  

18  

19  

20  

21  

22  

23  

24  

25  

26  

45  

46  

47  

48  

49  

50  

51  

52  

53  

Cell  ID  

Center  facelet  of  each  3  ×  3  face  is  ‘fixed’  

Page 14: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Rubik  Cube:  Ac<on  Set  (12  ac<ons:  6  clk-­‐wise,  6  counter  clk-­‐wise)  

May  2010   GPTP  2010   14  

Page 15: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Reward  Func<ons:  Host  

Host  

•  Compe<<ve  Fitness  Sharing:  

•  Reward  func<on:  

•  Distance  func<on:  D(s2,  s1)  

May  2010   GPTP  2010   15  

8 GENETIC PROGRAMMING THEORY AND PRACTICE VIII

Thus, for point pk the shared fitness score si re-weights the reward that host hi

receives on pk relative to the reward on the same point as received by all hosts.Once the shared score for each host is calculated, the Hgap lowest ranked hostsare removed. Any symbionts that are no longer indexed by hosts are consid-ered ineffective and are therefore also deleted. Thus, the symbiont populationsize may dynamically vary, with variation operators having the capacity to addadditional symbionts (?), whereas the point and host populations are of a fixedsize.

3. SBB Representation and Action setAs introduced in Section 2, we adopt the SBB framework of evolution (?). A

linear GP representation is assumed with instruction set for the Bid-Based GPindividuals consisting of the following generic set of operators {+,−,×,÷, ln(·), cos(·), exp(·), if}.The conditional operator ‘if ’ applies an inequality operator to two registers andinterchanges the sign of the first register if it’s value is smaller than the second.There are always 8 registers and a maximum of 24 instructions per symbiont.

The reward function applies a simple weighting scheme to the number ofactions (i.e., any of 12 quarter turn twists) necessary to move the final cubestate to a solved cube. Naturally, such a test becomes increasingly expensive asthe number of moves applied in the ‘search’ about the final cube state increases.Hence, the search is limited to testing for up to 2 moves away from the solution,resulting in the following evaluation function,

G(hi, pk) =1

(1 + D(sf , s∗))2(1.6)

where sf is the final state of the cube relative to cube configuration pk andsequence of moves defined by host hi; s

∗ is the ideal solved cube configuration,and; D(s2, s1) defines the ‘raw’ reward based on a search of up to two movesaway from the host’s final state, or

D(s2, s1) =

0, when 0 quarter twists match state s2 with s1

1, when 1 quarter twists match state s2 with s1

4, when 2 quarter twists match state s2 with s1

16, when > 2 quarter twists match state s2 with s1

(1.7)

The representation assumed directly indexes all 54 facelets comprising the3 × 3 Rubik cube. Indexing is sequential, beginning at the centre face withcubie colours differentiated in terms of integers over the interval [0, ..., 5]. Sucha scheme is simplistic with no explicit support for indicating which faceletsare explicitly connected to make corner or edges. Actions in layer 0 definea 90 degree clock-wise or counter clock-wise twists to each face; there are 6faces resulting in a total of 12 actions. When additional layers are added under

8 GENETIC PROGRAMMING THEORY AND PRACTICE VIII

Thus, for point pk the shared fitness score si re-weights the reward that host hi

receives on pk relative to the reward on the same point as received by all hosts.Once the shared score for each host is calculated, the Hgap lowest ranked hostsare removed. Any symbionts that are no longer indexed by hosts are consid-ered ineffective and are therefore also deleted. Thus, the symbiont populationsize may dynamically vary, with variation operators having the capacity to addadditional symbionts (?), whereas the point and host populations are of a fixedsize.

3. SBB Representation and Action setAs introduced in Section 2, we adopt the SBB framework of evolution (?). A

linear GP representation is assumed with instruction set for the Bid-Based GPindividuals consisting of the following generic set of operators {+,−,×,÷, ln(·), cos(·), exp(·), if}.The conditional operator ‘if ’ applies an inequality operator to two registers andinterchanges the sign of the first register if it’s value is smaller than the second.There are always 8 registers and a maximum of 24 instructions per symbiont.

The reward function applies a simple weighting scheme to the number ofactions (i.e., any of 12 quarter turn twists) necessary to move the final cubestate to a solved cube. Naturally, such a test becomes increasingly expensive asthe number of moves applied in the ‘search’ about the final cube state increases.Hence, the search is limited to testing for up to 2 moves away from the solution,resulting in the following evaluation function,

G(hi, pk) =1

(1 + D(sf , s∗))2(1.6)

where sf is the final state of the cube relative to cube configuration pk andsequence of moves defined by host hi; s

∗ is the ideal solved cube configuration,and; D(s2, s1) defines the ‘raw’ reward based on a search of up to two movesaway from the host’s final state, or

D(s2, s1) =

0, when 0 quarter twists match state s2 with s1

1, when 1 quarter twists match state s2 with s1

4, when 2 quarter twists match state s2 with s1

16, when > 2 quarter twists match state s2 with s1

(1.7)

The representation assumed directly indexes all 54 facelets comprising the3 × 3 Rubik cube. Indexing is sequential, beginning at the centre face withcubie colours differentiated in terms of integers over the interval [0, ..., 5]. Sucha scheme is simplistic with no explicit support for indicating which faceletsare explicitly connected to make corner or edges. Actions in layer 0 definea 90 degree clock-wise or counter clock-wise twists to each face; there are 6faces resulting in a total of 12 actions. When additional layers are added under

The Rubik cube and GP Temporal Sequence learning 7

µk =�

miG(hi, pk)Hsize

(1.2)

where µ → 0 implies that hosts are failing on point pk and ck is set to zero.Otherwise, ck is defined by the number of hosts satisfying G(hi, pk) ≥ µk;that is the number of hosts with an outcome reaching the mean performanceon point pk.

Equation (1.1) establishes the raw fitness of a point. However, unlike classi-fication problem domains, points frequently have context under reinforcementlearning domains i.e., a geometric interpretation. Specifically, each point isrewarded in proportion to the distance from the point to a subset of its nearestneighbours using ideas from outlier detection (Harmeling et al., 2006). To doso, all the points are first normalized by the maximum pair-wise Euclidean dis-tance – as estimated across the point population content, therefore limiting rk

to the unit interval – after which the following reward scheme is adopted:

1 The set of K points nearest to pk is identified;

2 A point reward factor rk is calculated as,

rk =

��pl

(D(pk, pl))2

K

�2

(1.3)

where the summation is taken over the set of K points nearest to pk andD(·, ·) is the application specific ‘raw’ reward function (Equation (1.7),Section 3).

3 The reward for point pk is then used to normalize its raw fitness value or

f�k = fk · rk (1.4)

With the normalized fitness f�k established we can now delete the worst per-

forming Pgap points (Step 13).

Host and Symbiont Population

Hosts are also subject to the removal and addition of a fixed number of Hgap

individuals per generation, Steps 14 and 7 respectively. However, in order toalso promote diversity in the host population behaviours, we assume a fitnesssharing formulation. Thus, shared fitness, si of host mi has the form,

si =�

pk

�G(hi, pk)�hj

G(hj , pk)

�3

(1.5)

Page 16: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Reward  Func<ons:  Point  

•  ‘Raw’  fitness  (fk):   •  Point  Reward  factor:  

•  Resul<ng  Point  fitness:  

May  2010   GPTP  2010   16  

The Rubik cube and GP Temporal Sequence learning 7

µk =�

miG(hi, pk)Hsize

(1.2)

where µ → 0 implies that hosts are failing on point pk and ck is set to zero.Otherwise, ck is defined by the number of hosts satisfying G(hi, pk) ≥ µk;that is the number of hosts with an outcome reaching the mean performanceon point pk.

Equation (1.1) establishes the raw fitness of a point. However, unlike classi-fication problem domains, points frequently have context under reinforcementlearning domains i.e., a geometric interpretation. Specifically, each point isrewarded in proportion to the distance from the point to a subset of its nearestneighbours using ideas from outlier detection (Harmeling et al., 2006). To doso, all the points are first normalized by the maximum pair-wise Euclidean dis-tance – as estimated across the point population content, therefore limiting rk

to the unit interval – after which the following reward scheme is adopted:

1 The set of K points nearest to pk is identified;

2 A point reward factor rk is calculated as,

rk =

��pl

(D(pk, pl))2

K

�2

(1.3)

where the summation is taken over the set of K points nearest to pk andD(·, ·) is the application specific ‘raw’ reward function (Equation (1.7),Section 3).

3 The reward for point pk is then used to normalize its raw fitness value or

f�k = fk · rk (1.4)

With the normalized fitness f�k established we can now delete the worst per-

forming Pgap points (Step 13).

Host and Symbiont Population

Hosts are also subject to the removal and addition of a fixed number of Hgap

individuals per generation, Steps 14 and 7 respectively. However, in order toalso promote diversity in the host population behaviours, we assume a fitnesssharing formulation. Thus, shared fitness, si of host mi has the form,

si =�

pk

�G(hi, pk)�hj

G(hj , pk)

�3

(1.5)

The Rubik cube and GP Temporal Sequence learning 7

µk =�

miG(hi, pk)Hsize

(1.2)

where µ → 0 implies that hosts are failing on point pk and ck is set to zero.Otherwise, ck is defined by the number of hosts satisfying G(hi, pk) ≥ µk;that is the number of hosts with an outcome reaching the mean performanceon point pk.

Equation (1.1) establishes the raw fitness of a point. However, unlike classi-fication problem domains, points frequently have context under reinforcementlearning domains i.e., a geometric interpretation. Specifically, each point isrewarded in proportion to the distance from the point to a subset of its nearestneighbours using ideas from outlier detection (Harmeling et al., 2006). To doso, all the points are first normalized by the maximum pair-wise Euclidean dis-tance – as estimated across the point population content, therefore limiting rk

to the unit interval – after which the following reward scheme is adopted:

1 The set of K points nearest to pk is identified;

2 A point reward factor rk is calculated as,

rk =

��pl

(D(pk, pl))2

K

�2

(1.3)

where the summation is taken over the set of K points nearest to pk andD(·, ·) is the application specific ‘raw’ reward function (Equation (1.7),Section 3).

3 The reward for point pk is then used to normalize its raw fitness value or

f�k = fk · rk (1.4)

With the normalized fitness f�k established we can now delete the worst per-

forming Pgap points (Step 13).

Host and Symbiont Population

Hosts are also subject to the removal and addition of a fixed number of Hgap

individuals per generation, Steps 14 and 7 respectively. However, in order toalso promote diversity in the host population behaviours, we assume a fitnesssharing formulation. Thus, shared fitness, si of host mi has the form,

si =�

pk

�G(hi, pk)�hj

G(hj , pk)

�3

(1.5)

14

! !! "#$%&

'()*+$,-&##

./

!

0

0

Figure 3.5: Shape of the raw point fitness function. Since ck may assume only values

over {0, 1, . . . ,Msize}, the function is defined over this set only, but is drawn here over

the entire range [0, Msize]. The function is maximized when ck is one, and minimized

when ck is zero.

determined as the number of teams mi such that G(mi, pk) ≥ µk, i.e., the number of

teams with outcome on pk greater than or equal to the mean. Having established ck,

the fitness of the point is calculated as in Eq. 3.2.

This fitness function embodies several properties that are viewed as desirable:

1. A point will receive a higher raw fitness if it identifies regions of the problem

space where the team behaviour is productive but does not overlap. If the

point population is evolved to contain points with a high raw fitness, this cor-

responds to a team population where individual behaviours do not overlap, i.e.,

decompose the problem space, right side of Figure 1.1.

2. A point will receive a low fitness if many teams do well on it and a few teams

do poorly on it. If the point is removed, the fitness of the teams that do well

on it will be adversely affected. However, since there are a lot of these teams,

it is less likely that all of them will be removed (compared to having just a few

of these teams), so the knowledge related to doing well on the point is likely to

remain in the team population.

3. A point will receive a high fitness if most teams do poorly on it and few teams

do well on it. The point is therefore more likely to persist in the population,

supporting the few teams that do well on it, and helping to sustain the associated

Hsize  ck  

fk  

All  hosts  Fail  on  point  pk  

One  host  solves  point  pk  

Global  reward  func<on   Local  Reward  func<on  

Page 17: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Evalua<on  Methodology  

•  Layered  SBB  –  2  layers  – Max  Genera<ons  

•  1,000  –  `Default`  Parameters  

•  Max  24  symbiont/  host  

•  Max  24  Instruc<ons  

•  Point  /  Host  Pop  Size:  –  120  

•  Single  Layer  SBB  – Max  Genera<ons:  2000  

•  `Big  Prog`  –  Instruc<on  limit:  36  

•  `Big  Team`  –  Sym./  host  Limit:  36  

May  2010   GPTP  2010   17  

Page 18: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Post  Training  Test  Cases:  Test  Set  A  Sampled  

May  2010   18  

1 2 3 4 5 6 7 8 9 10

0

200

400

600

800

number of twists

num

ber

of te

st cases

9

86

403

527

588

662640

728

673 684

Page 19: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Test  Performance:  Test  Set  A    Single  Champion  

May  2010   19  

!

!

SBB 0 SBB 1 SBB big team SBB big prog

250

300

350

400

450

500nu

mbe

r of t

est c

ases

sol

ved

(out

of 5

000)

Page 20: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Test  Performance:  Test  Set  A    Popula<on  wide  

May  2010   20  

!

!

SBB 0 SBB 1 SBB big team SBB big prog

800

900

1000

1100

1200

1300

1400

num

ber o

f tes

t cas

es s

olve

d (o

ut o

f 500

0)

Page 21: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Test  Performance:  Test  Set  B  (Complete  enumera<on)  

Number  of  Twists  

Number  of  unique  cube  configura6ons  

1   12  2   114  3   1  068  

May  2010   GPTP  2010   21  

Page 22: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Test  Performance:  Test  Set  B    

May  2010   22  

!

!!!!!!!

!

!!

!

SBB 0 SBB 1 SBB big team SBB big prog

0.2

0.4

0.6

0.8

1.0

!!

!!

!!!

!

1 tw

ist

2 tw

ist

3 tw

ist

prop

ortio

n of

cas

es s

olve

d

Page 23: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Solu<on  Complexity:    Symbiont  Count  

May  2010   23  

!

!

SBB 0 SBB 1 SBB big team SBB big prog

0

5

10

15

20

num

ber o

f tea

m m

embe

rs

Page 24: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Solu<on  Complexity:    Instruc<on  Count  

May  2010   24  

!

SBB 0 SBB 1 SBB big team SBB big prog

0

20

40

60

80

100

num

ber o

f effe

ctive

inst

ruct

ions

Page 25: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Discussion  (1)  

•  Rubik  Cube  domain  –  ‘Decep<ve’  fitness  func<on  –  Face  independent  representa<on  –  Layers  poten<ally  solve  for  different    

•  Cube  faces  •  Objec<ves  

•  Truck  backer-­‐upper  –  Informa<ve  fitness  func<on  –  Point  to  Host  interac<on  

•  Random  sampling  sufficient    

May  2010   GPTP  2010   25  

Page 26: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Discussion  (2)  •  SBB  implies  

–  Hosts:    •  Combinatorial  search  for  relevant  symbionts  •  Enforce  Inter  host  compe<<on:  fitness  sharing  

–  Symbionts:  •  No  direct  fitness  evalua<on  •  Popula<on  size  `floats`  •  Bid-­‐based  GP  

–  Context  and  communica<on  

–  Points:  •  Diversity  of  training  scenarios  

–  Major  source  of  evolu<onary  innova<on  •  Point  fitness  biases  replacement  

•  Layered  Learning  –  Hosts.layer(m  –  1)    Symbiont  ac<ons.layer(m)  

May  2010   26  GPTP  2010  

Page 27: The$Rubik$Cube$and$GP$Temporal$ …mheywood/X-files/Publications/... · 2010-06-18 · away from the host’s final state, or D(s2,s1)= 0, when 0 quarter twists match state s2 with

Acknowledgements  

•  NSERC  •  MITACS  

•  Killam  Scholarship  program  

May  2010   GPTP  2010   27