coevolutionary gradient algorithms and their application...
TRANSCRIPT
Coevolutionary Gradient Algorithms and their Application to Othello
Marcin Szubert, Krzysztof Krawiec
Institute of Computing Science, Poznan University of Technology
May 17, 2011
Previous Research Learner Architecture Coevolutionary Gradient Algorithms Experiments Summary
Outline
1 Starting Point – Previous Research
  Coevolution and Reinforcement Learning
  Coevolutionary Temporal Difference Learning
2 Flexible Learner Architecture
  Topology and Weight Evolving ANNs
  N-tuple Networks
3 Coevolutionary Gradient Algorithms
4 Experimental Results
5 Summary
Coevolutionary Gradient Algorithms and their Application to Othello 2 / 40 M. Szubert, K.Krawiec
Coevolutionary Algorithms
Coevolutionary Algorithms
Bio-inspired methods that attempt to harness the Darwinian notions of heredity and survival of the fittest but, in contrast to traditional evolutionary algorithms, do not attempt to objectively measure the fitness of individuals. Instead, individuals are compared on the basis of their outcomes from interactions with other individuals.
Natural evolution is coevolution, where the fitness of an individual is defined with respect to its competitors and collaborators, as well as to the environment.
Simon M. Lucas
Coevolutionary Algorithms
The outcome of evaluating an individual in a coevolutionary algorithm depends upon the context of whom the individual interacts with. This context sensitivity is characteristic of coevolutionary systems and responsible for the complex dynamics for which coevolution is (in)famous.
Sevan G. Ficici
Single-population coevolutionary algorithm
Algorithm 1: Basic scheme of a generational evolutionary algorithm
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:     S ← selectParents(P)
6:     P ← recombineAndMutate(S)
7:     evaluatePopulation(A, P)
8:     updateArchive(A, P)
9: end while
10: return getFittestIndividual(A, P)
Procedure evaluatePopulation(A, P)
1: E ← selectEvaluators(A, P)
2: performInteractions(P, E)
3: aggregateInteractionOutcomes(P, E)
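The scheme above can be sketched in a few lines of Python. This is only an illustration of the control flow of Algorithm 1: the individual representation, the interaction function, and all operator bodies are invented stand-ins, not the authors' implementation.

```python
import random

def coevolve(pop_size=10, generations=5, seed=0):
    """Sketch of Algorithm 1: a generational single-population
    coevolutionary EA with a simple archive. All operators are
    illustrative stand-ins."""
    rng = random.Random(seed)

    # Individuals are single real-valued "strategies".
    P = [rng.uniform(-1, 1) for _ in range(pop_size)]   # createRandomPopulation
    A = []                                              # initializeArchive

    def interact(a, b):
        # Stand-in for playing a game: the higher value wins.
        return 1 if a > b else 0

    def evaluate(pop, archive):
        # evaluatePopulation: fitness = wins against peers and archive members
        evaluators = pop + archive                      # selectEvaluators
        return {i: sum(interact(x, e) for e in evaluators)
                for i, x in enumerate(pop)}             # aggregate outcomes

    fitness = evaluate(P, A)
    for _ in range(generations):                        # terminationCondition
        # selectParents: binary tournament on interaction outcomes
        S = [max(rng.sample(range(len(P)), 2), key=fitness.get) for _ in P]
        # recombineAndMutate: Gaussian mutation of the selected parents
        P = [P[i] + rng.gauss(0, 0.1) for i in S]
        fitness = evaluate(P, A)
        A.append(P[max(fitness, key=fitness.get)])      # updateArchive
    return P[max(fitness, key=fitness.get)]             # getFittestIndividual
```

Note that, exactly as the slide emphasises, no objective fitness appears anywhere: `evaluate` scores individuals only through interactions with other individuals.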
Reinforcement Learning
Reinforcement Learning (RL)
Machine learning paradigm focused on solving problems in which an agent interacts with an environment by taking actions and receiving rewards at discrete time steps. The objective is to find a decision policy that maximizes cumulative reward.
Agent ↔ Environment interaction loop:
1. state st
2. action at
3. reward rt
4. learn on the basis of <st, at, rt, st+1>
In board games:
agent =⇒ player
environment =⇒ game
state =⇒ board state
action =⇒ legal move
reward =⇒ game result
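The interaction loop above can be made concrete with a toy episode generator. The five-state "game", the move set, and all names are invented for illustration; in the talk's setting the environment would be an Othello game and the reward its final result.

```python
import random

def play_episode(policy, rng):
    """Sketch of the agent-environment loop: states are integers 0..4,
    legal moves advance the state by 1 or 2, and the only reward is
    delivered at the end of the game (here: reaching state 5)."""
    s, history = 0, []
    while s < 5:                          # until the game is over
        a = policy(s, rng)                # agent: pick a legal move
        s_next = min(s + a, 5)            # environment: apply the move
        r = 1 if s_next == 5 else 0       # reward = game result at the end
        history.append((s, a, r, s_next)) # experience <st, at, rt, st+1>
        s = s_next
    return history
```

Each tuple in the returned history is exactly the `<st, at, rt, st+1>` experience the agent learns from.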
Temporal Difference Learning
Temporal Difference Learning (TDL)
RL method which attempts to estimate a value function by observing the progression of states – the learner adjusts it to make the value of the current state more like the value of the next state.
Value function V(b) can be represented as a neural network with a modifiable weight vector w. The adjustment is based on a gradient-descent update, e.g.

∆wi := η e bi,    e = v′ − v,

where v is the value of the current state and v′ the value of the next state.
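For a linear value function the update above is a one-liner per weight. This sketch assumes V(b) is a plain weighted sum of board features; the talk's network may additionally squash the sum (e.g. through tanh), which would add a derivative factor to the update.

```python
def td_update(w, b, v_next, eta=0.01):
    """One TD gradient-descent step for a linear value function
    V(b) = sum_i w_i * b_i (a sketch).
    w: weight vector, b: board feature vector, v_next: value v' of the
    successor state, eta: learning rate."""
    v = sum(wi * bi for wi, bi in zip(w, b))       # current value v
    e = v_next - v                                 # TD error e = v' - v
    return [wi + eta * e * bi for wi, bi in zip(w, b)]  # dw_i = eta*e*b_i
```

Repeated application moves V of the current state toward V of the next state, as the definition of TDL describes.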
Coevolutionary Temporal Difference Learning
Coevolutionary Temporal Difference Learning (CTDL)
A hybrid of coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning.
Algorithm 1 Basic scheme of a generational evolutionary algorithm
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:     S ← selectParents(P)
6:     P ← recombineAndMutate(S)
7:     individualReinforcementLearning(P)
8:     evaluatePopulation(A, P)
9:     updateArchive(A, P)
10: end while
11: return getFittestIndividual(A, P)
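The essence of CTDL is line 7: every generation, each offspring additionally undergoes individual TD learning before being evaluated. A minimal sketch, with single-number individuals and a stand-in learning phase instead of real TD self-play:

```python
import random

def ctdl(pop_size=6, generations=4, td_steps=20, seed=1):
    """Minimal CTDL sketch: one-population coevolution interlaced with
    an individual learning phase. Individuals are single weights and
    the 'game' compares them directly -- purely illustrative stand-ins."""
    rng = random.Random(seed)
    P = [rng.uniform(-1, 1) for _ in range(pop_size)]

    def td_learn(x):
        # Stand-in for individualReinforcementLearning: gradient-like
        # drift toward a (hidden) target value of 1.0.
        for _ in range(td_steps):
            x += 0.05 * (1.0 - x)
        return x

    def fitness(pop):
        # Coevolutionary evaluation: wins in a round-robin among peers.
        return [sum(x > y for y in pop) for x in pop]

    for _ in range(generations):
        f = fitness(P)
        # Tournament selection + Gaussian mutation ...
        parents = [P[max(rng.sample(range(pop_size), 2), key=f.__getitem__)]
                   for _ in range(pop_size)]
        # ... followed by individual learning on every offspring.
        P = [td_learn(p + rng.gauss(0, 0.1)) for p in parents]
    f = fitness(P)
    return P[f.index(max(f))]
```

Evolution jumps between regions of the search space while the learning phase refines each candidate locally, which is exactly the division of labour CTDL exploits.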
Relative Methods Performance Over Time
[Plot: points scored in tournaments (y-axis, 4,000–13,000) against games played ×100,000 (x-axis, 0–40), for CTDL + HoF, CTDL, TDL, CEL + HoF, and CEL.]
Observations and Motivation
Observations on learning Othello strategies
Temporal Difference Learning is much faster, and under most experimental settings it is able to learn better strategies.
Coevolution can eventually produce better strategies if it is supported by an archive which sustains progress.
CTDL benefits from these complementary characteristics.
Motivation for further research on CTDL
No need for human expertise – useful when knowledge of the problem domain is unavailable or expensive to obtain.
Potential for employing a more complex learner architecture.
Interesting biological interpretation.
Evolution of Artificial Neural Networks
Typically, a network topology is chosen before the experiment and evolution searches the space of connection weights.
Can evolving topologies along with weights provide an advantage over evolving weights on a fixed topology?
Any continuous function can be approximated by a fully connected neural network having only one internal hidden layer and with an arbitrary sigmoidal nonlinearity.
George V. Cybenko
Challenges of Topology and Weight Evolving ANNs (TWEANNs)
How to cross over disparate topologies in a meaningful way?
How can topological innovation that needs a few generations to be optimized be protected so that it does not disappear prematurely?
How can topologies be minimized throughout evolution?
Evolvability and Neural Interference
Evolvability is an organism’s capacity to generate heritable phenotypic variation.
Marc Kirschner & John Gerhart
Evolvability of neural networks allows evolutionary algorithms to find weight settings that produce a desired behavior or approximate a given function.
Julian Togelius
The topology of a neural network largely influences its evolvability – it can be increased by removing single inputs or connections.
The availability of certain information at certain points in the network can lead evolution into local optima.
Neural interference appears in nonmodular neural networks that learn complex behavior consisting of multiple tasks.
Neuroevolution of Augmenting Topologies (NEAT)
Matching Topologies using Innovation Numbers
Different network structures (size and connection order) – the Competing Conventions Problem.
NEAT performs artificial synapsis based on historical markings.
[Figure: the competing conventions problem – two networks compute exactly the same function even though their hidden units A, B, C appear in a different order ([A,B,C] vs [C,B,A]) and are represented by different chromosomes. The two single-point crossovers produce [A,B,A] and [C,B,C], each missing one of the three main components of the solution; in general, n hidden units admit n! functionally equivalent permutations.]
Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
Neuroevolution of Augmenting Topologies (NEAT)
Two types of structural mutations in NEAT:
[Figure: the two types of structural mutation in NEAT, with genomes shown above their phenotypes. In the add connection mutation, a single new connection gene linking two previously unconnected nodes is appended to the genome. In the add node mutation, an existing connection is split: the old connection is disabled and replaced by a new node with two new connections. Each new gene receives an increasing global innovation number, giving NEAT an implicit history of the origin of every gene; during crossover, genes that do not match are disjoint or excess, depending on whether they occur within or outside the range of the other parent's innovation numbers.]
Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
Neuroevolution of Augmenting Topologies (NEAT)
Protecting Innovation through Speciation
Changing the topology of a network is often very disruptive.
Structural innovation is unlikely to survive in the population.
NEAT divides the population into species that compete primarily within their own niches.
Minimizing Dimensionality
Forcing minimal topologies could be achieved by incorporating network size into the fitness function.
NEAT biases the search towards minimal-dimensional spaces by starting with a population with no hidden nodes.
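The historical markings that make all of this work can be sketched directly: given two genomes keyed by innovation number, crossover alignment is a set operation. The dict representation and function name here are illustrative, not NEAT's actual data structures.

```python
def align_genomes(g1, g2):
    """Sketch of NEAT-style crossover alignment. Genomes are dicts
    mapping innovation number -> connection weight. Genes present in
    both parents are 'matching'; non-matching genes are 'disjoint' if
    they fall within the other parent's innovation-number range and
    'excess' if they fall beyond it."""
    matching = sorted(set(g1) & set(g2))
    max1, max2 = max(g1), max(g2)
    disjoint, excess = [], []
    for inn in sorted(set(g1) ^ set(g2)):
        # The gene belongs to one parent; compare against the other's range.
        other_max = max2 if inn in g1 else max1
        (excess if inn > other_max else disjoint).append(inn)
    return matching, disjoint, excess
```

Because matching is purely by historical origin, no topological analysis of the two networks is needed to line up their genes.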
N-tuple Network Architecture
A type of ANN that operates on a compound object (matrix, image) x whose elements can be easily indexed and retrieved.
Formed by a set of m tuples – each created by (randomly) sampling the input object at n locations.
[Figure: an n-tuple sampling a 3×3 patch of pixel values from a grey-scale image.]
Figure comes from “Face Recognition with the Continuous N-tuple Classifier” by S. M. Lucas
N-tuple Network Output Value
Each input location has v possible values – a single n-tuple represents an n-digit number in a base-v numeral system.
Each n-tuple has an associated look-up table (LUT) which contains parameters equivalent to weights in a standard ANN.
Locations aij, where j = 0..n−1, specified by each n-tuple ti are used to identify an address in its look-up table.
The output of the network is calculated by summing the LUT values indexed by the particular n-tuples:

f(x) = Σ_i f_i(x) = Σ_i LUT_i[ Σ_{j=0}^{n−1} x(a_ij) · v^j ],

where i ranges over the m tuples.
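The formula above translates almost literally into code. This is a sketch; the board encoding and argument names are assumptions for illustration.

```python
def ntuple_output(board, tuples, luts, v=3):
    """Compute the n-tuple network output f(x) described above.
    board: sequence mapping location -> value in {0..v-1};
    tuples[i]: the locations a_i0..a_i(n-1) sampled by tuple i;
    luts[i]: the look-up table of tuple i (flat list of length v**n)."""
    total = 0.0
    for locations, lut in zip(tuples, luts):
        # Read the sampled values as an n-digit base-v number ...
        index = sum(board[a] * (v ** j) for j, a in enumerate(locations))
        # ... and use it to address this tuple's look-up table.
        total += lut[index]
    return total
```

For Othello, v = 3 (empty, black, white), so an n-tuple addresses a table of 3^n weights.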
N-tuple Network for Othello
In the context of Othello, an n-tuple network acts as a state evaluation function – it computes the utility of a given board state.
[Figure: an Othello board position with two snake-shaped n-tuples and their look-up tables LUT1 and LUT2; each tuple's sequence of location values (0, 1, 2) indexes one LUT entry, and the indexed weights are summed.]
Snake-shaped inputs are randomly assigned and stay fixed, while learning affects the weights in the look-up tables.
Coevolutionary Gradient Algorithms and their Application to Othello 23 / 40 M. Szubert, K.Krawiec
Previous Research Learner Architecture Coevolutionary Gradient Algorithms Experiments Summary
N-tuple Network as TWEANN
Structural Genetic Operators
Mutation consists in changing the input assignment of a single element of a tuple to one of its neighbouring locations.
The size of tuples remains constant throughout evolution.
Crossover is restricted to exchanging whole tuples.
Each tuple represents an independent module that can be easily combined with other modules.
Innovations are protected by applying intensive individual learning to newly created structures.
The size of the representation does not grow.
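The two structural operators can be sketched as follows. The list-based network representation and the `neighbours` helper are assumptions introduced for the example, not the authors' code.

```python
import random

def mutate_tuple(tup, neighbours, rng):
    """Structural mutation sketch: reassign one element of a tuple to a
    neighbouring board location. Tuple length never changes.
    `neighbours(loc)` is an assumed helper returning locations adjacent
    to `loc` (and never `loc` itself)."""
    i = rng.randrange(len(tup))
    out = list(tup)
    out[i] = rng.choice(neighbours(out[i]))
    return out

def crossover(net1, net2, rng):
    """Crossover sketch: tuples are independent modules, so recombination
    exchanges whole tuples and never splits one. Networks are lists of
    tuples of equal length."""
    return [rng.choice(pair) for pair in zip(net1, net2)]
```

Both operators preserve the size of the representation, matching the bullet points above.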
Coevolutionary Gradient Search Process
Our approach is to analyse characteristics of the problem search space and thence to identify the algorithms (within the class considered) which exploit these characteristics – we pay for our lunch, one might say.
Lionel Barnett
We aim to search both spaces in parallel – the discrete network topology space and the continuous weight space.
How to move in these spaces to gain from their character?
Coevolutionary Gradient Search
Directed gradient search – numerically estimates the direction of change in the vicinity of the current candidate solution.
Undirected coevolutionary search – stochastically jumps over the search space, starting from the fittest configurations.
Search Operators
Genetic Operators
The following genetic operators operate on the fittest individuals:
Weight mutation (mw)
Topology mutation (mt)
Topology crossover (x)
Gradient Operators
Gradient-based search operators work in the weight space and consist in a single gradient-descent TDL learning scenario.
How to create a competitive learning environment?
self-play scenario (s)
population opponent (p)
archival opponent (a)
Guiding the Search Process
Interactions between candidate solutions are the only source of information that guides the search process.
Algorithm 1 Basic scheme of a generational evolutionary algorithm
P ← createRandomPopulation()
evaluatePopulation(P)
while ¬terminationCondition() do
    S ← selectParents(P)
    P ← recombineAndMutate(S)
    evaluatePopulation(P)
end while
return getFittestIndividual(P)
The family of EA is composed of a few methods that di!er slightly in technical de-tails, but all match the basic scheme presented in Algorithm 1. The most importantdi!erence between these methods concerns so called representation which defines amapping from phenotypes onto a set of genotypes and specifies what data structuresare employed in this encoding. Phenotypes are objects forming solutions to theoriginal problem, i.e., points of the problem space of possible solutions. Genotypes,on the other hand, are used to denote points in the evolutionary search space whichare subject to genetic operations. The process of genotype-phenotype decoding isintended to model natural phenomenon of embryogenesis. More detailed descriptionof these terms can be found in [Weise 09].
Returning to di!erent dialects of EA, candidate solutions are represented typi-cally by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA)[Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finitestate machines in classical Evolutionary Programming (EP) [Fogel 95] and trees inGenetic Programming (GP) [Koza 92]. A certain representation might be preferableif it makes encoding solutions to a given problem more natural. Obviously, geneticoperations of recombination and mutation must be adapted to chosen representa-tion. For example, crossover in GP is usually based on exchanging subtrees betweencombined individuals.
The most significant advantage of EA lies in their flexibility and adaptability tothe given task. This may be explained by their metaheuristic character of “blackbox” that makes only few assumptions about the underlying objective function whichis the subject of optimization. EA are claimed to be robust problem solvers showingroughly good performance over a wide range of problems, as reported by Goldberg[Goldberg 89]. Especially the combination of EA with problem-specific heuristicsincluding local-search based techniques, often make possible highly e"cient opti-mization algorithms for many areas of application. Such hybridization of EA isgetting popular due to their capabilities in handling real-world problems involvingnoisy environment, imprecision or uncertainty. The latest state-of-the-art method-ologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].
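Algorithm 1 can be made concrete with a short, runnable sketch. The representation (a real-valued vector), truncation selection, one-point crossover, and Gaussian mutation are illustrative choices only:

```python
import random

def generational_ea(fitness, genome_len=10, pop_size=20, generations=50,
                    mut_rate=0.1, seed=0):
    """Basic generational EA from Algorithm 1: evaluate, select parents,
    recombine and mutate, repeat; finally return the fittest individual."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]               # truncation selection
        pop = []
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 0.1) if rng.random() < mut_rate else g
                     for g in child]
            pop.append(child)
    return max(pop, key=fitness)
```

In the coevolutionary setting discussed here, evaluatePopulation is replaced by interactions between individuals rather than an objective fitness function.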
1. Play round robin tournament between population members
2. Randomly select archival individuals to act as opponents
3. Select the best-of-generation individual and add it to the archive
Search operators use different types of interaction feedback.
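The three steps above can be sketched as a single evaluation routine. The `play` callback and the scoring scheme (1/0.5/0 for win/draw/loss) are assumptions for illustration, not the exact protocol from the experiments:

```python
import random

def evaluate_population(pop, archive, play, k_archive=5, rng=random):
    """Competitive fitness derived from interactions only:
    1. round-robin among population members,
    2. games against randomly drawn archival opponents,
    3. best-of-generation individual joins the archive.
    `play(a, b)` is assumed to return 1 if a beats b, 0.5 for a draw, 0 otherwise."""
    scores = {i: 0.0 for i in range(len(pop))}
    # 1. round-robin tournament between population members
    for i in range(len(pop)):
        for j in range(i + 1, len(pop)):
            r = play(pop[i], pop[j])
            scores[i] += r
            scores[j] += 1.0 - r
    # 2. games against randomly selected archival individuals
    if archive:
        opponents = rng.sample(archive, min(k_archive, len(archive)))
        for i in range(len(pop)):
            for opp in opponents:
                scores[i] += play(pop[i], opp)
    # 3. best-of-generation individual enters the archive
    best = max(scores, key=scores.get)
    archive.append(pop[best])
    return scores
```

Different operators can then read off different slices of this feedback, e.g. only the population scores or only the archive games.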
Learning 7 x 4 N-tuple Networks
[Figure: average percentage score (y, 0–0.6) vs. games played ×1,000 (x, 0–2,000) for CTDL-sxmw + HoF, CTDL-sxmw, TDL, CEL + HoF, and CEL.]
Learning 9 x 5 N-tuple Networks
[Figure: average percentage score (y, 0–0.6) vs. games played ×1,000 (x, 0–2,000) for TDL, CTDL-sxmw + HoF, CTDL-sxmw, CEL + HoF, and CEL.]
Learning 12 x 6 N-tuple Networks
[Figure: average percentage score (y, 0–0.8) vs. games played ×1,000 (x, 0–2,000) for ETDL-sxmt, CTDL-sxmt + HoF, TDL, CTDL-sxmw, and CEL.]
Relative Performance of Self-play Methods
[Figure: points in tournaments (y, 2,000–12,000) vs. games played ×1,000 (x, 0–2,000) for TDL, PTDL, CTDL-s, CTDL-sx, CTDL-sxmt, and CTDL-sxmt + HoF.]
Relative Performance of Mutual-play Methods
[Figure: points in tournaments (y, 2,000–13,000) vs. games played ×1,000 (x, 0–2,000) for CTDL-p, CTDL-px, CTDL-pxmt, CTDL-ax + HoF, and CTDL-asxmt + HoF.]
Relative Performance of All Methods
[Figure: points in tournaments (y, 2,000–10,000) vs. games played ×1,000 (x, 0–2,000) for CTDL-px, ETDL-sxmt, CTDL-sxmt, CTDL-sxmt + HoF, and CTDL-asxmt + HoF.]
Evolutionary Player in the Othello League
Summary
Learning models that can be identified from observing nature:
population learning by genetic means
life-time learning at an individual level
cultural learning by social interactions
We have implemented these models as different search procedures that incrementally improve candidate solutions.
A properly balanced combination of these models can deliver the best performance in the long-term perspective.
The efficiency of n-tuple networks and of our hybrid CTDL algorithm has been confirmed in the Othello League.
Future Work
I am an enthusiastic Darwinian, but I think Darwinism is too big a theory to be confined to the narrow context of the gene.
Richard Dawkins
Sociobiological inspirations:
Gene-culture coevolution (Dual Inheritance Theory) – two types of replicators: genes and memes.
Epigenetic transmission mechanisms – niche construction.
Improvement of selection procedures in noisy environments.
Designing more complex structural mutations for n-tuples.
Comparison between CTDL and NEAT algorithms.
Thank You