Evolving Modular Neural Networks with ES Hyper-NEAT, Link Expression
Output and Connection Costs
Victor Chudinov
A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Software Development and Technology
IT University of Copenhagen
31.08.2015
Abstract
Modularity is a key property of natural neural networks and an important factor in their ability to
produce complex behaviors. So far, however, neuroevolution has struggled to create modular
neural networks without special encouragement, and such encouragement usually produces solutions
that are limited in both layout and performance. A promising approach to modularity in
artificial evolution is the connection cost technique developed by Clune et al. (2013). Unlike
previous approaches it does not put constraints on specific patterns of modularity, but allows
evolution to select the most feasible layout. The current paper introduces the connection cost
technique to Evolvable Substrate HyperNEAT (ES-HyperNEAT) and explores its effect on performance
and on the ability to produce modular solutions. The approach is compared to another successful
technique – locality seeding with ES-HyperNEAT. Further comparisons are carried out against fixed
substrate HyperNEAT with connection cost and fixed substrate HyperNEAT with locality seeding.
All approaches are evaluated on an XOR task and a retina task. While all conditions were able to
solve the XOR task, none solved the retina task. Furthermore, no solutions were visibly modular.
Finally, ES-HyperNEAT with the connection cost technique had significantly lower scores than all
other approaches on both tasks.
Keywords: ES-HyperNEAT, HyperNEAT, modularity, connection cost, locality, neuroevolution
CONTENTS
INTRODUCTION
I. BACKGROUND
 1.1 Modularity in Natural Systems
 1.2 Modularity in Artificial Systems
 1.3 Regularity and Compositional Pattern Producing Networks
 1.4 Neuroevolution of Augmenting Topologies (NEAT)
 1.5 HyperNEAT and Evolvable Substrate HyperNEAT
II. APPROACH
 2.1 Seeding Evolution Towards Modularity
 2.2 Modularity through Connection Cost
 2.3 Hypotheses
 2.4 Methodology
  Evolution setup
  XOR task
  Retina task
  Software
III. RESULTS
 3.1 XOR task
 3.2 Retina task
  Modularity
 3.3 Discussion
IV. CONCLUSION
BIBLIOGRAPHY
Appendix A: Pseudo Code
 ES-HyperNEAT
 NSGA-II
INTRODUCTION
The field of AI has advanced immensely in the last decades. Modern AIs can recognize
complex objects, translate between languages, and build cars. Even so, they remain limited, even
stupid, when compared to animals. Despite all the progress, the complexity of animal behavior
has so far eluded our attempts to recreate it. One possible reason is that most modern AIs
lack modularity (Huizinga, Clune, & Mouret, 2014).

At the same time, modularity is a major factor in the complexity of animal behavior
(Meunier, Lambiotte, & Bullmore, 2010). Modularity also provides several substantial
benefits to systems that utilize it – they tend to be more robust, more evolvable, and more
flexible (Bullmore & Sporns, 2012). However, modular solutions are only a
subset of all possible solutions (Verbancsics & Stanley, 2011). For many tasks there are plenty
of non-modular solutions that are within closer reach (Verbancsics & Stanley, 2011). Furthermore,
modularity does not offer an immediate fitness reward, and most modern artificial evolution
favors immediate rewards over long-term benefits (Verbancsics & Stanley, 2011). Thus the
question of what caused modularity to become so prevalent in nature is still open.
One recent approach argues that modularity is a by-product of selection to
minimize the costs of building and maintaining connections (Clune, Mouret, &
Lipson, 2013). Huizinga et al. (2014) tested this approach with Hypercube-based Neuroevolution
of Augmenting Topologies (HyperNEAT), and showed that it can create modular solutions that
even perform better than non-modular ones. However, one distinct characteristic of HyperNEAT
is that the human operator specifies the positions of the possible hidden nodes (Risi & Stanley,
2012), whereas in nature such structures grow on their own (Risi & Stanley, 2012).
This raises the question of whether the connection cost technique can create modular
solutions with an approach that evolves the placement of the neurons as well as the connectivity
patterns. Evolvable Substrate HyperNEAT (ES-HyperNEAT) is an extension to HyperNEAT that can
discover both the connectivity and the placement of nodes. It could potentially take advantage of
the connection cost technique to discover even better solutions than the ones created by fixed
substrate HyperNEAT. Furthermore, it has shown good results with another way to evolve
modularity – locality seeding. Thus it is possible that enhancing ES-HyperNEAT with the
connection cost would yield similarly good results. Based on this, the paper investigates three
questions:

1. Can ES-HyperNEAT with connection cost outperform fixed substrate HyperNEAT with connection cost and with locality seeding?
2. Can ES-HyperNEAT with connection cost perform as well as ES-HyperNEAT with locality seeding?
3. Can ES-HyperNEAT-CCT produce visually modular phenotypes?
I. BACKGROUND
1.1 Modularity in Natural Systems
Many natural and social systems are organized as complex networks of interconnected
elements (Bullmore & Sporns, 2012). Many of those systems are also modular (Bullmore &
Sporns, 2012). That is, one can divide them into many relatively independent modules (Meunier
et al., 2010). The elements in each module are densely connected to each other, but have few
connections to elements in other modules (Meunier et al., 2010).

Perhaps the greatest example of such modularity is the animal brain (Meunier et al.,
2010). At the highest level it is composed of a hierarchy of modules that can be identified
visually – two hemispheres, stem, cerebellum and so on (Meunier et al., 2010). At its lowest
level neurons self-organize into cortical columns, which then organize into super columns
(Meunier et al., 2010). In between, separate areas of the brain are responsible for processing
different information – an area for visual processing, an area responsible for speech, etc.
(Meunier et al., 2010).

Modularity offers some considerable advantages to a network (Bullmore & Sporns,
2012). First, modular networks are more robust to unexpected changes than non-modular ones.
Separate modules are more or less self-contained (Bullmore & Sporns, 2012). Thus
an unexpected change to one of them is less likely to affect the general functionality of the
system (Bullmore & Sporns, 2012). In contrast, an unexpected change to a non-modular
system is more likely to disrupt a component that affects its general functionality (Bullmore
& Sporns, 2012). Furthermore, if multiple modules process similar tasks and one of them is
damaged, the rest can compensate (Bullmore & Sporns, 2012).
Second, modular networks are more evolvable (Bullmore & Sporns, 2012; Clune et al.,
2013). Mutations in modular networks can remain contained within a module. This allows
evolution to introduce and develop innovations without disrupting the entire system (Bullmore
& Sporns, 2012; Clune et al., 2013). The innovation can remain inside the module until the
system learns how to utilize it best (Bullmore & Sporns, 2012; Clune et al., 2013). Furthermore,
an innovation can spread through the network just by replicating the module, instead of having
to be discovered all over again (Bullmore & Sporns, 2012; Clune et al., 2013).
Third, modular networks offer better functionality than non-modular ones (Bullinaria,
2007). Separate modules are able to process information and solve tasks separately (Bullinaria,
2007). This can not only increase the speed at which a task is solved but also allow the network
to process fundamentally different tasks simultaneously (Bullinaria, 2007). In fact this can
create a "whole bigger than the sum of its parts" effect (Bullinaria, 2007): integrating the
information from multiple modules can lead to the emergence of much more complex
behaviours (Bullinaria, 2007; Clune et al., 2013).
Finally, as information is contained within the individual module, the network has to deal
with less noise (Bullinaria, 2007; Bullmore & Sporns, 2012). In contrast, nodes in a non-modular
network have to deal with much more irrelevant information, and signal transmission can slow
down (Bullmore & Sporns, 2012; Huizinga et al., 2014). The lack of local separation also means
that the network would struggle with processing multiple tasks at the same time (Bullmore &
Sporns, 2012).
The origin of modularity is still a topic of debate, however (Clune et al., 2013). Studies in
artificial evolution offer several hypotheses (Clune et al., 2013; Kashtan & Alon, 2005;
Verbancsics & Stanley, 2011). One such hypothesis is that modular networks can arise
spontaneously in a changing environment (Kashtan & Alon, 2005). Survival in nature is defined
by a number of tasks and goals that share a set of common sub-goals (Kashtan & Alon, 2005). In
such a case, common modules might emerge to handle each of these common sub-goals. After
that, when the environment changes, the network would need little adjustment (Kashtan & Alon,
2005).
Another hypothesis on the origin of modularity is grounded in the principle of locality
(Verbancsics & Stanley, 2011). This principle states that in a physical system, an event is more
likely related to events that are nearby than to ones further away (Verbancsics & Stanley, 2011).
Thus, a node in a network will likely form connections with other nearby nodes (Verbancsics &
Stanley, 2011). Many natural networks, brains included, follow this principle (Verbancsics &
Stanley, 2011). In the brain, neurons have more connections to neurons that are close than to
neurons farther away (Verbancsics & Stanley, 2011).

The idea is that forming close-range connections is easy, but as their number increases
this task gets more difficult (Verbancsics & Stanley, 2011). At some point there are no more
close neurons, and forming long-range connections is infeasible (Bullmore & Sporns, 2012;
Clune et al., 2013). Furthermore, longer connections need better coordination, higher accuracy,
and more resources (Bullmore & Sporns, 2012; Verbancsics & Stanley, 2011).
A third hypothesis states that modularity is a by-product of selection to reduce
connection costs (Clune et al., 2013). Each connection requires resources to build and maintain
(Bullmore & Sporns, 2012; Clune et al., 2013). Furthermore, sending signals along longer
connections is slower and requires more resources (Bullmore & Sporns, 2012; Clune et al.,
2013). Finally, more connections make signal transmission slower and noisier (Bullmore &
Sporns, 2012; Clune et al., 2013). Thus an environmental pressure that minimizes the use of
resources will also strive to keep the connection cost of neurons low. As evolution cuts these
costs, it preserves only connections that have a direct effect on performance (Bullmore &
Sporns, 2012; Clune et al., 2013). The result of this pruning process is often modularity (Clune
et al., 2013).
In nature, many systems seem to adhere to this hypothesis at least in part (Clune et al.,
2013). Vascular systems tend to have minimized connectivity (Clune et al., 2013). Brains and
neural systems are also minimized to an extent (Bassett et al., 2011; Bullmore & Sporns, 2009;
Meunier et al., 2010; Thivierge & Marcus, 2007). At the same time, the wiring diagram of the
brain is not completely minimized (Bullmore & Sporns, 2012). Instead it operates at a trade-off
between connection cost and performance (Bullmore & Sporns, 2012). This suggests that
optimal performance needs at least some long-range connections (Bullmore & Sporns, 2012).
Despite being costly, such connections can reduce the time needed to transmit a signal to
distant regions (Bullmore & Sporns, 2012).

Most likely none of these hypotheses is solely responsible for modularity (Bullmore &
Sporns, 2012). Instead, a multitude of factors could all contribute to the emergence of modular
structures (Bullmore & Sporns, 2012). For example, connection cost could work in conjunction
with locality to bootstrap modularity (Clune et al., 2013; Verbancsics & Stanley, 2011).
1.2 Modularity in Artificial Systems
The goal of AI research is to produce systems with complex and intelligent behaviours
on the scale seen in animals, yet this has so far eluded researchers (Huizinga et al., 2014).
These complex behaviours are possible in part due to the modular organization of the brain
(Bullmore & Sporns, 2012). As discussed in the previous section, modularity also offers some
advantages over non-modular organization. Yet, modularity does not emerge in artificial
neuroevolution without specific techniques to encourage it (Verbancsics & Stanley, 2011).
Researchers have developed several such techniques – L-systems (Hornby & Pollack, 2001a),
grammar encoding (Hornby & Pollack, 2001b), and seeding network motifs (Li, Yuan, Shi, &
Zagal, 2015). However, most of these techniques tend to produce rather limited modularity
that adheres to rigid rules and performs worse than non-modular networks (Kashtan & Alon,
2005; Verbancsics & Stanley, 2011).
One reason for this is that for most tasks there usually is a perfect non-modular solution
(Clune et al., 2013; Verbancsics & Stanley, 2011). Furthermore, modular solutions are only a
subset of all available solutions (Clune et al., 2013; Verbancsics & Stanley, 2011). They often
are not the most easily reachable ones and do not convey an immediate fitness reward (Clune et
al., 2013; Verbancsics & Stanley, 2011). At the same time, artificial evolution tends to select
the solutions that offer the most short-term reward (Clune et al., 2013; Verbancsics & Stanley,
2011).

A more recent approach that does not suffer from these issues is that of modularly varying
goals (MVG) (Kashtan & Alon, 2005). The evolution is carried out on a task where the overall
goal switches after some generations (Kashtan & Alon, 2005). Furthermore, all goals share
common sub-goals (Kashtan & Alon, 2005).

Kashtan and Alon test the technique on two modularly decomposable tasks – a logic gate
task and a retina task. They tested three conditions – modularly varying goals, fixed goals, and
random goals (Kashtan & Alon, 2005). On both tasks the MVG condition produced more
modular networks than the other conditions (Kashtan & Alon, 2005). Furthermore, no
modularity emerged in the random goals condition (Kashtan & Alon, 2005).
Clune et al. (2010) applied the MVG approach to HyperNEAT and tested it on the same
retina task, but it performed poorly. When the authors imposed modularity there was
improvement, but the fixed goal setup still performed best (Clune, Beckmann, McKinley, &
Ofria, 2010). Finally, Clune et al. (2010) also tested a simplified version of the retina problem,
on which HyperNEAT was able to solve the task and produce modular solutions. Based on
these poor results, Clune et al. (2010) questioned Kashtan and Alon's conclusion about the
generality of their approach. Furthermore, many practical tasks cannot be defined in a modular
way, which makes the algorithm harder to apply (Clune et al., 2010).
Clune et al. (2013) introduced another approach for evolving modularity in ANNs that
does not need special modification of the task – connection cost. It builds on the idea that in
natural networks each connection has a cost associated with it (Clune et al., 2013). As a result
many natural networks have minimized connections (Clune et al., 2013). Clune et al. implement
connection cost as a second objective in a non-dominated multi-objective algorithm (Clune et
al., 2013). They then tested it on a retina task like that of Kashtan and Alon (Clune et al.,
2013). The results showed that the connection cost objective both improves performance and
introduces modularity (Clune et al., 2013). Furthermore, they showed that modularity can also
arise without the need for switching the goal (Clune et al., 2013). Huizinga et al. (2014) took
the connection cost technique a step further by also introducing regularity. To do so they
combined the connection cost technique with HyperNEAT and tested it on a retina problem, a
5-XOR problem, and an H-XOR problem (Huizinga et al., 2014). They compared this approach
to another approach – that of locality seeding (see below) – as well as to direct encoding and
unmodified HyperNEAT (Huizinga et al., 2014). In all tasks HyperNEAT with connection cost
had the highest fitness and highest modularity (Huizinga et al., 2014).

An interesting result that the authors note is that the modularity of the seeded condition
was high early on, but dropped later in the evolution (Huizinga et al., 2014). In contrast, the
modularity of HyperNEAT-CCT rose the most after it had already solved the task (Huizinga et
al., 2014). A possible explanation is that after solving the task the only way the network can
improve is by optimizing connection costs, while the influence of the seeding diminishes as the
CPPN grows more complex over time (Huizinga et al., 2014).
In another paper, Lowell and Pollack took a simpler approach to connection cost (Lowell
& Pollack, 2014). Instead of a multiobjective algorithm, they used a single objective with a
fixed penalty for connections (Lowell & Pollack, 2014). They tested this on the retina task, but
the connection cost did not convey fitness benefits and did not increase the level of modularity
of the best networks (Lowell & Pollack, 2014). Still, it led to more consistent levels of
modularity in the population (Lowell & Pollack, 2014). However, the use of a single fitness
function as well as underlying properties of NEAT might have prevented more significant
results (Lowell & Pollack, 2014).

A third approach to modularity is that of locality seeding (Verbancsics & Stanley, 2011).
It is inspired by the idea that locality is a widespread property of natural systems (Verbancsics &
Stanley, 2011). The approach is simple – Gaussian hidden nodes are added to the underlying
Compositional Pattern Producing Network (CPPN) (Verbancsics & Stanley, 2011). As the
Gaussian function peaks at inputs close to 0, it can introduce locality into the pattern generated
by the CPPN – see section 2.1 of the current paper for a detailed review of the technique.
Verbancsics and Stanley (2011) put the locality seed to the test against several other
variants of HyperNEAT. The results showed that the locality seed conditions performed best
and tended to produce significantly more modular solutions than unseeded approaches
(Verbancsics & Stanley, 2011). An interesting result was that locality over a single axis
outperformed general locality over all axes (Verbancsics & Stanley, 2011). The authors also
note that locality over the weight patterns tended to reduce fitness (Verbancsics & Stanley,
2011).

Still, a drawback of HyperNEAT is that one has to specify the layout of the hidden nodes
(Risi, Lehman, & Stanley, 2010; Risi & Stanley, 2012). This creates a situation where
HyperNEAT can discover the weights of connections, but says nothing about where the nodes
should be (Risi et al., 2010; Risi & Stanley, 2012). At the same time the CPPN does contain
such information (Risi et al., 2010; Risi & Stanley, 2012). Evolvable Substrate HyperNEAT is
an extension to HyperNEAT that can take advantage of this information and discover the
placement and density of neurons in addition to the weight patterns (Risi et al., 2010; Risi &
Stanley, 2012). Furthermore, it is compatible with locality seeding and the LEO (Risi &
Stanley, 2012). Risi & Stanley (2012) tested ES-HyperNEAT with locality seeding on the
retina task. It outperformed all other conditions and the solutions were predominantly modular
(Risi & Stanley, 2012).
1.3 Regularity and Compositional Pattern Producing Networks
Modularity, regularity and hierarchy are ubiquitous properties of natural organisms
(Huizinga et al., 2014). Cells combine to form tissues, tissues form organs, organs form the
body. On each level there is repetition of components, symmetry, self-similarity, and repetition
with variation (Stanley, 2007). For example, a centipede's body has several self-similar
segments (repetition, self-similarity). Its left and right sides are mirror images of each other
(symmetry). Its legs vary in size, but maintain the same structure (repetition with variation).

This regularity offers several advantages in evolution. First, one can reuse information
when describing the structure (Huizinga et al., 2014; Stanley, 2007). Second, a smaller genome
is easier to reproduce and reuse without error (Stanley, 2007). The human genome contains
fewer than 30,000 genes, 50% of which it shares with bananas (Dobbs, 2015). At the same
time it handles the development of the most complex structure known to man – the brain. Third,
evolution can discover a working solution only once and then re-use it (Stanley, 2007). For
example, in the evolution of a table one can discover the general structure of the legs once,
instead of evolving each leg on its own. Finally, the composition of regularities can lead to
increased complexity (Risi & Stanley, 2012; Stanley, 2007). At a higher level, hands are mirror
images of each other. At a lower level, fingers lie in a radial pattern around the palm. Each
finger is similar to its neighbors and at the same time mirrors the fingers on the other hand.
Figure 1: Various CPPN patterns evolved with the online platform PicBreeder. Source: https://evolvegold.wordpress.com/pic-breeder/
Regularity can help evolutionary computation discover solutions with complexity on a
natural scale (Risi & Stanley, 2012; Stanley, D'Ambrosio, & Gauci, 2009; Stanley, 2007). Many
higher-order solutions could be achieved by composing simple elements (Stanley, 2007). At the
same time, re-using components makes these solutions less expensive computationally
(Stanley, 2007).

One approach that can encode regularity, used in the current study, is Compositional
Pattern Producing Networks (CPPNs) (Stanley, 2007). CPPNs are mathematical abstractions of
development that can represent complex patterns in Cartesian space (Stanley, 2007). In structure
and implementation CPPNs are similar to neural networks: they consist of nodes linked by
weighted connections, and they have inputs, hidden nodes, and outputs (Stanley, 2007). However,
where neural networks have a single activation function, CPPNs can have many different ones
(Stanley, 2007). Each function can introduce new coordinate frames that can hold other functions
(Stanley, 2007). This allows the network to produce complex patterns that exhibit various
symmetries or regularities (Stanley, 2007). For example, a Gaussian node introduces left-right
symmetry and locality, while a sine node introduces repetition (Stanley, 2007). This way the
CPPN does not need to simulate development to take advantage of its properties – the end result
of the development is encoded in the CPPN itself (Stanley, 2007).
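To make this composition concrete, the following minimal sketch (purely illustrative, hand-wired rather than evolved, and not taken from any particular CPPN library) combines a Gaussian node with a sine node and samples the resulting pattern over a small grid. By construction the pattern is symmetric around x = 0 and repeats along y.

```python
import math

def gaussian(x):
    """Gaussian activation: peaks at 0 and is symmetric, so it introduces
    left-right symmetry and locality into the pattern."""
    return math.exp(-x * x)

def cppn(x, y):
    """A tiny hand-wired CPPN: a Gaussian node (symmetry in x) composed
    with a sine node (repetition in y). A real CPPN evolves this wiring."""
    h1 = gaussian(x)           # symmetric in x
    h2 = math.sin(3.0 * y)     # repeating in y
    return math.tanh(h1 + 0.5 * h2)

# Sample the pattern; cppn(-x, y) == cppn(x, y) at every point by construction.
pattern = [[cppn(x / 4.0, y / 4.0) for x in range(-4, 5)] for y in range(-4, 5)]
```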
1.4 Neuroevolution of Augmenting Topologies (NEAT)
Neuroevolution is the artificial evolution of neural networks using genetic algorithms
(Stanley & Miikkulainen, 2002). In neuroevolution, a population of ANNs has to solve a task
over many generations, where each generation the networks are evaluated on that task (Stanley
& Miikkulainen, 2002). Improvement happens through mutation or through the crossover of fit
individuals (Stanley & Miikkulainen, 2002). This allows neuroevolution to create neural
networks that can solve a task without the need for extensive information gathering, as in
backpropagation (Stanley & Miikkulainen, 2002). However, despite this advantage, traditional
neuroevolution approaches are plagued by several problems.

First is the competing conventions problem. A task can have many successful solutions,
yet these solutions might be incompatible with each other (Stanley & Miikkulainen, 2002). The
genomes can have different sizes, genes in the same positions might encode different structures,
or the same structure might appear at different positions (Stanley & Miikkulainen, 2002). As a
consequence, the offspring of two successful solutions might not be successful at all (Stanley &
Miikkulainen, 2002).
Next, innovations usually bring long-term benefits, but lead to a temporary drop in fitness
(Stanley & Miikkulainen, 2002). Thus non-novel ANNs that offer better immediate fitness
returns might dominate the population (Stanley & Miikkulainen, 2002). This leads to less
diversity in the population, and the evolution can stall in a local optimum (Stanley &
Miikkulainen, 2002).

Figure 2: Crossover in NEAT. Each gene is assigned a historical marker – a global number that marks the specific structure. During crossover the two genomes are aligned. Genes that the parents have in common are carried over. Disjoint and excess genes are taken from the more fit parent. Image source: (Stanley & Miikkulainen, 2002)
Finally, a random starting topology poses problems on its own. Many of the initial
networks might not have a valid path from inputs to outputs (Stanley & Miikkulainen, 2002). It
then takes time for evolution to weed them out. Furthermore, random topologies tend not to
produce minimal solutions (Stanley & Miikkulainen, 2002). The population already starts
with unnecessary structures, and unless there is an explicit fitness penalty, these structures will
carry over through the generations (Stanley & Miikkulainen, 2002). Thus, networks will
accumulate structures that increase computational costs without improving performance
(Stanley & Miikkulainen, 2002). One can impose a fitness penalty for network size, but then
each problem would require discovering the appropriate penalty (Stanley & Miikkulainen,
2002).

Figure 3: Mutation in NEAT. There are two types of mutation in NEAT. The first adds a connection between two nodes that are not connected. The second adds a node that splits an already existing connection. In both cases the new genes are added at the end of the genome. Image source: (Stanley & Miikkulainen, 2002)
Neuroevolution of Augmenting Topologies (NEAT) is an approach that attempts to solve
these problems (Stanley & Miikkulainen, 2002). NEAT achieves this through the introduction of
historical markers, speciation, and a minimal initial population (Stanley & Miikkulainen, 2002).
Historical markers are global numbers associated with each innovation in the population (Stanley
& Miikkulainen, 2002). They allow one to keep track of which gene is which (Stanley &
Miikkulainen, 2002). This way one can compare and cross over two genomes without needing
complex topological analyses (Stanley & Miikkulainen, 2002). Next, speciation subdivides
the population into species. Each species represents its own evolutionary niche where a genome
can compete with other similar genomes and optimize a particular innovation (Stanley &
Miikkulainen, 2002). Finally, NEAT starts from a minimal structure – just input and output
neurons. This allows networks to grow incrementally as evolution discovers and optimizes
innovations. The result is compact and more efficient networks (Stanley & Miikkulainen, 2002).
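As a rough illustration of how historical markers make crossover trivial, consider the sketch below. The genome representation (a dict from innovation number to gene) is an assumption made for illustration, not NEAT's actual data structure.

```python
import random

def crossover(fitter, weaker):
    """Sketch of NEAT crossover via historical markers. Genomes are
    assumed to be dicts mapping innovation number -> connection gene.
    Matching genes (same innovation number in both parents) are
    inherited at random from either parent; disjoint and excess genes
    are taken from the more fit parent only."""
    child = {}
    for innovation, gene in fitter.items():
        if innovation in weaker:
            child[innovation] = random.choice((gene, weaker[innovation]))
        else:
            child[innovation] = gene  # disjoint or excess gene
    return child
```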
NEAT has seen successful application in several domains, such as pole-balancing tasks
(Gomez, Schmidhuber, & Miikkulainen, 2006; Stanley & Miikkulainen, 1996, 2002), game AI
(Cardamone, Loiacono, & Lanzi, 2009; Pugh, Goodell, & Stanley, 2013; Togelius, Shaker,
Karakovskiy, & Yannakakis, 2013), and robot control (Trujillo, Olague, Lutton, & De Vega,
2008). NEAT can also evolve CPPNs with minor adaptations (Stanley et al., 2009; Stanley &
Miikkulainen, 1996).
1.5 HyperNEAT and Evolvable Substrate HyperNEAT
Hypercube-based NEAT (HyperNEAT) is an extension to NEAT and CPPNs inspired by the
idea that a geometric pattern can represent the connectivity pattern of a neural network (Stanley
et al., 2009). This idea builds on a few insights. First, the pattern generated by a CPPN is the
result of querying it with the coordinates of the points that make it up (Stanley et al., 2009).
Second, one can specify a connection between two points just by their coordinates (Stanley et
al., 2009). Third, one can represent a connection in n dimensions as a point in 2n dimensions
(Stanley et al., 2009). For example, take a connection in 2-dimensional space between two
nodes with coordinates (x1, y1) and (x2, y2). These coordinates also correspond to a
4-dimensional point (x1, y1, x2, y2) (Risi & Stanley, 2012; Stanley et al., 2009). One can
then query a CPPN with this point and interpret the output as the weight of the connection
(Stanley et al., 2009). This process repeats for each pair of points that could make up the
neural network (Stanley et al., 2009). Finally, as outlined above, NEAT can evolve CPPNs
(Stanley et al., 2009). Thus one can test the generated networks on a given task and assign the
result as the fitness of the CPPN. The evolution then creates patterns that produce more
effective networks (Stanley et al., 2009).

Figure 4: HyperNEAT. 1. The substrate is composed of several predefined points. 2. The CPPN is queried with the coordinates of these points. 3. The output of the CPPN then becomes the weight of the connection between the queried nodes. Image source: (Stanley et al., 2009)
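The query loop itself is simple. The sketch below illustrates the idea under stated assumptions: the `cppn_query` callable and the threshold value are placeholders, not the MultiNEAT API.

```python
def build_weights(cppn_query, nodes, threshold=0.2):
    """Query a CPPN over every pair of substrate nodes. Each pair of
    2-D points is treated as one 4-D point (x1, y1, x2, y2), and the
    CPPN output becomes the connection weight."""
    weights = {}
    for (x1, y1) in nodes:
        for (x2, y2) in nodes:
            w = cppn_query(x1, y1, x2, y2)
            # Original HyperNEAT expresses a connection only when the
            # output magnitude exceeds a static threshold.
            if abs(w) > threshold:
                weights[((x1, y1), (x2, y2))] = w
    return weights
```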
A specific property of the connectivity pattern is that it is isomorphic to the geometric
pattern that underlies it (Stanley et al., 2009). This allows it to preserve all the regularities and
symmetries found in the geometric pattern (Stanley et al., 2009). One can then take advantage of
these regularities by making use of the geometry of the task (Risi & Stanley, 2012; Stanley et al.,
2009). For example, one can place the inputs and outputs of a robot controller in a way that
reflects their actual positions on the robot (Risi & Stanley, 2012; Stanley et al., 2009). Evolution
can then exploit this to speed up the discovery of the relationship between sensors and actuators
(Risi & Stanley, 2012; Stanley et al., 2009).
HyperNEAT offers several advantages over other approaches. First, it allows for a
compact description of much larger networks (Stanley et al., 2009). A CPPN can describe a
network of thousands of nodes with a genotype of just a few lines (Stanley et al., 2009).
Furthermore, the same CPPN can generate networks of varying sizes with just a change in a
few parameters (Stanley et al., 2009). Finally, one does not need to simulate development – the
end result is already encoded in the CPPN (Stanley et al., 2009).
Yet, in HyperNEAT one still has to specify the positions of the possible hidden
nodes. The pattern has to intersect with these specific positions and adapt to them (Risi
et al., 2010; Risi & Stanley, 2012). Thus, the CPPN tells us what the connection weight between
nodes should be, but not where to place them (Risi et al., 2010; Risi & Stanley, 2012).

At the same time, a CPPN pattern does contain information on where the nodes could be
(Risi et al., 2010; Risi & Stanley, 2012). Whenever HyperNEAT discovers a connection it also
discovers the nodes that define that connection. When asking which connections to include we
are also asking which nodes to include. Thus by switching the focus from connections to nodes
we no longer need to specify them in advance (Risi et al., 2010; Risi & Stanley, 2012).
Furthermore, whether to express a node depends on the information stored in the area of the
pattern where it lies (Risi et al., 2010; Risi & Stanley, 2012). This information is not uniformly
distributed. There are areas of relative uniformity, where the nodes compute the same function
(Risi & Stanley, 2012). There are also areas of high information density, where nodes compute a
variety of functions (Risi & Stanley, 2012). It makes sense then to place nodes in these areas of
high density, rather than in the areas of uniformity (Risi & Stanley, 2012). Evolvable Substrate
HyperNEAT can discover these areas of high information variance and place nodes there (Risi &
Stanley, 2012). In this way it can discover the locations of the hidden neurons in addition to the
weights of the connections (Risi & Stanley, 2012).
At the heart of ES-HyperNEAT is a procedure that takes a node as input and returns
its connectivity pattern as output. This procedure consists of two steps – an initialization phase
and a pruning phase (Risi & Stanley, 2012). During the initialization phase the algorithm scans
the pattern for areas of high information (Risi & Stanley, 2012). It does this by subdividing the
search space into regions until a specified resolution is reached or the variance of a region
becomes too low (Risi & Stanley, 2012). Once this happens, nodes are placed at these locations
(Risi & Stanley, 2012). During the traversal, the algorithm constructs a quadtree that holds all
the discovered nodes (Risi & Stanley, 2012).

During the pruning phase, the algorithm traverses the quadtree and removes nodes with
low variance (Risi & Stanley, 2012). The end result is more nodes in areas with higher
information density (Risi & Stanley, 2012). An important part of the pruning and expression
phase is the so-called band pruning. Areas next to an edge in the pattern have higher variance,
and thus more expressed nodes (Risi & Stanley, 2012). However, on such edges the CPPN
might have a hard time determining in which region a node lies and what value to assign to it
(Risi & Stanley, 2012). Band pruning restricts the expression of nodes close to the edge
between regions and focuses on nodes that lie well within a region of the pattern (Risi &
Stanley, 2012).

Figure 5: The two phases of the information extraction procedure of ES-HyperNEAT. (1) During the division and initialization phase the space is iteratively subdivided until a maximum level is reached or there is no more available variation. (2) Once the tree with the possible nodes is constructed, it is traversed once more until low variance is reached; the connection is then expressed for the qualifying nodes. Image source: (Risi & Stanley, 2012)
With this procedure established, the algorithm constructs the neural network in an
iterative fashion (Risi & Stanley, 2012). It first discovers the nodes connected to the inputs
(Risi & Stanley, 2012), then further layers of hidden nodes over some iterations (Risi & Stanley,
2012). Finally, the procedure checks which of the hidden nodes connect to the outputs (Risi &
Stanley, 2012). The last step is to remove nodes that do not have a path to both an input and an
output (Risi & Stanley, 2012).

Figure 6 presents a complete overview of ES-HyperNEAT. Pseudo code for the algorithm
is given in Appendix A. For a more thorough discussion of the methods behind ES-HyperNEAT,
see Risi & Stanley (2012).
Figure 6: The ES-HyperNEAT algorithm. First, the hidden nodes that are connected directly to the input are discovered (a). Next, the algorithm discovers new hidden nodes for some iterations (b). Finally, the algorithm discovers which hidden nodes connect to the output (c). After that, only nodes that have a complete path to both input and output are kept (d). Image source: (Risi & Stanley, 2012)
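The full pseudo code used in this thesis is given in Appendix A. As a rough, simplified sketch of the division step alone (class and parameter names are illustrative and do not mirror the actual implementation):

```python
class QuadPoint:
    """One square region of the substrate – a node of ES-HyperNEAT's
    quadtree. `width` is the half-width of the square centred at (x, y)."""
    def __init__(self, x, y, width, level):
        self.x, self.y, self.width, self.level = x, y, width, level
        self.weight = None
        self.children = []

def variance(children):
    """Variance of the CPPN outputs across a region's four children."""
    ws = [c.weight for c in children]
    mean = sum(ws) / len(ws)
    return sum((w - mean) ** 2 for w in ws) / len(ws)

def divide(cppn_query, src_x, src_y, region, max_level=5, div_threshold=0.5):
    """Division phase: recursively subdivide a region while it still shows
    enough variance in CPPN output and the maximum resolution is not
    reached."""
    for dx in (-1, 1):
        for dy in (-1, 1):
            w = region.width / 2.0
            child = QuadPoint(region.x + dx * w, region.y + dy * w, w,
                              region.level + 1)
            # Query the CPPN for a connection from the source node to the
            # centre of the sub-region.
            child.weight = cppn_query(src_x, src_y, child.x, child.y)
            region.children.append(child)
    if region.level < max_level and variance(region.children) > div_threshold:
        for child in region.children:
            divide(cppn_query, src_x, src_y, child, max_level, div_threshold)
```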
II. APPROACH
2.1 Seeding Evolution Towards Modularity
The original HyperNEAT controls the expression of connections through a static threshold
(Verbancsics & Stanley, 2011). This threshold is a predefined parameter that affects all
connections in the same way (Verbancsics & Stanley, 2011). Modularity, however, needs
different thresholds in different areas of the substrate (Verbancsics & Stanley, 2011). Thus one
needs a way to put such a dynamic threshold in place if one wants to see modular solutions
(Verbancsics & Stanley, 2011). At the same time, a task might have a better non-modular
solution. If one imposes an explicit drive towards modularity, such solutions would remain
undiscovered (Verbancsics & Stanley, 2011). Instead, one should only bias evolution towards
modular solutions without restricting non-modular ones (Verbancsics & Stanley, 2011).
The Link Expression Output (LEO) combined with locality seeding does just that. The
idea behind it is that a CPPN can encode the connectivity pattern separately from the weight
pattern (Verbancsics & Stanley, 2011). By adding this extra output one can decouple the two
(Verbancsics & Stanley, 2011).

Furthermore, one can bias the CPPN towards certain natural properties like modularity and
locality (Verbancsics & Stanley, 2011). This happens by adding a Gaussian hidden node to the
initial CPPN (Verbancsics & Stanley, 2011). The Gaussian function peaks at 0, so it outputs
higher values for nodes that are closer to each other. If this node connects to the LEO it
introduces locality in connectivity (Verbancsics & Stanley, 2011). If it connects to the normal
weight output it introduces locality in weights (Verbancsics & Stanley, 2011). Seeding the CPPN
in this way might encourage modularity, but without constraining non-modular solutions
(Verbancsics & Stanley, 2011). If the task turns out not to need modularity, evolution would
drive it out (Verbancsics & Stanley, 2011).
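A minimal sketch of the mechanism follows; the weight pattern and the 0.5 offset are placeholders for illustration, not the actual seed used in the experiments.

```python
import math

def gaussian(x):
    return math.exp(-x * x)

def seeded_cppn(x1, y1, x2, y2):
    """Hand-wired stand-in for a seeded initial CPPN. The LEO is seeded
    with gaussian(x1 - x2), which peaks when the two nodes share an x
    position, biasing connectivity towards locality along the x axis."""
    weight = math.tanh(x1 * x2 + y1 * y2)  # placeholder weight pattern
    leo = gaussian(x1 - x2) - 0.5          # fires only for nearby x values
    return weight, leo

def query(x1, y1, x2, y2):
    """Express the connection only when the LEO output is positive."""
    weight, leo = seeded_cppn(x1, y1, x2, y2)
    return weight if leo > 0.0 else None
```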
2.2 Modularity through Connection Cost
The novel contribution of this paper is the introduction of the connection cost technique
to ES-HyperNEAT. In essence, the technique relies on optimizing both the connection cost of
the ANN and its performance at the same time (Huizinga et al., 2014). HyperNEAT and
ES-HyperNEAT allow one to compute the length of the connections in the substrate (Huizinga
et al., 2014). Thus one can calculate a score that represents the connection cost and optimize it
alongside fitness using a multiobjective algorithm (Huizinga et al., 2014).
In their implementation of connection cost, Huizinga et al. (2014) used the NSGA-II
multiobjective algorithm. This algorithm sorts the population by dividing it into subgroups
called fronts (Huizinga et al., 2014). The individuals in each front dominate the individuals in
the subsequent fronts (Huizinga et al., 2014). Here, one individual dominates another if it is not
worse on every objective, and better on at least one objective (Huizinga et al., 2014).
Individuals that do not dominate each other end up together in the same front (Huizinga et al.,
2014). Furthermore, within each front individuals are assigned a distance value based on the
similarity of their scores to those of the other individuals in the same front (Huizinga et al.,
2014). The population is then sorted by front and distance (Huizinga et al., 2014). An individual
is deemed more fit than another if it has a lower front rank or a higher distance (Huizinga et al.,
2014).

The original NSGA-II gives equal priority to all objectives (Huizinga et al., 2014). Yet
there are cases where one needs to assign different priorities to the objectives (Huizinga et al.,
2014). One way to do this is through stochasticity: one can assign a probability to each
objective, and the higher the probability, the more often this objective is included when
determining domination (Huizinga et al., 2014). In the current paper, fitness is always taken
into account, while connection cost is taken into account 25% of the time.
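A sketch of this stochastic domination test is shown below; the score-dict representation is an assumption made for illustration, not the author's actual NSGA-II code.

```python
import random

def dominates(a, b, p_cost=0.25):
    """Stochastic domination test. `a` and `b` are score dicts such as
    {"fitness": 12.1, "diversity": 3.2, "cost": 810.0}; all objectives
    are scores to be maximized (the connection cost score defined in
    section 2.4 already rewards short wiring). Fitness and diversity
    always participate; connection cost only with probability p_cost."""
    objectives = ["fitness", "diversity"]
    if random.random() < p_cost:
        objectives.append("cost")
    no_worse = all(a[o] >= b[o] for o in objectives)
    better = any(a[o] > b[o] for o in objectives)
    return no_worse and better
```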
Unfortunately, NSGA-II is incompatible with NEAT's speciation (Huizinga et al., 2014;
Lehman, Risi, D'Ambrosio, & Stanley, 2013). So when using it with NEAT-based approaches
one needs to adjust for that (Lehman et al., 2013). One way to do this is by adding an extra
objective that measures genomic diversity (Lehman et al., 2013). This genomic diversity is the
average distance of the genome to its k nearest neighbours, as measured by NEAT's genomic
distance function (Lehman et al., 2013). Adding this objective then serves a function like that of
speciation in NEAT (Lehman et al., 2013).
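A sketch of the diversity objective under these assumptions (the simplified distance function merely stands in for NEAT's real compatibility distance):

```python
def genomic_distance(g1, g2):
    """Simplified stand-in for NEAT's compatibility distance: the number
    of non-matching genes plus the mean weight difference of matching
    ones. Genomes are assumed to be dicts of innovation number -> weight."""
    matching = g1.keys() & g2.keys()
    mismatched = len(g1.keys() ^ g2.keys())
    weight_diff = (sum(abs(g1[i] - g2[i]) for i in matching) / len(matching)
                   if matching else 0.0)
    return mismatched + weight_diff

def genomic_diversity(genome, population, k=15):
    """Mean genomic distance from `genome` to its k nearest neighbours
    (Lehman et al., 2013); the experiments in this paper average over
    the whole population instead (see section 2.4)."""
    distances = sorted(genomic_distance(genome, other)
                       for other in population if other is not genome)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```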
2.3 Hypotheses
The results of Risi & Stanley (2012) show that an extension developed for HyperNEAT
can give even better results with ES-HyperNEAT. Similarly, the connection cost technique
worked well as an extension to HyperNEAT in Huizinga et al. (2014). This raises the question
whether the connection cost would also work well with Evolvable Substrate HyperNEAT, as
the seeding did. Combining the connection cost technique and ES-HyperNEAT might produce
better modular solutions than its fixed substrate counterpart. Furthermore, in Huizinga et al.
(2014) the HyperNEAT-CCT outperformed the seeded HyperNEAT. Thus it is possible that this
relationship would also hold for ES-HyperNEAT. I therefore pose the following three
hypotheses:

1. ES-HyperNEAT with connection cost will outperform HyperNEAT with connection cost and HyperNEAT with seeding.
2. ES-HyperNEAT with connection cost will outperform seeded ES-HyperNEAT.
3. ES-HyperNEAT-CCT will produce visually modular solutions.
2.4 Methodology
Evolution setup.
Five different approaches (Table 1) were tested on two tasks – an XOR task and a retina
task. In both cases the ANNs are activated 5 times before the output is read. In all cases the
CPPN is also activated 5 times. The initial CPPN for the seeded conditions was modeled after
Risi & Stanley (2012) and is presented in Figure 7. It has two hidden Gaussian nodes. The first
is responsible for weight locality along the y axis, while the second is responsible for locality
in connectivity along the x axis. Thus the first is connected to the normal CPPN output, while
the second is connected to the LEO. The substrates for the HyperNEAT conditions were similar
to those of Huizinga et al. and are presented in Figure 8. The substrates for the ES conditions
used the same inputs and outputs as the substrates for the HyperNEAT conditions.
Table 1: Experimental Conditions

Name | Description
HyperNEAT-LEO | Fixed substrate HyperNEAT with Link Expression Output. This condition serves as the control case.
Seeded HyperNEAT | Fixed substrate HyperNEAT with geometric and locality seeding.
HyperNEAT-CCT | Fixed substrate HyperNEAT evolved with the connection cost.
Seeded ES-HyperNEAT | Evolvable Substrate HyperNEAT with geometric and locality seeding.
ES-HyperNEAT-CCT | Evolvable Substrate HyperNEAT evolved with the connection cost.

The XOR task was run 20 times for 300 generations each. The retina task was run 20
times for 3000 generations per run. Population size on both tasks was 150 individuals. The
multiobjective conditions had three objectives – genomic diversity, fitness, and connection cost.
There was a 25% chance of taking the connection cost objective into account in comparisons
between genomes, while the other two objectives were taken into account every time.
The fitness for the XOR task was (4 − E)², where E is the summed error. The fitness
function for the retina task was 1000/(1 + E²), where E is again the summed error. The
connection cost was measured as 1000/(1 + L²), where L is the summed connection length of
the substrate, calculated using Euclidean distance. The genomic diversity objective was
calculated as the average genomic distance to all other members of the population, in terms of
the NEAT genomic distance.
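For concreteness, the three scores above can be written directly as follows (a sketch; the connection representation is an assumption):

```python
import math

def xor_fitness(summed_error):
    """XOR fitness: (4 - E)^2; a network with zero error scores 16."""
    return (4.0 - summed_error) ** 2

def retina_fitness(summed_error):
    """Retina fitness: 1000 / (1 + E^2)."""
    return 1000.0 / (1.0 + summed_error * summed_error)

def connection_cost_score(connections):
    """Connection cost objective: 1000 / (1 + L^2), where L is the summed
    Euclidean length of all expressed connections. `connections` is
    assumed to be an iterable of (point_a, point_b) coordinate tuples."""
    total = sum(math.dist(a, b) for a, b in connections)
    return 1000.0 / (1.0 + total * total)
```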
Finally, the maximum resolution for the ES-HyperNEAT conditions was 5, corresponding
to a grid of 32x32 nodes. The band threshold was set to 0.3, the division threshold to 0.5, and
the variance threshold to 0.03. The substrates for the HyperNEAT conditions are presented in
Figure 8 below. The rest of the parameters are presented in Appendix A.
Figure 7: Seeded CPPN. The CPPN for the seeded conditions is initialized with two Gaussian hidden nodes. One node takes x1 − x2 as input and specifies locality in connectivity. The other takes y1 − y2 + bias as input and specifies locality in weights. Image source: (Risi & Stanley, 2012)
XOR task
ES-HyperNEAT-CCT is a novel approach, and both the NSGA-II and the ES-HyperNEAT
presented here are custom implementations. It therefore makes sense to validate them on a
simpler task before moving on to harder problems. The XOR problem is one such benchmarking
task that is well established in the field (Dhar, Tickoo, Dubey, & others, 2009). In it the ANNs
have to solve a logical exclusive-or problem. The network has two inputs, a bias input, and a
single output. In the HyperNEAT setups the network also has two hidden neurons (Fig. 8). The
inputs are binary, and the network must identify whether they evaluate to true or false (Table 2).

Table 2: The XOR problem truth table

A | B | Output
0 | 1 | 1
1 | 0 | 1
0 | 0 | 0
1 | 1 | 0
Figure 8: Layout of the HyperNEAT substrate. Red nodes are at z = −1, green nodes are at z = 1, grey nodes are at z = 0. The grey-red node is the output. The ES-HyperNEAT substrate uses the same input and output positions.
Retina task
The retina task is more complex than the XOR one. In it, inputs are projected onto two
retinas of 2x2 pixels each. As the input is binary, there are 16 possible combinations per retina,
for a total of 256 patterns. Of the 16 patterns that a retina receives, 8 are considered valid. The
ANN must learn to recognize whether a valid pattern is present on both the left and the right
retina. This problem is modularly decomposable, as it requires the network to first process the
two patterns separately before outputting a final answer (Kashtan & Alon, 2005). Thus, in order
to solve the task, the neural network would benefit from modularity (Kashtan & Alon, 2005). At
the same time, the retina task also has non-modular solutions (Kashtan & Alon, 2005).
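The evaluation cases can be enumerated as sketched below. Note that the two sets of 8 valid patterns are task-specific and not reproduced in this paper, so the placeholder sets here are purely illustrative.

```python
from itertools import product

# Hypothetical stand-ins for the 8 valid 2x2 patterns per retina; the
# actual sets come from the retina task definition (Kashtan & Alon, 2005).
VALID_LEFT = set(list(product((0, 1), repeat=4))[:8])
VALID_RIGHT = set(list(product((0, 1), repeat=4))[:8])

def retina_cases():
    """Enumerate all 256 cases: 16 left patterns x 16 right patterns,
    each half being 4 binary pixels. The target is 1.0 only when BOTH
    halves show a valid object."""
    for left in product((0, 1), repeat=4):
        for right in product((0, 1), repeat=4):
            target = float(left in VALID_LEFT and right in VALID_RIGHT)
            yield left + right, target
```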
Figure 9: The retina task. Inputs are projected onto the left and the right retina. The network must then recognize whether both retinas contain valid objects. Image source: (Risi & Stanley, 2012)
Software
All experiments were implemented using the MultiNEAT library. The Evolvable
Substrate HyperNEAT and the NSGA-II multiobjective algorithm were implemented by the
author for the purpose of this paper. The tests were run on Ubuntu 14.04 LTS and openSUSE 13.
The source code is available at https://github.com/vchudinov/MultiNEAT
and https://github.com/peter-ch/MultiNEAT (no NSGA-II).
III. RESULTS
3.1 XOR task
All conditions were able to solve the XOR task within 300 generations. Furthermore, all
conditions but ES-HyperNEAT-CCT were able to achieve perfect performance. The descriptive
statistics are presented in Table 3, and Figure 10 presents the fitness curves. The seeded
HyperNEAT condition was fastest in solving the task, followed by HyperNEAT-LEO.

Figure 10: Average fitness on the XOR task over 20 runs.

Table 3: Descriptive statistics for the XOR task

Condition | Mean | Min | Max | St. Dev. | 95% CI Low | 95% CI High
ES-HyperNEAT-CCT | 13.18 | 9 | 15.79 | 2.64 | 12.03 | 14.34
ES-HyperNEAT-seeded | 15.65 | 9.01 | 16 | 1.52 | 14.98 | 16
HyperNEAT-CCT | 14 | 5.74 | 16 | 3.29 | 12.56 | 15.45
HyperNEAT-Seeded | 15.84 | 12.78 | 16 | 0.7 | 15.53 | 16
HyperNEAT-LEO | 15.23 | 7.63 | 16 | 2.31 | 14.21 | 16
An interesting result here is the seeded ES-HyperNEAT. It had the lowest fitness until
about generation 75, when it rapidly overtook the other conditions. ES-HyperNEAT with
connection cost had the lowest average performance across the entire evolution.

Finally, the 95% confidence intervals show that both seeded conditions significantly
outperform ES-HyperNEAT with connection cost. HyperNEAT-LEO performed on par with
the other two fixed substrate conditions and the seeded ES-HyperNEAT, and had significantly
higher performance than ES-HyperNEAT-CCT. Finally, the confidence intervals show some
overlap between the seeded ES-HyperNEAT and HyperNEAT-CCT. These two conditions
were further compared with a Mann-Whitney U test, which showed that the seeded
ES-HyperNEAT significantly outperforms HyperNEAT-CCT (U = 64, p < 0.01).

Figure 11: 95% confidence intervals for the XOR task. ES-HyperNEAT-CCT shows a significantly lower score than all conditions but HyperNEAT-CCT.
3.2 Retina task
No condition was able to solve the retina task within 3000 generations. However, fitness
was still rising for all conditions until the very end of the evolution. The descriptive statistics at
generation 3000 are presented in Table 4. The highest average score at generation 3000 was
held by the unseeded HyperNEAT, while the highest individual score was held by the seeded
ES-HyperNEAT. The lowest average and the lowest overall scores both belonged to
ES-HyperNEAT with connection cost.

Table 4: Retina performance at generation 3000

Condition | Mean | Min | Max | St. Dev. | 95% CI Low | 95% CI High
ES-HyperNEAT-CCT | 0.166 | 0.116 | 0.244 | 0.023 | 0.156 | 0.176
ES-HyperNEAT-seeded | 0.237 | 0.164 | 0.492 | 0.082 | 0.201 | 0.273
HyperNEAT-CCT | 0.206 | 0.155 | 0.26 | 0.03 | 0.192 | 0.219
HyperNEAT-Seeded | 0.193 | 0.164 | 0.278 | 0.034 | 0.178 | 0.207
HyperNEAT-LEO | 0.249 | 0.172 | 0.382 | 0.054 | 0.225 | 0.272

Figure 12 shows the curves of the averaged best fitness. ES-HyperNEAT-CCT maintains
the lowest average score throughout the entire evolution, while the highest average score varies
between the seeded ES-HyperNEAT and HyperNEAT-LEO. Note that in all cases the speed of
the evolution starts slowing down at a fitness score of around 0.12–0.16.

Figure 13 shows the distribution of fitness for each condition at three points in the
evolution. In general it shows that the entire population improves. Furthermore, as the evolution
progresses, the best individuals become more pronounced and the upper quartiles of the
populations grow. The most interesting scores are those of ES-HyperNEAT-CCT: the variance
of its population drops considerably, which suggests that the population is getting trapped in a
local optimum. At the same time, the presence of outliers shows that some runs are able to
escape it. Another interesting result is that, when outliers are not taken into account, the best
performer overall seems to be the HyperNEAT-LEO condition, with close to 50% of the sample
outperforming the next best condition – the seeded HyperNEAT.
The comparison between the different conditions was done with 95% confidence
intervals. Whenever the CIs had a small (less than 25%) overlap, the conditions were further
compared using Mann-Whitney's U. Finally, the overall presence of differences between the
samples was determined with a Kruskal-Wallis analysis of variance.
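A sketch of this procedure using SciPy is shown below; for simplicity it runs Mann-Whitney tests on all pairs, whereas the analysis above only did so for pairs with overlapping CIs.

```python
from scipy.stats import kruskal, mannwhitneyu

def compare_conditions(samples):
    """`samples` maps condition name -> list of 20 final scores. First an
    overall Kruskal-Wallis test, then pairwise Mann-Whitney U tests."""
    h, p = kruskal(*samples.values())
    print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")
    names = list(samples)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            u, p = mannwhitneyu(samples[a], samples[b],
                                alternative="two-sided")
            print(f"{a} vs {b}: U = {u:.0f}, p = {p:.4f}")
```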
Figure 12: Average best fitness for the retina task.

Figure 13: Fitness distributions at three points of the evolution. Note the change of variance in ES-HyperNEAT-CCT, which suggests that the evolution has stumbled into a local optimum. Also note the change in the upper quartile of HyperNEAT-LEO: by generation 200 it already had the highest fitness values.
The Kruskal-Wallis test showed that samples differ from each other significantly (K =
44.18, p < 0.01). Confidence intervals (fig 15) show which samples differ from each other. First
– the ES-HyperNEAT-CCT has significantly worse performance than the seeded ES. The ES-
HyperNEAT-CCT also performed significantly less than the control condition. Finally – the
seeded ES-HyperNEAT did not perform significantly better than the control condition.
From the two connection cost conditions, the one with the fixed substrate had higher
performance than the evolvable substrate. With the seeded HyperNEAT conditions the situation
is the opposite and the ES-HyperNEAT had better performance. Since there is a slight overlap in
the CIs, the samples were also compared with Mann-Whitney’s U. It showed that the seeded ES-
HyperNEAT is significantly better (U = 95, p<0.01). The HyperNEAT-LEO condition
significantly outperformed all conditions save for the seeded ES-HyperNEAT condition.
Finally, analyzing the accuracy on the retina task revealed similar results (fig. 16, Table
5). The seeded ES-HyperNEAT outperformed all other conditions but the HyperNEAT-CCT.
Here the HyperNEAT-LEO had results similar to those of the other fixed substrate conditions.
The ES-HyperNEAT-CCT again had significantly lower scores than all other conditions. Note,
however, that the accuracy scores are within a few percent of each other, which suggests that the
difference amounts to only a few patterns.
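To put the size of that difference in perspective: assuming the standard retina set-up of eight binary inputs, there are 2^8 = 256 possible input patterns, so the gap between the best and worst mean accuracies in Table 5 amounts to roughly (0.870 - 0.848) × 256 ≈ 5.6, i.e. five or six patterns. The pattern count here is an assumption based on the usual formulation of the task, not a figure reported for these runs.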
Table 5: Accuracy on the retina task (fraction of patterns classified correctly).

Condition             Mean     St. Dev.  95% CI Low  95% CI High
ES-HyperNEAT-CCT      0.8478   0.0126    0.8422      0.8534
ES-HyperNEAT-seeded   0.87031  0.00358   0.8633      0.8773
HyperNEAT-CCT         0.8634   0.0159    0.8593      0.8675
HyperNEAT-Seeded      0.8583   0.0108    0.8536      0.8631
HyperNEAT-LEO         0.8583   0.0108    0.8536      0.8631
Finally, figure 14 presents the change of accuracy over the evolution. All conditions but
the ES-HyperNEAT-CCT reached their maximum accuracy early on in the evolution and barely
improved after that. In contrast, the ES-HyperNEAT-CCT showed slow and gradual
improvement. It is possible that this level of accuracy represents a local optimum that the
conditions were not able to overcome. Furthermore, a look at the classified patterns themselves
showed that the conditions correctly classified both some patterns that belonged to the target
objects and some that did not.
Figure 14: Average accuracy over 20 runs on the retina task. All conditions but the ES-
HyperNEAT-CCT reach their maximum accuracy within the first 50 generations.
Figure 15: 95% Confidence Intervals for the retina task at generation 3000
Figure 16: 95% confidence intervals for retina task accuracy. Seeded ES-HyperNEAT has
significantly higher accuracy scores than all conditions but the HyperNEAT-CCT. ES-
HyperNEAT-CCT has a significantly lower score than all other conditions.
Modularity
The visualizations of all neural networks at generation 3000 are attached as
supplementary material to the current paper. None of the conditions yielded solutions that are
visibly modular. In all cases the networks tend to be highly connected, with variable levels of
complexity. This also holds true for the ES-HyperNEAT with connection cost: despite the
connection cost it managed to produce both very simple and very complex networks (fig. 19),
and the complexity of the networks changed in both directions during the evolution. Another
interesting result is that the two different methods, the CCT ES-HyperNEAT and the seeded
ES-HyperNEAT, could produce solutions with similar topology and topography (fig. 18).

Figure 17: Clustering of nodes in evolved networks. Such node clusters might be indicative of
internal modularity.

Figure 18: Similar networks evolved by different approaches. Left: connection cost ES-
HyperNEAT. Right: seeded ES-HyperNEAT.
Finally, despite the lack of modularity, the networks exhibited different levels of
regularity. Most interesting is that in some of the cases the nodes were clustered into two
groups, in the left and the right half of the substrate (fig. 17). While these networks are still
fairly strongly connected, it is possible that there is an underlying pattern of modularity that is
not visible at a glance.
3.3 Discussion
The results did not support any of the hypotheses. First, the ES-HyperNEAT-CCT did not
outperform the HyperNEAT-CCT: on the XOR task the two conditions had similar performance,
and on the retina task the ES condition performed significantly worse. Second, the ES-HyperNEAT
with connection cost did not outperform the seeded ES-HyperNEAT. In fact, the opposite was
true: the ES-HyperNEAT-CCT had the poorest performance on both tasks, while the seeded
ES-HyperNEAT had the highest. Third, none of the tested approaches exhibited visually
identifiable modularity. Finally, none of the conditions was able to solve the retina task within
3000 generations. In contrast, all were able to solve the much simpler XOR task within 300
generations.

Figure 19: Networks of different complexity, both evolved by ES-HyperNEAT with connection cost.
The poor performance of the connection cost technique with ES-HyperNEAT is
particularly surprising. A look at fig. 13 shows that most solutions ended up grouped around a
narrow range of values, while at the same time each solution is quite distinct. This suggests that
the evolution stalls in a local optimum. The seeded ES-HyperNEAT also experienced a temporary
stall at a quite similar fitness score. Furthermore, at a similar score the learning of all solutions
started slowing down. However, the seeded ES-HyperNEAT was able to overcome it relatively
quickly. Finally, none of the fixed substrate approaches experienced that local optimum. Thus
one can hypothesize that it is a problem specific to the evolvable substrate approach.

Figure 20: A sample evolutionary line from the connection cost ES-HyperNEAT condition. Left:
generation 1000; middle: generation 2000; right: generation 3000.
The two factors that might account for the difference are the presence of seeding in one
case and the CCT in the other. Thus one possible explanation is that the locality seed makes
finding novel solutions easier. In contrast, when starting with an empty genome, evolution might
need more time to find a successful configuration. Alternatively, the evolution might be stuck at
a point that requires a more complex solution, while the connection cost acts as pressure against
added complexity. This narrows down the search space to only those solutions that both work
and have a lower connection cost. Yet the produced phenotypes seem to have varying
complexity, which shows that the ES-HyperNEAT-CCT has no problem producing more
complex phenotypes. Note also that there are outliers with scores close to those of the other
approaches, and their number seems to increase with time. This suggests that the evolution can,
in fact, find these successful solutions. Furthermore, both ES-HyperNEAT conditions were
outfitted with a LEO. Risi & Stanley (2012) note that ES-HyperNEAT with LEO actually
performs worse than ES-HyperNEAT without it. Thus it might be the case that it is the LEO that
is hampering evolution, and not the connection cost.
One way to discover whether the difficulty is caused by the lack of seeding or by the
connection cost would be to compare these approaches to unseeded ES-HyperNEAT without
LEO. Furthermore, trying the connection cost technique itself on ES-HyperNEAT without LEO
could show different results. Another improvement would be a larger sample size: with a sample
size of just 20 it might be difficult to measure actual distributions and statistical effects. Finally,
on both tasks the fitness of the ES-HyperNEAT-CCT was growing the slowest. As the evolution
struggles to balance the objectives, it might need more time to improve on them all. In Huizinga
et al. (2014), fitness was still changing fast at generation 3000. Thus longer evolutionary runs
could actually allow the ES-HyperNEAT-CCT to reach its full potential.
The second result was that none of the approaches evolved visually modular phenotypes.
However, the lack of such modularity does not mean a lack of modularity per se. With more
complex networks, determining modularity is not straightforward, so a more appropriate way to
measure it is needed. One such approach used in the literature is the modularity Q score
(Newman, 2003).
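As a sketch of how such a measurement could be applied to the evolved networks, the following minimal example computes Q with networkx. It assumes the phenotypes can be exported as edge lists; the function name and the sample graph are illustrative and not part of the experimental code.

import networkx as nx
from networkx.algorithms import community

def modularity_q(edges):
    # Newman's modularity Q for a network given as an edge list.
    # Communities are found with greedy modularity maximization; Q near 0
    # means the division is no better than random, larger Q means stronger
    # community (module) structure.
    g = nx.Graph(edges)
    communities = community.greedy_modularity_communities(g)
    return community.modularity(g, communities)

# Hypothetical phenotype: two densely connected clusters joined by one link.
edges = [(0, 1), (0, 2), (1, 2),   # left cluster
         (3, 4), (3, 5), (4, 5),   # right cluster
         (2, 3)]                   # single bridging connection
print(modularity_q(edges))         # noticeably above 0 for this layout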
Other research reports the presence of visual modularity; however, there are some notable
differences from the current paper. For example, Huizinga et al. (2014) ran their task for a much
longer time, 50,000 generations, and they note that modularity did not emerge until evolution
had solved the task. Verbancsics & Stanley (2011) and Risi & Stanley (2012) used shorter runs,
but had a task set-up with two output nodes instead of one.
Both could be causes for the current lack of results. For example, the connection cost
cases might not have run long enough. This early in evolution there is still room to improve on
the other objectives as well as on the connection cost, so modularity might not emerge until it is
the only objective that offers some benefit. For the seeded conditions, it is possible that having
two outputs encourages separated processing, and it might thus be easier for evolution to
discover modularity: it only needs to learn to process the inputs separately. In contrast, a single
output does not suggest that the task needs modularity; evolution would first have to discover
how to process the two parts of the retina, and then how to connect them to the output. Another
problem is that evolution could easily break modularity. For any task there are many more non-
modular solutions available than modular ones, so despite initial seeding or connection cost, the
solutions that are most available are the non-modular ones.
The substrate for the seeded conditions had biases towards locality in both weights and
connectivity. Verbancsics and Stanley (2011) note that biasing the weights towards locality can
limit the emergence of modularity, so the presence of this bias might be a factor in the lack of
modularity. At the same time, Risi and Stanley (2012) had both biases, and modular solutions
were prevalent in their results. Thus another topic of interest is the role of the different seeds
and how they contribute to both fitness and modularity.
A factor in the seeded conditions is that as time passes the CPPN becomes more complex.
Thus, if modularity has not emerged after some time, it might not emerge at all. The probabilities
for the addition of a node or a connection to the CPPN were similar to those in other research,
yet the complexity of the CPPNs still grew quite fast. This might have prevented any stable
patterns from forming. It also led to genome bloat and cases where a genome was much larger
than the neural network it encoded. Furthermore, nothing but the minimal start prevents this from
happening; there is no mechanism that controls the genome size itself. This bloat also defeats the
purpose of generative encoding, which is to encode larger structures in a compact way. The most
intuitive remedy would be to further limit the probabilities for mutating the CPPN. Furthermore,
some notion of genome size could be added to the evolutionary process itself. For example, it
would be trivial to include genome size as an extra objective in the CCT, as sketched below.
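A minimal sketch of what that could look like, assuming the CCT is implemented as NSGA-II-style multi-objective selection. The Individual type and the sample values are illustrative; only the idea of adding genome size as a third minimized objective comes from the text above.

from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float          # task performance, maximized
    connection_cost: float  # cost of expressed connections, minimized
    genome_size: int        # nodes + connections in the CPPN, minimized

def dominates(a: Individual, b: Individual) -> bool:
    # Pareto dominance over three objectives: a dominates b if it is at
    # least as good on every objective and strictly better on at least one.
    at_least_as_good = (a.fitness >= b.fitness
                        and a.connection_cost <= b.connection_cost
                        and a.genome_size <= b.genome_size)
    strictly_better = (a.fitness > b.fitness
                       or a.connection_cost < b.connection_cost
                       or a.genome_size < b.genome_size)
    return at_least_as_good and strictly_better

# A bloated genome with the same fitness and connection cost is now dominated:
a = Individual(fitness=0.24, connection_cost=1.8, genome_size=40)
b = Individual(fitness=0.24, connection_cost=1.8, genome_size=95)
print(dominates(a, b))  # True: bloat alone now counts against b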
Finally, none of the conditions was able to solve the retina task. As with modularity, it
might be the case that the current set-up needs more time. However, there are still some
interesting patterns. For example, the best performing conditions were the seeded ES-HyperNEAT
and the HyperNEAT-LEO, as in Risi & Stanley (2012). In contrast to Huizinga et al. (2014),
the HyperNEAT with connection cost did not outperform the other HyperNEAT conditions. In
contrast to Verbancsics & Stanley (2011), the unseeded HyperNEAT-LEO had the best
performance of the fixed substrate conditions. The results here were closest to Clune et al.
(2010), where the retina task proved to be too hard for HyperNEAT as is; however, simplifying
the task did evolve perfectly modular solutions that were able to solve it. Furthermore, all
conditions here were able to solve the XOR task, so testing them on a simpler problem might
reveal additional results. Finally, it is possible that the differences in the results are due to
different set-ups and parameters.
IV. CONCLUSION
The current paper investigated the application of the connection cost technique to ES-
HyperNEAT. The expectation was that the combination of the two would produce effective
modular neural networks. However, this was not the case: ES-HyperNEAT with connection
cost had the poorest average performance of all tested approaches. Furthermore, it did not yield
any modular solutions that could be identified at a glance. At the same time, the ES-HyperNEAT
with connection cost was able to solve the simpler XOR task. Thus the poor performance might
be due to parameters and task setup.
Since the data is already available, a logical next step would be to measure modularity
with a more reliable approach. For example, the Q score is already used throughout the literature
for calculating the modularity of networks (Newman, 2003). Next, the evolutionary runs in the
current paper were short compared to those in other research. Thus it might be useful to test the
ES-HyperNEAT with connection cost on longer runs and with different setups.
Modularity in nature is likely not the result of a single factor. Thus, another avenue for
future research is the combination of the different approaches to modularity. For example,
enhancing the seeded ES-HyperNEAT with connection cost might further help it take advantage
of the bias in the CPPN pattern. Alternatively, both the seeded and the connection cost
approaches could be tested under modularly varying goals.
Finally, while none of the approaches managed to solve the retina task, all managed to
solve the XOR task. Another possible next step from here is to evaluate the approaches on a
simpler problem. One possibility is the simplified retina problem of Clune et al. (2010).
Alternatively, one could also try the multiple-XOR problem, as in Huizinga et al. (2014).
The take-home lesson of the current paper is that modularity does not emerge easily, even
with established methods. All the tested conditions were able to learn and to improve, yet
modular solutions did not emerge, even under the pressure of connection cost or the bias of
seeding. At the same time, the evolutionary runs were rather short: 3000 generations. Modularity
might be a property that emerges later in evolution. At the early stages it might not offer any
advantage; however, it might become important at later stages of the evolution, when the easier
goals are already reached.
BIBLIOGRAPHY
Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T.
(2011). Dynamic reconfiguration of human brain networks during learning. Proceedings of
the National Academy of Sciences of the United States of America, 108(18), 7641–6.
http://doi.org/10.1073/pnas.1018985108
Bullinaria, J. A. (2007). Understanding the emergence of modularity in neural systems.
Cognitive Science, 31(4), 673–695.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of
structural and functional systems. Nature Reviews. Neuroscience, 10(3), 186–98.
http://doi.org/10.1038/nrn2575
Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Retrieved
February 24, 2015, from http://www.cogsci.msu.edu/DSS/2013-2014/Sporns/Bullmore
Sporns 2012 NRN.pdf
Cardamone, L., Loiacono, D., & Lanzi, P. L. (2009). Evolving competitive car controllers for
racing games with neuroevolution. In Proceedings of the 11th Annual conference on
Genetic and evolutionary computation (pp. 1179–1186).
Clune, J., Beckmann, B. E., McKinley, P. K., & Ofria, C. (2010). Investigating whether
hyperneat produces modular neural networks. In Proceedings of the 12th annual conference
on Genetic and evolutionary computation (pp. 635–642).
Clune, J., Mouret, J.-B., & Lipson, H. (2013). The evolutionary origins of modularity.
Proceedings. Biological Sciences / The Royal Society, 280(1755), 20122863.
http://doi.org/10.1098/rspb.2012.2863
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective
genetic algorithm: NSGA-II. Evolutionary Computation, IEEE Transactions on, 6(2), 182–
197.
Dhar, V. K., Tickoo, A. K., Dubey, R. K., & others. (2009). Comparative performance of some
popular ANN algorithms on benchmark and function approximation problems. arXiv
Preprint arXiv:0911.1210.
Dobbs, D. (2015). Die, selfish gene, die. Retrieved from http://aeon.co/magazine/science/why-
its-time-to-lay-the-selfish-gene-to-rest/
Gomez, F., Schmidhuber, J., & Miikkulainen, R. (2006). Efficient non-linear control through
neuroevolution. In Machine Learning: ECML 2006 (pp. 654–662). Springer.
Hornby, G. S., & Pollack, J. B. (2001a). Evolving L-systems to generate virtual creatures.
Computers & Graphics, 25(6), 1041–1048.
Hornby, G. S., & Pollack, J. B. (2001b). The advantages of generative grammatical encodings
for physical design. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress
on (Vol. 1, pp. 600–607).
Huizinga, J., Clune, J., & Mouret, J.-B. (2014). Evolving neural networks that are both modular
and regular: HyperNEAT plus the connection cost technique. In Proceedings of the 2014
conference on Genetic and evolutionary computation (pp. 697–704).
Kashtan, N., & Alon, U. (2005). Spontaneous evolution of modularity and network motifs.
Proceedings of the National Academy of Sciences of the United States of America, 102(39),
13773–8. http://doi.org/10.1073/pnas.0503610102
Lehman, J., Risi, S., D’Ambrosio, D., & Stanley, K. O. (2013). Encouraging reactivity to create
robust machines. Adaptive Behavior, 1059712313487390.
Li, S., Yuan, J., Shi, Y., & Zagal, J. C. (2015). Growing scale-free networks with tunable
distributions of triad motifs. Physica A: Statistical Mechanics and Its Applications, 428,
103–110.
Lowell, J., & Pollack, J. (2014). The Effect of Connection Cost on Modularity in Evolved Neural
Networks. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of
Living Systems (Vol. 14, pp. 726–733).
Meunier, D., Lambiotte, R., & Bullmore, E. T. (2010). Modular and hierarchically modular
organization of brain networks. Frontiers in Neuroscience, 4, 200.
http://doi.org/10.3389/fnins.2010.00200
Newman, M. (2003). Modularity, community structure, and spectral properties of networks.
Physical Review E.
Pugh, J. K., Goodell, S., & Stanley, K. O. (2013). Directional communication in evolved
multiagent teams. University of Central Florida, 1–17.
Risi, S., Lehman, J., & Stanley, K. O. (2010). Evolving the placement and density of neurons in
the hyperneat substrate. Proceedings of the 12th Annual Conference on Genetic and
Evolutionary Computation - GECCO ’10, 563.
http://doi.org/10.1145/1830483.1830589
Risi, S., & Stanley, K. O. (2012). An enhanced hypercube-based encoding for evolving the
placement, density, and connectivity of neurons. Artificial Life, 18(4), 331–63.
http://doi.org/10.1162/ARTL_a_00071
Stanley, K. O. (2007). Compositional pattern producing networks: A novel abstraction of
development. Genetic Programming and Evolvable Machines, 8(2), 131–162.
Stanley, K. O., D’Ambrosio, D. B., & Gauci, J. (2009). A hypercube-based encoding for
evolving large-scale neural networks. Artificial Life, 15(2), 185–212.
Stanley, K. O., & Miikkulainen, R. (1996). Efficient reinforcement learning through evolving
neural network topologies. Network (Phenotype), 1(2), 3.
Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting
topologies. Evolutionary Computation, 10(2), 99–127.
Thivierge, J.-P., & Marcus, G. F. (2007). The topographic brain: from neural connectivity to
cognition. Trends in Neurosciences, 30(6), 251–9. http://doi.org/10.1016/j.tins.2007.04.004
Togelius, J., Shaker, N., Karakovskiy, S., & Yannakakis, G. N. (2013). The mario ai
championship 2009-2012. AI Magazine, 34(3), 89–92.
Trujillo, L., Olague, G., Lutton, E., & De Vega, F. F. (2008). Discovering several robot
behaviors through speciation. In Applications of Evolutionary Computing (pp. 164–174).
Springer.
Verbancsics, P., & Stanley, K. O. (2011). Constraining connectivity to encourage modularity in
HyperNEAT. Retrieved January 14, 2015, from
http://eplex.cs.ucf.edu/papers/verbancsics_gecco11.pdf
Appendix A: Pseudo Code
ES-HyperNEAT.
All images are taken from http://eplex.cs.ucf.edu/ESHyperNEAT/
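Since the pseudocode figures do not survive in this transcript, the following sketch outlines the core of the ES-HyperNEAT information-extraction step as described by Risi & Stanley (2012): the substrate is divided with a quadtree, subdividing further only where the variance of the CPPN output across a region stays above a threshold, so that node density follows the information density of the underlying pattern. This is a simplified reconstruction, not the original listing; names, depths, and thresholds are illustrative.

from dataclasses import dataclass, field

@dataclass
class QuadPoint:
    x: float
    y: float
    width: float            # half the side length of this quadtree square
    level: int
    weight: float = 0.0     # CPPN output sampled at (x, y)
    children: list = field(default_factory=list)

def variance(points):
    mean = sum(c.weight for c in points) / len(points)
    return sum((c.weight - mean) ** 2 for c in points) / len(points)

def division_and_initialization(cppn, source, max_depth=4,
                                variance_threshold=0.03):
    # Divide the substrate around the given source node into a quadtree.
    # `cppn(x1, y1, x2, y2)` is assumed to return the connection weight
    # between two substrate points.
    root = QuadPoint(0.0, 0.0, 1.0, 1)
    queue = [root]
    while queue:
        p = queue.pop(0)
        # Sample the CPPN in the four child quadrants of this square.
        for dx in (-0.5, 0.5):
            for dy in (-0.5, 0.5):
                child = QuadPoint(p.x + dx * p.width, p.y + dy * p.width,
                                  p.width / 2, p.level + 1)
                child.weight = cppn(source[0], source[1], child.x, child.y)
                p.children.append(child)
        # Subdivide further only where the pattern still varies.
        if p.level < max_depth and variance(p.children) > variance_threshold:
            queue.extend(p.children)
    return root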
NSGA-II.
1. Sorting. Source: Deb, Pratap, Agarwal, & Meyarivan (2002).
2. Distance assignment. Source: Deb et al. (2002).
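As with the ES-HyperNEAT listings, the original figures are not reproduced here. The following is a compact sketch of the two NSGA-II procedures referenced above, following Deb et al. (2002); it is a reconstruction for readability, not the authors' code. The `dominates` argument is any Pareto dominance predicate, such as the three-objective one sketched in the discussion.

def fast_non_dominated_sort(pop, dominates):
    # Sort a population into Pareto fronts (Deb et al., 2002, procedure 1).
    S = {p: [] for p in range(len(pop))}   # solutions dominated by p
    n = {p: 0 for p in range(len(pop))}    # count of solutions dominating p
    fronts = [[]]
    for p in range(len(pop)):
        for q in range(len(pop)):
            if dominates(pop[p], pop[q]):
                S[p].append(q)
            elif dominates(pop[q], pop[p]):
                n[p] += 1
        if n[p] == 0:
            fronts[0].append(p)            # p is in the first front
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in S[p]:
                n[q] -= 1
                if n[q] == 0:              # q belongs to the next front
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                     # drop the trailing empty front

def crowding_distance(front, objectives):
    # Assign crowding distances within one front (Deb et al., 2002,
    # procedure 2). `objectives[p]` is a tuple of objective values for p.
    dist = {p: 0.0 for p in front}
    for m in range(len(objectives[front[0]])):
        ordered = sorted(front, key=lambda p: objectives[p][m])
        lo, hi = objectives[ordered[0]][m], objectives[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep extremes
        if hi == lo:
            continue
        for i in range(1, len(ordered) - 1):
            dist[ordered[i]] += (objectives[ordered[i + 1]][m]
                                 - objectives[ordered[i - 1]][m]) / (hi - lo)
    return dist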