Evolving Modular Neural Networks with ES Hyper-NEAT, Link Expression
Output and Connection Costs
Victor Chudinov
A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Software Development and Technology
IT University of Copenhagen
31.08.2015
Abstract
Modularity is a key property of natural neural networks and an important factor in their ability to
produce complex behaviors. So far, however, neuroevolution has struggled to create modular
neural networks without special encouragement, and such encouragement usually produces solutions
that are limited in both layout and performance. A promising approach to modularity in
artificial evolution is the connection cost technique developed by Clune et al. (2013). Unlike
previous approaches it does not put constraints on specific patterns of modularity, but allows
evolution to select the most feasible layout. The current paper introduces the connection cost
technique to Evolvable Substrate HyperNEAT (ES-HyperNEAT) and explores its effect on performance
and on the ability to produce modular solutions. The approach is compared to another successful
technique – locality seeding with ES-HyperNEAT. Further comparisons are carried out against fixed
substrate HyperNEAT with connection cost and fixed substrate HyperNEAT with locality seeding.
All approaches are evaluated on an XOR task and a retina task. While all conditions were able to
solve the XOR task, none solved the retina task. Furthermore, no solutions were visibly modular.
Finally, ES-HyperNEAT with the connection cost technique had significantly lower scores than all
other approaches on both tasks.
Keywords: ES-HyperNEAT, HyperNEAT, modularity, connection cost, locality, neuroevolution
CONTENTS
INTRODUCTION
I. BACKGROUND
 1.1 Modularity in Natural Systems
 1.2 Modularity in Artificial Systems
 1.3 Regularity and Compositional Pattern Producing Networks
 1.4 Neuroevolution of Augmenting Topologies (NEAT)
 1.5 HyperNEAT and Evolvable Substrate HyperNEAT
II. APPROACH
 2.1 Seeding Evolution Towards Modularity
 2.2 Modularity through Connection Cost
 2.3 Hypotheses
 2.4 Methodology
  Evolution setup
  XOR task
  Retina task
  Software
III. RESULTS
 3.1 XOR task
 3.2 Retina task
  Modularity
 3.3 Discussion
IV. CONCLUSION
BIBLIOGRAPHY
Appendix A: Pseudo Code
 ES-HyperNEAT
 NSGA-II
INTRODUCTION
The field of AI has advanced immensely in the last decades. Modern AIs can recognize
complex objects, translate between languages, and build cars. Even so, they remain limited, even
stupid, when compared to animals. Despite all the progress, the complexity of animal behavior
has so far eluded our attempts to recreate it. One possible reason is that most modern AIs
lack modularity (Huizinga, Clune, & Mouret, 2014).

At the same time, modularity is a major factor in the complexity of animal behavior
(Meunier, Lambiotte, & Bullmore, 2010). Modularity also provides several substantial
benefits to systems that utilize it – they tend to be more robust, more evolvable, and more
flexible (Bullmore & Sporns, 2012). However, modular solutions are only a
subset of all possible solutions (Verbancsics & Stanley, 2011). For many tasks there are plenty
of non-modular solutions that are within closer reach (Verbancsics & Stanley, 2011). Furthermore,
modularity does not offer an immediate fitness reward, and most modern artificial evolution
favors immediate rewards over long-term benefits (Verbancsics & Stanley, 2011). Thus the
question of what caused modularity to become so prevalent in nature is still open.
One recent approach argues that modularity is a by-product of selection to
minimize the costs of building and maintaining connections (Clune, Mouret, &
Lipson, 2013). Huizinga et al. (2014) tested this approach with Hypercube-based Neuroevolution
of Augmenting Topologies (HyperNEAT), and showed that it can create modular solutions that
even perform better than non-modular ones. However, one distinct characteristic of HyperNEAT
is that the human operator specifies the positions of the possible hidden nodes (Risi & Stanley,
2012), whereas in nature such structures grow on their own (Risi & Stanley, 2012).
This raises the question of whether the connection cost technique can create modular
solutions with an approach that evolves the placement of the neurons as well as the connectivity
patterns. Evolvable Substrate HyperNEAT (ES-HyperNEAT) is an extension to HyperNEAT that can
discover both the connectivity and the placement of nodes. It could potentially take advantage of
the connection cost technique to discover even better solutions than the ones created by fixed
substrate HyperNEAT. Furthermore, it has shown good results with another way to evolve
modularity – locality seeding. Thus it is possible that enhancing ES-HyperNEAT with the
connection cost would yield similarly good results. Based on this, the paper investigates three
questions:

1. Can ES-HyperNEAT with connection cost outperform fixed substrate HyperNEAT with connection cost and with locality seeding?
2. Can ES-HyperNEAT with connection cost perform as well as ES-HyperNEAT with locality seeding?
3. Can ES-HyperNEAT-CCT produce visually modular phenotypes?
I. BACKGROUND
1.1 Modularity in Natural Systems
Many natural and social systems are organized as complex networks of interconnected
elements (Bullmore & Sporns, 2012). Many of those systems are also modular (Bullmore &
Sporns, 2012). That is, one can divide them into many relatively independent modules (Meunier
et al., 2010). The elements in each module are densely connected to each other, but have few
connections to elements in other modules (Meunier et al., 2010).

Perhaps the greatest example of such modularity is the animal brain (Meunier et al.,
2010). At the highest level it is composed of a hierarchy of modules that can be identified
visually – two hemispheres, stem, cerebellum and so on (Meunier et al., 2010). At its lowest
level neurons self-organize into cortical columns, which then organize into super columns
(Meunier et al., 2010). In between, separate areas of the brain are responsible for processing
different information – an area for visual processing, an area responsible for speech, etc.
(Meunier et al., 2010).

Modularity offers some considerable advantages to a network (Bullmore & Sporns,
2012). First, modular networks are more robust to unexpected changes than non-modular ones.
Separate modules are more or less self-contained (Bullmore & Sporns, 2012). Thus
an unexpected change to one of them is less likely to affect the general functionality of the
system (Bullmore & Sporns, 2012). In contrast, an unexpected change to a non-modular
system is more likely to disrupt a component that affects its general functionality (Bullmore
& Sporns, 2012). Furthermore, if multiple modules process similar tasks and one of them is
damaged, the rest can compensate (Bullmore & Sporns, 2012).
Second, modular networks are more evolvable (Bullmore & Sporns, 2012; Clune et al.,
2013). Mutations in modular networks can remain contained within a module. This allows
evolution to introduce and develop innovations without disrupting the entire system (Bullmore
& Sporns, 2012; Clune et al., 2013). The innovation can remain inside the module until the
system learns how to utilize it best (Bullmore & Sporns, 2012; Clune et al., 2013). Furthermore,
an innovation can spread through the network just by replicating the module, instead of having
to be discovered all over again (Bullmore & Sporns, 2012; Clune et al., 2013).
Third, modular networks offer better functionality than non-modular ones (Bullinaria,
2007). Separate modules are able to process information and solve tasks separately (Bullinaria,
2007). This can not only increase the speed at which a task is solved but also allow the network
to process fundamentally different tasks simultaneously (Bullinaria, 2007). In fact this can
create a "whole bigger than the sum of its parts" effect (Bullinaria, 2007): integrating the
information from multiple modules can lead to the emergence of much more complex
behaviours (Bullinaria, 2007; Clune et al., 2013).
Finally, as information is contained within the individual module, the network has to deal
with less noise (Bullinaria, 2007; Bullmore & Sporns, 2012). In contrast, nodes in a non-modular
network have to deal with much more irrelevant information, and signal transmission can slow
down (Bullmore & Sporns, 2012; Huizinga et al., 2014). The lack of local separation also means
that the network would struggle with processing multiple tasks at the same time (Bullmore &
Sporns, 2012).
The origin of modularity is still a topic of debate, however (Clune et al., 2013). Studies in
artificial evolution offer several hypotheses (Clune et al., 2013; Kashtan & Alon, 2005;
Verbancsics & Stanley, 2011). One such hypothesis is that modular networks can arise
spontaneously in a changing environment (Kashtan & Alon, 2005). Survival in nature is defined
by a number of tasks and goals that share a set of common sub-goals (Kashtan & Alon, 2005). In
such a case, common modules might emerge to handle each of these common sub-goals. After
that, when the environment changes, the network would need little adjustment (Kashtan & Alon,
2005).
Another hypothesis on the origin of modularity is grounded in the principle of locality
(Verbancsics & Stanley, 2011). This principle states that in a physical system, an event is more
likely related to events that are nearby than to ones further away (Verbancsics & Stanley, 2011).
Thus, a node in a network will likely form connections with other nearby nodes (Verbancsics &
Stanley, 2011). Many natural networks, brains included, follow this principle (Verbancsics &
Stanley, 2011). In the brain, neurons have more connections to neurons that are close than to
neurons farther away (Verbancsics & Stanley, 2011).

The idea is that forming close-range connections is easy, but as their number increases
this task gets more difficult (Verbancsics & Stanley, 2011). At some point there are no more
close neurons, and forming long-range connections is infeasible (Bullmore & Sporns, 2012;
Clune et al., 2013). Furthermore, longer connections need better coordination, higher accuracy,
and more resources (Bullmore & Sporns, 2012; Verbancsics & Stanley, 2011).
A third hypothesis states that modularity is a by-product of selection to reduce
connection costs (Clune et al., 2013). Each connection requires resources to build and maintain
(Bullmore & Sporns, 2012; Clune et al., 2013). Furthermore, sending signals along longer
connections is slower and requires more resources (Bullmore & Sporns, 2012; Clune et al.,
2013). Finally, more connections make signal transmission slower and noisier (Bullmore &
Sporns, 2012; Clune et al., 2013). Thus an environmental pressure that minimizes the use of
resources will also strive to keep the connection cost of neurons low. As evolution cuts these
costs, it preserves only connections that have a direct effect on performance (Bullmore &
Sporns, 2012; Clune et al., 2013). The result of this pruning process is often modularity (Clune
et al., 2013).
In nature, many systems seem to adhere to this hypothesis at least in part (Clune et al.,
2013). Vascular systems tend to have minimized connectivity (Clune et al., 2013). Brains and
neural systems are also minimized to an extent (Bassett et al., 2011; Bullmore & Sporns, 2009;
Meunier et al., 2010; Thivierge & Marcus, 2007). At the same time, the wiring diagram of the
brain is not completely minimized (Bullmore & Sporns, 2012). Instead it operates at a trade-off
between connection cost and performance (Bullmore & Sporns, 2012). This suggests that
optimal performance needs at least some long-range connections (Bullmore & Sporns, 2012).
Despite being costly, such connections can reduce the time needed to transmit a signal to
distant regions (Bullmore & Sporns, 2012).

Most likely none of these hypotheses is solely responsible for modularity (Bullmore &
Sporns, 2012). Instead, a multitude of factors could all contribute to the emergence of modular
structures (Bullmore & Sporns, 2012). For example, connection cost could work in conjunction
with locality to bootstrap modularity (Clune et al., 2013; Verbancsics & Stanley, 2011).
1.2 Modularity in Artificial Systems
The goal of AI research is to produce systems with complex and intelligent behaviours
on the scale seen in animals, yet this has so far eluded researchers (Huizinga et al., 2014).
These complex behaviours are possible in part due to the modular organization of the brain
(Bullmore & Sporns, 2012). As discussed in the previous section, modularity also offers some
advantages over non-modular organization. Yet, modularity does not emerge in artificial
neuroevolution without specific techniques to encourage it (Verbancsics & Stanley, 2011).
Researchers have developed several such techniques – L-systems (Hornby & Pollack, 2001a),
grammar encoding (Hornby & Pollack, 2001b), and seeding network motifs (Li, Yuan, Shi, &
Zagal, 2015). However, most of these techniques tend to produce rather limited modularity
that adheres to rigid rules and performs worse than non-modular networks (Kashtan & Alon,
2005; Verbancsics & Stanley, 2011).
One reason for this is that for most tasks there usually is a perfect non-modular solution
(Clune et al., 2013; Verbancsics & Stanley, 2011). Furthermore, modular solutions are only a
subset of all available solutions (Clune et al., 2013; Verbancsics & Stanley, 2011). They often
are not the most easily reachable ones and do not convey an immediate fitness reward (Clune et
al., 2013; Verbancsics & Stanley, 2011). At the same time, artificial evolution tends to select
the solutions that offer the most short-term reward (Clune et al., 2013; Verbancsics & Stanley,
2011).

A more recent approach that does not suffer from these issues is that of modularly varying
goals (MVG) (Kashtan & Alon, 2005). The evolution is carried out on a task where the overall
goal switches after some generations (Kashtan & Alon, 2005). Furthermore, all goals share
common sub-goals (Kashtan & Alon, 2005).

Kashtan and Alon test the technique on two modularly decomposable tasks – a logic gate
task and a retina task. They tested three conditions – modularly varying goals, fixed goals, and
random goals (Kashtan & Alon, 2005). On both tasks the MVG condition produced more
modular networks than the other conditions (Kashtan & Alon, 2005). Furthermore, no
modularity emerged in the random goals condition (Kashtan & Alon, 2005).
Clune et al. (2010) applied the MVG approach to HyperNEAT and tested it on the same
retina task, but it performed poorly. When the authors imposed modularity there was
improvement, but the fixed goal setup still performed best (Clune, Beckmann, McKinley, &
Ofria, 2010). Finally, Clune et al. (2010) also tested a simplified version of the retina problem,
on which HyperNEAT was able to solve the task and produce modular solutions. Based on
these poor results, Clune et al. (2010) questioned Kashtan and Alon's conclusion about the
generality of their approach. Furthermore, many practical tasks cannot be defined in a modular
way, which makes the algorithm harder to apply (Clune et al., 2010).
Clune et al. (2013) introduced another approach for evolving modularity in ANNs that
does not need special modification of the task – connection cost. It builds on the idea that in
natural networks each connection has a cost associated with it (Clune et al., 2013). As a result
many natural networks have minimized connections (Clune et al., 2013). Clune et al. implement
connection cost as a second objective in a non-dominated multi-objective algorithm (Clune et
al., 2013). They then tested it on a retina task like that of Kashtan and Alon (Clune et al.,
2013). The results showed that the connection cost objective both improves performance and
introduces modularity (Clune et al., 2013). Furthermore, they showed that modularity can also
arise without the need for switching the goal (Clune et al., 2013). Huizinga et al. (2014) took
the connection cost technique a step further by also introducing regularity. To do so they
combined the connection cost technique with HyperNEAT and tested it on a retina problem, a
5-XOR problem, and an H-XOR problem (Huizinga et al., 2014). They compared this approach
to another approach – that of locality seeding (see below) – as well as to direct encoding and
unmodified HyperNEAT (Huizinga et al., 2014). In all tasks HyperNEAT with connection cost
had the highest fitness and highest modularity (Huizinga et al., 2014).

An interesting result that the authors note is that the modularity of the seeded condition
was high early on, but dropped later in the evolution (Huizinga et al., 2014). In contrast, the
modularity of HyperNEAT-CCT rose the most after it had already solved the task (Huizinga et
al., 2014). A possible explanation is that after solving the task the only way the network can
improve is by optimizing connection costs, while the influence of the seeding diminishes as the
CPPN grows more complex over time (Huizinga et al., 2014).
In another paper, Lowell and Pollack took a simpler approach to connection cost (Lowell
& Pollack, 2014). Instead of a multiobjective algorithm, they used a single objective with a
fixed penalty for connections (Lowell & Pollack, 2014). They tested this on the retina task, but
the connection cost did not convey fitness benefits and did not increase the level of modularity
of the best networks (Lowell & Pollack, 2014). Still, it led to more consistent levels of
modularity in the population (Lowell & Pollack, 2014). However, the use of a single fitness
function as well as underlying properties of NEAT might have prevented more significant
results (Lowell & Pollack, 2014).

A third approach to modularity is that of locality seeding (Verbancsics & Stanley, 2011).
It is inspired by the idea that locality is a widespread property of natural systems (Verbancsics &
Stanley, 2011). The approach is simple – Gaussian hidden nodes are added to the underlying
Compositional Pattern Producing Network (CPPN) (Verbancsics & Stanley, 2011). As the
Gaussian function peaks at inputs close to 0, it can introduce locality into the pattern generated
by the CPPN – see section 2.1 of the current paper for a detailed review of the technique.
Verbancsics and Stanley (2011) put the locality seed to the test against several other
variants of HyperNEAT. The results showed that the locality seed conditions performed best
and tended to produce significantly more modular solutions than unseeded approaches
(Verbancsics & Stanley, 2011). An interesting result was that locality over a single axis
outperformed general locality over all axes (Verbancsics & Stanley, 2011). The authors also
note that locality over the weight patterns tended to reduce fitness (Verbancsics & Stanley,
2011).

Still, a drawback of HyperNEAT is that one has to specify the layout of the hidden nodes
(Risi, Lehman, & Stanley, 2010; Risi & Stanley, 2012). This creates a situation where
HyperNEAT can discover the weights of connections, but says nothing about where the nodes
should be (Risi et al., 2010; Risi & Stanley, 2012). At the same time the CPPN does contain
such information (Risi et al., 2010; Risi & Stanley, 2012). Evolvable Substrate HyperNEAT is
an extension to HyperNEAT that can take advantage of this information and discover the
placement and density of neurons in addition to the weight patterns (Risi et al., 2010; Risi &
Stanley, 2012). Furthermore, it is compatible with locality seeding and the LEO (Risi &
Stanley, 2012). Risi & Stanley (2012) tested ES-HyperNEAT with locality seeding on the
retina task. It outperformed all other conditions and the solutions were predominantly modular
(Risi & Stanley, 2012).
1.3 Regularity and Compositional Pattern Producing Networks
Modularity, regularity and hierarchy are ubiquitous properties of natural organisms
(Huizinga et al., 2014). Cells combine to form tissues, tissues form organs, organs form the
body. On each level there is repetition of components, symmetry, self-similarity, and repetition
with variation (Stanley, 2007). For example, a centipede's body has several self-similar
segments (repetition, self-similarity). Its left and right sides are mirror images of each other
(symmetry). Its legs vary in size, but maintain the same structure (repetition with variation).

This regularity offers several advantages in evolution. First, one can reuse information
when describing the structure (Huizinga et al., 2014; Stanley, 2007). Second, a smaller genome
is easier to reproduce and reuse without error (Stanley, 2007). The human genome contains
fewer than 30,000 genes, 50% of which it shares with bananas (Dobbs, 2015). At the same
time it handles the development of the most complex structure known to man – the brain. Third,
evolution can discover a working solution only once and then re-use it (Stanley, 2007). For
example, in the evolution of a table one can discover the general structure of the legs once,
instead of evolving each leg on its own. Finally, the composition of regularities can lead to
increased complexity (Risi & Stanley, 2012; Stanley, 2007). At a higher level, hands are mirror
images of each other. At a lower level, fingers lie in a radial pattern around the palm. Each
finger is similar to its neighbors and at the same time mirrors the fingers on the other hand.
Figure 1: Various CPPN patterns evolved with the online platform PicBreeder. Source: https://evolvegold.wordpress.com/pic-breeder/
Regularity can help evolutionary computation discover solutions with complexity on a
natural scale (Risi & Stanley, 2012; Stanley, D'Ambrosio, & Gauci, 2009; Stanley, 2007). Many
higher-order solutions could be achieved by composing simple elements (Stanley, 2007). At the
same time, re-using components makes these solutions less expensive computationally
(Stanley, 2007).

One approach that can encode regularity, used in the current study, is Compositional
Pattern Producing Networks (CPPNs) (Stanley, 2007). CPPNs are mathematical abstractions of
development that can represent complex patterns in Cartesian space (Stanley, 2007). In structure
and implementation CPPNs are similar to neural networks: they consist of nodes linked by
weighted connections, and they have inputs, hidden nodes, and outputs (Stanley, 2007). However,
where neural networks have a single activation function, CPPNs can have many different ones
(Stanley, 2007). Each function can introduce new coordinate frames that can hold other functions
(Stanley, 2007). This allows the network to produce complex patterns that exhibit various
symmetries or regularities (Stanley, 2007). For example, a Gaussian node introduces left-right
symmetry and locality, while a sine node introduces repetition (Stanley, 2007). This way the
CPPN does not need to simulate development to take advantage of its properties – the end result
of the development is encoded in the CPPN itself (Stanley, 2007).
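To make this composition concrete, the following minimal sketch (purely illustrative, hand-wired rather than evolved, and not taken from any particular CPPN library) combines a Gaussian node with a sine node and samples the resulting pattern over a small grid. By construction the pattern is symmetric around x = 0 and repeats along y.

```python
import math

def gaussian(x):
    """Gaussian activation: peaks at 0 and is symmetric, so it introduces
    left-right symmetry and locality into the pattern."""
    return math.exp(-x * x)

def cppn(x, y):
    """A tiny hand-wired CPPN: a Gaussian node (symmetry in x) composed
    with a sine node (repetition in y). A real CPPN evolves this wiring."""
    h1 = gaussian(x)           # symmetric in x
    h2 = math.sin(3.0 * y)     # repeating in y
    return math.tanh(h1 + 0.5 * h2)

# Sample the pattern; cppn(-x, y) == cppn(x, y) at every point by construction.
pattern = [[cppn(x / 4.0, y / 4.0) for x in range(-4, 5)] for y in range(-4, 5)]
```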
1.4 Neuroevolution of Augmenting Topologies (NEAT)
Neuroevolution is the artificial evolution of neural networks using genetic algorithms
(Stanley & Miikkulainen, 2002). In neuroevolution, a population of ANNs has to solve a task
over many generations, where each generation the networks are evaluated on that task (Stanley
& Miikkulainen, 2002). Improvement happens through mutation or through the crossover of fit
individuals (Stanley & Miikkulainen, 2002). This allows neuroevolution to create neural
networks that can solve a task without the need for extensive information gathering, as in
backpropagation (Stanley & Miikkulainen, 2002). However, despite this advantage, traditional
neuroevolution approaches are plagued by several problems.

First is the competing conventions problem. A task can have many successful solutions,
yet these solutions might be incompatible with each other (Stanley & Miikkulainen, 2002). The
genomes can have different sizes, genes in the same positions might encode different structures,
or the same structure might appear at different positions (Stanley & Miikkulainen, 2002). As a
consequence, the offspring of two successful solutions might not be successful at all (Stanley &
Miikkulainen, 2002).
Next, innovations usually bring long-term benefits, but lead to a temporary drop in fitness
(Stanley & Miikkulainen, 2002). Thus non-novel ANNs that offer better immediate fitness
returns might dominate the population (Stanley & Miikkulainen, 2002). This leads to less
diversity in the population, and the evolution can stall in a local optimum (Stanley &
Miikkulainen, 2002).

Figure 2: Crossover in NEAT. Each gene is assigned a historical marker – a global number that marks the specific structure. During crossover the two genomes are aligned. Genes that the parents have in common are carried over. Disjoint and excess genes are taken from the more fit parent. Image source: (Stanley & Miikkulainen, 2002)
Finally, a random starting topology poses problems on its own. Many of the initial
networks might not have a valid path from inputs to outputs (Stanley & Miikkulainen, 2002). It
then takes time for evolution to weed them out. Furthermore, random topologies tend not to
produce minimal solutions (Stanley & Miikkulainen, 2002). The population already starts
with unnecessary structures, and unless there is an explicit fitness penalty, these structures will
carry over through the generations (Stanley & Miikkulainen, 2002). Thus, networks will
accumulate structures that increase computational costs without improving performance
(Stanley & Miikkulainen, 2002). One can impose a fitness penalty for network size, but then
each problem would require discovering the appropriate penalty (Stanley & Miikkulainen,
2002).

Figure 3: Mutation in NEAT. There are two types of mutation in NEAT. The first adds a connection between two nodes that are not connected. The second adds a node that splits an already existing connection. In both cases the new genes are added at the end of the genome. Image source: (Stanley & Miikkulainen, 2002)
Neuroevolution of Augmenting Topologies (NEAT) is an approach that attempts to solve
these problems (Stanley & Miikkulainen, 2002). NEAT achieves this through the introduction of
historical markers, speciation, and a minimal initial population (Stanley & Miikkulainen, 2002).
Historical markers are global numbers associated with each innovation in the population (Stanley
& Miikkulainen, 2002). They allow one to keep track of which gene is which (Stanley &
Miikkulainen, 2002). This way one can compare and cross over two genomes without needing
complex topological analyses (Stanley & Miikkulainen, 2002). Next, speciation subdivides
the population into species. Each species represents its own evolutionary niche where a genome
can compete with other similar genomes and optimize a particular innovation (Stanley &
Miikkulainen, 2002). Finally, NEAT starts from a minimal structure – just input and output
neurons. This allows networks to grow incrementally as evolution discovers and optimizes
innovations. The result is compact and more efficient networks (Stanley & Miikkulainen, 2002).
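As a rough illustration of how historical markers make crossover trivial, consider the sketch below. The genome representation (a dict from innovation number to gene) is an assumption made for illustration, not NEAT's actual data structure.

```python
import random

def crossover(fitter, weaker):
    """Sketch of NEAT crossover via historical markers. Genomes are
    assumed to be dicts mapping innovation number -> connection gene.
    Matching genes (same innovation number in both parents) are
    inherited at random from either parent; disjoint and excess genes
    are taken from the more fit parent only."""
    child = {}
    for innovation, gene in fitter.items():
        if innovation in weaker:
            child[innovation] = random.choice((gene, weaker[innovation]))
        else:
            child[innovation] = gene  # disjoint or excess gene
    return child
```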
NEAT has seen successful application in several domains, such as pole-balancing tasks
(Gomez, Schmidhuber, & Miikkulainen, 2006; Stanley & Miikkulainen, 1996, 2002), game AI
(Cardamone, Loiacono, & Lanzi, 2009; Pugh, Goodell, & Stanley, 2013; Togelius, Shaker,
Karakovskiy, & Yannakakis, 2013), and robot control (Trujillo, Olague, Lutton, & De Vega,
2008). NEAT can also evolve CPPNs with minor adaptations (Stanley et al., 2009; Stanley &
Miikkulainen, 1996).
1.5 HyperNEAT and Evolvable Substrate HyperNEAT
Hypercube-based NEAT (HyperNEAT) is an extension to NEAT and CPPNs inspired by the
idea that a geometric pattern can represent the connectivity pattern of a neural network (Stanley
et al., 2009). This idea builds on a few insights. First, the pattern generated by a CPPN is the
result of querying it with the coordinates of the points that make it up (Stanley et al., 2009).
Second, one can specify a connection between two points just by their coordinates (Stanley et
al., 2009). Third, one can represent a connection in n dimensions as a point in 2n dimensions
(Stanley et al., 2009). For example, take a connection in 2-dimensional space between two
nodes with coordinates (x1, y1) and (x2, y2). These coordinates also correspond to a
4-dimensional point (x1, y1, x2, y2) (Risi & Stanley, 2012; Stanley et al., 2009). One can
then query a CPPN with this point and interpret the output as the weight of the connection
(Stanley et al., 2009). This process repeats for each pair of points that could make up the
neural network (Stanley et al., 2009). Finally, as outlined above, NEAT can evolve CPPNs
(Stanley et al., 2009). Thus one can test the generated networks on a given task and assign the
result as the fitness of the CPPN. The evolution then creates patterns that produce more
effective networks (Stanley et al., 2009).

Figure 4: HyperNEAT. 1. The substrate is composed of several predefined points. 2. The CPPN is queried with the coordinates of these points. 3. The output of the CPPN then becomes the weight of the connection between the queried nodes. Image source: (Stanley et al., 2009)
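The query loop itself is simple. The sketch below illustrates the idea under stated assumptions: the `cppn_query` callable and the threshold value are placeholders, not the MultiNEAT API.

```python
def build_weights(cppn_query, nodes, threshold=0.2):
    """Query a CPPN over every pair of substrate nodes. Each pair of
    2-D points is treated as one 4-D point (x1, y1, x2, y2), and the
    CPPN output becomes the connection weight."""
    weights = {}
    for (x1, y1) in nodes:
        for (x2, y2) in nodes:
            w = cppn_query(x1, y1, x2, y2)
            # Original HyperNEAT expresses a connection only when the
            # output magnitude exceeds a static threshold.
            if abs(w) > threshold:
                weights[((x1, y1), (x2, y2))] = w
    return weights
```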
A specific property of the connectivity pattern is that it is isomorphic to the geometric
pattern that underlies it (Stanley et al., 2009). This allows it to preserve all the regularities and
symmetries found in the geometric pattern (Stanley et al., 2009). One can then take advantage of
these regularities by making use of the geometry of the task (Risi & Stanley, 2012; Stanley et al.,
2009). For example, one can place the inputs and outputs of a robot controller in a way that
reflects their actual positions on the robot (Risi & Stanley, 2012; Stanley et al., 2009). Evolution
can then exploit this to speed up the discovery of the relationship between sensors and actuators
(Risi & Stanley, 2012; Stanley et al., 2009).
HyperNEAT offers several advantages over other approaches. First, it allows for a
compact description of much larger networks (Stanley et al., 2009). A CPPN can describe a
network of thousands of nodes with a genotype of just a few lines (Stanley et al., 2009).
Furthermore, the same CPPN can generate networks of varying sizes with just a change in a
few parameters (Stanley et al., 2009). Finally, one does not need to simulate development – the
end result is already encoded in the CPPN (Stanley et al., 2009).
Yet, in HyperNEAT one still has to specify the positions of the possible hidden
nodes. The pattern has to intersect with these specific positions and adapt to them (Risi
et al., 2010; Risi & Stanley, 2012). Thus, the CPPN tells us what the connection weight between
nodes should be, but not where to place them (Risi et al., 2010; Risi & Stanley, 2012).

At the same time, a CPPN pattern does contain information on where the nodes could be
(Risi et al., 2010; Risi & Stanley, 2012). Whenever HyperNEAT discovers a connection it also
discovers the nodes that define that connection. When asking which connections to include we
are also asking which nodes to include. Thus by switching the focus from connections to nodes
we no longer need to specify them in advance (Risi et al., 2010; Risi & Stanley, 2012).
Furthermore, whether to express a node depends on the information stored in the area of the
pattern where it lies (Risi et al., 2010; Risi & Stanley, 2012). This information is not uniformly
distributed. There are areas of relative uniformity, where the nodes compute the same function
(Risi & Stanley, 2012). There are also areas of high information density, where nodes compute a
variety of functions (Risi & Stanley, 2012). It makes sense then to place nodes in these areas of
high density, rather than in the areas of uniformity (Risi & Stanley, 2012). Evolvable Substrate
HyperNEAT can discover these areas of high information variance and place nodes there (Risi &
Stanley, 2012). In this way it can discover the locations of the hidden neurons in addition to the
weights of the connections (Risi & Stanley, 2012).
At the heart of ES-HyperNEAT is a procedure that takes a node as input and returns
its connectivity pattern as output. This procedure consists of two steps – an initialization phase
and a pruning phase (Risi & Stanley, 2012). During the initialization phase the algorithm scans
the pattern for areas of high information (Risi & Stanley, 2012). It does this by subdividing the
search space into regions until a specified resolution is reached or the variance of a region
becomes too low (Risi & Stanley, 2012). Once this happens, nodes are placed at these locations
(Risi & Stanley, 2012). During the traversal, the algorithm constructs a quadtree that holds all
the discovered nodes (Risi & Stanley, 2012).

During the pruning phase, the algorithm traverses the quadtree and removes nodes with
low variance (Risi & Stanley, 2012). The end result is more nodes in areas with higher
information density (Risi & Stanley, 2012). An important part of the pruning and expression
phase is the so-called band pruning. Areas next to an edge in the pattern have higher variance,
and thus more expressed nodes (Risi & Stanley, 2012). However, on such edges the CPPN
might have a hard time determining in which region a node lies and what value to assign to it
(Risi & Stanley, 2012). Band pruning restricts the expression of nodes close to the edge
between regions and focuses on nodes that lie well within a region of the pattern (Risi &
Stanley, 2012).

Figure 5: The two phases of the information extraction procedure of ES-HyperNEAT. (1) During the division and initialization phase the space is iteratively subdivided until a maximum level is reached or there is no more available variation. (2) Once the tree with the possible nodes is constructed, it is traversed once more until low variance is reached; the connection is then expressed for the qualifying nodes. Image source: (Risi & Stanley, 2012)
With this procedure established, the algorithm constructs the neural network in an
iterative fashion (Risi & Stanley, 2012). It first discovers the nodes connected to the inputs
(Risi & Stanley, 2012), then further layers of hidden nodes over some iterations (Risi & Stanley,
2012). Finally, the procedure checks which of the hidden nodes connect to the outputs (Risi &
Stanley, 2012). The last step is to remove nodes that do not have a path to both an input and an
output (Risi & Stanley, 2012).

Figure 6 presents a complete overview of ES-HyperNEAT. Pseudo code for the algorithm
is given in Appendix A. For a more thorough discussion of the methods behind ES-HyperNEAT,
see Risi & Stanley (2012).
Figure 6: The ES-HyperNEAT algorithm. First, the hidden nodes that are connected directly to the input are discovered (a). Next, the algorithm discovers new hidden nodes for some iterations (b). Finally, the algorithm discovers which hidden nodes connect to the output (c). After that, only nodes that have a complete path to both input and output are kept (d). Image source: (Risi & Stanley, 2012)
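The full pseudo code used in this thesis is given in Appendix A. As a rough, simplified sketch of the division step alone (class and parameter names are illustrative and do not mirror the actual implementation):

```python
class QuadPoint:
    """One square region of the substrate – a node of ES-HyperNEAT's
    quadtree. `width` is the half-width of the square centred at (x, y)."""
    def __init__(self, x, y, width, level):
        self.x, self.y, self.width, self.level = x, y, width, level
        self.weight = None
        self.children = []

def variance(children):
    """Variance of the CPPN outputs across a region's four children."""
    ws = [c.weight for c in children]
    mean = sum(ws) / len(ws)
    return sum((w - mean) ** 2 for w in ws) / len(ws)

def divide(cppn_query, src_x, src_y, region, max_level=5, div_threshold=0.5):
    """Division phase: recursively subdivide a region while it still shows
    enough variance in CPPN output and the maximum resolution is not
    reached."""
    for dx in (-1, 1):
        for dy in (-1, 1):
            w = region.width / 2.0
            child = QuadPoint(region.x + dx * w, region.y + dy * w, w,
                              region.level + 1)
            # Query the CPPN for a connection from the source node to the
            # centre of the sub-region.
            child.weight = cppn_query(src_x, src_y, child.x, child.y)
            region.children.append(child)
    if region.level < max_level and variance(region.children) > div_threshold:
        for child in region.children:
            divide(cppn_query, src_x, src_y, child, max_level, div_threshold)
```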
II. APPROACH
2.1 Seeding Evolution Towards Modularity
The original HyperNEAT controls the expression of connections through a static threshold
(Verbancsics & Stanley, 2011). This threshold is a predefined parameter that affects all
connections in the same way (Verbancsics & Stanley, 2011). Modularity, however, needs
different thresholds in different areas of the substrate (Verbancsics & Stanley, 2011). Thus one
needs a way to put such a dynamic threshold in place if one wants to see modular solutions
(Verbancsics & Stanley, 2011). At the same time, a task might have a better non-modular
solution. If one imposes an explicit drive towards modularity, such solutions would remain
undiscovered (Verbancsics & Stanley, 2011). Instead, one should only bias evolution towards
modular solutions without restricting non-modular ones (Verbancsics & Stanley, 2011).
The Link Expression Output (LEO) combined with locality seeding does just that. The
idea behind it is that a CPPN can encode the connectivity pattern separately from the weight
pattern (Verbancsics & Stanley, 2011). By adding this extra output one can decouple the two
(Verbancsics & Stanley, 2011).

Furthermore, one can bias the CPPN towards certain natural properties like modularity and
locality (Verbancsics & Stanley, 2011). This happens by adding a Gaussian hidden node to the
initial CPPN (Verbancsics & Stanley, 2011). The Gaussian function peaks at 0, so it outputs
higher values for nodes that are closer to each other. If this node connects to the LEO it
introduces locality in connectivity (Verbancsics & Stanley, 2011). If it connects to the normal
weight output it introduces locality in weights (Verbancsics & Stanley, 2011). Seeding the CPPN
in this way might encourage modularity, but without constraining non-modular solutions
(Verbancsics & Stanley, 2011). If the task turns out not to need modularity, evolution would
drive it out (Verbancsics & Stanley, 2011).
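A minimal sketch of the mechanism follows; the weight pattern and the 0.5 offset are placeholders for illustration, not the actual seed used in the experiments.

```python
import math

def gaussian(x):
    return math.exp(-x * x)

def seeded_cppn(x1, y1, x2, y2):
    """Hand-wired stand-in for a seeded initial CPPN. The LEO is seeded
    with gaussian(x1 - x2), which peaks when the two nodes share an x
    position, biasing connectivity towards locality along the x axis."""
    weight = math.tanh(x1 * x2 + y1 * y2)  # placeholder weight pattern
    leo = gaussian(x1 - x2) - 0.5          # fires only for nearby x values
    return weight, leo

def query(x1, y1, x2, y2):
    """Express the connection only when the LEO output is positive."""
    weight, leo = seeded_cppn(x1, y1, x2, y2)
    return weight if leo > 0.0 else None
```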
2.2 Modularity through Connection Cost
The novel contribution of this paper is the introduction of the connection cost technique
to ES-HyperNEAT. In essence, the technique relies on optimizing both the connection cost of
the ANN and its performance at the same time (Huizinga et al., 2014). HyperNEAT and
ES-HyperNEAT allow one to compute the length of the connections in the substrate (Huizinga
et al., 2014). Thus one can calculate a score that represents the connection cost and optimize it
alongside fitness using a multiobjective algorithm (Huizinga et al., 2014).
In their implementation of connection cost, Huizinga et al. (2014) used the NSGA-II
multiobjective algorithm. This algorithm sorts the population by dividing it into subgroups
called fronts (Huizinga et al., 2014). The individuals in each front dominate the individuals in
the subsequent fronts (Huizinga et al., 2014). Here, one individual dominates another if it is not
worse on every objective, and better on at least one objective (Huizinga et al., 2014).
Individuals that do not dominate each other end up together in the same front (Huizinga et al.,
2014). Furthermore, within each front individuals are assigned a distance value based on the
similarity of their scores to those of the other individuals in the same front (Huizinga et al.,
2014). The population is then sorted by front and distance (Huizinga et al., 2014). An individual
is deemed more fit than another if it has a lower front rank or a higher distance (Huizinga et al.,
2014).

The original NSGA-II gives equal priority to all objectives (Huizinga et al., 2014). Yet
there are cases where one needs to assign different priorities to the objectives (Huizinga et al.,
2014). One way to do this is through stochasticity: one can assign a probability to each
objective, and the higher the probability, the more often this objective is included when
determining domination (Huizinga et al., 2014). In the current paper, fitness is always taken
into account, while connection cost is taken into account 25% of the time.
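A sketch of this stochastic domination test is shown below; the score-dict representation is an assumption made for illustration, not the author's actual NSGA-II code.

```python
import random

def dominates(a, b, p_cost=0.25):
    """Stochastic domination test. `a` and `b` are score dicts such as
    {"fitness": 12.1, "diversity": 3.2, "cost": 810.0}; all objectives
    are scores to be maximized (the connection cost score defined in
    section 2.4 already rewards short wiring). Fitness and diversity
    always participate; connection cost only with probability p_cost."""
    objectives = ["fitness", "diversity"]
    if random.random() < p_cost:
        objectives.append("cost")
    no_worse = all(a[o] >= b[o] for o in objectives)
    better = any(a[o] > b[o] for o in objectives)
    return no_worse and better
```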
Unfortunately, NSGA-II is incompatible with NEAT's speciation (Huizinga et al., 2014;
Lehman, Risi, D'Ambrosio, & Stanley, 2013). So when using it with NEAT-based approaches
one needs to adjust for that (Lehman et al., 2013). One way to do this is by adding an extra
objective that measures genomic diversity (Lehman et al., 2013). This genomic diversity is the
average distance of the genome to its k nearest neighbours, as measured by NEAT's genomic
distance function (Lehman et al., 2013). Adding this objective then serves a function like that of
speciation in NEAT (Lehman et al., 2013).
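A sketch of the diversity objective under these assumptions (the simplified distance function merely stands in for NEAT's real compatibility distance):

```python
def genomic_distance(g1, g2):
    """Simplified stand-in for NEAT's compatibility distance: the number
    of non-matching genes plus the mean weight difference of matching
    ones. Genomes are assumed to be dicts of innovation number -> weight."""
    matching = g1.keys() & g2.keys()
    mismatched = len(g1.keys() ^ g2.keys())
    weight_diff = (sum(abs(g1[i] - g2[i]) for i in matching) / len(matching)
                   if matching else 0.0)
    return mismatched + weight_diff

def genomic_diversity(genome, population, k=15):
    """Mean genomic distance from `genome` to its k nearest neighbours
    (Lehman et al., 2013); the experiments in this paper average over
    the whole population instead (see section 2.4)."""
    distances = sorted(genomic_distance(genome, other)
                       for other in population if other is not genome)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```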
2.3 Hypotheses
The results of Risi & Stanley (2012) show that an extension developed for HyperNEAT
can give even better results with ES-HyperNEAT. Similarly, the connection cost technique
worked well as an extension to HyperNEAT in Huizinga et al. (2014). This raises the question
whether the connection cost would also work well with Evolvable Substrate HyperNEAT, as
the seeding did. Combining the connection cost technique and ES-HyperNEAT might produce
better modular solutions than its fixed substrate counterpart. Furthermore, in Huizinga et al.
(2014) the HyperNEAT-CCT outperformed the seeded HyperNEAT. Thus it is possible that this
relationship would also hold for ES-HyperNEAT. I therefore pose the following three
hypotheses:

1. ES-HyperNEAT with connection cost will outperform HyperNEAT with connection cost and HyperNEAT with seeding.
2. ES-HyperNEAT with connection cost will outperform seeded ES-HyperNEAT.
3. ES-HyperNEAT-CCT will produce visually modular solutions.
2.4 Methodology
Evolution setup.
Five different approaches (Table 1) were tested on two tasks – an XOR task and a retina
task. In both cases the ANNs are activated 5 times before the output is read. In all cases the
CPPN is also activated 5 times. The initial CPPN for the seeded conditions was modeled after
Risi & Stanley (2012) and is presented in Figure 7. It has two hidden Gaussian nodes. The first
is responsible for weight locality along the y axis, while the second is responsible for locality
in connectivity along the x axis. Thus the first is connected to the normal CPPN output, while
the second is connected to the LEO. The substrates for the HyperNEAT conditions were similar
to those of Huizinga et al. and are presented in Figure 8. The substrates for the ES conditions
used the same inputs and outputs as the substrates for the HyperNEAT conditions.
Table 1: Experimental Conditions

Name | Description
HyperNEAT-LEO | Fixed substrate HyperNEAT with Link Expression Output. This condition serves as the control case.
Seeded HyperNEAT | Fixed substrate HyperNEAT with geometric and locality seeding.
HyperNEAT-CCT | Fixed substrate HyperNEAT evolved with the connection cost.
Seeded ES-HyperNEAT | Evolvable Substrate HyperNEAT with geometric and locality seeding.
ES-HyperNEAT-CCT | Evolvable Substrate HyperNEAT evolved with the connection cost.

The XOR task was run 20 times for 300 generations each. The retina task was run 20
times for 3000 generations per run. Population size on both tasks was 150 individuals. The
multiobjective conditions had three objectives – genomic diversity, fitness, and connection cost.
There was a 25% chance of taking the connection cost objective into account in comparisons
between genomes, while the other two objectives were taken into account every time.
The fitness for the XOR task was (4 − E)², where E is the summed error. The fitness
function for the retina task was 1000/(1 + E²), where E is again the summed error. The
connection cost was measured as 1000/(1 + L²), where L is the summed connection length of
the substrate, calculated using Euclidean distance. The genomic diversity objective was
calculated as the average genomic distance to all other members of the population, in terms of
the NEAT genomic distance.
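For concreteness, the three scores above can be written directly as follows (a sketch; the connection representation is an assumption):

```python
import math

def xor_fitness(summed_error):
    """XOR fitness: (4 - E)^2; a network with zero error scores 16."""
    return (4.0 - summed_error) ** 2

def retina_fitness(summed_error):
    """Retina fitness: 1000 / (1 + E^2)."""
    return 1000.0 / (1.0 + summed_error * summed_error)

def connection_cost_score(connections):
    """Connection cost objective: 1000 / (1 + L^2), where L is the summed
    Euclidean length of all expressed connections. `connections` is
    assumed to be an iterable of (point_a, point_b) coordinate tuples."""
    total = sum(math.dist(a, b) for a, b in connections)
    return 1000.0 / (1.0 + total * total)
```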
Finally, the maximum resolution for the ES-HyperNEAT conditions was 5, corresponding
to a grid of 32x32 nodes. The band threshold was set to 0.3, the division threshold to 0.5, and
the variance threshold to 0.03. The substrates for the HyperNEAT conditions are presented in
Figure 8 below. The rest of the parameters are presented in Appendix A.
Figure 7: Seeded CPPN. The CPPN for the seeded conditions is initialized with two Gaussian hidden nodes. One node takes x1 − x2 as input and specifies locality in connectivity. The other takes y1 − y2 + bias as input and specifies locality in weights. Image source: (Risi & Stanley, 2012)
XOR task
ES-HyperNEAT-CCT is a novel approach, and both the NSGA-II and the ES-HyperNEAT
presented here are custom implementations. It therefore makes sense to validate them on a
simpler task before moving on to harder problems. The XOR problem is one such benchmarking
task that is well established in the field (Dhar, Tickoo, Dubey, & others, 2009). In it the ANNs
have to solve a logical exclusive-or problem. The network has two inputs, a bias input, and a
single output. In the HyperNEAT setups the network also has two hidden neurons (Fig. 8). The
inputs are binary, and the network must identify whether they evaluate to true or false (Table 2).

Table 2: The XOR problem truth table

A | B | Output
0 | 1 | 1
1 | 0 | 1
0 | 0 | 0
1 | 1 | 0
Figure 8: Layout of the HyperNEAT substrate. Red nodes are at z = −1, green nodes are at z = 1, grey nodes are at z = 0. The grey-red node is the output. The ES-HyperNEAT substrate uses the same input and output positions.
Retina task
The retina task is more complex than the XOR one. In it, inputs are projected onto two
retinas of 2x2 pixels each. As the input is binary, there are 16 possible combinations per retina,
for a total of 256 patterns. Of the 16 patterns that a retina receives, 8 are considered valid. The
ANN must learn to recognize whether a valid pattern is present on both the left and the right
retina. This problem is modularly decomposable, as it requires the network to first process the
two patterns separately before outputting a final answer (Kashtan & Alon, 2005). Thus, in order
to solve the task, the neural network would benefit from modularity (Kashtan & Alon, 2005). At
the same time, the retina task also has non-modular solutions (Kashtan & Alon, 2005).
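The evaluation cases can be enumerated as sketched below. Note that the two sets of 8 valid patterns are task-specific and not reproduced in this paper, so the placeholder sets here are purely illustrative.

```python
from itertools import product

# Hypothetical stand-ins for the 8 valid 2x2 patterns per retina; the
# actual sets come from the retina task definition (Kashtan & Alon, 2005).
VALID_LEFT = set(list(product((0, 1), repeat=4))[:8])
VALID_RIGHT = set(list(product((0, 1), repeat=4))[:8])

def retina_cases():
    """Enumerate all 256 cases: 16 left patterns x 16 right patterns,
    each half being 4 binary pixels. The target is 1.0 only when BOTH
    halves show a valid object."""
    for left in product((0, 1), repeat=4):
        for right in product((0, 1), repeat=4):
            target = float(left in VALID_LEFT and right in VALID_RIGHT)
            yield left + right, target
```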
Figure 9: The retina task. Inputs are projected onto the left and the right retina. The network must then recognize whether both retinas contain valid objects. Image source: (Risi & Stanley, 2012)
Software
All experiments were implemented using the MultiNEAT library. The Evolvable
Substrate HyperNEAT and the NSGA-II multiobjective algorithm were implemented by the
author for the purpose of this paper. The tests were run on Ubuntu 14.04 LTS and openSUSE 13.
The source code is available at https://github.com/vchudinov/MultiNEAT
and https://github.com/peter-ch/MultiNEAT (no NSGA-II).
III. RESULTS
3.1 XOR task
All conditions were able to solve the XOR task within 300 generations. Furthermore, all
conditions but ES-HyperNEAT-CCT were able to achieve perfect performance. The descriptive
statistics are presented in Table 3, and Figure 10 presents the fitness curves. The seeded
HyperNEAT condition was fastest in solving the task, followed by HyperNEAT-LEO.

Figure 10: Average fitness on the XOR task over 20 runs.

Table 3: Descriptive statistics for the XOR task

Condition | Mean | Min | Max | St. Dev. | 95% CI Low | 95% CI High
ES-HyperNEAT-CCT | 13.18 | 9 | 15.79 | 2.64 | 12.03 | 14.34
ES-HyperNEAT-seeded | 15.65 | 9.01 | 16 | 1.52 | 14.98 | 16
HyperNEAT-CCT | 14 | 5.74 | 16 | 3.29 | 12.56 | 15.45
HyperNEAT-Seeded | 15.84 | 12.78 | 16 | 0.7 | 15.53 | 16
HyperNEAT-LEO | 15.23 | 7.63 | 16 | 2.31 | 14.21 | 16
An interesting result here is the seeded ES-HyperNEAT. It had the lowest fitness until
about generation 75, when it rapidly overtook the other conditions. ES-HyperNEAT with
connection cost had the lowest average performance across the entire evolution.

Finally, the 95% confidence intervals show that both seeded conditions significantly
outperform ES-HyperNEAT with connection cost. HyperNEAT-LEO performed on par with
the other two fixed substrate conditions and the seeded ES-HyperNEAT, and had significantly
higher performance than ES-HyperNEAT-CCT. Finally, the confidence intervals show some
overlap between the seeded ES-HyperNEAT and HyperNEAT-CCT. These two conditions
were further compared with a Mann-Whitney U test, which showed that the seeded
ES-HyperNEAT significantly outperforms HyperNEAT-CCT (U = 64, p < 0.01).

Figure 11: 95% confidence intervals for the XOR task. ES-HyperNEAT-CCT shows a significantly lower score than all conditions but HyperNEAT-CCT.
3.2 Retina task
No condition was able to solve the retina task within 3000 generations. However, fitness
was still rising for all conditions until the very end of the evolution. The descriptive statistics at
generation 3000 are presented in Table 4. The highest average score at generation 3000 was
held by the unseeded HyperNEAT, while the highest individual score was held by the seeded
ES-HyperNEAT. The lowest average and the lowest overall scores both belonged to
ES-HyperNEAT with connection cost.

Table 4: Retina performance at generation 3000

Condition | Mean | Min | Max | St. Dev. | 95% CI Low | 95% CI High
ES-HyperNEAT-CCT | 0.166 | 0.116 | 0.244 | 0.023 | 0.156 | 0.176
ES-HyperNEAT-seeded | 0.237 | 0.164 | 0.492 | 0.082 | 0.201 | 0.273
HyperNEAT-CCT | 0.206 | 0.155 | 0.26 | 0.03 | 0.192 | 0.219
HyperNEAT-Seeded | 0.193 | 0.164 | 0.278 | 0.034 | 0.178 | 0.207
HyperNEAT-LEO | 0.249 | 0.172 | 0.382 | 0.054 | 0.225 | 0.272

Figure 12 shows the curves of the averaged best fitness. ES-HyperNEAT-CCT maintains
the lowest average score throughout the entire evolution, while the highest average score varies
between the seeded ES-HyperNEAT and HyperNEAT-LEO. Note that in all cases the speed of
the evolution starts slowing down at a fitness score of around 0.12–0.16.

Figure 13 shows the distribution of fitness for each condition at three points in the
evolution. In general it shows that the entire population improves. Furthermore, as the evolution
progresses, the best individuals become more pronounced and the upper quartiles of the
populations grow. The most interesting scores are those of ES-HyperNEAT-CCT: the variance
of its population drops considerably, which suggests that the population is getting trapped in a
local optimum. At the same time, the presence of outliers shows that some runs are able to
escape it. Another interesting result is that, when outliers are not taken into account, the best
performer overall seems to be the HyperNEAT-LEO condition, with close to 50% of the sample
outperforming the next best condition – the seeded HyperNEAT.
The comparison between the different conditions was done with 95% confidence
intervals. Whenever the CIs had a small (less than 25%) overlap, the conditions were further
compared using Mann-Whitney's U. Finally, the overall presence of differences between the
samples was determined with a Kruskal-Wallis analysis of variance.
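A sketch of this procedure using SciPy is shown below; for simplicity it runs Mann-Whitney tests on all pairs, whereas the analysis above only did so for pairs with overlapping CIs.

```python
from scipy.stats import kruskal, mannwhitneyu

def compare_conditions(samples):
    """`samples` maps condition name -> list of 20 final scores. First an
    overall Kruskal-Wallis test, then pairwise Mann-Whitney U tests."""
    h, p = kruskal(*samples.values())
    print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")
    names = list(samples)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            u, p = mannwhitneyu(samples[a], samples[b],
                                alternative="two-sided")
            print(f"{a} vs {b}: U = {u:.0f}, p = {p:.4f}")
```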
Figure 12: Average best fitness for the retina task.

Figure 13: Fitness distributions at three points of the evolution. Note the change of variance in ES-HyperNEAT-CCT, which suggests that the evolution has stumbled into a local optimum. Also note the change in the upper quartile of HyperNEAT-LEO: by generation 200 it already had the highest fitness values.
The Kruskal-Wallis test showed that samples differ from each other significantly (K =
44.18, p < 0.01). Confidence intervals (fig 15) show which samples differ from each other. First
– the ES-HyperNEAT-CCT has significantly worse performance than the seeded ES. The ES-
HyperNEAT-CCT also performed significantly less than the control condition. Finally – the
seeded ES-HyperNEAT did not perform significantly better than the control condition.
From the two connection cost conditions, the one with the fixed substrate had higher
performance than the evolvable substrate. With the seeded HyperNEAT conditions the situation
is the opposite and the ES-HyperNEAT had better performance. Since there is a slight overlap in
the CIs, the samples were also compared with Mann-Whitney’s U. It showed that the seeded ES-
HyperNEAT is significantly better (U = 95, p<0.01). The HyperNEAT-LEO condition
significantly outperformed all conditions save for the seeded ES-HyperNEAT condition.
Finally, analyzing the accuracy on the retina task revealed similar results (fig. 16, Table
5). The seeded ES-HyperNEAT outperformed all other conditions but the HyperNEAT-CCT.
Here the HyperNEAT-LEO had results similar to those of the other fixed substrate conditions.
The ES-HyperNEAT-CCT again had significantly lower scores than all other conditions. Note,
however, that the accuracy scores are within a few percent of each other, which suggests that the
difference amounts to only a few patterns.
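To put the size of that difference in perspective: assuming the standard retina set-up of eight binary inputs, there are 2^8 = 256 possible input patterns, so the gap between the best and worst mean accuracies in Table 5 amounts to roughly (0.870 - 0.848) × 256 ≈ 5.6, i.e. five or six patterns. The pattern count here is an assumption based on the usual formulation of the task, not a figure reported for these runs.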
Table 5: Accuracy on the retina task (fraction of patterns classified correctly).

Condition             Mean     St. Dev.  95% CI Low  95% CI High
ES-HyperNEAT-CCT      0.8478   0.0126    0.8422      0.8534
ES-HyperNEAT-seeded   0.87031  0.00358   0.8633      0.8773
HyperNEAT-CCT         0.8634   0.0159    0.8593      0.8675
HyperNEAT-Seeded      0.8583   0.0108    0.8536      0.8631
HyperNEAT-LEO         0.8583   0.0108    0.8536      0.8631
Finally, figure 14 presents the change of accuracy over the evolution. All conditions but
the ES-HyperNEAT-CCT reached their maximum accuracy early on in the evolution and barely
improved after that. In contrast, the ES-HyperNEAT-CCT showed slow and gradual
improvement. It is possible that this level of accuracy represents a local optimum that the
conditions were not able to overcome. Furthermore, a look at the classified patterns themselves
showed that the conditions correctly classified both some patterns that belonged to the target
objects and some that did not.
Figure 14: Average accuracy over 20 runs on the retina task. All conditions but the ES-
HyperNEAT-CCT reach their maximum accuracy within the first 50 generations.
Figure 15: 95% Confidence Intervals for the retina task at generation 3000
Figure 16: 95% confidence intervals for retina task accuracy. Seeded ES-HyperNEAT has
significantly higher accuracy scores than all conditions but the HyperNEAT-CCT. ES-
HyperNEAT-CCT has a significantly lower score than all other conditions.
Modularity
The visualizations of all neural networks at generation 3000 are attached as
supplementary material to the current paper. None of the conditions yielded solutions that are
visibly modular. In all cases the networks tend to be highly connected, with variable levels of
complexity. This also holds true for the ES-HyperNEAT with connection cost: despite the
connection cost it managed to produce both very simple and very complex networks (fig. 19),
and the complexity of the networks changed in both directions during the evolution. Another
interesting result is that the two different methods, the CCT ES-HyperNEAT and the seeded
ES-HyperNEAT, could produce solutions with similar topology and topography (fig. 18).

Figure 17: Clustering of nodes in evolved networks. Such node clusters might be indicative of
internal modularity.

Figure 18: Similar networks evolved by different approaches. Left: connection cost ES-
HyperNEAT. Right: seeded ES-HyperNEAT.
Finally, despite the lack of modularity, the networks exhibited different levels of
regularity. Most interesting is that in some of the cases the nodes were clustered into two
groups, in the left and the right half of the substrate (fig. 17). While these networks are still
fairly strongly connected, it is possible that there is an underlying pattern of modularity that is
not visible at a glance.
3.3 Discussion
The results did not support any of the hypotheses. First, the ES-HyperNEAT-CCT did not
outperform the HyperNEAT-CCT: on the XOR task the two conditions had similar performance,
and on the retina task the ES condition performed significantly worse. Second, the ES-HyperNEAT
with connection cost did not outperform the seeded ES-HyperNEAT. In fact, the opposite was
true: the ES-HyperNEAT-CCT had the poorest performance on both tasks, while the seeded
ES-HyperNEAT had the highest. Third, none of the tested approaches exhibited visually
identifiable modularity. Finally, none of the conditions was able to solve the retina task within
3000 generations. In contrast, all were able to solve the much simpler XOR task within 300
generations.

Figure 19: Networks of different complexity, both evolved by ES-HyperNEAT with connection cost.
The poor performance of the connection cost technique with ES-HyperNEAT is
particularly surprising. A look at fig. 13 shows that most solutions ended up grouped around a
narrow range of values, while at the same time each solution is quite distinct. This suggests that
the evolution stalls in a local optimum. The seeded ES-HyperNEAT also experienced a temporary
stall at a quite similar fitness score. Furthermore, at a similar score the learning of all solutions
started slowing down. However, the seeded ES-HyperNEAT was able to overcome it relatively
quickly. Finally, none of the fixed substrate approaches experienced that local optimum. Thus
one can hypothesize that it is a problem specific to the evolvable substrate approach.

Figure 20: A sample evolutionary line from the connection cost ES-HyperNEAT condition. Left:
generation 1000; middle: generation 2000; right: generation 3000.
The two factors that might account for the difference are the presence of seeding in one
case and the CCT in the other. Thus one possible explanation is that the locality seed makes
finding novel solutions easier. In contrast, when starting with an empty genome, evolution might
need more time to find a successful configuration. Alternatively, the evolution might be stuck at
a point that requires a more complex solution, while the connection cost acts as pressure against
added complexity. This narrows down the search space to only those solutions that both work
and have a lower connection cost. Yet the produced phenotypes seem to have varying
complexity, which shows that the ES-HyperNEAT-CCT has no problem producing more
complex phenotypes. Note also that there are outliers with scores close to those of the other
approaches, and their number seems to increase with time. This suggests that the evolution can,
in fact, find these successful solutions. Furthermore, both ES-HyperNEAT conditions were
outfitted with a LEO. Risi & Stanley (2012) note that ES-HyperNEAT with LEO actually
performs worse than ES-HyperNEAT without it. Thus it might be the case that it is the LEO that
is hampering evolution, and not the connection cost.
One way to discover whether the difficulty is caused by the lack of seeding or by the
connection cost would be to compare these approaches to unseeded ES-HyperNEAT without
LEO. Furthermore, trying the connection cost technique itself on ES-HyperNEAT without LEO
could show different results. Another improvement would be a larger sample size: with a sample
size of just 20 it might be difficult to measure actual distributions and statistical effects. Finally,
on both tasks the fitness of the ES-HyperNEAT-CCT was growing the slowest. As the evolution
struggles to balance the objectives, it might need more time to improve on them all. In Huizinga
et al. (2014), fitness was still changing fast at generation 3000. Thus longer evolutionary runs
could actually allow the ES-HyperNEAT-CCT to reach its full potential.
The second result was that none of the approaches evolved visually modular phenotypes.
However, the lack of such modularity does not mean a lack of modularity per se. With more
complex networks, determining modularity is not straightforward, so a more appropriate way to
measure it is needed. One such approach used in the literature is the modularity Q score
(Newman, 2003).
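As a sketch of how such a measurement could be applied to the evolved networks, the following minimal example computes Q with networkx. It assumes the phenotypes can be exported as edge lists; the function name and the sample graph are illustrative and not part of the experimental code.

import networkx as nx
from networkx.algorithms import community

def modularity_q(edges):
    # Newman's modularity Q for a network given as an edge list.
    # Communities are found with greedy modularity maximization; Q near 0
    # means the division is no better than random, larger Q means stronger
    # community (module) structure.
    g = nx.Graph(edges)
    communities = community.greedy_modularity_communities(g)
    return community.modularity(g, communities)

# Hypothetical phenotype: two densely connected clusters joined by one link.
edges = [(0, 1), (0, 2), (1, 2),   # left cluster
         (3, 4), (3, 5), (4, 5),   # right cluster
         (2, 3)]                   # single bridging connection
print(modularity_q(edges))         # noticeably above 0 for this layout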
Other research reports the presence of visual modularity; however, there are some notable
differences from the current paper. For example, Huizinga et al. (2014) ran their task for a much
longer time, 50,000 generations, and they note that modularity did not emerge until evolution
had solved the task. Verbancsics & Stanley (2011) and Risi & Stanley (2012) used shorter runs,
but had a task set-up with two output nodes instead of one.
Both could be causes for the current lack of results. For example, the connection cost
cases might not have run long enough. This early in evolution there is still room to improve on
the other objectives as well as on the connection cost, so modularity might not emerge until it is
the only objective that offers some benefit. For the seeded conditions, it is possible that having
two outputs encourages separated processing, and it might thus be easier for evolution to
discover modularity: it only needs to learn to process the inputs separately. In contrast, a single
output does not suggest that the task needs modularity; evolution would first have to discover
how to process the two parts of the retina, and then how to connect them to the output. Another
problem is that evolution could easily break modularity. For any task there are many more non-
modular solutions available than modular ones, so despite initial seeding or connection cost, the
solutions that are most available are the non-modular ones.
The substrate for the seeded conditions had biases towards locality in both weights and
connectivity. Verbancsics and Stanley (2011) note that biasing the weights towards locality can
limit the emergence of modularity, so the presence of this bias might be a factor in the lack of
modularity. At the same time, Risi and Stanley (2012) had both biases, and modular solutions
were prevalent in their results. Thus another topic of interest is the role of the different seeds
and how they contribute to both fitness and modularity.
A factor in the seeded conditions is that as time passes the CPPN becomes more complex.
Thus, if modularity has not emerged after some time, it might not emerge at all. The probabilities
for the addition of a node or a connection to the CPPN were similar to those in other research,
yet the complexity of the CPPNs still grew quite fast. This might have prevented any stable
patterns from forming. It also led to genome bloat and cases where a genome was much larger
than the neural network it encoded. Furthermore, nothing but the minimal start prevents this from
happening; there is no mechanism that controls the genome size itself. This bloat also defeats the
purpose of generative encoding, which is to encode larger structures in a compact way. The most
intuitive remedy would be to further limit the probabilities for mutating the CPPN. Furthermore,
some notion of genome size could be added to the evolutionary process itself. For example, it
would be trivial to include genome size as an extra objective in the CCT, as sketched below.
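A minimal sketch of what that could look like, assuming the CCT is implemented as NSGA-II-style multi-objective selection. The Individual type and the sample values are illustrative; only the idea of adding genome size as a third minimized objective comes from the text above.

from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float          # task performance, maximized
    connection_cost: float  # cost of expressed connections, minimized
    genome_size: int        # nodes + connections in the CPPN, minimized

def dominates(a: Individual, b: Individual) -> bool:
    # Pareto dominance over three objectives: a dominates b if it is at
    # least as good on every objective and strictly better on at least one.
    at_least_as_good = (a.fitness >= b.fitness
                        and a.connection_cost <= b.connection_cost
                        and a.genome_size <= b.genome_size)
    strictly_better = (a.fitness > b.fitness
                       or a.connection_cost < b.connection_cost
                       or a.genome_size < b.genome_size)
    return at_least_as_good and strictly_better

# A bloated genome with the same fitness and connection cost is now dominated:
a = Individual(fitness=0.24, connection_cost=1.8, genome_size=40)
b = Individual(fitness=0.24, connection_cost=1.8, genome_size=95)
print(dominates(a, b))  # True: bloat alone now counts against b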
Finally, none of the conditions was able to solve the retina task. As with modularity, it
might be the case that the current set-up needs more time. However, there are still some
interesting patterns. For example, the best performing conditions were the seeded ES-HyperNEAT
and the HyperNEAT-LEO, as in Risi & Stanley (2012). In contrast to Huizinga et al. (2014),
the HyperNEAT with connection cost did not outperform the other HyperNEAT conditions. In
contrast to Verbancsics & Stanley (2011), the unseeded HyperNEAT-LEO had the best
performance of the fixed substrate conditions. The results here were closest to Clune et al.
(2010), where the retina task proved to be too hard for HyperNEAT as is; however, simplifying
the task did evolve perfectly modular solutions that were able to solve it. Furthermore, all
conditions here were able to solve the XOR task, so testing them on a simpler problem might
reveal additional results. Finally, it is possible that the differences in the results are due to
different set-ups and parameters.
IV. CONCLUSION
The current paper investigated the application of the connection cost technique to ES-
HyperNEAT. The expectation was that the combination of the two would produce effective
modular neural networks. However, this was not the case: ES-HyperNEAT with connection
cost had the poorest average performance of all tested approaches. Furthermore, it did not yield
any modular solutions that could be identified at a glance. At the same time, the ES-HyperNEAT
with connection cost was able to solve the simpler XOR task. Thus the poor performance might
be due to parameters and task setup.
Since the data is already available, a logical next step would be to measure modularity
with a more reliable approach. For example, the Q score is already used throughout the literature
for calculating the modularity of networks (Newman, 2003). Next, the evolutionary runs in the
current paper were short compared to those in other research. Thus it might be useful to test the
ES-HyperNEAT with connection cost on longer runs and with different setups.
Modularity in nature is likely not the result of a single factor. Thus, another avenue for
future research is the combination of the different approaches to modularity. For example,
enhancing the seeded ES-HyperNEAT with connection cost might further help it take advantage
of the bias in the CPPN pattern. Alternatively, both the seeded and the connection cost
approaches could be tested under modularly varying goals.
Finally, while none of the approaches managed to solve the retina task, all managed to
solve the XOR task. Another possible next step from here is to evaluate the approaches on a
simpler problem. One possibility is the simplified retina problem of Clune et al. (2010).
Alternatively, one could also try the multiple-XOR problem, as in Huizinga et al. (2014).
The take-home lesson of the current paper is that modularity does not emerge easily, even
with established methods. All the tested conditions were able to learn and to improve, yet
modular solutions did not emerge, even under the pressure of connection cost or the bias of
seeding. At the same time, the evolutionary runs were rather short: 3000 generations. Modularity
might be a property that emerges later in evolution. At the early stages it might not offer any
advantage; however, it might become important at later stages of the evolution, when the easier
goals are already reached.
BIBLIOGRAPHY
Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T.
(2011). Dynamic reconfiguration of human brain networks during learning. Proceedings of
the National Academy of Sciences of the United States of America, 108(18), 7641–6.
http://doi.org/10.1073/pnas.1018985108
Bullinaria, J. A. (2007). Understanding the emergence of modularity in neural systems.
Cognitive Science, 31(4), 673–695.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of
structural and functional systems. Nature Reviews. Neuroscience, 10(3), 186–98.
http://doi.org/10.1038/nrn2575
Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Retrieved
February 24, 2015, from http://www.cogsci.msu.edu/DSS/2013-2014/Sporns/Bullmore
Sporns 2012 NRN.pdf
Cardamone, L., Loiacono, D., & Lanzi, P. L. (2009). Evolving competitive car controllers for
racing games with neuroevolution. In Proceedings of the 11th Annual conference on
Genetic and evolutionary computation (pp. 1179–1186).
Clune, J., Beckmann, B. E., McKinley, P. K., & Ofria, C. (2010). Investigating whether
hyperneat produces modular neural networks. In Proceedings of the 12th annual conference
on Genetic and evolutionary computation (pp. 635–642).
Clune, J., Mouret, J.-B., & Lipson, H. (2013). The evolutionary origins of modularity.
Proceedings. Biological Sciences / The Royal Society, 280(1755), 20122863.
http://doi.org/10.1098/rspb.2012.2863
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective
genetic algorithm: NSGA-II. Evolutionary Computation, IEEE Transactions on, 6(2), 182–
197.
Dhar, V. K., Tickoo, A. K., Dubey, R. K., & others. (2009). Comparative performance of some
popular ANN algorithms on benchmark and function approximation problems. arXiv
Preprint arXiv:0911.1210.
Dobbs, D. (2015). Die, selfish gene, die. Retrieved from http://aeon.co/magazine/science/why-
its-time-to-lay-the-selfish-gene-to-rest/
Gomez, F., Schmidhuber, J., & Miikkulainen, R. (2006). Efficient non-linear control through
neuroevolution. In Machine Learning: ECML 2006 (pp. 654–662). Springer.
Hornby, G. S., & Pollack, J. B. (2001a). Evolving L-systems to generate virtual creatures.
Computers & Graphics, 25(6), 1041–1048.
Hornby, G. S., & Pollack, J. B. (2001b). The advantages of generative grammatical encodings
for physical design. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress
on (Vol. 1, pp. 600–607).
Huizinga, J., Clune, J., & Mouret, J.-B. (2014). Evolving neural networks that are both modular
and regular: HyperNEAT plus the connection cost technique. In Proceedings of the 2014
conference on Genetic and evolutionary computation (pp. 697–704).
Kashtan, N., & Alon, U. (2005). Spontaneous evolution of modularity and network motifs.
Proceedings of the National Academy of Sciences of the United States of America, 102(39),
13773–8. http://doi.org/10.1073/pnas.0503610102
Lehman, J., Risi, S., D’Ambrosio, D., & Stanley, K. O. (2013). Encouraging reactivity to create
robust machines. Adaptive Behavior, 1059712313487390.
Li, S., Yuan, J., Shi, Y., & Zagal, J. C. (2015). Growing scale-free networks with tunable
distributions of triad motifs. Physica A: Statistical Mechanics and Its Applications, 428,
103–110.
Lowell, J., & Pollack, J. (2014). The Effect of Connection Cost on Modularity in Evolved Neural
Networks. In ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of
Living Systems (Vol. 14, pp. 726–733).
Meunier, D., Lambiotte, R., & Bullmore, E. T. (2010). Modular and hierarchically modular
organization of brain networks. Frontiers in Neuroscience, 4, 200.
http://doi.org/10.3389/fnins.2010.00200
Newman, M. (2003). Modularity, community structure, and spectral properties of networks.
Physical Review E.
Pugh, J. K., Goodell, S., & Stanley, K. O. (2013). Directional communication in evolved
multiagent teams. University of Central Florida, 1–17.
Risi, S., Lehman, J., & Stanley, K. O. (2010). Evolving the placement and density of neurons in
the hyperneat substrate. Proceedings of the 12th Annual Conference on Genetic and
Evolutionary Computation - GECCO ’10, 563.
http://doi.org/10.1145/1830483.1830589
Risi, S., & Stanley, K. O. (2012). An enhanced hypercube-based encoding for evolving the
placement, density, and connectivity of neurons. Artificial Life, 18(4), 331–63.
http://doi.org/10.1162/ARTL_a_00071
Stanley, K. O. (2007). Compositional pattern producing networks: A novel abstraction of
development. Genetic Programming and Evolvable Machines, 8(2), 131–162.
Stanley, K. O., D’Ambrosio, D. B., & Gauci, J. (2009). A hypercube-based encoding for
evolving large-scale neural networks. Artificial Life, 15(2), 185–212.
Stanley, K. O., & Miikkulainen, R. (1996). Efficient reinforcement learning through evolving
neural network topologies. Network (Phenotype), 1(2), 3.
Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting
topologies. Evolutionary Computation, 10(2), 99–127.
Thivierge, J.-P., & Marcus, G. F. (2007). The topographic brain: from neural connectivity to
cognition. Trends in Neurosciences, 30(6), 251–9. http://doi.org/10.1016/j.tins.2007.04.004
Togelius, J., Shaker, N., Karakovskiy, S., & Yannakakis, G. N. (2013). The mario ai
championship 2009-2012. AI Magazine, 34(3), 89–92.
Trujillo, L., Olague, G., Lutton, E., & De Vega, F. F. (2008). Discovering several robot
behaviors through speciation. In Applications of Evolutionary Computing (pp. 164–174).
Springer.
Verbancsics, P., & Stanley, K. O. (2011). Constraining connectivity to encourage modularity in
HyperNEAT. Retrieved January 14, 2015, from
http://eplex.cs.ucf.edu/papers/verbancsics_gecco11.pdf
Appendix A: Pseudo Code
ES-HyperNEAT.
All images are taken from http://eplex.cs.ucf.edu/ESHyperNEAT/
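Since the pseudocode figures do not survive in this transcript, the following sketch outlines the core of the ES-HyperNEAT information-extraction step as described by Risi & Stanley (2012): the substrate is divided with a quadtree, subdividing further only where the variance of the CPPN output across a region stays above a threshold, so that node density follows the information density of the underlying pattern. This is a simplified reconstruction, not the original listing; names, depths, and thresholds are illustrative.

from dataclasses import dataclass, field

@dataclass
class QuadPoint:
    x: float
    y: float
    width: float            # half the side length of this quadtree square
    level: int
    weight: float = 0.0     # CPPN output sampled at (x, y)
    children: list = field(default_factory=list)

def variance(points):
    mean = sum(c.weight for c in points) / len(points)
    return sum((c.weight - mean) ** 2 for c in points) / len(points)

def division_and_initialization(cppn, source, max_depth=4,
                                variance_threshold=0.03):
    # Divide the substrate around the given source node into a quadtree.
    # `cppn(x1, y1, x2, y2)` is assumed to return the connection weight
    # between two substrate points.
    root = QuadPoint(0.0, 0.0, 1.0, 1)
    queue = [root]
    while queue:
        p = queue.pop(0)
        # Sample the CPPN in the four child quadrants of this square.
        for dx in (-0.5, 0.5):
            for dy in (-0.5, 0.5):
                child = QuadPoint(p.x + dx * p.width, p.y + dy * p.width,
                                  p.width / 2, p.level + 1)
                child.weight = cppn(source[0], source[1], child.x, child.y)
                p.children.append(child)
        # Subdivide further only where the pattern still varies.
        if p.level < max_depth and variance(p.children) > variance_threshold:
            queue.extend(p.children)
    return root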
NSGA-II.
1. Sorting. Source: Deb, Pratap, Agarwal, & Meyarivan (2002).
2. Distance assignment. Source: Deb et al. (2002).
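As with the ES-HyperNEAT listings, the original figures are not reproduced here. The following is a compact sketch of the two NSGA-II procedures referenced above, following Deb et al. (2002); it is a reconstruction for readability, not the authors' code. The `dominates` argument is any Pareto dominance predicate, such as the three-objective one sketched in the discussion.

def fast_non_dominated_sort(pop, dominates):
    # Sort a population into Pareto fronts (Deb et al., 2002, procedure 1).
    S = {p: [] for p in range(len(pop))}   # solutions dominated by p
    n = {p: 0 for p in range(len(pop))}    # count of solutions dominating p
    fronts = [[]]
    for p in range(len(pop)):
        for q in range(len(pop)):
            if dominates(pop[p], pop[q]):
                S[p].append(q)
            elif dominates(pop[q], pop[p]):
                n[p] += 1
        if n[p] == 0:
            fronts[0].append(p)            # p is in the first front
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in S[p]:
                n[q] -= 1
                if n[q] == 0:              # q belongs to the next front
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                     # drop the trailing empty front

def crowding_distance(front, objectives):
    # Assign crowding distances within one front (Deb et al., 2002,
    # procedure 2). `objectives[p]` is a tuple of objective values for p.
    dist = {p: 0.0 for p in front}
    for m in range(len(objectives[front[0]])):
        ordered = sorted(front, key=lambda p: objectives[p][m])
        lo, hi = objectives[ordered[0]][m], objectives[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep extremes
        if hi == lo:
            continue
        for i in range(1, len(ordered) - 1):
            dist[ordered[i]] += (objectives[ordered[i + 1]][m]
                                 - objectives[ordered[i - 1]][m]) / (hi - lo)
    return dist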