
Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks

Andrea Soltoggio, Kenneth O. Stanley, Sebastian Risi

Department of Computer Science, Loughborough University, LE11 3TU, Loughborough, UK, [email protected]
Department of Computer Science, University of Central Florida, Orlando, FL, USA, [email protected]
IT University of Copenhagen, Copenhagen, Denmark, [email protected]

Abstract—Biological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in-silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms, and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and developments are presented.

Index Terms—Artificial Neural Networks, Lifelong Learning, Plasticity, Evolutionary Computation.

I. INTRODUCTION

Over the course of millions of years, evolution has led to the emergence of innumerable biological systems, and intelligence itself, crowned by the evolution of the human brain. Evolution, development, and learning are the fundamental processes that underpin biological intelligence. Thus, it is no surprise that scientists have tried to engineer artificial systems to reproduce such phenomena (Sanchez et al., 1996; Sipper et al., 1997; Dawkins, 2003).

This endeavor is the domain of fields such as artificial intelligence (AI) and artificial life (AL) (Langton, 1997) that seek the emergence of intelligence and life from artificially designed computation: the idea is to abstract the principles from the medium, i.e., biological or in-silico, and utilize such principles to devise algorithms and devices that reproduce properties of their biological counterparts. A possible way to design complex and intelligent systems is to simulate natural evolution in-silico in the field of evolutionary computation (Holland, 1975; Eiben and Smith, 2015). Sub-fields of evolutionary computation such as evolutionary robotics (Harvey et al., 1997; Nolfi and Floreano, 2000) and neuroevolution (Yao, 1999) specifically research algorithms that, exploiting artificial evolution of physical and neural models, seek more autonomous design methods and principles towards artificial intelligence. One assumption is that intelligence in biological organisms relies considerably on powerful and general learning algorithms, designed by evolution, which are executed both during development and throughout life. Thus, research in neuroevolution, and more generally in AI, is progressively moving towards the design, or the evolution, of lifelong online learning neural systems (Coleman and Blair, 2012), rather than static systems.

This paper reviews and organizes the field that studies evolved plastic artificial neural networks, and introduces the acronym EPANN. EPANNs are evolved because parts of their design are determined by an evolutionary algorithm; they are plastic because they are specifically designed to undergo lifetime changes, e.g., in the connectivity among neurons, at various time-scales while experiencing sensory-motor information streams. Lifetime learning is manifested through continuous adaptation to often dynamic scenarios. The final capabilities of such networks are determined by the combination of evolved genetic instructions and continuous learning that takes place as the network interacts with an environment.

EPANNs are motivated by the promise of discovering principles of neural adaptation, learning, and memory, and are not limited to performance optimization on limited datasets. This ambitious endeavor also entails a number of research problems. One problem is the interaction of dynamics across the evolutionary and learning dimensions, which makes the whole system difficult to design and analyze. An open question is what abstractions capture essential computational principles and enable intelligence. A further problem is the high computational cost required to simulate even simple models of lifetime learning and evolution. Finally, experiments to evolve adaptation and intelligence are difficult to benchmark because they aim to achieve loosely defined objectives such as increasing behavioral complexity, intelligence, adaptability, and evolvability (Miconi, 2008). For these reasons, EPANNs explore a much larger search space and rarely compete with machine learning algorithms specifically designed to improve performance on well-defined and narrow problems.

The power of EPANNs, however, derives from two autonomous search processes: evolution and learning, which arguably place them among the most advanced AI and machine learning systems in terms of open-endedness, potential for discovery, and human-free design. These systems rely the least on pre-programmed instructions because they are designed to autonomously evolve while interacting with a real or simulated world. Plastic networks, in particular recurrent plastic networks, are known for their computational power, which has also been hypothesized to go beyond that of Turing Machines (Cabessa and Siegelmann, 2014): evolution can be a valuable tool to explore the power of those computational structures.

In recent years, progress in a number of relevant areas has set the stage for renewed advancement of EPANNs: neural networks, in particular deep networks, are becoming increasingly successful and widely used; there has been a remarkable increase in available computational power by means of parallel GPU computing and dedicated hardware; a better understanding of search, complexity, and evolutionary computation allows for less naive approaches; and finally, neuroscience and genetics provide us with an increasingly large set of inspirational principles. This progress has changed the theoretical and technological landscape in which EPANNs first emerged, providing greater research opportunities than in the past.

This paper aims to delineate a unique research field, first defining the inspirational principles of EPANNs (Section II). The body of research that advanced EPANNs is brought together and described in Section III. Finally, this paper describes research directions that are creating opportunities and challenges (Section IV).

II. INSPIRATION

EPANNs are strongly inspired by the biological evidence of natural computation and intelligence (Floreano and Mattiussi, 2008; Downing, 2015). From a computational perspective, a primary focus is to understand the fundamental ingredients and design principles discovered by nature that resulted in high levels of adaptation and intelligence. Those principles might then be abstracted and implemented in original ways in-silico to achieve results similar or superior to those observed in nature. Crucially, artificial systems do not need to implement the same constraints and limitations as biological systems (Bullinaria, 2003). Thus, inspiration is not simple imitation.

We now know that genetic instructions determine the ultimate capabilities of a brain (Deary et al., 2009; Hopkins et al., 2014). Different animal species manifest different levels of skills and intelligence because of their genetic blueprint (Schreiweis et al., 2014). The intricate structure of the brain emerges from one single zygote cell through a developmental process (Kolb and Gibb, 2011), which is also strongly affected by input-output learning experiences throughout early life (Hensch et al., 1998; Kolb and Gibb, 2011). Yet high levels of plasticity are maintained throughout the entire lifespan (Merzenich et al., 1984; Kiyota, 2017). These dimensions, evolution, development, and learning, also known as the phylogenetic (evolution), ontogenetic (development), and epigenetic (learning) (POE) dimensions (Sipper et al., 1997), are essential for the emergence of biological plastic brains.

The POE dimensions lead to a number of research questions. Can artificial intelligence systems be entirely engineered by humans, or do they need to undergo a less human-controlled process such as evolution? Do intelligent systems need to learn, or could they be born already knowing? Opinions and approaches are diverse. EPANNs assume that both evolution and learning, if not strictly necessary, are conducive to the emergence of artificial intelligence. While artificial evolution is justified by the remarkable achievements of natural evolution, the role of learning has gathered significance in recent years. We are now more aware of the high level of brain plasticity, and of its impact on the manifestation of behaviors and skills (LeDoux, 2003; Doidge, 2007; Grossberg, 2012). Recent developments in machine learning (Michalski et al., 2013; Alpaydin, 2014) and neural learning (Deng et al., 2013; LeCun et al., 2015; Silver et al., 2016) highlight the importance of learning from large input-output data and extensive training. Other areas of cognition such as the capabilities to make predictions (Hawkins and Blakeslee, 2007), to establish associations (Rescorla, 2014), and to regulate behaviors (Carver and Scheier, 2012) are also based on learning from experience. Interestingly, skills such as reading, playing a musical instrument, or driving a car are mastered even though none of those behaviors existed during evolutionary time, and yet they are mostly unique to humans. Thus, human genetic instructions have evolved not to learn specific tasks, but to synthesize recipes to learn a large variety of general skills. We can conclude that the evolutionary search for learning mechanisms in EPANNs tackles both the long-running nature vs. nurture debate (Moore, 2003) and a fundamental, fast-growing research area in AI. EPANNs' inspiration, and consequently this review, focus on evolution and learning, and less on development, because development may be seen as a form of learning when affected by sensory-motor signals. We refer to Stanley and Miikkulainen (2003a) for an overview of artificial developmental theories.

Whilst the range of inspiring ideas is large and heterogeneous, the analysis in this review proposes that EPANNs build upon the following broad principles:

• natural and artificial evolutionary processes,
• plasticity in biological neural networks,
• plasticity in artificial neural networks, and
• lifetime learning environments.

Figure 1 graphically illustrates the inspiration areas and the particular research topics and fields that make up the structure of this review and define EPANNs.

Figure 1: Organization of the field of EPANNs, reflected in the structure of this paper.

A. Evolutionary processes

A central idea in evolutionary computation (Goldberg and Holland, 1988) is that evolutionary processes, similar to those that occurred in nature during the course of billions of years (Darwin, 1859; Dobzhansky, 1970), can be simulated with computer software. This idea led to the belief that intelligent computer programs could emerge with little human intervention by means of evolution in-silico (Holland and Reitman, 1977; Koza, 1992; Fogel, 2006).

The emergence of evolved intelligent software, however, did not occur as easily as initially hoped. The reasons for the slow progress are not completely understood, but a number of problems have been identified, likely related to the simplicity of the early implementations of evolutionary algorithms and the high computational requirements. Current topics of investigation focus on levels of abstraction, diversity in the population, selection criteria, the concepts of evolvability and scalability (Wagner and Altenberg, 1996; Pigliucci, 2008; Lehman and Stanley, 2013), the encoding of genetic information through genotype-phenotype mapping processes (Wagner and Altenberg, 1996; Hornby et al., 2002), and the deception of fitness objectives and how to avoid it (Lehman and Stanley, 2008; Stanley and Lehman, 2015). It is also not clear yet which stepping stones were more challenging for natural evolution (Roff, 1993; Stanley and Lehman, 2015) on the evolutionary path to intelligent and complex forms of life. This lack of knowledge underscores that our understanding of evolutionary search processes is incomplete, and thus the potential to exploit these methods is not fully realized. Nevertheless, artificial evolution provides a tool to search for counter-intuitive and innovative solutions that emerge from adaptation to the environment with minimal or no human intervention. Effective evolutionary algorithms and their desirable features for applications to EPANNs are detailed later in Section III-A.

B. Plasticity in biological neural networks

Biological neural networks demonstrate lifetime learning, from simple reflex adaptation to the acquisition of astonishing skills such as social behavior and learning to speak one or more languages. Those skills are acquired by experiencing stimuli and actions. The fact that experiences guide life-long learning is an old and intuitive notion. Scientific evidence of animal learning from experience was documented initially in the works of behaviorism, with scientists such as Thorndike (1911), Pavlov (1927), Skinner (1938, 1953), and Hull (1943), who started to test scientifically how experiences cause a change in behavior, in particular as a result of learning associations and observable behavioral patterns (Staddon, 1983). This approach means linking behavior to brain mechanisms and dynamics, an idea initially entertained by Freud (Køppe, 1983) and later by other illustrious scientists (Hebb, 1949; Kandel, 2007). A seminal contribution linking psychology to physiology came from Hebb (1949), whose ubiquitous principle that neurons that fire together, wire together is relevant to understanding both low-level neural wiring and high-level psychological theories (Doidge, 2007). Much later, a Hebbian-compatible rule that regulates synaptic changes according to the firing times of the presynaptic and postsynaptic neurons was observed by Markram et al. (1997) and named Spike-Timing-Dependent Plasticity (STDP).

The seminal work of Kandel and Tauc (1965), and following studies (Clark and Kandel, 1984), were the first to demonstrate that changes in the strength of connectivity among neurons relate to behavioral learning. Walters and Byrne (1983) showed that a single neuron can perform associative learning such as classical conditioning, a class of learning that is observed in simple neural systems such as that of the Aplysia (Carew et al., 1981). Plasticity driven by local neural stimuli, i.e., compatible with the Hebb synapse (Hebb, 1949; Brown et al., 1990), is responsible not only for fine tuning, but also for building a working visual system in the cat's visual cortex (Rauschecker and Singer, 1981).

Biological plastic neural networks are also capable of structural plasticity, which creates new pathways among neurons (Lamprecht and LeDoux, 2004; Chklovskii et al., 2004; Russo et al., 2010): it occurs primarily during development, but there is evidence that it continues well into adulthood (Pascual-Leone et al., 2005). Axon growth, known to be regulated by neurotrophic nerve growth factors (Tessier-Lavigne and Goodman, 1996), was also modeled computationally in Roberts et al. (2014). Developmental processes and neural plasticity are often indistinguishable (Kolb, 1989; Pascual-Leone et al., 2005) because the brain is highly plastic during development. Neuroscientific advances reviewed in Damasio (1999); LeDoux (2003); Pascual-Leone et al. (2005); Doidge (2007); Draganski and May (2008) outline the importance of structural plasticity in learning motor patterns, associations, and ways of thinking. Both structural and functional plasticity in biology are essential to acquiring long-lasting new skills, and for this reason they appear to be an important point of inspiration for EPANNs.

Neuromodulation (Marder and Thirumalai, 2002; Gu, 2002; Bailey et al., 2000) is a mechanism that mediates neural processes by means of modulatory signals. Modulatory chemicals such as acetylcholine (ACh), norepinephrine (NE), serotonin (5-HT), and dopamine (DA) appear to regulate a large variety of neural functions, from arousal and behavior (Harris-Warrick and Marder, 1991; Hasselmo and Schnell, 1994; Marder, 1996; Katz, 1995; Katz and Frost, 1996), to pattern generation (Katz et al., 1994), to memory consolidation (Kupfermann, 1987; Hasselmo, 1995; Marder, 1996; Hasselmo, 1999). Learning by reward in monkeys was linked to dopaminergic activity during the 1990s with studies by Schultz et al. (1993, 1997); Schultz (1998). For these reasons, neuromodulation is considered an essential element in cognitive and behavioral processes, and has been the topic of a considerable amount of work in EPANNs (Section III-F).

In summary, plasticity in biological neural networks is driven by an extraordinarily rich set of dynamics and chemicals. Relevant computational questions for EPANNs include: (1) How does the brain structure form—driven both by genetic instructions and neural activity—and acquire functions and behavior? (2) What are the key plasticity mechanisms from biology that can be applied to EPANNs? (3) Are memories, skills, and behaviors stored in plastic synaptic connections, in patterns of activities, or a combination of both? Whilst neuroscience continues to provide inspiration and insight into plasticity in biological brains, EPANNs serve the complementary objective of seeking, implementing, and verifying designs of bio-inspired methods for adaptation, learning, and intelligent behavior.

C. Plasticity in artificial neural networks

In EPANN experiments, evolution can be seen as a meta-learning process. Established learning rules are often used as ingredients that evolution uses to search for good parameter configurations, efficient combinations of rules and network topologies, new functions representing novel learning rules, etc. EPANN experiments are suited to include the largest possible variety of rules because of (1) the variety of possible tasks in a simulated behavioral experiment and (2) the flexibility of evolution to combine rules with no assumptions about their dynamics. The following gives a snapshot of the extent and scope of various learning algorithms for ANNs that can effectively be building blocks of EPANNs.

In supervised learning, backpropagation is the most popular learning rule used to train both shallow and deep networks (Rumelhart et al., 1988; Widrow and Lehr, 1990; LeCun et al., 2015) for classification or regression. Unsupervised learning is implemented in neural networks with self-organizing maps (SOM) (Kohonen, 1982, 1990), auto-encoders (Bourlard and Kamp, 1988), restricted Boltzmann machines (RBM) (Hinton and Salakhutdinov, 2006), Hebbian plasticity (Hebb, 1949; Gerstner and Kistler, 2002a; Cooper, 2005), and various combinations of the above (Willshaw and Dayan, 1990). Hebbian rules, given their biological plausibility and unsupervised learning dynamics, are a particularly important inspirational principle. Variations have been proposed to include, e.g., terms to achieve stability (Oja, 1982; Bienenstock et al., 1982) and various constraints (Miller and Mackay, 1994), or more advanced update dynamics such as dual weights for fast and slow decay (Levy and Bairaktaris, 1995; Hinton and Plaut, 1987; Bullinaria, 2009a; Soltoggio, 2015). The final learning dynamics of those rules often depend on the topology of the network and on the parameters of the rule. New Hebbian rules have recently been proposed to minimize defined cost functions (Pehlevan et al., 2015; Bahroun et al., 2017).
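As a concrete illustration of such a stability term, the sketch below contrasts a plain Hebbian update with Oja's rule (Oja, 1982), whose decay term keeps weights bounded. This is a minimal sketch with variable names of our choosing, not code from the works cited.

```python
import numpy as np

def hebbian_update(w, x_pre, x_post, eta=0.01):
    # Plain Hebbian rule: weights grow without bound when pre- and
    # postsynaptic activities are correlated.
    return w + eta * x_pre * x_post

def oja_update(w, x_pre, x_post, eta=0.01):
    # Oja's rule adds a decay term proportional to x_post^2 * w, which
    # bounds the weight vector (it converges toward the first principal
    # component of the input distribution).
    return w + eta * x_post * (x_pre - x_post * w)
```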

Often, behavior learning requires a measure of the quality of such behaviors, implemented as positive or negative values that represent the consequences of stimulus-action sequences. This type of learning, also referred to as trial and error, or operant reward learning in animal learning, was formalized mathematically as Reinforcement Learning (RL) (Sutton and Barto, 1998). RL in neural networks (Pennartz, 1997) makes use of robotic or agent-environment problems with a reward signal that affects plasticity in the network (Lin, 1993; Niv et al., 2002; Soltoggio and Stanley, 2012; Auerbach et al., 2014; Pugh et al., 2014). Neuromodulated plasticity (Fellous and Linster, 1998) is often used to implement reward-learning in neural networks (reviewed later in Section III-F). Such a modulation of signals, or gated learning (Abbott, 1990), allows for the regulation and learning of behaviors as shown in numerous computational models (Baxter et al., 1999; Suri et al., 2001; Birmingham, 2001; Alexander and Sporns, 2002; Doya, 2002; Fujii et al., 2002; Suri, 2002; Ziemke and Thieme, 2002; Sporns and Alexander, 2003; Krichmar, 2008).
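The gating idea can be sketched in one line: a modulatory signal multiplies an otherwise Hebbian update. This is a generic sketch of modulated plasticity, not the formulation of any specific model cited above.

```python
def modulated_hebbian_update(w, x_pre, x_post, m, eta=0.01):
    # The modulatory signal m gates learning: m = 0 freezes the weight,
    # m > 0 reinforces the pre/post correlation (e.g., after reward),
    # and m < 0 weakens it (e.g., after punishment).
    return w + eta * m * x_pre * x_post
```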

Plastic neural models are also used to demonstrate how behavior can emerge from a particular circuitry modeled after biological brains. Computational models of, e.g., the basal ganglia and modulatory systems may propose plasticity mechanisms and aim to demonstrate the computational relations among various nuclei, pathways, and learning processes (Krichmar, 2008; Vitay and Hamker, 2010; Schroll and Hamker, 2015).

Finally, plasticity rules for spiking neural networks (Maass and Bishop, 2001) aim to demonstrate unique learning mechanisms that emerge from spiking dynamics (Markram et al., 1997; Izhikevich, 2006, 2007), as well as to model biological synaptic plasticity (Gerstner and Kistler, 2002b).
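For reference, a common exponential form of the STDP window mentioned above can be sketched as follows; the amplitudes and time constant are illustrative choices, not values from the cited studies.

```python
import math

def stdp_delta(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    # dt_ms = t_post - t_pre. Pre-before-post spiking (dt_ms > 0)
    # potentiates the synapse; post-before-pre depresses it, with
    # exponentially decaying influence as the spikes grow further apart.
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)
    return -a_minus * math.exp(dt_ms / tau_ms)
```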

Plasticity in neural networks, when continuously active, was also observed to cause catastrophic forgetting (Robins, 1995). If learning occurs continuously, new information or skills have the potential to overwrite previously acquired information or skills (Abraham and Robins, 2005; Finnie and Nader, 2012).

In conclusion, plasticity is central to most approaches to ANNs, but in the majority of cases, a careful matching and engineering of rules, architectures, and problems is necessary, requiring considerable design effort. The variety of algorithms also reflects the variety of problems and solutions. EPANNs offer a unique testing tool to assess the effectiveness and suitability of different models, or their combination, in a variety of different scenarios.

D. Lifelong learning environments

Evolution works on organisms within an environment to exploit its resources, find survival solutions, and adapt to changes (Darwin, 1859). However, what makes an environment conducive to the evolution of learning and intelligence is not clear. In particular, what are the challenges faced by learning organisms in the natural world, and how can those be abstracted and ported to a simulated environment?

In the early phases of AI, logic and reasoning were thought to be key to intelligence (Crevier, 1993), so symbolic input-output mappings were employed as tests. Soon it became evident that intelligence is not only symbol manipulation, but resides also in subsymbolic problem-solving abilities emerging from the interaction of brain, body, and environment (Steels, 1993; Sims, 1994), so more complex simulators of real-life environments were developed. A further step in complexity is represented by the high-level planning and strategies required, e.g., when applying AI to games (Allis et al., 1994; Millington and Funge, 2016) or articulated robotic tasks. Planning and decision making with high-bandwidth sensory-motor information flow, such as those required for humanoid robots or self-driving vehicles, are current benchmarks of intelligent systems. Finally, environments in which affective dynamics and feelings play a role are recognized as important for human wellbeing (De Botton, 2016; Lee and Narayanan, 2005). Those intelligence-testing environments are effectively the "worlds" in which EPANNs live and evolve, and thus largely shape the design process.

Such different testing environments have very different features, dynamics, and goals. In some cases, a precise target behavior exists and is known, and an error between desired and actual behavior can be used in supervised learning. Other times, the environment provides rewards, but the behaviors that lead to rewards are unknown. In such cases, the objective is to search for behavioral policies that lead to collecting rewards. In yet other cases, it is useful to find relationships and regularities in the environment to construct complex behaviors and accomplish particular tasks, which can either lead to reward, exploit resources, or guarantee survival. Finally, recent advances in the theory of evolutionary computation (Lehman and Stanley, 2011; Stanley and Lehman, 2015) suggest that it is not always the fittest, but at times the novel, individual or behavior that can exploit environmental niches, thus leading to creative evolutionary processes such as those observed in nature. Such an increasingly large range of testing environments and multi-skill integration also calls for new evaluation metrics (see Section IV-H).

III. PROGRESS ON EVOLVING ARTIFICIAL PLASTIC NEURAL NETWORKS

After outlining important features of evolutionary algorithms that operate with EPANNs (Section III-A), this section proceeds to review studies that have evolved plastic neural networks. Those studies generally include an evolutionary process, neural plastic models, and a learning environment: a simplified graphical representation of an EPANN setup is given in Fig. 2.

Figure 2: Main elements of an EPANN setup in which simulated evolution (left) and an environment (right) allow plastic networks to evolve through generations and learn within a lifetime in the environment. The network may include diverse elements, e.g., modules or architectures, modulatory neurons, inputs/outputs, learning parameters, and other structures that enable learning.

A. Evolutionary algorithms for EPANNs

As opposed to parameter optimization (Back and Schwefel, 1993), in which search spaces are often of fixed dimension and static, EPANNs evolve in dynamic search spaces in which lifetime learning further increases the complexity of the evolutionary search. Evolutionary algorithms (Holland, 1975; Michalewicz, 1994) for EPANNs often require additional advanced features to cope with the challenges of the evolution of learning and open evolutionary design (Bentley, 1999). Desirable properties of EAs for EPANNs are listed in the following.

1) Variable genotype length and growing complexity: In many learning problems, the complexity and the characteristics of the solution required to solve the problem are not known in advance. Therefore, a desirable property of the EA is that of increasing the length of the genotype, and thus the information contained in it, as evolution discovers increasingly more complex strategies and solutions.

2) Indirect genotype to phenotype encoding: In nature, phenotypes are expressions of a more compact representation, the genetic code. Similarly, evolutionary algorithms may represent genetic information in a compact form, which is then mapped to a larger phenotype.

3) Expressing regularities, repetitions, and patterns: Indirect encodings are beneficial when they can use one set of instructions in the genotype to generate more parts in the phenotype. This may involve expressing regularities like symmetry (e.g., symmetrical neural architectures), repetition (e.g., neural modules), repetition with variation (similar neural modules), and patterns (e.g., motifs in the neural architecture).

4) Effective exploration via mutation and recombination: Genetic mutation and sexual reproduction in nature allow for the expression of a variety of phenotypes and for the exploration of new solutions, but seldom lead to highly unfit individuals (Ay et al., 2007). Similarly, EAs for EPANNs need to be able to mutate and recombine genomes without destroying the essential properties of the solutions. As opposed to other heuristic searches, EAs use recombination to generate new solutions from two parents: how to effectively recombine genetic information from two EPANNs is still an open question.

5) Genetic encoding of plasticity rules: Just as neural networks need a genetic encoding to be evolved, so do plasticity rules. EPANN algorithms require the integration of such a rule in the genome. The encoding may be restricted to a simple parameter search, but the field is moving progressively towards the evolution of arbitrary and general plasticity rules. Plasticity may also be applied to all or parts of the network, thus effectively implementing the evolution of learning architectures. A minimal sketch of such a genome follows this list.

6) Diversity, survival criteria, and low selection pressure: The variety of solutions in nature seems to suggest that diversity is a key aspect of natural evolution. EAs for EPANNs are likely to perform better when they can maintain diversity in the population, both at the genotype and phenotype levels.


Niche exploration and behavioral diversity could also play a key role for creative design processes. Low selection pressure might be crucial to evolve learning in deceptive environments (see Section III-E).

7) Open-ended evolution: Simple evolutionary experiments seem to quickly optimize parameters and reach a fitness plateau without further improvement. Open-ended evolution seeks to set up experiments that can evolve indefinitely more complex solutions given sufficient computational power (Taylor et al., 2016).
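To make properties III-A1 and III-A5 concrete, the sketch below shows one hypothetical way a variable-length genome might couple connection genes with the parameters of an evolvable plasticity rule. All structures and names are illustrative, not taken from any specific system reviewed here.

```python
from dataclasses import dataclass, field
import random

@dataclass
class ConnectionGene:
    src: int          # presynaptic neuron id
    dst: int          # postsynaptic neuron id
    weight: float     # initial weight at birth
    plastic: bool     # whether lifetime learning may change this weight (III-A5)

@dataclass
class Genome:
    connections: list = field(default_factory=list)               # variable length (III-A1)
    rule_params: list = field(default_factory=lambda: [0.0] * 4)  # theta of a learning rule

    def mutate(self, n_neurons=8, p_add=0.05, sigma=0.1):
        # Perturb rule parameters and initial weights...
        self.rule_params = [t + random.gauss(0, sigma) for t in self.rule_params]
        for c in self.connections:
            c.weight += random.gauss(0, sigma)
        # ...and occasionally grow the genome with a new plastic connection.
        if random.random() < p_add:
            self.connections.append(ConnectionGene(
                random.randrange(n_neurons), random.randrange(n_neurons),
                random.gauss(0, 1), plastic=True))
```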

Many evolutionary algorithms include one or more of those desirable properties (Fogel, 2006). Due to the complexity of neural network design, the field of neuroevolution was the first to explore most of those extensions of standard evolutionary algorithms. Popular algorithms include early work by Angeline et al. (1994) and Yao and Liu (1997) to evolve fixed weights (i.e., weights that do not change while the agent interacts with its environment) and the topology of arbitrary neural networks, e.g., recurrent ones (addressing III-A1 and III-A4). Neuroevolution of Augmenting Topologies (NEAT) (Stanley and Miikkulainen, 2002) leverages three main aspects: a recombination operator intended to preserve network function (addressing III-A4); speciation (addressing III-A6); and evolution from small to larger networks (addressing III-A1). Analog Genetic Encoding (AGE) (Mattiussi and Floreano, 2007) is a method for indirect genotype to phenotype mapping that can be used in combination with evolution to design arbitrary network topologies (III-A1 and III-A2). HyperNEAT (Stanley et al., 2009) is an indirect representation method that combines the NEAT algorithm with compositional pattern producing networks (CPPNs) (Stanley, 2007) (III-A1, III-A2, and III-A3). Novelty search (Lehman and Stanley, 2008, 2011) was introduced as an alternative to the survival of the fittest as a selection mechanism (III-A6). Initially, these algorithms were not devised to evolve plastic networks, but rather fixed networks in which the final synaptic weights were encoded in the genome. To operate with EPANNs, these algorithms need to integrate additional genotypical instructions to evolve plasticity rules (III-A5).

By combining these features (III-A1 - III-A7), neuroevolution of plastic networks aims to explore extremely large search spaces in fundamentally different and more creative ways than traditional heuristic searches over parameters and hyper-parameters.

B. Evolving plasticity rules

Early experiments evolved the parameters of learning rules for fixed or hand-designed ANN architectures. Learning rules are functions that change the connection weight w between two neurons, and are generally expressed as

∆w = f(x, θ) , (1)

where x is a vector of neural signals and θ is a vector of fixed parameters. The incoming connection weights w to a neuron i are used to determine the activation value

xi = σ(∑j wji · xj) , (2)

where xj are the activation values of the presynaptic neurons that connect to neuron i with weights wji, and σ is a nonlinear function such as the sigmoid or the hyperbolic tangent. x may provide local signals such as pre- and postsynaptic activities, the value of the weight w, and modulatory or error signals.
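Eqs. 1 and 2 can be written compactly in code. The sketch below is a minimal illustration, assuming a tanh activation and one common parameterization of f as a linear combination of local signal products; the parameter names are ours.

```python
import numpy as np

def activation(x_pre, w):
    # Eq. 2: activation of neuron i from presynaptic activities x_pre
    # and the incoming weight vector w.
    return np.tanh(np.dot(w, x_pre))

def delta_w(x_pre, x_post, w, theta):
    # Eq. 1: a generic local plasticity rule, Delta-w = f(x, theta),
    # here parameterized as a weighted sum of local signals. Evolution
    # searches over theta; e.g., theta = (eta, 0, 0, 0) recovers the
    # plain Hebbian rule of Eq. 3 below.
    t0, t1, t2, t3 = theta
    return t0 * x_pre * x_post + t1 * x_pre + t2 * x_post + t3 * w
```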


Bengio et al. (1990, 1992) proposed the optimization of the parameters θ of generic learning rules with gradient descent, simulated annealing, and evolutionary search for problems such as conditioning, boolean function mapping, and classification. Those studies are also among the first to include a modulatory term in the learning rules. The optimization was shown to improve the performance on those different tasks with respect to manual parameter settings. Chalmers (1990) evolved a learning rule that applied to every connection and had a teaching signal. He discovered that, in 20% of the evolutionary runs, the algorithm evolved the well-known delta rule, or Widrow-Hoff rule (Widrow et al., 1960), used in backpropagation, thereby demonstrating the validity of evolution as an autonomous tool to discover learning rules. Fontanari and Meir (1991) used the same approach as Chalmers (1990) but constrained the weights to binary values. In this case too, evolution autonomously rediscovered a hand-designed rule, the directed drift rule of Venkatesh (1993). They also observed that the performance on new tasks was better when the network evolved on a larger set of tasks, possibly encouraging the evolution of more general learning strategies.
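The idea behind these rediscoveries can be sketched as a rule family whose parameters evolution tunes. The parameterization below is our illustration, not Chalmers' exact formulation: a linear family over local signals plus a teaching signal t that contains the delta rule as a special case.

```python
def supervised_rule(x_pre, x_post, t, theta):
    # A small family of supervised local rules. With theta = (eta, -eta, 0)
    # it reduces to the delta (Widrow-Hoff) rule:
    #   Delta-w = eta * x_pre * (t - x_post)
    a, b, c = theta
    return a * x_pre * t + b * x_pre * x_post + c * x_post
```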

With back-propagation of errors (Widrow and Lehr, 1990), the input vector x of Eq. 1 requires an error signal between each input/output pair. Rules that use only local signals have been a more popular choice for EPANNs, though this is already changing with the rise in effectiveness of deep learning using back-propagation and related methods. In the simplest form, the product of the presynaptic (xj) and postsynaptic (xi) activities and a learning rate η,

∆w = η · xj · xi , (3)

is known as Hebbian plasticity (Hebb, 1949; Cooper, 2005). More generally, any function as in Eq. 1 that uses only local signals is considered a local plasticity rule for unsupervised learning. Baxter (1992) evolved a network that applied the basic Hebbian rule in Eq. 3 to a subset of weights (determined by evolution) to learn four boolean functions of one variable. The network, called Local Binary Neural Net (LBNN), evolved to change its weights to one of two possible values (±1), or to keep weights fixed. The experiment proved that learning can evolve when rules are optimized and applied to individual connections.

Nolfi and Parisi (1993) evolved networks with "auto-teaching" inputs, which could then provide an error signal for the network to adjust weights during its lifetime. The implication is that error signals do not always need to be hand-designed but can be discovered by evolution to fit a particular problem. A set of eight different local rules was used in Rolls and Stringer (2000) to investigate the evolution of rules in combination with the number of synaptic connections for each neuron, different neuron classes, and other network parameters. They found that evolution was effective in selecting from large parameter sets to solve simple linear problems. In Maniadakis and Trahanias (2006), co-evolution was used to evolve agents (each being a network) that could use ten different types of Hebbian-like learning rules for simple navigation tasks: the authors reported that, despite the increase in the search space, using many different learning rules results in better performance but, understandably, a more difficult analysis of the evolved systems. Meng et al. (2011) evolved a gene regulatory network that in turn determined the learning parameters of the BCM rule (Bienenstock et al., 1982), showing promising performance in time series classification and other supervised tasks.

One general finding from these studies is that evolution operates well within large search spaces, particularly when a large set of evolvable rules is used.
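The outer loop shared by the studies above can be summarized as an evolutionary search over rule parameters. The following is a deliberately simple sketch, assuming a user-supplied fitness function that runs a lifetime-learning episode with a candidate θ and returns a score; the population size, mutation strength, and truncation selection are arbitrary illustrative choices.

```python
import random

def evolve_rule_params(fitness, dim=4, pop_size=20, generations=50, sigma=0.1):
    # Truncation-selection evolutionary search over the parameter vector
    # theta of a plasticity rule (Eq. 1). fitness(theta) must evaluate the
    # lifetime learning performance of a network using that rule.
    population = [[random.gauss(0, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        offspring = [[t + random.gauss(0, sigma)
                      for t in random.choice(parents)]
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness)
```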

C. Evolving learning architectures

The interdependency of learning rules and neural architectures led to experiments in which evolution had more freedom over the network's design, including the architecture. The evolution of architectures in ANNs may involve searching for an optimal number of hidden neurons, the number of layers in a network, particular topologies or modules, the type of connectivity, and other properties of the network architecture. In EPANNs, evolving learning architectures means, more specifically, discovering a combination of architectures and learning rules whose synergetic matching enables particular learning dynamics. As opposed to biological networks, EPANNs do not have the neurophysiological constraints, e.g., short neural connections, sparsity, brain size limits, etc., that impose limitations on the natural evolution of biological networks. Thus, biologically implausible artificial systems may nevertheless be evolved in-silico (Bullinaria, 2007b, 2009b).

One seminal early study by Happel and Murre (1994) proposed the evolutionary design of modular neural networks in which modules could perform unsupervised learning, and the intermodule connectivity was shaped by Hebbian rules. The finding was that the use of evolution led to enhanced learning and better generalization capabilities in comparison to hand-designed networks. In Arifovic and Gencay (2001), the authors used evolution to optimize the number of inputs and hidden nodes, and allowed connections in a feedforward neural network to be trained with backpropagation. Abraham (2004) proposed a method called Meta-Learning Evolutionary Artificial Neural Networks (MLEANN) in which evolution searches for initial weights, neural architectures, and transfer functions for a range of supervised learning problems to be solved by evolved networks. The evolved networks were tested in time series prediction and compared with manually designed networks. The analysis showed that evolution consistently found networks with better performance than the hand-designed structures. Khan et al. (2008) proposed an evolutionary developmental system that created an architecture that adapted with learning: the network had a dynamic morphology in which neurons could be inserted or deleted, and synaptic connections formed and changed in response to stimuli. The networks were evolved with Cartesian genetic programming and appeared to improve their performance while playing checkers over the generations. Downing (2007) looked at different computational models of neurogenesis to evolve learning architectures. The proposed evolutionary developmental system focused in particular on abstraction levels and principles such as Neural Darwinism (Edelman and Tononi, 2000). A combination of evolution of recurrent networks with a linear learner in the output was proposed in Schmidhuber et al. (2007), showing that the evolved RNNs were more compact and resulted in better learning than randomly initialized echo state networks (Jaeger and Haas, 2004). In Khan et al. (2011b,a); Khan and Miller (2014), the authors introduced a large number of bio-inspired mechanisms to evolve networks with rich learning dynamics. The idea was to use evolution to design a network capable of advanced plasticity such as dendrite branch and axon growth and shrinkage, neuron insertion and destruction, and many others. The system was tested on the Wumpus World (Russell and Norvig, 2013), a fairly simple problem with no online learning required, but the purpose was to show that evolution can design working control networks even within a large search space of developmental dynamics.

In summary, learning mechanisms and neural architectures are strongly interdependent, but a large set of available dynamics seems to facilitate the evolution of learning. Thus, EPANNs become more effective precisely when manual network design becomes less practical because of complexity and rich dynamics.

D. EPANNs in Evolutionary Robotics

Evolutionary robotics (ER) (Cliff et al., 1993; Floreano and Mondada, 1994, 1996; Urzelai and Floreano, 2000; Floreano and Nolfi, 2004) contributed particularly to the development of EPANNs, providing a testbed for applied controllers in robotics. Although ER had no specific assumptions on neural systems or plasticity (Smith, 2002), robotics experiments suggested that neural control structures evolved with fixed weights perform less well than those evolved with plastic weights (Nolfi and Parisi, 1996; Floreano and Urzelai, 2001b). Floreano and Urzelai (2001a) reported that networks evolved faster when synaptic plasticity and neural architectures were evolved simultaneously. In particular, plastic networks were shown to adapt better in the transition from simulation to real robots. The better simulation-to-hardware transition, and the increased adaptability in changing ER environments, appeared intuitive and supported by evidence (Nolfi and Parisi, 1996). However, the precise nature and magnitude of the changes from simulation to hardware is not always easy to quantify: those studies do not clearly outline the precise principles, e.g., better or adaptive feedback control, minimization principles, etc., that are discovered by evolution with plasticity to produce those advantages. In fact, the behavioral changes required to switch behaviors in simple ER experiments can also take place with non-plastic recurrent neural networks, because inputs can switch the network's activity to different attractors. These dynamics were observed in particular in recurrent networks in which neurons are modeled as leaky integrators with different time constants. Such networks were called continuous time recurrent neural networks (CTRNNs) (Beer and Gallagher, 1992; Yamauchi and Beer, 1994). A study in 2003 observed similar performance in learning when comparing plastic and non-plastic CTRNNs (Stanley and Miikkulainen, 2003b). That study indicates that the learning of plastic networks as described in Blynel and Floreano (2002, 2003) was at that point still a proof-of-concept rather than a superior learning tool, because networks with fixed weights could achieve similar performance. Nevertheless, ER maintained a focus on plasticity as demonstrated, e.g., in The Cyber Rodent Project (Doya and Uchibe, 2005), which investigated the evolution of learning by seeking to implement a number of features such as (1) evolution of neural controllers, (2) learning of foraging and mating behaviors, (3) evolution of learning architectures and meta-parameters, (4) simultaneous learning of multiple agents in a body, and (5) learning and evolution in a self-sustained colony. Plasticity in the form of modulated neural activation was used in Husbands et al. (1998) and Smith et al. (2002) with a network that adapts its activation functions according to the diffusion of a simulated gas spreading through the substrate of the network. Although the robotic visual discrimination tasks did not involve lifetime learning, the plastic networks appeared to evolve faster than networks evolved with fixed activation functions. Similar conclusions were reached in Di Paolo (2003) and Federici (2005). Di Paolo (2002, 2003) evolved networks with STDP for a wheeled robot to perform positive and negative phototaxis, depending on a conditioned stimulus, and observed that networks with fixed weights could learn but had inferior performance with respect to plastic networks. Federici (2005) evolved plastic networks with STDP and an indirect encoding, showing that plasticity helped performance even when adaptivity was not required. Stability and evolvability of simple robotic controllers were investigated in Hoinville et al. (2011), who focused on EPANNs with homeostatic mechanisms.

Experiments in ER in the 1990s and early 2000s revealed the extent, complexity, and multitude of ideas behind the evolutionary design of learning neuro-robotics controllers. They generally indicate that plasticity helps evolution under a variety of conditions, even when lifetime learning is not required, thereby promoting further interest in more specific topics. Among those are the interaction of evolution and learning, the evolution of neuromodulation, and the evolution of indirectly encoded plasticity, as described in the following.

E. Interaction of evolution and learning

EPANN algorithms are influenced by the complex interaction of evolution and learning. Such interaction dynamics are present in any system, i.e., beyond EPANNs, that seeks the evolution of learning. Lifetime learning was first suggested to have an impact on accelerating evolution by Baldwin (1896). The studies in Smith (1986); Hinton and Nowlan (1987); Boers et al. (1995); Mayley (1996); Bullinaria (2001) demonstrated the Baldwin effect with computational simulations. For example, Mayley (1997) investigated the effects of learning on the rate of evolution, showing that a learning process during an individual lifetime, although costly, can improve the speed of evolution of the species. In general, learning can be classified into (1) learning of invariant environmental features, typically affected by the Baldwin effect, and (2) learning of variable environmental features. A large body of work is dedicated to studying the interaction and dynamics of all three POE dimensions, e.g., Paenke (2008); Paenke et al. (2009). Paenke et al. (2007) investigated the influence of plasticity and learning on evolution under directional selection; Mery and Burns (2010) demonstrated that behavioral plasticity affects evolution.


[Figure 3 plot: fitness/reward versus generations (0 to 2500); annotations: "A navigation strategy is first evolved without reward-learning", "A learning strategy is discovered by an EPANN".]

Figure 3: Discovery of a learning strategy during evolution. After approximately 1 000 generations, a reward signal is used by one network to learn the changing location of a stochastic reward. Figure adapted from Soltoggio et al. (2007). This experiment suggests that the evolution of learning may proceed through milestones without obvious gradients in the search.

In EPANNs, similarly complex interaction dynamics between evolution and learning were also observed (Nolfi and Floreano, 1999). Stone (2007) showed that distributed neural representations accelerate the evolution of adaptive behavior because learning part of a skill induced the automatic acquisition of other skill components. One study (Soltoggio et al., 2007) suggested that evolution discovers learning by discrete stepping stones: when evolution discovers by chance a weak mechanism of learning, that mechanism is sufficient to create an evolutionary advantage, and the neural mechanism is subsequently evolved quickly to perfect learning. Fig. 3 shows a sudden jump in the fitness when the agent suddenly evolves a learning strategy: such jumps in fitness graphs are common in evolutionary experiments in which learning is discovered from scratch, rather than optimized, and were observed as early as Fontanari and Meir (1991).

When an environment changes over time, the frequency of those changes plays a role. With timescales comparable to a lifetime, evolution may lead to phenotypic plasticity, which is the capacity for a genotype to express different phenotypes in response to different environmental conditions (Lalejini and Ofria, 2016). The frequency of environmental changes was observed experimentally in plastic neural networks to affect the evolution of learning (Ellefsen, 2014), revealing a complex relationship between environmental variability and evolved learning.

The evolution of neural learning was shown to be inherently deceptive in Risi et al. (2010); Lehman and Miikkulainen (2014). In Risi et al. (2009, 2010), EPANN-controlled simulated robots, evolved in a discrete T-Maze domain, revealed that the evolutionary stepping stones towards adaptive behaviors are often not rewarded by objective-based performance measures. In other words, the necessary evolutionary stepping stones towards achieving adaptive behavior receive a lower fitness score than more brittle solutions with low learning but effective behaviors. A solution to this problem was devised in Risi et al. (2010, 2009), in which novelty search (Lehman and Stanley, 2008, 2011) was adopted as a substitute for performance in the fitness objective with the aim of finding novel behaviors. Novelty search was observed to perform significantly better in the T-Maze domain. Lehman and Miikkulainen (2014) later showed that novelty search can encourage the evolution of more adaptive behaviors across a variety of different variations of the T-Maze learning tasks. As a consequence, novelty search contributed to a philosophical change by questioning the centrality of objective-driven search in current evolutionary algorithms (Stanley and Lehman, 2015). By rewarding novel behaviors, novelty search validates the importance of exploration or curiosity, previously proposed in Schmidhuber (1991, 2006), also from an evolutionary viewpoint. With the aim of validating the same hypothesis, Soltoggio and Jones (2009) devised a simple EPANN experiment in which exploration was more advantageous than exploitation in the absence of reward learning; to do this, the reward at a particular location depleted itself if continuously visited, so that changing location at random in a T-maze became beneficial. Evolution discovered exploratory behavior before discovering reward-learning, which in turn, and surprisingly, led to an earlier evolution of reward-based learning. This experiment suggests, counterintuitively, that a stepping stone to evolving reward-based learning is to encourage non-reward-based exploration.
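
To make the novelty objective concrete, the following is a minimal sketch of how a novelty score can replace fitness; the Euclidean behavior descriptors, the archive of past behaviors, and the parameter k are illustrative assumptions rather than the exact setup of the studies above.

    import numpy as np

    def novelty_score(behavior, archive, k=15):
        # Novelty is the mean distance from a behavior descriptor to its
        # k nearest neighbors among previously observed behaviors:
        # individuals in sparse regions of behavior space score highest.
        dists = np.sort([np.linalg.norm(np.asarray(behavior) - np.asarray(b))
                         for b in archive])
        return float(np.mean(dists[:k]))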

The seminal work in Bullinaria (2003, 2007a, 2009c) proposes the more general hypothesis that learning requires the evolution of long periods of parental protection and late onset of maturity. Similarly, Ellefsen (2013b,a) investigates sensitive and critical periods of learning in evolved neural networks. This fascinating hypothesis has wider implications for experiments with EPANNs, and more generally for machine learning and AI. It is therefore foreseeable that future EPANNs will have a protected childhood during which parental guidance may be provided (Clutton-Brock, 1991; Klug and Bonsall, 2010; Eskridge and Hougen, 2012).

F. Evolving neuromodulation

Growing neuroscientific evidence on the role of neuromodulation (previously outlined in Section II-B) inspired the design of experiments with neuromodulatory signals to evolve control behavior and learning strategies. One particular case is when neuromodulation gates plasticity. Eq. 1 can be rewritten as

∆w = m · f(x, θ) ,    (4)

to emphasize the role of m, a modulatory signal used as a multiplicative factor that can enhance or reduce plasticity (Abbott, 1990). A network may produce many independent modulatory signals m targeting different neurons or areas of the network. Thus, modulation can vary in space and time. Modulation may also affect other aspects of the network dynamics, e.g., modulating activations rather than plasticity (Krichmar, 2008). Graphically, modulation can be represented as a different type of signal affecting various properties of the synaptic connections of an afferent neuron i (Fig. 4).
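
As a minimal illustration of Eq. 4, the sketch below applies a generic plasticity rule f gated by a scalar modulatory signal m; the learning rate eta and the functional form of f are assumptions of this example, not part of the equation.

    def modulated_update(w, m, f, x, theta, eta=0.01):
        # Eq. 4 in code: the raw plasticity f(x, theta) is computed locally,
        # but only takes effect to the extent allowed by the modulatory
        # signal m (m = 0 freezes the weight; larger |m| amplifies change).
        return w + eta * m * f(x, theta)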


[Figure 4 diagram: a standard neuron j connects to a standard neuron i through a plastic connection w_ji; a modulatory neuron k reaches i through a modulatory connection w_ki carrying a modulatory signal m; in the modulated network, one part synthesizes learning signals and another synthesizes control signals.]

Figure 4: A modulatory neuron gates plasticity of the synapses that connect to the postsynaptic neuron. The learning is local, but a learning signal can be created by one part of the network and used to regulate learning elsewhere.

Evolutionary search was used to find the parameters of a neuromodulated Hebbian learning rule in a reward-based armed-bandit problem in Niv et al. (2002). The same problem was used later in Soltoggio et al. (2007) to evolve arbitrary learning architectures with a bio-inspired gene representation method called Analog Genetic Encoding (AGE) (Mattiussi and Floreano, 2007). In that study, evolution was used to search both modulatory topologies and parameters of a particular form of Eq. 4:

∆w = m · (A x_i x_j + B x_j + C x_i + D) ,    (5)

where the parameters A to D determine the influence of four factors in the rule: a multiplicative Hebbian term A, a presynaptic term B, a postsynaptic term C, and a pure modulatory, or heterosynaptic, term D. Such a rule is not dissimilar from those presented in previous studies (see Section 3.2). However, when used in combination with modulation and a search for network topologies, evolution seems to be particularly effective at solving reward-based problems. Kondo (2007) proposed an evolutionary design and behavior analysis of neuromodulatory neural networks for mobile robot control, validating the potential of the method.
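
The rule in Eq. 5 is straightforward to express in code; the sketch below is a direct transcription of the equation, with the presynaptic activation x_j and the postsynaptic activation x_i passed explicitly.

    def abcd_update(x_j, x_i, m, A, B, C, D):
        # Eq. 5: modulated weight change combining a Hebbian term (A),
        # a presynaptic term (B), a postsynaptic term (C), and a pure
        # modulatory, or heterosynaptic, term (D), all gated by m.
        return m * (A * x_i * x_j + B * x_j + C * x_i + D)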

Soltoggio et al. (2008) tested the hypothesis of whether modulatory dynamics held an evolutionary advantage in T-maze environments with changing reward locations. In their algorithm, modulatory neurons were freely inserted or deleted by random mutations, effectively allowing the evolutionary selection mechanism to autonomously pick those networks with advantageous computational components. After evolution, the best performing networks had modulatory neurons regulating learning, and evolved faster than a control evolutionary experiment that could not employ modulatory neurons. Modulatory neurons were maintained in the networks in a second phase of the experiment when genetic operators allowed for the deletion of such neurons but not for their insertion, thus demonstrating their essential function. In another study, Soltoggio (2008b) suggested that evolved modulatory topologies may be essential to separate the learning circuitry from the input-output controller, effectively rediscovering an actor-critic architecture, and shortening the input-output pathways, which sped up decision processes. Soltoggio (2008a) showed that the learning dynamics are affected by tight coupling between rules and architectures in a search space with many equivalent but different control structures. Fig. 4 also suggests that modulatory networks require evolution to find two essential topological structures: what signals or combinations of signals trigger modulation, and what neurons are to be targeted by modulatory signals. In other words, a balance between fixed and plastic architectures, or selective plasticity (DARPA-L2M, 2017), is an intrinsically emergent property of evolved modulated networks.

A number of further studies on the evolution of neuromodulatory dynamics confirmed the evolutionary advantages in learning scenarios (Soltoggio, 2008c). Silva et al. (2012a) used simulations of 2-wheel robots performing a dynamic concurrent foraging task, in which scattered food items periodically changed their nutritive value or became poisonous, similarly to the setup in Soltoggio and Stanley (2012). The results showed that when neuromodulation was enabled, learning evolved faster than when neuromodulation was not enabled, also with multi-robot distributed systems (Silva et al., 2012b). Nogueira et al. (2013) also reported evolutionary advantages in the foraging behavior of an autonomous virtual robot when equipped with neuromodulated plasticity. Harrington et al. (2013) demonstrated how evolved neuromodulation applied to a gene regulatory network consistently generalized better than agents trained with fixed parameter settings. Interestingly, Arnold et al. (2013b) showed that neuromodulatory architectures provided an evolutionary advantage also in reinforcement-free environments, validating the hypothesis that plastic modulated networks have higher evolvability in a large variety of tasks. The evolution of social representations in neural networks was shown to be facilitated by neuromodulatory dynamics in Arnold et al. (2013a). An artificial life simulation environment called Polyworld (Yoder and Yaeger, 2014) helped to assess the advantage of neuromodulated plasticity in various scenarios. The authors found that neuromodulation may be able to enhance or diminish foraging performance in a competitive, dynamic environment.

Neuromodulation was evolved in Ellefsen et al. (2015) in combination with modularity to address the problem of catastrophic forgetting. In Gustafsson (2016), networks evolved with AGE (Mattiussi and Floreano, 2007) for video game playing were shown to perform better with the addition of neuromodulation. In Norouzzadeh and Clune (2016), the authors showed that neuromodulation produced forward models that could adapt to changes significantly better than the controls. They verified that evolution exploited variable learning rates to perform adaptation when needed. In Velez and Clune (2017), diffusion-based modulation, i.e., targeting entire parts of the network, evolved to produce task-specific localized learning and functional modularity, thus reducing the problem of catastrophic forgetting.

The evidence in these studies suggests that neuromodulation is a key ingredient to facilitate the evolution of learning in EPANNs. They also indirectly suggest that neural systems with more than one type of signal, e.g., activation and other modulatory signals, are beneficial in the neuroevolution of learning.


G. Indirectly encoded plasticity

An indirect genotype-to-phenotype mapping means that evolution operates on a compact genotypical representation (analogous to the DNA) that is then mapped into a fully fledged network (analogous to a biological brain). Learning rules may undergo a similar indirect mapping, so that compact instructions in the genome expand to fully fledged plasticity rules in the phenotype. One early study (Gruau and Whitley, 1993) encoded plasticity and development with a grammar tree, and compared different learning rules on a simple static task (parity and symmetry), demonstrating that learning provided an evolutionary advantage in a static scenario. In non-static contexts, and using a T-Maze domain as learning task, Risi and Stanley (2010) showed that HyperNEAT, which usually implements a compact encoding of weight patterns for large-scale ANNs (Fig. 5a), can also encode patterns of local learning rules. The approach, called adaptive HyperNEAT, can encode arbitrary learning rules for each connection in an evolving ANN based on a function of the ANN's geometry (Fig. 5b). Further flexibility was added in Risi and Stanley (2012) to simultaneously encode the density and placement of nodes in substrate space. The approach, called adaptive evolvable-substrate HyperNEAT, makes it possible to indirectly encode plastic ANNs with thousands of connections that exhibit regularities and repeating motifs. Adaptive ES-HyperNEAT allows each individual synaptic connection, rather than neuron, to be standard or modulatory, thus introducing further design flexibility. Risi and Stanley (2014) showed how adaptive HyperNEAT can be seeded to produce a specific lateral connectivity pattern, thereby allowing the weights to self-organize to form a topographic map of the input space. The study shows that evolution can be seeded with specific plasticity mechanisms that can facilitate the evolution of specific types of learning.
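
The following sketch illustrates the indirect mapping at a high level: a single evolved function (the CPPN) is queried for every pair of node coordinates to produce that connection's initial weight and, in the adaptive variant, its per-connection learning-rule parameters. The five-output CPPN signature and the ABCD parameterization are simplifying assumptions for this example; the actual adaptive HyperNEAT variants differ in their details.

    def decode_plastic_network(cppn, coords):
        # Expand one compact genotype (the CPPN) into a full phenotype:
        # every connection receives an initial weight and its own
        # plasticity rule, both determined by the geometry of its endpoints.
        connections = {}
        for src in coords:            # coords: list of (x, y) node positions
            for dst in coords:
                w, A, B, C, D = cppn(*src, *dst)  # query with (x1, y1, x2, y2)
                connections[(src, dst)] = {"weight": w, "rule": (A, B, C, D)}
        return connections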

The effect of indirectly encoded plasticity on the learning and on the evolutionary process was investigated by Tonelli and Mouret (2011, 2013). Using an operant conditioning task, i.e., learning by reward, the authors showed that indirect encodings that produced more regular neural structures also improved the general EPANN learning abilities when compared to direct encodings. In an approach similar to adaptive HyperNEAT, Orchard and Wang (2016) encoded the learning rule itself as an evolving network. They named the approach neural weights and bias update (NWB), and observed that increasing the search space of the possible plasticity rules created more general solutions than those based on only Hebbian learning.

IV. FUTURE DIRECTIONS

The progress of EPANNs reviewed so far is based on rapidly developing theories and technologies. In particular, new advances in AI, machine learning, and neural networks, together with increased computational resources, are currently creating a new fertile research landscape and are setting the groundwork for new directions for EPANNs. This section presents promising research themes that have the potential to extend and radically change the field of EPANNs and AI as a whole.

A. Levels of abstraction and representations

Choosing the right level of abstraction and the right representation (Bengio et al., 2013) are themes at the heart of many problems in AI. In ANNs, low levels of abstraction are more computationally expensive, but might be richer in dynamics. High levels are faster to simulate, but require an intuition of the essential dynamics that are necessary in the model.

One example showing that abstraction levels are still an open question derives from the comparison of two fields: deep learning and bio-physical neural models. The performance of deep networks (Krizhevsky et al., 2012; LeCun et al., 2015; Schmidhuber, 2015), based mainly on rate-based neurons and low precision in the computation, appears to contrast with the low-level simulations of accurate bio-physical models. Examples are the Blue Brain Project (Markram, 2006) and the Human Brain Project (Human-Brain-Project, 2017). Although deep learning and bio-physical models have different aims, they both represent abstractions of neural computation. Research in EPANNs is best placed to address the problem of levels of abstraction because it can autonomously discover the most appropriate level to evolve neural adaptation and intelligence.

Similarly to abstractions, representations play a critical role. Compositional Pattern Producing Networks (CPPNs) (Stanley, 2007), and also the previous work of Sims (1991), demonstrated that structured phenotypes can be generated through a function without going through the dynamic developmental process typical of multicellular organisms. Relatedly, Hornby et al. (2002) showed that different phenotypical representations led to considerably different results in the evolution of furniture. Miller (2014) discussed explicitly the effect of abstraction levels for evolved developmental learning networks, in particular in relation to two approaches that model development at the neuron level or at the network level.

Finding appropriate abstractions and representations, just as it was fundamental in the advances in deep learning to represent input spaces and hierarchical features (Bengio et al., 2013; Oquab et al., 2014), can also extend to representations of internal models, learning mechanisms, and genetic encodings, affecting the algorithms' capabilities of evolving learning abilities.

B. Evolving general learning

One challenge in the evolution of learning is that evolved learning may simply result in a switch among a finite set of evolved behaviors, e.g., turning left or right in a T-Maze in a finite sequence, which is all that evolving solutions encounter during their lifetime. A challenge for EPANNs is to acquire general learning abilities in which the network is capable of learning problems not encountered during evolution. Mouret and Tonelli (2014) propose the distinction between the evolution of behavioral switches and the evolution of synaptic general learning abilities, and suggest conditions that favor these types of learning. General learning can be intuitively understood as the capability to learn any association among input, internal, and output patterns, both in the spatial and temporal dimensions, regardless of the complexity of the problem. Such an objective clearly poses practical and philosophical challenges.


[Figure 5 panels: (a) HyperNEAT; (b) Adaptive HyperNEAT]

Figure 5: Example of an indirect mapping of plasticity rules from a compact genotype to a larger phenotype. (a) ANN nodes in HyperNEAT are situated in physical space by assigning them specific coordinates. The connections between nodes are determined by an evolved Compositional Pattern Producing Network (CPPN; Stanley (2007)), which takes as inputs the coordinates of two ANN neurons and returns the weight between them. In the normal HyperNEAT approach (a), the CPPN is queried once for all potential ANN connections when the agent is born. In adaptive HyperNEAT (b), on the other hand, the CPPN is continually queried during the lifetime of the agent to determine individual connection weight changes based on the location of neurons and, additionally, the activity of the presynaptic and postsynaptic neurons and the current connection weight. Adaptive HyperNEAT is able to indirectly encode a pattern of nonlinear learning rules for each connection in the ANN (b).

Although humans are considered better at general learning than machines, human learning skills are also specific and not unlimited (Ormrod and Davis, 2004). Nevertheless, moving from behavior switches to more general learning is a desirable feature for EPANNs. Encouraging the emergence of general learners may likely involve (1) an increased computational cost for testing in rich environments, which include a large variety of uncertain and stochastic scenarios with problems of various complexity, and (2) an increased search space to explore the evolution of complex strategies and avoid deception.

One promising direction is to evolve the ability to learn algorithms and programs instead of simple associations. This approach has recently shown promise in the context of deep learning: methods such as the Neural Turing Machine (Graves et al., 2014) can learn simple algorithms such as sorting and copying, and approaches such as the Neural Programmer-Interpreters (Reed and de Freitas, 2015) can learn to represent and execute programs.

C. Incremental and social learning

An important open challenge for machine learning in general is the creation of neural systems that can continuously integrate new knowledge and skills without forgetting what they previously learned. A promising approach is progressive neural networks (Rusu et al., 2016), in which a new network is created for each new task, and lateral connections between networks allow the system to leverage previously learned features. In the presence of time delays among stimuli, actions, and rewards, a rule called hypothesis testing plasticity (HTP) (Soltoggio, 2015) implements fast and slow decay to consolidate weights and suggests neural dynamics to avoid catastrophic forgetting. A method to find the best shared weights across multiple tasks, called elastic weight consolidation (EWC), was proposed in Kirkpatrick et al. (2017). Plasticity rules that implement weight consolidation, given their promise to prevent catastrophic forgetting, are likely to become standard components in EPANNs.
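
As an illustration of weight consolidation, the sketch below expresses the quadratic penalty of EWC (Kirkpatrick et al., 2017): each parameter is anchored to its value after previous tasks in proportion to its estimated importance. The diagonal Fisher estimate follows that paper, while the plain-Python data layout is an assumption of this example.

    def ewc_loss(task_loss, params, old_params, fisher, lam):
        # New task loss plus a penalty that pulls each parameter towards
        # its previously learned value, weighted by its Fisher information
        # (a proxy for how important it was for earlier tasks).
        penalty = sum(f * (p - p_old) ** 2
                      for f, p, p_old in zip(fisher, params, old_params))
        return task_loss + (lam / 2.0) * penalty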

Encouraging modularity (Ellefsen et al., 2015; Dürr et al., 2010) or augmenting evolving networks with a dedicated external memory component (Lüders et al., 2016) have been proposed recently. An evolutionary advantage is likely to emerge for networks that can elaborate on previously learned sub-skills during their lifetime to learn more complex tasks.

One interesting case in which incremental learning may play a role is social learning (Best, 1999). EPANNs may learn both from the environment and from other individuals, from scratch or incrementally (Offerman and Sonnemans, 1998). In an early study, McQuesten and Miikkulainen (1997) showed that neuroevolution can benefit from parent networks teaching their offspring through backpropagation. When social, learning may involve imitation, language or communication, or other social behaviors. Bullinaria (2017) proposes an EPANN framework to simulate the evolution of culture and social learning. It is reasonable to assume that future AI learning systems, whether based on EPANNs or not, will acquire knowledge through different modalities, involving direct experience with the environment, but also social interaction, and possibly complex incremental learning phases.

D. Fast learning

Animal learning does not always require myriads of trials.

Humans can very quickly generalize from only a few given examples, possibly leveraging previous experiences and a long learning process during infancy. This type of learning, advocated in AI and robotics systems (Thrun and Mitchell, 1995), is currently still missing in EPANNs. Inspiration for new approaches could come from complementary learning systems (McClelland et al., 1995; Kumaran et al., 2016) that humans seem to possess, which include fast and slow learning components. Additionally, approaches such as probabilistic program induction seem able to learn concepts in one shot at a human level in some tasks (Lake et al., 2015). Fast learning is likely to derive not just from trial-and-error, but also from mental models that can be applied to diverse problems, similarly to transfer learning (Thrun and Mitchell, 1995; Thrun and O'Sullivan, 1996; Pan and Yang, 2010). Reusable mental models, once learned, will allow agents to make predictions and plan in new and uncertain scenarios with similarities to previously learned ones. If EPANNs can discover neural structures or learning rules that allow for generalization, the evolutionary advantage of such a discovery will lead to its full emergence and further optimization.

A rather different approach to accelerating learning was proposed in Fernando et al. (2008); de Vladar and Szathmáry (2015) and called Evolutionary Neurodynamics. According to this theory, replication and selection might happen in a neural system as it learns, thus mimicking evolutionary dynamics at the much faster time scale of a lifetime. We refer to Fernando et al. (2012); de Vladar and Szathmáry (2015) for an overview of the field. The appeal of this method is that evolutionary search can be accelerated by implementing its dynamics at both the evolutionary and the lifetime time-scales.

E. Evolving memory

The consequence of learning is memory, both explicit and implicit (Anderson, 2013), and its consolidation (Dudai, 2012). For a review of computational models of memory, see Fusi (2017). EPANNs may reach solutions in which memory evolved in different fashions, e.g., preserved as self-sustained neural activity, encoded by connection weights modified by plasticity rules, stored with an external memory (e.g., a Neural Turing Machine), or a combination of these approaches. Recurrent neural architectures based on long short-term memory (LSTM) allow very complex tasks to be solved through gradient descent training (Greff et al., 2015; Hochreiter and Schmidhuber, 1997) and have recently shown promise when combined with evolution (Rawal and Miikkulainen, 2016). Neuromodulation and weight consolidation could also be used to target areas of the network where information is stored.

Graves et al. (2014) introduced the Neural Turing Machine (NTM), a network augmented with an external memory that allows long-term memory storage. NTMs have shown promise when trained through evolution (Greve et al., 2016; Lüders et al., 2016, 2017) or gradient descent (Graves et al., 2014, 2016). The Evolvable Neural Turing Machine (ENTM) showed good performance in solving the continuous version of the double T-Maze navigation task (Greve et al., 2016), and avoided catastrophic forgetting in a continual learning domain (Lüders et al., 2016, 2017) because memory and control are separated by design. Research in this area will reveal which computational systems are more evolvable and how memories will self-organize and form in EPANNs.

F. EPANNs and deep learning

Deep learning has shown remarkable results in a variety of different fields (Krizhevsky et al., 2012; Schmidhuber, 2015; LeCun et al., 2015). However, the model structures of these networks are mostly hand-designed, include a large number of parameters, and require extensive experiments to discover optimal configurations. Advances in deep networks have come from manually testing various design aspects such as: different transfer functions, e.g., the ReLU (Krizhevsky et al., 2012); generalization mechanisms, e.g., dropout (Srivastava et al., 2014); networks with different depths, e.g., Simonyan and Zisserman (2014); and others (Schmidhuber, 2015). With increased computational resources, it is now possible to search those design aspects with evolution.

Koutník et al. (2014) used evolution to design a controller that combined evolved recurrent neural networks, for the control part, and a deep max-pooling convolutional neural network to reduce the input dimensionality. The study does not use evolution on the deep preprocessing network itself, but nevertheless demonstrates the evolutionary design of a deep neural controller. Young et al. (2015) used an evolutionary algorithm to optimize two parameters of a deep network: the size (range [1,8]) and the number (range [16,126]) of the filters in a convolutional neural network, showing that the optimized parameters could vary considerably from the standard best-practice values. An established evolutionary computation technique, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES; Hansen and Ostermeier, 2001), was used in Loshchilov and Hutter (2016) to optimize the parameters of a deep network that learns to classify the MNIST dataset. The authors reported performance close to the state of the art using 30 GPU devices.

Real et al. (2017) and Miikkulainen et al. (2017) showed that evolutionary search can be used to determine the topology, hyperparameters, and building blocks of deep networks trained through gradient descent. The performance was shown to rival that of hand-designed architectures in the CIFAR-10 classification task and a language modeling task (Miikkulainen et al., 2017), while Real et al. (2017) also tested the method on the larger CIFAR-100 dataset. Desell (2017) proposes a method called evolutionary exploration of augmenting convolutional topologies, inspired by NEAT (Stanley and Miikkulainen, 2002), which evolves progressively more complex unstructured convolutional neural networks using genetically specified feature maps and filters. This approach is also able to co-evolve neural network training hyperparameters. Results were obtained using 5 500 volunteered computers at the Citizen Science Grid, which evolved competitive results on the MNIST dataset with fewer than 12 500 trained networks over a period of approximately ten days. Liu et al. (2017) used evolution to search for hierarchical architecture representations, showing competitive performance on the CIFAR-10 and ImageNet databases. The Evolutionary DEep Networks (EDEN) framework (Dufourq and Bassett, 2017) aims to generalize deep network optimization to a variety of problems and is interfaced with TensorFlow (Abadi et al., 2016). A number of similar software frameworks are currently being developed.

Fernando et al. (2017) used evolution to determine a subset of pathways through a network that are trained through backpropagation, allowing the same network to learn a variety of different tasks. Fernando et al. (2016) were also able to rediscover convolutional networks by means of the evolution of Differentiable Pattern Producing Networks (Stanley, 2007).

In summary, evolutionary search makes it possible not only to tune the hyperparameters of networks, but also to combine different learning rules in innovative and counterintuitive fashions.


Evolutionary algorithms are particularly suitable for operating with unsupervised and reward-based problems. EPANNs need not be differentiable, as required by gradient-descent-based approaches, thereby allowing a greater variety of architectures (Greve et al., 2016). Finally, the deception in the evolution of learning suggests that gradient descent approaches can encounter limitations when searching for complex learning strategies, leaving EPANNs as a compelling alternative to discover more advanced learning models.

G. GPU implementations and neuromorphic hardware

The progress of EPANNs will crucially depend on implementations that take advantage of the increased computational power of parallel computation with GPUs and neuromorphic hardware (Jo et al., 2010; Monroe, 2014). Deep learning greatly benefited from GPU-accelerated machine learning, but also from standardized tools (e.g., Torch, TensorFlow, Theano) that made it easy for anybody to download, experiment with, and extend promising deep learning models.

Howard et al. (2011, 2012, 2014) devised experiments to evolve plastic spiking networks implemented as memristors for simulated robotic navigation tasks. Memristive plasticity was consistently observed to enable higher performance than constant-weighted connections in both static and dynamic reward scenarios. Carlson et al. (2014) used GPU implementations to evolve plastic spiking neural networks with an evolution strategy, which resulted in an efficient and automated parameter tuning framework.

In the context of newly emerging technologies, it is worth noting that, just as GPUs were not initially developed for deep learning, so novel neural computation tools and hardware systems, not developed for EPANNs, can now be exploited to set up the articulated processes and computationally expensive experiments required by EPANNs.

H. Measuring progress

The number of platforms and environments for testing the capabilities of intelligent systems is constantly growing, e.g., the Atari or General Video Game Playing Benchmark (GVGAI, 2017), Project Malmo (Microsoft, 2017), or the OpenAI Universe, "a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications" (OpenAI, 2017). Measuring performance in supervised learning problems is straightforward because the classification error can be used as a metric. However, EPANNs are often evolved in reward-based, survival, or novelty-oriented environments to discover new, unknown, or creative learning strategies or behaviors. Thus, desired behaviors or errors are not always defined. Moreover, the goal for EPANNs is often not to be good at solving one particular task, but rather to test the capability to evolve the learning required for a range of problems, to generalize to new problems, or to recover performance after a change in the environment. Therefore, EPANNs will require the community to devise and accept new metrics based on one or more objectives such as the following:

• the time (on the evolutionary scale) to evolve the learning mechanisms in one or more scenarios;

• the time (on the lifetime scale) for learning in one or more scenarios;

• the number of different tasks that an EPANN evolves to solve;

• a measure of the variety of skills acquired by one EPANN;

• the complexity of the tasks and/or datasets, e.g., variations in distributions, stochasticity, etc.;

• the robustness and generalization capabilities of the learner;

• the recovery time in the face of high-level variations or changes, e.g., data distribution, type of problem, stochasticity levels, etc.;

• computational resources used, e.g., number of lifetime evaluations, length of a lifetime;

• size, complexity, and computational requirements of the solution once deployed;

• novelty or richness of the behavior repertoire from multiple solutions, e.g., the variety of different EPANNs and their strategies that were designed during evolution.

Few of those metrics are currently used to benchmark machine learning algorithms. Research in EPANNs will foster the adoption of such criteria as wider performance metrics for assessing the lifelong learning capabilities (Thrun and Pratt, 2012; DARPA-L2M, 2017) of evolved plastic networks.
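
As a concrete illustration of one of the metrics above, the sketch below estimates the recovery time after an environmental change from a lifetime reward trace; the moving-average window and the externally supplied baseline are assumptions of this example, not an established benchmark.

    def recovery_time(rewards, change_step, baseline, window=50):
        # Lifetime steps needed, after the environment changes at
        # change_step, for the moving-average reward to return to the
        # pre-change baseline; returns the remaining lifetime if it never does.
        for t in range(change_step, len(rewards) - window + 1):
            if sum(rewards[t:t + window]) / window >= baseline:
                return t - change_step
        return len(rewards) - change_step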

V. CONCLUSION

This paper first described the broad inspiration and aspirations behind evolved plastic artificial neural networks (EPANNs), which were shown to draw from large, diverse, and interdisciplinary areas. These inspirations also reveal the ambitious and long-term research objectives of EPANNs, with important implications for artificial intelligence and biology.

EPANNs saw considerable progress in the last two decades, which led to the discovery of a number of further research themes, such as the design of evolutionary algorithms to promote the evolution of learning, a better understanding of the complex interaction dynamics between evolution and learning, the advantages of multi-signal networks such as modulatory networks, and the attempt to evolve learning rules, architectures, and appropriate representations of learning mechanisms.

Finally, this paper outlined how recent scientific and technical progress may enable a step change in the potential of EPANNs. In light of current advances, a pivotal moment for EPANNs may be at hand, positioning them as the next AI tool for finding flexible algorithms capable of discovering principles for general adaptation and intelligent systems.

ACKNOWLEDGEMENTS

We thank John Bullinaria, Kris Carlson, Jeff Clune, Travis Desell, Keith Downing, Dean Hougen, Joel Lehman, Jeff Krichmar, Jay McClelland, Robert Merrison-Hort, Julian Miller, Jean-Baptiste Mouret, James Stone, Eörs Szathmáry, and Joanna Turner for insightful discussions and comments on earlier versions of this paper.


REFERENCES

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

L. F. Abbott. Modulation of Function and Gated Learning in a Network Memory. Proceedings of the National Academy of Sciences of the United States of America, 87(23):9241–9245, 1990.

Ajith Abraham. Meta learning evolutionary artificial neural networks. Neurocomputing, 56:1–38, 2004.

Wickliffe C Abraham and Anthony Robins. Memory retention – the synaptic stability versus plasticity dilemma. Trends in Neurosciences, 28(2):73–78, 2005.

William H. Alexander and Olaf Sporns. An Embodied Model of Learning, Plasticity, and Reward. Adaptive Behavior, 10(3-4):143–159, 2002.

Louis Victor Allis et al. Searching for solutions in games and artificial intelligence. Ponsen & Looijen, 1994.

Ethem Alpaydin. Introduction to machine learning. MIT Press, 2014.

John R Anderson. The architecture of cognition. Psychology Press, 2013.

Peter J Angeline, Gregory M Saunders, and Jordan B Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1):54–65, 1994.

Jasmina Arifovic and Ramazan Gencay. Using genetic algorithms to select architecture of a feedforward artificial neural network. Physica A: Statistical Mechanics and its Applications, 289(3):574–594, 2001.

Solvi Arnold, Reiji Suzuki, and Takaya Arita. Evolution of social representation in neural networks. In Advances in Artificial Life, ECAL 2013: Proceedings of the Twelfth European Conference on the Synthesis and Simulation of Living Systems, pages 425–430. MIT Press, Cambridge, MA, 2013a.

Solvi Arnold, Reiji Suzuki, and Takaya Arita. Selection for reinforcement-free learning ability as an organizing factor in the evolution of cognition. Advances in Artificial Intelligence, 2013:8, 2013b.

Joshua Auerbach, Chrisantha Fernando, and Dario Floreano. Online Extreme Evolutionary Learning Machines. In Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems, 2014.

Nihat Ay, Jessica Flack, and David C Krakauer. Robustness and complexity co-constructed in multimodal signalling networks. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 362(1479):441–447, 2007.

Thomas Bäck and Hans-Paul Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1–23, 1993.

Y. Bahroun, E. Hunsicker, and A. Soltoggio. Building Efficient Deep Hebbian Networks for Image Classification Tasks. In International Conference on Artificial Neural Networks, 2017.

C. H. Bailey, M. Giustetto, Y.-Y. Huang, R. D. Hawkins, and E. R. Kandel. Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nature Reviews Neuroscience, 1(1):11–20, October 2000.

J Mark Baldwin. A new factor in evolution (continued). American Naturalist, pages 536–553, 1896.

Douglas A. Baxter, Carmen C. Canavier, John W. Clark, and John H. Byrne. Computational Model of the Serotonergic Modulation of Sensory Neurons in Aplysia. Journal of Neurophysiology, 82:2914–2935, 1999.

Jonathan Baxter. The Evolution of Learning Algorithms for Artificial Neural Networks. In D. Green and T. Bossomaier, editors, Complex Systems, pages 313–326. IOS, Amsterdam, The Netherlands, 1992.

Randall D. Beer and John C. Gallagher. Evolving dynamical neural networks for adaptive behavior. Adaptive Behavior, 1(1):91–122, 1992.

S. Bengio, Y. Bengio, J. Cloutier, and J. Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, Univ. of Texas, Dallas, Feb 6–8, 1992.

Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier. Learning a synaptic learning rule. Technical report, Université de Montréal, Département d'informatique et de recherche opérationnelle, 1990.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

Peter Bentley. Evolutionary design by computers. Morgan Kaufmann, 1999.

Michael L Best. How culture can guide evolution: An inquiry into gene/meme enhancement and opposition. Adaptive Behavior, 7(3-4):289–306, 1999.

L. Elie Bienenstock, Leon N. Cooper, and Paul W. Munro. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. The Journal of Neuroscience, 2(1):32–48, January 1982.

John T. Birmingham. Increasing Sensor Flexibility Through Neuromodulation. Biological Bulletin, 200:206–210, April 2001.

Jesper Blynel and Dario Floreano. Levels of Dynamics and Adaptive Behavior in Evolutionary Neural Controllers. In Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 272–281. MIT Press, Cambridge, MA, USA, 2002.

Jesper Blynel and Dario Floreano. Exploring the T-Maze: Evolving Learning-Like Robot Behaviors Using CTRNNs. In EvoWorkshops, pages 593–604, 2003.

Egbert JW Boers, Marko V Borst, and Ida G Sprinkhuizen-Kuyper. Evolving neural networks using the "Baldwin effect". In Artificial Neural Nets and Genetic Algorithms, pages 333–336. Springer, 1995.

Hervé Bourlard and Yves Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4-5):291–294, 1988.


Thomas H. Brown, Edward W. Kairiss, and Claude L. Keenan. Hebbian Synapse: Biophysical Mechanisms and Algorithms. Annual Review of Neuroscience, 13:475–511, 1990.

John A Bullinaria. Exploring the Baldwin effect in evolving adaptable control systems. In Connectionist Models of Learning, Development and Evolution, pages 231–242. Springer, 2001.

John A Bullinaria. From biological models to the evolution of robot control systems. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, pages 2145–2164, 2003.

John A Bullinaria. The effect of learning on life history evolution. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 222–229. ACM, 2007a.

John A Bullinaria. Understanding the emergence of modularity in neural systems. Cognitive Science, 31(4):673–695, 2007b.

John A Bullinaria. Evolved Dual Weight Neural Architectures to Facilitate Incremental Learning. In IJCCI, pages 427–434, 2009a.

John A Bullinaria. The importance of neurophysiological constraints for modelling the emergence of modularity. Computational Modelling in Behavioural Neuroscience: Closing the Gap Between Neurophysiology and Behaviour, 2:188–208, 2009b.

John A Bullinaria. Lifetime learning as a factor in life history evolution. Artificial Life, 15(4):389–409, 2009c.

John A. Bullinaria. Imitative and Direct Learning as Interacting Factors in Life History Evolution. Artificial Life, 23, 2017.

Jérémie Cabessa and Hava T Siegelmann. The super-Turing computational power of plastic recurrent neural networks. International Journal of Neural Systems, 24(08):1450029, 2014.

Thomas J. Carew, Edgar T. Walters, and Eric R. Kandel. Classical conditioning in a simple withdrawal reflex in Aplysia californica. The Journal of Neuroscience, 1(12):1426–1437, December 1981.

Kristofor D Carlson, Jayram Moorkanikara Nageswaran, Nikil Dutt, and Jeffrey L Krichmar. An efficient automated parameter tuning framework for spiking neural networks. Neuromorphic Engineering Systems and Applications, 8, 2014.

Charles S Carver and Michael F Scheier. Attention and self-regulation: A control-theory approach to human behavior. Springer Science & Business Media, 2012.

Daniel Crevier. AI: The Tumultuous History of the Search for Artificial Intelligence, 1993.

David J Chalmers. The evolution of learning: An experiment in genetic connectionism. In Proceedings of the 1990 Connectionist Models Summer School, pages 81–90. San Mateo, CA, 1990.

Dmitri B Chklovskii, BW Mel, and K Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782–788, 2004.

Gregory A. Clark and Eric R. Kandel. Branch-specific heterosynaptic facilitation in Aplysia siphon sensory cells. PNAS, 81(8):2577–2581, 1984.

Dave Cliff, Phil Husbands, and Inman Harvey. Explorations in Evolutionary Robotics. Adaptive Behavior, 2(1):73–110, 1993.

Tim H Clutton-Brock. The evolution of parental care. Princeton University Press, 1991.

Oliver J Coleman and Alan D Blair. Evolving plastic neural networks for online learning: review and future directions. In AI 2012: Advances in Artificial Intelligence, pages 326–337. Springer, 2012.

Steven J. Cooper. Donald O. Hebb's synapse and learning rule: a history and commentary. Neuroscience and Biobehavioral Reviews, 28(8):851–874, January 2005.

Antonio R Damasio. The feeling of what happens: Body and emotion in the making of consciousness. Houghton Mifflin Harcourt, 1999.

DARPA-L2M. DARPA, Lifelong Learning Machines. https://www.fbo.gov/spg/ODA/DARPA/CMO/HR001117S0016/listing.html, accessed July 2017, 2017.

Charles Darwin. On the origin of species by means of natural selection, or The preservation of favoured races in the struggle for life. Murray, London, 1859.

Richard Dawkins. The evolution of evolvability. On Growth, Form and Computers, pages 239–255, 2003.

Alain De Botton. Six areas that artificial emotional intelligence will revolutionise. http://www.wired.co.uk/article/wired-world-2016-alain-de-botton-artificial-emotional-intelligence, 2016. Accessed: 12/11/2016.

Harold P de Vladar and Eörs Szathmáry. Neuronal boost to evolutionary dynamics. Interface Focus, 5(6):20150074, 2015.

Ian J Deary, Wendy Johnson, and Lorna M Houlihan. Genetic foundations of human intelligence. Human Genetics, 126(1):215–232, 2009.

Li Deng, Geoffrey Hinton, and Brian Kingsbury. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8599–8603. IEEE, 2013.

Travis Desell. Large Scale Evolution of Convolutional Neural Networks Using Volunteer Computing. arXiv preprint arXiv:1703.05422, 2017.

Ezequiel Di Paolo. Spike-timing dependent plasticity for evolved robots. Adaptive Behavior, 10(3-4):243–263, 2002.

Ezequiel A. Di Paolo. Evolving spike-timing-dependent plasticity for single-trial learning in robots. Philosophical Transactions of the Royal Society, 361(1811):2299–2319, October 2003.

Theodosius Dobzhansky. Genetics of the evolutionary process, volume 139. Columbia University Press, 1970.

Norman Doidge. The brain that changes itself: Stories of personal triumph from the frontiers of brain science. Penguin, 2007.

Keith L. Downing. Supplementing Evolutionary Development Systems with Abstract Models of Neurogenesis. In Dirk Thierens et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2007), pages 990–998. ACM, 2007.

Keith L Downing. Intelligence Emerging: Adaptivity and Search in Evolving Neural Systems. MIT Press, 2015.

Kenji Doya. Metalearning and neuromodulation. Neural Networks, 15(4-6):495–506, 2002.

Kenji Doya and Eiji Uchibe. The Cyber Rodent Project: Exploration and Adaptive Mechanisms for Self-Preservation and Self-Reproduction. Adaptive Behavior, 13(2):149–160, 2005.

Bogdan Draganski and Arne May. Training-induced structural changes in the adult human brain. Behavioural Brain Research, 192(1):137–142, 2008.

Yadin Dudai. The restless engram: consolidations never end. Annual Review of Neuroscience, 35:227–247, 2012.

Emmanuel Dufourq and Bruce A Bassett. EDEN: Evolutionary Deep Networks for Efficient Machine Learning. arXiv preprint arXiv:1709.09161, 2017.

Peter Dürr, Dario Floreano, and Claudio Mattiussi. Genetic representation and evolvability of modular neural controllers. IEEE Computational Intelligence Magazine, 5(3):10–19, 2010.

Gerald M Edelman and Giulio Tononi. A universe of consciousness: How matter becomes imagination. Basic Books, 2000.

Agoston E Eiben and Jim Smith. From evolutionary computation to the evolution of things. Nature, 521(7553):476–482, 2015.

Kai Olav Ellefsen. Different regulation mechanisms for evolved critical periods in learning. In 2013 IEEE Congress on Evolutionary Computation, pages 1193–1200. IEEE, 2013a.

Kai Olav Ellefsen. Evolved sensitive periods in learning. Advances in Artificial Life, ECAL 2013: Proceedings of the Twelfth European Conference on the Synthesis and Simulation of Living Systems, 12:409–416, 2013b.

Kai Olav Ellefsen. The Evolution of Learning Under Environmental Variability. Artificial Life 14: Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems, 2014.

Kai Olav Ellefsen, Jean-Baptiste Mouret, and Jeff Clune. Neural Modularity Helps Organisms Evolve to Learn New Skills without Forgetting Old Skills. PLoS Comput Biol, 11(4):e1004128, 2015.

Brent E Eskridge and Dean F Hougen. Nurturing promotes the evolution of learning in uncertain environments. In Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on, pages 1–6. IEEE, 2012.

Diego Federici. Evolving Developing Spiking Neural Networks. In Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, 2005.

Jean-Marc Fellous and Christiane Linster. Computational Models of Neuromodulation. Neural Computation, 10:771–805, 1998.

Chrisantha Fernando, KK Karishma, and Eörs Szathmáry. Copying and evolution of neuronal topology. PLoS ONE, 3(11):e3775, 2008.

Chrisantha Fernando, Eörs Szathmáry, and Phil Husbands. Selectionist and evolutionary approaches to brain function: a critical appraisal. Frontiers in Computational Neuroscience, 6, 2012.

Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, and Daan Wierstra. Convolution by Evolution: Differentiable Pattern Producing Networks. arXiv preprint arXiv:1606.02580, 2016.

Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A Rusu, Alexander Pritzel, and Daan Wierstra. PathNet: Evolution Channels Gradient Descent in Super Neural Networks. arXiv preprint arXiv:1701.08734, 2017.

Peter SB Finnie and Karim Nader. The role of metaplasticity mechanisms in regulating memory destabilization and reconsolidation. Neuroscience & Biobehavioral Reviews, 36(7):1667–1707, 2012.

D. Floreano and F. Mondada. Evolution of Homing Navigation in a Real Mobile Robot. IEEE Transactions on Systems, Man, and Cybernetics–Part B, 26(3):396–407, 1996.

D. Floreano and J. Urzelai. Neural morphogenesis, synaptic plasticity, and evolution. Theory in Biosciences, 240:225–240, 2001a.

Dario Floreano and Claudio Mattiussi. Bio-inspired artificial intelligence: theories, methods, and technologies. MIT Press, 2008.

Dario Floreano and Francesco Mondada. Automatic creation of an autonomous agent: genetic evolution of a neural-network driven robot. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3, pages 421–430. MIT Press, Cambridge, MA, USA, 1994.

Dario Floreano and Stefano Nolfi. Evolutionary Robotics. The MIT Press, 2004.

Dario Floreano and Joseba Urzelai. Evolution of Plastic Control Networks. Autonomous Robots, 11(3):311–317, 2001b.

David B Fogel. Evolutionary computation: toward a new philosophy of machine intelligence, volume 1. John Wiley & Sons, 2006.

JF Fontanari and R Meir. Evolving a learning algorithm for the binary perceptron. Network: Computation in Neural Systems, 2(4):353–359, 1991.

Akinobu Fujii, Nobuhiro Saito, Kota Nakahira, and Akio Ishiguro. Generation of an Adaptive Controller CPG for a Quadruped Robot with Neuromodulation Mechanism. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, EPFL, Lausanne, Switzerland, pages 2619–2624, 2002.

Stefano Fusi. Computational models of long term plasticity and memory. arXiv preprint arXiv:1706.04946, 2017.

Wulfram Gerstner and Werner M Kistler. Mathematical formulations of Hebbian learning. Biological Cybernetics, 87(5-6):404–415, 2002a.


Wulfram Gerstner and Werner M Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press, 2002b.

David E Goldberg and John H Holland. Genetic algorithms and machine learning. Machine learning, 3(2):95–99, 1988.

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwinska, Sergio Gomez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016.

Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069, 2015.

Rasmus Boll Greve, Emil Juul Jacobsen, and Sebastian Risi. Evolving Neural Turing Machines for Reward-based Learning. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pages 117–124, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4206-3.

ST Grossberg. Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control, volume 70. Springer Science & Business Media, 2012.

Frédéric Gruau and Darrell Whitley. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary computation, 1(3):213–233, 1993.

Q. Gu. Neuromodulatory transmitter systems in the cortex and their role in cortical plasticity. Neuroscience, 111(4):815–853, 2002.

Jimmy Gustafsson. Evolving neuromodulatory topologies for plasticity in video game playing. Master’s thesis, Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering, 2016.

GVGAI. The General Video Game AI Competition. http://www.gvgai.net, 2017. Accessed: 21/01/2017.

Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary computation, 9(2):159–195, 2001.

Bart LM Happel and Jacob MJ Murre. Design and evolution of modular neural network architectures. Neural networks, 7(6):985–1004, 1994.

Kyle Ira Harrington, Emmanuel Awa, Sylvain Cussat-Blanc, and Jordan Pollack. Robot coverage control by evolved neuromodulation. In Neural Networks (IJCNN), The 2013 International Joint Conference on, pages 1–8. IEEE, 2013.

Ronald M. Harris-Warrick and Eve Marder. Modulation of neural networks for behavior. Annual Review of Neuroscience, 14:39–57, 1991.

Inman Harvey, Phil Husbands, Dave Cliff, Adrian Thompson, and Nick Jakobi. Evolutionary robotics: the Sussex approach. Robotics and autonomous systems, 20(2):205–224, 1997.

Michael E. Hasselmo. Neuromodulation and cortical function: modeling the physiological basis of behavior. Behavioural Brain Research, 67:1–27, 1995.

Michael E. Hasselmo. Neuromodulation: acetylcholine and memory consolidation. Trends in Cognitive Sciences, 3:351–359, September 1999.

Michael E. Hasselmo and Eric Schnell. Laminar selectivity of the cholinergic suppression of synaptic transmission in rat hippocampal region CA1: computational modeling and brain slice physiology. Journal of Neuroscience, 14(6):3898–3914, 1994.

Jeff Hawkins and Sandra Blakeslee. On intelligence. Macmillan, 2007.

Donald Olding Hebb. The Organization of Behavior: A Neuropsychological Theory. Wiley, New York, 1949.

Takao K Hensch, Michela Fagiolini, Nobuko Mataga, Michael P Stryker, Steinunn Baekkeskov, and Shera F Kash. Local GABA circuit control of experience-dependent plasticity in developing visual cortex. Science, 282(5393):1504–1508, 1998.

Geoffrey E. Hinton and Steven J. Nowlan. How Learning Can Guide Evolution. Complex Systems, 1:495–502, 1987.

Geoffrey E Hinton and David C Plaut. Using fast weights to deblur old memories. In Proceedings of the ninth annual conference of the Cognitive Science Society, pages 177–186, 1987.

Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural computation, 9(8):1735–1780, 1997.

Thierry Hoinville, Cecilia Tapia Siles, and Patrick Henaff. Flexible and multistable pattern generation by evolving constrained plastic neurocontrollers. Adaptive Behavior, 19(3):187–207, 2011.

John H Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. U Michigan Press, 1975.

John H Holland and Judith S Reitman. Cognitive systems based on adaptive algorithms. ACM SIGART Bulletin, 63:49–49, 1977.

William D Hopkins, Jamie L Russell, and Jennifer Schaeffer. Chimpanzee intelligence is heritable. Current Biology, 24(14):1649–1652, 2014.

Gregory Hornby, Jordan Pollack, et al. Creating high-level components with a generative representation for body-brain evolution. Artificial life, 8(3):223–246, 2002.

Gerard Howard, Ella Gale, Larry Bull, Ben de Lacy Costello, and Andrew Adamatzky. Towards evolving spiking networks with memristive synapses. In Artificial Life (ALIFE), 2011 IEEE Symposium on, pages 14–21. IEEE, 2011.

Gerard Howard, Ella Gale, Larry Bull, Ben de Lacy Costello, and Andy Adamatzky. Evolution of plastic learning in spiking networks via memristive connections. IEEE Transactions on Evolutionary Computation, 16(5):711–729, 2012.

Gerard Howard, Larry Bull, Ben de Lacy Costello, Ella Gale, and Andrew Adamatzky. Evolving spiking networks with variable resistive memories. Evolutionary computation, 22(1):79–103, 2014.

C. L. Hull. Principles of behavior. New York: Appleton-Century, 1943.

Human-Brain-Project. Human Brain Project. https://www.humanbrainproject.eu/en/, accessed July 2017, 2017.

Phil Husbands, Tom Smith, Nick Jakobi, and Michael O’Shea. Better Living Through Chemistry: Evolving GasNets for Robot Control. Connection Science, 10:185–210, 1998.

Eugene M Izhikevich. Polychronization: computation with spikes. Neural computation, 18(2):245–282, 2006.

Eugene M Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral cortex, 17(10):2443–2452, 2007.

Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004.

Sung Hyun Jo, Ting Chang, Idongesit Ebong, Bhavitavya B Bhadviya, Pinaki Mazumder, and Wei Lu. Nanoscale memristor device as synapse in neuromorphic systems. Nano letters, 10(4):1297–1301, 2010.

E. R. Kandel and L. Tauc. Heterosynaptic facilitation in neurones of the abdominal ganglion of Aplysia depilans. The Journal of Physiology, 181:1–27, 1965.

Eric R. Kandel. In Search of Memory - The Emergence of a New Science of Mind. WW Norton & Company, New York, 2007.

Paul S. Katz. Intrinsic and extrinsic neuromodulation of motor circuits. Current Opinion in Neurobiology, 5:799–808, 1995.

Paul S. Katz and William N. Frost. Intrinsic neuromodulation: altering neuronal circuits from within. Trends in Neurosciences, 19:54–61, 1996.

Paul S. Katz, Peter A. Getting, and William N. Frost. Dynamic neuromodulation of synaptic strength intrinsic to a central pattern generator circuit. Nature, 367:729–731, 1994.

Gul Muhammad Khan and Julian Francis Miller. In search of intelligence: evolving a developmental neuron capable of learning. Connection Science, 26(4):297–333, 2014.

Gul Muhammad Khan, Julian F. Miller, and David M. Halliday. Coevolution of neuro-developmental programs that play checkers. In Proceedings of the 8th International Conference on Evolvable Systems (ICES): From Biology to Hardware, volume 5216, pages 352–361. Springer, 2008.

Gul Muhammad Khan, Julian F. Miller, and David M. Halliday. Evolution of Cartesian Genetic Programs for Development of Learning Neural Architecture. Evolutionary Computation, 19(3):469–523, 2011a.

Maryam Mahsal Khan, Gul Muhammad Khan, and Julian F. Miller. Developmental Plasticity in Cartesian Genetic Programming based Neural Networks. In ICINCO 2011 - Proceedings of the 8th International Conference on Informatics in Control, Automation and Robotics, Volume 1, Noordwijkerhout, The Netherlands, 28-31 July, 2011, pages 449–458, 2011b.

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 2017.

Tomomi Kiyota. Neurogenesis and Brain Repair. In Neuroimmune Pharmacology, pages 575–597. Springer, 2017.

Hope Klug and Michael B Bonsall. Life history and the evolution of parental care. Evolution, 64(3):823–835, 2010.

Teuvo Kohonen. Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1):59–69, 1982.

Teuvo Kohonen. The Self-Organizing Map. Proceedings of the IEEE, 78(9):1464–1480, 1990.

Bryan Kolb. Brain development, plasticity, and behavior. American Psychologist, 44(9):1203, 1989.

Bryan Kolb and Robbin Gibb. Brain plasticity and behaviour in the developing brain. Journal of the Canadian Academy of Child & Adolescent Psychiatry, 20(4), 2011.

Toshiyuki Kondo. Evolutionary design and behaviour analysis of neuromodulatory neural networks for mobile robots control. Applied Soft Computing, 7(1):189–202, January 2007.

Jan Koutník, Jürgen Schmidhuber, and Faustino Gomez. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pages 541–548. ACM, 2014.

John R Koza. Genetic programming: on the programming of computers by means of natural selection, volume 1. MIT Press, 1992.

Jeffrey L Krichmar. The neuromodulatory system: a framework for survival and adaptive behavior in a challenging world. Adaptive Behavior, 16(6):385–399, 2008.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

Dharshan Kumaran, Demis Hassabis, and James L McClelland. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534, 2016.

Irving Kupfermann. Cellular Neurobiology: Neuromodulation. Science, 236:863, 1987.

Simo Køppe. The psychology of the neuron: Freud, Cajal and Golgi. Scandinavian journal of psychology, 24(1):1–12, 1983.

Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.

Alexander Lalejini and Charles Ofria. The Evolutionary Origins of Phenotypic Plasticity. In Proceedings of the Artificial Life Conference, 2016.

Raphael Lamprecht and Joseph LeDoux. Structural plasticity and memory. Nature Reviews Neuroscience, 5(1):45–54, 2004.

Christopher G Langton. Artificial life: An overview. MIT Press, 1997.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

Joseph E LeDoux. Synaptic self: How our brains become who we are. Penguin, 2003.

Chul Min Lee and Shrikanth S Narayanan. Toward detecting emotions in spoken dialogs. IEEE transactions on speech and audio processing, 13(2):293–303, 2005.

Joel Lehman and Risto Miikkulainen. Overcoming Deception in Evolution of Cognitive Behaviors. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2014), Vancouver, BC, Canada, July 2014.

Joel Lehman and Kenneth O Stanley. Exploiting Open-Endedness to Solve Problems Through the Search for Novelty. In ALIFE, pages 329–336, 2008.

Joel Lehman and Kenneth O Stanley. Abandoning objectives: Evolution through the search for novelty alone. Evolutionary computation, 19(2):189–223, 2011.

Joel Lehman and Kenneth O Stanley. Evolvability is inevitable: increasing evolvability without the pressure to adapt. PloS one, 8(4):e62186, 2013.

Joseph P Levy and Dimitrios Bairaktaris. Connectionist dual-weight architectures. Language and Cognitive Processes, 10(3-4):265–283, 1995.

Long-Ji Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, School of Computer Science, Carnegie Mellon University, 1993.

Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436, 2017.

Ilya Loshchilov and Frank Hutter. CMA-ES for Hyperparameter Optimization of Deep Neural Networks. arXiv preprint arXiv:1604.07269, 2016.

Benno Lüders, Mikkel Schlager, and Sebastian Risi. Continual Learning through Evolvable Neural Turing Machines. In Proceedings of the NIPS 2016 Workshop on Continual Learning and Deep Networks (CLDL 2016), 2016.

Benno Lüders, Mikkel Schlager, Aleksandra Korach, and Sebastian Risi. Continual and One-Shot Learning through Neural Networks with Dynamic External Memory. In Proceedings of the 20th European Conference on the Applications of Evolutionary Computation (EvoApplications 2017), 2017.

Wolfgang Maass and Christopher M Bishop. Pulsed neural networks. MIT Press, 2001.

Michail Maniadakis and Panos Trahanias. Modelling brain emergent behaviours through coevolution of neural agents. Neural Networks, 19(5):705–720, 2006.

Eve Marder. Neural modulation: Following your own rhythm. Current Biology, 6(2):119–121, 1996.

Eve Marder and Vatsala Thirumalai. Cellular, synaptic and network effects of neuromodulation. Neural Networks, 15:479–493, 2002.

Henry Markram. The Blue Brain Project. Nature Reviews Neuroscience, 7(2):153–160, 2006.

Henry Markram, Joachim Lübke, Michael Frotscher, and Bert Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275(5297):213–215, 1997.

Claudio Mattiussi and Dario Floreano. Analog genetic encoding for the evolution of circuits and networks. IEEE Transactions on evolutionary computation, 11(5):596–607, 2007.

Giles Mayley. Landscapes, learning costs, and genetic assimilation. Evolutionary Computation, 4(3):213–234, 1996.

Giles Mayley. Guiding or hiding: Explorations into the effects of learning on the rate of evolution. In Proceedings of the Fourth European Conference on Artificial Life, volume 97, pages 135–144, 1997.

James L McClelland, Bruce L McNaughton, and Randall C O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological review, 102(3):419, 1995.

Paul McQuesten and Risto Miikkulainen. Culling and Teaching in Neuro-Evolution. In ICGA, pages 760–767, 1997.

Yan Meng, Yaochu Jin, and Jun Yin. Modeling activity-dependent plasticity in BCM spiking neural networks with application to human behavior recognition. IEEE transactions on neural networks, 22(12):1952–1966, 2011.

Frederic Mery and James G Burns. Behavioural plasticity: an interaction between evolution and experience. Evolutionary Ecology, 24(3):571–583, 2010.

Michael M Merzenich, Randall J Nelson, Michael P Stryker, Max S Cynader, Axel Schoppmann, and John M Zook. Somatosensory cortical map changes following digit amputation in adult monkeys. Journal of comparative neurology, 224(4):591–605, 1984.

Zbigniew Michalewicz. GAs: What are they? In Genetic algorithms + data structures = evolution programs, pages 13–30. Springer, 1994.

Ryszard S Michalski, Jaime G Carbonell, and Tom M Mitchell. Machine learning: An artificial intelligence approach. Springer Science & Business Media, 2013.

Thomas Miconi. The road to everywhere: Evolution, complexity and progress in natural and artificial systems. PhD thesis, University of Birmingham, 2008.

Microsoft. Project Malmo. https://github.com/Microsoft/malmo, accessed Nov 2017, 2017.

Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Arshak Navruzyan, Nigel Duffy, and Babak Hodjat. Evolving Deep Neural Networks. arXiv preprint arXiv:1703.00548, 2017.

Julian F Miller. Neuro-Centric and Holocentric Approaches to the Evolution of Developmental Neural Networks. In Growing Adaptive Machines, pages 227–249. Springer, 2014.

Kenneth D. Miller and David J. C. MacKay. The Role of Constraints in Hebbian Learning. Neural Computation, 6:100–126, 1994.

Ian Millington and John Funge. Artificial intelligence for games. CRC Press, 2016.

Don Monroe. Neuromorphic computing gets ready for the (really) big time. Communications of the ACM, 57(6):13–15, 2014.

David S Moore. The Dependent Gene: The Fallacy of “Nature vs. Nurture”. 2003.

Jean-Baptiste Mouret and Paul Tonelli. Artificial evolution of plastic neural networks: a few key concepts. In Growing Adaptive Machines, pages 251–261. Springer, 2014.

Yael Niv, Daphna Joel, Isaac Meilijson, and Eytan Ruppin. Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviours. Adaptive Behavior, 10(1):5–24, 2002.

B Nogueira, Y Lenon, C Eduardo, Creto A Vidal, and Joaquim B Cavalcante Neto. Evolving plastic neuromodulated networks for behavior emergence of autonomous virtual characters. In Advances in Artificial Life, ECAL 2013: Proceedings of the Twelfth European Conference on the Synthesis and Simulation of Living Systems. MIT Press, Cambridge, MA, pages 577–584, 2013.

Stefano Nolfi and Dario Floreano. Learning and evolution. Autonomous robots, 7(1):89–113, 1999.

Stefano Nolfi and Dario Floreano. Evolutionary robotics, 2000.

Stefano Nolfi and Domenico Parisi. Auto-teaching: networks that develop their own teaching input. In Free University of Brussels. Citeseer, 1993.

Stefano Nolfi and Domenico Parisi. Learning to adapt to changing environments in evolving neural networks. Adaptive behavior, 5(1):75–98, 1996.

Mohammad Sadegh Norouzzadeh and Jeff Clune. Neuromodulation Improves the Evolution of Forward Models. In Proceedings of the Genetic and Evolutionary Computation Conference, 2016.

Theo Offerman and Joep Sonnemans. Learning by experience and learning by imitating successful others. Journal of economic behavior & organization, 34(4):559–575, 1998.

Erkki Oja. A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology, 15(3):267–273, November 1982.

OpenAI. OpenAI Universe. https://openai.com/blog/universe/, 2017. Accessed: 21/01/2017.

Maxime Oquab, Léon Bottou, Ivan Laptev, and Josef Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1717–1724, 2014.

Jeff Orchard and Lin Wang. The evolution of a generalized neural learning rule. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 4688–4694. IEEE, 2016.

Jeanne Ellis Ormrod and Kevin M Davis. Human learning. Merrill, 2004.

Ingo Paenke. Dynamics of Evolution and Learning. PhD thesis, Universität Karlsruhe (TH), Fakultät für Wirtschaftswissenschaften, 2008.

Ingo Paenke, Bernhard Sendhoff, and Tadeusz J Kawecki. Influence of plasticity and learning on evolution under directional selection. The American Naturalist, 170(2):E47–E58, 2007.

Ingo Paenke, T Kawecki, and Bernhard Sendhoff. The influence of learning on evolution: A mathematical framework. Artificial Life, 15(2):227–245, 2009.

Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.

Alvaro Pascual-Leone, Amir Amedi, Felipe Fregni, and Lotfi B Merabet. The plastic human brain cortex. Annu. Rev. Neurosci., 28:377–401, 2005.

I. P. Pavlov. Conditioned reflexes. Oxford: Oxford University Press, 1927.

Cengiz Pehlevan, Tao Hu, and Dmitri B Chklovskii. A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural computation, 2015.

C. M. A. Pennartz. Reinforcement Learning by Hebbian Synapses with Adaptive Threshold. Neuroscience, 81(2):303–319, 1997.

Massimo Pigliucci. Is evolvability evolvable? Nature Reviews Genetics, 9(1):75–82, 2008.

Justin Pugh, Andrea Soltoggio, and Kenneth O. Stanley. Real-time Hebbian Learning from Autoencoder Features for Control Tasks. In Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems, 2014.

Josef P. Rauschecker and Wolf Singer. The effects of early visual experience on the cat’s visual cortex and their possible explanation by Hebb synapses. Journal of Physiology, 310:215–239, 1981.

Aditya Rawal and Risto Miikkulainen. Evolving Deep LSTM-based Memory Networks Using an Information Maximization Objective. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pages 501–508, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4206-3.

E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, Q. Le, and A. Kurakin. Large-Scale Evolution of Image Classifiers. ArXiv e-prints, March 2017.

Scott Reed and Nando de Freitas. Neural programmer-interpreters. arXiv preprint arXiv:1511.06279, 2015.

Robert A Rescorla. Pavlovian Second-Order Conditioning (Psychology Revivals): Studies in Associative Learning. Psychology Press, 2014.

Sebastian Risi and Kenneth O. Stanley. Indirectly Encoding Neural Plasticity as a Pattern of Local Rules. In Proceedings of the 11th International Conference on Simulation of Adaptive Behavior (SAB 2010), New York, 2010. Springer.

Sebastian Risi and Kenneth O Stanley. A unified approach to evolving plasticity and neural geometry. In Neural Networks (IJCNN), The 2012 International Joint Conference on, pages 1–8. IEEE, 2012.

Sebastian Risi and Kenneth O. Stanley. Guided Self-Organization in Indirectly Encoded and Evolving Topographic Maps. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2014). New York, NY: ACM (8 pages), 2014.

Sebastian Risi, Sandy D Vanderbleek, Charles E Hughes, and Kenneth O Stanley. How novelty search escapes the deceptive trap of learning to learn. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 153–160. ACM, 2009.

Sebastian Risi, Charles E Hughes, and Kenneth O Stanley. Evolving plastic neural networks with novelty search. Adaptive Behavior, 18(6):470–491, 2010.

Alan Roberts, Deborah Conte, Mike Hull, Robert Merrison-Hort, Abul Kalam al Azad, Edgar Buhl, Roman Borisyuk, and Stephen R Soffe. Can simple rules control development of a pioneer vertebrate neuronal network generating behavior? Journal of Neuroscience, 34(2):608–621, 2014.

Anthony Robins. Catastrophic forgetting, rehearsal, and pseudorehearsal. Connection Science: Journal of Neural Computing, Artificial Intelligence and Cognitive Research, 7:123–146, 1995.

Derek Roff. Evolution of life histories: theory and analysis. Springer Science & Business Media, 1993.

Edmund T Rolls and Simon M Stringer. On the design of neural networks in the brain by genetic evolution. Progress in Neurobiology, 61(6):557–579, 2000.

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1, 1988.

Stuart Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson, 3rd edition, 2013.

Scott J Russo, David M Dietz, Dani Dumitriu, John H Morrison, Robert C Malenka, and Eric J Nestler. The addicted synapse: mechanisms of synaptic and structural plasticity in nucleus accumbens. Trends in neurosciences, 33(6):267–276, 2010.

Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive Neural Networks. arXiv preprint arXiv:1606.04671, 2016.

Eduardo Sanchez, Daniel Mange, Moshe Sipper, Marco Tomassini, Andres Perez-Uribe, and Andre Stauffer. Phylogeny, ontogeny, and epigenesis: Three sources of biological inspiration for softening hardware. In International Conference on Evolvable Systems, pages 33–54. Springer, 1996.

Jürgen Schmidhuber. Curious model-building control systems. In Neural Networks, 1991. 1991 IEEE International Joint Conference on, pages 1458–1463. IEEE, 1991.

Jürgen Schmidhuber. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2):173–187, 2006.

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

Jürgen Schmidhuber, Daan Wierstra, Matteo Gagliolo, and Faustino Gomez. Training recurrent networks by evolino. Neural computation, 19(3):757–779, 2007.

Christiane Schreiweis, Ulrich Bornschein, Eric Burguiere, Cemil Kerimoglu, Sven Schreiter, Michael Dannemann, Shubhi Goyal, Ellis Rea, Catherine A French, Rathi Puliyadi, et al. Humanized Foxp2 accelerates learning by enhancing transitions from declarative to procedural performance. Proceedings of the National Academy of Sciences, 111(39):14253–14258, 2014.

Henning Schroll and Fred H Hamker. Computational models of basal-ganglia pathway functions: focus on functional neuroanatomy. Frontiers in systems neuroscience, 7, 2015.

Wolfram Schultz. Predictive Reward Signal of Dopamine Neurons. Journal of Neurophysiology, 80:1–27, 1998.

Wolfram Schultz, Paul Apicella, and Tomas Ljungberg. Responses of Monkey Dopamine Neurons to Reward and Conditioned Stimuli during Successive Steps of Learning a Delayed Response Task. The Journal of Neuroscience, 13:900–913, 1993.

Wolfram Schultz, Peter Dayan, and P. Read Montague. A Neural Substrate for Prediction and Reward. Science, 275:1593–1598, 1997.

Fernando Silva, Paulo Urbano, and Anders Lyhne Christensen. Adaptation of robot behaviour through online evolution and neuromodulated learning. In Ibero-American Conference on Artificial Intelligence, pages 300–309. Springer, 2012a.

Fernando Silva, Paulo Urbano, and Anders Lyhne Christensen. Continuous adaptation of robot behaviour through online evolution and neuromodulated learning. In Fifth International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS 2012) in conjunction with the 20th European Conference on Artificial Intelligence (ECAI 2012). Montpellier. Citeseer, 2012b.

David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Karl Sims. Artificial evolution for computer graphics, volume 25. ACM, 1991.

Karl Sims. Evolving 3D morphology and behavior by competition. Artificial life, 1(4):353–372, 1994.

M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Perez-Uribe, and A. Stauffer. A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems. Evolutionary Computation, IEEE Transactions on, 1(1):83–97, April 1997.

B. F. Skinner. Science and Human Behavior. New York, Macmillan, 1953.

Burrhus Frederic Skinner. The behavior of organisms: An experimental analysis. New York, London, D. Appleton-Century Company, Incorporated, 1938.

John Maynard Smith. When learning guides evolution. Nature, 329(6142):761–762, 1986.

T.M.C. Smith, P. Husbands, A. Philippides, and Michael O’Shea. Neuronal Plasticity and Temporal Adaptivity: GasNet Robot Control Networks. Adaptive Behavior, 10:161–183, 2002.

Tom Smith. The evolvability of artificial neural networks for robot control. PhD thesis, University of Sussex, October 2002.

Andrea Soltoggio. Neural Plasticity and Minimal Topologies for Reward-based Learning Problems. In Proceedings of the 8th International Conference on Hybrid Intelligent Systems (HIS2008), 10-12 September, Barcelona, Spain, 2008a.

Andrea Soltoggio. Neuromodulation Increases Decision Speed in Dynamic Environments. In Proceedings of the 8th International Conference on Epigenetic Robotics, Southampton, July 2008, 2008b.

Andrea Soltoggio. Evolutionary and Computational Advantages of Neuromodulated Plasticity. PhD thesis, School of Computer Science, The University of Birmingham, 2008c.

Andrea Soltoggio. Short-term plasticity as cause-effect hypothesis testing in distal reward learning. Biological Cybernetics, 109(1):75–94, 2015.

Andrea Soltoggio and Ben Jones. Novelty of behaviour as a basis for the neuro-evolution of operant reward learning. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 169–176. ACM, 2009.

Andrea Soltoggio and Kenneth O. Stanley. From Modulated Hebbian Plasticity to Simple Behavior Learning through Noise and Weight Saturation. Neural Networks, 34:28–41, October 2012.

Andrea Soltoggio, Peter Dürr, Claudio Mattiussi, and Dario Floreano. Evolving Neuromodulatory Topologies for Reinforcement Learning-like Problems. In Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2007, 2007.

Andrea Soltoggio, John A. Bullinaria, Claudio Mattiussi, Peter Dürr, and Dario Floreano. Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios. In Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems. MIT Press, 2008.

Olaf Sporns and William H. Alexander. Neuromodulation in a learning robot: interactions between neural plasticity and behavior. In Proceedings of the International Joint Conference on Neural Networks, volume 4, pages 2789–2794, 2003.

Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.

J. E. R. Staddon. Adaptive Behaviour and Learning. Cambridge University Press, 1983.

Kenneth O Stanley. Compositional pattern producing networks: A novel abstraction of development. Genetic programming and evolvable machines, 8(2):131–162, 2007.

Kenneth O Stanley and Joel Lehman. Why Greatness Cannot Be Planned. Springer, 2015.

Kenneth O Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary computation, 10(2):99–127, 2002.

Kenneth O Stanley and Risto Miikkulainen. A taxonomy for artificial embryogeny. Artificial Life, 9(2):93–130, 2003a.

Kenneth O. Stanley and Risto Miikkulainen. Evolving Adaptive Neural Networks with and without Adaptive Synapses. In Bart Rylander, editor, Genetic and Evolutionary Computation Conference Late Breaking Papers, pages 275–282, Chicago, USA, 12–16 July 2003b.

Kenneth O Stanley, David B D’Ambrosio, and Jason Gauci. A hypercube-based encoding for evolving large-scale neural networks. Artificial life, 15(2):185–212, 2009.

Luc Steels. The artificial life roots of artificial intelligence. Artificial life, 1(1-2):75–110, 1993.

James V Stone. Distributed representations accelerate evolution of adaptive behaviours. PLoS computational biology, 3(8):e147, 2007.

R. E. Suri. TD models of reward predictive responses in dopamine neurons. Neural Networks, 15:523–533, 2002.

R. E. Suri, J. Bargas, and M. A. Arbib. Modeling functions of striatal dopamine modulation in learning and planning. Neuroscience, 103(1):65–85, 2001.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 1998.

Tim Taylor, Mark Bedau, Alastair Channon, David Ackley, Wolfgang Banzhaf, Guillaume Beslon, Emily Dolson, Tom Froese, Simon Hickinbotham, Takashi Ikegami, et al. Open-ended evolution: perspectives from the OEE workshop in York. Artificial life, 2016.

Marc Tessier-Lavigne and Corey S Goodman. The molecular biology of axon guidance. Science, 274(5290):1123, 1996.

Edward Lee Thorndike. Animal Intelligence. New York: Macmillan, 1911.

Sebastian Thrun and Tom M Mitchell. Lifelong robot learning. Robotics and autonomous systems, 15(1-2):25–46, 1995.

Sebastian Thrun and Joseph O’Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. In ICML, volume 96, pages 489–497, 1996.

Sebastian Thrun and Lorien Pratt. Learning to learn. Springer Science & Business Media, 2012.

Paul Tonelli and Jean-Baptiste Mouret. Using a map-based encoding to evolve plastic neural networks. In Evolving and Adaptive Intelligent Systems (EAIS), 2011 IEEE Workshop on, pages 9–16. IEEE, 2011.

Paul Tonelli and Jean-Baptiste Mouret. On the relationships between generative encodings, regularity, and learning abilities when evolving plastic artificial neural networks. PloS one, 8(11):e79138, 2013.

J. Urzelai and D Floreano. Evolutionary Robotics: Coping with Environmental Change. In Genetic and Evolutionary Computation Conference (GECCO’2000), 2000.

Roby Velez and Jeff Clune. Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks. CoRR, abs/1705.07241, 2017.

Santosh S Venkatesh. Directed drift: A new linear threshold algorithm for learning binary weights on-line. Journal of Computer and System Sciences, 46(2):198–217, 1993.

Julien Vitay and Fred H. Hamker. A computational model of basal ganglia and its role in memory retrieval in rewarded visual memory tasks. Frontiers in Computational Neuroscience, 2010.

Günter P Wagner and Lee Altenberg. Perspective: complex adaptations and the evolution of evolvability. Evolution, 50(3):967–976, 1996.

Edgar T. Walters and John H. Byrne. Associative Conditioning of Single Sensory Neurons Suggests a Cellular Mechanism for Learning. Science, 219:405–408, 1983.

Bernard Widrow and Michael A Lehr. 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9):1415–1442, 1990.

Bernard Widrow, Marcian E Hoff, et al. Adaptive switching circuits. In IRE WESCON convention record, volume 4, pages 96–104. New York, 1960.

David Willshaw and Peter Dayan. Optimal plasticity from matrix memories: What goes up must come down. Neural Computation, 2(1):85–93, 1990.

Brian M. Yamauchi and Randall D. Beer. Sequential Behavior and Learning in Evolved Dynamical Neural Networks. Adaptive Behavior, 2(3), 1994.

Xin Yao. Evolving Artificial Neural Networks. Proceedings of the IEEE, 87(9):1423–1447, September 1999.

Xin Yao and Yong Liu. A new evolutionary system for evolving artificial neural networks. IEEE transactions on neural networks, 8(3):694–713, 1997.

Jason Yoder and Larry Yaeger. Evaluating Topological Models of Neuromodulation in Polyworld. In Artificial Life XIV, MIT Press, Cambridge, MA, 2014.

Steven R Young, Derek C Rose, Thomas P Karnowski, Seung-Hwan Lim, and Robert M Patton. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, page 4. ACM, 2015.

Tom Ziemke and Mikael Thieme. Neuromodulation of Reactive Sensorimotor Mappings as Short-Term Memory Mechanism in Delayed Response Tasks. Adaptive Behavior, 10:185–199, 2002.