Mutation Operator Evolution for EA-Based Neural Networks
Mutation Operator Evolution for EA-Based Neural Networks
By Ryan Meuth
Reinforcement Learning
[Diagram: the Agent observes a State from the Environment, selects an Action via its Action Policy, and receives a Reward; the Agent maintains a State Value Estimate.]
Reinforcement Learning
Good for on-line learning where little is known about the environment.
Easy to implement in discrete environments: a value estimate can be stored for each state; given infinite time, an optimal policy is guaranteed.
Hard to implement in continuous environments: infinitely many states! The value function must be estimated. Neural networks can be used for function approximation.
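The discrete case above can be sketched as a tabular value update, with one table entry per state. This is an illustrative sketch, not from the slides; the step size `alpha` and discount `gamma` are assumed hyperparameters:

```python
# Tabular TD(0)-style value update: a value estimate stored per discrete state.
# alpha (step size) and gamma (discount factor) are illustrative choices.
alpha, gamma = 0.1, 0.9
V = {}  # state -> value estimate; one table entry per discrete state

def value_update(state, reward, next_state):
    """Move V[state] toward the one-step bootstrapped return."""
    v, v_next = V.get(state, 0.0), V.get(next_state, 0.0)
    V[state] = v + alpha * (reward + gamma * v_next - v)

# One transition: the agent sees state "A", receives reward 1, lands in "B".
value_update("A", 1.0, "B")
```

With infinitely many states this table is no longer possible, which is exactly where a neural network replaces `V` as a function approximator.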
Neural Network Overview
Feed-forward neural network: based on biological theories of neuron operation.
[Diagrams: a feed-forward neural network and a recurrent neural network.]
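A feed-forward network of the kind diagrammed above can be sketched as layered weighted sums followed by a squashing function. The layer sizes, fixed weights, and tanh activation here are assumptions for illustration, not details from the slides:

```python
import math

def forward(weights, biases, x):
    """One pass through a fully connected feed-forward net.
    weights[l][j][i] connects input i of layer l to neuron j of that layer."""
    for W, b in zip(weights, biases):
        x = [math.tanh(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
             for row, b_j in zip(W, b)]
    return x

# A tiny 2-input, 2-hidden, 1-output network with hand-picked weights.
W = [[[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]]]
b = [[0.0, 0.0], [0.0]]
y = forward(W, b, [1.0, 0.0])
```

A recurrent network differs only in that some outputs feed back in as inputs on the next step, giving the net a memory of past states.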
Neural Network Overview
Traditionally used with error back-propagation (BP). BP uses samples to generalize to the problem. Few "unsupervised" learning methods exist.
Problems with no samples: on-line learning.
Conjugate Reinforcement Back-Propagation.
EA-NN
Both a supervised and an unsupervised learning method.
Uses the weight set as the genome of an individual.
Fitness function is the mean-squared error over the target function.
Mutation operator is a sample from a Gaussian distribution.
It is possible that this mutation operator is not the best.
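The baseline mutation described above, perturbing each weight in the genome with a Gaussian sample, can be sketched as follows; the step size `sigma` is an assumed parameter, not taken from the slides:

```python
import random

def gaussian_mutate(genome, sigma=0.1):
    """Return a child genome: each weight perturbed by a N(0, sigma) sample."""
    return [w + random.gauss(0.0, sigma) for w in genome]

random.seed(42)
parent = [0.2, -0.7, 1.5]   # a weight-set genome
child = gaussian_mutate(parent)
```

This fixed operator is the thing the GP layer below is meant to improve upon.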
Uh… Why?
Could improve EA-NN efficiency.
Faster on-line learning.
A revamped tool for reinforcement learning.
Smarter robots.
Why Use an EA? Knowledge-independent.
Experimental Implementation
First tier, genetic programming: each individual is a parse tree representing a mutation operator; fitness is the inverse of the sum of MSEs from the EA testbed.
Second tier, EA testbed: 4 EAs spanning 2 classes of problems.
2 feed-forward non-linear approximations: 1 high-order, 1 low-order.
2 recurrent time-series predictions: 1 time-delayed, 1 not time-delayed.
GP Implementation
Function set: {+, -, *, /}
Terminal set: the weight to be modified, a random constant, a uniform random variable.
Over-selection: 80% of parents from the top 32%.
Rank-based survival.
Initialized by the grow method (max depth of 8).
Fitness: 1000 / AvgMSE - num_nodes.
P(recombination) = 0.5; P(mutation) = 0.5.
Repair function.
5 runs, 100 generations each.
Steady state: population of 1000 individuals, 20 children per generation.
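An evolved mutation operator can be represented as a parse tree over the function set {+, -, *, /} and the three terminals listed above. A minimal sketch of evaluating such a tree for one weight follows; the tuple encoding and the protected-division convention are assumptions, not details from the slides:

```python
import random

FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}

def evaluate(tree, w):
    """Evaluate a mutation-operator parse tree for one weight.
    Terminals: 'w' (the weight to be modified), 'U' (a uniform random
    variable), or a numeric random constant."""
    if isinstance(tree, tuple):              # internal node: (op, left, right)
        op, left, right = tree
        return FUNCS[op](evaluate(left, w), evaluate(right, w))
    if tree == "w":
        return w
    if tree == "U":
        return random.uniform(-1.0, 1.0)
    return tree                              # numeric constant

# Example tree encoding w + (0.5 * U): a small uniform perturbation.
op_tree = ("+", "w", ("*", 0.5, "U"))
new_w = evaluate(op_tree, 1.0)
```

The GP searches over trees like `op_tree`; each candidate is scored by plugging it into the EA testbed as the mutation step.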
EA-NN Implementation
Recombination: multi-point crossover.
Mutation: provided by the GP.
Fitness: MSE over the test function (minimized).
P(recombination) = 0.5; P(mutation) = 0.5.
Non-generational: population of 10 individuals, 10 children per generation.
50 runs of 50 generations.
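The EA-NN layer described above, multi-point crossover, MSE fitness, and a small non-generational population, can be sketched as below. The target function and the Gaussian mutation step are illustrative stand-ins (in the actual system the mutation operator is supplied by the GP):

```python
import random

def multipoint_crossover(a, b, points=2):
    """Swap segments between two weight genomes at random cut points."""
    cuts = sorted(random.sample(range(1, len(a)), points))
    child, src, prev = [], 0, 0
    for c in cuts + [len(a)]:
        child.extend((a if src == 0 else b)[prev:c])
        src, prev = 1 - src, c
    return child

def mse(genome, samples):
    """Fitness to minimize: mean-squared error over the target samples.
    Stand-in 'network': a single linear weight applied to the input."""
    return sum((genome[0] * x - y) ** 2 for x, y in samples) / len(samples)

random.seed(0)
samples = [(x, 2.0 * x) for x in range(5)]    # stand-in target: y = 2x
pop = [[random.uniform(-3, 3) for _ in range(4)] for _ in range(10)]
for _ in range(50):                           # 50 generations
    for _ in range(10):                       # 10 children per generation
        p1, p2 = random.sample(pop, 2)
        child = multipoint_crossover(p1, p2)
        child[0] += random.gauss(0.0, 0.2)    # mutation (Gaussian stand-in)
        pop.append(child)
    pop.sort(key=lambda g: mse(g, samples))   # keep the 10 fittest overall
    pop = pop[:10]
best = mse(pop[0], samples)
```

Swapping the `random.gauss` line for a call into an evolved parse tree is the only change needed to test a GP-produced operator in this loop.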
Results
This is where results would go.
Single uniform random variable: ~380.
Observed individuals: ~600.
Improvement! Just have to wait and see…
Conclusions
I don't know anything yet.
Questions?
Thank You!