Coevolutionary Automated Software Correction
Josh Wilkerson
PhD Candidate in Computer Science
Missouri S&T
Page 2
Technical Background
Evolutionary Algorithms (EAs)
– Subfield of evolutionary computation (in artificial intelligence)
– Based on biological evolution
– Uses mutation, reproduction, and selection
– Population composed of candidate solutions
– Needed:
• Solution representation
• Fitness function
– Applicable to a wide variety of fields
– Makes no assumptions about the problem space (ideally)
Page 3
Technical Background
EA Operation
– Start with an initial population
– Each generation
• Create new individuals and evaluate them
• Population competition (survival of the fittest)
– Mutation and reproduction
• Explore the problem space
• Bring in new genetic material
– Selection
• Applies pressure to individuals
• More fit individuals are selected for mutation and reproduction more often
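The generational loop described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the bit-string representation, tournament selection, one-point crossover, and the name `evolve` are all hypothetical choices made for the example, not details of CASC.

```python
import random

def evolve(fitness, genome_len=10, pop_size=20, generations=50, seed=0):
    """Minimal generational EA over bit strings (illustrative only)."""
    rng = random.Random(seed)
    # Start with an initial population of random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Selection: binary tournaments pick fitter parents more often.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Reproduction: one-point crossover combines both parents.
            cut = rng.randrange(1, genome_len)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < 1.0 / genome_len else g
                     for g in child]
            offspring.append(child)
        # Competition: only the fittest pop_size individuals survive.
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:pop_size]
    return population[0]

# Toy fitness function: count the 1-bits in the genome.
best = evolve(sum)
print(best, sum(best))
```

The two required ingredients from the earlier slide appear here as the list-of-bits solution representation and the `fitness` argument.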
Page 4
Technical Background
Genetic Programming
– Type of EA
– Evolves tree representations
– E.g., computer program parse trees
Coevolution
– Extension of standard EA
– Fitness dependency between individuals
– Dependency can be either cooperative or competitive
– CASC system uses competitive coevolution
– Results in an evolutionary arms race
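One way to picture the competitive fitness dependency is to score two populations against each other. The sketch below is an assumption made for illustration: the scoring rule (programs earn credit for passing tests, tests earn credit for failing programs), the function `coevaluate`, and the `passes` oracle are hypothetical and are not taken from CASC.

```python
def coevaluate(programs, tests, passes):
    """Score two competing populations against each other.

    passes(program, test) -> bool is a hypothetical oracle supplied by
    the caller; it is not part of any real CASC interface.
    """
    results = [[passes(p, t) for t in tests] for p in programs]
    # A program's fitness is the fraction of tests it passes.
    program_fitness = [sum(row) / len(tests) for row in results]
    # A test's fitness is the fraction of programs it defeats.
    test_fitness = [
        sum(not results[i][j] for i in range(len(programs))) / len(programs)
        for j in range(len(tests))
    ]
    return program_fitness, test_fitness

# Toy usage: candidate abs() implementations versus integer inputs.
programs = [abs, lambda x: x, lambda x: -x]
tests = [-2, 0, 3]
pf, tf = coevaluate(programs, tests, lambda p, t: p(t) == abs(t))
print(pf, tf)
```

Because each side's score depends on the other population, improving programs devalue easy tests, and harder tests devalue weak programs, which is the arms-race dynamic.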
Page 5
High Level View of CASC
Page 6
CASC Evolutionary Model
Page 7
CASC Evolutionary Model
Page 8
CASC Evolutionary Model
Page 9
CASC Evolutionary Model
Page 10
Reproduction Phase: Programs
Randomly select a genetic operation to perform
– Probability of operation selection is configurable
Perform operation, generate new program(s)
Add new individuals to population
Repeat until specified number of individuals has been created
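The steps on this slide amount to a weighted random choice inside a loop. A minimal sketch follows, assuming each operator is a callable that returns a list of new individuals; the name `reproduce` and the two placeholder operators are hypothetical.

```python
import random

def reproduce(population, operators, weights, num_offspring, rng):
    """Create num_offspring new individuals using randomly chosen operators.

    operators: callables (population, rng) -> list of new individuals.
    weights:   configurable selection weight for each operator.
    """
    offspring = []
    while len(offspring) < num_offspring:
        # Randomly select a genetic operation according to its weight.
        op = rng.choices(operators, weights=weights, k=1)[0]
        # Perform it; an operator may yield one or more new individuals.
        offspring.extend(op(population, rng))
    # Trim in case the last operator overshot the target count.
    return offspring[:num_offspring]

# Placeholder operators standing in for the real genetic operations.
def copy_op(pop, rng):
    return [rng.choice(pop)]

def pair_op(pop, rng):
    return rng.sample(pop, 2)

rng = random.Random(1)
population = [1, 2, 3, 4]
new_individuals = reproduce(population, [copy_op, pair_op], [0.7, 0.3], 5, rng)
population.extend(new_individuals)  # add the new individuals to the population
```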
Page 11
Reproduction Phase: Programs
Genetic Operations
– Reset
– Copy
– Crossover
• Two individuals are randomly selected based on fitness
• Randomly select and exchange compatible sub-trees
• Generates two new programs
– Mutation
• Randomly select an individual based on fitness
• Randomly select and change mutable node
• Generate a new sub-tree (if necessary)
– Architecture Altering Operations
Reselection is allowed for all operators
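The crossover operation above can be illustrated on small expression trees. This sketch makes several assumptions: trees are nested Python lists of the form `[operator, child, child]`, every sub-tree is treated as compatible with every other (a real system working on C++ parse trees would need a type or grammar check), and fitness-based parent selection is left out. All function names are hypothetical.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the index path to every node of a nested-list expression tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the sub-tree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    """Return tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def size(tree):
    """Count the nodes in a tree."""
    return sum(1 for _ in paths(tree))

def crossover(a, b, rng):
    """Exchange one randomly chosen sub-tree between copies of a and b."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)  # parents stay untouched
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    sub_a, sub_b = get(a, pa), get(b, pb)
    return put(a, pa, sub_b), put(b, pb, sub_a)

rng = random.Random(0)
parent_a = ["+", "x", ["*", "x", 2]]
parent_b = ["-", ["+", "y", 1], "y"]
child_a, child_b = crossover(parent_a, parent_b, rng)
print(child_a, child_b)
```

Mutation can reuse the same helpers: pick one path, then `put` a freshly generated sub-tree at that position.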
Page 12
Reproduction Phase: Test Cases
Reproduction employs uniform crossover
Each offspring has a chance to mutate
Genes to mutate are selected randomly
Mutated gene is randomly adjusted
– The amount adjusted is selected from a Gaussian distribution
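A short sketch of these two operators, assuming a test case is encoded as a list of numbers. The rates, the standard deviation `sigma`, and the function names are hypothetical values and names picked for the example.

```python
import random

def uniform_crossover(p1, p2, rng):
    """Each gene of the child comes from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def gaussian_mutate(genes, rng, offspring_rate=0.2, gene_rate=0.1, sigma=1.0):
    """Possibly mutate an offspring by nudging randomly chosen genes."""
    genes = list(genes)
    if rng.random() < offspring_rate:      # each offspring has a chance to mutate
        for i in range(len(genes)):
            if rng.random() < gene_rate:   # genes to mutate are selected randomly
                # The adjustment amount is drawn from a Gaussian distribution.
                genes[i] += rng.gauss(0.0, sigma)
    return genes

rng = random.Random(42)
parent_1 = [1.0, 2.0, 3.0, 4.0]
parent_2 = [10.0, 20.0, 30.0, 40.0]
child = gaussian_mutate(uniform_crossover(parent_1, parent_2, rng), rng)
print(child)
```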
Page 13
CASC Evolutionary Model
Page 14
CASC Evolutionary Model
Page 15
CASC Evolutionary Model
Page 16
CASC Evolutionary Model
Page 17
CASC Implementation Details
Adaptive parameter control
– EAs typically have many control parameters
– Difficult to find optimal settings for these parameters
– In CASC, genetic operator probabilities are adaptive parameters
– Rewarded/punished based on performance
• If one operator generates improved individuals more often than the others, make it more likely to be used
– Allows the system to adapt to the different phases in the search
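The reward/punish idea can be sketched as a multiplicative weight update. The slides do not give the exact rule CASC uses, so everything here is an assumption: the update factor `rate`, the lower bound `min_weight`, and the function names are hypothetical.

```python
def update_weights(weights, op, improved, rate=0.1, min_weight=0.01):
    """Reward operator `op` if it produced an improved individual, else punish it."""
    new = dict(weights)
    factor = 1.0 + rate if improved else 1.0 - rate
    # The lower bound keeps a punished operator from vanishing entirely.
    new[op] = max(min_weight, new[op] * factor)
    return new

def probabilities(weights):
    """Normalize weights into operator selection probabilities."""
    total = sum(weights.values())
    return {op: w / total for op, w in weights.items()}

weights = {"crossover": 1.0, "mutation": 1.0, "copy": 1.0}
weights = update_weights(weights, "mutation", improved=True)
weights = update_weights(weights, "copy", improved=False)
print(probabilities(weights))
```

Keeping a floor on each weight means an operator that is unhelpful early in the search can still be tried, and rewarded, in a later phase.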
Page 18
CASC Implementation Details
Parallel Computation
– Computational complexity is generally a problem for EAs
– CASC writes, compiles, and executes hundreds (or even thousands) of C++ programs in a given run
– To reduce run times this is done in parallel (on the NIC cluster here on campus)
– Main node: responsible for generating and writing programs
– Worker nodes: responsible for compiling and executing programs
– Dramatically speeds up execution
– Investigating new options for this (discussed later)
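The main/worker split can be imitated on a single machine with a pool of workers. This is a stand-in under loud assumptions: it runs on one host rather than a cluster, and each "program" is Python source run by the current interpreter, whereas CASC writes, compiles, and executes C++ programs. The names `run_candidate` and `evaluate_all` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_candidate(source):
    """Worker task: execute one candidate program and capture its output.

    Stand-in only: runs Python source with the current interpreter. A
    CASC-like system would write a C++ file, compile it, and run the binary.
    """
    result = subprocess.run([sys.executable, "-c", source],
                            capture_output=True, text=True, timeout=30)
    return result.returncode, result.stdout.strip()

def evaluate_all(sources, max_workers=4):
    """Main-node role: hand candidates to workers, collect results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_candidate, sources))

if __name__ == "__main__":
    candidates = ["print(1 + 1)", "print(2 * 3)", "raise SystemExit(3)"]
    print(evaluate_all(candidates))
```

Threads are enough in this sketch because the real work happens in child processes; the Python threads only wait for them to finish.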
Page 19
Current and Future Work
Fitness Function Design
– For each new problem CASC needs a new fitness function
– Fitness function design can often be difficult
– Developing a guide for fitness function design
– Starts from the program specifications
– Walks through the thought process for designing a fitness function for the problem
– Long term goal: automate fitness function creation
Page 20
Current and Future Work
File System Slowdown
– CASC writes and compiles many, many programs each run
– I.e., many, many files in the file system each run
– File system access is bottlenecking the speed of the CASC system
– Currently reworking the system to store program files and executables in RAM
– Uses a virtually mounted hard disk that stores data in RAM
– Expecting a dramatic speed up (fingers crossed…)
– Other option: distributed computing (like BOINC, Folding@home, etc.)
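As a rough illustration of keeping program files in RAM, the sketch below prefers a RAM-backed directory when one is available. It assumes a Linux host where `/dev/shm` is a tmpfs mount and falls back to the default temporary directory otherwise; the mount CASC actually uses is not stated in the slides, and `make_workdir` and the file name are hypothetical.

```python
import os
import shutil
import tempfile

def make_workdir(ram_root="/dev/shm"):
    """Create a scratch directory, preferring a RAM-backed mount if present."""
    usable = os.path.isdir(ram_root) and os.access(ram_root, os.W_OK)
    # dir=None makes tempfile fall back to the system default location.
    return tempfile.mkdtemp(prefix="casc_", dir=ram_root if usable else None)

workdir = make_workdir()
source_path = os.path.join(workdir, "candidate_0.cpp")
with open(source_path, "w") as f:
    f.write("int main() { return 0; }\n")  # generated program would go here
print(source_path)

# Remove the scratch files once the run is finished.
shutil.rmtree(workdir)
```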
Page 21
Current and Future Work
Scalability
– As program size increases so does the problem space
• Many more modifications possible
• More genetic material
– Investigating options to allow CASC to scale with problem size
– Current idea: break the program up into pieces
• Multiple program populations
• Each population is based on a piece of the original program
• Each population has its own objective
• Cooperative coevolution
Page 22
Current and Future Work
Page 23
Questions?