Intro to Genetic Algorithms - DC Ruby Users Group 11.10.2016
Post on 13-Apr-2017
A Brief Introduction to Genetic Algorithms
Geoff Harcourt
Hi
I’m Geoff
Developer at thoughtbot
Maintainer of thoughtbot/dotfiles and parity (Heroku app shortcuts)
The Knapsack Problem
Given a set of items, each with its own weight, size, and
price, determine the combination of items under
the weight and size budget that has the most value
CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=985491
Seating Chart for a Wedding
• Positive points for putting guests who like each other together
• Negative points for putting guests who don’t get along together
• Families must be together
What arrangement produces the most happiness?
Delivery Truck Route
Given a set of packages that must be delivered in
one trip, what’s the ideal order of stops to make the
delivery in the shortest time (and/or the least distance
travelled)?
What Do These Problems Have In Common?
• Potentially massive/infinite set of possible solutions (“large solution space”)
• Optimized solutions are better, but “great” is almost as good as “perfect”
• Cheap to test any one solution’s value (“fitness”)
When Force Isn’t Enough
First potential approach is brute force: test every possible solution
For some problems this technique can find the solution, but if the solution space is too large or infinite, testing every candidate may not be feasible.
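To put a number on "too large", here is a rough sketch of how quickly a yes/no solution space grows. The throughput figure (one billion fitness checks per second) is an assumption chosen for illustration, and `solution_count` and `years_to_enumerate` are hypothetical helper names.

```ruby
# Assumed throughput: one billion fitness checks per second (illustrative only).
CHECKS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

# Each yes/no choice doubles the number of candidate solutions.
def solution_count(choices)
  2**choices
end

# Whole years needed to test every candidate at the assumed throughput.
def years_to_enumerate(choices)
  solution_count(choices) / (CHECKS_PER_SECOND * SECONDS_PER_YEAR)
end

puts solution_count(20)      # 1048576, easy to brute force
puts years_to_enumerate(100) # on the order of 10**13 years
```

With 20 choices brute force is trivial; with 100 choices it is out of reach even at this optimistic speed.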
Genetic Algorithms
Genetic Algorithms (GA) are a type of search algorithm that mimics the mechanics of natural
selection to traverse a space of possible solutions and generate high-quality solutions.
What does that mean?
A Genetic Algorithm combines and re-combines
elements of solutions to “evolve” toward more
optimal solutions in a manner similar to that by
which a biological population evolves over time
Current Genetic Research
How does GA work?
Representation: represent parts of the problem as “genes”
Fitness: a function that can be run against the expressed genes to measure the quality of the solution
Evolution: breeding and/or mutation and selection
Representation
A DNA-based organism represents its genetic code with a base-4 system [A, C, G, T]. A chromosome’s gene sequence might read as AACTGACTGA
Many problems can be expressed as base-2: 0110101000
Representation: The Knapsack Problem
Generate a random set of 100 items, each with its own weight, size, and value. Put the items in an array for reference. Our organism will have one “chromosome” with 100 “genes”. Each gene is set to either 0 (not in the knapsack) or 1 (in the knapsack).
Each gene’s position in the chromosome matches the position in the array of the item whose state it represents.
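A minimal sketch of this representation in Ruby. The `Item` struct, the helper names, and the random ranges for weight, size, and value are all assumptions made for illustration; the slides do not specify them.

```ruby
# Hypothetical item record; the attribute ranges below are arbitrary.
Item = Struct.new(:weight, :size, :value)

# Build the reference array of random items.
def random_items(count, rng: Random.new)
  Array.new(count) do
    Item.new(rng.rand(1..20), rng.rand(1..20), rng.rand(1..50))
  end
end

# A chromosome is an array of 0/1 genes, one gene per item.
def random_chromosome(length, rng: Random.new)
  Array.new(length) { rng.rand(2) }
end

# Decode a chromosome: gene i decides whether items[i] is in the knapsack.
def packed_items(chromosome, items)
  items.select.with_index { |_item, i| chromosome[i] == 1 }
end

items = random_items(100)
chromosome = random_chromosome(100)
puts packed_items(chromosome, items).length
```

Keeping the items in one fixed array means a chromosome never has to store item data, only one bit per index.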
Fitness
The Fitness Function is how we test any solution’s fitness, or how effective the solution is.
For some problems the best fitness will be the highest number possible or lowest number possible, or it might be the number closest to an ideal value.
Fitness: The Knapsack Problem
The fitness function for the Knapsack Problem would be the sum of the value of the items that are in the Knapsack.
Our 9-gene chromosome: [0, 1, 1, 0, 0, 0, 1, 0, 1]
Counting genes from 0, genes 1, 2, 6, and 8 are set to 1, so our knapsack has items 1 ($11), 2 ($5), 6 ($3), and 8 ($21). Our knapsack’s value is $40, the sum of the dollar values of the items inside.
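The arithmetic above can be checked with a short sketch. Only the values of items 1, 2, 6, and 8 come from the example; the other entries in `VALUES` are placeholders invented for illustration, and `knapsack_value` is a hypothetical name.

```ruby
# Dollar values by item index. Items 1, 2, 6, and 8 use the values from the
# example; every other value is a made-up placeholder.
VALUES = [7, 11, 5, 9, 14, 2, 3, 8, 21].freeze

# Sum the values of the items whose gene is 1.
def knapsack_value(chromosome, values)
  chromosome.zip(values).sum { |gene, value| gene * value }
end

chromosome = [0, 1, 1, 0, 0, 0, 1, 0, 1]
puts knapsack_value(chromosome, VALUES) # 11 + 5 + 3 + 21 = 40
```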
But wait, there’s more (to fitness)!
Some solutions are invalid. In the Knapsack Problem, solutions whose summed weights or volumes are greater than the knapsack can hold aren’t valid even if they contain the highest dollar value.
Such solutions need a fitness that disqualifies them. In our case, the fitness function will return 0 for any knapsack that exceeds the weight or size limit.
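A sketch of a fitness function with that disqualification rule, assuming items carry `weight`, `size`, and `value` attributes. The `Item` struct, the limits, and the sample items are hypothetical.

```ruby
# Hypothetical item record for illustration.
Item = Struct.new(:weight, :size, :value)

# Total value of the packed items, or 0 when either limit is exceeded.
def fitness(chromosome, items, max_weight:, max_size:)
  packed = items.select.with_index { |_item, i| chromosome[i] == 1 }
  return 0 if packed.sum(&:weight) > max_weight
  return 0 if packed.sum(&:size) > max_size

  packed.sum(&:value)
end

items = [Item.new(5, 4, 10), Item.new(8, 3, 20), Item.new(3, 6, 15)]
puts fitness([1, 0, 1], items, max_weight: 10, max_size: 10) # 25 (weight 8, size 10)
puts fitness([1, 1, 0], items, max_weight: 10, max_size: 10) # 0 (weight 13 is over the limit)
```

Returning 0 is the simplest penalty; it treats a slightly overweight knapsack the same as a wildly overweight one, which is a design choice rather than a requirement.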
Seeding a Population
To determine a solution to our problem, let’s start with a population.
We’ll randomly generate 200 organisms by building 200 chromosomes, randomly setting each bit in a chromosome either to put the corresponding item into the knapsack or to hold it out.
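Seeding can be sketched in a few lines. The helper names are hypothetical, and the fixed seed is used only so that a run can be repeated.

```ruby
# One chromosome: `length` genes, each randomly 0 or 1.
def random_chromosome(length, rng)
  Array.new(length) { rng.rand(2) }
end

# A population is simply an array of chromosomes.
def seed_population(size, gene_count, rng: Random.new)
  Array.new(size) { random_chromosome(gene_count, rng) }
end

population = seed_population(200, 100, rng: Random.new(42))
puts population.length       # 200
puts population.first.length # 100
```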
Seeding (continued)
If we think we know some parts of the solution already (such as an item that’s worth a lot and is small and lightweight), we can use non-random or partially random seed data to nudge the population closer to the solution.
This is called “warm starting”. It should be used carefully, as it may preclude unexpectedly fit solutions.
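One possible way to warm start, sketched under assumptions: a share of the seeded organisms has a chosen set of genes forced to 1, while the rest stay fully random so unexpected solutions remain reachable. The `warm_share` parameter and the favored index are hypothetical.

```ruby
# Seed a population where roughly `warm_share` of the organisms start with
# the favored genes switched on; all other genes remain random.
def warm_start_population(size, gene_count, favored_genes, warm_share: 0.5, rng: Random.new)
  Array.new(size) do
    chromosome = Array.new(gene_count) { rng.rand(2) }
    favored_genes.each { |i| chromosome[i] = 1 } if rng.rand < warm_share
    chromosome
  end
end

# Assume item 3 is known to be valuable, small, and light.
population = warm_start_population(200, 100, [3], warm_share: 0.5)
puts population.count { |c| c[3] == 1 }
```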
Now what?
We are going to iterate through a number of generations. In each generation, we’ll use the following mechanisms to move toward the best fitness:
• Crossover
• Mutation
• Selection
Sweet, sweet love
Our algorithm takes two organisms* from the population and has them mate. Mating the organisms combines their respective genes and produces two new organisms, each containing some elements of their parents’ gene expressions.
* More complicated algorithms can mate more than two organisms at once, but we won’t do that here.
Crossover
Crossover is how we combine genes from two organisms to produce new solutions.
Crossover takes the chromosomes from two organisms and has them trade pieces with each other. The result of crossover is a group of organisms with new combinations of gene expressions that might not have existed in prior generations.
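A minimal sketch of one common variant, single-point crossover. The slides do not say which crossover scheme was used, so the single cut point is an assumption, and it requires chromosomes with at least two genes.

```ruby
# Single-point crossover: pick a cut point, then swap the tails of the two
# parents to produce two children. Parents are left unmodified.
def crossover(parent_a, parent_b, rng: Random.new)
  point = rng.rand(1...parent_a.length) # cut falls strictly inside the chromosome
  child_a = parent_a[0...point] + parent_b[point..-1]
  child_b = parent_b[0...point] + parent_a[point..-1]
  [child_a, child_b]
end

child_a, child_b = crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
p child_a # zeros up to the cut, ones after it
p child_b # ones up to the cut, zeros after it
```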
Crossover
Crossover mimics the process of gene recombination
(both between organisms and between
chromosomes themselves) that occurs in biological
organisms.
Crossover
Crossover will do much of the work for creating
genetic diversity (different solutions through different
combinations), from one generation to another, but
what if our seed population was missing some gene
expressions that would produce better solutions?
Mutation
Mutation is a mechanism to maintain genetic
diversity.
Mutation is applied by flipping a gene’s expression
according to a probability defined by the algorithm.
These flipped states introduce new values into the
population and contribute to a wider search space.
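Bit-flip mutation can be sketched as follows. The rate of 0.01 in the usage line is an assumed value for illustration, not one taken from the talk.

```ruby
# Flip each gene independently with probability `rate`.
# Returns a new chromosome; the original is left unchanged.
def mutate(chromosome, rate, rng: Random.new)
  chromosome.map { |gene| rng.rand < rate ? 1 - gene : gene }
end

p mutate([0, 1, 1, 0, 0, 0, 1, 0, 1], 0.01)
```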
Mutation
Mutation probabilities need to be kept low or else
they result in a loss of progress from generation to
generation, and the genetic algorithm becomes more
of a random search than an evolution toward an ideal
solution.
Mutation
It’s often helpful to tweak mutation settings
(probability of mutation) over several tests to see how
it affects the search.
A New Generation
By mating our initial population through crossover
and randomly applying mutations to a small
percentage of genes, we’ve produced a new
generation of solutions.
We’ll test each organism’s fitness to see which
organisms are most fit.
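The generation step described so far (mate, mutate, then score) can be sketched as below. Pairing parents by shuffling is an assumption made to keep the example short, and it uses single-point crossover and bit-flip mutation as stand-ins; the fitness function is passed in as a lambda.

```ruby
# Pair organisms at random, cross each pair at a single point, then mutate
# the children. An unpaired organism (odd population) passes through as is.
def breed_generation(population, mutation_rate, rng: Random.new)
  population.shuffle(random: rng).each_slice(2).flat_map do |a, b|
    next [a] if b.nil?

    point = rng.rand(1...a.length)
    children = [a[0...point] + b[point..-1], b[0...point] + a[point..-1]]
    children.map { |c| c.map { |g| rng.rand < mutation_rate ? 1 - g : g } }
  end
end

# Score every organism and list the fittest first.
def rank_by_fitness(population, fitness)
  population.sort_by { |organism| -fitness.call(organism) }
end

population = Array.new(10) { Array.new(6) { rand(2) } }
next_generation = breed_generation(population, 0.01)
p rank_by_fitness(next_generation, ->(c) { c.sum }).first
```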
Selection
Here’s where it gets interesting (and where we have
some decisions to make).
In each generation, we want to promote the most fit
solutions and demote the least fit. There are a number
of factors to consider.
Selection
The fittest solutions will be selected more often to
breed into the next generation. The frequency can be
determined by various techniques including:
• weighting by each organism’s % of the generation’s total fitness
• randomly, weighted by fitness (“roulette wheel”)
• “tournament selection” (taking a subset and picking the best of the subset)
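Two of the techniques above, sketched under assumptions: fitness values are non-negative and precomputed in an array parallel to the population, and the tournament draws its subset without replacement. The method names are hypothetical.

```ruby
# Roulette wheel: each organism owns a slice of the wheel proportional to
# its fitness. Falls back to a uniform pick when every fitness is 0.
def roulette_select(population, fitnesses, rng: Random.new)
  total = fitnesses.sum
  return population[rng.rand(population.length)] if total.zero?

  spin = rng.rand * total
  running = 0
  population.each_with_index do |organism, i|
    running += fitnesses[i]
    return organism if spin < running
  end
  population.last # guard against floating-point rounding
end

# Tournament: draw `size` distinct organisms and keep the fittest of them.
def tournament_select(population, fitnesses, size, rng: Random.new)
  contenders = population.each_index.to_a.sample(size, random: rng)
  population[contenders.max_by { |i| fitnesses[i] }]
end

population = [[0, 0, 1], [1, 1, 0], [1, 1, 1]]
fitnesses = [1, 2, 3]
p roulette_select(population, fitnesses)
p tournament_select(population, fitnesses, 2)
```

A larger tournament size increases the pressure toward the fittest organisms; a size of 1 is a uniform random pick.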
Elitism
To ensure we never lose the best solution found so far,
we can always include the best organism(s) found
in the next generation.
This mechanism is termed elitism.
If the threshold for elitism is too harsh, the population
may prematurely converge on a solution, a behavior sometimes
called “hill-climbing”.
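A sketch of elitism, assuming the caller supplies a block that produces one new offspring per call (for example by selection, crossover, and mutation). `with_elitism` and `elite_count` are hypothetical names.

```ruby
# Copy the `elite_count` fittest organisms into the next generation
# unchanged, then fill the remaining slots with offspring from the block.
def with_elitism(population, fitness, elite_count)
  elites = population.max_by(elite_count) { |organism| fitness.call(organism) }
  offspring = Array.new(population.length - elite_count) { yield }
  elites + offspring
end

population = [[0, 0, 1], [1, 1, 1], [1, 0, 0], [0, 1, 1]]
next_generation = with_elitism(population, ->(c) { c.sum }, 1) do
  Array.new(3) { rand(2) } # stand-in for real breeding
end
p next_generation.first # [1, 1, 1], the preserved elite
```

Keeping `elite_count` small relative to the population leaves most slots open for new combinations, which is one way to limit premature convergence.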
Hill-climbing
A “hill-climbing” algorithm takes a solution and checks adjacent solutions to see if neighboring options are better.
Vulnerable to local maxima/minima
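A sketch of hill-climbing on a bit string, where "adjacent" is assumed to mean "differs by one flipped gene". The second fitness function is contrived purely to show a climber stuck on a local maximum.

```ruby
# Repeatedly move to the best single-bit-flip neighbor; stop when no
# neighbor is strictly better than the current solution.
def hill_climb(start, fitness)
  current = start
  loop do
    neighbors = current.each_index.map do |i|
      current.dup.tap { |neighbor| neighbor[i] = 1 - neighbor[i] }
    end
    best = neighbors.max_by { |neighbor| fitness.call(neighbor) }
    return current if best.nil? || fitness.call(best) <= fitness.call(current)

    current = best
  end
end

# Smooth landscape: every extra 1 helps, so the climb reaches the top.
p hill_climb([0, 0, 0], ->(c) { c.sum }) # [1, 1, 1]

# Contrived landscape: all zeros scores 2, all ones scores 10, anything in
# between scores 1. Starting from all zeros, every neighbor looks worse.
trap = ->(c) { { 0 => 2, 3 => 10 }.fetch(c.sum, 1) }
p hill_climb([0, 0, 0], trap) # [0, 0, 0], stuck on a local maximum
```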
Why Did I Do This?
Fantasy Baseball!
I play in a fantasy baseball league where over 600
baseball players are controlled by the teams.
Our league was expanding from 14 to 16 teams, and
I wanted to see the effect that the change would
have on positional scarcity.
Why Did I Do This?
Some positions (first base, outfield) have lots of great
hitters, while some (shortstop, catcher) have fewer
good hitters.
Some players are eligible to play at multiple
positions.
I wanted a way to simulate how our league would
draft and allocate players in the upcoming season.
Why Did I Do This?
My first attempt was to build a program that
performed a draft. Each turn it found the weakest
position and then selected the best player who could
play that position.
I noticed that this program frequently turned out
solutions that looked incorrect (positions looked
oddly ranked relative to one another).
Why Did I Do This?
It turns out my solution prematurely optimized, so I
was allocating players inefficiently and failing to
accurately simulate what would happen in a real
draft where people could observe the scarcity of
each position in real time.
Why Did I Do This?
I wasn’t actually interested in getting a perfect
solution, but was concerned with getting something
that was a reasonable representation of what would
happen in an auction.
I used the Darwinning gem, which provides a GA
framework, to simulate the allocation of players.
Further Reading
• Daniel Sellergren - Solving the 0-1 Knapsack Problem with a Genetic Algorithm in Ruby http://www.danielsellergren.com/posts/solving-the-0-1-knapsack-problem-with-a-genetic-algorithm-in-ruby
• MIT Course Lecture (very high-level, great introduction!) - Genetic Algorithms https://www.youtube.com/watch?v=kHyNqSnzP8Y
• Darwinning - Ruby gem for GA https://github.com/dorkrawk/darwinning
Keep in Touch!
GitHub - @geoffharcourt
Twitter - @geoffharcourt
Email - geoff@thoughtbot.com
DC Tech Slack - @geoffharcourt