info to genetic algorithms - dc ruby users group 11.10.2016

Post on 13-Apr-2017


A Brief Introduction to Genetic Algorithms

Geoff Harcourt

Hi

I’m Geoff

Developer at thoughtbot

Maintainer of thoughtbot/dotfiles and parity (Heroku app shortcuts)

The Knapsack Problem

Given a set of items, each with its own weight, size, and price, determine the combination of items within the weight and size budget that has the most value.

CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=985491

Seating Chart for a Wedding

• Positive points for putting guests who like each other together

• Negative points for putting guests who don’t get along together

• Families must be together

What arrangement produces the most happiness?

Delivery Truck Route

Given a set of packages that must be delivered in one trip, what’s the ideal order of stops to do the delivery in the shortest time (and/or the least distance travelled)?

What Do These Problems Have In Common?

• Potentially massive/infinite set of possible solutions (“large solution space”)

• Optimized solutions are better, but “great” is almost as good as “perfect”

• Cheap to test any one solution’s value (“fitness”)

When Force Isn’t Enough

The first potential approach is brute force: test every possible solution.

For some problems this technique can find the solution, but if the solution space is too large or infinite, it may not be feasible.
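As a rough illustration (not from the talk), here is a brute-force sketch in Python for a problem where each item is either in or out. The function names and the tiny data set are made up for this example. With n items there are 2^n combinations, so 100 items already means roughly 1.27 × 10^30 candidates.

```python
from itertools import product

def count_candidates(n_items):
    """Number of in/out combinations for n_items items."""
    return 2 ** n_items

def brute_force_best(values, weights, max_weight):
    """Test every combination; only practical for a handful of items."""
    best_value = 0
    best_choice = tuple(0 for _ in values)  # start from the empty knapsack
    for choice in product([0, 1], repeat=len(values)):
        weight = sum(w for w, bit in zip(weights, choice) if bit)
        value = sum(v for v, bit in zip(values, choice) if bit)
        if weight <= max_weight and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice
```

This works for three or four items; the loop count doubles with every item added.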

Genetic Algorithms

Genetic Algorithms (GA) are a type of search algorithm that mimics the mechanics of natural selection to traverse a space of possible solutions and generate high-quality solutions.

What does that mean?

A Genetic Algorithm combines and re-combines elements of solutions to “evolve” toward more optimal solutions in a manner similar to that by which a biological population evolves over time.

Current Genetic Research

How does GA work?

Representation: represent parts of the problem as “genes”

Fitness: a function that can be run against the expressed genes to measure the quality of the solution

Evolution: breeding and/or mutation and selection

Representation

A DNA-based organism represents its genetic code with a base-4 system [A, C, G, T]. A chromosome’s gene sequence might read as AACTGACTGA

Many problems can be expressed as base-2: 0110101000

Representation: The Knapsack Problem

Generate a random set of 100 items, each with its own weight, size, and value. Put the items in an array for reference. Our organism will have one “chromosome” with 100 “genes”. Each gene is set to either 0 (not in the knapsack) or 1 (in the knapsack).

Each gene’s position in the chromosome matches the position in the array of the item whose state it represents.
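A minimal Python sketch of this representation follows. The `Item` class, the helper names, and the value ranges are assumptions made for illustration; the talk only specifies 100 items and one bit per item.

```python
import random
from dataclasses import dataclass

@dataclass
class Item:
    weight: float
    size: float
    value: float

def random_items(count, rng):
    """Build the reference array of items (ranges are arbitrary placeholders)."""
    return [
        Item(weight=rng.uniform(1, 10), size=rng.uniform(1, 10), value=rng.uniform(1, 100))
        for _ in range(count)
    ]

def random_chromosome(length, rng):
    """One gene per item: 0 = not in the knapsack, 1 = in the knapsack."""
    return [rng.randint(0, 1) for _ in range(length)]

def packed_items(chromosome, items):
    """Gene i describes items[i], so pairing the two lists decodes the chromosome."""
    return [item for gene, item in zip(chromosome, items) if gene == 1]
```

The chromosome itself never stores weights or values; it is only meaningful alongside the item array.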

Fitness

The Fitness Function is how we test any solution’s fitness, or how effective the solution is.

For some problems the best fitness will be the highest number possible or lowest number possible, or it might be the number closest to an ideal value.

Fitness: The Knapsack Problem

The fitness function for the Knapsack Problem would be the sum of the value of the items that are in the Knapsack.

Our 9-gene chromosome: [0, 1, 1, 0, 0, 0, 1, 0, 1]

Our knapsack has items 1 ($11), 2 ($5), 6 ($3), and 8 ($21). Our knapsack’s value is 40, the sum of the dollar values of the items inside.
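The same arithmetic as a short Python sketch, counting gene positions from zero. Only the values for items 1, 2, 6, and 8 come from the example above; the other values in `VALUES` are made-up placeholders, and `knapsack_value` is a name chosen for this sketch.

```python
# Dollar value of each item, indexed by position. Items 1, 2, 6 and 8 use the
# values from the example; the remaining entries are placeholders.
VALUES = [7, 11, 5, 9, 2, 14, 3, 8, 21]

def knapsack_value(chromosome, values):
    """Sum the values of the items whose gene is set to 1."""
    return sum(value for gene, value in zip(chromosome, values) if gene == 1)

chromosome = [0, 1, 1, 0, 0, 0, 1, 0, 1]
print(knapsack_value(chromosome, VALUES))  # 40
```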

But wait, there’s more (to fitness)!

Some solutions are invalid. In the Knapsack Problem, solutions whose summed weights or volumes are greater than the knapsack can hold aren’t valid even if they contain the highest dollar value.

Invalid solutions need to receive a fitness that disqualifies them. In our case, we’ll return 0 for any knapsack that exceeds the weight or size limit.
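One way this could look in Python, assuming each item is a `(weight, size, value)` tuple; the tuple layout and parameter names are assumptions for this sketch.

```python
def fitness(chromosome, items, max_weight, max_size):
    """Total value of the packed items, or 0 if the knapsack is over a limit.

    items is a list of (weight, size, value) tuples, one per gene.
    """
    total_weight = total_size = total_value = 0
    for gene, (weight, size, value) in zip(chromosome, items):
        if gene == 1:
            total_weight += weight
            total_size += size
            total_value += value
    if total_weight > max_weight or total_size > max_size:
        return 0  # invalid solution: disqualify it
    return total_value
```

Returning 0 is the simplest penalty. It treats a knapsack that is barely over the limit the same as one that is far over it.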

Seeding a Population

To determine a solution to our problem, let’s start with a population.

We’ll randomly generate 200 organisms by building 200 chromosomes, randomly flipping the bits in our chromosome either to put the item into the knapsack or hold it out.
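A minimal seeding sketch in Python; `seed_population` is a hypothetical helper name, and the optional `rng` argument is there only so runs can be made repeatable.

```python
import random

def seed_population(size, chromosome_length, rng=random):
    """Build `size` chromosomes, each gene set to 0 or 1 at random."""
    return [
        [rng.randint(0, 1) for _ in range(chromosome_length)]
        for _ in range(size)
    ]

# 200 organisms with 100 genes each, as in the knapsack example.
population = seed_population(200, 100, random.Random(42))
```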

Seeding (continued)

If we think we know some parts of the solution already (such as an item that’s worth a lot and is small and lightweight), we can use non-random or partially random seed data to nudge the population closer to the solution.

This is called “warm starting”. It should be used carefully, as it may preclude unexpectedly fit solutions.
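A warm start could be sketched like this in Python. The `pinned_genes` mapping (gene index to fixed bit) is an assumption of this example, not something the talk prescribes.

```python
import random

def warm_start_population(size, chromosome_length, pinned_genes, rng=random):
    """Random population, except that some genes start at a chosen value.

    pinned_genes maps a gene index to the bit every seed organism gets there.
    """
    population = []
    for _ in range(size):
        chromosome = [rng.randint(0, 1) for _ in range(chromosome_length)]
        for index, bit in pinned_genes.items():
            chromosome[index] = bit
        population.append(chromosome)
    return population
```

The pinned genes only affect the seed population. Crossover and mutation are still free to change them later.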

Now what?

We are going to iterate through a number of generations. In each generation, we’ll use the following mechanisms to move toward the best fitness:

• Crossover

• Mutation

• Selection

Sweet, sweet love

Our algorithm takes two organisms* from the population and has them mate. Mating the organisms combines their respective genes and produces two new organisms, each containing some elements of their parents’ gene expressions.

* More complicated algorithms can mate more than two organisms at once, but we won’t do that here.

Crossover

Crossover is how we combine genes from two organisms to produce new solutions.

Crossover takes the chromosomes from two organisms and has them trade pieces with each other. The result of crossover is a group of organisms with new combinations of gene expressions that might not have existed in prior generations.
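The talk does not say which crossover scheme is used, so this Python sketch assumes single-point crossover, one of the simplest variants: pick a cut point and swap the tails.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Return two children made by swapping the parents' tails at a random cut.

    Assumes both parents have the same length, and at least two genes.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Because the cut point is never 0 or the full length, each child always gets at least one gene from each parent.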

Crossover

Crossover mimics the process of gene recombination (both between organisms and between chromosomes themselves) that occurs in biological organisms.

Crossover

Crossover will do much of the work of creating genetic diversity (different solutions through different combinations) from one generation to another, but what if our seed population was missing some gene expressions that would produce better solutions?

Mutation

Mutation is a mechanism to maintain genetic diversity.

Mutation is applied by flipping a gene’s expression according to a probability defined by the algorithm. These flipped states introduce new values into the population and contribute to a wider search space.
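For bit-string chromosomes, mutation could be sketched in Python as below. The default rate of 0.01 is an arbitrary placeholder, not a recommendation from the talk.

```python
import random

def mutate(chromosome, rate=0.01, rng=random):
    """Return a copy in which each gene is flipped with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]
```

With `rate=0.0` the chromosome comes back unchanged, and with `rate=1.0` every bit is flipped.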

Mutation

Mutation probabilities need to be kept low or else they result in a loss of progress from generation to generation, and the genetic algorithm becomes more of a random search than an evolution toward an ideal solution.

Mutation

It’s often helpful to tweak mutation settings (probability of mutation) over several tests to see how it affects the search.

A New Generation

By mating our initial population through crossover and randomly applying mutations to a small percentage of genes, we’ve produced a new generation of solutions.

We’ll test each organism’s fitness to see which organisms are most fit.
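Putting crossover and mutation together, one generation step might look like the Python sketch below. It assumes random pairing, single-point crossover, and an even population size (with an odd size, one organism is left unmated); none of these choices are specified in the talk.

```python
import random

def breed_generation(population, mutation_rate=0.01, rng=random):
    """Pair organisms at random, cross each pair, then mutate the children."""
    parents = population[:]
    rng.shuffle(parents)
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        point = rng.randint(1, len(a) - 1)  # single cut point inside the chromosome
        for child in (a[:point] + b[point:], b[:point] + a[point:]):
            # Flip each gene with a small probability.
            children.append(
                [1 - g if rng.random() < mutation_rate else g for g in child]
            )
    return children

def rank_by_fitness(population, fitness_fn):
    """Sort organisms from most fit to least fit."""
    return sorted(population, key=fitness_fn, reverse=True)
```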

Selection

Here’s where it gets interesting (and where we have some decisions to make).

In each generation, we want to promote the most fit solutions and demote the least fit. There are a number of factors to consider.

Selection

The fittest solutions will be selected more often to breed into the next generation. The frequency can be determined by various techniques including:

• weigh by organism’s % of generation’s total fitness

• randomly, weighted by fitness (“roulette wheel”)

• “tournament selection” (taking a subset and picking the best of the subset)
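Two of these techniques sketched in Python. The function names, the tournament size of 3, and the fallback for an all-zero generation are assumptions of this example.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one organism at random, weighted by its share of total fitness."""
    if sum(fitnesses) == 0:
        # Every organism scored 0 (e.g. all invalid): fall back to a uniform pick.
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]

def tournament_select(population, fitnesses, tournament_size=3, rng=random):
    """Draw a random subset and return its fittest member."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

Roulette selection assumes non-negative fitness values. Tournament selection only compares values, so it also works when fitness can be negative.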

Elitism

In order to ensure we never lose the best solution found so far, we can ensure that the best organism(s) found is/are always included in the next generation. This mechanism is termed elitism.

If the threshold for elitism is too harsh, the population may prematurely converge on a solution, a behavior sometimes called “hill-climbing”.
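A sketch of elitism in Python. Here `breed_fn` stands in for whatever selection, crossover, and mutation step is in use; it is a hypothetical callback that takes the population and the number of children wanted.

```python
def next_generation_with_elitism(population, fitness_fn, breed_fn, elite_count=1):
    """Carry the top `elite_count` organisms over unchanged, breed the rest."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    elites = [list(organism) for organism in ranked[:elite_count]]  # copies
    children = breed_fn(population, len(population) - elite_count)
    return elites + children
```

Keeping `elite_count` small relative to the population leaves most of each generation open to new combinations.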

Hill-climbing

A “hill-climbing” algorithm takes a solution and checks adjacent solutions to see if neighboring options are better.

Vulnerable to local maxima/minima
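To make the local-maximum problem concrete, here is a small Python sketch of a hill climber over bit strings, where "adjacent" means differing by one flipped bit. The `deceptive` fitness function is a contrived example invented for this sketch.

```python
def hill_climb(start, fitness_fn, max_steps=1000):
    """Repeatedly move to the best one-bit-flip neighbor until none is better."""
    current = list(start)
    current_fitness = fitness_fn(current)
    for _ in range(max_steps):
        best_neighbor, best_fitness = None, current_fitness
        for i in range(len(current)):
            neighbor = current[:]
            neighbor[i] = 1 - neighbor[i]  # flip a single gene
            neighbor_fitness = fitness_fn(neighbor)
            if neighbor_fitness > best_fitness:
                best_neighbor, best_fitness = neighbor, neighbor_fitness
        if best_neighbor is None:
            break  # no neighbor improves: a local (maybe not global) maximum
        current, current_fitness = best_neighbor, best_fitness
    return current, current_fitness

def deceptive(chromosome):
    """All ones scores 10; otherwise, fewer ones score higher."""
    if all(chromosome):
        return 10
    return len(chromosome) - sum(chromosome)
```

Starting from `[0, 0, 0]`, the climber stays put with a fitness of 3, because every single flip looks worse, even though `[1, 1, 1]` scores 10.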

Why Did I Do This?

Fantasy Baseball!

I play in a fantasy baseball league where over 600 baseball players are controlled by the teams.

Our league was expanding from 14 to 16 teams, and I wanted to see the effect that the change would have on positional scarcity.

Why Did I Do This?

Some positions (first base, outfield) have lots of great hitters, while some (shortstop, catcher) have fewer good hitters.

Some players are eligible to play at multiple positions.

I wanted a way to simulate how our league would draft and allocate players in the upcoming season.

Why Did I Do This?

My first attempt was to build a program that performed a draft. Each turn it found the weakest position and then selected the best player who could play that position.

I noticed that this program frequently turned out solutions that looked incorrect (positions looked oddly ranked relative to one another).

Why Did I Do This?

It turns out my solution optimized prematurely, so I was allocating players inefficiently and failing to accurately simulate what would happen in a real draft where people could observe the scarcity of each position in real time.

Why Did I Do This?

I wasn’t actually interested in getting a perfect solution, but was concerned with getting something that was a reasonable representation of what would happen in an auction.

I used the Darwinning gem, which provides a GA framework, to simulate the allocation of players.

Further Reading

• Daniel Sellergren - Solving the 0-1 Knapsack Problem with a Genetic Algorithm in Ruby http://www.danielsellergren.com/posts/solving-the-0-1-knapsack-problem-with-a-genetic-algorithm-in-ruby

• MIT Course Lecture (very high-level, great introduction!) - Genetic Algorithms https://www.youtube.com/watch?v=kHyNqSnzP8Y

• Darwinning - Ruby gem for GA https://github.com/dorkrawk/darwinning

Keep in Touch!

GitHub - @geoffharcourt

Twitter - @geoffharcourt

Email - geoff@thoughtbot.com

DC Tech Slack - @geoffharcourt
