info to genetic algorithms - dc ruby users group 11.10.2016

Upload: geoff-harcourt

Post on 13-Apr-2017

Page 1: Info to Genetic Algorithms - DC Ruby Users Group 11.10.2016

A Brief Introduction to Genetic Algorithms

Geoff Harcourt

Hi

I’m Geoff

Developer at thoughtbot

Maintainer of thoughtbot/dotfiles and parity (Heroku app shortcuts)

The Knapsack Problem

Given a set of items, each with its own weight, size, and price, determine the combination of items within the weight and size budget that has the most value.

CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=985491

Seating Chart for a Wedding

• Positive points for putting guests who like each other together

• Negative points for putting guests who don’t get along together

• Families must be together

What arrangement produces the most happiness?

Delivery Truck Route

Given a set of packages that must be delivered in one trip, what’s the ideal order of stops to complete the delivery in the shortest time (and/or the least distance travelled)?

What Do These Problems Have In Common?

• Potentially massive/infinite set of possible solutions (“large solution space”)

• Optimized solutions are better, but “great” is almost as good as “perfect”

• Cheap to test any one solution’s value (“fitness”)

When Force Isn’t Enough

First potential approach is brute force: test every possible solution

For some problems this technique can find the solution, but if the solution space is too large (or infinite), brute force may not be feasible.
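
To get a feel for why brute force runs out of steam, here is a minimal Ruby sketch. It assumes a simplified knapsack where each item is either in or out and only weight is limited. The item data and the names `solution_count`, `total`, and `brute_force_best` are made up for illustration.

```ruby
# Hypothetical items: only value and weight, to keep the sketch short.
SAMPLE_ITEMS = [
  { value: 10, weight: 5 },
  { value: 6, weight: 4 },
  { value: 5, weight: 3 }
].freeze

# Each item is either in or out, so n items give 2**n candidate knapsacks.
def solution_count(item_count)
  2**item_count
end

# Sum one attribute (:value or :weight) over the items whose gene is 1.
def total(genes, items, key)
  genes.zip(items).sum { |gene, item| gene * item[key] }
end

# Brute force: enumerate every in/out combination, drop the overweight ones,
# and keep the most valuable of the rest.
def brute_force_best(items, weight_limit)
  [0, 1].repeated_permutation(items.size)
        .select { |genes| total(genes, items, :weight) <= weight_limit }
        .max_by { |genes| total(genes, items, :value) }
end

p brute_force_best(SAMPLE_ITEMS, 7) # => [0, 1, 1] (value 11, weight 7)
puts solution_count(20)             # => 1048576
puts solution_count(100)            # => 1267650600228229401496703205376
```

Three items mean only 8 candidates to check. At 100 items the count has 31 digits, which is why testing every candidate stops being an option.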

Genetic Algorithms

Genetic Algorithms (GA) are a type of search algorithm that mimics the mechanics of natural selection to traverse a space of possible solutions and generate high-quality solutions.

What does that mean?

A Genetic Algorithm combines and re-combines elements of solutions to “evolve” toward more optimal solutions in a manner similar to that by which a biological population evolves over time.

Current Genetic Research

How does GA work?

Representation: represent parts of the problem as “genes”

Fitness: a function that can be run against the expressed genes to measure the quality of the solution

Evolution: breeding and/or mutation and selection

Representation

A DNA-based organism represents its genetic code with a base-4 system [A, C, G, T]. A chromosome’s gene sequence might read as AACTGACTGA

Many problems can be expressed as base-2: 0110101000

Representation: The Knapsack Problem

Generate a random set of 100 items, each with its own weight, size, and value. Put the items in an array for reference. Our organism will have one “chromosome” with 100 “genes”. Each gene is set to either 0 (not in the knapsack) or 1 (in the knapsack).

Each gene’s position in the chromosome matches the position in the array of the item it represents.
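
One way this representation might look in Ruby is sketched below. The `Item` struct, the helper names, and the random ranges for weight, size, and value are assumptions chosen for illustration, not anything prescribed by the talk.

```ruby
# An item has the three attributes named on the slide.
Item = Struct.new(:weight, :size, :value)

# Build `count` random items; the numeric ranges are arbitrary choices.
def random_items(count, rng: Random.new)
  Array.new(count) do
    Item.new(rng.rand(1..20), rng.rand(1..20), rng.rand(1..50))
  end
end

# One chromosome: a random 0/1 gene for each item.
def random_chromosome(length, rng: Random.new)
  Array.new(length) { rng.rand(2) }
end

# Gene i refers to items[i]; return the items whose gene is 1.
def packed_items(chromosome, items)
  items.select.with_index { |_item, index| chromosome[index] == 1 }
end

items = random_items(100)
chromosome = random_chromosome(items.size)
puts "#{packed_items(chromosome, items).size} of #{items.size} items are in the knapsack"
```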

Fitness

The Fitness Function is how we test any solution’s fitness, or how effective the solution is.

For some problems the best fitness will be the highest number possible or lowest number possible, or it might be the number closest to an ideal value.

Fitness: The Knapsack Problem

The fitness function for the Knapsack Problem is the sum of the values of the items that are in the knapsack.

Our 9-gene chromosome: [0, 1, 1, 0, 0, 0, 1, 0, 1]

Counting positions from 0, our knapsack has items 1 ($11), 2 ($5), 6 ($3), and 8 ($21). Our knapsack’s value is $40, the sum of the dollar values of the items inside.
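
The arithmetic above can be sketched in a few lines of Ruby. Only the values of items 1, 2, 6, and 8 come from the slide. The other values in `VALUES` and the method name `knapsack_value` are invented for the example.

```ruby
# Dollar values for items 0..8. Items 1, 2, 6 and 8 match the slide;
# the remaining values are made up.
VALUES = [7, 11, 5, 9, 2, 14, 3, 8, 21].freeze
CHROMOSOME = [0, 1, 1, 0, 0, 0, 1, 0, 1].freeze

# Multiply each gene (0 or 1) by its item's value and add up the results.
def knapsack_value(chromosome, values)
  chromosome.zip(values).sum { |gene, value| gene * value }
end

puts knapsack_value(CHROMOSOME, VALUES) # => 40
```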

But wait, there’s more (to fitness)!

Some solutions are invalid. In the Knapsack Problem, solutions whose summed weights or volumes are greater than the knapsack can hold aren’t valid even if they contain the highest dollar value.

These solutions need a fitness that disqualifies them. In our case, we’ll return 0 for any knapsack that exceeds the weight or size limit.
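
A fitness function with that disqualification rule could look roughly like this. The `Item` struct, the `max_weight:` and `max_size:` keyword names, and the sample numbers are assumptions for the sketch.

```ruby
Item = Struct.new(:weight, :size, :value)

# Returns the knapsack's total value, or 0 if it is over either limit.
def fitness(chromosome, items, max_weight:, max_size:)
  packed = items.select.with_index { |_item, index| chromosome[index] == 1 }
  return 0 if packed.sum(&:weight) > max_weight
  return 0 if packed.sum(&:size) > max_size

  packed.sum(&:value)
end

# Made-up items: (weight, size, value).
ITEMS = [Item.new(4, 2, 11), Item.new(3, 3, 5), Item.new(6, 1, 21)].freeze

puts fitness([1, 1, 0], ITEMS, max_weight: 10, max_size: 6) # => 16
puts fitness([1, 1, 1], ITEMS, max_weight: 10, max_size: 6) # => 0 (weight 13 > 10)
```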

Seeding a Population

To determine a solution to our problem, let’s start with a population.

We’ll randomly generate 200 organisms by building 200 chromosomes, setting each bit at random to either put the corresponding item into the knapsack or hold it out.
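
As a rough sketch, seeding can be a one-liner in Ruby. The constant and method names here are hypothetical; the sizes (200 organisms, 100 genes) follow the slides.

```ruby
POPULATION_SIZE = 200
GENE_COUNT = 100

# Build `size` chromosomes, each with `gene_count` random 0/1 genes.
def seed_population(size, gene_count, rng: Random.new)
  Array.new(size) { Array.new(gene_count) { rng.rand(2) } }
end

population = seed_population(POPULATION_SIZE, GENE_COUNT)
puts population.size       # => 200
puts population.first.size # => 100
```

Passing a seeded `Random` (for example `rng: Random.new(42)`) makes a run repeatable, which helps when comparing settings.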

Seeding (continued)

If we think we know some parts of the solution already (such as an item that’s worth a lot and is small and lightweight), we can use non-random or partially random seed data to nudge the population closer to the solution.

This is called “warm starting”. It should be used carefully, as it may preclude unexpectedly fit solutions.
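
One possible way to warm start, sketched under the assumption that we simply force a few genes on and leave the rest random. The name `warm_chromosome` and the `forced_in` parameter are hypothetical.

```ruby
# `forced_in` lists the gene positions we believe belong in the solution.
# Those genes start at 1; every other gene is still random.
def warm_chromosome(gene_count, forced_in, rng: Random.new)
  genes = Array.new(gene_count) { rng.rand(2) }
  forced_in.each { |index| genes[index] = 1 }
  genes
end

p warm_chromosome(10, [2, 5]) # positions 2 and 5 are always 1
```

Crossover and mutation can still switch these genes off later, so this only nudges the starting point rather than locking anything in.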

Now what?

We are going to iterate through a number of generations. In each generation, we’ll use the following mechanisms to move toward the best fitness:

• Crossover

• Mutation

• Selection

Sweet, sweet love

Our algorithm takes two organisms* from the population and has them mate. Mating the organisms combines their respective genes and produces two new organisms, each containing some elements of their parents’ gene expressions.

* More complicated algorithms can mate more than two organisms at once; we won’t do that here.

Crossover

Crossover is how we combine genes from two organisms to produce new solutions.

Crossover takes the chromosomes from two organisms and has them trade pieces with each other. The result of crossover is a group of organisms with new combinations of gene expressions that might not have existed in prior generations.
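
The slides don’t say which crossover scheme is used, so the sketch below assumes single-point crossover, one common choice: cut both parents at the same random point and swap the tails. The method and parameter names are made up.

```ruby
# Single-point crossover: both parents are cut at the same random point
# and trade everything after it, producing two children.
def crossover(parent_a, parent_b, rng: Random.new)
  point = rng.rand(1...parent_a.size) # cut strictly inside the chromosome
  child_a = parent_a[0...point] + parent_b[point..-1]
  child_b = parent_b[0...point] + parent_a[point..-1]
  [child_a, child_b]
end

mom = [0, 0, 0, 0, 0, 0]
dad = [1, 1, 1, 1, 1, 1]
p crossover(mom, dad) # e.g. [[0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 0]]
```

The exact children depend on where the random cut lands, so the output shown is only one possibility.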

Crossover

Crossover mimics the process of gene recombination (both between organisms and between chromosomes themselves) that occurs in biological organisms.

Crossover

Crossover will do much of the work of creating genetic diversity (different solutions through different combinations) from one generation to the next. But what if our seed population was missing some gene expressions that would produce better solutions?

Mutation

Mutation is a mechanism to maintain genetic diversity.

Mutation is applied by flipping a gene’s expression according to a probability defined by the algorithm. These flipped states introduce new values into the population and contribute to a wider search space.
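
For a binary chromosome, mutation can be sketched as a per-gene coin flip. The 1% rate in `MUTATION_RATE` is an assumed starting value, not a figure from the talk, and the method name is hypothetical.

```ruby
MUTATION_RATE = 0.01 # assumed value; a setting to experiment with

# Flip each gene (0 becomes 1, 1 becomes 0) with probability `rate`.
def mutate(chromosome, rate: MUTATION_RATE, rng: Random.new)
  chromosome.map { |gene| rng.rand < rate ? 1 - gene : gene }
end

p mutate([0, 1, 0, 1], rate: 0.0) # => [0, 1, 0, 1] (nothing flips)
p mutate([0, 1, 0, 1], rate: 1.0) # => [1, 0, 1, 0] (everything flips)
```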

Mutation

Mutation probabilities need to be kept low or else they result in a loss of progress from generation to generation, and the genetic algorithm becomes more of a random search than an evolution toward an ideal solution.

Mutation

It’s often helpful to tweak mutation settings (probability of mutation) over several tests to see how it affects the search.

A New Generation

By mating our initial population through crossover and randomly applying mutations to a small percentage of genes, we’ve produced a new generation of solutions.

We’ll test each organism’s fitness to see which organisms are most fit.
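
Putting the pieces together, one generation step might be sketched like this. It assumes random pairing of parents, single-point crossover, and a 1% mutation rate. All names are hypothetical, and the fitness used in the example (count of 1 genes) is just a stand-in for a real fitness function.

```ruby
MUTATION_RATE = 0.01 # assumed value

# Single-point crossover producing two children.
def crossover(parent_a, parent_b, rng)
  point = rng.rand(1...parent_a.size)
  [parent_a[0...point] + parent_b[point..-1],
   parent_b[0...point] + parent_a[point..-1]]
end

# Flip each gene with probability `rate`.
def mutate(chromosome, rate, rng)
  chromosome.map { |gene| rng.rand < rate ? 1 - gene : gene }
end

# Pair parents up at random, mate each pair, then mutate the children.
# With an odd population, the leftover organism is only mutated.
def next_generation(population, rate: MUTATION_RATE, rng: Random.new)
  population.shuffle(random: rng).each_slice(2).flat_map do |a, b|
    next [mutate(a, rate, rng)] if b.nil?

    crossover(a, b, rng).map { |child| mutate(child, rate, rng) }
  end
end

# Sort organisms from most fit to least fit using the given block.
def rank_by_fitness(population)
  population.sort_by { |chromosome| -yield(chromosome) }
end

population = Array.new(10) { Array.new(8) { rand(2) } }
children = next_generation(population)
best = rank_by_fitness(children) { |chromosome| chromosome.sum }.first
p best
```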

Selection

Here’s where it gets interesting (and where we have some decisions to make).

In each generation, we want to promote the most fit solutions and demote the least fit. There are a number of factors to consider.

Selection

The fittest solutions will be selected more often to breed into the next generation. The frequency can be determined by various techniques including:

• weighting by each organism’s % of the generation’s total fitness

• randomly, weighted by fitness (“roulette wheel”)

• “tournament selection” (taking a subset and picking the best of the subset)
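
Two of these techniques are sketched below. Both assume fitness values are non-negative and are passed in as an array parallel to the population. The method names and the default tournament size of 3 are assumptions.

```ruby
# Roulette wheel: pick one organism with probability proportional to its
# share of the total fitness.
def roulette_select(population, fitnesses, rng: Random.new)
  total = fitnesses.sum
  return population.sample(random: rng) if total.zero? # nothing to weight by

  spin = rng.rand * total
  running = 0.0
  population.each_with_index do |organism, index|
    running += fitnesses[index]
    return organism if spin < running
  end
  population.last # guards against floating-point rounding
end

# Tournament: draw `size` organisms at random and keep the fittest of them.
def tournament_select(population, fitnesses, size: 3, rng: Random.new)
  contenders = (0...population.size).to_a.sample(size, random: rng)
  population[contenders.max_by { |index| fitnesses[index] }]
end

population = %w[a b c d]
fitnesses = [1, 6, 2, 1]
puts roulette_select(population, fitnesses)   # "b" about 60% of the time
puts tournament_select(population, fitnesses) # fittest of 3 random picks
```

A smaller tournament size gives weaker organisms a better chance of breeding, which is one of the knobs for keeping diversity in the population.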

Elitism

To make sure we never lose the best solution found so far, we can ensure that the best organism(s) found is/are always included in the next generation.

This mechanism is termed elitism.

If the threshold for elitism is too harsh, the population may prematurely converge on a solution, sometimes called “hill-climbing”.
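
A minimal sketch of elitism, assuming we want to keep the population size constant by letting the elite parents replace the weakest offspring. The method name `apply_elitism` and the `elite_count:` parameter are hypothetical.

```ruby
# Copy the `elite_count` fittest parents into the next generation and drop
# the same number of the weakest offspring. The block computes fitness.
# Assumes elite_count is no larger than the offspring count.
def apply_elitism(parents, offspring, elite_count: 1, &fitness)
  elites = parents.max_by(elite_count, &fitness)
  survivors = offspring.max_by(offspring.size - elite_count, &fitness)
  elites + survivors
end

parents = [[1, 1, 1], [0, 0, 0]]
offspring = [[1, 0, 0], [0, 0, 0]]
p apply_elitism(parents, offspring) { |chromosome| chromosome.sum }
# => [[1, 1, 1], [1, 0, 0]]
```

Raising `elite_count` protects more of the current best solutions, at the cost of the premature convergence described above.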

Hill-climbing

A “hill-climbing” algorithm takes a solution and checks adjacent solutions to see if neighboring options are better.

Vulnerable to local maxima/minima
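
To see the local-maximum problem in action, here is a small hill-climbing sketch for bit strings. It assumes “adjacent” means differing by exactly one flipped gene. The names and the deliberately misleading `DECEPTIVE` fitness function are invented for the example.

```ruby
# Neighbors of a bit string: every chromosome that differs by one flipped gene.
def neighbors(chromosome)
  chromosome.each_index.map do |index|
    chromosome.dup.tap { |copy| copy[index] = 1 - copy[index] }
  end
end

# Move to the best neighbor until no neighbor beats the current solution.
# Assumes a non-empty chromosome.
def hill_climb(start, &fitness)
  current = start
  loop do
    best = neighbors(current).max_by(&fitness)
    return current if fitness.call(best) <= fitness.call(current)

    current = best
  end
end

# A made-up fitness with a trap: all 1s scores 10, otherwise the score is
# the number of 0s, so most of the landscape slopes toward all 0s.
DECEPTIVE = lambda do |chromosome|
  ones = chromosome.sum
  ones == chromosome.size ? 10 : chromosome.size - ones
end

p hill_climb([0, 0, 0, 0], &DECEPTIVE) # => [0, 0, 0, 0] (stuck, fitness 4)
p hill_climb([1, 1, 1, 0], &DECEPTIVE) # => [1, 1, 1, 1] (fitness 10)
```

Starting from all 0s, every neighbor scores worse, so the climber stops at fitness 4 even though a solution worth 10 exists elsewhere.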

Why Did I Do This?

Fantasy Baseball!

I play in a fantasy baseball league where over 600 baseball players are controlled by the teams.

Our league was expanding from 14 to 16 teams, and I wanted to see the effect that the change would have on positional scarcity.

Why Did I Do This?

Some positions (first base, outfield) have lots of great hitters, while some (shortstop, catcher) have fewer good hitters.

Some players are eligible to play at multiple positions.

I wanted a way to simulate how our league would draft and allocate players in the upcoming season.

Why Did I Do This?

My first attempt was to build a program that performed a draft. Each turn it found the weakest position and then selected the best player who could play that position.

I noticed that this program frequently turned out solutions that looked incorrect (positions looked oddly ranked relative to one another).

Why Did I Do This?

It turns out my solution prematurely optimized, so I was allocating players inefficiently and failing to accurately simulate what would happen in a real draft where people could observe the scarcity of each position in real time.

Why Did I Do This?

I wasn’t actually interested in getting a perfect solution, but was concerned with getting something that was a reasonable representation of what would happen in an auction.

I used the Darwinning gem, which provides a GA framework, to simulate the allocation of players.

Further Reading

• Daniel Sellergren - Solving the 0-1 Knapsack Problem with a Genetic Algorithm in Ruby http://www.danielsellergren.com/posts/solving-the-0-1-knapsack-problem-with-a-genetic-algorithm-in-ruby

• MIT Course Lecture (very high-level, great introduction!) - Genetic Algorithms https://www.youtube.com/watch?v=kHyNqSnzP8Y

• Darwinning - Ruby gem for GA https://github.com/dorkrawk/darwinning

Keep in Touch!

GitHub - @geoffharcourt

Twitter - @geoffharcourt

Email - [email protected]

DC Tech Slack - @geoffharcourt