
Symbolic Regression via Genetic Programming

AI Project #2

Biointelligence Lab
Cho, Dong-Yeon

([email protected])

© 2005 SNU CSE Biointelligence Lab

Example (1/2)

- Data
  - Relationship between A and P

    A      P
    0.39   0.24
    0.72   0.61
    1.00   1.00
    1.52   1.84
    5.20   11.9
    9.53   29.4
    19.1   83.5


Example (2/2)

- Kepler’s Third Law
  - The square of any planet's (sidereal) orbital period is proportional to the cube of its mean distance (semi-major axis) from the Sun.

    Planet    A      P
    Mercury   0.39   0.24
    Venus     0.72   0.61
    Earth     1.00   1.00
    Mars      1.52   1.84
    Jupiter   5.20   11.9
    Saturn    9.53   29.4
    Uranus    19.1   83.5
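
As a quick check of the law against the table, the sketch below computes P^2 / A^3 for each planet; if the law holds, the ratio should be roughly the same constant for all of them. The names `PLANETS` and `kepler_ratio` are illustrative and not part of the original slides.

```python
# Orbital data copied from the table: (planet, A, P).
PLANETS = [
    ("Mercury", 0.39, 0.24),
    ("Venus", 0.72, 0.61),
    ("Earth", 1.00, 1.00),
    ("Mars", 1.52, 1.84),
    ("Jupiter", 5.20, 11.9),
    ("Saturn", 9.53, 29.4),
    ("Uranus", 19.1, 83.5),
]


def kepler_ratio(a, p):
    """Return P^2 / A^3, which Kepler's Third Law says is constant."""
    return p ** 2 / a ** 3


ratios = {name: kepler_ratio(a, p) for name, a, p in PLANETS}

for name, ratio in ratios.items():
    print(f"{name:8s} P^2 / A^3 = {ratio:.3f}")
```

Because the table is normalised so that Earth has A = P = 1.00, the constant comes out close to 1 for every planet.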


Evolving the Programs (1/2)

Koza’s Algorithm
1. Choose a set of possible functions and terminals for the program.
   F = {+, -, *, /, √}, T = {A}
2. Generate an initial population of random trees (programs) using the set of possible functions and terminals.
3. Calculate the fitness of each program in the population by running it on a set of “fitness cases” (a set of inputs for which the correct output is known).
4. Apply selection, crossover, and mutation to the population to form a new population.
5. Repeat steps 3 and 4 for some number of generations.
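
The five steps can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that are not in the slides: trees are nested tuples, fitness is the sum of squared errors on the Kepler data, selection is a size-3 tournament, the best tree is always copied forward, mutation happens with probability 0.1, and children above `max_nodes` are discarded. All function names are hypothetical.

```python
import math
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1}  # step 1: name -> arity
CASES = [(0.39, 0.24), (0.72, 0.61), (1.00, 1.00), (1.52, 1.84),
         (5.20, 11.9), (9.53, 29.4), (19.1, 83.5)]        # fitness cases (A, P)


def random_tree(depth):
    """Step 2: a tree is the terminal "A" or a tuple (op, child, ...)."""
    if depth == 0 or random.random() < 0.3:
        return "A"
    op = random.choice(sorted(FUNCTIONS))
    return (op,) + tuple(random_tree(depth - 1) for _ in range(FUNCTIONS[op]))


def evaluate(tree, a):
    if tree == "A":
        return a
    op, *children = tree
    v = [evaluate(child, a) for child in children]
    if op == "+":
        return v[0] + v[1]
    if op == "-":
        return v[0] - v[1]
    if op == "*":
        return v[0] * v[1]
    if op == "/":
        return v[0] / v[1] if v[1] != 0 else 1.0   # protected division
    return math.sqrt(abs(v[0]))                    # protected square root


def error(tree):
    """Step 3: sum of squared errors over the fitness cases (lower is better)."""
    try:
        total = sum((p - evaluate(tree, a)) ** 2 for a, p in CASES)
    except OverflowError:
        return math.inf
    return total if math.isfinite(total) else math.inf


def paths(tree, here=()):
    """Yield the index path to every node of the tree."""
    yield here
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from paths(tree[i], here + (i,))


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def graft(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (graft(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(mom, dad):
    donor = subtree(dad, random.choice(list(paths(dad))))
    return graft(mom, random.choice(list(paths(mom))), donor)


def mutate(tree):
    return graft(tree, random.choice(list(paths(tree))), random_tree(2))


def select(scored, k=3):
    """Tournament selection on (error, tree) pairs."""
    return min(random.sample(scored, k), key=lambda pair: pair[0])[1]


def evolve(pop_size=100, generations=20, max_nodes=30, seed=0):
    random.seed(seed)
    population = [random_tree(3) for _ in range(pop_size)]       # step 2
    for _ in range(generations):                                 # step 5
        scored = [(error(t), t) for t in population]             # step 3
        children = [min(scored, key=lambda pair: pair[0])[1]]    # keep the best
        while len(children) < pop_size:                          # step 4
            child = crossover(select(scored), select(scored))
            if random.random() < 0.1:
                child = mutate(child)
            if sum(1 for _ in paths(child)) <= max_nodes:
                children.append(child)
        population = children
    return min(population, key=error)


best = evolve()
print(best, error(best))
```

The size cap is there only to keep the sketch fast and the recursion shallow; the project itself asks you to study such settings rather than fix them.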


Evolving Lisp Programs (2/2)

- Kepler’s Third Law: $P^2 = cA^3$

FORTRAN

    PROGRAM ORBITAL_PERIOD
    C # Mars #
        A = 1.52
        P = SQRT(A * A * A)
        PRINT *, P
    END PROGRAM ORBITAL_PERIOD

LISP

    (defun orbital_period ()
      ; Mars
      (setf A 1.52)
      (sqrt (* A (* A A))))

Parse tree

    [Figure: parse tree of the expression (sqrt (* A (* A A)))]
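
To make the parse-tree idea concrete, here is a small Python sketch that stores the Lisp expression as a nested tuple, evaluates it for Mars, and prints it back in prefix form. The tuple encoding and the names `TREE`, `OPS`, `evaluate`, and `to_lisp` are assumptions chosen for illustration, not something the slides prescribe.

```python
import math

# The Lisp expression (sqrt (* A (* A A))) as a parse tree:
# each internal node is (operator, child, ...); each leaf is a terminal name.
TREE = ("sqrt", ("*", "A", ("*", "A", "A")))

OPS = {
    "sqrt": math.sqrt,
    "*": lambda x, y: x * y,
}


def evaluate(node, env):
    """Evaluate a parse tree bottom-up, looking terminals up in env."""
    if isinstance(node, str):
        return env[node]
    op, *children = node
    return OPS[op](*(evaluate(child, env) for child in children))


def to_lisp(node):
    """Render the tree back into prefix (Lisp) notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_lisp(c) for c in node[1:]]) + ")"


period_mars = evaluate(TREE, {"A": 1.52})
print(to_lisp(TREE), "=", round(period_mars, 2))
```

Crossover and mutation in GP operate on exactly this kind of structure: they swap or regenerate whole subtrees, so the result is always a syntactically valid expression.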


Symbolic Regression by GP

- Objective
  - Find the function f for the given data (x, y):

    $y = f(x)$

- Data Sets
  - Set 1 and 2: 11 pairs
  - Set 3: 50 pairs


Functions and Terminals

- Functions
  - Numerical operators: {+, -, *, /, exp, log, sin, cos, sqrt}
  - Some operators must be protected against illegal operations (for example, division by zero or the logarithm of a non-positive number).
- Terminals
  - Input and constants: {x, R}, where $R \in [a, b]$
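
One possible way to protect the operators is sketched below. The fallback values (1.0 for division by zero, 0.0 for log of zero, absolute values for log and sqrt, a clamped argument for exp) are common conventions and are assumptions here; the slides do not say which protection to use. `random_constant` is a hypothetical helper for drawing the constant R from [a, b].

```python
import math
import random


def protected_div(x, y):
    """Division that returns 1.0 when the denominator is (near) zero."""
    return x / y if abs(y) > 1e-9 else 1.0


def protected_log(x):
    """Logarithm of |x|; returns 0.0 at x = 0 instead of raising."""
    return math.log(abs(x)) if x != 0 else 0.0


def protected_sqrt(x):
    """Square root of |x|, so negative inputs do not raise."""
    return math.sqrt(abs(x))


def protected_exp(x):
    """exp with the argument clamped so that math.exp cannot overflow."""
    return math.exp(min(x, 700.0))


def random_constant(a, b, rng=random):
    """Draw the ephemeral constant R uniformly from [a, b]."""
    return rng.uniform(a, b)


print(protected_div(1.0, 0.0), protected_log(0.0), protected_sqrt(-4.0))
```

Whatever convention you pick, state it in the report, because it changes which functions the evolved expressions can represent.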


Initialization

- A maximum initial tree depth $D_{max}$ is set.
- Full method (each branch has depth $= D_{max}$):
  - nodes at depth $d < D_{max}$ are randomly chosen from the function set F
  - nodes at depth $d = D_{max}$ are randomly chosen from the terminal set T
- Grow method (each branch has depth $\le D_{max}$):
  - nodes at depth $d < D_{max}$ are randomly chosen from $F \cup T$
  - nodes at depth $d = D_{max}$ are randomly chosen from T
- Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
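
The three initialisation schemes could be sketched as follows, assuming a nested-tuple tree encoding with the root at depth 0. The function set, the constant range [-1, 1], the 50/50 choice between x and a constant, and the way depths are ramped from 2 up to D_max are illustrative assumptions; the slides only state that grow and full each supply half of the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1, "cos": 1}  # name -> arity
CONST_RANGE = (-1.0, 1.0)  # the interval [a, b] for R; an assumed choice


def random_terminal(rng):
    """Pick from T = {x, R}: the input x or a random constant R in [a, b]."""
    if rng.random() < 0.5:
        return "x"
    return round(rng.uniform(*CONST_RANGE), 3)


def full(depth, rng):
    """Full method: functions until the depth budget is used up, then terminals."""
    if depth == 0:
        return random_terminal(rng)
    op = rng.choice(sorted(FUNCTIONS))
    return (op,) + tuple(full(depth - 1, rng) for _ in range(FUNCTIONS[op]))


def grow(depth, rng):
    """Grow method: pick from F or T while budget remains, from T at the limit."""
    pick = rng.choice(sorted(FUNCTIONS) + ["x", "R"])
    if depth == 0 or pick not in FUNCTIONS:
        return random_terminal(rng)
    return (pick,) + tuple(grow(depth - 1, rng) for _ in range(FUNCTIONS[pick]))


def tree_depth(tree):
    """Depth of the deepest leaf, with the root at depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, d_max, rng):
    """Alternate full and grow while cycling the depth limit over 2..d_max."""
    population = []
    for i in range(pop_size):
        depth = 2 + (i // 2) % (d_max - 1)
        method = full if i % 2 == 0 else grow
        population.append(method(depth, rng))
    return population


rng = random.Random(42)
population = ramped_half_and_half(pop_size=20, d_max=4, rng=rng)
print([tree_depth(t) for t in population])
```

Even-indexed trees come from the full method and hit their depth limit exactly; odd-indexed trees come from grow and may be shallower.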


Fitness Functions

- Relative squared error

    $\mathrm{Fitness} = \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}(x_i)}{y_i} \right)^2$

- The number of outputs that are within a given % of the correct value
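
Both fitness measures are short to write down in code. The sketch below assumes the relative squared error is the sum of squared relative residuals and that no target value y is zero; it uses the Kepler data from the earlier example and the candidate model A^1.5 purely as an illustration. The function and variable names are hypothetical.

```python
KEPLER_DATA = [(0.39, 0.24), (0.72, 0.61), (1.00, 1.00), (1.52, 1.84),
               (5.20, 11.9), (9.53, 29.4), (19.1, 83.5)]  # (x, y) pairs


def relative_squared_error(f, data):
    """Sum over all cases of ((y - f(x)) / y)^2; lower is better."""
    return sum(((y - f(x)) / y) ** 2 for x, y in data)


def hits(f, data, tolerance_percent):
    """Count outputs within tolerance_percent of the correct value; higher is better."""
    limit = tolerance_percent / 100.0
    return sum(1 for x, y in data if abs(y - f(x)) <= abs(y) * limit)


def model(a):
    """Candidate expression: P = A^1.5 (Kepler's Third Law with c = 1)."""
    return a ** 1.5


rse = relative_squared_error(model, KEPLER_DATA)
n_hits = hits(model, KEPLER_DATA, tolerance_percent=5.0)
print(f"RSE = {rse:.6f}, hits within 5% = {n_hits} of {len(KEPLER_DATA)}")
```

Note that the two measures point in opposite directions: an error is minimised while a hit count is maximised, which matters when plugging them into a selection scheme.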


Selection (1/2)

- Fitness proportional (roulette wheel) selection
  - The roulette wheel can be constructed as follows.
    - Calculate the total fitness for the population:

      $F = \sum_{k=1}^{POP\_SIZE} f(i_k)$

    - Calculate the selection probability $p_k$ for each chromosome $v_k$:

      $p_k = \frac{f(i_k)}{F}, \quad k = 1, 2, \ldots, POP\_SIZE$

    - Calculate the cumulative probability $q_k$ for each chromosome $v_k$:

      $q_k = \sum_{j=1}^{k} p_j, \quad k = 1, 2, \ldots, POP\_SIZE$

1


- Procedure: Proportional_Selection
  - Generate a random number r from the range [0, 1].
  - If $r \le q_1$, then select the first chromosome $v_1$; else, select the kth chromosome $v_k$ ($2 \le k \le pop\_size$) such that $q_{k-1} < r \le q_k$.

    k     p_k        q_k
    1     0.082407   0.082407
    2     0.110652   0.193059
    3     0.131931   0.324989
    4     0.121423   0.446412
    5     0.072597   0.519009
    6     0.128834   0.647843
    7     0.077959   0.725802
    8     0.102013   0.827802
    9     0.083663   0.911479
    10    0.088521   1.000000

    $F = \sum_{k=1}^{pop\_size} f(i_k) = 0.036441$
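
The wheel construction and the selection procedure can be sketched in a few lines of Python. The sketch assumes that a larger fitness value is better and that all fitness values are positive, so an error-based fitness would first have to be converted; the example fitness values are made up for illustration, and the function names are hypothetical.

```python
import bisect
import itertools
import random


def build_wheel(fitnesses):
    """Return selection probabilities p_k and cumulative probabilities q_k."""
    total = sum(fitnesses)                            # F
    probs = [f / total for f in fitnesses]            # p_k = f(i_k) / F
    cumulative = list(itertools.accumulate(probs))    # q_k = p_1 + ... + p_k
    cumulative[-1] = 1.0                              # guard against rounding error
    return probs, cumulative


def proportional_selection(cumulative, rng):
    """Return the 0-based index k of the first q_k with r <= q_k."""
    r = rng.random()
    return bisect.bisect_left(cumulative, r)


# Illustrative fitness values only (not taken from the slides).
example_fitnesses = [1.0, 3.0, 6.0]
probs, cumulative = build_wheel(example_fitnesses)

rng = random.Random(0)
counts = [0] * len(example_fitnesses)
for _ in range(10000):
    counts[proportional_selection(cumulative, rng)] += 1
print(probs, counts)
```

`bisect_left` finds the first cumulative probability that is greater than or equal to r, which is the same rule as $q_{k-1} < r \le q_k$ with indices shifted to start at 0.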



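Selection (2/2)

- Tournament selection
  - Tournament size q, $2 \le q \le POP\_SIZE$
- Ranking-based selection
  - $1 \le \eta^{+} \le 2$ and $\eta^{-} = 2 - \eta^{+}$

    $p_i = \frac{1}{\lambda} \left( \eta^{+} - (\eta^{+} - \eta^{-}) \frac{i - 1}{\lambda - 1} \right)$

    where $\lambda$ is the population size and i is the rank of the individual (i = 1 for the best).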
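
Both schemes are easy to prototype. The sketch below assumes that higher fitness is better for the tournament and uses the standard linear ranking formula, with the population size playing the role of λ and rank 1 given to the best individual; since the slide's own formula is garbled in this transcript, treat that correspondence as an assumption. Function names are hypothetical.

```python
import random


def tournament_selection(population, fitness, q, rng):
    """Draw q distinct individuals at random and return the fittest one."""
    contestants = rng.sample(population, q)
    return max(contestants, key=fitness)


def linear_ranking_probabilities(pop_size, eta_plus):
    """Selection probability for ranks 1..pop_size (rank 1 = best).

    Requires 1 <= eta_plus <= 2; eta_minus is then fixed at 2 - eta_plus.
    """
    eta_minus = 2.0 - eta_plus
    return [
        (eta_plus - (eta_plus - eta_minus) * (i - 1) / (pop_size - 1)) / pop_size
        for i in range(1, pop_size + 1)
    ]


rng = random.Random(0)
winner = tournament_selection([3, 1, 4, 1, 5], fitness=lambda v: v, q=2, rng=rng)
ranking = linear_ranking_probabilities(pop_size=5, eta_plus=1.5)
print(winner, ranking)
```

A larger tournament size q or a larger η+ both increase selection pressure; with η+ = 1 every rank gets the same probability.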


GP Flowchart

[Figure: flowcharts of the GA loop and the GP loop]


Bloat

- Bloat = “survival of the fattest”, i.e., the tree sizes in the population increase over time
- Ongoing research and debate about the reasons
- Needs countermeasures, e.g.
  - Prohibiting variation operators that would deliver “too big” children
  - Parsimony pressure: penalty for being oversized

    $\mathrm{Fitness} = \mathrm{Error} + \mathrm{Penalty}(\#N, \#D) = \mathrm{RSE} + C(\#N, \#D)$
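
A parsimony penalty could look like the sketch below. It assumes that #N is the number of nodes and #D the depth of the tree, and that the penalty C is a weighted sum of the two; the slides do not define these symbols or the form of C, so the weights `alpha` and `beta` and all function names are hypothetical.

```python
def count_nodes(tree):
    """#N (assumed meaning): total number of nodes in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])


def tree_depth(tree):
    """#D (assumed meaning): depth of the deepest leaf, root at depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def penalized_fitness(error, tree, alpha=0.01, beta=0.0):
    """Fitness = Error + C(#N, #D) with C = alpha * #N + beta * #D (an assumption)."""
    return error + alpha * count_nodes(tree) + beta * tree_depth(tree)


compact = ("sqrt", ("*", "A", ("*", "A", "A")))
bloated = ("+", compact, ("-", "A", "A"))   # same value, four extra nodes

print(penalized_fitness(0.5, compact), penalized_fitness(0.5, bloated))
```

With equal error, the smaller tree gets the lower (better) fitness, which is the pressure against bloat that the slide describes.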


Experiments

- At least three problems (+ your own data)
- Various experimental setups
  - Termination condition: maximum_generation
  - 2 models × 3 settings × 20 runs
    - Polynomial and general
    - Effects of the penalty term
    - Selection methods and their parameters
    - Crossover $p_c$ and mutation $p_m$


Results

- For each problem
  - Result table and your analysis
  - Present the optimal function.
    - Readable form and predicted function graph with data
  - Draw a learning curve for the run where the best solution was found.
    - You can draw all learning curves in one plot.

                 Polynomial                        General
                 Average   SD   Best   Worst       Average   SD   Best   Worst
    Setting 1
    Setting 2
    Setting 3
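
Each cell group of the table can be filled from the final errors of the 20 runs with a small helper like the one below. It assumes that lower error is better, so "best" is the minimum, and that SD means the sample standard deviation; the input list is a placeholder, not a real result.

```python
import statistics


def summarize(final_errors):
    """Average, SD, best, and worst final error over a set of runs."""
    return {
        "average": statistics.mean(final_errors),
        "sd": statistics.stdev(final_errors),   # sample standard deviation
        "best": min(final_errors),              # lowest error
        "worst": max(final_errors),             # highest error
    }


# Placeholder values standing in for the final errors of your own runs.
placeholder_errors = [1.0, 2.0, 3.0]
summary = summarize(placeholder_errors)
print(summary)
```

If you report the population standard deviation instead, swap in `statistics.pstdev` and say so in the report.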


[Figure: learning curve; x-axis: Generation, y-axis: Fitness (Error)]


References

- Source Codes
  - GP libraries (C, C++, JAVA, …)
  - MATLAB Tool box
- Web sites
  - http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
  - http://cs.gmu.edu/~eclab/projects/ecj/
  - http://www.geneticprogramming.com/GPpages/software.html
  - …


Pay Attention!

- Due: May 3, 2005
- Submission
  - Source code and executable file(s)
    - Proper comments in the source code
    - Via e-mail
  - Report: Hardcopy!!
    - Running environments
    - Results for many experiments with various parameter settings
    - Analysis and explanation of the results in your own way