
Upload: kimberly-shaw

Post on 12-Jan-2016


Page 1: Genetic Programming Using an evolutionary process to design computer programs. Automatic Programming where the computer is only told WHAT the final program

Genetic Programming

• Using an evolutionary process to design computer programs.

• Automatic Programming where the computer is only told WHAT the final program should do, not HOW to design it.

– In traditional AI Automatic Programming (AP), the system reasons through a complex design logic, the HOW (derived from human software engineers) to generate code that is easy to understand and modify.

– In GP, the system only knows WHAT the system should do. The WHAT is the basis for fitness tests, but there is no step-by-step design logic.

– The GP simply generates whole solutions (as random combinations of previous solutions) and tests them…just like in nature.

– The resulting programs are often very efficient, but terribly hard to understand and modify…just like in nature.


Overview

Goal: Give enough background of both general principles and essential details so that you can begin designing a GP system.

• Basic GP Design Issues

• Multiplexer Example

• The closure property for GP function sets.

• Genotype Representations

• Development and Execution

– Going from genotype to phenotype

– Running the phenotype


Key GP Design Issues

• Phenotype Representation - The general format of solutions.

• Genotype Representation - This defines the size of the search space.

– Primitives: Terminals & Functions

– Form: Linear, Tree, Graph

• Development (Morphogenesis) - How are genotypes converted into phenotypes?

• Fitness Function - This defines the texture of the search space.

• Fitness Cases - Test set for the phenotypes. The “environment” in which the individuals must “live”.

• Genetic Operators - How are new genotypes generated from old ones?

– Standard or specialized mutation and crossover.

• Selection Mechanism – How are fitness values converted to areas on the roulette wheel?

• Phenotype Execution Process – Interpreted or Compiled?

GP Primitives

• GP Terminals - leaf nodes of a GP tree

– Args/params (to the main GP function or “genetic program”)

– Constants

– Zero-argument primitive functions

• GP functions – non-leaf nodes of a GP tree

– Primitive functions (building-blocks) of the genetic program

– Arity = # of arguments (must be pre-defined)

– Many types

• Boolean: and, or, not, xor, nand…

• Arithmetic: +, -, *, % (protected division)

• Transcendental: sin, cos, exp, expt, rlog (protected log)

• Conditional: if, case, switch…

• Loop: while, repeat…

• Block: progn2, progn3…

• Assignment: assign (val to var)

• Memory access: mem (retrieves value in a given memory location)
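
The two "protected" primitives in the list can be sketched as follows. This is a minimal sketch, not a definitive implementation: the name pdiv is hypothetical (the slide writes %), and the fallback values (1 for division by zero, 0 for the log of zero) follow the convention commonly attributed to Koza (1992), which is an assumption here.

```lisp
;; Hypothetical protected primitives; fallback values are an assumed convention.
(defun pdiv (x y)
  "Protected division: return 1 instead of signalling an error when Y is 0."
  (if (zerop y) 1 (/ x y)))

(defun rlog (x)
  "Protected log: return 0 for an input of 0, else the log of |X|."
  (if (zerop x) 0 (log (abs x))))
```

Because both functions return a number for every numeric input, they can be nested freely inside randomly generated expressions.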


Fitness

Fitness Function: Assesses the performance of a phenotype and outputs a fitness score.

Raw fitness = basic performance score (e.g., # of correct mappings) of the phenotype.

Standardized Fitness = scaled fitness so that 0 is always the best value.

Normalized fitness = scaled fitness to range [0, 1]

• In practice, raw fitness is often used, without any scaling.

• The fitness function should give a continuous grade of credit, not just yes/no info.

• It should serve as a heuristic to indicate the closeness of a phenotype to an optimal phenotype, and thus provide useful information to the genetic search process.
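
The three fitness measures can be related by a short sketch. The slide only states the target ranges; the particular scaling used here (standardized = max - raw, then 1/(1 + standardized) divided by the population total) is one common scheme and is an assumption, as are the function names.

```lisp
;; Assumed scaling scheme; the slide does not prescribe specific formulas.
(defun standardized-fitness (raw max-raw)
  "Rescale a higher-is-better raw score so that 0 is the best value."
  (- max-raw raw))

(defun normalized-fitnesses (raw-scores max-raw)
  "Map raw scores to values in [0, 1] that sum to 1 across the population."
  (let* ((adjusted (mapcar (lambda (r)
                             (/ 1 (+ 1 (standardized-fitness r max-raw))))
                           raw-scores))
         (total (reduce #'+ adjusted)))
    (mapcar (lambda (a) (/ a total)) adjusted)))
```

With this scheme the normalized values can be used directly as areas on a roulette wheel.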

The Fitness Landscape of the Search Space

[Figure: fitness landscape. Fitness (vertical axis) is plotted over the genotype space. Search is performed at the genotype level; testing is performed at the phenotype level, via the fitness function.]

• Genotype representation determines the size and density of the search space.

• Fitness function determines its texture (rough, smooth, etc.).

• Rough landscapes are harder to search, since the partial information provided by the fitness function does not have good heuristic value. E.g., a local max gets a high fitness score though it may be quite distant from the global max.

Boolean Multiplexer Problem

• Design a Boolean expression that takes N = K + M inputs, where the K bits a(0)…a(K-1) denote an address/index A, and the data bit d(A) from d(0)…d(M-1) is the desired return value. In other words, the address bits select a data bit.

• Standard problem sizes: N = K + M, where K = log2(M) (i.e., M = 2^K)

• 1 + 2 = 3

• 2 + 4 = 6

• 3 + 8 = 11

[Figure: 11-bit Mux with address inputs a0-a2, data inputs d0-d7, and output dA.]
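
The target behaviour of a multiplexer can be written down directly, which is useful as a reference when scoring evolved programs. The function name mux-target is hypothetical, and the bit order (a0 is the least significant address bit) is an assumption chosen to match hand-written solutions such as (if a0 d1 d0).

```lisp
;; Reference multiplexer; assumes a0 is the least significant address bit.
(defun mux-target (address-bits data-bits)
  "Return the data bit selected by ADDRESS-BITS, given as a list (a0 a1 ...)."
  (let ((index 0))
    ;; Accumulate A = a0 + 2*a1 + 4*a2 + ...
    (loop for bit in address-bits
          for weight = 1 then (* 2 weight)
          do (incf index (* bit weight)))
    (nth index data-bits)))
```

For example, with address bits (1 0 1) the index is 1 + 4 = 5, so the function returns d5.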

Boolean Multiplexer Solutions

3-bit Mux

(if a0 d1 d0)

6-bit Mux

(if a0
    (if a1 d3 d1)
    (if a1 d2 d0))

11-bit Mux

(if a0
    (if a1
        (if a2 d7 d3)
        (if a2 d5 d1))
    (if a1
        (if a2 d6 d2)
        (if a2 d4 d0)))

*There are MANY other solutions for the 6-bit and 11-bit cases

Primitives for Mux Design using GP

• Terminals:

– Address bits: a0, a1… and Data bits: d0, d1…

• Functions:

– GOR(2)

(defun gor (x y) (if (or (> x 0) (> y 0)) 1 -1))

– GAND(2)

(defun gand (x y) (if (and (> x 0) (> y 0)) 1 -1))

– GNOT(1)

(defun gnot (x) (if (> x 0) -1 1))

– GIF(3)

(defmacro gif (condition act1 act2)
  `(if (> ,condition 0) ,act1 ,act2))

• Closure: All functions output an integer, and all functions accept integers for all their arguments.

• Truth value of an integer (I) is T iff I > 0. For this problem, the acts in gif are any func calls or terminals.

Fitness Testing of GP Multiplexers

• For an n-bit Mux, there are 2^n test cases: all possible combinations of input bits.

• Each case has a correct output: the value of dA, where A is the integer encoded by the address bits.

• For each individual program in the population

– hits = 0

– For each of the 2^n cases:

• Set the values of primitives a0…a(K-1) and d0…d(M-1) according to their values in the test case.

• Run the individual program with bindings for ai’s and dj’s from above.

• Compare the output of the program, O*, to dA.

• If O* = dA, then hits = hits + 1.

– Fitness(individual) = hits
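
The hit-counting loop can be sketched as follows. This is a simplified illustration under stated assumptions: the program under test is an ordinary Lisp function of one list of 0/1 bits (a0 … a(K-1) d0 … d(M-1)), a0 is taken as the least significant address bit, and the names all-cases and count-hits are hypothetical.

```lisp
;; Hypothetical helpers; bits are 0/1 and a0 is the least significant address bit.
(defun all-cases (n)
  "List all 2^N bit lists of length N."
  (if (zerop n)
      (list nil)
      (loop for tail in (all-cases (- n 1))
            append (list (cons 0 tail) (cons 1 tail)))))

(defun count-hits (program k)
  "Count the cases on which PROGRAM outputs the addressed data bit.
PROGRAM takes one list (a0 ... a(K-1) d0 ... d(M-1)), with M = 2^K."
  (let ((hits 0)
        (n (+ k (expt 2 k))))
    (dolist (bits (all-cases n) hits)
      (let* ((address (loop for i from 0 below k
                            sum (* (nth i bits) (expt 2 i))))
             (target (nth (+ k address) bits)))
        (when (= (funcall program bits) target)
          (incf hits))))))

;; Usage: the hand-written 3-bit mux, (if a0 d1 d0), scores 8 of 8.
(count-hits (lambda (bits)
              (destructuring-bind (a0 d0 d1) bits
                (if (= a0 1) d1 d0)))
            1)
```

A program that ignores the address bit and always returns d0 scores 6 of 8 under the same test, which shows the graded credit that hit counting provides.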

GP Tableau (Koza, 1992)

Parameter                Value
Objective:               Evolve an 11-bit Multiplexer
Terminal Set:            a0, a1, a2, d0, d1…d7
Function Set:            Gif, Gor, Gand, Gnot
Fitness Cases:           All 2^11 input cases
Fitness Func:            # outputs correctly predicted
Population Size:         4000
Crossover Probability:   75%
Mutation Probability:    5% (per individual)
Selection Mechanism:     Sigma Scaling
Termination Criteria:    None
Max # Generations:       50
Max Tree Depth:          15
Max Init Tree Depth:     6
… many others possible…


GP Multiplexer Solutions (Koza, 1992)

Generation 1: Fitness = 1408

(gif a0 (gif a2 d7 d3) d0)

Generation 4: Fitness = 1664

(gif a0 (gif a2 d7 d3)

(gif a2 d4 (gif a1 d2

(gif a2 d7 d0))))

Generation 9: Fitness = 2048 (perfect)

(gif a0 (gif a2 (gif a1 d7 (gif a0 d5 d0))

(gif a0 (gif a1 (gif a2 d7 d3) d1) d0))

(gif a2 (gif a1 d6 d4)

(gif a2 d4 (gif a1 d2 (gif a2 d7 d0)))))

* Population size = 4000!!

The Closure Property of GP Functions

• Genetic programs are randomly generated and combined, so ANY function could end up with the output of ANY OTHER function (or itself) or ANY terminal as its input.

• So, EVERY GP primitive function must be capable of accepting the output of ANY primitive function for ANY of its input arguments. This is the closure property.

• Protected versions of functions like Division and Log are needed to handle inputs of 0, or negative numbers, respectively.

• The only constraint that classic GP systems check when generating trees is function arity: each function call gets exactly the number of arguments that it requires.

• Once the initial population is generated, recombination of programs via crossover will never violate the arity constraints, so no extra checking is necessary.
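
Arity-respecting random generation can be sketched as below. The function and terminal sets are borrowed from the multiplexer example; the helper names, the 1-in-3 chance of stopping early at a terminal, and the well-formedness checker are assumptions added for illustration.

```lisp
;; Illustrative generator; the early-stop probability is an arbitrary assumption.
(defparameter *functions* '((gif . 3) (gor . 2) (gand . 2) (gnot . 1))
  "Function names paired with their arities.")

(defparameter *terminals* '(a0 a1 d0 d1 d2 d3))

(defun random-elt (list)
  (nth (random (length list)) list))

(defun random-tree (max-depth)
  "Grow a random s-expression; at depth 0 only terminals are allowed."
  (if (or (zerop max-depth) (zerop (random 3)))
      (random-elt *terminals*)
      (destructuring-bind (name . arity) (random-elt *functions*)
        (cons name
              (loop repeat arity
                    collect (random-tree (- max-depth 1)))))))

(defun well-formed-p (tree)
  "True if every call in TREE has exactly its declared number of arguments."
  (if (atom tree)
      (and (member tree *terminals*) t)
      (let ((arity (cdr (assoc (car tree) *functions*))))
        (and arity
             (= arity (length (cdr tree)))
             (every #'well-formed-p (cdr tree))))))
```

Since each call site is filled with exactly as many subtrees as the arity table requires, swapping whole subtrees between two such trees leaves every call with the right number of arguments.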

Closure Tricks

• If mixing booleans and arithmetic funcs, rewrite your booleans to interpret every number other than 0 as TRUE, and have them output a 0 for FALSE and a 1 for TRUE. Or, view all neg numbers as FALSE, pos as TRUE, and then output -1 or 1.

– (GAND 8 0) => 0 (GOR 5 -3.2) => 1…

• All action funcs such as Move-Forward, Turn-Left, etc. should return a number. For example, they can return a number that denotes TRUE if they succeed, FALSE otherwise.

• Use the MOD (modulo) operator to convert numeric inputs into indices in collections. For example, if your genetic program includes a memory buffer of size 4, then a call to (ASSIGN 33 12) could be interpreted as “Assign the value 12 to MEM(1)”, since 1 = 33 Mod 4. Or, (COPY 17 15) could be “Assign the value at MEM(1) to MEM(3)”, since 17 Mod 4 = 1 and 15 Mod 4 = 3. If the inputs are real numbers, then they’ll need to be rounded first.

• Disadvantage: Using these tricks, any program is legal, and many genetic programs can represent the exact same procedure (i.e., an N-to-1 mapping between representation/search space and solution space). Hence the search space can be huge!
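
The MOD trick for memory access can be sketched as follows, using the buffer of size 4 from the example. The names *mem* and mem-index are hypothetical, and returning the stored value from each operation is an assumption made so that the result is still a number.

```lisp
;; Sketch of MOD-based indexing; names and return values are assumptions.
(defparameter *mem* (make-array 4 :initial-element 0)
  "A memory buffer of size 4.")

(defun mem-index (x)
  "Round X and wrap it into a legal index of *MEM*."
  (mod (round x) (length *mem*)))

(defun assign (loc val)
  "Store VAL at MEM(LOC mod 4) and return VAL."
  (setf (aref *mem* (mem-index loc)) val))

(defun copy (from to)
  "Copy MEM(FROM mod 4) into MEM(TO mod 4) and return the copied value."
  (setf (aref *mem* (mem-index to)) (aref *mem* (mem-index from))))
```

Calling (assign 33 12) writes 12 to MEM(1), and a following (copy 17 15) copies MEM(1) to MEM(3).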

Strongly-Typed GPs

• Restrict the set of legal programs by requiring the inputs to each function to be of specified types => Great reduction in representation/search space.

• Each function declaration must include: types for all arguments + output/return type. Also, each terminal must be typed. A func’s type = type of return value.

• E.g., GIFACT(bool, act1, act2) => real - if boolean condition is true, then perform action 1, else action 2. The return value is a real, denoting, for example, something about the actions, such as how far the agent moved while doing act1 or act2.

• Type-checking needed during both program generation + crossover.

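– E.g. f(bool, act) => dir   g(real) => real   h(dir) => bool

[Figure: type constraints on f. During generation, the first argument of f must be a bool and the second must be an act; candidate nodes shown are f, g, h, 3, 7, Left, True and TurnLeft. During crossover, the type of the swapped-in subtree ("??") must also be checked.]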
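
A type check of this kind can be sketched for the three example signatures f, g and h. The terminal types used here (numbers are real, Left is a dir, True is a bool, TurnLeft is an act) are assumptions, as are the names *signatures*, *terminal-types* and node-type.

```lisp
;; Sketch of a type checker; the terminal types are assumed for illustration.
(defparameter *signatures*
  '((f (bool act) dir) (g (real) real) (h (dir) bool))
  "Each entry: (name argument-types return-type).")

(defparameter *terminal-types*
  '((left . dir) (true . bool) (turnleft . act))
  "Assumed types for the symbolic terminals; numbers are treated as real.")

(defun node-type (tree)
  "Return the type of TREE, or NIL if TREE is ill-typed."
  (cond ((numberp tree) 'real)
        ((atom tree) (cdr (assoc tree *terminal-types*)))
        (t (let ((sig (assoc (car tree) *signatures*)))
             (and sig
                  (= (length (second sig)) (length (cdr tree)))
                  (every (lambda (arg type) (eq (node-type arg) type))
                         (cdr tree) (second sig))
                  (third sig))))))
```

A crossover operator could call node-type on a candidate subtree and accept the swap only if the result equals the type required at the insertion point.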


GP Genotype Representations

• It is important to distinguish between the static form of the genotype and the pattern of interactions that it generates when its corresponding phenotype is executed.

• Static Forms

– Linear – sequence of operations

– Tree - node-arc rep with all nodes (except the root) having 1 parent. Classic GP!

– Graph – node-arc rep with no restrictions on interconnections


Linear GP Genomes

• Genome

– Linear string of commands (in any language or machine code).

• Execution structure of the Genome:

– Linear: sequence of independent commands.

– Graph: inter-dependent commands due to shared memory (e.g. registers)

– Tree: Nested sequence of commands with only local interactions - the results from one command are available ONLY to the command that it is a nested argument of. To make it available in many places, you have to recompute it at several points in the tree.

[Figure: tree for (f 6 (g 1 (h 3 4))): root f has children 6 and g; g has children 1 and h; h has children 3 and 4.]

Linear Genomes with Linear & Graph Execution

Linear genome with linear execution

- Each action is independent

Turn left
Go forward
Turn right
Turn right
:

Linear genome with graph execution

- Actions are inter-dependent via shared memory registers.

1: C = A + B
2: D = C + A
3: E = D + D
4: A = E + B
:

[Figure: memory registers A, B, C, D and E, with arcs numbered 1-4 showing which registers each instruction reads and writes.]
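
Graph-style execution of a linear genome can be sketched with a tiny register machine that runs add instructions like those listed for this slide. The instruction format (dest src1 src2), the association-list register file and the name run-linear are assumptions; every source register is assumed to hold a value before it is read.

```lisp
;; Sketch of a register machine; instruction and register formats are assumptions.
(defun run-linear (program registers)
  "Execute PROGRAM, a list of (dest src1 src2) add instructions, in order.
REGISTERS is an association list of (name . value); a new alist is returned."
  (let ((regs (copy-alist registers)))
    (dolist (instr program regs)
      (destructuring-bind (dest src1 src2) instr
        (let ((value (+ (cdr (assoc src1 regs))
                        (cdr (assoc src2 regs))))
              (cell (assoc dest regs)))
          (if cell
              (setf (cdr cell) value)
              (push (cons dest value) regs)))))))

;; Usage: the four instructions, starting from A = 1 and B = 2.
(run-linear '((c a b) (d c a) (e d d) (a e b))
            '((a . 1) (b . 2)))
```

Starting from A = 1 and B = 2, the run yields C = 3, D = 4, E = 8 and finally A = 10; the value in C is reused by instruction 2 without being recomputed, which is what distinguishes graph execution from tree execution.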

Graph-Based GP

• PADO (Teller & Veloso, 1995)

• Each node must:

– Compute using stack and/or indexed memory

– Decide on the neighbor node to move to.

• Read/Write work only with indexed memory

• Other nodes pop values from the stack, compute a function, then push result onto the stack

[Figure: a program graph running from a start node to an end node, with nodes such as +, *, 7, read and write; nodes push to and pop from a run-time stack and access an indexed memory.]

Development & Execution of GP Programs

[Figure: four numbered paths from genotype (gtype) to phenotype (ptype) to fitness. The genotype may be written in an application language, a source language, or machine language; development (Devp) and execution (Exec) use a GP-application interpreter, a source-language interpreter, or a compiler followed by the CPU.]

Executing a Genetic Program

• Interpreting the Genotype

– Hand-coded interpreter reads through the genotype and performs the appropriate actions during fitness testing.

– Done in Lisp and C++.

– Very slow, but better than compiling when there are not many fitness cases to run.

• Compiling the Genotype

– Hand-coded wrapper routines for the genotype enable its compilation to machine code, which then runs during fitness testing.

– Easy in Lisp. Hard in C++.

• Genotype IS Machine Code

– No compilation necessary.

– Possible with any type of machine code.

– Extremely fast (60x faster than compiled C++)

– Limited expressibility within a reasonable program size.

Interpreting a GP Genotype

Sample Genotype: (gor (gand A0 D1) D3)

(defun interpret (sexpr)
  (cond
    ((equal sexpr 'A0) (fetch-input-value 0))
    ((equal sexpr 'A1) (fetch-input-value 1))
    ;; : (clauses for the remaining terminals)
    ((equal (first sexpr) 'gnot)
     (if (> (interpret (second sexpr)) 0) -1 1))
    ((equal (first sexpr) 'gor)
     (if (or (> (interpret (second sexpr)) 0)
             (> (interpret (third sexpr)) 0))
         1 -1))
    ;; : (clauses for the remaining functions)
    ))

*You can compile the interpreter, but then you still have to interpret the genotype code for EVERY fitness test case.

*This is easy in C++ too, but you’ll have to deal with the genotype tree explicitly: C++ doesn’t handle nested lists (I.e. trees) implicitly the way Lisp does.

Interpreter for Prefix

Linear Prefix of (+ (* (- 3 4) 5) (/ 8 4)) is:

(+ * - 3 4 5 / 8 4)

(defun interp-prefix (symbols)
  (labels ((num-args (func) 2))            ;; Local function
    (let ((sym (car symbols))              ;; Declare 4 local vars
          (evaled-args nil)
          evaled-arg remains)
      (cond ((and (symbolp sym) (fboundp sym))   ;; symbol is func name
             (setf remains (cdr symbols))
             (dotimes (i (num-args sym))
               (multiple-value-setq (evaled-arg remains)
                 (interp-prefix remains))
               (push evaled-arg evaled-args))
             (values (apply sym (reverse evaled-args))
                     remains))
            (t                             ;; symbol is not a function name
             (values sym (cdr symbols)))))))

Interpreter for Postfix

The Postfix of (+ (* (- 3 4) 5) (/ 8 4)) is:

(3 4 - 5 * 8 4 / +)

(defun interp-postfix (symbols)
  (let ((stack nil) args)                  ;; the operand stack
    ;; 2 Local functions: num-args and interp
    (labels ((num-args (func) 2)
             (interp (symbols)             ;; the main, recursive local func
               (cond ((null symbols) (pop stack))
                     ((and (symbolp (car symbols))   ;; func symbol
                           (fboundp (car symbols)))
                      (setf args nil)
                      (dotimes (i (num-args (car symbols)))
                        (push (pop stack) args))
                      (push (apply (car symbols) args) stack)
                      (interp (cdr symbols)))
                     (t (push (car symbols) stack)
                        (interp (cdr symbols))))))
      ;; the main program
      (interp symbols))))


Variable Binding in GP

• Assume we convert the genotype into code that is executable in the application language (e.g. Lisp, C++) without the aid of a special-purpose GP interpreter. Note that the code may still be interpreted, but now it’s the general LISP language interpreter that does the job.

• When this code runs, how do we get the proper variable bindings from a fitness case into the code?

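[Figure: the genotype (gif A0 D1 (gor A1 D2…) shown next to a fitness case with A0: 1, A1: 0, D0: 1, D1: 1, D2: 0, D3: 1; how the fitness-case values get bound to the genotype's variables is marked "??".]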

• Simple (ugly) solution

– Declare A0,A1…D0, D1… as global variables

– Update them with each new fitness case.

– Use Lisp’s interpreter, EVAL.

• (eval '(gif A0 D1 (gor A1 D2….)))


Wrapping with a Lambda

A slightly more elegant solution

? (setf ptype
    (eval (append '(lambda (A0 A1 D0 D1 D2 D3)) (list genotype))))

This creates an unnamed phenotype function of 6 arguments. The body code of the phenotype is the genotype s-expression.

E.g. (lambda (A0 A1 D0 D1 D2 D3)

(gif A0 D1 (gor A1 D2….)))

? (apply ptype fitness-case)

This calls the phenotype function with its arguments bound to the fitness case. No global variables are needed.

(dolist (case fitness-cases)
  (let ((res (apply ptype case)))
    ;; .. compare res to expected result
    ;; .. update error sum …
    ))
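
The lambda-wrapping approach can be put together into a small end-to-end sketch. The gor and gif definitions repeat the multiplexer primitives; the names make-phenotype and score-genotype, and the (inputs expected) format of each fitness case, are assumptions made for illustration.

```lisp
;; End-to-end sketch; the fitness-case format and helper names are assumptions.
(defun gor (x y) (if (or (> x 0) (> y 0)) 1 -1))

(defmacro gif (condition act1 act2)
  `(if (> ,condition 0) ,act1 ,act2))

(defun make-phenotype (genotype)
  "Wrap GENOTYPE in a 6-argument lambda and evaluate it into a function."
  (eval (append '(lambda (a0 a1 d0 d1 d2 d3)) (list genotype))))

(defun score-genotype (genotype cases)
  "Count the CASES, each a list (inputs expected), that GENOTYPE gets right."
  (let ((ptype (make-phenotype genotype)))
    (count-if (lambda (fitness-case)
                (= (apply ptype (first fitness-case))
                   (second fitness-case)))
              cases)))

;; Usage: two fitness cases, both answered correctly.
(score-genotype '(gif a0 d1 (gor a1 d2))
                '(((1 0 1 1 0 1) 1)
                  ((0 0 1 1 0 1) -1)))
```

The phenotype function is built once per individual and then applied to every fitness case, so the cost of eval is not paid per case. Some implementations may warn about the unused lambda parameters.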

Compiling the Genotype in Lisp

(setf ptype
  (compile nil (append '(lambda (A0 A1 D0 D1 D2 D3))
                       (list genotype))))

Lisp’s compile function allows us to compile code at run-time! The first argument to compile (nil in this case) is the name to be given to the compiled function.

Sometimes, eval will automatically compile its argument too, but only within the scope of the global environment.

Eval and compile are not recommended for normal Lisp applications, but GP is special since it must create executable code on the fly (i.e. at run time).

Run-time compiling is very difficult in C++ and similar languages. GPs in these other languages normally rely on a hand-written interpreter.

Machine-Code GP

• Cramer (1985), Nordin (1994), Crepeau (1995)

• Genome = Linear sequence of machine-code instructions, often in binary.

• Extensive use of machine registers allows storage of intermediate results, which can be used in MANY different places in the code. Hence the run-time structure resembles a GRAPH (not a tree).

• E.g.

– A = A + B

• 011 (op code for reg-reg add) 000 (dest reg 0) 000 (src 1 = reg 0) 001 (src 2 = reg 1)

– C = A + 5

• 100 (code for reg-const add) 010 (dest reg 2) 000 (src1 = reg 0) 101 (src2 = const 5)

– GOTO 1

• 111 (code for jump) 000000001 (destination address)

• By using the actual machine codes for your machine, the GP creates fully-compiled and assembled programs => 60X faster than C++ GP interpreters.
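
The bit layout of the two add instructions in the example can be decoded with a few lines. The 12-bit word made of four 3-bit fields is the slide's illustrative format, not a real instruction set, and the name decode-instruction is hypothetical; the jump instruction uses a different layout (3-bit op code plus 9-bit address) and is not handled here.

```lisp
;; Decoder for the illustrative 4 x 3-bit format only; not a real instruction set.
(defun decode-instruction (bits)
  "Split a 12-character bit string into four 3-bit fields, returned as integers."
  (loop for start from 0 below 12 by 3
        collect (parse-integer bits :start start :end (+ start 3) :radix 2)))
```

For instance, the string "011000000001" decodes to the fields 3, 0, 0 and 1: op code 3 (reg-reg add), destination register 0, and source registers 0 and 1.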

Maintaining Legal Genomes

• Tree-based GP

– As long as closure holds or if strong-typing is used, then subtree swapping does not create illegal programs.

• Linear or Graph-based GP

– Crossover can create invalid programs, so special care must be taken.

Maintaining Legal Machine Code Programs

Generating:

Once an op-code is (randomly) selected, restrict the operands to legal ones.

E.g. if it’s a reg-reg add instruction, make sure the operands are legal register indices (At the machine level, closure tricks like modulo are not performed!!)

Crossing over:

Swap whole-instruction sub-sequences between individuals: never pick the crossover point inside of an instruction.

Mutating:

For each operator, define a set of legal mutations.

E.g. Legal mutations for a reg-reg subtract, reg1 = reg2 - reg3:

reg1 = reg2 + reg3
reg2 = reg2 - reg3
reg3 = reg2 - reg1
reg1 = reg2 AND reg3 ….

Maintaining Legal Linear Genomes

• For genomes with linear or graph execution modes, the same techniques as for constrained generation, crossover and mutation of machine-code genomes are applicable (see above).

• For genomes with tree-type execution structure, there is no standard instruction length, since each operator has subtree operands of widely-varying size. Hence, different considerations must be taken within the genetic operators to preserve program validity, i.e. to ensure that the resulting linear genome represents a well-formed tree.

• Mutation: You cannot just replace an operator or operand with any other, since:

– Arities may vary among functions, so a 3-argument operator cannot be replaced by a 2-argument operator.

– Terminals cannot replace operators, or vice versa.

• Crossover: You cannot just swap linear segments unless each represents a complete sub-tree.

• In short, genetic operators working on linear representations of tree-structured programs cannot forget that they are really working with trees!!


Linear Prefix-Coded Genomes for GP Trees

• Keith & Martin (1994)

• (f 6 (g 1 (h 3 4))) represented as f 6 g 1 h 3 4

• Given the arity of each function, a recursive interpreter for these strings is simple to write in any language.

• Preserving program validity

– Generating: randomly choose terminals & functions, but always check that the new node is the root of a subtree for SOME earlier node. Also, keep track of current tree depth so as not to exceed the maximum depth bound. When at maximum depth, use only terminals.

– Mutating: Only replace function names with the names of same-arity functions, and always replace terminals with terminals.

– Crossover: Only swap segments that represent complete subtrees.

– Once found, whole subtrees can be changed to other subtrees as a form of mutation.

– Key trick: finding the subtrees.

Subtree Detection in Prefix-Coded Genomes

Use arity info, where arity(func) = number of args; arity(terminal) = 0.

Procedure Random Subtree Fetch:

sum = 0

Begin at any random array element, A(j).

k = j

While sum <> -1 and k is not past the end of the array Do

sum = sum + (arity(A(k)) - 1).

k = k + 1

end while

If sum = -1, then subtree = A(j)…A(k-1)

Example:

Tree: (gor (gand (gnot 2) (gor 5 8)) 1)

Linear Genome: [ Gor Gand Gnot 2 Gor 5 8 1 ]

Starting at Gand, the running sum is 1 1 0 1 0 -1, so the subtree is Gand Gnot 2 Gor 5 8.
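
The arity-sum procedure can be sketched over a vector genome. The arity table covers only the functions in the example, any symbol or number not listed is treated as a terminal, and the names *arity*, arity and subtree-end are hypothetical.

```lisp
;; Sketch of arity-sum subtree detection; unlisted symbols count as terminals.
(defparameter *arity* '((gor . 2) (gand . 2) (gnot . 1))
  "Arity table; anything not listed is treated as a terminal (arity 0).")

(defun arity (sym)
  (or (cdr (assoc sym *arity*)) 0))

(defun subtree-end (genome start)
  "Return the index just past the subtree rooted at START, or NIL if incomplete."
  (let ((sum 0))
    (loop for k from start below (length genome)
          do (incf sum (- (arity (aref genome k)) 1))
          when (= sum -1) return (+ k 1))))

;; Usage: extract the subtree rooted at Gand (index 1).
(let ((genome (vector 'gor 'gand 'gnot 2 'gor 5 8 1)))
  (subseq genome 1 (subtree-end genome 1)))
```

The end index is exclusive, so it can be passed straight to subseq; crossover then swaps the two slices found this way in each parent.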