Evolving Rules to Solve Problems: The Learning Classifier Systems Way

Pier Luca Lanzi. EPIA 2007, Guimarães, Portugal, September 4th, 2007.


DESCRIPTION

Invited talk at EPIA 2007, the Portuguese Conference on Artificial Intelligence, Guimarães, Portugal. http://epia2007.appia.pt/

TRANSCRIPT

  • 1. Evolving Rules to Solve Problems: The Learning Classifier Systems Way. Pier Luca Lanzi. EPIA 2007, Guimarães, Portugal, September 4th, 2007.
  • 2. Evolving
  • 3. Early Evolutionary Research. Box (1957): evolutionary operation, which led to simplex methods and Nelder-Mead. Other early evolutionaries: Friedman (1959), Bledsoe (1961), Bremermann (1961). Rechenberg (1964) and Schwefel (1965): evolution strategies. Fogel, Owens & Walsh (1966): evolutionary programming. Common view: evolution = random mutation + save the best.
  • 4. Early intuitions. "There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value." (Alan M. Turing, Intelligent Machinery, 1948). "We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications: structure of the child machine = hereditary material; changes of the child machine = mutations; natural selection = judgment of the experimenter." (Alan M. Turing, Computing Machinery and Intelligence, 1950).
  • 5. Meanwhile in Ann Arbor. Holland (1959): iterative circuit computers. Holland (1962): outline for a logical theory of adaptive systems. The role of recombination (Holland 1965). The role of schemata (Holland 1968, 1971). The two-armed bandit (Holland 1973, 1975). First dissertations (Bagley, Rosenberg 1967). The simple genetic algorithm (De Jong 1975).
  • 6. What are Genetic Algorithms? Genetic algorithms (GAs) are search algorithms based on the mechanics of natural selection and genetics. Two components: natural selection (survival of the fittest) and genetics (recombination of structures, variation). Underlying metaphor: individuals in a population must be adapted to the environment to survive and reproduce. A problem can be viewed as an environment, and we evolve a population of solutions to solve it. Different individuals are differently adapted: to survive, a solution must be adapted to the problem.
  • 7. A Peek into Genetic Algorithms. Population: a set of candidate solutions. Representation: the coding of solutions; originally, binary strings. Fitness function: evaluates candidate solutions. Operators inspired by Nature: selection, recombination, mutation. The genetic algorithm: generate an initial random population; repeat (select promising solutions; create new solutions by applying variation; incorporate new solutions into the original population) until the stop criterion is met.
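The loop on this slide can be sketched in a few lines. The task (maximize the number of 1s in a binary string, "onemax"), the parameter values, and the operator choices below are illustrative assumptions, not part of the talk:

```python
import random

# Toy GA sketch: evolve binary strings to maximize the number of 1s (onemax).
LENGTH, POP_SIZE, GENERATIONS, P_MUT = 20, 30, 50, 0.05

def fitness(ind):
    return sum(ind)                                   # evaluate a candidate

def select(pop):
    return max(random.sample(pop, 3), key=fitness)    # tournament selection

def crossover(a, b):
    cut = random.randrange(1, LENGTH)                 # one-point recombination
    return a[:cut] + b[cut:]

def mutate(ind):
    return [bit ^ (random.random() < P_MUT) for bit in ind]

# generate an initial random population
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # select promising solutions, create new ones by variation, replace
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]
best = max(pop, key=fitness)
```

Tournament selection and one-point crossover are just one concrete choice; any selection and variation operators fit the same loop.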
  • 8. Rules
  • 9. Holland's Vision: Cognitive System One. The goal: to state, in concrete technical form, a model of a complete mind and its several aspects. A cognitive system interacting with an environment. Binary detectors and effectors. Knowledge = a set of classifiers: condition-action rules that recognize a situation and propose an action. A payoff reservoir for the system's needs. Payoff distributed through an epochal algorithm. Internal memory as a message list. Genetic search of classifiers.
  • 10. What was the goal? Example rules: 1#11:buy 30; 0#0#:sell -2. Given a real system with an unknown underlying dynamics, use a classifier system online to generate a behavior that matches the real system. The evolved rules would provide a plausible, human-readable model of the unknown system.
  • 11. Holland's Learning Classifier Systems. Explicit representation of the incoming reward: good classifiers are the ones that predict high rewards. Credit assignment using the Bucket Brigade. Rule discovery through a genetic algorithm applied to the whole rule base (to the whole solution). The description was vast, and it did not work right off! Very limited success. David E. Goldberg: Computer-Aided Gas Pipeline Operation Using Genetic Algorithms and Rule Learning. PhD thesis, University of Michigan, Ann Arbor, MI.
  • 12. Learning System LS-1 and Pittsburgh Classifier Systems. Holland models learning as an adaptation process; De Jong models learning as an optimization process. A genetic algorithm is applied to a population of rule sets: 1. t := 0; 2. initialize the population P(t); 3. evaluate the rule sets in P(t); 4. while the termination condition is not satisfied 5. begin 6. select the rule sets in P(t) and generate Ps(t); 7. recombine and mutate the rule sets in Ps(t); 8. P(t+1) := Ps(t); 9. t := t+1; 10. evaluate the rule sets in P(t); 11. end. No apportionment of credit; offline evaluation of rule sets.
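A minimal sketch of the Pittsburgh idea, where each GA individual is a whole rule set evaluated offline on a batch of examples, with no credit assigned to individual rules. The toy task (classify 4-bit strings whose class equals their first bit), the rule encoding, and all parameters are assumptions for illustration:

```python
import random

random.seed(0)
# Toy dataset: 4-bit inputs, class = first bit.
DATA = [(x, x[0]) for x in
        [[random.randint(0, 1) for _ in range(4)] for _ in range(50)]]

def matches(cond, x):          # ternary condition over bits: '0', '1', '#'
    return all(c == '#' or int(c) == b for c, b in zip(cond, x))

def classify(rule_set, x):     # the first matching rule decides the class
    for cond, cls in rule_set:
        if matches(cond, x):
            return cls
    return 0                   # default class when no rule matches

def fitness(rule_set):         # offline evaluation of the WHOLE rule set
    return sum(classify(rule_set, x) == y for x, y in DATA)

def random_rule():
    return (''.join(random.choice('01#') for _ in range(4)),
            random.randint(0, 1))

pop = [[random_rule() for _ in range(5)] for _ in range(20)]
for _ in range(30):
    parents = [max(random.sample(pop, 3), key=fitness) for _ in range(20)]
    # recombine whole rule sets (crossover between two sets, length may vary)
    pop = [p[:random.randrange(1, len(p) + 1)] + q[random.randrange(0, len(q)):]
           for p, q in zip(parents, parents[::-1])]
best = max(pop, key=fitness)
```

Note the contrast with Holland's model: fitness attaches to a complete rule set, never to a single rule.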
  • 13. As time goes by. 1970s: genetic algorithms and CS-1; research flourishes, but success is limited. 1980s: evolving rules as optimization; research follows Holland's vision, yet success is still limited. 1990s: robotics applications and first results on classification, but the interest fades away; Stewart Wilson creates XCS. 2000s: classifier systems finally work; large development of models, facetwise theory, and applications.
  • 14. Stewart W. Wilson and the XCS Classifier System. 1. Simplify the model. 2. Go for accurate predictions, not high payoffs. 3. Apply the genetic algorithm to subproblems, not to the whole problem. 4. Focus on classifier systems as reinforcement learning with rule-based generalization. 5. Use reinforcement learning (Q-learning) to distribute reward. The most successful model developed so far. Wilson, S.W.: Classifier Fitness Based on Accuracy. Evolutionary Computation 3(2), 149-175 (1995).
  • 15. The Classifier System Way
  • 16. Learning Classifier Systems as Reinforcement Learning Methods. The system perceives state st from the environment, performs action at, and receives reward rt+1. The goal: maximize the amount of reward received. How much future reward is obtained when at is performed in st? What is the expected payoff for st and at? We need to compute a value function Q(st,at) mapping state-action pairs to payoffs.
  • 17. How does reinforcement learning work? Define the inputs, the actions, and how the reward is determined. Define the expected payoff. Compute a value function Q(st,at) mapping state-action pairs into expected payoffs.
  • 18. How does reinforcement learning work? First we define the expected payoff as the discounted sum of the future rewards, Q(st,at) = E[rt+1 + γrt+2 + γ²rt+3 + …], where γ is the discount factor.
  • 19. How does reinforcement learning work? Then, Q-learning is an option. At the beginning, Q is initialized with random values. At time t, the previous value Q(st,at) is moved toward the new estimate built from the incoming reward: Q(st,at) ← Q(st,at) + α[rt+1 + γ maxa Q(st+1,a) − Q(st,at)]. Parameters: the discount factor γ, the learning rate α, and the action selection strategy.
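The update on this slide can be shown in a few lines of tabular Q-learning. The chain environment (states 0..4, reward -1 per step and 0 at the goal state 4), the parameter values, and the episode caps are toy assumptions:

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = (-1, +1)                        # move left or right along the chain
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), 4)
    return s2, (0.0 if s2 == 4 else -1.0), s2 == 4

for _ in range(200):                      # episodes
    s = 0
    for _ in range(100):                  # cap episode length
        # epsilon-greedy action selection strategy
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda b: Q[(s, b)]))
        s2, r, done = step(s, a)
        # previous value moved toward the new estimate built from the reward
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if done:
            break
```

After training, moving toward the goal from state 3 has a higher value than moving away, which is exactly what the update is meant to learn.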
  • 20. This looks simple: let's bring RL to the real world! Reinforcement learning assumes that Q(st,at) is represented as a table. But the real world is complex, and the number of possible inputs can be huge! We cannot afford an exact Q(st,at).
  • 21. Example: the Mountain Car. Task: drive an underpowered car up a steep mountain road. The state st is the car's position and velocity; the actions are accelerate left, accelerate right, and no acceleration; the reward rt is 0 when the goal is reached and -1 otherwise. [Figure: the value function Q(st,at) over position and velocity.]
  • 22. What are the issues? An exact representation is infeasible; approximation is mandatory. The function is unknown, and it is learnt online from experience.
  • 23. What are the issues? Learning an unknown payoff function while also trying to approximate it: the approximator works on intermediate estimates while also providing information for the learning. Convergence is not guaranteed.
  • 24. What does this have to do with Learning Classifier Systems? They solve reinforcement learning problems. They represent the payoff function Q(st,at) as a population of rules, the classifiers. Classifiers are evolved while Q(st,at) is learnt online.
  • 25. What is a classifier? IF condition C is true for input s, THEN the payoff of action A is p. Conditions are general, covering large portions of the problem space; predictions should be accurate approximations of the payoff surface for A. An example interval condition: C(s) = l ≤ s ≤ u.
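A single interval-based classifier of this kind can be sketched directly. The field names, the Widrow-Hoff style update, and the learning rate are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    l: float            # lower bound of the condition
    u: float            # upper bound of the condition
    action: int         # the action A this rule advocates
    p: float            # predicted payoff
    error: float = 0.0  # running estimate of prediction error

    def matches(self, s: float) -> bool:
        # condition C(s) = l <= s <= u
        return self.l <= s <= self.u

    def update(self, reward: float, beta: float = 0.2) -> None:
        # move error and prediction toward the incoming reward (Widrow-Hoff)
        self.error += beta * (abs(reward - self.p) - self.error)
        self.p += beta * (reward - self.p)

cl = Classifier(l=0.2, u=0.8, action=1, p=0.0)
if cl.matches(0.5):
    cl.update(reward=100.0)
```

The error estimate is what a fitness-by-accuracy scheme (as in XCS) would later feed on: the smaller the error, the more accurate the approximation of the payoff surface.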
  • 26. What types of solutions?
  • 27. How do learning classifier systems work? The main performance cycle.
  • 28. How do learning classifier systems work? The main performance cycle: the classifiers predict an expected payoff, and the incoming reward is used to update the rules which helped in getting the reward. Any reinforcement learning algorithm can be used to estimate the classifier prediction.
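The performance cycle just described can be sketched for a single-step problem with ternary conditions: build the match set, predict a payoff per action, act, then update the rules that advocated the chosen action. The population, inputs, parameters, and reward function below are toy assumptions:

```python
import random

def matches(cond, s):
    return all(c == '#' or c == b for c, b in zip(cond, s))

def performance_cycle(population, s, env_reward, beta=0.2, epsilon=0.1):
    # 1. match set: the classifiers whose condition is true for input s
    match_set = [cl for cl in population if matches(cl['cond'], s)]
    actions = sorted({cl['action'] for cl in match_set})
    def prediction(a):     # mean payoff predicted for action a
        preds = [cl['p'] for cl in match_set if cl['action'] == a]
        return sum(preds) / len(preds)
    # 2. explore with probability epsilon, otherwise exploit best prediction
    a = (random.choice(actions) if random.random() < epsilon
         else max(actions, key=prediction))
    reward = env_reward(s, a)
    # 3. update the rules which helped in getting the reward
    for cl in match_set:
        if cl['action'] == a:
            cl['p'] += beta * (reward - cl['p'])
    return a, reward

population = [{'cond': '1#', 'action': 1, 'p': 0.0},
              {'cond': '##', 'action': 0, 'p': 0.0}]
for _ in range(100):
    performance_cycle(population, '10',
                      lambda s, a: 100.0 if a == 1 else 0.0, epsilon=0.5)
```

Here the prediction update is a simple delta rule, standing in for whatever reinforcement learning algorithm the system actually uses.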
  • 29. How do learning classifier systems work? The main performance cycle.
  • 30. Where do classifiers come from? In principle, any search method may be used. I prefer genetic algorithms because they are representation independent. A genetic algorithm selects, recombines, and mutates existing classifiers to search for better ones.
  • 31. What are the good classifiers? What is the classifier fitness? The goal is to approximate a target value function with as few classifiers as possible. We wish to have an accurate approximation. One possible approach is to define fitness as a function of the classifier's prediction accuracy.
  • 32. What about getting as few classifiers as possible? The genetic algorithm can take care of this. General classifiers apply more often, and thus are reproduced more. But since fitness is based on classifier accuracy, only accurate classifiers are likely to be reproduced. The genetic algorithm evolves maximally general, maximally accurate classifiers.
  • 33. How to apply learning classifier systems. On the environment side: determine the inputs st, the actions at, and how reward rt is distributed; determine the expected payoff that must be maximized; decide an action selection strategy; set up the related parameters. On the learning classifier system side: select a representation for conditions, together with the recombination and mutation operators; select a reinforcement learning algorithm; set up the parameters, mainly the population size and the parameters for the genetic algorithm.
  • 34. Things can be extremely simple! For instance, in supervised classification the environment presents an example and returns reward 1 if the proposed class is correct and 0 if it is not. One only needs to select a representation for conditions with its recombination and mutation operators, and set up the parameters, mainly the population size and the parameters for the genetic algorithm.
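The supervised setting on this slide amounts to a very small "environment" wrapper: present an example, pay 1 for the correct class, 0 otherwise. The dataset and function names below are toy assumptions:

```python
import random

# Toy labelled dataset: (example, class) pairs.
DATASET = [('101', 1), ('010', 0), ('111', 1), ('000', 0)]

def present_example():
    example, label = random.choice(DATASET)
    def reward(proposed_class):
        # 1 if the class is correct, 0 if the class is not correct
        return 1 if proposed_class == label else 0
    return example, reward

example, reward = present_example()
```

The classifier system never sees the label directly; it only observes the reward for the class it proposed, which is what makes plain classification fit the reinforcement learning loop.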
  • 35. [Diagram: Genetics-Based Generalization; Accurate Estimates About Classifiers (Powerful RL); Classifier Representation.]
  • 36. One Representation, One Principle. Data described by six variables a1, …, a6. They represent the simple concept a1=a2 ∧ a5=1. A rather typical approach: select a representation; select an algorithm which produces such a representation; apply the algorithm. Decision rules (attribute-value): if (a5=1) then class 1 [95.3%]; if (a1=3 ∧ a2=3) then class = 1 [92.2%]. FOIL: Clause 0: is_0(a1,a2,a3,a4,a5,a6) :- a1≠a2, a5≠1.
  • 37. Learning Classifier Systems: One Principle, Many Representations. Ternary rules: ####1#:1 if a5