Computed Prediction: So far, so good. What now?
Pier Luca Lanzi
Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA

Upload: xavier-llora

Post on 26-Dec-2014


DESCRIPTION

Pier Luca Lanzi talks at NIGEL 2006 about computed prediction

TRANSCRIPT

Page 1: Computed Prediction:  So far, so good. What now?

Computed Prediction: So far, so good. What now?

Pier Luca Lanzi

Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA

Page 2

RL

Page 3

What is the problem?

[Figure: agent-environment loop. The agent observes state s_t, performs action a_t, and receives reward r_{t+1} and next state s_{t+1} from the environment]

Compute a value function Q(s_t, a_t) mapping state-action pairs into expected future payoffs

How much future reward when action a_t is performed in state s_t?

What is the expected payoff for s_t and a_t?

GOAL: maximize the amount of reward received in the long run
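Learning Q(s_t, a_t) online can be sketched with a tabular Q-learning update; a minimal sketch, where the action set, learning rate, and discount factor are illustrative assumptions, not values from the talk:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (illustrative)
ACTIONS = [0, 1]          # hypothetical two-action set

Q = defaultdict(float)    # Q[(state, action)] -> expected future payoff

def q_update(s, a, r, s_next):
    """One online Q-learning step from the experience tuple (s, a, r, s')."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# one step of experience: in state 0, action 1 yielded reward 1 and led back to state 0
q_update(0, 1, 1.0, 0)
```

The table stores one estimate per state-action pair; this is the exact representation that, as the next slides argue, becomes infeasible for large or continuous state spaces.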

Page 4

Example: The Mountain Car

[Figure: the mountain car and its goal at the top of the slope, with the value function Q(s_t, a_t)]

Task: drive an underpowered car up a steep mountain road

a_t in {acc. left, acc. right, no acc.}

s_t = (position, velocity)

r_t = 0 when the goal is reached, -1 otherwise.
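The transition structure behind this task can be made concrete. This sketch uses the standard mountain-car dynamics from the RL literature; the exact constants are not given in the talk and are assumptions:

```python
import math

# Standard mountain-car dynamics (constants follow the common RL-literature
# formulation; the talk itself only defines states, actions, and reward).
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.5, 0.5
MAX_VEL = 0.07
ACTIONS = (-1, 0, 1)  # acc. left, no acc., acc. right

def step(position, velocity, action):
    """One transition: returns (next_position, next_velocity, reward)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_VEL, min(MAX_VEL, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops against the left wall
    reward = 0.0 if position >= GOAL_POS else -1.0
    return position, velocity, reward
```

Because the engine is too weak to climb directly, the agent must first back up the left slope to gain momentum, which is what makes the payoff surface hard to represent.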

Page 5

What are the issues?

Exact representation is infeasible; approximation is mandatory

The function is unknown; it is learnt online from experience

Learning the unknown payoff function while also trying to approximate it

The approximator works on intermediate estimates but also tries to provide information for the learning

Convergence is not guaranteed

Page 6

Classifiers

Page 7

Learning Classifier Systems

Solve reinforcement learning problems

Represent the payoff function Q(s_t, a_t) as a population of rules, the classifiers.

Classifiers are evolved while Q(s_t, a_t) is learnt online

Page 8

What is a classifier?

IF condition C is true for input s THEN the payoff of action a is p

[Figure: payoff surface for action A. Over the interval [l, u] the classifier predicts the constant payoff p]

Condition C(s) = l ≤ s ≤ u

General conditions covering large portions of the problem space

Accurate approximations

Generalization depends on how well conditions can partition the problem space

What is the best representation for the problem?

Several representations have been developed to improve generalization
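An interval-based classifier of this kind can be sketched as follows; the field names are illustrative, and a real LCS classifier also carries fitness, error, and other bookkeeping omitted here:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """IF l <= s <= u THEN the payoff of `action` is `prediction`."""
    l: float            # lower bound of the interval condition
    u: float            # upper bound of the interval condition
    action: int         # the action this rule advocates
    prediction: float   # constant payoff estimate p

    def matches(self, s: float) -> bool:
        """Does the condition C(s) = l <= s <= u hold for input s?"""
        return self.l <= s <= self.u

cl = Classifier(l=0.0, u=0.5, action=1, prediction=100.0)
```

The interval [l, u] is what the genetic algorithm generalizes or specializes; wider intervals cover more of the problem space but must still keep the constant prediction accurate.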

Page 9

What is computed prediction?

Replace the prediction p by a parametrized function p(x, w)

[Figure: payoff landscape for action A. Over the interval [l, u] the classifier computes a linear payoff p(x, w) = w_0 + w_1 x]

Condition C(s) = l ≤ s ≤ u

IF condition C is true for input s THEN the value of action a is p(x, w)

Which representation?

Which type of approximation?

Page 10

Computed Prediction: Linear Approximation

Each classifier has a vector of parameters w. Classifier prediction is computed as

p(x, w) = w · x = Σ_i w_i x_i

where x is the input augmented with a constant term x_0. Classifier weights are updated using the Widrow-Hoff update,

w ← w + η (P − p(x, w)) x / ||x||²

where P is the target payoff and η is the learning rate.
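In code, the linear prediction and its normalized Widrow-Hoff update can be sketched as follows; the constant input x_0 follows the usual XCSF setup, while the learning rate and target values are illustrative:

```python
ETA = 0.2   # learning rate (illustrative)
X0 = 1.0    # constant term appended to the input, as in XCSF

def predict(w, s):
    """Linear computed prediction p(x, w) = w . x with x = (x0, s...)."""
    x = [X0] + list(s)
    return sum(wi * xi for wi, xi in zip(w, x))

def widrow_hoff_update(w, s, target):
    """Normalized delta rule: w <- w + eta * (P - p(x, w)) * x / ||x||^2."""
    x = [X0] + list(s)
    error = target - sum(wi * xi for wi, xi in zip(w, x))
    norm2 = sum(xi * xi for xi in x)
    return [wi + ETA * error * xi / norm2 for wi, xi in zip(w, x)]

# repeatedly fit target payoff 5.0 at input s = (2.0,)
w = [0.0, 0.0]
for _ in range(200):
    w = widrow_hoff_update(w, [2.0], 5.0)
```

The normalization by ||x||² makes the step size insensitive to the scale of the input, which is why it is the standard choice in XCSF-style systems.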

Page 11

Summary

Page 12

GOAL: Learn the payoff function

Typical RL approach: What is the best approximator?

Typical LCS approach: What is the best representation for the problem?

What are the differences?

[Figure: two-axis map of REPRESENTATION against APPROXIMATOR. Representations: intervals, messy, symbols, hulls, ellipsoids, 0/1/#. Approximators: gradient descent, radial basis, NNs, tile coding. Plotted combinations: Computed Prediction; Boolean representation with sigmoid prediction; Boolean representation with neural prediction (O'Hara & Bull 2004); real intervals with neural prediction; convex hulls with linear prediction]

Page 13

To represent or to approximate?

Powerful representations allow the solution of difficult problems with basic approximators

Powerful approximators may make the choice of the representation less critical

Experiment

Consider a very powerful approximator that we know can solve a certain RL problem

Use it to compute classifier prediction in an LCS and apply the LCS to solve the same problem

Does genetic search still provide an advantage?

Page 14

Computed prediction with Tile Coding

A powerful approximator developed in the reinforcement learning community

Tile coding can solve the mountain car problem given an adequate parameter setting

Classifier prediction is computed using tile coding

Each tile coding has a different parameter setting

When using tile coding to compute classifier prediction, one classifier can solve the whole problem

What should we expect?

Page 15

The performance?

Computed prediction can perform as well as the approximator with the most adequate configuration

The evolution of a population of classifiers provides advantages over one approximator

Even if the same approximator alone might solve the whole problem

Page 16

How do parameters evolve?

Page 17

What now?

Page 18

What now?

[Figure: the REPRESENTATION against APPROXIMATOR map with the problem in the middle: which representation? which approximator?]

Which approximator? Let evolution decide!

Population of classifiers using different approximators to compute prediction

The genetic algorithm selects the best approximators for each problem subspace
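Letting evolution pick the approximator can be caricatured with fitness-proportionate selection over a toy population; the two approximator types and their error values are invented for illustration:

```python
import random

random.seed(0)

# Each classifier carries an approximator type; selection favours the type
# with lower prediction error on the current subspace (a toy stand-in for the GA).
def error(approx_type, s):
    # hypothetical errors: "linear" fits this subspace well, "constant" does not
    return 0.1 if approx_type == "linear" else 1.0

population = ["linear", "constant"] * 10

def select_next_generation(pop, s):
    """Fitness-proportionate selection with fitness = 1 / (error + eps)."""
    fits = [1.0 / (error(a, s) + 1e-6) for a in pop]
    return random.choices(pop, weights=fits, k=len(pop))

new_pop = select_next_generation(population, s=0.5)
```

Because fitness is tied to prediction accuracy, the better-fitting approximator type comes to dominate the subspace over generations, which is the effect reported in the experiments.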

Page 19

Evolving the best approximator

Page 20

What next?

[Figure: the REPRESENTATION against APPROXIMATOR map with the problem in the middle: which representation? which approximator?]

Which approximator? Let evolution decide!

Population of classifiers using different approximators to compute prediction

Even if the same approximator alone might solve the whole problem

Page 21

Evolving Heterogeneous Approximators

[Figure: performance of heterogeneous approximators compared with the most powerful approximator]

Page 22

What next?

Allow different representations in the same population

Let evolution select the most adequate representation for each problem subspace

Then, allow different representations and different approximators to evolve all together

Probably done for Boolean conditions

Page 23

Acknowledgements

Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL

Page 24

Thank you! Any questions?