Computed Prediction: So far, so good. What now?
Pier Luca Lanzi
Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA

Upload: xavier-llora

Post on 26-Dec-2014


DESCRIPTION

Pier Luca Lanzi talks at NIGEL 2006 about computed prediction

TRANSCRIPT

Page 1: Computed Prediction:  So far, so good. What now?

Computed Prediction: So far, so good. What now?

Pier Luca Lanzi

Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA

Page 2

RL

Page 3

What is the problem?

[Figure: agent-environment loop. The agent observes state s_t, performs action a_t, and receives reward r_{t+1} and next state s_{t+1} from the environment]

Compute a value function Q(s_t, a_t) mapping state-action pairs into expected future payoffs

How much future reward when action a_t is performed in state s_t?

What is the expected payoff for s_t and a_t?

GOAL: maximize the amount of reward received in the long run
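Learning Q(s_t, a_t) online can be sketched with a tabular Q-learning update; a minimal sketch, where the action set, learning rate, and discount factor are illustrative assumptions, not values from the talk:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (illustrative)
ACTIONS = [0, 1]          # hypothetical two-action set

Q = defaultdict(float)    # Q[(state, action)] -> expected future payoff

def q_update(s, a, r, s_next):
    """One online Q-learning step from the experience tuple (s, a, r, s')."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# one step of experience: in state 0, action 1 yielded reward 1 and led back to state 0
q_update(0, 1, 1.0, 0)
```

The table stores one estimate per state-action pair; this is the exact representation that, as the next slides argue, becomes infeasible for large or continuous state spaces.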

Page 4

Example: The Mountain Car

[Figure: the mountain car and its goal at the top of the slope, with the value function Q(s_t, a_t)]

Task: drive an underpowered car up a steep mountain road

a_t in {acc. left, acc. right, no acc.}

s_t = (position, velocity)

r_t = 0 when the goal is reached, -1 otherwise.
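The transition structure behind this task can be made concrete. This sketch uses the standard mountain-car dynamics from the RL literature; the exact constants are not given in the talk and are assumptions:

```python
import math

# Standard mountain-car dynamics (constants follow the common RL-literature
# formulation; the talk itself only defines states, actions, and reward).
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.5, 0.5
MAX_VEL = 0.07
ACTIONS = (-1, 0, 1)  # acc. left, no acc., acc. right

def step(position, velocity, action):
    """One transition: returns (next_position, next_velocity, reward)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_VEL, min(MAX_VEL, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops against the left wall
    reward = 0.0 if position >= GOAL_POS else -1.0
    return position, velocity, reward
```

Because the engine is too weak to climb directly, the agent must first back up the left slope to gain momentum, which is what makes the payoff surface hard to represent.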

Page 5

What are the issues?

Exact representation is infeasible; approximation is mandatory

The function is unknown; it is learnt online from experience

Learning the unknown payoff function while also trying to approximate it

The approximator works on intermediate estimates but also tries to provide information for the learning

Convergence is not guaranteed

Page 6

Classifiers

Page 7

Learning Classifier Systems

Solve reinforcement learning problems

Represent the payoff function Q(s_t, a_t) as a population of rules, the classifiers.

Classifiers are evolved while Q(s_t, a_t) is learnt online

Page 8

What is a classifier?

IF condition C is true for input s THEN the payoff of action a is p

[Figure: payoff surface for action A. Over the interval [l, u] the classifier predicts the constant payoff p]

Condition C(s) = l ≤ s ≤ u

General conditions covering large portions of the problem space

Accurate approximations

Generalization depends on how well conditions can partition the problem space

What is the best representation for the problem?

Several representations have been developed to improve generalization
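An interval-based classifier of this kind can be sketched as follows; the field names are illustrative, and a real LCS classifier also carries fitness, error, and other bookkeeping omitted here:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """IF l <= s <= u THEN the payoff of `action` is `prediction`."""
    l: float            # lower bound of the interval condition
    u: float            # upper bound of the interval condition
    action: int         # the action this rule advocates
    prediction: float   # constant payoff estimate p

    def matches(self, s: float) -> bool:
        """Does the condition C(s) = l <= s <= u hold for input s?"""
        return self.l <= s <= self.u

cl = Classifier(l=0.0, u=0.5, action=1, prediction=100.0)
```

The interval [l, u] is what the genetic algorithm generalizes or specializes; wider intervals cover more of the problem space but must still keep the constant prediction accurate.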

Page 9

What is computed prediction?

Replace the prediction p by a parametrized function p(x, w)

[Figure: payoff landscape for action A. Over the interval [l, u] the classifier computes a linear payoff p(x, w) = w_0 + w_1 x]

Condition C(s) = l ≤ s ≤ u

IF condition C is true for input s THEN the value of action a is p(x, w)

Which representation?

Which type of approximation?

Page 10

Computed Prediction: Linear Approximation

Each classifier has a vector of parameters w. Classifier prediction is computed as

p(x, w) = w · x = Σ_i w_i x_i

where x is the input augmented with a constant term x_0. Classifier weights are updated using the Widrow-Hoff update,

w ← w + η (P − p(x, w)) x / ||x||²

where P is the target payoff and η is the learning rate.
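In code, the linear prediction and its normalized Widrow-Hoff update can be sketched as follows; the constant input x_0 follows the usual XCSF setup, while the learning rate and target values are illustrative:

```python
ETA = 0.2   # learning rate (illustrative)
X0 = 1.0    # constant term appended to the input, as in XCSF

def predict(w, s):
    """Linear computed prediction p(x, w) = w . x with x = (x0, s...)."""
    x = [X0] + list(s)
    return sum(wi * xi for wi, xi in zip(w, x))

def widrow_hoff_update(w, s, target):
    """Normalized delta rule: w <- w + eta * (P - p(x, w)) * x / ||x||^2."""
    x = [X0] + list(s)
    error = target - sum(wi * xi for wi, xi in zip(w, x))
    norm2 = sum(xi * xi for xi in x)
    return [wi + ETA * error * xi / norm2 for wi, xi in zip(w, x)]

# repeatedly fit target payoff 5.0 at input s = (2.0,)
w = [0.0, 0.0]
for _ in range(200):
    w = widrow_hoff_update(w, [2.0], 5.0)
```

The normalization by ||x||² makes the step size insensitive to the scale of the input, which is why it is the standard choice in XCSF-style systems.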

Page 11

Summary

Page 12

GOAL: Learn the payoff function

Typical RL approach: What is the best approximator?

Typical LCS approach: What is the best representation for the problem?

What are the differences?

[Figure: two-axis map of REPRESENTATION against APPROXIMATOR. Representations: intervals, messy, symbols, hulls, ellipsoids, 0/1/#. Approximators: gradient descent, radial basis, NNs, tile coding. Plotted combinations: Computed Prediction; Boolean representation with sigmoid prediction; Boolean representation with neural prediction (O'Hara & Bull 2004); real intervals with neural prediction; convex hulls with linear prediction]

Page 13

To represent or to approximate?

Powerful representations allow the solution of difficult problems with basic approximators

Powerful approximators may make the choice of the representation less critical

Experiment

Consider a very powerful approximator that we know can solve a certain RL problem

Use it to compute classifier prediction in an LCS and apply the LCS to solve the same problem

Does genetic search still provide an advantage?

Page 14

Computed prediction with Tile Coding

A powerful approximator developed in the reinforcement learning community

Tile coding can solve the mountain car problem given an adequate parameter setting

Classifier prediction is computed using tile coding

Each tile coding has a different parameter setting

When using tile coding to compute classifier prediction, one classifier can solve the whole problem

What should we expect?

Page 15

The performance?

Computed prediction can perform as well as the approximator with the most adequate configuration

The evolution of a population of classifiers provides advantages over one approximator

Even if the same approximator alone might solve the whole problem

Page 16

How do parameters evolve?

Page 17

What now?

Page 18

What now?

[Figure: the REPRESENTATION against APPROXIMATOR map with the problem in the middle: which representation? which approximator?]

Which approximator? Let evolution decide!

Population of classifiers using different approximators to compute prediction

The genetic algorithm selects the best approximators for each problem subspace
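Letting evolution pick the approximator can be caricatured with fitness-proportionate selection over a toy population; the two approximator types and their error values are invented for illustration:

```python
import random

random.seed(0)

# Each classifier carries an approximator type; selection favours the type
# with lower prediction error on the current subspace (a toy stand-in for the GA).
def error(approx_type, s):
    # hypothetical errors: "linear" fits this subspace well, "constant" does not
    return 0.1 if approx_type == "linear" else 1.0

population = ["linear", "constant"] * 10

def select_next_generation(pop, s):
    """Fitness-proportionate selection with fitness = 1 / (error + eps)."""
    fits = [1.0 / (error(a, s) + 1e-6) for a in pop]
    return random.choices(pop, weights=fits, k=len(pop))

new_pop = select_next_generation(population, s=0.5)
```

Because fitness is tied to prediction accuracy, the better-fitting approximator type comes to dominate the subspace over generations, which is the effect reported in the experiments.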

Page 19

Evolving the best approximator

Page 20

What next?

[Figure: the REPRESENTATION against APPROXIMATOR map with the problem in the middle: which representation? which approximator?]

Which approximator? Let evolution decide!

Population of classifiers using different approximators to compute prediction

Even if the same approximator alone might solve the whole problem

Page 21

Evolving Heterogeneous Approximators

[Figure: performance of heterogeneous approximators compared with the most powerful approximator]

Page 22

What next?

Allow different representations in the same population

Let evolution select the most adequate representation for each problem subspace

Then, allow different representations and different approximators to evolve all together

Probably done for Boolean conditions

Page 23

Acknowledgements

Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL

Page 24

Thank you! Any questions?