Machine Learning and Games Simon M. Lucas Centre for Computational Intelligence University of Essex, UK


Page 1:

Machine Learning and Games

Simon M. Lucas
Centre for Computational Intelligence

University of Essex, UK

Page 2:

Overview

• Games: dynamic, uncertain, open-ended
  – Ready-made test environments
  – A 21-billion-dollar industry: space for more machine learning…
• Agent architectures
  – Where the Computational Intelligence fits
  – Interfacing the neural nets etc.
  – Choice of learning machine (WPC, neural network, N-Tuple systems)
• Training algorithms
  – Evolution / co-evolution
  – TDL
  – Hybrids
• Methodology: strong belief in open competitions

Page 3:

My Angle

• Machine learning
  – How well can systems learn?
  – Given a complex, semi-structured environment
  – With indirect reward schemes

Page 4:

Sample Games

• Car Racing
• Othello
• Ms Pac-Man
  – Demo

Page 5:

Agent Basics

• Two main approaches
  – Action selector
  – State evaluator
• Each of these has strengths and weaknesses
• For any given problem, no hard-and-fast rules
  – Experiment!
• Success or failure can hinge on small details!

Page 6:

Co-evolution

Evolutionary algorithm: rank them using a league

Page 7:

(Co) Evolution v. TDL

• Temporal Difference Learning
  – Often learns much faster
  – But less robust
  – Learns during game-play
  – Uses information readily available (i.e. the current observable game-state)
• Evolution / co-evolution (vanilla form)
  – Information from game result(s)
  – Easier to apply
  – But wasteful
• Both can learn game strategy from scratch

Page 8:

In Pictures…

Page 9:

Simple Example: Mountain Car

• Often used to test TD learning methods
• Accelerate a car to reach the goal at the top of an incline
• Engine force is weaker than gravity (DEMO)

Page 10:

State Value Function

• Actions are applied to the current state to generate a set of future states

• The state value function is used to rate these

• Choose the action that leads to the highest state value

• Discrete set of actions
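Sketched in Java, this state-value scheme loops over the discrete actions, simulates one step for each, and rates the successor states; the `Model` interface, identity value function, and toy numbers below are illustrative assumptions, not code from the talk.

```java
import java.util.function.ToDoubleFunction;

public class StateValueSelector {

    // Hypothetical one-step forward model: apply an action to a state.
    interface Model {
        double apply(double state, double action);
    }

    // Rate each successor state and return the index of the best action.
    static int selectAction(double state, double[] actions, Model model,
                            ToDoubleFunction<Double> value) {
        int best = 0;
        double bestV = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < actions.length; i++) {
            double next = model.apply(state, actions[i]); // simulate one step
            double v = value.applyAsDouble(next);         // state value function
            if (v > bestV) {
                bestV = v;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy example: actions shift the state; value prefers larger states.
        double[] actions = {-1.0, 0.0, +1.0};
        int a = selectAction(0.0, actions, (s, act) -> s + act, s -> s);
        System.out.println(a); // prints 2: the +1.0 action wins
    }
}
```

Note that only the value function is learned here; the action set and forward model are given, which is why this approach needs a discrete set of actions.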

Page 11:

Action Selector

• A decision function selects an output directly, based on the current state of the system

• The action may be a discrete choice or a set of continuous outputs

Page 12:

TDL – State Value Learned

Page 13:

Evolution: Learns Policy, not Value

Page 14:

Example Network Found by NEAT+Q
(Whiteson and Stone, JMLR 2006)

• EvoTDL hybrid
• They used a different input coding
• So results are not directly comparable

Page 15:

~Optimal State Value Policy Function
f = abs(v)

Page 16:

Action Controller

• Directly connect velocity to output

• Simple network!
• One neuron!
• One connection!
• Easy to interpret!

vs

Page 17:

Othello

With Thomas Runarsson, University of Iceland

Page 18:

Volatile Piece Difference


Page 19:

Setup

• Use a weighted piece counter
  – Fast to compute (can play billions of games)
  – Easy to visualise
  – See if we can beat the ‘standard’ weights
• Limit search depth to 1-ply
  – Enables billions of games to be played
  – For a thorough comparison
• Focus on machine learning rather than game-tree search
• Force random moves (with prob. 0.1)
  – Get a more robust evaluation of playing ability

Page 20:

Standard “Heuristic” Weights
(lighter = more advantageous)

Page 21:

CEL Algorithm

• Evolution Strategy (ES)
  – (1, 10) (non-elitist worked best)
• Gaussian mutation
  – Fixed sigma (not adaptive)
  – Fixed works just as well here
• Fitness defined by full round-robin league performance (e.g. 1, 0, -1 for w/d/l)
• Parent-child averaging
  – Defeats the noise inherent in fitness evaluation
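One generation of this (1, 10) scheme can be sketched in Java; the toy fitness function below stands in for the round-robin league, and all names are our own, not code from the talk.

```java
import java.util.Random;

public class CelStep {

    // Stand-in for the round-robin league evaluation of the talk.
    interface Fitness {
        double eval(double[] w);
    }

    // One (1, 10)-ES generation: Gaussian mutation with fixed sigma,
    // non-elitist selection of the best offspring, then parent-child
    // averaging to damp the noise in the fitness evaluation.
    static double[] generation(double[] parent, double sigma, Fitness fit, Random rng) {
        double[] best = null;
        double bestF = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < 10; k++) {                   // 10 offspring
            double[] child = parent.clone();
            for (int i = 0; i < child.length; i++)
                child[i] += sigma * rng.nextGaussian();  // fixed sigma, not adaptive
            double f = fit.eval(child);
            if (f > bestF) {
                bestF = f;
                best = child;
            }
        }
        double[] next = new double[parent.length];
        for (int i = 0; i < next.length; i++)            // parent-child averaging
            next[i] = 0.5 * (parent[i] + best[i]);
        return next;
    }

    public static void main(String[] args) {
        // Toy fitness: prefer weights near 1.0 (placeholder for league play).
        Fitness fit = w -> {
            double s = 0;
            for (double x : w) s -= (x - 1) * (x - 1);
            return s;
        };
        double[] w = generation(new double[4], 0.1, fit, new Random(42));
        System.out.println(w.length); // prints 4: same shape as the parent
    }
}
```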

Page 22:

TDL Algorithm

• Nearly as simple to apply as CEL:

public interface TDLPlayer extends Player {
    void inGameUpdate(double[] prev, double[] next);
    void terminalUpdate(double[] prev, double tg);
}

• Reward signal only given at game end
• Initial alpha and alpha cooling rate tuned empirically

Page 23:

TDL in Java
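As a hedged sketch (ours, not the original slide's code), the two update methods of the `TDLPlayer` interface might look like this for a weighted piece counter with a tanh-squashed value, trained by TD(0); the alpha handling and board encoding are assumptions.

```java
public class WpcTdl {
    // v(s) = tanh(w . s) over a 64-square board encoding.
    double[] w = new double[64];
    double alpha = 0.01; // in the talk, alpha and its cooling rate were tuned

    double value(double[] board) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) sum += w[i] * board[i];
        return Math.tanh(sum);
    }

    // Move v(prev) toward v(next); the (1 - v^2) factor is d/dx tanh.
    void inGameUpdate(double[] prev, double[] next) {
        double vp = value(prev);
        double delta = alpha * (value(next) - vp) * (1 - vp * vp);
        for (int i = 0; i < w.length; i++) w[i] += delta * prev[i];
    }

    // At game end the target tg is the actual result (e.g. +1 / 0 / -1).
    void terminalUpdate(double[] prev, double tg) {
        double vp = value(prev);
        double delta = alpha * (tg - vp) * (1 - vp * vp);
        for (int i = 0; i < w.length; i++) w[i] += delta * prev[i];
    }

    public static void main(String[] args) {
        WpcTdl tdl = new WpcTdl();
        double[] board = new double[64];
        board[0] = 1;                      // a single own piece on square 0
        tdl.terminalUpdate(board, 1.0);    // a win: push v(board) toward +1
        System.out.println(tdl.w[0] > 0);  // prints true: weight moved up
    }
}
```

Because the reward only arrives at the terminal update, the in-game updates propagate it backwards through the game one step at a time over many games.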

Page 24:

CEL (1,10) v. Heuristic

Page 25:

TDL v. Random and Heuristic

Page 26:

TDL + CEL v. Heuristic (1 run)

Page 27:

Can we do better?

• Enforce symmetry
  – This speeds up learning

• Use trusty old friend: N-Tuple System

Page 28:

NTuple Systems

• W. Bledsoe and I. Browning. Pattern recognition and reading by machine. In Proceedings of the EJCC, pages 225–232, December 1959.
• Sample n-tuples of the input space
• Map sampled values to memory indexes
  – Training: adjust the values there
  – Recognition / play: sum over the values
• Superfast
• Related to:
  – Kernel trick of the SVM (non-linear map to a high-dimensional space; then a linear model)
  – Kanerva’s sparse memory model
  – Also similar to Buro’s look-up table

Page 29:

Symmetric N-Tuple Sampling

Page 30:

3-tuple Example

Page 31:

N-Tuple System

• Results used 30 random n-tuples
• Snakes created by a random 6-step walk
  – Duplicate squares deleted
• System typically has around 15000 weights
• Simple training rule:
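The training rule on the slide was shown as an equation; a minimal sketch of one n-tuple with its look-up table and a delta-rule update might look like this (the cell coding and all names are our assumptions, not the original system).

```java
public class NTupleSketch {
    // One n-tuple over a 64-square board. Cells are coded
    // 0 = empty, 1 = own piece, 2 = opponent piece.
    int[] cells;  // board squares this tuple samples
    double[] lut; // one weight per pattern: 3^n entries

    NTupleSketch(int[] cells) {
        this.cells = cells;
        this.lut = new double[(int) Math.pow(3, cells.length)];
    }

    // Map the sampled cell values to a memory index (a base-3 number).
    int index(int[] board) {
        int idx = 0;
        for (int c : cells) idx = idx * 3 + board[c];
        return idx;
    }

    // Play / recognition: just read the addressed weight.
    double value(int[] board) {
        return lut[index(board)];
    }

    // Simple delta rule: nudge the addressed weight toward the target.
    void train(int[] board, double target, double alpha) {
        int idx = index(board);
        lut[idx] += alpha * (target - lut[idx]);
    }

    public static void main(String[] args) {
        NTupleSketch t = new NTupleSketch(new int[]{0, 1, 2});
        int[] board = new int[64];
        board[0] = 1;                        // own piece
        board[1] = 2;                        // opponent piece; board[2] empty
        System.out.println(t.index(board));  // prints 15: 1*9 + 2*3 + 0
        t.train(board, 1.0, 0.5);
        System.out.println(t.value(board));  // prints 0.5 after one step
    }
}
```

A full player would sum the values of many such tuples (30 in the results above), with symmetric sampling sharing each table across the board's eight symmetries.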

Page 32:

NTuple System (TDL)
total games = 1250

Page 33:

Learned strategy…

Page 34:

Web-based League
(snapshot before CEC 2006 Competition)

Page 35:

Results versus CEC 2006 Champion
(a manual EVO / TDL hybrid)

Page 36:

N-Tuple Summary

• Stunning results compared to other game-learning architectures such as the MLP
• How might this hold for other problems?
• How easy are N-Tuples to apply to other domains?

Page 37:

Screen Capture Mode:
Ms Pac-Man Challenge

Page 38:

Robotic Car Racing

Page 39:

Conclusions

• Games are great for CI research
  – Intellectually challenging
  – Fun to work with
• Agent learning for games is still a black art
• Small details can make big differences!
  – Which inputs to use
• Big details also! (N-Tuple versus MLP)
• Grand challenge: how can we design more efficient game learners?
• EvoTDL hybrids are the way forward.

Page 40:

CIG 2008: Perth, WA; http://cigames.org