General Video Game AI and Bandit Landscape EAs
Simon Lucas, Queen Mary University of London
(with Jialin Liu, Diego Perez,
Raluca Gaina, Kamolwan (Mike) Kunanusont)
Game AI Research Group
History of Artificial Intelligence
Boring AI – not very adaptive; siloed
Exciting Learning AI – adaptive and general
Artificial General Intelligence: Three Pillars
• Evolutionary Algorithms
• Deep Learning
• Simulation-Based Learning / Planning
• Most interesting to use hybrids of all three
• At different temporal scales – including evolution for real-time action selection
Progressing towards AGI: Why Games?
• Games provide the perfect platform:
– Experimental development and testing of theories and systems
– Wide range: from simple to complex
– Fun for humans to engage with
– Generally harmless
• Easy to simulate fast and in parallel (unlike real robots)
• Creative aspects as well as performance
– Automated game design and game tuning
Talk Outline
• Motivation in Game AI – and General Video Game AI
• A practical new algorithm for noisy optimisation
• Main features:
– Adapts the simplest evolutionary algorithm (Random Mutation Hill-Climber) to use a model to guide search
– UCB (Bandit) equation to balance exploration versus exploitation
• Initial results: very promising
Games: Great Source of Noisy Optimisation Problems
• Here we treat noise as uncertainty about how to play well
• Sources of uncertainty:
– Hidden cards (e.g. Poker)
– Dice outcomes (Backgammon)
– Hidden random seed (Ms Pac-Man)
• Different, but also related:
– Unknown intended opponent actions (Chess)
Video Games
• For more than a decade our community has used video games as a great source of AI challenges
– Check out IEEE CIG 2017 in New York
Visual Doom AI Competition (ViZDoom)
StarCraft AI (classic e-sport)
http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/custom-ai-programs-take-on-top-ranked-humans-in-starcraft
• Challenging RTS
• Strategy + lightning reactions
• AI currently at amateur level
• I predict AI will be super-human in 2019
Planet Wars – a Simple RTS (Real Time Strategy Game)
Getting Creative: Level Generation (see demo)
General Video Game AI
• Challenge for AI:
– Play any video game
– You don't know the rules
– But you know the score
– And when you die
• A bit like walking into an arcade in the 80s and playing a new game
General Video Game AI
http://gvgai.net
VGDL and the GVGAI Framework
(Sokoban)
https://github.com/EssexUniversityMCTS/gvgai/wiki/VGDL-Language
GVGAI Videos
• http://gvgai.net/test/vid/Aliens.mov
• http://gvgai.net/test/vid/Butterflies.mov
• http://gvgai.net/test/vid/Seaquest.mov
• http://gvgai.net/test/vid/Sheriff.mov
Statistical Simulation-Based CI / AI
• Relies on a fast forward model:
– F(s, a) → s′
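As a concrete reading of F(s, a) → s′, here is a minimal sketch of a forward-model interface. The names are illustrative assumptions, not the framework's actual classes; in the GVGAI framework this role is played by copying and advancing a state observation.

// Minimal sketch of a fast forward model F(s, a) -> s'.
// Hypothetical interface; GVGAI exposes the same capability by
// copying and advancing a StateObservation.
interface ForwardModel<S, A> {
    S next(S state, A action);    // F(s, a) -> s': returns the next state
    double score(S state);        // current game score of a state
    boolean isGameOver(S state);  // has the game ended?
}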
Real-Time Decision Making: Challenges for Statistical Search Methods
• Rapid reaction needed: e.g. 40 ms or less between requests for an action
• Branching factor may be high
• Limited horizon
– Can't look very far ahead
• Limited roll-out budget
– Don't have time to perform many roll-outs
• Random actions may be terrible!
– And lead to a very flat reward landscape
Monte Carlo Tree Search: the main (CRAZY?) idea
• Tree policy: choose which node to expand (not necessarily leaf of tree)
• Default (simulation) policy: random playout until end of game
MCTS Builds Asymmetric Trees
• Aims to balance exploration and exploitation
• In video games the limited roll-out budget is a challenge, but not the only one!
Rolling Horizon Evolution
• Evolve action sequences in real time
• Each time pick first action
• Then run evolution again
Rolling Horizon Evolution Example: Where Might Noise Come From?
• Int vector: 1 4 4 4 3 3 3 3 3 3 3 2 2
• Translated into game actions: Up, Left, Left, Left, Down, …
• Then evaluated by the game engine
• First action is used after each optimisation run
• Re-run the EA every 40 ms
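A minimal sketch of this loop as a (1+1) EA over fixed-length action sequences, reusing the hypothetical ForwardModel interface sketched earlier. HORIZON, N_ACTIONS and the class itself are illustrative assumptions, not the framework's actual code.

import java.util.Random;

class RollingHorizonSketch<S> {
    static final int HORIZON = 13;   // length of the action sequence
    static final int N_ACTIONS = 5;  // actions encoded as ints 0..4
    final Random rnd = new Random();
    final ForwardModel<S, Integer> model;

    RollingHorizonSketch(ForwardModel<S, Integer> model) { this.model = model; }

    // Called once per frame (e.g. every 40 ms): evolve, play first action
    int act(S state, int evalBudget) {
        int[] best = randomSeq();
        double bestFit = rollout(state, best);
        for (int e = 1; e < evalBudget; e++) {
            int[] mut = best.clone();
            mut[rnd.nextInt(HORIZON)] = rnd.nextInt(N_ACTIONS); // point mutation
            double fit = rollout(state, mut);
            if (fit >= bestFit) { best = mut; bestFit = fit; }
        }
        return best[0]; // first action only; re-run evolution next frame
    }

    int[] randomSeq() {
        int[] seq = new int[HORIZON];
        for (int i = 0; i < HORIZON; i++) seq[i] = rnd.nextInt(N_ACTIONS);
        return seq;
    }

    // Noise enters here: two rollouts of the same sequence may score differently
    double rollout(S state, int[] seq) {
        S s = state;
        for (int a : seq) s = model.next(s, a);
        return model.score(s);
    }
}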
Motivation for Bandit Landscape EA
• Game AI
– Evolving / tuning game parameters / rules / content (e.g. level design)
– Real-time control via rolling horizon evolution
• Applies when the fitness evaluations are:
– Noisy
– Limited in number, either because they are:
• Computationally expensive
• Need to be done very rapidly (real-time)
• Not limited to games of course!
Evolutionary Algorithms: Simple and Beautiful
• Initialise a RANDOM population of individuals
• Then REPEAT
– Evaluate them all – and rank them by FITNESS
– BREED Offspring from the FITTEST Parents
• UNTIL Satisfied, or Out of Time
• Attractive simplicity, but it takes a bit of skill to make them really work …
• One of my VERY FAVOURITE APPROACHES to AI
• BUT: can do even better with a fitness landscape model
• (the FITNESS evaluation step – shown in RED on the slide – is affected by noise)
Simple Evolutionary Algorithm Demo
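The demo itself is not reproduced in the transcript. As a stand-in, here is a minimal sketch of the Random Mutation Hill-Climber over integer vectors, the simplest EA the talk builds on; all names are illustrative.

import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class RMHCSketch {
    static final Random rnd = new Random();

    // (1+1) RMHC: mutate one gene at a time, keep the mutant if no worse
    static int[] run(ToDoubleFunction<int[]> fitness,
                     int dim, int nValues, int nEvals) {
        int[] best = new int[dim];
        for (int i = 0; i < dim; i++) best[i] = rnd.nextInt(nValues);
        double bestFit = fitness.applyAsDouble(best);
        for (int e = 1; e < nEvals; e++) {
            int[] mut = best.clone();
            mut[rnd.nextInt(dim)] = rnd.nextInt(nValues);
            double fit = fitness.applyAsDouble(mut); // noisy in general
            if (fit >= bestFit) { best = mut; bestFit = fit; }
        }
        return best;
    }

    public static void main(String[] args) {
        // OneMax: maximise the number of ones in a 20-bit string
        int[] sol = run(x -> Arrays.stream(x).sum(), 20, 2, 1000);
        System.out.println(Arrays.toString(sol));
    }
}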
We use Evolution to Evolve Vectors of Integers (Discrete Noisy Optimisation)
• But these have very different interpretations
• Could be a sequence of actions to take– Rolling Horizon Evolution
• Or parameters of a game design– Locations of pills in Pac-Man
– Missile velocity in Asteroids
– Jump height in Mario
– Etc.
Game Design: Also Noisy!
• We can design a game
• But each time an AI agent or human player plays:
– The experience will be different
– Due to the game, or the player's actions
• In a population of players, each one may play differently
– And therefore have a different experience
• We can measure aspects of this experience
• And view game design as noisy optimisation
Value of Fitness Landscape Modelling
• Can lead to more efficient search
– Fitter solutions are found more quickly
• We learn more about the problem
– The aim now is not just to find the fittest possible solutions
– But also to estimate the value of untested points in the search space
System Diagram
• Note the fat connection between the EA and the landscape model
[Diagram: the Bandit EA connected to a Noisy Fitness Evaluator and, via the fat connection, to a Bandit Fitness Landscape Model]
The Multi-Armed Bandit Problem
• At each step, pull one arm
• Noisy / random reward signal
• In order to:
– Find the best arm
– Minimise regret
– Maximise expected return
Which Arm to Pull? UCB1 Balances Exploration v. Exploitation
UCB1 (Auer et al., 2002): choose the arm j that maximises
x̄_j + √(2 ln n / n_j)
where x̄_j is the mean reward of arm j so far (exploit), the square-root term is an upper bound on the variance (explore), n_j is the number of times arm j has been pulled, and n is the total number of pulls.
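A minimal sketch of UCB1 arm selection; variable names are mine, not the framework's.

class UCB1Sketch {
    // Choose the arm maximising mean + k * sqrt(ln(n) / n_j).
    // means[j]: mean reward of arm j; counts[j]: pulls of arm j;
    // n: total pulls so far; k: exploration constant (sqrt(2) in plain UCB1)
    static int chooseArm(double[] means, int[] counts, int n, double k) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < means.length; j++) {
            if (counts[j] == 0) return j; // try every arm at least once
            double value = means[j] + k * Math.sqrt(Math.log(n) / counts[j]);
            if (value > bestValue) { bestValue = value; best = j; }
        }
        return best;
    }
}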
Example: Simple Space Battle

public enum BattleParamNames {
    DAMAGE_RADIUS, DAMAGE_COST, LOSS, SHIP_SIZE, ROTATION, THRUST
}
…
EvoVectorSet params = new EvoVectorSet();
params.params.add(new EvoDoubleSet(DAMAGE_RADIUS.toString(),
    new double[]{5, 20, 50, 100, 200}));
params.params.add(new EvoDoubleSet(DAMAGE_COST.toString(),
    new double[]{1, 5, 20, 50}));
…
Example Game: Space Battle (Unevolved)
Fitness Landscape Model Interface
public interface FitnessLandscapeModel {
    void addPoint(int[] p, double value);

    // returns a Double object; a null return indicates
    // that we know nothing about this point yet
    Double getSimple(int[] x);

    // careful – this can be slow:
    // it iterates over all points in the search space!
    int[] getBestSolution();

    int[] getBestOfSampled();
    int[] getBestOfSampledPlusNeighbours(int nNeighbours);
}
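A hypothetical usage example of this interface; NTupleLandscapeModel is an assumed implementation name, not necessarily the framework's.

// feed every (point, noisy fitness) pair seen during search into the model
FitnessLandscapeModel model = new NTupleLandscapeModel(); // assumed impl
model.addPoint(new int[]{1, 4, 4, 3}, 0.7);
model.addPoint(new int[]{1, 4, 4, 2}, 0.3);
Double estimate = model.getSimple(new int[]{1, 4, 3, 3}); // null if unknown
int[] best = model.getBestOfSampled(); // best among evaluated points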
The Bandit Landscape EA

int[] p = SearchSpaceUtil.randomPoint(searchSpace);
while (evaluator.nEvals() < nEvals) {
    // spend one real (noisy) fitness evaluation on the current point
    double fitness = evaluator.evaluate(p);
    banditLandscape.addPoint(p, fitness);
    // score a batch of mutated neighbours using the landscape model,
    // with kExplore balancing exploration against exploitation
    EvaluateChoices evc = new EvaluateChoices(banditLandscape, kExplore);
    while (evc.n() < nNeighbours) {
        int[] pp = mutator.randMut(p);
        evc.add(pp);
    }
    p = evc.picker.getBest(); // move to the most promising neighbour
}
int[] solution = banditLandscape.getBest();
return solution;
Currently we Implement the Bandit Landscape Model as an N-Tuple System
• N-Tuples are the best function approximator that no-one has ever heard of!
– (a bit like random forests)
• Constant-time access (independent of the number of samples learned)
• Take projections of a high-dimensional space
• Store the values for each projection in a look-up table
– Each n-tuple sample provides a table index
– Novel contribution:
• Each table entry stores a Statistical Summary Object
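A minimal sketch of one n-tuple and its look-up table; the names are illustrative, not the framework's actual classes. Each entry is a StatSummary, as sketched on the next slide.

import java.util.HashMap;
import java.util.Map;

class NTuple {
    final int[] indices;  // the dimensions this tuple projects onto
    final Map<String, StatSummary> table = new HashMap<>();

    NTuple(int... indices) { this.indices = indices; }

    // the sampled values of a point x form its table index
    String key(int[] x) {
        StringBuilder sb = new StringBuilder();
        for (int i : indices) sb.append(x[i]).append(',');
        return sb.toString();
    }

    // constant-time update, independent of the number of samples so far
    void add(int[] x, double value) {
        table.computeIfAbsent(key(x), k -> new StatSummary()).add(value);
    }

    StatSummary get(int[] x) { return table.get(key(x)); } // may be null
}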
Statistical Summary Object: class StatSummary
• Each table entry is of type StatSummary
• Provides efficient storage of, and access to:
– Mean, Standard Deviation, Standard Error, Number of Samples, …
• For the Bandit EA we just need the mean and the number of samples for each point in the search space, but it is also interesting to query the other stats
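A minimal sketch of such a class, keeping running sums so every statistic is O(1) to update and query; the real StatSummary offers more than this.

class StatSummary {
    private int n = 0;
    private double sum = 0, sumSq = 0;

    void add(double x) { n++; sum += x; sumSq += x * x; }

    int n() { return n; }
    double mean() { return sum / n; }
    double sd() { return Math.sqrt((sumSq - sum * sum / n) / (n - 1)); }
    double stdErr() { return sd() / Math.sqrt(n); }
}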
2-Dimensional Example
Summary
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm with attractive properties:
– Balances exploration versus exploitation
– Makes use of all available information during search
– No need to choose a resampling rate
– Result of search is a landscape model in addition to estimate of best solution
Noisy Sample Problem: Noisy Win-Rate Optimisation
• Optimise a bit string where each fitness evaluation flips a biased coin
– win if Math.random() < x / (2^n − 1)
– i.e. the win probability is:
• binary number value of the bit string / max possible value
• This very roughly models game parameter optimisation, where some parameters are much more sensitive than others
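A minimal sketch of this fitness function; the class and method names are mine.

import java.util.Random;

class NoisyWinRate {
    static final Random rnd = new Random();

    // returns 1 for a win, 0 for a loss; the win probability is the
    // binary value of the bit string divided by the maximum, 2^n - 1
    static double evaluate(int[] bits) {
        double x = 0;
        for (int b : bits) x = 2 * x + b;
        double pWin = x / (Math.pow(2, bits.length) - 1);
        return rnd.nextDouble() < pWin ? 1 : 0;
    }
}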
Budget: 100 Fitness Evals
Simple Space Battle: Each Ship has a Damage Disc
N-Tuple Bandit Landscape Evolved Space Battle Video
Summary
• Games provide a great application area for AI
• And for noisy optimisation
– Both for generating smart players
– And for designing or tuning new games
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm:
– Balances exploration versus exploitation
• We use the same EAs for automated game design and automated game playing
• More detail in our IEEE CEC 2017 Paper
Thank you!
• Questions?
Some references…
• Kamolwan Kunanusont, Raluca Gaina, Jialin Liu, Diego Perez-Liebana and Simon Lucas, The N-Tuple Bandit Evolutionary Algorithm for Game Improvement, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
• Jialin Liu, Julian Togelius, Diego Perez-Liebana and Simon M. Lucas, Evolving Game Skill-Depth using General Video Game AI Agents, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
• Jialin Liu, Diego Perez-Liebana and Simon M. Lucas, Bandit-Based Random Mutation Hill-Climbing, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
http://iggi.org.uk
Fully funded PhD Studentships