queen mary university of london general video game ai and ...epia2017/wp-content/uploads/...talk...

General Video Game AI and Bandit Landscape EAs Simon Lucas Queen Mary University of London (with Jialin Liu, Diego Perez, Raluca Gaina, Kamolwan (Mike) Kunanusont) Game AI Research Group


Post on 11-Jun-2020


General Video Game AI and Bandit Landscape EAs

Simon Lucas

Queen Mary University of London

(with Jialin Liu, Diego Perez, Raluca Gaina, Kamolwan (Mike) Kunanusont)

Game AI Research Group


History of Artificial Intelligence

Boring AI – not very adaptive; siloed

Exciting Learning AI – adaptive and general


Artificial General Intelligence: Three Pillars

• Evolutionary Algorithms
• Deep Learning
• Simulation-Based Learning / Planning

• Most interesting to use hybrids of all three

• At different temporal scales – including evolution for real-time action selection


Progressing towards AGI: Why Games?

• Games provide the perfect platform:
– Experimental development and testing of theories and systems
– Wide range: from simple to complex
– Fun for humans to engage
– Generally harmless

• Easy to simulate fast and in parallel (unlike real robots)

• Creative aspects as well as performance
– Automated game design and game tuning


Talk Outline

• Motivation in Game AI – and General Video Game AI

• A practical new algorithm for noisy optimisation

• Main features:
– Adapts the simplest evolutionary algorithm (Random Mutation Hill-Climber) to use a model to guide search
– UCB (Bandit) equation to balance exploration versus exploitation

• Initial results: very promising


Games: Great Source of Noisy Optimisation Problems

• Here we treat noise as uncertainty about how to play well

• About:
– Hidden cards (e.g. Poker)
– Dice outcome (Backgammon)
– Hidden Random Seed (Ms Pac-Man)

• Different, but also related:
– Unknown intended opponent actions (Chess)


Video Games

• For more than a decade our community has used video games as a great source of AI challenge
– Check out IEEE CIG 2017 in New York


Visual Doom AI Competition (ViZDoom)


StarCraft AI (classic e-sport)

http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/custom-ai-programs-take-on-top-ranked-humans-in-starcraft

• Challenging RTS
• Strategy + lightning reactions

• AI currently at amateur level

• I predict AI will be super-human in 2019


Planet Wars – a Simple RTS (Real Time Strategy Game)


Getting Creative: Level Generation (See demo)


General Video Game AI

• Challenge for AI:
– Play any video game
– Don’t know the rules
– But you know the score
– And when you die

• A bit like walking into an arcade in the 80s and playing a new game


General Video Game AI

http://gvgai.net


VGDL and the GVGAI Framework

(Sokoban)

https://github.com/EssexUniversityMCTS/gvgai/wiki/VGDL-Language


GVGAI Videos

• http://gvgai.net/test/vid/Aliens.mov

• http://gvgai.net/test/vid/Butterflies.mov

• http://gvgai.net/test/vid/Seaquest.mov

• http://gvgai.net/test/vid/Sheriff.mov


Statistical Simulation based CI / AI

• Relies on a fast forward model:
– F(s, a) -> s’ (given state s and action a, return the next state s’)
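A minimal sketch of what a forward model F(s, a) -> s’ might look like in Java, assuming a toy one-dimensional world. The interface and class names below are hypothetical and are not the GVGAI API; the point is only that a search method can call the model repeatedly to simulate roll-outs without touching the real game.

```java
import java.util.Random;

/** Hypothetical forward-model interface: F(s, a) -> s'. Not the GVGAI API. */
interface ForwardModel<S> {
    S next(S state, int action);
}

/** Toy state: an agent at position x on a line, with a running score. */
final class LineState {
    final int x;
    final double score;

    LineState(int x, double score) {
        this.x = x;
        this.score = score;
    }
}

/** Toy model: action 1 moves right, any other action moves left; arriving at x == 5 scores a point. */
final class LineModel implements ForwardModel<LineState> {
    @Override
    public LineState next(LineState s, int action) {
        int nx = s.x + (action == 1 ? 1 : -1);
        double gain = (nx == 5) ? 1.0 : 0.0;
        return new LineState(nx, s.score + gain);   // the input state is left unchanged
    }
}

final class RolloutDemo {
    /** Apply a random action sequence of the given length and return the final score. */
    static double randomRollout(ForwardModel<LineState> f, LineState start, int depth, Random rnd) {
        LineState s = start;
        for (int i = 0; i < depth; i++) {
            s = f.next(s, rnd.nextInt(2));
        }
        return s.score;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        double total = 0;
        for (int i = 0; i < 1000; i++) {
            total += randomRollout(new LineModel(), new LineState(0, 0.0), 10, rnd);
        }
        System.out.println("mean random roll-out score: " + total / 1000);
    }
}
```

Returning a fresh state rather than mutating the argument is a design choice made here so that many roll-outs can start from the same state.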


Real-Time Decision Making: Challenges for Statistical Search Methods

• Rapid reaction needed: e.g. 40ms or less between requests for action

• Branching factor may be high

• Limited horizon
– Can’t look very far ahead

• Limited roll-out budget
– Don’t have time to perform many roll-outs

• Random actions may be terrible!
– And lead to a very flat reward landscape


Monte Carlo Tree Search: the main (CRAZY ?) idea

• Tree policy: choose which node to expand (not necessarily leaf of tree)

• Default (simulation) policy: random playout until end of game


MCTS Builds Asymmetric Trees

• Aims to balance exploration and exploitation

• In video games the limited roll-out budget is a challenge, but not the only one!


Rolling Horizon Evolution

• Evolve action sequences in real time

• Each time pick first action

• Then run evolution again


Rolling Horizon Evolution Example: Where might noise come from?

Int vec: 1444333333322

Translated into game actions: Up, Left, Left, Left, Down, …

Then evaluated by game engine

First action is used after each optimisation run

Re-run EA every 40ms
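The rolling horizon loop can be sketched in Java as follows, under several assumptions: a toy deterministic grid world replaces the game engine, the action encoding (0 = Up, 1 = Down, 2 = Left, 3 = Right) is made up for illustration, and a plain random mutation hill-climber is used as the evolutionary algorithm. In a real game the evaluation would go through a stochastic forward model, which is where noise would enter.

```java
import java.util.Random;

/** Rolling horizon evolution on a toy grid world; every name here is illustrative. */
final class RollingHorizonSketch {
    // Assumed action encoding: 0 = Up, 1 = Down, 2 = Left, 3 = Right.
    static final int[] DX = {0, 0, -1, 1};
    static final int[] DY = {-1, 1, 0, 0};
    static final int GOAL_X = 6, GOAL_Y = 3;

    /** Simulate the sequence from (x, y); fitness is minus the distance to the goal, summed over steps. */
    static double evaluate(int x, int y, int[] seq) {
        double fitness = 0;
        for (int a : seq) {
            x += DX[a];
            y += DY[a];
            fitness -= Math.abs(GOAL_X - x) + Math.abs(GOAL_Y - y);
        }
        return fitness;
    }

    /** Evolve an action sequence with a random mutation hill-climber and return its first action. */
    static int chooseAction(int x, int y, int horizon, int budget, Random rnd) {
        int[] best = new int[horizon];
        for (int i = 0; i < horizon; i++) best[i] = rnd.nextInt(4);
        double bestFit = evaluate(x, y, best);
        for (int e = 0; e < budget; e++) {
            int[] child = best.clone();
            child[rnd.nextInt(horizon)] = rnd.nextInt(4);   // mutate one gene
            double fit = evaluate(x, y, child);
            if (fit >= bestFit) {                           // accept ties to drift across plateaus
                best = child;
                bestFit = fit;
            }
        }
        return best[0];
    }

    /** Play for a fixed number of ticks, re-running evolution before every move. */
    static int[] play(int ticks, Random rnd) {
        int x = 0, y = 0;
        for (int t = 0; t < ticks; t++) {
            int a = chooseAction(x, y, 5, 200, rnd);
            x += DX[a];
            y += DY[a];
        }
        return new int[]{x, y};
    }

    public static void main(String[] args) {
        int[] pos = play(12, new Random(42));
        System.out.println("final position: (" + pos[0] + ", " + pos[1] + ")");
    }
}
```

Summing the distance over every step, rather than scoring only the final position, rewards sequences whose early actions already make progress. That matters because only the first action is ever executed.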


Motivation for Bandit Landscape EA

• Game AI
– Evolving / tuning game parameters / rules / content (e.g. level design)
– Real-time control via rolling horizon evolution

• Applies when the fitness evaluations are:
– Noisy
– Limited in number, either because they are:
  • Computationally expensive, or
  • Needed very rapidly (real-time)

• Not limited to games of course!


Evolutionary Algorithms: Simple and Beautiful

• Initialise a RANDOM population of individuals

• Then REPEAT
– Evaluate them all – and rank them by FITNESS
– BREED Offspring from the FITTEST Parents

• UNTIL Satisfied, or Out of Time

• Attractive simplicity, takes a bit of skill to make them really work …

• One of my VERY FAVOURITE APPROACHES to AI

• BUT: Can do even better with a fitness landscape model

• (the part in RED is affected by noise)
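The initialise / evaluate / rank / breed loop can be sketched as below. This is an illustrative version only: the fitness function is a noise-free stand-in (count the ones in a bit vector), breeding is single-gene mutation of a parent from the top half, and all names are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Minimal population-based EA over bit vectors; a sketch, not the talk's implementation. */
final class SimpleEaSketch {
    /** Stand-in fitness: number of ones in the vector. A real game evaluation would be noisy. */
    static double fitness(int[] x) {
        int ones = 0;
        for (int v : x) ones += v;
        return ones;
    }

    /** Copy the parent and flip one randomly chosen bit. */
    static int[] mutate(int[] parent, Random rnd) {
        int[] child = parent.clone();
        int i = rnd.nextInt(child.length);
        child[i] = 1 - child[i];
        return child;
    }

    /** Sort the population so that the fittest individual comes first. */
    static void rank(int[][] pop) {
        Arrays.sort(pop, Comparator.comparingDouble((int[] ind) -> -fitness(ind)));
    }

    /** Run the EA; popSize must be at least 2. Returns the best individual found. */
    static int[] evolve(int popSize, int length, int generations, Random rnd) {
        // Initialise a random population.
        int[][] pop = new int[popSize][length];
        for (int[] ind : pop) {
            for (int i = 0; i < length; i++) ind[i] = rnd.nextInt(2);
        }
        int parents = popSize / 2;
        for (int g = 0; g < generations; g++) {
            rank(pop);                                       // evaluate and rank
            for (int i = parents; i < popSize; i++) {        // replace the weaker half
                pop[i] = mutate(pop[rnd.nextInt(parents)], rnd);
            }
        }
        rank(pop);
        return pop[0];
    }

    public static void main(String[] args) {
        int[] best = evolve(20, 12, 100, new Random(0));
        System.out.println(Arrays.toString(best) + " fitness=" + fitness(best));
    }
}
```

Because fitness is re-evaluated at every ranking step, swapping in a noisy evaluator would make the ranking itself unreliable. That is the weakness a fitness landscape model is meant to address.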


Simple Evolutionary Algorithm Demo


We use Evolution to Evolve Vectors of Integers (Discrete Noisy Optimisation)

• But these have very different interpretations

• Could be a sequence of actions to take
– Rolling Horizon Evolution

• Or parameters of a game design
– Locations of pills in Pac-Man
– Missile velocity in Asteroids
– Jump height in Mario
– Etc.


Game Design: Also Noisy!

• Can design a game

• But each time an AI agent or human player plays:
– The experience will be different
– Due to the game, or the player actions

• In a population of players each one may play differently
– And therefore have a different experience

• We can measure aspects of this experience

• And view game design as noisy optimisation


Value of Fitness Landscape Modelling

• Can lead to more efficient search
– Fitter solutions are found more quickly

• We learn more about the problem
– Aim now is not just to find fittest possible solutions
– But also estimate value of untested points in the search space


System Diagram

• Note the fat connection between the EA and the landscape model

[Diagram: Bandit EA, Noisy Fitness Evaluator, Bandit Fitness Landscape Model]


The Multi-Armed Bandit Problem

At each step pull one arm

Noisy/random reward signal

In order to:
* Find the best arm
* Minimise regret
* Maximise expected return


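Which Arm to Pull? UCB1 Balances Exploration v. Exploitation

UCB1 (Auer et al, 2002): choose arm j so as to maximise

$$\bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$$

• $\bar{x}_j$: mean so far of arm j (exploit)

• $\sqrt{2 \ln n / n_j}$: upper bound on variance (explore), where $n_j$ is the number of pulls of arm j and $n$ is the total number of pulls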
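A small Java sketch of UCB1 arm selection, using the standard form from Auer et al. (mean reward plus sqrt(2 ln n / n_j)). The class and method names are made up for illustration, and treating untried arms as having infinite value is a common convention assumed here rather than something stated on the slide.

```java
/** UCB1 arm selection; names are illustrative. */
final class Ucb1Sketch {
    /** UCB1 value of one arm: mean + sqrt(2 ln(totalPulls) / pulls). */
    static double value(double mean, int pulls, int totalPulls) {
        if (pulls == 0) {
            return Double.POSITIVE_INFINITY;   // assumption: try every arm once first
        }
        return mean + Math.sqrt(2.0 * Math.log(totalPulls) / pulls);
    }

    /** Index of the arm with the highest UCB1 value, given reward sums and pull counts. */
    static int select(double[] rewardSums, int[] pulls) {
        int total = 0;
        for (int n : pulls) total += n;
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < pulls.length; j++) {
            double mean = (pulls[j] == 0) ? 0.0 : rewardSums[j] / pulls[j];
            double v = value(mean, pulls[j], total);
            if (v > bestValue) {
                bestValue = v;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Arm 0 looks good after 100 pulls, arm 1 has been pulled once and paid nothing.
        System.out.println(select(new double[]{60, 0}, new int[]{100, 1}));
    }
}
```

With equal pull counts the exploration terms cancel and the arm with the higher mean wins. A rarely pulled arm can still be chosen over a well-sampled one because its bonus term is much larger.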


Example: Simple Space Battle

public enum BattleParamNames {
    DAMAGE_RADIUS, DAMAGE_COST, LOSS, SHIP_SIZE, ROTATION, THRUST
}
…

EvoVectorSet params = new EvoVectorSet();

params.params.add(new EvoDoubleSet(DAMAGE_RADIUS.toString(),
        new double[]{5, 20, 50, 100, 200}));

params.params.add(new EvoDoubleSet(DAMAGE_COST.toString(),
        new double[]{1, 5, 20, 50}));
…


Example Game: Space Battle (Unevolved)


Fitness Landscape Model Interface

public interface FitnessLandscapeModel {

    void addPoint(int[] p, double value);

    // return a Double object - a null return indicates that
    // we know nothing yet
    Double getSimple(int[] x);

    // careful - this can be slow –
    // it iterates over all points in the search space!
    int[] getBestSolution();

    int[] getBestOfSampled();

    int[] getBestOfSampledPlusNeighbours(int nNeighbours);
}


The Bandit Landscape EA

int[] p = SearchSpaceUtil.randomPoint(searchSpace);

while (evaluator.nEvals() < nEvals) {

    double fitness = evaluator.evaluate(p);
    banditLandscape.addPoint(p, fitness);

    EvaluateChoices evc = new EvaluateChoices(banditLandscape, kExplore);

    while (evc.n() < nNeighbours) {
        int[] pp = mutator.randMut(p);
        evc.add(pp);
    }

    p = evc.picker.getBest();
}

int[] solution = banditLandscape.getBest();
return solution;
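To make the loop above concrete, here is a self-contained sketch that can be run on its own. It is a simplification under stated assumptions: the landscape model keeps statistics per dimension only (1-tuples), the exploration bonus is one plausible UCB-style term rather than the talk's exact formula, the test problem is a made-up noisy "count the non-zero genes" function, and none of the class or method names come from the original code.

```java
import java.util.Random;

/** Bandit-guided hill-climber with a 1-tuple landscape model; an illustrative sketch only. */
final class BanditEaSketch {
    final int[] sizes;       // number of values per dimension
    final double[][] sum;    // sum[d][v]: total fitness seen with value v in dimension d
    final int[][] count;     // count[d][v]: number of samples with value v in dimension d
    int total;               // total number of samples added

    BanditEaSketch(int[] sizes) {
        this.sizes = sizes;
        sum = new double[sizes.length][];
        count = new int[sizes.length][];
        for (int d = 0; d < sizes.length; d++) {
            sum[d] = new double[sizes[d]];
            count[d] = new int[sizes[d]];
        }
    }

    void addPoint(int[] p, double fitness) {
        for (int d = 0; d < p.length; d++) {
            sum[d][p[d]] += fitness;
            count[d][p[d]]++;
        }
        total++;
    }

    /** Mean estimate for p: average of the per-dimension means (0 where nothing is known). */
    double mean(int[] p) {
        double m = 0;
        for (int d = 0; d < p.length; d++) {
            int n = count[d][p[d]];
            m += (n == 0) ? 0 : sum[d][p[d]] / n;
        }
        return m / p.length;
    }

    /** Mean plus an exploration bonus that shrinks as the values in p are sampled more often. */
    double ucb(int[] p, double kExplore) {
        double bonus = 0;
        for (int d = 0; d < p.length; d++) {
            bonus += Math.sqrt(Math.log(1 + total) / (count[d][p[d]] + 0.5));
        }
        return mean(p) + kExplore * bonus / p.length;
    }

    /** Assumed test problem: fraction of non-zero genes plus Gaussian noise. */
    static double noisyFitness(int[] p, Random rnd) {
        int nonZero = 0;
        for (int v : p) if (v != 0) nonZero++;
        return (double) nonZero / p.length + 0.5 * rnd.nextGaussian();
    }

    /** Run the search; nEvals and nNeighbours must both be at least 1. */
    static int[] run(int[] sizes, int nEvals, int nNeighbours, double kExplore, Random rnd) {
        BanditEaSketch model = new BanditEaSketch(sizes);
        int[] p = new int[sizes.length];
        for (int d = 0; d < p.length; d++) p[d] = rnd.nextInt(sizes[d]);
        for (int e = 0; e < nEvals; e++) {
            model.addPoint(p, noisyFitness(p, rnd));        // one noisy evaluation per iteration
            int[] best = null;
            double bestUcb = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < nNeighbours; i++) {
                int[] pp = p.clone();
                int d = rnd.nextInt(sizes.length);
                pp[d] = rnd.nextInt(sizes[d]);              // mutate one gene (may leave it unchanged)
                double u = model.ucb(pp, kExplore);         // neighbours are scored by the model only
                if (u > bestUcb) { bestUcb = u; best = pp; }
            }
            p = best;
        }
        // With a 1-tuple model the best point is the per-dimension argmax of the sampled means.
        int[] solution = new int[sizes.length];
        for (int d = 0; d < sizes.length; d++) {
            for (int v = 1; v < sizes[d]; v++) {
                if (sampledMean(model, d, v) > sampledMean(model, d, solution[d])) solution[d] = v;
            }
        }
        return solution;
    }

    static double sampledMean(BanditEaSketch m, int d, int v) {
        return m.count[d][v] == 0 ? Double.NEGATIVE_INFINITY : m.sum[d][v] / m.count[d][v];
    }
}
```

A call such as `BanditEaSketch.run(new int[]{2, 2, 2, 2, 2}, 200, 10, 1.0, new Random(7))` runs 200 noisy evaluations on a five-bit problem. Each iteration spends exactly one real evaluation; all neighbour comparisons are made against the model.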


Currently we Implement the Bandit Landscape Model as an N-Tuple System

• N-Tuples are the best function approximator that no-one has ever heard of!
– (bit like random forests)

• Constant time access (independent of number of samples learned)

• Take projections of a high-dimensional space

• Store values for each projection in a look-up table
– Each n-tuple sample provides a table index
– Novel contribution:
  • Each table entry stores a Statistical Summary Object
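One way to picture a single n-tuple look-up table is sketched below. The mixed-radix addressing scheme is an assumption chosen for the example, the class name is hypothetical, and each entry keeps only a sum and a count where the talk's version stores a full statistical summary object.

```java
/** One n-tuple look-up table over a discrete search space; names are illustrative. */
final class NTupleTableSketch {
    final int[] dims;      // dimensions of the search space this tuple projects onto
    final int[] sizes;     // number of values per dimension of the full search space
    final double[] sum;    // per-entry sum of fitness values
    final int[] n;         // per-entry sample count

    NTupleTableSketch(int[] dims, int[] sizes) {
        this.dims = dims;
        this.sizes = sizes;
        int entries = 1;
        for (int d : dims) entries *= sizes[d];
        sum = new double[entries];
        n = new int[entries];
    }

    /** Mixed-radix address of the projection of x onto this tuple's dimensions. */
    int index(int[] x) {
        int address = 0;
        int radix = 1;
        for (int d : dims) {
            address += x[d] * radix;
            radix *= sizes[d];
        }
        return address;
    }

    /** Record one fitness sample for the entry that x maps to. */
    void add(int[] x, double fitness) {
        int i = index(x);
        sum[i] += fitness;
        n[i]++;
    }

    /** Mean of the entry that x maps to, or null if that entry has never been sampled. */
    Double mean(int[] x) {
        int i = index(x);
        if (n[i] == 0) {
            return null;
        }
        return sum[i] / n[i];
    }
}
```

For a search space with sizes {5, 4, 3} and a tuple over dimensions {0, 2}, the table has 15 entries and every point sharing the same values in those two dimensions maps to the same entry. Lookup cost depends on the tuple length, not on how many samples have been added.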


Statistical Summary Object: class StatSummary

Each table entry is of type StatSummary

Provides efficient storage and access to:

Mean, Standard Deviation, Standard Error, Number of Samples, …

For the Bandit EA, we just need to know the mean and number of samples of each point in the search space, but it is also interesting to query other stats
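A running-statistics accumulator of this kind can be sketched with Welford's online method, which updates the mean and variance in constant time per sample without storing the samples. The class below is an illustration under that assumption and is deliberately given a different name; it is not the StatSummary class from the talk.

```java
/** Running mean / standard deviation / standard error accumulator (Welford's method). */
final class RunningStats {
    private int n;          // number of samples seen so far
    private double mean;    // running mean
    private double m2;      // sum of squared deviations from the running mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    int n() {
        return n;
    }

    double mean() {
        return mean;
    }

    /** Sample standard deviation; NaN until there are at least two samples. */
    double sd() {
        return n < 2 ? Double.NaN : Math.sqrt(m2 / (n - 1));
    }

    /** Standard error of the mean: sd / sqrt(n). */
    double stdErr() {
        return sd() / Math.sqrt(n);
    }
}
```

Adding the samples 1, 2, 3, 4 gives a mean of 2.5 and a sample standard deviation of sqrt(5/3), so the standard error is half of that.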


2-Dimensional Example


Summary

• Bandit Landscape Evolutionary Algorithm

• Simple algorithm with attractive properties:

– Balances exploration versus exploitation

– Makes use of all available information during search

– No need to choose a resampling rate

– Result of search is a landscape model in addition to estimate of best solution


Noisy Sample Problem: Noisy Win Rate Optimisation

• Optimise a bit string of n bits

• Each fitness evaluation flips a biased coin
– win = Math.random() < (x / (2^n - 1)), where x is the bit string read as a binary number
– i.e. win prob is given by:
  • binary number value of bit string / max possible

• This very roughly models game parameter optimisation, where some parameters are much more sensitive than others
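The noisy win-rate problem is small enough to write out in full. The sketch below assumes the first element of the array is the most significant bit and that the string has between 1 and 62 bits; the class and method names are invented for the example.

```java
import java.util.Random;

/** Noisy win-rate test problem: P(win) = value of the bit string / (2^n - 1). */
final class NoisyWinRate {
    /** Win probability of a bit string, reading bits[0] as the most significant bit. */
    static double winProbability(int[] bits) {
        long x = 0;
        for (int b : bits) {
            x = (x << 1) | b;
        }
        long max = (1L << bits.length) - 1;   // assumes 1 <= bits.length <= 62
        return (double) x / max;
    }

    /** One noisy fitness evaluation: 1.0 for a win, 0.0 for a loss. */
    static double evaluate(int[] bits, Random rnd) {
        return rnd.nextDouble() < winProbability(bits) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Flipping the first bit changes the win rate far more than flipping the last one.
        System.out.println(winProbability(new int[]{1, 0, 0}));
        System.out.println(winProbability(new int[]{0, 0, 1}));
    }
}
```

With three bits, setting only the first bit gives a win probability of 4/7 while setting only the last gives 1/7, which is the unequal sensitivity the slide refers to.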


Budget: 100 Fitness Evals


Simple Space Battle: Each ship has a Damage Disc


N-Tuple Bandit Landscape Evolved Space Battle Video


Summary

• Games provide a great application area for AI

• And for noisy optimisation
– Both for generating smart players
– And designing or tuning new games

• Bandit Landscape Evolutionary Algorithm

• Simple algorithm:
– Balances exploration versus exploitation

• We use the same EAs for automated game design and automated game playing

• More detail in our IEEE CEC 2017 Paper


Thank you!

• Questions?


Some references…

• Kamolwan Kunanusont, Raluca Gaina, Jialin Liu, Diego Perez-Liebana and Simon Lucas, The N-Tuple Bandit Evolutionary Algorithm for Game Improvement, in Proceedings of the Congress on Evolutionary Computation (2017).

• Jialin Liu, Julian Togelius, Diego Perez-Liebana and Simon M. Lucas, Evolving Game Skill-Depth using General Video Game AI Agents, in Proceedings of the Congress on Evolutionary Computation (2017).

• Jialin Liu, Diego Perez-Liebana and Simon M. Lucas, Bandit-Based Random Mutation Hill-Climbing, in Proceedings of the Congress on Evolutionary Computation (2017).


http://iggi.org.uk

Fully funded PhD Studentships