General Video Game AI and Bandit Landscape EAs
Simon Lucas, Queen Mary University of London
(with Jialin Liu, Diego Perez,
Raluca Gaina, Kamolwan (Mike) Kunanusont)
Game AI Research Group
History of Artificial Intelligence
Boring AI – not very adaptive; siloed
Exciting Learning AI – adaptive and general
Artificial General Intelligence: Three Pillars
• Evolutionary Algorithms
• Deep Learning
• Simulation-Based Learning / Planning
• Most interesting to use hybrids of all three
• At different temporal scales – including evolution for real-time action selection
Progressing towards AGI: Why Games?
• Games provide the perfect platform:
– Experimental development and testing of theories and systems
– Wide range: from simple to complex
– Fun for humans to engage with
– Generally harmless
• Easy to simulate fast and in parallel (unlike real robots)
• Creative aspects as well as performance
– Automated game design and game tuning
Talk Outline
• Motivation in Game AI – and General Video Game AI
• A practical new algorithm for noisy optimisation
• Main features:
– Adapts the simplest evolutionary algorithm (Random Mutation Hill-Climber) to use a model to guide search
– UCB (Bandit) equation to balance exploration versus exploitation
• Initial results: very promising
Games: Great Source of Noisy Optimisation Problems
• Here we treat noise as uncertainty about how to play well
• Sources of uncertainty:
– Hidden cards (e.g. Poker)
– Dice outcomes (Backgammon)
– Hidden random seed (Ms Pac-Man)
• Different, but also related:
– Unknown intended opponent actions (Chess)
Video Games
• For more than a decade our community has used video games as a great source of AI challenges
– Check out IEEE CIG 2017 in New York
Visual Doom AI Competition (ViZDoom)
StarCraft AI (classic e-sport)
http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/custom-ai-programs-take-on-top-ranked-humans-in-starcraft
• Challenging RTS
• Strategy + lightning reactions
• AI currently at amateur level
• I predict AI will be super-human in 2019
Planet Wars – a Simple RTS (Real Time Strategy Game)
Getting Creative: Level Generation (see demo)
General Video Game AI
• Challenge for AI:
– Play any video game
– You don't know the rules
– But you know the score
– And when you die
• A bit like walking into an arcade in the 80s and playing a new game
General Video Game AI
http://gvgai.net
VGDL and the GVGAI Framework
(Sokoban)
https://github.com/EssexUniversityMCTS/gvgai/wiki/VGDL-Language
GVGAI Videos
• http://gvgai.net/test/vid/Aliens.mov
• http://gvgai.net/test/vid/Butterflies.mov
• http://gvgai.net/test/vid/Seaquest.mov
• http://gvgai.net/test/vid/Sheriff.mov
Statistical Simulation-Based CI / AI
• Relies on a fast forward model:
– F(s, a) → s′
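As a concrete reading of F(s, a) → s′, here is a minimal sketch of a forward-model interface. The names are illustrative assumptions, not the framework's actual classes; in the GVGAI framework this role is played by copying and advancing a state observation.

// Minimal sketch of a fast forward model F(s, a) -> s'.
// Hypothetical interface; GVGAI exposes the same capability by
// copying and advancing a StateObservation.
interface ForwardModel<S, A> {
    S next(S state, A action);    // F(s, a) -> s': returns the next state
    double score(S state);        // current game score of a state
    boolean isGameOver(S state);  // has the game ended?
}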
Real-Time Decision Making: Challenges for Statistical Search Methods
• Rapid reaction needed: e.g. 40 ms or less between requests for an action
• Branching factor may be high
• Limited horizon
– Can't look very far ahead
• Limited roll-out budget
– Don't have time to perform many roll-outs
• Random actions may be terrible!
– And lead to a very flat reward landscape
Monte Carlo Tree Search: the main (CRAZY?) idea
• Tree policy: choose which node to expand (not necessarily leaf of tree)
• Default (simulation) policy: random playout until end of game
MCTS Builds Asymmetric Trees
• Aims to balance exploration and exploitation
• In video games the limited roll-out budget is a challenge, but not the only one!
Rolling Horizon Evolution
• Evolve action sequences in real time
• Each time pick first action
• Then run evolution again
Rolling Horizon Evolution Example: Where Might Noise Come From?
• Int vector: 1 4 4 4 3 3 3 3 3 3 3 2 2
• Translated into game actions: Up, Left, Left, Left, Down, …
• Then evaluated by the game engine
• First action is used after each optimisation run
• Re-run the EA every 40 ms
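A minimal sketch of this loop as a (1+1) EA over fixed-length action sequences, reusing the hypothetical ForwardModel interface sketched earlier. HORIZON, N_ACTIONS and the class itself are illustrative assumptions, not the framework's actual code.

import java.util.Random;

class RollingHorizonSketch<S> {
    static final int HORIZON = 13;   // length of the action sequence
    static final int N_ACTIONS = 5;  // actions encoded as ints 0..4
    final Random rnd = new Random();
    final ForwardModel<S, Integer> model;

    RollingHorizonSketch(ForwardModel<S, Integer> model) { this.model = model; }

    // Called once per frame (e.g. every 40 ms): evolve, play first action
    int act(S state, int evalBudget) {
        int[] best = randomSeq();
        double bestFit = rollout(state, best);
        for (int e = 1; e < evalBudget; e++) {
            int[] mut = best.clone();
            mut[rnd.nextInt(HORIZON)] = rnd.nextInt(N_ACTIONS); // point mutation
            double fit = rollout(state, mut);
            if (fit >= bestFit) { best = mut; bestFit = fit; }
        }
        return best[0]; // first action only; re-run evolution next frame
    }

    int[] randomSeq() {
        int[] seq = new int[HORIZON];
        for (int i = 0; i < HORIZON; i++) seq[i] = rnd.nextInt(N_ACTIONS);
        return seq;
    }

    // Noise enters here: two rollouts of the same sequence may score differently
    double rollout(S state, int[] seq) {
        S s = state;
        for (int a : seq) s = model.next(s, a);
        return model.score(s);
    }
}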
Motivation for Bandit Landscape EA
• Game AI
– Evolving / tuning game parameters / rules / content (e.g. level design)
– Real-time control via rolling horizon evolution
• Applies when the fitness evaluations are:
– Noisy
– Limited in number, either because they are:
• Computationally expensive
• Need to be done very rapidly (real-time)
• Not limited to games of course!
Evolutionary Algorithms: Simple and Beautiful
• Initialise a RANDOM population of individuals
• Then REPEAT
– Evaluate them all – and rank them by FITNESS
– BREED Offspring from the FITTEST Parents
• UNTIL Satisfied, or Out of Time
• Attractive simplicity, but it takes a bit of skill to make them really work …
• One of my VERY FAVOURITE APPROACHES to AI
• BUT: can do even better with a fitness landscape model
• (the FITNESS evaluation step – shown in RED on the slide – is affected by noise)
Simple Evolutionary Algorithm Demo
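The demo itself is not reproduced in the transcript. As a stand-in, here is a minimal sketch of the Random Mutation Hill-Climber over integer vectors, the simplest EA the talk builds on; all names are illustrative.

import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class RMHCSketch {
    static final Random rnd = new Random();

    // (1+1) RMHC: mutate one gene at a time, keep the mutant if no worse
    static int[] run(ToDoubleFunction<int[]> fitness,
                     int dim, int nValues, int nEvals) {
        int[] best = new int[dim];
        for (int i = 0; i < dim; i++) best[i] = rnd.nextInt(nValues);
        double bestFit = fitness.applyAsDouble(best);
        for (int e = 1; e < nEvals; e++) {
            int[] mut = best.clone();
            mut[rnd.nextInt(dim)] = rnd.nextInt(nValues);
            double fit = fitness.applyAsDouble(mut); // noisy in general
            if (fit >= bestFit) { best = mut; bestFit = fit; }
        }
        return best;
    }

    public static void main(String[] args) {
        // OneMax: maximise the number of ones in a 20-bit string
        int[] sol = run(x -> Arrays.stream(x).sum(), 20, 2, 1000);
        System.out.println(Arrays.toString(sol));
    }
}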
We use Evolution to Evolve Vectors of Integers (Discrete Noisy Optimisation)
• But these have very different interpretations
• Could be a sequence of actions to take– Rolling Horizon Evolution
• Or parameters of a game design– Locations of pills in Pac-Man
– Missile velocity in Asteroids
– Jump height in Mario
– Etc.
Game Design: Also Noisy!
• We can design a game
• But each time an AI agent or human player plays:
– The experience will be different
– Due to the game, or the player's actions
• In a population of players, each one may play differently
– And therefore have a different experience
• We can measure aspects of this experience
• And view game design as noisy optimisation
Value of Fitness Landscape Modelling
• Can lead to more efficient search
– Fitter solutions are found more quickly
• We learn more about the problem
– The aim now is not just to find the fittest possible solutions
– But also to estimate the value of untested points in the search space
System Diagram
• Note the fat connection between the EA and the landscape model
[Diagram: the Bandit EA connected to a Noisy Fitness Evaluator and, via the fat connection, to a Bandit Fitness Landscape Model]
The Multi-Armed Bandit Problem
• At each step, pull one arm
• Noisy / random reward signal
• In order to:
– Find the best arm
– Minimise regret
– Maximise expected return
Which Arm to Pull? UCB1 Balances Exploration v. Exploitation
UCB1 (Auer et al., 2002): choose the arm j that maximises
x̄_j + √(2 ln n / n_j)
where x̄_j is the mean reward of arm j so far (exploit), the square-root term is an upper bound on the variance (explore), n_j is the number of times arm j has been pulled, and n is the total number of pulls.
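A minimal sketch of UCB1 arm selection; variable names are mine, not the framework's.

class UCB1Sketch {
    // Choose the arm maximising mean + k * sqrt(ln(n) / n_j).
    // means[j]: mean reward of arm j; counts[j]: pulls of arm j;
    // n: total pulls so far; k: exploration constant (sqrt(2) in plain UCB1)
    static int chooseArm(double[] means, int[] counts, int n, double k) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < means.length; j++) {
            if (counts[j] == 0) return j; // try every arm at least once
            double value = means[j] + k * Math.sqrt(Math.log(n) / counts[j]);
            if (value > bestValue) { bestValue = value; best = j; }
        }
        return best;
    }
}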
Example: Simple Space Battle

public enum BattleParamNames {
    DAMAGE_RADIUS, DAMAGE_COST, LOSS, SHIP_SIZE, ROTATION, THRUST
}
…
EvoVectorSet params = new EvoVectorSet();
params.params.add(new EvoDoubleSet(DAMAGE_RADIUS.toString(),
    new double[]{5, 20, 50, 100, 200}));
params.params.add(new EvoDoubleSet(DAMAGE_COST.toString(),
    new double[]{1, 5, 20, 50}));
…
Example Game: Space Battle (Unevolved)
Fitness Landscape Model Interface
public interface FitnessLandscapeModel {
    void addPoint(int[] p, double value);

    // returns a Double object; a null return indicates
    // that we know nothing about this point yet
    Double getSimple(int[] x);

    // careful – this can be slow:
    // it iterates over all points in the search space!
    int[] getBestSolution();

    int[] getBestOfSampled();
    int[] getBestOfSampledPlusNeighbours(int nNeighbours);
}
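A hypothetical usage example of this interface; NTupleLandscapeModel is an assumed implementation name, not necessarily the framework's.

// feed every (point, noisy fitness) pair seen during search into the model
FitnessLandscapeModel model = new NTupleLandscapeModel(); // assumed impl
model.addPoint(new int[]{1, 4, 4, 3}, 0.7);
model.addPoint(new int[]{1, 4, 4, 2}, 0.3);
Double estimate = model.getSimple(new int[]{1, 4, 3, 3}); // null if unknown
int[] best = model.getBestOfSampled(); // best among evaluated points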
The Bandit Landscape EA

int[] p = SearchSpaceUtil.randomPoint(searchSpace);
while (evaluator.nEvals() < nEvals) {
    // spend one real (noisy) fitness evaluation on the current point
    double fitness = evaluator.evaluate(p);
    banditLandscape.addPoint(p, fitness);
    // score a batch of mutated neighbours using the landscape model,
    // with kExplore balancing exploration against exploitation
    EvaluateChoices evc = new EvaluateChoices(banditLandscape, kExplore);
    while (evc.n() < nNeighbours) {
        int[] pp = mutator.randMut(p);
        evc.add(pp);
    }
    p = evc.picker.getBest(); // move to the most promising neighbour
}
int[] solution = banditLandscape.getBest();
return solution;
Currently we Implement the Bandit Landscape Model as an N-Tuple System
• N-Tuples are the best function approximator that no-one has ever heard of!
– (a bit like random forests)
• Constant-time access (independent of the number of samples learned)
• Take projections of a high-dimensional space
• Store the values for each projection in a look-up table
– Each n-tuple sample provides a table index
– Novel contribution:
• Each table entry stores a Statistical Summary Object
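A minimal sketch of one n-tuple and its look-up table; the names are illustrative, not the framework's actual classes. Each entry is a StatSummary, as sketched on the next slide.

import java.util.HashMap;
import java.util.Map;

class NTuple {
    final int[] indices;  // the dimensions this tuple projects onto
    final Map<String, StatSummary> table = new HashMap<>();

    NTuple(int... indices) { this.indices = indices; }

    // the sampled values of a point x form its table index
    String key(int[] x) {
        StringBuilder sb = new StringBuilder();
        for (int i : indices) sb.append(x[i]).append(',');
        return sb.toString();
    }

    // constant-time update, independent of the number of samples so far
    void add(int[] x, double value) {
        table.computeIfAbsent(key(x), k -> new StatSummary()).add(value);
    }

    StatSummary get(int[] x) { return table.get(key(x)); } // may be null
}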
Statistical Summary Object: class StatSummary
• Each table entry is of type StatSummary
• Provides efficient storage of, and access to:
– Mean, Standard Deviation, Standard Error, Number of Samples, …
• For the Bandit EA we just need the mean and the number of samples for each point in the search space, but it is also interesting to query the other stats
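A minimal sketch of such a class, keeping running sums so every statistic is O(1) to update and query; the real StatSummary offers more than this.

class StatSummary {
    private int n = 0;
    private double sum = 0, sumSq = 0;

    void add(double x) { n++; sum += x; sumSq += x * x; }

    int n() { return n; }
    double mean() { return sum / n; }
    double sd() { return Math.sqrt((sumSq - sum * sum / n) / (n - 1)); }
    double stdErr() { return sd() / Math.sqrt(n); }
}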
2-Dimensional Example
Summary
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm with attractive properties:
– Balances exploration versus exploitation
– Makes use of all available information during search
– No need to choose a resampling rate
– Result of search is a landscape model in addition to estimate of best solution
Noisy Sample Problem: Noisy Win-Rate Optimisation
• Optimise a bit string where each fitness evaluation flips a biased coin
– win if Math.random() < x / (2^n − 1)
– i.e. the win probability is:
• binary number value of the bit string / max possible value
• This very roughly models game parameter optimisation, where some parameters are much more sensitive than others
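A minimal sketch of this fitness function; the class and method names are mine.

import java.util.Random;

class NoisyWinRate {
    static final Random rnd = new Random();

    // returns 1 for a win, 0 for a loss; the win probability is the
    // binary value of the bit string divided by the maximum, 2^n - 1
    static double evaluate(int[] bits) {
        double x = 0;
        for (int b : bits) x = 2 * x + b;
        double pWin = x / (Math.pow(2, bits.length) - 1);
        return rnd.nextDouble() < pWin ? 1 : 0;
    }
}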
Budget: 100 Fitness Evals
Simple Space Battle: Each Ship has a Damage Disc
N-Tuple Bandit Landscape Evolved Space Battle Video
Summary
• Games provide a great application area for AI
• And for noisy optimisation
– Both for generating smart players
– And for designing or tuning new games
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm:
– Balances exploration versus exploitation
• We use the same EAs for automated game design and automated game playing
• More detail in our IEEE CEC 2017 Paper
Thank you!
• Questions?
Some references…
• Kamolwan Kunanusont, Raluca Gaina, Jialin Liu, Diego Perez-Liebana and Simon Lucas, The N-Tuple Bandit Evolutionary Algorithm for Game Improvement, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
• Jialin Liu, Julian Togelius, Diego Perez-Liebana and Simon M. Lucas, Evolving Game Skill-Depth using General Video Game AI Agents, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
• Jialin Liu, Diego Perez-Liebana and Simon M. Lucas, Bandit-Based Random Mutation Hill-Climbing, in Proceedings of the IEEE Congress on Evolutionary Computation (2017).
http://iggi.org.uk
Fully funded PhD Studentships