machine learning techniques at the core of ... - meetupfiles.meetup.com/20381738/alphago meet up...


Machine Learning Techniques at the core of AlphaGo success

Stephane Senecal

Orange

Paris Machine Learning Applications Group Meetup, 14/09/2016




Some facts... (1/3)

AlphaGo

Computer program, designed by Google DeepMind, which plays the game of Go





Breakthrough!

AlphaGo defeated European Go champion Fan Hui in 2015 by 5 games to 0!

“Google DeepMind video: Ground-breaking AlphaGo masters the game of Go”




Breakthrough!!!

AlphaGo defeated world-class professional Go player Lee Se-dol by 4 games to 1!!! (match ended 15 March 2016)




Questions... (1/2)

Game of Go?

What is the game of Go?
Why is it a complex game to play?




Questions... (2/2)

AlphaGo Machine Learning (ML) System?

How is AlphaGo built? How does it work?
⇒ What are the main ML techniques constituting the system?




Machine Learning at the core of AlphaGo success

Outline:

1 (Context: AlphaGo and its success)

2 Survey of the game of Go and of its complexity

3 High-level introduction to AlphaGo ML system

4 Take away messages, references





Go (1/3): How to play?

Board with a grid of 19 × 19 lines; the players take turns placing black and white “stones” on the intersections of the lines

(here numbers represent game rounds/turns)




Go (2/3): Aim of the Game

⇒ Conquer a larger part of the board than your opponent
→ the stones you placed on the board plus the stones which could be added inside your own walls





Go (2/3): Aim of the Game

Counting: (11 + 11 = 22) vs (11 + 16 = 27)
→ black wins this game by 5 points





Go (3/3): Game Example (272 moves)





Complexity? (1/4)

Go is a game with perfect information:

Each player can see all of the pieces on the board at all times
→ it is possible to determine the game outcome under the hypothesis of perfect play by the players

⇒ Optimal value function:

input = every board configuration

output determines the outcome of the game: for example +1 if you win and -1 if your opponent wins





Complexity? (2/4)

Playing Go Perfectly?

Game can be solved by computing the optimal value function in a search tree

This tree contains ≈ b^d possible sequences of moves, where:

b = tree’s breadth → number of possible moves per position

d = tree’s depth → game length





Complexity (3/4): Search Tree → Tic-Tac-Toe Example (tree breadth = 3, tree depth = 3)





Complexity... (4/4)

For classical and popular games:

Chess: b ≈ 35 and d ≈ 80 ⇒ b^d ≈ 10^124

Go: b ≈ 250 and d ≈ 150 ⇒ b^d ≈ 10^360

Magnitudes → number of atoms in the Universe ≈ 10^80

⇒ Exhaustive search of optimal game strategies is infeasible...

Huge search space for choosing efficient game strategies:
→ difficulty of evaluating board configurations (i.e. the outcome of the game from board configurations)
→ difficulty of selecting moves
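The orders of magnitude quoted for b^d can be checked with a few lines of Python. This is only a sketch: the function name `log10_tree_size` is illustrative and does not come from the talk; the values of b and d are the ones given above.

```python
import math


def log10_tree_size(breadth: int, depth: int) -> float:
    """Return log10(b^d), i.e. the order of magnitude of the search tree size."""
    # log10(b^d) = d * log10(b); this avoids building the huge integer b**d.
    return depth * math.log10(breadth)


# b and d as quoted on the slide for Chess and Go.
for game, b, d in [("Chess", 35, 80), ("Go", 250, 150)]:
    print(f"{game}: b^d is about 10^{log10_tree_size(b, d):.1f}")
```

Rounded to the nearest integer, the two exponents agree with the figures on the slide, and both are far above the exponent 80 quoted for the number of atoms.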





Reducing the Complexity

Searching in the tree can be simplified via intuitive approaches:

Reducing the depth of the search tree

Reducing the breadth of the search tree





Reducing the Complexity: Tree Depth (1/3)

Reduction of tree depth by board configuration evaluation

→ truncate the search tree at a given level







→ replace the true optimal value function by an approximation for the subtree below the cut
⇒ this predicts the outcome of the game from the current board configuration








Performance

Leads to strong (superhuman!) performance in games like Chess, Checkers/Draughts and Othello...
... but was believed to be intractable for Go due to its complexity
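The truncation idea can be sketched on a toy game. The example below is an assumption-laden illustration, not AlphaGo's search: it uses a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) instead of Go, and the names `truncated_search` and `heuristic` are hypothetical. Below the cut, the search calls an evaluation function in place of the true optimal value.

```python
from typing import Callable, List


def legal_moves(stones: int) -> List[int]:
    """A player may take 1, 2 or 3 stones, but not more than remain."""
    return [k for k in (1, 2, 3) if k <= stones]


def truncated_search(stones: int, depth: int,
                     evaluate: Callable[[int], float]) -> float:
    """Value in [-1, +1] of the position for the player about to move."""
    if stones == 0:
        # The opponent just took the last stone: the player to move has lost.
        return -1.0
    if depth == 0:
        # Cut: replace the subtree by an approximate value.
        return evaluate(stones)
    # Negamax: my value is the best of the negated values for the opponent.
    return max(-truncated_search(stones - k, depth - 1, evaluate)
               for k in legal_moves(stones))


def heuristic(stones: int) -> float:
    """Toy evaluation (exact for this game): multiples of 4 are lost."""
    return -1.0 if stones % 4 == 0 else 1.0
```

With a search depth of 1 and this evaluation, `truncated_search(5, 1, heuristic)` already returns the same value as a full search. For Go, no such simple evaluation function is known, which is where the difficulty lies.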





Reducing the Complexity: Tree Breadth (1/2)

Reduction of tree breadth by move selection

Instead of performing exhaustive search among all possible moves
→ Sampling an efficient move from a probability distribution (“policy”) over all possible moves from the current board configuration





Reducing the Complexity: Tree Breadth (2/2)


Performance

Leads to strong (superhuman!) performance in games like Backgammon and Scrabble...
... but only to a weak amateur playing level in Go
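Sampling moves from a policy, rather than expanding every legal move, can be sketched as follows. This is a minimal illustration under stated assumptions: the policy is a hand-written dictionary of move probabilities (the move names and numbers are made up), and `sample_candidate_moves` is a hypothetical helper, not part of any Go engine.

```python
import random
from typing import Dict, List


def sample_candidate_moves(policy: Dict[str, float], n_samples: int,
                           rng: random.Random) -> List[str]:
    """Draw n_samples moves from the policy and keep the distinct ones.

    Only these candidates would be expanded in the search tree, so the
    effective breadth is at most n_samples instead of len(policy).
    """
    moves = list(policy)
    weights = [policy[m] for m in moves]
    drawn = rng.choices(moves, weights=weights, k=n_samples)
    # dict.fromkeys removes duplicates while keeping first-drawn order.
    return list(dict.fromkeys(drawn))


# Made-up probabilities over three board points, for illustration only.
toy_policy = {"D4": 0.6, "Q16": 0.3, "K10": 0.1}
candidates = sample_candidate_moves(toy_policy, n_samples=5, rng=random.Random(0))
print(candidates)
```

Moves with a high probability under the policy are drawn often, and unlikely moves are rarely explored. The quality of the search then depends on the quality of the policy.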





Google DeepMind AlphaGo ML System?

Reducing the depth and breadth of the search tree with classical approaches
⇒ not efficient enough for playing Go at a professional level!

→ Quick review of Google DeepMind’s article [Silver et al. 2016]




AlphaGo Summary

Reducing the complexity → deep neural networks

Evaluation of board configurations (prediction of the game outcome for a given board configuration, reduce tree depth) → value networks

Selection of moves (reduce tree breadth) → policy networks

⇒ Deep neural networks trained/learnt by a combination of:

Supervised learning from a dataset of human expert games

Reinforcement learning from a dataset of self-play games

(→ Search algorithm in the tree uses Monte Carlo simulation techniques with value networks and policy networks)





Starting Point: Neural Networks





Deep Neural Networks

Recent advances in Machine Learning (Artificial Intelligence)

⇒ Deep Learning: Deep/Convolutional Neural Networks

improve performance for pattern recognition applications in computer vision

construct increasingly abstract and localized representations of image data

Core idea to design AlphaGo ML system ⇒ employ a similar architecture/model for the game of Go





Example of Convolutional Neural Network (1/2): Modeling and Training/Learning





Example of a Convolution Kernel
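As a stand-in for the figure, here is a minimal pure-Python sketch of applying a kernel to a small image. The function name `convolve2d_valid`, the image and the kernel values are all illustrative assumptions. As in most deep learning libraries, the kernel is slid over the image without being flipped (strictly speaking a cross-correlation).

```python
from typing import List

Matrix = List[List[float]]


def convolve2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image, without padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the image patch whose top-left corner is (i, j).
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Kernel that responds to a dark-to-bright transition from left to right.
edge_kernel = [[-1, 1],
               [-1, 1]]
print(convolve2d_valid(image, edge_kernel))
```

The output is non-zero only in the column where the edge lies, which illustrates how a kernel acts as a local pattern detector.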





Convolutional Neural Network (2/2): Prediction/Testing

Samoyed 16; Papillon 5.7; Pomeranian 2.7; Arctic fox 1.0; Eskimo dog 0.6; white wolf 0.4; Siberian husky 0.4





AlphaGo in a Nutshell

Deep learning architecture

⇒ Picture the board configuration as a 19 × 19 image
⇒ Use convolutional neural networks to build a representation of the board configuration

The consideration of deep neural networks aims at reducing the depth and breadth of the search tree:

evaluating board configurations and predicting game outcomes via value networks (→ depth of the search tree)

sampling possible moves from policy networks (→ breadth of the search tree)
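Picturing the board as an image can be sketched as a stack of binary 19 × 19 planes, one per point state. This three-plane encoding is a simplifying assumption made for illustration; the system described in [Silver et al. 2016] uses a richer set of input features. The names `encode_board`, `EMPTY`, `BLACK` and `WHITE` are hypothetical.

```python
from typing import Dict, List, Tuple

BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2  # plane indices (illustrative convention)


def encode_board(stones: Dict[Tuple[int, int], int]) -> List[List[List[int]]]:
    """Encode a position as 3 binary planes [empty, black, white].

    `stones` maps (row, col) to BLACK or WHITE; missing points are empty.
    """
    planes = [[[0] * BOARD_SIZE for _ in range(BOARD_SIZE)] for _ in range(3)]
    for row in range(BOARD_SIZE):
        for col in range(BOARD_SIZE):
            colour = stones.get((row, col), EMPTY)
            # Exactly one plane is set to 1 at each intersection.
            planes[colour][row][col] = 1
    return planes


position = {(3, 3): BLACK, (15, 15): WHITE}
planes = encode_board(position)
print(len(planes), len(planes[0]), len(planes[0][0]))
```

Such a stack has the same shape as a multi-channel image, so it can be fed to a convolutional network in the same way as a picture.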





AlphaGo Deep Neural Networks Models (1/2)

Value Network (→ reduces tree depth)

takes an image representation of the board configuration as input

passes it to a convolutional neural network model (estimated by regression)

outputs a (numerical) approximate value of the optimal value function

Value → predicts the expected game outcome for a given board configuration
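The regression step can be sketched with a deliberately tiny model. This is an illustration under strong assumptions: a linear model followed by tanh replaces the convolutional network, the feature vector is made up, and the function names are hypothetical. What it keeps from the description above is the objective, namely fitting the predicted value to the observed game outcome (+1 or -1) by reducing the squared error.

```python
import math
from typing import List


def predict_value(weights: List[float], features: List[float]) -> float:
    """Predicted outcome in (-1, +1) for a position described by `features`."""
    return math.tanh(sum(w * x for w, x in zip(weights, features)))


def squared_error(weights: List[float], features: List[float],
                  outcome: float) -> float:
    return (outcome - predict_value(weights, features)) ** 2


def sgd_step(weights: List[float], features: List[float],
             outcome: float, lr: float = 0.1) -> List[float]:
    """One gradient descent step on the squared error."""
    v = predict_value(weights, features)
    # d/dw (outcome - v)^2 = -2 (outcome - v) (1 - v^2) x, with v = tanh(w.x)
    scale = 2.0 * (outcome - v) * (1.0 - v * v)
    return [w + lr * scale * x for w, x in zip(weights, features)]


weights = [0.0, 0.0, 0.0]
features = [1.0, 0.5, -1.0]   # made-up description of a position
outcome = 1.0                 # the game was eventually won
before = squared_error(weights, features, outcome)
weights = sgd_step(weights, features, outcome)
after = squared_error(weights, features, outcome)
print(before, after)
```

After one step the error on this example is lower than before, which is all the sketch is meant to show.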





AlphaGo Deep Neural Networks Models (2/2)

Policy Network (→ reduces tree breadth)

takes an image representation of the board configuration as input

passes it to a convolutional neural network model (estimated by supervised learning or by reinforcement learning)

outputs a probability distribution for sampling efficient moves given the board configuration

Policy → probability map over the board for sampling efficient moves
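The last stage of a policy model, turning raw scores into a probability map over moves, can be sketched with a softmax restricted to legal moves. The scores below are made up and `policy_distribution` is a hypothetical name; the sketch only assumes that some model has already produced one score per board point.

```python
import math
from typing import List


def policy_distribution(scores: List[float], legal: List[bool]) -> List[float]:
    """Softmax over the legal moves; illegal moves get probability 0."""
    if not any(legal):
        raise ValueError("at least one legal move is required")
    # Subtract the largest legal score for numerical stability.
    top = max(s for s, ok in zip(scores, legal) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Four board points; the last one is occupied, hence illegal.
scores = [2.0, 1.0, 0.0, 5.0]
legal = [True, True, True, False]
print(policy_distribution(scores, legal))
```

The result sums to 1 over the legal moves and can be used directly for sampling.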





AlphaGo ML Training/Learning Global Scheme/Pipeline





Reinforcement Learning Framework (1/2)





Reinforcement Learning Framework (2/2)

Reinforcement learning goal: optimize rewards by choosing adequate actions for the given observations ⇒ actions are selected from policies
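The observation, action and reward loop can be sketched on a toy environment. Everything here is an assumption made for illustration: `CountdownEnv` is a made-up environment (not Go), and `run_episode` is a hypothetical helper. A policy is simply a function from an observation to an action.

```python
from typing import Callable, Tuple


class CountdownEnv:
    """Toy environment: the state is a counter that must be brought to 0.

    Action 1 decrements the counter, action 0 does nothing.
    The reward is +1 on the step that reaches 0, and 0 otherwise.
    """

    def __init__(self, start: int) -> None:
        self.state = start

    def step(self, action: int) -> Tuple[int, float, bool]:
        if action == 1:
            self.state -= 1
        done = self.state == 0
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env: CountdownEnv, policy: Callable[[int], int],
                max_steps: int) -> float:
    """Let the agent interact with the environment and sum the rewards."""
    observation, total_reward = env.state, 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # agent chooses an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(CountdownEnv(3), lambda obs: 1, max_steps=10))
print(run_episode(CountdownEnv(3), lambda obs: 0, max_steps=10))
```

The first policy collects the reward and the second never does; learning consists of moving from policies of the second kind towards policies of the first kind.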





Reinforcement Learning for Computer Go





AlphaGo Reinforcement Learning Framework

⇒ Reinforcement learning policy network optimizes the final outcome of games of self-play, against its previous versions
(Reinforcement learning combined with deep neural networks is also efficient for learning how to play classic video games!)
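Optimizing a policy from final game outcomes can be sketched with a policy-gradient (REINFORCE-style) update, in which the log-probability of the move played is pushed up after a win and down after a loss. This is a heavily simplified illustration: the policy is a softmax over three scores for a single position instead of a deep network, and `reinforce_update` is a hypothetical name.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(scores: List[float], action: int, outcome: float,
                     lr: float = 0.5) -> List[float]:
    """One policy-gradient step for a softmax policy.

    outcome is +1 if the game was won, -1 if it was lost.
    d log p(action) / d scores[i] = 1{i == action} - p[i]
    """
    probs = softmax(scores)
    return [
        s + lr * outcome * ((1.0 if i == action else 0.0) - p)
        for i, (s, p) in enumerate(zip(scores, probs))
    ]


scores = [0.0, 0.0, 0.0]              # uniform policy over three moves
after_win = reinforce_update(scores, action=1, outcome=+1.0)
after_loss = reinforce_update(scores, action=1, outcome=-1.0)
print(softmax(after_win)[1], softmax(after_loss)[1])
```

The probability of the move played rises above 1/3 after a win and falls below 1/3 after a loss. In self-play, the outcome comes from a game played against an earlier version of the same policy.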









Key/Take Away Messages (1/2)

Game of Go, Complexity for Computer Go

Tractable in theory but quite complex in practice
→ searching in a tree of ≈ 10^360 sequences of moves...

AlphaGo ML System Core Idea

Picture the board configurations as images and use deep neural networks to build an approximate search tree that is easier to solve
⇒ Performing training/learning efficiently requires:

ad hoc and efficient algorithms

massive datasets: 30M expert moves to initialize the reinforcement learning policy network for the games vs Fan Hui

huge computational resources: 1202 CPUs + 176 GPUs for playing the games vs European Go champion Fan Hui





Key/Take Away Messages (2/2)

Deep Neural Networks in AlphaGo ML System

Aim at reducing depth and breadth in the original search tree:

by evaluating board configurations via value networks (→ predicting the outcomes of the games)

by sampling game moves from policy networks (computed in particular with reinforcement learning)

AlphaGo → Computer Go → Artificial Intelligence

Playing Go is a very specific task, with two convenient properties:

possibility to generate games and to perform self-play

stationary problem: game rules do not change over time (as in computer vision and natural language processing)

⇒ but general AI still remains an open and hard problem!




AlphaGo and Beyond... (1/2)

David Silver et al. (2016)

Mastering the game of Go with deep neural networks and tree search

Nature (529), 484 – 489, 28 January 2016

Volodymyr Mnih et al. (2015) (→ “video games”)

Human-level control through deep reinforcement learning

Nature (518), 529 – 533, 26 February 2015

Richard Sutton and Andrew Barto (1998)

Reinforcement learning: an introduction

MIT Press, 1998





AlphaGo and Beyond... (2/2)

Yann LeCun et al. (1990)

Handwritten digit recognition with a back-propagation network

In Proc. of NIPS, 396 – 404, 1990

Geoffrey Hinton, Simon Osindero and Yee-Whye Teh (2006)

A fast learning algorithm for deep belief nets

Neural Computation 18(7), 1527 – 1554, 2006

Yann LeCun, Yoshua Bengio and Geoffrey Hinton (2015)

Deep learning (→ “review”)

Nature (521), 436 – 444, 28 May 2015





Thank you!

Thanks for your attention!

Questions?


Credits: Anaelle Laurans, Vincent Lemaire, Henri Sanson, Mikael Touati @ Orange Labs and Demis Hassabis @ DeepMind!
This work is supported by the collaborative research projects ANR NETLEARN (ANR-13-INFR-0004) and EU H2020 5G-PPP COGNET
