

Context / Game of Go, Complexity / AlphaGo ML System / Conclusion

Machine Learning Techniques at the Core of AlphaGo's Success

Stephane Senecal

Orange Labs, stephane.senecal@orange.com

Paris Machine Learning Applications Group Meetup, 14/09/2016


Some facts... (1/3)

AlphaGo

A computer program, designed by Google DeepMind, which plays the game of Go


Some facts... (2/3)

Breakthrough!

AlphaGo defeated European Go champion Fan Hui in 2015, by 5 games won to 0!

“Google DeepMind video: Ground-breaking AlphaGo masters the game of Go”


Some facts... (3/3)

Breakthrough!!!

AlphaGo defeated world-class professional Go player Lee Se-dol by 4 games won to 1!!! (match ended 15 March 2016)


Questions... (1/2)

Game of Go?

What is the game of Go? Why is it a complex game to play?


Questions... (2/2)

AlphaGo Machine Learning (ML) System?

How is AlphaGo built? How does it work?
⇒ What are the main ML techniques constituting the system?


Machine Learning at the core of AlphaGo success

Outline:

1 (Context: AlphaGo and its success)

2 Survey of the game of Go and of its complexity

3 High-level introduction to AlphaGo ML system

4 Take away messages, references


The Game of Go / Complexity of Go / Reducing the Complexity

Go (1/3): How to play?

Board with a 19 × 19 grid of lines; each turn, black and white “stones” are placed on the intersections of the lines on the board

(here numbers represent game rounds/turns)


Go (2/3): Aim of the Game

⇒ Conquer a larger part of the board than your opponent
→ the stones you placed on the board plus the stones which could be added inside your own walls


Go (2/3): Aim of the Game

Counting: (11 + 11 = 22) vs (11 + 16 = 27) → black wins this game by 5 points


Go (3/3): Game Example (272 moves)


Complexity? (1/4)

Go is a game with perfect information:

Each player can see all of the pieces on the board at all times
→ it is possible to determine the game outcome under the hypothesis of perfect play by both players

⇒ Optimal value function:

input = any board configuration

output = the outcome of the game: for example, +1 if you win and -1 if your opponent wins


Complexity? (2/4)

Playing Go Perfectly?

The game can be solved by computing the optimal value function in a search tree

This tree contains ≈ b^d possible sequences of moves, where:

b = tree breadth → number of possible moves per position

d = tree depth → game length


Complexity (3/4): Search Tree → Tic-Tac-Toe Example (tree breadth = 3, tree depth = 3)


Complexity... (4/4)

For classical and popular games:

Chess: b ≈ 35 and d ≈ 80 ⇒ b^d ≈ 10^124

Go: b ≈ 250 and d ≈ 150 ⇒ b^d ≈ 10^360

Orders of magnitude → number of atoms in the Universe ≈ 10^80

⇒ Exhaustive search of optimal game strategies is infeasible...

Huge search space for choosing efficient game strategies:
→ difficulty of evaluating board configurations (i.e. predicting the outcome of the game from board configurations)
→ difficulty of selecting moves
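As a quick sanity check, the b^d magnitudes quoted above can be recomputed with base-10 logarithms (a sketch only; the b and d values are the rough estimates from the slide):

```python
# Back-of-the-envelope check of the b^d game-tree sizes quoted above,
# using base-10 logarithms to avoid astronomically large integers.
import math

def tree_size_exponent(b: float, d: float) -> float:
    """Return x such that b^d ~= 10^x, i.e. d * log10(b)."""
    return d * math.log10(b)

chess = tree_size_exponent(b=35, d=80)   # ~123.5 -> on the order of 10^124
go = tree_size_exponent(b=250, d=150)    # ~359.7 -> on the order of 10^360

print(f"Chess: ~10^{chess:.0f} move sequences")
print(f"Go:    ~10^{go:.0f} move sequences")
```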


Reducing the Complexity

Searching in the tree can be simplified via intuitive approaches:

Reducing the depth of the search tree

Reducing the breadth of the search tree


Reducing the Complexity: Tree Depth (1/3)

Reduction of tree depth by board configuration evaluation

→ truncate the search tree at a given level


Reducing the Complexity: Tree Depth (2/3)

Reduction of tree depth by board configuration evaluation

→ replace the true optimal value function by an approximation for the subtree below the cut
⇒ this predicts the outcome of the game from the current board configuration


Reducing the Complexity: Tree Depth (3/3)

Reduction of tree depth by board configuration evaluation

→ truncate the search tree at a given level
→ replace the true optimal value function by an approximation for the subtree below the cut
⇒ this predicts the outcome of the game from the current board configuration

Performance

Leads to efficient (superhuman!) performance in games like Chess, Checkers/Draughts and Othello... but was believed to be intractable for Go due to its complexity
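The depth-reduction idea can be sketched as a depth-limited negamax search with an approximate evaluation at the cut. The game below is a toy Nim-like stand-in (take 1 or 2 stones, taking the last stone wins), not Go, and `moves`, `apply_` and `evaluate` are hypothetical callbacks chosen only to exercise the search:

```python
# Depth-limited negamax: search a fixed number of plies, then replace the
# exact optimal value with a heuristic evaluation at the truncation level.

def negamax(state, depth, moves, apply, evaluate):
    """Search `depth` plies, then fall back to the approximate value."""
    ms = moves(state)
    if depth == 0 or not ms:           # truncate the tree here
        return evaluate(state)         # approximation of the optimal value
    # Negamax: my best value is the negation of the opponent's best reply.
    return max(-negamax(apply(state, m), depth - 1, moves, apply, evaluate)
               for m in ms)

# Toy Nim-like game: state = stones left, each move removes 1 or 2 stones.
moves = lambda n: [m for m in (1, 2) if m <= n]   # legal takes
apply_ = lambda n, m: n - m                       # remove m stones
evaluate = lambda n: -1 if n == 0 else 0          # side to move already lost

print(negamax(4, depth=4, moves=moves, apply=apply_, evaluate=evaluate))  # -> 1
```

With 4 stones the side to move can always win (value +1), while 3 stones is a lost position (value -1), which the truncated search recovers.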


Reducing the Complexity: Tree Breadth (1/2)

Reduction of tree breadth by move selection

Instead of performing an exhaustive search among all possible moves
→ sample an efficient move from a probability distribution (“policy”) over all possible moves from the board configuration


Reducing the Complexity: Tree Breadth (2/2)

Reduction of tree breadth by move selection

Instead of performing an exhaustive search among all possible moves
→ sample an efficient move from a probability distribution (“policy”) over all possible moves from the current board configuration

Performance

Leads to efficient (superhuman!) performance in games like Backgammon and Scrabble... but only reaches weak amateur playing level in Go
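The breadth-reduction idea amounts to drawing one move from the policy distribution instead of expanding every branch. A minimal sketch (the move coordinates and policy weights below are hypothetical placeholders, not output from a real policy network):

```python
# Breadth reduction by policy sampling: draw one move from a probability
# distribution over legal moves instead of searching all of them.
import random

def sample_move(legal_moves, policy_weights):
    """Draw one move with probability proportional to its policy weight."""
    return random.choices(legal_moves, weights=policy_weights, k=1)[0]

legal_moves = ["D4", "Q16", "C3", "K10"]   # hypothetical board coordinates
policy = [0.5, 0.3, 0.1, 0.1]              # hypothetical policy output

move = sample_move(legal_moves, policy)    # one branch instead of all four
```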


Deep Neural Networks / AlphaGo Deep Learning Architecture / AlphaGo ML Training/Learning Techniques

Google DeepMind AlphaGo ML System?

Reducing the depth and breadth of the search tree with classical approaches
⇒ not efficient enough for playing Go at a professional level!

→ Quick review of Google DeepMind’s article [Silver et al. 2016]


AlphaGo Summary

Reducing the complexity → deep neural networks

Evaluation of board configurations (prediction of the game outcome for a given board configuration, reduces tree depth) → value networks

Selection of moves (reduces tree breadth) → policy networks

⇒ Deep neural networks trained by a combination of:

Supervised learning from a dataset of human expert games

Reinforcement learning from a dataset of self-play games

(→ The search algorithm in the tree uses Monte Carlo simulation techniques together with the value networks and policy networks)


Starting Point: Neural Networks


Deep Neural Networks

Recent advances in Machine Learning (Artificial Intelligence)

⇒ Deep Learning: Deep/Convolutional Neural Networks

improve performance for pattern-recognition applications in computer vision

construct increasingly abstract and localized representations of image data

Core idea behind the design of the AlphaGo ML system ⇒ employ a similar architecture/model for the game of Go


Example of Convolutional Neural Network (1/2): Modeling and Training/Learning


Example of a Convolution Kernel


Convolutional Neural Network (2/2): Prediction/Testing

(Example classifier outputs: Samoyed 16; Papillon 5.7; Pomeranian 2.7; Arctic fox 1.0; Eskimo dog 0.6; white wolf 0.4; Siberian husky 0.4)


AlphaGo in a Nutshell

Deep learning architecture

⇒ Picture the board configuration as a 19 × 19 image
⇒ Use convolutional neural networks to build a representation of the board configuration

Deep neural networks aim at reducing the depth and breadth of the search tree:

evaluating board configurations and predicting game outcomes via value networks (→ depth of the search tree)

sampling possible moves from policy networks (→ breadth of the search tree)
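“Picturing the board as an image” can be sketched as encoding a position into stacked feature planes, the kind of input a convolutional network consumes. The choice of three planes (own stones, opponent stones, empty points) is illustrative only; AlphaGo's actual input uses many more feature planes:

```python
# Encode a Go position as image-like feature planes for a convolutional net.
import numpy as np

def encode(board, to_play):
    """board: 19x19 with +1 black, -1 white, 0 empty; to_play: +1 or -1."""
    own = (board == to_play).astype(np.float32)
    opp = (board == -to_play).astype(np.float32)
    empty = (board == 0).astype(np.float32)
    return np.stack([own, opp, empty])        # shape (3, 19, 19)

board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = 1                               # a single black stone
planes = encode(board, to_play=1)
print(planes.shape)                           # -> (3, 19, 19)
```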


AlphaGo Deep Neural Networks Models (1/2)

Value Network (→ reduces tree depth)

takes an image representation of the board configuration as input

passes it to a convolutional neural network model (estimated by regression)

outputs a (numerical) approximation of the optimal value function

Value → predicts the expected game outcome for a given board configuration
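A toy stand-in for the value-network interface: a tiny fully connected network (NumPy only, with random placeholder weights, not trained parameters) mapping a 19x19 board to a single score in (-1, 1) that plays the role of the predicted game outcome. AlphaGo's real value network is a deep convolutional net trained by regression:

```python
# Toy "value network": board image in, scalar outcome prediction out.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.05, size=(361, 64))   # placeholder weights, untrained
b1 = np.zeros(64)
W2 = rng.normal(scale=0.05, size=(64, 1))
b2 = np.zeros(1)

def value(board):
    """board: 19x19 array with +1 (black), -1 (white), 0 (empty)."""
    h = np.tanh(board.reshape(-1) @ W1 + b1)  # hidden representation
    return float(np.tanh(h @ W2 + b2))        # scalar score in (-1, 1)

board = np.zeros((19, 19))
board[3, 3] = 1.0                             # a single black stone
v = value(board)                              # some number in (-1, 1)
```

The tanh output range matches the ±1 outcome convention from the optimal-value-function slide.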


AlphaGo Deep Neural Networks Models (2/2)

Policy Network (→ reduces tree breadth)

takes an image representation of the board configuration as input

passes it to a convolutional neural network model (estimated by supervised learning or by reinforcement learning)

outputs a probability distribution for sampling efficient moves given the board configuration

Policy → probability map over the board for sampling efficient moves
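The policy-network interface can be sketched the same way: a softmax over the 361 intersections producing a probability map from which moves are sampled. The weights are random placeholders; AlphaGo's real policy network is a deep convolutional net:

```python
# Toy "policy network": board image in, probability map over the board out.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(scale=0.05, size=(361, 361))   # placeholder weights, untrained

def policy(board):
    """Softmax over the 361 intersections -> probability map for moves."""
    logits = board.reshape(-1) @ W
    p = np.exp(logits - logits.max())         # numerically stable softmax
    p /= p.sum()
    return p.reshape(19, 19)

board = np.zeros((19, 19))
probs = policy(board)                         # uniform on an all-zero input
row, col = np.unravel_index(rng.choice(361, p=probs.reshape(-1)), (19, 19))
```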


AlphaGo ML Training/Learning Global Scheme/Pipeline


Reinforcement Learning Framework (1/2)


Reinforcement Learning Framework (2/2)

Reinforcement learning goal: optimize rewards by adequately choosing actions for given observations ⇒ via policies
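A minimal illustration of this observe/act/reward loop, with a two-armed bandit standing in for the environment (the reward probabilities, exploration rate and learning rate are all made-up values; this is not AlphaGo's training procedure):

```python
# Minimal RL loop: act from a policy, receive a reward, update the policy
# so that rewarding actions become more likely.
import random

prefs = {"a": 0.0, "b": 0.0}   # running reward estimates (the "policy")
reward = {"a": 0.2, "b": 0.8}  # hypothetical expected rewards per action

def act():
    """Epsilon-greedy: mostly exploit the best estimate, sometimes explore."""
    if random.random() < 0.1:
        return random.choice(list(prefs))
    return max(prefs, key=prefs.get)

random.seed(0)
for _ in range(1000):
    a = act()                                          # choose an action
    r = 1.0 if random.random() < reward[a] else 0.0    # environment reward
    prefs[a] += 0.1 * (r - prefs[a])                   # nudge toward reward

print(max(prefs, key=prefs.get))   # learns to prefer the better action "b"
```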


Reinforcement Learning for Computer Go


AlphaGo Reinforcement Learning Framework

⇒ The reinforcement learning policy network optimizes the final outcome of games of self-play against its previous versions
(Reinforcement learning combined with deep neural networks is also efficient for learning how to play classical video games!)



Take Away Messages / References

Key/Take Away Messages (1/2)

Game of Go, Complexity for Computer Go

Tractable in theory but quite complex in practice
→ searching in a tree of ≈ 10^360 sequences of moves...

AlphaGo ML System Core Idea

Picture the board configurations as images and use deep neural networks to build an approximate search tree that is easier to solve
⇒ To perform training/learning efficiently, this requires:

ad hoc and efficient algorithms

massive datasets: 30M expert moves for the initialization of the reinforcement learning policy network, for the games vs Fan Hui

huge computational resources: 1,202 CPUs + 176 GPUs for playing the games vs European Go champion Fan Hui


Key/Take Away Messages (2/2)

Deep Neural Networks in AlphaGo ML System

Aim at reducing depth and breadth in the original search tree:

by evaluating board configurations via value networks (→ predicting the outcomes of the games)

by sampling game moves from policy networks (computed in particular with reinforcement learning)

AlphaGo → Computer Go → Artificial Intelligence

Playing Go is a very specific task, with two convenient properties:

possibility to generate games and to perform self-play

stationary problem: game rules do not change over time (as for computer vision and natural language processing)

⇒ but general AI still remains an open and hard problem!


AlphaGo and Beyond. . . (1/2)

David Silver et al. (2016)

Mastering the game of Go with deep neural networks and tree search

Nature (529), 484 – 489, 28 January 2016

Volodymyr Mnih et al. (2015) (→ “video games”)

Human-level control through deep reinforcement learning

Nature (518), 529 – 533, 26 February 2015

Richard Sutton and Andrew Barto (1998)

Reinforcement learning: an introduction

MIT Press, 1998


AlphaGo and Beyond. . . (2/2)

Yann LeCun et al. (1990)

Handwritten digit recognition with a back-propagation network

In Proc. of NIPS, 396 – 404, 1990

Geoffrey Hinton, Simon Osindero and Yee-Whye Teh (2006)

A fast learning algorithm for deep belief nets

Neural Computation 18(7), 1527 – 1554, 2006

Yann LeCun, Yoshua Bengio and Geoffrey Hinton (2015)

Deep learning (→ “review”)

Nature (521), 436 – 444, 28 May 2015


Thank you!

Thanks for your attention!

Questions?

(→ stephane.senecal@orange.com)

Credits: Anaelle Laurans, Vincent Lemaire, Henri Sanson, Mikael Touati @ Orange Labs and Demis Hassabis @ DeepMind! ©
This work is supported by the collaborative research projects ANR NETLEARN (ANR-13-INFR-0004) and EU H2020 5G-PPP COGNET
