Context | Game of Go, Complexity
AlphaGo ML System | Conclusion
Machine Learning Techniques at the core of AlphaGo success
Stephane Senecal
Orange [email protected]
Paris Machine Learning Applications Group Meetup, 14/09/2016
Some facts... (1/3)
AlphaGo
Computer program, designed by Google DeepMind, which plays the game of Go
Some facts... (2/3)
Breakthrough!
AlphaGo defeated European Go champion Fan Hui in 2015 by 5 games to 0!
“Google DeepMind video: Ground-breaking AlphaGo masters the game of Go”
Some facts... (3/3)
Breakthrough!!!
AlphaGo defeated world-class professional Go player Lee Se-dol by 4 games to 1!!! (match ended 15 March 2016)
Questions... (1/2)
Game of Go?
What is the game of Go? Why is it a complex game to play?
Questions... (2/2)
AlphaGo Machine Learning (ML) System?
How is AlphaGo built? How does it work?
⇒ What are the main ML techniques constituting the system?
Machine Learning at the core of AlphaGo success
Outline:
1 (Context: AlphaGo and its success)
2 Survey of the game of Go and of its complexity
3 High-level introduction to AlphaGo ML system
4 Take away messages, references
The Game of Go | Complexity of Go | Reducing the Complexity
Go (1/3): How to play?
Board with a grid of 19 × 19 lines; players alternately place black and white “stones” on the intersections of the lines on the board
(here numbers represent game rounds/turns)
Go (2/3): Aim of the Game
⇒ Conquer a larger part of the board than your opponent
→ the stones you placed on the board plus the stones which could be added inside your own walls
Counting: (11 + 11 = 22) vs (11 + 16 = 27)
→ black wins this game by 5 points
Go (3/3): Game Example (272 moves)
Complexity? (1/4)
Go is a game with perfect information:
Each player can see all of the pieces on the board at all times
→ it is possible to determine the game outcome under the hypothesis of perfect play by the players
⇒ Optimal value function:
input = every board configuration
output determines the outcome of the game: for example +1 if you win and -1 if your opponent wins
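To make the idea of an optimal value function concrete, here is a minimal sketch on a toy game rather than Go. The game is an assumption chosen for illustration: one pile of stones, each player removes 1, 2 or 3 stones in turn, and whoever takes the last stone wins. The function name `optimal_value` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def optimal_value(stones: int) -> int:
    """Return +1 if the player to move wins under perfect play, else -1.

    Toy game (an assumption, not Go): remove 1, 2 or 3 stones per turn;
    the player who takes the last stone wins.
    """
    if stones == 0:
        # The opponent just took the last stone: the player to move has lost.
        return -1
    # My value is the best of the negated values of the positions I can reach,
    # because each successor is evaluated from the opponent's point of view.
    return max(-optimal_value(stones - take) for take in (1, 2, 3) if take <= stones)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, optimal_value(n))
```

Every configuration gets a value of +1 or -1, exactly as on the slide; the exhaustive recursion is only feasible because the toy game is tiny.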
Complexity? (2/4)
Playing Go Perfectly?
Game can be solved by computing the optimal value function in a search tree
This tree contains ≈ b^d possible sequences of moves, where:
b = tree’s breadth → number of possible moves per position
d = tree’s depth → game length
Complexity (3/4): Search Tree → Tic-Tac-Toe Example
(tree breadth = 3, tree depth = 3)
Complexity... (4/4)
For classical and popular games:
Chess: b ≈ 35 and d ≈ 80 ⇒ b^d ≈ 10^124
Go: b ≈ 250 and d ≈ 150 ⇒ b^d ≈ 10^360
Magnitudes → number of atoms in the Universe ≈ 10^80
⇒ Exhaustive search of optimal game strategies is infeasible...
Huge search space for choosing efficient game strategies:
→ difficulty of evaluating board configurations (i.e. the outcome of the game from board configurations)
→ difficulty of selecting moves
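The orders of magnitude above can be checked with a few lines of arithmetic. This sketch computes the base-10 exponent of b to the power d from the breadth and depth values quoted on the slide; the helper name `log10_tree_size` is hypothetical.

```python
import math


def log10_tree_size(breadth: int, depth: int) -> float:
    """Base-10 exponent of breadth**depth, i.e. log10 of the number of move sequences."""
    return depth * math.log10(breadth)


chess_exponent = log10_tree_size(35, 80)    # breadth and depth quoted for Chess
go_exponent = log10_tree_size(250, 150)     # breadth and depth quoted for Go
atoms_exponent = 80                         # atoms in the Universe, as quoted above

if __name__ == "__main__":
    print(f"Chess: about 10^{chess_exponent:.1f} sequences")
    print(f"Go:    about 10^{go_exponent:.1f} sequences")
    print(f"Go exceeds the atom count by a factor of about 10^{go_exponent - atoms_exponent:.0f}")
```

The exponents come out near 123.5 for Chess and 359.7 for Go, consistent with the rounded figures on the slide.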
Reducing the Complexity
Searching in the tree can be simplified via intuitive approaches:
Reducing the depth of the search tree
Reducing the breadth of the search tree
Reducing the Complexity: Tree Depth (1/3)
Reduction of tree depth by board configuration evaluation
→ truncate the search tree at a given level
Reducing the Complexity: Tree Depth (2/3)
Reduction of tree depth by board configuration evaluation
→ replace the true optimal value function by an approximation for the subtree below the cut
⇒ this predicts the outcome of the game from the current board configuration
Reducing the Complexity: Tree Depth (3/3)
Reduction of tree depth by board configuration evaluation
→ truncate the search tree at a given level
→ replace the true optimal value function by an approximation for the subtree below the cut
⇒ this predicts the outcome of the game from the current board configuration
Performance
Leads to efficient (superhuman!) performance in games like Chess, Checkers/Draughts and Othello...
... but believed to be intractable for Go due to its complexity
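Truncating the tree and substituting an approximate evaluation can be sketched as a depth-limited search. The code below is an illustration under stated assumptions, not the method of any particular Chess or Go engine: the callbacks `moves`, `play` and `evaluate` are hypothetical, and the toy game in the usage part (remove 1 to 3 stones, last stone wins) stands in for a real board game.

```python
from typing import Callable, List


def truncated_value(state, depth: int,
                    moves: Callable[[object], List],
                    play: Callable[[object, object], object],
                    evaluate: Callable[[object], int]) -> int:
    """Depth-limited search from the point of view of the player to move.

    Below the cut (depth == 0), or at a terminal position, `evaluate` stands
    in for the true optimal value function.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    # Successor values are from the opponent's perspective, hence the negation.
    return max(-truncated_value(play(state, m), depth - 1, moves, play, evaluate)
               for m in legal)


# Toy game (assumption): a pile of stones, take 1-3 per turn, last stone wins.
def toy_moves(stones):
    return [t for t in (1, 2, 3) if t <= stones]


def toy_play(stones, take):
    return stones - take


def toy_evaluate(stones):
    # Exact at the end of the game, "don't know" (0) everywhere else.
    return -1 if stones == 0 else 0


if __name__ == "__main__":
    print(truncated_value(5, 1, toy_moves, toy_play, toy_evaluate))  # shallow cut
    print(truncated_value(5, 5, toy_moves, toy_play, toy_evaluate))  # full depth
```

With a shallow cut the crude evaluation cannot tell that 5 stones is a winning position; the quality of the approximation below the cut decides how useful the truncated search is.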
Reducing the Complexity: Tree Breadth (1/2)
Reduction of tree breadth by moves selection
Instead of performing an exhaustive search among all possible moves
→ sample an efficient move from a probability distribution (“policy”) over all possible moves from the board configuration
Reducing the Complexity: Tree Breadth (2/2)
Reduction of tree breadth by moves selection
Instead of performing an exhaustive search among all possible moves
→ sample an efficient move from a probability distribution (“policy”) over all possible moves from the current board configuration
Performance
Leads to efficient (superhuman!) performance in games like Backgammon and Scrabble...
... but only to a weak amateur playing level in Go
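A minimal sketch of breadth reduction: instead of expanding every legal move, either sample one move from the policy or keep only the few most probable candidates. The policy here is a plain dictionary with made-up probabilities, and the function names are hypothetical.

```python
import random
from typing import Dict, List


def sample_move(policy: Dict[str, float], rng: random.Random) -> str:
    """Draw one move according to the policy's probabilities."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def top_moves(policy: Dict[str, float], k: int) -> List[str]:
    """Keep only the k most probable moves, shrinking the tree's breadth to k."""
    return sorted(policy, key=policy.get, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up probabilities over three candidate moves (illustration only).
    policy = {"D4": 0.7, "Q16": 0.2, "K10": 0.1}
    print(sample_move(policy, random.Random(0)))
    print(top_moves(policy, 2))
```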
Deep Neural Networks | AlphaGo Deep Learning Architecture | AlphaGo ML Training/Learning Techniques
Google DeepMind AlphaGo ML System?
Reducing the depth and breadth of the search tree with classical approaches
⇒ not efficient enough for playing Go at a professional level!
→ Quick review of Google DeepMind’s article [Silver et al. 2016]
AlphaGo Summary
Reducing the complexity → deep neural networks
Evaluation of board configurations (prediction of the game outcome for a given board configuration, reduces tree depth) → value networks
Selection of moves (reduces tree breadth) → policy networks
⇒ Deep neural networks trained/learnt by a combination of:
Supervised learning from human expert games dataset
Reinforcement learning from games of self-play dataset
(→ Search algorithm in the tree uses Monte Carlo simulation techniques with value networks and policy networks)
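How a tree search might combine the two kinds of network can be sketched with a selection rule in the spirit of the one described in [Silver et al. 2016]: prefer moves with a high average evaluation, plus an exploration bonus that is proportional to the policy prior and decays with the visit count. Everything below (the `Edge` class, the constant `c_explore`, the `+ 1` inside the square root) is an assumption made for illustration, not the article's exact formula or implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float               # move probability, as a policy network would supply
    visits: int = 0            # number of simulations that went through this move
    total_value: float = 0.0   # sum of evaluations backed up through this move

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_explore: float = 1.0) -> str:
    """Pick the move maximizing mean value plus a prior-weighted exploration bonus."""
    total_visits = sum(e.visits for e in edges.values())

    def score(move: str) -> float:
        e = edges[move]
        # Assumption: "+ 1" keeps the bonus non-zero before any simulation has run.
        bonus = c_explore * e.prior * math.sqrt(total_visits + 1) / (1 + e.visits)
        return e.mean_value + bonus

    return max(edges, key=score)


if __name__ == "__main__":
    edges = {"a": Edge(prior=0.9), "b": Edge(prior=0.1)}
    print(select_move(edges))          # the prior dominates at first
    edges["a"].visits, edges["a"].total_value = 10, -10.0
    print(select_move(edges))          # poor evaluations shift the choice
```

The design choice to note is that the prior only steers early exploration; once a move has been simulated often, its averaged evaluations dominate the score.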
Starting Point: Neural Networks
Deep Neural Networks
Recent advances in Machine Learning (Artificial Intelligence)
⇒ Deep Learning: Deep/Convolutional Neural Networks
improve performance for pattern recognition applications in computer vision
construct increasingly abstract and localized representations of image data
Core idea for designing the AlphaGo ML system ⇒ employ a similar architecture/model for the game of Go
Example of Convolutional Neural Network (1/2): Modeling and Training/Learning
Example of a Convolution Kernel
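As a rough illustration of what a convolution kernel does, the sketch below slides a small kernel over a 2D grid and sums the element-wise products at each position. It uses the "valid" sliding-window form without flipping the kernel (strictly a cross-correlation, the form commonly used in convolutional networks). The image and kernel values are made up, and `convolve2d` is a hypothetical helper name.

```python
from typing import List

Grid = List[List[float]]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    # Made-up 3x4 image with a vertical edge between columns 1 and 2.
    image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
    edge_kernel = [[-1, 1]]   # responds where intensity increases left to right
    for row in convolve2d(image, edge_kernel):
        print(row)
```

The output is non-zero only at the edge, which is the sense in which a kernel builds a localized representation of the input.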
Convolutional Neural Network (2/2): Prediction/Testing
Samoyed 16; Papillon 5.7; Pomeranian 2.7; Arctic fox 1.0; Eskimo dog 0.6; white wolf 0.4; Siberian husky 0.4
AlphaGo in a Nutshell
Deep learning architecture
⇒ Picture the board configuration as a 19 × 19 image
⇒ Use convolutional neural networks to build a representation of the board configuration
The consideration of deep neural networks aims at reducing the depth and breadth of the search tree:
evaluating board configurations and predicting game outcomes via value networks (→ depth of the search tree)
sampling possible moves from policy networks (→ breadth of the search tree)
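Picturing the board as an image can be sketched by encoding it as a stack of 19 × 19 binary planes, one per "channel". The three planes used here (own stones, opponent stones, empty points) are an assumption for illustration only and are not the feature set of the actual system; the names are hypothetical.

```python
from typing import List

SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2

Plane = List[List[int]]


def board_to_planes(board: List[List[int]], to_play: int) -> List[Plane]:
    """Encode a board as three binary 19x19 planes: own, opponent, empty."""
    opponent = WHITE if to_play == BLACK else BLACK

    def plane(colour: int) -> Plane:
        return [[1 if board[r][c] == colour else 0 for c in range(SIZE)]
                for r in range(SIZE)]

    # Assumption: three planes only; a real system may use many more features.
    return [plane(to_play), plane(opponent), plane(EMPTY)]


if __name__ == "__main__":
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    board[3][3] = BLACK
    board[15][15] = WHITE
    planes = board_to_planes(board, BLACK)
    print(len(planes), len(planes[0]), len(planes[0][0]))
```

A stack of planes like this is the kind of input a convolutional network can consume in the same way it consumes the colour channels of a picture.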
AlphaGo Deep Neural Networks Models (1/2)
Value Network (→ reduces tree depth)
takes an image representation of the board configuration as input
passes it to a convolutional neural network model (estimated by regression)
outputs (numerical) approximate value of the optimal value function
Value → predicts the expected game outcome for a given board configuration
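The input/output contract of a value model, and the regression error it is fitted against, can be sketched as follows. A single linear unit squashed by tanh stands in for the convolutional network here; that substitution, the feature vector and the function names are all assumptions for illustration.

```python
import math
from typing import Sequence


def predict_value(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map board features to a value in [-1, 1] (expected game outcome).

    Assumption: a linear model followed by tanh replaces the convolutional network.
    """
    return math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)


def squared_error(prediction: float, outcome: float) -> float:
    """Regression error against the actual game outcome (+1 win, -1 loss)."""
    return (outcome - prediction) ** 2


if __name__ == "__main__":
    features = [1.0, 0.0, 1.0]    # made-up features of a board configuration
    weights = [0.5, -0.2, 0.3]    # made-up parameters
    v = predict_value(features, weights, bias=0.0)
    print(v, squared_error(v, outcome=1.0))
```

Training by regression then means adjusting the parameters so that this squared error is small on average over many recorded positions and outcomes.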
AlphaGo Deep Neural Networks Models (2/2)
Policy Network (→ reduces tree breadth)
takes an image representation of the board configuration as input
passes it to a convolutional neural network model (estimated by supervised learning or by reinforcement learning)
outputs a probability distribution for sampling efficient moves given the board configuration
Policy → probability map over the board for sampling efficient moves
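One common way to turn raw per-move scores into a probability map is a softmax restricted to legal moves. The sketch below assumes the scores come from some upstream model; the move names, scores and function name are made up for illustration.

```python
import math
from typing import Dict, Set


def policy_from_scores(scores: Dict[str, float], legal: Set[str]) -> Dict[str, float]:
    """Softmax over the scores of legal moves; illegal moves get no probability."""
    legal_scores = {m: s for m, s in scores.items() if m in legal}
    top = max(legal_scores.values())          # subtract the max for numerical stability
    exps = {m: math.exp(s - top) for m, s in legal_scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


if __name__ == "__main__":
    scores = {"D4": 2.0, "Q16": 1.0, "K10": 5.0}   # made-up scores
    legal = {"D4", "Q16"}                          # suppose K10 is occupied
    print(policy_from_scores(scores, legal))
```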
AlphaGo ML Training/Learning Global Scheme/Pipeline
Reinforcement Learning Framework (1/2)
Reinforcement Learning Framework (2/2)
Reinforcement learning goal: optimize rewards by choosing actions adequately for given observations ⇒ actions are chosen from policies
Reinforcement Learning for Computer Go
AlphaGo Reinforcement Learning Framework
⇒ The reinforcement learning policy network optimizes the final outcome of games of self-play against its previous versions
(Reinforcement learning combined with deep neural networks is also efficient for learning how to play classic video games!)
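Optimizing a policy for the final outcome can be sketched with a REINFORCE-style policy-gradient update, used here as a simple stand-in. The setting is a deliberate simplification and an assumption: a one-step toy problem where each action leads directly to a fixed outcome (+1 win, -1 loss) replaces full games of self-play, and a table of logits replaces the network.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(outcomes: Sequence[float], episodes: int = 200,
              lr: float = 0.1, seed: int = 0) -> List[float]:
    """Train a softmax policy so that actions with better final outcomes become likelier.

    outcomes[a] is the final reward of action a (toy assumption: deterministic).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(outcomes)
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
        reward = outcomes[action]
        for a in range(len(logits)):
            # Gradient of log pi(action) with respect to logit a.
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce([-1.0, 1.0]))   # action 1 always wins, action 0 always loses
```

Only the final outcome enters the update, which mirrors the slide's point that the policy is tuned for winning rather than for imitating individual moves.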
Take Away Messages | References
Key/Take Away Messages (1/2)
Game of Go, Complexity for Computer Go
Tractable in theory but quite complex in practice
→ searching in a tree of ≈ 10^360 sequences of moves...
AlphaGo ML System Core Idea
Picture the board configurations as images and use deep neural networks to build an approximate search tree that is easier to solve
⇒ Performing training/learning efficiently requires:
ad hoc and efficient algorithms
massive datasets: 30M expert moves to initialize the reinforcement learning policy network for the games vs Fan Hui
huge computational resources: 1202 CPUs + 176 GPUs for playing the games vs European Go champion Fan Hui
Key/Take Away Messages (2/2)
Deep Neural Networks in AlphaGo ML System
Aim at reducing depth and breadth in the original search tree:
by evaluating board configurations via value networks (→ predicting the outcomes of the games)
by sampling game moves from policy networks (computed in particular with reinforcement learning)
AlphaGo → Computer Go → Artificial Intelligence
Playing Go is a very specific task, with two favorable properties:
possibility to generate games and to perform self-play
stationary problem: game rules do not change over time (as in computer vision and natural language processing)
⇒ but general AI still remains an open and hard problem!
AlphaGo and Beyond... (1/2)
David Silver et al. (2016)
Mastering the game of Go with deep neural networks and tree search
Nature (529), 484 – 489, 28 January 2016
Volodymyr Mnih et al. (2015) (→ “video games”)
Human-level control through deep reinforcement learning
Nature (518), 529 – 533, 26 February 2015
Richard Sutton and Andrew Barto (1998)
Reinforcement learning: an introduction
MIT Press, 1998
AlphaGo and Beyond... (2/2)
Yann LeCun et al. (1990)
Handwritten digit recognition with a back-propagation network
In Proc. of NIPS, 396 – 404, 1990
Geoffrey Hinton, Simon Osindero and Yee-Whye Teh (2006)
A fast learning algorithm for deep belief nets
Neural Computation 18(7), 1527 – 1554, 2006
Yann LeCun, Yoshua Bengio and Geoffrey Hinton (2015)
Deep learning (→ “review”)
Nature (521), 436 – 444, 28 May 2015
Thank you!
Thanks for your attention!
Questions?
Credits: Anaelle Laurans, Vincent Lemaire, Henri Sanson, Mikael Touati @ Orange Labs and Demis Hassabis @ DeepMind!
This work is supported by the collaborative research projects ANR NETLEARN (ANR-13-INFR-0004) and EU H2020 5G-PPP COGNET