ICS-270a:Notes 5: 1
Notes 5: Game-Playing
ICS 270a Winter 2003
Summary
• Computer programs which play 2-player games
– game-playing as search
– with the complication of an opponent
• General principles of game-playing and search
– evaluation functions
– minimax principle
– alpha-beta-pruning
– heuristic techniques
• Status of Game-Playing Systems
– in chess, checkers, backgammon, Othello, etc., computers routinely defeat leading world players
• Applications?
– think of “nature” as an opponent
– economics, war-gaming, medical drug treatment
Chess Rating Scale
[Figure: chess ratings (vertical axis, 1200 to 3000) plotted against year (horizontal axis, 1966 to 1997), marking Garry Kasparov (current World Champion), Deep Blue, and Deep Thought.]
Solving 2-Player Games
• Two players, perfect information
• Examples:
– chess, checkers, tic-tac-toe
• configuration of the board = unique arrangement of “pieces”
• Statement of Game as a Search Problem
– States = board configurations
– Operators = legal moves
– Initial State = current configuration
– Goal = winning configuration
– payoff function = gives numerical value of outcome of the game
• A working example: Grundy's game
– Given a pile of coins, each player in turn takes one pile and divides it into two unequal piles. The player who plays last loses.
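As a rough illustration of the state/operator formulation above, the sketch below represents a position in Grundy's game as a sorted tuple of pile sizes and enumerates the legal moves from it. The function name `grundy_moves` and the tuple representation are assumptions made for this example; they do not come from the notes.

```python
def grundy_moves(state):
    """Return the set of positions reachable in one move.

    A state is a sorted tuple of pile sizes, e.g. (7,) for one pile of 7 coins.
    A move splits one pile into two unequal, non-empty piles.
    """
    successors = set()
    for i, pile in enumerate(state):
        rest = state[:i] + state[i + 1:]
        # a is the smaller part; the range bound keeps a < pile - a (unequal parts)
        for a in range(1, (pile - 1) // 2 + 1):
            successors.add(tuple(sorted(rest + (a, pile - a))))
    return successors


print(sorted(grundy_moves((7,))))  # prints [(1, 6), (2, 5), (3, 4)]
```

A position with no successors, such as `(1, 2, 2)`, is terminal: piles of size 1 or 2 cannot be split into two unequal parts.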
Game Tree Representation
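• New aspect to the search problem
– there’s an opponent we cannot control
– how can we handle this?
[Figure: game tree rooted at the start state S; successive levels are labeled Computer Moves, Opponent Moves, Computer Moves; G marks a possible goal state lower in the tree (a winning situation for the computer).]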
Game Trees
Grundy’s Game
• Search Tree: represents Max moves.
• Goal: evaluate the root node, i.e., the value of the game
• 0 - loss
• 1 - win
• In complex games search to termination is impossible. Rather:
• Find a first good move.
• Do it, wait for Min’s response
• Find a good move from new state
Grundy’s game - special case of nim
An optimal procedure: The Min-Max method
• Designed to find the optimal strategy for Max and find best move:
– 1. Generate the whole game tree to leaves
– 2. Apply utility (payoff) function to leaves
– 3. Back-up values from leaves toward the root:
• a Max node computes the max of its child values
• a Min node computes the min of its child values
– 4. When value reaches the root: choose max value and the corresponding move.
• However: for most games it is impossible to develop the whole search tree. Instead, develop part of the tree and estimate the promise of its leaves using a static evaluation function.
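The four steps above can be sketched on a small, fully expanded tree. In this illustration a leaf is a number (its payoff for Max) and an internal node is a list of children; that encoding, the function names, and the example payoffs are assumptions made for the sketch.

```python
def minimax(node, is_max):
    """Back up values over a fully expanded game tree.

    A leaf is a number (payoff for Max); an internal node is a list of children.
    """
    if not isinstance(node, list):
        return node  # step 2: payoff at a leaf
    child_values = [minimax(child, not is_max) for child in node]
    # step 3: Max nodes take the max of their children, Min nodes the min
    return max(child_values) if is_max else min(child_values)


def best_move(root):
    """Step 4: index of the root child with the highest backed-up value."""
    values = [minimax(child, False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


# Max moves at the root, Min replies, then the game ends with the given payoffs.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), best_move(tree))  # prints 3 0
```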
Complexity of Game Playing
• Imagine we could predict the opponent’s moves given each computer move
• How complex would search be in this case?
– worst case, it will be O(b^d)
– Chess:
• b ~ 35 (average branching factor)
• d ~ 100 (depth of game tree for typical game)
• b^d ~ 35^100 ~ 10^154 nodes!!
– Tic-Tac-Toe
• ~5 legal moves, total of 9 moves
• 5^9 = 1,953,125
• 9! = 362,880 (Computer goes first)
• 8! = 40,320 (Computer goes second)
• well-known games can produce enormous search trees
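The figures above are easy to check with exact integer arithmetic. The variable names in this short sketch are chosen only for illustration.

```python
import math

chess_nodes = 35 ** 100      # b^d with b ~ 35, d ~ 100
tictactoe_bound = 5 ** 9     # ~5 legal moves over 9 moves

# Number of decimal digits minus one gives the power of ten.
chess_power_of_ten = len(str(chess_nodes)) - 1

print(chess_power_of_ten)                       # prints 154
print(tictactoe_bound)                          # prints 1953125
print(math.factorial(9), math.factorial(8))     # prints 362880 40320
```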
Static (Heuristic) Evaluation Functions
• An Evaluation Function:
– estimates how good the current board configuration is for a player.
– Typically, one figures how good it is for the player, and how good it is for the opponent, and subtracts the opponent’s score from the player’s
– Othello: Number of white pieces - Number of black pieces
– Chess: Value of all white pieces - Value of all black pieces
• Typical values range from -infinity (loss) to +infinity (win), or over [-1, +1].
• If the board evaluation is X for a player, it’s -X for the opponent
• Example:
– Evaluating chess boards,
– Checkers
– Tic-tac-toe
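A minimal sketch of the chess material evaluation mentioned above follows. The piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional textbook weights and are an assumption here, as are the letter encoding of pieces and the name `material_score`; the notes do not specify any of them.

```python
# Conventional material weights (an assumption; the notes give no numbers).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_score(pieces):
    """Value of all white pieces minus value of all black pieces.

    pieces: iterable of piece letters, uppercase for White, lowercase for Black.
    """
    score = 0
    for piece in pieces:
        value = PIECE_VALUES[piece.lower()]
        score += value if piece.isupper() else -value
    return score


position = "KQRPPkrrpp"                     # hypothetical set of pieces on the board
print(material_score(position))             # prints 4
print(material_score(position.swapcase()))  # prints -4
```

Swapping the colors negates the score, which matches the point that an evaluation of X for one player is -X for the opponent.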
General Minimax Procedure on a Game Tree
For each move:
1. expand the game tree as far as possible
2. assign state evaluations at each open node
3. propagate the minimax choices upwards
– if the parent is a Min node (opponent), propagate up the minimum value of the children
– if the parent is a Max node (computer), propagate up the maximum value of the children
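The procedure above might be sketched as a depth-limited search that applies the evaluation function at the search frontier. The callables `successors` and `evaluate`, the toy tree, and its scores are all hypothetical stand-ins for a real game.

```python
def minimax_value(state, depth, is_max, successors, evaluate):
    """Depth-limited minimax: expand `depth` plies, evaluate the open nodes,
    and propagate min/max values back up."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)  # step 2: evaluate an open (frontier) node
    values = [minimax_value(c, depth - 1, not is_max, successors, evaluate)
              for c in children]
    # step 3: Max (computer) takes the maximum, Min (opponent) the minimum
    return max(values) if is_max else min(values)


# Hypothetical toy game: a tree of named states with static scores.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
SCORES = {"A": 0, "B": 5, "C": 1, "D": 3, "E": 9, "F": 7, "G": 4}


def toy_successors(state):
    return TREE.get(state, [])


def toy_evaluate(state):
    return SCORES[state]


print(minimax_value("A", 1, True, toy_successors, toy_evaluate))  # prints 5
print(minimax_value("A", 2, True, toy_successors, toy_evaluate))  # prints 4
```

Searching one ply deeper changes the backed-up value from 5 to 4, because the static score of B overestimates what Min will allow there.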
Minimax Principle
• “Assume the worst”
– say each configuration has an evaluation number
– high numbers favor the player (the computer)
• so we want to choose moves which maximize evaluation
– low numbers favor the opponent
• so they will choose moves which minimize evaluation
• Minimax Principle
– you (the computer) assume that the opponent will choose the minimizing move next (after your move)
– so you now choose the best move under this assumption
• i.e., the maximum (highest-value) option considering both your move and the opponent’s optimal move.
– we can extend this argument more than 2 moves ahead: we can search ahead as far as we can afford.
Applying MiniMax to tic-tac-toe
• The static evaluation function heuristic
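The slide's figure is not reproduced in the text, so the exact heuristic is not known here. One commonly used choice, assumed for this sketch, scores a board as the number of rows, columns, and diagonals still open for Max minus the number still open for Min. The board encoding (a 9-character string of `X`, `O`, and `.`) and the function names are also assumptions.

```python
# The 8 winning lines of the 3x3 board, as index triples into a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def open_lines(board, player):
    """Count lines that contain no piece of the player's opponent."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))


def evaluate(board, player="X"):
    """Open lines for `player` minus open lines for the other side."""
    other = "O" if player == "X" else "X"
    return open_lines(board, player) - open_lines(board, other)


print(evaluate("....X...."))  # X in the center: 8 - 4, prints 4
print(evaluate("X........"))  # X in a corner: 8 - 5, prints 3
```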
Backup Values
Pruning with Alpha/Beta
• In Min-Max there is a separation between node generation and evaluation.
Backup Values
Alpha Beta Procedure
• Idea:
– Do depth-first search to generate a partial game tree,
– Apply the static evaluation function to the leaves,
– Compute bounds on the internal nodes.
• Alpha, Beta bounds:
– The alpha value of a Max node means that Max’s true value is at least alpha.
– The beta value of a Min node means that Min can guarantee a value of at most beta.
• Computation:
– Alpha of a Max node is the maximum value of its children seen so far.
– Beta of a Min node is the lowest value of its children seen so far.
When to Prune
• Pruning
– Below a Min node whose beta value is lower than or equal to the alpha value of any of its Max node ancestors.
– Below a Max node whose alpha value is greater than or equal to the beta value of any of its Min node ancestors.
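The pruning rules above can be sketched on an explicit tree, where a leaf is a number and an internal node is a list of children. This is one standard way to write alpha-beta, not necessarily the formulation used in the lecture; the `visited` list is an addition for illustration only, recording which leaves were actually evaluated.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node` with alpha-beta pruning.

    alpha: best value Max can already guarantee on the path to this node.
    beta:  best (lowest) value Min can already guarantee on that path.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record the evaluated leaf
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # a Min ancestor will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:  # a Max ancestor already has something at least as good
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen))  # prints 3
print(seen)                                 # prints [3, 12, 8, 2, 14, 5, 2]
```

In the second Min node the first leaf (2) already falls below the root's alpha of 3, so the leaves 4 and 6 are never evaluated, yet the result equals the plain minimax value.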
Effectiveness of Alpha-Beta Search
• Worst-Case
– branches are ordered so that no pruning takes place. In this case alpha-beta gives no improvement over exhaustive search
• Best-Case
– each player’s best move is the left-most alternative (i.e., evaluated first)
– in practice, performance is closer to best rather than worst-case
• In practice often get O(b^(d/2)) rather than O(b^d)
– this is the same as having a branching factor of sqrt(b),
• since (sqrt(b))^d = b^(d/2)
• i.e., we have effectively gone from b to square root of b
– e.g., in chess go from b ~ 35 to b ~ 6
• this permits much deeper search in the same amount of time
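To make the effect of the reduced branching factor concrete, the sketch below computes how deep a search can go for a fixed node budget. The budget of 10^9 nodes is a hypothetical figure chosen only for illustration, as is the helper name `reachable_depth`.

```python
import math

b = 35                       # average branching factor for chess, from the notes
effective_b = math.sqrt(b)   # effective branching factor with well-ordered alpha-beta


def reachable_depth(branching, node_budget):
    """Depth d at which branching**d equals the node budget."""
    return math.log(node_budget) / math.log(branching)


budget = 10 ** 9  # hypothetical number of nodes we can afford per move

print(round(effective_b, 2))
print(round(reachable_depth(b, budget), 1))
print(round(reachable_depth(effective_b, budget), 1))
```

Because log(sqrt(b)) is half of log(b), the reachable depth exactly doubles for any budget.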
Iterative (Progressive) Deepening
• In real games, there is usually a time limit T on making a move
• How do we take this into account?
– using alpha-beta we cannot use “partial” results with any confidence unless the full breadth of the tree has been searched
– So, we could set a conservative depth-limit which guarantees that we will find a move in time < T
• disadvantage is that we may finish early, could do more search
• In practice, iterative deepening search (IDS) is used
– IDS runs depth-first search with an increasing depth-limit
– when the clock runs out we use the solution found at the previous depth limit
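A minimal sketch of iterative deepening under a time limit follows. For brevity it uses plain depth-limited minimax rather than alpha-beta; the callables `successors` and `evaluate`, the `OutOfTime` exception, the `max_depth` cap, and the toy tree are all assumptions made for the example.

```python
import time


class OutOfTime(Exception):
    """Raised inside the search when the deadline has passed."""


def search(state, depth, is_max, successors, evaluate, deadline):
    if time.monotonic() > deadline:
        raise OutOfTime
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    values = [search(c, depth - 1, not is_max, successors, evaluate, deadline)
              for c in children]
    return max(values) if is_max else min(values)


def iterative_deepening(state, successors, evaluate, time_limit, max_depth=50):
    """Return the best successor found at the deepest fully completed depth."""
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        try:
            scored = [(search(c, depth - 1, False, successors, evaluate, deadline), c)
                      for c in successors(state)]
        except OutOfTime:
            break  # discard the unfinished depth, keep the previous answer
        if scored:
            best = max(scored, key=lambda pair: pair[0])[1]
    return best


# Hypothetical toy game: a tree of named states with static scores.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
SCORES = {"A": 0, "B": 5, "C": 1, "D": 3, "E": 9, "F": 7, "G": 4}


def toy_successors(state):
    return TREE.get(state, [])


def toy_evaluate(state):
    return SCORES[state]


print(iterative_deepening("A", toy_successors, toy_evaluate, 5.0, max_depth=3))  # prints C
```

A result from an interrupted depth is thrown away on purpose: only part of the tree's breadth has been searched at that depth, so its partial answer cannot be trusted.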
Heuristics and Game Tree Search
• The Horizon Effect
– sometimes there’s a major “effect” (such as a piece being captured) which is just “below” the depth to which the tree has been expanded
– the computer cannot see that this major event could happen
– it has a “limited horizon”
– there are heuristics to try to follow certain branches more deeply to detect such important events
– this helps to avoid catastrophic losses due to “short-sightedness”
• Heuristics for Tree Exploration
– it may be better to explore some branches more deeply in the allotted time
– various heuristics exist to identify “promising” branches
Computers can play GrandMaster Chess
• “Deep Blue” (IBM)
– parallel processor, 32 nodes
– each node has 8 dedicated VLSI “chess chips”
– the system as a whole can search about 200 million configurations/second
– uses minimax, alpha-beta, heuristics: can search to depth 14
– memorizes starts, end-games
– power based on speed and memory: no common sense
• Kasparov v. Deep Blue, May 1997
– 6 game full-regulation chess match (sponsored by ACM)
– Kasparov lost the match (2.5 to 3.5)
– a historic achievement for computer chess: the first time a computer is the best chess-player on the planet
• Note that Deep Blue plays by “brute-force”: there is relatively little which is similar to human intuition and cleverness
Status of Computers in Other Games
• Checkers/Draughts
– current world champion is Chinook, can beat any human
– uses alpha-beta search
• Othello
– computers can easily beat the world experts
• Backgammon
– system which learns is ranked in the top 3 in the world
– uses neural networks to learn from playing many many games against itself
• Go
– branching factor b ~ 360: very large!
– $2 million prize for any system which can beat a world expert
Summary
• Game playing is best modeled as a search problem
• Game trees represent alternate computer/opponent moves
• Evaluation functions estimate the quality of a given board configuration for the Max player.
• Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them
• Alpha-Beta is a procedure which can prune large parts of the search tree and allow search to go deeper
• For many well-known games, computer algorithms based on heuristic search match or out-perform human world experts.
• Reading: Nilsson Chapter 12, R&N Chapter 5.
Minimax Search Example
• Look ahead several turns (we’ll use 2 for now)
• Evaluate resulting board configurations
• The computer makes the move for which, after the opponent’s best reply, the resulting board configuration is best for the computer
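A two-turn lookahead of this kind can be sketched directly: for each computer move, take the worst evaluation over the opponent's replies, then pick the move whose worst case is highest. The move names and evaluation numbers below are made up for illustration.

```python
def choose_move(replies):
    """Pick the computer's move by two-ply lookahead.

    replies maps each computer move to the evaluations of the boards
    reached after each possible opponent reply.
    """
    # The opponent is assumed to pick the reply that is worst for the computer.
    worst_case = {move: min(values) for move, values in replies.items()}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case


# Hypothetical evaluations after (computer move, opponent reply).
replies = {"a": [3, 12, 8], "b": [2, 4, 6], "c": [14, 5, 2]}
print(choose_move(replies))  # prints ('a', {'a': 3, 'b': 2, 'c': 2})
```

Move `c` contains the single best outcome (14), but the opponent would never allow it, so the computer prefers `a`, whose guaranteed value is 3.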
Propagating Minimax Values up the Game Tree
• Starting from the leaves
– Assign a value to the parent node as follows
• Children are Opponent’s moves: Minimum of all immediate children
• Children are Computer’s moves: Maximum of all immediate children
Deeper Game Trees