Practical techniques for agents playing multi-player games


Page 1: Practical techniques for agents playing multi-player games

Practical techniques for agents playing multi-player games

Page 2: Practical techniques for agents playing multi-player games

Quiz: Complexity of Minimax

Chess has an average branching factor of ~30, and each game takes on average ~40 moves.

If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?
• A few milliseconds?
• A few seconds?
• A few minutes?
• A few hours?
• A few days?
• A few years?
• A few decades?
• A few millennia (thousands of years)?
• More time than the age of the universe?

Page 3: Practical techniques for agents playing multi-player games

Answer: Complexity of Minimax

Chess has an average branching factor of ~30, and each game takes on average ~40 moves.

If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?

Answer: more time than the age of the universe.
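As a rough sanity check of that answer, here is a back-of-the-envelope sketch in Python, using the slide's approximate figures b ≈ 30 and m ≈ 40 and assuming ~1 ms per position:

    b = 30                      # average branching factor (from the slide)
    m = 40                      # average game length in moves (from the slide)
    total_ms = b ** m           # board positions in the full game tree, at ~1 ms each
    total_years = total_ms / (1000 * 60 * 60 * 24 * 365)
    print(f"{total_years:.1e} years")    # roughly 3.9e48 years
    print(total_years / 1.38e10)         # ~2.8e38 times the age of the universe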

Page 4: Practical techniques for agents playing multi-player games

Strategies for coping with complexity

• Reduce b
• Reduce m
• Memoize

Page 5: Practical techniques for agents playing multi-player games

Reduce b: Alpha-beta pruning

[Game-tree figure: a Max (∆) root with three Min (∇) children, each with several leaf values.]

During Minimax search (assume depth-first, left-to-right order): first we get a 6 for the left-most child of the root. For the middle child of the root, the first child is a 1. The agent can stop searching the middle child after this 1.


Page 6: Practical techniques for agents playing multi-player games

Reduce b: Alpha-beta pruning

[Same game-tree figure; the middle Min child is marked '<= 1'.]

The agent can stop searching the middle child after this 1. The reason is that this is a Min node, and by finding a 1, we've already guaranteed that Min would select AT MOST a 1. So we've guaranteed that Max would not select this child, and we can move on.


Page 7: Practical techniques for agents playing multi-player games

Quiz: Reduce b: Alpha-beta pruning

[Same game-tree figure, with the middle Min child marked '<= 1'.]

What other nodes will be visited, if the agent continues with this technique?

What will be the values of those nodes?


Page 8: Practical techniques for agents playing multi-player games

Answer: Reduce b: Alpha-beta pruning

[Same game-tree figure.]

What other nodes will be visited, if the agent continues with this technique?

What will be the values of those nodes?

[Answer annotations on the tree: the middle Min child remains bounded at '<= 1'; under the right Min child a leaf with value 3 is visited, bounding that child at '<= 3', so its remaining leaves can also be pruned.]

Page 9: Practical techniques for agents playing multi-player games

Quiz: Reduce b: Alpha-beta pruning

[Game-tree figure of the same shape, with slightly different leaf values for this quiz.]

Suppose the algorithm visits nodes depth-first, but Right-to-Left.

What nodes will be visited, and what are the values of those nodes?

Page 10: Practical techniques for agents playing multi-player games

Answer: Reduce b: Alpha-beta pruning

[Same game-tree figure as the previous slide.]

Going right-to-left in this tree, there are fewer opportunities for pruning: effects of pruning depend on the values in the tree.

On average, this technique tends to cut branching factors down to their square root (from b to √b).

[Answer annotations on the tree showing the node values and bounds found during the right-to-left search.]

Page 11: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same style of game-tree figure.]

Suppose searching to a depth of m = 3 is just too expensive. What we'll do instead is introduce a horizon (h), or cutoff. For this example, we'll let h = 2. No nodes will be visited beyond the horizon.

Page 12: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same game-tree figure, with the nodes at the horizon marked '?'.]

Problem: how do we determine the value of non-terminal nodes at the horizon?

The general answer is to introduce evaluation functions, which estimate (or guess) the value of a node.


Page 13: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same game-tree figure, with the nodes at the horizon marked '?'.]

Let E(n) = w1·f1(n) + w2·f2(n) + … + wk·fk(n).

Each fi(n) is a "feature function" of the node that returns some real number describing the node in some way. Each wi is a real-number weight or parameter. One common way to create E(n) is to get game experts to come up with appropriate fi and wi.

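As a concrete illustration, here is a minimal Python sketch of such a linear evaluation function. The feature functions, weights, and the board attributes they read (my_pieces, opp_pieces, legal_moves) are made-up placeholders, not something from the slides:

    # Minimal sketch of a linear evaluation function E(n) = w1*f1(n) + ... + wk*fk(n).
    # The features, weights, and board attributes below are illustrative placeholders.

    def f_material(board):
        return board.my_pieces - board.opp_pieces   # piece-count advantage

    def f_mobility(board):
        return len(board.legal_moves())             # how many moves are available

    features = [f_material, f_mobility]
    weights = [1.0, 0.1]    # chosen by experts, or learned from data (later slides)

    def evaluate(board):
        return sum(w * f(board) for w, f in zip(weights, features))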

Page 14: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

As an example, one possible fi function for Hex could be the shortest path to a solution for Red, minus the shortest path to a solution for Blue.

fi = shortest path for red – shortest path for blue = 2 - 1 = 1

Page 15: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

If Red is Max, we can assign wi = -1.

This encodes the intuition that if Red has a longer shortest path than Blue, then this is a bad position for Red.

fi = shortest path for red – shortest path for blue = 2 - 1 = 1

Page 16: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

Can you think of some other potential fi for Hex?

Notice, the important thing is that fi should be correlated with Value(n).

Page 17: Practical techniques for agents playing multi-player games

Learning an evaluation function

Experts are often good at coming up with fi functions.

But it’s often hard for a game expert (or anyone) to come up with the best wi weights for E(n).

An alternative approach is to create an algorithm to learn the wi weights from data.

Page 18: Practical techniques for agents playing multi-player games

What's the data?

To do machine learning, you need data that contains inputs and labels.

For an evaluation function, that means board positions and values.

But we’ve already said that it’s hard to figure out the right value for many board positions – that’s the whole point of the evaluation function in the first place.

Instead of asking people to label boards with values, a common approach is to have a simulation of the agent playing against itself.

The outcome of the game is used as the value of the board positions along the way.
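A minimal sketch of that idea in Python; initial_board, game_over, pick_move, and outcome are hypothetical stand-ins for a real game implementation:

    # Collect (board, value) training pairs by having the agent play against itself.
    def self_play_game(agent):
        board = initial_board()
        positions = []
        while not game_over(board):
            positions.append(board)
            board = pick_move(agent, board)   # the agent chooses a move for the side to play
        result = outcome(board)               # e.g., +1 win, 0 draw, -1 loss from Max's point of view
        # Every position along the way gets labeled with the final outcome of the game.
        return [(pos, result) for pos in positions]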

Page 19: Practical techniques for agents playing multi-player games

Quiz: What’s the learning algorithm?

Once you’ve collected enough examples of board positions and values, there are lots of algorithms to do the learning.

For the kind of evaluation function I introduced, name some appropriate learning techniques that we’ve discussed.

Page 20: Practical techniques for agents playing multi-player games

Answer: What’s the learning algorithm?

Once you’ve collected enough examples of board positions and values, there are lots of algorithms to do the learning.

For the kind of evaluation function I introduced, name some appropriate learning techniques that we’ve discussed.

Two come to mind:
• Linear Regression
• Gradient Descent
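For instance, here is a minimal gradient-descent sketch in Python for fitting the weights of the linear evaluation function; data is a list of (board, value) pairs (e.g., from self-play) and features is the list of feature functions, both assumed to exist:

    # Fit weights w for E(n) = sum_i w_i * f_i(n) by gradient descent on squared error.
    def fit_weights(data, features, lr=0.01, epochs=100):
        w = [0.0] * len(features)
        for _ in range(epochs):
            for board, target in data:
                x = [f(board) for f in features]
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - target
                # The gradient of 0.5 * err**2 with respect to w_i is err * x_i.
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        return w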

Page 21: Practical techniques for agents playing multi-player games

Quiz: Horizon effect example

1. What is Red’s best move?

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

Page 22: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

If Red moves to either of these squares, Blue can easily block it by moving to the other one.

If Red moves to either of these squares, Blue can easily win by moving to the other one.

Page 23: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

If Red moves to any of these squares,

Blue can win by moving here.

Page 24: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

Red’s only chance is to move here first.

In fact, if Red does that, Red should win the game.

But – that’s hard to see, and requires seeing many moves in advance.

Page 25: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

This choice gives Red a shortest path of 3. Many of Blue's responses would decrease Blue's shortest path to 2, for a difference of 1 in favor of Blue.

Page 26: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

This choice gives Red a shortest path of 3. Many of Blue's responses would decrease Blue's shortest path to 2, for a difference of 1 in favor of Blue.

But that's basically the evaluation for many of Red's moves, so Red has no idea which move is best and must pick randomly.

Page 27: Practical techniques for agents playing multi-player games

Memoization

Memoization involves remembering (memorizing) certain good positions, strategies, or moves, to avoid doing a lot of search when such positions or moves come up. Some common examples (a minimal caching sketch follows this list):

• Opening book: a database of good positions for a particular player in the beginning phases of a game, as determined by game experts. (These are especially important for chess.)

• Closing book: a database of board positions that are close to the end of the game, together with the best possible strategy for completing the game from each position.

• Killer moves: a technique of remembering when some move in a game tree results in a big change to the game (e.g., someone's queen gets taken in chess). If this happens in one place in a game tree, it's a good idea to check for it in other branches of the tree as well.
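Here is the caching sketch promised above, a minimal Python illustration; board_key and full_search are hypothetical stand-ins (a real transposition table would also record whether a stored value is exact or just a bound from a pruned search):

    # Memoize computed board values so a repeated position costs O(1) instead of a new search.
    transposition_table = {}

    def value_memoized(board, depth, alpha, beta):
        key = board_key(board)                 # hypothetical hashable encoding of the position
        if key in transposition_table:
            return transposition_table[key]    # already analyzed: constant-time lookup
        v = full_search(board, depth, alpha, beta)  # whatever minimax/alpha-beta routine is in use
        transposition_table[key] = v
        return v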

Page 28: Practical techniques for agents playing multi-player games

Full Algorithm

Initial call: Value(root, 0, -∞, +∞).
α: best value for Max found so far. β: best value for Min found so far.

Value(n, depth, α, β):
- If n is a terminal node, return ∆'s utility.
- If depth >= cutoff, return E(n).
- If it is ∆'s (Max's) turn at n:
    v = -∞
    For each c ∈ Children(n):
        v = max(v, Value(c, depth+1, α, β))
        if v >= β: return v   (pruning step)
        α = max(α, v)
    Return v
- If it is ∇'s (Min's) turn at n:
    v = +∞
    For each c ∈ Children(n):
        v = min(v, Value(c, depth+1, α, β))
        if v <= α: return v   (pruning step)
        β = min(β, v)
    Return v
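For reference, the same algorithm as a minimal runnable Python sketch; is_terminal, utility, children, evaluate, and is_max_turn are hypothetical stand-ins for whatever game implementation is in use:

    import math

    # Depth-limited Minimax with alpha-beta pruning.
    def value(n, depth, alpha, beta, cutoff):
        if is_terminal(n):
            return utility(n)          # Max's (∆'s) utility at a terminal node
        if depth >= cutoff:
            return evaluate(n)         # evaluation function E(n) at the horizon
        if is_max_turn(n):
            v = -math.inf
            for c in children(n):
                v = max(v, value(c, depth + 1, alpha, beta, cutoff))
                if v >= beta:
                    return v           # pruning step
                alpha = max(alpha, v)
            return v
        else:
            v = math.inf
            for c in children(n):
                v = min(v, value(c, depth + 1, alpha, beta, cutoff))
                if v <= alpha:
                    return v           # pruning step
                beta = min(beta, v)
            return v

    # Initial call: value(root, 0, -math.inf, math.inf, cutoff)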

Page 29: Practical techniques for agents playing multi-player games

Benefits to Complexity

• Reduce b: O(b^m) → O(b^(m/2))
• Reduce m: O(b^m) → O(b^h), h << m
• Memoize: O(b^m) → O(1), if the board position has already been analyzed before

Note: alpha-beta pruning is an exact method: the best move using alpha-beta pruning is the same as the best move without it (normal Minimax).

Horizon cutoffs are approximate methods: you may get a bad choice of action if you run into a horizon effect.
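To make the first row concrete with the slide's rough chess figures (b ≈ 30, m ≈ 40): b^m = 30^40 ≈ 10^59 positions, while b^(m/2) = 30^20 ≈ 3.5 × 10^29, so with the same time budget alpha-beta can, in the best case, search roughly twice as deep.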

Page 30: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

Checkers:
• Red moves first.
• Each move is either
  - a diagonal move one square, or
  - a diagonal jump over an opponent's piece, which removes the opponent's piece.
• Multiple jumps are possible.
• If a jump is available to a player, the player must take it.
• Ordinarily, pieces can only move forward.
• If a piece gets to the opposite side of the board, it gets "crowned".
• A "crowned" piece can move forward or backwards.

Page 31: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

Chinook: Chinook is a computer program that plays Checkers.

1990: It became the first computer program to win the right to compete for a world championship in a game or sport. (It came second in the US National competition.)

The checkers governing body didn't like that, so the World Man-Machine Championship was created instead.

1994: Chinook wins the World Man-Machine Competition against Dr. Marion Tinsley, after Tinsley withdraws due to health problems.

1995: Chinook defended its title against Don Lafferty. After that, the program creators decided to retire Chinook.

2007: The program's creators proved that checkers played perfectly is a draw, so the best anyone can do against Chinook is a draw.

Page 32: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

The Chinook system:
• Minimax + alpha-beta pruning
• Hand-crafted evaluation function (no learning component): a linear function with features like:
  - number of pieces for each player
  - how many kings each player has
  - how many kings are "trapped"
  - how many pieces are "runaways" (nothing to stop them from being crowned)
• Opening move database from checkers experts
• End-game database that stores the best move from all positions with 8 or fewer pieces

Page 33: Practical techniques for agents playing multi-player games

Stochastic Games

Many games (Backgammon, Monopoly, World of Warcraft, etc.) involve some randomness.

An attack against another creature may or may not succeed, or the number of squares your piece is allowed to move may depend on a dice roll.

Page 34: Practical techniques for agents playing multi-player games

Stochastic Games: Giving Nature a turn

[Game-tree figure: Max (∆) and Min (∇) nodes as before, with chance nodes (marked '?') inserted and their branches numbered.]

In Stochastic games, we give “Nature” a turn in the game tree whenever it’s time for a dice roll or some random event.

We’ll represent Nature’s turn with a ?, and call these “Chance nodes”.

Page 35: Practical techniques for agents playing multi-player games

Stochastic Games: Giving Nature a turn

[Same game-tree figure with chance nodes.]

We’ll define the Value of a chance node to be the expected value of its children nodes.
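A minimal Python sketch of that definition; probability, children, and node_value are hypothetical stand-ins consistent with the earlier alpha-beta sketch:

    # Value of a chance node = expected value (probability-weighted average) of its children.
    def chance_value(n, depth, cutoff):
        return sum(probability(c) * node_value(c, depth + 1, cutoff) for c in children(n))

    # node_value would dispatch on the node type: Max node, Min node, chance node, or terminal/horizon.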

Page 36: Practical techniques for agents playing multi-player games

Quiz: Stochastic Games

[Same game-tree figure with chance nodes.]

Assume the branches of each chance node have equal probability.

What is the value of the root node for this game tree?

Page 37: Practical techniques for agents playing multi-player games

Answer: Stochastic Games

[Same game-tree figure with chance nodes.]

Assume the branches of each chance node have equal probability.

What is the value of the root node for this game tree?

[Answer annotations on the tree showing the computed node values; the value shown at the root is 2.667.]

Page 38: Practical techniques for agents playing multi-player games

Partially-Observable Games: Poker Example

Simple poker game: the deck contains 2 Ks and 2 As.

Each player is dealt one card.

1st round: P1 can raise (r) or check (k).
2nd round: P2 can call (c) or fold (f).

Page 39: Practical techniques for agents playing multi-player games

Partially-Observable Games: Poker Example

[Game-tree figure for the simple poker game: a chance node at the root deals the cards with probabilities 1/3: P1←K, P2←A; 1/6: both K; 1/3: P1←A, P2←K; 1/6: both A. Below it, P1 (∆) chooses raise (r) or check (k), then P2 (∇) responds, with the resulting payoffs at the leaves.]

Page 40: Practical techniques for agents playing multi-player games

Computing equilibria in games with imperfect information

If the game has perfect recall (meaning no player ever forgets their own past actions and observations):

For 2-player, zero-sum games: finding equilibria still amounts to linear programming. It's possible to compute equilibria in time polynomial in the size of the game tree.

For general-sum games: as hard as the general problem of finding Nash equilibria.

For games without perfect recall: hard even for zero-sum games.