Practical techniques for agents playing multi-player games


Page 1: Practical techniques for agents playing multi-player games

Practical techniques for agents playing multi-player games

Page 2: Practical techniques for agents playing multi-player games

Quiz: Complexity of Minimax

Chess has an average branching factor of ~30, and each game takes on average ~40 moves.

If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?
• A few milliseconds?
• A few seconds?
• A few minutes?
• A few hours?
• A few days?
• A few years?
• A few decades?
• A few millennia (thousands of years)?
• More time than the age of the universe?

Page 3: Practical techniques for agents playing multi-player games

Answer: Complexity of Minimax

Chess has an average branching factor of ~30, and each game takes on average ~40 moves.

If it takes ~1 millisecond to compute the value of each board position in the game tree, how long would it take to figure out the value of the game using Minimax?

Answer: more time than the age of the universe.
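As a rough sanity check of that answer, here is a back-of-the-envelope sketch in Python, using the slide's approximate figures b ≈ 30 and m ≈ 40 and assuming ~1 ms per position:

    b = 30                      # average branching factor (from the slide)
    m = 40                      # average game length in moves (from the slide)
    total_ms = b ** m           # board positions in the full game tree, at ~1 ms each
    total_years = total_ms / (1000 * 60 * 60 * 24 * 365)
    print(f"{total_years:.1e} years")    # roughly 3.9e48 years
    print(total_years / 1.38e10)         # ~2.8e38 times the age of the universe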

Page 4: Practical techniques for agents playing multi-player games

Strategies for coping with complexity

• Reduce b
• Reduce m
• Memoize

Page 5: Practical techniques for agents playing multi-player games

Reduce b: Alpha-beta pruning

[Game-tree figure: a Max (∆) root with three Min (∇) children, each with several leaf values.]

During Minimax search (assume depth-first, left-to-right order): first we get a 6 for the left-most child of the root. For the middle child of the root, the first child is a 1. The agent can stop searching the middle child after this 1.


Page 6: Practical techniques for agents playing multi-player games

Reduce b: Alpha-beta pruning

[Same game-tree figure; the middle Min child is marked '<= 1'.]

The agent can stop searching the middle child after this 1. The reason is that this is a Min node, and by finding a 1, we've already guaranteed that Min would select AT MOST a 1. So we've guaranteed that Max would not select this child, and we can move on.


Page 7: Practical techniques for agents playing multi-player games

Quiz: Reduce b: Alpha-beta pruning

[Same game-tree figure, with the middle Min child marked '<= 1'.]

What other nodes will be visited, if the agent continues with this technique?

What will be the values of those nodes?


Page 8: Practical techniques for agents playing multi-player games

Answer: Reduce b: Alpha-beta pruning

[Same game-tree figure.]

What other nodes will be visited, if the agent continues with this technique?

What will be the values of those nodes?

[Answer annotations on the tree: the middle Min child remains bounded at '<= 1'; under the right Min child a leaf with value 3 is visited, bounding that child at '<= 3', so its remaining leaves can also be pruned.]

Page 9: Practical techniques for agents playing multi-player games

Quiz: Reduce b: Alpha-beta pruning

[Game-tree figure of the same shape, with slightly different leaf values for this quiz.]

Suppose the algorithm visits nodes depth-first, but Right-to-Left.

What nodes will be visited, and what are the values of those nodes?

Page 10: Practical techniques for agents playing multi-player games

Answer: Reduce b: Alpha-beta pruning

[Same game-tree figure as the previous slide.]

Going right-to-left in this tree, there are fewer opportunities for pruning: effects of pruning depend on the values in the tree.

On average, this technique tends to cut branching factors down to their square root (from b to √b).

[Answer annotations on the tree showing the node values and bounds found during the right-to-left search.]

Page 11: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same style of game-tree figure.]

Suppose searching to a depth of m = 3 is just too expensive. What we'll do instead is introduce a horizon (h), or cutoff. For this example, we'll let h = 2. No nodes will be visited beyond the horizon.

Page 12: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same game-tree figure, with the nodes at the horizon marked '?'.]

Problem: how do we determine the value of non-terminal nodes at the horizon?

The general answer is to introduce evaluation functions, which estimate (or guess) the value of a node.


Page 13: Practical techniques for agents playing multi-player games

Reduce m: evaluation functions

[Same game-tree figure, with the nodes at the horizon marked '?'.]

Let E(n) = w1·f1(n) + w2·f2(n) + … + wk·fk(n).

Each fi(n) is a "feature function" of the node that returns some real number describing the node in some way. Each wi is a real-number weight or parameter. One common way to create E(n) is to get game experts to come up with appropriate fi and wi.

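As a concrete illustration, here is a minimal Python sketch of such a linear evaluation function. The feature functions, weights, and the board attributes they read (my_pieces, opp_pieces, legal_moves) are made-up placeholders, not something from the slides:

    # Minimal sketch of a linear evaluation function E(n) = w1*f1(n) + ... + wk*fk(n).
    # The features, weights, and board attributes below are illustrative placeholders.

    def f_material(board):
        return board.my_pieces - board.opp_pieces   # piece-count advantage

    def f_mobility(board):
        return len(board.legal_moves())             # how many moves are available

    features = [f_material, f_mobility]
    weights = [1.0, 0.1]    # chosen by experts, or learned from data (later slides)

    def evaluate(board):
        return sum(w * f(board) for w, f in zip(weights, features))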

Page 14: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

As an example, one possible fi function for Hex could be the shortest path to a solution for Red, minus the shortest path to a solution for Blue.

fi = shortest path for red – shortest path for blue = 2 - 1 = 1

Page 15: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

If Red is Max, we can assign wi = -1.

This encodes the intuition that if Red has a longer shortest path than Blue, then this is a bad position for Red.

fi = shortest path for red – shortest path for blue = 2 - 1 = 1

Page 16: Practical techniques for agents playing multi-player games

Hex Evaluation Function Example

Can you think of some other potential fi for Hex?

Notice, the important thing is that fi should be correlated with Value(n).

Page 17: Practical techniques for agents playing multi-player games

Learning an evaluation function

Experts are often good at coming up with fi functions.

But it’s often hard for a game expert (or anyone) to come up with the best wi weights for E(n).

An alternative approach is to create an algorithm to learn the wi weights from data.

Page 18: Practical techniques for agents playing multi-player games

What's the data?

To do machine learning, you need data that contains inputs and labels.

For an evaluation function, that means board positions and values.

But we’ve already said that it’s hard to figure out the right value for many board positions – that’s the whole point of the evaluation function in the first place.

Instead of asking people to label boards with values, a common approach is to have a simulation of the agent playing against itself.

The outcome of the game is used as the value of the board positions along the way.
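A minimal sketch of that idea in Python; initial_board, game_over, pick_move, and outcome are hypothetical stand-ins for a real game implementation:

    # Collect (board, value) training pairs by having the agent play against itself.
    def self_play_game(agent):
        board = initial_board()
        positions = []
        while not game_over(board):
            positions.append(board)
            board = pick_move(agent, board)   # the agent chooses a move for the side to play
        result = outcome(board)               # e.g., +1 win, 0 draw, -1 loss from Max's point of view
        # Every position along the way gets labeled with the final outcome of the game.
        return [(pos, result) for pos in positions]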

Page 19: Practical techniques for agents playing multi-player games

Quiz: What’s the learning algorithm?

Once you’ve collected enough examples of board positions and values, there are lots of algorithms to do the learning.

For the kind of evaluation function I introduced, name some appropriate learning techniques that we’ve discussed.

Page 20: Practical techniques for agents playing multi-player games

Answer: What’s the learning algorithm?

Once you’ve collected enough examples of board positions and values, there are lots of algorithms to do the learning.

For the kind of evaluation function I introduced, name some appropriate learning techniques that we’ve discussed.

Two come to mind:
• Linear Regression
• Gradient Descent
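For instance, here is a minimal gradient-descent sketch in Python for fitting the weights of the linear evaluation function; data is a list of (board, value) pairs (e.g., from self-play) and features is the list of feature functions, both assumed to exist:

    # Fit weights w for E(n) = sum_i w_i * f_i(n) by gradient descent on squared error.
    def fit_weights(data, features, lr=0.01, epochs=100):
        w = [0.0] * len(features)
        for _ in range(epochs):
            for board, target in data:
                x = [f(board) for f in features]
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - target
                # The gradient of 0.5 * err**2 with respect to w_i is err * x_i.
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        return w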

Page 21: Practical techniques for agents playing multi-player games

Quiz: Horizon effect example

1. What is Red’s best move?

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

Page 22: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

If Red moves to either of these squares, Blue can easily block it by moving to the other one.

If Red moves to either of these squares, Blue can easily win by moving to the other one.

Page 23: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

If Red moves to any of these squares,

Blue can win by moving here.

Page 24: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

1. What is Red’s best move?

Red’s only chance is to move here first.

In fact, if Red does that, Red should win the game.

But – that’s hard to see, and requires seeing many moves in advance.

Page 25: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

This choice gives Red a shortest path of 3. Many of Blue's responses would decrease Blue's shortest path to 2, for a difference of 1 in favor of Blue.

Page 26: Practical techniques for agents playing multi-player games

Answer: Horizon effect example

2. If we use a horizon of 2 and the “shortest-path” evaluation, what will Red choose?

This choice gives Red a shortest path of 3. Many of Blue's responses would decrease Blue's shortest path to 2, for a difference of 1 in favor of Blue.

But that's basically the evaluation for many of Red's moves, so Red has no idea which move is best and must pick randomly.

Page 27: Practical techniques for agents playing multi-player games

Memoization

Memoization involves remembering (memorizing) certain good positions, strategies, or moves, to avoid doing a lot of search when such positions or moves come up. Some common examples (a minimal caching sketch follows this list):

• Opening book: a database of good positions for a particular player in the beginning phases of a game, as determined by game experts. (These are especially important for chess.)

• Closing book: a database of board positions that are close to the end of the game, together with the best possible strategy for completing the game from each position.

• Killer moves: a technique of remembering when some move in a game tree results in a big change to the game (e.g., someone's queen gets taken in chess). If this happens in one place in a game tree, it's a good idea to check for it in other branches of the tree as well.
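Here is the caching sketch promised above, a minimal Python illustration; board_key and full_search are hypothetical stand-ins (a real transposition table would also record whether a stored value is exact or just a bound from a pruned search):

    # Memoize computed board values so a repeated position costs O(1) instead of a new search.
    transposition_table = {}

    def value_memoized(board, depth, alpha, beta):
        key = board_key(board)                 # hypothetical hashable encoding of the position
        if key in transposition_table:
            return transposition_table[key]    # already analyzed: constant-time lookup
        v = full_search(board, depth, alpha, beta)  # whatever minimax/alpha-beta routine is in use
        transposition_table[key] = v
        return v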

Page 28: Practical techniques for agents playing multi-player games

Full Algorithm

Initial call: Value(root, 0, -∞, +∞).
α: best value for Max found so far. β: best value for Min found so far.

Value(n, depth, α, β):
- If n is a terminal node, return ∆'s utility.
- If depth >= cutoff, return E(n).
- If it is ∆'s (Max's) turn at n:
    v = -∞
    For each c ∈ Children(n):
        v = max(v, Value(c, depth+1, α, β))
        if v >= β: return v   (pruning step)
        α = max(α, v)
    Return v
- If it is ∇'s (Min's) turn at n:
    v = +∞
    For each c ∈ Children(n):
        v = min(v, Value(c, depth+1, α, β))
        if v <= α: return v   (pruning step)
        β = min(β, v)
    Return v
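For reference, the same algorithm as a minimal runnable Python sketch; is_terminal, utility, children, evaluate, and is_max_turn are hypothetical stand-ins for whatever game implementation is in use:

    import math

    # Depth-limited Minimax with alpha-beta pruning.
    def value(n, depth, alpha, beta, cutoff):
        if is_terminal(n):
            return utility(n)          # Max's (∆'s) utility at a terminal node
        if depth >= cutoff:
            return evaluate(n)         # evaluation function E(n) at the horizon
        if is_max_turn(n):
            v = -math.inf
            for c in children(n):
                v = max(v, value(c, depth + 1, alpha, beta, cutoff))
                if v >= beta:
                    return v           # pruning step
                alpha = max(alpha, v)
            return v
        else:
            v = math.inf
            for c in children(n):
                v = min(v, value(c, depth + 1, alpha, beta, cutoff))
                if v <= alpha:
                    return v           # pruning step
                beta = min(beta, v)
            return v

    # Initial call: value(root, 0, -math.inf, math.inf, cutoff)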

Page 29: Practical techniques for agents playing multi-player games

Benefits to Complexity

• Reduce b: O(b^m) → O(b^(m/2))
• Reduce m: O(b^m) → O(b^h), h << m
• Memoize: O(b^m) → O(1), if the board position has already been analyzed before

Note: alpha-beta pruning is an exact method: the best move using alpha-beta pruning is the same as the best move without it (normal Minimax).

Horizon cutoffs are approximate methods: you may get a bad choice of action if you run into a horizon effect.
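To make the first row concrete with the slide's rough chess figures (b ≈ 30, m ≈ 40): b^m = 30^40 ≈ 10^59 positions, while b^(m/2) = 30^20 ≈ 3.5 × 10^29, so with the same time budget alpha-beta can, in the best case, search roughly twice as deep.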

Page 30: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

Checkers:
• Red moves first.
• Each move is either
  - a diagonal move one square, or
  - a diagonal jump over an opponent's piece, which removes the opponent's piece.
• Multiple jumps are possible.
• If a jump is available to a player, the player must take it.
• Ordinarily, pieces can only move forward.
• If a piece gets to the opposite side of the board, it gets "crowned".
• A "crowned" piece can move forward or backwards.

Page 31: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

Chinook: Chinook is a computer program that plays Checkers.

1990: It became the first computer program to win the right to compete for a world championship in a game or sport. (It came second in the US National competition.)

The checkers governing body didn't like that, so the World Man-Machine Championship was created instead.

1994: Chinook wins the World Man-Machine Competition against Dr. Marion Tinsley, after Tinsley withdraws due to health problems.

1995: Chinook defended its title against Don Lafferty. After that, the program creators decided to retire Chinook.

2007: The program's creators proved that checkers played perfectly is a draw, so the best anyone can do against Chinook is a draw.

Page 32: Practical techniques for agents playing multi-player games

Example of a Real System: Chinook (checkers player)

The Chinook system:
• Minimax + alpha-beta pruning
• Hand-crafted evaluation function (no learning component): a linear function with features like:
  - number of pieces for each player
  - how many kings each player has
  - how many kings are "trapped"
  - how many pieces are "runaways" (nothing to stop them from being crowned)
• Opening move database from checkers experts
• End-game database that stores the best move from all positions with 8 or fewer pieces

Page 33: Practical techniques for agents playing multi-player games

Stochastic Games

Many games (Backgammon, Monopoly, World of Warcraft, etc.) involve some randomness.

An attack against another creature may or may not succeed, or the number of squares your piece is allowed to move may depend on a dice roll.

Page 34: Practical techniques for agents playing multi-player games

Stochastic Games: Giving Nature a turn

[Game-tree figure: Max (∆) and Min (∇) nodes as before, with chance nodes (marked '?') inserted and their branches numbered.]

In Stochastic games, we give “Nature” a turn in the game tree whenever it’s time for a dice roll or some random event.

We’ll represent Nature’s turn with a ?, and call these “Chance nodes”.

Page 35: Practical techniques for agents playing multi-player games

Stochastic Games: Giving Nature a turn

[Same game-tree figure with chance nodes.]

We’ll define the Value of a chance node to be the expected value of its children nodes.
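A minimal Python sketch of that definition; probability, children, and node_value are hypothetical stand-ins consistent with the earlier alpha-beta sketch:

    # Value of a chance node = expected value (probability-weighted average) of its children.
    def chance_value(n, depth, cutoff):
        return sum(probability(c) * node_value(c, depth + 1, cutoff) for c in children(n))

    # node_value would dispatch on the node type: Max node, Min node, chance node, or terminal/horizon.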

Page 36: Practical techniques for agents playing multi-player games

Quiz: Stochastic Games

[Same game-tree figure with chance nodes.]

Assume the branches of each chance node have equal probability.

What is the value of the root node for this game tree?

Page 37: Practical techniques for agents playing multi-player games

Answer: Stochastic Games

[Same game-tree figure with chance nodes.]

Assume the branches of each chance node have equal probability.

What is the value of the root node for this game tree?

[Answer annotations on the tree showing the computed node values; the value shown at the root is 2.667.]

Page 38: Practical techniques for agents playing multi-player games

Partially-Observable Games: Poker Example

Simple poker game: the deck contains 2 Ks and 2 As.

Each player is dealt one card.

1st round: P1 can raise (r) or check (k).
2nd round: P2 can call (c) or fold (f).

Page 39: Practical techniques for agents playing multi-player games

Partially-Observable Games: Poker Example

[Game-tree figure for the simple poker game: a chance node at the root deals the cards with probabilities 1/3: P1←K, P2←A; 1/6: both K; 1/3: P1←A, P2←K; 1/6: both A. Below it, P1 (∆) chooses raise (r) or check (k), then P2 (∇) responds, with the resulting payoffs at the leaves.]

Page 40: Practical techniques for agents playing multi-player games

Computing equilibria in games with imperfect information

If the game has perfect recall (meaning no player ever forgets their own past actions and observations):

For 2-player, zero-sum games: finding equilibria still amounts to linear programming. It's possible to compute equilibria in time polynomial in the size of the game tree.

For general-sum games: as hard as the general problem of finding Nash equilibria.

For games without perfect recall: hard even for zero-sum games.