Post on 12-Jan-2016

Agents that can play multi-player games

Recall: Single-player, fully-observable, deterministic game agents

An agent that plays Peg Solitaire involves:
- A representation of the initial state;
- A method to generate new states from existing ones;
- A test for whether a state is a goal state.

[Figures: the initial board for Triangle Peg Solitaire; a jump, with the resulting board; the goal state.]
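The three ingredients above can be sketched in Python. This is a minimal illustration, not the slides' own representation: it assumes a 5-row triangular board, a starting layout with only the top hole empty, and a goal of "exactly one peg left anywhere". The names `initial_state`, `successors`, `is_goal`, and `solve` are hypothetical.

```python
# Holes of a 5-row triangle as (row, col) with 0 <= col <= row < 5 (assumed layout).
HOLES = frozenset((r, c) for r in range(5) for c in range(r + 1))
# The six directions in which a peg can jump on a triangular board.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]


def initial_state():
    """Initial state: every hole holds a peg except the top one (an assumption)."""
    return HOLES - {(0, 0)}


def successors(state):
    """Generate new states from an existing one: every state reachable by one jump."""
    for (r, c) in state:
        for dr, dc in DIRECTIONS:
            over = (r + dr, c + dc)
            dest = (r + 2 * dr, c + 2 * dc)
            if over in state and dest in HOLES and dest not in state:
                # The jumping peg moves to dest; the jumped-over peg is removed.
                yield (state - {(r, c), over}) | {dest}


def is_goal(state):
    """Goal test: exactly one peg remains (assumed goal)."""
    return len(state) == 1


def solve(state, failed=None):
    """Depth-first search; returns the list of states from `state` to a goal, or None."""
    if failed is None:
        failed = set()
    if is_goal(state):
        return [state]
    if state in failed:
        return None
    for nxt in successors(state):
        path = solve(nxt, failed)
        if path is not None:
            return [state] + path
    failed.add(state)  # remember dead ends so they are not explored twice
    return None
```

Calling `solve(initial_state())` returns a sequence of boards in which each step is a single jump.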

In terms of a search problem, the three figures correspond to:
- Initial board: the initial state
- A jump, with resulting board: successor state axioms or STRIPS effects
- The goal state: the goal state

Goal state vs. Terminal states and Utilities

[Figure: in place of the single goal state, several terminal states, with utilities +2, +1, and -1.]

Quiz: Goal states vs. Terminal states and Utilities

- Initial state
- Successor state axioms or STRIPS effects
- Terminal states (with utilities such as +1, +2, and -1)

What could go wrong when using A*, breadth-first search, or other search strategies with terminal states?

Answer: Goal states vs. Terminal states and Utilities

You’re guaranteed to find the best path to the terminal state that is found.

You’re NOT guaranteed to find the best terminal state (the one with highest utility), unless you do an exhaustive search.
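A small sketch of this point, using a made-up state graph (the state names and utilities in `SUCCESSORS` and `UTILITY` are hypothetical): breadth-first search stops at the first terminal state it reaches, while an exhaustive search compares all of them.

```python
from collections import deque

# Hypothetical state graph: each non-terminal state maps to its successors.
SUCCESSORS = {"start": ["a", "b"], "a": ["t1"], "b": ["c"], "c": ["t2", "t3"]}
# Terminal states and their utilities.
UTILITY = {"t1": +1, "t2": +2, "t3": -1}


def first_terminal_bfs(start):
    """Breadth-first search that stops at the first terminal state it reaches."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] in UTILITY:
            return path
        for nxt in SUCCESSORS[path[-1]]:
            queue.append(path + [nxt])
    return None


def best_terminal_exhaustive(start):
    """Visit every path and keep the one ending in the highest-utility terminal state."""
    best = None
    stack = [[start]]
    while stack:
        path = stack.pop()
        state = path[-1]
        if state in UTILITY:
            if best is None or UTILITY[state] > UTILITY[best[-1]]:
                best = path
        else:
            stack.extend(path + [nxt] for nxt in SUCCESSORS[state])
    return best
```

In this graph the breadth-first search returns the shortest path to `t1` (utility +1), while the exhaustive search finds `t2` (utility +2), which lies one step deeper.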


Hex: Two-player, zero-sum game

(Also deterministic and fully observable.)

Hex:
- Two players, red and blue.
- The board is N x N, with hexagonal spaces.
- Two opposite sides are red, and the other two sides are blue.
- Each player’s objective is to build a path connecting the sides of his or her color.
- Players alternate turns, and place a single piece of their color on their turn.
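The winning condition can be checked with a simple graph search. The sketch below is one possible encoding, not something given in the slides: the board is a list of strings with `'R'`, `'B'`, or `'.'` per cell, red is assumed to own the top and bottom sides, and `red_has_won` is a hypothetical helper name.

```python
from collections import deque

# Offsets to the six neighbors of a hexagonal cell (row, col) on a rhombus-shaped board.
HEX_NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, 1), (1, -1)]


def red_has_won(board):
    """Return True if red pieces connect the top row to the bottom row.

    board[r][c] is 'R', 'B', or '.'; red owning top and bottom is an assumption.
    """
    n = len(board)
    # Start the search from every red piece on the top row.
    frontier = deque((0, c) for c in range(n) if board[0][c] == "R")
    seen = set(frontier)
    while frontier:
        r, c = frontier.popleft()
        if r == n - 1:
            return True
        for dr, dc in HEX_NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n
                    and board[nr][nc] == "R" and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False
```

Note that a cell has six neighbors rather than eight: on this layout `(r + 1, c + 1)` is not adjacent to `(r, c)`, so a plain diagonal of red pieces does not form a path.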

Some fun facts:
- There are no ties in Hex (proved by John Nash).
- The first player has a distinct advantage (also proved by Nash).
- In tournament play, it’s common to use the “pie rule” for fairness: after the first player makes the first move, the second player can choose whether to switch sides. (We will ignore this rule.)

Hex Question

What is red’s best move (red’s turn next)?

This orange one looks pretty good: only one more square, and red will win. Using a simple heuristic, this looks like it’s getting close to the goal.

However, if red moves to the orange square, the blue player can win on the next turn!

Quiz: Hex Question

If red moves to the orange square, what is blue’s best move?

Answer: Hex Question

Blue has no good moves left!
- This one’s bad: red can still connect the paths.
- And this one’s bad too: red can still connect the paths.

Reasoning about 2-player games

To pick a good move, each player has to think about the other player’s possible responses!

Extensive Form Representation of Games

Notation:
- Two players, Max (Δ) and Min (∇).
- Terminal states are represented by a node with a number inside: the utility for Max (Δ). (Since we’re doing zero-sum games, the utility for Min (∇) is just the opposite of this number.)

Game tree:

[Figure: game tree. At the root it is Max’s turn (∆); the edges are Max’s possible actions, leading to the resulting worlds/boards, where it is Min’s turn (∇). Min’s possible actions lead to further resulting worlds/boards, where it is Max’s turn (∆) again, and so on down to the terminal states, which are labeled with the utility for Max (e.g. +1, +2, -1).]

Minimax (Backup) Algorithm

Basic idea: Compute ∆’s Value(n) for each node n in the game tree, starting with the leaves and working up (“backup”).

We’ll use a depth-first tree traversal.

Once this is calculated, Max will choose an action that leads to a child node with the highest possible value.

[Figure: example game tree with a ∆ root and three ∇ children, whose leaf utilities are {3, 4, 4}, {1, 8, 12}, and {2, 30, 15}.]

Value(n) =
- If n is a terminal node: Value(n) = ∆’s utility
- If n is ∆’s turn: Value(n) = max { Value(c) : c is a child of n }
- If n is ∇’s turn: Value(n) = min { Value(c) : c is a child of n }

In the example tree, the left ∇ node has Value min {3, 4, 4} = 3, and the right ∇ node has Value min {2, 30, 15} = 2.

Quiz: Minimax (Backup) Algorithm

1. What is the Value of the middle ∇ node?

2. What is the Value of the top ∆ node?

Answer: Minimax (Backup) Algorithm

1. What is the Value of the middle ∇ node? min {1, 8, 12} = 1

2. What is the Value of the top ∆ node? max {3, 1, 2} = 3
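The backup computation can be written as a short recursive function. This is a minimal sketch under assumed conventions: a terminal node is a number (Max’s utility), an internal node is a list of children, and players alternate at every level. The names `minimax`, `best_action`, and `EXAMPLE_TREE` are hypothetical, and the left-to-right order of leaves within each ∇ node is assumed.

```python
def minimax(node, max_to_move=True):
    """Return Max's Value of `node` using a depth-first traversal."""
    if isinstance(node, (int, float)):
        return node  # terminal node: Max's utility
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)


def best_action(node):
    """Index of the child of a Max node with the highest minimax Value."""
    values = [minimax(child, max_to_move=False) for child in node]
    return values.index(max(values))


# The example tree from the quiz: a Max root with three Min children.
EXAMPLE_TREE = [[3, 4, 4], [1, 8, 12], [2, 30, 15]]

print(minimax(EXAMPLE_TREE))      # Value of the root
print(best_action(EXAMPLE_TREE))  # index of Max's best action
```

On this tree the three ∇ nodes back up the values 3, 1, and 2, so the root has Value 3 and Max’s best action is the leftmost one (index 0).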

Quiz: Minimax

1. Compute the value of each node in the game tree.

2. Which action should Max take?

3. What is Min’s optimal response?

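[Figure: game tree with a ∆ root whose actions a, b, and c lead to three ∇ nodes; below them are terminal states and five ∆ nodes with terminal children.]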

Answer: Minimax

1. Compute the value of each node in the game tree. The five lower ∆ nodes have values 6, 7, 10, 30, and 15; the three ∇ nodes have values 4, 1, and 15 (for a, b, and c respectively); the root has value 15.

2. Which action should Max take? The action on the right (c).

3. What is Min’s optimal response? The action on the right.

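From Extensive Form to Normal Form Games

Every “extensive form” game (even ones where you don’t have zero-sum utilities) can be made into a “normal form” game.

[Figure: game tree. Max (∆) chooses A or B; Min (∇) then chooses C or D. After A: C ends the game with utility 4, and D leads to a second Max move with A giving 5 and B giving 7. After B: C ends the game with utility 1, and D leads to a second Max move with A giving 4 and B giving 10.]

Rows are Max’s action sequences (first move, second move); columns are Min’s actions; each entry is (utility for Max, utility for Min).

            C           D
A, A     +4, -4      +5, -5
A, B     +4, -4      +7, -7
B, A     +1, -1      +4, -4
B, B     +1, -1     +10, -10

Each sequence of actions for a player becomes a row or a column. The size of the resulting matrix can be exponential in the size of the game tree.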
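The conversion can be sketched for the example tree as follows. The dictionary encoding of the tree and the names `TREE`, `play`, and `normal_form` are assumptions made for illustration; a plan here is just the sequence of actions a player uses, in order, each time it is that player’s turn.

```python
from itertools import product

# The example tree: internal nodes are dicts, terminal nodes are Max's utility.
TREE = {"player": "max", "moves": {
    "A": {"player": "min", "moves": {
        "C": 4,
        "D": {"player": "max", "moves": {"A": 5, "B": 7}}}},
    "B": {"player": "min", "moves": {
        "C": 1,
        "D": {"player": "max", "moves": {"A": 4, "B": 10}}}},
}}


def play(node, max_plan, min_plan):
    """Follow both players' action sequences down the tree; return Max's utility."""
    plans = {"max": iter(max_plan), "min": iter(min_plan)}
    while isinstance(node, dict):
        action = next(plans[node["player"]])
        node = node["moves"][action]
    return node


def normal_form(tree):
    """Build the payoff matrix: one row per Max plan, one column per Min action."""
    rows = list(product("AB", repeat=2))  # Max moves at most twice
    cols = ["C", "D"]                     # Min moves once
    matrix = {}
    for row in rows:
        for col in cols:
            u = play(tree, row, (col,))
            matrix[(row, col)] = (u, -u)  # zero-sum: Min gets the opposite
    return matrix
```

For instance, `normal_form(TREE)[(("A", "B"), "D")]` is `(7, -7)`, matching the "A, B" row and "D" column of the table. Even in this tiny tree Max has four plans for three decision points, which hints at why the matrix can grow exponentially.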

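From Normal Form games to Extensive Form games

Not every Normal Form game can be represented using the Extensive Form I have shown you so far.

         C          D
C     +2, -2     -3, +3
D     -3, +3     +4, -4

[Figure: two candidate game trees for this matrix, each with leaf utilities 2, -3, -3, 4. In one, Max (∆) chooses C or D and then Min (∇) responds; in the other, Min chooses first and Max responds. Both are marked with question marks.]

In either tree, the second player sees the first player’s choice before moving, whereas in the matrix game neither player knows the other’s choice.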
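One way to see why neither ordering works is to compute the minimax value of both candidate trees for the C/D matrix. The sketch below assumes the row player is Max; the names `PAYOFF`, `value_if_max_moves_first`, and `value_if_min_moves_first` are hypothetical.

```python
# Max's utility for each (Max's action, Min's action) pair, from the matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): -3, ("D", "C"): -3, ("D", "D"): 4}
ACTIONS = ["C", "D"]


def value_if_max_moves_first():
    """Max commits first; Min sees the choice and picks the worst reply for Max."""
    return max(min(PAYOFF[(a, b)] for b in ACTIONS) for a in ACTIONS)


def value_if_min_moves_first():
    """Min commits first; Max sees the choice and picks the best reply."""
    return min(max(PAYOFF[(a, b)] for a in ACTIONS) for b in ACTIONS)


print(value_if_max_moves_first())  # root value when Max moves first
print(value_if_min_moves_first())  # root value when Min moves first
```

The two trees give different values (-3 when Max moves first, +2 when Min moves first), so the order of play changes the game, and neither tree is equivalent to the simultaneous-move matrix.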

We can introduce new notation, information states, that allows the Extensive Form to represent any Normal Form game.

[Figure: the same matrix and the two game trees, now drawn using information states.]

Information states are also useful for handling partial observability in turn-based games. E.g., in Poker, they can be used to represent the set of all hands your opponent may have been dealt.

Perfect Information Games

Definition: A game in extensive form has perfect information if every information state has only one node. (This is the same as our original version of game trees.)

Perfect Information is basically just another name for full observability for game trees.

We’ll talk more about partial observability later.

Theorem (Zermelo, 1913): Every finite, perfect-information game in extensive form has a pure-strategy Nash equilibrium.

Relation between Minimax Algorithm and Minimax Theorem

Recall that the Minimax Theorem says every finite 2-player, zero-sum game has a Value for each player and a Nash Equilibrium.

The guy who proved this (von Neumann) used essentially the Minimax algorithm to prove the theorem.

The Value of the root node in the Minimax algorithm is the same as the Value of the game for the Max player.

Quiz: Time Complexity of Minimax

Let b be the branching factor of the game tree.

Let m be the depth of the game tree.

What is the time complexity of Minimax?
- O(b + m)?
- O(bm)?
- O(b^m)?
- O(m^b)?

Answer: Time Complexity of Minimax

The time complexity of Minimax is O(b^m), where b is the branching factor and m is the depth of the game tree: the traversal visits every node, and the number of nodes grows as b^m.

Quiz: Space Complexity of Minimax

Let b be the branching factor of the game tree.

Let m be the depth of the game tree.

What is the space complexity of Minimax?
- O(b + m)?
- O(bm)?
- O(b^m)?
- O(m^b)?

Answer: Space Complexity of Minimax

The space complexity of Minimax is O(bm), where b is the branching factor and m is the depth of the game tree: the depth-first traversal only needs to keep the current path, of at most m nodes, together with the up to b children generated at each node along it.

Quiz: Complexity of Minimax

Chess has an average branching factor of ~30, and each game takes on average ~40 moves.

If it takes ~1 millisecond to compute the value of each board position in the game tree, how long does it take to figure out the value of the game using Minimax?
- A few milliseconds?
- A few seconds?
- A few minutes?
- A few hours?
- A few days?
- A few years?
- A few decades?
- A few millennia (thousands of years)?
- More time than the age of the universe?

Answer: Complexity of Minimax

More time than the age of the universe. With b ≈ 30 and m ≈ 40, Minimax has to evaluate on the order of 30^40 ≈ 10^59 board positions.
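The arithmetic behind this answer can be checked directly. The sketch uses the quiz’s figures (b ≈ 30, m ≈ 40, 1 ms per position) and counts b^m positions; the age of the universe (about 1.38e10 years) is an approximate value brought in for comparison and is not from the slides.

```python
BRANCHING_FACTOR = 30         # average branching factor of chess (from the quiz)
GAME_LENGTH = 40              # average game length in moves (from the quiz)
SECONDS_PER_POSITION = 1e-3   # ~1 millisecond per board position (from the quiz)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate figure, an outside assumption

# Number of board positions, using b^m as the size of the game tree.
positions = BRANCHING_FACTOR ** GAME_LENGTH
years = positions * SECONDS_PER_POSITION / SECONDS_PER_YEAR

print(f"{positions:.2e} positions, about {years:.2e} years")
print(f"{years / AGE_OF_UNIVERSE_YEARS:.2e} times the age of the universe")
```

The tree has roughly 1.2e59 positions, which at one millisecond each is on the order of 10^48 years.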

Strategies for coping with complexity

• Reduce b
• Reduce m
• Memoize
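As an illustration of the last strategy, here is a sketch of minimax with memoization on a small made-up game (not from the slides): a pile of stones, each player removes 1, 2, or 3 stones per turn, and whoever takes the last stone wins. The function name `value` and the game itself are hypothetical; `functools.lru_cache` stores the value of each (stones, player) state so it is computed only once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(stones, max_to_move):
    """Minimax value for Max (+1 = Max wins, -1 = Min wins) of a subtraction game."""
    if stones == 0:
        # The previous player took the last stone and won.
        return -1 if max_to_move else +1
    children = [value(stones - k, not max_to_move)
                for k in (1, 2, 3) if k <= stones]
    return max(children) if max_to_move else min(children)


print(value(40, True))
print(value.cache_info().currsize)  # number of distinct states actually evaluated
```

Without the cache, the recursion from 40 stones would revisit the same states an exponential number of times; with it, at most two entries per pile size are computed.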
