L6 8 Adversarial Search

Upload: mmahmoud

Post on 30-May-2018


  • 8/9/2019 L6 8 Adversarial Search

    CIT 6261:Advanced Artificial Intelligence

    Acknowledgement: Tom Lenaerts, SWITCH, Vlaams Interuniversitair Instituut voor Biotechnologie

    Adversarial Search

    Lecturer: Dr. Md. Saiful Islam

    2

    Outline

    What are games?

    Optimal decisions in games: which strategy leads to success?

    Alpha-beta (α-β) pruning

    Games of imperfect information

    Games that include an element of chance


    3

    What are and why study games?

    Games are a form of multi-agent environment: what do other agents do, and how do they affect our success?

    Cooperative vs. competitive multi-agent environments.

    Competitive multi-agent environments give rise to adversarial search, a.k.a. games

    Why study games? Fun; historically entertaining

    Interesting subject of study because they are hard

    The state of a game is easy to represent, and agents are restricted to a small number of actions

    4

    Relation of Games to Search

    Search: no adversary. The solution is a (heuristic) method for finding a goal

    Heuristics and CSP techniques can find the optimal solution

    Evaluation function: estimate of the cost from start to goal through a given node

    Examples: path planning, scheduling activities

    Games: adversary. The solution is a strategy (a strategy specifies a move for every possible opponent reply).

    Time limits force an approximate solution

    Evaluation function: evaluates the goodness of a game position

    Examples: chess, checkers, Othello, backgammon


    5

    Types of Games

    6

    Game setup

    Two players: MAX and MIN

    MAX moves first, and they take turns until the game is over. The winner gets an award, the loser a penalty.

    Games as search:

    Initial state: e.g. the board configuration in chess

    Successor function: a list of (move, state) pairs specifying legal moves.

    Terminal test: Is the game finished? States where the game has ended are called terminal states.

    Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1), and draw (0) in tic-tac-toe (next)

    MAX uses a search tree to determine its next move.


    7

    Partial Game Tree for Tic-Tac-Toe

    8

    Optimal strategies

    Find the contingent strategy for MAX, assuming an infallible MIN opponent.

    Assumption: Both players play optimally !!

    Given a game tree, the optimal strategy can be determined from the minimax value of each node:

    MINIMAX-VALUE(n) =

      UTILITY(n)                                    if n is a terminal state

      max_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MAX node

      min_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MIN node


    9

    Two-Ply Game Tree

    10

    Two-Ply Game Tree


    11

    Two-Ply Game Tree

    12

    Two-Ply Game Tree

    The minimax decision

    Minimax maximizes the worst-case outcome for MAX.


    13

    What if MIN does not play optimally?

    The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX.

    But if MIN does not play optimally, MAX will do at least as well, and possibly better. [This can be proven.]

    14

    Minimax Algorithm

    function MINIMAX-DECISION(state) returns an action
      inputs: state, current state in game
      v ← MAX-VALUE(state)
      return the action in SUCCESSORS(state) with value v

    function MIN-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← +∞
      for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
      return v

    function MAX-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← -∞
      for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
      return v
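    The pseudocode above can be sketched in Python. The nested-list tree encoding is an assumption of this sketch (a terminal state is a number, an internal node is a list of children), not part of the slides:

    ```python
    # A minimal minimax sketch; MAX and MIN alternate levels, so the
    # is_max flag flips on each recursive call.
    def minimax_value(node, is_max):
        if isinstance(node, (int, float)):  # TERMINAL-TEST: return UTILITY(n)
            return node
        child_values = [minimax_value(child, not is_max) for child in node]
        return max(child_values) if is_max else min(child_values)

    # A hypothetical two-ply tree: MAX to move at the root, MIN below.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(minimax_value(tree, True))  # MIN values are 3, 2, 2; MAX picks 3
    ```

    Backing values up from the leaves this way is exactly the recursive evaluation the MINIMAX-VALUE equations describe.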


    15

    Properties of Minimax

    Time complexity: O(b^m)

    Space complexity: O(bm) (depth-first exploration)

    Minimax criterion

    16

    Multiplayer games

    Games can involve more than two players

    Single minimax values then become vectors of values, one per player


    17

    Problem of minimax search

    The number of game states is exponential in the number of moves.

    Solution: do not examine every node

    ==> Alpha-beta pruning

    Remove branches that do not influence the final decision

    Revisit example

    18

    Alpha-Beta Example

    [-∞, +∞]

    [-∞, +∞]

    Range of possible values

    Do DF-search until first leaf


    19

    Alpha-Beta Example (continued)

    [-∞, 3]

    [-∞, +∞]

    3

    20

    Alpha-Beta Example (continued)

    [-∞, 3]

    [-∞, +∞]


    21

    Alpha-Beta Example (continued)

    [3, +∞]

    [3, 3]

    22

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, +∞]

    [3, 3]

    This node is worse for MAX


    23

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, 14]

    [3, 3]  [-∞, 14]

    24

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, 5]

    [3, 3]  [-∞, 5]


    25

    Alpha-Beta Example (continued)

    [2, 2]  [-∞, 2]

    [3, 3]

    [3, 3]

    26

    Alpha-Beta Example (continued)

    [2, 2]

    [3, 3]

    [3, 3]  [-∞, 2]


    27

    Alpha-Beta Algorithm

    function ALPHA-BETA-SEARCH(state) returns an action
      inputs: state, current state in game
      v ← MAX-VALUE(state, -∞, +∞)
      return the action in SUCCESSORS(state) with value v

    function MAX-VALUE(state, α, β) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← -∞
      for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
      return v

    28

    Alpha-Beta Algorithm

    function MIN-VALUE(state, α, β) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← +∞
      for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
      return v
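    The α-β functions translate to Python as follows; the same hypothetical nested-list tree encoding is assumed (a terminal is a number, an internal node a list of children), with α and β passed down and tightened at each node:

    ```python
    # A minimal alpha-beta sketch: alpha is the best value MAX can already
    # guarantee above this node, beta the best MIN can guarantee.
    def alphabeta(node, alpha, beta, is_max):
        if isinstance(node, (int, float)):  # TERMINAL-TEST: return UTILITY(n)
            return node
        if is_max:
            v = float("-inf")
            for child in node:
                v = max(v, alphabeta(child, alpha, beta, False))
                if v >= beta:        # MIN above will never allow this branch
                    return v
                alpha = max(alpha, v)
            return v
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            if v <= alpha:           # MAX above already has a better option
                return v
            beta = min(beta, v)
        return v

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3, same as minimax
    ```

    On this tree the second and third MIN nodes are cut off early, which is exactly the pruning walked through in the example slides.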


    29

    General alpha-beta pruning

    Consider a node n somewhere in the tree. If the player has a better choice m at the parent node of n, or at any choice point further up, then n will never be reached in actual play.

    Hence, once enough is known about n, it can be pruned.

    30

    Final Comments about Alpha-Beta Pruning

    Pruning does not affect the final result

    Entire subtrees can be pruned.

    Good move ordering improves the effectiveness of pruning

    With perfect ordering, time complexity is O(b^(m/2)): an effective branching factor of sqrt(b)!

    Alpha-beta pruning can look ahead twice as far as minimax in the same amount of time

    Repeated states are again possible. Store them in memory: a transposition table


    31

    Games of imperfect information

    Minimax and alpha-beta pruning require too many leaf-node evaluations. This may be impractical within a reasonable amount of time.

    SHANNON (1950):

    Cut off the search earlier (replace TERMINAL-TEST by CUTOFF-TEST)

    Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)

    32

    Cutting off search

    Change: if TERMINAL-TEST(state) then return UTILITY(state)

    into: if CUTOFF-TEST(state, depth) then return EVAL(state)

    This introduces a fixed depth limit. The depth is selected so that the time used will not exceed what the rules of the game allow.

    When the cutoff occurs, the evaluation is performed.

    A more robust approach: apply iterative deepening. When time runs out, the program returns the move selected by the deepest completed search.
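    The substitution of CUTOFF-TEST for TERMINAL-TEST can be sketched as follows. Both the tree encoding and the `toy_eval` heuristic are hypothetical; a real EVAL would be a fast domain-specific estimate, not a full subtree traversal:

    ```python
    def toy_eval(node):
        """Hypothetical EVAL: exact utility at a terminal (a number), otherwise
        a crude estimate -- the average utility of the subtree's leaves."""
        if isinstance(node, (int, float)):
            return node
        return sum(toy_eval(child) for child in node) / len(node)

    def cutoff_value(node, depth, is_max):
        # CUTOFF-TEST(state, depth): stop at terminals or at the depth limit,
        # and return EVAL(state) instead of UTILITY(state).
        if isinstance(node, (int, float)) or depth == 0:
            return toy_eval(node)
        values = [cutoff_value(child, depth - 1, not is_max) for child in node]
        return max(values) if is_max else min(values)

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(cutoff_value(tree, 2, True))  # full two-ply search: exact value 3
    print(cutoff_value(tree, 1, True))  # cut off after one ply: heuristic estimate
    ```

    Note that the shallower search can prefer a different move than the full search, which is why EVAL quality matters.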


    33

    Heuristic EVAL

    Idea: produce an estimate of the expected utility of the game from a given position.

    Performance depends on the quality of EVAL.

    Requirements:

    EVAL should order terminal nodes in the same way as UTILITY.

    The computation may not take too long.

    For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.

    Only useful for quiescent states (no wild swings in value in the near future)

    34

    Heuristic EVAL example

    Most evaluation functions compute separate numerical contributions for each feature and then combine them to find the total value.

    Example: pawn = 1, knight = bishop = 3, rook = 5, queen = 9, good pawn structure = king safety = ½.

    Weighted linear evaluation function:

    Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

    For chess:

    fi = the number of pieces of each kind on the board.

    wi = the value of those pieces.

    Nonlinear combination of features: a pair of bishops is worth more than twice the value of a single bishop, and a bishop is worth more in the endgame than at the beginning.
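    A weighted linear evaluation over material counts can be sketched like this. The dictionary encoding of a position is a hypothetical convenience of the sketch; the piece weights are the ones quoted on the slide:

    ```python
    # Slide weights: pawn=1, knight=bishop=3, rook=5, queen=9.
    PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def material_eval(max_pieces, min_pieces):
        """Eval(s) = sum_i w_i * f_i(s), where each feature f_i is the count of
        one piece kind for MAX minus the same count for MIN."""
        return sum(
            w * (max_pieces.get(kind, 0) - min_pieces.get(kind, 0))
            for kind, w in PIECE_WEIGHTS.items()
        )

    # MAX is up a rook, MIN is up two pawns: 5 - 2 = +3 in MAX's favour.
    print(material_eval({"rook": 1}, {"pawn": 2}))  # 3
    ```

    The nonlinear effects mentioned above (bishop pairs, endgame bonuses) are precisely what this linear form cannot capture.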


    35

    Heuristic EVAL example

    A secure advantage equivalent to one pawn is likely to win; three pawns gives almost certain victory.

    36

    Heuristic difficulties

    Heuristic counts pieces won


    37

    Horizon effect

    A fixed-depth search thinks it can avoid the queening move.

    As hardware improvements lead to deeper searches, the horizon effect will occur less frequently.

    38

    Discussion

    Combining all these techniques results in a program that can play reasonable chess (or another game).

    We can generate and evaluate around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time controls (about 3 minutes per move).

    On average, the branching factor for chess is 35.

    With minimax search we could look ahead only about five plies (35^5 ≈ 50 million).

    With alpha-beta search we could look ahead about ten plies.

    To reach grandmaster status we need an extensively tuned evaluation function. A supercomputer would perform better.
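    The ply estimate above is plain arithmetic and can be checked directly:

    ```python
    # With branching factor 35, a 5-ply minimax tree has about 35**5 leaves,
    # which is close to the ~200 million nodes per move the slides assume
    # we can afford to search.
    print(35 ** 5)  # 52521875, i.e. roughly 50 million
    ```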


    39

    Games that include chance

    Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16)

    Backgammon: luck + skill

    40

    Games that include chance

    Doubles (e.g. [1,1] or [6,6]) each occur with probability 1/36; every other roll occurs with probability 1/18.

    We cannot calculate a definite minimax value, only an expected value.

    Chance nodes
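    The dice probabilities quoted above can be verified: two dice give 6 doubles at 1/36 each and 15 unordered non-double rolls at 2/36 = 1/18 each, which together must cover all outcomes:

    ```python
    from fractions import Fraction  # exact rational arithmetic, no rounding

    doubles = 6 * Fraction(1, 36)       # six doubles, 1/36 each
    non_doubles = 15 * Fraction(1, 18)  # fifteen non-double rolls, 1/18 each
    print(doubles + non_doubles)        # 1: the distribution sums to one
    ```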


    41

    Expected minimax value

    EXPECTIMINIMAX(n) =

      UTILITY(n)                                      if n is a terminal state

      max_{s ∈ successors(n)} EXPECTIMINIMAX(s)       if n is a MAX node

      min_{s ∈ successors(n)} EXPECTIMINIMAX(s)       if n is a MIN node

      Σ_{s ∈ successors(n)} P(s) · EXPECTIMINIMAX(s)  if n is a chance node

    These equations can be backed up recursively all the way to the root of the game tree.

    As with minimax, we approximate EXPECTIMINIMAX:

    Cut the search off at some point

    Apply the evaluation function to each leaf
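    The EXPECTIMINIMAX equations can be sketched in Python. The tagged-tuple tree encoding here is an assumption of the sketch: a terminal is a number, and an internal node is ("max", children), ("min", children), or ("chance", [(probability, child), ...]):

    ```python
    def expectiminimax(node):
        if isinstance(node, (int, float)):   # terminal: return UTILITY(n)
            return node
        kind, children = node
        if kind == "max":
            return max(expectiminimax(c) for c in children)
        if kind == "min":
            return min(expectiminimax(c) for c in children)
        # chance node: probability-weighted average of child values
        return sum(p * expectiminimax(c) for p, c in children)

    # A chance node over two equally likely outcomes; in backgammon the roll
    # probabilities would instead be 1/36 per double and 1/18 otherwise.
    tree = ("max", [("chance", [(0.5, 2), (0.5, 0)]), 0.75])
    print(expectiminimax(tree))  # chance node averages to 1.0; MAX prefers it to 0.75
    ```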

    42

    Position evaluation with chance nodes

    With one set of leaf values, move A1 wins (left); after an order-preserving rescaling of the same leaves, move A2 wins (right).

    The move chosen can therefore change when evaluation values are scaled differently. To avoid this sensitivity, EVAL must be a positive linear transformation of the probability of winning from a position.

    The presence of chance nodes thus requires more care in choosing evaluation values.


    43

    Discussion

    Examine the section on state-of-the-art game programs yourself

    Summary

    Games are fun (and dangerous). They illustrate several important points about AI:

    Perfection is unattainable -> approximate

    Uncertainty constrains the assignment of values to states

    Games are to AI as Grand Prix racing is to automobile design.