Post on 20-Dec-2015
CS 331/531 Dr M M Awais 2
search
8-Puzzle
[Figure: search tree for the 8-puzzle; each node is labeled g(N)+h(N), with labels ranging from 0+4 to 5+0.]
f(N) = g(N) + h(N), with h(N) = number of misplaced tiles
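The evaluation function above can be tried out with a minimal sketch, given here under stated assumptions: the goal layout `GOAL`, the start position, and the function names are all hypothetical and are not taken from the slide's figure.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank; this goal layout is an assumption


def misplaced_tiles(state, goal=GOAL):
    """h(N): number of tiles (blank excluded) that are not in their goal position."""
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)


def neighbours(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)


def astar_8puzzle(start, goal=GOAL):
    """Return the number of moves on a cheapest path, or None if the goal is unreachable."""
    frontier = [(misplaced_tiles(start, goal), 0, start)]  # entries are (f, g, state)
    best_g = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best_g[state]:
            continue  # stale queue entry, a cheaper route was found later
        for nxt in neighbours(state):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + misplaced_tiles(nxt, goal), new_g, nxt))
    return None


start = (1, 2, 3, 4, 5, 6, 0, 7, 8)  # hypothetical start, two moves away from GOAL
print(misplaced_tiles(start), astar_8puzzle(start))
```

Because the misplaced-tiles count never overestimates the number of moves still needed, the first time the goal is popped from the queue its g value is the length of a shortest solution.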

Robot Navigation
[Figure: grid world with obstacles; cells are labeled with numeric values from 0 to 11.]
f(N) = h(N), with h(N) = Manhattan distance to the goal (not A*)

Robot Navigation
f(N) = g(N) + h(N), with h(N) = Manhattan distance to the goal (A*)
[Figure: the same grid world; each expanded cell is labeled with the two terms of f(N), from 7+0 to 0+11.]
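The two evaluation functions for robot navigation can be compared side by side with a rough sketch. The grid below is hypothetical (it is not the map from the figure), and `best_first`, `manhattan` and the `use_g` flag are names chosen only for this illustration.

```python
import heapq


def manhattan(cell, goal):
    """h(N): Manhattan distance between two (row, col) cells."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def best_first(grid, start, goal, use_g):
    """Best-first search on a 4-connected grid of '.' (free) and '#' (wall).

    use_g=False ranks nodes by f = h (greedy, not A*);
    use_g=True ranks nodes by f = g + h (A*).
    Returns the number of steps on the path that was found, or None.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    best_g = {start: 0}
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if cell in closed:
            continue
        closed.add(cell)
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (cell[0] + d_row, cell[1] + d_col)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#" or nxt in closed:
                continue
            new_g = g + 1
            # Greedy keeps the first route it discovers; A* keeps the cheapest one.
            if nxt in best_g and (not use_g or new_g >= best_g[nxt]):
                continue
            best_g[nxt] = new_g
            f = manhattan(nxt, goal) + (new_g if use_g else 0)
            heapq.heappush(frontier, (f, new_g, nxt))
    return None


# Hypothetical map: a wall in column 3 forces a detour through row 0 or row 4.
grid = [
    ".......",
    "...#...",
    "...#...",
    "...#...",
    ".......",
]
start, goal = (2, 0), (2, 6)
greedy_steps = best_first(grid, start, goal, use_g=False)
astar_steps = best_first(grid, start, goal, use_g=True)
print(greedy_steps, astar_steps)
```

On this map the shortest route needs 10 steps (6 for the straight-line Manhattan distance plus 4 for the detour), which is what the A* variant returns. The greedy variant is only guaranteed to return some path, never a shorter one.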

Adversary Search (Games)
The aim is to move in such a way as to stop the opponent from making a good or winning move.
Game playing can use tree search.
The game tree alternates between the two players.

Games?
Games are a form of multi-agent environment
• What do other agents do, and how do they affect our success?
• Cooperative vs. competitive multi-agent environments
• Competitive multi-agent environments give rise to adversarial problems (games)
Why study games?
• Fun; historically entertaining
• Interesting subject of study because they are hard
• Easy to represent, and agents are restricted to a small number of actions

Games vs. Search
Search: no adversary
• Solution is a (heuristic) method for finding the goal
• Heuristics and CSP techniques can find the optimal solution
• Evaluation function: estimate of the cost from start to goal through a given node
• Examples: path planning, scheduling activities
Games: adversary
• Solution is a strategy (a strategy specifies a move for every possible opponent reply)
• Time limits force an approximate solution
• Evaluation function: evaluates the "goodness" of a game position
• Examples: chess, checkers, Othello, backgammon

Game setup
• Two players: MAX and MIN
• MAX and MIN take turns until the game is over. The winner gets an award, the loser gets a penalty.
Games as search:
• Initial state: e.g. board configuration of chess
• Successor function: list of (move, state) pairs specifying legal moves
• Terminal test: is the game finished?
• Utility function: gives a numerical value for terminal states, e.g. win (+1), loss (-1) and draw (0) in tic-tac-toe
MAX uses the search tree to determine its next move.

Things to Remember:
1. Every move is vital.
2. The opponent could win at the next move or at subsequent moves.
3. Keep track of the safest moves.
4. The opponent is well informed.
5. Consider how the opponent is likely to respond to your moves.
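
Two move win
Player 1 = P1
Player 2 = P2
Safest move for P1 is always A to C
Safest move for P2 is always A to D (if allowed the first move)
[Figure: game tree rooted at A with second-level nodes B, C, D and leaves E, F, G, H, I, J; the leaves are marked as wins for P1, P2, P1, P1, P2, P2, and the two levels of moves are labeled "P1 moves" and "P2 moves".]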

MINIMAX Procedure for Games
Assumption: the opponent has the same knowledge of the state space and makes a consistent effort to WIN.
MIN: label for the opponent, who tries to minimize the other player's (MAX's) score.
MAX: the player trying to win (maximise advantage).
BOTH MAX AND MIN ARE EQUALLY INFORMED

Rules
1. Label levels MAX and MIN
2. Assign values to leaf nodes:
   0 if MIN wins
   1 if MAX wins
3. Propagate values up the graph.
   If the parent is MAX, assign it the max-value of its children
   If the parent is MIN, assign it the min-value of its children
[Figure: game tree with levels labeled MAX, MIN, MAX, MIN.]
[Figure sequence: the rules applied step by step to a tree with levels MAX, MIN, MAX, MIN. The leaves are first assigned the values 0 and 1. Values are then backed up: Max(0,1) = 1 and Max(1) = 1 at the lower MAX level, Min(1) = 1 at each node of the MIN level, and Max(1,1) = 1 at the root.]
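The propagation rules can be written as a short recursive function. This is a minimal sketch: the nested-list encoding and the exact tree shape are assumptions, chosen so that the backed-up values match the ones quoted for the figure.

```python
def minimax(node, is_max):
    """Back up leaf values: a MAX node takes the max of its children, a MIN node the min."""
    if isinstance(node, int):  # leaf: 1 if MAX wins, 0 if MIN wins
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Nested lists stand for internal nodes, integers for leaves.
# Root (MAX) -> two MIN nodes -> one MAX node each -> leaves (0, 1) and (1).
tree = [[[0, 1]], [[1]]]
print(minimax(tree, is_max=True))
```

Reading the recursion bottom-up gives Max(0,1) = 1 and Max(1) = 1, then Min(1) = 1 twice, and finally Max(1,1) = 1 at the root.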

Utility Values
• Leaf nodes represent the result of the game
• Results could be WIN or LOSE for any player
• WIN for MAX is 1, LOSE for MAX is 0
• These values are known as utility values / functions
• DRAW could be another result; in this case:
  • WIN for MAX could be 1
  • LOSE for MAX could be -1
  • DRAW could be 0

MINIMAX: Unfinished Games
• MINIMAX is applied from the leaf nodes up to the start node
• In other words, result nodes must be in the search space
• What if you want to evaluate the game status at an intermediate level?
• E.g.:
  • The game finishes at level 5
  • We want to find out the relative advantage of MAX up to level 3
• Solution: evaluate intermediate nodes through a heuristic and then apply MINIMAX

Minimaxing to fixed ply depth (complex games)
Strategy: n-move look-ahead
- Suppose you start in the middle of the game.
- One cannot assign WIN/LOSE values at that stage.
- In this case some heuristic evaluation is applied.
- Values are then projected back to indicate a WINNING/LOSING trend.
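The n-move look-ahead strategy can be sketched as minimax with a depth limit. Everything concrete here is hypothetical: the tree `TREE`, the heuristic scores `SCORE`, and the helper names exist only to make the example runnable.

```python
def minimax_limited(state, depth, is_max, successors, evaluate):
    """Minimax with an n-ply look-ahead.

    When the depth budget is used up, or a state has no successors,
    the heuristic `evaluate` stands in for the unknown WIN/LOSE value.
    """
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    values = [
        minimax_limited(child, depth - 1, not is_max, successors, evaluate)
        for child in children
    ]
    return max(values) if is_max else min(values)


# Hypothetical game tree and heuristic scores, for illustration only.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
SCORE = {"A": 0, "B": 2, "C": 3, "D": 4, "E": 1, "F": 6, "G": 5}


def successors(state):
    return TREE.get(state, [])


def evaluate(state):
    return SCORE[state]


one_ply = minimax_limited("A", 1, True, successors, evaluate)
two_ply = minimax_limited("A", 2, True, successors, evaluate)
print(one_ply, two_ply)
```

With a one-ply look-ahead MAX compares the heuristic scores of B and C directly and gets 3. With two plies the values min(4, 1) = 1 and min(6, 5) = 5 are projected back, giving 5. A deeper look-ahead can therefore change the estimate.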

HEURISTIC FUNCTION: TIC-TAC-TOE
M(n) = total of possible winning lines for MAX
O(n) = total of the opponent's possible winning lines
E(n) = M(n) - O(n)
[Figure: two example boards with X and O marks. First board: M(n) = 4, O(n) = 2, E(n) = 2. Second board: M(n) = 5, O(n) = 1, E(n) = 4.]
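The heuristic E(n) = M(n) - O(n) can be computed as follows. This sketch assumes that a "possible winning line" is any row, column or diagonal that contains no mark of the other player. The board encoding and the sample position are hypothetical and are not the boards from the figure.

```python
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def open_lines(board, blocker):
    """Count the lines that contain no `blocker` mark, i.e. lines still winnable against it."""
    return sum(1 for line in LINES if all(board[i] != blocker for i in line))


def evaluate(board, max_mark="X", min_mark="O"):
    """E(n) = M(n) - O(n) for a board given as a 9-character string, '.' meaning empty."""
    m = open_lines(board, min_mark)  # M(n): lines MAX can still complete
    o = open_lines(board, max_mark)  # O(n): lines the opponent can still complete
    return m - o


# Hypothetical position: X in the centre, O in the top-middle square.
board = ".O..X...."
print(open_lines(board, "O"), open_lines(board, "X"), evaluate(board))
```

For this position O blocks two of the eight lines (top row, middle column), so M(n) = 6. X blocks the four lines through the centre, so O(n) = 4 and E(n) = 2.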

Two-Ply Game Tree
The minimax decision
Minimax maximizes the worst-case outcome for MAX.

Problem of minimax search
The number of game states is exponential in the number of moves.

Solution
• Do not examine every node
• Alpha-beta pruning
  • Alpha = value of the best choice found so far at any choice point along the path for MAX
  • Beta = value of the best choice found so far at any choice point along the path for MIN

Alpha-Beta Procedures
• The minimax procedure pursues all branches in the space. Some of them could have been ignored or pruned.
• To improve efficiency, pruning is applied to two-person games.

Simple Idea
if A > 5 OR B < 0
If the first condition A > 5 succeeds, then B < 0 need not be evaluated.
if A > 5 AND B < 0
If the first condition A > 5 fails, then B < 0 need not be evaluated.
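The same short-circuit behaviour can be observed directly in Python, whose `or` and `and` operators skip the second operand when the first one already decides the result. The helper `check` and the values of A and B are made up for this demonstration.

```python
calls = []


def check(name, result):
    """Record that a condition was evaluated, then return its truth value."""
    calls.append(name)
    return result


A, B = 7, 3  # hypothetical values: A > 5 is true

calls.clear()
either = check("A > 5", A > 5) or check("B < 0", B < 0)
or_calls = list(calls)  # only the first condition was evaluated

A = 2  # now A > 5 is false

calls.clear()
both = check("A > 5", A > 5) and check("B < 0", B < 0)
and_calls = list(calls)  # again the second condition is skipped

print(either, or_calls)
print(both, and_calls)
```

Alpha-beta pruning applies the same idea to game trees: once a partial result settles the outcome at a node, the remaining branches are not evaluated.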

Implementation
FORWARD PASS:
APPLY DEPTH-FIRST SEARCH TO REACH THE LEAF NODE
BACKWARD PASS:
PROPAGATE THE VALUES TO THE ROOT NODE

[Figure, repeated on each of the following steps: game tree with levels MAX, MIN, MAX, MIN and nodes a, b = 0.4, c, e, g = -0.2.]
ALPHA value for a MAX node (node e):
• Node e is worth -0.2 (at least). Why is -0.2 the least value?
• Suppose another child of e takes a value less than -0.2: the value for node e will not change and remains at -0.2.
• Suppose this node takes a value greater than -0.2, say v: the value for node e will change to v.
• What is the lower bound on v? The lower bound is the value at node g.
• The minimum advantage for the MAX node e is -0.2. This is called the ALPHA value for a MAX node.
BETA value for a MIN node (node c):
• With e = -0.2 (at least), node c is worth -0.2 (at most). Why is -0.2 the at-most value for node c?
• Suppose another child of c takes a value v less than -0.2: the value for node c will change to v.
• Suppose this node takes a value greater than -0.2: the value for node c will not change and will remain at -0.2.
• What is the upper bound on v? The upper bound is the value at node e.
• The maximum advantage for the MIN node c is -0.2. This is called the BETA value for a MIN node.
ALPHA value for node a:
• Find the ALPHA value for node a.
• With c = -0.2 (at most) and e = -0.2 (at least), a = 0.4 (at least): the least advantage which MAX can get in this portion of the game is 0.4.
• If this least advantage is acceptable, then expanding c and all the nodes that follow it can be neglected: prune away the link to c, with ALPHA = 0.4.

- A MAX node neglects values <= alpha (the least it can score) at MIN nodes below it.
- A MIN node neglects values >= beta (the most it can score) at MAX nodes below it.
[Figure: MAX node A above MIN nodes B = 10 and C; nodes G = 0 and H lie below C.]
Node C can score AT MOST 0, nothing above 0 (beta)
Node A can score AT LEAST 10, nothing less than 10 (alpha)

Alpha-Beta Example
[Figure: game tree in which nodes are annotated with their range of possible values; the first two nodes both start at [-∞, +∞].]
Do DF-search until first leaf

Alpha-Beta Example (continued)
[Figure: the same tree later in the search, with node ranges [3, +∞], [3, 3] and [-∞, 2].]
This node is worse for MAX