L6 8 Adversarial Search

Upload: mmahmoud

Post on 30-May-2018


  • 8/9/2019 L6 8 Adversarial Search

    CIT 6261:Advanced Artificial Intelligence

    Acknowledgement: Tom Lenaerts, SWITCH, Vlaams Interuniversitair Instituut voor Biotechnologie

    Adversarial Search

    Lecturer: Dr. Md. Saiful Islam

    2

    Outline

    What are games?

    Optimal decisions in games: which strategy leads to success?

    Alpha-beta (α-β) pruning

    Games of imperfect information

    Games that include an element of chance


    3

    What are and why study games?

    Games are a form of multi-agent environment: what do other agents do, and how do they affect our success?

    Cooperative vs. competitive multi-agent environments.

    Competitive multi-agent environments give rise to adversarial search, a.k.a. games

    Why study games? Fun; historically entertaining

    Interesting subject of study because they are hard

    The state of a game is easy to represent, and agents are restricted to a small number of actions

    4

    Relation of Games to Search

    Search: no adversary. The solution is a (heuristic) method for finding a goal

    Heuristics and CSP techniques can find the optimal solution

    Evaluation function: estimate of the cost from start to goal through a given node

    Examples: path planning, scheduling activities

    Games: adversary. The solution is a strategy (a strategy specifies a move for every possible opponent reply).

    Time limits force an approximate solution

    Evaluation function: evaluates the goodness of a game position

    Examples: chess, checkers, Othello, backgammon


    5

    Types of Games

    6

    Game setup

    Two players: MAX and MIN

    MAX moves first, and they take turns until the game is over. The winner gets an award, the loser a penalty.

    Games as search:

    Initial state: e.g. the board configuration in chess

    Successor function: a list of (move, state) pairs specifying legal moves.

    Terminal test: Is the game finished? States where the game has ended are called terminal states.

    Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1), and draw (0) in tic-tac-toe (next)

    MAX uses a search tree to determine its next move.


    7

    Partial Game Tree for Tic-Tac-Toe

    8

    Optimal strategies

    Find the contingent strategy for MAX, assuming an infallible MIN opponent.

    Assumption: Both players play optimally !!

    Given a game tree, the optimal strategy can be determined from the minimax value of each node:

    MINIMAX-VALUE(n) =

      UTILITY(n)                                    if n is a terminal state

      max_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MAX node

      min_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MIN node


    9

    Two-Ply Game Tree

    10

    Two-Ply Game Tree


    11

    Two-Ply Game Tree

    12

    Two-Ply Game Tree

    The minimax decision

    Minimax maximizes the worst-case outcome for MAX.


    13

    What if MIN does not play optimally?

    The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX.

    But if MIN does not play optimally, MAX will do at least as well, and possibly better. [This can be proven.]

    14

    Minimax Algorithm

    function MINIMAX-DECISION(state) returns an action
      inputs: state, current state in game
      v ← MAX-VALUE(state)
      return the action in SUCCESSORS(state) with value v

    function MIN-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← +∞
      for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
      return v

    function MAX-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← -∞
      for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
      return v
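    The pseudocode above can be sketched in Python. The nested-list tree encoding is an assumption of this sketch (a terminal state is a number, an internal node is a list of children), not part of the slides:

    ```python
    # A minimal minimax sketch; MAX and MIN alternate levels, so the
    # is_max flag flips on each recursive call.
    def minimax_value(node, is_max):
        if isinstance(node, (int, float)):  # TERMINAL-TEST: return UTILITY(n)
            return node
        child_values = [minimax_value(child, not is_max) for child in node]
        return max(child_values) if is_max else min(child_values)

    # A hypothetical two-ply tree: MAX to move at the root, MIN below.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(minimax_value(tree, True))  # MIN values are 3, 2, 2; MAX picks 3
    ```

    Backing values up from the leaves this way is exactly the recursive evaluation the MINIMAX-VALUE equations describe.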


    15

    Properties of Minimax

    Time complexity: O(b^m)

    Space complexity: O(bm) (depth-first exploration)

    Minimax criterion

    16

    Multiplayer games

    Games can involve more than two players

    Single minimax values then become vectors of values, one per player


    17

    Problem of minimax search

    The number of game states is exponential in the number of moves.

    Solution: do not examine every node

    ==> Alpha-beta pruning

    Remove branches that do not influence the final decision

    Revisit example

    18

    Alpha-Beta Example

    [-∞, +∞]

    [-∞, +∞]

    Range of possible values

    Do DF-search until first leaf


    19

    Alpha-Beta Example (continued)

    [-∞, 3]

    [-∞, +∞]

    3

    20

    Alpha-Beta Example (continued)

    [-∞, 3]

    [-∞, +∞]


    21

    Alpha-Beta Example (continued)

    [3, +∞]

    [3, 3]

    22

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, +∞]

    [3, 3]

    This node is worse for MAX


    23

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, 14]

    [3, 3]  [-∞, 14]

    24

    Alpha-Beta Example (continued)

    [-∞, 2]

    [3, 5]

    [3, 3]  [-∞, 5]


    25

    Alpha-Beta Example (continued)

    [2, 2]  [-∞, 2]

    [3, 3]

    [3, 3]

    26

    Alpha-Beta Example (continued)

    [2, 2]

    [3, 3]

    [3, 3]  [-∞, 2]


    27

    Alpha-Beta Algorithm

    function ALPHA-BETA-SEARCH(state) returns an action
      inputs: state, current state in game
      v ← MAX-VALUE(state, -∞, +∞)
      return the action in SUCCESSORS(state) with value v

    function MAX-VALUE(state, α, β) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← -∞
      for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
      return v

    28

    Alpha-Beta Algorithm

    function MIN-VALUE(state, α, β) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← +∞
      for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
      return v
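    The α-β functions translate to Python as follows; the same hypothetical nested-list tree encoding is assumed (a terminal is a number, an internal node a list of children), with α and β passed down and tightened at each node:

    ```python
    # A minimal alpha-beta sketch: alpha is the best value MAX can already
    # guarantee above this node, beta the best MIN can guarantee.
    def alphabeta(node, alpha, beta, is_max):
        if isinstance(node, (int, float)):  # TERMINAL-TEST: return UTILITY(n)
            return node
        if is_max:
            v = float("-inf")
            for child in node:
                v = max(v, alphabeta(child, alpha, beta, False))
                if v >= beta:        # MIN above will never allow this branch
                    return v
                alpha = max(alpha, v)
            return v
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            if v <= alpha:           # MAX above already has a better option
                return v
            beta = min(beta, v)
        return v

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3, same as minimax
    ```

    On this tree the second and third MIN nodes are cut off early, which is exactly the pruning walked through in the example slides.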


    29

    General alpha-beta pruning

    Consider a node n somewhere in the tree. If the player has a better choice m at the parent node of n, or at any choice point further up, then n will never be reached in actual play.

    Hence, once enough is known about n, it can be pruned.

    30

    Final Comments about Alpha-Beta Pruning

    Pruning does not affect the final result

    Entire subtrees can be pruned.

    Good move ordering improves the effectiveness of pruning

    With perfect ordering, time complexity is O(b^(m/2)): an effective branching factor of sqrt(b)!

    Alpha-beta pruning can look ahead twice as far as minimax in the same amount of time

    Repeated states are again possible. Store them in memory: a transposition table


    31

    Games of imperfect information

    Minimax and alpha-beta pruning require too many leaf-node evaluations. This may be impractical within a reasonable amount of time.

    SHANNON (1950):

    Cut off the search earlier (replace TERMINAL-TEST by CUTOFF-TEST)

    Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)

    32

    Cutting off search

    Change: if TERMINAL-TEST(state) then return UTILITY(state)

    into: if CUTOFF-TEST(state, depth) then return EVAL(state)

    This introduces a fixed depth limit. The depth is selected so that the time used will not exceed what the rules of the game allow.

    When the cutoff occurs, the evaluation is performed.

    A more robust approach: apply iterative deepening. When time runs out, the program returns the move selected by the deepest completed search.
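    The substitution of CUTOFF-TEST for TERMINAL-TEST can be sketched as follows. Both the tree encoding and the `toy_eval` heuristic are hypothetical; a real EVAL would be a fast domain-specific estimate, not a full subtree traversal:

    ```python
    def toy_eval(node):
        """Hypothetical EVAL: exact utility at a terminal (a number), otherwise
        a crude estimate -- the average utility of the subtree's leaves."""
        if isinstance(node, (int, float)):
            return node
        return sum(toy_eval(child) for child in node) / len(node)

    def cutoff_value(node, depth, is_max):
        # CUTOFF-TEST(state, depth): stop at terminals or at the depth limit,
        # and return EVAL(state) instead of UTILITY(state).
        if isinstance(node, (int, float)) or depth == 0:
            return toy_eval(node)
        values = [cutoff_value(child, depth - 1, not is_max) for child in node]
        return max(values) if is_max else min(values)

    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(cutoff_value(tree, 2, True))  # full two-ply search: exact value 3
    print(cutoff_value(tree, 1, True))  # cut off after one ply: heuristic estimate
    ```

    Note that the shallower search can prefer a different move than the full search, which is why EVAL quality matters.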


    33

    Heuristic EVAL

    Idea: produce an estimate of the expected utility of the game from a given position.

    Performance depends on the quality of EVAL.

    Requirements:

    EVAL should order terminal nodes in the same way as UTILITY.

    The computation may not take too long.

    For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.

    Only useful for quiescent states (no wild swings in value in the near future)

    34

    Heuristic EVAL example

    Most evaluation functions compute separate numerical contributions for each feature and then combine them to find the total value.

    Example: pawn = 1, knight = bishop = 3, rook = 5, queen = 9, good pawn structure = king safety = ½.

    Weighted linear evaluation function:

    Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

    For chess:

    fi = the number of pieces of each kind on the board.

    wi = the value of those pieces.

    Nonlinear combination of features: a pair of bishops is worth more than twice the value of a single bishop, and a bishop is worth more in the endgame than at the beginning.
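    A weighted linear evaluation over material counts can be sketched like this. The dictionary encoding of a position is a hypothetical convenience of the sketch; the piece weights are the ones quoted on the slide:

    ```python
    # Slide weights: pawn=1, knight=bishop=3, rook=5, queen=9.
    PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def material_eval(max_pieces, min_pieces):
        """Eval(s) = sum_i w_i * f_i(s), where each feature f_i is the count of
        one piece kind for MAX minus the same count for MIN."""
        return sum(
            w * (max_pieces.get(kind, 0) - min_pieces.get(kind, 0))
            for kind, w in PIECE_WEIGHTS.items()
        )

    # MAX is up a rook, MIN is up two pawns: 5 - 2 = +3 in MAX's favour.
    print(material_eval({"rook": 1}, {"pawn": 2}))  # 3
    ```

    The nonlinear effects mentioned above (bishop pairs, endgame bonuses) are precisely what this linear form cannot capture.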


    35

    Heuristic EVAL example

    A secure advantage equivalent to one pawn is likely to win; three pawns gives almost certain victory.

    36

    Heuristic difficulties

    Heuristic counts pieces won


    37

    Horizon effect

    A fixed-depth search thinks it can avoid the queening move.

    As hardware improvements lead to deeper searches, the horizon effect will occur less frequently.

    38

    Discussion

    Combining all these techniques results in a program that can play reasonable chess (or another game).

    We can generate and evaluate around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time controls (about 3 minutes per move).

    On average, the branching factor for chess is 35.

    With minimax search we could look ahead only about five plies (35^5 ≈ 50 million).

    With alpha-beta search we could look ahead about ten plies.

    To reach grandmaster status we need an extensively tuned evaluation function. A supercomputer would perform better.
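    The ply estimate above is plain arithmetic and can be checked directly:

    ```python
    # With branching factor 35, a 5-ply minimax tree has about 35**5 leaves,
    # which is close to the ~200 million nodes per move the slides assume
    # we can afford to search.
    print(35 ** 5)  # 52521875, i.e. roughly 50 million
    ```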


    39

    Games that include chance

    Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16), and (5-11, 11-16)

    Backgammon: luck + skill

    40

    Games that include chance

    Doubles (e.g. [1,1] or [6,6]) each occur with probability 1/36; every other roll occurs with probability 1/18.

    We cannot calculate a definite minimax value, only an expected value.

    Chance nodes
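    The dice probabilities quoted above can be verified: two dice give 6 doubles at 1/36 each and 15 unordered non-double rolls at 2/36 = 1/18 each, which together must cover all outcomes:

    ```python
    from fractions import Fraction  # exact rational arithmetic, no rounding

    doubles = 6 * Fraction(1, 36)       # six doubles, 1/36 each
    non_doubles = 15 * Fraction(1, 18)  # fifteen non-double rolls, 1/18 each
    print(doubles + non_doubles)        # 1: the distribution sums to one
    ```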


    41

    Expected minimax value

    EXPECTIMINIMAX(n) =

      UTILITY(n)                                      if n is a terminal state

      max_{s ∈ successors(n)} EXPECTIMINIMAX(s)       if n is a MAX node

      min_{s ∈ successors(n)} EXPECTIMINIMAX(s)       if n is a MIN node

      Σ_{s ∈ successors(n)} P(s) · EXPECTIMINIMAX(s)  if n is a chance node

    These equations can be backed up recursively all the way to the root of the game tree.

    As with minimax, we approximate EXPECTIMINIMAX:

    Cut the search off at some point

    Apply the evaluation function to each leaf
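    The EXPECTIMINIMAX equations can be sketched in Python. The tagged-tuple tree encoding here is an assumption of the sketch: a terminal is a number, and an internal node is ("max", children), ("min", children), or ("chance", [(probability, child), ...]):

    ```python
    def expectiminimax(node):
        if isinstance(node, (int, float)):   # terminal: return UTILITY(n)
            return node
        kind, children = node
        if kind == "max":
            return max(expectiminimax(c) for c in children)
        if kind == "min":
            return min(expectiminimax(c) for c in children)
        # chance node: probability-weighted average of child values
        return sum(p * expectiminimax(c) for p, c in children)

    # A chance node over two equally likely outcomes; in backgammon the roll
    # probabilities would instead be 1/36 per double and 1/18 otherwise.
    tree = ("max", [("chance", [(0.5, 2), (0.5, 0)]), 0.75])
    print(expectiminimax(tree))  # chance node averages to 1.0; MAX prefers it to 0.75
    ```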

    42

    Position evaluation with chance nodes

    With one set of leaf values, move A1 wins (left); after an order-preserving rescaling of the same leaves, move A2 wins (right).

    The move chosen can therefore change when evaluation values are scaled differently. To avoid this sensitivity, EVAL must be a positive linear transformation of the probability of winning from a position.

    The presence of chance nodes thus requires more care in choosing evaluation values.


    43

    Discussion

    Examine the section on state-of-the-art game programs yourself

    Summary

    Games are fun (and dangerous). They illustrate several important points about AI:

    Perfection is unattainable -> approximate

    Uncertainty constrains the assignment of values to states

    Games are to AI as Grand Prix racing is to automobile design.