Using Plans to Reduce Search Wei Wei

Upload: ross-berry

Post on 22-Dec-2015


Using Plans to Reduce Search

Wei Wei

Topics I’ll try to address

Wilkins’ work on using plans to reduce search in chess

Junghanns and Schaeffer’s work on search in Sokoban

Our attempt to reduce search in solving Sokoban problems

Reduce search in chess

Chess is a showpiece of AI, but most successful chess programs use brute-force search.

Search is not practical in many other games, such as Go, because of their large branching factor.

Human players use very narrow search trees.

Wilkins’ work

Use knowledge, patterns and planning to control search.

PARADISE (Wilkins 1979) finds the best move in tactically sharp positions.

“Tactically sharp”: success can be judged by the winning of material.

Wilkins’ work, cont.

Gets the right answer in 89 out of 92 positions, some as deep as 26 plies.

Successful in this restricted domain.

Uses a set of about 200 rules as the knowledge base.

Production Rules: An example

((DMP1)
 (NEVER (EXISTS (SQ) (PATTERN MOBIL DMP1 SQ)))
 (NEVER (EXISTS (P1) (PATTERN ENPRIS P1 DMP1)))
 (ACTION ATTACK ((OTHER-COLOR DMP1) (LOCATION DMP1))
   (THREAT (WIN DMP1))
   (LIKELY 0)))

This rule captures “a trapped piece”

Templates in plans

(P SQ): move P to SQ

(NIL SQ): move any piece to SQ

(P NIL): move P to any SQ

(ANYBUT P): move any piece other than P

NIL: matches any defensive move
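The template forms above can be illustrated with a small matcher. This is a minimal sketch, not PARADISE's implementation: the `Move` type, the `matches` function, and the use of Python's `None` for NIL are all assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Move(NamedTuple):
    piece: str   # e.g. "WN" for the white knight
    square: str  # destination square, e.g. "N5"


# Hypothetical encoding: None stands for NIL; ("ANYBUT", P) excludes piece P.
Template = Optional[Tuple[Optional[str], Optional[str]]]


def matches(template: Template, move: Move) -> bool:
    """Return True if the move fits the plan template."""
    if template is None:            # NIL: matches any move
        return True
    first, second = template
    if first == "ANYBUT":           # (ANYBUT P): any piece other than P
        return move.piece != second
    piece_ok = first is None or first == move.piece      # (NIL SQ)
    square_ok = second is None or second == move.square  # (P NIL)
    return piece_ok and square_ok
```

For instance, `matches(("BK", None), Move("BK", "R1"))` holds because (BK NIL) accepts any king move, while `matches(("ANYBUT", "BK"), Move("BK", "R1"))` does not.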

A plan produced by PARADISE

(((WN N5)
  (((BN N4) (SAFEMOVE WR Q7)
    (((BK NIL) (SAFECAPTURE WR BR))
     ((ANYBUT BK) (SAFECAPTURE WR BK))))
   ((BN N4) (CHECKMOVE WR Q7) (BK NIL) (SAFECAPTURE WR BQ))))
 ((THREAT (PLUS (EXCHVAL WN N5) (FORK WR BK BR)))
  (LIKELY 0))
 ((THREAT (PLUS (EXCHVAL WN N5) (EXCH WR BQ)))
  (LIKELY 0)))

Knowledge Source (KS)

In the previous plan, SAFEMOVE, CHECKMOVE, SAFECAPTURE are all KSes.

Each KS provides the knowledge necessary to understand and reason about the abstract concept.

KS cont.

A KS is a group of productions and a list of variables.

For example, ATTACK is a KS, and has 2 variables: COL and SQ, as well as a set of productions that know how to attack SQ for side COL.

PARADISE treats every KS as a subgoal and produces a plan to achieve it.
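One way to picture a KS as "a group of productions and a list of variables" is the sketch below. It is only an illustration: the class names, the `post` method, and the string-based plan steps are hypothetical and do not reflect how PARADISE stores its rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Production:
    name: str
    # Hypothetical: given variable bindings, propose zero or more plan steps.
    fire: Callable[[Dict[str, str]], List[str]]


@dataclass
class KnowledgeSource:
    name: str
    variables: List[str]
    productions: List[Production] = field(default_factory=list)

    def post(self, **bindings: str) -> List[str]:
        """Treat the KS as a subgoal: bind its variables and run its productions."""
        missing = [v for v in self.variables if v not in bindings]
        if missing:
            raise ValueError(f"{self.name}: unbound variables {missing}")
        steps: List[str] = []
        for production in self.productions:
            steps.extend(production.fire(bindings))
        return steps


# ATTACK has two variables, COL and SQ, as in the slide.
attack = KnowledgeSource(
    name="ATTACK",
    variables=["COL", "SQ"],
    productions=[Production("direct", lambda b: [f"{b['COL']} attacks {b['SQ']}"])],
)
```

Calling `attack.post(COL="WHITE", SQ="Q7")` runs every production of the KS with those bindings and collects the proposed steps.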

Plan

THREAT

SAVE

LOSS

LIKELY: branches. If every step is forcing, the LIKELY value is 0.

Creating plans

The static analysis process posts a THREAT KS.

The THREAT KS posts other KSes, such as MOVE and SAFEMOVE.

Modified search methods

B* search (Berliner 1979): uses ranges to express values.

A plan in PARADISE is a tree. The tree search is knowledge-controlled rather than parameter-controlled.

B* search

Use ranges to express values: this gives alpha-beta pruning more room.

Best-first search.

A threshold is defined: whenever the offense wins by 2 pawns, stop the search.
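The two stopping conditions above, ranges and the 2-pawn threshold, can be sketched as follows. This is a simplified illustration of range-based termination in the spirit of B*, not PARADISE's search code; the function names, the move labels, and the `(pessimistic, optimistic)` tuple encoding are assumptions.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # (pessimistic, optimistic) value in pawns

WIN_THRESHOLD = 2.0  # stop as soon as the offense is sure to win 2 pawns


def proven_best(ranges: Dict[str, Range]) -> Optional[str]:
    """Return a move whose pessimistic value is at least every rival's optimistic value."""
    for move, (low, _) in ranges.items():
        rivals = (high for other, (_, high) in ranges.items() if other != move)
        if all(low >= high for high in rivals):
            return move
    return None


def should_stop(ranges: Dict[str, Range]) -> bool:
    """Stop when one move is proven best or the material threshold is guaranteed."""
    if proven_best(ranges) is not None:
        return True
    return any(low >= WIN_THRESHOLD for low, _ in ranges.values())
```

With ranges `{"a": (2.5, 9.0), "b": (0.0, 5.0)}` no move is proven best, yet the search can still stop because move `a` already guarantees the threshold.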

The limited domain helps search

The program knows each position is sharp, in the sense that the offense can gain material.

The threshold (2 pawns) helps PARADISE terminate the search, and thus makes it “parameter-controlled”.

It is easier to make plans in sharp positions because more explicit concepts are involved.

Why doesn’t it work in general?

Advantages other than material are hard to capture.

Without a clear threshold, there is no way to terminate a search.

Wilkins did not have sophisticated planners available at that time.

Recap: PARADISE

Developed in the late 1970s.

Simplifies the problem by picking “sharp” positions.

Achieves the goal of knowledge-controlled search through planning and complicated pruning techniques.

– PARADISE: 10-100 nodes

– Brute force: 1,000-100,000 nodes

Revisit this problem

We revisit this problem because

– It is a core problem in AI.

– With the recent advances in searching, planning, and learning, we have more powerful tools than ever.

Why not chess again?

Deep Blue has beaten the human champion. Can we do better?

Chess is a complicated problem with many rules involved.

We prefer a problem with fewer rules that is more closely related to practical use.

A better understanding of how to reduce search will lead to new applications in, e.g., theorem proving and program verification.

Sokoban

A game demo: stage 17

Sokoban is PSPACE-complete

J. Culberson 1997. Proven by using Sokoban to simulate a finite-tape Turing machine.

The complexity of “popular” Sokoban instances, in which all goals are contiguous, is unknown.

Junghanns and Schaeffer’s work

They use A* search plus domain-specific enhancements to solve these problems.

Pure A* solves none of the 90 instances.

What makes it hard for domain-independent methods?

The underlying graph is directed, so deadlocks can occur.

Long solution lengths (up to 674) and a large branching factor produce a large search space.

Solutions are sequential. Subgoals are interrelated.

No simple lower bound on solution length.

Domain-dependent enhancements

Over 3 years, they have solved 52 out of the 90 instances. The number in parentheses appears to be the number of instances solved once each enhancement is added:

Lower bound (0)

Transposition table (6)

Move ordering (6)

Deadlock table 4*5 (8)

Tunnel macros (10)

Domain-dependent enhancements cont.

Goal macros (26)

Pattern search (46)

Relevance cuts (47): not safe

Overestimate (52): not optimal

Insights:

What improves performance most is “dynamic” knowledge (gleaned from search).

Examples: deadlock table, pattern search, transposition table.

Conclusion:

A* search plus all kinds of domain-dependent enhancements improves performance dramatically, though the result is still not satisfactory.

Search power, rather than human advice, works.

Our goal:

Use knowledge and planning to reduce search in this domain.

Ideally, the knowledge needed could be learned during a short period of exploration.

Junghanns and Schaeffer’s work gives us a good comparison.

How about current planners?

Blackbox took more than one hour to solve a two-ball instance (and a few seconds to solve a one-ball instance).

Planners are not good at dealing with long-range goal interactions. (McDermott 1998)

Domain knowledge is essential

We need to formalize the knowledge humans have. It is hard to formalize some “easy” concepts.

For example, rooms, tunnels, dead ends, entrances, goal area, etc.

We have a complex definition of room here: A room is …

– Any square in a 2*2 grid is a REG square.

– Any non-REG square next to a REG square is a WIR square.

– A room is a set of REG or WIR squares such that any two squares are connected by a path that passes only through REG squares.
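The room definition can be made concrete with a short sketch. Several points here are assumptions rather than part of the slides: the level is a list of strings with `#` for walls, a "2*2 grid" is read as a 2*2 block of non-wall squares, "next to" means orthogonally adjacent, and only non-wall squares can be WIR squares.

```python
from typing import List, Set, Tuple

Square = Tuple[int, int]  # (row, column)


def neighbors(sq: Square) -> List[Square]:
    r, c = sq
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def floor_squares(level: List[str]) -> Set[Square]:
    """All non-wall squares of the level."""
    return {(r, c) for r, row in enumerate(level)
            for c, ch in enumerate(row) if ch != "#"}


def reg_squares(floor: Set[Square]) -> Set[Square]:
    """Squares that belong to some 2*2 block of floor squares."""
    reg: Set[Square] = set()
    for r, c in floor:
        block = {(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)}
        if block <= floor:
            reg |= block
    return reg


def wir_squares(floor: Set[Square], reg: Set[Square]) -> Set[Square]:
    """Non-REG floor squares adjacent to a REG square."""
    return {sq for sq in floor - reg if any(n in reg for n in neighbors(sq))}


def rooms(floor: Set[Square]) -> List[Set[Square]]:
    """Group REG squares into connected components and attach bordering WIR squares."""
    reg = reg_squares(floor)
    wir = wir_squares(floor, reg)
    seen: Set[Square] = set()
    result: List[Set[Square]] = []
    for start in sorted(reg):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for n in neighbors(stack.pop()):
                if n in reg and n not in component:
                    component.add(n)
                    stack.append(n)
        seen |= component
        border = {w for w in wir if any(n in component for n in neighbors(w))}
        result.append(component | border)
    return result
```

In a level such as `["######", "#  ###", "#    #", "######"]`, the 2*2 block on the left forms the REG squares, the first corridor square next to it is a WIR square, and the far end of the corridor belongs to no room.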

Why do we need rooms?

An essential concept: deadlock

If we could define deadlock, we could always say that our next goal is to push one ball into a goal without causing deadlock; that statement is always true.

So deciding deadlock exactly is as hard as solving the puzzle: it is PSPACE-complete.

But still, we need to recognize “local” deadlock.
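As an illustration of a "local" deadlock, the sketch below checks the simplest case: a ball on a non-goal square that is blocked by a wall on one vertical side and one horizontal side, so it can never be pushed again. This corner check is a standard example chosen for illustration, not a method taken from the slides; the level encoding (`#` wall, `.` goal) and the function names are assumptions, and the level is assumed to be fully enclosed by walls.

```python
from typing import List, Tuple


def is_wall(level: List[str], r: int, c: int) -> bool:
    return level[r][c] == "#"


def is_corner_deadlock(level: List[str], ball: Tuple[int, int]) -> bool:
    """True if the ball sits in a wall corner on a non-goal square."""
    r, c = ball
    if level[r][c] == ".":
        return False  # a ball resting on a goal is not a problem
    # A wall above or below means the ball can never be pushed vertically.
    stuck_vertically = is_wall(level, r - 1, c) or is_wall(level, r + 1, c)
    # A wall to the left or right means it can never be pushed horizontally.
    stuck_horizontally = is_wall(level, r, c - 1) or is_wall(level, r, c + 1)
    return stuck_vertically and stuck_horizontally
```

A check like this looks only at the four neighbors of one ball, which is what makes it local: it is cheap, but it misses deadlocks that involve several balls.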

Deadlocks

How to detect deadlocks

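[Figure: board diagram with squares labeled a and b]

More complicated situations

[Figure: board diagrams with squares labeled 1 to 4, H, a, and b]

Deadlocks cont.

[Figure: board diagram with squares labeled H, a, and b]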

Deadlocks: subst rules

Classes: Wall > Ball > Empty > Goal

Replacing a lower-class square with a higher-class square preserves a deadlock.
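The substitution rule suggests a simple test: a configuration is a deadlock if every square is of the same or a higher class than the corresponding square of a known deadlock pattern. The sketch below assumes a hypothetical character encoding (`.` goal, space for empty, `$` ball, `#` wall) in which each square has exactly one class, and the function name `dominates` is made up for the example.

```python
from typing import Dict, List

# Class ordering from the slide: Wall > Ball > Empty > Goal.
RANK: Dict[str, int] = {".": 0, " ": 1, "$": 2, "#": 3}


def dominates(candidate: List[str], known_deadlock: List[str]) -> bool:
    """True if candidate arises from known_deadlock by raising square classes only."""
    if len(candidate) != len(known_deadlock):
        return False
    if any(len(a) != len(b) for a, b in zip(candidate, known_deadlock)):
        return False
    return all(RANK[a] >= RANK[b]
               for row_c, row_k in zip(candidate, known_deadlock)
               for a, b in zip(row_c, row_k))
```

For example, if four balls in a 2*2 block (`["$$", "$$"]`) is a known deadlock, then `["$#", "$$"]` is one too, since a ball was replaced by a wall. A `False` result proves nothing: the candidate may still be a deadlock for other reasons.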

Deadlock: another method

Proposed in Junghanns and Schaeffer, 1999

Basic idea: solve the one-ball problem; if there are balls on either the ball path or the man path, add those balls and solve it again.

Shrink: after finding a deadlock, try all proper subsets to find smaller deadlocks.
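The shrink step can be sketched as a brute-force subset search. This is a minimal illustration under stated assumptions: `is_deadlock` is a hypothetical callback standing in for whatever deadlock test is available, and balls are identified by their `(row, column)` positions.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Tuple

Ball = Tuple[int, int]


def shrink(balls: Iterable[Ball],
           is_deadlock: Callable[[FrozenSet[Ball]], bool]) -> FrozenSet[Ball]:
    """Return a smallest proper subset that is still a deadlock, or the full set."""
    full = frozenset(balls)
    # Try proper subsets from smallest to largest, so the first hit is minimal.
    for size in range(1, len(full)):
        for subset in combinations(sorted(full), size):
            if is_deadlock(frozenset(subset)):
                return frozenset(subset)
    return full
```

The number of subsets grows exponentially with the number of balls, which fits the remark that the method is computationally expensive.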

Deadlock: another method, cont.

Advantage: finds some global deadlocks.

Disadvantages:

– The method is neither sufficient nor necessary

– Computationally expensive

Why do we need logic?

The database has a cut-off size, so it never solves problems like:

Tasks:

Formalize the knowledge humans use.

Incorporate all the knowledge into a planner.

Find a planner suitable for a large amount of domain knowledge.

Hopefully, beat brute-force methods.

Can we learn this knowledge automatically?

Difficulties

Hard to formalize the vague concepts.

No current planner can generate long plans.

Category III rules cannot be captured in constraint-based planners (Huang et al. 1999).

Category III: control that depends on current state and requires dynamic user-defined predicates.