recitation 2, thursday september 23 search me! prof. bob ...web.mit.edu/6.034/ · difference...

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science

6.034 Artificial Intelligence Fall, 2010

Recitation 2, Thursday September 23

Search Me! Prof. Bob Berwick

1. Difference between Various Search Strategies: Uninformed search, and a bit more

The most important data structure in the search algorithm is the queue Q, the list of partial paths. To make things efficient, we may also keep a list of nodes already examined, the extended list E. A partial path is a path from the start node to some node X, written in reverse order, e.g., (X B A S). The head of a partial path is its most recent node, e.g., X. Let S be the start node and G be the goal node. Here is a simple, uninformed search algorithm:

1: Initialize Q with the partial path (S) as its only entry, i.e., Q = ((S))
2: while Q is not empty do
3:   Pick some path N from Q
4:   if head(N) = G then
5:     return N (we've reached the goal)
6:   end if
7:   Remove N from Q
8:   for all children c of head(N) not in E do
9:     Extend the partial path N to c
10:  end for
11:  Add these extensions of N somewhere in Q
12: end while
13: Signal failure

Two operations on Q primarily determine the search strategy:

• Step 3: Which path to extend? The strategy for picking element(s) N from Q.
• Step 11: How should the newly extended paths be added to Q? The strategy for adding path extensions from node(s) N.[1]

Search strategy               N (path to extend)          Add extensions of N to Q
Depth-first                   First element               Front of Q
Breadth-first                 First element               End of Q
Best-first                    Best by heuristic value     Anywhere in Q
Hill-climbing (no backup)     First element               Replace Q with sorted extensions of N
Hill-climbing (with backup)   First element               Front of Q, sorted by heuristic value
Beam (width k, no backup)     Best k by heuristic value   Anywhere in Q

All basic search methods, except for some so-called informed/heuristic search methods (like best-first search and beam search), pick the first partial path in Q. As we will see, most of the variation is in where the extended paths are inserted (and, in some cases, how).

[1] Note that in 'classical' algorithms terminology, the paths added to Q (the enqueued items) are called the 'open list'; the extended list is called the 'closed list.' A further note on the extended list: keeping it as a literal list is a lousy implementation. Most often, space is allocated in the node itself to keep a mark, which makes adding a node to the extended list and checking whether a node is there constant-time operations. Alternatively, a hash table may be used for efficient detection. In any case, the incremental space cost of an extended list will be proportional to the number of nodes in the worst case, which can be very high.

If, in addition, we use an extended list E, then the only differences are that (1) we add head(N) to E after N is extended; and (2) after we remove N from Q, we first check whether head(N) is in E. If it is, we continue at the top of the while loop without extending N. Otherwise, we continue with the remaining steps of the while loop (extending N and adding its extensions to Q). To implement a search method without backtracking/backup (BT), instead of adding all the extensions of N to Q, we add only one. Let's try out DFS, BFS, and hill-climbing with backup (i.e., with backtracking).

DFS: pick the first element of Q; add path extensions to the front of Q.
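The 13-step algorithm, together with the extended-list rule just described, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the notes never list the example graph's edges, so the directed adjacency lists in `GRAPH` are inferred from the traces in this handout, and the names `search`, `depth_first`, and `breadth_first` are hypothetical. Partial paths are lists in reverse order (head first), matching the notation (X B A S), and E is a Python `set`, which gives the constant-time membership test the footnote recommends.

```python
from typing import Callable, Dict, List, Optional

Path = List[str]  # reversed partial path, head first: ["D", "A", "S"] is (D A S)

# Assumed example graph: directed edges inferred from the traces in these notes.
GRAPH: Dict[str, List[str]] = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D", "G"],
    "C": [],
    "D": ["C", "G"],
    "G": [],
}


def search(graph: Dict[str, List[str]], start: str, goal: str,
           add: Callable[[List[Path], List[Path]], List[Path]]) -> Optional[Path]:
    """Generic search (steps 1-13) with an extended list E.

    `add` is the strategy of step 11: where the extensions of N go in Q.
    """
    queue: List[Path] = [[start]]       # step 1: Q = ((S))
    extended = set()                    # E, kept as a set for O(1) lookups
    while queue:                        # step 2
        path = queue.pop(0)             # steps 3 and 7: take the first path off Q
        head = path[0]
        if head == goal:                # steps 4-6
            return path
        if head in extended:            # head(N) already extended: skip N
            continue
        extensions = [[child] + path    # steps 8-10: extend N to children not in E
                      for child in graph[head] if child not in extended]
        extended.add(head)              # add head(N) to E after N is extended
        queue = add(queue, extensions)  # step 11
    return None                         # step 13: signal failure


def depth_first(queue: List[Path], extensions: List[Path]) -> List[Path]:
    return extensions + queue           # DFS: extensions go on the front of Q


def breadth_first(queue: List[Path], extensions: List[Path]) -> List[Path]:
    return queue + extensions           # BFS: extensions go on the end of Q


print(search(GRAPH, "S", "G", depth_first))  # ['G', 'D', 'A', 'S'], i.e. (G D A S)
```

Only step 11 differs between the two strategies, so passing `breadth_first` instead is a quick way to check your answer to the BFS exercise. The DFS figure tracks a "visited" list rather than an extended list; on this graph both end with the same path.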

Depth-first search (figure): pick the first element of Q; add path extensions to the front of Q.

Figure: the example graph, with nodes S, A, B, C, D, G.

Step  Q                          Visited
1     (S)                        S
2     (A S) (B S)                A, B, S
3     (C A S) (D A S) (B S)      C, D, B, A, S
4     (D A S) (B S)              C, D, B, A, S
5     (G D A S) (B S)            G, C, D, B, A, S

We show the partial paths in reversed order; the head is the first node.

Depth-first. Why didn't we add (C D A S) in step 5?

Your turn: BFS. Pick the first element of Q; add extensions to the end of Q.

Step  Q                          Extended list
1     (S)
2     (A S) (B S)                S
3     (B S) (C A S) (D A S)      A, S
4     [what happens now?]
5
6

Hill-climbing, with backup. (So this is a little bit informed!) Pick the first element of Q. Add path extensions, sorted by heuristic value, to the front of Q. The heuristic value is a measure of the 'goodness' of the path, e.g., an estimate of how far there is left to go, as the crow flies, or in some other terms if we are not on a map. (We will see a bit later how to work this into optimal search.) Note that hill-climbing looks only at the next locally best step. (We tack the heuristic value onto the front of each path to keep track; note the sorting. What if there is no backup?)

Step  Q                              Extended list
1     (S)
2     (A S) (B S)                    S
3     (C A S) (D A S) (B S)          A, S
4     (D A S) (B S)                  C, A, S
5     (G D A S) (B S)                D, C, A, S
6     Success: pop the path with G   G, D, C, A, S

With the heuristic value of each path's head written in front:

Step  Q                              Extended list
1     (10 S)
2     (2 A S) (3 B S)                S
3     (1 C A S) (4 D A S) (3 B S)    A, S
4     (4 D A S) (3 B S)              C, A, S
5     (0 G D A S) (3 B S)            D, C, A, S
6     Success                        G, D, C, A, S

Figure: Hill-climbing (with backup) on the example graph. Pick the first element of Q; add path extensions, sorted by heuristic value, to the front of Q. The heuristic value of each path's head is written in front of the path.

Heuristic values: A = 2, B = 3, C = 1, D = 4, G = 0, S = 10
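A minimal sketch of hill-climbing, assuming the same directed example graph as inferred from the traces (an assumption, since the figure's edges are not given) and the heuristic values listed above. The function name `hill_climb` and its `backup` flag are hypothetical: with `backup=True` the sorted extensions go on the front of Q, and with `backup=False` Q is replaced by the sorted extensions, as in the strategy table.

```python
from typing import Dict, List, Optional

Path = List[str]  # reversed partial path, head first

# Assumed example graph (directed edges inferred from the traces) and the
# heuristic values listed in the hill-climbing figure.
GRAPH: Dict[str, List[str]] = {
    "S": ["A", "B"], "A": ["C", "D"], "B": ["D", "G"],
    "C": [], "D": ["C", "G"], "G": [],
}
H: Dict[str, int] = {"S": 10, "A": 2, "B": 3, "C": 1, "D": 4, "G": 0}


def hill_climb(graph: Dict[str, List[str]], h: Dict[str, int], start: str,
               goal: str, backup: bool = True) -> Optional[Path]:
    queue: List[Path] = [[start]]
    extended = set()
    while queue:
        path = queue.pop(0)                     # always take the first element of Q
        head = path[0]
        if head == goal:
            return path
        if head in extended:
            continue
        extensions = [[child] + path for child in graph[head]
                      if child not in extended]
        extended.add(head)
        extensions.sort(key=lambda p: h[p[0]])  # lowest h (most promising) first
        if backup:
            queue = extensions + queue          # with backup: sorted extensions on the front
        else:
            queue = extensions                  # no backup: Q replaced, old paths forgotten
    return None


print(hill_climb(GRAPH, H, "S", "G"))                # ['G', 'D', 'A', 'S']
print(hill_climb(GRAPH, H, "S", "G", backup=False))  # None
```

Without backup, the climber commits to (C A S), which is a dead end, and fails; this is why the cost table answers "No" to "Guaranteed to find path?" for hill-climbing without backup.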

Best-first search: the idea is to use an evaluation function for each node, an estimate of its 'desirability', and then expand the most desirable unexpanded node along the entire 'fringe'.
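Best-first search differs from hill-climbing only in picking the best path by heuristic value from anywhere in Q, rather than the first one. A minimal sketch, keeping Q as a binary heap with Python's `heapq` module; the graph edges are inferred from the traces (an assumption), the heuristic values are those of the hill-climbing figure, and the name `best_first` is hypothetical. The counter `tie` is an implementation choice that breaks ties in favor of the path added to Q earlier.

```python
import heapq
from itertools import count
from typing import Dict, List, Optional

Path = List[str]  # reversed partial path, head first

# Assumed example graph (directed edges inferred from the traces) and the
# heuristic values from the hill-climbing figure.
GRAPH: Dict[str, List[str]] = {
    "S": ["A", "B"], "A": ["C", "D"], "B": ["D", "G"],
    "C": [], "D": ["C", "G"], "G": [],
}
H: Dict[str, int] = {"S": 10, "A": 2, "B": 3, "C": 1, "D": 4, "G": 0}


def best_first(graph: Dict[str, List[str]], h: Dict[str, int],
               start: str, goal: str) -> Optional[Path]:
    tie = count()                              # equal h: older path comes off first
    queue = [(h[start], next(tie), [start])]   # Q as a priority queue keyed on h(head)
    extended = set()
    while queue:
        _, _, path = heapq.heappop(queue)      # best path by h, from anywhere in Q
        head = path[0]
        if head == goal:
            return path
        if head in extended:
            continue
        for child in graph[head]:
            if child not in extended:
                heapq.heappush(queue, (h[child], next(tie), [child] + path))
        extended.add(head)
    return None


print(best_first(GRAPH, H, "S", "G"))  # ['G', 'B', 'S']
```

On this graph the sketch ends with (G B S), matching the best-first trace in these notes.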

Best-first search on the same graph (heuristic value of the head in front):

Step  Q                              Extended list
1     (10 S)
2     (2 A S) (3 B S)                S
3     (1 C A S) (3 B S) (4 D A S)    A, S
4     (3 B S) (4 D A S)              C, A, S
5     (0 G B S) (4 D A S)            B, C, A, S
6     Success                        G, B, C, A, S

Cost and Performance of Various Search Strategies (branching factor = b, depth = d)

Worst-case time is proportional to the number of nodes visited; worst-case space is proportional to the maximum length of Q.

Search strategy               Worst time   Worst space   Fewest nodes?   Guaranteed to find path?
Depth-first (with backup)     $b^{d+1}$    $bd$          No              Yes
Breadth-first                 $b^{d+1}$    $b^d$         Yes             Yes
Hill-climbing (no backup)     $d$          $b$           No              No
Hill-climbing (with backup)   $b^{d+1}$    $bd$          No              Yes
Best-first                    $b^{d+1}$    $b^d$         No              Yes
Beam (width k, no backup)     $kd$         $kb$          No              No

How could we combine the space efficiency of DFS with the guarantee of BFS (which finds a path to the goal with the minimum number of nodes)? Answer: Iterative Deepening Search (IDS): run DFS with a depth limit, level by level, increasing the limit by one each time, until we find the goal or run out of time. Let's see.

Counting Nodes in a Tree

Why is $(b^{d+1} - 1)/(b - 1)$ the number of nodes in a tree (branching factor = b, depth = d)? If each node has b immediate descendants, then level 0 (the root) has 1 node; level 1 has b nodes; level 2 has $b \cdot b = b^2$ nodes; level 3 has $b^2 \cdot b = b^3$ nodes; and so on, until level d has $b^{d-1} \cdot b = b^d$ nodes.

So the total number of nodes is
$N = 1 + b + b^2 + b^3 + \dots + b^d$,
and multiplying by b,
$bN = b + b^2 + b^3 + \dots + b^d + b^{d+1}$.

Subtracting: $(b - 1)N = b^{d+1} - 1$, so $N = \dfrac{b^{d+1} - 1}{b - 1}$.

So we could do this to implement iterative deepening search (IDS):

1: Initialize $D_{max} = 1$. (The goal node is at an unknown depth d.)
2: do
3:   DFS from S to the fixed depth $D_{max}$
4:   if a goal node is found (at depth $d \le D_{max}$), then exit
5:   $D_{max} = D_{max} + 1$

The cost is $O(b^1 + b^2 + \dots + b^L) = O(b^L)$, where L is the length of the path to the goal.

But isn't IDS wasteful, because we repeat searches on the different iterations? No. For example, suppose b = 10 and d = 5. Then the total number of nodes N we look at in each case is:

$N(\text{IDS}) \approx d\,b + (d-1)\,b^2 + \dots + 1 \cdot b^d = 5 \cdot 10 + 4 \cdot 10^2 + 3 \cdot 10^3 + 2 \cdot 10^4 + 10^5 = 123{,}450$,

while for BFS the number of nodes is approximately

$N(\text{BFS}) \approx b + b^2 + \dots + b^d = 111{,}110$,

only about 10% fewer. Most of the time is spent at depth d, the fringe of the search tree, so IDS is asymptotically optimal. It is the preferred method over BFS and DFS when the goal depth is unknown.

2. Searching smarter: optimal search (finding the best path to the goal)

"Heuristics, Patient rules of thumb, So often scorned: Sloppy, Dumb! Yet, Slowly, common sense come" (Ode to AI)

There are two functions that an agent can use to guide optimal search (finding the smallest-cost path):
1. g(n): a function that measures the "cost" from the start node to the current node n (e.g., distance traveled so far, number of tiles moved, ...).
2. h(n): a function that estimates the future (forthcoming) cost from a node n to the goal node. This is usually called a heuristic function (e.g., 'as the crow flies' distance, or even h(n) = 0).

Note: the total estimated cost to the goal running through node n is thus given by $f(n) = g(n) + h(n)$. The total true cost to the goal will be denoted $f^*(n) = g(n) + h^*(n)$, where $f^*$ and $h^*$ indicate actual rather than estimated values. Note that in general we will set things up so that the following holds (why?): $f(n) \le f^*(n)$.

2a. Branch and bound (B&B): Pick the best element of Q, measured by total path length; i.e., set h(n) = 0, so that f(n) = g(n), the distance so far, which is always ≥ 0. (Q: what about ties?) Add path extensions anywhere in Q.
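The B&B rule just stated can be sketched as follows: a minimal sketch, not the course's code. The edge lengths in `WEIGHTED` are an assumption inferred from the path lengths in the B&B and A* traces (S-A 2, S-B 5, A-C 2, A-D 4, B-D 1, B-G 5, D-C 3, D-G 2), the name `branch_and_bound` is hypothetical, and the `child not in path` check is an addition of this sketch that keeps each partial path loop-free. There is no extended list yet.

```python
import heapq
from itertools import count
from typing import Dict, List, Optional, Tuple

Path = List[str]  # reversed partial path, head first

# Assumed edge lengths, inferred from the path lengths in the B&B and A* traces.
WEIGHTED: Dict[str, Dict[str, int]] = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}


def branch_and_bound(graph: Dict[str, Dict[str, int]], start: str,
                     goal: str) -> Optional[Tuple[int, Path]]:
    tie = count()                              # equal lengths: older path comes off first
    queue = [(0, next(tie), [start])]          # (path length, tie-breaker, path)
    while queue:
        length, _, path = heapq.heappop(queue) # shortest partial path, anywhere in Q
        head = path[0]
        if head == goal:   # stop when a goal path is picked, not when it first appears in Q
            return length, path
        for child, cost in graph[head].items():
            if child not in path:              # skip loops back into this same path
                heapq.heappush(queue, (length + cost, next(tie), [child] + path))
    return None


print(branch_and_bound(WEIGHTED, "S", "G"))  # (8, ['G', 'D', 'A', 'S'])
```

A goal path such as (10 G B S) can sit in Q for a while; the loop returns only once a goal path is the shortest element of Q.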

B&B is like best-first search, except that it uses the total length (cost) of a path instead of the heuristic value of just the head node, pushing forward along the entire fringe of the search tree. It is also called 'uniform-cost search' in the algorithms literature (we shall see why).

What about ties, as in step 5?

Note how the cost always increases monotonically in a branch-and-bound search. Q: Why don't we stop at step 5? We don't want to stop as soon as the goal merely appears in some path on Q! Why not?

Now we need to add two more things to get optimal A* search. (B&B plus the dynamic-programming step below, with no heuristic, is essentially Dijkstra's algorithm.) Note that B&B is really finding the optimal path to each node in the graph; it is not 'biased' in the direction of the goal node. We use the heuristic function h to do that.

A* = B&B + dynamic programming + admissible heuristic
1. We know what B&B (branch and bound) is.
2. Dynamic programming principle (Principle AWP... told you we'd see it again!)
(i) Given that path length is additive, the shortest path from S to G via a node X must be made up of the shortest path from S to X and then the shortest path from X to G.
(ii) This means we only need to compute the single best path from S to any node X, because any other path to X will not be part of the final answer.
(iii) Note that the first time B&B pulls a partial path off Q whose head node is X, this path is the shortest path from S to X. This follows from the fact that B&B enumerates paths in order of total length.
(iv) So, once we find one path to X, we don't need to consider (extend) any other paths to X. This is where we can use the extended list again: if the head of the partial path we pull off Q is in the extended list, we discard the partial path. We should also discard paths already in Q whose head is in the extended list.
With all this, B&B is still correct, but inefficient. Why? (See the answer above.)

3. A* and heuristic functions: admissible heuristics, consistency, and examples

The main idea of A* is to avoid expanding paths that are already expensive. We use the evaluation function $f(n) = g(n) + h(n)$.

Step  Q                                                 E
1     (0 S)
2     (2 A S) (5 B S)                                   S
3     (4 C A S) (5 B S) (6 D A S)                       A, S
4     (5 B S) (6 D A S)                                 C, A, S
5     (6 D A S) (6 D B S) (10 G B S)                    B, C, A, S
6     (6 D B S) (8 G D A S) (9 C D A S) (10 G B S)      D, B, C, A, S
7     (8 G D A S) (9 C D A S) (9 C D B S) (10 G B S)    G, D, B, C, A, S
8     Success! (8 G D A S)                              G, D, B, C, A, S

Figure: Branch and Bound. Pick the best (by path length) element of Q; add path extensions anywhere in Q. The example graph has nodes S, A, B, C, D, G, with edge lengths S-A 2, S-B 5, A-C 2, A-D 4, B-D 1, B-G 5, D-C 3, D-G 2.

So, we sort Q by this value and pick the path with the best f. A heuristic is admissible if $h(n) \le h^*(n)$, i.e., the heuristic cost is always an underestimate (an optimistic estimate) of the true cost to the goal from that node. Note that 0 will always be an admissible heuristic value, but then we are just doing B&B again: it won't prune the search space. The 'straight line' distance is admissible if we're in a Euclidean space, but we might be in other situations, with other costs, where we can't assume this. We also require that h(n) be nonnegative. If these conditions hold, then A* search will provably find the optimal path. (If they don't hold, it may not, as an example will show: we may miss a better path.)

Step  Q                                     E
1     (0 S)
2     (4 A S) (8 B S)                       S
3     (5 C A S) (7 D A S) (8 B S)           A, S
4     (7 D A S) (8 B S)                     C, A, S
5     (8 G D A S) (8 B S) (10 C D A S)      D, C, A, S
6     Success! (8 G D A S)                  G, D, C, A, S

Note the Q is shorter: A* is more efficient than B&B. Why didn't we expand (8 B S) after step 5?

Note that if the heuristic values at S and D were 10 and 4, these would be inadmissible because, e.g., 4 is an overestimate of the remaining distance from D to the goal, which is 2. So the h value at D must be ≤ 2. Admissibility is a constraint that must hold between every node and the goal node.

There is another constraint that is sometimes easier to check and that also implies admissibility, namely consistency, which amounts to the triangle inequality and ensures that f(n) is non-decreasing along any path. (However, admissibility does not imply consistency, so it's not a biconditional.)

Definition: A heuristic is consistent if, for every node n, every successor n' of n (reached by action a) satisfies the following condition: $h(n) \le c(n,a,n') + h(n')$.

Figure: the triangle formed by node n, its successor n', and the goal G, with sides c(n,a,n'), h(n'), and h(n). This will work to ensure admissibility.

So if h is consistent, we have:

$$
\begin{aligned}
f(n') &= g(n') + h(n') && \text{by definition of } f\\
      &= g(n) + c(n,a,n') + h(n') && \text{path cost is additive}\\
      &\ge g(n) + h(n) = f(n) && \text{by consistency of } h \text{ and the definition of } f
\end{aligned}
$$

So f(n) is non-decreasing along any path. Question: is the search graph above consistent? (Hint: look at node D and calculate f values.)
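Checking consistency is a purely local, edge-by-edge test, so it is easy to automate. A minimal sketch: the edge lengths are an assumption inferred from the B&B and A* traces, the heuristic values are those listed with the A* figure, and the function name `consistency_violations` is hypothetical. It returns every edge that breaks $h(n) \le c(n,a,n') + h(n')$; an empty list means h is consistent on the graph.

```python
from typing import Dict, List, Tuple

# Assumed edge lengths (inferred from the traces) and the heuristic values
# listed with the A* figure.
WEIGHTED: Dict[str, Dict[str, int]] = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}
H: Dict[str, int] = {"S": 0, "A": 2, "B": 3, "C": 1, "D": 1, "G": 0}


def consistency_violations(graph: Dict[str, Dict[str, int]],
                           h: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return every edge (n, n') with h(n) > c(n, n') + h(n')."""
    return [(n, succ)
            for n, successors in graph.items()
            for succ, cost in successors.items()
            if h[n] > cost + h[succ]]


print(consistency_violations(WEIGHTED, H))
```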

Figure: A* (B&B + dynamic programming + admissible heuristic). Pick the best (by path length + heuristic) element of Q; add path extensions anywhere in Q. Same example graph as for branch and bound.

Heuristic values: A = 2, B = 3, C = 1, D = 1, G = 0, S = 0

A* search generalizes Dijkstra's algorithm: with h(n) = 0 everywhere, it reduces to Dijkstra's algorithm (B&B plus an extended list).
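Putting the pieces together, here is a minimal sketch of A* as B&B ordered by f = g + h, plus the extended list from the dynamic-programming step. The edge lengths are an assumption inferred from the traces, the heuristic values are those listed with the A* figure, and the name `a_star` is hypothetical. One caution, stated as general background rather than as part of these notes: discarding paths whose head is already extended is guaranteed to keep an optimal path when h is consistent; with an admissible but inconsistent h it can, in general, discard a better path.

```python
import heapq
from itertools import count
from typing import Dict, List, Optional, Tuple

Path = List[str]  # reversed partial path, head first

# Assumed edge lengths (inferred from the traces) and the A* figure's heuristic values.
WEIGHTED: Dict[str, Dict[str, int]] = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}
H: Dict[str, int] = {"S": 0, "A": 2, "B": 3, "C": 1, "D": 1, "G": 0}


def a_star(graph: Dict[str, Dict[str, int]], h: Dict[str, int], start: str,
           goal: str) -> Optional[Tuple[int, Path]]:
    tie = count()
    # Q entries: (f = g + h, tie-breaker, g, reversed path)
    queue = [(h[start], next(tie), 0, [start])]
    extended = set()
    while queue:
        _, _, g, path = heapq.heappop(queue)   # best f, from anywhere in Q
        head = path[0]
        if head == goal:
            return g, path
        if head in extended:   # dynamic programming: a path to head was already extended
            continue           # (guaranteed safe when h is consistent)
        extended.add(head)
        for child, cost in graph[head].items():
            if child not in extended:
                g_child = g + cost
                heapq.heappush(queue, (g_child + h[child], next(tie), g_child, [child] + path))
    return None


print(a_star(WEIGHTED, H, "S", "G"))                         # (8, ['G', 'D', 'A', 'S'])
print(a_star(WEIGHTED, {n: 0 for n in WEIGHTED}, "S", "G"))  # (8, ['G', 'D', 'A', 'S'])
```

The second call sets h = 0 everywhere, which turns the sketch into B&B with an extended list (Dijkstra's algorithm); it finds the same path, only with more work on larger graphs.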

Example. Is the heuristic on this search graph admissible? Change the values so that it is admissible but not consistent.
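Admissibility, unlike consistency, compares h with the true remaining cost $h^*(n)$, so checking it first requires computing $h^*$. A minimal sketch: run Dijkstra's algorithm backward from the goal over reversed edges, then compare. The edge lengths are an assumption inferred from the traces, and the names `true_costs_to_goal` and `is_admissible` are hypothetical. Nodes that cannot reach the goal are treated as having $h^* = \infty$.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed edge lengths, inferred from the path lengths in the B&B and A* traces.
WEIGHTED: Dict[str, Dict[str, int]] = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}


def true_costs_to_goal(graph: Dict[str, Dict[str, int]], goal: str) -> Dict[str, int]:
    """h*(n) for every node that can reach the goal (Dijkstra on reversed edges)."""
    reverse: Dict[str, List[Tuple[str, int]]] = {n: [] for n in graph}
    for n, successors in graph.items():
        for succ, cost in successors.items():
            reverse[succ].append((n, cost))
    dist = {goal: 0}
    queue = [(0, goal)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue                          # stale queue entry
        for pred, cost in reverse[node]:
            if d + cost < dist.get(pred, float("inf")):
                dist[pred] = d + cost
                heapq.heappush(queue, (d + cost, pred))
    return dist


def is_admissible(graph: Dict[str, Dict[str, int]], h: Dict[str, int], goal: str) -> bool:
    h_star = true_costs_to_goal(graph, goal)
    # Nodes missing from h_star cannot reach the goal: h* is infinite there.
    return all(h[n] <= h_star.get(n, float("inf")) for n in graph)


# The overestimates discussed above: h(S) = 10 > 8 and h(D) = 4 > 2.
print(is_admissible(WEIGHTED, {"S": 10, "A": 2, "B": 3, "C": 1, "D": 4, "G": 0}, "G"))  # False
```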

Properties of A*
1. Complete: A* is complete unless there are infinitely many nodes with $f \le f(G)$.
2. Time: exponential in [relative error in h × length of solution path].
3. Space: keeps all nodes in memory (the dark side of A*: it usually runs out of memory).
4. Optimal: yes; it cannot expand the $f_{i+1}$ contour until the $f_i$ contour is finished.

A* expands all nodes with $f(n) < C^*$, where $C^*$ is the optimal cost (distance).
A* expands some nodes with $f(n) = C^*$.
A* expands no nodes with $f(n) > C^*$.

Optimality of A* [optional]

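Figure: a search tree from Start, with the optimal goal G, a suboptimal goal G2 on the fringe, and an unexpanded fringe node n on a shortest path to G.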

Suppose the algorithm has generated some suboptimal goal G2, which is in the fringe, as in the picture. Let n be an unexpanded node in the fringe such that n is on a shortest path to the optimal goal G. Then:
(1) f(G2) = g(G2), since h(G2) = 0
(2) g(G2) > g(G), since G2 is suboptimal
(3) f(G) = g(G), since h(G) = 0
(4) f(G2) > f(G), from (1), (2), (3)
(5) h(n) ≤ h*(n), since h is admissible
(6) g(n) + h(n) ≤ g(n) + h*(n), adding g(n) to both sides of (5)
(7) f(n) ≤ f(G), since the left side of (6) is f(n), and, because n lies on a shortest path to G, the right side is g(n) + h*(n) = g(G) = f(G)
But then:
(8) f(G2) > f(n), by (4) and (7), so A* will never select G2 for expansion. QED.

Figure: an A* search tree in which each node (A through G) is annotated with its g, h, and f values.

Another picture, possibly more helpful (see properties of A*). A* expands in terms of increasing f values (like B&B), directed along contours ‘pointing’ towards the goal.

Figure: a map of cities with A* f-contours labeled 380, 400, and 420.

4. The value of good heuristics: the 8 puzzle

Figure: an 8-puzzle start state and goal state.

What is a 'legal move'? What would be a good heuristic h for this puzzle? Note that even IDS is costly: at solution depth 14, IDS typically searches 3,473,941 nodes; at solution depth 24, about 54,000,000,000 nodes. Two suggested heuristics: h1 = 7; h2 = ???? The first is called: The second is called: Question: can you guess what happens to the efficiency of search if it's always the case that h2(n) ≥ h1(n) for all n (both admissible)? Why?
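Two heuristics often used for the 8-puzzle, both admissible, are the number of misplaced tiles and the sum of the tiles' Manhattan (row plus column) distances to their goal squares. A minimal sketch follows; the goal layout `GOAL` (tiles 1 to 8 in reading order, blank last) is a hypothetical choice, because the puzzle figure is not reproduced here, and the function names are illustrative. Every misplaced tile is at least one move from its goal square, so the Manhattan sum is never smaller than the misplaced-tile count on any state.

```python
from typing import Tuple

State = Tuple[int, ...]  # 9 tiles read row by row; 0 stands for the blank

# Hypothetical goal layout (the original figure is not reproduced here).
GOAL: State = (1, 2, 3,
               4, 5, 6,
               7, 8, 0)


def misplaced_tiles(state: State, goal: State = GOAL) -> int:
    """Number of tiles (ignoring the blank) not on their goal square."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != goal[i])


def manhattan_distance(state: State, goal: State = GOAL) -> int:
    """Sum over tiles of row distance + column distance to the goal square."""
    goal_pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:                         # the blank is not a tile
            row, col = divmod(i, 3)
            goal_row, goal_col = goal_pos[tile]
            total += abs(row - goal_row) + abs(col - goal_col)
    return total


s: State = (0, 2, 3,
            4, 5, 6,
            7, 8, 1)  # tile 1 sits in the bottom-right corner
print(misplaced_tiles(s), manhattan_distance(s))  # 1 4
```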
