A Tutorial on Planning Graph–Based Reachability Heuristics

Daniel Bryce and Subbarao Kambhampati

The primary revolution in automated planning in the last decade has been the very impressive scale-up in planner performance. A large part of the credit for this can be attributed squarely to the invention and deployment of powerful reachability heuristics. Most, if not all, modern reachability heuristics are based on a remarkably extensible data structure called the planning graph, which made its debut as a bit player in the success of GraphPlan, but quickly grew in prominence to occupy the center stage. Planning graphs are a cheap means to obtain informative look-ahead heuristics for search and have become ubiquitous in state-of-the-art heuristic search planners. We present the foundations of planning graph heuristics in classical planning and explain how their flexibility lets them adapt to more expressive scenarios that consider action costs, goal utility, numeric resources, time, and uncertainty.

Synthesizing plans capable of achieving an agent's goals has always been a central endeavor in AI. Considerable work has been done in the last 40 years on modeling a wide variety of plan-synthesis problems and developing multiple search regimes for driving the synthesis itself. Despite this progress, the ability to synthesize reasonable-length plans under even the most stringent restrictions remained severely limited. This state of affairs has changed quite dramatically in the last decade, giving rise to planners that can routinely generate plans with hundreds of actions. This revolutionary shift in scalability can be attributed in large part to the use of sophisticated reachability heuristics to guide the planners' search.

Reachability heuristics aim to estimate the cost of a plan between the current search state and the goal state. While reachability analysis can be carried out in many different ways (Bonet and Geffner 1999; McDermott 1999; Ghallab and Laruelle 1994), one particular way—involving planning graphs—has proven to be very effective and extensible. Planning graphs were originally introduced as part of the GraphPlan algorithm (Blum and Furst 1995) but quickly grew in prominence once their connection to reachability analysis was recognized.

Planning graphs provide inexpensive but informative reachability heuristics by approximating the search tree rooted at a given state. They have also proven to be quite malleable in being adapted to a range of expressive planning problems. Planners using such heuristics have come to dominate the state of the art in plan synthesis. Indeed, 12 out of 20 teams in the 2004 International Planning Competition and 15 out of 22 teams in the 2006 International Planning Competition used planning graph–based heuristics. In the past, the lack of search control for domain-independent planners led most planning applications to be written with a knowledge-intensive planning approach—such as hierarchical task networks. However, the effectiveness of reachability heuristics has started tilting the balance back and is giving rise to a set of applications based on domain-independent planners (Ruml, Do, and Fromherz 2005; Boddy et al. 2005).

This article¹ aims to provide an accessible introduction to reachability heuristics in planning. While we will briefly describe the planning algorithms to set the context for the heuristics, our aim is not to provide a comprehensive introduction to planning algorithms. Rather, we seek to complement existing sources (see, for example, Ghallab, Nau, and Traverso [2004]) by providing a complete introduction to reachability heuristics in planning.

As outlined in figure 1, we discuss the classical planning model along with several extensions (in clockwise order). Naturally, as the planning model becomes more expressive, the planning graph representation and heuristics change. We start with classical planning to lay an intuitive foundation and then build up techniques for handling additional problem features. We concentrate on the following features independently, pointing out where they have been combined. Action costs are uniform in the classical model, but we later describe nonuniform costs. Goal satisfaction is total in the classical model but can be partial in general (that is, the cost of satisfying all goals can exceed the benefit). Resources are described only by Boolean variables in the classical model but can be integer or real-valued variables. Time is atomic in the classical model, but is durative in general (that is, actions can have different durations). Uncertainty is nonexistent in the classical model but can generally result through uncertain action effects and partial (or no) observability. While there are many additional ways to extend the classical model, we hold some features constant throughout our discussion. The number of agents is restricted to only one (the planning agent), execution takes place after plan synthesis, and with the exception of temporal planning, plans are sequential (no actions execute concurrently).

Figure 1. Taxonomy of Planning Models.
- Classical Planning: unit-cost actions, Boolean-valued variables, atomic time, known state, deterministic actions, fully satisfied goals.
- Cost-Based Planning: nonunit-cost actions, Boolean-valued variables, atomic time, known state, deterministic actions, fully satisfied goals.
- Partial Satisfaction Planning: nonunit-cost actions, Boolean-valued variables, atomic time, known state, deterministic actions, partially satisfied goals.
- Planning with Resources: unit-cost actions, real-valued variables, atomic time, known state, deterministic actions, fully satisfied goals.
- Temporal Planning: unit-cost actions, Boolean-valued variables, durative time, known state, deterministic actions, fully satisfied goals.
- Planning Under Uncertainty: unit-cost actions, Boolean-valued variables, atomic time, partially known state, uncertain actions, fully satisfied goals.

We will use variations on the following running example to illustrate how several planning models can address the same problem and how we can derive heuristics for the models:

Example 1 (Planetary Rover). Ground control needs to plan the daily activities for the rover Conquest. Conquest is currently at location alpha on Mars, with partial power reserves and some broken sensors. The mission scientists have objectives to obtain a soil sample from location alpha, a rock core sample from location beta, and a photo of the lander from the hilltop location gamma—each objective having a different utility. The rover must communicate any data it collects to have the objectives satisfied. The driving time and cost from one location to another vary depending on the terrain and distance between locations. Actions to collect soil, rock, and image data incur different costs and take various times. Mission scientists may not know which locations will have the data they need, and the rover's actions may fail, leading to potential sources of uncertainty.

History of Reachability and Planning Graph Heuristics

Given the current popularity of heuristic search planners, it is somewhat surprising to note that the interest in reachability heuristics in AI planning is a relatively new development. Ghallab and his colleagues were the first to report on a reachability heuristic, in IxTeT (Ghallab and Laruelle 1994), for doing action selection in a partial-order planner. However, the effectiveness of their heuristic was not adequately established. Subsequently the idea of reachability heuristics was independently (re)discovered by Drew McDermott (1996, 1999) in the context of his UNPOP planner. UNPOP was one of the first domain-independent planners to synthesize plans containing up to 40 actions. A second independent rediscovery of the idea of using reachability heuristics in planning was made by Blai Bonet and Hector Geffner (1999). Each rediscovery is the result of attempts to speed up plan synthesis within a different search substrate (partial-order planning, regression, and progression).

The most widely accepted approach to computing reachability information is embodied by GraphPlan. The original GraphPlan planner (Blum and Furst 1995) used a specialized combinatorial algorithm to search for subgraphs of the planning graph structure that correspond to valid plans. It is interesting to note that almost 75 percent of the original GraphPlan paper was devoted to the specifics of this combinatorial search. Subsequent interpretations of GraphPlan recognized the role of the planning graph in capturing reachability information. Subbarao Kambhampati, Eric Lambrecht, and Eric Parker (1997) explicitly characterized the planning graph as an envelope approximation of the progression search tree (see figure 3 of their work and the associated discussion). Bonet and Geffner (1999) interpreted GraphPlan as an IDA* search with the heuristic encoded in the planning graph. XuanLong Nguyen and Subbarao Kambhampati (2000) described methods for directly extracting heuristics from the planning graph. That same year, FF (Hoffmann and Nebel 2001), a planner using planning graph heuristics, placed first in the International Planning Competition. Since then there has been a steady stream of developments that increased both the effectiveness and the coverage of planning graph heuristics.

Classical Planning

In this section we start with a brief background on how the classical planning problem is represented and why the problem is difficult. We follow with an introduction to planning graph heuristics for state-based progression search (extending plan prefixes). In the Heuristics in Alternative Planning Strategies section, we cover issues involved with using planning graph heuristics for state-based regression and plan space search.

Background

The classical planning problem is defined as a tuple <P, A, sI, G>, where P is a set of propositions, A is a set of actions, sI is an initial state, and G is a set of goal propositions. Throughout this article we will use many representational assumptions (described later) consistent with the STRIPS language (Fikes and Nilsson 1971). While STRIPS is only one of many choices for action representation, it is very simple, and most other action languages can be compiled down to STRIPS (Nebel 2000). In STRIPS a state s is a subset of the propositions P, where every proposition p ∈ s is said to be true (or to hold) in the state s. Any proposition p ∉ s is false in s. The initial state is specified by a set of propositions sI ⊆ P known to be true (the false propositions are inferred by the closed-world assumption), and the goal is a set of propositions G ⊆ P that must be made true in a state s for s to be a goal state. Each action a ∈ A is described by a set of propositions pre(a) for execution preconditions, a set of propositions eff+(a) that it causes to become true, and a set of propositions eff−(a) that it causes to become false. An action a is applicable to a state s, written appl(a, s), if each precondition proposition holds in the state, pre(a) ⊆ s. The successor state s′ is the result of executing an applicable action a in state s, where s′ = exec(a, s) = (s \ eff−(a)) ∪ eff+(a). A sequence of actions (a1, ..., an), executed in state s, results in a state s′, where s′ = exec(an, exec(an−1, ..., exec(a1, s) ...)) and each action is executable in the appropriate state. A valid plan is a sequence of actions that can be executed from sI and results in a goal state. For now we assume that actions have unit cost, making the cost of a plan equivalent to the number of actions.
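To ground these definitions, here is a minimal Python sketch of the STRIPS semantics just described. The names Action, applicable, and execute are our own illustrative choices, not identifiers from the article; propositions are strings and a state is a frozenset of them, matching the closed-world reading above.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        pre: frozenset       # pre(a): precondition propositions
        eff_add: frozenset   # eff+(a): propositions made true
        eff_del: frozenset   # eff-(a): propositions made false

    def applicable(a: Action, s: frozenset) -> bool:
        # appl(a, s) holds when every precondition is in the state.
        return a.pre <= s

    def execute(a: Action, s: frozenset) -> frozenset:
        # s' = exec(a, s) = (s \ eff-(a)) U eff+(a)
        assert applicable(a, s)
        return (s - a.eff_del) | a.eff_add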

We use the planning domain description language (PDDL) (McDermott 1998) to describe STRIPS planning problems. Figure 2 is a PDDL formulation of the rover problem for classical planning,² giving a domain description followed by a problem instance. The domain description uses predicates and action schemas with free variables to abstractly define a planning domain. The problem instance defines objects, an initial state, and a goal. Through a process called grounding we use the objects defined in the problem description to instantiate predicates and action schemas. Grounding involves using every combination of objects to replace free variables in predicates to obtain propositions and in action schemas to obtain ground actions. The problem instance in figure 2 denotes that the initial state is:

sI = {at(alpha), avail(soil, alpha), avail(rock, beta), avail(image, gamma)},

and that the goal is:

G = {comm(soil), comm(image), comm(rock)}.

The domain description in figure 2 lists three action schemas for driving between two locations, communicating data, and obtaining data by sampling. For example, the drive action schema can be instantiated with the alpha and beta location objects to obtain the ground action drive(alpha, beta), where its precondition is {at(alpha)}, and it causes {at(beta)} to become true and {at(alpha)} to become false. Executing drive(alpha, beta) from the initial state results in the state:

s′ = exec(drive(alpha, beta), sI) = {at(beta), avail(soil, alpha), avail(rock, beta), avail(image, gamma)},

because at(beta) becomes true and at(alpha) becomes false. A valid plan for the problem in figure 2 is the following sequence of actions:

{sample(soil, alpha), commun(soil), drive(alpha, beta), sample(rock, beta), commun(rock), drive(beta, gamma), sample(image, gamma), commun(image)}
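As an illustration of grounding, the following sketch (building on the Action class above; ground_drive and LOCATIONS are our hypothetical helpers, not from the article) enumerates the ground instances of the drive schema:

    from itertools import permutations

    LOCATIONS = ["alpha", "beta", "gamma"]

    def ground_drive():
        # Replace the free variables ?x and ?y with every ordered pair
        # of distinct location objects (identical pairs are pruned).
        return [Action(name=f"drive({x}, {y})",
                       pre=frozenset({f"at({x})"}),
                       eff_add=frozenset({f"at({y})"}),
                       eff_del=frozenset({f"at({x})"}))
                for x, y in permutations(LOCATIONS, 2)]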

(define (domain rovers_classical)
  (:requirements :strips :typing)
  (:types location data)
  (:predicates
    (at ?x - location)
    (avail ?d - data ?x - location)
    (comm ?d - data)
    (have ?d - data))
  (:action drive
    :parameters (?x ?y - location)
    :precondition (at ?x)
    :effect (and (at ?y) (not (at ?x))))
  (:action commun
    :parameters (?d - data)
    :precondition (have ?d)
    :effect (comm ?d))
  (:action sample
    :parameters (?d - data ?x - location)
    :precondition (and (at ?x) (avail ?d ?x))
    :effect (have ?d)))

(define (problem rovers_classical1)
  (:domain rovers_classical)
  (:objects
    soil image rock - data
    alpha beta gamma - location)
  (:init (at alpha)
         (avail soil alpha)
         (avail rock beta)
         (avail image gamma))
  (:goal (and (comm soil)
              (comm image)
              (comm rock))))

Figure 2. PDDL Description of Classical Planning Formulation of the Rover Problem.

Reachability Heuristics

Classical planning can be viewed as finding a path from an initial state to a goal state in a state-transition graph. This view suggests a simple algorithm that constructs the state-transition graph and uses a shortest-path algorithm to find a plan in O(n log n) time. However, practical problems have a very large number of states n. In the example, there are a total of 18 propositions, giving n = 2^18 ≈ 2.6 × 10^5 states (which may be feasible for a shortest path). By making the problem more realistic, adding 17 more locations (a total of 20), 12 additional types of data (a total of 15), and another rover, there are 420 propositions and n = 2^420 ≈ 2.7 × 10^126 states.³ With approximately 10^87 particles in the universe, explicitly representing and searching a state-transition graph of this size is impractical.

Instead of an explicit graph representation, it is possible to use a search algorithm and a propositional representation to construct regions of the state-transition graph as needed. However, in the worst case, it is still possible to construct the entire transition graph. Heuristic search algorithms, such as A* search, can "intelligently" search for plans and, we hope, avoid visiting large regions of the transition graph. The critical concern of such heuristic search algorithms is the design of a good heuristic.

To illustrate heuristic search for plans, consider the most popular search formulation, progression (also known as forward chaining). The search creates a projection tree rooted at the initial state sI by applying actions to leaf nodes (representing states) to generate child nodes. Each path from the root to a leaf node corresponds to a plan prefix, and expanding a leaf node generates all single-step extensions of the prefix. A heuristic estimates the "goodness" of each leaf node, and in classical planning this can be done by measuring the cost to reach a goal state (hence the terminology reachability heuristics). With the heuristic estimate, search can focus effort on expanding the most promising leaf nodes.
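A compact way to see how the pieces fit together is a greedy best-first progression loop. This is a sketch under our earlier STRIPS helpers (applicable, execute), with h standing in for any reachability heuristic; it is not the article's specific planner.

    import heapq

    def progression_search(s0, actions, goal, h):
        # Expand the leaf with the lowest heuristic estimate h(state, goal).
        counter = 0                      # tie-breaker so states never compare
        frontier = [(h(s0, goal), counter, s0, [])]
        seen = {s0}
        while frontier:
            _, _, s, prefix = heapq.heappop(frontier)
            if goal <= s:                # every goal proposition holds
                return prefix
            for a in actions:
                if applicable(a, s):
                    s2 = execute(a, s)
                    if s2 not in seen:
                        seen.add(s2)
                        counter += 1
                        heapq.heappush(frontier,
                                       (h(s2, goal), counter, s2, prefix + [a.name]))
        return None                      # goal unreachable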

For instance, consider the empty plan prefix (starting at the initial state sI) in our example. Possible extensions of the plan prefix include driving to other locations or sampling soil at the current location. While each of these extensions contains actions relevant to supporting the goals, they have different completion costs. If the rover drives to another location, then at some point it will need to come back and obtain the soil sample. It would be better to obtain the soil sample now to avoid extra driving later. A reachability heuristic should be able to measure this distinction.

Exact and Approximate Reachability Information

An obvious way to compute exact reachability information is to compute the full projection tree rooted at the initial state. The projection tree for our example is depicted in figure 3a. The projection is represented as states in dark boxes connected through edges for actions. The propositions holding in each state are listed (except for the avail propositions, which are in all states).⁴ Within this tree, the exact reachability cost for each node is the minimal length path to reach a state satisfying the goal. For example, the cost of reaching at(beta) is 1, and the cost of reaching have(rock) is 2. It is easy to see that access to such exact reachability information can guide the search well.

Expecting exact reachability information is impractical, as it is no cheaper than the cost of solving the original problem! Instead, we have to explore more efficient ways of computing reachability information approximately. Of particular interest are "optimistic" (or lower-bound) approximations, as they can provide the basis for admissible heuristics. It turns out that the planning graph data structure suits our purpose quite well. Figure 3b shows the planning graph for the rover problem in juxtaposition with the exact projection tree. The planning graph is a layered graph structure with alternating action and proposition layers (with the former shown in rectangles). There are edges between layers: an action has its preconditions in the previous layer and its effects in the next layer. For instance, the sample(soil, alpha) action, which is applicable at every level, has incoming edges from its precondition propositions avail(soil, alpha) and at(alpha), and an outgoing edge for its effect have(soil). In addition to the normal domain actions, the planning graph also uses "persistence" actions (shown by the dotted lines), which can be seen as noops that take and give back specific propositions. It is easy to see that unlike the projection tree, the planning graph structure can be computed in polynomial time.

There is an obvious structural relationship between planning graphs and projection trees: the planning graph seems to correspond to an envelope over the projection tree. In particular, the action layers seem to correspond to the union of all actions at the corresponding depth in the projection tree, and the proposition layers correspond to the union of all the states at that depth (with the states being treated as "sets" of propositions). The envelope analogy turns out to be more than syntactic—the proposition and action layers can be viewed as defining the upper bounds on the feasible actions and states in a certain formal sense. Specifically, every legal state s at depth d in the projection tree must be a subset of the proposition layer at level d in the planning graph. The converse however does not hold. For instance, P1 contains the propositions at(beta) and have(soil), but they do not appear together in any state at depth 1 of the search graph. In other words, the planning graph data structure is providing optimistic reachability estimates.

We can view the planning graph as the exact projection tree for a certain relaxation of the domain. The relaxation involves ignoring the interactions between and within action effects during graph expansion. Two axioms capture these interactions and distinguish state expansion in the projection tree versus proposition layer expansion in the planning graph. In the projection tree, the first axiom expresses that the state (set of propositions) resulting from applying action a1 is exclusive of the state resulting from applying a2. For example, at depth 1 of the projection tree applying sample(soil, alpha) to state s1 makes have(soil) true in only state s13; have(soil) is not true in the other states. The second axiom states that the effect propositions of each action must hold together in the resulting state (that is, they are coupled). Applying drive(alpha, gamma) to state s1 makes at(gamma) true and at(alpha) false in state s11; without coupling the effects, the state would allow both at(gamma) and at(alpha) to be true. Expanding a proposition layer also involves applying several actions, but with modifications to the two axioms above. The first modified axiom states that the set of propositions resulting from applying a1 is independent of the propositions resulting from applying a2, meaning that they are neither exclusive nor coupled. Consider how both at(gamma) and have(soil) appear in the first proposition layer P1 of the planning graph (suggesting that the state {at(gamma), have(soil)} is reachable at depth 1). The second modified axiom states that the effect propositions of each action are also independent. For example, the first proposition layer P1 contains both at(gamma) and at(alpha) through the independence within the effect of drive(alpha, gamma); the assumption allows ignorance of how drive(alpha, gamma) makes at(alpha) false. The projection tree maintains state barriers and couplings between effect propositions, where the planning graph removes both constraints. With an intuitive structural interpretation, the formal definitions of the planning graph and the heuristics that it encodes follow quite easily.

Figure 3. Progression Search Graph (a) and Planning Graph (b) for the Rover Problem.

Planning Graphs

We start by formalizing planning graphs and follow with a description of several planning graph–based reachability heuristics. Traditionally, progression search uses a different planning graph to compute the reachability heuristic for each state s. A planning graph PG(s, A), constructed for the state s and the action set A, is a leveled graph, captured by layers of vertices (P0(s), A0(s), P1(s), A1(s), ..., Ak(s), Pk+1(s)), where each level i consists of a proposition layer Pi(s) and an action layer Ai(s). In the following, we simplify our notation for a planning graph to PG(s), assuming that the entire set of actions A is always used. The notation for action layers Ai and proposition layers Pi also assumes that the state s is implicit.

The first proposition layer, P0, is defined as the set of propositions in the state s. An action layer Ai consists of all actions that have all of their precondition propositions in Pi. A proposition layer Pi, i > 0, is the set of all propositions given by the positive effects⁵ of actions in Ai−1. It is common to use implicit actions for proposition persistence (also known as noop actions) to ensure that propositions in Pi−1 persist to Pi. A noop action ap for proposition p is defined as pre(ap) = eff+(ap) = {p}.

Planning graph construction can continue until one of the following conditions holds: (1) the graph has leveled off (that is, two subsequent proposition layers are identical), or (2) the goal is reachable (that is, every goal proposition is present in a proposition layer).

In figure 3b the planning graph has all of the goal propositions {comm(soil), comm(rock), comm(image)} in P3 and will level off at P4. It is also possible to truncate planning graph construction at any level. If the goal is not reachable before truncation, then the number of levels is still a lower bound on the number of steps to reach the goal. However, if the goal is not reachable before the graph has leveled off, then the goal is not reachable and there is no plan.
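In code, the relaxed expansion loop is short. The sketch below (build_planning_graph is our illustrative name, using the Action representation from earlier and ignoring mutexes) follows the two termination conditions just listed, plus an optional truncation bound:

    def build_planning_graph(s, actions, goal, max_levels=None):
        # Alternate action and proposition layers until the goal appears,
        # the graph levels off, or an optional truncation level is hit.
        prop_layers = [frozenset(s)]
        action_layers = []
        while True:
            P = prop_layers[-1]
            A = [a for a in actions if a.pre <= P]   # all applicable actions
            # Noop actions are implicit: P is unioned into the next layer.
            P_next = P.union(*(a.eff_add for a in A))
            action_layers.append(A)
            prop_layers.append(P_next)
            if goal <= P_next:       # condition (2): goal reachable
                break
            if P_next == P:          # condition (1): leveled off
                break
            if max_levels is not None and len(action_layers) >= max_levels:
                break                # truncated
        return prop_layers, action_layers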

Heuristic Estimates of Plan Cost

Planning graph heuristics are used to estimate the plan cost for a transition between two states, a source state and a destination state. The source state is always the state that defines P0, and the destination state is one of potentially many goal states.

There are two fundamental types of planning graph heuristics, level-based and relaxed plans (Nguyen, Kambhampati, and Nigenda 2002). The most obvious level-based heuristic, called the set-level heuristic, estimates the plan cost to reach a goal state by finding the first level where the proposition layer includes all of the propositions in the goal. This level index is used as the heuristic. Other level-based heuristics compute a cost c(s, g) to reach each proposition g ∈ G from the state s and then numerically aggregate the costs, through maximization (max_{g ∈ G} c(s, g)) or summation (Σ_{g ∈ G} c(s, g)). The heuristics are level based because the cost of each proposition p is determined by the index of the first proposition layer in which it appears, c(s, p) = min { i : p ∈ Pi }. For instance, the goal proposition comm(soil) first appears in P2, meaning it has a cost of 2. If a proposition does not appear in any proposition layer of a leveled-off planning graph, then its cost is ∞. Otherwise, if the planning graph has k levels but has not leveled off, the cost of such missing propositions is at least k. Unlike level-based heuristics that relate level to cost, relaxed plans identify the actions needed to causally support all goals (while ignoring negative interactions). We explore both types of heuristics in detail.

To illustrate the different heuristics, we will use three goals: G = {comm(soil), comm(image), comm(rock)} (the original goal in the example), G1 = {at(beta), have(rock)}, and G2 = {at(beta), have(soil)}. Table 1 lists the cost estimates made by different heuristics for each goal. The table also lists the true optimal plan cost for each goal. The following discussion uses these goals to explain the properties of each heuristic.

    Heuristic       G    G1   G2
    Set-Level       3    2    1
    Max             3    2    1
    Sum             8    3    2
    Relaxed Plan    8    2    2
    True Cost       8    2    2

Table 1. Heuristic Estimates and True Cost to Achieve Each Goal from the Initial State.

Level-Based Heuristics

The level-based heuristics make a strong assumption about the cost of reaching a set of propositions: namely, that achievement cost is related to the level where propositions first appear. Depending on how we assess the cost of the set of propositions, we make additional assumptions.

The set-level heuristic is defined as the index of the first proposition layer where all goal propositions appear. For example, the cost of G1 is 2, because both propositions first appear together in P2. The set-level heuristic assumes that the subplans to individually achieve the goal propositions positively interact, which is appropriate because the rover must be at(beta) in order to have(rock). Positive interaction captures the notion that actions taken to achieve one goal will simultaneously help achieve another goal. Positive interaction is not always the right assumption: the cost to reach G2 is 1 because both propositions appear in P1, but they require different actions for support. The set-level heuristic for the cost to reach G is 3 because the goal propositions are not reachable together until P3 of PG(sI). The set-level heuristic never overestimates the cost to achieve a set of propositions and is thus admissible. As we will see in the next section, we can adjust the set-level heuristic with mutexes to incorporate negative interactions and strengthen the heuristic.

As an alternative to the set-level heuristic, cost can be assigned to each individual proposition in the goal. Aggregating the cost for each proposition provides a heuristic. Using a maximization assumes that the actions used to achieve the set of propositions will positively interact, like the set-level heuristic. For example, with G1 it is possible to achieve at(beta) in P1, defining c(sI, at(beta)) = 1, and achieve have(rock) in P2, defining c(sI, have(rock)) = 2. Taking the maximum of the costs, max(1, 2) = 2, to achieve each proposition avoids counting the cost of drive(alpha, beta) twice. The max heuristic for reaching the original goal G from sI is 3, because max(c(sI, comm(soil)), c(sI, comm(rock)), c(sI, comm(image))) = max(2, 3, 3) = 3. At this point, the set-level and max heuristics are identical. We will see in the next section how set-level more readily incorporates negative interactions between goal propositions to improve the heuristic (making it different from the max heuristic). Conversely, in the section on cost-based planning, we will see how the max heuristic and sum heuristic (described next) generalize easily because they are already defined in terms of proposition costs.

Using a summation to aggregate proposition costs assumes that the subplans to achieve the set of propositions will be fully independent. Independence captures the notion that actions taken to achieve one goal will neither aid nor prevent achievement of another goal. For example, with G2 it is possible to achieve at(beta) in P1, defining c(sI, at(beta)) = 1, and achieve have(soil) in P1, defining c(sI, have(soil)) = 1. Taking the summation of the costs, 1 + 1 = 2, accurately measures the plan cost as 2 (because the required actions were independent), whereas taking a maximization would underestimate the plan cost as 1. The sum heuristic for reaching the original goal G is 8 because c(sI, comm(soil)) + c(sI, comm(rock)) + c(sI, comm(image)) = 2 + 3 + 3 = 8. In practice the sum heuristic usually outperforms the max heuristic, but gives up admissibility.
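Over the layered structure built earlier, all three level-based estimates reduce to a few lines. This is a sketch with our own function names (level_costs, h_max, h_sum, h_set_level), not code from the article:

    import math

    def level_costs(prop_layers):
        # c(s, p) = index of the first proposition layer containing p.
        cost = {}
        for i, P in enumerate(prop_layers):
            for p in P:
                cost.setdefault(p, i)
        return cost

    def h_max(cost, goal):
        return max(cost.get(g, math.inf) for g in goal)

    def h_sum(cost, goal):
        return sum(cost.get(g, math.inf) for g in goal)

    def h_set_level(prop_layers, goal):
        # Index of the first layer containing every goal proposition;
        # without mutexes this coincides with h_max.
        for i, P in enumerate(prop_layers):
            if goal <= P:
                return i
        return math.inf

On the rover example these should reproduce the first three rows of table 1 (for example, h_sum gives 2 + 3 + 3 = 8 for G).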

The primary problem with level-based heuristics is that they assume that the proposition layer index is equal to the cost of achieving a proposition. Because planning graphs optimistically allow multiple parallel actions per step, using level to define cost can be misleading. Consider a goal proposition that first appears in P2: its cost is 2. In reality the proposition may be supported by a single action with 100 precondition propositions, where each proposition must be supported by a different action. Thus, a plan to support the goal would contain 101 actions, but a level-based heuristic estimates its cost as 2. One can overcome this limitation by using a relaxed plan heuristic (described next), by using cost propagation (described in the Cost-Based Planning section), or by using a serial planning graph (described in the next section).

Relaxed Plan Heuristics

Many of the problems with level-based heuristics came from ignoring how multiple actions per level execute in parallel. The reachability heuristic should better estimate the number of actions in a plan. Through a simple back-chaining algorithm (figure 4) called relaxed plan extraction, it is possible to identify the actions in each level that are needed to support the goals or other actions.

Relaxed plans are subgraphs (P^RP_0, A^RP_0, P^RP_1, ..., A^RP_{n−1}, P^RP_n) of the planning graph, where each layer corresponds to a set of vertices. Where appropriate, we represent relaxed plans by their action layers (omitting noop actions and empty action layers) and omitting all proposition layers. A relaxed plan satisfies the following properties: (1) every proposition p ∈ P^RP_i, i > 0, in the relaxed plan is supported by an action a ∈ A^RP_{i−1} in the relaxed plan, and (2) every action a ∈ A^RP_i in the relaxed plan has its preconditions pre(a) ⊆ P^RP_i in the relaxed plan. A relaxed plan captures the causal chains involved in supporting the goals but ignores how actions may conflict. For example, a relaxed plan extracted from PG(sI) may contain both drive(alpha, beta) and drive(alpha, gamma) in A^RP_0 because they both help support the goal, despite conflicting.

RPExtract(PG(s), G)
 1: Let n be the index of the last level of PG(s)
 2: for all p ∈ G ∩ Pn do                      /* Initialize Goals */
 3:   P^RP_n ← P^RP_n ∪ {p}
 4: end for
 5: for i = n ... 1 do
 6:   for all p ∈ P^RP_i do                    /* Find Supporting Actions */
 7:     Find a ∈ A_{i−1} such that p ∈ eff+(a)
 8:     A^RP_{i−1} ← A^RP_{i−1} ∪ {a}
 9:   end for
10:   for all a ∈ A^RP_{i−1}, p ∈ pre(a) do    /* Insert Preconditions */
11:     P^RP_{i−1} ← P^RP_{i−1} ∪ {p}
12:   end for
13: end for
14: return (P^RP_0, A^RP_0, P^RP_1, ..., A^RP_{n−1}, P^RP_n)

Figure 4. Relaxed Plan Extraction Algorithm.

Figure 4 lists the algorithm used to extract relaxed plans. Lines 2–4 initialize the relaxed plan with the goal propositions. Lines 5–13 are the main extraction algorithm that starts at the last level of the planning graph, n, and proceeds to level 1. Lines 6–9 find an action to support each proposition in a level. Choosing actions in line 7 is the most critical step in the algorithm. We will see in later sections, such as the section on cost-based planning, that it is possible to supplement the planning graph with additional information that can bias the action choice in line 7. Lines 10–12 insert the preconditions of chosen actions into the relaxed plan. The algorithm ends by returning the relaxed plan. The relaxed plan heuristic is the total number of non-noop actions in the action layers. The relaxed plan to support the goal G from sI is depicted in figure 3b in bold. Each of the goal propositions is supported by a chosen action, and each of the actions has its preconditions supported. There are a total of eight actions in the relaxed plan, so the heuristic is 8. In line 7 of figure 4, it is common to prefer noop actions for supporting a proposition (if possible) because the relaxed plan is likely to include fewer extraneous actions. For instance, a proposition may support actions in multiple levels of the relaxed plan; by supporting the proposition at the earliest possible level, it can persist to later levels.
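The same procedure can be phrased directly over the layers computed by our earlier sketch. The function extract_relaxed_plan below is our illustrative version (greedy and backtrack-free, with the noop preference implemented by persisting a proposition whenever it already appears in the previous layer), and it assumes the goal is reachable in the given graph:

    def extract_relaxed_plan(prop_layers, action_layers, goal):
        # Returns one set of chosen action names per graph level.
        n = len(action_layers)
        needed = [set() for _ in range(n + 1)]    # P^RP_i
        rp = [set() for _ in range(n)]            # A^RP_i (by name)
        needed[n] = set(goal)
        for i in range(n, 0, -1):
            for p in needed[i]:
                if p in prop_layers[i - 1]:
                    needed[i - 1].add(p)          # prefer the noop: persist p
                else:
                    a = next(a for a in action_layers[i - 1]
                             if p in a.eff_add)   # any supporter (line 7)
                    rp[i - 1].add(a.name)
                    needed[i - 1].update(a.pre)   # insert preconditions
        return rp

The heuristic value is then sum(len(level) for level in rp).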

The advantage of relaxed plans is that they capture both positive interaction and independence of subplans used to achieve the goals, rather than assuming one or the other. The goals G1 and G2 have the respective relaxed plans:

(A^RP_0 = {drive(alpha, beta)}, A^RP_1 = {sample(rock, beta)}), and

(A^RP_0 = {sample(soil, alpha), drive(alpha, beta)}).

In these, the relaxed plan heuristic is equivalent to the max heuristic for the positively interacting goal propositions in G1, and it is equivalent to the sum heuristic for the independent goal propositions in G2. Relaxed plans measure both action independence and positive interaction, making them a compromise between the max and sum heuristics (which measure one or the other).

Relaxed plan extraction can be quite fast, mostly due to the fact that the extraction can be done by a backtrack-free choice of actions at each level. It is possible to find the optimal relaxed plan by using line 7 in figure 4 as a backtrack point in a branch and bound scheme. In general, finding an optimal relaxed plan is NP-hard. Recent work (Do, Benton, and Kambhampati 2006) has noticed that, like the original GraphPlan search for plans, finding relaxed plans can be posed as combinatorial search, solvable by optimizing or satisficing algorithms. While the procedural approach (described above) is practical in classical planning, using combinatorial search may better incorporate cost, utility, and other factors described in later sections. Nevertheless, we will see that procedural relaxed plan extraction can be biased in several ways to handle additional problem constraints, such as action costs. As with all heuristics, the quality and computation time play competing roles, leaving the dominance of procedural versus combinatorial algorithms, as yet, unsettled. Regardless of the extraction algorithm, relaxed plans tend to be very effective in practice and form the basis for most modern heuristic search planners.

Related Work

Level-based heuristics were first introduced by Nguyen and Kambhampati (2000, 2001), while relaxed plan heuristics became popular with the success of the FF planner (Hoffmann and Nebel 2001). There are a number of improvements and alternatives that can be made beyond the basic ideas on planning graph construction and heuristic extraction. One common extension is to limit the branching factor of search using actions occurring in the planning graph or relaxed plan. The idea is to restrict attention to only those actions whose preconditions can be satisfied in a reachable state. A conservative approach would allow search to use only actions appearing in the last action layer of a leveled-off planning graph (a complete strategy). The actions that do not appear in these layers will not have their preconditions reachable and are useless. A less conservative approach uses only those actions in the layer preceding the layer where the goals are supported (an incomplete strategy) (Nguyen, Kambhampati, and Nigenda 2002). An even more aggressive and incomplete strategy called "helpful actions" (Hoffmann and Nebel 2001) involves applying only those actions that have effects similar to actions chosen for the first step of the state's relaxed plan. In the rover example, search with helpful actions would only apply actions to sI that have at least one of the propositions {have(soil), at(beta), at(gamma)} in their effect because these propositions appear in P^RP_1.

Another use for relaxed plans is to derive macro actions (plan fragments). The YAHSP planner (Vidal 2004) encodes relaxed plans into a CSP to resolve action conflicts, creating a macro action. This often leads to search making very few (albeit expensive) choices.

While the planning graph was originally introduced as part of the GraphPlan search algorithm and subsequently used for heuristics in state-based search, GraphPlan-based search has been revived in the search employed by LPG (Gerevini, Saetti, and Serina 2003). LPG uses local search on the planning graph to transform a relaxed plan into a feasible plan. LPG uses relaxed plan heuristics to evaluate potential plan repairs that add and remove actions.

Adjusting for Negative Interactions

We noted that the proposition layers in the planning graph are an upper bound approximation to the states in the projection tree. Specifically, while every legal state is a subset of the proposition layer, not every subset of the proposition layer corresponds to a legal state. Because of the latter, the reachability estimates can be too optimistic in scenarios where there are negative interactions between the actions. There is a natural way to tighten the approximation—mark subsets of the proposition layers that cannot be present together in any legal state. Any subset of the proposition layer that subsumes one of these "mutual exclusion" sets will not correspond to a legal state (and thus should not be considered reachable by that level). The more mutual exclusion sets we can mark, the more exact we can make the planning graph–based reachability analysis. Intuitively, if all mutual exclusion sets of all arity are marked, then the planning graph proposition layers correspond exactly to the projection tree states. The important point is that the marking procedure does not need to be all or nothing. As long as the markup is "sound" (in that any subset marked mutually exclusive is indeed mutually exclusive), the planning graph continues to provide an optimistic (lower-bound) estimate on reachability.

The question is whether it is possible to mark such mutual exclusions efficiently. It turns out that it is indeed feasible to compute a large subset of the binary mutual exclusions (henceforth called "mutexes") in quadratic time using a propagation procedure. Mutexes arise in action layers, where they record which of the actions cannot be executed in parallel (because they disagree on whether a proposition should be positive or negative in an effect or precondition). Mutex propositions result from mutexes among supporting actions when there is no set of nonmutex actions supporting the propositions.⁶

To illustrate, figure 5 shows mutexes for the first two levels of the planning graph PG(sI). Mutexes are denoted by the arcs between actions and between propositions. Notice that the two drive actions in A0 are mutex with all other actions and the persistence of at(alpha). Both drive actions make at(alpha) false, interfering with each action requiring that at(alpha) be true. This results in proposition mutexes between have(soil) and both at(beta) and at(gamma) in P1. Mutexes capture the fact that the rover cannot be at(beta) and have(soil) after one step. However, notice that it is possible after two steps for the rover to be at(beta) and have(soil). Mutexes effectively reduce the optimism about which states are reachable, tightening the approximation of the progression search graph.

Figure 5. Classical Planning Graph with Binary Mutexes.
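A sketch of the quadratic propagation step, in our own notation (the helper names are ours; noop actions are assumed to be included in the action layer so that persistence mutexes are handled uniformly):

    from itertools import combinations

    def action_mutexes(A, prop_mutex_prev):
        # Two actions are mutex if one deletes the other's precondition
        # or add effect (interference), or if they have preconditions
        # that were mutex in the previous layer (competing needs).
        mutex = set()
        for a, b in combinations(A, 2):
            interference = (a.eff_del & (b.pre | b.eff_add)) or \
                           (b.eff_del & (a.pre | a.eff_add))
            competing = any(frozenset({p, q}) in prop_mutex_prev
                            for p in a.pre for q in b.pre)
            if interference or competing:
                mutex.add(frozenset({a.name, b.name}))
        return mutex

    def prop_mutexes(P_next, A, act_mutex):
        # Two propositions are mutex when no single action supports both
        # and every pair of their supporters is mutex.
        supp = {p: [a for a in A if p in a.eff_add] for p in P_next}
        return {frozenset({p, q})
                for p, q in combinations(P_next, 2)
                if not any(a is b or frozenset({a.name, b.name}) not in act_mutex
                           for a in supp[p] for b in supp[q])}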

A stronger interpretation and computation of mutex propagation takes place in the previously mentioned serial planning graphs, which remove all notion of action parallelism by forcing each action to be mutex with every other action (except for noop actions). The serial planning graph has the advantage of improving level-based heuristics by more tightly correlating level and cost. However, serial planning graphs are not often used in practice because they require a large number of additional mutexes (and also make the planning graph achieve level-off much later). In the remainder of this section, we describe the "normal" planning graph with mutexes, where action parallelism is allowed.

Heuristic Adjustments

It is possible to improve the quality of planning graph heuristics by adjusting them to make up for ignoring negative interactions. We started with a relaxation of the projection tree that ignored all negative interactions, in favor of causal support, to estimate the achievement cost of subgoals. While this relaxation works well for many problems and search strategies, there are ways to either strengthen or adjust the heuristic relaxation to better guide search. For instance, it is possible to strengthen the set-level heuristic because it is defined as the index of the first level where all goal propositions appear reachable. If any pair of the goal propositions is mutex in a level, then even if all goal propositions are present the goal is not reachable. Set level is still admissible with mutexes, and becomes a better lower bound on the cost to reach the goal. What is perhaps more interesting is that admissibility depends only on the soundness but not the completeness of mutex propagation. We can thus be lazy in propagating mutexes. For example, consider the set-level heuristic for G2 = {at(beta), have(soil)} with and without mutexes. With mutexes the heuristic value is 2 because the propositions do not appear together without a mutex until P2, whereas without using mutexes the heuristic value is 1. This illustrates the difference between the max and set-level heuristics because without mutexes the max heuristic is equal to the set-level heuristic and with mutexes they are different.

Alternatively we can adjust the relaxed plan heuristic, to get the adjusted sum heuristic, by adding a penalty factor to account for ignoring negative interactions. One penalty factor can be the difference between the set-level heuristic and the max heuristic, indicating how much extra work is needed to achieve the goals together. While the adjusted sum heuristic is costly because it involves computing a relaxed plan, set level, and max heuristic, it guides search very well and improves performance on large problems (especially in regression search, described in the next section).
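In our notation, with |RP(s, G)| the number of actions in the extracted relaxed plan, this adjustment reads:

    h_adjsum(s, G) = |RP(s, G)| + (h_setlevel(s, G) - h_max(s, G))

The penalty term is zero exactly when mutexes reveal no extra negative interaction among the goals.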

The notion of adjusting heuristics to account for ignored constraints is called phased relaxation. As we will see in later sections, including more problem features makes it more difficult to include everything in planning graph construction. Phased relaxation allows us to get a very relaxed heuristic and make it stronger.

Related Work

Mutexes were originally introduced in GraphPlan (Blum and Furst 1995), but Nguyen and Kambhampati (2000) realized their application to adjusting heuristics. Helmert (2004) describes another way to include negative interactions in reachability heuristics. Planning domains best expressed with multivalued variables (as in the SAS+ language [Backstrom and Nebel 1995]) are often described in the classical planning formalism with Boolean variables (such as the at(?x) predicate in the example). By representing an inherently multivalued variable as Boolean, information about negative interactions is lost (for example, that at(alpha) and at(beta) are impossible together), where it is readily available in the multivalued encoding (for example, at = alpha or at = beta because at can have only one value). By translating a Boolean encoding of a problem to a multivalued encoding, it is possible to solve different relaxed planning tasks (which incorporate these negative interactions). As yet, there is no decisive result indicating whether Boolean or multivalued variable encodings are best, much as there still exists the same question with respect to SAT and CSP.

Another recent work (Yoon, Fern, and Givan 2006) automatically learns adjustments for relaxed plan heuristics. The idea is to learn a linear regression function from training plans that reflects the difference between relaxed plan length and the actual cost to reach the goal.

Heuristics in Alternative Planning Strategies

Planning graph heuristics can be used for many different types of search, including regression and plan space search. Regression and plan space search are similar, in that both use means-ends reasoning to find actions that are relevant to supporting the goals. We have presented planning graph heuristics in terms of progression search, where we estimate the cost of a plan to transition between two sets of propositions (a search state and the goal). We can use the same technique for means-ends reasoning by carefully redefining the two sets of propositions. The initial state replaces the search state, and regression states replace the goal.

Regression Search

Regression search (or back chaining) constructs plan suffixes starting from the goal. Because the planning problem defines a goal by a set of propositions instead of a state, there may be several goal states. Rather than individually search for a plan for each goal state, it is common to search for all goal states at once in the space of partial states. Each partial state is characterized by a set of propositions that are known to be true and a set of propositions that are known to be false, assuming all other propositions are unknown. Since each proposition is true, false, or unknown, there are 3^|P| partial states. Regression in the space of partial states identifies plan suffixes that will reach a goal state. The regression tree is rooted at the partial goal state, and leaf nodes are partial states. The path from a leaf node to the root is a plan suffix that guarantees any state consistent with the leaf node can reach a goal state by executing the plan suffix. A valid plan starts with a partial state that is satisfied by the initial state.

Figure 6 illustrates a regression search path (figure 6a) and two partial plan space plans (figures 6b and 6c), described later in the section, for the rover example. The regression search trace starts on the right with a partial state comprised of the goal propositions. Each partial state is generated by reasoning about what must be true in the previous state for the applied action to make the latter state exist. For example, s1 is the partial state resulting from regressing the partial goal state sG with commun(image). Because commun(image) does not make any proposition in sG false and it makes comm(image) true, it is applicable to sG. To construct s1, regression removes the positive effect comm(image) and adds the precondition have(image).

Figure 6. State Space and Plan Space Regression.
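The regression step just described fits in a few lines. The function regress below is our illustrative version for positive subgoals, returning None when the action is inapplicable:

    def regress(a, partial):
        # a is applicable in regression when it achieves at least one
        # subgoal and deletes none of them; the new partial state drops
        # the achieved subgoals and adds a's preconditions.
        if not (a.eff_add & partial) or (a.eff_del & partial):
            return None
        return (partial - a.eff_add) | a.pre

For example, regress(commun_image, sG), with commun_image a ground Action for commun(image), removes comm(image) and adds have(image), reproducing s1 above.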

Heuristics for Regression

Planning graphs can estimate the cost of a plan to transition between a complete state and a partial state. In progression, search encounters several complete states (each requiring a planning graph), and each must reach a partial goal state. In regression, search generates several partial states, and each must be reached from the complete initial state (requiring a single planning graph). Planning graphs are single-source, multiple-destination relaxations of the projection tree. In progression, the source is constantly changing, depending on the search state, and the destination is always the goal. In regression, the source is constant, but the destination changes, depending on the regressed partial state. While it may seem to the reader that regression has a definite advantage because it needs only one planning graph, we will shortly see that the situation is more complex.

The planning graph in figure 3b can be used to compute the heuristic for any regressed partial state in the example. To compute a level-based heuristic for s3, we find that comm(soil) first appears in P2, avail(rock, beta) appears in P0, have(image) appears in P2, and at(beta) appears in P1. Using the level-based sum heuristic to measure its cost provides 2 + 0 + 2 + 1 = 5. Similarly, for state s4, the same heuristic provides 2 + 0 + 0 + 1 + 1 = 4.
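To make the computation concrete, the following Python sketch (ours, purely illustrative) implements the level-based sum and max heuristics; the first_level table is hand-coded from the planning graph for s3 and is an assumption of this example:

INF = float("inf")

def sum_heuristic(partial_state, first_level):
    # full-independence assumption: add up the first levels
    return sum(first_level.get(p, INF) for p in partial_state)

def max_heuristic(partial_state, first_level):
    # full positive-interaction assumption: take the largest first level
    return max(first_level.get(p, INF) for p in partial_state)

# first levels read off the planning graph for the regressed state s3
first_level = {"comm(soil)": 2, "avail(rock,beta)": 0,
               "have(image)": 2, "at(beta)": 1}
s3 = {"comm(soil)", "avail(rock,beta)", "have(image)", "at(beta)"}
print(sum_heuristic(s3, first_level))  # 2 + 0 + 2 + 1 = 5
print(max_heuristic(s3, first_level))  # 2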

The reader should notice that the s4 regression state in figure 6 is inconsistent because it asserts that the rover is at two locations at once. Regression search allows inconsistent, unreachable states. The heuristic measure of this inconsistent state is 4, but it should be ∞ because the state cannot be reached from the initial state. The search may continue fruitlessly and explore through this state, unaware that it will never become a valid solution. One can correct this problem by propagating mutexes on the planning graph and using the set-level heuristic. With set level there is no level where at(beta) and at(gamma) are not mutex, so the heuristic value is ∞. Nguyen and Kambhampati (2001) show that the adjusted sum heuristic discussed earlier—which adds a penalty based on negative interactions to the relaxed plan heuristic—does quite well in regression search. However, this analysis increases the computational cost of planning graph construction.

Progression Versus Regression

Table 2 summarizes the many important differences between progression and regression from the point of view of reachability heuristics. In progression, the search encounters several complete states (at most 2^|P|), and one of them must reach a partial goal state. Regression search encounters several partial states (at most 3^|P|), and one of them must be reached from the complete initial state. Progression requires a planning graph for each encountered state, whereas regression requires a single planning graph for the initial state of the problem. Mutexes play a larger role in regression by improving the ability of search to prune inconsistent partial states. However, binary mutex analysis is often not enough to rule out all inconsistent states, and considerably costlier higher-order mutex propagation may be required. Progression states are always consistent, and mutexes only help adjust the heuristic. Thus, there is a trade-off between searching in the forward direction constructing several inexpensive planning graphs and searching in the backward direction with a single costly planning graph. Current planner performance indicates that progression works better.


Plan Space Search

Plan space search (also known as partial-order or causal-link planning) shares many similarities with regression search, and it too can benefit from planning graph heuristics. Both plan space and regression search reason backward from the goal to the initial state, but plan space search does not maintain an explicit notion of state. Rather, plan space search algorithms reason about which actions causally support the conditions of other actions or the goals.

Any partial plan space plan can be characterized in terms of its open conditions: those preconditions or goals that do not yet have causal support. The open conditions may never all need to hold in a single state at the same time, but we can think of them as an approximate regression state. In particular, by assuming that negative interactions do not exist, it is always possible to achieve all conditions in the beginning of the plan and persist them to where they are needed. With this approximate state, we can measure its reachability cost from the initial state just as in regression search (Nguyen and Kambhampati 2001; Younes and Simmons 2003).

In figures 6b and 6c, each plan space plan is represented by a set of steps, causal links, and ordering relations. Each step has an associated action with preconditions and effects. We denote each step by a rectangle, with the preconditions on the left and effects on the right. The open conditions are contained in ovals. The regression state s3 contains every proposition that is an open condition in the plan space plan in figure 6b, and likewise for s4 and the plan space plan in figure 6c. The heuristic values for the plan space plans in figure 6 are the same as those of the analogous regression states because their open conditions are identical to the propositions that hold in the regression states.

Related Work

Planning graph heuristics in regression search were first considered by Nguyen and Kambhampati (2000, 2001). Their application to partial-order planning was first considered by Nguyen and Kambhampati (2001) and subsequently by Younes and Simmons (2003).

GraphPlan itself also benefits from planning graph heuristics. By adopting the view that GraphPlan solves a constraint satisfaction problem where variables are propositions and values are supporting actions, planning graph heuristics help define variable and value ordering heuristics. The GraphPlan value ordering heuristic is to consider only those supporters of a proposition that appear in the previous action layer. Many works have improved upon the GraphPlan search algorithm by using heuristics similar to those just described. For instance, Kambhampati and Sanchez (2000) recognized how level-based heuristics can be used in the ordering heuristics in GraphPlan. Defining proposition cost as before, it is preferable to support the most costly propositions first to extract plans starting at later levels. Extracting plans at earlier levels first can lead to costly backtracking. Defining action cost as the aggregate cost of its precondition propositions, it is preferable to support a proposition with the least costly actions first. As we will see in the next section, these heuristics are related to our intuitions for propagating cost in the planning graph. The PEGG planner (Zimmerman and Kambhampati 2005) uses similar planning graph heuristics in its search. PEGG links GraphPlan search with state space search by identifying the states consistent with the steps in a partial GraphPlan solution. Using the identified states, PEGG can get better heuristics to rank partial solutions and guide GraphPlan search.



Feature                      Progression               Regression
Search Nodes                 Complete/Legal States     Incomplete/Possibly Inconsistent
Search Space Size            2^|P|                     3^|P|
Number of Planning Graphs    O(2^|P|)                  1
Mutexes                      Marginally Helpful        Extremely Helpful

Table 2. Comparison of Planning Graphs for Progression and Regression Search.


Some less-traditional GraphPlan algorithms also use reachability heuristics. Least Commitment GraphPlan (LCGP) (Cayrol, Regnier, and Vidal 2000) uses heuristics similar to Kambhampati and Sanchez (2000) in a GraphPlan algorithm that alters the structure of the planning graph. LCGP reduces the number of levels in the planning graph by allowing more parallelism; mutex actions can appear in the same level if there is a feasible serialization. The advantage of LCGP is in using fewer levels to find the same length plan, thus reducing the cost of constructing the planning graph. The work of Hoffmann and Geffner (2003) exploits the connection between GraphPlan and constraint satisfaction problems to find alternative branching schemes in the BBG planner. BBG allows action selection at arbitrary planning graph layers until a consistent plan is found. BBG guides its search with a novel level-based heuristic. Each partial plan is a set of action choices in different planning graph layers, so the heuristic for each partial plan constructs a planning graph respecting the partial plan action choices. For instance, BBG would exclude all actions that are mutex with a chosen action from the planning graph, favoring its commitments. The number of levels in the resulting planning graph is the heuristic merit of the partial plan.

We mentioned that a key difference between progression and regression is the number of planning graphs required. The backward planning graph (Refanidis and Vlahavas 2001) allows progression search to use a single planning graph, similar to regression. The backward planning graph reverses the planning graph by identifying states reachable by regression actions from the goal. In practice, the reverse planning graph suffers from the difference between the initial state and goal descriptions: the former is complete (that is, consistent with a single state), while the latter is partial (that is, consistent with many states). Because there are multiple goal states, reachability analysis is weakened, having fewer propositions to which actions are relevant.

Planning graph heuristics can be computed in alternative ways that were originally discovered in the context of partial-order planning and regression. IxTeT (Ghallab and Laruelle 1994) and UNPOP (McDermott 1996, 1999) compute the reachability heuristics on demand using a top-down procedure that does not require a planning graph. Given a state s whose cost to reach the goal needs to be estimated, both IxTeT and UNPOP perform an approximate version of regression search (called "greedy regression graphs" in UNPOP) to estimate the cost of that state. The approximation involves making an independence assumption and computing the cost of a state s as the sum of the costs of the individual literals comprising s. When a proposition p is regressed using an action a to get a set of new subgoals s′, s′ is again split into its constituent propositions. UNPOP also has the ability to estimate the cost of partially instantiated states (that is, states whose predicates contain variables).

HSP (Bonet and Geffner 2000a) and HSP-r (Bonet and Geffner 1999) use a bottom-up approach for computing the cost measures, again without the use of a planning graph. They use an iterative fixed-point computation to estimate the cost of reaching every literal from the given initial state. The computation starts by setting the cost of the propositions appearing in the initial state to 0, and the cost of the rest of the propositions to ∞. The costs are then updated using action application until a fixed point is reached. The updating scheme uses independence assumptions in the following way: if there is an action a that gives a proposition p, and a requires the preconditions p1, p2, ..., pi, then the cost of achieving p is updated to be the minimum of its current cost and the sum of the cost of a and the sum of the costs of achieving p1, p2, ..., pi.

Once this updating is done to the fixed point, estimating the cost of a state s involves summing the costs of the propositions composing s. In later work, Haslum and Geffner (2000) point out that this bottom-up fixed-point computation can be seen as an approximate dynamic programming procedure, in which the cost of a state s is estimated as the maximum of the costs of its k-sized subsets of propositions.
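The fixed-point computation is compact enough to sketch directly. The Python fragment below is our illustration (not HSP's actual code), assuming unit-cost actions given as (preconditions, effects) pairs of proposition sets:

INF = float("inf")

def hsp_costs(init_state, actions):
    cost = {p: 0 for p in init_state}   # initial propositions cost 0
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for pre, effs in actions:
            # independence assumption: sum the precondition costs,
            # plus 1 for applying the (unit-cost) action itself
            c = sum(cost.get(p, INF) for p in pre) + 1
            for p in effs:
                if c < cost.get(p, INF):
                    cost[p] = c
                    changed = True
    return cost                         # unreached propositions stay at INF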

These approaches can be related to planning graph heuristics. Planning graph heuristics can be seen as a bottom-up computation of reachability information. As discussed in Nguyen, Kambhampati, and Nigenda (2002, section 6.3), basing the bottom-up computation on the planning graph rather than on an explicit dynamic programming computation can offer several advantages, including access to relaxed plans and help in selecting actions.

Cost-Based Planning

The rover problem is coarsely modeled in the classical planning framework because there are many seemingly similar quality plans that are actually quite different. The first limitation we lift is the assumption that actions have unit costs. In the rover example with unit cost actions, there are two equivalent plans to achieve the goal G.


The first visits beta then gamma, and the second visits gamma then beta. However, with the cost model illustrated in figure 7, there is a difference between these two plans. Figure 7 graphically depicts the cost (used here) and duration (used later) of each action. Driving to gamma then beta is 15 units costlier than driving to beta then gamma.

The extension of planning graph heuristics to consider nonuniform action costs is to propagate cost functions in the planning graph. Where previously cost was measured using the planning graph–level index, cost is now less correlated with the level index, thus requiring cost functions. Reachability heuristics measure the cost of achieving a set of propositions with the cost functions. This requires explicitly tracking the cost of achieving a proposition at each layer. Shortly, we will describe how heuristics change, but we consider first how one can propagate cost functions in the planning graph.

Cost Functions

Cost functions capture the distinction between reachability cost and proposition-layer indices by propagating the best cost to achieve each proposition at each level. Figure 8 illustrates the planning graph PG(sI) with cost functions. The cost above each proposition indicates its achievement cost, and the cost above each action is the cost to support and execute the action. Notice that it takes only one level to achieve at(gamma), but it requires the drive(alpha, gamma) action, whose cost is 30. The cost functions allow a possibly different cost for a proposition at different levels. For instance, at(gamma) has cost 30 in P1 and cost 25 in P2 because it can be achieved with a different, lower cost in two levels. With unit-cost actions, achieving a proposition in fewer levels is usually best, but with non-unit-cost actions, a least cost plan may use more levels.

Cost functions are computed during planning graph expansion with a few simple propagation rules. Propositions in the initial proposition layer have a cost of zero. Each action in a level is assigned the sum of its execution cost and its support cost. An action's support cost reflects the cost to reach its set of precondition propositions.


Figure 7. Costs ($) and Durations ([]) for Actions.


It is typical to define the cost of a set of propositions with a maximization (used in max-propagation) or summation (used in sum-propagation) of the individual proposition costs (similar to the max and sum heuristics). Finally, the cost to reach a proposition is the same as that of its least cost supporting action from the previous level.
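These propagation rules amount to a small update per level. The following Python sketch is our illustration (using max-propagation for the support cost, as in figure 8); it computes the cost function of the next proposition layer from the previous one, with actions given as (preconditions, effects, execution-cost) triples:

INF = float("inf")

def propagate_level(prop_cost, actions):
    new_cost = dict(prop_cost)   # noops carry the previous costs forward
    for pre, effs, exec_cost in actions:
        # support cost: max-propagation over the preconditions
        support = max((prop_cost.get(p, INF) for p in pre), default=0)
        action_cost = support + exec_cost
        for p in effs:
            # a proposition keeps its least costly supporter
            new_cost[p] = min(new_cost.get(p, INF), action_cost)
    return new_cost

Swapping the max for a sum over the preconditions gives sum-propagation instead.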

Figure 8 depicts the planning graph with cost functions propagated from the initial state, using max-propagation. The goals comm(soil) and comm(rock) have minimum cost in the proposition layer where they first appear (P2 and P3, respectively), but comm(image) does not have minimum cost until one extra layer, P4, after it first appears in P3. The comm(image) goal proposition becomes less costly after another level because it is more costly to drive directly to gamma from alpha (one step) than it is to drive from alpha to beta and beta to gamma (two steps).

Notice that costs monotonically decrease over time. Since cost propagation always chooses the cheapest supporter for a proposition and the set of supporters monotonically increases, it is only possible for a cost to decrease or remain constant. The cost of every proposition in figure 8 decreases, and eventually remains constant, leading us to several strategies for ending cost propagation.

Ending Cost Propagation

The original termination criterion for planning graph construction is no longer sufficient because the cost of each proposition may decrease at each level. Stopping when the goal propositions first appear may overestimate plan cost when they have lower costs at later levels. Stopping when two proposition layers are identical (the graph has leveled off) does not prevent an action's cost from dropping in the last action layer, making a goal proposition have lower cost. To allow costs to stabilize, we can generalize the leveling-off condition to use ∞-lookahead by extending the planning graph until proposition costs do not change. In some cases, using ∞-lookahead may be costly because costs do not stabilize for several levels. An approximation to ∞-lookahead is to use k-lookahead by extending the planning graph an additional k levels after the goal propositions first appear. The example depicted in figure 8 uses 1-lookahead (extending the planning graph one level past the first appearance of all goal propositions).

Cost-Based Heuristics

Level-based heuristics no longer accurately model the cost to reach the goal propositions because actions are no longer unit cost. Cost-based heuristics generalize level-based heuristics by using cost functions instead of level indices to define proposition reachability costs. With cost functions the heuristics care about the minimum cost to achieve a proposition instead of its first level. The minimum cost is always the cost at the last level of the planning graph because costs decrease monotonically.


comm(soil)   comm(rock)   comm(image)    Cost   Utility   Net Benefit
    -            -            -            0        0          0
    -            -            ✓           35       20        -15
    -            ✓            -           40       60         20
    -            ✓            ✓           65       80         15
    ✓            -            -           25       50         25
    ✓            -            ✓           60       70         10
    ✓            ✓            -           65      110         45 *
    ✓            ✓            ✓           90      130         40

Table 3. The Net Benefit for Each Goal Subset with Respect to an Optimal Cost Plan for the Subset.

The optimal net benefit is marked with an asterisk.


In the example depicted in figure 8, at(gamma) has cost 30 after one step because it can only be supported by drive(alpha, gamma), but after two steps its cost is 25 because it can be supported by drive(beta, gamma). The cost of the at(gamma) proposition is 25 for the rest of the planning graph, thus appearing with minimum cost in the last level. By knowing the minimum cost of each goal proposition it is possible to use the max or sum heuristic to measure reachability cost. For example, the max heuristic for G is max(c(sI, comm(soil)), c(sI, comm(rock)), c(sI, comm(image))) = max(25, 40, 35) = 40, and the sum heuristic for G is 25 + 40 + 35 = 100. The set-level heuristic is no longer meaningful with cost functions because the level index is not directly related to cost.

It is possible to get an admissible cost-based heuristic when actions have nonuniform cost. The heuristic must never overestimate the aggregate cost of a set of propositions, meaning that both the cost of a set of precondition propositions and the cost of the goal propositions are aggregated by maximization. This corresponds to using max-propagation within a planning graph using ∞-lookahead and the max heuristic.

Cost-Sensitive Relaxed Plan Heuristics

Relaxed plan heuristics change in a significant way with cost functions. When there are several supporting actions for a proposition, it is best to bias the action choice toward one that supports with minimal cost (corresponding to the choice in line 7 of figure 4). It is not always enough to choose the action that is least costly; it is also important to consider the cost to support the preconditions of the action. The relaxed plan, as depicted in bold in figure 8, chooses supporting actions that have the least propagated cost among the possible supporters. For instance, comm(image) is supported at level 4 with the commun(image) action (cost 35) instead of the noop from level 3 (cost 40). Using propagated cost functions provides a look-ahead capability to the greedy extraction scheme that often results in low-cost relaxed plans. The relaxed plan heuristic is the sum of the execution costs of the actions in the action layers, providing 90 as the reachability heuristic for G. As mentioned earlier, extracting least cost relaxed plans is NP-hard.
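In outline, cost-sensitive extraction differs from figure 4 only in how a supporter is chosen. The sketch below is our paraphrase, not code from any planner discussed; costs[k], giving the propagated cost function at level k, and supporters(p, k), enumerating (action, preconditions, execution-cost) triples that give p at level k (noops included), are assumed helpers:

def extract_relaxed_plan(goals, last_level, costs, supporters):
    plan = {k: set() for k in range(last_level + 1)}
    subgoals = {last_level: set(goals)}
    for k in range(last_level, 0, -1):
        for p in subgoals.get(k, ()):
            # bias: pick the supporter with least propagated cost,
            # counting both execution cost and precondition cost
            a, pre, c = min(
                supporters(p, k),
                key=lambda s: max((costs[k - 1][q] for q in s[1]),
                                  default=0) + s[2])
            plan[k].add(a)
            subgoals.setdefault(k - 1, set()).update(pre)
    return plan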


Figure 8. Cost Propagation for the Portion of the Planning Graph for the Initial State.


Related Work

The Sapa planner (Do and Kambhampati 2003) introduced the ideas for cost propagation in metric-temporal planning problems. In metric-temporal planning problems it is common for planners to trade off the cost of a plan with its makespan (duration). We will discuss how to handle the temporal aspects of planning problems shortly. The "Hybrid Planning Graphs" section describes how temporal planning combines with cost-based planning.

The POND planner (Bryce and Kambhampati 2005) also makes use of cost propagation on a planning graph called the LUG (described in the nondeterministic planning section) for conditional planning problems. In conditional planning problems, the planner must often trade off the cost of sensing with the cost of acting. The ideas developed in our earlier work (Bryce and Kambhampati 2005) describe how to propagate costs over multiple planning graphs used for search in belief space.

Partial Satisfaction (Oversubscription) Planning

In the previous section we described handling action costs in the planning graph to guide search toward low-cost plans that achieve all of the goal propositions. However, as often happens in reality, the benefit of a plan that achieves all goals does not always justify the cost of the plan. In the rover example, the rover may expend considerable cost (power) in order to obtain an image while the benefit of having the image is low. However, it should get the image if the additional residual cost of obtaining the image is not too high. By balancing goal utility and plan cost one can satisfy a subset of the goals to maximize the net benefit of the plan. Net benefit assumes that plan cost and goal utility are expressed in the same currency, such that net benefit is total goal utility minus plan cost.7 The cost-sensitive heuristics discussed in the previous section can be applied to guide search toward plans with high net benefit by augmenting them to consider goal utility.

In the example, the utility function over the goal propositions is set such that comm(soil) has utility 50, comm(rock) has utility 60, and comm(image) has utility 20. We will assume additive utility for the goals. Inspecting the optimal cost plans to achieve each subset of these goal propositions identifies that the optimal net benefit is 45, for the plan to achieve {comm(soil), comm(rock)}. Table 3 lists the associated cost, utility, and net benefit of the optimal plan for each goal subset.

Picking Goal Sets

The simplest way to handle partial satisfaction problems is to first pick the goal propositions that should be satisfied and then use them in a reformulated planning problem, as described by Van den Briel et al. (2004). The reformulated problem removes the goal utilities and retains action costs to find a cost-sensitive plan for the chosen subset of goal propositions. The trick is to decide which of the possible 2^|G| goal subsets to use in the reformulation. One way to reformulate the problem is to iteratively build a goal subset G′, accounting for positive interactions between goals. Each iteration considers adding one proposition g ∈ G∖G′ to G′ based on how it changes the estimated net benefit of G′.

The algorithm starts by initializing G′ with the most beneficial goal, estimating the net benefit of each goal proposition as its utility minus its minimum propagated cost in the planning graph. In the example, the goal propositions have the following net benefits:

comm(soil) has utility 50 and cost 25, for net benefit 25;

comm(rock) has utility 60 and cost 40, for net benefit 20; and

comm(image) has utility 20 and cost 35, for net benefit -15.

The costs to achieve these goal propositions are, coincidentally, the same in both the planning graph (figure 8) and the true optimal plans in table 3, but this is not true in general. The algorithm chooses comm(soil) to initialize the goal set. Upon choosing the initial goal set G′, the algorithm improves the estimate of its net benefit by extracting a relaxed plan (which will come in handy later). The relaxed plan for comm(soil) is:

(A0^RP = {sample(soil, alpha)}, A1^RP = {commun(soil)}),

which costs 25—the same as the propagated cost.

After picking the initial goal proposition for the parent goal set G′, the algorithm iterates through possible extensions of the goal set until the total net benefit stops increasing. The cost of each candidate extension is measured in terms of the cost of a relaxed plan to support the candidate goal. To measure as much positive interaction as possible, the relaxed plan for the candidate goal is biased to reuse actions in the relaxed plan for the parent goal set.
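The overall selection loop can be sketched as follows (ours, illustrative only); relaxed_plan_cost(goal_set, bias), returning the cost of a relaxed plan biased to reuse the actions in bias, is a hypothetical helper:

def pick_goal_set(goals, utility, relaxed_plan_cost):
    # initialize with the individually most beneficial goal
    best = max(goals,
               key=lambda g: utility[g] - relaxed_plan_cost({g}, set()))
    G = {best}
    benefit = utility[best] - relaxed_plan_cost(G, set())
    while True:
        candidates = []
        for g in goals - G:
            cand = G | {g}
            # bias the candidate's relaxed plan to reuse the parent's actions
            b = sum(utility[x] for x in cand) - relaxed_plan_cost(cand, G)
            candidates.append((b, g))
        if not candidates:
            break
        b, g = max(candidates)
        if b <= benefit:             # net benefit stops increasing
            break
        G, benefit = G | {g}, b
    return G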

The candidate goal {comm(rock)} or the candidate goal {comm(image)} can be used to extend G′. The biased relaxed plan for {comm(soil), comm(rock)} is:

(A0^RP = {sample(soil, alpha), drive(alpha, beta)}, A1^RP = {commun(soil), sample(rock, beta)}, A2^RP = {commun(rock)}),


with utility 110 and cost 65, for net benefit 45. Similarly, {comm(soil), comm(image)} has the relaxed plan:

(A0^RP = {sample(soil, alpha), drive(alpha, beta)}, A1^RP = {commun(soil), drive(beta, gamma)}, A2^RP = {sample(image, gamma)}, A3^RP = {commun(image)}),

with utility 70 and cost 60, for net benefit 10. The candidate goal {comm(rock)} increases net benefit the most (by 20, from 25 to 45), so it is added to G′.

The only extension of the new goal set is the entire set of goals {comm(soil), comm(rock), comm(image)}, which has the relaxed plan:

(A0^RP = {sample(soil, alpha), drive(alpha, beta)}, A1^RP = {commun(soil), sample(rock, beta), drive(beta, gamma)}, A2^RP = {sample(image, gamma), commun(rock)}, A3^RP = {commun(image)}),

with utility 130 and cost 100, for net benefit 30. This decreases net benefit by 15 from 45, so the additional goal is not added. The reformulated planning problem for the chosen goal subset {comm(soil), comm(rock)} is solved by ignoring goal utility. Any of the cost-sensitive planning heuristics can be used to solve the subsequent problem. In order to guarantee an optimal net benefit plan, it may be necessary to find the optimal cost-sensitive plan for every goal subset and select the plan with the highest net benefit.

Related Work

Partial satisfaction planning was first suggested as a generalization to classical planning by Smith (2002). The described techniques rely on the work of Van den Briel et al. (2004). Since this work, several algorithmic improvements and problem generalizations have appeared. The recent International Planning Competition (Gerevini, Bonet, and Givan 2006) included a track specifically to encourage and evaluate work on partial satisfaction planning.

One of the first extensions improves iterative goal set selection. Sanchez and Kambhampati (2005) develop methods for measuring negative interaction between goals in addition to the positive interaction described above. The approach involves using mutexes to measure the conflict between two goals and adjusting the heuristic measure of cost. This is another way of adjusting the heuristic to make up for relaxations, as we discussed earlier.

Do, Benton, and Kambhampati (2006) discuss a generalization where the goal utilities have dependencies (for example, achieving comm(soil) and comm(rock) together is significantly more useful than achieving either in isolation). The ideas described above can be extended to handle these more expressive problems. Do, Benton, and Kambhampati (2006) use an integer linear program to improve the quality of relaxed plans by better accounting for utility dependencies.

An alternative to preselecting goal sets is to search for the best goal set (Do and Kambhampati 2004). Using progression search, it is possible to approximately determine when a plan suffix cannot improve net benefit. Using techniques similar to those just described, it is possible to estimate the remaining net benefit that is reachable in a possible plan suffix. Since it is possible to quickly find plans that satisfy some of the goals, an anytime search algorithm can return plans with increasing net benefit.

Another approach (Smith 2004) to oversubscription planning uses similar planning graph heuristics to derive an abstraction of the planning problem, called the orienteering problem. The solution to the orienteering problem provides a heuristic to guide search. Through a sensitivity analysis in the planning graph, through proposition and action removal, it is possible to identify the propositions and actions that have the most influence on the cost of achieving goals, and these are used in the abstract problem.

Planning with Resources (Also Known as Numeric State Variables)

Planning with resources involves state variables that have integer and real values (also termed functions or numeric variables). Actions can, in addition to assigning a specific value to a variable, increment or decrement the value of a variable. Actions also have preconditions that, in addition to equality constraints, involve inequalities.

We augment our example, as shown in figure 9, to describe the rover battery power with the power function. Each action consumes power, except for a recharge action that allows the rover to produce battery power (through proper alignment of its solar panels). The rover starts with partial power reserves that are not enough to achieve every goal. Thus, the rover needs a plan that is not much different from the classical formulation but adds some appropriately inserted recharge actions.

Planning graphs require a new generalization to deal with numeric variables. Previously, it was simple to represent the reachable values of variables because they were only Boolean valued. With numeric variables there is a possibly infinite number of reachable values. Depending on whether a variable is real or integer, there may be continuous or contiguous ranges of values that are reachable, which are best represented as intervals. Tracking intervals in the planning graph helps determine when a particular enabling resource threshold can be crossed (for example, whether the rover has enough power to perform an action). Because of the rich negative interactions surrounding resources, it is generally quite difficult to exactly characterize the resource intervals. The approach taken in many works is simply to represent the upper and lower bounds of the possible values. Recall that the planning graph is optimistic in terms of reachability, so it is reasonable to represent only the upper and lower bounds even though some intermediate values are not in fact reachable.

Tracking Bounds

Consider the planning graph, shown in figure 10, for the initial state of the problem described in figure 9. The power variable is the only numeric variable whose value changes, and it is annotated with an interval of its possible values. In the first proposition layer, the possible values of the power variable are [25, 25], as given in the initial state. This makes only some of the actions applicable in the first action layer; notice that drive(alpha, gamma) is not applicable because it requires 30 power units. The second proposition layer has the power bounds set to [-5, 50]. The lower bound is -5 because in the worst case all consumers are executed in parallel: sample(soil, alpha) consumes 20 units of power and drive(alpha, beta) consumes 10 units, so 25 - (10 + 20) = -5. Notice that determining the bounds on a numeric variable this way may provide loose bounds because many of the actions cannot really be executed in parallel. For instance, the lower bound of -5 ignores interactions between actions, preventing us from noticing that executing both sample(soil, alpha) and drive(alpha, beta) in parallel uses more power than is available (even though each individually has enough power to execute). Likewise, in the best case all producers execute in parallel: recharge as well as the persistence of power each produce 25 units of power, so 25 + 25 = 50.
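The bound computation for one level amounts to two sums. A minimal sketch (ours), where deltas lists each applicable action's net effect on the variable (negative for consumers, positive for producers) and the persistence of the old value is implicit in starting from lo and hi:

def propagate_bounds(lo, hi, deltas):
    # worst case: all consumers execute in parallel
    new_lo = lo + sum(d for d in deltas if d < 0)
    # best case: all producers execute in parallel
    new_hi = hi + sum(d for d in deltas if d > 0)
    return new_lo, new_hi

# power at the first layer: sample(soil, alpha) consumes 20,
# drive(alpha, beta) consumes 10, recharge produces 25
print(propagate_bounds(25, 25, [-20, -10, +25]))  # (-5, 50)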


(define (problem rovers_resource1)
  (:domain rovers_resource)
  (:objects soil image rock - data
            alpha beta gamma - location)
  (:init (at alpha)
         (avail soil alpha)
         (avail rock beta)
         (avail image gamma)
         (= (effort alpha beta) 10)
         (= (effort beta alpha) 5)
         (= (effort alpha gamma) 30)
         (= (effort gamma alpha) 5)
         (= (effort beta gamma) 15)
         (= (effort gamma beta) 10)
         (= (effort soil) 20)
         (= (effort rock) 25)
         (= (effort image) 5)
         (= (power) 25))
  (:goal (and (comm soil) (comm image) (comm rock))))

(define (domain rovers_resource)
  (:requirements :strips :typing)
  (:types location data)
  (:predicates (comm ?d - data)
               (have ?d - data)
               (at ?x - location)
               (avail ?d - data ?x - location))
  (:functions (power)
              (effort ?x ?y - location)
              (effort ?d - data))
  (:action drive
    :parameters (?x ?y - location)
    :precondition (and (at ?x) (>= (power) (effort ?x ?y)))
    :effect (and (at ?y) (not (at ?x))
                 (decrease (power) (effort ?x ?y))))
  (:action commun
    :parameters (?d - data)
    :precondition (and (have ?d) (>= (power) 5))
    :effect (and (comm ?d) (decrease (power) 5)))
  (:action sample
    :parameters (?d - data ?x - location)
    :precondition (and (at ?x) (avail ?d ?x) (>= (power) (effort ?d)))
    :effect (and (have ?d) (decrease (power) (effort ?d))))
  (:action recharge
    :parameters ()
    :precondition (and)
    :effect (and (increase (power) 25))))

Figure 9. PDDL Description of Resource Planning Formulation of the Rover Problem.


Level-Based Heuristics

Level-based heuristics can be changed to accommodate resource intervals. The first level where a resource takes on a given value is the first level where the resource interval contains the value. For instance, if the goal were to have (> (power) 55), then the set-level heuristic estimates the cost to be 2, since power has the interval [-115, 75] in P2. As before, it is possible to aggregate the cost of several goal propositions to define a level-based heuristic.

Relaxed Plan Heuristics

Extracting a relaxed plan in the presence of numeric variables is a little more involved because multiple supporters may be needed for each subgoal.8 A resource subgoal may need multiple supporters because its bounds are computed assuming that multiple actions produce or consume the resource. In the example, the power variable is a precondition of every action. Including an action in the relaxed plan requires that there be enough power. By ignoring interactions between actions, when several actions require power, the relaxed plan need only have enough power for the action that requires the maximum amount of power. Consider the numbers listed below the power variable in figure 10. Each number denotes the maximum amount of power needed to support an action chosen for the relaxed plan. In P2^RP and P3^RP the power requirement is 5 because the actions chosen for the respective A2^RP and A3^RP have a maximum power requirement of 5 (this includes persistence actions). The relaxed plan can support the respective power requirements by persistence from the previous proposition layer because the maximum power is greater than the requirements. In P1^RP, the power requirement is 30 because drive(alpha, gamma) in A1^RP requires 30 units of power. Persistence alone cannot support the needed 30 units of power, and the relaxed plan must use both the persistence of power and the recharge action to support the requirement.

Related Work

Planning graph heuristics for numeric state variables were first described by Hoffmann (2003), tracking only upper bounds. Sanchez and Mali (2003) describe an approach for tracking intervals.


Figure 10. Resource Planning Graph for the Initial State.


Benton, Do, and Kambhampati (2005) adapt planning graph heuristics to partial satisfaction planning scenarios where the degree of satisfaction for numeric goals changes net benefit. The technique propagates costs for the resource bounds. As the bounds change over time, they track a cost for the current upper and lower bounds. Where cost propagation usually updates the cost of each value of a variable, with resources the number of variable values is so large that only the costs of the upper and lower bounds are updated.

Temporal Planning

To this point we have discussed planning models that have atomic time and are sequential—a single action is executed at each step. In temporal planning, actions have real duration and may execute concurrently. In the rover example (adapted from the classical model in figure 2 to have the durations depicted in figure 7), it should be possible for the rover to communicate data to the lander while it is driving.9 Where the planning objective was previously plan length (or cost), with durative actions it becomes makespan (the total time to execute the plan). As such, temporal reachability heuristics need to estimate makespan. While temporal planning can have actions with cost, goal utility, and resources, we omit these features to concentrate solely on handling durative actions.

We adopt the conventions used by the Sapa planner (Do and Kambhampati 2003) to illustrate temporal reachability heuristics. Sapa constructs a temporal plan by adding actions to start at a given time. Sapa uses progression search, meaning that after adding some number of actions to start at time t, it will only add actions at times t′ such that t′ > t. The search involves two types of choices: either concurrently add a new action to start at time t (fatten) or advance the time t for action selection (advance). With concurrency and durative actions, the search state is no longer characterized by just a set of propositions. Each chosen action may have its effects occur at some time in the future. So the state is not just what is true now but also what the plan commits to making true in the future (delayed effects). With this extended state and temporal actions, temporal planning graph construction and heuristic extraction must explicitly handle time.

Temporal Planning Graph

Without durative actions the planning graph is constructed in terms of alternating proposition and action layers. The most important information is the "first level" where a proposition is reachable (with this information and the action descriptions, it is possible to construct relaxed plans). In the temporal planning graph, heuristics are concerned with the "first time point" at which a proposition is reachable. Consider the temporal planning graph in figure 11 for the initial state. The temporal planning graph is the structure above the dashed line that resembles a Gantt chart. Each action is shown only once, denoting the first time it is executable and its duration. The propositions that are given as effects are denoted by the vertical lines; each proposition is shown only at the first time point it is reachable. The planning graph shows that comm(soil) is first reachable at time 11, comm(image) is first reachable at time 34, and comm(rock) is first reachable at time 76. In the temporal planning graph there is no explicit level because actions can span any duration and give effects at any time within their duration. In general, a durative action can have an effect at any time ta ≤ t ≤ ta + dur(a), where ta is the time the action executes, t is the time the effect occurs, and dur(a) is the action's duration. In the rover example, each action gives its effect at the end of its execution. For example, a = drive(alpha, gamma) starts at ta = 0 and gives its positive effect at(gamma) at time ta + dur(a) = 0 + 30 = 30. For an action to execute at a given time point t, it must be the case that all of its preconditions are reachable before t. In the example, sample(image, gamma) cannot execute until time 30, even though avail(image, gamma) is reached at time 0, because its other precondition proposition at(gamma) is not reachable until time 30. If the state includes past commitments to actions whose effects have not yet occurred (that is, the delayed effects discussed earlier), then the planning graph includes these actions with their start time possibly offset before the relative starting time of the planning graph.
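The first reachable time points can be computed with a fixed point much like cost propagation. The following Python sketch is our illustration, assuming every effect occurs at the end of its action, as in the rover example; actions are (preconditions, effects, duration) triples:

INF = float("inf")

def earliest_times(init_props, actions):
    t = {p: 0 for p in init_props}
    changed = True
    while changed:
        changed = False
        for pre, effs, dur in actions:
            # an action can start once all its preconditions are reachable
            start = max((t.get(p, INF) for p in pre), default=0)
            if start == INF:
                continue
            for p in effs:
                if start + dur < t.get(p, INF):
                    t[p] = start + dur   # first time point p is reachable
                    changed = True
    return t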

It is also possible to compute mutexes in the temporal planning graph, but they require generalization to deal with concurrent actions that have different durations. Following Temporal GraphPlan (Smith and Weld 1999), in addition to tracking proposition/proposition and action/action mutexes, it becomes necessary to find proposition/action mutexes explicitly. While we refer the reader to Smith and Weld (1999) for a complete account, proposition/action mutexes capture when it is impossible for a proposition to hold in any state while a given action executes. In nontemporal planning, these mutexes were captured by mutexes between actions and noops.


Temporal Planning Graph Heuristics

The temporal planning graph for the initial state of the rover example shows that every goal proposition is reached by time 76, providing a lower bound on the makespan of a valid plan. This is an estimate of makespan that is similar to the set-level heuristic in classical planning. Instead of finding the level where the goal propositions are reachable, the heuristic finds the time point where they are reachable. Analogous to their role in classical planning, mutexes can help improve makespan estimates by reducing optimism in the planning graph.

In addition to makespan estimates, it is often convenient to use temporal relaxed plans to estimate the plan cost. Similar to level-based heuristics, relaxed plans use time points in the place of levels. Recall that a classical relaxed plan is a set of propositions and actions that are indexed by the level where they appear. Temporal relaxed plans use a similar scheme but index the propositions and actions by time.

Extracting a temporal relaxed plan involves starting from the time point where all goals are reachable and recursively supporting each goal, as shown in figure 4. The main difference is that instead of decrementing levels across the recursions, it decrements the time by which subgoals are needed. Figure 11 shows the relaxed plan in bold below the dashed line. It supports all of the goals at time 76. To support a proposition at a time t, the relaxed plan chooses an action a and sets its start time to t - dur(a) (that is, the action starts as late as possible). For instance, commun(image) is used to support comm(image) at time 76 by starting at time 75. Each chosen action has its precondition propositions supported at the time the action starts. The commun(image) action starts at time 75, so its precondition proposition have(image) must be supported by time 75. If the state includes commitments to actions with delayed effects, then relaxed plan extraction can bias or force the selection of such actions to support propositions.
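In rough outline (our sketch, not Sapa's implementation), the recursion looks as follows, where t[p] is the earliest reachable time of p (from the previous sketch) and achievers(p), a hypothetical helper, enumerates (action, preconditions, duration) triples whose action gives p at its end:

def support(p, by_time, t, achievers, plan):
    if t[p] == 0:
        return                      # p holds in the initial state
    # choose the achiever whose preconditions are reachable earliest
    a, pre, dur = min(achievers(p),
                      key=lambda x: max((t[q] for q in x[1]), default=0))
    start = by_time - dur           # start the action as late as possible
    if (start, a) not in plan:
        plan.add((start, a))
        for q in pre:               # preconditions needed when a starts
            support(q, start, t, achievers, plan)

def temporal_relaxed_plan(goals, t, achievers):
    plan = set()
    latest = max(t[g] for g in goals)
    for g in goals:
        support(g, latest, t, achievers, plan)
    return sorted(plan)             # (start time, action) pairs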


Figure 11. Temporal Planning Graph (top) and Relaxed Plan (bottom) for the Initial State.


Related Work

Temporal GraphPlan (Smith and Weld 1999) was the first work to generalize planning graphs to temporal planning, while Sapa (Do and Kambhampati 2003) was the first to extract heuristics from this type of planning graph.

There are many extensions of GraphPlan to temporal planning. The temporal planning graph, as we and Smith and Weld (1999) have presented it, is implemented with a unique data structure called a bilevel planning graph (Long and Fox 1999; Smith and Weld 1999). The bilevel planning graph is a simple data structure that has two layers: an action layer and a proposition layer, which are connected with arcs corresponding to preconditions and effects. Bilevel planning graph expansion involves propagating lower bounds on the times to reach propositions and actions without physically generating extra levels. Bilevel planning graphs can also be used in nontemporal planning to save memory.

TPSys (Garrido, Onaindia, and Barber 2001) is another GraphPlan-like planner that uses a different generalization of planning graphs for temporal planning. Instead of removing all notion of level to accommodate actions with different durations, TPSys associates time stamps with proposition layers and allows durative actions to span several levels. For example, the proposition layer after one step is indexed by the end time of the shortest duration action, and it contains this action's effects. The explicit proposition layers can aid mutex propagation, allowing TPSys to compute more mutexes than Temporal GraphPlan (Garrido, Onaindia, and Barber 2001). The TPSys planning graph has not yet been used for computing reachability heuristics.

Another approach for using GraphPlan in temporal planning, in the LPGP planner (Long and Fox 2003), uses a classical planning graph to maintain logical reachability information and a set of linear constraints to manage time. By decoupling the planning graph and time, LPGP supports a richer model of durative actions. The manner in which LPGP uses the planning graph, for plan search, has not yet been extended to computing reachability heuristics.

Many of the techniques discussed in classical planning show up in temporal planning. For instance, Sapa adjusts its heuristic to account for negative interactions by using mutexes to reschedule its temporal plan to minimize conflicts, resulting in a better makespan estimate. Sapa also uses cost functions to estimate cost in addition to makespan.

The LPG planner (Gerevini, Saetti, and Serina 2003) generalizes from classical planning by using a temporal planning graph (most similar to Temporal GraphPlan) to perform local search in the planning graph. LPG uses temporal relaxed plan heuristics, similar to those described above, to guide its search.

Another interesting and recent addition to PDDL allows for timed initial literals. These are propositions that will become true at a specified time point (a type of exogenous event). Integrating timed initial literals into the temporal planning graph is easy because there are implicit exogenous actions that give the literal as an effect. However, heuristics change considerably because the planner no longer has direct choice over every state it reaches. Many of the existing techniques to handle timed initial literals adjust relaxed plans with common scheduling heuristics (Gerevini, Saetti, and Serina 2005)—another example of adjusting heuristics to make up for relaxation.

Nondeterministic Planning

None of the planning models we have considered to this point allow uncertainty. Uncertainty can arise in multiple ways: a partially observable state, noisy sensors, uncertain action effects, or uncertain action durations. We first concentrate on the simplest uncertainty, an unobservable and uncertain state. This falls into the realm of nondeterministic conformant planning with deterministic actions (discussed later). Later in this section, we discuss how the nondeterministic conformant planning heuristics extend to conditional (sensory) planning. In the next section we consider stochastic planning with uncertain actions.

Background

The conformant planning problem starts with an uncertain state and a set of actions with conditional effects. The uncertain state is characterized by a belief state b (a set of possible states), such that b ⊆ 2^P. Actions with conditional effects are common in conformant planning because the executed actions must be applicable to all states in the belief state. Instead of requiring strict execution preconditions, it is common to move the conditions into secondary preconditions that enable conditional effects. This way actions are executable in belief states, but their effect depends on the true state in the belief state. Since the true state is not known, the plan becomes conformant (that is, the plan should work no matter which state in the belief state is the true state).

Consider a modification to the rover example, listed in figure 12. The sample action has a conditional effect, so that sampling works only if a sample is available; otherwise it has no effect. The initial state is uncertain because it is unknown where the rover can obtain a soil sample (the terrain is quite hard and rocky), but it is available at one of the locations alpha, beta, or gamma. The initial belief state is defined:

bI = {{at(alpha), avail(soil, alpha)}, {at(alpha), avail(soil, beta)}, {at(alpha), avail(soil, gamma)}}.

To simplify the example, the only goal proposition is comm(soil). At this point, the rover cannot determine whether it has the soil sample because its sensor has broken. The solution to this problem is to go to each location and sample, to be sure the rover has the soil sample, and then communicate the data back to the lander:

{sample(soil, alpha), drive(alpha, beta), sample(soil, beta), drive(beta, gamma), sample(soil, gamma), commun(soil)},

resulting in the belief state:

{{at(gamma), avail(soil, alpha), have(soil), comm(soil)},
{at(gamma), avail(soil, beta), have(soil), comm(soil)},
{at(gamma), avail(soil, gamma), have(soil), comm(soil)}},

where the goal comm(soil) is satisfied in every possible state.

To clarify the semantics of applying actions to belief states, consider applying sample(soil, alpha) to bI. The action is applicable because its precondition at(alpha) is satisfied by every state in the belief state. The resulting belief state b′ is the union of the successors obtained by applying the action to every state in bI:

b′ = {{at(alpha), avail(soil, alpha), have(soil)},
{at(alpha), avail(soil, beta)},
{at(alpha), avail(soil, gamma)}}.

The first state is the only state that changes because it is the only state where the conditional effect is enabled. Applying the drive(alpha, beta) action to b′ results in:

b′′ = {{at(beta), avail(soil, alpha), have(soil)},
{at(beta), avail(soil, beta)},
{at(beta), avail(soil, gamma)}}.

Every state in b′′ changes because the action applies to every state.
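These semantics are compact enough to state directly in code. A minimal sketch (ours), representing a belief state as a set of frozensets of propositions and each conditional effect as a (condition, adds, deletes) triple of proposition sets:

def progress(belief, precondition, cond_effects):
    # the action must be applicable in every state of the belief state
    assert all(precondition <= s for s in belief)
    successors = set()
    for s in belief:
        adds, dels = set(), set()
        for cond, a, d in cond_effects:
            if cond <= s:            # effect enabled in this state only
                adds |= a
                dels |= d
        successors.add(frozenset((s - dels) | adds))
    return successors

# e.g., sample(soil, alpha): precondition {at(alpha)}, one conditional
# effect ({avail(soil, alpha)}, {have(soil)}, set())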

Single Planning Graphs


(define (problem rovers_conformant1)
  (:domain rovers_conformant)
  (:objects soil image rock - data
            alpha beta gamma - location)
  (:init (at alpha)
         (oneof (avail soil alpha)
                (avail soil beta)
                (avail soil gamma)))
  (:goal (comm soil)))

(define (domain rovers_conformant)
  (:requirements :strips :typing)
  (:types location data)
  (:predicates (at ?x - location)
               (avail ?d - data ?x - location)
               (comm ?d - data)
               (have ?d - data))
  (:action drive
    :parameters (?x ?y - location)
    :precondition (at ?x)
    :effect (and (at ?y) (not (at ?x))))
  (:action commun
    :parameters (?d - data)
    :precondition (have ?d)
    :effect (comm ?d))
  (:action sample
    :parameters (?d - data ?x - location)
    :precondition (at ?x)
    :effect (when (avail ?d ?x) (have ?d))))

Figure 12. PDDL Description of Conformant Planning Formulation of Rover Problem.


There are many ways to use planning graphs for conformant planning (Bryce, Kambhampati, and Smith 2006a). The most straightforward approach is to ignore state uncertainty. There is nothing about classical planning graphs that requires defining the first proposition layer as a state; every other proposition layer already represents a set of possibly reachable states. One could union the propositions in all states of the belief state to create an initial proposition layer, such as:

P0 = {at(alpha), avail(soil, alpha), avail(soil, beta), avail(soil, gamma)}

for the initial belief state. The planning graph built from the unioned proposition layer has the goal appear after two levels, and the relaxed plan would be:

(A0^RP = {sample(soil, alpha), commun(soil)}),

which gives 2 as the heuristic (greatly underestimating plan cost). Alternatively, one could sample a single state from the belief state to use for the planning graph (which may also lead to an underestimate). Using either of these approaches results in a heuristic that measures the cost to reach the goal from one (or some intersection) of the states in a belief state. This is typically an underestimate because a conformant plan must reach the goal from all of the states in a belief state.

Multiple Planning Graphs

A more systematic approach to using planning graphs involves constructing a planning graph for each state in the belief state, extracting a heuristic estimate from each planning graph, and aggregating the heuristics. Similar to the level-based heuristics in classical planning that aggregate proposition costs, it is possible to aggregate the heuristic for each possible state by assuming full positive interaction (maximization) or full independence (summation) between the states.



Figure 13. Conformant Planning Graphs.


the commun action at the end of the plan. There is also independence between the initial states because sampling works only at one location for each. Relaxed plans that do not simply numerically aggregate individual relaxed plan heuristics can capture (in)dependencies in the heuristic for the belief state, as was possible for goal propositions in classical planning. Numeric aggregation loses information about positively interacting and independent causal structure that is used in common across the relaxed plans. Taking the union of the relaxed plans to get a unioned relaxed plan will better measure these dependencies.

To see how to capture both positive interaction and independence between states in a belief state, consider figure 13. We illustrate the planning graph for each state in the initial belief state and a relaxed plan (in bold) to support the goal. There are three relaxed plans that must be aggregated to get a heuristic measure for the initial belief state bI. Using the maximum cost of the relaxed plans gives 3 as the heuristic, and using the summation of costs gives 8 as the heuristic. Neither of these is very accurate because the optimal plan length is 6. Instead, one can step-wise union the relaxed plans. The intuition is that if an action appears in the same layer for several of the relaxed plans, then it is likely to be useful for several states in the belief state, meaning the plans for states positively interact. Likewise, actions that do not appear in multiple relaxed plans in the same layer are used by the states independently. Taking the step-wise union of the relaxed plans gives the unioned relaxed plan:

(A0^RP = {sample(soil, alpha), drive(alpha, beta), drive(alpha, gamma)},
 A1^RP = {commun(soil), sample(soil, beta), sample(soil, gamma)},
 A2^RP = {commun(soil)}).

The unioned relaxed plan contains seven actions, which is closer to the optimal plan length. The unioned relaxed plan can be poor when the individual relaxed plans are not well aligned, that is, they contain many common actions that appear in different layers. In the next approach, it is possible to overcome this limitation.
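The three aggregation schemes are easy to state in code. In this Python sketch (our own encoding, with each relaxed plan represented as a list of action-layer sets), h_max assumes full positive interaction, h_sum full independence, and h_union performs the step-wise union:

from itertools import zip_longest

def h_max(plans):  # full positive interaction
    return max(sum(len(layer) for layer in p) for p in plans)

def h_sum(plans):  # full independence
    return sum(len(layer) for p in plans for layer in p)

def h_union(plans):  # step-wise union of the relaxed plans
    layers = zip_longest(*plans, fillvalue=set())
    return sum(len(set().union(*step)) for step in layers)

# the three relaxed plans of figure 13
alpha = [{"sample(soil,alpha)"}, {"commun(soil)"}]
beta = [{"drive(alpha,beta)"}, {"sample(soil,beta)"}, {"commun(soil)"}]
gamma = [{"drive(alpha,gamma)"}, {"sample(soil,gamma)"}, {"commun(soil)"}]
print(h_max([alpha, beta, gamma]), h_sum([alpha, beta, gamma]),
      h_union([alpha, beta, gamma]))  # 3 8 7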

Labeled Planning Graphs

While the multiple planning graph approach to obtain unioned relaxed plans is informed, it has two problems. First, it requires multiple planning graphs, which can be quite costly when there are several states in the belief state; plus, there is a lot of repeated planning graph structure among the multiple planning graphs. Second, the relaxed plans used to obtain the

unioned relaxed plan were extracted independently and do not actively exploit positive interactions (like we have seen in previous sections). For example, the unioned relaxed plan has commun(soil) in two different action layers where, intuitively, it needs to execute only once.

Both of these problems are addressed with the labeled (uncertainty) planning graph (LUG). The LUG represents multiple explicit planning graphs implicitly. Figure 14 shows the labeled planning graph representation of the multiple planning graphs in figure 13. The idea is to use a single planning graph skeleton to represent proposition and action connectivity and to use labels to denote which portions of the skeleton are used by which of the explicit multiple planning graphs. Each label is a propositional formula whose models correspond to the propositions used in the initial layers of the explicit planning graphs. In figure 14, there is a set of shaded circles above each proposition and action to represent labels (ignore the variations in circle size for now). The black circles represent the planning graph for the first state {at(alpha), avail(soil, alpha)} in bI, the gray circles are for the state {at(alpha), avail(soil, beta)}, and the white circles are for the state {at(alpha), avail(soil, gamma)}. The initial proposition layer has at(alpha) labeled with every color because it holds in each of the states, and the other propositions are labeled to indicate their respective states.

The LUG is constructed by propagating labels (logical formulas) a layer at a time. The intuition behind label propagation is that actions are applicable in every explicit planning graph where their preconditions are supported, and that propositions are supported in every explicit planning graph where they are given as an effect. For example, sample(soil, alpha) in A0 is labeled only with black to indicate it has an effect in the explicit planning graph only for state {at(alpha), avail(soil, alpha)} (inspecting the explicit planning graphs in figure 13 will confirm this). Similarly, have(soil) is supported in P1 only for the state {at(alpha), avail(soil, alpha)}. More formally, if the labels refer to sets of states (but are represented by Boolean formulas), then an action is labeled with the intersection (conjunction) of its preconditions' labels, and a proposition is labeled with the union (disjunction) of its supporters' labels. The goal is reachable in the LUG when every goal proposition has a label with every state in the source belief state. In the example, the goal comm(soil) is first reached by a single source state {at(alpha), avail(soil, alpha)} in P2 and is reached by every state in P3.
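In code, the propagation rules look roughly as follows (a Python simplification of our own: labels are represented extensionally as sets of source states rather than as Boolean formulas, preconditions are assumed nonempty, and conditional effects are omitted; in the full LUG an effect's label also conjoins the labels of its conditions):

def propagate_labels(prop_labels, actions):
    # prop_labels maps each proposition in layer Pk to its label,
    # a set of source states; actions are (name, pre, adds) triples
    action_labels = {}
    for name, pre, adds in actions:
        # an action is labeled with the intersection (conjunction)
        # of its preconditions' labels
        label = set.intersection(*(prop_labels.get(p, set()) for p in pre))
        if label:
            action_labels[name] = label
    # noops carry every proposition's label forward to layer Pk+1 ...
    next_labels = {p: set(l) for p, l in prop_labels.items()}
    for name, pre, adds in actions:
        for p in adds:
            # ... and a proposition is labeled with the union
            # (disjunction) of its supporters' labels
            next_labels.setdefault(p, set()).update(action_labels.get(name, set()))
    return action_labels, next_labels

def goal_reachable(prop_labels, goal, belief):
    # the goal is reachable once every goal proposition's label
    # covers every state in the source belief state
    return all(belief <= prop_labels.get(g, set()) for g in goal)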


Labeled Relaxed Plans

With a succinct, implicit representation of the multiple planning graphs it is possible to extract a relaxed plan. From the LUG one can extract the relaxed plan for all states simultaneously and exploit positive interaction. The labeled relaxed plan extraction is similar to the classical planning relaxed plan, but with one key difference. A proposition may need support from more than one action (such as when using resources in the planning with resources section) because it needs support from several source states, each requiring a supporter. Like relaxed plans for resources, there is some additional bookkeeping needed for the labeled relaxed plan. Specifically, the relaxed plan tracks which of the source states require the inclusion of a particular action or proposition. Notice the circles in figure 14 that are slightly larger than the others: these indicate the source states where the given action or proposition is needed. For instance, at(gamma) in P1^RP is used only for the source state {at(alpha), avail(soil, gamma)} (the white circle). Supporting a proposition is a set cover problem where each action covers some set of states (indicated by its label). For example, have(soil) in P2^RP needs support in the source states where it supports commun(soil). There is no single action that supports it in all of these states. However, a subset of the supporting actions can support it by covering the required source states. For example, to cover have(soil) in P2^RP with supporters, the relaxed plan selects the noop for have(soil) for support from state {at(alpha), avail(soil, alpha)} (the black circle), sample(soil, beta) for support from state {at(alpha), avail(soil, beta)} (the gray circle), and sample(soil, gamma) for support from state {at(alpha), avail(soil, gamma)} (the white circle). Using the notion of set cover, it is easy to exploit positive interactions with a simple greedy heuristic (which can be used to bias action choice in line 7 of figure 4): an action that supports in several of the explicit planning graphs



Figure 14. Conformant Labeled Uncertainty Graph.


is better than an action that supports in only one. Notice that the labeled relaxed plan contains the commun(soil) action only once as opposed to twice in the unioned relaxed plan. The labeled relaxed plan gives a heuristic measure of 6, which is the optimal cost of the real plan.
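A minimal Python sketch of that greedy cover step follows (the supporter labels are transcribed from the example; the function and the state names s_alpha, s_beta, s_gamma are our own illustration):

def cover_supporters(needed_states, supporters):
    # supporters maps each candidate action to its label, the set of
    # source states in which it supports the proposition
    uncovered, chosen = set(needed_states), []
    while uncovered:
        # greedily pick the supporter covering the most uncovered states
        best = max(supporters, key=lambda a: len(supporters[a] & uncovered))
        if not supporters[best] & uncovered:
            raise ValueError("proposition cannot be covered")
        chosen.append(best)
        uncovered -= supporters[best]
    return chosen

# covering have(soil) in P2^RP from the example
supporters = {"noop(have(soil))": {"s_alpha"},
              "sample(soil,beta)": {"s_beta"},
              "sample(soil,gamma)": {"s_gamma"}}
print(cover_supporters({"s_alpha", "s_beta", "s_gamma"}, supporters))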

Related Work

Our discussion is based on our previous work on planning graph heuristics for nondeterministic planning (Bryce, Kambhampati, and Smith 2006a; Bryce and Kambhampati 2004).

Another approach is developed in the Conformant and Contingent FF planners (Hoffmann and Brafman 2004; Brafman and Hoffmann 2005). Neither planner explicitly represents belief states; rather, they keep track of the initial belief state and an action history to characterize plan prefixes. From a set of propositions that must hold at the end of the action history, a planning graph is constructed. The initial belief state, the action history, and the planning graph are translated to a satisfiability instance. The solution to the satisfiability problem identifies a relaxed sequence of actions that will make the goal reachable. Rather than directly propagating belief state information over the planning graph, as with the labeled planning graph, this technique allows the satisfiability solver to propagate belief state information from the initial belief state.

A further extension of the LUG has been used to develop the state agnostic planning graph (SAG) (Cushing and Bryce 2005). We previously mentioned that planning graphs are single source and multiple destination structures. The SAG is a multiple source planning graph, which like the LUG represents multiple planning graphs implicitly. The SAG represents every planning graph that progression search will require, so heuristic evaluation at every search node amortizes the cost of constructing the SAG.

The GPT planner (Bonet and Geffner 2000b) and the work of Rintanen (2004) offer alternative reachability heuristics for nondeterministic planning. GPT relies on a relaxation of nondeterministic planning to full observability. For instance, GPT measures the heuristic for a belief state to reach a goal by finding the optimal deterministic plan for each state in the belief state. The maximum cost deterministic plan among the states in the belief state is a lower bound on the cost of a nondeterministic plan. Rintanen (2004) generalizes the GPT heuristic by finding the optimal nondeterministic plan for every size n subset of states in a belief state. The maximum cost nondeterministic plan for a size n set of states is a better lower bound on the optimal nondeterministic plan for the belief state.

It is possible to use the conformant planning heuristics, described above, to guide search for conditional (sensory) plans. Conditional plans branch based on execution time feedback because of negative interactions. If, for example, the sample action will break the sampling tool if no soil sample is available (because the ground is otherwise rocky), then the rover must sense before sampling. Breaking the sampling tool will introduce a negative interaction that prevents finding a conformant plan. Creating a plan branch for whether the location is rocky can avoid breaking the tool (and the negative interaction it introduces). Since many heuristics work by ignoring negative interactions, conformant planning heuristics will work by ignoring the observations and the negative interactions they remove. This is the approach taken by Bryce, Kambhampati, and Smith (2006a) and Brafman and Hoffmann (2005) to a reasonable degree of success. The reason using conformant relaxed plans is not overly detrimental is that while search may need to generate several conditional plan branches, the total search effort over all such branches is related to the number of actions in the relaxed plan. The conformant relaxed plan simply abstains from separating the chosen actions into conditional branches. The benefit of further incorporating negative interactions and observations into conditional planning heuristics is currently an open issue.

Stochastic Planning

Stochastic planning characterizes a belief state with a probability distribution over states and the uncertain outcomes of an action with a probability distribution. Consider an extension of the conformant version of the rover example to contain probability and uncertain actions, in figure 15. The sample and commun actions have an uncertain outcome: they have the desired effect or no effect at all. In the general case, there may be multiple different outcomes. The initial state is uncertain, as before, but is now characterized by a probability distribution. Plans for this formulation will require a number of repetitions of certain actions in order to raise the probability of satisfying the goal (which is required to be at least 0.5 in the problem). For example, executing sample(soil, alpha), followed by commun(soil), will result in only one state in the belief state satisfying the goal comm(soil) (its probability is 0.4(0.9)(0.8) = 0.29). However, executing an


additional commun(soil) raises the probability of the state where comm(soil) holds to 0.4(0.9)(0.8 + 0.8(0.2)) = 0.35. By only performing actions at alpha, the maximum probability of satisfying the goal is 0.4, so the plan should also perform some sampling at beta.
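These numbers are products over independent outcomes, as a few lines of Python confirm (the probabilities are copied from figure 15):

p_avail = 0.4   # avail(soil, alpha) in the initial belief state
p_sample = 0.9  # sample(soil, alpha) succeeds
p_comm = 0.8    # commun(soil) succeeds

one_comm = p_avail * p_sample * p_comm                   # one commun(soil)
two_comm = p_avail * p_sample * (1 - (1 - p_comm) ** 2)  # two attempts
print(round(one_comm, 2), round(two_comm, 2))            # 0.29 0.35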

Exact LUG Representation

It is possible to extend the LUG to handle probabilities by associating probabilities with each label. However, handling uncertain actions, whether nondeterministic or stochastic, is troublesome. With deterministic actions, labels capture uncertainty only about the source belief state, and the size of the labels is bounded (there are only so many states in the belief state). With uncertain actions, labels must capture uncertainty about the belief state and each uncertain action at each level of the planning graph because every execution of an action may have a different result. In the LUG with uncertain actions, labels can get quite large and costly to propagate for the purpose of heuristics.

Monte Carlo in Planning Graphs

One does not need to propagate labels for every uncertain action outcome and possible state because, as seen in the previous section, there is usually considerable positive interaction in the LUG. In order to include an action in the relaxed plan, one need not know every uncertain state or action that causally supports the action, just the most probable. Thus, it becomes possible to sample which uncertain states and action outcomes enter the planning graph. While it is possible to use a single (unlabeled) planning graph by sampling only one state and outcome of each action, using several samples better captures the probability distributions over states and action outcomes encountered in a conformant plan. Each sample can be thought of as a deterministic planning graph.

The Monte Carlo LUG (McLUG) represents every planning graph sample simultaneously using the labeling technique developed in the LUG. Figure 16 depicts a McLUG for the initial belief state of the example. There are four samples (particles), denoted by the circles and square above each action and proposition. Each effect edge is labeled by particles (sometimes with fewer particles than the associated action). The McLUG can be thought of as sequential Monte Carlo (SMC) (Doucet, de Freitas, and Gordon 2001)


(define (problem rovers_stochastic1)
  (:domain rovers_stochastic)
  (:objects soil image rock - data
            alpha beta gamma - location)
  (:init (at alpha)
         (probabilistic 0.4 (avail soil alpha)
                        0.5 (avail soil beta)
                        0.1 (avail soil gamma)))
  (:goal (comm soil) 0.5))

(define (domain rovers_stochastic)
  (:requirements :strips :typing)
  (:types location data)
  (:predicates (at ?x - location)
               (avail ?d - data ?x - location)
               (comm ?d - data)
               (have ?d - data))
  (:action drive
    :parameters (?x ?y - location)
    :precondition (at ?x)
    :effect (and (at ?y) (not (at ?x))))
  (:action commun
    :parameters (?d - data)
    :precondition (have ?d)
    :effect (probabilistic 0.8 (comm ?d)))
  (:action sample
    :parameters (?d - data ?x - location)
    :precondition (at ?x)
    :effect (when (avail ?d ?x)
              (probabilistic 0.9 (have ?d)))))

Figure 15. PDDL Description of Stochastic Planning Formulation of the Rover Problem.


in the relaxed planning space. SMC is a technique used in particle filters for approximating the posterior probability of a random variable that changes over time. The idea is to draw several samples from a prior distribution (a belief state) and then simulate each sample through a transition function (an action layer). Where particle filters use observations to weight particles, the McLUG does not.

Returning to figure 16, there are four particles, the first two sampling the first state in the belief state, and the latter two sampling the second state. The third state, whose probability is 0.1, is not sampled, thus avail(soil, gamma) does not appear in P0. Every action that is supported by P0 is added to A0. In A0, sample(soil, alpha) is labeled by two particles, but its effect is labeled by only one. This is because each particle labeling an action samples an outcome of the action. It happens that only one of the particles labeling sample(soil, alpha) samples the outcome with an effect; the other particle samples the outcome with no effect. In each action level, the McLUG must resample which particles label each effect because each execution of an action can have a different outcome. Propagation continues until the proportion of particles labeling the goal is no less than the goal probability threshold. In P2, comm(soil) is labeled with one particle, indicating its probability is approximately 1/4, which is less than the threshold 0.5. In P3, comm(soil) is labeled with three particles, indicating its probability is approximately 3/4, which is greater than 0.5. It is possible to extract a labeled relaxed plan (described in the previous section) to support the goal for the three particles that label it. The labeled relaxed plan contains commun(soil) twice, reflecting the need to repeat actions that may fail to give their desired effect. The commun(soil) action is used twice because in A2 the black particle does not sample the desired outcome. The black particle must support



Figure 16. Monte Carlo Labeled Uncertainty Graph.


comm(soil) through persistence from the previous level, where it was supported by the desired outcome of commun(soil). A plan using the actions identified by the relaxed plan could satisfy the goal with probability (0.4(0.9) + 0.5(0.9))(0.8 + 0.2(0.8)) = 0.78, which exceeds the probability threshold. Notice also that 0.78 is close to the fraction of particles (3/4) reaching the goal in the McLUG.
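The sampling steps themselves are simple; the fragment below (our own Python sketch, not the McLUG implementation, which tracks the sampled particles symbolically as labels) shows how particles sample initial states and action outcomes, and how the termination test works:

import random

def sample_particles(distribution, n):
    # each particle samples an initial state from the belief state's
    # distribution, given as (state, probability) pairs
    states, probs = zip(*distribution)
    return [random.choices(states, weights=probs)[0] for _ in range(n)]

def sample_outcome(p_success):
    # each particle labeling a probabilistic action independently
    # resamples an outcome at every level; only particles sampling
    # the successful outcome label the effect
    return random.random() < p_success

def threshold_met(n_goal_particles, n_particles, tau):
    # propagation stops once the fraction of particles labeling the
    # goal reaches the plan's probability threshold
    return n_goal_particles / n_particles >= tau

# four particles over the rover's initial distribution
dist = [(("at(alpha)", "avail(soil,alpha)"), 0.4),
        (("at(alpha)", "avail(soil,beta)"), 0.5),
        (("at(alpha)", "avail(soil,gamma)"), 0.1)]
particles = sample_particles(dist, 4)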

Related Work

The discussion of the McLUG is based on the work of Bryce, Kambhampati, and Smith (2006b).

The work on Conformant FF extends to the probabilistic setting in Probabilistic FF (Domshlak and Hoffmann 2006). Where the conformant planner relied on satisfiability, Probabilistic FF uses weighted model counting on CNFs. The relaxed plan heuristic is found in a similar fashion, using a weighted version of the planning graph.

Where the McLUG uses labels and simulation to estimate the probability of reaching the goals, it is possible to directly propagate probability on the planning graph. Propagating a probability for each proposition in order to estimate the probability of a set of propositions greatly underestimates the probability of the set. The technique developed by Bryce and Smith (2006) propagates a binary interaction factor I(a, b) that measures the positive or negative interaction between two propositions, actions, or effects. After computing I(a, b) = Pr(a ∧ b) / (Pr(a)Pr(b)), having I(a, b) = 0 means that a and b are mutex, having I(a, b) = 1 means that a and b are independent, having 0 < I(a, b) < 1 means that a and b negatively interact, and having 1 < I(a, b) means that a and b positively interact. Consequently, the measure of interaction can be seen as a continuous mutex. Propagating interaction can benefit nonprobabilistic planning also, for example, by defining it in terms of cost: I(a, b) = c(a, b) − (c(a) + c(b)).
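Computing the factor is a one-liner once the probabilities are in hand; the numbers in this Python sketch are made up purely to show the regimes:

def interaction(p_a, p_b, p_ab):
    # I(a,b) = Pr(a and b) / (Pr(a) * Pr(b)):
    # 0 -> mutex, (0,1) -> negative interaction,
    # 1 -> independent, >1 -> positive interaction
    return p_ab / (p_a * p_b)

print(interaction(0.5, 0.5, 0.0))   # 0.0: mutex
print(interaction(0.5, 0.5, 0.25))  # 1.0: independent
print(interaction(0.5, 0.5, 0.4))   # 1.6: positive interaction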

Hybrid Planning Graphs

In many of the previous sections we largely concentrated on how to adapt planning graphs to handle a single additional feature over and above classical planning. In this section we discuss a few works that have combined these methods to handle more expressive planning problems. The problems are metric-temporal planning, partial satisfaction planning with resources, temporal planning with uncertain actions, and cost-based nondeterministic planning.

Metric-Temporal Planning

In metric-temporal planning, the problem is multiobjective because the planner must find a plan with low makespan using nonuniform cost actions. For instance, the rover may drive at two different speeds, the faster speed costing more than the slower. The rover can drive fast and achieve the goals very quickly, but at high cost.

Recall from the "Temporal Planning" section that the measure of makespan is identical with level-based and relaxed plan heuristics. In metric-temporal planning, level-based heuristics measure makespan and relaxed plans measure plan cost. To obtain a high-quality relaxed plan when the actions have nonuniform cost, a planning graph can propagate cost functions over time (instead of levels) (Do and Kambhampati 2003). Instead of updating the cost of propositions or actions at every time point, it updates only when they receive a new supporter or the cost of a supporter drops. Then, instead of supporting the goals when they are first coreachable, the relaxed plan supports them at the time where they have minimum cost. With a heuristic measure of makespan and cost, a user-supplied combination function can help guide search toward plans matching the user's preference.
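A drastically simplified Python sketch of such cost propagation follows (our own, collapsing the time dimension to a single cheapest cost per proposition; Do and Kambhampati (2003) propagate full cost functions over time and support aggregations other than summation):

def relax_costs(costs, actions):
    # costs maps each reached proposition to its cheapest known
    # achievement cost; an action is a (name, pre, adds, cost) tuple;
    # repeat until this returns False to reach a fixpoint
    changed = False
    for name, pre, adds, cost in actions:
        if all(p in costs for p in pre):
            c = cost + sum(costs[p] for p in pre)  # summation aggregation
            for p in adds:
                if c < costs.get(p, float("inf")):
                    costs[p] = c  # update only when the cost drops
                    changed = True
    return changed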

Partial Satisfaction with Resources

In the "Planning with Resources" section we mentioned planning for different degrees of satisfaction for numeric goals (Benton, Do, and Kambhampati 2005). The techniques for selecting goals for a reformulated planning problem can be adapted to numeric goals. First, the planning graphs need to propagate cost functions for resources, and second, the relaxed plan must decide at what resource level to satisfy the (numeric) resource goals to maximize net benefit.

Since the planning graph propagates resources in terms of upper and lower bounds, it is easy to capture the reachable values of a resource. However, the cost of each value of a resource is much more difficult to propagate. An exact cost function for all resource values in the reachable interval can be nonlinear in general. The approach taken by Benton, Do, and Kambhampati (2005) is to track a cost for only the upper and lower bounds of the resource. The bound that enables an action is used to define the cost of supporting the action. In order to decide at which level to satisfy a numeric goal, the relaxed plan can use the propagated cost of the resource bounds or relaxed plans to find the maximal net benefit value.


Temporal Planning with Uncertainty

The Prottle planner (Little, Aberdeen, and Thiebaux 2005) addresses temporal planning with uncertain actions whose durations are also uncertain. Unlike the techniques for partially observable problems described in the nondeterministic planning section, Prottle finds conditional plans for fully observable problems (so there is no need to reason about belief states). Prottle uses a planning graph to provide a lower bound on the probability of achieving the goal. The idea is to include an extra layer at each level to capture the uncertain outcomes of actions. Prottle back-propagates the probability of reaching each goal proposition on the planning graph to each other proposition. To define the heuristic merit of each state, Prottle uses the aggregate probability that each proposition reaches the goal propositions. By using back-propagation, Prottle is able to avoid building a planning graph for each search node, but it must be able to identify which planning graph elements to use in the heuristic computation.

Cost-Based Conditional Planning

In our previous work (Bryce and Kambhampati 2005), we developed a cost-based version of the LUG, called the CLUG. The CLUG propagates costs on the LUG to extract labeled relaxed plans that are aware of action costs. The key insight of the CLUG is to not propagate cost for every one of the implicit planning graphs (of which there may be an exponential number). Rather, planning graphs are grouped together, and the propositions and actions they have in common are assigned the same cost. With costs, it is possible to extract labeled relaxed plans that bias the selection of actions.

Conclusion and Discussion

We have presented the foundations for using planning graphs to derive reachability heuristics for planning. These techniques have enabled many planning systems with impressive scalability. We have also shown several extensions to the classical planning model that rely on many of the same fundamental methods.

It is instructive to understand the broad reasons why planning graph heuristics have proven to be so widely useful. Reachability heuristics based on planning graphs are useful because they are forgiving, they can propagate multiple types of information, they support phased relaxation, they synergize with other planner features, and above all they are versatile. By forgiving, we mean that we can pick and choose the features to compute (for example, mutexes). As we saw, the planning graph can propagate all types of information, such as levels, subgoal interactions, time, cost, and belief support. Phased relaxation allows us to ignore problem features to get an initial heuristic, which we later adjust to bring back the ignored features. Planning graphs synergize with search by influencing pruning strategies (for example, helpful actions) and choosing objectives (as in partial satisfaction planning).

Planning graphs are versatile because of their many construction algorithms, the different information propagated on them, the types of problems they can solve, and the types of planners that employ them. We can construct serial, parallel, temporal, or labeled planning graphs. We can propagate level information, mutexes, costs, and labels. We can (as of now) solve classical, resource, temporal, conformant, conditional, and stochastic planning problems. They can be used in regression, progression, partial order, and GraphPlan-style planners. To this day we are finding new and interesting ways to use planning graphs.

As researchers continue to tackle more and more expressive planning models and problems, we have no doubt that reachability heuristics will continue to evolve to support search in those scenarios. In the near term, we expect to see increased attention to the development of reachability heuristics for hybrid planning models that simultaneously relax sets of classical planning assumptions, as well as models that support first-order (and lifted) action and goal languages. We hope that this survey will fuel this activity by unifying several views toward planning graph heuristics. As the scalability of modern planners improves through reachability heuristics, we are optimistic about the range of real-world problems that can be addressed through automated planning.

Acknowledgements

This work was supported in part by NSF grant IIS-0308139, by ONR grant N000140610058, by the DARPA Integrated Learning Initiative (through a contract from Lockheed Martin), the ARCS Foundation, and an IBM faculty award. We would like to thank the current and former members of the Yochan planning group, including Xuanlong Nguyen, Romeo Sanchez Nigenda, Terry Zimmerman, Minh B. Do, J. Benton, Will Cushing, and Menkes Van Den Briel, for numerous helpful comments on this article as well as for developing many of the discussed techniques. Outside of Yochan, we thank David E. Smith and Dan Weld for


many discussions on planning graph heuristics. We would also like to thank Malte Helmert, Sungwook Yoon, Garrett Wolf, and the reviewers at AI Magazine for comments on earlier drafts of this article. This article is based in part on an invited talk at ICAPS 2003 titled "1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit" and on a more recent tutorial given at ICAPS 2006.

Notes

1. Since this article is meant to be tutorial in nature, we have also prepared a set of slides to aid the presentation of this material. The slides are available at rakaposhi.eas.asu.edu/pg-tutorial.

2. In the following, we will refer to the alpha, beta, and gamma locations in the text, but use the related symbols α, β, and γ to simplify the illustrations.

3. As we will see, the number of states can also be infinite (for example, when planning with resources).

4. It is common to remove static propositions from state and action descriptions because they do not change value.

5. The reason that actions do not contribute their negative effects to proposition layers (which contain only positive propositions) is a syntactic convenience of using STRIPS. Since action preconditions and the goal are defined only by positive propositions, it is not necessary to reason about reachable negative propositions. In general, action languages such as ADL (Pednault 1994) allow negative propositions in preconditions and goals, requiring the planning graph to maintain "literal" layers that record all the reachable values of propositions (Koehler et al. 1997).

6. Since action layers contain noop actions, technically speaking, mutexes can also exist between actions and propositions (through the associated noop actions), but mutexes are marked only between elements of the same layer.

7. With multiple objectives, it is necessary to find the Pareto set of nondominated plans.

8. We will see that multiple supporters are also needed for relaxed plans when there is uncertainty (see the "Nondeterministic Planning" section that follows).

9. Or, while we did not model this in our example, it is possible for the rover to warm up instruments used for sampling while driving to a rock.

References

Backstrom, C., and Nebel, B. 1995. Complexity Results for SAS+ Planning. Computational Intelligence 11(4): 625–655.

Benton, J.; Do, M. B.; and Kambhampati, S. 2005. Over-Subscription Planning with Numeric Goals. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 1207–1213. Denver, CO: Professional Book Center.

Blum, A., and Furst, M. L. 1995. Fast Planning through Planning Graph Analysis. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1636–1642. San Francisco: Morgan Kaufmann Publishers.

Boddy, M.; Gohde, J.; Haigh, T.; and Harp, S. 2005. Course of Action Generation for Cyber Security Using Classical Planning. In Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling (ICAPS), 12–21. Menlo Park, CA: AAAI Press.

Bonet, B., and Geffner, H. 1999. Planning as Heuristic Search: New Results. In Proceedings of the 5th European Conference on Planning, 360–372. Berlin: Springer-Verlag.

Bonet, B., and Geffner, H. 2000a. Planning as Heuristic Search. Artificial Intelligence 129(1–2): 5–33.

Bonet, B., and Geffner, H. 2000b. Planning with Incomplete Information as Heuristic Search in Belief Space. In Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, 52–61. Menlo Park, CA: AAAI Press.

Brafman, R., and Hoffmann, J. 2005. Contingent Planning Via Heuristic Forward Search with Implicit Belief States. In Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling, 71–80. Menlo Park, CA: AAAI Press.

Bryce, D., and Kambhampati, S. 2004. Heuristic Guidance Measures for Conformant Planning. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 365–375. Menlo Park, CA: AAAI Press.

Bryce, D., and Kambhampati, S. 2005. Cost Sensitive Reachability Heuristics for Handling State Uncertainty. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, 60–68. Eugene, OR: Association for Uncertainty in Artificial Intelligence.

Bryce, D., and Smith, D. 2006. Using Interaction to Compute Better Probability Estimates in Plan Graphs. Paper presented at the Sixteenth International Conference on Automated Planning and Scheduling Workshop on Planning Under Uncertainty and Execution Control for Autonomous Systems, 6–10 June, The English Lake District, Cumbria, UK.

Bryce, D.; Kambhampati, S.; and Smith, D. 2006a. Planning Graph Heuristics for Belief Space Search. Journal of Artificial Intelligence Research 26: 35–99.

Bryce, D.; Kambhampati, S.; and Smith, D. 2006b. Sequential Monte Carlo in Probabilistic Planning Reachability Heuristics. In Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, 233–242. Menlo Park, CA: AAAI Press.

Cayrol, M.; Regnier, P.; and Vidal, V. 2000. New Results about LCGP, A Least Committed Graphplan. In Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, 273–282. Menlo Park, CA: AAAI Press.

Cushing, W., and Bryce, D. 2005. State Agnostic Planning Graphs and the Application to Belief-Space Planning. In Proceedings of the Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, 1131–1138. Menlo Park, CA: AAAI Press.


Do, M., and Kambhampati, S. 2003. Sapa: A Scalable Multi-Objective Metric Temporal Planner. Journal of Artificial Intelligence Research 20: 155–194.

Do, M., and Kambhampati, S. 2004. Partial Satisfaction (Over-Subscription) Planning as Heuristic Search. Paper presented at the Fifth International Conference on Knowledge Based Computer Systems, Hyderabad, India, 19–22 December.

Do, M. B.; Benton, J.; and Kambhampati, S. 2006. Planning with Goal Utility Dependencies. Paper presented at the Sixteenth International Conference on Automated Planning and Scheduling Workshop on Preferences and Soft Constraints, The English Lake District, Cumbria, United Kingdom, 6–10 June.

Domshlak, C., and Hoffmann, J. 2006. Fast Probabilistic Planning through Weighted Model Counting. In Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, 243–251. Menlo Park, CA: AAAI Press.

Doucet, A.; de Freitas, N.; and Gordon, N. 2001. Sequential Monte Carlo Methods in Practice. New York: Springer.

Fikes, R., and Nilsson, N. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. In Proceedings of the Second International Joint Conference on Artificial Intelligence, 608–620. Los Altos, CA: William Kaufmann, Inc.

Garrido, A.; Onaindia, E.; and Barber, F. 2001. A Temporal Planning System for Time-Optimal Planning. In Proceedings of the Tenth Portuguese Conference on Artificial Intelligence, 379–392. Berlin: Springer.

Gerevini, A.; Bonet, B.; and Givan, R. 2006. The Fifth International Planning Competition. The English Lake District, Cumbria, United Kingdom.

Gerevini, A.; Saetti, A.; and Serina, I. 2003. Planning through Stochastic Local Search and Temporal Action Graphs in LPG. Journal of Artificial Intelligence Research 20: 239–290.

Gerevini, A.; Saetti, A.; and Serina, I. 2005. Integrating Planning and Temporal Reasoning for Domains with Durations and Time Windows. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 1226–1231. Denver, CO: Professional Book Center.

Ghallab, M., and Laruelle, H. 1994. Representation and Control in IXTET, A Temporal Planner. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, 61–67. Menlo Park, CA: AAAI Press.

Ghallab, M.; Nau, D.; and Traverso, P. 2004. Automated Planning: Theory and Practice. San Francisco: Morgan Kaufmann.

Haslum, P., and Geffner, H. 2000. Admissible Heuristics for Optimal Planning. In Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, 140–149. Menlo Park, CA: AAAI Press.

Helmert, M. 2004. A Planning Heuristic Based on Causal Graph Analysis. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 161–170. Menlo Park, CA: AAAI Press.

Hoffmann, J., and Brafman, R. 2004. Conformant Planning Via Heuristic Forward Search: A New Approach. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 355–364. Menlo Park, CA: AAAI Press.

Hoffmann, J., and Geffner, H. 2003. Branching Matters: Alternative Branching in Graphplan. In Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling, 22–31. Menlo Park, CA: AAAI Press.

Hoffmann, J., and Nebel, B. 2001. The FF Planning System: Fast Plan Generation through Heuristic Search. Journal of Artificial Intelligence Research 14: 253–302.

Hoffmann, J. 2003. The Metric-FF Planning System: Translating "Ignoring Delete Lists" to Numeric State Variables. Journal of Artificial Intelligence Research 20: 291–341.

Kambhampati, S., and Sanchez, R. 2000. Distance Based Goal Ordering Heuristics for Graphplan. In Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, 315–322. Menlo Park, CA: AAAI Press.

Kambhampati, S.; Lambrecht, E.; and Parker, E. 1997. Understanding and Extending Graphplan. In Proceedings of the Fourth European Conference on Planning, 260–272. Berlin: Springer-Verlag.

Koehler, J.; Nebel, B.; Hoffmann, J.; and Dimopoulos, Y. 1997. Extending Planning Graphs to an ADL Subset. In Proceedings of the Fourth European Conference on Planning, 273–285. Berlin: Springer-Verlag.

Little, I.; Aberdeen, D.; and Thiebaux, S. 2005. Prottle: A Probabilistic Temporal Planner. In Proceedings of the Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, 1181–1186. Menlo Park, CA: AAAI Press.

Long, D., and Fox, M. 1999. Efficient Implementation of the Plan Graph in Stan. Journal of Artificial Intelligence Research 10: 87–115.

Long, D., and Fox, M. 2003. Exploiting a Graphplan Framework in Temporal Planning. In Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling, 52–61. Menlo Park, CA: AAAI Press.

McDermott, D. V. 1996. A Heuristic Estimator for Means-Ends Analysis in Planning. In Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, 142–149. Menlo Park, CA: AAAI Press.

McDermott, D. 1998. PDDL—The Planning Domain Definition Language. Technical Report CVC TR-98-003 / DCS TR-1165, Yale Center for Computational Vision and Control, New Haven, CT.

McDermott, D. 1999. Using Regression-Match Graphs to Control Search in Planning. Artificial Intelligence 109(1–2): 111–159.

Nebel, B. 2000. On the Compilability and Expressive Power of Propositional Planning Formalisms. Journal of Artificial Intelligence Research 12: 271–315.

Nguyen, X., and Kambhampati, S. 2000. Extracting Effective and Admissible State Space Heuristics from the Planning Graph. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, 798–805. Menlo Park, CA: AAAI Press.


Nguyen, X., and Kambhampati, S. 2001. Reviving Partial Order Planning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 459–466. San Francisco: Morgan Kaufmann Publishers.

Nguyen, X.; Kambhampati, S.; and Nigenda, R. 2002. Planning Graph as the Basis for Deriving Heuristics for Plan Synthesis by State Space and CSP Search. Artificial Intelligence 135(1–2): 73–123.

Pednault, E. P. D. 1994. ADL and the State-Transition Model of Action. Journal of Logic and Computation 4(5): 467–512.

Refanidis, I., and Vlahavas, I. 2001. The GRT Planning System: Backward Heuristic Construction in Forward State-Space Planning. Journal of Artificial Intelligence Research 15: 115–161.

Rintanen, J. 2004. Distance Estimates for Planning in the Discrete Belief Space. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, 525–530. Menlo Park, CA: AAAI Press.

Ruml, W.; Do, M. B.; and Fromherz, M. P. J. 2005. On-Line Planning and Scheduling for High-Speed Manufacturing. In Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling, 30–39. Menlo Park, CA: AAAI Press.

Sanchez, R., and Kambhampati, S. 2005. Planning Graph Heuristics for Selecting Objectives in Over-Subscription Planning Problems. In Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling, 192–201. Menlo Park, CA: AAAI Press.

Sanchez, J., and Mali, A. D. 2003. S-Mep: A Planner for Numeric Goals. In Proceedings of the Fifteenth IEEE International Conference on Tools with Artificial Intelligence, 274–283. Los Alamitos, CA: IEEE Computer Society.

Smith, D. E., and Weld, D. S. 1999. Temporal Planning with Mutual Exclusion Reasoning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 326–337. San Francisco: Morgan Kaufmann.

Smith, D. E. 2003. The Mystery Talk. Invited talk at the Second International Planning Summer School, 16–22 September.

Smith, D. E. 2004. Choosing Objectives in Over-Subscription Planning. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 393–401. Menlo Park, CA: AAAI Press.

Van Den Briel, M.; Nigenda, R. S.; Do, M. B.; and Kambhampati, S. 2004. Effective Approaches for Partial Satisfaction (Over-Subscription) Planning. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, 562–569. Menlo Park, CA: AAAI Press.

Vidal, V. 2004. A Lookahead Strategy for Heuristic Search Planning. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, 150–160. Menlo Park, CA: AAAI Press.

Yoon, S.; Fern, A.; and Givan, R. 2006. Learning Heuristic Functions from Relaxed Plans. In Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, 162–171. Menlo Park, CA: AAAI Press.

Younes, H., and Simmons, R. 2003. VHPOP: Versatile Heuristic Partial Order Planner. Journal of Artificial Intelligence Research 20: 405–430.

Zimmerman, T., and Kambhampati, S. 2005. Using Memory to Transform Search on the Planning Graph. Journal of Artificial Intelligence Research 23: 533–585.

Daniel Bryce is a Ph.D. candidate in the computer science department at Arizona State University, where he is a member of the Yochan research group. His current research interests include planning graph heuristics, planning and execution under uncertainty, and applications to computational systems biology. He earned a B.S. from Arizona State University in 2001 and anticipates earning his Ph.D. in 2007. His email address is [email protected].

Subbarao Kambhampati is a professor of computer science at Arizona State University. His group saw the connection between planning graphs and reachability analysis in 1998 and subsequently developed several scalable planners, including AltAlt, AltAlt-psp, SAPA, SAPA-ps, PEGG, CAltAlt, and POND. He is a 1994 NSF Young Investigator, a 2004 IBM faculty fellow, an AAAI 2005 program cochair, a JAIR associate editor, and a fellow of AAAI (elected for his contributions to automated planning). He received the 2002 College of Engineering teaching excellence award. He gave an invited talk at ICAPS-03 entitled "1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit." The current tutorial builds on that talk and discusses several new areas in which reachability heuristics have been found to be useful.
