
Planning with Resource Conflicts in Human-Robot Cohabitation

Tathagata Chakraborti¹, Yu Zhang¹, David E. Smith², Subbarao Kambhampati¹
¹Department of Computer Science, Arizona State University, AZ

²Autonomous Systems & Robotics, Intelligent Systems Division, NASA Ames Research Center, CA
[email protected], [email protected], [email protected], [email protected]

ABSTRACT

In order to be acceptable members of future human-robot ecosystems, it is necessary for autonomous agents to be respectful of the intentions of humans cohabiting a workspace and account for conflicts on shared resources in the environment. In this paper we build an integrated system that demonstrates how maintaining predictive models of its human colleagues can inform the planning process of the robotic agent. We propose an Integer Programming based planner as a general formulation of this flavor of "human-aware" planning and show how the proposed formulation can be used to produce different behaviors of the robotic agent, showcasing compromise, opportunism or negotiation. Finally, we investigate how the proposed approach scales with the different parameters involved, and provide empirical evaluations to illustrate the pros and cons associated with the proposed style of planning.

1. INTRODUCTION

In environments where multiple agents are working independently but utilizing shared resources, it is important for these agents to model the intentions and beliefs of other agents so as to act intelligently and prevent conflicts. In cases where some of these agents are human, as with assistive robots in household environments, these are required (rather than just desired) capabilities of robots in order for them to be considered "socially acceptable" - this has been one of the important objectives of "human-aware" planning, as evident from existing literature in human-aware path planning [13, 10] and human-aware task planning [5, 9, 3, 15]. An interesting aspect of many of these scenarios is that they exhibit many features of multi-agent environments while lacking the typical assumptions made in explicit human-robot teaming scenarios, as pointed out in [4]. Probabilistic plan recognition plays an important role in this regard: by not committing to a plan that presumes a particular plan for the other agent, it might be possible to minimize suboptimal behavior of the autonomous agent (in terms of redundant or conflicting actions performed during the execution phase).

Here we look at possible ways to minimize such suboptimal behavior by way of compromise, opportunism or negotiation. Specifically, we ask what information can be extracted from the predicted plans, and how this information can be used to guide the behavior of the autonomous agent. There has been previous work [1, 6] on some of the modeling aspects of the problem, in terms of planning with uncertainty in resources and constraints. In this paper we provide an integrated framework (shown in Figure 1) for achieving these behaviors of the autonomous agent, particularly in the context of stigmergic coordination in human-robot cohabitation. To this end, we modularize our architecture so as to handle the uncertainty in the environment separately from the planning process, and show how these individual modules interact with each other by way of usage profiles of the concerned resources.

Appears in: Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), J. Thangarajah, K. Tuyls, C. Jonker, S. Marsella (eds.), May 9-13, 2016, Singapore. Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Figure 1: Schematic diagram of our integrated system for belief modeling, goal recognition, information extraction and planning. The robot maintains a belief model of the environment, and uses observations from the environment to extract information about how the world may evolve, which is then used to drive its own planning process.


The general architecture of the system is shown in Figure 1. The autonomous agent, or the robot, is acting (with independent goals) in an environment cohabited with other agents (humans), who are similarly self-interested. The robot has a model of the other agents acting independently in its environment. These models may be partial and hence the robot can only make uncertain predictions on how the world will evolve with time. However, the resources in the environment are limited and are likely to be constrained by the plans of the other agents. The robot thus needs to reason about the future states of the environment in order to make sure that its own plans do not produce conflicts with respect to the plans of the other agents.

With the involvement of humans, however, the problem is more skewed against the robot, because humans would expect a higher priority on their plans - robots that produce plans that clash with those of the humans, without any explanation, would be considered incompatible with such an ecosystem. Thus the robot is expected to follow plans that preserve the human plans, rather than follow a globally optimal plan for itself. This aspect makes the current setting distinct from typical human-robot teaming scenarios and produces a number of its own interesting challenges. How does the robot model the human's behavior? How does it plan to avoid friction with the human plans? If it is possible to communicate, how does it plan to negotiate and refine plans? These are the questions that we seek to address in this work. Our approach models human beliefs and defines resource profiles as abstract representations of the plans predicted on the basis of these beliefs. The robot updates its beliefs upon receiving new observations, and passes the resultant profiles on to its planner, which uses an IP formulation to minimize the overlap between these resource profiles and those produced by the human's plans.

The contribution of our paper is thus three-fold: we (1) propose resource profiles as a concise mode of representing different types of information from predicted plans; (2) develop an IP-based planner that can utilize this information and provide different modalities of conformant behavior; and (3) provide an integrated framework that supports the proposed mode of planning - the modular approach also provides an elegant way to handle different challenges separately (e.g. changes in how uncertainty or nested beliefs of humans are handled leave the planner unchanged). The planner, as a consequence, has properties not present in existing planners - for example, the work that probably comes closest is [9], which models a specific case of compromise only, and whose formulation is also likely to blow up in the presence of large hypothesis sets due to the absence of concise representation techniques like the profiles. We will discuss the trade-offs and design choices in more detail in the evaluation sections.

The rest of the paper is organized as follows. We start with a brief introduction of the agent models that comprise the belief component, and describe how it facilitates plan recognition. Then, in Sections 2.3 and 2.4, we go into the details of how resource profiles may be used to represent information from predicted plans, and describe how our planner converts this information into constraints that can be solved as an integer program during the plan generation process. In Section 3 we demonstrate how the planner may be used to produce different modes of autonomous behavior. Finally, in Section 4, we provide empirical evaluations of the planner's internal properties.

    Figure 2: Use case - Urban Search And Rescue (USAR).

2. PLANNING WITH CONFLICTS ON SHARED RESOURCES

We will now go into details about each of the modules shown in Figure 1. The setting (adopted from [14]) involves a commander CommX and a robot in a typical USAR (Urban Search and Rescue) task illustrated in Figure 2. The commander can perform triage in certain locations, for which he needs the medkit. The robot can also fetch medkits if requested by other agents (not shown) in the environment. The shared resources here are the two medkits - some of the plans the commander can execute will lock the use of and/or change the position of these medkits, so that from the set of probable plans of the commander we can extract a probability distribution over the usage (or even the position) of the medkit over time, based on the fraction of plans that conform to these facts. These resource availability profiles (i.e. the distribution over the usage or position of the medkit evolving over time) provide a way for the agents to minimize conflicts with the other agents. Before going into details about the planner that achieves this, we will first look at how the agents are modeled and how these profiles are computed in the next section.

2.1 The Belief Modeling Component

The notion of modeling beliefs introduced by the authors in [14] is adopted in this work. Beliefs about state are defined in terms of predicates bel(α, φ), denoting that agent α believes the proposition φ to be true. Goals are defined by predicates goal(α, φ), denoting that agent α has the goal φ. The set of all beliefs that the robot ascribes to α together represents the robot's perspective of α. This is captured by a belief model Belα of agent α, defined as {φ | bel(α, φ) ∈ Belself}, where Belself are the first-order beliefs of the robot (e.g., bel(self, at(self, room1))). The set of goals ascribed to α is similarly described by {goal(α, φ) | goal(α, φ) ∈ Belself}.

Next, we turn our attention to the domain model Dα of the agent α that is used in the planning process. We use PDDL [11] style agent models for the rest of the discussion, but most of the analysis easily generalizes to other related modes of representation. Formally, a planning problem Π = 〈Dα, πα〉 consists of the domain model Dα and the problem instance πα. The domain model of α is defined as Dα = 〈Tα, Vα, Sα, Aα〉, where Tα is a set of object types; Vα is a set of variables that describe objects that belong to types in Tα; Sα is a set of named first-order logical predicates over the variables Vα that describe the state; and Aα is a set of operators available to the agent. The action models a ∈ Aα are represented as a = 〈N, C, P, E〉, where N denotes the name of that action; C is the cost of that action; P ⊆ Sα is the list of preconditions that must hold for the action a to be applicable in a particular state s ⊆ Sα of the environment; and E = 〈eff+(a), eff−(a)〉, eff±(a) ⊆ Sα, is a tuple that contains the add and delete effects of applying the action to a state. The transition function δ(·) determines the next state after the application of action a in state s: δ(a, s) |= ⊥ if ∃f ∈ P s.t. f ∉ s; δ(a, s) |= (s \ eff−(a)) ∪ eff+(a) otherwise.

The belief model, in conjunction with beliefs about the goals / intentions of another agent, allows the robot to instantiate a planning problem πα = 〈Oα, Iα, Gα〉, where Oα is a set of objects of type t ∈ Tα; Iα is the initial state of the world; and Gα is a set of goals, both of which are sets of predicates from Sα instantiated with objects from Oα. First, the initial state Iα is populated by all of the robot's initial beliefs about the agent α, i.e. Iα = {φ | bel(α, φ) ∈ Belrobot}. Similarly, the goal is set to Gα = {φ | goal(α, φ) ∈ Belrobot}. Finally, the set of objects Oα consists of all the objects that are mentioned in either the initial state or the goal description: Oα = {o | o ∈ φ, φ ∈ Iα ∪ Gα}. The solution to the planning problem is an ordered sequence of actions, or plan, given by πα = 〈a1, a2, . . . , a|πα|〉, ai ∈ Aα, such that δ(πα, Iα) |= Gα, where the cumulative transition function is given by δ(π, s) = δ(〈a2, a3, . . . , a|π|〉, δ(a1, s)). The cost of the plan is given by C(πα) = Σ_{a ∈ πα} Ca, and the optimal plan π*α is such that C(π*α) ≤ C(πα) for all πα with δ(πα, Iα) |= Gα. This planning problem instance (though not directly used in the robot's planning process) enables the goal recognition component to solve the compiled problem instances. More on this in the next section.
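As an illustration, this construction can be sketched in a few lines of Python, assuming a toy tuple-based encoding of the bel/goal predicates (a hypothetical encoding, not the authors' implementation):

def instantiate_problem(bel_robot, alpha):
    """Build the problem instance <O, I, G> that the robot ascribes to alpha.
    bel_robot: set of ('bel' | 'goal', agent, predicate) tuples, where a
    predicate is itself a tuple such as ('at', 'medkit1', 'room2')."""
    I = {phi for (kind, agent, phi) in bel_robot if kind == "bel" and agent == alpha}
    G = {phi for (kind, agent, phi) in bel_robot if kind == "goal" and agent == alpha}
    # the objects are everything mentioned in the initial state or the goal
    O = {arg for phi in I | G for arg in phi[1:]}
    return O, I, G

bel_robot = {
    ("bel", "commX", ("at", "commX", "hall2")),
    ("bel", "commX", ("at", "medkit1", "room2")),
    ("goal", "commX", ("triaged", "room1")),
}
O, I, G = instantiate_problem(bel_robot, "commX")
print(I)   # initial state ascribed to CommX
print(G)   # goal ascribed to CommX: {('triaged', 'room1')}
print(O)   # {'commX', 'hall2', 'medkit1', 'room2', 'room1'}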

2.2 The Goal Recognition Component

For many real world scenarios, it is unlikely that the goals of the humans are known completely, or that the plan computed by the planner is exactly the plan that they will follow. We are only equipped with a belief about the likely goal(s) of the human - and this may not be a full description of their actual goals. Further, in the case of an incompletely specified goal, there might be a set of likely plans that the human can execute, which brings into consideration the idea of incremental goal recognition over a possible goal set given a stream of observations.

2.2.1 Goal Extension

To begin with, it is worth noting that the robot might have to deal with multiple plans even in the presence of completely specified goals (and even if the other agents are fully rational). For example, there may be multiple optimal ways of achieving the same goal, and it is not obvious beforehand which one of these an agent is going to end up following. In the case of incompletely specified goals, the presence of multiple likely plans becomes even more relevant. To accommodate this, we extend the robot's current belief about an agent α's goal, Gα, to a hypothesis goal set Ψα. The computation of this goal set can be done using the planning graph method [2]. In the worst case, Ψα corresponds to all possible goals in the final level of the converged planning graph. Further (domain-dependent) knowledge (e.g., in our scenario, the information that CommX is only interested in triage-related goals) can prune some of these goals by removing the goal conditions that are not typed on the triage variable.

2.2.2 Goal / Plan Recognition

In the present scenario, we thus have a set Ψα of goals that α may be trying to achieve, and observations of the actions α is currently executing. At this point we refer to the work of Ramirez and Geffner, who in [12] provided a technique to compile the problem of plan recognition into a classical planning problem. Given a sequence of observations θ, we recompute the probability distribution Θ over G ∈ Ψα by using a Bayesian update P(G|θ) ∝ P(θ|G), where the likelihood is approximated by the function P(θ|G) = 1 / (1 + e^{−β∆(G,θ)}), with ∆(G, θ) = Cp(G − θ) − Cp(G + θ). Here ∆(G, θ) gives an estimate of the difference in cost Cp of achieving the goal G without and with the observations, thus increasing P(θ|G) for goals that explain the given observations. Thus, solving two compiled planning problems, with goals G − θ and G + θ, gives us the required posterior update for the distribution Θ over the possible goals of α. The details of the approach are available in [12].
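To make the update concrete, the following small Python sketch computes the posterior over the hypothesis goals from the cost differences ∆(G, θ); it assumes a uniform prior over the goals (as implied by the text), and the ∆ values below are made-up placeholders for the costs returned by the two compiled planning problems:

import math

def goal_posterior(delta, beta=1.0):
    """delta: {G: Cp(G - theta) - Cp(G + theta)} for each candidate goal G.
    Returns P(G | theta), taking P(G | theta) proportional to
    P(theta | G) = 1 / (1 + exp(-beta * delta[G]))."""
    likelihood = {G: 1.0 / (1.0 + math.exp(-beta * d)) for G, d in delta.items()}
    Z = sum(likelihood.values())
    return {G: l / Z for G, l in likelihood.items()}

# the observations explain triage in room1 far better than triage in hall3
print(goal_posterior({"triage_room1": 4.0, "triage_hall3": -4.0}))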

The specific problem we will look at now is how to inform the robot's own planning process from the recognized goal set Ψα. In order to do this, we compute the optimal plans for each goal in the hypothesis goal set Ψα, and associate them with the probabilities of these goals from the distribution thus obtained. Information from these plans is then represented concisely in the form of resource profiles.

Notes on the Recognition Module

For our plan recognition module we use a much faster variation [7] of the above approach that exploits cost and interaction information from plan graphs to estimate the goal probabilities. This saves the computational effort of having to solve two planning problems per goal. Also, note that while computing the plan for a particular goal G, we use a compiled problem instance with the goal G + θ to ensure that the predicted plan conforms to the existing observations. Details on the compilation are available in [12].

Also, the output of the planner does not need to be associated with probabilities - this is just the most general formulation. If we want to deal with just a set of plans that the robot needs to be aware of, we can treat the plan set either with a uniform distribution and/or by requiring exactly zero conflicts in the objective of the planner (this will become clearer in Section 2.4), depending on the preference.

Perhaps the biggest computational issue here is the need to compute optimal plans. While we still do so for our domain, as we will note later in Section 2.3, this might not be necessary, and suboptimal plans may be used in larger domains where computation is an issue.

2.3 Resources and Resource Profiles

As we discussed previously, since the plans of the agents are executed in parallel, the uncertainty introduced by the commander's actions cannot be mapped directly between the commander's final state and the robot's initial state. However, given the commander's possible plans, the robot can extract information about at what points of time the shared resources in the environment are likely to be locked by the commander. This information can be represented by resource usage profiles that capture the expected (over all the recognized plans) variation of probability of usage or availability over time. The robot can, in turn, use this information to make sure that the profile imposed by its own plan has minimal conflicts with those of the commander.


Figure 3: Different types of resource profiles.

Formally, a profile is defined as a mapping from a time step to a real number between 0 and 1, and is represented by a set of tuples as follows: G : ℕ → [0, 1] ≡ {(t, g) : t ∈ ℕ, g ∈ [0, 1], such that G(t) = g at time step t}.

The concept of resource profiles can be handled at two levels of abstraction. Going back to our running example, the shared resources that can come under conflict are the two (locatable typed objects) medkits, and the profiles over the medkits can be over both usage and location, as shown in Figure 3. These different types of profiles can be used (possibly in conjunction if needed) for different purposes. For example, just the usage profile shown on top is more helpful in identifying when to use the specific resource, while the resource when bound with location-specific groundings, as shown at the bottom, can lead to more complicated higher order reasoning (e.g. the robot can decide to wait for the commander's plans to be over, as he inadvertently brings the medkit closer to it with high probability as a result of his own plans). We will look at this again in Section 3.

Let the domain model of the robot be DR = 〈TR, VR, SR, AR〉, with the action models a = 〈N, C, P, E〉 defined in the same way as described in Section 2.1. Also, let Λ ⊆ VR be the set of shared resources, for each λ ∈ Λ let fλ ⊆ SR be the set of predicates that are influenced (as determined by the system designer) by λ, and let Γ : Λ → P(ξ) be a function that maps the resource variables to the set of predicates ξ = ∪λ fλ they influence. Without any external knowledge of the environment, we can set Λ = Vα ∩ VR and ξ = Sα ∩ SR, though in most cases these sets are much smaller. In the following discussion, we will look at how the knowledge from the hypothesis goal set can be modeled in terms of resource availability graphs for each of the constrained resources λ ∈ Λ.

Consider the set of plans ΨPα containing the optimal plans corresponding to each goal in the hypothesis goal set, i.e. ΨPα = {π*G = 〈a1, a2, . . . , at〉 | δ(π*G, Iα) |= G, ai ∈ Aα ∀i, G ∈ Ψα}, and let l(π) be the likelihood of the plan π, modeled on the goal likelihood distribution p(G) ∼ Θ, ∀G ∈ Ψα, as l(πG) = c|πG| × p(G), where c is a normalization constant.

At each time step t, a plan π ∈ ΨPα may lock one or more of the resources λ. Each plan thus provides a profile of usage of a resource with respect to the time step t as Gλπ : ℕ → {0, 1} = {(t, g) | t ∈ [1, |π|] and g = 1 if λ is locked by π at step t, 0 otherwise}, such that Gλπ(t) = g ∀(t, g) ∈ Gλπ. The resultant usage profile of a resource λ due to all the plans in ΨPα is obtained by summing over all the individual profiles (weighted by the individual likelihoods) as Gλ : ℕ → [0, 1] = {(t, g) | t ∈ {1, . . . , max_{π ∈ ΨPα} |π|} and g ∝ (1/|ΨPα|) Σ_{π ∈ ΨPα} Gλπ(t) × l(π)}.

Similarly, we can define profiles over the actual groundings of a variable (shown in the lower part of Figure 3) as Gfλπ = {(t, g) | t ∈ [1, |π|] and g = 1 if fλ holds at step t of plan π, 0 otherwise}, and the resultant profile due to all the plans in ΨPα is obtained as before as Gfλ = {(t, g) | t = 1, 2, . . . , max_{π ∈ ΨPα} |π| and g ∝ (1/|ΨPα|) Σ_{π ∈ ΨPα} Gfλπ(t) × l(π)}. These profiles are helpful when actions in the robot's domain are conditioned on these variables, and the values of these variables are conditioned on the plans of the other agents in the environment currently under execution.
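This aggregation is easy to mechanize. Here is a small Python sketch (an assumed encoding, not the authors' code) that folds the per-plan 0/1 locking profiles Gλπ into the resultant profile Gλ, weighting each plan by its likelihood l(π); normalizing by the total likelihood stands in for the proportionality constant above:

def usage_profile(plans):
    """plans: list of (likelihood, locked), where locked[t-1] is 1 if the
    resource is locked by that plan at step t. Returns {t: G_lambda(t)}."""
    horizon = max(len(locked) for _, locked in plans)
    total = sum(l for l, _ in plans) or 1.0
    return {t: sum(l for l, locked in plans
                   if t <= len(locked) and locked[t - 1]) / total
            for t in range(1, horizon + 1)}

# two predicted commander plans: medkit locked at steps 3-5 vs. at steps 2-3
plans = [(0.7, [0, 0, 1, 1, 1]), (0.3, [0, 1, 1])]
print(usage_profile(plans))   # {1: 0.0, 2: 0.3, 3: 1.0, 4: 0.7, 5: 0.7}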

One important aspect of this formulation that should be noted here is that the notion of "resources" is described in terms of the subset of the common predicates in the domains of the agents (ξ ⊆ Sα ∩ SR) and can thus be used as a generalized definition to model different types of conflict between the plans of two agents. In as much as these predicates are descriptions (possibly instantiated) of the typed variables in the domain and actually refer to the physical resources in the environment that might be shared by the agents, we will stick to this nomenclature of calling them "resources". We will now look at how an autonomous agent can use these resource profiles to minimize conflicts during plan execution with other agents in its environment.

Notes on the Usefulness of Profile Computation

One interesting aspect of computing resource profiles is that it provides a powerful interface between the belief of the environment and the planner. On the one hand, note that the input from the previous stage (the goal/plan recognition module) is as generic as possible - a set of plans, possibly associated with probabilities. Any changes in the preceding stages, e.g. modeling stochasticity or more complex belief models, still yield a set of plans that the robot needs to be aware of. Thus the plan set and resource profiles provide a surprisingly simple yet powerful way of abstracting away relevant information for the planner to use.

The profiles may also be leveraged to address different modalities of conformant behavior, for example with multiple humans and their relative importance, by (1) weighing the contributions from individual profiles by the normalized priority of the human, which would cause the planner to avoid conflicts with these profiles more than with those with lower priorities; or (2) requiring zero conflicts on a subset of profiles, which would cause the planner to avoid a subset of conflicts at all costs while minimizing the rest.
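For instance, option (1) amounts to a priority-weighted sum of the individual humans' profiles before they are handed to the planner; a hypothetical sketch (the profiles, agent names and priorities below are made up):

def combine_profiles(profiles, priorities):
    """profiles: {agent: {t: usage probability}}; priorities: {agent: weight}.
    Returns one profile whose contributions are weighted by normalized priority."""
    total = sum(priorities.values())
    horizon = max(t for profile in profiles.values() for t in profile)
    return {t: sum(priorities[a] / total * profiles[a].get(t, 0.0) for a in profiles)
            for t in range(1, horizon + 1)}

profiles = {"comm1": {1: 0.9, 2: 0.9, 3: 0.2}, "comm2": {1: 0.1, 2: 0.6, 3: 0.6}}
print(combine_profiles(profiles, {"comm1": 2.0, "comm2": 1.0}))
# conflicts with comm1 (higher priority) are now penalized more heavily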

A somewhat implicit advantage of using profiles is their ability to form regions of interest given the possible plans. This will become clear later in Section 4.2, when we show that the predicted conflicts provide well-informed guidance for avoiding real conflicts during execution (as evident from the robustness in performance with just 1-3 observations, and zero actual conflicts in low probability areas of the computed profiles). For now, this has the implication that we need not necessarily compute perfect plan costs and goal distributions to get good plans.


2.4 Conflict Minimization

The planning problem of the robot - given by Π = 〈DR, πR, Λ, {Gλ | ∀λ ∈ Λ}, {Gfλ | ∀f ∈ Γ(λ), ∀λ ∈ Λ}〉 - consists of the domain model DR, the problem instance πR = 〈OR, IR, GR〉 similar to that described in Section 2.1, and also the constrained resources and all the profiles corresponding to them. This is because the planning process must take into account both the goals of achievement as well as the conflicts of resource usage described by the profiles. Traditional planners provide no direct way to handle such profiles within the planning process. Note here that since the execution of the plans of the agents occurs in parallel, the uncertainty is evolving at the time of execution; hence the uncertainty cannot be captured from the goal states of the recognized plans alone, and consequently cannot simply be compiled away to initial state uncertainty for the robot and solved as a conformant plan. Similarly, the problem does not directly compile into action costs in a metric planning instance, because the profiles themselves vary with time. Thus we need a planner that can handle resource constraints that are both stochastic and non-stationary due to the uncertainty in the environment. To this end we introduce the following IP-based planner (partly following the technique for IP encoding of state space planning outlined in [16]) as an elegant way to sum over and minimize overlaps in profiles during the plan generation process. The following formulation finds such T-step plans in the case of non-durative or instantaneous actions.

For action a ∈ AR at step t we have an action variable:

xa,t = 1 if action a is executed in step t, and 0 otherwise; ∀a ∈ AR, t ∈ {1, 2, . . . , T}.

Also, for every proposition f at step t a binary state variable is introduced as follows:

yf,t = 1 if the proposition is true in plan step t, and 0 otherwise; ∀f ∈ SR, t ∈ {0, 1, . . . , T}.

Note here that the plan computed by the robot introduces a resource consumption profile itself, and thus one optimizing criterion would be to minimize the overlap between the usage profile due to the computed plan and those established by the predicted plans of the other agents in the environment. Let us introduce a new variable to model the resource usage graph imposed by the robot as follows:

gf,t = 1 if f ∈ ξ is locked at plan step t, and 0 otherwise; ∀f ∈ ξ, t ∈ {0, 1, . . . , T}.

Further, for every resource λ ∈ Λ, we divide the actions in the domain of the robot into three disjoint sets:

Ω+f = {a ∈ AR such that xa,t = 1 =⇒ yf,t = 1},
Ω−f = {a ∈ AR such that xa,t = 1 =⇒ yf,t = 0}, and
Ωof = AR \ (Ω+f ∪ Ω−f), ∀f ∈ ξ.

These specify, respectively, those actions in the domain that lock, free up, or do not affect the use of a particular resource, and are used to calculate gf,t in the IP. Further, we introduce a variable hf,t to track preconditions required by actions in the generated plan whose success is conditioned on the influence of the plans of the other agents on the world (e.g. the positions of the medkits are changing, and the action pickup is conditioned on them):

hf,t = 1 if f ∈ Pa and xa,t+1 = 1, and 0 otherwise; ∀f ∈ ξ, t ∈ {0, 1, . . . , T − 1}.

Then the solution to the IP should ensure that the robot only uses these resources when they are in fact most expected to be available (as obtained by maximizing the overlap between hf,t and Gfλ). These act like demand profiles from the perspective of the robot. We also add a "no-operation" action AR ← AR ∪ {aφ}, where aφ = 〈N, C, P, E〉 with N = NOOP, C = 0, P = {} and E = {}.

The IP formulation is given by:

min  k1 Σ_{a ∈ AR} Σ_{t ∈ {1,...,T}} Ca × xa,t
   + k2 Σ_{λ ∈ Λ} Σ_{f ∈ Γ(λ)} Σ_{t ∈ {1,...,T}} gf,t × Gλ(t)
   − k3 Σ_{λ ∈ Λ} Σ_{f ∈ Γ(λ)} Σ_{t ∈ {0,...,T−1}} hf,t × Gfλ(t)

subject to

yf,0 = 1   ∀f ∈ IR \ ξ   (1)
yf,0 = 0   ∀f ∉ IR or f ∈ ξ   (2)
yf,T = 1   ∀f ∈ GR   (3)
xa,t ≤ yf,t−1   ∀a s.t. f ∈ Pa, ∀f ∉ ξ, t ∈ {1, . . . , T}   (4)
hf,t−1 = xa,t   ∀a s.t. f ∈ Pa, ∀f ∈ ξ, t ∈ {1, . . . , T}   (5)
yf,t ≤ yf,t−1 + Σ_{a ∈ add(f)} xa,t   s.t. add(f) = {a | f ∈ eff+(a)}, ∀f, t ∈ {1, . . . , T}   (6)
yf,t ≤ 1 − Σ_{a ∈ del(f)} xa,t   s.t. del(f) = {a | f ∈ eff−(a)}, ∀f, t ∈ {1, . . . , T}   (7)
Σ_{a ∈ AR} xa,t = 1   t ∈ {1, 2, . . . , T}   (8)
Σ_{a ∈ Ω+f} Σ_t xa,t ≤ 1   ∀f ∈ ξ, t ∈ {1, 2, . . . , T}   (9)
gf,t = Σ_{a ∈ Ω+f} xa,t + (1 − Σ_{a ∈ Ω+f} xa,t − Σ_{a ∈ Ω−f} xa,t) × gf,t−1   ∀f ∈ ξ, t ∈ {1, . . . , T}   (10)
hf,t × Gfλ(t) ≥ ε   ∀f ∈ ξ, t ∈ {0, 1, . . . , T − 1}   (11)
yf,t ∈ {0, 1}   ∀f ∈ SR, t ∈ {0, 1, . . . , T}   (12)
xa,t ∈ {0, 1}   ∀a ∈ AR, t ∈ {1, 2, . . . , T}   (13)
gf,t ∈ {0, 1}   ∀f ∈ SR, t ∈ {0, 1, . . . , T}   (14)
hf,t ∈ {0, 1}   ∀f ∈ SR, t ∈ {0, 1, . . . , T − 1}   (15)

where k1, k2, k3 are constants determining the relative importance of the optimization criteria and ε is a constant.

Here, the objective function minimizes the sum of the cost of the plan and the overlap between the cumulative resource usage profiles of the predicted plans and that imposed by the robot's own current plan, while maximizing the validity of the demand profiles. Constraints (1) through (3) model the initial and goal conditions, while the values of the constrained variables are left uninitialized (and are determined by their profiles). Constraints (4) and (5), depending on the particular predicate, enforce the preconditions or produce the demand profiles respectively, while (6) and (7) enforce the state equations that maintain the add and delete effects of the actions. Constraint (8) imposes non-concurrency on the actions, and (9) ensures that the robot does not repeat the same action indefinitely to increase its utility. Constraint (10) generates the resource profile of the current plan, while (11) ensures that actions are only executed if there is at least a small probability ε of success. Finally, (12) to (15) provide the binary ranges of the variables.
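As an illustration of how such an encoding looks in practice, below is a much-simplified gurobipy sketch on a toy two-action domain (a hypothetical domain, not the authors' USAR encoding). It keeps constraints (1)-(4), (6)-(8) and the cost term of the objective, and replaces the g/h machinery of constraints (5) and (9)-(11) with a linear penalty on executing a resource-locking action at a step where the predicted usage probability is high:

import gurobipy as gp
from gurobipy import GRB

T = 6
# toy action models: preconditions, add effects, delete effects, cost
A = {
    "pickup_mk": {"pre": {"mk_free"}, "add": {"mk_held"}, "del": {"mk_free"}, "cost": 1},
    "triage":    {"pre": {"mk_held"}, "add": {"triaged"}, "del": set(),       "cost": 1},
    "noop":      {"pre": set(),       "add": set(),       "del": set(),       "cost": 0},
}
S = {"mk_free", "mk_held", "triaged"}
I, G = {"mk_free"}, {"triaged"}
lockers = {"mk_free": ["pickup_mk"]}                             # actions locking the shared medkit
G_lam = {t: (0.9 if t <= 3 else 0.1) for t in range(1, T + 1)}   # predicted usage profile
k1, k2 = 1.0, 10.0

m = gp.Model("conflict_min")
x = m.addVars(A.keys(), range(1, T + 1), vtype=GRB.BINARY, name="x")
y = m.addVars(S, range(T + 1), vtype=GRB.BINARY, name="y")

for f in S:                                             # (1)-(2): initial state
    m.addConstr(y[f, 0] == (1 if f in I else 0))
for f in G:                                             # (3): goal holds at the horizon
    m.addConstr(y[f, T] == 1)
for t in range(1, T + 1):
    m.addConstr(gp.quicksum(x[a, t] for a in A) == 1)   # (8): exactly one action per step
    for a in A:
        for f in A[a]["pre"]:                           # (4), applied here to all facts
            m.addConstr(x[a, t] <= y[f, t - 1])
    for f in S:                                         # (6)-(7): add/delete state equations
        m.addConstr(y[f, t] <= y[f, t - 1] +
                    gp.quicksum(x[a, t] for a in A if f in A[a]["add"]))
        m.addConstr(y[f, t] <= 1 -
                    gp.quicksum(x[a, t] for a in A if f in A[a]["del"]))

cost = gp.quicksum(A[a]["cost"] * x[a, t] for a in A for t in range(1, T + 1))
overlap = gp.quicksum(G_lam[t] * x[a, t]                # stand-in for the g/G overlap term
                      for f in lockers for a in lockers[f] for t in range(1, T + 1))
m.setObjective(k1 * cost + k2 * overlap, GRB.MINIMIZE)
m.optimize()

plan = [(t, a) for t in range(1, T + 1) for a in A if x[a, t].X > 0.5]
print(plan)   # the robot idles and only picks up the medkit once the profile drops

With k2 large relative to k1, the solver prefers to idle (NOOP) until the predicted usage probability falls, which mirrors the compromise and opportunism behaviors discussed in Section 3.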

Note on Temporal Expressivity

At this point it is worth acknowledging the implications of having durative actions in our formulation. Note that our approach does not discretize time, but rather uses time points as steps in the plan - these can be easily augmented with their own durations. So in order to handle durative actions, the only (somewhat minor) change required in the formulation is in the way the conflicts are integrated (instead of summed) over in the objective function. Further, uncertainty in action durations is always a big issue in human interactions; though resource profiles cannot directly handle uncertain durations, this only affects the way the profiles are calculated, and the way in which information is expressed in them remains unchanged (i.e. expectations over action durations add an extra expectation to the already probabilistic profile computation). As noted before in Section 2.3, the ability of profiles to form regions of interest is crucial in handling such scenarios implicitly.

3. MODULATING BEHAVIOR OF THE ROBOT

The planner is implemented on the IP-solver gurobi and integrates [7] and [8] respectively for goal recognition and plan prediction for the recognized goals. We will now illustrate how the formulation can produce different behaviors of the robot by appropriately configuring the parameters of the planner. For this discussion we will limit ourselves to a singleton hypothesis goal set in order to observe the robot's response more clearly.

3.1 Compromise

Let us now look back at the environment we introduced in Figure 2. Consider that the goal of the commander is to perform triage in room1. The robot computes the human's optimal plan (which ends up using medkit1 at time steps 7 through 12) and updates the resource profiles accordingly. If it has its own goal to perform triage in hall3, the plan that it comes up with, given a 12-step lookahead, is shown below. Notice that the robot opts for the other medkit (medkit2 in room3), even though its plan now incurs a higher execution cost. The robot thus adopts a policy of compromise when it is possible for it to preserve the commander's (expected) plan.

01 MOVE_ROBOT_ROOM1_HALL1
02 MOVE_ROBOT_HALL1_HALL2
03 MOVE_ROBOT_HALL2_HALL3
04 MOVE_ROBOT_HALL3_HALL4
05 MOVE_REVERSE_ROBOT_HALL4_ROOM4
06 MOVE_REVERSE_ROBOT_ROOM4_ROOM3
07 PICK_UP_MEDKIT_ROBOT_MK2_ROOM3
08 MOVE_ROBOT_ROOM3_ROOM4
09 MOVE_ROBOT_ROOM4_HALL4
10 MOVE_REVERSE_ROBOT_HALL4_HALL3
11 CONDUCT_TRIAGE_ROBOT_HALL3
12 DROP_OFF_ROBOT_MK2_HALL3

3.2 Opportunism

Notice, however, that the commander is actually bringing the medkit to room1, as predicted by the robot, and this is a favorable change in the world, because the robot can use this medkit once the commander is done and achieve its goal at a much lower cost. The robot, indeed, realizes this once we give it a bigger time horizon to plan with, as shown below. Thus, in this case, the robot shows opportunism based on how it believes the world state will change.

01 NOOP
02 NOOP
...
13 NOOP
14 PICK_UP_MEDKIT_ROBOT_MK1_ROOM1
15 MOVE_ROBOT_ROOM1_HALL1
16 MOVE_ROBOT_HALL1_HALL2
17 MOVE_ROBOT_HALL2_HALL3
18 CONDUCT_TRIAGE_ROBOT_HALL3
19 DROP_OFF_ROBOT_MK1_HALL3

3.3 Negotiation

In many cases, the robot will eventually have to produce plans that have potential points of conflict with the expected plans of the commander. This occurs when there is no feasible plan with zero overlap between the profiles (specifically, Σ gf,t × Gλ(t) = 0) or when the alternative plans for the robot are too costly (as determined by the objective function). If, however, the robot is equipped with the ability to communicate with the human, then it can negotiate a plan that suits both. To this end, we introduce a new variable Hλ(t) and update the IP as follows:

min  k1 Σ_{a ∈ AR} Σ_{t ∈ {1,...,T}} Ca × xa,t
   + k2 Σ_{λ ∈ Λ} Σ_{f ∈ Γ(λ)} Σ_{t ∈ {1,...,T}} gf,t × Hλ(t)
   − k3 Σ_{λ ∈ Λ} Σ_{f ∈ Γ(λ)} Σ_{t ∈ {0,...,T−1}} hf,t × Gfλ(t)
   + k4 Σ_{λ ∈ Λ} Σ_{t ∈ {0,...,T}} ||Gλ(t) − Hλ(t)||

yf,T ≥ hf,t−1   ∀a s.t. f ∈ Pa, ∀f ∈ ξ, t ∈ {1, . . . , T}   (5a)
Hλ(t) ∈ [0, 1]   ∀λ ∈ Λ, t ∈ {0, 1, . . . , T}   (16)
Hλ(t) ≤ Gλ(t)   ∀λ ∈ Λ, t ∈ {0, 1, . . . , T}   (17)

Constraint (5a) now complements constraint (5) from the existing formulation by promising to restore the world state every time a demand is made on a variable. The variable Hλ(t), maintained by constraints (16) and (17), determines the desired deviation from the given profiles. The objective function has been updated to reflect that overlaps are now measured with respect to the desired profile of usage, and there is a cost associated with deviating from the real one. The revised plan now produced by the robot is shown below.

01 MOVE_ROBOT_ROOM1_HALL1
02 MOVE_ROBOT_HALL1_HALL2
03 MOVE_REVERSE_ROBOT_HALL2_ROOM2
04 PICK_UP_MEDKIT_ROBOT_MK1_ROOM2
05 MOVE_ROBOT_ROOM2_HALL2
06 MOVE_ROBOT_HALL2_HALL3
07 CONDUCT_TRIAGE_ROBOT_HALL3
08 MOVE_REVERSE_ROBOT_HALL3_HALL2
09 MOVE_REVERSE_ROBOT_HALL2_ROOM2
10 DROP_OFF_ROBOT_MK1_ROOM2

Notice that the robot restores the world state that the human is believed to expect, and can now communicate to him "Can you please not use medkit1 from time 7 to 9?" based on how the real and the ideal profiles diverge, i.e. the times t such that Hλ(t) < Gλ(t) for each resource λ.
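Extracting the communication itself from the solved profiles is straightforward; the following hypothetical sketch collects the contiguous time windows where the desired profile H falls below the predicted profile G for a resource (the profile values are made up to mirror the example above):

def negotiation_windows(G, H, eps=1e-6):
    """Return [(start, end)] time intervals where H(t) < G(t)."""
    times = sorted(G)
    windows, start = [], None
    for t in times:
        if H.get(t, 0.0) + eps < G[t]:
            if start is None:
                start = t
        elif start is not None:
            windows.append((start, t - 1))
            start = None
    if start is not None:
        windows.append((start, times[-1]))
    return windows

G_mk1 = {t: (0.8 if 7 <= t <= 12 else 0.0) for t in range(1, 13)}   # predicted usage of medkit1
H_mk1 = {t: (0.8 if 10 <= t <= 12 else 0.0) for t in range(1, 13)}  # desired usage after negotiation
for s, e in negotiation_windows(G_mk1, H_mk1):
    print(f"Can you please not use medkit1 from time {s} to {e}?")  # -> from time 7 to 9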

Notes on Adaptive Behavior Modeling

One might note here that people are often adaptive, and it is very much possible that they may be willing to change their goals based on observing the robot, or are even unwilling to negotiate if their plans conflict. Hence the policies of compromise and opportunism for the robot are complementary to negotiation in the event the latter fails. Thus, for example, the robot might choose to communicate a negotiation strategy to the human, but fall back on a compromise if that fails. It is a merit of such a simple formulation that it can handle such interesting adaptive behaviors.


4. EVALUATION

The power of the proposed approach lies in the modular nature in which it tackles several complicated problems that are separate research areas in their own right. As we saw throughout the course of the discussion, the approaches used in the individual modules may be varied with little to no change in the rest of the architecture. For example, the expressivity of the belief modeling or goal recognition component is handled separately, as the planner uses information from a generic plan set. Again, the representation technique introduced in terms of resource profiles provides computational independence with respect to the size of the hypothesis set and the number of agents (which instead manifests as complexity in the number of resources) that general planners do not have. It thus becomes a design choice depending on which metric needs to be optimized.

For empirical evaluations, we simulated the USAR scenario on 360 different problem instances, randomly generated by varying the specific (as well as the number of probable) goals of the human, and evaluated how the planner behaved with the number of observations it can start with to build its profiles. We fix the domain description, the locations and goals of the agents, and the positions of the resources, and consider randomly generated hypothesis goal sets of size 2-11. The goals of the commander were assumed to be known to be triage related, but the location of the triage was allocated randomly (one of which was again picked at random as the real goal). Finally, for each of these problems, we generate 1-5 observations by simulating the commander's plan for the real goal, and make these observations known a priori to the robot's plan generation process. The experiments were conducted on an Intel Xeon(R) CPU E5-1620 v2 3.70GHz×8 processor with 62.9GiB of memory.

4.1 Scaling Up

Our primary contribution is the formulation for planning with resource profiles, while the goal recognition component can be any off-the-shelf algorithm; as such, we compare scalability with respect to the planning component only.

- w.r.t. Length of the Planning Horizon

The performance of the planner with respect to the planning horizon is shown in Figure 4a. This is, as expected, the bottleneck in computation due to the exponential growth of the size of the IP. It is however not prohibitively expensive, and the planner is still able to produce plans of length 20 (steps, not durations) for our domain in a matter of seconds.

- w.r.t. Number of Resources

The performance of the planner with respect to the number of constrained resources (medkits, in the context of the current discussion) is shown in Figure 4b. Evidently, the computational effort is dominated by that due to the planning horizon. This reiterates the usefulness of abstracting the information in predicted plans in the form of resource profiles, thus isolating the complexities of the domain from those of the underlying planning algorithm.

Figure 4: Performance of the planner w.r.t. planning horizon T and number of constrained resources (medkits). (a) w.r.t. T (|Λ| = 2); (b) w.r.t. #medkits (T = 10).

- w.r.t. the Number of Agents and Goals

The planning module (i.e. the IP formulation) is by itself independent of the number of agents being modeled. In fact, this is one of the major advantages of using abstractions like resource profiles in lieu of the actual plans of each of the agents. On the other hand, the time spent on recognition, and on calculating the profiles, is significantly affected. However, observations on multiple agents are asynchronous, and goal recognition can operate in parallel, so this is not a huge concern beyond the complexity of a single instance. Similarly, the performance is also unaffected by the size of the hypothesis set Ψα, as shown in Figure 5, which shows that an increase in the number of possible goals does not complicate the profiles to an extent that affects the complexity.

4.2 Quality of the Plans Produced

We define U as the average number of conflicts per plan step when a demand is placed on a resource by the robot, and S as the success probability per plan step that the demand is met. C is the cost of a plan. F is the percentage of times there was an actual conflict during execution (distinct from U, which estimates the possible conflicts that may occur per plan step). We observe the quality of the plans produced by the planner by varying the ratio of the parameters k1 and k3 from the objective function and the length of the planning horizon T. Similar results can be produced by varying k1/k2.

From Table 1, as k1/k3 decreases, the planner becomes more conservative (to maximize success probability) and thus plans become costlier. At the same time the expected success rate of actions is also increased (with a simultaneous increase in usage conflict), as reflected by a higher failure rate due to actual execution time conflicts.


Figure 5: Performance of the planner w.r.t. size of the goal set. As expected, computational complexity is not affected.

k1/k3   0.05    0.5     5.0
C       9.47    6.37    6.31
U       0.18    0.17    0.17
S       0.85    0.579   0.578
F       27.5    23.0    21.3

Table 1: Quality of plans produced w.r.t. k1/k3. Conservative plans result in lowered utility.

Also note, from Table 2, the impact of the planning horizon T on the types of behaviors we discussed in the previous section. As we increase T, the plan cost falls below the optimal, indicating opportunities for opportunistic behavior on the part of the robot. The expected conflict also falls to almost 0. However, the expected success rate of actions also decreases; the ratio k1/k2 determines how daring the robot is in choosing between cheap versus possibly invalid plans. Note, however, that the actual execution time conflict is extremely low with increasing T, even for sufficiently conservative estimates of S.

Thus we see that the robot is successfully able to navigate conflicts and in many cases find plans even cheaper than the original optimal plan, highlighting the usefulness of the approach. Finally, we look at the impact of the parameters of the plan recognition module in Figure 6. As expected, with bigger hypothesis sets, the success rate goes down. Interestingly, the plan cost also shows a downward trend, which might be because the bigger variety of possible goals gives a better idea of which medkits are generally more useful for that instance at which points of time. With more observations, as expected, the success rate goes up and the expected conflict goes down. The cost, however, increases a little as the planner opts for more conservative options.

Figure 6: Performance of the planner w.r.t. size of goal set and number of observations (k1/k3 = 0.5, T = 16). (a) w.r.t. |Ψα|; (b) w.r.t. #obs.

T    10      13      16      Optimal
C    9.0     5.6     4.53    9.0
U    0.46    0.04    ≈0      n/a
S    1.0     0.48    0.25    n/a
F    53.3    11.9    6.6     53.3

Table 2: Quality of plans produced w.r.t. T. Opportunities for opportunism explored, conflicts minimized.

5. CONCLUSIONS

In this paper we investigate how plans may be affected by conflicts on shared resources in an environment cohabited by humans and robots, and introduce the concept of resource profiles as a means of concisely representing the information, contained in the predicted behavior of the agents, pertaining to the usage of such resources. We propose a general formulation of a planner for such scenarios and show how the planner can be used to model different types of behavior of the robot by appropriately configuring the objective function and optimization parameters. Finally, we provide an end-to-end framework that integrates belief modeling, goal recognition and an IP-solver that can enforce the desired interaction constraints. One interesting research direction would be to consider nested beliefs of the agents; after all, humans are rarely completely aloof from other agents in their environment. Such interactions would have to consider the evolution of beliefs with continued interactions, and motivate further exploration of the belief modeling component. The modularity of the proposed approach allows for focused research on each (individually challenging) subtask without significantly affecting the others.

Acknowledgment

This research is supported in part by the ONR grants N00014-13-1-0176, N00014-13-1-0519 and N00014-15-1-2027, and the ARO grant W911NF-13-1-0023.


REFERENCES

[1] E. Beaudry, F. Kabanza, and F. Michaud. Planning with concurrency under resources and time uncertainty. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010), pages 217-222, Amsterdam, The Netherlands, 2010. IOS Press.

[2] A. Blum and M. L. Furst. Fast planning through planning graph analysis. In IJCAI, pages 1636-1642, 1995.

[3] F. Cavallo, R. Limosani, A. Manzi, M. Bonaccorsi, R. Esposito, M. Di Rocco, F. Pecora, G. Teti, A. Saffiotti, and P. Dario. Development of a socially believable multi-robot solution from town to home. Cognitive Computation, 6(4):954-967, 2014.

[4] T. Chakraborti, G. Briggs, K. Talamadupula, Y. Zhang, M. Scheutz, D. Smith, and S. Kambhampati. Planning for serendipity. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.

[5] M. Cirillo, L. Karlsson, and A. Saffiotti. Human-aware task planning for mobile robots. In Proc. of the Int. Conf. on Advanced Robotics (ICAR), 2009.

[6] M. Cirillo, L. Karlsson, and A. Saffiotti. Human-aware task planning: An application to mobile robots. ACM Trans. Intell. Syst. Technol., 1(2):15:1-15:26, Dec. 2010.

[7] Y. E-Martín, M. D. R-Moreno, and D. E. Smith. A fast goal recognition technique based on interaction estimates. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25-31, 2015, pages 761-768, 2015.

[8] M. Helmert. The fast downward planning system. CoRR, abs/1109.6051, 2011.

[9] U. Koeckemann, F. Pecora, and L. Karlsson. Grandpa hates robots - interaction constraints for planning in inhabited environments. In Proc. AAAI-2014, 2014.

[10] M. Kuderer, H. Kretzschmar, C. Sprunk, and W. Burgard. Feature-based prediction of trajectories for socially compliant navigation. In Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2012.

[11] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL - the planning domain definition language. Technical Report TR-98-003, Yale Center for Computational Vision and Control, 1998.

[12] M. Ramirez and H. Geffner. Probabilistic plan recognition using off-the-shelf classical planners. In Proc. AAAI-2010, 2010.

[13] E. Sisbot, L. Marin-Urias, R. Alami, and T. Simeon. A human aware mobile robot motion planner. Robotics, IEEE Transactions on, 23(5):874-883, Oct 2007.

[14] K. Talamadupula, G. Briggs, T. Chakraborti, M. Scheutz, and S. Kambhampati. Coordination in human-robot teams using mental modeling and plan recognition. In Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on, pages 2957-2962, Sept 2014.

[15] S. Tomic, F. Pecora, and A. Saffiotti. Too cool for school - adding social constraints in human aware planning. In Proc. of the International Workshop on Cognitive Robotics (CogRob), 2014.

[16] T. Vossen, M. O. Ball, A. Lotem, and D. S. Nau. On the use of integer programming models in AI planning. In T. Dean, editor, IJCAI, pages 304-309. Morgan Kaufmann, 1999.

