csc2542w19 contingentplanning - department of computer ...€¦ · christian muise vaishak belle...

CSC2542 Contingent Planning

Sheila McIlraithDepartment of Computer ScienceWinter, 2019

Contingent PlanningThe slides that follow describe the following publication:

Christian J. Muise, Vaishak Belle, Sheila A. McIlraith: Computing Contingent Plans via Fully Observable Non-Deterministic Planning. AAAI 2014: 2322-2329

Slides largely prepared by Christian Muise.

https://dblp.uni-trier.de/pers/hd/m/Muise:Christian_J=

https://dblp.uni-trier.de/pers/hd/b/Belle:Vaishak

https://dblp.uni-trier.de/db/conf/aaai/aaai2014.html

Computing Contingent Plans viaFully Observable Non-Deterministic Planning

Christian Muise Vaishak Belle Sheila A. McIlraith

Department of Computer Science,University of Toronto, Toronto, Canada.{cjmuise,vaishak,sheila}@cs.toronto.edu

June 23, 2014

Overview

PRP (2012)

Planner for Fully ObservableNon-Deterministic (FOND).

Powerful techniques for deadenddetection, state relevance, etc.

Recently updated to handleconditional effects.

PO-PRP (2014)

Computes conditional plans forsimple contingent problems.

Modifies the computational coreof PRP to solve encodings.

Novel approach for constructingcompact conditional plans.

Main Contribution

Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24

Example Plan: doors-5


Outline

1 Background & Notation

2 PO-PRPEncodingComputingRepresenting

3 Evaluation

4 Conclusion


Outline


2 PO-PRP

3 Evaluation

4 Conclusion


Contingent Planning (1/2)

Planning Task: 〈F , I,G,O,A〉F : Finite set of fluents.

I: Set of clauses over F that determines the initial state.

G: Conjunction of atoms over F that determines the goal.

O: Set of observations.

A: Set of actions.

States

Belief state b represents a set of states.

I corresponds to a belief state bI .

b satisfies a formula φ if φ holds in every state s ∈ b.


Contingent Planning (2/2)

Actions

Pre(a): Conjunction of atoms that must hold to execute a.

Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.

Action progression: ba = {Prog(s, a) | s ∈ b}

Observations

Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).

Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .


Simple Contingent Problems (Bonet & Geffner, 2011)

Effective Characterization

The non-unary clauses in I are invariant. E.g.,:

Location of the agent (always in exactly one spot).

Uncertainty of a fluent (can only ever be true or false).

Monotonicity

Uncertainty of a fluent cannot propagate to other fluents. E.g.,:

Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉

Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉

Corollary: Once known, a fluent remains known.


FOND Planning

Planning Task: 〈F , I,G,A〉F : Finite set of fluents.

I: Complete setting of the fluents for the initial state.

G: Conjunction of atoms over F that determines the goal.

A: Set of (possibly non-deterministic) actions.

Actions

Pre(a) is the set of fluents that must be true to execute a.

Eff(a) is a finite set of non-deterministic effects.

A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)


FOND Solutions and Representation

Weak Plan

Strong Plan

Strong Cyclic Plan


FOND Solutions and Representation

Policy

P(s): Given a set of tuples of the form 〈p, a, c〉 where p is a partialstate, a an action, and c a cost, return the action a in state s fromthe tuple with the lowest cost c such that s |= p.

Strong Cyclic Plan


Outline


2 PO-PRP

3 Evaluation

4 Conclusion


Approach

1 Convert the contingent problem to a FOND one.

2 Solve with a tailored version of PRP.

3 Compile the final conditional plan.


Outline



FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

V

A′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.




A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Do not need to include {¬K¬C ,¬K¬L | 〈C , L〉 ∈ Eff(a)}




A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.

A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Can only observe a fluent once due to monotonicity.




A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.


Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”?

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?


Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”? Answer: Monotonicity!

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?


Outline



General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem → Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;


General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem ⇒ Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;


UpdatePolicy(〈F ′, s,G ′,A′〉,P)

1 A′′ = Determinize(A′); // Using all-outcomes

2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);

3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;

Invariants left as actions (and not axioms).

Priority is given to useful invariants.

Arbitrary ordering of invariants enforced.


Outline



Contingent Plan Representations

Option 1: State-Action Mapping

Uses the default output of PRP.

Requires maintaining the compiled belief.

Difficult to see the “bigger picture”.

Option 2: Conditional Plan

Expressed in terms of the original problem.

Standard for (offline) contingent planning.

Typically takes the form of an exponentially large tree.


Strong Cyclic Detection

(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.

Strong Cyclic Property 1

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.

Strong Cyclic Property 2

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.


Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.



General Intuition


1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;

2 Select and move a state s from Open to Cls such that,








General Intuition










General Intuition






(iv) Add (s, s ′′) to E for every s ′′ in Succ ;



Outline


2 PO-PRP

3 Evaluation

4 Conclusion


Setup

Used one fabricated and three existing domains to comparePO-PRP with CLG (Albore, Palacios, and Geffner 2009).

All domains are simple contingent planning problems.

Resources given: 30min and 1Gb time and memory limit.

Uses a modified version of the kreplanner translator.

Fast Downward’s invariant synthesis limited to 3 seconds.


Testbed

Domains

Doors (doors)

Coloured Balls (cballs)

Simple Wumpus (wumpus)

Canadian Traveller Problem (ctp)

Metrics

Size of contingent plan (action + sensing nodes)

Time to compute offline plan (sec)

PO-PRP Policy size (# of tuples)


Results: Time and Size (1/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

cballs-4-1 0.20 0.02 343 261

cballs-4-2 19.86 0.67 22354 13887

cballs-4-3 1693.02 171.28 1247512 671988

cballs-10-1 211.66 1.57 4829 4170

cballs-10-2 T M T M

doors-5 0.12 0.01 169 82

doors-7 3.50 0.04 2492 1295

doors-9 187.60 1.07 50961 28442

doors-11 T M T M


Results: Time and Size (2/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

wumpus-5 0.44 0.16 854 233

wumpus-7 9.28 1.54 7423 770

wumpus-10 1379.62 11.17 362615 2669

wumpus-15 T 86.16 T 15628

wumpus-20 T M T M

ctp-ch-1 0.00 0.00 5 4

ctp-ch-5 0.02 0.00 125 16

ctp-ch-10 2.2 0.02 4093 31

ctp-ch-15 133.24 0.07 131069 46

ctp-ch-20 T 0.22 T 61


Results: PO-PRP Policy -vs- Contingent Plan Size

ProblemSize PO-PRP Policy Size

CLG PO-PRP all strong used

cballs-4-1 343 261 150 24 125

cballs-4-2 22354 13887 812 46 507

cballs-4-3 1247512 671988 3753 81 1689

cballs-10-1 4829 4170 1139 20 815

wumpus-5 854 233 706 160 331

wumpus-7 7423 770 2974 241 992

wumpus-10 362615 2669 8122 281 2482


Outline


2 PO-PRP

3 Evaluation

4 Conclusion


Summary

Introduced PO-PRP: an extension to the FOND planner,PRP, capable of solving simple contingent planning problems.

Identified some of the key considerations involved when usingthe POND-as-FOND methodology.

Demonstrated the performance gains to be had over theprevious state of the art in offline contingent planning.


Future Work

Extend PO-PRP to handle width-1 problems.

Handle a richer set of invariants for new compilations.

Improve the strong cyclic detection for more compact policies.

Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)

Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.


Any Questions?

Benchmarks, code, and slides available (in July) at:http://www.haz.ca/research/poprp/


http://www.haz.ca/research/poprp/

csc2542w19 contingentplanning - department of computer ...€¦ · christian muise vaishak belle...

Documents