csc2542w19 contingentplanning - department of computer ...€¦ · christian muise vaishak belle...

61
CSC2542 Contingent Planning Sheila McIlraith Department of Computer Science Winter, 2019

Upload: others

Post on 27-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

CSC2542 Contingent Planning

Sheila McIlraithDepartment of Computer ScienceWinter, 2019

Page 2: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent PlanningThe slides that follow describe the following publication:

Christian J. Muise, Vaishak Belle, Sheila A. McIlraith: Computing Contingent Plans via Fully Observable Non-Deterministic Planning. AAAI 2014: 2322-2329

Slides largely prepared by Christian Muise.

Page 3: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Computing Contingent Plans viaFully Observable Non-Deterministic Planning

Christian Muise Vaishak Belle Sheila A. McIlraith

Department of Computer Science,University of Toronto, Toronto, Canada.{cjmuise,vaishak,sheila}@cs.toronto.edu

June 23, 2014

Page 4: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Overview

PRP (2012)

Planner for Fully ObservableNon-Deterministic (FOND).

Powerful techniques for deadenddetection, state relevance, etc.

Recently updated to handleconditional effects.

PO-PRP (2014)

Computes conditional plans forsimple contingent problems.

Modifies the computational coreof PRP to solve encodings.

Novel approach for constructingcompact conditional plans.

Main Contribution

Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24

Page 5: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Overview

PRP (2012)

Planner for Fully ObservableNon-Deterministic (FOND).

Powerful techniques for deadenddetection, state relevance, etc.

Recently updated to handleconditional effects.

PO-PRP (2014)

Computes conditional plans forsimple contingent problems.

Modifies the computational coreof PRP to solve encodings.

Novel approach for constructingcompact conditional plans.

Main Contribution

Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24

Page 6: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Overview

PRP (2012)

Planner for Fully ObservableNon-Deterministic (FOND).

Powerful techniques for deadenddetection, state relevance, etc.

Recently updated to handleconditional effects.

PO-PRP (2014)

Computes conditional plans forsimple contingent problems.

Modifies the computational coreof PRP to solve encodings.

Novel approach for constructingcompact conditional plans.

Main Contribution

Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24

Page 7: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Example Plan: doors-5

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 2 / 24

Page 8: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Example Plan: doors-5

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 2 / 24

Page 9: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

1 Background & Notation

2 PO-PRPEncodingComputingRepresenting

3 Evaluation

4 Conclusion

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 3 / 24

Page 10: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

1 Background & Notation

2 PO-PRP

3 Evaluation

4 Conclusion

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 4 / 24

Page 11: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent Planning (1/2)

Planning Task: 〈F , I,G,O,A〉F : Finite set of fluents.

I: Set of clauses over F that determines the initial state.

G: Conjunction of atoms over F that determines the goal.

O: Set of observations.

A: Set of actions.

States

Belief state b represents a set of states.

I corresponds to a belief state bI .

b satisfies a formula φ if φ holds in every state s ∈ b.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 4 / 24

Page 12: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent Planning (2/2)

Actions

Pre(a): Conjunction of atoms that must hold to execute a.

Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.

Action progression: ba = {Prog(s, a) | s ∈ b}

Observations

Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).

Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 5 / 24

Page 13: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent Planning (2/2)

Actions

Pre(a): Conjunction of atoms that must hold to execute a.

Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.

Action progression: ba = {Prog(s, a) | s ∈ b}

Observations

Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).

Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 5 / 24

Page 14: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Simple Contingent Problems (Bonet & Geffner, 2011)

Effective Characterization

The non-unary clauses in I are invariant. E.g.,:

Location of the agent (always in exactly one spot).

Uncertainty of a fluent (can only ever be true or false).

Monotonicity

Uncertainty of a fluent cannot propagate to other fluents. E.g.,:

Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉

Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉

Corollary: Once known, a fluent remains known.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24

Page 15: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Simple Contingent Problems (Bonet & Geffner, 2011)

Effective Characterization

The non-unary clauses in I are invariant. E.g.,:

Location of the agent (always in exactly one spot).

Uncertainty of a fluent (can only ever be true or false).

Monotonicity

Uncertainty of a fluent cannot propagate to other fluents. E.g.,:

Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉

Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉

Corollary: Once known, a fluent remains known.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24

Page 16: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Simple Contingent Problems (Bonet & Geffner, 2011)

Effective Characterization

The non-unary clauses in I are invariant. E.g.,:

Location of the agent (always in exactly one spot).

Uncertainty of a fluent (can only ever be true or false).

Monotonicity

Uncertainty of a fluent cannot propagate to other fluents. E.g.,:

Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉

Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉

Corollary: Once known, a fluent remains known.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24

Page 17: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Planning

Planning Task: 〈F , I,G,A〉F : Finite set of fluents.

I: Complete setting of the fluents for the initial state.

G: Conjunction of atoms over F that determines the goal.

A: Set of (possibly non-deterministic) actions.

Actions

Pre(a) is the set of fluents that must be true to execute a.

Eff(a) is a finite set of non-deterministic effects.

A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 7 / 24

Page 18: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Planning

Planning Task: 〈F , I,G,A〉F : Finite set of fluents.

I: Complete setting of the fluents for the initial state.

G: Conjunction of atoms over F that determines the goal.

A: Set of (possibly non-deterministic) actions.

Actions

Pre(a) is the set of fluents that must be true to execute a.

Eff(a) is a finite set of non-deterministic effects.

A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 7 / 24

Page 19: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Solutions and Representation

Weak Plan

Strong Plan

Strong Cyclic Plan

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24

Page 20: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Solutions and Representation

Weak Plan

Strong Plan

Strong Cyclic Plan

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24

Page 21: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Solutions and Representation

Weak Plan

Strong Plan

Strong Cyclic Plan

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24

Page 22: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Solutions and Representation

Policy

P(s): Given a set of tuples of the form 〈p, a, c〉 where p is a partialstate, a an action, and c a cost, return the action a in state s fromthe tuple with the lowest cost c such that s |= p.

Strong Cyclic Plan

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24

Page 23: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

1 Background & Notation

2 PO-PRP

3 Evaluation

4 Conclusion

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 9 / 24

Page 24: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Approach

1 Convert the contingent problem to a FOND one.

2 Solve with a tailored version of PRP.

3 Compile the final conditional plan.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 9 / 24

Page 25: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

2 PO-PRPEncodingComputingRepresenting

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24

Page 26: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

V

A′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24

Page 27: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Do not need to include {¬K¬C ,¬K¬L | 〈C , L〉 ∈ Eff(a)}

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24

Page 28: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.

A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Can only observe a fluent once due to monotonicity.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24

Page 29: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24

Page 30: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”?

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 11 / 24

Page 31: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”? Answer: Monotonicity!

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 11 / 24

Page 32: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

2 PO-PRPEncodingComputingRepresenting

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24

Page 33: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem → Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24

Page 34: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem ⇒ Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24

Page 35: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem ⇒ Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24

Page 36: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

UpdatePolicy(〈F ′, s,G ′,A′〉,P)

1 A′′ = Determinize(A′); // Using all-outcomes

2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);

3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;

Invariants left as actions (and not axioms).

Priority is given to useful invariants.

Arbitrary ordering of invariants enforced.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 13 / 24

Page 37: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

UpdatePolicy(〈F ′, s,G ′,A′〉,P)

1 A′′ = Determinize(A′); // Using all-outcomes

2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);

3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;

Invariants left as actions (and not axioms).

Priority is given to useful invariants.

Arbitrary ordering of invariants enforced.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 13 / 24

Page 38: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

2 PO-PRPEncodingComputingRepresenting

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24

Page 39: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent Plan Representations

Option 1: State-Action Mapping

Uses the default output of PRP.

Requires maintaining the compiled belief.

Difficult to see the “bigger picture”.

Option 2: Conditional Plan

Expressed in terms of the original problem.

Standard for (offline) contingent planning.

Typically takes the form of an exponentially large tree.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24

Page 40: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Contingent Plan Representations

Option 1: State-Action Mapping

Uses the default output of PRP.

Requires maintaining the compiled belief.

Difficult to see the “bigger picture”.

Option 2: Conditional Plan

Expressed in terms of the original problem.

Standard for (offline) contingent planning.

Typically takes the form of an exponentially large tree.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24

Page 41: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Strong Cyclic Detection

(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.

Strong Cyclic Property 1

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.

Strong Cyclic Property 2

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24

Page 42: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Strong Cyclic Detection

(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.

Strong Cyclic Property 1

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.

Strong Cyclic Property 2

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24

Page 43: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Strong Cyclic Detection

(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.

Strong Cyclic Property 1

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.

Strong Cyclic Property 2

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24

Page 44: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 45: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;

2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 46: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 47: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 48: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 49: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24

Page 50: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

1 Background & Notation

2 PO-PRP

3 Evaluation

4 Conclusion

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 17 / 24

Page 51: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Setup

Used one fabricated and three existing domains to comparePO-PRP with CLG (Albore, Palacios, and Geffner 2009).

All domains are simple contingent planning problems.

Resources given: 30min and 1Gb time and memory limit.

Uses a modified version of the kreplanner translator.

Fast Downward’s invariant synthesis limited to 3 seconds.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 17 / 24

Page 52: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Testbed

Domains

Doors (doors)

Coloured Balls (cballs)

Simple Wumpus (wumpus)

Canadian Traveller Problem (ctp)

Metrics

Size of contingent plan (action + sensing nodes)

Time to compute offline plan (sec)

PO-PRP Policy size (# of tuples)

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 18 / 24

Page 53: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Results: Time and Size (1/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

cballs-4-1 0.20 0.02 343 261

cballs-4-2 19.86 0.67 22354 13887

cballs-4-3 1693.02 171.28 1247512 671988

cballs-10-1 211.66 1.57 4829 4170

cballs-10-2 T M T M

doors-5 0.12 0.01 169 82

doors-7 3.50 0.04 2492 1295

doors-9 187.60 1.07 50961 28442

doors-11 T M T M

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 19 / 24

Page 54: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Results: Time and Size (2/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

wumpus-5 0.44 0.16 854 233

wumpus-7 9.28 1.54 7423 770

wumpus-10 1379.62 11.17 362615 2669

wumpus-15 T 86.16 T 15628

wumpus-20 T M T M

ctp-ch-1 0.00 0.00 5 4

ctp-ch-5 0.02 0.00 125 16

ctp-ch-10 2.2 0.02 4093 31

ctp-ch-15 133.24 0.07 131069 46

ctp-ch-20 T 0.22 T 61

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 20 / 24

Page 55: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Results: PO-PRP Policy -vs- Contingent Plan Size

ProblemSize PO-PRP Policy Size

CLG PO-PRP all strong used

cballs-4-1 343 261 150 24 125

cballs-4-2 22354 13887 812 46 507

cballs-4-3 1247512 671988 3753 81 1689

cballs-10-1 4829 4170 1139 20 815

wumpus-5 854 233 706 160 331

wumpus-7 7423 770 2974 241 992

wumpus-10 362615 2669 8122 281 2482

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 21 / 24

Page 56: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Outline

1 Background & Notation

2 PO-PRP

3 Evaluation

4 Conclusion

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 22 / 24

Page 57: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Summary

Introduced PO-PRP: an extension to the FOND planner,PRP, capable of solving simple contingent planning problems.

Identified some of the key considerations involved when usingthe POND-as-FOND methodology.

Demonstrated the performance gains to be had over theprevious state of the art in offline contingent planning.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 22 / 24

Page 58: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Future Work

Extend PO-PRP to handle width-1 problems.

Handle a richer set of invariants for new compilations.

Improve the strong cyclic detection for more compact policies.

Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)

Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 23 / 24

Page 59: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Future Work

Extend PO-PRP to handle width-1 problems.

Handle a richer set of invariants for new compilations.

Improve the strong cyclic detection for more compact policies.

Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)

Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 23 / 24

Page 60: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,

Any Questions?

Benchmarks, code, and slides available (in July) at:http://www.haz.ca/research/poprp/

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 24 / 24

Page 61: csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle Sheila A. McIlraith Department of Computer Science, University of Toronto, Toronto,