LEARNING PROBABILISTIC HIERARCHICAL TASK NETWORKS TO CAPTURE USER PREFERENCES
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281
[email protected], [email protected], [email protected]
Thanks to William Cushing
A riddle for you:
What is the magic idea in planning that is at once more efficient in practice and of higher worst-case complexity than vanilla planning?
TWO TALES OF HTN PLANNING
- Abstraction: efficiency, top-down (learning: most prior work)
- Preference handling: quality, bottom-up (learning: our work)
LEARNING USER PLAN PREFERENCES

Hitchhike? No way!

Pbus:   Getin(bus, source), Buyticket(bus), Getout(bus, dest)        [2]
Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest)  [8]
Phike:  Hitchhike(source, dest)                                      [0]
LEARNING USER PREFERENCES AS PHTNS
Given a set O of plans executed by the user, find a generative model Hl:

Hl = argmax_H p(O | H)

Probabilistic Hierarchical Task Networks (pHTNs):

S  → A1 B1  (0.2)
S  → A2 B2  (0.8)
B1 → A2 A3  (1.0)
B2 → A1 A3  (1.0)
A1 → Getin      (1.0)
A2 → Buyticket  (1.0)
A3 → Getout     (1.0)
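Read operationally, such a schema set is a stochastic generator of plans. A minimal Python sketch of that reading (the dict encoding and function name are illustrative assumptions, not the talk's code):

```python
import random

# pHTN schemas read as a probabilistic context-free grammar:
# each non-terminal maps to a list of (probability, right-hand side) schemas.
GRAMMAR = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Expand non-terminals recursively until only primitive actions remain."""
    if symbol not in GRAMMAR:              # primitive action: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in GRAMMAR[symbol]:      # roulette-wheel schema selection
        acc += prob
        if r <= acc:
            return [a for part in rhs for a in sample_plan(part)]
    return [a for part in GRAMMAR[symbol][-1][1] for a in sample_plan(part)]

plan = sample_plan()  # one of the two orderings, e.g. ['Buyticket', 'Getin', 'Getout']
```

With probability 0.8 this emits the user's preferred train-style ordering (buy ticket before getting in) and with probability 0.2 the bus-style ordering.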
LEARNING pHTNs
HTNs can be seen as providing a grammar of desired solutions:
  Actions      ↔ Words
  Plans        ↔ Sentences
  HTNs         ↔ Grammar
  HTN learning ↔ Grammar induction
pHTN learning by probabilistic context-free grammar (pCFG) induction. Assumptions: actions are parameter-less and unconditional.
S  → A1 B1  (0.2)
S  → A2 B2  (0.8)
B1 → A2 A3  (1.0)
B2 → A1 A3  (1.0)
A1 → Getin      (1.0)
A2 → Buyticket  (1.0)
A3 → Getout     (1.0)
A TWO-STEP ALGORITHM
• Greedy Structure Hypothesizer: hypothesizes the schema structure
• Expectation-Maximization (EM) Phase: refines schema probabilities and removes redundant schemas

Generalizes the Inside-Outside algorithm (Lari & Young, 1990)
GREEDY STRUCTURE HYPOTHESIZER
Structure learning:
  - Bottom-up
  - Prefers recursive schemas to non-recursive ones
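One simple way to picture bottom-up, recursion-preferring structure hypothesizing is greedy pair-chunking: repeatedly introduce a schema for the most frequent adjacent pair of symbols, favoring a recursive candidate when a symbol repeats itself. The sketch below illustrates that idea under those assumptions; it is not the authors' exact algorithm:

```python
from collections import Counter

def hypothesize_structure(plans):
    """Greedily chunk frequent adjacent pairs into new non-terminals."""
    plans = [list(p) for p in plans]
    rules, next_id = {}, 0
    while True:
        pairs = Counter()
        for p in plans:
            pairs.update(zip(p, p[1:]))
        if not pairs:
            break
        # prefer recursive candidates (x, x) when they repeat
        recursive = {pr: c for pr, c in pairs.items() if pr[0] == pr[1] and c >= 2}
        (a, b), count = max((recursive or pairs).items(), key=lambda kv: kv[1])
        if count < 2:                       # nothing repeats: stop chunking
            break
        nt = f"B{next_id}"; next_id += 1
        rules[nt] = (a, b)                  # new schema: nt -> a b
        plans = [replace_pair(p, a, b, nt) for p in plans]
    return rules

def replace_pair(seq, a, b, nt):
    """Rewrite every adjacent occurrence of (a, b) in seq as nt."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(nt); i += 2
        else:
            out.append(seq[i]); i += 1
    return out
```

For instance, three copies of the plan Getin, Buyticket, Getout yield two hypothesized schemas covering the whole plan.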
EM PHASE
E Step: plan parse tree computation (find the most probable parse tree)
M Step: selection probabilities update

s: ai → aj ak   (selected with probability p)
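The M-step update can be pictured as relative-frequency re-estimation over the most probable parse trees from the E step: a schema's selection probability is how often it was used divided by how often its head was expanded. The nested-tuple tree encoding below is an illustrative assumption:

```python
from collections import defaultdict

def m_step(parse_trees):
    """Re-estimate schema selection probabilities from parse trees.
    A tree is a nested tuple (head, schema_id, child, ...); leaves are
    primitive action names (plain strings)."""
    used = defaultdict(int)      # (head, schema_id) -> times selected
    expanded = defaultdict(int)  # head -> times expanded by any schema
    def walk(node):
        if isinstance(node, str):          # primitive action: nothing to count
            return
        head, schema = node[0], node[1]
        used[(head, schema)] += 1
        expanded[head] += 1
        for child in node[2:]:
            walk(child)
    for tree in parse_trees:
        walk(tree)
    return {(h, s): c / expanded[h] for (h, s), c in used.items()}
```

If S was parsed twice with schema s1 and once with s2, the update gives s1 probability 2/3 and s2 probability 1/3; schemas whose counts go to zero can then be dropped as redundant.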
EVALUATION
Ideal: user studies (too hard). Our approach:
  - Assume H* represents the user's preferences
  - Generate observed plans O using H*  (H* → O)
  - Learn Hl from O  (O → Hl)
  - Compare H* and Hl via their induced distributions (H* → T*, Hl → Tl)
Syntactic similarity is not important; only the induced distribution is.
Use the KL divergence between the distributions T* and Tl, which measures the distance between them.
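The comparison itself is a one-liner; a small sketch, where the eps guard and the example distributions are illustrative assumptions:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two plan distributions, each a dict mapping a plan
    (tuple of actions) to its probability. eps guards against plans to which
    the learned model assigns zero probability."""
    return sum(pr * math.log(pr / max(q.get(plan, 0.0), eps))
               for plan, pr in p.items() if pr > 0.0)

# distribution induced by a target model H* vs. one induced by a learned Hl
p_true = {("Buyticket", "Getin", "Getout"): 0.8,
          ("Getin", "Buyticket", "Getout"): 0.2}
p_learned = {("Buyticket", "Getin", "Getout"): 0.7,
             ("Getin", "Buyticket", "Getout"): 0.3}
print(round(kl_divergence(p_true, p_learned), 4))  # → 0.0257
```

A divergence of 0 means Hl reproduces the target distribution exactly, regardless of how different the two schema sets look syntactically.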
Domains: randomly generated domains, Logistics Planning, Gold Miner
(Diagram: H* generates plans P1, P2, … Pn, which the Learner turns into Hl.)
RATE OF LEARNING AND CONCISENESS
Rate of learning: more training plans yield better schemas.
Conciseness:
  - Small domains: only 1 or 2 extra non-primitive actions
  - Large domains: many more non-primitive actions
  - Suggests refining structure learning

Randomly Generated Domains
EFFECTIVENESS OF EM
• Compare greedy (pre-EM) schemas with learned schemas
• The EM step is very effective in capturing user preferences
Randomly Generated Domains
“BENCHMARK” DOMAINS
Logistics Planning:
  H*: move by plane or truck; prefer plane; prefer fewer steps
  KL divergence: 0.04
  Recovers plane > truck and fewer steps > more steps

Gold Miner:
  H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
  KL divergence: 0.52
  Reproduces the basic strategy
CONCLUSIONS & EXTENSIONS
Learn user plan preferences: learned HTNs capture preferences rather than domain abstractions.
Evaluate predictive power: compare distributions rather than structure.
Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car.
"Learning user plan preferences obfuscated by feasibility constraints." ICAPS'09