LEARNING PROBABILISTIC HIERARCHICAL TASK NETWORKS TO CAPTURE USER PREFERENCES
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281
[email protected], [email protected], [email protected]
Thanks to William Cushing
A riddle for you:
What is the magic idea in planning that is at once more efficient in practice and of higher worst-case complexity than vanilla planning?
TWO TALES OF HTN PLANNING
- Abstraction: efficiency, top-down (learning: most prior work)
- Preference handling: quality, bottom-up (learning: our work)
LEARNING USER PLAN PREFERENCES

Hitchhike? No way!

Pbus:   Getin(bus, source), Buyticket(bus), Getout(bus, dest)        [2]
Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest)  [8]
Phike:  Hitchhike(source, dest)                                      [0]
LEARNING USER PREFERENCES AS PHTNS
Given a set O of plans executed by the user, find a generative model Hl:

Hl = argmax_H p(O | H)

Probabilistic Hierarchical Task Networks (pHTNs):

S  → A1 B1  (0.2)
S  → A2 B2  (0.8)
B1 → A2 A3  (1.0)
B2 → A1 A3  (1.0)
A1 → Getin      (1.0)
A2 → Buyticket  (1.0)
A3 → Getout     (1.0)
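Read operationally, such a schema set is a stochastic generator of plans. A minimal Python sketch of that reading (the dict encoding and function name are illustrative assumptions, not the talk's code):

```python
import random

# pHTN schemas read as a probabilistic context-free grammar:
# each non-terminal maps to a list of (probability, right-hand side) schemas.
GRAMMAR = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Expand non-terminals recursively until only primitive actions remain."""
    if symbol not in GRAMMAR:              # primitive action: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in GRAMMAR[symbol]:      # roulette-wheel schema selection
        acc += prob
        if r <= acc:
            return [a for part in rhs for a in sample_plan(part)]
    return [a for part in GRAMMAR[symbol][-1][1] for a in sample_plan(part)]

plan = sample_plan()  # one of the two orderings, e.g. ['Buyticket', 'Getin', 'Getout']
```

With probability 0.8 this emits the user's preferred train-style ordering (buy ticket before getting in) and with probability 0.2 the bus-style ordering.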
LEARNING pHTNs
HTNs can be seen as providing a grammar of desired solutions:
  Actions      ↔ Words
  Plans        ↔ Sentences
  HTNs         ↔ Grammar
  HTN learning ↔ Grammar induction
pHTN learning by probabilistic context-free grammar (pCFG) induction. Assumptions: actions are parameter-less and unconditional.
S  → A1 B1  (0.2)
S  → A2 B2  (0.8)
B1 → A2 A3  (1.0)
B2 → A1 A3  (1.0)
A1 → Getin      (1.0)
A2 → Buyticket  (1.0)
A3 → Getout     (1.0)
A TWO-STEP ALGORITHM
• Greedy Structure Hypothesizer: hypothesizes the schema structure
• Expectation-Maximization (EM) Phase: refines schema probabilities and removes redundant schemas

Generalizes the Inside-Outside algorithm (Lari & Young, 1990)
GREEDY STRUCTURE HYPOTHESIZER
Structure learning:
  - Bottom-up
  - Prefers recursive schemas to non-recursive ones
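One simple way to picture bottom-up, recursion-preferring structure hypothesizing is greedy pair-chunking: repeatedly introduce a schema for the most frequent adjacent pair of symbols, favoring a recursive candidate when a symbol repeats itself. The sketch below illustrates that idea under those assumptions; it is not the authors' exact algorithm:

```python
from collections import Counter

def hypothesize_structure(plans):
    """Greedily chunk frequent adjacent pairs into new non-terminals."""
    plans = [list(p) for p in plans]
    rules, next_id = {}, 0
    while True:
        pairs = Counter()
        for p in plans:
            pairs.update(zip(p, p[1:]))
        if not pairs:
            break
        # prefer recursive candidates (x, x) when they repeat
        recursive = {pr: c for pr, c in pairs.items() if pr[0] == pr[1] and c >= 2}
        (a, b), count = max((recursive or pairs).items(), key=lambda kv: kv[1])
        if count < 2:                       # nothing repeats: stop chunking
            break
        nt = f"B{next_id}"; next_id += 1
        rules[nt] = (a, b)                  # new schema: nt -> a b
        plans = [replace_pair(p, a, b, nt) for p in plans]
    return rules

def replace_pair(seq, a, b, nt):
    """Rewrite every adjacent occurrence of (a, b) in seq as nt."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(nt); i += 2
        else:
            out.append(seq[i]); i += 1
    return out
```

For instance, three copies of the plan Getin, Buyticket, Getout yield two hypothesized schemas covering the whole plan.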
EM PHASE
E Step: plan parse tree computation (find the most probable parse tree)
M Step: selection probabilities update

s: ai → aj ak   (selected with probability p)
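The M-step update can be pictured as relative-frequency re-estimation over the most probable parse trees from the E step: a schema's selection probability is how often it was used divided by how often its head was expanded. The nested-tuple tree encoding below is an illustrative assumption:

```python
from collections import defaultdict

def m_step(parse_trees):
    """Re-estimate schema selection probabilities from parse trees.
    A tree is a nested tuple (head, schema_id, child, ...); leaves are
    primitive action names (plain strings)."""
    used = defaultdict(int)      # (head, schema_id) -> times selected
    expanded = defaultdict(int)  # head -> times expanded by any schema
    def walk(node):
        if isinstance(node, str):          # primitive action: nothing to count
            return
        head, schema = node[0], node[1]
        used[(head, schema)] += 1
        expanded[head] += 1
        for child in node[2:]:
            walk(child)
    for tree in parse_trees:
        walk(tree)
    return {(h, s): c / expanded[h] for (h, s), c in used.items()}
```

If S was parsed twice with schema s1 and once with s2, the update gives s1 probability 2/3 and s2 probability 1/3; schemas whose counts go to zero can then be dropped as redundant.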
EVALUATION
Ideal: user studies (too hard). Our approach:
  - Assume H* represents the user's preferences
  - Generate observed plans O using H*  (H* → O)
  - Learn Hl from O  (O → Hl)
  - Compare H* and Hl via their induced distributions (H* → T*, Hl → Tl)
Syntactic similarity is not important; only the induced distribution is.
Use the KL divergence between the distributions T* and Tl, which measures the distance between them.
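The comparison itself is a one-liner; a small sketch, where the eps guard and the example distributions are illustrative assumptions:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two plan distributions, each a dict mapping a plan
    (tuple of actions) to its probability. eps guards against plans to which
    the learned model assigns zero probability."""
    return sum(pr * math.log(pr / max(q.get(plan, 0.0), eps))
               for plan, pr in p.items() if pr > 0.0)

# distribution induced by a target model H* vs. one induced by a learned Hl
p_true = {("Buyticket", "Getin", "Getout"): 0.8,
          ("Getin", "Buyticket", "Getout"): 0.2}
p_learned = {("Buyticket", "Getin", "Getout"): 0.7,
             ("Getin", "Buyticket", "Getout"): 0.3}
print(round(kl_divergence(p_true, p_learned), 4))  # → 0.0257
```

A divergence of 0 means Hl reproduces the target distribution exactly, regardless of how different the two schema sets look syntactically.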
Domains: randomly generated domains, Logistics Planning, Gold Miner
(Diagram: H* generates plans P1, P2, … Pn, which the Learner turns into Hl.)
RATE OF LEARNING AND CONCISENESS
Rate of learning: more training plans yield better schemas.
Conciseness:
  - Small domains: only 1 or 2 extra non-primitive actions
  - Large domains: many more non-primitive actions
  - Suggests refining structure learning

Randomly Generated Domains
EFFECTIVENESS OF EM
• Compare greedy (pre-EM) schemas with learned schemas
• The EM step is very effective in capturing user preferences
Randomly Generated Domains
“BENCHMARK” DOMAINS
Logistics Planning:
  H*: move by plane or truck; prefer plane; prefer fewer steps
  KL divergence: 0.04
  Recovers plane > truck and fewer steps > more steps

Gold Miner:
  H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
  KL divergence: 0.52
  Reproduces the basic strategy
CONCLUSIONS & EXTENSIONS
Learn user plan preferences: learned HTNs capture preferences rather than domain abstractions.
Evaluate predictive power: compare distributions rather than structure.
Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car.
"Learning user plan preferences obfuscated by feasibility constraints." ICAPS'09