real-world behavior is hierarchical -...
TRANSCRIPT
![Page 1: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/1.jpg)
Real-world behavior is hierarchicalHierarchical RL: W
hat is it?
1. set water temp
2. get wet
3. shampoo
4. soap
5. turn off water
6. dry off
add hot
success
add cold
wait 5sec
simplified control, disambiguation, encapsulation
1. pour coffee
2. add sugar
3. add milk
4. stir
![Page 2: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/2.jpg)
Hierarchical Reinforcement Learning
• Exploits domain structure to facilitate learning– Policy constraints– State abstraction
• Paradigms: Options, HAMs, MaxQ• MaxQ task hierarchy
– Directed acyclic graph of subtasks– Leaves are the primitive MDP actions
• Traditionally, task structure is provided as prior knowledge to the learning agent
![Page 3: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/3.jpg)
![Page 4: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/4.jpg)
S: start G: goalOptions: going to doorsActions: + 2 door options
HRL: a toy exampleHierarchical RL: W
hat is it?
![Page 5: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/5.jpg)
Advantages of HRL1. Faster learning
(mitigates scaling problem)
Hierarchical RL: What is it?
RL: no longer ‘tabula rasa’
2. Transfer of knowledge from previous tasks(generalization, shaping)
![Page 6: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/6.jpg)
![Page 7: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/7.jpg)
![Page 8: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/8.jpg)
Solution
![Page 9: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/9.jpg)
Disadvantages (or: the cost) of HRLHierarchical RL: W
hat is it?
1. Need ‘right’ options ‐ how to learn them?2. Suboptimal behavior (“negative transfer”; habits)3. More complex learning/control structure
no free lunches…
![Page 10: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/10.jpg)
Example problem
![Page 11: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/11.jpg)
![Page 12: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/12.jpg)
![Page 13: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/13.jpg)
![Page 14: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/14.jpg)
Semi‐Markov Decision Process
• Generalizes MDPs• Action a takes N steps to complete in s• P(s’,n | a, s), R(s’, N | a, s)• Bellman equation:
![Page 15: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/15.jpg)
![Page 16: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/16.jpg)
Observations
![Page 17: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/17.jpg)
Learning with partial policies
![Page 18: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/18.jpg)
Hierarchies of Abstract Machines (HAM)
![Page 19: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/19.jpg)
![Page 20: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/20.jpg)
![Page 21: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/21.jpg)
Learn policies for a given set of sub‐tasks
![Page 22: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/22.jpg)
Learning hierarchical sub‐tasks
![Page 23: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/23.jpg)
![Page 24: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/24.jpg)
Example
Locally optimal Optimal for the entire task
![Page 25: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/25.jpg)
Taxi Domain• Motivational Example• Reward: ‐1 actions,‐10 illegal, 20 mission.
• 500 states• Task Graph:
![Page 26: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/26.jpg)
HSMQ Alg. (Task Decomposition)
![Page 27: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/27.jpg)
MAXQ
• Break original MDP into multiple sub‐MDP’s• Each sub‐MDP is treated as a temporally extended action
• Define a hierarchy of sub‐MDP’s (sub‐tasks)
• Each sub‐task Mi defined by:– T = Set of terminal states– Ai = Set of child actions (may be other sub‐tasks)– R’i = Local reward function
![Page 28: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/28.jpg)
MAXQ Alg. (Value Fun. Decomposition)
• Want to obtain some sharing (compactness) in the representation of the value function.
• Re‐write Q(p, s, a) as
where V(a, s) is the expected total reward while executing action a, and C(p, s, a) is the expected reward of completing parent task pafter a has returned
![Page 29: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/29.jpg)
Hierarchical Structure• MDP decomposed in task M0, … , Mn
• Q for the subtask i
![Page 30: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/30.jpg)
Value Decomposition
![Page 31: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/31.jpg)
MAXQ Alg. • An example
![Page 32: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/32.jpg)
Value Decomposition
• The value function can be decomposed as follows
![Page 33: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/33.jpg)
MAXQ Alg. (cont’d)
![Page 34: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/34.jpg)
MAXQ Alg. (cont’d)
![Page 35: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/35.jpg)
State Abstraction
Three fundamental forms• Irrelevant variablese.g. passenger location is irrelevant for the navigate and put subtasks and it thus could be ignored.
• Funnel abstractionA funnel action is an action that causes a larger number of initial states to be mapped into a small number of resulting states. E.g., the navigate(t) action maps any state into a state where the taxi is at location t. This means the completion cost is independent of the location of the taxi—it is the same for all initial locations of the taxi.
![Page 36: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/36.jpg)
State Abstraction (cont’d)
• Structure constraints‐ E.g. if a task is terminated in a state s, then there is no need to
represent its completion cost in that state
‐ Also, in some states, the termination predicate of the child task implies the termination predicate of the parent task
Effect‐ reduce the amount memory to represent the Q‐function. 14,000 q values required for flat Q‐learning 3,000 for HSMQ (with the irrelevant‐variable abstraction632 for C() and V() in MAXQ
‐ learning faster
![Page 37: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/37.jpg)
State Abstraction (cont’d)
![Page 38: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/38.jpg)
Wargus Resource‐Gathering Domain
![Page 39: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/39.jpg)
Induced Wargus HierarchyRoot
Harvest WoodHarvest Gold
Get Gold Get Wood
Goto(loc)
Mine Gold Chop WoodGDeposit
Put Gold Put Wood
WGoto(townhall)GGoto(goldmine) WGoto(forest)GGoto(townhall)
WDeposit
![Page 40: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/40.jpg)
Induced Abstraction & TerminationTask Name State Abstraction Termination ConditionRoot req.gold, req.wood req.gold = 1 && req.wood = 1
Harvest Gold req.gold, agent.resource, region.townhall req.gold = 1
Get Gold agent.resource, region.goldmine agent.resource = gold
Put Gold req.gold, agent.resource, region.townhall agent.resource = 0
GGoto(goldmine) agent.x, agent.y agent.resource = 0 && region.goldmine = 1
GGoto(townhall) agent.x, agent.y req.gold = 0 && agent.resource = gold && region.townhall = 1
Harvest Wood req.wood, agent.resource, region.townhall req.wood = 1
Get Wood agent.resource, region.forest agent.resource = wood
Put Wood req.wood, agent.resource, region.townhall agent.resource = 0
WGoto(forest) agent.x, agent.y agent.resource = 0 && region.forest = 1
WGoto(townhall) agent.x, agent.y req.wood = 0 && agent.resource = wood && region.townhall = 1
Mine Gold agent.resource, region.goldmine NA
Chop Wood agent.resource, region.forest NA
GDeposit req.gold, agent.resource, region.townhall NA
WDeposit req.wood, agent.resource, region.townhall NA
Goto(loc) agent.x, agent.y NA
Note that because each subtask has a unique terminal state, Result Distribution Irrelevance applies
![Page 41: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/41.jpg)
Claims
• The resulting hierarchy is unique– Does not depend on the order in which goals and trajectory sequences are analyzed
• All state abstractions are safe– There exists a hierarchical policy within the induced hierarchy that will
reproduce the observed trajectory– Extend MaxQ Node Irrelevance to the induced structure
• Learned hierarchical structure is “locally optimal”– No local change in the trajectory segmentation can improve the state abstractions (very weak)
![Page 42: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/42.jpg)
Experimental Setup
• Randomly generate pairs of source‐target resource‐gathering maps in Wargus
• Learn the optimal policy in source
• Induce task hierarchy from a single (near) optimal trajectory
• Transfer this hierarchical structure to the MaxQ value‐function learner for target
• Compare to direct Q learning, and MaxQ learning on a manually engineered hierarchy within target
![Page 43: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/43.jpg)
Hand‐Built Wargus Hierarchy
Root
Get Gold Get Wood
Goto(loc)Mine Gold Chop Wood Deposit
GWDeposit
![Page 44: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/44.jpg)
Hand‐Built Abstractions & Terminations
Task Name State Abstraction Termination Condition
Root req.gold, req.wood, agent.resource req.gold = 1 && req.wood = 1
Harvest Gold agent.resource, region.goldmine agent.resource ≠ 0
Harvest Wood agent.resource, region.forest agent.resource ≠ 0
GWDeposit req.gold, req.wood, agent.resource, region.townhall agent.resource = 0
Mine Gold region.goldmine NA
Chop Wood region.forest NA
Deposit req.gold, req.wood, agent.resource, region.townhall NA
Goto(loc) agent.x, agent.y NA
![Page 45: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/45.jpg)
Results: WargusWargus domain: 7 reps
-1000
0
1000
2000
3000
4000
5000
6000
7000
8000
0 10 20 30 40 50 60 70 80 90 100Episode
Tota
l Dur
atio
n
Induced (MAXQ)Hand-engineered (MAXQ)No transfer (Q)
![Page 46: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/46.jpg)
Limitations
• Recursively optimal not necessarily optimal• Model‐free Q‐learningModel‐based algorithms (that is, algorithms that try to learn P(s’|s,a) and R(s’|s,a)) are generally much more efficient because they remember past experience rather than having to re‐experience it.
![Page 47: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/47.jpg)
Planning, Acting, Learning
• On‐line planning• RL Learning• Dyna‐Q
starting states and actions for the simulated experiences generated by the model
RL methods to the simulated experiences just as if they had really happened
The reinforcement learning method is thus the "final common path" for both learning and planning
![Page 48: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/48.jpg)
Planning, Acting, Learning
• Dyna‐Q alg.
![Page 49: Real-world behavior is hierarchical - unina.itwpage.unina.it/alberto.finzi/didattica/SGRB/materiale/HRL.pdf · Real-world behavior is hierarchical Hierarchical RL: What is it? 1](https://reader030.vdocuments.us/reader030/viewer/2022040411/5ed400798d46b66d2263406a/html5/thumbnails/49.jpg)
References and Further Reading• Sutton, R., Barto, A., (2000) Reinforcement Learning: an
Introduction, The MIT Presshttp://www.cs.ualberta.ca/~sutton/book/the‐book.html
• Kaelbling, L., Littman, M., Moore, A., (1996) Reinforcement Learning: a Survey, Journal of Artificial Intelligence Research, 4:237‐285
• Barto, A., Mahadevan, S., (2003) Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems: Theory and Applications, 13(4):41‐77