
Towards Jumpy Planning
ICML 2019 MBRL Workshop
Poster: suriyasingh.github.io/towards_jumpy_planning_icml_mbrl_poster.pdf

Akilesh B 1,2, Suriya Singh 2,3, Anirudh Goyal 1,2, Alexander Neitz 4, Aaron Courville 1,2
1 Université de Montréal, 2 Mila, 3 Polytechnique Montréal, 4 MPI for Intelligent Systems
email: [email protected]

1. Overview

● Model-free RL: high sample inefficiency and ignorance of the environment dynamics.

● Model-based RL at the scale of individual time steps: compounding errors and high computational requirements.

Hierarchical Reinforcement Learning frameworks [1, 2] address these limitations of classic RL through sub-tasks and abstract actions.

This work: use a model-based planner together with a goal-conditioned policy trained with model-free learning. The planner operates at a higher level of abstraction, i.e., over decision states, and model-free RL is used to act between the decision states.

2. Jumpy Planning

Decision states [3] (a.k.a. subgoals): states where the agent's policy has high entropy. We fix the entropy threshold such that only a tiny fraction of states are chosen as decision states.
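As an illustration of this entropy criterion, here is a minimal sketch of how decision states could be flagged; the `policy(state)` interface returning action probabilities and the quantile-based choice of the threshold are assumptions for this sketch, not details from the poster.

```python
import numpy as np

def policy_entropy(action_probs):
    """Shannon entropy of the policy's action distribution at one state."""
    p = np.asarray(action_probs, dtype=float)
    p = p[p > 0]  # avoid log(0)
    return -np.sum(p * np.log(p))

def find_decision_states(states, policy, quantile=0.95):
    """Flag states whose policy entropy falls in the top (1 - quantile) fraction.

    `policy(state)` is assumed to return a vector of action probabilities.
    A high quantile keeps only a tiny fraction of states, matching the
    poster's criterion for decision states (subgoals).
    """
    entropies = np.array([policy_entropy(policy(s)) for s in states])
    tau = np.quantile(entropies, quantile)  # entropy threshold
    return [s for s, h in zip(states, entropies) if h >= tau]
```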

3. Jumpy Planning with Dynamical Models

[Figure: local planners successively query the dynamical model from the current state in the BFS direction towards the goal; each query returns (a) the action at the root node, (b) the location of the subgoal, and (c) a BFS summary.]

Dynamical models: taking the argmax or sampling at each step, successively query M in BFS fashion until the goal state is encountered or the maximum search depth is reached.
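A minimal sketch of this BFS-style query, under the assumption that `model(state)` returns candidate `(action, next decision state)` pairs obtained by taking the argmax or sampling from M; the function names, the hashable-state requirement, and the goal test are illustrative, not the poster's implementation.

```python
from collections import deque

def bfs_query(model, start, is_goal, max_depth=5):
    """Breadth-first search over jumpy transitions predicted by `model`.

    Returns (action_at_root, first_subgoal, path_summary) for the first
    path that reaches the goal, or None if max_depth is exhausted.
    `model(state)` is assumed to yield (action, next_decision_state) pairs,
    and states are assumed hashable.
    """
    frontier = deque([(start, [])])  # (state, path of (action, state) pairs)
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if len(path) >= max_depth:
            continue
        for action, nxt in model(state):
            new_path = path + [(action, nxt)]
            if is_goal(nxt):
                root_action = new_path[0][0]          # (a) action at root node
                subgoal = new_path[0][1]              # (b) location of subgoal
                summary = [s for _, s in new_path]    # (c) BFS summary
                return root_action, subgoal, summary
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, new_path))
    return None  # goal not found within max_depth
```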

Jumpy dataset: funnel all intermediate states leading to the same decision state (subgoal) or goal.
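A hedged sketch of this funneling step, assuming trajectories are plain lists of states and that `is_decision_state` comes from the entropy criterion sketched earlier; the exact form of the stored transitions is an assumption.

```python
def build_jumpy_dataset(trajectory, is_decision_state, is_goal):
    """Collapse a trajectory [s_0, s_1, ...] into jumpy transitions.

    All intermediate states between two consecutive decision states (or the
    goal) are funneled into a single (previous anchor -> anchor) transition,
    so the resulting dataset only contains jumps between decision states.
    """
    jumpy_transitions = []
    prev_anchor = trajectory[0]
    for state in trajectory[1:]:
        if is_decision_state(state) or is_goal(state):
            jumpy_transitions.append((prev_anchor, state))
            prev_anchor = state
    return jumpy_transitions
```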

The result of the query is then passed to the agent to take an action at the current decision state.
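To illustrate how the query result could be consumed, a speculative sketch of the overall control loop, reusing `bfs_query` from above: the planner is queried at decision states, and the goal-conditioned model-free policy acts towards the most recent subgoal in between. The `env` and `policy` interfaces follow a generic Gym-style API and are assumptions.

```python
def run_episode(env, policy, model, is_decision_state, is_goal, final_goal,
                max_depth=5):
    """Plan jumpily at decision states; act model-free in between."""
    state = env.reset()
    subgoal = final_goal
    done = False
    while not done:
        if is_decision_state(state):
            # At a decision state, query the jumpy planner and act on its result.
            result = bfs_query(model, state, is_goal, max_depth)
            if result is not None:
                root_action, subgoal, _summary = result
                state, _reward, done, _info = env.step(root_action)
                continue
        # Between decision states, the goal-conditioned model-free policy
        # steers the agent towards the most recent subgoal.
        action = policy.act(state, goal=subgoal)
        state, _reward, done, _info = env.step(action)
```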

4. Results

● Performance comparisons
● Generalisation performance: dynamical models vs. fixed time-step models
● Funneling

● Coverage = #unique states visited / #reachable states

References
1) Sutton et al. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence Journal.
2) Le et al. Hierarchical Imitation and Reinforcement Learning. ICML 2018.
3) Goyal et al. InfoBot: Transfer and Exploration via the Information Bottleneck. ICLR 2019.