Chapter 1: What is Reinforcement Learning? · 2019-06-11



Post on 10-Jun-2020


Page 1: Chapter 1: What is Reinforcement Learning? · 2019-06-11 · Figure: learning performance. Solved after 205 episodes. Best 100-episode average reward was 18.42 ± 0.76. (DoomDefendLine-v0

Chapter 1: What is Reinforcement Learning?


Chapter 2: OpenAI Gym


Chapter 3: Deep Learning with PyTorch


Chapter 4: The Cross-Entropy Method


Chapter 5: Tabular Learning and the Bellman Equation


Chapter 6: Deep Q-Networks


Chapter 7: DQN Extensions


Chapter 8: Stocks Trading Using RL


Chapter 9: Policy Gradients – An Alternative


Chapter 10: The Actor-Critic Method


Chapter 11: Asynchronous Advantage Actor-Critic


Chapter 12: Chatbots Training with RL


Chapter 13: Web Navigation


Chapter 14: Continuous Action Space


Chapter 15: Trust Regions – TRPO, PPO, and ACKTR


Chapter 16: Black-Box Optimization in RL


Chapter 17: Beyond Model-Free – Imagination


Chapter 18: AlphaGo Zero
