Reinforcement Learning: A Beginner's Tutorial
DESCRIPTION
This is a presentation of a Reinforcement Learning tutorial for beginners which I worked on.
TRANSCRIPT
REINFORCEMENT LEARNING
A Beginner's Tutorial
By: Omar Enayet
(Presentation Version)
The Problem
Agent-Environment Interface
Environment Model
Goals & Rewards
Returns
Credit-Assignment Problem
Markov Decision Process
An MDP is defined by ⟨S, A, p, r, γ⟩:
S – set of states of the environment
A(s) – set of actions possible in state s
p^a_{ss'} – probability of transition from s to s' when executing a
r^a_s – expected reward when executing a in s
γ – discount rate for expected reward
Assumption: discrete time t = 0, 1, 2, . . .
The interaction produces a trajectory: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, . . .
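The agent-environment loop above can be sketched in code. The two-state MDP below is a hypothetical example (the transition table P, reward table R, and state/action names are illustrative, not from the slides); it shows sampling s' from p^a_{ss'} and accumulating the discounted return.

```python
import random

# Hypothetical two-state MDP: P[s][a] -> list of (next_state, probability),
# R[s][a] -> expected reward r^a_s. All numbers are illustrative.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.9  # discount rate

def step(s, a):
    """Sample s' from p(s'|s,a) and return (s', reward)."""
    next_states, probs = zip(*P[s][a])
    s_next = random.choices(next_states, weights=probs)[0]
    return s_next, R[s][a]

# Discounted return G_t = r_{t+1} + gamma*r_{t+2} + ... over a short trajectory
s, G, discount = 0, 0.0, 1.0
for t in range(10):
    a = random.choice([0, 1])   # a_t (random policy for illustration)
    s, r = step(s, a)           # yields r_{t+1}, s_{t+1}
    G += discount * r
    discount *= GAMMA
```

The same P/R tables are reused by the later sketches in these notes.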
Value Functions
Optimal Value Functions
Exploration-Exploitation Problem
Policies
Elementary Solution Methods
Dynamic Programming
Perfect Model
Bootstrapping
Generalized Policy Iteration
Efficiency of DP
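Dynamic programming's reliance on a perfect model can be made concrete with a short value-iteration sketch. The two-state MDP tables are hypothetical (same illustrative shape as the earlier example); the loop bootstraps V(s) from the current estimates of V(s') until the updates fall below a threshold.

```python
# Hypothetical two-state MDP with a perfect model (illustrative numbers).
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.9

def value_iteration(theta=1e-8):
    """Sweep V(s) <- max_a [ r(s,a) + gamma * sum_s' p(s'|s,a) V(s') ]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(R[s][a] + GAMMA * sum(p * V[ns] for ns, p in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v          # bootstrapping: new estimate uses old estimates
        if delta < theta:
            return V
```

Each sweep touches every state, which is why DP becomes expensive for large state spaces.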
Monte-Carlo Methods
Episodic Return
Advantages over DP
• No model required
• Works from simulation, or from only part of a model
• Can focus on a small subset of states
• Less harmed by violations of the Markov property
First-Visit vs. Every-Visit
On-Policy vs. Off-Policy
Action-value instead of State-value
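A minimal first-visit Monte Carlo prediction sketch, using a hypothetical episodic chain (states 0 → 1 → terminal, reward 1 on the final step, γ = 1; all details illustrative): episodes are generated, episodic returns are computed backwards, and only the return following the first visit to each state is averaged.

```python
GAMMA = 1.0  # undiscounted episodic task

def generate_episode():
    """Hypothetical episode: list of (state, reward received after leaving it)."""
    return [(0, 0.0), (1, 1.0)]   # 0 -> 1 -> terminal, reward 1 at the end

def first_visit_mc(num_episodes=100):
    returns = {0: [], 1: []}
    for _ in range(num_episodes):
        episode = generate_episode()
        G = 0.0
        first_visit_G = {}
        # Walk backwards so G accumulates the return G_t at each step;
        # earlier visits overwrite later ones, keeping the first-visit return.
        for s, r in reversed(episode):
            G = r + GAMMA * G
            first_visit_G[s] = G
        for s, G_s in first_visit_G.items():
            returns[s].append(G_s)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

Every-visit MC would instead append G_t for every occurrence of s in the episode, not just the first.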
Temporal-Difference Learning
Advantages of TD Learning
SARSA (On-Policy)
Q-Learning (Off-Policy)
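The SARSA and Q-Learning update rules can be sketched side by side (step size, discount, and the tabular Q layout are illustrative assumptions). The off-policy Q-Learning target uses max over next actions regardless of what the behavior policy actually does; on-policy SARSA uses the action actually taken next.

```python
ALPHA, GAMMA = 0.1, 0.9  # illustrative step size and discount rate

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: target bootstraps from the greedy next action."""
    target = r + GAMMA * max(Q[s_next].values())
    Q[s][a] += ALPHA * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: target bootstraps from the action a_{t+1} actually taken."""
    target = r + GAMMA * Q[s_next][a_next]
    Q[s][a] += ALPHA * (target - Q[s][a])
```

Both are temporal-difference methods: each update moves Q(s,a) toward a target built from the immediate reward plus a bootstrapped estimate, rather than waiting for the episodic return.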
Actor-Critic Methods (On-Policy)
R-Learning (Off-Policy): maximizes average expected reward per time-step
Eligibility Traces
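A one-step sketch of TD(λ) with accumulating eligibility traces, for a small discrete state space (ALPHA, GAMMA, LAM, and the dict-based tables are illustrative assumptions): after each transition, every state's value is nudged by the TD error in proportion to its trace, which then decays.

```python
ALPHA, GAMMA, LAM = 0.1, 0.9, 0.8  # illustrative step size, discount, trace decay

def td_lambda_step(V, e, s, r, s_next):
    """One TD(lambda) transition with accumulating traces."""
    delta = r + GAMMA * V[s_next] - V[s]   # TD error for this transition
    e[s] += 1.0                            # accumulating trace for visited state
    for x in V:
        V[x] += ALPHA * delta * e[x]       # credit propagates to traced states
        e[x] *= GAMMA * LAM                # traces decay every step
```

Traces bridge TD and Monte Carlo: λ = 0 recovers one-step TD, while λ = 1 spreads credit over the whole episode, which is one answer to the credit-assignment problem mentioned earlier.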
REFERENCES
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press (A Bradford Book), 1998.
Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates - Monte Carlo - 2003
SLIDES FOR READING WITH: Omar Enayet - Reinforcement Learning: A Beginner's Tutorial - 2009