
Machine Learning

Chapter 13. Reinforcement Learning

Tom M. Mitchell


Control Learning

Consider learning to choose actions, e.g.,
- Robot learning to dock on battery charger
- Learning to choose actions to optimize factory output
- Learning to play Backgammon

Note several problem characteristics:
- Delayed reward
- Opportunity for active exploration
- Possibility that state is only partially observable
- Possible need to learn multiple tasks with the same sensors/effectors


One Example: TD-Gammon

Learn to play Backgammon

Immediate reward:
- +100 if win
- -100 if lose
- 0 for all other states

Trained by playing 1.5 million games against itself

Now approximately equal to the best human players


Reinforcement Learning Problem


Markov Decision Processes

Assume:
- finite set of states S, set of actions A
- at each discrete time t, agent observes state s_t ∈ S and chooses action a_t ∈ A
- then receives immediate reward r_t, and state changes to s_{t+1}

Markov assumption: s_{t+1} = δ(s_t, a_t) and r_t = r(s_t, a_t)
- i.e., r_t and s_{t+1} depend only on the current state and action
- functions δ and r may be nondeterministic
- functions δ and r not necessarily known to agent
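The MDP formalism above can be sketched in code. A minimal example, assuming a hypothetical 3-state chain world with hand-written transition and reward functions (the names `delta` and `reward` mirror the δ and r of the slide, but the world itself is illustrative, not from the chapter):

```python
# Deterministic MDP sketch: states, actions, transition function delta,
# and reward function r, matching the formalism s_{t+1} = delta(s_t, a_t)
# and r_t = r(s_t, a_t). The 3-state chain is an illustrative assumption.

STATES = ["s0", "s1", "s2"]
ACTIONS = ["left", "right"]

def delta(s, a):
    """Deterministic transition function: returns s_{t+1}."""
    i = STATES.index(s)
    if a == "right":
        return STATES[min(i + 1, len(STATES) - 1)]
    return STATES[max(i - 1, 0)]

def reward(s, a):
    """Immediate reward r(s, a): +100 for entering the goal state s2."""
    return 100 if delta(s, a) == "s2" and s != "s2" else 0
```

Here both functions are known and deterministic; the learning problem arises precisely when the agent must act without access to `delta` and `reward`.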


Agent's Learning Task
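The body of this slide did not survive extraction. In Mitchell's chapter, the task is stated as: execute actions, observe results, and learn a policy maximizing the expected discounted reward:

```latex
\text{Learn } \pi : S \to A \ \text{ maximizing } \
E\left[\, r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \right],
\qquad 0 \le \gamma < 1
```

where γ is the discount factor weighting immediate rewards over delayed ones.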


Value Function
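This slide's equations were lost in extraction; the standard definitions from the chapter are the discounted cumulative reward of following policy π from state s_t, and the optimal policy as its maximizer:

```latex
V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots
            = \sum_{i=0}^{\infty} \gamma^{i}\, r_{t+i}

\pi^{*} \equiv \arg\max_{\pi} V^{\pi}(s), \quad (\forall s)
```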


What to Learn
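The content of this slide was lost in extraction. The point made in the chapter is that knowing V* would suffice to act optimally,

```latex
\pi^{*}(s) = \arg\max_{a}\left[\, r(s,a) + \gamma\, V^{*}\!\big(\delta(s,a)\big) \right]
```

but evaluating this rule requires knowing the functions δ and r, which the agent may not have. This motivates learning the Q function instead.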


Q Function
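The equations on this slide were lost; the standard definition from the chapter is:

```latex
Q(s,a) \equiv r(s,a) + \gamma\, V^{*}\!\big(\delta(s,a)\big)
\qquad\Rightarrow\qquad
\pi^{*}(s) = \arg\max_{a} Q(s,a)
```

With Q in hand, the agent can choose optimal actions without knowing δ or r at all: it simply compares Q values of the available actions.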


Training Rule to Learn Q
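The training rule on this slide was lost in extraction. Since V*(s) = max_{a'} Q(s, a'), Q can be written recursively, which yields the chapter's iterative update for the table of estimates Q̂:

```latex
% Recursive form (since V^*(s) = \max_{a'} Q(s, a')):
Q(s,a) = r(s,a) + \gamma \max_{a'} Q\big(\delta(s,a),\, a'\big)

% Training rule, applied after observing transition (s, a, r, s'):
\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')
```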


Q Learning for Deterministic Worlds
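The algorithm listing on this slide was lost. A runnable sketch of deterministic Q learning follows: initialize the Q̂ table to zero, then repeatedly act and apply the training rule Q̂(s,a) ← r + γ max_{a'} Q̂(s',a'). The 4-state chain world (actions −1/+1, reward +100 on reaching state 3) is an illustrative assumption, not the grid example from the chapter:

```python
import random
from collections import defaultdict

GOAL = 3     # terminal goal state of the toy chain
GAMMA = 0.9  # discount factor

def step(s, a):
    """Deterministic world model: next state and immediate reward."""
    s2 = min(max(s + a, 0), GOAL)
    r = 100 if s2 == GOAL and s != GOAL else 0
    return s2, r

def q_learn(episodes=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                # Q-hat(s, a), zero-initialised
    for _ in range(episodes):
        s = rng.randrange(GOAL)           # random non-goal start state
        while s != GOAL:
            a = rng.choice([-1, 1])       # explore with random actions
            s2, r = step(s, a)
            # Deterministic training rule: overwrite with the new estimate.
            q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in (-1, 1))
            s = s2
    return q
```

After training, the greedy policy argmax_a Q̂(s, a) moves right everywhere, and the converged values along the chain are 100, 90, 81, i.e. 100·γ^d for distance d from the goal.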


Nondeterministic Case


Nondeterministic Case (cont'd)
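The update rule on these two slides was lost in extraction. When δ and r are nondeterministic, simply overwriting Q̂ no longer converges; the chapter's revised rule blends old and new estimates with a learning rate that decays with the visit count: Q̂_n(s,a) ← (1 − α_n) Q̂_{n−1}(s,a) + α_n [r + γ max_{a'} Q̂_{n−1}(s',a')], where α_n = 1/(1 + visits_n(s,a)). A sketch of that update (the `make_updater` wrapper is an illustrative assumption):

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor

def make_updater(actions):
    """Returns a Q table plus a nondeterministic-world update function
    using the decaying learning rate alpha_n = 1 / (1 + visits_n(s, a))."""
    q = defaultdict(float)
    visits = defaultdict(int)

    def update(s, a, r, s2):
        visits[(s, a)] += 1
        alpha = 1.0 / (1 + visits[(s, a)])
        target = r + GAMMA * max(q[(s2, b)] for b in actions)
        # Blend the old estimate with the new sample instead of overwriting.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        return q[(s, a)]

    return q, update
```

The decaying α averages out the noise in r and δ: each successive sample moves the estimate less, which is what the convergence proof requires.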


Temporal Difference Learning
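The formulas on these slides were lost in extraction. The standard TD(λ) construction from the chapter looks n steps ahead before bootstrapping from Q̂, then blends all the n-step estimates:

```latex
% n-step lookahead estimates:
Q^{(1)}(s_t, a_t) = r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a)

Q^{(n)}(s_t, a_t) = r_t + \gamma r_{t+1} + \cdots + \gamma^{\,n-1} r_{t+n-1}
                  + \gamma^{\,n} \max_{a} \hat{Q}(s_{t+n}, a)

% TD(lambda) blends all n-step estimates:
Q^{\lambda}(s_t, a_t) = (1-\lambda)\left[ Q^{(1)} + \lambda\, Q^{(2)} + \lambda^{2} Q^{(3)} + \cdots \right]
```

Setting λ = 0 recovers the one-step Q learning rule; λ = 1 relies only on observed rewards. TD-Gammon used this style of training.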


Temporal Difference Learning (cont'd)


Subtleties and Ongoing Research
