![Page 1: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/1.jpg)
Machine Learning
Chapter 13. Reinforcement
Learning
Tom M. Mitchell
![Page 2: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/2.jpg)
2
Control Learning
Consider learning to choose actions, e.g., Robot learning to dock on battery charger Learning to choose actions to optimize factory output Learning to play Backgammon
Note several problem characteristics: Delayed reward Opportunity for active exploration Possibility that state only partially observable Possible need to learn multiple tasks with same
sensors/effectors
![Page 3: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/3.jpg)
3
One Example: TD-Gammon
Learn to play Backgammon
Immediate reward +100 if win -100 if lose 0 for all other states
Trained by playing 1.5 million games against itself
Now approximately equal to best human player
![Page 4: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/4.jpg)
4
Reinforcement Learning Problem
![Page 5: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/5.jpg)
5
Markov Decision Processes
Assume finite set of states S set of actions A at each discrete time agent observes state st S and chooses ac
tion at A then receives immediate reward rt and state changes to st+1
Markov assumption : st+1 = (st, at ) and rt = r(st, at )– i.e., rt and st+1 depend only on current state and action– functions and r may be nondeterministic– functions and r not necessarily known to agent
![Page 6: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/6.jpg)
6
Agent's Learning Task
![Page 7: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/7.jpg)
7
Value Function
![Page 8: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/8.jpg)
8
![Page 9: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/9.jpg)
9
What to Learn
![Page 10: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/10.jpg)
10
Q Function
![Page 11: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/11.jpg)
11
Training Rule to Learn Q
![Page 12: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/12.jpg)
12
Q Learning for Deterministic Worlds
![Page 13: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/13.jpg)
13
![Page 14: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/14.jpg)
14
![Page 15: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/15.jpg)
15
Nondeterministic Case
![Page 16: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/16.jpg)
16
Nondeterministic Case(Cont’)
![Page 17: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/17.jpg)
17
Temporal Difference Learning
![Page 18: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/18.jpg)
18
Temporal Difference Learning(Cont’)
![Page 19: Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649e415503460f94b32a61/html5/thumbnails/19.jpg)
19
Subtleties and Ongoing Research