
Effective Reinforcement Learning for Mobile Robots (Smart, W.D. and Kaelbling, L.P.)


TRANSCRIPT

Page 1

Effective Reinforcement Learning for Mobile Robots

Smart, W.D. and Kaelbling, L.P.

Page 2

Content

Background
Review Q-learning
Reinforcement learning on mobile robots
Learning framework
Experimental results
Conclusion
Discussion

Page 3

Background

Hard to code behaviour efficiently and correctly
Reinforcement learning: tell the robot what to do, not how to do it
How well suited is reinforcement learning for mobile robots?

Page 4

Review Q-learning

Discrete states s and actions a
Learn a value function by observing rewards
– Optimal value function: Q*(s,a) = E[R(s,a) + γ max_a′ Q*(s′,a′)]
– Learn by the update Q(s_t,a_t) ← (1−α) Q(s_t,a_t) + α (r_{t+1} + γ max_a′ Q(s_{t+1},a′)) (see the code sketch below)
The sampling distribution has no effect on the learned policy π*(s) = argmax_a Q*(s,a)
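A minimal tabular sketch of this update rule in Python: the α and γ values mirror the settings used later in the experiments, but the table layout and helper names are illustrative rather than the paper's implementation.

```python
import random
from collections import defaultdict

ALPHA = 0.2    # learning rate α (value used in the experiments)
GAMMA = 0.99   # discount factor γ

# Q-table over discrete (state, action) pairs, initialised to zero.
Q = defaultdict(float)

def q_update(s, a, r, s_next, actions):
    """Q(s,a) <- (1-α) Q(s,a) + α (r + γ max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

def greedy_policy(s, actions):
    """π*(s) = argmax_a Q(s,a), breaking ties at random."""
    best = max(Q[(s, a)] for a in actions)
    return random.choice([a for a in actions if Q[(s, a)] == best])
```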

Page 5

Reinforcement learning on mobile robots

Sparse reward function
– Almost always zero reward R(s,a)
– Non-zero reward only on success or failure
Continuous environment
– HEDGER is used as a function approximator (a simplified sketch of the idea follows below)
– Function approximation can be used safely when it never extrapolates from the data
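HEDGER itself is a locally weighted regression scheme with a check that the query point lies within the hull of the training data; the class below is only a simplified stand-in for that "never extrapolate" idea, using a bounding-box test and k-nearest-neighbour averaging. The class and parameter names are assumptions, not the paper's code.

```python
import numpy as np

class NoExtrapolationRegressor:
    """Toy value approximator that only predicts inside the bounding box of
    the training inputs; outside it, it returns a conservative default.
    (A simplified stand-in for HEDGER's hull check.)"""

    def __init__(self, k=5, default=0.0):
        self.k = k
        self.default = default

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.lo = self.X.min(axis=0)
        self.hi = self.X.max(axis=0)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # Refuse to extrapolate: outside the box of observed data, fall back.
        if np.any(x < self.lo) or np.any(x > self.hi):
            return self.default
        # Inside the box: average of the k nearest training targets.
        dists = np.linalg.norm(self.X - x, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(self.y[nearest].mean())
```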

Page 6

Reinforcement learning on mobile robots

Q-learning can only be successful when a state with positive reward can be found

Sparse reward function and continuous environment cause reward states to be hard to find by trial and error

Solution: show the robot how to find the reward states

Page 7

Learning framework

Split learning into two phases:
– Phase one: actions are controlled by an outside force; the learning algorithm only passively observes
– Phase two: the learning algorithm learns the optimal policy
By ‘showing’ the robot where the interesting states are, learning should be quicker
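A rough sketch of how the two phases could be wired together in Python. The environment interface (reset/step), the teacher and policy callables, and the episode counts are all hypothetical; the update callback would be the Q-learning rule from the earlier slide (or its function-approximation variant).

```python
def run_episode(env, choose_action, update):
    """Run one episode, calling update() on every observed transition."""
    s = env.reset()
    done = False
    while not done:
        a = choose_action(s)
        s_next, r, done = env.step(a)
        update(s, a, r, s_next)   # the learner always observes and updates
        s = s_next

def train(env, teacher_action, policy_action, update,
          phase1_episodes=10, phase2_episodes=50):
    # Phase 1: the teacher (human or hand-coded controller) drives the robot;
    # the learning algorithm only watches the resulting transitions.
    for _ in range(phase1_episodes):
        run_episode(env, teacher_action, update)
    # Phase 2: the learned policy chooses the actions and keeps improving.
    for _ in range(phase2_episodes):
        run_episode(env, policy_action, update)
```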

Page 8

Experimental setup

Two experiments on a B21r mobile robot
– Movement speed is fixed by an outside force
– Rotation speed has to be learned
– Settings: α = 0.2, γ = 0.99 or 0.90

Performance is measured after every 5 runs
– Robot does not learn from these test runs
– Starting position and orientation are similar, not identical

Page 9

Experimental Results: Corridor Following Task

State space:
– distance to end of corridor
– distance to left wall as fraction of corridor width
– angle to target point

Page 10

Experimental Results: Corridor Following Task

Computer controlled teacher
– Rotation speed is a fraction of the angle to the target point
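As an illustration, a hand-coded teacher of this kind might look like the sketch below: rotation speed proportional to the angle to the target point, clipped to the robot's limits. The gain, the clipping limit, and the state field names are assumptions for the example, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class CorridorState:
    dist_to_end: float      # distance to the end of the corridor (m)
    left_fraction: float    # distance to left wall, as a fraction of corridor width
    angle_to_target: float  # angle to the target point (radians)

def computer_teacher(state: CorridorState, gain: float = 0.3,
                     max_rate: float = 1.0) -> float:
    """Rotation speed as a fraction of the angle to the target, clipped."""
    rate = gain * state.angle_to_target
    return max(-max_rate, min(max_rate, rate))
```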

Page 11

Experimental Results: Corridor Following Task

Human controlled teacher
– Different corridor than the computer controlled teacher

Page 12

Experimental Results: Corridor Following Task Results

Decrease in performance after training
– Phase 2 supplies more novel experiences

Sloppy human controller causes faster convergence than the rigid computer controller
– Fewer phase 1 and phase 2 runs needed
– Human controller supplies more varied data

Page 13

Experimental Results: Corridor Following Task Results

Simulated performance without the advantage of teacher examples

Page 14

Experimental Results: Obstacle Avoidance Task

State space:
– direction and distance to obstacles
– direction and distance to target

Page 15

Experimental Results: Obstacle Avoidance Task Results

Human controlled teacher
– Robot starts 3 m from the target, random orientation

Page 16

Experimental Results: Obstacle Avoidance Task Results

Simulation without teacher examples
– No obstacles present; robot must only reach the goal
– Simulated robot starts in the right orientation
– Starting 3 meters from the target: 18.7% of runs reached the target within one week of simulated time, taking 6.54 hours on average

Page 17

Conclusion

Passive observation of appropriate state-action behaviour can speed up Q-learning

Knowledge about the robot or the learning algorithm is not necessary

Any solution will work; supplying a good solution is not necessary

Page 18

Discussion