![Page 1: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/1.jpg)
Exploration and Other Applications of Reinforcement Learning in Robotics
AS-84.4340 Postgraduate Seminar in Automation Technology
Juhana Ahtiainen
![Page 2: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/2.jpg)
Outline
Introduction
Exploration
  Information gain
  Monte Carlo algorithm
  Active localization
  Mapping in occupancy grids
  Multi-robot extension
  Exploration for SLAM
Other applications of reinforcement learning
  Recent advances
  RoboCup
  Humanoid robots
Summary
Exercise
![Page 3: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/3.jpg)
Introduction
Reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward.
Exploration is the problem of controlling a robot so as to maximize its knowledge about the external world.
![Page 4: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/4.jpg)
Introduction
The environment is usually modelled as a finite-state Markov decision process (MDP).
Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states.
Unlike in supervised learning, the agent is never given correct input-output pairs.
![Page 5: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/5.jpg)
Exploration
The exploration problem is paramount in robotics
  Abandoned mines, nuclear disasters, Mars...
The exploration problem comes in many forms
  Acquiring a map in a static environment
    Known pose
  Moving factors (dynamic environment)
    E.g. the pursuit-evasion problem
  Active localization
    Known map
  SLAM
Virtually anywhere in robotics
![Page 6: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/6.jpg)
POMDP and exploration
Is exploration fully subsumed by the POMDP framework?
  A POMDP can be turned into an algorithm whose sole goal is to maximize information
  Payoff function = e.g. information gain
Exploring using POMDP techniques is often not a good idea
  The number of unknown variables is huge
  So is the number of possible observations
![Page 7: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/7.jpg)
Ch 17: Practical algorithms
For high-dimensional exploration problems
All are greedy (look-ahead is limited to only one exploration action)
An exploration action can involve a sequence of control actions
  E.g. selecting a location anywhere in the map and moving there is considered a single exploration action
![Page 8: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/8.jpg)
Information gain
The key to exploration is information
The entropy H_p(x) of a probability distribution p is the expected information E[-log p]
Entropy is at its maximum when p is a uniform distribution and at its minimum when p is a point-mass distribution
In exploration we seek to minimize the expected entropy of the belief after executing an action
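A minimal sketch of the entropy definition above for a discrete distribution. The function name `entropy` and the three example distributions are illustrative choices, not taken from the slides.

```python
import math

def entropy(p):
    """Entropy E[-log p] of a discrete distribution, in nats."""
    # Outcomes with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]
point_mass = [1.0, 0.0, 0.0, 0.0]
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)     # equals log(4), the maximum for four outcomes
h_point = entropy(point_mass)    # equals 0, the minimum
h_skewed = entropy(skewed)       # lies strictly between the two extremes
```

Any belief over four outcomes falls between these two extremes, which is why a lower posterior entropy is read as "the robot knows more".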
![Page 9: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/9.jpg)
Information gain
Conditional entropy of the state x' after executing action u and measuring z:
Information gain associated with action u in belief b is given by the difference:
Conditional entropy with measurement integrated out:
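The equations for this slide did not survive in the transcript. The block below is a reconstruction of the three quantities named above, in the same order, assuming the notation of Probabilistic Robotics (reference 1): B(b, z, u) stands for the belief obtained by a Bayes filter update of b with control u and measurement z, and p in H_p(x) is the current belief b.

```latex
% Conditional entropy of x' after executing u and measuring z
H_b(x' \mid z, u) = -\int B(b, z, u)(x') \, \log B(b, z, u)(x') \, dx'

% Information gain of action u in belief b
I_b(u) = H_p(x) - E_z\left[ H_b(x' \mid z, u) \right]

% Conditional entropy with the measurement integrated out
E_z\left[ H_b(x' \mid z, u) \right]
  = \iiint H_b(x' \mid z, u) \, p(z \mid x') \, p(x' \mid u, x) \, b(x) \, dz \, dx' \, dx
```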
![Page 10: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/10.jpg)
Greedy techniques
The expected information gain lets us formulate exploration as a decision-theoretic problem of the kind addressed in the previous presentations
Optimal exploration maximizes the difference between the information gain and the costs
![Page 11: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/11.jpg)
Greedy techniques
Utility of u: compute the expected entropy after executing u and observing
Previous equation resolves to:
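The greedy policy equations are missing from the transcript. A reconstruction under the assumptions of Probabilistic Robotics (reference 1) follows: r(x, u) is the payoff of executing u in state x (the negative cost), and the factor \alpha, which trades information against cost, is notation assumed here.

```latex
% Greedy exploration: maximize information gain minus cost
\pi(b) = \arg\max_u \; \alpha \left( H_p(x) - E_z\left[ H_b(x' \mid z, u) \right] \right)
         + \int r(x, u) \, b(x) \, dx

% H_p(x) does not depend on u, so it drops out of the maximization:
\pi(b) = \arg\max_u \; -\alpha \, E_z\left[ H_b(x' \mid z, u) \right]
         + \int r(x, u) \, b(x) \, dx
```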
![Page 12: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/12.jpg)
Exploration techniques
Most of them are greedy
  Optimal at time horizon 1
Enormous branching factor in exploration
The goal is to acquire new information
  New belief state
  Adjust policy
Exploration policies have to be highly reactive
![Page 13: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/13.jpg)
Monte Carlo exploration
Samples a state x from the momentary belief b
Samples also the next state x' and the corresponding measurement z
New posterior belief
Entropy-cost trade-off
Action with the highest MC information gain-cost value
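The annotated steps above can be sketched for a small discrete state space. This is an illustrative sketch, not the algorithm listing from the slides: the toy world, the tables `motion`, `sensor` and `cost`, and the weight `alpha` are all assumptions made for the example.

```python
import math
import random

def entropy(b):
    return -sum(p * math.log(p) for p in b if p > 0.0)

def bayes_update(b, u, z, motion, sensor):
    """Discrete Bayes filter step: predict with p(x'|u,x), correct with p(z|x')."""
    n = len(b)
    predicted = [sum(motion[u][x][x2] * b[x] for x in range(n)) for x2 in range(n)]
    unnorm = [sensor[x2][z] * predicted[x2] for x2 in range(n)]
    total = sum(unnorm)
    return [w / total for w in unnorm]

def sample(dist, rng):
    return rng.choices(range(len(dist)), weights=dist)[0]

def monte_carlo_exploration(b, actions, motion, sensor, cost,
                            alpha=1.0, n_samples=200, seed=0):
    rng = random.Random(seed)
    values = {}
    for u in actions:
        total = 0.0
        for _ in range(n_samples):
            x = sample(b, rng)                  # state from the momentary belief
            x2 = sample(motion[u][x], rng)      # next state x'
            z = sample(sensor[x2], rng)         # corresponding measurement
            b_new = bayes_update(b, u, z, motion, sensor)  # new posterior belief
            total += -cost[u][x] - alpha * entropy(b_new)  # entropy-cost trade-off
        values[u] = total / n_samples
    best = max(values, key=values.get)          # highest gain-cost value
    return best, values

# Hypothetical world: states 0 and 1 look identical to the sensor,
# states 2 and 3 produce distinct readings. "move" shifts the robot by two states.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
shift = [[1.0 if j == (i + 2) % 4 else 0.0 for j in range(4)] for i in range(4)]
motion = {"stay": identity, "move": shift}
sensor = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]   # p(z | x'), rows are states
cost = {"stay": [0.0] * 4, "move": [0.1] * 4}
belief = [0.5, 0.5, 0.0, 0.0]

best, values = monte_carlo_exploration(belief, ["stay", "move"], motion, sensor, cost)
```

In this toy setup staying put leaves the belief split between two indistinguishable states, while moving costs a little but resolves the ambiguity, so the sketch selects "move".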
![Page 14: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/14.jpg)
Monte Carlo exploration
May still be very time consuming
  The number of possible measurements can be huge
  E.g. a robot with 24 ultrasonic sensors that each report one byte of range data
  256^24 possible sonar scans in a specific location
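To get a feel for the size of that number, the count can be evaluated directly. The variable names are illustrative.

```python
sensors = 24
readings_per_sensor = 2 ** 8          # one byte of range data per sensor
possible_scans = readings_per_sensor ** sensors

# 256^24 = 2^192, a 58-digit number (roughly 6.3e57)
digits = len(str(possible_scans))
```

Enumerating every measurement is therefore out of the question, which is why the Monte Carlo algorithm samples measurements instead.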
![Page 15: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/15.jpg)
Active localization
The simplest case of exploration: estimating the state of a low-dimensional variable
Here we seek information about the robot's pose but have a map of the environment
Moving to the right place can make localization very fast
![Page 16: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/16.jpg)
Active localization
Can be solved greedily, but we need to define exploration actions differently
  E.g. target locations in the robot's coordinate frame
This is OK if we can devise a low-level module to map that action back into low-level controls
![Page 17: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/17.jpg)
Example
![Page 18: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/18.jpg)
Example
![Page 19: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/19.jpg)
Example
![Page 20: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/20.jpg)
Example
![Page 21: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/21.jpg)
Example
![Page 22: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/22.jpg)
Example
![Page 23: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/23.jpg)
Analysis of active localization
Greedy
  Cannot compose multiple exploration actions
Action definition
  Open-loop control while moving, no measurements
  A real robot can abandon the target point (closed door)
  Not considered during planning
Performs well in practice
![Page 24: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/24.jpg)
Learning occupancy grid maps
Mapping problems include many more unknown variables
We treat the information gain as independent between different grid cells
How to compute the gain
  Entropy
  Expected information gain
  Binary gain (frontier-based exploration)
![Page 25: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/25.jpg)
Calculating the information gain
Entropy
  Straightforward
  In the entropy map, the brighter a cell, the larger its entropy
Expected information gain
  Entropy only measures current information
  Requires assumptions on the nature of information
![Page 26: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/26.jpg)
Calculating the information gain
Binary gain
  Simplest of all
  By far the most popular
  Very crude approximation of the expected information gain
  Tends to work well in practice
  Core of frontier-based exploration
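The three per-cell gain measures can be compared on a toy grid. This is a sketch under stated assumptions: each cell holds an occupancy probability, unexplored cells sit at the prior 0.5, and the expected gain uses a hypothetical sensor that reports the true cell state with probability `p_true` (strictly between 0 and 1). None of these numbers come from the slides.

```python
import math

def cell_entropy(p):
    """Entropy of a binary occupancy cell with occupancy probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def expected_gain(p, p_true=0.9):
    """Expected entropy reduction from sensing the cell once (assumed sensor model)."""
    p_occ = p_true * p + (1.0 - p_true) * (1.0 - p)   # chance of reading "occupied"
    p_free = 1.0 - p_occ
    post_occ = p_true * p / p_occ                     # posterior after "occupied"
    post_free = (1.0 - p_true) * p / p_free           # posterior after "free"
    expected_posterior = p_occ * cell_entropy(post_occ) + p_free * cell_entropy(post_free)
    return cell_entropy(p) - expected_posterior

def binary_gain(p, prior=0.5, tol=1e-9):
    """1 for a cell that has never been updated (still at the prior), else 0."""
    return 1 if abs(p - prior) < tol else 0

grid = [[0.5, 0.5, 0.9],
        [0.1, 0.5, 0.02]]

entropy_map = [[cell_entropy(p) for p in row] for row in grid]
gain_map = [[expected_gain(p) for p in row] for row in grid]
binary = [[binary_gain(p) for p in row] for row in grid]
```

The binary map simply marks unexplored cells, which is the crude approximation that frontier-based exploration relies on.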
![Page 27: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/27.jpg)
Propagating gain
Definition of an exploration action
  Simple but effective: move to an x-y location along the minimum-cost path, then sense all grid cells within a small circular diameter around the robot
Value iteration finds the best greedy exploration action
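One way to picture the propagation step is a value iteration over the grid in which every cell takes the best "gain minus path cost" reachable from it. The sketch below assumes 4-connected motion with a uniform step cost and absorbing goal cells; the grid, the wall, and the function names are illustrative, and the slides do not give the exact update.

```python
import math

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def propagate_gain(gain, free, step_cost=1.0):
    """Value iteration: V(cell) = max(gain(cell), max over neighbors V(n) - step_cost)."""
    rows, cols = len(gain), len(gain[0])
    V = [[gain[r][c] if free[r][c] and gain[r][c] > 0 else -math.inf
          for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if not free[r][c]:
                    continue
                for dr, dc in NEIGHBORS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                        candidate = V[nr][nc] - step_cost
                        if candidate > V[r][c]:
                            V[r][c] = candidate
                            changed = True
    return V

def best_target(V, gain, free, start):
    """Hill-climb on V from the robot's cell until a goal cell is reached."""
    r, c = start
    if V[r][c] == -math.inf:
        return None                      # no reachable cell with positive gain
    rows, cols = len(V), len(V[0])
    while not (gain[r][c] > 0 and V[r][c] <= gain[r][c]):
        options = [(r + dr, c + dc) for dr, dc in NEIGHBORS
                   if 0 <= r + dr < rows and 0 <= c + dc < cols and free[r + dr][c + dc]]
        r, c = max(options, key=lambda cell: V[cell[0]][cell[1]])
    return (r, c)

gain = [[0, 0, 0, 5],
        [0, 0, 0, 0],
        [3, 0, 0, 0]]
free = [[True, True, True, True],
        [True, False, True, True],
        [True, True, True, True]]

V = propagate_gain(gain, free)
target_a = best_target(V, gain, free, (1, 0))
target_b = best_target(V, gain, free, (1, 2))
```

A robot at (1, 0) prefers the nearby low-gain cell, while a robot at (1, 2) prefers the high-gain corner, which shows how path cost and gain are traded off.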
![Page 28: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/28.jpg)
Learning OG maps - summary
Crude approximation
Totally ignores the information acquired as the robot moves
Tends to work well in practice
![Page 29: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/29.jpg)
Extension to multi-robot systems
Acquire a map through cooperative exploration
The speed-up with K robots is usually linear and might even reach 2K
  A single robot might have to traverse many areas twice
Coordination
  Static greedy task allocation techniques
![Page 30: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/30.jpg)
Value function for each robot (minimum at the robot's pose)
Reset the gain map to zero in the vicinity of the chosen cell
Optimal cell to explore for each robot
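The three annotations above describe a static greedy allocation, which can be sketched as follows. Assumptions made for the example: the per-robot value function is a travel cost computed with Dijkstra's algorithm (swapped in here for value iteration), each robot picks the cell maximizing gain minus cost, and `reset_radius` defines the "vicinity". The corridor map is hypothetical.

```python
import heapq
import math

def cost_from(start, free):
    """Travel cost from the robot's cell to every reachable cell (0 at the robot's pose)."""
    rows, cols = len(free), len(free[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue                     # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                if d + 1.0 < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = d + 1.0
                    heapq.heappush(heap, (d + 1.0, (nr, nc)))
    return dist

def assign_goals(robots, gain, free, reset_radius=1):
    gain = [row[:] for row in gain]      # work on a copy of the gain map
    goals = []
    for pose in robots:
        V = cost_from(pose, free)        # value function for this robot
        # Optimal cell for this robot: highest gain minus travel cost.
        goal = max(V, key=lambda cell: gain[cell[0]][cell[1]] - V[cell])
        goals.append(goal)
        # Reset the gain to zero in the vicinity of the chosen cell.
        for r in range(len(gain)):
            for c in range(len(gain[0])):
                if abs(r - goal[0]) <= reset_radius and abs(c - goal[1]) <= reset_radius:
                    gain[r][c] = 0
    return goals

gain = [[0, 0, 0, 9, 0, 0, 8]]
free = [[True] * 7]
goals = assign_goals([(0, 2), (0, 4)], gain, free)
```

Without the reset, the second robot would also head for the cell with gain 9; with it, the second robot is pushed to the other end of the corridor.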
![Page 31: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/31.jpg)
Summary of multi-robot exploration
Simple...
  Each robot greedily picks the best available goal and prohibits the others from picking the same cell
Easily trapped in a local minimum
  Crossing paths
Improved coordination techniques enable robots to trade goals
![Page 32: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/32.jpg)
Example
![Page 33: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/33.jpg)
SLAM in exploration
In SLAM we know neither the map nor the pose
Without knowledge about the pose, the integration of sensor information can lead to serious errors
A robot that only focuses on its pose does not move
Entropy decomposition!
![Page 34: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/34.jpg)
Entropy decomposition
Full SLAM posterior:
This implies:
The expectation is taken over the path posterior
The SLAM entropy is the sum of the path entropy and the expected entropy of the map
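The two equations on this slide are missing from the transcript. A reconstruction, assuming the notation of Probabilistic Robotics (reference 1) with path x_{1:t}, map m, measurements z_{1:t} and controls u_{1:t}:

```latex
% Full SLAM posterior, factored into path posterior and map posterior
p(x_{1:t}, m \mid z_{1:t}, u_{1:t})
  = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \; p(m \mid x_{1:t}, z_{1:t}, u_{1:t})

% Resulting entropy decomposition; the expectation is over the path posterior
H\left[ p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \right]
  = H\left[ p(x_{1:t} \mid z_{1:t}, u_{1:t}) \right]
  + E\left[ H\left[ p(m \mid x_{1:t}, z_{1:t}, u_{1:t}) \right] \right]
```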
![Page 35: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/35.jpg)
Derivation of decomposition
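The derivation itself is not in the transcript. The standard argument for any two random variables is sketched here, with x standing for the path and y for the map; H(y | x) denotes the entropy of p(y | x) for a fixed x.

```latex
H(x, y) = E_{x,y}\left[ -\log p(x, y) \right]
        = E_{x,y}\left[ -\log p(x) - \log p(y \mid x) \right]
        = E_x\left[ -\log p(x) \right]
          + E_x\left[ E_{y \mid x}\left[ -\log p(y \mid x) \right] \right]
        = H(x) + E_x\left[ H(y \mid x) \right]
```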
![Page 36: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/36.jpg)
Exploration in FastSLAM
Based on grid-based FastSLAM (Ch 13.10)
  The posterior is represented by a set of particles
  Each particle contains a robot path
  And also an occupancy grid map
The FastSLAM exploration algorithm is a test-and-evaluate algorithm
  Proposes courses of action for exploration
  Evaluates these actions by measuring the residual entropy
  Selects the action that minimizes the resulting entropy
![Page 37: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/37.jpg)
FastSLAM summary
The FastSLAM exploration algorithm is an extension of the Monte Carlo exploration algorithm with two insights
  It applies to the full sequence of controls
  Two types of entropy!
    One pertaining to the robot's path
    One to the map
![Page 38: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/38.jpg)
Example of SLAM
![Page 39: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/39.jpg)
Example of SLAM
![Page 40: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/40.jpg)
Example of SLAM
![Page 41: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/41.jpg)
Reinforcement Learning in Robotics
Reinforcement learning offers one of the most general frameworks to take traditional robotics towards true autonomy and versatility
Successful in many well-defined, low-dimensional, discrete problems
  Backgammon (Tesauro 1994)
  Elevator control (Crites & Barto 1996)
  Helicopter control (Bagnell & Schneider 2001)
![Page 42: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/42.jpg)
Reinforcement Learning in Robotics
Google Scholar:
  Results 1 - 10 of about 19,700 for "Reinforcement learning in robotics"
  Recent articles (since 2003): 4,340
![Page 43: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/43.jpg)
Recent advances (2003)
Curse of dimensionality
Hierarchical reinforcement learning
  Temporal abstractions
  Decisions are not required at each step
Semi-Markov decision processes (SMDPs)
  Generalization of the MDP
  The time between one decision and the next is a random variable, real- or integer-valued
  Allows the decision maker to choose actions whenever the system state changes
  Models the system evolution in continuous time
  Allows the time spent in a particular state to follow an arbitrary probability distribution
![Page 44: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/44.jpg)
Reinforcement Learning in RoboCup (http://www.robocup.org/)
Keepaway = keepers vs. takers
  Max 4 vs. 3
  Large state space, SMDP
In RoboCup also
  Kick the ball into the goal while avoiding an opponent
  Full team (11) learns collaborative passing and shooting (MC)
  Learn low-level skills (dribbling, passing, kicking)
![Page 45: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/45.jpg)
RL for humanoid robots
Applying RL to high-dimensional movement systems like humanoid robots remains an unsolved problem
Greedy algorithms are likely to fail
Natural Actor-Critic (Peters et al. 2005)
  Efficiently optimizes nonlinear motor primitives
  Based on the natural gradient formulation
![Page 46: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/46.jpg)
Summary
Reinforcement learning has been applied successfully in many areas of robotics
  High dimensions are still a problem
Exploration is one application of RL
  Maximize the knowledge gained by the robot
  Active localization: seeking pose, map known
  Mapping: pose known at all times
  SLAM: decomposition of entropy, map and pose unknown
![Page 47: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/47.jpg)
References
1. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, Cambridge, MA, 2005.
2. Wikipedia, http://en.wikipedia.org/wiki/Reinforcement_learning (Sutton, Richard S., and Barto, Andrew G. (1998) Reinforcement Learning: An Introduction. MIT Press.)
3. Barto, A. G. and Mahadevan, S. (2003) Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, vol. 13(4), pages 341-379.
4. Peter Stone, Richard S. Sutton, and Gregory Kuhlmann. Reinforcement Learning for RoboCup-Soccer Keepaway. Adaptive Behavior, 13(3):165–188, 2005.
5. Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept. 29-30.
6. Maja J Matarić, Reinforcement Learning in the Multi-Robot Domain, Autonomous Robots, 4(1), Mar 1997, 73-83.
7. William D. Smart and Leslie Pack Kaelbling, Effective Reinforcement Learning for Mobile Robots, International Conference on Robotics and Automation, May 11-15, 2002.
8. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215–219.
9. Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 1017–1023). Cambridge, MA: The MIT Press.
10. Bagnell, J. A., & Schneider, J. (2001). Autonomous helicopter control using reinforcement learning policy search methods. In International Conference on Robotics and Automation (pp. 1615–1620). IEEE.
11. Jan Peters, Sethu Vijayakumar, Stefan Schaal (2005), Natural Actor-Critic, in the Proceedings of the 16th European Conference on Machine Learning (ECML 2005).
![Page 48: Exploration and other applications of reinforcement ...automation.tkk.fi/attach/AS-84-4340/Exploration.pdf · Exploration and Other Applications of Reinforcement Learning in Robotics](https://reader033.vdocuments.us/reader033/viewer/2022052607/5a72955f7f8b9aac538da325/html5/thumbnails/48.jpg)
Exercise: