
MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

Value-Aware Loss Function for Model-based Reinforcement Learning

Farahmand, A.-M.; Barreto, A.M.S.; Nikovski, D.N.

TR2017-049
April 2017

Abstract

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem or the RL algorithm that intends to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on the model approximation error, the number of samples, and the complexity of the model space. We also empirically compare the method with the maximum likelihood estimator on a simple problem.
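To make the contrast in the abstract concrete, here is a minimal sketch, in notation of our own choosing, of how a value-aware objective can differ from the log-loss; the model class \(\mathcal{M}\), value-function class \(\mathcal{F}\), state-action distribution \(\nu\), and kernels \(\mathcal{P}\) and \(\hat{\mathcal{P}}\) below are illustrative assumptions, not symbols quoted from the report. Maximum likelihood fits the model through a purely probabilistic criterion,

\[
\hat{\mathcal{P}}_{\mathrm{MLE}} \in \operatorname*{argmin}_{\hat{\mathcal{P}} \in \mathcal{M}} \; \mathbb{E}\!\left[ -\log \hat{\mathcal{P}}(X' \mid X, A) \right],
\qquad (X, A) \sim \nu, \; X' \sim \mathcal{P}(\cdot \mid X, A),
\]

whereas a value-aware loss penalizes the model only through the value predictions it induces, for instance

\[
c_{\nu}^{2}(\hat{\mathcal{P}}, \mathcal{P}) =
\int \nu(\mathrm{d}x, \mathrm{d}a) \,
\sup_{V \in \mathcal{F}}
\left| \int \left( \mathcal{P}(\mathrm{d}x' \mid x, a) - \hat{\mathcal{P}}(\mathrm{d}x' \mid x, a) \right) V(x') \right|^{2}.
\]

Under such a criterion, two kernels that disagree only in directions to which every \(V \in \mathcal{F}\) is insensitive are equally good models, which is precisely the structure the log-loss ignores.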

Artificial Intelligence and Statistics (AISTATS)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2017
201 Broadway, Cambridge, Massachusetts 02139
