

Distributed Computing
Prof. R. Wattenhofer

SA/MA: Byzantine Reinforcement Learning

Reinforcement learning is a powerful tool to investigate environments that are described by Markov Decision Processes (MDPs). When the agent takes an action in some environment, the environment transitions to a new state according to the underlying MDP. The new state can yield either a positive or a negative reward to the agent. The aim of the agent is to maximize its total reward.
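For concreteness, the agent–environment loop described above can be sketched with tabular Q-learning on a toy two-state MDP. The MDP, hyperparameters, and function names here are illustrative assumptions, not part of the proposal:

```python
import random

# Toy 2-state MDP (illustrative): states 0 and 1, actions 0 ("stay") and
# 1 ("move"). Taking "move" from state 1 yields reward +1.
def step(state, action):
    if action == 1:
        return 1 - state, 1.0 if state == 1 else 0.0
    return state, 0.0  # "stay" gives no reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(10):  # short fixed-length episodes
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r = step(s, a)
            # Q-learning update toward the reward-plus-bootstrap target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy in state 1 prefers "move", the only action that ever produces reward in this toy MDP.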

Like every other system, reinforcement learning algorithms are prone to failures. The worst failures that can happen in a system are Byzantine failures, that is, agents who control parts of the system deliberately disturb the decision process.

In this work we want to investigate how Byzantine behavior affects reinforcement learning algorithms. We will assume that a Byzantine agent might control any step in the protocol: the action, the observation, or it might even embody the agents themselves. Your task would be to simulate Byzantine behavior and derive protocols that are robust to this kind of malicious behavior.
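As a toy illustration of one such attack surface (corrupted observations) together with a classical Byzantine-tolerant defense, consider the agent reading its state from redundant sensors, a strict minority of which are compromised; the median of the readings then always lies between honest values. The attack model and all names below are hypothetical sketches, not methods from the proposal:

```python
import statistics

# Hypothetical attack model: a compromised sensor reports an adversarially
# shifted state instead of the true one. (Illustrative only.)
def sensor_reading(true_state, compromised):
    return true_state + 10 if compromised else true_state

def robust_observation(true_state, n_sensors=5, n_byzantine=2):
    # Collect redundant readings; n_byzantine of the sensors are Byzantine.
    readings = [sensor_reading(true_state, i < n_byzantine)
                for i in range(n_sensors)]
    # With a strict minority of Byzantine sensors, the median reading is
    # bracketed by honest readings and cannot be pushed arbitrarily far.
    return statistics.median(readings)
```

Deriving and analyzing such filtering rules for the other protocol steps (actions, or the agents themselves) is the kind of question this project explores.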

Requirements: An interested student should be familiar with Byzantine behavior and have knowledge of Deep Learning or a solid background in Machine Learning. Implementation experience is an advantage. The student should be able to work independently!

Interested? Please contact us for more details!

Contacts

• Darya Melnyk: [email protected], ETZ G93

• Oliver Richter: [email protected], ETZ G63