Page 1: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Reinforcement Learning Presentation

Markov Games as a Framework for Multi-agent Reinforcement Learning

Mike L. Littman

Jinzhong Niu

March 30, 2004

Page 2: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Overview

An MDP can describe only single-agent environments.

A new mathematical framework is needed to support multi-agent reinforcement learning: Markov games.

A single step in this direction is explored: 2-player zero-sum Markov games.

Page 3: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Definitions

Markov Decision Process (MDP)
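
The formulas from this slide did not survive in the transcript. As a reference sketch, the standard definitions can be written as follows, assuming the usual notation: S for states, A for the agent's actions, O for the opponent's actions (as used later in these slides), PD(X) for the set of probability distributions over X, and a discount factor gamma.

```latex
% MDP: one agent chooses actions from A
\langle S, A, T, R \rangle, \qquad
T : S \times A \to PD(S), \qquad
R : S \times A \to \mathbb{R}

% Two-player Markov game: the opponent chooses actions from O
\langle S, A, O, T, R \rangle, \qquad
T : S \times A \times O \to PD(S), \qquad
R : S \times A \times O \to \mathbb{R}

% Objective in both cases: maximize the expected discounted sum of rewards
E\Big[ \sum_{j=0}^{\infty} \gamma^{j} r_{t+j} \Big], \qquad 0 \le \gamma < 1
```

In the zero-sum case the opponent receives the negative of the agent's reward, so a single reward function R is enough.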

Page 6: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Is 2P-MG Capable?

Yes. It generalizes:

MDPs (when |O|=1): The opponent has constant behavior, which may be viewed as part of the environment.

Matrix Games (when |S|=1): The environment holds no information, and rewards are decided entirely by the actions.

But it precludes cooperation!

Page 7: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Matrix Games

Example – “rock, paper, scissors”

Page 8: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

What exactly does ‘optimality’ mean?

MDP: A stationary, deterministic, and undominated optimal policy always exists.

MG: The performance of a policy depends on the opponent’s policy, so policies cannot be evaluated without context.

New definition of ‘optimality’ from game theory: a policy is optimal if its worst-case performance is at least as good as that of any other policy.

At least one optimal policy exists, but it may need to be probabilistic rather than deterministic, because the agent is uncertain of its opponent’s move.
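
To make the worst-case criterion concrete, here is a minimal sketch using the rock, paper, scissors matrix game. The payoff convention (+1 win, 0 tie, -1 loss, rows for the agent and columns for the opponent) and all names in the code are assumptions chosen for illustration, not taken from the slides.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][o]: reward to the agent when it plays a and the opponent plays o.
# Zero-sum: the opponent receives the negative of this value.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def expected_reward(pi, sigma):
    """Expected reward when the agent mixes with pi and the opponent with sigma."""
    return sum(
        pi[a] * sigma[o] * PAYOFF[a][o]
        for a in range(len(ACTIONS))
        for o in range(len(ACTIONS))
    )


def worst_case(pi):
    """Minimum expected reward of pi over all pure opponent actions."""
    return min(
        sum(pi[a] * PAYOFF[a][o] for a in range(len(ACTIONS)))
        for o in range(len(ACTIONS))
    )


UNIFORM = [Fraction(1, 3)] * 3   # probabilistic policy
ALWAYS_ROCK = [1, 0, 0]          # deterministic policy

# worst_case(UNIFORM) is 0, while worst_case(ALWAYS_ROCK) is -1:
# an opponent who knows the deterministic policy simply plays paper.
```

Every deterministic policy in this game has a worst case of -1, which is why the optimal policy here has to be probabilistic.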

Page 9: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Finding Optimal Policy - Matrix Games

An optimal agent’s minimum expected reward should be as large as possible.

Let V denote this minimum expected reward, then choose the policy that maximizes V.
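
The formulas for this step are missing from the transcript. A sketch of the standard formulation, assuming R(a,o) is the agent's reward when it plays a and the opponent plays o, and pi is a probability distribution over the agent's actions A:

```latex
% Maximin value of the matrix game
V = \max_{\pi \in PD(A)} \; \min_{o \in O} \; \sum_{a \in A} R(a,o)\, \pi_a

% Equivalent linear program over the variables V and \pi_a
\begin{aligned}
\text{maximize}   \quad & V \\
\text{subject to} \quad & \sum_{a \in A} \pi_a R(a,o) \ge V \quad \forall o \in O, \\
                        & \sum_{a \in A} \pi_a = 1, \qquad \pi_a \ge 0 \quad \forall a \in A
\end{aligned}
```

Each constraint says the policy earns at least V against one opponent action, so maximizing V raises the worst case.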

Page 11: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Finding Optimal Policy – 2P-MG

Value of a state: V(s)

Quality of an s-a-o triple: Q(s,a,o)

[Figure: diagram relating V(s) to Q(s,a,o). Surviving labels: V(s), V(s,o2), min, o1, o2, o3, (s,a1), (s,a2), (s,a3), Q(s,a1,o1), Q(s,a2,o2), Q(s,a3,o3)]
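
The equations behind the diagram are not in the transcript. A sketch of the usual definitions for the two-player zero-sum case, assuming T(s,a,o,s') is the transition probability, R(s,a,o) the immediate reward, and gamma the discount factor:

```latex
% Value of a state: best worst-case expected quality over mixed policies
V(s) = \max_{\pi \in PD(A)} \; \min_{o \in O} \; \sum_{a \in A} Q(s,a,o)\, \pi_a

% Quality of a state-action-opponent-action triple
Q(s,a,o) = R(s,a,o) + \gamma \sum_{s'} T(s,a,o,s')\, V(s')
```

With |S|=1 and no future term this reduces to the matrix-game maximin; with |O|=1 the min disappears and the equations become the ordinary MDP value equations.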

Page 12: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Learning Optimal Policies

Q-learning

minimax-Q learning
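
The update rules themselves are missing from the transcript. As a sketch, the two methods differ only in how the value of the next state s' is estimated; alpha (the learning rate) and gamma (the discount factor) are assumed symbols.

```latex
% Q-learning: value of the next state is a max over the agent's own actions
Q(s,a) \leftarrow (1-\alpha)\, Q(s,a)
  + \alpha \big[ r + \gamma \max_{a'} Q(s',a') \big]

% minimax-Q: value of the next state is the maximin over mixed policies
Q(s,a,o) \leftarrow (1-\alpha)\, Q(s,a,o)
  + \alpha \big[ r + \gamma\, V(s') \big],
\qquad
V(s') = \max_{\pi \in PD(A)} \; \min_{o' \in O} \; \sum_{a' \in A} Q(s',a',o')\, \pi_{a'}
```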

Page 13: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Minimax-Q Algorithm

Page 14: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Experiment - Problem

Soccer

Page 15: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Experiment - Training

4 agents trained through 10^6 steps

minimax-Q learning

vs. random opponent - MR

vs. itself - MM

Q-learning

vs. random opponent - QR

vs. itself - QQ

Page 16: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Experiment - Testing

Test 1: QR > MR?

Test 2: QR << QQ?

Test 3: QR, QQ – 100% loser?

Page 17: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Contributions

A solution to 2-player zero-sum Markov games: a modified Q-learning method in which minimax replaces max.

Minimax can also be used in single-agent environments to avoid risky behavior.

Page 18: Markov Games as a Framework for Multi-agent Reinforcement Learning Mike L. Littman

Future work

Possible performance improvement of the minimax-Q learning method

Solving a linear program at every update is computationally expensive.

Iterative methods may find approximate minimax solutions much faster, and an approximation may be good enough.
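
As one illustration of such an iterative method, the sketch below uses fictitious play to approximate the maximin policy of a matrix game without a linear program. The slides do not name a specific method; fictitious play is swapped in here as an assumption, and the function name, the payoff convention (rows for the agent, columns for the opponent), and the iteration count are all hypothetical.

```python
def fictitious_play(payoff, iterations=10000):
    """Approximate the agent's maximin policy for a zero-sum matrix game.

    payoff[a][o] is the agent's reward; the opponent receives its negative.
    Returns (policy, lower_bound), where lower_bound is the worst-case
    expected reward of the returned policy.
    """
    n_a, n_o = len(payoff), len(payoff[0])
    a_counts = [0] * n_a   # how often the agent has played each action
    o_counts = [0] * n_o   # how often the opponent has played each action
    a, o = 0, 0            # arbitrary initial actions

    for _ in range(iterations):
        a_counts[a] += 1
        o_counts[o] += 1
        # Agent: best response to the opponent's empirical action counts.
        a = max(
            range(n_a),
            key=lambda i: sum(payoff[i][j] * o_counts[j] for j in range(n_o)),
        )
        # Opponent: the response that minimizes the agent's reward.
        o = min(
            range(n_o),
            key=lambda j: sum(payoff[i][j] * a_counts[i] for i in range(n_a)),
        )

    policy = [count / iterations for count in a_counts]
    lower_bound = min(
        sum(policy[i] * payoff[i][j] for i in range(n_a)) for j in range(n_o)
    )
    return policy, lower_bound


# Rock, paper, scissors with +1 win, 0 tie, -1 loss (assumed convention).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
policy, lower_bound = fictitious_play(RPS)
```

For rock, paper, scissors the returned policy approaches the uniform distribution and the lower bound approaches the game value of 0 from below. Convergence is slow, so this trades accuracy for a very cheap per-iteration cost.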