TRANSCRIPT
M. Wellman · Nov 20, 2010 · WRANE
Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
Michael P. Wellman University of Michigan
Workshop on Reasoning in Adversarial and Noncooperative Environments
Real-World Games
• Rich strategy space
  – strategy: obs* × time → action
• Severely incomplete information
  – interdependent types (signals)
  – info partially revealed over time
➙ analytic game-theoretic solutions few and far between

Two approaches to complex dynamics and uncertainty:
1. Analyze (stylized) approximations
   – one-shot, complete info, …
2. Empirical methods
   – simulation, statistics, machine learning, …
Empirical Game-Theoretic Analysis (EGTA)
• Game described procedurally, with no directly usable analytical form
• Parametrize strategy space based on agent architecture
• Selectively explore strategy/profile space
• Induce game model (payoff function) from simulation data: the empirical game
EGTA Process [diagram]: Strategy Set and Profile Space feed a Simulator, producing Profile Payoff Data, from which the Empirical Game is induced and then analyzed (NE).
1. Parametrize strategy space
2. Estimate empirical game
3. Solve empirical game
TAC Supply Chain Mgmt Game
[diagram: TAC SCM supply chain. Suppliers (Pintel, IMD, Basus, Macrostar, Mec, Queenmax, Watergate, Mintor) sell components (CPU, Motherboard, Memory, Hard Disk) to six manufacturer agents, who sell PCs to the customer. Message flows: component RFQs, supplier offers, component orders; PC RFQs, PC bids, PC orders.]
10 component types · 16 PC types · 220 simulation days · 15 seconds per day
Two-Strategy Game (Unpreempted)
Three-Strategy Game: Deviations
TAC/SCM-06 Deviation Graph
(concentric regret regions)
CDA Deviation Graph
4 strategies: GD, GDX, ZI, Kap
[graph nodes are profiles such as [3 GD, 1 GDX], [4 GDX], [1 GD, 3 Kap]]
Ranking Strategies: TAC/SCM-07
SCM-07 EGTA vs. SCM-07 tournament
from P. R. Jordan, PhD thesis, 2009
Strategy Ranking (TAC Travel)
Strategies ranked with respect to the final equilibrium context
from L. J. Schvartzman, PhD thesis, 2009
DeepMaize-08 Design Exploration
from P. R. Jordan, PhD thesis, 2009
Iterative EGTA Process [diagram]:
Strategy Set / Profile Space → Simulator → Profile Payoff Data → Empirical Game → Game Analysis (NE) → Refine?
  N: End
  Y: select More Samples (Sampling Control Problem) or More Strategies, i.e., Add Strategy (Strategy Exploration Problem); inducing the payoff model from data is the Game Model Induction Problem
[refs on slide: JW, AAMAS-09; JSW, AAMAS-10; JVW, AAMAS-08]
Sampling Control Problem
• Revealed payoff model: a sample provides the exact payoff
  – minimum-regret-first search (MRFS): attempts to refute the best current candidate
• Noisy payoff model: each sample is drawn from a payoff distribution
  – information gain search (IGS): sample the profile maximizing the entropy difference w.r.t. the probability of being the min-regret profile
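As a concrete sketch of MRFS under the revealed-payoff model, the loop below runs on the 4×4 example game from the following slides. The function names and the fallback (sampling an arbitrary unexplored profile when a refuted candidate is fully explored) are illustrative choices for this sketch, not the exact published algorithm.

```python
import random

# Revealed-payoff model: sampling a profile reveals its exact payoffs.
# (row payoff, col payoff) for the 4x4 example game from the slides.
U = {
    (r, c): p
    for r, row in enumerate([
        [(9, 5), (3, 3), (2, 5), (4, 8)],
        [(6, 4), (8, 8), (3, 0), (5, 3)],
        [(2, 2), (2, 1), (3, 2), (4, 6)],
        [(4, 4), (2, 0), (2, 2), (9, 3)],
    ])
    for c, p in enumerate(row)
}

def regret_bound(profile, sampled):
    """Lower bound on the regret of `profile`, using only sampled deviations."""
    r, c = profile
    row_gains = [U[(r2, c)][0] - U[profile][0]
                 for r2 in range(4) if (r2, c) in sampled]
    col_gains = [U[(r, c2)][1] - U[profile][1]
                 for c2 in range(4) if (r, c2) in sampled]
    return max([0] + row_gains + col_gains)

def mrfs(start, steps):
    """Min-regret-first search: repeatedly try to refute the current
    minimum-regret candidate by sampling one of its deviations."""
    sampled = {start}
    cand = start
    for _ in range(steps):
        cand = min(sampled, key=lambda p: regret_bound(p, sampled))
        r, c = cand
        devs = [(r2, c) for r2 in range(4) if (r2, c) not in sampled]
        devs += [(r, c2) for c2 in range(4) if (r, c2) not in sampled]
        if not devs:
            if regret_bound(cand, sampled) == 0:
                break  # all deviations sampled, none gain: confirmed PSNE
            devs = [p for p in U if p not in sampled]
            if not devs:
                break  # entire profile space sampled
        sampled.add(random.choice(devs))
    return cand, regret_bound(cand, sampled)
```

In this game (r2,c2), i.e., indices (1,1), is the unique pure-strategy equilibrium, and the search confirms it with a regret bound of 0 well before exhausting the profile space on most runs.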
Min-Regret-First Search (example)

        c1     c2     c3     c4
r1     9,5    3,3    2,5    4,8
r2     6,4    8,8    3,0    5,3
r3     2,2    2,1    3,2    4,6
r4     4,4    2,0    2,2    9,3

Profile    ε-bound
(r1,c1)    0
Min-Regret Search

Select a random deviation from the current best profile. Sampling (r1,c2) and then (r2,c1) yields:

[payoff matrix as above]

Profile    ε-bound
(r1,c1)    0
(r1,c2)    2
(r2,c1)    3
Min-Regret Search (cont.)

Sampling (r3,c1) and then (r1,c4) also tightens the bounds on earlier profiles:

[payoff matrix as above]

Profile    ε-bound
(r1,c1)    3
(r1,c2)    5
(r2,c1)    3
(r3,c1)    7
(r1,c4)    0
Min-Regret Search (cont.)

Sampling (r2,c4) and then (r2,c2) yields:

[payoff matrix as above]

Profile    ε-bound
(r1,c1)    3
(r1,c2)    5
(r2,c1)    4
(r3,c1)    7
(r1,c4)    1
(r2,c4)    5
(r2,c2)    0
Min-Regret Search (cont.)

Sampling (r2,c3) and then (r3,c2) leaves (r2,c2) as the minimum-regret candidate:

[payoff matrix as above]

Profile    ε-bound
(r1,c1)    3
(r1,c2)    5
(r2,c1)    4
(r3,c1)    7
(r1,c4)    1
(r2,c4)    5
(r2,c2)    0
(r2,c3)    8
(r3,c2)    6
Min-Regret Search (conclusion)

Sampling (r4,c2) completes the deviations from (r2,c2):

[payoff matrix as above]

Profile    ε-bound
(r1,c1)    3
(r1,c2)    5
(r2,c1)    4
(r3,c1)    7
(r1,c4)    1
(r2,c4)    5
(r2,c2)    0*
(r2,c3)    8
(r3,c2)    6
(r4,c2)    6

* all deviations sampled, none gain: (r2,c2) is a confirmed pure-strategy Nash equilibrium
Finding Approximate PSNE
Iterative EGTA Process
[diagram repeated, recapping the Sampling Control (JW, AAMAS-09), Game Model Induction, and Strategy Exploration (JSW, AAMAS-10) subproblems]
Construct Empirical Game
• Simplest approach: direct estimation
  – employ control variates and other variance-reduction techniques
[diagram: payoff data {(s1, u(s1)), …, (sL, u(sL))} from selected profiles → estimated payoff function u(·) → empirical game]
Payoff Function Regression

FPSB2 example (Vorobeychik et al., Machine Learning, 2007), strategy space Si = [0,1]:
1. Generate data (simulations)
2. Learn regression
3. Solve learned game: eq = (0.32, 0.32)

[figure: learned payoff surface over [0,1] × [0,1]; the slide also shows a 3×3 payoff table:
  3,3  1,0  1,1
  4,1  2,2  4,1
  1,1  1,4  3,3 ]
Generalization Risk Approach
• Model variations
  – functional forms, relationship structures, parameters
  – strategy granularity
• Approach:
  – Treat each candidate game model as a predictor of payoff data
  – Adopt a loss function for the predictor
  – Select the model candidate minimizing expected loss

Cross-validation [diagram]: observation data split into folds (Fold 1, Fold 2, Fold 3); train on all but one fold, validate on the held-out fold.
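The selection procedure above can be sketched with plain k-fold cross-validation over candidate payoff models. The two model families below (`fit_const`, `fit_linear`, over a 1-D strategy parameter) are illustrative stand-ins for the functional-form variations on the slide, not the models actually used.

```python
import random

def kfold_cv(data, fit, loss, k=3):
    """Average held-out loss of a model family over k folds."""
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = fit(train)
        total += sum(loss(model(x), y) for x, y in folds[i]) / len(folds[i])
    return total / k

def squared_loss(yhat, y):
    return (yhat - y) ** 2

# Two illustrative candidate payoff models over a 1-D strategy parameter.
def fit_const(train):
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_linear(train):
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    b = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + b * (x - mx)
```

The model candidate with the lowest cross-validated loss is the one retained as the empirical game's payoff function.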
Sensitivity Analysis

[histogram: frequency vs. regret bound over 392 two-strategy mixtures]

Example mixtures shown:
  [49] 0.275, [50] 0.725: (3961.62, 0.00, 33.01, 100.0%)
  [49] 1.000: (3972.78, 16.14, 115.89, 100.0%)
Iterative EGTA Process
[diagram repeated, highlighting the Strategy Exploration Problem (JSW, AAMAS-10)]
Learning New Strategies: EGTA+RL [diagram]
The iterative EGTA loop as before, with strategy generation via reinforcement learning:
  Game Analysis (NE) → Refine? If yes, select More Samples or More Strategies.
  New strategies come from RL: learn a best response to the current NE. Deviates? Y: add the new strategy. N: Improve RL model? Y: online learning; N: End.
CDA Learning Problem Setup

State space:
• History of recent trades: H1 moving average; H2 frequency-weighted ratio, threshold = V; H3 frequency-weighted ratio, threshold = A
• Quotes: Q1 opposite role; Q2 same role
• Time: T1 total; T2 since last trade
• Pending trades: U number of trades left; V value of next unit to be traded

Actions: A, an offset from V

Rewards: R, the difference between unit valuation and trade price
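The setup above (discretized state features, offset actions A, reward R) can drive a standard tabular RL update. A minimal sketch of one Q-learning backup on a dict-backed Q table; the state and action keys used in the example are hypothetical, and the slides do not specify which RL algorithm was used.

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    `Q` is a dict keyed by (state, action); missing entries default to 0."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```

Repeated backups over simulated CDA episodes yield the learned bidding strategies (L1, L2, …) that EGTA then evaluates against the current equilibrium.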
EGTA/RL Round 1

Strategy added       Payoff   NE          Learned strategy   Dev. payoff
Kaplan, ZI, ZIbtq    248.1    1.000 ZI    L1                 268.7
L1                   242.5    1.000 L1

EGTA/RL Round 2

Strategy added       Payoff   NE                   Learned strategy   Dev. payoff
Kaplan, ZI, ZIbtq    248.1    1.000 ZI             L1                 268.7
L1                   242.5    1.000 L1
ZIP                  248.0    1.000 ZIP
GD                   248.6    1.000 GD             L2-L8: ---; L9     251.8
L9                   246.1    0.531 GD, 0.469 L9   L10                252.1
EGTA/RL Rounds 3+

Strategy added   Payoff   NE                    Learned strategy   Dev. payoff
…                …        …                     …                  …
L10              248.0    0.191 GD, 0.809 L10   L11                251.0
L11              246.2    1.000 L11
GDX              245.8    0.192 GDX, 0.808 L11  L12                248.3
L12              245.8    0.049 L11, 0.951 L12  L13                245.9
L13              245.6    0.872 L12, 0.128 L13  L14                245.6
RB               245.6    0.872 L12, 0.128 L13

Final champion
Strategy Exploration Problem
• Premise:
  – limited ability to cover the profile space
  – expectation to reasonably evaluate all considered strategies
• Need a deliberate policy to decide which strategies to introduce
• RL for strategy exploration
  – attempt a best response to the current equilibrium
  – is this a good heuristic (even assuming an ideal BR calculation)?
Example

        A1     A2     A3     A4
A1    1,1    1,2    1,3    1,4
A2    2,1    2,2    2,3    2,6
A3    3,1    3,2    3,3    3,8
A4    4,1    6,2    8,3    4,4

Introduce strategies in order A1, A2, A3, A4:

Strategy set       Candidate eq.   Regret wrt true game
{A1}               (A1,A1)         3
{A1,A2}            (A2,A2)         4
{A1,A2,A3}         (A3,A3)         5
{A1,A2,A3,A4}      (A4,A4)         0

Regret may increase over subsequent steps!
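The table above can be reproduced mechanically: for each restricted strategy set, find a pure equilibrium of the restricted game, then measure its regret against the full game. A short sketch:

```python
# Payoff matrices for the example game: R[r][c] is the row player's
# payoff, C[r][c] the column player's (strategies A1..A4 = indices 0..3).
R = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 6, 8, 4]]
C = [[1, 2, 3, 4], [1, 2, 3, 6], [1, 2, 3, 8], [1, 2, 3, 4]]

def regret(r, c, rows, cols):
    """Max unilateral gain from (r, c) when deviations range over rows/cols."""
    return max(max(R[r2][c] for r2 in rows) - R[r][c],
               max(C[r][c2] for c2 in cols) - C[r][c])

def restricted_psne(strats):
    """Pure equilibria of the game restricted to `strats` for both players."""
    return [(r, c) for r in strats for c in strats
            if regret(r, c, strats, strats) == 0]

regrets = []
for k in range(1, 5):
    strats = range(k)                  # introduce A1, A2, A3, A4 in order
    r, c = restricted_psne(strats)[0]
    regrets.append(regret(r, c, range(4), range(4)))  # regret in true game
print(regrets)  # -> [3, 4, 5, 0]: regret rises before dropping to zero
```

This confirms the slide's point: each added strategy improves the restricted game's internal consistency while the candidate equilibrium's true-game regret grows, until the profitable deviation (A4) is finally included.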
[figure: FPSB2 Regret Surface — ε(k_j) plotted over strategy parameters k_i, k_j, with BR, E(DEV), and DEV [MESH] marked]
Exploration Policies
• RND: random (uniform) selection
• Deviation-based
  – DEV: uniform among strategies that deviate from the current equilibrium
  – BR: best response to the current equilibrium
  – BR+DEV: alternate on successive iterations
  – ST(τ): softmax selection among deviators, proportional to gain
• MEMT: select the strategy that maximizes the gain (regret) from deviating to a strategy outside the set, from any mixture over the set
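The ST(τ) policy can be sketched in a few lines: given each candidate deviator's gain against the current equilibrium, select with probability proportional to exp(gain / τ). As τ → 0 this approaches BR; as τ → ∞ it approaches uniform selection among deviators (DEV-like). The `gains` mapping here is hypothetical input, not data from the slides.

```python
import math
import random

def softmax_select(gains, tau):
    """ST(tau): choose a deviating strategy with probability proportional
    to exp(gain / tau), where `gains` maps strategy -> deviation gain."""
    strats = list(gains)
    weights = [math.exp(gains[s] / tau) for s in strats]
    return random.choices(strats, weights=weights, k=1)[0]
```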
[figure: CDA (4 players) — expected regret (log scale, 10^-3 to 10^3) vs. exploration step (1 to 13), for policies MEMT, DEV, RND, BR, ST(10), ST(1), ST(0.1)]
EGTA Applications
• Market games
  – TAC: Travel, Supply Chain, Ad Auction
  – canonical auctions: SAAs, CDAs
  – equity premium in financial trading
• Networking games
  – privacy attacks, routing, wireless access-point selection
• Mechanism design
Conclusion: EGTA Methodology
• Extends the scope of game theory to procedurally defined scenarios
• Embraces the statistical underpinnings of strategic reasoning
• Search process:
  – game theory for establishing a salient strategic context
  – strategy exploration, e.g., RL to search for a best response to that context
→ a principled approach to evaluating complex strategy spaces
• Growing toolbox of EGTA techniques