Towards Richer Game Models
Chris Kiekintveld
University of Texas at El Paso
.. and MANY Collaborators
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Security Game
2 players, 2 targets, 1 defender resource

                   Attack Target 1   Attack Target 2
  Defend Target 1      1, -1             -1, 1
  Defend Target 2      -2, 2             2, -1

(Payoffs listed as defender, attacker)
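For a small matrix game like this, the defender's optimal commitment (a strong Stackelberg equilibrium) can be computed with one LP per attacker response, the classic multiple-LPs approach. A minimal sketch, assuming rows are the defender and using the payoffs from the table above:

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrices from the 2x2 example above: rows = defender, cols = attacker
D = np.array([[ 1, -1], [-2,  2]])   # defender payoffs
A = np.array([[-1,  1], [ 2, -1]])   # attacker payoffs

best = (-np.inf, None)
for j in range(A.shape[1]):          # one LP per attacker pure response j
    # maximize x @ D[:, j]  s.t.  x @ A[:, j] >= x @ A[:, j'] for all j'
    cons = [A[:, jp] - A[:, j] for jp in range(A.shape[1]) if jp != j]
    res = linprog(c=-D[:, j],
                  A_ub=np.array(cons), b_ub=np.zeros(len(cons)),
                  A_eq=np.ones((1, D.shape[0])), b_eq=[1.0],
                  bounds=[(0, 1)] * D.shape[0])
    if res.success and -res.fun > best[0]:
        best = (-res.fun, res.x)

print("defender value:", best[0], "strategy:", best[1])
```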
IRIS: "Intelligent Randomization in International Scheduling" (Deployed 2009)

Federal Air Marshals Service (FAMS): undercover, in-flight law enforcement
Flights each day: ~27,000 domestic, ~2,000 international
Not enough air marshals: assign air marshals to flights

• Massive scheduling problem: 100 flights, 10 officers gives 1.7 × 10^13 combinations
• Overall problem: ~30,000 flights, ~3,000 officers; our focus is the international sector (e.g., international flights from Chicago O'Hare)
• Adversary may exploit predictable schedules
• Complex constraints: tours, duty hours, off-hours
Large Numbers of Defender Strategies

(Figure: matrix of joint schedules, Strategy1-Strategy6, pairing flight tours with air-marshal assignments)

100 flight tours, 10 air marshals: 1.73 × 10^13 joint schedules; ARMOR runs out of memory
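The number of joint schedules is just a binomial coefficient, "tours choose marshals"; a quick check:

```python
import math

# 10 air marshals assigned across 100 flight tours, order irrelevant
print(math.comb(100, 10))   # 17310309456440, i.e. about 1.73e13
```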
FAMS: Joint Strategies or Combinations

Don't enumerate ALL joint strategies:
• Marginals (IRIS I & II)
• Branch and price (IRIS III)

(Figure: the same joint-strategy matrix, Strategy1-Strategy6, with only a few columns generated)
IRIS I & II: Marginals Instead of Joint Strategies

ARMOR actions (10 tours, 3 air marshals: C(10,3) = 120 joint actions):

  #     Tour combo   Prob
  1     1,2,3        x1
  2     1,2,4        x2
  3     1,2,5        x3
  …     …            …
  120   8,9,10       x120

Compact actions (marginal coverage per tour):

  #    Tour   Prob
  1    1      y1
  2    2      y2
  3    3      y3
  …    …      …
  10   10     y10

The joint payoff matrix is full of duplicates, since payoffs depend only on which target is covered:

  Combo   Attack 1   Attack 2   …   Attack 6
  1,2,3   5,-10      4,-8       …   -20,9
  1,2,4   5,-10      4,-8       …   -20,9
  1,3,5   5,-10      -9,5       …   -20,9
  …       …          …          …   …

MILP similar to ARMOR, but over y instead of x: 10 variables instead of 120, with y1 + y2 + … + y10 = 3. Sample schedules from "y" rather than enumerating "x". This only works for SIMPLE tours.
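As one way to do the sampling step, a minimal sketch using systematic sampling (my own illustration, assuming every y_i ≤ 1, the marginals sum to the number of marshals, and tours are simple with no joint constraints; the deployed IRIS procedure may differ):

```python
import math
import random

def sample_schedule(y):
    """Draw a set of tours so tour i is covered with probability y[i].
    Requires 0 <= y[i] <= 1 and sum(y) equal to the number of marshals."""
    u = random.random()                 # one shared random offset
    chosen, cum = [], 0.0
    for i, yi in enumerate(y):
        prev, cum = cum, cum + yi
        # tour i is selected iff a grid point u, u+1, u+2, ... lands in [prev, cum)
        if math.ceil(prev - u) < math.ceil(cum - u):
            chosen.append(i)
    return chosen

# Example: 10 tours, 3 marshals, uniform marginals y_i = 0.3
print(sample_schedule([0.3] * 10))      # always returns exactly 3 tours
```

Systematic sampling realizes each marginal y_i exactly; the correlations it introduces between tours are harmless here because payoffs depend only on marginal coverage.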
The underlying MILP (DOBSS-style, with attacker types l ∈ L, prior p^l, defender actions i ∈ X, attacks j ∈ Q, and a large constant M):

\[
\begin{aligned}
\max_{x, q, a} \quad & \sum_{i \in X} \sum_{l \in L} \sum_{j \in Q} p^l R^l_{ij}\, x_i\, q^l_j \\
\text{s.t.} \quad & \sum_{i \in X} x_i = 1 \\
& \sum_{j \in Q} q^l_j = 1 \qquad \forall\, l \in L \\
& 0 \;\le\; a^l - \sum_{i \in X} C^l_{ij}\, x_i \;\le\; (1 - q^l_j)\, M \qquad \forall\, j \in Q,\ l \in L \\
& x_i \in [0, 1], \qquad q^l_j \in \{0, 1\}
\end{aligned}
\]
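For the single-type case (|L| = 1), here is a runnable sketch in PuLP. Note that I linearize the bilinear objective x_i q_j with an auxiliary defender-value variable d and the same big-M trick, which is a standard simplification rather than the exact DOBSS linearization, and the payoff numbers are toy values:

```python
import pulp

# Toy single-type instance (hypothetical payoffs, not real IRIS data)
R = [[ 2, -1, -3],   # R[i][j]: defender payoff when defender plays i, attacker j
     [-2,  3, -1],
     [-1, -2,  4]]
C = [[-1,  2,  3],   # C[i][j]: attacker payoff
     [ 2, -2,  1],
     [ 1,  3, -3]]
n, M = 3, 1000.0

prob = pulp.LpProblem("stackelberg_sketch", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", 0, 1) for i in range(n)]           # defender mixed strategy
q = [pulp.LpVariable(f"q{j}", cat="Binary") for j in range(n)]   # attacker's pure response
a = pulp.LpVariable("a")   # attacker's best-response value
d = pulp.LpVariable("d")   # defender's value (linearizes the x*q objective)

prob += d                                   # objective: defender expected utility
prob += pulp.lpSum(x) == 1                  # x is a probability distribution
prob += pulp.lpSum(q) == 1                  # attacker picks exactly one response
for j in range(n):
    ea = pulp.lpSum(C[i][j] * x[i] for i in range(n))
    ed = pulp.lpSum(R[i][j] * x[i] for i in range(n))
    prob += a >= ea                         # a bounds every response's value...
    prob += a - ea <= (1 - q[j]) * M        # ...and is tight where q_j = 1
    prob += d <= ed + (1 - q[j]) * M        # d = defender utility at chosen response

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("x =", [xi.value() for xi in x], " defender value =", d.value())
```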
IRIS Speedups
                ARMOR actions   ARMOR runtime   IRIS runtime
FAMS Ireland    6,048           4.74 s          0.09 s
FAMS London     85,275          n/a             1.57 s
(Figure: scaling with targets; runtimes in minutes (0-5) vs. number of targets (10-20) for Compact ARMOR, IRIS I, and IRIS II)
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Adversary Modeling Assumptions

Game-theoretic models require assumptions:
• What are the attacker's utilities for different outcomes?
• What does the attacker know about the defender's strategy?
• What procedure will an attacker use to make a decision?

Even the best models are estimates.

Research Directions
• Improving/validating more accurate models of human behavior
• Robust solution methods
• Adaptive learning models
Behavioral Game Theory

Humans do not play according to Nash equilibrium. Psychology and experimental economics may provide better models:
• Choice theory
• Prospect theory
• Anchoring biases
• Subjective utility
• …
Behavioral Experiments
Perform experiments with human participants to evaluate models
Recruit using Amazon Mechanical Turk
Participants play the role of the attacker in a simulated security game
Paid real money based on performance to motivate thoughtful choices
Quantal Response Equilibrium
• Errors in individuals' responses
• Still: more likely to select better choices than worse choices
• λ represents the error level (λ = 0 means uniform random)
• Maximum likelihood estimation: λ = 0.76
• Compute the optimal defender response to a QR attacker

Quantal best response:
\[
q_j \;=\; \frac{e^{\lambda \cdot U(j, x)}}{\sum_{k=1}^{M} e^{\lambda \cdot U(k, x)}}
\]
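A small sketch of the quantal response computation and a maximum-likelihood fit for λ (my own illustration; the utilities and observed attack counts are made-up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def quantal_response(utils, lam):
    """P(choice j) proportional to exp(lambda * U(j, x)): softmax over utilities."""
    w = np.exp(lam * (utils - utils.max()))   # subtract max for numeric stability
    return w / w.sum()

# Hypothetical attacker utilities per target and observed attack counts
utils = np.array([2.0, 0.5, -1.0, 1.0])
counts = np.array([40, 12, 3, 20])

def neg_log_likelihood(lam):
    return -(counts * np.log(quantal_response(utils, lam))).sum()

lam_hat = minimize_scalar(neg_log_likelihood, bounds=(0, 5), method="bounded").x
print(f"fitted lambda = {lam_hat:.2f}")
```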
Example Experiments
• 7 payoff structures, 5 defender strategies for each payoff structure
• New methods: BRPT, RPT, and BRQR
• Leading contender: COBRA
• Perfectly rational baseline: DOBSS
• Subjects play all games (in randomized orders)
• No feedback until a subject finishes all games
(Figure: average defender expected utility, roughly -3 to 2, on Payoffs 1-4 for BRPT, RPT, BRQR, COBRA, and DOBSS)
Discussion
• Behavioral game theory models generally perform better than Nash equilibrium models
• Increasingly sophisticated models are being developed and tested
• The current leading model is "subjective utility" quantal response
• This addresses modeling/validation of the decision-making process, but not the rest of the game model
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Robust Solution Methods

Account for uncertainty in the game model:
• Payoff uncertainty
• Observation uncertainty
• Decision-making uncertainty

Different models of payoff uncertainty:
• Finite Bayesian: distinct attacker types
• Infinite Bayesian: payoff distributions
• Intervals: ranges of possible payoffs
Finite Bayesian Games

Three attacker types, each defining a 2x2 game over two terminals (rows = defender, columns = attacker; payoffs listed as defender, attacker):

Type 1 (P = 0.3):   Term #1    Term #2
  Term #1           5, -3      -1, 1
  Term #2           -5, 5      2, -1

Type 2 (P = 0.5):   Term #1    Term #2
  Term #1           2, -1      -3, 4
  Term #2           -3, 1      3, -3

Type 3 (P = 0.2):   Term #1    Term #2
  Term #1           4, -2      -1, 0.5
  Term #2           -4, 3      1.5, -0.5

Harsanyi Transformation: collapse the types into one normal-form game whose columns are type-contingent attacker strategies (111, 112, 121, …, 222) and whose payoffs are averaged over the type distribution:

               111          112        …    222
Terminal #1    3.3, -2.2    2.3, …     …    …
Terminal #2    -3.8, 2.6    …, …       …    …

Alternative: optimization models that work on the Bayesian game directly.
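A short sketch of the transformation, computed from the three type tables above (my own illustration):

```python
import itertools

P = [0.3, 0.5, 0.2]   # type probabilities
# payoffs[type][defender_action][attacker_action] = (defender, attacker)
payoffs = [
    [[( 5, -3), (-1,  1  )], [(-5, 5), ( 2,   -1  )]],
    [[( 2, -1), (-3,  4  )], [(-3, 1), ( 3,   -3  )]],
    [[( 4, -2), (-1,  0.5)], [(-4, 3), ( 1.5, -0.5)]],
]

for i in range(2):                                        # defender: Terminal 1 or 2
    for combo in itertools.product(range(2), repeat=3):   # one attack per type
        du = sum(P[l] * payoffs[l][i][combo[l]][0] for l in range(3))
        au = sum(P[l] * payoffs[l][i][combo[l]][1] for l in range(3))
        label = "".join(str(c + 1) for c in combo)
        # e.g. Terminal 1 vs 111 gives the defender 0.3*5 + 0.5*2 + 0.2*4 = 3.3
        print(f"Terminal {i+1} vs {label}: {du:+.2f}, {au:+.2f}")
```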
Distributional Payoff Representation

(Figure: for each target 1…T, a coverage-vector entry plus two payoff distributions Pb(payoff), one for the target covered and one for the target uncovered)
Computational Approaches

From coverage vector to attack vector:
(1) Monte-Carlo estimation
(2) Numerical methods

Algorithms:
(1) Optimal Finite Algorithms
(2) Sampled Replicator Dynamics
(3) Greedy Monte-Carlo
(4) Decoupled Target Sets

Options range from assuming perfect information, to sampling types with exact optimization, to sampling types with approximate optimization. Assuming perfect information is very brittle; approximate both the type distribution and the optimization.
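A minimal sketch of the Monte-Carlo step, estimating the attack vector induced by a coverage vector (my own illustration; I assume attacker rewards are normally distributed per target with zero penalty, which is just one possible distributional form):

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_distribution(cov, mu, sigma, samples=10_000):
    """Estimate P(attack on each target) given coverage probabilities `cov`
    and per-target attacker reward distributions N(mu, sigma)."""
    cov = np.asarray(cov)
    counts = np.zeros(len(cov))
    for _ in range(samples):
        rewards = rng.normal(mu, sigma)      # one realization of attacker payoffs
        eu = (1 - cov) * rewards             # attacker EU with zero penalty
        counts[eu.argmax()] += 1             # best response for this realization
    return counts / samples

print(attack_distribution([0.5, 0.3, 0.2], mu=[2.0, 3.0, 4.0], sigma=[1.0, 1.0, 2.0]))
```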
Results for Distributional Games
Beyond Bayesian Games
Bayesian games are powerful:
• General framework for model uncertainty
• Exact behavior predictions based on uncertainty

Some limitations:
• Require distributional information: even MORE parameters to specify! What if these are wrong?
• Computational challenges (NP-hard)
• Uncertainty about human decision making is hard to capture in Bayesian models
Interval Security Games

                    Target 1   Target 2   Target 3   Target 4
Defender reward     0          0          0          0
Defender penalty    -1         -4         -6         -10
Attacker penalty    0          0          0          0
Attacker reward     [1,3]      [2,5]      [4,7]      [6,10]

• Attacker payoffs represented by intervals
• Maximize the worst case for the defender
• Distribution-free
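For intuition, a sketch of evaluating the defender's worst case for a fixed coverage vector under interval rewards (my own simplified illustration; published interval algorithms optimize the coverage rather than just evaluating it):

```python
def worst_case_utility(cov, d_pen, a_lo, a_hi):
    """Worst-case defender EU for a fixed coverage vector when each attacker
    reward lies in an interval (defender reward and attacker penalty are 0,
    matching the table above)."""
    T = range(len(cov))
    # The attacker is guaranteed at least this much by its safest target:
    guaranteed = max((1 - cov[t]) * a_lo[t] for t in T)
    # Any target that could beat that bound might be attacked:
    possible = [t for t in T if (1 - cov[t]) * a_hi[t] >= guaranteed]
    return min((1 - cov[t]) * d_pen[t] for t in possible)

# Coverage tilted toward high-penalty targets (hypothetical numbers)
print(worst_case_utility([0.1, 0.2, 0.3, 0.4],
                         [-1, -4, -6, -10],
                         [1, 2, 4, 6], [3, 5, 7, 10]))
```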
Bayesian vs Interval Solution Quality
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Adaptive Adversary Modeling

Many domains have repeated interactions with an adversary:
• Border security (detections/apprehensions)
• Network security (constant probing/attacking)

Machine learning methods can update models over time based on observed behaviors. The defender may need to explore to learn more about the attacker's responses.
Basic Model
• Fixed number of zones
• Multiple rounds of interaction
• Multiple attackers per round
• Defender patrols one zone
• Apprehensions only in this zone

Multi-Armed Bandits
• There are a finite number of (slot) machines, each with a single arm to pull
• Each machine has an unknown expected payoff
• Problem: select the best arm to pull, balancing exploring machines to find good payoffs against exploiting current knowledge
• If the attacker chooses zones with fixed probabilities, this maps easily to a MAB
Online Learning Methods
• Upper Confidence Bounds (UCB) [Auer et al., 2002]: the standard approach for multi-armed bandits; regret grows only logarithmically relative to the optimal strategy
• Exponential-weight algorithm for Exploration and Exploitation (EXP3) [Auer et al., 2001]: designed for adversarial bandits; convergence guarantees hold even when rewards can be manipulated by an adversary
• Combined equilibrium/learning methods: start with an (approximate) equilibrium and simultaneously learn the optimal policy (see the sketch below)
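A minimal UCB1 sketch for the patrol-zone model above (my own illustration; `attacker` is a hypothetical callback that returns 1 if patrolling that zone yields an apprehension this round):

```python
import math
import random

def ucb1(n_zones, rounds, attacker):
    """Choose a patrol zone each round, balancing exploration and exploitation."""
    counts = [0] * n_zones
    means = [0.0] * n_zones
    for t in range(1, rounds + 1):
        if t <= n_zones:
            z = t - 1                         # play every zone once first
        else:                                 # zone with highest upper confidence bound
            z = max(range(n_zones),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = attacker(z)                       # 1 = apprehension, 0 = nothing
        counts[z] += 1
        means[z] += (r - means[z]) / counts[z]
    return means

# An attacker choosing zones with fixed probabilities maps to a standard MAB
probs = [0.1, 0.2, 0.5, 0.2]
print(ucb1(4, 5000, lambda z: int(random.random() < probs[z])))
```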
Sample Experiment

(Figure: apprehension rate in % against an adversarial attacker over rounds (x100), comparing SSE, SSE with error 0.1, EXP3, and COMB1-COMB4; rates range from about 10% to 24%)
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Lab Evaluation
• Simulated adversaries
• Human subject adversaries
• Security games superior vs. human schedulers and "simple random"

EVALUATING DEPLOYED SECURITY SYSTEMS IS NOT EASY
How Well Optimized are Limited Security Resources?

Field evaluation: patrol quality (unpredictable? good coverage?)
• Compare against real schedules
• Scheduling competitions
• Expert evaluation

Field evaluation: tests against adversaries
• "Mock attackers"
• Capture rates of real adversaries
Weaknesses of Previous Methods

Human schedulers:
• Predictable patterns, e.g., US Coast Guard
• Scheduling effort and cognitive burden

Simple random (e.g., dice roll):
• Repeatedly fails in deployments, e.g., sending officers to sparsely crowded terminals

Weighted random:
• Trillions of patrolling strategies: how to select the important ones?
• How to incorporate learned adversary models and planning into these weights?

In contrast: multiple deployments over multiple years, with no forced use.

WHY DOES GAME THEORY PERFORM BETTER?
Lab Evaluation via Simulation: IRIS (FAMS)

(Figure: defender expected utility vs. schedule size (50, 150, 250) for Uniform, Weighted random 1, Weighted random 2, and IRIS; utilities range from about -10 to 6)
PROTECT (Coast Guard)

(Figure: patrol counts by base patrol area over Days 1-7, before PROTECT vs. after PROTECT, Boston)
FIELD EVALUATION OF SCHEDULE QUALITY
Improved Patrol Unpredictability & Coverage for Less Effort

• 350% increase in defender expected utility
• FAMS: IRIS outperformed expert humans over six months (Report: GAO-09-903T)
FIELD EVALUATION OF SCHEDULE QUALITY
Improved Patrol Unpredictability & Coverage for Less Effort

• Trains: TRUSTS outperformed expert humans scheduling 90 officers on LA trains
(Figure: security scores, roughly 3 to 5.5, across questions Q1-Q12, comparing human-generated and game-theoretic schedules)
PRE- to POST-PROTECT (Boston): "Deterrence" Improved

Additional real-world indicators from Boston:
• Boaters' questions, e.g., "…has the Coast Guard recently acquired more boats?"
• POST-PROTECT: actual reports of illegal activity
• "Mock attacker" team analysis
FIELD TEST AGAINST ADVERSARIES: MOCK ATTACKERS

Example from PROTECT
• Game theory vs. random

(Figures: captures, warnings, and violations per 30 minutes in controlled vs. not-controlled areas, Game Theory vs. Rand+Human, counts 0-20; and a 0-100 count breakdown by incident type, miscellaneous vs. drugs)
FIELD TESTS AGAINST ADVERSARIES
Computational Game Theory in the Field

• 21 days of patrol
• Identical conditions
• Random + Human as the comparison
EXPERT EVALUATION
Examples from ARMOR, IRIS, and PROTECT

• February 2009: Commendations, LAX Police (City of Los Angeles)
• July 2011: Operational Excellence Award (US Coast Guard, Boston)
• September 2011: Certificate of Appreciation (Federal Air Marshals)
• June 2013: Meritorious Team Commendation from Commandant (US Coast Guard)
Thank You!
Final questions?