Towards Richer Game Models
Chris Kiekintveld
University of Texas at El Paso
.. and MANY Collaborators
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Security Game
2 players, 2 targets, 1 defender resource

                   Attack Target 1   Attack Target 2
  Defend Target 1      1, -1             -1, 1
  Defend Target 2      -2, 2             2, -1

(Payoffs listed as defender, attacker)
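For a small matrix game like this, the defender's optimal commitment (a strong Stackelberg equilibrium) can be computed with one LP per attacker response, the classic multiple-LPs approach. A minimal sketch, assuming rows are the defender and using the payoffs from the table above:

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrices from the 2x2 example above: rows = defender, cols = attacker
D = np.array([[ 1, -1], [-2,  2]])   # defender payoffs
A = np.array([[-1,  1], [ 2, -1]])   # attacker payoffs

best = (-np.inf, None)
for j in range(A.shape[1]):          # one LP per attacker pure response j
    # maximize x @ D[:, j]  s.t.  x @ A[:, j] >= x @ A[:, j'] for all j'
    cons = [A[:, jp] - A[:, j] for jp in range(A.shape[1]) if jp != j]
    res = linprog(c=-D[:, j],
                  A_ub=np.array(cons), b_ub=np.zeros(len(cons)),
                  A_eq=np.ones((1, D.shape[0])), b_eq=[1.0],
                  bounds=[(0, 1)] * D.shape[0])
    if res.success and -res.fun > best[0]:
        best = (-res.fun, res.x)

print("defender value:", best[0], "strategy:", best[1])
```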
IRIS: "Intelligent Randomization in International Scheduling" (Deployed 2009)

Federal Air Marshals Service (FAMS): undercover, in-flight law enforcement
Flights each day: ~27,000 domestic, ~2,000 international
Not enough air marshals: assign air marshals to flights

• Massive scheduling problem: 100 flights, 10 officers gives 1.7 × 10^13 combinations
• Overall problem: ~30,000 flights, ~3,000 officers; our focus is the international sector (e.g., international flights from Chicago O'Hare)
• Adversary may exploit predictable schedules
• Complex constraints: tours, duty hours, off-hours
Large Numbers of Defender Strategies

(Figure: matrix of joint schedules, Strategy1-Strategy6, pairing flight tours with air-marshal assignments)

100 flight tours, 10 air marshals: 1.73 × 10^13 joint schedules; ARMOR runs out of memory
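The number of joint schedules is just a binomial coefficient, "tours choose marshals"; a quick check:

```python
import math

# 10 air marshals assigned across 100 flight tours, order irrelevant
print(math.comb(100, 10))   # 17310309456440, i.e. about 1.73e13
```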
FAMS: Joint Strategies or Combinations

Don't enumerate ALL joint strategies:
• Marginals (IRIS I & II)
• Branch and price (IRIS III)

(Figure: the same joint-strategy matrix, Strategy1-Strategy6, with only a few columns generated)
IRIS I & II: Marginals Instead of Joint Strategies

ARMOR actions (10 tours, 3 air marshals: C(10,3) = 120 joint actions):

  #     Tour combo   Prob
  1     1,2,3        x1
  2     1,2,4        x2
  3     1,2,5        x3
  …     …            …
  120   8,9,10       x120

Compact actions (marginal coverage per tour):

  #    Tour   Prob
  1    1      y1
  2    2      y2
  3    3      y3
  …    …      …
  10   10     y10

The joint payoff matrix is full of duplicates, since payoffs depend only on which target is covered:

  Combo   Attack 1   Attack 2   …   Attack 6
  1,2,3   5,-10      4,-8       …   -20,9
  1,2,4   5,-10      4,-8       …   -20,9
  1,3,5   5,-10      -9,5       …   -20,9
  …       …          …          …   …

MILP similar to ARMOR, but over y instead of x: 10 variables instead of 120, with y1 + y2 + … + y10 = 3. Sample schedules from "y" rather than enumerating "x". This only works for SIMPLE tours.
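As one way to do the sampling step, a minimal sketch using systematic sampling (my own illustration, assuming every y_i ≤ 1, the marginals sum to the number of marshals, and tours are simple with no joint constraints; the deployed IRIS procedure may differ):

```python
import math
import random

def sample_schedule(y):
    """Draw a set of tours so tour i is covered with probability y[i].
    Requires 0 <= y[i] <= 1 and sum(y) equal to the number of marshals."""
    u = random.random()                 # one shared random offset
    chosen, cum = [], 0.0
    for i, yi in enumerate(y):
        prev, cum = cum, cum + yi
        # tour i is selected iff a grid point u, u+1, u+2, ... lands in [prev, cum)
        if math.ceil(prev - u) < math.ceil(cum - u):
            chosen.append(i)
    return chosen

# Example: 10 tours, 3 marshals, uniform marginals y_i = 0.3
print(sample_schedule([0.3] * 10))      # always returns exactly 3 tours
```

Systematic sampling realizes each marginal y_i exactly; the correlations it introduces between tours are harmless here because payoffs depend only on marginal coverage.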
The underlying MILP (DOBSS-style, with attacker types l ∈ L, prior p^l, defender actions i ∈ X, attacks j ∈ Q, and a large constant M):

\[
\begin{aligned}
\max_{x, q, a} \quad & \sum_{i \in X} \sum_{l \in L} \sum_{j \in Q} p^l R^l_{ij}\, x_i\, q^l_j \\
\text{s.t.} \quad & \sum_{i \in X} x_i = 1 \\
& \sum_{j \in Q} q^l_j = 1 \qquad \forall\, l \in L \\
& 0 \;\le\; a^l - \sum_{i \in X} C^l_{ij}\, x_i \;\le\; (1 - q^l_j)\, M \qquad \forall\, j \in Q,\ l \in L \\
& x_i \in [0, 1], \qquad q^l_j \in \{0, 1\}
\end{aligned}
\]
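For the single-type case (|L| = 1), here is a runnable sketch in PuLP. Note that I linearize the bilinear objective x_i q_j with an auxiliary defender-value variable d and the same big-M trick, which is a standard simplification rather than the exact DOBSS linearization, and the payoff numbers are toy values:

```python
import pulp

# Toy single-type instance (hypothetical payoffs, not real IRIS data)
R = [[ 2, -1, -3],   # R[i][j]: defender payoff when defender plays i, attacker j
     [-2,  3, -1],
     [-1, -2,  4]]
C = [[-1,  2,  3],   # C[i][j]: attacker payoff
     [ 2, -2,  1],
     [ 1,  3, -3]]
n, M = 3, 1000.0

prob = pulp.LpProblem("stackelberg_sketch", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", 0, 1) for i in range(n)]           # defender mixed strategy
q = [pulp.LpVariable(f"q{j}", cat="Binary") for j in range(n)]   # attacker's pure response
a = pulp.LpVariable("a")   # attacker's best-response value
d = pulp.LpVariable("d")   # defender's value (linearizes the x*q objective)

prob += d                                   # objective: defender expected utility
prob += pulp.lpSum(x) == 1                  # x is a probability distribution
prob += pulp.lpSum(q) == 1                  # attacker picks exactly one response
for j in range(n):
    ea = pulp.lpSum(C[i][j] * x[i] for i in range(n))
    ed = pulp.lpSum(R[i][j] * x[i] for i in range(n))
    prob += a >= ea                         # a bounds every response's value...
    prob += a - ea <= (1 - q[j]) * M        # ...and is tight where q_j = 1
    prob += d <= ed + (1 - q[j]) * M        # d = defender utility at chosen response

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("x =", [xi.value() for xi in x], " defender value =", d.value())
```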
IRIS Speedups
                ARMOR actions   ARMOR runtime   IRIS runtime
FAMS Ireland    6,048           4.74 s          0.09 s
FAMS London     85,275          n/a             1.57 s
(Figure: scaling with targets; runtimes in minutes (0-5) vs. number of targets (10-20) for Compact ARMOR, IRIS I, and IRIS II)
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Adversary Modeling Assumptions

Game-theoretic models require assumptions:
• What are the attacker's utilities for different outcomes?
• What does the attacker know about the defender's strategy?
• What procedure will an attacker use to make a decision?

Even the best models are estimates.

Research Directions
• Improving/validating more accurate models of human behavior
• Robust solution methods
• Adaptive learning models
Behavioral Game Theory

Humans do not play according to Nash equilibrium. Psychology and experimental economics may provide better models:
• Choice theory
• Prospect theory
• Anchoring biases
• Subjective utility
• …
Behavioral Experiments
Perform experiments with human participants to evaluate models
Recruit using Amazon Mechanical Turk
Participants play the role of the attacker in a simulated security game
Paid real money based on performance to motivate thoughtful choices
Quantal Response Equilibrium
• Errors in individuals' responses
• Still: more likely to select better choices than worse choices
• λ represents the error level (λ = 0 means uniform random)
• Maximum likelihood estimation: λ = 0.76
• Compute the optimal defender response to a QR attacker

Quantal best response:
\[
q_j \;=\; \frac{e^{\lambda \cdot U(j, x)}}{\sum_{k=1}^{M} e^{\lambda \cdot U(k, x)}}
\]
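A small sketch of the quantal response computation and a maximum-likelihood fit for λ (my own illustration; the utilities and observed attack counts are made-up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def quantal_response(utils, lam):
    """P(choice j) proportional to exp(lambda * U(j, x)): softmax over utilities."""
    w = np.exp(lam * (utils - utils.max()))   # subtract max for numeric stability
    return w / w.sum()

# Hypothetical attacker utilities per target and observed attack counts
utils = np.array([2.0, 0.5, -1.0, 1.0])
counts = np.array([40, 12, 3, 20])

def neg_log_likelihood(lam):
    return -(counts * np.log(quantal_response(utils, lam))).sum()

lam_hat = minimize_scalar(neg_log_likelihood, bounds=(0, 5), method="bounded").x
print(f"fitted lambda = {lam_hat:.2f}")
```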
Example Experiments
• 7 payoff structures, 5 defender strategies for each payoff structure
• New methods: BRPT, RPT, and BRQR
• Leading contender: COBRA
• Perfectly rational baseline: DOBSS
• Subjects play all games (in randomized orders)
• No feedback until a subject finishes all games
(Figure: average defender expected utility, roughly -3 to 2, on Payoffs 1-4 for BRPT, RPT, BRQR, COBRA, and DOBSS)
Discussion
• Behavioral game theory models generally perform better than Nash equilibrium models
• Increasingly sophisticated models are being developed and tested
• The current leading model is "subjective utility" quantal response
• This addresses modeling/validation of the decision-making process, but not the rest of the game model
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Robust Solution Methods

Account for uncertainty in the game model:
• Payoff uncertainty
• Observation uncertainty
• Decision-making uncertainty

Different models of payoff uncertainty:
• Finite Bayesian: distinct attacker types
• Infinite Bayesian: payoff distributions
• Intervals: ranges of possible payoffs
Finite Bayesian Games

Three attacker types, each defining a 2x2 game over two terminals (rows = defender, columns = attacker; payoffs listed as defender, attacker):

Type 1 (P = 0.3):   Term #1    Term #2
  Term #1           5, -3      -1, 1
  Term #2           -5, 5      2, -1

Type 2 (P = 0.5):   Term #1    Term #2
  Term #1           2, -1      -3, 4
  Term #2           -3, 1      3, -3

Type 3 (P = 0.2):   Term #1    Term #2
  Term #1           4, -2      -1, 0.5
  Term #2           -4, 3      1.5, -0.5

Harsanyi Transformation: collapse the types into one normal-form game whose columns are type-contingent attacker strategies (111, 112, 121, …, 222) and whose payoffs are averaged over the type distribution:

               111          112        …    222
Terminal #1    3.3, -2.2    2.3, …     …    …
Terminal #2    -3.8, 2.6    …, …       …    …

Alternative: optimization models that work on the Bayesian game directly.
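A short sketch of the transformation, computed from the three type tables above (my own illustration):

```python
import itertools

P = [0.3, 0.5, 0.2]   # type probabilities
# payoffs[type][defender_action][attacker_action] = (defender, attacker)
payoffs = [
    [[( 5, -3), (-1,  1  )], [(-5, 5), ( 2,   -1  )]],
    [[( 2, -1), (-3,  4  )], [(-3, 1), ( 3,   -3  )]],
    [[( 4, -2), (-1,  0.5)], [(-4, 3), ( 1.5, -0.5)]],
]

for i in range(2):                                        # defender: Terminal 1 or 2
    for combo in itertools.product(range(2), repeat=3):   # one attack per type
        du = sum(P[l] * payoffs[l][i][combo[l]][0] for l in range(3))
        au = sum(P[l] * payoffs[l][i][combo[l]][1] for l in range(3))
        label = "".join(str(c + 1) for c in combo)
        # e.g. Terminal 1 vs 111 gives the defender 0.3*5 + 0.5*2 + 0.2*4 = 3.3
        print(f"Terminal {i+1} vs {label}: {du:+.2f}, {au:+.2f}")
```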
Distributional Payoff Representation

(Figure: for each target 1…T, a coverage-vector entry plus two payoff distributions Pb(payoff), one for the target covered and one for the target uncovered)
Computational Approaches

From coverage vector to attack vector:
(1) Monte-Carlo estimation
(2) Numerical methods

Algorithms:
(1) Optimal Finite Algorithms
(2) Sampled Replicator Dynamics
(3) Greedy Monte-Carlo
(4) Decoupled Target Sets

Options range from assuming perfect information, to sampling types with exact optimization, to sampling types with approximate optimization. Assuming perfect information is very brittle; approximate both the type distribution and the optimization.
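A minimal sketch of the Monte-Carlo step, estimating the attack vector induced by a coverage vector (my own illustration; I assume attacker rewards are normally distributed per target with zero penalty, which is just one possible distributional form):

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_distribution(cov, mu, sigma, samples=10_000):
    """Estimate P(attack on each target) given coverage probabilities `cov`
    and per-target attacker reward distributions N(mu, sigma)."""
    cov = np.asarray(cov)
    counts = np.zeros(len(cov))
    for _ in range(samples):
        rewards = rng.normal(mu, sigma)      # one realization of attacker payoffs
        eu = (1 - cov) * rewards             # attacker EU with zero penalty
        counts[eu.argmax()] += 1             # best response for this realization
    return counts / samples

print(attack_distribution([0.5, 0.3, 0.2], mu=[2.0, 3.0, 4.0], sigma=[1.0, 1.0, 2.0]))
```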
Results for Distributional Games
Beyond Bayesian Games
Bayesian games are powerful:
• General framework for model uncertainty
• Exact behavior predictions based on uncertainty

Some limitations:
• Require distributional information: even MORE parameters to specify! What if these are wrong?
• Computational challenges (NP-hard)
• Uncertainty about human decision making is hard to capture in Bayesian models
Interval Security Games

                    Target 1   Target 2   Target 3   Target 4
Defender reward     0          0          0          0
Defender penalty    -1         -4         -6         -10
Attacker penalty    0          0          0          0
Attacker reward     [1,3]      [2,5]      [4,7]      [6,10]

• Attacker payoffs represented by intervals
• Maximize the worst case for the defender
• Distribution-free
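For intuition, a sketch of evaluating the defender's worst case for a fixed coverage vector under interval rewards (my own simplified illustration; published interval algorithms optimize the coverage rather than just evaluating it):

```python
def worst_case_utility(cov, d_pen, a_lo, a_hi):
    """Worst-case defender EU for a fixed coverage vector when each attacker
    reward lies in an interval (defender reward and attacker penalty are 0,
    matching the table above)."""
    T = range(len(cov))
    # The attacker is guaranteed at least this much by its safest target:
    guaranteed = max((1 - cov[t]) * a_lo[t] for t in T)
    # Any target that could beat that bound might be attacked:
    possible = [t for t in T if (1 - cov[t]) * a_hi[t] >= guaranteed]
    return min((1 - cov[t]) * d_pen[t] for t in possible)

# Coverage tilted toward high-penalty targets (hypothetical numbers)
print(worst_case_utility([0.1, 0.2, 0.3, 0.4],
                         [-1, -4, -6, -10],
                         [1, 2, 4, 6], [3, 5, 7, 10]))
```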
Bayesian vs Interval Solution Quality
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Adaptive Adversary Modeling

Many domains have repeated interactions with an adversary:
• Border security (detections/apprehensions)
• Network security (constant probing/attacking)

Machine learning methods can update models over time based on observed behaviors. The defender may need to explore to learn more about the attacker's responses.
Basic Model
• Fixed number of zones
• Multiple rounds of interaction
• Multiple attackers per round
• Defender patrols one zone
• Apprehensions only in this zone

Multi-Armed Bandits
• There are a finite number of (slot) machines, each with a single arm to pull
• Each machine has an unknown expected payoff
• Problem: select the best arm to pull, balancing exploring machines to find good payoffs against exploiting current knowledge
• If the attacker chooses zones with fixed probabilities, this maps easily to a MAB
Online Learning Methods
• Upper Confidence Bounds (UCB) [Auer et al., 2002]: the standard approach for multi-armed bandits; regret grows only logarithmically relative to the optimal strategy
• Exponential-weight algorithm for Exploration and Exploitation (EXP3) [Auer et al., 2001]: designed for adversarial bandits; convergence guarantees hold even when rewards can be manipulated by an adversary
• Combined equilibrium/learning methods: start with an (approximate) equilibrium and simultaneously learn the optimal policy (see the sketch below)
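A minimal UCB1 sketch for the patrol-zone model above (my own illustration; `attacker` is a hypothetical callback that returns 1 if patrolling that zone yields an apprehension this round):

```python
import math
import random

def ucb1(n_zones, rounds, attacker):
    """Choose a patrol zone each round, balancing exploration and exploitation."""
    counts = [0] * n_zones
    means = [0.0] * n_zones
    for t in range(1, rounds + 1):
        if t <= n_zones:
            z = t - 1                         # play every zone once first
        else:                                 # zone with highest upper confidence bound
            z = max(range(n_zones),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = attacker(z)                       # 1 = apprehension, 0 = nothing
        counts[z] += 1
        means[z] += (r - means[z]) / counts[z]
    return means

# An attacker choosing zones with fixed probabilities maps to a standard MAB
probs = [0.1, 0.2, 0.5, 0.2]
print(ucb1(4, 5000, lambda z: int(random.random() < probs[z])))
```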
Sample Experiment

(Figure: apprehension rate in % against an adversarial attacker over rounds (x100), comparing SSE, SSE with error 0.1, EXP3, and COMB1-COMB4; rates range from about 10% to 24%)
Research Challenges
Scalability • Human behavior • Robustness to uncertainty • Learning • Evaluation
Lab Evaluation
• Simulated adversaries
• Human subject adversaries
• Security games superior vs. human schedulers and "simple random"

EVALUATING DEPLOYED SECURITY SYSTEMS IS NOT EASY
How Well Optimized are Limited Security Resources?

Field evaluation: patrol quality (unpredictable? good coverage?)
• Compare against real schedules
• Scheduling competitions
• Expert evaluation

Field evaluation: tests against adversaries
• "Mock attackers"
• Capture rates of real adversaries
Weaknesses of Previous Methods

Human schedulers:
• Predictable patterns, e.g., US Coast Guard
• Scheduling effort and cognitive burden

Simple random (e.g., dice roll):
• Repeatedly fails in deployments, e.g., sending officers to sparsely crowded terminals

Weighted random:
• Trillions of patrolling strategies: how to select the important ones?
• How to incorporate learned adversary models and planning into these weights?

In contrast: multiple deployments over multiple years, with no forced use.

WHY DOES GAME THEORY PERFORM BETTER?
Lab Evaluation via Simulation: IRIS (FAMS)

(Figure: defender expected utility vs. schedule size (50, 150, 250) for Uniform, Weighted random 1, Weighted random 2, and IRIS; utilities range from about -10 to 6)
PROTECT (Coast Guard)

(Figure: patrol counts by base patrol area over Days 1-7, before PROTECT vs. after PROTECT, Boston)
FIELD EVALUATION OF SCHEDULE QUALITY
Improved Patrol Unpredictability & Coverage for Less Effort

• 350% increase in defender expected utility
• FAMS: IRIS outperformed expert humans over six months (Report: GAO-09-903T)
FIELD EVALUATION OF SCHEDULE QUALITY
Improved Patrol Unpredictability & Coverage for Less Effort

• Trains: TRUSTS outperformed expert humans scheduling 90 officers on LA trains
(Figure: security scores, roughly 3 to 5.5, across questions Q1-Q12, comparing human-generated and game-theoretic schedules)
PRE- to POST-PROTECT (Boston): "Deterrence" Improved

Additional real-world indicators from Boston:
• Boaters' questions, e.g., "…has the Coast Guard recently acquired more boats?"
• POST-PROTECT: actual reports of illegal activity
• "Mock attacker" team analysis
FIELD TEST AGAINST ADVERSARIES: MOCK ATTACKERS

Example from PROTECT
• Game theory vs. random

(Figures: captures, warnings, and violations per 30 minutes in controlled vs. not-controlled areas, Game Theory vs. Rand+Human, counts 0-20; and a 0-100 count breakdown by incident type, miscellaneous vs. drugs)
FIELD TESTS AGAINST ADVERSARIES
Computational Game Theory in the Field

• 21 days of patrol
• Identical conditions
• Random + Human as the comparison
EXPERT EVALUATION
Examples from ARMOR, IRIS, and PROTECT

• February 2009: Commendations, LAX Police (City of Los Angeles)
• July 2011: Operational Excellence Award (US Coast Guard, Boston)
• September 2011: Certificate of Appreciation (Federal Air Marshals)
• June 2013: Meritorious Team Commendation from Commandant (US Coast Guard)
Thank You!
Final questions?