
TRANSCRIPT

Page 1: RLChina 2021 Game Theory and Machine Learning in

Reinforcement Learning China Summer School

RLChina 2021

Game Theory and Machine Learning

in Multiagent Communication and

Coordination

Prof. FANG Fei

Leonardo Assistant Professor

School of Computer Science

Carnegie Mellon University

August 20, 2021

Page 2: RLChina 2021 Game Theory and Machine Learning in

Machine Learning + Game Theory

for Societal Challenges

[Figure: Artificial Intelligence = Machine Learning + Computational Game Theory, applied to societal challenges: security & safety, environmental sustainability, transportation, zero hunger]

2

Page 3: RLChina 2021 Game Theory and Machine Learning in

Protect Ferry Line from Potential Attacks

[Figure: attacker's maximum expected utility, max E[U], under the previous USCG strategy vs. the game-theoretic strategy]

Defender-attacker security game

Randomized patrol strategy

Minimize attacker’s maximum

expected utility

Reduce potential risk by 50%

Deployed by US Coast Guard

Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile

Resources. Fei Fang, Albert Xin Jiang, Milind Tambe. In AAMAS-13

In collaboration with US Coast Guard

Page 4: RLChina 2021 Game Theory and Machine Learning in

Protect Wildlife from Poaching

Data from past patrols & satellite imagery → Predicted poaching threat

Machine Learning Methods

Ensemble Learning, Decision Trees,

Neural Networks, Gaussian

Process, Markov Random Field, …

Learn poacher behavior from data

Ranger-poacher game to plan patrols

Deployed in Uganda, China, Malaysia

Increased detection of poaching

Available to more than 600 sites worldwide

In collaboration with Uganda Wildlife Authority, Wildlife Conservation Society, World Wide Fund for Nature, Panthera, Rimba

IJCAI-15, IAAI-16, AAMAS-17, ECML-PKDD 2017, COMPASS 2019, IAAI 2021

Page 5: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Discussion and Summary

5

Page 6: RLChina 2021 Game Theory and Machine Learning in

Motivation: Community Engagement in Anti-Poaching

6

• Lack of patrol resources, e.g., 1 patroller per 167 km²
• Recruit informants to provide tips about poachers
• Other domains: community watchers for urban safety

Green Security Game with Community Engagement Taoan Huang, Weiran

Shen, David Zeng, Tianyu Gu, Rohit Singh, Fei Fang In AAMAS-20

Page 7: RLChina 2021 Game Theory and Machine Learning in

Motivation: Community Engagement in Anti-Poaching

• How should the rangers plan patrols with/without tips?
• The informant's goal may not always be aligned with the defender's
• Strategic informants can choose what to tell

[Figure: interaction among the Defender, the Attacker, and the Informant]

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 8: RLChina 2021 Game Theory and Machine Learning in

• A set T of n targets
• Defender: choose a patrol strategy
  – A (randomized) allocation of r resources to the n targets
• Attacker: attack a target
• Informant: has type θ ∈ Θ with prior distribution p(θ); sends a message to the defender
• Defender: determine a defense plan

Defender-Attacker-Informant Game

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

[Table: defender and attacker utilities at each target, depending on whether the target is covered or uncovered]

8

Page 9: RLChina 2021 Game Theory and Machine Learning in

Defender-Attacker-Informant Game

[Figure: the four steps of the game]

9

• A defense plan d = (M, x, x0)
  – M: a set of possible messages
  – x0 ∈ [0,1]^n: a routine patrol strategy (used when no message is received)
  – x: M → [0,1]^n: a mapping from messages to patrol strategies

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 10: RLChina 2021 Game Theory and Machine Learning in

Direct Defense Plan

10

In a direct defense plan d = (M, x, x0), M = T × Θ. A direct defense plan is truthful if reporting the actual target and the informant's true type is the informant's best strategy.

Definition

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 11: RLChina 2021 Game Theory and Machine Learning in

Revelation Principle

11

For any defense plan (M, x, x0), there exists a truthful direct defense plan under which all players obtain the same utility, for any target and any informant type.

Theorem (Revelation Principle)

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 12: RLChina 2021 Game Theory and Machine Learning in

How many messages are enough?

There exists a defense plan (M, x, x0) with |M| = n + 1 that achieves the optimal defender utility.

Theorem

12

Upper bound from the direct defense plan: |M| = n|Θ|

Interpretation
Messages 1 to n: for pro-defender informants
Message n + 1: for pro-attacker informants
For target t (c: covered, u: uncovered):
  The informant reports message t if U_t^c(θ) > U_t^u(θ)
  The informant reports message n + 1 if U_t^u(θ) > U_t^c(θ)

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 13: RLChina 2021 Game Theory and Machine Learning in

Computation

• The optimal defense plan can be computed in polynomial time
• Solve a linear program (LP) for each target
  – Each LP ensures:
    • The attacker's best strategy is to attack target t
    • The informant's best strategy is to report m_t if the informant is defender-aligned on target t, i.e., U_t^c(θ) > U_t^u(θ)
    • The informant's best strategy is to report m_{n+1} if the informant is attacker-aligned on target t, i.e., U_t^c(θ) < U_t^u(θ)

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 14: RLChina 2021 Game Theory and Machine Learning in

Computation

14

LP for the case where target t is the attacker's best choice:
• The attacker's best strategy is to attack target t
• The informant's best strategy is to report m_{t'} if the informant is defender-aligned on target t', and to report m_{n+1} otherwise
• Maximize the defender's expected utility
• The defender has r resources in total
(A simplified sketch of such a per-target LP is given below.)

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20
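
To make the per-target LP concrete, here is a minimal Python sketch of the multiple-LPs computation for a plain security game without the informant: one LP per candidate attacked target, attacker best-response constraints, and a resource constraint. The function name and utility arrays (Ud_cov, Ud_unc, Ua_cov, Ua_unc) are illustrative only; the paper's LPs additionally encode the informant's reporting incentives and the message-conditioned strategies x(m).

```python
import numpy as np
from scipy.optimize import linprog

def best_patrol_without_informant(Ud_cov, Ud_unc, Ua_cov, Ua_unc, r):
    """One LP per candidate attacked target t: find coverage probabilities
    x in [0,1]^n with sum(x) <= r such that attacking t is an attacker best
    response, maximizing the defender's utility at t."""
    n = len(Ud_cov)
    best_val, best_x = -np.inf, None
    for t in range(n):
        # Maximize x_t*Ud_cov[t] + (1-x_t)*Ud_unc[t]; linprog minimizes, so negate.
        c = np.zeros(n)
        c[t] = -(Ud_cov[t] - Ud_unc[t])
        A_ub, b_ub = [], []
        for j in range(n):
            # Attacker best response: EU_a(j) <= EU_a(t) for every target j.
            row = np.zeros(n)
            row[j] += Ua_cov[j] - Ua_unc[j]
            row[t] -= Ua_cov[t] - Ua_unc[t]
            A_ub.append(row)
            b_ub.append(Ua_unc[t] - Ua_unc[j])
        A_ub.append(np.ones(n))        # resource constraint: sum(x) <= r
        b_ub.append(r)
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0, 1)] * n)
        if res.success:
            val = res.x[t] * Ud_cov[t] + (1 - res.x[t]) * Ud_unc[t]
            if val > best_val:
                best_val, best_x = val, res.x
    return best_val, best_x
```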

Page 15: RLChina 2021 Game Theory and Machine Learning in

• Utility vs. informant type
  – Type 1: fully defender-aligned: U_t^c(θ) > U_t^u(θ), ∀t
  – Type 2: fully attacker-aligned: U_t^c(θ) < U_t^u(θ), ∀t
  – Type 3: random
• The informant can significantly affect the game

Experiments

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 16: RLChina 2021 Game Theory and Machine Learning in

Experiments

• If the informant is not fully aligned with the defender, more defender resources are needed to achieve the same expected utility
• Giving the informant an additional reward helps a lot

When to Follow the Tip: Security Games with Strategic Informants Weiran

Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang In IJCAI-PCAI-20

Page 17: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Summary

17

Page 18: RLChina 2021 Game Theory and Machine Learning in

Motivation: UAV & Human Patrols in Anti-Poaching

SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time. Elizabeth

Bondi, Fei Fang, Mark Hamilton, Debarun Kar, Donnabell Dmello, Jongmoo Choi, Robert Hannaford, Arvind Iyer,

Lucas Joppa, Milind Tambe, Ram Nevatia. In IAAI-18

Page 19: RLChina 2021 Game Theory and Machine Learning in

Motivation: UAV & Human Patrols in Anti-Poaching

19

Not enough rangers

Flash a light to deter poachers

[Video: actual footage of a poacher running away]

Page 20: RLChina 2021 Game Theory and Machine Learning in

Signaling

• Flashing a light is a signal indicating that a ranger is arriving
• The signal can be deceptive
• If Prob(ranger arrives | signal) = 0.1, the poacher may not be stopped
• Must be strategic about deceptive signaling

20

Page 21: RLChina 2021 Game Theory and Machine Learning in

Signaling with Perfect Detection

21

[Figure: example signaling scheme assuming perfect detection, with branches for detection vs. no detection and example signaling probabilities 0.3, 0.7, 0.3, 0.4, 0.3]

How to incorporate uncertainty?

Strategic Coordination of Human Patrollers and Mobile Sensors with Signaling for Security

Games Haifeng Xu, Kai Wang, Phebe Vayanos and Milind Tambe AAAI 2018
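
Before adding detection uncertainty, the deterrence-by-bluffing idea can be sketched for a single target under perfect detection. The sketch assumes the signal is always sent when the target is actually covered and that the attacker's payoff for running away is 0; the function name and these assumptions are illustrative, not the exact model of the cited paper.

```python
def max_bluff_probability(x, ua_cov, ua_unc):
    """Largest probability of bluffing (signaling at an uncovered target) such
    that an attacker who sees the signal still prefers to run away (payoff 0).
    x: coverage probability at the target; ua_cov < 0 < ua_unc: attacker utilities."""
    # The attacker's expected utility of attacking, conditioned on seeing the signal,
    # is proportional to x*ua_cov + (1 - x)*p*ua_unc; keep it <= 0.
    if x >= 1.0 or ua_unc <= 0:
        return 1.0                       # bluffing is always safe here
    p = -x * ua_cov / ((1.0 - x) * ua_unc)
    return min(1.0, max(0.0, p))
```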

Page 22: RLChina 2021 Game Theory and Machine Learning in

Signaling with Detection Uncertainty

• Key insight: with uncertainty, the adversary is also uncertain about our uncertainty; exploit this information advantage

22

[Figure: signaling scheme under detection uncertainty, with detection and no-detection branches and the signaling decision marked '?']

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and

Sustainability Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe In AAAI-20

Page 23: RLChina 2021 Game Theory and Machine Learning in

Stackelberg Security Game Model

23

Defender utilities at each target:
• Positive if the attacked target is covered
• Negative if the attacked target is uncovered
• 0 if the attacker runs away

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and

Sustainability Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe In AAAI-20

Page 24: RLChina 2021 Game Theory and Machine Learning in

• Enumerate all possible "states" of a target
• Defender's pure strategy: assign a state to each target
• Goal: find the optimal mixed strategy for the defender

Solution

24

[Figure: enumeration of target states, depending on whether the patroller is near or far, whether a signal is sent, and whether the sensor's detection is matched or unmatched; states are labeled p, n+, n-, s+, s-, s]

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and

Sustainability Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe In AAAI-20

Page 25: RLChina 2021 Game Theory and Machine Learning in

Solution

• Linear programming + branch and price

25

x_{iθ}: prob. of allocating resources to ensure state θ at target i
ψ_{iθ}: prob. of sending the signal in state θ with detection
φ_{iθ}: prob. of sending the signal in state θ without detection
q_e: prob. of the defender choosing pure strategy e

Objective: the defender's expected utility
LP constraint annotations: joint probabilities (not conditional probabilities); feasibility of q and x; marginal probabilities; feasibility of ψ and φ; the attacker attacks if the signal is 0 and runs away if it is 1

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and
Sustainability Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe In AAAI-20

Page 26: RLChina 2021 Game Theory and Machine Learning in

Experimental Results

• Ignoring the uncertainty leads to worse performance than expected

26

Case Study

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and

Sustainability Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe In AAAI-20

Page 27: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Summary

27

Page 28: RLChina 2021 Game Theory and Machine Learning in

Information Advantage

• Consider a finitely repeated Bayesian security game
• T rounds
• In each round, the defender and the attacker choose actions a_t^1, a_t^2 (targets to protect/attack) simultaneously
• The defender has no commitment power
• The attacker's type (utility) is unknown to the defender
• The defender needs to infer the attacker's type λ ∈ Λ from the attacker's actions
• Prior type distribution p = {p_λ}
• The attacker balances playing myopically against maintaining its information advantage, so as to maximize accumulated payoff
• An attack can be viewed as a (deceptive) signal of the attacker's type
• Task: find the optimal defender strategy

Page 29: RLChina 2021 Game Theory and Machine Learning in

Bayesian Equilibrium

• Rationality
• Belief consistency: beliefs are updated following Bayes' rule (see the sketch below)

29
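
A minimal sketch of the belief-consistency step, assuming the defender keeps a distribution over attacker types and can evaluate each type's probability of playing the observed action:

```python
import numpy as np

def update_belief(belief, likelihoods):
    """Bayes-consistent belief update over attacker types.
    belief[i]      : current probability of attacker type i
    likelihoods[i] : probability that type i plays the observed action"""
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0:
        return belief      # off-equilibrium observation: Bayes' rule puts no constraint here
    return posterior / total
```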

Page 30: RLChina 2021 Game Theory and Machine Learning in

Optimality from any decision point onward

30

[Figure: a small extensive-form game, shown twice: P1 chooses L or R, then P2 chooses K or U, with payoffs (3;1), (1;3), (2;1), (0;0). The left copy marks a Nash equilibrium; the right copy marks a Nash equilibrium that is also perfect, i.e., optimal from any decision point onward]

Page 31: RLChina 2021 Game Theory and Machine Learning in

Perfect Bayesian Equilibrium

• An equilibrium refinement of Bayesian equilibrium
• Sequential rationality starting from any information set
• Most existing work solves for PBE using mathematical programming-based methods (Nguyen et al. 2019 [1]; Guo et al. 2017 [2])
  – Very precise
  – Lacks scalability: long solve times and large memory requirements

Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, and Michael P. Wellman. Deception

in finitely repeated security games. In AAAI-19

31

Page 32: RLChina 2021 Game Theory and Machine Learning in

Our Algorithm for Computing PBE

• Temporal Induced Self-Play (TISP)
  – A framework that can be combined with different learning algorithms

32

Components: belief-based representation, backward induction, belief-space approximation, policy learning

Page 33: RLChina 2021 Game Theory and Machine Learning in

Belief-based representation

• Use belief instead of history: π(s, b) instead of π(h)
  – π(attacked target 1 in round l-1, target 2 in round l-2, ...) becomes π(s, belief: 0.2 prob. of the attacker being type a)
• Helps when the history is long

33

Page 34: RLChina 2021 Game Theory and Machine Learning in

Backward Induction

• Reverse the training process (see the sketch below)
  – Train round L-1 first, then round L-2, ..., down to round 0
  – Use the trained value network V and policy network π of round l+1 when training round l
• Do not sample whole trajectories from round 0 to round L-1; sample one-step trajectories from round l to round l+1
  – A special reset function places the game at round l
• Different networks for different rounds
• Improves performance without adding training cost

34
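
The training loop described above can be sketched as follows; make_policy, make_value, reset_to_round, and train_one_round are hypothetical placeholders for whichever learning algorithm is plugged into TISP.

```python
def tisp_backward_training(L, make_policy, make_value, reset_to_round, train_one_round):
    """Backward-induction training: round L-1 first, then L-2, ..., down to 0,
    reusing the already-trained networks of round l+1 when training round l."""
    policies, values = [None] * L, [None] * L
    for l in reversed(range(L)):
        pi_l, v_l = make_policy(l), make_value(l)          # separate networks per round
        next_v = values[l + 1] if l + 1 < L else None
        # One-step trajectories from round l to l+1 only; the reset function
        # places the game directly at round l (with a sampled belief/state).
        train_one_round(pi_l, v_l, next_value=next_v,
                        sample_state=lambda l=l: reset_to_round(l))
        policies[l], values[l] = pi_l, v_l
    return policies, values
```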

Page 35: RLChina 2021 Game Theory and Machine Learning in

Belief Space Approximation

• Sample K belief vectors and train strategies conditioned specifically on the sampled belief and the round
• Query time (interpolate over the K trained policies; see the sketch below):

  π(a | b, s) = Σ_{k=1}^{K} π_{θ_k}(a | s; b_k) · w(b, b_k) / Σ_{k=1}^{K} w(b, b_k)

35
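
A small sketch of the query-time interpolation, assuming policies[k] is the policy trained at sampled belief anchor b_k and using a simple distance-based kernel as a stand-in for the weighting function w:

```python
import numpy as np

def interpolate_policy(policies, anchors, s, b,
                       weight=lambda b, bk: np.exp(-np.linalg.norm(b - bk))):
    """pi(a | b, s) as a weighted average of the K policies trained at the
    sampled belief anchors b_k (the exact kernel w is an assumption here)."""
    ws = np.array([weight(b, bk) for bk in anchors])
    dists = np.stack([pi(s, bk) for pi, bk in zip(policies, anchors)])  # K x |A| action distributions
    return ws @ dists / ws.sum()
```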

Page 36: RLChina 2021 Game Theory and Machine Learning in

Policy Learning

• Policy gradient, with the update rule changed to account for the belief:

  ∇_θ V^λ(π, b, s) = Σ_{a ∈ A} ∇_θ π_θ(a | b, s) Q^λ(π, b, s, a)
                   = E[ Q^λ(π, b, s, a) ∇_θ ln π_θ(a | b, s) + γ (∇_θ b') (∇_{b'} V^λ(π, b', s')) ]

• Regret matching:

  π^{t+1}(a | s, b) = [R^{t+1}(s, b, a)]_+ / Σ_{a'} [R^{t+1}(s, b, a')]_+

  where R^{t+1}(s, b, a) = Σ_{τ=1}^{t} ( Q^τ(π^τ, s, b, a) - V_{φ^τ}(π^τ, s, b) )

36
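
A minimal sketch of the regret-matching update at one (state, belief) point, following the two formulas above: accumulate Q minus the value baseline, then normalize the positive part of the cumulative regrets.

```python
import numpy as np

def regret_matching_step(cum_regret, q_values, baseline):
    """cum_regret[a] accumulates Q(s, b, a) - V(s, b) across training iterations;
    the next policy is proportional to the positive part of the regrets."""
    cum_regret = cum_regret + (q_values - baseline)
    positive = np.maximum(cum_regret, 0.0)
    if positive.sum() == 0.0:
        policy = np.full_like(positive, 1.0 / len(positive))   # fall back to uniform
    else:
        policy = positive / positive.sum()
    return cum_regret, policy
```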

Page 37: RLChina 2021 Game Theory and Machine Learning in

Temporal Induced Self-play Training

37

Page 38: RLChina 2021 Game Theory and Machine Learning in

Test-time Policy Transformation

38

Page 39: RLChina 2021 Game Theory and Machine Learning in

Experiment: Security Game

• Better scalability than the MP-based method
• Much higher solution quality than other learning-based methods

TISP can be used for more complex stochastic Bayesian games

39

Page 40: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Summary

40

Page 41: RLChina 2021 Game Theory and Machine Learning in

Coordination in Games

• Correlated equilibrium (CE)
• Correlation device: sends private signals to players
  – Signals are sampled from a joint probability distribution over the players' actions and represent recommended behavior
  – Equivalent to having a mediator that privately recommends behavior to the players but does not enforce it

Example:
Nash equilibrium: total utility = 7 + 2 = 2 + 7 = 9
Correlated equilibrium with signal probabilities 0.25, 0.25, 0.5: total utility = 0.25·(7+2) + 0.25·(2+7) + 0.5·(6+6) = 10.5

41
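
A small numerical check of the example above. The slide only lists the outcomes (7,2), (2,7), and (6,6); the fourth outcome is assumed here to be (0,0), as in the classic game-of-chicken illustration, so the payoff matrices below are illustrative.

```python
import numpy as np

# Row/column action 0 = "yield", action 1 = "dare" (the (0,0) entry is an assumption).
U1 = np.array([[6, 2],
               [7, 0]])
U2 = np.array([[6, 7],
               [2, 0]])

# Correlation device: joint distribution over recommended action pairs.
mu = np.array([[0.50, 0.25],
               [0.25, 0.00]])

def is_correlated_eq(mu, U1, U2, tol=1e-9):
    """No player should gain by deviating from a recommendation, conditioned on receiving it."""
    for rec in range(2):                                  # player 1 (rows)
        for dev in range(2):
            if mu[rec].sum() > 0 and mu[rec] @ (U1[dev] - U1[rec]) > tol:
                return False
    for rec in range(2):                                  # player 2 (columns)
        for dev in range(2):
            if mu[:, rec].sum() > 0 and mu[:, rec] @ (U2[:, dev] - U2[:, rec]) > tol:
                return False
    return True

print(is_correlated_eq(mu, U1, U2))   # True
print((mu * (U1 + U2)).sum())         # total utility = 10.5
```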

Page 42: RLChina 2021 Game Theory and Machine Learning in

Understand and Compute EFCE

• Extensive-form correlated equilibrium (EFCE):
  – The correlation device selects private signals for the players before the game starts
  – Recommendations are revealed incrementally as the players progress down the game tree
• Computing an EFCE is computationally challenging

42

Page 43: RLChina 2021 Game Theory and Machine Learning in

Theoretical Results

• Theorem (informal): finding an EFCE in a two-player game can be cast as a bilinear saddle-point problem

  min_{x ∈ X} max_{y ∈ Y} x^T A y

• Conceptual implication: a zero-sum game between the mediator and the deviator
• Computational implication: the bilinear saddle-point formulation opens the way to the plethora of optimization algorithms developed specifically for saddle-point problems

43

Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks Gabriele Farina,

Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19:

Page 44: RLChina 2021 Game Theory and Machine Learning in

Algorithms to Compute EFCE

• Algorithm 1: a simple subgradient descent method (a toy sketch is given below)
  – Exploits the bilinear saddle-point problem formulation
  – Uses structural properties of EFCEs
  – Can lead to better scalability than the prior approach based on linear programming
• Algorithm 2: a regret minimization-based algorithm
  – Adapts self-play methods based on regret minimization
  – Much more scalable than Algorithm 1

Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks Gabriele Farina, Chun Kai Ling,

Fei Fang, Tuomas Sandholm. In NeurIPS-19. Efficient Regret Minimization Algorithm for Extensive-Form

Correlated Equilibrium Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm In NeurIPS-19
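
For intuition about Algorithm 1, here is a toy projected gradient descent-ascent sketch for a bilinear saddle point min_x max_y x^T A y, with x and y restricted to probability simplices. The actual EFCE formulation optimizes over more structured polytopes and exploits EFCE-specific structure, so this only illustrates the saddle-point viewpoint.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ind = np.arange(1, len(v) + 1)
    cond = u - css / ind > 0
    theta = css[cond][-1] / ind[cond][-1]
    return np.maximum(v - theta, 0.0)

def saddle_point_gda(A, steps=5000, eta=0.05):
    """Projected (sub)gradient descent-ascent on x^T A y; averaged iterates
    approximate the saddle point of this bilinear objective."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    y = np.full(A.shape[1], 1.0 / A.shape[1])
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x                 # gradients of x^T A y in x and y
        x = project_simplex(x - eta * gx)       # descent step for the min player
        y = project_simplex(y + eta * gy)       # ascent step for the max player
        x_avg += x
        y_avg += y
    return x_avg / steps, y_avg / steps
```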

Page 45: RLChina 2021 Game Theory and Machine Learning in

Copula Learning for Agent Coordination

[Figure: a correlation device sends signals to the players in Team 1 and Team 2]

Our goal is to design the copula, from which we can derive the distribution of the signals, so as to achieve good enough coordination among the players.

Design the distribution of the signals, represented by a copula (e.g., a neural network). Parameterize the copula with a neural network and learn the parameters so that the players have an incentive to follow the recommended actions.

Deep Archimedean Copulas Chun Kai Ling, Fei Fang, Zico Kolter, NeurIPS-20

Page 46: RLChina 2021 Game Theory and Machine Learning in

Archimedean Copulas

• Copulas: multivariate CDFs with marginals uniform on [0, 1]
  C(x_1, ..., x_d) = P(X_1 ≤ x_1, ..., X_d ≤ x_d)
• Archimedean copulas: specified by a generator φ: [0, ∞) → [0, 1]
  C(x_1, ..., x_d) = φ(φ^{-1}(x_1) + ... + φ^{-1}(x_d))
• Commonly used Archimedean copulas are parameterized by a single scalar θ, e.g.,
  Frank:   φ_θ(t) = -(1/θ) log(e^{-t}(e^{-θ} - 1) + 1)
  Clayton: φ_θ(t) = (1 + t)^{-1/θ}

Image from Scherer, Matthias, and Jan-Frederik Mai. Simulating Copulas: Stochastic Models, Sampling Algorithms, and Applications.

46
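
A minimal sketch that evaluates an Archimedean copula directly from the Clayton-style generator above (generator and its inverse written out by hand), just to make the φ / φ^{-1} composition concrete:

```python
import numpy as np

def clayton_copula(x, theta):
    """C(x_1, ..., x_d) built from phi(t) = (1 + t)^(-1/theta),
    whose inverse is phi^{-1}(u) = u^(-theta) - 1.
    x: array of marginal values in (0, 1]; theta > 0."""
    phi_inv = np.power(x, -theta) - 1.0                  # map each marginal through phi^{-1}
    return np.power(1.0 + phi_inv.sum(), -1.0 / theta)   # apply phi to the sum

# Example: clayton_copula(np.array([0.3, 0.7]), theta=2.0)
```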

Page 47: RLChina 2021 Game Theory and Machine Learning in

Our approach: ACNet

• ACNet learns a generator φ as a convex combination of negative exponentials
• Other probabilistic quantities are obtained by differentiation w.r.t. the inputs
  – Joint density: ∂^d C(x_1, ..., x_d) / (∂x_1 ... ∂x_d)
  – Conditional densities, conditional distributions, etc.
• Evaluating these quantities requires computing φ^{-1}
  – A wrapper in PyTorch computes the inverses using Newton's method (see the sketch below)
  – Fully differentiable: derivatives w.r.t. the weights are computed using auto-differentiation

47
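
A hedged sketch of the differentiable inversion idea in PyTorch. It unrolls a fixed number of Newton steps and assumes the generator phi is strictly decreasing and applied elementwise; the actual ACNet wrapper may differ in its details.

```python
import torch

def newton_inverse(phi, y, iters=20):
    """Solve phi(t) = y for t >= 0 with Newton's method.  The unrolled iterations
    keep the result differentiable w.r.t. any parameters inside phi via autograd."""
    t = torch.ones_like(y).requires_grad_(True)               # initial guess
    for _ in range(iters):
        f = phi(t) - y                                        # residual to drive to zero
        (dphi,) = torch.autograd.grad(f.sum(), t, create_graph=True)
        t = (t - f / dphi).clamp(min=1e-8)                    # Newton step, stay in the domain
    return t

# Usage with a Clayton-style generator whose parameter requires gradients:
theta = torch.tensor(2.0, requires_grad=True)
phi = lambda t: (1.0 + t) ** (-1.0 / theta)
t = newton_inverse(phi, torch.tensor([0.3, 0.7]))             # phi(t) is approximately [0.3, 0.7]
```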

Page 48: RLChina 2021 Game Theory and Machine Learning in

Fitting real world data

[Figure: fits to real-world data (Boston, INTC-MSFT, GOOG-FB): ground truth vs. the best parametric Archimedean copula vs. ACNet]

48

Page 49: RLChina 2021 Game Theory and Machine Learning in

We find that ACNet…

• Is able to fit synthetic data generated from common Archimedean copulas
• Outperforms common Archimedean copulas on real-world datasets
• Can give conditional densities/CDFs within a single model
• Can be sampled from efficiently, even in high dimensions
• Can be the basis for computing correlated equilibria in complex games (e.g., with continuous action spaces)
• Future direction: leverage ACNet to compute correlated equilibria for complex games

49

Page 50: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Summary

50

Page 51: RLChina 2021 Game Theory and Machine Learning in

Motivation: Volunteer-Based Food Rescue Platform

Food waste and food insecurity coexist: up to 40% of food is wasted globally (>1.3 billion tons annually), while 1 in 8 people go hungry every day. Rescue good food!

[Figure: food rescue workflow: post rescue requests; a volunteer claims the rescue, picks up from the donor, and delivers to the recipient; success!]

In collaboration with 412 Food Rescue (412FR)

Page 52: RLChina 2021 Game Theory and Machine Learning in

Motivation: Volunteer-Based Food Rescue Platform

• Challenges
  – Uncertainty about whether a rescue will be claimed and completed
  – Sending notifications to volunteers (1-to-many communication) causes notification fatigue
  – Human dispatcher intervention (1-to-1 communication) leaves dispatchers overstretched

52

Page 53: RLChina 2021 Game Theory and Machine Learning in

Motivation: Volunteer-Based Food Rescue Platform

• How can AI help?
  – Predictive model of rescue claim status
    • Determine which rescues need special attention from a human dispatcher
  – Data-driven optimization of intervention and notification
    • Avoid excessive notifications to help retain volunteers

Improving Efficiency of Volunteer-Based Food Rescue Operations. Zheyuan

Ryan Shiβˆ—, Yiwen Yuanβˆ—, Kimberly Lo, Leah Lizarondo, Fei Fang. In IAAI-20.

53

Page 54: RLChina 2021 Game Theory and Machine Learning in

Predictive Model of Rescue Claim Status

Features used for the predictive model: timing, weather, location

[Figure: percentage of unclaimed rescues by zip code]

Operational dataset of 412FR from March 2018 to May 2019

54

Page 55: RLChina 2021 Game Theory and Machine Learning in

Predictive Model of Rescue Claim Status

Predict whether a rescue will be claimed by volunteers

A stacking model
+: claimed (3,825)
-: not claimed (749)

[Figure: stacking model architecture (NN)]

Training data: May 2018 to Dec 2018
Test data: Jan 2019 to May 2019

55

Page 56: RLChina 2021 Game Theory and Machine Learning in

Predictive Model of Rescue Claim Status

Predict whether a rescue will be claimed by volunteers

Model              Accuracy  Precision  Recall  F1    AUC
Gradient boosting  0.73      0.86       0.82    0.84  0.51
Random forest      0.71      0.87       0.78    0.82  0.54
Gaussian process   0.56      0.88       0.54    0.67  0.60
Stacking model     0.69      1.00*      0.64    0.78  0.81

56

Page 57: RLChina 2021 Game Theory and Machine Learning in

Optimize Intervention and Notification Scheme

Current practice (default INS): after a rescue is published, the 1st-wave notification goes out 15 minutes later to volunteers within 5 miles; all volunteers are notified afterwards; the human dispatcher intervenes at 60 minutes.

964 volunteers receive the 1st-wave notification on average
44.6% of rescues are claimed by volunteers receiving the 1st-wave notification

Page 58: RLChina 2021 Game Theory and Machine Learning in

Optimize Intervention and Notification Scheme

58

Improve the INS with minor changes: the 1st-wave notification goes out x minutes after the rescue is published to volunteers within y miles; all volunteers are notified in a 2nd wave; the dispatcher intervenes at z minutes.

Task: find the best values of x, y, z to reduce notifications and human intervention while ensuring at least the same claim rate

Page 59: RLChina 2021 Game Theory and Machine Learning in

Optimize Intervention and Notification Scheme

• Optimization problem:

59

  min_{x,y,z}  v(y) + q(x, y, z) + λ · s(x, y, z)
  s.t.  p(x, y, z) ≥ b        (claim rate ≥ threshold)
        (x, y, z) ∈ S

  v(y): expected # of volunteers receiving the 1st-wave notification
  q(x, y, z): expected # of volunteers receiving the 2nd-wave notification
  s(x, y, z): expected # of rescues requiring human intervention

Page 60: RLChina 2021 Game Theory and Machine Learning in

Counterfactual Estimation

• How do we estimate v(y), q(x, y, z), s(x, y, z), and p(x, y, z)?
  – Assume the rescue and volunteer distributions remain the same
  – Estimate the quantities from historical rescues
  – For each rescue, calculate the counterfactual claim time (CCT) under an INS, using a number of assumptions
    • Assumption 1: upon receiving the notification, a volunteer takes the same amount of time to respond
    • Assumption 2: the success of an intervention is not affected by the INS
    • ...

60

Page 61: RLChina 2021 Game Theory and Machine Learning in

Branch-and-Bound Algorithm

• Can we avoid enumerating all possible INSs?
• Estimate a lower bound on the objective value of an INS (x, y, z) when only a subset of the parameters is specified
• Use the lower bound to prioritize and prune INSs through branch and bound

61

(x, y, z) ∈ S
Example: x = unspecified, y = 5 miles, z = 45 min
Lower bound = v(y) + q(x_max, y, z) + λ · s(x_min, y, z)
(the three terms bound the 1st-wave, 2nd-wave, and human-intervention costs, respectively)

Page 62: RLChina 2021 Game Theory and Machine Learning in

Branch-and-Bound Algorithm

62

[Figure: branch-and-bound search tree: branch first on z (45, 50, 55, 60), then on y (4.5, 5, 5.5, 6), then on x (14, 15, 16); lower bounds are calculated at internal nodes and objective values at leaves; e.g., a node with LB = 2600 can be pruned once an incumbent with Obj = 2500 is found. A sketch of the search loop follows.]
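
A minimal sketch of the best-first search loop illustrated above. The objective and lower_bound callables are hypothetical stand-ins for the counterfactual estimates of v, q, s and their bounds.

```python
import heapq
from itertools import count

def branch_and_bound(xs, ys, zs, objective, lower_bound):
    """Best-first branch and bound over a grid of INS parameters, branching on
    z, then y, then x.  lower_bound(partial) bounds the cost of any completion
    of a partial assignment such as {'y': 5, 'z': 45}."""
    best_val, best_ins = float('inf'), None
    tie = count()                                    # tie-breaker so the heap never compares dicts
    heap = [(lower_bound({}), next(tie), {})]
    while heap:
        lb, _, partial = heapq.heappop(heap)
        if lb >= best_val:
            continue                                 # prune: cannot beat the incumbent
        if len(partial) == 3:
            val = objective(partial['x'], partial['y'], partial['z'])
            if val < best_val:
                best_val, best_ins = val, dict(partial)
            continue
        var, domain = [('z', zs), ('y', ys), ('x', xs)][len(partial)]
        for v in domain:
            child = dict(partial, **{var: v})
            heapq.heappush(heap, (lower_bound(child), next(tie), child))
    return best_ins, best_val
```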

Page 63: RLChina 2021 Game Theory and Machine Learning in

Recommended INS

• Optimize on data from May 2018 to Dec 2018
• Test on data from Jan 2019 to May 2019
• Recommend an INS by checking the Pareto frontier

Deployed since Jan 2020

Page 64: RLChina 2021 Game Theory and Machine Learning in

Rescue-Specific Notification Scheme

• Can we do better than simply changing the parameters of the default INS?
  – Send 1st-wave notifications to the volunteers who are more likely to claim the rescue
• Task: given a rescue, provide a list of k volunteers
• Similar to recommender systems
  – Users: rescue trips
  – Items: volunteers

A Recommender System for Crowdsourcing Food Rescue Platforms. Zheyuan

Ryan Shi, Leah Lizarondo, Fei Fang In WWW-21

64

Page 65: RLChina 2021 Game Theory and Machine Learning in

Distribution of Donor and Recipient Organizations

[Figure: distribution of donor organizations and recipient organizations across the 16 regions (0-15) into which the Pittsburgh area is divided]

65

Page 66: RLChina 2021 Game Theory and Machine Learning in

Feature Extraction

Feature extraction from each (rescue, volunteer) pair:
• Volunteer's # completed rescues in the donor's region
• Volunteer's # completed rescues in the recipient's region
• Volunteer's total # completed rescues
• Time between the rescue and the volunteer's onboarding
• Distance between the donor and the volunteer

66

Page 67: RLChina 2021 Game Theory and Machine Learning in

Predictive Model of Rescue-Volunteer Compatibility

Predict whether a rescue will be claimed by a specific volunteer

+: claimed
-: not claimed

Training data: Mar 2018 to Oct 2019
Test data: Nov 2019 to Mar 2020

Model: features fed into a neural network with 4 hidden layers

6,757 rescues; 9,212 volunteers

67

Page 68: RLChina 2021 Game Theory and Machine Learning in

Rescue-Specific Notification

[Figure: each volunteer (Volunteer 1, 2, ..., N) is scored with a predicted claim probability (e.g., 0.341, 0.105, 0.422, 0.663, 0.635, 0.002), and the top-ranked volunteers (e.g., Volunteer 8346, Volunteer 333, Volunteer 1835, ...) receive the rescue-specific notification]

68

Page 69: RLChina 2021 Game Theory and Machine Learning in

Evaluation

• Metric: hit ratio at top k (HR@k): the % of rescues that are claimed by a volunteer ranked in the top k
• k = 964 to match the default INS

69
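
For reference, the metric can be computed with a sketch like the following (the data layout is illustrative):

```python
def hr_at_k(rescues, k=964):
    """Hit ratio at top k: fraction of rescues whose actual claimer appears among
    the top-k volunteers ranked by the model.  `rescues` is a list of
    (ranked_volunteer_ids, claimer_id) pairs; k = 964 matches the default INS."""
    hits = sum(claimer in ranking[:k] for ranking, claimer in rescues)
    return hits / len(rescues)
```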

Page 70: RLChina 2021 Game Theory and Machine Learning in

Caveat with Rescue-Specific Notification

• The ML model discovers some frequent volunteers and sends them notifications almost all the time

70

Page 71: RLChina 2021 Game Theory and Machine Learning in

ML + Online Planning

• Rather than greedily taking the top k volunteers, we enforce a constraint that each volunteer receives at most L notifications per day
• For the current rescue i, determine who to send notifications to by planning against a projected set of future rescues R (a simplified sketch of the budgeted step follows)

71

x_ij ∈ {0, 1}: whether to send the notification for rescue i to volunteer j
p_ij ∈ [0, 1]: output of the ML model, indicating the probability that volunteer j will claim rescue i
b_j ∈ {0, ..., L}: number of notifications volunteer j can still receive for the rest of the day
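
A simplified greedy sketch of the budget-constrained notification step for a single rescue (the paper instead plans against the projected future rescues R); names and arguments are illustrative.

```python
def notify_with_budget(p_i, budgets, k):
    """Pick up to k volunteers for the current rescue, respecting the remaining
    per-volunteer daily notification budgets (at most L per day overall).
    p_i[j]: predicted probability that volunteer j claims the current rescue
    budgets[j]: notifications volunteer j can still receive today"""
    eligible = [j for j, b in enumerate(budgets) if b > 0]
    chosen = sorted(eligible, key=lambda j: p_i[j], reverse=True)[:k]
    for j in chosen:
        budgets[j] -= 1                  # consume one notification from the daily budget
    return chosen
```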

Page 72: RLChina 2021 Game Theory and Machine Learning in

Online Planning-Based Rescue-Specific Notification

• Avoids the over-concentration, with L = 5
• HR@k = 0.645, much better than current practice

72

Page 73: RLChina 2021 Game Theory and Machine Learning in

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
• Role of informants in security games
• Strategic signaling in security games
• Maintaining/breaking information advantage in security games
• Coordination through correlated signals
• Coordination through notification in platform-user settings
• Summary

73

Page 74: RLChina 2021 Game Theory and Machine Learning in

Summary

• Communication and Coordination in Multi-Agent Interaction
  – Informants, Signals, Notifications
  – Historical actions
• ML + GT for Communication and Coordination
  – Mathematical programming-based algorithms
  – Learn human behavior
  – Learn equilibrium / optimal strategy
• ML + GT for Societal Challenges
  – Security, Sustainability, Food security

74

Page 75: RLChina 2021 Game Theory and Machine Learning in

Acknowledgment

• Advisors, postdocs, students, and all co-authors!
• Collaborators and partners
• Funding support

75