
Page 1: Distributed Learning in Routing Games Convergence

Distributed Learning in Routing Games: Convergence, Estimation of Player Dynamics, and Control

Walid Krichene, Alexandre Bayen

Electrical Engineering and Computer Sciences, UC Berkeley

IPAM Workshop on Decision Support for Traffic, November 18, 2015

Page 2: Distributed Learning in Routing Games Convergence

Learning dynamics in the routing game

Routing games model congestion on networks (e.g., communication networks).

Nash equilibrium quantifies the efficiency of the network in steady state.

The system does not operate at equilibrium. Beyond equilibria, we need to understand decision dynamics (learning).

A realistic model of decision dynamics is essential for prediction and optimal control.


Page 6: Distributed Learning in Routing Games Convergence

Desiderata

Learning dynamics should be:

Realistic in terms of information requirements and computational complexity.

Consistent with the full-information Nash equilibrium: x^{(t)} → X⋆. Convergence rates?

Robust to stochastic perturbations: observation noise (bandit feedback).


Page 9: Distributed Learning in Routing Games Convergence


Outline

1 Introduction

2 Convergence of agent dynamics

3 Routing Examples

4 Estimation and optimal control



Page 12: Distributed Learning in Routing Games Convergence

Interaction of K decision makers

Decision maker k faces a sequential decision problem. At iteration t, it

(1) chooses a probability distribution x^{(t)}_{A_k} over its action set A_k

(2) discovers a loss function ℓ^{(t)}_{A_k} : A_k → [0, 1]

(3) updates the distribution

Figure: Sequential decision problem. Agent k interacts with the environment (the other agents): the outcome is ℓ_{A_k}(x^{(t)}_{A_1}, ..., x^{(t)}_{A_K}), and the learning algorithm updates x^{(t+1)}_{A_k} = u(x^{(t)}_{A_k}, ℓ^{(t)}_{A_k}).

The loss of agent k is affected by the strategies of the other agents. Agent k does not know this function; it only observes its value. Write x^{(t)} = (x^{(t)}_{A_1}, ..., x^{(t)}_{A_K}).

Page 13: Distributed Learning in Routing Games Convergence

Examples of decentralized decision makers

Routing game:

Each player drives from a source node to a destination node.

Chooses a path from A_k.

The mass of players on each edge determines the cost on that edge.

Figure: Routing game (example network with nodes v0–v6).


Page 15: Distributed Learning in Routing Games Convergence

Online learning model

Online Learning Model
1: for t ∈ N do
2:   Play p ∼ x^{(t)}_{A_k}
3:   Discover ℓ^{(t)}_{A_k}
4:   Update x^{(t+1)}_{A_k} = u_k(x^{(t)}_{A_k}, ℓ^{(t)}_{A_k})
5: end for

Each agent k maintains x^{(t)}_{A_k} ∈ Δ^{A_k}: sample p ∼ x^{(t)}_{A_k}, discover ℓ^{(t)}_{A_k}, update x^{(t+1)}_{A_k}.

Main problem

Define a class of dynamics C such that

u_k ∈ C  ∀k  ⇒  x^{(t)} → X⋆
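Read as a simulation loop, the model above can be sketched in a few lines of Python. The toy loss function and the placeholder update rule below are illustrative assumptions; the updates actually studied in the deck (Hedge, mirror descent) appear later.

```python
import numpy as np

def simulate(update_rules, losses, T=100, K=2, n_actions=3, seed=0):
    """Simulate K decision makers, each playing a distribution over its actions.

    update_rules[k] maps (x_k, loss_vector_k, t) to the next distribution;
    losses maps the joint profile x = [x_1, ..., x_K] to a list of per-action loss vectors.
    """
    rng = np.random.default_rng(seed)
    x = [np.ones(n_actions) / n_actions for _ in range(K)]   # uniform start
    history = []
    for t in range(T):
        # (1) each agent samples an action from its current distribution
        plays = [rng.choice(n_actions, p=x[k]) for k in range(K)]
        # (2) each agent discovers its loss vector (depends on everyone's play)
        ell = losses(x)
        # (3) each agent updates its distribution with its own rule u_k
        x = [update_rules[k](x[k], ell[k], t) for k in range(K)]
        history.append((plays, [xk.copy() for xk in x]))
    return history

# Example usage with a toy congestion-style loss and a "stay put" placeholder rule:
toy_losses = lambda x: [x[0] + x[1]] * 2   # loss of an action grows with the total mass on it
stay = lambda xk, lk, t: xk                # no learning; replace with Hedge, mirror descent, ...
hist = simulate([stay, stay], toy_losses, T=10)
```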


Page 20: Distributed Learning in Routing Games Convergence

A brief review

Discrete time:

Hannan consistency [10]

Hedge algorithm for two-player games [9]

Regret-based algorithms [11]

Online learning in games [7]

Potential games [19]

Continuous time: see the continuous-time learning model in the appendix.

Page 21: Distributed Learning in Routing Games Convergence


Outline

1 Introduction

2 Convergence of agent dynamics

3 Routing Examples

4 Estimation and optimal control

Page 22: Distributed Learning in Routing Games Convergence

Nash equilibria, and the Rosenthal potential

Write

x = (x_{A_1}, ..., x_{A_K}) ∈ Δ^{A_1} × ··· × Δ^{A_K}

ℓ(x) = (ℓ_{A_1}(x), ..., ℓ_{A_K}(x))

Nash equilibrium

x⋆ is a Nash equilibrium if

⟨ℓ(x⋆), x − x⋆⟩ ≥ 0 ∀x   ⇔   ∀k, ∀x_{A_k}, ⟨ℓ_{A_k}(x⋆), x_{A_k} − x⋆_{A_k}⟩ ≥ 0

In words: for all k, paths in the support of x⋆_{A_k} have minimal loss.

Figure: Population distributions E[x^{(τ)}_{A_1}] and path losses E[ℓ̂_p(x^{(τ)})] for population 1, for paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1).


Page 24: Distributed Learning in Routing Games Convergence

Nash equilibria, and the Rosenthal potential

Rosenthal potential

There exists a convex f such that

∇f(x) = ℓ(x)

Then the set of Nash equilibria is

X⋆ = argmin_{x ∈ Δ^{A_1} × ··· × Δ^{A_K}} f(x)

Nash condition ⇔ first-order optimality:

∀x, ⟨ℓ(x⋆), x − x⋆⟩ ≥ 0   ⇔   ∀x, ⟨∇f(x⋆), x − x⋆⟩ ≥ 0

Figure: First-order optimality conditions of the potential f (∇f(x⋆) = ℓ(x⋆) supports the feasible set at x⋆).
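The slides do not write out f explicitly. For the nonatomic routing game, the standard Rosenthal potential (stated here for context, assuming unit population masses) integrates the edge costs, and its gradient recovers the path losses:

```latex
f(x) = \sum_{e \in E} \int_0^{\phi_e(x)} c_e(u)\,\mathrm{d}u,
\qquad
\phi_e(x) = \sum_k \sum_{p \in A_k : e \in p} x_p,
\qquad
\frac{\partial f}{\partial x_p}(x) = \sum_{e \in p} c_e(\phi_e(x)) = \ell_p(x).
```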


Page 26: Distributed Learning in Routing Games Convergence


Regret analysis

Technique 1: Regret analysis

Page 27: Distributed Learning in Routing Games Convergence

Regret analysis

Cumulative regret

R^{(t)}_{A_k} = sup_{x_{A_k} ∈ Δ^{A_k}} Σ_{τ≤t} ⟨x^{(τ)}_{A_k} − x_{A_k}, ℓ_{A_k}(x^{(τ)})⟩

An "online" optimality condition. The regret is sublinear if lim sup_t R^{(t)}_{A_k} / t ≤ 0.

Convergence of averages

[∀k, R^{(t)}_{A_k} is sublinear]  ⇒  x̄^{(t)} → X⋆

where x̄^{(t)} = (1/t) Σ_{τ=1}^t x^{(τ)}. (proof)
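As a concrete illustration, the cumulative regret above can be computed from a played history; since the supremum over the simplex is attained at a vertex, it reduces to a maximum over single actions. A small sketch (the variable names are mine, not the deck's):

```python
import numpy as np

def cumulative_regret(played_dists, loss_vectors):
    """R^(t) = sup_{x in simplex} sum_{tau<=t} <x^(tau) - x, ell^(tau)>.

    played_dists: array (T, n) of distributions x^(tau); loss_vectors: array (T, n).
    The sup over the simplex is attained at a vertex, i.e. the single best action.
    """
    played_dists = np.asarray(played_dists, dtype=float)
    loss_vectors = np.asarray(loss_vectors, dtype=float)
    incurred = np.cumsum(np.sum(played_dists * loss_vectors, axis=1))   # our cumulative loss
    best_fixed = np.min(np.cumsum(loss_vectors, axis=0), axis=1)        # best single action in hindsight
    return incurred - best_fixed                                        # R^(t) for every t

# Sublinearity check: cumulative_regret(...) / np.arange(1, T + 1) should tend to <= 0.
```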

Page 28: Distributed Learning in Routing Games Convergence

Convergence of x̄^{(t)} vs. convergence of x^{(t)}

Routing game example.

Figure: Population distributions x^{(τ)} for populations 1 and 2, for paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1). (More)

Page 29: Distributed Learning in Routing Games Convergence

Convergence of x̄^{(t)} vs. convergence of x^{(t)}

Routing game example.

Figure: Path losses ℓ_{A_k}(x^{(τ)}) and ℓ_{A_k}(x̄^{(τ)}) for populations 1 and 2, for paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1). (More)

Page 30: Distributed Learning in Routing Games Convergence


Stochastic approximation

Technique 2: Stochastic approximation

Page 31: Distributed Learning in Routing Games Convergence

Stochastic approximation

Idea:

View the learning dynamics as a discretization of an ODE.

Study convergence of the ODE.

Relate convergence of the discrete algorithm to convergence of the ODE.

Figure: Underlying continuous time (cumulative step sizes 0, η1, η1 + η2, ...).

Page 32: Distributed Learning in Routing Games Convergence

Example: the Hedge algorithm

Hedge algorithm

Update the distribution according to the observed loss:

x^{(t+1)}_a ∝ x^{(t)}_a e^{−η^k_t ℓ^{(t)}_a}

Also known as: exponentially weighted average forecaster [7], multiplicative weights update [1], exponentiated gradient descent [12], entropic descent [2], log-linear learning [5], [18].
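A minimal sketch of the Hedge update in Python; the loss values and the learning-rate schedule below are illustrative assumptions.

```python
import numpy as np

def hedge_update(x, loss, eta):
    """One Hedge step: x_a^(t+1) proportional to x_a^(t) * exp(-eta * loss_a^(t))."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

# Illustrative schedule eta_t = t^(-1/2); the routing examples later use eta_t = 1/t.
x = np.ones(3) / 3
for t in range(1, 101):
    loss = np.random.rand(3)   # placeholder loss; in the routing game it comes from the other players
    x = hedge_update(x, loss, eta=t ** -0.5)
```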


Page 38: Distributed Learning in Routing Games Convergence

The replicator ODE

In Hedge, x^{(t+1)}_p ∝ x^{(t)}_p e^{−η^k_t ℓ^{(t)}_p}; take η_t → 0.

Replicator equation [29]

∀a ∈ A_k,  dx_a/dt = x_a (⟨ℓ_{A_k}(x), x_{A_k}⟩ − ℓ_a(x))    (1)

Theorem [8]

Every solution of the ODE (1) converges to the set of its stationary points.
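A forward-Euler integration of the replicator equation (1), as a numerical sketch only; the single-population loss function below is an illustrative assumption.

```python
import numpy as np

def replicator_step(x, loss_fn, dt=0.01):
    """One Euler step of dx_a/dt = x_a * (<loss(x), x> - loss_a(x))."""
    ell = loss_fn(x)
    avg = ell @ x
    return x + dt * x * (avg - ell)

def integrate(x0, loss_fn, steps=5000, dt=0.01):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = replicator_step(x, loss_fn, dt)
    return x

# Illustrative congestion-style losses: the loss of an action grows with the mass on it.
loss_fn = lambda x: np.array([1.0 + x[0], 2.0 * x[1], 0.5 + 2.0 * x[2]])
print(integrate([1/3, 1/3, 1/3], loss_fn))   # approaches a stationary point of the ODE
```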


Page 40: Distributed Learning in Routing Games Convergence

AREP dynamics: Approximate REPlicator

Discretization of the continuous-time replicator dynamics:

x^{(t+1)}_a − x^{(t)}_a = η_t x^{(t)}_a (⟨ℓ_{A_k}(x^{(t)}), x^{(t)}_{A_k}⟩ − ℓ_a(x^{(t)})) + η_t U^{(t+1)}_a

η_t: discretization time steps.

(U^{(t)})_{t≥1}: perturbations that satisfy, for all T > 0,

lim_{τ1→∞} max_{τ2 : Σ_{t=τ1}^{τ2} η_t < T} ‖ Σ_{t=τ1}^{τ2} η_t U^{(t+1)} ‖ = 0

(a sufficient condition is that ∃q ≥ 2: sup_τ E‖U^{(τ)}‖^q < ∞ and Σ_τ η_τ^{1+q/2} < ∞) [3]
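For instance (a worked check, not from the slides), zero-mean noise with bounded second moments satisfies the sufficient condition with q = 2 whenever the steps are square-summable:

```latex
\eta_t = \tfrac{1}{t}, \quad q = 2:
\qquad \sum_{\tau} \eta_\tau^{\,1 + q/2} = \sum_{\tau} \tau^{-2} < \infty,
\qquad \sup_{\tau} \mathbb{E}\,\lVert U^{(\tau)}\rVert^{2} < \infty .
```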


Page 43: Distributed Learning in Routing Games Convergence

Convergence to Nash equilibria

Theorem [15]

Under AREP updates, if η_t ↓ 0 and Σ η_t = ∞, then

x^{(t)} → X⋆

The affine interpolation of (x^{(t)}) is an asymptotic pseudo-trajectory.

Figure: Affine interpolation of the iterates x^{(0)}, x^{(1)}, ..., x^{(k)}, compared with the flow Φ_t of the ODE.

Use f as a Lyapunov function. (proof details)

However: no convergence rates.


Page 45: Distributed Learning in Routing Games Convergence


Stochastic convex optimization

Technique 3: (Stochastic) convex optimization

Page 46: Distributed Learning in Routing Games Convergence

Stochastic convex optimization

Idea:

View the learning dynamics as a distributed algorithm to minimize f.

(More generally: a distributed algorithm to find a zero of a monotone operator.)

Allows us to analyze convergence rates.

Here, the class of distributed optimization methods is stochastic mirror descent.



Page 50: Distributed Learning in Routing Games Convergence

Stochastic Mirror Descent

minimize f(x), a convex function
subject to x ∈ X ⊂ R^d, a convex, compact set

Algorithm: MD method with learning rates (η_t)
1: for t ∈ N do
2:   observe ℓ^{(t)} ∈ ∂f(x^{(t)})
3:   x^{(t+1)} = argmin_{x ∈ X} ⟨ℓ^{(t)}, x⟩ + (1/η_t) D_ψ(x, x^{(t)})
4: end for

η_t: learning rate
D_ψ: Bregman divergence

Figure: The MD step minimizes the linearization f(x^{(t)}) + ⟨ℓ^{(t)}, x − x^{(t)}⟩ plus the proximal term (1/η_t) D_ψ(x, x^{(t)}). [21], [20]

Page 51: Distributed Learning in Routing Games Convergence

Stochastic Mirror Descent

Algorithm: distributed MD method with learning rates (η^k_t)
1: for t ∈ N do
2:   observe ℓ^{(t)}_{A_k} ∈ ∂_{A_k} f(x^{(t)})
3:   x^{(t+1)}_{A_k} = argmin_{x ∈ X_{A_k}} ⟨ℓ^{(t)}_{A_k}, x⟩ + (1/η^k_t) D_{ψ_k}(x, x^{(t)}_{A_k})
4: end for

η^k_t: learning rate
D_{ψ_k}: Bregman divergence

Page 52: Distributed Learning in Routing Games Convergence

Stochastic Mirror Descent

Algorithm: distributed SMD method with learning rates (η^k_t)
1: for t ∈ N do
2:   observe ℓ̂^{(t)}_{A_k} with E[ℓ̂^{(t)}_{A_k} | F_{t−1}] ∈ ∂_{A_k} f(x^{(t)})
3:   x^{(t+1)}_{A_k} = argmin_{x ∈ X_{A_k}} ⟨ℓ̂^{(t)}_{A_k}, x⟩ + (1/η^k_t) D_{ψ_k}(x, x^{(t)}_{A_k})
4: end for

η^k_t: learning rate of population k
D_{ψ_k}: Bregman divergence
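A minimal sketch of the distributed SMD loop with the entropic Bregman divergence, for which the argmin step has the closed multiplicative form of Hedge; the noisy loss model and the rates η^k_t = t^{−1/2} below are illustrative assumptions.

```python
import numpy as np

def entropic_md_step(x, loss_est, eta):
    """argmin_x <loss_est, x> + (1/eta) * KL(x, x_prev) over the simplex has a closed form."""
    w = x * np.exp(-eta * loss_est)
    return w / w.sum()

def smd(loss_fns, noise=0.1, T=1000, n=3, K=2, seed=0):
    """Each population k runs its own entropic mirror descent on noisy loss observations."""
    rng = np.random.default_rng(seed)
    x = [np.ones(n) / n for _ in range(K)]
    for t in range(1, T + 1):
        # zero-mean noise keeps the loss estimates unbiased
        ells = [loss_fns[k](x) + noise * rng.standard_normal(n) for k in range(K)]
        x = [entropic_md_step(x[k], ells[k], eta=t ** -0.5) for k in range(K)]
    return x
```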

Page 53: Distributed Learning in Routing Games Convergence

Convergence

Let d_τ = D_ψ(X⋆, x^{(τ)}).

Main ingredient

E[d_{τ+1} | F_{τ−1}] ≤ d_τ − η_τ (f(x^{(τ)}) − f⋆) + (η_τ² / 2μ) E[‖ℓ̂^{(τ)}‖²_∗ | F_{τ−1}]

From here:

Can show a.s. convergence x^{(t)} → X⋆ if Σ η_t = ∞ and Σ η_t² < ∞.

(d_τ) is an almost supermartingale [22], [6].

Deterministic version: if d_{τ+1} ≤ d_τ − a_τ + b_τ and Σ b_τ < ∞, then (d_τ) converges.
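To spell out how the lemma is applied (a sketch, assuming the conditional second moments E[‖ℓ̂^{(τ)}‖²_∗ | F_{τ−1}] are bounded by some M): the inequality above is the Robbins–Siegmund recursion with b_τ = η_τ (f(x^{(τ)}) − f⋆) and c_τ = η_τ² M / (2μ), so

```latex
\sum_\tau \eta_\tau^2 < \infty \;\Rightarrow\; \sum_\tau c_\tau < \infty
\;\Rightarrow\;
d_\tau \to d_\infty \ \text{a.s.}
\quad\text{and}\quad
\sum_\tau \eta_\tau \bigl(f(x^{(\tau)}) - f^\star\bigr) < \infty \ \text{a.s.}
```

Since Σ η_τ = ∞ and f(x^{(τ)}) − f⋆ ≥ 0, the second sum forces lim inf_τ f(x^{(τ)}) = f⋆; combined with the a.s. convergence of (d_τ), this is the outline of the almost-sure convergence x^{(t)} → X⋆ stated above.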


Page 57: Distributed Learning in Routing Games Convergence

Convergence

To show convergence E[f(x^{(t)})] → f⋆, generalize the technique of Shamir et al. [25] (for SGD, α = 1/2).

Convergence of distributed stochastic mirror descent [14]

For η^k_t = θ_k / t^{α_k}, α_k ∈ (0, 1),

E[f(x^{(t)})] − f⋆ = O( Σ_k log t / t^{min(α_k, 1−α_k)} )

Non-smooth, non-strongly convex. (More details)

Page 58: Distributed Learning in Routing Games Convergence

Summary

Regret analysis: convergence of the averages x̄^{(t)}.

Stochastic approximation: almost sure convergence of x^{(t)}.

Stochastic convex optimization: almost sure convergence, E[f(x^{(t)})] → f⋆, E[D_ψ(x⋆, x^{(t)})] → 0, convergence rates.

Page 59: Distributed Learning in Routing Games Convergence


Outline

1 Introduction

2 Convergence of agent dynamics

3 Routing Examples

4 Estimation and optimal control

Page 60: Distributed Learning in Routing Games Convergence

Application to the routing game

Figure: Example network with strongly convex potential (nodes v0–v6).

Centered Gaussian noise on the edges.

Population 1: Hedge with η^1_t = t^{−1}.

Population 2: Hedge with η^2_t = t^{−1}.

Page 61: Distributed Learning in Routing Games Convergence

Routing game with strongly convex potential

Figure: Population distributions x^{(τ)}_{A_k} and noisy path losses ℓ̂_p(x^{(τ)}). Population 1 uses paths p0 = (v0, v4, v6, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1); population 2 uses paths p3 = (v2, v3), p4 = (v2, v4, v5, v3), p5 = (v2, v4, v6, v3).

Page 62: Distributed Learning in Routing Games Convergence

Routing game with strongly convex potential

Figure: Distance to equilibrium E[D_KL(x⋆, x^{(τ)})] (log-log scale) for η^1_t = t^{−1}, η^2_t = t^{−1}.

For η^k_t = θ_k / (ℓ_f t^{α_k}), α_k ∈ (0, 1]:  E[D_ψ(x⋆, x^{(t)})] = O( Σ_k t^{−α_k} ).

Page 63: Distributed Learning in Routing Games Convergence

Routing game with weakly convex potential

Figure: A weakly convex example (network with nodes v0–v4).


Page 66: Distributed Learning in Routing Games Convergence

Routing game with weakly convex potential

Figure: Potential values E[f(x^{(τ)})] − f⋆ (log-log scale) for (η^1_t, η^2_t) = (t^{−.3}, t^{−.4}) and (t^{−.5}, t^{−.5}).

For η^k_t = θ_k / t^{α_k}, α_k ∈ (0, 1):  E[f(x^{(t)})] − f⋆ = O( Σ_k log t / t^{min(α_k, 1−α_k)} ).

Page 67: Distributed Learning in Routing Games Convergence


Outline

1 Introduction

2 Convergence of agent dynamics

3 Routing Examples

4 Estimation and optimal control


Page 69: Distributed Learning in Routing Games Convergence

A routing experiment

Interface for the routing game.

Used to collect a sequence of decisions x^{(t)}.

Figure: Interface for the routing game experiment. A server collects each player's decision x^{(t)}_i and returns the history of losses (ℓ̄^{(t)}_i, ℓ̄^{(t−1)}_i, ...).

Page 70: Distributed Learning in Routing Games Convergence

Estimation of learning dynamics

We observe a sequence of player decisions (x^{(t)}) and losses (ℓ̄^{(t)}).

Can we fit a model of player dynamics?

Mirror descent model [17]

Estimate the learning rate in the mirror descent model

x^{(t+1)}(η) = argmin_{x ∈ Δ^{A_k}} ⟨ℓ̄^{(t)}, x⟩ + (1/η) D_KL(x, x^{(t)})

Then d(η) = D_KL(x^{(t+1)}, x^{(t+1)}(η)) is a convex function of η. Minimize it to estimate η^{(t)}_k.
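Under the KL Bregman divergence used above, the model prediction x^{(t+1)}(η) has a closed multiplicative form, so d(η) can be minimized by a one-dimensional search. A sketch with an illustrative grid search (function names are mine, not the paper's):

```python
import numpy as np

def md_prediction(x_prev, loss, eta):
    """Closed-form solution of argmin_x <loss, x> + (1/eta) * KL(x, x_prev) over the simplex."""
    w = x_prev * np.exp(-eta * loss)
    return w / w.sum()

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def estimate_eta(x_prev, loss, x_next, grid=np.linspace(1e-3, 10.0, 2000)):
    """Estimate the learning rate by minimizing d(eta) = KL(x_next, prediction(eta))."""
    divergences = [kl(x_next, md_prediction(x_prev, loss, eta)) for eta in grid]
    return grid[int(np.argmin(divergences))]
```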


Page 73: Distributed Learning in Routing Games Convergence

Preliminary results

Figure: Cost of each player, ⟨x^{(t)}_k, ℓ^{(t)}_k⟩ / ⟨x⋆_k, ℓ⋆_k⟩, normalized by the equilibrium cost, over game iterations (exploration phase, then convergence), for players 1–5.

Page 74: Distributed Learning in Routing Games Convergence

Preliminary results

Figure: Potential function f(x^{(t)}) − f(x⋆) over game iterations (exploration phase, then convergence), together with its running minimum.

Page 75: Distributed Learning in Routing Games Convergence

Preliminary results

Figure: Average KL divergence between predicted and actual distributions, as a function of the prediction horizon h, for different learning-rate models (parameterized η_t, previous η_t, moving average η_t, linear regression η_t).

Page 76: Distributed Learning in Routing Games Convergence

Optimal routing with learning dynamics

Assumptions

A central authority has control over a fraction of the traffic: u^{(t)} ∈ α_1 Δ^{A_1} × ··· × α_K Δ^{A_K}.

The remaining traffic follows the learning dynamics: x^{(t)} ∈ (1 − α_1) Δ^{A_1} × ··· × (1 − α_K) Δ^{A_K}.

Optimal routing under selfish learning constraints

minimize_{u^{(1:T)}, x^{(1:T)}}  Σ_{t=1}^T J(x^{(t)}, u^{(t)})

subject to  x^{(t+1)} = u(x^{(t)} + u^{(t)}, ℓ(x^{(t)} + u^{(t)}))

Page 77: Distributed Learning in Routing Games Convergence

Solution methods

Greedy method: approximate the problem with a sequence of convex problems,

minimize_{u^{(t)}}  J(u(x^{(t−1)}, u^{(t−1)}), u^{(t)})

Mirror descent with the adjoint method.

Adjoint method

minimize_u J(u, x)  subject to  H(x, u) = 0

is equivalent to

minimize_u J(u, X(u))

Then perform mirror descent on this function of u.


Page 79: Distributed Learning in Routing Games Convergence

A simple example

Figure: Simple Pigou network used for the numerical experiment. Two parallel links from s to t, one with constant cost c_e(φ_e) = 1 and one with cost c_e(φ_e) = 2φ_e, and total demand F = 1.

Social optimum: (3/4, 1/4) (constant-cost link, congestible link).

Nash equilibrium: (1/2, 1/2).

Control over a fraction α = 1/2 of the traffic.
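These two allocations can be checked numerically: the Nash equilibrium minimizes the Rosenthal potential of this network, f(φ) = φ1 + φ2² (integrated edge costs), while the social optimum minimizes the total cost φ1 + 2φ2². A small sketch of this check (the grid search is illustrative):

```python
import numpy as np

# Pigou network: link 1 has constant cost 1, link 2 has cost 2*phi_2; total demand F = 1.
phi2 = np.linspace(0.0, 1.0, 100001)        # mass on the congestible link
phi1 = 1.0 - phi2

potential = phi1 * 1.0 + phi2 ** 2          # Rosenthal potential: integral of 1, plus integral of 2u
social_cost = phi1 * 1.0 + 2.0 * phi2 ** 2  # total cost: phi1 * c1 + phi2 * c2(phi2)

ne = phi2[np.argmin(potential)]
so = phi2[np.argmin(social_cost)]
print("Nash equilibrium:", 1.0 - ne, ne)    # -> (0.5, 0.5)
print("Social optimum: ", 1.0 - so, so)     # -> (0.75, 0.25)
```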

Page 80: Distributed Learning in Routing Games Convergence

A simple example

Figure: Social cost J^{(t)}(u^{(t)} + x^{(t)}) over time induced by the adjoint solution (left) and the greedy solution (right). The dashed line shows the cost of the social optimal allocation.

Page 81: Distributed Learning in Routing Games Convergence

A simple example

Figure: Adjoint-controlled flows u_t (top), selfish flows x_t (middle), and resulting losses ℓ(x_t + u_t) (bottom). The green lines correspond to the top path and the blue lines to the bottom path; the dashed lines show the social optimal flows x_SO = (3/4, 1/4).

Page 82: Distributed Learning in Routing Games Convergence

Application to the L.A. highway network

Simplified model of the L.A. highway network.

Cost functions use the B.P.R. function, calibrated using the work of [28].

Figure: Los Angeles highway network.

Page 83: Distributed Learning in Routing Games Convergence

Application to the L.A. highway network

Figure: Average delay J(x^{(i)}, u^{(i)}) without control (dashed, J^{(s)}), with full control (solid, J⋆), and for different values of α ∈ {0.1, 0.3, 0.5, 0.7, 0.9} [27].

Page 84: Distributed Learning in Routing Games Convergence

Summary

Figure: Coupled sequential decision problems. Agent k interacts with the environment (the other agents): the outcome is ℓ_{A_k}(x^{(t)}_{A_1}, ..., x^{(t)}_{A_K}), and the learning algorithm updates x^{(t+1)}_{A_k} = u(x^{(t)}_{A_k}, ℓ^{(t)}_{A_k}).

Simple model for distributed learning.

Techniques for design / analysis of learning dynamics.

Estimation of learning dynamics.

Optimal control.

Page 85: Distributed Learning in Routing Games Convergence

Ongoing / future work

Larger classes of games.

Acceleration of learning dynamics (Nesterov's method):

In continuous time: accelerated replicator dynamics.

In discrete time: convergence in O(1/t²) instead of O(1/t).

Heuristic to remove oscillations.

Applications to load balancing.

Other control schemes: tolling / incentivization.

Thank you!

eecs.berkeley.edu/∼walid/


Page 87: Distributed Learning in Routing Games Convergence

References I

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3):167–175, May 2003.

[3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de probabilités XXXIII, pages 1–68. Springer, 1999.

[4] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: on convergence to Nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, PODC '06, pages 45–52, New York, NY, USA, 2006. ACM.

[5] Lawrence E. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 5(3):387–424, 1993. ISSN 0899-8256.

[6] Léon Bottou. Online algorithms and stochastic approximations. 1998.

[7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.

Page 88: Distributed Learning in Routing Games Convergence

References II

[8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms–ESA 2004, pages 323–334. Springer, 2004.

[9] Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79–103, 1999.

[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139, 1957.

[11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54, 2001.

[12] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.

[13] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Proceedings of the 41st annual ACM symposium on Theory of computing, pages 533–542. ACM, 2009.

Page 89: Distributed Learning in Routing Games Convergence

References III

[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing, 2015.

[15] Walid Krichene, Benjamin Drighès, and Alexandre Bayen. Learning Nash equilibria in congestion games. SIAM Journal on Control and Optimization (SICON), to appear, 2014.

[16] Walid Krichene, Syrine Krichene, and Alexandre Bayen. Convergence of mirror descent dynamics in the routing game. In European Control Conference (ECC), 2015.

[17] Kiet Lam, Walid Krichene, and Alexandre M. Bayen. Estimation of learning dynamics in the routing game. In International Conference on Cyber-Physical Systems (ICCPS), in review, 2015.

[18] Jason R. Marden and Jeff S. Shamma. Revisiting log-linear learning: asynchrony, completeness and payoff-based implementation. Games and Economic Behavior, 75(2):788–808, 2012.

Page 90: Distributed Learning in Routing Games Convergence

References IV

[19] Jason R Marden, Gürdal Arslan, and Jeff S Shamma. Joint strategy fictitious play with inertia for potential games. Automatic Control, IEEE Transactions on, 54(2):208–220, 2009.

[20] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.

[21] A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience series in discrete mathematics. Wiley, 1983.

[22] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing Methods in Statistics, 1971.

[23] William H Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001.

[24] William H. Sandholm. Population games and evolutionary dynamics. Economic learning and social evolution. Cambridge, Mass. MIT Press, 2010. ISBN 978-0-262-19587-4.


References V

[25] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. In ICML, pages 71–79, 2013.

[26] Weijie Su, Stephen Boyd, and Emmanuel Candes. A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. In NIPS, 2014.

[27] Milena Suarez, Walid Krichene, and Alexandre Bayen. Optimal routing under hedge response. Transactions on Control of Networked Systems (TCNS), in preparation, 2015.

[28] J. Thai, R. Hariss, and A. Bayen. A multi-convex approach to latency inference and control in traffic equilibria from sparse data. In American Control Conference (ACC), 2015, pages 689–695, July 2015.

[29] Jörgen W. Weibull. Evolutionary game theory. MIT Press, 1997.


Continuous time model


Continuous-time learning model

$$\dot{x}_{A_k}(t) = v_k\big( x_{A_k}(t),\; \ell_{A_k}(x(t)) \big)$$

- Evolution in populations: [24]
- Convergence in potential games under dynamics which satisfy a positive correlation condition [23]
- Replicator dynamics for the congestion game [8] and in evolutionary game theory [29]
- No-regret dynamics for two-player games [11]
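As an illustration of one of the dynamics cited above, here is a minimal numerical sketch (not from the talk) of a forward-Euler discretization of replicator dynamics for a single population choosing between two paths; the latency functions and the helper names (path_losses, replicator_step) are illustrative placeholders.

```python
import numpy as np

# Minimal sketch: forward-Euler integration of replicator dynamics
# x_p' = x_p * (mean_loss(x) - loss_p(x)) for one population on a
# two-path network. Latency functions are illustrative placeholders.
latencies = [lambda m: 1.0 + 2.0 * m,   # path 0: congestion-sensitive
             lambda m: 2.0]             # path 1: constant latency

def path_losses(x):
    """Loss of each path when a fraction x[p] of the population uses path p."""
    return np.array([latencies[p](x[p]) for p in range(len(x))])

def replicator_step(x, dt=0.01):
    losses = path_losses(x)
    mean_loss = losses @ x
    return x + dt * x * (mean_loss - losses)   # preserves the simplex

x = np.array([0.9, 0.1])          # initial path distribution
for _ in range(2000):
    x = replicator_step(x)
print(x, path_losses(x))          # the distribution equalizes the path losses
```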



Oscillating example

Figure: Path losses $\ell_p(x^{(\tau)})$ versus $\tau$, shown for populations 1 and 2 and for paths $p_0 = (v_0, v_5, v_1)$, $p_1 = (v_0, v_4, v_5, v_1)$, $p_2 = (v_0, v_1)$.


Oscillating example

Figure: Potentials, i.e. the gap $f(x^{(\tau)}) - f^\star$ versus $t$ (logarithmic scale).


Oscillating example

Figure: Trajectories in the simplex, showing the initial points $x^{(0)}_{A_1}$, $x^{(0)}_{A_2}$ and the limits $\lim_{\tau\to\infty} x^{(\tau)}_{A_1}$, $\lim_{\tau\to\infty} x^{(\tau)}_{A_2}$, over paths $(p_0, p_1, p_2)$ and $(p_3, p_4, p_5)$ respectively.


Regret [10]

Cumulative regret
$$R^{(t)}_{A_k} = \sup_{x_{A_k} \in \Delta_{A_k}} \sum_{\tau \le t} \left\langle x^{(\tau)}_{A_k} - x_{A_k},\; \ell_{A_k}(x^{(\tau)}) \right\rangle$$

Convergence of averages:
$$\forall k, \;\; \limsup_{t} \frac{R^{(t)}_{A_k}}{t} \le 0 \;\Longrightarrow\; \bar{x}^{(t)} = \frac{1}{t} \sum_{\tau \le t} x^{(\tau)} \to \mathcal{X}^\star$$

By convexity of $f$,
$$f\!\left( \frac{1}{t} \sum_{\tau \le t} x^{(\tau)} \right) - f(x) \;\le\; \frac{1}{t} \sum_{\tau \le t} \big( f(x^{(\tau)}) - f(x) \big) \;\le\; \frac{1}{t} \sum_{\tau \le t} \left\langle \ell(x^{(\tau)}),\; x^{(\tau)} - x \right\rangle \;\le\; \sum_{k=1}^{K} \frac{R^{(t)}_{A_k}}{t}$$
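As a concrete complement, here is a minimal sketch (illustrative, not from the slides) of computing the cumulative regret above for a single population from a realized sequence of plays and loss vectors; since the supremum of a linear function over the simplex is attained at a vertex, it reduces to the best fixed path in hindsight. The data and the helper name cumulative_regret are placeholders.

```python
import numpy as np

def cumulative_regret(xs, losses):
    """Cumulative regret of plays xs[t] against loss vectors losses[t]:
    sup_x sum_t <x^(t) - x, loss^(t)>, with the sup attained at a simplex vertex."""
    xs, losses = np.asarray(xs), np.asarray(losses)
    realized = np.sum(np.einsum('ti,ti->t', xs, losses))   # sum_t <x^(t), loss^(t)>
    best_fixed = np.min(losses.sum(axis=0))                # best single path in hindsight
    return realized - best_fixed

# Illustrative data: 3 paths, 100 rounds of arbitrary plays and losses.
rng = np.random.default_rng(0)
losses = rng.uniform(0.5, 2.0, size=(100, 3))
xs = rng.dirichlet(np.ones(3), size=100)
t = len(xs)
print(cumulative_regret(xs, losses) / t)   # average regret; no-regret play drives this to <= 0
```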



AREP convergence proof

Affine interpolation of $x^{(t)}$ is an asymptotic pseudo-trajectory (APT) of the continuous-time dynamics.

Figure: the iterates $x^{(0)}, x^{(1)}, \dots, x^{(k)}$ together with the flow images $\Phi_{t_0}(x^{(0)}), \dots, \Phi_{t_{k-1}}(x^{(k-1)})$.

The set of limit points of an APT is internally chain transitive (ICT).

If $\Gamma$ is compact invariant and admits a Lyapunov function $f$ with $\operatorname{int} f(\Gamma) = \emptyset$, then every ICT set $L$ satisfies $L \subseteq \Gamma$, and $f$ is constant on $L$.

In particular, $f$ is constant on the limit set of $(x^{(t)})$, so $f(x^{(t)})$ converges.


Bregman Divergence


Bregman Divergence

For a strongly convex function $\psi$,
$$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle$$

Example [2]: when $\mathcal{X} = \Delta^d$,
$$\psi(x) = -H(x) = \sum_a x_a \ln x_a, \qquad D_\psi(x, y) = D_{KL}(x, y) = \sum_a x_a \ln \frac{x_a}{y_a}$$

The MD update then has the closed-form solution
$$x^{(t+1)}_a \propto x^{(t)}_a\, e^{-\eta_t g^{(t)}_a}$$
a.k.a. the Hedge algorithm, or exponential weights.

Figure: KL divergence on the simplex with vertices $\delta_1, \delta_2, \delta_3$ and reference point $q$.

[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3):167–175, May 2003.
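A minimal sketch of the closed-form update above (Hedge / exponential weights), with a fixed loss vector and a placeholder step-size schedule $\eta_t = 1/\sqrt{t}$; the name hedge_update is illustrative.

```python
import numpy as np

def hedge_update(x, g, eta):
    """Entropic mirror descent (Hedge) step: multiplicative weights, then normalize."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

# Illustrative run on a fixed loss vector: mass concentrates on the lowest-loss action.
x = np.ones(3) / 3
g = np.array([1.0, 0.5, 2.0])
for t in range(1, 201):
    x = hedge_update(x, g, eta=1.0 / np.sqrt(t))   # placeholder schedule eta_t = 1/sqrt(t)
print(x)   # most weight on action 1 (smallest loss)
```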


A bounded entropic divergence

$\mathcal{X} = \Delta$

$D_{KL}(x, y) = \sum_{i=1}^d x_i \ln \frac{x_i}{y_i}$ is unbounded.

Define
$$D^\epsilon_{KL}(x, y) = \sum_{i=1}^d (x_i + \epsilon) \ln \frac{x_i + \epsilon}{y_i + \epsilon}$$

Proposition

$D^\epsilon_{KL}$ is $\frac{1}{1 + d\epsilon}$-strongly convex w.r.t. $\|\cdot\|_1$.

$D^\epsilon_{KL}$ is bounded by $(1 + d\epsilon) \ln \frac{1 + \epsilon}{\epsilon}$.
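A small numerical check (illustrative, not from the slides) of the smoothed divergence and the stated bound, sampling random pairs on the simplex; the helper name d_kl_eps and the values of d and epsilon are placeholders.

```python
import numpy as np

def d_kl_eps(x, y, eps):
    """Smoothed entropic divergence: sum_i (x_i + eps) * ln((x_i + eps) / (y_i + eps))."""
    return np.sum((x + eps) * np.log((x + eps) / (y + eps)))

d, eps = 4, 0.05
bound = (1 + d * eps) * np.log((1 + eps) / eps)   # stated upper bound (1 + d*eps) ln((1+eps)/eps)

# Sample random pairs on the simplex and verify the divergence stays below the bound.
rng = np.random.default_rng(1)
vals = [d_kl_eps(rng.dirichlet(np.ones(d)), rng.dirichlet(np.ones(d)), eps)
        for _ in range(1000)]
print(max(vals), "<=", bound)
```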


Convergence of DMD

Theorem: Convergence of DMD [16]

Suppose $f$ has an $L$-Lipschitz gradient. Then under the MD class with $\eta_t \downarrow 0$ and $\sum_t \eta_t = \infty$,
$$f(x^{(t)}) - f^\star = O\!\left( \frac{\sum_{\tau \le t} \eta_\tau}{t} + \frac{1}{t\,\eta_t} + \frac{1}{t} \right)$$

Indeed,
$$\frac{1}{t} \sum_{\tau \le t} f(x^{(\tau)}) - f^\star \;\le\; \frac{1}{t} \sum_k \left( \frac{L_k^2}{2\ell_{\psi_k}} \sum_{\tau \le t} \eta^k_\tau + \frac{D_k}{\eta^k_t} \right)$$
and
$$f(x^{(t)}) - f^\star \;\le\; \frac{1}{t} \sum_{\tau \le t} f(x^{(\tau)}) - f^\star + O\!\left( \frac{1}{t} \right)$$
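To make the rate concrete, here is a small sketch (assuming the reading of the middle term as $1/(t\,\eta_t)$, as reconstructed above) evaluating the three terms of the bound for a vanishing schedule $\eta_t = \theta/t^{\alpha}$ with $\alpha \in (0,1)$; the constants are placeholders.

```python
import numpy as np

# Sketch: behavior of the three terms in the DMD bound for eta_t = theta / t^alpha,
# which satisfies eta_t -> 0 and sum_t eta_t = infinity when alpha is in (0, 1).
theta, alpha = 1.0, 0.5
for t in [10, 100, 1000, 10000]:
    taus = np.arange(1, t + 1)
    eta = theta / taus ** alpha
    term1 = eta.sum() / t          # (sum_{tau<=t} eta_tau) / t, on the order of t^{-alpha}
    term2 = 1.0 / (t * eta[-1])    # 1 / (t * eta_t), on the order of t^{alpha - 1}
    term3 = 1.0 / t
    print(t, term1, term2, term3)  # all three terms decay to 0
```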


Convergence in DSMD

Regret bound [14]

SMD method with step sizes $(\eta_t)$. For all $t_2 > t_1 \ge 0$ and any $\mathcal{F}_{t_1}$-measurable $x$,
$$\sum_{\tau=t_1}^{t_2} \mathbb{E}\big[ \langle g^{(\tau)},\, x^{(\tau)} - x \rangle \big] \;\le\; \frac{\mathbb{E}\big[ D_\psi(x, x^{(t_1)}) \big]}{\eta_{t_1}} + D \left( \frac{1}{\eta_{t_2}} - \frac{1}{\eta_{t_1}} \right) + \frac{G}{2\ell_\psi} \sum_{\tau=t_1}^{t_2} \eta_\tau$$

Strongly convex case:
$$\mathbb{E}\big[ D_\psi(x^\star, x^{(t+1)}) \big] \;\le\; (1 - 2\ell_f \eta_t)\, \mathbb{E}\big[ D_\psi(x^\star, x^{(t)}) \big] + \frac{G}{2\ell_\psi}\, \eta_t^2$$
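A minimal sketch (with illustrative constants) that iterates the strongly convex recursion above with a placeholder schedule $\eta_t = 1/(\ell_f t)$, to see the roughly $1/t$ decay of the resulting bound; the clamp at zero only reflects that a Bregman divergence is nonnegative.

```python
import numpy as np

# Sketch: iterate E[D_{t+1}] <= (1 - 2*l_f*eta_t) * E[D_t] + (G / (2*l_psi)) * eta_t^2
# with the placeholder schedule eta_t = 1 / (l_f * t); the bound decays roughly like 1/t.
l_f, l_psi, G = 1.0, 1.0, 1.0
D = 1.0                            # initial divergence D_psi(x*, x^(1)), illustrative
for t in range(1, 10001):
    eta = 1.0 / (l_f * t)
    D = max(0.0, (1 - 2 * l_f * eta) * D) + (G / (2 * l_psi)) * eta ** 2
    if t in (10, 100, 1000, 10000):
        print(t, D, 1.0 / t)       # compare against the 1/t reference rate
```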


Convergence in DSMD

Weakly convex case:

Theorem [14]

Distributed SMD with step sizes $\eta^p_t = \frac{\theta_p}{t^{\alpha_p}}$, $\alpha_p \in (0, 1)$. Then
$$\mathbb{E}\big[ f(x^{(t)}) \big] - f(x^\star) \;\le\; \left( 1 + \sum_{i=1}^{t} \frac{1}{i} \right) \sum_{k \in \mathcal{A}} \left( \frac{1}{t^{1-\alpha_k}} \frac{D}{\theta_k} + \frac{\theta_k G}{2\ell_\psi (1 - \alpha_k)} \frac{1}{t^{\alpha_k}} \right) = O\!\left( \frac{\log t}{t^{\min(\min_k \alpha_k,\; 1 - \max_k \alpha_k)}} \right)$$

Define $S_i = \frac{1}{i+1} \sum_{\tau = t-i}^{t} \mathbb{E}\big[ f(x^{(\tau)}) \big]$.

Show $S_{i-1} \le S_i + \left( \frac{1}{t^{1-\alpha}} + \frac{\theta G}{2\ell_\psi (1 - \alpha)} \frac{1}{t^{\alpha}} \right) \frac{1}{i}$.



Stochastic mirror descent in machine learning

Large scale learning:
$$\text{minimize}_{x} \;\; \sum_{i=1}^{N} f_i(x) \quad \text{subject to} \;\; x \in \mathcal{X}$$

$N$ very large. Gradient prohibitively expensive to compute exactly. Instead, compute
$$g(x^{(t)}) = \sum_{i \in I} \nabla f_i(x^{(t)})$$
with $I$ a random subset of $\{1, \dots, N\}$.
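A minimal sketch of this minibatch gradient estimate, plugged into a plain (Euclidean, unconstrained) stochastic gradient step rather than a mirror step, on an illustrative least-squares instance; the problem data, names, and step size are all placeholders, not from the talk.

```python
import numpy as np

# Minimal sketch: minibatch gradient estimate for minimize_x sum_i f_i(x),
# with f_i(x) = 0.5 * (a_i . x - b_i)^2 as an illustrative instance.
rng = np.random.default_rng(2)
N, d, batch = 10_000, 5, 32
A = rng.normal(size=(N, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=N)

def minibatch_grad(x):
    I = rng.choice(N, size=batch, replace=False)              # random subset of {1, ..., N}
    # (a common variant rescales by N / batch so the estimate is unbiased for the full gradient)
    return (A[I] * (A[I] @ x - b[I])[:, None]).sum(axis=0)    # sum_{i in I} grad f_i(x)

x, eta = np.zeros(d), 1e-4
print("initial objective:", 0.5 * np.sum((A @ x - b) ** 2))
for t in range(5000):
    x = x - eta * minibatch_grad(x)                           # plain stochastic gradient step
print("final objective:  ", 0.5 * np.sum((A @ x - b) ** 2))
```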