
Stochastic Games and Mean Field Games with Singular Controls

Xin Guo ∗ Joon Seok Lee †

March 28, 2018

Abstract

In this paper we show that a class of $N$-player stochastic games with singular controls of finite variation can be approximated by the corresponding MFGs with singular controls of bounded velocity. This follows from three results, each of independent interest. The first establishes the convergence of singular control problems of bounded velocity to singular control problems of finite variation; the second shows that an $N$-player stochastic game with singular controls of bounded velocity can be approximated by the corresponding MFG with singular controls of bounded velocity; the third shows the existence and uniqueness of the optimal control to the MFG of bounded velocity.

1 Introduction

$N$-player non-zero-sum stochastic games are notoriously hard to analyze. Recently, the theory of Mean Field Games (MFGs), pioneered by Lasry and Lions (2007) and Huang, Malhamé, and Caines (2006), presents a powerful approach to study stochastic games of a large population with small interactions. MFG avoids directly analyzing the difficult $N$-player stochastic games. Instead, it approximates the dynamics and the objective function under the notion of population's probability distribution flows, a.k.a. mean information processes. Under proper technical conditions, the optimal control to an MFG is shown to be an $\varepsilon$-Nash Equilibrium ($\varepsilon$-NE) to its corresponding $N$-player game. As such, MFGs provide an elegant and analytically feasible framework to approximate $N$-player stochastic games.

The literature on MFGs is expanding rapidly. In addition to the PDEs approach (see also Guéant, Lasry, and Lions (2010)), there are approaches based on backward stochastic differential equations (BSDEs) by Buckdahn, Djehiche, Li, and Peng (2009) and Buckdahn, Li, and Peng (2009), the probabilistic approach by Carmona and Delarue (2013, 2014) and Carmona and Lacker (2015), and the dynamic programming method on McKean–Vlasov controls by Pham and Wei (2015, 2016). Recently, Fischer (2014) connects symmetric $N$-player games with MFGs, Lacker (2015) analyzes MFGs by formulating controlled martingale problems, Carmona, Delarue, and Lacker (2016) and Nutz (2016) add common noise to MFGs, Carmona and Zhu (2014) and Sen and Caines (2017) add major and minor players to MFGs, and Gomes, Patrizi, and Voskanyan (2013) and Cardaliaguet and Lehalle (2016) study stationary MFGs. All these theoretical developments are within the framework of regular controls, where controls are absolutely continuous.

∗ Department of Industrial Engineering and Operations Research, University of California, Berkeley, USA. Email: [email protected].
† Laboratoire de Probabilités et Modèles Aléatoires, CNRS, UMR 7599, Université Paris Diderot. Email: [email protected].


Compared to regular controls, singular controls provide a more general mathematical framework where controls are allowed to be discontinuous. Though more natural for practical engineering and economics problems, singular controls are in general more challenging to analyze. For instance, analyzing singular control problems amounts to adding possibly state-dependent gradient constraints to the underlying Hamilton–Jacobi–Bellman (HJB) equations. Moreover, the Hamiltonian for singular controls diverges and the standard stochastic maximum principle fails. To overcome these technical difficulties for MFGs with singular controls, Zhang (2012) and Hu, Øksendal, and Sulem (2014) propose and establish relaxed stochastic maximum principles to prove the existence of optimal controls to MFGs, while Fu and Horst (2016) adopt the notion of relaxed controls to prove the uniqueness and existence of the solution to MFGs in a finite-time horizon.

Our work. There are two types of singular controls, namely, singular controls of finite variation and singular controls of bounded velocity. Unlike singular controls of finite variation, singular controls of bounded velocity share some nice properties with regular controls and are easier to analyze. In this paper we show that $N$-player stochastic games with singular controls of finite variation can in fact be approximated under the NE criterion by MFGs with singular controls of bounded velocity. The key idea is to introduce an intermediate stochastic game: an $N$-player game with singular controls of bounded velocity.

To find the relationships among these games of different types, we first analyze the relationship between the underlying singular control problems. Assuming $T < \infty$, we show that under proper assumptions, the value function of singular controls of bounded velocity converges to that of singular controls of finite variation as the bound $\theta$ goes to infinity (Theorem 1). Note that a similar result was established in a different problem setting in Hernández-Hernández, Pérez, and Yamazaki (2016). We then establish the existence, uniqueness, and smoothness of the solution to the MFG with singular controls of bounded velocity (Theorem 2). Combining these results, we show that (i) the optimal control to the MFG with singular controls of bounded velocity is an $\varepsilon_N$-NE to the $N$-player game with singular controls of bounded velocity, with $\varepsilon_N = O(1/\sqrt{N})$, and (ii) it is an $(\varepsilon_N + \varepsilon_\theta)$-NE to the $N$-player game with singular controls of finite variation, where $\varepsilon_\theta$ is an error term that depends on $\theta$ (Theorem 3).

2 Problem formulations and main results

Throughout the paper, we will use the following notation, unless otherwise specified.

Notation.

• $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{0\le t\le\infty}, P)$ is a filtered probability space, and $W^i = \{W^i_t\}_{0\le t\le\infty}$, $i = 1, \dots, N < \infty$, are i.i.d. standard Brownian motions on this space.

• $\mathcal{P}(\mathbb{R})$ is the set of all probability measures on $\mathbb{R}$.

• $\mathcal{P}_p(\mathbb{R})$ is the set of all probability measures of $p$th order on $\mathbb{R}$. That is,
$$\mathcal{P}_p(\mathbb{R}) = \left\{ \mu \in \mathcal{P}(\mathbb{R}) \,\middle|\, \left(\int_{\mathbb{R}} |x|^p \,\mu(dx)\right)^{1/p} < \infty \right\}.$$

• The $p$th order Wasserstein metric on $\mathcal{P}_p(\mathbb{R})$ is defined as
$$D_p(\mu, \mu') = \inf_{\bar\mu} \left(\int_{\mathbb{R}\times\mathbb{R}} |y - y'|^p \,\bar\mu(dy, dy')\right)^{1/p},$$
where $\mu, \mu'$ are any probability measures in $\mathcal{P}_p(\mathbb{R})$ and the infimum is taken over all couplings $\bar\mu$ of $\mu$ and $\mu'$.

• $\mathcal{M}_{[0,T]} \subset C([0,T] : \mathcal{P}_1(\mathbb{R}))$ is a class of flows of probability measures $\{\mu_t\}_{0\le t\le T}$ on $[0,T]$. That is, there exists a positive constant $c$ such that
$$\mathcal{M}_{[0,T]} = \left\{ \{\mu_t\}_{0\le t\le T} \in C([0,T] : \mathcal{P}_1(\mathbb{R})) \,\middle|\, \sup_{s\ne t} \frac{D_1(\mu_t, \mu_s)}{|t-s|^{1/2}} \le c, \ \sup_{t\in[0,T]} \int_{\mathbb{R}} |x|^2 \,\mu_t(dx) \le c \right\},$$
where $\mathcal{M}_{[0,T]}$ is a metric space endowed with the metric
$$d_{\mathcal{M}}\left(\{\mu_t\}_{0\le t\le T}, \{\mu'_t\}_{0\le t\le T}\right) = \sup_{0\le t\le T} D_1(\mu_t, \mu'_t). \tag{1}$$

• $\mathcal{M}_{[0,\infty)} \subset C([0,\infty) : \mathcal{P}_1(\mathbb{R}))$ is a class of flows of probability measures $\{\mu_t\}_{0\le t<\infty}$ on $[0,\infty)$. That is, there exist a positive constant $c$ and, for each $T$, a positive constant $c_T$ depending on $T$ such that
$$\mathcal{M}_{[0,\infty)} = \left\{ \{\mu_t\}_{0\le t<\infty} \in C([0,\infty) : \mathcal{P}_1(\mathbb{R})) \,\middle|\, \sup_{s\ne t} \frac{D_1(\mu_t, \mu_s)}{|t-s|^{1/2}} \le c, \text{ and for each } T \in [0,\infty), \ \sup_{t\in[0,T]} \int_{\mathbb{R}} |x|^2 \,\mu_t(dx) \le c_T \right\},$$
where $\mathcal{M}_{[0,\infty)}$ is a metric space endowed with the metric
$$d^{\beta}_{\mathcal{M}}\left(\{\mu_t\}_{0\le t<\infty}, \{\mu'_t\}_{0\le t<\infty}\right) = \int_0^{\infty} e^{-\beta t} D_1(\mu_t, \mu'_t)\,dt, \tag{2}$$
for some sufficiently large $\beta > 0$.

• $\mathrm{Lip}(\psi)$ is a Lipschitz coefficient of $\psi$ for any given Lipschitz function $\psi$. That is, $|\psi(x) - \psi(y)| \le \mathrm{Lip}(\psi)|x - y|$ for any $x, y \in \mathbb{R}$.

• $\mathcal{L}\psi(x) = b(x)\partial_x\psi(x) + \frac{1}{2}\sigma^2(x)\partial_{xx}\psi(x)$ is the infinitesimal generator for any stochastic process $dx_t = b(x_t)\,dt + \sigma(x_t)\,dW_t$ and any $\psi \in C^2$.

• A function $f$ is said to be of polynomial growth if $|f(x)| \le c(|x|^k + 1)$ for some constants $c$ and $k$, for all $x$.
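Throughout, $D_1$ is the metric underlying $\mathcal{M}_{[0,T]}$. On $\mathbb{R}$, Wasserstein distances between empirical measures with equally many atoms reduce to pairing order statistics; the following small sketch is illustrative and not part of the paper's framework:

```python
import numpy as np

def wasserstein_1d(xs, ys, p=1):
    """p-th order Wasserstein distance D_p between two empirical measures
    on R with equally many atoms: in one dimension the optimal coupling
    pairs order statistics (monotone rearrangement)."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

# D_1 between {0, 1} and {0, 3}: the sorted pairing gives (0,0) and (1,3),
# so D_1 = (0 + 2) / 2 = 1.
print(wasserstein_1d(np.array([0.0, 1.0]), np.array([3.0, 0.0])))  # 1.0
```

Translating a measure by a constant $c$ shifts every order statistic by $c$, so $D_1$ changes by exactly $|c|$, consistent with the coupling definition above.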

2.1 N-player stochastic games

N-player game with singular controls of finite variation. Fix a time $T < \infty$ and suppose there are $N$ identical players in the game. Denote by $\{x^i_t\}_{s\le t\le T}$ the state process in $\mathbb{R}$ for player $i$ ($i = 1, \dots, N$), starting from $x^i_{s-} = x^i$ at time $s \in [0, T]$. Assume that the dynamics of $x^i_t$ follows, for $s \le t \le T$,

$$dx^i_t = \frac{1}{N}\sum_{j=1}^{N} b_0(x^i_t, x^j_t)\,dt + \sigma\,dW^i_t + d\xi^{i+}_t - d\xi^{i-}_t, \qquad x^i_{s-} = x^i, \tag{3}$$
where $b_0 : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ is Lipschitz continuous and $\sigma$ is a positive constant. Here $\xi^i_\cdot = (\xi^{i+}_\cdot, \xi^{i-}_\cdot)$ is the control by player $i$, with $\xi^{i+}_\cdot$ and $\xi^{i-}_\cdot$ nondecreasing and càdlàg, $\xi^{i+}_{s-} = \xi^{i-}_{s-} = 0$, $E[\int_s^T d\xi^{i+}_t] < \infty$, and $E[\int_s^T d\xi^{i-}_t] < \infty$.
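Equation (3) is a standard interacting-particle system and easy to simulate. The sketch below uses Euler–Maruyama with the controls switched off and the hypothetical interaction $b_0(x, y) = y - x$ (mean reversion toward the empirical mean); none of these specific choices come from the paper:

```python
import numpy as np

def simulate_players(N=50, T=1.0, steps=200, sigma=0.5, seed=0):
    """Euler-Maruyama for dx_i = (1/N) sum_j b0(x_i, x_j) dt + sigma dW_i,
    with the illustrative choice b0(x, y) = y - x (mean reversion to the
    empirical mean); the singular controls are set to zero."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.normal(size=N)                 # initial states
    for _ in range(steps):
        drift = x.mean() - x               # (1/N) sum_j (x_j - x_i)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
    return x

x_T = simulate_players()
print(x_T.std())   # the interaction pulls the players together over time
```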


Given Eqn. (3), the objective of player $i$ is to minimize, over an appropriate control set $\mathcal{U}^N_{T,\infty}$, her cost function $J^{i,N}_{T,\infty}(s, x^i, \xi^{i+}_\cdot, \xi^{i-}_\cdot; \xi^{-i}_\cdot)$. That is,
$$\inf_{(\xi^{i+}_\cdot,\xi^{i-}_\cdot)\in\mathcal{U}^N_{T,\infty}} J^{i,N}_{T,\infty}(s, x^i, \xi^{i+}_\cdot, \xi^{i-}_\cdot; \xi^{-i}_\cdot) = \inf_{(\xi^{i+}_\cdot,\xi^{i-}_\cdot)\in\mathcal{U}^N_{T,\infty}} E\left[\int_s^T \frac{1}{N}\sum_{j=1}^{N} f_0(x^i_t, x^j_t)\,dt + \gamma_1\,d\xi^{i+}_t + \gamma_2\,d\xi^{i-}_t\right]. \tag{N-FV}$$
Here $\xi^{-i}_\cdot = \{(\xi^{j+}_\cdot, \xi^{j-}_\cdot)\}_{j=1, j\ne i}^{N}$ is the set of controls of all the players except player $i$, the cost function $f_0 : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ is Lipschitz continuous, $\gamma_1$ and $\gamma_2$ are constants, and
$$\mathcal{U}^N_{T,\infty} = \left\{ (\xi^+_\cdot, \xi^-_\cdot) \,\middle|\, \xi^+_t \text{ and } \xi^-_t \text{ are } \mathcal{F}^{(x^1_t,\dots,x^N_t)}_t\text{-adapted, càdlàg, nondecreasing}, \ \xi^+_{s-} = \xi^-_{s-} = 0, \ E\left[\int_s^T d\xi^+_t\right] < \infty, \text{ and } E\left[\int_s^T d\xi^-_t\right] < \infty, \text{ for } 0 \le s \le t \le T \right\},$$
with $\{\mathcal{F}^{(x^1_t,\dots,x^N_t)}_t\}_{s\le t\le T}$ the natural filtration of $\{x^1_t, \dots, x^N_t\}_{s\le t\le T}$.

Note that in (N-FV), both the drift term in the dynamics and the objective function are affected by the state of player $i$ and the states of all the other players. In general, this type of stochastic game is difficult to analyze.

N-player game with singular controls of bounded velocity. Suppose one restricts the controls $(\xi^{i+}_\cdot, \xi^{i-}_\cdot)$ to be of bounded velocity, so that
$$d\xi^{i+}_t = \dot\xi^{i+}_t\,dt, \qquad d\xi^{i-}_t = \dot\xi^{i-}_t\,dt,$$
with $0 \le \dot\xi^{i+}_t, \dot\xi^{i-}_t \le \theta$ for a constant $\theta > 0$. Then game (N-FV) becomes
$$\inf_{(\xi^{i+}_\cdot,\xi^{i-}_\cdot)\in\mathcal{U}^N_{T,\theta}} J^{i,N}_{T,\theta}(s, x^i, \xi^{i+}_\cdot, \xi^{i-}_\cdot; \xi^{-i}_\cdot) = \inf_{(\xi^{i+}_\cdot,\xi^{i-}_\cdot)\in\mathcal{U}^N_{T,\theta}} E\left[\int_s^T \frac{1}{N}\sum_{j=1}^{N} f_0(x^i_t, x^j_t)\,dt + \gamma_1\dot\xi^{i+}_t\,dt + \gamma_2\dot\xi^{i-}_t\,dt\right],$$
$$\text{subject to} \quad dx^i_t = \frac{1}{N}\sum_{j=1}^{N} b_0(x^i_t, x^j_t)\,dt + \sigma\,dW^i_t + \dot\xi^{i+}_t\,dt - \dot\xi^{i-}_t\,dt, \qquad x^i_s = x^i. \tag{N-BD}$$
Here the admissible set is given by
$$\mathcal{U}^N_{T,\theta} = \left\{ (\xi^+_\cdot, \xi^-_\cdot) \,\middle|\, (\xi^+_\cdot, \xi^-_\cdot) \in \mathcal{U}^N_{T,\infty}, \ 0 \le \dot\xi^+_t, \dot\xi^-_t \le \theta, \text{ for } 0 \le s \le t \le T \right\}.$$

2.2 Corresponding MFG games when N →∞

Game (N-BD) is easier to analyze than game (N-FV). Indeed, assume that all N -players areidentical. That is, for each time t ∈ [0, T ], all xit have the same probability distribution. Define

εt = limN→∞

1N

N∑i=1

δxit as the empirical distribution of xit. Then, according to SLLN, as N →∞,

1

N

N∑j=1

b0(xt, xjt )→

∫Rb0(xt, y)εt(dy) = b(xt, εt),

1

N

N∑j=1

f0(xt, xjt )→

∫Rf0(xt, y)εt(dy) = f(xt, εt),

4

Page 5: University of California, Berkeley · Stochastic Games and Mean Field Games with Singular Controls Xin Guo Joon Seok Lee y March 28, 2018 Abstract In this paper we show that a class

subject to appropriate technical conditions. Here b, f : R × P1(R) → R are functions satisfyingassumptions to be specified later. Consequently, the game (N-BD) can be analyzed by studyingthe following MFG counterpart.
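The SLLN replacement of the empirical average by an integral against $\epsilon_t$ can be checked numerically. The sketch below uses the hypothetical data $b_0(x, y) = \sin(x + y)$ and $\epsilon_t = N(0, 1)$, for which the limit is $\sin(x)\,e^{-1/2}$ in closed form; these choices are illustrative, not from the paper:

```python
import numpy as np

def empirical_drift(x, sample):
    """(1/N) sum_j b0(x, x_j) with the illustrative b0(x, y) = sin(x + y)."""
    return np.mean(np.sin(x + sample))

rng = np.random.default_rng(1)
x = 0.7
limit = np.sin(x) * np.exp(-0.5)      # integral of b0(x, .) against N(0, 1)
for N in (10, 1000, 100000):
    approx = empirical_drift(x, rng.normal(size=N))
    print(N, abs(approx - limit))     # error is O(1/sqrt(N)) on average
```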

MFG of bounded velocity. It is

$$v_{T,\theta}(s, x) := \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\theta}} J^{\infty}_{T,\theta}(s, x, \xi^+_\cdot, \xi^-_\cdot) = \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\theta}} E\left[\int_s^T \left(f(x_t, \mu_t) + \gamma_1\dot\xi^+_t + \gamma_2\dot\xi^-_t\right)dt\right], \tag{MFG-BD}$$
subject to
$$dx_t = \left(b(x_t, \mu_t) + \dot\xi^+_t - \dot\xi^-_t\right)dt + \sigma\,dW_t, \qquad x_s = x, \ \mu_s = \mu. \tag{4}$$
Here $\mu_t$ is the probability measure of $x_t$ for any $t \in [s, T]$, and
$$\mathcal{U}_{T,\theta} = \left\{ (\xi^+_\cdot, \xi^-_\cdot) \,\middle|\, \xi^+_t \text{ and } \xi^-_t \text{ are } \mathcal{F}^{(x_t,\mu_t)}_t\text{-adapted, càdlàg, nondecreasing}, \ \xi^+_s = \xi^-_s = 0, \ 0 \le \dot\xi^+_t, \dot\xi^-_t \le \theta, \ E\left[\int_s^T d\xi^+_t\right] < \infty, \text{ and } E\left[\int_s^T d\xi^-_t\right] < \infty, \text{ for } 0 \le s \le t \le T \right\},$$
with $\{\mathcal{F}^{(x_t,\mu_t)}_t\}_{s\le t\le T}$ the natural filtration of $\{x_t, \mu_t\}_{s\le t\le T}$. Note that the drift term in the dynamics and the objective function now depend only on the local information $x_t$ and the aggregated mean information process $\mu_t$.

2.3 Solution framework

There are several criteria to analyze stochastic games. Two standard ones are Pareto Optimality and the Nash Equilibrium (NE). In this paper we will focus on the NE. Depending on the problem setting and in particular the admissible controls, there are several forms of NEs, including the open loop NE, the closed loop NE, and the closed loop in feedback form NE (a.k.a. the Markovian NE).

Throughout the paper, we will consider the Markovian NE, meaning that the controls are deterministic functions of the time $t$, the current state $x_t$, and a fixed flow of probability measures $\{\mu_t\}$. More precisely,

Definition 1 (Control function). A control of bounded velocity $\xi_t$ is called Markovian if $d\xi_t = \dot\xi_t\,dt = \varphi(t, x_t; \{\mu_t\})\,dt$ for some function $\varphi : [0,T]\times\mathbb{R} \to \mathbb{R}$; $\varphi(t, x; \{\mu_t\})$ is called the control function for the fixed $\{\mu_t\}$. A control of finite variation $\xi_t$ is called Markovian if $d\xi_t = d\varphi(t, x_t; \{\mu_t\})$ for some function $\varphi$; $\varphi$ is called the control function for the fixed $\{\mu_t\}$.

Definition 2 (Markovian $\varepsilon$-Nash equilibrium to (N-FV)). A Markovian control $(\xi^{i*+}_\cdot, \xi^{i*-}_\cdot) \in \mathcal{U}^N_{T,\infty}$, $i = 1, \dots, N$, is a Markovian $\varepsilon$-Nash equilibrium to (N-FV) if for any $i \in \{1, \dots, N\}$, any $(s, x) \in [0,T]\times\mathbb{R}$, and any Markovian $(\xi^{i'+}_\cdot, \xi^{i'-}_\cdot) \in \mathcal{U}^N_{T,\infty}$,
$$J^{i,N}_{T,\infty}(s, x, \xi^{i'+}_\cdot, \xi^{i'-}_\cdot; \xi^{*-i}_\cdot) \ge J^{i,N}_{T,\infty}(s, x, \xi^{i*+}_\cdot, \xi^{i*-}_\cdot; \xi^{*-i}_\cdot) - \varepsilon.$$

Definition 3 (Markovian $\varepsilon$-Nash equilibrium to (N-BD)). A Markovian control $(\xi^{i*+}_\cdot, \xi^{i*-}_\cdot) \in \mathcal{U}^N_{T,\theta}$, $i = 1, \dots, N$, is a Markovian $\varepsilon$-Nash equilibrium to (N-BD) if for any $i \in \{1, \dots, N\}$, any $(s, x) \in [0,T]\times\mathbb{R}$, and any Markovian $(\xi^{i'+}_\cdot, \xi^{i'-}_\cdot) \in \mathcal{U}^N_{T,\theta}$,
$$J^{i,N}_{T,\theta}(s, x, \xi^{i'+}_\cdot, \xi^{i'-}_\cdot; \xi^{*-i}_\cdot) \ge J^{i,N}_{T,\theta}(s, x, \xi^{i*+}_\cdot, \xi^{i*-}_\cdot; \xi^{*-i}_\cdot) - \varepsilon.$$


The solution to the MFG (MFG-BD) is defined as follows:

Definition 4. A solution of (MFG-BD) is a pair of a Markovian control $(\xi^{*+}_\cdot, \xi^{*-}_\cdot) \in \mathcal{U}_{T,\theta}$ and a flow of probability measures $\{\mu^*_t\} \in \mathcal{M}_{[0,T]}$ satisfying $v_{T,\theta}(s, x) = J^{\infty}_{T,\theta}(s, x, \xi^{*+}_\cdot, \xi^{*-}_\cdot)$ for all $(s, x) \in [0,T]\times\mathbb{R}$, with $\mu^*_t$ the probability measure of the optimally controlled process $x^*_t$ for each $t \in [0,T]$, where the dynamics of $x^*_t$ is
$$dx^*_t = \left(b(x^*_t, \mu^*_t) + \dot\xi^{*+}_t - \dot\xi^{*-}_t\right)dt + \sigma\,dW_t, \qquad s \le t \le T, \ x^*_s = x.$$

The analysis of the stochastic games (N-FV), (N-BD), and (MFG-BD) is built on the PDE/control methodology of [21] and [19]. In this approach, the MFG is essentially analyzed by studying two coupled equations: the backward HJB equation and the forward SDE.

To start, we introduce the underlying stochastic control problems.

2.4 Corresponding control problems

Control problem of bounded velocity with $T < \infty$. Let $\{\mu_t\} \in \mathcal{M}_{[0,T]}$ be a fixed exogenous flow of probability measures. Then (MFG-BD) becomes the following control problem:
$$v_{T,\theta}(s, x; \{\mu_t\}) \triangleq \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\theta}} J^{\infty}_{T,\theta}(s, x, \xi^+_\cdot, \xi^-_\cdot; \{\mu_t\}) = \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\theta}} E\left[\int_s^T \left(f(x_t, \mu_t) + \gamma_1\dot\xi^+_t + \gamma_2\dot\xi^-_t\right)dt\right], \tag{Control-BD}$$

subject to Eqn. (4). This is a classical stochastic control problem, and the corresponding HJB equation with its terminal condition is given by
$$\begin{aligned} -\partial_t v_{T,\theta} &= \inf_{\xi^+,\xi^-\in[0,\theta]} \left\{\left(b(x,\mu) + (\xi^+ - \xi^-)\right)\partial_x v_{T,\theta} + \left(f(x,\mu) + \gamma_1\xi^+ + \gamma_2\xi^-\right)\right\} + \frac{\sigma^2}{2}\partial_{xx} v_{T,\theta} \\ &= \min\left\{(\partial_x v_{T,\theta} + \gamma_1)\theta, \ (-\partial_x v_{T,\theta} + \gamma_2)\theta, \ 0\right\} + b(x,\mu)\partial_x v_{T,\theta} + f(x,\mu) + \frac{\sigma^2}{2}\partial_{xx} v_{T,\theta}, \end{aligned} \tag{5}$$
with $v_{T,\theta}(T, x; \{\mu_t\}) = 0$ for all $x \in \mathbb{R}$.
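The second line of (5) makes the gradient constraint explicit: the control enters only through the term $\min\{(\partial_x v + \gamma_1)\theta, (-\partial_x v + \gamma_2)\theta, 0\}$. A naive explicit finite-difference sketch of (5), stepping backward from $v(T,\cdot) = 0$, is given below; the data ($b \equiv 0$, $f(x) = x^2$, $\gamma_1 = \gamma_2 = 1$, $\sigma = 1$, $\theta = 2$, frozen $\mu$) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Naive explicit scheme for the HJB (5) with frozen mu and assumed data:
# b = 0, f(x) = x^2, gamma1 = gamma2 = 1, sigma = 1, theta = 2.
sigma, theta, g1, g2, T = 1.0, 2.0, 1.0, 1.0, 1.0
x = np.linspace(-3, 3, 121)
dx = x[1] - x[0]
dt = 0.4 * dx**2 / sigma**2           # explicit-scheme stability (CFL)
v = np.zeros_like(x)                  # terminal condition v(T, .) = 0
t = T
while t > 0:
    vx = np.gradient(v, dx)           # central differences for v_x
    vxx = np.zeros_like(v)
    vxx[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
    control = np.minimum.reduce([(vx + g1) * theta,
                                 (-vx + g2) * theta,
                                 np.zeros_like(vx)])
    # backward equation -dv/dt = RHS, integrated from t = T down to 0
    v = v + dt * (control + x**2 + 0.5 * sigma**2 * vxx)
    t -= dt
print(v[60])   # value at (t, x) = (0, 0)
```

By symmetry of the assumed data, the computed value function is even in $x$, and it is cheaper to sit at $x = 0$ than away from it.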

If the controls are of finite variation, then we have the following.

Control problem of finite variation with $T < \infty$.
$$v_{T,\infty}(s, x; \{\mu_t\}) \triangleq \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\infty}} E\left[\int_s^T f(x_t, \mu_t)\,dt + \gamma_1\,d\xi^+_t + \gamma_2\,d\xi^-_t\right], \tag{Control-FV}$$
subject to
$$dx_t = b(x_t, \mu_t)\,dt + \sigma\,dW_t + d\xi^+_t - d\xi^-_t, \qquad x_{s-} = x.$$

2.5 Main results

The main results in the paper are derived under the following technical assumptions.


Assumptions.

(A1). $b(x, \mu)$ and $f(x, \mu)$ are Lipschitz continuous in $x$ and $\mu$, and $b(x, \mu)$ is bounded. That is, $|b(x_1, \mu_1) - b(x_2, \mu_2)| \le \mathrm{Lip}(b)(|x_1 - x_2| + D_1(\mu_1, \mu_2))$ for some $\mathrm{Lip}(b) > 0$, $|f(x_1, \mu_1) - f(x_2, \mu_2)| \le \mathrm{Lip}(f)(|x_1 - x_2| + D_1(\mu_1, \mu_2))$ for some $\mathrm{Lip}(f) > 0$, and $|b(x, \mu)| \le c_1$ for some constant $c_1$.

(A2). $f(x, \mu)$ has a first-order derivative in $x$, with $f(x, \mu)$ and $\partial_x f(x, \mu)$ satisfying the polynomial growth condition. Moreover, for any fixed $\mu \in \mathcal{P}_1(\mathbb{R})$, $f(x, \mu)$ is convex and nonlinear in $x$.

(A3). $b(x, \mu)$ has first- and second-order derivatives with respect to $x$, with uniformly continuous and bounded derivatives in $x$.

(A4). $-\gamma_1 < \gamma_2$. This ensures the finiteness of the value function. Indeed, consider game (N-FV) with $-\gamma_1 > \gamma_2$. Then, letting $d\xi^{i+}_t = d\xi^{i-}_t = M$ and $M \to \infty$, we have $J^{i,N}_{T,\infty} \to -\infty$.

(A5). (Monotonicity of the cost function) $f$ satisfies either

(i). $\displaystyle\int_{\mathbb{R}} (f(x, \mu_1) - f(x, \mu_2))(\mu_1 - \mu_2)(dx) \ge 0$ for any $\mu_1, \mu_2 \in \mathcal{P}_1(\mathbb{R})$, and $H(x, p) = \inf_{\xi^+,\xi^-\in[0,\theta]} \left\{(\xi^+ - \xi^-)p + \gamma_1\xi^+ + \gamma_2\xi^-\right\}$ satisfies the following condition for any $x, p, q \in \mathbb{R}$: if $H(x, p+q) - H(x, p) - \partial_p H(x, p)q = 0$, then $\partial_p H(x, p+q) = \partial_p H(x, p)$; or

(ii). $\displaystyle\int_{\mathbb{R}} (f(x, \mu_1) - f(x, \mu_2))(\mu_1 - \mu_2)(dx) > 0$ for any $\mu_1 \ne \mu_2 \in \mathcal{P}_1(\mathbb{R})$.

As in [21, 5], Assumption (A5) is critical to ensure the uniqueness of the solution of (MFG-BD), as will be clear from the proof of Proposition 6 for the uniqueness of the fixed point.

(A6). (Rationality of players) For any control function $\varphi$, any $t \in [0,T]$, any fixed $\{\mu_t\}$, and any $x, y \in \mathbb{R}$,
$$(x - y)\left(\varphi(t, x; \{\mu_t\}) - \varphi(t, y; \{\mu_t\})\right) \le 0.$$
Intuitively, this says that the better off the state of an individual player, the less likely the player is to exercise controls, in order to minimize her cost.

(A7). For any fixed $\mu \in \mathcal{P}_1(\mathbb{R})$, there exist $b_1, b_2 \in \mathbb{R}$ such that $b(x, \mu) \le b_1 + b_2 x$ for any $x \in \mathbb{R}$, with $b_2 < r$.

(A8). There exists a constant $c_f$ satisfying $|f(x, \mu)| \le c_f\left(1 + |x|^2 + \int_{\mathbb{R}} y^2\,\mu(dy)\right)$ for any $x \in \mathbb{R}$, $\mu \in \mathcal{P}_1(\mathbb{R})$.

Theorem 1. Assume (A1)–(A4). Then for any $(s, x) \in [0,T]\times\mathbb{R}$, as $\theta \to \infty$, the value function $v_{T,\theta}(s, x; \{\mu_t\})$ of (Control-BD) converges to the value function $v_{T,\infty}(s, x; \{\mu_t\})$ of (Control-FV).

Theorem 2. Assume (A1)–(A6). Then there exists a unique solution $((\xi^{*+}_\cdot, \xi^{*-}_\cdot), \{\mu^*_t\})$ of (MFG-BD). Moreover, the corresponding value function $v_{T,\theta}(s, x)$ for (MFG-BD) is in $C^{1,2}([0,T]\times\mathbb{R})$ with polynomial growth.


Theorem 3. Assume (A1)–(A6). Then the optimal control to (MFG-BD) is

a). an $\varepsilon_N$-Nash equilibrium to (N-BD), with $\varepsilon_N = O(1/\sqrt{N})$;

b). an $(\varepsilon_N + \varepsilon_\theta)$-Nash equilibrium to (N-FV), with $\varepsilon_N = O(1/\sqrt{N})$ and $\varepsilon_\theta \to 0$ as $\theta \to \infty$.

3 Proofs

The proof of the main results relies on the analysis of their respective underlying control problems, in particular problem (Control-BD).

3.1 Analysis of problem (Control-BD)

To study problem (Control-BD), let us recall the notion of a viscosity solution of the HJB Eqn. (5).

Definition 5. $v$ is called a viscosity solution to the HJB Eqn. (5) if $v$ is both a viscosity supersolution and a viscosity subsolution, in the following sense:

(i). (viscosity supersolution) for any $(t_0, x_0) \in [0,T]\times\mathbb{R}$ and any $\vartheta \in C^{1,2}([0,T]\times\mathbb{R})$, if $(t_0, x_0)$ is a local minimum of $v - \vartheta$ with $v(t_0, x_0) - \vartheta(t_0, x_0) = 0$, then
$$-\partial_t\vartheta(t_0, x_0) - \inf_{\xi^+,\xi^-\in[0,\theta]}\left\{\left(b(x_0, \mu) + (\xi^+ - \xi^-)\right)\partial_x\vartheta(t_0, x_0) + \left(f(x_0, \mu) + \gamma_1\xi^+ + \gamma_2\xi^-\right)\right\} - \frac{\sigma^2}{2}\partial_{xx}\vartheta(t_0, x_0) \ge 0, \quad \text{and} \quad \vartheta(T, x_0) \ge 0;$$

(ii). (viscosity subsolution) for any $(t_0, x_0) \in [0,T]\times\mathbb{R}$ and any $\vartheta \in C^{1,2}([0,T]\times\mathbb{R})$, if $(t_0, x_0)$ is a local maximum of $v - \vartheta$ with $v(t_0, x_0) - \vartheta(t_0, x_0) = 0$, then
$$-\partial_t\vartheta(t_0, x_0) - \inf_{\xi^+,\xi^-\in[0,\theta]}\left\{\left(b(x_0, \mu) + (\xi^+ - \xi^-)\right)\partial_x\vartheta(t_0, x_0) + \left(f(x_0, \mu) + \gamma_1\xi^+ + \gamma_2\xi^-\right)\right\} - \frac{\sigma^2}{2}\partial_{xx}\vartheta(t_0, x_0) \le 0, \quad \text{and} \quad \vartheta(T, x_0) \le 0.$$

Proposition 1. Assume (A1)–(A4). The HJB Eqn. (5) has a unique solution $v$ in $C^{1,2}([0,T]\times\mathbb{R})$ with polynomial growth. Furthermore, this solution is the value function of problem (Control-BD), and the corresponding optimal control function is
$$\varphi_\theta(t, x_t; \{\mu_t\}) = \dot\xi^+_{t,\theta} - \dot\xi^-_{t,\theta} = \begin{cases} \theta & \text{if } \partial_x v_{T,\theta}(t, x_t; \{\mu_t\}) \le -\gamma_1, \\ 0 & \text{if } -\gamma_1 < \partial_x v_{T,\theta}(t, x_t; \{\mu_t\}) < \gamma_2, \\ -\theta & \text{if } \gamma_2 \le \partial_x v_{T,\theta}(t, x_t; \{\mu_t\}). \end{cases} \tag{6}$$
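The control function (6) is bang-bang with a no-action band $(-\gamma_1, \gamma_2)$ for the gradient $\partial_x v_{T,\theta}$; in code it is a three-way threshold rule (a direct transcription of (6)):

```python
def optimal_velocity(dv_dx, theta, gamma1, gamma2):
    """Bang-bang control function (6): push up at full rate theta when the
    marginal value of the state is at or below -gamma1, push down at full
    rate when it is at or above gamma2, and stay idle in between."""
    if dv_dx <= -gamma1:
        return theta
    if dv_dx >= gamma2:
        return -theta
    return 0.0

print(optimal_velocity(-2.0, 5.0, 1.0, 1.0))  # 5.0  (push up)
print(optimal_velocity(0.3, 5.0, 1.0, 1.0))   # 0.0  (no-action band)
print(optimal_velocity(2.0, 5.0, 1.0, 1.0))   # -5.0 (push down)
```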

Proof. By [13, Theorem 6.2, Chapter VI], the HJB Eqn. (5) has a unique solution $\vartheta$ in $C^{1,2}([0,T]\times\mathbb{R})$ with polynomial growth. We now show that it is the value function of problem (Control-BD).

First, by the viscosity subsolution property of $\vartheta$, for any $(s, x) \in [0,T]\times\mathbb{R}$, $\vartheta(T, x) \le 0$ and
$$-\partial_t\vartheta - \inf_{\xi^+,\xi^-\in[0,\theta]}\left\{\left(b(x, \mu) + (\xi^+ - \xi^-)\right)\partial_x\vartheta + \left(f(x, \mu) + \gamma_1\xi^+ + \gamma_2\xi^-\right)\right\} - \frac{\sigma^2}{2}\partial_{xx}\vartheta \le 0.$$


Moreover, for any $(\xi^+_\cdot, \xi^-_\cdot) \in \mathcal{U}_{T,\theta}$, let $x_t$ be the controlled process under $(\xi^+_\cdot, \xi^-_\cdot)$. Then, applying Itô's formula to $\vartheta$,
$$\begin{aligned} 0 \ge E[\vartheta(T, x_T)] &= \vartheta(s, x) + E\left[\int_s^T \partial_t\vartheta(t, x_t) + \left(b(x_t, \mu_t) + (\dot\xi^+_t - \dot\xi^-_t)\right)\partial_x\vartheta(t, x_t) + \frac{\sigma^2}{2}\partial_{xx}\vartheta(t, x_t)\,dt\right] \\ &\ge \vartheta(s, x) - E\left[\int_s^T f(x_t, \mu_t) + \gamma_1\dot\xi^+_t + \gamma_2\dot\xi^-_t\,dt\right]. \end{aligned}$$
Hence, for any $(\xi^+_\cdot, \xi^-_\cdot) \in \mathcal{U}_{T,\theta}$, $E\left[\int_s^T f(x_t, \mu_t) + \gamma_1\dot\xi^+_t + \gamma_2\dot\xi^-_t\,dt\right] \ge \vartheta(s, x)$. Therefore,
$$v_{T,\theta}(s, x; \{\mu_t\}) = \inf_{(\xi^+_\cdot,\xi^-_\cdot)\in\mathcal{U}_{T,\theta}} E\left[\int_s^T f(x_t, \mu_t) + \gamma_1\dot\xi^+_t + \gamma_2\dot\xi^-_t\,dt\right] \ge \vartheta(s, x).$$

Furthermore, let $x_{t,\theta}$ be the controlled process under the control $(\xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta})$ given by (6). By definition of $(\xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta})$,
$$-\partial_t\vartheta - \left\{\left(b(x_{t,\theta}, \mu_t) + (\dot\xi^+_{t,\theta} - \dot\xi^-_{t,\theta})\right)\partial_x\vartheta + \left(f(x_{t,\theta}, \mu_t) + \gamma_1\dot\xi^+_{t,\theta} + \gamma_2\dot\xi^-_{t,\theta}\right)\right\} - \frac{\sigma^2}{2}\partial_{xx}\vartheta = 0.$$
Then, applying Itô's formula to $\vartheta$,
$$\begin{aligned} 0 = E[\vartheta(T, x_{T,\theta})] &= \vartheta(s, x) + E\left[\int_s^T \left(\partial_t\vartheta(t, x_{t,\theta}) + \left(b(x_{t,\theta}, \mu_t) + (\dot\xi^+_{t,\theta} - \dot\xi^-_{t,\theta})\right)\partial_x\vartheta(t, x_{t,\theta}) + \frac{\sigma^2}{2}\partial_{xx}\vartheta(t, x_{t,\theta})\right)dt\right] \\ &= \vartheta(s, x) - E\left[\int_s^T \left(f(x_{t,\theta}, \mu_t) + \gamma_1\dot\xi^+_{t,\theta} + \gamma_2\dot\xi^-_{t,\theta}\right)dt\right] \le \vartheta(s, x) - v_{T,\theta}(s, x; \{\mu_t\}).\end{aligned}$$
Hence, $v_{T,\theta}(s, x; \{\mu_t\}) \le \vartheta(s, x)$. Combined, $v_{T,\theta}(s, x; \{\mu_t\}) = \vartheta(s, x)$, and the optimal control function is
$$\varphi_\theta(t, x_{t,\theta}; \{\mu_t\}) = \begin{cases} \theta & \text{if } \partial_x v_{T,\theta}(t, x_{t,\theta}; \{\mu_t\}) \le -\gamma_1, \\ 0 & \text{if } -\gamma_1 < \partial_x v_{T,\theta}(t, x_{t,\theta}; \{\mu_t\}) < \gamma_2, \\ -\theta & \text{if } \gamma_2 \le \partial_x v_{T,\theta}(t, x_{t,\theta}; \{\mu_t\}). \end{cases}$$

Next, we establish the regularity of the value function to problem (Control-BD).

Proposition 2. Assume (A1)–(A4). For any fixed t ∈ [0, T ], the value function vT,θ(t, x; µt) isstrictly convex in x.

Proof. Fix any $x_1, x_2 \in \mathbb{R}$ and any $\lambda \in [0, 1]$. For any $(\xi^{1,+}_\cdot, \xi^{1,-}_\cdot) \in \mathcal{U}_{T,\theta}$ and $(\xi^{2,+}_\cdot, \xi^{2,-}_\cdot) \in \mathcal{U}_{T,\theta}$, by the convexity of $f$,
$$\begin{aligned} &\lambda J^{\infty}_{T,\theta}(s, x_1, \xi^{1,+}_\cdot, \xi^{1,-}_\cdot; \{\mu_t\}) + (1-\lambda) J^{\infty}_{T,\theta}(s, x_2, \xi^{2,+}_\cdot, \xi^{2,-}_\cdot; \{\mu_t\}) \\ &\ge J^{\infty}_{T,\theta}\left(s, \lambda x_1 + (1-\lambda)x_2, \lambda\xi^{1,+}_\cdot + (1-\lambda)\xi^{2,+}_\cdot, \lambda\xi^{1,-}_\cdot + (1-\lambda)\xi^{2,-}_\cdot; \{\mu_t\}\right) \\ &\ge v_{T,\theta}(s, \lambda x_1 + (1-\lambda)x_2; \{\mu_t\}). \end{aligned}$$
Since this holds for any $(\xi^{1,+}_\cdot, \xi^{1,-}_\cdot) \in \mathcal{U}_{T,\theta}$ and $(\xi^{2,+}_\cdot, \xi^{2,-}_\cdot) \in \mathcal{U}_{T,\theta}$, taking the infimum first over the former and then over the latter gives
$$\lambda v_{T,\theta}(s, x_1; \{\mu_t\}) + (1-\lambda) J^{\infty}_{T,\theta}(s, x_2, \xi^{2,+}_\cdot, \xi^{2,-}_\cdot; \{\mu_t\}) \ge v_{T,\theta}(s, \lambda x_1 + (1-\lambda)x_2; \{\mu_t\}),$$
$$\lambda v_{T,\theta}(s, x_1; \{\mu_t\}) + (1-\lambda) v_{T,\theta}(s, x_2; \{\mu_t\}) \ge v_{T,\theta}(s, \lambda x_1 + (1-\lambda)x_2; \{\mu_t\}).$$
Hence, $v_{T,\theta}(s, x; \{\mu_t\})$ is convex in $x$. By Proposition 1, $v_{T,\theta}(s, x; \{\mu_t\})$ is a $C^{1,2}([0,T]\times\mathbb{R})$ solution to the equation
$$-\partial_t v_{T,\theta} = \min\left\{(\partial_x v_{T,\theta} + \gamma_1)\theta, \ (-\partial_x v_{T,\theta} + \gamma_2)\theta, \ 0\right\} + b(x, \mu)\partial_x v_{T,\theta} + f(x, \mu) + \frac{\sigma^2}{2}\partial_{xx} v_{T,\theta}.$$
Since $f(x, \mu)$ is not linear in $x$, the solution to this equation is also nonlinear in $x$. Hence, $v_{T,\theta}(s, x; \{\mu_t\})$ is strictly convex.

Proposition 3. Assume (A1)–(A4). The optimal control function $\varphi_\theta(t, x; \{\mu_t\})$ of problem (Control-BD) is unique, and the optimally controlled state process $x_{t,\theta}$ is also unique, with
$$dx_{t,\theta} = \left(b(x_{t,\theta}, \mu_t) + \varphi_\theta(t, x_{t,\theta}; \{\mu_t\})\right)dt + \sigma\,dW_t, \qquad x_{s,\theta} = x.$$

Proof. By Proposition 1, there exists a unique value function $v_{T,\theta}(t, x; \{\mu_t\})$ to problem (Control-BD). Furthermore, by (6), the optimal control function $\varphi_\theta(t, x; \{\mu_t\})$ is uniquely determined. Let us prove that the optimally controlled state process $x_{t,\theta}$ exists and is unique.

For any given $x^n_{t,\theta}$, consider the mapping $\Phi$ such that $\Phi(x^n_{t,\theta}) = x^{n+1}_{t,\theta}$, where $x^{n+1}_{t,\theta}$ is a solution to the following SDE:
$$dx^{n+1}_{t,\theta} = \left(b(x^n_{t,\theta}, \mu_t) + \varphi_\theta(t, x^{n+1}_{t,\theta}; \{\mu_t\})\right)dt + \sigma\,dW_t, \qquad x^{n+1}_{s,\theta} = x. \tag{7}$$
By [29], for any given $x^n_{t,\theta}$, the SDE (7) has a unique solution $x^{n+1}_{t,\theta}$, so the mapping $\Phi$ is well defined. Then, for any $n \in \mathbb{N}$,
$$d(x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta}) = \left(b(x^n_{t,\theta}, \mu_t) - b(x^{n+1}_{t,\theta}, \mu_t) + \varphi_\theta(t, x^{n+1}_{t,\theta}; \{\mu_t\}) - \varphi_\theta(t, x^{n+2}_{t,\theta}; \{\mu_t\})\right)dt.$$
Because $\varphi_\theta(t, x; \{\mu_t\})$ is nonincreasing in $x$,
$$\begin{aligned} d(x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta})^2 &= 2(x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta})\left(b(x^n_{t,\theta}, \mu_t) - b(x^{n+1}_{t,\theta}, \mu_t) + \varphi_\theta(t, x^{n+1}_{t,\theta}; \{\mu_t\}) - \varphi_\theta(t, x^{n+2}_{t,\theta}; \{\mu_t\})\right)dt \\ &\le 2\,\mathrm{Lip}(b)\,|x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta}|\,|x^n_{t,\theta} - x^{n+1}_{t,\theta}|\,dt \\ &\le \mathrm{Lip}(b)\left(|x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta}|^2 + |x^n_{t,\theta} - x^{n+1}_{t,\theta}|^2\right)dt. \end{aligned}$$
By Gronwall's inequality, for any $t \in [0,T]$,
$$|x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta}|^2 \le \mathrm{Lip}(b)\exp\left(\mathrm{Lip}(b)t\right)\int_0^t |x^n_{s,\theta} - x^{n+1}_{s,\theta}|^2\,ds.$$
Hence, we can derive, for any $n \in \mathbb{N}$,
$$|x^{n+1}_{t,\theta} - x^{n+2}_{t,\theta}|^2 \le \frac{\left(\mathrm{Lip}(b)t\right)^n \exp\left(n\,\mathrm{Lip}(b)t\right)}{n!}\,|x^1_{t,\theta} - x^2_{t,\theta}|^2.$$
As $n \to \infty$, $\Phi$ is a contraction mapping, and the SDE (7) has a unique fixed-point solution. Therefore, there exists a unique optimally controlled state process $x_{t,\theta}$ to problem (Control-BD). Furthermore, the optimal Markovian control $(\xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta})$ to (Control-BD) also uniquely exists.
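The contraction scheme behind (7) can be mimicked numerically: freeze one Brownian path, iterate the map $\Phi$ on discretized paths, and watch successive iterates collapse. The data below ($b(x) = -0.5x$ with $\mu$ frozen, the nonincreasing stand-in control function $\varphi(x) = -\tanh(x)$, unit noise) are illustrative assumptions, not the paper's:

```python
import numpy as np

# Picard iteration for the fixed point of Phi in (7), on one frozen
# Brownian path; assumed data: b(x) = -0.5 x (mu frozen) and the
# nonincreasing control function phi(x) = -tanh(x).
rng = np.random.default_rng(3)
steps, dt = 200, 0.005                 # horizon T = 1
dW = np.sqrt(dt) * rng.normal(size=steps)

def Phi(path):
    """One application of (7): the drift b uses the OLD path, while the
    control function is evaluated along the NEW path."""
    new = np.empty_like(path)
    new[0] = 1.0                       # initial state x_s = 1
    for k in range(steps - 1):
        drift = -0.5 * path[k] + (-np.tanh(new[k]))
        new[k + 1] = new[k] + drift * dt + dW[k]
    return new

path = np.ones(steps)
gaps = []
for _ in range(8):
    nxt = Phi(path)
    gaps.append(np.max(np.abs(nxt - path)))
    path = nxt
print(gaps)    # successive sup-norm gaps shrink rapidly
```

Because the Brownian path is frozen, $\Phi$ is a deterministic map, and the gaps decay at the factorial rate suggested by the Gronwall estimate above.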


3.2 Proof of Theorem 1

Fix $\{\mu_t\} \in \mathcal{M}_{[0,T]}$. For any $(\zeta^+_{\cdot,\infty}, \zeta^-_{\cdot,\infty}) \in \mathcal{U}_{T,\infty}$, since each path of a finite variation process is almost everywhere differentiable, there exists a sequence of bounded-velocity controls which converges to the path as $\theta \to \infty$. Hence, there exists a sequence $\{(\zeta^+_{\cdot,\theta}, \zeta^-_{\cdot,\theta})\}_{\theta\in[0,\infty)}$ such that $(\zeta^+_{\cdot,\theta}, \zeta^-_{\cdot,\theta}) \in \mathcal{U}_{T,\theta}$ and $E\int_0^T |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| \to 0$, $E\int_0^T |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}| \to 0$ as $\theta \to \infty$. Define $\varepsilon_\theta$ as
$$\varepsilon_\theta = O\left(E\int_0^T |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| + E\int_0^T |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}|\right), \tag{8}$$
so that $\varepsilon_\theta \to 0$ as $\theta \to \infty$. Denote
$$dx_{t,\theta} = \left(b(x_{t,\theta}, \mu_t) + \dot\zeta^+_{t,\theta} - \dot\zeta^-_{t,\theta}\right)dt + \sigma\,dW_t, \quad x_{s,\theta} = x, \qquad \text{and} \qquad dx_t = b(x_t, \mu_t)\,dt + \sigma\,dW_t + d\zeta^+_{t,\infty} - d\zeta^-_{t,\infty}, \quad x_{s-} = x.$$

Then, for any $\tau \in [s, T]$,
$$\begin{aligned} |x_{\tau,\theta} - x_\tau| &\le \int_s^\tau |b(x_{t,\theta}, \mu_t) - b(x_t, \mu_t)|\,dt + \int_s^\tau |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| + \int_s^\tau |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}| \\ &\le \int_s^\tau \mathrm{Lip}(b)|x_{t,\theta} - x_t|\,dt + \int_s^\tau |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| + \int_s^\tau |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}|. \end{aligned}$$
By Gronwall's inequality,
$$E|x_{\tau,\theta} - x_\tau| \le O\left(E\int_0^\tau |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| + E\int_0^\tau |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}|\right).$$
Consequently,
$$\begin{aligned} \left|J^{\infty}_{T,\infty}(s, x, \zeta^+_{t,\infty}, \zeta^-_{t,\infty}; \{\mu_t\}) - J^{\infty}_{T,\theta}(s, x, \zeta^+_{t,\theta}, \zeta^-_{t,\theta}; \{\mu_t\})\right| &\le E\left[\left|\int_s^T f(x_t, \mu_t) - f(x_{t,\theta}, \mu_t) + \gamma_1\,d\zeta^+_{t,\infty} + \gamma_2\,d\zeta^-_{t,\infty} - \gamma_1\dot\zeta^+_{t,\theta}\,dt - \gamma_2\dot\zeta^-_{t,\theta}\,dt\right|\right] \\ &\le E\left[\int_s^T \mathrm{Lip}(f)|x_t - x_{t,\theta}| + \gamma_1|d\zeta^+_{t,\infty} - \dot\zeta^+_{t,\theta}\,dt| + \gamma_2|d\zeta^-_{t,\infty} - \dot\zeta^-_{t,\theta}\,dt|\right] \\ &\le O\left(E\int_0^T |\dot\zeta^+_{t,\theta}\,dt - d\zeta^+_{t,\infty}| + E\int_0^T |\dot\zeta^-_{t,\theta}\,dt - d\zeta^-_{t,\infty}|\right). \end{aligned}$$
Therefore, $\left|v_{T,\infty}(s, x; \{\mu_t\}) - v_{T,\theta}(s, x; \{\mu_t\})\right| \to 0$ as $\theta \to \infty$.
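The approximation driving Theorem 1 can be illustrated on the simplest finite-variation control, a unit jump: a bounded-velocity ramp of slope $\theta$ spreads the jump over a window of length $1/\theta$, and the time-integrated gap between the cumulative controls shrinks like $1/(2\theta)$. This is a simplified stand-in for the error quantity in (8), not the paper's exact functional:

```python
import numpy as np

# A unit jump at t0 approximated by a bounded-velocity ramp of slope theta:
# the cumulative controls converge, and the time-integrated gap is ~ 1/(2 theta).
t0, T = 0.5, 1.0
grid = np.linspace(0.0, T, 100001)
h = grid[1] - grid[0]
xi_inf = (grid >= t0).astype(float)                    # jump control
for theta in (2.0, 20.0, 200.0):
    xi_theta = np.clip((grid - t0) * theta, 0.0, 1.0)  # ramp at rate theta
    gap = np.abs(xi_theta - xi_inf).sum() * h          # Riemann sum of the gap
    print(theta, gap)   # gap ~ 1/(2 theta)
```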

3.3 Proof of Theorem 2

First, from Proposition 1 we see that for any given fixed $\{\mu_t\}$ there exists a unique optimal control function $\varphi_\theta(t, x; \{\mu_t\})$. One can thus define a mapping $\Gamma_1$ from $\mathcal{M}_{[0,T]}$ to the class of pairs of optimal control functions and fixed flows of probability measures such that
$$\Gamma_1(\{\mu_t\}) = \left(\varphi_\theta(t, x; \{\mu_t\}), \{\mu_t\}\right).$$
Moreover, by Proposition 3, the optimally controlled process $x_{t,\theta}$ under the fixed $\{\mu_t\}$ exists uniquely, with
$$dx_{t,\theta} = \left(b(x_{t,\theta}, \mu_t) + \varphi_\theta(t, x_{t,\theta}; \{\mu_t\})\right)dt + \sigma\,dW_t, \qquad x_{s,\theta} = x.$$
Consequently, we can define $\Gamma_2$ so that
$$\Gamma_2\left(\varphi_\theta(t, x; \{\mu_t\}), \{\mu_t\}\right) = \{\bar\mu_t\},$$
where $\bar\mu_t$ is the probability measure of $x_{t,\theta}$ for each $t \in [0,T]$. Now define a mapping $\Gamma$ as
$$\Gamma(\{\mu_t\}) = \Gamma_2 \circ \Gamma_1(\{\mu_t\}) = \{\bar\mu_t\}.$$

We will use the Schauder fixed point theorem [26, Theorem 4.1.1] to show the existence of a fixed point. The key is to prove that $\Gamma$ is a continuous mapping of $\mathcal{M}_{[0,T]}$ into $\mathcal{M}_{[0,T]}$, and that the range of $\Gamma$ is relatively compact.
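The fixed-point structure $\Gamma = \Gamma_2 \circ \Gamma_1$ suggests a natural numerical scheme: given a measure flow, solve the control problem, simulate the controlled state, read off its law, and repeat. The toy sketch below summarizes the measure flow by its mean and replaces the optimizer with a fixed nonincreasing control function $\varphi(x) = -\tanh(x)$; the data $b(x, \mu) = \mathrm{mean}(\mu) - x$ and all parameters are illustrative assumptions:

```python
import numpy as np

def Gamma(mean_flow, n_particles=2000, sigma=0.3, dt=0.01, seed=7):
    """One pass of Gamma: given a flow of means (standing in for the measure
    flow mu_t), simulate the controlled particles and return the flow of
    means of the resulting state process. The Brownian sample is frozen via
    the seed, so Gamma acts as a deterministic map."""
    rng = np.random.default_rng(seed)
    steps = len(mean_flow)
    x = np.full(n_particles, 1.0)              # all players start at x = 1
    out = np.empty(steps)
    for k in range(steps):
        out[k] = x.mean()
        drift = (mean_flow[k] - x) - np.tanh(x)    # b(x, mu) + phi(x)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_particles)
    return out

flow = np.zeros(100)                  # initial guess for t -> mean(mu_t)
for i in range(10):
    new_flow = Gamma(flow)
    print(i, np.max(np.abs(new_flow - flow)))   # successive gaps shrink
    flow = new_flow
```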

Proposition 4. Assume (A1)–(A4). Γ is a mapping from M[0,T ] to M[0,T ].

Proof. For any $\{\mu_t\}$ in $\mathcal{M}_{[0,T]}$, let us prove that $\{\bar\mu_t\} = \Gamma(\{\mu_t\})$ is also in $\mathcal{M}_{[0,T]}$. Without loss of generality, suppose $s > t$, and
$$x_s = x_t + \int_t^s \left(b(x_r, \mu_r) + \varphi_\theta(r, x_r; \{\mu_t\})\right)dr + \int_t^s \sigma\,dW_r.$$
Since $b(x, \mu)$ is Lipschitz, $|\varphi_\theta(s, x_s; \{\mu_t\})| \le \theta$, and $E\left|b(x_r, \mu_r) + \varphi_\theta(r, x_r; \{\mu_t\})\right| \le M$ for some large $M$ and any $r \in [0,T]$,
$$\begin{aligned} D_1(\bar\mu_s, \bar\mu_t) \le E|x_s - x_t| &\le E\int_t^s \left|b(x_r, \mu_r) + \varphi_\theta(r, x_r; \{\mu_t\})\right|dr + \sigma E\sup_{r\in[t,s]}|W_r - W_t| \\ &\le M|s - t| + \sigma E\sup_{r\in[t,s]}|W_r - W_t| \le M|s - t| + \sigma|s - t|^{1/2}. \end{aligned}$$
Therefore, $\sup_{s\ne t} \frac{D_1(\bar\mu_t, \bar\mu_s)}{|t - s|^{1/2}} \le c$. Moreover, for any $t \in [0,T]$, since $|b(x, \mu)| \le c_1$ is bounded,
$$\int_{\mathbb{R}} |x|^2\,\bar\mu_t(dx) \le 2E\left[\int_{\mathbb{R}} |x|^2\,d\mu_0 + c_1^2 t^2 + \sigma^2 t\right] \le 2E\left[\int_{\mathbb{R}} |x|^2\,d\mu_0 + c_1^2 T^2 + \sigma^2 T\right],$$
and $\sup_{t\in[0,T]} \int_{\mathbb{R}} |x|^2\,\bar\mu_t(dx) \le c$.

Proposition 5. Assume (A1)–(A6). Γ :M[0,T ] →M[0,T ] is continuous.

Proof. Let $\{\mu^n_t\} \in \mathcal{M}_{[0,T]}$, $n = 1, 2, \dots$, be a sequence of flows of probability measures with $d_{\mathcal{M}}(\{\mu^n_t\}, \{\mu_t\}) \to 0$ as $n \to \infty$, for some $\{\mu_t\} \in \mathcal{M}_{[0,T]}$. Fix $\tau \in [0, T)$. By Proposition 1, for each $\{\mu^n_t\}$, problem (Control-BD) has a value function $v^n_{T,\theta}(s, x; \{\mu^n_t\})$ with the optimal control function $\varphi^n(t, x)$.¹ Let $x^n_t$ be the corresponding optimally controlled process:
$$dx^n_t = \left(b(x^n_t, \mu^n_t) + \varphi^n(t, x^n_t)\right)dt + \sigma\,dW_t, \qquad \tau \le t \le T, \ x^n_\tau = x.$$
Let $\{\bar\mu^n_t\}$ be the flow of probability measures of $x^n_t$; then $\Gamma(\{\mu^n_t\}) = \{\bar\mu^n_t\}$.

Similarly, for $\{\mu_t\}$, problem (Control-BD) has a value function $v_{T,\theta}(s, x; \{\mu_t\})$ with the optimal control function $\varphi(t, x)$.² Let $x_t$ be the corresponding optimally controlled process:
$$dx_t = \left(b(x_t, \mu_t) + \varphi(t, x_t)\right)dt + \sigma\,dW_t, \qquad \tau \le t \le T, \ x_\tau = x.$$
Let $\{\bar\mu_t\}$ be the flow of probability measures of $x_t$; then $\Gamma(\{\mu_t\}) = \{\bar\mu_t\}$.

To show that $\Gamma$ is continuous, we need to show that
$$d_{\mathcal{M}}\left(\{\bar\mu^n_t\}, \{\bar\mu_t\}\right) \to 0 \quad \text{as } n \to \infty.$$
This is established in four steps.

Step 1. We first establish a relation between $D_2(\bar\mu^n_t, \bar\mu_t)$ and $D_1(\mu^n_t, \mu_t)$.

For any $s \in [\tau, T]$,
\[
d(x_s - x^n_s) = \big(b(x_s, \mu_s) - b(x^n_s, \mu^n_s) + \varphi(s, x_s) - \varphi^n(s, x^n_s)\big)\,ds.
\]
Then, for any $t \in [\tau, T]$,
\begin{align*}
|x_t - x^n_t|^2 &= 2\int_\tau^t \big(b(x_s, \mu_s) - b(x^n_s, \mu^n_s) + \varphi(s, x_s) - \varphi^n(s, x^n_s)\big)(x_s - x^n_s)\,ds \\
&\le 2\int_\tau^t \mathrm{Lip}(b)\big(|x_s - x^n_s| + D_1(\mu_s, \mu^n_s)\big)|x_s - x^n_s| + \big(\varphi(s, x_s) - \varphi^n(s, x^n_s)\big)(x_s - x^n_s)\,ds.
\end{align*}
Here
\[
\mathrm{Lip}(b)\big(|x_s - x^n_s| + D_1(\mu_s, \mu^n_s)\big)|x_s - x^n_s| \le \mathrm{Lip}(b)|x_s - x^n_s|^2 + \frac{\mathrm{Lip}(b)}{2}\Big(\big(D_1(\mu_s, \mu^n_s)\big)^2 + |x_s - x^n_s|^2\Big).
\]
By Assumption (A6), $\big(\varphi^n(s, x_s) - \varphi^n(s, x^n_s)\big)(x_s - x^n_s) \le 0$, so
\begin{align*}
\big(\varphi(s, x_s) - \varphi^n(s, x^n_s)\big)(x_s - x^n_s) &= \big(\varphi(s, x_s) - \varphi^n(s, x_s) + \varphi^n(s, x_s) - \varphi^n(s, x^n_s)\big)(x_s - x^n_s) \\
&\le \big(\varphi(s, x_s) - \varphi^n(s, x_s)\big)(x_s - x^n_s) \\
&\le \frac12\Big(|\varphi(s, x_s) - \varphi^n(s, x_s)|^2 + |x_s - x^n_s|^2\Big).
\end{align*}

¹In this proof, for simplicity of notation, denote $\varphi^n(t, x) = \varphi^n_\theta(t, x; \mu^n_t)$. ²In this proof, for simplicity of notation, denote $\varphi(t, x) = \varphi_\theta(t, x; \mu_t)$.


Consequently,
\[
|x_t - x^n_t|^2 \le \int_\tau^t \big(3\,\mathrm{Lip}(b) + 1\big)|x_s - x^n_s|^2 + \mathrm{Lip}(b)\big(D_1(\mu_s, \mu^n_s)\big)^2 + |\varphi(s, x_s) - \varphi^n(s, x_s)|^2\,ds.
\]
By Gronwall's inequality,
\[
\big(D_2(\bar{\mu}_t, \bar{\mu}^n_t)\big)^2 \le c_2 \int_\tau^t \mathrm{Lip}(b)\big(D_1(\mu_s, \mu^n_s)\big)^2 + \mathbb{E}\big[|\varphi(s, x_s) - \varphi^n(s, x_s)|^2\big]\,ds, \tag{9}
\]
for some constant $c_2$ depending on $T$ and $\mathrm{Lip}(b)$.

Step 2. Now we prove that for any $(t, x) \in [\tau, T] \times \mathbb{R}$,
\[
\partial_x v^n_{T,\theta}(t, x; \mu^n_t) \to \partial_x v_{T,\theta}(t, x; \mu_t) \quad \text{as } n \to \infty.
\]

By Proposition 1, $v_{T,\theta}$ and $v^n_{T,\theta}$ are the solutions to the HJB Eqn. (5). Denote
\[
\varphi_1(s, x) = \max\{\varphi(s, x), 0\}, \qquad \varphi_2(s, x) = -\max\{-\varphi(s, x), 0\},
\]
\[
\varphi^n_1(s, x) = \max\{\varphi^n(s, x), 0\}, \qquad \varphi^n_2(s, x) = -\max\{-\varphi^n(s, x), 0\}.
\]
Since $\varphi_1, \varphi_2$ are optimal controls, using Ito's formula and the HJB Eqn. (5), we obtain

\begin{align}
- v_{T,\theta}(\tau, x; \mu_t) &= v_{T,\theta}(T, x_T; \mu_t) - v_{T,\theta}(\tau, x; \mu_t) \nonumber\\
&= -\int_\tau^T \big(f(x_s, \mu_s) + \gamma_1\varphi_1(s, x_s) + \gamma_2\varphi_2(s, x_s)\big)\,ds + \int_\tau^T \sigma\,\partial_x v_{T,\theta}(s, x_s; \mu_t)\,dW_s. \tag{10}
\end{align}

Similarly, for any $n \in \mathbb{N}$, applying Ito's formula to $v^n_{T,\theta}(s, x)$ and $x_t$ yields
\begin{align*}
&v^n_{T,\theta}(T, x_T; \mu^n_t) - v^n_{T,\theta}(\tau, x; \mu^n_t) \\
&= \int_\tau^T \partial_t v^n_{T,\theta}(s, x_s; \mu^n_t) + \big(b(x_s, \mu_s) + \varphi(s, x_s)\big)\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t) + \frac{\sigma^2}{2}\partial_{xx} v^n_{T,\theta}(s, x_s; \mu^n_t)\,ds \\
&\quad + \int_\tau^T \sigma\,\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,dW_s \\
&= \int_\tau^T \partial_t v^n_{T,\theta}(s, x_s; \mu^n_t) + \big(b(x_s, \mu^n_s) + \varphi^n(s, x_s)\big)\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t) + \frac{\sigma^2}{2}\partial_{xx} v^n_{T,\theta}(s, x_s; \mu^n_t)\,ds \\
&\quad + \int_\tau^T \sigma\,\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,dW_s - \int_\tau^T \big(b(x_s, \mu^n_s) - b(x_s, \mu_s) + \varphi^n(s, x_s) - \varphi(s, x_s)\big)\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,ds \\
&= -\int_\tau^T \big(f(x_s, \mu^n_s) + \gamma_1\varphi^n_1(s, x_s) + \gamma_2\varphi^n_2(s, x_s)\big)\,ds + \int_\tau^T \sigma\,\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,dW_s \\
&\quad - \int_\tau^T \big(b(x_s, \mu^n_s) - b(x_s, \mu_s) + \varphi^n(s, x_s) - \varphi(s, x_s)\big)\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,ds.
\end{align*}
The last equality is due to the HJB Eqn. (5). Hence,
\begin{align}
v^n_{T,\theta}(\tau, x; \mu^n_t) &= \int_\tau^T \big(f(x_s, \mu^n_s) + \gamma_1\varphi^n_1(s, x_s) + \gamma_2\varphi^n_2(s, x_s)\big)\,ds - \int_\tau^T \sigma\,\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,dW_s \nonumber\\
&\quad + \int_\tau^T \big(b(x_s, \mu^n_s) - b(x_s, \mu_s) + \varphi^n(s, x_s) - \varphi(s, x_s)\big)\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\,ds. \tag{11}
\end{align}


Denote
\[
H(s, x) = \inf_{\xi^+, \xi^- \in [0,\theta]} \big\{(\xi^+ - \xi^-)\,\partial_x v_{T,\theta}(s, x; \mu_t) + \gamma_1\xi^+ + \gamma_2\xi^-\big\},
\]
and
\[
H^n(s, x) = \inf_{\xi^+, \xi^- \in [0,\theta]} \big\{(\xi^+ - \xi^-)\,\partial_x v^n_{T,\theta}(s, x; \mu^n_t) + \gamma_1\xi^+ + \gamma_2\xi^-\big\}.
\]
Then for any $\xi^+, \xi^- \in [0,\theta]$,
\begin{align*}
&\Big|\big((\xi^+ - \xi^-)\,\partial_x v_{T,\theta}(s, x; \mu_t) + \gamma_1\xi^+ + \gamma_2\xi^-\big) - \big((\xi^+ - \xi^-)\,\partial_x v^n_{T,\theta}(s, x; \mu^n_t) + \gamma_1\xi^+ + \gamma_2\xi^-\big)\Big| \\
&= \Big|\xi^+\big(\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big) - \xi^-\big(\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big)\Big| \\
&\le 2\theta\,\big|\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big|.
\end{align*}
Hence, for any $(s, x) \in [\tau, T] \times \mathbb{R}$,
\[
|H(s, x) - H^n(s, x)| \le 2\theta\,\big|\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big|.
\]

On the other hand, since $\varphi_1, \varphi_2$ and $\varphi^n_1, \varphi^n_2$ attain the infima in $H$ and $H^n$ respectively, and $|\varphi^n_1 - \varphi^n_2| \le \theta$,
\begin{align*}
|H(s, x) - H^n(s, x)| &= \Big|\big(\varphi_1(s, x) - \varphi_2(s, x)\big)\partial_x v_{T,\theta}(s, x; \mu_t) + \gamma_1\varphi_1(s, x) + \gamma_2\varphi_2(s, x) \\
&\qquad - \big(\varphi^n_1(s, x) - \varphi^n_2(s, x)\big)\partial_x v^n_{T,\theta}(s, x; \mu^n_t) - \gamma_1\varphi^n_1(s, x) - \gamma_2\varphi^n_2(s, x)\Big| \\
&= \Big|\big(\gamma_1 + \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_1(s, x) - \varphi^n_1(s, x)\big) + \big(\gamma_2 - \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_2(s, x) - \varphi^n_2(s, x)\big) \\
&\qquad + \big(\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big)\big(\varphi^n_1(s, x) - \varphi^n_2(s, x)\big)\Big| \\
&\ge \Big|\big(\gamma_1 + \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_1(s, x) - \varphi^n_1(s, x)\big) + \big(\gamma_2 - \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_2(s, x) - \varphi^n_2(s, x)\big)\Big| \\
&\qquad - \theta\,\big|\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big|.
\end{align*}
Hence,
\begin{align}
3\theta\,\big|\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big| \ge \Big|&\big(\gamma_1 + \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_1(s, x) - \varphi^n_1(s, x)\big) \nonumber\\
&+ \big(\gamma_2 - \partial_x v_{T,\theta}(s, x; \mu_t)\big)\big(\varphi_2(s, x) - \varphi^n_2(s, x)\big)\Big|. \tag{12}
\end{align}
Similarly,
\begin{align}
3\theta\,\big|\partial_x v_{T,\theta}(s, x; \mu_t) - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big| \ge \Big|&\big(\gamma_1 + \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big)\big(\varphi_1(s, x) - \varphi^n_1(s, x)\big) \nonumber\\
&+ \big(\gamma_2 - \partial_x v^n_{T,\theta}(s, x; \mu^n_t)\big)\big(\varphi_2(s, x) - \varphi^n_2(s, x)\big)\Big|. \tag{13}
\end{align}

Step 3. We further show that $\varphi^n(s, x) \to \varphi(s, x)$ for any $(s, x) \in [0, T] \times \mathbb{R}$ as $n \to \infty$.


Indeed, from Eqns. (10) and (11), by Ito's isometry and the Cauchy–Schwarz inequality,
\begin{align*}
&\big(v_{T,\theta}(\tau, x; \mu_t) - v^n_{T,\theta}(\tau, x; \mu^n_t)\big)^2 + \sigma^2\,\mathbb{E}\Big[\int_\tau^T \big(\partial_x v_{T,\theta}(s, x_s; \mu_t) - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)^2\,ds\Big] \\
&\le 3(T - \tau)\,\mathbb{E}\Big[\int_\tau^T \big(f(x_s, \mu_s) - f(x_s, \mu^n_s)\big)^2 + \big((b(x_s, \mu_s) - b(x_s, \mu^n_s))\,\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)^2 \\
&\qquad + \Big(\big(\gamma_1 + \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)\big(\varphi_1(s, x_s) - \varphi^n_1(s, x_s)\big) + \big(\gamma_2 - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)\big(\varphi_2(s, x_s) - \varphi^n_2(s, x_s)\big)\Big)^2\,ds\Big] \\
&\le 3(T - \tau)\,\mathbb{E}\Big[\int_\tau^T \big(\mathrm{Lip}(f)\,D_1(\mu_s, \mu^n_s)\big)^2 + \big(\mathrm{Lip}(b)\,D_1(\mu_s, \mu^n_s)\,|\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)|\big)^2 \\
&\qquad + \Big(\big(\gamma_1 + \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)\big(\varphi_1(s, x_s) - \varphi^n_1(s, x_s)\big) + \big(\gamma_2 - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)\big(\varphi_2(s, x_s) - \varphi^n_2(s, x_s)\big)\Big)^2\,ds\Big] \\
&\le 3(T - \tau)\,\mathbb{E}\Big[\int_\tau^T \big(\mathrm{Lip}(f)\,D_1(\mu_s, \mu^n_s)\big)^2 + \big(\mathrm{Lip}(b)\,D_1(\mu_s, \mu^n_s)\,|\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)|\big)^2 \\
&\qquad + \Big(3\theta\,\big(\partial_x v_{T,\theta}(s, x_s; \mu_t) - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)\Big)^2\,ds\Big],
\end{align*}
where the last step uses (13). Let $\delta = \frac{\sigma^2}{54\theta^2}$. Then $3(T - \tau)(3\theta)^2 \le \frac{\sigma^2}{2}$ for any $\tau \in [T - \delta, T]$, so
\begin{align*}
&\big(v_{T,\theta}(\tau, x; \mu_t) - v^n_{T,\theta}(\tau, x; \mu^n_t)\big)^2 + \frac{\sigma^2}{2}\,\mathbb{E}\Big[\int_\tau^T \big(\partial_x v_{T,\theta}(s, x_s; \mu_t) - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)^2\,ds\Big] \\
&\le 3(T - \tau)\,\mathbb{E}\Big[\int_\tau^T \big(\mathrm{Lip}(f)\,D_1(\mu_s, \mu^n_s)\big)^2 + \big(\mathrm{Lip}(b)\,D_1(\mu_s, \mu^n_s)\,|\partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)|\big)^2\,ds\Big].
\end{align*}

Hence, for any $\tau \in [T - \delta, T]$,
\[
v_{T,\theta}(\tau, x; \mu_t) - v^n_{T,\theta}(\tau, x; \mu^n_t) \to 0,
\]
and
\[
\mathbb{E}\Big[\int_\tau^T \big(\partial_x v_{T,\theta}(s, x_s; \mu_t) - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)^2\,ds\Big] \to 0 \quad \text{as } n \to \infty.
\]

Since $\delta > 0$, one can repeat this argument on $[T - 2\delta, T - \delta]$. Proceeding recursively, one can show that for any $(t, x) \in [0, T] \times \mathbb{R}$, $v^n_{T,\theta}(t, x; \mu^n_t) \to v_{T,\theta}(t, x; \mu_t)$, and $\mathbb{E}\big[\int_0^T \big(\partial_x v_{T,\theta}(s, x_s; \mu_t) - \partial_x v^n_{T,\theta}(s, x_s; \mu^n_t)\big)^2\,ds\big] \to 0$ as $n \to \infty$. Hence, for any $(s, x) \in [0, T] \times \mathbb{R}$,
\[
\partial_x v^n_{T,\theta}(s, x; \mu^n_t) \to \partial_x v_{T,\theta}(s, x; \mu_t) \quad \text{as } n \to \infty.
\]
Since $v^n_{T,\theta}(s, x; \mu^n_t)$ and $v_{T,\theta}(s, x; \mu_t)$ are convex in $x$ and $f(x, \mu)$ is nonlinear, $v^n_{T,\theta}(s, x; \mu^n_t)$ and $v_{T,\theta}(s, x; \mu_t)$ are strictly convex. So $\partial_x v^n_{T,\theta}(s, x; \mu^n_t)$ and $\partial_x v_{T,\theta}(s, x; \mu_t)$ are strictly increasing in $x$, and by the definitions of $\varphi^n$ and $\varphi$, $\varphi^n(s, x)$ converges to $\varphi(s, x)$ for any $(s, x) \in [0, T] \times \mathbb{R}$.
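The Hamiltonian $H(s,x) = \inf_{\xi^\pm \in [0,\theta]}\{(\xi^+ - \xi^-)\partial_x v + \gamma_1\xi^+ + \gamma_2\xi^-\}$ from Step 2 is linear in $(\xi^+, \xi^-)$, which is why the optimal control is a threshold rule in the gradient $p = \partial_x v$ — and why convergence of the gradients drives convergence of the controls. A small self-contained check of the closed-form minimizer against brute-force search (function names and parameter values are ours):

```python
import itertools
import numpy as np

def h_value(p, xi_plus, xi_minus, g1, g2):
    """Integrand of H: (xi+ - xi-) * p + g1 * xi+ + g2 * xi-."""
    return (xi_plus - xi_minus) * p + g1 * xi_plus + g2 * xi_minus

def optimal_control(p, g1, g2, theta):
    """Closed-form minimizer over [0, theta]^2: push up at full rate
    when p + g1 < 0, push down at full rate when g2 - p < 0, idle otherwise."""
    xi_plus = theta if p + g1 < 0 else 0.0
    xi_minus = theta if g2 - p < 0 else 0.0
    return xi_plus, xi_minus

# brute-force check over a grid of (xi+, xi-) for a range of gradients p
theta, g1, g2 = 1.0, 0.3, 0.5
grid = np.linspace(0.0, theta, 101)
for p in [-2.0, -0.31, 0.0, 0.49, 2.0]:
    brute = min(h_value(p, a, b, g1, g2) for a, b in itertools.product(grid, grid))
    xp, xm = optimal_control(p, g1, g2, theta)
    assert abs(h_value(p, xp, xm, g1, g2) - brute) < 1e-9
```

The minimizer only changes where $p$ crosses $-\gamma_1$ or $\gamma_2$; with $\partial_x v^n \to \partial_x v$ strictly increasing, the induced controls therefore converge pointwise, matching the conclusion of Step 3.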


Step 4. We are now ready to show $d_M\big(\bar{\mu}_t, \bar{\mu}^n_t\big) \to 0$ as $n \to \infty$.

From the previous steps, $\varphi^n(s, x_s) \to \varphi(s, x_s)$ a.s. as $n \to \infty$, and by the Dominated Convergence Theorem in the $L^2$ space, for each $s \in [0, T]$, $\mathbb{E}\big|\varphi^n(s, x_s) - \varphi(s, x_s)\big|^2 \to 0$. Hence, by inequality (9), $D_2(\bar{\mu}_t, \bar{\mu}^n_t) \to 0$ for any $t \in [0, T]$. Since $D_1(\bar{\mu}_t, \bar{\mu}^n_t) \le D_2(\bar{\mu}_t, \bar{\mu}^n_t)$, also $D_1(\bar{\mu}_t, \bar{\mu}^n_t) \to 0$ for any $t \in [0, T]$, and therefore $d_M\big(\bar{\mu}_t, \bar{\mu}^n_t\big) \to 0$ as $n \to \infty$. That is, $\Gamma$ is continuous. $\square$

Proposition 6. Assume (A1)–(A6). $\Gamma : \mathcal{M}_{[0,T]} \to \mathcal{M}_{[0,T]}$ has a fixed point, and (MFG-BD) has a unique solution.

Proof. As in the proof in Section 3.2 and the proof of Lemma 5.7 in [5], the range of the mapping $\Gamma$ is relatively compact, and by Proposition 5, $\Gamma$ is a continuous mapping. Hence, by the Schauder fixed point theorem [26, Theorem 4.1.1], $\Gamma$ has a fixed point $\Gamma(\mu_t) = \mu_t \in \mathcal{M}_{[0,T]}$. By Assumption (A5), there is at most one fixed point ([5, 21]). Therefore, there exists a unique fixed-point flow of probability measures $\mu^*_t$. By the definition of the solution to an MFG and Proposition 1, the optimal control is also unique. $\square$

3.4 Proof of Theorem 3

Suppose that $\big((\xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta}), \mu_{t,\theta}\big)$ is an MFG solution to (MFG-BD) with a bound $\theta$, and $x_{t,\theta}$ is the optimally controlled process:
\[
dx_{t,\theta} = \big(b(x_{t,\theta}, \mu_{t,\theta}) + \varphi_{1,\theta}(t, x_{t,\theta}; \mu_{t,\theta}) - \varphi_{2,\theta}(t, x_{t,\theta}; \mu_{t,\theta})\big)\,dt + \sigma\,dW_t, \qquad x_{s,\theta} = x,
\]
where $d\xi^+_{t,\theta} - d\xi^-_{t,\theta} = \varphi_\theta(t, x; \mu_{t,\theta})\,dt$, with $\varphi_\theta(t, x; \mu_{t,\theta}) = \varphi_{1,\theta}(t, x; \mu_{t,\theta}) - \varphi_{2,\theta}(t, x; \mu_{t,\theta})$ the optimal control function.

Consider the stochastic control problem (Control-FV) with $\mu_{t,\theta}$ and value function $v_{T,\infty}(s, x; \mu_{t,\theta})$, and let $x_{t,\infty}$ be the optimally controlled process:
\[
dx_{t,\infty} = b(x_{t,\infty}, \mu_{t,\theta})\,dt + \sigma\,dW_t + d\xi^+_{t,\infty} - d\xi^-_{t,\infty}, \qquad x_{s^-,\infty} = x.
\]

Note that the optimal control $\xi_{t,\infty}$ is of a feedback form. Hence, denote
\[
d\varphi_\infty(t, x; \mu_{t,\theta}) = d\varphi_{1,\infty}(t, x; \mu_{t,\theta}) - d\varphi_{2,\infty}(t, x; \mu_{t,\theta}) = d\xi^+_{t,\infty} - d\xi^-_{t,\infty}
\]
as the optimal control function for the stochastic control problem (Control-FV) with fixed $\mu_{t,\theta}$. Denote³ also
\[
dx^i_{t,\theta} = \big(b(x^i_{t,\theta}, \mu_{t,\theta}) + \varphi_{1,\theta}(t, x^i_{t,\theta}) - \varphi_{2,\theta}(t, x^i_{t,\theta})\big)\,dt + \sigma\,dW^i_t, \qquad x^i_{s,\theta} = x,
\]
\[
dx^i_{t,\infty} = b(x^i_{t,\infty}, \mu_{t,\theta})\,dt + d\varphi_{1,\infty}(t, x^i_{t,\infty}) - d\varphi_{2,\infty}(t, x^i_{t,\infty}) + \sigma\,dW^i_t, \qquad x^i_{s^-,\infty} = x,
\]
\[
dx^{i,N}_{t,\theta} = \Big(\frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta}) + \varphi_{1,\theta}(t, x^{i,N}_{t,\theta}) - \varphi_{2,\theta}(t, x^{i,N}_{t,\theta})\Big)\,dt + \sigma\,dW^i_t, \qquad x^{i,N}_{s,\theta} = x.
\]
Since $(\mu_{t,\theta}, \varphi_\theta)$ is the solution to (MFG-BD), the $x^i_{t,\theta}$ are i.i.d., and $\mu_{t,\theta}$ is the probability measure of $x^i_{t,\theta}$ for any $i = 1, \ldots, N$. We first establish some technical lemmas.

³In this proof, omit $\mu_{t,\theta}$ for notational simplicity. Denote $\varphi_{i,\theta}(t, x) = \varphi_{i,\theta}(t, x; \mu_{t,\theta})$ and $\varphi_{i,\infty}(t, x) = \varphi_{i,\infty}(t, x; \mu_{t,\theta})$ for $i = 1, 2$.


Lemma 1. For any $1 \le i \le N$, $\;\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 = O\big(\frac{1}{N}\big)$.

Proof.
\[
d(x^i_{t,\theta} - x^{i,N}_{t,\theta}) = \Big(\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta}) + \varphi_\theta(t, x^i_{t,\theta}) - \varphi_\theta(t, x^{i,N}_{t,\theta})\Big)\,dt,
\]
and
\[
d(x^i_{t,\theta} - x^{i,N}_{t,\theta})^2 = 2(x^i_{t,\theta} - x^{i,N}_{t,\theta})\Big(\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta}) + \varphi_\theta(t, x^i_{t,\theta}) - \varphi_\theta(t, x^{i,N}_{t,\theta})\Big)\,dt.
\]
By Assumption (A6), $(x^i_{t,\theta} - x^{i,N}_{t,\theta})\big(\varphi_\theta(t, x^i_{t,\theta}) - \varphi_\theta(t, x^{i,N}_{t,\theta})\big) \le 0$. Consequently,
\begin{align*}
\big|x^i_{T,\theta} - x^{i,N}_{T,\theta}\big|^2 &\le \int_s^T 2\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta})\Big|\,dt \\
&\le \int_s^T 2\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|\,dt \\
&\quad + \int_s^T 2\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\Big|\frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta}) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta})\Big|\,dt \\
&\le \int_s^T 2\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|\,dt \\
&\quad + \int_s^T 2\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\Big(\frac{1}{N}\sum_{j=1}^N \mathrm{Lip}(b_0)\,\big|x^j_{t,\theta} - x^{j,N}_{t,\theta}\big|\Big)\,dt \\
&\le \int_s^T \big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2\,dt \\
&\quad + \int_s^T \frac{\mathrm{Lip}(b_0)}{N}\sum_{j=1}^N \Big(\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \big|x^j_{t,\theta} - x^{j,N}_{t,\theta}\big|^2\Big)\,dt \\
&\le \big(1 + \mathrm{Lip}(b_0)\big)\int_s^T \big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2\,dt + \int_s^T \Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2\,dt \\
&\quad + \int_s^T \frac{\mathrm{Lip}(b_0)}{N}\sum_{j=1}^N \big|x^j_{t,\theta} - x^{j,N}_{t,\theta}\big|^2\,dt.
\end{align*}


Hence,
\begin{align*}
\mathbb{E}\big|x^i_{T,\theta} - x^{i,N}_{T,\theta}\big|^2 &\le \big(1 + \mathrm{Lip}(b_0)\big)\,\mathbb{E}\int_s^T \big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2\,dt \\
&\quad + \mathbb{E}\int_s^T \Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2\,dt + \mathbb{E}\int_s^T \mathrm{Lip}(b_0)\,\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2\,dt,
\end{align*}
and
\begin{align*}
&\mathbb{E}\Big|\int_{\mathbb{R}} b_0(x^i_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2 \\
&\le 2\,\mathbb{E}\Big|\int_{\mathbb{R}} \big(b_0(x^i_{t,\theta}, y) - b_0(x^{i,N}_{t,\theta}, y)\big)\,\mu_{t,\theta}(dy)\Big|^2 + 2\,\mathbb{E}\Big|\int_{\mathbb{R}} b_0(x^{i,N}_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2 \\
&\le 2\,\mathbb{E}\Big|\int_{\mathbb{R}} \mathrm{Lip}(b_0)\,\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\mu_{t,\theta}(dy)\Big|^2 + 2\,\mathbb{E}\Big|\int_{\mathbb{R}} b_0(x^{i,N}_{t,\theta}, y)\,\mu_{t,\theta}(dy) - \frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\theta}, x^j_{t,\theta})\Big|^2 \\
&= 2\,\mathrm{Lip}(b_0)^2\,\mathbb{E}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \varepsilon_N^2,
\end{align*}
with $\varepsilon_N = O\big(\frac{1}{\sqrt{N}}\big)$ from the Central Limit Theorem. Consequently,
\[
\mathbb{E}\big|x^i_{T,\theta} - x^{i,N}_{T,\theta}\big|^2 \le \mathbb{E}\int_s^T \big(1 + 2\,\mathrm{Lip}(b_0)\big)\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2\,dt + \mathbb{E}\int_s^T \Big(2\,\mathrm{Lip}(b_0)^2\,\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \varepsilon_N^2\Big)\,dt.
\]
By Gronwall's inequality,
\[
\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 \le \int_s^T \varepsilon_N^2\,dt \cdot \exp\Big(\int_s^T \big(1 + 4\,\mathrm{Lip}(b_0)^2\big)\,dt\Big) \le \int_s^T \varepsilon_N^2\,dt \cdot e^{(1 + 4\,\mathrm{Lip}(b_0)^2)T} = O\Big(\frac{1}{N}\Big).
\]
Therefore, $\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 = O\big(\frac{1}{N}\big)$. $\square$
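Lemma 1 is a propagation-of-chaos estimate: driven by the same Brownian paths, the interacting $N$-particle system stays within $O(1/\sqrt{N})$ of i.i.d. mean-field copies. The sketch below checks the trend numerically for the illustrative linear interaction $b_0(x, y) = -(x - y)$ with no control (for this $b_0$ the mean-field mean stays at the initial point, so the limiting copies are explicit); these modeling choices are ours, not the paper's.

```python
import numpy as np

def coupling_error(n, n_steps=200, T=1.0, sigma=0.3, x0=1.0, seed=7):
    """Max-over-time average gap between the interacting n-particle system
    dx_i = -(x_i - xbar_N) dt + sigma dW_i and i.i.d. mean-field copies
    dx_i = -(x_i - x0) dt + sigma dW_i, driven by identical Brownian
    increments (for b0(x, y) = -(x - y) the mean-field mean stays at x0)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x_sys = np.full(n, float(x0))   # interacting particles x^{i,N}
    x_mf = np.full(n, float(x0))    # independent mean-field copies x^i
    worst = 0.0
    for _ in range(n_steps):
        dw = sigma * np.sqrt(dt) * rng.standard_normal(n)
        x_sys = x_sys - (x_sys - x_sys.mean()) * dt + dw
        x_mf = x_mf - (x_mf - x0) * dt + dw
        worst = max(worst, float(np.mean(np.abs(x_sys - x_mf))))
    return worst
```

Increasing the number of particles should shrink the gap at roughly the $1/\sqrt{N}$ rate of the lemma.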

Proof of Theorem 3 a). Suppose that the first player chooses a different control $\xi'_\cdot \in \mathcal{U}_\infty$ which is of a bounded velocity, and all other players $i = 2, 3, \ldots, N$ choose to stay with the optimal control $\xi_{\cdot,\theta}$. Denote
\[
d\xi'_t = \varphi'(t, x)\,dt, \qquad d\xi_{t,\theta} = \varphi_\theta(t, x)\,dt.
\]
Then the corresponding dynamics for the MFG is
\[
dx^1_{t,\theta} = \big(b(x^1_{t,\theta}, \mu_{t,\theta}) + \varphi'(t, x^1_{t,\theta})\big)\,dt + \sigma\,dW^1_t,
\]


and the corresponding dynamics for the $N$-player game are
\[
d\tilde{x}^{1,N}_{t,\theta} = \Big(\frac{1}{N}\sum_{j=1}^N b_0(\tilde{x}^{1,N}_{t,\theta}, \tilde{x}^{j,N}_{t,\theta}) + \varphi'(t, \tilde{x}^{1,N}_{t,\theta})\Big)\,dt + \sigma\,dW^1_t,
\]
\[
d\tilde{x}^{i,N}_{t,\theta} = \Big(\frac{1}{N}\sum_{j=1}^N b_0(\tilde{x}^{i,N}_{t,\theta}, \tilde{x}^{j,N}_{t,\theta}) + \varphi_\theta(t, \tilde{x}^{i,N}_{t,\theta})\Big)\,dt + \sigma\,dW^i_t, \qquad 2 \le i \le N,
\]
where the tilde marks the $N$-player system perturbed by the first player's deviation. We first show

Lemma 2. $\sup_{2 \le i \le N}\,\mathbb{E}\sup_{0 \le t \le T}\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$.

Proof. For any $2 \le i \le N$,
\[
d(\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}) = \Big(\frac{1}{N}\sum_{j=1}^N \big(b_0(\tilde{x}^{i,N}_{t,\theta}, \tilde{x}^{j,N}_{t,\theta}) - b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta})\big) + \varphi_\theta(t, \tilde{x}^{i,N}_{t,\theta}) - \varphi_\theta(t, x^{i,N}_{t,\theta})\Big)\,dt.
\]
Because $\varphi_\theta(t, x)$ is nonincreasing in $x$,
\begin{align*}
\big|\tilde{x}^{i,N}_{T,\theta} - x^{i,N}_{T,\theta}\big|^2 &\le \int_s^T 2\big(\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big)\Big(\frac{1}{N}\sum_{j=1}^N \big(b_0(\tilde{x}^{i,N}_{t,\theta}, \tilde{x}^{j,N}_{t,\theta}) - b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta})\big)\Big)\,dt \\
&\le \int_s^T 2\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\frac{1}{N}\sum_{j=1}^N \mathrm{Lip}(b_0)\Big(\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big| + \big|\tilde{x}^{j,N}_{t,\theta} - x^{j,N}_{t,\theta}\big|\Big)\,dt \\
&\le 2\,\mathrm{Lip}(b_0)\int_s^T \big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|\,\frac{1}{N}\sum_{j=1}^N \big|\tilde{x}^{j,N}_{t,\theta} - x^{j,N}_{t,\theta}\big|\,dt \\
&\le 2\,\mathrm{Lip}(b_0)\int_s^T \big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \frac{1}{2N}\sum_{j=1}^N \Big(\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \big|\tilde{x}^{j,N}_{t,\theta} - x^{j,N}_{t,\theta}\big|^2\Big)\,dt \\
&\le \mathrm{Lip}(b_0)\int_s^T 3\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 + \frac{1}{N}\sum_{j=1}^N \big|\tilde{x}^{j,N}_{t,\theta} - x^{j,N}_{t,\theta}\big|^2\,dt,
\end{align*}
and
\begin{align*}
\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 &\le \mathrm{Lip}(b_0)\int_s^T \Big[3\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t' \le t}\big|\tilde{x}^{i,N}_{t',\theta} - x^{i,N}_{t',\theta}\big|^2 \\
&\qquad + \frac{N-1}{N}\sup_{2 \le j \le N}\,\mathbb{E}\sup_{s \le t' \le t}\big|\tilde{x}^{j,N}_{t',\theta} - x^{j,N}_{t',\theta}\big|^2 + \frac{1}{N}\,\mathbb{E}\big|\tilde{x}^{1,N}_{t,\theta} - x^{1,N}_{t,\theta}\big|^2\Big]\,dt \\
&= \mathrm{Lip}(b_0)\int_s^T \Big[\frac{4N-1}{N}\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t' \le t}\big|\tilde{x}^{i,N}_{t',\theta} - x^{i,N}_{t',\theta}\big|^2 + \frac{1}{N}\,\mathbb{E}\big|\tilde{x}^{1,N}_{t,\theta} - x^{1,N}_{t,\theta}\big|^2\Big]\,dt.
\end{align*}
By Gronwall's inequality,
\[
\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big|^2 \le \mathrm{Lip}(b_0)\int_s^T \frac{1}{N}\,\mathbb{E}\big|\tilde{x}^{1,N}_{t,\theta} - x^{1,N}_{t,\theta}\big|^2\,dt \cdot e^{\int_0^T \mathrm{Lip}(b_0)\frac{4N-1}{N}\,dt} = O\Big(\frac{1}{N}\Big).
\]
So, $\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|\tilde{x}^{i,N}_{t,\theta} - x^{i,N}_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$. $\square$

Next, by Lemma 1, for any $2 \le i \le N$, $\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$, and by the triangle inequality, $\sup_{2 \le i \le N}\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - \tilde{x}^{i,N}_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$. Therefore,
\[
\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - \tilde{x}^{i,N}_{t,\theta}\big| + \sup_{1 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big| = O\Big(\frac{1}{\sqrt{N}}\Big).
\]
Finally, define
\[
d\hat{x}^{1,N}_{t,\theta} = \Big(\frac{1}{N}\sum_{j=1}^N b_0(\hat{x}^{1,N}_{t,\theta}, x^j_{t,\theta}) + \varphi'(t, \hat{x}^{1,N}_{t,\theta})\Big)\,dt + \sigma\,dW^1_t.
\]
Since $(x - y)\big(\varphi'(t, x) - \varphi'(t, y)\big) \le 0$ by Assumption (A6), a proof similar to that of Lemma 1 shows $\mathbb{E}\sup_{0 \le t \le T}\big|\tilde{x}^{1,N}_{t,\theta} - \hat{x}^{1,N}_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$ and $\mathbb{E}\sup_{0 \le t \le T}\big|\hat{x}^{1,N}_{t,\theta} - x^1_{t,\theta}\big| = O\big(\frac{1}{\sqrt{N}}\big)$. Therefore,

\begin{align*}
J^{1,N}_{T,\theta}(s, x^1, \xi'^+_\cdot, \xi'^-_\cdot;\, \xi^{-1}_{\cdot,\theta}) &= \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(\tilde{x}^{1,N}_{t,\theta}, \tilde{x}^{j,N}_{t,\theta})\,dt + \gamma_1\varphi'_1(t, \tilde{x}^{1,N}_{t,\theta})\,dt - \gamma_2\varphi'_2(t, \tilde{x}^{1,N}_{t,\theta})\,dt \\
&\ge \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(\tilde{x}^{1,N}_{t,\theta}, x^j_{t,\theta})\,dt + \gamma_1\varphi'_1(t, \tilde{x}^{1,N}_{t,\theta})\,dt - \gamma_2\varphi'_2(t, \tilde{x}^{1,N}_{t,\theta})\,dt - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(\hat{x}^{1,N}_{t,\theta}, x^j_{t,\theta})\,dt + \gamma_1\varphi'_1(t, \hat{x}^{1,N}_{t,\theta})\,dt - \gamma_2\varphi'_2(t, \hat{x}^{1,N}_{t,\theta})\,dt - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f_0(x^1_{t,\theta}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\varphi'_1(t, x^1_{t,\theta})\,dt - \gamma_2\varphi'_2(t, x^1_{t,\theta})\,dt\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f_0(x^1_{t,\theta}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\varphi_{1,\theta}(t, x^1_{t,\theta})\,dt - \gamma_2\varphi_{2,\theta}(t, x^1_{t,\theta})\,dt\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&= \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(x^{1,N}_{t,\theta}, x^{j,N}_{t,\theta})\,dt + \gamma_1\varphi_{1,\theta}(t, x^{1,N}_{t,\theta})\,dt - \gamma_2\varphi_{2,\theta}(t, x^{1,N}_{t,\theta})\,dt - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&= J^{1,N}_{T,\theta}(s, x^1, \xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta};\, \xi^{-1}_{\cdot,\theta}) - O\Big(\frac{1}{\sqrt{N}}\Big).
\end{align*}
The last inequality is due to the optimality of $\varphi_\theta$ for problem (MFG-BD), and the last equality follows from the Central Limit Theorem. $\square$

Proof of Theorem 3 b). Let all players except player one choose the optimal controls $(\xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta})$, and let player one choose any other control $(\xi'^+_\cdot, \xi'^-_\cdot) \in \mathcal{U}_{T,\infty}$. Denote
\[
d\xi'_t = d\varphi'(t, x) = d\varphi'_1(t, x) - d\varphi'_2(t, x),
\]
\[
dx^1_{t,\infty} = b(x^1_{t,\infty}, \mu_{t,\theta})\,dt + d\varphi'_1(t, x^1_{t,\infty}) - d\varphi'_2(t, x^1_{t,\infty}) + \sigma\,dW^1_t, \qquad x^1_{s^-,\infty} = x,
\]
\[
dx^{1,N}_{t,\infty} = \frac{1}{N}\sum_{j=1}^N b_0(x^{1,N}_{t,\infty}, x^{j,N}_{t,\infty})\,dt + d\varphi'_1(t, x^{1,N}_{t,\infty}) - d\varphi'_2(t, x^{1,N}_{t,\infty}) + \sigma\,dW^1_t, \qquad x^{1,N}_{s^-,\infty} = x,
\]
\[
dx^{i,N}_{t,\infty} = \Big(\frac{1}{N}\sum_{j=1}^N b_0(x^{i,N}_{t,\infty}, x^{j,N}_{t,\infty}) + \varphi_{1,\theta}(t, x^{i,N}_{t,\infty}) - \varphi_{2,\theta}(t, x^{i,N}_{t,\infty})\Big)\,dt + \sigma\,dW^i_t, \qquad x^{i,N}_{s^-,\infty} = x,
\]
for $i = 2, \ldots, N$.

Then,
\[
d(x^{i,N}_{t,\theta} - x^{i,N}_{t,\infty}) = \Big(\frac{1}{N}\sum_{j=1}^N \big(b_0(x^{i,N}_{t,\theta}, x^{j,N}_{t,\theta}) - b_0(x^{i,N}_{t,\infty}, x^{j,N}_{t,\infty})\big) + \varphi_\theta(t, x^{i,N}_{t,\theta}) - \varphi_\theta(t, x^{i,N}_{t,\infty})\Big)\,dt.
\]
By definition, $\varphi_\theta(t, x)$ is nonincreasing in $x$. Hence, a proof similar to the one for Lemma 2 yields

Lemma 3. $\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|x^{i,N}_{t,\theta} - x^{i,N}_{t,\infty}\big| = O\big(\frac{1}{\sqrt{N}}\big)$.

From Lemma 1 and the triangle inequality, $\sup_{2 \le i \le N}\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\infty}\big| = O\big(\frac{1}{\sqrt{N}}\big)$. Therefore,
\[
\sup_{2 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\infty}\big| + \sup_{1 \le i \le N}\,\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big| = O\Big(\frac{1}{\sqrt{N}}\Big).
\]
Since $d\varphi'(t, x)$ is also nonincreasing in $x$, again the same argument as in the proof of Lemma 1 shows
\[
\mathbb{E}\sup_{s \le t \le T}\big|x^{1,N}_{t,\infty} - x^1_{t,\infty}\big| = O\Big(\frac{1}{\sqrt{N}}\Big).
\]

By the Lipschitz continuity of $f$ and $f_0$,
\begin{align*}
J^{1,N}_{T,\infty}(s, x, \xi'^+_\cdot, \xi'^-_\cdot;\, \xi^{-1}_{\cdot,\theta}) &= \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(x^{1,N}_{t,\infty}, x^{j,N}_{t,\infty})\,dt + \gamma_1\,d\varphi'_1(t, x^{1,N}_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^{1,N}_{t,\infty}) \\
&\ge \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(x^{1,N}_{t,\infty}, x^j_{t,\theta})\,dt + \gamma_1\,d\varphi'_1(t, x^{1,N}_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^{1,N}_{t,\infty}) - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^{1,N}_{t,\infty}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi'_1(t, x^{1,N}_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^{1,N}_{t,\infty})\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^1_{t,\infty}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi'_1(t, x^{1,N}_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^{1,N}_{t,\infty})\Big] - O\Big(\frac{1}{\sqrt{N}}\Big).
\end{align*}

By definition of $\varphi'_1, \varphi'_2$,
\[
\mathbb{E}\Big|d\varphi'_1(t, x^{1,N}_{t,\infty}) - d\varphi'_1(t, x^1_{t,\infty}) - d\varphi'_2(t, x^{1,N}_{t,\infty}) + d\varphi'_2(t, x^1_{t,\infty})\Big| \le \mathbb{E}\,d\big|x^{1,N}_{t,\infty} - x^1_{t,\infty}\big| + \mathbb{E}\Big|\frac{1}{N}\sum_{j=1}^N b_0(x^{1,N}_{t,\infty}, x^{j,N}_{t,\infty}) - b(x^1_{t,\infty}, \mu_{t,\theta})\Big|\,dt = O\Big(\frac{1}{\sqrt{N}}\Big),
\]
and, also by definition of $\varphi'_1, \varphi'_2$,
\[
\Big|\big(d\varphi'_1(t, x^{1,N}_{t,\infty}) - d\varphi'_1(t, x^1_{t,\infty})\big) + \big({-d\varphi'_2(t, x^{1,N}_{t,\infty})} + d\varphi'_2(t, x^1_{t,\infty})\big)\Big| = \Big|d\varphi'_1(t, x^{1,N}_{t,\infty}) - d\varphi'_1(t, x^1_{t,\infty})\Big| + \Big|{-d\varphi'_2(t, x^{1,N}_{t,\infty})} + d\varphi'_2(t, x^1_{t,\infty})\Big|.
\]
Therefore,
\[
\mathbb{E}\sup_{s \le t \le T}\Big|d\varphi'_1(t, x^{1,N}_{t,\infty}) - d\varphi'_1(t, x^1_{t,\infty})\Big| = O\Big(\frac{1}{\sqrt{N}}\Big), \qquad \mathbb{E}\sup_{s \le t \le T}\Big|{-d\varphi'_2(t, x^{1,N}_{t,\infty})} + d\varphi'_2(t, x^1_{t,\infty})\Big| = O\Big(\frac{1}{\sqrt{N}}\Big),
\]

and
\begin{align*}
&\mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^1_{t,\infty}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi'_1(t, x^{1,N}_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^{1,N}_{t,\infty})\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^1_{t,\infty}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi'_1(t, x^1_{t,\infty}) + \gamma_2\,d\varphi'_2(t, x^1_{t,\infty})\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&\ge \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^1_{t,\infty}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi_{1,\infty}(t, x^1_{t,\infty}) + \gamma_2\,d\varphi_{2,\infty}(t, x^1_{t,\infty})\Big] - O\Big(\frac{1}{\sqrt{N}}\Big) \\
&= v_{T,\infty}(s, x; \mu_{t,\theta}) - O\Big(\frac{1}{\sqrt{N}}\Big).
\end{align*}
The last inequality is due to the optimality of $\varphi_\infty$. Now, by Theorem 1,
\[
\big|v_{T,\theta}(s, x; \mu_{t,\theta}) - v_{T,\infty}(s, x; \mu_{t,\theta})\big| \le \varepsilon_\theta.
\]
Hence, by $\mathbb{E}\sup_{s \le t \le T}\big|x^i_{t,\theta} - x^{i,N}_{t,\theta}\big| = \varepsilon_N$ and the analysis in the previous steps,
\begin{align*}
J^{1,N}_{T,\infty}(s, x, \xi'^+_\cdot, \xi'^-_\cdot;\, \xi^{-1}_{\cdot,\theta}) &\ge v_{T,\infty}(s, x; \mu_{t,\theta}) - \varepsilon_N \\
&\ge v_{T,\theta}(s, x; \mu_{t,\theta}) - (\varepsilon_N + \varepsilon_\theta) \\
&= \mathbb{E}\Big[\int_s^T \int_{\mathbb{R}} f(x^1_{t,\theta}, y)\,\mu_{t,\theta}(dy)\,dt + \gamma_1\,d\varphi_{1,\theta}(t, x^1_{t,\theta}) + \gamma_2\,d\varphi_{2,\theta}(t, x^1_{t,\theta})\Big] - (\varepsilon_N + \varepsilon_\theta) \\
&\ge \mathbb{E}\int_s^T \frac{1}{N}\sum_{j=1}^N f_0(x^{1,N}_{t,\theta}, x^{j,N}_{t,\theta})\,dt + \gamma_1\,d\varphi_{1,\theta}(t, x^{1,N}_{t,\theta}) + \gamma_2\,d\varphi_{2,\theta}(t, x^{1,N}_{t,\theta}) - (\varepsilon_N + \varepsilon_\theta) \\
&= J^{1,N}_{T,\infty}(s, x, \xi^+_{\cdot,\theta}, \xi^-_{\cdot,\theta};\, \xi^{-1}_{\cdot,\theta}) - (\varepsilon_N + \varepsilon_\theta). \qquad \square
\end{align*}
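Both parts of the proof absorb the sampling error $\varepsilon_N = O(1/\sqrt{N})$ that comes from replacing $\int_{\mathbb{R}} b_0(x, y)\,\mu(dy)$ by the empirical average over $N$ i.i.d. samples. A quick Monte Carlo sanity check of that rate, with an illustrative $b_0(x, y) = -(x - y)$ and $\mu = \mathcal{N}(0, 1)$ (choices ours, not the paper's):

```python
import numpy as np

def empirical_drift_error(n, n_trials=2000, x=0.5, seed=3):
    """RMS gap between int b0(x, y) mu(dy) and (1/n) sum_j b0(x, Y_j)
    for b0(x, y) = -(x - y) and Y_j i.i.d. N(0, 1): the exact value is
    -x, so the gap is just the sample mean of the Y_j."""
    rng = np.random.default_rng(seed)
    ys = rng.standard_normal((n_trials, n))
    empirical = -(x - ys.mean(axis=1))
    return float(np.sqrt(np.mean((empirical - (-x)) ** 2)))
```

Quadrupling the sample size should roughly halve the RMS error, matching $\varepsilon_N = O(1/\sqrt{N})$.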

References

[1] M. Bardi and F. S. Priuli. Linear-quadratic N-person and mean-field games with ergodic cost. SIAM Journal on Control and Optimization 52(5) (2014), 3022-3052.

[2] P. Billingsley. Convergence of Probability Measures. John Wiley & Sons (2013).

[3] R. Buckdahn, B. Djehiche, J. Li, and S. G. Peng. Mean-field backward stochastic differential equations: a limit approach. The Annals of Probability 37(4) (2009), 1524-1565.

[4] R. Buckdahn, J. Li, and S. G. Peng. Mean-field backward stochastic differential equations and related partial differential equations. Stochastic Processes and their Applications 119(10) (2009), 3133-3154.

[5] P. Cardaliaguet. Notes on mean field games (from Pierre-Louis Lions' lectures at Collège de France). Technical report (2013).

[6] P. Cardaliaguet and C.-A. Lehalle. Mean field game of controls and an application to trade crowding. arXiv preprint arXiv:1610.09904 (2016).

[7] R. Carmona and F. Delarue. Probabilistic analysis of mean-field games. SIAM Journal on Control and Optimization 51(4) (2013), 2705-2734.

[8] R. Carmona and F. Delarue. The master equation for large population equilibriums. Stochastic Analysis and Applications (2014), 77-128.

[9] R. Carmona, F. Delarue, and D. Lacker. Mean field games with common noise. The Annals of Probability 44(6) (2016), 3740-3803.

[10] R. Carmona and D. Lacker. A probabilistic weak formulation of mean field games and applications. The Annals of Applied Probability 25(3) (2015), 1189-1231.

[11] R. Carmona and X. Zhu. A probabilistic approach to mean field games with major and minor players. The Annals of Applied Probability 26(3) (2016), 1535-1580.

[12] M. Fischer. On the connection between symmetric N-player games and mean field games. arXiv preprint arXiv:1405.1345 (2014).

[13] W. H. Fleming and R. W. Rishel. Deterministic and Stochastic Optimal Control. Vol. 1. Springer Science & Business Media (2012).

[14] G. X. Fu and U. Horst. Mean field games with singular controls. arXiv preprint arXiv:1612.05425 (2016).

[15] D. Gomes, S. Patrizi, and V. Voskanyan. On the existence of classical solutions for stationary extended mean field games. Nonlinear Analysis: Theory, Methods & Applications 99 (2014), 49-79.

[16] O. Guéant, J. Lasry, and P. L. Lions. Mean field games and applications. Paris-Princeton Lectures on Mathematical Finance (2010).

[17] D. Hernández-Hernández, J. L. Pérez, and K. Yamazaki. Optimality of refraction strategies for spectrally negative Lévy processes. SIAM Journal on Control and Optimization 54(3) (2016), 1126-1156.

[18] Y. Z. Hu, B. Øksendal, and A. Sulem. Singular mean-field control games with applications to optimal harvesting and investment problems. arXiv preprint arXiv:1406.1863 (2014).

[19] M. Huang, R. P. Malhamé, and P. E. Caines. Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information & Systems 6(3) (2006), 221-252.

[20] D. Lacker. Mean field games via controlled martingale problems: existence of Markovian equilibria. Stochastic Processes and their Applications 125(7) (2015), 2856-2894.

[21] J. Lasry and P. L. Lions. Mean field games. Japanese Journal of Mathematics 2(1) (2007), 229-260.

[22] M. Nutz. A mean field game of optimal stopping. arXiv preprint arXiv:1605.09112 (2016).

[23] H. Pham. Continuous-time Stochastic Control and Optimization with Financial Applications. Springer (2009).

[24] H. Pham and X. Wei. Bellman equation and viscosity solutions for mean-field stochastic control problem. arXiv preprint arXiv:1512.07866 (2015).

[25] H. Pham and X. Wei. Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics. arXiv preprint arXiv:1604.04057 (2016).

[26] D. Smart. Fixed Point Theorems. Vol. 66. CUP Archive (1980).

[27] J. Yong and X. Y. Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer (1999).

[28] L. Zhang. The relaxed stochastic maximum principle in the mean-field singular controls. arXiv preprint arXiv:1202.4129 (2012).

[29] A. K. Zvonkin. A transformation of the phase space of a diffusion process that removes the drift. Mathematics of the USSR-Sbornik 22(1) (1974).