hamilton-jacobi-bellman equations for dynamic pricing · hamilton-jacobi-bellman equations for...

Hamilton-Jacobi-Bellman equations for dynamic pricing
Industrial and Applied Mathematics Seminar, Oxford

Asbjørn Nilsen Riseth
Supervisors: Jeff Dewynne, Chris Farmer
February 2, 2017

OCIAM, University of Oxford


Pricing challenge

Continuous time pricing challenge
Given

• an initial amount of stock for different products,
• a termination time.

Maximise revenue and minimise cost of unsold items.

Mathematical challenge

Hamilton-Jacobi-Bellman (HJB) equations
Find v : [0, T] × D → ℝ such that

$$ v_t + \max_{a \in A} \left\{ \frac{\sigma(t,x,a)^2}{2} v_{xx} + b(t,x,a)\, v_x + f(t,x,a) \right\} = 0 \tag{1} $$

$$ v(T, x) = g(x) \tag{2} $$

Outline

• Pricing problem and HJB solutions

• Numerical and modelling challenges

• Explicit time-stepping scheme with CFL O(∆t) = O(∆x)

• (Risk-neutral vs. risk-averse behaviour)

Single product optimal control

Stochastic optimal control

Ingredients

1. Dynamical system — SDEs

2. Objective function

3. Find policy function to maximise objective

4. Optimality condition → HJB equation

Sales dynamics

Simple pricing model

• Pricing strategy α(t)
• Stock levels $X^\alpha(t)$ over period t ∈ [0, T], $X^\alpha(0) = x_0 > 0$
• Expected demand per time, Q(a)
• Uncertainty 0 < γ ≪ 1
• Brownian motion W(t)

$$ dX^\alpha(t) = -Q(\alpha(t))\,\bigl(dt + \gamma\, dW(t)\bigr), \quad \text{when } X^\alpha(t) > 0. \tag{3} $$

• $O(dW(t)) = O(\sqrt{dt})$

Silly model: high probability of “negative sales” over small timescales
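To see why model (3) permits negative sales, note that sales over a short step Δt equal Q(a)(Δt + γΔW) with ΔW ~ N(0, Δt), which is negative exactly when a standard normal draw falls below −√Δt/γ. The sketch below evaluates that probability. The function name and the step sizes are illustrative, and γ = 0.05 is used only as a sample value.

```python
from statistics import NormalDist


def prob_negative_sales(dt: float, gamma: float) -> float:
    """P(dt + gamma * dW < 0) for dW ~ N(0, dt), i.e. Phi(-sqrt(dt) / gamma)."""
    return NormalDist().cdf(-(dt ** 0.5) / gamma)


if __name__ == "__main__":
    # Illustrative step sizes; gamma = 0.05 is a sample value.
    for dt in (1e-1, 1e-2, 1e-3, 1e-4):
        p = prob_negative_sales(dt, 0.05)
        print(f"dt = {dt:g}: P(negative sales) = {p:.3f}")
```

The probability tends to 1/2 as Δt → 0, because the noise scales like √Δt while the drift scales like Δt.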

Sales dynamics

Example

Q(a) = 1− a, γ = 0.05 (4)

Constant price α(t) = 2/3, starting with X(0) = 1.
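A minimal Euler-Maruyama sketch of (3) for this example can produce sample paths of the kind shown in Figure 1. The time step, the seed and the choice to clip the stock at zero are assumptions made for illustration; the talk does not state how its paths were generated.

```python
import math
import random
from typing import List, Optional


def simulate_stock(price: float = 2 / 3, x0: float = 1.0, gamma: float = 0.05,
                   horizon: float = 3.0, n_steps: int = 3000,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Euler-Maruyama path of dX = -Q(a)(dt + gamma dW) with Q(a) = 1 - a."""
    rng = rng or random.Random(0)      # fixed seed (assumption) for repeatability
    dt = horizon / n_steps
    demand = 1.0 - price               # Q(a) = 1 - a
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        if x > 0.0:                    # sales stop once the stock is exhausted
            dw = rng.gauss(0.0, math.sqrt(dt))
            x = max(x - demand * (dt + gamma * dw), 0.0)  # clip at zero (assumption)
        path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_stock()
    print(path[0], path[len(path) // 2], path[-1])
```

With γ = 0 the path reduces to the straight line X(t) = 1 − t/3, so the stock runs out at t = 3.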

Figure 1: Constant price α(t) = 2/3, starting with X(0) = 1. Sample paths ω1, …, ω6 of the stock level X(t) plotted against t ∈ [0, 3].

Pricing objective

Pricing objective

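• Revenue: $\int_0^T \alpha(s) \cdot \bigl(-dX^\alpha(s)\bigr)$
• Cost per unit of remaining stock C > 0
• Total value

$$ -\int_0^T \alpha(s)\, dX^\alpha(s) - C \cdot X^\alpha(T), \quad \text{a random variable!} \tag{4} $$

• Maximise expected value

Simplification:

$$ \mathbb{E}\left[ \int_0^T \alpha(s)\, Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \right] \tag{5} $$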
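The simplification follows from substituting the dynamics (3) into the revenue integral. The derivation below is a sketch: it ignores the stopping of sales once the stock reaches zero and assumes that α(s)Q(α(s)) is bounded, so that the stochastic integral has zero mean.

```latex
\begin{align*}
-\int_0^T \alpha(s)\, dX^\alpha(s)
  &= \int_0^T \alpha(s)\, Q(\alpha(s))\, ds
   + \gamma \int_0^T \alpha(s)\, Q(\alpha(s))\, dW(s), \\
\mathbb{E}\left[ \gamma \int_0^T \alpha(s)\, Q(\alpha(s))\, dW(s) \right]
  &= 0, \\
\mathbb{E}\left[ -\int_0^T \alpha(s)\, dX^\alpha(s) - C \cdot X^\alpha(T) \right]
  &= \mathbb{E}\left[ \int_0^T \alpha(s)\, Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \right].
\end{align*}
```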


Stochastic control problem

Pricing challenge

• Allowed prices $A = [a_{\min}, a_{\max}]$
• Find pricing strategies of the form α(t) = a(t, X(t)) ∈ A, taken from collection $\mathcal{A}$

Given initial stock $x_0$, solve the optimisation problem

$$ \max_{a(\cdot,\cdot) \in \mathcal{A}} \mathbb{E}\left[ \int_0^T \alpha(s)\, Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \;\middle|\; \alpha(s) = a(s, X(s)) \right] $$

Function space $\mathcal{A}$ is infinite-dimensional

HJB: optimality condition

Strategy

• Define a value function v(t, x)

• Function v(t, x) satisfies the HJB equation

• Find optimal a(t, x) in terms of v(t, x)


HJB: optimality condition

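Value function
Expected value of having stock X(t) = x > 0 at time t

$$ v(t,x) = \max_{a(\cdot,\cdot) \in \mathcal{A}} \mathbb{E}\left[ \int_t^T \alpha(s)\, Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \;\middle|\; X(t) = x \right] \tag{6} $$

Infinitesimal change in t
Dynamic programming principle + Itô’s lemma [2]: For t ∈ [0, T), x > 0,

$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} = 0 \tag{7} $$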

HJB for pricing problem

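Find v : [0, T] × [0, ∞) → ℝ such that

$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} = 0 \tag{8} $$

$$ v(T, x) = -C \cdot x \tag{9} $$
$$ v(t, 0) = 0 \tag{10} $$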

HJB → optimal control function

Optimality result

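• The value function is the unique viscosity solution to HJB
• The optimal pricing function $a^*(t, x)$ is given by

$$ a^*(t, x) = \operatorname*{arg\,max}_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} \tag{11} $$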

HJB pricing example

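Explicit HJB form
Let A = [0, 1], Q(a) = 1 − a, then

$$ a^*(t, x) = P_A\left[ \frac{1 + v_x - \gamma^2 v_{xx}}{2 - \gamma^2 v_{xx}} \right] \tag{12} $$

Whenever $a^*(t, x) \in \operatorname{interior}(A)$,

$$ v_t + \frac{(1 - v_x)^2}{4 - 2\gamma^2 v_{xx}} = 0 \tag{13} $$

Otherwise,

$$ v_t = 0, \qquad a^*(t, x) = 1 \tag{14} $$
$$ v_t + \frac{\gamma^2}{2} v_{xx} - v_x = 0, \qquad a^*(t, x) = 0 \tag{15} $$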
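As a sanity check on (12) and (13), the sketch below compares the closed-form maximiser with a brute-force search over a price grid. It reads P_A as projection (clipping) onto A = [0, 1] and assumes γ²v_xx < 2, so that the maximised expression is concave in a. The function names and the sample derivative values are illustrative and not taken from the talk.

```python
def hamiltonian(a: float, vx: float, vxx: float, gamma: float) -> float:
    """Expression inside the max of the pricing HJB for Q(a) = 1 - a."""
    q = 1.0 - a
    return 0.5 * gamma**2 * q**2 * vxx - q * vx + a * q


def optimal_price(vx: float, vxx: float, gamma: float) -> float:
    """Formula (12): stationary point clipped to [0, 1]; needs gamma^2 * vxx < 2."""
    a = (1.0 + vx - gamma**2 * vxx) / (2.0 - gamma**2 * vxx)
    return min(max(a, 0.0), 1.0)


def brute_force_max(vx: float, vxx: float, gamma: float, n: int = 10_000) -> float:
    """Largest Hamiltonian value over the price grid 0, 1/n, ..., 1."""
    return max(hamiltonian(i / n, vx, vxx, gamma) for i in range(n + 1))


if __name__ == "__main__":
    vx, vxx, gamma = 0.2, -1.0, 0.05   # illustrative values (assumption)
    a_star = optimal_price(vx, vxx, gamma)
    print("a* =", a_star)
    print("H(a*) =", hamiltonian(a_star, vx, vxx, gamma))
    print("grid max =", brute_force_max(vx, vxx, gamma))
    print("formula (13) term =", (1 - vx) ** 2 / (4 - 2 * gamma**2 * vxx))
```

For an interior maximiser the three printed values of the Hamiltonian should agree up to the grid resolution.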

HJB pricing solution

Example

Q(a) = 1− a, γ = 0.05, C = 1. (16)

Figure 2: Value function

Figure 2: Optimal pricing strategy

Figure 2: Optimal sales dynamics (sample paths ω1, …, ω6 of X(t), with the corresponding prices α(t), against t ∈ [0, 3])

Figure 2: α(t) = a∗(t, x) versus α(t) = 2/3 (sample paths ω1, …, ω6 of X(t) against t ∈ [0, 3] under each strategy)

Improved CFL condition

HJB challenges

$$ v_t + \max_{a \in A} \left\{ \frac{\sigma(t,x,a)^2}{2} v_{xx} + b(t,x,a)\, v_x + f(t,x,a) \right\} = 0 \tag{17} $$

$$ v(T, x) = g(x) \tag{18} $$

Numerical approximations

• PDE may have multiple solutions
• Unbounded domains
  • Truncation and artificial boundary conditions
• Global vs. local maxima in optimisation step

Linear Interpolation Semi-Lagrangian Schemes

Scheme with guarantees [1]

• Converges to correct PDE solution
• Standard finite difference schemes can’t guarantee this in higher dimensions
• Explicit time-stepping stability¹: O(∆t) = O(∆x)
• Standard finite difference: O(∆t) = O(∆x²)

¹ Boundaries cause trouble

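Linear Interpolation Semi-Lagrangian Schemes

Idea

• Directional derivative:

$$ b \frac{\partial v}{\partial x} = \lim_{\Delta x \downarrow 0} \frac{v(x + b\,\Delta x) - v(x)}{\Delta x} \tag{19} $$

• Diffusion:

$$ \sigma^2 \frac{\partial^2 v}{\partial x^2} = \lim_{\Delta x \downarrow 0} \frac{v(x + \sigma\sqrt{\Delta x}) - 2 v(x) + v(x - \sigma\sqrt{\Delta x})}{\Delta x} \tag{20} $$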
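The two limits can be checked numerically on a smooth test function. In this sketch the step h plays the role of ∆x in (19) and (20); the test function sin and the values of b and σ are arbitrary choices for illustration.

```python
import math
from typing import Callable


def drift_quotient(v: Callable[[float], float], x: float, b: float, h: float) -> float:
    """Difference quotient in (19): approximates b * v'(x)."""
    return (v(x + b * h) - v(x)) / h


def diffusion_quotient(v: Callable[[float], float], x: float, sigma: float, h: float) -> float:
    """Difference quotient in (20): approximates sigma^2 * v''(x)."""
    step = sigma * math.sqrt(h)
    return (v(x + step) - 2.0 * v(x) + v(x - step)) / h


if __name__ == "__main__":
    x, b, sigma = 0.3, -0.7, 0.4       # arbitrary illustrative values
    for h in (1e-2, 1e-3, 1e-4):
        drift_error = drift_quotient(math.sin, x, b, h) - b * math.cos(x)
        diffusion_error = diffusion_quotient(math.sin, x, sigma, h) + sigma**2 * math.sin(x)
        print(h, drift_error, diffusion_error)
```

Both errors shrink roughly in proportion to h, which is what makes a step of size O(∆x) in these quotients consistent with the first- and second-order terms of the HJB equation.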

Linear Interpolation Semi-Lagrangian Schemes

Interpolation

• Grid $x_0, x_1, \dots, x_N$

$$ b \frac{\partial v}{\partial x}(x_j) \approx \frac{v(x_j + b(x_j, a)\,\Delta x) - v(x_j)}{\Delta x} \tag{21} $$

• The point $x_j + b(x_j, a)\,\Delta x$ is not necessarily on the grid
• Linear interpolation guarantees convergence
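One way to turn these ideas into an explicit backward time-stepping loop for the pricing HJB (8)-(10) is sketched below. It is a simple semi-Lagrangian step of Camilli-Falcone type, with the time step ∆t taking the role of the small parameter in (19)-(20); it is not claimed to be the exact scheme of [1] or the solver behind the talk's figures. Assumptions: Q(a) = 1 − a on A = [0, 1], a uniform stock grid truncated at a hypothetical x_max, a finite set of candidate prices searched by brute force, and values outside the grid clamped to the nearest boundary value.

```python
import math
from typing import List, Tuple


def interp(values: List[float], dx: float, x: float) -> float:
    """Linear interpolation on the uniform grid 0, dx, 2dx, ...; clamps outside."""
    pos = min(max(x / dx, 0.0), len(values) - 1.0)
    j = min(int(pos), len(values) - 2)
    w = pos - j
    return (1.0 - w) * values[j] + w * values[j + 1]


def solve_pricing_hjb(gamma: float = 0.05, cost: float = 1.0, horizon: float = 1.0,
                      x_max: float = 1.5, nx: int = 76, nt: int = 50,
                      n_prices: int = 41) -> Tuple[List[float], List[float], List[float]]:
    """Return the grid, v(0, x) and the price chosen at t = 0 on that grid."""
    dx = x_max / (nx - 1)
    dt = horizon / nt
    prices = [i / (n_prices - 1) for i in range(n_prices)]  # candidates in [0, 1]
    v = [-cost * j * dx for j in range(nx)]                 # v(T, x) = -C x
    policy = [0.0] * nx
    for _ in range(nt):                                     # step backwards in time
        new_v = [0.0] * nx                                  # index 0 keeps v(t, 0) = 0
        for j in range(1, nx):
            best_value, best_price = -math.inf, prices[0]
            for a in prices:
                q = 1.0 - a                                 # demand Q(a)
                centre = j * dx - q * dt                    # x + b * dt with b = -Q(a)
                spread = gamma * q * math.sqrt(dt)          # sigma * sqrt(dt)
                value = a * q * dt + 0.5 * (interp(v, dx, centre + spread)
                                            + interp(v, dx, centre - spread))
                if value > best_value:
                    best_value, best_price = value, a
            new_v[j] = best_value
            policy[j] = best_price
        v = new_v
    return [j * dx for j in range(nx)], v, policy


if __name__ == "__main__":
    xs, v0, a0 = solve_pricing_hjb()
    for j in range(0, len(xs), 15):
        print(f"x = {xs[j]:.2f}  v(0, x) = {v0[j]:.4f}  a(0, x) = {a0[j]:.3f}")
```

The defaults use ∆t = ∆x = 0.02, in line with the O(∆t) = O(∆x) scaling. Because each update is a convex combination of interpolated values plus the running reward, the computed values stay between −C·x and T/4 for these parameters.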

Wrapping up

Summary and future work

Summary

• Dynamic pricing problems can be solved with HJB
• Semi-Lagrangian schemes make explicit time-stepping feasible

• (More risk-averse — lower price)

Future work

• Model demand as a positive process
• Generalise HJB solver
  • Any dimension
  • Optimisation algorithms
  • Better handling of truncated boundaries

References

[1] K. Debrabant and E. Jakobsen. Semi-Lagrangian schemes for linear and fully non-linear diffusion equations. Mathematics of Computation, 82(283):1433–1462, 2013.

[2] H. Pham. Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media, 2009.

Risk-averse decision maker

Should the pricing strategy depend on how much you’ve already earned?

Depends on risk attitude

Larger state space

Include accumulated revenue in the state:

$$ dX^\alpha(t) = -Q(\alpha(t))\,\bigl(dt + \gamma\, dW(t)\bigr) \tag{22} $$
$$ dR^\alpha(t) = -\alpha(t)\, dX^\alpha(t) \tag{23} $$

Profit at time T, following strategy α(t) = a(t, X(t), R(t)):

$$ P(\alpha) = R^\alpha(T) - C \cdot X^\alpha(T) \tag{24} $$

• Risk-neutral: Maximise E[P(α)]
• Risk-averse: Maximise E[u(P(α))]
  • Utility function u(p) is increasing and concave
  • Losses are more painful than gains are pleasurable

HJB with revenue

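HJB equation with revenue
Find v : [0, T] × [0, ∞) × ℝ → ℝ such that

$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 \left[ v_{xx} - 2a\, v_{xr} + a^2 v_{rr} \right] - Q(a)\, v_x + a\, Q(a)\, v_r \right\} = 0 $$

$$ v(t, 0, r) = u(r) $$
$$ v(T, x, r) = u(r - C \cdot x) $$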
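The second-order terms come from applying Itô's lemma to v(t, X, R) with the dynamics (22)-(23). A sketch of the computation, writing Q for Q(a) and treating the price a as fixed over an infinitesimal step:

```latex
\begin{align*}
dX &= -Q\, dt - \gamma Q\, dW, \qquad
dR = -a\, dX = a Q\, dt + \gamma a Q\, dW, \\
d\langle X \rangle &= \gamma^2 Q^2\, dt, \qquad
d\langle X, R \rangle = -a \gamma^2 Q^2\, dt, \qquad
d\langle R \rangle = a^2 \gamma^2 Q^2\, dt, \\
\mathbb{E}[dv] &= \left( v_t - Q\, v_x + a Q\, v_r
  + \frac{\gamma^2}{2} Q^2 \left[ v_{xx} - 2a\, v_{xr} + a^2 v_{rr} \right] \right) dt .
\end{align*}
```

Requiring the largest drift over a ∈ A to vanish gives the HJB equation with revenue.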

Risk profiles

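Absolute risk aversion, $A(p) = -\dfrac{u''(p)}{u'(p)}$

• Decreasing: Power

$$ u(p) = \frac{(p + m)^\mu}{\mu}, \qquad \mu < 1,\ \mu \neq 0 \tag{25} $$

• Constant: Exponential

$$ u(p) = \frac{1 - e^{-\mu p}}{\mu}, \qquad \mu > 1 \tag{26} $$

• Increasing: Quadratic

$$ u(p) = p - \mu p^2 \tag{27} $$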
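The three labels can be checked by evaluating A(p) numerically. The sketch below uses central finite differences; the parameter values (m = 1 and μ = 0.5 for the power utility, μ = 2 for the exponential, μ = 0.1 for the quadratic) are illustrative choices, not values from the talk.

```python
import math
from typing import Callable


def absolute_risk_aversion(u: Callable[[float], float], p: float, h: float = 1e-4) -> float:
    """A(p) = -u''(p) / u'(p), with both derivatives from central differences."""
    first = (u(p + h) - u(p - h)) / (2.0 * h)
    second = (u(p + h) - 2.0 * u(p) + u(p - h)) / h**2
    return -second / first


def power_utility(p: float, m: float = 1.0, mu: float = 0.5) -> float:
    return (p + m) ** mu / mu


def exponential_utility(p: float, mu: float = 2.0) -> float:
    return (1.0 - math.exp(-mu * p)) / mu


def quadratic_utility(p: float, mu: float = 0.1) -> float:
    return p - mu * p**2


if __name__ == "__main__":
    utilities = [("power", power_utility),
                 ("exponential", exponential_utility),
                 ("quadratic", quadratic_utility)]
    for name, u in utilities:
        print(name, [round(absolute_risk_aversion(u, p), 4) for p in (0.0, 0.5, 1.0)])
```

In closed form the three cases give A(p) = (1 − μ)/(p + m), A(p) = μ and A(p) = 2μ/(1 − 2μp), the last one valid where u is increasing, that is for p < 1/(2μ).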

HJB utility solution

Example

Q(a) = 1− a, γ = 0.2, C = 1. (28)

Figure 3: Optimal pricing strategies for different risk-attitudes. Price a(t, x, r) against accumulated revenue r ∈ [0, 1] at t = 0.5, x = 0.3, for the power, quadratic, exponential and risk-neutral cases.
