Hamilton-Jacobi-Bellman equations for dynamic pricing
Industrial and Applied Mathematics Seminar, Oxford
Asbjørn Nilsen Riseth
Supervisors: Jeff Dewynne, Chris Farmer
February 2, 2017
OCIAM, University of Oxford
Pricing challenge
Continuous time pricing challenge
Given
• an initial amount of stock for different products,
• a termination time.
Maximise revenue and minimise cost of unsold items.
Mathematical challenge
Hamilton-Jacobi-Bellman (HJB) equations
Find $v : [0, T] \times D \to \mathbb{R}$ such that
$$ v_t + \max_{a \in A} \left\{ \frac{\sigma(t, x, a)^2}{2} v_{xx} + b(t, x, a)\, v_x + f(t, x, a) \right\} = 0 \tag{1} $$
$$ v(T, x) = g(x) \tag{2} $$
Outline
• Pricing problem and HJB solutions
• Numerical and modelling challenges
• Explicit time-stepping scheme with CFL O(∆t) = O(∆x)
• (Risk-neutral vs. risk-averse behaviour)
Single product optimal control
Stochastic optimal control
Ingredients
1. Dynamical system — SDEs
2. Objective function
3. Find policy function to maximise objective
4. Optimality condition → HJB equation
Sales dynamics
Simple pricing model
• Pricing strategy $\alpha(t)$
• Stock levels $X^\alpha(t)$ over period $t \in [0, T]$, $X^\alpha(0) = x_0 > 0$
• Expected demand per time, $Q(a)$
• Uncertainty $0 < \gamma \ll 1$
• Brownian motion $W(t)$
$$ dX^\alpha(t) = -Q(\alpha(t))\,\big(dt + \gamma\, dW(t)\big), \quad \text{when } X^\alpha(t) > 0. \tag{3} $$
• $O(dW(t)) = O(\sqrt{dt})$
Silly model: high probability of “negative sales” over small timescales
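To put a number on this: under (3), sales over a step of length ∆t are Q(α)(∆t + γ∆W) with ∆W ~ N(0, ∆t), so they are negative exactly when a standard normal variable falls below −√∆t/γ. The sketch below evaluates that probability; the function name is made up for illustration.

```python
import math

def prob_negative_sales(dt, gamma):
    """P(dt + gamma * dW < 0) with dW ~ N(0, dt), i.e. Phi(-sqrt(dt) / gamma)."""
    z = -math.sqrt(dt) / gamma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for dt in (1e-2, 2.5e-3, 1e-4, 1e-6):
    print(f"dt = {dt:g}: P(negative sales) = {prob_negative_sales(dt, 0.05):.3f}")
```

With γ = 0.05 and ∆t = 0.0025 the probability is Φ(−1) ≈ 0.159, and it tends to 1/2 as ∆t → 0.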
Sales dynamics
Example
$$ Q(a) = 1 - a, \quad \gamma = 0.05 \tag{4} $$
Constant price α(t) = 2/3, starting with X(0) = 1.
Figure 1: Constant price α(t) = 2/3, starting with X(0) = 1. [Plot of stock X(t) against t ∈ [0, 3] for six sample paths ω1, ..., ω6.]
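Sample paths of this kind can be produced with an Euler-Maruyama discretisation of (3). This is a minimal sketch, not the code behind the figure: the step count, the seed, and the choice to stop sales by clamping the stock at zero are all assumptions.

```python
import numpy as np

def simulate_stock(price, x0=1.0, gamma=0.05, T=3.0, n_steps=3000, seed=0):
    """Euler-Maruyama path of dX = -Q(a) (dt + gamma dW) with Q(a) = 1 - a."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    q = 1.0 - price                      # expected demand per unit time
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        if x[k] <= 0.0:                  # sold out: stock stays at zero
            x[k + 1:] = 0.0
            break
        dW = rng.normal(0.0, np.sqrt(dt))
        # Clamp at zero so the stock never becomes negative (an assumption).
        x[k + 1] = max(x[k] - q * (dt + gamma * dW), 0.0)
    return x

# Six paths for the constant price 2/3, one per seed.
paths = [simulate_stock(2.0 / 3.0, seed=s) for s in range(6)]
```

With price 2/3 the expected demand is 1/3 per unit time, so the paths decay roughly linearly from 1 towards 0 at t = 3.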
Pricing objective
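• Revenue $\int_0^T \alpha(s) \cdot \big(-dX^\alpha(s)\big)$
• Cost per unit of remaining stock $C > 0$
• Total value
$$ -\int_0^T \alpha(s)\, dX^\alpha(s) - C \cdot X^\alpha(T), \quad \text{a random variable!} \tag{4} $$
• Maximise expected value
Simplification:
$$ \mathbb{E}\left[ \int_0^T \alpha(s) \cdot Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \right] \tag{5} $$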
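A formal sketch of why the simplification holds: substitute the dynamics (3) into the revenue integral and use that an Itô integral has zero expectation. This assumes α is adapted and bounded. If sales stop when the stock runs out, the integrals run up to that stopping time and the same argument applies.

```latex
\begin{align*}
-\int_0^T \alpha(s)\, dX^\alpha(s)
  &= \int_0^T \alpha(s)\, Q(\alpha(s))\, ds
   + \gamma \int_0^T \alpha(s)\, Q(\alpha(s))\, dW(s), \\
\mathbb{E}\left[ \gamma \int_0^T \alpha(s)\, Q(\alpha(s))\, dW(s) \right] &= 0 .
\end{align*}
```

Taking expectations of the total value therefore leaves only the ds integral and the terminal cost.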
Stochastic control problem
Pricing challenge
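• Allowed prices $A = [a_{\min}, a_{\max}]$
• Find pricing strategies of the form $\alpha(t) = a(t, X(t)) \in A$, taken from a collection $\mathcal{A}$
Given initial stock $x_0$, solve the optimisation problem
$$ \max_{a(\cdot,\cdot) \in \mathcal{A}} \mathbb{E}\left[ \int_0^T \alpha(s) \cdot Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \,\middle|\, \alpha(s) = a(s, X(s)) \right] $$
The function space $\mathcal{A}$ is infinite-dimensional.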
HJB: optimality condition
Strategy
• Define a value function v(t, x)
• Function v(t, x) satisfies the HJB equation
• Find optimal a(t, x) in terms of v(t, x)
HJB: optimality condition
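Value function
Expected value of having stock X(t) = x > 0 at time t:
$$ v(t, x) = \max_{a(\cdot,\cdot) \in \mathcal{A}} \mathbb{E}\left[ \int_t^T \alpha(s)\, Q(\alpha(s))\, ds - C \cdot X^\alpha(T) \,\middle|\, X(t) = x \right] \tag{6} $$
Infinitesimal change in t
Dynamic programming principle + Itô’s lemma [2]: for t ∈ [0, T), x > 0,
$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} = 0 \tag{7} $$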
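A formal sketch of where (7) comes from, assuming v is smooth enough for Itô’s lemma to apply (see [2] for the rigorous treatment). For a small step h > 0, the dynamic programming principle and Itô’s lemma give

```latex
\begin{align*}
v(t, x) &= \max_{a} \mathbb{E}\left[ \int_t^{t+h} \alpha(s)\, Q(\alpha(s))\, ds
          + v\big(t+h, X^\alpha(t+h)\big) \,\middle|\, X(t) = x \right], \\
v\big(t+h, X^\alpha(t+h)\big) &= v(t, x)
   + \int_t^{t+h} \left( v_t - Q(\alpha)\, v_x + \tfrac{\gamma^2}{2} Q(\alpha)^2 v_{xx} \right) ds
   - \gamma \int_t^{t+h} Q(\alpha)\, v_x\, dW(s) .
\end{align*}
```

The dW integral has zero expectation. Substituting the second line into the first, cancelling v(t, x), dividing by h and letting h → 0 gives (7).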
HJB for pricing problem
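Find $v : [0, T] \times [0, \infty) \to \mathbb{R}$ such that
$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} = 0 \tag{8} $$
$$ v(T, x) = -C \cdot x \tag{9} $$
$$ v(t, 0) = 0 \tag{10} $$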
HJB → optimal control function
Optimality result
• The value function is the unique viscosity solution to HJB
• The optimal pricing function $a^*(t, x)$ is given by
$$ a^*(t, x) = \operatorname*{arg\,max}_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 v_{xx} - Q(a)\, v_x + a \cdot Q(a) \right\} \tag{11} $$
HJB pricing example
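Explicit HJB form
Let $A = [0, 1]$, $Q(a) = 1 - a$, then
$$ a^*(t, x) = P_A\left[ \frac{1 + v_x - \gamma^2 v_{xx}}{2 - \gamma^2 v_{xx}} \right] \tag{12} $$
Whenever $a^*(t, x) \in \operatorname{interior}(A)$,
$$ v_t + \frac{(1 - v_x)^2}{4 - 2\gamma^2 v_{xx}} = 0 \tag{13} $$
Otherwise,
$$ v_t = 0, \qquad a^*(t, x) = 1 \tag{14} $$
$$ v_t + \frac{\gamma^2}{2} v_{xx} - v_x = 0, \qquad a^*(t, x) = 0 \tag{15} $$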
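The closed form (12) can be checked numerically against a brute-force maximisation of the bracket in the HJB equation. The sketch below reads P_A as projection (clipping) onto A = [0, 1] and assumes γ²v_xx < 2, so that the bracket is concave in a; the function names and test values are made up for illustration.

```python
import numpy as np

def hamiltonian(a, vx, vxx, gamma):
    """Bracket in the HJB equation for Q(a) = 1 - a."""
    q = 1.0 - a
    return 0.5 * gamma**2 * q**2 * vxx - q * vx + a * q

def optimal_price(vx, vxx, gamma):
    """Formula (12): stationary point of the bracket, clipped to A = [0, 1]."""
    denom = 2.0 - gamma**2 * vxx         # assumed positive (concavity in a)
    a = (1.0 + vx - gamma**2 * vxx) / denom
    return np.clip(a, 0.0, 1.0)

# Compare with a grid search for one (hypothetical) pair of derivatives.
vx, vxx, gamma = 0.2, -1.0, 0.05
grid = np.linspace(0.0, 1.0, 100001)
a_grid = grid[np.argmax(hamiltonian(grid, vx, vxx, gamma))]
a_star = optimal_price(vx, vxx, gamma)
print(a_star, a_grid)
```

At an interior maximiser the bracket evaluates to (1 − v_x)²/(4 − 2γ²v_xx), which is the nonlinear term in (13).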
HJB pricing solution
Example
$$ Q(a) = 1 - a, \quad \gamma = 0.05, \quad C = 1. \tag{16} $$
Figure 2: Value function
Figure 2: Optimal pricing strategy
Figure 2: Optimal sales dynamics. [Plots of X(t) and α(t) against t ∈ [0, 3] for six sample paths ω1, ..., ω6.]
Figure 2: α(t) = a∗(t, x) versus α(t) = 2/3. [Two plots of X(t) against t ∈ [0, 3], each with six sample paths ω1, ..., ω6.]
Improved CFL condition
HJB challenges
$$ v_t + \max_{a \in A} \left\{ \frac{\sigma(t, x, a)^2}{2} v_{xx} + b(t, x, a)\, v_x + f(t, x, a) \right\} = 0 \tag{17} $$
$$ v(T, x) = g(x) \tag{18} $$
Numerical approximations
• PDE may have multiple solutions
• Unbounded domains
  • Truncation and artificial boundary conditions
• Global vs. local maxima in optimisation step
Linear Interpolation Semi-Lagrangian Schemes
Scheme with guarantees [1]
• Converges to correct PDE solution
• Standard finite difference schemes can’t guarantee this in higher dimensions
• Explicit time-stepping stability¹: O(∆t) = O(∆x)
• Standard finite difference: O(∆t) = O(∆x²)
¹ Boundaries cause trouble
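Linear Interpolation Semi-Lagrangian Schemes
Idea
• Directional derivative:
$$ b \frac{\partial v}{\partial x} = \lim_{\Delta x \downarrow 0} \frac{v(x + b \Delta x) - v(x)}{\Delta x} \tag{19} $$
• Diffusion:
$$ \sigma^2 \frac{\partial^2 v}{\partial x^2} = \lim_{\Delta x \downarrow 0} \frac{v(x + \sigma \sqrt{\Delta x}) - 2 v(x) + v(x - \sigma \sqrt{\Delta x})}{\Delta x} \tag{20} $$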
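Both limits are easy to check numerically on a smooth test function. In the sketch below the function names, the choice v = sin, and the values of x, b, σ and the step are illustrative assumptions.

```python
import numpy as np

def directional_quotient(v, x, b, h):
    """Difference quotient in (19); tends to b * v'(x) as h -> 0."""
    return (v(x + b * h) - v(x)) / h

def diffusion_quotient(v, x, sigma, h):
    """Difference quotient in (20); tends to sigma^2 * v''(x) as h -> 0."""
    s = sigma * np.sqrt(h)
    return (v(x + s) - 2.0 * v(x) + v(x - s)) / h

x, b, sigma, h = 0.3, -0.5, 0.2, 1e-4
print(directional_quotient(np.sin, x, b, h), b * np.cos(x))
print(diffusion_quotient(np.sin, x, sigma, h), -sigma**2 * np.sin(x))
```

The diffusion quotient steps a distance proportional to √∆x, not ∆x, which is what lets a scheme built on it take time steps of order ∆x.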
Interpolation
• Grid $x_0, x_1, \ldots, x_N$
$$ b \frac{\partial v}{\partial x}(x_j) \approx \frac{v(x_j + b(x_j, a) \Delta x) - v(x_j)}{\Delta x} \tag{21} $$
• The point $x_j + b(x_j, a) \Delta x$ is not necessarily on the grid
• Linear interpolation guarantees convergence
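Putting the pieces together, one explicit backward step for the pricing HJB (8)-(10) might look as follows. This is a sketch in the spirit of the schemes in [1], not the speaker's implementation. The assumptions are: the drift and diffusion quotients are merged into a single symmetric difference with step parameter k ≈ √∆x, the maximisation is over a finite price grid, and `np.interp` holds the end values constant outside the grid, which is a crude artificial boundary. All names are illustrative.

```python
import numpy as np

def semi_lagrangian_step(v_next, x, dt, k, gamma, prices):
    """One explicit backward step v(t + dt, .) -> v(t, .) with Q(a) = 1 - a."""
    if dt > k**2:
        raise ValueError("need dt <= k**2 for a monotone update")
    best = np.full(x.shape, -np.inf)
    for a in prices:
        q = 1.0 - a
        b, sigma, f = -q, gamma * q, a * q     # drift, volatility, reward
        # Off-grid values are obtained by linear interpolation.
        up = np.interp(x + k**2 * b + k * sigma, x, v_next)
        down = np.interp(x + k**2 * b - k * sigma, x, v_next)
        # Approximates b v_x + (sigma^2 / 2) v_xx.
        generator = (up - 2.0 * v_next + down) / (2.0 * k**2)
        best = np.maximum(best, generator + f)
    v = v_next + dt * best
    v[0] = 0.0                                 # boundary condition v(t, 0) = 0
    return v

# Illustrative run: truncated domain [0, 2], horizon T = 1.
C, gamma, T = 1.0, 0.05, 1.0
x = np.linspace(0.0, 2.0, 201)
k = np.sqrt(x[1] - x[0])                       # k^2 = dx, so dt = O(dx)
n_steps = int(np.ceil(T / k**2)) + 1           # guarantees dt < k^2
dt = T / n_steps
prices = np.linspace(0.0, 1.0, 51)
v_terminal = -C * x
v = v_terminal.copy()
for _ in range(n_steps):
    v = semi_lagrangian_step(v, x, dt, k, gamma, prices)
```

The restriction dt ≤ k² keeps each update a convex combination of interpolated values plus the reward, and with k² = ∆x this is the O(∆t) = O(∆x) condition.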
Wrapping up
Summary and future work
Summary
• Dynamic pricing problems can be solved with HJB
• Semi-Lagrangian schemes make explicit time-stepping feasible
• (More risk-averse — lower price)
Future work
• Model demand as a positive process
• Generalise HJB solver
  • Any dimension
  • Optimisation algorithms
  • Better handling of truncated boundaries
References
[1] K. Debrabant and E. Jakobsen. Semi-Lagrangian schemes for linear and fully non-linear diffusion equations. Mathematics of Computation, 82(283):1433–1462, 2013.
[2] H. Pham. Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media, 2009.
Risk-averse decision maker
Should the pricing strategy depend on how much you’ve already earned?
Depends on risk-attitude
Larger state space
Include accumulated revenue in the state:
$$ dX^\alpha(t) = -Q(\alpha(t))\,\big(dt + \gamma\, dW(t)\big) \tag{22} $$
$$ dR^\alpha(t) = -\alpha(t)\, dX^\alpha(t) \tag{23} $$
Profit at time T, following strategy α(t) = a(t, X(t), R(t)):
$$ P(\alpha) = R^\alpha(T) - C \cdot X^\alpha(T) \tag{24} $$
• Risk-neutral: Maximise $\mathbb{E}[P(\alpha)]$
• Risk-averse: Maximise $\mathbb{E}[u(P(\alpha))]$
  • Utility function u(p) is increasing and concave
  • Losses are more painful than pleasure from gains
HJB with revenue
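HJB equation with revenue
Find $v : [0, T] \times [0, \infty) \times \mathbb{R} \to \mathbb{R}$ such that
$$ v_t + \max_{a \in A} \left\{ \frac{\gamma^2}{2} Q(a)^2 \left[ v_{xx} - 2a\, v_{xr} + a^2 v_{rr} \right] - Q(a)\, v_x + a\, Q(a)\, v_r \right\} = 0 $$
$$ v(t, 0, r) = u(r) $$
$$ v(T, x, r) = u(r - C \cdot x) $$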
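The second-order bracket follows from Itô’s lemma applied to v(t, X, R). A sketch of the bookkeeping, using (dW)² = dt and the dynamics (22)-(23) with the price held at a:

```latex
\begin{gather*}
dX = -Q(a)\,(dt + \gamma\, dW), \qquad dR = -a\, dX = a\, Q(a)\,(dt + \gamma\, dW), \\
(dX)^2 = \gamma^2 Q(a)^2\, dt, \qquad
dX\, dR = -a\, \gamma^2 Q(a)^2\, dt, \qquad
(dR)^2 = a^2 \gamma^2 Q(a)^2\, dt, \\
\tfrac{1}{2} v_{xx}\, (dX)^2 + v_{xr}\, dX\, dR + \tfrac{1}{2} v_{rr}\, (dR)^2
  = \tfrac{\gamma^2}{2} Q(a)^2 \left[ v_{xx} - 2a\, v_{xr} + a^2 v_{rr} \right] dt .
\end{gather*}
```

The first-order terms −Q(a)v_x + aQ(a)v_r come from the dt parts of dX and dR. There is no running reward term, since revenue is now carried in the state r and enters only through the terminal and boundary conditions.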
Risk profiles
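Absolute risk aversion, $A(p) = -\dfrac{u''(p)}{u'(p)}$
• Decreasing: Power
$$ u(p) = \frac{(p + m)^\mu}{\mu}, \qquad \mu < 1,\ \mu \neq 0 \tag{25} $$
• Constant: Exponential
$$ u(p) = \frac{1 - e^{-\mu p}}{\mu}, \qquad \mu > 1 \tag{26} $$
• Increasing: Quadratic
$$ u(p) = p - \mu p^2 \tag{27} $$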
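Computing A(p) for each family confirms the labels (a short check, valid where u′(p) > 0):

```latex
\begin{align*}
\text{Power:} \quad & u'(p) = (p+m)^{\mu-1}, \; u''(p) = (\mu-1)(p+m)^{\mu-2}
   && \Rightarrow \; A(p) = \frac{1-\mu}{p+m}, \\
\text{Exponential:} \quad & u'(p) = e^{-\mu p}, \; u''(p) = -\mu e^{-\mu p}
   && \Rightarrow \; A(p) = \mu, \\
\text{Quadratic:} \quad & u'(p) = 1 - 2\mu p, \; u''(p) = -2\mu
   && \Rightarrow \; A(p) = \frac{2\mu}{1 - 2\mu p} .
\end{align*}
```

For µ < 1 the power case decreases in p. The quadratic case, with µ > 0, increases in p on p < 1/(2µ), which is also the range where u is increasing.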
HJB utility solution
Example
$$ Q(a) = 1 - a, \quad \gamma = 0.2, \quad C = 1. \tag{28} $$
Figure 3: Optimal pricing strategies for different risk-attitudes. [Plot of a(t, x, r) against r ∈ [0, 1] at t = 0.5, x = 0.3, for the Power, Quadratic, Exponential and Neutral utilities.]