Peter Carr and Qiji Jim Zhu Convex Duality and Financial Mathematics April 11, 2017


Peter Carr and Qiji Jim Zhu

Convex Duality and Financial Mathematics

April 11, 2017

To Carol and Olivia

To Lilly and Charles.

And in memory of Jonathan Borwein (1951-2016) with respect.

Preface

Convex duality plays an essential role in many important financial problems. For example, it arises both in the minimization of convex risk measures and in the maximization of concave utility functions. Together with generalized convex duality, it also appears when an optimization is not immediately apparent, for instance in implementing dynamic hedging of contingent claims. Recognizing the role of convex duality in financial problems is crucial for several reasons. First, considering the primal and dual problems together gives the financial modeler the option to tackle the more accessible problem first. Usually, knowledge of the solution of one helps in solving the other. Moreover, the solution to the dual problem can usually be given a financial interpretation. As a result, the dual problem often illuminates an alternative perspective, which is not easily achieved by examining the primal problem in isolation. When flipping from the primal to the dual, a surprising insight typically awaits, irrespective of past experience. Finally, as an added benefit, the primal and the dual can often be paired together to provide better numerical solutions than when either side is considered in isolation.

The goal of this book is to provide a concise introduction to this growing research field. Our target audience is graduate students and researchers in related areas. We begin in Chapter 1 with a quick introduction of convex duality and related tools. We emphasize the relationship between convex duality and the Lagrange multiplier rule for constrained optimization problems. We then give a quick overview of the intrinsic duality relationship in several diverse financial problems.

In Chapter 2, we consider the simplest possible financial market model. In particular, we consider a one period economy with a finite number of possible states. Using this simple financial market model, we showcase convex duality in a number of important financial problems. We begin with the Markowitz portfolio theory, which involves a particularly simple convex programming problem: optimizing a quadratic function with linear constraints. Duality plays two important roles in Markowitz portfolio theory. First, while the primal problem may involve hundreds or even thousands of variables representing the risky assets potentially included in the portfolio, the dual problem has only two variables related to the two constraints on the initial endowment and the expected return. In fact, the key observation of Markowitz is that one can evaluate the performance of a portfolio in the dual space using the variance and expected return pair. Second, the duality relationship between the primal Markowitz portfolio problem and its dual helps us to understand that the set of optimal portfolios is an affine set, which leads to the important two fund theorem. The core methodology of optimizing a quadratic function with linear constraints is also used in the capital asset pricing model, which leads to the widely used Sharpe ratio. Duality also plays a crucial role in this problem.

Next, we consider portfolio optimization from the perspective of maximizing expected utility. There has been a very long history of using utility functions in economics. In financial problems, utility functions are increasing concave functions of wealth. The concavity of the utility function captures the risk aversion of an investor. Arrow and Pratt introduced widely used measures of the level of risk aversion. It turns out that there is a precise way of using generalized convexity to characterize Pratt–Arrow risk aversion. This application illustrates the relevance of generalized convexity in dealing with financial problems. It is even more interesting to consider the dual of the expected utility maximization problem. It turns out that in the absence of arbitrage, solutions to the dual problem are in essence the equivalent martingale measures (also called risk-neutral probabilities), which are widely used in pricing financial derivatives. Considering the expected utility maximization problem along with its dual leads us to rediscover the fundamental theorem of asset pricing. An added benefit of this alternative approach is that martingale measures can be related to the risk aversion of agents in the market.

The last application that we cover in Chapter 2 concerns the dual representation of coherent risk measures. Coherent risk measures are motivated by the common regulatory practice of assigning each position in a risky asset the appropriate amount of cash reserves. Hence, they are widely used to analyze risks. Mathematically, a coherent risk measure is characterized by a sub-linear function: a convex function with positive homogeneity. It is well known that the dual of a sub-linear function is an indicator function. Thus, using dual representation, a coherent risk measure is just the support function of a closed convex set. Financially, we can view the generating set of a coherent risk measure as the probabilities assigned to risky scenarios in a stress test. Duality also generates numerical methods for calculating some important coherent risk measures such as the conditional value at risk.

We expand our discussion to a more general multi-period financial market model in Chapter 3. This more general setting allows us to model dynamic trading. The added complexity in dealing with a multi-period model mainly involves capturing the increase in information using an information structure. After laying out the multi-period financial market model, we show that the fundamental theorem of asset pricing also arises in a multi-period financial market model. After that we also discuss two new topics: super (sub) hedging and conic finance. In general, the absence of arbitrage leads to multiple (usually infinitely many) pricing martingale measures in an incomplete market. Thus, the no arbitrage principle usually determines a price range for a contingent claim with upper and lower bounds, which are given by the supremum and the infimum of the expectation of the payoff under the martingale measures, respectively. If a market price falls outside of these bounds, then an arbitrage opportunity occurs. It turns out that the dual solution to the optimization problem of finding the upper or lower no arbitrage bounds provides a trading strategy that one can use to take advantage of such an arbitrage opportunity. Conic finance is used to describe financial markets for which the absolute value of the price depends on whether one is buying or selling. In other words, conic finance describes realistic financial markets with a strictly positive bid–ask spread. In such a model, the cash flows that can be achieved from implementing acceptable trading strategies form a convex cone. This observation provides the rationale for the name conic finance. Despite the added complication of dealing with a conic constraint, we show that most of the duality relationships that are observed under zero bid–ask spread still prevail when the spread is positive.

We then move to continuous-time financial models in Chapter 4. The most noteworthy duality relationship developed in this chapter is the observation that the classical Black–Scholes formula for pricing a contingent claim with a convex payoff is, in fact, a Fenchel–Legendre transform. We show that the function describing cash borrowings while delta hedging a short position in a contingent claim is just the Fenchel conjugate of the contingent claim pricing function. The flip side is that the contingent claim pricing function can itself be viewed as a Fenchel conjugate of the function describing these cash borrowings. This provides a new perspective on the convex function linking the price of the contingent claim to the underlying spot price. With the availability of many tradable contingent claims such as those embedded in ETFs, the ability to dynamically hedge a contingent claim with other contingent claims is increasingly becoming a financial reality. Interestingly, when using contingent claims as hedging instruments, one discovers a similar duality relationship between the contingent claim pricing function and the cash borrowings function in terms of generalized convexity. Many useful applications are also discussed in this chapter. We also examine the convexity and generalized convexity of the Bachelier and Black–Scholes option pricing formulae with respect to volatility. Generalizations of these properties might be useful in dealing with financial products related to volatility and may be a fruitful direction for future research.

The material in this book grew out of slides used to teach a joint doctoral seminar at New York University's Courant Institute in the fall of 2015. Part of the material has also been used previously for graduate topic courses on optimization and modelling at Western Michigan University. We thank our colleagues at both NYU and WMU for providing us with supportive research environments. Professor Robert Kohn helped to arrange for us to become neighbors, which facilitated our collaboration in no small part. Conversations with Professors Marco Avellaneda, Jonathan Goodman and Fang-Hua Lin have been most helpful. We are also indebted to the participants of these courses for many stimulating discussions. In particular, we thank Monty Essid, Tom Li, Matthew Foreman, Sanjay Karanth, Jay Treiman, Mehdi Vazifadan, and Guolin Yu, whose detailed comments on various parts of our lecture notes have been incorporated into the text.

New York, NY Peter Carr
Kalamazoo, MI Qiji Jim Zhu

April, 2017

Contents

1 Convex Duality
1.1 Convex Sets and Functions
1.2 Subdifferential and Lagrange Multiplier
1.3 Fenchel Conjugate
1.4 Convex Duality Theory
1.5 Generalized Convexity, Conjugacy and Duality

2 Financial Models in One Period Economy
2.1 Portfolio
2.2 Utility Functions
2.3 Fundamental Theorem of Asset Pricing
2.4 Risk Measures

3 Finite Period Financial Models
3.1 The Model
3.2 Arbitrage and Admissible Trading Strategies
3.3 Fundamental Theorem of Asset Pricing
3.4 Hedging and Super Hedging
3.5 Conic Finance

4 Continuous Financial Models
4.1 Continuous Stochastic Processes
4.2 Bachelier and Black-Scholes Formulae
4.3 Duality and Delta Hedging
4.4 Generalized Duality and Hedging with Contingent Claims

References

Index

1

Convex Duality

Summary. We present a concise description of convex duality theory in this chapter. The goal is to lay a foundation for later application in various financial problems rather than to be comprehensive. We emphasize the role of the subdifferential of the value function of a convex programming problem. It is both the set of Lagrange multipliers and the set of solutions to the dual problem. These relationships provide much convenience in financial applications. We also discuss generalized convexity, conjugacy and duality.

1.1 Convex Sets and Functions

1.1.1 Definitions

Definition 1.1.1 (Convex Sets and Functions) Let X be a Banach space. We say that a subset C of X is a convex set if, for any x, y ∈ C and any λ ∈ [0, 1], λx + (1 − λ)y ∈ C. We say an extended-valued function f : X → R ∪ {+∞} is a convex function if its domain, dom f, is convex and for any x, y ∈ dom f and any λ ∈ [0, 1], one has

f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y).

We call f : X → [−∞,+∞) a concave function if −f is convex.

In some sense convex functions are the simplest functions next to linear functions. Convex sets and functions are intrinsically related. For example, it is easy to verify that C is a convex set if and only if its indicator function, ιC(x) := 0 if x ∈ C and +∞ otherwise, is a convex function. On the other hand, if f is a convex function then the epigraph of f, epi f := {(x, r) | f(x) ≤ r}, and the sublevel sets f⁻¹((−∞, a]) := {x | f(x) ∈ (−∞, a]}, a ∈ R, are convex sets. In fact, we can check that the convexity of epi f characterizes that of f. This geometric characterization is very useful in many situations. For instance, it is easy to see that the intersection of a class of convex sets is convex. Now let {fα} be a class of convex functions; we can see that

epi (supα fα) = ∩α epi fα


and, thus, supα fα is convex. In particular, the support function of a set C ⊂ X, defined on the dual space X∗ by

σC(x∗) = σ(C; x∗) := sup{⟨x, x∗⟩ | x ∈ C},

is always convex. Note that allowing the extended value +∞ in the definition of a convex function is important in establishing these relations.

An important property of convex functions related to applications in economics and finance is Jensen's inequality.

Proposition 1.1.2 (Jensen's Inequality) Let f be a convex function. Then, for any random variable X on a finite probability space,

f(E[X]) ≤ E[f(X)].

When X has only finitely many states, this result follows directly from the definition. The general result can be proven by approximation.
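The finite-state case can be checked numerically. The following is a minimal sketch, assuming an illustrative four-state random variable and the convex function f = exp; the states, probabilities and helper name `expectation` are our own choices and do not come from the text.

```python
import math

def expectation(values, probs):
    """Expected value of a random variable on a finite probability space."""
    return sum(p * v for p, v in zip(probs, values))

# Hypothetical finite probability space: four states and their probabilities.
states = [-1.0, 0.5, 2.0, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]

f = math.exp  # a convex function on the real line

lhs = f(expectation(states, probs))               # f(E[X])
rhs = expectation([f(x) for x in states], probs)  # E[f(X)]
jensen_gap = rhs - lhs                            # nonnegative by Jensen's inequality

print(f"f(E[X]) = {lhs:.4f}, E[f(X)] = {rhs:.4f}, gap = {jensen_gap:.4f}")
```

With finitely many states, E[X] is a convex combination of the states, so the inequality is the definition of convexity applied repeatedly.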

A special kind of convex set, the convex cone, is very useful.

Definition 1.1.3 Let X be a finite dimensional Banach space. We say K ⊂ X is a convex cone if for any x, y ∈ K and any α, β ≥ 0, αx + βy ∈ K. Moreover, we say K is pointed if K ∩ (−K) = {0}.

A pointed convex cone K induces a partial order ≤K by defining x ≤K y if and only if y − x ∈ K. We can easily check that ≤K is reflexive (x ≤K x), antisymmetric (x ≤K y and y ≤K x implies x = y) and transitive (x ≤K y and y ≤K z implies that x ≤K z). The definition of convexity can easily be extended to mappings whose image space has such a partial order.

Definition 1.1.4 (Convex Mappings) Let X and Y be two Banach spaces. Assume that Y has a partial order ≤K generated by the pointed convex cone K ⊂ Y. We say that a mapping f : X → Y is K-convex provided that, for any x, y ∈ dom f and any λ ∈ [0, 1], one has

f(λx+ (1− λ)y) ≤K λf(x) + (1− λ)f(y).

1.1.2 Convex Programming

We will often encounter various forms of the general convex programming problem below in financial applications in subsequent chapters. Let X, Y and Z be finite dimensional Banach spaces. Assume that Y has a partial order ≤K generated by the pointed convex cone K. We denote the polar cone of K by

K+ := {y∗ ∈ Y∗ : ⟨y∗, y⟩ ≥ 0 for all y ∈ K}.

Consider the following class of constrained optimization problems


P (y, z) Minimize f(x) (1.1.1)

Subject to g(x) ≤K y,

h(x) = z,

x ∈ C,

where C is a closed set, f : X → R is lower semicontinuous, g : X → Y is lower semicontinuous with respect to ≤K, and h : X → Z is continuous. We will use v(y, z) to represent the optimal value function

v(y, z) := inf{f(x) : g(x) ≤K y, h(x) = z, x ∈ C},

which may take the values ±∞ (in the infeasible or unbounded-below cases), and S(y, z) to represent the (possibly empty) solution set of problem P(y, z).

A concrete example is

Minimize f(x) (1.1.2)

Subject to g_m(x) ≤ y_m, m = 1, 2, . . . , M,

h_k(x) = z_k, k = 1, 2, . . . , K,

x ∈ C ⊂ R^N,

where C is a closed subset, f, g_m : R^N → R are lower semicontinuous and h_k : R^N → R are continuous. Defining the vector-valued functions g = (g_1, g_2, . . . , g_M) and h = (h_1, h_2, . . . , h_K), problem (1.1.2) becomes problem (1.1.1) with the cone K = R^M_+ (the nonnegative orthant of R^M), so that ≤K is the componentwise order. Besides Euclidean spaces, for applications in this book we will often need to consider the Banach space of random variables.

It turns out that the optimal value function of a convex programming problem is convex.

Proposition 1.1.5 (Convexity of Optimal Value Function) Suppose that in the constrained optimization problem (1.1.1), the function f is convex, the mapping g is K-convex, the mapping h is affine and the set C is convex. Then the optimal value function v is convex.

Proof. Consider (y_i, z_i), i = 1, 2, in the domain of v and an arbitrary ε > 0. We can find x^i_ε feasible for problem P(y_i, z_i) such that

f(x^i_ε) < v(y_i, z_i) + ε, i = 1, 2. (1.1.3)

Now for any λ ∈ [0, 1], we have

f(λx^1_ε + (1 − λ)x^2_ε) ≤ λf(x^1_ε) + (1 − λ)f(x^2_ε) (1.1.4)

< λv(y_1, z_1) + (1 − λ)v(y_2, z_2) + ε.

It is easy to check that λx^1_ε + (1 − λ)x^2_ε is feasible for problem P(λ(y_1, z_1) + (1 − λ)(y_2, z_2)). Thus, v(λ(y_1, z_1) + (1 − λ)(y_2, z_2)) ≤ f(λx^1_ε + (1 − λ)x^2_ε). Combining this with inequality (1.1.4) and letting ε → 0 we arrive at

v(λ(y_1, z_1) + (1 − λ)(y_2, z_2)) ≤ λv(y_1, z_1) + (1 − λ)v(y_2, z_2),

that is to say v is convex. •

This is a very potent result that can help us to recognize the convexity of many other functions. For example, let C be a convex set; then the distance function to C defined by dC(z) = inf[∥z − c∥ : c ∈ C] is a convex function because we can rewrite it as the optimal value of the following special case of problem (1.1.1):

dC(z) = inf[∥x∥ : x + c = z, c ∈ C].

Similarly, the inf-convolution of two convex functions f and g, defined below, is convex:

(f □ g)(z) := inf_x [f(z − x) + g(x)]

= inf{f(u) + r : g(x) − r ≤ 0, u + x = z}.
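As an illustration of the inf-convolution, the sketch below approximates it on a grid for the illustrative pair f = |·| and g(x) = x²/2, whose inf-convolution is known to be the Huber function. The grid, the helper names `inf_convolution` and `huber`, and the choice of functions are our assumptions, not part of the text.

```python
def inf_convolution(f, g, xs):
    """Return z -> min over grid points x of f(z - x) + g(x) (a grid approximation)."""
    def h(z):
        return min(f(z - x) + g(x) for x in xs)
    return h

def huber(z):
    """Closed form of the inf-convolution of |.| and x^2/2."""
    return 0.5 * z * z if abs(z) <= 1.0 else abs(z) - 0.5

# Grid on [-4, 4] with step 0.001 (an arbitrary discretization choice).
xs = [k / 1000.0 for k in range(-4000, 4001)]
h = inf_convolution(abs, lambda x: 0.5 * x * x, xs)

zs = [-3.0, -1.0, -0.4, 0.0, 0.5, 2.0]
values = {z: h(z) for z in zs}
max_error = max(abs(values[z] - huber(z)) for z in zs)

# Midpoint convexity on the sample points: h((a+b)/2) <= (h(a) + h(b)) / 2.
midpoint_ok = all(
    h(0.5 * (a + b)) <= 0.5 * values[a] + 0.5 * values[b] + 1e-9
    for a in zs for b in zs
)
print(max_error, midpoint_ok)
```

The midpoint test is only a spot check of convexity on a few points, not a proof.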

While the value function of a convex programming problem is always convex, it is not necessarily smooth even if all the data involved are smooth. The following is an example:

v(y) = inf[x : x² ≤ y] = −√y if y ≥ 0, and +∞ if y < 0.
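As a hedged aside, this example can be pushed one step further. The short calculation below (ours, not part of the original text) suggests that v has no subgradient at the boundary point y = 0 of its domain, even though v(0) = 0 is finite. The symbol s is a hypothetical candidate subgradient.

```latex
% A candidate subgradient s of v at 0 must satisfy v(y) - v(0) >= s*y for all y.
% For y < 0 the left side is +infinity, so only y > 0 matters:
\[
  -\sqrt{y} \;\ge\; s\,y \quad \text{for all } y > 0
  \quad\Longleftrightarrow\quad
  s \;\le\; -\frac{1}{\sqrt{y}} \quad \text{for all } y > 0 .
\]
% Letting y decrease to 0 forces s below every real number, so no such s exists:
\[
  \lim_{y \downarrow 0} \frac{v(y) - v(0)}{y} = -\infty ,
  \qquad\text{hence no real } s \text{ qualifies as a subgradient of } v \text{ at } 0 .
\]
```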

1.2 Subdifferential and Lagrange Multiplier

Many naturally arising nonsmooth convex functions lead to the definition of the subdifferential as a replacement for the nonexistent derivative.

1.2.1 Definition

Definition 1.2.1 (Subdifferential) Let X be a finite dimensional Banach space. The subdifferential of a lower semicontinuous function ϕ : X → R ∪ {+∞} at x ∈ dom ϕ is defined by

∂ϕ(x) = {x∗ ∈ X∗ : ϕ(y) − ϕ(x) ≥ ⟨x∗, y − x⟩ for all y ∈ X}.

We define the domain of the subdifferential of ϕ by

dom ∂ϕ = {x ∈ X | ∂ϕ(x) ≠ ∅}.

An element of ∂ϕ(x) is called a subgradient of ϕ at x.

Definition 1.2.2 (Normal Cone) For a closed convex set C ⊂ X, we define the normal cone of C at x ∈ C by N(C; x) = ∂ιC(x).

Sometimes we will also use the notation NC(x) = N(C; x). A useful characterization of the normal cone is that x∗ ∈ N(C; x) if and only if, for all y ∈ C, ⟨x∗, y − x⟩ ≤ 0.

It is easy to verify that if f has a continuous derivative at x then ∂f(x) = {f′(x)}. At a nondifferentiable point, the subdifferential of a convex function is usually a set with more than one element. Here are a few examples.


Example 1.2.3 We can easily verify that

• ∂∥ · ∥(0) = B1(0), the closed ball centered at 0 with radius 1; in particular, ∂| · |(0) = [−1, 1].

• ∂(·)+(0) = [0, 1], where (t)+ := max{t, 0}.

• ∂(·)−(0) = [−1, 0], where (t)− := max{−t, 0}.
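The subgradient inequality behind such examples can be spot-checked numerically. The sketch below tests candidate slopes for the absolute value function at 0 on a finite grid; the helper `is_subgradient`, the grid and the tolerance are illustrative assumptions, and passing the test on a grid is only a necessary condition, not a proof.

```python
def is_subgradient(phi, s, x0, test_points, tol=1e-12):
    """Check phi(y) - phi(x0) >= s * (y - x0) at every test point y (one-dimensional case)."""
    return all(phi(y) - phi(x0) >= s * (y - x0) - tol for y in test_points)

# Test points in [-3, 3]; any finite grid can only refute, never prove, the inequality.
grid = [k / 100.0 for k in range(-300, 301)]

# Slopes in [-1, 1] should pass for the absolute value at 0 ...
inside = [is_subgradient(abs, s, 0.0, grid) for s in (-1.0, -0.3, 0.0, 0.7, 1.0)]
# ... while slopes outside [-1, 1] should fail.
outside = [is_subgradient(abs, s, 0.0, grid) for s in (-1.5, 1.01)]

print(inside, outside)
```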

1.2.2 Nonemptiness of Subdifferential

A natural and important question is when we can ensure that the subdifferential is nonempty. The following Fenchel–Rockafellar theorem provides a basic form of sufficient conditions.

Theorem 1.2.4 (Fenchel–Rockafellar Theorem on Nonemptiness of Subdifferential) Let f : X → R ∪ {+∞} be a convex function. Suppose that x̄ ∈ int(dom f). Then the subdifferential ∂f(x̄) is nonempty.

Proof. We observe that (x̄, f(x̄)) is a boundary point of the closed set epi f, which has a nonempty interior. Thus, by the Hahn–Banach extension theorem there exists a supporting hyperplane of epi f at (x̄, f(x̄)) whose normal vector is (0, 0) ≠ (x∗, r) ∈ X∗ × R. Now, for any x ∈ dom f and u ≥ f(x), we have

r(u − f(x̄)) + ⟨x∗, x − x̄⟩ ≥ 0. (1.2.5)

Since u ≥ f(x) is arbitrary, r ≥ 0. Moreover, if r = 0, then x̄ ∈ int(dom f) would also imply x∗ = 0, which yields a contradiction. Thus, r > 0. Letting u = f(x) in (1.2.5) we see that −x∗/r ∈ ∂f(x̄). •

Remark 1.2.5 (Constraint Qualification: Relative Interior) The Fenchel–Rockafellar Theorem is a fundamental result that we will use often in the sequel. The condition x̄ ∈ int(dom f) is a sufficient condition that can be improved. Notice that we don't need to worry about points at which f = +∞. Thus, we need only check the condition of Theorem 1.2.4 on span(dom f). Hence, the condition x̄ ∈ int(dom f) can be relaxed to x̄ ∈ ri(dom f) with f lower semicontinuous, where ri signifies the relative interior, i.e. the interior relative to span(dom f).

Remark 1.2.6 (Constraint Qualification: Polyhedral Problem) Recall that a set is polyhedral if it is the intersection of finitely many closed half-spaces. A function is polyhedral if its epigraph is a polyhedral set. The subdifferential of a polyhedral function is nonempty at any point of its domain (see e.g. [7]). This sufficient condition is very useful in dealing with linear programming problems.

The conclusion ∂f(x̄) ≠ ∅ can be stated alternatively as follows: there exists a linear functional x∗ such that f − x∗ attains its minimum at x̄. This is a very useful perspective on the use of variational arguments, that is, deriving results by observing that a certain auxiliary function attains a minimum or maximum.


1.2.3 Calculus

For more complicated convex functions we need the help of a convenient calculus for calculating or estimating their subdifferentials. It turns out that the key to developing such a calculus is to combine a decoupling mechanism with the existence of a subgradient. We summarize this idea in the following lemma.

Lemma 1.2.7 (Decoupling Lemma) Let the functions f : X → R and g : Y → R be convex and let A : X → Y be a linear transform. Suppose that f, g and A satisfy the condition

0 ∈ ri[dom g − A dom f]. (1.2.6)

Then there is a y∗ ∈ Y∗ such that for any x ∈ X and y ∈ Y,

p ≤ [f(x) − ⟨y∗, Ax⟩] + [g(y) + ⟨y∗, y⟩], (1.2.7)

where p = inf_{x∈X} {f(x) + g(Ax)}.

Proof. Define an optimal value function v : Y → [−∞, +∞] by

v(u) = inf_{x∈X} {f(x) + g(Ax + u)} = inf_{x∈X} {f(x) + g(y) : y − Ax = u}. (1.2.8)

Proposition 1.1.5 implies that v is convex. Moreover, it is easy to check that dom v = dom g − A dom f, so that the constraint qualification condition (1.2.6) ensures that ∂v(0) ≠ ∅. Let −y∗ ∈ ∂v(0). By definition we have

v(0) = p ≤ v(y − Ax) + ⟨y∗, y − Ax⟩ ≤ f(x) + g(y) + ⟨y∗, y − Ax⟩. (1.2.9)

•

We apply the decoupling lemma (Lemma 1.2.7) to establish a sandwich theorem.

Theorem 1.2.8 (Sandwich Theorem) Let f : X → R ∪ {+∞} and g : Y → R ∪ {+∞} be convex functions and let A : X → Y be a linear map. Suppose that f ≥ −g ∘ A and that f, g and A satisfy condition (1.2.6). Then there is an affine function α : X → R of the form α(x) = ⟨A∗y∗, x⟩ + r satisfying f ≥ α ≥ −g ∘ A. Moreover, for any x̄ satisfying f(x̄) = −g ∘ A(x̄), we have −y∗ ∈ ∂g(Ax̄).

Proof. By Lemma 1.2.7 there exists y∗ ∈ Y∗ such that for any x ∈ X and y ∈ Y,

0 ≤ p ≤ [f(x) − ⟨y∗, Ax⟩] + [g(y) + ⟨y∗, y⟩]. (1.2.10)

For any z ∈ X, setting y = Az in (1.2.10) we have

f(x) − ⟨A∗y∗, x⟩ ≥ −g(Az) − ⟨A∗y∗, z⟩. (1.2.11)

Thus,

a := inf_{x∈X} [f(x) − ⟨A∗y∗, x⟩] ≥ b := sup_{z∈X} [−g(Az) − ⟨A∗y∗, z⟩].

Picking any r ∈ [b, a], α(x) := ⟨A∗y∗, x⟩ + r is an affine function that separates f and −g ∘ A. Finally, when f(x̄) = −g ∘ A(x̄), it follows from (1.2.10) that −y∗ ∈ ∂g(Ax̄). •

We now use the tools established above to deduce calculus rules for convex functions. We start with a sum rule playing a role similar to the sum rule for derivatives in calculus.

Theorem 1.2.9 (Convex Subdifferential Sum Rule) Let f : X → R ∪ {+∞} and g : Y → R ∪ {+∞} be convex functions and let A : X → Y be a linear map. Then at any point x in X, we have the sum rule

∂(f + g ∘ A)(x) ⊃ ∂f(x) + A∗∂g(Ax), (1.2.12)

with equality if condition (1.2.6) holds.

Proof. Inclusion (1.2.12) is easy and left as an exercise. We prove the reverse inclusion under condition (1.2.6). Suppose x∗ ∈ ∂(f + g ∘ A)(x̄). Since shifting by a constant does not change the subdifferential of a convex function, we may assume without loss of generality that

x ↦ f(x) + g(Ax) − ⟨x∗, x⟩

attains its minimum 0 at x = x̄. By the sandwich theorem there exists an affine function α(x) := ⟨A∗y∗, x⟩ + r with −y∗ ∈ ∂g(Ax̄) such that

f(x) − ⟨x∗, x⟩ ≥ α(x) ≥ −g(Ax).

Clearly equality is attained at x = x̄. It is now an easy matter to check that x∗ + A∗y∗ ∈ ∂f(x̄), and hence x∗ = (x∗ + A∗y∗) + A∗(−y∗) ∈ ∂f(x̄) + A∗∂g(Ax̄). •

Note that when A is the identity mapping and both f and g are differentiable, Theorem 1.2.9 recovers the sum rule in calculus. The geometrical interpretation of this is that one can find a hyperplane in X × R that separates the epigraph of f and the hypograph of −g, i.e. {(x, r) : −g(x) ≥ r}.

By applying the subdifferential sum rule to the indicator functions of two convex sets we obtain a parallel result for the normal cone to the intersection of convex sets.

Theorem 1.2.10 (Normals to an Intersection) Let C1 and C2 be two convex subsets of X and let x ∈ C1 ∩ C2. Suppose that C1 ∩ int C2 ≠ ∅. Then

N(C1 ∩ C2; x) = N(C1; x) + N(C2; x).

Proof. Apply the subdifferential sum rule to the indicator functions of C1 and C2. •

The condition (1.2.6) is often referred to as a constraint qualification. Without it the equality in the convex subdifferential sum rule may not hold (Exercise ??).


1.2.4 Role in Convex Programming

The subdifferential plays important roles in convex programming. First, for the unconstrained convex minimization problem we have Fermat's rule:

Proposition 1.2.11 (Subdifferential at Optimality) Let X be a Banach space and let f : X → R ∪ {+∞} be a proper convex function. Then the point x̄ ∈ X is a (global) minimizer of f if and only if the condition 0 ∈ ∂f(x̄) holds.

Proof. We only need to observe that x̄ ∈ X is a minimizer of f if and only if, for all x ∈ X,

f(x) − f(x̄) ≥ 0 = ⟨0, x − x̄⟩,

which by definition is equivalent to 0 ∈ ∂f(x̄). •

Alternatively put, minimizers of f correspond exactly to “zeroes” of ∂f.

Consider the constrained convex optimization problem

CP minimize f(x) (1.2.13)

subject to x ∈ C ⊂ X,

where C is a closed convex subset of X and f : X → R ∪ {+∞} is a convex lower semicontinuous function. Combining Fermat's rule with the subdifferential sum rule we derive a characterization of solutions to CP.

Theorem 1.2.12 (Pshenichnii–Rockafellar Conditions) Let C be a closed convex subset of X and let f : X → R ∪ {+∞} be a convex function. Suppose that C ∩ cont f ≠ ∅ and f is bounded below on C. Then x̄ is a solution of CP if and only if it satisfies

0 ∈ ∂f(x̄) + N(C; x̄).

Proof. Apply the convex subdifferential sum rule of Theorem 1.2.9 to f + ιC at x̄. •

Finally we turn to the relationship between the subdifferential of the optimal value function in convex programming and Lagrange multipliers. As we shall see from the two versions of the Lagrange multiplier rule given below, the subdifferential of the optimal value function completely characterizes the set of Lagrange multipliers (denoted λ in these theorems).

Theorem 1.2.13 (Lagrange Multiplier without Existence of Optimal Solution) Let v(y, z) be the optimal value function of the constrained optimization problem P(y, z). Then −λ ∈ ∂v(0, 0) if and only if

(i) (nonnegativity) λ ∈ K+ × Z∗; and

(ii) (unconstrained optimum) for any x ∈ C,

f(x) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0).


Proof. (a) The “only if” part. Suppose that −λ ∈ ∂v(0, 0). It is easy to see that v(y, 0) is non-increasing with respect to the partial order ≤K. Thus, for any y ∈ K,

0 ≥ v(y, 0)− v(0, 0) ≥ ⟨−λ, (y, 0)⟩

so that λ ∈ K+ × Z∗. Conclusion (ii) follows from the fact that for all x ∈ C,

f(x) + ⟨λ, (g(x), h(x))⟩ ≥ v(g(x), h(x)) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0). (1.2.14)

(b) The “if” part. Suppose λ satisfies conditions (i) and (ii). Then we have, for any x ∈ C with g(x) ≤K y and h(x) = z,

f(x) + ⟨λ, (y, z)⟩ ≥ f(x) + ⟨λ, (g(x), h(x))⟩ ≥ v(0, 0). (1.2.15)

Taking the infimum of the leftmost term under the constraints x ∈ C, g(x) ≤K y and h(x) = z, we arrive at

v(y, z) + ⟨λ, (y, z)⟩ ≥ v(0, 0). (1.2.16)

Therefore, −λ ∈ ∂v(0, 0). •

If we denote by Λ(y, z) the multipliers satisfying (i) and (ii) of Theorem 1.2.13 then we may write the useful set equality

Λ(0, 0) = −∂v(0, 0).

The next corollary is now immediate. It is often a useful variant since h may well be affine.

Corollary 1.2.14 (Lagrange Multiplier without Existence of Optimal Solution) Let v(y, z) be the optimal value function of the constrained optimization problem P(y, z). Then −λ ∈ ∂v(0, 0) if and only if

(i) (nonnegativity) λ ∈ K+ × Z∗;

(ii) (unconstrained optimum) for any x ∈ C satisfying g(x) ≤K y and h(x) = z,

f(x) + ⟨λ, (y, z)⟩ ≥ v(0, 0).

When an optimal solution for the problem P(0, 0) exists, we can also derive a so-called complementary slackness condition.

Theorem 1.2.15 (Lagrange Multiplier when Optimal Solution Exists) Let v(y, z) be the optimal value function of the constrained optimization problem P(y, z). Then the pair (x̄, λ) satisfies −λ ∈ ∂v(0, 0) and x̄ ∈ S(0, 0) if and only if

(i) (nonnegativity) λ ∈ K+ × Z∗;

(ii) (unconstrained optimum) the function

x ↦ f(x) + ⟨λ, (g(x), h(x))⟩

attains its minimum over C at x̄;

(iii) (complementary slackness) ⟨λ, (g(x̄), h(x̄))⟩ = 0.


Proof. (a) The “only if” part. Suppose that x̄ ∈ S(0, 0) and −λ ∈ ∂v(0, 0). As in the proof of Theorem 1.2.13 we can show that λ ∈ K+ × Z∗. By the definition of the subdifferential and the fact that v(g(x̄), h(x̄)) = v(0, 0), we have

0 = v(g(x̄), h(x̄)) − v(0, 0) ≥ ⟨−λ, (g(x̄), h(x̄))⟩ ≥ 0,

so that the complementary slackness condition ⟨λ, (g(x̄), h(x̄))⟩ = 0 holds.

Observing that v(0, 0) = f(x̄) + ⟨λ, (g(x̄), h(x̄))⟩, the strengthened unconstrained optimum condition follows directly from that of Theorem 1.2.13.

(b) The “if” part. Let λ and x̄ satisfy conditions (i), (ii) and (iii). Then, for any x ∈ C satisfying g(x) ≤K 0 and h(x) = 0,

f(x) ≥ f(x) + ⟨λ, (g(x), h(x))⟩ (1.2.17)

≥ f(x̄) + ⟨λ, (g(x̄), h(x̄))⟩ = f(x̄).

That is to say x̄ ∈ S(0, 0).

Moreover, for any x ∈ C with g(x) ≤K y and h(x) = z, we have f(x) + ⟨λ, (y, z)⟩ ≥ f(x) + ⟨λ, (g(x), h(x))⟩. Since v(0, 0) = f(x̄), by (1.2.17) we have

f(x) + ⟨λ, (y, z)⟩ ≥ f(x̄) = v(0, 0). (1.2.18)

Taking the infimum on the left hand side of (1.2.18) yields

v(y, z) + ⟨λ, (y, z)⟩ ≥ v(0, 0),

which is to say, −λ ∈ ∂v(0, 0). •

We can deduce from Theorems 1.2.13 and 1.2.15 that ∂v(0, 0) completely characterizes the set of Lagrange multipliers.
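A one-dimensional toy problem may help to see the three conditions at work. The sketch below takes the illustrative problem of minimizing x² subject to 1 − x ≤ y, with C = R, K = [0, +∞) and no equality constraint; its optimal value function is v(y) = (1 − y)² for y ≤ 1 and 0 otherwise. The problem, the candidate pair (optimal solution 1, multiplier 2) and all names in the code are our assumptions for illustration, not taken from the text.

```python
def v(y):
    """Optimal value of: minimize x^2 subject to 1 - x <= y (closed form for this toy problem)."""
    return (1.0 - y) ** 2 if y <= 1.0 else 0.0

def lagrangian(x, lam):
    """f(x) + lam * g(x) with f(x) = x^2 and g(x) = 1 - x."""
    return x * x + lam * (1.0 - x)

x_bar, lam = 1.0, 2.0  # candidate optimal solution and multiplier (hypothetical pair)
tol = 1e-12
grid = [k / 50.0 for k in range(-150, 151)]  # sample points in [-3, 3]

# -lam in the subdifferential of v at 0: v(y) - v(0) >= -lam * y on the sample points.
subgradient_ok = all(v(y) - v(0.0) >= -lam * y - tol for y in grid)

# (i) nonnegativity of the multiplier.
nonnegative = lam >= 0.0

# (ii) x -> f(x) + lam * g(x) is minimized at x_bar (checked on the grid).
minimum_ok = all(lagrangian(x, lam) >= lagrangian(x_bar, lam) - tol for x in grid)

# (iii) complementary slackness: lam * g(x_bar) = 0.
slackness = lam * (1.0 - x_bar)

print(subgradient_ok, nonnegative, minimum_ok, slackness)
```

Here the minimum of the Lagrangian equals v(0) = 1, which is the equality used in part (a) of the proof.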

1.3 Fenchel Conjugate

Obtaining Lagrange multipliers by using the convex subdifferential is closely related to the convex duality theory based on the concept of conjugate functions introduced by Fenchel.

1.3.1 The Fenchel Conjugate

The Fenchel conjugate of a function (not necessarily convex) f : X → [−∞, +∞] is the function f∗ : X∗ → [−∞, +∞] defined by

f∗(x∗) = sup_{x∈X} {⟨x∗, x⟩ − f(x)}.

The operation f → f∗ is also called a Fenchel–Legendre transform. The function f∗ is convex and if the domain of f is nonempty then f∗ never takes the value −∞. Clearly the conjugate operation is order-reversing: for functions f, g : X → [−∞, +∞], the inequality f ≥ g implies f∗ ≤ g∗.
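The definition of the conjugate can be approximated directly by replacing the supremum with a maximum over a grid. The sketch below does this on the real line for the illustrative function f(x) = x²/2, whose conjugate is known to be f∗(y) = y²/2, and also spot-checks the order-reversing property. The grid bounds, the helper name `conjugate` and the test functions are assumptions made for this example.

```python
def conjugate(f, xs):
    """Grid approximation of the Fenchel conjugate: y -> max over x in xs of y*x - f(x)."""
    def f_star(y):
        return max(y * x - f(x) for x in xs)
    return f_star

# Grid on [-5, 5]; the approximation is only reliable when the maximizer lies inside it.
xs = [k / 1000.0 for k in range(-5000, 5001)]

def half_square(x):
    return 0.5 * x * x

def shifted(x):
    # shifted >= half_square everywhere, so its conjugate should be smaller.
    return half_square(x) + 1.0

f_star = conjugate(half_square, xs)
g_star = conjugate(shifted, xs)

ys = [-2.0, -0.5, 0.0, 1.0, 3.0]
errors = [abs(f_star(y) - 0.5 * y * y) for y in ys]
order_reversed = all(g_star(y) <= f_star(y) for y in ys)

print(max(errors), order_reversed)
```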


1.3.2 The Fenchel–Young Inequality

This is an elementary but important result that relates the conjugate operation with the subgradient.

Proposition 1.3.1 (Fenchel–Young Inequality) Let f : X → R ∪ {+∞} be a convex function. Suppose that x∗ ∈ X∗ and x ∈ dom f. Then

f(x) + f∗(x∗) ≥ ⟨x∗, x⟩. (1.3.1)

Equality holds if and only if x∗ ∈ ∂f(x).

Proof. The inequality (1.3.1) follows directly from the definition. We have the equality

f(x) + f∗(x∗) = ⟨x∗, x⟩

if and only if, for any y ∈ X,

f(x) + ⟨x∗, y⟩ − f(y) ≤ ⟨x∗, x⟩.

That is,

f(y) − f(x) ≥ ⟨x∗, y − x⟩,

or x∗ ∈ ∂f(x). •
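As a quick sanity check of Proposition 1.3.1, the sketch below evaluates the Fenchel–Young gap f(x) + f∗(x∗) − ⟨x∗, x⟩ for f(x) = e^x on R, using the conjugate y log y − y listed in Table 1.1. The sample points and the helper names are assumptions made for illustration only.

```python
import math

def f(x):
    return math.exp(x)

def f_star(y):
    """Conjugate of exp: y log y - y for y > 0, 0 at y = 0, +infinity for y < 0."""
    if y < 0:
        return math.inf
    return 0.0 if y == 0 else y * math.log(y) - y

def fenchel_young_gap(x, y):
    """f(x) + f*(y) - x*y, which the inequality asserts is nonnegative."""
    return f(x) + f_star(y) - x * y

# Gap on a small sample of (x, y) pairs with y >= 0.
gaps = [fenchel_young_gap(x / 4.0, y / 4.0) for x in range(-8, 9) for y in range(0, 17)]

# Equality case: y = f'(x0) belongs to the subdifferential of f at x0.
x0 = 0.7
equality_gap = fenchel_young_gap(x0, math.exp(x0))
```

The smallest sampled gap is nonnegative, and the gap vanishes (up to rounding) exactly when y = f′(x0).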

Remark 1.3.2 When f is differentiable, taking the derivative with respect to x in the Fenchel equality we have x∗ = f′(x). Then the Fenchel-Legendre transform has the following explicit form as a function of x:

f∗(f′(x)) = ⟨x, f′(x)⟩ − f(x).

In Chapter 4 we will see that when f is the price of a contingent claim as a function of a forward price x, the Fenchel-Legendre transform is related to delta hedging. Its derivative is also relevant when we deal with dynamic hedging. We can directly verify the following representation of the derivative of the Fenchel-Legendre transform:

D_x f∗(f′(x)) = D_x⟨x, f′(x)⟩ − f′(x) = [D_x, f′(x)I]x,

where D_x is the differential operator with respect to x, I is the identity operator and [A, B] = AB − BA represents the commutator of the operators A and B. Symmetrically we also have

D_{x∗} f((f∗)′(x∗)) = [D_{x∗}, (f∗)′(x∗)I]x∗.

We can consider the conjugate of f∗, called the biconjugate of f and denoted f∗∗. This is a function on X∗∗. When X is a reflexive Banach space, i.e. X = X∗∗, it follows from the Fenchel–Young inequality (1.3.1) that f∗∗ ≤ f. The function f∗∗ is the largest among all the convex functions dominated by f and is called the convex hull of f. Many important convex functions f on X = R^N are equal to their biconjugate f∗∗. Such functions thus occur as natural pairs, f and f∗. Table 1.1 shows some elegant examples on R and Table 1.2 describes some simple transformations of these examples. Checking the calculations in Table 1.1 and verifying the formulas in Table 1.2 are good exercises for getting familiar with the concept of conjugate functions.

Note that the first four functions in Table 1.1 are special cases of indicator functions on R. A more general result is:

Example 1.3.3 Let C be a closed convex set in the reflexive Banach space X. Then ι∗_C = σ_C and σ∗_C = ι_C.

f(x) = g∗(x)          dom f      g(y) = f∗(y)                        dom g

0                     R          0                                   {0}
0                     R+         0                                   −R+
0                     [−1, 1]    |y|                                 R
0                     [0, 1]     y+                                  R
|x|^p/p, p > 1        R          |y|^q/q (1/p + 1/q = 1)             R
|x|^p/p, p > 1        R+         |y+|^q/q (1/p + 1/q = 1)            R
−x^p/p, 0 < p < 1     R+         −(−y)^q/q (1/p + 1/q = 1)           −int R+
−log x                int R+     −1 − log(−y)                        −int R+
e^x                   R          y log y − y (y > 0), 0 (y = 0)      R+

Table 1.1. Conjugate pairs of convex functions on R.

Combining the Fenchel–Young inequality and the sandwich theorem we can show that f∗∗ = f for a convex lsc function f.

Theorem 1.3.4 (Biconjugate) Let X be a finite dimensional Banach space. Then f∗∗ ≤ f in dom f and equality holds at every point x ∈ int dom f.

Proof. It is easy to check f∗∗ ≤ f and we leave it as an exercise. For any x ∈ int dom f, ∂f(x) ≠ ∅. Let x∗ ∈ ∂f(x). By the Fenchel–Young inequality we have

f(x) = ⟨x∗, x⟩ − f∗(x∗) ≤ sup_{y∗} [⟨y∗, x⟩ − f∗(y∗)] = f∗∗(x) ≤ f(x).

f = g∗                g = f∗

f(x)                  g(y)
h(ax) (a ≠ 0)         h∗(y/a)
h(x + b)              h∗(y) − by
ah(x) (a > 0)         ah∗(y/a)

Table 1.2. Transformed conjugates.

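[Figure: two panels, each with horizontal axis s, vertical axis t, origin O, the curve ϕ (equivalently ϕ−1), and the points x and x∗ marked on the axes.]

Fig. 1.1. Fenchel-Young inequality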

1.3.3 Graphic Illustration and Generalizations

For an increasing function ϕ with ϕ(0) = 0, f(x) = ∫_0^x ϕ(s) ds is convex and f∗(x∗) = ∫_0^{x∗} ϕ−1(t) dt. The graphs in Fig. 1.1 illustrate the Fenchel-Young inequality graphically. The additional area enclosed by the graph of ϕ−1, s = x and t = x∗, or by that of ϕ, s = x and t = x∗, beyond the area of the rectangle [0, x] × [0, x∗] is what leads to a strict inequality. We also see that equality holds when x∗ = ϕ(x) = f′(x) and x = ϕ−1(x∗) = (f∗)′(x∗) in Fig. 1.2.

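[Figure: one panel with horizontal axis s, vertical axis t, origin O, the curve ϕ (equivalently ϕ−1), and the points x and x∗ marked on the axes.]

Fig. 1.2. Fenchel-Young equality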

1.4 Convex Duality Theory

Using the Fenchel–Young inequality, for each constrained optimization problem we can write its companion dual problem. There are several different but equivalent perspectives.

1.4.1 Rockafellar Duality

We start with the Rockafellar formulation of the bi-conjugate. It is very general and, as we shall see, other perspectives can easily be written as its special cases.

Consider a two-variable function F(x, y) on X × Y where X, Y are Banach spaces. Treating y as a parameter, consider the parameterized optimization problem

v(y) = inf_x F(x, y). (1.4.1)

Our associated primal optimization problem¹ is

p = v(0) = inf_{x∈X} F(x, 0) (1.4.2)

and the dual problem is

d = v∗∗(0) = sup_{y∗∈Y∗} −F∗(0, −y∗). (1.4.3)

Since v dominates v∗∗, as the Fenchel-Young inequality establishes, we have

v(0) = p ≥ d = v∗∗(0).

This is called weak duality and the non-negative number p − d = v(0) − v∗∗(0) is called the duality gap, which we aspire to be small or zero.

¹ The use of the term ‘primal’ is much more recent than the term ‘dual’ and was suggested by George Dantzig’s father Tobias when linear programming was being developed in the 1940’s.


Let F(x, (y, z)) := f(x) + ι_{epi(g)}(x, y) + ι_{graph(h)}(x, z). Then problem P(y, z) in (1.1.1) becomes problem (1.4.1) with parameters (y, z). On the other hand, we can rewrite (1.4.1) as

v(y) = inf_x {F(x, u) : u = y},

which is problem P(0, y) with x = (x, u), C = X × Y, f(x, u) = F(x, u), h(x, u) = u and g(x, u) = 0. So where we start is a matter of taste and predisposition.

Theorem 1.4.1 (Duality and Lagrange Multipliers) The following are equivalent:

(i) the primal problem has a Lagrange multiplier λ.
(ii) there is no duality gap, i.e. d = p is finite and the dual problem has solution −λ.

Proof. If the primal problem has a Lagrange multiplier λ then −λ ∈ ∂v(0). By the Fenchel-Young equality

v(0) + v∗(−λ) = ⟨−λ, 0⟩ = 0.

Direct calculation yields

v∗(−λ) = sup_y {⟨−λ, y⟩ − v(y)}
       = sup_{y,x} {⟨−λ, y⟩ − F(x, y)} = F∗(0, −λ).

Since

−F∗(0, −λ) ≤ v∗∗(0) ≤ v(0) = −v∗(−λ) = −F∗(0, −λ), (1.4.4)

λ is a solution to the dual problem and p = v(0) = v∗∗(0) = d.

On the other hand, if v∗∗(0) = v(0) and λ is a solution to the dual problem then all the quantities in (1.4.4) are equal. In particular,

v(0) + v∗(−λ) = 0.

This implies that −λ ∈ ∂v(0), so that λ is a Lagrange multiplier of the primal problem. •

Example 1.4.2 (Finite Duality Gap) Consider

v(y) = inf{|x_2 − 1| : √(x_1² + x_2²) − x_1 ≤ y}.

We can easily calculate

v(y) = 0     if y > 0,
       1     if y = 0,
       +∞    if y < 0,

and v∗∗(0) = 0, i.e. there is a finite duality gap v(0) − v∗∗(0) = 1.

In this example neither the primal nor the dual problem has a Lagrange multiplier yet both have solutions. Hence, even in two dimensions, existence of a Lagrange multiplier is only a sufficient condition for the dual to attain a solution and is far from necessary.
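The values of v in Example 1.4.2 can be checked directly. The sketch below is only an illustration under an assumed choice of test points: for y > 0 it uses the point (1/(2y), 1), which satisfies the constraint because √(x_1² + 1) − x_1 = 1/(√(x_1² + 1) + x_1) < 1/(2x_1), and for y = 0 it shows that any x_2 ≠ 0 violates the constraint.

```python
import math

def constraint_lhs(x1, x2):
    """sqrt(x1^2 + x2^2) - x1, the left-hand side of the constraint."""
    return math.hypot(x1, x2) - x1

def objective(x2):
    return abs(x2 - 1.0)

# For y > 0 the point (1/(2y), 1) is feasible with objective 0, hence v(y) = 0.
feasible_for_positive_y = all(
    constraint_lhs(1.0 / (2.0 * y), 1.0) <= y for y in (1.0, 0.1, 0.01)
)

# For y = 0 the constraint forces x2 = 0 (and x1 >= 0), where the objective is 1.
slack_at_zero = constraint_lhs(3.0, 0.0)   # equals 0: feasible when y = 0
violation = constraint_lhs(3.0, 0.5)       # positive: infeasible when y = 0
value_at_zero = objective(0.0)             # v(0) = 1
```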


1.4.2 Fenchel Duality

Let us specify F(x, y) := f(x) + g(Ax + y), where A : X → Y is a linear operator. We then get the Fenchel formulation of duality. Now the primal problem is

p = v(0) = inf_x [f(x) + g(Ax)]. (1.4.5)

To derive the dual problem we calculate

F∗(0, −y∗) = sup_{x,y} [⟨−y∗, y⟩ − f(x) − g(Ax + y)].

Letting u = Ax + y we have

F∗(0, −y∗) = sup_{x,u} [⟨−y∗, u − Ax⟩ − f(x) − g(u)]
           = sup_x [⟨y∗, Ax⟩ − f(x)] + sup_u [⟨−y∗, u⟩ − g(u)]
           = f∗(A∗y∗) + g∗(−y∗).

Thus, the dual problem is

d = v∗∗(0) = sup_{y∗} [−f∗(A∗y∗) − g∗(−y∗)]. (1.4.6)

If both f and g are convex functions, then it is easy to see that so is

v(y) = inf_x [f(x) + g(Ax + y)].

We have already checked that dom v = dom g − A dom f. Thus, a sufficient condition for the existence of Lagrange multipliers for the primal problem, i.e., ∂v(0) ≠ ∅, is

0 ∈ ri dom v = ri[dom g − A dom f]. (1.4.7)

Figure 1.3 illustrates the Fenchel duality theorem for f(x) := x²/2 + 1 and g(x) := (x − 1)²/2 + 1/2. The upper function is f and the lower one is −g. The minimum gap occurs at x = 1/2 and equals 7/4.
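The numbers quoted for Figure 1.3 can be reproduced with a few lines of code. The sketch below assumes A is the identity on R and uses the conjugates f∗(y) = y²/2 − 1 and g∗(y) = y²/2 + y − 1/2, which were computed by hand for this illustration; the grid is an arbitrary choice. It evaluates the primal (1.4.5) and the dual (1.4.6) and finds the same value.

```python
def f(x):
    return x * x / 2.0 + 1.0

def g(x):
    return (x - 1.0) ** 2 / 2.0 + 0.5

def f_star(y):
    return y * y / 2.0 - 1.0          # sup_x { x*y - f(x) }, attained at x = y

def g_star(y):
    return y * y / 2.0 + y - 0.5      # sup_x { x*y - g(x) }, attained at x = 1 + y

grid = [k / 1000.0 for k in range(-3000, 3001)]   # hypothetical grid on [-3, 3]

# Primal (1.4.5) with A = identity: minimize f(x) + g(x).
primal_value, primal_x = min((f(x) + g(x), x) for x in grid)

# Dual (1.4.6) with A = identity: maximize -f*(y) - g*(-y).
dual_value, dual_y = max((-f_star(y) - g_star(-y), y) for y in grid)

print(primal_value, primal_x, dual_value, dual_y)
```

Both optimal values equal 7/4, so there is no duality gap in this example, as Theorem 1.4.3 predicts for these two finite convex functions.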

Condition (1.4.7) is often referred to as a constraint qualification or a transversality condition. Enforcing such constraint qualification conditions we can write Theorem 1.4.1 in the following form:

Theorem 1.4.3 (Strong Duality) If the lower semicontinuous convex functions f, g and the linear operator A satisfy the constraint qualification condition (1.4.7) then there is a zero duality gap between the primal and dual problems, (1.4.5) and (1.4.6), and the dual problem has a solution.

A really illustrative example is the application to entropy optimization.

[Figure: graphs of f (upper curve) and −g (lower curve) over roughly −0.5 ≤ x ≤ 1.5.]

Fig. 1.3. The Fenchel duality sandwich

Example 1.4.4 (Entropy Optimization Problem) Entropy maximization refers to

minimize f(x) (1.4.8)
subject to Ax = b ∈ R^N,

with the lower semicontinuous convex function f defined on a Banach space of signals, emulating the negative of an entropy, and A emulating a finite number of continuous linear constraints representing conditions on some given moments. A wide variety of applications can be covered by this model due to its physical relevance.

Applying Theorem 1.4.3 with g = ι_{b}, the indicator function of the singleton {b}, we have that if b ∈ ri(A dom f) then

inf_{x∈X} {f(x) | Ax = b} = max_{ϕ∈R^N} {⟨ϕ, b⟩ − f∗(A∗ϕ)} = (f∗ ∘ A∗)∗(b). (1.4.9)

When N < dim X (often infinite) the dual problem is typically much easier to solve than the primal.

Example 1.4.5 (Boltzmann–Shannon Entropy in Euclidean Space) Let

f(x) := Σ_{n=1}^N p(x_n), (1.4.10)

where

p(t) := t ln t − t    if t > 0,
        0             if t = 0,
        +∞            if t < 0.


The functions p and f defined above are (negatives of) Boltzmann-Shannon entropy functions on R and R^N, respectively. For c ∈ R^N, b ∈ R^M and a linear mapping A : R^N → R^M consider the entropy optimization problem

minimize f(x) + ⟨c, x⟩ subject to Ax = b. (1.4.11)

Example 1.4.4 can help us conveniently derive an explicit formula for solutions of (1.4.11) in terms of the solution to its dual problem.

First we note that the sublevel sets of the objective function are compact, thus ensuring the existence of solutions to problem (1.4.11). We can also see by direct calculation that the directional derivative of the cost function is −∞ at any boundary point x of dom f = R^N_+, the domain of the cost function, in the direction of z − x with z ∈ int(R^N_+). Thus, any solution of (1.4.11) must be in the interior of R^N_+. Since the cost function is strictly convex on int(R^N_+), the solution is unique.

Let us denote this unique solution of (1.4.11) by x̄. Then the duality result in Example 1.4.4 implies that

f(x̄) + ⟨c, x̄⟩ = inf_{x∈R^N} {f(x) + ⟨c, x⟩ : Ax = b}
             = max_{ϕ∈R^M} {⟨ϕ, b⟩ − (f + c)∗(A⊤ϕ)}.

Now let ϕ̄ be a solution to the dual problem, i.e., a Lagrange multiplier for the constrained minimization problem (1.4.11). We have

f(x̄) + ⟨c, x̄⟩ + (f + c)∗(A⊤ϕ̄) = ⟨ϕ̄, b⟩ = ⟨ϕ̄, Ax̄⟩ = ⟨A⊤ϕ̄, x̄⟩.

It follows from the Fenchel-Young equality that A⊤ϕ̄ ∈ ∂(f + c)(x̄). Since x̄ ∈ int(R^N_+), where f is differentiable, we have A⊤ϕ̄ = f′(x̄) + c. Explicit computation shows that x̄ = (x̄_1, . . . , x̄_N) is determined by

x̄_n = exp(A⊤ϕ̄ − c)_n, n = 1, . . . , N. (1.4.12)

Indeed, we can use the existence of the dual solution to prove that the primal problem has the given solution without direct appeal to compactness: we deduce the existence of the primal solution from the duality theory.
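To make formula (1.4.12) concrete, here is a minimal sketch for the special case of a single linear constraint (M = 1), so that the dual variable ϕ is a scalar. It uses the fact that the conjugate of p is the exponential, so the dual objective is ϕb − Σ_n exp(a_n ϕ − c_n). The function name `solve_entropy_dual`, the Newton iteration and the data a, b, c are all assumptions made for illustration, not part of the text.

```python
import math

def solve_entropy_dual(a, b, c, iterations=50):
    """Newton's method on the concave dual of
         minimize sum_n p(x_n) + <c, x>  subject to  <a, x> = b.
    The dual objective is phi*b - sum_n exp(a_n*phi - c_n)."""
    phi = 0.0
    for _ in range(iterations):
        x = [math.exp(an * phi - cn) for an, cn in zip(a, c)]
        grad = b - sum(an * xn for an, xn in zip(a, x))      # derivative of the dual
        hess = -sum(an * an * xn for an, xn in zip(a, x))    # second derivative (< 0)
        phi -= grad / hess
    x = [math.exp(an * phi - cn) for an, cn in zip(a, c)]    # primal solution via (1.4.12)
    return phi, x

# Hypothetical data: the weights must sum to one.
a = [1.0, 1.0, 1.0]
b = 1.0
c = [0.0, 1.0, 2.0]
phi, x = solve_entropy_dual(a, b, c)
print(phi, x)
```

With this data the recovered solution is proportional to exp(−c_n), i.e. a Gibbs-type distribution, and the constraint holds because the dual gradient b − ⟨a, x⟩ vanishes at the dual optimum.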

Remark 1.4.6 In view of Remark 1.2.6, when both f and g are polyhedral functions the constraint qualification condition (1.4.7) simplifies to

dom g ∩ A dom f ≠ ∅. (1.4.13)

This is very useful in dealing with polyhedral cone programming and, in particular, linear programming problems. One can also similarly handle a subset of polyhedral constraints, see [7, 8].

1.4.3 Lagrange Duality

For problem (1.1.1) define the Lagrangian

L(λ, x; (y, z)) = f(x) + ⟨λ, (g(x)− y, h(x)− z)⟩.

Then

sup_{λ∈K+×Z∗} L(λ, x; (y, z)) = f(x)    if g(x) ≤K y, h(x) = z,
                                +∞      otherwise.

Then problem (1.1.1) can be written as

p = v(0) = inf_{x∈C} sup_{λ∈K+×Z∗} L(λ, x; 0). (1.4.14)

We can calculate

v∗(−λ) = sup_{y,z} [⟨−λ, (y, z)⟩ − v(y, z)]
       = sup_{y,z} [⟨−λ, (y, z)⟩ − inf_{x∈C} {f(x) : g(x) ≤K y, h(x) = z}]
       = sup_{x∈C, y, z} {⟨−λ, (y, z)⟩ − f(x) : g(x) ≤K y, h(x) = z}.

Letting ξ = y − g(x) ∈ K we can rewrite the expression above as

v∗(−λ) = sup_{x∈C, ξ∈K} [⟨−λ, (g(x), h(x))⟩ − f(x) + ⟨−λ, (ξ, 0)⟩]
       = − inf_{x∈C, ξ∈K} [L(λ, x; 0) + ⟨λ, (ξ, 0)⟩]
       = − inf_{x∈C} L(λ, x; 0)    if λ ∈ K+ × Z∗,
         +∞                        otherwise.

Thus, the dual problem is

d = v∗∗(0) = sup_λ −v∗(−λ) = sup_{λ∈K+×Z∗} inf_{x∈C} L(λ, x; 0). (1.4.15)

We can see that the weak duality inequality v(0) ≥ v∗∗(0) is simply the familiar fact that

inf sup ≥ sup inf.
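The inequality inf sup ≥ sup inf is easy to see on a finite example. The sketch below uses a hypothetical 2 × 2 table of values L[i][j], with the row index playing the role of x and the column index the role of λ; the numbers are made up and chosen so that the inequality is strict.

```python
# Hypothetical payoff table: row i plays the role of x, column j the role of lambda.
L = [[1.0, 3.0],
     [4.0, 2.0]]

# inf over x of sup over lambda: the smallest row maximum.
inf_sup = min(max(row) for row in L)

# sup over lambda of inf over x: the largest column minimum.
sup_inf = max(min(L[i][j] for i in range(len(L))) for j in range(len(L[0])))

print(inf_sup, sup_inf)   # the first value is never smaller than the second
```

Here the two values differ, which is the finite analogue of a positive duality gap.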

Example 1.4.7 (Classical Linear Programming Duality) Consider a linear programming problem

max ⟨c, x⟩ (1.4.16)
subject to Ax ≤ b, x ≥ 0

where x ∈ R^N, b ∈ R^M, A is an M × N matrix and ≤ is the order ≤_{R^M_+}. Then by the Lagrange duality, the dual problem is

min ⟨b, λ⟩ (1.4.17)
subject to A∗λ ≥ c, λ ≥ 0.

In fact, we need to deal with the minimizing problem

min[⟨−c, x⟩ : Ax ≤ b, x ≥ 0] = −max[⟨c, x⟩ : Ax ≤ b, x ≥ 0].

We write the Lagrangian

L(λ, x) = ⟨−c, x⟩ + ⟨λ, Ax − b⟩.

Then the primal problem is

inf_{x≥0} sup_{λ≥0} L(λ, x).

The dual problem is

sup_{λ≥0} inf_{x≥0} L(λ, x).

We can see that

inf_{x≥0} L(λ, x) = inf_{x≥0} ⟨−c + A∗λ, x⟩ − ⟨λ, b⟩ = −⟨λ, b⟩    if A∗λ ≥ c,
                                                      −∞         otherwise.

So we have

max[⟨c, x⟩ : Ax ≤ b, x ≥ 0] = − max_{λ≥0} [−⟨λ, b⟩ : A∗λ ≥ c]
                            = min[⟨λ, b⟩ : A∗λ ≥ c, λ ≥ 0].

Clearly all the functions involved here are polyhedral. Applying the constraint qualification condition for polyhedral functions we can conclude that if either the primal problem or the dual problem is feasible then there is no duality gap. Moreover, when the common optimal value is finite then both problems have optimal solutions.
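A small numerical instance shows the two optimal values in (1.4.16) and (1.4.17) coinciding. The sketch below is an illustration only: the data A, b, c are made up, and the helper `vertices` solves a two-variable linear program by brute-force vertex enumeration, which is adequate here because both optimal values are finite and attained at vertices, but it is not a general-purpose LP solver.

```python
from itertools import combinations

def vertices(G, h, tol=1e-9):
    """Feasible vertices of {z in R^2 : G z <= h}, found by intersecting constraint pairs."""
    points = []
    for (g1, h1), (g2, h2) in combinations(list(zip(G, h)), 2):
        det = g1[0] * g2[1] - g1[1] * g2[0]
        if abs(det) < tol:
            continue  # parallel constraints do not define a vertex
        # Cramer's rule for g1.z = h1, g2.z = h2
        z = ((h1 * g2[1] - g1[1] * h2) / det, (g1[0] * h2 - h1 * g2[0]) / det)
        if all(g[0] * z[0] + g[1] * z[1] <= hk + tol for g, hk in zip(G, h)):
            points.append(z)
    return points

# Hypothetical data for max <c, x> subject to A x <= b, x >= 0.
A = [[1.0, 2.0], [3.0, 1.0]]
b = [4.0, 6.0]
c = [3.0, 2.0]

G_primal = A + [[-1.0, 0.0], [0.0, -1.0]]          # append -x <= 0
h_primal = b + [0.0, 0.0]
primal_value = max(c[0] * x[0] + c[1] * x[1] for x in vertices(G_primal, h_primal))

# Dual: min <b, lam> subject to A^T lam >= c, lam >= 0, written as -A^T lam <= -c.
At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
G_dual = [[-At[0][0], -At[0][1]], [-At[1][0], -At[1][1]], [-1.0, 0.0], [0.0, -1.0]]
h_dual = [-c[0], -c[1], 0.0, 0.0]
dual_value = min(b[0] * l[0] + b[1] * l[1] for l in vertices(G_dual, h_dual))

print(primal_value, dual_value)
```

For this data both values are 36/5, attained at x = (8/5, 6/5) and λ = (3/5, 4/5).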

The hard work in Example 1.4.7 was hidden in establishing that the constraint qualification (1.4.13) is sufficient, but unlike many applied developments we have rigorously recaptured linear programming duality within our framework.

Note that the primal Lagrange multiplier λ is the dual solution and vice versa. Table 1.3 can help us formulate the dual problem.

Primal constraint    Dual variable    Primal variable    Dual constraint

Ax ≤ b               λ ≥ 0            x ≥ 0              A∗λ ≥ c
Ax = b               λ free           x free             A∗λ = c
Ax ≥ b               λ ≤ 0            x ≤ 0              A∗λ ≤ c

Table 1.3. Transformed conjugates.


1.4.4 Generalized Fenchel-Young Inequality

Reexamining the graphic representation of the Fenchel–Young inequality we also realize that the underlying inequality relationship remains valid when the area is weighted by a positive ‘density’ function K(s, t). Thus, we have

Theorem 1.4.8 (Weighted Fenchel-Young Inequality) Let K(x, y) be a continuous positive function and let ϕ be a continuous increasing function with ϕ(0) = 0. Then

∫_0^x ∫_0^{x∗} K(s, t) dt ds ≤ ∫_0^x ∫_0^{ϕ(s)} K(s, t) dt ds + ∫_0^{x∗} ∫_0^{ϕ^{-1}(t)} K(s, t) ds dt

and equality holds when x∗ = ϕ(x) and x = ϕ^{-1}(x∗).
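The weighted inequality can be checked numerically before reading the proof. The following sketch assumes the particular choices K(s, t) = 1 + st and ϕ(s) = s² on [0, ∞), and approximates the iterated integrals with a midpoint rule; the helper `iterated_integral` and the sample points are illustrative, and the quadrature is only accurate to a few decimal places.

```python
import math

def iterated_integral(integrand, lo, hi, inner_upper, n=200):
    """Midpoint rule for  int_lo^hi  int_0^{inner_upper(u)}  integrand(u, w) dw du."""
    total = 0.0
    du = (hi - lo) / n
    for i in range(n):
        u = lo + (i + 0.5) * du
        dw = inner_upper(u) / n
        total += sum(integrand(u, (j + 0.5) * dw) for j in range(n)) * dw * du
    return total

def K(s, t):
    return 1.0 + s * t          # assumed positive weight

def phi(s):
    return s * s                # assumed increasing function with phi(0) = 0

def phi_inv(t):
    return math.sqrt(t)

def sides(x, x_star):
    """Left and right hand sides of the weighted Fenchel-Young inequality."""
    lhs = iterated_integral(K, 0.0, x, lambda s: x_star)
    rhs = (iterated_integral(K, 0.0, x, phi)
           + iterated_integral(lambda t, s: K(s, t), 0.0, x_star, phi_inv))
    return lhs, rhs

lhs_eq, rhs_eq = sides(1.0, 1.0)      # x* = phi(x): equality case
lhs_gap, rhs_gap = sides(1.0, 2.0)    # x* > phi(x): strict inequality
print(lhs_eq, rhs_eq, lhs_gap, rhs_gap)
```

For these choices the exact values are 5/4 on both sides in the equality case, and 3 versus roughly 3.64 in the strict case.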

Proof. If ϕ(x) ≥ x∗ we have

∫_0^x ∫_0^{ϕ(s)} K(s, t) dt ds + ∫_0^{x∗} ∫_0^{ϕ^{-1}(t)} K(s, t) ds dt (1.4.18)
  ≥ ∫_0^x ∫_0^{x∗} K(s, t) dt ds + ∫_{ϕ^{-1}(x∗)}^x ∫_{x∗}^{ϕ(s)} K(s, t) dt ds
  ≥ ∫_0^x ∫_0^{x∗} K(s, t) dt ds.

Otherwise, ϕ(x) < x∗ and we have

∫_0^x ∫_0^{ϕ(s)} K(s, t) dt ds + ∫_0^{x∗} ∫_0^{ϕ^{-1}(t)} K(s, t) ds dt (1.4.19)
  ≥ ∫_0^x ∫_0^{x∗} K(s, t) dt ds + ∫_x^{ϕ^{-1}(x∗)} ∫_{ϕ(s)}^{x∗} K(s, t) dt ds
  ≥ ∫_0^x ∫_0^{x∗} K(s, t) dt ds.

Clearly equality holds if and only if ϕ(x) = x∗. •

The condition ϕ(0) = 0 merely conveniently locates the lower left corner of the graph at the coordinate origin and is clearly not essential. In general we can always shift this corner to any point (a, ϕ(a)). More substantively, the requirement that ϕ be a continuous increasing function can be relaxed to nondecreasing as long as ϕ^{-1} is replaced appropriately by

ϕ^{-1}_inf(t) = inf{s : ϕ(s) ≥ t}.

Now we can state a more general Fenchel–Young inequality whose proof is an easy exercise.

Theorem 1.4.9 (Weighted Fenchel-Young Inequality) Let K(x, y) be a bounded essentially positive measurable function and let ϕ be a nondecreasing function. Then

∫_a^x ∫_{ϕ(a)}^{x∗} K(s, t) dt ds ≤ ∫_a^x ∫_{ϕ(a)}^{ϕ(s)} K(s, t) dt ds + ∫_{ϕ(a)}^{x∗} ∫_a^{ϕ^{-1}_inf(t)} K(s, t) ds dt

with equality attained when x∗ ∈ [ϕ(x−), ϕ(x+)], x ∈ [ϕ^{-1}_inf(x∗−), ϕ^{-1}_inf(x∗+)].

The above idea can be further pushed in two different directions in the next two sections.

Multidimensional Fenchel–Young Inequality

It is easier to understand and to formulate the n-dimensional Fenchel-Young inequality by first re-examining the graphs presented above with a parameterization (ϕ_1, ϕ_2) of the graph of ϕ in Fig. 1.4–1.5.

[Figure: two panels with axes ϕ_1 and ϕ_2, the curve (ϕ_1(t), ϕ_2(t)) starting at (ϕ_1(a), ϕ_2(a)), and the marks ϕ_1(b_1) and ϕ_2(b_2).]

Fig. 1.4. Fenchel-Young inequality

Let K(s_1, s_2) be a nonnegative function and let ϕ_1, ϕ_2 be increasing functions. To avoid technical complications we assume that ϕ_1, ϕ_2 are invertible. Then we can rewrite the Fenchel-Young inequality as

∫_{ϕ_2(a)}^{ϕ_2(b_2)} ∫_{ϕ_1(a)}^{ϕ_1(b_1)} K(s_1, s_2) ds_1 ds_2 (1.4.20)
  ≤ ∫_{ϕ_1(a)}^{ϕ_1(b_1)} ∫_{ϕ_2(a)}^{ϕ_2(ϕ_1^{-1}(s_1))} K(s_1, s_2) ds_2 ds_1 + ∫_{ϕ_2(a)}^{ϕ_2(b_2)} ∫_{ϕ_1(a)}^{ϕ_1(ϕ_2^{-1}(s_2))} K(s_1, s_2) ds_1 ds_2

with equality attained when b_1 = b_2.

This form of the Fenchel-Young inequality can easily be generalized to N dimensions with an induction argument. We will use the following vector notation: s^N = (s_1, . . . , s_N), 1^N = (1, 1, . . . , 1) and

s^N_n = (s_1, . . . , s_{n−1}, s_{n+1}, . . . , s_N).

[Figure: one panel with axes ϕ_1 and ϕ_2, the curve (ϕ_1(t), ϕ_2(t)) starting at (ϕ_1(a), ϕ_2(a)), and the marks ϕ_1(b_1) and ϕ_2(b_1).]

Fig. 1.5. Fenchel-Young inequality

When ϕ^N = (ϕ_1, . . . , ϕ_N) is a vector valued function we define

ϕ^N(s^N) = (ϕ_1(s_1), . . . , ϕ_N(s_N)).

Similarly,

∫_{ϕ^N(a^N)}^{ϕ^N(b^N)} K(s^N) ds^N = ∫_{ϕ_1(a_1)}^{ϕ_1(b_1)} . . . ∫_{ϕ_N(a_N)}^{ϕ_N(b_N)} K(s_1, . . . , s_N) ds_N . . . ds_1.

Now we can state and prove the multidimensional Fenchel–Young inequality.

Theorem 1.4.10 (Multidimensional Generalized Fenchel-Young Inequality) Let K : R^N → R be a nonnegative function and let ϕ^N be a vector function with all the components increasing and invertible. We have

∫_{ϕ^N(a·1^N)}^{ϕ^N(b^N)} K(s^N) ds^N ≤ Σ_{n=1}^N ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ∫_{ϕ^N_n(a·1^{N−1})}^{ϕ^N_n(ϕ_n^{-1}(s_n)·1^{N−1})} K(s^N) ds^N_n ds_n (1.4.21)

with equality attained when b_1 = b_2 = . . . = b_N.

Proof. We prove by induction. The case N = 2 has already been established. We focus on the induction step. By separating the integration with respect to ds_{N+1}, we can write the left hand side of the inequality as

LHS = ∫_{ϕ^{N+1}(a·1^{N+1})}^{ϕ^{N+1}(b^{N+1})} K(s^{N+1}) ds^{N+1}
    = ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(b_{N+1})} ∫_{ϕ^N(a·1^N)}^{ϕ^N(b^N)} K(s^{N+1}) ds^N ds_{N+1}.

Applying the induction hypothesis to the inner layer of the integration we have

LHS ≤ ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(b_{N+1})} Σ_{n=1}^N ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ∫_{ϕ^N_n(a·1^{N−1})}^{ϕ^N_n(ϕ_n^{-1}(s_n)·1^{N−1})} K(s^{N+1}) ds^N_n ds_n ds_{N+1}
    = Σ_{n=1}^N ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(b_{N+1})} ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ( ∫_{ϕ^N_n(a·1^{N−1})}^{ϕ^N_n(ϕ_n^{-1}(s_n)·1^{N−1})} K(s^{N+1}) ds^N_n ) ds_n ds_{N+1}.

The last equality groups the two outer layers of the integration together. Now applying the Fenchel-Young inequality with N = 2 we get

LHS ≤ Σ_{n=1}^N ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(ϕ_n^{-1}(s_n))} ∫_{ϕ^N_n(a·1^{N−1})}^{ϕ^N_n(ϕ_n^{-1}(s_n)·1^{N−1})} K(s^{N+1}) ds^N_n ds_{N+1} ds_n
    + ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(b_{N+1})} Σ_{n=1}^N ∫_{ϕ_n(a)}^{ϕ_n(ϕ_{N+1}^{-1}(s_{N+1}))} ∫_{ϕ^N_n(a·1^{N−1})}^{ϕ^N_n(ϕ_n^{-1}(s_n)·1^{N−1})} K(s^{N+1}) ds^N_n ds_n ds_{N+1}.

Combining the inner layers of the integration in the first sum and applying the equality part of the induction hypothesis to the second sum we arrive at

LHS ≤ Σ_{n=1}^N ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ∫_{ϕ^{N+1}_n(a·1^N)}^{ϕ^{N+1}_n(ϕ_n^{-1}(s_n)·1^N)} K(s^{N+1}) ds^{N+1}_n ds_n
    + ∫_{ϕ_{N+1}(a)}^{ϕ_{N+1}(b_{N+1})} ∫_{ϕ^N(a·1^N)}^{ϕ^N(ϕ_{N+1}^{-1}(s_{N+1})·1^N)} K(s^{N+1}) ds^N ds_{N+1}
    = Σ_{n=1}^{N+1} ∫_{ϕ_n(a)}^{ϕ_n(b_n)} ∫_{ϕ^{N+1}_n(a·1^N)}^{ϕ^{N+1}_n(ϕ_n^{-1}(s_n)·1^N)} K(s^{N+1}) ds^{N+1}_n ds_n = RHS. •

A three dimensional graphical illustration of the multidimensional Fenchel–Young inequality is presented in Fig. 1.6. In this figure we illustrate the simple case where K(s_1, s_2, s_3) = 1 so that the left hand side of the inequality (1.4.21) is the volume of a rectangular region. We set (ϕ_1(t), ϕ_2(t), ϕ_3(t)) = (t, t², t), (a_1, a_2, a_3) = (0, 0, 0) and (b_1, b_2, b_3) = (0.9, 1, 0.8). The light lines are the edges of the rectangular region and the dark lines outline the boundaries of the three regions corresponding to the three integrals on the right hand side of the Fenchel–Young inequality (1.4.21).

Remark 1.4.11 We also have the following alternative form of estimate, obtained by changing the way of integration. Let K(s_1, s_2) be a nonnegative function and let ϕ_1, ϕ_2 be non-decreasing functions. Then

∫_{ϕ_2(a)}^{ϕ_2(b_2)} ∫_{ϕ_1(a)}^{ϕ_1(b_1)} K(s_1, s_2) ds_1 ds_2
  ≤ ∫_{ϕ_1(a)}^{ϕ_1(b_2)} ∫_{ϕ_2(ϕ_1^{-1}(s_1))}^{ϕ_2(b_2)} K(s_1, s_2) ds_2 ds_1 + ∫_{ϕ_2(a)}^{ϕ_2(b_1)} ∫_{ϕ_1(ϕ_2^{-1}(s_2))}^{ϕ_1(b_1)} K(s_1, s_2) ds_1 ds_2

with equality attained when b_1 = b_2.


Fig. 1.6. Three Dimensional Fenchel–Young Inequality

1.5 Generalized Convexity, Conjugacy and Duality

Note that the graphic illustrations in section 1.4.3 only work when x, x∗ ∈ R. When, in general, (x, x∗) ∈ X × X∗ we can imitate the general definition of the Fenchel conjugate. In such a generalization a nonlinear function c(x, x∗) replaces the role of ⟨x∗, x⟩, just as in Theorem 1.4.8 the integral ∫_0^x ∫_0^{x∗} K(s, t) dt ds replaces the product x∗x. In fact, x∗ does not even have to be in X∗. This is a more significant generalization. To implement this idea, one needs to first revise the concept of convexity.


Definition 1.5.1 (Generalized Convexity) Let Φ be a set of extended real valued functions. We say f is Φ-convex if

f(x) = sup{ϕ(x) : ϕ ∈ Φ, f ≥ ϕ}.

It is easy to verify that Φ-convex functions are closed under supremum. Thus, every function has a largest Φ-convex minorant called its Φ-convex hull. Moreover, if f is Φ-convex then it coincides with its Φ-convex hull. By setting Φ to be the class of affine functions we get the usual convexity within the class of lower semicontinuous functions.

Similar to the Fenchel conjugate we define:

Definition 1.5.2 (Generalized Fenchel Conjugate) Let c be a function on X × Y. We define

f^{c(1)}(y) = sup_x [c(x, y) − f(x)] and g^{c(2)}(x) = sup_y [c(x, y) − g(y)].

They are generalizations of the Fenchel conjugate. When the function c is not symmetric with respect to its two variables, the c(1) and c(2) conjugates are different. It is easy to see that the generalized Fenchel conjugate also has the order reversing property. Define Φ_{c(1)} = {c(·, y) − b : y ∈ Y, b ∈ R} and Φ_{c(2)} = {c(x, ·) − b : x ∈ X, b ∈ R}. Then f^{c(1)} is Φ_{c(2)}-convex and g^{c(2)} is Φ_{c(1)}-convex.

Next we discuss some basic properties of generalized Fenchel conjugate.

Theorem 1.5.3 (Fenchel Inequality and Duality) Let f : X → R ∪ {+∞} and g : Y → R ∪ {+∞}. Then

(i) (Fenchel inequality) f^{c(1)}(y) ≥ c(x, y) − f(x), g^{c(2)}(x) ≥ c(x, y) − g(y),
(ii) (Convex hull) the Φ_{c(1)} (Φ_{c(2)})-convex hull of f (g) is f^{c(1)c(2)} (g^{c(2)c(1)}),
(iii) (Duality) f^{c(1)} = f^{c(1)c(2)c(1)}, g^{c(2)} = g^{c(2)c(1)c(2)}.

Proof. (i) follows directly from the definitions.

To prove (ii) we observe that by (i) f(x) ≥ c(x, y) − f^{c(1)}(y). Taking sup over y we get f ≥ f^{c(1)c(2)}. On the other hand, if for some y, b, f(x) ≥ c(x, y) − b for all x, then b ≥ c(x, y) − f(x). Taking sup over x we have b ≥ f^{c(1)}(y). Thus,

f(x) ≥ f^{c(1)c(2)}(x) ≥ c(x, y) − f^{c(1)}(y) ≥ c(x, y) − b,

establishing f^{c(1)c(2)} as the largest Φ_{c(1)}-convex function dominated by f. The proof that g^{c(2)c(1)} is the Φ_{c(2)}-convex hull of g is similar.

(iii) follows from (ii) since f^{c(1)} is Φ_{c(2)}-convex and g^{c(2)} is Φ_{c(1)}-convex. •
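Since the arguments above only use the order structure of sup, they can be tested on finite sets. The sketch below is an illustration under assumed data: X and Y are small grids, the coupling is c(x, y) = −(x − y)², and f is an arbitrary nonconvex function. It computes f^{c(1)}, f^{c(1)c(2)} and f^{c(1)c(2)c(1)} as dictionaries and compares them.

```python
import math

# Hypothetical finite sets and coupling function.
X = [k / 4.0 for k in range(-8, 9)]
Y = [k / 4.0 for k in range(-8, 9)]

def c(x, y):
    return -(x - y) ** 2

def conj1(f):
    """c(1)-conjugate: maps a function on X (dict) to a function on Y (dict)."""
    return {y: max(c(x, y) - f[x] for x in X) for y in Y}

def conj2(g):
    """c(2)-conjugate: maps a function on Y (dict) to a function on X (dict)."""
    return {x: max(c(x, y) - g[y] for y in Y) for x in X}

f = {x: math.sin(3.0 * x) + 0.1 * x * x for x in X}   # arbitrary nonconvex test function

f1 = conj1(f)        # f^{c(1)}
f12 = conj2(f1)      # f^{c(1)c(2)}, the generalized convex hull of f
f121 = conj1(f12)    # f^{c(1)c(2)c(1)}

hull_below_f = all(f12[x] <= f[x] + 1e-12 for x in X)
triple_equals_single = all(abs(f121[y] - f1[y]) < 1e-12 for y in Y)
print(hull_below_f, triple_equals_single)
```

Both checks correspond to parts (ii) and (iii) of Theorem 1.5.3 restricted to these finite sets.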

Remark 1.5.4 We see from the discussion about generalized Fenchel conjugatethat what is essential in dealing with conjugate operation is the closedness withrespect to the sup operation. For simple convexity the key link is that a convexfunction is the sup of all the affine functions it dominates. It is a fact based on thefundamental convex separation theorem.


Generalized convexity can characterize many classes of functions. The following are a few examples that showcase the potential of this concept.

Example 1.5.5 Let ⟨·, ·⟩ be the dual pairing between X and X∗. Define c(x, x∗) = ln⟨x, x∗⟩, with ln t = −∞ for t ≤ 0. Then a function f : X → R ∪ {+∞} is Φ_{c(1)}-convex if and only if e^f (with the convention e^{−∞} = 0) is sublinear.

Example 1.5.6 Let X = Y = [0, +∞] and define c(x, y) = xy, with the convention a(+∞) = +∞. Then a function f : X → R ∪ {+∞} is Φ_{c(1)}-convex if and only if it is convex and nondecreasing.

Example 1.5.7 Let X be a Hilbert space and Y = R+ × X. Define c(x, (ρ, y)) = −ρ∥x − y∥². Then f : X → R ∪ {+∞} is Φ_{c(1)}-convex if and only if it is lower semicontinuous and has a finite minorant ϕ ∈ Φ_{c(1)}.

The concept of subdifferential and its relationship with the Fenchel conjugate can also be generalized.

Definition 1.5.8 (Generalized Subdifferential) Let c be a function on X × Y. We say y_0 (x_0) is a c(1) (c(2))-subdifferential of f (g) at x_0 (y_0) if

f(x) − c(x, y_0)    (g(y) − c(x_0, y))

attains its minimum at x_0 (y_0).

Notation: y_0 ∈ ∂_{c(1)}f(x_0) (x_0 ∈ ∂_{c(2)}g(y_0)).

Theorem 1.5.9 (Generalized Fenchel–Young Equality)

(i) (Fenchel equality) y_0 ∈ ∂_{c(1)}f(x_0) iff f(x_0) + f^{c(1)}(y_0) = c(x_0, y_0).
(ii) (Symmetry) y_0 ∈ ∂_{c(1)}f^{c(1)c(2)}(x_0) iff x_0 ∈ ∂_{c(2)}f^{c(1)}(y_0).
(iii) (Φ convexity) ∂_{c(1)}f(x_0) ≠ ∅ implies that f is Φ_{c(1)}-convex at x_0. On the other hand, f being Φ_{c(1)}-convex at x_0 implies that ∂_{c(1)}f(x_0) = ∂_{c(1)}f^{c(1)c(2)}(x_0).

Proof. The argument for proving the Fenchel equality applies to (i) with ⟨y_0, x_0⟩ replaced by c(x_0, y_0). The rest follows from this generalized Fenchel equality. Details are left as an exercise. •

Similar to the usual subdifferential we have

Theorem 1.5.10 (Cyclical Monotonicity) The subdifferential ∂_{c(1)}f is c(1)-cyclically monotone, that is, for any pairs of points y_i ∈ ∂_{c(1)}f(x_i), i = 0, 1, . . . , m, we have

(c(x_1, y_0) − c(x_0, y_0)) + (c(x_2, y_1) − c(x_1, y_1)) + . . . + (c(x_0, y_m) − c(x_m, y_m)) ≤ 0.

Proof. The result follows by adding the following inequalities:

f(x_1) − f(x_0) ≥ c(x_1, y_0) − c(x_0, y_0)
f(x_2) − f(x_1) ≥ c(x_2, y_1) − c(x_1, y_1)
. . . . . .
f(x_0) − f(x_m) ≥ c(x_0, y_m) − c(x_m, y_m)

and noticing that all the terms on the left hand side cancel. •

Next we look at an axiomatic approach to the c-conjugate.

Theorem 1.5.11 (Characterization of c-conjugate) Define an operator ∆ that maps an extended valued function f on X to an extended valued function ∆f on Y. Then ∆ is a c-conjugate if and only if

(i) (Duality) ∆ inf_α f_α = sup_α ∆f_α,
(ii) (Shift reversing) ∆(f + d) = ∆(f) − d, for all d ∈ R,

where

c(x, y) = ∆(ι_{x})(y).

Proof. The “only if” part: The two properties can be derived from direct computation. For property (i)

(inf_α f_α)^{c(1)}(y) = sup_x [c(x, y) − inf_α f_α(x)]
                      = sup_x sup_α [c(x, y) − f_α(x)]
                      = sup_α sup_x [c(x, y) − f_α(x)] = sup_α f_α^{c(1)}(y).

For property (ii)

(f + d)^{c(1)}(y) = sup_x [c(x, y) − (f(x) + d)]
                  = sup_x [c(x, y) − f(x)] − d = f^{c(1)}(y) − d.

The “if” part: The key is the representation

f(·) = inf_x [ι_{x}(·) + f(x)].

Applying the ∆ operator to the above representation we have

(∆f)(y) = ∆(inf_x [ι_{x} + f(x)])(y)
        = sup_x (∆[ι_{x} + f(x)])(y)
        = sup_x [∆(ι_{x})(y) − f(x)] = f^{c(1)}(y)

where c(x, y) = ∆(ι_{x})(y).


Rockafellar Duality

Consider the bi-conjugate setting again. The primal problem is

p = v(0) = inf_{x∈X} F(x, 0) (1.5.22)

as one of the family v(y) = inf_x F(x, y) on the perturbation space Y. Let Z be the ‘dual parameter space’ and let c(y, z) be a coupling function. Define the dual problem as

d = v^{c(1)c(2)}(0) = sup_{z∈Z} [c(0, z) − v^{c(1)}(z)]. (1.5.23)

This definition is the same as the Rockafellar duality. However, since now c(0, z) is not necessarily 0 the problem is more involved.

Theorem 1.5.12 (Dual Solution Set) If d = v^{c(1)c(2)}(0) < ∞ then the optimal solution set to the dual problem is ∂_c v^{c(1)c(2)}(0).

Proof. It follows directly from the definition and is left as an exercise. •

Also similar to the Rockafellar duality we have

Theorem 1.5.13 (Weak and Strong Duality) We always have the weak duality d = v^{c(1)c(2)}(0) ≤ v(0) = p. Equality holds if and only if v is Φ_{c(1)}-convex at 0. In this case if d = p is finite then the optimal solution set to the dual problem is ∂_{c(1)}v(0).

Proof. As before the weak duality follows easily from the Fenchel-Young inequality. To prove strong duality notice that v being Φ_{c(1)}-convex at 0 implies that ∂_{c(1)}v(0) ≠ ∅. Then we can check that each element of ∂_{c(1)}v(0) is a solution to the dual problem. •

Lagrange Duality

Define the Lagrangian for the primal problem as

L(x, z) = c(0, z) − F_x^{c(1)}(z)

where F_x(y) := F(x, y). Then we have the Lagrange form of the primal: if F_x(y) is Φ_{c(1)}-convex for all x ∈ X at y = 0 then

sup_z L(x, z) = sup_z [c(0, z) − F_x^{c(1)}(z)] = F_x^{c(1)c(2)}(0) = F_x(0) = F(x, 0).

Thus, the primal problem becomes

inf_x sup_z L(x, z).

Next we consider the Lagrange form of the dual. If c < +∞ we have

inf_x L(x, z) = inf_x [c(0, z) − F_x^{c(1)}(z)]
              = inf_x [c(0, z) − sup_y (c(y, z) − F_x(y))]
              = c(0, z) − sup_y [c(y, z) − inf_x F(x, y)]
              = c(0, z) − sup_y [c(y, z) − v(y)] = c(0, z) − v^{c(1)}(z).

Therefore, the dual problem becomes

sup_z inf_x L(x, z).

We see that the primal and dual values are equal if and only if

inf_x sup_z L(x, z) = sup_z inf_x L(x, z).

2

Financial Models in One Period Economy

Summary. This chapter focuses on financial models in a one period economy with a finite sample space. Mathematically, these models involve only finite dimensional spaces yet they still illustrate the main patterns.

In modeling the behavior of agents in a financial market, we usually use concave utility functions and convex risk measures to characterize their attitude towards risk. These agents are subject to various constraints ranging from the availability of capital and contractual obligations to clients to mandates from regulators. Thus, the theory regarding constrained (convex) optimization discussed in the previous chapter is most relevant. The Lagrange multipliers in such financial models often carry a special financial meaning and are worthy of attention. Moreover, as illustrated in the previous chapter, they also provide the key link between the primal and the dual problems.

2.1 Portfolio

Portfolio theory considers the one period financial model in which transactions can only take place at either the beginning or the end of the period, represented by t = 0 or 1, respectively. We use the probability space (Ω, F, P) to represent an economy where the σ-algebra F is generated by finitely many atoms, F = σ({B_1, . . . , B_N}). We use RV(Ω, F, P) to denote the Hilbert space of all F-measurable random variables endowed with the inner product

⟨x, y⟩ = E_P[xy] = Σ_{ω∈Ω} x(ω) y(ω) P(ω) = Σ_{i=1}^N x(B_i) y(B_i) P(B_i),

where x(B_i) and y(B_i) signify the common value of the F-measurable random variables x and y on the atom B_i, respectively. Elements in RV(Ω, F, P) represent the price or payoff of assets. In a one period economy we may think of the sample space as simply consisting of the atoms of F. Denoting ω_i = B_i, then Ω = {ω_1, . . . , ω_N}, P(ω_i) = P(B_i) and F contains all subsets of Ω.

A financial market is modeled by random vectors S_t = (S^0_t, S^1_t, . . . , S^M_t), t = 0, 1 on Ω in which S^0_t represents the price of a risk free asset and for simplicity is assumed to be cash here so that S^0_t = 1 for t = 0, 1, and St = (S^1_t, . . . , S^M_t) represents the prices of risky assets at time t. For each asset i > 0, we also assume that its price S^i_0 is a constant and S^i_1 is an F-measurable random variable.

Definition 2.1.1 (Portfolio) A portfolio is a vector Θ = (θ_0, θ_1, . . . , θ_M) ∈ R^{M+1} whose ith component θ_i signifies the share of the ith asset (with price at t represented by S^i_t) in the portfolio. The value of a portfolio Θ at time t is Θ · S_t, where the notation “·” signifies the dot product in R^{M+1}.

The question is what is the best portfolio. Since different agents have different preferences there is no unique answer to this question.

2.1.1 Markowitz Portfolio

Markowitz proposed his pioneering portfolio theory in his thesis and later published it in his book [36]. Markowitz considers only risky assets. The idea is that for a fixed expected return one should choose portfolios with minimum variation, which serves as a measure for the risk. In general, a portfolio with a higher expected return is also accompanied by a higher variation (risk). The tradeoff is left to the individual agent.

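Use $\hat S = (S^1, \dots, S^M)$ to denote the price process of the risky assets and $\hat\Theta = (\theta_1, \dots, \theta_M)$ to denote the portfolio. For a given expected payoff $r_0$ and an initial wealth $w_0$ we can formulate the problem as

\[
\begin{aligned}
\text{minimize}\quad & \mathrm{Var}(\hat\Theta \cdot \hat S_1)\\
\text{subject to}\quad & E[\hat\Theta \cdot \hat S_1] = r_0\\
& \hat\Theta \cdot \hat S_0 = w_0.
\end{aligned}
\tag{2.1.1}
\]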

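Regarding $\hat S$ as a row vector of random variables and $\hat\Theta$ as a row vector, and denoting $E[\hat S_1] = [E[S^1_1], \dots, E[S^M_1]]$,

\[
A = \begin{bmatrix} E[\hat S_1] \\ \hat S_0 \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} r_0 \\ w_0 \end{bmatrix},
\]

we can rewrite (2.1.1) as an entropy maximization problem

\[
\begin{aligned}
\text{minimize}\quad & f(x) := \tfrac{1}{2} x^\top \Sigma x\\
\text{subject to}\quad & Ax = b.
\end{aligned}
\tag{2.1.2}
\]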

Here $x = \hat\Theta^\top$ and

\[
\Sigma = E[(\hat S_1 - E(\hat S_1))^\top(\hat S_1 - E(\hat S_1))] = \big(E[(S^i_1 - E(S^i_1))(S^j_1 - E(S^j_1))]\big)_{i,j=1,\dots,M}.
\tag{2.1.3}
\]

The coefficient 1/2 is added to the risk function to make the computation easier. Clearly, Σ is a symmetric positive semidefinite matrix. We will assume that it is in fact positive definite. Then

\[
f^*(y) = \tfrac{1}{2} y^\top \Sigma^{-1} y.
\tag{2.1.4}
\]

The constraint qualification condition for strong duality here is $b \in \operatorname{range} A$, which is to say that $(r_0, w_0)$ is feasible for the constraint. Assuming that this constraint qualification condition is satisfied, it follows from Theorem 1.4.3 on strong duality that the value of problem (2.1.2) equals that of its dual:

\[
\max_y \Big\{ b^\top y - \tfrac{1}{2} y^\top A\Sigma^{-1}A^\top y \Big\} = \tfrac{1}{2} b^\top (A\Sigma^{-1}A^\top)^{-1} b.
\tag{2.1.5}
\]

Here the optimal solution to the dual is

\[
y = (A\Sigma^{-1}A^\top)^{-1} b.
\tag{2.1.6}
\]

Denoting by σ the minimum standard deviation of portfolios with expected return $r_0$, we have

\[
\sigma^2 = b^\top (A\Sigma^{-1}A^\top)^{-1} b.
\tag{2.1.7}
\]

Let x be the solution of (2.1.2). Decoupling tells us that

\[
f(x) + f^*(A^\top y) - \langle y, Ax\rangle = 0
\tag{2.1.8}
\]

and

\[
g(Ax) + g^*(-y) + \langle y, Ax\rangle = 0.
\tag{2.1.9}
\]

The equality (2.1.9) is an identity. The equality (2.1.8), via the Fenchel equality, tells us

\[
x = (f^*)'(A^\top y) = \Sigma^{-1}A^\top y.
\]

Thus the optimal portfolio is

\[
x = \Sigma^{-1}A^\top (A\Sigma^{-1}A^\top)^{-1} b.
\tag{2.1.10}
\]

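Define $\alpha = E[\hat S_1]\Sigma^{-1}E[\hat S_1]^\top$, $\beta = E[\hat S_1]\Sigma^{-1}\hat S_0^\top$ and $\gamma = \hat S_0\Sigma^{-1}\hat S_0^\top$. We have

Theorem 2.1.2 (Markowitz Portfolio Theorem) For given initial wealth $w_0$ and expected payoff $r_0$, the minimum risk, measured by the standard deviation σ, and the corresponding minimum risk portfolio $\hat\Theta$ are determined by

\[
\sigma(r_0, w_0) = \sqrt{\frac{\gamma r_0^2 - 2\beta r_0 w_0 + \alpha w_0^2}{\alpha\gamma - \beta^2}}
\tag{2.1.11}
\]

and

\[
\hat\Theta(r_0, w_0) = \frac{E[\hat S_1](\gamma r_0 - \beta w_0) + \hat S_0(\alpha w_0 - \beta r_0)}{\alpha\gamma - \beta^2}\,\Sigma^{-1}.
\tag{2.1.12}
\]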


Proof. Rewrite (2.1.7) and (2.1.10) in terms of α, β and γ defined above. •
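As a numerical illustration of Theorem 2.1.2, the sketch below evaluates (2.1.11) and (2.1.12) and checks that the resulting portfolio meets both constraints of (2.1.1). The three-asset market is hypothetical, and the assets are assumed uncorrelated so that Σ is diagonal and its inverse can be written without a linear algebra library.

```python
import math

# Hypothetical market with three uncorrelated risky assets (illustrative numbers).
mean_s1 = [1.05, 1.10, 1.20]   # E[S_1]
s0 = [1.0, 1.0, 1.0]           # S_0
var_s1 = [0.04, 0.09, 0.16]    # diagonal of Sigma; off-diagonal entries assumed 0

# alpha, beta, gamma as defined before Theorem 2.1.2 (Sigma^{-1} is diagonal here).
alpha = sum(m * m / v for m, v in zip(mean_s1, var_s1))
beta = sum(m * s / v for m, s, v in zip(mean_s1, s0, var_s1))
gamma = sum(s * s / v for s, v in zip(s0, var_s1))
det = alpha * gamma - beta ** 2

def markowitz(r0, w0):
    """Return (sigma, theta) following (2.1.11) and (2.1.12)."""
    sigma = math.sqrt((gamma * r0**2 - 2 * beta * r0 * w0 + alpha * w0**2) / det)
    theta = [
        (m * (gamma * r0 - beta * w0) + s * (alpha * w0 - beta * r0)) / (det * v)
        for m, s, v in zip(mean_s1, s0, var_s1)
    ]
    return sigma, theta

sigma, theta = markowitz(r0=1.12, w0=1.0)
expected_payoff = sum(t * m for t, m in zip(theta, mean_s1))
initial_cost = sum(t * s for t, s in zip(theta, s0))
variance = sum(t * t * v for t, v in zip(theta, var_s1))
print(sigma, theta, expected_payoff, initial_cost, variance)
```

The printed expected payoff and initial cost should reproduce the chosen targets, and the portfolio variance should equal the square of σ from (2.1.11).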

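Since both $\sigma(r_0, w_0)$ and $\hat\Theta(r_0, w_0)$ are positively homogeneous functions, we have

Corollary 2.1.3 Use µ to denote the expected return on unit initial wealth and let $\sigma = \sigma(\mu, 1)$ and $\hat\Theta = \hat\Theta(\mu, 1)$. Then

\[
\sigma = \sqrt{\frac{\gamma\mu^2 - 2\beta\mu + \alpha}{\alpha\gamma - \beta^2}}
\tag{2.1.13}
\]

and

\[
\hat\Theta = \frac{E[\hat S_1](\gamma\mu - \beta) + \hat S_0(\alpha - \beta\mu)}{\alpha\gamma - \beta^2}\,\Sigma^{-1}.
\tag{2.1.14}
\]

Moreover, $\sigma(\mu w_0, w_0) = w_0\sigma$ and $\hat\Theta(\mu w_0, w_0) = w_0\hat\Theta$.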

We now turn to a graphical interpretation of the Markowitz portfolio theory. Note that (2.1.13) also determines µ as a function of σ. Drawing this function on the σµ-plane, we get the following curve, called a Markowitz bullet because of its shape. It is also often referred to as the Markowitz frontier.

Fig. 2.1. Markowitz Bullet

Every point inside the Markowitz bullet represents a portfolio that can be moved horizontally to the left to a point on the boundary of the bullet. This point on the boundary represents a portfolio with the same expected return but less risk. For every point on the lower half of the boundary of the Markowitz bullet, one can find a corresponding point on the upper half of the boundary with the same standard deviation and a higher expected return. Thus, preferred portfolios are represented by points on the upper boundary of the Markowitz bullet. We note that the upper boundary of the Markowitz bullet has an asymptote whose slope can be determined by

\[
\lim_{\sigma\to\infty} \frac{\mu}{\sigma} = \sqrt{\frac{\alpha\gamma - \beta^2}{\gamma}}.
\tag{2.1.15}
\]

By taking the limit of the tangent lines at points on the boundary of the Markowitz bullet, one can show that the µ-intercept of this asymptote is at β/γ. This number will play an important role in our discussion of the capital asset pricing model. In fact, the asymptote for the upper boundary of the Markowitz bullet passes through this point.

Although the Markowitz bullet is nonlinear, the Markowitz portfolio is an affine function of the return. This leads to

Theorem 2.1.4 (Two Fund Theorem) Given two distinct portfolios on the Markowitz bullet, any portfolio on the Markowitz bullet can be represented as their linear combination.

Proof. This follows directly from the affine structure of the Markowitz optimal portfolio (2.1.12). •

Remark 2.1.5 In pointing out that all portfolios on the Markowitz frontier are generated by just two such portfolios, the two fund theorem has great practical significance. One can often use two broad based indices to approximate the two basic generating portfolios for the Markowitz frontier. This can be viewed as a theoretical foundation for the passive investment strategy of buying and holding broad based indices.

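If our sole goal is to minimize the risk then our problem becomes

\[
\begin{aligned}
\text{minimize}\quad & f(x) := \tfrac{1}{2} x^\top \Sigma x\\
\text{subject to}\quad & \hat S_0 x = w_0.
\end{aligned}
\tag{2.1.16}
\]

Using a similar argument one can show

Theorem 2.1.6 (Minimum Risk Portfolio) The minimum risk portfolio is

\[
\hat\Theta_{\min} = \gamma^{-1} w_0 \hat S_0 \Sigma^{-1}
\]

and its standard deviation is $\sigma_{\min} = \gamma^{-1/2} w_0$.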
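Theorem 2.1.6 can be checked numerically in the same spirit. The sketch below uses a hypothetical market with three uncorrelated risky assets (so Σ is diagonal, an assumption made only to keep the code dependency-free) and compares the minimum risk portfolio with a nearby portfolio of the same initial cost.

```python
import math

# Hypothetical data: three uncorrelated risky assets.
s0 = [1.0, 1.0, 1.0]           # S_0
var_s1 = [0.04, 0.09, 0.16]    # diagonal of Sigma (off-diagonal entries assumed 0)
w0 = 1.0                       # initial wealth

gamma = sum(s * s / v for s, v in zip(s0, var_s1))

# Theorem 2.1.6: theta_min = w0 * S_0 * Sigma^{-1} / gamma, sigma_min = w0 / sqrt(gamma).
theta_min = [w0 * s / (gamma * v) for s, v in zip(s0, var_s1)]
sigma_min = w0 / math.sqrt(gamma)

def variance(theta):
    return sum(t * t * v for t, v in zip(theta, var_s1))

# A perturbation d with d . S_0 = 0 keeps the wealth constraint satisfied.
d = [0.01, -0.01, 0.0]
perturbed = [t + e for t, e in zip(theta_min, d)]
print(theta_min, sigma_min, variance(theta_min), variance(perturbed))
```

Any feasible perturbation should increase the variance, which is what the last two printed numbers illustrate.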


2.1.2 Capital Asset Pricing Model

The capital asset pricing model (CAPM) works as follows. First it generalizes the Markowitz portfolio theory by allowing the risk free asset in the portfolio. It turns out that the optimal portfolios in this sense all lie on a straight line in the σµ-plane called the capital market line. Then the model prices a risky asset according to the principle that adding it to the market does not change the capital market line.

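We derive the capital market line using convex duality first. Similar to (2.1.1) we now face the problem of

\[
\begin{aligned}
\text{minimize}\quad & \mathrm{Var}(\Theta \cdot S_1)\\
\text{subject to}\quad & E[\Theta \cdot S_1] = \mu\\
& \Theta \cdot S_0 = 1.
\end{aligned}
\tag{2.1.17}
\]

Here we have standardized the initial wealth to 1 and µ is the expected return. Since $\mathrm{Var}(S^0_1) = 0$ one can show that

\[
\mathrm{Var}(\Theta \cdot S_1) = \mathrm{Var}(\hat\Theta \cdot \hat S_1).
\tag{2.1.18}
\]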

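Relation (2.1.18) suggests a strategy of solving problem (2.1.17) in two steps. First, for a portfolio with $\theta = \theta_0 \ge 0$, denoting by $R = S^0_1/S^0_0$ the return on the risk free asset, we solve the problem

\[
\begin{aligned}
\text{minimize}\quad & \mathrm{Var}(\hat\Theta \cdot \hat S_1)\\
\text{subject to}\quad & E[\hat\Theta \cdot \hat S_1] = \mu - \theta R\\
& \hat\Theta \cdot \hat S_0 = 1 - \theta.
\end{aligned}
\tag{2.1.19}
\]

Then, we minimize the minimum variance of (2.1.19) as a function of θ. By Theorem 2.1.2 the minimum variance corresponding to problem (2.1.19) as a function of θ is determined by

\[
f(\theta) = [\sigma(\mu - \theta R, 1 - \theta)]^2 = \frac{\gamma(\mu - \theta R)^2 - 2\beta(\mu - \theta R)(1 - \theta) + \alpha(1 - \theta)^2}{\alpha\gamma - \beta^2}.
\tag{2.1.20}
\]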

Clearly, the solution of problem (2.1.17) corresponds to the minimum of the function f, if it exists. Since f is a quadratic function of θ, the minimum is attained at

\[
\theta = \frac{\alpha - \beta(\mu + R) + \gamma\mu R}{\alpha - 2\beta R + \gamma R^2},
\tag{2.1.21}
\]

the solution to the equation $f'(\theta) = 0$. Denote $\Delta := \alpha - 2\beta R + \gamma R^2 > 0$. It is easy to see that the share invested in the risky assets is

\[
1 - \theta = (\beta - \gamma R)\frac{\mu - R}{\Delta}.
\tag{2.1.22}
\]

We observe that only µ > R makes sense, because by including risky assets we always expect to get a higher return than from the risk free asset. Note that the risky assets are involved in the minimum variance portfolio only when 1 − θ > 0. By (2.1.22), this implies

\[
R < \beta/\gamma.
\tag{2.1.23}
\]

Let us focus on the case when R satisfies (2.1.23). We can calculate

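\[
\mu - \theta R = (\alpha - \beta R)\frac{\mu - R}{\Delta}.
\tag{2.1.24}
\]

By the positive homogeneity of σ we have

\[
\sigma = \sigma(\mu - \theta R, 1 - \theta) = \sigma(\alpha - \beta R, \beta - \gamma R)\frac{\mu - R}{\Delta}.
\tag{2.1.25}
\]

It is easy to verify that $\sigma(\alpha - \beta R, \beta - \gamma R) = \sqrt{\Delta}$. Thus, all the optimal portfolios lie on the line

\[
\mu = R + \sqrt{\Delta}\,\sigma.
\tag{2.1.26}
\]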

This line on the σµ-plane is usually referred to as the capital market line. This linear structure of the optimal portfolios suggests that we can derive all the optimal portfolios as linear combinations of two distinct portfolios. Taking the risk free bond and a portfolio of pure risky assets we have the following

Theorem 2.1.7 (Two Fund Separation Theorem) All the optimal portfolios on the capital market line can be represented as a linear combination of the riskless bond and the capital market portfolio

\[
\hat\Theta_M = \frac{E[\hat S_1] - R\hat S_0}{\beta - \gamma R}\,\Sigma^{-1} = \frac{E[\hat S_1] - R\hat S_0}{(E[\hat S_1] - R\hat S_0)\Sigma^{-1}\hat S_0^\top}\,\Sigma^{-1},
\tag{2.1.27}
\]

whose corresponding coordinates in the σµ-plane are

\[
(\sigma_M, \mu_M) = \left(\frac{\sqrt{\Delta}}{\beta - \gamma R}, \frac{\alpha - \beta R}{\beta - \gamma R}\right).
\tag{2.1.28}
\]

Proof. Clearly the riskless bond is on the capital market line and can be represented in the σµ-plane as (0, R). We now seek a portfolio on the capital market line that contains only risky assets. We denote its coordinates by $(\sigma_M, \mu_M)$. Note that such a portfolio corresponds to θ = 0. It follows from (2.1.22) that

\[
\mu_M = R + \frac{\Delta}{\beta - \gamma R} = \frac{\alpha - \beta R}{\beta - \gamma R}.
\tag{2.1.29}
\]

Thus, we can find the risky part of the capital market portfolio by solving

\[
\begin{aligned}
\text{minimize}\quad & \mathrm{Var}(\hat\Theta \cdot \hat S_1)\\
\text{subject to}\quad & E[\hat\Theta \cdot \hat S_1] = \frac{\alpha - \beta R}{\beta - \gamma R}\\
& \hat\Theta \cdot \hat S_0 = 1.
\end{aligned}
\tag{2.1.30}
\]

By Theorem 2.1.2, we derive the optimal portfolio of (2.1.30) to be

\[
\hat\Theta_M = \frac{E[\hat S_1] - R\hat S_0}{\beta - \gamma R}\,\Sigma^{-1}.
\tag{2.1.31}
\]

Noting that the weight on the riskless bond is 0 for the capital market portfolio, we arrive at the representation in (2.1.27), with $\Theta_M = (0, \hat\Theta_M)$.

Finally, comparing (2.1.26) and (2.1.29), we derive

\[
\sigma_M = \frac{\sqrt{\Delta}}{\beta - \gamma R}.
\tag{2.1.32}
\]

•

Clearly, the point $(\sigma_M, \mu_M)$ lies on the boundary of the Markowitz bullet. Moreover, since the capital market line represents optimal portfolios, the Markowitz frontier must lie below it. Thus, the capital market line must be tangent to the Markowitz frontier at $(\sigma_M, \mu_M)$ (see Fig. 2.2). As a result, if R ≥ β/γ, there is no capital market line (see Fig. 2.3), which confirms what has been derived analytically in (2.1.23).

Fig. 2.2. Capital market line
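The derivation of the capital market line can be traced numerically. In the sketch below the values of α, β, γ and R are hypothetical, chosen only so that αγ > β² and R < β/γ hold. The code evaluates (2.1.21) and the risk function of Theorem 2.1.2, and compares the result with the line (2.1.26) and the market portfolio coordinates (2.1.28).

```python
import math

# Hypothetical market summary statistics with alpha*gamma > beta**2.
alpha, beta, gamma = 1.30, 1.15, 1.05
R = 1.02                       # risk free return; satisfies R < beta / gamma

def sigma(r0, w0):
    """Minimum standard deviation from (2.1.11)."""
    num = gamma * r0**2 - 2 * beta * r0 * w0 + alpha * w0**2
    return math.sqrt(num / (alpha * gamma - beta**2))

delta = alpha - 2 * beta * R + gamma * R**2

def cash_weight(mu):
    """Optimal share theta of the risk free asset from (2.1.21)."""
    return (alpha - beta * (mu + R) + gamma * mu * R) / delta

mu = 1.08                                  # target expected return, mu > R
theta = cash_weight(mu)
sig = sigma(mu - theta * R, 1 - theta)     # risk of the optimal mixed portfolio

# Coordinates of the capital market portfolio from (2.1.28).
mu_M = (alpha - beta * R) / (beta - gamma * R)
sigma_M = math.sqrt(delta) / (beta - gamma * R)
print(theta, sig, R + math.sqrt(delta) * sig, mu_M, sigma_M)
```

The third printed value should reproduce the target µ, and the cash weight evaluated at µ_M should vanish, in line with the proof of Theorem 2.1.7.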

Using the fact that both (0, R) and $(\sigma_M, \mu_M)$ belong to the capital market line, we can rewrite the capital market line as

\[
\mu = \frac{\mu_M - R}{\sigma_M}\,\sigma + R.
\tag{2.1.33}
\]

The theorem below tells us how to use this capital market line to price a risky asset in terms of its expected return.

Theorem 2.1.8 (Capital Asset Pricing Model) Suppose that we know a financial market S with a riskless bond returning R. Let $a_i$ be a fairly priced risky asset with expected percentage return $\mu_i$. Then

\[
\mu_i = R + \beta_i(\mu_M - R).
\tag{2.1.34}
\]

Here $\beta_i = \sigma_{iM}/\sigma_M^2$ is called the beta of $a_i$, where $\sigma_{iM} = \mathrm{cov}(a_i, \Theta_M \cdot S_1)$ is the covariance of $a_i$ and the market portfolio.

Fig. 2.3. No capital market line

Proof. Consider a portfolio, depending on a parameter α, that consists of the risky asset $a_i$ and the capital market portfolio:

\[
p(\alpha) = \alpha a_i + (1 - \alpha)\Theta_M \cdot S.
\tag{2.1.35}
\]

Denoting the expected return and the standard deviation of p(α) by $\mu_\alpha$ and $\sigma_\alpha$, respectively, we have

\[
\mu_\alpha = \alpha\mu_i + (1 - \alpha)\mu_M,
\tag{2.1.36}
\]

and

\[
\sigma_\alpha^2 = \alpha^2\sigma_i^2 + 2\alpha(1 - \alpha)\sigma_{iM} + (1 - \alpha)^2\sigma_M^2,
\tag{2.1.37}
\]

where $\mu_i$ and $\sigma_i$ are the expected return and standard deviation of asset $a_i$, respectively. The parametric curve $(\sigma_\alpha, \mu_\alpha)$ must lie below the capital market line because the latter consists of optimal portfolios. On the other hand, it is clear that when α = 0 this curve meets the capital market line. Thus, the capital market line is a tangent line of the parametric curve $(\sigma_\alpha, \mu_\alpha)$ at α = 0. It follows that

\[
\frac{\mu_M - R}{\sigma_M} = \left[\frac{d\mu_\alpha}{d\sigma_\alpha}\right]_{\alpha=0} = \frac{\sigma_M(\mu_i - \mu_M)}{\sigma_{iM} - \sigma_M^2}.
\tag{2.1.38}
\]

Solving for $\mu_i$ we derive

\[
\mu_i = R + \beta_i(\mu_M - R).
\tag{2.1.39}
\]
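To see the pricing formula at work, the sketch below computes a beta on a small finite probability space and evaluates the right hand side of (2.1.34). The scenario probabilities, the market returns and the asset returns are all hypothetical, and `capm_expected_return` is an illustrative helper name, not notation from the text.

```python
# Hypothetical three-scenario economy.
probs = [0.25, 0.50, 0.25]
market = [0.90, 1.10, 1.30]    # return of the capital market portfolio per scenario
asset = [0.85, 1.10, 1.45]     # return of a risky asset a_i per scenario
R = 1.02                       # return of the riskless bond

def mean(x):
    return sum(p * v for p, v in zip(probs, x))

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum(p * (a - mx) * (b - my) for p, a, b in zip(probs, x, y))

def capm_expected_return(x):
    """Right hand side of (2.1.34): R + beta_x * (mu_M - R)."""
    beta_x = cov(x, market) / cov(market, market)
    return R + beta_x * (mean(market) - R)

beta_i = cov(asset, market) / cov(market, market)
print(beta_i, capm_expected_return(asset))
```

Two sanity checks follow from the formula itself: the market portfolio has beta 1 and is assigned the return µ_M, while a payoff that is constant across scenarios has beta 0 and is assigned the riskless return R.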

2.1.3 Sharpe Ratio

Thinking a little more, we realize that to construct the capital market portfolio we would, in theory, need to use every risky asset available to us. Given the huge number of available equities, constructing the capital market portfolio is practically impossible even if we have accurate probability distribution information on all the available risky assets (which is another impossible task). Thus, we have to deal with suboptimal situations. What happens if we mix the risk free asset with an arbitrary portfolio of risky assets (not necessarily the capital market portfolio)? Let $\hat\Theta = (\theta_1, \dots, \theta_M)$ be such a portfolio corresponding to risky assets $(a_1, \dots, a_M)$ with price random vector $\hat S = (S^1, \dots, S^M)$. Again we standardize the portfolio so that $\hat\Theta \cdot \hat S_0 = 1$. Denote $\mu^* = E[\hat\Theta \cdot \hat S_1]$ and $\sigma^* = \sqrt{\mathrm{Var}(\hat\Theta \cdot \hat S_1)}$. Then any mix of this portfolio with a risk free asset having return R will produce a portfolio whose expected return µ and standard deviation σ lie on the line

\[
\mu = \frac{\mu^* - R}{\sigma^*}\,\sigma + R.
\tag{2.1.40}
\]

Portfolios of risky assets with larger $(\mu^* - R)/\sigma^*$ have the potential of generating a higher return for a fixed level of risk (see Fig. 2.4). Sharpe proposed a formula based on this idea to compare risky portfolios such as those maintained by mutual funds. As an illustration, suppose that $R_1, \dots, R_N$ are the monthly returns of a mutual fund a in the past N months and the monthly return of the risk free asset is R. Define a random variable X with finite values $\{R_n - R \mid n = 1, \dots, N\}$ and $\mathrm{prob}(X = R_n - R) = 1/N$. Then the Sharpe ratio of a is defined as

\[
s(a) = \frac{E[X]}{\sqrt{\mathrm{Var}(X)}}.
\tag{2.1.41}
\]

We can see that the Sharpe ratio is, in fact, a statistical estimate of $(\mu^* - R)/\sigma^*$.

Fig. 2.4. Sharpe ratio
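The Sharpe ratio (2.1.41) is straightforward to compute from a return history. The sketch below follows the definition literally, giving each observed excess return probability 1/N (so the variance is the population variance); the monthly returns are made-up numbers for illustration.

```python
import math

def sharpe_ratio(returns, risk_free):
    """s(a) = E[X] / sqrt(Var(X)) with X uniform on the excess returns R_n - R."""
    excess = [r - risk_free for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / n   # population variance
    return mean / math.sqrt(var)

# Hypothetical monthly returns of a fund and a hypothetical risk free monthly return.
monthly = [0.02, -0.01, 0.03, 0.01, 0.00, 0.015]
print(sharpe_ratio(monthly, risk_free=0.002))
```

Practitioners often use the sample variance (dividing by N − 1) and annualize the result; those are conventions layered on top of the definition given here.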

2.2 Utility Functions

In financial problems maximizing utilities and minimizing risks are constant themes. In the Markowitz portfolio theory, one uses the expected return to measure performance and the variance to measure the risk. They are among the simplest of such measures. In general, utility functions are most of the time concave and risk measures are convex. Hence convex analysis is a natural tool in dealing with financial modeling.


2.2.1 Utility Functions

In 1738, while working in St. Petersburg, Daniel Bernoulli posed the following problem, later known as the St. Petersburg Wager paradox:

“Peter tosses a coin and continues to do so until it should land ‘heads’ when it comes to the ground. He agrees to give Paul one ducat if he gets ‘heads’ on the very first throw, two ducats if he gets it on the second, four if on the third, eight if on the fourth, and so on, so that with each additional throw the number he must pay is doubled. Suppose we seek to determine the value of Paul’s expectation.”

Of course, assuming a fair coin, we can easily calculate the expectation to be

\[
\sum_{n=1}^{\infty} 2^{n-1} \cdot P(\text{getting the first head on the $n$th throw}) = \sum_{n=1}^{\infty} 2^{n-1}\,\frac{1}{2^n} = \sum_{n=1}^{\infty} \frac{1}{2} = \infty.
\]

The paradox is that, according to this computation, the value of the right to play such a game would be infinite. In other words, one would be willing to pay any price to play it, which is obviously absurd.

One way to resolve this is to note that Peter has only a limited amount of money, so that he must impose a limit on the payoff and, therefore, restrict the number of coin tosses.

This looks more like playing a lottery. However, the expectation alone does not explain the fact that people keep playing the lottery even though it is not a fair game (since the lottery makes money, the expected payoff must be less than the price of a ticket). Daniel Bernoulli himself suggested a solution which later became highly influential. Observing that an extra 100 ducats may be considered a small fortune by a poor person while meaning little to a rich one, Daniel Bernoulli argued that people intuitively value money not according to its face value but according to its relative usefulness. Mathematically, he introduced the utility function to capture this. For the St. Petersburg Wager problem, Bernoulli suggested using u(x) = ln(x) as the utility function.
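The contrast between the expected payoff and the expected log utility can be made concrete. The sketch below truncates the game after a fixed number of tosses and, as a simplifying assumption, applies ln to the payoff alone, ignoring Paul's existing wealth.

```python
import math

def truncated_expectation(n_max):
    """Expected payoff when the game is stopped after n_max tosses."""
    return sum(2 ** (n - 1) * 0.5 ** n for n in range(1, n_max + 1))

def expected_log_utility(n_max):
    """Expected value of ln(payoff) for the truncated game."""
    return sum(math.log(2 ** (n - 1)) * 0.5 ** n for n in range(1, n_max + 1))

for n_max in (10, 20, 40):
    print(n_max, truncated_expectation(n_max), expected_log_utility(n_max))

# Payoff whose log utility matches the expected log utility of the game.
certainty_equivalent = math.exp(expected_log_utility(200))
print(certainty_equivalent)
```

Each additional toss adds 1/2 to the expected payoff, so the expectation grows without bound, while the expected log utility converges to ln 2. Under this simplified reading the game is worth about two ducats to a log utility agent.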

Bernoulli chose ln as a utility function because of two properties of this function. First, the ln function is increasing, signaling that more is better. Second, the derivative of the ln function is 1/x, which is decreasing. This matches the intuition that the more you have, the less you care about additional money. Abstractly, let us denote a utility function by u(x). For convenience let us assume u is twice differentiable. Then we can characterize the above two properties as u′(x) ≥ 0 and u′′(x) ≤ 0. Alternatively, without assuming differentiability of u, we can also encode the intuition above mathematically by requiring a utility function to be an increasing concave function. We say a function f : R → R is concave if and only if −f is convex. Equivalently, f is convex if −f is concave. Usually we assume that rational agents maximize their expected utility when making decisions. Thus, convex optimization becomes important in analyzing financial problems.

There are many increasing concave functions. A few are listed below.

• Power utility: $(x^{1-\gamma} - 1)/(1 - \gamma)$, γ > 0, γ ≠ 1.
• Log utility: ln(x).
• Exponential utility: $-e^{-\alpha x}$, α > 0.

In dealing with a particular application problem, the choice of the utility function is often based on economic or tractability considerations. Different agents can have different utility functions that reflect their own attitudes towards rewards and risks of various degrees.

For our mathematical model, it is important to know what kind of general conditions we should impose on a utility function. We consider a general extended valued upper semicontinuous utility function u. The following is a collection of additional conditions that are often used in financial models to accommodate different levels of tolerance to risk:

(u1) (Risk aversion) u is strictly concave,
(u2) (Profit seeking) u is strictly increasing and $\lim_{t\to+\infty} u(t) = +\infty$,
(u3) (Bankruptcy forbidden) For any t < 0, u(t) = −∞.

2.2.2 Measuring Risk Aversion

Comparing tendencies toward risk aversion by directly examining the utility functions is difficult. The following tools are useful.

Definition 2.2.1 (Arrow-Pratt Absolute Risk Aversion Coefficient (ARA)) The coefficient of absolute risk aversion is defined as

\[
A(x) = -\frac{u''(x)}{u'(x)}.
\]

Constant absolute risk aversion (CARA) refers to the case where A(x) = α is a constant, e.g. $u(x) = 1 - e^{-\alpha x}$. Hyperbolic absolute risk aversion (HARA) refers to the case where A(x) = 1/(ax + b) is a hyperbolic function, e.g.

\[
u(x) = \frac{(x - x_0)^{1-\gamma}}{1 - \gamma},
\]

where γ = 1/a, $x_0 = -b/a$.

Definition 2.2.2 (Relative Risk Aversion Coefficient (RRA)) The coefficient of relative risk aversion is defined as

\[
R(x) = -\frac{x\,u''(x)}{u'(x)}.
\]

When ARA decreases the investor will increase risky investment in absolute amount. Similarly, when RRA decreases the investor will increase risky investment in percentage.
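The two coefficients can be estimated numerically for any smooth utility function. The sketch below uses central finite differences (an approximation chosen here for illustration; the step size `h` and the parameter values are assumptions) and applies them to the exponential, power and log utilities.

```python
import math

def derivatives(u, x, h=1e-4):
    """Central finite-difference estimates of u'(x) and u''(x)."""
    d1 = (u(x + h) - u(x - h)) / (2 * h)
    d2 = (u(x + h) - 2 * u(x) + u(x - h)) / h ** 2
    return d1, d2

def ara(u, x):
    """Absolute risk aversion A(x) = -u''(x)/u'(x)."""
    d1, d2 = derivatives(u, x)
    return -d2 / d1

def rra(u, x):
    """Relative risk aversion R(x) = -x u''(x)/u'(x)."""
    return x * ara(u, x)

a_param, g_param = 2.0, 3.0   # hypothetical alpha and gamma
exp_utility = lambda x: -math.exp(-a_param * x)
power_utility = lambda x: (x ** (1 - g_param) - 1) / (1 - g_param)

x = 1.5
print(ara(exp_utility, x))     # approximately alpha (CARA)
print(rra(power_utility, x))   # approximately gamma (constant RRA)
print(rra(math.log, x))        # approximately 1
```

The exponential utility has constant ARA equal to α, while the power and log utilities have constant RRA, equal to γ and 1 respectively, so their ARA is hyperbolic.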

The property that a utility function has bounded ARA or RRA can be characterized by generalized convexity. We showcase the proof for RRA.

Theorem 2.2.3 (Characterization of Bounded Relative Risk Aversion) Let $u : \mathbb{R}_+ \to \mathbb{R}$ be an increasing (decreasing) function with continuous second order derivative. Then, for any $p \in \mathbb{R}$, u has a coefficient of relative risk aversion $R(x) \le (\ge)\, 1 - p$ if and only if u is $\Phi(x^p y)(1)$-convex.

Proof. We focus on the case that u is increasing; the decreasing case is similar. The “if” part. Assume u is $\Phi(x^p y)(1)$-convex. Then, for any x > 0 we can find y(x), b(x) such that

\[
u(z) \ge y(x)z^p - b(x), \quad \text{for all } z > 0,
\]

with equality at z = x. Let

\[
z \mapsto f(z) := u(z) - y(x)z^p + b(x).
\]

Since f attains its minimum value 0 at z = x, we have $f'(x) = 0$, $f''(x) \ge 0$, which give us

\[
R(x) = -\frac{x\,u''(x)}{u'(x)} \le 1 - p.
\]

The “only if” part. Write the R(x) ≤ 1 − p condition as

\[
\frac{u''(s)}{u'(s)} \ge \frac{p - 1}{s}.
\]

Then solve for u on [x, z]. Details are left as an exercise. •

Similarly, we have

Theorem 2.2.4 (Characterization of Bounded Absolute Risk Aversion) Let $u : \mathbb{R}_+ \to \mathbb{R}$ be an increasing (decreasing) function with continuous second order derivative. Then, for any $p \in \mathbb{R}$, u has a coefficient of absolute risk aversion $A(x) \le (\ge)\, p$ if and only if u is $\Phi(e^{-px} y)(1)$-convex.

Remark 2.2.5 It is not hard to see that the above two theorems are also valid for functions with piecewise continuous second order derivatives.

2.2.3 Growth Portfolio Theory

Maximizing the expected log utility leads to the growth portfolio theory. The convex optimization problem can be stated as

\[
\text{maximize}\quad E[\ln(\Theta \cdot S_1)]
\tag{2.2.1}
\]

\[
\text{subject to}\quad \Theta \cdot S_0 = w_0.
\tag{2.2.2}
\]

To make the theory fit various applications one often standardizes it by assuming $w_0 = 1$, $S_0 = 1$, $S^0_1 = 1$ and $\Theta_0 = 1$. Then $g = S_1 - S_0$ represents the vector of percentage returns of the risky assets in the market. We can then write problem (2.2.1) as

\[
\text{maximize}\quad E[\ln(1 + \Theta \cdot g)].
\tag{2.2.3}
\]

Note that

\[
E[\ln(1 + \Theta \cdot g)] = \sum_{n=1}^{N} \ln(1 + \Theta \cdot g(B_n))P(B_n) = \ln\Big[\prod_{n=1}^{N} (1 + \Theta \cdot g(B_n))^{P(B_n)}\Big].
\]

Problem (2.2.3) has the same effect as maximizing

\[
\prod_{n=1}^{N} (1 + \Theta \cdot g(B_n))^{P(B_n)},
\]

the compounded return or “growth”.

A growth optimal portfolio has the theoretical advantage of the maximum rate of growth of one’s wealth. However, in practice it often suffers the drawback of being too risky. To understand this risk let us look at a simple financial market with only one risky asset. In this case s = Θ is just one real number. For simplicity of notation we denote $g_n = g(B_n)$ and $p_n = P(B_n)$. Then the growth portfolio optimization problem becomes

\[
\text{maximize}\quad f(s) = \sum_{n=1}^{N} p_n \ln(1 + sg_n).
\tag{2.2.4}
\]

We will call f(s) a log return function.

Theorem 2.2.6 (Compute the Optimal Leverage) Assume without loss of generality that $g_1 < g_2 < \dots < g_N$, with $g_1 < 0 < g_N$ so that the interval below is well defined. Then the optimal leverage s is determined by the unique solution of the (N − 1)th order polynomial equation

\[
0 = \prod_{n=1}^{N} (1 + sg_n)\left(\sum_{n=1}^{N} \frac{p_n g_n}{1 + sg_n}\right)
\tag{2.2.5}
\]

on the interval $(-1/g_N, -1/g_1)$.

Proof. Since the log return function,

\[
f(s) = \sum_{n=1}^{N} p_n \ln(1 + sg_n),
\]

is a strictly concave function on $(-1/g_N, -1/g_1)$, its derivative is strictly decreasing. Moreover, it is easy to see that $\lim_{s\to(-1/g_N)^+} f'(s) = \infty$ and $\lim_{s\to(-1/g_1)^-} f'(s) = -\infty$. Thus, there is a unique solution s to the equation

\[
0 = f'(s) = \sum_{n=1}^{N} \frac{p_n g_n}{1 + sg_n}
\tag{2.2.6}
\]

on $(-1/g_N, -1/g_1)$, which is the optimal leverage.

Finally, observe that the polynomial $\prod_{n=1}^{N}(1 + sg_n)$ has no root in the interval $(-1/g_N, -1/g_1)$, which shows that s must be the unique solution of the (N − 1)th order polynomial equation

\[
0 = \prod_{n=1}^{N} (1 + sg_n)\left(\sum_{n=1}^{N} \frac{p_n g_n}{1 + sg_n}\right)
\]

on the interval $(-1/g_N, -1/g_1)$. •

When the market has only two or three states, explicit solutions are not hard to derive. These results are very useful for analyzing betting on games and are, therefore, presented below.

Proposition 2.2.7 (Two States) Consider a market with two distinct states represented by $g_1 < g_2$ corresponding to probabilities $p_1$ and $p_2$, respectively. Then the best investment size is

\[
s = -\frac{p_1 g_1 + p_2 g_2}{g_1 g_2}.
\tag{2.2.7}
\]

Proof. The log return function for such an investment system is $f(s) = p_1\ln(1 + sg_1) + p_2\ln(1 + sg_2)$. By Theorem 2.2.6, the best investment size s is the solution of the equation

\[
0 = (1 + sg_1)(1 + sg_2)\left(\frac{p_1 g_1}{1 + sg_1} + \frac{p_2 g_2}{1 + sg_2}\right).
\]

Solving this equation produces equation (2.2.7). •

Proposition 2.2.8 (Three States) Consider a market with three distinct states represented by $g_1 < g_2 < g_3$ corresponding to probabilities $p_1$, $p_2$ and $p_3$, respectively. Then the best investment size s is given by

\[
s = \begin{cases}
0 & \text{if } C = 0,\\[4pt]
-\dfrac{p_1 g_1 + p_3 g_3}{(p_1 + p_3) g_1 g_3} & \text{if } g_2 = 0,\\[8pt]
\dfrac{-B + \sqrt{B^2 - 4AC}}{2A} & \text{if } C < 0,\ g_2 \ne 0,\\[8pt]
\dfrac{-B - \sqrt{B^2 - 4AC}}{2A} & \text{if } C > 0,\ g_2 \ne 0.
\end{cases}
\tag{2.2.8}
\]

Here $A = g_1 g_2 g_3$, $B = g_2[p_3 g_3 + p_1 g_1 + p_2(g_1 + g_3)] + (p_1 + p_3)g_1 g_3$ and $C = p_1 g_1 + p_2 g_2 + p_3 g_3$.

Proof. The proof is similar to that of Proposition 2.2.7 and is left as an exercise. •

Remark 2.2.9 (The Kelly Criterion and the Shannon Information Rate) In Proposition 2.2.7, if the payoffs are symmetric and standardized, $-g_1 = g_2 = 1$, then at the best leverage size

\[
s = p_2 - p_1
\]

the value of the log return function is

\[
f(s) = p_1 \ln p_1 + p_2 \ln p_2 + \ln 2.
\]

This is Shannon’s information rate for a communication channel with noise [50]. Note that when $g_1 = -1$ and $g_2 = 1$ our portfolio is equivalent to a game with symmetric payoffs. This is exactly what Kelly observed in [27]: Shannon’s information rate can be explained as the best possible outcome of using a communication channel with noise when the signal is used for a game with symmetric payoffs.

Let us apply Proposition 2.2.7 to a simplified Blackjack game.

Example 2.2.10 (Money Management in Blackjack) In playing a certain version of Blackjack, we know that by counting cards a skilled player has a winning probability of 51% against the house. We simplify the problem by assuming that the win and loss are always equal to the bet, and apply Proposition 2.2.7 to determine the best betting size s as a percentage of the player’s entire bankroll. In this case $g_2 = 1$ (winning 100% of the bet), $g_1 = -1$ (losing 100% of the bet), $p_2 = 51\%$ and $p_1 = 49\%$. Thus, the best betting size is

\[
s = -\frac{p_1 g_1 + p_2 g_2}{g_1 g_2} = 2\%.
\]

This is actually recommended by Ed Thorp, an expert in the game of Blackjack and a pioneer in applying the Kelly method to investment management, in his classical book [56]. •

The game of Blackjack has changed a lot and the player’s advantage has mostly slipped away due to the use of multiple decks of cards and frequent shuffling. However, even if the assumption in Example 2.2.10 were correct, the optimal betting size s is too aggressive, as explained in the next example.

Example 2.2.11 Now consider playing a game with symmetric payoff t = −c = 1 with a winning probability of 90%. We can easily calculate that the best betting size is s = 80%. Putting 80% of your wealth on the line is clearly too aggressive no matter how favorable the game is to you. •
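The two betting sizes quoted in Examples 2.2.10 and 2.2.11, and the identity in Remark 2.2.9, can be reproduced in a few lines. The function names below are illustrative, not notation from the text.

```python
import math

def kelly_size(g1, p1, g2, p2):
    """Best investment size from (2.2.7)."""
    return -(p1 * g1 + p2 * g2) / (g1 * g2)

def log_return(s, g1, p1, g2, p2):
    """Log return function f(s) for a two-state game."""
    return p1 * math.log(1 + s * g1) + p2 * math.log(1 + s * g2)

blackjack = kelly_size(-1.0, 0.49, 1.0, 0.51)   # Example 2.2.10
favorable = kelly_size(-1.0, 0.10, 1.0, 0.90)   # Example 2.2.11
print(blackjack, favorable)

# Remark 2.2.9: at the best size, f(s) = p1 ln p1 + p2 ln p2 + ln 2.
shannon = 0.49 * math.log(0.49) + 0.51 * math.log(0.51) + math.log(2)
print(log_return(blackjack, -1.0, 0.49, 1.0, 0.51), shannon)
```

With symmetric unit payoffs the formula reduces to s = p2 − p1, the player's edge, which is why the 51% game calls for a 2% bet and the 90% game for an 80% bet.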

2.2.4 Efficiency Index

Despite the shortcomings of the growth portfolio theory, its idea, like that of the Markowitz portfolio theory, can also be used to construct a criterion for evaluating investment performance. The key is to realize, by examining e.g. Proposition 2.2.7, that the effectiveness of an investment strategy must be evaluated at an appropriate leverage level.

Example 2.2.12 We consider two simplified investment strategies, each with ten trades whose percentage gains (losses) are listed in the first two columns of Table 2.1. The effects of the two systems are tested using an investment capital of $100 with two different investment sizes: 100% and 30% of the available capital for each trade, respectively. The results show that with an investment size of 100% of the available capital for each trade, System 2 is better than System 1, but with an investment size of 30%, System 1 becomes better. •

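| Trade | S1 % gain | S2 % gain | 100% S1 | 100% S2 | 30% S1 | 30% S2 |
|-------|-----------|-----------|---------|---------|--------|--------|
| 1     | 13%       | 6%        | 113.00  | 106.00  | 103.90 | 101.80 |
| 2     | −25%      | 6%        | 84.75   | 112.36  | 96.11  | 103.63 |
| 3     | 13%       | −5%       | 95.77   | 106.74  | 99.86  | 102.08 |
| 4     | −25%      | −5%       | 71.83   | 101.40  | 92.37  | 100.55 |
| 5     | −25%      | 6%        | 53.87   | 107.49  | 85.44  | 102.36 |
| 6     | 13%       | −5%       | 60.87   | 102.11  | 88.77  | 100.82 |
| 7     | 13%       | −5%       | 68.79   | 97.00   | 92.23  | 99.31  |
| 8     | 13%       | −5%       | 77.73   | 92.16   | 95.83  | 97.82  |
| 9     | 13%       | 6%        | 87.83   | 97.69   | 99.57  | 99.58  |
| 10    | 13%       | 6%        | 99.25   | 103.55  | 103.45 | 101.37 |

Table 2.1. Effects of investment systems under different investment sizes.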
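The capital columns of Table 2.1 follow from compounding each trade at the chosen investment size. The sketch below regenerates them; `equity_curve` is an illustrative helper name, and intermediate entries may differ from the table by a cent depending on when rounding is applied.

```python
# Percentage gains of the two systems, trade by trade, as listed in Table 2.1.
s1_gains = [0.13, -0.25, 0.13, -0.25, -0.25, 0.13, 0.13, 0.13, 0.13, 0.13]
s2_gains = [0.06, 0.06, -0.05, -0.05, 0.06, -0.05, -0.05, -0.05, 0.06, 0.06]

def equity_curve(gains, size, capital=100.0):
    """Capital after each trade when a fraction `size` of capital is invested."""
    curve = []
    for g in gains:
        capital *= 1 + size * g
        curve.append(round(capital, 2))   # rounded for display only
    return curve

for label, gains in (("S1", s1_gains), ("S2", s2_gains)):
    for size in (1.0, 0.3):
        print(label, f"{size:.0%}", equity_curve(gains, size))
```

The final capital depends only on how many winning and losing trades occur, not on their order, so the ranking of the two systems at each investment size is a property of their gain profiles.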

How can we put them on a level playing field? One way is to compare them at their respective best investment sizes. This leads to the following definition.

Definition 2.2.13 (Efficiency Index) Suppose an investment strategy is characterized by its returns $g \in RV(\Omega, \mathcal{F}, P)$. We define its efficiency index γ as

\[
\gamma = \max_{s \in [-1/\max(g_n),\, -1/\min(g_n)]} \sum_{n=1}^{N} p_n \ln(1 + sg_n),
\tag{2.2.9}
\]

where $g_n = g(B_n)$ and $p_n = P(B_n)$.

If $g_n \ge 0$ for all n = 1, . . . , N, or $g_n \le 0$ for all n = 1, . . . , N, then we have an arbitrage signaled by γ = +∞. Otherwise the efficiency index γ is the log return of the portfolio of cash and the given investment strategy under the best leverage level. In view of Remark 2.2.9, the efficiency index gauges the useful information contained in an investment strategy.

Example 2.2.14 Let us re-examine Example 2.2.12 using the efficiency index. Both investment strategies in Example 2.2.12 have two distinct gains and their profiles are summarized below.

|            | g1  | p1  | g2   | p2  |
|------------|-----|-----|------|-----|
| Strategy 1 | 13% | 0.7 | −25% | 0.3 |
| Strategy 2 | 6%  | 0.5 | −5%  | 0.5 |

Drawing the log return functions of these two investment strategies simultaneously in Figure 2.5, we can understand the reasons behind the phenomenon observed in Example 2.2.12. Moreover, we see that neither strategy was tested in Example 2.2.12 under the best investment size. Using Proposition 2.2.7 we can calculate that, for Strategy 1, s = 49%, γ = 0.0040 and for Strategy 2, s = 167%, γ = 0.0041. If we compare the efficiency indices only, then the two investment strategies are almost the same, with Strategy 2 slightly better. Yet this fact is hard to unveil without the help of the efficiency index. However, if margin is not allowed then Strategy 1 is the better choice even though Strategy 2 has a slightly higher efficiency index. •

Fig. 2.5. Log return functions (curves for System 1 and System 2 plotted against the investment size x)
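The efficiency indices of the two strategies in Example 2.2.14 can be recomputed from their profiles using the closed form (2.2.7). In the sketch below, the third entry of each result is the log return when the investment size is capped at 100%, which is how the no-margin case is modeled here (an assumption made for illustration).

```python
import math

def best_size(g1, p1, g2, p2):
    """Best investment size from (2.2.7)."""
    return -(p1 * g1 + p2 * g2) / (g1 * g2)

def log_return(s, g1, p1, g2, p2):
    """Log return function f(s) for a two-state strategy."""
    return p1 * math.log(1 + s * g1) + p2 * math.log(1 + s * g2)

# Profiles (g1, p1, g2, p2) from Example 2.2.14.
strategies = {
    "Strategy 1": (0.13, 0.7, -0.25, 0.3),
    "Strategy 2": (0.06, 0.5, -0.05, 0.5),
}

results = {}
for name, profile in strategies.items():
    s = best_size(*profile)
    efficiency = log_return(s, *profile)              # efficiency index
    no_margin = log_return(min(s, 1.0), *profile)     # size capped at 100%
    results[name] = (s, efficiency, no_margin)
    print(name, round(s, 4), round(efficiency, 5), round(no_margin, 5))
```

Comparing the second and third entries shows how a cap on the investment size can reverse the ranking given by the efficiency index alone.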

Remark 2.2.15 While the Kelly criterion, as a special case of the growth optimal portfolio (GOP), can help us calculate a theoretical best betting size for any game, in practice such an optimal strategy is often too risky, as illustrated in Example 2.2.11. Various fractional Kelly betting schemes, often ad hoc, were proposed to limit the risk of the GOP. Recently Vince and Zhu [63] and Lopez de Prado, Vince and Zhu [33] provided theoretical justification for such more conservative betting strategies. They use a more realistic finite investment horizon and select the betting size based on risk adjusted returns. The analysis involves, however, nonconvex functions.

2.3 Fundamental Theorem of Asset Pricing

We turn to consider optimizing a general utility of the payoff of a portfolio $\Theta \in \mathbb{R}^{M+1}$. We wish to endow the space of portfolios with a norm that can reflect the size of a portfolio. Intuitively, the magnitude of Θ as a vector in $\mathbb{R}^{M+1}$ in a sense indicates the level of capital commitment or leverage level of a portfolio. However, one needs to be careful here. Holding a portfolio, an investor’s goal is to derive a risk adjusted gain represented by the random variable

\[
\Theta \cdot (S_1 - S_0) \in RV(\Omega, \mathcal{F}, P).
\tag{2.3.10}
\]

We can see that increasing or reducing the share of cash in the portfolio clearly swings the leverage level as measured by the magnitude of Θ, yet does nothing to the gain (2.3.10). The following example shows that even if we fix the share of cash, such a phenomenon can still happen.

Example 2.3.1 (Infinitely Many Portfolios with Equivalent Gain) Consider a state space Ω = {0, 1} and a financial market with three risky assets whose prices at times 0 and 1 are given by $S_0 = (1, 1, 1, 1)$, $S_1(0) = (1, 0.8, 0.9, 1)$ and $S_1(1) = (1, 1.1, 1.2, 1.1)$. We can easily verify that for the portfolio $\tilde\Theta = (1, 1, -2, 3)$, $\tilde\Theta \cdot (S_1 - S_0)(i) = 0$ for both i = 0 and i = 1. It follows that for any portfolio Θ and any $r \in \mathbb{R}$, all the portfolios $\Theta + r\tilde\Theta$ have the same gain. •
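The claim in Example 2.3.1 is easy to confirm numerically. The sketch below checks that the portfolio (1, 1, −2, 3) has zero gain in both states and that adding multiples of it to another portfolio leaves the gain unchanged; the `base` portfolio is an arbitrary hypothetical choice.

```python
# Market data from Example 2.3.1: cash plus three risky assets, two states.
s0 = (1.0, 1.0, 1.0, 1.0)
s1 = {0: (1.0, 0.8, 0.9, 1.0), 1: (1.0, 1.1, 1.2, 1.1)}
zero_gain = (1.0, 1.0, -2.0, 3.0)

def gain(theta, state):
    """Theta . (S_1 - S_0) evaluated in the given state."""
    return sum(t * (a - b) for t, a, b in zip(theta, s1[state], s0))

base = (0.0, 2.0, 1.0, 0.0)   # hypothetical portfolio used for comparison

def shifted(r):
    """The portfolio base + r * zero_gain."""
    return tuple(t + r * z for t, z in zip(base, zero_gain))

for r in (-5.0, 3.0, 100.0):
    print(r, [gain(shifted(r), i) for i in (0, 1)], [gain(base, i) for i in (0, 1)])
```

The Euclidean norm of the shifted portfolio grows with |r| while the printed gains stay the same, which is the point of the example.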

Notice that as |r| → ∞, the magnitude of $\Theta + r\tilde\Theta \in \mathbb{R}^{M+1}$, where $\tilde\Theta = (1, 1, -2, 3)$ is the zero-gain portfolio of Example 2.3.1, also goes to infinity. This example demonstrates that the magnitude of a portfolio in $\mathbb{R}^{M+1}$ is not an appropriate measure for the leverage level of the portfolio. Moreover, it clearly does not make sense in practice to use a portfolio of the form $\Theta + r\tilde\Theta$ with large |r|. This is because doing so will greatly increase the risk (as the price of assets in a financial market is not deterministic) without benefit to the gain. These considerations lead to the following definitions:

Definition 2.3.2 (Equivalent Portfolios) We say two portfolios $\Theta_1$ and $\Theta_2$ are equivalent in market S if they have the same initial value and the same gain, that is to say,

\[
\Theta_1 \cdot S_0 = \Theta_2 \cdot S_0
\tag{2.3.11}
\]

and, as random variables,

\[
\Theta_1 \cdot (S_1 - S_0) = \Theta_2 \cdot (S_1 - S_0).
\]

We will use S[Θ] to denote all the portfolios that are equivalent to Θ in market S.

Since all the portfolio in S[Θ] are equivalent we prefer those that have lowleverages as measured by ∥Θ∥. The following lemma provides us with an optimallyleveraged portfolio in each equivalent class.

Lemma 2.3.3 For any portfolio Θ in S, the optimization problem

min∥x∥ : x ∈ S[Θ]. (2.3.12)

has a unique solution Θ. Moreover, there exists a constant K = K(S) dependingonly on S such that, for any portfolio Θ,

∥Θ∥ ≤ K∥Θ · (S1 − S0)∥RV . (2.3.13)

Here ∥ · ∥ is the Euclidean norm on RM+1 and ∥ · ∥RV is the norm on RV (Ω,F , P )induced by the inner product ⟨·, ·⟩.

Proof. Note that problem (2.3.12) and the following problem (2.3.14) has the samesolution

min∥x∥2 : x ∈ S[Θ]. (2.3.14)

Denote

A =

S1(B1)− S0

S1(B2)− S0

. . .S1(BN )− S0

,where B1, . . . , BN are the set of atoms of the probability space (Ω,F , P ). ThenA is a N × (M + 1) matrix.

We observe that x ∈ S[Θ] amounts to requiring

Ax = Θ · (S1 − S0). (2.3.15)

We first consider the special case when rank(A) = min(M + 1, N). If rank(A) = M + 1, the constraint uniquely determines the solution Θ̂ = x = (A⊤A)^{−1}A⊤[Θ · (S1 − S0)]. Otherwise, rank(A) = N and the quadratic function ∥x∥^2 attains a minimum on the affine set characterized by the linear constraint. It is easy to calculate this solution to be Θ̂ = x = A⊤(AA⊤)^{−1}[Θ · (S1 − S0)]. In both cases Θ̂ is unique. Moreover, defining

K = K(S) = max(∥(A⊤A)^{−1}A⊤∥, ∥A⊤(AA⊤)^{−1}∥),

where only the term whose inverse exists is taken, we have (2.3.13).

If rank(A) < min(M + 1, N) then we can first remove the rows or columns in A that are dependent on others and then apply the above special case to the reduced matrix A. •
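The full row rank case of the proof can be tried on the market of Example 2.3.1. The following is only a sketch under stated assumptions: the helper names and the starting portfolio theta are hypothetical, only the gain constraint (2.3.15) is imposed (as in the proof), and the 2 × 2 system is solved by Cramer's rule so that no library is needed.

```python
# Minimum-norm portfolio with a prescribed gain: x = A^T (A A^T)^{-1} g,
# for the two-state, four-asset market of Example 2.3.1 (rank(A) = N = 2).

def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def min_norm_portfolio(A, gain):
    """Return A^T (A A^T)^{-1} gain for a 2-row matrix A of full row rank."""
    # Gram matrix G = A A^T (2 x 2).
    g11 = sum(a * a for a in A[0])
    g12 = sum(a * b for a, b in zip(A[0], A[1]))
    g22 = sum(b * b for b in A[1])
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-14:
        raise ValueError("rows of A are linearly dependent")
    # Solve G w = gain by Cramer's rule, then map back with A^T.
    w1 = (g22 * gain[0] - g12 * gain[1]) / det
    w2 = (g11 * gain[1] - g12 * gain[0]) / det
    return [w1 * a + w2 * b for a, b in zip(A[0], A[1])]

def norm(v):
    """Euclidean norm."""
    return sum(t * t for t in v) ** 0.5

S0 = [1.0, 1.0, 1.0, 1.0]
S1 = [[1.0, 0.8, 0.9, 1.0], [1.0, 1.1, 1.2, 1.1]]
A = [[s1 - s0 for s1, s0 in zip(row, S0)] for row in S1]

theta = [0.0, 1.0, 0.0, 0.0]      # hypothetical starting portfolio
gain = mat_vec(A, theta)          # its gain in the two states
theta_hat = min_norm_portfolio(A, gain)

print("gain of theta     :", gain)
print("gain of theta_hat :", mat_vec(A, theta_hat))
print("norms             :", norm(theta), norm(theta_hat))
```

The two portfolios have the same gain in both states, while theta_hat has the smaller Euclidean norm.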

Definition 2.3.4 (Portfolio Space) We call the quotient space of R^{M+1} with respect to the portfolio equivalence relation in market S the portfolio space on S and denote it port[S]. For Θ ∈ port[S] we define its norm by

∥Θ∥p = ∥Θ̂∥,

where Θ̂ is the minimum-norm representative of Θ given by Lemma 2.3.3. The portfolio space (port[S], ∥ · ∥p) is a finite dimensional Banach space.

2.3.1 Fundamental Theorem of Asset Pricing

Gain without risk is what every investor desires. Such opportunities arguably will not last, since everyone will try to chase them. Based on this observation, a guiding principle in a financial market is that arbitrage should not exist. The following is a formal definition.

Definition 2.3.5 (Arbitrage) We say that a portfolio Θ is an arbitrage if it involves no risk, Θ · (S1 − S0) ≥ 0, and has an opportunity to gain something, Θ · (S1 − S0) ≠ 0.
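The definition can be checked state by state. The sketch below uses the prices of Example 2.3.1 and assumes that both states have positive probability; the function names and the tolerance are illustrative choices, not part of the text.

```python
def gains(theta, S0, S1_states):
    """Gain theta . (S1 - S0) in each state."""
    return [sum(t * (s1 - s0) for t, s1, s0 in zip(theta, S1, S0))
            for S1 in S1_states]

def is_arbitrage(theta, S0, S1_states, tol=1e-12):
    """Definition 2.3.5: no loss in any state and a strict gain in some state.

    Assumes every listed state has positive probability; tol absorbs rounding.
    """
    g = gains(theta, S0, S1_states)
    return all(v >= -tol for v in g) and any(v > tol for v in g)

S0 = [1.0, 1.0, 1.0, 1.0]
S1_states = [[1.0, 0.8, 0.9, 1.0], [1.0, 1.1, 1.2, 1.1]]

print(is_arbitrage([0, 0, 0, 1], S0, S1_states))   # holds only the last asset
print(is_arbitrage([1, 1, -2, 3], S0, S1_states))  # the zero-gain portfolio
print(is_arbitrage([0, 1, 0, 0], S0, S1_states))   # loses money in state 0
```

With these prices the last asset never loses value and gains in state 1, so holding it alone satisfies the definition, whereas the zero-gain portfolio and the second asset alone do not.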

A rational investor with a utility function u satisfying conditions (u1)-(u3) will try to maximize the expected utility of the final wealth among all portfolios in port[S]. In other words, if w0 > 0 is the initial wealth of the investor, he wants to solve the following portfolio utility maximization problem. Find:

sup{E[u(w0 + Θ · (S1 − S0))] : Θ ∈ port[S]}. (2.3.16)

It turns out that the existence of an arbitrage opportunity is exactly characterized by the optimal value of problem (2.3.16) being +∞.

Theorem 2.3.6 (Characterizing Arbitrage with Utility Optimization) The portfolio space port[S] contains an arbitrage if and only if the optimal value of the utility optimization problem (2.3.16) is +∞.


Proof. The “only if” part is easy: if Θ ∈ port[S] is an arbitrage then so is rΘ for any r > 0. Then it is easy to see that E[u(w0 + rΘ · (S1 − S0))] → +∞ as r → +∞.

To prove the “if” part, assume the optimal value of problem (2.3.16) is +∞. Then there exists a sequence Θn ∈ port[S] such that E[u(w0 + Θn · (S1 − S0))] → +∞ as n → +∞. Necessarily, tn = ∥Θn · (S1 − S0)∥RV → +∞ as n goes to ∞. By Lemma 2.3.3 there exists a constant K = K(S) such that ∥Θn/tn∥ ≤ K. Without loss of generality we may assume that Θn/tn converges to some Θ∗ ∈ port[S]. Note that, for any n, Θn · (S1 − S0) ≥ −w0 by property (u3) of the utility function. Dividing by tn and letting n → ∞ gives Θ∗ · (S1 − S0) ≥ 0. Also,

∥Θ∗ · (S1 − S0)∥ ≥ lim inf_{n→∞} ∥Θn · (S1 − S0)/tn∥ = 1.

Therefore, Θ∗ is an arbitrage. •

The fundamental theorem of asset pricing (FTAP) links no arbitrage with the existence of risk neutral or martingale measures defined below:

Definition 2.3.7 (Equivalent Martingale Measure) We say that Q is an equivalent martingale measure (EMM) on economy (Ω,F , P ) for financial market S provided that, for any atom Bi of F , Q(Bi) = 0 if and only if P (Bi) = 0, and

EQ[S1] = S0.

Given an initial wealth w0 > 0, the set of all achievable wealth outcomes at the end of the one period economy t = 1 using all possible portfolios is

{w0 + Θ · (S1 − S0) : Θ ∈ port[S]} ⊂ RV (Ω,F , P ).

We denote the set of gains

W := {Θ · (S1 − S0) : Θ ∈ port[S]} ⊂ RV (Ω,F , P ).

In fact, W is a subspace of RV (Ω,F , P ). It is not hard to see that if Θ is an arbitrage portfolio then Θ · (S1 − S0) ∈ RV (Ω,F , P )+\{0}, where RV (Ω,F , P )+ is the cone of nonnegative random variables. Thus, no arbitrage can be described as

W ∩ (RV (Ω,F , P )+\{0}) = ∅.

Traditional proofs of the FTAP rely on applying an appropriate version of the cone separation theorem to ensure that there is a hyperplane separating W and RV (Ω,F , P )+. Then, a scaling of the normal vector of such a separating hyperplane gives us an equivalent martingale measure. This geometric picture is often interpreted as the no arbitrage price being independent of investors’ preferences. However, we will give a proof of the FTAP below based on portfolio utility optimization (2.3.16). We show that the equivalent martingale measure can be viewed as a scaling of the solution to the dual problem or, equivalently, the Lagrange multiplier related to such a utility optimization problem. As a result, a pricing martingale measure does depend on the utility function of the investor when the market is incomplete.


Theorem 2.3.8 (Refined Fundamental Theorem of Asset Pricing) Let S be a financial market, let u be a utility function that satisfies properties (u1), (u2) and (u3) and let w0 ≥ 0 be a given initial endowment. Then the following are equivalent:

(i) port[S] contains no arbitrage.
(ii) The optimal value of the portfolio utility optimization problem (2.3.16) is finite and attained.
(iii) There is an equivalent S-martingale measure proportional to a subgradient of −u at the optimal solution of (2.3.16).

Proof. First observe that the utility optimization problem (2.3.16) can be written equivalently as

max E[u(y)] (2.3.17)

subject to y ∈ w0 + W.

Define f(y) = −E[u(y)] and g(y) = ι_{w0+W}(y). Then we can rewrite problem (2.3.17) as

−min_y [f(y) + g(y)]. (2.3.18)

The dual problem of (2.3.18) is

−max_z [−f∗(−z) − g∗(z)] = min_z [E[(−u)∗(−z)] + ⟨w0, z⟩ + σW (z)]. (2.3.19)

Since we can check that the constraint qualification condition

w0 ∈ ri[dom g − dom f ] = ri[w0 + W − RV (Ω,F , P )+\{0}] (2.3.20)

(corresponding to (1.4.7)) holds, Fenchel strong duality implies that (2.3.18) and its dual (2.3.19) have the same value.

By Theorem 2.3.6, port[S] contains no arbitrage if and only if the optimal value of problem (2.3.16) is finite and, therefore, the values of problems (2.3.18) and (2.3.19) are all finite. Since W is a subspace, the fact that the optimal value of (2.3.19) is not −∞ implies that its solution z ⊥ W . Moreover, E[(−u)∗(−z)] > −∞ implies that z(Bi) > 0 whenever P (Bi) > 0. Thus, Q = z/E[z] is a risk neutral measure equivalent to P . That is, (i) implies (ii).

On the other hand, the existence of an equivalent S-martingale measure implies that the constraint qualification condition for (2.3.19) holds. In fact, problem (2.3.19) can be viewed as minimizing the convex function z → E[(−u)∗(−z)] + ⟨w0, z⟩ over the entire subspace W⊥ (z > 0 is merely a consequence of the domain of E[(−u)∗(·)] being a subset of int(−RV (Ω,F , P )+) and, therefore, is not a separate constraint). Thus, the constraint qualification condition for (2.3.19) is satisfied (see e.g. [65, Theorem 2.7.1]). It follows that problem (2.3.16), which is equivalent to (2.3.18), the dual of (2.3.19), has a finite value and attains its solution, which is to say (ii) implies (iii).

Finally, if (iii) is true then there cannot be any arbitrage in port[S] because adding an arbitrage to the optimal solution of (2.3.16) will improve it. Thus, (iii) implies (i) and we have completed a cyclic proof of the equivalence of (i), (ii) and (iii). •


An equivalent martingale measure can also be viewed as a scaling of a Lagrange multiplier for the portfolio utility optimization problem (2.3.16) due to the relationship between Lagrange multipliers and dual solutions highlighted in [9]. To see this let us rewrite problem (2.3.16) as a constrained minimization problem

minimize E[(−u)(x)] (2.3.21)

subject to x−Θ · (S1 − S0)− w0 = 0.

We already know from the proof of Theorem 2.3.8 that this problem has a solution (x∗, Θ∗). Moreover, strong duality holds and the dual problem has a solution, which implies that problem (2.3.21) has a Lagrange multiplier. Let λ be the Lagrange multiplier of problem (2.3.21). Then the Lagrangian is

L((x,Θ), λ) = E[(−u)(x)] + ⟨λ, x − Θ · (S1 − S0) − w0⟩
= E[(−u)(x)] + ⟨λ, x − w0⟩ − ⟨λ,Θ · (S1 − S0)⟩
= E[(−u)(x) + λ(x − w0)] − ⟨λ,Θ · (S1 − S0)⟩.

It attains its minimum at (x∗, Θ∗). Thus, we have ⟨λ, S1 − S0⟩ = 0 and −λ(Bi) ∈ ∂(−u)(x∗(Bi)), i = 1, 2, . . . , N for P (Bi) > 0. Since −u is strictly decreasing we have λ(Bi) > 0 whenever P (Bi) > 0. Moreover, dividing ⟨λ, S1 − S0⟩ = E[λ(S1 − S0)] = 0 by E[λ] and noticing that S0 is a constant vector we get

E[(λ/E[λ])S1] = S0.

This is to say that Q = (λ/E[λ])P is a martingale measure equivalent to P . We can see that this martingale measure is indeed a scaling of the Lagrange multiplier.

Condition (u3) can be removed from Theorem 2.3.8 to derive a generalization of the version of FTAP in [17].

Theorem 2.3.9 (Refined Fundamental Theorem of Asset Pricing) Let S be a market. Then the following are equivalent:

(i) There exists no arbitrage trading strategy in port[S].
(ii) There is an equivalent S-martingale measure.
(iii) There exists a utility function u with properties (u1) and (u2), such that the finite optimal value of the trading strategy utility optimization problem (2.3.16) is attained.

Proof. The implications (i) → (ii) → (iii) follow from Theorem 2.3.8. If the finite optimal value of the trading strategy utility optimization problem (2.3.16) is attained then there can be no arbitrage because superposing an arbitrage on the optimal solution will improve it. Thus (iii) also implies (i), completing a cyclic proof. •

Remark 2.3.10 Although the fundamental result that no arbitrage is equivalent to the existence of an equivalent martingale measure is well known, as pointed out in [67] the proof of Theorem 2.3.8 using a class of utility functions says more: when the martingale measure is not unique, the dual problem actually points to one particular martingale measure. Thus, in principle, every choice of risk neutral measure (corresponding to a particular price of the contingent claim) can be viewed as a particular portfolio optimization problem with a corresponding concave utility function.

The useful perspective we can get from this exercise is that pricing contingent claims either by a replicating portfolio or by using a martingale measure can be viewed as a special case of portfolio optimization with respect to a certain utility function. Moreover, when the market is not complete there are many possibilities in selecting the utility function. Thus, the pricing of contingent claims does rely on the trader’s preference. There can exist many different reasonable prices as a result of the differences in traders’ risk-reward preferences.

2.3.2 Pricing Contingent Claims

Suppose a contingent claim’s payoff at t = 1 is ϕ(S1), a function of the price of the assets at t = 1. To find the arbitrage free price ϕ0 of this contingent claim we form a portfolio holding one such contingent claim along with a portfolio of other assets in the market scaled to the initial wealth of the investor and then (as in the previous section) consider the portfolio optimization problem of maximizing the utility of the final wealth:

maximizing E[u(β(ϕ(S1) + Θ · S1))]

subject to β(ϕ0 + Θ · S0) = w0.

Equivalently we can write this portfolio optimization problem as

minimizing E[(−u)(x)] (2.3.22)

subject to x− β(ϕ(S1)− ϕ0 +Θ · (S1 − S0))− w0 = 0.

Assume there is no arbitrage. Then Theorem 2.3.6 implies that the optimal value of problem (2.3.22) is finite and is attained at (x∗, β∗, Θ∗). As in the previous section, we can check that the constraint qualification condition for problem (2.3.22) is satisfied and, therefore, problem (2.3.22) has a Lagrange multiplier λ ∈ RV (Ω,F , P ) such that the Lagrangian

L((x, β,Θ), λ) = E[(−u)(x)] + ⟨λ, x − β[ϕ(S1) − ϕ0 + Θ · (S1 − S0)] − w0⟩
= E[(−u)(x)] + ⟨λ, x − w0⟩ − ⟨λ, β[ϕ(S1) − ϕ0 + Θ · (S1 − S0)]⟩
= E[(−u)(x) + λ(x − w0)] − ⟨λ, β[ϕ(S1) − ϕ0 + Θ · (S1 − S0)]⟩

attains its minimum at (x∗, β∗, Θ∗). Thus, we have −λ(Bi) ∈ ∂(−u)(x∗(Bi)), i = 1, 2, . . . , N for P (Bi) > 0. Since −u is strictly decreasing we have λ(Bi) > 0 whenever P (Bi) > 0. Moreover, ⟨λ, S1 − S0⟩ = 0, which is E[λ(S1 − S0)] = 0. Dividing by E[λ] and noticing that S0 is a constant vector we get

E[(λ/E[λ])S1] = S0.


This is to say that Q = (λ/E[λ])P is a P -equivalent martingale measure. Finally, ⟨λ, ϕ(S1) − ϕ0⟩ = 0. That is,

ϕ0 = EQ[ϕ(S1)],

in other words, if there is no arbitrage then the price of the contingent claim must be the expectation of its payoff under one of the martingale measures that are equivalent to P .

We note that unless the market is complete, martingale measures are not unique. We can see from above that martingale measures and, therefore, the resulting prices of the contingent claim depend on the choice of utility functions. We now give a simple example that explicitly calculates the martingale measures in terms of a class of utility functions.

Example 2.3.11 Consider a market S containing only one risky asset. Assume that the market has N states Ω = {ω1, . . . , ωN} and state ωn happens with probability pn. Assume for simplicity that S0 = 1 and denote xn := S1(ωn) − S0. In this case a trading strategy H is simply a constant h indicating the number of shares of S that the trader holds. Given a utility function u satisfying properties (u1)–(u3) the utility maximization problem (2.3.16) takes the following concrete form:

max E[u(1 + h · (S1 − S0))] = max ∑_{n=1}^{N} pn u(1 + hxn). (2.3.23)

Rewrite (2.3.23) as a constrained minimization problem

min −∑_{n=1}^{N} pn u(yn) (2.3.24)

subject to yn − 1 − hxn = 0, n = 1, . . . , N.

Let’s write the Lagrangian

L((y, h), λ) = ∑_{n=1}^{N} pn[−u(yn) + λn(yn − 1 − hxn)].

Setting ∇y,hL = 0 we derive, at the optimal solution,

∑_{n=1}^{N} pn λn xn = 0, (2.3.25)

and

λn = u′(yn) = u′(1 + hxn). (2.3.26)

Equation (2.3.25) clearly shows that a scaled λ gives us the martingale measure. To solve for h so as to derive the solution to the utility optimization problem (2.3.23) we can substitute (2.3.26) into (2.3.25) to get the following equation for h,

∑_{n=1}^{N} pn u′(1 + hxn)xn = 0. (2.3.27)

Equation (2.3.26) clearly shows that the martingale measure depends on the choice of utility function. •

We continue this example by considering a concrete family of utility functions.

Example 2.3.12 (Risk Aversion) Let us consider a class of utility functions that depend on a parameter c > 0,

uc(x) = ln x + cx if x > 0, and uc(x) = −∞ if x ≤ 0,

and set N = 3, p1 = p2 = p3 = 1/3 and x1 = 1, x2 = 0.5 and x3 = −0.5. In this case the Lagrangian is

L((y, h), λ) = ∑_{n=1}^{N} pn[−ln(yn) − cyn + λn(yn − 1 − hxn)].

At the optimal solution (y, h), equation (2.3.26) determines the Lagrange multiplier as

λ = (λ1, λ2, λ3) = (1/(1 + h) + c, 1/(1 + 0.5h) + c, 1/(1 − 0.5h) + c). (2.3.28)

The optimal portfolio h can be determined by (2.3.27), that is,

(1/(1 + h) + c) + (1/(1 + 0.5h) + c) · 0.5 − (1/(1 − 0.5h) + c) · 0.5 = 0. (2.3.29)

Numerically solving (2.3.28) and (2.3.29) and scaling the Lagrange multipliers yield the following tables, which relate c and the initial wealth w0 to the optimal portfolio and the risk neutral measure π:

c     h      π1    π2    π3
0.0   0.868  .178  .232  .589
0.2   1.023  .183  .226  .591
0.4   1.154  .185  .222  .593
0.6   1.258  .189  .219  .593

Martingale measures when w0 = 1.

w0    θ∗     π1    π2    π3
1     1.024  .183  .226  .591
3     3.777  .188  .218  .594
6     8.830  .192  .212  .596

Martingale measures when c = 0.2.
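The entries of the two tables can be recomputed with a few lines of code. This is a sketch rather than the authors' procedure: it solves the first-order condition (2.3.27) by bisection, and it assumes that a general initial wealth w0 enters by replacing 1 + hxn with w0 + hxn, which appears to be what the second table uses. The function names are hypothetical.

```python
def first_order_condition(h, c, w0, p, x):
    """Left side of (2.3.27) for u_c(y) = ln(y) + c*y, i.e. u_c'(y) = 1/y + c."""
    return sum(pn * (1.0 / (w0 + h * xn) + c) * xn for pn, xn in zip(p, x))

def optimal_position(c, w0, p, x, iters=200):
    """Bisection for the root of the first-order condition, which decreases in h.

    Requires x to contain both positive and negative entries, so that wealth
    w0 + h*x_n stays positive exactly for lo < h < hi.
    """
    lo = -w0 / max(x)
    hi = -w0 / min(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if first_order_condition(mid, c, w0, p, x) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def martingale_measure(h, c, w0, p, x):
    """Scale lambda_n = u_c'(w0 + h*x_n) so that the weights p_n*lambda_n sum to 1."""
    weights = [pn * (1.0 / (w0 + h * xn) + c) for pn, xn in zip(p, x)]
    total = sum(weights)
    return [w / total for w in weights]

p = [1 / 3, 1 / 3, 1 / 3]
x = [1.0, 0.5, -0.5]

for c in (0.0, 0.2, 0.4, 0.6):          # first table: w0 = 1
    h = optimal_position(c, 1.0, p, x)
    print(c, round(h, 3), [round(q, 3) for q in martingale_measure(h, c, 1.0, p, x)])

for w0 in (1.0, 3.0, 6.0):              # second table: c = 0.2
    h = optimal_position(0.2, w0, p, x)
    print(w0, round(h, 3), [round(q, 3) for q in martingale_measure(h, 0.2, w0, p, x)])
```

Bisection is used only because the left side of (2.3.27) is strictly decreasing in h for this utility; any one-dimensional root finder would serve equally well.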


We can see that, for fixed w0, h increases when c increases, a fact that is not hard to verify in general from equation (2.3.29). Note that in our family of utility functions depending on the parameter c, decreasing c corresponds to increasing risk aversion. On the other hand, for a fixed utility function (fixed c), decreasing w0 corresponds to increasing risk aversion. This is consistent with an intuitive explanation of the change in the martingale measure: increasing risk aversion increases the weight in the middle (π2) while decreasing the weight on both extremes (π1 and π3).

Example 2.3.13 (Pricing Contingent Claims) We now turn to pricing contingent claims. We consider the same financial market as in Example 2.3.12 defined by S0 = (1, 1) and

S1(ω1) = (1, 2), S1(ω2) = (1, 1.5), S1(ω3) = (1, 0.5).

The payoff of a call option with strike 1 is

C(ω1) = 1, C(ω2) = 0.5, C(ω3) = 0.

Fixing a utility ln(x) + 0.2x, pricing C using the equivalent martingale measure from the previous example gives the following results:

w0    price  π1    π2    π3
1     0.296  .183  .226  .591
3     0.297  .188  .218  .594
6     0.298  .192  .212  .596

Prices of a call option when c = 0.2.
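The prices in the table are expectations of the payoff under the tabulated measures, ϕ0 = EQ[C]. The short sketch below takes the rounded measures from the table as given (it does not recompute them) and also checks that the risky asset is priced at approximately 1 under each of them; the variable and function names are illustrative.

```python
# Market of Example 2.3.13: a riskless asset and one risky asset, three states.
S1 = [(1.0, 2.0), (1.0, 1.5), (1.0, 0.5)]
strike = 1.0
payoff = [max(s[1] - strike, 0.0) for s in S1]   # call payoff (1, 0.5, 0)

# Rounded martingale measures copied from the table, keyed by w0 (c = 0.2).
measures = {
    1: (0.183, 0.226, 0.591),
    3: (0.188, 0.218, 0.594),
    6: (0.192, 0.212, 0.596),
}

def expectation(q, values):
    """Expectation of a payoff vector under the measure q."""
    return sum(qn * vn for qn, vn in zip(q, values))

for w0, q in measures.items():
    call_price = expectation(q, payoff)
    risky_price = expectation(q, [s[1] for s in S1])  # close to S0 = 1 up to rounding
    print(w0, round(call_price, 3), risky_price)
```

Because the measures are rounded to three decimals, the martingale condition EQ[S1] = S0 holds only up to about 10^-3.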

Fixing u(x) = ln(x) + 0.2x and w0 = 3, the table gives p = 0.297. This is the private price of the agent corresponding to his/her risk aversion. The meaning of this private price is that the agent should buy (long) when the market price is lower than p = 0.297 and sell (short) when the market price is higher, in order to improve his/her utility, as shown in Fig 2.6.

Remark 2.3.14 We can see that when the market price differs from the agent’s private price an opportunity of improving utility arises. However, this does not mean an opportunity for arbitrage. In fact, from the graph we can see that buying (or shorting) too much will actually reduce the utility. When the market price equals the agent’s private price there is no opportunity of improving utility. In this case the agent should take no position.

The utility optimization point of view also explains that trading will happen between agents with different risk aversion determined by utility and initial endowment. For example, if the market price is 0.297, then agents with w0 = 1 will sell, agents with w0 = 6 will buy, while agents with w0 = 3 will take no action.


Fig. 2.6. Green p = 0.296, Blue p = 0.297 and Red p = 0.298

2.3.3 Complete Market

We have seen that in general the risk neutral measure is not unique and that the possible measures are related to the investor’s utility function. One exception is when the financial market is complete:

Definition 2.3.15 (Complete Market) We say a financial market S is complete if

{Θ · S1 | Θ ∈ port[S]} = RV (Ω,F , P ),

or equivalently

{1B : B ∈ F} ⊂ {Θ · S1 | Θ ∈ port[S]}.

When a financial market is complete, it is a simple fact in linear algebra that there is only one equivalent martingale measure.

Proposition 2.3.16 (Unique Martingale Measure) Let S be a complete financial market. Then there is only one equivalent martingale measure.

Proof. Since W = {Θ · S1 | Θ ∈ port[S], Θ · S0 = 0}, dim W = dim{Θ · S1 | Θ ∈ port[S]} − 1. Thus, for a complete market dim W⊥ = 1. Hence, in a complete market the equivalent martingale measure is unique. •

If we focus only on complete markets then utility functions are irrelevant to asset pricing. But, of course, most markets are incomplete. In a complete market the search for the optimal portfolio can also be simplified.


Suppose that (x∗, Θ∗) is the solution to the constrained minimization problem (2.3.21). Then it is also the solution to the problem of minimizing the Lagrangian

L((x,Θ), λ) = E[(−u)(x) + λ(x − w0)] − ⟨λ,Θ · (S1 − S0)⟩,

which implies that Q = (λ/E[λ])P is the unique risk neutral measure. Moreover, since x∗ satisfies the constraint x∗ − Θ∗ · (S1 − S0) − w0 = 0 we also know that ⟨λ, x∗ − w0⟩ = EQ[x∗ − w0] = 0. Thus, x∗ is also a solution to the constrained minimization problem

minimize E[(−u)(x)] (2.3.30)

subject to EQ[x] = w0.

On the other hand, since −u is strictly convex, the solution to (2.3.30) is unique and, therefore, must be x∗. Thus, problems (2.3.21) and (2.3.30) have the same solution.

Remark 2.3.17 1. Problem (2.3.30) only provides a solution x∗. To get the optimal portfolio one has to do additional work using the constraint.

2. The equivalence of the solutions of the two problems breaks down if martingale measures are not unique and, therefore, the above result only holds in a complete market.

2.3.4 Use Linear Programming Duality

If we set w0 = 0 then the utility optimization problem becomes

sup{E[u(x)] : x ∈ W}.

Importantly, property (u2) of the utility function forces x ∈ RV (Ω,F , P )+ so that the problem is, in fact,

sup{E[u(x)] : x ∈ W ∩ RV (Ω,F , P )+}.

Note that no arbitrage is equivalent to

W ∩ RV (Ω,F , P )+ = {0}.

Thus, for the purpose of characterizing no arbitrage, the problem is trivial.

What do we get from our theory then? We still see that no arbitrage implies the existence of an equivalent martingale measure. Moreover, we still have that the martingale measure is proportional to a subgradient of the negative of the utility function at the optimal portfolio. This is where we can derive more from our approach. In this trivial problem the only solution is 0 for all economic states ω ∈ Ω. Since u(t) = −∞ for t < 0, the subdifferential of −u at 0 is determined by the right directional derivative:

k := lim_{t↓0} (u(t) − u(0))/t > 0.

In fact,

−∂(−u)(0) = [0, k]. (2.3.31)


Since this is true for all states ω ∈ Ω, it tells us the equivalent martingale measure is proportional to a vector in [0, k]^N , where N is the number of states in Ω. This amounts to a constraint on the martingale measure. We also note that in this case nothing is lost by picking the utility function u(t) = t − ι_{[0,+∞)}(t) so that the utility maximization problem becomes a linear programming problem. This way one can use the more widely known linear programming duality instead of Fenchel duality.

In a finite dimensional space, linear programming duality is equivalent to the Kreps-Yan cone separation theorem, which is used by Harrison and Kreps [23], Harrison and Pliska [22], Delbaen and Schachermayer [15] and many others in their proofs of the FTAP in different settings. These approaches, however, lose the information relating to the agent’s risk aversion.

2.4 Risk Measures

We have discussed variance, standard deviation and drawdown as risk measures. There are many other risk measures. To be systematic, in this section we take an axiomatic approach and list desired properties of risk measures. We focus on coherent risk measures. They were proposed by Artzner et al. in [2], partly motivated by margin rules developed by the Chicago Mercantile Exchange (CME) and the Securities and Exchange Commission (SEC), and have attracted much attention. From a mathematical point of view a coherent risk measure is sublinear, a special type of convex function. As a result many tools in convex analysis and duality theory are applicable.

2.4.1 Coherent Risk Measure

Definition 2.4.1 (Coherent Risk Measure) Let RV (Ω,F , P ) represent the payoff space. We say a lower semicontinuous function ρ : RV (Ω,F , P ) → R ∪ {+∞} is a coherent risk measure if, for any x, y ∈ RV (Ω,F , P ), ρ has the following properties:

(r1) (Positive homogeneity) ρ(rx) = rρ(x) for any r > 0,
(r2) (Subadditivity) ρ(x + y) ≤ ρ(x) + ρ(y),
(r3) (Translation property) ρ(x + c1) = ρ(x) − c for all x ∈ RV (Ω,F , P ) and c ∈ R,
(r4) (Monotonicity) ρ(x) ≤ ρ(y) for any x ≥ y.

Property (r1) says that the risk measure scales proportionally with the position; property (r2) reflects the belief that diversification reduces risk; the idea of (r3) is that one may measure the risk of x by the minimum amount of additional capital reserve needed to ensure that there is no risk of bankruptcy; and finally (r4) says that a dominant payoff is less risky.

Not all commonly used risk measures satisfy all the requirements of a coherent risk measure. The axioms of coherent risk measures provide a set of desired properties for further discussions and comparisons. A coherent risk measure as defined above has a simple structure and affords several equivalent characterizations, which we will discuss below.


2.4.2 Equivalent Characterization of Coherent Risk Measures

Dual Representation

A coherent risk measure is convex. Any l.s.c. convex function on a finite dimensional Banach space has the dual representation

ρ(x) = sup_{y∈RV (Ω,F,P )} [⟨x, y⟩ − ρ∗(y)], (2.4.1)

where ⟨x, y⟩ = E[xy]. What is interesting here is that ρ∗ for any risk measure ρ satisfying (r1) and (r2) must be an indicator function. Properties (r3) and (r4) further restrict the support of this indicator function.

Proposition 2.4.2 (Conjugate of a sublinear risk measure) Let ρ be a risk measure satisfying axioms (r1) and (r2). Then

ρ∗ = ιM ,

where M = {y : ⟨x, y⟩ ≤ ρ(x), for all x ∈ RV (Ω,F , P )}.

Proof. Clearly, for any y ∈ RV (Ω,F , P ), we have

ρ∗(y) = sup_{x∈RV (Ω,F,P )} [⟨x, y⟩ − ρ(x)] ≥ ⟨0, y⟩ − ρ(0) = 0.

For any y ∈ M , ρ∗(y) cannot exceed 0 so that it must be equal to 0.

On the other hand, for any y ∉ M , there exists x ∈ RV (Ω,F , P ) such that ⟨x, y⟩ − ρ(x) > 0. Since the function x → ⟨x, y⟩ − ρ(x) is positive homogeneous, we must have

ρ∗(y) ≥ sup_{r>0} [⟨rx, y⟩ − ρ(rx)] = sup_{r>0} r[⟨x, y⟩ − ρ(x)] = +∞.

Thus, ρ∗ = ιM . •

We note that the characterization of M in Proposition 2.4.2 depends on ρ. Thus we cannot use it to describe ρ. Information that leads to restrictions on M independent of ρ is useful. The axioms (r3) and (r4) provide such information.

Proposition 2.4.3 (Effect of the Translation Property) Let ρ be a risk measure satisfying (r1), (r2) and (r3). Then there exists a closed convex subset

M ⊂ {y ∈ RV (Ω,F , P ) : E[−y] = 1},

such that ρ∗ = ιM .


Proof. By Proposition 2.4.2, M = {y : ⟨x, y⟩ ≤ ρ(x), for all x ∈ RV (Ω,F , P )}. If ρ also satisfies (r3) then, since ρ(0) = 0, we have ρ(1) = ρ(0) − 1 = −1 and ρ(−1) = ρ(0) + 1 = 1. Choosing x = 1 and x = −1, respectively, we have E[y] ≤ −1 and E[−y] ≤ 1, respectively. Thus, E[−y] = 1 as was to be shown. •

Proposition 2.4.4 (Effect of Monotonicity) Let ρ be a risk measure satisfying (r1), (r2) and (r4). Then there exists a closed convex subset

M ⊂ −RV (Ω,F , P )+,

such that ρ∗ = ιM .

Proof. By Proposition 2.4.2, M = {y : ⟨x, y⟩ ≤ ρ(x), for all x ∈ RV (Ω,F , P )}. If ρ also satisfies (r4), then ρ(x) ≤ ρ(0) = 0 for any x ∈ RV (Ω,F , P )+. Hence, for any y ∈ M and x ∈ RV (Ω,F , P )+ we have ⟨x, y⟩ ≤ 0 so that y ∈ −RV (Ω,F , P )+. •

Since the conjugate of an indicator function is a support function, we derive the following characterization of a coherent risk measure.

Theorem 2.4.5 (Dual Characterization of Coherent Risk Measure) Let ρ be a coherent risk measure. Then there exists a closed convex subset

M ⊂ {y ∈ −RV (Ω,F , P )+ : E[−y] = 1},

such that ρ = σM .

Remark 2.4.6 A coherent risk measure is directly related to cash reserve. It is a way to gauge how much cash reserve one needs to have for investing in a certain risky asset. The set {y ∈ −RV (Ω,F , P )+ : E[−y] = 1} represents standardized losses because E[y] = −1. Theorem 2.4.5 tells us a coherent risk measure is in essence picking a particular ‘test’ set of typical losses represented by the set M to determine the level of cash reserve for a certain investment. There are infinitely many possibilities in choosing the set M and thus determining particular coherent risk measures. The larger the set M , the more conservative the risk measure (requiring higher cash reserves). In fact, this is the original motivation for the definition of the coherent risk measure. The CME margin system is an example of using this method with a finite set M . The idea is rather similar to a ‘stress’ test. In implementation, it is clear that what is important is not how many elements one includes in M but how ‘diversified’ the elements in M are.
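A finite ‘test’ set M of the kind mentioned in the remark is easy to experiment with. In the sketch below everything numerical is hypothetical: three equally likely states and three test measures q, each giving a standardized loss y = −q/p, so that y ≤ 0 and E[−y] = 1. The risk measure is the support function ρ(x) = max over y ∈ M of E[xy].

```python
p = [1 / 3, 1 / 3, 1 / 3]   # hypothetical state probabilities

# Hypothetical test measures; each q generates y = -q/p with y <= 0 and E[-y] = 1.
test_measures = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1], [1 / 3, 1 / 3, 1 / 3]]
M = [[-qn / pn for qn, pn in zip(q, p)] for q in test_measures]

def inner(x, y):
    """<x, y> = E[xy] under P."""
    return sum(pn * xn * yn for pn, xn, yn in zip(p, x, y))

def rho(x):
    """Support function sigma_M(x): the largest <x, y> over the test set M."""
    return max(inner(x, y) for y in M)

x = [1.0, -2.0, 0.5]        # a payoff
z = [0.5, -3.0, 0.0]        # a payoff dominated by x in every state

print("rho(x) =", rho(x))
print("rho(z) =", rho(z))                            # not smaller than rho(x), cf. (r4)
print("rho(x + 0.7) =", rho([v + 0.7 for v in x]))   # rho(x) - 0.7, cf. (r3)
```

Enlarging test_measures can only increase ρ, which matches the observation that a larger set M gives a more conservative risk measure.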


Coherent Acceptance Cone

A second characterization for a coherent risk measure is the acceptance cone defined by

Aρ = {x ∈ RV (Ω,F , P ) | ρ(x) ≤ 0}. (2.4.2)

It is easy to check the following properties for Aρ.

Proposition 2.4.7 Let ρ be a coherent risk measure. Then the related acceptance cone Aρ has the following properties:

(a1) Aρ is a closed convex cone,
(a2) 1 ∈ Aρ,
(a3) RV (Ω,F , P )+ ⊂ Aρ.

Proof. We merely note that (a1) is a consequence of (r1) and (r2), (a2) follows from the translation property (r3) and (a3) is the result of the monotonicity property (r4). Details are left as an exercise. •

What is interesting is that any set that has properties (a1)–(a3) must be the acceptance set of some coherent risk measure. This leads to the following definition.

Definition 2.4.8 (Coherent Acceptance Cone) We say a set A ⊂ RV (Ω,F , P ) is a coherent acceptance cone provided that it has the following properties:

(a1) A is a closed convex cone,
(a2) 1 ∈ A,
(a3) RV (Ω,F , P )+ ⊂ A.

Theorem 2.4.9 (Coherent Risk and Acceptance Cone) Let A ⊂ RV (Ω,F , P ) be a coherent acceptance cone. Then there exists a coherent risk measure ρA such that

A = {x ∈ RV (Ω,F , P ) | ρA(x) ≤ 0}.

Proof. The way to construct ρA is

ρA(x) = inf{t ∈ R | x + t1 ∈ A}.

All the desired properties then follow naturally. We leave checking the details as an exercise. •
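The construction ρA(x) = inf{t ∈ R | x + t1 ∈ A} needs only a membership test for A. As a minimal sketch, take the smallest coherent acceptance cone A = RV (Ω,F , P )+ on a hypothetical three-state space, for which the infimum is the worst-case loss −min x; the bisection bounds and function names are assumptions made for illustration.

```python
def in_cone(x):
    """Membership test for the acceptance cone A = RV_+ (nonnegative payoffs)."""
    return all(v >= 0 for v in x)

def rho_A(x, in_A, lo=-1e6, hi=1e6, iters=200):
    """Approximate inf{t : x + t*1 in A} by bisection on t.

    Relies on A + RV_+ being contained in A, so that membership of x + t*1
    is monotone in t; lo and hi are assumed to bracket the infimum.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_A([v + mid for v in x]):
            hi = mid
        else:
            lo = mid
    return hi

x = [1.0, -2.0, 0.5]
print(rho_A(x, in_cone), -min(x))   # the two values agree up to rounding
```

Passing a different membership test in place of in_cone gives the risk measure of any other coherent acceptance cone, provided the bracket [lo, hi] contains the infimum.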

It is natural to ask about the relationship between the acceptance cone and the generating set of a coherent risk measure.


Theorem 2.4.10 (Acceptance Cone and the Generating Set) Let ρ be a coherent risk measure with a generating set M , i.e. ρ = σM . Let Aρ be its acceptance cone. Then

Aρ = −(cone M)+,

where cone M is the cone generated by M , i.e. the smallest cone containing M .

Proof. We only need to observe that x ∈ −(cone M)+ if and only if ⟨x,m⟩ ≤ 0 for all m ∈ M , if and only if ρ(x) = σM (x) ≤ 0, i.e. x ∈ Aρ. •

Fig. 2.7 provides a graphic illustration of the relationship between M and Aρ.

Fig. 2.7. Generating set M and acceptance set Aρ (the figure labels the cone RV (Ω,F , P )+ and the set M )

The coherent acceptance cone provides a dual representation of a coherent risk measure. It provides a different implementation of margin rules that are essentially the SEC methods adopted by the National Association of Securities Dealers (NASD). Their implementation considers a portfolio as consisting of a list of component securities, and for each of these securities there is a corresponding margin requirement. In the language of coherent acceptance cones, this amounts to specifying a set of generating elements of the cone.

Coherent Preference

We know that any closed convex cone induces a continuous partial order. Let ≤A be the linear partial order defined by a coherent acceptance cone A, that is, x ≤A y if and only if y − x ∈ A.


Proposition 2.4.11 Let A be a coherent acceptance cone and define the partial order ≤A by x ≤A y if and only if y − x ∈ A. Then ≤A has the following properties:

(o1) (Positive homogeneous) 0 ≤A x implies 0 ≤A tx for any t > 0,
(o2) (Additive) x ≤A y and u ≤A v implies x + u ≤A y + v,
(o3) (Reflexive) x ≤A x,
(o4) (Monotone) 0 ≤A x for any x ∈ RV (Ω,F , P )+.

Proof. Exercise. •

Properties (o1)–(o4) also characterize the partial order generated by a coherent acceptance set.

Definition 2.4.12 (Coherent Partial Order) We say ≤ is a coherent partial order provided that it has the following properties:

(o1) (Positive homogeneous) 0 ≤ x implies 0 ≤ tx for any t > 0,
(o2) (Additive) x ≤ y and u ≤ v implies x + u ≤ y + v,
(o3) (Reflexive) x ≤ x,
(o4) (Monotone) 0 ≤ x for any x ∈ RV (Ω,F , P )+.

Theorem 2.4.13 (Coherent Partial Order and Acceptance Cone) Let ≤ be a coherent partial order. Then there exists a coherent acceptance cone A such that x ≤ y if and only if y − x ∈ A.

Proof. The coherent acceptance cone can be identified as

A = {x ∈ RV (Ω,F , P ) | 0 ≤ x}.

Verifying the properties of A is not hard and is left as an exercise. •

Valuation Bounds and Price System

Definition 2.4.14 (Valuation Bounds) Let ≤ be a coherent partial order. We define the related coherent valuation bounds, the upper bound π̄ and the lower bound π̲, for x ∈ RV (Ω,F , P ) by

π̄(x) = inf{r : x ≤ r1} and π̲(x) = sup{r : r1 ≤ x}.

Definition 2.4.15 (Admissible Price) Let ≤ be a coherent partial order. We say π ∈ RV (Ω,F , P )∗ = RV (Ω,F , P ) is an admissible price operator if, for all 0 ≤ x,

⟨π, x⟩ ≥ 0.

We say π is normalized if ⟨π, 1⟩ = 1.


Definition 2.4.16 (Consistent Price) Consider a one period financial market S on RV (Ω,F , P ). We say π ∈ RV (Ω,F , P )∗ = RV (Ω,F , P ) is a consistent price operator for S, provided that

⟨π, S1⟩ = ⟨π, S0⟩.

Viewing price operators as elements in the dual space is consistent with the oneprice principle. The definition of admissible price operators recognizes the value ofany payoff 0 ≤ x, or x ∈ A where A is the coherent acceptance cone generatingthe partial order ≤. Normalized price is consistent with the value of cash impliedin the translation property of the coherent risk measure. Consistent price operatoris, in fact, looking at martingale measures from the perspective of pricing system.The next proposition explains the meaning of valuation bounds and follows directlyfrom the definition.

Proposition 2.4.17 (Bounds for Normalized Price) Let π be a normalized admissible price operator. Then, for any $x \in RV(\Omega,\mathcal{F},P)$,

$$\underline{\pi}(x) \le \langle \pi, x\rangle \le \overline{\pi}(x).$$

Proof. Exercise. •

While the concepts of valuation bounds and prices provide different perspectives, they are closely related to the coherent risk measure and its equivalent description in terms of its coherent acceptance cone and coherent partial order, as evidenced in the theorem below.

Theorem 2.4.18 (Valuation Bounds and Coherent Risk Measure) Let ≤ be the coherent partial order generated by the coherent risk measure ρ and let $\overline{\pi}$ and $\underline{\pi}$ be the price bounds induced by the partial order ≤. Then, for any $x \in RV(\Omega,\mathcal{F},P)$,

$$\rho(x) = \overline{\pi}(-x) = -\underline{\pi}(x).$$

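Proof. Consider $r \in \mathbb{R}$ with $-x \le r\mathbf{1}$. We have $0 \le x + r\mathbf{1}$ so that $\rho(x) - r = \rho(x + r\mathbf{1}) \le 0$, or $\rho(x) \le r$. Taking the infimum over all such r we have

$$\rho(x) \le \overline{\pi}(-x).$$

On the other hand, $\rho(x + \rho(x)\mathbf{1}) = \rho(x) - \rho(x) = 0$ implies that $0 \le x + \rho(x)\mathbf{1}$, that is, $-x \le \rho(x)\mathbf{1}$, so that

$$\rho(x) \ge \overline{\pi}(-x).$$

The equality $\overline{\pi}(-x) = -\underline{\pi}(x)$ follows directly from the definition. •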
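The identity in Theorem 2.4.18 can be checked numerically on a small finite sample space. The sketch below assumes, purely for illustration, a risk measure of the scenario form ρ(x) = max over Q of E^Q[−x] with a hypothetical list `Qs` of three scenario measures; the names `leq`, `upper_bound` and `lower_bound` are not from the text. The upper valuation bound is located by bisection directly from its definition as an infimum.

```python
# Finite sample space with three states. Qs is a hypothetical set of scenario
# measures defining a coherent risk measure rho(x) = max_Q E^Q[-x].
Qs = [(0.2, 0.5, 0.3), (0.4, 0.4, 0.2), (0.1, 0.3, 0.6)]

def expect(q, x):
    return sum(qi * xi for qi, xi in zip(q, x))

def rho(x):
    return max(expect(q, [-xi for xi in x]) for q in Qs)

def leq(x, y):
    """x <= y in the induced partial order: y - x is acceptable, rho(y - x) <= 0."""
    return rho([yi - xi for xi, yi in zip(x, y)]) <= 1e-12

def upper_bound(x, lo=-1e3, hi=1e3, iters=100):
    """inf{r : x <= r*1}, located by bisection on r."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if leq(x, [mid] * len(x)):
            hi = mid
        else:
            lo = mid
    return hi

def lower_bound(x):
    """sup{r : r*1 <= x}, computed as -inf{s : -x <= s*1}."""
    return -upper_bound([-xi for xi in x])

x = [5.0, -2.0, 1.0]
neg_x = [-xi for xi in x]
print(rho(x), upper_bound(neg_x), -lower_bound(x))
```

The three printed numbers should agree up to the bisection tolerance, which is what the theorem asserts for this particular ρ.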


2.4.3 Good Deal

Having a risk measure and the related acceptance cone, we can consider 'good deal' opportunities of making money with some acceptable risk instead of focusing completely on arbitrage.

Definition 2.4.19 (Good Deal) Consider a one period financial market S on $RV(\Omega,\mathcal{F},P)$. Let port[S] be the portfolio space and let $W = \{\Theta\cdot(S_1 - S_0) : \Theta \in port[S]\}$ be the gain space. For a coherent acceptance cone A we say that x ∈ W is a good deal with respect to A if there exists r > 0 such that

$$x - r\mathbf{1} \in A.$$

We note that a good deal with respect to $A = RV(\Omega,\mathcal{F},P)_+$ is an arbitrage. Thus, a good deal is a relaxation of arbitrage. We have the following characterization of the existence (or absence) of a good deal.

Proposition 2.4.20 (Existence of Good Deals) The portfolio space port[S] contains a good deal with respect to A if and only if $\mathbf{1} \in W - A$. Equivalently, port[S] contains no good deal with respect to A if and only if $\mathbf{1} \notin W - A$.

Proof. If $\mathbf{1} \in W - A$ we can find x ∈ W and a ∈ A such that $x - \mathbf{1} = a \in A$. In other words, x is a good deal. On the other hand, if x is a good deal then $x - r\mathbf{1} = a$ for some r > 0 and a ∈ A. Now $\mathbf{1} = x/r - a/r \in W - A$, as was to be shown. •

The above characterization for the existence of a good deal is from the perspective of payoffs. We now relate it to price and price bounds. Mathematically, it is a process of scalarization. What we do here is to consider the potential price of a payoff z in the market. First we discuss price bounds for a good deal.

Definition 2.4.21 (Good Deal Bounds) Let A be a coherent acceptance cone and let z ∈ W, the gain space of financial market S. We define the upper and lower good deal bounds with respect to A by

$$\overline{\pi}_W(z) = \inf_{r\in\mathbb{R},\,x\in W}\{r : x + r\mathbf{1} - z \in A\}$$

and

$$\underline{\pi}_W(z) = \sup_{r\in\mathbb{R},\,x\in W}\{r : x - r\mathbf{1} + z \in A\}.$$

As the name suggests, good deal bounds reveal prices for good deals. The interval $[\underline{\pi}_W(z), \overline{\pi}_W(z)]$ is the interval of normalized admissible prices that are consistent with the absence of a good deal. In fact, if z has a normalized admissible price $P > \overline{\pi}_W(z)$ then there exist $x = \Theta\cdot(S_1 - S_0) \in W$ and $0 < r < P$ such that $x + r\mathbf{1} - z \in A$. We can then sell short z at price P and assemble the portfolio $\Theta\cdot S_0$ at time t = 0. When t = 1 the value of the portfolio gives us $y = x + P\mathbf{1} - z$. Since $y - (P - r)\mathbf{1} = x + r\mathbf{1} - z \in A$, it is a good deal.

The good deal bounds are actually coherent valuation bounds.

Proposition 2.4.22 (Good Deal Bounds as Valuation Bounds) The upper and lower good deal bounds $\overline{\pi}_W(z)$ and $\underline{\pi}_W(z)$ defined in Definition 2.4.21 are actually coherent valuation bounds.

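Proof. It is easy to check that $\overline{\pi}_W(-z) = -\underline{\pi}_W(z)$. Moreover, rewrite $-\underline{\pi}_W(z)$ as

$$\begin{aligned}
-\underline{\pi}_W(z) &= -\sup_{r\in\mathbb{R},\,x\in W}\{r : x - r\mathbf{1} + z \in A\} \\
&= \inf_{-r\in\mathbb{R}}\{-r : -r\mathbf{1} + z \in A - W\} \\
&= \inf_{r\in\mathbb{R}}\{r : z + r\mathbf{1} \in A - W\}.
\end{aligned}$$

Since A − W is a cone containing $RV(\Omega,\mathcal{F},P)_+$, we can see that $-\underline{\pi}_W(z) = \rho_{A-W}(z)$ is the coherent risk measure corresponding to the coherent acceptance cone A − W.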

•

Actually, one can show that $\rho_{A-W}(z) = \inf_{x\in W}\rho_A(x + z)$ (Exercise).

Note that the fundamental theorem of asset pricing is essentially based on the separation of W and $RV(\Omega,\mathcal{F},P)_+$. The same argument can be applied to yield a similar result regarding good deals.

Theorem 2.4.23 (Fundamental Theorem of Asset Pricing for Good Deal) Let A be a coherent acceptance cone and let $W = \{\Theta\cdot(S_1 - S_0) : \Theta \in port[S]\}$ be the gain space of financial market S. Then port[S] contains no good deal if and only if there exists an admissible consistent normalized price operator.

Proof. The portfolio space port[S] contains no good deal if and only if W does not intersect the interior of A, if and only if there exists $y \in RV(\Omega,\mathcal{F},P)^* = RV(\Omega,\mathcal{F},P)$ such that

$$\langle x, y\rangle \le \langle a, y\rangle, \quad\text{for all } x \in W \text{ and } a \in A.$$

Since 0 ∈ W, we have, for all a ∈ A, ⟨a, y⟩ ≥ 0. Thus, y is an admissible price. Since 0 ∈ A, we have, for all x ∈ W, ⟨x, y⟩ ≤ 0. Since W is a subspace, ⟨x, y⟩ = 0 for all x ∈ W. This is equivalent to $\pi = y/\langle\mathbf{1}, y\rangle$ being an admissible consistent normalized price operator. •


2.4.4 Several Commonly Used Risk Measures

Standard Deviation

Variance or, equivalently, standard deviation has been used as a risk measure since Markowitz proposed the modern portfolio theory. It satisfies (r1) and (r2) but fails (r3) and (r4). The failure of the standard deviation to satisfy axiom (r4) has long been criticized as unreasonable. Some remedies have been suggested, such as counting the deviation only on losses. It turns out that

$$\rho_s(x) = \sqrt{E[((x - E[x])^-)^2]} - E[x]$$

is actually a coherent risk measure that is faithful to the idea of using downside deviation as a measure for risk.
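As a minimal sketch of how the downside-deviation measure $\rho_s$ can be evaluated, the following computes it for a finite distribution and compares its values under a cash shift and a positive scaling. The payoff values, probabilities and the function name `downside_risk` are illustrative assumptions, not data from the text.

```python
import math

def downside_risk(values, probs):
    """rho_s(x) = sqrt(E[((x - E[x])^-)^2]) - E[x] for a finite distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    # The negative part (y)^- = max(-y, 0) keeps only shortfalls below the mean.
    semi_var = sum(p * max(mean - v, 0.0) ** 2 for v, p in zip(values, probs))
    return math.sqrt(semi_var) - mean

payoff = [-10.0, 0.0, 5.0, 20.0]   # hypothetical payoff values
probs = [0.1, 0.4, 0.3, 0.2]       # hypothetical probabilities

base = downside_risk(payoff, probs)
shifted = downside_risk([v + 3.0 for v in payoff], probs)   # add 3 units of cash
scaled = downside_risk([2.0 * v for v in payoff], probs)    # double the position
print(base, shifted, scaled)
```

In this example adding cash lowers the risk by the same amount and doubling the position doubles the risk, which is the behaviour one expects of a coherent risk measure.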

Both implementations suggested by the dual representation Theorem 2.4.5 and the acceptance cone formulation in Theorem 2.4.9 are viable. For example, if one uses the acceptance cone to implement it, then each security is paired with a margin requirement equal to its modified standard deviation, if that can be estimated.

Drawdown

The maximum absolute drawdown, denoted dd(x), in a given period of time is often used by traders. This risk measure also satisfies axioms (r1) and (r2) but fails (r3) and (r4).

As in the case of standard deviation, we can also subtract E(x) to make it satisfy (r3). One way to adjust it so that it has property (r4) is to make the fixed beginning wealth the reference point for the maximum down move. But this completely distorts the intention of drawdown as a risk measure.

Both implementations suggested by the dual representation Theorem 2.4.5 and the acceptance cone formulation in Theorem 2.4.9 are viable without axiom (r4). The only difference is that the acceptance cone may not contain the entire cone $RV(\Omega,\mathcal{F},P)_+$. This is not unreasonable in practice.

Value at Risk

The value at risk of a portfolio in a given period is a gauge for the risk of the portfolio that is important for both portfolio managers and regulators. It is defined on the random variable of loss L = −x.

Definition 2.4.24 (Value at Risk) Let L be the random variable representing the loss of a portfolio in a given period. The value at risk with confidence level α ∈ (0, 1), denoted by $VaR_\alpha$, is defined as

$$VaR_\alpha(L) = \inf\{l \in \mathbb{R} \mid P(L > l) \le 1 - \alpha\}.$$

In other words, $VaR_\alpha$ is the smallest loss level that is exceeded with probability at most 1 − α. The following is an illustration.

Example 2.4.25 (VaR of a Discrete Loss Distribution) Suppose that the loss L is discretely distributed as in the following table.

L      Prob
600    0.02
50     0.03
40     0.05
30     0.10
20     0.10
10     0.05
0      0.65

Table 3. A discrete loss distribution.

Then $VaR_{0.95}(L) = 50$, $VaR_{0.9}(L) = 40$, and $VaR_{0.8}(L) = 30$.

Let $F_L(l) := P(L \le l)$ be the cumulative distribution function of L. Then

$$VaR_\alpha(L) = \inf\{l \in \mathbb{R} \mid F_L(l) \ge \alpha\}.$$

We define the quantile function of L by

$$Q_L(p) = \inf\{l \in \mathbb{R} \mid p \le F_L(l)\}.$$

When $F_L$ is an invertible function, $Q_L = F_L^{-1}$.

Value at risk as a risk measure satisfies axioms (r1) and (r4). Similar to the maximum drawdown, one can adjust the cash position and define a revised version that also meets the requirement of (r3). However, missing (r2) is a big drawback for VaR as a risk measure and the remedy is complicated.

Conditional Value at Risk

Rockafellar and Uryasev [44, 45] proposed conditional value at risk as a remedy for the failure of VaR to satisfy (r2).

Definition 2.4.26 (Conditional Value at Risk) Let L be the random variable that represents the loss of a portfolio in a given period. The conditional value at risk with confidence level α ∈ (0, 1), denoted by $CVaR_\alpha$, is defined as

$$CVaR_\alpha(L) = \frac{1}{1-\alpha}\int_\alpha^1 VaR_s(L)\,ds.$$

We can see that $CVaR_\alpha$ is the average loss over the worst outcomes, those having total probability 1 − α.

Example 2.4.27 (CVaR of a Discrete Loss Distribution) Suppose again that the loss L is discretely distributed as in Table 3. Then $CVaR_{0.95}(L) = (50 \cdot 0.03 + 600 \cdot 0.02)/0.05 = 270$, $CVaR_{0.9}(L) = (40 \cdot 0.05 + 50 \cdot 0.03 + 600 \cdot 0.02)/0.1 = 155$, and $CVaR_{0.8}(L) = (30 \cdot 0.1 + 40 \cdot 0.05 + 50 \cdot 0.03 + 600 \cdot 0.02)/0.2 = 92.5$.

Table 4 compares VaR and CVaR.

L      Prob    α      VaR    CVaR
600    0.02
50     0.03    0.95   50     270
40     0.05    0.9    40     155
30     0.10    0.8    30     92.5
20     0.10
10     0.05
0      0.65

Table 4. Comparing VaR and CVaR.
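The CVaR column can be reproduced mechanically. For a discrete loss, the integral in Definition 2.4.26 reduces to averaging the largest losses until a probability mass of 1 − α has been used up; the sketch below does exactly that with exact rational arithmetic. The function name `cvar` and the list `table` are illustrative names, and the data are those of Table 3.

```python
from fractions import Fraction as F

# Loss distribution of Table 3 as (loss, probability) pairs.
table = [(600, F(2, 100)), (50, F(3, 100)), (40, F(5, 100)),
         (30, F(10, 100)), (20, F(10, 100)), (10, F(5, 100)), (0, F(65, 100))]

def cvar(dist, alpha):
    """Average of the worst (1 - alpha) probability mass of the loss."""
    tail = 1 - alpha
    remaining, total = tail, F(0)
    for loss, prob in sorted(dist, reverse=True):   # largest losses first
        used = min(prob, remaining)
        total += used * loss
        remaining -= used
        if remaining == 0:
            break
    return total / tail

for alpha in (F(95, 100), F(9, 10), F(8, 10)):
    print(float(alpha), float(cvar(table, alpha)))
```

Exact fractions are used so that boundary cases, where 1 − α coincides with a cumulative tail probability, are not affected by floating-point rounding.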

We can see that VaR has the effect of giving an unreasonable incentive to insurance writers in general and Credit Default Swap (CDS) writers in particular.

It is not hard to see that both $VaR_\alpha(L)$ and $CVaR_\alpha(L)$ are increasing functions of α and that $VaR_\alpha(L)$ is dominated by $CVaR_\alpha(L)$.

The following representation reveals that the conditional value at risk is convex with respect to L.

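Theorem 2.4.28 (Representation as an Expectation)

$$\begin{aligned}
CVaR_\alpha(L) &= \min_{r\in\mathbb{R}}\left\{ r + \frac{1}{1-\alpha}E[(L-r)^+]\right\} \\
&= VaR_\alpha(L) + \frac{1}{1-\alpha}E[(L - VaR_\alpha(L))^+].
\end{aligned} \tag{2.4.3}$$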

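Proof. Note that for any r,

$$\begin{aligned}
\frac{1}{1-\alpha}E[(L-r)^+] &= \frac{1}{1-\alpha}\int_\Omega (L(\omega)-r)^+\,P(d\omega) \\
&= \frac{1}{1-\alpha}\int_\Omega\int_r^\infty 1_{[t,\infty)}(L(\omega))\,dt\,P(d\omega) \\
&= \frac{1}{1-\alpha}\int_r^\infty\int_\Omega 1_{[t,\infty)}(L(\omega))\,P(d\omega)\,dt \\
&= \frac{1}{1-\alpha}\int_r^\infty P(L\ge t)\,dt.
\end{aligned} \tag{2.4.4}$$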

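In particular (see Fig. 2.8, in which the shaded area represents $E[(L - r_\alpha)^+]$), letting $r = r_\alpha = VaR_\alpha(L)$ we have

$$\begin{aligned}
\frac{1}{1-\alpha}E[(L-r_\alpha)^+] &= \frac{1}{1-\alpha}\int_{r_\alpha}^\infty P(L\ge t)\,dt \\
&= \frac{1}{1-\alpha}\int_\alpha^1 (VaR_t(L) - r_\alpha)\,dt \\
&= \frac{1}{1-\alpha}\int_\alpha^1 VaR_t(L)\,dt - r_\alpha \\
&= CVaR_\alpha(L) - r_\alpha.
\end{aligned} \tag{2.4.5}$$

This proves

$$CVaR_\alpha(L) = VaR_\alpha(L) + \frac{1}{1-\alpha}E[(L - VaR_\alpha(L))^+].$$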

[Figure: graphs of $F_L$ and $Q_L$ with the levels 0, α, 1 and the point $r_\alpha = VaR_\alpha(L)$ marked.]

Fig. 2.8. Represent CVaR

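To show that the min with respect to r is attained at $r = r_\alpha$ we define

$$\begin{aligned}
D &= \left[r + \frac{1}{1-\alpha}E[(L-r)^+]\right] - \left[r_\alpha + \frac{1}{1-\alpha}E[(L-r_\alpha)^+]\right] \\
&= r - r_\alpha + \frac{1}{1-\alpha}\int_r^{r_\alpha} P(L\ge t)\,dt,
\end{aligned} \tag{2.4.6}$$

and we need only to show the easy fact that, for any r,

$$(1-\alpha)D = (1-\alpha)(r - r_\alpha) + \int_r^{r_\alpha} P(L\ge t)\,dt \ge 0. \tag{2.4.7}$$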

The intuition is illustrated in Fig. 2.9, in which the short vertical bars signify $r < r_\alpha$ and $r > r_\alpha$, respectively. •

[Figure: graphs of $F_L$ and $Q_L$ with the levels 0, α, 1 and the point $r_\alpha = VaR_\alpha(L)$ marked.]

Fig. 2.9. Inequality (2.4.7).
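The minimization in (2.4.3) is easy to carry out for the discrete loss of Table 3. The objective is convex and piecewise linear in r with kinks only at the atoms of L, so it is enough to evaluate it at those atoms. The following sketch, with illustrative names `objective` and `cvar_by_min`, uses exact fractions.

```python
from fractions import Fraction as F

# Loss distribution of Table 3.
losses = [600, 50, 40, 30, 20, 10, 0]
probs = [F(2, 100), F(3, 100), F(5, 100), F(10, 100),
         F(10, 100), F(5, 100), F(65, 100)]

def objective(r, alpha):
    """r + E[(L - r)^+] / (1 - alpha), the function minimized in (2.4.3)."""
    excess = sum(p * max(l - r, 0) for l, p in zip(losses, probs))
    return r + excess / (1 - alpha)

def cvar_by_min(alpha):
    # Convex piecewise linear in r with kinks at the atoms of L,
    # so the minimum over all real r is attained at one of the atoms.
    return min(objective(r, alpha) for r in losses)

alpha = F(95, 100)
minimizers = [r for r in losses if objective(r, alpha) == cvar_by_min(alpha)]
print(float(cvar_by_min(alpha)), minimizers)
```

For α = 0.95 the minimum value is 270, in agreement with Example 2.4.27, and it is attained at both r = 40 and r = 50; the objective is flat on the interval between them because P(L > r) equals 1 − α there.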

The representation (2.4.3) can actually be written as a linear programming problem, which yields the following dual representation.

Theorem 2.4.29 (Dual Representation)

$$CVaR_\alpha(L) = \max\left\{\langle v, -L\rangle : E[-v] = 1,\ 0 \le -v \le \frac{1}{1-\alpha}\mathbf{1}\right\}. \tag{2.4.8}$$

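Proof. We can write the conditional value at risk with confidence level α as the value function of the following linear programming problem:

$$CVaR_\alpha(L) = \inf_{r\in\mathbb{R},\,u\in RV(\Omega,\mathcal{F},P)}\left\{ r + \frac{1}{1-\alpha}E[u] : u \ge 0,\ u + r\mathbf{1} \ge L\right\}.$$

The Lagrangian of this linear programming problem is

$$L((r,u),(s,v)) = r + \left\langle \frac{1}{1-\alpha}\mathbf{1}, u\right\rangle + \langle s, u\rangle + \langle v, u + r\mathbf{1} - L\rangle,$$

where s, v ≤ 0. For a linear programming problem, strong duality holds as long as both the primal and the dual problems are feasible. Thus, we have

$$\begin{aligned}
CVaR_\alpha(L) &= \inf_{r,u}\ \sup_{s\le 0,\,v\le 0} L((r,u),(s,v)) \\
&= \sup_{s\le 0,\,v\le 0}\ \inf_{r,u} L((r,u),(s,v)) \\
&= \sup_{s\le 0,\,v\le 0}\ \inf_{r,u}\left[ r(1 + \langle v,\mathbf{1}\rangle) + \left\langle \frac{1}{1-\alpha}\mathbf{1} + s + v, u\right\rangle + \langle v, -L\rangle\right] \\
&= \sup_{s\le 0,\,v\le 0}\left[\langle v, -L\rangle : \langle -v, \mathbf{1}\rangle = 1,\ \frac{1}{1-\alpha}\mathbf{1} + s + v \ge 0\right] \\
&= \sup\left[\langle v, -L\rangle : E[-v] = 1,\ \frac{1}{1-\alpha}\mathbf{1} \ge -v \ge 0\right].
\end{aligned}$$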

Since the dual solution exists the sup is, in fact, a max. •

As a corollary we see that CVaR is essentially a coherent risk measure.

Corollary 2.4.30 Define $\rho(x) = CVaR_\alpha(-x)$. Then ρ is a coherent risk measure.

Estimating CVaR

The dual representation in Theorem 2.4.29 provides a method of estimating the conditional value at risk. Consider a portfolio Θ. Its corresponding gain is Θ · R where $R = S_1 - S_0$ is the vector of gains of the assets in the financial market. The loss is then represented by −Θ · R. Now suppose $R_1, \ldots, R_m$ is a sample of the gain vector of size m. Then we can estimate the expectation of the return of the portfolio Θ by

$$E[\Theta\cdot R] \approx \frac{1}{m}\sum_{k=1}^m \Theta\cdot R_k.$$

It follows that

$$CVaR_\alpha(-\Theta\cdot R) \approx \min_{r\in\mathbb{R}}\left\{ r + \frac{1}{(1-\alpha)m}\sum_{k=1}^m (-\Theta\cdot R_k - r)^+\right\}. \tag{2.4.9}$$

Thus, by discretizing the dual representation we can estimate

$$CVaR_\alpha(-\Theta\cdot R) \approx \max\left\{\sum_{k=1}^m -v_k\,\Theta\cdot R_k : 0 \le v_k \le \frac{1}{(1-\alpha)m},\ k = 1,\ldots,m,\ \sum_{k=1}^m v_k = 1\right\}. \tag{2.4.10}$$

We can view $v_k$ as an alternative probability measure on the sample space $\{R_1, R_2, \ldots, R_m\}$.
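The two sample estimates can be compared on a toy data set. In the sketch below the primal estimate minimizes over r at the sample points, where the piecewise linear objective has its kinks. The dual estimate solves the small linear program greedily: since each weight $v_k$ is capped at 1/((1 − α)m), the maximum is reached by loading the cap onto the largest sample losses. The holdings `theta`, the sampled gains and the function names are hypothetical.

```python
import math

def cvar_primal(losses, alpha):
    """Sample version of (2.4.9): minimize r + sum((l - r)^+) / ((1 - alpha) m)."""
    m = len(losses)

    def obj(r):
        return r + sum(max(l - r, 0.0) for l in losses) / ((1 - alpha) * m)

    return min(obj(r) for r in losses)   # kinks sit at the sample points

def cvar_dual(losses, alpha):
    """Sample version of (2.4.10): put the maximal weight on the largest losses."""
    m = len(losses)
    cap = 1.0 / ((1 - alpha) * m)
    remaining, value = 1.0, 0.0
    for l in sorted(losses, reverse=True):
        w = min(cap, remaining)
        value += w * l
        remaining -= w
        if remaining <= 0.0:
            break
    return value

# Hypothetical data: holdings in two assets and m = 10 sampled gain vectors R_k.
theta = (2.0, 1.0)
gains = [(1.0, -2.0), (-3.0, 1.0), (0.5, 0.5), (2.0, 2.0), (-1.0, -1.0),
         (0.0, 3.0), (-4.0, 2.0), (1.5, -0.5), (-2.0, -2.0), (3.0, 1.0)]
sample_losses = [-(theta[0] * r[0] + theta[1] * r[1]) for r in gains]

primal = cvar_primal(sample_losses, alpha=0.8)
dual = cvar_dual(sample_losses, alpha=0.8)
print(primal, dual)
```

With α = 0.8 and m = 10 the cap is 1/2, so the dual weights concentrate on the two largest sample losses, and both estimates agree up to rounding.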

3

Finite Period Financial Models

Summary. We now expand our discussion to a multi-period economy with finitely many states. This setting models trading in the real world quite well, where we always deal with only a finite number of transactions and a finite number of possible scenarios. On the technical side, both payoffs and trading strategies still belong to finite dimensional vector spaces. The first three sections show that the key results in the one period economy also hold in the more general setting of a multi-period economy. Section 4 discusses super- and sub-hedging from the perspective of duality. Section 5 discusses how to model the more practical financial markets with bid and ask spreads.

3.1 The Model

3.1.1 An Example

Consider the game of betting on the flip of a fair coin.

• Head: the house will double your bet.
• Tail: you lose your bet to the house.

Play the game n times and always bet 1 unit. Denote the outcome of the ith game by $X_i$. Then $X_i$ is a random variable and $P(X_i = 1) = P(X_i = -1) = 1/2$. If we start with an initial endowment of $w_0$ then our total wealth after the ith game is

$$w_i = w_0 + X_1 + \ldots + X_i. \tag{3.1.1}$$

Now $(w_i)_{i=1}^n$ is an example of a discrete stochastic process.

We turn to consider the available information at each stage. Suppose we know $X_1, \ldots, X_i$. Does this help us to play the (i + 1)th game? In this case we have no reason to believe so. How do we clearly describe this conclusion? Let us look at the game with n = 3 to get some feeling. We use H to represent a head and T a tail. The information we can get at each stage can be illustrated with the following binary tree.

F0    F1    F2    F3
Ω     H     HH    HHH
                  HHT
            HT    HTH
                  HTT
      T     TH    THH
                  THT
            TT    TTH
                  TTT

In this example all the information is represented by $\mathcal{F}_3 = 2^\Omega$, where

$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$

Similarly, after 2 tosses $\mathcal{F}_2 = 2^{\{HH, HT, TH, TT\}}$, where

$$\{HH, HT, TH, TT\} = \{\{HHH, HHT\}, \{HTH, HTT\}, \{THH, THT\}, \{TTH, TTT\}\}.$$

$\mathcal{F}_2$ has less information than $\mathcal{F}_3$. Similarly, $\mathcal{F}_1 = 2^{\{H, T\}}$, where

$$\{H, T\} = \{\{HHH, HHT, HTH, HTT\}, \{THH, THT, TTH, TTT\}\}.$$

At the beginning $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

A random variable such as $w_i$ relies only on information up to time i. Then, for any a, $(w_i < a) \in \mathcal{F}_i$. In other words, $w_i$ is $\mathcal{F}_i$-measurable. We say a stochastic process $X = (X_i)$ is $\mathbb{F}$-adapted if, for each i, $X_i$ is $\mathcal{F}_i$-measurable. The random process $(w_i)$ in the coin toss example is $\mathbb{F}$-adapted.
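On a finite sample space, measurability with respect to a σ-algebra amounts to being constant on each of its atoms, which makes adaptedness easy to test by enumeration. The sketch below builds the atoms of each stage of the three-toss example by grouping outcomes on their first i tosses and checks the wealth process against them. The function names and the choice of zero initial wealth are illustrative.

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]   # HHH, HHT, ..., TTT

def atoms(i):
    """Atoms of the stage-i sigma-algebra: outcomes grouped by their first i tosses."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:i], []).append(w)
    return list(groups.values())

def wealth(w, i, w0=0):
    """w_i = w0 + X_1 + ... + X_i with X_j = +1 for a head and -1 for a tail."""
    return w0 + sum(1 if c == "H" else -1 for c in w[:i])

def is_measurable(f, i):
    """f is measurable at stage i iff it is constant on each stage-i atom."""
    return all(len({f(w) for w in atom}) == 1 for atom in atoms(i))

adapted = all(is_measurable(lambda w, i=i: wealth(w, i), i) for i in range(4))
w2_known_at_time_1 = is_measurable(lambda w: wealth(w, 2), 1)
print([len(atoms(i)) for i in range(4)], adapted, w2_known_at_time_1)
```

The wealth process passes the test at every stage, whereas the wealth after two tosses is not determined by the information available after one toss.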

3.1.2 A General Model

We continue using the probability space $(\Omega,\mathcal{F},P)$ to represent an economy where the sample space Ω is finite. Transactions can now happen in a finite set of times {0, 1, . . . , T} instead of only {0, 1}. Involving transactions at multiple stages requires us to be more elaborate about the information available at each of the stages. An information structure is a finite chain of σ-algebras of Ω: $\mathbb{F}: \{\emptyset,\Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \ldots \subset \mathcal{F}_T = \mathcal{F}$. It represents the gradually revealed information as illustrated in the previous subsection. Since Ω is finite, each $\mathcal{F}_t$ is generated by a finite number of atoms $B_t = \{B_n^t,\ n = 1,\ldots,N_t\}$. We model a financial market with an (M + 1)-dimensional $\mathbb{F}$-adapted stochastic process $S = (S_0, S_1, \ldots, S_T)$ where $S_t = (S_t^0, S_t^1, \ldots, S_t^M)$ represents the prices of M + 1 assets at time t and is $\mathcal{F}_t$-measurable. Again we assume the risk free rate is 0 so that $S_t^0 = 1$.

Definition 3.1.1 (Trading Strategies) A trading strategy $\Theta = (\Theta_0, \Theta_1, \ldots, \Theta_{T-1})$ is an $\mathbb{F}$-adapted process of (M + 1)-dimensional random vectors. Each $\Theta_t$ can be viewed as a portfolio that the trader holds in the time interval [t, t + 1). Restricting this portfolio $\Theta_t$ to each of the atoms $B_n^t$, $n = 1,\ldots,N_t$, of $\mathcal{F}_t$, $\Theta_t|_{B_n^t}$ is a constant vector in $\mathbb{R}^{M+1}$ as in Definition 2.3.2 of portfolios. Two portfolios $\Theta_t^1$ and $\Theta_t^2$ are equivalent on market S if their restrictions on all the atoms $B_n^t$, $n = 1, 2, \ldots, N_t$, of $\mathcal{F}_t$ are equivalent on S. Similarly, two trading strategies are equivalent on S if all of their corresponding portfolios are equivalent on S. The quotient space of all trading strategies with respect to this equivalence relation is called the trading strategy space on market S and is denoted by ts[S]. We define the norm of the portfolio $\Theta_t$ by

$$\|\Theta_t\|_p = \sqrt{\sum_{n=1}^{N_t} \left\|\Theta_t|_{B_n^t}\right\|_p^2}$$

and the norm of a trading strategy Θ ∈ ts[S] by

$$\|\Theta\|_{ts} = \sqrt{\sum_{t=0}^{T-1} \|\Theta_t\|_p^2}.$$

Then $(ts[S], \|\cdot\|_{ts})$ is a finite dimensional Banach space.

In the real world of investing, investors often face scenarios in which not all the trading strategies in ts[S] are available. For example:

• If short selling is not allowed, then the set of admissible trading strategies is defined by

$$ts[S]_+ = \{\Theta \in ts[S] \mid \Theta_t \ge 0,\ t = 0, 1, \ldots, T-1\}.$$

• If for a particular investor only a subset of the assets $S^0, S^1, \ldots, S^k$ is available, then the set of admissible trading strategies becomes

$$ts[S^0, S^1, \ldots, S^k] = \{\Theta \in ts[S] \mid \Theta_t^m = 0,\ m = k+1, \ldots, M,\ t = 0, 1, \ldots, T-1\}.$$

• Suppose a subset of the assets $S^{k+1}, \ldots, S^M$ can only be traded at t = 0 and t = T. Then the set of admissible trading strategies is defined by

$$\{\Theta \in ts[S] \mid \Theta_t^m = \Theta_0^m,\ m = k+1, \ldots, M,\ t = 1, \ldots, T-1\}.$$

By choosing different subsets of ts[S] we can conveniently handle different scenarios of the finite period financial model over the economy $(\Omega,\mathcal{F},P)$. We can view various questions related to these scenarios as finding suitable admissible trading strategies to obtain preferred risk adjusted gains. However, the preference will depend on the agent, who is usually risk averse. By and large, there are two ways of modeling risk aversion: using concave utility functions and using convex risk or loss functions. As a result, problems related to these financial models will be handled in the framework of maximizing expected utility functions or minimizing convex risk functions. Thus, tools in convex analysis again play essential roles.

We say a trading strategy is self-financing if

$$\Theta_{t-1}\cdot S_t = \Theta_t\cdot S_t, \quad t = 1, 2, \ldots, T-1.$$

We use $\mathcal{T}$ to denote all self-financing trading strategies on market S. Clearly $\mathcal{T}$ is a subspace of ts[S]. The gain of a self-financing trading strategy Θ up to time t is the cumulative gain of the portfolios $\Theta_s$, $s = 0, 1, \ldots, t-1$:

$$G_t(\Theta) := \sum_{s=1}^t \Theta_{s-1}\cdot(S_s - S_{s-1}) = \Theta_{t-1}\cdot S_t - \Theta_0\cdot S_0.$$
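The telescoping identity for the gain can be verified along a single price path. The sketch below assumes a hypothetical two-asset market (cash with price 1 and one stock) over T = 3 periods; the stock holdings are chosen freely and the cash leg is set at each rebalancing date so that the self-financing condition holds.

```python
# One price path of a two-asset market: S_t = (cash, stock), cash fixed at 1.
prices = [(1.0, 100.0), (1.0, 104.0), (1.0, 99.0), (1.0, 107.0)]   # S_0, ..., S_3
stock_holdings = [2.0, -1.0, 3.0]                                   # chosen at t = 0, 1, 2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Build Theta_0, ..., Theta_{T-1}. The cash leg is set so that rebalancing is
# self-financing: Theta_{t-1} . S_t = Theta_t . S_t for t = 1, ..., T-1.
strategy = [(10.0, stock_holdings[0])]
for t in range(1, len(stock_holdings)):
    wealth = dot(strategy[t - 1], prices[t])
    cash = wealth - stock_holdings[t] * prices[t][1]
    strategy.append((cash, stock_holdings[t]))

def gain(t):
    """G_t = sum_{s=1}^t Theta_{s-1} . (S_s - S_{s-1})."""
    return sum(
        dot(strategy[s - 1], [a - b for a, b in zip(prices[s], prices[s - 1])])
        for s in range(1, t + 1)
    )

# Pairs (cumulative gain, change in portfolio value) for t = 1, ..., T.
checks = [
    (gain(t), dot(strategy[t - 1], prices[t]) - dot(strategy[0], prices[0]))
    for t in range(1, len(prices))
]
print(checks)
```

Each pair in `checks` should contain two equal numbers, reflecting that for a self-financing strategy the cumulative gain equals the change in portfolio value.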

We can verify that $G_t(\Theta) \in RV(\Omega,\mathcal{F},P)$ for all t = 1, 2, . . . , T.

The norm of a trading strategy is a good proxy for its leverage level, which is very important for many purposes. As a corollary of Lemma 2.3.3 we have

Corollary 3.1.2 There exists a constant K = K(S) that depends only on market S such that for any self-financing trading strategy $\Theta \in \mathcal{T}$,

$$\|\Theta\|_{ts} \le K \max\{\|G_t(\Theta)\|_{RV},\ t = 1, 2, \ldots, T\}.$$

3.2 Arbitrage and Admissible Trading Strategies

We extend the definition of arbitrage in Definition 2.3.5 to trading strategies.

Definition 3.2.1 (Arbitrage Trading Strategy) We say that a self-financing trading strategy Θ on market S is an arbitrage if $G_t(\Theta) \ge 0$, t = 1, . . . , T, and $G_T(\Theta) \ne 0$.

In every practical trading there is always a limit on how much one can lose. This leads to the concept of admissible trading strategies described below.

Definition 3.2.2 (Admissible Trading Strategy) Let a > 0 be a constant. We say that a self-financing trading strategy $\Theta \in \mathcal{T}$ is a-admissible if, for all t = 1, 2, . . . , T,

$$G_t(\Theta) \ge -a. \tag{3.2.2}$$

We use A(a) to denote the (convex) set of all a-admissible trading strategies.

An arbitrage trading strategy is a-admissible for any a > 0. Thus, we have

Lemma 3.2.3 For a > 0, $\mathcal{T}$ contains no arbitrage if and only if A(a) contains no arbitrage.

The next lemma shows that when $\mathcal{T}$ contains no arbitrage, to show that Θ is a-admissible we need only check condition (3.2.2) at t = T. The proof is an adaptation of the argument used in [22].

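Lemma 3.2.4 If $\mathcal{T}$ contains no arbitrage then $\Theta \in \mathcal{T}$ is a-admissible if and only if

$$G_T(\Theta) \ge -a. \tag{3.2.3}$$

Proof. The "only if" part is obvious.

To prove the "if" part observe first that without loss of generality we may assume that the initial endowment $\Theta_0\cdot S_0 = 0$ so that $G_t(\Theta) = \Theta_{t-1}\cdot S_t$, t = 1, 2, . . . , T. Now assume that (3.2.3) holds and Θ is not a-admissible. Then there exist t < T and $A \in \mathcal{F}_t$ such that on A,

$$\Theta_{t-1}\cdot S_t = b < -a$$

and $\Theta_{s-1}\cdot S_s \ge -a$ on A for all s > t.

Define a trading strategy $\hat\Theta$ as follows: for all s ≤ t − 1, $\hat\Theta_s = 0$. For ω ∉ A, $\hat\Theta_t(\omega) = 0$ and for ω ∈ A,

$$\hat\Theta_t^n(\omega) = \begin{cases} \Theta_t^0(\omega) - b & \text{for } n = 0, \\ \Theta_t^n(\omega) & \text{for } n = 1, 2, \ldots, M. \end{cases} \tag{3.2.4}$$

For s > t define

$$\hat\Theta_s^n = \begin{cases} \hat\Theta_t\cdot S_{t+1} & \text{for } n = 0, \\ 0 & \text{for } n = 1, 2, \ldots, M. \end{cases} \tag{3.2.5}$$

We can see that $\hat\Theta$ is $\mathbb{F}$-adapted. Moreover, for ω ∈ A,

$$\begin{aligned}
\hat\Theta_t\cdot S_t &= \Theta_t^0 - b + \sum_{n=1}^M \Theta_t^n S_t^n \\
&= \Theta_t\cdot S_t - b = \Theta_{t-1}\cdot S_t - b = 0 = \hat\Theta_{t-1}\cdot S_t.
\end{aligned} \tag{3.2.6}$$

For ω ∉ A, $\hat\Theta_t\cdot S_t = 0 = \hat\Theta_{t-1}\cdot S_t$ by definition. For s > t, $\hat\Theta_{s-1}\cdot S_s = \hat\Theta_t\cdot S_{t+1}$ are pure cash and, therefore, $\hat\Theta$ is a self-financing trading strategy.

Finally, for all s > t,

$$\hat\Theta_{s-1}\cdot S_s = \hat\Theta_t\cdot S_{t+1} = \begin{cases} \Theta_t^0 - b + \sum_{n=1}^M \Theta_t^n S_{t+1}^n = \Theta_t\cdot S_{t+1} - b \ge -a - b > 0 & \text{for } \omega \in A, \\ 0 & \text{for } \omega \notin A. \end{cases} \tag{3.2.7}$$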

This implies that $\hat\Theta$ is an arbitrage, which leads to a contradiction. •

We can also show that when there is no arbitrage the set of admissible trading strategies A(a) is compact.

Lemma 3.2.5 For any a > 0, if A(a) contains no arbitrage then it is bounded and compact.

Proof. We first show that A = A(a) is bounded. For t = 1, 2, . . . , T, let us denote $A_t = \{\Theta \in A : \Theta_s \text{ contains only a cash position for } s > t-1\}$. We note that $A_T = A$ and prove by induction on t. Again without loss of generality we assume the initial endowment is always 0.

For t = 1, assume that there is no arbitrage but $A_1$ is unbounded. By Corollary 3.1.2 there exists a sequence of trading strategies $\Theta(m) \in A_1$ such that $\|\Theta(m)_0\cdot S_1\|$ is unbounded. Without loss of generality we may assume that, for all m, $\|\Theta(m)_0\cdot S_1\| > 1$ and $\|\Theta(m)_0\cdot S_1\| \to +\infty$. Then $\Theta(m)/\|\Theta(m)_0\cdot S_1\| \in A_1$ and is bounded by Corollary 3.1.2. Selecting a subsequence if necessary, we may assume that $\Theta(m)/\|\Theta(m)_0\cdot S_1\|$ converges to $\Theta^* \in A_1$. Since $\Theta(m)_0\cdot S_1 \ge -a$, taking the limit we have

$$\lim_{m\to\infty} \Theta(m)_0\cdot S_1/\|\Theta(m)_0\cdot S_1\| = \Theta^*_0\cdot S_1 \ge 0.$$

On the other hand we also know from the above limiting process that $\|\Theta^*_0\cdot S_1\| = 1$. This means $\Theta^*$ is an arbitrage, a contradiction.

Now under the induction hypothesis that $A_s$, s = 1, 2, . . . , t − 1, are all bounded, we show that $A_t$ is bounded. Assume that the contrary holds. Then there exists a sequence of trading strategies $\Theta(m) \in A_t$ such that $\|\Theta(m)_{t-1}\cdot S_t\|$ is unbounded. Since all $A_s$, s = 1, 2, . . . , t − 1, are bounded, the portfolio $\Theta(m)_{t-1}$ must be unbounded. Then the same argument as in the case of t = 1 will yield a contradiction. This completes the induction proof and, therefore, A is bounded.

Since $\Theta_t\cdot S_t$ is continuous in $\Theta_t$, A defined by constraint (3.2.2) is also closed and, therefore, it is compact. •


3.3 Fundamental Theorem of Asset Pricing

We now turn to proving the FTAP in the multi-period market model and discuss related applications.

3.3.1 Fundamental Theorem of Asset Pricing

As in the case of T = 1, we prove the FTAP by considering a pair of dual convex programming problems in which the primal is maximizing utility among admissible trading strategies:

$$\sup\{E[u(\Theta_{T-1}\cdot S_T)] : \Theta_0\cdot S_0 = w_0,\ \Theta \in ts[S]\}. \tag{3.3.8}$$

We show that a solution to the dual of (3.3.8), when scaled, gives us a martingale measure, thus linking the fundamental theorem of asset pricing to the utility maximization problem (3.3.8).

Theorem 3.3.1 Let S be a financial market. Then the following are equivalent:

(i) There exists no arbitrage trading strategy in ts[S];
(ii) For every utility function u with properties (u1), (u2), and (u3), the finite optimal value of the trading strategy utility optimization problem (3.3.8) is attained;
(iii) There is an equivalent S-martingale measure proportional to an element of the subdifferential of the utility function at the optimal portfolio.

Proof. First observe that the utility optimization problem (3.3.8) can be written equivalently as

$$\begin{aligned}
\text{max}\quad & E[u(y)] \\
\text{subject to}\quad & y \in w_0 + W,
\end{aligned} \tag{3.3.9}$$

where $W = \{G_T(\Theta) : \Theta \in \mathcal{T}\}$ is the linear subspace of all achievable gains using self-financing trading strategies.

Defining $f(y) = -E[u(y)]$ and $g(y) = \iota_{w_0+W}(y)$, we can rewrite problem (3.3.9) as

$$-\min_y\{f(y) + g(y)\}. \tag{3.3.10}$$

The dual problem of (3.3.10) is

$$-\max_z\{-f^*(-z) - g^*(z)\} = \min_z\{E[(-u)^*(-z)] + \langle z, w_0\rangle + \sigma_W(z)\}. \tag{3.3.11}$$

Since we can check that the constraint qualification condition

$$w_0 \in \operatorname{int}\operatorname{cont} f \cap \operatorname{dom} g = RV(\Omega,\mathcal{F},P)_+ \cap (w_0 + W) \tag{3.3.12}$$

holds, (3.3.10) and its dual (3.3.11) have the same value.

When $\mathcal{T}$ contains no arbitrage, by property (u2) of the utility function, $E[u(\Theta_{T-1}\cdot S_T)] > -\infty$ implies $\Theta_{T-1}\cdot S_T \ge 0$, or $G_T(\Theta) \ge -w_0$. By Lemma 3.2.4, we must have $\Theta \in A(w_0)$. Thus, the utility maximization problem (3.3.8) is equivalent to

$$\sup\{E[u(\Theta_{T-1}\cdot S_T)] : \Theta_0\cdot S_0 = w_0,\ \Theta \in A(w_0)\}. \tag{3.3.13}$$

By Lemma 3.2.5, problem (3.3.13) and, therefore, (3.3.9) has a finite solution. By strong duality, the dual problem (3.3.11) has a finite optimal value and attains its solution. Condition (u2) forces the domain of $E[(-u)^*(\cdot)]$ to be a subset of $\operatorname{int}(-RV(\Omega,\mathcal{F},P)_+)$. Thus, we only need to consider z > 0 in the dual problem (3.3.11). Moreover, we must have $\langle z, G_T(\Theta)\rangle = 0$ in (3.3.11) since $\sigma_W(z) < \infty$ and W is a subspace of $RV(\Omega,\mathcal{F},P)$. Hence we can write problem (3.3.11) as

$$\min\{E[(-u)^*(-z)] + \langle w_0, z\rangle \mid z > 0,\ \langle z, G_T(\Theta)\rangle = 0 \text{ for all } \Theta \in \mathcal{T}\}. \tag{3.3.14}$$

Let z be a solution to (3.3.14). It is easy to check that $Q = (z/E[z])P$ is an equivalent S-martingale measure. Thus, (i) implies (ii).

On the other hand, the existence of an equivalent S-martingale measure implies that the dual problem (3.3.11) has a finite value and, therefore, is equivalent to problem (3.3.14), whose dual is the utility maximization problem (3.3.8). Problem (3.3.14) can be viewed as minimizing the convex function $z \mapsto E[(-u)^*(-z)] + \langle w_0, z\rangle$ over the entire subspace $\{z : \langle z, G_T(\Theta)\rangle = 0 \text{ for all } \Theta \in \mathcal{T}\}$ (z > 0 is merely a consequence of the domain of $E[(-u)^*(\cdot)]$ being a subset of $\operatorname{int}(-RV(\Omega,\mathcal{F},P)_+)$ and, therefore, is not a separate constraint). Thus, the constraint qualification condition for (3.3.14) is satisfied (see e.g. [65, Theorem 2.7.1]). It follows that problem (3.3.8), as the dual of (3.3.14), has a finite value and attains its solution, which is to say that (ii) implies (iii).

Finally, if (iii) is true then there cannot be any arbitrage in $\mathcal{T}$ because adding an arbitrage to the optimal solution of (3.3.8) would improve it. Thus, (iii) implies (i) and we have completed a cyclic proof of the equivalence of (i), (ii) and (iii). •

3.3.2 Relationship between Dual of Portfolio Utility Maximization, Lagrange Multiplier and Martingale Measure

Although the equivalence between no arbitrage and the existence of an equivalent martingale measure is well known, as pointed out in [67] the proof of Theorem 3.3.1 using a class of utility functions says more. It tells us that the risk neutral measure is, in fact, a scaling of the solution to the dual of the portfolio utility maximization problem. Moreover, since the dual solution corresponds to the Lagrange multipliers of the primal portfolio utility maximization problem (see [9]), we see that the equivalent martingale measure can also be explained as the scaling of the Lagrange multiplier of the portfolio utility maximization problem.

To see this relationship explicitly, let us write the utility optimization problem (3.3.8) as

$$\inf\{E[(-u)(x)] : x - G_T(\Theta) - w_0 = 0,\ \Theta \in \mathcal{T}\}. \tag{3.3.15}$$

The existence of the solution to the dual of (3.3.15) implies the existence of a Lagrange multiplier $\lambda \in RV(\Omega,\mathcal{F},P)$ such that the Lagrangian

$$\begin{aligned}
L((x,\Theta),\lambda) &= E[(-u)(x)] + \langle\lambda, x - G_T(\Theta) - w_0\rangle \\
&= E[(-u)(x) + \lambda(x - w_0)] - \langle\lambda, G_T(\Theta)\rangle
\end{aligned}$$

attains its minimum at the solution $(x^*, \Theta^*)$ to the problem (3.3.8). It follows that, for any ω with P(ω) ≠ 0,

$$\lambda(\omega) \in -\partial(-u)(x^*(\omega)) \subset (0, +\infty) \tag{3.3.16}$$

and, since $\Theta \mapsto \langle\lambda, G_T(\Theta)\rangle$ is linear,

$$\langle\lambda, G_T(\Theta)\rangle = 0, \quad\text{for all } \Theta \in \mathcal{T}. \tag{3.3.17}$$

It is easy to deduce from (3.3.17) that $E[\lambda(S_t - S_{t-1}) \mid \mathcal{F}_{t-1}] = 0$. Thus, $Q = (\lambda/E[\lambda])P$ is a martingale probability measure for market S equivalent to P.
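The scaling relationship can be seen concretely in the simplest special case. The sketch below assumes a one-period binomial market (T = 1) and the utility u = log, neither of which is prescribed by the text, and all numbers are illustrative. For the log utility the first-order condition can be solved in closed form, the multiplier is λ = u′(x*) = 1/x*, and the rescaled measure Q = (λ/E[λ])P is checked to price the stock correctly.

```python
# One-period binomial market (T = 1) with log utility; all numbers are illustrative.
S0 = 100.0
S1 = {"up": 120.0, "down": 90.0}
P = {"up": 0.6, "down": 0.4}
w0 = 100.0

gain = {w: S1[w] - S0 for w in S1}                 # gain per share of stock
mean_gain = sum(P[w] * gain[w] for w in P)

# First-order condition of max E[log(w0 + theta * gain)], solved in closed form.
theta = -w0 * mean_gain / (gain["up"] * gain["down"])
x_star = {w: w0 + theta * gain[w] for w in P}      # optimal terminal wealth

lam = {w: 1.0 / x_star[w] for w in P}              # lambda = u'(x*) for u = log
norm = sum(P[w] * lam[w] for w in P)               # E[lambda]
Q = {w: lam[w] * P[w] / norm for w in P}           # Q = (lambda / E[lambda]) P

martingale_gap = sum(Q[w] * S1[w] for w in Q) - S0  # E^Q[S_1] - S_0
print(theta, Q, martingale_gap)
```

In a binomial market the martingale measure is unique, so the measure obtained from the multiplier must coincide with the usual risk-neutral probability (S0 − S1_down)/(S1_up − S1_down) = 1/3 of the up state.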

3.3.3 Pricing Contingent Claims

Suppose that a contingent claim can only be traded at t = 0 and t = T and its payoff at time t = T is $\phi(S_T)$. To find a reasonable price $\phi_0$ for this contingent claim at time t = 0, we can again consider the portfolio utility optimization problem

$$\begin{aligned}
\text{minimize}\quad & E[(-u)(x)] \\
\text{subject to}\quad & x - \beta(\phi(S_T) - \phi_0 + G_T(\Theta)) - w_0 = 0, \\
& \Theta \in \mathcal{T}.
\end{aligned} \tag{3.3.18}$$

Using the same argument as in the previous subsection, we can show that there exists a Lagrange multiplier $\lambda \in RV(\Omega,\mathcal{F},P)$ such that, for any ω with P(ω) ≠ 0,

$$\lambda(\omega) \in -\partial(-u)(x^*(\omega)) \subset (0, +\infty)$$

and $Q = (\lambda/E[\lambda])P$ is a martingale probability measure for market S equivalent to P. Moreover,

$$\phi_0 = E^Q[\phi(S_T)]. \tag{3.3.19}$$

Formula (3.3.19) indicates that the martingale measure used to price a contingent claim relies, in general, on the risk aversion of an agent. Thus, in an incomplete market, agents with different risk aversions and, therefore, different utility functions may reasonably price the same contingent claim differently. This is certainly consistent with the reality of the markets.

3.3.4 Solution to the Utility Optimization Problem

The discussion in Section 2.3.3 can be extended to the multi-period model.

Theorem 3.3.2 Suppose that the equivalent martingale measure Q on market S is unique and S has no arbitrage. Then the portfolio optimization problem (3.3.15) is equivalent to

$$\begin{aligned}
\text{minimize}\quad & E[(-u)(x)] \\
\text{subject to}\quad & E^Q(x) = w_0.
\end{aligned} \tag{3.3.20}$$

As we have seen in the one period case, this is merely calculating the optimal end wealth using the Lagrangian. The proof is similar to that of the one period case and is omitted.

3.4 Hedging and Super Hedging

If the market price of an asset violates those specified by the fundamental theorem of asset pricing then in theory an arbitrage opportunity arises. We turn to the problem of how to take advantage of such an arbitrage opportunity.

3.4.1 Super- and Sub-hedging Bounds

Consider a European style contingent claim whose payoff at T is $\psi(S_T)$. By the fundamental theorem of asset pricing, the price of ψ at t = 0 must belong to the set $\{E^Q[\psi(S_T)] : Q \in \mathcal{M}\}$ to be arbitrage free. Here $\mathcal{M}$ is the set of all martingale measures equivalent to P. It follows that

$$\overline{\psi} = \sup\{E^Q[\psi(S_T)] : Q \in \mathcal{M}\} \tag{3.4.1}$$

and

$$\underline{\psi} = \inf\{E^Q[\psi(S_T)] : Q \in \mathcal{M}\} \tag{3.4.2}$$

give us upper and lower bounds for the price of ψ. If the price of ψ falls outside of these bounds, an arbitrage will become possible. We call them super- and sub-hedging bounds, respectively. We focus on the super-hedging bound. The discussion about the sub-hedging bound can be reduced to that of a super-hedging bound for −ψ because

$$-\underline{\psi} = \sup\{E^Q[-\psi(S_T)] : Q \in \mathcal{M}\}. \tag{3.4.3}$$

If the market price of ψ is above this super-hedging bound, how can we find an arbitrage strategy? It turns out that the key is to view (3.4.1) as a linear programming problem and consider its dual. As discussed before, for a linear programming problem and its dual, the constraint qualification condition ensuring strong duality is, in fact, the feasibility condition. So the key is to correctly formulate the dual problem of (3.4.1). We will use the Lagrange formulation. Let us assume $\{\Theta^n\}_{n=1}^N$ is a basis for the finite dimensional Banach space ts[S]. Then we can rewrite (3.4.1) as

\[
\begin{aligned}
\overline{\psi} &= \sup_{Q \in M_+} \{E^Q[\psi(S_T)] : E^Q[G_T(\Theta)] = 0,\ E^Q[1] = 1,\ \Theta \in \mathrm{ts}[S]\} \\
&= \sup_{Q \in M_+} \{E^Q[\psi(S_T)] : E^Q[1] = 1,\ E^Q[G_T(\Theta_n)] = 0,\ n = 1, \ldots, N\},
\end{aligned} \tag{3.4.4}
\]

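where $M_+$ signifies the set of all positive measures. We can see that (3.4.4) is a linear programming problem. Moreover, the Lagrangian of (3.4.4) is

\[
L(Q, \lambda) = E^Q[\psi(S_T)] + \sum_{n=1}^N \lambda_n E^Q[G_T(\Theta_n)] + \lambda_0 (E^Q[1] - 1), \tag{3.4.5}
\]

where $\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_N) \in \mathbb{R}^{N+1}$ is the Lagrange multiplier. Observing that elements Θ ∈ ts[S] can be represented as

\[
\Theta = \sum_{n=1}^N \lambda_n \Theta_n
\]

and that $G_T(\Theta)$ is linear in Θ, we can equivalently view $(\Theta, \lambda_0)$ as a Lagrange multiplier of the linear programming problem (3.4.4) and write the Lagrangian as

\[
L(Q, (\Theta, \lambda_0)) = E^Q[\psi(S_T)] + E^Q[G_T(\Theta)] + \lambda_0 (E^Q[1] - 1), \tag{3.4.6}
\]

where $(\Theta, \lambda_0) \in \mathrm{ts}[S] \times \mathbb{R}$. It is easy to verify that

\[
\inf_{(\Theta, \lambda_0) \in \mathrm{ts}[S] \times \mathbb{R}} L(Q, (\Theta, \lambda_0)) =
\begin{cases}
E^Q[\psi(S_T)] & Q \in M, \\
-\infty & \text{otherwise.}
\end{cases}
\]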

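Thus, we can write

\[
\overline{\psi} = \sup_{Q \in M_+} \inf_{(\Theta, \lambda_0) \in \mathrm{ts}[S] \times \mathbb{R}} L(Q, (\Theta, \lambda_0)) \tag{3.4.7}
\]

and by strong duality we have

\[
\begin{aligned}
\overline{\psi} &= \inf_{(\Theta, \lambda_0) \in \mathrm{ts}[S] \times \mathbb{R}} \sup_{Q \in M_+} L(Q, (\Theta, \lambda_0)) \\
&= \inf_{\Theta \in \mathrm{ts}[S]} \sup_{Q \in M_+} \{E^Q[\psi(S_T) + G_T(\Theta)] : E^Q[1] = 1\} \\
&= \inf_{\Theta \in \mathrm{ts}[S]} \sup_{\omega \in \Omega} \left[\psi(S_T)(\omega) + G_T(\Theta)(\omega)\right].
\end{aligned} \tag{3.4.8}
\]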

The financial interpretation of the last expression in (3.4.8) is that a solution to problem (3.4.8), if it exists, is a trading strategy that results in a payoff that is always bounded by the super-hedging bound. Thus, if the market price exceeds the super-hedging bound, one has an arbitrage strategy.

The arbitrage trading strategy alluded to above can be found by solving the linear programming problem

\[
\begin{aligned}
\min\ & t \\
\text{s.t. } & t - G_T(\Theta)(\omega) \ge \psi(S_T)(\omega), \quad \omega \in \Omega, \\
& \Theta \in \mathrm{ts}[S],\ t \in \mathbb{R}.
\end{aligned} \tag{3.4.9}
\]

Let Θ and $t = \overline{\psi}$ be the solution of (3.4.9). If the market price of the contingent claim at t = 0 is

\[
\psi_0 > \overline{\psi},
\]

then we can short one share of the contingent claim and follow the trading strategy −Θ (or equivalently, short the trading strategy Θ). By time t = T, we have

\[
t - G_T(\Theta)(\omega) \ge \psi(S_T)(\omega) \quad \text{for all } \omega \in \Omega.
\]

That is to say, the gain from the trading and the cash amount $\overline{\psi}$ safely cover the short position in any possible economic state, and the difference $\psi_0 - \overline{\psi}$ becomes our arbitrage profit.
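The duality between the supremum over martingale measures (3.4.1) and the infimum over trading strategies (3.4.8) can be checked numerically in a small case. The following is a minimal sketch, assuming a one period market with a single risky asset, zero interest rate and three states, so that $G_T(\Theta) = \theta(S_1 - S_0)$; the prices, the call payoff and the function names are hypothetical and chosen only for illustration. The primal side enumerates the extreme points of the set of martingale measures (these charge at most two states, so they are limits of equivalent measures rather than equivalent measures themselves, which does not change the supremum). The dual side uses the fact that, when the convex piecewise linear objective is bounded below, its minimum is attained at a kink.

```python
from itertools import combinations

def superhedge_primal(s0, s1, payoff):
    """Sup of E^Q[payoff] over martingale measures, searched over the extreme
    points of {q >= 0, sum q = 1, sum q*s1 = s0} (at most two charged states)."""
    values = [p for s, p in zip(s1, payoff) if s == s0]  # point masses
    for i, j in combinations(range(len(s1)), 2):
        if s1[i] == s1[j]:
            continue
        qi = (s1[j] - s0) / (s1[j] - s1[i])  # weight on state i, 1 - qi on state j
        if 0 <= qi <= 1:
            values.append(qi * payoff[i] + (1 - qi) * payoff[j])
    return max(values)

def superhedge_dual(s0, s1, payoff):
    """Inf over theta of max_w [payoff(w) + theta*(s1(w) - s0)], as in (3.4.8),
    searched over the kinks of this piecewise linear function of theta."""
    def worst(theta):
        return max(p + theta * (s - s0) for s, p in zip(s1, payoff))
    kinks = [(pj - pi) / (si - sj)
             for (si, pi), (sj, pj) in combinations(list(zip(s1, payoff)), 2)
             if si != sj]
    theta = min(kinks, key=worst)
    return worst(theta), theta

# Hypothetical data: S0 = 100, S1 in {80, 100, 120}, call option struck at 100.
s0, s1 = 100.0, [80.0, 100.0, 120.0]
call = [max(s - 100.0, 0.0) for s in s1]
upper_primal = superhedge_primal(s0, s1, call)
upper_dual, theta = superhedge_dual(s0, s1, call)
print(upper_primal, upper_dual, theta)
```

In this example both sides agree, and the super-hedge consists of the cash amount returned by the dual together with the strategy −θ, matching the role of −Θ in the discussion of (3.4.9).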

3.4.2 Towards a Complete Market

If we know the prices of some European contingent claims, say $\varphi_1, \ldots, \varphi_K$, at t = 0 to be $c_1, \ldots, c_K$, respectively, then to avoid arbitrage the estimate of the upper bound for a contingent claim ψ is

\[
\sup\{E^Q[\psi] : Q \in M,\ E^Q[\varphi_k] = c_k,\ k = 1, \ldots, K\}. \tag{3.4.10}
\]

Denoting $c = (c_1, \ldots, c_K)$ and $\varphi = (\varphi_1, \ldots, \varphi_K)$, we can write the Lagrangian of the constrained optimization problem (3.4.10) as

\[
L(Q, (\Theta, \lambda_0, b)) = E^Q[\psi(S_T)] + E^Q[G_T(\Theta)] + \lambda_0 (E^Q[1] - 1) + b \cdot (E^Q[\varphi(S_T)] - c),
\]

where $(\Theta, \lambda_0, b) \in \mathrm{ts}[S] \times \mathbb{R} \times \mathbb{R}^K$. Similar to the previous section we can verify that, by strong Lagrange duality,

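\[
\begin{aligned}
\overline{\psi}|_{\varphi} &= \inf_{(\Theta, \lambda_0, b) \in \mathrm{ts}[S] \times \mathbb{R} \times \mathbb{R}^K} \sup_{Q \in M_+} L(Q, (\Theta, \lambda_0, b)) \\
&= \inf_{(\Theta, b) \in \mathrm{ts}[S] \times \mathbb{R}^K} \sup_{Q \in M_+} \{E^Q[\psi(S_T) + G_T(\Theta) + b \cdot (\varphi - c)] : E^Q[1] = 1\} \\
&= \inf_{(\Theta, b) \in \mathrm{ts}[S] \times \mathbb{R}^K} \sup_{\omega \in \Omega} \left[\psi(S_T)(\omega) + G_T(\Theta)(\omega) + b \cdot (\varphi(S_T)(\omega) - c)\right].
\end{aligned} \tag{3.4.11}
\]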

The financial interpretation of the last expression in (3.4.11) is that a solution to problem (3.4.11), if it exists, is a trading strategy that results in a payoff that is always bounded by the super-hedging bound. Thus, if the market price exceeds the super-hedging bound, one has an arbitrage strategy, which can be calculated using a linear programming problem similar to (3.4.9).

Here, with the additional tradable contingent claims $\varphi_1, \ldots, \varphi_K$, the upper bound for the no arbitrage price is lowered and correspondingly the lower bound is increased, so that we get a more accurate estimate of the price. If we add enough additional contingent claims as tradables, the market eventually becomes complete in the sense that the upper and lower bounds coincide to give us a unique price. There are many ways to characterize a complete market. In the context here the most direct way is to require the subspace

\[
W = \{G_T(\Theta) + b \cdot (\varphi - c) \mid (\Theta, b) \in \mathrm{ts}[S] \times \mathbb{R}^K\} \tag{3.4.12}
\]

of RV(Ω, F, P) to have codimension 1 (the dimension of W is exactly 1 less than that of RV(Ω, F, P)).
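The effect of adding a tradable claim can be seen in a toy case. The sketch below assumes a one period, three state market with zero interest rate in which the stock alone leaves the price of a call undetermined; adding one traded put (a hypothetical claim with a hypothetical price of 6) supplies the third independent linear condition, so the measure Q in (3.4.10) becomes unique and the upper and lower bounds collapse to one price. The helper `solve` is an illustrative exact Gauss-Jordan routine, not part of the text.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination
    in exact rational arithmetic (raises StopIteration if a is singular)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Hypothetical data: S0 = 100, S1 in {80, 100, 120}.
s0, s1 = 100, [80, 100, 120]
put, put_price = [20, 0, 0], 6      # traded put struck at 100, quoted at 6
call = [0, 0, 20]                   # claim to be priced

# Conditions on Q: total mass 1, martingale condition, calibration to the put.
q = solve([[1, 1, 1], s1, put], [1, s0, put_price])
call_price = sum(qi * ci for qi, ci in zip(q, call))
print(q, call_price)
```

Here the three rows of the system are linearly independent, which is the finite dimensional counterpart of W in (3.4.12) having codimension 1.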

3.4.3 Incomplete Markets Arising from Complete Markets

We turn to consider an incomplete market that arises from complete markets. A motivating example is a call option on a currency spread. For simplicity let us consider a one period economy where transactions take place at t = 0 and t = 1. The payoff of a call option on the spread of two different currencies $C^1, C^2$ with a strike K in terms of a third currency at t = 1 is then

\[
(C^1_1 - C^2_1 - K)^+. \tag{3.4.13}
\]

Since $C^1$ and $C^2$ are different currencies, it is reasonable to model their values in terms of the common currency at time t = 1 as random variables in two different probability spaces $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$, respectively. We assume that both markets for $C^1$ and $C^2$ are complete. Moreover, we assume that $P_i$ is the unique martingale measure for $C^i$, i = 1, 2. If we consider (3.4.13) to be a special form of the more general contingent claim $\psi = \psi(C^1_1, C^2_1)$, then ψ is a random variable on the product measure space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$. Our problem now is to seek a martingale measure π on $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$ which prices ψ so as to be consistent with the martingale measures $P_1$ and $P_2$, respectively. Consider a contingent claim $\varphi_1(C^1)$ that depends only on $C^1$. We can view this payoff both as a random variable on $(\Omega_1, \mathcal{F}_1, P_1)$ and as a random variable on $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2, \pi)$. Thus requiring π to be consistent with $P_1$ is to require

\[
\int_{\Omega_1} \varphi_1(C^1_1)\,dP_1 = \int_{\Omega_1 \times \Omega_2} \varphi_1(C^1_1)\,d\pi. \tag{3.4.14}
\]

Since $\varphi_1(C^1_1)$ is arbitrary, this is to say that $P_1$ is the marginal probability measure of π on $\Omega_1$. Similarly, $P_2$ must be the marginal probability measure of π on $\Omega_2$. Clearly, a measure π on the product space that satisfies such marginal requirements is not unique. We see that despite the completeness of the financial markets on $\Omega_1$ and $\Omega_2$, in pricing a contingent claim whose payoff is a random variable on the product measure space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$, we face an incomplete market.

To find the upper bound for the price of ψ that is consistent with the no arbitrage principle we face the optimization problem

\[
\overline{\psi} = \sup_{\pi \in \Pi(P_1, P_2)} E^{\pi}[\psi], \tag{3.4.15}
\]

where $\Pi(P_1, P_2)$ signifies the set of all probability measures on the product measure space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$ whose marginals on $\Omega_1$ and $\Omega_2$ are $P_1$ and $P_2$, respectively.

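Problem (3.4.15) turns out to be a Kantorovich mass transport problem. By Kantorovich's duality theorem (a special case of the abstract linear programming duality) we have

\[
\overline{\psi} = \sup_{\pi \in \Pi(P_1, P_2)} E^{\pi}[\psi] = \min_{(\varphi_1, \varphi_2) \in G_\psi} \left(E^{P_1}[\varphi_1] + E^{P_2}[\varphi_2]\right), \tag{3.4.16}
\]

where

\[
G_\psi := \{(\varphi_1, \varphi_2) \in (\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2) : \varphi_1(\omega_1) + \varphi_2(\omega_2) \ge \psi(C^1_1(\omega_1), C^2_1(\omega_2))\}.
\]

The Kantorovich duality (3.4.16) shows that in principle one can implement the upper no arbitrage price bound $\overline{\psi}$ using the sum of two contingent claims $\varphi_1$ and $\varphi_2$ on sample spaces $\Omega_1$ and $\Omega_2$, respectively.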

In this concise introduction we cannot afford a detailed discussion of the Kantorovich duality theorem. Instead we will examine the case when both sample spaces $\Omega_1$ and $\Omega_2$ are finite. In this case problem (3.4.15) reduces to a linear programming problem. We can achieve the decoupling alluded to in the Kantorovich duality theorem by directly using linear programming duality.

Example 3.4.1 (Estimate Upper No Arbitrage Bound in Finite Sample Spaces) Suppose that both sample spaces $\Omega_1$ and $\Omega_2$ are finite. Denote $\Omega_1 = \{i : i = 1, \ldots, L\}$ and $\Omega_2 = \{j : j = 1, \ldots, M\}$, respectively. For brevity of notation we denote

\[
\psi_{ij} = \psi(C^1(i), C^2(j)).
\]

Then the problem of finding an upper bound for the contingent claim $\psi(C^1, C^2)$ can be formulated as

\[
\begin{aligned}
\max\ & \sum_{i,j} \psi_{ij} \pi_{ij} \\
\text{s.t. } & \sum_j \pi_{ij} - \mu_i = 0, \quad \sum_i \pi_{ij} - \nu_j = 0, \\
& \sum_i C^1_1(i) \mu_i = C^1_0, \quad \sum_j C^2_1(j) \nu_j = C^2_0, \\
& \sum_i \mu_i = 1, \quad \sum_j \nu_j = 1.
\end{aligned} \tag{3.4.17}
\]

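The dual of the linear programming problem (3.4.17) is

\[
\begin{aligned}
\min\ & \lambda_1 C^1_0 + \lambda_2 C^2_0 + \lambda_3 + \lambda_4 \\
\text{s.t. } & u_i + v_j \ge \psi_{ij}, \\
& \lambda_1 C^1_1(i) + \lambda_3 - u_i \ge 0, \\
& \lambda_2 C^2_1(j) + \lambda_4 - v_j \ge 0.
\end{aligned} \tag{3.4.18}
\]

Defining $\varphi_1(C^1) = \lambda_1 C^1 + \lambda_3$ and $\varphi_2(C^2) = \lambda_2 C^2 + \lambda_4$ we can rewrite (3.4.18) as

\[
\begin{aligned}
\min\ & \varphi_1(C^1_0) + \varphi_2(C^2_0) \\
\text{s.t. } & \varphi_1(C^1_1(i)) + \varphi_2(C^2_1(j)) \ge \psi_{ij},
\end{aligned} \tag{3.4.19}
\]

which is the Kantorovich dual form of (3.4.17). Note that $\varphi_1$ and $\varphi_2$ depend linearly on $C^1$ and $C^2$, respectively. Thus, problem (3.4.19) is a linear programming problem.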
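The Kantorovich duality (3.4.16) can be illustrated in the smallest nontrivial case. The sketch below is an assumption-laden toy: each currency takes two values at t = 1, the marginals $P_1$ and $P_2$ are fixed and uniform, the strike is K = 0, and all numbers and function names are hypothetical. With 2 x 2 states the couplings in $\Pi(P_1, P_2)$ form a one parameter family indexed by the mass on the first cell, so the supremum in (3.4.15) is attained at an endpoint of an interval. A pair $(\varphi_1, \varphi_2)$ that is feasible for the dual and has the same cost then certifies optimality.

```python
from fractions import Fraction as F

def upper_bound_2x2(psi, mu, nu):
    """Max of sum psi[i][j]*pi[i][j] over couplings pi of mu and nu (2x2 case).
    The coupling is determined by a = pi[0][0]; the objective is linear in a."""
    lo = max(F(0), mu[0] + nu[0] - 1)   # smallest admissible pi[0][0]
    hi = min(mu[0], nu[0])              # largest admissible pi[0][0]
    def value(a):
        pi = [[a, mu[0] - a], [nu[0] - a, 1 - mu[0] - nu[0] + a]]
        return sum(pi[i][j] * psi[i][j] for i in range(2) for j in range(2))
    return max(value(lo), value(hi))

def dual_cost(phi1, phi2, psi, mu, nu):
    """Return (feasible, cost) for a candidate pair in the dual of (3.4.16)."""
    feasible = all(phi1[i] + phi2[j] >= psi[i][j]
                   for i in range(2) for j in range(2))
    cost = sum(m * p for m, p in zip(mu, phi1)) + sum(n * p for n, p in zip(nu, phi2))
    return feasible, cost

# Hypothetical data: C^1_1 in {10, 14}, C^2_1 in {9, 13}, spread call with K = 0.
c1, c2 = [10, 14], [9, 13]
psi = [[max(x - y, 0) for y in c2] for x in c1]   # [[1, 0], [5, 1]]
mu = nu = [F(1, 2), F(1, 2)]

upper = upper_bound_2x2(psi, mu, nu)
feasible, cost = dual_cost([0, 4], [1, 0], psi, mu, nu)
print(upper, feasible, cost)
```

The maximizing coupling pairs the high value of the first currency with the low value of the second, and the dual pair splits the bound into one claim on each currency, as the interpretation of (3.4.16) suggests.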

3.5 Conic Finance

Real financial markets have frictions. When trading a financial asset one faces two different prices: ask and bid. Usually, the ask is strictly larger than the bid, and one can only buy at the ask price and sell at the bid price. This violation of the one price principle complicates the modeling. The set of attainable gains from trading assets in such a more realistic market model is not a subspace but rather, in general, a cone. This leads to the name of conic finance.

3.5.1 Modeling Financial Markets with an Ask-Bid Spread

Let $\mathbb{F}: \{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \ldots \subset \mathcal{F}_T = \mathcal{F}$ be an information structure on the probability space (Ω, F, P) with a finite sample space that represents the economic states. Denote by X the space of all $\mathbb{F}$-adapted cash streams $x = (x_t)_{t=0}^T$ endowed with the inner product

\[
\langle x, y \rangle = E\left[\sum_{t=0}^T x_t y_t\right].
\]

Then X is a finite dimensional Hilbert space.

A financial market consists of M risky assets $S^m \in X$, m = 1, 2, ..., M, and T riskless bonds $1^u$, u = 1, 2, ..., T, where $1^u_u = 1$ and $1^u_t = 0$ for t ≠ u. At time t, to trade the rights to the income stream of $S^i$ after t, there is a bid and ask price pair:

\[
b^i_t \le a^i_t. \tag{3.5.1}
\]

Thus, paying $a^i_t$ one gets the income stream $(S^i_s)_{s=t+1}^T$. Similarly, receiving $b^i_t$ one sells the income stream $(S^i_s)_{s=t+1}^T$ or, in other words, gets the income stream $(-S^i_s)_{s=t+1}^T$. Considering market friction makes the model more complicated compared to the one price model we have used so far. A risky asset in a one price economy is described by its prices (as random variables) at T + 1 trading times. In an economy with a bid and ask spread, to describe a risky asset we need to use 2(T + 1) income streams with corresponding prices. Similarly, the description of riskless assets also becomes more involved. We regard them as a series of riskless bonds maturing at times u = 1, 2, ..., T; the bond maturing at u, denoted $1^u$, pays 1 when t = u and 0 when t ≠ u. The bid and ask prices of $1^u$ at t < u are denoted $g^u_t$ and $h^u_t$, respectively. They satisfy the following inequality

\[
g^u_t \le h^u_t. \tag{3.5.2}
\]

A convenient way of thinking about the trading of these income streams is to incorporate the buying cost or selling revenue into the income streams and to view the resulting income streams as zero cost money streams. For example, the action of buying the income stream $(S^i_s)_{s=t+1}^T$ at time t with ask price $a^i_t$ is equivalent to acquiring the zero cost money stream $\overline{S}^{it}$ defined by

\[
\overline{S}^{it}_s =
\begin{cases}
0 & s < t, \\
-a^i_t & s = t, \\
S^i_s & s > t.
\end{cases} \tag{3.5.3}
\]

Selling the above income stream at the bid price $b^i_t$ yields the zero cost income stream $\underline{S}^{it}$ defined by

\[
\underline{S}^{it}_s =
\begin{cases}
0 & s < t, \\
b^i_t & s = t, \\
-S^i_s & s > t.
\end{cases} \tag{3.5.4}
\]

We observe that $\underline{S}^{it}$ is different from $-\overline{S}^{it}$ due to the spread between the ask and bid prices. Similarly, the bond maturing at u generates the zero cost income streams

\[
\overline{1}^{ut}_s =
\begin{cases}
0 & s \ne u, t, \\
-h^u_t & s = t, \\
1 & s = u,
\end{cases}
\quad \text{and} \quad
\underline{1}^{ut}_s =
\begin{cases}
0 & s \ne u, t, \\
g^u_t & s = t, \\
-1 & s = u.
\end{cases} \tag{3.5.5}
\]

Assuming that one can buy or sell any fraction of the cash streams alluded to above, suppose $\overline{\alpha}^i_t, \underline{\alpha}^i_t, \overline{\beta}^u_t, \underline{\beta}^u_t$, i = 1, ..., M, u = 1, ..., T, are nonnegative $\mathcal{F}_t$-measurable random variables, where an overline marks a position bought at the ask and an underline a position sold at the bid. Then

\[
z = \sum_{t=0}^T \sum_{i=1}^M \left[\overline{\alpha}^i_t \overline{S}^{it} + \underline{\alpha}^i_t \underline{S}^{it}\right] + \sum_{t=0}^T \sum_{u=1}^T \left[\overline{\beta}^u_t \overline{1}^{ut} + \underline{\beta}^u_t \underline{1}^{ut}\right] \tag{3.5.6}
\]

is a cash stream that can be implemented by trading the available zero cost cash streams. Denote by Z the collection of all cash streams of the form (3.5.6). It is clear that Z is a cone. Define C to be the set of all cash streams c ∈ X such that there exists a z ∈ Z with z ≥ c; that is, C consists of all the cash streams that can be dominated by a cash stream in Z. Then it is easy to see that C is also a cone and Z ⊂ C. In general, C may be larger than Z. Each dominating cash stream in Z can be implemented by appropriately trading the zero cost cash streams available in the market. For any c ∈ C, we can find z ∈ Z defined by (3.5.6) such that z ≥ c. We say that $\overline{\alpha}^i_t, \underline{\alpha}^i_t, \overline{\beta}^u_t$ and $\underline{\beta}^u_t$ form a trading strategy that super implements c.

We note that when $S^i$ is a cash stream that pays $S^i_T$, with ask and bid prices at t both coinciding with $S^i_t$ and all $g^u_t = h^u_t = 1$, we recover the one price financial markets defined before as a special case.
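The construction of the zero cost streams in (3.5.3) and (3.5.4) can be sketched in a few lines. The example below assumes a single deterministic scenario with T = 2 so that a cash stream is just a list of numbers; the income stream, the ask of 101 and the bid of 99 are hypothetical. It shows that buying and immediately selling the same income stream does not net to zero, which is why the attainable set is a cone rather than a subspace.

```python
def buy_stream(income, t, ask):
    """Zero cost stream from buying at time t at the ask, as in (3.5.3):
    0 before t, -ask at t, and the income afterwards."""
    return [0 if s < t else -ask if s == t else income[s]
            for s in range(len(income))]

def sell_stream(income, t, bid):
    """Zero cost stream from selling at time t at the bid, as in (3.5.4):
    0 before t, +bid at t, and minus the income afterwards."""
    return [0 if s < t else bid if s == t else -income[s]
            for s in range(len(income))]

income = [0, 3, 103]                       # hypothetical income stream, T = 2
bought = buy_stream(income, 0, ask=101)
sold = sell_stream(income, 0, bid=99)
round_trip = [x + y for x, y in zip(bought, sold)]
print(bought, sold, round_trip)
```

The round trip loses exactly the bid-ask spread at t = 0 and nothing afterwards, so the negative of the bought stream is not itself attainable.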

3.5.2 Characterization of No Arbitrage by Utility Optimization

Using the model described in the previous section, we can extend the fundamental theorem of asset pricing to markets with a bid-ask spread. First we define arbitrage in such a market.

Definition 3.5.1 (Arbitrage Trading Strategy) We say that a cash stream $c \in C \setminus \{0\}$ is an arbitrage if $c_t \ge 0$ for all t = 0, 1, ..., T.

Denote by $X_+$ the cone in X of cash streams all of whose components are nonnegative. Then there is no arbitrage trading strategy in the financial market described in the previous section if and only if

\[
C \cap X_+ = \{0\}. \tag{3.5.7}
\]

Let u be a utility function satisfying the conditions (u1)–(u3). We consider the optimal trading problem

\[
p = \max\left\{\sum_{t=0}^T E[u(c_t)] : c \in w^0 + C\right\}, \tag{3.5.8}
\]

where $w^0 \in X_+$ is an initial endowment cash stream. We can characterize no arbitrage in terms of the optimal trading problem (3.5.8):

Theorem 3.5.2 (No Arbitrage and Utility Maximization) The financial market described in the previous section has no arbitrage trading strategy if and only if the optimal trading problem (3.5.8) has a finite optimal value p < ∞.

Proof. Since one can always scale an arbitrage trading strategy by an arbitrarily large positive number, p < +∞ implies that there is no arbitrage trading strategy. On the other hand, if p = +∞, without loss of generality we assume that there is a sequence $z^n \in Z$ such that

\[
\sum_{t=0}^T E[u(w^0_t + z^n_t)] \to +\infty. \tag{3.5.9}
\]

Clearly $\|z^n\| \to +\infty$. Then, taking a subsequence if necessary, we can assume that $z^n / \|z^n\| \to z^* \in Z \setminus \{0\}$. By property (u3), $z^n_t \ge -w^0_t$, t = 0, 1, ..., T. Thus $z^*_t \ge 0$, which implies that $z^*$ is an arbitrage trading strategy. •

3.5.3 Dual Characterization of No Arbitrage

We turn to the dual characterization of no arbitrage and its implication for the price of financial assets. Defining, for x ∈ X,

\[
f(x) = \sum_{t=0}^T E[(-u)(x_t)], \tag{3.5.10}
\]

we can rewrite the optimal trading problem (3.5.8) as

\[
p = -\inf_{x \in X} \left[f(x) + \iota_{w^0 + C}(x)\right]. \tag{3.5.11}
\]

Note that the (CQ) condition

\[
0 \in \operatorname{int}\left[\operatorname{dom} \iota_{w^0 + C} - \operatorname{dom} f\right] = \operatorname{int}\left[w^0 + C - X_+\right] \tag{3.5.12}
\]

holds. Thus, strong duality implies that

\[
\begin{aligned}
p &= -\max_{z \in X} \left\{-\sigma_{w^0 + C}(z) - f^*(-z)\right\} \\
&= \min_{z \in X} \left\{\sum_{t=0}^T E[(-u)^*(-z_t)] + \langle w^0, z \rangle + \sigma_C(z)\right\}.
\end{aligned} \tag{3.5.13}
\]

Let $x^*, z^*$ be solutions to the primal and dual problems (3.5.11) and (3.5.13), respectively. Condition (u2) implies that $\operatorname{dom}(-u)^* = (-\infty, 0)$ so that $z^*_t > 0$. Moreover,

\[
z^*_t \in -\partial(-u)(x^*_t). \tag{3.5.14}
\]

Finally, if the market has no arbitrage trading strategy then p < +∞ in (3.5.13), which implies that $\sigma_C(z^*) < \infty$ or

\[
z^* \in C^{\circ} := \{z^* \in X : \langle z^*, c \rangle \le 0 \text{ for all } c \in C\}. \tag{3.5.15}
\]

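Relation (3.5.15) can be interpreted as follows: by scaling $z^*$ one can derive a martingale measure. Let us look into the details. We will use $E_t$ to denote the conditional expectation with respect to $\mathcal{F}_t$ and $\chi_A$ to denote the characteristic function of a set A, that is, $\chi_A(x) = 1$ if x ∈ A and $\chi_A(x) = 0$ otherwise. Write $\overline{1}^{ut}$ and $\underline{1}^{ut}$ for the zero cost streams of buying the bond $1^u$ at the ask $h^u_t$ and selling it at the bid $g^u_t$ at time t. Since for any $A \in \mathcal{F}_t$, $\chi_A \overline{1}^{ut}, \chi_A \underline{1}^{ut} \in Z \subset C$, we have

\[
\langle z^*, \chi_A \overline{1}^{ut} \rangle \le 0 \quad \text{and} \quad \langle z^*, \chi_A \underline{1}^{ut} \rangle \le 0, \tag{3.5.16}
\]

which implies that

\[
g^u_t z^*_t \le E_t[z^*_u] \le h^u_t z^*_t. \tag{3.5.17}
\]

In particular, $\pi_t = E_t[z^*_{t+1}] / z^*_t \in [g^{t+1}_t, h^{t+1}_t]$ plays the role of a discounting factor in the interval between transaction times t and t + 1. Defining a discounting process $\Gamma_t$ recursively by

\[
\Gamma_0 = 1, \quad \Gamma_{t+1} = \Gamma_t \pi_t, \tag{3.5.18}
\]

we see that $\Gamma_t$ is $\mathbb{F}$-adapted. Denoting $M_t = z^*_t / \Gamma_t$ we can verify

\[
E_t[M_{t+1}] = E_t\left[\frac{z^*_{t+1}}{\Gamma_{t+1}}\right] = E_t\left[\frac{z^*_{t+1}}{\Gamma_t \pi_t}\right] = \frac{1}{\Gamma_t} E_t\left[\frac{z^*_{t+1}}{\pi_t}\right] = \frac{z^*_t}{\Gamma_t} = M_t. \tag{3.5.19}
\]

In other words, $M_t$ is a martingale and we have the decomposition

\[
z^*_t = \Gamma_t M_t. \tag{3.5.20}
\]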
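The decomposition (3.5.18) to (3.5.20) can be traced through by hand in a one period model. The sketch below assumes three equally simple states with hypothetical probabilities and a hypothetical dual solution $z^*$; none of these numbers come from the text. It computes the discount factor $\pi_0$, the process Γ, the martingale M and the measure obtained by weighting P with $M_1$.

```python
from fractions import Fraction as F

prob = [F(1, 4), F(1, 2), F(1, 4)]        # hypothetical P on three states
z0 = F(1)                                 # hypothetical z*_0 (F_0 is trivial)
z1 = [F(7, 10), F(1), F(11, 10)]          # hypothetical z*_1, state by state

def expect(values):
    """Expectation under P at time 0."""
    return sum(p * v for p, v in zip(prob, values))

pi0 = expect(z1) / z0                     # discount factor over [0, 1]
gamma = [F(1), pi0]                       # Gamma_0 = 1, Gamma_1 = Gamma_0 * pi_0
m0 = z0 / gamma[0]
m1 = [z / gamma[1] for z in z1]           # M_1 = z*_1 / Gamma_1
e_m1 = expect(m1)                         # should equal M_0 by (3.5.19)
q = [p * m for p, m in zip(prob, m1)]     # density M_1 applied to P
print(pi0, m1, q)
```

Since $M_0 = 1$ here, the weights q already sum to one and define a probability measure; in general one would normalize by $M_0$.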

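Defining $Q = M_T P$, we can rewrite (3.5.17) as

\[
g^u_t \le E^Q_t\left[\sum_{s=t+1}^T \frac{\Gamma_s}{\Gamma_t} \overline{1}^{ut}_s\right] \le h^u_t, \tag{3.5.21}
\]

where $\overline{1}^{ut}_s$ equals 1 for s = u and 0 for all other s > t. Similarly, writing $\overline{S}^{it}$ and $\underline{S}^{it}$ for the zero cost streams of buying $S^i$ at the ask and selling it at the bid at time t, for any $A \in \mathcal{F}_t$, $\chi_A \overline{S}^{it}, \chi_A \underline{S}^{it} \in Z \subset C$ implies

\[
b^i_t z^*_t \le E_t\left[\sum_{s=t+1}^T z^*_s S^i_s\right] \le a^i_t z^*_t. \tag{3.5.22}
\]

Dividing by $z^*_t$ and using the representation (3.5.20) we have

\[
b^i_t \le E_t\left[\sum_{s=t+1}^T \frac{\Gamma_s M_s}{\Gamma_t M_t} S^i_s\right] \le a^i_t, \tag{3.5.23}
\]

or

\[
b^i_t \le E^Q_t\left[\sum_{s=t+1}^T \frac{\Gamma_s}{\Gamma_t} S^i_s\right] \le a^i_t. \tag{3.5.24}
\]

In other words, the discounted values of the cash flows related to both bonds and risky assets under the equivalent martingale measure Q fall between the bid and ask prices.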

3.5.4 Pricing and Hedging

In a market model with a bid-ask spread, the pair (Q, Γ) of an equivalent martingale measure and related discount factors plays the role of an equivalent martingale measure in a one price market model.

Definition 3.5.3 (Martingale Measure and Discount Factor) Let Q be a measure equivalent to P and let $\Gamma = (\Gamma_t)_{t=0}^T$ be an $\mathbb{F}$-adapted process with $\Gamma_0 = 1$. We say that (Q, Γ) is an equivalent martingale measure and discounting process pair corresponding to the T-period market in Section 3.5.1 provided that they satisfy relationships (3.5.21) and (3.5.24). We use MD to denote the collection of all such pairs.

The set MD plays a role similar to the set of equivalent martingale measures in a one price economy and can be used to determine sub- and super-hedging bounds. For the sake of brevity we only discuss the one period case to illustrate the idea. Readers can find a more technical discussion of the general multiperiod model in [60].

We will show that

\[
u_0 = \max\left\{E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} c_1\right] : (Q, \Gamma) \in MD\right\}
\]

defines a super-hedging bound. A sub-hedging bound can be derived similarly. We represent $u_0$ as a linear programming problem

\[
\begin{aligned}
u_0 = \max\ & E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} c_1\right] \\
\text{subject to } & E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} S^m_1\right] \le a^m_0, \quad E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} (-S^m_1)\right] \le -b^m_0, \quad m = 1, \ldots, M, \\
& E^Q_0\left[\frac{\Gamma_1}{\Gamma_0}\right] \le h^1_0, \quad E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} (-1)\right] \le -g^1_0.
\end{aligned} \tag{3.5.25}
\]

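We formulate the dual problem using the Lagrange format. Let

\[
(\Lambda, \gamma) = (\overline{\lambda}^1_0, \ldots, \overline{\lambda}^M_0, \underline{\lambda}^1_0, \ldots, \underline{\lambda}^M_0, \overline{\gamma}^1_0, \underline{\gamma}^1_0) \in \mathbb{R}^{2M+2}_+ \tag{3.5.26}
\]

be the Lagrange multipliers of the linear programming problem (3.5.25), where overlined multipliers belong to the ask-side constraints and underlined multipliers to the bid-side constraints. We consider the Lagrangian

\[
\begin{aligned}
L((Q, \Gamma), (\Lambda, \gamma)) = {} & E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} c_1\right] + \overline{\gamma}^1_0 \left(h^1_0 - E^Q_0\left[\frac{\Gamma_1}{\Gamma_0}\right]\right) + \underline{\gamma}^1_0 \left(E^Q_0\left[\frac{\Gamma_1}{\Gamma_0}\right] - g^1_0\right) \\
& + \sum_{m=1}^M \overline{\lambda}^m_0 \left(a^m_0 - E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} S^m_1\right]\right) + \sum_{m=1}^M \underline{\lambda}^m_0 \left(E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} S^m_1\right] - b^m_0\right).
\end{aligned} \tag{3.5.27}
\]

We can see that

\[
\inf_{(\Lambda, \gamma) \in \mathbb{R}^{2M+2}_+} L((Q, \Gamma), (\Lambda, \gamma)) =
\begin{cases}
E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} c_1\right] & (Q, \Gamma) \in MD, \\
-\infty & \text{otherwise.}
\end{cases} \tag{3.5.28}
\]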

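Thus, by strong linear programming duality,

\[
\begin{aligned}
u_0 &= \sup_{(Q, \Gamma) \in PM \times \mathbb{R}_+} \inf_{(\Lambda, \gamma) \in \mathbb{R}^{2M+2}_+} L((Q, \Gamma), (\Lambda, \gamma)) \\
&= \inf_{(\Lambda, \gamma) \in \mathbb{R}^{2M+2}_+} \sup_{(Q, \Gamma) \in PM \times \mathbb{R}_+} L((Q, \Gamma), (\Lambda, \gamma)),
\end{aligned} \tag{3.5.29}
\]

where PM signifies the set of all probability measures. For $(\Lambda, \gamma) \in \mathbb{R}^{2M+2}_+$ consider the zero cost portfolio of cash flows

\[
(z_0(\Lambda, \gamma), z_1(\Lambda, \gamma)) = \overline{\gamma}^1_0 \overline{1}^{10} + \underline{\gamma}^1_0 \underline{1}^{10} + \sum_{m=1}^M \left(\overline{\lambda}^m_0 \overline{S}^{m0} + \underline{\lambda}^m_0 \underline{S}^{m0}\right), \tag{3.5.30}
\]

where overlined streams are bought at the ask and underlined streams are sold at the bid at t = 0. We see that

\[
L((Q, \Gamma), (\Lambda, \gamma)) = E^Q_0\left[\frac{\Gamma_1}{\Gamma_0} (c_1 - z_1(\Lambda, \gamma))\right] - z_0(\Lambda, \gamma). \tag{3.5.31}
\]

Note that $z_0(\Lambda, \gamma)$ and $\Gamma_1 / \Gamma_0$ are constants. Moreover, for $u_0$ to be finite in (3.5.29) we need only consider (Λ, γ) that make $c_1 - z_1(\Lambda, \gamma) \le 0$. Let

\[
\hat{\omega} \in \operatorname{argmax}\{c_1 - z_1(\Lambda, \gamma)\}.
\]

In (3.5.29), taking Q to be the probability measure concentrated at $\hat{\omega}$ and letting $\Gamma_1 / \Gamma_0 = g^1_0$ for all ω ∈ Ω, we derive

\[
u_0 = \inf_{(\Lambda, \gamma) \in \mathbb{R}^{2M+2}_+} \left\{g^1_0 \sup_{\omega \in \Omega} \left[c_1(\omega) - z_1(\Lambda, \gamma)(\omega)\right] - z_0(\Lambda, \gamma)\right\}. \tag{3.5.32}
\]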

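We show that a solution $(\hat{\Lambda}, \hat{\gamma})$ to the minimization problem (3.5.32) provides a super-hedging strategy when the bid price $b_0(c)$ for the payoff $c_1$ at t = 0 exceeds $u_0$. We observe that $(b_0(c), -c_1)$ is a zero cost cash flow. To get an arbitrage strategy we acquire the zero cost cash flows (3.5.30) with $(\Lambda, \gamma) = (\hat{\Lambda}, \hat{\gamma})$, the cash flow $(b_0(c), -c_1)$, and $-\sup_{\omega \in \Omega} [c_1(\omega) - z_1(\hat{\Lambda}, \hat{\gamma})(\omega)]$ units of $\underline{1}^{10}$ (the bond sold at its bid price $g^1_0$). We can see that this portfolio's cash flow is, at t = 0,

\[
\begin{aligned}
& z_0(\hat{\Lambda}, \hat{\gamma}) + b_0(c) - g^1_0 \sup_{\omega \in \Omega} \left[c_1(\omega) - z_1(\hat{\Lambda}, \hat{\gamma})(\omega)\right] \\
&\quad = z_0(\hat{\Lambda}, \hat{\gamma}) + b_0(c) - (u_0 + z_0(\hat{\Lambda}, \hat{\gamma})) = b_0(c) - u_0 > 0,
\end{aligned} \tag{3.5.33}
\]

where the first equality uses (3.5.32), and at t = 1,

\[
z_1(\hat{\Lambda}, \hat{\gamma}) - c_1 + \sup_{\omega \in \Omega} \left[c_1(\omega) - z_1(\hat{\Lambda}, \hat{\gamma})(\omega)\right] \ge 0. \tag{3.5.34}
\]

Thus, our portfolio is indeed an arbitrage.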

4

Continuous Financial Models

Summary. We turn to discuss continuous financial models. These models in general involve infinite dimensional spaces and are more complex. Our focus here is to use relatively simple models to illustrate the convex duality between the price of a contingent claim and the process of cash borrowed in delta hedging. This reveals the root of the convexity in contingent claims. Interestingly, when hedging with a contingent claim instead of the underlying, a similar duality in the sense of generalized Fenchel conjugate holds. Correspondingly, this generalized duality leads to the generalized convexity of the contingent claims with many interesting applications. Much of the material presented in this chapter appears here for the first time.

4.1 Continuous Stochastic Processes

4.1.1 Continuous Stochastic Processes

A continuous stochastic process is a generalization of the discrete stochastic processthat we discussed before.

Definition 4.1.1 (Stochastic Process) Let (Ω, F, P) be a probability space and let [0, T] be an interval. We call $(X_t)$, t ∈ [0, T], a stochastic process if for every t, $X_t$ is a random variable on (Ω, F, P).

In financial applications the parameter t is usually time but not always. For example, it could be the so-called local time, when the calendar time is fixed at a point and the parameter t, in fact, reflects the change in the price space. Similar to the discrete case we also need to deal with gradually revealing information.

Definition 4.1.2 (Filtration) Let (Ω, F, P) be a probability space and let [0, T] be an interval. We say $(\mathcal{F}_t)$, t ∈ [0, T], is a filtration if for every t, $\mathcal{F}_t \subset \mathcal{F}$ is a σ-algebra and, for any s < t,

\[
\mathcal{F}_s \subset \mathcal{F}_t.
\]

As in the discrete case, $\mathcal{F}_t$ represents information available up to time t. The definition implicitly assumes that information, once it becomes available, will never be forgotten.

Definition 4.1.3 (Adapted Stochastic Process) Let $(\mathcal{F}_t)$, t ∈ [0, T], be a filtration on the probability space (Ω, F, P). We say a stochastic process $(X_t)$ is $\mathcal{F}_t$-adapted provided that, for every t, $X_t$ is $\mathcal{F}_t$ measurable.

Intuitively, the value $X_s$ of an adapted stochastic process becomes deterministic when the current time t > s.

4.1.2 Brownian Motion and Martingale

Brownian motion is a special continuous stochastic process that plays a crucial role in financial modeling. It is named after the Scottish botanist Robert Brown who in 1828 observed such a motion from pollen suspended in liquid. Louis Bachelier first used it to model the price of financial assets in his 1900 Ph.D. thesis and derived the famous Bachelier formula for option pricing. The mathematical property of Brownian motion was clearly elaborated by Norbert Wiener, who also provided a proof of the existence of a Brownian motion by construction. Paul Samuelson proposed the widely used geometric Brownian motion model for stock price movements in 1965, which is more realistic when modeling assets with nonnegative values. However, the geometric Brownian motion is continuous, so it does not allow any price jump, which does happen to a stock price process from time to time. As the saying goes “All models are wrong. Some are wronger than others.” What we need to keep in mind is that models are approximations of reality. They are not reality.

Definition 4.1.4 (One-dimensional Brownian Motion) A stochastic process $\{B_t : t \in [0, T]\}$ is called a standard Brownian motion if

1. $B_0 = 0$,
2. for $0 \le t_1 < t_2 < \ldots < t_k \le T$, the random variables $B_{t_2} - B_{t_1}, B_{t_3} - B_{t_2}, \ldots, B_{t_k} - B_{t_{k-1}}$ are independent,
3. for 0 ≤ s ≤ t ≤ T, $B_t - B_s$ has a Gaussian distribution with mean 0 and variance t − s,
4. for ω in a set of probability one, the path $B_t(\omega)$ is continuous.
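Conditions 1 to 3 suggest a direct way to approximate a Brownian path on a grid: start at zero and add independent Gaussian increments whose variance equals the time step. The following is a minimal sketch under that assumption, using only Python's standard `random` module; the function name, the grid size and the seed are illustrative choices, and a path on a finite grid is of course only a discrete approximation of the continuous object in condition 4.

```python
import random

def brownian_path(n_steps, horizon, rng):
    """Approximate a standard Brownian path on [0, horizon] by summing
    independent N(0, dt) increments; returns the n_steps + 1 grid values."""
    dt = horizon / n_steps
    path = [0.0]                                  # condition 1: B_0 = 0
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

rng = random.Random(0)                            # fixed seed for reproducibility
end_values = [brownian_path(20, 1.0, rng)[-1] for _ in range(5000)]
mean = sum(end_values) / len(end_values)
var = sum((x - mean) ** 2 for x in end_values) / (len(end_values) - 1)
print(mean, var)                                  # should be near 0 and 1
```

The sample mean and variance of the terminal value should be close to 0 and T = 1, in line with condition 3 applied to s = 0 and t = 1.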

Definition 4.1.5 (Multi-dimensional Brownian Motion) A vector stochastic process $\{B_t : t \in [0, T]\}$ in $\mathbb{R}^n$ is called a standard Brownian motion if $B_t = (B^1_t, B^2_t, \ldots, B^n_t)$ where $B^i_t$, i = 1, 2, ..., n, are independent standard one-dimensional Brownian motions. If $B_t$ is a standard Brownian motion, then $x + B_t$ is called a Brownian motion starting from x.

Remark 4.1.6 The existence of a stochastic process satisfying all the conditions laid out in Definition 4.1.4 is not automatically guaranteed. By and large, there are two ways to prove the existence:

• by construction, pioneered by Wiener (see e.g. [55]), or
• by Kolmogorov’s extension theorem (see e.g. [42]).

For our applications we are satisfied with knowing that Brownian motions exist.

If in a given probability space there is a Brownian motion then one can also define a Brownian motion in a different yet similar probability space. Thus, Brownian motion is not uniquely defined. However, since every Brownian motion has the same properties laid out in Definition 4.1.4, their effects are equivalent. We usually pick a ‘convenient’ version for the purpose of a concrete application.

For each Brownian motion $B_t$, letting $\mathcal{F}_t$ be the σ-algebra that represents the information contained in $B_t$ up to time t, we get a natural filtration associated with $B_t$. In fact, we can take $\mathcal{F}_t$ to be the σ-algebra generated by the collection of preimages of Borel sets under $B_s$, s ≤ t. In the sequel, whenever we discuss a Brownian motion we always assume that it is accompanied by this filtration.

Somewhat more general than a Brownian motion is the martingale process.

Definition 4.1.7 (Martingale) Let $\mathcal{F}_t$ be a filtration for the probability space (Ω, F, P). We say $M_t$ is a $(P, \mathcal{F}_t)$-martingale if $M_t$ is adapted to the filtration $\mathcal{F}_t$, $E|M_t| < \infty$ for all t > 0, and for all s < t,

\[
E^P[M_t \mid \mathcal{F}_s] = M_s.
\]

Similar to the discrete case, a martingale can be thought of as representing the wealth process in playing a fair game. A Brownian motion $B_t$ is clearly a martingale and it is also easy to check that $M_t = B_t^2 - t$ is also a martingale. So a martingale is not necessarily a Brownian motion. However, martingales are only slightly more general than the Brownian motion, as the following Levy’s theorem shows (which we state without proof).

Theorem 4.1.8 (The Levy Characterization of Brownian Motion) Let $X(t) = (X_1(t), \ldots, X_n(t))$ be a continuous stochastic process on (Ω, F, Q). Then X(t) is a Brownian motion with respect to Q if and only if

(i) X(t) is a martingale w.r.t. Q, and
(ii) $X_i(t) X_j(t) - \delta_{ij} t$ is a martingale w.r.t. Q for all i, j = 1, ..., n.

Here $\delta_{ij}$ is the Kronecker delta defined by $\delta_{ij} = 0$ when i ≠ j and $\delta_{ii} = 1$.

For n = 1 we have the characterization of one-dimensional Brownian motion.

Theorem 4.1.9 (The Levy Characterization of Brownian Motion) Let X(t) be a scalar continuous stochastic process on (Ω, F, Q). Then X(t) is a Brownian motion with respect to Q if and only if

(i) X(t) is a martingale w.r.t. Q, and
(ii) $X^2(t) - t$ is a martingale w.r.t. Q.

4.1.3 The Ito Formula

The Ito formula is an important tool in analyzing continuous stochastic processes.

Theorem 4.1.10 (Basic Form of the Ito formula) Let $f(x, t) \in C^{2,1}$ and let $B_t$ be a one dimensional Brownian motion. Then

\[
df(B_t, t) = f_t(B_t, t)\,dt + f_x(B_t, t)\,dB_t + \frac{1}{2} f_{xx}(B_t, t)\,dt. \tag{4.1.1}
\]

The Ito formula presented in (4.1.1) is a shorthand for

\[
f(B_t, t) = f(0, 0) + \int_0^t f_t(B_s, s)\,ds + \int_0^t f_x(B_s, s)\,dB_s + \frac{1}{2} \int_0^t f_{xx}(B_s, s)\,ds. \tag{4.1.2}
\]

Formula (4.1.1) looks like the usual chain rule except for the last term. A rigorous proof is beyond the scope of this short book. Below are some heuristics that can help in understanding the Ito formula.

We know that $f(B_t, t) - f(0, 0) = \int_0^t df(B_s, s)$. Expand $df(B_t, t)$ using Taylor's expansion. Since terms of order o(dt) will vanish in the integration process, we need only do this to the second order. That gives us

\[
\begin{aligned}
df(B_t, t) = {} & f_t(B_t, t)\,dt + f_x(B_t, t)\,dB_t + \frac{1}{2} f_{xx}(B_t, t)\,dB_t^2 \\
& + \frac{1}{2} f_{tt}(B_t, t)\,dt^2 + f_{tx}(B_t, t)\,dt\,dB_t.
\end{aligned}
\]

Since $dt^2$ and $dt\,dB_t$ are o(dt), the last two terms can be omitted and we have

\[
df(B_t, t) = f_t(B_t, t)\,dt + f_x(B_t, t)\,dB_t + \frac{1}{2} f_{xx}(B_t, t)\,dB_t^2.
\]

By the properties of the Brownian motion, we can replace $dB_t^2$ by dt, giving us the Ito formula (4.1.1).

Graphically, we can illustrate this by drawing the graph of $f_x$ around the point $B_t$; then $df(B_t, t)$ is the area under the graph of $f_x$ (see Fig. 4.1). We can see that $f_x(B_t, t)\,dB_t$ represents the approximation of the area using Euler's method while $\frac{1}{2} f_{xx}(B_t, t)\,dB_t^2 \sim \frac{1}{2} f_{xx}\,dt$ corrects the “triangle” part to get to an approximation using the trapezoid rule.

The heuristic argument leads us to the following simple rule, usually called box algebra, for handling the differential terms arising in the Taylor expansion of a function of an Ito process.

\[
\begin{array}{c|cc}
 & dt & dB_t \\ \hline
dt & 0 & 0 \\
dB_t & 0 & dt
\end{array}
\]

Fig. 4.1. Graphic illustration of the Ito formula (horizontal axis B, vertical axis f; the curve $f_x(\cdot, t)$ between $B_t$ and $B_t + dB_t$, with the areas $f_x\,dB_t$ and $\frac{1}{2} f_{xx}(dB_t)^2$ marked)

Example 4.1.11 Below is a nice application illustrating the power of the Ito formula. Define $\beta_k(t) = E B_t^k$. Applying the Ito formula to $f(x) = x^k$ and taking expectations (the $dB_t$ integral has zero mean) gives us

\[
\beta_k(t) = \frac{1}{2} k(k - 1) \int_0^t \beta_{k-2}(s)\,ds.
\]

We can use this to easily get $E B_t^3 = 0$ and $E B_t^4 = 3t^2$. Those are the moments mostly used in financial applications. By induction, in general $E B_t^{2k+1} = 0$ and

\[
E B_t^{2k} = \frac{(2k)!\,t^k}{2^k k!}.
\]
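The induction can be checked mechanically. Writing $\beta_k(t) = c_k t^{k/2}$, the recursion for $\beta_k$ turns into a recursion for the coefficients $c_k$, since integrating $s^{(k-2)/2}$ over [0, t] produces $t^{k/2} / (k/2)$. The sketch below, with a hypothetical helper name, runs this recursion in exact arithmetic starting from $\beta_0 = 1$ and $\beta_1 = 0$ and compares the even coefficients with the closed form above.

```python
from fractions import Fraction
from math import factorial

def moment_coefficient(k):
    """Coefficient c_k in E[B_t^k] = c_k * t^(k/2), obtained from
    beta_k(t) = k(k-1)/2 * int_0^t beta_{k-2}(s) ds, beta_0 = 1, beta_1 = 0."""
    if k == 0:
        return Fraction(1)
    if k == 1:
        return Fraction(0)
    # int_0^t c_{k-2} s^((k-2)/2) ds = c_{k-2} * t^(k/2) / (k/2)
    return Fraction(k * (k - 1), 2) * moment_coefficient(k - 2) / Fraction(k, 2)

for k in range(7):
    print(k, moment_coefficient(k))
```

The odd coefficients vanish and the even ones reproduce 1, 3, 15, ..., that is, $(2k)! / (2^k k!)$.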

Ito Processes

Let $B_t$ be a one-dimensional Brownian motion with respect to the filtration $\mathcal{F}_t$ on (Ω, F, P). Then

\[
X_t = X_0 + \int_0^t \mu(s, \omega)\,ds + \int_0^t \sigma(s, \omega)\,dB_s
\]

is called a (1-dim) Ito process if μ, σ are $\mathcal{F}_t$ adapted,

\[
P\left[\int_0^t \sigma(s, \omega)^2\,ds < \infty \text{ for all } t \ge 0\right] = 1
\]

and

\[
P\left[\int_0^t |\mu(s, \omega)|\,ds < \infty \text{ for all } t \ge 0\right] = 1.
\]

In shorthand we write

\[
dX_t = \mu\,dt + \sigma\,dB_t.
\]

Here μ is a drift and σ indicates the magnitude of the variation of the random part. It is often useful to write a stochastic process in this form if we can. A Brownian motion is an example of an Ito process where μ = 0 and σ = 1. The Ito formula can be generalized to Ito processes with $dX_t$ replacing $dB_t$.

Theorem 4.1.12 (The General Ito formula) Let $f(t, x) \in C^2$ and let $X_t$ be an Ito process. Then

\[
df(X_t, t) = f_t(X_t, t)\,dt + f_x(X_t, t)\,dX_t + \frac{1}{2} f_{xx}(X_t, t)\,(dX_t)^2.
\]

Example 4.1.13 Applying the Ito formula to $f(x) = x^2$ we have

\[
\int_0^t B_s\,dB_s = \frac{1}{2}(B_t^2 - t).
\]
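The identity in Example 4.1.13 has a transparent discrete counterpart. For any grid, the left endpoint sum $\sum B_{t_i}(B_{t_{i+1}} - B_{t_i})$ equals $\frac{1}{2}(B_t^2 - \sum (B_{t_{i+1}} - B_{t_i})^2)$ by elementary algebra, and the sum of squared increments concentrates around t as the grid is refined. The sketch below checks both facts on one simulated path; the step count and the seed are arbitrary choices, and the Gaussian increments are an assumed discretization of the Brownian motion.

```python
import random

rng = random.Random(1)          # fixed seed for reproducibility
n, t = 100_000, 1.0
dt = t / n

b = 0.0                         # current value of the path
ito_sum = 0.0                   # left endpoint approximation of int B dB
quad_var = 0.0                  # sum of squared increments
for _ in range(n):
    db = rng.gauss(0.0, dt ** 0.5)
    ito_sum += b * db           # integrand evaluated before the increment
    quad_var += db * db
    b += db

print(ito_sum, 0.5 * (b * b - t), quad_var)
```

Using the left endpoint is essential here: evaluating the integrand at the right endpoint would change the sign of the correction term.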

Example 4.1.14 (Integration by Parts) The pattern in handling $f(x) = x^2$ holds in a more general setting. Let g(s) be a continuous function with bounded variation with respect to s ∈ [0, t]. Applying the Ito formula to $f(t, x) = g(t) x$ we have

\[
\int_0^t g(s)\,dB_s = g(t) B_t - \int_0^t g'(s) B_s\,ds.
\]

Example 4.1.15 Here is an example of using the general Ito formula. Let $X_t = \mu t + \sigma B_t$. Then $dX_t = \mu\,dt + \sigma\,dB_t$. Using the box algebra we have

\[
\begin{aligned}
df(X_t, t) &= f_t\,dt + f_x\,dX_t + \frac{1}{2} f_{xx}\,dX_t^2 \\
&= f_t\,dt + \mu f_x\,dt + \sigma f_x\,dB_t + \frac{1}{2} \sigma^2 f_{xx}\,dt.
\end{aligned}
\]

Example 4.1.16 Letting $f(t, x) = t x$ we have

\[
t B_t = \int_0^t B_s\,ds + \int_0^t s\,dB_s
\]

or

\[
\int_0^t s\,dB_s = t B_t - \int_0^t B_s\,ds.
\]

The Multidimensional Ito Formula

Let $X_t = (X^1_t, \ldots, X^n_t)$ be an n-dimensional Ito process satisfying

\[
dX_t = \mu\,dt + \sigma\,dB_t,
\]

where μ is an n-dimensional vector, σ an n × m matrix and $B_t$ an m-dimensional Brownian motion. We require that the components of μ and σ satisfy conditions similar to those in the definition of the one-dimensional Ito process. Let $g(t, x) : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^p$ have continuous second order partial derivatives. Then, for $Y_t = g(t, X_t)$,

\[
dY^k_t = \frac{\partial g_k}{\partial t}\,dt + \sum_{i=1}^n \frac{\partial g_k}{\partial x_i}\,dX^i_t + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 g_k}{\partial x_i \partial x_j}\,dX^i_t\,dX^j_t. \tag{4.1.3}
\]

The following multi-dimensional box algebra is a convenient tool in simplifying the multi-dimensional Ito formula.

\[
\begin{array}{c|ccccc}
 & dt & dB^1_t & dB^2_t & \ldots & dB^m_t \\ \hline
dt & 0 & 0 & 0 & \ldots & 0 \\
dB^1_t & 0 & dt & 0 & \ldots & 0 \\
dB^2_t & 0 & 0 & dt & \ldots & 0 \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
dB^m_t & 0 & 0 & 0 & \ldots & dt
\end{array}
\]

Example 4.1.17 (Integration by Parts) Let $X_t, Y_t$ be Ito processes in $\mathbb{R}$. Applying the Ito formula to $f(X_t, Y_t) = X_t Y_t$ we have

\[
d(X_t Y_t) = X_t\,dY_t + Y_t\,dX_t + dX_t\,dY_t.
\]

The integral form in the following is the general integration by parts formula

\[
\int_0^t X_s\,dY_s = X_t Y_t - X_0 Y_0 - \int_0^t Y_s\,dX_s - \int_0^t dX_s\,dY_s.
\]

Remark 4.1.18 The term $dX_t\,dY_t$ is called the quadratic covariation of $X_t$ and $Y_t$ and is often denoted $d\langle X, Y \rangle_t$.

Martingale Representation

The Ito formula is a crucial tool in proving the following important martingale representation theorem. This representation theorem further highlights the close relationship between martingales and Brownian motions. Since this is an application oriented class, we omit the proof and directly present the result.


Theorem 4.1.19 (Martingale Representation) Let $B_t$ be an n-dimensional Brownian motion generating the filtration $\mathcal{F}_t^n$. Suppose that $M_t$ is a $(P, \mathcal{F}_t^n)$-martingale and that $M_t \in L^2(P)$ for all $t \ge 0$. Then there exists a unique stochastic process $v \in \mathcal{V}^n$ such that

$$M_t = E M_0 + \int_0^t v\,dB_s.$$

Dual Ito Formula

Let $f(x, t) \in C^{3,1}$ and let $X_t$ be an Ito process. Then using the quadratic covariation in Remark 4.1.18 we can write the general Ito formula in Theorem 4.1.12 as

$$df(X_t, t) = f_t(X_t, t)\,dt + f_x(X_t, t)\,dX_t + \frac{1}{2}\,d\langle f_x(X, t), X\rangle_t. \tag{4.1.4}$$

Now assume that f is convex in x for all t. We use $f^*(y, t)$ to signify the conjugate of f with respect to the variable x. Define $Y_t = f_x(X_t, t)$. We see that $X_t$, $Y_t$ satisfy the Fenchel equality

f(Xt, t) + f∗(Yt, t) = XtYt. (4.1.5)

It follows that

ft(Xt, t) + f∗t (Yt, t) = 0, (4.1.6)

Yt = fx(Xt, t) and Xt = f∗y (Yt, t), (4.1.7)

and using Example 4.1.17

df(Xt, t) + df∗(Yt, t) = XtdYt + YtdXt + d⟨X,Y ⟩t. (4.1.8)

Combining (4.1.4), (4.1.6) and (4.1.8) we derive the following Dual Ito formula

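$$\begin{aligned}
df(X_t, t) &= f_t(X_t, t)\,dt + Y_t\,dX_t + \frac{1}{2}\,d\langle Y, X\rangle_t, \\
df^*(Y_t, t) &= f^*_t(Y_t, t)\,dt + X_t\,dY_t + \frac{1}{2}\,d\langle X, Y\rangle_t.
\end{aligned} \tag{4.1.9}$$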

4.1.4 Girsanov Theorem

In financial applications, prices of stocks and other assets are often described by an Ito process of the form

dSt = µdt + σdBt

where µ models a drift reflecting the large trend of the asset price and σ describes the volatility of the random fluctuation of the price process. In analyzing the price process, the important part is the impact of σ. The Girsanov theorem allows us to ‘absorb’ the drift µ by using a change of the probability measure. This is very similar to the equivalent martingale measure that absorbs the excess gains for the risky assets in the discrete model.


Theorem 4.1.20 (Girsanov Theorem) Let $S_t$ be an Ito process of the form

$$dS_t = \mu(t, \omega)\,dt + \sigma(t, \omega)\,dB_t, \quad t \in [0, T], \quad S_0 = 0,$$

where $B_t$ is a $(P, \mathcal{F}_t)$-Brownian motion and µ, σ are bounded and σ > c > 0 for some constant c. Then

1. for $u = \mu/\sigma$, $M_t = \exp\left(-\int_0^t u(s, \omega)\,dB_s - \frac{1}{2}\int_0^t u^2(s, \omega)\,ds\right)$, $t \in [0, T]$, is a $(P, \mathcal{F}_t)$-martingale;
2. $dQ(\omega) = M_T(\omega)\,dP(\omega)$ is a probability measure on $\mathcal{F}_T$;
3. $\tilde{B}(t) = \int_0^t u(s, \omega)\,ds + B(t)$ is a Brownian motion w.r.t. Q; and
4. $dS_t = \sigma(t, \omega)\,d\tilde{B}(t)$.

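Proof. (Sketch) Letting $X_t = \int_0^t u(s, \omega)\,dB_s + \frac{1}{2}\int_0^t u^2(s, \omega)\,ds$ we have

$$dX_t = u\,dB_t + \frac{1}{2}u^2\,dt.$$

Since $M_t = \exp(-X_t)$, by direct calculation we have

$$dM_t = -u\exp(-X_t)\,dB_t$$

and, therefore, $M_t$ is a martingale by the martingale representation theorem. Since $M_t$ is a martingale and $M_0 = 1$,

$$Q(\Omega) = E^Q[1] = E^P[M_T] = 1.$$

Thus, Q is a probability measure on $\mathcal{F}_T$. We note that $dQ = M_t\,dP$ on $\mathcal{F}_t$. In fact, for any bounded $\mathcal{F}_t$-measurable function f,

$$\begin{aligned}
\int_\Omega f\,dQ &= \int_\Omega fM_T\,dP = E[fM_T] = E[E[fM_T \mid \mathcal{F}_t]] \\
&= E[fE[M_T \mid \mathcal{F}_t]] = E[fM_t] = \int_\Omega fM_t\,dP.
\end{aligned}$$

To show that $\tilde{B}_t = \int_0^t u\,ds + B_t$ is a Brownian motion under Q, we check the conditions in the Levy characterization of Theorem 4.1.8. We verify only (i); the proof of (ii) is similar. Using the product rule we can verify that $M_t\tilde{B}_t$ is a martingale with respect to P. Now for s < t and $A \in \mathcal{F}_s$ we have

$$\begin{aligned}
\int_A E^Q[\tilde{B}_t \mid \mathcal{F}_s]\,dQ &= \int_A \tilde{B}_t\,dQ = \int_A \tilde{B}_tM_t\,dP = E^P[1_AM_t\tilde{B}_t] \\
&= E^P[E^P[1_AM_t\tilde{B}_t \mid \mathcal{F}_s]] = E^P[1_AM_s\tilde{B}_s] \\
&= \int_A \tilde{B}_sM_s\,dP = \int_A \tilde{B}_s\,dQ.
\end{aligned}$$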


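Since $A \in \mathcal{F}_s$ is arbitrary, $E^Q[\tilde{B}_t \mid \mathcal{F}_s] = \tilde{B}_s$, where $\tilde{B}_t = \int_0^t u\,ds + B_t$ is the shifted process. •

Measure Q is called the martingale measure for the process St.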
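The two key facts in the proof can be illustrated by Monte Carlo in the simplest case of a constant ratio u = µ/σ. This is only a sketch under that assumption; the function name `girsanov_check`, the sample size and the seed are hypothetical choices, not from the text. It estimates $E^P[M_T]$, which should be close to 1, and $E^P[M_T(uT + B_T)]$, which is the Q-mean of the shifted process at time T and should be close to 0.

```python
import math
import random


def girsanov_check(u=0.5, T=1.0, n_samples=200_000, seed=1):
    """Monte Carlo estimates of E_P[M_T] and E_P[M_T * (u*T + B_T)] for constant u."""
    rng = random.Random(seed)
    sum_m = 0.0
    sum_m_shifted = 0.0
    for _ in range(n_samples):
        b_T = rng.gauss(0.0, math.sqrt(T))          # B_T sampled under P
        m_T = math.exp(-u * b_T - 0.5 * u * u * T)  # density dQ/dP on F_T
        sum_m += m_T
        sum_m_shifted += m_T * (u * T + b_T)        # shifted process at time T
    return sum_m / n_samples, sum_m_shifted / n_samples


mean_density, q_mean_shifted = girsanov_check()
print(mean_density, q_mean_shifted)
```

Weighting by $M_T$ is exactly the change of measure: an expectation under Q is computed as a P-expectation of the integrand times the density.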

4.2 Bachelier and Black-Scholes Formulae

4.2.1 Pricing Contingent Claims

Let St be an Ito process

dSt = µ(St, t)dt+ σ(St, t)dBt

that represents the price process of a certain financial asset. Here Bt is a Brownian motion in a probability measure space (Ω, F, P) with filtration Ft. Assume for simplicity that the risk free interest rate is 0 and that µ, σ are bounded and σ ≥ c > 0 for some constant c. Suppose that we want to price a European style contingent claim on St with the payoff f(ST) at the maturity T. We can proceed as follows. First, using the Girsanov theorem we can write

dSt = σ(St, t)dWt

where Wt is a Brownian motion in (Ω, F, Q) with filtration Ft and Q is a martingale measure for St equivalent to P. Similar to the discrete version of the fundamental theorem of asset pricing, we can write down the no arbitrage price function for the contingent claim at any time t ∈ [0, T] and price x as

v(x, t) = EQ[f(ST)|St = x]. (4.2.1)

Next we explicitly calculate the price function for call options under the Bachelier and Black-Scholes models.

Bachelier Formula

Bachelier modeled the price of a stock in his 1900 pioneering paper [3] by

dSt = µdt+ σdBt

where µ and σ are constant. This model was thought unrealistic because a stock price cannot become negative. However, now we can see it as a good approximation for pair trading or for forwards on currency swap contracts. Consider the price of a call option with a strike K maturing at T. Then formula (4.2.1) reduces to

$$B(x, t) = E^Q[(S_T - K)^+ \mid S_t = x], \tag{4.2.2}$$

where Q is an equivalent martingale measure with respect to the price process St. Since under Q the dynamics of the price process is

$$dS_t = \sigma\,dW_t$$

where Wt is a Q Brownian motion, we have

$$S_T = x + \sqrt{T - t}\,\sigma W_1,$$


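where $W_1 \sim N(0, 1)$. Thus,

$$\begin{aligned}
B(x, t) &= E^Q[(x + \sqrt{T - t}\,\sigma W_1 - K)^+] \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (x - K + \sqrt{T - t}\,\sigma y)^+ e^{-y^2/2}\,dy \\
&= \frac{1}{\sqrt{2\pi}}\int_{\frac{K - x}{\sigma\sqrt{T - t}}}^{\infty} (x - K + \sqrt{T - t}\,\sigma y)\,e^{-y^2/2}\,dy \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x - K}{\sigma\sqrt{T - t}}} (x - K - \sqrt{T - t}\,\sigma z)\,e^{-z^2/2}\,dz \quad (z = -y).
\end{aligned} \tag{4.2.3}$$

We can write (4.2.3) concisely as

$$B(x, t) = (x - K)\,N\!\left(\frac{x - K}{\sigma\sqrt{T - t}}\right) + \sigma\sqrt{T - t}\,N'\!\left(\frac{x - K}{\sigma\sqrt{T - t}}\right), \tag{4.2.4}$$

where

$$N(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-z^2/2}\,dz.$$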
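As a sanity check on (4.2.4), the closed form can be compared with a direct quadrature of the expectation in (4.2.3). The sketch below uses only the Python standard library; the function names and the trapezoid-rule parameters are illustrative assumptions.

```python
import math


def norm_pdf(z):
    """Standard normal density N'(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function N(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bachelier_call(x, K, sigma, tau):
    """Closed form (4.2.4) with tau = T - t and zero interest rate."""
    s = sigma * math.sqrt(tau)
    d = (x - K) / s
    return (x - K) * norm_cdf(d) + s * norm_pdf(d)


def bachelier_call_quadrature(x, K, sigma, tau, n=40_000, y_max=10.0):
    """Trapezoid-rule approximation of E[(x + sigma*sqrt(tau)*W_1 - K)^+]."""
    s = sigma * math.sqrt(tau)
    h = 2.0 * y_max / n
    total = 0.0
    for i in range(n + 1):
        y = -y_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * max(x - K + s * y, 0.0) * norm_pdf(y)
    return total * h


print(bachelier_call(100.0, 100.0, 20.0, 1.0))
print(bachelier_call_quadrature(100.0, 100.0, 20.0, 1.0))
```

At the money (x = K) the formula reduces to $\sigma\sqrt{T-t}/\sqrt{2\pi}$, which gives a convenient spot check.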

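Black-Scholes Formula

Black and Scholes modeled the price of a stock as a geometric Brownian motion

$$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$$

where µ and σ are constant. Consider the price of a call option with a strike K maturing at T. Again formula (4.2.1) reduces to

$$C(x, t) = E^Q[(S_T - K)^+ \mid S_t = x], \tag{4.2.5}$$

where Q is an equivalent martingale measure with respect to the price process St. Now under Q the dynamics of the price process is

$$dS_t = \sigma S_t\,dW_t$$

where Wt is a Q Brownian motion. We have

$$S_T = x\exp\left(-\frac{\sigma^2(T - t)}{2} + \sqrt{T - t}\,\sigma W_1\right), \tag{4.2.6}$$

where $W_1 \sim N(0, 1)$. Thus,

$$\begin{aligned}
C(x, t) &= E^Q\left[\left(x\exp\left(-\frac{\sigma^2(T - t)}{2} + \sqrt{T - t}\,\sigma W_1\right) - K\right)^+\right] \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left(x\exp\left(-\frac{\sigma^2(T - t)}{2} + \sqrt{T - t}\,\sigma y\right) - K\right)^+ e^{-y^2/2}\,dy \\
&= \frac{1}{\sqrt{2\pi}}\int_{\frac{\ln(K/x) + \sigma^2(T - t)/2}{\sigma\sqrt{T - t}}}^{\infty}\left(x\exp\left(-\frac{\sigma^2(T - t)}{2} + \sqrt{T - t}\,\sigma y\right) - K\right)e^{-y^2/2}\,dy
\end{aligned} \tag{4.2.7}$$

which can be represented as

$$C(x, t) = xN(d_+) - KN(d_-), \tag{4.2.8}$$

where

$$d_\pm = \frac{\ln\left(\frac{x}{K}\right) \pm \frac{\sigma^2(T - t)}{2}}{\sigma\sqrt{T - t}}.$$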
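The same kind of check works for (4.2.8). The sketch below, under the text's zero interest rate assumption, evaluates the closed form and compares it with a trapezoid-rule approximation of the expectation in (4.2.7). Function names and quadrature parameters are illustrative assumptions.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, K, sigma, tau):
    """Closed form (4.2.8) with tau = T - t and zero interest rate."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + 0.5 * s * s) / s
    d_minus = d_plus - s
    return x * norm_cdf(d_plus) - K * norm_cdf(d_minus)


def bs_call_quadrature(x, K, sigma, tau, n=48_000, y_max=12.0):
    """Trapezoid-rule approximation of the expectation in (4.2.7)."""
    s = sigma * math.sqrt(tau)
    h = 2.0 * y_max / n
    total = 0.0
    for i in range(n + 1):
        y = -y_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        payoff = max(x * math.exp(-0.5 * s * s + s * y) - K, 0.0)
        density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        total += weight * payoff * density
    return total * h


print(bs_call(100.0, 100.0, 0.2, 1.0))             # about 7.9656
print(bs_call_quadrature(100.0, 100.0, 0.2, 1.0))
```

For x = K the closed form collapses to $x[N(\sigma\sqrt{\tau}/2) - N(-\sigma\sqrt{\tau}/2)]$, the at-the-money expression that reappears in the convexity discussion.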


4.2.2 Convexity

Convexity and generalized convexity play important roles in dealing with option pricing and hedging. Both the Bachelier and Black-Scholes formulae involve interesting convexity with respect to their various parameters.

We start with the Bachelier formula and use $I = \sqrt{T - t}\,\sigma$ and the forward price $X = x - K$ to simplify notation. We will also use their ratio, the moneyness $m = X/I$. Using these new variables we can write the Bachelier formula (4.2.3) as

$$B(X, I) = E^Q[(X + IW_1)^+] = XN\!\left(\frac{X}{I}\right) + IN'\!\left(\frac{X}{I}\right). \tag{4.2.9}$$

Since for any fixed w, $(X + Iw)^+$ is a sublinear function of (X, I), so is B(X, I). In particular B is positively homogeneous of degree one, so Euler's identity gives the representation

$$B(X, I) = XB_X + IB_I. \tag{4.2.10}$$

Comparing with (4.2.9) we see that

$$B_X = N\!\left(\frac{X}{I}\right) \quad\text{and}\quad B_I = N'\!\left(\frac{X}{I}\right). \tag{4.2.11}$$

We see that the sublinear property of the Bachelier formula brings us much convenience in calculating $B_X$ and $B_I$.
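The relations (4.2.10) and (4.2.11) are easy to confirm numerically with central finite differences. This is a small illustrative sketch; the function names and the step size `h` are assumptions, not part of the text.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bachelier_xi(X, I):
    """Bachelier price (4.2.9) in the variables X = x - K and I = sqrt(T - t) * sigma."""
    m = X / I  # moneyness
    return X * norm_cdf(m) + I * norm_pdf(m)


def bachelier_partials_fd(X, I, h=1e-5):
    """Central finite-difference approximations of B_X and B_I."""
    b_x = (bachelier_xi(X + h, I) - bachelier_xi(X - h, I)) / (2.0 * h)
    b_i = (bachelier_xi(X, I + h) - bachelier_xi(X, I - h)) / (2.0 * h)
    return b_x, b_i


X, I = 3.0, 7.0
b_x, b_i = bachelier_partials_fd(X, I)
print(b_x, norm_cdf(X / I))                  # B_X against N(X/I)
print(b_i, norm_pdf(X / I))                  # B_I against N'(X/I)
print(X * b_x + I * b_i, bachelier_xi(X, I)) # Euler identity (4.2.10)
```

Positive homogeneity can be checked the same way: scaling both X and I by a factor λ > 0 scales the price by λ.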

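The sublinearity of B also means that its conjugate is the indicator function of some convex set M and we have the representation

$$B = \sigma_M \quad\text{and}\quad B^* = \iota_M.$$

By the definition of the conjugate function we can calculate that

$$\begin{aligned}
M &= \{(X^*, I^*) : I^* + mX^* \le mN(m) + N'(m) \text{ for all } m \in \mathbb{R}\} \\
&= \{(N(m), I^*) : I^* \le N'(m),\ m \in \mathbb{R}\}.
\end{aligned} \tag{4.2.12}$$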

We now turn to the Black-Scholes formula. First, direct calculation verifies

$$\frac{\partial C(x, t)}{\partial x} = N(d_+). \tag{4.2.13}$$

We observe that the variable x appears in the expression of C(x, t) in three separate places. Yet curiously the partial derivative with respect to x contains only the contribution from the term that is linear in x. This is rather similar to the simple formula for $B_X$ in (4.2.11). In the next section we will show that the reason is related to the convexity of C in x and that the Fenchel-Legendre transform of C in x is related to delta hedging. It is natural to ask whether C is also convex with respect to σ. It turns out the answer is negative. Yet if we compensate C by a multiple of an at-the-money call it becomes convex.

We start by calculating the partial derivative of C with respect to σ:

$$C_\sigma = xN'(d_+)\frac{\partial d_+}{\partial \sigma} - KN'(d_-)\frac{\partial d_-}{\partial \sigma}. \tag{4.2.14}$$

Writing $\tau = T - t$ and observing that

$$xN'(d_+) = KN'(d_-) = \sqrt{\frac{xK}{2\pi}}\exp\left(-\frac{(\ln(x/K))^2}{2\tau\sigma^2} - \frac{\tau\sigma^2}{8}\right) \tag{4.2.15}$$

and

$$d_+ - d_- = \sigma\sqrt{\tau} \tag{4.2.16}$$

we can simplify the expression of $C_\sigma$ to

$$C_\sigma = \sqrt{\frac{xK\tau}{2\pi}}\exp\left(-\frac{(\ln(x/K))^2}{2\tau\sigma^2} - \frac{\tau\sigma^2}{8}\right). \tag{4.2.17}$$

It follows that

$$C_{\sigma\sigma} = \sqrt{\frac{xK\tau}{2\pi}}\exp\left(-\frac{(\ln(x/K))^2}{2\tau\sigma^2} - \frac{\tau\sigma^2}{8}\right)\left(\frac{(\ln(x/K))^2}{\tau\sigma^3} - \frac{\tau\sigma}{4}\right). \tag{4.2.18}$$

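Defining

$$f(\sigma) := C - \sqrt{xK}\left[N\!\left(\frac{\sqrt{\tau}\sigma}{2}\right) - N\!\left(-\frac{\sqrt{\tau}\sigma}{2}\right)\right]$$

(note that the expression inside the square bracket is the percentage premium of an at-the-money call option) we have

$$\begin{aligned}
f''(\sigma) &= C_{\sigma\sigma} + \sqrt{xK\tau}\,\frac{\tau\sigma}{4}\,N'\!\left(\frac{\sqrt{\tau}\sigma}{2}\right) \\
&= \sqrt{xK\tau}\,N'\!\left(\frac{\sqrt{\tau}\sigma}{2}\right)\exp\left(-\frac{(\ln(x/K))^2}{2\tau\sigma^2}\right)\frac{(\ln(x/K))^2}{\tau\sigma^3} \\
&\quad + \sqrt{xK\tau}\,\frac{\tau\sigma}{4}\,N'\!\left(\frac{\sqrt{\tau}\sigma}{2}\right)\left(1 - \exp\left(-\frac{(\ln(x/K))^2}{2\tau\sigma^2}\right)\right) \ge 0.
\end{aligned} \tag{4.2.19}$$

We note that

$$N\!\left(\frac{\sqrt{\tau}\sigma}{2}\right) - N\!\left(-\frac{\sqrt{\tau}\sigma}{2}\right)$$

is the price of an at-the-money call. Thus, the Black-Scholes call price C compensated by a multiple $(-\sqrt{x/K})$ of an at-the-money call is convex as a function of σ. We can also phrase this in terms of generalized convexity. Note that f is convex and, therefore, can be supported from below by an affine function. Thus, the Black-Scholes call price C as a function of σ can be supported from below by a function of the form

$$\sqrt{xK}\left[N\!\left(\frac{\sqrt{\tau}\sigma}{2}\right) - N\!\left(-\frac{\sqrt{\tau}\sigma}{2}\right)\right] + y\sigma - b.$$

Define

$$c(\sigma, y) = \sqrt{xK}\left[N\!\left(\frac{\sqrt{\tau}\sigma}{2}\right) - N\!\left(-\frac{\sqrt{\tau}\sigma}{2}\right)\right] + y\sigma.$$

Then the Black-Scholes call price C as a function of σ is $\Phi_{c(1)}$-convex using the notation in Section 1.5.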
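The statements about the dependence on σ can be probed numerically. The sketch below (illustrative names, finite-difference step sizes chosen by assumption) compares the closed-form derivative (4.2.17) with a central difference, shows that C itself is not convex in σ at the money, and checks that the compensated function f has nonnegative second differences on a grid for an out-of-the-money example.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, K, sigma, tau):
    """Black-Scholes call price (4.2.8) with zero interest rate."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + 0.5 * s * s) / s
    return x * norm_cdf(d_plus) - K * norm_cdf(d_plus - s)


def vega_closed_form(x, K, sigma, tau):
    """Formula (4.2.17) for the derivative of C with respect to sigma."""
    log_m = math.log(x / K)
    exponent = -log_m ** 2 / (2.0 * tau * sigma ** 2) - tau * sigma ** 2 / 8.0
    return math.sqrt(x * K * tau / (2.0 * math.pi)) * math.exp(exponent)


def compensated_call(x, K, sigma, tau):
    """f(sigma) = C - sqrt(xK) [N(sqrt(tau) sigma / 2) - N(-sqrt(tau) sigma / 2)]."""
    a = 0.5 * math.sqrt(tau) * sigma
    return bs_call(x, K, sigma, tau) - math.sqrt(x * K) * (norm_cdf(a) - norm_cdf(-a))


def second_difference(g, sigma, h=1e-3):
    """Central second difference, approximating g''(sigma)."""
    return (g(sigma + h) - 2.0 * g(sigma) + g(sigma - h)) / (h * h)


x, K, tau = 100.0, 120.0, 1.0
vega_fd = (bs_call(x, K, 0.3 + 1e-5, tau) - bs_call(x, K, 0.3 - 1e-5, tau)) / 2e-5
print(vega_fd, vega_closed_form(x, K, 0.3, tau))

# At the money, C is concave in sigma: the second difference is negative.
atm_curvature = second_difference(lambda s: bs_call(100.0, 100.0, s, tau), 0.5)
# The compensated function f has nonnegative curvature on the grid below.
grid = [0.05 + 0.05 * i for i in range(30)]
f_curvatures = [second_difference(lambda s: compensated_call(x, K, s, tau), s) for s in grid]
print(atm_curvature, min(f_curvatures))
```

Note that for x = K the function f vanishes identically, which is consistent with the bracket being the at-the-money percentage premium.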


4.2.3 Duality

We now explore why the Black-Scholes call formula C has such a simple derivative with respect to x. To understand this phenomenon we need to go back to the original derivation of the Black-Scholes formula in [5]. Black and Scholes derive formula (4.2.8) by considering a portfolio of Nt shares of the underlying to hedge a short position of one share of the European call option:

StNt − C(St, t). (4.2.20)

They want to choose Nt in such a way that the resulting portfolio (4.2.20) has riskless gains, that is

NtdSt − dC(St, t) = 0. (4.2.21)

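Using the Ito formula and $dS_t = \sigma S_t\,dW_t$ we have

$$N_t\,dS_t = \frac{\partial C}{\partial x}\,dS_t + \left(\frac{\partial C}{\partial t} + \frac{\sigma^2S_t^2}{2}\frac{\partial^2 C}{\partial x^2}\right)dt. \tag{4.2.22}$$

It follows that

$$N_t = \frac{\partial C}{\partial x} \tag{4.2.23}$$

and C must satisfy the Black-Scholes partial differential equation

$$\frac{\partial C}{\partial t} + \frac{\sigma^2x^2}{2}\frac{\partial^2 C}{\partial x^2} = 0, \tag{4.2.24}$$

with terminal condition

$$C(x, T) = (x - K)^+. \tag{4.2.25}$$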

The Black-Scholes partial differential equation (4.2.24) with the terminal condition (4.2.25) provides an alternative derivation of the Black-Scholes formula (4.2.8) via the Feynman-Kac formula.

Relationships (4.2.20) and (4.2.23) reveal that when portfolio (4.2.20) has riskless gains its value equals the Fenchel-Legendre transform of the no arbitrage option price. Since Merton has shown that the Black-Scholes option price C(St, t) is convex in St, we have the following duality:

$$C^*(N_t, t) = \sup_{S_t}\,[N_tS_t - C(S_t, t)], \tag{4.2.26}$$

and

$$C(S_t, t) = \sup_{N_t}\,[N_tS_t - C^*(N_t, t)], \tag{4.2.27}$$

where the conjugate operation is with respect to the first variable. These relationships reveal that for each fixed t the option value is a convex function of the stock price and the cash borrowed $C^*(N_t, t)$ is a convex function of the number of shares of the stock in the hedging portfolio. The same relationship also holds for the Bachelier formula. Thus, the simple form of the partial derivative of C in (4.2.13) is a consequence of the Fenchel-Young equality in Proposition 1.3.1. This duality argument also explains the simplicity of $B_X$, but as mentioned before $B_X$ can be derived more directly using the sublinear property of the Bachelier formula B.


4.3 Duality and Delta Hedging

The duality relationship in delta hedging observed in the previous section for the Bachelier and Black-Scholes formulae also holds in a more general setting.

4.3.1 Delta Hedging

We consider a diffusion process St satisfying

dSt = σStdWt, (4.3.1)

where Wt is a standard Brownian motion under measure Q (so that Q is a martingale measure for St). We assume that the risk free rate is 0. Consider a contingent claim on St of European style with maturity at T > 0 and a terminal payoff f(ST) at t = T. Denote the price of the European contingent claim at time t by v(St, t). We use a portfolio of Nt shares of the underlying St to hedge a short position of one share of the European contingent claim:

StNt − v(St, t). (4.3.2)

The gain of this portfolio is

NtdSt − dv(St, t). (4.3.3)

Applying the Ito formula we can rewrite (4.3.3) as

$$N_t\,dS_t - \left[\left(v_t + \frac{\sigma^2x^2}{2}v_{xx}\right)dt + v_x\,\sigma S_t\,dW_t\right],$$

where the partial derivatives of v are evaluated at $x = S_t$. To ensure a riskless gain we need

$$N_t = v_x(S_t, t). \tag{4.3.4}$$

Then the gain in the portfolio reduces to

$$-\left(v_t + \frac{\sigma^2x^2}{2}v_{xx}\right)dt.$$

Now no arbitrage requires this quantity to be 0. Thus, v must satisfy the Black-Scholes PDE

$$v_t + \frac{\sigma^2x^2}{2}v_{xx} = 0 \tag{4.3.5}$$

with terminal condition

$$v(x, T) = f(x), \tag{4.3.6}$$

where f is the payoff of the target at T.
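The delta hedging argument can be illustrated by a discrete-time simulation. The sketch below is only an illustration under assumptions not in the text: the claim is a European call, rebalancing happens at equally spaced dates, and the path count, step count and seed are arbitrary. The trader sells the call, holds $N_t = v_x(S_t, t)$ shares, and finances the position with a cash account (the negative of the cash borrowed $S_tN_t - v$).

```python
import math
import random


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, K, sigma, tau):
    """Black-Scholes call price with zero interest rate, tau = T - t > 0."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + 0.5 * s * s) / s
    return x * norm_cdf(d_plus) - K * norm_cdf(d_plus - s)


def bs_delta(x, K, sigma, tau):
    """Hedge ratio N_t = v_x = N(d_+), as in (4.3.4) and (4.2.13)."""
    s = sigma * math.sqrt(tau)
    return norm_cdf((math.log(x / K) + 0.5 * s * s) / s)


def hedging_errors(x0=100.0, K=100.0, sigma=0.2, T=1.0,
                   n_steps=1000, n_paths=100, seed=2):
    """Terminal value of (cash + shares - payoff) on each simulated path."""
    rng = random.Random(seed)
    dt = T / n_steps
    errors = []
    for _ in range(n_paths):
        s = x0
        shares = bs_delta(s, K, sigma, T)
        cash = bs_call(s, K, sigma, T) - shares * s  # premium received minus cost of shares
        for i in range(1, n_steps + 1):
            z = rng.gauss(0.0, 1.0)
            # exact lognormal step for dS = sigma * S dW
            s *= math.exp(-0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * z)
            if i < n_steps:
                new_shares = bs_delta(s, K, sigma, T - i * dt)
                cash -= (new_shares - shares) * s    # self-financing rebalance
                shares = new_shares
        errors.append(cash + shares * s - max(s - K, 0.0))
    return errors


errors = hedging_errors()
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
mean_error = sum(errors) / len(errors)
print(rms_error, mean_error, bs_call(100.0, 100.0, 0.2, 1.0))
```

The residual error comes only from rebalancing at discrete dates; it shrinks as the number of steps grows, which is the discrete counterpart of the riskless gain condition.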


4.3.2 Duality

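Using (4.2.1) we know that

$$\begin{aligned}
v(x, t) &= E^Q[f(S_T) \mid S_t = x] \\
&= E^Q\left[f\left(x\exp\left(-\frac{\sigma^2}{2}(T - t) + \sqrt{T - t}\,\sigma W_1\right)\right)\right],
\end{aligned} \tag{4.3.7}$$

where $W_1 \sim N(0, 1)$ under measure Q. Thus we see that v is convex in x provided that f is convex.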

Fixing t, vx(·, t) is a monotone increasing function. Thus, we can represent the pricing portfolio StNt − v(St, t) graphically in Fig. 4.2 and Fig. 4.3.

[Figure: two panels with axes s and n, showing the curves vx(·, t) and v−1x(·, t) and the point (St, Nt)]

Fig. 4.2. Hedging portfolio

We see from those graphs the similarity with Fenchel duality. Indeed, whenever the terminal payoff f of the European contingent claim is convex we have the following duality relationship:

$$v^*(N_t, t) = \sup_{S_t}\,[S_tN_t - v(S_t, t)] \tag{4.3.8}$$

and

$$v(S_t, t) = \sup_{N_t}\,[S_tN_t - v^*(N_t, t)]. \tag{4.3.9}$$

Here the conjugate $v^*$ is the cash borrowed process when we maintain a self-financing hedging portfolio. Relationship (4.3.8) corresponds to the hedging portfolio having riskless gains and relationship (4.3.9) shows that the hedging portfolio $S_tN_t - v^*(N_t, t)$ is self-financing.

To implement this hedging, Nt must satisfy the Fenchel equality

v(St, t) + v∗(Nt, t) = StNt. (4.3.10)

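Then $N_t = v_x(S_t, t)$ is a function of $S_t$ and $S_t = v^*_n(N_t, t)$ is a function of $N_t$. Moreover,

$$\frac{\partial v}{\partial t} = -\frac{\partial v^*}{\partial t} \quad\text{and}\quad v_{xx}v^*_{nn} = 1.$$

[Figure: axes s and n, showing the curves vx(·, t) and v−1x(·, t) and the point (St, Nt)]

Fig. 4.3. Equality holds when Nt = vx(St, t), St = v−1x(Nt, t)

Substituting the above into (4.3.5) we derive

$$-\frac{\partial v^*}{\partial t} + \frac{\sigma^2x^2v_{xx}^2}{2}v^*_{nn} = 0. \tag{4.3.11}$$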

4.3.3 Time Reversal

In particular, if we reverse the time by setting τ = T − t then equation (4.3.11) becomes

$$\frac{\partial v^*}{\partial \tau} + \frac{\sigma^2x^2v_{xx}^2}{2}v^*_{nn} = 0. \tag{4.3.12}$$

Since equations (4.3.12) and (4.3.5) have the same form, this suggests that in reverse time the cash borrowed process $v^*$ should be a martingale just like v is a martingale in time t.

Let us fix the notation first. We use τ to denote the reversed time. For a stochastic process Pt, t ∈ [0, T] we define its time reversal by Pτ = Pt provided that t + τ = T. Let us denote by ∆ an infinitesimal increment of time. Setting τ + t + ∆ = T, we have

dPt = Pt+∆ − Pt = Pτ − Pτ+∆ = −dPτ .

We note that if Wt is a Brownian motion under measure Q then so is Wτ under the same measure. The time reversal of a function of a stochastic process is defined below using Nt = vx(St, t) as an example

Nτ = vx(Sτ , τ).


The time reversal of the differential of a product of stochastic processes needs to be handled with caution. For example, we can write (4.3.1) as

St+∆ − St = σSt(Wt+∆ −Wt).

Letting t+ τ +∆ = T we have

dSτ = Sτ+∆ − Sτ = −(St+∆ − St) = −dSt (4.3.13)

= −σSt(Wt+∆ −Wt) = −σSτ+∆(Wτ − Wτ+∆)

= σ(Sτ + dSτ )dWτ .

Iterating (4.3.13) and eliminating zero terms we have

dSτ = σ2Sτdτ + σSτdWτ . (4.3.14)

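We see that although $S_t$ is a martingale its time reversal $S_\tau$ is not.

Now we turn to $N_\tau$. Using Ito's formula we have

$$\begin{aligned}
dN_\tau &= \frac{\partial v_x}{\partial t}\,d\tau + \frac{1}{2}\frac{\partial^2 v_x}{\partial x^2}(dS_\tau)^2 + \frac{\partial v_x}{\partial x}\,dS_\tau \\
&= \left[\frac{\partial v_x}{\partial t} + \frac{\partial v_x}{\partial x}\sigma^2S_\tau + \frac{1}{2}\frac{\partial^2 v_x}{\partial x^2}\sigma^2S_\tau^2\right]d\tau + \frac{\partial v_x}{\partial x}\sigma S_\tau\,dW_\tau.
\end{aligned} \tag{4.3.15}$$

Differentiating (4.3.5) with respect to x we have

$$\frac{\partial v_x}{\partial t} + \frac{\partial v_x}{\partial x}\sigma^2x + \frac{1}{2}\frac{\partial^2 v_x}{\partial x^2}\sigma^2x^2 = 0.$$

It follows that

$$dN_\tau = \frac{\partial v_x}{\partial x}\sigma S_\tau\,dW_\tau \tag{4.3.16}$$

is a martingale.

Finally we consider the time reversal of the hedging portfolio (cash borrowed) process $H_t = v^*(N_t, t)$. Using the dual Ito formula (4.1.9) we have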

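$$\begin{aligned}
dv &= v_t\,dt + N_t\,dS_t + \frac{1}{2}\,d\langle S, N\rangle_t, \\
dH_t = dv^* &= v^*_t\,dt + S_t\,dN_t + \frac{1}{2}\,d\langle S, N\rangle_t.
\end{aligned} \tag{4.3.17}$$

Combining (4.3.17) with the riskless gain condition $dv = N_t\,dS_t$ and $v_t + v^*_t = 0$ from (4.1.6) we have

$$\begin{aligned}
dH_t = H_{t+\Delta} - H_t &= S_t\,dN_t + d\langle S, N\rangle_t \\
&= (S_t + dS_t)\,dN_t = S_{t+\Delta}(N_{t+\Delta} - N_t).
\end{aligned} \tag{4.3.18}$$

Letting t + τ + ∆ = T we have

$$H_\tau - H_{\tau+\Delta} = S_\tau(N_\tau - N_{\tau+\Delta})$$

or

$$dH_\tau = S_\tau\,dN_\tau = \frac{\partial v_x}{\partial x}\sigma S_\tau^2\,dW_\tau. \tag{4.3.19}$$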

Thus, Hτ is also a martingale.


4.4 Generalized Duality and Hedging with Contingent Claims

Financial innovations in the past several decades have led to the creation of many new types of financial derivatives. They have become increasingly liquid and, thus, can also be used as hedging devices. What happens when we use a contingent claim instead of the underlying to construct a hedging portfolio for the purpose of pricing and hedging a target contingent claim? It turns out that a duality also emerges between the value of the target contingent claim and the cash borrowed process in terms of generalized duality, which naturally corresponds to a generalized convexity concept (see e.g. [39]). Moreover, similar to the classical option pricing theory, the no arbitrage value of the contingent claim derived this way preserves the generalized convexity of the terminal payoff.

4.4.1 Preservation of Generalized Convexity in the Value Function of a Contingent Claim

Consistency of Generalized Convexity

Let St be a diffusion process

dSt = µ(St, t)dt+ σ(St, t)dWt, (4.4.1)

where Wt is a standard Brownian motion. We assume again that the risk free rate is 0. Consider a target contingent claim on St of European style with maturity at T > 0 and a terminal payoff f(ST) at t = T. Suppose that a different contingent claim on St, which we call the hedging claim, is traded on the market with price p(St, t) at all times t ∈ [0, T]. For uniqueness in what follows we always assume that p and v are smooth functions bounded by $\alpha e^{\beta x^2}$ for some α, β > 0. Our main result is:

Theorem 4.4.1 (Consistency of Generalized Convexity) Define $c_t(x, y) = p(x, t)y$ and assume that f is $\Phi_{c_T(1)}$-convex. Then

(i) the partial differential equation $v_t + \frac{\sigma^2}{2}v_{xx} = 0$, $v(x, T) = f(x)$, uniquely determines an arbitrage free price for the target claim;

(ii) for any t ∈ [0, T], v(·, t) is $\Phi_{c_t(1)}$-convex; and

(iii) $N_t$ determined by

$$v^{c_t(1)}(N_t, t) + v(S_t, t) = p(S_t, t)N_t$$

makes the portfolio $p(S_t, t)N_t - v^{c_t(1)}(N_t, t)$ of the hedging instrument and the riskless asset riskless.


Proof. We price v by forming a potentially self-financing portfolio that statically shorts one share of the target contingent claim and holds Nt units of the hedging claim. Then

$$p(S_t, t)N_t - v(S_t, t) \tag{4.4.2}$$

is the cash borrowed resulting from this portfolio. Self-financing implies that

$$N_t\,dp(S_t, t) = dv(S_t, t). \tag{4.4.3}$$

Applying the Ito formula, the difference $N_t\,dp(S_t, t) - dv(S_t, t)$ becomes

$$N_t\left[\left(p_t + \mu p_x + \frac{\sigma^2}{2}p_{xx}\right)dt + p_x\sigma\,dW_t\right] - \left[\left(v_t + \mu v_x + \frac{\sigma^2}{2}v_{xx}\right)dt + v_x\sigma\,dW_t\right]. \tag{4.4.4}$$

To ensure riskless gains we need Nt to satisfy the equation

$$v_x(S_t, t) = N_t\,p_x(S_t, t). \tag{4.4.5}$$

Then the gain in the portfolio reduces to

$$N_t\left(p_t + \frac{\sigma^2}{2}p_{xx}\right)dt - \left(v_t + \frac{\sigma^2}{2}v_{xx}\right)dt.$$

Now no arbitrage requires this quantity to be 0. Thus

$$N_t\left(p_t + \frac{\sigma^2}{2}p_{xx}\right)dt = \left(v_t + \frac{\sigma^2}{2}v_{xx}\right)dt.$$

Since p is arbitrage free,

$$p_t + \frac{\sigma^2}{2}p_{xx} = 0.$$

Thus, v must also satisfy the Black-Scholes PDE

$$v_t + \frac{\sigma^2}{2}v_{xx} = 0 \tag{4.4.6}$$

with terminal condition

$$v(x, T) = f(x), \tag{4.4.7}$$

where f is the payoff of the target at T .We show that vct(1)ct(2) satisfies the same Black-Scholes PDE as v does. Observe

that x → p(x, T ) is strictly monotone, which implies that x → p(x, t) is invertible,i.e., x = x(p, t). We can define

v(p, t) = v(x(p, t), t) + ιrange(p(·,t))(p).

Then we have

v∗(Nt, t) = supPt

[PtNt − v(Pt, t)]

= supSt

[p(St, t)Nt − v(St, t)] = vct(1)(Nt, t).


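Similarly, writing $\tilde{v}(p, t)$ for v expressed as a function of the hedging claim price p, so that $v(x, t) = \tilde{v}(p(x, t), t)$, for any $P_t = p(S_t, t)$ we have

$$\begin{aligned}
\tilde{v}^{**}(P_t, t) &= \sup_{N_t}\,[P_tN_t - \tilde{v}^*(N_t, t)] \\
&= \sup_{N_t}\,[p(S_t, t)N_t - v^{c_t(1)}(N_t, t)] = v^{c_t(1)c_t(2)}(S_t, t).
\end{aligned}$$

Thus, we need only to show that $\tilde{v}$ and $\tilde{v}^{**}$ satisfy the same Black-Scholes PDE. We do so through the PDE for the cash borrowed $\tilde{v}^*$. Changing variables we have

$$\frac{\partial v}{\partial t} = \frac{\partial \tilde{v}}{\partial t} + \tilde{v}_p\frac{\partial p}{\partial t}, \qquad v_x = \tilde{v}_p\,p_x, \qquad v_{xx} = \tilde{v}_p\,p_{xx} + \tilde{v}_{pp}\,p_x^2.$$

Substituting them into

$$\frac{\partial v}{\partial t} + \frac{\sigma^2}{2}v_{xx} = 0$$

and using

$$\frac{\partial p}{\partial t} + \frac{\sigma^2}{2}p_{xx} = 0$$

we have

$$\frac{\partial \tilde{v}}{\partial t} + \frac{\sigma^2p_x^2}{2}\tilde{v}_{pp} = 0. \tag{4.4.8}$$

Thus, using the Fenchel equality

$$\tilde{v}(P_t, t) + \tilde{v}^*(N_t, t) = P_tN_t$$

we have

$$n = \tilde{v}_p, \qquad p = \tilde{v}^*_n, \qquad \frac{\partial \tilde{v}}{\partial t} = -\frac{\partial \tilde{v}^*}{\partial t} \qquad\text{and}\qquad \tilde{v}_{pp}\tilde{v}^*_{nn} = 1.$$

Substituting the above into (4.4.8) we derive

$$-\frac{\partial \tilde{v}^*}{\partial t} + \frac{\sigma^2p_x^2\tilde{v}_{pp}^2}{2}\tilde{v}^*_{nn} = 0. \tag{4.4.9}$$

To derive the PDE for $\tilde{v}^{**}$ we start from $P_t$ and $N_t$ satisfying the Fenchel equality

$$\tilde{v}^{**}(P_t, t) + \tilde{v}^*(N_t, t) = P_tN_t.$$

Then we have

$$n = \tilde{v}^{**}_p, \qquad p = \tilde{v}^*_n, \qquad \frac{\partial \tilde{v}^{**}}{\partial t} = -\frac{\partial \tilde{v}^*}{\partial t} \qquad\text{and}\qquad \tilde{v}^{**}_{pp}\tilde{v}^*_{nn} = 1.$$

Since $\tilde{v}_{pp}\tilde{v}^*_{nn} = 1$, substituting the above relationship into (4.4.9) yields

$$\frac{\partial \tilde{v}^{**}}{\partial t} + \frac{\sigma^2p_x^2}{2}\tilde{v}^{**}_{pp} = 0.$$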


We see that $\tilde{v}$ and $\tilde{v}^{**}$ satisfy the same Black-Scholes differential equation. Since $v(x, t) = \tilde{v}(p, t)$ and $\tilde{v}^{**}(p, t) = v^{c_t(1)c_t(2)}(x, t)$ for x = x(p, t), we conclude that $v(x, t)$ and $v^{c_t(1)c_t(2)}(x, t)$ also satisfy the same Black-Scholes differential equation.

Finally, since v(·, T) is $\Phi_{c_T(1)}$-convex we have $v(x, T) = v^{c_T(1)c_T(2)}(x, T)$. That is, v and $v^{c_T(1)c_T(2)}$ satisfy the same terminal condition. Thus, they must be the same for all t, i.e. $v(x, t) = v^{c_t(1)c_t(2)}(x, t)$, so that v(·, t) is $\Phi_{c_t(1)}$-convex. •

Remark 4.4.2 The function $c_t(x, y) = p(x, t)y$ is known when we know the price of the claim p that we use to hedge.

Fixing t and defining $\tilde{v}(p, t) = v(x(p, t), t)$, we can represent the portfolio $p(S_t, t)N_t - v(S_t, t)$ graphically in Fig. 4.4 and Fig. 4.5.

[Figure: two panels with axes s and n weighted by px(·, t), showing the curves vp(·, t) and v−1p(·, t) and the point (St, Nt)]

Fig. 4.4. Hedging portfolio

We see that these graphs are almost exact replications of the graphic representation of the hedging portfolio $S_tN_t - v(S_t, t)$. The only difference is that the sn-plane is weighted by $p_x(\cdot, t)$. This implies the following generalized Fenchel duality relationship:

$$v^{c_t(1)}(N_t, t) = \sup_{S_t}\,[p(S_t, t)N_t - v(S_t, t)] \tag{4.4.10}$$

and

$$v(S_t, t) = \sup_{N_t}\,[p(S_t, t)N_t - v^{c_t(1)}(N_t, t)]. \tag{4.4.11}$$

[Figure: axes s and n weighted by px(·, t), showing the curves vp(·, t) and v−1p(·, t) and the point (St, Nt)]

Fig. 4.5. Equality holds when px(St, t)Nt = vx(St, t)

Relationship (4.4.10) can be interpreted as a cash borrowed process having the property of riskless gains and equation (4.4.11) shows that the hedging portfolio $p(S_t, t)N_t - v^{c_t(1)}(N_t, t)$ of the hedging claim and cash is self-financing. The key of the formal proof of Theorem 4.4.1 is to verify that v(·, t) is $\Phi_{c_t(1)}$-convex.

4.4.2 Determining the Hedging Process

While in principle the PDE (4.4.6) with terminal condition (4.4.7) determines an arbitrage free and $\Phi_{c_t(1)}$-convexity preserving contingent claim pricing function v, to determine the hedging process one must know the dynamics of $N_t$ and $H_t = v(\cdot, t)^{c_t(1)}(N_t)$.

Defining $n(x, t) := v_x(x, t)/p_x(x, t)$, equation (4.4.5) implies that the hedging process is

$$N_t = n(S_t, t). \tag{4.4.12}$$

Differentiating (4.4.6) with respect to x we derive the PDE governing n:

$$n_t + \frac{\sigma^2}{2}n_{xx} = -\frac{n_x\sigma}{p_x}(p_x\sigma)_x. \tag{4.4.13}$$

We turn to the hedging process $N_t$. Using Ito's formula we have

$$dN_t = \left(n_t + \mu n_x + \frac{\sigma^2}{2}n_{xx}\right)dt + n_x\sigma\,dW_t. \tag{4.4.14}$$

Using (4.4.13) we can simplify (4.4.14) to

$$dN_t = n_x\left(\left[\mu - \sigma\frac{(p_x\sigma)_x}{p_x}\right]dt + \sigma\,dW_t\right). \tag{4.4.15}$$


We see that $N_t$ is in general not a martingale unless $\mu - \sigma\frac{(p_x\sigma)_x}{p_x} = 0$.

Next we discuss the dynamics of the cash borrowed process $H_t$. We have seen that no arbitrage forces $v(\cdot, t) = v(\cdot, t)^{c_t(1)c_t(2)}$. Thus, by (4.4.10) and (4.4.11) we have

$$H_t(N_t) + v(S_t, t) = p(S_t, t)N_t. \tag{4.4.16}$$

Due to the self-financing condition (3.2.4) we have

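$$\begin{aligned}
dH_t &= p\,dN_t + d\langle p, N\rangle_t \\
&= n_xp\sigma\,dW_t + \sigma^2\left[p_x^2n_x + \frac{1}{2}p_{xx}p_x^2n - pn_x\frac{(p_x\sigma)_x}{p_x\sigma}\right]dt.
\end{aligned} \tag{4.4.17}$$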

In general $H_t$ is not a martingale. However, in some special cases it could be. For example, if p(x, t) = x, i.e. the hedging is done with the price process $S_t$ itself, then $p_x = 1$, $p_{xx} = 0$ and equation (4.4.17) is simplified to

$$dH_t = \sigma n_x\left(S_t\,dW_t + [\sigma - S_t\sigma_x]\,dt\right). \tag{4.4.18}$$

Now when $S_t$ follows a geometric Brownian motion where σ(x, t) = σ(t)x, we have $\sigma = x\sigma_x$ and $H_t$ is a martingale.

4.4.3 Hedging with p−multiple ETF

Double long and short ETFs and triple long and short ETFs are now available for many major indices. They provide convenient tools for hedging. We discuss in this section the general p-multiple long ETF as a hedging tool. We will need the following special case of Theorem 2.2.3.

Proposition 4.4.3 The function $x^q$, x ≥ 0, is $\Phi_{[x^py](1)}$-convex if either q > 0 and p < q or q < 0 and q < p. Similarly, the function $-x^q$, x ≥ 0, is $\Phi_{[x^py](1)}$-convex if either p > q > 0 or p < q < 0.

Proof. We prove only the case $x^q$. The discussion for $-x^q$ is similar. Let $u(x) = x^q$, x ≥ 0. It is easy to calculate that

$$R(x) = -\frac{xu''(x)}{u'(x)} = 1 - q. \tag{4.4.19}$$

When q > 0 and p < q, u is an increasing function and R(x) = 1 − q < 1 − p, and when q < 0 and p > q, u is a decreasing function and R(x) = 1 − q > 1 − p. Now the conclusion of the proposition follows directly from that of Theorem 2.2.3. •

Suppose $S_t$ satisfies the diffusion process

$$dS_t = \sigma S_t\,dB_t. \tag{4.4.20}$$

Consider a European-style option with payoff $S_T^q$ at t = T. Denote the value of this option at time t by $v(S_t, t)$. Solving (4.4.6) with terminal condition $v(S_T, T) = S_T^q$, we can determine that

$$v(S_t, t) = S_t^q\,e^{\frac{q(q-1)}{2}\sigma^2(T - t)}.$$

It is easy to verify that

$$\frac{dv(S_t, t)}{v(S_t, t)} = q\,\frac{dS_t}{S_t}.$$

Thus, v is a q-multiple of $S_t$. Similarly, a p-multiple of $S_t$ has the no arbitrage price

$$P_t = S_t^p\,e^{\frac{p(p-1)}{2}\sigma^2(T - t)}.$$

Theorem 4.4.4 (Hedging with Multiple of ETF) Let $S_t$ be the price of an asset satisfying the diffusion equation (4.4.20). Suppose that either q > 0 and p < q or q < 0 and q < p. Then a q-multiple long ETF of $S_t$, t ∈ [0, T], can always be dynamically hedged with an arbitrage free self-financing portfolio involving a p-multiple ETF of $S_t$. Moreover, for any t ∈ [0, T], the arbitrage free price of the q-multiple ETF is $\Phi_{[x^py](1)}$-convex.

Proof. By Theorem 4.4.1 we need only to check that $v(x, T) = x^q$ is $\Phi_{[x^py](1)}$-convex. This follows directly from Proposition 4.4.3. •

In this case we can explicitly calculate that the hedging process is

$$N_t = \frac{q}{p}\,S_t^{q-p}\,e^{\left[\frac{q(q-1)}{2} - \frac{p(p-1)}{2}\right]\sigma^2(T - t)}$$

and the cash borrowed process is

$$H_t = \frac{q - p}{p}\,v(S_t, t).$$

Note that the cash borrowed process is always a martingale. In particular, for q = 4 and p = 2, we see that the no arbitrage price of the quadruple long ETF at any given time t ∈ [0, T] is $\Phi_{[x^2y](1)}$-convex and such a process can be hedged by a double ETF.
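The explicit hedging formulas can be cross-checked numerically. The sketch below (function names and test values are illustrative assumptions) computes the hedge ratio from the closed form, compares it with the finite-difference ratio $v_x/p_x$ from (4.4.5), and confirms that $P_tN_t - v = \frac{q-p}{p}v$.

```python
import math


def multiple_price(s, k, sigma, tau):
    """No arbitrage price S^k * exp(k(k-1)/2 * sigma^2 * tau) of a k-multiple, tau = T - t."""
    return s ** k * math.exp(0.5 * k * (k - 1.0) * sigma ** 2 * tau)


def hedge_ratio(s, q, p, sigma, tau):
    """Closed-form N_t for hedging the q-multiple with the p-multiple."""
    growth = 0.5 * (q * (q - 1.0) - p * (p - 1.0)) * sigma ** 2 * tau
    return (q / p) * s ** (q - p) * math.exp(growth)


def hedge_ratio_fd(s, q, p, sigma, tau, h=1e-6):
    """Finite-difference version of N_t = v_x / p_x from (4.4.5)."""
    dv = multiple_price(s + h, q, sigma, tau) - multiple_price(s - h, q, sigma, tau)
    dp = multiple_price(s + h, p, sigma, tau) - multiple_price(s - h, p, sigma, tau)
    return dv / dp


def cash_borrowed(s, q, p, sigma, tau):
    """H_t = P_t * N_t - v(S_t, t)."""
    return (multiple_price(s, p, sigma, tau) * hedge_ratio(s, q, p, sigma, tau)
            - multiple_price(s, q, sigma, tau))


s, sigma, tau = 1.3, 0.2, 0.5
for q, p in [(4.0, 2.0), (0.5, 0.25), (0.5, -0.5), (-2.0, -0.5)]:
    v = multiple_price(s, q, sigma, tau)
    print(q, p, hedge_ratio(s, q, p, sigma, tau), cash_borrowed(s, q, p, sigma, tau), (q - p) / p * v)
```

The four (q, p) pairs are the ones used in the graphic illustrations of Fig. 4.6 to Fig. 4.9.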

Remark 4.4.5 It is worth observing that when q ∈ (0, 1) and p < q the $\Phi_{[x^py](1)}$-convex functions are, in fact, concave. We can see that $\Phi_{[x^py](1)}$-convex functions represent a wide spectrum of convex and concave functions with different strengths. A few graphic illustrations are included in Fig. 4.6–4.9.

[Figure: plot with axes x and y]

Fig. 4.6. Graphic illustration of q = 4 and p = 2

[Figure: plot with axes x and y]

Fig. 4.7. Graphic illustration of q = 1/2 and p = 1/4

The above discussion can be applied to a q-multiple short ETF of $S_t$. We summarize the result in the following theorem.

[Figure: plot with axes x and y]

Fig. 4.8. Graphic illustration of q = 1/2 and p = −1/2

[Figure: plot with axes x and y]

Fig. 4.9. Graphic illustration of q = −2 and p = −1/2

Theorem 4.4.6 Let $S_t$ be the price of an asset satisfying the diffusion equation (4.4.20). Suppose that either p > q > 0 or p < q < 0. Then a q-multiple short ETF of $S_t$, t ∈ [0, T], can always be dynamically hedged with an arbitrage free self-financing portfolio involving a p-multiple long ETF of $S_t$. Moreover, for any t ∈ [0, T], the arbitrage free price of the q-multiple short ETF is $\Phi_{[x^py](1)}$-convex.

Proof. The proof is the same as that of Theorem 4.4.4 except that we need to use the second part of Proposition 4.4.3. •

Generalized convexity also shows up in other finance-related functions. The following are two simple examples.

Example 4.4.7 (Stock Price as a Contingent Claim of Company’s Asset)

Leland proposed the following perspective of stock price in [31]. Consider a company whose activities have value $a_t$ at t ∈ [0, +∞) with dynamics

$$da_t = \sigma a_t\,dW_t,$$

where σ is a constant. Assume that the risk free rate is r and that there is no dividend. Let us first view the stock price $S(a_t)$ as a perpetual claim on $a_t$. Then $S(a_t)$ satisfies the ordinary differential equation

$$\frac{\sigma^2x^2}{2}S_{xx} + rxS_x - rS = 0,$$

so that

$$S(a_t) = ba_t - ca_t^q,$$

where $q = -2r/\sigma^2 < 0$ (the nontrivial root of $(q - 1)(\sigma^2q/2 + r) = 0$, obtained by substituting $S = x^q$ into the equation) and b, c > 0.

Now suppose that the company has an outstanding bond maturing at T with a total amount K. Then the stock price u becomes a contingent claim on $a_t$ with terminal payoff

$$u(a_T, T) = (ba_T - ca_T^q - K)^+.$$

It is easy to check that for x sufficiently large u is an increasing function and

$$-\frac{xu''(x)}{u'(x)} \le 1 - q.$$

Thus, for K sufficiently large u(x, T) is a $\Phi_{[x^qy](1)}$-convex function. It follows from Theorem 4.4.1 that u(·, t) is also $\Phi_{[x^qy](1)}$-convex.


Example 4.4.8 (Normal Kernel) Consider the scaled normal kernel

$$n(x) = e^{-kx^2/2}, \quad x \ge 0,\ k > 0.$$

We can verify that $-xn''(x)/n'(x) = kx^2 - 1 \ge -1$ but there is no upper bound. Thus, the decreasing function $e^{-kx^2/2}$, x ≥ 0, is $\Phi_{[x^py](1)}$-convex for any p ≥ 2. Due to the symmetry of both $e^{-kx^2/2}$ and $|x|^py - b$ with respect to the vertical axis we conclude that this property also holds when x < 0, so that $e^{-x^2/2}$ is $\Phi_{[|x|^py](1)}$-convex for any p ≥ 2.

We note that in both Example 4.4.7 and Example 4.4.8 the functions involved are neither convex nor concave.

4.4.4 Reducing the Volatility of the Hedging Process

When there are multiple hedging claims available in the market, it is usually the case that for a given target contingent claim there are many different ways to hedge. Choosing an appropriate hedging device that fits better in generalized convexity can often help reduce the volatility of the hedging process.

Example 4.4.9 (Hedging q-multiple Long ETF Using p-multiple) Suppose that $S_t$ is a diffusion process

$$dS_t = \sigma S_t\,dW_t, \quad t \in [0, T].$$

Let v be the value of the q-multiple long ETF of $S_t$. Suppose either q > 0, p < q or q < 0, p > q. Then the process for the hedging shares has been explicitly calculated as

$$N_t = \frac{q}{p}\,S_t^{q-p}\,e^{\left[\frac{q(q-1)}{2} - \frac{p(p-1)}{2}\right]\sigma^2(T - t)}$$

and the cash borrowed process is

$$H_t = \frac{q - p}{p}\,v(S_t, t).$$

Note that the closer p is to q, the smoother the cash borrowed process $H_t$, which is a proxy for the value of the hedging portfolio.

Example 4.4.10 (Normal Kernel) Now consider $S_t$ following a Bachelier model $S_t = \sigma W_t$ and let $v(S_t, t)$, $t \in [0, T]$, be the no arbitrage price of a contingent claim with payoff $f(x) = e^{-x^2/2}$ at $T$. It is easy to directly calculate that

$$v(S_t, t) = \frac{1}{\sqrt{\sigma^2(T-t)+1}} \exp\left(-\frac{S_t^2}{2(\sigma^2(T-t)+1)}\right).$$

In this case, we can dynamically replicate $v$ using either $S_t$ ($v$ is not convex in $S_t$) or its double long ETF $P_t = S_t^2 + \sigma^2(T-t)$, with respect to which $v$ is convex. When hedging with $S_t$, the share of hedging is $N^S_t = v_x = -S_t v/(\sigma^2(T-t)+1)$. The cash borrowed process is

$$H^S_t = S_t N^S_t - v = -\frac{S_t^2 + \sigma^2(T-t) + 1}{\sigma^2(T-t)+1}\, v.$$

When hedging with $P_t$, we can similarly calculate that the share of hedging is $N^P_t = v_x/p_x = -v/(2(\sigma^2(T-t)+1))$. The cash borrowed process becomes

$$H^P_t = P_t N^P_t - v = -\frac{S_t^2/2 + 3\sigma^2(T-t)/2 + 1}{\sigma^2(T-t)+1}\, v.$$

We can see that hedging with $P_t$ results in a smoother cash borrowed process because the random change related to the uncertain stock price is only half that of hedging with $S_t$.
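The closed forms in Example 4.4.10 can be verified with a few lines of code. This is a minimal sketch under stated assumptions, not the authors' computation. The price is checked against a trapezoid-rule approximation of $E[e^{-S_T^2/2} \mid S_t = s]$ with $S_T$ normal with mean $s$ and variance $\sigma^2(T-t)$. The cash borrowed when hedging with $P_t$ is computed as $P_t N^P_t - v$, which is what reproduces the stated closed form. All function names, grid sizes and parameter values are hypothetical.

```python
import math


def price(s, tau, sigma):
    """Closed-form v for payoff exp(-x^2/2), with tau = T - t."""
    a = sigma ** 2 * tau + 1.0
    return math.exp(-s * s / (2.0 * a)) / math.sqrt(a)


def price_by_quadrature(s, tau, sigma, n=4001, width=10.0):
    """Trapezoid approximation of E[exp(-S_T^2/2)], S_T ~ Normal(s, sigma^2 tau)."""
    sd = sigma * math.sqrt(tau)
    lo = s - width * sd
    step = 2.0 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-((y - s) ** 2) / (2.0 * sd * sd)) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * math.exp(-y * y / 2.0) * density
    return total * step


def cash_borrowed_with_stock(s, tau, sigma):
    """H^S = S * N^S - v with N^S = v_x."""
    a = sigma ** 2 * tau + 1.0
    v = price(s, tau, sigma)
    return s * (-s * v / a) - v


def cash_borrowed_with_etf(s, tau, sigma):
    """H^P = P * N^P - v with P = S^2 + sigma^2 tau and N^P = v_x / p_x."""
    a = sigma ** 2 * tau + 1.0
    v = price(s, tau, sigma)
    p = s * s + sigma ** 2 * tau
    return p * (-v / (2.0 * a)) - v


if __name__ == "__main__":
    s, tau, sigma = 0.8, 0.5, 0.3  # illustrative values only
    print(price(s, tau, sigma), price_by_quadrature(s, tau, sigma))
    print(cash_borrowed_with_stock(s, tau, sigma), cash_borrowed_with_etf(s, tau, sigma))
```

In the two closed forms the coefficient of $S_t^2$ is $1$ when hedging with $S_t$ and $1/2$ when hedging with $P_t$, which is the halving referred to in the example.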

4.4.5 The Volatility Trade

Now consider $S_t$ following a diffusion process

$$dS_t = \sigma_t S_t\,dW_t, \quad t \in [0, T].$$

Let us assume that the volatility $\sigma_t^2$ is unknown. We further assume that the market implies a constant volatility $\sigma_h^2$ which is, say, known by a certain trader to be too high. Can he take advantage of the situation? Carr and Madan have shown in [11] that the answer is yes if there is a contingent claim whose no arbitrage price $v(S_t, t)$ is convex in $S_t$.

In this example we show that generalized convexity can help us derive a similar volatility trade when $v(S_t, t)$ has certain generalized convexity properties. Let $p(S_t, t)$ be the no arbitrage price of a hedging claim with $p(\cdot, t)$ strictly monotone. Let $c_t(x, y) = p(x, t)y$. We assume that $v(\cdot, T)$ is $\Phi_{c_T}(1)$-convex but not necessarily convex in $S_t$, as in Examples 4.4.7 and 4.4.8.

Denote again

$$v(p, t, \sigma_h) = v(x(p, t), t, \sigma_h).$$

We have already seen that $v(p, t, \sigma_h)$ is convex in $p$. Here $\sigma_h$ is added to emphasize that the trader takes $v(p, t, \sigma_h)$ to follow the constant volatility $\sigma_h$ implied by the market in trading.

Ito's formula tells us that

$$v(S_T, T) - v(S_t, t) - \int_t^T v_p\,dP_s = \int_t^T \left[\frac{\partial v}{\partial s} + \frac{v_{pp}\, p_x^2\, S_s^2}{2}\,\sigma_s^2\right] ds.$$

The left-hand side is the trading portfolio and the right-hand side is the P&L. Since the trader follows the constant volatility $\sigma_h$ implied by the market in trading,

$$\frac{\partial v}{\partial t} = -\frac{v_{pp}\, p_x^2\, S_t^2}{2}\,\sigma_h^2.$$

Thus,

$$\text{P\&L} = \int_t^T \frac{v_{pp}\, p_x^2\, S_s^2}{2}\,(\sigma_s^2 - \sigma_h^2)\,ds,$$

where $v_{pp} > 0$, so the sign of the integrand is determined by $\sigma_s^2 - \sigma_h^2$. We see that the trader can take advantage of the market's overestimation of volatility by dynamically trading the portfolio

$$v(S_T, T) - v(S_t, t) - \int_t^T v_p\,dP_s.$$
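The P&L identity can be illustrated with a small simulation. The sketch below is one hypothetical instance, not the authors' construction. The target claim pays $S_T^q$ and the hedging claim pays $S_T^p$, both priced at the implied volatility $\sigma_h$ as $S^m e^{m(m-1)\sigma_h^2(T-t)/2}$, while the stock actually moves with a constant realized volatility. For this pair a short calculation under these assumptions gives $v_{pp} p_x^2 S^2/2 = q(q-p)v/2$ and $v_p = (q/p)\,v/P$. The portfolio is rebalanced at discrete times, so the traded and predicted P&L agree only up to discretization noise. All names and parameter values are assumptions.

```python
import math
import random


def claim_value(s, tau, m, sigma_h):
    """Assumed market value of the payoff S_T**m at implied volatility sigma_h."""
    return s ** m * math.exp(0.5 * m * (m - 1) * sigma_h ** 2 * tau)


def simulate_trade(q=3.0, p=2.0, sigma_h=0.3, sigma_real=0.2,
                   T=1.0, s0=1.0, steps=10000, seed=7):
    """Return (traded P&L, P&L predicted by the integral formula) on one path."""
    rng = random.Random(seed)
    dt = T / steps
    s = s0
    traded = -claim_value(s0, T, q, sigma_h)   # minus v(S_0, 0)
    predicted = 0.0
    for i in range(steps):
        tau = T - i * dt
        v = claim_value(s, tau, q, sigma_h)
        hedge = claim_value(s, tau, p, sigma_h)
        shares = (q / p) * v / hedge           # v_p = v_x / p_x for this pair
        # Integrand v_pp p_x^2 S^2 / 2 * (sigma_s^2 - sigma_h^2), left-point rule.
        predicted += 0.5 * q * (q - p) * v * (sigma_real ** 2 - sigma_h ** 2) * dt
        # Exact lognormal step at the realized volatility.
        shock = rng.gauss(0.0, 1.0)
        s_next = s * math.exp(-0.5 * sigma_real ** 2 * dt
                              + sigma_real * math.sqrt(dt) * shock)
        hedge_next = claim_value(s_next, tau - dt, p, sigma_h)
        traded -= shares * (hedge_next - hedge)  # minus the hedging gains
        s = s_next
    traded += claim_value(s, 0.0, q, sigma_h)   # plus v(S_T, T)
    return traded, predicted


if __name__ == "__main__":
    traded, predicted = simulate_trade()
    print(f"traded P&L = {traded:.5f}, predicted P&L = {predicted:.5f}")
```

With realized volatility below the implied level, both numbers come out negative for the long position, so in this sketch the trader who believes $\sigma_h$ is too high would hold the opposite side of the portfolio.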

Comments

Chapter 1 Sections 1.1-1.4 give a concise summary of standard convex analysis duality theory, which was pioneered by Fenchel [18], Moreau [41] and Rockafellar [43]. Our exposition follows [9, 21], emphasizing the variational approach by focusing on convex programming. We also highlight the role of the subdifferential of the optimal value function as the set of Lagrange multipliers and the set of dual solutions.

Generalized convexity, conjugacy and the related duality discussed in Section 1.5 can be traced back to Moreau. They have gained more attention recently due to diverse applications and also due to their role in mass transport theory [61]. Our main references here are [30, 39]. Their applications in hedging with contingent claims are discussed in Section 4.4.

Chapter 2 Section 2.1 provides a unified treatment of the classical Markowitz portfolio theory [36], the CAPM model [51] and the Sharpe ratio [52]. Following [67] we emphasize that the underlying mathematical tool for all these applications is minimizing a quadratic function with linear constraints, one of the simplest forms of convex programming. Convex duality is essential in revealing the structure of the solutions with a practical financial meaning.

Section 2.2 deals with the portfolio problem from the perspective of utility optimization. The utility function has a long history going back to the work of Bernoulli [4] in 1738 related to the St. Petersburg paradox. Its relevance to financial problems comes from the fact that optimizing the utility of a portfolio simultaneously accounts for investors' pursuit of capital growth and their risk aversion. The concavity of utility functions means convex analysis is essential. Different agents have different degrees of risk aversion, which can be measured using either absolute risk aversion coefficients or relative risk aversion coefficients [1, 49]. Interestingly, utility functions with those risk aversion coefficients bounded at a given level can be characterized by the generalized convexity discussed in Section 1.5. These new characterizations are included in Subsection 2.2.2.

Growth optimal portfolio theory [32] and Kelly's criterion [27, 34, 35, 56, 57, 58] as a money management tool in investment are discussed as an illustration of such utility optimization problems. In particular, following [66] we highlight that optimizing the expected log utility for a portfolio of cash and a given investment strategy on historical performance data amounts to measuring the useful information implied by the investment strategy and can be used as a measure to compare different investment strategies.

The fundamental theorem of asset pricing (FTAP) relates no arbitrage to the existence of a martingale measure that can be used to price assets in a financial market. Cox, Ross and Rubinstein observed such a principle in their classical work related to option pricing in complete markets [12, 13]. General FTAPs were discussed in [23, 22, 29, 15] with progressing generality, usually with a proof based on separation arguments. Dybvig and Ross [17] observed that in an incomplete market the martingale measures are related to the risk aversion of market agents. In Section 2.3 we approach the FTAP from the perspective of convex duality. We show that in an incomplete market, a martingale measure is, in fact, a scaling of the dual solution to a portfolio utility maximization problem. We also illustrate with examples that this relationship helps us to understand that in an incomplete market, a martingale measure provides a reference price for a certain agent to improve their utility rather than to arbitrage.

Section 2.4 deals with risk measures, a concept that plays important roles for both financial institutions and regulatory agencies. That diversification reduces risk implies the convexity of risk measures. We focus on coherent risk measures proposed by Artzner, Delbaen, Eber and Heath in [2]. Coherent risk measures are sublinear, a particular type of convex function. Duality is involved in providing a dual characterization of a coherent risk measure as the conjugate of an indicator function of a cone, called the acceptance cone. Interestingly, the generating set for the acceptance cone is closely related to the practice of stress tests. Convex duality also provides several equivalent descriptions of coherent risk measures in terms of linear preference and value bounds. Moreover, the same argument is at the core of the discussion of good deals in financial markets as explained in Jaschke and Kuchler [24]. Besides providing a framework to understand risk measures and their relationship with other important financial concepts, convex duality methods also help to amend the widely used nonconvex risk measure value at risk [25] to the convex conditional value at risk proposed by Rockafellar and Uryasev in [44, 45].

Chapter 3 Sections 3.1–3.3 demonstrate that many of the results in the previous chapter also persist in the more general setting of a multiperiod economy. We use the general model laid out in S. Roman's textbook [47].

Section 3.4 discusses super hedging (and symmetrically sub-hedging) bounds in incomplete markets. Following Kahale [26] we show that the super hedging bound of a given contingent claim has a dual representation. On one hand it is the supremum of all the prices derived through martingale measures and on the other hand it can be represented as the cost of the smallest super hedging portfolio. When the sample space is finite, the super hedging portfolio in the second representation can be derived by solving a linear programming problem. Linear programming duality can also be used to analyze how adding contingent claims with known prices narrows the gap between the super and sub-hedging bounds. When discussing contingent claims related to currency spread, incomplete markets may arise from complete markets. Considering super hedging bounds in this kind of problem usually leads to a Kantorovich mass transportation problem [61]. We illustrate the solution process with an example on a finite sample space using linear programming duality.

Section 3.5 discusses a model for financial markets with bid and ask spread. The main difference from a simplified one price financial market is that the attainable payoff set due to trading is, in general, a convex cone rather than a subspace. This leads to the title conic finance as coined by Madan in [37, 38]. Besides a concise representation of the basic conic finance model, we also discuss a new refined fundamental theorem of asset pricing as well as super and sub-hedging price bounds. These results are taken from [60], emphasizing the role of convex duality.

Chapter 4 Section 4.1 summarizes facts on continuous models that we need later. To be concise we are satisfied with a heuristic description of most of the material. Readers interested in further details may consult [42, 55]. The dual Ito formula is a first taste of the role of duality in continuous models. It develops the generalized Ito formula using quadratic covariation in [19].

Section 4.2 discusses the convexity and generalized convexity that emerge in the Bachelier and Black-Scholes formulae. The importance of these convexity properties is highlighted by applying them in the computation of Greeks and in illustrating that delta hedging is, in fact, the Fenchel-Legendre transform of the pricing formula. This is the observation in Carr [10] for more general settings and is discussed in greater detail in Section 4.3.

It turns out that if one hedges using a contingent claim rather than the underlying itself, similar duality still persists in the sense of generalized duality that we discuss in Section 4.4. The general principles are summarized in Sections 4.4.1 and 4.4.2. A number of examples are included to illustrate their applications in financial practice. How to hedge with the popular multiple ETFs of indices is discussed in detail in Section 4.4.3. Also discussed in this section are examples of the generalized convexity of Leland's model of stock price as a contingent claim on a company's assets [31] and the generalized convexity of the normal kernel. The common theme here is that they all follow from characterizations of generalized convexity using the relative risk aversion coefficient and the absolute risk aversion coefficient. Hedging with derivatives can help to reduce the risk and to expand the range of volatility trading, which is proposed in [11]. These are discussed in Sections 4.4.4 and 4.4.5, respectively. Much of the material regarding these duality and generalized duality relationships appears here for the first time. We believe that this is an area worthy of further attention.

In addition, survey papers [14, 48, 64, 67] have also been valuable references.

References

1. K. J. Arrow, Aspects of the Theory of Risk Bearing. The Theory of Risk Aversion. Helsinki: Yrjo Jahnssonin Saatio. Reprinted in: Essays in the Theory of Risk Bearing, Markham Publ. Co., Chicago, 1971, 90–109, 1965.

2. P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent measures of risk, Mathematical Finance, 9 (1999), 203–228.

3. L. Bachelier, “Theorie de la speculation”, Annales Scientifiques de l’Ecole Normale Superieure, 3 (17), (1900) pp. 21–86.

4. D. Bernoulli. Exposition of a new theory on the measurement of risk. Econometrica, 22:23–36, 1954/1738.

5. F. Black and M. Scholes, The pricing of options and corporate liabilities, Journal of Political Economy, 81:637–645, 1973.

6. T. Bjork, Arbitrage Theory in Continuous Time, Oxford University Press, New York, 2009.

7. J.M. Borwein and A.S. Lewis, Convex analysis and nonlinear optimization, Springer-Verlag, 2000. Second edition, 2005.

8. J.M. Borwein and Q.J. Zhu, Techniques of Variational Analysis, Springer-Verlag, 2005.

9. J.M. Borwein and Q.J. Zhu, A variational approach to Lagrange multipliers, J Optim Theory Appl (2016) 171: 727-756. doi:10.1007/s10957-015-0756-2

10. P. Carr, Option as Optimization: A Dual Approach to Derivatives Pricing, Quant USA, New York, July, 2014.

11. P. Carr and D. Madan, Toward a theory of volatility trading, 2002.

12. J. Cox and S. Ross. The valuation of options for alternative stochastic processes. Journal of Financial Economics, 3:144–166, 1976.

13. J. Cox, S. Ross, and M. Rubinstein. Option pricing: a simplified approach. Journal of Financial Economics, 7:229–263, 1979.

14. K. R. Dahl, Convex Duality and Mathematical Finance, Thesis for M. Sci. at University of Oslo, 2012.

15. F. Delbaen and W. Schachermayer. A general version of the fundamental theorem of asset pricing. Math. Ann., 300:463–520, 1994.

16. S. Doleski and S. Kurcyusz, On Φ-convexity in extremal problems, SIAM J. Control and Optimization, 16, (1978) 277-300.

17. P. Dybvig and S. A. Ross. Arbitrage, state prices and portfolio theory. Handbook of the Economics of Finance, 2003.


18. W. Fenchel, Convex cones, sets and functions, Lecture Notes, Princeton University, Princeton, NJ, 1951.

19. H. Follmer, P. Protter, and A.N. Shiryaev, “Quadratic covariation and an extension of Ito’s formula”, Bernoulli, 1: 149–169, 1995.

20. A. H. Hamel and F. Heyde, Duality for set-valued measures of risk, SIAM J. Financial Math. 1 (2010) 66-95.

21. D. Gale, A geometric duality theorem with economic applications, The Review of Economic Studies, 34 (1967), 19–24.

22. J.M. Harrison and S. Pliska. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes Appl., 11:215–260, 1981.

23. J.M. Harrison and D. M. Kreps, Martingales and arbitrage in multiperiod securities markets, J. Econom. Theory, 20 (1979) 381–408.

24. S. Jaschke and U. Kuchler, Coherent risk measures and good-deal bounds, Finance Stochast. 5 (2001) 181-200.

25. P. Jorion. Value at Risk. McGraw-Hill, New York, 1997.

26. N. Kahale, Sparse calibrations of contingent claims, Mathematical Finance, 20 (2010) 105-115.

27. J. L. Kelly. A new interpretation of information rate. Bell System Technical Journal, 35:917–926, 1956.

28. A.J. King, Duality and martingale: a stochastic programming perspective on contingent claims. Math. Program., Ser. B, 91:543–562, 2002.

29. D. Kramkov and W. Schachermayer. The asymptotic elasticity of utility functions and optimal investment in incomplete markets. Ann. Appl. Probability, 9:904–950, 1999.

30. S.S. Kutateladze and A.M. Rubinov, Minkowski duality and its applications, Russian Math. Surveys, 27 (1972) 137-192.

31. H. Leland, Corporate debt value, bond covenants, and optimal capital structure. Journal of Finance, 49(4):1213–1252, September 1994.

32. J. Lintner. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics, 47:13–37, 1965.

33. M. Lopez de Prado, R. Vince, and Q.J. Zhu, Optimal Risk Budgeting under a Finite Investment Horizon, (2013) SSRN 2364092.

34. L. C. Maclean, E. O. Thorp, and W. T. Ziemba. Good and bad properties of the Kelly criterion. In “The Kelly Capital Growth Investment Criterion, Theory and Practice”, L. C. Maclean, E. O. Thorp, and W. T. Ziemba, Editors, pages 563–574, 2010.

35. L. C. Maclean, E. O. Thorp, and W. T. Ziemba, Editors. The Kelly Capital Growth Investment Criterion, Theory and Practice. World Scientific Handbook in Financial Economics Series: Volume 3. World Scientific, Singapore, 2011.

36. H. Markowitz. Portfolio Selection. Cowles Monograph, 16. Wiley, New York, 1959.

37. D. Madan, Asset Pricing Theory for Two Price Economies, 2014.

38. D. Madan and W. Schoutens, Applied Conic Finance, Cambridge University Press, 2016.

39. J. E. Martinez-Legaz, Generalized convex duality and its economic applications.

40. R. Merton, Theory of rational option pricing, Bell Journal of Economics and Management Sciences, 4:141–183, 1973.

41. J. J. Moreau, Fonctionelles convexes, College de France, Lecture notes, 1967.


42. B. Oksendal, Stochastic Differential Equations, 6th edition, Springer, 2003.

43. R. T. Rockafellar, Convex Analysis, 1970.

44. R.T. Rockafellar and S. Uryasev, Optimization of conditional value at risk, J. Risk, 2 (2000) 21–41.

45. R. T. Rockafellar and S. Uryasev, Conditional value-at-risk for general loss distributions, Journal of Banking and Finance 26 (2002) 1443–1471.

46. R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, 1998.

47. S. Roman. Introduction to the Mathematics of Finance. Springer, New York, 2004.

48. Convex duality in stochastic optimization and mathematical finance, Math. Oper. Res. 36 (2011) 340–362.

49. J. W. Pratt, Risk Aversion in the Small and in the Large, Econometrica. 32 (1–2): 122–136, 1964.

50. C. Shannon and W. Weaver, The Mathematical Theory of Communication. Urbana, Illinois: University of Illinois Press, 1949.

51. W. F. Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19:425–442, 1964.

52. W. F. Sharpe. Mutual fund performance. Journal of Business, 1:119–138, 1966.

53. S. E. Shreve, Stochastic Calculus for Finance I, Springer, 2004.

54. S. E. Shreve, Stochastic Calculus for Finance II, Springer, 2004.

55. J. M. Steele, Stochastic Calculus and Financial Applications, Springer, 2001.

56. Edward O. Thorp. Beat the Dealer. Random House, New York, 1962.

57. Edward O. Thorp. Portfolio choice and the Kelly criterion. Business and Econom. Statist. Proc. of American Statistical Association, pages 215–224, 1971.

58. Edward O. Thorp and Sheen T. Kassouf. Beat the Market. Random House, New York, 1967.

59. Y. Tsuzuki, On the optimal super and sub-hedging strategies, CARF, preprints, The University of Tokyo, 2013.

60. M. Vazifedan and Q.J. Zhu, No arbitrage principle in conic finance, working paper.

61. C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics (58), American Math. Soc. 2003.

62. R. Vince. The New Money Management: A Framework for Asset Allocation. John Wiley and Sons, New York, 1995.

63. R. Vince and Q.J. Zhu, Optimal Betting Sizes for the Game of Blackjack, SSRN 2324852, to appear Journal of Portfolio Management, 2016.

64. J. Xia and J. A. Yan, Convex duality theory for optimal investment, preprint, 2006.

65. C. Zalinescu, On duality gaps in linear conic problems, Preprint, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 2010. (www.optimization-online.org/DBHTML/2010/09/2737.html).

66. Q. J. Zhu, Mathematical analysis of investment systems, Journal of Mathematical Analysis and Applications, 326 (2007) 708-720.

67. Q. J. Zhu, Convex Analysis in Mathematical Finance, Nonlinear Analysis: Theory Method and Applications 75, (2012) 1719-1736.
