
Elements of Stochastic Analysis with application to Finance (52579)

Pavel Chigansky

Department of Statistics, The Hebrew University, Mount Scopus, Jerusalem 91905, Israel

E-mail address: [email protected]

These are the lecture notes for the graduate course I taught at the Statistics Department of the Hebrew University during the spring semester of 2011. They are an adaptation of the textbooks [1], [2] and [3]. The reader is assumed to be familiar with mathematical probability at the level of [4]. As always, the author will be delighted to get any comments, suggestions, etc.

P.Ch., Jerusalem, June, 2011

Contents

Chapter 1. Option pricing in discrete time
  1. The one period binomial market model
  2. The multi period binomial market model
  3. Multi-asset one-period model

Chapter 2. Stochastic processes in continuous time
  1. Elements of the general theory
  2. The Brownian motion and its existence
  3. Some properties of the Brownian motion
  4. Ito's stochastic integral
  5. The Ito chain rule
  6. Stochastic differential equations
  7. The Feynman-Kac formula
  8. Girsanov's change of measure

Chapter 3. Option pricing in continuous time

Bibliography

CHAPTER 1

Option pricing in discrete time

An Israeli company orders equipment from its US partner, to be shipped a year from now, and will pay 1 million US $ at the moment of delivery. Today's exchange rate of the US $ is 3.4 NIS, but in a year it may change in either direction. The products of the Israeli firm are sold on the national market and its budget is managed in the local currency. Hence it needs to secure the future payment against an increase in the exchange rate.

One obvious way to do this is to buy 1 million US $ today and put them in a bank account (perhaps enjoying a modest interest rate). However, this possibility would require holding back the funds, which could have been used for development or more profitable investments.

Alternatively, the Israeli company can buy a contract from e.g. a bank, according to which the bank will sell 1 million US $ at the rate 3.4 a year from now and the company will be obliged to buy it. Such a contract allows the company to buy the US $ at a rate not higher than 3.4, but if the actual exchange rate drops below 3.4, it will overpay.

To address this issue, a different contract is possible: the bank will sell 1 million US $ at the rate 3.4 a year from now, and the company will have the right, but not the obligation, to buy it. Hence if the actual exchange rate drops below 3.4, the company will be able to buy at the lower rate. This type of contract is called a European call option. Of course, the option writer (the bank) will charge a premium for the service (usually at the time of signing the contract). But what would be a fair price for the option? We shall see that this question leads to an elegant theory of option pricing, which will be the main subject of our course.

In this chapter we shall address the problem, assuming that the prices evolve randomly between discrete time epochs. In the next chapter we shall review some basic theory of stochastic processes and will use it to model prices evolving continuously in time. While much more technically involved, the continuous time formulation admits some explicit simple pricing formulas, widely used by practitioners.

1. The one period binomial market model

The market is the process $(B_t, S_t)$, $t = 0, 1$, where

\[ B_0 = 1, \qquad B_1 = 1 + r \]

is the risk free asset (e.g. bank account) with the interest rate $r > 0$ and

\[ S_0 = s, \qquad S_1 = S_0 Z_1 \]

is the risky asset (stocks, etc.) with the share price $s > 0$ and random yield $Z_1 \in \{1+u,\, 1+d\}$ with

\[ p_u := P(Z_1 = 1+u) > 0, \qquad p_d := P(Z_1 = 1+d) > 0. \]

Definition 1.1. A portfolio strategy $h$ is a vector $(x, y) \in \mathbb{R}^2$. The value process of the portfolio $h$ is

\[ V_t^h = x B_t + y S_t. \]

Remark 1.2. For the one-period model, the portfolio is the same for both $t = 0$ and $t = 1$: it is chosen at time $t = 0$ and its value changes from $V_0^h$ to $V_1^h$ due to the random change in the stock's price. We shall assume that negative and fractional values of $x$ and $y$ are allowed. Negative $x$ stands for borrowing money from the bank and negative $y$ means a short position in stocks. Also we shall assume a liquid market, i.e. any amount of money can be borrowed or invested in the bank and any number of shares can be bought or sold.

Definition 1.3. $h$ is an arbitrage portfolio if

\[ V_0^h = 0, \qquad P(V_1^h > 0) = 1. \]

The market is arbitrage free if no arbitrage portfolio exists.

Since $V_1^h$ takes only two values, the latter equality is equivalent to $V_1^h > 0$.

Proposition 1.4. The market as above is arbitrage free if and only if

\[ d \le r \le u. \]

Proof. (direct calculation) □

Definition 1.5. A probability $Q$ is a martingale measure if

\[ S_0 = \frac{1}{1+r} E_Q S_1, \]

where $E_Q$ is the expectation with respect to $Q$.

Proposition 1.6. The one period binomial model is arbitrage free if and only if there exists a martingale measure.

Proof. Note that $Q$ is defined by its values at $1 + u$ and $1 + d$:

\[ q_u = Q(Z_1 = 1+u), \qquad q_d = Q(Z_1 = 1+d), \]

and

\[ E_Q S_1 = s(1+d) q_d + s(1+u) q_u. \]

Suppose there exists a martingale measure, i.e. there are $q_u \ge 0$ and $q_d \ge 0$, $q_u + q_d = 1$, such that

\[ s = \frac{1}{1+r} \big( s(1+d) q_d + s(1+u) q_u \big). \]

This implies $r = d q_d + u q_u$, i.e. $d \le r \le u$, and by Proposition 1.4 the market is arbitrage free. The converse claim is obtained by reversing the implications. □

Corollary 1.7. If the market is arbitrage free, the martingale measure is unique and is given by

\[ q_u = \frac{r - d}{u - d}, \qquad q_d = \frac{u - r}{u - d}. \]

Proof. Solve the equations $r = d q_d + u q_u$ and $1 = q_u + q_d$. □

Definition 1.8. A simple contingent claim is a random variable $X$ of the form $X = \Phi(Z_1)$, where $\Phi$ is a contract function.

Example 1.9. For the European call option,

\[ \Phi(z) = (sz - K)^+ = \begin{cases} 0, & sz < K \\ sz - K, & sz \ge K \end{cases} \]

where $K$ is the strike price (part of the contract). ■

Definition 1.10. $\Pi(t; X)$, $t = 0, 1$, is the arbitrage free price process of the contingent claim $X$ if the extended market $(B, S, \Pi)$ is arbitrage free.

The arbitrage free price of the option is "fair" in the sense that any deviation from this price creates arbitrage on the market, where the option is traded as an asset. In particular, the "fair" price of the option (premium) is $\Pi(0; X)$.

Definition 1.11. A contingent claim $X$ is reachable if there is a portfolio $h$, called hedging or replicating portfolio, such that

\[ V_1^h = X, \quad P\text{-a.s.} \]

Definition 1.12. The market is complete if any contingent claim is reachable.

Proposition 1.13. If $X$ is hedged by $h$, the price process $\Pi(t; X) := V_t^h$ is arbitrage free.

Proof. (exercise) □

Proposition 1.14. The binomial market is complete and a contingent claim $X = \Phi(Z_1)$ is hedged by the portfolio

\[ x = \frac{(1+u)\Phi(1+d) - (1+d)\Phi(1+u)}{(1+r)(u-d)}, \qquad y = \frac{\Phi(1+u) - \Phi(1+d)}{s(u-d)}. \]

Proof. Since $V_1^h = x(1+r) + y S_1$, the requirement $V_1^h = \Phi(Z_1)$ implies

\[ x(1+r) + y s(1+u) = \Phi(1+u) \]
\[ x(1+r) + y s(1+d) = \Phi(1+d). \]

The claimed answer is obtained by solving these equations. □

Corollary 1.15. (Risk neutral valuation formula) If the market is arbitrage free, then

\[ \Pi(0; X) = \frac{1}{1+r} E_Q X, \]

where $Q$ is the martingale measure.

Note that the valuation formula suggests that the "fair" price (premium) of the contingent claim is the discounted expected value of $X$ with respect to the martingale measure, rather than the objective measure $P$.
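The formulas of Corollary 1.7, Proposition 1.14 and Corollary 1.15 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `one_period_binomial` and the market parameters in the usage lines are hypothetical, chosen only for illustration.

```python
def one_period_binomial(s, r, u, d, phi):
    """Martingale measure, hedge and premium for X = phi(Z1) in the
    one-period binomial model (illustrative sketch)."""
    if not (d <= r <= u and d < u):
        raise ValueError("need d <= r <= u with d < u (Proposition 1.4)")
    # Martingale measure of Corollary 1.7
    q_u = (r - d) / (u - d)
    q_d = (u - r) / (u - d)
    # Claim values in the two states of the world
    phi_u, phi_d = phi(1 + u), phi(1 + d)
    # Replicating portfolio of Proposition 1.14
    x = ((1 + u) * phi_d - (1 + d) * phi_u) / ((1 + r) * (u - d))
    y = (phi_u - phi_d) / (s * (u - d))
    # Risk neutral valuation formula of Corollary 1.15
    price = (q_u * phi_u + q_d * phi_d) / (1 + r)
    return q_u, q_d, x, y, price


# Hypothetical market: s = 100, r = 5%, u = 20%, d = -10%, call with strike 105.
s, r, u, d, K = 100.0, 0.05, 0.2, -0.1, 105.0
call = lambda z: max(s * z - K, 0.0)
q_u, q_d, x, y, price = one_period_binomial(s, r, u, d, call)
print(q_u, q_d, x, y, price)
```

The initial value of the hedge, $x + ys$, coincides with the discounted $Q$-expectation, which is exactly how Corollary 1.15 follows from Propositions 1.13 and 1.14.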

1.1. Exercises.

Problem 1.1.1. Find the arbitrage free price and the replicating portfolio of the European call option with the strike price $K = 3.6$, for the following one-period binomial market with the bond price dynamics:

\[ B_0 = 1, \qquad B_1 = 1 + r, \]

where $r = 0.1$ is the interest rate, and the stock price dynamics:

\[ S_0 = s, \qquad S_1 = S_0 Z_1, \]

where $s = 3.5$ NIS/US$ and $Z_1$ is a random variable taking values in $\{1+u,\, 1+d\}$, with $(u, d) = (0.2, -0.2)$ and probabilities

\[ P(Z_1 = 1+u) = p_u = 0.4, \qquad P(Z_1 = 1+d) = p_d = 0.6. \]

Problem 1.1.2. This problem recalls some of the basic notions of mathematical probability and exercises them in the simple setting of a finite probability space.

(1) Let $\Omega = \{\omega_1, ..., \omega_n\}$ for some fixed $n \ge 2$. Recall that the power set of $\Omega$, denoted $2^\Omega$, is the set of all subsets of $\Omega$ (including $\emptyset$). Argue that $2^\Omega$ is finite with cardinality $|2^\Omega| = 2^n$, i.e. it contains $2^n$ elements.

(2) Show that $\mathcal{F} := 2^\Omega$ is an algebra of sets, i.e. it is closed under taking intersections and complements (and thus also unions).

(3) Recall that a probability measure on $(\Omega, \mathcal{F})$ is a function $P : \mathcal{F} \mapsto [0, 1]$ (from sets (events) to numbers in the interval $[0, 1]$), such that $P(\Omega) = 1$ and $P(A \cup B) = P(A) + P(B)$ whenever $A \cap B = \emptyset$ (additivity property). Using these axioms, show that $P$ is determined by its values on the singletons $\{\omega\}$, $\omega \in \Omega$.

Consider a probability space $(\Omega, \mathcal{F})$ with two probability measures $P$ and $Q$ defined on it. The probability $Q$ is absolutely continuous with respect to $P$, denoted $Q \ll P$, if

\[ P(A) = 0 \implies Q(A) = 0, \quad \forall A \in \mathcal{F}. \]

If both $Q \ll P$ and $P \ll Q$, then $Q$ and $P$ are said to be equivalent, denoted $Q \sim P$. The measures $P$ and $Q$ are orthogonal, denoted by $Q \perp P$, if there exists a set $A$ such that $P(A) = 0$ and $Q(A) = 1$.

(4) Let $P$ be the uniform measure on $\Omega$, i.e. $P(\omega) = 1/n$, and $Q$ be defined by

\[ Q(\omega) = \begin{cases} 1/2, & \omega \in \{\omega_1, \omega_2\} \\ 0, & \text{otherwise} \end{cases} \]

Is $P \ll Q$? Is $Q \ll P$? Is $Q \sim P$? Is $Q \perp P$?

(5) For $P$ and $Q$ from the previous question, characterize the sets of measures which are (a) absolutely continuous w.r.t. $P$; (b) equivalent to $P$; (c) absolutely continuous w.r.t. $Q$; (d) equivalent to $Q$; (e) orthogonal to $P$; (f) orthogonal to $Q$.

(6) Do any two measures have to be either orthogonal or equivalent? If not, give a counterexample.

Recall that the expectation of a random variable $X : \Omega \mapsto \mathbb{R}$ with respect to $P$ is$^{1}$

\[ E_P X = \sum_i X(\omega_i) P(\omega_i). \]

(7) Calculate $E_P X$ and $E_Q X$ of $X(\omega) = 1_{\{\omega \in \{\omega_1, \omega_n\}\}}$.

(8) Show that if $\xi$ is a nonnegative random variable with $E_P \xi = 1$, then the set function

\[ Q(A) := E_P\, \xi 1_{\{\omega \in A\}} \]

is a probability measure.

(9) Show that if $Q \ll P$, then there exists a unique nonnegative random variable $\xi : \Omega \mapsto \mathbb{R}$, called the Radon-Nikodym derivative of $Q$ with respect to $P$ and denoted $\frac{dQ}{dP}(\omega)$, such that

\[ Q(A) = E_P\, \xi 1_{\{\omega \in A\}}. \]

Find $\xi$ for $P$ and $Q$ as above.

$^{1}$ The expectations for uncountable $\Omega$ are defined as Lebesgue integrals with respect to a (probability) measure.


(10) Show that if $Q \sim P$, then

\[ \frac{dQ}{dP}(\omega) = \Big( \frac{dP}{dQ}(\omega) \Big)^{-1}, \quad \forall \omega \in \Omega. \]

(11) Prove the change of measure formula for expectations: for a bounded variable $X$ and $Q \ll P$,

\[ E_Q X = E_P \frac{dQ}{dP} X. \]

Calculate $E_Q X$ for $X$ from (7), using this formula.

(12) Recall that the conditional expectation of a r.v. $X$ given the r.v. $Y$ under probability $P$ is the random variable$^{2}$

\[ E_P(X|Y)(\omega) = \sum_{y : P(Y = y) > 0} \frac{E_P X 1_{\{Y = y\}}}{P(Y = y)} 1_{\{Y(\omega) = y\}}. \]

(13) Calculate $E_P(X|Y)$ and $E_Q(X|Y)$ for $X$ as in (7) and

\[ Y = 1_{\{\omega \in \{\omega_1, \omega_2\}\}}. \]

(14) Show that

\[ E_P \big( X - E_P(X|Y) \big) 1_{\{Y = y\}} = 0, \quad \forall y \in \mathbb{R}, \]

and consequently

\[ E_P \big( X - E_P(X|Y) \big)^2 \le E_P \big( X - \varphi(Y) \big)^2 \]

for any function $\varphi$. In other words, the conditional expectation of $X$ given $Y$ minimizes the mean square error in the problem of predicting the value of $X$, given the realization of $Y$.

(15) Prove the following Bayes formula (or change of measure for conditional expectations)

\[ E_Q(X|Y) = \frac{E_P \big( X \frac{dQ}{dP} \,\big|\, Y \big)}{E_P \big( \frac{dQ}{dP} \,\big|\, Y \big)} \]

and apply it for $X$ and $Y$ as above.

(16) Explain why all the definitions above can be extended to countably infinite $\Omega$. Which of the definitions/questions above are meaningless for $\Omega = [0, 1]$? We shall see later how the notions introduced above can be defined in a much more general setting than countable $\Omega$ (ultimately, for $\Omega$ which is a set of functions on the time interval $[0, T]$).

$^{2}$ $\{Y = y\}$ is the usual abbreviation for $\{\omega : Y(\omega) = y\}$.


Problem 1.1.3. Consider the following definition of arbitrage portfolio:

\[ V_0^h = 0, \qquad V_1^h \ge 0, \qquad P(V_1^h > 0) > 0. \]

(1) Explain why the above definition of arbitrage is weaker than the one defined in class.

(2) Show that the market is arbitrage free in this sense if and only if

\[ d < r < u. \]

(3) Show that the market is arbitrage free if and only if there exists an equivalent martingale measure, i.e. a probability measure $Q$ such that $Q \sim P$ and

\[ S_0 = \frac{1}{1+r} E_Q(S_1 | S_0). \]

2. The multi period binomial market model

In this section we consider the market model $(B, S) = (B_t, S_t)_{t \in \{0, ..., T\}}$, where

\[ B_t = B_{t-1}(1 + r), \quad t = 1, ..., T, \qquad B_0 = 1 \]

is the price process of a risk free asset (e.g. bank account), with the interest rate$^{3}$ $r \ge 0$, and

\[ S_t = S_{t-1} Z_t, \quad t = 1, ..., T, \qquad S_0 = s \]

is the stock price process, with initial price $s > 0$, where $Z_1, ..., Z_T$ are i.i.d. r.v. with

\[ p_u = P(Z_1 = 1+u) > 0, \qquad p_d = P(Z_1 = 1+d) > 0. \]

Definition 2.1. A portfolio $h$ is a stochastic process $h = (h_t) = (x_t, y_t)$, $t = 0, ..., T$, which is predictable with respect to the price process, i.e. $h_t$ is measurable with respect to $\mathcal{F}^S_{t-1} = \sigma\{S_0, ..., S_{t-1}\}$ ($h_0 = h_1$ is set by convention). The corresponding value process is

\[ V_t^h = x_t(1 + r) + y_t S_t, \quad t = 0, ..., T. \]

Remark 2.2. $x_t$ and $y_t$ are the amounts of money in the bank and stocks respectively. At time $t = 0$, one chooses the portfolio $h_0 = (x_0, y_0)$, whose value is

\[ V_0^h = x_0 + y_0 s = x_1 + y_1 s. \]

$^{3}$ $r \ge 0$ is assumed, since otherwise arbitrage is possible (think why). If, however, one must hold her money either in the bank or in stocks (and not at home with zero interest rate), then $r > -1$ makes sense as well.


This portfolio is kept till time $t = 1$, when the price is updated to $S_1 = s Z_1$. Now the value of the portfolio is

\[ V_1^h = x_1(1 + r) + y_1 S_1. \]

Since more information on the market is available at this moment, it makes sense to update the portfolio to $h_2$. This portfolio is kept till time $t = 2$, when the new price $S_2$ is revealed and the value of the portfolio becomes

\[ V_2^h = x_2(1 + r) + y_2 S_2. \]

Again, the portfolio is updated to rely on more data, namely $S_0, S_1, S_2$, etc.

Definition 2.3. A portfolio $h$ is self-financing if

\[ x_t(1 + r) + y_t S_t = x_{t+1} + y_{t+1} S_t, \quad t = 1, ..., T - 1. \]

The latter means that the rebalancing of the portfolio is made without bringing in or taking out any external funds.
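The self-financing condition is easy to test along a single realized price path. The sketch below is only an illustration under stated assumptions: the helper `is_self_financing`, the price path and the holdings are all hypothetical, and index 0 of the holdings is a placeholder (recall the convention $h_0 = h_1$).

```python
def is_self_financing(x, y, S, r, tol=1e-12):
    """Check x_t(1+r) + y_t S_t == x_{t+1} + y_{t+1} S_t for t = 1..T-1
    along one price path S = [S_0, ..., S_T] (illustrative sketch)."""
    T = len(S) - 1
    return all(
        abs(x[t] * (1 + r) + y[t] * S[t] - (x[t + 1] + y[t + 1] * S[t])) <= tol
        for t in range(1, T)
    )


# Hypothetical two-period path and holdings (index 0 is unused).
r = 0.1
S = [100.0, 120.0, 90.0]
x_ok, y_ok = [0.0, 0.0, 60.0], [1.0, 1.0, 0.5]    # sell half a share, bank the 60
x_bad, y_bad = [0.0, 0.0, 70.0], [1.0, 1.0, 0.5]  # 10 units of external money added
print(is_self_financing(x_ok, y_ok, S, r))
print(is_self_financing(x_bad, y_bad, S, r))
```

In the first portfolio the value 120 at time $t = 1$ is merely redistributed between the bank and the stock; in the second one the same stock position is combined with a larger bank deposit, which is possible only with outside funds.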

Definition 2.4. An arbitrage possibility is a self-financing portfolio $h$ with the property

\[ V_0^h = 0, \qquad P(V_T^h \ge 0) = 1, \qquad P(V_T^h > 0) > 0. \]

Proposition 2.5. If the binomial market is arbitrage free, then $d < r < u$.

Proof. (exercise) □

Definition 2.6. $Q$ is an equivalent martingale measure (EMM) of the market, if $Q \sim P$ and the process $S_t/(1 + r)^t$ is a martingale under $Q$, i.e.

\[ S_{t-1} = \frac{1}{1+r} E_Q \big( S_t | S_{t-1} \big), \quad Q\text{-a.s.}, \quad t = 1, ..., T. \]

Proposition 2.7. For the arbitrage free binomial market, there exists a unique EMM $Q$, under which $Z_1, ..., Z_T$ are i.i.d. and

\[ Q(Z_1 = 1+u) = \frac{r - d}{u - d} > 0, \qquad Q(Z_1 = 1+d) = \frac{u - r}{u - d} > 0. \]

Proof. (direct calculation) □

Theorem 2.8 (the First Fundamental Theorem). The binomial market is arbitrage free if and only if there exists at least one EMM.

Proof. One direction follows from Propositions 2.5 and 2.7. Conversely, assume that there exists an EMM $Q$ and suppose that $h$ is an arbitrage possibility. Since $h$ is an arbitrage possibility, $V_0^h = 0$ and thus (recall the first home assignment)

\[ E_Q V_0^h = E_P \frac{dQ}{dP} V_0^h = 0. \]

Since $Q \sim P$, $P(V_T^h \ge 0) = 1$ implies $Q(V_T^h \ge 0) = 1$, and $P(V_T^h > 0) > 0$ implies $Q(V_T^h > 0) > 0$; hence $E_Q V_T^h > 0$ (why?). On the other hand, for any self-financing portfolio $h$ (exercise):

\[ E_Q V_T^h = (1 + r)^T E_Q V_0^h = 0, \]

which is a contradiction. Hence the market is arbitrage free. □

Definition 2.9. A simple contingent claim is a random variable of the form $X = \Phi(S_T)$, where $\Phi$ is the contract function.

An option bought at the time $t = 0$ guarantees possible profit at the future time $T$ and hence at any time $t = 1, ..., T - 1$ has a value and can be considered as an asset on its own$^{4}$. Consequently, the option can be traded on the market, which raises the question of how to choose a "fair" price for it at each time $t = 0, ..., T$. If the underlying market is arbitrage free, it makes sense to price the option so that the extended market, consisting of the original assets and the option, remains arbitrage free.

One simple consequence of this principle is that the only possible choice of the price at time $T$ is $\Pi(T; X) := X$. Indeed, if we trade the option at e.g. a smaller price $\Pi(T; X) < X$, one can easily construct an arbitrage portfolio: do nothing till time $T$ (i.e. choose $x_t = 0$ and $y_t = 0$ for $t = 0, ..., T$) and at time $T$ buy the option for $\Pi(T; X)$ and sell it for $X$, gaining $X - \Pi(T; X) > 0$.

The question remains as to how the arbitrage free price is chosen at any other time $t = 0, ..., T - 1$. A handy tool to calculate this price is the following notion:

Definition 2.10. A contingent claim $X$ is reachable if there exists a self-financing portfolio $h$, called hedge or replicating portfolio, such that $V_T^h = X$, $P$-a.s.

If a hedge exists for the given contingent claim $X$, its value process is an arbitrage free price for the option:

\[ \Pi(t; X) := V_t^h, \quad t = 0, ..., T. \]

Indeed, suppose that at some $t \le T$ the option is offered on the market at a price $\Pi(t; X) < V_t^h$. Then we wait till time $t$ ($x_s = 0$ and $y_s = 0$, $s = 0, ..., t - 1$), buy at time $t$ the option at the price $\Pi(t; X)$ and sell it as the hedge for $X$ at the price $V_t^h$, investing the profit $V_t^h - \Pi(t; X) > 0$ in the bank account, thus creating arbitrage.

Hence the problem of pricing of a contingent claim on an arbitrage free market is reduced to finding a hedge and calculating its value process. As we shall see shortly, this is indeed possible for the binomial market at hand. In fact, the binomial market is complete, i.e. a hedge can be found for all (simple) contingent claims, which gives the ultimate solution to the arbitrage free pricing problem.

$^{4}$ Indeed, options are routinely traded on the stock markets.


Remark 2.11. If a market is arbitrage free, but incomplete, and a contingent claim $X$ cannot be hedged, the pricing principle outlined above is not applicable. We shall encounter such examples in a more general market model, considered in the next section.

Proposition 2.12. The arbitrage free binomial model is complete and any simple contingent claim $X = \Phi(S_T)$ can be replicated by the self-financing portfolio $h = (x_t, y_t)$, $t = 0, ..., T$, with

\[ x_t(k) = \frac{1}{1+r} \, \frac{(1+u) V_t(k) - (1+d) V_t(k+1)}{u - d}, \quad k = 0, ..., t \]

\[ y_t(k) = \frac{1}{S_{t-1}} \, \frac{V_t(k+1) - V_t(k)}{u - d}, \]

where $V_t(k)$ is the value of the portfolio at the $(t, k)$-node of the corresponding binomial tree, satisfying the backward recursion:

\[ V_t(k) = \frac{1}{1+r} \big( q_u V_{t+1}(k+1) + q_d V_{t+1}(k) \big), \quad k = 0, ..., t \]

\[ V_T(k) = \Phi\big( s(1+u)^k (1+d)^{T-k} \big) \]

with

\[ q_u = \frac{r - d}{u - d}, \qquad q_d = \frac{u - r}{u - d}. \]

In particular, the arbitrage free premium for the contingent claim $X$ is given by:

\[ \Pi(0; X) = V_0(0) = \frac{1}{(1+r)^T} \sum_{k=0}^{T} \binom{T}{k} q_u^k q_d^{T-k} \, \Phi\big( s(1+u)^k (1+d)^{T-k} \big) = \frac{1}{(1+r)^T} E_Q \Phi(S_T), \tag{2.1} \]

where $Q$ is the unique EMM (guaranteed for the arbitrage free binomial market by Theorem 2.8).

Proof. (the binomial algorithm, demonstrated in class through a numerical example) □

Remark 2.13. (2.1) suggests the risk neutral valuation pricing formula for the contingent claim $X$:

\[ \Pi(0; X) = \frac{1}{(1+r)^T} E_Q X. \]

Note that the hedge $h$ does not appear in this formula anymore. Remarkably, this formula remains valid even in incomplete arbitrage free markets for the claims which cannot be replicated (cf. Remark 2.11).
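The backward recursion of Proposition 2.12 and the closed form (2.1) can be compared numerically. The following is a minimal sketch under stated assumptions: the function names and the market parameters are hypothetical and are not taken from the notes or their exercises.

```python
from math import comb


def martingale_probs(r, u, d):
    """q_u and q_d of Proposition 2.7; requires d < r < u."""
    if not d < r < u:
        raise ValueError("need d < r < u (Proposition 2.5)")
    return (r - d) / (u - d), (u - r) / (u - d)


def binomial_price(s, r, u, d, T, phi):
    """Premium V_0(0) via the backward recursion for V_t(k)."""
    q_u, q_d = martingale_probs(r, u, d)
    # Terminal layer: V_T(k) = phi(s (1+u)^k (1+d)^(T-k)), k = 0..T
    V = [phi(s * (1 + u) ** k * (1 + d) ** (T - k)) for k in range(T + 1)]
    # Step back one period at a time; layer t has nodes k = 0..t
    for t in range(T - 1, -1, -1):
        V = [(q_u * V[k + 1] + q_d * V[k]) / (1 + r) for k in range(t + 1)]
    return V[0]


def binomial_price_closed_form(s, r, u, d, T, phi):
    """Premium via the binomial sum in (2.1)."""
    q_u, q_d = martingale_probs(r, u, d)
    total = sum(
        comb(T, k) * q_u ** k * q_d ** (T - k)
        * phi(s * (1 + u) ** k * (1 + d) ** (T - k))
        for k in range(T + 1)
    )
    return total / (1 + r) ** T


# Hypothetical market: s = 100, r = 5%, u = 20%, d = -10%, T = 3, call with strike 100.
call = lambda s_T: max(s_T - 100.0, 0.0)
p_rec = binomial_price(100.0, 0.05, 0.2, -0.1, 3, call)
p_sum = binomial_price_closed_form(100.0, 0.05, 0.2, -0.1, 3, call)
print(p_rec, p_sum)
```

The recursion needs of order $T^2$ operations and also produces the intermediate values $V_t(k)$ required for the hedge, whereas the sum in (2.1) gives only the premium.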


2.1. Exercises.

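Problem 1.2.1. Let $\Omega = \{\omega_1, ..., \omega_m\}$ and $\mathcal{F}$ be the power set$^{5}$ of $\Omega$. Let $X$ be a random variable $X : \Omega \mapsto \mathbb{R}^d$, $d \ge 1$, and let $R_X$ be the range of $X$, i.e. $R_X = \{X(\omega) : \omega \in \Omega\}$.

(1) Argue that $R_X$ is a finite set with cardinality$^{6}$ at most $m$.

(2) Let $\mathcal{F}^X$ be the following collection of subsets of $\Omega$:

\[ \mathcal{F}^X := \big\{ \{\omega \in \Omega : X(\omega) \in \Gamma\} : \Gamma \subseteq R_X \big\}. \]

Show that $\mathcal{F}^X$ is an algebra of subsets$^{7}$ of $\Omega$ and $\mathcal{F}^X \subseteq \mathcal{F}$.

(3) Suppose $\omega$ (the outcome of the experiment) is not known to us and we see only $X(\omega)$. An event $A$ occurs if $\omega \in A$. Explain how the value of $X(\omega)$ tells whether any event in $\mathcal{F}^X$ occurred or not. Conversely, suppose that we know whether each event in $\mathcal{F}^X$ occurred or not. Explain why this reveals the value of $X(\omega)$.

(4) Let $X$ and $Y$ be random variables on $\Omega$. Show that $\mathcal{F}^Y \subseteq \mathcal{F}^X$ if and only if there exists a function $\varphi : R_X \mapsto R_Y$, such that $Y(\omega) = \varphi\big(X(\omega)\big)$ for all $\omega \in \Omega$.

(5) Argue that $\mathcal{F}^Y = \mathcal{F}^X$ if and only if there exists a one-to-one function $\varphi$, such that $Y(\omega) = \varphi\big(X(\omega)\big)$ for all $\omega \in \Omega$.

(6) Give an example of a pair of random variables, so that neither $\mathcal{F}^X \subseteq \mathcal{F}^Y$ nor $\mathcal{F}^Y \subseteq \mathcal{F}^X$ holds.

(7) Show that $\mathcal{F}^X \cap \mathcal{F}^Y$ is an algebra of subsets of $\Omega$. Give an example of $\mathcal{F}^X$ and $\mathcal{F}^Y$ so that $\mathcal{F}^X \cup \mathcal{F}^Y$ is not an algebra.

(8) Let $\mathcal{F}^X \vee \mathcal{F}^Y$ be the minimal algebra which contains $\mathcal{F}^X \cup \mathcal{F}^Y$. Let $\mathcal{F}^{X,Y}$ be the algebra generated by $(X, Y)$, i.e.

\[ \mathcal{F}^{X,Y} = \big\{ \{\omega \in \Omega : X(\omega) \in \Gamma_1, Y(\omega) \in \Gamma_2\} : \Gamma_1 \subseteq R_X, \Gamma_2 \subseteq R_Y \big\}. \]

Show that $\mathcal{F}^{X,Y} = \mathcal{F}^X \vee \mathcal{F}^Y$.

(9) Show that $\mathcal{F}^X \subseteq \mathcal{F}^{X,Y}$.

Problem 1.2.2. Let $\xi \ge 0$ be a real valued random variable$^{8}$ on a probability space $(\Omega, \mathcal{F}, P)$. Show that $P(\xi > 0) > 0$ implies $E\xi > 0$.

Hint: explain why $\{\omega : \xi(\omega) > 0\} = \cup_{n \ge 1} \{\omega : \xi(\omega) \ge 1/n\}$ and use the Chebyshev inequality to show that $E\xi = 0$ implies $P(\xi > 0) = 0$.

$^{5}$ The set of all subsets of $\Omega$.
$^{6}$ The cardinality of a finite set is the number of its points.
$^{7}$ I.e. it is closed under intersections and complements (and thus unions). Note that $\emptyset$ is automatically added to $\mathcal{F}^X$.
$^{8}$ Do not assume that it takes values in a finite or countable set.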


Problem 1.2.3. Show that $\mathcal{F}_t^S = \mathcal{F}_t^Z$ for all $t = 1, ..., T$.

Hint: use the results from Problem 1.2.1.

Problem 1.2.4. Find the arbitrage free price of the European call option with the strike price $K = 80$ and maturity time $T = 3$ for the binomial market with the bank account (interest rate $r = 0.1$),

\[ B_0 = 1, \qquad B_t = (1 + r) B_{t-1}, \quad t = 1, 2, 3 \]

and the stock price dynamics:

\[ S_0 = s, \qquad S_t = S_{t-1} Z_t, \quad t = 1, 2, 3, \]

where $s = 80$, and the $Z_t$'s are i.i.d. random variables with values in $\{1+u,\, 1+d\}$, $(u, d) = (0.5, -0.5)$, and

\[ P(Z_1 = 1+u) = p_u = 0.6, \qquad P(Z_1 = 1+d) = p_d = 0.4. \]

Problem 1.2.5. Let $(B, S) = (B_t, S_t)_{t \in \{0, ..., T\}}$ be a multi-period binomial market. Prove that the arbitrage free price of a simple contingent claim $X = \Phi(S_T)$ is given by

\[ \Pi(0; X) = \frac{1}{(1+r)^T} E_Q X = \frac{1}{(1+r)^T} \sum_{t=0}^{T} \binom{T}{t} q_u^t q_d^{T-t} \, \Phi\big( s(1+u)^t (1+d)^{T-t} \big), \]

where $Q$ is the EMM, under which the stock yields $Z_t$, $t = 1, ..., T$, are i.i.d. with $q_u = Q(Z_1 = 1+u)$ and $q_d = Q(Z_1 = 1+d)$.

Problem 1.2.6. Prove the claim of Proposition 2.5.

Hint: construct a concrete arbitrage possibility $h$.

Problem 1.2.7. Assume that $d < r < u$ and let $Q$ be the equivalent martingale measure (we saw that there is one and it is unique). Show that the value process $V_t^h$, $t = 0, ..., T$, of a self-financing portfolio satisfies

\[ \frac{1}{1+r} E_Q \big( V_t^h \,|\, \mathcal{F}^S_{t-1} \big) = V_{t-1}^h, \quad t = 1, ..., T. \]


Problem 1.2.8. Show that the binomial market is arbitrage free if there exists an EMM.

Hint: assume that there exist an EMM $Q$ and an arbitrage portfolio $h$ and get a contradiction (Problems 1.2.7 and 1.2.2 can be useful).

3. Multi-asset one-period model

Consider the market model $(S_t)$, $t = 0, 1$, consisting of $N$ assets:

\[ S_t = \begin{pmatrix} S_t^1 \\ \vdots \\ S_t^N \end{pmatrix}, \quad t = 0, 1, \]

where $S_0 = s \in \mathbb{R}^N$ is a deterministic vector and $S_1(\omega)$ is a random vector on the probability space $\Omega = \{\omega_1, ..., \omega_M\}$ of $M$ outcomes with $p_i := P(\omega_i) > 0$:

\[ S_1(\omega) \in \big\{ S_1(\omega_1), ..., S_1(\omega_M) \big\}. \]

We shall make the following

Assumption 3.1.

\[ S_0^1 > 0 \ \ (\text{w.l.o.g. } S_0^1 := 1), \qquad S_1^1(\omega) > 0, \quad \forall \omega \in \Omega, \]

and will refer to the first asset $S_t^1$ as numeraire. In particular, the case $S_1^1 = (1 + r)$ is the familiar risk free asset (bank account with the interest rate $r > 0$).

The corresponding notions of the portfolio, etc. are adjusted appropriately:

Definition 3.2. The value of the portfolio $h = (h^1, ..., h^N)$ is

\[ V_t^h = \sum_{i=1}^{N} h^i S_t^i =: h \cdot S_t. \]

Definition 3.3. The portfolio $h$ is an arbitrage possibility if

\[ V_0^h = 0, \qquad P(V_1^h(\omega) \ge 0) = 1, \qquad P(V_1^h(\omega) > 0) > 0. \]

For technical convenience define the normalized market $\tilde S_t := S_t / S_t^1$. The asset corresponding to the normalizing price is called numeraire (see Assumption 3.1 above) and $\tilde S_t$ are the asset prices under the numeraire $S_t^1$. Note that $\tilde S_t$ corresponds to the market which includes a zero interest rate bank account.

Proposition 3.4. Let $V_t^h = h \cdot S_t$ as before and $\tilde V_t^h := h \cdot \tilde S_t$, $t = 0, 1$. Then


(1) $\tilde V_t^h = V_t^h / S_t^1$

(2) a portfolio $h$ is an arbitrage on the $S$-market if and only if it is an arbitrage on the $\tilde S$-market

(3) $\tilde S_t^1(\omega) = 1$ for all $\omega \in \Omega$.

Proof. (by inspection) □

Definition 3.5. A probability $Q$ is an EMM if $Q \sim P$ and $\tilde S_t$ is a martingale under $Q$:

\[ S_0 = E_Q \frac{S_1}{S_1^1}. \]

Remark 3.6. For $S_1^1 := (1 + r)$, the familiar definition from the previous models is recovered.

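Theorem 3.7 (the First Fundamental Theorem of Finance). The market is arbitrage free if and only if there exists an EMM $Q$.

Proof. We shall exhibit a vector of strictly positive probabilities $q$, such that

\[ S_0 = \sum_{i=1}^{M} \tilde S_1(\omega_i) q_i. \tag{3.1} \]

To this end, define the $N \times M$ matrices

\[ D := \begin{pmatrix} S_1^1(\omega_1) & \dots & S_1^1(\omega_M) \\ \vdots & & \vdots \\ S_1^N(\omega_1) & \dots & S_1^N(\omega_M) \end{pmatrix}, \qquad \tilde D := \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ \tilde S_1^N(\omega_1) & \dots & \tilde S_1^N(\omega_M) \end{pmatrix}. \]

By Proposition 3.4, the market has an arbitrage if and only if there exists a portfolio $h$, such that

\[ h \cdot S_0 = 0, \qquad h \tilde D \ge 0, \qquad h \tilde D u > 0, \]

where $u$ is an arbitrary vector with strictly positive entries (e.g. $u := p$ can be taken). The latter can be rewritten in a form which fits the Farkas lemma$^{9}$:

\[ h \cdot S_0 \ge 0, \qquad h \cdot (-S_0) \ge 0, \qquad h \tilde D \ge 0, \qquad h(-\tilde D u) < 0. \]

$^{9}$ Lemma [Farkas]. For any vectors $d_0, ..., d_k$ in $\mathbb{R}^n$, exactly one of the two problems has a solution:

(1) find numbers $\lambda_i \ge 0$, $i = 1, ..., k$, such that $d_0 = \sum_{i=1}^{k} \lambda_i d_i$

(2) find a row vector $h \in \mathbb{R}^n$ such that $h \cdot d_0 < 0$ and $h \cdot d_i \ge 0$ for all $i = 1, ..., k$.

Hence, by the Farkas lemma, the market is arbitrage free if and only if the problem

\[ -\tilde D u = \underbrace{\big[ \tilde D \,|\, S_0 \,|\, {-S_0} \big]}_{\in \mathbb{R}^{N \times (M+2)}} \lambda \]

has a solution $\lambda$ with $\lambda_i \ge 0$. Set $\beta_i := \lambda_i$, $i = 1, ..., M$, and $\alpha := \lambda_{M+2} - \lambda_{M+1}$; then the latter reads:

\[ \alpha S_0 = \tilde D(\beta + u). \]

The first row of this system of equalities yields $\alpha = \sum_{i=1}^{M} \beta_i + \sum_i u_i > 0$ and thus (3.1) holds with $q := (\beta + u)/\alpha$ as required. □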

Now we approach the question of pricing the contingent claims.

Definition 3.8. A (simple) contingent claim $X$ is a random variable of the form $\Phi(S_1)$.

Let $\Pi(t; X)$ denote the price of the contingent claim at times $t = 0, 1$. As in the previous models, the arbitrage free price is obviously $\Pi(1; X) := X$. Our goal is to choose the price $\Pi(0; X)$ (i.e. the premium for the contingent claim), so that the extended market $(S, \Pi)$ is arbitrage free. To this end, note that $\Pi(t; X)$ can be viewed as an asset of the type which our model supports, once we fix a deterministic price for it at time $t = 0$ (i.e. it has a fixed price at $t = 0$ and a random price, depending on $S_1$, at $t = 1$). Hence the extended market $(S, \Pi)$ fits the framework of the model under consideration. If the original market $S$ is arbitrage free, then there exists a (possibly nonunique) EMM $Q$ and if we set

\[ \Pi(0; X) := E_Q\, \Phi(S_1)/S_1^1, \tag{3.2} \]

the extended market $(S, \Pi)$ remains arbitrage free by Theorem 3.7. Thus we proved the

Proposition 3.9 (risk-neutral valuation formula). Let $S$ be an arbitrage free market and $Q$ be a corresponding EMM, then (3.2) is the arbitrage free price of the contingent claim $X$ under the numeraire $S_t^1$.

Remark 3.10. Note that we didn't appeal to replicating portfolios to find the arbitrage free price of $X$, unlike in the case of the binomial model. In the binomial market the risk-neutral pricing formula is an outcome of completeness of the market: once we know that a contingent claim can be hedged, its arbitrage free price is constructed using the value process of the hedge.

The essentially new element of the multi-asset market is that it may be incomplete:

Proposition 3.11. The multi-asset one-period market is complete, i.e. any contingent claim can be hedged by a replicating portfolio, if and only if

\[ \mathrm{rank}(D) = M. \]


Proof. The portfolio $h$ is replicating if $V_1^h(\omega) = X(\omega)$ for all $\omega \in \Omega$, i.e.

\[ hD = \big( X(\omega_1), ..., X(\omega_M) \big). \]

The latter system of linear equations has a solution for any $X$ if and only if the rows of the matrix $D$ span $\mathbb{R}^M$, which is equivalent to $\mathrm{rank}(D) = M$ (recall why). □

Remark 3.12. If e.g. $N < M$, i.e. the number of assets is less than the number of possible prices of the assets, there will be non-reachable contingent claims.

Our pricing principle is based on the assumption that the underlying market is arbitrage free, which raises the question of how the algebraic condition found in the latter proposition can be expressed in this case. The answer is given in the following

Theorem 3.13 (the Second Fundamental Theorem of Finance). The arbitrage free market is complete if and only if the EMM is unique.

Proof. If the market is arbitrage free, the equation $S_0 = \tilde D q$ does have a solution $q$ with strictly positive entries. This solution is unique if and only if $\tilde D$ has rank $M$: indeed, if the rank of $\tilde D$ is less than $M$, then $\tilde D v = 0$ for some nonzero $v \in \mathbb{R}^M$ and $q' := q + \varepsilon v$ with $\varepsilon > 0$ small enough has strictly positive entries and solves $S_0 = \tilde D q'$. Note that $D$ and $\tilde D$ have the same rank (why?) and hence the EMM is unique if and only if $\mathrm{rank}(D) = M$, i.e. if and only if the market is complete by Proposition 3.11. □

Let's explore the connection between the risk-neutral formula (3.2) and the pricing by hedging. Suppose the market is complete and let $Q$ be the unique EMM. It is still possible that $X$ can be hedged by two different hedges $h$ and $\tilde h$ (look back at Proposition 3.11 for $N > M$). In this case we shall expect that the two hedges will give the same price for the contingent claim, i.e. their values at $t = 0$ coincide, since otherwise arbitrage is possible. Indeed, this is the case:

\[ V_0^h = h S_0 = h E_Q S_1/S_1^1 = E_Q V_1^h/S_1^1 = E_Q X/S_1^1 = \Pi(0; X) \tag{3.3} \]

regardless of the particular hedge $h$.

Suppose now that the market is not complete, i.e. there exist at least two EMMs $Q$ and $\tilde Q$. Does the risk-neutral valuation formula (3.2) yield the same price for $Q$ and $\tilde Q$? If $X$ is reachable, say, by a hedge $h$, then the answer is positive, since $V_0^h = E_Q X/S_1^1$ and $V_0^h = E_{\tilde Q} X/S_1^1$ by (3.3).

What if $X$ does not admit a hedge? Note that in this case, the option seller cannot find a replicating portfolio, but the price(s) given by (3.2) are still such that the market $(S, \Pi)$ is arbitrage free, i.e. no portfolio yields arbitrage. As the following numerical example shows, many arbitrage free prices are possible in this latter case$^{10}$.

Example 3.14. Consider the market with two assets, $N = 2$, and three final prices of the stock, $M = 3$: $B_t := S_t^1 = 1$, $t = 0, 1$, i.e. the numeraire is a zero interest rate bank account, and

\[ S_0 := S_0^2 = 1, \qquad S_1 := S_1^2 \in \{0.8,\, 1,\, 1.2\}, \]

the latter taking values with positive probabilities (the values of the probabilities are of no importance). Both $Q := (1/4, 1/2, 1/4)$ and $\tilde Q := (1/3, 1/3, 1/3)$ are martingale measures:

\[ E_{\tilde Q} S_1 = \tfrac{1}{3} \cdot 0.8 + \tfrac{1}{3} \cdot 1 + \tfrac{1}{3} \cdot 1.2 = 1 = S_0 \]

and

\[ E_Q S_1 = \tfrac{1}{4} \cdot 0.8 + \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 1.2 = 1 = S_0. \]

Consider the European call option with strike price $K = 1$, i.e.

\[ \Phi(s) = (s - K)^+ = 0.2 \cdot 1_{\{s = 1.2\}}. \]

This contingent claim is not reachable, since the vector

\[ \big( X(\omega_1), X(\omega_2), X(\omega_3) \big) = (0, 0, 0.2) \]

is clearly not in the span of the rows of

\[ D = \begin{pmatrix} 1 & 1 & 1 \\ 0.8 & 1 & 1.2 \end{pmatrix}. \]

The risk-neutral valuation formula gives two different arbitrage free prices:

\[ \Pi(0; X) = E_Q X = 1/4 \cdot 0.2 = 0.05 \]

and

\[ \Pi(0; X) = E_{\tilde Q} X = 1/3 \cdot 0.2 = 0.0666... \]

Check e.g. that the (silly) contingent claim with the contract function $\Phi(s) = s$ is reachable and check that its premium is the same under both $Q$ and $\tilde Q$. ■
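The arithmetic of Example 3.14 can be reproduced exactly with rational numbers. The following is a minimal sketch: the helper names `expect` and `replicate` are hypothetical, and `replicate` is tailored to this two-asset, three-outcome market (it fits the hedge on the first two outcomes and then checks the third).

```python
from fractions import Fraction as F

# Final stock prices S_1(omega_i); the bank account is identically 1.
outcomes = [F(4, 5), F(1), F(6, 5)]
Q = [F(1, 4), F(1, 2), F(1, 4)]
Q_tilde = [F(1, 3), F(1, 3), F(1, 3)]


def expect(q, values):
    """Expectation of a random variable, given as a list of values, under q."""
    return sum(qi * vi for qi, vi in zip(q, values))


def replicate(claim):
    """Return (h1, h2) with h1 * 1 + h2 * S_1(omega_i) = claim_i for all i,
    or None if the claim is not reachable."""
    h2 = (claim[1] - claim[0]) / (outcomes[1] - outcomes[0])
    h1 = claim[0] - h2 * outcomes[0]
    reachable = all(h1 + h2 * v == c for v, c in zip(outcomes, claim))
    return (h1, h2) if reachable else None


call = [max(v - 1, F(0)) for v in outcomes]  # Phi(s) = (s - 1)^+
stock = list(outcomes)                       # Phi(s) = s

print(expect(Q, outcomes), expect(Q_tilde, outcomes))  # martingale property
print(replicate(call), expect(Q, call), expect(Q_tilde, call))
print(replicate(stock), expect(Q, stock), expect(Q_tilde, stock))
```

The call is not reachable and its price depends on the chosen measure, while the claim $\Phi(s) = s$ is hedged by holding one share and has the same premium under both measures.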

3.1. Exercises.

Problem 1.3.1 (Separating hyperplane theorem). Let $C$ and $D$ be non-intersecting closed convex subsets of $\mathbb{R}^d$ and assume that $D$ is bounded. Then a separating hyperplane exists, i.e. there exist a nonzero vector $a \in \mathbb{R}^d$ and a constant $b$ such that $a^\top x \le b$ for all $x \in C$ and $a^\top x \ge b$ for all $x \in D$ (the set $\{x \in \mathbb{R}^d : a^\top x = b\}$ is the separating hyperplane).

Prove this statement, following the steps below.

$^{10}$ Note that existence of more than one arbitrage free price is not a contradiction: one is not allowed to trade two copies of the option with different prices.


Consider the distance between the sets C and D, defined by:

d(C, D) = inf‖c− d‖2 : c ∈ C, d ∈ D,where ‖x‖2

2 =∑d

i=1 x2i is the Euclidian norm.

(1) Give an example of sets $C$ and $D$ in $\mathbb R^2$ such that $d(C,D) = 0$, but $C \cap D = \emptyset$.

Hint: try open sets

(2) Show that if $C$ and $D$ are closed and $D$ is bounded, then $C\cap D = \emptyset$ implies $d(C,D) > 0$.

Hint: prove by contradiction: assume that $d(C, D) = 0$, i.e. there exist sequences $(u_n) \subseteq C$ and $(v_n) \subseteq D$ such that $\|u_n - v_n\|_2 \to 0$; use the Bolzano-Weierstrass theorem$^{11}$ to extract a convergent subsequence $v_{n_k} \to v$ and argue that $v \in D$; show that $u_{n_k} \to v$ and argue that $v \in C$, i.e. $v \in C \cap D$

(3) Show that if $C$ and $D$ are closed and $D$ is bounded, then $d(C,D) = \|c-d\|_2$ for some $c \in C$ and $d \in D$.

Hint: use the Bolzano-Weierstrass theorem again

By (2), $d(C, D) > 0$ and by (3), $d(C, D) = \|c - d\|_2$ for some $c \in C$ and $d \in D$. Consider the hyperplane with

$$a := d - c \quad\text{and}\quad b := \tfrac12\big(\|d\|_2^2 - \|c\|_2^2\big).$$

This is our candidate for the separating hyperplane, i.e. $f(x) := a^\top x - b \ge 0$ for $x \in D$ and $f(x) \le 0$ for $x \in C$.

(4) Check that

$$f(x) = (d-c)^\top(x-d) + \tfrac12\|d-c\|_2^2$$

(5) Prove that $f(x) \ge 0$ for $x \in D$ by contradiction. Assume that $f(x) < 0$ for some $x \in D$ and conclude that

$$\frac{d}{dt}\big\|d + t(x-d) - c\big\|_2^2\,\Big|_{t=0} = 2(d-c)^\top(x-d) < 0,$$

and, consequently,

$$\big\|d + t(x-d) - c\big\|_2^2 < \big\|d - c\big\|_2^2$$

for all $t > 0$ small enough. Argue that $d + t(x-d) \in D$, i.e. that $d(C,D) < \|c-d\|_2$, which contradicts the choice of $c$ and $d$.

(6) Prove that f(x) ≤ 0 for x ∈ C, similarly to (5).
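The candidate hyperplane $a := d - c$, $b := \tfrac12(\|d\|_2^2 - \|c\|_2^2)$ can be tried out numerically on a simple pair of sets. The sketch below assumes, purely for illustration, that $C$ and $D$ are two disjoint closed discs in $\mathbb R^2$, for which the closest points are known in closed form; all function names are hypothetical.

```python
import math


def closest_points_discs(center_c, radius_c, center_d, radius_d):
    """Closest pair (c, d) between two disjoint closed discs in R^2."""
    dx, dy = center_d[0] - center_c[0], center_d[1] - center_c[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist            # unit vector from C towards D
    c = (center_c[0] + radius_c * ux, center_c[1] + radius_c * uy)
    d = (center_d[0] - radius_d * ux, center_d[1] - radius_d * uy)
    return c, d


def separating_hyperplane(c, d):
    """a := d - c and b := (|d|^2 - |c|^2) / 2, as in the problem."""
    a = (d[0] - c[0], d[1] - c[1])
    b = 0.5 * ((d[0] ** 2 + d[1] ** 2) - (c[0] ** 2 + c[1] ** 2))
    return a, b


def f(x, a, b):
    return a[0] * x[0] + a[1] * x[1] - b


def boundary(center, radius, n=360):
    """Points on the boundary circle; an affine f attains its extrema there."""
    return [(center[0] + radius * math.cos(2 * math.pi * k / n),
             center[1] + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


c, d = closest_points_discs((0.0, 0.0), 1.0, (3.0, 0.0), 1.0)
a, b = separating_hyperplane(c, d)
print(a, b)
print(max(f(x, a, b) for x in boundary((0.0, 0.0), 1.0)))   # nonpositive on C
print(min(f(x, a, b) for x in boundary((3.0, 0.0), 1.0)))   # nonnegative on D
```

Since $f$ is affine, checking its sign on the boundary circles is enough to check it on the whole discs.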

11 look it up, if not familiar


Problem 1.3.2 (strict separation of a point and a closed convex set). If $C$ is a closed convex set and $x_0 \notin C$, then there exist a nonzero vector $a$ and a number $b$ such that $a^\top x < b$ for $x \in C$ and $a^\top x_0 > b$.

Follow the steps below to deduce this statement from the Separating Hyperplane theorem:

(1) Argue that for some$^{12}$ $\varepsilon > 0$, $B_\varepsilon(x_0) \cap C = \emptyset$

(2) By the Separating Hyperplane theorem there exist a nonzero $a$ and $b$ such that $a^\top x \le b$ for $x \in C$ and $a^\top x \ge b$ for $x \in B_\varepsilon(x_0)$. Show that the latter is equivalent to

$$a^\top x_0 + a^\top u \ge b, \quad \forall u \text{ such that } \|u\|_2 \le \varepsilon.$$

(3) Show that $\min_{u:\|u\|_2\le\varepsilon} a^\top u = -\varepsilon\|a\|_2$ and the minimum is attained at $u^* := -\varepsilon a/\|a\|_2$. In particular,

$$a^\top x_0 - \varepsilon\|a\|_2 \ge b.$$

(4) Let $\tilde b := b + \frac{\varepsilon}{2}\|a\|_2$ and show that $a^\top x < \tilde b$ for $x \in C$ and $a^\top x_0 > \tilde b$

Problem 1.3.3. The Farkas lemma states that for the vectors $d_0, ..., d_K \in \mathbb R^N$, exactly one of the following problems has a solution:

(a) find numbers $\lambda_j \ge 0$ such that $d_0 = \sum_{j=1}^K \lambda_j d_j$

(b) find a row vector $h \in \mathbb R^N$ such that$^{13}$

$$h d_0 < 0, \qquad h d_j \ge 0, \quad j = 1, ..., K$$

Follow the steps below to prove the lemma.

(1) Show that (a) and (b) cannot hold simultaneously

(2) Define the set $C := \big\{\sum_{i=1}^K \beta_i d_i : \beta_i \in \mathbb R_+ \cup \{0\}\big\} \subseteq \mathbb R^N$. Show that$^{14}$

(a) $C$ is a cone, i.e. $v \in C$ implies $av \in C$ for any number $a > 0$
(b) $C$ contains the origin of $\mathbb R^N$
(c) $C$ is a convex set
(d) $C$ is a closed subset of $\mathbb R^N$

(3) Argue that the claim of the Farkas lemma follows if we assume that $d_0 \notin C$ and show that (b) holds.

(4) Assume that $d_0 \notin C$. Use Problem 1.3.2 to show that there exist an $\alpha \in \mathbb R$ and an $h \in \mathbb R^N$ such that

$$h d_0 < \alpha < h x, \quad \forall x \in C.$$

(5) Argue that $0 \in C$ implies $\alpha < 0$ and hence $h d_0 < 0$.

12 $B_\varepsilon(x_0) := \{x \in \mathbb R^d : \|x - x_0\|_2 \le \varepsilon\}$ is the closed ball around $x_0$ with radius $\varepsilon > 0$

13 for a row vector $h$ and a column vector $v$, $hv := h * v = \sum_i h_i v_i$

14 if not familiar, look up the definitions of convex and closed subsets


(6) Check that for any $\xi_i \ge 0$, $i = 1, ..., K$,

$$\alpha < h\sum_{i=1}^K \xi_i d_i = \sum_{i=1}^K \xi_i (h d_i),$$

and argue that the latter implies $h d_i \ge 0$.

Hint: the inequality cannot hold for all $\xi_i \ge 0$ if $h d_i < 0$ for some $i$. Why?

Problem 1.3.4. Let D be an n×m matrix of real numbers and ξ a rowvector in Rn.

(1) Show that the problem hD = ξ has the unique solution h for all ξ,if and only if rank(D) = m.

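(2) Recall the definitions

$$\mathrm{Ker}(D) = \{x \in \mathbb R^m : Dx = 0\}$$

and

$$\mathrm{Im}(D^\top) = \{D^\top y : y \in \mathbb R^n\} \subseteq \mathbb R^m.$$

Show that the orthogonal complement of $\mathrm{Im}(D^\top)$ in $\mathbb R^m$,

$$\big(\mathrm{Im}(D^\top)\big)^\perp = \{z \in \mathbb R^m : z^\top v = 0 \ \ \forall\, v \in \mathrm{Im}(D^\top)\},$$

coincides with $\mathrm{Ker}(D)$.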

CHAPTER 2

Stochastic processes in continuous time

1. Elements of the general theory

Definition 1.1. A random process $X$ with continuous time parameter $t \in [0,\infty)$ is a collection of random variables $X_t(\omega)$, $t \in [0,\infty)$, defined on a probability space $(\Omega, \mathcal F, P)$.

Definition 1.2. The distributions of the random vectors $(X_{t_1}, ..., X_{t_n})$ for $t_1, ..., t_n \in [0,\infty)$ and $n \in \mathbb N$ are called finite dimensional distributions (f.d.d.) of $X$.

1.1. Existence of stochastic processes. The fundamental question is whether a stochastic process with given finite dimensional distributions exists. An affirmative answer would be a construction of a probability space $(\Omega, \mathcal F, P)$ and a stochastic process $X_t(\omega)$, $t \in [0,\infty)$, with the given f.d.d. To this end consider the following:

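Definition 1.3. A family of distributions

$$Q_{t_1,...,t_n}(A_1 \times ... \times A_n), \quad t_1, ..., t_n \in [0,\infty),\ A_1, ..., A_n \in \mathcal B(\mathbb R),\ n \in \mathbb N$$

is consistent if

(i) for any permutation $t_{i_1}, ..., t_{i_n}$ of $t_1, ..., t_n$

$$Q_{t_{i_1},...,t_{i_n}}(A_{i_1} \times ... \times A_{i_n}) = Q_{t_1,...,t_n}(A_1 \times ... \times A_n)$$

(ii)

$$Q_{t_1,...,t_n}(A_1 \times ... \times A_{n-1} \times \mathbb R) = Q_{t_1,...,t_{n-1}}(A_1 \times ... \times A_{n-1})$$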

Example 1.4. The family of distributions

$$Q_{t_1,...,t_n}(A_1 \times ... \times A_n) = \prod_{i=1}^n \int_{A_i} f(x_i)\,dx_i \qquad (1.1)$$

where $f$ is a probability density on $\mathbb R$, is consistent, while

$$Q_{t_1,...,t_n}(A_1 \times ... \times A_n) = \prod_{i=1}^n \int_{A_i} f(x_i - t_n)\,dx_i$$

is not (why?) ■
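Condition (ii) of Definition 1.3 can be probed numerically for the two families of this example. The sketch below assumes, for illustration only, that $f$ is the standard normal density and that the sets $A_i$ are intervals; the function names are hypothetical.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interval_prob(a, b, shift=0.0):
    """Integral of the N(0,1) density f(x - shift) over the interval (a, b]."""
    return normal_cdf(b - shift) - normal_cdf(a - shift)


def q_product(times, intervals):
    """First family: product of the integrals of f over A_i (times play no role)."""
    p = 1.0
    for a, b in intervals:
        p *= interval_prob(a, b)
    return p


def q_shifted(times, intervals):
    """Second family: every factor is shifted by the last time t_n."""
    p = 1.0
    for a, b in intervals:
        p *= interval_prob(a, b, shift=times[-1])
    return p


WHOLE_LINE = (-math.inf, math.inf)
A1 = (0.0, 1.0)
t1, t2 = 0.5, 2.0

# Condition (ii): Q_{t1,t2}(A1 x R) should equal Q_{t1}(A1).
print(q_product((t1, t2), [A1, WHOLE_LINE]), q_product((t1,), [A1]))   # equal
print(q_shifted((t1, t2), [A1, WHOLE_LINE]), q_shifted((t1,), [A1]))   # differ
```

In the second family, integrating out the last coordinate changes the shift from $t_2$ to $t_1$, which is what breaks (ii).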

Obviously, the family of f.d.d. of a random process (if it exists) is consistent (why?). Hence consistency of the f.d.d. is a necessary condition for the existence of a random process with these f.d.d. Remarkably, consistency turns out to be sufficient as well.

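Definition 1.5. $C$ is a cylinder set of the space of real valued functions on $[0,\infty)$, denoted by $\mathbb R^{[0,\infty)}$, if there is a finite set of times $t_1, ..., t_n$ and Borel subsets $A_1, ..., A_n$ of $\mathbb R$ such that

$$C = \{\omega \in \mathbb R^{[0,\infty)} : \omega_{t_1} \in A_1, ..., \omega_{t_n} \in A_n\}.$$

Denote by $\mathcal B^{[0,\infty)}$ the σ-algebra generated by the cylinder sets, i.e. the minimal σ-algebra of subsets of $\mathbb R^{[0,\infty)}$ containing the cylinder sets. A set Γ belongs to $\mathcal B^{[0,\infty)}$ if and only if there are a sequence of times $(t_n)$ and a set $A$ in the cylinder σ-algebra $\mathcal B^{\mathbb N}$ of the space of real valued sequences such that (see a guided exercise below)

$$\Gamma = \{\omega \in \mathbb R^{[0,\infty)} : (\omega_{t_1}, \omega_{t_2}, ...) \in A\}.$$

In other words, membership in Γ is decided by the values of ω at countably many times. This implies that many sets of interest, such as e.g.

$$\{\omega \in \mathbb R^{[0,\infty)} : \sup_{t\ge 0} \omega_t \le c\},$$

the set of all continuous functions, the set of all functions continuous at a point, etc. are not measurable w.r.t. $\mathcal B^{[0,\infty)}$ (see a guided exercise below). Roughly speaking, this means that the cylinder σ-algebra is not rich enough for such a big space as $\mathbb R^{[0,\infty)}$.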

One of the reasons to consider $\mathbb R^{[0,\infty)}$ with the cylinder σ-algebra (and not e.g. the Borel σ-algebra with respect to some metric) is the following theorem, which is the main tool for constructing random processes with given f.d.d.:

Theorem 1.6 (Daniell-Kolmogorov). Let $Q_{t_1,...,t_n}$ be a consistent family of distributions. Then there is a unique probability measure $P$ on the measurable space $(\mathbb R^{[0,\infty)}, \mathcal B^{[0,\infty)})$ such that the restriction of $P$ to any finite set of times coincides with the corresponding f.d.d.:

$$P|_{(t_1,...,t_n)} = Q_{t_1,...,t_n}.$$

Remark 1.7. The proof applies Carathéodory's extension theorem, stating that a σ-additive measure on an algebra can be uniquely extended to the σ-algebra generated by this algebra. In the context of the D-K theorem, the algebra is the collection of the cylinder sets (check that it is indeed an algebra!). The family of the given consistent distributions defines an additive measure on this algebra. The nontrivial part is to check that this measure is in fact σ-additive. Once the latter is established, the Carathéodory theorem is applied.

Given a consistent family of distributions, the D-K theorem gives a probability space $(\Omega, \mathcal F, P)$ with $\Omega = \mathbb R^{[0,\infty)}$ and $\mathcal F = \mathcal B^{[0,\infty)}$, referred to as canonical, such that the coordinate process $X_t(\omega) = \omega_t$ has the required f.d.d.

Note that nothing is said about the properties of the obtained process: in particular, it may have quite bizarre trajectories. For example, the process with f.d.d. (1.1) exists, but has discontinuous paths at all points; in fact, the map $(\omega, t) \mapsto X_t(\omega)$ is not even measurable with respect to $\mathcal F \times \mathcal B([0,\infty))$ and hence it is not even clear whether the (Lebesgue) integral $\int_0^t X_s(\omega)\,ds$ can be defined for each ω (see an exercise below).

1.2. Types of measurability. Viewing the process as a map of both arguments, $(\omega, t) \mapsto X_t(\omega)$, various types of measurability can be defined.

Definition 1.8. The process $X$ is measurable if the map $(\omega, t) \mapsto X_t(\omega)$ is $\mathcal F \times \mathcal B([0,\infty))$-measurable$^{1}$.

Since a jointly measurable function is measurable with respect to each one of its coordinates, a measurable (bounded) process is Lebesgue integrable w.r.t. $t$ for each ω, i.e. $t \mapsto \int_0^t X_s(\omega)\,ds$ is again a measurable process.

In many applications, the probability space comes with an increasing family of σ-algebras $(\mathcal F_t)$, $t \ge 0$, i.e. $\mathcal F_s \subseteq \mathcal F_t \subseteq \mathcal F$ for all $t \ge s \ge 0$. The family $(\mathcal F_t)$ is called a filtration (of σ-algebras) and the quadruple $(\Omega, \mathcal F, (\mathcal F_t), P)$ is referred to as the filtered probability space or stochastic basis. The filtration is said to satisfy the usual conditions if it is right continuous:

$$\mathcal F_{t+} := \bigcap_{\varepsilon > 0} \mathcal F_{t+\varepsilon} = \mathcal F_t, \quad t \ge 0,$$

and $\mathcal F_0$ contains all the $P$-null sets of $\mathcal F$. These are technical conditions, without which certain desirable properties do not hold (beyond the scope of our course).

Definition 1.9. The process $X$ is $\mathcal F_t$-adapted if the map $\omega \mapsto X_t(\omega)$ is $\mathcal F_t$-measurable for each $t \ge 0$.

The practical meaning of $X$ being $\mathcal F_t$-adapted is that the trajectory of $X$ up to time $t$ can be reconstructed precisely if one knows which events from $\mathcal F_t$ occurred. For example, if $S$ is a stock price process and $X_t$ is the portfolio to be chosen at each $t$ on the basis of the prices observed up to time $t$, then $X$ is to be adapted to the natural filtration of $S$, i.e. $\mathcal F^S_t := \sigma\{S_s, s \le t\}$. Note that an adapted process may not be Lebesgue integrable w.r.t. the time variable $t$ for each ω; hence processes are usually required to be both measurable and adapted.

Definition 1.10. The process $X$ is progressively measurable if the map $(\omega, t) \mapsto X_t(\omega)$ is $\mathcal F_t \times \mathcal B([0,t])$-measurable for each $t$.

Clearly, a progressively measurable process is both adapted and measurable. Counterexamples show that the converse is false, unless $X$ has sufficiently regular paths. It can be shown that if $X$ is adapted and measurable and has cadlag$^{2}$ paths, then it has a progressively measurable modification (refer to the next section for the definition of modification). The space of cadlag functions plays the central role in the general theory of stochastic processes. Once again, let us stress that the D-K theorem does not impose much structure on the trajectories of the emerging process.

1 recall that for a pair of measurable spaces $(\mathbb X, \mathcal X)$ and $(\mathbb Y, \mathcal Y)$, the function $\psi : \mathbb X \to \mathbb Y$ is $\mathcal X/\mathcal Y$-measurable if $\psi^{-1}(B) := \{x \in \mathbb X : \psi(x) \in B\} \in \mathcal X$ for all $B \in \mathcal Y$. When the range σ-algebra $\mathcal Y$ is obvious from the context, the map is briefly said to be $\mathcal X$-measurable.

2 a function $f : [0,\infty) \to \mathbb R$ is cadlag if it is right continuous and has limits from the left

1.3. Equality of random processes.

Definition 1.11. The cadlag processes $X$ and $Y$ are equal in distribution (or identically distributed) if they have the same f.d.d., i.e.

$$(X_{t_1}, ..., X_{t_n}) \stackrel{d}{=} (Y_{t_1}, ..., Y_{t_n}), \quad \forall t_1, ..., t_n,\ \forall n.$$

Note that identically distributed processes need not even be defined on the same probability space.

Definition 1.12. $X$ is a modification (version) of $Y$ if

$$P(\omega \in \Omega : X_t(\omega) = Y_t(\omega)) = 1, \quad \forall t \ge 0.$$

Clearly, modifications are identically distributed (why?).

Definition 1.13. $X$ and $Y$ are indistinguishable if

$$P\Big(\omega \in \Omega : \sup_{t\ge 0}|X_t(\omega) - Y_t(\omega)| = 0\Big) = 1.$$

Clearly, indistinguishable processes are modifications of each other. The following example demonstrates that modifications need not be indistinguishable.

Example 1.14. Let $(\Omega, \mathcal F, P) = ([0, 1], \mathcal B([0, 1]), \lambda)$, where λ is the Lebesgue measure, and define $X_t \equiv 0$ and $Y_t = \mathbf 1_{\{t = \omega\}}$, $t \in [0, 1]$. Then

$$P(X_t = Y_t) = P(t \ne \omega) = 1,$$

but $\sup_{t\in[0,1]} |X_t(\omega) - Y_t(\omega)| = 1$. ■

Lemma 1.15. If $X$ and $Y$ are versions with cadlag paths, they are indistinguishable (prove).

Note that the process $Y$ in Example 1.14 violates the conditions of this lemma.
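Example 1.14 can be mimicked on a computer, with the caveat that floating-point sampling only approximates the Lebesgue space: the event $\{\omega = t\}$ has probability zero there, and in the sketch below it simply never occurs for the chosen $t$. The function names are illustrative.

```python
import random


def X(t, omega):
    return 0.0


def Y(t, omega):
    return 1.0 if t == omega else 0.0


def fraction_equal_at(t, n_samples, rng):
    """Monte Carlo estimate of P(X_t = Y_t) for a fixed time t."""
    equal = 0
    for _ in range(n_samples):
        omega = rng.random()
        if X(t, omega) == Y(t, omega):
            equal += 1
    return equal / n_samples


def sup_distance(omega):
    """sup over t in [0, 1] of |X_t - Y_t|; the sup is attained at t = omega."""
    return abs(X(omega, omega) - Y(omega, omega))


rng = random.Random(0)
print(fraction_equal_at(0.3, 10_000, rng))   # X_t = Y_t for every sampled omega
print(sup_distance(rng.random()))            # yet the paths differ at t = omega
```

For each fixed $t$ the two processes agree with probability one, while every single path of $Y$ differs from the path of $X$ at the random time $t = \omega$.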

1.4. Exercises.

Problem 2.1.1. Recall that a collection $\mathcal A$ of subsets of a set Ω is an algebra if $\Omega \in \mathcal A$ and $\mathcal A$ is closed under finite intersections and complements, i.e. $A, B \in \mathcal A \implies A \cap B \in \mathcal A$ and $A^c \in \mathcal A$.

(1) Show that an algebra is closed under unions as well

(2) Argue that the collection of all open intervals in R is not an algebra

(3) Argue that finite unions of nonintersecting open intervals do notform an algebra


(4) Prove that the collection of unions of nonintersecting intervals ofthe form (a, b], a < b together with R itself is an algebra (call it A)

Recall that a collection $\mathcal S$ of subsets of a set Ω is a σ-algebra if $\Omega \in \mathcal S$ and $\mathcal S$ is closed under complements and countable intersections, i.e. whenever $(A_n)$ is a sequence in $\mathcal S$, $\cap_{n\ge1}A_n \in \mathcal S$ and $A_n^c \in \mathcal S$.

(5) Argue that the algebra A from (4) is not a σ-algebra

(6) Show that a σ-algebra is closed under countable unions

(7) Let σ(A) be the smallest σ-algebra which contains the algebra A.Argue that σ(A) exists.

Hint: prove that if (Su), u ∈ U is an arbitrary collection of σ-algebras (not necessarily countable), then ∩u∈USu is a σ-algebra.Use this and the fact that the collection of all subsets of R is aσ-algebra

(8) Show that singletons, open and closed intervals are contained inσ(A).

(9) Show that the set of all rational numbers $\mathbb Q$ is contained in $\sigma(\mathcal A)$

(10) The Borel σ-algebra of the subsets of $\mathbb R$, denoted $\mathcal B(\mathbb R)$, is the smallest σ-algebra which contains the open intervals (the existence is established as in (7) above). Prove that $\mathcal B(\mathbb R) = \sigma(\mathcal A)$.

Hint: by (8), the open intervals are included in $\sigma(\mathcal A)$ and hence $\mathcal B(\mathbb R) \subseteq \sigma(\mathcal A)$. Give a similar argument for the other inclusion.

Problem 2.1.2. Recall the basic setting of probability theory. The probability space $(\Omega, \mathcal F, P)$ consists of a set Ω (of points or elementary events), a σ-algebra $\mathcal F$ of its subsets (events) (see the definition above) and a probability measure $P$, i.e. a σ-additive$^{3}$ function $\mathcal F \to [0, 1]$ with $P(\Omega) = 1$. A random variable $X$ with values in $\mathbb R$ is an $\mathcal F/\mathcal B(\mathbb R)$-measurable function, i.e. $\{\omega : X(\omega) \in B\} \in \mathcal F$ for all $B \in \mathcal B(\mathbb R)$. Note that $P_X(B) = P(\omega : X(\omega) \in B)$, $B \in \mathcal B(\mathbb R)$, is well defined. It is not hard to show that $P_X$ is a probability measure on $\mathcal B(\mathbb R)$, induced by $X$. The function $x \mapsto P_X((-\infty, x]) =: F_X(x)$ is called the cumulative distribution function (c.d.f.) of $X$.

(1) Construct a probability space (i.e. specify $(\Omega, \mathcal F, P)$) so that $X(\omega) = \omega$ is a Bernoulli random variable

(2) Construct a probability space so that $X(\omega) = \omega$ is a Poisson random variable

(3) Presuming that the Lebesgue probability space

$$(\Omega, \mathcal F, P) = \big([0, 1], \mathcal B([0, 1]), \lambda\big)$$

is already constructed$^{4}$, construct a random variable $X(\omega)$ on it with the given continuous (strictly increasing) c.d.f. $F$.

Hint: argue that $X(\omega) = F^{-1}(\omega)$ is a random variable and show that it has the right c.d.f.

3 i.e. for any sequence of pairwise disjoint sets $(A_n)_{n\in\mathbb N}$, $P(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty P(A_n)$

(4) Construct a Poisson r.v. on the Lebesgue probability space
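The hint in (3) is the inverse transform method, and one possible answer to (4) uses a generalized inverse of the Poisson c.d.f. The sketch below is an illustration under assumptions: the exponential c.d.f. is chosen as the example of a continuous strictly increasing $F$, and the function names are hypothetical.

```python
import math
import random


def exponential_quantile(u, rate=1.0):
    """F^{-1}(u) for F(x) = 1 - exp(-rate * x), with 0 <= u < 1."""
    return -math.log1p(-u) / rate


def poisson_quantile(u, mean):
    """Smallest k with P(Poisson(mean) <= k) > u, with 0 <= u < 1."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The second condition guards against an endless loop caused by rounding.
    while cdf <= u and pmf > 0.0:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


rng = random.Random(1)
omegas = [rng.random() for _ in range(50_000)]   # points of the Lebesgue space [0, 1)

exp_sample = [exponential_quantile(w) for w in omegas]
poi_sample = [poisson_quantile(w, 2.0) for w in omegas]
print(sum(exp_sample) / len(exp_sample))   # close to the mean 1 of Exp(1)
print(sum(poi_sample) / len(poi_sample))   # close to the mean 2 of Poi(2)
```

Both random variables are functions of the same point ω, which is exactly the sense in which they are "constructed on the Lebesgue space".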

Remark 1.16. Construction of the Lebesgue space is one of the pearlsof the measure theory. The preceding problem hints that many randomvariables can be constructed starting from the Lebesgue space (in fact, evensuch a complex random process as the Brownian motion can be constructedfrom a single instance of the Lebesgue space!). How do we construct arandom vector with values in Rn and given joint c.d.f ? What is the ap-propriate σ-algebra for Rn ? How can we construct a probability space andan infinite sequence of random variables (a discrete time random process)with prescribed probability distribution ? Can we construct a family of ran-dom variables, indexed by a continuous parameter (the random process incontinuous time) ? The following problem aims to stimulate curiosity.

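Problem 2.1.3. Consider the Lebesgue space $([0, 1], \mathcal B, P)$ as in the previous problem (we changed the name $P := \lambda$).

(1) Show that $P(\{\omega\}) = 0$ for all $\omega \in \Omega$. Find the mistake in the following absurd chain of equalities:

$$1 = P\big(\cup_{\omega\in[0,1]}\{\omega\}\big) = \sum_{\omega\in[0,1]} P(\{\omega\}) = 0.$$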

(2) Argue that any countable subset of Ω is Borel measurable and haszero Lebesgue measure

Hint: by definition a countable set can be enumerated and isa countable union of its points

(3) Show that any number $\omega \in [0, 1]$ can be expanded into the series

$$\omega = \sum_{i=1}^\infty 2^{-i}\omega_i,$$

where $\omega_i \in \{0, 1\}$

(4) Show that the set of numbers with two distinct expansions coincides with the set of dyadic rationals and thus has Lebesgue measure zero (e.g. $\tfrac12 = .1000... = .01111...$, etc.)

4 i.e. for any set $A \in \mathcal B([0, 1])$, the probability measure $\lambda(A)$ is well defined and $\lambda((a, b)) = b - a$


(5) Define an infinite sequence of random variables $X_n(\omega) = \omega_n$, $n \in \mathbb N$ (argue that these are indeed r.v.'s). Prove that $X_1, ..., X_n$ are i.i.d. Ber(1/2) for any $n \ge 1$.

(6) Define $Y_1(\omega) = \sum_{i \text{ odd}} 2^{-(i+1)/2}\omega_i$ and $Y_2(\omega) = \sum_{i \text{ even}} 2^{-i/2}\omega_i$. Show that $Y_1$ and $Y_2$ are i.i.d. random variables with uniform distribution on $[0, 1]$

(7) Suggest a way to generate an infinite sequence of i.i.d. r.v.’s (Yn),n ≥ 1 with Y1 ∼ U([0, 1]).

(8) Suggest a way to generate an infinite sequence of i.i.d. N(0, 1)random variables

(9) Consider the set

$$A := \Big\{\omega \in \Omega : \lim_n \frac1n\sum_{i=1}^n \omega_i = 1/2\Big\}.$$

Give examples of ω's from $A$ and from $A^c$. The strong law of large numbers asserts that $P(A) = 1$. In particular, $A$ is uncountable. Does this imply that $A^c$ is countable?
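The digit manipulations in (3) and (6) are easy to experiment with in exact rational arithmetic. The following sketch truncates the expansion after $n$ digits and, for dyadic rationals, uses the terminating expansion; both choices, as well as the function names, are assumptions made for illustration.

```python
from fractions import Fraction


def binary_digits(omega, n):
    """First n binary digits omega_1, ..., omega_n of omega in [0, 1)."""
    digits = []
    x = Fraction(omega)
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits


def split_uniforms(omega, n):
    """Truncated versions of Y_1 (odd-indexed digits) and Y_2 (even-indexed digits)."""
    digits = binary_digits(omega, n)
    odd = digits[0::2]     # omega_1, omega_3, ...
    even = digits[1::2]    # omega_2, omega_4, ...
    y1 = sum(Fraction(b, 2 ** (j + 1)) for j, b in enumerate(odd))
    y2 = sum(Fraction(b, 2 ** (j + 1)) for j, b in enumerate(even))
    return y1, y2


omega = Fraction(11, 16)              # .1011 in binary
print(binary_digits(omega, 4))        # [1, 0, 1, 1]
print(split_uniforms(omega, 8))       # (Fraction(3, 4), Fraction(1, 4))
```

Applying the same splitting idea repeatedly (or along a partition of the indices into infinitely many infinite subsets) is one way to approach (7).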

Problem 2.1.4 (expectation = the Lebesgue integral). Let $(\Omega, \mathcal F, P)$ be a probability space and $X$ be a nonnegative random variable. Define a sequence of simple$^{5}$ random variables

$$X_n(\omega) = \sum_{k=1}^{n2^n} \frac{k-1}{2^n}\,\mathbf 1_{\{(k-1)/2^n \le X(\omega) < k/2^n\}} + n\,\mathbf 1_{\{X(\omega)\ge n\}}$$

and define the Lebesgue integral of $X_n$ w.r.t. $P$

$$EX_n := \int X_n\,dP := \sum_{k=1}^{n2^n}\frac{k-1}{2^n}\,P\big(\omega : (k-1)/2^n \le X(\omega) < k/2^n\big) + nP(X(\omega) \ge n).$$

(1) Explain why measurability of $X$ is crucial for the definition of $EX_n$

(2) Draw X1, X2 and X3 for X(ω) = |ω − 1/2| and calculate theirLebesgue integrals. Do they coincide with the Riemann integrals ?

(3) Show that Xn(ω) ≤ X(ω) for all n ≥ 1 and limn Xn(ω) = X(ω) forany ω ∈ Ω.

5 a simple random variable takes a finite number of values

(4) Use Lebesgue's Monotone Convergence theorem$^{6}$ to show that the limit $\lim_n EX_n$ exists. This limit is the Lebesgue integral of $X$, denoted $\int X\,dP$ or $EX$.

(5) Extend the definition of $EX$ to an arbitrary random variable $X$ with $E|X| < \infty$

Hint: note that $X = X^+ - X^-$ and $|X| = X^+ + X^-$, where $X^+ = X \vee 0$ and $X^- = -(X \wedge 0)$

(6) Prove that $X(\omega) = \mathbf 1_{\{\omega\in\mathbb Q\}}$, $\omega \in [0, 1]$, is Lebesgue integrable on the Lebesgue space and calculate the integral.
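For part (2), the integrals of the simple approximations can be computed exactly. The sketch below assumes the Lebesgue space and $X(\omega) = |\omega - 1/2|$, so that $X$ is uniform on $[0, 1/2]$, and uses half-open dyadic intervals $[(k-1)/2^n, k/2^n)$; the function names are illustrative.

```python
from fractions import Fraction


def prob_X_less_than(x):
    """P(X < x) for X(omega) = |omega - 1/2| on the Lebesgue space."""
    return min(max(2 * Fraction(x), Fraction(0)), Fraction(1))


def expectation_of_simple_approximation(n):
    """E X_n = sum_k (k-1)/2^n * P((k-1)/2^n <= X < k/2^n) + n * P(X >= n)."""
    step = Fraction(1, 2 ** n)
    total = Fraction(0)
    for k in range(1, n * 2 ** n + 1):
        lo, hi = (k - 1) * step, k * step
        total += lo * (prob_X_less_than(hi) - prob_X_less_than(lo))
    total += n * (1 - prob_X_less_than(n))
    return total


for n in (1, 2, 3, 6):
    print(n, expectation_of_simple_approximation(n))
```

The values increase with $n$ towards $1/4$, which is also the Riemann integral $\int_0^1 |\omega - 1/2|\,d\omega$.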

Remark 1.17. A theorem of Lebesgue tells that a function on [0, 1] isRiemann integrable if and only if the set of its discontinuity points has theLebesgue measure zero, in which case the Riemann and the Lebesgue inte-grals coincide. However, Lebesgue integrability conditions are much weaker:only measurability is required. Moreover, the construction of the Lebesgueintegral does not depend on the particular geometry of the space (unlike theconstruction of the Riemann integral on R). This makes possible integrationin very abstract infinite dimensional spaces such as the space of sequences orspace of continuous functions, etc.: all one needs is a measurable structureand a measure on the space!

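Problem 2.1.5. This problem explores the cylinder sets and the σ-algebra they generate. A subset $C \subseteq \mathbb R^{[0,\infty)}$ is a cylinder if there exist a finite number of distinct times $t_1, ..., t_n$ and a set $A \in \mathcal B(\mathbb R^n)$ such that

$$C = \{\omega \in \mathbb R^{[0,\infty)} : (\omega_{t_1}, ..., \omega_{t_n}) \in A\}. \qquad (1.2)$$

Depending on the structure of the set $A$, three distinct collections of cylinder sets can be considered:

(a) $\mathcal C_1$ is the collection of the cylinder sets as in (1.2) with $A$ of the form $A := I_1 \times ... \times I_n$, where $I_i$ are open intervals of $\mathbb R$;

(b) $\mathcal C_2$ is the collection of the cylinder sets as in (1.2) with $A$ of the form $A := A_1 \times ... \times A_n$, where $A_i \in \mathcal B(\mathbb R)$;

(c) $\mathcal C_3$ is the collection of the cylinder sets as in (1.2) with $A \in \mathcal B(\mathbb R^n)$

Clearly, $\mathcal C_1 \subseteq \mathcal C_2 \subseteq \mathcal C_3$ and hence $\sigma\{\mathcal C_1\} \subseteq \sigma\{\mathcal C_2\} \subseteq \sigma\{\mathcal C_3\}$ (why?)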

(1) Show that C3 is an algebra

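6 Theorem: if $(X_n)$ is an increasing sequence of nonnegative random variables, i.e. $X_n(\omega) \ge 0$ and $X_n(\omega) \le X_{n+1}(\omega)$ $P$-a.s., then the function $X(\omega) := \lim_n X_n(\omega)$ is a random variable and $EX = \lim_n EX_n$

(2) (bonus) Show that

$$\sigma\{\mathcal C_1\} = \sigma\{\mathcal C_2\} = \sigma\{\mathcal C_3\} =: \mathcal B^{[0,\infty)} \qquad (1.3)$$

and that for any $\Gamma \in \mathcal B^{[0,\infty)}$ there exist an at most countable sequence of times $(t_n)_{n\in\mathbb N}$ and$^{7}$ $A \in \mathcal B^{\mathbb N}$ such that

$$\Gamma = \{\omega : (\omega_{t_1}, \omega_{t_2}, ...) \in A\}. \qquad (1.4)$$

Hint: the collection $\mathcal C$ of sets of the form (1.4) contains the cylinder sets from $\mathcal C_3$ and hence

$$\sigma\{\mathcal C_1\} \subseteq \sigma\{\mathcal C_2\} \subseteq \sigma\{\mathcal C_3\} \subseteq \sigma(\mathcal C).$$

Argue that any set in $\mathcal C$ belongs to $\sigma\{\mathcal C_1\}$, i.e. $\mathcal C \subseteq \sigma\{\mathcal C_1\}$, and show that $\mathcal C$ is a σ-algebra: conclude that (1.3) holds and $\mathcal B^{[0,\infty)} = \mathcal C$.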

(3) For a finite set of distinct times $t_1, ..., t_n$ define the projection functional

$$\pi_{t_1,...,t_n}(\omega) := (\omega_{t_1}, ..., \omega_{t_n}) \in \mathbb R^n, \quad \omega \in \mathbb R^{[0,\infty)}. \qquad (1.5)$$

Show that the projections are measurable with respect to $\mathcal B^{[0,\infty)}$

(4) Use (2) to show that the following subsets of $\mathbb R^{[0,\infty)}$ are not measurable with respect to $\mathcal B^{[0,\infty)}$

(a) $\{\omega : \sup_{t\ge0}\omega_t \le c\}$ with a constant $c \in \mathbb R$
(b) the set of all functions continuous at the time point $t_0$
(c) the set of all right-continuous functions

Hint: argue by contradiction

Remark: the latter suggests that a probability measure defined on $\mathcal B^{[0,\infty)}$ may not be defined on the above sets; in particular, a statement such as "the coordinate process $X_t(\omega) := \omega_t$ is continuous with probability one" can be meaningless!

(5) Consider the space of real valued sequences

$$\mathbb R^{\mathbb N} = \{\omega = (\omega_1, \omega_2, ...) : \omega_i \in \mathbb R\}$$

and let $\mathcal B^{\mathbb N}$ denote the cylinder σ-algebra of subsets of $\mathbb R^{\mathbb N}$, i.e. the σ-algebra generated by the cylinder sets. Argue that $\{\omega : \sup_{j\ge1}\omega_j \le c\}$ is $\mathcal B^{\mathbb N}$-measurable. Compare with (4a).

Problem 2.1.6. In this problem we shall explore the connection between the σ-algebra of cylinder sets and the Borel σ-algebra generated by the open balls with respect to the uniform metric. Below, $\mathbb R^{[0,1]}$ stands for the space of real valued functions on the interval $[0, 1]$.

7 $\mathcal B^{\mathbb N}$ is the cylinder σ-algebra of subsets of the space of real valued sequences $\mathbb R^{\mathbb N}$

(1) Show that the map

$$\rho(x, y) = \sup_{t\in[0,1]}|x_t - y_t|, \quad x, y \in \mathbb R^{[0,1]}$$

is a metric$^{8}$ on $\mathbb R^{[0,1]}$.

(2) Check that the projection operators from (1.5) are continuous with respect to ρ

The Borel σ-algebra on $\mathbb R^{[0,1]}$, denoted $\mathcal B(\mathbb R^{[0,1]})$, is the minimal σ-algebra of subsets of $\mathbb R^{[0,1]}$ containing the open balls:

$$B_r(x) := \{y \in \mathbb R^{[0,1]} : \rho(x, y) < r\}, \quad r > 0,\ x \in \mathbb R^{[0,1]}.$$

(3) Show that $\mathcal B^{[0,1]} \subseteq \mathcal B(\mathbb R^{[0,1]})$.

Hint: argue that it is enough to check

$$\{x \in \mathbb R^{[0,1]} : x_t \in A\} \in \mathcal B(\mathbb R^{[0,1]})$$

for any $t \in [0, 1]$ and $A \in \mathcal B(\mathbb R)$; use (2) and the fact that a continuous functional on a metric space is Borel measurable (prove the latter, if curious)

(4) Argue that $\mathcal B^{[0,1]} \not\supseteq \mathcal B(\mathbb R^{[0,1]})$ by constructing an appropriate set

(5) Let $C_{[0,1]}$ be the space of continuous functions on $[0, 1]$. Denote by $\mathcal B^{[0,1]}_C$ the σ-algebra of the cylinder subsets of $C_{[0,1]}$ and by $\mathcal B(C_{[0,1]})$ the Borel σ-algebra of subsets of $C_{[0,1]}$ with the uniform metric ρ as above. Prove that $\mathcal B^{[0,1]}_C = \mathcal B(C_{[0,1]})$.

Problem 2.1.7. Recall that an $\mathcal F_t$-adapted process $Y = (Y_t)_{t\in[0,\infty)}$ on a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t), P)$ is Poisson if

(i) $Y_0 = 0$, $P$-a.s.
(ii) the increments$^{9}$ $Y_t - Y_s \sim \mathrm{Poi}(\lambda(t - s))$ are independent$^{10}$ of $\mathcal F_s$ for all $t > s$
(iii) the trajectories $t \mapsto Y_t(\omega)$ are piecewise constant

(1) Calculate the autocovariance function $\mathrm{cov}(Y_t, Y_s)$ for the Poisson process

(2) Argue that the Poisson process has non-decreasing trajectories with unit jumps

8 recall or look up the relevant axioms

9 recall that a r.v. $\xi \sim \mathrm{Poi}(\mu)$ with parameter $\mu > 0$ if $P(\xi = k) = e^{-\mu}\mu^k/k!$ for $k \in \mathbb N \cup \{0\}$

10 a r.v. ξ is independent of a σ-algebra $\mathcal G$ if ξ and $\mathbf 1_A$ are independent for any $A \in \mathcal G$


(3) Show that the times between the jumps of the Poisson process arei.i.d. exponential random variables

(4) On the basis of (3), suggest a simple construction of the Poissonprocess
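One construction in the spirit of (4) sums i.i.d. exponential waiting times to obtain the jump times and then counts the jumps. This is a sketch of that idea only; the function names, the rate and the time horizon are illustrative assumptions.

```python
import random


def poisson_jump_times(rate, horizon, rng):
    """Jump times in [0, horizon], obtained by summing i.i.d. Exp(rate) waiting times."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def poisson_value(jump_times, t):
    """Y_t = number of jumps that occurred up to and including time t."""
    return sum(1 for s in jump_times if s <= t)


rng = random.Random(2011)
jumps = poisson_jump_times(rate=2.0, horizon=10.0, rng=rng)

# A piecewise constant, non-decreasing path with unit jumps, sampled on a grid.
path = [poisson_value(jumps, 0.5 * k) for k in range(21)]
print(path)
```

With rate 2 and horizon 10, the final value `path[-1]` is a draw from Poi(20).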

Problem 2.1.8. In this problem, we shall see that the D-K theorem may produce processes with very irregular trajectories. To this end, consider the following notions of continuity of random processes. A process $X$ on $(\Omega, \mathcal F, P)$ is called continuous in probability if

$$X_{t_n} \xrightarrow{P} X_t$$

whenever $t_n \to t$, for all $t \in [0,\infty)$. $X$ is said to be continuous $P$-a.s. (or to have continuous paths with $P$-prob. 1) if

$$P(\omega : t \mapsto X_t(\omega) \text{ is continuous for all } t \ge 0) = 1.$$

(1) Prove$^{11}$ that the Poisson process is continuous in probability, but not with probability one (in fact, $P$-a.s. discontinuous).

(2) Show that $X$ is continuous in probability if it is continuous $P$-a.s.

(3) Explain why continuity in probability is well defined for the coordinate processes constructed on the product probability space via the D-K theorem, while continuity with probability 1 may be meaningless.

Consider the family of distributions:

$$Q_{(t_1,...,t_n)}(A_1, ..., A_n) = \prod_{i=1}^n\int_{A_i}\varphi(x_i)\,dx_i, \quad (t_1, ..., t_n) \in \mathbb R^n_+,\ A_i \in \mathcal B([0, 1]),\ n \in \mathbb N$$

where φ is the density of N(0, 1) r.v.

(4) Argue that this family is consistent and deduce from the D-K theorem the existence of a random process $X_t(\omega)$, $t \in [0, 1]$, $\omega \in [0, 1]^{[0,\infty)} =: \Omega$, such that

$$P\big(\omega \in \Omega : X_{t_1}(\omega) \in A_1, ..., X_{t_n}(\omega) \in A_n\big) = \prod_{i=1}^n\int_{A_i}\varphi(x_i)\,dx_i$$

where $P$ is a probability measure on the σ-algebra $\mathcal F$ generated by the cylinder sets of Ω. Explain why this process is the continuous time analogue of the sequence of i.i.d. Gaussian random variables ("white noise").

11 various types of convergence are revisited in Problem 2.1.10 below

(5) Prove that the process constructed in (4) is not continuous in probability (and hence not $P$-a.s. continuous, if applicable)

Hint: calculate the probability $P(|X_{t_n} - X_t| \ge \varepsilon)$

(6) Follow the steps below to prove that the process constructed in (4) is not even measurable

(a) Suppose that $X$ is measurable.

(b) Use the Fubini theorem$^{12}$ to argue that

$$E\Big(\int_0^t X_s\,ds\Big)^2 = E\int_0^t\int_0^t X_sX_r\,ds\,dr = 0, \quad \forall t \in [0, 1],$$

i.e.

$$P\Big(\int_0^t X_s\,ds = 0\Big) = 1, \quad \forall t \in [0, 1]$$

(c) Let $\Omega_r := \{\omega \in \Omega : \int_0^r X_s(\omega)\,ds = 0\}$ for all rational $r \in [0, 1]$ and define

$$\Omega' := \bigcap_{r\in\mathbb Q\cap[0,1]}\Omega_r.$$

Argue that $P(\Omega') = 1$ and$^{13}$ thus

$$P\big(\lambda\{s \in [0, 1] : X_s(\omega) = 0\} = 1\big) = 1,$$

and, in turn, $E\int_0^1 X_s^2\,ds = 0$.

(d) Use the Fubini theorem again and check that

$$E\int_0^1 X_s^2\,ds = 1,$$

which is a contradiction, showing that (a) cannot be true, i.e. $X$ is not measurable.

12 consult a book for the Fubini theorem to refresh your memory

13 Recall that by the Radon-Nikodym theorem, for any signed measure µ absolutely continuous w.r.t. the Lebesgue measure λ, there exists a λ-a.s. unique function $\varphi_s$ such that $\mu(A) = \int_A \varphi_s\,d\lambda(s)$.

Suppose $g_s$ is a Borel measurable function on $[0, 1]$ with $\int_0^1|g_s|\,d\lambda(s) < \infty$; then $\mu(A) := \int_A g_s\,d\lambda(s)$ is a signed measure on $[0, 1]$, absolutely continuous w.r.t. λ. If $\int_0^r g_s\,ds = 0$ for all rational $r \in [0, 1]$, then $\mu((r, q)) = 0$ for all rational $r, q$, and since such intervals form a measure-determining class, $\mu(A) = 0$ for all Borel $A$. But then by the R-N theorem $g$ must coincide with the zero function λ-a.s., i.e. $\lambda\{s \in [0, 1] : g_s \ne 0\} = 0$.
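The hint to part (5) of Problem 2.1.8 can be checked numerically: for $s \ne t$ the values $X_s$ and $X_t$ are independent N(0, 1), so $X_t - X_s \sim N(0, 2)$ regardless of how close $s$ is to $t$. The sketch below compares the closed form with a Monte Carlo estimate; the function names and the sample size are illustrative assumptions.

```python
import math
import random


def prob_increment_exceeds(eps):
    """P(|X_t - X_s| >= eps) for s != t, where X_t - X_s ~ N(0, 2)."""
    return math.erfc(eps / 2.0)


def monte_carlo_estimate(eps, n_samples, rng):
    """Empirical frequency of |X_t - X_s| >= eps for independent N(0,1) values."""
    hits = 0
    for _ in range(n_samples):
        x_s, x_t = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if abs(x_t - x_s) >= eps:
            hits += 1
    return hits / n_samples


rng = random.Random(7)
print(prob_increment_exceeds(1.0))             # does not depend on |t - s|
print(monte_carlo_estimate(1.0, 20_000, rng))  # close to the value above
```

Since this probability does not decay as $t_n \to t$, the process cannot be continuous in probability.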


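Problem 2.1.9. Let $(\xi_n)$, $n \in \mathbb N \cup \{0\}$, be a sequence of i.i.d. bounded$^{14}$ r.v.'s, defined on a probability space $(\Omega, \mathcal F, P)$. Set

$$\mathcal F_t := \sigma\{\xi_0, ..., \xi_{\lfloor t\rfloor}\}.$$

(1) Show that $(\mathcal F_t)_{t\in[0,\infty)}$ is a filtration

(2) Argue that

$$X_t(\omega) := \sum_{i=0}^\infty \xi_i(\omega)\Big(\frac{1}{2 + t}\Big)^i$$

is a measurable process, not adapted to $\mathcal F_t$.

(3) Argue that

$$X_t(\omega) := \sum_{i=1}^{\lfloor t\rfloor}\xi_i(\omega)\Big(\frac{1}{2 + t}\Big)^i$$

is a measurable process, adapted to $\mathcal F_t$.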

Problem 2.1.10. Let $(\xi_n)$, $n \in \mathbb N$, and ξ be random variables on $(\Omega, \mathcal F, P)$. Recall that $(\xi_n)$ converges to ξ in $P$-probability, denoted $\xi_n \xrightarrow{P} \xi$, if

$$\lim_n P(\omega : |\xi_n(\omega) - \xi(\omega)| \ge \varepsilon) = 0, \quad \forall \varepsilon > 0$$

(1) Show that the sequence of Bernoulli r.v. $\xi_n$ with $P(\xi_n = 1) = 1/n$ converges in probability to $\xi \equiv 0$.

(2) Show that the sequence of i.i.d. Bernoulli r.v. with $P(\xi_n = 1) = 1/2$ does not converge in probability (to any random variable)

The sequence $(\xi_n)$ converges to ξ with $P$-probability 1 (or $P$-a.s., or $P$-a.e., etc.), denoted $\xi_n \xrightarrow{P\text{-a.s.}} \xi$, if

$$P(\omega : \lim_n \xi_n(\omega) = \xi(\omega)) = 1.$$

(3) Let $(\Omega, \mathcal F, P)$ be the Lebesgue probability space. Show that the sequence of Bernoulli r.v.

$$\xi_n(\omega) = \begin{cases} 1, & \omega \in \Big(\dfrac{n - 2^{\lfloor\log_2 n\rfloor}}{2^{\lfloor\log_2 n\rfloor}},\ \dfrac{n + 1 - 2^{\lfloor\log_2 n\rfloor}}{2^{\lfloor\log_2 n\rfloor}}\Big) \\ 0, & \text{otherwise} \end{cases}$$

converges in probability to 0, but diverges $P$-a.s. (draw a picture of $\xi_1$, $\xi_2$ and $\xi_5$)

(4) Show that the sequence $\xi_n := \xi/n$, where ξ is a r.v., converges to 0 both in probability and $P$-a.s.

14 i.e. $|\xi_1(\omega)| \le C$ for some $C > 0$


(5) Prove that P-a.s. convergence implies convergence in probability

(6) Prove that if a sequence converges both in probability and P-a.s.,the limits coincide P-a.s.

The sequence $(\xi_n)$ converges to ξ in $L^p$, $p \ge 1$, denoted $\xi_n \xrightarrow{L^p} \xi$, if $E|\xi_n|^p < \infty$ and $E|\xi|^p < \infty$ and

$$\lim_n E|\xi_n - \xi|^p = 0.$$

(7) Show that convergence in $L^p$ implies convergence in probability.

(8) Give an example of a sequence which converges $P$-a.s., but not in $L^1$

(9) Show that convergence in $L^p$ implies convergence in $L^q$ for $p \ge q \ge 1$.
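The sequence in part (3) of this problem (sometimes called the "typewriter" sequence) can be explored with exact arithmetic. The sketch below is an illustration only; the function names are hypothetical and the test point $\omega = 1/3$ is an arbitrary choice of a non-dyadic number.

```python
from fractions import Fraction


def typewriter_interval(n):
    """Interval ((n - 2^m)/2^m, (n + 1 - 2^m)/2^m) with m = floor(log2 n), n >= 1."""
    m = n.bit_length() - 1           # exact floor(log2 n) for positive integers
    block = 2 ** m
    return Fraction(n - block, block), Fraction(n + 1 - block, block)


def xi(n, omega):
    left, right = typewriter_interval(n)
    return 1 if left < omega < right else 0


omega = Fraction(1, 3)
hits = [n for n in range(1, 2 ** 10) if xi(n, omega) == 1]
lengths = [typewriter_interval(n)[1] - typewriter_interval(n)[0] for n in (1, 2, 5, 1000)]

print(typewriter_interval(5))   # the interval (1/4, 1/2)
print(lengths)                  # P(xi_n = 1) shrinks to zero ...
print(hits)                     # ... yet xi_n(omega) = 1 once on every dyadic level
```

The shrinking interval lengths give convergence in probability, while the recurring hits show that $\xi_n(\omega)$ does not converge for this ω.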

Problem 2.1.11. Prove that if two modifications of a process have cadlag paths, they are indistinguishable.

Hint: argue first that a cadlag function is determined by its values at the rational numbers

2. The Brownian motion and its existence

Definition 2.1. The $\mathcal F_t$-adapted process $B = (B_t)_{t\in[0,\infty)}$ on a filtered space $(\Omega, \mathcal F, (\mathcal F_t), P)$ is the (standard one-dimensional) Brownian motion (or the Wiener process) if

(i) $B_0 = 0$, $P$-a.s.
(ii) $B_t - B_s \sim N(0, t - s)$ is independent of $\mathcal F_s$ for all $t \ge s \ge 0$
(iii) $B$ has continuous paths

Remark 2.2. The Brownian motion is similarly defined on any finite time interval $[0, T]$, $T > 0$.

Do the latter axioms define a process? In other words, can we construct a filtered probability space and a process on it, so that the above axioms are satisfied?

We shall mention three different constructions of the Brownian motion, briefly sketching two of them and focusing on the third, more direct one, due to Paul Lévy.

2.1. Construction via D-K theorem. Consider the following consistent (check!) family of distributions:

$$Q_{t_1,...,t_n}(A_1, ..., A_n) := \int_{A_1\times...\times A_n}\prod_{i=1}^n\frac{1}{\sqrt{2\pi(t_i - t_{i-1})}}\exp\Big(-\frac12\frac{(x_i - x_{i-1})^2}{t_i - t_{i-1}}\Big)dx_1...dx_n$$

if $0 < t_1 < t_2 < ... < t_n < \infty$ (with the convention $t_0 := 0$ and $x_0 := 0$), and

$$Q_{t_1,...,t_n}(A_1, ..., A_n) := Q_{t_{i_1},...,t_{i_n}}(A_{i_1}, ..., A_{i_n}),$$

for an arbitrary unordered set of times $t_1, ..., t_n$, where $(i_1, ..., i_n)$ is the permutation which puts the $t_i$'s in increasing order, i.e. such that $t_{i_1} < t_{i_2} < ... < t_{i_n}$.

The D-K theorem immediately gives a probability $P$ on the measurable space $(\Omega, \mathcal F) := \big(\mathbb R^{[0,\infty)}, \mathcal B^{[0,\infty)}\big)$ such that the coordinate process $X_t(\omega) := \omega_t$ has Gaussian independent increments, i.e. for any $t_1 < ... < t_n$, the random variables

$$X_{t_1},\ X_{t_2} - X_{t_1},\ ...,\ X_{t_n} - X_{t_{n-1}}$$

are independent and $X_{t_i} - X_{t_{i-1}} \sim N(0, t_i - t_{i-1})$ (why?). However, the continuity of paths (axiom (iii)) does not follow from the D-K theorem (look back at Example 1.14 and Problem 2.1.8). One way to resolve this is to hope that $P$ assigns probability 1 to the set of continuous functions (in which case (iii) would hold at least $P$-a.s.). However, this is not the case, as the set of continuous functions is not measurable with respect to the cylinder σ-algebra, on which $P$ is defined. The following remarkable theorem comes to the rescue:

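Theorem 2.3 (Kolmogorov-Chentsov). Let $X$ be a process on $(\Omega, \mathcal F, P)$ such that

$$E|X_t - X_s|^\alpha \le C|t - s|^{1+\beta} \quad \forall t, s \in [0, 1],$$

for some positive constants α, β and $C$. Then there exists a continuous modification $\tilde X$ of $X$. Moreover, $\tilde X$ is locally Hölder continuous with any exponent $\gamma \in (0, \beta/\alpha)$, i.e. for some $\delta > 0$ and a $P$-a.s. positive random variable $h(\omega)$

$$P\Big(\omega \in \Omega : \sup_{0 < t - s < h(\omega) \le 1}\frac{|\tilde X_t(\omega) - \tilde X_s(\omega)|}{|t - s|^\gamma} \le \delta\Big) = 1.$$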

Note that to check the existence of a continuous version, we only need to know the two dimensional distributions! The K-Ch theorem is readily applied to the f.d.d. of the Brownian motion: recall that if $\xi \sim N(0, \sigma^2)$, then $E\xi^{2k} = C_k\sigma^{2k}$ for all $k \in \mathbb N$ with constants $C_k > 0$. Hence for the process $X$ constructed above, with $k \ge 2$,

$$E|X_t - X_s|^{2k} = C_k|t - s|^{1+(k-1)},$$

which implies the existence of a continuous version of $X$ (restricted to the interval $[0, 1]$), which we call $B$. Moreover, taking $k \to \infty$, we see that the trajectories of $B$ are Hölder continuous with any exponent $\gamma < 1/2$, since $\beta/\alpha = (k-1)/(2k) \to 1/2$. Similarly, applying the K-Ch theorem to $X$ on the intervals $[m, m + 1]$, $m \ge 1$, and patching the obtained segments appropriately (think how), we obtain a process $B = (B_t)_{t\in[0,\infty)}$, defined on the probability space provided by the D-K theorem, whose paths are continuous and whose increments are independent and Gaussian as required by the axioms above. Finally, we can choose $\mathcal F_t := \mathcal F^B_t = \sigma\{B_s, s \le t\}$, which completes the construction.
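The constants in the moment bound can be made explicit: for a centered Gaussian variable, $C_k = (2k-1)!! = 1\cdot 3\cdots(2k-1)$, and the K-Ch theorem is applied with $\alpha = 2k$ and $\beta = k - 1$. The short sketch below (function names are illustrative) tabulates $C_k$ and the resulting bound $\beta/\alpha$ on the Hölder exponent.

```python
from fractions import Fraction


def gaussian_even_moment_constant(k):
    """C_k = (2k - 1)!! so that E xi^(2k) = C_k * sigma^(2k) for xi ~ N(0, sigma^2)."""
    c = 1
    for j in range(1, 2 * k, 2):
        c *= j
    return c


def holder_exponent_bound(k):
    """beta / alpha in the K-Ch condition with alpha = 2k and beta = k - 1."""
    return Fraction(k - 1, 2 * k)


for k in (2, 3, 10, 50):
    print(k, gaussian_even_moment_constant(k), holder_exponent_bound(k))
```

The bound increases with $k$ but stays strictly below $1/2$, in line with the range $\gamma < 1/2$ of Hölder exponents.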

2.2. Construction via weak convergence. Recall that a sequence of real random variables $(\xi_n)$ (possibly defined on different probability spaces) converges to $\xi$ in distribution, denoted $\xi_n \xrightarrow{d} \xi$, if the (cumulative) distribution functions of $\xi_n$ converge to the distribution function of $\xi$ at all points of continuity of the latter:

\[ \lim_n \mathbb{P}(\xi_n \le x) = \mathbb{P}(\xi \le x), \quad \forall x \text{ at which } x \mapsto \mathbb{P}(\xi \le x) \text{ is continuous.} \]

Equivalently (why?), $\xi_n \xrightarrow{d} \xi$ if and only if

\[ \lim_n \mathbb{E}f(\xi_n) = \mathbb{E}f(\xi) \quad \text{for any bounded continuous function } f. \]

Both definitions of convergence in distribution (also called weak convergence) can be given for sequences of random vectors in $\mathbb{R}^d$ of a finite dimension $d \ge 1$. However, since the cumulative distribution function cannot be defined for a random process $X = (X_t)_{t\in[0,1]}$, the weak convergence of a sequence of processes $X^n$ is defined through the latter characterization.

To this end, let $C[0,1]$ be the space of continuous functions on the time interval $[0,1]$ with the uniform metric

\[ \rho(x,y) = \sup_{t\in[0,1]} |x_t - y_t|, \quad x, y \in C[0,1]. \]

The Borel σ-field of $C[0,1]$, denoted by $\mathcal{B}(C[0,1])$, is the minimal σ-field which contains the open balls $B_r(x) = \{y \in C[0,1] : \rho(x,y) < r\}$, $r > 0$, $x \in C[0,1]$. A random process $X$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ whose paths are continuous can now be viewed as a $C[0,1]$-valued random variable, i.e. a measurable map of $(\Omega, \mathcal{F})$ to $(C[0,1], \mathcal{B}(C[0,1]))$. In particular, $X$ induces a probability measure on the Borel field of the space of continuous functions, which plays the role of, and is often referred to as, the distribution or (probability) law of $X$.

Definition 2.4. A sequence of processes $(X^n)$ converges to a process $X$ weakly, denoted $X^n \xrightarrow{d} X$, if

\[ \mathbb{E}f(X^n) \to \mathbb{E}f(X) \]

for any bounded continuous (w.r.t. the uniform metric) functional $f$.

The projection $\pi_{t_1,...,t_n}(x) = (x_{t_1}, ..., x_{t_n})$, $\sup_{t\le 1} x_t$ and $\int_0^1 x_t\,dt$ are examples of continuous functionals on $C[0,1]$ (explain why). The first hitting time of a level $a \in \mathbb{R}$:

\[ \tau_a(x) = \inf\{t \in [0,1] : x_t = a\} \]

(with the convention $\inf \emptyset = \infty$) and $\mathbf{1}_{\{x_t = c\}}$ with a constant $c$ are examples of discontinuous functionals on $C[0,1]$ (why?).


Since $X^n$ and $X$ induce probability measures $\mu_n$ and $\mu$ on the space of continuous functions on $[0,1]$, the latter convergence is nothing but

\[ \lim_n \int_{C[0,1]} f\,d\mu_n = \int_{C[0,1]} f\,d\mu \]

for all bounded continuous functionals $f$. Hence if we want to construct a process with the given f.d.d., we may try to find a sequence of weakly convergent processes, or equivalently a sequence of weakly convergent measures on $\mathcal{B}(C[0,1])$, such that the limit probability measure $\mu$ on $(\Omega, \mathcal{F}) = (C[0,1], \mathcal{B}(C[0,1]))$ has the given f.d.d. The coordinate process $X_t(\omega) = \omega_t$ on

\[ (\Omega, \mathcal{F}, \mathbb{P}) = (C[0,1], \mathcal{B}(C[0,1]), \mu) \]

is then the required process. In the case of the Brownian motion, the so obtained probability measure $\mu$ and the corresponding probability space are called the Wiener measure and the Wiener space after N. Wiener, who was the first to give a rigorous construction of the Brownian motion.

The disadvantage of this approach is that it requires finding an appropriate sequence of weakly convergent processes, which in general can be quite challenging. On the technical level, establishing weak convergence of processes is more involved than for finite dimensional random variables and is beyond the scope of our course. We shall only sketch the main idea of how it can be used to construct the Brownian motion.

To this end, let $(\xi_i)$ be a sequence of i.i.d. symmetric random signs$^{15}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, i.e. $\mathbb{P}(\xi_1 = \pm 1) = 1/2$, and for a fixed $n \ge 1$ define $B^n = (B^n_t)_{t\in[0,1]}$ as the linear interpolation of the values of $B^n_t$ at the points $t_i = i/n$, $i = 0, ..., n$:

\[ B^n_{t_i} := \frac{1}{\sqrt{n}} \sum_{j=1}^{i} \xi_j, \quad t_i \in [0,1]. \]

In words, $B^n$ is the rescaled and interpolated simple symmetric random walk on integers, where the time is scaled by the factor $1/n$ and the amplitude by the factor $1/\sqrt{n}$ (draw typical paths of $B^n_t$ for $n = 1, 2, 3$). By the classical CLT (check!)

\[ B^n_t \xrightarrow{d} B_t, \quad \forall t \in [0,1], \]

where $B_t \sim N(0,t)$ is the Brownian motion at time $t$. By the classical multivariate CLT (check!)

\[ (B^n_{t_1}, ..., B^n_{t_k}) \xrightarrow{d} (B_{t_1}, ..., B_{t_k}), \quad \forall (t_1, ..., t_k) \in [0,1]^k, \ \forall k \in \mathbb{N}, \]

where the right hand side is a Gaussian vector, whose distribution is the corresponding $k$-dimensional f.d.d. of the Brownian motion. While the latter is a strong indication in favor of the weak convergence $B^n \xrightarrow{d} B$ in the sense of Definition 2.4, its precise justification is more involved:

15 any other zero mean random variables with finite variance can be used instead


Figure 1. Typical trajectories of $B^n$ for various values of $n$ (four panels, $n = 10, 100, 500, 1000$; horizontal axis $t \in [0,1]$)

Theorem 2.5 (Donsker's invariance principle). The sequence $B^n$ converges weakly to the Brownian motion.
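The rescaled walk $B^n$ is easy to simulate. The following Python sketch (function names are hypothetical, and the seed is fixed only for reproducibility) builds the grid values $B^n_{i/n}$ from random signs and evaluates the linear interpolation at an arbitrary $t \in [0,1]$.

```python
import math
import random

def scaled_walk(n, rng):
    """Values of B^n at the grid points t_i = i/n, i = 0, ..., n."""
    values = [0.0]
    for _ in range(n):
        xi = rng.choice((-1, 1))              # symmetric random sign
        values.append(values[-1] + xi / math.sqrt(n))
    return values

def interpolate(values, t):
    """Linear interpolation of the grid values at a time t in [0, 1]."""
    n = len(values) - 1
    i = min(int(t * n), n - 1)                # index of the left grid point
    frac = t * n - i                          # relative position inside the cell
    return (1 - frac) * values[i] + frac * values[i + 1]

rng = random.Random(0)                        # fixed seed, an arbitrary choice
path = scaled_walk(1000, rng)
print(interpolate(path, 0.5), path[-1])
```

Plotting `path` against the grid `i/n` for several values of `n` should produce pictures of the kind described in the caption of Figure 1.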

Remark 2.6. Note that this construction also gives a way to calculate expectations of continuous functionals of the Brownian motion. For example, a combinatorial formula can be used to calculate $\mathbb{E}\sup_{t\in[0,1]} B^n_t$ and since $x \mapsto \sup_{t\in[0,1]} x_t$ is a continuous functional, it follows$^{16}$ from the weak convergence $B^n \xrightarrow{d} B$ that $\mathbb{E}\sup_{t\in[0,1]} B^n_t \to \mathbb{E}\sup_{t\in[0,1]} B_t$. As we shall see, there are other ways to calculate the latter expectation, directly from the axiomatic definition of the Brownian motion.
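As an illustration of the remark, the expectation of the maximum of the rescaled walk can be computed exactly for moderate $n$ by propagating the joint law of the current position and the running maximum. The Python sketch below uses this brute-force dynamic programming rather than a closed combinatorial formula; the function name is hypothetical. The comparison value $\sqrt{2/\pi}$ for $\mathbb{E}\sup_{t\le 1}B_t$ is a standard fact which is not derived at this point of the notes.

```python
import math
from collections import defaultdict

def expected_max_scaled_walk(n):
    """Exact E max_{0<=i<=n} S_i / sqrt(n) for the simple symmetric random walk S."""
    dist = {(0, 0): 1.0}                      # (position, running maximum) -> probability
    for _ in range(n):
        nxt = defaultdict(float)
        for (pos, mx), p in dist.items():
            for step in (-1, 1):
                new_pos = pos + step
                nxt[(new_pos, max(mx, new_pos))] += 0.5 * p
        dist = nxt
    return sum(mx * p for (_, mx), p in dist.items()) / math.sqrt(n)

value = expected_max_scaled_walk(100)
print(value, math.sqrt(2 / math.pi))          # the two numbers get closer as n grows
```

Since $B^n$ is piecewise linear, its supremum over $[0,1]$ is attained at a grid point, so the maximum over the grid is all that is needed.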

Remark 2.7. This construction gives a hint of what the Brownian paths look like (see Figure 1).

2.3. Construction via subsequent refinements (P. Lévy). The main idea of this method is to construct a convergent sequence of processes $B^n = (B^n_t)_{t\in[0,1]}$ on a given probability space, so that its limit $B := \lim_n B^n$ satisfies the axioms of the Brownian motion.

To this end, consider a triangular array of i.i.d. $N(0,1)$ random variables $\xi^{(n)}_k$, $k \in I(n)$, $n \in \mathbb{N} \cup \{0\}$, where $I(n)$ is the set of odd integers between $0$ and $2^n$, i.e. $I(0) = \{1\}$, $I(1) = \{1\}$, $I(2) = \{1, 3\}$, etc., defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (as we saw before, such an array can indeed be constructed).

16 actually, an additional argument is required since $\sup_{t\in[0,1]} x_t$ is not a bounded functional


Figure 2. Each of the subplots depicts typical realizations of $B^{(n-1)}_t$ (dotted line) along with its refinement $B^{(n)}_t$ (solid line) for $n = 1, 2, 3, 4$ (horizontal axis: time in $[0,1]$)

Define the sequence of random processes $B^{(n)} = \big(B^{(n)}_t\big)_{t\in[0,1]}$, $n \in \mathbb{N} \cup \{0\}$, by setting $B^{(0)}_t = t\,\xi^{(0)}_1$ and proceeding recursively as follows. First define $B^{(n)}_t$ at the dyadic points $t \in \{0, 1/2^n, ..., 1\}$ by setting:

\[ B^{(n)}_{k/2^{n-1}} := B^{(n-1)}_{k/2^{n-1}}, \quad k \in \{0, ..., 2^{n-1}\}, \]

\[ B^{(n)}_{k/2^{n}} := \frac{1}{2}\Big(B^{(n-1)}_{(k-1)/2^{n}} + B^{(n-1)}_{(k+1)/2^{n}}\Big) + \sqrt{\frac{1}{2^{n+1}}}\,\xi^{(n)}_k, \quad k \in I(n), \]

and extend the definition of $B^{(n)}_t$ to all other points $t \in [0,1]$ by piecewise linear interpolation (run this recursion for several small $n$'s to get a feeling for how it works, see Figure 2).
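The recursion can be run numerically as follows. This Python sketch assumes that each new dyadic midpoint receives the average of its two already defined neighbours plus an independent Gaussian term with standard deviation $2^{-(n+1)/2}$; the function names and the list representation of a path by its grid values are illustrative choices only.

```python
import math
import random

def levy_refine(prev, n, rng):
    """Given B^(n-1) on the grid j/2^(n-1), return B^(n) on the grid j/2^n."""
    assert len(prev) == 2 ** (n - 1) + 1
    new = [0.0] * (2 ** n + 1)
    for j, value in enumerate(prev):          # old dyadic points keep their values
        new[2 * j] = value
    scale = math.sqrt(1.0 / 2 ** (n + 1))
    for k in range(1, 2 ** n, 2):             # k in I(n): odd integers below 2^n
        midpoint = 0.5 * (new[k - 1] + new[k + 1])
        new[k] = midpoint + scale * rng.gauss(0.0, 1.0)
    return new

def levy_path(levels, rng):
    """Grid values of B^(levels), starting from B^(0)_t = t * xi at t = 0 and t = 1."""
    path = [0.0, rng.gauss(0.0, 1.0)]
    for n in range(1, levels + 1):
        path = levy_refine(path, n, rng)
    return path

rng = random.Random(0)                        # fixed seed, an arbitrary choice
print(levy_path(3, rng))
```

Each call to `levy_refine` leaves the previous grid values untouched and only fills in the midpoints, which is the refinement pattern described in the caption of Figure 2.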

We shall establish the existence of a random process $B = (B_t)_{t\in[0,1]}$ on the given probability space $(\Omega, \mathcal{F}, \mathbb{P})$, satisfying the axioms of the Brownian motion.

We shall treat $(B^{(n)})$ as a sequence of random variables in the space $C[0,1]$ of continuous functions on $[0,1]$ with the uniform metric:

\[ \rho(x,y) = \sup_{t\in[0,1]} |x_t - y_t|, \quad x, y \in C[0,1]. \]


Lemma 2.8. $\big(B^{(n)}\big)$ is a Cauchy sequence $\mathbb{P}$-a.s., i.e.

\[ \mathbb{P}\Big(\omega \in \Omega : \lim_n \sup_{m \ge n} \rho\big(B^{(n)}(\omega), B^{(m)}(\omega)\big) = 0\Big) = 1. \tag{2.1} \]

As the space $C[0,1]$ with the uniform metric is complete, i.e. any Cauchy sequence is convergent, the Lemma implies the existence of a measurable process $B_t(\omega)$ and a measurable set $\Omega'$ with $\mathbb{P}(\Omega') = 1$, such that

\[ \lim_n \rho\big(B^{(n)}(\omega), B(\omega)\big) = 0, \quad \forall \omega \in \Omega'. \]

Note that $t \mapsto B_t(\omega)$ is continuous$^{17}$ for all $\omega \in \Omega'$ and by construction $B_0(\omega) = 0$. Hence it is left to show that

Lemma 2.9. $B$ has independent Gaussian increments and

\[ B_t - B_s \sim N(0, t-s), \quad 0 \le s \le t \le 1. \]

To recap, $B$ is the Brownian motion on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}^B_t), \mathbb{P})$.

The following proposition states that $B^{(n)}$ can be written as a series:

Proposition 2.10. For $n \ge 0$,

\[ B^{(n)}_t = \sum_{m=0}^{n} \sum_{k \in I(m)} \xi^{(m)}_k(\omega)\, S^{(m)}_k(t), \]

where $S^{(m)}_k(t)$ are the Schauder functions

\[ S^{(m)}_k(t) = \int_0^t H^{(m)}_k(s)\,ds, \quad t \in [0,1], \]

and

\[ H^{(m)}_k(t) := \begin{cases} 2^{(m-1)/2}, & t \in [(k-1)/2^m, k/2^m) \\ -2^{(m-1)/2}, & t \in [k/2^m, (k+1)/2^m] \\ 0, & \text{otherwise} \end{cases}, \quad k \in I(m),\ m \in \mathbb{N}, \]

together with $H^{(0)}_1(t) := 1$, are the Haar basis functions.

Proof. The proof is by inspection (plot $H^{(m)}_k$ and $S^{(m)}_k$ for small $m$'s to get a feeling for what these functions look like). □

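Proof of Lemma 2.8. To check (2.1), it is enough to find a set $\Omega'$ with $\mathbb{P}(\Omega') = 1$, such that

\[ \sup_{t\in[0,1]} \sum_{m=0}^{\infty} \sum_{k\in I(m)} \big|\xi^{(m)}_k(\omega)\, S^{(m)}_k(t)\big| < \infty, \quad \forall \omega \in \Omega'. \]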

17 note that if the trajectories of $B$ are continuous on a set of full $\mathbb{P}$-measure, a continuous version of $B$ can be chosen by the K-Ch theorem.


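To this end, define $b_n := \max_{k\in I(n)} \big|\xi^{(n)}_k\big|$ and show (exercise) that

\[ \mathbb{P}(b_n > n) \le c\,\frac{1}{n}\, 2^n e^{-n^2/2}, \quad \forall n \ge 1, \]

where $c > 0$ is a constant. Hence by the Borel-Cantelli lemma (see Problem 2.2.9), there exists a random integer $n_0(\omega)$ and a set $\Omega'$ with $\mathbb{P}(\Omega') = 1$, such that

\[ b_n(\omega) \le n, \quad \forall n \ge n_0(\omega), \ \forall \omega \in \Omega'. \]

Then for $\omega \in \Omega'$,

\[ \begin{aligned} \sup_{t\in[0,1]} \sum_{m=0}^{\infty} \sum_{k\in I(m)} \big|\xi^{(m)}_k(\omega)\, S^{(m)}_k(t)\big| &\le \sum_{m=0}^{\infty} b_m \sup_{t\in[0,1]} \sum_{k\in I(m)} S^{(m)}_k(t) \overset{\dagger}{\le} \sum_{m=0}^{\infty} b_m \sqrt{\frac{1}{2^{m+1}}} \\ &\le \sum_{m=0}^{n_0(\omega)} b_m \sqrt{\frac{1}{2^{m+1}}} + \sum_{m=n_0(\omega)+1}^{\infty} m \sqrt{\frac{1}{2^{m+1}}} < \infty, \end{aligned} \]

where in $\dagger$ we used the fact that the supports of $S^{(m)}_k$, $k \in I(m)$, do not overlap on $[0,1]$ and $\sup_{t\in[0,1]} S^{(m)}_k(t) \le \sqrt{\frac{1}{2^{m+1}}}$ for all $k \in I(m)$. □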

The following proposition will be convenient to use in the proof of Lemma 2.9:

Proposition 2.11.

\[ \sum_{n=0}^{\infty} \sum_{k\in I(n)} S^{(n)}_k(t)\, S^{(n)}_k(s) = t \wedge s. \]
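Before turning to the proof, the identity can be checked numerically. The Python sketch below uses the closed form of the Schauder functions as "tents" of height $2^{-(m+1)/2}$ centered at $k/2^m$, and assumes $S^{(0)}_1(t) = t$ (which is what $B^{(0)}_t = t\,\xi^{(0)}_1$ requires); the function names are hypothetical.

```python
def schauder(m, k, t):
    """Schauder function S^(m)_k at t in [0, 1]; S^(0)_1(t) = t is assumed for m = 0."""
    if m == 0:
        return t
    center = k / 2 ** m
    half_width = 1.0 / 2 ** m
    # Tent with slope 2^((m-1)/2), supported on [(k-1)/2^m, (k+1)/2^m].
    return 2 ** ((m - 1) / 2) * max(0.0, half_width - abs(t - center))

def schauder_partial_sum(t, s, levels):
    """Sum over m <= levels and k in I(m) of S^(m)_k(t) * S^(m)_k(s)."""
    total = schauder(0, 1, t) * schauder(0, 1, s)
    for m in range(1, levels + 1):
        for k in range(1, 2 ** m, 2):
            total += schauder(m, k, t) * schauder(m, k, s)
    return total

print(schauder_partial_sum(0.3, 0.7, 12), min(0.3, 0.7))
```

At dyadic points of level at most `levels` the partial sum already equals $t \wedge s$, and at other points it approaches $t \wedge s$ from below as `levels` grows.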

Proof. (sketch) Consider the space $L^2([0,1];\mathbb{R})$ of Borel measurable functions, square integrable on $[0,1]$ w.r.t. the Lebesgue measure:

\[ \Big\{\phi \in \mathbb{R}^{[0,1]} : \int_0^1 \phi_s^2\,ds < \infty\Big\}, \]

with the norm$^{18}$ $\|\phi\|_2 := \sqrt{\int_0^1 \phi_s^2\,ds}$. The space $L^2([0,1];\mathbb{R})$ becomes a Hilbert space, if considered with the scalar product:

\[ \langle \psi, \phi \rangle = \int_0^1 \psi_s \phi_s\,ds. \]

It can be shown that the Haar functions $H^{(n)}_k(t)$, $t \in [0,1]$, form an orthonormal basis, i.e. for any $\phi \in L^2([0,1];\mathbb{R})$

\[ \phi(t) = \sum_{n=0}^{\infty} \sum_{k\in I(n)} \phi_{k,n} H^{(n)}_k(t), \]

18 strictly speaking, $\|\cdot\|_2$ is not a norm, since $\|\phi - \psi\|_2 = 0$ does not imply $\phi = \psi$. Hence the proper way is to define $L^2([0,1];\mathbb{R})$ as the space of equivalence classes of square integrable functions with the equivalence relation $\phi \sim \psi$ if and only if $\|\phi - \psi\|_2 = 0$


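where the coefficients are given by

\[ \phi_{k,n} = \langle \phi, H^{(n)}_k \rangle. \]

Finally, the Parseval identity states that for $\phi, \psi \in L^2$,

\[ \langle \phi, \psi \rangle = \sum_{n=0}^{\infty} \sum_{k\in I(n)} \langle \phi, H^{(n)}_k \rangle \langle \psi, H^{(n)}_k \rangle. \]

Now for $t, s \in [0,1]$, take

\[ \phi(u) := \mathbf{1}_{\{u \le t\}}, \quad \psi(u) := \mathbf{1}_{\{u \le s\}}, \quad u \in [0,1]. \]

Then

\[ s \wedge t = \int_0^1 \psi(u)\phi(u)\,du = \langle \psi, \phi \rangle = \sum_{n=0}^{\infty} \sum_{k\in I(n)} \langle \phi, H^{(n)}_k \rangle \langle \psi, H^{(n)}_k \rangle = \sum_{n=0}^{\infty} \sum_{k\in I(n)} S^{(n)}_k(t)\, S^{(n)}_k(s). \]

□

Now we are ready to complete the construction: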

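Proof of Lemma 2.9. It suffices to check that for any $(\lambda_1, ..., \lambda_k) \in \mathbb{R}^k$ and times $0 = t_0 < t_1 < ... < t_k \le 1$,

\[ \mathbb{E}\exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B_{t_j} - B_{t_{j-1}}\big) \Big\} = \prod_{j=1}^{k} \exp\Big(-\frac{1}{2}\lambda_j^2 (t_j - t_{j-1})\Big). \]

Since $\rho(B^{(n)}, B) \to 0$, $\mathbb{P}$-a.s., and the integrands are bounded by 1 in absolute value,

\[ \mathbb{E}\exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B_{t_j} - B_{t_{j-1}}\big) \Big\} = \lim_{n\to\infty} \mathbb{E}\exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\big) \Big\} \]

and hence it is left to show that

\[ \lim_{n\to\infty} \mathbb{E}\exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\big) \Big\} = \prod_{j=1}^{k} \exp\Big(-\frac{1}{2}\lambda_j^2 (t_j - t_{j-1})\Big). \]

The latter is easily verified using Proposition 2.11 (exercise). □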

2.4. Exercises.

Problem 2.2.1. In this problem we shall see that the axiomatic definition of the Brownian motion determines its finite dimensional distributions. Consider the family of distributions

\[ Q_{t_1,...,t_n}(A_1 \times ... \times A_n) := \int_{A_1 \times ... \times A_n} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi(t_i - t_{i-1})}} \exp\Big\{-\frac{1}{2}\frac{(x_i - x_{i-1})^2}{t_i - t_{i-1}}\Big\}\,dx_1 ... dx_n \]

(with the convention $t_0 := 0$, $x_0 := 0$) whenever $0 < t_1 < ... < t_n < \infty$ and $n \in \mathbb{N}$, and

\[ Q_{t_1,...,t_n}(A_1 \times ... \times A_n) := Q_{t_{j_1},...,t_{j_n}}(A_{j_1} \times ... \times A_{j_n}) \]

for an arbitrary unordered set of times $t_1, ..., t_n$, where $(j_1, ..., j_n)$ is the permutation which puts the $t_i$'s in increasing order, i.e. $0 < t_{j_1} < ... < t_{j_n}$.

(1) Show that the above family is consistent

(2) Show that a process $B$, whose finite dimensional distributions coincide with the above family, has independent Gaussian increments, i.e. for $t_1 < ... < t_n$, the random variables $B_{t_1}, B_{t_2} - B_{t_1}, ..., B_{t_n} - B_{t_{n-1}}$ are independent and $B_{t_i} - B_{t_{i-1}} \sim N(0, t_i - t_{i-1})$.

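Problem 2.2.2. Show that no positive $\alpha$, $\beta$ and $C$ exist so that

\[ \mathbb{E}|X_t - X_s|^{\alpha} \le C|t-s|^{1+\beta}, \quad \forall\, 0 \le s \le t < \infty \]

when $X$ is the Poisson process. Explain why this should be expected in view of the Kolmogorov-Chentsov theorem.

Hint: first show that if $\xi \sim \mathrm{Poi}(\lambda)$, then for any $\alpha > 0$, $\mathbb{E}\xi^{\alpha} \ge 1 - e^{-\lambda}$ and thus $\lim_{\lambda\to 0} \mathbb{E}\xi^{\alpha}/\lambda \ge 1$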

Problem 2.2.3. Let $(\xi_i)$ be an i.i.d. sequence of random signs, i.e. $\mathbb{P}(\xi_1 = \pm 1) = 1/2$, and for $n \ge 1$, let$^{19}$

\[ B^n_t = \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} \xi_i, \quad t \in [0,1]. \]

(1) Plot a typical trajectory of $B^4_t$, $t \in [0,1]$

(2) Show that $B^n_t \xrightarrow{d} N(0,t)$ for any $t \in [0,1]$

Hint: use the CLT

(3) Show that for any $t_1 < ... < t_k$ and $k \ge 1$

\[ \big(B^n_{t_1}, B^n_{t_2} - B^n_{t_1}, ..., B^n_{t_k} - B^n_{t_{k-1}}\big) \xrightarrow{d} N(0, C) \]

where $C$ is a $k \times k$ diagonal matrix with the entries $C_{11} = t_1$ and $C_{ii} = t_i - t_{i-1}$

Hint: use the multivariate CLT

Problem 2.2.4 (elementary properties of the Brownian motion). Prove the following properties of the Brownian motion, using its axiomatic definition:

19 by convention $\sum_{i=1}^{0} (...) = 0$


(1) Show that$^{20}$ $\mathrm{cov}(B_t, B_s) = t \wedge s$

(2) Show that $B$ is a Markov process, i.e.

\[ \mathbb{E}(\varphi(B_t)|\mathcal{F}_s) = \mathbb{E}(\varphi(B_t)|B_s), \quad \mathbb{P}\text{-a.s.} \]

for all measurable bounded functions $\varphi$ and $t \ge s \ge 0$.

(3) Show that $B$ is a martingale (w.r.t. its natural filtration), i.e.

\[ \mathbb{E}(B_t|\mathcal{F}_s) = B_s, \quad \mathbb{P}\text{-a.s.} \]

(4) Show that the process $\langle B \rangle_t := B_t^2 - t$, $t \ge 0$, is an $\mathcal{F}_t$-martingale

(5) Show that $B$ is not differentiable in probability at any $t \ge 0$, i.e. for any sequence $h_n \to 0$, the sequence of random variables $(B_t - B_{t+h_n})/h_n$ diverges in probability.

Hint: recall that if $\xi_n \xrightarrow{\mathbb{P}} \xi$, then $\xi_n$ is a Cauchy sequence in probability, i.e.

\[ \lim_n \sup_{m\ge n} \mathbb{P}(|\xi_n - \xi_m| \ge \varepsilon) = 0, \quad \forall \varepsilon > 0, \]

and use the Gaussian properties of $B$

(6) Show that the process $B^c_t := B_{t+c} - B_c$ with a constant $c > 0$ is the Brownian motion w.r.t. the filtration $\mathcal{F}_{t+c}$

(7) Show$^{21}$ that the process $B^c_t := \frac{1}{\sqrt{c}} B_{tc}$ with a constant $c > 0$ is the Brownian motion w.r.t. the filtration $\mathcal{F}_{tc}$

(8) Let $B$ be the Brownian motion on $[0,T]$ for some $T > 0$. Show that $B_{T-t} - B_T$ is the Brownian motion w.r.t. the reversed filtration $\mathcal{F}^r_t := \sigma\{B_{T-s} - B_T, s \in [0,t]\}$

Problem 2.2.5. Let $b_n := \max_{k\in I(n)} |\xi_k|$, where $(\xi_k)$ are i.i.d. $N(0,1)$ r.v. and $I(n)$ is the set of odd integers between $0$ and $2^n$.

(1) Show that

\[ \mathbb{P}(b_n > n) \le c\,\frac{1}{n}\, 2^n e^{-n^2/2}, \quad \forall n \ge 1, \]

where $c > 0$ is a constant.

(2) Conclude that there exists a random index $n_0(\omega)$ and a set $\Omega'$ with $\mathbb{P}(\Omega') = 1$, such that

\[ b_n(\omega) \le n, \quad \forall n \ge n_0(\omega), \ \forall \omega \in \Omega'. \]

20 $a \wedge b := \min(a,b)$ and $a \vee b := \max(a,b)$

21 this shows that Brownian trajectories are "fractals": roughly, if we take a typical Brownian trajectory, shrink the time by the factor $c$ and the amplitude by the factor $c^{1/2}$, the obtained trajectory is still a typical Brownian trajectory.


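Problem 2.2.6. Using the identity

\[ \sum_{n=0}^{\infty} \sum_{k\in I(n)} S^{(n)}_k(t)\, S^{(n)}_k(s) = t \wedge s, \tag{2.2} \]

where $S^{(n)}_k$ are the Schauder functions, show that

\[ \lim_{n\to\infty} \mathbb{E}\exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\big) \Big\} = \prod_{j=1}^{k} \exp\Big(-\frac{1}{2}\lambda_j^2 (t_j - t_{j-1})\Big). \]

Hint: note that

\[ \exp\Big\{ i \sum_{j=1}^{k} \lambda_j \big(B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\big) \Big\} = \exp\Big\{ -i \sum_{j=1}^{k} (\lambda_{j+1} - \lambda_j) B^{(n)}_{t_j} \Big\}, \]

with $\lambda_{k+1} := 0$. Substitute the partial sums, which define $B^{(n)}_t$, and use the Gaussian properties of the $\xi^{(n)}_k$'s and the identity (2.2).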

Problem 2.2.7.

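(1) Show that the Brownian motion is not increasing (and hence also not decreasing, i.e. not monotone) $\mathbb{P}$-a.s. on any interval, i.e.

\[ \mathbb{P}\big(\omega \in \Omega : B_t(\omega) \ge B_s(\omega) \ \forall t \ge s \in [a,b]\big) = 0, \quad \forall a < b \in \mathbb{R} \]

Hint: argue that (w.l.o.g. assuming that $[a,b] = [0,1]$)

\[ \big\{\omega \in \Omega : B_t(\omega) \ge B_s(\omega) \ \forall t \ge s \in [a,b]\big\} \subseteq \bigcap_{i=1}^{n} \big\{\omega \in \Omega : B_{i/n}(\omega) - B_{(i-1)/n}(\omega) \ge 0\big\} \]

and calculate the probability of the right hand side.

(2) Show that the Brownian motion is not increasing on any interval $\mathbb{P}$-a.s., i.e. the set

\[ \big\{\omega \in \Omega : \exists\, a < b \in \mathbb{R} \text{ such that } B_t(\omega) \ge B_s(\omega) \ \forall\, t \ge s \in [a,b]\big\} \]

is a subset of a measurable $\mathbb{P}$-null set.

Hint: recall that rational numbers are dense in $\mathbb{R}$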

Problem 2.2.8.

(1) Give an example of a sequence $(\phi_n)$ of continuous functions on $[0,1]$, so that the pointwise limit $\phi(t) = \lim_n \phi_n(t)$, $t \in [0,1]$, is discontinuous

(2) Prove that if $\phi_n \to \phi$ uniformly over $[0,1]$, i.e.

\[ \lim_n \sup_{t\in[0,1]} |\phi_n(t) - \phi(t)| = 0 \]

and each $\phi_n$ is continuous, then $\phi$ is continuous.

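Problem 2.2.9 (The Borel-Cantelli lemmas). Let $(A_n)$ be a sequence of events on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Define the set

\[ \limsup_{n\to\infty} A_n := \bigcap_{n=1}^{\infty} \bigcup_{m \ge n} A_m. \tag{2.3} \]

(1) Recall that for a sequence of numbers $(x_n)$,

\[ \limsup_n x_n := \lim_n \sup_{m \ge n} x_m. \]

Explain the analogy between $\limsup_n x_n$ and $\limsup_n A_n$, defined above.

(2) Show that $\omega \in \limsup_n A_n$ if and only if $\omega \in A_{k_n}$ for some infinite subsequence of integers $(k_n)$. Explain why $\limsup_n A_n$ is referred to as the event that $A_n$ occurs infinitely often (denoted $A_n$ i.o.)

(3) Show that

\[ \limsup_n A_n = \Big\{\omega \in \Omega : \sum_{k=1}^{\infty} \mathbf{1}_{\{\omega \in A_k\}} = \infty\Big\}. \]

(4) Prove the first Borel-Cantelli lemma:

\[ \sum_{n=1}^{\infty} \mathbb{P}(A_n) < \infty \implies \mathbb{P}(A_n \text{ i.o.}) = 0. \]

Hint: use the definition (2.3)

(5) Let $(\xi_n)$ be a sequence of r.v. such that $\sum_n \mathbb{P}(|\xi_n| \ge \varepsilon) < \infty$ for all $\varepsilon > 0$. Apply the first B-C lemma to prove that $\xi_n \to 0$ $\mathbb{P}$-a.s.

Hint: show first that

\[ \Big\{\omega \in \Omega : \lim_n \xi_n(\omega) \ne 0\Big\} = \bigcup_{k=1}^{\infty} \bigcap_{n=1}^{\infty} \bigcup_{m \ge n} \big\{\omega \in \Omega : |\xi_m(\omega)| \ge 1/k\big\} \]

(6) Let $(\xi_n)$ be a sequence of Bernoulli r.v. with $\mathbb{P}(\xi_n = 1) = 1/n^2$. Show that $\xi_n \to 0$, $\mathbb{P}$-a.s.

(7) Prove the second B-C lemma, which states$^{22}$ that if the events $A_n$ are independent, then

\[ \sum_{n=1}^{\infty} \mathbb{P}(A_n) = \infty \implies \mathbb{P}(A_n \text{ i.o.}) = 1. \]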

22there are many second B-C lemmas, depending on the independence forms of An


(8) Let $(\xi_n)$ be a sequence of independent Bernoulli r.v. with $\mathbb{P}(\xi_n = 1) = 1/n$. Show that $\xi_n$ converges to 0 in probability, but not $\mathbb{P}$-a.s.

3. Some properties of the Brownian motion

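The Brownian motion belongs to a number of important classes of stochastic processes.

Definition 3.1. A process $X = (X_t)_{t\in[0,\infty)}$ is Gaussian if all its finite dimensional distributions are Gaussian.

Obviously, the Brownian motion is a Gaussian process.

Definition 3.2. An adapted process $X$, defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$, is a martingale if

\[ \mathbb{E}(X_t|\mathcal{F}_s) = X_s, \quad \forall t \ge s \ge 0, \quad \mathbb{P}\text{-a.s.} \]

Proposition 3.3. The Brownian motion is a square integrable$^{23}$ martingale.

Proof. (exercise) □

Definition 3.4. A process $X$, defined on $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$, is Markov, if

\[ \mathbb{P}(X_t \in A|\mathcal{F}_s) = \mathbb{P}(X_t \in A|X_s), \quad \mathbb{P}\text{-a.s.} \quad \forall A \in \mathcal{B}(\mathbb{R}), \ t \ge s \ge 0 \]

Proposition 3.5. The Brownian motion is a Markov process.

Proof. (exercise) □

In fact, the Brownian motion satisfies the strong Markov property. Recall that a nonnegative random variable $\tau$ is a stopping time w.r.t. $(\mathcal{F}_t)$ if

\[ \{\tau \le t\} \in \mathcal{F}_t, \quad \forall t \ge 0, \]

i.e. given the information contained in $\mathcal{F}_t$, one can decide whether $\{\tau \le t\}$ occurred. The collection of sets

\[ \mathcal{F}_\tau := \big\{A \in \mathcal{F} : A \cap \{\tau \le t\} \in \mathcal{F}_t \ \ \forall t \ge 0\big\} \]

is a σ-algebra (check!). Roughly, $\mathcal{F}_\tau$ contains the events, whose occurrence can be decided given the information obtained till $\tau$ (think why).

Definition 3.6. A process $X$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ is strongly Markov, if$^{24}$ for any stopping time$^{25}$ $\tau$

\[ \mathbb{P}(X_t \in A|\mathcal{F}_\tau) = \mathbb{P}(X_t \in A|X_\tau) \quad \forall A \in \mathcal{B}(\mathbb{R}), \quad \mathbb{P}\text{-a.s. on } \{\tau \le t\} \]

23 a martingale $X$ is square integrable, if $\mathbb{E}X_t^2 < \infty$ for all $t \ge 0$.

24 this is one of many possible equivalent definitions of the strong Markov property

25 the statement

\[ \mathbb{P}(X_t \in A|\mathcal{F}_\tau) = \mathbb{P}(X_t \in A|X_\tau) \quad \mathbb{P}\text{-a.s. on } \{\tau \le t\} \]

is a shortcut for

\[ \mathbb{P}(X_t \in A|\mathcal{F}_\tau)\mathbf{1}_{\{\tau \le t\}} = \mathbb{P}(X_t \in A|X_\tau)\mathbf{1}_{\{\tau \le t\}} \quad \mathbb{P}\text{-a.s.} \]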


Remark 3.7. The strong Markov property is defined similarly for discrete time processes. A calculation shows that in discrete time any Markov process is strongly Markov. A continuous time Markov process does not have to be strongly Markov.

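Proposition 3.8. The Brownian motion is strongly Markov and

\[ \mathbb{P}(B_t \in A|\mathcal{F}_\tau) = \int_A \frac{1}{\sqrt{2\pi(t-\tau)}} \exp\Big(-\frac{1}{2}\frac{(x - B_\tau)^2}{t-\tau}\Big)dx, \quad A \in \mathcal{B}(\mathbb{R}), \]

$\mathbb{P}$-a.s. on $\{\tau < t\}$.

One of the consequences of the strong Markov property is that

\[ X_t := B_{\tau+t} - B_\tau, \quad t \ge 0 \]

is a Brownian motion for any stopping time $\tau$ with $\mathbb{P}(\tau < \infty) = 1$. Another implication is

Proposition 3.9 (André's reflection principle). For an $a \in \mathbb{R}$, let $\tau_a = \inf\{t \ge 0 : B_t = a\}$. Then

\[ \mathbb{P}(\tau_a \le t, B_t < a) = \frac{1}{2}\mathbb{P}(\tau_a < t), \quad t \ge 0. \]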

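Proof. By Proposition 3.8,

\[ \begin{aligned} \mathbb{P}(\tau_a < t, B_t < a) &= \mathbb{E}\mathbf{1}_{\{\tau_a < t\}}\mathbf{1}_{\{B_t < a\}} = \mathbb{E}\mathbf{1}_{\{\tau_a < t\}}\mathbb{E}\big(\mathbf{1}_{\{B_t < a\}}|\mathcal{F}_{\tau_a}\big) \\ &= \mathbb{E}\mathbf{1}_{\{\tau_a < t\}} \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi(t-\tau_a)}} \exp\Big(-\frac{1}{2}\frac{(x-a)^2}{t-\tau_a}\Big)dx \\ &= \mathbb{E}\mathbf{1}_{\{\tau_a < t\}} \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi(t-\tau_a)}} \exp\Big(-\frac{1}{2}\frac{u^2}{t-\tau_a}\Big)du = \frac{1}{2}\mathbb{P}(\tau_a < t). \end{aligned} \]

□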

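The reflection principle allows us to find the distribution of $\tau_a$. Note that $\mathbb{P}(\tau_a = t) \le \mathbb{P}(B_t = a) = 0$ and hence

\[ \mathbb{P}(\tau_a \le t) = \mathbb{P}(\tau_a < t) = \mathbb{P}(\tau_a < t, B_t \ge a) + \mathbb{P}(\tau_a < t, B_t < a) = \mathbb{P}(B_t \ge a) + \frac{1}{2}\mathbb{P}(\tau_a \le t), \]

which gives

\[ \mathbb{P}(\tau_a \le t) = 2\mathbb{P}(B_t \ge a) = \frac{2}{\sqrt{2\pi t}} \int_a^{\infty} e^{-x^2/2t}dx = \frac{2}{\sqrt{\pi}} \int_{a/\sqrt{2t}}^{\infty} e^{-u^2}du \]

and in turn the density

\[ f_{\tau_a}(t) := \frac{d}{dt}\mathbb{P}(\tau_a \le t) = \frac{a}{\sqrt{2\pi t^3}}\, e^{-a^2/2t}. \]

Note that $\mathbb{E}\tau_a = \infty$ (check).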
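The last differentiation step can be sanity-checked numerically. In the Python sketch below (hypothetical function names, with $a > 0$ assumed) the distribution function is written through the complementary error function, $\mathbb{P}(\tau_a \le t) = \mathrm{erfc}\big(a/\sqrt{2t}\big)$, which is just the last integral above, and its central difference quotient is compared with the claimed density.

```python
import math

def hitting_cdf(a, t):
    """P(tau_a <= t) = 2 P(B_t >= a) = erfc(a / sqrt(2 t)), for a > 0 and t > 0."""
    return math.erfc(a / math.sqrt(2.0 * t))

def hitting_density(a, t):
    """Claimed density a / sqrt(2 pi t^3) * exp(-a^2 / (2 t))."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))

a, t, h = 1.0, 1.0, 1e-5
numeric = (hitting_cdf(a, t + h) - hitting_cdf(a, t - h)) / (2 * h)
print(numeric, hitting_density(a, t))         # the two values should nearly coincide
```

The same functions can be used to explore the heavy tail: $1 - \mathbb{P}(\tau_a \le t)$ decays only like $t^{-1/2}$, which is the reason behind $\mathbb{E}\tau_a = \infty$.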


3.1. Properties of the Brownian paths. By definition the paths of the Brownian motion are continuous and, as it turns out, also Hölder continuous with any exponent $\gamma < 1/2$. It is easy to see that the Brownian motion is not differentiable in the sense that for any fixed $t \ge 0$, the sequence $(B_t - B_{t+1/n})/(1/n)$, $n \ge 1$, diverges in probability (exercise). Remarkably, the Brownian motion is not differentiable in a much stronger sense.

Theorem 3.10 (Paley-Wiener-Zygmund, 1933). For almost every $\omega \in \Omega$ the Brownian motion sample path is nowhere differentiable; more precisely, the set$^{26}$

\[ \big\{\omega \in \Omega : D^+B_t(\omega) = \infty \ \text{or} \ D_+B_t(\omega) = -\infty \ \ \forall t \ge 0\big\} \]

contains a measurable set of probability 1.

Here is further evidence of the irregularity of the Brownian paths:

Theorem 3.11. The set of zeros $Z_\omega := \{t \ge 0 : B_t(\omega) = 0\}$ satisfies the following properties$^{27}$ $\mathbb{P}$-a.s.:

(1) $\mathrm{mes}(Z_\omega) = 0$

(2) $Z_\omega$ is closed and unbounded

(3) $0$ is a limit point of $Z_\omega$

(4) $Z_\omega$ has no isolated points (and hence $Z_\omega$ is dense in itself)

Proof.

1. By the Fubini theorem

\[ \mathbb{E}\,\mathrm{mes}(Z_\omega) = \mathbb{E}\int_0^{\infty} \mathbf{1}_{\{B_t = 0\}}dt = \int_0^{\infty} \underbrace{\mathbb{P}(B_t = 0)}_{\equiv 0}\,dt = 0. \]

Since $\mathrm{mes}(Z_\omega) \ge 0$, $\mathbb{P}$-a.s., (1) follows.

2. Being the preimage of the closed set $\{0\}$ under the continuous map $t \mapsto B_t(\omega)$, $Z_\omega$ is closed. $Z_\omega$ is unbounded, since the Brownian motion revisits 0 infinitely often: by the strong Markov property the revisit times of 0 are $\mathbb{P}$-a.s. finite i.i.d. r.v.

3. To check (3), we shall need

Theorem 3.12 (Blumenthal's 0-1 law). Let $\mathcal{F}_{0+} := \bigcap_{t > 0}\mathcal{F}_t$. For any $A \in \mathcal{F}_{0+}$, $\mathbb{P}(A)$ equals either 0 or 1.

26 for a function $f : [0,\infty) \mapsto \mathbb{R}$, $D^+f$ and $D_+f$ are the upper and lower right (Dini) derivatives:

\[ D^+f_t = \limsup_{h \downarrow 0} \frac{f_{t+h} - f_t}{h}, \quad D_+f_t = \liminf_{h \downarrow 0} \frac{f_{t+h} - f_t}{h}. \]

Similarly, $D^-f$ and $D_-f$ are defined. Note that $f$ is differentiable at $t$ if all four derivatives coincide and are finite

27 $\mathrm{mes}(A)$ stands for the Lebesgue measure of a measurable subset $A$


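Define

\[ H_+ := \inf\{t > 0 : B_t(\omega) > 0\}, \quad H_- := \inf\{t > 0 : B_t(\omega) < 0\}. \]

Note that

\[ \mathbb{P}(H_+ > 0, H_- > 0) = \mathbb{P}(\exists \varepsilon > 0 \text{ s.t. } B_t(\omega) = 0 \ \forall t \in [0,\varepsilon]) = 0, \tag{3.1} \]

where the latter equality holds, since the Brownian motion is not monotone on any interval $\mathbb{P}$-a.s. (exercise). By the symmetry of the distribution of $B$,

\[ \mathbb{P}(H_+ > 0) = \mathbb{P}(H_- > 0). \]

Moreover$^{28}$ $\{H_+ > 0\} \in \mathcal{F}_{0+}$ (why?) and hence by Blumenthal's 0-1 law,

\[ \mathbb{P}(H_+ > 0) = \mathbb{P}(H_- > 0) \in \{0, 1\}. \]

$\mathbb{P}(H_+ > 0) = \mathbb{P}(H_- > 0) = 1$ contradicts (3.1) and hence $\mathbb{P}(H_+ > 0) = \mathbb{P}(H_- > 0) = 0$, i.e. $\mathbb{P}(H_+ = 0) = \mathbb{P}(H_- = 0) = 1$, and consequently

\[ \mathbb{P}(H_+ = 0, H_- = 0) = 1. \]

But

\[ \{H_+ = 0, H_- = 0\} \subseteq \big\{\exists (s_n), (t_n) \text{ such that } s_n \to 0, \ t_n \to 0, \ B_{s_n}(\omega) < 0, \ B_{t_n}(\omega) > 0\big\}, \]

which verifies the claim (3).

4. (3) and the strong Markov property imply (4) (why?) □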

Here is yet another manifestation of the irregularity of the paths:

Proposition 3.13. Let $(P^n)$ be a sequence of partitions$^{29}$ of $[0,T]$ with $\lim_n \|P^n\| = 0$. Then$^{30}$

\[ V^{(2)}_t(P^n) := \sum_{j=1}^{|P^n|} |B_{t_j} - B_{t_{j-1}}|^2 \xrightarrow[n\to\infty]{L^2} T. \]

Moreover, if $\sum_n \|P^n\| < \infty$, then the convergence is $\mathbb{P}$-a.s.

Proof. Denote $\Delta B_i := B_{t_i} - B_{t_{i-1}}$ and $\Delta t_i := t_i - t_{i-1}$ for brevity; then using the independence of increments and the formula $\mathbb{E}(\Delta B_i)^4 = 3(\Delta t_i)^2$

28 Note that $\{H_+ = 0\} \notin \mathcal{F}_0 = \{\emptyset, \Omega\}$

29 recall that a partition $P$ of $[0,T]$ is an ordered finite set of points $0 = t_0 < t_1 < ... < t_m = T$. We shall denote by $|P^n|$ the number of points in the partition and by $\|P^n\|$ its diameter, i.e. $\max_i (t_i - t_{i-1})$

30 we omit the dependence of the $t_i$'s from $P^n$ on $n$ for brevity


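we get

\[ \begin{aligned} \mathbb{E}\Big(V^{(2)}_t(P^n) - T\Big)^2 &= \mathbb{E}\Big(\sum_{i=1}^{|P^n|} \big((\Delta B_i)^2 - \Delta t_i\big)\Big)^2 = \sum_{i=1}^{|P^n|} \mathbb{E}\big((\Delta B_i)^2 - \Delta t_i\big)^2 \\ &= 2\sum_{i=1}^{|P^n|} (\Delta t_i)^2 \le 2\|P^n\|\,T \xrightarrow{n\to\infty} 0. \end{aligned} \]

If the partitions shrink fast enough, i.e. $\sum_n \|P^n\| < \infty$, then the convergence is $\mathbb{P}$-a.s. by the Borel-Cantelli lemma (check!) □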
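The contrast between the quadratic and the first order variation is easy to observe in simulation. The following Python sketch (hypothetical function names; the seed is an arbitrary choice) samples Brownian increments on uniform partitions of $[0,1]$ with $2^n$ intervals and prints both sums. For simplicity each $n$ uses a fresh independent sample of increments rather than a refinement of one fixed path.

```python
import math
import random

def brownian_increments(T, m, rng):
    """m independent N(0, T/m) increments of a Brownian path on [0, T]."""
    dt = T / m
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(m)]

def variations(increments):
    """Return (sum of squared increments, sum of absolute increments)."""
    quadratic = sum(d * d for d in increments)
    total = sum(abs(d) for d in increments)
    return quadratic, total

rng = random.Random(2011)                     # fixed seed, for reproducibility only
for n in (4, 8, 12, 14):
    q, v = variations(brownian_increments(1.0, 2 ** n, rng))
    print(n, round(q, 4), round(v, 2))
```

The first column of sums should settle near $T = 1$, while the second should grow roughly like $2^{n/2}$, in line with the unbounded variation of the paths.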

Corollary 3.14. The Brownian paths are of unbounded variation $\mathbb{P}$-a.s., i.e.

\[ \bigvee_0^T B := \sup_{P\in\mathcal{P}} \sum_{j=1}^{|P|} |B_{t_j} - B_{t_{j-1}}| = \infty, \quad \mathbb{P}\text{-a.s.}, \]

where $\mathcal{P}$ is the set of all partitions of $[0,T]$.

Proof. (exercise) □

Remark 3.15. Recall that for a continuously differentiable function $f : [0,T] \mapsto \mathbb{R}$

\[ \bigvee_0^T f = \int_0^T |f'(t)|dt < \infty. \]

Hence unbounded variation is yet another indication of the irregularity of Brownian paths.

Remark 3.16. In what follows we would like to make sense of the integral $\int_0^t X_s dB_s$, where $B$ is the Brownian motion and $X$ is another stochastic process. Since the variation of $B$ is infinite, such an integral can be interpreted neither as a Riemann-Stieltjes nor as a Lebesgue-Stieltjes integral (we shall dwell on this point more in the next section). Remarkably, the integral $\int_0^t X_s dB_s$ can still be defined for a certain restricted but still quite broad class of processes $X$, and the quadratic variation of $B$, which is bounded, will play the key role in the construction.

3.2. Exercises.

Problem 2.3.1. Following the steps below, show that the Brownian paths are of unbounded variation $\mathbb{P}$-a.s. on any interval $[0,T]$, i.e.

\[ \sup_{P\in\mathcal{P}} \sum_{j=1}^{|P|} |B_{t_j} - B_{t_{j-1}}| = \infty, \quad \mathbb{P}\text{-a.s.} \tag{3.2} \]

where $\mathcal{P}$ is the set of all partitions of $[0,T]$.

(1) Argue that (3.2) holds if there exists a sequence of partitions $P^n$, such that

\[ \lim_n \sum_{j=1}^{|P^n|} |B_{t_j} - B_{t_{j-1}}| = \infty, \quad \mathbb{P}\text{-a.s.} \]

(2) Use the Borel-Cantelli lemma to argue that for $P^n = \big\{0, \frac{1}{2^n}, ..., 1\big\}$, $n \ge 1$

\[ \lim_n \sum_{j=1}^{|P^n|} |B_{t_j} - B_{t_{j-1}}|^2 = T, \quad \mathbb{P}\text{-a.s.} \]

(3) Show that for the uniform partition $P^n$ from the previous question,

\[ \lim_n \max_{t_j \in P^n} |B_{t_j} - B_{t_{j-1}}| = 0, \quad \mathbb{P}\text{-a.s.} \]

(4) Prove (3.2), using the bound

\[ \sum_{j=1}^{|P^n|} |B_{t_j} - B_{t_{j-1}}| \ge \frac{1}{\max_{t_j\in P^n} |B_{t_j} - B_{t_{j-1}}|} \sum_{j=1}^{|P^n|} |B_{t_j} - B_{t_{j-1}}|^2 \]

Problem 2.3.2. Using André's reflection principle, we showed that

\[ \tau_a = \inf\{t \ge 0 : B_t = a\} \]

has the probability density

\[ f_{\tau_a}(t) := \frac{d}{dt}\mathbb{P}(\tau_a \le t) = \frac{a}{\sqrt{2\pi t^3}}\, e^{-a^2/2t}. \tag{3.3} \]

This density can be deduced by an alternative route:

(1) Show that $M_t := \exp\big(\lambda B_t - \frac{1}{2}\lambda^2 t\big)$ is a martingale

(2) Apply an appropriate version of the Optional Stopping theorem$^{31}$ to derive the Laplace transform of the distribution of $\tau_a$:

\[ \mathbb{E}e^{-s\tau_a} = e^{-a\sqrt{2s}}, \quad s \ge 0 \tag{3.4} \]

(3) (bonus) Show that (3.4) is the Laplace transform of the density (3.3).

Problem 2.3.3. Show that preimages of closed sets remain closed under continuous maps.

Hint: let $E_1$ and $E_2$ be complete topological spaces and $f : E_1 \mapsto E_2$ be a continuous map. Let $K \subseteq E_2$ be a closed set. Take a convergent sequence $(x_n)$ in $f^{-1}(K)$ and show that its limit $\lim_n x_n \in f^{-1}(K)$.

31 look up in a book, if unfamiliar


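Problem 2.3.4. For a function $f : [0,1] \mapsto \mathbb{R}$, the variation is defined by

\[ \bigvee_0^1 f := \sup_{P\in\mathcal{P}} \sum_{t_j\in P} |f_{t_j} - f_{t_{j-1}}| \]

where $\mathcal{P}$ is the set of all partitions of $[0,1]$

(1) State whether the following functions on $[0,1]$ are of bounded variation and calculate the variation if possible:

(a) the Dirichlet function $f_t = \mathbf{1}_{\{t\in\mathbb{Q}\}}$

(b) $f_t = \frac{1}{t}\mathbf{1}_{\{t\in(0,1]\}}$

(c) $f_t = t\sin\big(\frac{1}{t}\big)\mathbf{1}_{\{t\ne 0\}}$

(d) $f_t = \mathbf{1}_{\{t\ge 1/2\}}$

(e) $f_t = \mathbf{1}_{\{t\in A\}}$, where $A = \big\{\frac{1}{n} : n \in \mathbb{N}\big\}$

(f) $f_t = \sum_{n=1}^{\infty} \frac{1}{n^2}\mathbf{1}_{\{t\ge 1/n\}}$

(2) Show that $f$ is of bounded variation, i.e. $\bigvee_0^1 f < \infty$, if and only if it is a difference between two nondecreasing functions

Hint: try the decomposition $f_t = \phi_t - \psi_t$ with $\phi_t := \bigvee_0^t f$.

(3) Show that if $f$ is continuously differentiable with derivative $f'$, then

\[ \bigvee_0^1 f = \int_0^1 |f'(t)|dt \]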

4. Ito’s stochastic integral

4.1. The (mathematical) motivation. Our next goal is to make sense of the integral

\[ \int_0^T X_t dB_t, \quad T \ge 0 \tag{4.1} \]

where $B$ is the Brownian motion (and later on a more general martingale) and $X = (X_t)_{t\ge 0}$ is a stochastic process from a sufficiently broad class of processes, defined on the same filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$. Besides many practical reasons, defining such an integral is an interesting mathematical problem. To see why and appreciate the constructions to follow, let's briefly recall how an integral $\int_0^T x_s dy_s$ is usually defined for a pair of functions $x = (x_t)_{t\in[0,T]}$ and $y = (y_t)_{t\in[0,T]}$ and a fixed $T > 0$. The classical constructions are due to T.J. Stieltjes and H. Lebesgue.

For a tagged partition $P = \{0 = t_0 < t_1 < ... < t_k = T\}$ with tags $t^*_i \in [t_i, t_{i+1}]$, $i = 0, ..., k-1$, define the Riemann-Stieltjes sum

\[ S(P, x, y) := \sum_{i=1}^{k} x_{t^*_{i-1}}\big(y_{t_i} - y_{t_{i-1}}\big). \tag{4.2} \]


Below we shall write $|P|$ for the number of points in the partition $P$ and $\|P\| = \max_{i\le|P|} |t_i - t_{i-1}|$ for its diameter.

Definition 4.1. The function $x$ is Riemann-Stieltjes integrable w.r.t. $y$ on $[0,T]$ if for any sequence of tagged partitions $(P^n)$ with $\|P^n\| \to 0$, the limit

\[ \lim_n \sum_{j=1}^{|P^n|} x_{t^*_{j-1}}\big(y_{t_j} - y_{t_{j-1}}\big) \]

exists and is unique (independent of the particular sequence of partitions and their tags). The value of the limit is called the Riemann-Stieltjes integral of $x$ with respect to $y$ and is denoted by $\int_0^T x_s dy_s$.

The following is the classical existence result$^{32}$

Theorem 4.2. If $x$ is continuous and $y$ is of bounded variation, then $x$ is R-S integrable w.r.t. $y$.

We saw that the trajectories of $B$ are not of bounded variation $\mathbb{P}$-a.s., hence the latter theorem is not directly$^{33}$ applicable for defining (4.1).
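The dependence on the tags can be seen numerically by taking both the integrand and the integrator equal to a simulated Brownian path. The Python sketch below is only an illustration with hypothetical function names: it compares the Riemann-Stieltjes sums with left and right end-point tags on a fine uniform grid. By elementary algebra their difference is exactly the sum of squared increments, which stays close to $T$ instead of vanishing.

```python
import math
import random

def brownian_path(T, m, rng):
    """Values of a simulated Brownian path on the uniform grid with m steps on [0, T]."""
    path = [0.0]
    dt = T / m
    for _ in range(m):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def rs_sum(x, y, right_tags=False):
    """Riemann-Stieltjes sum of x w.r.t. y on a common grid, with left or right tags."""
    shift = 1 if right_tags else 0
    return sum(x[i + shift] * (y[i + 1] - y[i]) for i in range(len(y) - 1))

rng = random.Random(1)                        # fixed seed, an arbitrary choice
B = brownian_path(1.0, 2 ** 12, rng)
left, right = rs_sum(B, B), rs_sum(B, B, right_tags=True)
quad = sum((B[i + 1] - B[i]) ** 2 for i in range(len(B) - 1))
print(left, right, right - left, quad)
```

For a continuously differentiable integrator the two sums would get arbitrarily close as the grid is refined; here the gap does not close, so the limit in Definition 4.1 cannot be independent of the tags.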

A different way to approach (4.1) is to note that $y$ can be used to define a measure on the Borel σ-algebra of $[0,T]$ and to understand $\int_0^T x_t dy_t$ in the sense of Lebesgue. This is done with the help of the following claims$^{34}$.

Theorem 4.3.

(1) if $y$ is nondecreasing, then it is of bounded variation

(2) $y$ is of finite variation if and only if it is a difference of two nondecreasing functions, i.e. there exist nondecreasing functions $y^+_t$ and $y^-_t$ such that

\[ y_t = y^+_t - y^-_t, \quad \forall t \in [0,T]. \tag{4.3} \]

Remark 4.4. The decomposition (4.3) is not unique: one canonical decomposition is $y^+_t := \bigvee_0^t y$ and $y^-_t := -(y_t - y^+_t)$.

The following theorem is similar to the existence of the Lebesgue measure:

Theorem 4.5. Let $y$ be a continuous nondecreasing (and thus of bounded variation) function on $[0,T]$ and for a nonempty interval $[s,t]$, define

\[ \mu([s,t]) := y_t - y_s \ge 0. \]

Then $\mu$ extends uniquely to a signed measure on the Borel σ-algebra of $[0,T]$.

32 the proof is not difficult - look up in a book if you are curious

33 in fact, (4.1) can be defined as an R-S integral through the integration by parts formula, if the integrand $X$ is of bounded variation (this is the Wiener integral). However, such a definition is not sufficient for many purposes, including stochastic differential equations (see below). For example, $\int_0^T B_s dB_s$ cannot be defined this way.

34 again, think or consult a book if you are intrigued


Using the above theorems, one can define

\[ \int_0^T x_s dy_s := \int_0^T x_s dy^+_s - \int_0^T x_s dy^-_s, \]

where $y = y^+ - y^-$ is a decomposition of the form (4.3) into nondecreasing functions and the integrals in the right hand side are understood in the Lebesgue sense, if $x$ is measurable (and bounded) and $y$ is of bounded variation (again!). To recap, none of the classical integral constructions directly applies to (4.1).

It turns out that, in spite of having unbounded variation, $B$ is still regular enough to admit a definition of (4.1), which we shall refer to as the stochastic integral. The first such construction, suggested by K. Ito, is given below.

4.2. The Ito integral: existence and the properties. The Ito integral (4.1) will be defined for the following classes of processes:

Definition 4.6. A measurable process $X = (X_t)_{t\ge 0}$ belongs to the class $\mathcal{L}$ if it is $\mathcal{F}_t$-adapted and satisfies

\[ \mathbb{E}\int_0^T X_t^2 dt < \infty, \quad \forall T \ge 0. \]

Definition 4.7. A measurable process $X = (X_t)_{t\ge 0}$ belongs to the class $\mathcal{P}$ if it is $\mathcal{F}_t$-adapted and satisfies

\[ \mathbb{P}\Big(\int_0^T X_t^2 dt < \infty\Big) = 1 \quad \forall T \ge 0. \]

Clearly, $\mathcal{L} \subset \mathcal{P}$ and hence you're probably wondering at this point why $\mathcal{P}$ is mentioned at all: the reason is that the stochastic integral, yet to be defined, will have different properties on $\mathcal{L}$ and $\mathcal{P}$.

We shall carry out the construction, according the following plan:(i) define the stochastic integral for a particular simple sub-class L0 of

processes from L

(ii) extend the definition by a limit procedure to the processes from L

(iii) extend the definition by a localization procedure to the processesfrom P

Definition 4.8. A process X belongs to the class of simple processesL0 ⊂ L if there exist

(a) a sequence of times (ti)i≥0, with t0 = 0, ti+1 > ti and limi ti = ∞;(b) a sequence of random variables (ξi), where ξi is Fti-measurable for

each i ≥ 1 and supi |ξi(ω)| ≤ C for some constant C > 0such that

Xt(ω) = ξ0(ω)1t=0 +∞∑

j=0

ξj(ω)1t∈(tj ,tj+1] (4.4)


In words, a simple process is piecewise constant, with jumps at deterministic times and taking random values. Note that (4.4) agrees with the requirement that $X$ is $\mathcal{F}_t$-adapted: this fact will be crucial shortly.

Now we are ready for (i):

Definition 4.9. The stochastic integral of $X \in \mathcal{L}_0$ is

$$I_t(X) := \sum_{j=0}^{n-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_n\big(B_t - B_{t_n}\big), \quad t \ge 0, \tag{4.5}$$

where $n$ is the (unique) index such that $t_n \le t < t_{n+1}$.

$I_t(X)$ is a shorter notation for $\int_0^t X_s\,dB_s$, which will help us avoid the temptation of claiming various properties, familiar from other integrals, without actually proving them. Note also that (4.5) can be written more neatly as

$$I_t(X) = \sum_{j=0}^{\infty} \xi_j\big(B_{t\wedge t_{j+1}} - B_{t\wedge t_j}\big), \quad t \ge 0.$$
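To make the definition concrete, here is a minimal Python sketch that evaluates the sum above along one given trajectory. The function name `ito_integral_simple` and the callable `path` (standing for $s \mapsto B_s(\omega)$ for a fixed $\omega$) are hypothetical names introduced only for this illustration, and only finitely many intervals are handled.

```python
def ito_integral_simple(times, xis, path, t):
    """Evaluate I_t(X) = sum_j xi_j * (B_{min(t, t_{j+1})} - B_{min(t, t_j)}).

    times : increasing list t_0 = 0 < t_1 < ... of length len(xis) + 1
    xis   : values xi_0, xi_1, ... taken by the simple process on (t_j, t_{j+1}]
    path  : callable s -> B_s, one fixed trajectory (hypothetical helper)
    t     : evaluation time, assumed not to exceed times[-1]
    """
    total = 0.0
    for j, xi in enumerate(xis):
        left = min(t, times[j])
        right = min(t, times[j + 1])
        # Intervals lying entirely after t contribute B_t - B_t = 0.
        total += xi * (path(right) - path(left))
    return total


if __name__ == "__main__":
    # A smooth stand-in trajectory, used only to exercise the arithmetic;
    # in practice a sampled Brownian path would be plugged in instead.
    path = lambda s: 3.0 * s
    print(ito_integral_simple([0.0, 1.0, 2.0, 3.0], [1.0, -2.0, 0.5], path, 2.5))
```

Using `min(t, ...)` on both endpoints mirrors the "neat" form of (4.5): no separate search for the index $n$ with $t_n \le t < t_{n+1}$ is needed.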

The definition (4.5) resembles the Riemann-Stieltjes sum (4.2), with one important (!) distinction: rather than being an arbitrary point in $[t_i, t_{i+1}]$, the tag $t_i^*$ is forced to be the beginning of each subinterval, i.e. $t_i^* := t_i$. This requirement is decisive for the properties of the Ito integral:

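Proposition 4.10. For $X, Y \in \mathcal{L}_0$:

(p1) $I_0(X) = 0$, $\mathbb{P}$-a.s.

(p2) $t \mapsto I_t(X)$ is continuous, $\mathbb{P}$-a.s.

(p3) $I_t(X)$ is an $\mathcal{F}_t$-martingale, i.e. it is $\mathcal{F}_t$-adapted and

$$\mathbb{E}\big(I_t(X)\,\big|\,\mathcal{F}_s\big) = I_s(X), \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0.$$

(p4) (the Ito isometry)

$$\mathbb{E}\big(I_t(X)\big)^2 = \mathbb{E}\int_0^t X_s^2\,ds$$

and, more generally,

$$\mathbb{E}\Big(\big(I_t(X) - I_s(X)\big)^2\,\Big|\,\mathcal{F}_s\Big) = \mathbb{E}\Big(\int_s^t X_u^2\,du\,\Big|\,\mathcal{F}_s\Big), \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0.$$

(p5)

$$\mathbb{E}\, I_t(X) I_s(Y) = \int_0^{t\wedge s} \mathbb{E}\, X_u Y_u\,du.$$

(p6) (linearity) for constants $a, b$,

$$I_t(aX + bY) = a I_t(X) + b I_t(Y), \quad \mathbb{P}\text{-a.s.}, \quad t \ge 0.$$

(p7) Let $I_t(X)$ and $J_t(Y)$ be the Ito integrals of $X$ and $Y$ with respect to independent Brownian motions; then $I_t(X) J_t(Y)$ is a martingale.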
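The zero-mean consequence of (p3) and the isometry (p4) can be checked numerically. The sketch below is a rough Monte Carlo experiment, not a proof: it assumes the particular simple process $\xi_j = \cos(B_{t_j})$ on the grid $t_j = j/n$ of $[0,1]$ (bounded and $\mathcal{F}_{t_j}$-measurable), and the function names, sample sizes and seed are arbitrary choices made for the illustration.

```python
import math
import random


def simulate_simple_integral(n_steps, rng):
    """One sample of (I_1(X), int_0^1 X_s^2 ds) with X_s = cos(B_{t_j}) on (t_j, t_{j+1}]."""
    dt = 1.0 / n_steps
    b = 0.0          # current value B_{t_j}
    integral = 0.0   # running value of I(X)
    energy = 0.0     # running value of int X_s^2 ds
    for _ in range(n_steps):
        xi = math.cos(b)                      # bounded, known at time t_j
        db = rng.gauss(0.0, math.sqrt(dt))    # increment B_{t_{j+1}} - B_{t_j}
        integral += xi * db
        energy += xi * xi * dt
        b += db
    return integral, energy


def check_isometry(n_paths=4000, n_steps=20, seed=0):
    """Return sample means of I_1(X), I_1(X)^2 and int_0^1 X_s^2 ds."""
    rng = random.Random(seed)
    samples = [simulate_simple_integral(n_steps, rng) for _ in range(n_paths)]
    mean_i = sum(i for i, _ in samples) / n_paths
    mean_i2 = sum(i * i for i, _ in samples) / n_paths
    mean_energy = sum(e for _, e in samples) / n_paths
    return mean_i, mean_i2, mean_energy


if __name__ == "__main__":
    print(check_isometry())
```

Up to Monte Carlo error, the first returned value should be close to zero and the last two should be close to each other.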


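Proof. We shall check (p3), leaving the rest as an exercise. Let $m$ be such that $t_m \le s < t_{m+1}$; note that $m \le n$, since $s \le t$. Then

$$\begin{aligned}
\mathbb{E}\big(I_t(X)\,\big|\,\mathcal{F}_s\big)
&= \mathbb{E}\Big(\sum_{j=0}^{n-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_n\big(B_t - B_{t_n}\big)\,\Big|\,\mathcal{F}_s\Big) \\
&= \mathbb{E}\Big(\sum_{j=0}^{m-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_m\big(B_s - B_{t_m}\big)\,\Big|\,\mathcal{F}_s\Big) \\
&\quad + \mathbb{E}\Big(\xi_m\big(B_{t_{m+1}} - B_s\big) + \sum_{j=m+1}^{n-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_n\big(B_t - B_{t_n}\big)\,\Big|\,\mathcal{F}_s\Big).
\end{aligned}$$

Since $\xi_j$ is $\mathcal{F}_{t_j}$-measurable, $B$ is $\mathcal{F}_t$-adapted and $\mathcal{F}_{t_j} \subseteq \mathcal{F}_s$ for $j \le m$,

$$\mathbb{E}\Big(\sum_{j=0}^{m-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_m\big(B_s - B_{t_m}\big)\,\Big|\,\mathcal{F}_s\Big) = \sum_{j=0}^{m-1} \xi_j\big(B_{t_{j+1}} - B_{t_j}\big) + \xi_m\big(B_s - B_{t_m}\big) = I_s(X).$$

For $j \ge m+1$, $B_{t_{j+1}} - B_{t_j}$ is independent of $\mathcal{F}_{t_j}$ and hence

$$\mathbb{E}\big(B_{t_{j+1}} - B_{t_j}\,\big|\,\mathcal{F}_{t_j}\big) = \mathbb{E}\big(B_{t_{j+1}} - B_{t_j}\big) = 0.$$

Consequently,

$$\mathbb{E}\big(\xi_j(B_{t_{j+1}} - B_{t_j})\,\big|\,\mathcal{F}_s\big) = \mathbb{E}\Big(\mathbb{E}\big(\xi_j(B_{t_{j+1}} - B_{t_j})\,\big|\,\mathcal{F}_{t_j}\big)\,\Big|\,\mathcal{F}_s\Big) = \mathbb{E}\Big(\xi_j\,\mathbb{E}\big(B_{t_{j+1}} - B_{t_j}\,\big|\,\mathcal{F}_{t_j}\big)\,\Big|\,\mathcal{F}_s\Big) = 0.$$

Assembling all parts together, we get the claim. □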

Example 4.11. By definition, $\int_0^t dB_s = I_t(1) = B_t - B_0 = B_t$ (formally, take $t_0 = 0$, $t_1 = 1$ and $\xi_0(\omega) \equiv 1$). All the properties above reduce to the familiar properties of the Brownian motion. For example,

$$\mathbb{E}\Big(\big(I_t(1) - I_s(1)\big)^2\,\Big|\,\mathcal{F}_s\Big) = \mathbb{E}\Big(\int_s^t 1^2\,du\,\Big|\,\mathcal{F}_s\Big) = t - s.$$

■

Now we shall approach part (ii) of the plan, aiming to extend the definition of $I_t(X)$ from $\mathcal{L}_0$ to $\mathcal{L}$. This shall be accomplished in two phases:

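Proposition 4.12. $\mathcal{L}_0$ is dense in $\mathcal{L}$, in the sense that for any $X \in \mathcal{L}$, there is a sequence of simple processes $(X^n) \subseteq \mathcal{L}_0$ such that

$$\lim_n \mathbb{E}\int_0^T \big(X^n_s - X_s\big)^2\,ds = 0, \quad T \ge 0. \tag{4.6}$$

and

Proposition 4.13. For any $X \in \mathcal{L}$, there exists a process $I_t(X)$, $t \ge 0$, called the Ito integral of $X$, such that

$$\lim_n \mathbb{E}\big(I_t(X^n) - I_t(X)\big)^2 = 0, \quad t \ge 0, \tag{4.7}$$

for any sequence $X^n$ satisfying

$$\lim_n \mathbb{E}\int_0^t \big(X^n_s - X_s\big)^2\,ds = 0, \quad t \ge 0. \tag{4.8}$$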

To recap, given a process $X \in \mathcal{L}$, Proposition 4.12 yields a sequence of simple processes $X^n$ which approximates $X$ in the sense above. We know how to define the Ito integral for each $X^n$ using Definition 4.9. Finally, Proposition 4.13 states that the sequence of integrals $I_t(X^n)$ converges to an essentially unique limit, which we call the Ito integral of $X$.

Let’s first verify the easier phase:

Proof of Proposition 4.13. For $t \ge 0$, we shall show that $I_t(X^n)$ is a Cauchy sequence in the space $L^2(\Omega, \mathcal{F}, \mathbb{P})$ of square integrable random variables. Since this $L^2$ space is complete$^{35}$, the Cauchy sequence $I_t(X^n)$ is convergent, i.e. there is a square integrable r.v. $I_t(X)$ (possibly dependent at this point on the particular sequence at hand), such that (4.7) holds. To see that $I_t(X^n)$ is indeed Cauchy, we shall use linearity and the Ito isometry:

$$\begin{aligned}
\mathbb{E}\big(I_t(X^n) - I_t(X^m)\big)^2 &= \mathbb{E}\big(I_t(X^n - X^m)\big)^2 = \mathbb{E}\int_0^t (X^n_s - X^m_s)^2\,ds \\
&\le 2\,\mathbb{E}\int_0^t (X^n_s - X_s)^2\,ds + 2\,\mathbb{E}\int_0^t (X_s - X^m_s)^2\,ds,
\end{aligned}$$

and by (4.8)

$$\lim_n \sup_{m \ge n} \mathbb{E}\big(I_t(X^n) - I_t(X^m)\big)^2 = 0,$$

i.e. $I_t(X^n)$ is Cauchy, as claimed. It is left to show that the limit $I_t(X)$ does not depend on the approximating sequence. To this end, let $(X^n)$ and $(\tilde X^n)$ be sequences, both satisfying (4.8), with the corresponding limits $I_t(X)$ and $\tilde I_t(X)$, and set

$$Y^n_t = \begin{cases} X^n_t, & n \text{ even}, \\ \tilde X^n_t, & n \text{ odd}, \end{cases} \quad t \ge 0.$$

Then the sequence $(Y^n)$ also satisfies (4.8) and hence, by what we have just proved, $I_t(Y^n)$ converges to a limit $I_t(Y)$. But $I_t(Y^n)$ converges to $I_t(X)$ along even $n$'s and to $\tilde I_t(X)$ along odd $n$'s. Hence $I_t(X) = I_t(Y)$ and $\tilde I_t(X) = I_t(Y)$ $\mathbb{P}$-a.s., and so $I_t(X) = \tilde I_t(X)$, $\mathbb{P}$-a.s., as claimed. □

Now let’s deal with the more technical part:

35 An external fact from analysis.


Proof of Proposition 4.12. We shall carry out the proof in several steps:

(step 1) note that Proposition 4.12 claims that (4.6) is satisfied by a single sequence $X^n$, which is independent of $T$; however, a simple diagonal argument shows that it is enough to be able to construct $X^n$ for each fixed $T$, i.e. to let it depend on $T$;

(step 2) approximate processes in $\mathcal{L}$ which are continuous and bounded by a constant;

(step 3) use (step 2) to approximate processes in $\mathcal{L}$ which are only adapted and bounded by a constant (i.e. relax the continuity);

(step 4) use (step 3) to approximate arbitrary $X \in \mathcal{L}$ (i.e. relax boundedness).

Lemma 4.14. (step 1) For $X \in \mathcal{L}$, let $X^{n,T}$ be sequences in $\mathcal{L}_0$ such that

$$\lim_n \mathbb{E}\int_0^T (X^{n,T}_t - X_t)^2\,dt = 0$$

for any fixed $T \ge 0$. For an $m \ge 1$, let $n_m$ be such that

$$\mathbb{E}\int_0^m (X^{n_m,m}_t - X_t)^2\,dt \le 1/m.$$

Show that the sequence $(X^{n_m,m})$ (which is independent of $T$!) satisfies

$$\lim_m \mathbb{E}\int_0^T (X^{n_m,m}_t - X_t)^2\,dt = 0$$

for all $T \ge 0$.

Proof. (exercise) □

From here on, we shall fix $T = 1$ w.l.o.g.

Lemma 4.15. (step 2) Let $X \in \mathcal{L}$ be continuous and bounded by a constant $C$, i.e.

$$\sup_{t \le 1} |X_t(\omega)| \le C.$$

Then there is a sequence $(X^n) \subset \mathcal{L}_0$ such that (4.6) holds.

Proof. Define the simple process

$$X^n_t(\omega) := X_0(\omega)\mathbf{1}_{\{t=0\}} + \sum_{k=0}^{n-1} X_{k/n}(\omega)\mathbf{1}_{\{t \in (\frac{k}{n},\, \frac{k+1}{n}]\}}.$$

Since $t \mapsto X_t(\omega)$ is continuous, $\sup_{t \le 1} |X_t(\omega) - X^n_t(\omega)| \to 0$ as $n \to \infty$, and as $|X_t| \le C$, (4.6) follows by dominated convergence. □

Lemma 4.16. (step 3) Let $X \in \mathcal{L}$ be adapted and bounded by a constant $C$, i.e.

$$\sup_{t \le 1} |X_t(\omega)| \le C.$$

Then there is a sequence $(X^n) \subset \mathcal{L}_0$ such that (4.6) holds.


Proof. We shall assume that $X$ has cadlag paths and is measurable, in which case it is also progressively measurable. Define

$$Y_t := \int_0^{t \wedge 1} X_s(\omega)\,ds$$

and for a fixed $m \ge 1$, let

$$X^m_t(\omega) := \frac{Y_t(\omega) - Y_{(t - 1/m) \vee 0}(\omega)}{1/m}.$$

Note that $X^m_t$ is bounded by $C$ and has continuous trajectories (why?). Hence by the previous lemma, there is a sequence of simple processes $(X^{n,m})$ such that

$$\mathbb{E}\int_0^1 (X^m_s - X^{n,m}_s)^2\,ds \xrightarrow{n \to \infty} 0.$$

By the Fundamental Theorem of Calculus, the (random) subset

$$A_\omega = \big\{t \in [0,1] : X^m_t(\omega) \not\to X_t(\omega)\big\}$$

has zero Lebesgue measure, i.e. $\mathrm{mes}(A_\omega) = 0$. Since $X^m$ is progressively measurable, by the Fubini theorem

$$A := \big\{(\omega, t) \in \Omega \times [0,1] : X^m_t(\omega) \not\to X_t(\omega)\big\}$$

is a $\mathbb{P} \times \mathrm{mes}$-null set. Since both $X$ and $X^m$ are bounded by $C$, the dominated convergence theorem yields:

$$\lim_m \mathbb{E}\int_0^1 (X^m_t - X_t)^2\,dt = 0.$$

Now let $n_m$ be an index such that

$$\mathbb{E}\int_0^1 (X^m_s - X^{n_m,m}_s)^2\,ds \le 1/m;$$

then

$$\mathbb{E}\int_0^1 (X_t - X^{n_m,m}_t)^2\,dt \le 2\,\mathbb{E}\int_0^1 (X_t - X^m_t)^2\,dt + 2\,\mathbb{E}\int_0^1 (X^m_t - X^{n_m,m}_t)^2\,dt \xrightarrow{m \to \infty} 0,$$

which verifies the claim. □

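Lemma 4.17. (step 4) For $X \in \mathcal{L}$, there is a sequence $(X^n) \subset \mathcal{L}_0$ such that (4.6) holds.

Proof. For a $k \ge 1$, let

$$X^k_t(\omega) := X_t(\omega)\mathbf{1}_{\{|X_t(\omega)| \le k\}}.$$

The process $X^k$ is bounded and adapted, hence by the previous lemma there is a sequence $X^{n,k}$ such that $X^{n,k} \to X^k$ in the sense of (4.6). Let $n_k$ be such that

$$\mathbb{E}\int_0^1 \big(X^{n_k,k}_t - X^k_t\big)^2\,dt \le 1/k.$$

Then

$$\begin{aligned}
\mathbb{E}\int_0^1 \big(X_t - X^{n_k,k}_t\big)^2\,dt
&\le 2\,\mathbb{E}\int_0^1 \big(X_t - X^k_t\big)^2\,dt + 2\,\mathbb{E}\int_0^1 \big(X^k_t - X^{n_k,k}_t\big)^2\,dt \\
&\le 2\,\mathbb{E}\int_0^1 X_t^2\,\mathbf{1}_{\{|X_t| > k\}}\,dt + 2/k \xrightarrow{k \to \infty} 0,
\end{aligned}$$

where we used the assumption $\mathbb{E}\int_0^1 X_t^2\,dt < \infty$ (how?). □

This completes the proof of Proposition 4.12. □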

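It is left to verify that the properties of Proposition 4.10 extend to the Ito integral for $X \in \mathcal{L}$.

Theorem 4.18. Let $X \in \mathcal{L}$; then $I_t(X)$ satisfies the properties of Proposition 4.10.

Proof. (p1) is obvious. (p2) is not at all obvious and in fact cannot be deduced from our approximation procedure directly (think why). The precise claim is that the process $I_t(X)$ has a continuous modification. Proving this requires Doob's inequality (omitted). Properties (p3)-(p6) are checked along the same lines, and we shall prove (p3), leaving out the rest as an exercise.

Let $(X^n) \subset \mathcal{L}_0$ be the sequence approximating $X \in \mathcal{L}$. Since $X^n \in \mathcal{L}_0$, it follows that $\mathbb{E}(I_t(X^n)|\mathcal{F}_s) = I_s(X^n)$ and

$$\begin{aligned}
\mathbb{E}\Big(\mathbb{E}\big(I_t(X)\big|\mathcal{F}_s\big) - I_s(X)\Big)^2
&= \mathbb{E}\Big(\mathbb{E}\big(I_t(X)\big|\mathcal{F}_s\big) - \mathbb{E}\big(I_t(X^n)\big|\mathcal{F}_s\big) + I_s(X^n) - I_s(X)\Big)^2 \\
&\le 2\,\mathbb{E}\Big(\mathbb{E}\big(I_t(X)\big|\mathcal{F}_s\big) - \mathbb{E}\big(I_t(X^n)\big|\mathcal{F}_s\big)\Big)^2 + 2\,\mathbb{E}\big(I_s(X^n) - I_s(X)\big)^2 \\
&= 2\,\mathbb{E}\Big(\mathbb{E}\big(I_t(X) - I_t(X^n)\big|\mathcal{F}_s\big)\Big)^2 + 2\,\mathbb{E}\big(I_s(X^n) - I_s(X)\big)^2 \\
&\overset{\dagger}{\le} 2\,\mathbb{E}\,\mathbb{E}\Big(\big(I_t(X) - I_t(X^n)\big)^2\Big|\mathcal{F}_s\Big) + 2\,\mathbb{E}\big(I_s(X^n) - I_s(X)\big)^2 \\
&= 2\,\mathbb{E}\big(I_t(X) - I_t(X^n)\big)^2 + 2\,\mathbb{E}\big(I_s(X^n) - I_s(X)\big)^2 \xrightarrow{n \to \infty} 0,
\end{aligned}$$

where in $\dagger$ we used the Jensen inequality for conditional expectations, and the convergence holds by the definition of the stochastic integral on the class $\mathcal{L}$. □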


Remark 4.19. Since the stochastic integral is defined as an $L^2$ limit (rather than a $\mathbb{P}$-a.s. limit), it is not defined pathwise, i.e. we are not integrating with respect to (almost) each path of the Brownian motion individually, as would be the case in classical integration. This should not come as a surprise: after all, the classical pathwise integrals were not suitable for integration with respect to Brownian motion. Under certain additional assumptions on the integrands, the stochastic integral does have a version which can be defined pathwise.

As in the case of classical integration, the value of the integral can sometimes be calculated directly from its definition:

Example 4.20. Let's calculate $\int_0^t B_s\,dB_s$. First of all, we have to check that the integrand is in $\mathcal{L}$: by the Fubini theorem

$$\mathbb{E}\int_0^t B_s^2\,ds = \int_0^t \mathbb{E} B_s^2\,ds = \int_0^t s\,ds = t^2/2 < \infty.$$

Hence the Ito integral $\int_0^t B_s\,dB_s$ is well defined. Let $P_n$ be the uniform partition of $[0,t]$ of $n$ points; then we know that (look back at the proofs):

$$\int_0^t B_s\,dB_s = \lim_n \sum_{j=0}^{n-1} B_{t\frac{j}{n}}\Big(B_{t\frac{j+1}{n}} - B_{t\frac{j}{n}}\Big),$$

where the convergence is in $L^2$. Using the elementary identity $u(v-u) = \frac12(v^2 - u^2) - \frac12(v-u)^2$, we get

$$\sum_{j=0}^{n-1} B_{t\frac{j}{n}}\Big(B_{t\frac{j+1}{n}} - B_{t\frac{j}{n}}\Big) = \frac12\sum_{j=0}^{n-1}\Big(B^2_{t\frac{j+1}{n}} - B^2_{t\frac{j}{n}}\Big) - \frac12\sum_{j=0}^{n-1}\Big(B_{t\frac{j+1}{n}} - B_{t\frac{j}{n}}\Big)^2.$$

The first sum on the right is telescopic, summing to $B^2_{t_n} - B^2_{t_0} = B_t^2 - B_0^2 = B_t^2$, and the second sum converges to the quadratic variation $t$, as we have already seen before. Hence

$$\int_0^t B_s\,dB_s = \frac12 B_t^2 - \frac12 t.$$

This answer is yet another manifestation of the unbounded variation of $B$! Indeed, if $B$ were of bounded variation, the Stieltjes integral $\int_0^t B_s\,dB_s$ would be well defined and satisfy (think why)

$$\int_0^t B_s\,dB_s = \frac12 B_t^2.$$

The additional term $-\frac12 t$ is genuinely Ito. This is a germ of the celebrated Ito formula, to be deduced in the next section. ■
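The calculation in Example 4.20 can be reproduced numerically along a single simulated path. The following Python sketch is only an illustration under simplifying assumptions: Brownian increments are sampled on a uniform grid with a pseudo-random generator, and the function name `left_point_sum`, the grid size and the seed are arbitrary choices.

```python
import math
import random


def left_point_sum(t=1.0, n=20000, seed=1):
    """Return (left-point sum, sum of squared increments, B_t) along one simulated path."""
    rng = random.Random(seed)
    dt = t / n
    b = 0.0         # current value of the path at the left endpoint
    ito_sum = 0.0   # sum of B_{t_j} (B_{t_{j+1}} - B_{t_j})
    quad_var = 0.0  # sum of (B_{t_{j+1}} - B_{t_j})^2
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += b * db      # integrand evaluated at the left endpoint
        quad_var += db * db
        b += db
    return ito_sum, quad_var, b


if __name__ == "__main__":
    s, qv, bt = left_point_sum()
    print("left-point sum      :", s)
    print("(B_t^2 - t) / 2     :", 0.5 * (bt * bt - 1.0))
    print("quadratic variation :", qv)
```

The left-point sum equals $\frac12 B_t^2 - \frac12\sum_j(\Delta B_j)^2$ exactly (up to rounding), so its distance from $\frac12(B_t^2 - t)$ is governed entirely by how close the sum of squared increments is to $t$.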


Our next and final goal is to extend Ito integration to the yet wider class $\mathcal{P}$, i.e. to the adapted processes satisfying

$$\mathbb{P}\Big(\int_0^T X_s^2\,ds < \infty\Big) = 1, \quad \forall\, T \ge 0. \tag{4.9}$$

This is done through the following localization procedure. For $n \ge 1$, define

$$\tau_n := n \wedge \inf\Big\{t \ge 0 : \int_0^t X_s^2\,ds \ge n\Big\},$$

with the convention $\inf\emptyset = \infty$. In words, $\tau_n$ is the minimum between $n$ and the first time the process $t \mapsto \int_0^t X_s^2\,ds$ hits the level $n \ge 1$. By the assumption (4.9), $(\tau_n)$ is an increasing sequence of bounded stopping times and $\mathbb{P}(\tau_n \to \infty) = 1$. Define the stopped process

$$X^n_t := X_{t \wedge \tau_n}.$$

By the very definition of $\tau_n$,

$$\int_0^T (X^n_s)^2\,ds \le n^2 < \infty, \quad \forall\, T \ge 0,$$

and hence $X^n \in \mathcal{L}$. Consequently, the Ito integral $I_t(X^n)$ is well defined for $t \ge 0$. Moreover$^{36}$, for $n \ge m$,

$$\mathbb{E}\big(I_{\tau_m}(X^n) - I_{\tau_m}(X^m)\big)^2 = \mathbb{E}\int_0^{\tau_m} (X^n_s - X^m_s)^2\,ds = 0,$$

since $X^n_s = X^m_s$ on $\{s \le \tau_m\}$. Hence for $X \in \mathcal{P}$ we can define

$$I_t(X) := I_t(X^n) \quad \text{on the event } \{t \le \tau_n\}.$$

This definition makes sense, since $I_t(X^n)$ coincide $\mathbb{P}$-a.s. for all $n$ whenever $t \le \tau_n$ and since $\mathbb{P}\big(\bigcup_n \{t \le \tau_n\}\big) = 1$ by (4.9).

The stochastic integral $I_t(X)$ on $\mathcal{P}$ has the same path properties as on $\mathcal{L}$, namely (p1), (p2) and (p6), but does not necessarily satisfy (p3)-(p5), which are inherently tied to $\mathcal{L}$ (beware, there are counterexamples!).

4.3. The Ito integral beyond Brownian motion. It is important$^{37}$ to define the stochastic integral with respect to martingales other than $B$. This is indeed possible and can be done essentially through the same approximation procedure as in the case of the Brownian motion.

Our discussion of this issue will be sketchy, as its serious consideration requires theory of martingales beyond our scope. Here are some basic facts.

36 Here we use an additional property of the Ito integral, which you would have to accept without proof or consult a book for it: if $X \in \mathcal{L}$ and $\tau \le C$ is a stopping time, then $\mathbb{E}\big(I_\tau(X)\big)^2 = \mathbb{E}\int_0^\tau X_s^2\,ds$.

37 E.g. in the context of stochastic differential equations of the next section.


Definition 4.21. An $\mathcal{F}_t$-adapted process $M = (M_t)_{t \ge 0}$ on the filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$ is a martingale if

$$\mathbb{E}(M_t|\mathcal{F}_s) = M_s, \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0,$$

a supermartingale if

$$\mathbb{E}(M_t|\mathcal{F}_s) \le M_s, \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0,$$

and a submartingale if

$$\mathbb{E}(M_t|\mathcal{F}_s) \ge M_s, \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0.$$

One of the fundamental results in the theory of martingales is the Doob-Meyer decomposition, which states that a super/submartingale $X_t$ (satisfying certain integrability properties) admits the essentially unique representation:

$$X_t = X_0 + A_t + M_t, \tag{4.10}$$

where $M_t$ is a martingale with $M_0 = 0$ and $A_t$ is an adapted process with $A_0 = 0$ and finite variation. If $X_t$ is a supermartingale, then $A_t$ is nonincreasing, and if it is a submartingale, $A_t$ is nondecreasing. Processes which admit the decomposition (4.10) are collectively called semimartingales (note that if $A$ is not monotone, then $X$ is neither a sub- nor a supermartingale).

Example 4.22. The process $X_t = B_t^2$ is a submartingale: by the Jensen inequality

$$\mathbb{E}(X_t|\mathcal{F}_s) = \mathbb{E}(B_t^2|\mathcal{F}_s) \ge \big(\mathbb{E}(B_t|\mathcal{F}_s)\big)^2 = B_s^2 = X_s.$$

By Example 4.20, the corresponding Doob-Meyer decomposition is

$$X_t = 2\int_0^t B_s\,dB_s + t =: M_t + A_t.$$

Note that $A_t$ increases. ■

If $M$ is a martingale and $\mathbb{E} M_t^2 < \infty$ for any $t \ge 0$, $M$ is said to be square integrable.

Definition 4.23. The quadratic variation of a square integrable martingale $M$ is the process $\langle M\rangle_t$ which compensates $M_t^2$ up to a martingale, i.e. such that the process $M_t^2 - \langle M\rangle_t$ is an $\mathcal{F}_t$-martingale.

The existence and uniqueness of $\langle M\rangle_t$ follow from the Doob-Meyer decomposition. Indeed, the process $X_t := M_t^2$ is a submartingale (see the above example) and hence

$$X_t = X_0 + A_t + \widetilde M_t$$

for some martingale $\widetilde M$ and a nondecreasing process $A$. Hence $\langle M\rangle_t := X_0 + A_t$.


Remark 4.24. At first glance $\langle M\rangle_t$ does not have much to do with the quadratic variation in the usual sense. But the coincidence of names is not accidental and, in fact, the two notions coincide (for continuous square integrable martingales). The proof is a calculation similar to the case of the Brownian motion, though a bit more delicate, since $M_t$ is not Gaussian and might not have the fourth moment at all.

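Example 4.25. From the previous example, $\langle B\rangle_t = t$. We saw that for $X \in \mathcal{L}$, the stochastic integral $I_t(X) = \int_0^t X_s\,dB_s$ is a continuous square integrable martingale. Its quadratic variation is $\langle I(X)\rangle_t = \int_0^t X_s^2\,ds$:

$$\begin{aligned}
\mathbb{E}\Big(\big(I_t(X)\big)^2 - \int_0^t X_u^2\,du\,\Big|\,\mathcal{F}_s\Big)
&= \mathbb{E}\Big(\big(I_t(X) - I_s(X) + I_s(X)\big)^2 - \int_0^t X_u^2\,du\,\Big|\,\mathcal{F}_s\Big) \\
&= \big(I_s(X)\big)^2 - \int_0^s X_u^2\,du + 2 I_s(X)\,\mathbb{E}\big(I_t(X) - I_s(X)\,\big|\,\mathcal{F}_s\big) \\
&\quad + \mathbb{E}\Big(\big(I_t(X) - I_s(X)\big)^2 - \int_s^t X_u^2\,du\,\Big|\,\mathcal{F}_s\Big) \\
&= \big(I_s(X)\big)^2 - \int_0^s X_u^2\,du,
\end{aligned}$$

where in the last equality we used the properties (p3) and (p4). ■

The quadratic variation $\langle M\rangle_t$ is an important characteristic of the martingale. In particular, it is the main building block in the definition of the Ito integral with respect to $M$.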

Definition 4.26. Let $M$ be a continuous square integrable martingale and $X$ be a measurable and adapted process, satisfying$^{38}$

$$\mathbb{E}\int_0^T X_t^2\,d\langle M\rangle_t < \infty \tag{4.11}$$

(denoted $X \in \mathcal{L}(M)$). The Ito integral of $X$ with respect to $M$, denoted $I^M_t(X) = \int_0^t X_s\,dM_s$, is the essentially unique $L^2$ limit of the sequence $I^M_t(X^n)$, where $X^n$ is a sequence of simple processes (for which the integral is defined through sums as in (4.5) with respect to the increments of $M$), such that

$$\lim_n \mathbb{E}\int_0^t \big(X^n_s - X_s\big)^2\,d\langle M\rangle_s = 0.$$

The correctness of this definition is verified essentially as in the case of the Brownian motion, with some technical adjustments. By an appropriate localization, the definition of the integral is extended to the class $\mathcal{P}(M)$ of adapted processes satisfying

$$\mathbb{P}\Big(\int_0^T X_t^2\,d\langle M\rangle_t < \infty\Big) = 1, \quad T \ge 0.$$

38 Since $\langle M\rangle_t$ is an increasing process, the integral (4.11) is well defined and understood in the Stieltjes sense.


The Ito integral with respect to continuous square integrable martingales satisfies the same properties as in the case of the Brownian motion.

Remark 4.27. In fact, the definition of the Ito integral can be extended even further: one can integrate predictable processes with respect to local martingales$^{39}$.

One important application of the above generalization of the Ito integral is integration with respect to stochastic integrals, viewed as martingales in their own right. To this end, we shall need the following notion.

Definition 4.28. For a pair of continuous square integrable martingales $M$ and $N$, the covariation process $\langle M, N\rangle_t$ is the essentially unique compensator of $M_t N_t$ up to a martingale, i.e. such that the process $M_t N_t - \langle M, N\rangle_t$ is a martingale.

An effort, which we shall omit, is required to argue that the above definition is correct.

Example 4.29. For a pair of independent Brownian motions $W$ and $V$, the covariation $\langle W, V\rangle_t = 0$, since $W_t V_t$ is a martingale (check!). On the other hand, the covariation $\langle V, V\rangle_t = t = \langle V\rangle_t$. Now let $X$ and $Y$ be a pair of $\mathcal{L}$ processes with the corresponding Ito integrals $I_t(X) = \int_0^t X_s\,dW_s$ and $J_t(Y) = \int_0^t Y_s\,dV_s$. Then $\langle I(X), J(Y)\rangle_t = 0$ by (p7), and $\langle I(X), I(Y)\rangle_t = \int_0^t X_s Y_s\,ds$ (look at (p5)). ■

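For a process $X \in \mathcal{L}(M)$, consider the Ito integral $I_t(X) = \int_0^t X_s\,dM_s$, which is a square integrable martingale with $\langle I\rangle_t = \int_0^t X_s^2\,d\langle M\rangle_s$. If $Y$ is a process such that

$$\int_0^t Y_s^2\,d\langle I\rangle_s = \int_0^t Y_s^2 X_s^2\,d\langle M\rangle_s < \infty,$$

then the Ito integral $\int_0^t Y_s\,dI_s$ is well defined. Further$^{40}$,

$$\begin{aligned}
\mathbb{E}\Big(\int_0^t Y_s\,dI_s - \int_0^t Y_s X_s\,dM_s\Big)^2
&= \mathbb{E}\Big(\int_0^t Y_s^2\,d\langle I\rangle_s - 2\int_0^t Y_s^2 X_s\,d\langle I, M\rangle_s + \int_0^t Y_s^2 X_s^2\,d\langle M\rangle_s\Big) \\
&= \mathbb{E}\Big(\int_0^t Y_s^2 X_s^2\,d\langle M\rangle_s - 2\int_0^t Y_s^2 X_s X_s\,d\langle M\rangle_s + \int_0^t Y_s^2 X_s^2\,d\langle M\rangle_s\Big) = 0,
\end{aligned}$$

and hence $\int_0^t Y_s\,dI_s = \int_0^t Y_s X_s\,dM_s$, as should be expected!

39 A process $M$ is a local martingale if there exists an increasing sequence of stopping times $\tau_n \to \infty$, $\mathbb{P}$-a.s., such that the stopped process $M_{t \wedge \tau_n}$ is a martingale.

40 We use the obvious properties $\mathbb{E} M_t^2 = \mathbb{E}\langle M\rangle_t$ and $\mathbb{E} M_t N_t = \mathbb{E}\langle M, N\rangle_t$.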


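Example 4.30. Let $M_t := \int_0^t B_s\,dB_s$. Can we define the Ito integral $\int_0^t B_s\,dM_s$? We have

$$\mathbb{E}\int_0^t B_s^2\,d\langle M\rangle_s = \mathbb{E}\int_0^t B_s^4\,ds = \int_0^t \mathbb{E} B_s^4\,ds < \infty$$

and hence the answer is affirmative and, in fact,

$$\int_0^t B_s\,dM_s = \int_0^t B_s^2\,dB_s.$$

■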

4.4. Exercises.

Problem 2.4.1. State whether each of the processes below belongs to the class $\mathcal{P}$ and/or $\mathcal{L}$ and calculate $\mathbb{E}\big(\int_0^t X_s\,dB_s\big)^2$, $t \ge 0$, when applicable ($B$ is the Brownian motion).

(1) $X_t = \int_0^t B_s\,ds$

(2) $X_t = \int_t^1 B_s\,ds$

(3) $X_t = \dfrac{1}{\sqrt{t}}$

(4) $X_t = \dfrac{B_t}{t^{3/4}}$

(5) $X_t = \exp(-B_t^2)$

(6) $X_t = \exp(B_t^2)$

Problem 2.4.2. We saw that the Ito integral $\int_0^t B_s\,dB_s$ is the $L^2$ limit of the sums

$$\sum_{t_j \le t} B_{t_{j-1}}\big(B_{t_j} - B_{t_{j-1}}\big), \tag{4.12}$$

as $\max_j |t_j - t_{j-1}| \to 0$, and that computing this limit yields the formula

$$\int_0^t B_s\,dB_s = \frac12\big(B_t^2 - t\big).$$

Note that (4.12) is the Riemann-Stieltjes sum, where the integrand is evaluated at the leftmost point $t_{j-1}$ of each interval.

Now consider the sum (4.12), where the integrand is evaluated at the intermediate point $t_{j-1}(\lambda) := (1-\lambda) t_{j-1} + \lambda t_j$ with a fixed parameter $\lambda \in [0,1]$.

(1) Show that

$$\sum_{j=1}^{n} B_{t_{j-1}(\lambda)}\big(B_{t_j} - B_{t_{j-1}}\big) \xrightarrow[n \to \infty]{L^2} \int_0^t B_s\,dB_s + \lambda t. \tag{4.13}$$

The result in (1) suggests that for each $\lambda \in [0,1]$, a different (!) stochastic integral (at least, for the particular integrand at hand) can be defined as the limit (4.13):

$$I^\lambda_t(B) = \lim_n \sum_{j=1}^{n} B_{t_{j-1}(\lambda)}\big(B_{t_j} - B_{t_{j-1}}\big).$$

(2) Argue that the only choice for which $I^\lambda_t(B)$ is a martingale is $\lambda = 0$, i.e. the Ito integral$^{41}$.

(3) Explain why (4.13) implies that $B$ has unbounded variation (refer to the definition and the existence theorem of the Stieltjes integral).

Problem 2.4.3. Show that the Ito integral $I_t(X)$ is a square integrable continuous martingale with quadratic variation

$$\langle I(X)\rangle_t = \int_0^t X_s^2\,ds, \quad t \ge 0,$$

for a simple process $X \in \mathcal{L}_0$. This amounts to showing that$^{42}$

(1) $I(X)$ has continuous paths;

(2) $I(X)$ is $\mathcal{F}_t$-adapted;

(3) $I_t(X)$ is square integrable:

$$\mathbb{E}\big(I_t(X)\big)^2 < \infty, \quad t \ge 0;$$

(4) $I(X)$ is a martingale:

$$\mathbb{E}\big(I_t(X)\big|\mathcal{F}_s\big) = I_s(X), \quad \mathbb{P}\text{-a.s.}, \quad t \ge s \ge 0;$$

(5) the process $I_t^2(X) - \int_0^t X_s^2\,ds$ is a martingale.

Problem 2.4.4. Prove the claim of the previous problem for $X \in \mathcal{L}$.

41 Another choice, $\lambda = 1/2$, which corresponds to the Fisk-Stratonovich integral, is popular for reasons beyond our course.

42 Roughly corresponding to the properties (p1)-(p7).


Problem 2.4.5. Recall that if $x$ is a continuous function and $y$ is a function of bounded variation, then for any sequence of tagged partitions $(P^n)$, the limit

$$\lim_{\|P^n\| \to 0} \sum_{j=1}^{|P^n|} x_{t_{j-1}}\big(y_{t_j} - y_{t_{j-1}}\big)$$

exists, does not depend on the sequence of partitions used and is called the Stieltjes integral of $x$ w.r.t. $y$, denoted by $\int_0^t x_s\,dy_s$. Remarkably, the integration by parts formula

$$\int_0^t x_s\,dy_s = x_t y_t - x_0 y_0 - \int_0^t y_s\,dx_s \tag{4.14}$$

holds whenever one of the Stieltjes integrals is defined, in which case the other is defined automatically.

(1) Explain why the Ito stochastic integral coincides with the Stieltjes integral for integrands of bounded variation.

(2) Let $x_t$, $t \ge 0$, be a continuously differentiable (deterministic) function with the derivative $\dot x_t = \frac{d}{dt} x_t$, satisfying

$$\int_0^t x_u^2\,du < \infty, \quad t \ge 0.$$

Using the definition

$$I_t(x) := x_t B_t - \int_0^t B_s \dot x_s\,ds, \quad t \ge 0,$$

show that $I_t(x)$ is a square integrable martingale with the quadratic variation

$$\langle I(x)\rangle_t = \int_0^t x_s^2\,ds.$$

Compare this to the standard definition of the Ito integral.

Hint: the calculations are direct, but a bit tedious; do not be afraid to apply the integration by parts formula a number of times.

5. The Ito chain rule

If $t \mapsto x_t \in \mathbb{R}$ is a continuously differentiable function of the time variable $t \in [0, \infty)$ and $x \mapsto f(x) \in \mathbb{R}$, $x \in \mathbb{R}$, is a continuously differentiable function, then by the chain rule from classical calculus, the time function $t \mapsto f(x_t)$ is differentiable and

$$\frac{d}{dt} f(x_t) = f'(x_t)\frac{d}{dt} x_t =: f'(x_t)\dot x_t,$$

or, in the integral form,

$$f(x_t) = f(x_0) + \int_0^t f'(x_s)\dot x_s\,ds, \quad t \ge 0.$$

If the trajectory $x_t$ is of bounded variation, but not necessarily differentiable, then

$$f(x_t) = f(x_0) + \int_0^t f'(x_s)\,dx_s, \quad t \ge 0, \tag{5.1}$$

where the integral on the right-hand side is understood in the Stieltjes-Lebesgue sense. The counterpart of the above formulas in stochastic calculus is the celebrated Ito chain rule.

Theorem 5.1 (Ito chain rule). Let $X$ be a continuous semimartingale

$$X_t = X_0 + A_t + M_t, \quad t \ge 0,$$

where $A$ is an adapted process of finite variation and $M$ is a continuous square integrable martingale$^{43}$. Then for a twice continuously differentiable function$^{44}$ $f$, the process $f(X_t)$ is a semimartingale, satisfying

$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dA_s + \int_0^t f'(X_s)\,dM_s + \int_0^t \frac12 f''(X_s)\,d\langle M\rangle_s, \quad t \ge 0. \tag{5.2}$$

Remark 5.2. Since $f \in C^2$, the functions $t \mapsto f'\big(X_t(\omega)\big)$ and $t \mapsto f''\big(X_t(\omega)\big)$ are continuous $\mathbb{P}$-a.s. Since $A$ and $\langle M\rangle$ are of bounded variation, the first and the last integrals are understood in the Stieltjes-Lebesgue sense. Since $t \mapsto f'(X_t)$ is continuous $\mathbb{P}$-a.s.,

$$\int_0^t \big(f'(X_s)\big)^2\,d\langle M\rangle_s < \infty, \quad \mathbb{P}\text{-a.s.},$$

i.e. $f'(X) \in \mathcal{P}(M)$ and hence the second integral is defined in the Ito sense. The integral w.r.t. the semimartingale $X$ is defined as

$$\int_0^t f'(X_s)\,dX_s := \int_0^t f'(X_s)\,dA_s + \int_0^t f'(X_s)\,dM_s$$

and hence the formula (5.2) reads (compare with (5.1))

$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \int_0^t \frac12 f''(X_s)\,d\langle M\rangle_s, \quad t \ge 0. \tag{5.3}$$

Often the latter is written in a shorter "differential" notation

$$df(X_t) = f'(X_t)\,dX_t + \frac12 f''(X_t)\,d\langle M\rangle_t.$$

43 In fact, the formula holds also for continuous local martingales (look it up in the literature, if curious).

44 $f \in C^2$ is the notation.


Before proving (5.2), let's explore it through simple examples:

Example 5.3. Let $X_t := B_t$ and $f(x) = x^2$. In the notation of Theorem 5.1, $X_0 = 0$, $A_t = 0$, $M_t = B_t$ and $\langle M\rangle_t = t$. Hence we get the familiar formula

$$B_t^2 = 2\int_0^t B_s\,dB_s + t.$$

■
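As one more illustration of the same mechanics (not part of the original examples), here is the computation for the cubic $f(x) = x^3$ with $X_t := B_t$, so that again $A_t = 0$, $M_t = B_t$ and $\langle M\rangle_t = t$. The only inputs are $f'(x) = 3x^2$ and $f''(x) = 6x$ plugged into (5.2):

```latex
\begin{aligned}
B_t^3 &= \int_0^t 3B_s^2\,dB_s + \frac12\int_0^t 6B_s\,d\langle B\rangle_s \\
      &= 3\int_0^t B_s^2\,dB_s + 3\int_0^t B_s\,ds, \qquad t \ge 0.
\end{aligned}
```

The first term is the martingale part and the second is the finite variation part of the semimartingale $B_t^3$.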

Example 5.4 (Geometric Brownian motion). Consider the process

$$S_t = S_0 \exp\Big(\big(\mu - \tfrac12\sigma^2\big)t + \sigma B_t\Big), \quad t \ge 0, \tag{5.4}$$

where $\mu$ and $\sigma \ne 0$ are constants, $S_0$ is an $\mathcal{F}_0$-measurable random variable and $B$ is the Brownian motion. The process $S_t$ is referred to as the geometric Brownian motion and it is one of the classical models for stock prices, in which case $\mu$ is interpreted as the interest rate (think how) and $\sigma$ is the volatility, which controls the local variability of the prices.

Let's find the Doob-Meyer decomposition of $S_t$ by applying the Ito formula. To this end, set $f(x) = e^x$ and $X_t := X_0 + \big(\mu - \tfrac12\sigma^2\big)t + \sigma B_t$ with $X_0 := \log S_0$ (assuming $S_0 > 0$), so that $A_t := \big(\mu - \tfrac12\sigma^2\big)t$ and $M_t := \sigma B_t$. Then (5.3) yields:

$$\begin{aligned}
e^{X_t} &= e^{X_0} + \int_0^t e^{X_u}\big(\mu - \tfrac12\sigma^2\big)\,du + \int_0^t e^{X_u}\,d(\sigma B_u) + \frac12\int_0^t e^{X_u}\sigma^2\,du \\
&= e^{X_0} + \mu\int_0^t e^{X_u}\,du + \sigma\int_0^t e^{X_u}\,dB_u,
\end{aligned}$$

and since $S_t = e^{X_t}$, we conclude that $S_t$ satisfies the relation

$$S_t = S_0 + \mu\int_0^t S_u\,du + \sigma\int_0^t S_u\,dB_u, \quad t \ge 0. \tag{5.5}$$

Is there an essentially different random process on the given probability space which satisfies the same relation? We shall see that the answer to this question is negative, which means that the geometric Brownian motion could have been defined as the unique (up to indistinguishability) solution of the integral equation (5.5). Deferring the detailed discussion of this important topic to the next section, let's reap the low hanging fruit right away!

Note that $S \in \mathcal{L}$ (check!) and hence the expectation of the stochastic integral on the right-hand side is zero, i.e. the mean process $m_t := \mathbb{E} S_t$ satisfies the integral equation

$$m_t = m_0 + \mu\int_0^t m_s\,ds, \quad t \ge 0.$$

It can be seen that the latter equation shares the same solutions as the differential equation

$$\frac{d}{dt} m_t = \mu m_t, \quad t \ge 0, \tag{5.6}$$

subject to a given $m_0$. The function $m_t = m_0 e^{\mu t}$ is an obvious solution (check!). In fact, it is the only solution and hence we conclude that

$$\mathbb{E} S_t = e^{\mu t}\,\mathbb{E} S_0, \quad t \ge 0.$$

Furthermore, let's apply the Ito formula to $S_t^2$:

$$\begin{aligned}
S_t^2 &= S_0^2 + \int_0^t 2 S_u\,dS_u + \frac12\int_0^t 2\sigma^2 S_u^2\,du \\
&= S_0^2 + 2\mu\int_0^t S_u^2\,du + 2\sigma\int_0^t S_u^2\,dB_u + \frac12\int_0^t 2\sigma^2 S_u^2\,du.
\end{aligned}$$

Since $S^2 \in \mathcal{L}$, taking the expectation of both sides yields the equation for the second moment $Q_t := \mathbb{E} S_t^2$:

$$Q_t = Q_0 + (2\mu + \sigma^2)\int_0^t Q_u\,du, \quad t \ge 0,$$

which, similarly to (5.6), admits the unique solution

$$Q_t = Q_0 \exp\big((2\mu + \sigma^2)t\big), \quad t \ge 0,$$

i.e. $\mathbb{E} S_t^2 = \exp\big((2\mu + \sigma^2)t\big)\,\mathbb{E} S_0^2$. Of course, the same formulas could have been obtained by a direct calculation using the definition (5.4). ■
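The two moment formulas of this example are easy to sanity-check by simulation. The sketch below samples $S_t$ directly from the closed form (5.4) with a deterministic $S_0$; the parameter values, the function name `gbm_moments` and the seed are assumptions made purely for illustration.

```python
import math
import random


def gbm_moments(mu=0.05, sigma=0.2, t=1.0, s0=1.0, n_paths=20000, seed=2):
    """Monte Carlo estimates of E S_t and E S_t^2 using the closed form (5.4)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * t
    first = 0.0
    second = 0.0
    for _ in range(n_paths):
        b_t = rng.gauss(0.0, math.sqrt(t))          # B_t ~ N(0, t)
        s_t = s0 * math.exp(drift + sigma * b_t)    # formula (5.4)
        first += s_t
        second += s_t * s_t
    return first / n_paths, second / n_paths


if __name__ == "__main__":
    m1, m2 = gbm_moments()
    print("E S_t   estimate:", m1, " formula:", math.exp(0.05))
    print("E S_t^2 estimate:", m2, " formula:", math.exp(2 * 0.05 + 0.2 ** 2))
```

Only the marginal law of $B_t$ is needed here, so no time discretization is involved; the remaining discrepancy is pure Monte Carlo error.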

Proof of Theorem 5.1. As you might have already noticed, the derivatives of $f$ in (5.2) resemble the Taylor expansion, which is, indeed, where the formula comes from.

Define the stopping times

$$\tau_n := \begin{cases} 0, & |X_0| \ge n, \\ \inf\Big\{t \ge 0 : |M_t| \vee \bigvee_0^t A \vee \langle M\rangle_t \ge n\Big\}, & |X_0| < n, \end{cases}$$

where $\inf\emptyset = \infty$ and $\bigvee_0^t A$ is the variation of $A$. Since all the processes are continuous $\mathbb{P}$-a.s., $\lim_n t \wedge \tau_n = t$ $\mathbb{P}$-a.s., and hence (5.2) can be obtained by checking it up to the bounded stopping times $t \wedge \tau_n$ and then taking $n \to \infty$. Note that $M_{t \wedge \tau_n}$ and $A_{t \wedge \tau_n}$ are bounded by $n$ and hence $|X_{t \wedge \tau_n}| \le 3n$. Consequently, it is enough to check (5.2) when all the involved processes are bounded by a constant $C$. We have already encountered a similar localization procedure in the construction of the Ito integral for processes from $\mathcal{P}$.


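Let $P^n = \{0 = t^n_0 < t^n_1 < \dots < t^n_n = t\}$ be a sequence of shrinking partitions; then by the Taylor expansion

$$\begin{aligned}
f(X_t) - f(X_0) &= \sum_{j=1}^{|P^n|} \big(f(X_{t_j}) - f(X_{t_{j-1}})\big) \\
&= \sum_j f'(X_{t_{j-1}})\Delta X_{t_j} + \frac12\sum_j f''(\eta_j)\big(\Delta X_{t_j}\big)^2 \\
&= \sum_j f'(X_{t_{j-1}})\Delta A_{t_j} + \sum_j f'(X_{t_{j-1}})\Delta M_{t_j} + \frac12\sum_j f''(\eta_j)\big(\Delta X_{t_j}\big)^2,
\end{aligned}$$

where we used some inevitable shortcuts of notation (e.g. $t_j := t^n_j$, $\Delta X_{t_j} := X_{t_j} - X_{t_{j-1}}$, etc.). The points $\eta_j(\omega) := X_{t_{j-1}} + \theta_j\big(X_{t_j} - X_{t_{j-1}}\big)$ are random, with$^{45}$ $\theta_j(\omega) \in [0,1]$. We shall show that each of the terms on the right-hand side converges to the corresponding term in (5.2). It is enough to establish convergence in $L^2$: one can then extract a subsequence which converges to the same limit $\mathbb{P}$-a.s.$^{46}$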

Since $t \mapsto f'\big(X_t(\omega)\big)$ is continuous $\mathbb{P}$-a.s. and $A$ is of bounded variation, the first sum on the right-hand side converges to the Stieltjes-Lebesgue integral $\mathbb{P}$-a.s. Further, if we define

$$Y^n_t = f'(X_0)\mathbf{1}_{\{t=0\}} + \sum_j f'(X_{t_{j-1}})\mathbf{1}_{\{t \in (t_{j-1},\,t_j]\}},$$

then the second sum is the Ito integral $\int_0^t Y^n_s\,dM_s$. Since $f'(X)$ is $\mathbb{P}$-a.s. continuous, $Y^n_t \to f'(X_t)$ for all $t \ge 0$, $\mathbb{P}$-a.s. Further, since $f'$ is continuous and $X$ is bounded, $f'(X)$ is bounded and thus $\mathbb{E}\int_0^T (Y^n_s - f'(X_s))^2\,d\langle M\rangle_s \to 0$ by dominated convergence. Consequently, $\int_0^t Y^n_s\,dM_s \to \int_0^t f'(X_s)\,dM_s$ in $L^2$, i.e. the second term converges to the Ito integral. It is left to show that

$$\sum_j f''(\eta_j)\big(\Delta X_{t_j}\big)^2 \xrightarrow[n \to \infty]{L^2} \int_0^t f''(X_s)\,d\langle M\rangle_s. \tag{5.7}$$

To this end, we have

$$\sum_j f''(\eta_j)\big(\Delta X_{t_j}\big)^2 = \sum_j f''(\eta_j)\big(\Delta A_{t_j}\big)^2 + 2\sum_j f''(\eta_j)\Delta A_{t_j}\Delta M_{t_j} + \sum_j f''(\eta_j)\big(\Delta M_{t_j}\big)^2.$$

45 It is not clear at the outset whether $\theta_j(\omega)$ are random variables, i.e. are measurable; however, they are, since we can express them explicitly by solving an appropriate equation when $\Delta X_{t_j} \ne 0$ and set $\theta_j = 0$ otherwise.

46 This is essential, since convergence in $L^2$ does not imply, in general, $\mathbb{P}$-a.s. convergence.


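Since $A$ is continuous (and thus uniformly continuous on any bounded closed interval) and of bounded variation,

$$\Big|\sum_j f''(\eta_j)\big(\Delta A_{t_j}\big)^2\Big| \le \sup_{|x| \le C} |f''(x)| \Big(\bigvee_0^t A\Big) \max_j |\Delta A_{t_j}| \xrightarrow{n \to \infty} 0, \quad t \ge 0, \quad \mathbb{P}\text{-a.s.},$$

and, similarly, since $M$ is continuous,

$$\Big|\sum_j f''(\eta_j)\Delta A_{t_j}\Delta M_{t_j}\Big| \le \sup_{|x| \le C} |f''(x)| \Big(\bigvee_0^t A\Big) \max_j |\Delta M_{t_j}| \xrightarrow{n \to \infty} 0, \quad t \ge 0, \quad \mathbb{P}\text{-a.s.}$$

Further, by continuity of $f''$ and $X$,

$$\Big|\sum_j f''(\eta_j)\big(\Delta M_{t_j}\big)^2 - \sum_j f''(X_{t_{j-1}})\big(\Delta M_{t_j}\big)^2\Big| \le \max_j \big|f''(\eta_j) - f''(X_{t_{j-1}})\big| \sum_j \big(\Delta M_{t_j}\big)^2 \xrightarrow[n \to \infty]{\mathbb{P}} 0,$$

since $\sum_j \big(\Delta M_{t_j}\big)^2 \xrightarrow{\mathbb{P}} \langle M\rangle_t < \infty$. Finally,

$$\begin{aligned}
\mathbb{E}\Big(\sum_j f''(X_{t_{j-1}})(\Delta M_{t_j})^2 - \sum_j f''(X_{t_{j-1}})\Delta\langle M\rangle_{t_j}\Big)^2
&\le \Big(\sup_{|x| \le C} |f''(x)|\Big)^2\, \mathbb{E}\Big(\sum_j \big((\Delta M_{t_j})^2 - \Delta\langle M\rangle_{t_j}\big)\Big)^2 \\
&= \Big(\sup_{|x| \le C} |f''(x)|\Big)^2 \sum_j \mathbb{E}\big((\Delta M_{t_j})^2 - \Delta\langle M\rangle_{t_j}\big)^2 \\
&\le 2\Big(\sup_{|x| \le C} |f''(x)|\Big)^2 \Big(\sum_j \mathbb{E}(\Delta M_{t_j})^4 + \sum_j \mathbb{E}\big(\Delta\langle M\rangle_{t_j}\big)^2\Big) \xrightarrow{n \to \infty} 0,
\end{aligned}$$

where the convergence holds since the first sum on the right is bounded by the variation of the fourth order, which vanishes as $\|P^n\| \to 0$, and the second sum converges to zero by continuity of $\langle M\rangle$. This verifies (5.7) and completes the proof. □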

Here is one important application of the Ito formula.

Theorem 5.5 (Levy's characterization of the Brownian motion). A continuous square integrable martingale $M$ with $M_0 = 0$ and quadratic variation $\langle M\rangle_t = t$ is the Brownian motion.

Proof. All but one axiom of the Brownian motion are satisfied: we have to check only that $M_t - M_s$ is independent of $\mathcal{F}_s$ and $M_t - M_s \sim N(0, t-s)$. Apply the Ito formula to $e^{i\lambda M_t}$, where $\lambda \in \mathbb{R}$ and $i$ is the imaginary unit:

$$e^{i\lambda M_t} = e^{i\lambda M_s} + i\lambda\int_s^t e^{i\lambda M_u}\,dM_u - \frac12\lambda^2\int_s^t e^{i\lambda M_u}\,du, \quad t \ge s \ge 0.$$

Multiplying both sides by $e^{-i\lambda M_s}$, we get

$$e^{i\lambda(M_t - M_s)} = 1 + e^{-i\lambda M_s}\, i\lambda\int_s^t e^{i\lambda M_u}\,dM_u - \frac12\lambda^2\int_s^t e^{i\lambda(M_u - M_s)}\,du.$$

Since $|e^{i\lambda M_u}| \le 1$, the stochastic integral above is a martingale, and conditioning both sides on $\mathcal{F}_s$ gives the equation for $Q_{t,s} := \mathbb{E}\big(e^{i\lambda(M_t - M_s)}\,\big|\,\mathcal{F}_s\big)$:

$$Q_{t,s} = 1 - \frac12\lambda^2\int_s^t Q_{u,s}\,du, \quad t \ge s \ge 0.$$

This linear equation has the unique solution (compare with (5.6))

$$\mathbb{E}\big(e^{i\lambda(M_t - M_s)}\,\big|\,\mathcal{F}_s\big) = Q_{t,s} = e^{-\frac12\lambda^2(t-s)}, \quad t \ge s \ge 0,$$

which verifies the claim. □

Example 5.6. Let $B$ be the Brownian motion and set

$$Z_t = \int_0^t \mathrm{sign}(B_s)\,dB_s,$$

where $\mathrm{sign}(x)$ is the sign of $x$. Since the integrand is bounded, the Ito integral is a martingale and

$$\langle Z\rangle_t = \int_0^t \mathrm{sign}^2(B_s)\,ds = t.$$

Hence by the Levy theorem, $Z$ is the Brownian motion. ■
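A quick numerical experiment is consistent with this conclusion. The Python sketch below approximates $Z_1$ by left-point sums on a uniform grid and estimates its mean and variance, which for a Brownian motion at time $1$ should be close to $0$ and $1$. The discretization, the convention $\mathrm{sign}(0) = 1$, the function names and the sample sizes are all assumptions of this illustration; matching two moments is of course far weaker than the statement of the theorem.

```python
import math
import random


def sample_z(t=1.0, n_steps=50, rng=None):
    """One sample of the discretised Z_t = sum_j sign(B_{t_j}) (B_{t_{j+1}} - B_{t_j})."""
    if rng is None:
        rng = random.Random()
    dt = t / n_steps
    b = 0.0
    z = 0.0
    for _ in range(n_steps):
        sign = 1.0 if b >= 0.0 else -1.0   # convention sign(0) = 1 (an assumption)
        db = rng.gauss(0.0, math.sqrt(dt))
        z += sign * db                     # integrand taken at the left endpoint
        b += db
    return z


def z_moments(n_paths=4000, seed=3):
    """Sample mean and sample variance of the discretised Z_1."""
    rng = random.Random(seed)
    zs = [sample_z(rng=rng) for _ in range(n_paths)]
    mean = sum(zs) / n_paths
    var = sum((z - mean) ** 2 for z in zs) / n_paths
    return mean, var


if __name__ == "__main__":
    print(z_moments())
```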

Remark 5.7. Continuity is, of course, a crucial part of Levy's characterization. For example, $M_t = \Pi_t - t$, where $\Pi_t$ is the Poisson process with unit intensity, is a martingale with $\langle M\rangle_t = t$ (check!).

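The following theorem is the (important!) multivariate analog of Theorem 5.1:

Theorem 5.8. Let $M_t = (M^{(1)}_t, ..., M^{(d)}_t)$ be a vector of continuous martingales and $A_t = (A^{(1)}_t, ..., A^{(d)}_t)$ be a vector of adapted processes of bounded variation with $A_0 = 0$ and set

$$X_t = X_0 + A_t + M_t, \quad t \ge 0,$$

where $X_0$ is an $\mathcal{F}_0$-measurable random vector in $\mathbb{R}^d$. Let $f(t,x) : [0,\infty)\times\mathbb{R}^d \mapsto \mathbb{R}$ be a $C^{1,2}$ function⁴⁷. Then P-a.s.

$$
\begin{aligned}
f(t,X_t) = {} & f(0,X_0) + \int_0^t \frac{\partial}{\partial t} f(s,X_s)\,ds + \sum_{j=1}^d \int_0^t \frac{\partial}{\partial x_j} f(s,X_s)\,dA^{(j)}_s + \sum_{j=1}^d \int_0^t \frac{\partial}{\partial x_j} f(s,X_s)\,dM^{(j)}_s \\
& + \frac{1}{2}\sum_{i=1}^d\sum_{j=1}^d \int_0^t \frac{\partial^2}{\partial x_i\partial x_j} f(s,X_s)\,d\langle M^{(i)}, M^{(j)}\rangle_s, \quad t \ge 0.
\end{aligned}
\tag{5.8}
$$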

The proof is similar to the proof of the one-dimensional counterpart (check that for d = 1, (5.8) reduces to (5.2)).

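Example 5.9. Let $X$ and $Y$ be a pair of semimartingales with the Doob-Meyer decompositions

$$X_t = X_0 + A_t + M_t \quad\text{and}\quad Y_t = Y_0 + C_t + N_t.$$

Let’s apply the multivariate Ito formula to the product $X_tY_t$. We have

$$f(x,y) := xy, \quad \frac{\partial}{\partial x}f(x,y) = y, \quad \frac{\partial}{\partial y}f(x,y) = x, \quad \frac{\partial^2}{\partial x\partial y}f(x,y) = 1, \quad \frac{\partial^2}{\partial y^2}f(x,y) = \frac{\partial^2}{\partial x^2}f(x,y) = 0$$

and hence

$$
\begin{aligned}
X_tY_t &= X_0Y_0 + \int_0^t Y_s\,dA_s + \int_0^t Y_s\,dM_s + \int_0^t X_s\,dC_s + \int_0^t X_s\,dN_s + \langle M, N\rangle_t \\
&= X_0Y_0 + \int_0^t Y_s\,dX_s + \int_0^t X_s\,dY_s + \langle M, N\rangle_t.
\end{aligned}
$$

The latter is recognized as the integration by parts formula, the Ito counterpart of the familiar formula from the classical Stieltjes calculus. In particular, if $M$ and $N$ are independent, then $\langle M, N\rangle_t = 0$ and the Ito and the Stieltjes formulas coincide. ■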
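The integration by parts formula of Example 5.9 has an exact discrete-time analog: for any two sequences, $x_jy_j - x_{j-1}y_{j-1} = y_{j-1}\Delta x_j + x_{j-1}\Delta y_j + \Delta x_j\Delta y_j$. The sketch below illustrates it on simulated paths. Everything specific in it is an assumption made for the illustration only: the drifts $dA = 0.3\,dt$ and $dC = -0.2\,dt$, the correlation `rho` between the martingale increments, the initial values and the name `discrete_parts`.

```python
import math
import random


def discrete_parts(n_steps=1000, horizon=1.0, rho=0.5, seed=1):
    """Return (X_T * Y_T, X_0 Y_0 + sum Y dX + sum X dY, sum dX dY) on a uniform grid."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    x, y = 1.0, 2.0                      # arbitrary initial values X_0 and Y_0
    stieltjes_part = x * y               # X_0 Y_0 plus the two left-endpoint sums
    bracket = 0.0                        # discrete analog of <M, N>_T
    for _ in range(n_steps):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        dm = math.sqrt(dt) * g1                                       # increment of M
        dn = math.sqrt(dt) * (rho * g1 + math.sqrt(1 - rho**2) * g2)  # increment of N
        dx = 0.3 * dt + dm               # assumed drift dA = 0.3 dt
        dy = -0.2 * dt + dn              # assumed drift dC = -0.2 dt
        stieltjes_part += y * dx + x * dy
        bracket += dx * dy
        x += dx
        y += dy
    return x * y, stieltjes_part, bracket


product, stieltjes_part, bracket = discrete_parts()
print(f"X_T Y_T              = {product:.6f}")
print(f"Stieltjes part + sum = {stieltjes_part + bracket:.6f}")
print(f"sum dX dY            = {bracket:.4f}  (compare with rho * T)")
```

The first two printed numbers agree up to rounding, and the correction term stays close to $\rho T$ rather than vanishing, which is the discrete shadow of the bracket $\langle M, N\rangle_t$. Setting `rho=0` mimics the independent case, where the correction becomes small.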

⁴⁷ i.e. continuously differentiable in the t variable and twice continuously jointly differentiable in the x variables

5.1. Exercises.

Problem 2.5.1. Derive the semimartingale representation (i.e. the Doob-Meyer decomposition) for the following processes, applying the Ito formula ($B$ is the Brownian motion):

(1) $X_t := \cos(B_t)$ and use the obtained decomposition to calculate the expectation $E\cos(B_t)$

(2) $X_t := \big(\int_0^t B_s\,dB_s\big)^2$

(3) $X_t := B_t \int_0^t B_s^2\,dB_s$

Problem 2.5.2. Applying the Ito chain rule to polynomials of the Brownian motion $B^k_t$, $k \ge 1$, deduce by induction the well known formula for the moments of an $N(0,1)$ random variable

$$E B_1^{2k} = \prod_{i=1}^k (2i-1).$$

Problem 2.5.3. Consider the process

$$X_t = \exp\Big(\int_0^t a_s\,ds\Big)\Big(X_0 + \int_0^t \exp\Big(-\int_0^s a_u\,du\Big) b_s\,dB_s\Big), \quad t \ge 0, \tag{5.9}$$

where $X_0$ is an $\mathcal{F}_0$-measurable random variable, and $a = (a_t)_{t\ge0}$ and $b = (b_t)_{t\ge0}$ are adapted continuous processes.

(1) Argue that the Ito integral in (5.9) is well defined

(2) Apply the Ito formula to show that $X$ satisfies the relation:

$$X_t = X_0 + \int_0^t a_sX_s\,ds + \int_0^t b_s\,dB_s, \quad t \ge 0.$$

Hint: set $Y_t := e^{\int_0^t a_s\,ds}$ and $Z_t := X_0 + \int_0^t e^{-\int_0^s a_u\,du}\,b_s\,dB_s$ and apply the multivariate Ito formula to $Y_tZ_t$

(3) Assuming that $a_s$ and $b_s$ are deterministic, argue that $X$ is a Gaussian process, with $M_t := EX_t$ satisfying the equation:

$$M_t = M_0 + \int_0^t a_sM_s\,ds$$

and $V_t := \mathrm{var}(X_t)$ satisfying

$$V_t = V_0 + \int_0^t 2a_sV_s\,ds + \int_0^t b_s^2\,ds.$$

Hint: to deduce the second equation, apply the Ito formula to $D_t^2 := (X_t - M_t)^2$.

(4) Check that

$$M_t = M_0\exp\Big(\int_0^t a_s\,ds\Big)$$

and

$$V_t = \exp\Big(\int_0^t 2a_s\,ds\Big)\Big(V_0 + \int_0^t \exp\Big(-\int_0^s 2a_u\,du\Big) b_s^2\,ds\Big)$$

solve the equations from the previous question.

Hint: check that $M_t$ and $V_t$ satisfy the corresponding differential equations

(5) If $a_t \equiv a$ and $b_t \equiv b$ are constants, the process $X$ is called the Ornstein-Uhlenbeck process. Specify the conditions on $a$ and $b$ so that the limits $\lim_{t\to\infty} M_t$ and $\lim_{t\to\infty} V_t$ exist and calculate these limits explicitly.

(6) Find the autocovariance function $E(X_t - EX_t)(X_s - EX_s)$ of the Ornstein-Uhlenbeck process from the previous question.

Problem 2.5.4 (a moment inequality for Ito integrals). Let $X$ be an adapted process such that

$$E\int_0^T |X_t|^{2m}\,dt < \infty,$$

for some $m \ge 1$ and $T \ge 0$. Derive the following inequality for the stochastic integral

$$E\Big(\int_0^T X_s\,dB_s\Big)^{2m} \le \big(m(2m-1)\big)^m\, T^{m-1}\, E\int_0^T |X_t|^{2m}\,dt.$$

Hint: apply the Ito formula to $M_t^{2m}$ with $M_t := \int_0^t X_s\,dB_s$ and use the Holder inequality with appropriate conjugates⁴⁸ (you might want to view $\frac{1}{T}\int_0^T E(...)\,ds$ as an expectation over the space $[0,T]\times\Omega$)

⁴⁸ Holder inequality is an extremely useful extension of the Cauchy-Schwarz inequality, which states that for a pair of r.v. $\xi$ and $\eta$,

$$E|\xi\eta| \le \big(E|\xi|^p\big)^{1/p}\big(E|\eta|^q\big)^{1/q},$$

for Holder conjugates $p$ and $q$, i.e. real positive numbers such that $1/p + 1/q = 1$.

Problem 2.5.5. Let $B_t = (B^{(1)}_t, ..., B^{(d)}_t)$, $d \ge 2$, be a vector of independent Brownian motions and define

$$R_t := \|x + B_t\| = \sqrt{\sum_{j=1}^d \big(x_j + B^{(j)}_t\big)^2},$$

where $x \in \mathbb{R}^d$. In words, $R_t$ is the distance from the origin of the $d$-dimensional Brownian motion started at $x$.

(1) Show that if $Q$ is an orthonormal matrix (rotation), i.e. $QQ^\top = I$, then the rotated process $QB_t$ is again a vector of independent Brownian motions.

Hint: check that the components of $QB_t$ are independent and each satisfies the axioms of the Brownian motion.

(2) Show that the distribution of the process $R$ depends on the initial point $x$ only through its norm $\|x\|$

Hint: if $\|x\| = \|y\|$, then there is an orthonormal matrix $Q$, such that $x = Qy$

(3) Argue that $f(x) = \|x\|$, $x \in \mathbb{R}^d$ is not differentiable at the origin

(4) Ignoring⁴⁹ your finding in the previous question, bravely apply the multivariate Ito formula to $R_t = f(x + B_t)$ to prove that

$$R_t = \|x\| + \int_0^t \frac{d-1}{2R_s}\,ds + \tilde{B}_t, \quad t \ge 0,$$

where $\tilde{B}$ is the process

$$\tilde{B}_t = \sum_{i=1}^d \int_0^t \frac{x_i + B^{(i)}_s}{R_s}\,dB^{(i)}_s.$$

(5) Using the Levy’s characterization of the Brownian motion, prove that $\tilde{B}$ from the previous question is the Brownian motion.⁵⁰

6. Stochastic differential equations

6.1. A motivation and some heuristics. A way to model time evolution of objects in nature is by means of ordinary differential equations (ODEs):

$$\frac{d}{dt}x_t = f(t,x_t), \quad t \in [0,T], \qquad x_0 = \xi, \tag{6.1}$$

where $f : [0,\infty)\times\mathbb{R}^d \mapsto \mathbb{R}^d$ is a known function, which depends on the particular problem at hand, and $\xi \in \mathbb{R}^d$ is the initial condition. The relation⁵¹ (6.1) is an equation in the space of functions and its solution is a differentiable function $x = (x_t)_{t\in[0,T]}$, which satisfies (6.1).

Heuristically, the vector field $f$ in the right hand side of (6.1) defines the field of velocities at each point of time $t$ and space $x \in \mathbb{R}^d$. By the Taylor formula, for small $\delta > 0$

$$x_{t+\delta} = x_t + \frac{d}{dt}x_t\,\delta + o(\delta) = x_t + f(t,x_t)\delta + o(\delta),$$

and hence the state of the solution at time $t+\delta$ is determined by its present state $x_t$ and the velocity $f(t,x_t)$.

⁴⁹ or, instead, considering the stopped process $R_{t\wedge\tau_n}$, $t \in [0,T]$, where $\tau_n = \inf\{t \ge 0 : R_t \le 1/n\}$

⁵⁰ the process $R$ is called the Bessel process of order $d$ and it plays an important role in stochastic analysis

⁵¹ called the Cauchy initial value problem for ODEs

The immediate questions are: does (6.1) have a solution? Is the solution unique? How can it be found explicitly or approximated? What are the properties of the solution (e.g. is it bounded? how smooth is it, beyond being differentiable? etc.) Here is a simple example:

Example 6.1. Consider the one dimensional ODE (i.e. d = 1)

$$\frac{d}{dt}x_t = \alpha x_t, \quad t \in [0,T], \qquad x_0 = \xi. \tag{6.2}$$

This ODE is linear, since its right hand side is a linear map of $x_t$. Check that the function

$$x_t = \xi e^{\alpha t}, \quad t \in [0,T],$$

is a solution. As we shall see below, it is also the unique solution. ■

However, only a few types of ODEs are known to have an explicit solution. Hence the important problem is to state the conditions on $f$, so that a solution exists and is unique. Do not think that this is always the case:

Example 6.2. Consider the Riccati equation

$$\frac{d}{dt}x_t = x_t^2, \quad t \in [0,T], \qquad x_0 = 1. \tag{6.3}$$

It can be shown that if a solution exists, then it must be unique. Check that $x_t = 1/(1-t)$ is a solution on $[0,T]$ for any $T < 1$. However, it cannot be extended to larger intervals. By uniqueness, it follows that there is no solution on $[0,T]$ for $T \ge 1$ (think why). ■
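The blow-up in Example 6.2 is easy to observe numerically. The following sketch uses the explicit Euler scheme with a fixed step, which is a choice made here only for illustration (the notes do not prescribe a solver at this point), and the name `euler_riccati` is hypothetical. It compares the approximation with $1/(1-T)$ for several $T < 1$.

```python
def euler_riccati(t_end, n_steps):
    """Explicit Euler approximation of x_{t_end} for dx/dt = x^2, x_0 = 1."""
    h = t_end / n_steps
    x = 1.0
    for _ in range(n_steps):
        x += h * x * x      # x_{k+1} = x_k + h * f(x_k) with f(x) = x^2
    return x


for t_end in (0.5, 0.9, 0.99):
    approx = euler_riccati(t_end, 100000)
    exact = 1.0 / (1.0 - t_end)
    print(f"T = {t_end}: Euler = {approx:.3f}, 1/(1-T) = {exact:.3f}")
```

As $T$ approaches 1 the values grow without bound, in agreement with the explicit solution. Note that the Euler iterates themselves remain finite for any fixed step, so the scheme alone cannot certify non-existence of a solution beyond $T = 1$.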

Example 6.3. Consider the equation

$$\frac{d}{dt}x_t = x_t^{1/2}, \quad t \in [0,T], \qquad x_0 = 0. \tag{6.4}$$

Clearly, $x_t \equiv 0$ is a solution. Another solution is $y_t = t^2/4$, i.e. the uniqueness fails. ■

It turns out that (6.1) admits a unique solution if the function $f$ does not grow too fast and is continuous in a certain rather strong way (see Remark 6.15 below).

In applications, many ODEs have the following special form

$$\frac{d}{dt}x_t = b(t,x_t) + \sigma(t,x_t)u_t, \quad t \in [0,T], \qquad x_0 = \xi, \tag{6.5}$$

where $b$ and $\sigma$ are fixed functions and $u = (u_t)_{t\in[0,T]}$ is a time function, which models the external input to the dynamical system.


Example 6.4. Suppose that a particle moves in a viscous liquid and we can control its motion by applying a force at each time $t \ge 0$. For simplicity, assume that the particle moves along a line and denote its position and velocity at time $t$ by $p_t$ and $v_t$ respectively (by definition, $v_t = \frac{d}{dt}p_t$).

Recall that by Newton’s second law the acceleration $a$ of the particle is proportional to the net applied force $F$ and inversely proportional to the mass $m$ of the particle, i.e. $a = F/m$. The viscosity of the liquid slows down the particle by applying the force of magnitude $\gamma v_t$ in the direction opposite to the direction of the velocity $v_t$ ($\gamma$ is a constant, depending on the viscosity of the liquid). In addition, an external force $u_t$ can be applied to the particle, e.g. by charging it before the experiment and controlling the electrical field, in which the particle moves, at each time $t \ge 0$.

By definition, the acceleration of the particle at time $t$ equals $\frac{d}{dt}v_t = \frac{d^2}{dt^2}p_t$ and hence Newton’s law implies that the position $p_t$ of the particle (of unit mass) satisfies the equation

$$\frac{d^2}{dt^2}p_t = -\gamma\frac{d}{dt}p_t + u_t, \quad t \in [0,T]. \tag{6.6}$$

A simple trick reduces this second order ODE to the canonical form (6.5): define

$$X_t = \begin{pmatrix} p_t \\ v_t \end{pmatrix},$$

then

$$\frac{d}{dt}X_t = \begin{pmatrix} \frac{d}{dt}p_t \\ \frac{d}{dt}v_t \end{pmatrix} = \begin{pmatrix} v_t \\ -\gamma v_t \end{pmatrix} + \begin{pmatrix} 0 \\ u_t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & -\gamma \end{pmatrix}\begin{pmatrix} p_t \\ v_t \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}u_t = \begin{pmatrix} 0 & 1 \\ 0 & -\gamma \end{pmatrix}X_t + \begin{pmatrix} 0 \\ 1 \end{pmatrix}u_t.$$

Hence we get (6.5) with d = 2 and

$$b(t,x) = \begin{pmatrix} 0 & 1 \\ 0 & -\gamma \end{pmatrix}x, \quad \sigma(t,x) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

This equation is again linear and can be solved explicitly (try!). ■

Suppose now that the external force in this experiment is applied by the other particles, which frequently collide with the particle under consideration. To model the motion of the particle, we can consider all the particles together as a single ODE of a huge dimension; however, quantitative analysis of such an object would be intractable.

A different approach is to assume that the collisions between particles are random and independent, in which case each individual particle satisfies (6.6) with random $u_t$. It is then reasonable to assume that the forces applied to our particle at different times $t \ne s$ are uncorrelated, i.e. $\mathrm{cov}(u_t, u_s) = 0$, and that $u_t$ acts symmetrically, i.e. $Eu_t = 0$. In physics and engineering, such a (hypothetical) process is referred to as the “white noise”. The fruit to reap is that now we may be able to describe the motion of the particle probabilistically, e.g. to calculate its mean position or its variance at a given time, the distribution of its exit time from a domain (e.g. the field of view of our microscope), etc.

However, such a “white noise” process is not easy to define rigorously: for example, we saw that the Kolmogorov-Daniell theorem yields a process which lacks the essential regularity properties, allowing its integration with respect to time, etc.

Let’s see what happens if we nevertheless pretend for the moment that the Brownian motion $B_t$ is differentiable (which is of course false, as we saw) and set $u_t := \frac{d}{dt}B_t$. Then the formal calculation gives

$$Eu_t = \frac{d}{dt}EB_t = 0$$

and for $t > s$

$$Eu_tu_s = \frac{\partial^2}{\partial t\,\partial s}EB_tB_s = \frac{\partial^2}{\partial t\,\partial s}(t\wedge s) = \frac{\partial^2}{\partial t\,\partial s}\,s = 0,$$

and similarly $Eu_tu_s = 0$ for $t < s$. The (formal) derivative of the Brownian motion satisfies the properties of the “white noise”! Of course, the trouble with this calculation is that there is no reasonable way to differentiate $B_t$.

To proceed, note that if $x$ is a solution of the ODE (6.5) and $u$ is an integrable function, then it must also satisfy the integral equation:

$$x_t = \xi + \int_0^t b(s,x_s)\,ds + \int_0^t \sigma(s,x_s)u_s\,ds, \quad t \ge 0.$$

By our formal calculation we saw that $u_t := \frac{d}{dt}B_t$ satisfies the properties of “white noise” and with $u_t\,dt = dB_t$ the latter equation reads:

$$x_t = \xi + \int_0^t b(s,x_s)\,ds + \int_0^t \sigma(s,x_s)\,dB_s, \quad t \ge 0, \tag{6.7}$$

and can be thought of and interpreted as the stochastic “differential” equation (SDE):

$$dx_t = b(t,x_t)\,dt + \sigma(t,x_t)\,dB_t, \quad t \in [0,T], \qquad x_0 = \xi. \tag{6.8}$$

The main question we’d like to answer is how we can define solutions of such an SDE in a rigorous way.

6.2. The Ito theory of SDEs. The objective of this subsection is to present the rigorous theory of stochastic differential equations of the form

$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dB_t, \qquad X_0 = \xi, \tag{6.9}$$

where $B$ is a $k$-dimensional Brownian motion⁵² defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$, the drift coefficient $b : [0,\infty)\times\mathbb{R}^d \mapsto \mathbb{R}^d$ and the diffusion matrix $\sigma : [0,\infty)\times\mathbb{R}^d \mapsto \mathbb{R}^{d\times k}$ are measurable functions and the initial condition $\xi \in \mathbb{R}^d$ is an $\mathcal{F}_0$-measurable random variable.

⁵² i.e. a vector of $k$ independent Brownian motions $B_t = (B^{(1)}_t, ..., B^{(k)}_t)$

Definition 6.5. A process $X = (X_t)_{t\in[0,T]}$ is a strong solution of the SDE (6.9), if it is adapted to $(\mathcal{F}_t)$, has continuous paths and P-a.s. $X_0 = \xi$,

$$\int_0^t \big(\|b(s,X_s)\| + \|\sigma(s,X_s)\|^2\big)\,ds < \infty, \quad t \in [0,T] \tag{6.10}$$

and

$$X_t = X_0 + \int_0^t b(s,X_s)\,ds + \int_0^t \sigma(s,X_s)\,dB_s, \quad t \in [0,T]. \tag{6.11}$$

Remark 6.6. Following the heuristic discussion above, the solution of the SDE (6.9) will be understood as the solution of the integral stochastic equation (6.11). The sole motivation for this notation redundancy is to mimic the classical objects from calculus.

Remark 6.7. Note that the properties of $X$ required in the definition are enough for all the integrals in (6.11) to be well defined (explain why).

Remark 6.8. The terms “drift” and “diffusion” are physics slang. If the stochastic integral in (6.11) (consider d = 1 for clarity) is a martingale, then

$$
\begin{aligned}
\frac{1}{\delta}\Big(E(X_{t+\delta}|X_t = x) - x\Big) &= \frac{1}{\delta}E\Big(\int_t^{t+\delta} b(s,X_s)\,ds + \int_t^{t+\delta}\sigma(s,X_s)\,dB_s\,\Big|\,X_t = x\Big) \\
&= \frac{1}{\delta}E\Big(\int_t^{t+\delta} b(s,X_s)\,ds\,\Big|\,X_t = x\Big) \xrightarrow{\delta\to0} b(t,x),
\end{aligned}
$$

under appropriate additional conditions on $b$ and $\sigma$. In other words, the infinitesimal mean displacement of the trajectory is governed by the drift coefficient $b$:

$$E(X_{t+\delta}|X_t = x) = x + b(t,x)\delta + o(\delta).$$

A similar calculation shows that the infinitesimal variance of the displacement is controlled by the diffusion coefficient $\sigma$:

$$\mathrm{var}\big(X_{t+\delta}|X_t = x\big) = \sigma^2(t,x)\delta + o(\delta).$$

Remark 6.9. One can also define solutions of (6.11) in a certain weak sense, still good enough for many applications (see the SDE (6.20) at the end of this section).

Definition 6.10. The SDE has a unique strong solution if any two solutions $X$ and $\tilde{X}$ are indistinguishable, i.e.

$$P\Big(\sup_{t\le T}|X_t - \tilde{X}_t| = 0\Big) = 1.$$


Example 6.11. Consider the one-dimensional (d = 1) SDE

$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t, \quad t \ge 0 \tag{6.12}$$

subject to $X_0 = 1$. Applying the Ito formula (look back at Example 5.4) to

$$X_t = \exp\Big(\big(\mu - \tfrac{1}{2}\sigma^2\big)t + \sigma B_t\Big), \quad t \ge 0,$$

we see that

$$X_t = 1 + \int_0^t \mu X_s\,ds + \int_0^t X_s\sigma\,dB_s.$$

Clearly, $X$ is adapted, has continuous paths and $X_0 = 1$. Finally, (6.10) holds (why?) and hence $X$ is a strong solution of this SDE. Is the found solution unique? The answer to this question is affirmative, as we shall be able to deduce shortly. ■
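As a sanity check of Example 6.11, one can compare the closed-form expression with a crude time discretization of (6.12) driven by the same Brownian increments. The sketch below is an illustration under assumptions of its own: the Euler-type recursion, the parameter values and the name `gbm_exact_vs_euler` are all choices made here and do not come from the notes.

```python
import math
import random


def gbm_exact_vs_euler(mu=0.5, sigma=0.5, horizon=1.0, n_steps=10000, seed=2):
    """Return (closed-form X_T, Euler-type approximation of X_T) on one simulated path."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    b = 0.0            # Brownian path value B_t
    x_euler = 1.0      # X_0 = 1 as in the example
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        # One step of the recursion X <- X + mu X dt + sigma X dB
        x_euler += mu * x_euler * dt + sigma * x_euler * db
        b += db
    x_exact = math.exp((mu - 0.5 * sigma**2) * horizon + sigma * b)
    return x_exact, x_euler


x_exact, x_euler = gbm_exact_vs_euler()
print(f"closed form X_T = {x_exact:.5f}, discretized X_T = {x_euler:.5f}")
```

With a fine grid the two numbers should nearly coincide. Dropping the correction $-\frac{1}{2}\sigma^2 t$ from the exponent produces a visible mismatch, which is a quick way to see why the Ito term is needed.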

In the above example we solved the SDE, i.e. found a solution, by guessing. How do we know that a solution to a given SDE exists, if we cannot find it explicitly? How do we know that the solution is unique?

The following theorem gives an answer to these questions under certain conditions on the data of the problem, i.e. the drift and diffusion coefficients and the initial condition.

Theorem 6.12 (K. Ito). Suppose $b(t,x)$ and $\sigma(t,x)$ satisfy the global Lipschitz continuity condition⁵³

$$\|b(t,x) - b(t,y)\| + \|\sigma(t,x) - \sigma(t,y)\| \le K\|x - y\|, \quad x, y \in \mathbb{R}^d,\ t \ge 0 \tag{6.13}$$

and the linear growth condition

$$\|b(t,x)\|^2 + \|\sigma(t,x)\|^2 \le K^2\big(1 + \|x\|^2\big), \quad x \in \mathbb{R}^d,\ t \ge 0 \tag{6.14}$$

with a constant $K$. Assume that $\xi$ is $\mathcal{F}_0$-measurable and $E\|\xi\|^2 < \infty$. Then the unique strong solution exists and satisfies the bound

$$E\|X_t\|^2 \le C\big(1 + E\|\xi\|^2\big)e^{Ct}, \quad t \in [0,T], \tag{6.15}$$

where $C$ is a constant, depending only on $K$ and $T$.

Example 6.13. We already know that the SDE (6.12) has a solution: we were lucky enough to find it explicitly! This involved guessing. The above theorem guarantees the existence of the solution and its uniqueness, since $b(t,x) := \mu x$ and $\sigma(t,x) := \sigma x$ satisfy both (6.13) and (6.14). Now we are able to conclude that the solution we found is the only strong solution. ■

⁵³ for a matrix or vector $A$, $\|A\| = \big(\sum_{ij} A_{ij}^2\big)^{1/2}$


Example 6.14. Consider the SDE

$$dX_t = \sin(X_t)\,dt + \cos(X_t)\,dB_t,$$

s.t. $X_0 = 0$. There is little hope to guess the solution of this SDE, but again the theorem states that the unique solution exists! If it exists, we can e.g. approximate it numerically or, perhaps, calculate its mean, variance, etc.
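To make the phrase "approximate it numerically" concrete, here is a minimal sketch of the simplest time-discretization scheme (usually called Euler-Maruyama, and explored in Problem 2.6.5 below) applied to the SDE of Example 6.14. The function name `euler_maruyama`, the grid size, the number of Monte Carlo paths and the seed are all assumptions made for this illustration; the printed mean and variance are rough estimates, not exact values.

```python
import math
import random


def euler_maruyama(b, sigma, x0, horizon, n_steps, rng):
    """Return the list [X_0, X_{t_1}, ..., X_{t_n}] of the Euler-type recursion."""
    dt = horizon / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))    # Brownian increment over one step
        x = x + b(x) * dt + sigma(x) * db     # coefficients frozen at the left endpoint
        path.append(x)
    return path


rng = random.Random(3)
# Crude Monte Carlo estimates of E X_1 and var(X_1) for dX = sin(X) dt + cos(X) dB, X_0 = 0
terminal = [euler_maruyama(math.sin, math.cos, 0.0, 1.0, 200, rng)[-1] for _ in range(500)]
mean = sum(terminal) / len(terminal)
var = sum((v - mean) ** 2 for v in terminal) / (len(terminal) - 1)
print(f"estimated E X_1 = {mean:.3f}, estimated var(X_1) = {var:.3f}")
```

Both coefficients here are bounded and Lipschitz, so Theorem 6.12 applies and the scheme approximates a well defined object. The estimates carry both a discretization error and a Monte Carlo error, neither of which is controlled in this sketch.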

Remark 6.15. Note that with $\sigma(t,x) \equiv 0$, the SDE reduces to the ODE, discussed at the beginning of the section. Hence, in particular, we obtained the existence and uniqueness results for ODEs. Note that the ODE in Example 6.2 fails to satisfy both (6.13) and (6.14) and Example 6.3 satisfies (6.14), but not (6.13).

To prove Theorem 6.12 we shall need the following powerful tools, which are of interest on their own:

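Theorem 6.16 (Gronwall’s inequality). Let $g$ be a continuous real valued function on $[0,T]$ such that

$$g(t) \le \alpha(t) + \beta\int_0^t g(s)\,ds, \quad t \in [0,T], \tag{6.16}$$

where $\alpha(t)$ is an integrable function and $\beta > 0$. Then

$$g(t) \le \alpha(t) + \beta\int_0^t \alpha(s)e^{\beta(t-s)}\,ds, \quad t \in [0,T].$$

Proof. Set $\psi(t) := e^{-\beta t}\int_0^t \beta g(s)\,ds$. Since $g$ is continuous, by the Fundamental theorem of calculus

$$\frac{d}{dt}\psi(t) = -\beta e^{-\beta t}\int_0^t \beta g(s)\,ds + e^{-\beta t}\beta g(t) = \beta e^{-\beta t}\Big(g(t) - \int_0^t \beta g(s)\,ds\Big) \le \beta e^{-\beta t}\alpha(t),$$

where we used the assumption (6.16) and $\beta > 0$. Since $\alpha(t)$ is integrable and $\psi(0) = 0$,

$$\psi(t) \le \int_0^t \beta e^{-\beta s}\alpha(s)\,ds$$

or

$$\int_0^t \beta g(s)\,ds \le \beta\int_0^t e^{\beta(t-s)}\alpha(s)\,ds.$$

But by (6.16)

$$g(t) - \alpha(t) \le \beta\int_0^t g(s)\,ds \le \beta\int_0^t e^{\beta(t-s)}\alpha(s)\,ds,$$

which is the claimed bound. □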


Simple as it is, Gronwall’s inequality is very powerful as we shall see shortly. In particular, if $g(t) \ge 0$ satisfies $g(t) \le \beta\int_0^t g(s)\,ds$ with a positive constant $\beta$, then $g(t) = 0$ for all $t \in [0,T]$.

Theorem 6.17 (Doob’s maximal inequality). Let $Z$ be a nonnegative submartingale with cadlag paths, then

$$E\Big|\sup_{s\le t} Z_s\Big|^p \le \Big(\frac{p}{p-1}\Big)^p E|Z_t|^p, \quad p > 1.$$

The conclusion of this theorem is surprisingly strong: the whole trajectory is controlled by its value at the tip! Look up the proof in a book, if curious.

We already saw that the martingale $\int_0^t X_s\,dB_s$ with $E\int_0^T |X_s|^{2m}\,ds < \infty$, $m \ge 1$, satisfies

$$E\Big|\int_0^t X_s\,dB_s\Big|^{2m} \le C\,E\int_0^T |X_s|^{2m}\,ds,$$

with a constant⁵⁴ $C$, depending only on $m$ and $T$. Plugging this into the Doob’s inequality we get

Proposition 6.18. If $E\int_0^T |X_s|^{2m}\,ds < \infty$, $m \ge 1$, then

$$E\sup_{s\le t}\Big|\int_0^s X_u\,dB_u\Big|^{2m} \le C\,E\int_0^T |X_s|^{2m}\,ds, \tag{6.17}$$

with a constant $C$, depending only on $m$ and $T$.

⁵⁴ hereafter $C$ stands for a generic constant, whose value is of no importance and which may change from line to line

Now we are ready to proceed with the proof.

Proof of Theorem 6.12. For the sake of transparency we shall consider the one-dimensional case d = 1; the general case is left as a tedious, but straightforward exercise. The idea is again to construct a convergent sequence of processes $X^m$, $m \ge 1$, whose limit will be a strong solution of (6.11). Then we shall show that the solution is unique.

The sequence $X^m$ is constructed by means of the so called Picard iterations: set $X^0_t := \xi$ and for $m \ge 1$,

$$X^m_t := \xi + \int_0^t b(s,X^{m-1}_s)\,ds + \int_0^t \sigma(s,X^{m-1}_s)\,dB_s, \quad t \in [0,T].$$

Note that since $\xi$ is $\mathcal{F}_0$-measurable and due to the linear growth assumption (6.14), the iterations are well defined.

The proof proceeds in several steps:

Step 1. We shall need the following bound

$$E|X^m_t|^2 \le C(1 + E\xi^2)e^{Ct}, \quad t \in [0,T] \tag{6.18}$$

with $C := 12(K^2+1)(1+T)^2$. To this end, note that (6.18) trivially holds for m = 0. Assuming that it also holds for m − 1, it follows

$$
\begin{aligned}
E\big|X^m_t\big|^2 &\le 3E\xi^2 + 3E\Big(\int_0^t b(s,X^{m-1}_s)\,ds\Big)^2 + 3E\Big(\int_0^t \sigma(s,X^{m-1}_s)\,dB_s\Big)^2 \\
&\le 3E\xi^2 + 3E\int_0^t \Big(T\,b^2(s,X^{m-1}_s) + \sigma^2(s,X^{m-1}_s)\Big)\,ds \\
&\le 3E\xi^2 + 3(1+T)E\int_0^t K^2\Big(1 + \big(X^{m-1}_s\big)^2\Big)\,ds \\
&\le 3E\xi^2 + 3K^2(1+T)^2 + 3(1+T)K^2\int_0^t E\big(X^{m-1}_s\big)^2\,ds \\
&\le 3E\xi^2 + 3K^2(1+T)^2 + 3(1+T)K^2\int_0^t C(1+E\xi^2)e^{Cs}\,ds \\
&\le 3E\xi^2 + 3K^2(1+T)^2 + 3K^2(1+T)(1+E\xi^2)e^{Ct} \\
&\le 12(K^2+1)(1+T)^2(1+E\xi^2)e^{Ct} = C(1+E\xi^2)e^{Ct},
\end{aligned}
$$

as claimed. Here the second line uses the Cauchy-Schwarz inequality for the time integral and the Ito isometry for the stochastic integral. Note that this bound implies that $X^m$ is square integrable on $[0,T]\times\Omega$ and hence the Ito integrals in the Picard iterations are martingales. Also (6.18) implies (6.15) for the limit $X$ of $X^m$ (which we shall shortly claim to exist).

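Step 2. Using (6.18) and the Doob’s inequality (6.17) we shall show now that the sequence $X^m$ is Cauchy P-a.s. and hence converges to a process $X$ with continuous paths. To this end, for $m \ge 2$,

$$
\begin{aligned}
E\sup_{s\le t}\big(X^m_s - X^{m-1}_s\big)^2 \le{} & 2E\sup_{s\le t}\Big(\int_0^s \big(b(u,X^{m-1}_u) - b(u,X^{m-2}_u)\big)\,du\Big)^2 \\
& + 2E\sup_{s\le t}\Big(\int_0^s \big(\sigma(u,X^{m-1}_u) - \sigma(u,X^{m-2}_u)\big)\,dB_u\Big)^2 \\
\overset{\dagger}{\le}{} & 2T\,E\sup_{s\le t}\int_0^s \big(b(u,X^{m-1}_u) - b(u,X^{m-2}_u)\big)^2\,du + 2C\int_0^t E\big(\sigma(u,X^{m-1}_u) - \sigma(u,X^{m-2}_u)\big)^2\,du \\
\le{} & 2T\int_0^t E\big(b(u,X^{m-1}_u) - b(u,X^{m-2}_u)\big)^2\,du + 2C\int_0^t E\big(\sigma(u,X^{m-1}_u) - \sigma(u,X^{m-2}_u)\big)^2\,du \\
\overset{\ddagger}{\le}{} & \underbrace{2(T+C)K^2}_{:=L}\int_0^t E\big(X^{m-1}_u - X^{m-2}_u\big)^2\,du,
\end{aligned}
$$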

where † holds by the Doob’s inequality and ‡ is a consequence of the global Lipschitz assumption (6.13). Applying this inequality to m := 2, we get

$$E\sup_{s\le t}\big(X^2_s - X^1_s\big)^2 \le L\int_0^t E\big(X^1_u - \xi\big)^2\,du \le tL\sup_{u\le T}E\big(X^1_u - \xi\big)^2 \le tLC',$$

where $C'$ is a constant, suggested by the bound (6.18). Similarly,

$$E\sup_{s\le t}\big(X^3_s - X^2_s\big)^2 \le L\int_0^t E\big(X^2_u - X^1_u\big)^2\,du \le L\int_0^t E\sup_{r\le u}\big(X^2_r - X^1_r\big)^2\,du \le L\int_0^t uLC'\,du \le C'\frac{(Lt)^2}{2}.$$

By induction, we obtain the bound:

$$E\sup_{s\le T}\big(X^m_s - X^{m-1}_s\big)^2 \le C'\frac{(TL)^{m-1}}{(m-1)!}.$$

Now by the Chebyshev inequality

$$P\Big(\sup_{s\le T}|X^m_s - X^{m-1}_s| \ge \frac{1}{2^m}\Big) \le 4^mC'\frac{(TL)^{m-1}}{(m-1)!}.$$

The right hand side is summable over $m$ and hence by the Borel-Cantelli lemma there is a set $\Omega'$ with $P(\Omega') = 1$, such that for all $\omega \in \Omega'$ there exists an integer $M(\omega)$, such that

$$\sup_{s\le T}|X^m_s - X^{m-1}_s| \le \frac{1}{2^m}, \quad \forall m \ge M(\omega).$$

Hence for any $k \ge 1$ and $m \ge M(\omega)$

$$\sup_{s\le T}|X^{m+k}_s - X^m_s| \le \sum_{j=m+1}^{k+m}\sup_{s\le T}|X^j_s - X^{j-1}_s| \le \sum_{j=m+1}^{\infty}\frac{1}{2^j} \le \frac{1}{2^m}$$

and hence

$$\lim_m\sup_{k\ge0}\sup_{s\le T}|X^{m+k}_s - X^m_s| = 0, \quad P\text{-a.s.},$$

i.e. $X^m$ is a Cauchy sequence P-a.s. and thus converges. Denote the limit by $X$ and note that by completeness of the space of continuous functions with uniform metric, $X$ has continuous paths.

Step 3. We claim now that $X$ is the desired strong solution of (6.11). $X$ has continuous paths and, obviously, $X_0 = \xi$ P-a.s. Note that $X^m_t$ is $\mathcal{F}_t$-measurable by construction and hence $X_t = \lim_m X^m_t$ is $\mathcal{F}_t$-measurable as well, i.e. $X$ is adapted. The linear growth condition (6.14) and the bound (6.15) imply (6.10). It is left to show that $X$ satisfies (6.11) P-a.s. To this end let

$$
\begin{aligned}
\Delta_t &:= X_t - \xi - \int_0^t b(s,X_s)\,ds - \int_0^t \sigma(s,X_s)\,dB_s \\
&= X_t - X^m_t - \int_0^t \big(b(s,X_s) - b(s,X^{m-1}_s)\big)\,ds - \int_0^t \big(\sigma(s,X_s) - \sigma(s,X^{m-1}_s)\big)\,dB_s,
\end{aligned}
$$

where the latter equality holds by construction of $X^m$. By (6.18), $X^m_t$ is uniformly integrable and hence by the dominated convergence and the global Lipschitz condition (6.13), letting $m \to \infty$ we conclude that $E\Delta_t^2 = 0$, i.e. $\Delta_t = 0$, P-a.s. for all $t \in [0,T]$. Since $X$ is continuous, $\Delta_t$ is indistinguishable from the zero process, as claimed.

Step 4. Finally, we shall argue that the solution is unique. Suppose $X$ and $\tilde{X}$ are two strong solutions and let $\delta_t := X_t - \tilde{X}_t$. A calculation similar to Step 2 shows that

$$E\delta_t^2 \le C^*\int_0^t E\delta_s^2\,ds, \quad t \in [0,T],$$

where $C^*$ is a constant, depending only on $K$ and $T$. By the Gronwall inequality (Theorem 6.16), the latter implies $E\delta_t^2 = 0$, i.e. $\delta_t = 0$, P-a.s. Since $\delta$ has continuous paths, it is indistinguishable from the zero process, i.e. $X$ and $\tilde{X}$ are indistinguishable. □
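The Picard iterations used in the proof above can be watched at work in the simplest deterministic case $\sigma \equiv 0$, $b(t,x) = \alpha x$ (the ODE of Example 6.1), where the $m$-th iterate is the $m$-th Taylor polynomial of $\xi e^{\alpha t}$. The sketch below is an illustration only: the time integral is approximated by the trapezoid rule on a uniform grid, and the name `picard_iterates` and all parameter values are assumptions made here.

```python
import math


def picard_iterates(alpha=1.0, xi=1.0, horizon=1.0, n_grid=1000, n_iter=15):
    """Picard iterations x^m_t = xi + int_0^t alpha * x^{m-1}_s ds on a uniform grid.

    Returns the last iterate (list of grid values) and the list of
    sup-distances between consecutive iterates.
    """
    h = horizon / n_grid
    x = [xi] * (n_grid + 1)                  # x^0_t = xi for all t
    sup_diffs = []
    for _ in range(n_iter):
        new = [xi]
        integral = 0.0
        for j in range(1, n_grid + 1):
            integral += 0.5 * h * alpha * (x[j - 1] + x[j])   # trapezoid rule on [t_{j-1}, t_j]
            new.append(xi + integral)
        sup_diffs.append(max(abs(u - v) for u, v in zip(new, x)))
        x = new
    return x, sup_diffs


x_last, sup_diffs = picard_iterates()
print(f"x_T after 15 iterations = {x_last[-1]:.6f}, exp(1) = {math.e:.6f}")
print("sup-distances between consecutive iterates:", [f"{d:.2e}" for d in sup_diffs[:6]])
```

The sup-distances decay roughly like $(\alpha T)^m/m!$, the same factorial rate that makes the probabilities in Step 2 summable. In the stochastic case each iteration would in addition require approximating an Ito integral along a simulated Brownian path, which this sketch does not attempt.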

The conditions of Theorem 6.12 are not necessary. Some SDEs might not fit this setting, but still have unique strong solutions, whose existence and/or uniqueness requires an additional argument (such as appropriate localization, etc.) For example, the SDE

$$dX_t = |X_t|^p\,dB_t$$

has the unique strong solution for all $p \in [1/2, 1]$. Another example is

$$dX_t = -X_t^3\,dt + dB_t.$$

6.3. Weak solutions of SDEs. You might be curious by now how a different type of solution of (6.11) can be defined. The celebrated Tanaka formula is the clue!

Consider the occupation time of the interval $[-\varepsilon, \varepsilon]$ by the Brownian motion till time $t$:

$$\eta^\varepsilon_t := \int_0^t 1_{\{|B_s|\le\varepsilon\}}\,ds.$$

The occupation time for a fixed $t$ vanishes as $\varepsilon \to 0$, e.g.

$$E|\eta^\varepsilon_t| = E\eta^\varepsilon_t = \int_0^t P(|B_s| \le \varepsilon)\,ds = \int_0^t\int_{-\varepsilon}^{\varepsilon}\frac{1}{\sqrt{s}}\varphi(x/\sqrt{s})\,dx\,ds \to 0,$$

where $\varphi$ is the standard Gaussian density. But the latter hints that $\eta^\varepsilon_t = O(\varepsilon)$ (why?). Indeed, it can be shown that the limit

$$L_t := \lim_{\varepsilon\to0}\frac{1}{\varepsilon}\int_0^t 1_{\{|B_s|\le\varepsilon\}}\,ds, \quad P\text{-a.s.}$$

exists and defines an additive increasing process with continuous paths⁵⁵. The process $L_t$ is called the local time of the Brownian motion at 0.

Now recall that if $f$ is a $C^2$ function, then by the Ito formula

$$f(B_t) = f(0) + \int_0^t f'(B_s)\,dB_s + \frac{1}{2}\int_0^t f''(B_s)\,ds, \quad t \ge 0.$$

The function $G(x) := |x|$ is not even differentiable and hence the Ito formula does not apply.

Consider the sequence of functions $\varphi_n(x) := n\varphi(nx)$, where $\varphi$ is the standard Gaussian density. For any $n \ge 1$, $\varphi_n(x)$ is differentiable any number of times (smooth) and for large $n$ it concentrates around zero. Define the mollifications of $\mathrm{sign}(x)$ (plot $g_n$ to figure out what is going on):

$$g_n(x) := \int_{\mathbb{R}}\mathrm{sign}(x - y)\varphi_n(y)\,dy.$$

$g_n(x)$ is smooth and $\lim_n g_n(x) = \mathrm{sign}(x)$. Define $G_n(x) := \int_0^x g_n(y)\,dy$ and check that $\lim_n G_n(x) = G(x) = |x|$. $G_n(x)$ is smooth and hence we can apply the Ito formula to it:

$$G_n(B_t) = \int_0^t g_n(B_s)\,dB_s + \frac{1}{2}\int_0^t g_n'(B_s)\,ds, \quad t \ge 0.$$

You may guess that the term in the left hand side and the first term in the right hand side converge to $|B_t|$ and $\int_0^t \mathrm{sign}(B_s)\,dB_s$ respectively. A calculation shows that this is true and moreover, the second term on the right converges to $\frac{1}{2}L_t$. Hence we obtain the Tanaka formula

$$|B_t| = \int_0^t \mathrm{sign}(B_s)\,dB_s + \frac{1}{2}L_t, \quad t \ge 0. \tag{6.19}$$

Note that $L_t$ is a functional of $|B_s|$, $s \le t$, and hence the latter formula suggests that the filtration generated by the process $Z_t := \int_0^t \mathrm{sign}(B_s)\,dB_s$ coincides with the filtration generated by the process $|B_t|$, i.e. $\mathcal{F}^Z_t = \mathcal{F}^{|B|}_t$, which is strictly smaller than the filtration $\mathcal{F}^B_t$. This will be important in a second!

⁵⁵ in fact the local time is conventionally defined as $L_t/4$

Now we are ready for a surprising example: consider the SDE

$$X_t = \int_0^t \mathrm{sign}(X_s)\,dB_s, \quad t \in [0,T], \tag{6.20}$$

on the probability space $(\Omega, \mathcal{F}, (\mathcal{F}^B_t), P)$, where $\mathcal{F}^B_t$ is the natural filtration of $B$. Assume that this SDE has a strong solution $X$. Then

$$\langle X\rangle_t = \int_0^t \mathrm{sign}^2(X_s)\,ds = t$$

and by the Levy characterization, $X$ is a Brownian motion in its own right (of course, it is different from $B$, think why).

Further,

$$\int_0^t \mathrm{sign}(X_s)\,dX_s = \int_0^t \mathrm{sign}(X_s)^2\,dB_s = B_t.$$

But since $X$ is the Brownian motion, the Tanaka formula (6.19) applied to $X$ gives

$$B_t = |X_t| - \frac{1}{2}L^X_t,$$

where $L^X_t$ is the local time of $X$, which implies that $\mathcal{F}^B_t = \mathcal{F}^{|X|}_t$. But the filtration $\mathcal{F}^{|X|}_t$ is strictly smaller than $\mathcal{F}^X_t$, as we saw above. This suggests that the filtration of $X$ is strictly larger than $\mathcal{F}^B_t$, i.e. $X$ is not $\mathcal{F}^B_t$-adapted. In other words, the history of $B$ up to time $t$ does not determine $X_t$. The inevitable conclusion is that the SDE (6.20), referred to as the Tanaka SDE, does not have a strong solution!

The lack of the strong solution on its own is perhaps not so surprising, since after all $\mathrm{sign}(x)$ is far from being Lipschitz continuous. What is much more surprising is that one can still define a solution of the Tanaka SDE in the following weak sense:

Definition 6.19. The SDE (6.11) admits a weak solution, if there exists a filtered probability space carrying a Brownian motion $\tilde{B}$ and a continuous process $X$, satisfying (6.11) with $B$ replaced by $\tilde{B}$.

Clearly any strong solution is also a weak solution, since we can take the original Brownian motion as $\tilde{B}$ in this case. Let’s see now that, though the Tanaka SDE does not have a strong solution, it nevertheless does have a weak one!

To this end, consider the process

$$\tilde{B}_t := \int_0^t \mathrm{sign}(B_s)\,dB_s,$$

which by the Levy theorem is the Brownian motion. Further, note that

$$B_t = \int_0^t \mathrm{sign}(B_s)\,\mathrm{sign}(B_s)\,dB_s = \int_0^t \mathrm{sign}(B_s)\,d\tilde{B}_s.$$

Hence the given Brownian motion $B$ is the weak solution of the Tanaka SDE, but on a different probability space, namely on $(\Omega, \mathcal{F}, (\mathcal{F}^B_t), P)$ with $\tilde{B}$ in the role of the driving Brownian motion!

The weak solutions suffice for many purposes, including calculating the expectations, etc.


6.4. Exercises.

Problem 2.6.1 (the Langevin equation). The motion of a particle in a viscous liquid formally satisfies the second order differential Langevin equation:

$$\frac{d^2}{dt^2}x_t = -\gamma\frac{d}{dt}x_t + \sigma\eta_t, \quad t \ge 0 \tag{6.21}$$

subject to the initial conditions $x_0 = 0$ and $\frac{d}{dt}x_0 = 0$, where $\gamma$ is a constant, depending on the properties of the liquid, $\eta$ is the “white noise” process and $\sigma > 0$ is the noise intensity.

(1) Explain how the Langevin equation (6.21) is interpreted as the linear two dimensional SDE

$$dx_t = v_t\,dt, \qquad dv_t = -\gamma v_t\,dt + \sigma\,dB_t \tag{6.22}$$

subject to $x_0 = 0$, $v_0 = 0$, where $B$ is the one dimensional Brownian motion.

(2) Argue that this two-dimensional SDE has the unique strong solution

(3) Find the mean position $Ex_t$ and mean velocity $Ev_t$ of the particle at time $t$

(4) Find $\mathrm{var}(v_t)$, $\mathrm{cov}(x_t, v_t)$ and $\mathrm{var}(x_t)$

(5) Argue that $(x, v)$ is a Gaussian process
Problem 2.6.2. The geometrical Brownian motion

St = S0 exp((

µ− 12σ2

)t + σBt

), t ≥ 0

is the simplest model of prices on a stock market. The drift µ is interpretedas the efficient interest rate and the diffusion coefficient σ is called volatility.Though the price process is assumed to evolve continuously in time, it canonly be observed at discrete time epochs. The objective of this problem isto suggest an estimation procedure for the model coefficients µ and σ, giventhe price samples:

St0 , St1 , ..., Stn

where t0 = 0 and t1 < t2 < ... < tn are known times.

(1) Show that Xi := log Sti/Sti−1 , i = 1, ..., n are independent Gaussianrandom variables with

EXi =(µ− 1

2σ2

)(ti − ti−1), var(Xi) = σ2(ti − ti−1).


(2) Write down the density of the vector X = (X1, ..., Xn) and try tofind an explicit formula the Maximum Likelihood estimator of µand σ.

(3) Download from the website of the Bank of Israel, the daily pricesof 1 US$ on the Tel Aviv stock market during the last year andcalculate the above MLEs.

Problem 2.6.3. Fill in the details of Step 3 and Step 4 of the proof of Theorem 6.12.

Problem 2.6.4. Consider the one-dimensional SDE:

$$dX_t = \Big(\sqrt{1 + X_t^2} + \frac{1}{2}X_t\Big)\,dt + \sqrt{1 + X_t^2}\,dB_t,$$

e.g. subject to $X_0 = 0$.

(1) Argue that this SDE has a unique strong solution, by checking the conditions of Theorem 6.12

(2) Find the solution explicitly

Hint: apply the Ito formula to $Y_t := \sqrt{1 + X_t^2}$ and derive an SDE for $Z_t := X_t + Y_t$.

Problem 2.6.5 (the Euler algorithm). In this problem we shall explorethe simplest numerical SDE solver. Consider the one dimensional SDE

dXt = b(Xt)dt + σ(Xt)dBt, t ∈ [0, T ]X0 = x

where x ∈ R and b and σ are functions satisfying the global Lipschitz and thelinear growth condition of Theorem 6.12, so that the unique strong solutionexists.

For an integer n, let tj = j/n, j = 0, ..., n be the uniform partition of theinterval [0, T ] and for a fixed n, define the finite sequence (Xn

tj ), j = 0, ..., n

recursively:

Xntj = Xn

tj−1+ b

(Xn

tj−1

) 1n

+ σ(Xn

tj−1

)∆Bj , (6.23)

subject to Xn0 = x, where ∆Bj := Bj/n −B(j−1)/n.

The random variable Xntj is viewed as the approximation of Xtj and if

we connect the points Xntj , using some interpolation, we obtain a contin-

uous time process Xn = (Xnt )t∈[0,T ], which hopefully approximates X =

(Xt)t∈[0,T ]. The idea of such type of time discretization dates back toL.Euler, who applied it to solve numerically ODEs.


(1) Using your favorite software, generate the sequence Xn using (6.23)for the SDE

dXt = µXtdt + σXtdBt,

with µ = 3/2 and σ = 1, T = 1 and n = 100. Plot your approxi-mation along the exact solution

Xt = x exp((µ− σ2/2)t + σBt

).

Run the simulation again with n = 1000. Did you get better ap-proximation ?

(2) Following the steps below, show that Euler’s approximation (6.23)is consistent in the sense

E(Xn

T −XT

)2 ≤ C

n, (6.24)

where C is a positive constant, which depends only on the SDEcoefficients and T .

(a) Using the linear growth condition of b and σ, show that

supj≤n

E(Xn

tj

)2 ≤ c1(x2 + 1)ec2T ,

with positive constants c1 and c2, independent of n.

(b) Let $(X^n_t)$, $t\in[0,T]$ be the continuous time process, satisfying

$$X^n_t = x + \int_0^t b\big(X^n_{[ns]/n}\big)ds + \int_0^t \sigma\big(X^n_{[ns]/n}\big)dB_s, \quad t\ge 0,$$

where $[ns]/n$ is the largest grid point not exceeding $s$. Show that $(X^n_t)_{t\in[0,T]}$ is well defined, coincides with the sequence $X^n_{t_j}$ on the grid points $t\in\{t_0,...,t_n\}$ and

$$\sup_{t\le T} E\big(X^n_t - X^n_{[nt]/n}\big)^2 \le 2n^{-1}K^2\big(1 + c_1(x^2+1)e^{c_2T}\big).$$

Consider the process $\Delta_t := X_t - X^n_t$, $t\in[0,T]$ and note that

$$\Delta_t = \int_0^t\big(b(X_s) - b(X^n_s)\big)ds + \int_0^t\big(\sigma(X_s) - \sigma(X^n_s)\big)dB_s + \int_0^t\big(b(X^n_s) - b(X^n_{[ns]/n})\big)ds + \int_0^t\big(\sigma(X^n_s) - \sigma(X^n_{[ns]/n})\big)dB_s.$$

(c) Show that the function $V_t := E\Delta_t^2$ satisfies

$$V_t \le 2K^2(T+1)\int_0^t V_sds + 2K^2(T+1)^2\,2n^{-1}K^2\big(1 + c_1(x^2+1)e^{c_2T}\big)$$

and prove (6.24), applying Gronwall’s inequality.
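Part (1) of Problem 2.6.5 can be sketched as follows. This is a minimal illustration in Python with NumPy, not a prescribed solution: the function names `euler_gbm`, `exact_gbm` and `simulate`, the initial value `x0 = 1` and the seed are assumptions made for the example. Both paths are driven by the same Brownian increments, so they can be compared pointwise (plotting, e.g. with matplotlib, is left out).

```python
import numpy as np

def euler_gbm(x0, mu, sigma, dB, dt):
    """Euler recursion (6.23) for dX = mu*X dt + sigma*X dB on a uniform grid."""
    x = np.empty(len(dB) + 1)
    x[0] = x0
    for j in range(1, len(x)):
        x[j] = x[j - 1] + mu * x[j - 1] * dt + sigma * x[j - 1] * dB[j - 1]
    return x

def exact_gbm(x0, mu, sigma, dB, dt):
    """Exact solution x*exp((mu - sigma^2/2) t + sigma B_t) on the same grid."""
    t = dt * np.arange(len(dB) + 1)
    B = np.concatenate(([0.0], np.cumsum(dB)))
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B)

def simulate(n, x0=1.0, mu=1.5, sigma=1.0, T=1.0, seed=0):
    """Return (grid, Euler path, exact path) driven by one Brownian path."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)  # increments B_{t_j} - B_{t_{j-1}}
    t = dt * np.arange(n + 1)
    return t, euler_gbm(x0, mu, sigma, dB, dt), exact_gbm(x0, mu, sigma, dB, dt)

if __name__ == "__main__":
    for n in (100, 1000):
        t, x_euler, x_exact = simulate(n)
        print(n, abs(x_euler[-1] - x_exact[-1]))
```

Note that the two values of $n$ use different Brownian paths here, so a single run only hints at the rate in (6.24); averaging the squared terminal error over many paths gives a fairer comparison.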


Problem 2.6.6 (Linear SDE). The objective of this problem is to explore the important class of linear SDEs of the form:

$$dX_t = AX_tdt + BdV_t, \qquad X_0 = \xi, \tag{6.25}$$

where $A$ and $B$ are $d\times d$ and $d\times k$ matrices respectively, $V$ is a $k$-dimensional Brownian motion and $\xi$ is a Gaussian vector, independent of $V$, with mean $\mu$ and covariance matrix $\Sigma$.

(1) Argue that this SDE admits the unique strong solution, by checking the conditions of Theorem 6.12.

As we shall see, the solution of (6.25) is given by an explicit formula, similar to the one-dimensional Ornstein-Uhlenbeck (look back at Problem 10.3). The key ingredient of this formula is the matrix exponential:

$$e^A = \sum_{j=0}^\infty \frac{A^j}{j!}. \tag{6.26}$$

(2) Check that $e^A$ is well defined, i.e. the series in (6.26) converges, and verify the following properties:

(a) If $A$ and $B$ commute, i.e. $AB = BA$, then $e^{A+B} = e^Ae^B = e^Be^A$. Give an example of $A$ and $B$ for which this formula fails.

(b) $e^A$ is invertible and $(e^A)^{-1} = e^{-A}$.

(c) $(e^A)^\top = e^{A^\top}$.

(3) Argue that the function $t\mapsto e^{tA}$ is differentiable and that

$$\frac{d}{dt}e^{At} = Ae^{At} = e^{At}A.$$

(4) Applying the Ito formula, show that (6.25) is solved by:

$$X_t = e^{At}\Big(\xi + \int_0^t e^{-As}BdV_s\Big).$$

(5) Argue that $X_t$ is a Gaussian vector with $EX_t = e^{At}E\xi$ and

$$Q_t := \mathrm{cov}(X_t, X_t) = e^{At}\Sigma e^{A^\top t} + \int_0^t e^{As}BB^\top e^{A^\top s}ds,$$

which solves the Lyapunov ODE

$$\frac{d}{dt}Q_t = AQ_t + Q_tA^\top + BB^\top, \qquad Q_0 = \Sigma.$$

(6) Identify the matrices $A$ and $B$ in (6.22) and apply the above formulae to confirm your answers in (3)-(5) of Problem 2.6.1.
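The properties in part (2) of Problem 2.6.6 can be explored numerically. The sketch below sums the series (6.26) directly; the helper name `expm_series`, the truncation at 40 terms and the particular matrices are assumptions chosen for illustration (a truncated series is adequate only for matrices of moderate norm).

```python
import numpy as np

def expm_series(A, terms=40):
    """Partial sum of the series (6.26): sum over j < terms of A^j / j!."""
    A = np.asarray(A, dtype=float)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ A / j          # term now equals A^j / j!
        result = result + term
    return result

# A pair of non-commuting matrices and a pair of commuting (diagonal) ones.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
D1 = np.diag([1.0, 2.0])
D2 = np.diag([-0.5, 3.0])

commuting_ok = np.allclose(expm_series(D1 + D2), expm_series(D1) @ expm_series(D2))
noncommuting_ok = np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))
inverse_ok = np.allclose(expm_series(A + B) @ expm_series(-(A + B)), np.eye(2))
transpose_ok = np.allclose(expm_series(A).T, expm_series(A.T))

if __name__ == "__main__":
    print(commuting_ok, noncommuting_ok, inverse_ok, transpose_ok)
```

For this particular pair, $AB \ne BA$ and the product formula of (2a) fails, while (2b) and (2c) hold for all the matrices tried.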

7. The Feynman-Kac formula

How do we calculate expectations of functionals of processes, generated by SDEs such as (6.9)? The answer to this question has a deep connection to partial differential equations (PDEs).

For simplicity [56] we shall consider the one-dimensional SDE

$$dX_t = b(t,X_t)dt + \sigma(t,X_t)dB_t, \quad t\in[0,T] \tag{7.1}$$

subject to an initial condition $X_0 = x\in\mathbb R$, where the drift and the diffusion coefficients are regular enough (e.g. see Theorem 6.12), so that the unique solution exists (either strong or weak).

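For a $C^2$ function [57] $\varphi$, consider the limit

$$\lim_{\delta\to0}\frac{E_{t,x}\varphi(X_{t+\delta}) - \varphi(x)}{\delta} \tag{7.2}$$

where $E_{t,x}$ stands for the conditional expectation $E(\,\cdot\,|X_t = x)$. By the Ito formula

$$\varphi(X_{t+\delta}) = \varphi(X_t) + \int_t^{t+\delta}(L\varphi)(s,X_s)ds + \int_t^{t+\delta}\partial_x\varphi(X_s)\sigma(s,X_s)dB_s, \tag{7.3}$$

where [58] $L$ is the infinitesimal operator associated with the SDE (7.1):

$$(L\varphi)(t,x) := b(t,x)\partial_x\varphi(t,x) + \frac12\sigma^2(t,x)\partial_{xx}\varphi(t,x).$$

Assuming that the Ito integral in the right hand side of (7.3) is a martingale [59], we obtain

$$E_{t,x}\varphi(X_{t+\delta}) = \varphi(x) + \int_t^{t+\delta}E_{t,x}(L\varphi)(s,X_s)ds.$$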

[56] the general multidimensional case is treated similarly

[57] we shall denote by $C^m$ the $m$ times continuously differentiable $\mathbb R^d\mapsto\mathbb R$ functions and by $C^{k,m}$ the functions $[0,\infty)\times\mathbb R^d\mapsto\mathbb R$, which have $k$ continuous derivatives in the time variable $t$ and $m$ continuous derivatives in the space variable $x$

[58] $\partial_t$, $\partial_x$, $\partial_{xx}$ are standard notations for the corresponding partial derivatives

[59] e.g. this holds if $\varphi$ has a bounded derivative and $\sigma$ satisfies the linear growth condition of Theorem 6.12


If we assume that $b$ and $\sigma$ are continuous functions and plug the latter expression into (7.2), the fundamental theorem of calculus and the dominated convergence give

$$\lim_{\delta\to0}\frac{E_{t,x}\varphi(X_{t+\delta}) - \varphi(x)}{\delta} = (L\varphi)(t,x).$$

The second order differential operator $L$ plays a crucial role in the theory of SDEs.

Theorem 7.1 (Feynman-Kac formula). Suppose $b$ and $\sigma$ in (7.1) are continuous functions, satisfying the linear growth conditions [60] and that (7.1) has a unique solution (either strong or weak).

Let $f(x)$, $g(t,x)$ and $k(t,x)\ge0$, $t\in[0,T]$, $x\in\mathbb R$ be continuous functions and assume that $f$ and $g$ do not grow faster than a quadratic polynomial [61]:

$$|f(x)|\vee|g(t,x)| \le C(1+x^2), \quad x\in\mathbb R,$$

with a constant $C>0$.

Suppose that there is a $C^{1,2}$ function $v:[0,T]\times\mathbb R\mapsto\mathbb R$, which solves the PDE terminal value problem

$$\begin{aligned}&\partial_tv + Lv - kv = -g, \quad \text{on } [0,T]\times\mathbb R\\ &v(T,x) = f(x)\end{aligned} \tag{7.4}$$

and, moreover, $v$ grows at most polynomially:

$$\max_{t\le T}|v(t,x)| \le M(1+x^{2\mu}), \quad x\in\mathbb R \tag{7.5}$$

for some constants $M>0$ and $\mu\ge1$. Then

$$v(t,x) = E_{t,x}\Big(f(X_T)e^{-\int_t^Tk(s,X_s)ds} + \int_t^Tg(s,X_s)e^{-\int_t^sk(u,X_u)du}ds\Big). \tag{7.6}$$

Remark 7.2. Problems such as (7.4), known in functional analysis as backward parabolic PDEs, have been extensively studied. Conditions under which (7.4) admits the unique solution are well known. Thus the Feynman-Kac formula reduces the computation of expectations of certain functionals to solving a PDE problem. It can also be used in the opposite direction, e.g. to find explicit solutions to (7.4). More precisely, suppose we know that the PDE (7.4) has the unique solution satisfying (7.5) and we manage to calculate the expectation in (7.6) and, moreover, the obtained $v(t,x)$ is $C^{1,2}$. Then by uniqueness $v(t,x)$ is the solution! We shall shortly see such an application in finance.

(7.6) is an example of the so called stochastic representation formulas, which express the solution of a PDE problem as an expectation of an appropriate functional.

[60] i.e. (6.14) from Theorem 6.12

[61] instead, $f(x)\ge0$ or/and $g(t,x)\ge0$ can be assumed


Before giving its proof, let’s explore the Feynman-Kac formula through a simple example:

Example 7.3. Let’s calculate $EB_T^2$ through the F-K formula. For the Brownian motion $B$, $b(t,x) = 0$ and $\sigma(t,x) = 1$. If we choose $f(x) = x^2$, $g(t,x) = 0$ and $k(t,x) = 0$, the PDE (7.4) reads:

$$\partial_tv + \frac12\partial_{xx}v = 0, \qquad v(T,x) = x^2.$$

Clearly, $v(t,x) = x^2 + T - t$ solves this problem and satisfies (7.5). Hence

$$E_{t,x}B_T^2 = E(B_T^2|B_t = x) = x^2 + T - t$$

and, in particular, setting $t = 0$ and $x = 0$ yields $EB_T^2 = T$. ■
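The identity of Example 7.3 can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming only that, given $B_t = x$, the terminal value $B_T$ is $N(x, T-t)$; the function names, the sample size and the seed are illustrative choices.

```python
import numpy as np

def v(t, x, T):
    """Solution of the terminal value problem in Example 7.3."""
    return x**2 + T - t

def second_moment_mc(t, x, T, samples=200_000, seed=0):
    """Monte Carlo estimate of E(B_T^2 | B_t = x)."""
    rng = np.random.default_rng(seed)
    B_T = x + np.sqrt(T - t) * rng.standard_normal(samples)
    return np.mean(B_T**2)

if __name__ == "__main__":
    t, x, T = 0.0, 0.5, 1.0
    print(second_moment_mc(t, x, T), v(t, x, T))
```

The two printed numbers should agree up to the Monte Carlo error, which is of order $1/\sqrt{\text{samples}}$.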

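Proof of Theorem 7.1. Define $\xi_t := v(t,X_t)$, $\zeta_t := e^{-\int_0^tk(s,X_s)ds}$ and $\eta_t := \xi_t\zeta_t = v(t,X_t)e^{-\int_0^tk(s,X_s)ds}$. Note that the processes $\xi_t$ and $\zeta_t$ satisfy

$$d\xi_t = \partial_tv(t,X_t)dt + (Lv)(t,X_t)dt + \partial_xv(t,X_t)\sigma(t,X_t)dB_t$$

and

$$d\zeta_t = -k(t,X_t)\zeta_tdt.$$

Applying the Ito formula to $\eta_t$, we get

$$\begin{aligned}d\eta_t = \xi_td\zeta_t + \zeta_td\xi_t &= \big(-k(t,X_t)\xi_t + \partial_tv(t,X_t) + (Lv)(t,X_t)\big)\zeta_tdt + \zeta_t\partial_xv(t,X_t)\sigma(t,X_t)dB_t\\ &= -g(t,X_t)\zeta_tdt + \zeta_t\partial_xv(t,X_t)\sigma(t,X_t)dB_t,\end{aligned}$$

where in the last equality we used the first line of (7.4). Define the stopping times $\tau_n := \inf\{s\ge t: X_s^2\ge n\}$, $n = 1,2,...$; then the Ito formula above reads:

$$\eta_{T\wedge\tau_n} - \eta_t = -\int_t^{T\wedge\tau_n}g(s,X_s)\zeta_sds + \int_t^{T\wedge\tau_n}\zeta_s\partial_xv(s,X_s)\sigma(s,X_s)dB_s,$$

or

$$\begin{aligned}&v(T\wedge\tau_n, X_{T\wedge\tau_n})e^{-\int_0^{T\wedge\tau_n}k(s,X_s)ds} - v(t,X_t)e^{-\int_0^tk(s,X_s)ds} =\\ &\quad -\int_t^{T\wedge\tau_n}g(s,X_s)e^{-\int_0^sk(u,X_u)du}ds + \int_t^{T\wedge\tau_n}e^{-\int_0^sk(u,X_u)du}\partial_xv(s,X_s)\sigma(s,X_s)dB_s.\end{aligned}$$

Multiplying both sides by the $\mathcal F_t$-measurable $e^{\int_0^tk(s,X_s)ds}$, rearranging and taking the expectation of both sides (the stochastic integral has zero mean, since its integrand is bounded up to time $\tau_n$), we get

$$E_{t,x}v(t,X_t) = E_{t,x}v(T\wedge\tau_n, X_{T\wedge\tau_n})e^{-\int_t^{T\wedge\tau_n}k(s,X_s)ds} + E_{t,x}\int_t^{T\wedge\tau_n}g(s,X_s)e^{-\int_t^sk(u,X_u)du}ds. \tag{7.7}$$

Since $X\in\mathcal P$, $\lim_n\tau_n = \infty$ P-a.s. Under the assumptions made, the dominated convergence applies and the claim follows. □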

The following special case is of particular importance:

Corollary 7.4. Under the assumptions of Theorem 7.1, the solution of the backward Kolmogorov equation

$$\partial_tu + Lu = 0, \qquad u(T,x) = f(x) \tag{7.8}$$

has the stochastic representation

$$u(t,x) = E_{t,x}f(X_T),$$

where $X$ is the solution of the SDE (7.1).

Under appropriate conditions on the data of the problem, it can be shown that there exists a function $p(t,x;T,y)$ which for any fixed $T>0$ and $y\in\mathbb R$, is $C^{1,2}$ in the backward variables $t$ and $x$ and satisfies the boundary value problem

$$\partial_tp + Lp = 0, \qquad \lim_{t\nearrow T}\int_{\mathbb R}p(t,x;T,y)\phi(y)dy = \phi(x), \quad x\in\mathbb R \tag{7.9}$$

for all sufficiently regular functions $\phi$. Hence for a particular function $f$ and fixed $T>0$, the function $u(t,x) := \int_{\mathbb R}p(t,x;T,y)f(y)dy$ is the solution of (7.8) and hence by uniqueness

$$E_{t,x}f(X_T) = \int_{\mathbb R}p(t,x;T,y)f(y)dy.$$

This suggests that the transition probability $P_{t,x}(X_T\in A)$ has a density with respect to the Lebesgue measure and this transition density is given by the solution of the backward Kolmogorov equation (7.9).

The conditions for (7.9) to have a solution are more stringent than for (7.8), as hinted by the following example:

Example 7.5. Consider the ODE

$$\dot X_t = X_t, \quad X_0 = x,$$

which has the unique solution, satisfying $X_t = X_se^{t-s}$. We shall regard this ODE as an instance of SDE with degenerate diffusion coefficient. The corresponding infinitesimal operator is $(L\phi)(x) = x\partial_x\phi(x)$ and (7.8) reads

$$\partial_tu + x\partial_xu = 0, \qquad u(T,x) = f(x).$$

Suppose $f$ is $C^1$, then $u(t,x) = f(xe^{T-t})$ solves (7.8):

$$\partial_tu = \partial_tf(xe^{T-t}) = -xe^{T-t}f'(xe^{T-t}) = -x\partial_xf(xe^{T-t}) = -x\partial_xu,$$

and $u(T,x) = f(x)$. However, being deterministic, $X_T$ does not have a density w.r.t. the Lebesgue measure and hence (7.9) cannot have a solution, which describes a probability density. ■

It turns out that non-degeneracy of the diffusion coefficient, i.e.

$$\sigma^2(t,x)\ge c \quad \text{for some } c>0,$$

suffices for existence of the transition density, in which case it is the unique solution of the backward Kolmogorov equation (7.9).

This naturally raises the question of how to describe the evolution of the transition density in its forward variables $T$ and $y$, when the backward variables $t$ and $x$ are kept fixed. Here is a hopefully convincing heuristic [62] argument.

Let $\phi$ be a $C^2$ function with compact support, then by the Ito formula

$$\phi(X_T) = \phi(X_t) + \int_t^T\Big(b(s,X_s)\phi'(X_s) + \frac12\sigma^2(s,X_s)\phi''(X_s)\Big)ds + \int_t^T\phi'(X_s)\sigma(s,X_s)dB_s.$$

Conditioning both sides we get

$$E_{t,x}\phi(X_T) = \phi(x) + \int_t^T\Big(E_{t,x}b(s,X_s)\phi'(X_s) + \frac12E_{t,x}\sigma^2(s,X_s)\phi''(X_s)\Big)ds. \tag{7.10}$$

Taking derivative w.r.t. $T$ yields

$$\partial_TE_{t,x}\phi(X_T) = E_{t,x}b(T,X_T)\phi'(X_T) + \frac12E_{t,x}\sigma^2(T,X_T)\phi''(X_T). \tag{7.11}$$

Suppose that the transition density $p(t,x;T,y)$ exists and is $C^{1,2}$ in the forward variables $T$ and $y\in\mathbb R$. Then integration by parts gives:

$$E_{t,x}b(T,X_T)\phi'(X_T) = \int_{\mathbb R}\phi'(y)b(T,y)p(t,x;T,y)dy = -\int_{\mathbb R}\phi(y)\partial_y\big(b(T,y)p(t,x;T,y)\big)dy,$$

[62] the rigorous derivation is more involved


where we used the compact support of $\phi$. Similarly, integrating by parts twice we get

$$E_{t,x}\sigma^2(T,X_T)\phi''(X_T) = \int_{\mathbb R}\phi(y)\partial_{yy}\big(\sigma^2(T,y)p(t,x;T,y)\big)dy.$$

Plugging this back into (7.11) and rearranging, we obtain:

$$\int_{\mathbb R}\Big(\partial_Tp(t,x;T,y) + \partial_y\big(b(T,y)p(t,x;T,y)\big) - \frac12\partial_{yy}\big(\sigma^2(T,y)p(t,x;T,y)\big)\Big)\phi(y)dy = 0.$$

Since $\phi$ was arbitrary, we conclude that $p(t,x;T,y)$ must solve the forward Fokker-Planck-Kolmogorov equation

$$\partial_Tp = L^*p, \qquad \lim_{T\searrow t}\int_{\mathbb R}p(t,x;T,y)\phi(y)dy = \phi(x), \tag{7.12}$$

for any bounded sufficiently regular $\phi$, where the operator $L^*$ is the formal adjoint of the operator $L$, defined as

$$(L^*p)(T,y) := -\partial_y\big(b(T,y)p(t,x;T,y)\big) + \frac12\partial_{yy}\big(\sigma^2(T,y)p(t,x;T,y)\big).$$

Once again, be warned: the existence of solutions of (7.12) can be an involved issue, depending on the properties of the SDE coefficients.

Example 7.6. For the Brownian motion $b(s,x) = 0$ and $\sigma(s,x) = 1$ and (7.12) reads

$$\partial_Tp(t,x;T,y) = \frac12\partial_{yy}p(t,x;T,y).$$

Check that

$$p(t,x;T,y) = \frac{1}{\sqrt{2\pi(T-t)}}\exp\Big(-\frac{(y-x)^2}{2(T-t)}\Big)$$

is a solution and that $\lim_{T\searrow t}\int_{\mathbb R}p(t,x;T,y)\phi(y)dy = \phi(x)$ holds. ■

The problems (7.9) and (7.12) can rarely be solved explicitly. A somewhat more tractable problem in dimension one is to find the stationary density, i.e. the limit

$$p(y) := \lim_{T\to\infty}p(t,x;T,y). \tag{7.13}$$

Obviously, this limit cannot be expected to exist if $b$ and $\sigma$ depend on the time variable (why?). If they don’t, establishing the convergence in (7.13) is an interesting problem, which can be solved affirmatively under appropriate conditions. If such a limit exists, it satisfies the ODE (formally obtained by setting the time derivative in (7.12) to zero)

$$\frac12\frac{d^2}{dy^2}\big(\sigma^2(y)p(y)\big) = \frac{d}{dy}\big(b(y)p(y)\big). \tag{7.14}$$

Remarkably, this ODE admits the explicit solution [63]

$$p(y) = \frac{C}{\sigma^2(y)}\exp\Big(\int_0^y\frac{2b(u)}{\sigma^2(u)}du\Big), \quad y\in\mathbb R, \tag{7.15}$$

where $C$ is the normalizing constant taking care of $\int_{\mathbb R}p(y)dy = 1$.

Example 7.7. For the Brownian motion we have $b = 0$ and $\sigma = 1$. The latter formula suggests that the density is constant, which of course cannot be true (since such a density does not integrate to 1 on $\mathbb R$). This should not surprise you, since the limit (7.13) doesn’t exist in this case (why?).

Let’s consider now the Ornstein-Uhlenbeck process, generated by the linear SDE:

$$dX_t = -bX_tdt + \sigma dB_t,$$

where $b>0$ and $\sigma$ are constants. It can be shown that the limit (7.13) does exist and (7.15) yields:

$$p(y) = \frac{C}{\sigma^2}\exp\Big(\int_0^y\frac{-2bu}{\sigma^2}du\Big) \propto e^{-y^2b/\sigma^2},$$

which after normalizing is recognized as the $N(0,\sigma^2/(2b))$ Gaussian density. ■
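Formula (7.15) is easy to evaluate numerically for given coefficients. The following sketch does this on a grid with the trapezoidal rule and compares the result with the Gaussian density of Example 7.7; the function name `stationary_density`, the grid and the parameter values $b = 2$, $\sigma = 1$ are assumptions made for the illustration.

```python
import numpy as np

def stationary_density(b, sigma, y):
    """Evaluate (7.15) on an increasing grid y containing 0.

    b and sigma are callables returning arrays; the normalizing constant C
    is fixed numerically so that the density integrates to one on the grid.
    """
    integrand = 2.0 * b(y) / sigma(y) ** 2
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y)
    cum = np.concatenate(([0.0], np.cumsum(steps)))   # integral from y[0] to y
    cum -= cum[np.argmin(np.abs(y))]                  # shift: integral from 0 to y
    p = np.exp(cum) / sigma(y) ** 2
    mass = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(y))
    return p / mass

if __name__ == "__main__":
    b_const, s_const = 2.0, 1.0
    y = np.linspace(-5.0, 5.0, 2001)
    p = stationary_density(lambda u: -b_const * u,
                           lambda u: s_const * np.ones_like(u), y)
    var = s_const**2 / (2.0 * b_const)
    gauss = np.exp(-y**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    print(np.max(np.abs(p - gauss)))
```

The same function can be applied to other time-homogeneous coefficients, as long as the integral in (7.15) yields a normalizable density on the chosen grid.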

7.1. Exercises.

Problem 2.7.1. Consider the Ornstein-Uhlenbeck process, generated by the SDE

$$dX_t = bX_tdt + \sigma dB_t, \quad t\in[0,T],$$

where $b$ and $\sigma$ are constants.

(1) Find the infinitesimal operator and write the backward and the forward Kolmogorov equations for the transition density.

(2) Argue that both of the equations are solved by the Gaussian density:

$$p(t,x;T,y) = \frac{1}{\sqrt{2\pi V_{T-t}}}\exp\Big(-\frac12\frac{\big(y - xe^{b(T-t)}\big)^2}{V_{T-t}}\Big),$$

where $V_{T-t} = \frac{\sigma^2}{2b}\big(e^{2b(T-t)} - 1\big)$.

(3) Show that if $b<0$, then the limit $\lim_{T\to\infty}p(t,x;T,y)$ exists and does not depend on $t$ and $x$.

Problem 2.7.2. The objective of this problem is to fill in the gaps in the proof of Theorem 7.1 from the lecture notes.

[63] This can be easily seen, once you notice that $(\sigma^2p)' = 2bp$


(1) Show that the localization procedure (i.e. introducing the stopping times $\tau_n$, etc.) is unnecessary, if $\partial_xv(t,x)$ is assumed to grow not faster than linearly.

(2) Elaborate how the dominated convergence is applied to show that

$$\lim_nE_{t,x}\int_t^{T\wedge\tau_n}g(s,X_s)e^{-\int_t^sk(u,X_u)du}ds = E_{t,x}\int_t^Tg(s,X_s)e^{-\int_t^sk(u,X_u)du}ds.$$

(3) Assuming that

$$E_{t,x}\sup_{t\le u\le s}|X_u|^{2m} \le C(1+|x|^{2m})e^{C(s-t)}, \quad s\in[t,T],$$

prove that

$$\lim_nE_{t,x}v(T\wedge\tau_n, X_{T\wedge\tau_n})e^{-\int_t^{T\wedge\tau_n}k(s,X_s)ds} = E_{t,x}f(X_T)e^{-\int_t^Tk(s,X_s)ds}.$$

8. Girsanov’s change of measure

Recall the following fact from real analysis:

Theorem 8.1. Let $(S,\mathcal S)$ be a measurable space.

(1) Let $\nu$ and $\mu$ be two σ-finite measures on $(S,\mathcal S)$ and assume $\mu\ll\nu$, i.e. $\nu(A) = 0$ implies $\mu(A) = 0$ for all $A\in\mathcal S$. Then there exists a measurable $\nu$-a.s. unique function $f: S\mapsto\mathbb R$, denoted by $\frac{d\mu}{d\nu}(x) := f(x)$ and referred to as the Radon-Nikodym derivative of $\mu$ w.r.t. $\nu$, such that

$$\mu(A) = \int_Afd\nu, \quad \forall A\in\mathcal S.$$

(2) Let $\nu$ be a σ-finite measure on $(S,\mathcal S)$ and $f$ be a measurable function such that $\nu(x: |f(x)| = \infty) = 0$. Then the set function

$$\mu(A) := \int_Afd\nu, \quad A\in\mathcal S$$

is a measure, absolutely continuous w.r.t. $\nu$, i.e. $\mu\ll\nu$.

If $\mu\ll\nu$ and $\phi$ is a measurable function, then

$$\int\phi\,d\mu = \int\phi\frac{d\mu}{d\nu}d\nu.$$

This is the so called change of measure formula. If $\mu\ll\nu$ and $\nu\ll\mu$, then the measures are said to be equivalent, denoted by $\mu\sim\nu$. It can be seen that

$$\frac{d\mu}{d\nu}(x) = \Big(\frac{d\nu}{d\mu}(x)\Big)^{-1}, \quad \mu\text{-a.s. and } \nu\text{-a.s.}$$


Let’s explore this theorem through the following naive but illuminating example:

Example 8.2. Consider a probability space $(\Omega,\mathcal F,P)$ hosting two random variables $X\sim N(0,1)$ and $Y\sim N(a,1)$, where $a$ is a real number. This means that the random variables $X$ and $Y$ induce probability measures $\mu^X$ and $\mu^Y$ on $(\mathbb R,\mathcal B(\mathbb R))$, so that

$$\mu^X(A) = \int_A\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\lambda(dx),$$

where $\lambda$ is the Lebesgue measure [64] on $\mathbb R$. If $A$ is such that $\lambda(A) = 0$, then $\mu^X(A) = 0$, which means that $\mu^X\ll\lambda$ and the latter formula suggests that

$$\frac{d\mu^X}{d\lambda}(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad x\in\mathbb R.$$

Similarly, $\mu^Y\ll\lambda$ and

$$\frac{d\mu^Y}{d\lambda}(x) = \frac{1}{\sqrt{2\pi}}e^{-(x-a)^2/2}, \quad x\in\mathbb R.$$

In fact, the converse is true as well: since $e^{-x^2/2}>0$ for all $x\in\mathbb R$, $\mu^X(A) = 0$ implies [65] that $A$ must be of Lebesgue measure zero, i.e. $\lambda(A) = 0$. The corresponding R-N derivative is

$$\frac{d\lambda}{d\mu^X}(x) = \sqrt{2\pi}e^{x^2/2}, \quad x\in\mathbb R.$$

Indeed,

$$\int_A\frac{d\lambda}{d\mu^X}(x)\mu^X(dx) = \int_A\sqrt{2\pi}e^{x^2/2}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\lambda(dx) = \int_A\lambda(dx) = \lambda(A).$$

Also

$$\mu^X(A) = \int_A\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx = \int_Ae^{-xa+\frac12a^2}\frac{1}{\sqrt{2\pi}}e^{-(x-a)^2/2}dx = \int_Ae^{-xa+\frac12a^2}\mu^Y(dx),$$

which implies that $\mu^X\ll\mu^Y$ (and by the symmetry $\mu^X\sim\mu^Y$) with the R-N derivative

$$\frac{d\mu^X}{d\mu^Y}(x) = e^{-xa+\frac12a^2}, \quad x\in\mathbb R.$$

To recap, the same measure has different densities with respect to different equivalent measures.

[64] note that the Lebesgue measure is not a finite measure on $\mathbb R$, but σ-finite, i.e. $\lambda(A)<\infty$ whenever $A$ is measurable and bounded

[65] this is a property of the Lebesgue integral


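Now consider the random variable

$$Z(\omega) := \exp\Big(aX(\omega) - \frac12a^2\Big),$$

and note that

$$EZ = \int_\Omega\exp\Big(aX(\omega) - \frac12a^2\Big)P(d\omega) = \int_{\mathbb R}\exp\Big(ax - \frac12a^2\Big)\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\lambda(dx) = \int_{\mathbb R}\frac{1}{\sqrt{2\pi}}e^{-(x-a)^2/2}\lambda(dx) = 1.$$

Since $Z(\omega)\ge0$ and $EZ = 1$ (in particular, $P(Z = \infty) = 0$), the set function

$$Q(A) := \int_AZ(\omega)P(d\omega)$$

is a probability measure on $\mathcal F$. Note that $Q$ is the probability on $\mathcal F$, defined by specifying its R-N derivative

$$\frac{dQ}{dP}(\omega) := Z(\omega).$$

The random variables $X$ and $Y$ have different distributions on $(\Omega,\mathcal F,Q)$ and $(\Omega,\mathcal F,P)$. Indeed,

$$\begin{aligned}Q(X\in A) &= \int_\Omega1_{\{X\in A\}}\frac{dQ}{dP}(\omega)P(d\omega) = \int_\Omega1_{\{X\in A\}}\exp\Big(aX(\omega) - \frac12a^2\Big)P(d\omega)\\ &= \int_{\mathbb R}1_{\{x\in A\}}\exp\Big(ax - \frac12a^2\Big)\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\lambda(dx) = \int_{\mathbb R}1_{\{x\in A\}}\frac{1}{\sqrt{2\pi}}e^{-(x-a)^2/2}\lambda(dx),\end{aligned}$$

which means that under $Q$, $X$ is a Gaussian random variable with unit variance and mean $a$. Similarly $Y\sim N(0,1)$ under $Q$. ■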
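The effect of the change of measure in Example 8.2 can be seen in a simulation: sampling $X$ under $P$ and weighting each sample by $Z = dQ/dP$ reproduces expectations under $Q$. The sketch below is only an illustration; the function name, the value $a = 0.5$, the sample size and the seed are assumptions.

```python
import numpy as np

def reweighted_moments(a, samples=400_000, seed=0):
    """Estimate E_P Z, E_Q X and E_Q (X - a)^2 by weighting P-samples with Z."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(samples)        # X ~ N(0,1) under P
    Z = np.exp(a * X - 0.5 * a**2)          # R-N derivative dQ/dP
    mean_Z = Z.mean()                       # should be close to 1
    mean_Q = np.mean(Z * X)                 # E_Q X = E_P (Z X), close to a
    var_Q = np.mean(Z * (X - a) ** 2)       # E_Q (X - a)^2, close to 1
    return mean_Z, mean_Q, var_Q

if __name__ == "__main__":
    print(reweighted_moments(0.5))
```

This weighting of samples is the basic mechanism behind importance sampling.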

The scope of Theorem 8.1 is very general and its applicability goes far beyond this simple example. Consider a filtered probability space $(\Omega,\mathcal F,\mathcal F_t,P)$, carrying the Brownian motion $B_t$, $t\ge0$. Analogously to the above example’s nomenclature, we shall denote by $\mu^B$ the probability measure, called the Wiener measure, induced by $B$ on the space of continuous functions with the appropriate Borel σ-algebra (e.g. corresponding to the uniform metric). We shall also denote by $P_T$ the restriction of $P$ to the σ-algebra $\mathcal F_T$ for a fixed $T$ and by $\mu^B_T$ the measure induced by $B$ on the space $C_{[0,T]}$ of continuous functions on the interval $[0,T]$.

In view of the above example, we are now curious how the abstract Theorem 8.1 applies to this setting. Let’s start the exploration with examples:


Example 8.3. Let $X_t(\omega) := 2B_t(\omega)$ and denote by $\mu^X$ the probability measure, induced by $X$ on the space of continuous functions. Is $\mu^X\ll\mu^B$? Recall that the quadratic variation of the Brownian motion on $[0,T]$ equals $T$ P-a.s., while the quadratic variation of $X$ is $4T$ P-a.s. (why?). Hence the set

$$A = \Big\{x\in C_{[0,T]}: \bigvee_0^Tx = T\Big\},$$

where $\bigvee_0^Tx$ stands for the quadratic variation of $x$ on $[0,T]$, is of $\mu^X$-measure zero [66]:

$$\mu^X(A) = P(X\in A) = 0,$$

while $\mu^B(A) = 1$. Hence $\mu^X\not\ll\mu^B$ and, in fact, the measures $\mu^X$ and $\mu^B$ are orthogonal, i.e. are supported on essentially nonintersecting sets.

Note the difference: the measures, induced by the real valued random variables $\xi\sim N(0,1)$ and $2\xi$ are equivalent (check!), while the measures induced by the processes $B$ and $2B$ are orthogonal. This hints that equivalence of measures in the infinite dimensional spaces, such as $C_{[0,T]}$, is a more delicate matter. ■

Example 8.4. Define a random variable

$$Z_T(\omega) := \exp\Big(aB_T(\omega) - \frac12a^2T\Big).$$

Since $B_T\sim N(0,T)$, a simple calculation shows $EZ_T = 1$. Since $Z_T\ge0$, the set function

$$Q_T(A) := E1_AZ_T(\omega) = \int_AZ_T(\omega)P(d\omega),$$

is a probability measure on $\mathcal F$. Example 8.2 suggests that under the new measure $Q_T$, the random variable $B_T$ has the same variance as under $P$, namely $T$, but a different mean, i.e. $aT$ (check!). In fact, for $t\le T$ (hereafter, $E_{Q_T}$ and $E_P$ denote expectations w.r.t. $Q_T$ and $P$ respectively)

$$\begin{aligned}E_{Q_T}B_t &= E_P\frac{dQ_T}{dP}B_t = E_PZ_TB_t = E_Pe^{aB_T-\frac12a^2T}B_t = E_PB_tE_P\big(e^{aB_T-\frac12a^2T}\big|\mathcal F_t\big)\\ &= E_PB_te^{aB_t-\frac12a^2t}E_P\big(e^{a(B_T-B_t)-\frac12a^2(T-t)}\big|\mathcal F_t\big) = E_PB_te^{aB_t-\frac12a^2t}\\ &= \int_{\mathbb R}xe^{ax-\frac12a^2t}\frac{1}{\sqrt{2\pi t}}e^{-\frac{x^2}{2t}}dx = \int_{\mathbb R}x\frac{1}{\sqrt{2\pi t}}e^{-\frac{(x-at)^2}{2t}}dx = at.\end{aligned}$$

In other words, the mean of the process $B_t$ under $Q_T$ equals $at$. A similar calculation shows that

$$E_{Q_T}(B_t - at)^2 = t, \quad t\in[0,T],$$

i.e. $B$ has the same variance both under $Q_T$ and under $P$. In a minute, we shall see that the process $\tilde B_t = B_t - at$ is the Brownian motion under $Q_T$, which means that under $Q_T$, $B$ is the Brownian motion with drift $a$! ■

[66] more precisely, is a subset of a $\mu^X$-null set
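The two moment identities of Example 8.4 can be probed numerically. The sketch below samples the pair $(B_t, B_T)$ under $P$ and weights by $Z_T$, which depends on the terminal value only, yet shifts the mean at the earlier time $t$ as well. The function name and the values $a = 0.8$, $t = 0.5$, $T = 1$ are illustrative assumptions.

```python
import numpy as np

def tilted_moments(a, t, T, samples=400_000, seed=0):
    """Estimate E_{Q_T} B_t and E_{Q_T} (B_t - a t)^2 by weighting with Z_T."""
    rng = np.random.default_rng(seed)
    B_t = np.sqrt(t) * rng.standard_normal(samples)
    B_T = B_t + np.sqrt(T - t) * rng.standard_normal(samples)  # independent increment
    Z_T = np.exp(a * B_T - 0.5 * a**2 * T)                     # dQ_T/dP
    mean_t = np.mean(Z_T * B_t)
    var_t = np.mean(Z_T * (B_t - a * t) ** 2)
    return mean_t, var_t

if __name__ == "__main__":
    print(tilted_moments(0.8, 0.5, 1.0))   # compare with (a*t, t) = (0.4, 0.5)
```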

One of the most famous results in the area is

Theorem 8.5 (I. Girsanov). For a process $X\in\mathcal P$ and a fixed $T>0$, define the stochastic exponential

$$Z_t := \exp\Big(\int_0^tX_sdB_s - \frac12\int_0^tX_s^2ds\Big), \quad t\in[0,T].$$

Assume that $EZ_T = 1$ and define the probability measure $Q_T$ on $\mathcal F_T$ by

$$\frac{dQ_T}{dP}(\omega) := Z_T(\omega).$$

Then the process

$$\tilde B_t := B_t - \int_0^tX_sds$$

is the Brownian motion on $(\Omega,\mathcal F,(\mathcal F_t)_{t\le T},Q_T)$.

Proof. (exercise) □

Remark 8.6. The theorem above assumes that $E_PZ_T = 1$, which holds if and only if $Z_t$, $t\le T$ is a martingale. Indeed, let $\tau_n = \inf\{t\ge0: X_t^2\ge n\}$, then the stopped process $Z_{t\wedge\tau_n}$ is a martingale (why?). Since $t\wedge\tau_n\to t$, P-a.s., by the Fatou lemma

$$E_P(Z_t|\mathcal F_s) = E_P\big(\lim_nZ_{t\wedge\tau_n}\big|\mathcal F_s\big) \le \liminf_nE_P(Z_{t\wedge\tau_n}|\mathcal F_s) = \lim_nZ_{s\wedge\tau_n} = Z_s,$$

i.e. $Z_t$ is a nonnegative supermartingale. If it is not a martingale, then $E_PZ_T < Z_0 = 1$. Hence $E_PZ_T = 1$ implies that $Z$ is a martingale.

The question left is when the process $Z$ is a martingale. For example, this holds obviously for bounded $X$. Establishing the martingale property of $Z$ is an interesting and quite delicate problem. Here is a classical condition due to A. Novikov under which $Z$ is a martingale:

$$E_P\exp\Big(\frac12\int_0^TX_s^2ds\Big) < \infty.$$

The Girsanov theorem has numerous applications in stochastic calculus and in finance in particular, as we shall witness in the next section. Here are some examples.

Example 8.7. Consider the Brownian motion with drift process $Y_t := B_t + at$, $a\in\mathbb R$. Define the stopping time

$$\tau_b(Y) := \inf\{t\ge0: Y_t = b\},$$

where $b$ is a fixed number and $\inf\emptyset = \infty$, by convention. Our goal is to find the distribution of $\tau_b(Y)$. Note that $\tau_b(Y)$ may be an improper random variable, i.e. $P(\tau_b(Y)<\infty)<1$ cannot be excluded a priori.


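We shall tackle the problem by changing measure on $\mathcal F_T$ so that under the new measure $Y$ becomes the standard Brownian motion, for which the distribution of $\tau_b(Y)$ has already been computed. To this end, set

$$\frac{dQ_T}{dP}(\omega) := \exp\Big(-aB_T - \frac12a^2T\Big) =: Z_T.$$

Let’s check that $Z_t$ is a martingale (under $P$): for $s\le t\le T$

$$E_P(Z_t|\mathcal F_s) = Z_sE_Pe^{-a(B_t-B_s)-\frac12a^2(t-s)} = Z_s.$$

Hence by the Girsanov theorem the process

$$Y_t = B_t + at$$

is the Brownian motion under $Q_T$. Further, writing $\tau_b$ for $\tau_b(Y)$,

$$\begin{aligned}P(\tau_b\le T) &= E_{Q_T}1_{\{\tau_b\le T\}}\frac{dP}{dQ_T} = E_{Q_T}1_{\{\tau_b\le T\}}\frac{1}{Z_T} = E_{Q_T}1_{\{\tau_b\le T\}}e^{aB_T+\frac12a^2T} = E_{Q_T}1_{\{\tau_b\le T\}}e^{aY_T-\frac12a^2T}\\ &= E_{Q_T}1_{\{\tau_b\le T\}}E_{Q_T}\big(e^{aY_T-\frac12a^2T}\big|\mathcal F_{T\wedge\tau_b}\big) = E_{Q_T}1_{\{\tau_b\le T\}}e^{aY_{T\wedge\tau_b}-\frac12a^2(T\wedge\tau_b)}\\ &= E_{Q_T}1_{\{\tau_b\le T\}}e^{aY_{\tau_b}-\frac12a^2\tau_b} = E_{Q_T}1_{\{\tau_b\le T\}}e^{ab-\frac12a^2\tau_b}.\end{aligned}$$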

Under $Q_T$, $Y$ is the Brownian motion and as we already saw $\tau_b(Y)$ has the density

$$f_{\tau_b}(t) := \frac{b}{\sqrt{2\pi t^3}}e^{-b^2/2t}, \quad t\ge0$$

and hence

$$P(\tau_b(Y)\le T) = \int_0^\infty1_{\{t\le T\}}e^{ab-\frac12a^2t}f_{\tau_b}(t)dt,$$

i.e. under $P$, $\tau_b(Y)$ has the density $e^{ab-\frac12a^2t}f_{\tau_b}(t)$, that is

$$e^{ab-\frac12a^2t}\frac{b}{\sqrt{2\pi t^3}}e^{-b^2/2t}, \quad t\ge0.$$

Moreover, by the dominated convergence

$$P(\tau_b(Y)<\infty) = \lim_{T\to\infty}P(\tau_b(Y)\le T) = e^{ab}\int_0^\infty e^{-\frac12a^2t}\frac{b}{\sqrt{2\pi t^3}}e^{-b^2/2t}dt = e^{ab}e^{-|ab|},$$

where in the last equality we used the Laplace transform formula

$$E_Pe^{-\lambda\tau_b(B)} = e^{-|b|\sqrt{2\lambda}}, \quad \lambda\ge0.$$

Hence if $ab\ge0$, then $P(\tau_b(Y)<\infty) = 1$, otherwise $P(\tau_b(Y)<\infty) = e^{2ab}<1$, i.e. $Y$ may never hit the level $b>0$, if its drift is negative. ■
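The conclusion of Example 8.7 can be cross-checked by integrating the density of $\tau_b(Y)$ under $P$ numerically and comparing with $e^{ab-|ab|}$. The sketch below assumes $b>0$ and $a\ne0$, and uses the substitution $t = e^u$ with a trapezoidal rule; the function name, the integration range and the grid size are illustrative choices.

```python
import numpy as np

def hitting_probability(a, b, points=20001):
    """Integrate the P-density of tau_b(Y) over (0, inf), for b > 0 and a != 0."""
    u = np.linspace(-12.0, 12.0, points)      # t = exp(u) covers many time scales
    t = np.exp(u)
    density = (np.exp(a * b - 0.5 * a**2 * t)
               * b / np.sqrt(2.0 * np.pi * t**3)
               * np.exp(-b**2 / (2.0 * t)))
    integrand = density * t                   # dt = t du
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))

if __name__ == "__main__":
    for a in (1.0, -1.0):
        print(a, hitting_probability(a, 1.0), np.exp(a * 1.0 - abs(a * 1.0)))
```

For a positive drift the integral is (numerically) one, while for a negative drift the density is defective and integrates to $e^{2ab}$.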

Here is an example of a statistical flavor:


Example 8.8. In classical statistics the data is assumed to be a sample, i.e. a single realization, from an unknown probability distribution and the objective is to infer this underlying distribution on the basis of the sample. Here is a standard parametric experiment: suppose $X^n := (X_1,...,X_n)$ are i.i.d. random variables from the probability density $f(x;\theta)$ (with respect to the Lebesgue measure) and we observe a particular realization of $X^n$. Here $\theta$ is an unknown parameter, assumed to be a point in an open subset $\Theta\subseteq\mathbb R$, whose value we want to estimate, given the observed sample. The crucial role in inference of this type is played by the likelihood function, which is defined as the probability density of the vector $X^n$, i.e.

$$L(x^n;\theta) = \prod_{i=1}^nf(x_i;\theta), \quad x^n = (x_1,...,x_n)\in\mathbb R^n.$$

In particular, if the function $\theta\mapsto L(x^n;\theta)$ is continuous for all $x^n\in\mathbb R^n$, then the maximum likelihood estimator (MLE) is defined as

$$\hat\theta_n(X^n) := \mathrm{argmax}_{\theta\in\bar\Theta}L(X^n;\theta),$$

where $\bar\Theta$ is the closure of $\Theta$. Remarkably, it turns out that under certain regularity conditions on $f(x;\theta)$, the MLE is consistent, i.e. converges to the true value of the parameter, and its asymptotic performance cannot be improved (in a certain sense).

Suppose now we observe the motion of the Brownian particle, which has an unknown drift. More precisely, we assume that the position of the particle at time $t\ge0$ is given by

$$X_t = B_t + \theta t, \quad t\ge0,$$

where $\theta\in\mathbb R$ is the unknown drift parameter. We observe the trajectory $X^T(\omega) = \{X_t(\omega), t\in[0,T]\}$ and we want to estimate $\theta$.

In the modern digital world, the trajectory will not be observed as a function of continuous time parameter, but rather discretized on a fine grid, i.e. one will actually observe $X^{T,\delta}(\omega) := \{X_{\delta j}, j = 0,...,[T/\delta]\}$. It is intuitively clear, however, that any discretization inevitably loses information and hence the inference from the trajectory $X^T$ may provide useful bounds on the attainable performance. Another advantage of working with $X^T$ is that the emerging estimation algorithms are typically simpler than those which are obtained after discretization, and discretizing the continuous time estimator to fit the discrete observations is often advantageous.

Motivated by this prelude, we need a more abstract definition of the statistical experiment and the likelihood function.

Definition 8.9. A statistical experiment consists of a measurable space $(S,\mathcal S)$ and a family of probability measures $(\mu_\theta)_{\theta\in\Theta}$ on the σ-algebra $\mathcal S$, where $\Theta$ is a subset of a metric space. The data $X$ in such experiment is assumed to take values in $S$ and be a sample from $\mu_{\theta_0}$ for a fixed unknown value of the parameter $\theta_0$. The family $(\mu_\theta)_{\theta\in\Theta}$ is said to be dominated by a reference measure $\nu$, independent of $\theta$, if $\mu_\theta\ll\nu$ for all $\theta$. The R-N derivative $\frac{d\mu_\theta}{d\nu}(x)$ is called the likelihood function [67] of the experiment.

This definition is flexible enough to include many settings of interest, such as the continuous time problem addressed above. So in our experiment, we can take $(S,\mathcal S) := \big(C_{[0,T]},\mathcal B(C_{[0,T]})\big)$, the space of the continuous functions with the supremum metric and the corresponding Borel σ-algebra. The process $X$ induces the measures $\mu^X_\theta$, $\theta\in\Theta = \mathbb R$ on this space and we observe the sample $X^T\sim\mu^X_{\theta_0}$, where $\theta_0$ is the unknown value of the parameter.

In view of the Girsanov theorem the natural candidate for the reference measure is the Wiener measure $\mu^B$, induced by the Brownian motion. To this end, define the probability measure $Q_T$ on $(\Omega,\mathcal F_T)$ by

$$\frac{dQ_T}{dP}(\omega) = \exp\Big(-\int_0^T\theta dB_t - \frac12\int_0^T\theta^2dt\Big) = \exp\Big(-\theta B_T(\omega) - \frac12\theta^2T\Big) = \exp\Big(-\theta X_T(\omega) + \frac12\theta^2T\Big).$$

By the Girsanov theorem, $X_t = B_t + \theta t$, $t\in[0,T]$ is the Brownian motion under $Q_T$. Hence

$$\mu^X_\theta(A) = P(X\in A) = E_{Q_T}1_{\{X\in A\}}\frac{dP}{dQ_T}(\omega) = E_{Q_T}1_{\{X\in A\}}e^{\theta X_T(\omega)-\frac12\theta^2T} = \int_Ae^{\theta x_T-\frac12\theta^2T}\mu^B(dx).$$

The latter means that $\mu^X_\theta\ll\mu^B$ and the likelihood of the experiment is

$$L(X^T;\theta) := \frac{d\mu^X_\theta}{d\mu^B}(X) = \exp\Big(\theta X_T - \frac12\theta^2T\Big).$$

The MLE of $\theta$ is

$$\hat\theta_T(X^T) = \mathrm{argmax}_{\theta\in\mathbb R}\log L(X^T;\theta) = \mathrm{argmax}_{\theta\in\mathbb R}\Big(\theta X_T - \frac12\theta^2T\Big) = \frac{X_T}{T}.$$

It is not hard to see that $E_{\theta_0}(\hat\theta_T(X^T) - \theta_0)^2 = 1/T\to0$ as $T\to\infty$, i.e. the MLE is consistent.
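The mean square error of the drift MLE can be illustrated by simulation. Since the estimator depends on the trajectory only through $X_T = B_T + \theta_0T$, it suffices to sample the terminal value; the function name, the true value $\theta_0 = 0.7$, the sample size and the seed in this sketch are assumptions.

```python
import numpy as np

def mle_mse(theta0, T, samples=200_000, seed=0):
    """Monte Carlo estimate of E (theta_hat - theta0)^2 for theta_hat = X_T / T."""
    rng = np.random.default_rng(seed)
    X_T = theta0 * T + np.sqrt(T) * rng.standard_normal(samples)  # X_T = B_T + theta0*T
    theta_hat = X_T / T
    return np.mean((theta_hat - theta0) ** 2)

if __name__ == "__main__":
    for T in (1.0, 4.0, 16.0):
        print(T, mle_mse(0.7, T), 1.0 / T)
```

The printed estimates should track $1/T$, in agreement with the consistency claim above.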

■

8.1. Exercises.

Problem 2.8.1. Which of the following processes $X = (X_t)_{t\in[0,T]}$ induce a measure equivalent to the Wiener measure? Find the corresponding Radon-Nikodym derivative, whenever appropriate.

(1) $X_t = B_t + 1$

(2) $X_t = B_t + \sqrt t$

(3) $X_t = B_t + t^{2/3}$

(4) $X_t = B_t + 2\frac tT1_{\{t\le T/2\}} + 2(1 - t/T)1_{\{t>T/2\}}$

(5) $X_t = B_t + \int_0^tB_sds$

[67] the likelihood function depends on the particular choice of the reference measure $\nu$; however, this ambiguity does not alter the inference (think why)

Problem 2.8.2. (cf. Problem 1.1.2 (15)) Let $P$ and $Q$ be probability measures on $(\Omega,\mathcal F)$ and assume that $Q\ll P$. Let $\mathcal G\subseteq\mathcal F$ be a σ-algebra and $\xi$ a random variable with $E_P|\xi|<\infty$. Prove the following change of measure formula for conditional expectations:

$$E_P(\xi|\mathcal G) = \frac{E_Q\big(\frac{dP}{dQ}\xi\big|\mathcal G\big)}{E_Q\big(\frac{dP}{dQ}\big|\mathcal G\big)}, \quad P\text{-a.s.}$$

Hint: check that the right hand side satisfies the axioms of the conditional expectation.

Problem 2.8.3. In this problem we shall prove the Girsanov theorem, assuming that the process $X$ is bounded, i.e. $|X_t|\le C$ P-a.s. with a constant $C$, and adapted. Define the stochastic exponential

$$Z_t(X) = \exp\Big(\int_0^tX_sdB_s - \frac12\int_0^tX_s^2ds\Big), \quad t\ge0.$$

Assume that $Z$ is a martingale and hence $EZ_T = 1$ for a fixed $T>0$. Define the probability measure $Q_T$ on $\mathcal F_T$ by

$$\frac{dQ_T}{dP}(\omega) = Z_T(\omega).$$

We shall prove that the process $\tilde B_t := B_t - \int_0^tX_sds$, $t\in[0,T]$ is the Brownian motion under $Q_T$. Our strategy will be to use Levy’s characterization of the Brownian motion: we shall check that $\tilde B$ is a continuous martingale under $Q_T$, whose quadratic variation equals $t$.

(1) Explain why $Q_T$ is not necessarily defined on every event from $\mathcal F$.

(2) Show that

$$E_P\big(Z_t(X)\big)^k \le \exp\Big(\frac12k^2C^2T\Big), \quad k\ge1,\ t\in[0,T]. \tag{8.1}$$

Hint: Note that

$$\big(Z_t(X)\big)^k = Z_t(kX)\exp\Big(\frac{k(k-1)}{2}\int_0^tX_s^2ds\Big) \tag{8.2}$$

and recall that the stochastic exponential $Z_t(kX)$ is a supermartingale.

(3) Show that $Z$ solves the SDE

$$Z_t = 1 + \int_0^tX_sZ_sdB_s, \quad t\in[0,T].$$

(4) Show that

$$E_{Q_T}\big(\tilde B_t\big|\mathcal F_s\big) = \frac{E_P\big(\tilde B_tZ_t\big|\mathcal F_s\big)}{Z_s}, \quad 0\le s\le t\le T. \tag{8.3}$$

Hint: use the result of Problem 2.8.2 and the assumed martingale property of $Z$.

(5) Applying the Ito formula to $\tilde B_tZ_t$ and using the bound (8.1), show that $\tilde B_tZ_t$ is a martingale under $P$ and hence by (8.3), $\tilde B_t$ is a martingale under $Q_T$.

(6) Show that $\langle\tilde B\rangle_t = t$ under $Q_T$, i.e. that $\tilde B_t^2 - t$ is a martingale under $Q_T$.

(7) (bonus) Show [68] that when $X$ is bounded, $Z$ is indeed a martingale.

Problem 2.8.4. Consider the Ornstein-Uhlenbeck process, generated by the linear SDE

$$dX_t = \theta X_tdt + dB_t, \quad t\ge0, \quad X_0 = 0,$$

where $\theta\in\mathbb R$ is the unknown parameter.

(1) Using Girsanov’s theorem, find the likelihood function $L(X^T;\theta)$ for this experiment.

(2) Show that the MLE of $\theta$ is given by

$$\hat\theta_T(X^T) = \frac{\int_0^TX_tdX_t}{\int_0^TX_t^2dt}.$$

(3) Calculate the Fisher information contained in the sample $X^T$

$$I_T(\theta) = E\Big(\frac{d}{d\theta}\log L(X^T;\theta)\Big)^2 = -E\frac{d^2}{d\theta^2}\log L(X^T;\theta).$$

[68] The proof is by a localization argument: let $\tau_n = \inf\{t\ge0: Z_t\ge n\}$ and argue that $E_PZ_{t\wedge\tau_n} = 1$. Next argue that $\lim_nt\wedge\tau_n = t$, P-a.s. (why?) and that $\lim_nE_PZ_{t\wedge\tau_n} = E_PZ_t$ if $\sup_nE_P\big(Z_{t\wedge\tau_n}\big)^2<\infty$ (why?) and use the identity (8.2) to verify the latter.

CHAPTER 3

Option pricing in continuous time

1.2. The market model. Figure 1 depicts the daily exchange rates for US $ on the Tel Aviv stock market for the last three years. The curve appears to be highly irregular, resembling a trajectory of a diffusion process. In the classical Black-Scholes model, the stock price is assumed to be a trajectory of the geometric Brownian motion, i.e. the solution of the Ito SDE

$$dS_t = \mu S_tdt + \sigma S_tdB_t, \quad t\in[0,T], \tag{1.4}$$

subject to the initial price $S_0 = s>0$. Here $B$ is the Brownian motion on the probability space $(\Omega,\mathcal F,\mathcal F_t,P)$ with $\mathcal F_t := \mathcal F^B_t$. The constants $\mu$ and $\sigma>0$ are referred to as the local mean rate of return and the volatility of the stock. The larger the volatility of the price, the riskier the stock (think why).

Recall that this SDE has the unique strong solution

$$S_t = S_0\exp\Big(\Big(\mu - \frac12\sigma^2\Big)t + \sigma B_t\Big), \quad t\in[0,T].$$

In particular, the stock price process $S_t$ remains positive for all $t\in[0,T]$.

The second component of the market $Q$ is assumed to be risk free (e.g. bank account) with the constant rate of return $r>0$:

$$dQ_t = rQ_tdt, \quad t\in[0,T], \tag{1.5}$$

subject to $Q_0 = 1$, i.e. $Q_t = e^{rt}$.

A portfolio $\pi = (\pi_t)_{t\in[0,T]}$ is a measurable process $\pi_t = (\beta_t,\gamma_t)$, adapted to the filtration $\mathcal F^S_t$ (which coincides with $\mathcal F^B_t$), satisfying $\int_0^T\gamma_t^2dt<\infty$, P-a.s. (i.e. $\gamma\in\mathcal P$). The value (or capital) of the portfolio $\pi$ at time $t\in[0,T]$ is given by [1]

$$V^\pi_t = \beta_tQ_t + \gamma_tS_t.$$

A portfolio is self-financing if its value can change only through the changes of the prices on the market, more precisely:

Definition 1.10. The portfolio $\pi$ is self-financing, denoted $\pi\in SF$, if

$$V^\pi_t = V^\pi_0 + \int_0^t\big(\beta_udQ_u + \gamma_udS_u\big), \quad t\in[0,T]. \tag{1.6}$$

[1] we shall use different notations in continuous and discrete time cases deliberately


The heuristics is provided by the discretization of the market on a partition $\{0 = t_0<...<t_n = T\}$ of $[0,T]$. Let $\pi_{t_{j-1}}$ be the portfolio, chosen at time $t_{j-1}$ (adapted [2] to $\mathcal F^S_{t_{j-1}}$). At time $t_j$, after the price $S_{t_j}$ is revealed, the value of the portfolio becomes

$$V^\pi_{t_j} := \beta_{t_{j-1}}Q_{t_j} + \gamma_{t_{j-1}}S_{t_j}.$$

These are the only funds, available to us for creating a new portfolio $\pi_{t_j}$ (which can now take into account the newly revealed price $S_{t_j}$). Hence

$$\beta_{t_{j-1}}Q_{t_j} + \gamma_{t_{j-1}}S_{t_j} = \beta_{t_j}Q_{t_j} + \gamma_{t_j}S_{t_j}. \tag{1.7}$$

[2] in discrete time we used slightly different indexing

[Figure 1: plot of the price of 1 US $ in NIS (vertical axis, from 3.2 to 4.6) against days since 30/05/2008 (horizontal axis, from 0 to 800).]

Figure 1. Daily US $ prices in NIS on the Tel Aviv stock market (the prices are not reported on weekends, national holidays, etc.)


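Further,

$$\begin{aligned}
V^\pi_t &= V^\pi_0 + \sum_{j:\,t_j \le t} \big(V^\pi_{t_j} - V^\pi_{t_{j-1}}\big) \\
&= V^\pi_0 + \sum_{j:\,t_j \le t} \big(\beta_{t_{j-1}} Q_{t_j} + \gamma_{t_{j-1}} S_{t_j} - \beta_{t_{j-2}} Q_{t_{j-1}} - \gamma_{t_{j-2}} S_{t_{j-1}}\big) \\
&\overset{\dagger}{=} V^\pi_0 + \sum_{j:\,t_j \le t} \big(\beta_{t_{j-1}} Q_{t_j} + \gamma_{t_{j-1}} S_{t_j} - \beta_{t_{j-1}} Q_{t_{j-1}} - \gamma_{t_{j-1}} S_{t_{j-1}}\big) \\
&= V^\pi_0 + \sum_{j:\,t_j \le t} \Big(\beta_{t_{j-1}} \big(Q_{t_j} - Q_{t_{j-1}}\big) + \gamma_{t_{j-1}} \big(S_{t_j} - S_{t_{j-1}}\big)\Big),
\end{aligned}$$

where the equality † holds by (1.7). The latter is the "pre-limit" primitive of the integrals in (1.6) (note the resemblance to the Ito integral construction).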
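The telescoping identity above can be checked numerically. In the sketch below (an illustration, not part of the original notes) the price path, the rebalancing rule and all parameter values are arbitrary hypothetical choices; the only structural ingredient is that the new bank position is always determined by the budget constraint (1.7).

```python
import math
import random

rng = random.Random(1)
n, T, r = 50, 1.0, 0.04
dt = T / n
Q = [math.exp(r * j * dt) for j in range(n + 1)]        # bank account Q_{t_j}
S = [100.0]
for j in range(n):                                       # any positive price path works here
    S.append(S[-1] * math.exp(rng.gauss(0.0, 0.2 * math.sqrt(dt))))

beta, gamma = [1.0], [0.5]                               # portfolio chosen at t_0
V = [beta[0] * Q[0] + gamma[0] * S[0]]                   # initial capital V_0
for j in range(1, n + 1):
    V.append(beta[j - 1] * Q[j] + gamma[j - 1] * S[j])   # value after S_{t_j} is revealed
    g = rng.uniform(-1.0, 1.0)                           # new stock position (arbitrary rule)
    b = (V[j] - g * S[j]) / Q[j]                         # bank position forced by (1.7)
    beta.append(b)
    gamma.append(g)

# Accumulated gains: the "pre-limit" sums of the integrals in (1.6).
gains = sum(beta[j - 1] * (Q[j] - Q[j - 1]) + gamma[j - 1] * (S[j] - S[j - 1])
            for j in range(1, n + 1))
print(V[n] - V[0], gains)
```

The two printed numbers agree up to rounding, whatever rule is used to pick the stock position, which is the discrete counterpart of (1.6).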

Recall that the European call option is a contract between the seller and the buyer, according to which the holder pays a premium to the seller at time $t = 0$ and gets the right (but not the obligation) to buy the stock at the strike price $K$ at the maturity time $T$. If $S_T > K$, the holder exercises the option, i.e. buys the stock at the price $K$ and, assuming liquidity of the market, immediately sells it at the market price $S_T$, thus making the profit $S_T - K > 0$. If $S_T \le K$, the holder doesn't exercise the option.

In more general terms, a contingent claim with maturity time $T$ on the market $(Q, S)$ is an $\mathcal F_T$-measurable random variable $X_T$. The contingent claim is simple if it has the form $X_T := f(S_T)$ with a fixed measurable contract function $f$. In particular, for the European call option $f(s) := (s-K)^+$, $s \in \mathbb R_+$.

1.3. Arbitrage in continuous time. An important question in this context is how to set a fair premium for the option. One reasonable approach would be to choose the price so that adding the option as an additional asset to our market does not generate arbitrage possibilities.

First, let's recall that the classical notion of arbitrage suggests that a positive capital can be generated from zero investment with positive probability:

Definition 1.11. An arbitrage opportunity is a self-financing portfolio $\pi$ such that

$$V^\pi_0 = 0, \qquad \mathbb P(V^\pi_T \ge 0) = 1, \qquad \mathbb P(V^\pi_T > 0) > 0.$$

The negation of this definition gives

Definition 1.12. The market satisfies the No-Arbitrage (NA) property if for any self-financing portfolio $\pi$

$$V^\pi_0 = 0 \ \text{ and } \ \mathbb P(V^\pi_T \ge 0) = 1 \quad \Longrightarrow \quad \mathbb P(V^\pi_T > 0) = 0.$$

In what follows, it will be convenient to work with the normalized prices $\bar Q_t := Q_t/Q_t = 1$ and $\bar S_t := S_t/Q_t = e^{-rt}S_t$. By the Ito formula, the normalized stock price $\bar S$ satisfies the SDE

$$d\bar S_t = (\mu - r)\bar S_t\,dt + \sigma \bar S_t\,dB_t, \quad t \in [0,T]. \tag{1.8}$$

Define the normalized portfolio value process

$$\bar V^\pi_t = V^\pi_t/Q_t = e^{-rt}(\beta_t Q_t + \gamma_t S_t) = \beta_t + \gamma_t \bar S_t.$$

It is not a priori clear that a self-financing portfolio with respect to the market $(Q, S)$ remains self-financing for the market $(1, \bar S)$. The following calculation shows that, in fact, it does:

$$\begin{aligned}
d\bar V^\pi_t &= d(V^\pi_t/Q_t) = d(V^\pi_t e^{-rt}) = -rV^\pi_t e^{-rt}\,dt + e^{-rt}\,dV^\pi_t \\
&\overset{\dagger}{=} -r\bar V^\pi_t\,dt + e^{-rt}\big(\beta_t\,dQ_t + \gamma_t\,dS_t\big) \\
&= -r\big(\beta_t e^{-rt}Q_t + \gamma_t e^{-rt}S_t\big)dt + \beta_t e^{-rt} rQ_t\,dt + \gamma_t e^{-rt}\,dS_t \\
&= \gamma_t\big(-re^{-rt}S_t\,dt + e^{-rt}\,dS_t\big) = \gamma_t\,d\bar S_t,
\end{aligned}$$

where the equality † holds since $\pi$ is self-financing on the market $(Q_t, S_t)$.

Suppose now that $\pi$ is an arbitrage on the market $(Q, S)$, i.e. $V_0 = 0$ and $V_T \ge 0$ $\mathbb P$-a.s., but $\mathbb P(V_T > 0) > 0$. Then $\pi$ is also an arbitrage on the normalized market $(1, \bar S)$, since $\bar V_0 = V_0 = 0$ and $\bar V_T = e^{-rT}V_T \ge 0$, but $\mathbb P(\bar V_T > 0) = \mathbb P(V_T > 0) > 0$. Similarly, the converse is true: an arbitrage on the normalized market is also an arbitrage on the original market. Hence establishing the property NA for $(Q, S)$ (and other notions of arbitrage to be introduced below) reduces to establishing it for the market $(1, \bar S)$.

Remark 1.13. The normalizing asset is called the numeraire. The choice of the numeraire is arbitrary, as long as its price is positive. In our market, the price of the risk free asset is a natural choice.

When is a given market arbitrage free? The notion of equivalent martingale measure (EMM) plays the central role in tackling this question:

Definition 1.14. A measure $\tilde{\mathbb P} \sim \mathbb P$ is an EMM if the discounted stock price is a martingale under $\tilde{\mathbb P}$, i.e.

$$\tilde{\mathbb E}\big(S_t/Q_t \,\big|\, \mathcal F_s\big) = S_s/Q_s, \quad 0 \le s \le t \le T.$$

In the analogous discrete time setting, the first Fundamental Theorem of finance states that the market is arbitrage free (in the sense NA) if and only if the set of EMMs is nonempty. This statement can be proved in continuous time for a different notion of arbitrage free market, which we shall now introduce through a number of preliminary definitions.

Consider the following classes of admissible portfolios:

Definition 1.15. For $a \ge 0$, let

$$\Pi_a(\bar S) := \big\{\pi \in SF : \bar V^\pi_t \ge -a, \ t \in [0,T]\big\}, \qquad \Pi_+(\bar S) := \bigcup_{a \ge 0} \Pi_a(\bar S).$$

In words, $\Pi_a(\bar S)$ is the family of portfolios whose values do not drop below the level $-a$ on the time interval $[0,T]$, i.e. no debts of more than $a$ are allowed. $\Pi_+(\bar S)$ are the portfolios whose values do not drop below some non-random constant.

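In addition, we shall need the following classes of test functions:

Definition 1.16. For $a \ge 0$,³

$$\Psi_a := \Big\{\psi \in L^\infty(\Omega, \mathcal F_T, \mathbb P) : \psi \le \int_0^T \gamma_s\,d\bar S_s \ \text{ for some } \pi \in \Pi_a\Big\}$$

and

$$\Psi_+ := \Big\{\psi \in L^\infty(\Omega, \mathcal F_T, \mathbb P) : \psi \le \int_0^T \gamma_s\,d\bar S_s \ \text{ for some } \pi \in \Pi_+\Big\}.$$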

Finally, we are ready to define a number of non-equivalent notions of absence of arbitrage:

Definition 1.17. The market satisfies property $NA_a$ if⁴

$$\Psi_a \cap L^\infty_+(\Omega, \mathcal F_T, \mathbb P) = \{0\}.$$

The property $NA_a$ fails to hold if, for some portfolio $\pi \in \Pi_a$, there is a nonnegative random variable $\psi$ with nontrivial positive part, dominated by $\int_0^T \gamma_t\,d\bar S_t$. If this is the case, then taking $V^\pi_0 = 0$, we get $\bar V^\pi_T = \int_0^T \gamma_t\,d\bar S_t \ge \psi$ and hence $\mathbb P(V^\pi_T > 0) > 0$, which agrees with our understanding of arbitrage. Note however that $NA_a$ is not comparable with NA, since e.g. it involves only a particular type of policies.

The property $NA_+$ is defined similarly, using $\Psi_+$ instead of $\Psi_a$. Finally, let's introduce the property $\overline{NA}_+$, which is referred to as "no free lunch with vanishing risk":

Definition 1.18. The market satisfies $\overline{NA}_+$ if⁵

$$\overline{\Psi}_+ \cap L^\infty_+(\Omega, \mathcal F_T, \mathbb P) = \{0\}.$$

³ $L^\infty(\Omega, \mathcal F_T, \mathbb P)$ is the space of essentially bounded $\mathcal F_T$-measurable random variables. A random variable $\xi$ is essentially bounded if $\mathbb P(|\xi| \le c) = 1$ for a constant $c$.

⁴ $L^\infty_+(\Omega, \mathcal F_T, \mathbb P)$ is the set of all nonnegative essentially bounded $\mathcal F_T$-measurable random variables.

⁵ $\overline{\Psi}_+$ is the closure of $\Psi_+$ with respect to the norm of $L^\infty(\Omega, \mathcal F_T, \mathbb P)$:

$$\rho(\xi, \eta) = \inf\big\{C \ge 0 : \mathbb P(|\xi - \eta| \le C) = 1\big\}, \quad \xi, \eta \in L^\infty(\Omega, \mathcal F_T, \mathbb P),$$

i.e. the set of limits of all sequences from $\Psi_+$ that converge in the above metric.


The class $\overline{\Psi}_+ \cap L^\infty_+(\Omega, \mathcal F_T, \mathbb P)$ contains functions which are limits of functions with vanishing negative part, interpreted as the risk of getting $V^\pi_T < 0$.

Here is a typical form of the first fundamental theorem of finance in continuous time (whose proof goes well beyond the scope of our course and, in fact, the scope of most of the textbooks on the subject):

Theorem 1.19 (F. Delbaen, W. Schachermayer, 93). Suppose that the price processes on the normalized market are bounded semimartingales. Then the market is $\overline{NA}_+$ if and only if the set of equivalent martingale measures is nonempty.

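For the model at hand, the normalized stock price becomes a martingale under the measure⁶ $\tilde{\mathbb P}$, defined by

$$\frac{d\tilde{\mathbb P}}{d\mathbb P}(\omega) := Z_T(\omega) = \exp\bigg(-\frac{\mu - r}{\sigma}B_T - \frac12\Big(\frac{\mu - r}{\sigma}\Big)^2 T\bigg). \tag{1.9}$$

Indeed, by the Girsanov theorem, the process

$$\tilde B_t := B_t + \int_0^t \frac{\mu - r}{\sigma}\,ds$$

is the Brownian motion under $\tilde{\mathbb P}$ and

$$\bar S_t = S_0 + \int_0^t (\mu - r)\bar S_s\,ds + \int_0^t \sigma \bar S_s\,dB_s = S_0 + \int_0^t \sigma \bar S_s\,d\tilde B_s$$

is a martingale under $\tilde{\mathbb P}$ (think why). Hence $\tilde{\mathbb P}$ is an equivalent martingale measure. Theorem 1.19 is not applicable per se, since $\bar S_t$ is unbounded. However, a search through the literature for an appropriate version of this theorem should convince you that the market is nevertheless arbitrage free in the sense $\overline{NA}_+$.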
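The martingale property under the new measure can be probed by simulation: by (1.9), an expectation under the EMM equals the expectation under $\mathbb P$ of the same quantity weighted by $Z_T$. The sketch below is an illustration with hypothetical parameter values and a hypothetical helper name; it samples $B_T$ under $\mathbb P$ and compares the plain and the weighted means of the normalized price at time $T$.

```python
import math
import random

def discounted_price_and_density(s0, mu, r, sigma, T, b_T):
    """Return the normalized price at time T and the density Z_T for a given value b_T of B_T."""
    theta = (mu - r) / sigma                               # drift removed by the change of measure
    s_bar = s0 * math.exp((mu - r - 0.5 * sigma ** 2) * T + sigma * b_T)   # solution of (1.8)
    z = math.exp(-theta * b_T - 0.5 * theta ** 2 * T)      # density (1.9)
    return s_bar, z

rng = random.Random(2)
s0, mu, r, sigma, T = 100.0, 0.10, 0.04, 0.2, 1.0          # hypothetical market parameters
n = 200000
sum_z = sum_s = sum_zs = 0.0
for _ in range(n):
    b_T = rng.gauss(0.0, math.sqrt(T))                     # B_T under the original measure
    s_bar, z = discounted_price_and_density(s0, mu, r, sigma, T, b_T)
    sum_z += z
    sum_s += s_bar
    sum_zs += z * s_bar

mean_z, mean_s, mean_zs = sum_z / n, sum_s / n, sum_zs / n
print(mean_z, mean_s, mean_zs)
```

The weighted mean stays close to $S_0$, as a martingale started at $S_0$ requires, while the unweighted mean drifts upwards when $\mu > r$; the mean of the weights themselves is close to one, as it should be for a probability density.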

1.4. Option pricing through hedging. The second fundamental theorem in the continuous time setting states completeness of the market in the following sense:

Theorem 1.20. Assume that the set of EMMs contains only $\tilde{\mathbb P}$. Then for every contingent claim $X_T$ with $\tilde{\mathbb E}|X_T| < \infty$ there exists a self-financing policy such that $V_T = X_T$ $\mathbb P$-a.s.

Once again, the proof of this claim is beyond our scope. It can be shown that the measure $\tilde{\mathbb P}$, found above, is the only measure under which $\bar S$ is a martingale, and hence the Black-Scholes market under consideration is complete. In particular, the hedging portfolio $\pi$ exists for the European call option. This means that for some $x > 0$ (but, perhaps, not all!), there is a portfolio $\pi_t = (\beta_t, \gamma_t) \in SF$ such that

$$\bar V_T = x + \int_0^T \gamma_t\,d\bar S_t = e^{-rT}\big(e^{rT}\bar S_T - K\big)^+, \quad \mathbb P\text{-a.s.}$$

⁶ Since $T$ is fixed, we shall drop the subscript $T$ in $\tilde{\mathbb P}_T$.


It is clear intuitively that if it is possible to hedge the option with initial capital $x > 0$, then it is a fortiori possible to hedge it with a larger initial capital $x' > x$. This suggests that a fair price for the contingent claim is the minimal initial capital required for hedging. More precisely:

$$C(X_T) := \inf\big\{x \ge 0 : \exists \pi \in \Pi_+ \text{ such that } V^\pi_0 = x \text{ and } V^\pi_T = X_T\big\}.$$

If such a hedge with initial capital $C(X_T)$ is traded on the market, selling/buying the option for a smaller/bigger price will generate arbitrage (try to check it for the various definitions of arbitrage introduced above). It can be shown that on a complete market, the only arbitrage-free price of the option is the price of the minimal hedge.

A question of more practical interest is how the price $C(X_T)$ can be computed and what the corresponding hedge portfolio is. The following theorem gives an answer in the form of the (risk-neutral) valuation formula:

Theorem 1.21. Let $\tilde{\mathbb P}$ be the unique martingale measure on the market $(Q, S)$ and let $X_T$ be a nonnegative contingent claim with $\tilde{\mathbb E}X_T < \infty$. Then

$$C(X_T) = e^{-rT}\,\tilde{\mathbb E}X_T.$$

To prove this theorem we shall need the following result:

Theorem 1.22 (martingale representation theorem). Let $B$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal F, \mathcal F^B_t, \mathbb P)$, satisfying the usual conditions. For an $\mathcal F_T$-measurable random variable $X$ with $\mathbb E|X| < \infty$, there exists an essentially unique measurable adapted process $\psi_{t,T}(\omega) \in \mathcal P$, such that

$$\mathbb E(X|\mathcal F^B_t) = \mathbb EX + \int_0^t \psi_{s,T}(\omega)\,dB_s.$$

We shall not give the proof of this remarkable theorem, but rather taste it through examples.

Example 1.23. Let $X := B_T^2$. We saw that

$$B_T^2 = 2\int_0^T B_t\,dB_t + T,$$

which suggests the representation

$$\mathbb E(B_T^2|\mathcal F^B_t) = T + \int_0^t 2B_s\,dB_s,$$

i.e. $\psi_{t,T}(\omega) := 2B_t(\omega)$ in this case. Note that it doesn't depend on $T$, which is not typical. ■
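The identity behind this example can be observed on a single simulated path. The sketch below (an illustration with a hypothetical grid size and seed) approximates the Ito integral by left-endpoint sums; the discrete identity $B_T^2 = 2\sum_j B_{t_{j-1}}(B_{t_j} - B_{t_{j-1}}) + \sum_j (B_{t_j} - B_{t_{j-1}})^2$ is exact, and the last sum is close to $T$ on a fine partition.

```python
import math
import random

rng = random.Random(4)
T, n = 1.0, 20000
dt = T / n

b = 0.0            # current value B_{t_{j-1}}
ito_sum = 0.0      # left-endpoint (Ito) sums of B dB
quad_var = 0.0     # sum of squared increments, close to T for a fine partition
for _ in range(n):
    db = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += b * db
    quad_var += db * db
    b += db

lhs = b * b                    # B_T^2
rhs = 2.0 * ito_sum + T        # discretized right hand side of the example
print(lhs, rhs, quad_var)
```

The gap between `lhs` and `rhs` is exactly `quad_var - T`, which shrinks as the partition is refined.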

The martingale representation is quite hard to find explicitly, beyond simple cases such as the one above. Here is one remarkable exception:

Example 1.24. Consider the running maximum of the Brownian motion $B^*_t := \max_{s \le t} B_s$. For $T > 0$,

$$\mathbb E(B^*_T|\mathcal F^B_t) = \sqrt{\frac{2T}{\pi}} + \int_0^t 2\bigg(1 - \Phi\Big(\frac{B^*_s - B_s}{\sqrt{T - s}}\Big)\bigg)dB_s,$$

where $\Phi$ is the distribution function of the $N(0,1)$ random variable. In particular,

$$B^*_T = \sqrt{\frac{2T}{\pi}} + \int_0^T 2\bigg(1 - \Phi\Big(\frac{B^*_t - B_t}{\sqrt{T - t}}\Big)\bigg)dB_t.$$

■

Now we are ready for the proof:

Proof of Theorem 1.21. Note that

$$C(X_T) = \inf\big\{x \ge 0 : \exists \pi \in \Pi_+ \text{ such that } \bar V^\pi_0 = x \text{ and } \bar V^\pi_T = \bar X_T\big\},$$

where $\bar X_T = e^{-rT}X_T$. Hence we shall work with the normalized market and show that

$$C(X_T) = \tilde{\mathbb E}\bar X_T.$$

Take a hedging portfolio $\pi \in \Pi_+$ with initial capital $x \ge 0$. By the self-financing property

$$\bar V^\pi_t = \bar V^\pi_0 + \int_0^t \gamma_s\,d\bar S_s = \bar V^\pi_0 + \int_0^t \gamma_s \sigma \bar S_s\,d\tilde B_s.$$

Under $\tilde{\mathbb P}$, defined by (1.9), $\tilde B_t := B_t + \frac{\mu - r}{\sigma}t$ is a Brownian motion and the stochastic integral is a local martingale (for the admissible $\gamma_t$, we cannot claim that it is a martingale). However, by the definition of $\Pi_+$ and the Fatou lemma (see Remark 8.6 for the similar argument), it is a supermartingale and hence

$$\tilde{\mathbb E}\bar V^\pi_T \le \bar V^\pi_0 = x.$$

Since the portfolio is a hedge, it follows that

$$\tilde{\mathbb E}\bar X_T \le \bar V^\pi_0 = x.$$

The left hand side does not depend on the portfolio, which implies the bound

$$\tilde{\mathbb E}\bar X_T \le C(X_T).$$

It is left to exhibit a hedging portfolio $\pi \in \Pi_+$ such that

$$\bar V^\pi_0 = \tilde{\mathbb E}\bar X_T \quad \text{and} \quad \bar V^\pi_T = \bar X_T.$$

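To this end, consider the process

$$\bar V_t := \tilde{\mathbb E}(\bar X_T|\mathcal F_t) = Z_t^{-1}\,\mathbb E(Z_T \bar X_T|\mathcal F_t),$$

where $Z_T$ is defined in (1.9). Note that the process $\bar V_t$ satisfies the same conditions as required from $\bar V^\pi_t$ above:

$$\bar V_0 = \tilde{\mathbb E}\bar X_T \quad \text{and} \quad \bar V_T = \bar X_T.$$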

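By the martingale representation theorem, there exists a process $\psi \in \mathcal P$ such that

$$\mathbb E(Z_T \bar X_T|\mathcal F_t) = \mathbb E Z_T \bar X_T + \int_0^t \psi_{s,T}\,dB_s.$$

Recall that

$$Z_t = 1 - \int_0^t \frac{\mu - r}{\sigma}Z_s\,dB_s,$$

and hence by the Ito formula

$$dZ_t^{-1} = -Z_t^{-2}\,dZ_t + \frac12\,2Z_t^{-3}\Big(\frac{\mu - r}{\sigma}\Big)^2 Z_t^2\,dt = \frac{\mu - r}{\sigma}Z_t^{-1}\,dB_t + \Big(\frac{\mu - r}{\sigma}\Big)^2 Z_t^{-1}\,dt.$$

Set $\xi_t := \mathbb E(Z_T \bar X_T|\mathcal F_t)$ and apply the Ito formula to $Z_t^{-1}\xi_t$:

$$\begin{aligned}
d(Z_t^{-1}\xi_t) &= Z_t^{-1}\,d\xi_t + \xi_t\,dZ_t^{-1} + \psi_{t,T}\frac{\mu - r}{\sigma}Z_t^{-1}\,dt \\
&= Z_t^{-1}\psi_{t,T}\,dB_t + \xi_t\frac{\mu - r}{\sigma}Z_t^{-1}\,dB_t + \xi_t\Big(\frac{\mu - r}{\sigma}\Big)^2 Z_t^{-1}\,dt + \psi_{t,T}\frac{\mu - r}{\sigma}Z_t^{-1}\,dt \\
&= Z_t^{-1}\sigma\Big(\frac{\psi_{t,T}}{\sigma} + \xi_t\frac{\mu - r}{\sigma^2}\Big)dB_t + Z_t^{-1}(\mu - r)\Big(\frac{\psi_{t,T}}{\sigma} + \xi_t\frac{\mu - r}{\sigma^2}\Big)dt \\
&= Z_t^{-1}\Big(\frac{\psi_{t,T}}{\sigma} + \xi_t\frac{\mu - r}{\sigma^2}\Big)\big((\mu - r)\,dt + \sigma\,dB_t\big) \\
&= \frac{\frac{\psi_{t,T}}{\sigma} + \xi_t\frac{\mu - r}{\sigma^2}}{Z_t \bar S_t}\,\bar S_t\big((\mu - r)\,dt + \sigma\,dB_t\big).
\end{aligned}$$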

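Now define the portfolio $\pi_t = (\beta_t, \gamma_t)$:

$$\beta_t := \bar V_t - \gamma_t \bar S_t, \qquad \gamma_t := \frac{\frac{\psi_{t,T}}{\sigma} + \xi_t\frac{\mu - r}{\sigma^2}}{Z_t \bar S_t}. \tag{1.10}$$

The so defined portfolio is self-financing:

$$\bar V^\pi_t = \beta_t + \gamma_t \bar S_t = \bar V_t = \tilde{\mathbb E}\bar X_T + \int_0^t \gamma_s\,d\bar S_s.$$

Since $\bar V_t \ge 0$, $\pi \in \Pi_0 \subseteq \Pi_+$ and, moreover, $\bar V^\pi_0 = \tilde{\mathbb E}\bar X_T$ and $\bar V^\pi_T = \bar X_T$, as required. □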
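For a simple claim the valuation formula of Theorem 1.21 can be approximated by plain Monte Carlo, since under the EMM the stock price at time $T$ equals $S_0\exp\big((r - \frac12\sigma^2)T + \sigma\tilde B_T\big)$. The following is a minimal sketch under that observation; the function name `mc_price`, the sample size and the parameter values are hypothetical.

```python
import math
import random

def mc_price(f, s0, r, sigma, T, n_samples, rng):
    """Estimate exp(-rT) * E~ f(S_T), sampling S_T under the martingale measure."""
    total = 0.0
    for _ in range(n_samples):
        b = rng.gauss(0.0, math.sqrt(T))                   # Brownian motion under the EMM at time T
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * b)
        total += f(s_T)
    return math.exp(-r * T) * total / n_samples

s0, r, sigma, T, K = 100.0, 0.04, 0.2, 1.0, 100.0          # hypothetical parameters
call_price = mc_price(lambda s: max(s - K, 0.0), s0, r, sigma, T, 200000, random.Random(3))
stock_price = mc_price(lambda s: s, s0, r, sigma, T, 200000, random.Random(3))
print(call_price, stock_price)
```

Pricing the claim $f(s) = s$ serves as a sanity check: the estimate should be close to $S_0$, because the discounted stock price is a martingale under the EMM. Note that the drift $\mu$ does not enter the computation at all.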

Note that computing the hedging portfolio using the formulas (1.10) can be quite hard, since e.g. they require finding the martingale representation for a quite complicated random variable. When $X_T$ is a simple contingent claim, i.e. $\bar X_T = e^{-rT}f(S_T)$, a trick provides an alternative! The price process $S$ is Markov and hence

$$\bar V^\pi_t = Z_t^{-1}\,\mathbb E(Z_T \bar X_T|\mathcal F_t) = \mathbb E\Big(\frac{Z_T}{Z_t}\bar X_T\Big|\mathcal F_t\Big) = \mathbb E\Big(\frac{Z_T}{Z_t}\bar X_T\Big|\mathcal F^S_t\Big) = \mathbb E\Big(\frac{Z_T}{Z_t}\bar X_T\Big|\bar S_t\Big),$$

where the latter equality holds by the Markov property of $S$ (think why). Hence there is a function $\bar C(t, T, s)$ such that

$$\bar V^\pi_t = \bar C(t, T, \bar S_t).$$

Now suppose this function is $C^{1,2}$. Then by the Ito formula:

$$\bar C(t, T, \bar S_t) = \bar C(0, T, \bar S_0) + \int_0^t \bar C_t(u, T, \bar S_u)\,du + \int_0^t \bar C_s(u, T, \bar S_u)\,d\bar S_u + \frac12\int_0^t \bar C_{ss}(u, T, \bar S_u)\sigma^2 \bar S_u^2\,du,$$

where we use subscripts to denote the partial derivatives of $\bar C(t, T, s)$ with respect to $t$ and $s$ respectively. On the other hand,

$$\bar C(t, T, \bar S_t) = \bar V^\pi_t = \mathbb E(Z_T \bar X_T) + \int_0^t \gamma_u\,d\bar S_u.$$

By the uniqueness of the Doob-Meyer decomposition, it follows that

$$\gamma_t = \bar C_s(t, T, \bar S_t),$$

and by the first formula in (1.10),

$$\beta_t = \bar C(t, T, \bar S_t) - \bar C_s(t, T, \bar S_t)\bar S_t.$$

Another conclusion is that $\mathbb P$-a.s.

$$\bar C_t(t, T, \bar S_t) + \frac12 \bar C_{ss}(t, T, \bar S_t)\sigma^2 \bar S_t^2 = 0.$$

Since $\bar S_t$ has a positive density (explain why), it follows that

$$\bar C_t(t, T, s) + \frac12 \bar C_{ss}(t, T, s)\sigma^2 s^2 = 0, \quad s \in \mathbb R_+,$$

subject to $\bar C(T, T, s) = e^{-rT}f(e^{rT}s)$.

In terms of the original market $(Q, S)$, the value of the hedging portfolio is given by

$$C(t, T, s) := e^{rt}\,\bar C(t, T, e^{-rt}s).$$


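Hence

$$\begin{aligned}
C_t(t, T, s) &= \partial_t\big(e^{rt}\bar C(t, T, e^{-rt}s)\big) \\
&= re^{rt}\bar C(t, T, e^{-rt}s) + e^{rt}\big(\bar C_t(t, T, e^{-rt}s) - re^{-rt}s\,\bar C_s(t, T, e^{-rt}s)\big) \\
&= re^{rt}\bar C(t, T, e^{-rt}s) - e^{rt}\Big(\frac12 \bar C_{ss}(t, T, e^{-rt}s)\sigma^2 e^{-2rt}s^2 + re^{-rt}s\,\bar C_s(t, T, e^{-rt}s)\Big) \\
&= rC(t, T, s) - \frac12 C_{ss}(t, T, s)\sigma^2 s^2 - rs\,C_s(t, T, s),
\end{aligned}$$

and $C(T, T, s) = f(s)$. Moreover,

$$C_s(t, T, s) = e^{rt}\,e^{-rt}\,\bar C_s(t, T, e^{-rt}s) = \bar C_s(t, T, e^{-rt}s)$$

and

$$\gamma_t = \bar C_s(t, T, \bar S_t) = C_s(t, T, S_t).$$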

To recap, the value of the hedging portfolio $C(t, T, s)$ at time $t \in [0,T]$ of a simple contingent claim with maturity time $T$ and contract function $f$ satisfies the PDE terminal value problem

$$\begin{aligned}
&C_t + rs\,C_s + \frac12\sigma^2 s^2 C_{ss} = rC \\
&C(T, T, s) = f(s)
\end{aligned} \tag{1.11}$$

and the hedging portfolio is given by

$$\begin{aligned}
\beta_t &= e^{-rt}\big(C(t, T, S_t) - S_t C_s(t, T, S_t)\big) \\
\gamma_t &= C_s(t, T, S_t).
\end{aligned} \tag{1.12}$$

The equation (1.11) is called the fundamental PDE for the value of the hedging portfolio. The valuation formula is the Feynman-Kac stochastic representation of the solution of the fundamental PDE (cf. Theorem 7.1). Do not forget that we assumed that $C(t, T, s)$ is $C^{1,2}$, which is not at all clear at the outset.
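As a numerical sanity check of (1.11), one can plug the closed-form call price (1.13) of Proposition 1.25 into the PDE and approximate the derivatives by central differences. This is only a sketch: the helper names, the step sizes `ht` and `hs`, the test points and the parameter values are all hypothetical choices.

```python
import math

def norm_cdf(x):
    """Standard Gaussian c.d.f. Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, T, s, K, r, sigma):
    """Black-Scholes call value (1.13), for t < T and s > 0."""
    tau = T - t
    d1 = (math.log(s / K) + tau * (r + 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    d2 = (math.log(s / K) + tau * (r - 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    return s * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(t, T, s, K, r, sigma, ht=1e-4, hs=1e-2):
    """Central-difference approximation of C_t + r s C_s + 0.5 sigma^2 s^2 C_ss - r C."""
    c = bs_call(t, T, s, K, r, sigma)
    c_up = bs_call(t, T, s + hs, K, r, sigma)
    c_dn = bs_call(t, T, s - hs, K, r, sigma)
    c_t = (bs_call(t + ht, T, s, K, r, sigma) - bs_call(t - ht, T, s, K, r, sigma)) / (2.0 * ht)
    c_s = (c_up - c_dn) / (2.0 * hs)
    c_ss = (c_up - 2.0 * c + c_dn) / hs ** 2
    return c_t + r * s * c_s + 0.5 * sigma ** 2 * s ** 2 * c_ss - r * c

K, r, sigma, T = 100.0, 0.04, 0.2, 3.0                     # hypothetical parameters
residuals = [pde_residual(t, T, s, K, r, sigma)
             for t in (0.5, 1.5, 2.5) for s in (80.0, 100.0, 120.0)]
max_residual = max(abs(x) for x in residuals)
print(max_residual)
```

The residual is small compared with the individual terms of the equation (for instance, $rC$ is of order one at these points), which is consistent with (1.13) solving (1.11), although of course it does not prove it.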

For the European call option explicit pricing formulas are available:

Proposition 1.25 (the Black-Scholes formula). The hedge price of the European call option with maturity $T$ and the strike price $K > 0$ on the market $(Q, S)$, governed by (1.4) and (1.5), is given by

$$C(t, T, S_t) = S_t\,\Phi\bigg(\frac{\log\frac{S_t}{K} + (T - t)(r + \frac12\sigma^2)}{\sigma\sqrt{T - t}}\bigg) - Ke^{-r(T - t)}\,\Phi\bigg(\frac{\log\frac{S_t}{K} + (T - t)(r - \frac12\sigma^2)}{\sigma\sqrt{T - t}}\bigg), \tag{1.13}$$

where $\Phi$ is the c.d.f. of a standard Gaussian random variable. The corresponding hedging portfolio is

$$\gamma_t := \Phi\bigg(\frac{\log\frac{S_t}{K} + (T - t)(r + \frac12\sigma^2)}{\sigma\sqrt{T - t}}\bigg) \tag{1.14}$$

and

$$\beta_t := -Ke^{-rT}\,\Phi\bigg(\frac{\log\frac{S_t}{K} + (T - t)(r - \frac12\sigma^2)}{\sigma\sqrt{T - t}}\bigg). \tag{1.15}$$

Proof. (exercise) □

The Black-Scholes formulas are illustrated in Figure 2. The horizontal axis is the current price of the stock on the market at a particular time (the blue line corresponds to $t = 0$ and the red line to $t = 2.5$) and the vertical axis is the value of the hedging portfolio $C(t, T, s)$. Note first that $C(t, T, s) \le s$, as should be expected: after all, there is no point in buying an option for a price greater than or equal to that of the stock itself (think why). Further, $s \mapsto C(t, T, s)$ is an increasing function: for a larger current price $s$, the option is likely to be exercised and the seller will pay the amount of $s - K$, which increases with $s$. If the current price is small close to the maturity time (as e.g. at $t = 2.5$), the option price is very low, since it is unlikely to be exercised at all. On the other hand, at time $t = 0$, the option price is higher, since it is possible that the price will grow towards the maturity time.

[Figure 2: B-S formula for the European call option with K = 100, T = 3, σ = 0.0174, r = 0.04. Horizontal axis: s (from 90 to 120); vertical axis: C(t,T,s) (from 0 to 35). Curves: the payoff (s−K)+, C(2.5,T,s) and C(0,T,s).]

Figure 2. The option prices calculated by the Black-Scholes formulas
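The formulas (1.13)-(1.15) are straightforward to evaluate, and the quality of the hedge can be probed by rebalancing the portfolio at finitely many dates only. The sketch below is an illustration under hypothetical parameters (in particular, the stock is simulated under $\mathbb P$ with an arbitrarily chosen drift $\mu$); the function names are not from the notes. Between rebalancing dates the positions are frozen, and at each date the bank position is adjusted so that the strategy stays self-financing.

```python
import math
import random

def norm_cdf(x):
    """Standard Gaussian c.d.f. Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_hedge(t, T, s, K, r, sigma):
    """Return (C, beta, gamma) according to (1.13), (1.15) and (1.14), for t < T and s > 0."""
    tau = T - t
    d1 = (math.log(s / K) + tau * (r + 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    d2 = (math.log(s / K) + tau * (r - 0.5 * sigma ** 2)) / (sigma * math.sqrt(tau))
    gamma = norm_cdf(d1)                              # units of stock, (1.14)
    beta = -K * math.exp(-r * T) * norm_cdf(d2)       # units of the bank account, (1.15)
    price = beta * math.exp(r * t) + gamma * s        # beta_t Q_t + gamma_t S_t, i.e. (1.13)
    return price, beta, gamma

def hedge_error(s0, K, mu, r, sigma, T, n, rng):
    """Rebalance at n equally spaced dates and return V_T - (S_T - K)^+ for one path."""
    dt = T / n
    s = s0
    value, beta, gamma = bs_hedge(0.0, T, s, K, r, sigma)   # start with capital C(0, T, s0)
    for j in range(1, n + 1):
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * rng.gauss(0.0, math.sqrt(dt)))
        t = j * dt
        value = beta * math.exp(r * t) + gamma * s          # old portfolio at the new prices
        if j < n:
            _, _, gamma = bs_hedge(t, T, s, K, r, sigma)    # new stock position
            beta = (value - gamma * s) / math.exp(r * t)    # bank position from the budget constraint
    return value - max(s - K, 0.0)

def rms_hedge_error(n, n_paths, seed):
    """Root mean square hedging error over simulated paths (hypothetical parameters)."""
    rng = random.Random(seed)
    errs = [hedge_error(100.0, 100.0, 0.10, 0.04, 0.2, 1.0, n, rng) for _ in range(n_paths)]
    return math.sqrt(sum(e * e for e in errs) / n_paths)

price0, beta0, gamma0 = bs_hedge(0.0, 1.0, 100.0, 100.0, 0.04, 0.2)
rms_coarse = rms_hedge_error(10, 300, seed=5)
rms_fine = rms_hedge_error(250, 300, seed=5)
print(price0, beta0, gamma0, rms_coarse, rms_fine)
```

With more frequent rebalancing the terminal value of the portfolio gets closer to the payoff $(S_T - K)^+$, even though the simulation uses the drift $\mu$ and the hedge does not.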


1.5. Exercises.

Problem 3.1.5. Derive the Black-Scholes formula from Proposition 1.25, following the steps:

(1) Check that

$$C(t, T, S_t) = e^{rt}\,\tilde{\mathbb E}\big(e^{-rT}(S_T - K)^+ \,\big|\, \mathcal F_t\big).$$

(2) Using the probability law of $S$ under $\tilde{\mathbb P}$, derive the explicit formula (1.13) for $C(t, T, s)$.

(3) Check that $C(t, T, s)$ is a $C^{1,2}$ function in $(t, s) \in \mathbb R_+ \times \mathbb R_+$.

(4) Derive the formulas (1.14) and (1.15) for $\gamma_t$ and $\beta_t$, using the formulas (1.12).

Problem 3.1.6. Louis Bachelier was the first to suggest using the Brownian motion to model the stock prices⁷. Ignoring positiveness of the prices, he considered the following market model

$$Q_t = 1, \qquad S_t = S_0 + \mu t + \sigma B_t, \quad t \in [0,T].$$

In this problem, we shall derive explicit formulas for the hedge price of the European call option and the corresponding hedging portfolio for such a market.

⁷ Read more on http://en.wikipedia.org/wiki/Louis_Bachelier

(1) Find the R-N derivative defining the equivalent martingale measure $\tilde{\mathbb P}$ for this market.

(2) Following the proof of Theorem 1.21, verify the valuation formula for pricing a contingent claim $X_T$ on the Bachelier market

$$C(X_T) = \tilde{\mathbb E}X_T,$$

and show that the value of the corresponding hedging portfolio $\pi$ is given by

$$V^\pi_t = \tilde{\mathbb E}(X_T|\mathcal F_t).$$

(3) Show that the option price at time $t \in [0,T]$ for the European call option with the strike price $K$ and maturity $T$ is given by the Bachelier formula

$$C(t, T, S_t) = (S_t - K)\,\Phi\Big(\frac{S_t - K}{\sigma\sqrt{T - t}}\Big) + \sigma\sqrt{T - t}\;\varphi\Big(\frac{S_t - K}{\sigma\sqrt{T - t}}\Big),$$

where $\Phi$ and $\varphi$ are the c.d.f. and the p.d.f. of the standard Gaussian random variable.

(4) Check that $C(t, T, s)$ is a $C^{1,2}$ function in $(t, s) \in \mathbb R_+ \times \mathbb R$. Show that the hedging portfolio is given by

$$\beta_t = C(t, T, S_t) - \gamma_t S_t, \qquad \gamma_t = \Phi\Big(\frac{S_t - K}{\sigma\sqrt{T - t}}\Big),$$

and derive the corresponding fundamental PDE

$$C_t + \frac12\sigma^2 C_{ss} = 0, \qquad C(T, T, s) = (s - K)^+.$$

(5) Plot $C(0, 3, s)$ and $C(2, 3, s)$ as a function of $s$ for $K = 100$, $\sigma = 0.0174$, $T = 3$. Discuss the obtained picture.
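As a starting point for item (5), the Bachelier formula from item (3) can be tabulated as follows. This is only a sketch: the function names and the grid of prices are hypothetical, and producing and discussing the actual plot is left to the reader.

```python
import math

def norm_cdf(x):
    """Standard Gaussian c.d.f. Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard Gaussian p.d.f. phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bachelier_call(t, T, s, K, sigma):
    """Bachelier call value from item (3), for t < T (here Q_t = 1)."""
    sd = sigma * math.sqrt(T - t)          # standard deviation of S_T - S_t
    d = (s - K) / sd
    return (s - K) * norm_cdf(d) + sd * norm_pdf(d)

K, sigma, T = 100.0, 0.0174, 3.0           # the values suggested in item (5)
grid = [99.90 + 0.02 * i for i in range(11)]   # hypothetical grid of current prices around K
table = [(s, bachelier_call(0.0, T, s, K, sigma), bachelier_call(2.0, T, s, K, sigma))
         for s in grid]
for s, c0, c2 in table:
    print(round(s, 2), c0, c2)
```

The grid is concentrated near the strike because, with these parameter values, the standard deviation of $S_T - S_t$ is only a few hundredths, so away from $K$ the value is practically indistinguishable from the payoff.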

Problem 3.1.7. The European put option on the stock price is a simple contingent claim with maturity time $T$ and the contract function $f(s) = (K - s)^+$.

(1) Give a hypothetical real life example of a contract corresponding to the European put option.

(2) Derive explicit formulas for the hedging portfolio and its value $C(t, T, s)$ on the Black-Scholes market, as in Problem 3.1.5.

(3) Plot $C(0, 3, s)$ and $C(2, 3, s)$ as a function of $s$ for $K = 100$, $\sigma = 0.0174$, $r = 0.04$ and $T = 3$. Discuss the obtained picture.

Problem 3.1.8. The Asian call option with maturity time $T$ and the strike price $K > 0$ is a contingent claim of the form

$$X_T = \Big(\frac1T\int_0^T S_u\,du - K\Big)^+.$$

In this problem we shall price this option on the Black-Scholes market.

(1) Explain why $X_T$ is not a simple contingent claim.

(2) Prove the valuation formula for the hedging portfolio value

$$C(t, T, S_t, A_t) = e^{-r(T - t)}\,\tilde{\mathbb E}\big((A_T - K)^+ \,\big|\, \mathcal F_t\big) = e^{-r(T - t)}\,\tilde{\mathbb E}\big((A_T - K)^+ \,\big|\, A_t, S_t\big),$$

where

$$A_t = \frac1T\int_0^t S_u\,du.$$

(3) Assuming that the function $C(t, T, s, a)$ is sufficiently smooth, show that it satisfies⁸ the backward PDE:

$$C_t + rs\,C_s + \frac12\sigma^2 s^2 C_{ss} + \frac{s}{T}C_a - rC = 0, \qquad C(T, T, s, a) = (a - K)^+,$$

and the hedging portfolio is given by

$$\beta_t = e^{-rt}\big(C(t, T, S_t, A_t) - S_t C_s(t, T, S_t, A_t)\big), \qquad \gamma_t = C_s(t, T, S_t, A_t).$$

⁸ This PDE, like most PDEs, does not admit an explicit solution and is solved numerically in practice.
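Besides solving the PDE numerically, the valuation formula of item (2) at $t = 0$ can be approximated by Monte Carlo. The following is a rough sketch, not a production method: the function name, the trapezoidal approximation of the time integral, the sample sizes and the parameter values are hypothetical choices. The case $K = 0$ provides a check, since then the price equals $e^{-rT}\,\tilde{\mathbb E}A_T = S_0(1 - e^{-rT})/(rT)$, because $\tilde{\mathbb E}S_u = S_0e^{ru}$.

```python
import math
import random

def asian_call_mc(s0, K, r, sigma, T, n_steps, n_paths, rng):
    """Estimate exp(-rT) * E~ (A_T - K)^+, with A_T = (1/T) * integral of S over [0, T]."""
    dt = T / n_steps
    total = 0.0
    for _path in range(n_paths):
        s = s0
        integral = 0.0
        for _step in range(n_steps):
            # exact step of the stock price under the martingale measure
            s_next = s * math.exp((r - 0.5 * sigma ** 2) * dt
                                  + sigma * rng.gauss(0.0, math.sqrt(dt)))
            integral += 0.5 * (s + s_next) * dt          # trapezoidal rule for the time integral
            s = s_next
        total += max(integral / T - K, 0.0)
    return math.exp(-r * T) * total / n_paths

s0, r, sigma, T = 100.0, 0.04, 0.2, 3.0                  # hypothetical parameters
price_k0 = asian_call_mc(s0, 0.0, r, sigma, T, 30, 10000, random.Random(6))
price_k100 = asian_call_mc(s0, 100.0, r, sigma, T, 30, 10000, random.Random(6))
exact_k0 = s0 * (1.0 - math.exp(-r * T)) / (r * T)
print(price_k0, exact_k0, price_k100)
```

The estimate carries both a statistical error, which decreases with the number of paths, and a discretization error from replacing the integral by a finite sum.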

Bibliography

[1] I. Karatzas, S.E. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed., Springer, 1991.

[2] T. Bjork, Arbitrage Theory in Continuous Time, 3rd ed., Oxford University Press, 2009.

[3] A. Shiryaev, Essentials of Stochastic Finance: Facts, Models, Theory, World Scientific Pub, 1999.

[4] A. Shiryaev, Probability, 2nd ed., Springer, 1995.
