STOCHASTIC PROCESSES
Dr. Christoph Belak
Winter Term 2017/18
Christoph Belak
Department IV – Mathematics
University of Trier
Universitätsring 19
D–54269 Trier
Germany
CONTENTS
1 Stochastic Processes
  1.1 Examples of Stochastic Processes
  1.2 Filtrations, Stopping Times, and Hitting Times

2 Brownian Motion
  2.1 Definition of Brownian Motion
  2.2 Gaussian Processes
  2.3 Path Properties of Brownian Motion
  2.4 The Kolmogorov Consistency Theorem
  2.5 The Kolmogorov-Centsov Continuity Theorem
  2.6 Existence of Brownian Motion

3 Martingales
  3.1 Definition and Examples
  3.2 Discrete Time Martingales
  3.3 Paths of Continuous Martingales

4 Martingale Convergence
  4.1 Upcrossings and Submartingale Limits
  4.2 The Martingale Convergence Theorem
  4.3 Continuous Time Martingales
  4.4 Càdlàg Modifications

5 Lévy Processes
  5.1 Definition and Examples
  5.2 Properties of Lévy Processes
  5.3 The Strong Markov Property
  5.4 Characterization of Brownian Motion and Poisson Process
  5.5 The Lévy-Ito Decomposition

Index
Vocabulary
CHAPTER 1
STOCHASTIC PROCESSES
In the course on probability theory, the main object of study was that of a random variable. Random variables are often interpreted as mathematical models of static random phenomena, that is, random phenomena which occur only once. Think for example of the lottery next weekend, the outcome of which could be modeled as a hypergeometric random variable.
In this course, we direct our focus towards the mathematics behind dynamic random phenomena, i.e. random phenomena which have an additional time component. Examples could be the sunshine hours each day next year, stock prices, or even the number of students showing up in each lecture of this course. A natural way of modeling these phenomena is to use sequences of random variables, or, more generally, families of random variables indexed by a suitable time index set. These families will be referred to as stochastic processes and are the main object of interest in this course.
To formalize the notion of a stochastic process mathematically, we take as given a probability space (Ω, A, P) (the source of uncertainty) as well as a measurable space (S, S) (the space of possible outcomes at each time instant). Moreover, we fix a time index set T, which is assumed to be a subset of the extended real line R̄ := R ∪ {−∞, ∞}.
Definition 1.1 (Stochastic Process). A family X = {X(t)}_{t∈T} of S-valued random variables is called a stochastic process.
Observe that the random variables {X(t)}_{t∈T} are defined on the same probability space and take values in the same measurable space. Typical choices for the time index set are T = {1, . . . , N} for N ∈ N, T = N0, T = [0,∞), or T = [0, T] for some 0 < T ≤ ∞.
The parameter t in the definition of a stochastic process is interpreted as a time parameter and hence the time index set T can be thought of as the set of all possible time points. The random variable X(t) is the model for the dynamic random phenomenon at time t. For example, if we choose T = {1, 2, . . . , 365} and S = {1, 2, . . . , 24}, then X(1) is a model for the number of sunshine hours on the first day next year, X(2) is our model for the number of sunshine hours on the second day next year, and so on.
In Definition 1.1, we take the point of view that for each t ∈ T fixed, the time-t value of the stochastic process is described by the random variable

X(t) : Ω → S, ω ↦ X(t, ω).
On the other hand, we may also turn this around, keep ω ∈ Ω fixed, and think of the stochastic process as a mapping

X(·, ω) : T → S, t ↦ X(t, ω).

The mapping X(·, ω) is called a path of the stochastic process and describes the dynamic evolution if the state of the world is ω ∈ Ω.
Figure 1.1 Interpreting X as a family of random variables vs. interpreting X as a path-valued random variable. (The plot shows paths X(·, ω1), X(·, ω2), X(·, ω3) of value X(t) against time t, with the values X(t1, ωi) and X(t2, ωi) marked at two fixed times t1 and t2.)
Taking this interpretation one step further, we may also think of a stochastic process as a random variable taking values in the set of all paths, i.e.

X : Ω → S^T, ω ↦ X(·, ω) = {X(t, ω)}_{t∈T},

where S^T := {f : T → S} is the set of all functions from T to S, and S^T is equipped with the product σ-field S^T := ⊗_{t∈T} S. Indeed, the path-valued function X : Ω → S^T is A-S^T-measurable if and only if the S-valued function X(t) : Ω → S is A-S-measurable for all t ∈ T.
Exercise 1. Let X = {X(t)}_{t∈T} be a stochastic process. Show that X is A-S^T-measurable if and only if X(t) is A-S-measurable for all t ∈ T.
Since we can regard stochastic processes as path-valued random variables, each stochastic process induces a probability measure on (S^T, S^T).
Definition 1.2 (Distribution of a Process). Let X = {X(t)}_{t∈T} be a stochastic process. Then the distribution P_X := P[X ∈ ·] on (S^T, S^T) of the path-valued random variable

X : Ω → S^T, ω ↦ X(·, ω) = {X(t, ω)}_{t∈T},

is called the distribution of X.
The distribution of a stochastic process X = {X(t)}_{t∈T} can equivalently be described by the family of all joint distributions of the process evaluated at finitely many time points.
Definition 1.3 (Finite-Dimensional Distributions). Let X = {X(t)}_{t∈T} be a stochastic process and let F = [t1, . . . , tn] be a finite subset¹ of T. Then we denote by P_F the joint distribution of X(t1), . . . , X(tn), i.e.

P_F := P[(X(t1), . . . , X(tn)) ∈ ·] on S^F = ⊗_{k=1,...,n} S.

We call the family {P_F : F = [t1, . . . , tn] ⊂ T, n ∈ N} the finite-dimensional distributions of the stochastic process X.
The finite-dimensional distributions of a stochastic process characterize its distribution in the following sense: two stochastic processes X = {X(t)}_{t∈T} and Y = {Y(t)}_{t∈T} have the same finite-dimensional distributions if and only if X and Y have the same distribution. This is a simple consequence of the fact that the product σ-field S^T is generated by the π-system of all finite-dimensional cylinder subsets of S^T, i.e. all sets of the form

⨉_{t∈T} B_t, where B_t ∈ S for all t ∈ T and B_t ≠ S for at most finitely many t ∈ T.

Since two probability measures coincide if and only if they coincide on a π-system generating the underlying σ-field, the statement follows. Hence, if we are only interested in distributional properties of a stochastic process, it suffices to look at the finite-dimensional distributions.

¹ [t1, . . . , tn] denotes the set {t1, . . . , tn} with the convention that t1 < . . . < tn.
Exercise 2. Let X = {X(t)}_{t∈T} and Y = {Y(t)}_{t∈T} be two stochastic processes. Show that X and Y have the same distribution if and only if X and Y have the same finite-dimensional distributions.
1.1 Examples of Stochastic Processes
Before we proceed any further, let us take a look at several examples of stochastic processes. We distinguish two different types: discrete time and continuous time stochastic processes.

Definition 1.4 (Discrete/Continuous Time Processes). Let X = {X(t)}_{t∈T} be a stochastic process. If the time index set T is either finite or countably infinite, we say that X is a discrete time stochastic process. Otherwise we call X a continuous time stochastic process.

Our first example of a stochastic process is a rather trivial one and, as it turns out, we have encountered it many times before.

Example 1.5 (White Noise). Let {Z_t}_{t∈T} be a family of independent and identically distributed random variables with values in S := R. Then this family may be regarded as a stochastic process X = {X(t)}_{t∈T} by setting

X(t) := Z_t for all t ∈ T.

The process X is known as the white noise process.
Figure 1.2 Path of a white noise process X constructed from normally distributed random variables.
Observe that the time index set T of the white noise process can be chosen arbitrarily (in particular, it can both be a discrete time and a continuous time process), but is typically chosen to be N, in which case the white noise process is simply a sequence of independent and identically distributed random variables. The white noise process has played a very important role in the course on probability theory (as it shows up, e.g., in the law of large numbers and the central limit theorem). From the perspective of stochastic processes, however, things only start getting interesting when there is dependence between the outcomes at two (or more) different time points. Thus, the white noise process will not play any major role in this course.
Example 1.6 ((Classical) Random Walk). Let {Z_n}_{n∈N} be a sequence of independent and identically distributed random variables with values in S := R. Choose the time index set as T = N0 and define X = {X(t)}_{t∈N0} by

X(t) := ∑_{n=1}^{t} Z_n for all t ∈ N0.

In this definition, we use the convention that ∑_{n=1}^{0} := 0, i.e. X(0) := 0. The process X is called a random walk. If, in addition, it holds that

P[Z_n = 1] = P[Z_n = −1] = 1/2 for all n ∈ N,

then X is called a classical random walk.
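The construction of a classical random walk translates directly into a short simulation. The following Python sketch (not part of the original notes; names are ours) draws the increments Z_n as fair ±1 coin flips and accumulates the partial sums:

```python
import random

def classical_random_walk(T, seed=0):
    """Simulate one path X(0), X(1), ..., X(T) of a classical random walk."""
    rng = random.Random(seed)
    path = [0]                   # convention: X(0) is the empty sum, i.e. 0
    for _ in range(T):
        z = rng.choice([1, -1])  # increment Z_n with P[Z_n = 1] = P[Z_n = -1] = 1/2
        path.append(path[-1] + z)
    return path

path = classical_random_walk(50)
```

Each call produces one path X(·, ω); rerunning with a different seed corresponds to drawing a different ω ∈ Ω.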
Figure 1.3 Path of a classical random walk X.
Quite obviously, the random walk exhibits dependence between different time points, i.e. unless the random variables {Z_n}_{n∈N} are almost surely constant, the random variables X(s) and X(t) for s, t ∈ N0 with s ≠ t are not independent. The dependence structure of the random walk is, however, relatively simple: it has independent and stationary increments.
Definition 1.7 (Independent/Stationary Increments). Let X = {X(t)}_{t∈T} be a stochastic process. We say that X has independent increments if

X(t) − X(s) is independent of σ(X(r) : r ∈ T, r ≤ s) for all s, t ∈ T, s < t.

Moreover, if the time index set is closed under addition, i.e. s + t ∈ T whenever s, t ∈ T, we say that X has stationary increments if

X(t2 + s) − X(t1 + s) ∼ X(t2) − X(t1) for all s, t1, t2 ∈ T, t2 > t1.
Exercise 3. Let X = {X(t)}_{t∈T} be a stochastic process. Show that X has independent increments if and only if X(tn) − X(tn−1), . . . , X(t1) − X(t0), X(t0) are independent for all [t0, t1, . . . , tn] ⊂ T, n ∈ N.
It is not difficult to verify that the random walk has independent and stationary increments. As a matter of fact, it is the only stochastic process on T = N0 with X(0) = 0 satisfying this property.
Exercise 4. Let X = {X(t)}_{t∈N0} be a stochastic process. Show that X is a random walk if and only if X(0) = 0 and X has independent and stationary increments.
Example 1.8 (Markov Chain). As before, let {Z_n}_{n∈N} be a sequence of independent and identically distributed R-valued random variables. Suppose that²

g : R × S → S is B(R)⊗S-S-measurable.

Given an S-valued random variable X(0) which is independent of {Z_n}_{n∈N}, we construct a stochastic process X = {X(t)}_{t∈N0} recursively by setting

X(t) := g(Z_t, X(t − 1)) for all t ∈ N.

The process X is called a Markov chain.
Observe that we have already encountered an example of a Markov chain: the random walk. Indeed, the random walk X can be constructed as in Example 1.8 by setting X(0) := 0 and

g : R × R → R, g(z, x) := z + x.
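The recursion of Example 1.8 is equally easy to implement: draw the i.i.d. drivers Z_t and fold them into the state via g. The sketch below (illustrative, not from the notes) uses drivers that are uniform on {−1, +1}; plugging in g(z, x) = z + x then reproduces a classical random walk, as observed above.

```python
import random

def markov_chain(g, x0, T, seed=0):
    """Iterate the recursion X(t) = g(Z_t, X(t-1)) of Example 1.8."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(T):
        z = rng.choice([1, -1])      # i.i.d. drivers Z_t, here uniform on {-1, +1}
        path.append(g(z, path[-1]))
    return path

# the choice g(z, x) = z + x turns the Markov chain into a random walk
walk = markov_chain(lambda z, x: z + x, 0, 20)
```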
Figure 1.4 Path of a Markov chain X.
Figure 1.4 shows another example of a Markov chain in which X(0) is Bernoulli distributed with P[X(0) = 0] = P[X(0) = 1] = 1/2, each Z_n is uniformly distributed on [0, 1], and, for p ∈ [0, 1], the mapping g is chosen as

g : R × {0, 1} → {0, 1},  g(z, x) := x if z < p, and g(z, x) := 1 − x if z ≥ p.

² Here and subsequently, B(E) always denotes the Borel σ-field on a topological space E, i.e. the smallest σ-field containing the open sets in E.
Exercise 5. Show that the Markov chain X depicted in Figure 1.4 is stationary, i.e. show that for every s ∈ N, the process X(· + s) = {X(t + s)}_{t∈N0} has the same distribution as X.
Let us conclude this section with an example of a continuous time process: the renewal process.
Example 1.9 (Renewal Process). Let {Z_n}_{n∈N} be a sequence of independent and identically distributed random variables taking values in [0,∞). We define a stochastic process X = {X(t)}_{t∈[0,∞)} by setting

X(t) := sup{n ∈ N : ∑_{k=1}^{n} Z_k ≤ t} for all t ∈ [0,∞),

with the convention sup ∅ := 0. Then X is called a renewal process.
The name renewal process stems from the following interpretation: Thinking of Z_n as the lifetime of the n-th replacement of some component (say, a light bulb), X(t) is the number of replacements by time t. Every time a component breaks down and has to be renewed, X jumps by one.
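The light bulb interpretation suggests a direct way to evaluate X(t): accumulate lifetimes until the running total exceeds t. A Python sketch (ours, with an illustrative choice of exponentially distributed lifetimes):

```python
import random

def renewal_process(t, lifetimes):
    """Evaluate X(t) = sup{n in N : Z_1 + ... + Z_n <= t}, with the
    supremum of the empty set taken to be 0."""
    total, count = 0.0, 0
    for z in lifetimes:
        if total + z > t:
            break              # the (count+1)-th component outlives time t
        total += z
        count += 1
    return count

# one path: fix the lifetimes once, then evaluate X at several times t
rng = random.Random(0)
lifetimes = [rng.expovariate(5.0) for _ in range(1000)]
counts = [renewal_process(t / 10, lifetimes) for t in range(11)]
```

Since the same lifetimes are reused for every t, the values in `counts` trace out a single nondecreasing path of the counting process.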
Figure 1.5 Path of a renewal process X.
It should be noted that the renewal process is well defined since the supremum in the definition of X(t) is only taken over a countable set and hence X(t) is indeed a random variable. Moreover, X takes values in S = N0 ∪ {∞} and each path of X is nondecreasing and right continuous. A process with these properties is also referred to as a counting process.
Definition 1.10 (Path Properties of a Process). Let X = {X(t)}_{t∈T} be a stochastic process and let ♣ be a property of functions f : T → S. We say that X is ♣ if each path of X is ♣, i.e.

X(·, ω) : T → S, t ↦ X(t, ω) is ♣ for all ω ∈ Ω.
This rather generic definition needs some explanation: An example of ♣ could be 'right continuous'. Thus a process X is said to be right continuous if each path is right continuous. In particular, the renewal process is nondecreasing and right continuous.
Observe furthermore that the paths of the renewal process have existing left limits. We recall that if S is a metric space (with metric d), then a function f : T → S is said to have existing left limits if

f(t−) := lim_{s↑t} f(s) exists for all t ∈ T.

If f has existing left limits and is furthermore right continuous, i.e.

f(t) = lim_{s↓t} f(s) for all t ∈ T,

we say that the function f is càdlàg (an acronym from the French continue à droite, limite à gauche). Finally, we can associate with f the total variation V_f : T → [0,∞] defined for each t ∈ T as

V_f(t) := sup{ ∑_{k=1}^{n} d(f(t_{k−1}), f(t_k)) : [t_0, . . . , t_n] ⊂ T with t_n ≤ t, n ∈ N }.

We say that f is of finite variation if V_f(t) < ∞ for all t ∈ T.
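The supremum in the definition of V_f runs over all finite partitions, so on a computer one can only evaluate the sum for one fixed partition; each such sum is a lower bound for V_f, and refining the partition can only increase it. A small sketch (ours) for real-valued f with d(x, y) = |x − y|:

```python
def variation_along(f, partition):
    """Sum of |f(t_k) - f(t_{k-1})| over one fixed partition [t_0, ..., t_n];
    every such sum is a lower bound for the total variation V_f."""
    return sum(abs(f(b) - f(a)) for a, b in zip(partition, partition[1:]))

g = lambda t: abs(t - 0.5)                     # decreases on [0, 0.5], increases after
crude = variation_along(g, [0.0, 1.0])         # misses the oscillation entirely
better = variation_along(g, [0.0, 0.5, 1.0])   # the refinement reveals it
```

Here `crude` evaluates to 0 although g is not constant, while the refined partition already attains the full variation 1 on [0, 1]; this is exactly why the definition takes a supremum over partitions.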
1.2 Filtrations, Stopping Times, and Hitting Times
We now turn to the mathematical concepts relating to the flow of information associated with a stochastic process and to the issue of stopping of stochastic processes.
Consider the following example: Suppose there is a group of criminals that regularly sets out to rob banks. We model their success by some {0, 1}-valued stochastic process X = {X(t)}_{t∈N}, where X(t) = 1 means that the criminals were successful and did not get caught during the t-th bank robbery, whereas X(t) = 0 means that the group did get caught. What kind of information can we associate with this process and how does this information evolve with time? Clearly, this depends. A person who follows the news regularly will know at any time t ∈ N which of the previous robberies were successful, i.e. this person knows the outcomes of X(1), . . . , X(t) at any time t ∈ N. On the other hand, a person who does not follow the news and is not aware of the existence of the group of criminals cannot see the process X at all. A witness of the 5th bank robbery, who was previously unaware of the criminals, clearly knows the outcome of X(5) at time t = 5, but does not know X(1), . . . , X(4). If after the 9th robbery the police sets up a trap which guarantees that the criminals will get caught on their next attempt, then the police officers know X(1), . . . , X(10) at time t = 9, whereas the criminals only know X(1), . . . , X(9) at this point in time. Finally, at each time t ∈ N, the criminals will have additional information on the next bank robbery at time t + 1 such as the location of the bank, the calendar date of the robbery, the number of security guards in the bank, and so on, whereas this information becomes available to the police only at time t + 1.
How can we formalize this mathematically? What does it mean to know the outcome of a random variable? It means that for every event involving the random variable, we can decide whether the event has occurred or not. In other words: Knowing the random variable means knowing the σ-field generated by this random variable. In the above example, the regular news observer knows σ(X(1), . . . , X(5)) at time t = 5, whereas the eyewitness only knows σ(X(5)) at time t = 5, and the person not following the news merely knows σ(∅). As time evolves, the regular news observer will gather new information, i.e. his or her σ-field increases with time. The flow of information available to an observer can therefore be modeled as a nondecreasing family of σ-fields, indexed by the time index set T. Such a family of σ-fields is called a filtration.
Definition 1.11 (Filtration; Filtered Probability Space). Let F = {F(t)}_{t∈T} be a family of sub-σ-fields of A satisfying

F(s) ⊂ F(t) for all s, t ∈ T with s ≤ t.
Then F is called a filtration of (Ω, A). The quadruple (Ω, A, F, P) is called a filtered probability space.
The filtration F = {F(t)}_{t∈T} represents the flow of information and has (for now) nothing to do with any stochastic process. F(t) is simply the information available at time t ∈ T. In the bank robbery example, the filtration of the news observer is given by F1 = {F1(t)}_{t∈N} with

F1(t) = σ(X(1), . . . , X(t)) for all t ∈ N.

On the other hand, the person who does not follow the news has the filtration F2 = {F2(t)}_{t∈N} given by

F2(t) = σ(∅) = {∅, Ω} for all t ∈ N.
Moreover, the witness of the 5th robbery (who does not follow the news) has the filtration F3 = {F3(t)}_{t∈N} with

F3(t) = σ(∅) if t < 5, and F3(t) = σ(X(5)) if t ≥ 5, for all t ∈ N.
The criminals themselves have a filtration F4 = {F4(t)}_{t∈N} which potentially contains more information than the filtration of the news observer, i.e.

F4(t) ⊃ F1(t) for all t ∈ N.

Finally, being all-knowing, God has the filtration F5 = {F5(t)}_{t∈N} given by

F5(t) = A for all t ∈ N.
How does this link to stochastic processes? Recall that knowing a random variable means knowing the σ-field it generates. Hence knowing a stochastic process X = {X(t)}_{t∈T} means knowing X(t) at any time t ∈ T. Since our flow of information is modeled by a filtration F = {F(t)}_{t∈T}, this is to say that σ(X(t)) is a subset of F(t), i.e. X(t) is F(t)-measurable for all t ∈ T.
Definition 1.12 (Adaptedness). Let F = {F(t)}_{t∈T} be a filtration. Then we say that a stochastic process X = {X(t)}_{t∈T} is adapted to the filtration F if

X(t) is F(t)-measurable for all t ∈ T.
Continuing with our example, the process X modeling the success of the bank robbers is adapted to the filtrations F1, F4, and F5, but not to F2 and F3.³ The filtration F1 in this example plays a special role, since it contains just enough information to make the process X adapted, i.e. it contains just the bare minimum of information needed to know X.
Definition 1.13 (Natural Filtration). Let X = {X(t)}_{t∈T} be a stochastic process and consider the filtration F^X = {F^X(t)}_{t∈T} given by

F^X(t) := σ(X(s) : s ∈ T, s ≤ t) for all t ∈ T.

Then F^X is called the natural filtration of X.
Clearly, every stochastic process X is adapted to its natural filtration F^X. It is furthermore easy to see that F^X is the smallest filtration X is adapted to.
Exercise 6. Let X = {X(t)}_{t∈T} be a stochastic process and let F = {F(t)}_{t∈T} be a filtration. Show that X is adapted to F if and only if

F^X(t) ⊂ F(t) for all t ∈ T.
Exercise 7. Let Z = {Z(t)}_{t∈N} be a white noise process and let X = {X(t)}_{t∈N0} be the corresponding random walk given by

X(t) := ∑_{n=1}^{t} Z(n), t ∈ N0.

Show that X is F^Z-adapted, where F^Z(0) := {∅, Ω} and {F^Z(t)}_{t∈N} denotes the natural filtration of Z.
Suppose that the group of criminals decides at some point τ to stop their career as bank robbers. Clearly, τ can be deterministic (e.g. 'Stop after the third robbery.', i.e. τ = 3) or random (e.g. 'Stop after three successful robberies in a row.', i.e. τ = inf{t ≥ 3 : X(t) = X(t − 1) = X(t − 2) = 1}). It should furthermore be possible for the criminals to decide never to stop, i.e. τ = ∞. On the other hand, we would like to rule out times which look into the future, such as stopping exactly one robbery before the first unsuccessful one: τ = inf{t ∈ N : X(t + 1) = 0}. These ideas are formalized in the notion of a stopping time.
³ Unless, of course, X is deterministic.
Definition 1.14 (Stopping Time). Let F = {F(t)}_{t∈T} be a filtration and let τ : Ω → T ∪ {∞} be a mapping such that

{τ ≤ t} ∈ F(t) for all t ∈ T.
Then τ is called a stopping time with respect to F.
The notion of a stopping time does exactly the job that we hoped for. It can be both deterministic and random, and it takes values in the time index set but also allows the value infinity. Moreover, since the filtration F models the information available at any point in time, the condition

{τ ≤ t} ∈ F(t) for all t ∈ T

simply means that at any time t, we can decide if τ has already occurred or not. E.g., in the bank robbery example, the random time τ = inf{t ∈ N : X(t + 1) = 0} is in general not a stopping time with respect to the natural filtration F^X since τ requires the knowledge of X(t + 1) at time t.
Exercise 8 (Galmarino's Test). Suppose that Ω = R^T = {ω : T → R} is the set of all functions from T to R (such that, in particular, the stopped path ω(· ∧ t) is a member of Ω for all ω ∈ Ω and t ∈ T). Denote by X = {X(t)}_{t∈T} the so-called canonical process given by

X(t, ω) := ω(t), t ∈ T, ω ∈ Ω.

Denote by F^X = {F^X(t)}_{t∈T} the natural filtration of X and define F^X(∞) := σ(F^X(t) : t ∈ T). Show that the following statements hold:

(a) Let F ⊂ Ω and t ∈ T. Then F ∈ F^X(t) if and only if F ∈ F^X(∞) and the following implication holds:

if ω ∈ F and X(s, ω) = X(s, ω′) for all s ∈ T with s ≤ t, then ω′ ∈ F.

(b) A mapping τ : Ω → T ∪ {∞} is a stopping time with respect to F^X if and only if τ is an F^X(∞)-measurable random variable and the following implication holds for all t ∈ T:

if τ(ω) ≤ t and X(s, ω) = X(s, ω′) for all s ∈ T with s ≤ t, then τ(ω′) ≤ t.
We make two more important observations. First, observe that the property of being a stopping time depends on the filtration. While we have already convinced ourselves that τ = inf{t ∈ N : X(t + 1) = 0} is in general not a stopping time with respect to F^X, it is a stopping time with respect to the God filtration F5 (which was given by F5(t) = A, t ∈ N). Second, observe that we do not assume that τ is a random variable. It is however straightforward to verify that the condition {τ ≤ t} ∈ F(t) for all t ∈ T implies that every stopping time is a random variable. In the case when τ takes at most countably many values (which holds, in particular, if T is at most countable), there is a weaker condition to check if a random time is a stopping time.
Lemma 1.15 (Characterization of Discrete Stopping Times). Let τ : Ω → T ∪ {∞} be such that τ(Ω) := {τ(ω) : ω ∈ Ω} is at most countable and let F = {F(t)}_{t∈T} be a filtration. Then τ is a stopping time with respect to F if and only if

{τ = t} ∈ F(t) for all t ∈ τ(Ω) with t < ∞.
Proof. Step 1: Suppose that {τ = t} ∈ F(t) for all t ∈ τ(Ω) with t < ∞ and let s ∈ T. We have to check that {τ ≤ s} ∈ F(s), in which case we conclude that τ is a stopping time. If s = ∞, then {τ ≤ s} = Ω ∈ F(s), so suppose that s < ∞. In that case, we have

{τ ≤ s} = ⋃_{t∈τ(Ω), t≤s} {τ = t}.

Now for every t ∈ τ(Ω) with t ≤ s, we have {τ = t} ∈ F(t) by assumption and hence, since t ≤ s implies F(t) ⊂ F(s), we find that {τ = t} ∈ F(s). But now, since τ(Ω) is countable, this implies that {τ ≤ s} ∈ F(s).
Step 2: Suppose that τ is a stopping time and let s ∈ τ(Ω) with s < ∞. We have to show that {τ = s} ∈ F(s). For this, we first observe that, for every t ∈ T with t ≤ s, we have {τ ≤ t} ∈ F(t) ⊂ F(s). But then, again since τ(Ω) is countable, we conclude since

{τ = s} = {τ ≤ s} \ ⋃_{t∈τ(Ω), t<s} {τ ≤ t} ∈ F(s).
It is of course immediately clear that if τ : Ω → T ∪ {∞} is constant, i.e. τ ≡ t for some t ∈ T, then τ is a stopping time. By the previous lemma, we only have to verify that {τ = t} ∈ F(t), which is obvious since {τ = t} = Ω. Hence all deterministic times are stopping times. The property of being a stopping time is furthermore stable under several operations, for example taking pointwise minima and maxima.
Lemma 1.16 (Operations on Stopping Times). Let τ and σ be two stopping times with respect to a filtration F = {F(t)}_{t∈T}. Then

τ ∧ σ : Ω → T ∪ {∞}, (τ ∧ σ)(ω) := min{τ(ω), σ(ω)},
τ ∨ σ : Ω → T ∪ {∞}, (τ ∨ σ)(ω) := max{τ(ω), σ(ω)},

are stopping times with respect to F as well.
Proof. For all t ∈ T it holds that

{τ ∧ σ ≤ t} = {τ ≤ t} ∪ {σ ≤ t} ∈ F(t)

as well as

{τ ∨ σ ≤ t} = {τ ≤ t} ∩ {σ ≤ t} ∈ F(t).

Thus both τ ∧ σ and τ ∨ σ are stopping times.
Exercise 9. Let F = {F(t)}_{t∈[0,∞)} be a filtration and let {τ_n}_{n∈N} be a sequence of stopping times. Show that

τ : Ω → [0,∞], τ(ω) := sup_{n∈N} τ_n(ω)

is a stopping time.
Exercise 10. Let τ : Ω → T ∪ {∞} be a function and fix a filtration F = {F(t)}_{t∈T}. Show that τ is a stopping time if and only if the indicator process X = {X(t)}_{t∈T} given by

X(t) := 1_{{τ≤t}}, t ∈ T,

is adapted to F.
A natural question arising now is to ask which information is available up to the stopping time τ. More precisely, if F ∈ A is some event, we would like to know if F is known at time τ. In other words, at any given time t ∈ T, we would like to know if F is known at time t, provided of course that τ has already occurred. Mathematically, this means that F is known at time τ if and only if F ∩ {τ ≤ t} ∈ F(t) for all t ∈ T.
Definition 1.17 (σ-Field of the τ-Past). Let τ be a stopping time with respect to a filtration F = {F(t)}_{t∈T}. Then

F(τ) := {F ∈ A : F ∩ {τ ≤ t} ∈ F(t) for all t ∈ T}

is called the σ-field of the τ-past.
We have to be a bit careful with the previous definition as it is quite suggestive. First, the name 'σ-field of the τ-past' suggests that F(τ) is a σ-field. This is indeed true and can easily be verified. Moreover, the notation F(τ) suggests some sort of compatibility with the filtration F. Of course, we should never ever think of F(τ) as a mapping ω ↦ F(τ(ω)). Nevertheless, F(τ) is compatible with the filtration in that F(τ) = F(t) if τ is constant equal to t.
Exercise 11. Let τ : Ω → T ∪ {∞} be a stopping time with respect to a filtration F = {F(t)}_{t∈T}. Show that F(τ) is a sub-σ-field of A. Moreover, if for some t ∈ T we have τ(ω) = t for all ω ∈ Ω, show that F(τ) = F(t).
Another property of the σ-field of the τ-past which justifies the notation F(τ) is that the mapping τ ↦ F(τ) respects the order induced by the order on T, i.e. if σ and τ are two stopping times with σ ≤ τ, then F(σ) ⊂ F(τ).
Lemma 1.18 (Properties of the σ-Field of the τ-Past). Let σ and τ be stopping times with respect to a filtration F = {F(t)}_{t∈T}. Then

F(τ ∧ σ) = F(τ) ∩ F(σ).

In particular, if σ ≤ τ, then F(σ) ⊂ F(τ). Finally, we have

{σ ≤ τ}, {σ < τ}, {σ = τ} ∈ F(τ ∧ σ).
Proof. Step 1: We show that F(τ ∧ σ) ⊂ F(τ) ∩ F(σ). Let F ∈ F(τ ∧ σ) and t ∈ T. Since {σ ≤ t} = {σ ∧ τ ≤ t} ∩ {σ ≤ t}, it follows that

F ∩ {σ ≤ t} = (F ∩ {σ ∧ τ ≤ t}) ∩ {σ ≤ t}.
As F ∈ F(τ ∧ σ), we have F ∩ {σ ∧ τ ≤ t} ∈ F(t), and, since σ is a stopping time, we see moreover that {σ ≤ t} ∈ F(t). Thus F ∩ {σ ≤ t} ∈ F(t), implying that F ∈ F(σ) since t ∈ T was chosen arbitrarily. Interchanging the roles of τ and σ furthermore yields F ∈ F(τ), i.e. F ∈ F(τ) ∩ F(σ). This in turn implies F(τ ∧ σ) ⊂ F(τ) ∩ F(σ).
Step 2: We show that F(τ) ∩ F(σ) ⊂ F(τ ∧ σ). Let F ∈ F(τ) ∩ F(σ) and t ∈ T. Using that {τ ∧ σ ≤ t} = {τ ≤ t} ∪ {σ ≤ t} and F ∈ F(τ) and F ∈ F(σ), we find that

F ∩ {τ ∧ σ ≤ t} = (F ∩ {τ ≤ t}) ∪ (F ∩ {σ ≤ t}) ∈ F(t).

However, this implies that F ∈ F(τ ∧ σ), i.e. F(τ) ∩ F(σ) ⊂ F(τ ∧ σ).
Step 3: We are left with showing that {σ < τ} ∈ F(τ ∧ σ) since, from this, it follows that

{σ ≤ τ} = {τ < σ}ᶜ ∈ F(τ ∧ σ) and {σ = τ} = {σ ≤ τ} \ {σ < τ} ∈ F(τ ∧ σ).

To see that {σ < τ} ∈ F(τ ∧ σ), let t ∈ T be arbitrary and denote by Q a countable and dense subset of T with t ∈ Q. Then it holds that

{σ < τ} ∩ {τ ≤ t} = ⋃_{q∈Q, q<t} {σ ≤ q < τ ≤ t} = ⋃_{q∈Q, q<t} {σ ≤ q} ∩ ({τ ≤ t} \ {τ ≤ q}).

Since Q is countable, we must have {σ < τ} ∩ {τ ≤ t} ∈ F(t), which implies that {σ < τ} ∈ F(τ). Conversely, we have

{σ < τ} ∩ {σ ≤ t} = ⋃_{q∈Q, q≤t} {σ ≤ q < τ} = ⋃_{q∈Q, q≤t} {σ ≤ q} ∩ {τ ≤ q}ᶜ.

By countability of Q, this implies that {σ < τ} ∩ {σ ≤ t} ∈ F(t) and hence {σ < τ} ∈ F(σ). But then {σ < τ} ∈ F(τ) ∩ F(σ) ⊂ F(τ ∧ σ) by Step 2.
Recalling that each deterministic time t ∈ T is also a stopping time and using that F(τ ∧ t) ⊂ F(τ) and F(τ ∧ t) ⊂ F(t), Lemma 1.18 implies that the events {τ ≤ t}, {τ < t}, {τ = t} are contained in both F(τ) and F(t).
Exercise 12. Let τ and σ be two stopping times with respect to a filtration F = {F(t)}_{t∈T}. Show that

F(τ ∨ σ) = σ(F(τ), F(σ)).
Exercise 13. Let τ and σ be two stopping times with respect to a filtration F = {F(t)}_{t∈T} and let X be a random variable with E[|X|] < ∞. Show that

1_{{τ=σ}} E[X | F(τ)] = 1_{{τ=σ}} E[X | F(σ)] almost surely.
Thus far, we have not seen any nontrivial example of a stopping time and we have not drawn any connection to stochastic processes yet. This is about to change: The prime example of a stopping time, which we shall encounter over and over again, is the first time that a stochastic process enters into a measurable subset of its state space.
Definition 1.19 (Hitting Time). Let X = {X(t)}_{t∈T} be a stochastic process and let B ∈ S. Then the mapping

τ_B : Ω → T ∪ {∞}, τ_B(ω) := inf{t ∈ T : X(t, ω) ∈ B},

with the convention inf ∅ := ∞, is called the (first) hitting time of B by X.
A priori, it is not clear if every hitting time is a stopping time with respect to a filtration F, even if we assume that X is adapted to F. This turns out to be a very deep question which goes far beyond the scope of this course (and the answer is in general negative). Under some additional assumptions on X, F, and/or B, we can however give a positive answer to this question, and this will be sufficient for our purposes. The first, most basic, case is when X is a discrete time process.
Lemma 1.20 (Hitting Times of Discrete Time Processes). Let X = {X(t)}_{t∈T} be a discrete time stochastic process adapted to a filtration F = {F(t)}_{t∈T}. Let moreover B ∈ S and assume that

for each t ∈ T, the set {s ∈ T : s ≤ t} is finite.

Then the first hitting time τ_B of B by X is a stopping time. Similarly, if σ is a stopping time, then the first hitting time τ_B^σ of B by X after σ given by

τ_B^σ := inf{t ∈ T ∩ (σ,∞) : X(t) ∈ B}

is a stopping time as well.
Proof. The condition that {s ∈ T : s ≤ t} is finite for all t ∈ T implies that, for all ω ∈ Ω, we either have τ_B(ω) = ∞ or

τ_B(ω) = inf{t ∈ T : X(t, ω) ∈ B} = min{t ∈ T : X(t, ω) ∈ B} ∈ T,

and the same can be said about τ_B^σ. Let now t ∈ T with t < ∞. Then

{τ_B = t} = {X(t) ∈ B} ∩ (⋂_{s∈T, s<t} {X(s) ∉ B})

and, similarly,

{τ_B^σ = t} = {σ < t} ∩ {X(t) ∈ B} ∩ (⋂_{s∈T, s<t} ({X(s) ∉ B} ∪ {σ < s}ᶜ)).

Since T is discrete, X is adapted, and σ is a stopping time, this shows that {τ_B = t} ∈ F(t) and {τ_B^σ = t} ∈ F(t) for all t ∈ T with t < ∞, and we conclude by Lemma 1.15 (Characterization of Discrete Stopping Times).
In Lemma 1.20, the finiteness of the sets {s ∈ T : s ≤ t} is needed to guarantee that the hitting time τ_B takes values in T ∪ {∞} (in particular, T has a minimal element). Indeed, if for example T = {1/n : n ∈ N} and we choose B = S, then τ_B(ω) = 0 for all ω ∈ Ω, but 0 ∉ T.
Exercise 14. Let X = {X(t)}_{t∈N} be a white noise process and let B ∈ B(R). Determine the distribution of the first hitting time τ_B of B by X and show that τ_B < ∞ almost surely if and only if P[Z_1 ∈ B] > 0.
For continuous time processes, the issue of whether hitting times are stopping times is more delicate. Imagine, e.g., that X is a continuous process and B is an open set. Then at time τ_B, the process X will be on the boundary of B, but not in B itself. So, whenever the process X is on the boundary of B, we need to be able to look an infinitesimal amount of time into the future for τ_B to be a stopping time (compare with Figure 1.6), since otherwise we cannot decide whether we have to stop or not.
[Figure 1.6: Hitting time of the open set B = (a, ∞) by a continuous process X. Observe that while both paths take the value a at time τ_B(ω_1), only the path X(·, ω_1) enters B at this time, whereas X(·, ω_2) moves away from B. Hence, in order to decide if we should stop at any given time, we need to be able to look an infinitesimal amount of time into the future.]
Definition 1.21 (Right-Continuous Filtration). We say that a filtration F = {F(t)}_{t∈T} is right continuous if

F(t+) := ⋂_{s∈T, s>t} F(s) = F(t) for all t ∈ T.
Right continuity of a filtration F = {F(t)}_{t∈T} is only interesting if the time index set is sufficiently rich. Indeed, if each t ∈ T has a distinct next element t + 1 (which holds, e.g., for T = N but not for T = Q), then F(t+) = F(t + 1) for all t ∈ T and hence the notion of right continuity of F boils down to the requirement F(t) = F(t + 1) for all t ∈ T.
Proposition 1.22 (Hitting Times of Continuous Time Processes). Let X = {X(t)}_{t∈[0,∞)} be a stochastic process adapted to a filtration F = {F(t)}_{t∈[0,∞)}. Suppose that S is a metric space with metric d and assume that S = B(S) is the Borel σ-field on S. Then the hitting time τ_B of a set B ∈ S by X is a stopping time with respect to F in each of the following cases:

(i) X is continuous and B is closed.

(ii) X is right continuous, B is open, and F is right continuous.
Proof. Step 1: Assume that X is continuous and B is closed. For every t ∈ [0,∞), we claim that

{τ_B ≤ t} = {X(t) ∈ B} ∪ ( ⋃_{s∈[0,t)} {X(s) ∈ B} )   (1.1)
          = {X(t) ∈ B} ∪ ( ⋂_{k=1}^∞ ⋃_{q∈[0,t)∩Q} {d(X(q), B) ≤ 1/k} ).   (1.2)

From this and the adaptedness of X, it follows that {τ_B ≤ t} ∈ F(t), and hence τ_B is a stopping time. To see that this representation of {τ_B ≤ t} indeed holds, let us first fix ω ∈ Ω with τ_B(ω) < ∞. Then the continuity of X and the closedness of B imply that X(τ_B(ω), ω) ∈ B. But then ω ∈ {τ_B ≤ t} implies that there exists s ∈ [0, t] such that X(s, ω) ∈ B. On the other hand, if ω ∈ {X(s) ∈ B} for some s ∈ [0, t], then τ_B(ω) ≤ s ≤ t. Thus Equation (1.1) is argued for. To see that Equation (1.2) holds, let us first assume that ω ∈ {X(s) ∈ B} for some s ∈ [0, t). Then the continuity of X(·, ω) implies that there exists a sequence {q_k}_{k∈N} in [0, t) ∩ Q such that d(X(q_k, ω), X(s, ω)) ≤ 1/k. But then, since X(s, ω) ∈ B, we must necessarily have d(X(q_k, ω), B) ≤ 1/k. This shows that the set on the right hand side of Equation (1.1) is contained in the set in Equation (1.2). For the reverse containment, assume that ω ∈ Ω is such that, for each k ∈ N, there exists q_k ∈ [0, t) ∩ Q such that d(X(q_k, ω), B) ≤ 1/k. Since the sequence {q_k}_{k∈N} ⊂ [0, t] and [0, t] is compact, we may without loss of generality assume that s := lim_{k→∞} q_k exists in [0, t] (pass to a subsequence if {q_k}_{k∈N} does not converge). But then X(s, ω) ∈ B since by continuity of X we have

0 ≤ d(X(s, ω), B) = lim_{k→∞} d(X(q_k, ω), B) ≤ lim_{k→∞} 1/k = 0.
Step 2: Assume that X and F are right continuous and B is open. Observe that, in this situation, we do not necessarily have X(τ_B) ∈ B on {τ_B < ∞}, so we must be careful with events of the form {τ_B = t}. Nevertheless, if t ∈ [0,∞), we have

{τ_B < t} = ⋃_{q∈[0,t)∩Q} {X(q) ∈ B} ∈ F(t). (1.3)
Indeed, if ω ∈ {X(q) ∈ B} for some q ∈ [0, t) ∩ Q, then clearly τ_B(ω) ≤ q < t and hence ω ∈ {τ_B < t}. On the other hand, if ω ∈ {τ_B < t}, then by definition of τ_B(ω) there exists s ∈ [0, t) such that X(s, ω) ∈ B. Now let {q_k}_{k∈N} be a sequence in [0, t) ∩ Q with q_k > s and q_k ↓ s as k → ∞. Then X(q_k, ω) → X(s, ω) by the right continuity of X. But then, since B is open and X(s, ω) ∈ B, we must have X(q_k, ω) ∈ B for all k ∈ N sufficiently large, i.e. ω ∈ ⋃_{q∈[0,t)∩Q} {X(q) ∈ B}, which establishes Equation (1.3). But then

{τ_B ≤ t} = ⋂_{q∈(t,∞)∩Q} {τ_B < q} = ⋂_{q∈(t,s]∩Q} {τ_B < q} ∈ F(s) for all s > t

since the events {τ_B < q} are increasing in q. But then {τ_B ≤ t} ∈ F(t+) = F(t) by right continuity of F and hence τ_B is a stopping time.
The previous proposition extends immediately to processes X = {X(t)}_{t∈T} with time index set of the form T = [0, T] for some T > 0. Indeed, given X, we can simply define a new process Y = {Y(t)}_{t∈[0,∞)} by setting Y(t) := X(t ∧ T) for all t ∈ [0,∞) and extend the filtration F = {F(t)}_{t∈[0,T]} to a filtration on [0,∞) by setting F(t) := F(T) for all t ≥ T. Since X and Y coincide on [0, T] and Y is constant outside of [0, T], any hitting time of the process X coincides with the corresponding hitting time of the process Y. Thus Proposition 1.22 remains valid for the process X as well.
Exercise 15. Let τ : Ω → [0,∞] and fix a filtration F = {F(t)}_{t∈[0,∞)}. Setting F(∞) := σ(F(t), t ∈ [0,∞)), we say that τ is an optional time with respect to F if

{τ < t} ∈ F(t) for all t ∈ [0,∞].

Show that every stopping time is an optional time. Moreover, show that τ is an optional time if and only if τ is a stopping time with respect to the filtration F(·+) = {F(t+)}_{t∈[0,∞)}.
There are, of course, more cases in which hitting times turn out to be stopping times, but the two cases in Proposition 1.22 are sufficient for our purposes. The following exercise gives another sufficient set of conditions.
Exercise 16. Let X = {X(t)}_{t∈[0,∞)} be a real valued, right continuous, nondecreasing process which is adapted to a filtration F. Given a ∈ R, show that the hitting time τ_{[a,∞)} of the set [a, ∞) by the process X is a stopping time.
Now what are stopping times good for? Well, as the name suggests, they can be used to stop stochastic processes.
Definition 1.23 (Stopped Process). Let X = {X(t)}_{t∈T} be a stochastic process and let τ be a T-valued stopping time with respect to a filtration F. We define

X(τ) : Ω → S, ω ↦ X(τ)(ω) := X(τ(ω), ω). (1.4)
With this, if τ is an arbitrary T ∪ {∞}-valued stopping time, we refer to X^τ = X(· ∧ τ) = {X(t ∧ τ)}_{t∈T} as the process X stopped at time τ, or the stopped process for short.

We observe that Equation (1.4) makes sense for τ(ω) = ∞ only if ∞ ∈ T, which is why we restrict this part of the definition to stopping times taking values in T instead of T ∪ {∞}. Next, for any t ∈ T and any stopping time τ (with values in T ∪ {∞}), Lemma 1.16 (Operations on Stopping Times) implies that t ∧ τ is again a stopping time, and t ∧ τ can only take the value ∞ if ∞ ∈ T. The stopped process X^τ simply freezes the original process X at time τ, see Figure 1.7.
[Figure 1.7: A process X and the corresponding stopped process X^τ.]
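On a discretized path, stopping is a one-liner: freeze the value from the stopping index on. A sketch (our own naming; numpy assumed):

```python
import numpy as np

def stop_path(path, tau):
    """Return the stopped path X^tau, i.e. t -> X(t ∧ tau), for a discrete path.

    tau is the stopping index; tau = None encodes tau = infinity,
    in which case the path is left untouched.
    """
    path = np.asarray(path, dtype=float)
    if tau is None:
        return path.copy()
    stopped = path.copy()
    stopped[tau:] = path[tau]  # freeze the value X(tau) from time tau on
    return stopped

x = [0.0, 1.0, -2.0, 3.0, 1.0]
y = stop_path(x, 2)  # frozen at X(2) = -2 from index 2 on
```

In practice `tau` would itself be computed pathwise, e.g. as a first hitting time of the same path, which is exactly the situation of Figure 1.7.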
Now take another close look at Equation (1.4). Since ω ↦ X(τ(ω), ω) has a double dependence on ω, it is not clear without additional conditions if X(τ) : Ω → S is a random variable, and hence it is not clear if X^τ is a stochastic process in the sense of Definition 1.1. If X is adapted to F, then one would naturally expect X^τ(t) = X(t ∧ τ) to be F(t ∧ τ)-measurable for all t ∈ T, i.e. one would expect the stopped process X^τ = {X^τ(t)}_{t∈T} to be adapted to the stopped filtration F^τ = F(· ∧ τ) = {F(t ∧ τ)}_{t∈T} (and hence also adapted to F as F(t ∧ τ) ⊂ F(t), t ∈ T). As the next exercise shows, however, this is not true in general!
Exercise 17. Let Ω = [0, 1], 𝒜 = B([0, 1]), and let P be the Lebesgue measure on (Ω, 𝒜). Moreover, fix a nonmeasurable subset A of Ω and define X = {X(t)}_{t∈[0,∞)} by

X(t, ω) := t + ω if t ∈ A and X(t, ω) := −t − ω if t ∉ A, for all t ∈ [0,∞), ω ∈ Ω.

Show that σ(X(t)) = 𝒜 and hence X is a stochastic process with natural filtration given by F^X(t) = 𝒜 for all t ∈ [0,∞). Now define τ : Ω → [0,∞] by

τ(ω) := inf{t ≥ 0 : 2t ≥ |X(t, ω)|}, ω ∈ Ω.

Show that τ is an F^X-stopping time, that the σ-field of the τ-past is given by F^X(τ) = 𝒜, but {X(τ) > 0} ∉ F^X(τ), i.e. X(τ) is not F^X(τ)-measurable.
So under which conditions is the stopped process adapted? The answer is once again easy in discrete time, or, more generally, if the stopping time takes only countably many values.
Lemma 1.24 (Measurability of Stopped Processes (Discrete Case)). Let X = {X(t)}_{t∈T} be a stochastic process adapted to a filtration F = {F(t)}_{t∈T} and let τ be a stopping time with respect to F. If τ(Ω) is countable, the stopped process X^τ is adapted to the stopped filtration F^τ = {F^τ(t)}_{t∈T} given by F^τ(t) := F(t ∧ τ) for all t ∈ T. In particular, X^τ is adapted to F and, if τ is T-valued, X(τ) is F(τ)-measurable.
Proof. Assume that τ is T-valued. We argue that in this case X(τ) is F(τ)-measurable. From this, it follows immediately that for arbitrary stopping times τ, the random variable X^τ(t) = X(t ∧ τ) is F(t ∧ τ) = F^τ(t)-measurable, and hence X^τ is F^τ-adapted. Since F^τ(t) ⊂ F(t) for all t ∈ T, this furthermore implies that X^τ is F-adapted.
Let B ∈ S and observe that, for any s, t ∈ T with s ≤ t, we have

{X(τ) ∈ B} ∩ {τ = s} = {X(s) ∈ B} ∩ {τ = s} ∈ F(s) ⊂ F(t)

by adaptedness of X and Lemma 1.18 (Properties of the σ-Field of the τ-Past). But then, using that τ(Ω) is countable, we find

{X(τ) ∈ B} ∩ {τ ≤ t} = ⋃_{s∈τ(Ω), s≤t} ( {X(τ) ∈ B} ∩ {τ = s} ) ∈ F(t)

for all t ∈ T, i.e. {X(τ) ∈ B} ∈ F(τ), and thus X(τ) is F(τ)-measurable.
In continuous time, as you may have guessed, the question of adaptedness of the stopped process is more delicate and, as Exercise 17 shows, the answer is in general negative. We shall prove the adaptedness of the stopped process in the case of right continuous processes (which is, once again, not the most general result, but sufficient for our purposes). The proof is based on the very useful fact that any stopping time can be approximated from the right by a monotone sequence of stopping times with finite range.
Proposition 1.25 (Approximation of Stopping Times). Assume that T = [0,∞) and let τ be a stopping time with respect to a filtration F = {F(t)}_{t∈[0,∞)}. Then there exists a sequence {τ_n}_{n∈N} of stopping times such that

(i) τ_n(Ω) ⊂ [0,∞] is finite for all n ∈ N,

(ii) τ ≤ τ_{n+1} ≤ τ_n for all n ∈ N,

(iii) τ < τ_n for all n ∈ N on {τ < ∞},

(iv) inf_{n∈N} τ_n = lim_{n→∞} τ_n = τ.

Moreover, if σ is another stopping time satisfying σ ≤ τ, and {σ_n}_{n∈N} denotes the corresponding approximating sequence, then this sequence can be chosen such that σ_n ≤ τ_n for all n ∈ N.
Proof. Given n ∈ N, we define τ_n : Ω → [0,∞] by

τ_n(ω) := Σ_{k=1}^{n2^n} k 2^{−n} 1_{{τ ∈ [(k−1)2^{−n}, k2^{−n})}}(ω) + ∞ · 1_{{τ ≥ n}}(ω) for all ω ∈ Ω.

Then the sequence {τ_n}_{n∈N} clearly satisfies properties (i) to (iv) above. Moreover, each τ_n is a stopping time by Lemma 1.15 (Characterization of Discrete Stopping Times) since

{τ_n = k2^{−n}} = {τ ≥ (k−1)2^{−n}} ∩ {τ < k2^{−n}} ∈ F(k2^{−n})

for all k ∈ {1, 2, . . . , n2^n}. Finally, if σ is another stopping time satisfying σ ≤ τ and if {σ_n}_{n∈N} is defined in the same way as the sequence {τ_n}_{n∈N} above, then it is obvious that σ_n ≤ τ_n for all n ∈ N.
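The dyadic discretization used in this proof is easily coded up. The sketch below (our own naming) maps a value of τ to τ_n = k2^{−n} and illustrates properties (i) to (iv):

```python
import math

def dyadic_approximation(tau, n):
    """Discretize a stopping time value tau from the right on the grid 2^-n.

    Returns k * 2^-n for the unique k with tau in [(k-1)2^-n, k2^-n),
    and infinity if tau >= n, as in the proof of Proposition 1.25.
    """
    if tau >= n:
        return math.inf
    k = math.floor(tau * 2**n) + 1
    return k * 2**-n

tau = 0.3
approx = [dyadic_approximation(tau, n) for n in range(1, 12)]
# approx decreases towards tau while always staying strictly above it
```

Note the strict inequality τ < τ_n even when τ lies on the grid: τ = 0.5 is mapped to τ_1 = 1.0, not to 0.5, which is exactly property (iii).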
It is crucial that the sequence {τ_n}_{n∈N} approximates the stopping time τ from the right. In general, it is not possible to find an approximating sequence from the left, since otherwise we would be able to anticipate the stopping time τ. With this approximation procedure at hand, we can now show by a limiting argument that if we stop an adapted right continuous process, the stopped process is still adapted.
Proposition 1.26 (Measurability of Stopped Processes (Continuous Case)). Let X = {X(t)}_{t∈[0,∞)} be a right continuous stochastic process which is adapted to a filtration F = {F(t)}_{t∈[0,∞)}. Let moreover τ be a stopping time with respect to F. Then the stopped process X^τ is adapted to both the stopped filtration F^τ and the original filtration F. Moreover, if τ takes values in [0,∞) only, then X(τ) is F(τ)-measurable.
Proof. Let {τ_n}_{n∈N} be the approximating sequence of stopping times for τ given by Proposition 1.25 (Approximation of Stopping Times). Then it follows from Lemma 1.24 (Measurability of Stopped Processes (Discrete Case)) that X^{τ_n} is F-adapted for each n ∈ N. From this and the right continuity of X, it follows that

X^τ(t) = X(t ∧ τ) = lim_{n→∞} X(t ∧ τ_n) = lim_{n→∞} X^{τ_n}(t)

is F(t)-measurable for all t ∈ [0,∞), i.e. X^τ is F-adapted. Now assume that τ takes values in [0,∞) and let t ∈ [0,∞). Then, since {τ ≤ t} ∈ F(t) and X^τ(t) is F(t)-measurable, it follows that, for all B ∈ S, we have

{X(τ) ∈ B} ∩ {τ ≤ t} = {X^τ(t) = X(t ∧ τ) ∈ B} ∩ {τ ≤ t} ∈ F(t),

and hence X(τ) is F(τ)-measurable. As in the proof of Lemma 1.24 (Measurability of Stopped Processes (Discrete Case)), this implies that X^τ(t) = X(t ∧ τ) is F^τ(t) = F(t ∧ τ)-measurable for all t ∈ [0,∞) and hence X^τ is also adapted to the stopped filtration F^τ.
Chapter 2: BROWNIAN MOTION
In this chapter, we study the most important stochastic process in continuous time: Brownian motion. Why is it so important? Because its distribution is in some sense the stochastic process analogue of the standard normal distribution in probability theory, in that a version of the central limit theorem holds: suitably rescaled random walks converge in distribution to Brownian motion. Moreover, Brownian motion belongs to all classes of stochastic processes we will encounter throughout this course: Gaussian processes, martingales, Lévy processes, and Markov processes.
The plan for this chapter is as follows: We begin with the definition of Brownian motion and give some historical background. Then we take a closer look at Gaussian processes, which are stochastic processes whose finite-dimensional distributions are multivariate normal, and provide a characterization of Brownian motion as a Gaussian process. Then we take a closer look at path properties of Brownian motion. E.g., as we shall see, despite the fact that all paths of Brownian motion are continuous, almost every path is nowhere differentiable. We shall furthermore see that it is not clear from its definition if Brownian motion exists. For this reason, we then spend the remainder of this chapter on the construction of Brownian motion.
There are different ways to construct Brownian motion. Our construction is based on two theorems due to Kolmogorov: the Kolmogorov Consistency Theorem, which provides a convenient way to construct stochastic processes by specifying their finite-dimensional distributions, and the Kolmogorov-Centsov Continuity Theorem, which provides a method for constructing stochastic processes with continuous paths. A combination of the two theorems will establish existence not only of Brownian motion, but of a whole range of other processes as well.
2.1 Definition of Brownian Motion
Throughout this chapter, we fix a filtration F = {F(t)}_{t∈[0,∞)}. We begin with the definition of Brownian motion.
Definition 2.1 (Brownian Motion/Wiener Process). Let W = {W(t)}_{t∈[0,∞)} be an F-adapted stochastic process. We say that W is a Brownian motion or a Wiener process with respect to F if

(W1) W(0) = 0,

(W2) for all s, t ∈ [0,∞) with s < t, the increment W(t) − W(s) is independent of F(s),

(W3) for all s, t ∈ [0,∞) with s < t, the increment W(t) − W(s) is normally distributed with mean zero and variance t − s,

(W4) W is continuous.
[Figure 2.1: Path of a Brownian motion W.]
Brownian motion is named after the Scottish botanist Robert Brown, who is, among other important contributions, famous for the following experiment: In 1827, Brown observed that microscopic particles within pollen grains suspended in water randomly move around. Up to some scaling, the stochastic process of Definition 2.1 can be thought of as a mathematical model for this kind of motion in one dimension, hence the name Brownian motion. The movement of the pollen particles is now believed to be caused by collisions of the particles with the water molecules, an explanation first given by Albert Einstein in 1905. The first existence proof of the stochastic process in Definition 2.1 is due to Norbert Wiener in 1923, which is why the process is sometimes also referred to as the Wiener process.
Definition 2.1 does not guarantee existence of Brownian motion, and we have to put in a significant amount of effort to establish it. Especially the question of whether a process satisfying (W1), (W2), and (W3) can have continuous paths turns out to be quite delicate. Before we address this issue, however, let us first look at some properties of Brownian motion in more detail.
2.2 Gaussian Processes
Let us first try to understand the distributional properties of Brownian motion. As highlighted in Chapter 1, the distribution of a stochastic process is uniquely characterized in terms of the finite-dimensional distributions. In the case of Brownian motion, the finite-dimensional distributions are easily computed.
For n ∈ N, let us fix times 0 ≤ t_1 < · · · < t_n in [0,∞) and consider the n-dimensional random vector

X := (W(t_1), . . . , W(t_n)).

Writing t_0 := 0 and Z_j := W(t_j) − W(t_{j−1}) for j = 1, . . . , n, the k-th component X_k of X can be expressed as

X_k = W(t_k) = Σ_{j=1}^k Z_j for all k = 1, . . . , n.

In particular, for any y ∈ R^n it holds that

⟨y, X⟩ = Σ_{k=1}^n y_k X_k = Σ_{k=1}^n y_k Σ_{j=1}^k Z_j = Σ_{j=1}^n Z_j Σ_{k=j}^n y_k = Σ_{j=1}^n ȳ_j Z_j,

where ȳ_j := Σ_{k=j}^n y_k ∈ R for j = 1, . . . , n. Therefore, since Z_1, . . . , Z_n are independent by (W2) and each Z_j is normally distributed by (W3), it follows that each scalar product ⟨y, X⟩, y ∈ R^n, is normally distributed. But this implies that X has an n-dimensional multivariate normal distribution. In other words: the finite-dimensional distributions of Brownian motion are multivariate normal. Processes with this property are also referred to as Gaussian processes.
Definition 2.2 ((Centered) Gaussian Process; Covariance Function). A real-valued stochastic process X = {X(t)}_{t∈T} is called a Gaussian process if for all {t_1, . . . , t_n} ⊂ T, n ∈ N, the random vector

(X(t_1), . . . , X(t_n)) is n-dimensional normally distributed.

We say that X is centered if

E[X(t)] = 0 for all t ∈ T.

Finally, the function

Γ : T × T → R, Γ(s, t) := Cov[X(s), X(t)],

is called the covariance function of X.
Observe that Brownian motion is centered, since by (W1) and (W3) we have

E[W(t)] = E[W(t) − W(0)] = 0 for all t ∈ [0,∞).
The distribution of a centered Gaussian process is uniquely determined by its covariance function. This is immediate since the finite-dimensional distributions uniquely characterize the distribution of a process and the n-dimensional normal distribution is uniquely determined by its expectation vector and covariance matrix. In particular, we have the following useful characterization of Brownian motion as a Gaussian process.
Theorem 2.3 (Brownian Motion as a Gaussian Process). A stochastic process X = {X(t)}_{t∈[0,∞)} is a Brownian motion with respect to its natural filtration F^X if and only if X is a continuous centered Gaussian process with X(0) = 0 and covariance function

Γ : [0,∞) × [0,∞) → R, Γ(s, t) := s ∧ t.
Proof. Suppose that X is a Brownian motion. We already know that Brownian motion is a continuous centered Gaussian process with X(0) = 0. For 0 ≤ s ≤ t, the fact that Brownian motion starts in zero shows that

Γ(s, t) = Cov(X(s), X(t))
        = Cov(X(s), X(t) − X(s) + X(s))
        = Cov(X(s), X(t) − X(s)) + Cov(X(s), X(s))
        = Cov(X(s), X(t) − X(s)) + Cov(X(s) − X(0), X(s) − X(0)).

Now X(s) and X(t) − X(s) are independent by (W2) and thus the first covariance is equal to zero. The second covariance is equal to the variance of X(s) − X(0), which by (W3) is equal to s. Thus Γ(s, t) = s = s ∧ t as claimed.
On the other hand, since the covariance function uniquely determines the distribution of a centered Gaussian process, it follows that every centered Gaussian process with the above covariance function satisfies (W2) (with respect to its natural filtration) and (W3). Thus, if this process is in addition continuous and satisfies X(0) = 0, it must be a Brownian motion.
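The characterizing covariance Γ(s, t) = s ∧ t is also easy to check by Monte Carlo. The following sketch (our own naming; numpy assumed) builds W(s) and W(t) from independent increments and estimates their covariance:

```python
import numpy as np

def empirical_brownian_covariance(s, t, n_paths, rng):
    """Monte Carlo estimate of Cov[W(s), W(t)] for 0 < s <= t.

    Uses W(s) = Z1 and W(t) = Z1 + Z2 with independent normal increments
    Z1 ~ N(0, s) and Z2 ~ N(0, t - s), as in (W2) and (W3).
    """
    z1 = rng.normal(0.0, np.sqrt(s), size=n_paths)      # W(s)
    z2 = rng.normal(0.0, np.sqrt(t - s), size=n_paths)  # W(t) - W(s)
    ws, wt = z1, z1 + z2
    return np.mean(ws * wt)  # both centered, so this estimates the covariance

rng = np.random.default_rng(11)
estimate = empirical_brownian_covariance(0.4, 0.9, 200_000, rng)
# estimate should be close to 0.4 = s ∧ t
```

With 200,000 samples the standard error of the estimate is of order 10^{-3}, so the agreement with s ∧ t = 0.4 is quite sharp.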
The previous theorem is useful since it allows us to show that Brownian motion is stable under various transformations.
Exercise 18. Let W be a Brownian motion, a > 0, and t_0 ∈ [0,∞). Show that the following processes are Brownian motions as well:

{−W(t)}_{t∈[0,∞)},  {(1/√a) W(at)}_{t∈[0,∞)},  {W(t + t_0) − W(t_0)}_{t∈[0,∞)}.
Exercise 19. Let W be a Brownian motion and t > 0. For each n ∈ N, fix a partition {t_0^n, t_1^n, . . . , t_n^n} of [0, t] with t_0^n = 0 and t_n^n = t and assume that

max_{k=1,...,n} |t_k^n − t_{k−1}^n| → 0 as n → ∞.

Show that

lim_{n→∞} E[ | t − Σ_{k=1}^n |W(t_k^n) − W(t_{k−1}^n)|² |² ] = 0.
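Exercise 19 says that the sum of squared increments over a refining partition converges to t in L²; this is the quadratic variation of Brownian motion. One can observe the effect numerically (a sketch with our own naming; we only check the trend for a single seed):

```python
import numpy as np

def squared_increment_sum(t, n, rng):
    """Sum of squared Brownian increments over a uniform n-point partition of [0, t]."""
    dt = t / n
    increments = rng.normal(0.0, np.sqrt(dt), size=n)
    return np.sum(increments**2)

rng = np.random.default_rng(7)
t = 2.0
coarse = squared_increment_sum(t, 10, rng)       # noisy estimate of t
fine = squared_increment_sum(t, 100_000, rng)    # concentrates near t
```

The sum has mean t for every n, but its variance is 2t²/n, which is why the fine partition pins the value down near t.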
Brownian motion is of course not the only Gaussian process. Let us take a look at some more examples.
Example 2.4 (Brownian Bridge). Let W be a Brownian motion and define a stochastic process X = {X(t)}_{t∈[0,1]} by

X(t) := W(t) − tW(1), t ∈ [0, 1].

Then the process X is called a Brownian bridge.

The Brownian bridge gets its name from its behavior at times t = 0 and t = 1, as we have

X(0) = W(0) − 0 · W(1) = 0 and X(1) = W(1) − 1 · W(1) = 0,

i.e. the Brownian bridge starts and ends in zero. The fact that it is a Gaussian process is easily verified.
[Figure 2.2: Path of a Brownian motion W and the corresponding Brownian bridge X.]
Exercise 20. Show that the Brownian bridge X = {X(t)}_{t∈[0,1]} constructed from a Brownian motion W is a continuous centered Gaussian process with covariance function

Γ : [0, 1] × [0, 1] → R, Γ(s, t) = s ∧ t − st.

Is the Brownian bridge adapted to the natural filtration F^W = {F^W(t)}_{t∈[0,1]} of the underlying Brownian motion?
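The transformation X(t) = W(t) − tW(1) is straightforward to vectorize. The sketch below (our own naming; numpy assumed) builds a discretized bridge from a discretized Brownian path and checks that it is pinned at both ends:

```python
import numpy as np

def brownian_bridge(times, w):
    """Turn a discretized Brownian path on [0, 1] into a Brownian bridge.

    times must be a grid on [0, 1] with times[-1] == 1, and w the values
    of a Brownian path on that grid; returns X(t) = W(t) - t * W(1).
    """
    times = np.asarray(times, dtype=float)
    w = np.asarray(w, dtype=float)
    return w - times * w[-1]

rng = np.random.default_rng(1)
times = np.linspace(0.0, 1.0, 501)
increments = rng.normal(0.0, np.sqrt(np.diff(times)))
w = np.concatenate([[0.0], np.cumsum(increments)])
x = brownian_bridge(times, w)  # x[0] == x[-1] == 0 by construction
```

Note that the whole path of W up to time 1 enters through W(1), which already hints at the answer to the adaptedness question in Exercise 20.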
Another important Gaussian process is the Ornstein-Uhlenbeck process.
Example 2.5 (Ornstein-Uhlenbeck Process). Let κ, σ > 0 be given. The continuous centered Gaussian process X = {X(t)}_{t∈[0,∞)} with covariance function

Γ : [0,∞) × [0,∞) → R, Γ(s, t) := (σ²/(2κ)) e^{−κ(s+t)} (e^{2κ(s∧t)} − 1)

is called an Ornstein-Uhlenbeck process.
[Figure 2.3: Paths of Ornstein-Uhlenbeck processes for various choices of the parameters κ and σ (panels: κ = 1, σ = 1; κ = 1, σ = 10; κ = 100, σ = 1; κ = 100, σ = 10).]
As usual, existence of the Ornstein-Uhlenbeck process is a priori not guaranteed. However, as the following exercise shows, the Ornstein-Uhlenbeck process can be constructed in a straightforward manner from a Brownian motion by means of rescaling and a suitable time change.
Exercise 21. Let κ, σ > 0 and let W be a Brownian motion. Define a stochastic process X = {X(t)}_{t∈[0,∞)} by

X(t) := e^{−κt} W( (σ²/(2κ)) (e^{2κt} − 1) ) for all t ∈ [0,∞).

Show that X is an Ornstein-Uhlenbeck process.
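The construction in Exercise 21 can be simulated directly: sample W on the deformed time grid and rescale. A sketch (our own naming; numpy assumed):

```python
import numpy as np

def ou_path(times, kappa, sigma, rng):
    """Sample an Ornstein-Uhlenbeck path via the time change of Exercise 21.

    Evaluates X(t) = exp(-kappa t) * W(sigma^2/(2 kappa) * (exp(2 kappa t) - 1))
    by sampling W on the (increasing) transformed grid.
    """
    times = np.asarray(times, dtype=float)
    s = sigma**2 / (2.0 * kappa) * (np.exp(2.0 * kappa * times) - 1.0)
    increments = rng.normal(0.0, np.sqrt(np.diff(s)))
    w_on_s = np.concatenate([[0.0], np.cumsum(increments)])
    return np.exp(-kappa * times) * w_on_s

rng = np.random.default_rng(3)
times = np.linspace(0.0, 1.0, 201)
x = ou_path(times, kappa=1.0, sigma=1.0, rng=rng)
```

Since the transformed times s(t) are increasing with s(0) = 0, W can be sampled on them by cumulating independent normal increments, exactly as for a plain Brownian path.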
The Ornstein-Uhlenbeck process has the interesting feature that it tends to return to its mean of zero. This feature is sometimes referred to as mean reversion. The larger the choice of κ, the faster the Ornstein-Uhlenbeck process returns to its mean. The parameter σ, on the other hand, controls the volatility of the process. These features are exemplified in Figure 2.3.
Example 2.6 (Fractional Brownian Motion). Let H ∈ (0, 1). The continuous centered Gaussian process X = {X(t)}_{t∈[0,∞)} with covariance function

Γ : [0,∞) × [0,∞) → R, Γ(s, t) := (1/2)(s^{2H} + t^{2H} − |t − s|^{2H})

is called fractional Brownian motion with Hurst index H.
It is not very difficult to verify that a fractional Brownian motion with Hurst index H = 1/2 is a Brownian motion. If the Hurst index H is smaller than 1/2, then the paths of a fractional Brownian motion look rougher than the paths of a Brownian motion. On the other hand, if the Hurst index H is bigger than 1/2, the paths of the fractional Brownian motion look less rough than those of a Brownian motion. We will return to this feature later, but for now we convince ourselves by looking at Figure 2.4.
[Figure 2.4: Paths of fractional Brownian motions for different choices of H (H = 1/4 and H = 3/4).]
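Since fractional Brownian motion is specified only through its covariance function, a generic way to sample it on a finite grid is to assemble the covariance matrix Γ(t_i, t_j) and multiply its Cholesky factor by a standard normal vector. This works for any centered Gaussian process; the sketch below (our own naming; numpy assumed) is illustrative, not an efficient fBm sampler:

```python
import numpy as np

def fbm_covariance(times, hurst):
    """Covariance matrix Gamma(s, t) = (s^2H + t^2H - |t - s|^2H) / 2."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst))

def sample_gaussian_process(cov, rng):
    """Draw one centered Gaussian vector with the given covariance matrix."""
    # Small diagonal jitter guards against round-off in the Cholesky factor.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(cov)))
    return chol @ rng.normal(size=len(cov))

rng = np.random.default_rng(5)
times = np.linspace(0.02, 1.0, 50)  # start at t > 0 so the matrix is positive definite
path = sample_gaussian_process(fbm_covariance(times, hurst=0.75), rng)
```

For H = 1/2 the function `fbm_covariance` reproduces Γ(s, t) = s ∧ t, so the same two functions also sample ordinary Brownian motion on a grid.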
2.3 Path Properties of Brownian Motion
Let us now take a closer look at the paths of Brownian motion. Figure 2.1 suggests that the paths are quite rough. As it turns out, almost every path of Brownian motion is nowhere differentiable. More precisely, it can be shown that almost every path is nowhere p-Hölder continuous if p > 1/2.
Suppose that (S, d) is a metric space and fix p ∈ (0, 1]. We say that a function f : T → S is p-Hölder continuous at a point t ∈ T if

lim sup_{s→t} d(f(s), f(t)) / |s − t|^p < ∞.

Moreover, we say that f is p-Hölder continuous if there exist constants δ > 0 and C > 0 such that

d(f(s), f(t)) ≤ C |s − t|^p for all s, t ∈ T with |s − t| ≤ δ.
Clearly, if f is p-Hölder continuous (in t), then it is also q-Hölder continuous (in t) for all 0 < q ≤ p. Moreover, p-Hölder continuity implies uniform continuity, and if f is differentiable in t, then f must be 1-Hölder continuous in t. Conversely, if there exists no t ∈ T such that f is p-Hölder continuous in t for some p ∈ (0, 1], then f cannot be differentiable anywhere.
Exercise 22. Let q ∈ (0, 1] and let f : [0,∞) → [0,∞) be given by f(x) := x^q for all x ∈ [0,∞). For which p ∈ (0, 1] is this function p-Hölder continuous?
Exercise 23. Construct a function which is uniformly continuous, but not p-Hölder continuous for any p ∈ (0, 1].
Theorem 2.7 (Nowhere Differentiability of Brownian Paths). Let W be a Brownian motion and p > 1/2. Then for P-almost every ω ∈ Ω, the path W(·, ω) is continuous, but nowhere p-Hölder continuous. In particular, almost every path of W is nowhere differentiable.
Proof. Choose r ∈ N such that (p − 1/2)r > 1, which is possible since p > 1/2. For each t ∈ [0, 1], we define

H(t) := {f : [0,∞) → R : f is p-Hölder continuous in t} ⊂ R^{[0,∞)}.
We now proceed to construct a set A ⊂ R^{[0,∞)} with A ∈ B(R)^{[0,∞)} such that H(t) ⊂ A for all t ∈ [0, 1] and P[W ∈ A] = 0. From this, it follows that almost every path is nowhere p-Hölder continuous on [0, 1]. Using the scaling invariance of Brownian motion, we then extend the result to [0,∞).
Step 1: Construction of A. Given K, n ∈ N and k ∈ {1, . . . , n}, define

A(K, n, k) := ⋂_{j=1}^r { f : [0,∞) → R : |f((k+j)/n) − f((k+j−1)/n)| ≤ K n^{−p} }.

With this, we then proceed to set

A(K, n) := ⋃_{k=1}^n A(K, n, k),

followed by

A(K) := lim inf_{n→∞} A(K, n) = ⋃_{n=1}^∞ ⋂_{N=n}^∞ A(K, N),

and finally

A := ⋃_{K∈N} A(K).

Since the set A is constructed from countably many unions and intersections of sets A(K, n, k), and we clearly have A(K, n, k) ∈ B(R)^{[0,∞)}, we see that A ∈ B(R)^{[0,∞)}, i.e. A is a measurable set. We claim that

⋃_{t∈[0,1]} H(t) ⊂ A.
For this, let us fix t ∈ [0, 1] as well as f ∈ H(t). To see that f ∈ A, it suffices to show that there exists K sufficiently large such that for eventually all n ∈ N there exists some k ∈ {1, . . . , n} with f ∈ A(K, n, k). For this, we first observe that we can find δ = δ(f, t) > 0 and C = C(f, t) > 0 such that

|f(t) − f(s)| ≤ C |t − s|^p for all s ∈ [t − δ, t + δ] ∩ [0,∞).

Now choose n ∈ N sufficiently large such that (r + 1)/n < δ. Next, let k ∈ {1, . . . , n} be such that (k − 1)/n < t ≤ k/n (and k = 1 if t = 0). Then

|(k+j)/n − t| = (k+j)/n − t = (k/n − t) + j/n ≤ 1/n + j/n ≤ (r+1)/n < δ for all j = 0, . . . , r.
But then we must have

|f((k+j)/n) − f((k+j−1)/n)| ≤ |f((k+j)/n) − f(t)| + |f(t) − f((k+j−1)/n)| ≤ 2C((r+1)/n)^p = 2C(r+1)^p n^{−p}, j = 1, . . . , r.

Put differently, choosing K ∈ N such that K ≥ 2C(r+1)^p, we have f ∈ A(K, n, k) for all n ∈ N sufficiently large and some k ∈ {1, . . . , n}, and therefore f ∈ A as claimed.
Step 2: We show that P[W ∈ A] = 0. By definition of A, it suffices to show that P[W ∈ A(K)] = 0 for all K ∈ N. To see this, let n ∈ N and choose k ∈ {1, . . . , n}. Using the independence and stationarity of the increments of Brownian motion, it follows that

P[W ∈ A(K, n, k)] = P[ |W((k+j)/n) − W((k+j−1)/n)| ≤ K n^{−p} for all j = 1, . . . , r ]
                  = Π_{j=1}^r P[ |W((k+j)/n) − W((k+j−1)/n)| ≤ K n^{−p} ]
                  = P[ |W(1/n)| ≤ K n^{−p} ]^r
                  = P[ |W(1)| ≤ K n^{−p+1/2} ]^r.
Since the density of the standard normal distribution is bounded from above by 1, we can estimate the probability of the event {|W(1)| ≤ K n^{−p+1/2}} by the Lebesgue measure of the set [−K n^{−p+1/2}, K n^{−p+1/2}], i.e.

P[W ∈ A(K, n, k)] ≤ (2K)^r n^{(−p+1/2)r}.
Using Fatou's lemma and the σ-subadditivity of P, we therefore find that

P[W ∈ A(K)] ≤ lim inf_{n→∞} P[W ∈ A(K, n)]
            ≤ lim sup_{n→∞} Σ_{k=1}^n P[W ∈ A(K, n, k)]
            ≤ lim sup_{n→∞} n (2K)^r n^{(−p+1/2)r}
            = (2K)^r lim sup_{n→∞} n^{1−(p−1/2)r} = 0,

since we have chosen r such that (p − 1/2)r > 1. Thus P[W ∈ A] = 0.
Step 3: Extension to [0,∞). Thus far, we have seen that P-almost every path is nowhere p-Hölder continuous on [0, 1]. Now assume by contradiction that there exists an event F ∈ A with P[F] > 0 such that for each ω ∈ F, there exists some t(ω) ∈ [0,∞) such that W(·, ω) is p-Hölder continuous in t(ω). Since [0,∞) = ⋃_{m∈N} [m − 1, m), we may without loss of generality assume that there exists M ∈ N such that t(ω) ≤ M for all ω ∈ F. Now define a stochastic process X = {X(t)}_{t∈[0,∞)} by

X(t) := (1/√M) W(Mt) for all t ∈ [0,∞).
By Exercise 18, X is a Brownian motion and thus almost surely nowhere p-Hölder continuous on [0, 1]. On the other hand, since for each ω ∈ F the path W(·, ω) is p-Hölder continuous in t(ω), X(·, ω) must be p-Hölder continuous in t(ω)/M ≤ 1. This is a contradiction, hence the proof is finished.
Let us emphasize the following: Almost every path of Brownian motion is nowhere differentiable, which is a lot stronger than the statement that almost every path is not differentiable. Indeed, for the latter statement it suffices to find one t ∈ [0,∞) at which W(·, ω) is not differentiable. The statement of the previous theorem tells us, however, that t ↦ W(t, ω) is not differentiable at any t ∈ [0,∞). Intuitively, this means that the paths of Brownian motion must have a lot of kinks.
Differentiability is not the only property which fails for paths of Brownian motion. For example, it can also be shown that almost every path is nowhere monotone.
Exercise 24. Let W be a Brownian motion. Show that

P[ {ω ∈ Ω : there exist t_0, t_1 ∈ [0,∞) with t_0 < t_1 such that W(s, ω) ≤ W(t, ω) for all t_0 ≤ s ≤ t ≤ t_1} ] = 0.
2.4 The Kolmogorov Consistency Theorem
We now turn to the issue of existence of Brownian motion. As already mentioned, there are several ways to establish the existence result. The approach taken in this course is a rather systematic one in that the results establish existence for a whole range of stochastic processes at once, including all Gaussian processes introduced in Section 2.2.
We follow a two-step procedure. First, we derive a theorem which allows us to construct stochastic processes by specifying finite-dimensional distributions satisfying a certain consistency condition. This will allow us to construct raw versions of stochastic processes satisfying the correct distributional properties, i.e. in the case of Brownian motion the theorem can be used to construct a raw Brownian motion satisfying the independent increments property (W2) and the stationary normal increments property (W3), but not necessarily the continuity property (W4). The theorem facilitating this raw construction is called Kolmogorov's Consistency Theorem and is the main subject of this section.
In the second step, we show that we can modify the raw process on suitable nullsets to ensure the continuous paths property. We shall see that this is always possible provided that the increments of the raw process satisfy a certain integrability condition (which is a distributional property). This result is called the Kolmogorov-Centsov Continuity Theorem and will be derived in the subsequent section.
In order to establish the consistency theorem, we first need to followingregularity result concerning probability measures on Rd, stating that theprobability of a Borel set can be approximated from below by probabilitiesof compact sets and from above by probabilities of open sets. A measurewhich satisfies these two properties is often referred to as being regular.
Lemma 2.8 (Regularity of Probability Measures on Rd). Let µ be a probability measure on (Rd, B(Rd)) and let B ∈ B(Rd) be a Borel set. Then

    µ(B) = sup{µ(K) : K ⊂ B compact} = inf{µ(O) : O ⊃ B open},    (2.1)

i.e. µ is a regular measure.
Proof. The proof is based on a monotone class argument and is divided into three steps.

Step 1: The good set. We denote by G the set of Borel sets for which the claim of the lemma holds, i.e.

    G := {G ∈ B(Rd) : G satisfies Equation (2.1)}.

Our aim is of course to show that G = B(Rd). For this, we show that G is a λ-system which contains a π-system generating B(Rd). The result then follows from Dynkin's π-λ lemma (which states, as we recall, that the σ-field generated by a π-system coincides with the λ-system generated by this
π-system). The fact that G contains a π-system generating B(Rd) is easily seen since G clearly contains the system of sets of the form

    ×_{i=1}^d (−∞, a_i]    for all a_1, ..., a_d ∈ R.

Indeed, ×_{i=1}^d (−∞, a_i] can be approximated from below by the compact sets {K_n}_{n∈N} and from above by the open sets {O_n}_{n∈N} given by

    K_n := ×_{i=1}^d [−n, a_i]    and    O_n := ×_{i=1}^d (−∞, a_i + 1/n),    n ∈ N.

By the continuity of µ from below and above it then follows that

    µ(×_{i=1}^d (−∞, a_i]) = lim_{n→∞} µ(K_n) = lim_{n→∞} µ(O_n),

i.e. ×_{i=1}^d (−∞, a_i] ∈ G. To see that G is a λ-system, we have to show that Rd ∈ G (which follows by a similar argument as above), and that G is stable under taking complements and countable unions of disjoint sets.
Step 2: To see that G is stable under taking complements, let G ∈ G. Then there exists a sequence of compact sets {K_n}_{n∈N} and a sequence of open sets {O_n}_{n∈N} such that K_n ⊂ G ⊂ O_n for all n ∈ N and

    µ(G) = lim_{n→∞} µ(K_n) = lim_{n→∞} µ(O_n).

For each n ∈ N, it follows that G^c is contained in the open set K_n^c, and hence

    µ(G^c) = 1 − µ(G) = 1 − lim_{n→∞} µ(K_n) = lim_{n→∞} µ(K_n^c).

On the other hand, for each n, m ∈ N, the set F_n^m := O_n^c ∩ [−m, m]^d is a compact subset of G^c and

    µ(O_n^c) = lim_{m→∞} µ(F_n^m)    for all n ∈ N

by continuity of µ from below. Now given j ∈ N, there exists N(j) ∈ N with

    µ(G) ≥ µ(O_n) − 1/(2j)    for all n ≥ N(j)

and for each n ∈ N we can find M(n) ∈ N such that

    µ(O_n^c) ≤ µ(F_n^m) + 1/(2j)    for all m ≥ M(n).
Now define a sequence of sets {F_j}_{j∈N} by

    F_j := F_{N(j)}^{M(N(j))},    j ∈ N.

Then it follows that F_j ⊂ G^c is compact and

    µ(F_j) ≥ µ(O_{N(j)}^c) − 1/(2j) = 1 − µ(O_{N(j)}) − 1/(2j) ≥ 1 − µ(G) − 1/j = µ(G^c) − 1/j

for all j ∈ N, i.e. µ(G^c) = lim_{j→∞} µ(F_j) and thus G^c ∈ G.
Step 3: To finish the proof, it remains to show that whenever {G_n}_{n∈N} is a sequence of disjoint sets in G, then G := ⋃_{n∈N} G_n ∈ G. For this, let us fix ε > 0 and for each n ∈ N we select K_n ⊂ G_n compact and O_n ⊃ G_n open with

    µ(O_n) − ε2^{−n} ≤ µ(G_n) ≤ µ(K_n) + (1/2)ε2^{−n}.

Now the set O := ⋃_{n∈N} O_n is open, contains G, and

    µ(G) ≤ µ(O) ≤ ∑_{n=1}^∞ µ(O_n) ≤ ∑_{n=1}^∞ (µ(G_n) + ε2^{−n}) = ε + ∑_{n=1}^∞ µ(G_n) = ε + µ(G),    (2.2)

where the last equality is a consequence of the σ-additivity of µ since the sequence {G_n}_{n∈N} is disjoint. From ∑_{n=1}^∞ µ(G_n) = µ(G) < ∞, it furthermore follows that there exists m ∈ N such that ∑_{n=m+1}^∞ µ(G_n) ≤ (1/2)ε.

Now define K := ⋃_{n=1}^m K_n and observe that K is a compact subset of G and

    µ(K) ≤ µ(G) = ∑_{n=1}^m µ(G_n) + ∑_{n=m+1}^∞ µ(G_n) ≤ ∑_{n=1}^m µ(G_n) + (1/2)ε
         ≤ ∑_{n=1}^m (µ(K_n) + (1/2)ε2^{−n}) + (1/2)ε ≤ ε + ∑_{n=1}^m µ(K_n) = ε + µ(K),    (2.3)

where we have used that K_1, ..., K_m are disjoint since K_n ⊂ G_n for all n ∈ N and the sequence {G_n}_{n∈N} is disjoint. Since ε > 0 was chosen arbitrarily, it follows from Equation (2.2) and Equation (2.3) that G ∈ G.
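As a concrete illustration of Equation (2.1), consider the following example of ours (not from the notes): take d = 1, µ the uniform distribution on [0, 1], and B = Q ∩ [0, 1], a Borel set of measure zero.

```latex
% Inner regularity: every compact K \subset B is a subset of the countable set B,
% so \mu(K) = 0, and hence
\mu(B) = 0 = \sup\{\mu(K) : K \subset B \text{ compact}\}.
% Outer regularity: enumerate B = \{q_1, q_2, \dots\} and cover q_n by an open
% interval of length \varepsilon 2^{-n}; for
O = \bigcup_{n=1}^{\infty} \bigl(q_n - \varepsilon 2^{-n-1},\, q_n + \varepsilon 2^{-n-1}\bigr) \supset B
% we obtain \mu(O) \le \sum_{n=1}^{\infty} \varepsilon 2^{-n} = \varepsilon,
% so the infimum over open supersets is 0 = \mu(B) as well.
```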
Let us now turn to Kolmogorov's consistency theorem. For this, let us denote S = Rd and S = B(Rd). Now for each n ∈ N and each finite subset F := {t_1, ..., t_n} of T, let P_F be a probability measure on (S^F, S^F). The question we ask ourselves now is whether there exists a stochastic process
X = {X(t)}_{t∈T} such that {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} are the finite-dimensional distributions of X. In general, this need not be true as we have not yet specified any conditions on the probability measures P_F.

To see what kind of conditions are necessary, let us assume for a second that {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} really are the finite-dimensional distributions of some stochastic process X = {X(t)}_{t∈T}. Let us furthermore fix two finite subsets F = {t_1, ..., t_n} and G = {s_1, ..., s_m} of T such that F ⊂ G. We denote by

    π_F^G : S^G → S^F,    {x(t)}_{t∈G} ↦ {x(t)}_{t∈F}

the coordinate projection from S^G to S^F. Let us fix B ∈ S^F of the form

    B = ×_{t∈F} B_t    where B_t ∈ S for all t ∈ F.

Now set B̃ := {π_F^G ∈ B} and observe that B̃ is given explicitly by

    B̃ = ×_{t∈G} B̃_t    where B̃_t = B_t if t ∈ F and B̃_t = S if t ∈ G \ F.

Now since {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} are the finite-dimensional distributions of X, it follows that

    P[{X(t)}_{t∈F} ∈ B] = P_F[B].

On the other hand, since B̃_t = S whenever t ∈ G \ F and X(t) ∈ S for all t ∈ T, we also have

    P[{X(t)}_{t∈F} ∈ B] = P[{X(t)}_{t∈F} ∈ B, X(t) ∈ B̃_t for all t ∈ G \ F]
                        = P[{X(t)}_{t∈G} ∈ B̃] = P_G[B̃] = P_G[π_F^G ∈ B],

i.e. for {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} to be the finite-dimensional distributions of a stochastic process X we must necessarily have

    P_F[B] = P_G[π_F^G ∈ B]    for all B ∈ S^F    (2.4)

for any choice of finite sets F and G with F ⊂ G ⊂ T. Equation (2.4) is called Kolmogorov's Consistency Condition, and Kolmogorov's consistency theorem below shows that this consistency condition is not just necessary for the existence of the process X, but also sufficient.
Let us think for a second about the strategy to construct a stochastic process X = {X(t)}_{t∈T} given the finite-dimensional distributions {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N}. Observe that we have some additional freedom in our construction: We can choose the probability space (Ω, A, P) in whatever way we want, and we shall use this to our advantage. If we choose Ω = S^T to be the space of functions from T to S and let the σ-field A = S^T be the product σ-field on Ω, we can choose X = {X(t)}_{t∈T} to be the coordinate process or canonical process given by

    X(t, ω) := ω(t)    for all t ∈ T and ω ∈ Ω.

Considering the process X as a path-valued random variable, i.e.

    X : Ω → S^T,    ω ↦ X(·, ω),

we see that X is the identity mapping from Ω into itself. Thus choosing the probability measure P on (Ω, A) is the same as choosing the distribution of the process X. Since the distribution of a process is determined uniquely by its finite-dimensional distributions, we should be able to construct P from the given set of probability measures {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N}.
Theorem 2.9 (Kolmogorov’s Consistency Theorem). Let S = Rd and S =B(Rd). For each F = [t1, . . . , tn] ⊂ T , n ∈ N, assume that PF is a probabilitymeasure on (SF ,SF) and assume that whenever G = [s1, . . . , sm] ⊂ T , m ∈ N,is such that F ⊂ G, then Kolmogorov’s consistency condition
PF [B] = PG[πGF ∈ B
]for all B ∈ SF
holds. Then there exists a probability space (Ω,A,P) as well as a stochastic pro-cess X = X(t)t∈T on (Ω,A,P) such that the finite-dimensional distributionsof X are given by PF : F = [t1, . . . , tn] ⊂ T , n ∈ N.
Proof. As outlined above, we choose
Ω , ST and A , ST ,
and let X = X(t)t∈T be the canonical process given by
X(t, ω) , ω(t) for all t ∈ T and ω ∈ Ω.
Then specifying a probability measure P on (Ω, A) is equivalent to specifying the distribution of X. To construct the measure P, we construct a premeasure from {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} and then use Carathéodory's extension theorem.

Step 1: Construction of P. Given F = {t_1, ..., t_n} ⊂ T, we introduce the shorthand notation π_F := π_F^T. Now consider the system of sets

    R := ⋃_{n∈N} ⋃_{F={t_1,...,t_n}⊂T} σ(π_F) = {{π_F ∈ B} : F = {t_1, ..., t_n} ⊂ T, n ∈ N, B ∈ S^F}.

It is clear that σ(R) = A since A = S^T is the smallest σ-field on S^T containing all finite-dimensional cylinder sets. Moreover, R is a ring of sets, i.e. it is nonempty, closed under unions, and closed under taking set differences. Indeed, R is nonempty since each σ(π_F) is nonempty. Let now {π_{F_1} ∈ B_1}, {π_{F_2} ∈ B_2} ∈ R for F_1, F_2 ⊂ T finite and B_1 ∈ S^{F_1} and B_2 ∈ S^{F_2}. Setting F := F_1 ∪ F_2, we observe that

    {π_{F_i} ∈ B_i} = {π_F ∈ {π_{F_i}^F ∈ B_i}},    i = 1, 2.

But then

    {π_{F_1} ∈ B_1} ∪ {π_{F_2} ∈ B_2} = {π_F ∈ {π_{F_1}^F ∈ B_1} ∪ {π_{F_2}^F ∈ B_2}} ∈ R

and similarly

    {π_{F_1} ∈ B_1} \ {π_{F_2} ∈ B_2} = {π_F ∈ {π_{F_1}^F ∈ B_1} \ {π_{F_2}^F ∈ B_2}} ∈ R.

Thus R really is a ring. With this, we now define

    P : R → [0, 1],    P[{π_F ∈ B}] := P_F[B].

We have to be careful here to ensure that P is well-defined as

    {π_F ∈ B} = {π_G ∈ {π_F^G ∈ B}}    if G ⊂ T is finite and contains F.

However, by Kolmogorov's consistency condition, P is well-defined. If we can show that P is a premeasure, it follows from Carathéodory's extension theorem that P extends uniquely to a probability measure on (Ω, A), which we again denote by P. Since

    P[{X(t)}_{t∈F} ∈ B] = P[{ω ∈ Ω : {ω(t)}_{t∈F} ∈ B}] = P[{π_F ∈ B}] = P_F[B]
for all B ∈ S^F and F ⊂ T finite, showing that P is a premeasure thus finishes the proof.

Step 2: We show that P is a content. For this, we first observe that P[∅] = 0. Indeed, for any t ∈ T we have ∅ = {π_{t} ∈ ∅} and thus

    P[∅] = P[{π_{t} ∈ ∅}] = P_{t}[∅] = 0.

Moreover, P is finitely additive. Indeed, let F_1, ..., F_n be finite subsets of T and B_1, ..., B_n such that B_i ∈ S^{F_i}, i = 1, ..., n, and such that the sets {π_{F_i} ∈ B_i}, i = 1, ..., n, are disjoint. Now set F := ⋃_{i=1}^n F_i and observe that, as above,

    {π_{F_i} ∈ B_i} = {π_F ∈ {π_{F_i}^F ∈ B_i}},    i = 1, ..., n.

With {π_{F_i} ∈ B_i}, i = 1, ..., n, being disjoint, it hence follows that the sets {π_{F_i}^F ∈ B_i}, i = 1, ..., n, are disjoint as well. But then

    P[⋃_{i=1}^n {π_{F_i} ∈ B_i}] = P[⋃_{i=1}^n {π_F ∈ {π_{F_i}^F ∈ B_i}}]
        = P[{π_F ∈ ⋃_{i=1}^n {π_{F_i}^F ∈ B_i}}]
        = P_F[⋃_{i=1}^n {π_{F_i}^F ∈ B_i}]
        = ∑_{i=1}^n P_F[{π_{F_i}^F ∈ B_i}]
        = ∑_{i=1}^n P_{F_i}[B_i]
        = ∑_{i=1}^n P[{π_{F_i} ∈ B_i}].
We have hence argued that P is a content.
Step 3: We are left with showing that P is σ-additive. Since we already know that P is finitely additive and P[A] ≤ 1 for all A ∈ R, it is sufficient to show that P is continuous in ∅, i.e. whenever {A_n}_{n∈N} is a sequence in R with A_n ⊃ A_{n+1} for all n ∈ N and A := ⋂_{n∈N} A_n = ∅, then lim_{n→∞} P[A_n] = 0. Indeed, if {B_n}_{n∈N} is a sequence of pairwise disjoint sets in R and if we set

    B := ⋃_{n=1}^∞ B_n    and    A_n := B \ ⋃_{i=1}^n B_i    for all n ∈ N,

then {A_n}_{n∈N} satisfies A_n ⊃ A_{n+1} for all n ∈ N and ⋂_{n∈N} A_n = ∅. But then, if P is continuous in ∅, since P[B] is finite and P is finitely additive,

    0 = lim_{n→∞} P[A_n] = P[B] − lim_{n→∞} ∑_{i=1}^n P[B_i] = P[B] − ∑_{n=1}^∞ P[B_n],

i.e. P is σ-additive. Finally, showing that P is continuous in ∅ is evidently equivalent to showing that if δ := inf_{n∈N} P[A_n] > 0, then A ≠ ∅.
Since A_n ∈ R, there exist F_n ⊂ T finite and B_n ∈ S^{F_n} such that A_n = {π_{F_n} ∈ B_n}. Clearly, we may choose F_n in such a way that F_n ⊂ F_{n+1} for all n ∈ N. Applying Lemma 2.8 to the probability measure P_{F_n}, there exists a compact set K_n ⊂ B_n such that

    P[A_n \ {π_{F_n} ∈ K_n}] = P[{π_{F_n} ∈ B_n \ K_n}] = P_{F_n}[B_n \ K_n] ≤ δ2^{−n}.

Now define

    L_n := ⋂_{k=1}^n {π_{F_k} ∈ K_k}    for all n ∈ N    and    L := ⋂_{n∈N} L_n.

Then, for each n ∈ N, we observe that

    L_{n+1} ⊂ L_n ⊂ {π_{F_n} ∈ K_n} ⊂ {π_{F_n} ∈ B_n} = A_n.

Now suppose that x ∈ A_n \ L_n for some n ∈ N. Then x ∈ A_k for all k ≤ n since the sequence {A_n}_{n∈N} is nonincreasing, and x ∉ L_n, i.e. x ∉ {π_{F_k} ∈ K_k} for some k ≤ n. But then x ∈ A_k \ {π_{F_k} ∈ K_k} for some k ≤ n and thus A_n \ L_n ⊂ ⋃_{k=1}^n (A_k \ {π_{F_k} ∈ K_k}). From this and δ ≤ P[A_n] it follows that

    δ − P[L_n] ≤ P[A_n \ L_n] ≤ ∑_{k=1}^n P[A_k \ {π_{F_k} ∈ K_k}] ≤ δ ∑_{k=1}^n 2^{−k} < δ

and hence P[L_n] > 0 for all n ∈ N. In particular, this implies that L_n ≠ ∅ for all n ∈ N. Now, for each n ∈ N, we fix some arbitrary x_n ∈ L_n and observe that this implies

    x_n ∈ {π_{F_k} ∈ K_k},    i.e. π_{F_k}(x_n) ∈ K_k    for all k ≤ n.

Put differently, this shows that π_{F_1}(x_n) ∈ K_1 for all n ∈ N, π_{F_2}(x_n) ∈ K_2 for all n ≥ 2, and so on. Since K_1 is compact, there exists a subsequence {n_j^1}_{j∈N} ⊂ N such that

    y_1 := lim_{j→∞} π_{F_1}(x_{n_j^1}) ∈ K_1 exists.

Iteratively, for each k ≥ 2, we find {n_j^k}_{j∈N} ⊂ {n_j^{k−1}}_{j∈N} such that

    y_k := lim_{j→∞} π_{F_k}(x_{n_j^k}) ∈ K_k exists.

Since {n_j^m}_{j∈N} ⊂ {n_j^k}_{j∈N} and the projection π_{F_k}^{F_m} is continuous for each k, m ∈ N with k ≤ m, we observe moreover that
    π_{F_k}^{F_m}(y_m) = π_{F_k}^{F_m}(lim_{j→∞} π_{F_m}(x_{n_j^m})) = lim_{j→∞} π_{F_k}^{F_m}(π_{F_m}(x_{n_j^m})) = lim_{j→∞} π_{F_k}(x_{n_j^m}) = y_k.

But this can only be the case if there exists y ∈ S^T such that

    y_n = π_{F_n}(y)    for all n ∈ N,

and since y_n ∈ K_n for each n ∈ N, this shows that y ∈ {π_{F_n} ∈ K_n} for all n ∈ N. By the definition of L, this yields y ∈ L ⊂ A and thus A is nonempty. Hence P is indeed a premeasure and we conclude.
It is not very difficult to see that Theorem 2.9 remains valid if we replace S = Rd and S = B(Rd) by any discrete space I together with its power set. The main reason for this is that any discrete space may be identified with a subset of Z ⊂ R. This is important since it allows us to construct I-valued stochastic processes. As a matter of fact, Theorem 2.9 is true for S being any complete, separable metric space and S = B(S) the Borel σ-field on S.
The main difficulty in applying Kolmogorov's consistency theorem is checking whether a given set {P_F : F = {t_1, ..., t_n} ⊂ T, n ∈ N} of finite-dimensional distributions is consistent. By induction, it suffices to check that

    P_G[π_F^G ∈ B] = P_F[B]    for all B ∈ B(Rd)^F

whenever F ⊂ G are finite subsets of T and G has exactly one more element than F. Moreover, since both P_G[π_F^G ∈ ·] and P_F are probability measures on B(Rd)^F, it suffices to check the consistency for all B contained in some π-system generating B(Rd)^F.
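On a finite state space, checking Equation (2.4) reduces to summing out the extra coordinates. A toy sketch of this check (the helper name is ours), with S = {0, 1} and P_G a product of fair coins:

```python
from itertools import product

def pushforward_marginal(joint, keep_axes):
    """Distribution of the projection pi^G_F applied to a finitely supported P_G.

    `joint` maps tuples (one entry per index of G) to probabilities; `keep_axes`
    lists the positions of the indices belonging to F. Summing out the remaining
    coordinates gives the candidate for P_F in Equation (2.4).
    """
    marginal = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep_axes)
        marginal[key] = marginal.get(key, 0.0) + prob
    return marginal

# Toy consistent family: P_G is two i.i.d. fair coins, P_F a single fair coin.
P_G = {outcome: 0.25 for outcome in product((0, 1), repeat=2)}
P_F = {(0,): 0.5, (1,): 0.5}
print(pushforward_marginal(P_G, (0,)) == P_F)  # prints True
```

The same summation run for every pair F ⊂ G with |G| = |F| + 1 is exactly the inductive consistency check described above, in the finite setting.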
Let us now put Kolmogorov's consistency theorem to use and construct stochastic processes. The first example proves the existence of sequences of independent random variables, a result which is often silently assumed in introductory courses on probability theory, but seldom proved.
Corollary 2.10 (Existence of Independent Sequences). For each n ∈ N, let µ_n be a probability measure on (Rd, B(Rd)). Then there exists a sequence of independent random variables {X_n}_{n∈N} on some probability space (Ω, A, P) such that X_n has distribution µ_n for each n ∈ N.
Proof. We may treat {X_n}_{n∈N} as a stochastic process with finite-dimensional distributions {P_F : F = {t_1, ..., t_n} ⊂ N, n ∈ N} given by P_F := ⊗_{t∈F} µ_t for all F = {t_1, ..., t_n} ⊂ N, n ∈ N. By Theorem 2.9, we only have to check consistency. For this, let F and G be finite subsets of N with F ⊂ G, and for each t ∈ F fix some B_t ∈ B(Rd). Then B := ×_{t∈F} B_t ∈ B(Rd)^F and

    {π_F^G ∈ B} = ×_{t∈G} B̃_t,    where B̃_t := B_t if t ∈ F and B̃_t := Rd whenever t ∉ F.

But then, since µ_n(Rd) = 1 for all n ∈ N, it follows that

    P_F[B] = ∏_{t∈F} µ_t(B_t) = ∏_{t∈G} µ_t(B̃_t) = P_G[×_{t∈G} B̃_t] = P_G[π_F^G ∈ B],

implying that {P_F : F = {t_1, ..., t_n} ⊂ N, n ∈ N} is consistent.
Exercise 25. Let (Ω, A, P) be a finite probability space, i.e. Ω = {ω_1, ..., ω_n} for some n ∈ N. Show that there exists no nontrivial sequence {Z_n}_{n∈N} of independent random variables on this probability space.

Exercise 26. Let Ω = [0, 1], A = B([0, 1]), and let P be the Lebesgue measure on [0, 1]. Construct a sequence {Z_n}_{n∈N} of independent Bernoulli distributed random variables with parameter p = 1/2 on (Ω, A, P).
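As a hint towards Exercise 26, one classical approach (sketched here numerically; the construction is ours and not spelled out in the notes) takes Z_n(ω) to be the n-th binary digit of ω:

```python
def binary_digit(omega, n):
    """Candidate Z_n(omega): the n-th binary digit of omega in [0, 1)."""
    return int(omega * 2 ** n) % 2

# Sanity check on omega = 0.101 in binary:
print([binary_digit(0.625, n) for n in (1, 2, 3)])  # prints [1, 0, 1]

# Under the Lebesgue measure the digits behave like independent fair coins;
# here we average over a uniform grid of points of [0, 1):
grid = [(k + 0.5) / 256 for k in range(256)]
p_first = sum(binary_digit(w, 1) for w in grid) / len(grid)
p_joint = sum(binary_digit(w, 1) * binary_digit(w, 2) for w in grid) / len(grid)
print(p_first, p_joint)  # prints 0.5 0.25
```

The empirical joint frequency 1/4 = (1/2) · (1/2) is the independence one has to verify rigorously for all finite digit patterns.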
The next example is the construction of a raw Brownian motion, i.e. a stochastic process X = {X(t)}_{t∈[0,∞)} which has the same distribution as a Brownian motion (but need not necessarily be continuous).

Corollary 2.11 (Existence of Raw Brownian Motion). There exists a stochastic process X = {X(t)}_{t∈[0,∞)} on some probability space (Ω, A, P) which satisfies the properties (W1), (W2) with respect to its natural filtration F^X, and (W3) of Definition 2.1.

Proof. We already know that the finite-dimensional distributions of a process satisfying (W1), (W2), and (W3) are multivariate normal. More precisely, we look for a stochastic process X = {X(t)}_{t∈[0,∞)} such that whenever F = {t_1, ..., t_n} ⊂ [0,∞), n ∈ N, then (X(t_1), ..., X(t_n)) is multivariate normally distributed with mean zero and covariance matrix Σ^F = (Σ_{i,j}^F)_{i,j=1,...,n} given by

    Σ_{i,j}^F = t_i ∧ t_j,    i, j = 1, ..., n.
Denote by P_F the probability measure on (R^F, B(R)^F) corresponding to this distribution, the existence of which is guaranteed since Σ^F is positive semidefinite: If a = (a_1, ..., a_n) ∈ R^F, then

    ∑_{i,j=1}^n a_i a_j Σ_{i,j}^F = ∑_{i,j=1}^n a_i a_j (t_i ∧ t_j) = ∫_0^∞ (∑_{i=1}^n a_i 1_{[0,t_i]}(x))^2 dx ≥ 0.

It is straightforward to check that this yields a set of finite-dimensional distributions satisfying Kolmogorov's consistency condition, and hence Theorem 2.9 yields the existence of a stochastic process X = {X(t)}_{t∈[0,∞)} with these finite-dimensional distributions. Observe that the distribution of X(0) degenerates, i.e. it is a Dirac distribution at zero, meaning that X(0) = 0 almost surely. Clearly, it makes no difference for the distribution of X if we replace X(0) by the constant zero, and hence (W1) is satisfied. The properties (W2) with respect to F^X and (W3) are argued for exactly as in the proof of Theorem 2.3 (Brownian Motion as a Gaussian Process).
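The integral identity behind positive semidefiniteness is easy to verify numerically; the sketch below (helper names ours) evaluates both sides for arbitrary coefficients a and times t:

```python
def quadratic_form(a, t):
    """a^T Sigma^F a with Sigma^F_ij = min(t_i, t_j)."""
    return sum(ai * aj * min(ti, tj)
               for ai, ti in zip(a, t) for aj, tj in zip(a, t))

def integral_form(a, t):
    """Integral of (sum_i a_i * 1_[0, t_i](x))^2 over [0, infinity).

    The integrand is piecewise constant between consecutive points of
    {0, t_1, ..., t_n}, so the integral reduces to a finite sum.
    """
    grid = sorted(set([0.0] + list(t)))
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        height = sum(ai for ai, ti in zip(a, t) if ti >= hi)
        total += height * height * (hi - lo)
    return total

a, t = [1.0, -2.0, 3.0], [0.5, 1.0, 2.5]
print(quadratic_form(a, t), integral_form(a, t))  # both evaluate to 16.0
```

Since the integral form is visibly nonnegative, so is the quadratic form, which is exactly what positive semidefiniteness of Σ^F requires.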
A closer inspection of the proof of Corollary 2.11 shows that we make use of the specific covariance structure of Brownian motion only to guarantee that Σ^F is positive semidefinite, which in turn is only needed for the multivariate normal distribution to exist in the first place. The same argument can hence be used to construct raw Gaussian processes.
Exercise 27. Show the following equivalence: There exists a centered Gaussian process X = {X(t)}_{t∈T} with covariance function Γ : T × T → R if and only if for all F = {t_1, ..., t_n} ⊂ T, n ∈ N, the matrix Σ^F = (Σ_{i,j}^F)_{i,j=1,...,n} with

    Σ_{i,j}^F := Γ(t_i, t_j),    i, j = 1, ..., n,

is positive semidefinite.

Exercise 28. Show that there exist raw versions (i.e. not necessarily continuous) of the following Gaussian processes: Brownian bridge, Ornstein-Uhlenbeck process, and fractional Brownian motion.
A word of warning is in order here: The processes constructed in Corollary 2.11, Exercise 27, and Exercise 28 are in general not continuous, so we have to put in some extra effort to construct continuous modifications of these processes without changing their distributions.
A natural idea to get a continuous modification of raw Brownian motion would be to argue that the probability measure obtained from Kolmogorov's consistency theorem is concentrated on the subset of continuous functions. However, as the following exercise makes evident, this set is not measurable, and hence it does not even make sense to assign any probability to it. Thus, we have to come up with a different idea.

Exercise 29. Let us denote by C ⊂ R^{[0,∞)} the set of continuous functions from [0,∞) to R. Show that C ∉ B(R)^{[0,∞)}.
2.5 The Kolmogorov-Centsov Continuity Theorem
So how can we construct Brownian motion W from raw Brownian motion X? Clearly, if we modify each X(t), t ∈ [0,∞), on a nullset, the finite-dimensional distributions of X remain unchanged. The idea is hence to make these modifications in a way to end up with continuous paths. It is quite easy to imagine that this will not work for an arbitrary stochastic process X, and hence we need to identify a condition under which this is possible. This condition is provided by the Kolmogorov-Centsov Continuity Theorem, and involves a condition on the moments of the increments of X. We start our endeavors with a definition.
Definition 2.12 (Modification; Indistinguishability). Let X = {X(t)}_{t∈T} and Y = {Y(t)}_{t∈T} be two stochastic processes defined on the same probability space. We say that Y is a modification of X if

    X(t) = Y(t) almost surely for each t ∈ T.

Moreover, we say that X and Y are indistinguishable if

    X(t) = Y(t) for all t ∈ T, almost surely,

in which case we also write X = Y a.s. for brevity.
Y being a modification of X means that for each t ∈ T, there exists a nullset N_t ∈ A such that X(t, ω) = Y(t, ω) for all ω ∉ N_t. It is crucial to understand that the nullset N_t depends on t. If X and Y are indistinguishable, there
exists a universal nullset N ∈ A such that X(t, ω) = Y(t, ω) for all t ∈ T and all ω ∉ N, i.e. almost all paths of X and Y coincide. In particular, the notion of indistinguishability is stronger than the notion of modification, i.e. if X and Y are indistinguishable, then Y is also a modification of X. Moreover, in both cases, X and Y have the same finite-dimensional distributions as

    (X(t_1), ..., X(t_n)) = (Y(t_1), ..., Y(t_n)) a.s.    for all t_1, ..., t_n ∈ T, n ∈ N.

If T is countable, then ⋃_{t∈T} N_t is still a nullset and the two notions coincide. If T is uncountable this need not be the case.
Exercise 30. Show that indistinguishability is strictly stronger than the notion of modification by constructing processes X = {X(t)}_{t∈[0,1]} and Y = {Y(t)}_{t∈[0,1]} such that Y is a modification of X, but X and Y are not indistinguishable.
If, on the other hand, X and Y are, say, right continuous, then the paths of X and Y are determined by their values at countably many time points. In this situation, it is not surprising that modifications are indistinguishable.

Exercise 31. Let Y = {Y(t)}_{t∈[0,∞)} be a modification of X = {X(t)}_{t∈[0,∞)}. Show that if X and Y are either both left continuous or both right continuous, then they are indistinguishable.
It will turn out that Brownian motion can be constructed as a modification of raw Brownian motion. To arrive at this result, we first need some preliminary notation and basic results on increments over dyadic rational numbers.
Definition 2.13 (Dyadic Rationals; Dyadic Neighbors; Dyadic Modulus). For each n ∈ N, we define the dyadic rationals via

    D_n := {k2^{−n} ∈ [0, 1) : k = 0, ..., 2^n − 1}    and    D_∞ := ⋃_{n∈N} D_n.

We say that s, t ∈ D_n are neighbors in D_n if |s − t| ≤ 2^{−n}. If (S, d) is a metric space and f : D_∞ → S is an arbitrary function, then we call

    ϖ_f : (0, ∞) → [0, ∞],    ϖ_f(δ) := sup_{s,t∈D_∞, |s−t|≤δ} d(f(s), f(t))

the dyadic modulus of f.
Note that for any t ∈ D_n, there are at most three neighbors in D_n, namely t − 2^{−n}, t, and t + 2^{−n}. Moreover, D_n consists of exactly 2^n points. Finally, for p ∈ (0, 1] given, a function f : D_∞ → S is p-Hölder continuous whenever there exists a constant C > 0 with

    ϖ_f(2^{−n}) ≤ C2^{−np}    for eventually all n ∈ N.

Indeed, choose n large enough such that the above estimate holds and choose s, t ∈ D_∞ with 0 < |s − t| ≤ 2^{−n}. Then there exists m ∈ N with m ≥ n such that

    2^{−(m+1)} ≤ |s − t| ≤ 2^{−m}.

But then it follows that

    d(f(s), f(t)) ≤ ϖ_f(2^{−m}) ≤ C2^{−mp} = 2^p C2^{−(m+1)p} ≤ 2^p C|s − t|^p.

But this means that f is p-Hölder continuous.
Lemma 2.14 (Estimate on the Dyadic Modulus). If (S, d) is a metric space and f : D_∞ → S, then

    ϖ_f(2^{−n}) ≤ 3 ∑_{k=n}^∞ ϖ_f^k    for all n ∈ N,

where, for each k ∈ N, the constant ϖ_f^k is defined as

    ϖ_f^k := sup{d(f(s), f(t)) : s, t ∈ D_k are neighbors in D_k}.
Proof. For each n ∈ N, we define u_n : D_∞ → D_n by

    u_n(t) := max{k2^{−n} : k ∈ N_0, k2^{−n} ≤ t}.

Clearly, u_n(t) is simply the left neighbor of t in D_n. In particular, it is easily seen that u_{n+1}(t) and u_n(t) are neighbors in D_{n+1} and u_n(t) ≤ u_{n+1}(t) ≤ t. Moreover, it is clear that u_n(t) = t if t ∈ D_n, implying that for every t ∈ D_∞ there exists some N ∈ N (depending on t) such that u_n(t) = t for all n ≥ N. From this and the triangle inequality, it follows that

    d(f(t), f(u_n(t))) = d(f(u_N(t)), f(u_n(t))) ≤ ∑_{k=n}^{N−1} d(f(u_k(t)), f(u_{k+1}(t)))
        ≤ ∑_{k=n}^∞ d(f(u_k(t)), f(u_{k+1}(t))) ≤ ∑_{k=n}^∞ ϖ_f^k
for all n ∈ N. Since the last estimate does not depend on t any longer, this shows that the estimate holds uniformly in t ∈ D_∞. Now suppose that s, t ∈ D_∞ are such that |s − t| ≤ 2^{−n}. Then u_n(s) and u_n(t) must be neighbors in D_n and hence d(f(u_n(s)), f(u_n(t))) ≤ ϖ_f^n. But then

    d(f(s), f(t)) ≤ d(f(s), f(u_n(s))) + d(f(u_n(s)), f(u_n(t))) + d(f(u_n(t)), f(t))
        ≤ 2 ∑_{k=n}^∞ ϖ_f^k + ϖ_f^n ≤ 3 ∑_{k=n}^∞ ϖ_f^k.

Again, since the last estimate no longer depends on the particular choice of s and t, this implies the result.
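Lemma 2.14 can be sanity-checked on a concrete function. The sketch below (helper names ours) works on the finite grid D_depth, where the same proof gives the truncated bound ϖ_f(2^{−n}) ≤ 3 ∑_{k=n}^{depth} ϖ_f^k:

```python
def dyadics(level):
    """The points of D_level = {k * 2**-level : k = 0, ..., 2**level - 1}."""
    return [k / 2 ** level for k in range(2 ** level)]

def level_modulus(f, k):
    """omega^k_f: the largest |f(s) - f(t)| over neighbors s, t in D_k."""
    points = dyadics(k)
    return max(abs(f(points[i + 1]) - f(points[i])) for i in range(len(points) - 1))

def dyadic_modulus(f, n, depth):
    """omega_f(2**-n), evaluated over all pairs of the finite grid D_depth."""
    points = dyadics(depth)
    h = 2.0 ** -n
    return max(abs(f(s) - f(t))
               for i, s in enumerate(points) for t in points[i:] if t - s <= h)

f = lambda x: x ** 0.5  # sqrt is (1/2)-Hölder on [0, 1] but not Lipschitz near 0
lhs = dyadic_modulus(f, 3, 7)
rhs = 3 * sum(level_modulus(f, k) for k in range(3, 8))
print(lhs <= rhs)  # prints True
```

The point of the lemma is that the left-hand side involves a supremum over all pairs, while the right-hand side only ever looks at neighboring points, one resolution level at a time.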
We now put this estimate to use by showing that every stochastic processdefined on T = D∞ satisfying a suitable moment condition must necessarilyhave Hölder continuous paths.
Lemma 2.15 (Hölder Continuity from Moments). Let X = {X(t)}_{t∈D_∞} be a stochastic process taking values in a metric space (S, d). Suppose that there exist constants α, β, γ > 0 such that

    E[d(X(s), X(t))^α] ≤ γ|s − t|^{1+β}    for all s, t ∈ D_∞.

Then, for every p ∈ (0, β/α), there exists an event N ∈ A with P[N] = 0 such that t ↦ X(t, ω) is p-Hölder continuous on D_∞ for all ω ∈ N^c.
Proof. Step 1: Let p ∈ (0, β/α) and θ > 0 be such that p + θ < β/α. For each s, t ∈ D_∞ with s ≠ t, the Markov inequality shows that

    P[d(X(s), X(t))/|s − t|^p ≥ |s − t|^θ] = P[d(X(s), X(t))^α ≥ |s − t|^{α(p+θ)}]
        ≤ |s − t|^{−α(p+θ)} E[d(X(s), X(t))^α] ≤ γ|s − t|^{1+β−α(p+θ)}.

Since p + θ < β/α, it follows that κ := β − α(p + θ) > 0. Now for each n ∈ N, define an event

    A_n := {d(X(s), X(t))/|s − t|^p ≥ 2^{−nθ} for some distinct s, t ∈ D_n which are neighbors in D_n}
         = ⋃_{s∈D_n} ⋃_{t∈D_n, |s−t|=2^{−n}} {d(X(s), X(t))/|s − t|^p ≥ 2^{−nθ} = |s − t|^θ}.
Recall that there are at most 2^n points s ∈ D_n, and for each such s there are at most two points t ∈ D_n with |s − t| = 2^{−n}. This implies that

    P[A_n] ≤ 2 · 2^n γ(2^{−n})^{1+β−α(p+θ)} = γ2^{1+n−n(1+κ)} = γ2^{1−nκ}    for all n ∈ N.

In particular, ∑_{n=1}^∞ P[A_n] < ∞ since κ > 0, and thus the Borel-Cantelli lemma implies that

    N := lim sup_{n→∞} A_n = ⋂_{n=1}^∞ ⋃_{m=n}^∞ A_m ∈ A is a set of probability zero.
Step 2: p-Hölder continuity outside of N. Let us fix ω ∈ N^c and write f := X(·, ω) for brevity. Since ω ∉ N, there exists some N ∈ N such that ω ∉ A_n for all n ≥ N, i.e. for all n ∈ N with n ≥ N it holds that

    d(f(s), f(t))/|s − t|^p < 2^{−nθ}    whenever s, t ∈ D_n are neighbors in D_n and s ≠ t.

This, however, simply means that ϖ_f^n ≤ 2^{−n(p+θ)} for all n ≥ N. But then Lemma 2.14 (Estimate on the Dyadic Modulus) shows that

    ϖ_f(2^{−n}) ≤ 3 ∑_{k=n}^∞ ϖ_f^k ≤ 3 ∑_{k=n}^∞ 2^{−k(p+θ)} = 2^{−n(p+θ)} · 3/(1 − 2^{−(p+θ)}) ≤ C2^{−np}

for all n ≥ N, where C := 3/(1 − 2^{−(p+θ)}). But then f is p-Hölder continuous.
The next step is to show that any Hölder continuous function f on D_∞ can be extended to a Hölder continuous function on [0, 1].

Lemma 2.16 (Hölder Continuous Extension). Suppose that (S, d) is a complete metric space and let n ∈ N. Assume that there exists p ∈ (0, 1] such that f : 2^n D_∞ → S is p-Hölder continuous. Then there exists a uniquely determined p-Hölder continuous extension g : [0, 2^n] → S of f.

Proof. Step 1: Construction of g. Let t ∈ [0, 2^n] and choose a sequence {s_k}_{k∈N} ⊂ 2^n D_∞ converging to t. We claim that {f(s_k)}_{k∈N} is a Cauchy sequence in S. Indeed, let ε > 0 be given. Since f is p-Hölder continuous on 2^n D_∞, there exist δ, C > 0 such that

    d(f(r), f(s)) ≤ C|r − s|^p    for all r, s ∈ 2^n D_∞ with |r − s| ≤ δ.
Since the sequence {s_k}_{k∈N} converges, it must be Cauchy, and hence there exists K ∈ N such that |s_k − s_j| < min{(ε/C)^{1/p}, δ} for all k, j ≥ K. But then

    d(f(s_k), f(s_j)) ≤ C|s_k − s_j|^p < ε    for all k, j ≥ K,

showing that {f(s_k)}_{k∈N} is Cauchy. Since (S, d) is complete, this implies that the sequence converges. Now set

    g(t) := lim_{k→∞} f(s_k).

Observe that the right hand side is equal to f(t) if t ∈ 2^n D_∞ by the Hölder continuity of f, and hence g is an extension of f.

Step 2: We show that g(t) does not depend on the particular choice of the sequence {s_k}_{k∈N}. Indeed, let {s̃_k}_{k∈N} ⊂ 2^n D_∞ be another sequence converging to t. Then {r_k}_{k∈N} defined by

    r_{2k−1} := s_k    and    r_{2k} := s̃_k    for all k ∈ N

also converges to t and contains {s_k}_{k∈N} and {s̃_k}_{k∈N} as subsequences. Now as in the first step, we see that {f(r_k)}_{k∈N} is Cauchy and hence converges. But then any subsequence converges to the same limit and thus

    g(t) = lim_{k→∞} f(s_k) = lim_{k→∞} f(r_k) = lim_{k→∞} f(s̃_k).
Step 3: g is p-Hölder continuous on [0, 2^n]. Since f is p-Hölder continuous, there exist δ, C > 0 such that

    d(f(s), f(t)) ≤ C|s − t|^p    for all s, t ∈ 2^n D_∞ with |s − t| ≤ δ.

Now let s, t ∈ [0, 2^n] with |s − t| ≤ δ/2 and choose {s_k}_{k∈N}, {t_k}_{k∈N} ⊂ 2^n D_∞ converging to s and t, respectively, with |s_k − t_k| ≤ δ for all k ∈ N. Then

    d(g(s), g(t)) ≤ d(g(s), f(s_k)) + d(f(s_k), f(t_k)) + d(f(t_k), g(t))
        ≤ d(g(s), f(s_k)) + d(f(t_k), g(t)) + C|s_k − t_k|^p

for all k ∈ N. Taking the limit k → ∞ and using that s and t were chosen arbitrarily thus shows that g is p-Hölder continuous, i.e.

    d(g(s), g(t)) ≤ C|s − t|^p    for all s, t ∈ [0, 2^n] with |s − t| ≤ δ/2.
Step 4: Uniqueness of g. Let g̃ be another p-Hölder continuous extension of f. Then g and g̃ must coincide on 2^n D_∞ since they both coincide with f on this set. If t ∈ [0, 2^n] \ 2^n D_∞, there exists a sequence {s_k}_{k∈N} ⊂ 2^n D_∞ converging to t. But then, since both g and g̃ are continuous, it follows that

    g(t) = lim_{k→∞} g(s_k) = lim_{k→∞} f(s_k) = lim_{k→∞} g̃(s_k) = g̃(t),

i.e. g = g̃ everywhere on [0, 2^n].
Putting the previous lemmas together allows us to prove the Kolmogorov-Centsov continuity theorem.
Theorem 2.17 (Kolmogorov-Centsov Continuity Theorem). Let (S, d) be a complete metric space and X = {X(t)}_{t∈[0,∞)} be an S-valued stochastic process. Assume that there exist constants α, β, γ > 0 such that

    E[d(X(s), X(t))^α] ≤ γ|s − t|^{1+β}    for all s, t ∈ [0,∞).

Then, for every p ∈ (0, β/α), there exists a p-Hölder continuous modification Y = {Y(t)}_{t∈[0,∞)} of X.
Proof. Step 1: Construction of Y. Fix n ∈ N and consider the stochastic process X_n = {X_n(t)}_{t∈D_∞} given by

    X_n(t) := X(t2^n)    for all t ∈ D_∞.

It is clear that for all s, t ∈ D_∞

    E[d(X_n(s), X_n(t))^α] = E[d(X(s2^n), X(t2^n))^α] ≤ γ2^{(1+β)n}|s − t|^{1+β}

and hence Lemma 2.15 (Hölder Continuity from Moments) shows that there exists an event N_n ∈ A with P[N_n] = 0 such that X_n(·, ω) is p-Hölder continuous on D_∞ for all ω ∈ N_n^c. This, however, means that X(·, ω) is p-Hölder continuous on 2^n D_∞ for all ω ∈ N_n^c since, for the p-Hölder continuity constants C, δ > 0 of X_n(·, ω),

    d(X(s, ω), X(t, ω)) = d(X_n(s2^{−n}, ω), X_n(t2^{−n}, ω)) ≤ C|s2^{−n} − t2^{−n}|^p = C2^{−np}|s − t|^p
for all s, t ∈ 2^n D_∞ with |s − t| ≤ δ2^n. By Lemma 2.16, for each such ω, there exists a unique p-Hölder continuous extension Y_n(·, ω) : [0, 2^n] → S of X(·, ω) : 2^n D_∞ → S. Now set N := ⋃_{n∈N} N_n ∈ A and observe that P[N] = 0. Due to the uniqueness of the extension and since 2^m D_∞ ⊂ 2^n D_∞ whenever m, n ∈ N with m ≤ n, we observe that

    Y_m(t, ω) = Y_n(t, ω)    for all t ∈ [0, 2^m] and ω ∈ N^c if m, n ∈ N with m ≤ n.

Denoting by x some arbitrary element of S, we define Y = {Y(t)}_{t∈[0,∞)} by

    Y(t, ω) := x    whenever t ∈ [0,∞) and ω ∈ N,    and
    Y(t, ω) := Y_n(t, ω)    whenever t ∈ [0, 2^n] and ω ∈ N^c.
Step 2: We are left with showing that Y is a stochastic process and a modification of X. By construction of Y we have

    Y(s) = X(s)1_{N^c} + x1_N    for all s ∈ ⋃_{n∈N} 2^n D_∞.

In particular, for every such s, we have Y(s) = X(s) a.s. and Y(s) is A-measurable since X(s) is A-measurable and N ∈ A. For any t ∈ [0,∞), there exists {s_k}_{k∈N} ⊂ ⋃_{n∈N} 2^n D_∞ converging to t. Continuity of Y implies

    Y(t, ω) = lim_{k→∞} Y(s_k, ω)    for all ω ∈ Ω

and thus Y(t) is A-measurable for all t ∈ [0,∞); therefore Y is a stochastic process. It remains to show that Y(t) = X(t) almost surely. For this, we first observe that the Markov inequality and the moment estimate show that

    P[d(Y(s_k), X(t)) ≥ ε] = P[d(X(s_k), X(t)) ≥ ε] = P[d(X(s_k), X(t))^α ≥ ε^α]
        ≤ ε^{−α} E[d(X(s_k), X(t))^α] ≤ ε^{−α} γ|s_k − t|^{1+β}

for all ε > 0. Since the right hand side converges to zero, it follows that Y(s_k) converges in probability to X(t). But Y(s_k, ω) converges to Y(t, ω) for all ω ∈ Ω, hence also in probability. Since limits with respect to convergence in probability are almost surely unique, we must have Y(t) = X(t) a.s., and thus Y is a modification of X and the proof is complete.
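To see the moment condition of the theorem in action, here is the standard computation for Brownian motion with the fourth moment (the choice n = 4 is ours; Theorem 2.19 below runs the analogous computation for general n):

```latex
\mathbb{E}\bigl[|W(t) - W(s)|^4\bigr]
  = (t - s)^2 \, \mathbb{E}[Z^4]
  = 3\,|t - s|^{1 + 1},
\qquad Z \sim \mathcal{N}(0, 1),
% so the moment condition holds with alpha = 4, beta = 1, gamma = 3, and
% Theorem 2.17 yields a p-Hölder continuous modification for every p < beta/alpha = 1/4.
```

Larger moments improve the exponent, which is how the bound p < 1/2 is reached in Theorem 2.19.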
Observe that if X is F-adapted, it is in general not true that the modification Y is also F-adapted since we do not necessarily have N ∈ F(t) for all t ∈ [0,∞). If we want to ensure that the continuous modification Y is adapted, we need an additional assumption on the filtration F.

Definition 2.18 (Complete Filtration). We say that a filtration F = {F(t)}_{t∈T} is complete if it contains all P-nullsets in A, i.e.

    N ∈ F(t) for all t ∈ T whenever N ∈ A satisfies P[N] = 0.

If the filtration F is complete and X is F-adapted, it is straightforward to see that any modification Y of X is F-adapted. We leave this as an exercise.

Exercise 32. Let Y = {Y(t)}_{t∈T} be a modification of X = {X(t)}_{t∈T} and assume that X is adapted to a complete filtration F = {F(t)}_{t∈T}. Show that Y is F-adapted as well.
2.6 Existence of Brownian Motion
At this point, existence of Brownian motion requires little more effort thanpiecing together the results of the previous two sections.
Theorem 2.19 (Existence of Brownian Motion). Brownian motion exists. Moreover, almost every path of Brownian motion is p-Hölder continuous for all p ∈ (0, 1/2).
Proof. Step 1: Existence. Let X = {X(t)}_{t∈[0,∞)} denote the raw Brownian motion process constructed in Corollary 2.11 and fix s, t ∈ [0,∞) with s < t. Then X(t) − X(s) has normal distribution with mean zero and variance t − s. Thus, if Z denotes a standard normal random variable,

E[|X(t) − X(s)|^n] = (t − s)^{n/2} E[|Z|^n] for all n ∈ N.

For n ≥ 3, setting α := n, β := n/2 − 1 > 0, and γ := E[|Z|^n] < ∞, this can equivalently be written as

E[|X(t) − X(s)|^α] ≤ γ |t − s|^{1+β},
58
2.6 Existence of Brownian Motion
and thus Theorem 2.17 (Kolmogorov-Centsov) implies that X admits a continuous modification W = {W(t)}_{t∈[0,∞)}. Since {W(0) ≠ 0} ∈ A is a nullset, we may without loss of generality assume that W satisfies (W1) (set W to zero on {W(0) ≠ 0} otherwise). Since X and W share the same finite-dimensional distributions, it follows moreover that W satisfies (W2) (with respect to its natural filtration) and (W3). (W4) is clear. Thus W is a Brownian motion.
Step 2: Hölder continuity. Let W be an arbitrary Brownian motion. Using the same arguments as in the first step, it follows that W admits a modification W̃ = {W̃(t)}_{t∈[0,∞)} which is p-Hölder continuous for any p satisfying

p < β/α = (n/2 − 1)/n = 1/2 − 1/n for some n ∈ N,

i.e. for all p < 1/2. But since both W and W̃ are continuous, it follows from Exercise 31 that W and W̃ are indistinguishable, i.e. up to a nullset, the paths of W and W̃ coincide.
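The constructive side of this proof is easy to experiment with numerically. The following sketch (an illustration, not part of the notes; the function name and parameters are ours) simulates the finite-dimensional skeleton of Brownian motion on a uniform grid by summing independent N(0, Δt) increments; the Kolmogorov-Centsov theorem is precisely what guarantees that these finite-dimensional distributions come from a process with continuous paths.

```python
import numpy as np

def simulate_bm(n_steps: int, t_max: float, rng: np.random.Generator) -> np.ndarray:
    """Skeleton of a Brownian path on the grid k * t_max / n_steps:
    partial sums of independent N(0, dt) increments, started at 0."""
    dt = t_max / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

rng = np.random.default_rng(0)
path = simulate_bm(1_000, 1.0, rng)

# Monte Carlo check of Var[W(1)] = 1 over many independent skeletons,
# each built from 8 increments of variance 1/8.
endpoints = rng.normal(0.0, np.sqrt(1.0 / 8), size=(20_000, 8)).sum(axis=1)
var_w1 = endpoints.var()
```

Note that the grid skeleton only pins down the finite-dimensional distributions; between grid points the path is not defined at all, which is exactly the gap the continuity theorem closes.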
Comparing with the examples in Section 2.2, Theorem 2.19 furthermore implies that the Brownian bridge and the Ornstein-Uhlenbeck process exist since they can be constructed directly from Brownian motion. The existence of fractional Brownian motion must be argued for separately, but of course the arguments resemble those for Brownian motion.
Exercise 33. Show that fractional Brownian motion with Hurst index H ∈ (0, 1) exists. Moreover, show that almost every path of fractional Brownian motion is p-Hölder continuous for all p ∈ (0, H).

Exercise 34. Let X = {X(t)}_{t∈[0,∞)} be an Ornstein-Uhlenbeck process and Y = {Y(t)}_{t∈[0,1]} be a Brownian bridge. For which p ∈ (0, 1] are X and Y p-Hölder continuous?
Another interesting consequence of the existence of Brownian motion is the existence of functions which are nowhere differentiable. If this does not impress you, try to construct such a function by yourself!

Corollary 2.20 (Existence of Nowhere Differentiable Functions). There exists a function f : [0,∞) → R which is nowhere differentiable.
Proof. This is an immediate consequence of Theorem 2.19 (Existence of Brownian Motion) and Theorem 2.7 (Nowhere Differentiability of Brownian Paths).

Another consequence of the Kolmogorov-Centsov continuity theorem is the following time inversion result, with which we conclude this chapter on Brownian motion.
Corollary 2.21 (Time Inversion for Brownian Motion). Given a Brownian motion W = {W(t)}_{t∈[0,∞)}, define a stochastic process X = {X(t)}_{t∈[0,∞)} by

X(0) := 0 and X(t) := tW(1/t) for all t ∈ (0,∞).
Then there exists a Brownian motion Y (with respect to its natural filtrationFY ) which is indistinguishable from X.
Proof. It is immediately seen that X is a centered Gaussian process since W is a centered Gaussian process. The covariance function Γ of X is given by Γ(s, t) = 0 = s ∧ t if either s or t is zero and otherwise

Γ(s, t) = Cov[sW(1/s), tW(1/t)] = st min{1/s, 1/t} = s ∧ t.

Thus X is a raw Brownian motion and each path of X is continuous on (0,∞), thus left continuous on [0,∞). But then the continuous modification Y of X obtained from Theorem 2.17 (Kolmogorov-Centsov) is a Brownian motion with respect to F^Y and indistinguishable from X by Exercise 31.
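The covariance computation can be checked by simulation. The sketch below (an illustration with our own variable names, not part of the notes) samples W at the two times 1/t < 1/s via independent Gaussian increments, forms X(u) = uW(1/u), and verifies Γ(s, t) ≈ s ∧ t by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of the time-inversion covariance Γ(s, t) = s ∧ t.
rng = np.random.default_rng(1)
s, t, n = 0.5, 2.0, 100_000                       # s < t, hence 1/t < 1/s
w_inv_t = rng.normal(0.0, np.sqrt(1.0 / t), size=n)                       # W(1/t)
w_inv_s = w_inv_t + rng.normal(0.0, np.sqrt(1.0 / s - 1.0 / t), size=n)   # W(1/s)
x_s, x_t = s * w_inv_s, t * w_inv_t               # X(u) = u * W(1/u)
cov_st = np.mean(x_s * x_t)                       # both factors have mean zero
```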
Chapter
3 MARTINGALES
We now turn to the concept of martingales – a class of stochastic processes which plays a pivotal role in many fields of application such as dynamic stochastic optimization or financial mathematics.
3.1 Definition and Examples
From now onwards, we always work with a given filtration F = {F(t)}_{t∈T}. Unless explicitly stated otherwise, whenever we speak of concepts which depend on the choice of the filtration (such as adaptedness, stopping times, Brownian motion, ...), we always take these concepts with respect to F without specifically mentioning this fact.
Definition 3.1 ((Sub-/Super-)Martingale). We say that a stochastic process X = {X(t)}_{t∈T} is a submartingale (with respect to F) if X is adapted, satisfies E[|X(t)|] < ∞ for all t ∈ T, and

X(s) ≤ E[X(t) | F(s)] a.s. for all s, t ∈ T with s < t.

We say that X is a supermartingale if it is adapted, satisfies E[|X(t)|] < ∞ for all t ∈ T, and

X(s) ≥ E[X(t) | F(s)] a.s. for all s, t ∈ T with s < t.

Finally, we say that X is a martingale if it is both a submartingale and a supermartingale. In particular, any martingale X satisfies

X(s) = E[X(t) | F(s)] a.s. for all s, t ∈ T with s < t.
Warning: Experience shows that students often only remember the property

X(s) = E[X(t) | F(s)] a.s. for all s, t ∈ T with s < t

in the definition of a martingale. The adaptedness of X and the integrability assumption E[|X(t)|] < ∞ for all t ∈ T are equally important and should never be forgotten when checking if a given process is a martingale.
Figure 3.1 A martingale (Brownian motion). Averaging the values of the red paths at time t = 1 yields (approximately) the common value of these paths at time s = 1/2.
Figure 3.2 Not a martingale (Brownian bridge). The values of the red paths at time t = 1 (which are all equal to zero) are different from the realization at time s = 1/2.
62
3.1 Definition and Examples
A martingale has the feature that starting at any time s ∈ T and given the information F(s) available up to this point in time, the process is on average constant; see Figure 3.1. Clearly, if X is a submartingale, then −X is a supermartingale, and X is a martingale if and only if X and −X are submartingales, if and only if X and −X are supermartingales.
We note that the property of being a (sub-/super-)martingale depends on the choice of the filtration. That is, if X is a submartingale with respect to F, it need not be a submartingale with respect to another filtration G = {G(t)}_{t∈T}, even if X is G-adapted.

Exercise 35. Let X = {X(t)}_{t∈T} be a submartingale with respect to F. Show that X is then also a submartingale with respect to its natural filtration F^X = {F^X(t)}_{t∈T}. More generally, suppose that G = {G(t)}_{t∈T} is a filtration satisfying G(t) ⊂ F(t) for all t ∈ T. Under which condition on G is X a submartingale with respect to G?
Martingales are often interpreted as models for fair games. Suppose that you are offered a game in which you win 1 Euro whenever a coin toss gives heads, whereas you lose 1 Euro if tails shows up. If it is a fair coin and the coin tosses in each round are independent, your winning probability in each round is equal to p = 1/2. Denoting by X(s) your total winnings after the sth round, it is easy to see that

E[X(s+1) | F^X(s)] = E[X(s+1) | X(1), . . . , X(s)] = p(X(s) + 1) + (1 − p)(X(s) − 1) a.s.
Since p = 1/2, the latter expression is equal to X(s). Applying this equation iteratively and using the tower property of conditional expectation thus shows that, for all t > s,

E[X(t) | F^X(s)] = E[E[X(t) | F^X(t−1)] | F^X(s)] = E[X(t−1) | F^X(s)] = . . . = E[X(s) | F^X(s)] = X(s) a.s.,
i.e. X is a martingale with respect to F^X. The same arguments also show that if p < 1/2, i.e. if the game is unfavorable, then

E[X(s+1) | F^X(s)] = p(X(s) + 1) + (1 − p)(X(s) − 1) ≤ X(s) a.s.
This can be used to show that X is a supermartingale, i.e. supermartingales are models for unfavorable games. A similar argument with p > 1/2 shows that submartingales correspond to favorable games.
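The coin-tossing game is easy to simulate. In the sketch below (an illustration; the function name and parameters are ours, not part of the notes) the sample average of the total winnings stays near zero in the fair case p = 1/2 and drifts downwards for p < 1/2, matching the martingale and supermartingale behaviour described above.

```python
import numpy as np

def coin_game(p: float, rounds: int, n_players: int, seed: int) -> np.ndarray:
    """Total winnings X(1), ..., X(rounds) for independent players:
    +1 Euro on heads (probability p), -1 Euro on tails."""
    rng = np.random.default_rng(seed)
    wins = np.where(rng.random((n_players, rounds)) < p, 1, -1)
    return wins.cumsum(axis=1)

fair = coin_game(0.5, 100, 50_000, seed=2)    # martingale: mean stays near 0
unfair = coin_game(0.4, 100, 50_000, seed=2)  # supermartingale: mean drifts down
```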
The keen observer will have noticed that we have encountered the process X in the above example before: It is a random walk since the winnings in each round are independent, and X(t) is just the sum of all winnings up to time t. In the case of a fair game it is in fact a classical random walk. We have hence found our first example of a martingale!
Exercise 36. Let X be a random walk constructed from a sequence {Z_n}_{n∈N} of independent and identically distributed random variables with E[|Z_1|] < ∞. Show that X is
(i) a submartingale with respect to FX if and only if E[Z1] ≥ 0,
(ii) a supermartingale with respect to FX if and only if E[Z1] ≤ 0, and
(iii) a martingale with respect to FX if and only if E[Z1] = 0.
Let us look at some more examples.
Example 3.2 (Closed Martingale). Let Z be a random variable satisfying E[|Z|] < ∞. Then the process X = {X(t)}_{t∈T} given by

X(t) := E[Z | F(t)] for all t ∈ T
is a martingale. Indeed, for any s, t ∈ T with s < t, the tower property of conditional expectation shows that

X(s) = E[Z | F(s)] = E[E[Z | F(t)] | F(s)] = E[X(t) | F(s)] a.s.
Conversely, if X is a martingale and sup T ∈ T, then, upon setting Z := X(sup T), it follows that

X(t) = E[X(sup T) | F(t)] = E[Z | F(t)] a.s.
Any martingale of this form is called a closed martingale.
Example 3.3 (Increasing Processes are Submartingales). If M = {M(t)}_{t∈T} is a martingale and A = {A(t)}_{t∈T} is an adapted increasing process with E[|A(t)|] < ∞ for all t ∈ T, then X := M + A is a submartingale since

X(s) = M(s) + A(s) = E[M(t) | F(s)] + A(s) = E[M(t) + A(s) | F(s)]
≤ E[M(t) + A(t) | F(s)] = E[X(t) | F(s)] a.s.

for all s, t ∈ T with s < t. Since the process M = {M(t)}_{t∈[0,∞)} with M(t) = 0 for all t ∈ [0,∞) is obviously a martingale, this implies that any increasing process, and in particular every renewal process, is a submartingale.
Example 3.4 (Brownian Motion is a Martingale). If W is a Brownian motion and s, t ∈ [0,∞) with s < t, then, using that W(t) − W(s) is independent of F(s) and has zero mean,

E[W(t) | F(s)] − W(s) = E[W(t) − W(s) | F(s)] = E[W(t) − W(s)] = 0 a.s.

This shows that Brownian motion is a martingale.
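Example 3.4 rests on exactly two facts: the increment W(t) − W(s) has mean zero and is independent of F(s). Both are easy to see in simulation (a sanity check, not a proof; the variable names and parameter choices are ours):

```python
import numpy as np

# W(s) and the increment W(t) - W(s) are independent centered Gaussians,
# so the increment has mean zero and zero correlation with the past value.
rng = np.random.default_rng(3)
s, t, n = 1.0, 3.0, 200_000
w_s = rng.normal(0.0, np.sqrt(s), size=n)         # W(s)
incr = rng.normal(0.0, np.sqrt(t - s), size=n)    # W(t) - W(s)
mean_incr = incr.mean()
corr = np.corrcoef(w_s, incr)[0, 1]
```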
Similar arguments can be used to show that a whole range of other processesare martingales.
Exercise 37. Let W be a Brownian motion. Show that the processes {W(t)^2 − t}_{t∈[0,∞)} and {W(t)^3 − 3tW(t)}_{t∈[0,∞)} are martingales.
Exercise 38 (Geometric Brownian Motion). Let W be a Brownian motion and µ ∈ R and σ > 0. Let moreover X = {X(t)}_{t∈[0,∞)} be a geometric Brownian motion given by

X(t) := e^{µt+σW(t)}, t ∈ [0,∞).
Under which conditions on µ and σ is X a (sub-/super-)martingale?
The following proposition is a useful result showing that martingales are turned into submartingales under convex transformations and that the submartingale property is stable under convex nondecreasing transformations.
Proposition 3.5 (Submartingales via Convex Transformations). Let X = {X(t)}_{t∈T} be a submartingale and ϕ : R → R be a convex function. Define Y = {Y(t)}_{t∈T} by Y(t) := ϕ(X(t)) for all t ∈ T and assume that E[|Y(t)|] < ∞ for all t ∈ T. Then Y is a submartingale if ϕ is nondecreasing or if X is a martingale.
Proof. Let s, t ∈ T with s < t. Then Jensen's inequality shows that

E[ϕ(X(t)) | F(s)] ≥ ϕ(E[X(t) | F(s)]) a.s.

If X is a martingale, the right hand side equals ϕ(X(s)) almost surely, whereas it is almost surely greater or equal than ϕ(X(s)) if X is a submartingale and ϕ is nondecreasing. In either case, this shows that

E[Y(t) | F(s)] = E[ϕ(X(t)) | F(s)] ≥ ϕ(X(s)) = Y(s) a.s.
3.2 Discrete Time Martingales
We now turn to a series of results for discrete time martingales. While most of the results also hold in continuous time, we do not have the necessary tools available to prove these yet. As a matter of fact, most of the continuous time results are inferred from the discrete time results by a limiting procedure, which will be the subject of the subsequent chapter. For now, we content ourselves with the discrete time results.
The first result may look rather obvious at first sight, but has far-reaching consequences. It shows that the (sub-)martingale property also holds if the deterministic times s and t are replaced by stopping times with finite range (and is thus considered a discrete time result although the process under consideration may be a continuous time process).
Theorem 3.6 (Optional Stopping (Discrete)). Let X = {X(t)}_{t∈T} be a submartingale and let F be a finite subset of T. If σ and τ are stopping times taking values in F and σ ≤ τ, then

X(σ) ≤ E[X(τ) | F(σ)] a.s.

If X is a martingale, this relation holds with equality almost surely.
Proof. If we can prove the result for submartingales, the martingale case follows immediately since then both X and −X are submartingales. Let us write F = {t_0, t_1, . . . , t_n} with t_0 < t_1 < . . . < t_n. With this, X(σ) and X(τ) can be written as

X(σ) = X(t_0) + ∑_{k=1}^n [X(t_k) − X(t_{k−1})] 1_{t_k ≤ σ},
X(τ) = X(t_0) + ∑_{k=1}^n [X(t_k) − X(t_{k−1})] 1_{t_k ≤ τ}.

From this representation it is easily seen that E[|X(σ)|], E[|X(τ)|] < ∞ and

X(τ) − X(σ) = ∑_{k=1}^n [X(t_k) − X(t_{k−1})] 1_{σ < t_k ≤ τ}.
Let us now fix F ∈ F(σ) and k ∈ {1, . . . , n}. Since

F ∩ {σ < t_k ≤ τ} = (F ∩ {σ ≤ t_{k−1}}) ∩ {τ ≤ t_{k−1}}^c ∈ F(t_{k−1}),

we see that

E[1_F 1_{σ < t_k ≤ τ}(X(t_k) − X(t_{k−1}))] = E[1_F 1_{σ < t_k ≤ τ}(E[X(t_k) | F(t_{k−1})] − X(t_{k−1}))] ≥ 0
by the submartingale property of X. But then

E[1_F(X(τ) − X(σ))] = ∑_{k=1}^n E[1_F 1_{σ < t_k ≤ τ}(X(t_k) − X(t_{k−1}))] ≥ 0.

By the definition of conditional expectation, this means that

E[X(τ) − X(σ) | F(σ)] ≥ 0 a.s.,
and we conclude since X(σ) is F(σ)-measurable by Lemma 1.24.
Returning to the interpretation of a martingale X as a model for a fair game, the optional stopping theorem implies that no matter how good a strategy one chooses to enter and leave the game, it is not possible to make strictly positive gains without running the risk of losing money as well. More precisely, one may think of σ and τ as the entry time into and the exit time from the game, respectively. The optional stopping theorem then implies that it is impossible to find a combination of σ and τ such that X(σ) ≤ X(τ) almost surely and X(σ) < X(τ) with positive probability. While the optional stopping theorem requires σ and τ to have a finite range to be valid, any game that can be played in the real world clearly satisfies this assumption.
Figure 3.3 The optional stopping theorem in action: In a fair game with finitely many rounds, it is not possible to guarantee a strictly positive gain by leaving the game early (left picture, stop as soon as total earnings reach 5 Euro) without running the risk of losing money (right picture, 5 Euro are never reached, loss of 6 Euro at time t = 50).
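The experiment behind Figure 3.3 can be repeated many times; by optional stopping, the average terminal gain must vanish. The following sketch (an illustration with our own names and parameters, not part of the notes) stops a fair ±1 random walk as soon as it first reaches 5 Euro, but no later than round 50:

```python
import numpy as np

rng = np.random.default_rng(4)
n_games, horizon, target = 100_000, 50, 5
steps = np.where(rng.random((n_games, horizon)) < 0.5, 1, -1)
paths = steps.cumsum(axis=1)                    # X(1), ..., X(50)
reached = (paths >= target)
hit_any = reached.any(axis=1)
# index of the first round in which the target is reached, else the last round
stop = np.where(hit_any, reached.argmax(axis=1), horizon - 1)
x_tau = paths[np.arange(n_games), stop]
mean_gain = x_tau.mean()                        # close to 0, as the theorem predicts
```

Since the walk moves in ±1 steps, every winning game stops with exactly 5 Euro, yet the occasional large losses in the remaining games pull the average back to zero.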
The finiteness assumption on the range of the stopping times in Theorem 3.6 is crucial, as the following example shows. Let X = {X(t)}_{t∈N_0} be a classical random walk and consider the stopping times σ := 0 and

τ := inf{t ∈ N_0 : X(t) = 1}.

It is possible to show that τ < ∞ almost surely. In particular, we have X(τ) = 1 a.s. and thus

X(σ) = 0 < 1 = E[X(τ)].
It should be noted that the martingale property

X(s) = E[X(t) | F(s)] for all s, t ∈ T with s < t

implies that

E[X(s)] = E[X(t)] for all s, t ∈ T,

i.e. martingales are constant in expectation. The reverse statement is, however, in general not true.
Exercise 39. Let X = {X(t)}_{t∈[0,∞)} be an Ornstein-Uhlenbeck process. Show that E[X(t)] = E[X(s)] for all s, t ∈ [0,∞), but X is not a martingale.
Exercise 40. Assume that X = {X(t)}_{t∈[0,1]} is a Brownian bridge. Show that E[X(t)] = E[X(s)] for all s, t ∈ [0, 1], but X is not a martingale.
It is nevertheless still possible to characterize martingales in terms of their expected values. As the previous two exercises show, it is not enough for a process X to be constant in expectation along all time points t ∈ T to be a martingale. If, however, the process is constant in expectation along a large enough class of stopping times, i.e. E[X(σ)] = E[X(τ)] for stopping times σ and τ, then it turns out that X is a martingale. This result is the famous optional sampling theorem. We state the result only for submartingales, but note that the analogous results for supermartingales and martingales are evidently also true (and a direct consequence of the submartingale case).
Theorem 3.7 (Optional Sampling (Discrete)). Assume that X = {X(t)}_{t∈T} is an adapted stochastic process. Then X is a submartingale if and only if, for all stopping times σ, τ with σ ≤ τ that take values in a finite subset F of T, it holds that E[|X(σ)|], E[|X(τ)|] < ∞ and

E[X(σ)] ≤ E[X(τ)].

In particular, if τ is an F-valued stopping time and X is a submartingale, the stopped process X^τ = X(· ∧ τ) = {X(t ∧ τ)}_{t∈T} is a submartingale as well.
Proof. Let σ and τ be F-valued stopping times with σ ≤ τ. If X is a submartingale, Theorem 3.6 (Discrete Optional Stopping) shows that X(σ) ≤ E[X(τ) | F(σ)] a.s., and hence

E[X(σ)] ≤ E[E[X(τ) | F(σ)]] = E[X(τ)]

by taking expectations.
Now suppose that E[X(σ)] ≤ E[X(τ)] for all F-valued stopping times σ, τ with σ ≤ τ. Given s, t ∈ T with s < t and F ∈ F(s), we define

σ := s and τ := t1_F + s1_{F^c}.

The F(s)-measurability of F implies that τ is a stopping time since, for any r ∈ T, we have {τ ≤ r} = ∅ ∈ F(r) if r < s, {τ ≤ r} = Ω ∈ F(r) if r ≥ t, and

{τ ≤ r} = F^c ∈ F(s) ⊂ F(r) if s ≤ r < t.
Thus σ and τ are {s, t}-valued stopping times with σ ≤ τ and hence

E[1_F(X(t) − X(s))] = E[1_F(X(τ) − X(σ))] = E[X(τ) − X(σ)] ≥ 0,

where we have used that X(τ)1_{F^c} = X(s)1_{F^c} = X(σ)1_{F^c} to obtain the second equality. Since F ∈ F(s) was chosen arbitrarily, this implies that E[X(t) − X(s) | F(s)] ≥ 0 a.s., and hence X is a submartingale.
Corollary 3.8 (Wald’s Identity). Let X = X(t)t∈N0 be a martingale withT = N0 and suppose that there exists a constant K > 0 such that
|X(t)−X(t− 1)| ≤ K a.s. for all t ∈ N.
Then it holds that
E[X(τ)] = E[X(0)] for all N0-valued stopping times τ with E[τ ] <∞.
Proof. Applying Theorem 3.7 (Discrete Optional Sampling) to the stopping times σ := 0 and τ ∧ n shows that

E[X(τ ∧ n)] = E[X(0)] for all n ∈ N.

Since X(τ ∧ n) → X(τ) as n → ∞ and the sequence {X(τ ∧ n)}_{n∈N} is dominated by the integrable random variable |X(0)| + Kτ in view of

|X(τ ∧ n)| ≤ |X(0)| + ∑_{k=1}^{τ∧n} |X(k) − X(k − 1)| ≤ |X(0)| + Kτ a.s.,

it follows from the dominated convergence theorem that

E[X(0)] = lim_{n→∞} E[X(τ ∧ n)] = E[X(τ)].
Returning to the example discussed just after Figure 3.3 in which a classical random walk X is stopped at

τ := inf{t ∈ N_0 : X(t) = 1},

the conclusion of Corollary 3.8 implies that E[τ] = ∞. Thus, in the fair coin tossing game, waiting to win one Euro takes on average an infinite amount of time! Quite surprising on first sight, right?
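One can also see the infinite expectation numerically: the sample mean of the truncated hitting time τ ∧ n keeps growing (roughly like √n) as the horizon n increases, instead of stabilising. The sketch below is an illustration under our own choice of parameters:

```python
import numpy as np

def mean_truncated_tau(horizon: int, n_walks: int, seed: int) -> float:
    """Sample mean of tau ∧ horizon, where tau is the first time a fair
    ±1 random walk reaches level 1."""
    rng = np.random.default_rng(seed)
    steps = np.where(rng.random((n_walks, horizon)) < 0.5, 1, -1)
    paths = steps.cumsum(axis=1)
    hit = (paths >= 1)
    tau = np.where(hit.any(axis=1), hit.argmax(axis=1) + 1, horizon)
    return float(tau.mean())

m_100 = mean_truncated_tau(100, 4_000, seed=5)      # grows like sqrt(horizon)
m_2500 = mean_truncated_tau(2_500, 4_000, seed=6)   # noticeably larger again
```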
Theorem 3.9 (Doob’s Maximal Inequalities (Discrete)). Let X = X(t)t∈Tbe a stochastic process, assume that T is finite, and fix α > 0. If X is asubmartingale, then1
αP[maxt∈T
X(t) > α]≤ E
[1maxt∈T X(t)>αX
(max T
)]≤ E
[X(max T
)+].
If on the other hand X is a supermartingale, then
αP[maxt∈T
X(t) > α]≤ E
[X(min T
)]+ E
[X(max T
)−].
Proof. Define a T-valued stopping time τ by

τ := min{t ∈ T : X(t) > α} ∧ max T.

By Lemma 1.24 (Measurability of the Stopped Process), it follows that

F := {max_{t∈T} X(t) > α} = {X(τ) > α} ∈ F(τ).

From this we see in particular that X(τ) > α on F and thus

E[1_F X(τ)] ≥ αE[1_F] = αP[F].
But then, if X is a submartingale, it follows from Theorem 3.6 (Optional Stopping) that

αP[F] ≤ E[1_F X(τ)] ≤ E[1_F E[X(max T) | F(τ)]] = E[1_F X(max T)] ≤ E[X(max T)^+]

as asserted. If X is a supermartingale, we apply optional stopping twice to arrive at
E[X(min T)] ≥ E[X(τ)] = E[1_F X(τ) + 1_{F^c} X(τ)]
≥ αP[F] + E[1_{{X(τ) ≤ α}} X(τ)]
≥ αP[F] + E[1_{{X(τ) ≤ α}} E[X(max T) | F(τ)]]
= αP[F] + E[1_{{X(τ) ≤ α}} X(max T)].
¹Here and subsequently, x^+ := max{x, 0} and x^− := max{−x, 0} denote the positive and negative part of a real number x, respectively.
Upon rearranging and further estimation, this implies that

αP[F] ≤ E[X(min T)] − E[1_{{X(τ) ≤ α}} X(max T)] ≤ E[X(min T)] + E[X(max T)^−]

and the proof is complete.
Doob’s maximal inequalities are remarkable because the right hand sides ofthe inequalities only depend on the process X at time min T and max T ,whereas the left hand side involves the whole path X(·). In particular, thecardinality and mesh size of T do not play any role in these bounds.
Theorem 3.10 (Doob’s Lp Inequality (Discrete)). Let X be a martingale oran a.s. nonnegative submartingale and suppose that T is finite. If for somep ∈ (1,∞) we have E[|X(t)|p] <∞ for all t ∈ T , then
E[maxt∈T|X(t)|p
]<∞
andE[maxt∈T|X(t)|p
]1/p
≤ pp−1
E[∣∣X(max T
)∣∣p]1/p
.
Proof. If X is a martingale, |X| is a submartingale by Proposition 3.5 (Submartingales via Convex Transformations), and hence it suffices to prove the result for submartingales which are almost surely nonnegative. We define Y := max_{t∈T} X(t) and observe that Y ≤ ∑_{t∈T} X(t). The discrete version of Jensen's inequality thus shows that

E[Y^p] ≤ E[N^{p−1} ∑_{t∈T} X(t)^p] = N^{p−1} ∑_{t∈T} E[X(t)^p] < ∞,

where N denotes the number of elements in T. Now use x^p = ∫_0^x pα^{p−1} dα
and apply Tonelli’s theorem to rewrite
E[Y p]
= pE[∫ Y
0αp−1dα
]= pE
[∫∞0αp−11Y >αdα
]= p
∫∞0αp−1P[Y > α]dα.
Now Doob’s maximal inequality for submartingales allows us to estimatethis by
E[Y p]
= p∫∞
0αp−1P[Y > α]dα ≤ p
∫∞0αp−2E
[1Y >αX
(max T
)]dα.
Using Tonelli’s theorem once more therefore yields
E[Y p]≤ p
∫∞0αp−2E
[1Y >αX
(max T
)]dα
= pE[∫∞
0αp−21Y >αX
(max T
)dα]
= pE[X(max T
) ∫ Y0αp−2dα
]= p
p−1E[Y p−1X
(max T
)].
Setting q := p/(p − 1) so that 1/p + 1/q = 1 and using Hölder's inequality, we can estimate the last term further and arrive at

E[Y^p] ≤ (p/(p−1)) E[Y^{p−1} X(max T)] ≤ (p/(p−1)) E[Y^{q(p−1)}]^{1/q} E[X(max T)^p]^{1/p}
= (p/(p−1)) E[Y^p]^{1−1/p} E[X(max T)^p]^{1/p}.

Now dividing both sides by E[Y^p]^{1−1/p} finishes the proof.
3.3 Paths of Continuous Martingales
Let us now turn to path properties of general continuous martingales. As it turns out, quite similarly to the Brownian special case, paths of continuous martingales are quite rough. More precisely, a continuous martingale of finite variation is necessarily almost surely constant. To prove this, we first need the following useful proposition.
Proposition 3.11 (Orthogonality of Martingale Increments). Let us assume that X = {X(t)}_{t∈T} is a martingale with E[|X(t)|^2] < ∞ for all t ∈ T. Then

(i) for all s, t ∈ T with s < t, it holds that

E[|X(t) − X(s)|^2 | F(s)] = E[X(t)^2 − X(s)^2 | F(s)] a.s., and

(ii) for all {t_0, t_1, . . . , t_n} ⊂ T with t_0 < t_1 < . . . < t_n, n ∈ N, it holds that

E[X(t_n)^2] = E[X(t_0)^2] + ∑_{k=1}^n E[|X(t_k) − X(t_{k−1})|^2].

In particular, the mapping t ↦ E[X(t)^2] is nondecreasing.
Proof. Step 1: We prove (i). For this, let s, t ∈ T with s < t. By expanding the square and using the martingale property of X, we obtain

E[|X(t) − X(s)|^2 | F(s)]
= E[X(t)^2 | F(s)] − E[2X(s)X(t) | F(s)] + E[X(s)^2 | F(s)]
= E[X(t)^2 | F(s)] − 2X(s)E[X(t) | F(s)] + X(s)^2
= E[X(t)^2 | F(s)] − 2X(s)^2 + X(s)^2
= E[X(t)^2 − X(s)^2 | F(s)] a.s.
Step 2: We prove (ii). Let n ∈ N and {t_0, t_1, . . . , t_n} ⊂ T with t_0 < t_1 < . . . < t_n. Then it follows from (i) that

E[X(t_n)^2] = E[X(t_0)^2] + ∑_{k=1}^n (E[X(t_k)^2] − E[X(t_{k−1})^2]) = E[X(t_0)^2] + ∑_{k=1}^n E[|X(t_k) − X(t_{k−1})|^2].
With the orthogonality of martingale increments at hand, we can now proceed to show that continuous martingales of finite variation are almost surely constant.
Theorem 3.12 (Continuous Martingales of Finite Variation). Assume that X = {X(t)}_{t∈[0,∞)} is a continuous martingale of finite variation. Then

X(t) = X(0) for all t ∈ [0,∞) a.s.
Proof. We can without loss of generality assume that X(0) = 0 (consider the process X − X(0) otherwise). We denote by V_X = {V_X(t)}_{t∈[0,∞)} the total variation process of X defined by

V_X(t, ω) := V_{X(·,ω)}(t) for all t ∈ [0,∞),

where we recall that the total variation V_f : [0,∞) → [0,∞] of a function f : [0,∞) → R is defined for each t ∈ [0,∞) by

V_f(t) := sup{∑_{k=1}^n |f(t_k) − f(t_{k−1})| : 0 ≤ t_0 < . . . < t_n ≤ t, n ∈ N}.

The assumption that X is of finite variation can then be stated as V_X(t) < ∞ for all t ∈ [0,∞).
Step 1: Localization. We show that we may without loss of generality assume that both X and V_X are uniformly bounded. Indeed, for each n ∈ N we can define a stopping time

τ_n := inf{t ∈ [0,∞) : |X(t)| ≥ n or V_X(t) ≥ n}.

By continuity, the stopped processes X(· ∧ τ_n) and V_X(· ∧ τ_n) = V_{X(·∧τ_n)} are bounded by n. If the result now holds for X(· ∧ τ_n) for all n ∈ N, then it must also hold for X since τ_n → ∞.
Step 2: Orthogonality of martingale increments. Let us assume that both |X| and V_X are bounded by some α > 0 and fix t > 0. For an arbitrary partition 0 = t_0 < t_1 < . . . < t_n = t of [0, t], Proposition 3.11 (Orthogonality of Martingale Increments) shows that

E[X(t)^2] = E[∑_{k=1}^n |X(t_k) − X(t_{k−1})|^2]
≤ E[max_{k=1,...,n} |X(t_k) − X(t_{k−1})| ∑_{k=1}^n |X(t_k) − X(t_{k−1})|]
≤ E[max_{k=1,...,n} |X(t_k) − X(t_{k−1})| V_X(t)]
≤ αE[max_{k=1,...,n} |X(t_k) − X(t_{k−1})|].
Since X is continuous, it is uniformly continuous on [0, t] and we therefore find that max_{k=1,...,n} |X(t_k) − X(t_{k−1})| → 0 as max_{k=1,...,n} |t_k − t_{k−1}| → 0. Since X is uniformly bounded, it follows moreover that |X(t_k) − X(t_{k−1})| ≤ 2α for all k = 1, . . . , n. The dominated convergence theorem hence yields

αE[max_{k=1,...,n} |X(t_k) − X(t_{k−1})|] → 0 as max_{k=1,...,n} |t_k − t_{k−1}| → 0.

But this implies that E[X(t)^2] = 0, which is only possible if X(t) = 0 almost surely for all t ∈ [0,∞). Since X is continuous, this shows that X(t) = 0 for all t ∈ [0,∞) almost surely.
Theorem 3.12 is a nice tool when one wants to show that a given process is not a martingale.
Exercise 41. Let X = {X(t)}_{t∈[0,∞)} be an Ornstein-Uhlenbeck process and define the corresponding Langevin process Y = {Y(t)}_{t∈[0,∞)} by

Y(t) := ∫_0^t X(s) ds for all t ∈ [0,∞).
Show that Y is a Gaussian process, but not a martingale.
Since Brownian motion is a continuous martingale, it follows that it has paths of infinite variation. One can however show that Brownian paths have finite quadratic variation.
Exercise 42. Let W be a Brownian motion and fix t > 0. For each n ∈ N, let 0 = t^n_0 < t^n_1 < . . . < t^n_n = t be a partition of [0, t] such that max_{k=1,...,n} |t^n_k − t^n_{k−1}| → 0 as n → ∞. Show that

lim_{n→∞} E[|∑_{k=1}^n |W(t^n_k) − W(t^n_{k−1})|^2 − t|^2] = 0.
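The convergence in Exercise 42 is visible already at moderate partition sizes. The sketch below (an illustration; the names and parameter choices are ours) computes ∑|ΔW|² along uniform partitions of [0, 2] and checks that the mean squared distance to t = 2 shrinks as the mesh refines; for a uniform n-point partition a direct computation gives E[(∑|ΔW|² − t)²] = 2t²/n.

```python
import numpy as np

rng = np.random.default_rng(7)
t, n_paths = 2.0, 10_000

def quadratic_variation(n: int) -> np.ndarray:
    """Σ_k |W(t_k) - W(t_{k-1})|^2 along the uniform n-point partition of
    [0, t], simulated for n_paths independent Brownian paths."""
    dw = rng.normal(0.0, np.sqrt(t / n), size=(n_paths, n))
    return (dw ** 2).sum(axis=1)

err_coarse = np.mean((quadratic_variation(10) - t) ** 2)    # about 2t²/10 = 0.8
err_fine = np.mean((quadratic_variation(1000) - t) ** 2)    # about 2t²/1000 = 0.008
```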
Chapter
4 MARTINGALE CONVERGENCE
In this chapter, we study convergence results for (sub-/super-)martingales. In particular, this will allow us to extend the discrete time martingale results to the continuous time setting. We furthermore analyze the path regularity of continuous time submartingales in detail and show that there is a big class of submartingales which admit càdlàg modifications. We begin with the path regularity.
4.1 Upcrossings and Submartingale Limits
In order to quantify how much a path of a stochastic process moves, one can count how often it crosses a given interval.
Definition 4.1 (Upcrossings). Suppose that X = {X(t)}_{t∈T} is an R-valued stochastic process and let α, β ∈ R with α < β. For ω ∈ Ω and n ∈ N_0, we say that X(·, ω) has at least n upcrossings through (α, β) if there exist s_1, . . . , s_n, t_1, . . . , t_n ∈ T with s_1 < t_1 < s_2 < t_2 < . . . < s_n < t_n such that

X(s_k, ω) ≤ α and X(t_k, ω) ≥ β for all k = 1, . . . , n.

The number of upcrossings through (α, β) is the mapping U^β_α : Ω → N_0 ∪ {∞} given by

U^β_α(ω) := sup{n ∈ N_0 : X(·, ω) has at least n upcrossings through (α, β)}.
Note that it is a priori not clear if U^β_α is a random variable. We may of course look at downcrossings instead, which is essentially the same concept.
This is because in between any two upcrossings there must be at least one downcrossing and vice versa. If a process has many upcrossings through a given interval, it must be quite "wild". On the other hand, if the number of upcrossings is small through any interval, the paths of the process are likely to be quite regular.
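The definition is algorithmic in nature: scan the path, wait until it drops to α or below, then wait until it rises to β or above, and count each completed pair. A small sketch for discrete sample paths (our own function, not part of the notes):

```python
import numpy as np

def count_upcrossings(x: np.ndarray, alpha: float, beta: float) -> int:
    """Number of upcrossings of the discrete path x through (alpha, beta):
    each visit to (-inf, alpha] followed later by a visit to [beta, inf)
    counts as one completed upcrossing."""
    count, waiting_for_drop = 0, True
    for value in x:
        if waiting_for_drop and value <= alpha:
            waiting_for_drop = False       # start of a potential upcrossing
        elif not waiting_for_drop and value >= beta:
            waiting_for_drop = True        # upcrossing completed
            count += 1
    return count

n_up = count_upcrossings(np.array([0.0, -1.0, 2.0, -1.0, 2.0]), 0.0, 1.0)
```

Applied to the path [0, −1, 2, −1, 2] with (α, β) = (0, 1), this counts two upcrossings.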
Figure 4.1 A path of a Brownian motion with three upcrossings through (α, β).
Figure 4.2 A path of a fractional Brownian motion with sixty-seven upcrossings through (α, β).
Recall that we may interpret supermartingales X as unfavorable games. If the player enters the game whenever X falls below the level α and leaves the game whenever X rises above β, the player is guaranteed to make a gain of at least β − α during each upcrossing. Sounds too good to be true? It is. Indeed, the problem with this strategy is that the last upcrossing may be incomplete: The player enters the game, but X never reaches the level β again. The loss in this case is bounded from above by (X(max T) − α)^−. Now optional sampling tells us that the losses must on average exceed the gains (the game is unfavorable!), so we should expect

(β − α)E[U^β_α] ≤ E[(X(max T) − α)^−].

This is the upcrossing inequality, which we shall now prove rigorously.
Lemma 4.2 (Doob’s Upcrossing Inequality). Let X = X(t)t∈T be a super-martingale and assume that T is finite. Let moreover α, β ∈ R with α < β.Then Uβ
α is a random variable and
(β − α)E[Uβα
]≤ E
[(X(max T )− α
)−].
Proof. Step 1: We first construct stopping times characterizing the upcrossings of X. For this, denote by n ∈ N the number of elements in T. Now define

σ_1 := min{t ∈ T : X(t) ≤ α} ∧ max T,
τ_1 := min{t ∈ T ∩ (σ_1,∞) : X(t) ≥ β} ∧ max T,

and, for all k = 1, . . . , n − 1,

σ_{k+1} := min{t ∈ T ∩ (τ_k,∞) : X(t) ≤ α} ∧ max T,
τ_{k+1} := min{t ∈ T ∩ (σ_{k+1},∞) : X(t) ≥ β} ∧ max T.

With this, σ_k denotes the beginning and τ_k the completion of the kth upcrossing. We observe that the kth upcrossing is completed if and only if σ_k < max T and X(τ_k) ≥ β. It is obvious that σ_k, τ_k, k = 1, . . . , n, are T-valued, satisfy

min T ≤ σ_1 ≤ τ_1 ≤ σ_2 ≤ τ_2 ≤ . . . ≤ σ_n = τ_n = max T,
and are stopping times by Lemma 1.20 (Hitting Times of Discrete Time Processes). From this, we also see that U^β_α is an A-measurable random variable since

{U^β_α ≥ m} = ⋂_{k=1}^m {σ_k < max T} ∩ {X(τ_k) ≥ β} for m = 1, . . . , n.
Step 2: We now formalize the player's strategy discussed before the statement of this lemma to arrive at the upcrossing inequality. We claim that

∑_{k=1}^n [X(τ_k) − X(σ_k)] ≥ (β − α)U^β_α − (X(max T) − α)^−.
Indeed, it is clear from the definition of σ_k and τ_k that

X(τ_k) − X(σ_k) ≥ β − α on {U^β_α ≥ k}

as well as

X(τ_k) − X(σ_k) = X(max T) − X(max T) = 0 on {U^β_α < k − 1}.

On {U^β_α = k − 1, σ_k = max T} we necessarily have τ_k = max T and hence

X(τ_k) − X(σ_k) = 0 ≥ −(X(max T) − α)^− on {U^β_α = k − 1, σ_k = max T},

while the fact that X(σ_k) ≤ α on {σ_k < max T} implies that

X(τ_k) − X(σ_k) ≥ X(max T) − α ≥ −(X(max T) − α)^−

on {U^β_α = k − 1, σ_k < max T}. Summing these cases over all k = 1, . . . , n hence shows that

∑_{k=1}^n [X(τ_k) − X(σ_k)] ≥ (β − α)U^β_α − (X(max T) − α)^−.
Taking expectations and using Theorem 3.7 (Optional Sampling) then concludes the proof since

(β − α)E[U^β_α] − E[(X(max T) − α)^−] ≤ ∑_{k=1}^n (E[X(τ_k)] − E[X(σ_k)]) ≤ 0.
The upcrossing inequality allows us to show that paths of submartingalesadmit limits from both the left and the right.
Lemma 4.3 (Submartingale Limits). Let X = {X(t)}_{t∈T} be a submartingale and assume that T is countable. If

sup_{t∈T} E[X(t)^+] < ∞,

the limits¹

lim_{t↑t_0} X(t) and lim_{t↓t_0} X(t) exist in R̄ for all t_0 ∈ T̄ a.s.

If, in addition,

inf_{t∈T} E[X(t)] > −∞,

then the above limits exist in R almost surely.
Proof. We set K := sup_{t∈T} E[X(t)^+] and fix α, β ∈ Q with α < β.
Step 1: Construction of the nullset N outside of which the limits exist. For this, we note that for each finite subset F of T, the process {−X(t)}_{t∈F} is a supermartingale with respect to {F(t)}_{t∈F}. Thus we can apply Lemma 4.2 (Upcrossing Inequality) and conclude that the number of upcrossings U^β_α(F) of −X through (α, β) on F satisfies

(β − α)E[U^β_α(F)] ≤ E[(−X(max F) − α)^−] ≤ K + |α|.

Now choose a sequence {F_n}_{n∈N} of finite subsets of T such that F_n ↑ T. Given ω ∈ Ω, denote by s, t ∈ T an upcrossing of −X(·, ω) through (α, β) on T. Then there exists n_0 ∈ N such that s, t ∈ F_n for all n ≥ n_0 and hence s, t is also an upcrossing of −X(·, ω) on F_n for all n ≥ n_0. This argument shows that U^β_α(F_n) ↑ U^β_α(T) = U^β_α as n → ∞. In particular, U^β_α is a random variable and monotone convergence shows that

0 ≤ (β − α)E[U^β_α] = lim_{n→∞} (β − α)E[U^β_α(F_n)] ≤ K + |α| < ∞.

But then there exists an event N^β_α ∈ A with P[N^β_α] = 0 such that U^β_α(ω) < ∞ for all ω ∉ N^β_α. Now define N := ⋃_{α,β∈Q, α<β} N^β_α. Then N ∈ A, P[N] = 0, and

U^β_α(ω) < ∞ for all α, β ∈ Q with α < β, for all ω ∈ N^c.
¹Here, T̄ denotes the closure of T with respect to R̄.
Chapter 4 Martingale Convergence
Step 2: Convergence in R̄. Suppose that there exist ω ∈ Ω and a monotone sequence {t_n}_{n∈N} ⊂ T such that X(t_n, ω) does not converge in R̄. Then we can find α, β ∈ Q such that

lim inf_{n→∞} −X(t_n, ω) < α < β < lim sup_{n→∞} −X(t_n, ω).

But this can only be the case if U_α^β(ω) = ∞, i.e. ω ∈ N.
Step 3: Convergence in R. Let us set L ≔ inf_{t∈T} E[X(t)] > −∞. Now apply Theorem 3.9 (Doob's Maximal Inequalities) to the submartingale {X(t)}_{t∈F_n} and the supermartingale {−X(t)}_{t∈F_n} to obtain

α P[max_{t∈F_n} |X(t)| > α] ≤ α P[max_{t∈F_n} X(t) > α] + α P[max_{t∈F_n} −X(t) > α]
≤ E[X(max F_n)^+] + E[−X(min F_n)] + E[(−X(max F_n))^−] ≤ 2K − L
for all α > 0. By σ-continuity of P, this yields

α P[sup_{t∈T} |X(t)| > α] = lim_{n→∞} α P[max_{t∈F_n} |X(t)| > α] ≤ 2K − L

and therefore

P[sup_{t∈T} |X(t)| = ∞] = lim_{α↑∞} P[sup_{t∈T} |X(t)| > α] ≤ lim_{α↑∞} (1/α)(2K − L) = 0.
But then there exists N_0 ∈ A with P[N_0] = 0 such that sup_{t∈T} |X(t, ω)| < ∞ for all ω ∈ N_0^c, and any existing limit on N_0^c must be finite; hence the result follows by replacing N with N ∪ N_0.
By Proposition 3.5 (Submartingales via Convex Transformations), X^+ is a submartingale whenever X is one. But then the mapping t ↦ E[X(t)^+] is increasing. Thus, if T has a maximal element, i.e. if sup T ∈ T, then the condition

sup_{t∈T} E[X(t)^+] < ∞

is automatically satisfied.
An immediate corollary of Lemma 4.3 (Submartingale Limits) states that every right continuous submartingale must necessarily be a.s. càdlàg.
Lemma 4.4 (Submartingale Regularity). Let X = {X(t)}_{t∈T} be a right continuous submartingale. Then

lim_{t↑t_0} X(t) exists in R for all t_0 ∈ T with t_0 < sup T a.s.

In particular, almost every path of X is càdlàg.
Proof. Let Q be a countable and dense subset of T and select sequences {s_k}_{k∈N} and {t_k}_{k∈N} in Q such that s_k ↓ inf T and t_k ↑ sup T as well as s_1 < t_1. Using that t ↦ E[X(t)] and t ↦ E[X(t)^+] are nondecreasing (since X is a submartingale, and hence also X^+ is a submartingale), it follows that, for every k ∈ N, the process {X(q)}_{q∈[s_k,t_k]∩Q} satisfies

inf_{q∈[s_k,t_k]∩Q} E[X(q)] = E[X(s_k)] ≥ −E[|X(s_k)|] > −∞

as well as

sup_{q∈[s_k,t_k]∩Q} E[X(q)^+] = E[X(t_k)^+] ≤ E[|X(t_k)|] < ∞.
This shows that {X(q)}_{q∈[s_k,t_k]∩Q} satisfies the assumptions of Lemma 4.3 (Submartingale Limits), and hence there exists a nullset N_k ∈ A such that

lim_{q↑t_0, q∈Q} X(q, ω) exists in R for every t_0 ∈ T with s_k < t_0 ≤ t_k, for all ω ∈ N_k^c.

Now define N ≔ ⋃_{k∈N} N_k and observe that P[N] = 0 and

lim_{q↑t_0, q∈Q} X(q, ω) exists in R for every t_0 ∈ T with t_0 < sup T, for all ω ∈ N^c.

Since X(·, ω) is right continuous, this in turn shows that

lim_{t↑t_0} X(t, ω) exists in R for every t_0 ∈ T with t_0 < sup T, for all ω ∈ N^c,
and the proof is complete.
4.2 The Martingale Convergence Theorem
To lift the discrete time (sub-)martingale optional stopping theorem to the continuous time setting, we need a better understanding of the convergence of submartingales. The following theorem provides us with exactly this.
Before stating the result, let us briefly fix some notation by setting

F(inf T) ≔ ⋂_{t∈T} F(t) whenever inf T ∉ T

as well as

F(sup T) ≔ σ(F(t) : t ∈ T) whenever sup T ∉ T.
Theorem 4.5 (Martingale Convergence). Let X = {X(t)}_{t∈T} be a submartingale and assume that T is countable.

(i) Decreasing Submartingale Convergence: If inf T ∉ T and X satisfies

inf_{t∈T} E[X(t)] > −∞,

there exists an F(inf T)-measurable random variable X(inf T) with E[|X(inf T)|] < ∞ such that

X(t) → X(inf T) almost surely and in expectation as t ↓ inf T.

Moreover, the submartingale property extends to inf T, i.e.

X(inf T) ≤ E[X(t) | F(inf T)] a.s. for all t ∈ T.
(ii) Increasing Submartingale Convergence: If sup T ∉ T and X satisfies

sup_{t∈T} E[X(t)^+] < ∞,

there exists an F(sup T)-measurable random variable X(sup T) with E[|X(sup T)|] < ∞ such that

X(t) → X(sup T) almost surely as t ↑ sup T.
If X is uniformly integrable, then we furthermore have

X(t) → X(sup T) in expectation as t ↑ sup T

and the submartingale property extends to sup T, i.e.

X(t) ≤ E[X(sup T) | F(t)] a.s. for all t ∈ T.
Proof. In what follows, we fix t_0 ∈ T with inf T < t_0 < sup T.
Step 1: Almost sure convergence in (i). Suppose that inf T ∉ T and inf_{t∈T} E[X(t)] > −∞. We define

X(inf T, ω) ≔ lim_{t↓inf T} X(t, ω) 1_{lim_{t↓inf T} X(t,ω) exists} for all ω ∈ Ω.

Since t_0 is the maximal element of T_0 ≔ {t ∈ T : t ≤ t_0}, the process {X(t)}_{t∈T_0} satisfies the assumptions of Lemma 4.3 (Submartingale Limits), and hence we see that {lim_{t↓inf T} X(t) does not exist} is an A-measurable nullset. Therefore X(inf T) is a random variable and X(t) → X(inf T) a.s. as t ↓ inf T. We claim that X(inf T) is even F(inf T)-measurable. Indeed,

X(inf T) = lim_{t↓inf T, t≤s} X(t) 1_{lim_{t↓inf T, t≤s} X(t) exists} for all s ∈ T_0,

from which we infer that X(inf T) is F(s)-measurable for all s ∈ T_0. But then X(inf T) is F(inf T) = ⋂_{s∈T} F(s) = ⋂_{s∈T_0} F(s)-measurable. Now choose a sequence {t_n}_{n∈N} ⊂ T such that t_n ↓ inf T. By Fatou's lemma, we have

E[|X(inf T)|] = E[lim inf_{n→∞} |X(t_n)|] ≤ lim inf_{n→∞} E[|X(t_n)|] = lim inf_{n→∞} E[2X(t_n)^+ − X(t_n)] ≤ 2 E[X(t_1)^+] − inf_{t∈T} E[X(t)] < ∞,

i.e. E[|X(inf T)|] < ∞ as required.
Step 2: Convergence in expectation in (i) if X is a martingale. If X is a martingale, then {X(t)}_{t∈T_0} is uniformly integrable since

X(t) = E[X(t_0) | F(t)] a.s. for all t ∈ T_0.

But then E[|X(t) − X(inf T)|] → 0 as t ↓ inf T by Vitali's theorem.
Step 3: Convergence in expectation in (i) if X is a submartingale. Fix ε > 0 and choose t_0 ∈ T in such a way that

E[X(t_0)] ≤ inf_{t∈T} E[X(t)] + ε/4.
Now define a martingale M = {M(t)}_{t∈T_0} by

M(t) ≔ E[X(t_0) | F(t)] for all t ∈ T_0.
Step 2 implies that M(t) converges in expectation as t ↓ inf T, implying that {M(t)}_{t∈T_0} is Cauchy for t ↓ inf T, and therefore there exists t_1 ∈ T_0 such that

E[|M(t) − M(t_1)|] ≤ ε/4 for all t ≤ t_1.
Since X(t) ≤ E[X(t_0)|F(t)] = M(t) a.s. for all t ∈ T_0 by the submartingale property of X, it follows that

E[|X(t) − M(t)|] = E[M(t) − X(t)] = E[X(t_0)] − E[X(t)] ≤ inf_{s∈T} E[X(s)] − E[X(t)] + ε/4 ≤ ε/4 for all t ∈ T_0.
Combining the previous two estimates thus yields

E[|X(t) − X(s)|] ≤ E[|X(t) − M(t)|] + E[|M(t) − M(t_1)|] + E[|M(t_1) − M(s)|] + E[|M(s) − X(s)|] ≤ ε

for all s, t ∈ T_0 with s, t ≤ t_1. But then {X(t)}_{t∈T} must be Cauchy for t ↓ inf T with respect to convergence in expectation. Since the space of integrable random variables is complete, this implies the convergence X(t) → X(inf T) in expectation as t ↓ inf T.
Step 4: The submartingale property in (i). Let s, t ∈ T with s < t and F ∈ F(inf T). Then F ∈ F(s), and the submartingale property of X and the definition of conditional expectation imply that

E[1_F X(s)] ≤ E[1_F X(t)] for all s, t ∈ T with s < t and F ∈ F(inf T).
Hence, upon sending s ↓ inf T,

E[1_F X(inf T)] ≤ E[1_F X(t)] for all t ∈ T and F ∈ F(inf T).
By definition of conditional expectation, this shows that

X(inf T) ≤ E[X(t) | F(inf T)] a.s. for all t ∈ T.
Step 5: Proof of (ii). Suppose that sup T ∉ T and sup_{t∈T} E[X(t)^+] < ∞. As in step 1, the submartingale {X(t)}_{t∈T, t≥t_0} satisfies the assumptions of Lemma 4.3 (Submartingale Limits), and hence there exists an F(sup T)-measurable random variable X(sup T) such that

X(t) → X(sup T) a.s. as t ↑ sup T.
Again as in step 1, choosing a sequence {t_n}_{n∈N} ⊂ T with t_n ↑ sup T, we see that E[|X(sup T)|] < ∞ by using Fatou's lemma and E[X(t)] ≥ E[X(t_0)] for all t ≥ t_0:

E[|X(sup T)|] = E[lim inf_{n→∞} |X(t_n)|] ≤ lim inf_{n→∞} E[|X(t_n)|] = lim inf_{n→∞} E[2X(t_n)^+ − X(t_n)] ≤ 2 sup_{t∈T} E[X(t)^+] − E[X(t_0)] < ∞.
If {X(t)}_{t∈T} is uniformly integrable, then the convergence in expectation follows from almost sure convergence and Vitali's theorem. Now pick t ∈ T and a sequence {s_n}_{n∈N} ⊂ T with t ≤ s_n ↑ sup T. Since {X(s_n)}_{n∈N} converges in expectation to X(sup T), it follows that {E[X(s_n)|F(t)]}_{n∈N} converges in expectation to E[X(sup T)|F(t)]. Dropping to a subsequence if necessary, we find that {E[X(s_n)|F(t)]}_{n∈N} converges to E[X(sup T)|F(t)] almost surely. But then the submartingale property of X yields

X(t) ≤ lim_{n→∞} E[X(s_n) | F(t)] = E[X(sup T) | F(t)] a.s.

and the proof is complete.
In the martingale case, the condition inf_{t∈T} E[X(t)] > −∞ is obviously satisfied since martingales are constant in expectation. Therefore, every martingale converges downward!
Corollary 4.6 (Downward Convergence of Martingales). Let X = {X(t)}_{t∈T} be a martingale, let T be countable, and suppose inf T ∉ T. Then there exists an F(inf T)-measurable random variable X(inf T) with E[|X(inf T)|] < ∞ and

X(t) → X(inf T) almost surely and in expectation as t ↓ inf T.
Moreover, the martingale property extends to inf T, i.e.

X(inf T) = E[X(t) | F(inf T)] a.s. for all t ∈ T.
Let us now return to closed martingales. We recall that a martingale X = {X(t)}_{t∈T} is closed if there exists a random variable Z with E[|Z|] < ∞ such that

X(t) = E[Z | F(t)] a.s. for all t ∈ T.
If sup T ∈ T, then X is obviously closed by Z = X(sup T). The following corollary characterizes closed martingales.
Corollary 4.7 (Characterization of Closed Martingales). For a martingale X = {X(t)}_{t∈T}, the following statements are equivalent:
(i) X is uniformly integrable.
(ii) X(t) converges in expectation to a random variable Z with E[|Z|] < ∞ as t ↑ sup T.
(iii) X is closed.
Proof. We fix a countable dense subset Q ⊂ T .
Step 1: (i) implies (ii). Applying increasing submartingale convergence to the uniformly integrable martingale {X(q)}_{q∈Q}, we find that

X(q) → Z in expectation as q ↑ sup T through Q

for some F(sup T)-measurable random variable Z with E[|Z|] < ∞. If {t_n}_{n∈N} ⊂ T is an arbitrary sequence with t_n ↑ sup T, the same argument applies to the uniformly integrable martingale {X(q)}_{q∈Q′}, where Q′ ≔ Q ∪ {t_n : n ∈ N}, and we conclude that X(t_n) → Z in expectation. Hence (ii) holds.
Step 2: (ii) implies (iii). If sup T ∈ T, then X is trivially closed. If sup T ∉ T, we have

E[1_F X(t)] = E[1_F X(s)] → E[1_F Z] as s ↑ sup T for any t ∈ T and F ∈ F(t).

But then X(t) = E[Z|F(t)] a.s. for all t ∈ T, i.e. X is closed.
Step 3: (iii) implies (i). This follows from the uniform integrability of conditional expectations since

X(t) = E[Z | F(t)] a.s. for all t ∈ T.
Observe that if T is countable, then increasing martingale convergence implies that we also have almost sure convergence in (ii). If T is uncountable, this need not be the case!
4.3 Continuous Time Martingales
One of the main applications of the martingale convergence theorem is to lift the discrete time optional stopping theorem to continuous time. This is possible whenever the submartingale has sufficiently regular paths (more precisely: right continuous paths). After establishing this result, we extend Doob's inequalities to the continuous time case.
Theorem 4.8 (Optional Stopping (Continuous)). Let X = {X(t)}_{t∈[0,∞)} be a right continuous submartingale. If σ and τ are bounded stopping times satisfying σ ≤ τ, we have

X(σ) ≤ E[X(τ) | F(σ)] a.s.
Proof. Step 1: Discrete time optional stopping. By Proposition 1.25 (Approximation of Stopping Times), there exist sequences {σ_n}_{n∈N} and {τ_n}_{n∈N} of stopping times such that σ_n ≤ τ_n for all n ∈ N and

σ_n(Ω) is finite, σ < σ_{n+1} ≤ σ_n for all n ∈ N, σ = inf_{n∈N} σ_n = lim_{n→∞} σ_n,

τ_n(Ω) is finite, τ < τ_{n+1} ≤ τ_n for all n ∈ N, τ = inf_{n∈N} τ_n = lim_{n→∞} τ_n.

Moreover, choosing T > 0 such that σ, τ < T, we may assume that σ_n, τ_n ≤ T for all n ∈ N by replacing σ_n and τ_n with σ_n ∧ T and τ_n ∧ T, respectively. Applying Theorem 3.6 yields

X(σ_n) ≤ E[X(τ_n) | F(σ_n)] a.s. for all n ∈ N.
Step 2: Convergence in expectation. By right continuity and since σ_n ↓ σ and τ_n ↓ τ, it is clear that

X(σ_n) → X(σ) and X(τ_n) → X(τ) as n → ∞.

Now consider the process Y = {Y(t)}_{t∈−N} given by

Y(t) ≔ X(τ_{−t}) for all t ∈ −N

and define G = {G(t)}_{t∈−N} by

G(t) ≔ F(τ_{−t}) for all t ∈ −N.

Next, if s, t ∈ −N are such that s < t, then τ_{−s} ≤ τ_{−t} and thus G(s) ⊂ G(t), i.e. G is a filtration, and Y is clearly adapted to G. Moreover, by step 1,

Y(s) = X(τ_{−s}) ≤ E[X(τ_{−t}) | F(τ_{−s})] = E[Y(t) | G(s)] a.s.,
i.e. Y is a G-submartingale. Since E[X(0)] ≤ E[X(τ_{−t})] = E[Y(t)] for all t ∈ −N by optional sampling, we see that inf_{t∈−N} E[Y(t)] > −∞, and we can apply decreasing submartingale convergence (Theorem 4.5) to see that {Y(t)}_{t∈−N} = {X(τ_n)}_{n∈N} converges almost surely and in expectation to X(τ) as n → ∞. Replacing τ_n by σ_n and τ by σ, the same argument also shows that X(σ_n) → X(σ) in expectation as n → ∞.
Step 3: The submartingale property between σ and τ. Since F(σ) ⊂ F(σ_n) for all n ∈ N, step 1 shows that

E[1_F X(σ_n)] ≤ E[1_F X(τ_n)] for all F ∈ F(σ) ⊂ F(σ_n).
Since X(σ_n) → X(σ) and X(τ_n) → X(τ) in expectation, this implies

E[1_F X(σ)] ≤ E[1_F X(τ)] for all F ∈ F(σ),

i.e. X(σ) ≤ E[X(τ) | F(σ)] as claimed.
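The discretization in step 1 can be pictured concretely: on the dyadic grids {k/2^n}, one replaces a stopping time by the smallest grid point strictly above it, which produces a nonincreasing sequence of finite-range approximations converging to it from above. A minimal sketch (the helper name is ours, and it evaluates the recipe at a single number rather than as a random variable):

```python
import math

def dyadic_approx(tau, n):
    """Smallest point of the dyadic grid {k / 2^n} lying strictly above
    tau; as n grows this decreases to tau from above, mimicking the
    finite-range stopping times sigma_n, tau_n used in step 1."""
    return (math.floor(tau * 2**n) + 1) / 2**n

approximations = [dyadic_approx(0.3, n) for n in range(1, 6)]
print(approximations)  # nonincreasing, all strictly above 0.3
```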
Comparing the discrete time and the continuous time optional stopping theorems, we see that the finiteness of the range of the stopping times in the discrete time case is replaced by boundedness of the stopping times and right continuity of the submartingale in the continuous time case. With this
in mind, we see that the finiteness assumption in the discrete time case can be thought of as a boundedness assumption on the stopping times as well.
The optional sampling theorem in continuous time is now straightforward, for if, for all bounded stopping times σ and τ with σ ≤ τ, we have

E[X(σ)] ≤ E[X(τ)],

then this inequality obviously also holds for all stopping times taking values in a finite subset of T.
Corollary 4.9 (Optional Sampling (Continuous)). Let X = {X(t)}_{t∈[0,∞)} be an adapted and right continuous stochastic process. Then X is a submartingale if and only if, for all bounded stopping times σ, τ with σ ≤ τ, it holds that E[|X(σ)|], E[|X(τ)|] < ∞ and

E[X(σ)] ≤ E[X(τ)].
In particular, if τ is an arbitrary stopping time and X a right continuous submartingale, the stopped process X^τ = {X(t ∧ τ)}_{t∈[0,∞)} is a submartingale.
Proof. This follows from the continuous optional stopping theorem (Theorem 4.8) and the discrete optional sampling theorem (Theorem 3.7).
Since any uncountable time index set can be approximated from below by finite subsets, it is not very difficult to extend Doob's inequalities to the continuous time setting provided that the process is right continuous. We exemplify this in the case of Doob's L^p-inequality and leave the maximal inequalities as an exercise.
Theorem 4.10 (Doob's L^p Inequality (Continuous)). Let X = {X(t)}_{t∈[0,∞)} be a right continuous martingale or an a.s. nonnegative and right continuous submartingale. If for some p ∈ (1, ∞) and T > 0 we have E[|X(t)|^p] < ∞ for all t ∈ [0, T], then

E[sup_{t∈[0,T]} |X(t)|^p] < ∞

and

E[sup_{t∈[0,T]} |X(t)|^p]^{1/p} ≤ (p/(p−1)) E[|X(T)|^p]^{1/p}.
Proof. Since X is right continuous, we have

sup_{t∈[0,T]} |X(t)| = sup_{q∈([0,T)∩Q)∪{T}} |X(q)| = sup_{F⊂[0,T)∩Q finite} max_{q∈F∪{T}} |X(q)|.
By monotone convergence we therefore have

E[sup_{t∈[0,T]} |X(t)|^p]^{1/p} = sup_{F⊂[0,T)∩Q finite} E[max_{t∈F∪{T}} |X(t)|^p]^{1/p}.
Now Theorem 3.10 (Doob's L^p-Inequality in Discrete Time) shows that

E[max_{t∈F∪{T}} |X(t)|^p]^{1/p} ≤ (p/(p−1)) E[|X(T)|^p]^{1/p}

for any finite subset F of [0, T) ∩ Q. Together with the previous identity, this concludes the proof.
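As a quick numerical sanity check (not a proof), one can compare both sides of Doob's L^p inequality for a simple symmetric random walk, which is a martingale and, viewed as a piecewise constant process, right continuous. The sketch below uses only the standard library and our own hypothetical parameter choices; it estimates both norms by Monte Carlo for p = 2.

```python
import random

def doob_lp_check(n_paths=2000, n_steps=100, p=2, seed=1):
    """Monte Carlo comparison of the two sides of Doob's Lp inequality
    for a simple symmetric random walk S: estimates E[sup_t |S(t)|^p]^(1/p)
    and (p/(p-1)) * E[|S(T)|^p]^(1/p) from n_paths sampled paths."""
    random.seed(seed)
    max_p, end_p = 0.0, 0.0
    for _ in range(n_paths):
        s, running_max = 0, 0
        for _ in range(n_steps):
            s += random.choice((-1, 1))
            running_max = max(running_max, abs(s))
        max_p += running_max ** p
        end_p += abs(s) ** p
    lhs = (max_p / n_paths) ** (1 / p)                 # left-hand side estimate
    rhs = p / (p - 1) * (end_p / n_paths) ** (1 / p)   # right-hand side estimate
    return lhs, rhs

lhs, rhs = doob_lp_check()
print(lhs <= rhs)  # the inequality should hold up to Monte Carlo error
```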
Exercise 43 (Doob's Maximal Inequalities (Continuous)). For T > 0, let X = {X(t)}_{t∈[0,T]} be a right continuous stochastic process and fix α > 0. Show that if X is a submartingale, then

α P[sup_{t∈[0,T]} X(t) > α] ≤ E[1_{sup_{t∈[0,T]} X(t)>α} X(T)] ≤ E[X(T)^+],

and if X is a supermartingale, then

α P[sup_{t∈[0,T]} X(t) > α] ≤ E[X(0)] + E[X(T)^−].
4.4 Càdlàg Modifications
Taking a close look at the continuous time results for submartingales, we see that we usually require right continuity of paths. So quite naturally the question arises when and if a submartingale is right continuous, or, more precisely, in which cases a submartingale has a right continuous modification which is again a submartingale. As we shall see in this section, this is quite often the case.
We first recall that Lemma 4.4 (Submartingale Regularity) implies that every right continuous submartingale has a.s. càdlàg paths. Hence, whenever a submartingale admits a right continuous modification which is again a submartingale, this immediately implies that the modification can be chosen to be càdlàg (if the filtration is complete). The following main theorem of this section on the existence of càdlàg modifications can therefore also be thought of as a result on the existence of right continuous modifications.
Theorem 4.11 (Càdlàg Modifications of Submartingales). Suppose that X = {X(t)}_{t∈[0,∞)} is a submartingale and that the filtration F = {F(t)}_{t∈[0,∞)} is right continuous and complete. Assume moreover that the mapping

t ↦ E[X(t)], t ∈ [0, ∞),

is right continuous. Then there exists a modification Y = {Y(t)}_{t∈[0,∞)} of X such that Y is a càdlàg submartingale (with respect to F).
Proof. Step 1: Construction of Y. For each k ∈ N, the process {X(q)}_{q∈[0,k]∩Q} satisfies the assumptions of Lemma 4.3 (Submartingale Limits) since

−∞ < E[X(0)] ≤ E[X(q)] ≤ E[X(q)^+] ≤ E[X(k)^+] < ∞ for all q ∈ [0, k] ∩ Q.
Hence there exists N_k ∈ A with P[N_k] = 0 such that, for all ω ∈ N_k^c,

lim_{q↑t, q∈[0,k]∩Q} X(q, ω) and lim_{q↓t, q∈[0,k]∩Q} X(q, ω) exist in R for all t ∈ [0, k].

But then, setting N ≔ ⋃_{k∈N} N_k, we have P[N] = 0 and, for all ω ∈ N^c,

lim_{q↑t, q∈[0,∞)∩Q} X(q, ω) and lim_{q↓t, q∈[0,∞)∩Q} X(q, ω) exist in R for all t ∈ [0, ∞).
Now define a stochastic process Y = {Y(t)}_{t∈[0,∞)} by

Y(t, ω) ≔ lim_{q↓t, q∈[0,∞)∩Q} X(q, ω) for ω ∈ N^c and Y(t, ω) ≔ 0 for ω ∈ N, for all t ∈ [0, ∞).
Step 2: Properties of Y. It is clear that Y is right continuous and that

lim_{s↑t} Y(s, ω) = lim_{q↑t, q∈[0,∞)∩Q} X(q, ω) exists for all ω ∈ N^c,
which is to say that Y is even càdlàg. Next, Y is adapted since F is right continuous and complete. Indeed, if t ∈ [0, ∞), we have N^c ∈ F(t) by completeness and

Y(t) = 1_{N^c} lim_{q↓t, q∈[t,s]∩Q} X(q) is F(s)-measurable for all s ∈ (t, ∞).
But then Y(t) is F(t) = F(t+) = ⋂_{s>t} F(s)-measurable. By decreasing submartingale convergence we have, for all t ∈ [0, ∞),

X(q) → Y(t) in expectation as q ↓ t through [0, ∞) ∩ Q.
In particular, we have E[|Y(t)|] < ∞ for all t ∈ [0, ∞). Moreover, since for every s, t ∈ [0, ∞) and p, q ∈ [0, ∞) ∩ Q with s < p < t < q we have X(p) ≤ E[X(q)|F(p)] a.s., we have

X(p) ≤ E[Y(t) | F(p)] a.s.
since convergence of X(q) → Y(t) in expectation implies E[X(q)|F(p)] → E[Y(t)|F(p)] in expectation and hence almost sure convergence of a subsequence. Now {E[Y(t)|F(p)]}_{p∈(s,t]∩Q} is a martingale, and hence decreasing martingale convergence and X(p) → Y(s) a.s. show that

Y(s) ≤ E[Y(t) | F(s)] a.s. for all s, t ∈ [0, ∞) with s < t.
Thus Y is a càdlàg submartingale, and it remains to verify that Y is a modification of X.
Step 3: Y is a modification of X. Since X(t) ≤ E[X(q)|F(t)] a.s. for all q ∈ Q with q > t, we obtain from X(q) → Y(t) in expectation as q ↓ t and the fact that Y is adapted that

X(t) ≤ E[Y(t) | F(t)] = Y(t) a.s. for all t ∈ [0, ∞).
On the other hand, we have

E[|X(t) − Y(t)|] = E[Y(t) − X(t)] = lim_{q↓t, q∈[0,∞)∩Q} E[X(q)] − E[X(t)] = 0
by the right continuity of the map t ↦ E[X(t)]. But then X(t) = Y(t) a.s. for all t ∈ [0, ∞), i.e. Y is a modification of X and the proof is complete.
Since every martingale is constant in expectation, and hence the mapping t ↦ E[X(t)], t ∈ [0, ∞), is in particular right continuous, it follows that every martingale with respect to a right continuous and complete filtration has a càdlàg modification.
Corollary 4.12 (Càdlàg Modifications of Martingales). Suppose that X = {X(t)}_{t∈[0,∞)} is a martingale with respect to a right continuous and complete filtration F = {F(t)}_{t∈[0,∞)}. Then there exists a modification Y = {Y(t)}_{t∈[0,∞)} of X such that Y is a càdlàg martingale (with respect to F).
Chapter
5 LÉVY PROCESSES
We now turn to another important class of stochastic processes: Lévy processes. Lévy processes can be thought of as continuous time analogues of the random walk, as they have independent and stationary increments. These two properties make Lévy processes very important for applications, as they imply that models based on Lévy processes are typically very tractable.
We have thus far encountered two Lévy processes in disguise: Brownian motion and the renewal process constructed from exponential random variables (which is usually rather referred to as the Poisson process). As we shall see, these two processes are the fundamental building blocks of Lévy processes in the following sense. First, if one replaces the fixed jump size of the Poisson process by jump sizes given by a sequence of independent and identically distributed random variables, one obtains the so-called compound Poisson process. It is then possible to show that every Lévy process is the limit (in a suitable sense) of the superposition of a Brownian motion with drift and independent compound Poisson processes. This is the so-called Lévy-Ito Decomposition. As a corollary, we obtain an explicit representation of the characteristic function of the distribution of any Lévy process which depends on only three parameters, the characteristic triplet of the Lévy process. This result is referred to as the Lévy-Khintchine formula. Once we have established this formula, we then turn the tables around and show that, for any possible choice of the characteristic triplet, there exists a Lévy process which is characterized by it.
We begin our tour through the theory of Lévy processes with their definition, and then take a closer look at the Poisson and the compound Poisson process. Throughout this chapter, we fix a filtration F = {F(t)}_{t∈[0,∞)} which is both right continuous and complete.
5.1 Definition and Examples
Definition 5.1 (Lévy Process). An F-adapted process X = {X(t)}_{t∈[0,∞)} taking values in R^d is called a Lévy process (with respect to F) if

(L1) X(0) = 0,

(L2) X(t) − X(s) is independent of F(s) for all s, t ∈ [0, ∞) with s < t,

(L3) X has stationary increments, i.e. X(t_2 + s) − X(t_1 + s) has the same distribution as X(t_2) − X(t_1) for all s, t_1, t_2 ∈ [0, ∞) with t_1 < t_2,

(L4) X is càdlàg.
From Definition 5.1 it is immediately clear that Brownian motion is a continuous Lévy process, and hence it seems apparent that there exists at least one Lévy process. However, since we have constructed Brownian motion with respect to its natural filtration F^W, which is neither complete nor right continuous, we have to be careful.
We resolve this matter by showing that if X is a Lévy process with respect to an arbitrary filtration G = {G(t)}_{t∈[0,∞)}, then X is also a Lévy process with respect to the smallest right continuous and complete filtration F = {F(t)}_{t∈[0,∞)} which contains G. More precisely, let us denote by N the system of all P-nullsets given by

N ≔ {N ∈ A : P[N] = 0}.

Then we define the filtration F by

F(t) ≔ ⋂_{s>t} σ(G(s) ∪ N) for all t ∈ [0, ∞).
It is easy to see that F is indeed the smallest filtration containing G which is right continuous and complete.
Exercise 44. Show that F is the smallest right continuous and complete filtration containing G.
We then have the following result, which implies the existence of a Brownian motion with respect to a right continuous and complete filtration and hence the existence of at least one Lévy process.
Proposition 5.2 (Lévy Processes with Right Continuous and Complete Filtrations). Let X be a Lévy process with respect to an arbitrary filtration G. Then X is also a Lévy process with respect to F, the smallest right continuous and complete filtration containing G.
Proof. Clearly, X is F-adapted since G(t) ⊂ F(t) for all t ∈ [0, ∞), and thus we only have to show that X(t) − X(s) is independent of F(s) for all s, t ∈ [0, ∞) with s < t. For this, let N ∈ N be a nullset and A ∈ G(r) for all r > s, implying that A ∪ N ∈ F(s). Let now B ∈ B(R^d) and choose a sequence {r_n}_{n∈N} with s < r_n < t for all n ∈ N and r_n ↓ s. Since N is a nullset and X has independent increments with respect to G, it follows that
P[X(t) − X(r_n) ∈ B, A ∪ N] = P[X(t) − X(r_n) ∈ B, A] = P[X(t) − X(r_n) ∈ B] P[A] = P[X(t) − X(r_n) ∈ B] P[A ∪ N].
Thus, for each n ∈ N, X(t) − X(r_n) is independent of F(s). This is equivalent to saying that

E[1_F f(X(t) − X(r_n))] = P[F] E[f(X(t) − X(r_n))] for all n ∈ N

for any choice of F ∈ F(s) and any bounded and continuous f : R^d → R. Sending n → ∞ and using the continuity of f, the right continuity of X, and dominated convergence hence yields
E[1_F f(X(t) − X(s))] = P[F] E[f(X(t) − X(s))].
Therefore, X(t)−X(s) is independent of F(s) and the proof is complete.
Another consequence of the assumption that our filtration is complete is the following: If τ is an F-stopping time and if σ : Ω → [0, ∞] is a random variable with σ = τ a.s., then the completeness of F implies that σ is also a stopping time with respect to F.
Exercise 45. Let τ be an F-stopping time and σ : Ω → [0, ∞] be a random variable with σ = τ almost surely. Show that σ is an F-stopping time.
Let us now proceed with more examples of Lévy processes. One such example is the Poisson process.
Definition 5.3 (Poisson Process). An F-adapted process N = {N(t)}_{t∈[0,∞)} with values in N_0 is called a Poisson process with intensity λ ≥ 0 (with respect to F) if, for all s, t ∈ [0, ∞) with s < t,
(N1) N(0) = 0,
(N2) N(t)−N(s) is independent of F(s),
(N3) N(t)−N(s) is Poisson distributed with parameter λ(t− s),
(N4) N is nondecreasing and right continuous, i.e. a counting process.
Since monotonicity and right continuity imply that the Poisson process has existing left limits, we see that it is càdlàg and hence a Lévy process. In the case λ = 0, the Poisson distribution degenerates to a Dirac distribution at zero, in which case N(t) = 0 for all t ∈ [0, ∞) almost surely.
With all the tools we have available by now, we have two straightforward ways to construct a Poisson process. The first approach consists of computing the finite-dimensional distributions of the Poisson process and then using Theorem 2.9 (Kolmogorov's Consistency Theorem) to construct a raw Poisson process satisfying (N1) to (N3). Since the raw Poisson process is a.s. nondecreasing, it follows immediately that it is a submartingale. But then, since by (N1) and (N3) the mapping

t ↦ E[N(t)] = E[N(t) − N(0)] = λt

is (right) continuous, Theorem 4.11 (Càdlàg Modifications of Submartingales) implies that it has a càdlàg modification. This càdlàg modification is then a Poisson process with respect to its natural filtration, and we conclude by Proposition 5.2.
The second construction is based on showing that the Poisson process N has the same finite-dimensional distributions as a renewal process X constructed from exponential random variables, and hence their distributions coincide. This immediately implies that the renewal process satisfies the properties (N2) to (N4) (with respect to its natural filtration). In general, however, the renewal process only satisfies X(0) = 0 a.s. and X(t) < ∞ a.s., so one has to redefine X on a nullset to obtain the Poisson process. This argument shows in particular that the renewal process constructed from exponential random variables and the Poisson process are indistinguishable. We leave the rigorous proof of these arguments as an exercise.
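The renewal construction is easy to simulate. The following sketch (our own helper names, using nothing beyond the Python standard library) samples the jump times of a renewal process with Exp(λ) waiting times, which by the discussion above is a version of the Poisson process with intensity λ.

```python
import random

def poisson_path(lam, horizon, seed=0):
    """Sample one path of a renewal process with Exp(lam) waiting times,
    i.e. (a version of) the Poisson process with intensity lam. Returns
    the jump times T_1 < T_2 < ... up to the horizon; the path itself is
    N(t) = #{n : T_n <= t}."""
    rng = random.Random(seed)
    jump_times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)  # Exp(lam) inter-arrival time
        if t > horizon:
            return jump_times
        jump_times.append(t)

def N(t, jump_times):
    """Evaluate the counting process at time t."""
    return sum(1 for s in jump_times if s <= t)

jumps = poisson_path(lam=2.0, horizon=10.0)
print(N(10.0, jumps))  # number of jumps on [0, 10]; Poisson(20) distributed by (N3)
```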
Exercise 46. Show that the Poisson process with intensity λ > 0 and the renewal process X constructed from exponential random variables with parameter λ have the same finite-dimensional distributions. Moreover, show that the events {X(0) ≠ 0} and {X(t) = ∞ for some t ∈ [0, ∞)} are A-measurable P-nullsets. Use this to construct a Poisson process N with intensity λ from X and show that N and X are indistinguishable.
Since the Poisson process is indistinguishable from a renewal process, it follows that the jumps of the Poisson process are a.s. equal to one. If we allow for more general jumps, we obtain the compound Poisson process.
Definition 5.4 (Compound Poisson Process). Let N be a Poisson process with intensity λ ≥ 0 and let {Z_n}_{n∈N} be a sequence of independent and identically distributed random variables taking values in R^d. Assume moreover that N and {Z_n}_{n∈N} are independent. The process X = {X(t)}_{t∈[0,∞)} given by

X(t) ≔ ∑_{n=1}^{N(t)} Z_n for all t ∈ [0, ∞)

is called a compound Poisson process.
[Figure 5.1: Path of a compound Poisson process X with standard normally distributed jump sizes and path of the underlying Poisson process N.]
We refer to the distribution of Z_1 as the jump size distribution of the compound Poisson process. Since the sum of independent and identically distributed random variables is a random walk (if d = 1), it follows that the compound Poisson process is a random walk in which the time index t is replaced by the value N(t) of the Poisson process at time t. More precisely, let Y = {Y(t)}_{t∈N_0} be the random walk associated with {Z_n}_{n∈N} given by

Y(t) ≔ ∑_{n=1}^{t} Z_n for all t ∈ N_0.

It follows that

Y(N(t)) = ∑_{n=1}^{N(t)} Z_n = X(t) for all t ∈ [0, ∞).
The principle of concatenating two stochastic processes N and Y to obtain a new stochastic process X ≔ Y ∘ N is called (random) time change or subordination. To make sure the concatenation is well-defined, one has to require that the process N, the so-called subordinator, takes values in the time index set of Y. Moreover, one typically requires that the subordinator N is nondecreasing so as to ensure that the order on the time index set remains unaltered. In this situation, the subordinator N can be thought of as a (random) rescaling of the time index set of the process Y.
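Putting the last two constructions together, a compound Poisson path can be sampled by first drawing the jump times of N and then attaching i.i.d. jump sizes, exactly the subordination X = Y ∘ N described above. The sketch below uses only the standard library; the helper names are ours.

```python
import random

def compound_poisson(lam, horizon, jump_sampler, seed=0):
    """Sample a path of a compound Poisson process X(t) = sum_{n <= N(t)} Z_n:
    first the jump times of the underlying Poisson process N (via Exp(lam)
    waiting times), then i.i.d. jump sizes Z_n drawn by jump_sampler.
    Returns a list of (jump_time, value_of_X_after_jump) pairs."""
    rng = random.Random(seed)
    path, t, x = [], 0.0, 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return path
        x += jump_sampler(rng)  # Z_n, i.i.d. and independent of the jump times
        path.append((t, x))

# Standard normal jump sizes, as in Figure 5.1:
path = compound_poisson(lam=5.0, horizon=1.0,
                        jump_sampler=lambda rng: rng.gauss(0.0, 1.0))
```

Between jumps the path is constant, so the returned pairs determine the whole càdlàg path.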
The compound Poisson process inherits several properties from its underlying Poisson process: X(0) = 0, it is càdlàg, and it has independent increments with respect to its natural filtration F^X. On the other hand, the increments need no longer be Poisson distributed (since X may take values in all of R^d instead of just N_0) and, unless the jump size distribution is nonnegative, X is no longer nondecreasing. In particular, it is in general no longer a counting process. Nevertheless, since the jumps {Z_n}_{n∈N} are independent and identically distributed, the compound Poisson process has stationary increments and is thus a Lévy process.
From the definition of the compound Poisson process it is easy to see that not every Lévy process X is integrable, i.e. it does in general not hold that

E[|X(t)|] < ∞ for all t ∈ [0, ∞).
Indeed, consider a compound Poisson process X with a jump size distribution which is not integrable. For example, one can choose Z_1 to be Cauchy distributed, i.e. the density f : R → [0, ∞) of Z_1 is given by

f(x) ≔ (1/π) · α/(α^2 + (x − β)^2) for all x ∈ R,
where α > 0 and β ∈ R. With this, it is easy to see that E[|Z_1|] = ∞. Fixing t ∈ (0, ∞), it follows from the independence of {Z_n}_{n∈N} and the underlying Poisson process N that

E[|X(t)|] ≥ E[|X(t)| 1_{N(t)=1}] = E[|Z_1| 1_{N(t)=1}] = P[N(t) = 1] E[|Z_1|] = ∞.
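The failure of integrability is visible numerically: for integrable i.i.d. variables the sample means settle down by the strong law, but for Cauchy draws they do not. The sketch below (inverse-transform sampling via the Cauchy quantile function Z = β + α tan(π(U − 1/2)); helper names are ours) records running means at a few checkpoints.

```python
import math
import random

def cauchy_running_means(alpha=1.0, beta=0.0, n=100000, seed=3):
    """Running sample means of i.i.d. Cauchy(alpha, beta) draws, generated
    by inverse transform: Z = beta + alpha * tan(pi * (U - 1/2)) with U
    uniform on [0, 1). Since E[|Z_1|] = infinity, the running mean has no
    limit to settle to."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += beta + alpha * math.tan(math.pi * (rng.random() - 0.5))
        if k % (n // 5) == 0:
            means.append(total / k)
    return means

print(cauchy_running_means())  # the checkpoints typically keep wandering
```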
We now have several examples of Lévy processes: Brownian motion, the Poisson process, and the compound Poisson process. From these, we can construct more Lévy processes by taking linear combinations.
Exercise 47. For n ∈ N, let X_1, . . . , X_n be Lévy processes and z_1, . . . , z_n ∈ R. Define X = {X(t)}_{t∈[0,∞)} by

X(t) ≔ ∑_{i=1}^{n} z_i X_i(t) for all t ∈ [0, ∞).

Show that X is a Lévy process if X_1, . . . , X_n are independent.
5.2 Properties of Lévy Processes
We proceed to study some properties of Lévy processes. More precisely, we shall see in this section that every Lévy process is automatically continuous in probability, has almost surely no jump at any fixed time t ∈ (0, ∞), and that the expectation and covariance matrix of a Lévy process, as functions of time, are linear (provided, of course, that they exist). Finally, we show that the distribution of any Lévy process is uniquely determined by the distribution of the process at time t = 1.
Definition 5.5 (Continuity in Probability). A process X = {X(t)}_{t∈[0,∞)} taking values in a metric space (S, d) is called continuous in probability if

lim_{s→t} P[d(X(s), X(t)) ≥ ε] = 0   for all ε > 0 and all t ∈ [0,∞).
For R^d-valued processes with stationary increments, we know that X(t) − X(s) has the same distribution as X(t − s) if t ≥ s. Hence, continuity in probability of a process with stationary increments is equivalent to the condition

lim_{t↓0} P[|X(t) − X(0)| ≥ ε] = 0   for all ε > 0.
Lemma 5.6 (Continuity in Probability for Lévy Processes). Every Lévy process X is continuous in probability.

Proof. Since X is right continuous with X(0) = 0, it follows that |X(t, ω)| → 0 as t ↓ 0 for all ω ∈ Ω, and hence in particular in probability. Since X has stationary increments, this implies the result.
Continuity in probability implies in particular that if we fix some t ∈ (0,∞), then X has a.s. no jump at time t. Subsequently, we write

X(0−) := X(0)   and   X(t−) := lim_{s↑t} X(s)   for all t ∈ (0,∞)

whenever X = {X(t)}_{t∈[0,∞)} is a stochastic process with existing left limits.
Corollary 5.7 (Non-Deterministic Jump Times of Lévy Processes). Let X be a Lévy process. Then

P[X(t) = X(t−)] = 1   for all t ∈ [0,∞).
Proof. This follows immediately from the subadditivity of P and the continuity in probability of X, since for all t ∈ (0,∞)

P[X(t) ≠ X(t−)] = P[lim_{s↑t} |X(s) − X(t)| > 0]
= P[⋃_{n∈N} {lim_{s↑t} |X(s) − X(t)| > 1/n}]
≤ Σ_{n=1}^∞ P[lim_{s↑t} |X(s) − X(t)| > 1/n]
= Σ_{n=1}^∞ lim_{s↑t} P[|X(s) − X(t)| > 1/n] = 0.
Another simple observation is that the mappings

t ↦ E[X(t)]   and   t ↦ Cov[X(t)]

must be linear (if they exist) for any Lévy process X.
Lemma 5.8 (Linearity of Expectation and Covariance of Lévy Processes). Let X be a Lévy process with E[|X(t)|] < ∞ for all t ∈ [0,∞). Then

E[X(t)] = μt   for all t ∈ [0,∞),

where μ := E[X(1)]. If, moreover, E[|X(t)|^2] < ∞ for all t ∈ [0,∞), then

Cov[X(t)] = Σt   for all t ∈ [0,∞)

with Σ := Cov[X(1)] ≥ 0.
Proof. Define mappings f : [0,∞) → R^d and g : [0,∞) → R^{d×d} by

f(t) := E[X(t)]   and   g(t) := Cov[X(t)]   for all t ∈ [0,∞).

Now fix s, t ∈ [0,∞). The stationary increments property of X yields

f(t + s) = E[X(t + s)] = E[X(t + s) − X(s)] + E[X(s)] = E[X(t)] + E[X(s)] = f(t) + f(s),

hence f is additive. Since f(0) = E[X(0)] = 0 and f is measurable (so that additivity forces linearity), this implies that f(t) = μt for μ := E[X(1)]. Similarly, using the independent and stationary increments property of X, it follows that

g(t + s) = Cov[X(t + s)] = Cov[X(t + s) − X(s)] + Cov[X(s)] = Cov[X(t)] + Cov[X(s)] = g(t) + g(s).

Hence g is additive as well. Since g(0) = Cov[X(0)] = 0, g must take the form g(t) = Σt where Σ := Cov[X(1)].
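Lemma 5.8 is easy to check by simulation. The sketch below (illustrative only; the rate λ = 2 and N(1,1)-distributed jump sizes are hypothetical choices, not part of the notes) estimates E[X(t)] for a compound Poisson process by Monte Carlo and compares it with μt, where μ = E[X(1)] = λ E[Z_1] = 2.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, jump_mean, n_paths = 2.0, 1.0, 200_000

def sample_xt(t):
    # Conditionally on N(t) = n, a sum of n i.i.d. N(jump_mean, 1) jumps
    # has distribution N(n * jump_mean, n); this allows vectorized sampling.
    n = rng.poisson(lam * t, size=n_paths)
    return n * jump_mean + np.sqrt(n) * rng.standard_normal(n_paths)

mu = lam * jump_mean  # mu = E[X(1)] for this compound Poisson process
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: E[X(t)] ~ {sample_xt(t).mean():.3f}  vs  mu*t = {mu * t:.3f}")
```

The empirical means track μt closely for every t, in line with the linearity statement of the lemma.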
Lemma 5.8 shows that the expectations and covariance matrices of a Lévy process X are determined by the expectation and covariance matrix of X(1). As it turns out, a much more general result is true: the entire distribution of X is determined by the distribution of X(1).
Proposition 5.9 (Infinite Divisibility and One-Dimensional Distributions). Let X be a Lévy process and, for each t ∈ [0,∞), denote by ϕ_{X(t)} : R^d → C the characteristic function of X(t). Then the following holds:
(i) For all t ∈ [0,∞), we have

ϕ_{X(t)}(z) = ϕ_{X(1)}(z)^t   for all z ∈ R^d.

(ii) The distribution of X is uniquely determined by the distribution of X(1).
Proof. Step 1: Proof of (i). The result is clear for t = 0. Fix t ∈ (0,∞) and z ∈ R^d and observe that the independent and stationary increments property of X implies that

ϕ_{X(t)}(z) = E[e^{i⟨z, X(t)⟩}] = E[e^{i⟨z, Σ_{k=1}^n [X(kt/n) − X((k−1)t/n)]⟩}]
= Π_{k=1}^n E[e^{i⟨z, X(kt/n) − X((k−1)t/n)⟩}]
= E[e^{i⟨z, X(t/n)⟩}]^n = ϕ_{X(t/n)}(z)^n   (5.1)

for all n ∈ N. Thus, if t is a natural number, the result follows by choosing n = t. If t is a rational number of the form t = p/q for p, q ∈ N, first choose n = p so that

ϕ_{X(t)}(z) = ϕ_{X(p/q)}(z) = ϕ_{X(1/q)}(z)^p.

Now applying Equation (5.1) with t = 1 and n = q shows that

ϕ_{X(t)}(z) = ϕ_{X(1/q)}(z)^p = ϕ_{X(1)}(z)^{p/q} = ϕ_{X(1)}(z)^t.

Finally, if t is irrational, there exists a sequence {r_k}_{k∈N} ⊂ (t,∞) ∩ Q with r_k ↓ t. Since X is right continuous, it follows that X(r_k) → X(t) pointwise, hence in particular also in distribution. But this implies that

ϕ_{X(t)}(z) = lim_{k→∞} ϕ_{X(r_k)}(z) = lim_{k→∞} ϕ_{X(1)}(z)^{r_k} = ϕ_{X(1)}(z)^t.
Step 2: Proof of (ii). Let n ∈ N and fix {t_0, t_1, ..., t_n} ⊂ [0,∞) with t_0 < t_1 < ... < t_n and z = (z_0, z_1, ..., z_n) ∈ R^{d×(n+1)}. Writing Y = (X(t_0), ..., X(t_n)) and z̄_k := Σ_{j=k}^n z_j for all k = 0, 1, ..., n, and setting t_{−1} := 0 (so that X(t_{−1}) = X(0) = 0), it follows from independence and stationarity of increments that

ϕ_Y(z) = E[e^{i Σ_{k=0}^n ⟨z_k, X(t_k)⟩}] = E[e^{i Σ_{k=0}^n ⟨z̄_k, X(t_k) − X(t_{k−1})⟩}]
= Π_{k=0}^n E[e^{i⟨z̄_k, X(t_k) − X(t_{k−1})⟩}]
= Π_{k=0}^n ϕ_{X(t_k − t_{k−1})}(z̄_k) = Π_{k=0}^n ϕ_{X(1)}(z̄_k)^{t_k − t_{k−1}}.

Hence the finite-dimensional distributions of X are determined by X(1).
The property (i) in the previous proposition is often referred to as infinite divisibility of the distribution of the Lévy process. More generally, we say that the distribution of an R^d-valued random variable Y is infinitely divisible if for each n ∈ N there exists a random variable Y_n such that

ϕ_Y(z) = ϕ_{Y_n}(z)^n   for all z ∈ R^d.
Since Brownian motion and the Poisson process are Lévy processes, it follows immediately that the normal distribution and the Poisson distribution are infinitely divisible.
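For the Poisson process, the identity in Proposition 5.9 (i) can be checked in closed form: ϕ_{X(t)}(z) = exp(λt(e^{iz} − 1)) = ϕ_{X(1)}(z)^t. The snippet below (illustrative only; λ = 1 is our choice, and since λ < π the principal complex power agrees with the intended branch) verifies this numerically.

```python
import cmath

def phi_poisson(z, lam, t):
    """Characteristic function of a Poisson(lam * t) random variable."""
    return cmath.exp(lam * t * (cmath.exp(1j * z) - 1.0))

lam = 1.0  # with lam < pi the principal power below picks the right branch
for t in (0.5, 1.3, 2.0):
    for z in (-2.0, -0.5, 0.7, 3.0):
        lhs = phi_poisson(z, lam, t)
        rhs = phi_poisson(z, lam, 1.0) ** t  # phi_{X(1)}(z)^t
        assert abs(lhs - rhs) < 1e-12  # infinite divisibility, Prop. 5.9 (i)
```

The branch caveat matters: for a general complex number w, the Python power w ** t uses the principal logarithm, which coincides with λ(e^{iz} − 1) here only because |λ sin z| < π.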
5.3 The Strong Markov Property
We now turn to the so-called strong Markov property of Lévy processes. Roughly speaking, it states that if we restart a Lévy process at any finite stopping time, this restarted process is again a Lévy process with the same distribution as the original process and independent of the information available up to the stopping time. We prove this result in two steps: first, we consider the case when the stopping time is deterministic (in which case we simply speak of the Markov property), then we extend the result to general finite stopping times.
Theorem 5.10 (Markov Property of Lévy Processes). Let X be a Lévy process and T ∈ [0,∞). Define a stochastic process Y = {Y(t)}_{t∈[0,∞)} by

Y(t) := X(T + t) − X(T)   for all t ∈ [0,∞).

Then Y is a Lévy process with respect to F_T = F(T + ·) = {F(T + t)}_{t∈[0,∞)}, Y and X are identically distributed, and Y is independent of F(T).
Proof. Step 1: To see that Y is a Lévy process, we first observe that Y(0) = 0, Y is càdlàg since X is càdlàg, and Y is adapted to F_T. Therefore, it suffices to show that Y satisfies (L2) and (L3). Regarding (L3), let s, t ∈ [0,∞) with s < t. Using the stationary increments property of X, we see that

Y(t) − Y(s) = X(T + t) − X(T + s)   and   Y(t − s) = X(T + t − s) − X(T)

both have the same distribution as X(t − s), and hence Y satisfies (L3). Regarding (L2), fix s, t ∈ [0,∞) with s < t. Then Y(t) − Y(s) = X(T + t) − X(T + s) is independent of F_T(s) = F(T + s). To see that Y and X are identically distributed, it suffices by Proposition 5.9 (Infinite Divisibility and One-Dimensional Distributions) to show that Y(1) and X(1) are identically distributed. But stationarity of the increments of X and Y(1) = X(T + 1) − X(T) immediately imply this statement.
Step 2: Y is independent of F(T). Let A ∈ F(T), set Y_A := 1_A, and fix t_1, t_2 ∈ [0,∞) with t_1 < t_2. Since Y(t_1) = X(T + t_1) − X(T) is independent of F(T) by the independent increments property of X, we have

ϕ_{(Y_A, Y(t_1))}(z_A, z_1) = ϕ_{Y_A}(z_A) ϕ_{Y(t_1)}(z_1)   for all z_A ∈ R, z_1 ∈ R^d.

Moreover, since Y(t_2) − Y(t_1) is independent of F_T(t_1) = F(T + t_1) and (Y_A, Y(t_1)) is F(T + t_1)-measurable, it follows for all z_A ∈ R and z_1, z_2 ∈ R^d that

ϕ_{(Y_A, Y(t_1), Y(t_2))}(z_A, z_1, z_2) = ϕ_{(Y_A, Y(t_1), Y(t_2)−Y(t_1))}(z_A, z_1 + z_2, z_2)
= ϕ_{(Y_A, Y(t_1))}(z_A, z_1 + z_2) ϕ_{Y(t_2)−Y(t_1)}(z_2)
= ϕ_{Y_A}(z_A) ϕ_{Y(t_1)}(z_1 + z_2) ϕ_{Y(t_2)−Y(t_1)}(z_2)
= ϕ_{Y_A}(z_A) ϕ_{(Y(t_1), Y(t_2)−Y(t_1))}(z_1 + z_2, z_2)
= ϕ_{Y_A}(z_A) ϕ_{(Y(t_1), Y(t_2))}(z_1, z_2),

so (Y(t_1), Y(t_2)) is independent of A, hence of F(T). Iteratively, this shows that (Y(t_1), ..., Y(t_n)) is independent of F(T) for all {t_1, ..., t_n} ⊂ [0,∞), n ∈ N, which establishes the claim.
For the special case of X being a Brownian motion, Theorem 5.10 is (almost) old news: this was already proved in part in Exercise 18 based on the characterization of Brownian motion as a Gaussian process. Theorem 5.10 therefore provides an alternative proof based on independence and stationarity of increments.
Before proceeding, let us remark that for general processes X which are not necessarily Lévy, the Markov property is typically formulated in a different way. Namely, one says that X satisfies the Markov property if

E[f(X(t)) | F(s)] = E[f(X(t)) | σ(X(s))]   a.s. for all s, t ∈ [0,∞) with s ≤ t   (5.2)

for all bounded and measurable functions f : S → R. It is, however, easy to see that Theorem 5.10 implies this statement. Indeed, letting s, t ∈ [0,∞) with s ≤ t and f : R^d → R bounded and measurable, Theorem 5.10 with T := s shows that

E[f(X(t)) | F(s)] = E[f(X(s) + X(t) − X(s)) | F(s)] = g(X(s))
= E[f(X(s) + X(t) − X(s)) | σ(X(s))] = E[f(X(t)) | σ(X(s))]   a.s.,

where the function g : R^d → R is defined as

g(x) := E[f(x + X(t) − X(s))]   for all x ∈ R^d.
In general, however, the statement of Theorem 5.10 is stronger than the statement in Equation (5.2).

The next step is to extend the Markov property to arbitrary finite stopping times. The proof is based on the idea of approximating the stopping time by stopping times taking only countably many values.
Theorem 5.11 (Strong Markov Property of Lévy Processes). Let X be a Lévy process and τ be a [0,∞)-valued stopping time. Define Y = {Y(t)}_{t∈[0,∞)} by

Y(t) := X(τ + t) − X(τ)   for all t ∈ [0,∞).

Then Y is a Lévy process with respect to F_τ = F(τ + ·) = {F(τ + t)}_{t∈[0,∞)}, Y and X are identically distributed, and Y is independent of F(τ).
Proof. Step 1: Approximation of τ. Given n ∈ N, we define

τ_n := Σ_{m=1}^∞ m 2^{−n} 1_{τ ∈ [(m−1)2^{−n}, m2^{−n})}.

Similarly to Proposition 1.25 (Approximation of Stopping Times), the sequence {τ_n}_{n∈N} satisfies τ < τ_n for all n ∈ N, τ_n ↓ τ, and τ_n(Ω) is countable for all n ∈ N. Now define Y_n = {Y_n(t)}_{t∈[0,∞)} by

Y_n(t) := X(τ_n + t) − X(τ_n)   for all t ∈ [0,∞).

It follows immediately that Y_n can be represented as

Y_n(t) = Σ_{m=1}^∞ Y_n^m(t) 1_{τ_n = m2^{−n}}   for all t ∈ [0,∞),
where

Y_n^m(t) := X(m2^{−n} + t) − X(m2^{−n})   for all t ∈ [0,∞).

By Theorem 5.10 (Markov Property of Lévy Processes), each Y_n^m is a Lévy process with respect to F_{m2^{−n}}, independent of F(m2^{−n}), and identical in distribution with X.
Step 2: We show that Y and X are identically distributed and Y is independent of F(τ). For this, fix k, n ∈ N, let {t_1, ..., t_k} ⊂ [0,∞) and B_1, ..., B_k ∈ B(R^d) as well as A ∈ F(τ_n). Since, for each m ∈ N, the process Y_n^m is independent of F(m2^{−n}), Y_n^m and X are identically distributed, and

A ∩ {τ_n = m2^{−n}} = A ∩ {τ_n ≤ m2^{−n}} ∩ {τ_n = m2^{−n}} ∈ F(m2^{−n})

by the definition of F(τ_n), we see that

P[A, Y_n(t_i) ∈ B_i, i = 1, ..., k]
= Σ_{m=1}^∞ P[A ∩ {τ_n = m2^{−n}}, Y_n(t_i) ∈ B_i, i = 1, ..., k]
= Σ_{m=1}^∞ P[A ∩ {τ_n = m2^{−n}}, Y_n^m(t_i) ∈ B_i, i = 1, ..., k]
= Σ_{m=1}^∞ P[A ∩ {τ_n = m2^{−n}}] P[Y_n^m(t_i) ∈ B_i, i = 1, ..., k]
= P[A] P[X(t_i) ∈ B_i, i = 1, ..., k].
Choosing A = Ω implies that Y_n and X have the same distribution. But then

P[A, Y_n(t_i) ∈ B_i, i = 1, ..., k] = P[A] P[X(t_i) ∈ B_i, i = 1, ..., k] = P[A] P[Y_n(t_i) ∈ B_i, i = 1, ..., k],

which shows that Y_n is independent of F(τ_n). Now Y_n and X are identically distributed if and only if

E[f(Y_n(t_1), ..., Y_n(t_k))] = E[f(X(t_1), ..., X(t_k))]

for all bounded and continuous f : R^{d×k} → R. Sending n → ∞, using the right continuity of X, the continuity of f, and dominated convergence gives

E[f(Y(t_1), ..., Y(t_k))] = E[f(X(t_1), ..., X(t_k))],

showing that also Y and X have the same distribution. To see that Y is independent of F(τ), let A ∈ F(τ) ⊂ F(τ_n). The independence of Y_n and F(τ_n) and dominated convergence then show that

E[1_A f(Y(t_1), ..., Y(t_k))] = lim_{n→∞} E[1_A f(Y_n(t_1), ..., Y_n(t_k))]
= P[A] lim_{n→∞} E[f(Y_n(t_1), ..., Y_n(t_k))] = P[A] E[f(Y(t_1), ..., Y(t_k))],
and therefore Y is independent of F(τ).
Step 3: We are left with showing that Y is a Lévy process. For this, we observe that Y is F_τ-adapted, càdlàg, and satisfies Y(0) = 0 by construction. Moreover, since Y and X are identically distributed, Y has stationary increments, and it remains to verify that Y has independent increments with respect to F_τ. For this, let s, t ∈ [0,∞) with s < t, A ∈ F_τ(s) = F(τ + s) ⊂ F(τ_n + s), and let f : R^d → R be continuous and bounded. Then

E[1_A f(Y(t) − Y(s))] = lim_{n→∞} E[1_A f(Y_n(t) − Y_n(s))]
= lim_{n→∞} Σ_{m=1}^∞ E[1_{A ∩ {τ_n = m2^{−n}}} f(Y_n^m(t) − Y_n^m(s))]
= lim_{n→∞} Σ_{m=1}^∞ P[A ∩ {τ_n = m2^{−n}}] E[f(Y_n^m(t) − Y_n^m(s))]
= lim_{n→∞} Σ_{m=1}^∞ P[A ∩ {τ_n = m2^{−n}}] E[f(X(t) − X(s))]
= P[A] E[f(X(t) − X(s))] = P[A] E[f(Y(t) − Y(s))],

showing that Y has independent increments.
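The dyadic approximation in Step 1 of the proof is deterministic for each fixed ω and easy to sketch in code (illustrative only; the value τ = 0.7331 is a hypothetical realization of a finite stopping time): τ_n rounds τ strictly up to the next grid point m2^{−n}, so that τ < τ_n ≤ τ + 2^{−n} and hence τ_n ↓ τ.

```python
import math

def dyadic_upper(tau, n):
    """tau_n = m * 2^{-n} on the event {tau in [(m-1) 2^{-n}, m 2^{-n})},
    i.e. tau rounded strictly up to the dyadic grid of mesh 2^{-n}."""
    m = math.floor(tau * 2 ** n) + 1
    return m * 2.0 ** (-n)

tau = 0.7331  # hypothetical realization of a finite stopping time
for n in (1, 4, 10, 20):
    tau_n = dyadic_upper(tau, n)
    # tau < tau_n and tau_n - tau <= 2^{-n}, hence tau_n decreases to tau
    assert tau < tau_n <= tau + 2.0 ** (-n)
```

Rounding up (rather than down) is what makes each τ_n a stopping time: the event {τ_n ≤ t} only requires information about τ up to time t.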
5.4 Characterization of Brownian Motion and Poisson Process
Let us now turn to the characterization of Brownian motion and the Poisson process as Lévy processes. We show that every continuous Lévy process is a Brownian motion with drift, and that every Lévy process which is also a counting process with jump sizes a.s. equal to one is a Poisson process.

To begin with, we need a result on the existence of moments of Lévy processes. Recalling the counterexample of a Lévy process which is not integrable, the compound Poisson process with Cauchy distributed jump sizes, we are left with the impression that integrability of a Lévy process is related to the size of its jumps. As the following proposition shows, Lévy processes with bounded jumps have moments of any order.
Proposition 5.12 (Moments of Lévy Processes). Let X be a Lévy process and assume that there exists a constant C > 0 such that

|X(t) − X(t−)| ≤ C   for all t ∈ [0,∞).

Then E[|X(t)|^p] < ∞ for all t ∈ [0,∞) and all p ∈ [0,∞).
Proof. Define sequences {τ_n}_{n∈N_0}, {σ_n}_{n∈N_0} of random times by setting τ_0 := σ_0 := 0 and, for all n ∈ N,

τ_n := inf{t ≥ σ_{n−1} : |X(t) − X(σ_{n−1})| > C}   and   σ_n := τ_n 1_{τ_n < ∞}.

Using that X is right continuous and applying Proposition 1.22 (Hitting Times of Continuous Time Processes), it follows that τ_1 is a stopping time. Let us now show that τ_1 is either a.s. finite or a.s. infinite. For this, fix t ∈ (0,∞), K ∈ N, and observe that by independence and stationarity of the increments of X we have

P[τ_1 = ∞] ≤ P[τ_1 ≥ Kt] ≤ P[|X(kt) − X((k−1)t)| ≤ 2C, k = 1, ..., K] = P[|X(t)| ≤ 2C]^K.

If P[|X(t)| ≤ 2C] = 1, this implies that τ_1 = ∞ a.s., i.e. X is a.s. bounded and we are done. If otherwise P[|X(t)| ≤ 2C] < 1, it follows by sending K → ∞ that τ_1 < ∞ a.s. But then σ_1 = τ_1 a.s., and it follows from Exercise 45 that σ_1 is a finite stopping time. By the strong Markov property and iterating this procedure, it follows that τ_n and σ_n are stopping times which coincide almost surely, that τ_n − τ_{n−1}, σ_n − σ_{n−1}, τ_1, and σ_1 are identically distributed, and that σ_n − σ_{n−1} is independent of F(σ_{n−1}). Moreover, by right continuity, we have

0 = τ_0 = σ_0 < τ_1 = σ_1 < τ_2 = σ_2 < ...   almost surely.
In particular, the sequence {σ_n − σ_{n−1}}_{n∈N} is independent and identically distributed, and it follows that

E[e^{−σ_n}] = E[e^{−Σ_{k=1}^n (σ_k − σ_{k−1})}] = Π_{k=1}^n E[e^{−(σ_k − σ_{k−1})}] = E[e^{−σ_1}]^n = q^n,

where q := E[e^{−σ_1}] ∈ [0,1) since σ_1 > 0 almost surely. Moreover, from the definition of the stopping times τ_n, we infer that

|X(τ_n ∧ t)| 1_{τ_n < ∞} ≤ Σ_{k=1}^n |X(τ_k) − X(τ_{k−1})| 1_{τ_n < ∞}
≤ Σ_{k=1}^n [|X(τ_k) − X(τ_k−)| + |X(τ_k−) − X(τ_{k−1})|] 1_{τ_n < ∞} ≤ 2nC.
We therefore see that |X(t, ω)| > 2nC implies σ_n(ω) = τ_n(ω) < t for all ω ∈ {τ_n < ∞}, and thus τ_n < ∞ a.s. and the Markov inequality yield

P[|X(t)| > 2nC] ≤ P[σ_n < t] = P[e^{−σ_n} > e^{−t}] ≤ e^t E[e^{−σ_n}] = e^t q^n.

But then, for any p ∈ [0,∞),

E[|X(t)|^p] = Σ_{n=0}^∞ E[|X(t)|^p 1_{2nC < |X(t)| ≤ 2(n+1)C}]
≤ 2^p C^p Σ_{n=0}^∞ (n+1)^p P[2nC < |X(t)| ≤ 2(n+1)C]
≤ 2^p C^p Σ_{n=0}^∞ (n+1)^p P[|X(t)| > 2nC]
≤ 2^p C^p e^t Σ_{n=0}^∞ (n+1)^p q^n < ∞.
Let us now define what exactly we mean by a Brownian motion with drift. It is essentially an affine transformation of d independent Brownian motions.
Definition 5.13 (Brownian Motion with Drift; Multidimensional Brownian Motion). Let μ ∈ R^d and Σ ∈ R^{d×d} be symmetric and positive semidefinite. An R^d-valued F-adapted stochastic process B = {B(t)}_{t∈[0,∞)} is called a Brownian motion with drift if

(B1) B(0) = 0,

(B2) B(t) − B(s) is independent of F(s) for all s, t ∈ [0,∞) with s < t,

(B3) B(t) − B(s) is multivariate normal with mean vector μ(t − s) and covariance matrix Σ(t − s) for all s, t ∈ [0,∞) with s < t,

(B4) B is continuous.

Moreover, in the special case of μ = 0 and Σ being the identity matrix, we say that B is a d-dimensional Brownian motion.
From the definition of Brownian motion with drift, it is immediate that it is a continuous Lévy process and, as it turns out, it is the only continuous Lévy process. Moreover, it is easy to see that B is a d-dimensional Brownian motion if and only if it is a d-dimensional vector of independent one-dimensional Brownian motions.
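As an illustration (not part of the notes; the drift μ, the matrix square root Σ^{1/2}, and the time grid are hypothetical choices), a Brownian motion with drift can be simulated exactly on a time grid by accumulating independent Gaussian increments with mean μ·Δt and covariance Σ·Δt:

```python
import numpy as np

def brownian_with_drift(mu, sigma_root, t_grid, rng):
    """Simulate B(t_k) = mu * t_k + sigma_root @ W(t_k) on a grid, where W
    is a standard d-dimensional Brownian motion. This is exact on the grid,
    since the increments are independent N(mu * dt, Sigma * dt) with
    Sigma = sigma_root @ sigma_root.T."""
    d = len(mu)
    dt = np.diff(t_grid)
    dW = rng.standard_normal((dt.size, d)) * np.sqrt(dt)[:, None]
    incr = np.outer(dt, mu) + dW @ sigma_root.T
    return np.vstack([np.zeros(d), np.cumsum(incr, axis=0)])  # B(0) = 0

rng = np.random.default_rng(1)
mu = np.array([1.0, -0.5])
sigma_root = np.array([[1.0, 0.0], [0.3, 0.8]])  # any square root of Sigma
t_grid = np.linspace(0.0, 1.0, 1001)
path = brownian_with_drift(mu, sigma_root, t_grid, rng)
```

Since the increments are sampled from their exact distribution, there is no discretization error at the grid points; refining the grid only adds resolution between them.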
The proof of the characterization of continuous Lévy processes as Brownian motions with drift is based on the Lindeberg central limit theorem, which we recall here for convenience.
Theorem 5.14 (Central Limit Theorem). For each n ∈ N, let Z_1^n, ..., Z_n^n be R-valued and independent random variables satisfying E[|Z_i^n|^2] < ∞ for all i = 1, ..., n. Assume moreover that

E[Z_i^n] = 0 for all i = 1, ..., n   and   Σ_{i=1}^n Var[Z_i^n] = 1,

and that the Lindeberg condition

lim_{n→∞} E[Σ_{i=1}^n |Z_i^n|^2 1_{|Z_i^n| > ε}] = 0   for all ε > 0

is satisfied. If Z denotes any standard normal random variable, it holds that {Σ_{i=1}^n Z_i^n}_{n∈N} converges in distribution to Z.
We now have all the necessary tools at hand to establish the characterization of continuous Lévy processes.
Theorem 5.15 (Brownian Motion as a Lévy Process). Every continuous Lévy process X = {X(t)}_{t∈[0,∞)} is a Brownian motion with drift.
Proof. Step 1: Preliminaries. We first observe that we only have to verify property (B3), i.e. that the increments of X are multivariate normal, and by Proposition 5.9 (Infinite Divisibility and One-Dimensional Distributions) it suffices to show that X(1) is multivariate normal. Moreover, by Proposition 5.12 (Moments of Lévy Processes), we already know that X has finite moments of any order, and by replacing X with X − E[X] (which is again a Lévy process by Lemma 5.8 (Linearity of Expectation and Covariance of Lévy Processes)), we may without loss of generality assume that E[X(1)] = 0. Finally, to see that X(1) is multivariate normal, it suffices to show that ⟨ξ, X(1)⟩ is normally distributed (possibly degenerate) for any ξ ∈ R^d.
Step 2: Increments. It is clear that E[⟨ξ, X(1)⟩] = 0 since E[X(1)] = 0. Now consider the stochastic process M = {M(t)}_{t∈[0,1]} given by

M(t) := ⟨ξ, X(t)⟩   for all t ∈ [0,1].
Clearly, M is a one-dimensional continuous Lévy process and it admits finite moments of any order. If we assume that

σ^2 := Var[M(1)] = E[|M(1)|^2] = 0,

it follows that M(1) is a.s. equal to zero, i.e. degenerate normally distributed, and the proof is finished. Let us therefore assume subsequently that σ^2 > 0. For each n ∈ N, let us set

Z_i^n := (1/σ) [M(i/n) − M((i−1)/n)]   for all i = 1, ..., n.
By the independent and stationary increments property, Z_1^n, ..., Z_n^n are independent and identically distributed for each n ∈ N. Moreover, it clearly holds that Σ_{i=1}^n Z_i^n = (1/σ) M(1), and by independence

Σ_{i=1}^n Var[Z_i^n] = Var[Σ_{i=1}^n Z_i^n] = (1/σ^2) Var[M(1)] = 1.
If we can now show that the Lindeberg condition holds, i.e.

lim_{n→∞} E[Σ_{i=1}^n |Z_i^n|^2 1_{|Z_i^n| > ε}] = 0   for all ε > 0,

Theorem 5.14 (Central Limit Theorem) implies that

(1/σ) M(1) = Σ_{i=1}^n Z_i^n converges in distribution to Z,

where Z is some standard normal random variable. But then M(1) = ⟨ξ, X(1)⟩ is normally distributed and we can conclude. For the Lindeberg condition to hold, we remark that it suffices to show that

the family {Σ_{i=1}^n |Z_i^n|^2 : n ∈ N} is uniformly integrable.   (5.3)

Indeed, this implies that for any ε > 0 the family {Σ_{i=1}^n |Z_i^n|^2 1_{|Z_i^n| > ε} : n ∈ N} is uniformly integrable as well. But then the fact that M has uniformly continuous paths (as it has continuous paths on the compact interval [0,1]) shows that

Σ_{i=1}^n |Z_i^n|^2 1_{|Z_i^n| > ε} → 0 as n → ∞,

and the Lindeberg condition follows from Vitali's theorem. Thus, we are left with proving Equation (5.3).
Step 3: Uniform integrability. After possibly switching the probability space to (Ω × Ω, A ⊗ A, P ⊗ P), we can find an independent copy M̃ of M, i.e. M̃ and M are independent and identically distributed. Define Z̃_i^n in terms of M̃ in the same way as we have defined Z_i^n in terms of M. The binomial theorem implies for all z, z̃ ∈ R

z^2 = (z − z̃ + z̃)^2 = (z − z̃)^2 + 2(z − z̃)z̃ + z̃^2 ≤ (z − z̃)^2 + 2zz̃,

and hence we can observe that

Σ_{i=1}^n |Z_i^n|^2 ≤ Σ_{i=1}^n |Y_i^n|^2 + 2 Σ_{i=1}^n Z_i^n Z̃_i^n,   (5.4)

where Y_i^n := Z_i^n − Z̃_i^n. Using independence, we see that
E[|Σ_{i=1}^n Z_i^n Z̃_i^n|^2] = Var[Σ_{i=1}^n Z_i^n Z̃_i^n] = Σ_{i=1}^n E[|Z_i^n|^2 |Z̃_i^n|^2]
= Σ_{i=1}^n E[|Z_i^n|^2] E[|Z̃_i^n|^2] = Σ_{i=1}^n (1/(n^2 σ^4)) Var[M(1)] Var[M̃(1)] ≤ 1.

But this implies that the family {2 Σ_{i=1}^n Z_i^n Z̃_i^n : n ∈ N} is bounded in the space of square integrable random variables and hence uniformly integrable. Therefore, in light of Equation (5.4), we are left with showing that {Σ_{i=1}^n |Y_i^n|^2 : n ∈ N} is uniformly integrable. Since
Y_i^n = Z_i^n − Z̃_i^n   and   −Y_i^n = Z̃_i^n − Z_i^n

are identically distributed, it follows immediately that

(Y_i^n, |Y_i^n|^2)   and   (−Y_i^n, |Y_i^n|^2) = (−Y_i^n, |−Y_i^n|^2)

are identically distributed as well. But then the independence of Y_1^n, ..., Y_n^n implies that

(Y_i^n, Y_j^n, |Y_1^n|^2, ..., |Y_n^n|^2)   and   (−Y_i^n, Y_j^n, |Y_1^n|^2, ..., |Y_n^n|^2)
are identically distributed whenever i ≠ j. Thus

E[Y_i^n Y_j^n | σ(|Y_1^n|^2, ..., |Y_n^n|^2)] = −E[Y_i^n Y_j^n | σ(|Y_1^n|^2, ..., |Y_n^n|^2)]   a.s.,

which is only possible if both conditional expectations are a.s. equal to zero. This in turn implies

Σ_{i=1}^n |Y_i^n|^2 = E[Σ_{i=1}^n |Y_i^n|^2 | σ(|Y_1^n|^2, ..., |Y_n^n|^2)]
= E[(Σ_{i=1}^n Y_i^n)^2 | σ(|Y_1^n|^2, ..., |Y_n^n|^2)]
= (1/σ^2) E[(M(1) − M̃(1))^2 | σ(|Y_1^n|^2, ..., |Y_n^n|^2)].

The uniform integrability of {Σ_{i=1}^n |Y_i^n|^2 : n ∈ N} therefore follows from the uniform integrability of conditional expectations.
Having characterized Brownian motion with drift as the only continuous Lévy process, we now proceed to show that the Poisson process is the only Lévy process with jump sizes a.s. equal to one which is simultaneously a counting process, i.e. N_0-valued, nondecreasing, and right continuous. The proof is significantly easier than in the Brownian motion case and is for the most part based on the strong Markov property and the memorylessness property of exponential random variables, which states that for a [0,∞)-valued random variable T,

P[T > t + s | T > t] = P[T > s]   for all s, t ∈ [0,∞)

holds if and only if T is an exponentially distributed random variable.
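One direction of the memorylessness property is an elementary computation with the survival function P[T > t] = e^{−λt}; the check below (illustrative only; λ = 1.7 is an arbitrary choice) confirms the identity numerically on a small grid of times.

```python
import math

def survival(t, lam):
    """P[T > t] = e^{-lam * t} for T ~ Exp(lam)."""
    return math.exp(-lam * t)

lam = 1.7
for t in (0.0, 0.4, 2.0):
    for s in (0.1, 1.0, 3.0):
        # P[T > t + s | T > t] = P[T > t + s] / P[T > t] = P[T > s]
        cond = survival(t + s, lam) / survival(t, lam)
        assert abs(cond - survival(s, lam)) < 1e-12
```

Analytically, of course, e^{−λ(t+s)}/e^{−λt} = e^{−λs} for every t and s; the converse direction (memorylessness forces the exponential law) is the substantial part of the equivalence.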
Theorem 5.16 (Poisson Process as a Lévy Process). Let X = {X(t)}_{t∈[0,∞)} be an N_0-valued Lévy process and counting process with jump sizes a.s. equal to one. Then X is a Poisson process.
Proof. Step 1: Let us denote by S_1 the first jump time of X, given by

S_1 := inf{t ∈ [0,∞) : X(t) > 0}.

We claim that P[S_1 = ∞] ∈ {0, 1}. Indeed, since X is a nondecreasing Lévy process, we have

P[S_1 = ∞] = P[⋂_{n∈N} {X(n) = 0}] = lim_{n→∞} P[X(n) = 0]
= lim_{n→∞} P[X(m) − X(m−1) = 0, m = 1, ..., n] = lim_{n→∞} P[X(1) = 0]^n,

and the latter limit is either equal to zero or one. If S_1 = ∞ a.s., then we are done since this implies that X(t) = 0 for all t ∈ [0,∞) almost surely.
Otherwise, we define T_1 := S_1 1_{S_1 < ∞} and observe that the independence and stationarity of the increments of X and right continuity imply

P[T_1 > t + s | T_1 > t] = P[S_1 > t + s | S_1 > t] = P[X(t + s) = 0 | X(t) = 0]
= P[X(t + s) − X(t) = 0 | X(t) = 0] = P[X(t + s) − X(t) = 0]
= P[X(s) = 0] = P[S_1 > s] = P[T_1 > s],

i.e. T_1 is exponentially distributed by the memorylessness property.
Step 2: Proposition 1.22 (Hitting Times of Continuous Time Processes) implies that S_1 is a stopping time since X is adapted and right continuous, and the filtration F is right continuous as well. But then the completeness of F implies that T_1 is also a stopping time since T_1 = S_1 almost surely; see Exercise 45. Now we can apply Theorem 5.11 (Strong Markov Property of Lévy Processes) to argue that the process X_1 = {X_1(t)}_{t∈[0,∞)} with

X_1(t) := X(T_1 + t) − X(T_1)   for all t ∈ [0,∞)
is a Lévy process with respect to F_{T_1}, independent of F(T_1), and identical in distribution with X. Repeating the arguments in Step 1, we obtain an exponentially distributed random variable T_2 such that T_1 + T_2 coincides almost surely with the second jump time of X, and an F_{T_1+T_2}-Lévy process X_2 = {X_2(t)}_{t∈[0,∞)} with

X_2(t) := X(T_1 + T_2 + t) − X(T_1 + T_2)   for all t ∈ [0,∞),

which is independent of F(T_1 + T_2) and identical in distribution with X. Iteratively, we obtain an entire sequence {T_n}_{n∈N} such that Σ_{k=1}^n T_k coincides almost surely with the n-th jump time of X, as well as a sequence of processes {X_n}_{n∈N} such that each X_n is an F_{T_1+...+T_n}-Lévy process independent of F(T_1 + ... + T_n) and identical in distribution with X. The independence property guarantees in particular that the sequence {T_n}_{n∈N} is independent. Moreover, we have

X(t) = sup{n ∈ N_0 : Σ_{k=1}^n T_k ≤ t}   for all t ∈ [0,∞) almost surely,

where the empty sum is understood as zero.
But then X is indistinguishable from a renewal process constructed from exponential random variables, which is in turn indistinguishable from a Poisson process by Exercise 46. In particular, X has Poisson distributed increments, i.e. it satisfies (N3). The remaining properties (N1), (N2), and (N4) follow since X is a Lévy process and a counting process.
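The renewal representation X(t) = sup{n : T_1 + ... + T_n ≤ t} from the proof can be sketched directly (illustrative only; the interarrival times below are hypothetical fixed values rather than actual exponential draws):

```python
def renewal_count(interarrivals, t):
    """Number of n with T_1 + ... + T_n <= t, i.e. the number of jump
    times (partial sums of the interarrival times) lying in [0, t]."""
    count, total = 0, 0.0
    for gap in interarrivals:
        total += gap
        if total > t:
            break
        count += 1
    return count

gaps = [0.5, 0.3, 1.0]  # jump times at 0.5, 0.8, 1.8
assert renewal_count(gaps, 0.4) == 0
assert renewal_count(gaps, 0.7) == 1
assert renewal_count(gaps, 1.0) == 2
assert renewal_count(gaps, 2.0) == 3
```

Feeding this counter i.i.d. Exp(λ) interarrival times yields a path of a Poisson process with intensity λ, which is exactly the construction used in the proof.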
5.5 The Lévy-Ito Decomposition
We proceed towards the Lévy-Ito decomposition, stating that every Lévy process can be written as the sum of three independent Lévy processes: a Brownian motion with drift, a compound Poisson process with jumps greater than one, and a pure jump martingale with jump sizes less than or equal to one.
To obtain the Brownian motion with drift component of the Lévy process X, it seems intuitive to simply subtract from X all of its jumps, i.e. subtract from X(t) the sum

Σ_{s∈[0,t]} [X(s) − X(s−)].

The problem is, however, that this sum may diverge: since X is càdlàg, it may have infinitely many jumps on the time interval [0,t]. Nevertheless, given any ε > 0, any càdlàg function has at most finitely many jumps of size greater than ε on [0,t], and hence we can at least extract all jumps which are greater than ε, i.e. subtract from X(t) the sum

Σ_{s∈[0,t]} [X(s) − X(s−)] 1_{|X(s)−X(s−)| > ε}.
The decomposition of X can then be obtained by a limiting argument as ε ↓ 0, using martingale techniques.
Let us analyze this extraction procedure in more detail. For this, let us fix an adapted càdlàg process X = {X(t)}_{t∈[0,∞)} taking values in R^d (for now, X need not be a Lévy process). Moreover, we fix constants c, d ∈ R with 0 < c ≤ d ≤ ∞ and define the annulus

D_c^d := {x ∈ R^d : c < |x| ≤ d} ⊂ R^d_* := R^d \ {0}.
Our aim is to extract the jumps of X which take values in D_c^d. For this, we iteratively define T_0^{c,d} := 0 and

T_n^{c,d} := inf{t > T_{n−1}^{c,d} : |X(t) − X(t−)| ∈ D_c^d}   for all n ∈ N.
The number of jumps of X taking values in D_c^d on the finite time interval [0,t] is then given by

J_X([0,t] × D_c^d, ω) := Σ_{n=1}^∞ 1_{T_n^{c,d}(ω) ≤ t}   for all t ∈ [0,∞) and ω ∈ Ω.   (5.5)

Since each càdlàg function has at most finitely many jumps of size greater than c, we see that the sum in the latter equation is finite. To ensure appropriate measurability of J_X, let us quickly verify that the random times T_n^{c,d}, n ∈ N, are stopping times.
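For a path with finitely many jumps, the count J_X([0,t] × D_c^d) from Equation (5.5) reduces to counting the jumps with time in [0,t] and size in the annulus. A minimal sketch (illustrative only; the jump times and sizes are hypothetical, and we take d = 1 so that jump sizes are real numbers and |·| is the absolute value):

```python
def count_jumps_in_annulus(jump_times, jump_sizes, c, d, t):
    """J_X([0, t] x D_c^d) for a piecewise constant real-valued path:
    the number of jumps at times <= t whose modulus lies in (c, d]."""
    return sum(1 for s, x in zip(jump_times, jump_sizes)
               if s <= t and c < abs(x) <= d)

times = [0.2, 0.5, 0.9, 1.4]
sizes = [0.3, -1.2, 2.5, -0.8]
assert count_jumps_in_annulus(times, sizes, 1.0, float("inf"), 1.0) == 2
assert count_jumps_in_annulus(times, sizes, 0.5, 1.5, 2.0) == 2
assert count_jumps_in_annulus(times, sizes, 0.25, 1.0, 2.0) == 2
```

Note the strict lower bound c < |x| and the inclusive upper bound |x| ≤ d, matching the definition of the annulus D_c^d.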
Lemma 5.17 (Measurability of Jumps). Let X = {X(t)}_{t∈[0,∞)} be an R^d-valued, adapted, and càdlàg stochastic process. Assume that F = {F(t)}_{t∈[0,∞)} is right continuous, and fix c, d ∈ R with 0 < c ≤ d ≤ ∞. Define T_0^{c,d} := 0 and

T_n^{c,d} := inf{t > T_{n−1}^{c,d} : |X(t) − X(t−)| ∈ D_c^d}   for all n ∈ N.   (5.6)

Then each T_n^{c,d}, n ∈ N_0, is a stopping time. Moreover, for J_X defined as in Equation (5.5) and each t ∈ [0,∞), it holds that the mapping

(s, ω) ↦ J_X([0,s] × D_c^d, ω),   (s, ω) ∈ [0,t] × Ω,

is B([0,t]) ⊗ F(t)-measurable. Finally, for each n ∈ N and t ∈ [0,∞),

Y_n^t := [X(T_n^{c,d}) − X(T_n^{c,d}−)] 1_{T_n^{c,d} ≤ t}

defines an F(t)-measurable random variable.
Proof. Step 1: We show that each T_n^{c,d}, n ∈ N_0, is a stopping time. Let us fix ω ∈ Ω, t ∈ (0,∞), and n ∈ N. If we assume that T_n^{c,d}(ω) ≤ t, right continuity of X implies that T_{n−1}^{c,d}(ω) < T_n^{c,d}(ω), and hence there exists q ∈ [0,t) ∩ Q such that T_{n−1}^{c,d}(ω) ≤ q and X(·, ω) must have at least one jump in D_c^d on (q, t]. Since X is càdlàg, it follows that for all j, k ∈ N there exist u, v ∈ (q, t + 1/k] ∩ Q with |u − v| ≤ 1/k such that

c < |X(u, ω) − X(v, ω)| < d + 1/j.

On the other hand, if we assume that there exists q ∈ [0,t) ∩ Q such that T_{n−1}^{c,d}(ω) ≤ q and if for all j, k ∈ N there exist u, v ∈ (q, t + 1/k] ∩ Q with |u − v| ≤ 1/k such that

c < |X(u, ω) − X(v, ω)| < d + 1/j,

it follows that X(·, ω) has a jump in D_c^d in (q, t], i.e. T_n^{c,d}(ω) ≤ t. We have thus argued that the event {T_n^{c,d} ≤ t} can be represented as

⋃_{q∈[0,t)∩Q} [ {T_{n−1}^{c,d} ≤ q} ∩ ( ⋂_{j,k∈N} ⋃_{u,v∈(q,t+1/k]∩Q, |u−v|≤1/k} {c < |X(u) − X(v)| < d + 1/j} ) ].
Since T_0^{c,d} = 0 is a stopping time, we see by iteration that {T_n^{c,d} ≤ t} is F(t+) = F(t)-measurable, and hence each T_n^{c,d}, n ∈ N_0, is a stopping time.
Step 2: To verify the measurability of

(s, ω) ↦ J_X([0,s] × D_c^d, ω),   (s, ω) ∈ [0,t] × Ω,

let us first show that any adapted stochastic process Z = {Z(s)}_{s∈[0,t]} which is either right or left continuous satisfies

(s, ω) ↦ Z(s, ω)   is B([0,t]) ⊗ F(t)-measurable.

Indeed, if Z is right continuous, this holds since

Z(s, ω) = lim_{n→∞} Σ_{k=1}^∞ Z(k/n, ω) 1_{(k−1)/n ≤ s < k/n},   (s, ω) ∈ [0,t) × Ω,

and each Z(k/n, ω) 1_{(k−1)/n ≤ s < k/n} is jointly measurable. In the left continuous case, the measurability follows by the same argument if we replace Z(k/n, ω) by Z((k−1)/n, ω). With this, the measurability of J_X is now a consequence of the fact that

J_X([0,s] × D_c^d) = Σ_{n=1}^∞ 1_{T_n^{c,d} ≤ s}   for all s ∈ [0,∞)

is a right continuous and adapted stochastic process.
Step 3: We are left with showing that

Y_n^t := [X(T_n^{c,d}) − X(T_n^{c,d}−)] 1_{T_n^{c,d} ≤ t}

is an F(t)-measurable random variable. We first observe that we may write Y_n^t equivalently as

Y_n^t = [X(T_n^{c,d} ∧ t) − X((T_n^{c,d} ∧ t)−)] 1_{T_n^{c,d} ≤ t}.
Since X is right continuous and {X(t−)}_{t∈[0,∞)} is left continuous, it follows from Step 2 that

(s, ω) ↦ X(s, ω)   and   (s, ω) ↦ X(s−, ω)

are both B([0,t]) ⊗ F(t)-measurable. Moreover, T_n^{c,d} ∧ t is a [0,t]-valued stopping time, and hence the mapping

ω ↦ (T_n^{c,d}(ω) ∧ t, ω),   ω ∈ Ω,

is F(t)-measurable. The F(t)-measurability of Y_n^t then follows since compositions of measurable functions are again measurable.
Since the sets of the form [0,t] × D_c^d form an intersection stable generator of the σ-field B([0,∞) × R^d_*), it follows that J_X(·, ω) extends uniquely to a counting measure on ([0,∞) × R^d_*, B([0,∞) × R^d_*)) for all ω ∈ Ω, i.e. we may think of J_X as a random measure. Moreover, Lemma 5.17 (Measurability of Jumps) and a monotone class argument imply that ω ↦ J_X(A, ω) is measurable for any A ∈ B([0,∞) × R^d_*).
Definition 5.18 (Jump Measure). Let X = {X(t)}_{t∈[0,∞)} be an R^d-valued, adapted, càdlàg stochastic process. The random measure

J_X : B([0,∞) × R^d_*) × Ω → N_0 ∪ {∞}

defined uniquely via Equation (5.5) is called the jump measure of X.
Given a Lévy process X and an annulus D_c^d ∈ B(R^d_*) with 0 < c ≤ d ≤ ∞, we define a stochastic process N_c^d = {N_c^d(t)}_{t∈[0,∞)} by

N_c^d(t) := J_X([0,t] × D_c^d) = Σ_{n=1}^∞ 1_{T_n^{c,d} ≤ t}   for all t ∈ [0,∞).

Clearly, N_c^d simply counts the number of jumps of X in D_c^d and is hence a counting process with jump sizes equal to one. As the following proposition shows, N_c^d is a Lévy process and hence a Poisson process.
Proposition 5.19 (The Jump Measure Process). Let X be a Lévy process with jump measure J_X. Fix c, d ∈ R with 0 < c ≤ d ≤ ∞ and define a stochastic process N_c^d = {N_c^d(t)}_{t∈[0,∞)} by

N_c^d(t) := J_X([0,t] × D_c^d)   for all t ∈ [0,∞).

Then N_c^d is a Poisson process with intensity λ := E[N_c^d(1)].
Proof. It is clear that N_c^d satisfies N_c^d(0) = 0, is a counting process, and has jump sizes equal to one. Moreover, since c > 0 and X(·, ω) is càdlàg, and hence has at most finitely many jumps greater than c on any bounded time interval, it is clear that N_c^d(t, ω) < ∞ and thus N_c^d is N_0-valued. Adaptedness of N_c^d follows from Lemma 5.17 (Measurability of Jumps). Hence, in order to conclude, it suffices to show that N_c^d has independent and stationary increments (and is hence a Lévy process) and then apply Theorem 5.16 (Poisson Process as a Lévy Process). If we denote by λ the intensity of this Poisson process, E[N_c^d(1)] = λ follows immediately from the fact that N_c^d(1) is Poisson distributed with parameter λ.
Let us now fix $s, t \in [0,\infty)$ with $s < t$. By the Markov property, $Y := X(\cdot + s) - X(s)$ coincides in distribution with $X$ and is a Lévy process independent of $\mathcal{F}(s)$. Hence, since $X$ has no jump at time zero,
$$N^d_c(t-s) = J_X\bigl([0,t-s] \times D^d_c\bigr) = J_X\bigl((0,t-s] \times D^d_c\bigr)$$
and
$$N^d_c(t) - N^d_c(s) = J_X\bigl([0,t] \times D^d_c\bigr) - J_X\bigl([0,s] \times D^d_c\bigr) = J_X\bigl((s,t] \times D^d_c\bigr) = J_Y\bigl((0,t-s] \times D^d_c\bigr)$$
are identically distributed and $N^d_c(t) - N^d_c(s)$ is independent of $\mathcal{F}(s)$.
The process $N^d_c$ allows us to extract from $X$ the jumps of size in $D^d_c$. For this, we recall the stopping times $\{T^{c,d}_n\}_{n\in\mathbb{N}}$ given in Equation (5.6) and define $Y^d_c = (Y^d_c(t))_{t\in[0,\infty)}$ by
$$Y^d_c(t) := \sum_{n=1}^{\infty} \bigl[X(T^{c,d}_n) - X(T^{c,d}_n-)\bigr] \mathbb{1}_{\{T^{c,d}_n \le t\}} \quad \text{for all } t \in [0,\infty).$$
We observe that $Y^d_c$ can also be written as
$$Y^d_c(t) = \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \, J_X(ds \otimes dy) \quad \text{for all } t \in [0,\infty),$$
the latter being well-defined since $J_X(\cdot,\omega)$ is a measure for each $\omega \in \Omega$ and $X(\cdot,\omega)$ has only finitely many jumps on $[0,t]$ which are greater than $c$. By Lemma 5.17 (Measurability of Jumps), $Y^d_c$ is an adapted stochastic process. Moreover, $Y^d_c$ is clearly piecewise constant and consists of the jumps of $X$ which are greater than $c$ and less than or equal to $d$. We proceed by showing that $Y^d_c$ is a Lévy process.
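In code, extracting $Y^d_c(t)$ from a finite list of jump times and jump sizes is a direct filter-and-sum; the toy path below is hypothetical.

```python
import numpy as np

def extract_jumps(jump_times, jump_sizes, c, d, t):
    """Y_c^d(t): sum of the jumps on [0, t] whose magnitude lies in (c, d]."""
    times = np.asarray(jump_times, dtype=float)
    sizes = np.asarray(jump_sizes, dtype=float)
    mags = np.abs(sizes)
    keep = (times <= t) & (mags > c) & (mags <= d)
    return sizes[keep].sum()

# Toy path with four jumps: only the jumps at times 0.5 and 0.9 have magnitude
# in (0.5, 2] and occur before t = 1, so Y_0.5^2(1) = -1.0 + 1.7 = 0.7.
print(extract_jumps([0.2, 0.5, 0.9, 1.5], [0.3, -1.0, 1.7, 5.0], 0.5, 2.0, 1.0))
```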
Proposition 5.20 (Extracted Jump Processes). Let $X$ be a Lévy process and $c, d, e, f \in \mathbb{R}$ with $0 < c \le d \le e \le f \le \infty$. Define stochastic processes $Y^d_c = (Y^d_c(t))_{t\in[0,\infty)}$, $Y^f_e = (Y^f_e(t))_{t\in[0,\infty)}$, and $Z = (Z(t))_{t\in[0,\infty)}$ by
$$Y^d_c(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \, J_X(ds \otimes dy), \quad t \in [0,\infty),$$
$$Y^f_e(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^f_e\}} \, J_X(ds \otimes dy), \quad t \in [0,\infty),$$
$$Z(t) := X(t) - Y^d_c(t) - Y^f_e(t), \quad t \in [0,\infty).$$
Then $(Y^d_c, Y^f_e, Z)$ is a Lévy process and $Y^d_c$, $Y^f_e$, $Z$ are independent.
Proof. Step 1: $(Y^d_c, Y^f_e, Z)$ is a Lévy process. Clearly, we only have to verify the independent and stationary increments property. For this, fix $s, t \in [0,\infty)$ with $s < t$ and choose sequences $\{f^{c,d}_n\}_{n\in\mathbb{N}}$ and $\{f^{e,f}_n\}_{n\in\mathbb{N}}$ of continuous and compactly supported functions¹ mapping $\mathbb{R}^d$ to $\mathbb{R}^d$ such that zero is not contained in the support and
$$f^{c,d}_n(y) \to y \, \mathbb{1}_{\{|y| \in D^d_c\}} \quad \text{and} \quad f^{e,f}_n(y) \to y \, \mathbb{1}_{\{|y| \in D^f_e\}} \quad \text{as } n \to \infty.$$
If $d < \infty$, it is clear that we can choose the functions $f^{c,d}_n$ to be bounded. If $d = \infty$, the set $D^d_c$ is open and we can assume without loss of generality that the sequence $\{f^{c,d}_n\}_{n\in\mathbb{N}}$ is increasing. In any case, for all $\omega \in \Omega$, it holds that
$$\lim_{n\to\infty} \int_{[0,t]\times\mathbb{R}^d_*} f^{c,d}_n(y) \, J_X(ds \otimes dy, \omega) = \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \, J_X(ds \otimes dy, \omega).$$
Upon setting $s^m_k := \frac{k}{m}(t-s)$ for all $k = 0, 1, \ldots, m$ and $m \in \mathbb{N}$, we have
$$Y^d_c(t-s, \omega) = \int_{[0,t-s]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \, J_X(dr \otimes dy, \omega) = \lim_{n\to\infty} \int_{[0,t-s]\times\mathbb{R}^d_*} f^{c,d}_n(y) \, J_X(dr \otimes dy, \omega) = \lim_{n\to\infty} \lim_{m\to\infty} \sum_{k=1}^m f^{c,d}_n\bigl(X(s^m_k, \omega) - X(s^m_{k-1}, \omega)\bigr)$$
and similarly
$$Y^d_c(t, \omega) - Y^d_c(s, \omega) = \lim_{n\to\infty} \lim_{m\to\infty} \sum_{k=1}^m f^{c,d}_n\bigl(X(s + s^m_k, \omega) - X(s + s^m_{k-1}, \omega)\bigr).$$
Now the same holds for $Y^f_e$ in place of $Y^d_c$, and thus

¹The support of a function $f : \mathbb{R}^d \to \mathbb{R}^d$ is defined as the closure of the set $\{x \in \mathbb{R}^d : f(x) \ne 0\}$; $f$ is said to be compactly supported if the support is a compact set.
$$Z(t-s) = \lim_{n\to\infty} \lim_{m\to\infty} \sum_{k=1}^m \Bigl[X(s^m_k) - X(s^m_{k-1}) - f^{c,d}_n\bigl(X(s^m_k) - X(s^m_{k-1})\bigr) - f^{e,f}_n\bigl(X(s^m_k) - X(s^m_{k-1})\bigr)\Bigr]$$
as well as
$$Z(t) - Z(s) = \lim_{n\to\infty} \lim_{m\to\infty} \sum_{k=1}^m \Bigl[X(s + s^m_k) - X(s + s^m_{k-1}) - f^{c,d}_n\bigl(X(s + s^m_k) - X(s + s^m_{k-1})\bigr) - f^{e,f}_n\bigl(X(s + s^m_k) - X(s + s^m_{k-1})\bigr)\Bigr],$$
where the limits are pointwise in $\omega$. Now independence and stationarity of the increments of $X$ and the fact that independence is stable under pointwise convergence in $\omega$ imply independence and stationarity of the increments of $(Y^d_c, Y^f_e, Z)$.
Step 2: $Y^d_c$, $Y^f_e$, $Z$ are independent. For ease of notation, we write
$$U_1 := Y^d_c \quad \text{and} \quad U_2 := Y^f_e.$$
Fix $s, t \in [0,\infty)$ with $s < t$, $z_1, z_2 \in \mathbb{R}^d$, and define $M_j = (M_j(r))_{r\in[s,t]}$, $j = 1, 2$, by
$$M_j(r) := \frac{e^{i\langle z_j, U_j(r) - U_j(s)\rangle}}{\mathbb{E}[e^{i\langle z_j, U_j(r) - U_j(s)\rangle}]} \quad \text{for all } r \in [s,t] \text{ and } j = 1, 2.$$
Then $M_1$ and $M_2$ are bounded martingales. Indeed, for $j \in \{1, 2\}$ the boundedness of the numerator in the definition of $M_j$ is immediate, so we only need to show that the denominator is bounded away from zero. But since $(U_j(r) - U_j(s))_{r\in[s,\infty)}$ is a Lévy process by the Markov property, it follows from infinite divisibility that, for all $r \in [s,t]$,
$$\mathbb{E}\bigl[e^{i\langle z_j, U_j(r) - U_j(s)\rangle}\bigr] = \varphi_{U_j(r) - U_j(s)}(z_j) = \varphi_{U_j(s+1) - U_j(s)}(z_j)^{r-s}, \tag{5.7}$$
which is clearly bounded away from zero in modulus. To see that $M_j$ is a martingale, fix $r_1, r_2 \in [s,t]$ with $r_1 < r_2$. Then, using independence of increments,
$$\mathbb{E}\bigl[M_j(r_2) \,\big|\, \mathcal{F}(r_1)\bigr] = \frac{1}{\mathbb{E}\bigl[e^{i\langle z_j, U_j(r_2) - U_j(r_1) + U_j(r_1) - U_j(s)\rangle}\bigr]} \, \mathbb{E}\bigl[e^{i\langle z_j, U_j(r_2) - U_j(r_1) + U_j(r_1) - U_j(s)\rangle} \,\big|\, \mathcal{F}(r_1)\bigr]$$
$$= \frac{1}{\mathbb{E}\bigl[e^{i\langle z_j, U_j(r_2) - U_j(r_1)\rangle}\bigr] \, \mathbb{E}\bigl[e^{i\langle z_j, U_j(r_1) - U_j(s)\rangle}\bigr]} \, e^{i\langle z_j, U_j(r_1) - U_j(s)\rangle} \, \mathbb{E}\bigl[e^{i\langle z_j, U_j(r_2) - U_j(r_1)\rangle}\bigr] = \frac{e^{i\langle z_j, U_j(r_1) - U_j(s)\rangle}}{\mathbb{E}\bigl[e^{i\langle z_j, U_j(r_1) - U_j(s)\rangle}\bigr]} = M_j(r_1) \quad \text{a.s.}$$
For each $n \in \mathbb{N}$ and all $k = 0, 1, \ldots, n$, let us define $r^k_n := s + \frac{k}{n}(t-s)$. Whenever $k, \ell \in \{1, \ldots, n\}$ are such that $k < \ell$, we know that $M_1(r^k_n) - M_1(r^{k-1}_n)$ is $\mathcal{F}(r^{\ell-1}_n)$-measurable and thus
$$\mathbb{E}\bigl[(M_1(r^k_n) - M_1(r^{k-1}_n))(M_2(r^\ell_n) - M_2(r^{\ell-1}_n))\bigr] = \mathbb{E}\Bigl[(M_1(r^k_n) - M_1(r^{k-1}_n)) \, \mathbb{E}\bigl[M_2(r^\ell_n) - M_2(r^{\ell-1}_n) \,\big|\, \mathcal{F}(r^{\ell-1}_n)\bigr]\Bigr] = 0.$$
Since the same argument also shows that
$$\mathbb{E}\bigl[(M_1(r^k_n) - M_1(r^{k-1}_n))(M_2(r^\ell_n) - M_2(r^{\ell-1}_n))\bigr] = 0$$
if $k > \ell$, it follows from
$$\mathbb{E}[M_1(t)] + \mathbb{E}[M_2(t)] = \mathbb{E}[M_1(s)] + \mathbb{E}[M_2(s)] = 2$$
that
$$\begin{aligned}
\mathbb{E}\bigl[M_1(t) M_2(t)\bigr] - 1 &= \mathbb{E}\bigl[M_1(t) M_2(t) - M_1(t) - M_2(t) + 1\bigr] = \mathbb{E}\bigl[(M_1(t) - 1)(M_2(t) - 1)\bigr] \\
&= \mathbb{E}\bigl[(M_1(t) - M_1(s))(M_2(t) - M_2(s))\bigr] \\
&= \mathbb{E}\Bigl[\sum_{k,\ell=1}^n (M_1(r^k_n) - M_1(r^{k-1}_n))(M_2(r^\ell_n) - M_2(r^{\ell-1}_n))\Bigr] \\
&= \mathbb{E}\Bigl[\sum_{k=1}^n (M_1(r^k_n) - M_1(r^{k-1}_n))(M_2(r^k_n) - M_2(r^{k-1}_n))\Bigr]. \qquad (5.8)
\end{aligned}$$
Let us denote by $V_1 = (V_1(r))_{r\in[s,t]}$ the total variation process of $M_1$. By Equation (5.7) and since $c > 0$, $M_1(\cdot,\omega)$ is piecewise monotone and has at most finitely many jumps, which implies that $V_1$ is finite-valued. Moreover, by Proposition 5.19 (The Jump Measure Process), the number of jumps of $U_1$ (hence also of $M_1$) on $[s,t]$ is a Poisson distributed random variable with some parameter $\lambda(t-s) \ge 0$. Since $M_1$ is bounded, this implies that we can estimate $V_1(t)$ by a constant $K > 0$ times the number of jumps of $M_1$, and it follows that
$$\mathbb{E}\bigl[|V_1(t)|\bigr] \le K\lambda(t-s).$$
Now estimate
$$\Bigl|\sum_{k=1}^n (M_1(r^k_n) - M_1(r^{k-1}_n))(M_2(r^k_n) - M_2(r^{k-1}_n))\Bigr| \le 2 \sup_{r\in[s,t]} |M_2(r)| \sum_{k=1}^n \bigl|M_1(r^k_n) - M_1(r^{k-1}_n)\bigr| \le 2 V_1(t) \sup_{r\in[s,t]} |M_2(r)|.$$
The right hand side is clearly integrable since $M_2$ is bounded. But this implies that we are allowed to apply dominated convergence in Equation (5.8) and it follows that
$$\mathbb{E}\bigl[M_1(t) M_2(t)\bigr] - 1 = \mathbb{E}\Bigl[\sum_{r\in(s,t]} \bigl(M_1(r) - M_1(r-)\bigr)\bigl(M_2(r) - M_2(r-)\bigr)\Bigr] = 0,$$
where the last equality follows from the fact that $U_1$ and $U_2$, and therefore also $M_1$ and $M_2$, never jump at the same time. By the definition of $M_1$ and $M_2$, this holds if and only if
$$\varphi_{(U_1(t) - U_1(s), U_2(t) - U_2(s))}(z_1, z_2) = \mathbb{E}\Bigl[\prod_{j=1}^2 e^{i\langle z_j, U_j(t) - U_j(s)\rangle}\Bigr] = \prod_{j=1}^2 \mathbb{E}\bigl[e^{i\langle z_j, U_j(t) - U_j(s)\rangle}\bigr] = \prod_{j=1}^2 \varphi_{U_j(t) - U_j(s)}(z_j), \tag{5.9}$$
i.e. $U_1(t) - U_1(s)$ and $U_2(t) - U_2(s)$ are independent for any choice of $s, t \in [0,\infty)$ with $s < t$. This in turn implies that $U_1$ and $U_2$ are independent. Indeed, let us fix $n \in \mathbb{N}$, $\{t_1, \ldots, t_n\} \subset [0,\infty)$, and $z^k_1, z^k_2 \in \mathbb{R}^d$ for all $k = 1, \ldots, n$. We set $t_0 := 0$ and for each $j = 1, 2$ write
$$z_j := \bigl(z^1_j, \ldots, z^n_j\bigr) \quad \text{and} \quad U_j := \bigl(U_j(t_1) - U_j(t_0), \ldots, U_j(t_n) - U_j(t_{n-1})\bigr).$$
We then have
$$\varphi_{(U_1, U_2)}(z_1, z_2) = \mathbb{E}\Bigl[\prod_{k=1}^n \prod_{j=1}^2 e^{i\langle z^k_j, U_j(t_k) - U_j(t_{k-1})\rangle}\Bigr] = \prod_{k=1}^n \mathbb{E}\Bigl[\prod_{j=1}^2 e^{i\langle z^k_j, U_j(t_k) - U_j(t_{k-1})\rangle}\Bigr] = \prod_{k=1}^n \prod_{j=1}^2 \mathbb{E}\bigl[e^{i\langle z^k_j, U_j(t_k) - U_j(t_{k-1})\rangle}\bigr] = \prod_{j=1}^2 \mathbb{E}\Bigl[\prod_{k=1}^n e^{i\langle z^k_j, U_j(t_k) - U_j(t_{k-1})\rangle}\Bigr] = \prod_{j=1}^2 \varphi_{U_j}(z_j),$$
since $(U_1, U_2)$ is a Lévy process with independent increments (for the second equality), by Equation (5.9) (for the third equality), and since $U_1$ and $U_2$ are Lévy processes with independent increments (for the fourth equality). This, however, implies
independence of the entire processes $U_1 = Y^d_c$ and $U_2 = Y^f_e$. Repeating the argument with
$$U_1 := (Y^d_c, Y^f_e) \quad \text{and} \quad U_2 := Z$$
then implies the mutual independence of $Y^d_c$, $Y^f_e$, $Z$.
We are now almost ready to prove the Lévy-Ito decomposition. But first weneed to introduce the notion of the Lévy measure of a Lévy process andstudy its properties.
Definition 5.21 (Lévy Measure). Let $X$ be a Lévy process with jump measure $J_X$. The measure $\nu$ on $(\mathbb{R}^d_*, \mathcal{B}(\mathbb{R}^d_*))$ defined by
$$\nu(B) := \mathbb{E}\bigl[J_X([0,1] \times B)\bigr] \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^d_*)$$
is called the Lévy measure of $X$.
The Lévy measure of a set $B \in \mathcal{B}(\mathbb{R}^d_*)$ is simply the expected number of jumps in $B$ of the underlying Lévy process during $[0,1]$. It should be clear that the Lévy measure of a Brownian motion with drift is the zero measure. The Lévy measure of a Poisson process with intensity $\lambda \ge 0$ is given by
$$\nu(B) = \begin{cases} \lambda & \text{if } 1 \in B, \\ 0 & \text{if } 1 \notin B, \end{cases} \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^d_*).$$
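The defining identity $\nu(B) = \mathbb{E}[J_X([0,1] \times B)]$ can be checked by Monte Carlo. For a standard Poisson process every jump has size one, so the estimate of $\nu(\{1\})$ should come out near the intensity $\lambda$; the helper below is a sketch under that setup, with our own parameter choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_levy_measure(sample_jump_sizes, indicator_B, n_paths=4000):
    """Estimate nu(B) = E[J_X([0,1] x B)]: average the number of jumps on [0, 1]
    whose size falls in B over many independent paths."""
    total = 0
    for _ in range(n_paths):
        sizes = sample_jump_sizes()
        total += int(np.sum(indicator_B(sizes)))
    return total / n_paths

lam = 3.0
# Poisson process with intensity lam: N(1) ~ Poisson(lam) jumps on [0, 1], all of size one.
poisson_jumps = lambda: np.ones(rng.poisson(lam))
nu_hat = empirical_levy_measure(poisson_jumps, lambda y: y == 1.0)
print(nu_hat)  # close to lam = 3.0
```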
Proposition 5.22 (Exponential Formula). Let $X$ be a Lévy process with Lévy measure $\nu$ and fix $0 < c \le d \le \infty$. Then
$$\mathbb{E}\bigl[e^{i\langle z, Y^d_c(t)\rangle}\bigr] = \exp\Bigl\{-t \int_{D^d_c} \bigl[1 - e^{i\langle z, y\rangle}\bigr] \, \nu(dy)\Bigr\} \quad \text{for all } z \in \mathbb{R}^d.$$
Proof. Let us first observe that the integral $\int_{D^d_c} [1 - e^{i\langle z, y\rangle}] \, \nu(dy)$ is well-defined since the integrand is bounded and
$$\nu\bigl(D^d_c\bigr) = \mathbb{E}\bigl[J_X([0,1] \times D^d_c)\bigr] = \mathbb{E}\bigl[N^d_c(1)\bigr] < \infty$$
by Proposition 5.19 (The Jump Measure Process). For $n \in \mathbb{N}$, let us consider a function $f_n : \mathbb{R}^d_* \to \mathbb{R}^d$ of the form
$$f_n(y) = \sum_{k=1}^n \xi^n_k \, \mathbb{1}_{\{c^n_k < |y| \le d^n_k\}} \quad \text{for all } y \in \mathbb{R}^d_*,$$
where $\xi^n_1, \ldots, \xi^n_n \in \mathbb{R}^d$ and $0 < c^n_1 \le d^n_1 \le \ldots \le c^n_n \le d^n_n \le \infty$. We assume that $\{f_n\}_{n\in\mathbb{N}}$ is such that
$$\lim_{n\to\infty} f_n(y) = y \, \mathbb{1}_{\{|y| \in D^d_c\}} \quad \text{for all } y \in \mathbb{R}^d_*$$
as well as
$$Y^d_c(t) = \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \, J_X(ds \otimes dy) = \lim_{n\to\infty} \int_{[0,t]\times\mathbb{R}^d_*} f_n(y) \, J_X(ds \otimes dy) = \lim_{n\to\infty} \sum_{k=1}^n \xi^n_k N^n_k(t),$$
where $N^n_k := N^{d^n_k}_{c^n_k}$. For any $z \in \mathbb{R}^d$, it follows that
$$\langle z, Y^d_c(t)\rangle = \lim_{n\to\infty} \sum_{k=1}^n \langle z, \xi^n_k\rangle N^n_k(t).$$
Iteratively applying Proposition 5.20 (Extracted Jump Processes), we see that $N^n_1, \ldots, N^n_n$ are independent (being the jump measure processes of the corresponding extracted jump processes $Y^n_1, \ldots, Y^n_n$). Now dominated convergence and this independence property imply that
$$\mathbb{E}\bigl[e^{i\langle z, Y^d_c(t)\rangle}\bigr] = \lim_{n\to\infty} \mathbb{E}\bigl[e^{i \sum_{k=1}^n \langle z, \xi^n_k\rangle N^n_k(t)}\bigr] = \lim_{n\to\infty} \prod_{k=1}^n \mathbb{E}\bigl[e^{i\langle z, \xi^n_k\rangle N^n_k(t)}\bigr].$$
By Proposition 5.19 (The Jump Measure Process), each $N^n_k$ is a Poisson process with intensity $\mathbb{E}[N^n_k(1)] = \nu(D^n_k)$, where $D^n_k := \{y \in \mathbb{R}^d_* : c^n_k < |y| \le d^n_k\}$. Hence
$$\mathbb{E}\bigl[e^{i\langle z, Y^d_c(t)\rangle}\bigr] = \lim_{n\to\infty} \prod_{k=1}^n \mathbb{E}\bigl[e^{i\langle z, \xi^n_k\rangle N^n_k(t)}\bigr] = \lim_{n\to\infty} \prod_{k=1}^n e^{t\nu(D^n_k)[e^{i\langle z, \xi^n_k\rangle} - 1]} = \lim_{n\to\infty} e^{-t \sum_{k=1}^n [1 - e^{i\langle z, \xi^n_k\rangle}] \nu(D^n_k)} = \exp\Bigl\{-t \int_{D^d_c} \bigl[1 - e^{i\langle z, y\rangle}\bigr] \, \nu(dy)\Bigr\}.$$
In general, the Lévy measure may be infinite if the Lévy process has a lot of small jumps, i.e. $\nu(\mathbb{R}^d_*) = \infty$ is possible. However, if $B \in \mathcal{B}(\mathbb{R}^d_*)$ is bounded away from zero, it is possible to show that $\nu(B) < \infty$. The following proposition makes this statement more precise.
Proposition 5.23 (Integrability of the Lévy Measure). Let $X$ be a Lévy process with Lévy measure $\nu$. Then
$$\int_{\mathbb{R}^d_*} |y|^2 \wedge 1 \, \nu(dy) < \infty.$$
Proof. Let us first observe that it suffices to show that $\int_{\mathbb{R}^d_*} |y|^2 \mathbb{1}_{\{|y| \le 1\}} \, \nu(dy) < \infty$ since
$$\int_{\mathbb{R}^d_*} \mathbb{1}_{\{|y| > 1\}} \, \nu(dy) = \nu\bigl(\{y \in \mathbb{R}^d_* : |y| > 1\}\bigr) = \mathbb{E}\bigl[J_X([0,1] \times D^\infty_1)\bigr] = \mathbb{E}\bigl[N^\infty_1(1)\bigr] < \infty,$$
where the finiteness follows since $\mathbb{E}[N^\infty_1(1)]$ is the intensity of the Poisson process $N^\infty_1$ by Proposition 5.19 (The Jump Measure Process). Let us fix $\delta \in (0,1)$. Then the processes $Y^1_\delta$ and $X - Y^1_\delta$ are independent by Proposition 5.20 (Extracted Jump Processes) and therefore
$$0 < \bigl|\mathbb{E}[e^{i\langle z, X(t)\rangle}]\bigr| = \bigl|\mathbb{E}[e^{i\langle z, Y^1_\delta(t)\rangle}]\bigr| \cdot \bigl|\mathbb{E}[e^{i\langle z, X(t) - Y^1_\delta(t)\rangle}]\bigr| \le \bigl|\mathbb{E}[e^{i\langle z, Y^1_\delta(t)\rangle}]\bigr|$$
for all $t \in [0,\infty)$ and all $z \in \mathbb{R}^d$. For a complex number $c \in \mathbb{C}$, let us denote by $\mathrm{Re}(c)$ and $\mathrm{Im}(c)$ its real and imaginary part, respectively. We recall that $|e^c| = e^{\mathrm{Re}(c)}$ and $\mathrm{Re}(e^c) = e^{\mathrm{Re}(c)} \cos(\mathrm{Im}(c))$, and by a Taylor series representation it holds that $\frac{1}{4}u^2 \le 1 - \cos(u)$ whenever $|u| \le 1$. With this and by Proposition 5.22 (Exponential Formula), it follows that
$$0 < \bigl|\mathbb{E}[e^{i\langle z, X(t)\rangle}]\bigr| \le \bigl|\mathbb{E}[e^{i\langle z, Y^1_\delta(t)\rangle}]\bigr| = \Bigl|\exp\Bigl\{-t \int_{D^1_\delta} [1 - e^{i\langle z, y\rangle}] \, \nu(dy)\Bigr\}\Bigr| = \exp\Bigl\{-t \int_{D^1_\delta} [1 - \cos(\langle z, y\rangle)] \, \nu(dy)\Bigr\} \le \exp\Bigl\{-\tfrac{t}{4} \int_{D^1_\delta} |\langle z, y\rangle|^2 \, \nu(dy)\Bigr\}$$
for all $z \in \mathbb{R}^d$ with $|z| \le 1$. Sending $\delta \downarrow 0$ and using that the left hand side does not depend on $\delta$, it follows that
$$\int_{\mathbb{R}^d_*} |\langle z, y\rangle|^2 \mathbb{1}_{\{|y| \le 1\}} \, \nu(dy) < \infty \quad \text{for all } z \in \mathbb{R}^d \text{ with } |z| \le 1,$$
which in turn implies the result.
Let us return to the main problem in the Lévy-Ito decomposition, which, as we recall, consists of the fact that
$$\sum_{s\in[0,t]} \bigl[X(s) - X(s-)\bigr]$$
may diverge. Observe that we may think of this sum as the limit of $Y^\infty_\varepsilon(t)$ as $\varepsilon \downarrow 0$. To get the divergence issue under control, we have to compensate the sum in a suitable way or, more precisely, compensate $Y^\infty_\varepsilon$ to ensure that a limit exists as $\varepsilon \downarrow 0$. The following result is the main ingredient for this procedure. Let us stress here that the result is only formulated for $Y^d_c$ with $d < \infty$ since large jumps may destroy integrability.
Proposition 5.24 (Compensated Lévy Process). Let $X$ be a Lévy process with Lévy measure $\nu$ and fix $0 < c \le d < \infty$. Then
$$\mathbb{E}\bigl[Y^d_c(t)\bigr] = t \int_{D^d_c} y \, \nu(dy) \quad \text{for all } t \in [0,\infty).$$
Moreover, the stochastic process $M = (M(t))_{t\in[0,\infty)}$ given by
$$M(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^d_c\}} \bigl[J_X(ds \otimes dy) - t\nu(dy)\bigr] \quad \text{for all } t \in [0,\infty)$$
is a martingale satisfying
$$\mathbb{E}\bigl[|M(t)|^2\bigr] = t \int_{D^d_c} |y|^2 \, \nu(dy) < \infty.$$
Proof. We first observe that $Y^d_c$ has finite moments of any order since $d < \infty$, and the integrals with respect to $\nu$ in the statement of the proposition are well-defined by Proposition 5.23 (Integrability of the Lévy Measure). Let us first consider a step function $f : \mathbb{R}^d_* \to \mathbb{R}^d$ of the form
$$f(y) = \sum_{k=1}^n \xi_k \, \mathbb{1}_{\{c_k < |y| \le d_k\}} \quad \text{for all } y \in \mathbb{R}^d_*,$$
where $n \in \mathbb{N}$, $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$ and $0 < c_1 \le d_1 \le \ldots \le c_n \le d_n < \infty$. Then
$$\mathbb{E}\Bigl[\int_{[0,t]\times\mathbb{R}^d_*} f(y) \, J_X(ds \otimes dy)\Bigr] = \mathbb{E}\Bigl[\sum_{k=1}^n \xi_k N^{d_k}_{c_k}(t)\Bigr] = t \sum_{k=1}^n \xi_k \nu\bigl(D^{d_k}_{c_k}\bigr) = t \int_{\mathbb{R}^d_*} f(y) \, \nu(dy)$$
since each $N^{d_k}_{c_k}$ is a Poisson process with intensity $\nu(D^{d_k}_{c_k})$. Moreover, whenever $k, \ell \in \{1, \ldots, n\}$ with $k \ne \ell$, the independence of $N^{d_k}_{c_k}$ and $N^{d_\ell}_{c_\ell}$ implies
$$\mathbb{E}\bigl[(N^{d_k}_{c_k}(t) - t\nu(D^{d_k}_{c_k}))(N^{d_\ell}_{c_\ell}(t) - t\nu(D^{d_\ell}_{c_\ell}))\bigr] = \mathbb{E}\bigl[N^{d_k}_{c_k}(t) - t\nu(D^{d_k}_{c_k})\bigr] \, \mathbb{E}\bigl[N^{d_\ell}_{c_\ell}(t) - t\nu(D^{d_\ell}_{c_\ell})\bigr] = 0,$$
whereas if $k = \ell$ we have
$$\mathbb{E}\bigl[(N^{d_k}_{c_k}(t) - t\nu(D^{d_k}_{c_k}))^2\bigr] = \mathrm{Var}\bigl[N^{d_k}_{c_k}(t) - t\nu(D^{d_k}_{c_k})\bigr] = \mathrm{Var}\bigl[N^{d_k}_{c_k}(t)\bigr] = t\nu\bigl(D^{d_k}_{c_k}\bigr).$$
This shows that
$$\mathbb{E}\Bigl[\Bigl|\int_{[0,t]\times\mathbb{R}^d_*} f(y) \, [J_X(ds \otimes dy) - t\nu(dy)]\Bigr|^2\Bigr] = \mathbb{E}\Bigl[\Bigl|\sum_{k=1}^n \xi_k \bigl[N^{d_k}_{c_k}(t) - t\nu(D^{d_k}_{c_k})\bigr]\Bigr|^2\Bigr] = t \sum_{k=1}^n |\xi_k|^2 \nu\bigl(D^{d_k}_{c_k}\bigr) = t \int_{\mathbb{R}^d_*} |f(y)|^2 \, \nu(dy).$$
Hence, by approximating the mapping
$$y \mapsto y \, \mathbb{1}_{\{|y| \in D^d_c\}}$$
by step functions as in the proof of Proposition 5.22 (Exponential Formula), it follows that
$$\mathbb{E}\bigl[Y^d_c(t)\bigr] = t \int_{D^d_c} y \, \nu(dy) \quad \text{and} \quad \mathbb{E}\bigl[|M(t)|^2\bigr] = t \int_{D^d_c} |y|^2 \, \nu(dy).$$
To see that $M$ is a martingale, we first observe that
$$M(t) = Y^d_c(t) - t \int_{D^d_c} y \, \nu(dy) = Y^d_c(t) - \mathbb{E}\bigl[Y^d_c(t)\bigr] = Y^d_c(t) - t \, \mathbb{E}\bigl[Y^d_c(1)\bigr].$$
In particular, $M$ is a centered Lévy process and thus, by independence of increments,
$$\mathbb{E}\bigl[M(t) - M(s) \,\big|\, \mathcal{F}(s)\bigr] = \mathbb{E}\bigl[M(t) - M(s)\bigr] = 0 \quad \text{a.s.}$$
for all $s, t \in [0,\infty)$ with $s < t$.
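Both moment identities of the proposition are easy to check by simulation when $\nu$ is a finite combination of point masses; the setup below (a compound Poisson process with three possible jump sizes) is our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)

lam, t = 4.0, 2.0
atoms = np.array([0.8, 1.5, 3.0])      # possible jump sizes
weights = np.array([0.5, 0.3, 0.2])    # nu = lam * sum_k weights[k] * delta_{atoms[k]}
c, d = 0.5, 2.0                        # extract jumps with size in (0.5, 2.0]

sel = (atoms > c) & (atoms <= d)
mean_exact = t * lam * np.sum(weights[sel] * atoms[sel])      # t * int_{D_c^d} y nu(dy)
var_exact = t * lam * np.sum(weights[sel] * atoms[sel] ** 2)  # t * int_{D_c^d} y^2 nu(dy)

n_paths = 50_000
Y = np.empty(n_paths)
for i, n in enumerate(rng.poisson(lam * t, size=n_paths)):
    sizes = rng.choice(atoms, p=weights, size=n)
    Y[i] = sizes[(sizes > c) & (sizes <= d)].sum()   # Y_c^d(t) on this path

# E[Y_c^d(t)] matches the nu-integral, and Var[Y_c^d(t)] = E[|M(t)|^2] matches as well.
print(Y.mean(), mean_exact, Y.var(), var_exact)
```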
We are now finally ready for the celebrated Lévy-Ito decomposition.

Theorem 5.25 (Lévy-Ito Decomposition). Let $X$ be a Lévy process with Lévy measure $\nu$. Then there exist independent Lévy processes $B$, $Y$, and $M$ with
$$X(t) = B(t) + Y(t) + M(t) \quad \text{for all } t \in [0,\infty) \text{ a.s.},$$
and such that

(i) $B$ is a Brownian motion with drift,

(ii) $Y$ is given by
$$Y(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| > 1\}} \, J_X(ds \otimes dy) \quad \text{for all } t \in [0,\infty),$$

(iii) and $M$ is a martingale with $\mathbb{E}[|M(t)|^2] < \infty$ for all $t \in [0,\infty)$ such that
$$\lim_{\varepsilon \downarrow 0} \sup_{t\in[0,T]} \bigl|M(t) - M_\varepsilon(t)\bigr| = 0 \quad \text{a.s. for all } T > 0,$$
where for each $\varepsilon \in (0,1]$ the process $M_\varepsilon = (M_\varepsilon(t))_{t\in[0,\infty)}$ is defined as
$$M_\varepsilon(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{\varepsilon < |y| \le 1\}} \bigl[J_X(ds \otimes dy) - t\nu(dy)\bigr], \quad t \in [0,\infty).$$
Proof. Step 1: Construction of $M$. For each $0 < \varepsilon_1 < \varepsilon_2 \le 1$ and $t \in [0,\infty)$, it holds that
$$M_{\varepsilon_1}(t) - M_{\varepsilon_2}(t) = \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{|y| \in D^{\varepsilon_2}_{\varepsilon_1}\}} \bigl[J_X(ds \otimes dy) - t\nu(dy)\bigr],$$
and it follows from Proposition 5.24 (Compensated Lévy Process) that $M_{\varepsilon_1} - M_{\varepsilon_2}$ is both a Lévy process and a martingale satisfying
$$\mathbb{E}\bigl[|M_{\varepsilon_1}(t) - M_{\varepsilon_2}(t)|^2\bigr] = t \int_{\mathbb{R}^d_*} |y|^2 \mathbb{1}_{\{\varepsilon_1 < |y| \le \varepsilon_2\}} \, \nu(dy) < \infty.$$
Fixing $T > 0$, Doob's $L^p$ Inequality (Theorem 4.10) shows that
$$\mathbb{E}\Bigl[\sup_{t\in[0,T]} \bigl|M_{\varepsilon_1}(t) - M_{\varepsilon_2}(t)\bigr|^2\Bigr] \le 4 \, \mathbb{E}\bigl[|M_{\varepsilon_1}(T) - M_{\varepsilon_2}(T)|^2\bigr] = 4T \int_{\mathbb{R}^d_*} |y|^2 \mathbb{1}_{\{\varepsilon_1 < |y| \le \varepsilon_2\}} \, \nu(dy).$$
By Proposition 5.23 (Integrability of the Lévy Measure), the right hand side can be made arbitrarily small by making $\varepsilon_2$ sufficiently small. Hence, whenever $\{\varepsilon_k\}_{k\in\mathbb{N}} \subset (0,1]$ is a sequence with $\varepsilon_k \downarrow 0$, it follows that
$$\lim_{k,\ell\to\infty} \mathbb{E}\Bigl[\sup_{t\in[0,T]} \bigl|M_{\varepsilon_k}(t) - M_{\varepsilon_\ell}(t)\bigr|^2\Bigr] = 0,$$
i.e., uniformly in $t \in [0,T]$, $\{M_{\varepsilon_k}(t)\}_{k\in\mathbb{N}}$ is Cauchy in the space of square-integrable random variables, and hence converges to some square-integrable random variable $M^T_t$. Since convergence in mean-square implies convergence in probability and monotone convergence implies that
$$\mathbb{E}\Bigl[\sum_{k=1}^\infty \sup_{t\in[0,T]} \bigl|M_{\varepsilon_{k+1}}(t) - M_{\varepsilon_k}(t)\bigr|^2\Bigr] \le 4T \int_{\mathbb{R}^d_*} |y|^2 \mathbb{1}_{\{0 < |y| \le 1\}} \, \nu(dy) < \infty,$$
it follows that, still uniformly in $t \in [0,T]$, $\{M_{\varepsilon_k}(t)\}_{k\in\mathbb{N}}$ also converges almost surely to $M^T_t$. In particular, there exists a $\mathbb{P}$-nullset $N_T \in \mathcal{A}$ such that the function $t \mapsto M_{\varepsilon_k}(t,\omega)$ converges in the space of càdlàg functions to $t \mapsto M^T_t(\omega)$ uniformly in $t \in [0,T]$ for all $\omega \notin N_T$, which shows that the mapping $t \mapsto M^T_t(\omega)$ is càdlàg for all $\omega \notin N_T$.

A standard argument shows that $M^T_t$ does not depend on the particular choice of the sequence $\{\varepsilon_k\}_{k\in\mathbb{N}}$ and that, whenever $0 < T_1 < T_2$,
$$M^{T_1}_t(\omega) = M^{T_2}_t(\omega) \quad \text{for all } t \in [0,T_1] \text{ and all } \omega \notin N_{T_1} \cup N_{T_2}.$$
Now define $N := \bigcup_{T\in\mathbb{N}} N_T$ and $M = (M(t))_{t\in[0,\infty)}$ by
$$M(t) := \mathbb{1}_{N^c} M^T_t \quad \text{for all } t \in [0,T] \text{ and } T \in \mathbb{N}.$$
By the above discussion, $M$ is well-defined.
Step 2: Properties of $M$. Since $N$ is a nullset and $\mathbb{F}$ is complete, it follows that $M$ is adapted. Next, it is clear that for any $T > 0$ we have
$$\lim_{\varepsilon \downarrow 0} \sup_{t\in[0,T]} \bigl|M(t) - M_\varepsilon(t)\bigr| = 0 \quad \text{a.s.}$$
The square-integrability of $M(t)$ for each $t \in [0,\infty)$ follows from the fact that $M(t)$ is also the limit of $M_\varepsilon(t)$ in mean-square. To see that $M$ is a martingale, it suffices to observe that for any $s, t \in [0,\infty)$ with $s < t$ and $F \in \mathcal{F}(s)$, we have
$$\mathbb{E}\bigl[\mathbb{1}_F M(s)\bigr] = \lim_{k\to\infty} \mathbb{E}\bigl[\mathbb{1}_F M_{\varepsilon_k}(s)\bigr] = \lim_{k\to\infty} \mathbb{E}\bigl[\mathbb{1}_F M_{\varepsilon_k}(t)\bigr] = \mathbb{E}\bigl[\mathbb{1}_F M(t)\bigr],$$
showing that $M(s) = \mathbb{E}[M(t)|\mathcal{F}(s)]$ almost surely. Finally, to see that $M$ is a Lévy process, we first observe that $M$ is càdlàg by construction and $M(0) = 0$ holds without loss of generality since $M(0) = \lim_{\varepsilon \downarrow 0} M_\varepsilon(0) = 0$ almost surely (redefine $M$ by setting $M \equiv 0$ on the nullset $\{M(0) \ne 0\}$). Stationarity of the increments of $M$ follows since $M_\varepsilon(t) - M_\varepsilon(s)$ and $M_\varepsilon(t-s)$ are identically distributed and converge in distribution to $M(t) - M(s)$ and $M(t-s)$, respectively. Independence of the increments follows similarly since independence is stable under almost sure convergence.
Step 3: Construction of $B$. Take an arbitrary sequence $\{\varepsilon_k\}_{k\in\mathbb{N}} \subset (0,1]$ with $\varepsilon_k \downarrow 0$. For each $k \in \mathbb{N}$, let us define $B_k = (B_k(t))_{t\in[0,\infty)}$ by
$$B_k(t) := X(t) - Y(t) - M_{\varepsilon_k}(t) \quad \text{for all } t \in [0,\infty).$$
Moreover, let $B = (B(t))_{t\in[0,\infty)}$ be the adapted stochastic process given by
$$B(t) := X(t) - Y(t) - M(t) \quad \text{for all } t \in [0,\infty).$$
As $k \to \infty$, it follows from the previous step that $B_k$ converges almost surely to $B$. Moreover, the jumps of $B_k$ are by construction bounded by $\varepsilon_k$. Hence $B$ is almost surely continuous and, by completeness of $\mathbb{F}$, without loss of generality continuous (again, after possibly setting $B \equiv 0$ on the $\mathcal{F}(0)$-measurable nullset on which continuity fails). By Proposition 5.20 (Extracted Jump Processes), each $B_k$ is a Lévy process and $B_k, Y, M_{\varepsilon_k}$ are independent. But, as for the process $M$, this in turn implies that $B$ must be a Lévy process as well and $B, Y, M$ are independent. But then $B$ is a continuous Lévy process and thus a Brownian motion with drift by Theorem 5.15 (Brownian Motion as a Lévy Process). Since the decomposition of $X$ as the sum of $B$, $Y$, and $M$ holds by construction, this concludes the proof.
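The structure of the proof can be mirrored in a small grid-based simulation (a sketch, not part of the notes): assemble a jump diffusion from a drift-plus-Brownian part $B$, the large jumps $Y$, and the small jumps $M$, and observe that the pieces recombine to the full jump sum. For a symmetric small-jump distribution the compensator drift $t \int_{\{|y| \le 1\}} y \, \nu(dy)$ vanishes, so no explicit compensation term is needed in this particular example; all parameters are our own choices.

```python
import numpy as np

rng = np.random.default_rng(4)

T, n_steps = 1.0, 1_000
grid = np.linspace(T / n_steps, T, n_steps)
mu, sigma, lam = 0.3, 0.5, 6.0

n_jumps = rng.poisson(lam * T)
jump_times = rng.uniform(0.0, T, size=n_jumps)
jump_sizes = rng.normal(0.0, 1.2, size=n_jumps)   # symmetric, so mean zero

def cumulative_jumps(select):
    """Running sum over the grid of the jumps picked out by `select`."""
    keep = select(np.abs(jump_sizes))
    return np.array([jump_sizes[keep & (jump_times <= t)].sum() for t in grid])

B = mu * grid + sigma * np.cumsum(rng.normal(0.0, np.sqrt(T / n_steps), n_steps))
Y = cumulative_jumps(lambda m: m > 1.0)      # large jumps, kept as they are
M = cumulative_jumps(lambda m: m <= 1.0)     # small jumps (compensator = 0 here)
X = B + Y + M                                # the decomposition, by construction

# Sanity check: the large and small jump parts together recover all jumps.
print(np.allclose(Y + M, cumulative_jumps(lambda m: m >= 0.0)))
```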
Let us remark here that choosing the process $Y$ in the Lévy-Ito decomposition to consist of the jumps of $X$ of size greater than one is entirely arbitrary. We could have separated the large and the small jumps of $X$ at any other level $\delta \in (0,\infty)$. Changing from the cutoff level one to another cutoff level $\delta \in (0,\infty)$ changes the drift parameter of the Brownian motion with drift. Also, note that once this cutoff level is fixed, the Lévy-Ito decomposition is unique up to indistinguishability of the processes $B$, $Y$, and $M$.
The Lévy-Ito decomposition together with the exponential formula allowsus to compute the characteristic function of any Lévy process explicitly. Thisresult is known as the Lévy-Khintchine formula and shows that the dis-tribution of any Lévy process is determined by its Lévy measure ν and theparameters of the Brownian motion with drift in the Lévy-Ito decomposition.
Corollary 5.26 (Lévy-Khintchine Formula). Let $X$ be a Lévy process with Lévy measure $\nu$. Then there exist $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d\times d}$ symmetric and positive semidefinite such that
$$\varphi_{X(t)}(z) = e^{t\Psi(z)} \quad \text{for all } z \in \mathbb{R}^d \text{ and } t \in [0,\infty),$$
where $\Psi : \mathbb{R}^d \to \mathbb{C}$ is defined as
$$\Psi(z) := i\langle z, \mu\rangle - \tfrac{1}{2}\langle z, \Sigma z\rangle + \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle \mathbb{1}_{\{0 < |y| \le 1\}}\bigr] \, \nu(dy)$$
for all $z \in \mathbb{R}^d$.
Proof. Step 1: Let us first observe that the integral in the definition of $\Psi$ is well-defined. If $|y| > 1$, it is clear that the integrand $e^{i\langle z, y\rangle} - 1$ is bounded for all $z \in \mathbb{R}^d$. Moreover, for any $z \in \mathbb{R}^d$ and any $y \in \mathbb{R}^d_*$ with $|y| \le 1$, a Taylor expansion of $y \mapsto e^{i\langle z, y\rangle}$ around zero shows that
$$e^{i\langle z, y\rangle} = 1 + i\langle z, y\rangle - \tfrac{1}{2}|\langle z, y\rangle|^2 \theta \quad \text{for some } \theta \in \mathbb{C} \text{ with } |\theta| \le 1.$$
But then
$$\bigl|e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle\bigr| = \tfrac{1}{2}|\theta||\langle z, y\rangle|^2 \le \tfrac{1}{2}|z|^2|y|^2,$$
which is integrable by Proposition 5.23 (Integrability of the Lévy Measure).

Step 2: Let us now turn to the characteristic function. By infinite divisibility, it suffices to prove the claim for $t = 1$. Denote by $(B, Y, M)$ the Lévy-Ito decomposition of $X$. By independence, we have
$$\varphi_{X(1)}(z) = \varphi_{B(1)}(z) \, \varphi_{Y(1)}(z) \, \varphi_{M(1)}(z) \quad \text{for all } z \in \mathbb{R}^d.$$
Since $B$ is a Brownian motion with drift, $B(1)$ is normally distributed with mean $\mu := \mathbb{E}[B(1)]$ and covariance matrix $\Sigma := \mathrm{Cov}[B(1)]$, i.e.
$$\varphi_{B(1)}(z) = \exp\bigl\{i\langle z, \mu\rangle - \tfrac{1}{2}\langle z, \Sigma z\rangle\bigr\} \quad \text{for all } z \in \mathbb{R}^d.$$
Moreover, $Y = Y^\infty_1$ and Proposition 5.22 (Exponential Formula) implies
$$\varphi_{Y(1)}(z) = \exp\Bigl\{\int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1\bigr] \mathbb{1}_{\{|y| > 1\}} \, \nu(dy)\Bigr\} \quad \text{for all } z \in \mathbb{R}^d.$$
Similarly, if $\varepsilon \in (0,1]$ and if we let $M_\varepsilon = (M_\varepsilon(t))_{t\in[0,\infty)}$ be given by
$$M_\varepsilon(t) := \int_{[0,t]\times\mathbb{R}^d_*} y \, \mathbb{1}_{\{\varepsilon < |y| \le 1\}} \bigl[J_X(ds \otimes dy) - t\nu(dy)\bigr] \quad \text{for all } t \in [0,\infty),$$
it follows that
$$\varphi_{M_\varepsilon(1)}(z) = \exp\Bigl\{\int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle\bigr] \mathbb{1}_{\{\varepsilon < |y| \le 1\}} \, \nu(dy)\Bigr\} \quad \text{for all } z \in \mathbb{R}^d.$$
The result is thus a consequence of $M_\varepsilon(1) \to M(1)$ a.s. (hence also in distribution, i.e. $\varphi_{M_\varepsilon(1)} \to \varphi_{M(1)}$) as $\varepsilon \downarrow 0$ and dominated convergence.
The Lévy-Khintchine formula implies that the distribution of a Lévy processX is determined by (µ,Σ, ν), the characteristic triplet of X.
Definition 5.27 (Characteristic Triplet). Let $X$ be a Lévy process with Lévy measure $\nu$ and characteristic function
$$\varphi_{X(1)}(z) = \exp\Bigl\{i\langle z, \mu\rangle - \tfrac{1}{2}\langle z, \Sigma z\rangle + \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle \mathbb{1}_{\{0 < |y| \le 1\}}\bigr] \, \nu(dy)\Bigr\}$$
for all $z \in \mathbb{R}^d$ and some $\mu \in \mathbb{R}^d$ and symmetric and positive semidefinite $\Sigma \in \mathbb{R}^{d\times d}$. Then $(\mu, \Sigma, \nu)$ is called the characteristic triplet of $X$.
We conclude this chapter by showing that if $\mu \in \mathbb{R}^d$, $\Sigma \in \mathbb{R}^{d\times d}$ is symmetric and positive semidefinite, and $\nu$ is a measure on $(\mathbb{R}^d_*, \mathcal{B}(\mathbb{R}^d_*))$ with
$$\int_{\mathbb{R}^d_*} |y|^2 \wedge 1 \, \nu(dy) < \infty,$$
there exists a Lévy process $X$ with characteristic triplet $(\mu, \Sigma, \nu)$.

Theorem 5.28 (Existence of Lévy Processes). Let $\mu \in \mathbb{R}^d$, $\Sigma \in \mathbb{R}^{d\times d}$ symmetric and positive semidefinite, and $\nu$ a measure on $(\mathbb{R}^d_*, \mathcal{B}(\mathbb{R}^d_*))$ satisfying
$$\int_{\mathbb{R}^d_*} |y|^2 \wedge 1 \, \nu(dy) < \infty.$$
Then there exists a Lévy process with characteristic triplet $(\mu, \Sigma, \nu)$.
Proof. Step 1: Let $W_1, \ldots, W_d$ be independent Brownian motions. Writing $W = (W_1, \ldots, W_d)$, it follows that the process $B = (B(t))_{t\in[0,\infty)}$ with
$$B(t) := \mu t + \sqrt{\Sigma} \, W(t) \quad \text{for all } t \in [0,\infty)$$
is a Brownian motion with drift such that $\mathbb{E}[B(t)] = \mu t$ and $\mathrm{Cov}[B(t)] = \Sigma t$.
Step 2: Since $\lambda := \nu(\{y \in \mathbb{R}^d_* : |y| > 1\}) < \infty$, there exists a Poisson process $N = (N(t))_{t\in[0,\infty)}$ with intensity $\lambda$. If $\lambda > 0$, we can define a probability measure $\rho$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ by
$$\rho(B) := \frac{1}{\lambda} \int_{\mathbb{R}^d_*} \mathbb{1}_B(y) \, \mathbb{1}_{\{|y| > 1\}} \, \nu(dy) \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^d).$$
Now let $\{Z_n\}_{n\in\mathbb{N}}$ be a sequence of independent and identically distributed random variables independent of $N$ such that $Z_1$ has distribution $\rho$. Define a compound Poisson process $Y = (Y(t))_{t\in[0,\infty)}$ by
$$Y(t) := \sum_{n=1}^{N(t)} Z_n \quad \text{for all } t \in [0,\infty).$$
Clearly, for all $t \in [0,\infty)$ and $z \in \mathbb{R}^d$,
$$\varphi_{Y(t)}(z) = \mathbb{E}\bigl[e^{i\langle z, Y(t)\rangle}\bigr] = \sum_{n=0}^\infty \mathbb{P}\bigl[N(t) = n\bigr] \, \mathbb{E}\bigl[e^{i\langle z, Y(t)\rangle} \,\big|\, N(t) = n\bigr] = \sum_{n=0}^\infty e^{-\lambda t} \frac{(\lambda t)^n}{n!} \, \mathbb{E}\bigl[e^{i\sum_{k=1}^n \langle z, Z_k\rangle}\bigr] = e^{-\lambda t} \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} \, \mathbb{E}\bigl[e^{i\langle z, Z_1\rangle}\bigr]^n,$$
where we have used that $N(t)$ is Poisson distributed with parameter $\lambda t$ for the third equality and the independent and identical distribution property of $\{Z_n\}_{n\in\mathbb{N}}$ for the last equality. Using the definition of $\rho$, we have
$$\mathbb{E}\bigl[e^{i\langle z, Z_1\rangle}\bigr] = \int_{\mathbb{R}^d} e^{i\langle z, y\rangle} \, \rho(dy) = \frac{1}{\lambda} \int_{\mathbb{R}^d_*} e^{i\langle z, y\rangle} \mathbb{1}_{\{|y| > 1\}} \, \nu(dy).$$
Plugging this into the expression for $\varphi_{Y(t)}$ hence shows that
$$\varphi_{Y(t)}(z) = e^{-\lambda t} \sum_{n=0}^\infty \frac{t^n}{n!} \Bigl[\int_{\mathbb{R}^d_*} e^{i\langle z, y\rangle} \mathbb{1}_{\{|y| > 1\}} \, \nu(dy)\Bigr]^n = e^{-\lambda t} \exp\Bigl\{t \int_{\mathbb{R}^d_*} e^{i\langle z, y\rangle} \mathbb{1}_{\{|y| > 1\}} \, \nu(dy)\Bigr\}.$$
Using $\lambda = \nu(\{y \in \mathbb{R}^d_* : |y| > 1\}) = \int_{\mathbb{R}^d_*} \mathbb{1}_{\{|y| > 1\}} \, \nu(dy)$ therefore yields
$$\varphi_{Y(t)}(z) = \exp\Bigl\{t \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1\bigr] \mathbb{1}_{\{|y| > 1\}} \, \nu(dy)\Bigr\}.$$
If $\lambda = \nu(\{y \in \mathbb{R}^d_* : |y| > 1\}) = 0$, we simply define $Y \equiv 0$ and the same representation of $\varphi_{Y(t)}$ is valid.
Step 3: For each $n \in \mathbb{N}$, we can construct as in step 2 a compound Poisson process $Y_n = (Y_n(t))_{t\in[0,\infty)}$ such that the underlying Poisson process $N_n = (N_n(t))_{t\in[0,\infty)}$ has intensity $\lambda_n := \nu(\{y \in \mathbb{R}^d_* : 1/(n+1) < |y| \le 1/n\}) < \infty$ and, unless $\lambda_n = 0$, the jump size distribution is given by the measure $\rho_n$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ given by
$$\rho_n(B) := \frac{1}{\lambda_n} \int_{\mathbb{R}^d} \mathbb{1}_B(y) \, \mathbb{1}_{\{1/(n+1) < |y| \le 1/n\}} \, \nu(dy) \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^d).$$
Moreover, we choose these processes such that $\{Y_n\}_{n\in\mathbb{N}}$ are mutually independent, from which it follows that each $Y_n$ is a Lévy process with respect to the filtration $\mathbb{F}^M = (\mathcal{F}^M(t))_{t\in[0,\infty)}$ given by
$$\mathcal{F}^M(t) := \sigma\bigl(\mathcal{F}^{Y_n}(t) : n \in \mathbb{N}\bigr) \quad \text{for all } t \in [0,\infty).$$
Observe that, as in step 2, for any $z \in \mathbb{R}^d$,
$$\varphi_{Y_n(t)}(z) = \exp\Bigl\{t \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1\bigr] \mathbb{1}_{\{1/(n+1) < |y| \le 1/n\}} \, \nu(dy)\Bigr\},$$
and since $Y_n$ has bounded jumps it has moments of any order. In particular,
$$\mathbb{E}\bigl[Y_n(t)\bigr] = \frac{1}{i} D_z \varphi_{Y_n(t)}(z)\Big|_{z=0} = t \int_{\mathbb{R}^d_*} y \, \mathbb{1}_{\{1/(n+1) < |y| \le 1/n\}} \, \nu(dy).$$
Defining $M_n = (M_n(t))_{t\in[0,\infty)}$ by $M_n := Y_n - \mathbb{E}[Y_n]$ and using the independence of $\{Y_n\}_{n\in\mathbb{N}}$, it follows that $M_n$ is both a Lévy process and a centered martingale with $\mathbb{E}[|M_n(t)|^2] < \infty$ and
$$\varphi_{M_n(t)}(z) = \exp\Bigl\{t \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle\bigr] \mathbb{1}_{\{1/(n+1) < |y| \le 1/n\}} \, \nu(dy)\Bigr\}.$$
Setting $M^n := \sum_{k=1}^n M_k$, it follows from the independence of $Y_1, \ldots, Y_n$ that $M^n$ is also a Lévy process and a centered martingale with $\mathbb{E}[|M^n(t)|^2] < \infty$ and
$$\varphi_{M^n(t)}(z) = \exp\Bigl\{t \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle\bigr] \mathbb{1}_{\{1/(n+1) < |y| \le 1\}} \, \nu(dy)\Bigr\}.$$
As in the proof of the Lévy-Ito decomposition, the processes $\{M^n\}_{n\in\mathbb{N}}$ converge almost surely (uniformly on any finite time interval $[0,t]$) to a càdlàg martingale and Lévy process $M = (M(t))_{t\in[0,\infty)}$ such that
$$\varphi_{M(t)}(z) = \exp\Bigl\{t \int_{\mathbb{R}^d_*} \bigl[e^{i\langle z, y\rangle} - 1 - i\langle z, y\rangle\bigr] \mathbb{1}_{\{0 < |y| \le 1\}} \, \nu(dy)\Bigr\}.$$
Step 4: It is clear that the processes $B$, $Y$, and $M$ can be constructed to be mutually independent. We already know that $M$ is a Lévy process with respect to $\mathbb{F}^M$, and it is clear that $Y$ and $B$ are Lévy processes with respect to their natural filtrations $\mathbb{F}^Y$ and $\mathbb{F}^B$, respectively. Since $B, Y, M$ are independent, it follows moreover that all three processes are Lévy processes with respect to the filtration $\mathbb{F} = (\mathcal{F}(t))_{t\in[0,\infty)}$ given by
$$\mathcal{F}(t) := \sigma\bigl(\mathcal{F}^B(t), \mathcal{F}^Y(t), \mathcal{F}^M(t)\bigr) \quad \text{for all } t \in [0,\infty).$$
But then, again by independence, $X := B + Y + M$ is also an $\mathbb{F}$-Lévy process with characteristic triplet $(\mu, \Sigma, \nu)$.
INDEX
σ-Field of the τ-Past, 16

Adaptedness, 11

Brownian Bridge, 32
Brownian Motion, 28
  Finite-Dimensional Distributions, 30
  Geometric, 65
  Multidimensional, 113
  with Drift, 113

Càdlàg, 9
Canonical Process, 13
Cauchy Distribution, 102
Central Limit Theorem, 114
Compound Poisson Process, 101
  Jump Size Distribution, 102
Continuity in Probability, 103
  Lévy Process, 104
Counting Process, 8
Covariance Function, 30

Doob's Lp Inequality
  Continuous Time, 91
  Discrete Time, 72
Doob's Maximal Inequalities
  Continuous Time, 92
  Discrete Time, 71
Doob's Upcrossing Inequality, 79
Dyadic Rationals, 51
  Dyadic Modulus, 51
  Neighbors, 51

Filtered Probability Space, 10
Filtration, 10
  Complete, 58
  Natural, 12
  Right Continuous, 20
  Stopped, 23
Finite Variation, 9
Finite-Dimensional Distributions, 3
  Brownian Motion, 30
Fractional Brownian Motion, 34
  Hurst Index, 34

Galmarino's Test, 13
Gaussian Process, 30
  Brownian Bridge, 32
  Brownian Motion, 28
  Centered, 30
  Covariance Function, 30
  Fractional Brownian Motion, 34
  Langevin Process, 75
  Ornstein-Uhlenbeck Process, 33

Hölder Continuity, 35
Hitting Time, 18

Increments
  Independent, 6
  Stationary, 6
Indistinguishability, 50
Infinite Divisibility, 107
  of Lévy Processes, 105

Jump Measure, 122

Kolmogorov's Consistency Condition, 42
Kolmogorov's Consistency Theorem, 43
Kolmogorov-Centsov Continuity Theorem, 56

Lévy Measure, 128
  Brownian Motion with Drift, 128
  Poisson Process, 128
Lévy Process, 98
  Brownian Motion, 98, 114
  Characteristic Triplet, 137
  Compound Poisson Process, 101
  Continuity in Probability, 104
  Infinite Divisibility, 105
  Jump Measure, 122
  Lévy Measure, 128
  Lévy-Ito Decomposition, 132
  Lévy-Khintchine Formula, 136
  Markov Property, 107
  Poisson Process, 100, 117
  Strong Markov Property, 109
Langevin Process, 75

Markov Chain, 7
Markov Property, 108
  Lévy Process, 107
Martingale, 61
  see also Submartingale
  Càdlàg Modification, 95
  Closed, 64, 88
  Orthogonality of Increments, 73
  Wald's Identity, 70
Martingale Convergence Theorem, 84
Memorylessness Property, 117
Modification, 50

Optional Sampling
  Continuous Time, 91
  Discrete Time, 69
Optional Stopping
  Continuous Time, 89
  Discrete Time, 66
Optional Time, 22
Ornstein-Uhlenbeck Process, 33

Poisson Process, 100

Random Time Change, 102
Random Walk, 5
Renewal Process, 8

Stochastic Process, 1
  Adaptedness, 11
  Continuous Time, 4
  Discrete Time, 4
  Distribution, 3
  Path, 2
  Stopped Process, 22
Stopping Time, 13
  Stopped Filtration, 23
  Stopped Process, 22
Strong Markov Property
  Lévy Process, 109
Submartingale, 61
  Càdlàg Modification, 93
Subordination, 102
Supermartingale, 61
  see also Submartingale

Time Index Set, 1
Total Variation, 9

Upcrossings, 77

White Noise, 4
Wiener Process, 28
  see also Brownian Motion
VOCABULARY
English – German

σ-field – σ-Algebra
σ-field of the τ-past – σ-Algebra der τ-Vergangenheit
adapted – adaptiert
annulus – Kreisring
Brownian motion – Brownsche Bewegung
Brownian motion with drift – Brownsche Bewegung mit Drift
Brownian bridge – Brownsche Brücke
canonical process – kanonischer Prozess
centered – zentriert
central limit theorem – zentraler Grenzwertsatz
characteristic triplet – charakteristisches Triplett
closed martingale – geschlossenes Martingal
complete – vollständig
compound Poisson process – zusammengesetzter Poisson Prozess
continuity in probability – Stetigkeit in Wahrscheinlichkeit
continuous extension – stetige Fortsetzung
continuous time – zeitstetig
coordinate process – Koordinatenprozess
coordinate projection – Koordinatenprojektion
countable – abzählbar
counting process – Zählprozess
decreasing submartingale convergence – absteigende Submartingalkonvergenz
denominator – Nenner
discrete time – zeitdiskret
distribution – Verteilung
dyadic modulus – dyadisches (Stetigkeits-)Modul
dyadic rationals – dyadische Zahlen
numerator – Zähler
filtered probability space – filtrierter Wahrscheinlichkeitsraum
filtration – Filtrierung
finite-dimensional cylinder set – endlich-dimensionale Zylindermenge
finite-dimensional distributions – endlich-dimensionale Verteilungen
fractional Brownian motion – fraktionale Brownsche Bewegung
Gaussian process – Gaußprozess
geometric Brownian motion – geometrische Brownsche Bewegung
hitting time – Treffzeit
identity mapping – Identitätsabbildung
identity matrix – Einheitsmatrix
increasing submartingale convergence – aufsteigende Submartingalkonvergenz
indistinguishable – ununterscheidbar
infinite divisibility – unendliche Teilbarkeit
jump measure – Sprungmaß
jump size distribution – Sprunghöhenverteilung
Kolmogorov's consistency condition – Kolmogorov's Konsistenzbedingung
Lévy-Ito decomposition – Lévy-Ito Zerlegung
Lévy measure – Lévy-Maß
Markov chain – Markov-Kette
Markov property – Markov-Eigenschaft
martingale – Martingal
mean reversion – Mittelwertrückkehr
measurable space – messbarer Raum bzw. Messraum
memorylessness – Gedächtnislosigkeit
modification – Modifikation
one-dimensional distributions – eindimensionale Verteilungen
optional time – optionale Zeit
probability space – Wahrscheinlichkeitsraum
path – Pfad
premeasure – Prämaß
random measure – Zufallsmaß
random time change – zufälliger Zeitwechsel
random variable – Zufallsvariable
random walk – Irrfahrt
raw process – roher Prozess
renewal process – Erneuerungsprozess
stationary – stationär
stochastic process – stochastischer Prozess
stopping time – Stoppzeit
strong Markov property – starke Markov-Eigenschaft
submartingale – Submartingal
subordination – Subordination
supermartingale – Supermartingal
time index set – Zeitindexmenge
total variation – Totalvariation
uniformly integrable – gleichgradig integrierbar
upcrossing – Aufkreuzung
upcrossing inequality – Aufkreuzungsungleichung