Filtering and Measure Theory

Upload: hamood-riasat-khan

Post on 07-Apr-2018

TRANSCRIPT

  • 8/6/2019 Filtering and Measure Theory

    1/270

    http://www.cambridge.org/9780521838030

    Measure Theory and Filtering

    Introduction and Applications

    The estimation of noisily observed states from a sequence of data has traditionally incorporated ideas from Hilbert spaces and calculus-based probability theory. As conditional expectation is the key concept, the correct setting for filtering theory is that of a probability space. Graduate engineers, mathematicians, and those working in quantitative finance wishing to use filtering techniques will find in the first half of this book an accessible introduction to measure theory, stochastic calculus, and stochastic processes, with particular emphasis on martingales and Brownian motion. Exercises are included, solutions to which are available from www.cambridge.org. The book then provides an excellent user's guide to filtering: basic theory is followed by a thorough treatment of Kalman filtering, including recent results that extend the Kalman filter to provide parameter estimates. These ideas are then applied to problems arising in finance, genetics, and population modelling in three separate chapters, making this a comprehensive resource for both practitioners and researchers.

    Lakhdar Aggoun is Associate Professor in the Department of Mathematics and Statistics at Sultan Qaboos University, Oman.

    Robert Elliott is RBC Financial Group Professor of Finance at the University of Calgary,

    Canada.


    CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

    Editorial Board

    R. Gill (Department of Mathematics, Utrecht University)
    B. D. Ripley (Department of Statistics, University of Oxford)
    S. Ross (Department of Industrial Engineering, University of California, Berkeley)
    M. Stein (Department of Statistics, University of Chicago)
    B. Silverman (St. Peter's College, University of Oxford)

    This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

    Already published

    1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
    2. Markov Chains, by J. Norris
    3. Asymptotic Statistics, by A. W. van der Vaart
    4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
    5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
    6. Empirical Processes in M-Estimation, by Sara van de Geer
    7. Numerical Methods of Statistics, by John F. Monahan
    8. A User's Guide to Measure Theoretic Probability, by David Pollard
    9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
    10. Data Analysis and Graphics Using R, by John Maindonald and John Braun
    11. Statistical Models, by A. C. Davison
    12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll
    13. Exercises in Probability, by Loïc Chaumont and Marc Yor


    Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

    Cambridge University Press
    The Edinburgh Building, Cambridge, UK

    First published in print format 2004

    © Cambridge University Press 2004

    Information on this title: www.cambridge.org/9780521838030

    This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

    Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

    Published in the United States of America by Cambridge University Press, New York

    www.cambridge.org

    eBook (EBL)
    hardback

    Contents

    Preface

    Part I  Theory

    1  Basic probability concepts
    1.1  Random experiments and probabilities
    1.2  Conditional probabilities and independence
    1.3  Random variables
    1.4  Conditional expectations
    1.5  Problems

    2  Stochastic processes
    2.1  Definitions and general results
    2.2  Stopping times
    2.3  Discrete time martingales
    2.4  Doob decomposition
    2.5  Continuous time martingales
    2.6  Doob-Meyer decomposition
    2.7  Brownian motion
    2.8  Brownian motion process with drift
    2.9  Brownian paths
    2.10  Poisson process
    2.11  Problems

    3  Stochastic calculus
    3.1  Introduction
    3.2  Quadratic variations
    3.3  Simple examples of stochastic integrals
    3.4  Stochastic integration with respect to a Brownian motion
    3.5  Stochastic integration with respect to general martingales
    3.6  The Ito formula for semimartingales
    3.7  The Ito formula for Brownian motion
    3.8  Representation results
    3.9  Random measures
    3.10  Problems

    4  Change of measures
    4.1  Introduction
    4.2  Measure change for discrete time processes
    4.3  Girsanov's theorem
    4.4  The single jump process
    4.5  Change of parameter in Poisson processes
    4.6  Poisson process with drift
    4.7  Continuous-time Markov chains
    4.8  Problems

    Part II  Applications

    5  Kalman filtering
    5.1  Introduction
    5.2  Discrete-time scalar dynamics
    5.3  Recursive estimation
    5.4  Vector dynamics
    5.5  The EM algorithm
    5.6  Discrete-time model parameter estimation
    5.7  Finite-dimensional filters
    5.8  Continuous-time vector dynamics
    5.9  Continuous-time model parameters estimation
    5.10  Direct parameter estimation
    5.11  Continuous-time nonlinear filtering
    5.12  Problems

    6  Financial applications
    6.1  Volatility estimation
    6.2  Parameter estimation
    6.3  Filtering a price process
    6.4  Parameter estimation for a modified Kalman filter
    6.5  Estimating the implicit interest rate of a risky asset

    7  A genetics model
    7.1  Introduction
    7.2  Recursive estimates
    7.3  Approximate formulae

    8  Hidden populations
    8.1  Introduction
    8.2  Distribution estimation
    8.3  Parameter estimation
    8.4  Pathwise estimation
    8.5  A Markov chain model
    8.6  Recursive parameter estimation
    8.7  A tags loss model
    8.8  Gaussian noise approximation

    References
    Index


    Preface

    Traditional courses for engineers in filtering and signal processing have been based on elementary linear algebra, Hilbert space theory and calculus. However, the key objective underlying such procedures is the (recursive) estimation of indirectly observed states given observed data. This means that one is discussing conditional expected values, given the observations. The correct setting for conditional expected value is in the context of measurable spaces equipped with a probability measure, and the initial object of this book is to provide an overview of required measure theory. Secondly, conditional expectation, as an inverse operation, is best formulated as a form of Bayes' Theorem. A mathematically pleasing presentation of Bayes' Theorem is to consider processes as being initially defined under a reference probability. This is an idealized probability under which all the observations are independent and identically distributed. The reference probability is a much nicer measure under which to work. A suitably defined change of measure then transforms the distribution of the observations to their real world form. This setting for the derivation of the estimation and filtering results enables more general results to be obtained in a transparent way.

    The book commences with a leisurely and intuitive introduction to σ-fields and the results in measure theory that will be required.

    The first chapter also discusses random variables, integration and conditional expectation.

    Chapter 2 introduces stochastic processes, with particular emphasis on martingales and

    Brownian motion.

    Stochastic calculus is developed in Chapter 3 and techniques related to changing probability measures are described in Chapter 4.

    The change of measure method is the basic technique used in this book.

    The second part of the book commences with a treatment of Kalman filtering in

    Chapter 5. Recent results, which extend the Kalman filter and enable parameter estimates

    to be obtained, are included. These results are applied to financial models in Chapter 6. The

    final two chapters give some filtering applications to genetics and population models.

    The authors would like to express their gratitude to Professor Nadjib Bouzar of the

    Department of Mathematics and Computer Science, University of Indianapolis, for the

    incredible amount of time he spent reading through the whole manuscript and making

    many useful suggestions.

    Robert Elliott would like to acknowledge the support of NSERC and the hospitality of

    the Department of Applied Mathematics at the University of Adelaide, South Australia.


    Lakhdar Aggoun would like to acknowledge the support of the Department of Mathematics and Statistics, Sultan Qaboos University, Al-Khoud, Sultanate of Oman; the hospitality

    of the Department of Mathematical Sciences at the University of Alberta, Canada; and the

    Haskayne School of Business, University of Calgary, Calgary, Canada.


    Part I

    Theory


    1

    Basic probability concepts

    1.1 Random experiments and probabilities

    An experiment is random if its outcome cannot be predicted with certainty. A simple example is the throwing of a die. This experiment can result in any of six unpredictable outcomes 1, 2, 3, 4, 5, 6, which we list in what is usually called a sample space Ω = {1, 2, 3, 4, 5, 6}. Another example is the amount of yearly rainfall in each of the next 10 years in Auckland. Each outcome here is an ordered set containing ten nonnegative real numbers (a vector in IR^10_+); however, one has to wait 10 years before observing the outcome ω.

    Another example is the following. Let Xt be the water level of a dam at time t. If we are interested in the behavior of Xt during an interval of time [t0, t1] say, then it is necessary to consider simultaneously an uncountable family of Xt's, that is,

    Ω = {Xt : 0 ≤ Xt < ∞, t0 ≤ t ≤ t1}.

    The smallest observable outcome of an experiment is called simple. The set {1} containing 1 resulting from a throw of a die is simple. The outcome "odd number" is not simple and it occurs if and only if the throw results in any of the three simple outcomes 1, 3, 5. If the throw results in a 5, say, then the same throw also results in "a number larger than 3" or "odd number". Sets containing outcomes are called events. The events "odd number" and "a number larger than 3" are not mutually exclusive, that is, both can happen simultaneously, so that we can define the event "odd number and a number larger than 3".

    The event "odd number and even number" is clearly impossible or empty. It is called the impossible event and is denoted, in analogy with the empty set in set theory, by ∅. The event "odd number or even number" occurs no matter what the outcome ω is. It is Ω itself and is called the certain event.

    In fact possible events of the experiment can be combined naturally using the set operations union, intersection, and complementation. This leads to the concept of field or algebra (σ-field (sigma-field) or σ-algebra, respectively) which is of fundamental importance in the theory of probability.


    A nonempty class F of subsets of a nonempty set Ω is called a field or algebra if

    1. Ω ∈ F,
    2. F is closed under finite unions (or finite intersections),
    3. F is closed under complementation.

    It is a σ-field (or σ-algebra) if the stronger condition

    2′. F is closed under countable unions (or countable intersections)

    holds.
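    On a finite Ω these axioms can be checked mechanically. A minimal sketch in Python (the helper `is_field` is ours, not the book's); note that on a finite Ω a field is automatically a σ-field, since only finitely many distinct unions exist:

```python
def is_field(omega, F):
    """Check the field (algebra) axioms for a collection F of subsets of omega."""
    F = {frozenset(s) for s in F}        # frozensets can be members of a set
    omega = frozenset(omega)
    if omega not in F:                   # axiom 1: Omega belongs to F
        return False
    for A in F:
        if omega - A not in F:           # axiom 3: closed under complements
            return False
        for B in F:
            if A | B not in F:           # axiom 2: closed under finite unions
                return False
    return True

omega = {1, 2, 3, 4, 5, 6}
F1 = [set(), omega, {1, 2}, {3, 4, 5, 6}]          # generated by {1,2} and its complement
print(is_field(omega, F1))                         # True
print(is_field(omega, [set(), omega, {1, 2}]))     # False: complement of {1,2} missing
```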

    If F is a σ-field the pair (Ω, F) is called a measurable space. The sets B ∈ F are called events and are said to be measurable sets.

    For instance, the collection of finite unions of the half open intervals (a, b], . . .

    . . . a filtration {Ft, t ≥ 0} is right-continuous if Ft = ⋂_{ε>0} F_{t+ε} = F_{t+}. We may also say that a filtration {Ft, t ≥ 0} is right-continuous if new information at time t arrives precisely at time t and not an instant after t.

    It is left-continuous if {Ft} contains events strictly prior to t, that is Ft = σ(⋃_{s<t} Fs).


    It is easily seen that if Ω is finite we need only specify P on the atoms of F.

    The triple (Ω, F, P) is called a probability space.

    Nonempty events which are unlikely to occur and to which a zero probability is assigned are called negligible events or null events.

    A σ-field F is P-complete if all subsets of null events are also events. Of course, their probability is zero.

    A filtration is complete if F0 is complete, i.e. all the null events are known at the initial time.

    The mathematical object (Ω, F, Ft, P), where the filtration {Ft, t ≥ 0} is right-continuous and complete, is sometimes called a stochastic basis or a filtered probability space.

    The filtration {Ft, t ≥ 0} is said to satisfy the usual conditions if it is right-continuous and complete. For monotonic sequences of events we have the following result on continuity of probability measures.

    Theorem 1.1.3  Let (Ω, F, P) be a probability space. If {An} is an increasing sequence of events with limit A, then

    P(An) ↑ P(A),

    and if {Bn} is a decreasing sequence of events with limit B, then

    P(Bn) ↓ P(B).

    Proof  To prove the first statement, visualize the sequence {An} as a sequence of increasing concentric disks and then define the sequence of disjoint rings {Rn} (except for R1, which is the disk A1):

    R1 = A1, R2 = A2 \ A1, . . . , Rn = An \ An−1.

    Note that

    Ak = ⋃_{n=1}^{k} Rn,  A = ⋃_{n=1}^{∞} An = ⋃_{n=1}^{∞} Rn,

    so that by σ-additivity

    P(A) = Σ_{n=1}^{∞} P(Rn) = lim_{k→∞} Σ_{n=1}^{k} P(Rn) = lim_{k→∞} P(⋃_{n=1}^{k} Rn) = lim_{k→∞} P(Ak).

    The proof of the second statement follows by considering the sequence of complementary events {B̄n}, which is increasing with limit B̄, so that

    1 − P(Bn) ↑ 1 − P(B), that is P(Bn) ↓ P(B).
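    The monotone continuity in Theorem 1.1.3 can be illustrated numerically. A small sketch, assuming the measure P({k}) = 2^−k on the positive integers, with An = {1, . . . , n} increasing to A = {1, 2, . . . }:

```python
# P({k}) = 2**-k on Omega = {1, 2, 3, ...};  A_n = {1, ..., n} increases to Omega.
def P(event):
    return sum(2.0 ** -k for k in event)

for n in (1, 5, 10, 20):
    print(n, P(range(1, n + 1)))    # 0.5, 0.96875, ... increasing to 1 = P(A)
```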

    Example 1.1.4  Consider the experiment of tossing a fair coin infinitely many times and observing the outcomes of all tosses. Here each ω ∈ (H, T)^∞ is a countably infinite sequence of Heads and Tails. If we denote Heads and Tails by 0 and 1, each ω is a sequence of 0's and 1's and it can be shown that there are as many ω's as there are points in the interval [0, 1)!


    Suppose we wish to estimate the probability of the event consisting of those ω's for which the proportion of heads converges to 1/2. The so-called Strong Law of Large Numbers says that this probability is equal to one, i.e. the ω's for which the convergence to 1/2 does not hold form a negligible set. However, this negligible set is rather huge, as can be imagined!
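    The Strong Law can be illustrated (though of course not proved) by simulation; a sketch with an assumed pseudo-random fair coin:

```python
import random

random.seed(0)
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))   # n fair-coin tosses
proportion = heads / n
print(proportion)   # close to 1/2
```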

    Example 1.1.5  In Example 1.1.4 let Fn,S be the collection of infinite sequences of H's and T's with some restriction S put on the first n tosses. For instance, if n = 3,

    S = {H H T . . . , H T H . . . , T H H . . . } ⊂ (H, T)^3,

    F3,S is the collection of infinite sequences of H's and T's for which the first three entries contain exactly two H's. It is left as an exercise to show that the class F = {Fn,S, S ⊂ (H, T)^n, n ∈ IN} is a field.

    We now quote without proof from [4] the following result on extending a function P defined on sets in a field.

    Theorem 1.1.6 ([4])  If P is a probability measure on a field A, then it can be extended uniquely to the σ-field F = σ{A} generated by A, i.e. the restriction of the extension measure to the field A is P itself and by tradition they are both denoted by P.

    Let us return to the coin-tossing situation of Example 1.1.5. Using the extension theorem (Theorem 1.1.6) one can construct a (unique) probability measure P, called the product probability measure, on the space ((H, T)^∞, F), starting from an initial probability (p(H), p(T)) = (1/2, 1/2) by setting

    P(Fn,S) = Σ_S (1/2)^n = (number of infinite sequences in S) × (1/2)^n.

    It is left as an exercise to show that P does not depend on the representations of sets in F and that it is countably additive. (See [4].)

    An immediate generalization of the coin tossing experiment in Example 1.1.5 is to consider an infinite sequence of independent experiments, to which corresponds an infinite sequence of probability spaces (Ω1, F1, P1), (Ω2, F2, P2), . . . . We are interested in the space Ω = Ω1 × Ω2 × · · · of all infinite sequences ω = (ω1, ω2, . . . ). Events of interest are again cylinder sets, i.e. infinite sequences with restrictions put on the first n outcomes. The collection of all these cylinders forms a field which generates a σ-field F, often denoted F1 × F2 × · · · . A probability measure P can be defined on cylinder sets then extended uniquely to F using the Extension Theorem 1.1.6.

    In the coin-tossing experiment, an example of an event which is in F is the event F that a Head will occur. Clearly, F = ⋃_{k=1}^{∞} Fk, where Fk is the event that a Head occurs on the k-th trial and not before. Since each Fk is a cylinder set, P(Fk) is well defined for each


    k ≥ 1. Moreover the Fk's are pairwise disjoint, hence

    P(F) = Σ_{k=1}^{∞} P(Fk) = Σ_{k=1}^{∞} 1/2^k = 1.

    Note that this probability is still 1 regardless of the size of the probability of occurrence of a Head (as long as it is not 0).
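    The computation P(F) = Σ P(Fk) = 1 can be checked numerically, also for a biased coin with Head-probability p, where P(Fk) = (1 − p)^{k−1} p; a truncated sum stands in for the infinite series:

```python
# F_k = "first Head on toss k" has probability (1 - p)**(k - 1) * p; summing
# over k approximates P(a Head ever occurs), which is 1 for any p > 0.
def prob_head_ever(p, terms=10_000):
    return sum((1 - p) ** (k - 1) * p for k in range(1, terms + 1))

for p in (0.5, 0.1, 0.01):
    print(p, prob_head_ever(p))    # each sum is (numerically) 1.0
```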

    Modeling with infinite sample spaces is not a mathematical fantasy. In many very simple-minded problems infinite sequences of outcomes cannot be avoided. For example, the event "the first time a Head occurs" cannot be described in a finite sample space model because the number of trials before it occurs cannot be bounded in advance.

    In general, it is impossible to define a probability measure on all the subsets of an infinite sample space; that is, one cannot say any subset is an event. However, consider the following case.

    Example 1.1.7  Suppose that Ω is countable and let F be the σ-field 2^Ω. Then it is not difficult to define a probability measure on F. Choose P such that

    0 ≤ P({ω}) ≤ 1 and Σ_{ω∈Ω} P({ω}) = P(Ω) = 1,

    and for any F ∈ F, define P(F) = Σ_{ω∈F} P({ω}). Let {Fn}, n ∈ IN, be a sequence of disjoint sets in F and let ωn,m denote the simple events in Fn. Since we have an infinite series of nonnegative numbers,

    P(⋃_n Fn) = Σ_{n,m} P(ωn,m) = Σ_n Σ_m P(ωn,m) = Σ_n P(Fn).

    1.2 Conditional probabilities and independence

    Given a probability space (,F, P) and some event B with P(B)

    =0, we define a new

    posteriorprobability measure as follows. If A is any event we define the probability of A

    given B as

    P(A | B) = P(A and B)P(B)

    = P(A B)P(B)

    ,

    provided P(B) > 0. Otherwise P(A | B) is left undefined.What we mean by given event B is that we know that event B has occurred, that is we

    know that B, so that we no longer assign the same probabilities given by P to eventsbut assign new, or updated, probabilities given by the probability measure P(. | B). Anyevent which is mutually exclusive with B has probability zero under P(. | B) and the newprobability space is now (B,F B, P(. | B)).

    If our observation is limited to knowing whether event B has occurred or not we may as

    well define P(. | B), where B is the complement of B within . Prior to knowing wherethe outcome is we define the, now random, quantity:

    P(. | B or B)() = P(. | {B})() = P(. | B)IB () + P(. | B)IB ().


    This definition extends in an obvious way to a σ-field G generated by a finite or countable partition {B1, B2, . . . } of Ω, and the random variable P(· | G)(ω) is called the conditional probability given G. The random function P(· | G)(ω), whose values on the atoms Bi are the ordinary conditional probabilities P(· | Bi) = P(· ∩ Bi)/P(Bi), is not defined if P(Bi) = 0. In this case we have a family of functions P(· | G)(ω), one for each possible arbitrary value assigned to the undefined P(· | Bi). Usually, one version is chosen and different versions differ only on a set of probability 0.

    Example 1.2.1  Phone calls arrive at a switchboard between 8:00 a.m. and 12:00 p.m. according to the following probability distribution:

    1. P(k calls within an interval of length l) = e^{−λl} (λl)^k / k!;
    2. If I1 and I2 are disjoint intervals,

    P((k1 calls within I1) ∩ (k2 calls within I2)) = P(k1 calls within I1) P(k2 calls within I2),

    that is, events occurring within disjoint time intervals are independent.

    Suppose that the operator wants to know the probability that 0 calls arrive between 8:00 and 9:00 given that the total number of calls from 8:00 a.m. to 12:00 p.m., N812, is known. From past experience, the operator assumes that this number is near 30 calls, say. Hence

    P(0 calls within [8, 9) | 30 calls within [8, 12])
    = P((0 calls within [8, 9)) ∩ (30 calls within [9, 12])) / P(30 calls within [8, 12])
    = P(0 calls within [8, 9)) P(30 calls within [9, 12]) / P(30 calls within [8, 12])
    = (3/4)^30,

    which can be written as

    P(0 calls within [8, 9) | N812 = N) = (3/4)^N.     (1.2.1)
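    The cancellation that produces (3/4)^30 can be verified numerically. In the sketch below the hourly rate `lam` is an assumed value; the point is that the ratio does not depend on it:

```python
from math import exp, factorial

def poisson(k, rate):
    """P(k calls in an interval whose expected number of calls is `rate`)."""
    return exp(-rate) * rate ** k / factorial(k)

lam = 7.5      # assumed calls-per-hour rate; the final ratio does not depend on it
N = 30
# P(0 calls in [8,9) and 30 calls in [8,12]) = P(0 in [8,9)) * P(30 in [9,12])
numer = poisson(0, lam) * poisson(N, 3 * lam)
denom = poisson(N, 4 * lam)
print(numer / denom)       # equals (3/4)**30
print((3 / 4) ** N)
```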

    Remarks 1.2.2  Consider again Example 1.2.1.

    1. The events Fi = {ω : N812(ω) = i}, i = 0, 1, . . . form a partition of Ω and are the atoms of the σ-field generated by observing only N812, so we may write:

    P(0 calls within [8, 9) | Fi, i ∈ IN)(ω) = P(0 calls within [8, 9) | σ{Fi, i ∈ IN})(ω) = Σ_i (3/4)^i I_{Fi}(ω).

    2. Observe that since each event F ∈ σ{Fi, i ∈ IN} is a union of some Fi1, Fi2, . . . , and since we know, at the end of the experiment, which Fj contains ω, then we know


    whether or not ω lies in F, that is whether F or the complement of F has occurred. In this sense, σ{Fi, i ∈ IN} is indeed all we can answer about the experiment from what we know.

    The likelihood of occurrence of any event A could be affected by the realization of B. Roughly speaking, if the proportion of A within B is the same as the proportion of A within Ω then it is intuitively clear that P(A | B) = P(A | Ω) = P(A). Knowing that B has occurred does not change the prior probability P(A). In that case we say that events A and B are independent. Therefore two events A and B are independent if and only if

    P(A ∩ B) = P(A)P(B).

    Two σ-fields F1 and F2 are independent if and only if P(A1 ∩ A2) = P(A1)P(A2) for all A1 ∈ F1, A2 ∈ F2.

    If events A and B are independent so are σ{A} and σ{B}, because the impossible event ∅ is independent of everything else including itself, and so is Ω. Also A and B̄, Ā and B, Ā and B̄ are independent. We can say a bit more: if P(E) = 0 or P(E) = 1 then the event E is independent of any other event including E itself, which seems intuitively clear.

    Mutually exclusive events with positive probabilities provide a good example of dependent events.

    Example 1.2.3  In the die throwing experiment the σ-fields

    F1 = σ{{1, 2}, {3, 4, 5, 6}}

    and

    F2 = σ{{1, 2}, {3, 4}, {5, 6}}

    are not independent, since if we know, for instance, that ω has landed in {5, 6} (or equivalently {5, 6} has occurred) in F2 then we also know that the event {3, 4, 5, 6} in F1 has occurred. This fact can be checked by direct calculation using the definition. However, the σ-fields

    F3 = σ{{1, 2, 3}, {4, 5, 6}}

    and

    F4 = σ{{1, 4}, {2, 5}, {3, 6}}

    are independent. The occurrence of any event in either F3 or F4 does not provide any nontrivial information about the occurrence of any (nontrivial) event in the other field.
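    Both claims in Example 1.2.3 can be checked by the direct calculation mentioned. A sketch using exact rational arithmetic; for σ-fields generated by finite partitions, independence holds iff it holds on the generating atoms:

```python
from fractions import Fraction

P = lambda A: Fraction(len(A), 6)       # fair-die probability of an event A

def atoms_independent(part1, part2):
    """For sigma-fields generated by finite partitions, independence holds
    iff P(A & B) == P(A) * P(B) for every pair of generating atoms."""
    return all(P(A & B) == P(A) * P(B) for A in part1 for B in part2)

F1 = [{1, 2}, {3, 4, 5, 6}]
F2 = [{1, 2}, {3, 4}, {5, 6}]
F3 = [{1, 2, 3}, {4, 5, 6}]
F4 = [{1, 4}, {2, 5}, {3, 6}]
print(atoms_independent(F1, F2))   # False
print(atoms_independent(F3, F4))   # True
```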

    Another fundamental concept of probability theory is conditional independence. Events A and C are said to be conditionally independent given event B if

    P(A ∩ C | B) = P(A | B)P(C | B), P(B) > 0.

    The following example shows that it is not always easy to decide, under a probability

    measure, if conditional independence holds or not between events.

    Example 1.2.4  Consider the following two events:

    A1 = "person 1 is going to watch a football game next weekend",
    A2 = "person 2, with no relation at all to person 1, is going to watch a football game next weekend".


    There is no reason to doubt the independence of A1 and A2 in our model. However, consider now the event B = "next weekend's weather is good". Suppose that

    P(A1 | B) = .90, P(A2 | B) = .95, P(A1 | B̄) = .40,
    P(A2 | B̄) = .30, P(B) = .75 and P(B̄) = .25.

    Using this information it can be checked that P(A1 ∩ A2) ≠ P(A1)P(A2). The reason is that event B has linked events A1 and A2 in the sense that if we knew that A1 has occurred the probability of B should be high, resulting in the probability of A2 increasing.
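    The arithmetic behind Example 1.2.4 can be carried out explicitly if we additionally assume that A1 and A2 are conditionally independent given B and given B̄ (a modelling assumption not stated in the example, but consistent with its intent):

```python
# Assumption: A1 and A2 are conditionally independent given B and given ~B.
pA1_B, pA2_B = 0.90, 0.95      # P(A1 | B), P(A2 | B)
pA1_Bc, pA2_Bc = 0.40, 0.30    # P(A1 | ~B), P(A2 | ~B)
pB, pBc = 0.75, 0.25

pA1 = pA1_B * pB + pA1_Bc * pBc                       # total probability
pA2 = pA2_B * pB + pA2_Bc * pBc
pA1A2 = pA1_B * pA2_B * pB + pA1_Bc * pA2_Bc * pBc    # conditional independence

print(pA1A2)        # ≈ 0.67125
print(pA1 * pA2)    # ≈ 0.6103125 -> A1 and A2 are not independent
```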

    The independence concept extends to arbitrary families of events. A family of events {Aα, α ∈ I} is said to be a family of independent events if and only if any finite subfamily is independent, i.e., for any finite subset of indices {i1, i2, . . . , ik} ⊂ I,

    P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik) = P(Ai1)P(Ai2) . . . P(Aik).

    A family of σ-fields {Fα, α ∈ I} is said to be a family of independent σ-fields if and only if any finite subfamily {Fi1, Fi2, . . . , Fik} is independent; that is, if and only if any collection of events of the form {Ai1 ∈ Fi1, Ai2 ∈ Fi2, . . . , Aik ∈ Fik} is independent.

    An extremely powerful and standard tool in proving properties which are true with probability one is the Borel-Cantelli Lemma. This lemma concerns sequences of events.

    Let {An} be a monotone decreasing sequence of events, i.e.

    A1 ⊃ A2 ⊃ · · · ⊃ An ⊃ An+1 ⊃ . . . ;

    then by definition

    lim_{n→∞} An = ⋂_{n=1}^{∞} An.

    Let {Bn} be a monotone increasing sequence of events, i.e.

    B1 ⊂ B2 ⊂ · · · ⊂ Bn ⊂ Bn+1 ⊂ . . . ;

    then by definition

    lim_{n→∞} Bn = ⋃_{n=1}^{∞} Bn.

    Let {Cn} be an arbitrary sequence of events. Define

    An = sup_{k≥n} Ck = ⋃_{k=n}^{∞} Ck,

    and

    Bn = inf_{k≥n} Ck = ⋂_{k=n}^{∞} Ck.

    Event An occurs if and only if at least one of the events Cn, Cn+1, . . . occurs, and event Bn occurs if and only if all of the events Cn, Cn+1, . . . occur simultaneously.


    By construction, An and Bn are monotone: An is decreasing and Bn is increasing, so that

    A = lim_{n→∞} An = ⋂_{n=1}^{∞} An = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck,

    and

    B = lim_{n→∞} Bn = ⋃_{n=1}^{∞} Bn = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Ck.

    Event A = ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck = lim sup Cn occurs if and only if infinitely many Cn occur, or Cn occurs infinitely often (Cn i.o.). To see this, suppose that ω belongs to an infinite number of the Cn's; then for every n, ω ∈ ⋃_{k=n}^{∞} Ck. Therefore ω ∈ ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck. Conversely, if ω belongs to only a finite number of the Cn's, then there is some n0 such that ω ∉ ⋃_{k=n0}^{∞} Ck. Since ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck ⊂ ⋃_{k=n0}^{∞} Ck, this shows that ω ∉ ⋂_{n=1}^{∞} ⋃_{k=n}^{∞} Ck if ω belongs to only a finite number of the Cn's.

    Event B = ⋃_{n=1}^{∞} ⋂_{k=n}^{∞} Ck = lim inf Cn occurs if and only if all but a finite number of the Cn occur.

    Clearly lim inf Cn ⊂ lim sup Cn. Consider the following simple example of sequences of events.

    Example 1.2.5  Let A and B be any subsets of Ω and define the sequences C2n = A and C2n+1 = B. Then:

    lim sup Cn = A ∪ B, lim inf Cn = A ∩ B.
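    The sets lim sup Cn and lim inf Cn in Example 1.2.5 can be computed over a finite horizon; truncating the infinite unions and intersections at N is harmless here only because the sequence is periodic. A sketch:

```python
# Truncate the infinite unions/intersections at a horizon N; this matches the
# true limsup/liminf here only because the sequence C_n is periodic.
N = 50

def limsup(events):
    return set.intersection(*[set.union(*[events(k) for k in range(n, N)])
                              for n in range(N - 1)])

def liminf(events):
    return set.union(*[set.intersection(*[events(k) for k in range(n, N)])
                       for n in range(N - 1)])

A, B = {1, 2, 3}, {3, 4}
C = lambda n: A if n % 2 == 0 else B     # C_{2n} = A, C_{2n+1} = B
print(limsup(C))    # A | B = {1, 2, 3, 4}
print(liminf(C))    # A & B = {3}
```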

    Example 1.2.6  Let

    Ck = {(x, y) ∈ IR^2 : 0 ≤ x < k, 0 ≤ y < . . . } . . .

    . . . for every ε > 0,

    μ({ω : |f(ω)| ≥ ε}) ≤ (1/ε^p) ∫_Ω |f(ω)|^p dμ(ω).

    Proof  Let F = {ω : |f(ω)| ≥ ε}. Then

    ∫_Ω |f(ω)|^p dμ(ω) ≥ ∫_F |f(ω)|^p dμ(ω) ≥ ε^p ∫_F dμ = ε^p μ(F).
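    The inequality can be sanity-checked on simulated data. In fact, applied to the empirical measure of a sample the same termwise argument shows the empirical version holds exactly, so the assertions below never fail:

```python
import random

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]

def lhs(eps):
    """Empirical P(|f| >= eps)."""
    return sum(abs(x) >= eps for x in samples) / len(samples)

def rhs(eps, p):
    """Empirical (1/eps**p) * E[|f|**p]."""
    return sum(abs(x) ** p for x in samples) / len(samples) / eps ** p

for eps in (0.5, 1.0, 2.0):
    for p in (1, 2):
        assert lhs(eps) <= rhs(eps, p)
print("inequality holds on the sample")
```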

    In addition to almost sure convergence, which was defined in Example 1.3.11, we have

    the following types of convergence.


    First recall that L^p(Ω, F, P), p ≥ 1, is the space of random variables with finite absolute p-th moments, that is, E[|X|^p] < ∞.

    {Xk} converges to X in L^p (Xk →^{Lp} X), (0 < p < ∞), if

    E[|Xk − X|^p] → 0 (k → ∞).

    Let Fn(x) = P[Xn ≤ x], F(x) = P[X ≤ x]. Xn converges in distribution to X (Xn →^D X) if

    ∫_IR g(x) dFn(x) → ∫_IR g(x) dF(x),

    for every real valued, continuous bounded function g defined on IR. A necessary and sufficient condition for that is:

    Fn(x) → F(x),

    at every continuity point x of F [7].

    These convergence concepts are in the following relationship to each other:

    (Xk →^{a.s.} X) ⇒ (Xk →^P X) ⇒ (Xn →^D X).

    A useful concept is the uniform integrability of a family of random variables, which permits the interchange of limits and expectations.

    Definition 1.3.34  A sequence {Xn} of random variables is said to be uniformly integrable if

    sup_n E[|Xn| I_{|Xn|>A}] → 0, (A → ∞).     (1.3.4)

    A family {Xt}, t ≥ 0, of random variables is said to be uniformly integrable if

    sup_t E[|Xt| I_{|Xt|>A}] → 0, (A → ∞).     (1.3.5)

Example 1.3.35 If L is bounded in L^p(Ω, F, P) for some p > 1, then L is uniformly integrable.

Proof Choose A so large that E[|X|^p] < A for all X ∈ L. For fixed X ∈ L, let Y = |X| I_{|X|>K}. Then Y(ω) ≥ K I_{|X|>K}(ω) ≥ 0 for all ω. Since p > 1,

Y^{p−1} ≥ K^{p−1} I_{|X|>K},

and

K^{1−p} Y^p = (K^{1−p} Y^{p−1}) Y ≥ Y I_{|X|>K} = Y.

Thus

E[Y] ≤ K^{1−p} E[Y^p] ≤ K^{1−p} E[|X|^p] ≤ K^{1−p} A,

which goes to 0 when K → ∞, from which the result follows.

Basic probability concepts

The following result is a somewhat stronger version of Fatou's Lemma 1.3.16.

Theorem 1.3.36 Let {X_n} be a uniformly integrable family of random variables. Then

E[lim inf X_n] ≤ lim inf E[X_n].

Proof The proof is left as an exercise.

Corollary 1.3.37 Let {X_n} be a uniformly integrable family of random variables such that X_n → X (a.s.). Then

E|X_n| < ∞, E(X_n) → E(X), and E|X_n − X| → 0.

The following deep result (Shiryayev [36]) gives a necessary and sufficient condition for taking limits under the expectation sign.

Theorem 1.3.38 Let 0 ≤ X_n → X and E(X_n) < ∞. Then

E(X_n) → E(X) < ∞ ⟺ the family {X_n} is uniformly integrable.

Proof The sufficiency part follows from Theorem 1.3.36. To prove the necessity, note that if x is not a point of positive probability for the distribution of the random variable X, then X_n I_{X_n …

1.4 Conditional expectations

Let X = Σ_i x_i I_{A_i} be a simple random variable on a probability space (Ω, F, P). What is the expected value of X given some event B having positive probability P(B)? Under the posterior probability measure P(· | B) this is

E[X | B] = Σ_i x_i P(X = x_i | B) = (1/P(B)) Σ_i x_i P({X = x_i} ∩ B) = (1/P(B)) E[X I_B].

E[X I_B] is the probability weighted sum of the values taken on by X in the event B. We divide the weighted sum by P(B) to obtain the weighted average.

We could write as a definition:

E[X | B] = E[X I_B] / E[I_B] = E[X I_B] / P(B).

Let X = I_C and Y = I_B. The σ-field σ(Y) is generated by the atoms B and B^c = Ω − B. To see this, consider any Borel set Γ:

Y^{−1}(Γ) = ∅ if 0 ∉ Γ and 1 ∉ Γ,
          = B^c if 0 ∈ Γ, 1 ∉ Γ,
          = B if 1 ∈ Γ, 0 ∉ Γ,
          = Ω if {0, 1} ⊆ Γ.

Hence σ(Y) = {∅, B, B^c, Ω}.

Define

E[X | Y] = E[X | σ(Y)] = E[X | atoms of σ(Y)] = E[X | B, B^c].

That is,

E[I_C | B, B^c](ω) = P(C | B, B^c)(ω) = P(C | B) I_B(ω) + P(C | B^c) I_{B^c}(ω).

Hence E[X | Y] is a function constant on the atoms of σ(Y); that is, E[X | Y] is σ(Y)-measurable.

Since E[X | Y] is a random variable its mean is:

E[E[X | Y]] = E[P(C | B) I_B(ω) + P(C | B^c) I_{B^c}(ω)] = P(C ∩ B) + P(C ∩ B^c) = P(C) = E[X].

If X is an integrable random variable and Y = Σ_i y_i I_{B_i} is a simple random variable, we write

E[X | Y] = E[X | σ(Y)] = Σ_i (E[X I_{B_i}] / P(B_i)) I_{B_i}(ω).

Hence E[X | Y] is σ(Y)-measurable and

E[E[X | Y]] = Σ_i E[X I_{B_i}] = E[X].

The expected value of E[X | Y] is the same as the expected value of X.


Let X ∈ L¹ (E|X| < ∞) be a (nonnegative, for simplicity) random variable on a probability space (Ω, F, P) and let G be a sub-σ-field of F. The probability space (Ω, G, P) is a coarsening of the original one and X is, in general, not measurable with respect to G. We seek now a G-measurable random variable, which we denote temporarily by X_G, that assumes, on average, the same values as X. That is, we seek an integrable random variable X_G such that X_G is G-measurable and

∫_A X_G dP = ∫_A X dP, for all A ∈ G.

Now the set function Q(A) = ∫_A X dP is a measure absolutely continuous with respect to P, so that the Radon–Nikodym Theorem 1.3.25 guarantees the existence of a G-measurable random variable, suggestively denoted by E(X | G) and uniquely determined except on an event of probability zero, such that

∫_A X dP = ∫_A E[X | G] dP,

for all A ∈ G. We say that X_G is a version of E(X | G). For a general integrable random variable X we define E[X | G] as E[X⁺ | G] − E[X⁻ | G].

Remark 1.4.1 Let (Ω, F, P) be given, and suppose X is an L² random variable (measurable with respect to F). Let G be a sub-σ-algebra of F; that is, G is less informative than F. A natural question is: by observing only G, how much can we learn about X? Or, among all random variables which are G-measurable, which one gives us the best information (in the mean square sense) about the random variable X? It turns out that E[X | G] is the closest (G-measurable) random variable to X. This is seen by considering, for any G-measurable random variable Y,

Z = X − E[X | G].

Then:

E[(Z − Y)²] = E[(X − E[X | G])² + Y² − 2Y(X − E[X | G])] = E[E[(X − E[X | G])² | G]] + E[Y²],

since the cross term satisfies E[Y(X − E[X | G])] = E[Y E[X − E[X | G] | G]] = 0. This is minimized when Y = 0 a.s.

Example 1.4.2 Let Ω = (0, 1], X(ω) = ω, P be Lebesgue measure, and consider the σ-field

G = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]} = σ{A_1, A_2, A_3, A_4}.

E[X | G] must be constant on the atoms of G, so that

E[X | G](ω) = Σ_i x_i I_{A_i}(ω), where x_i = E[X I_{A_i}] / P(A_i).

Clearly P(A_i) = 1/4 and E[X I_{A_i}] = ∫_{A_i} x dx.


Hence

E[X | G](ω) = (1/8) I_{A_1}(ω) + (3/8) I_{A_2}(ω) + (5/8) I_{A_3}(ω) + (7/8) I_{A_4}(ω),

which is a G-measurable random variable: on each atom, E[X | G] is the midpoint of the interval.
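The computation can be replayed exactly with rational arithmetic. This small sketch (not from the book) recovers the four atom values and checks the tower property E[E[X | G]] = E[X] = 1/2.

```python
from fractions import Fraction

def cond_exp_on_atom(i):
    """x_i = E[X I_{A_i}] / P(A_i) for A_i = ((i-1)/4, i/4], X(omega) = omega."""
    lo, hi = Fraction(i - 1, 4), Fraction(i, 4)
    integral = (hi ** 2 - lo ** 2) / 2     # integral of x dx over (lo, hi]
    return integral / (hi - lo)            # P(A_i) = hi - lo = 1/4

values = [cond_exp_on_atom(i) for i in range(1, 5)]
# E[E[X | G]] = sum_i x_i P(A_i) recovers E[X] = 1/2 (the tower property).
mean = sum(values) * Fraction(1, 4)
```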

Example 1.4.3 Let X_1, X_2 and X_3 be three independent, identically distributed (i.i.d.) random variables such that

P(X_i = 1) = p = 1 − P(X_i = 0) = 1 − q.

Let S = X_1 + X_2 + X_3. Suppose that we observe X_1 and X_2 and we wish to find the (conditional) probability that S = 2 given X_1 and X_2. The σ-field generated by the (vector) random variable (X_1, X_2) is generated by the atoms {A_ij}, i, j = 0, 1, where A_ij = {ω : X_1(ω) = i, X_2(ω) = j}.

P(S = 2 | X_1, X_2)(ω) = P(S = 2 | σ{X_1, X_2})(ω)
= Σ_{i,j=0,1} P(S = 2 | A_ij) I_{A_ij}(ω)
= Σ_{i,j=0,1} [P({S = 2} ∩ A_ij) / P(A_ij)] I_{A_ij}(ω)
= Σ_{i,j=0,1} [P(i + j + X_3 = 2) P(A_ij) / P(A_ij)] I_{A_ij}(ω)
= Σ_{i,j=0,1} P(X_3 = 2 − i − j) I_{A_ij}(ω)
= P(X_3 = 0) I_{A_11} + P(X_3 = 1) I_{A_10 ∪ A_01}
= q I_{A_11}(ω) + p I_{A_10 ∪ A_01}(ω).

The expected value of the (σ{X_1, X_2}-measurable) random variable P(S = 2 | X_1, X_2) is

E[q I_{A_11}(ω) + p I_{A_10 ∪ A_01}(ω)] = q P(A_11) + p[P(A_10) + P(A_01)] = P(S = 2).

Example 1.4.4 Let f ∈ L¹[0, 1], i.e. the Lebesgue integral ∫_{[0,1)} |f(x)| dx exists and is finite. Let F_n = σ{[j2^{−n}, (j + 1)2^{−n}), j = 0, . . . , 2^n − 1}. Then

E[f | F_n](ω) = Σ_{j=0}^{2^n−1} [ ( ∫_{j2^{−n}}^{(j+1)2^{−n}} f(x) dx ) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}(ω).

Theorem 1.4.5 If X is a real F-measurable random variable and if ∫_A X dP = 0 for all A ∈ F, then X = 0 a.s.


Proof Suppose X ≥ 0 and ∫_A X dP = 0 for all A ∈ F. Write A_n = {ω : X(ω) ≥ 1/n}. Then

∫_{A_n} X dP ≥ (1/n) P(A_n) ≥ 0.

But ∫_{A_n} X dP = 0, so P(A_n) = 0 for all n. Therefore,

P({X > 0}) = P(∪_n A_n) ≤ Σ_n P(A_n) = 0.

For a general random variable X, recall that X = X⁺ − X⁻, where both X⁺ and X⁻ are nonnegative.

The following is a list of classical results on conditional expectation.

1. E(X | A) is unique (a.s.).

Proof Let X_1 = E(X | A) and let X_2 be an A-measurable random variable such that

∫_A X_2 dP = ∫_A X dP,

for all A ∈ A, and let Ω_0 = {ω : X_1 > X_2} ∈ A. Hence

∫_{Ω_0} X_1 dP = ∫_{Ω_0} E(X | A) dP = ∫_{Ω_0} X dP,

and

∫_{Ω_0} X_2 dP = ∫_{Ω_0} X dP,

so that

∫_{Ω_0} X_1 dP = ∫_{Ω_0} X_2 dP, or ∫_{Ω_0} (X_1 − X_2) dP = 0.

Using Theorem 1.4.5, X_1 = X_2 a.s.

2. If A_1 and A_2 are two sub-σ-fields of F such that A_1 ⊆ A_2, then

E(E(X | A_1) | A_2) = E(E(X | A_2) | A_1) = E(X | A_1). (1.4.1)

Proof Since E(X | A_1) is A_2-measurable, E(E(X | A_1) | A_2) = E(X | A_1). Now E(E(X | A_2) | A_1) is A_1-measurable and for A ∈ A_1,

∫_A E(E(X | A_2) | A_1) dP = ∫_A E(X | A_2) dP = ∫_A X dP = ∫_A E(X | A_1) dP.

Hence E(E(X | A_2) | A_1) = E(X | A_1) a.s.


3. If X, Y, XY ∈ L¹ and Y is A-measurable then

E[XY | A] = Y E[X | A]. (1.4.2)

Proof It is sufficient to prove the result when X and Y are positive. If Y = I_A, A ∈ A, then for every B ∈ A

∫_B XY dP = ∫_{A∩B} X dP = ∫_{A∩B} E[X | A] dP = ∫_B I_A E[X | A] dP = ∫_B Y E[X | A] dP.

That is, E[XY | A] = Y E[X | A] if Y is an indicator function. It follows that the result is true for simple functions of sets in A and therefore for the limit of a bounded increasing sequence of such functions converging to Y.

4. If X is independent of the σ-field A, then

E(X | A) = E(X). (1.4.3)

Proof First note that E(X) is A-measurable. Now, for A ∈ A we have to show that

∫_A E(X | A) dP = ∫_A E(X) dP.

However, the left hand side is equal to E[I_A X] and the right hand side is equal to E[I_A] E[X], and their equality follows from the definition of independence of random variables.

5. Conditional expectation is a projection operation, and so

E[E[X | A] | A] = E[X | A]. (1.4.4)

Example 1.4.6 Consider the joint distribution function F(x_1, x_2) of two real valued random variables X_1, X_2 and the probability measure P on the two-dimensional Borel sets generated by the distribution function F(x_1, x_2). Suppose that P is absolutely continuous with respect to two-dimensional Lebesgue measure. Then, by the Radon–Nikodym theorem, there exists a nonnegative density function f(x_1, x_2) such that for any Borel set B:

P(B) = ∫∫ I_B(x_1, x_2) f(x_1, x_2) dx_1 dx_2.

If f(x_1, x_2) > 0 everywhere,

P(B | X_2 = x_2) = ∫_{{x_1 : (x_1, x_2) ∈ B}} f(x_1, x_2) dx_1 / ∫_{−∞}^{+∞} f(x_1, x_2) dx_1,

from which we can deduce that f(x_1, x_2) / ∫_{−∞}^{+∞} f(x_1, x_2) dx_1 is the density function of the conditional probability measure P(· | X_2 = x_2).


Example 1.4.7 Let X_1 and X_2 be two random variables with a normal joint distribution. Then their probability density function has the form

φ(x_1, x_2) = [1 / (2π σ_1 σ_2 √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] ( x̃_1² − 2ρ x̃_1 x̃_2 + x̃_2² ) },

where 0 ≤ |ρ| < 1 and x̃_i = (x_i − μ_i)/σ_i, i = 1, 2. The conditional density of X_1 given X_2 = x_2 is a normal density with mean μ_1 + ρ(σ_1/σ_2)(x_2 − μ_2) and variance Var(X_1 | X_2 = x_2) = (1 − ρ²)σ_1² < σ_1² = Var(X_1) (when ρ ≠ 0). To see this, recall that, by definition, the conditional density of X_1 given X_2 is given by

φ(x_1 | x_2) = φ(x_1, x_2) / ∫_IR φ(x_1, x_2) dx_1

= [1/(2π σ_1 σ_2 √(1 − ρ²))] exp{ −( x̃_1² − 2ρ x̃_1 x̃_2 + x̃_2² ) / (2(1 − ρ²)) } ÷ [1/(√(2π) σ_2)] exp{ −x̃_2²/2 }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −( x̃_1² − 2ρ x̃_1 x̃_2 + ρ² x̃_2² ) / (2(1 − ρ²)) }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −( x̃_1 − ρ x̃_2 )² / (2(1 − ρ²)) }

= [1/(√(2π) σ_1 √(1 − ρ²))] exp{ −[ x_1 − (μ_1 + ρ(σ_1/σ_2)(x_2 − μ_2)) ]² / (2σ_1²(1 − ρ²)) },

and the result follows.

Thus by conditioning on X_2 we have gained some statistical information about X_1, which results in a reduction in the variability of X_1.
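The stated conditional mean and variance can be checked numerically by integrating the conditional density on a fine grid; the parameter values below are arbitrary illustrations (a sketch, not from the book).

```python
import math

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6   # illustrative parameters

def cond_density(x1, x2):
    """phi(x1 | x2): normal with mean mu1 + rho*(s1/s2)*(x2 - mu2),
    variance (1 - rho^2)*s1^2, as derived in the example."""
    m = mu1 + rho * s1 / s2 * (x2 - mu2)
    v = (1 - rho ** 2) * s1 ** 2
    return math.exp(-(x1 - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# Riemann sums on a grid around mu1: mass, mean and variance of phi(. | x2).
x2 = 0.0
step = 0.004
xs = [mu1 + (k - 5000) * step for k in range(10001)]
w = [cond_density(x, x2) * step for x in xs]
total = sum(w)
mean = sum(x * wi for x, wi in zip(xs, w)) / total
var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / total
```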

1.5 Problems

1. Let {F_i}_{i∈I} be a family of σ-fields on Ω. Prove that ∩_{i∈I} F_i is a σ-field.

2. Let A and B be two events. Express by means of the indicator functions of A and B

I_{A∪B}, I_{A∩B}, I_{A−B}, I_{B−A}, I_{(A−B)∪(B−A)},

where A − B = A ∩ B^c.

3. Let Ω = IR and define the sequences C_{2n} = [−1, 2 + 1/(2n)) and C_{2n+1} = [−2 − 1/(2n + 1), 1). Show that

lim sup C_n = [−2, 2], lim inf C_n = [−1, 1).


4. Let Ω = (ω_1, ω_2, ω_3, ω_4) and P(ω_1) = 1/12, P(ω_2) = 1/6, P(ω_3) = 1/3, and P(ω_4) = 5/12. Let

A_n = {ω_1, ω_3} if n is odd, {ω_2, ω_4} if n is even.

Find P(lim sup A_n), P(lim inf A_n), lim sup P(A_n), and lim inf P(A_n) and compare.

5. Give a proof of Theorem 1.3.36.

6. Show that a σ-field is either finite or uncountably infinite.

7. Show that if X is a random variable, then σ{|X|} ⊆ σ{X}.

8. Show that the set B_0 of countable unions of open intervals in IR is not closed under complementation and hence is not a σ-field. (Hint: enumerate the rational numbers and choose, for each one of them, an open interval containing it. Now show that the complement of the union of all these open intervals is not in B_0.)

9. Show that the class of finite unions of intervals of the form (−∞, a], (b, c], and (d, ∞) is a field but not a σ-field.

10. Show that a sequence of random variables {X_n} converges (a.s.) to X if and only if

∀ε > 0, lim_{m→∞} P[|X_n − X| ≤ ε, ∀n ≥ m] = 1.

11. Show that if {X_k} converges (a.s.) to X then {X_k} converges to X in probability, but the converse is false.

12. Consider the probability space (IN, F, P), where IN is the set of natural numbers, F is the collection of all the subsets of IN and P({k}) = 1/2^k. Let X_k(ω) = I_{[ω=k]}. Discuss the convergence (a.s.) and in probability of X_k and show that on this particular space they are equivalent.

13. Let {X_n} be a sequence of random variables with

P[X_n = 2^n] = P[X_n = −2^n] = 1/2^n, P[X_n = 0] = 1 − 1/2^{n−1}.

Show that {X_n} converges (a.s.) to 0 but E|X_n|^p does not converge to 0.

14. Let {X_n} be a sequence of random variables with

P[X_n = n^{1/2p}] = 1/n, P[X_n = 0] = 1 − 1/n.

Show that {X_n} does not converge (a.s.) to 0 but E|X_n|^p converges to 0.

15. Suppose Q is another probability measure on (Ω, F) such that P(A) = 0 implies Q(A) = 0 (Q ≪ P). Show that P-a.s. convergence implies Q-a.s. convergence.

16. Prove that if F_1 and F_2 are independent sub-σ-fields and F_3 is coarser than F_1, then F_3 and F_2 are independent.

17. Let Ω = (ω_1, ω_2, ω_3, ω_4, ω_5, ω_6), P(ω_i) = p_i = 1/6 and the sub-σ-fields

F_1 = σ{{ω_1, ω_2}, {ω_3, ω_4, ω_5, ω_6}},
F_2 = σ{{ω_1, ω_2}, {ω_3, ω_4}, {ω_5, ω_6}}.


Show that F_1 and F_2 are not independent. What can be said about the sub-σ-fields

F_3 = σ{{ω_1, ω_2}, {ω_3}, {ω_4, ω_5, ω_6}},

and

F_5 = σ{{ω_1, ω_4}, {ω_2, ω_5}, {ω_3, ω_6}}?

18. Let Ω = {(i, j) : i, j = 1, . . . , 6} and P({i, j}) = 1/36. Define the quantity

X(ω) = Σ_{k=0}^∞ k I_{{(i,j) : i+j=k}}(ω).

Is X a random variable? Find P_X(x) = P(X = x), calculate E[X] and describe σ(X), the σ-field generated by X.

19. For the function X defined in the previous exercise, describe the random variable P(A | X), where A = {(i, j) : i odd, j even}, and find its expected value E[P(A | X)].

20. Let Ω be the unit interval (0, 1] and on it be given the following σ-fields:

F_1 = σ{(0, 1/2], (1/2, 3/4], (3/4, 1]},
F_2 = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]},
F_3 = σ{(0, 1/8], (1/8, 2/8], . . . , (7/8, 1]}.

Consider the mapping

X(ω) = x_1 I_{(0, 1/4]}(ω) + x_2 I_{(1/4, 1/2]}(ω) + x_3 I_{(1/2, 3/4]}(ω) + x_4 I_{(3/4, 1]}(ω).

Find E[X | F_1], E[X | F_2], and E[X | F_3].

21. Let Ω be the unit interval and ((0, 1], P) be the Lebesgue-measurable space, and consider the following sub-σ-fields:

F_1 = σ{(0, 1/2], (1/2, 3/4], (3/4, 1]},
F_2 = σ{(0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1]}.

Consider the mapping

X(ω) = ω.

Find E[E[X | F_1] | F_2], E[E[X | F_2] | F_1] and compare.

22. Consider the probability measure P on the real line such that:

P(0) = p, P((0, 1)) = q, p + q = 1,

and the random variables defined on Ω = IR,

X_1(x) = 1 + x, X_2(x) = 0 · I_{x≤0} + (1 + x) I_{0<


23. Let X_1, X_2 and X_3 be three independent, identically distributed (i.i.d.) random variables such that P(X_i = 1) = p = 1 − P(X_i = 0) = 1 − q. Find P(X_1 + X_2 + X_3 = s | X_1, X_2).

24. Let X_1, X_2 and X_3 be three random variables with multinomial distribution with parameters p_1, p_2, p_3, n, that is

P(X_1 = n_1, X_2 = n_2, X_3 = n_3) = n! p_1^{n_1} p_2^{n_2} p_3^{n_3} / (n_1! n_2! n_3!),

where n_1, n_2 and n_3 are nonnegative integers such that n_1 + n_2 + n_3 = n. Show that if n is a random variable with Poisson distribution with parameter λ then the three random variables X_1, X_2, X_3 become mutually independent with Poisson distributions.

25. On Ω = [0, 1] with P being Lebesgue measure, show that

X = x_1 I_{(0, 1/2]} + x_2 I_{(1/2, 1]} and Y = y_1 I_{(0, 1/4] ∪ (3/4, 1]} + y_2 I_{(1/4, 3/4]}

are independent.

26. Show that (see Example 1.4.4)

E[f | F_n] = Σ_{j=0}^{2^n−1} [ ( ∫_{j2^{−n}}^{(j+1)2^{−n}} f(x) dx ) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}

converges a.s. and in L¹ to f as n → ∞. In particular, if f = I_E for some Borel set E, then

Σ_{j=0}^{2^n−1} [ μ(E ∩ [j2^{−n}, (j+1)2^{−n})) / 2^{−n} ] I_{[j2^{−n}, (j+1)2^{−n})}(x) → I_E(x) a.s.,

x ∈ [0, 1]. Here μ(·) is the Lebesgue measure.


2 Stochastic processes

2.1 Definitions and general results

A stochastic process is a mathematical model for any phenomenon evolving or varying in time (or over some index set), subject to random influences. Examples include the price of a commodity observed through time, the fluctuating water level behind a dam, or the distribution of shades in a noisy image observed over a region of IR². Suppose (Ω, F) is a measurable space. We shall define a stochastic process to be a mapping X_index(ω) from {index space} × Ω into a second measurable space (E, E), called the state space, or the range space. Alternatively, we can consider a stochastic process as a family {X_t}, t ∈ {index space}, of random variables all defined on a measurable space (Ω, F).

For a fixed outcome ω, X_(·)(ω) is a function describing one possible trajectory, or sample path, followed by the process. If the time index is frozen at t, say, then we have a random variable X_t(·), i.e. an F-measurable function of ω.

When the time index t is continuous, measurability, continuity, etc. in t are considered. A continuous-time stochastic process {X_t} is said to have independent increments if for all t_0 < t_1 < t_2 < · · · < t_n, the random variables X_{t_1} − X_{t_0}, X_{t_2} − X_{t_1}, . . . , X_{t_n} − X_{t_{n−1}} are independent. If for all s, X_{t+s} − X_t has the same distribution for all t, {X_t} is said to possess stationary increments.

Sometimes, a stochastic process is interpreted as just a single random variable taking values in a space of functions; that is, with each ω is associated a function. In analogy with real random variables, the state space is then endowed with a Borel σ-field (generated by the open sets of an underlying topology).

Example 2.1.1 Let

Ω = {ω_1, ω_2, . . . },

and let the time index n be finite, 0 ≤ n ≤ N. A stochastic process X in this setting is a two-dimensional array or matrix such that:

X =
X_1(ω_1) X_1(ω_2) . . .
X_2(ω_1) X_2(ω_2) . . .
. . . . . . . . .
X_N(ω_1) X_N(ω_2) . . .


Each row represents a random variable and each column is a sample path or a realization of the stochastic process X. If the time index is unbounded, each sample path is given by an infinite sequence.

Example 2.1.2 Let N = 4 in the previous example and suppose that X is given by the following array.

    2 3 5 7 11 3 2.3 1

    1 1 5.7

    2 3 6 83 19

    11 7 70 3 2 5 2 215 3 2 1 0 1 2 3

The sample space of {X_n} is IR⁴ and the stochastic process can be thought of as a mapping (in fact a random variable)

ω_i → X(ω_i) = (X_1(ω_i), . . . , X_4(ω_i)) = (x^i_1, x^i_2, x^i_3, x^i_4) = x^i ∈ IR⁴.

The random variable X induces a probability measure P_X on the Borel σ-field B(IR⁴) in the usual way, i.e., for any B ∈ B(IR⁴),

P_X(B) = P[ω : X(ω) ∈ B] = P(X^{−1}(B)).

For instance,

B_1 = {x ∈ IR⁴ : 3 ≤ x_1 ≤ 5, 2 ≤ x_2 ≤ 7}

contains a single trajectory (column 6 in the table) so that P_X(B_1) = P(ω_6).

B_2 = {x ∈ IR⁴ : max_{1≤n≤4} x_n ≥ 7}

contains four trajectories (columns 2, 3, 4 and 6 in the table) so that P_X(B_2) = P(ω_2, ω_3, ω_4, ω_6).
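Since the table itself does not survive extraction cleanly, here is a hypothetical four-path version of the same construction (the values are invented for illustration, not the book's), showing how P_X of a Borel set is read off from the trajectories:

```python
# Hypothetical 4 x 4 table: row n holds the random variable X_n, column i the
# sample path X(omega_i); each of the four outcomes carries probability 1/4.
paths = {
    1: (2.0, 3.0, -5.0, 7.0),
    2: (1.0, -1.0, 5.7, 2.0),
    3: (3.0, 6.0, -8.0, 1.9),
    4: (0.5, 7.0, 7.0, -3.0),
}
n_omega = 4
trajectories = [tuple(paths[n][i] for n in sorted(paths)) for i in range(n_omega)]

def P_X(event):
    """P_X(B) = P{omega : X(omega) in B}, with B given as a predicate on IR^4."""
    return sum(1 for x in trajectories if event(x)) / n_omega

# B = {x in IR^4 : max_n x_n >= 7} is hit by three of the four trajectories.
pB = P_X(lambda x: max(x) >= 7)
```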

Example 2.1.3 Let Ω = {ω_1, ω_2, . . . } and P be a probability measure on (Ω, F). Suppose that the time index set is the set of positive integers. A real valued stochastic process X in this setting is a two-dimensional infinite array such that:

X =
X_1(ω_1) X_1(ω_2) . . .
X_2(ω_1) X_2(ω_2) . . .
. . . . . . . . .

Here the sample space is

IR^∞ = {(x_1, x_2, . . . ) ∈ IR × IR × . . . }.


Note that the Borel σ-field B(IR^∞) coincides with the smallest σ-field containing the open sets in IR^∞ in the metric

ρ(x¹, x²) = Σ_k 2^{−k} |x¹_k − x²_k| / (1 + |x¹_k − x²_k|)

([36]). Now think of the stochastic process X as an IR^∞ valued random variable

ω_i → X(ω_i) = (X_1(ω_i), X_2(ω_i), . . . ) = (x^i_1, x^i_2, . . . ) = x^i ∈ IR^∞.

The random variable X induces a probability measure P_X on the σ-field B(IR^∞). For instance, if

A = {x ∈ IR^∞ : sup_n x_n > a} ∈ B(IR^∞),

then the set A consists of all sequences with some of their entries larger than a, and P_X(A) = P(ω : X(ω) ∈ A).
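The metric ρ can only be evaluated approximately in code, but each term is at most 2^{−k}, so truncating the sum after m terms changes the value by at most 2^{−m}. A sketch (not from the book):

```python
def seq_metric(x, y, terms=60):
    """Truncated rho(x, y) = sum_k 2^-k |x_k - y_k| / (1 + |x_k - y_k|),
    where x, y are sequences given as functions of k = 1, 2, ...;
    the neglected tail contributes at most 2^-terms."""
    total = 0.0
    for k in range(1, terms + 1):
        d = abs(x(k) - y(k))
        total += 2.0 ** -k * d / (1.0 + d)
    return total

# Distance between the zero sequence and the constant sequence 1:
# every term is 2^-k * (1/2), so the distance is essentially 1/2.
d01 = seq_metric(lambda k: 0.0, lambda k: 1.0)
```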

Example 2.1.4 (The Single Jump Process) Consider a stochastic process {X_t}, t ≥ 0, which takes its values in some measurable space (E, E) and which remains at its initial value z_0 ∈ E until a random time T, when it jumps to a random position Z. A sample path of the process is

X_t(ω) = z_0 if t < T(ω), Z(ω) if t ≥ T(ω).

The underlying probability space can be taken to be

Ω = [0, ∞] × E,

with the σ-field B ⊗ E. A probability measure P is given on (Ω, B ⊗ E) and we suppose

P([0, ∞] × {z_0}) = 0 = P({0} × E),

so that the probabilities of a zero jump and a jump at time zero are zero.

Write

F_t = P[T > t, Z ∈ E], c = inf{t : F_t = 0}.

F_t is right-continuous and monotonic decreasing, so there are only countably many points of discontinuity {u} = D where ΔF_u = F_u − F_{u−} ≠ 0. At points in D, there are positive probabilities that X jumps. Note that the more probability mass there is at a point u, the more predictable is the jump at that point.

Formally define a function Λ by setting:

dΛ(t) = P(T ∈ ]t − dt, t], Z ∈ E | T > t − dt).

Then dΛ is the probability that the jump occurs in the interval ]t − dt, t], given it has not

  • 8/6/2019 Filtering and Measure Theory

    53/270

    2.1 Definitions and general results 41

happened by t − dt. Roughly speaking we have

dΛ(t) = P(T ∈ ]t − dt, t] | T > t − dt)
= P(T ∈ ]t − dt, t]) / F_{t−dt}
= [(1 − F_t) − (1 − F_{t−dt})] / F_{t−dt}
= −(F_t − F_{t−dt}) / F_{t−dt}
= −(F_t − F_{t−}) / F_{t−}
= −dF_t / F_{t−}.

Define

Λ(t) = −∫_{]0,t]} dF_s / F_{s−}. (2.1.1)

For instance, if T is exponentially distributed with parameter λ we have

Λ(t) = −∫_{]0,t]} d exp(−λs) / exp(−λs) = λt.

Write

F^A_t = P[T > t, Z ∈ A];

then clearly the measure on (IR⁺, B(IR⁺)) given by F^A_t is absolutely continuous with respect to that given by F_t, so that there is a Radon–Nikodym derivative λ(A, s) such that

F^A_t − F^A_0 = ∫_{]0,t]} λ(A, s) dF_s. (2.1.2)

The pair (Λ, λ) is the Lévy system for the jump process. Roughly, λ(dx, s) is the conditional distribution of the jump position Z given that the jump happens at time s.
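For the exponential case the integral defining Λ can be approximated directly by a Riemann–Stieltjes sum and compared with λt. A sketch (not from the book; the rate below is an arbitrary choice):

```python
import math

lam = 0.7   # illustrative rate; any positive value works

def survival(t):
    """F_t = P(T > t) for an exponentially distributed jump time."""
    return math.exp(-lam * t)

def capital_lambda(t, n=200_000):
    """Lambda(t) = -int_{]0,t]} dF_s / F_{s-}, via a Riemann-Stieltjes sum
    over n subintervals of ]0, t]."""
    total, h = 0.0, t / n
    for k in range(n):
        s0, s1 = k * h, (k + 1) * h
        total -= (survival(s1) - survival(s0)) / survival(s0)
    return total

val = capital_lambda(2.0)   # Lambda(t) = lam * t for the exponential case
```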

Let X_t be a continuous time stochastic process. That is, the time index belongs to some interval of the real line, say t ∈ [0, ∞). If we are interested in the behavior of X_t during an interval of time [t_0, t_1] it is necessary to consider simultaneously an uncountable family of X_t's, {X_t, t_0 ≤ t ≤ t_1}. This results in a technical problem because of the uncountability of the index parameter t. Recall that σ-fields are, by definition, closed under countable operations only, and statements like {X_t ≤ x, t_0 ≤ t ≤ t_1} = ∩_{t_0≤t≤t_1} {X_t ≤ x} are not events! However, for most practical situations this difficulty is bypassed by replacing uncountable index sets by countable dense subsets without losing any significant information. In general, these arguments are based on the separability of a continuous time stochastic process. This is possible, for example, if the stochastic process X is almost surely continuous (see Definition 2.1.6).


Let X = {X_t : t ≥ 0} and Y = {Y_t : t ≥ 0} be two stochastic processes defined on the same probability space (Ω, F, P). Because of the presence of ω, the functions X_t(ω) and Y_t(ω) can be compared in different ways.

Definition 2.1.5

1. X and Y are called indistinguishable if

P({ω : X_t(ω) = Y_t(ω), ∀t ≥ 0}) = 1.

2. Y is a modification of X if for every t ≥ 0, we have

P({ω : X_t(ω) = Y_t(ω)}) = 1.

3. X and Y have the same law or probability distribution if and only if all their finite dimensional probability distributions coincide, that is, if and only if for any sequence of times 0 ≤ t_1 ≤ · · · ≤ t_n the joint probability distributions of (X_{t_1}, . . . , X_{t_n}) and (Y_{t_1}, . . . , Y_{t_n}) coincide.

Note that the first property is much stronger than the other two. The null sets in the second and third properties may depend on t.

Recall that there are different definitions of limit for sequences of random variables. So to each definition corresponds a type of continuity of a real valued time index process.

Definition 2.1.6

1. {X_t} is continuous in probability if for every t and ε > 0,

lim_{h→0} P[|X_{t+h} − X_t| > ε] = 0.

2. {X_t} is continuous in L^p if for every t,

lim_{h→0} E[|X_{t+h} − X_t|^p] = 0.

3. {X_t} is continuous almost surely (a.s.) if for every t,

P[lim_{h→0} X_{t+h} = X_t] = 1.

4. {X_t} is right continuous if for almost every ω the map t → X_t(ω) is right continuous. That is,

lim_{s↓t} X_s = X_t a.s.

If in addition

lim_{s↑t} X_s exists a.s.,

{X_t} is right continuous with left limits (rcll or corlol or càdlàg).

However, none of the above notions is strong enough to differentiate, for instance, between a process for which almost all sample paths are continuous for every t, and a process for which almost all sample paths have a countable number of discontinuities, when the two processes have the same finite dimensional distributions. A much stronger criterion for continuity is sample path continuity, which requires continuity for all t's simultaneously! In other words,


for almost all ω the function X_(·)(ω) is continuous in the usual sense. Unfortunately, the definition of a stochastic process in terms of its finite dimensional distributions does not help here since we are faced with whole intervals containing uncountable numbers of t's. Fortunately, for most useful processes in applications, continuous versions (sample path continuous), or right-continuous versions, can be constructed.

If a stochastic process with index set [0, ∞) is continuous, its sample space can be identified with C[0, ∞), the space of all real valued continuous functions. A metric on this space is

ρ(x, y) = Σ_k 2^{−k} [sup_{0≤t≤k} |x(t) − y(t)|] / [1 + sup_{0≤t≤k} |x(t) − y(t)|],

for x, y ∈ C[0, ∞). (See [36].)

Let B(C) be the smallest σ-field containing the open sets of the topology induced by ρ on C[0, ∞), the Borel σ-field. Then ([36]) the same σ-field B(C) is generated by the cylinder sets of C[0, ∞), which have the form

{x ∈ C[0, ∞) : x_{t_1} ∈ I_1, x_{t_2} ∈ I_2, . . . , x_{t_n} ∈ I_n},

where each I_i is an interval of the form (a_i, b_i]. In other words, a cylinder set is a set of functions with restrictions put on a finite number of coordinates, or, in the language of Shiryayev ([36]), it is the set of functions that, at times t_1, . . . , t_n, get through the windows I_1, . . . , I_n and at other times have arbitrary values.

An example of a Borel set from B(C) is

A = {x : sup_{t≥0} x_t > a}.

Remark 2.1.7 Note that the set A depends on the behavior of functions on an uncountable set of points and would not be in the σ-field B(C) if C[0, ∞) were replaced by the much larger space IR^{[0,∞)} (see Theorem 3, page 146 of [36]). In this latter space every Borel set is determined by restrictions imposed on the functions x on an at most countable set of points t_1, t_2, . . . .

Suppose the index parameter t is either a nonnegative integer or a nonnegative real number. The σ-fields F^X_t = σ{X_u, u ≤ t} are the smallest ones with respect to which the random variables X_u, u ≤ t, are measurable, and are naturally associated with any stochastic process {X_t}. F^X_t is sometimes called the natural filtration associated with the stochastic process {X_t}.

The σ-field F^X_t contains all the events which, by time t, are known to have occurred or not by observing X up to time t.

Often it is convenient to consider larger σ-fields than F^X_t. For instance, F_t = σ{X_u, Y_u ; u ≤ t} where {Y_t} is another stochastic process.

Definition 2.1.8 The stochastic process X is adapted to the filtration {F_t, t ≥ 0} if for each t ≥ 0, X_t is an F_t-measurable random variable.

Clearly X is adapted to F^X_t. A function f is F^X_t-measurable if the value of f(ω) can be decided by observing the history of X up to time t (and nowhere else). This follows from the multivariate version of Theorem 1.3.6. For instance, f(ω) = X_{t²}(ω) is F^X_t-measurable for 0 < t < 1 but it is not F^X_t-measurable for t > 1.


As a function of two variables (t, ω), a stochastic process should be measurable with respect to both variables to allow a minimum of good behavior.

Definition 2.1.9 A stochastic process {X_t}, t ∈ [0, ∞), on a probability space (Ω, F, P) is measurable if, for all Borel sets B in the Borel σ-field B(IR^d),

{(ω, t) : X_t(ω) ∈ B} ∈ F ⊗ B([0, ∞)).

If the probability space (Ω, F, P) is equipped with a filtration {F_t}, then a much stronger statement of measurability, which relates measurability in t and ω with the filtration {F_t}, is progressive measurability.

Definition 2.1.10 A stochastic process {X_t} on a filtered probability space (Ω, F, F_t, P) is progressively measurable if, for any t ∈ [0, ∞) and for any set B in the Borel σ-field B(IR^d),

{(ω, s) : s ≤ t, X_s(ω) ∈ B} ∈ F_t ⊗ B([0, t]).

Here B([0, t]) is the σ-field of Borel sets on the interval [0, t].

A measurable process need not be progressively measurable since σ(X_t) may contain events not in F_t.

Lemma 2.1.11 If X is a progressively measurable stochastic process, then X is adapted.

Proof Fix t. The map ω → (t, ω) from Ω into [0, t] × Ω is F_t-measurable. The map (s, ω) → X_s(ω) from [0, t] × Ω to the state space of X is B([0, t]) ⊗ F_t-measurable, by progressive measurability. By composition of the two maps the result follows.

Theorem 2.1.12 If the stochastic process {X_t : t ≥ 0} on the filtered probability space (Ω, F, F_t, P) is measurable and adapted, then it has a progressively measurable modification.

Proof See [28] page 68.

Typically, in a description of a random process, the measure space and the probability measure on it are not given. One simply describes the family of joint distribution functions of every finite collection of random variables of the process. A basic question is whether there is a stochastic process with such a family of joint distribution functions. The following theorem ([36] page 244), due to Kolmogorov, guarantees that this is the case if the joint distribution functions satisfy a set of natural consistency conditions.

Theorem 2.1.13 (Kolmogorov Consistency Theorem) For all t_1, . . . , t_k, k ∈ IN, in the time index set T, let P_{t_1,...,t_k} be probability measures on (IR^k, B(IR^k)) such that

P_{t_{π(1)},...,t_{π(k)}}(F_1 × · · · × F_k) = P_{t_1,...,t_k}(F_{π^{−1}(1)} × · · · × F_{π^{−1}(k)}),

for all permutations π on {1, 2, . . . , k}, and

P_{t_1,...,t_k}(F_1 × · · · × F_k) = P_{t_1,...,t_k,t_{k+1},...,t_{k+m}}(F_1 × · · · × F_k × IR × · · · × IR),


for all m ∈ IN, where the set on the right hand side has a total of k + m factors. Then there is a unique probability measure P on the space (IR^T, B(IR^T)) such that the restriction of P to any cylinder set B_n = {x ∈ IR^T : x_{t_1} ∈ I_1, x_{t_2} ∈ I_2, . . . , x_{t_n} ∈ I_n} is P_{t_1,...,t_n}, that is

P(B_n) = P_{t_1,...,t_n}(B_n).

Proof See [36] page 167.

    Theorem 2.1.14 (Kolmogorov's Existence Theorem) For all τ1, . . . , τk, k ∈ IN, in the time index set, let P_{τ1,...,τk} be probability measures on IR^{nk} such that

    P_{τσ(1),...,τσ(k)}(F1 × · · · × Fk) = P_{τ1,...,τk}(F_{σ−1(1)} × · · · × F_{σ−1(k)}),

    for all permutations σ of {1, 2, . . . , k}, and

    P_{τ1,...,τk}(F1 × · · · × Fk) = P_{τ1,...,τk,τk+1,...,τk+m}(F1 × · · · × Fk × IR^n × · · · × IR^n),

    for all m ∈ IN, where the set on the right hand side has a total of k + m factors. Then there exist a probability space (Ω, F, P) and a stochastic process {Xτ} on Ω into IR^n such that

    P_{τ1,...,τk}(F1 × · · · × Fk) = P[X_{τ1} ∈ F1, . . . , X_{τk} ∈ Fk],

    for all τi in the time set, k ∈ IN and all Borel sets Fi.

    Proof The proof follows essentially from Theorems 1.3.9, 1.3.10 and 2.1.13. See [36] page 247.

    Definition 2.1.15 Suppose X is a stochastic process whose index set is the positive integers Z+, and suppose {Fn} is a filtration. Then {Xn} is predictable if Xn is Fn−1-measurable, that is, Xn(ω) is known from observing events in Fn−1 at time n − 1.

    In continuous time, without loss of generality, we shall take the time index set to be [0, ∞).

    In the continuous time case, roughly speaking, a stochastic process {Xt} is predictable if knowledge about the behavior of the process is left-continuous, that is, Xt is Ft−-measurable. Stated differently, for processes which are continuous on the left one may predict their value at each point from their values at preceding points. A Poisson process (see Section 2.10) is not predictable (its sample paths are right-continuous); otherwise we would be able to predict a jump time immediately before it jumps. More precisely, a stochastic process is predictable if it is measurable with respect to the σ-field on Ω × [0, ∞) generated by the family of all left-continuous adapted stochastic processes.

    A stochastic process X with continuous time parameter is optional if it is measurable with respect to the σ-field on Ω × [0, ∞) generated by the family of all right-continuous, adapted stochastic processes which have left limits.

    Definition 2.1.16 A measurable stochastic process {Xt} with values in [0, ∞) is called an increasing process if almost every sample path X·(ω) is right-continuous and increasing.


    Theorem 2.1.17 Suppose {Xt} is an increasing process. Then Xt has a unique decomposition as X^c_t + X^d_t, where {X^c_t} is an increasing continuous process and {X^d_t} is an increasing purely discontinuous process, that is, {X^d_t} is the sum of the jumps of {Xt}.

    If {Xt} is predictable, {X^d_t} is predictable. If {Xt} is adapted, {X^c_t} is predictable.

    Proof See [11] page 69.

    2.2 Stopping times

    One of the most important questions in the study of stochastic processes is when a process first hits a certain level, or first enters a certain region of its state space. Since for each possible trajectory, or realization ω, there is a hitting time (finite or infinite), the hitting time is a random variable taking values in the index, or time, space of the stochastic process.

    Let ĪN = {1, 2, 3, . . . , ∞} and F∞ = σ(∪_{n=1}^∞ Fn).

    A random variable τ taking values in ĪN is a stopping time (or optional or Markov time) with respect to a filtration {Fn} if for all n ∈ ĪN we have {ω : τ(ω) ≤ n} ∈ Fn. An equivalent definition in discrete time is to require {ω : τ(ω) = n} ∈ Fn.

    The concept of stopping time is directly related to the flow of information through time, that is, to the filtration. The event {ω : τ(ω) ≤ n} is Fn-measurable, that is, measurable with respect to the information available up to time n. This means a stopping time is a nonanticipative function, whereas a general random variable may anticipate the future.

    Example 2.2.1 Let {Xn, Fn} be an adapted process (i.e. {Fn} is a filtration and Xn is Fn-measurable for all n). Suppose A is a measurable set of the state space of X. Then the random time

    τ = min{k : Xk ∈ A}

    is a stopping time since

    {τ ≤ n} = ∪_{k=1}^n {Xk ∈ A} ∈ Fn.

    If τ is a stopping time with respect to a filtration Fn, so is τ + m, m ∈ IN. However, τ − m, m ∈ IN, is not a stopping time since the event {τ − m = n} = {τ = n + m} is not in Fn; it is in Fn+m and hence anticipates the future.
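    As a quick numerical illustration (a sketch not in the text; the path and the helper `hitting_time` are hypothetical), the stopping-time property of Example 2.2.1 can be checked path by path: whether τ ≤ n is decided by X1, . . . , Xn alone, so truncating a path at τ cannot change the answer.

    ```python
    # Sketch (hypothetical example, not from the text): first hitting time
    # tau = min{k : X_k in A} along a fixed +/-1 random-walk path.
    def hitting_time(path, in_A):
        """Return the first k (1-based) with path[k-1] in A, or None if A is never hit."""
        for k, x in enumerate(path, start=1):
            if in_A(x):
                return k
        return None

    # a fixed sample path of a +/-1 walk
    X = [1, 0, 1, 2, 1, 2, 3, 2, 3, 4]
    tau = hitting_time(X, lambda x: x >= 3)   # first passage to level 3: tau = 7
    # {tau <= n} depends only on X_1, ..., X_n: truncating the path at tau
    # leaves the hitting time unchanged (Fn-measurability of {tau <= n}).
    assert hitting_time(X[:tau], lambda x: x >= 3) == tau
    ```
    
    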

    In order to measure the information accumulated up to a stopping time we define the σ-field Fτ of events prior to the stopping time τ. Suppose that some event B is part of this information. This means that if τ ≤ n we should be able to tell whether or not B has occurred. However, {τ ≤ n} ∈ Fn, so we should have B ∩ {τ ≤ n} ∈ Fn and B^c ∩ {τ ≤ n} ∈ Fn. We therefore define:

    Fτ = {A ∈ F∞ : A ∩ {ω : τ(ω) ≤ n} ∈ Fn ∀ n ≥ 0}.

    The next examples should help to clarify this concept.


    Example 2.2.2 Let Ω = {ωi; i = 1, . . . , 8} and the time index T = {1, 2, 3}. Consider the following filtration:

    F1 = σ{{ω1, ω2, ω3, ω4, ω5, ω6}, {ω7, ω8}},
    F2 = σ{{ω1, ω2}, {ω3, ω4}, {ω5, ω6}, {ω7, ω8}},
    F3 = σ{{ω1}, {ω2}, {ω3}, {ω4}, {ω5}, {ω6}, {ω7}, {ω8}}.

    Now define the random variable

    τ(ω1) = τ(ω2) = τ(ω5) = τ(ω6) = 2,
    τ(ω3) = τ(ω4) = τ(ω7) = τ(ω8) = 3,

    so that

    {τ = 0} = ∅, {τ = 1} = ∅, {τ = 2} = {ω1, ω2, ω5, ω6}, {τ = 3} = {ω3, ω4, ω7, ω8},

    and τ is a stopping time.

    Now Fτ = σ{all events A ∈ F∞ (= F3) such that for some n the event A is a subset of the event {ω : τ(ω) ≤ n}}. In our situation

    Fτ = σ{{ω1, ω2}, {ω5, ω6}, {ω3}, {ω4}, {ω7}, {ω8}}.

    Note that the first two simple events of Fτ, {ω1, ω2} and {ω5, ω6}, are in F2 and the rest are in F3, as they should be. Also, note that Fτ is not the σ-field generated by the random variable τ. However, a closer look shows that τ is Fτ-measurable. If, for instance, the outcome is ω1 then τ = 2 and τ^{−1}(2) = {τ = 2} = {ω1, ω2, ω5, ω6} is an atom of the σ-field generated by the random variable τ but not an atom of Fτ.
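    The atoms of Fτ in Example 2.2.2 can be generated mechanically (a sketch; the encoding of ωi as the integer i and the helper name `f_tau_atoms` are ours, not the book's): an atom of Fn enters Fτ as soon as it is contained in {τ ≤ n}, and finer partitions only contribute atoms not already covered.

    ```python
    # Sketch: atoms of F_tau for Example 2.2.2 (omega_i encoded as i = 1..8).
    partitions = {                       # atoms of F_1, F_2, F_3
        1: [{1, 2, 3, 4, 5, 6}, {7, 8}],
        2: [{1, 2}, {3, 4}, {5, 6}, {7, 8}],
        3: [{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}],
    }
    tau = {1: 2, 2: 2, 5: 2, 6: 2, 3: 3, 4: 3, 7: 3, 8: 3}

    def f_tau_atoms(partitions, tau):
        """Collect atoms A of F_n with A contained in {tau <= n}, coarsest first."""
        atoms, covered = [], set()
        for n in sorted(partitions):
            level = {w for w in tau if tau[w] <= n}   # the event {tau <= n}
            for A in partitions[n]:
                if A <= level and not (A & covered):
                    atoms.append(sorted(A))
                    covered |= A
        return sorted(atoms)

    # reproduces F_tau = sigma{{1,2}, {5,6}, {3}, {4}, {7}, {8}}
    assert f_tau_atoms(partitions, tau) == [[1, 2], [3], [4], [5, 6], [7], [8]]
    ```
    
    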

    Example 2.2.3 Consider again the experiment of tossing a fair coin infinitely many times. Each ω ∈ Ω is an infinite sequence of heads and tails and

    Ω = {H, T}^IN.

    Define the filtration:

    F1 = σ{{ω starting with H}, {ω starting with T}},
    F2 = σ{{ω starting with HH}, {ω starting with HT}, {ω starting with TH}, {ω starting with TT}}, . . . ,
    Fn = σ{{ω starting with n fixed letters}}.

    Suppose that we win one dollar each time heads comes up and lose one otherwise. Let S0 = 0 and Sn be our fortune after the n-th toss. Define the random variable τ = inf{n : Sn > 0}, which is the first time our winnings exceed our losses. Clearly, τ is a stopping time with respect to the filtration Fn.

    Here

    Fτ = σ{{ω starting with H}, {ω starting with THH}, {ω starting with THTHH}, {ω starting with TTHHH}, . . . }


    and

    τ(ω starting with H) = 1,
    τ(ω starting with THH) = 3,
    τ(ω starting with THTHH) = τ(ω starting with TTHHH) = 5.

    If ω = THTHH . . . , then the information at time τ(THTHH . . . ) = 5 is in F5 and is given by the event composed of all the ω starting with THTHH, which is an atom of Fτ. However, {τ = 5} = {{THTHH . . . }, {TTHHH . . . }}, which is not an atom of Fτ.

    If σ ≤ τ are two stopping times then Fσ ⊂ Fτ, because if A ∈ Fσ,

    A ∩ {τ ≤ n} = (A ∩ {σ ≤ n}) ∩ {τ ≤ n} ∈ Fn (2.2.1)

    for all n. From this result we see that if {τn} is an increasing sequence of stopping times, the sequence {Fτn} is a filtration.

    Example 2.2.4 Let Ω = {ωi, i = 1, . . . , 8} and the time index T = {1, 2, 3, 4}. Consider the following filtration:

    F1 = σ{{ωi, i = 1, . . . , 6}, {ω7, ω8}},
    F2 = σ{{ω1, ω2, ω3}, {ω4, ω5, ω6}, {ω7, ω8}},
    F3 = σ{{ω1, ω2}, {ω3}, {ω4}, {ω5, ω6}, {ω7, ω8}},
    F4 = σ{{ω1}, {ω2}, {ω3}, {ω4}, {ω5}, {ω6}, {ω7}, {ω8}}.

    Now define the stopping times τ1 and τ2:

    τ1(ω1) = τ1(ω2) = τ1(ω3) = τ1(ω4) = τ1(ω5) = τ1(ω6) = 2, τ1(ω7) = τ1(ω8) = 3,
    τ2(ω1) = τ2(ω2) = τ2(ω3) = 2, τ2(ω5) = τ2(ω6) = 3, τ2(ω4) = τ2(ω7) = τ2(ω8) = 4,

    so that τ1 ≤ τ2 and Fτ1 ⊂ Fτ2, where

    Fτ1 = σ{{ω1, ω2, ω3}, {ω4, ω5, ω6}, {ω7, ω8}},
    Fτ2 = σ{{ω1, ω2, ω3}, {ω4}, {ω5, ω6}, {ω7}, {ω8}}.

    For any Borel set B,

    {ω : X_{τ(ω)}(ω) ∈ B} = ∪_{n=0}^∞ {Xn(ω) ∈ B, τ(ω) = n} ∈ F,

    that is, Xτ is a random variable.

    If X∞ has been defined and X∞ ∈ F∞ = σ(∪_n Fn), then we define Xτ(ω) = X_{τ(ω)}(ω), i.e. Xτ = Σ_{n∈ĪN} Xn I_{τ=n} ∈ F∞; in fact, Xτ is Fτ-measurable.


    In the continuous time situation, definitions are more involved and the time parameter t plays a much more important role, since continuity, limits, etc. enter the scene. Let {Ft}, t ∈ [0, ∞), be a filtration. A nonnegative random variable τ is called a stopping time with respect to the filtration Ft if for all t ≥ 0 we have {ω : τ(ω) ≤ t} ∈ Ft.

    A nonnegative random variable τ is an optional time with respect to the filtration Ft if for all t ≥ 0 we have {ω : τ(ω) < t} ∈ Ft.

    Every stopping time is optional, and the two concepts coincide if the filtration is right-continuous: if τ is optional then {ω : τ(ω) ≤ t} ∈ Ft+ε for every ε > 0, and hence {ω : τ(ω) ≤ t} ∈ ∩_{ε>0} Ft+ε = Ft+ = Ft, provided that Ft is right-continuous.

    Example 2.2.5 Suppose {Xt, t ≥ 0} is continuous and adapted to the filtration {Ft, t ≥ 0}.

    1. Consider τ(ω) = inf{t : Xt(ω) = b}, the first time the process X hits level b ∈ IR (the first passage time to the level b ∈ IR). Then τ is a stopping time since

    {τ ≤ t} = ∩_{n∈IN} ∪_{r∈Q, r≤t} {|Xr − b| ≤ 1/n} ∈ Ft.

    2. Consider τ(ω) = inf{t : |Xt(ω)| ≥ 1}, the first time the process X leaves the interval [−1, +1]. Then τ is a stopping time.
    3. Consider τ(ω) = inf{t : ΔXt(ω) > 1}, which is the first time the jump ΔXt = Xt − Xt− exceeds 1. Then τ is a stopping time.

    Similarly to the discrete time case, the σ-field of events prior to a stopping time τ is defined by

    Fτ = {A ∈ F : A ∩ {ω : τ(ω) ≤ t} ∈ Ft ∀ t ≥ 0}. (2.2.2)

    Any stopping time τ is Fτ-measurable as, for s ≤ t,

    {ω : τ(ω) ≤ s} ∩ {ω : τ(ω) ≤ t} = {ω : τ(ω) ≤ min(t, s)} ∈ F_{min(t,s)} ⊂ Ft. (2.2.3)

    Hence {ω : τ(ω) ≤ s} ∈ Fτ.

    If τ1, τ2 are stopping times, then min(τ1, τ2), max(τ1, τ2) and τ1 + τ2 are stopping times as:

    1. {min(τ1, τ2) ≤ t} = {τ1 ≤ t} ∪ {τ2 ≤ t} ∈ Ft,
    2. {max(τ1, τ2) ≤ t} = {τ1 ≤ t} ∩ {τ2 ≤ t} ∈ Ft,
    3. {τ1 + τ2 ≤ t} = {τ1 = 0, τ2 ≤ t} ∪ {τ2 = 0, τ1 ≤ t} ∪ ∪_{p,q∈Q, p+q≤t} ({τ1 ≤ p} ∩ {τ2 ≤ q}) ∈ Ft, where Q is the set of rational numbers.
    4. If {τn} is a sequence of stopping times then sup_n τn is a stopping time since {sup_n τn ≤ t} = ∩_n {τn ≤ t} ∈ Ft.
    5. If τ1, τ2 are stopping times such that τ1 ≤ τ2, then Fτ1 ⊂ Fτ2.


    Perhaps one of the most important applications of the concept of stopping time is the so-called strong Markov property.

    A stochastic process {Xt} is a Markov process if

    E[f(X_{t+s}) | F^X_t] = E[f(X_{t+s}) | Xt], (P-a.s.) (2.2.4)

    where f is any bounded measurable function and F^X_t = σ{Xu, u ≤ t}. Equation (2.2.4) is termed the Markov property.

    A natural generalization of the Markov property is the strong Markov property, where the present time t in (2.2.4) is replaced by a stopping time σ and the future time t + s is replaced by another, later stopping time τ. That is, if σ and τ are stopping times and σ ≤ τ,

    E[f(Xτ) | Fσ] = E[f(Xτ) | Xσ] a.s.

    In other words, a stochastic process {Xt} has the strong Markov property if the information about the behavior of {Xt} prior to the stopping time σ is irrelevant in predicting its behavior after that time once Xσ is observed.

    2.3 Discrete time martingales

    Martingales are probably the most important type of stochastic processes used for modeling.

    They occur naturally in almost any information processing problem involving sequentialacquisition of data: for example, the sequence of estimates of a random variable based on

    increasing observations, and the sequence of likelihood ratios in a sequential hypothesis

    test are martingales.

    The stochastic process X is a submartingale (supermartingale) with respect to the filtration {Fn} if it is

    1. Fn-adapted,
    2. E[|Xn|] < ∞ for all n, and
    3. E[Xn′ | Fn] ≥ Xn a.s. (E[Xn′ | Fn] ≤ Xn a.s.) for all n′ ≥ n.

    The stochastic process X is a martingale if it is both a submartingale and a supermartingale.

    If we recall the definition of conditional expectation we see that the requirement E[Xn+1 | Fn] = Xn a.s. implies the following:

    ∫_F E[Xn+1 | Fn] dP = ∫_F Xn+1 dP, F ∈ Fn,

    and

    ∫_F Xn dP = ∫_F Xn+1 dP, F ∈ Fn. (2.3.1)

    Since Fn ⊂ Fn+1 ⊂ · · · ⊂ Fn+k, it is easily seen that

    ∫_F Xn dP = ∫_F Xn+1 dP = · · · = ∫_F Xn+k dP, F ∈ Fn. (2.3.2)

  • 8/6/2019 Filtering and Measure Theory

    63/270

    2.3 Discrete time martingales 51

    and hence, with probability 1, E[Xn+k | Fn] = Xn. Setting F = Ω and n = 1, 2, . . . in (2.3.2) gives

    E[X1] = E[X2] = · · · = E[Xn].

    A classical example of a martingale X is a player's fortune in successive plays of a fair game. If X0 is the initial fortune, then "fair" means that, on average, the fortune at some future time n, after more plays, should be neither more nor less than X0. If the game is favorable to the player, then his fortune should increase on average and Xn is a submartingale. If the game is unfavorable to the player, Xn is a supermartingale.

    The following important inequality is used to prove a fundamental result on constructing a uniformly integrable family of random variables by conditioning a fixed (integrable) random variable on a family of sub-σ-fields.

    Lemma 2.3.1 (Jensen's Inequality) Suppose X ∈ L¹. If φ : IR → IR is convex and φ(X) ∈ L¹, then

    E[φ(X) | G] ≥ φ(E[X | G]). (2.3.3)

    Proof (See, for example, [11].) Any convex function φ : IR → IR is the supremum of a family of affine functions, so there exists a sequence (φn) of real functions with φn(x) = an x + bn for each n, such that φ = sup_n φn. Therefore φ(X) ≥ an X + bn holds a.s. for each (and hence all) n. So by the positivity of E[· | G], E[φ(X) | G] ≥ sup_n (an E[X | G] + bn) = φ(E[X | G]) a.s.

    Lemma 2.3.2 Let X ∈ L^p, p ≥ 1. The family

    L = {E[X | G] : G is a sub-σ-field of F}

    is uniformly integrable.

    Proof Since φ(x) = |x|^p is convex, Jensen's Inequality 2.3.1 implies that

    |E[X | G]|^p ≤ E[|X|^p | G].

    Hence

    E[|E[X | G]|^p] ≤ E[E[|X|^p | G]] = E[|X|^p],

    that is, the family L is bounded in L^p, uniformly in G, and the result follows.

    The following theorem is a useful tool in proving convergence results for submartingales.

    Theorem 2.3.6 (Doob) If {Xn, Fn} is a submartingale then, for all n ≥ 1,

    E[Cn[a, b]] ≤ E[Xn − a]+ / (b − a),

    where Cn[a, b] is the number of up-crossings of the interval [a, b] by X1, . . . , Xn and [Xn − a]+ = max{(Xn − a), 0}.

    Proof See [36] page 474.
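    Doob's bound can be checked exactly by enumeration (a sketch under our own conventions, not from the text: the submartingale |Sn| of a fair ±1 walk, and a simple up-crossing counter).

    ```python
    # Sketch: verify E[C_n[a,b]] <= E[(X_n - a)^+]/(b - a) exactly for the
    # submartingale X_n = |S_n|, S_n a fair +/-1 walk, over all 2^N paths.
    from fractions import Fraction
    from itertools import product

    def upcrossings(xs, a, b):
        """Number of up-crossings of [a, b] by the finite sequence xs."""
        count, below = 0, False
        for x in xs:
            if x <= a:
                below = True
            elif x >= b and below:
                count += 1
                below = False
        return count

    a, b, N = 0, 2, 6
    EC = EP = Fraction(0)
    for steps in product([1, -1], repeat=N):     # each path has probability 2^-N
        S, path = 0, []
        for d in steps:
            S += d
            path.append(abs(S))                  # |S_n| is a submartingale
        EC += Fraction(upcrossings(path, a, b), 2 ** N)
        EP += Fraction(max(path[-1] - a, 0), 2 ** N)

    assert EC * (b - a) <= EP                    # Doob's up-crossing inequality
    ```
    
    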

    Theorem 2.3.7 If {Xn, Fn} is a nonnegative martingale then Xn → X∞ a.s., where X∞ is an integrable random variable.

    Proof Suppose that the event {ω : lim inf Xn(ω) < lim sup Xn(ω)} has positive probability. Then there are rationals a < b such that

    P{ω : lim inf Xn(ω) < a < b < lim sup Xn(ω)} > 0. (2.3.5)

    This means that, with positive probability, {Xn} oscillates about, or up-crosses, the interval [a, b] infinitely many times. However, using Theorem 2.3.6 and the fact that sup_n E[Xn] = E[X1] < ∞, we have:

    lim_n E[Cn[a, b]] ≤ lim_n E[Xn − a]+ / (b − a) ≤ (E[X1] + |a|) / (b − a) < ∞,

    so the total number of up-crossings of [a, b] is a.s. finite, contradicting (2.3.5). Hence Xn converges a.s. to a limit X∞, which is integrable by Fatou's lemma, since E[X∞] ≤ lim inf_n E[Xn] = E[X1] < ∞.

    Theorem 2.3.11 If {Xn, Fn} is a martingale and τ is a stopping time, then the stopped process {X_{min(n,τ)}, Fn} is a martingale; indeed,

    E[X_{min(n+1,τ)} − X_{min(n,τ)} | Fn] = E[I_{τ>n}(Xn+1 − Xn) | Fn] = I_{τ>n} E[(Xn+1 − Xn) | Fn] = 0,

    since {τ > n} ∈ Fn.

    We also have that stopping at an optional time preserves the martingale property.

    Theorem 2.3.12 (Doob Optional Sampling Theorem) Suppose {Xn, Fn} is a martingale. Let σ ≤ τ (a.s.) be stopping times such that Xσ and Xτ are integrable. Also suppose that

    lim inf_n ∫_{σ>n} |Xn| dP = 0, (2.3.6)

    and

    lim inf_n ∫_{τ>n} |Xn| dP = 0. (2.3.7)

    Then

    E[Xτ | Fσ] = Xσ. (2.3.8)

    In particular, E[Xτ] = E[Xσ].

    Proof Using the definition of conditional expectation, we have to show that, for every A ∈ Fσ,

    ∫_A I_{σ≤τ} E[Xτ | Fσ] dP = ∫_A I_{σ≤τ} Xσ dP = ∫_A I_{σ≤τ} Xτ dP.

    However, {σ ≤ τ} = ∪_{n≥0} {σ = n} ∩ {τ ≥ n}. Hence it suffices to show that, for all n ≥ 0:

    ∫_A I_{σ=n} I_{τ≥n} Xσ dP = ∫_A I_{σ=n} I_{τ≥n} Xτ dP = ∫_A I_{σ=n} I_{τ≥n} Xn dP. (2.3.9)

    Now, {ω : τ(ω) ≥ n} = {ω : τ(ω) = n} ∪ {ω : τ(ω) ≥ n + 1} and, in view of (2.3.1), the last integral in (2.3.9) is equal to

    ∫_{A∩{σ=n}∩{τ=n}} Xn dP + ∫_{A∩{σ=n}∩{τ≥n+1}} Xn+1 dP
    = ∫_{A∩{σ=n}∩{τ=n}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+1}} Xn+1 dP. (2.3.10)


    Also, {ω : τ(ω) ≥ n} = {ω : n ≤ τ(ω) ≤ n + 1} ∪ {ω : τ(ω) ≥ n + 2} and, using (2.3.1) again, (2.3.10) equals

    ∫_{A∩{σ=n}∩{n≤τ≤n+1}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+2}} Xn+2 dP.

    Repeating this step k times,

    ∫_A I_{σ=n} I_{τ≥n} Xn dP = ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP + ∫_{A∩{σ=n}∩{τ≥n+k+1}} Xn+k+1 dP,

    that is,

    ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP − ∫_{A∩{σ=n}∩{τ≥n+k+1}} Xn+k+1 dP.

    Now,

    Xn+k+1 = X+_{n+k+1} − X−_{n+k+1} = 2X+_{n+k+1} − (X+_{n+k+1} + X−_{n+k+1}) = 2X+_{n+k+1} − |Xn+k+1|,

    so that

    ∫_{A∩{σ=n}∩{n≤τ≤n+k}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP − 2 ∫_{A∩{σ=n}∩{τ≥n+k+1}} X+_{n+k+1} dP + ∫_{A∩{σ=n}∩{τ≥n+k+1}} |Xn+k+1| dP. (2.3.11)

    Taking the limit as k → ∞ of both sides of (2.3.11) and using (2.3.7), we obtain

    ∫_{A∩{σ=n}∩{τ≥n}} Xτ dP = ∫_{A∩{σ=n}∩{τ≥n}} Xn dP,

    which establishes (2.3.9) and finishes the proof.

    Definition 2.3.13 The stochastic process {Xn, Fn} is a local martingale if there is a sequence of stopping times {τk} increasing to ∞ with probability 1 and such that {X_{min(n,τk)}, Fn} is a martingale for each k.

    Remark 2.3.14 The interesting fact about local martingales is that they can be obtained rather naturally through a martingale transform (a stochastic integral, in the continuous time case), which is defined as follows. Suppose {Yn, Fn} is a martingale and {An, Fn} is a predictable process. Then the sequence

    Xn = A0 Y0 + Σ_{k=1}^n Ak (Yk − Yk−1)

    is called a martingale transform and is a local martingale.

    Proof To show that {Xn, Fn} is a local martingale we have to find a sequence of stopping times {τk}, k ≥ 1, increasing to infinity (P-a.s.) and such that the stopped process {X_{min(n,τk)}, Fn} is a martingale. Let τk = inf{n ≥ 0 : |An+1| > k}. Since A is predictable the τk are stopping times and clearly τk ↑ ∞ (P-a.s.). Since Y is a martingale and |A_{min(n,τk)} I_{τk>n}| ≤ k then, for all n ≥ 1,

    E[|X_{min(n,τk)} I_{τk>n}|] < ∞.

    Moreover, from Theorem 2.3.11,

    E[(X_{min(n+1,τk)} − X_{min(n,τk)}) I_{τk>n} | Fn] = I_{τk>n} A_{min(n+1,τk)} E[Y_{min(n+1,τk)} − Y_{min(n,τk)} | Fn] = 0.

    This finishes the proof.

    Example 2.3.15 Suppose that you are playing a game using the following strategy. At each time n your stake is An. Write Xn for your total gain through the n-th game, with X0 = 0 for simplicity.

    Write Fn = σ{Xk : 0 ≤ k ≤ n}. We suppose that, for each n, An is Fn−1-measurable, that is, A = {An} is predictable with respect to the filtration Fn. This means that An = An(X0, X1, . . . , Xn−1) is a function of X0, X1, . . . , Xn−1.

    If we assume that you win (or lose) at time n if a Bernoulli random variable bn is equal to 1 (or −1), then

    Xn = Σ_{k=1}^n Ak bk = Σ_{k=1}^n Ak ΔCk.

    Here ΔCk = Ck − Ck−1 and Ck = Σ_{i=1}^k bi. If C is a martingale with respect to the filtration Fn (in this case we say that the game is fair), then the same holds for X because

    E[Xn | Fn−1] = Xn−1 + An E[Cn − Cn−1 | Fn−1]
    = Xn−1 + An (E[Cn | Fn−1] − Cn−1)
    = Xn−1 + An (Cn−1 − Cn−1) = Xn−1.
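    This fairness can be verified exhaustively (a sketch; the doubling stake rule below is our own hypothetical choice of predictable {An}, not the book's): for any stake depending only on b1, . . . , bk−1, the expectation E[Xn] is exactly 0.

    ```python
    # Sketch: a predictable staking rule cannot bias a fair game.  Enumerate all
    # +/-1 outcomes b_1..b_5 and check E[X_n] = 0 for X_n = sum_k A_k b_k.
    from fractions import Fraction
    from itertools import product

    def stake(history):
        """A (hypothetical) predictable rule: double after each loss, reset on a win."""
        A = 1
        for b in history:
            A = 1 if b == 1 else 2 * A
        return A

    N = 5
    E = [Fraction(0)] * (N + 1)              # E[X_0], ..., E[X_N]
    for bs in product([1, -1], repeat=N):    # each path has probability 2^-N
        X = 0
        for k in range(1, N + 1):
            X += stake(bs[:k - 1]) * bs[k - 1]   # A_k depends only on b_1..b_{k-1}
            E[k] += Fraction(X, 2 ** N)

    assert all(e == 0 for e in E)            # the transform keeps zero mean
    ```
    
    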

    2.4 Doob decomposition

    A submartingale is a process which, on average, is nondecreasing. Unlike a martingale, which has a constant mean over time, a submartingale has a trend, an increasing predictable part, perturbed by a martingale component which is not predictable. This is made precise by the following theorem due to J. L. Doob.


    Theorem 2.4.1 (Doob Decomposition) Any submartingale {Xn} can be written (P-a.s. uniquely) as

    Xn = Yn + Zn, a.s. (2.4.1)

    where {Yn} is a martingale and {Zn} is a predictable, increasing process, i.e. E[Zn] < ∞, Z1 = 0 and Zn ≤ Zn+1 a.s. ∀n.

    Proof Write Δn = Xn − Xn−1, yi = Δi − E[Δi | Fi−1] and zi = E[Δi | Fi−1], z0 = 0. Then:

    Xn = Δ1 − E[Δ1 | F0] + Δ2 − E[Δ2 | F1] + · · · + Δn − E[Δn | Fn−1] + Σ_{i=1}^n E[Δi | Fi−1]
    = Σ_{i=1}^n yi + Σ_{i=1}^n zi
    = Yn + Zn.

    To prove uniqueness, suppose that there is another decomposition Xn = Yn + Zn = Y′n + Z′n = Σ_{i=1}^n y′i + Σ_{i=1}^n z′i. Let yn + zn = Δxn = y′n + z′n and take conditional expectations with respect to Fn−1 to get zn = z′n, because y′n is a martingale increment and z′n is predictable. This implies yn = y′n and the uniqueness of the decomposition.
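    For a concrete instance (a sketch, not from the text): for a fair ±1 walk Sn, the process Xn = Sn² is a submartingale with zn = E[Δn | Fn−1] = E[2Sn−1 bn + 1 | Fn−1] = 1, so the compensator is Zn = n and Yn = Sn² − n is the martingale part. This can be checked exactly by enumeration:

    ```python
    # Sketch: Doob decomposition of X_n = S_n^2, S_n a fair +/-1 walk:
    # compensator Z_n = n, martingale part Y_n = S_n^2 - n.
    from fractions import Fraction
    from itertools import product

    N = 4
    EY = Fraction(0)                          # exact E[Y_N] over all 2^N paths
    for steps in product([1, -1], repeat=N):
        S = sum(steps)
        EY += Fraction(S * S - N, 2 ** N)

    # Y is a martingale with Y_0 = 0, so its mean stays 0
    assert EY == 0
    ```
    
    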

    Remarks 2.4.2

    1. In Theorem 2.4.1, if {Xn} is just an Fn-adapted and integrable process, the decomposition remains valid but we lose the increasing property of the process {Zn}.
    2. The process X − Z is a martingale; as a result, Z is called the compensator of the submartingale X.
    3. A process which is the sum of a predictable process and a martingale is called a semimartingale.
    4. Uniqueness of the decomposition is ensured by the predictability of the process {Zn}.

    Definition 2.4.3 A discrete-time stochastic process {Xn}, with finite state space S = {s1, s2, . . . , sN}, defined on a probability space (Ω, F, P), is a Markov chain if

    P(Xn+1 = s_{i_{n+1}} | X0 = s_{i_0}, . . . , Xn = s_{i_n}) = P(Xn+1 = s_{i_{n+1}} | Xn = s_{i_n}),

    for all n ≥ 0 and all states s_{i_0}, . . . , s_{i_n}, s_{i_{n+1}} ∈ S. This is termed the Markov property. {Xn} is a homogeneous Markov chain if

    P(Xn+1 = sj | Xn = si) = πji

    is independent of n.


    The matrix Π = (πji) is called the probability transition matrix of the homogeneous Markov chain and it satisfies the property

    Σ_{j=1}^N πji = 1.

    Note that our transition matrix is the transpose of the traditional transition matrix defined elsewhere. The convenience of this choice will be apparent later.

    The following properties of a homogeneous Markov chain are easy to check.

    1. Let p^0 = (p^0_1, p^0_2, . . . , p^0_N) be the distribution of X0. Then

    P(X0 = s_{i_0}, X1 = s_{i_1}, . . . , Xn = s_{i_n}) = p^0_{i_0} π_{i_1 i_0} · · · π_{i_n i_{n−1}}.

    2. Let p^n = (p^n_1, p^n_2, . . . , p^n_N) be the distribution of Xn. Then

    p^n = Π^n p^0 = Π p^{n−1}.
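    A quick numerical check of property 2, under the book's column convention for Π (a sketch with hypothetical numbers; here the columns, not the rows, sum to 1):

    ```python
    # Sketch: distributions evolve as p^n = Pi p^{n-1} with a column-stochastic Pi.
    def matvec(M, v):
        """(M v)_j = sum_i M[j][i] v[i] for a matrix stored as rows."""
        return [sum(M[j][i] * v[i] for i in range(len(v))) for j in range(len(M))]

    Pi = [[0.9, 0.2],        # pi_{ji} = P(X_{n+1} = s_j | X_n = s_i)
          [0.1, 0.8]]        # each column sums to 1
    assert all(abs(Pi[0][i] + Pi[1][i] - 1.0) < 1e-12 for i in range(2))

    p = [1.0, 0.0]           # p^0: start in state s_1
    for _ in range(3):
        p = matvec(Pi, p)    # p^n = Pi p^{n-1}

    assert abs(sum(p) - 1.0) < 1e-12     # p^3 is still a probability vector
    ```
    
    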

    Example 2.4.4 Let {ξn} be a discrete-time Markov chain as in Definition 2.4.3. Consider the filtration {Fn} = σ{ξ0, ξ1, . . . , ξn}.

    Write Xn = (I(ξn = s1), I(ξn = s2), . . . , I(ξn = sN)).

    Then Xn is a discrete-time Markov chain with state space the set of unit vectors e1 = (1, 0, . . . , 0), . . . , eN = (0, . . . , 1) of IR^N. However, the probability transition matrix of X is again Π. We can write:

    E[Xn | Fn−1] = E[Xn | Xn−1] = Π Xn−1, (2.4.2)

    from which we conclude that Π Xn−1 is the predictable part of Xn, given the history of X up to time n − 1, and the nonpredictable part of Xn must be Mn = Xn − Π Xn−1. In fact it can be easily shown that Mn ∈ IR^N is a