Filtering and Measure Theory
8/6/2019
http://www.cambridge.org/9780521838030
Measure Theory and Filtering
Introduction and Applications
The estimation of noisily observed states from a sequence of data has traditionally incorporated ideas from Hilbert spaces and calculus-based probability theory. As conditional expectation is the key concept, the correct setting for filtering theory is that of a probability space. Graduate engineers, mathematicians, and those working in quantitative finance wishing to use filtering techniques will find in the first half of this book an accessible introduction to measure theory, stochastic calculus, and stochastic processes, with particular emphasis on martingales and Brownian motion. Exercises are included, solutions to which are available from www.cambridge.org. The book then provides an excellent user's guide to filtering: basic theory is followed by a thorough treatment of Kalman filtering, including recent results that extend the Kalman filter to provide parameter estimates. These ideas are then applied to problems arising in finance, genetics, and population modelling in three separate chapters, making this a comprehensive resource for both practitioners and researchers.
Lakhdar Aggoun is Associate Professor in the Department of Mathematics and Statistics at Sultan Qaboos University, Oman.
Robert Elliott is RBC Financial Group Professor of Finance at the University of Calgary,
Canada.
CAMBRIDGE SERIES IN STATISTICAL AND
PROBABILISTIC MATHEMATICS
Editorial Board
R. Gill (Department of Mathematics, Utrecht University)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial Engineering, University of California, Berkeley)
M. Stein (Department of Statistics, University of Chicago)
B. Silverman (St. Peter's College, University of Oxford)
This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.
Already published
1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
2. Markov Chains, by J. Norris
3. Asymptotic Statistics, by A. W. van der Vaart
4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User's Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R, by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll
13. Exercises in Probability, by Loïc Chaumont and Marc Yor
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge, UK

First published in print format 2004

Information on this title: www.cambridge.org/9780521838030

© Cambridge University Press 2004

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Contents
Preface page ix
Part I Theory 1
1 Basic probability concepts 3
1.1 Random experiments and probabilities 3
1.2 Conditional probabilities and independence 9
1.3 Random variables 14
1.4 Conditional expectations 28
1.5 Problems 34
2 Stochastic processes 38
2.1 Definitions and general results 38
2.2 Stopping times 46
2.3 Discrete time martingales 50
2.4 Doob decomposition 56
2.5 Continuous time martingales 59
2.6 Doob-Meyer decomposition 62
2.7 Brownian motion 70
2.8 Brownian motion process with drift 72
2.9 Brownian paths 72
2.10 Poisson process 75
2.11 Problems 75
3 Stochastic calculus 79
3.1 Introduction 79
3.2 Quadratic variations 80
3.3 Simple examples of stochastic integrals 87
3.4 Stochastic integration with respect to a Brownian motion 90
3.5 Stochastic integration with respect to general martingales 94
3.6 The Ito formula for semimartingales 97
3.7 The Ito formula for Brownian motion 108
3.8 Representation results 115
3.9 Random measures 123
3.10 Problems 127
4 Change of measures 131
4.1 Introduction 131
4.2 Measure change for discrete time processes 134
4.3 Girsanov's theorem 145
4.4 The single jump process 150
4.5 Change of parameter in Poisson processes 157
4.6 Poisson process with drift 161
4.7 Continuous-time Markov chains 163
4.8 Problems 165
Part II Applications 167
5 Kalman filtering 169
5.1 Introduction 169
5.2 Discrete-time scalar dynamics 169
5.3 Recursive estimation 169
5.4 Vector dynamics 175
5.5 The EM algorithm 177
5.6 Discrete-time model parameter estimation 178
5.7 Finite-dimensional filters 180
5.8 Continuous-time vector dynamics 190
5.9 Continuous-time model parameter estimation 196
5.10 Direct parameter estimation 206
5.11 Continuous-time nonlinear filtering 211
5.12 Problems 215
6 Financial applications 217
6.1 Volatility estimation 217
6.2 Parameter estimation 221
6.3 Filtering a price process 222
6.4 Parameter estimation for a modified Kalman filter 223
6.5 Estimating the implicit interest rate of a risky asset 229
7 A genetics model 235
7.1 Introduction 235
7.2 Recursive estimates 235
7.3 Approximate formulae 239
8 Hidden populations 242
8.1 Introduction 242
8.2 Distribution estimation 243
8.3 Parameter estimation 246
8.4 Pathwise estimation 247
8.5 A Markov chain model 248
8.6 Recursive parameter estimation 250
8.7 A tags loss model 250
8.8 Gaussian noise approximation 253
References 255
Index 257
Preface
Traditional courses for engineers in filtering and signal processing have been based on elementary linear algebra, Hilbert space theory and calculus. However, the key objective underlying such procedures is the (recursive) estimation of indirectly observed states given observed data. This means that one is discussing conditional expected values, given the observations. The correct setting for conditional expected value is in the context of measurable spaces equipped with a probability measure, and the initial object of this book is to provide an overview of required measure theory. Secondly, conditional expectation, as an inverse operation, is best formulated as a form of Bayes' Theorem. A mathematically pleasing presentation of Bayes' Theorem is to consider processes as being initially defined under a reference probability. This is an idealized probability under which all the observations are independent and identically distributed. The reference probability is a much nicer measure under which to work. A suitably defined change of measure then transforms the distribution of the observations to their real-world form. This setting for the derivation of the estimation and filtering results enables more general results to be obtained in a transparent way.
The book commences with a leisurely and intuitive introduction to σ-fields and the results in measure theory that will be required.
The first chapter also discusses random variables, integration and conditional expectation.
Chapter 2 introduces stochastic processes, with particular emphasis on martingales and
Brownian motion.
Stochastic calculus is developed in Chapter 3 and techniques related to changing probability measures are described in Chapter 4.
The change of measure method is the basic technique used in this book.
The second part of the book commences with a treatment of Kalman filtering in
Chapter 5. Recent results, which extend the Kalman filter and enable parameter estimates
to be obtained, are included. These results are applied to financial models in Chapter 6. The
final two chapters give some filtering applications to genetics and population models.
The authors would like to express their gratitude to Professor Nadjib Bouzar of the
Department of Mathematics and Computer Science, University of Indianapolis, for the
incredible amount of time he spent reading through the whole manuscript and making
many useful suggestions.
Robert Elliott would like to acknowledge the support of NSERC and the hospitality of
the Department of Applied Mathematics at the University of Adelaide, South Australia.
Lakhdar Aggoun would like to acknowledge the support of the Department of Mathematics and Statistics, Sultan Qaboos University, Al-Khoud, Sultanate of Oman; the hospitality of the Department of Mathematical Sciences at the University of Alberta, Canada; and the Haskayne School of Business, University of Calgary, Calgary, Canada.
Part I
Theory
1
Basic probability concepts
1.1 Random experiments and probabilities
An experiment is random if its outcome cannot be predicted with certainty. A simple example is the throwing of a die. This experiment can result in any of six unpredictable outcomes 1, 2, 3, 4, 5, 6, which we list in what is usually called a sample space Ω = {1, 2, 3, 4, 5, 6}. Another example is the amount of yearly rainfall in each of the next 10 years in Auckland. Each outcome ω here is an ordered set containing ten nonnegative real numbers (a vector in IR^10_+); however, one has to wait 10 years before observing the outcome ω.
Another example is the following. Let Xt be the water level of a dam at time t. If we are interested in the behavior of Xt during an interval of time [t0, t1], say, then it is necessary to consider simultaneously an uncountable family of Xt's, that is,

Ω = {0 ≤ Xt < ∞, t0 ≤ t ≤ t1}.
The smallest observable outcome of an experiment is called simple. The set {1} containing 1, resulting from a throw of a die, is simple. The outcome "odd number" is not simple and it occurs if and only if the throw results in any of the three simple outcomes 1, 3, 5. If the throw results in a 5, say, then the same throw also results in "a number larger than 3" or "odd number". Sets containing outcomes are called events. The events "odd number" and "a number larger than 3" are not mutually exclusive, that is, both can happen simultaneously, so that we can define the event "odd number and a number larger than 3".
The event "odd number and even number" is clearly impossible or empty. It is called the impossible event and is denoted, in analogy with the empty set in set theory, by ∅. The event "odd number or even number" occurs no matter what the outcome ω is. It is Ω itself and is called the certain event.
In fact possible events of the experiment can be combined naturally using the set operations union, intersection, and complementation. This leads to the concept of field or algebra (σ-field (sigma-field) or σ-algebra, respectively), which is of fundamental importance in the theory of probability.
A nonempty class F of subsets of a nonempty set Ω is called a field or algebra if
1. Ω ∈ F,
2. F is closed under finite unions (or finite intersections),
3. F is closed under complementation.
It is a σ-field (or σ-algebra) if the stronger condition
2′. F is closed under countable unions (or countable intersections)
holds.
If F is a σ-field, the pair (Ω, F) is called a measurable space. The sets B ∈ F are called events and are said to be measurable sets.
For instance, the collection of finite unions of the half-open intervals (a, b] is a field.

A filtration {Ft, t ≥ 0} is right-continuous if Ft = ∩_{s>t} Fs = Ft+. We may also say that a filtration {Ft, t ≥ 0} is right-continuous if new information at time t arrives precisely at time t and not an instant after t. It is left-continuous if {Ft} contains the events strictly prior to t, that is Ft = σ(∪_{s<t} Fs).
It is easily seen that if Ω is finite we need only specify P on the atoms of F.
The triple (Ω, F, P) is called a probability space.
Nonempty events which are unlikely to occur and to which a zero probability is assigned are called negligible events or null events.
A σ-field F is P-complete if all subsets of null events are also events. Of course, their probability is zero.
A filtration is complete if F0 is complete, i.e. all the null events are known at the initial time.
The mathematical object (Ω, F, Ft, P), where the filtration {Ft, t ≥ 0} is right-continuous and complete, is sometimes called a stochastic basis or a filtered probability space.
The filtration {Ft, t ≥ 0} is said to satisfy the usual conditions if it is right-continuous and complete.
For monotonic sequences of events we have the following result on continuity of probability measures.
Theorem 1.1.3 Let (Ω, F, P) be a probability space. If {An} is an increasing sequence of events with limit A, then

P(An) ↑ P(A),

and if {Bn} is a decreasing sequence of events with limit B, then

P(Bn) ↓ P(B).

Proof To prove the first statement, visualize the sequence {An} as a sequence of increasing concentric disks and then define the sequence of disjoint rings {Rn} (except for R1, which is the disk A1):

R1 = A1, R2 = A2 − A1, . . . , Rn = An − An−1.

Note that

Ak = ∪_{n=1}^k Rn, A = ∪_{n=1}^∞ An = ∪_{n=1}^∞ Rn,

so that by σ-additivity

P(A) = Σ_{n=1}^∞ P(Rn) = lim_k Σ_{n=1}^k P(Rn) = lim_k P(∪_{n=1}^k Rn) = lim_k P(Ak).

The proof of the second statement follows by considering the sequence of complementary events {Bnᶜ}, which is increasing with limit Bᶜ, so that

1 − P(Bn) ↑ 1 − P(B), i.e. P(Bn) ↓ P(B).
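The monotone continuity of Theorem 1.1.3 can be watched on a concrete increasing sequence. The following sketch (not from the text) takes An to be the event "at least one head in the first n tosses of a fair coin"; the An increase to A = "a head eventually occurs", and P(An) = 1 − (1/2)^n increases to P(A) = 1.

```python
from fractions import Fraction

# A_n = "at least one head in the first n tosses of a fair coin".
# The A_n increase to A = "a head eventually occurs", and Theorem 1.1.3
# gives P(A_n) = 1 - (1/2)^n  ->  P(A) = 1.
def p_head_by(n):
    return 1 - Fraction(1, 2) ** n

probs = [p_head_by(n) for n in range(1, 11)]
# The sequence is increasing and approaches 1 from below.
increasing = all(a < b for a, b in zip(probs, probs[1:]))
gap_to_limit = 1 - probs[-1]   # equals (1/2)^10
```

Exact rational arithmetic via `fractions.Fraction` keeps the monotonicity check free of floating-point noise.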
Example 1.1.4 Consider the experiment of tossing a fair coin infinitely many times and observing the outcomes of all tosses. Here each ω ∈ (H, T)^∞ is a countably infinite sequence of Heads and Tails. If we denote Heads and Tails by 0 and 1, each ω is a sequence of 0s and 1s and it can be shown that there are as many ω's as there are points in the interval [0, 1)!
Suppose we wish to estimate the probability of the event consisting of those ω's for which the proportion of heads converges to 1/2. The so-called Strong Law of Large Numbers says that this probability is equal to one, i.e. the ω's for which the convergence to 1/2 does not hold form a negligible set. However, this negligible set is rather huge, as can be imagined!
Example 1.1.5 In Example 1.1.4 let Fn,S be the collection of infinite sequences of H's and T's with some restriction S put on the first n tosses. For instance, if n = 3 and

S = {H H T . . . , H T H . . . , T H H . . . } ⊂ (H, T)^3,

F3,S is the collection of infinite sequences of H's and T's for which the first three entries contain exactly two H's. It is left as an exercise to show that the class F = {Fn,S, S ⊂ (H, T)^n, n ∈ IN} is a field.
We now quote without proof from [4] the following result on extending a function P defined on sets in a field.
Theorem 1.1.6 ([4]) If P is a probability measure on a field A, then it can be extended uniquely to the σ-field F = σ{A} generated by A; i.e., the restriction of the extension measure to the field A is P itself, and by tradition they are both denoted by P.
Let us return to the coin-tossing situation of Example 1.1.5.
Using the extension theorem (Theorem 1.1.6) one can construct a (unique) probability measure P, called the product probability measure, on the space ((H, T)^∞, F), starting from an initial probability (p(H), p(T)) = (1/2, 1/2), by setting

P(Fn,S) = Σ_S (1/2)^n = (number of sequences in S) × (1/2)^n.

It is left as an exercise to show that P does not depend on the representations of sets in F and that it is countably additive. (See [4].)
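The product-measure formula for cylinder sets can be checked directly by enumerating the prefixes. The following sketch (not from the text) computes P(F3,S) for the set S of Example 1.1.5, the prefixes of length 3 containing exactly two H's.

```python
from fractions import Fraction
from itertools import product

# Product measure of Example 1.1.5 on the first n = 3 tosses of a fair
# coin: each prefix in S carries weight (1/2)^3, so
# P(F_{3,S}) = |S| * (1/2)^3.
n = 3
S = {p for p in product("HT", repeat=n) if p.count("H") == 2}  # exactly two H's

p_F3S = len(S) * Fraction(1, 2) ** n
total = len(list(product("HT", repeat=n))) * Fraction(1, 2) ** n  # P(Omega) = 1
```

Here |S| = 3, so P(F3,S) = 3/8, and summing the weight over all 2^3 prefixes recovers P(Ω) = 1.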
An immediate generalization of the coin tossing experiment in Example 1.1.5 is to consider an infinite sequence of independent experiments, to which corresponds an infinite sequence of probability spaces (Ω1, F1, P1), (Ω2, F2, P2), . . . . We are interested in the space Ω = Ω1 × Ω2 × · · · of all infinite sequences ω = (ω1, ω2, . . . ). Events of interest are again cylinder sets, i.e. infinite sequences with restrictions put on the first n outcomes. The collection of all these cylinders forms a field which generates a σ-field F, often denoted F1 ⊗ F2 ⊗ · · · . A probability measure P can be defined on cylinder sets then extended uniquely to F using the Extension Theorem 1.1.6.
In the coin-tossing experiment, an example of an event which is in F is the event F that a Head will occur. Clearly, F = ∪_{k=1}^∞ Fk, where Fk is the event that a Head occurs on the k-th trial and not before. Since each Fk is a cylinder set, P(Fk) is well defined for each
k ≥ 1. Moreover the Fk's are pairwise disjoint, hence

P(F) = Σ_{k=1}^∞ P(Fk) = Σ_{k=1}^∞ 1/2^k = 1.

Note that this probability is still 1 regardless of the size of the probability of occurrence of a Head (as long as it is not 0).
Modeling with infinite sample spaces is not a mathematical fantasy. In many very simple-minded problems infinite sequences of outcomes cannot be avoided. For example, the event "the first time a Head occurs" cannot be described in a finite sample space model, because the number of trials before it occurs cannot be bounded in advance.
In general, it is impossible to define a probability measure on all the subsets of an infinite sample space; that is, one cannot say any subset is an event. However, consider the following case.
Example 1.1.7 Suppose that Ω is countable and let F be the σ-field 2^Ω. Then it is not difficult to define a probability measure on F. Choose P such that

0 ≤ P({ω}) ≤ 1 and Σ_{ω∈Ω} P({ω}) = 1,

and for any F ∈ F, define P(F) = Σ_{ω∈F} P({ω}). Let {Fn}, n ∈ IN, be a sequence of disjoint sets in F and let ω_{n,m} denote the simple events in Fn. Since we have an infinite series of nonnegative numbers,

P(∪_n Fn) = Σ_{n,m} P(ω_{n,m}) = Σ_n Σ_m P(ω_{n,m}) = Σ_n P(Fn).
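A small sketch (not from the text) of the point-mass construction in Example 1.1.7: on Ω = {0, 1, 2, . . .} put the hypothetical weights P({ω}) = (1/2)^(ω+1), which sum to 1, and observe that probabilities of disjoint events add.

```python
from fractions import Fraction

# Point-mass measure on the countable space Omega = {0, 1, 2, ...}:
# P({w}) = (1/2)^(w+1).  The measure of any event is the sum of its
# point masses, and additivity over disjoint events is just the
# rearrangement of a nonnegative series.
def point_mass(w):
    return Fraction(1, 2) ** (w + 1)

def prob(event):          # event: a finite set of outcomes
    return sum(point_mass(w) for w in event)

evens = {0, 2, 4, 6, 8, 10}          # truncated version of "even outcome"
odds = {1, 3, 5, 7, 9}
union_prob = prob(evens | odds)      # disjoint union: probabilities add
```

Only finite truncations of events are evaluated here; the countable case is the limit of such finite sums.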
1.2 Conditional probabilities and independence
Given a probability space (Ω, F, P) and some event B with P(B) ≠ 0, we define a new posterior probability measure as follows. If A is any event we define the probability of A given B as

P(A | B) = P(A and B)/P(B) = P(A ∩ B)/P(B),

provided P(B) > 0. Otherwise P(A | B) is left undefined. What we mean by "given event B" is that we know that event B has occurred, that is, we know that ω ∈ B, so that we no longer assign the same probabilities given by P to events but assign new, or updated, probabilities given by the probability measure P(· | B). Any event which is mutually exclusive with B has probability zero under P(· | B) and the new probability space is now (B, F ∩ B, P(· | B)).
If our observation is limited to knowing whether event B has occurred or not we may as well define P(· | Bᶜ), where Bᶜ is the complement of B within Ω. Prior to knowing where the outcome ω is we define the, now random, quantity:

P(· | B or Bᶜ)(ω) = P(· | σ{B})(ω) = P(· | B) I_B(ω) + P(· | Bᶜ) I_{Bᶜ}(ω).
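The defining ratio P(A | B) = P(A ∩ B)/P(B) can be evaluated directly on the die space of Section 1.1. This sketch (not from the text) conditions the event "odd number" on "a number larger than 3".

```python
from fractions import Fraction

# Conditional probability on the die space Omega = {1,...,6} with the
# uniform measure: P(A | B) = P(A & B) / P(B).
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & omega), len(omega))

def P_given(A, B):
    return P(A & B) / P(B)

odd = {1, 3, 5}
big = {4, 5, 6}              # "a number larger than 3"
p = P_given(odd, big)        # only 5 is both odd and larger than 3
```

The prior P(odd) = 1/2 is updated to P(odd | big) = 1/3, so the two events are dependent.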
This definition extends in an obvious way to a σ-field G generated by a finite or countable partition {B1, B2, . . . } of Ω, and the random variable P(· | G)(ω) is called the conditional probability given G. The random function P(· | G)(ω), whose values on the atoms Bi are the ordinary conditional probabilities P(· | Bi) = P(· ∩ Bi)/P(Bi), is not defined if P(Bi) = 0. In this case we have a family of functions P(· | G)(ω), one for each possible arbitrary value assigned to the undefined P(· | Bi). Usually, one version is chosen and different versions differ only on a set of probability 0.
Example 1.2.1 Phone calls arrive at a switchboard between 8:00 a.m. and 12:00 p.m. according to the following probability distribution:
1. P(k calls within an interval of length l) = e^{−λl} (λl)^k / k!;
2. If I1 and I2 are disjoint intervals,

P((k1 calls within I1) ∩ (k2 calls within I2)) = P(k1 calls within I1) P(k2 calls within I2),

that is, events occurring within disjoint time intervals are independent.
Suppose that the operator wants to know the probability that 0 calls arrive between 8:00 and 9:00 given that the total number of calls from 8:00 a.m. to 12:00 p.m., N_{8-12}, is known. From past experience, the operator assumes that this number is near 30 calls, say. Hence

P(0 calls within [8, 9) | 30 calls within [8, 12])
= P((0 calls within [8, 9)) ∩ (30 calls within [9, 12])) / P(30 calls within [8, 12])
= P(0 calls within [8, 9)) P(30 calls within [9, 12]) / P(30 calls within [8, 12])
= (3/4)^30,

which can be written as

P(0 calls within [8, 9) | N_{8-12} = N) = (3/4)^N.     (1.2.1)
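Equation (1.2.1) is independent of the arrival rate, and that can be checked numerically. In this sketch (not from the text) the hourly rate `lam` is a hypothetical value; the Poisson ratio should equal (3/4)^30 whatever rate is chosen.

```python
from math import exp, factorial

# Check of equation (1.2.1): with Poisson arrivals at an (assumed) rate
# lam per hour, the ratio
#   P(0 in [8,9)) * P(30 in [9,12]) / P(30 in [8,12])
# equals (3/4)^30, independently of lam.
def poisson(k, mean):
    return exp(-mean) * mean ** k / factorial(k)

lam = 7.5                       # hypothetical hourly rate (30 calls / 4 h)
lhs = poisson(0, lam) * poisson(30, 3 * lam) / poisson(30, 4 * lam)
rhs = 0.75 ** 30
```

Algebraically, the exponentials combine as e^{−λ}e^{−3λ}/e^{−4λ} = 1 and the means combine as (3λ/4λ)^30, so the λ's cancel, exactly as the derivation in the example shows.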
Remarks 1.2.2 Consider again Example 1.2.1.
1. The events Fi = {ω : N_{8-12}(ω) = i}, i = 0, 1, . . . form a partition of Ω and are atoms of the σ-field generated by observing only N_{8-12}, so we may write:

P(0 calls within [8, 9) | Fi, i ∈ IN)(ω) = P(0 calls within [8, 9) | σ{Fi, i ∈ IN})(ω) = Σ_i (3/4)^i I_{Fi}(ω).

2. Observe that since each event F ∈ σ{Fi, i ∈ IN} is a union of some Fi1, Fi2, . . . , and since we know, at the end of the experiment, which Fj contains ω, then we know
whether or not ω lies in F, that is, whether F or the complement of F has occurred. In this sense, σ{Fi, i ∈ IN} is indeed all we can answer about the experiment from what we know.
The likelihood of occurrence of any event A could be affected by the realization of B. Roughly speaking, if the proportion of A within B is the same as the proportion of A within Ω then it is intuitively clear that P(A | B) = P(A | Ω) = P(A). Knowing that B has occurred does not change the prior probability P(A). In that case we say that events A and B are independent. Therefore two events A and B are independent if and only if

P(A ∩ B) = P(A)P(B).

Two σ-fields F1 and F2 are independent if and only if P(A1 ∩ A2) = P(A1)P(A2) for all A1 ∈ F1, A2 ∈ F2.
If events A and B are independent so are σ{A} and σ{B}, because the impossible event ∅ is independent of everything else including itself, and so is Ω. Also A and Bᶜ, Aᶜ and B, Aᶜ and Bᶜ are independent. We can say a bit more: if P(E) = 0 or P(E) = 1 then the event E is independent of any other event including E itself, which seems intuitively clear.
Mutually exclusive events with positive probabilities provide a good example of dependent events.
Example 1.2.3 In the die throwing experiment the σ-fields

F1 = σ{{1, 2}, {3, 4, 5, 6}},

and

F2 = σ{{1, 2}, {3, 4}, {5, 6}},

are not independent, since if we know, for instance, that ω has landed in {5, 6} (or equivalently {5, 6} has occurred) in F2, then we also know that the event {3, 4, 5, 6} in F1 has occurred. This fact can be checked by direct calculation using the definition. However, the σ-fields

F3 = σ{{1, 2, 3}, {4, 5, 6}},

and

F4 = σ{{1, 4}, {2, 5}, {3, 6}},

are independent. The occurrence of any event in either F3 or F4 does not provide any nontrivial information about the occurrence of any (nontrivial) event in the other field.
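The "direct calculation using the definition" is a finite check, since each σ-field here is just the collection of unions of partition blocks. This sketch (not from the text) brute-forces both claims for the fair die.

```python
from fractions import Fraction
from itertools import combinations

# Brute-force check: the sigma-fields generated by {{1,2,3},{4,5,6}} and
# {{1,4},{2,5},{3,6}} on a fair die are independent, while those
# generated by {{1,2},{3,4,5,6}} and {{1,2},{3,4},{5,6}} are not.
def P(A):
    return Fraction(len(A), 6)

def sigma_field(partition):
    """All unions of blocks of the partition (including the empty union)."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

def independent(f1, f2):
    return all(P(A & B) == P(A) * P(B) for A in f1 for B in f2)

F1 = sigma_field([{1, 2}, {3, 4, 5, 6}])
F2 = sigma_field([{1, 2}, {3, 4}, {5, 6}])
F3 = sigma_field([{1, 2, 3}, {4, 5, 6}])
F4 = sigma_field([{1, 4}, {2, 5}, {3, 6}])
```

A partition of k blocks generates 2^k events, so F3 has 4 events and F4 has 8, and the independence test is a product over all pairs.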
Another fundamental concept of probability theory is conditional independence. Events A and C are said to be conditionally independent given event B if

P(A ∩ C | B) = P(A | B)P(C | B), P(B) > 0.

The following example shows that it is not always easy to decide, under a probability measure, whether conditional independence holds between events.
Example 1.2.4 Consider the following two events:
A1 = "person 1 is going to watch a football game next weekend",
A2 = "person 2, with no relation at all to person 1, is going to watch a football game next weekend".
There is no reason to doubt the independence of A1 and A2 in our model. However, consider now the event B = "next weekend's weather is good". Suppose that

P(A1 | B) = 0.90, P(A2 | B) = 0.95, P(A1 | Bᶜ) = 0.40,
P(A2 | Bᶜ) = 0.30, P(B) = 0.75 and P(Bᶜ) = 0.25.

Using this information it can be checked that P(A1 ∩ A2) ≠ P(A1)P(A2). The reason is that event B has linked events A1 and A2, in the sense that if we knew that A1 had occurred the probability of B should be high, resulting in the probability of A2 increasing.
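The "can be checked" computation runs as follows, assuming (as the example implicitly does) that A1 and A2 are conditionally independent given B and given its complement; this sketch is not from the text.

```python
# Total probabilities for Example 1.2.4, under the assumption that A1 and
# A2 are conditionally independent given the weather event B and given
# its complement.
pA1_B, pA2_B = 0.90, 0.95       # given good weather
pA1_Bc, pA2_Bc = 0.40, 0.30     # given bad weather
pB, pBc = 0.75, 0.25

pA1 = pA1_B * pB + pA1_Bc * pBc                       # 0.775
pA2 = pA2_B * pB + pA2_Bc * pBc                       # 0.7875
pA1A2 = pA1_B * pA2_B * pB + pA1_Bc * pA2_Bc * pBc    # 0.67125
# pA1A2 differs from pA1 * pA2: the weather links the two events.
```

So conditional independence given B does not force unconditional independence: mixing over the two weather regimes creates correlation.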
The independence concept extends to arbitrary families of events. A family of events {Aα, α ∈ I} is said to be a family of independent events if and only if any finite subfamily is independent, i.e., for any finite subset of indices {i1, i2, . . . , ik} ⊂ I,

P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik) = P(Ai1)P(Ai2) . . . P(Aik).

A family of σ-fields {Fα, α ∈ I} is said to be a family of independent σ-fields if and only if any finite subfamily {Fi1, Fi2, . . . , Fik} is independent; that is, if and only if any collection of events of the form {Ai1 ∈ Fi1, Ai2 ∈ Fi2, . . . , Aik ∈ Fik} is independent.
An extremely powerful and standard tool in proving properties which are true with probability one is the Borel-Cantelli Lemma. This lemma concerns sequences of events.
Let {An} be a monotone decreasing sequence of events, i.e.

A1 ⊃ A2 ⊃ · · · ⊃ An ⊃ An+1 ⊃ . . . ,

then by definition

lim_n An = ∩_{n=1}^∞ An.

Let {Bn} be a monotone increasing sequence of events, i.e.

B1 ⊂ B2 ⊂ · · · ⊂ Bn ⊂ Bn+1 ⊂ . . . ,

then by definition

lim_n Bn = ∪_{n=1}^∞ Bn.

Let {Cn} be an arbitrary sequence of events. Define

An = sup_{k≥n} Ck = ∪_{k=n}^∞ Ck,

and

Bn = inf_{k≥n} Ck = ∩_{k=n}^∞ Ck.

Event An occurs if and only if at least one of the events Cn, Cn+1, . . . occurs, and event Bn occurs if and only if all the Cn occur simultaneously except for a finite number.
By construction, An and Bn are monotone: An is decreasing and Bn is increasing, so that:

A = lim_n An = ∩_{n=1}^∞ An = ∩_{n=1}^∞ ∪_{k=n}^∞ Ck,

and

B = lim_n Bn = ∪_{n=1}^∞ Bn = ∪_{n=1}^∞ ∩_{k=n}^∞ Ck.

Event A = ∩_{n=1}^∞ ∪_{k=n}^∞ Ck = lim sup Cn occurs if and only if infinitely many Cn occur, or Cn occurs infinitely often (Cn i.o.). To see this, suppose that ω belongs to an infinite number of the Cn's; then for every n, ω ∈ ∪_{k=n}^∞ Ck. Therefore, ω ∈ ∩_{n=1}^∞ ∪_{k=n}^∞ Ck. Conversely, if ω belongs to only a finite number of the Cn's, then there is some n0 such that ω ∉ ∪_{k=n0}^∞ Ck. Since ∩_{n=1}^∞ ∪_{k=n}^∞ Ck ⊂ ∪_{k=n0}^∞ Ck, this shows that ω ∉ ∩_{n=1}^∞ ∪_{k=n}^∞ Ck if ω belongs to only a finite number of the Cn's.
Event B = ∪_{n=1}^∞ ∩_{k=n}^∞ Ck = lim inf Cn occurs if and only if all but a finite number of the Cn occur.
Clearly lim inf Cn ⊂ lim sup Cn.
Consider the following simple example of sequences of intervals in IR.
Example 1.2.5 Let A and B be any subsets of Ω and define the sequences C2n = A and C2n+1 = B. Then:

lim sup Cn = A ∪ B, lim inf Cn = A ∩ B.
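Because the alternating sequence stabilizes immediately, its lim sup and lim inf can be computed over a finite horizon. The following sketch (not from the text) does this with two small hypothetical sets.

```python
# lim sup / lim inf of the alternating sequence C_{2n} = A, C_{2n+1} = B
# of Example 1.2.5, computed over a finite horizon (large enough that the
# tails have stabilized).
A = {1, 2, 3}
B = {3, 4}
N = 50
C = [A if n % 2 == 0 else B for n in range(N)]

def tail_union(n):          # union of C_k, k >= n
    return set().union(*C[n:])

def tail_inter(n):          # intersection of C_k, k >= n
    out = set(C[n])
    for s in C[n + 1:]:
        out &= s
    return out

limsup = set.intersection(*(tail_union(n) for n in range(N - 1)))
liminf = set().union(*(tail_inter(n) for n in range(N - 1)))
```

Every tail contains both A and B, so every tail union is A ∪ B and every tail intersection is A ∩ B, which is exactly what the identities of Example 1.2.5 assert.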
Example 1.2.6 Let

Ck = {(x, y) ∈ IR² : 0 ≤ x < k, 0 ≤ y . . . }.

(Chebyshev's inequality.) For every ε > 0,

μ({ω : |f(ω)| ≥ ε}) ≤ (1/ε^p) ∫_Ω |f(ω)|^p dμ(ω).

Proof Let F_ε = {ω : |f(ω)| ≥ ε}. Then

∫_Ω |f(ω)|^p dμ(ω) ≥ ∫_{F_ε} |f(ω)|^p dμ(ω) ≥ ε^p ∫_{F_ε} dμ = ε^p μ(F_ε).
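Chebyshev's inequality can be verified exactly on a finite measure space, where the integrals are finite sums. This sketch (not from the text) uses a hypothetical five-point space with the uniform probability measure.

```python
from fractions import Fraction

# Numerical check of Chebyshev's inequality
#   mu(|f| >= eps) <= (1/eps^p) * integral |f|^p dmu
# on a five-point space with the uniform probability measure.
omega = [-3, -1, 0, 2, 5]          # values f(w) on five equally likely points
mu = Fraction(1, len(omega))

def lhs(eps):
    return sum(mu for v in omega if abs(v) >= eps)

def rhs(eps, p):
    return sum(mu * Fraction(abs(v)) ** p for v in omega) / Fraction(eps) ** p

checks = [(eps, p) for eps in (1, 2, 3, 4) for p in (1, 2)]
ok = all(lhs(eps) <= rhs(eps, p) for eps, p in checks)
```

The bound is often loose (e.g. rhs(2, 2) = 39/20 against lhs(2) = 3/5), but it holds for every choice of ε and p, as the proof guarantees.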
In addition to almost sure convergence, which was defined in Example 1.3.11, we have
the following types of convergence.
First recall that Lp(Ω, F, P), p ≥ 1, is the space of random variables with finite absolute p-th moments, that is, E[|X|^p] < ∞.
{Xk} converges to X in Lp (Xk →^{Lp} X), 0 < p < ∞, if E[|Xk − X|^p] → 0 as k → ∞.
{Xk} converges to X in probability (Xk →^P X) if P(|Xk − X| > ε) → 0 as k → ∞, for every ε > 0.
Let Fn(x) = P[Xn ≤ x], F(x) = P[X ≤ x]. Xn converges in distribution to X (Xn →^D X) if

∫_IR g(x) dFn(x) → ∫_IR g(x) dF(x),

for every real-valued, continuous bounded function g defined on IR. A necessary and sufficient condition for this is:

Fn(x) → F(x),

at every continuity point x of F [7].
These convergence concepts are in the following relationship to each other:

(Xk →^{a.s.} X) ⇒ (Xk →^P X) ⇒ (Xk →^D X).
permits the interchange of limits and expectations.
Definition 1.3.34 A sequence {Xn} of random variables is said to be uniformly integrableif
supn
E[|Xn|I{|Xn |>A}] 0, (A ). (1.3.4)
A family {Xt}, t 0 of random variables is said to be uniformly integrable ifsup
t
E[|Xt|I{|Xt|>A}] 0, (A ). (1.3.5)
Example 1.3.35 IfL is bounded in Lp(,F, P) for some p > 1, then L is uniformly
integrable.
Proof Choose A so large that E[|X|p] < A for all X L. For fixed X L, let Y =|X|I{|X|>K}. Then Y() K I{|X|>K} > 0 for all . Since p > 1,
Yp1
Kp1 I{|X|>K},
and
K1pYp = Yp1
Kp1Y Y I{|X|>K} = Y.
Thus
E[Y] K1pE[Yp] K1pE[|X|p] K1pA,which goes to 0 when K , from which the result follows.
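A concrete family illustrating Example 1.3.35 (a sketch, not from the text): let Xn take the value n with probability 1/n² and 0 otherwise, for n ≥ 2. Then E[Xn²] = 1 for every n, so the family is bounded in L², and the tail expectations E[Xn I_{Xn>A}] = 1/n (for n > A) have supremum tending to 0 as A → ∞.

```python
from fractions import Fraction

# X_n = n with probability 1/n^2, else 0 (n >= 2).  Second moments are
# all 1, so the family is bounded in L^2, hence uniformly integrable:
# E[X_n I_{X_n > A}] = n * (1/n^2) = 1/n when n > A, and 0 otherwise.
def tail_expectation(n, A):
    return Fraction(1, n) if n > A else Fraction(0)

def sup_tail(A, horizon=10_000):
    # sup over n > A of 1/n is attained at the smallest such n
    return max(tail_expectation(n, A) for n in range(2, horizon))

second_moments = [Fraction(n) ** 2 * Fraction(1, n ** 2) for n in range(2, 50)]
sups = [sup_tail(A) for A in (10, 100, 1000)]
```

The supremum decays like 1/A, consistent with the K^{1-p} bound of the proof for p = 2.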
The following result is a somewhat stronger version of Fatou's Lemma 1.3.16.
Theorem 1.3.36 Let {Xn} be a uniformly integrable family of random variables. Then

E[lim inf Xn] ≤ lim inf E[Xn].

Proof The proof is left as an exercise.
Corollary 1.3.37 Let {Xn} be a uniformly integrable family of random variables such that Xn → X (a.s.). Then

E|Xn| < ∞, E(Xn) → E(X), and E|Xn − X| → 0.

The following deep result (Shiryayev [36]) gives a necessary and sufficient condition for taking limits under the expectation sign.
Theorem 1.3.38 Let 0 ≤ Xn ↑ X and E(Xn) < ∞. Then

E(Xn) ↑ E(X) < ∞ ⟺ the family {Xn} is uniformly integrable.

Proof The sufficiency part follows from Theorem 1.3.36. To prove the necessity, note that if x is not a point of positive probability for the distribution of the random variable X, then Xn I_{Xn ≤ x} → X I_{X ≤ x} (a.s.).
Let X = Σ_i xi I_{Ai} be a simple random variable on a probability space (Ω, F, P). What is the expected value of X given some event B having positive probability P(B)? Under the posterior probability measure P(· | B) this is

E[X | B] = Σ_i xi P(X = xi | B) = (1/P(B)) Σ_i xi P({X = xi} ∩ B) = (1/P(B)) E[X I_B].

E[X I_B] is the probability-weighted sum of the values taken on by X in the event B. We divide the weighted sum by P(B) to obtain the weighted average.
We could write as a definition:

E[X | B] = E[X I_B]/E[I_B] = E[X I_B]/P(B).

Let X = I_C and Y = I_B. The σ-field σ(Y) is generated by the atoms B and Bᶜ. To see this, consider any Borel set Λ:

Y⁻¹(Λ) = ∅ if 0 ∉ Λ and 1 ∉ Λ; Bᶜ if 0 ∈ Λ and 1 ∉ Λ; B if 1 ∈ Λ and 0 ∉ Λ; Ω if 0 ∈ Λ and 1 ∈ Λ.

Hence σ(Y) = {∅, B, Bᶜ, Ω}.
Define

E[X | Y] = E[X | σ(Y)] = E[X | atoms of σ(Y)] = E[X | B, Bᶜ].

That is,

E[I_C | B, Bᶜ](ω) = P(C | B, Bᶜ)(ω) = P(C | B) I_B(ω) + P(C | Bᶜ) I_{Bᶜ}(ω).

Hence E[X | Y] is a function constant on the atoms of σ(Y); that is, E[X | Y] is σ(Y)-measurable.
Since E[X | Y] is a random variable its mean is:

E[E[X | Y]] = E[P(C | B) I_B + P(C | Bᶜ) I_{Bᶜ}] = P(C ∩ B) + P(C ∩ Bᶜ) = P(C) = E[X].

If X is an integrable random variable and Y = Σ_i yi I_{Bi} is a simple random variable, we write

E[X | Y] = E[X | σ(Y)] = Σ_i (E[X I_{Bi}]/P(Bi)) I_{Bi}(ω).

Hence E[X | Y] is σ(Y)-measurable and

E[E[X | Y]] = Σ_i E[X I_{Bi}] = E[X].

The expected value of E[X | Y] is the same as the expected value of X.
Let X ∈ L1 (E|X| < ∞) be a (nonnegative, for simplicity) random variable on a probability space (Ω, F, P) and let G be a sub-σ-field of F. The probability space (Ω, G, P) is a coarsening of the original one and X is, in general, not measurable with respect to G.
We seek now a G-measurable random variable, which we denote temporarily by X_G, that assumes, on average, the same values as X. That is, we seek an integrable random variable X_G such that X_G is G-measurable and

∫_A X_G dP = ∫_A X dP, for all A ∈ G.

Now the set function Q(A) = ∫_A X dP is a measure absolutely continuous with respect to P, so that the Radon-Nikodym Theorem 1.3.25 guarantees the existence of a G-measurable random variable, suggestively denoted by E(X | G), which is uniquely determined except on an event of probability zero, such that

∫_A X dP = ∫_A E[X | G] dP,

for all A ∈ G. We say that X_G is a version of E(X | G). For a general integrable random variable X we define E[X | G] as E[X⁺ | G] − E[X⁻ | G].
Remark 1.4.1 Let (Ω, F, P) be given, and suppose X is an L² random variable (measurable with respect to F). Let G be a sub-σ-algebra of F; that is, G is less informative than F. A natural question is: by observing only G, how much can we learn about X? Or, among all random variables which are G-measurable, which one gives us the best information (in the mean square sense) about the random variable X? It turns out that E[X | G] is the closest G-measurable random variable to X. This is seen by considering, for any G-measurable random variable Y,
$$
Z = X - E[X \mid \mathcal{G}].
$$
Then
$$
E[(Z - Y)^2] = E[(X - E[X \mid \mathcal{G}])^2] - 2\,E[Y(X - E[X \mid \mathcal{G}])] + E[Y^2]
= E[E[(X - E[X \mid \mathcal{G}])^2 \mid \mathcal{G}]] + E[Y^2],
$$
since the cross term vanishes: E[Y(X − E[X | G])] = E[Y E[X − E[X | G] | G]] = 0. This is minimized when Y = 0 a.s.
Example 1.4.2 Let Ω = (0, 1], X(ω) = ω, P be Lebesgue measure, and consider the σ-field G generated by the partition
$$
\{(0, \tfrac14], (\tfrac14, \tfrac12], (\tfrac12, \tfrac34], (\tfrac34, 1]\} = \{A_1, A_2, A_3, A_4\}.
$$
E[X | G] must be constant on the atoms of G, so that
$$
E[X \mid \mathcal{G}](\omega) = \sum_i x_i I_{A_i}(\omega), \quad \text{where } x_i = \frac{E[X I_{A_i}]}{P(A_i)}.
$$
Clearly P(A_i) = 1/4 and E[X I_{A_i}] = ∫_{A_i} x dx. Hence
$$
E[X \mid \mathcal{G}](\omega) = \tfrac18 I_{A_1}(\omega) + \tfrac38 I_{A_2}(\omega) + \tfrac58 I_{A_3}(\omega) + \tfrac78 I_{A_4}(\omega),
$$
which is a G-measurable random variable.
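A numerical sketch of this example: averaging X(ω) = ω over each atom A_i reproduces the constants 1/8, 3/8, 5/8, 7/8. The midpoint quadrature below is an illustrative implementation choice, not from the text.

```python
# Atoms A_i = ((i-1)/4, i/4] of G; on each atom E[X | G] is the average
# of X(omega) = omega, i.e. E[X I_{A_i}] / P(A_i).
atoms = [(i / 4, (i + 1) / 4) for i in range(4)]

def average_on_atom(a, b, m=10_000):
    # Midpoint Riemann sum for int_a^b x dx, divided by P((a, b]) = b - a.
    h = (b - a) / m
    integral = sum((a + (k + 0.5) * h) * h for k in range(m))
    return integral / (b - a)

values = [average_on_atom(a, b) for a, b in atoms]
expected = [1 / 8, 3 / 8, 5 / 8, 7 / 8]
assert all(abs(v - e) < 1e-9 for v, e in zip(values, expected))
```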
Example 1.4.3 Let X₁, X₂ and X₃ be three independent, identically distributed (i.i.d.) random variables such that
$$
P(X_i = 1) = p = 1 - P(X_i = 0) = 1 - q.
$$
Let S = X₁ + X₂ + X₃. Suppose that we observe X₁ and X₂ and we wish to find the (conditional) probability that S = 2 given X₁ and X₂. The σ-field generated by the (vector) random variable (X₁, X₂) is generated by the atoms {A_ij}, i, j = 0, 1, where A_ij = {ω : X₁(ω) = i, X₂(ω) = j}. Then
$$
P(S = 2 \mid X_1, X_2)(\omega) = \sum_{i,j=0,1} P(S = 2 \mid A_{ij})\, I_{A_{ij}}(\omega)
= \sum_{i,j=0,1} \frac{P(\{S = 2\} \cap A_{ij})}{P(A_{ij})}\, I_{A_{ij}}(\omega)
$$
$$
= \sum_{i,j=0,1} \frac{P(i + j + X_3 = 2)\,P(A_{ij})}{P(A_{ij})}\, I_{A_{ij}}(\omega)
= \sum_{i,j=0,1} P(X_3 = 2 - i - j)\, I_{A_{ij}}(\omega)
$$
$$
= P(X_3 = 0)\, I_{A_{11}}(\omega) + P(X_3 = 1)\, I_{A_{10} \cup A_{01}}(\omega)
= q\, I_{A_{11}}(\omega) + p\, I_{A_{10} \cup A_{01}}(\omega).
$$
The expected value of the (σ{X₁, X₂}-measurable) random variable P(S = 2 | X₁, X₂) is
$$
E[q\, I_{A_{11}} + p\, I_{A_{10} \cup A_{01}}] = q\,P(A_{11}) + p\,[P(A_{10}) + P(A_{01})] = P(S = 2).
$$
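The closed form q I_{A11} + p I_{A10 ∪ A01} can be verified by enumerating the eight outcomes of (X₁, X₂, X₃); the value p = 0.3 below is an arbitrary illustration.

```python
from itertools import product

p, q = 0.3, 0.7  # illustrative values with p + q = 1

def prob(x1, x2, x3):
    # P(X1 = x1, X2 = x2, X3 = x3) for i.i.d. Bernoulli(p) variables.
    return (p if x1 else q) * (p if x2 else q) * (p if x3 else q)

# P(S = 2 | X1 = i, X2 = j) by direct enumeration over X3.
for i, j in product((0, 1), repeat=2):
    joint = sum(prob(i, j, x3) for x3 in (0, 1) if i + j + x3 == 2)
    marginal = sum(prob(i, j, x3) for x3 in (0, 1))
    conditional = joint / marginal
    # Closed form: q on A_11, p on A_10 and A_01, 0 on A_00.
    closed_form = q if (i, j) == (1, 1) else (p if i + j == 1 else 0.0)
    assert abs(conditional - closed_form) < 1e-12

# Its expectation recovers P(S = 2) = 3 p^2 q.
expectation = q * p * p + p * (p * q + q * p)
assert abs(expectation - 3 * p * p * q) < 1e-12
```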
Example 1.4.4 Let f ∈ L¹[0, 1], i.e. the Lebesgue integral ∫_{[0,1)} |f(x)| dx exists and is finite. Let F_n be generated by the dyadic intervals {[j/2ⁿ, (j+1)/2ⁿ), j = 0, . . . , 2ⁿ − 1}. Then
$$
E[f \mid \mathcal{F}_n](\omega) = \sum_{j=0}^{2^n - 1} \frac{\int_{j2^{-n}}^{(j+1)2^{-n}} f(x)\,dx}{2^{-n}}\, I_{[j2^{-n},\,(j+1)2^{-n})}(\omega).
$$
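A sketch of this dyadic conditional expectation for a concrete integrand; f(x) = x² is an arbitrary illustrative choice, and the inner Riemann sum stands in for the Lebesgue integral.

```python
def cond_exp_dyadic(f, n, x, m=1000):
    """E[f | F_n](x): the average of f over the dyadic interval containing x."""
    j = int(x * 2**n)                 # index of [j/2^n, (j+1)/2^n) containing x
    a, b = j / 2**n, (j + 1) / 2**n
    h = (b - a) / m
    integral = sum(f(a + (k + 0.5) * h) * h for k in range(m))
    return integral / (b - a)         # divide by P of the atom, 2^{-n}

f = lambda x: x * x                   # illustrative integrand
# As n grows, E[f | F_n](0.3) approaches f(0.3) = 0.09.
errors = [abs(cond_exp_dyadic(f, n, 0.3) - f(0.3)) for n in (2, 4, 8)]
assert errors[0] > errors[1] > errors[2]
```

This is the a.s. convergence claimed in Problem 26 below, visible numerically at a single point.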
Theorem 1.4.5 If X is a real F-measurable random variable and if ∫_A X dP = 0 for all A ∈ F, then X = 0 a.s.

Proof Suppose X ≥ 0 and ∫_A X dP = 0 for all A ∈ F. Write A_n = {ω : X(ω) ≥ 1/n}. Then
$$
\int_{A_n} X\,dP \geq \frac{1}{n} P(A_n) \geq 0.
$$
But ∫_{A_n} X dP = 0, so P(A_n) = 0 for all n. Therefore,
$$
P(\{X > 0\}) = P\Big(\bigcup_n A_n\Big) \leq \sum_n P(A_n) = 0.
$$
For a general random variable X, recall that X = X⁺ − X⁻, where both X⁺ and X⁻ are nonnegative, and apply the above argument to X⁺ and X⁻ separately.
The following is a list of classical results on conditional expectation.

1. E(X | A) is unique (a.s.).

Proof Let X₁ = E(X | A) and let X₂ be an A-measurable random variable such that
$$
\int_A X_2\,dP = \int_A X\,dP, \quad \text{for all } A \in \mathcal{A},
$$
and let Λ₀ = {ω : X₁ > X₂} ∈ A. Then
$$
\int_{\Lambda_0} X_1\,dP = \int_{\Lambda_0} X\,dP \quad \text{and} \quad \int_{\Lambda_0} X_2\,dP = \int_{\Lambda_0} X\,dP,
$$
so that
$$
\int_{\Lambda_0} X_1\,dP = \int_{\Lambda_0} X_2\,dP, \quad \text{or} \quad \int_{\Lambda_0} (X_1 - X_2)\,dP = 0.
$$
Using Theorem 1.4.5 (applied to (X₁ − X₂)I_{Λ₀} ≥ 0), P(Λ₀) = 0; interchanging X₁ and X₂ gives X₁ = X₂ a.s.

2. If A₁ and A₂ are two sub-σ-fields of F such that A₁ ⊂ A₂, then
$$
E(E(X \mid \mathcal{A}_1) \mid \mathcal{A}_2) = E(E(X \mid \mathcal{A}_2) \mid \mathcal{A}_1) = E(X \mid \mathcal{A}_1). \tag{1.4.1}
$$
Proof Clearly E(E(X | A₁) | A₂) = E(X | A₁), since E(X | A₁) is A₂-measurable. Now E(E(X | A₂) | A₁) is A₁-measurable and, for A ∈ A₁,
$$
\int_A E(E(X \mid \mathcal{A}_2) \mid \mathcal{A}_1)\,dP = \int_A E(X \mid \mathcal{A}_2)\,dP = \int_A X\,dP = \int_A E(X \mid \mathcal{A}_1)\,dP.
$$
Hence E(E(X | A₂) | A₁) = E(X | A₁) a.s.
3. If X, Y, XY ∈ L¹ and Y is A-measurable, then
$$
E[XY \mid \mathcal{A}] = Y\,E[X \mid \mathcal{A}]. \tag{1.4.2}
$$
Proof It is sufficient to prove the result when X and Y are positive. If Y = I_A, A ∈ A, then for every B ∈ A,
$$
\int_B XY\,dP = \int_{A \cap B} X\,dP = \int_{A \cap B} E[X \mid \mathcal{A}]\,dP = \int_B I_A\, E[X \mid \mathcal{A}]\,dP = \int_B Y\,E[X \mid \mathcal{A}]\,dP.
$$
That is, E[XY | A] = Y E[X | A] if Y is an indicator function. It follows that the result is true for simple functions of sets in A, and therefore, by monotone convergence, for the limit of a bounded increasing sequence of such functions converging to Y.
4. If X is independent of the σ-field A, then
$$
E(X \mid \mathcal{A}) = E(X). \tag{1.4.3}
$$
Proof First note that the constant E(X) is A-measurable. Now, for A ∈ A we have to show that
$$
\int_A E(X \mid \mathcal{A})\,dP = \int_A E(X)\,dP.
$$
The left hand side equals E[I_A X] and the right hand side equals E[I_A]E[X]; their equality follows from the independence of I_A and X.
5. Conditional expectation is a projection operation, and so
$$
E[E[X \mid \mathcal{A}] \mid \mathcal{A}] = E[X \mid \mathcal{A}]. \tag{1.4.4}
$$
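On a finite probability space these properties reduce to averaging over atoms, so they can be verified exactly; the die-roll space, the partitions and the random variable X below are made-up illustrations.

```python
from fractions import Fraction

# A fair die as a finite probability space.
Omega = range(1, 7)
P = {w: Fraction(1, 6) for w in Omega}

def cond_exp(X, partition):
    """E[X | A] for the sigma-field generated by a partition of Omega."""
    out = {}
    for atom in partition:
        avg = sum(X(w) * P[w] for w in atom) / sum(P[w] for w in atom)
        for w in atom:
            out[w] = avg
    return lambda w: out[w]

X = lambda w: w * w
A1 = [{1, 2, 3}, {4, 5, 6}]           # coarse partition
A2 = [{1}, {2, 3}, {4, 5}, {6}]       # refinement of A1, so A1 is coarser

E1 = cond_exp(X, A1)
E21 = cond_exp(cond_exp(X, A2), A1)   # E[E[X | A2] | A1]
E12 = cond_exp(cond_exp(X, A1), A2)   # E[E[X | A1] | A2]
assert all(E1(w) == E21(w) == E12(w) for w in Omega)      # (1.4.1)

E11 = cond_exp(E1, A1)                # conditioning twice changes nothing
assert all(E11(w) == E1(w) for w in Omega)                # (1.4.4)
```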
Example 1.4.6 Consider the joint distribution function F(x₁, x₂) of two real valued random variables X₁, X₂ and the probability measure P on the two-dimensional Borel sets generated by F(x₁, x₂). Suppose that P is absolutely continuous with respect to two-dimensional Lebesgue measure. Then, by the Radon–Nikodym theorem, there exists a nonnegative density function f(x₁, x₂) such that for any Borel set B:
$$
P(B) = \iint I_B(x_1, x_2)\, f(x_1, x_2)\,dx_1\,dx_2.
$$
If f(x₁, x₂) > 0 everywhere,
$$
P(B \mid X_2 = x_2) = \frac{\int_{\{x_1 : (x_1, x_2) \in B\}} f(x_1, x_2)\,dx_1}{\int_{-\infty}^{+\infty} f(x_1, x_2)\,dx_1},
$$
from which we can deduce that
$$
\frac{f(x_1, x_2)}{\int_{-\infty}^{+\infty} f(x_1, x_2)\,dx_1}
$$
is the density function of the conditional probability measure P(· | X₂ = x₂).
Example 1.4.7 Let X₁ and X₂ be two random variables with a normal joint distribution. Then their probability density function has the form
$$
\phi(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left(\tilde{x}_1^2 - 2\rho\,\tilde{x}_1 \tilde{x}_2 + \tilde{x}_2^2\right)\right\},
$$
where 0 ≤ ρ < 1 and x̃_i = (x_i − μ_i)/σ_i, i = 1, 2. The conditional density of X₁ given X₂ = x₂ is a normal density with mean μ₁ + ρ(σ₁/σ₂)(x₂ − μ₂) and variance Var(X₁ | X₂ = x₂) = (1 − ρ²)σ₁² < σ₁² = Var(X₁) (for ρ ≠ 0). To see this, recall that, by definition, the conditional density of X₁ given X₂ is
$$
\phi(x_1 \mid x_2) = \frac{\phi(x_1, x_2)}{\int_{\mathbb{R}} \phi(x_1, x_2)\,dx_1}
= \frac{\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
\exp\left\{-\dfrac{1}{2(1-\rho^2)}\left(\tilde{x}_1^2 - 2\rho\,\tilde{x}_1\tilde{x}_2 + \tilde{x}_2^2\right)\right\}}
{\dfrac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left\{-\dfrac12\, \tilde{x}_2^2\right\}}
$$
$$
= \frac{1}{\sqrt{2\pi}\,\sigma_1\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left(\tilde{x}_1^2 - 2\rho\,\tilde{x}_1\tilde{x}_2 + \rho^2\tilde{x}_2^2\right)\right\}
= \frac{1}{\sqrt{2\pi}\,\sigma_1\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\tilde{x}_1 - \rho\,\tilde{x}_2\right]^2\right\}
$$
$$
= \frac{1}{\sqrt{2\pi}\,\sigma_1\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2\sigma_1^2(1-\rho^2)}
\left[x_1 - \Big(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)\Big)\right]^2\right\},
$$
and the result follows.

Thus by conditioning on X₂ we have gained some statistical information about X₁, which results in a reduction of the variability of X₁.
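The conditional mean and variance formulas can be checked by simulation; all parameter values below, the binning width, and the tolerances are illustrative choices.

```python
import math, random

random.seed(0)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.8   # arbitrary parameters

# Sample (X1, X2): X2 = mu2 + s2*Z2, X1 = mu1 + s1*(rho*Z2 + sqrt(1-rho^2)*Z1).
pairs = []
for _ in range(200_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((mu1 + s1 * (rho * z2 + math.sqrt(1 - rho**2) * z1),
                  mu2 + s2 * z2))

# Estimate the conditional law of X1 given X2 near a fixed x2 by binning.
x2_star = -1.5
sel = [x1 for x1, x2 in pairs if abs(x2 - x2_star) < 0.05]
m = sum(sel) / len(sel)
v = sum((x - m) ** 2 for x in sel) / len(sel)

pred_mean = mu1 + rho * s1 / s2 * (x2_star - mu2)   # mu1 + rho s1/s2 (x2 - mu2)
pred_var = (1 - rho**2) * s1**2                     # (1 - rho^2) s1^2 < s1^2
assert abs(m - pred_mean) < 0.1 and abs(v - pred_var) < 0.3
```

Note the variance estimate is well below Var(X₁) = σ₁² = 4, the reduction the example points out.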
1.5 Problems

1. Let {F_i}_{i∈I} be a family of σ-fields on Ω. Prove that ⋂_{i∈I} F_i is a σ-field.

2. Let A and B be two events. Express by means of the indicator functions of A and B
$$
I_{A \cup B},\ I_{A \cap B},\ I_{A - B},\ I_{B - A},\ I_{(A - B) \cup (B - A)},
$$
where A − B = A ∩ B̄.

3. Let Ω = ℝ and define the sequences C_{2n} = [−1, 2 + 1/(2n)) and C_{2n+1} = [−2 − 1/(2n + 1), 1). Show that
$$
\limsup C_n = [-2, 2], \qquad \liminf C_n = [-1, 1].
$$
4. Let Ω = (ω₁, ω₂, ω₃, ω₄) and P(ω₁) = 1/12, P(ω₂) = 1/6, P(ω₃) = 1/3, and P(ω₄) = 5/12. Let
$$
A_n = \begin{cases} \{\omega_1, \omega_3\} & \text{if } n \text{ is odd},\\ \{\omega_2, \omega_4\} & \text{if } n \text{ is even}. \end{cases}
$$
Find P(lim sup A_n), P(lim inf A_n), lim sup P(A_n), and lim inf P(A_n), and compare.

5. Give a proof of Theorem 1.3.36.

6. Show that a σ-field is either finite or uncountably infinite.

7. Show that if X is a random variable, then σ{|X|} ⊂ σ{X}.

8. Show that the set B₀ of countable unions of open intervals in ℝ is not closed under complementation and hence is not a σ-field. (Hint: enumerate the rational numbers and choose, for each one of them, an open interval containing it. Now show that the complement of the union of all these open intervals is not in B₀.)

9. Show that the class of finite unions of intervals of the form (−∞, a], (b, c], and (d, +∞) is a field but not a σ-field.
10. Show that a sequence of random variables {X_n} converges (a.s.) to X if and only if, for every ε > 0,
$$
\lim_{m \to \infty} P[\,|X_n - X| \leq \epsilon \ \text{for all } n \geq m\,] = 1.
$$
11. Show that if {X_k} converges (a.s.) to X then {X_k} converges to X in probability, but the converse is false.

12. Consider the probability space (ℕ, F, P), where ℕ is the set of natural numbers, F is the collection of all subsets of ℕ and P({k}) = 2⁻ᵏ. Let X_k(ω) = I_{[ω=k]}. Discuss the convergence (a.s.) and in probability of X_k and show that on this particular space they are equivalent.

13. Let {X_n} be a sequence of random variables with
$$
P[X_n = 2^n] = P[X_n = -2^n] = \frac{1}{2^n}, \qquad P[X_n = 0] = 1 - \frac{1}{2^{n-1}}.
$$
Show that {X_n} converges (a.s.) to 0 but E|X_n|^p does not converge to 0.

14. Let {X_n} be a sequence of random variables with
$$
P[X_n = n^{1/2p}] = \frac{1}{n}, \qquad P[X_n = 0] = 1 - \frac{1}{n}.
$$
Show that {X_n} does not converge (a.s.) to 0 but E|X_n|^p converges to 0.
15. Suppose Q is another probability measure on (Ω, F) such that P(A) = 0 implies Q(A) = 0 (Q ≪ P). Show that P-a.s. convergence implies Q-a.s. convergence.

16. Prove that if F₁ and F₂ are independent sub-σ-fields and F₃ is coarser than F₁, then F₃ and F₂ are independent.

17. Let Ω = (ω₁, ω₂, ω₃, ω₄, ω₅, ω₆), P(ω_i) = p_i = 1/6, and consider the sub-σ-fields generated by
$$
\mathcal{F}_1 = \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4, \omega_5, \omega_6\}\}, \qquad
\mathcal{F}_2 = \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \{\omega_5, \omega_6\}\}.
$$
Show that F₁ and F₂ are not independent. What can be said about the sub-σ-fields
$$
\mathcal{F}_3 = \{\{\omega_1, \omega_2\}, \{\omega_3\}, \{\omega_4, \omega_5, \omega_6\}\},
$$
and
$$
\mathcal{F}_5 = \{\{\omega_1, \omega_4\}, \{\omega_2, \omega_5\}, \{\omega_3, \omega_6\}\}?
$$
18. Let Ω = {(i, j) : i, j = 1, . . . , 6} and P({i, j}) = 1/36. Define the quantity
$$
X(\omega) = \sum_{k=0}^{\infty} k\, I_{\{(i,j)\,:\,i + j = k\}}(\omega).
$$
Is X a random variable? Find P_X(x) = P(X = x), calculate E[X] and describe σ(X), the σ-field generated by X.
19. For the function X defined in the previous exercise, describe the random variable P(A | X), where A = {(i, j) : i odd, j even}, and find its expected value E[P(A | X)].

20. Let Ω be the unit interval (0, 1] and on it be given the σ-fields generated by
$$
\mathcal{F}_1 = \{(0, \tfrac12], (\tfrac12, \tfrac34], (\tfrac34, 1]\}, \quad
\mathcal{F}_2 = \{(0, \tfrac14], (\tfrac14, \tfrac12], (\tfrac12, \tfrac34], (\tfrac34, 1]\}, \quad
\mathcal{F}_3 = \{(0, \tfrac18], (\tfrac18, \tfrac28], \ldots, (\tfrac78, 1]\}.
$$
Consider the mapping
$$
X(\omega) = x_1 I_{(0, \frac14]}(\omega) + x_2 I_{(\frac14, \frac12]}(\omega) + x_3 I_{(\frac12, \frac34]}(\omega) + x_4 I_{(\frac34, 1]}(\omega).
$$
Find E[X | F₁], E[X | F₂], and E[X | F₃].

21. Let Ω be the unit interval, ((0, 1], P) the Lebesgue measure space, and consider the sub-σ-fields generated by
$$
\mathcal{F}_1 = \{(0, \tfrac12], (\tfrac12, \tfrac34], (\tfrac34, 1]\}, \qquad
\mathcal{F}_2 = \{(0, \tfrac14], (\tfrac14, \tfrac12], (\tfrac12, \tfrac34], (\tfrac34, 1]\}.
$$
Consider the mapping X(ω) = ω. Find E[E[X | F₁] | F₂] and E[E[X | F₂] | F₁] and compare.

22. Consider the probability measure P on the real line such that
$$
P(0) = p, \quad P((0, 1)) = q, \quad p + q = 1,
$$
and the random variables defined on Ω = ℝ,
$$
X_1(x) = 1 + x, \qquad X_2(x) = 0 \cdot I_{\{x \leq 0\}} + (1 + x)\, I_{\{0 < x\}}.
$$
23. Let X₁, X₂ and X₃ be three independent, identically distributed (i.i.d.) random variables such that P(X_i = 1) = p = 1 − P(X_i = 0) = 1 − q. Find P(X₁ + X₂ + X₃ = s | X₁, X₂).

24. Let X₁, X₂ and X₃ be three random variables with multinomial distribution with parameters p₁, p₂, p₃, n, that is,
$$
P(X_1 = n_1, X_2 = n_2, X_3 = n_3) = \frac{n!\, p_1^{n_1} p_2^{n_2} p_3^{n_3}}{n_1!\, n_2!\, n_3!},
$$
where n₁, n₂ and n₃ are nonnegative integers such that n₁ + n₂ + n₃ = n. Show that if n is a random variable with Poisson distribution with parameter λ, then the three random variables X₁, X₂, X₃ become mutually independent with Poisson distributions.

25. On Ω = [0, 1] with P being Lebesgue measure, show that
$$
X = x_1 I_{(0, \frac12]} + x_2 I_{(\frac12, 1]} \quad \text{and} \quad Y = y_1 I_{(0, \frac14] \cup (\frac34, 1]} + y_2 I_{(\frac14, \frac34]}
$$
are independent.

26. Show that (see Example 1.4.4)
$$
E[f \mid \mathcal{F}_n] = \sum_{j=0}^{2^n - 1} \frac{\int_{j2^{-n}}^{(j+1)2^{-n}} f(x)\,dx}{2^{-n}}\, I_{[j2^{-n},\,(j+1)2^{-n})}
$$
converges a.s. and in L¹ to f as n → ∞. In particular, if f = I_E for some Borel set E, then
$$
\sum_{j=0}^{2^n - 1} \frac{\mu(E \cap [j2^{-n}, (j+1)2^{-n}))}{2^{-n}}\, I_{[j2^{-n},\,(j+1)2^{-n})}(x) \xrightarrow{\text{a.s.}} I_E(x),
$$
for x ∈ [0, 1]. Here μ(·) is the Lebesgue measure.
2 Stochastic processes

2.1 Definitions and general results

A stochastic process is a mathematical model for any phenomenon evolving or varying in time (or over some index set), subject to random influences. Examples include the price of a commodity observed through time, the fluctuating water level behind a dam, or the distribution of shades in a noisy image observed over a region of ℝ². Suppose (Ω, F) is a measurable space. We shall define a stochastic process to be a mapping X_t(ω) from {index space} × Ω into a second measurable space (E, 𝓔), called the state space, or the range space. Alternatively, we can consider a stochastic process as a family {X_t}, t ∈ {index space}, of random variables all defined on a measurable space (Ω, F).

For a fixed sample outcome ω, X_·(ω) is a function describing one possible trajectory, or sample path, followed by the process. If the time index is frozen at t, say, then we have a random variable X_t(·), i.e. an F-measurable function of ω.

When the time index t is continuous, measurability, continuity, etc. in t are considered.

A continuous-time stochastic process {X_t} is said to have independent increments if for all t₀ < t₁ < t₂ < ⋯ < t_n, the random variables X_{t₁} − X_{t₀}, X_{t₂} − X_{t₁}, . . . , X_{t_n} − X_{t_{n−1}} are independent. If for all s, X_{t+s} − X_t has the same distribution for all t, {X_t} is said to possess stationary increments.

Sometimes, a stochastic process is interpreted as just a single random variable taking values in a space of functions; that is, with each ω is associated a function. In analogy with real random variables, the state space is then endowed with a Borel σ-field (generated by the open sets of an underlying topology).
Example 2.1.1 Let Ω = {ω₁, ω₂, . . .}, and let the time index n be finite, 0 ≤ n ≤ N. A stochastic process X in this setting is a two-dimensional array or matrix such that:
$$
X = \begin{pmatrix}
X_1(\omega_1) & X_1(\omega_2) & \cdots\\
X_2(\omega_1) & X_2(\omega_2) & \cdots\\
\vdots & \vdots & \\
X_N(\omega_1) & X_N(\omega_2) & \cdots
\end{pmatrix}
$$
Each row represents a random variable and each column is a sample path or a realization
of the stochastic process X. If the time index is unbounded, each sample path is given by
an infinite sequence.
Example 2.1.2 Let N = 4 in the previous example and suppose that X is given by the following array (one column per ω_i):
$$
X = \begin{pmatrix}
2 & 3 & 5 & 7 & 11 & 3 & 2.3 & 1\\
1 & 1 & -5.7 & 2 & 3 & 6 & 83 & 19\\
11 & 7 & -70 & 3 & 2 & 5 & 2 & 21\\
5 & -3 & -2 & -1 & 0 & 1 & 2 & 3
\end{pmatrix}
$$
The sample space of {X_n} is ℝ⁴ and the stochastic process can be thought of as a mapping (in fact a random variable)
$$
\omega_i \mapsto X(\omega_i) = (X_1(\omega_i), \ldots, X_4(\omega_i)) = (x_1^i, x_2^i, x_3^i, x_4^i) = x^i \in \mathbb{R}^4.
$$
The random variable X induces a probability measure P_X on the Borel σ-field B(ℝ⁴) in the usual way, i.e., for any B ∈ B(ℝ⁴),
$$
P_X(B) = P[\omega : X(\omega) \in B] = P(X^{-1}(B)).
$$
For instance,
$$
B_1 = \{x \in \mathbb{R}^4 : 3 \leq x_1 \leq 5,\ 2 \leq x_2 \leq 7\}
$$
contains a single trajectory (column 6 in the table), so that P_X(B₁) = P(ω₆).
$$
B_2 = \{x \in \mathbb{R}^4 : \max_{1 \leq n \leq 4} x_n \leq 7\}
$$
contains four trajectories (columns 2, 3, 4 and 6 in the table), so that P_X(B₂) = P(ω₂, ω₃, ω₄, ω₆).
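For a finite Ω the pushforward measure P_X is a finite sum. The sketch below uses an array consistent with the claims of the example; the exact entries and the uniform choice of P are illustrative.

```python
# One column per omega_i: hypothetical sample paths X(omega_i) in R^4.
paths = {
    1: (2, 1, 11, 5),    2: (3, 1, 7, -3),   3: (5, -5.7, -70, -2),
    4: (7, 2, 3, -1),    5: (11, 3, 2, 0),   6: (3, 6, 5, 1),
    7: (2.3, 83, 2, 2),  8: (1, 19, 21, 3),
}
P = {i: 1 / 8 for i in paths}            # uniform P, for illustration

def P_X(event):
    """P_X(B) = P(omega : X(omega) in B), with B given as a predicate."""
    return sum(P[i] for i, x in paths.items() if event(x))

B1 = lambda x: 3 <= x[0] <= 5 and 2 <= x[1] <= 7
B2 = lambda x: max(x) <= 7
assert P_X(B1) == 1 / 8                  # only the path of omega_6
assert P_X(B2) == 4 / 8                  # paths of omega_2, _3, _4, _6
```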
Example 2.1.3 Let Ω = {ω₁, ω₂, . . .} and P be a probability measure on (Ω, F). Suppose that the time index set is the set of positive integers. A real valued stochastic process X in this setting is a two-dimensional infinite array such that:
$$
X = \begin{pmatrix}
X_1(\omega_1) & X_1(\omega_2) & \cdots\\
X_2(\omega_1) & X_2(\omega_2) & \cdots\\
\vdots & \vdots &
\end{pmatrix}.
$$
Here the sample space is
$$
\mathbb{R}^\infty = \{(x_1, x_2, \ldots) \in \mathbb{R} \times \mathbb{R} \times \cdots\}.
$$
Note that the Borel σ-field B(ℝ^∞) coincides with the smallest σ-field containing the open sets in ℝ^∞ in the metric
$$
\rho(x^1, x^2) = \sum_k 2^{-k} \frac{|x_k^1 - x_k^2|}{1 + |x_k^1 - x_k^2|}
$$
([36]). Now think of the stochastic process X as an ℝ^∞-valued random variable
$$
\omega_i \mapsto X(\omega_i) = (X_1(\omega_i), X_2(\omega_i), \ldots) = (x_1^i, x_2^i, \ldots) = x^i \in \mathbb{R}^\infty.
$$
The random variable X induces a probability measure P_X on the σ-field B(ℝ^∞). For instance, if
$$
A = \{x \in \mathbb{R}^\infty : \sup_n x_n > a\} \in \mathcal{B}(\mathbb{R}^\infty),
$$
then the set A consists of all sequences with some of their entries larger than a, and P_X(A) = P(ω : X(ω) ∈ A).
Example 2.1.4 (The Single Jump Process) Consider a stochastic process {X_t}, t ≥ 0, which takes its values in some measurable space (E, 𝓔) and which remains at its initial value z₀ ∈ E until a random time T, when it jumps to a random position Z. A sample path of the process is
$$
X_t(\omega) = \begin{cases} z_0 & \text{if } t < T(\omega),\\ Z(\omega) & \text{if } t \geq T(\omega). \end{cases}
$$
The underlying probability space can be taken to be
$$
\Omega = [0, \infty] \times E,
$$
with the σ-field B ⊗ 𝓔. A probability measure P is given on (Ω, B ⊗ 𝓔) and we suppose
$$
P([0, \infty] \times \{z_0\}) = 0 = P(\{0\} \times E),
$$
so that the probabilities of a zero jump and of a jump at time zero are both zero.

Write
$$
F_t = P[T > t,\ Z \in E], \qquad c = \inf\{t : F_t = 0\}.
$$
F_t is right-continuous and monotonic decreasing, so there are only countably many points of discontinuity u ∈ D where ΔF_u = F_u − F_{u−} ≠ 0. At points in D there are positive probabilities that X jumps. Note that the more probability mass there is at a point u, the more predictable is the jump at that point.

Formally define a function Λ by setting
$$
d\Lambda(t) = P(T \in\, ]t - dt, t],\ Z \in E \mid T > t - dt).
$$
Then dΛ(t) is the probability that the jump occurs in the interval ]t − dt, t], given it has not
happened by t − dt. Roughly speaking we have
$$
d\Lambda(t) = P(T \in\, ]t - dt, t] \mid T > t - dt)
= \frac{P(T \in\, ]t - dt, t])}{F_{t-dt}}
= \frac{(1 - F_t) - (1 - F_{t-dt})}{F_{t-dt}}
= \frac{F_{t-dt} - F_t}{F_{t-dt}}
= -\frac{dF_t}{F_{t-}}.
$$
Define
$$
\Lambda(t) = -\int_{]0,t]} \frac{dF_s}{F_{s-}}. \tag{2.1.1}
$$
For instance, if T is exponentially distributed with parameter λ we have
$$
\Lambda(t) = -\int_{]0,t]} \frac{d\,e^{-\lambda s}}{e^{-\lambda s}} = \lambda t.
$$
Write
$$
F_t^A = P[T > t,\ Z \in A];
$$
then clearly the measure on (ℝ⁺, B(ℝ⁺)) given by F_t^A is absolutely continuous with respect to that given by F_t, so there is a Radon–Nikodym derivative λ(A, s) such that
$$
F_t^A - F_0^A = \int_{]0,t]} \lambda(A, s)\,dF_s. \tag{2.1.2}
$$
The pair (λ, Λ) is the Lévy system for the jump process. Roughly, λ(dx, s) is the conditional distribution of the jump position Z given that the jump happens at time s.
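For exponentially distributed T the identity Λ(t) = λt can be checked against an empirical hazard. The rate λ, the uniform law chosen for Z, the grid and the tolerance below are all illustrative assumptions.

```python
import bisect, random

random.seed(1)
lam, N = 2.0, 100_000                    # illustrative rate and sample size
T = sorted(random.expovariate(lam) for _ in range(N))   # jump times
Z = [random.uniform(0, 1) for _ in range(N)]            # jump positions (illustrative law)

def X(t, i):
    """Sample path i of the single jump process, with z0 = 0."""
    return 0.0 if t < T[i] else Z[i]

# Empirical Lambda(1): accumulate hazard increments P(T in ]t0, t1] | T > t0).
dt, n_bins, Lam = 0.01, 100, 0.0
for k in range(n_bins):
    lo = bisect.bisect_right(T, k * dt)
    hi = bisect.bisect_right(T, (k + 1) * dt)
    if N - lo > 0:
        Lam += (hi - lo) / (N - lo)

assert X(0.0, 0) == 0.0                  # before the jump the path sits at z0
assert abs(Lam - lam) < 0.05             # Lambda(1) = lam * 1 for exponential T
```

The small residual error comes from the finite grid (each bin contributes 1 − e^{−λ dt} rather than λ dt) and from sampling noise.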
Let X_t be a continuous time stochastic process; that is, the time index belongs to some interval of the real line, say t ∈ [0, ∞). If we are interested in the behavior of X_t during an interval of time [t₀, t₁] it is necessary to consider simultaneously an uncountable family of X_ts, {X_t, t₀ ≤ t ≤ t₁}. This results in a technical problem because of the uncountability of the index parameter t. Recall that σ-fields are, by definition, closed under countable operations only, and statements like {X_t ≤ x, t₀ ≤ t ≤ t₁} = ⋂_{t₀ ≤ t ≤ t₁} {X_t ≤ x} are not events! However, for most practical situations this difficulty is bypassed by replacing uncountable index sets by countable dense subsets without losing any significant information. In general, these arguments are based on the separability of a continuous time stochastic process. This is possible, for example, if the stochastic process X is almost surely continuous (see Definition 2.1.6).
Let X = {X_t : t ≥ 0} and Y = {Y_t : t ≥ 0} be two stochastic processes defined on the same probability space (Ω, F, P). Because of the presence of ω, the functions X_t(ω) and Y_t(ω) can be compared in different ways.

Definition 2.1.5
1. X and Y are called indistinguishable if
$$
P(\{\omega : X_t(\omega) = Y_t(\omega),\ \forall t \geq 0\}) = 1.
$$
2. Y is a modification of X if, for every t ≥ 0, we have
$$
P(\{\omega : X_t(\omega) = Y_t(\omega)\}) = 1.
$$
3. X and Y have the same law, or probability distribution, if and only if all their finite-dimensional probability distributions coincide, that is, if and only if for any sequence of times 0 ≤ t₁ ≤ ⋯ ≤ t_n the joint probability distributions of (X_{t₁}, . . . , X_{t_n}) and (Y_{t₁}, . . . , Y_{t_n}) coincide.

Note that the first property is much stronger than the other two. The null sets in the second and third properties may depend on t.
Recall that there are different definitions of limit for sequences of random variables, so to each definition corresponds a type of continuity of a real valued time index process.

Definition 2.1.6
1. {X_t} is continuous in probability if for every t and ε > 0,
$$
\lim_{h \to 0} P[\,|X_{t+h} - X_t| > \epsilon\,] = 0.
$$
2. {X_t} is continuous in L^p if for every t,
$$
\lim_{h \to 0} E[\,|X_{t+h} - X_t|^p\,] = 0.
$$
3. {X_t} is continuous almost surely (a.s.) if for every t,
$$
P[\lim_{h \to 0} X_{t+h} = X_t] = 1.
$$
4. {X_t} is right continuous if for almost every ω the map t ↦ X_t(ω) is right continuous. That is,
$$
\lim_{s \downarrow t} X_s = X_t \quad \text{a.s.}
$$
If in addition
$$
\lim_{s \uparrow t} X_s = X_{t-} \quad \text{exists a.s.,}
$$
{X_t} is right continuous with left limits (rcll, or corlol, or càdlàg).

However, none of the above notions is strong enough to differentiate, for instance, between a process for which almost all sample paths are continuous for every t, and a process for which almost all sample paths have a countable number of discontinuities, when the two processes have the same finite dimensional distributions. A much stronger criterion for continuity is sample path continuity, which requires continuity for all ts simultaneously! In other words,
for almost all ω the function X_·(ω) is continuous in the usual sense. Unfortunately, the definition of a stochastic process in terms of its finite dimensional distributions does not help here, since we are faced with whole intervals containing uncountable numbers of ts. Fortunately, for most useful processes in applications, continuous versions (sample path continuous), or right-continuous versions, can be constructed.

If a stochastic process with index set [0, ∞) is continuous, its sample space can be identified with C[0, ∞), the space of all real valued continuous functions. A metric on this space is
$$
\rho(x, y) = \sum_k 2^{-k} \frac{\sup_{0 \leq t \leq k} |x(t) - y(t)|}{1 + \sup_{0 \leq t \leq k} |x(t) - y(t)|},
$$
for x, y ∈ C[0, ∞). (See [36].)

Let B(C) be the smallest σ-field containing the open sets of the topology induced by ρ on C[0, ∞), the Borel σ-field. Then ([36]) the same σ-field B(C) is generated by the cylinder sets of C[0, ∞), which have the form
$$
\{x \in C[0, \infty) : x_{t_1} \in I_1,\ x_{t_2} \in I_2,\ \ldots,\ x_{t_n} \in I_n\},
$$
where each I_i is an interval of the form (a_i, b_i]. In other words, a cylinder set is a set of functions with restrictions put on a finite number of coordinates, or, in the language of Shiryayev ([36]), it is the set of functions that, at times t₁, . . . , t_n, get through the windows I₁, . . . , I_n and at other times have arbitrary values.
An example of a Borel set from B(C) is
$$
A = \{x : \sup_{t \geq 0} x_t > a\}.
$$
Remark 2.1.7 Note that the set A depends on the behavior of functions on an uncountable set of points and would not be in the Borel σ-field if C[0, ∞) were replaced by the much larger space ℝ^{[0,∞)} (see Theorem 3, page 146 of [36]). In this latter space every Borel set is determined by restrictions imposed on the functions x on an at most countable set of points t₁, t₂, . . . .
Suppose the index parameter t is either a nonnegative integer or a nonnegative real number. The σ-fields F_t^X = σ{X_u, u ≤ t} are the smallest ones with respect to which the random variables X_u, u ≤ t, are measurable, and are naturally associated with any stochastic process {X_t}. {F_t^X} is sometimes called the natural filtration associated with the stochastic process {X_t}.

The σ-field F_t^X contains all the events which can be decided to have occurred or not by observing X up to time t.

Often it is convenient to consider larger σ-fields than F_t^X; for instance, F_t = σ{X_u, Y_u; u ≤ t}, where {Y_t} is another stochastic process.

Definition 2.1.8 The stochastic process X is adapted to the filtration {F_t, t ≥ 0} if for each t ≥ 0, X_t is an F_t-measurable random variable.

Clearly X is adapted to {F_t^X}. A function f is F_t^X-measurable if the value of f(ω) can be decided by observing the history of X up to time t (and nowhere else). This follows from the multivariate version of Theorem 1.3.6. For instance, f(ω) = X_{t²}(ω) is F_t^X-measurable for 0 < t < 1 but it is not F_t^X-measurable for t > 1, since then t² > t.
As a function of two variables (t, ω), a stochastic process should be measurable with respect to both variables to allow a minimum of good behavior.

Definition 2.1.9 A stochastic process {X_t}, t ∈ [0, ∞), on a probability space (Ω, F, P) is measurable if, for all Borel sets B in the Borel σ-field B(ℝᵈ),
$$
\{(\omega, t) : X_t(\omega) \in B\} \in \mathcal{F} \otimes \mathcal{B}([0, \infty)).
$$
If the probability space (Ω, F, P) is equipped with a filtration {F_t}, then a much stronger notion of measurability, which relates measurability in t and ω to the filtration {F_t}, is progressive measurability.

Definition 2.1.10 A stochastic process {X_t} on a filtered probability space (Ω, F, F_t, P) is progressively measurable if, for any t ∈ [0, ∞) and any set B in the Borel σ-field B(ℝᵈ),
$$
\{(\omega, s) : s \leq t,\ X_s(\omega) \in B\} \in \mathcal{F}_t \otimes \mathcal{B}([0, t]).
$$
Here B([0, t]) is the σ-field of Borel sets on the interval [0, t].

A measurable process need not be progressively measurable, since σ(X_t) may contain events not in F_t.

Lemma 2.1.11 If X is a progressively measurable stochastic process, then X is adapted.

Proof Fix t. The map ω ↦ (t, ω) from Ω to [0, t] × Ω is measurable from F_t to B([0, t]) ⊗ F_t. The map (s, ω) ↦ X_s(ω) from [0, t] × Ω to the state space of X is B([0, t]) ⊗ F_t-measurable, by progressive measurability. Composing the two maps shows that ω ↦ X_t(ω) is F_t-measurable.

Theorem 2.1.12 If the stochastic process {X_t : t ≥ 0} on the filtered probability space (Ω, F, F_t, P) is measurable and adapted, then it has a progressively measurable modification.

Proof See [28] page 68.
Typically, in a description of a random process, the measure space and the probability measure on it are not given. One simply describes the family of joint distribution functions of every finite collection of random variables of the process. A basic question is whether there is a stochastic process with such a family of joint distribution functions. The following theorem ([36] page 244), due to Kolmogorov, guarantees that this is the case if the joint distribution functions satisfy a set of natural consistency conditions.

Theorem 2.1.13 (Kolmogorov Consistency Theorem) For all t₁, . . . , t_k, k ∈ ℕ, in the time index set T, let P_{t₁,...,t_k} be probability measures on (ℝᵏ, B(ℝᵏ)) such that
$$
P_{t_{\pi(1)}, \ldots, t_{\pi(k)}}(F_1 \times \cdots \times F_k) = P_{t_1, \ldots, t_k}(F_{\pi^{-1}(1)} \times \cdots \times F_{\pi^{-1}(k)})
$$
for all permutations π on {1, 2, . . . , k}, and
$$
P_{t_1, \ldots, t_k}(F_1 \times \cdots \times F_k) = P_{t_1, \ldots, t_k, t_{k+1}, \ldots, t_{k+m}}(F_1 \times \cdots \times F_k \times \mathbb{R} \times \cdots \times \mathbb{R}),
$$
for all m ∈ ℕ, where the set on the right hand side has a total of k + m factors. Then there is a unique probability measure P on the space (ℝᵀ, B(ℝᵀ)) such that the restriction of P to any cylinder set B_n = {x ∈ ℝᵀ : x_{t₁} ∈ I₁, x_{t₂} ∈ I₂, . . . , x_{t_n} ∈ I_n} is P_{t₁,...,t_n}, that is,
$$
P(B_n) = P_{t_1, \ldots, t_n}(B_n).
$$
Proof See [36] page 167.

Theorem 2.1.14 (Kolmogorov's Existence Theorem) For all τ₁, . . . , τ_k, k ∈ ℕ, in the time index set, let P_{τ₁,...,τ_k} be probability measures on ℝⁿᵏ such that
$$
P_{\tau_{\pi(1)}, \ldots, \tau_{\pi(k)}}(F_1 \times \cdots \times F_k) = P_{\tau_1, \ldots, \tau_k}(F_{\pi^{-1}(1)} \times \cdots \times F_{\pi^{-1}(k)}),
$$
for all permutations π on {1, 2, . . . , k}, and
$$
P_{\tau_1, \ldots, \tau_k}(F_1 \times \cdots \times F_k) = P_{\tau_1, \ldots, \tau_k, \tau_{k+1}, \ldots, \tau_{k+m}}(F_1 \times \cdots \times F_k \times \mathbb{R}^n \times \cdots \times \mathbb{R}^n),
$$
for all m ∈ ℕ, where the set on the right hand side has a total of k + m factors. Then there exist a probability space (Ω, F, P) and a stochastic process {X_τ} on Ω into ℝⁿ such that
$$
P_{\tau_1, \ldots, \tau_k}(F_1 \times \cdots \times F_k) = P[X_{\tau_1} \in F_1, \ldots, X_{\tau_k} \in F_k],
$$
for all τ_i in the time set, k ∈ ℕ and all Borel sets F_i.

Proof The proof follows essentially from Theorems 1.3.9, 1.3.10 and 2.1.13. See [36] page 247.
Definition 2.1.15 Suppose X is a stochastic process whose index set is the positive integers ℤ⁺, and suppose {F_n} is a filtration. Then {X_n} is predictable if X_n is F_{n−1}-measurable, that is, X_n(ω) is known from observing events in F_{n−1} at time n − 1.

In continuous time, without loss of generality, we shall take the time index set to be [0, ∞). In the continuous time case, roughly speaking, a stochastic process {X_t} is predictable if knowledge about the behavior of the process is left-continuous, that is, X_t is F_{t−}-measurable. Stated differently, for processes which are continuous on the left one may predict their value at each point by their values at preceding points. A Poisson process (see Section 2.10) is not predictable (its sample paths are right-continuous); otherwise we would be able to predict a jump time immediately before it jumps. More precisely, a stochastic process is predictable if it is measurable with respect to the σ-field on [0, ∞) × Ω generated by the family of all left-continuous adapted stochastic processes.

A stochastic process X with continuous time parameter is optional if it is measurable with respect to the σ-field on [0, ∞) × Ω generated by the family of all right-continuous, adapted stochastic processes which have left limits.
Definition 2.1.16 A measurable stochastic process {X_t} with values in [0, ∞) is called an increasing process if almost every sample path X_·(ω) is right-continuous and increasing.
Theorem 2.1.17 Suppose {X_t} is an increasing process. Then X_t has a unique decomposition as X_t^c + X_t^d, where {X_t^c} is an increasing continuous process, and {X_t^d} is an increasing purely discontinuous process; that is, {X_t^d} is the sum of the jumps of {X_t}.

If {X_t} is predictable, {X_t^d} is predictable. If {X_t} is adapted, {X_t^c} is predictable.

Proof See [11] page 69.
2.2 Stopping times

One of the most important questions in the study of stochastic processes is when a process hits a certain level, or enters a certain region of its state space, for the first time. Since for each possible trajectory, or realization ω, there is a hitting time (finite or infinite), the hitting time is a random variable taking values in the index, or time, space of the stochastic process.

Let N̄ = {1, 2, 3, . . . , ∞} and F_∞ = σ{⋃_{n=1}^∞ F_n} = ⋁_{n=1}^∞ F_n. A random variable τ taking values in N̄ is a stopping time (or optional or Markov time) with respect to a filtration {F_n} if for all n ∈ N̄ we have {ω : τ(ω) ≤ n} ∈ F_n. An equivalent definition in discrete time is to require {ω : τ(ω) = n} ∈ F_n.

The concept of stopping time is directly related to the concept of the flow of information through time, that is, the filtration. The event {ω : τ(ω) ≤ n} is F_n-measurable, that is, measurable with respect to the information available up to time n. This means a stopping time is a nonanticipative function, whereas a general random variable may anticipate the future.
Example 2.2.1 Let {X_n, F_n} be an adapted process (i.e. {F_n} is a filtration and X_n is F_n-measurable for all n). Suppose A is a measurable set of the state space of X. Then the random time
$$
\tau = \min\{k : X_k \in A\}
$$
is a stopping time, since
$$
\{\tau \leq n\} = \bigcup_{k=1}^n \{X_k \in A\} \in \mathcal{F}_n.
$$
If τ is a stopping time with respect to a filtration {F_n}, so is τ + m, m ∈ ℕ. However, τ − m, m ∈ ℕ, is not a stopping time, since the event {τ − m = n} = {τ = n + m} is not in F_n; it is in F_{n+m} and hence anticipates the future.
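A discrete-time sketch of this example: the first entry time of a random walk into a set A is a stopping time, because deciding whether τ ≤ n requires only X₁, . . . , X_n. The walk and the target set below are illustrative.

```python
import random

random.seed(2)

def first_hitting_time(path, A):
    """tau = min{k : X_k in A}; len(path) + 1 stands in for +infinity."""
    for k, x in enumerate(path, start=1):
        if x in A:
            return k
    return len(path) + 1

# A +-1 random walk and the hypothetical target set A = {3}.
path, s = [], 0
for _ in range(1000):
    s += random.choice((-1, 1))
    path.append(s)

tau = first_hitting_time(path, {3})
# {tau <= n} is decided by the first n coordinates alone (F_n-measurability):
assert first_hitting_time(path[:tau], {3}) == tau
```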
In order to measure the information accumulated up to a stopping time we define the σ-field F_τ of events prior to a stopping time τ. Suppose that some event B is part of this information. This means that if τ ≤ n we should be able to tell whether or not B has occurred. However, {τ ≤ n} ∈ F_n, so we should have B ∩ {τ ≤ n} ∈ F_n and B^c ∩ {τ ≤ n} ∈ F_n. We therefore define:
$$
\mathcal{F}_\tau = \{A \in \mathcal{F}_\infty : A \cap \{\omega : \tau(\omega) \leq n\} \in \mathcal{F}_n\ \ \forall n \geq 0\}.
$$
The next examples should help to clarify this concept.
Example 2.2.2 Let Ω = {ω_i; i = 1, . . . , 8} and the time index T = {1, 2, 3}. Consider the filtration generated by the partitions
F₁ = {{ω₁, ω₂, ω₃, ω₄, ω₅, ω₆}, {ω₇, ω₈}},
F₂ = {{ω₁, ω₂}, {ω₃, ω₄}, {ω₅, ω₆}, {ω₇, ω₈}},
F₃ = {{ω₁}, {ω₂}, {ω₃}, {ω₄}, {ω₅}, {ω₆}, {ω₇}, {ω₈}}.
Now define the random variable
τ(ω₁) = τ(ω₂) = τ(ω₅) = τ(ω₆) = 2, τ(ω₃) = τ(ω₄) = τ(ω₇) = τ(ω₈) = 3,
so that
{τ = 0} = ∅, {τ = 1} = ∅, {τ = 2} = {ω₁, ω₂, ω₅, ω₆}, {τ = 3} = {ω₃, ω₄, ω₇, ω₈},
and τ is a stopping time.

Now F_τ consists of all events A ∈ F_∞ (= F₃) such that A ∩ {ω : τ(ω) ≤ n} ∈ F_n for every n. In our situation F_τ is generated by the atoms
{ω₁, ω₂}, {ω₅, ω₆}, {ω₃}, {ω₄}, {ω₇}, {ω₈}.
Note that the first two atoms, {ω₁, ω₂} and {ω₅, ω₆}, are in F₂ and the rest are in F₃, as they should be. Also, note that F_τ is not the σ-field generated by the random variable τ. However, a closer look shows that τ is F_τ-measurable. If, for instance, the outcome is ω₁, then τ = 2 and τ⁻¹(2) = {τ = 2} = {ω₁, ω₂, ω₅, ω₆} is an atom of the σ-field generated by the random variable τ, but not an atom of F_τ.
Example 2.2.3 Consider again the experiment of tossing a fair coin infinitely many times.
Each $\omega$ is an infinite sequence of heads and tails and

$$\Omega = \{H, T\}^{\mathbb{N}}.$$

Define the filtration:

$$\mathcal{F}_1 = \sigma\{\{\omega \text{ starting with } H\}, \{\omega \text{ starting with } T\}\},$$
$$\mathcal{F}_2 = \sigma\{\{\omega \text{ starting with } HH\}, \{\omega \text{ starting with } HT\}, \{\omega \text{ starting with } TH\}, \{\omega \text{ starting with } TT\}\}, \ \dots,$$
$$\mathcal{F}_n = \sigma\{\{\omega \text{ starting with } n \text{ fixed letters}\}\}.$$

Suppose that we win one dollar each time Heads comes up and lose one otherwise. Let $S_0 = 0$ and $S_n$ be our fortune after the $n$-th toss. Define the random variable $\tau = \inf\{n : S_n > 0\}$, which is the first time our winnings exceed our losses. Clearly, $\tau$ is a stopping time with respect to the filtration $\mathcal{F}_n$.

Here

$$\mathcal{F}_\tau = \sigma\{\{\omega \text{ starting with } H\}, \{\omega \text{ starting with } THH\}, \{\omega \text{ starting with } THTHH\}, \{\omega \text{ starting with } TTHHH\}, \dots\}$$
and

$$\tau(\omega \text{ starting with } H) = 1, \quad \tau(\omega \text{ starting with } THH) = 3, \quad \tau(\omega \text{ starting with } THTHH) = \tau(\omega \text{ starting with } TTHHH) = 5.$$

If $\omega = THTHH\ldots$, then the information at time $\tau(THTHH\ldots) = 5$ is in $\mathcal{F}_5$ and is given by the event composed of all the sequences starting with $THTHH$, which is an atom of $\mathcal{F}_\tau$. However, $\{\tau = 5\} = \{\omega \text{ starting with } THTHH\} \cup \{\omega \text{ starting with } TTHHH\}$, which is not an atom of $\mathcal{F}_\tau$.
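The first-passage time $\tau = \inf\{n : S_n > 0\}$ is easy to simulate. A minimal sketch (hypothetical code, assuming a fair $\pm 1$ walk as in the example); note that $S_\tau = 1$ whenever $\tau < \infty$, and that $\tau$ is necessarily odd, since $S_n$ has the parity of $n$:

```python
import random

def first_passage(max_steps=100_000, rng=random):
    """Run a fair +/-1 walk S_n from S_0 = 0 and return
    tau = inf{n : S_n > 0} together with the path, or (None, path)
    if the level is not reached within max_steps."""
    s, path = 0, [0]
    for n in range(1, max_steps + 1):
        s += 1 if rng.random() < 0.5 else -1   # H -> +1, T -> -1
        path.append(s)
        if s > 0:
            return n, path
    return None, path

random.seed(0)
tau, path = first_passage()
```

Although $\tau < \infty$ almost surely for the symmetric walk, $E[\tau] = \infty$, so individual runs can be very long; the `max_steps` cap guards against that in simulation.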
If $\sigma \le \tau$ are two stopping times then $\mathcal{F}_\sigma \subset \mathcal{F}_\tau$, because if $A \in \mathcal{F}_\sigma$,

$$A \cap \{\tau \le n\} = (A \cap \{\sigma \le n\}) \cap \{\tau \le n\} \in \mathcal{F}_n \quad (2.2.1)$$

for all $n$. From this result we see that if $\{\tau_n\}$ is an increasing sequence of stopping times, the sequence $\{\mathcal{F}_{\tau_n}\}$ is a filtration.
Example 2.2.4 Let $\Omega = \{\omega_i,\ i = 1, \dots, 8\}$ and the time index $T = \{1, 2, 3, 4\}$. Consider the following filtration:

$$\mathcal{F}_1 = \sigma\{\{\omega_i,\ i = 1, \dots, 6\}, \{\omega_7, \omega_8\}\},$$
$$\mathcal{F}_2 = \sigma\{\{\omega_1, \omega_2, \omega_3\}, \{\omega_4, \omega_5, \omega_6\}, \{\omega_7, \omega_8\}\},$$
$$\mathcal{F}_3 = \sigma\{\{\omega_1, \omega_2\}, \{\omega_3\}, \{\omega_4\}, \{\omega_5, \omega_6\}, \{\omega_7, \omega_8\}\},$$
$$\mathcal{F}_4 = \sigma\{\{\omega_1\}, \{\omega_2\}, \dots, \{\omega_8\}\}.$$

Now define the stopping times $\tau_1$ and $\tau_2$:

$$\tau_1(\omega_1) = \dots = \tau_1(\omega_6) = 2, \qquad \tau_1(\omega_7) = \tau_1(\omega_8) = 3,$$
$$\tau_2(\omega_1) = \tau_2(\omega_2) = \tau_2(\omega_3) = 2, \qquad \tau_2(\omega_5) = \tau_2(\omega_6) = 3, \qquad \tau_2(\omega_4) = \tau_2(\omega_7) = \tau_2(\omega_8) = 4,$$

so that $\tau_1 \le \tau_2$ and $\mathcal{F}_{\tau_1} \subset \mathcal{F}_{\tau_2}$, where

$$\mathcal{F}_{\tau_1} = \sigma\{\{\omega_1, \omega_2, \omega_3\}, \{\omega_4, \omega_5, \omega_6\}, \{\omega_7, \omega_8\}\},$$
$$\mathcal{F}_{\tau_2} = \sigma\{\{\omega_1, \omega_2, \omega_3\}, \{\omega_4\}, \{\omega_5, \omega_6\}, \{\omega_7\}, \{\omega_8\}\}.$$
For any Borel set $B$,

$$\{\omega : X_{\tau(\omega)}(\omega) \in B\} = \bigcup_{n=0}^{\infty} \{X_n(\omega) \in B,\ \tau(\omega) = n\} \in \mathcal{F},$$

that is, $X_\tau$ is a random variable.

If $X_\infty$ has been defined and $X_\infty \in \mathcal{F}_\infty = \bigvee_n \mathcal{F}_n$, then we define $X_\tau(\omega) = X_{\tau(\omega)}(\omega)$, i.e.

$$X_\tau = \sum_{n \in \mathbb{N} \cup \{\infty\}} X_n I_{\{\tau = n\}} \in \mathcal{F}_\tau,$$

that is, $X_\tau$ is $\mathcal{F}_\tau$-measurable.
In the continuous time situation, definitions are more involved and the time parameter
$t$ plays a much more important role since continuity, limits, etc. enter the scene. Let $\{\mathcal{F}_t\}$, $t \in [0, \infty)$, be a filtration. A nonnegative random variable $\tau$ is called a stopping time with respect to the filtration $\mathcal{F}_t$ if for all $t \ge 0$ we have $\{\omega : \tau(\omega) \le t\} \in \mathcal{F}_t$.

A nonnegative random variable $\tau$ is an optional time with respect to the filtration $\mathcal{F}_t$ if for all $t \ge 0$ we have $\{\omega : \tau(\omega) < t\} \in \mathcal{F}_t$.

Every stopping time is optional, and the two concepts coincide if the filtration is right-continuous, since $\{\tau < t + \epsilon\} \in \mathcal{F}_{t+\epsilon}$ for every $\epsilon > 0$, and hence

$$\{\omega : \tau(\omega) \le t\} = \bigcap_{\epsilon > 0} \{\tau < t + \epsilon\} \in \bigcap_{\epsilon > 0} \mathcal{F}_{t+\epsilon} = \mathcal{F}_{t+} = \mathcal{F}_t,$$

provided that $\mathcal{F}_t$ is right-continuous.
Example 2.2.5 Suppose $\{X_t, t \ge 0\}$ is continuous and adapted to the filtration $\{\mathcal{F}_t, t \ge 0\}$.

1. Consider $\tau(\omega) = \inf\{t : X_t(\omega) = b\}$, the first time the process $X$ hits level $b \in \mathbb{R}$ (the first passage time to a level $b \in \mathbb{R}$). Then $\tau$ is a stopping time, since

$$\{\tau \le t\} = \bigcap_{n \in \mathbb{N}} \bigcup_{r \in \mathbb{Q},\, r \le t} \left\{|X_r - b| \le \frac{1}{n}\right\} \in \mathcal{F}_t.$$

2. Consider $\tau(\omega) = \inf\{t : |X_t(\omega)| \ge 1\}$, the first time the process $X$ leaves the interval $[-1, +1]$. Then $\tau$ is a stopping time.

3. Consider $\tau(\omega) = \inf\{t : \Delta X_t(\omega) > 1\}$, which is the first time the jump $\Delta X_t = X_t - X_{t-}$ exceeds $1$. Then $\tau$ is a stopping time.
Similarly to the discrete time case, the $\sigma$-field of events prior to a stopping time $\tau$ is defined by

$$\mathcal{F}_\tau = \{A \in \mathcal{F} : A \cap \{\omega : \tau(\omega) \le t\} \in \mathcal{F}_t \ \ \forall t \ge 0\}. \quad (2.2.2)$$

Any stopping time $\tau$ is $\mathcal{F}_\tau$-measurable since, for $s \le t$,

$$\{\omega : \tau(\omega) \le s\} \cap \{\omega : \tau(\omega) \le t\} = \{\omega : \tau(\omega) \le \min(t, s)\} \in \mathcal{F}_{\min(t,s)} \subset \mathcal{F}_t. \quad (2.2.3)$$
Hence $\{\omega : \tau(\omega) \le s\} \in \mathcal{F}_\tau$.

If $\tau_1, \tau_2$ are stopping times, then $\min(\tau_1, \tau_2)$, $\max(\tau_1, \tau_2)$ and $\tau_1 + \tau_2$ are stopping times, as:

1. $\{\min(\tau_1, \tau_2) \le t\} = \{\tau_1 \le t\} \cup \{\tau_2 \le t\} \in \mathcal{F}_t$,
2. $\{\max(\tau_1, \tau_2) \le t\} = \{\tau_1 \le t\} \cap \{\tau_2 \le t\} \in \mathcal{F}_t$,
3. $\{\tau_1 + \tau_2 \le t\} = \{\tau_1 = 0,\ \tau_2 \le t\} \cup \{\tau_2 = 0,\ \tau_1 \le t\} \cup \bigcup_{p, q \in \mathbb{Q},\ p+q \le t} (\{\tau_1 \le p\} \cap \{\tau_2 \le q\})$, where $\mathbb{Q}$ is the set of rational numbers.
4. If $\{\tau_n\}$ is a sequence of stopping times then $\sup_n \tau_n$ is a stopping time, since $\{\sup_n \tau_n \le t\} = \bigcap_n \{\tau_n \le t\} \in \mathcal{F}_t$.
5. If $\tau_1, \tau_2$ are stopping times such that $\tau_1 \le \tau_2$ then $\mathcal{F}_{\tau_1} \subset \mathcal{F}_{\tau_2}$.
Perhaps one of the most important applications of the concept of stopping time is the
so-called strong Markov property.
A stochastic process $\{X_t\}$ is a Markov process if

$$E[f(X_{t+s}) \mid \mathcal{F}^X_t] = E[f(X_{t+s}) \mid X_t] \quad (P\text{-a.s.}) \quad (2.2.4)$$

where $f$ is any bounded measurable function and $\mathcal{F}^X_t = \sigma\{X_u,\ u \le t\}$. Equation (2.2.4) is termed the Markov property.

A natural generalization of the Markov property is the strong Markov property, where the present time $t$ in (2.2.4) is replaced by a stopping time $\sigma$ and the future time $t + s$ is replaced by another, later stopping time $\tau$. That is, if $\sigma$ and $\tau$ are stopping times and $\sigma \le \tau$,

$$E[f(X_\tau) \mid \mathcal{F}_\sigma] = E[f(X_\tau) \mid X_\sigma] \quad \text{a.s.}$$

In other words, a stochastic process $\{X_t\}$ has the strong Markov property if the information about the behavior of $\{X_t\}$ prior to the stopping time $\sigma$ is irrelevant in predicting its behavior after that time once $X_\sigma$ is observed.
2.3 Discrete time martingales
Martingales are probably the most important type of stochastic processes used for modeling.
They occur naturally in almost any information processing problem involving sequentialacquisition of data: for example, the sequence of estimates of a random variable based on
increasing observations, and the sequence of likelihood ratios in a sequential hypothesis
test are martingales.
The stochastic process $X$ is a submartingale (supermartingale) with respect to the filtration $\{\mathcal{F}_n\}$ if it is

1. $\mathcal{F}_n$-adapted,
2. $E[|X_n|] < \infty$ for all $n$, and
3. $E[X_{n'} \mid \mathcal{F}_n] \ge X_n$ a.s. ($E[X_{n'} \mid \mathcal{F}_n] \le X_n$ a.s.) for all $n' \ge n$.
The stochastic process X is a martingale if it is a submartingale and a supermartingale.
If we recall the definition of conditional expectation, we see that the requirement $E[X_{n+1} \mid \mathcal{F}_n] = X_n$ a.s. implies the following:

$$\int_F E[X_{n+1} \mid \mathcal{F}_n]\,dP = \int_F X_{n+1}\,dP, \quad F \in \mathcal{F}_n,$$

and

$$\int_F X_n\,dP = \int_F X_{n+1}\,dP, \quad F \in \mathcal{F}_n. \quad (2.3.1)$$

Since $\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \dots \subset \mathcal{F}_{n+k}$, it is easily seen that

$$\int_F X_n\,dP = \int_F X_{n+1}\,dP = \dots = \int_F X_{n+k}\,dP, \quad F \in \mathcal{F}_n. \quad (2.3.2)$$
and hence, with probability 1, $E[X_{n+k} \mid \mathcal{F}_n] = X_n$. Setting $F = \Omega$ and $n = 1, 2, \dots$ in (2.3.2) gives

$$E[X_1] = E[X_2] = \dots = E[X_n].$$
A classical example of a martingale $X$ is a player's fortune in successive plays of a fair game. If $X_0$ is the initial fortune, then "fair" means that, on average, the fortune at some future time $n$, after more plays, should be neither more nor less than $X_0$. If the game is favorable
to the player, then his fortune should increase on average and Xn is a submartingale. If the
game is unfavorable to the player, Xn is a supermartingale.
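On a finite horizon, the fair-game martingale property can be verified exactly, without simulation, by averaging over the two equally likely continuations of every outcome prefix. A small sketch (hypothetical illustration code, not from the text):

```python
from itertools import product

def fortune(seq):
    """Fortune after playing the fair game along seq: +1 per Head, -1 per Tail."""
    s = 0
    for c in seq:
        s += 1 if c == 'H' else -1
    return s

def is_fair_game_martingale(n):
    """Exact check that E[X_{k+1} | F_k] = X_k for 0 <= k < n:
    conditioning on F_k fixes the first k tosses, and each of the two
    one-step continuations has probability 1/2."""
    for k in range(n):
        for prefix in product('HT', repeat=k):
            x_k = fortune(prefix)
            cond_exp = 0.5 * sum(fortune(prefix + (c,)) for c in 'HT')
            if cond_exp != x_k:
                return False
    return True
```

The same enumeration also confirms the constant-mean property: averaging `fortune` over all $2^n$ sequences of length $n$ gives $E[X_n] = 0 = E[X_0]$.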
The following important inequality is used to prove a fundamental result on constructing
a uniformly integrable family of random variables by conditioning a fixed (integrable)
random variable on a family of sub--fields.
Lemma 2.3.1 (Jensen's Inequality). Suppose $X \in L^1$. If $\phi : \mathbb{R} \to \mathbb{R}$ is convex and $\phi(X) \in L^1$, then

$$E[\phi(X) \mid \mathcal{G}] \ge \phi(E[X \mid \mathcal{G}]). \quad (2.3.3)$$

Proof (see, for example, [11]) Any convex function $\phi : \mathbb{R} \to \mathbb{R}$ is the supremum of a family of affine functions, so there exists a sequence $(\phi_n)$ of real functions with $\phi_n(x) = a_n x + b_n$ for each $n$, such that $\phi = \sup_n \phi_n$. Therefore $\phi(X) \ge a_n X + b_n$ holds a.s. for each (and hence all) $n$. So, by the positivity of $E[\,\cdot \mid \mathcal{G}]$, $E[\phi(X) \mid \mathcal{G}] \ge \sup_n (a_n E[X \mid \mathcal{G}] + b_n) = \phi(E[X \mid \mathcal{G}])$ a.s.
Lemma 2.3.2 Let $X \in L^p$, $p \ge 1$. The family

$$\mathcal{L} = \{E[X \mid \mathcal{G}] : \mathcal{G} \text{ is a sub-}\sigma\text{-field of } \mathcal{F}\}$$

is uniformly integrable.

Proof Since $\phi(x) = |x|^p$ is convex, Jensen's Inequality 2.3.1 implies that

$$|E[X \mid \mathcal{G}]|^p \le E[|X|^p \mid \mathcal{G}].$$

Hence

$$E[|E[X \mid \mathcal{G}]|^p] \le E[E[|X|^p \mid \mathcal{G}]] = E[|X|^p],$$
that is, $E[|E[X \mid \mathcal{G}]|^p] \le E[|X|^p] < \infty$ uniformly in $\mathcal{G}$, from which uniform integrability follows.

For a submartingale $\{X_n, \mathcal{F}_n\}$ and real numbers $a < b$, let $C_n[a, b]$ denote the number of upcrossings of the interval $[a, b]$ by $X_1, \dots, X_n$: the largest $k$ for which there are indices $s_1 < t_1 < s_2 < t_2 < \dots < s_k < t_k \le n$ with $X_{s_i} \le a$ and $X_{t_i} \ge b$, for $1 \le i \le k$.
The following theorem is a useful tool in proving convergence results for submartingales.
Theorem 2.3.6 (Doob). If $\{X_n, \mathcal{F}_n\}$ is a submartingale then, for all $n \ge 1$,

$$E[C_n[a, b]] \le \frac{E[(X_n - a)^+]}{b - a},$$

where $(X_n - a)^+ = \max\{X_n - a, 0\}$.
Proof See [36] page 474.
Theorem 2.3.7 If $\{X_n, \mathcal{F}_n\}$ is a nonnegative martingale then $X_n \to X_\infty$ a.s., where $X_\infty$ is an integrable random variable.

Proof Suppose that the event $\{\omega : \liminf X_n(\omega) < \limsup X_n(\omega)\}$ has positive probability. Then there exist rationals $a < b$ such that

$$P(\liminf X_n < a < b < \limsup X_n) = p > 0. \quad (2.3.5)$$
This means that, with positive probability, $\{X_n\}$ oscillates about, i.e. up-crosses, the interval $[a, b]$ infinitely many times. However, using Theorem 2.3.6 and the fact that $\sup_n E[X_n] = E[X_1] < \infty$, we have:

$$\lim_n E[C_n[a, b]] \le \lim_n \frac{E[(X_n - a)^+]}{b - a} \le \frac{E[X_1] + |a|}{b - a} < \infty,$$

which contradicts (2.3.5); hence $X_n$ converges a.s., and by Fatou's lemma the limit $X_\infty$ is integrable.

In particular, stopping a martingale preserves the martingale property.

Theorem 2.3.11 If $\{X_n, \mathcal{F}_n\}$ is a martingale and $\tau$ is a stopping time, then the stopped process $\{X_{\min(n,\tau)}, \mathcal{F}_n\}$ is a martingale, since

$$E[X_{\min(n+1,\tau)} - X_{\min(n,\tau)} \mid \mathcal{F}_n] = E[I_{\{\tau > n\}}(X_{n+1} - X_n) \mid \mathcal{F}_n] = I_{\{\tau > n\}} E[(X_{n+1} - X_n) \mid \mathcal{F}_n] = 0,$$

since $\{\tau > n\} \in \mathcal{F}_n$.
We also have that stopping at an optional time preserves the martingale property.
Theorem 2.3.12 (Doob Optional Sampling Theorem). Suppose $\{X_n, \mathcal{F}_n\}$ is a martingale. Let $\sigma \le \tau$ (a.s.) be stopping times such that $X_\sigma$ and $X_\tau$ are integrable. Also suppose that

$$\liminf_n \int_{\{\sigma > n\}} |X_n|\,dP = 0, \quad (2.3.6)$$

and

$$\liminf_n \int_{\{\tau > n\}} |X_n|\,dP = 0. \quad (2.3.7)$$

Then

$$E[X_\tau \mid \mathcal{F}_\sigma] = X_\sigma. \quad (2.3.8)$$

In particular, $E[X_\tau] = E[X_\sigma]$.
Proof Using the definition of conditional expectation, we have to show that for every
$A \in \mathcal{F}_\sigma$,

$$\int_A I_{\{\sigma \le \tau\}} E[X_\tau \mid \mathcal{F}_\sigma]\,dP = \int_A I_{\{\sigma \le \tau\}} X_\tau\,dP = \int_A I_{\{\sigma \le \tau\}} X_\sigma\,dP.$$

However, $\{\sigma \le \tau\} = \bigcup_{n \ge 0} \{\sigma = n\} \cap \{\tau \ge n\}$. Hence it suffices to show that, for all $n \ge 0$:

$$\int_A I_{\{\sigma = n\} \cap \{\tau \ge n\}} X_\tau\,dP = \int_A I_{\{\sigma = n\} \cap \{\tau \ge n\}} X_\sigma\,dP = \int_A I_{\{\sigma = n\} \cap \{\tau \ge n\}} X_n\,dP. \quad (2.3.9)$$

Now, $\{\omega : \tau(\omega) \ge n\} = \{\omega : \tau(\omega) = n\} \cup \{\omega : \tau(\omega) \ge n+1\}$ and, in view of (2.3.1), the last integral in (2.3.9) is equal to

$$\int_{A \cap \{\sigma = n\} \cap \{\tau = n\}} X_n\,dP + \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+1\}} X_{n+1}\,dP = \int_{A \cap \{\sigma = n\} \cap \{\tau = n\}} X_\tau\,dP + \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+1\}} X_{n+1}\,dP. \quad (2.3.10)$$
Also, $\{\omega : \tau(\omega) \ge n\} = \{\omega : n \le \tau(\omega) \le n+1\} \cup \{\omega : \tau(\omega) \ge n+2\}$ and, using (2.3.1) again, (2.3.10) equals

$$\int_{A \cap \{\sigma = n\} \cap \{n \le \tau \le n+1\}} X_\tau\,dP + \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+2\}} X_{n+2}\,dP.$$

Repeating this step $k$ times,

$$\int_A I_{\{\sigma = n\} \cap \{\tau \ge n\}} X_n\,dP = \int_{A \cap \{\sigma = n\} \cap \{n \le \tau \le n+k\}} X_\tau\,dP + \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+k+1\}} X_{n+k+1}\,dP,$$

that is,

$$\int_{A \cap \{\sigma = n\} \cap \{n \le \tau \le n+k\}} X_\tau\,dP = \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n\}} X_n\,dP - \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+k+1\}} X_{n+k+1}\,dP.$$

Now,

$$X_{n+k+1} = X^+_{n+k+1} - X^-_{n+k+1} = 2X^+_{n+k+1} - (X^+_{n+k+1} + X^-_{n+k+1}) = 2X^+_{n+k+1} - |X_{n+k+1}|,$$

so that

$$\int_{A \cap \{\sigma = n\} \cap \{n \le \tau \le n+k\}} X_\tau\,dP = \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n\}} X_n\,dP - 2\int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+k+1\}} X^+_{n+k+1}\,dP + \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n+k+1\}} |X_{n+k+1}|\,dP. \quad (2.3.11)$$

Taking the limit as $k \to \infty$ of both sides of (2.3.11) and using (2.3.7), we obtain

$$\int_{A \cap \{\sigma = n\} \cap \{\tau \ge n\}} X_\tau\,dP = \int_{A \cap \{\sigma = n\} \cap \{\tau \ge n\}} X_n\,dP,$$

which establishes (2.3.9) and finishes the proof.
Definition 2.3.13 The stochastic process $\{X_n, \mathcal{F}_n\}$ is a local martingale if there is a sequence of stopping times $\{\tau_k\}$ increasing to $\infty$ with probability 1 and such that $\{X_{\min(n,\tau_k)}, \mathcal{F}_n\}$ is a martingale for each $k$.
Remark 2.3.14 The interesting fact about local martingales is that they can be obtained
rather naturally through a martingale transform (stochastic integral in the continuous time
case) which is defined as follows. Suppose {Yn ,Fn} is a martingale and {An,Fn} is a
predictable process. Then the sequence

$$X_n = A_0 Y_0 + \sum_{k=1}^n A_k (Y_k - Y_{k-1})$$

is called a martingale transform and is a local martingale.

Proof To show that $\{X_n, \mathcal{F}_n\}$ is a local martingale we have to find a sequence of stopping times $\{\tau_k\}$, $k \ge 1$, increasing to infinity ($P$-a.s.) and such that the stopped process $\{X_{\min(n,\tau_k)}, \mathcal{F}_n\}$ is a martingale. Let $\tau_k = \inf\{n \ge 0 : |A_{n+1}| > k\}$. Since $A$ is predictable the $\tau_k$ are stopping times and clearly $\tau_k \uparrow \infty$ ($P$-a.s.). Since $Y$ is a martingale and $|A_{\min(n,\tau_k)} I_{\{\tau_k > n\}}| \le k$, then, for all $n \ge 1$,

$$E[|X_{\min(n,\tau_k)} I_{\{\tau_k > n\}}|] < \infty.$$

Moreover, from Theorem 2.3.11,

$$E[(X_{\min(n+1,\tau_k)} - X_{\min(n,\tau_k)}) I_{\{\tau_k > n\}} \mid \mathcal{F}_n] = I_{\{\tau_k > n\}} A_{\min(n+1,\tau_k)} E[Y_{\min(n+1,\tau_k)} - Y_{\min(n,\tau_k)} \mid \mathcal{F}_n] = 0.$$

This finishes the proof.
Example 2.3.15 Suppose that you are playing a game using the following strategy. At
each time n your stake is An. Write Xn for the state of your total gain through the n-th game
with $X_0 = 0$ for simplicity. Write $\mathcal{F}_n = \sigma\{X_k : 0 \le k \le n\}$. We suppose that, for each $n$, $A_n$ is $\mathcal{F}_{n-1}$-measurable, that is, $A = \{A_n\}$ is predictable with respect to the filtration $\mathcal{F}_n$. This means that $A_n = A_n(X_0, X_1, \dots, X_{n-1})$ is a function of $X_0, X_1, \dots, X_{n-1}$.

If we assume that you win (or lose) at time $n$ if a Bernoulli random variable $b_n$ is equal to $1$ (or $-1$), then

$$X_n = \sum_{k=1}^n A_k b_k = \sum_{k=1}^n A_k \Delta C_k.$$

Here $\Delta C_k = C_k - C_{k-1}$ and $C_k = \sum_{i=1}^k b_i$. If $C$ is a martingale with respect to the filtration $\mathcal{F}_n$ (in this case we say that the game is fair), then the same thing holds for $X$ because

$$E[X_n \mid \mathcal{F}_{n-1}] = X_{n-1} + A_n E[C_n - C_{n-1} \mid \mathcal{F}_{n-1}] = X_{n-1} + A_n (E[C_n \mid \mathcal{F}_{n-1}] - C_{n-1}) = X_{n-1} + A_n (C_{n-1} - C_{n-1}) = X_{n-1}.$$
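The computation in Example 2.3.15 can be checked exactly on the finite coin-toss tree for any predictable staking rule. A sketch (hypothetical illustration code; the doubling strategy is just one convenient choice of predictable $A$):

```python
from itertools import product

def transform(seq, stake):
    """X_n = sum_k A_k (C_k - C_{k-1}) for the fair +/-1 game C,
    where A_k = stake(history of C up to time k-1) is predictable."""
    c_hist, x = [0], 0.0
    for step in seq:
        a = stake(c_hist)                  # uses only the past: predictable
        dc = 1 if step == 'H' else -1
        x += a * dc
        c_hist.append(c_hist[-1] + dc)
    return x

def is_martingale(n, stake):
    """Exact check of E[X_{k+1} | F_k] = X_k over all prefixes of length < n."""
    for k in range(n):
        for prefix in product('HT', repeat=k):
            x_k = transform(prefix, stake)
            cond = 0.5 * sum(transform(prefix + (s,), stake) for s in 'HT')
            if abs(cond - x_k) > 1e-9:
                return False
    return True

doubling = lambda hist: 2 ** (len(hist) - 1)   # A_k = 2^{k-1}, the classic doubling stakes
```

That the doubling stakes are unbounded is precisely why, in general, a martingale transform is only guaranteed to be a local martingale; on a finite horizon with bounded stakes it is a true martingale, as the check confirms.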
2.4 Doob decomposition
A submartingale is a process which on average is nondecreasing. Unlike a martingale,
which has a constant mean over time, a submartingale has a trend or an increasing predictable
part perturbed by a martingale component which is not predictable. This is made more
precise by the following theorem due to J. L. Doob.
-
8/6/2019 Filtering and Measure Theory
69/270
2.4 Doob decomposition 57
Theorem 2.4.1 (Doob Decomposition). Any submartingale $\{X_n\}$ can be written ($P$-a.s. uniquely) as

$$X_n = Y_n + Z_n, \quad \text{a.s.} \quad (2.4.1)$$

where $\{Y_n\}$ is a martingale and $\{Z_n\}$ is a predictable, increasing process, i.e. $E[Z_n] < \infty$, $Z_1 = 0$ and $Z_n \le Z_{n+1}$ a.s. $\forall n$.

Proof Write $\Delta_n = X_n - X_{n-1}$, $y_i = \Delta_i - E[\Delta_i \mid \mathcal{F}_{i-1}]$ and $z_i = E[\Delta_i \mid \mathcal{F}_{i-1}]$, $z_0 = 0$. Then:

$$X_n = \left(\Delta_1 - E[\Delta_1 \mid \mathcal{F}_0]\right) + \left(\Delta_2 - E[\Delta_2 \mid \mathcal{F}_1]\right) + \dots + \left(\Delta_n - E[\Delta_n \mid \mathcal{F}_{n-1}]\right) + \sum_{i=1}^n E[\Delta_i \mid \mathcal{F}_{i-1}] = \sum_{i=1}^n y_i + \sum_{i=1}^n z_i = Y_n + Z_n.$$

To prove uniqueness, suppose that there is another decomposition $X_n = Y'_n + Z'_n = \sum_{i=1}^n y'_i + \sum_{i=1}^n z'_i$. Let $y_n + z_n = \Delta x_n = y'_n + z'_n$ and take the conditional expectation with respect to $\mathcal{F}_{n-1}$ to get $z_n = z'_n$, because $y_n$ is a martingale increment and $z_n$ is predictable. This implies $y_n = y'_n$ and the uniqueness of the decomposition.
Remarks 2.4.2
1. In Theorem 2.4.1, if $\{X_n\}$ is just an $\mathcal{F}_n$-adapted and integrable process, the decomposition remains valid but we lose the increasing property of the process $\{Z_n\}$.
2. The process $X - Z$ is a martingale; as a result, $Z$ is called the compensator of the submartingale $X$.
3. A process which is the sum of a predictable process and a martingale is called a semimartingale.
4. Uniqueness of the decomposition is ensured by the predictability of the process $\{Z_n\}$.
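As a concrete instance of Theorem 2.4.1, take $X_n = S_n^2$ where $S$ is the fair $\pm 1$ walk: the increments satisfy $E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] = 1$, so $Z_n = n$ and $Y_n = S_n^2 - n$ is the martingale part. A hypothetical sketch computing the compensator exactly on the coin-toss tree:

```python
from itertools import product

def compensator(n):
    """Compute Z along every path for X_k = S_k^2, S the fair +/-1 walk:
    z_k = E[X_k - X_{k-1} | F_{k-1}], evaluated exactly by averaging the
    increment over the two equally likely next steps."""
    Z = {(): 0.0}                      # path (tuple of +/-1 steps) -> Z value
    for k in range(1, n + 1):
        for prefix in product((1, -1), repeat=k - 1):
            s_prev = sum(prefix)
            # E[(s+b)^2 - s^2 | F_{k-1}] with b = +/-1 each with prob. 1/2
            z_inc = 0.5 * sum((s_prev + b) ** 2 - s_prev ** 2 for b in (1, -1))
            for b in (1, -1):
                Z[prefix + (b,)] = Z[prefix] + z_inc
    return Z
```

Every path of length $k$ receives $Z = k$, confirming that the compensator is deterministic here, $Z_n = n$, and hence that $S_n^2 - n$ is a martingale.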
Definition 2.4.3 A discrete-time stochastic process $\{X_n\}$, with finite state space $S = \{s_1, s_2, \dots, s_N\}$, defined on a probability space $(\Omega, \mathcal{F}, P)$, is a Markov chain if

$$P(X_{n+1} = s_{i_{n+1}} \mid X_0 = s_{i_0}, \dots, X_n = s_{i_n}) = P(X_{n+1} = s_{i_{n+1}} \mid X_n = s_{i_n}),$$

for all $n \ge 0$ and all states $s_{i_0}, \dots, s_{i_n}, s_{i_{n+1}} \in S$. This is termed the Markov property. $\{X_n\}$ is a homogeneous Markov chain if

$$P(X_{n+1} = s_j \mid X_n = s_i) = \pi_{ji}$$

is independent of $n$.
The matrix $\Pi = \{\pi_{ji}\}$ is called the probability transition matrix of the homogeneous Markov chain and it satisfies the property

$$\sum_{j=1}^N \pi_{ji} = 1.$$

Note that our transition matrix is the transpose of the traditional transition matrix defined elsewhere. The convenience of this choice will be apparent later.

The following properties of a homogeneous Markov chain are easy to check.

1. Let $\pi^0 = (\pi^0_1, \pi^0_2, \dots, \pi^0_N)'$ be the distribution of $X_0$. Then

$$P(X_0 = s_{i_0}, X_1 = s_{i_1}, \dots, X_n = s_{i_n}) = \pi^0_{i_0} \pi_{i_1 i_0} \cdots \pi_{i_n i_{n-1}}.$$

2. Let $\pi^n = (\pi^n_1, \pi^n_2, \dots, \pi^n_N)'$ be the distribution of $X_n$. Then

$$\pi^n = \Pi^n \pi^0 = \Pi \pi^{n-1}.$$
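With the book's column convention, propagating the distribution of the chain is just a matrix–vector product. A short sketch (the two-state matrix below is a hypothetical example, not from the text), assuming NumPy is available:

```python
import numpy as np

# Column-stochastic transition matrix in the book's convention:
# Pi[j, i] = P(X_{n+1} = s_j | X_n = s_i), so each COLUMN sums to 1.
Pi = np.array([[0.9, 0.2],
               [0.1, 0.8]])

pi0 = np.array([1.0, 0.0])            # X_0 = s_1 with probability 1

def distribution(n):
    """pi^n = Pi^n pi^0, with Pi acting on column vectors of probabilities."""
    return np.linalg.matrix_power(Pi, n) @ pi0
```

For this example matrix the distribution converges to the stationary vector $(2/3, 1/3)'$, the eigenvector of $\Pi$ for eigenvalue $1$; with the traditional row-stochastic convention one would instead iterate row vectors on the right.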
Example 2.4.4 Let $\{\xi_n\}$ be a discrete-time Markov chain as in Definition 2.4.3. Consider the filtration $\{\mathcal{F}_n\} = \sigma\{\xi_0, \xi_1, \dots, \xi_n\}$.

Write $X_n = (I_{(\xi_n = s_1)}, I_{(\xi_n = s_2)}, \dots, I_{(\xi_n = s_N)})'$. Then $X_n$ is a discrete-time Markov chain with state space the set of unit vectors $e_1 = (1, 0, \dots, 0)', \dots, e_N = (0, \dots, 0, 1)'$ of $\mathbb{R}^N$, and the probability transition matrix of $X$ is again $\Pi$. We can write:

$$E[X_n \mid \mathcal{F}_{n-1}] = E[X_n \mid X_{n-1}] = \Pi X_{n-1}, \quad (2.4.2)$$

from which we conclude that $\Pi X_{n-1}$ is the predictable part of $X_n$, given the history of $X$ up to time $n-1$, and the nonpredictable part of $X_n$ must be $M_n = X_n - \Pi X_{n-1}$. In fact it can be easily shown that $M_n \in \mathbb{R}^N$ is a